Indexing PageMaker files
November 3, 2005 2:03 AM
We currently use PageMaker and are moving to InDesign. I have over 100 GB of PageMaker files that I would like to index so I can find files by the names that appear in them. We are business card manufacturers and have kept the files by date; I would now like to find them by the mainlines of the cards. Does anyone know of a program that can create a searchable index? I was thinking of turning them all into searchable PDFs, but I am not sure how to do this or whether such a tool even exists.
I'm assuming "mainline" means the first text line on the business card? Normally the person's name?
Google Desktop will search within PDFs, according to the help pages. Whether the text will survive the transition to PDF unmauled... I'm not an expert.
posted by Leon at 3:24 AM on November 3, 2005
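One way to check whether the text survives the trip to PDF is to run a sample of exported files through pdftotext (the command-line tool from Xpdf/Poppler) and look at what comes back. Below is a minimal Python sketch, assuming pdftotext is installed and on the PATH, and assuming the "mainline" is simply the first non-blank line of text. The function names are made up for illustration.

```python
import shutil
import subprocess


def pdf_text(pdf_path):
    """Return the text pdftotext can extract from one PDF.

    Assumes the pdftotext command-line tool is installed and on PATH.
    """
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    # "-" as the output file sends the extracted text to stdout.
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", str(pdf_path), "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def first_text_line(text):
    """Return the first non-blank line, or None if no text was extracted.

    Treating this line as the card's mainline is an assumption.
    """
    for line in text.splitlines():
        if line.strip():
            return line.strip()
    return None
```

If `first_text_line(pdf_text("some_card.pdf"))` comes back as `None` for a card, the text in that file was probably converted to outlines or placed as an image, and it would need OCR before any search tool could find it.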
Most PDF files created from desktop publishing or word processing programs will contain the text and the typographic information. This won't work if instead of a text mainline, you have a vector or raster graphic, or have converted the card to vector paths.
Spotlight and DevonThink on the Macintosh, and Google Desktop and Copernic Desktop on Windows are short and simple routes to the job (I like Copernic better myself, but YMMV.) However, these solutions are probably not portable between workstations.
For a webapp or client/server solution, you might look at swish-e or Lucene. But then you will need a pdf->text converter like pdftotext.
Depending on the route you wish to go, it might be easier to export to HTML as well. Then you can write a script to parse the html text and add the data to a relational database with fulltext indexing. If the html file has the same base filename, it should give you what you want.
posted by KirkJobSluder at 6:09 AM on November 3, 2005
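The HTML-export route described above could look roughly like this in Python. It is only a sketch under stated assumptions: each card has been exported to an .html file with the same base filename as the original, and a plain SQLite word table stands in for the database's full-text index (a real setup might use the database's own full-text feature instead). All function and table names are invented for the example.

```python
import re
import sqlite3
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collects the visible text of an HTML page, skipping script/style."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def open_index(db_path=":memory:"):
    """Open (or create) the word table; the UNIQUE constraint doubles as an index."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS words ("
        "word TEXT, base_name TEXT, UNIQUE(word, base_name))"
    )
    return conn


def add_document(conn, base_name, html):
    """Store every distinct lowercase word of one exported file."""
    words = set(re.findall(r"\w+", html_to_text(html).lower()))
    conn.executemany(
        "INSERT OR IGNORE INTO words VALUES (?, ?)",
        [(word, base_name) for word in words],
    )
    conn.commit()


def index_folder(conn, folder):
    """Index every .html export in a folder, keyed by its base filename."""
    for path in sorted(Path(folder).glob("*.html")):
        add_document(conn, path.stem, path.read_text(errors="ignore"))


def search(conn, query):
    """Return base filenames of documents containing all query words."""
    terms = sorted(set(re.findall(r"\w+", query.lower())))
    if not terms:
        return []
    placeholders = ",".join("?" * len(terms))
    rows = conn.execute(
        f"SELECT base_name FROM words WHERE word IN ({placeholders}) "
        "GROUP BY base_name HAVING COUNT(DISTINCT word) = ? "
        "ORDER BY base_name",
        (*terms, len(terms)),
    )
    return [row[0] for row in rows]
```

After something like `index_folder(conn, "exports")`, calling `search(conn, "jane example")` returns the base filenames to look up among the original PageMaker files. The folder name here is hypothetical.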
If you are using a Macintosh, its built-in Find (cmd-F) function will search the contents of files on your hard drive.
From Finder > Find, click Add and select the folder that contains your PageMaker documents. Make sure that it is the only item that is checkmarked. Then narrow your search by adding a criterion: "Search for items whose: Content includes *****"
This will bring up the Pagemaker document and any PDFs that you have exported.
posted by UnclePlayground at 6:20 AM on November 3, 2005
Are these PageMaker files already in PDF? Do you have a plan for that? Is that part of this question?
If you need to convert all your PageMaker files to PDF I recommend using an application called AdLib. (http://www.adlibsoftware.com/)
You can convert just about any file type to PDF, Multipage TIFF or single page TIFF using it. It can automatically OCR in like 100 languages too if you need it to. It preserves the original file name as well.
I use it all the time. It can be frustrating at times but generally is a very useful tool.
If you already have your files in PDF and need to know how to search them you have many options. Obviously there are the ones already mentioned here, Google, Yahoo and even MSN.
I have been looking into a program called dtSearch. (http://www.dtsearch.com/)
I have heard positive things from some of my clients who are using it to search 100,000 plus page document productions.
Adobe Acrobat also has a built-in search that lets you search across your entire PDF collection. When you click Search, you can choose to search only the PDF you have open or all PDFs in a specific directory. I've done this a few times and it does work, but it is very slow.
If you have more questions feel free to email me.
posted by thefinned1 at 9:12 PM on November 3, 2005
If you're on a Mac, Spotlight (the built-in search engine in Tiger) should index PageMaker files, if I'm not mistaken.
If you are on Windows, I don't really know. Perhaps something like Google/Yahoo/MSN/whatever search?
posted by lodev at 2:14 AM on November 3, 2005