What is the best way to search the internal text of OpenOffice.org's odt files?
January 29, 2011 3:51 PM
This is an ongoing problem for me. So far the best thing I've found is a program called DocSearcher. But it maintains its own index, which, since it only updates itself when I open the program (sometimes only once a month), and since I have tens of thousands of odts, is slow. The GUI is also terrible. Does anyone have any suggestions?
I use OO.o because I've built up a huge replacement table over the years that I enjoy being able to back up and port to new versions and new computers, etc., which I don't think I can do with MS Word. I could just start saving everything as docs I guess, but I'd still have this massive backlog of files in odt format that I'd like to be able to search. If there's really nothing decent I might consider converting all my odts to docs so that the native Windows search function will search them. If that's the best option, could anyone recommend a mass-conversion utility for that?
I use WinXP because I use some old weird software/hardware that isn't compatible with Vista, though not sure about Win7.
Thanks.
It looks like there's a ODT search plugin for Google Desktop supports odt files.
posted by Cat Pie Hurts at 3:59 PM on January 29, 2011
Too bad there's no plugin for my broken english.
posted by Cat Pie Hurts at 3:59 PM on January 29, 2011
What does "internal text" mean? Is that some special subset of all the text, or do you just mean "text"?
If it was me, and since odt files are XML, I would try just using grep, and see if that proved adequate.
posted by tylerkaraszewski at 4:49 PM on January 29, 2011
Wait a minute, ODT files are just XML files, like tylerkaraszewski said. Why can't you use Windows Search to index them for you?
But if you can't get that working then an old version of Copernic Desktop Search will do it for you.
If you're unconcerned with performance then you can get this grep-like utility for OpenOffice files as well, but I don't see how it's any different from grep.
posted by asymptotic at 5:39 PM on January 29, 2011
(Sorry, I have trouble with being explicit and I wanted to do so now. Windows Search is a Microsoft application available for use on Windows XP. Microsoft also helpfully call the search functionality in Windows Vista and Windows 7 "Windows Search".)
posted by asymptotic at 5:41 PM on January 29, 2011
ODT files can be either straight XML or a compressed group of XML files. OO spits out the compressed kind (a ZIP archive, not gzip), so zcat and zgrep won't handle the odt files properly. I'm not sure if Windows Search has a handler for the compression. If you install cygwin, you can do something like this in bash:
for i in *.odt;
do unzip -ac "$i" | grep -q "text to search for" && echo "$i";
done
posted by Cat Pie Hurts at 6:15 PM on January 29, 2011
An ODT file is a ZIP file which contains a content.xml file with the text of your document, and many other XML and binary files (e.g. images). You can search a single ODT file on Linux with
unzip -p filename.odt content.xml | grep XXX
No idea what the Windows equivalent is.
posted by miyabo at 6:16 PM on January 29, 2011
Response by poster: Thanks everyone.
By internal text I meant, as opposed to the filename, which is currently all that the native Windows search function searches.
I'm on my phone right now and the "Windows Search" won't open properly for me. Is it different from the native Windows search function you get from Start button -> Search? Googling, it does seem to be a different program, but if it handled ODTs I'd think it would show up on searches/discussions for ODT searching? I'll look into it.
I don't have cygwin installed and don't normally do command line stuff, but I'll check out that German program.
Thanks everyone.
posted by skwt at 2:45 PM on January 30, 2011
posted by Night_owl at 3:56 PM on January 29, 2011