What is the best way to search the internal text of OpenOffice.org's odt files?
January 29, 2011 3:51 PM

This is an ongoing problem for me. So far the best thing I've found is a program called DocSearcher. But it maintains its own index, which only updates when I open the program (sometimes only once a month), and since I have tens of thousands of odts, each update is slow. The GUI is also terrible. Does anyone have any suggestions?

I use OO.o because I've built up a huge replacement table over the years that I enjoy being able to back up and port to new versions and new computers, etc., which I don't think I can do with MS Word. I could just start saving everything as docs, I guess, but I'd still have this massive backlog of files in odt format that I'd like to be able to search. If there's really nothing decent, I might consider converting all my odts to docs so that the native Windows search function will search them. If that's the best option, could anyone recommend a mass-conversion utility for that?

I use WinXP because I use some old weird software/hardware that isn't compatible with Vista, though not sure about Win7.

Thanks.
posted by skwt to Computers & Internet (9 answers total)
 
Suggestion: open the program once a day? Maybe have it open as your computer is starting up...
posted by Night_owl at 3:56 PM on January 29, 2011


It looks like there's a ODT search plugin for Google Desktop supports odt files.
posted by Cat Pie Hurts at 3:59 PM on January 29, 2011


Too bad there's no plugin for my broken english.
posted by Cat Pie Hurts at 3:59 PM on January 29, 2011


What does "internal text" mean? Is that some special subset of all the text, or do you just mean "text"?

If it was me, and since odt files are XML, I would try just using grep, and see if that proved adequate.
posted by tylerkaraszewski at 4:49 PM on January 29, 2011


Wait a minute, ODT files are just XML files, like tylerkaraszewski said. Why can't you use Windows Search to index them for you?

But if you can't get that working then an old version of Copernic Desktop Search will do it for you.

If you're unconcerned with performance then you can get this grep-like utility for OpenOffice files as well, but I don't see how it's any different from grep.
posted by asymptotic at 5:39 PM on January 29, 2011


(Sorry, I have trouble with being explicit and I wanted to do so now. Windows Search is a Microsoft application available for use on Windows XP. Microsoft also helpfully calls the search functionality in Windows Vista and Windows 7 "Windows Search".)
posted by asymptotic at 5:41 PM on January 29, 2011


ODT files can be either straight XML or a compressed group of XML files. OO.o saves the compressed kind by default, and that is a ZIP archive rather than gzip, so zcat and zgrep won't handle the odt files properly. I'm not sure if Windows Search has a handler for the compression. If you install cygwin, you can do something like this in bash:

# print the name of each odt whose content.xml contains the text
for i in *.odt;
do unzip -p "$i" content.xml | grep -q "text to search for" && echo "$i";
done
posted by Cat Pie Hurts at 6:15 PM on January 29, 2011


An ODT file is a ZIP file which contains a content.xml file with the text of your document, and many other XML and binary files (e.g. images). You can search a single ODT file on Linux with unzip -p filename.odt content.xml | grep XXX. No idea what the Windows equivalent is.
posted by miyabo at 6:16 PM on January 29, 2011
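

For anyone without cygwin, the same idea (open each ODT as a ZIP, read content.xml, look for the text) can be sketched in Python, which also runs on Windows. This is only a rough sketch, not a tested tool: the function names odt_contains and search_odts and the script name odtsearch.py are made up for illustration, it assumes the usual zipped ODT with a content.xml inside, and it does a plain case-insensitive substring match with no index, so it rereads every file on each search.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path


def odt_contains(path, needle):
    """Return True if the text inside the ODT at `path` contains `needle`."""
    try:
        with zipfile.ZipFile(str(path)) as archive:
            root = ET.fromstring(archive.read("content.xml"))
    except (zipfile.BadZipFile, KeyError, ET.ParseError, OSError):
        # Not a ZIP, no content.xml inside, malformed XML, or unreadable: skip.
        return False
    # Join all text nodes and ignore the markup. Caveat: text from adjacent
    # paragraphs is joined with nothing in between.
    text = "".join(root.itertext())
    return needle.lower() in text.lower()


def search_odts(folder, needle):
    """Yield every .odt file under `folder` (recursively) that contains `needle`."""
    for path in sorted(Path(folder).rglob("*.odt")):
        if odt_contains(path, needle):
            yield path


# Hypothetical usage: python odtsearch.py FOLDER "text to search for"
if __name__ == "__main__" and len(sys.argv) == 3:
    for match in search_odts(sys.argv[1], sys.argv[2]):
        print(match)
```

With tens of thousands of files this will be slow compared to an indexed search, but it needs nothing beyond a stock Python install.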


Response by poster: Thanks everyone.

By internal text I meant, as opposed to the filename, which is currently all that the native Windows search function searches.

I'm on my phone right now and the "Windows Search" won't open properly for me. Is it different from the native Windows search function you get from Start button -> Search? Googling, it does seem to be a different program, but if it handled ODTs I'd think it would show up on searches/discussions for ODT searching? I'll look into it.

I don't have cygwin installed and don't normally do command line stuff, but I'll check out that German program.

Thanks everyone.
posted by skwt at 2:45 PM on January 30, 2011

