Organizing and accessing a really really big collection of quotes
October 4, 2022 2:38 AM
What are the user-friendly, not-overly-techy options for managing a very large data set of plain text snippets, with an eye on maintaining the long-term accessibility and integrity of said data set?
For many years now, I've been a collector of words and quotes. The pieces in my collection, which I also call 'fragments', run from a sentence to several paragraphs, or the length of a short article.
Currently, my collection exists in the format of plain text notes organized into several dozen categories (examples: quotes_general_a-c, quotes_nonfiction_[SUBJECT], quotes_fiction_[AUTHOR], ephemera, textposts, etc.) which are then organized alphabetically or chronologically within that note category. These notes are synchronized via simplenote, and backed up regularly in a variety of locations as plain text files. I use nvALT for Mac and the simplenote iOS app to interact with my collection. This system has worked pretty well in most regards for many years now, but as my collection has grown into several hundreds of thousands of words, navigation has become increasingly creaky and cumbersome, with nvALT crashing on the regular. More and more, it seems like the stuff I collect just disappears into my archives, never to be seen again. Making a practice in recent years to add tags to each new fragment and dividing my category notes into smaller and smaller sets has helped somewhat. Still, as my collection grows, this set-up's failures in terms of searchability, interconnectedness, and serendipity have only grown more acute, and have sent me looking for new options.
A complication in my search is that long-term accessibility of my collection is a major priority. I am extremely resistant to the thought of adding all my collection to some proprietary app, only to have the company that owns it shut down or change their business model at some point in the future. This has happened to several people who've asked similar questions here in the past. This is also why I've always been resistant to something like Evernote, or putting everything into an online blog (privacy is also a concern there).
Currently, I am dependent on simplenote, but if they shut down tomorrow, I'd still have everything I've collected in plain text. I'm not averse to buying a program or subscribing to a service to help me manage all this data, but I need to be certain that no matter what companies or programs come and go, or whatever evolutions in technology occur, my collection will be fully accessible decades from now. If I do buy a program or subscribe to a service, it would need to have robust and stable backup/export options for the long-term stability of my data.
I've thought about starting a private personal wiki, but am intimidated by the technical side of setting it up, as I have little patience for any kind of coding nowadays. I've also thought about actually printing up my collection as a private series of self-published books. The resulting objects would be extremely delightful, and having a physical copy is good practice in terms of long-term accessibility, but the initial set-up would be a huge, time-consuming undertaking. Buying a cheap printer and putting everything on 5x8 cards could work too, but again, huge initial set-up time. That said, I'm not averse to a set-up that requires a large initial time commitment. I'm not averse to spending money on this either.
My ideal system would handle this very large set of data with grace, and make it easy and pleasurable to search and peruse by tag, content, or source. It would also have some mechanism to allow for random browsing. Is there a program or some other solution out there that can help me manage, organize, and interact with my collection more successfully?
I have little patience for any kind of coding nowadays
That's the design constraint I'd personally choose to work on easing.
The classic Unix text processing tools exist for this kind of work and will laugh in the face of the size of any collection of text that could feasibly be curated by one human being over a lifetime. It doesn't take a huge amount of shell scripting to unlock their frankly enormous capabilities.
posted by flabdablet at 5:40 AM on October 4, 2022 [4 favorites]
I'll add that there are a lot of classic Unix text processing tools that can be strung together on the command line and don't require 'coding' per se, just a little knowledge of command line options and how to pipe the output of one into another.
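No actual commands are given in this thread, so the following is only a rough sketch of the same search-and-browse idea, written in Python rather than as a shell pipeline. It assumes things the question does not state: that the notes sit in one folder of .txt files, that fragments inside a note are separated by a line containing only `---`, and that tags are written inline as `#tag`. The function names are made up for illustration.

```python
import random
import re
from pathlib import Path

# Assumption: a line containing only "---" separates fragments within a note.
SEPARATOR = re.compile(r"^\s*---\s*$", re.MULTILINE)
# Assumption: tags are written inline, e.g. "#grief" or "#sea-poems".
TAG = re.compile(r"#([\w-]+)")


def load_fragments(folder):
    """Return a list of (note_file_name, fragment_text) pairs."""
    fragments = []
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        for chunk in SEPARATOR.split(text):
            chunk = chunk.strip()
            if chunk:  # skip empty pieces around separators
                fragments.append((path.name, chunk))
    return fragments


def search(fragments, term):
    """Fragments whose text contains `term`, ignoring case."""
    term = term.lower()
    return [f for f in fragments if term in f[1].lower()]


def with_tag(fragments, tag):
    """Fragments carrying the given tag (without the leading '#')."""
    tag = tag.lower()
    return [f for f in fragments
            if tag in (t.lower() for t in TAG.findall(f[1]))]


def random_fragment(fragments, rng=random):
    """Pick one fragment at random, for serendipitous browsing."""
    return rng.choice(fragments)
```

With a folder called `notes`, `search(load_fragments("notes"), "ishmael")` would list every matching fragment along with the note it came from, and `random_fragment(load_fragments("notes"))` would surface one at random. The notes themselves are only read, never modified, so the plain text archive stays exactly as it is.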
posted by RonButNotStupid at 5:59 AM on October 4, 2022 [1 favorite]
Have you explored the software available for writers to organize materials when writing a book?
Even though you may never actually produce a single work, I believe that there are tools for handling arbitrary pieces of text, while also managing metadata. Sort of the modern equivalent of the timeless "pile of index cards," plus search. :7)
I regret to say that I am not a writer who uses them, so I can't recommend one over others.
For myself, I love the Dokuwiki platform: it doesn't have a database, just plaintext files (with some very minimal markup). In the past I had some scripts that generated pages on a schedule, so they were always current. It runs locally or can be hosted. Importing a lot of .txt files might take some help, but I bet you aren't the first person to do it! https://www.dokuwiki.org/dokuwiki
posted by wenestvedt at 6:37 AM on October 4, 2022 [1 favorite]
Best answer: Obsidian is what you want. No coding required, works via text documents, so you aren’t locked in to a database or anything.
The power in Obsidian really comes from the plugins, both “core” (that ship with the software) and community-developed. One of the core plugins is the Obsidian graph view, which displays conceptual links between your snippets. This will give you the random browsing piece you’re looking for. I suspect there are other, community-developed plugins that also provide randomness/discoverability to your content.
posted by bluloo at 12:44 PM on October 4, 2022
This thread is closed to new comments.
There are a lot of very enthusiastic people in its community - I'm not one of them because I've only just switched and it seems fine, if more complicated than I really need. But it sounds like it might suit you.
In case it's helpful I wrote this Python script to help move my Simplenote files to Obsidian.
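That script isn't reproduced in this thread, and the following is not it. It is just a minimal sketch of the simplest possible version of such a move, assuming the notes already exist as a folder of plain .txt files (as the question describes) and relying on the fact that an Obsidian vault is an ordinary folder of Markdown files. The function name and folder arguments are hypothetical.

```python
from pathlib import Path


def copy_notes_to_vault(source_dir, vault_dir):
    """Copy every .txt note into the vault folder as a .md file.

    Originals are left untouched and existing vault files are never
    overwritten. Returns the list of files that were written.
    """
    source = Path(source_dir)
    vault = Path(vault_dir)
    vault.mkdir(parents=True, exist_ok=True)

    written = []
    for txt in sorted(source.glob("*.txt")):
        target = vault / (txt.stem + ".md")  # same name, Markdown extension
        if target.exists():
            continue  # assumption: skipping is safer than overwriting
        target.write_text(txt.read_text(encoding="utf-8"), encoding="utf-8")
        written.append(target)
    return written
```

Because the text is copied unchanged, the result is still plain text that any editor can open, and the original backups remain the source of truth if the experiment doesn't work out.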
posted by fabius at 5:12 AM on October 4, 2022