How does DailyLit split up books into bits?
September 30, 2013 2:05 PM

How do sites like DailyLit split up large text files into smaller files and then email them?

There's a similar site called Dripread where you can upload a webpage and have it send the text in smaller pieces over a few days or weeks.

I'd like to figure out how to create this type of site so I can do something similar on a domain I own. I have no idea how long these sites will be around, and I find getting daily emails of books and articles to be invaluable. Where would I start?
posted by reenum to Computers & Internet (4 answers total)
 
See if your web host supports cron jobs (a mechanism for scheduling automated actions). If so, I'd suggest writing this first as a command line script executed from cron. Any Python (or Ruby or Perl or whatever) book will teach you enough to write a command line script that splits text files, sends email, etc. That plus cron gets you to the point of having a working solution for yourself. Incorporating that or extremely similar logic into a database-driven web site later is a separable task.
posted by Monsieur Caution at 2:39 PM on September 30, 2013


Does it have to be a website? If you can get your hands on the raw text of whatever it is you want chunked and mailed, there are a variety of Unix tools that are ideally suited for this sort of thing.

The split command, for example, does exactly that: it separates a file into chunks of N lines. Once your text is chunked up, you mail each chunk out on some schedule. As a fairly lazy guy, I'd probably write a little shell script to iterate over the directory holding the chunks, mail one out, then delete it (or move it someplace else).
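Here is a minimal sketch of that split-then-mail loop in POSIX shell. It assumes the input is already plain text and that your `mail` accepts `-s subject`. The function names, the directory layout, and the `MAIL_CMD` override (handy for dry runs) are hypothetical. `split -l` names its pieces `chunk_aa`, `chunk_ab`, and so on, so plain glob order is also reading order. With the default two-letter suffix, that holds for up to 676 pieces.

```shell
#!/bin/sh
# Sketch only: function names and directory layout are hypothetical.

# chunk_book FILE LINES DIR
# Split FILE into pieces of LINES lines each, named DIR/chunk_aa, DIR/chunk_ab, ...
chunk_book() {
    mkdir -p "$3/sent" || return 1
    split -l "$2" "$1" "$3/chunk_"
}

# send_next_chunk DIR ADDRESS
# Mail the first unsent chunk, then move it into DIR/sent so the next run
# picks up the following one. Returns 1 when nothing is left to send.
# MAIL_CMD defaults to mail(1); set it to something else for a dry run.
send_next_chunk() {
    for f in "$1"/chunk_*; do
        [ -f "$f" ] || return 1            # glob matched nothing: book finished
        ${MAIL_CMD:-mail} -s "Next installment: ${f##*/}" "$2" < "$f" || return 1
        mv "$f" "$1/sent/"
        return                             # only one chunk per run
    done
}
```

You would run something like `chunk_book book.txt 200 "$HOME/book"` once, then call `send_next_chunk "$HOME/book" you@example.com` from cron each day. Moving sent pieces into `sent/` instead of deleting them keeps a record of what has already gone out.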

Repeat daily as needed. Getting your articles, books and whatnot into a usable format might be a little trickier. Published websites can nearly always be reduced to their useful text pretty easily. E-books, PDFs and so on might need a little more massaging first.
posted by jquinby at 2:42 PM on September 30, 2013


Response by poster: jquinby, no need for it to be a website. Which UNIX tools would be good to look into?
posted by reenum at 3:21 PM on September 30, 2013


I'd start with the aforementioned split if you've got raw text to work with.

For web pages, I'd probably use lynx with the -dump and -nolist switches, like so:

lynx -dump -nolist http://www.website.com/some_page.html > dumped_file.txt

You'll also want to take a look at pdftotext, which does the same job for PDF files.
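The conversion step can be wrapped in a small dispatcher, sketched below. It uses lynx for URLs and HTML files, pdftotext for PDFs, and otherwise assumes the input is already plain text. The name `to_text` and the extension-based dispatch are assumptions. E-book formats would need their own converter, which this does not cover.

```shell
#!/bin/sh
# to_text SOURCE OUTFILE  (hypothetical helper)
# Convert a web page, HTML file, PDF, or text file into plain text.
to_text() {
    case "$1" in
        http://*|https://*|*.html|*.htm)
            lynx -dump -nolist "$1" > "$2" ;;   # rendered page text, no link list
        *.pdf|*.PDF)
            pdftotext "$1" "$2" ;;              # pdftotext PDF-file text-file
        *)
            cat "$1" > "$2" ;;                  # assume it is already plain text
    esac
}
```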

To mail the files, you can use the built-in mail utility, assuming your server is correctly configured to send mail and your ISP allows it. Most ISPs block outbound connections on port 25 to keep home machines from sending spam. Scheduling the job is cron's department, and it would probably make sense to wrap some of these tasks in a shell script. bash is easy to work with. I sometimes write scripts in the Bourne shell (sh) just for the hell of it.
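For a concrete picture of the scheduling side, here is a sketch of a crontab entry that runs a wrapper script every morning at 7:00. You would add it with `crontab -e`. The path `/home/you/bin/send_daily.sh` is hypothetical, so point it at whatever script you end up writing. The five time fields are minute, hour, day of month, month, and day of week. Cron runs jobs with a minimal environment, so absolute paths are the safe choice.

```
# min hour day-of-month month day-of-week  command
0 7 * * * /home/you/bin/send_daily.sh
```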

Other utilities that come in handy are the old standbys: sed, awk, cut, and fmt. There are oodles of books written on sed and awk, but if you want to get useful quickly, look at these compilations of useful one-liners: sed & awk.
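Here is one example of where awk earns its keep. `split -l` can cut a chunk off in the middle of a paragraph. The sketch below keeps whole paragraphs together and starts a new file once a chunk would exceed a word budget. It relies on awk's paragraph mode (`RS = ""`), where blank lines separate records. The function name, the `part_001.txt` naming, and the word-budget rule are illustrative assumptions.

```shell
#!/bin/sh
# chunk_by_paragraph FILE MAX_WORDS PREFIX  (hypothetical helper)
# Writes PREFIX001.txt, PREFIX002.txt, ... keeping paragraphs intact.
# A single paragraph longer than MAX_WORDS still gets a chunk of its own.
chunk_by_paragraph() {
    awk -v max="$2" -v prefix="$3" '
        BEGIN { RS = ""; n = 1; words = 0 }   # paragraph mode: blank lines end a record
        {
            # Start a new chunk if this paragraph would push us over budget.
            if (words > 0 && words + NF > max) { close(out); n++; words = 0 }
            out = sprintf("%s%03d.txt", prefix, n)
            print $0 "\n" > out               # paragraph plus a blank separator line
            words += NF                       # NF = number of words in this paragraph
        }
    ' "$1"
}
```

Piping a finished chunk through `fmt -w 72` will rewrap long lines for comfortable reading in a mail client.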

Unix was built for text manipulation, so these sorts of tasks are ideally suited for the CLI. The learning curve can be steep, but the payoff is that you can pretty much do anything you can imagine with text.
posted by jquinby at 3:52 PM on September 30, 2013 [1 favorite]

