Shrinking or splitting HTML bookmarks file
March 24, 2021 5:39 AM

I recently selected Raindrop.io as my bookmark manager. This led me to go searching for all of my previous bookmark exports from across the years. I could only find one, which is 56 MB and contains more than 13 000 pages of bookmarks. Help me wrangle pls.

I cannot import the file directly to sort it, and it is too big to edit without locking up my computer... I have tried the "auto shrink your html sites", and some automatic formatting tools. I am thinking the best idea is to somehow automatically split the file into multiple files, but I am not sure of the best method for this.

Raindrop.io has proven really awesome thus far, with over 13 000 bookmarks already, and I am very excited at the prospect of making those older bookmarks usable once more.
posted by infinite intimation to Computers & Internet (8 answers total) 2 users marked this as a favorite
 
It would be useful to know whether you’re using Windows, Mac, etc, as some text editors might be able to open the file while others wouldn’t. What have you tried to use to edit it so far?
posted by fabius at 6:46 AM on March 24


Best answer: You can use "split" to split your file into small chunks. If you have a Mac or Linux, you should already have it - you just have to open a terminal. If you have Windows, you can install Git Bash first, and then use that as your terminal.

At the command line in the terminal, navigate to wherever you have your large file and just type "split your-file-name.html". It'll make 1000-line chunks by default. You can use different options on the command line to specify a suffix for all the new files that it will create if you want them all to be called whatever.html, or to make the chunks bigger, etc.
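
As a rough sketch of what that looks like in practice: the file name bookmarks.html and the prefix bookmarks_part_ are made up, and the seq line only builds a stand-in file to practise on. The last argument to split sets the prefix of the output files. GNU split (what you get on Linux and in Git Bash) also has an --additional-suffix option, but the split that ships with macOS may not, so the loop at the end adds the .html extension by hand.

```shell
# Work in a scratch directory so nothing real gets overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the real export: a 2500-line file (hypothetical name).
seq 1 2500 > bookmarks.html

# Default behaviour: 1000-line chunks named xaa, xab, xac, ...
split bookmarks.html

# Same chunk size, but with a custom prefix so the pieces are easy to spot.
split -l 1000 bookmarks.html bookmarks_part_

# Give each piece an .html extension so an importer recognises it.
for f in bookmarks_part_*; do
  mv "$f" "$f.html"
done

ls
```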

Then you'd probably want some kind of batch upload of all your little html files into raindrop.io.
posted by rd45 at 8:36 AM on March 24


Response by poster: I am on a Mac, but have access to Windows if there is something easier. I have tried the split function in terminal... it works - I end up with a bunch of files - but I can't figure out how to reduce the overall number of output files... I think I have followed the instructions listed on that 'options' link. Raindrop.io can import files up to 50 MB, so I was actually quite close to the limit - if I could break my file into only three or four pieces it would be perfect.

I always end up with "filename_aa - filename_qw". To import this will take some time, though if this is the only option I can manage this now.
posted by infinite intimation at 9:33 AM on March 24


13000 bookmarks is not that much, I'm around half that and my Firefox exported HTML bookmark file is less than 3 MB. Something seems off here.

What's the source of that HTML file? Asking because recent Firefox versions by default include the favicon in the HTML export, greatly bloating its size.

If it's a Netscape style bookmark file (like Firefox and Shaarli use) I'd create a temporary fresh Firefox profile, symlink its profile folder to a ramdisk, import your 56 MB file (might take a few minutes), close Firefox, remove the favicons.sqlite from the profile directory as described in the link above, then re-open Firefox and re-export the 13000 bookmarks.

(The ramdisk step is highly recommended if not essential btw, Firefox fsyncs each bookmark operation and having the places.sqlite database on a physical disk will make the importing process take what feels like an order of magnitude longer.)
posted by Bangaioh at 10:59 AM on March 24


Actually ignore the above advice, current Firefox removes all annotations from bookmarks, which you probably don't want to do.

Unless you're 100% sure you don't have any important data on that field, instead of an updated Firefox use one of the legacy versions that still retain the bookmark annotations when importing/exporting, while turning off internet connectivity to avoid Firefox auto-updating to a bad version during the process.
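
If the favicons do turn out to be the bulk of the file, another route that skips Firefox entirely might be to delete the embedded icons with sed, which leaves everything else in the file untouched. This is only a sketch: it assumes the icons are stored as ICON="data:..." attributes on each link, which is how Firefox exports usually look, and the file names and sample line are placeholders. Work on a copy and keep the original file around.

```shell
# Work in a scratch directory so nothing real gets overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for two lines of a real export (placeholder data, not a real icon).
cat > bookmarks.html <<'EOF'
<DT><A HREF="https://example.com/" ADD_DATE="1616563140" ICON_URI="https://example.com/favicon.ico" ICON="data:image/png;base64,AAAA">Example</A>
<DD>A note attached to the bookmark
EOF

# How many lines carry an embedded icon?
grep -c 'ICON="data:' bookmarks.html

# Write a copy without the embedded icons; other attributes and notes stay.
sed -E 's/ ICON="[^"]*"//g' bookmarks.html > bookmarks_noicons.html

# Compare sizes before and after.
wc -c bookmarks.html bookmarks_noicons.html
```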
posted by Bangaioh at 11:47 AM on March 24


Response by poster: It’s actually 13 000 pages of bookmarks (when viewing as a webpage) that I’m trying to shrink or split (confusing because I mentioned Raindrop.io working well with the 13000 bookmarks imported thus far). It was an export from a time when I imported into a browser from a large number of past backups, then saved the results.
posted by infinite intimation at 11:52 AM on March 24 [1 favorite]


Best answer: Use “split -l 10000 filename.html” to make 10,000-line chunks. Or substitute with any other integer. That’s a lower-case letter L.

To count lines in the original file, use “wc -l filename.html”. Makes it very easy to pick a suitable size for the split. Again, it’s a lower-case L.
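
For example, to aim for a set number of pieces rather than guessing a line count, something along these lines should work. The names bm.html and bm_part_ and the target of 4 pieces are placeholders, and the seq line just fakes a file to test with.

```shell
# Work in a scratch directory so nothing real gets overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the real export: a 10001-line file (hypothetical name).
seq 1 10001 > bm.html

# Count the lines; tr strips the padding that macOS wc adds.
total=$(wc -l < bm.html | tr -d ' ')

# Number of pieces wanted (an assumption; pick whatever suits the import limit).
parts=4

# Round up so the file never spills into an extra piece.
lines_per_part=$(( (total + parts - 1) / parts ))
echo "$total lines, $lines_per_part per piece"

split -l "$lines_per_part" bm.html bm_part_
ls -l bm_part_*
```

Splitting by line count cuts wherever the count lands, so only the first piece keeps the header lines of the original file. Whether an importer accepts the other pieces as they are is worth checking on one piece first.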
posted by rd45 at 2:20 PM on March 24 [1 favorite]


Response by poster: Thank you all so much! Brilliant!
I had been putting the "-l#" on the other side of the filename - so it was spitting out a huge number of files named -l 10000 lol

$ split -l 100000 bm.html
This gave me 5 files of about 15-19 MB each.
Again, thank you all, and thank you rd45, for spelling it out - y'all have both fed me, and also taught me some tips which will facilitate fishing in the future.
posted by infinite intimation at 3:05 AM on March 25

