Scraping years of photos & comments from a Facebook group
May 13, 2020 7:48 PM

I'm in a private group on Facebook, about a unique regional industry that's now almost gone, and is poorly documented. A vast number of photographs and comments have been posted to this group over several years. The historical record they represent is irreplaceable, and the people posting are getting old and passing away. I don't trust Facebook to preserve all this for posterity. How can I retrieve it?

I want to extract all the posts and comments, including photos, and maintain the relationships between them. Very often, someone posts a photo with no caption and then a whole conversation happens in the comments which reveals what it's of, where it was taken, and who's in it. That information is invaluable.
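
As a rough sketch of what "maintaining the relationships" could mean in practice, the archive might store each post with its photo files and its comments nested underneath it. Everything here is an assumption for illustration: the `Post` and `Comment` classes, their field names, and the `archive_to_json` helper are hypothetical and not taken from any existing tool.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class Comment:
    comment_id: str
    author: str
    text: str
    # Set when this comment is a reply to another comment (hypothetical field).
    parent_id: Optional[str] = None


@dataclass
class Post:
    post_id: str
    author: str
    text: str = ""  # often empty: many photos are posted with no caption
    photo_files: List[str] = field(default_factory=list)
    comments: List[Comment] = field(default_factory=list)


def archive_to_json(posts):
    """Serialise posts, keeping each post's photos and comments nested inside it."""
    return json.dumps([asdict(p) for p in posts], indent=2, ensure_ascii=False)


# Made-up example data: an uncaptioned photo identified in the comments.
example = Post(post_id="p1", author="Example Member",
               photo_files=["photos/p1_1.jpg"])
example.comments.append(Comment("c1", "Another Member", "Taken at the north quay."))
example.comments.append(Comment("c2", "Example Member", "Yes, around 1970.",
                                parent_id="c1"))

# Round-trip through JSON to show the links survive serialisation.
restored = json.loads(archive_to_json([example]))
```

Keeping comments inside their post, and recording `parent_id` for replies, means the conversation that identifies a photo stays attached to that photo even after it leaves Facebook.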

I'm aware of the copyright issues, but I believe that maintaining an offline backup of this for posterity is morally justified. I would love to get everyone to move to a different platform and repost everything, but there are 1500+ members, many of them very old, and some are barely coping with Facebook.

I'm aware that there is no official way to get all this from Facebook, and that the answers will have to involve scraping it from a browser or emulating one.

I am a very experienced programmer but I don't work on modern web stuff at all. Figuring out how to write a suitable scraper from scratch is probably technically within my ability but would take me a long time and a lot of trial and error. I'm sure other people must have implemented this or are better placed than me to do so.

I found a couple of commercial options:
Bino Posts Scraper & Publisher
Personal Groupware

It's not clear to me if either will do what I need. The first seems to be focused on businesses wanting to scrape posts from one group/page to post on another, and doesn't look like it captures comments. The second looks more promising but doesn't seem to have been updated since 2016, and a forum post says it had stopped working in 2019.

Does anyone have experience with either? Are you aware of any other options? Do you know of anyone who's done this and has their own code to do it?

I am happy to pay for a working solution, or for help from someone who has successfully done this, or something similar, before.

I realise that this is something people may not want to publicise their tools for, and am prepared to be discreet. MeMail me if you prefer.
posted by automatronic to Computers & Internet (4 answers total) 7 users marked this as a favorite
I would ask over at /r/datahoarder.
posted by fellion at 11:36 AM on May 14, 2020 [1 favorite]

Facebook just introduced a backup to Google Photos through the Data Transfer Project.
It will only back up the pictures, not the comments.

The other option would be to download all the data from Facebook and then use your programming skills to scrape it on your desktop rather than having to do it online.
posted by radsqd at 1:29 PM on May 14, 2020

Response by poster: The official processes from Facebook don't help. You can only download your own data, not what other people have posted.

This open source Ultimate Facebook Scraper project is looking promising, but it looks like it doesn't handle comments yet.

It looks like this question has been asked several times on /r/datahoarder with no useful replies.
posted by automatronic at 3:24 AM on May 15, 2020

I have this post about scraping FB for posts and comments using Beautiful Soup and Python saved, but I haven't tried it yet. Might be a direction worth investigating? It's not clear that it automatically saves photos, but it could probably be modified to do so.
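
For a sense of what that kind of parsing involves, here is a minimal sketch that pulls posts, photo URLs, and comments out of a saved HTML page. It is not the code from the linked post: it swaps in Python's standard-library `html.parser` for Beautiful Soup so that it runs with no extra installs, and the `post` and `comment` class names are made-up placeholders. Facebook's real markup is different and changes often, so the matching rules would need to be worked out against an actual saved page.

```python
from html.parser import HTMLParser


class ThreadParser(HTMLParser):
    """Collects posts with their photo URLs and comments from saved HTML.

    Assumes hypothetical markup: <div class="post"> wrapping <img> tags
    and <div class="comment"> elements.
    """

    def __init__(self):
        super().__init__()
        self.posts = []    # one dict per post, in document order
        self._stack = []   # class of each currently open <div>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            cls = attrs.get("class") or ""
            if cls == "post":
                self.posts.append({"text": "", "photos": [], "comments": []})
            elif cls == "comment":
                if "post" in self._stack:
                    self.posts[-1]["comments"].append("")
                else:
                    cls = ""  # ignore comments that are not inside a post
            self._stack.append(cls)
        elif tag == "img" and "post" in self._stack and attrs.get("src"):
            # Photo URLs are only recorded here; downloading them is a separate step.
            self.posts[-1]["photos"].append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == "div" and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or "post" not in self._stack:
            return
        post = self.posts[-1]
        if "comment" in self._stack and post["comments"]:
            post["comments"][-1] = (post["comments"][-1] + " " + text).strip()
        else:
            post["text"] = (post["text"] + " " + text).strip()


def parse_saved_page(html_text):
    """Return a list of post dicts parsed from one saved page."""
    parser = ThreadParser()
    parser.feed(html_text)
    parser.close()
    return parser.posts


# Made-up sample page using the placeholder class names.
SAMPLE = """
<div class="post">
  <img src="photos/0001.jpg">
  <div class="comment">That's the old yard, about 1962.</div>
  <div class="comment">My father is second from the left.</div>
</div>
"""

posts = parse_saved_page(SAMPLE)
```

Each entry in `posts` keeps a photo's URL alongside the comments made on it, which is the link the question asks to preserve.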
posted by Pandora Kouti at 4:26 AM on August 15, 2020
