I lik ur profile
January 25, 2013 6:25 PM

How do I get a fair-sized (100+ messages) corpus of first-contact messages on dating sites?

I have a weird text classification idea and I want to test it on real corpora. By "first-contact messages on dating sites", I mean the first message from one person to another on a dating site, preferably a general-audience one. Anonymous, of course, and it's fine if the messages are scrubbed of all identifying details. Googling doesn't bring up any results. Does such a thing exist publicly? What about a simulation of the same?
posted by curuinor to Computers & Internet (9 answers total) 2 users marked this as a favorite

That guy has already made a bunch of profiles and collected data on messaging rates; maybe he would have a corpus of good enough size and let you access it?
posted by Maecenas at 6:42 PM on January 25, 2013 [1 favorite]

If you're a girl, just make an OKCupid profile and give it a week. I hadn't even written anything in mine (just one photo + basic info + answered a bunch of questions) and got an overwhelming number of messages. Not very textually interesting ones though, I'm afraid.
posted by ella wren at 7:00 PM on January 25, 2013 [1 favorite]

Yeah, be female and generic on OKCupid and you'll have a VERY large corpus VERY QUICKLY.

However, the words "hey," "baby," and "wazzup" will each occur more frequently than "the" or "and."

BTW, that guy who made the profiles? Based on info from the OKCupid blog, I'm betting his results for the women are flawed - the women whose outfits/photos displayed cleavage were the "winners," rather than the women who were the most objectively attractive. Interestingly, both the men who got messages showed toothy smiles.

So, perhaps my first sentence should have been "be female and generic and show cleavage," when you think about it.
posted by SMPA at 7:15 PM on January 25, 2013

On behalf of people using these sites for their actual purpose: it'd be great if you could resist the temptation (and the suggestions) to make fake profiles that waste our time and sabotage our efforts to make contact with real people, and instead take the higher path of obtaining the information legitimately, even though the legitimate way is harder to get traction with. (I assume that is really your question; you clearly wouldn't need to ask MetaFilter how to fake profiles :) )

As to how, I'm sorry to say I've got nothing. The OKCupid blog (mentioned above) indicates that they do a lot of their own social analysis, so they do access the material. I'm guessing they keep data access very guarded, but even then some of them are clearly maths geeks and probably share your interest in this sort of stuff. If your idea is something you can share, you might be able to talk to someone there about writing a Perl script (or whatever) for them to run on their data, sharing only the results with you, or about collaborating, either because it's fascinating or because it would make good OKtrends fodder.
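A minimal sketch of that "run it on their data, share only the results" idea, in Python rather than Perl. Everything here is an assumption for illustration: the function name `aggregate_word_counts`, the `min_count` threshold, and the notion that messages arrive as a plain list of strings. It returns only aggregate word counts, and drops rare words on the theory that they are the ones most likely to identify someone.

```python
import re
from collections import Counter


def aggregate_word_counts(messages, min_count=2):
    """Return a word -> count mapping, keeping only words that occur
    at least min_count times across all messages (hypothetical helper)."""
    counts = Counter()
    for text in messages:
        # Lowercase and split into simple word tokens (letters and apostrophes).
        counts.update(re.findall(r"[a-z']+", text.lower()))
    # Only aggregate, reasonably common tokens leave the data holder's machine.
    return {word: n for word, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    sample = ["hey whats up", "Hey, I like your profile", "hey there"]
    print(aggregate_word_counts(sample))
```

The raw messages never leave the site; only the returned dictionary would be shared.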
posted by anonymisc at 7:28 PM on January 25, 2013 [10 favorites]

As others have mentioned, you can easily get 100+ messages from setting up a profile with a picture of an attractive woman. If you'd rather not do that, you could probably get people to give you a random selection of the messages they've received (though some people, like me, will have deleted all the 'hi whats up'-esque messages from their inbox). I'd happily send you a bunch.
posted by littlegreen at 7:46 PM on January 25, 2013

I like littlegreen's idea - If the messages don't need a consistent recipient, then I could offer you a dozen or so sent to me, details stripped. Get a few more people on board and maybe you could get your data direct via Mefi?
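For anyone contributing messages, "details stripped" could look roughly like the sketch below. It is only an assumption about what would need scrubbing: the patterns cover URLs, email addresses and US-style phone numbers, and the `names` argument is a list the contributor supplies by hand. It will miss plenty (addresses, workplaces, usernames), so a manual read-through is still needed.

```python
import re

# Deliberately simple patterns; they are illustrative, not exhaustive.
URL = re.compile(r"https?://\S+")
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
PHONE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")


def scrub(text, names=()):
    """Replace URLs, emails, phone numbers and the given names with placeholders."""
    text = URL.sub("[URL]", text)
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    for name in names:
        # Whole-word, case-insensitive match on each name supplied by the caller.
        text = re.sub(r"\b" + re.escape(name) + r"\b", "[NAME]", text,
                      flags=re.IGNORECASE)
    return text


if __name__ == "__main__":
    print(scrub("Hi Anna, text me at 555-123-4567 or bob@example.com",
                names=["Anna"]))
```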
posted by anonymisc at 8:15 PM on January 25, 2013

It depends on how particular your task is trying to be, but the other problem with fake profiles (after the ethical issue) is that they colour the messages they attract. You can post an attractive woman showing cleavage, but her race and age will presumably have some impact. If you post anything beyond a picture and a few checkboxes, then you will further affect the results. If the profile describes a love of horse riding, George Strait and Jesus, you'll get different messages than if you talk about libraries, Radiohead and the Muppets.

I like littlegreen's idea of MeFites providing examples of messages directly; it seems like the ethical way, and it would provide messages sent to a variety of profiles (even if they may skew more to my second example), which would build a stronger corpus. If our colleagues here have thrown out all of the high-volume approaches ("sup girl") in favour of the high-quality ones, then a profile with a picture and nothing else will at least only waste the time of the high-volume group, and not much time at that.
posted by Homeboy Trouble at 9:02 PM on January 25, 2013

Response by poster: Thanks to the responses and to the mefite who memailed me! I think I have enough.
posted by curuinor at 12:02 AM on January 26, 2013 [1 favorite]

You may be interested in Andrew Fiore's master's thesis. (A link to it is available on his faculty profile page.)
posted by Fuego at 10:12 PM on January 26, 2013
