Tools to explore social graphs, language, and networks - for a layperson
March 17, 2017 7:58 AM

I'm really interested in analysing social networks, social graphs, and the use of language and behaviour within them. I am not a mathematician or an academic. Where should I start, in terms of software or toolkits? (I have a reasonable background in software development and nerdy things, but this is a slightly out-there area for me, hence my framing as a layperson; maybe "outsider academic" is better, or at least less bad?)

In the past I've had occasion to dabble with Gephi, but anything written in Java makes me wince - sorry to say it, but here we are, I just said it. I'm flexible with Python and comfortable in the wild lands of JavaScript, but particularly love things like Max/MSP where I can click and probe and explore things as part of programmatic discovery (and also embed external systems like JS and Python, so that's a winner for me).

Having hunted through previous ask.mefi questions in this area (and wanting to avoid Excel even more than I want to avoid Java), I'm coming round to the idea that I might need to approach this with Python (and py/pyext for Max is sharpened and at the ready). However, before I ride my unproven steed into a dead end, I'm wondering what tools people already working in this area are using. I'm particularly keen to avoid writing tiresome scripts to scrape and download data, which is where I'm hoping Python's extant Twitter and Reddit modules will be of use (and another blow to Java integration).

I also looked at the preview of the Diestel book and frankly was terrified. Is that the level of theory and understanding I have to be at to start in this space, or is that just an in-joke to put people off buying the real thing?

To muddy the waters further: what sane tools are people using to store and query this kind of data, if I've not been overly vague about defining 'this' and 'data'?

Thanks in advance to any clever people years ahead of me in this area.

question doesn't fit easily in any single category!
posted by davemee to Society & Culture (13 answers total) 11 users marked this as a favorite
 
Response by poster: Maybe a little more context I'm missing would help - I'm really interested in looking at the intersection of communities around areas of twitter, facebook and reddit, as well as their correspondence with particular ideologies, behaviours, tactics, and language use.
posted by davemee at 8:02 AM on March 17, 2017


I am a scientist and I do some coding, but the only network analysis I've been involved with had someone else leading on the coding, and she used R. My advice is to follow the packages. Python and R would both be fine starting points. R seems to be more widely used in my field.

What scientists do is ask a question or find a problem then select the best tool for that task. At least in theory. But if you just want to poke around for fun, use what looks fun, otherwise why bother?

Simple googling of R/Python network analysis reveals several promising tools/frameworks.

Also recall that before it was hip and cool to apply it to teh webz and tweets, this stuff was called graph theory, and you may find additional tools by searching on that.
posted by SaltySalticid at 8:35 AM on March 17, 2017


Response by poster: Thanks, SaltySalticid. The 'graph theory' nomenclature came up in my ask.mefi research, and that's where I ended up curled up in a ball, courtesy of That Diestel Book.

The phrase 'network analysis' is one I'd not thought of, fearing I'd end up in a world of Cisco.

R has been floating around my awareness for some time but it felt like opening a door to not only another language but another world of paradigms, so I quickly slammed it shut. Maybe I should look again...
posted by davemee at 9:03 AM on March 17, 2017


If you want to start on the REALLY basic and accessible side, try Kumu.
posted by rachelpapers at 9:22 AM on March 17, 2017


Other terms to search for, besides graph theory, are "social network analysis" or SNA.
posted by rachelpapers at 9:23 AM on March 17, 2017


Take a look at this upcoming Coursera course, Applied Social Network Analysis in Python (despite the language in the course info page you can enroll for free if you don't need a course certificate).
posted by needled at 10:11 AM on March 17, 2017


If you decide you're willing to wade into R, the statnet suite of R packages has tons of functionality for social network analysis. I would say that its documentation is designed for people who already have some background in this area, and it might be difficult to self-teach from. Social Network Analysis: Methods and Applications by Wasserman and Faust is probably the definitive book on methods for social network analysis. It's fairly academic; there are probably friendlier introductions out there.

You might also look into igraph, which I've just learned has not only an R package, but also Python and C libraries.
posted by thrungva at 10:14 AM on March 17, 2017


Re: your update. The funny thing is, the same R packages (or Python, etc.) can be used to analyze networks no matter what they represent! Sexual partners, servers on the internet backbone, knots in my shawl: the math doesn't care what you're thinking about; only the edges and vertices (or nodes and connections, if you prefer) matter. This is the power of abstraction™.

I suppose you do know this, but I think it's worth pointing out for clarity.
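To make the abstraction point concrete, here is a minimal, dependency-free Python sketch (not from the thread; the edge lists and the `build_adjacency` helper are made up for illustration). The same function happily builds a graph from usernames or from server names, because it only ever looks at which pairs are connected. In networkx the equivalent starting point would be `nx.Graph(edges)`.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Turn an undirected edge list into {node: set_of_neighbours}.

    Nodes can be any hashable value (usernames, server names, shawl knots);
    the code only looks at which pairs are joined, never at what a node means.
    """
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return dict(adjacency)


# Hypothetical toy data: two unrelated "networks" that happen to share a shape.
follows = [("alice", "bob"), ("bob", "carol"), ("carol", "alice")]
servers = [("srv-a", "srv-b"), ("srv-b", "srv-c"), ("srv-c", "srv-a")]

for name, edges in [("follows", follows), ("servers", servers)]:
    adj = build_adjacency(edges)
    degrees = sorted(len(neighbours) for neighbours in adj.values())
    print(name, degrees)  # both are triangles, so every node has degree 2
```

Both runs print the same degree list, which is the whole point: once the data is an edge list, the analysis code is indifferent to what the nodes stand for.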

Also, as for what you need to know to start: even a casual inspection of things like the degree distribution, and how it changes over time and across communities, can lead to interesting insights. Sure, there's tons of high-powered machinery and arcane theorems that can be deployed, but a bright high school student can start to play around with this stuff and find interesting things.
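As a rough illustration of the "casual inspection" idea, here is a small stdlib-only Python sketch that computes a degree distribution (how many nodes have 1 connection, 2 connections, and so on) for two snapshots of the same network. The snapshot data and the `degree_distribution` helper are hypothetical; networkx offers a ready-made version as `nx.degree_histogram(G)`, which returns a list indexed by degree rather than a dict.

```python
from collections import Counter


def degree_distribution(edges):
    """Return {degree: number_of_nodes_with_that_degree} for an undirected edge list.

    Assumes each edge appears once; duplicates would be counted twice.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return dict(sorted(Counter(degree.values()).items()))


# Hypothetical snapshots of one small community at two points in time.
march = [("a", "b"), ("a", "c"), ("a", "d")]   # a star centred on "a"
april = march + [("b", "c"), ("c", "d")]       # the periphery starts talking

for label, edges in [("march", march), ("april", april)]:
    dist = degree_distribution(edges)
    # Sanity check (handshake lemma): degrees sum to twice the edge count.
    assert sum(d * n for d, n in dist.items()) == 2 * len(edges)
    print(label, dist)
# march {1: 3, 3: 1}
# april {2: 2, 3: 2}
```

The shift from "one hub plus three leaves" to a flatter distribution is the kind of change over time that can be spotted without any heavy theory.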
posted by SaltySalticid at 10:30 AM on March 17, 2017


Leskovec intended cs224w to be accessible both to extremely technical PhD students and to less technical folks. Uhh, given his background, he sort of failed at the less technical bit, but you can get an idea of what a crazy Stanford professor's idea of "less technical" network analysis looks like.

Don't use his package, go use networkx.

Clicking is great, but not fun past 5,000 nodes or 10,000 edges. Real easy to get to both, though.
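Once a graph is too big to click around, a few summary numbers printed from a script go a long way. The sketch below is an assumption-laden stand-in: it builds a synthetic uniform random graph of the size mentioned (5,000 nodes, 10,000 edges), which is not what real social networks look like, and then reports component sizes via union-find plus basic degree stats. The helper names are hypothetical; in networkx the rough equivalents are `nx.gnm_random_graph(n, m, seed=...)` and `nx.connected_components(G)`.

```python
import random
from collections import Counter


def random_edges(n_nodes, n_edges, seed=0):
    """Sample n_edges distinct undirected edges (no self-loops) uniformly at random.

    Assumes n_edges <= n_nodes * (n_nodes - 1) / 2, otherwise this never finishes.
    """
    rng = random.Random(seed)
    edges = set()
    while len(edges) < n_edges:
        u, v = rng.randrange(n_nodes), rng.randrange(n_nodes)
        if u != v:
            edges.add((min(u, v), max(u, v)))
    return sorted(edges)


def component_sizes(n_nodes, edges):
    """Sizes of connected components, largest first, using union-find."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    sizes = Counter(find(x) for x in range(n_nodes))
    return sorted(sizes.values(), reverse=True)


n, m = 5000, 10000
edges = random_edges(n, m)
sizes = component_sizes(n, edges)
degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

print(f"{n} nodes, {len(edges)} edges")
print(f"{len(sizes)} components; largest has {sizes[0]} nodes")
print(f"max degree {max(degree.values())}, mean degree {2 * len(edges) / n:.1f}")
```

Swapping `random_edges` for a loader that reads your own edge list leaves the rest of the script unchanged.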
posted by hleehowon at 10:36 AM on March 17, 2017


Yes, networkx. Language-wise: if I were doing this from scratch, I'd probably use something like Gensim's doc2vec to get a reasonable data-driven vector representation of each user's language; there are any number of easy ways to get the language similarity between any two user vectors.
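One common way to compare two user vectors is cosine similarity, which scores how closely two vectors point in the same direction. The sketch below assumes you already have one vector per user; the short vectors here are made-up placeholders, not output from a trained doc2vec model (real ones would come from Gensim and be much longer), and cosine similarity is just one reasonable choice among several.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as "no similarity"
    return dot / (norm_a * norm_b)


# Hypothetical 4-dimensional "user vectors", typed in by hand for illustration.
users = {
    "user_a": [0.9, 0.1, 0.0, 0.2],
    "user_b": [0.8, 0.2, 0.1, 0.1],
    "user_c": [0.0, 0.1, 0.9, 0.7],
}

# Score every pair of users.
for (name_1, vec_1), (name_2, vec_2) in combinations(users.items(), 2):
    print(name_1, name_2, round(cosine_similarity(vec_1, vec_2), 3))
```

With these placeholder numbers, user_a and user_b come out far more similar to each other than either is to user_c; with real vectors, pairwise scores like these could be turned into weighted edges of a "talks alike" network.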
posted by supercres at 12:20 PM on March 17, 2017


For teaching beginners, databasic.io's Connect The Dots is a fun introduction to network graph concepts.
posted by zamboni at 12:31 PM on March 17, 2017


Response by poster: Oh god. Thanks everybody. Some really helpful stuff here, can I hand out best answers willy-nilly to everybody?

So, to sum up the journey (with detours) I think I'm going to take:

Making sense of the space - theory and courses
Zamboni suggests Connect The Dots as a gentle introduction to network graph concepts.
Hleehowon points to cs224w, a course by Jure Leskovec (whose name kept popping up in this space) that pretends it isn't for technological overlords.
Needled points to Coursera, which offers the course Applied Social Network Analysis in Python for free if you don't need a certificate. It's module 5 in their 'Applied Data Science with Python' series, and they suggest you're familiar with some of the earlier modules first.

Python toolkits and frameworks
networkx, as suggested by hleehowon and supercres, seems to be the clear winner in this space: it's well documented, reasonably data-neutral, and Python.
igraph, as suggested by thrungva, is available for Python, C, and R, which offers some portability perks for grown-ups who may need such things. (Me, I'm happy I can pip install networkx on OS X without admin rights and make it available to Max/MSP through the py/pyext modules.)

graph-tool popped up too. It doesn't seem as widespread, but it may be of interest to people crunching larger datasets, since its core is implemented natively in C++ (SHOOT ME DOWN IF I'M TALKING unicodepoopemoji) #speculation

Low-commitment starting playgrounds
Rachelpapers points to Kumu as a very-low-barrier-to-entry option, a hosted platform with nothing to install and demo datasets to poke around with immediately.

Language analysis
I nearly wrote "lexical" there and realised I was out of my depth. Supercres suggests Gensim, a Python library for "generating similar(ity)" amongst documents and corpuscles (did I use that right? :). As a Python library, it fits nicely into my intended toolkit, with doc2vec as a layer that generates maths numbers out of text for further analysis (... good, they're using 'bags of words' as a technical term!)
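For a feel of what "bag of words" means, here is a minimal stdlib Python sketch: each text is reduced to word counts, word order is thrown away, and the counts are lined up against a shared vocabulary to give plain number vectors. This is a simpler technique than doc2vec (which learns dense vectors rather than raw counts), offered only as a baseline illustration; the `bag_of_words` helper, the regex tokenizer, and the example posts are all hypothetical.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case text, split it into word tokens, and count each token.

    Word order is discarded, which is what the "bag" in "bag of words" means.
    """
    return Counter(re.findall(r"[a-z']+", text.lower()))


# Order really is thrown away: these two sentences get identical bags.
print(bag_of_words("dog bites man") == bag_of_words("man bites dog"))  # True

# Hypothetical posts, made up for illustration.
post_1 = "The network is the message, and the message is the network."
post_2 = "The message is the network!"
bag_1, bag_2 = bag_of_words(post_1), bag_of_words(post_2)

# Line both bags up against a shared vocabulary to get plain number vectors.
vocab = sorted(set(bag_1) | set(bag_2))
vec_1 = [bag_1[word] for word in vocab]  # Counter returns 0 for unseen words
vec_2 = [bag_2[word] for word in vocab]
print(vocab)  # ['and', 'is', 'message', 'network', 'the']
print(vec_1)  # [1, 2, 2, 2, 4]
print(vec_2)  # [0, 1, 1, 1, 2]
```

Vectors like `vec_1` and `vec_2` are the "maths numbers out of text" that a similarity measure can then compare.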

For those cleverer than I
thrungva and SaltySalticid suggest R as a good platform for this kind of analysis, with statnet a great suite of tools for Social Network Analysis.

Magic Words and incantations
"degree distribution", "Social Network Analysis", "SNA", "Graph Theory". Suppose I should add these as tags for future discoverability.

Thanks everybody - I'm paralysed trying to think of who gets a 'best answer' here. Do I give them to everybody or is my mastication and regurgitation of everybody's great answers here a better gift to give back? It may help others going down the same fraught journey.
posted by davemee at 4:27 AM on March 18, 2017


Diestel's a grad-level text and doesn't pull its punches, FWIW, so it moves quick and gets esoteric. It's a great survey of the principal tools and results of graph theory but it's not really meant to be an introductory text for laypeople.

There seem to be few undergraduate or nonacademic texts that provide a gentle introduction specifically to graph theory (it usually gets folded into a chapter or two of a more general discrete-math-oriented text). Robin Wilson's Introduction to Graph Theory is quite approachable; Doug West's text of the same name is also good.

Admittedly, the basic principles of graph theory are a long way from the modeling and analysis techniques used on social networks, so if you want either a pop-science appreciation of the field or a knowledge of algorithms without too much of the underlying theory, studying introductory graph theory may not get you where you're trying to go.
posted by jackbishop at 6:00 AM on March 18, 2017

