Statistical Analysis of the LiveJournal Network

During the summer of 2003, while looking for a project at the Santa Fe Institute Complex Systems Summer School, I began casting about for large social networks to analyze. Most of the important work on the common statistical structural properties of social networks had already been done, but little of it had been done on enormous networks, so I went looking specifically for those.

Brad Fitzgerald, Pete Krawczyk, and Evan Martine at LiveJournal very kindly offered me an anonymized snapshot of the LJ network from that summer. Over the next few months, I ran the statistical methods described in Mark Newman's review article "The structure and function of complex networks". My results follow.

 total number of people (N)                       798,017
 total number of relationships (E)                6,410,380
 average number of contacts (mean degree)         16.0654 +/- 34.8178
 highest number of contacts (max degree)          9832 contacts
 cluster coefficient (1)                          0.218334
 cluster coefficient (2)                          0.308404
 average hand-shake separation (harmonic mean)    5.2113
 largest hand-shake separation (diameter)         15
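To make the degree and clustering rows of the table concrete, here is a minimal sketch on a toy graph. It assumes that "cluster coefficient (1)" is the transitivity ratio (3 x triangles / connected triples) and "cluster coefficient (2)" is the mean of the per-vertex local clustering, which is the convention in Newman's review; the original analysis code is not shown here, so the function names, the toy graph, and the choice of population standard deviation are illustrative assumptions only.

```python
from itertools import combinations
from statistics import mean, pstdev


def degree_stats(adj):
    """Return (mean degree, population std. dev., max degree)."""
    degrees = [len(nbrs) for nbrs in adj.values()]
    return mean(degrees), pstdev(degrees), max(degrees)


def clustering_coefficients(adj):
    """Return (C1, C2) for an undirected graph given as {vertex: set of neighbours}.

    C1: closed triples / all connected triples (transitivity).
    C2: average of local clustering; vertices of degree 0 or 1 count as 0.
    """
    closed = 0   # neighbour pairs that are themselves linked (3 per triangle overall)
    triples = 0  # all neighbour pairs, i.e. connected triples centred on a vertex
    local = []
    for v, nbrs in adj.items():
        k = len(nbrs)
        pairs = k * (k - 1) // 2
        linked = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        closed += linked
        triples += pairs
        local.append(linked / pairs if pairs else 0.0)
    c1 = closed / triples if triples else 0.0
    c2 = mean(local)
    return c1, c2


# Hypothetical toy graph: a triangle a-b-c with a pendant vertex d attached to c.
toy = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

if __name__ == "__main__":
    print(degree_stats(toy))
    print(clustering_coefficients(toy))
```

On a graph the size of the LJ snapshot, the pairwise neighbour check becomes expensive for very high-degree vertices, so a practical run would likely need a faster triangle-counting method or sampling.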

To simplify the network analysis, all relationships were assumed to be symmetric (i.e., the graph was converted into an undirected version of the original data). Briefly, these results line up nicely with the standard ones for social networks. Specifically, there is a very high degree of clustering (a.k.a. network transitivity), representing the presence of many more triangles of friends than one would expect in an Erdős-Rényi random graph. We also see a relatively small diameter for the size of the network (15) and a mean degree of separation that lines up with the seminal Stanley Milgram study of the 1960s and the more recent study by Duncan Watts.

Of particular note, the degree distribution bears a strong resemblance to a power law for the higher degrees. This is somewhat surprising for a purely social network: in a physical social network, relationships require some degree of maintenance, so one would not expect to see individuals with nearly 10,000 relationships. In electronic social networks, however, no such maintenance is required; once established, a friendship can persist forever.
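The symmetrization step and the two separation measures can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the numbers above: the function names and the toy edge list are hypothetical, self-loops are dropped, and the harmonic mean is taken over all ordered pairs of distinct vertices, with unreachable pairs contributing zero to the sum of inverse distances.

```python
from collections import deque


def symmetrize(directed_edges):
    """Turn a directed edge list into an undirected adjacency map."""
    adj = {}
    for u, v in directed_edges:
        if u == v:
            continue  # assumption: self-loops are ignored
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Hop counts from source to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def separation_stats(adj):
    """Return (harmonic mean distance, diameter) using one BFS per vertex."""
    n = len(adj)
    inverse_sum = 0.0
    diameter = 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                inverse_sum += 1.0 / d
                diameter = max(diameter, d)
    ordered_pairs = n * (n - 1)
    harmonic = ordered_pairs / inverse_sum if inverse_sum else float("inf")
    return harmonic, diameter


# Hypothetical directed "friend" declarations; a<->b is mutual, the rest one-way.
declared = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "d")]
graph = symmetrize(declared)

if __name__ == "__main__":
    print(separation_stats(graph))
```

Running one breadth-first search per vertex costs on the order of N x E operations, so for a network with hundreds of thousands of vertices one would probably estimate the mean from a random sample of source vertices instead.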

Moving forward on this work, I'm collaborating with Mark Newman to improve his community structure inference algorithm so that it scales up to enormous networks like this one. One interesting idea to explore with the LJ data is how well individual community declarations (LiveJournal supports special nodes called 'communities' which members may join) reflect the actual community structure discovered via the betweenness-centrality (or modularity) metric. It seems likely that the interests found in a person's local social network are representative of that person's interests as well.
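One way to score how well a set of declared communities matches the link structure is to compute the modularity Q of that assignment: the fraction of edges falling inside groups minus the fraction expected if edges were placed at random with the same degrees. The sketch below is only an illustration. It assumes each person is assigned to exactly one group, which is a simplification since LiveJournal members may join several communities, and the function name and toy data are hypothetical.

```python
def modularity(adj, community_of):
    """Modularity Q of a partition of an undirected graph.

    adj: {vertex: set of neighbours}
    community_of: {vertex: community label}, one label per vertex (assumption)
    """
    two_m = sum(len(nbrs) for nbrs in adj.values())  # twice the edge count
    internal = {}    # per community: internal edge ends (each edge counted twice)
    degree_sum = {}  # per community: total degree of its members
    for v, nbrs in adj.items():
        c = community_of[v]
        degree_sum[c] = degree_sum.get(c, 0) + len(nbrs)
        same = sum(1 for w in nbrs if community_of[w] == c)
        internal[c] = internal.get(c, 0) + same
    return sum(
        internal[c] / two_m - (degree_sum[c] / two_m) ** 2
        for c in degree_sum
    )


# Hypothetical toy graph: two triangles joined by the single edge c-d.
two_triangles = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e", "f"},
    "e": {"d", "f"},
    "f": {"d", "e"},
}
declared_groups = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}

if __name__ == "__main__":
    print(modularity(two_triangles, declared_groups))
```

A value of Q near zero would suggest the declared groups are no better aligned with the friendship links than a random assignment, while larger values indicate that declared communities and link structure agree.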