May 01, 2008
Hierarchical structure of networks
Many scientists believe that complex networks, like those we use to describe the interactions of genes, social relationships, and food webs, have a modular structure, in which genes or people or critters tend to cluster into densely interacting groups that are only loosely connected to each other. This idea is appealing since it agrees with a lot of our everyday experience or beliefs about how the world works. But, within those groups, are interactions uniformly random? Some folks believe that these modules themselves can be decomposed into sub-modules, and they into sub-sub-modules, etc. Similarly, modules may group together into super-modules, etc. This kind of recursive structure is what I mean by hierarchical group structure. 
There's been a lot of interest among both physicists and biologists in methods for extracting either modular or hierarchical structure in networks. In fact, one of my first papers in grad school was a fast algorithm for clustering nodes in very large networks. Many of the methods for getting at the hierarchical structure of networks are rather ad hoc, with the hierarchy produced being largely a byproduct of the particular behavior of the algorithm, rather than something inherent to the network itself. What was missing was a direct model of hierarchy.
Many of you will know (perhaps from here or here), that I've done work in this area with Cris Moore and Mark Newman, and that I care a lot about null models and making appropriate inferences from data. Our first paper on hierarchy is on the arxiv; in it, we showed some fancy things you could do with a model of hierarchy, such as assign connections a "surprisingness" value based on how unlikely they were under our model. Our second paper, in which we show that hierarchy is a very good predictor of missing connections in networks appeared today in Nature. [2,3] There's also a very nice accompanying News & Views piece by Sid Redner. Accurately predicting missing connections has many applications, including the obvious one for homeland security, but also for laboratory or field scientists who construct networks laboriously, testing or looking for one or a few edges at a time.
Another nice thing that came out of this work is that the hierarchy we extract from real networks seems to be extremely good at simultaneously reproducing many other commonly measured statistical features of networks, including things like a right-skewed degree distribution, high (or low) clustering coefficients, etc. In some sense, this suggests that hierarchy may be a fundamental principle of organization for these networks. That is, it may turn out that different kinds of hierarchies of modules is partly what causes real-world networks to look the way they do. General principles like this are wonderful (but not easy) to find, as they suggest we're on the right track to boiling a complex system down to its fundamental parts.
Of course, there are several important missing pieces from this picture, one of which is that real networks are often functional, while the hierarchical model may not completely circumscribe the networks that accomplish the necessary functions for the biological or social context they exist in. In that sense, we still have a long way to go before we understand why things like genetic regulatory networks are shaped the way they are, but hierarchy at least gives us a reasonable way to think about the large-scale organization of these fantastically complex systems.
Update 5 May 2008: Coverage of our results have appeared on Roland Piquepaille's Technology Trends, and also on Slashdot. Now I can live my days out in peace knowing that something I did made it on /. ...
 Hierarchical group structure is different from a hierarchy on the nodes themselves, which is more like a military hierarchy or an org-chart, where control or information flows from individuals higher in the hierarchy to other individuals lower in the hierarchy. For gene networks, there is probably some of both kinds of hierarchy, as there are certainly genes that control the behavior of large numbers of other genes. For instance, see
G. Halder, P. Callaerts and W.J Gehring. "Induction of ectopic eyes by targeted expression of the eyeless gene in Drosophila". Science 267, 1788–1792 (1995).
 "Hierarchical structure and the prediction of missing links in networks." A. Clauset, C. Moore and M. E. J. Newman. Nature 453, 98 - 101 (2008).
The code for fitting the model to network data (C++), for predicting missing connections in networks (C++), and for visualizing the inferred hierarchical structure (Matlab) is available on my website.
 It's especially nice to have this paper in print now as it was the last remaining unpublished chapter of my dissertation. Time for new projects!
posted May 1, 2008 08:58 AM in Networks | permalink
Aaron, it is really nice to see the interesting application about the hierarchical structures. As you mentioned, the key missing piece here is the "functional brick". what is the really mechanism behind all of those fantastically statistical features of complex networks. Does the functional requirments on individual node constrain the hierarchical structures (like motif, modular, fractal(Nature 433, 392), fractal boundary(arXiv/math-ph/08041968)) formations? Or vice verse?
Posted by: Feng at May 1, 2008 03:49 PM
This work appears very innovative and fundamentally exciting. Congratulations to you and your co-authors.
I have these questions:
- Do you see this same approach as applicable to semantic Web or social networks?
- How might the missing node analysis be applied to checking graph integrity or gaps?, and
- In a graph of knowledge spaces, do you think module identification or similar might help in identifying natural knowledge "domains"?
At any rate, very exciting work. I will be studying your materials further.
Posted by: Mike Bergman at May 3, 2008 01:53 PM
Congrats on the Nature pub + slashdot article. You are officially badass.
Posted by: Virgil at May 4, 2008 02:23 AM
Thanks Virgil. Coming from Mr. Wikiscanner, that mean a lot :)
Posted by: Aaron at May 5, 2008 08:37 AM
Mike, I think the same general approach - that is, one of writing down a generative model of topological structure and then using inference methods to extract information from real data - is certainly applicable to the semantic Web and social networks. The hard part is not the inference part, but rather writing down a useful generative model. Missing node analysis could potentially be used to check the integrity of graphs, although accurately predicting missing nodes is likely a much harder task than accurately predicting missing edges, so maybe it's not that practical. Finally, a lot of people have thought about using graph clustering methods to identify domains of human knowledge in an unsupervised way. These methods are largely what I would call descriptive; a real test of them would be to ask how well they predict how the network of knowledge changes over time.
Posted by: Aaron at May 5, 2008 08:42 AM