
June 05, 2007

Thoughts on NetSci 2007

Clearly, attending four conferences in four different states over five weeks has been too much for your humble blogger to handle in a timely fashion. This entry covers some of the notes I took while I was in New York City for the International Conference on Network Science (NetSci 2007), held at the New York Hall of Science.

Stefan Bornholdt (Bremen) gave a reasonably interesting talk on computation in molecular networks on Day 1. The basic idea is that while we pretty much understand how the storage (DNA), transmission (reproduction) and alteration (mutation, crossover) of heritable traits work, we have very little idea about how the machinery that interacts with them actually computes (regulates, manages, etc.) the things they code for. Using a very simple model of this computation, and a set of inferred interactions for the genes governing cell division in yeast, he constructed a simple dynamical system and analyzed its behavior. His main result was a claim that the simple dynamical model, combined with the topology of which genes talk to each other, reproduces the actual 13-state sequence for cell division. The one problem with this claim was that the figure he showed of this sequence was a chain ending in a fixed point, rather than a limit cycle. Still, interesting work, and it suggests that combining the right dynamical model with a network topology can lead to interesting dynamical behaviors.
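For the curious, the flavor of this kind of model is easy to sketch. What follows is a toy synchronous Boolean network with completely invented three-gene rules, not Bornholdt's actual yeast model (which has on the order of a dozen genes); the point is just that the dynamics trace out a transient chain of states that ends in a fixed point.

```python
# Toy synchronous Boolean network: each "gene" is on (1) or off (0), and all
# genes update simultaneously according to their rules.

def step(state, rules):
    """Apply every node's update rule simultaneously to the current state."""
    return tuple(rule(state) for rule in rules)

def trajectory(state, rules, max_steps=50):
    """Iterate until a state repeats; return the visited sequence and the
    state at which the dynamics re-entered it."""
    seen, seq = set(), []
    while state not in seen and len(seq) < max_steps:
        seen.add(state)
        seq.append(state)
        state = step(state, rules)
    return seq, state

# Invented rules: gene 0 decays, gene 1 copies gene 0, gene 2 is repressed
# by gene 1.
rules = [
    lambda s: 0,
    lambda s: s[0],
    lambda s: 1 - s[1],
]

seq, reentry = trajectory((1, 0, 0), rules)
# The transient is a chain of states ending in a fixed point (a state that
# maps to itself), not a limit cycle.
print(seq, "->", reentry)
```

Running this from the state (1, 0, 0) gives a four-state chain whose last state maps to itself, i.e., a fixed point, which is exactly the qualitative shape of the figure described above.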

Stephen North (AT&T; also one of the contact people for GraphViz) gave a brief talk about network visualization procedures. I think the consequences of visualization are under-appreciated by the networks community. That is, a "good" visualization should show us things about the data that we didn't know were there (or maybe we did), but it should try hard not to create artifacts of its own. I wasn't surprised to learn recently that the ubiquitous spring-force layout algorithms exhibit strong pathologies: they get stuck in local energy minima, which can give the appearance of structure that may not actually exist. The desirable features he suggested for a layout are purely local -- few edge crossings, and edges that are short and straight -- but for very large networks, I think it would be more informative to get the cluster structure right than to make the inner details of the clusters pretty. I'm not aware of any layout algorithms that do this, but that is at least partially because getting the cluster structure right can require very sophisticated pre-processing.
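To see where the local-minimum pathology comes from, here is a minimal spring-embedder sketch (not GraphViz's actual algorithm): non-adjacent nodes repel, edges act as springs, and we do plain gradient descent on the resulting energy. Because the descent only finds a local energy minimum, different random starting positions can settle into different final configurations.

```python
import math, random

def layout(edges, n, seed, iters=300, k=1.0, lr=0.01):
    """Crude force-directed layout: repulsion between all pairs, spring
    attraction along edges, gradient descent with capped step sizes."""
    rng = random.Random(seed)
    pos = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    for _ in range(iters):
        forces = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                dx, dy = pos[i][0] - pos[j][0], pos[i][1] - pos[j][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d ** 2                     # pairwise repulsion
                if (i, j) in edges or (j, i) in edges:
                    f -= d / k                         # spring attraction
                forces[i][0] += f * dx / d
                forces[i][1] += f * dy / d
        # cap each displacement so a near-collision can't fling nodes away
        new_pos = []
        for (x, y), (fx, fy) in zip(pos, forces):
            m = math.hypot(fx, fy) or 1e-9
            s = min(lr * m, 0.1) / m
            new_pos.append((x + s * fx, y + s * fy))
        pos = new_pos
    return pos

# two triangles joined by a single edge
edges = {(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (0, 3)}
print(layout(edges, 6, seed=0) != layout(edges, 6, seed=1))  # True
```

Even on this tiny six-node example, two random seeds settle into different configurations; on large graphs the same effect can manufacture apparent structure.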

Jennifer Dunne (Santa Fe Institute) opened Day 2 with a discussion of her recent work on understanding ancient foodwebs and their similarity (or dissimilarity) to modern-day foodwebs. Amazingly, experts on these ancient species can reconstruct a large fraction of the corresponding foodwebs from fossils, based on lucky fossilizations (animals caught in the act of feeding, or fossilized stomach contents), morphological considerations (jaw structure, sizes, etc.), damage patterns, etc. Each edge is assigned a confidence value by the expert. Dunne's analysis of these inferred topologies (e.g., from the Cambrian-era Burgess Shale) seems pretty careful, showing that paleo webs and modern webs actually don't seem to differ in the network features that ecologists typically study, although they do differ in their details. Her work is unusual in the networks community for explicitly analyzing the uncertainty of its conclusions in light of the known uncertainties in the topology -- a topic that came up during the panel discussion at the IPAM workshop earlier in the month, and one that I think deserves a lot more attention in the networks world. That is, there's a strong need for better methods for quoting reasonable uncertainties in our conclusions.
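Those expert confidence values suggest one simple way to propagate topological uncertainty into a conclusion (to be clear, this is my sketch of the general idea, not Dunne's actual method, and the little foodweb below is invented): resample the web many times, keeping each candidate edge with probability equal to its confidence, and report the spread of the statistic across resampled webs.

```python
import random, statistics

random.seed(0)

# (edge, confidence) pairs for a hypothetical little foodweb over 4 species
candidate_edges = [(("alga", "trilobite"), 0.9),
                   (("trilobite", "anomalocaris"), 0.7),
                   (("alga", "brachiopod"), 0.95),
                   (("brachiopod", "anomalocaris"), 0.4)]

def mean_degree(edges, n_species=4):
    """Mean degree of the (undirected) web: each edge contributes 2."""
    return 2 * len(edges) / n_species

# Monte Carlo: sample many plausible webs consistent with the confidences.
samples = []
for _ in range(1000):
    web = [e for e, conf in candidate_edges if random.random() < conf]
    samples.append(mean_degree(web))

# The spread quantifies how much the statistic depends on uncertain edges.
print(statistics.mean(samples), statistics.stdev(samples))
```

The same resampling trick works for any network statistic, which is what makes quoting an uncertainty alongside the point estimate cheap to do.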

One of my favorite talks of the week was by James Collins (Boston University) on his work on reverse-engineering gene networks. He used a multiple linear regression model to infer a (weighted?) adjacency matrix from gene-expression data for the E. coli SOS network (a particular set of responses E. coli initiates under high stress), and then matched its predictions of which genes to target, in order to knock out the network's overall response, against in vivo experiments. I asked whether a more complicated model might do even better -- he suggested that the fact that the linear model does so well indicates the system is highly damped, and that more complicated models (e.g., ones with non-linear effects built in) do seem to do slightly better.
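The regression idea is simple enough to sketch. This is much simpler than Collins's actual method, and the 4-gene interaction matrix is invented, but it shows the basic move: assume near-steady-state linear dynamics dx/dt = Ax + u, apply known perturbations u, measure the steady states x, and recover the interaction matrix A by least squares.

```python
import numpy as np

rng = np.random.default_rng(0)

# A hypothetical 4-gene interaction matrix; negative diagonal = degradation.
A_true = np.array([[-1.0,  0.5,  0.0,  0.0],
                   [ 0.0, -1.0,  0.8,  0.0],
                   [ 0.0,  0.0, -1.0, -0.6],
                   [ 0.3,  0.0,  0.0, -1.0]])

U = np.eye(4)                            # perturb one gene at a time
X = np.linalg.solve(A_true, -U)          # steady state: A x + u = 0
X += 0.001 * rng.normal(size=X.shape)    # add measurement noise

# Each experiment i satisfies A x_i = -u_i, i.e. X^T A^T = -U^T,
# so each row of A falls out of an ordinary least-squares fit.
A_hat = np.linalg.lstsq(X.T, -U.T, rcond=None)[0].T
print(np.round(A_hat, 1))                # close to A_true
```

With low noise and a well-damped (stable) system, the recovered matrix is close to the true one; in real expression data the noise is far worse, which is part of why it's notable that the linear fit works at all.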

Another of my favorite talks of the week was by Jon Kleinberg (Cornell) on the difficulties of anonymizing large social-network data sets. He pointed out that many companies have released, and will continue to release, anonymized versions of their social networks (e.g., facebook or LiveJournal), where the anonymization is done simply by assigning each node a random unique label. Jon then showed exactly how a malicious attacker, by introducing a few additional edges or nodes to the network (e.g., by opening a few new email accounts and sending some emails), could specifically de-anonymize a set of edges and nodes in the network. The results he showed relied on some very beautiful and very old results from combinatorics (e.g., some of Paul Erdős's work on Ramsey theory). He then posed an open question: are there good privacy-preserving mechanisms for anonymizing social-network data? He suggested that zero-knowledge proofs or interactive proof schemes might be one way to guarantee anonymity, but at the expense of severely limiting the kinds of analysis that researchers could do with the network data.
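A toy version of this "active" attack makes the mechanism concrete. Kleinberg's construction is subtler (it plants a random subgraph that is unique with high probability and can be found efficiently); here the host network is a triangle-free cycle, so a planted 4-clique is trivially unique and can be found by brute force.

```python
import itertools, random

random.seed(0)
n = 30
edges = {frozenset((i, (i + 1) % n)) for i in range(n)}  # cycle: no triangles

# Attacker: create 4 new accounts, link them into a clique, and link each
# one to a distinct victim before the data set is anonymized.
targets = [2, 7, 15, 21]
attackers = [n + i for i in range(4)]
for a, b in itertools.combinations(attackers, 2):
    edges.add(frozenset((a, b)))                          # planted clique
for a, t in zip(attackers, targets):
    edges.add(frozenset((a, t)))                          # edges to victims

# "Anonymize": assign every node a random unique label.
labels = list(range(n + 4))
perm = labels[:]
random.shuffle(perm)
relabel = dict(zip(labels, perm))
anon = {frozenset(relabel[u] for u in e) for e in edges}

# Attack: the 4-clique is unique, so finding it re-identifies the attacker
# nodes, and their neighbors outside the clique are the de-anonymized victims.
nodes = sorted({u for e in anon for u in e})
clique = next(set(c) for c in itertools.combinations(nodes, 4)
              if all(frozenset(p) in anon for p in itertools.combinations(c, 2)))
found = {u for e in anon if len(e & clique) == 1 for u in e} - clique
print(found == {relabel[t] for t in targets})  # True
```

Once the victims' labels are known, every edge incident to them is de-anonymized too, which is why random relabeling alone buys so little privacy.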

Lise Getoor (U. Maryland) gave a short talk on her work on entity resolution for citation and coauthor networks (code available here; x86 chips only). Although she didn't speak directly about the effect of this kind of aliasing on the topological patterns that people are often interested in, the implication was clear that the topology (and thus both small-scale and large-scale patterns) can change quite dramatically when you appropriately clean the data. I'm curious to see how basic network statistics, like the degree distribution, clustering coefficient and geodesic-length distribution, along with larger-scale features like the community structure, change when these aliases are resolved. (This would also give us some insight into just how much uncertainty we should include in our conclusions when we don't do this kind of alias resolution.)
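A tiny invented example shows how much aliasing can distort even the most basic statistics: below, one author appears under two name variants, and merging the two nodes changes the degree distribution and creates a triangle where there was none.

```python
from collections import Counter

def degree_dist(adj):
    """Histogram of node degrees."""
    return Counter(len(nbrs) for nbrs in adj.values())

def clustering(adj, v):
    """Local clustering coefficient of node v."""
    nbrs = adj[v]
    if len(nbrs) < 2:
        return 0.0
    links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
    return 2 * links / (len(nbrs) * (len(nbrs) - 1))

def merge(adj, u, v):
    """Collapse alias v into u, rewiring v's edges to u."""
    new = {w: {u if x == v else x for x in nbrs} - {w}   # drop self-loops
           for w, nbrs in adj.items() if w != v}
    new[u] = (adj[u] | adj[v]) - {u, v}
    return new

# Invented coauthor network with one aliased author.
adj = {
    "A. Clauset":    {"Newman"},
    "Aaron Clauset": {"Moore"},
    "Newman":        {"A. Clauset", "Moore"},
    "Moore":         {"Aaron Clauset", "Newman"},
}
merged = merge(adj, "A. Clauset", "Aaron Clauset")
print(degree_dist(adj), "->", degree_dist(merged))
print(clustering(merged, "A. Clauset"))
```

Before the merge the aliased author looks like two degree-1 nodes with zero clustering; afterward there is a single degree-2 node sitting in a closed triangle. Scaled up to a whole citation database, that kind of shift is exactly why cleaned and uncleaned topologies can tell different stories.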

posted June 5, 2007 02:38 PM in Networks | permalink

Comments

Thanks for the summaries!

BTW, since you were there... Any info about proceedings? I wasn't there, and have been checking the "Proceedings" link of the conf. webpage in vain. Were there written proceedings at all? Some CD in the participants' bags? Anything beyond the abstracts? I asked via the contact email, but no reply so far. Thanks for any hint!

Posted by: dileffante at June 14, 2007 04:17 PM

I don't know anything about the proceedings, unfortunately.

Posted by: Aaron at June 15, 2007 04:39 PM