October 24, 2009

This is the life I've chosen

An oldie, but goodie: John Oliver reporting on how academia really works.

[Video: The Daily Show With Jon Stewart, "Human's Closest Relative" (www.thedailyshow.com)]

If that's not enough hilarity about chimps vs. orangs, or, if you were really intrigued by the arguments in favor of orangs, read this.

Tip to Jake Hofman.

posted October 24, 2009 09:45 AM in Simply Academic | permalink | Comments (0)

January 26, 2009

The right place for science

Dennis Overbye has a very nice little essay in the Science Times this week on the restoration of science to its rightful place in society, and on the common themes that make both science and democracy function. Here's a blurb:

Science is not a monument of received Truth but something that people do to look for truth.

That endeavor, which has transformed the world in the last few centuries, does indeed teach values. Those values, among others, are honesty, doubt, respect for evidence, openness, accountability and tolerance and indeed hunger for opposing points of view. These are the unabashedly pragmatic working principles that guide the buzzing, testing, poking, probing, argumentative, gossiping, gadgety, joking, dreaming and tendentious cloud of activity — the writer and biologist Lewis Thomas once likened it to an anthill — that is slowly and thoroughly penetrating every nook and cranny of the world.

Nobody appeared in a cloud of smoke and taught scientists these virtues. This behavior simply evolved because it worked.

This sounds pretty good, doesn't it? And I think it's basically true (although not necessarily in the way we might naively expect) that aspects of science are pervading almost every part of modern life and thought. One thing that I've found particularly bizarre in recent years is the media's promotion of words like "Why" and "How" to a pole position in their headlines. For instance, Time Magazine now routinely blasts "How such and such happens" across its front page, suggesting that within its pages, definitive answers for the mysteries of life will be revealed. To me, this apes the way scientists often talk, and capitalizes on society's susceptibility to that kind of language. If science weren't such a dominant force in our society, this kind of tactic would surely not sell magazines...

posted January 26, 2009 10:28 PM in Simply Academic | permalink | Comments (0)

October 23, 2007

SFI is hiring

From personal experience, I can attest to the fact that SFI is a great place to work, do science, learn stuff, explore new areas, and otherwise build your career. Start your LaTeX engines! (Deadline is Nov. 15, a scant 3 weeks away!)

Postdoctoral Fellowship Opportunities at the Santa Fe Institute

The Santa Fe Institute (SFI) is selectively seeking applications for Postdoctoral Fellows for appointments beginning Fall 2008.

Fellows are appointed for up to three years during which they pursue research questions of their own design and are encouraged to transcend disciplinary lines. SFI’s unique structure and resources enable Fellows to collaborate with members of the SFI faculty, other Fellows, and researchers from around the world.

As the leader in multidisciplinary research, SFI has no formal programs or departments and we accept applications from any field. Research topics span the full range of natural and social sciences and often make connections with the humanities. Most research at SFI is theoretical and/or computational in nature, although some research includes an empirical component in collaboration with other institutions.

The compensation package includes a competitive salary and excellent health and retirement benefits. As full participants in the SFI community, Fellows are encouraged to invite speakers, organize workshops and working groups and engage in research outside their field. Funds are available to support this full range of research activities. Applications are welcome from candidates in any country. Successful foreign applicants must acquire an acceptable visa (usually a J-1) as a condition of employment. Women and minorities are especially encouraged to apply.

For complete information and application instructions, please follow the link to http://www.santafe.edu/postdocapp08.

The online application process opens October 15, 2007. Application deadline is November 15, 2007.

posted October 23, 2007 06:22 PM in Simply Academic | permalink | Comments (0)

April 15, 2007

A rose is a rose

Warning: Because I'm still recovering from my catastrophic loss last Monday, blogging will be light or ridiculous for a little while longer. So, without further ado...

A few weeks ago, I inadvertently initiated a competition in the comment thread of Scott Aaronson's blog on how to identify physicists. It all started with Scott claiming that he was not a mathematician (as New Scientist claimed he was in an article about D-Wave's press releases about quantum computers). As various people weighed in on Scott's mathematicianness, Dave Bacon finally proposed a surefire way to settle the question:

Place yourself and a large potted plant in a huge room together. If you get tangled up in the plant, you are a mathematician. I draw this test from careful observation of the MSRI in Berkeley.

I then wondered aloud how to identify physicists, and I was returned a laundry list of characteristic behaviors:

  1. Hearing the word “engineering” causes a skin rash. [John Sidles]
  2. Writes “a”, says “b”, means “c”, but it should be “d” [Polya]
  3. Frequently begins sentences with “As a physicist…” (as in “As a physicist, I care about the real world, not the logical consequences of the assumption I just made”) [Scott Aaronson]
  4. When told he is actually a mathematician he thinks: “LOL” and all the mathematician go: “OMFG”. [Peter Sheldrick]
  5. They think that, since walking forwards gets them from their house to work, walking backwards in the opposite direction must have the same outcome. (Re: the replica method) [James]
  6. Is interested in creating just one job. [John Sidles]
  7. Considers chemists to be underqualified physicists, and biologists to be overqualified philatelists. [anonymous]

Amusingly, I know many people (physicists, mostly) who are walking, talking caricatures of these. I also know some excellent people in physics departments who certainly are not, and I'm not sure what they do is "physics". I wonder if they think of themselves as physicists...

I know I promised to keep this ridiculous, but I can hardly help myself. So, if you'll permit me a lengthy navel-gazing digression, there's an interesting question here, which has to do with the labels communities of people choose to adopt, and how they view interlopers. For instance, I have no idea whether to call myself an applied mathematician (maybe not), a physicist (almost certainly not, although most of my publications are in physics journals), a computer scientist (still not quite right even though my doctorate is in CS), or what. Informatician sounds like a career in oratory, no one knows what an "applied computer scientist" is, and none of Complex systemsatist, "compleximagician," or statico-phyico-algorithmo-informa-complexicist have that nifty ring to them. (And, for that matter, neither does plecticist.)

With my recent phase change, when people ask what I do, I've taken to simply saying that I'm a "scientist." But, that just encourages them to ask the obvious follow up: What kind of scientist? In some sense, applied mathematician seems colloquially, kind of, maybe, almost like what I do. But, I'm not sure I could teach in a mathematics department, nor would other applied mathematicians call me one of their own. Obviously, these labels are all artificial, but they do matter for hiring, publishing, and general academic success. The complex systems community hasn't achieved a critical-enough mass to assert its own labels for the people who seem to do that kind of work, so, in the meantime, how should we name the practitioners in this field?

Update, 16 April 2007: One colleague suggests "mathematical scientist" as an appropriate moniker, which I tend to also like. Sadly, I'm not sure other scientists would agree that this is a useful label, nor do I expect to see many Departments of Mathematical Science being created in the near future (and similarly for "computational scientist") ... End Update

posted April 15, 2007 08:30 PM in Humor | permalink | Comments (8)

March 31, 2007

arXiv phase change

The arXiv has been discussing for some time the need to change the way it tags submissions. The principal motivation was that the number of monthly submissions in some subject classes (math and cond-mat, for instance) has been steadily rising over the past few years, and would have likely crossed the break-point of 1000 per month sometime later this year. 1000 is the magic number because the current arXiv tag is formatted as "subject-class/YYMMNNN", which leaves only three digits (NNN) for numbering a month's submissions within a subject class.

The new tagging system, which goes into effect tomorrow (April 1st) for all new submissions, moves to a "YYMM.NNNN" format and drops the subject classification prefix. I rather like this change, since, by decoupling the classification and the tag, it gives arXiv a lot more flexibility to adapt its internal subject classes to scientific trends. This will make it easier to place multi-disciplinary articles (like those I write, most of which end up on the physics arXiv), will (hopefully) make it less confusing for people to find articles, and will (potentially) let the arXiv expand into other scientific domains.
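For anyone who has to handle both kinds of tag, here is a minimal sketch of a parser for the two formats described above. The function name, the regular expressions, and the example tags are my own illustrative assumptions, not anything published by the arXiv; in particular, the subject-class pattern is only a guess at what old-style prefixes look like.

```python
import re

# Old style: "subject-class/YYMMNNN" (three digits per month and subject class).
# The subject-class pattern is an assumption made for this sketch.
OLD_STYLE = re.compile(r"^(?P<subject>[a-z-]+(?:\.[A-Z]{2})?)/(?P<yymm>\d{4})(?P<number>\d{3})$")
# New style: "YYMM.NNNN" (four digits per month, no subject prefix).
NEW_STYLE = re.compile(r"^(?P<yymm>\d{4})\.(?P<number>\d{4})$")


def parse_arxiv_tag(tag):
    """Split a tag into its scheme, subject class (if any), YYMM, and number."""
    for scheme, pattern in (("old", OLD_STYLE), ("new", NEW_STYLE)):
        match = pattern.match(tag)
        if match:
            return {
                "scheme": scheme,
                # New-style tags carry no subject class, so this is None for them.
                "subject": match.groupdict().get("subject"),
                "yymm": match["yymm"],
                "number": int(match["number"]),
            }
    raise ValueError(f"unrecognized arXiv tag: {tag!r}")


# Made-up example tags, one in each format.
print(parse_arxiv_tag("cond-mat/0703123"))
print(parse_arxiv_tag("0704.0001"))
```

The sketch makes the decoupling visible: in the new format the subject class is simply absent from the tag, so it can change without touching the identifier.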

posted March 31, 2007 12:01 AM in Simply Academic | permalink | Comments (0)

March 29, 2007

Nemesis or Archenemy

Via Julianne of Cosmic Variance, the rules of the game for choosing your archnemesis. The rules are so great, I reproduce them here, in full.

1. Your archnemesis cannot be your junior. Someone who is in a weaker position than you is not worthy of being your archnemesis. If you designate someone junior as your archnemesis, you’re abusing your power.

2. You cannot have more than one archnemesis. Most of us have had run-ins with scientific groups who wage continuous war against all outsiders. They take a scorched earth policy to anyone who is not a member of their club. However, while these people are worthy candidates for being your archnemesis, they are not allowed to have that many archnemeses themselves. If you find that many, many people are your archnemeses, then you’re either (1) paranoid; (2) an asshole; or (3) in a subfield that is so poisonous that you should switch topics. If (1) or (2) is the case, tone it down and try to be a bit more gracious.

3. Your archnemesis has to be comparable to you in scientific ability. It is tempting to despise the one or two people in your field who seem to nab all the job offers, grants, and prizes. However, sometimes they do so because they are simply more effective scientists (i.e. more publications, more timely ideas, etc) or lucky (i.e. wound up discovering something unexpected but cool). If you choose one of these people as an archnemesis based on greater success alone, it comes off as sour grapes. Now, if they nabbed all the job offers, grants, and prizes because they stole people’s data, terrorized their juniors, and misrepresented their work, then they are ripe and juicy for picking as your archnemesis. They will make an even more satisfying archnemesis if their sins are not widely known, because you have the future hope of watching their fall from grace (not that this actually happens in most cases, but the possibility is delicious). Likewise, other scientists may be irritating because their work is consistently confusing and misguided. However, they too are not candidates for becoming your archnemesis. You need to take a benevolent view of their struggles, which are greater than your own. [Ed: Upon recovering my composure after reading this last line, I decided it is, indeed, extremely good advice.]

4. Archnemesisness is not necessarily reciprocal. Because of the rules of not picking fights with your juniors, you are not necessarily your archnemesis’s archnemesis. A senior person who has attempted to cut down a grad student or postdoc is worthy of being an archnemesis, but the junior people in that relationship are not worthy of being the archnemesis of the senior person. There’s also the issue that archnemeses are simply more evil than you, so while they’ll work hard to undermine you, you are sufficiently noble and good that you would not actively work to destroy them (though you would smirk if it were to happen).

Now, what does one do with an archnemesis? Nothing. The key to using your archnemesis effectively is to never, ever act as if they’re your archnemesis (except maybe over beers with a few close friends when you need to let off steam). You do not let yourself sink to their level, and take on petty fights. You do not waste time obsessing about them. Instead, you treat them with the same respect that you would any other colleague (though of course never letting them into a position where they could hurt you, like dealing with a cobra). You only should let your archnemesis serve as motivation to keep pursuing excellence (because nothing annoys a good archnemesis like other people’s success) and as a model of how not to act towards others. You’re allowed to take private pleasure in their struggles or downfall, but you must not ever gloat.

While I’m sure the above sounds so thrilling that you want to rush out and get yourself an archnemesis, if one has not been thrust upon you, count your blessings. May your good fortune continue throughout your career.

In the comment thread, bswift points to a 2004 Esquire magazine piece by Chuck Klosterman on the difference between your (arch)nemesis and your archenemy. Again, quoting liberally.

Now, I know that you’re probably asking yourself, How do I know the difference between my nemesis and my archenemy? Here is the short answer: You kind of like your nemesis, despite the fact that you despise him. If your nemesis invited you out for cocktails, you would accept the offer. If he died, you would attend his funeral and—privately—you might shed a tear over his passing. But you would never have drinks with your archenemy, unless you were attempting to spike his gin with hemlock. If you were to perish, your archenemy would dance on your grave, and then he’d burn down your house and molest your children. You hate your archenemy so much that you try to keep your hatred secret, because you don’t want your archenemy to have the satisfaction of being hated.

Naturally I wonder, Do I have an archnemesis, or an archenemy? Over the years, I've certainly had a few adversarial relationships, and many lively sparring matches, with people at least as junior as me, but they've never been driven by the same kind of deep-seated resentment, and general bad behavior, that these two categories seem to require. So, I count myself lucky that in the fictional story of my life, I've had only "benign" professional relationships - that is, the kind disqualified from nemesis status. However, on the (quantum mechanical) chance that my fictional life takes a dramatic turn, and a figure emerges to play the Mr. Burns to my Homer Simpson, the Newman to my Seinfeld, the Dr. Evil to my Austin Powers, I'll keep these rules (and that small dose of hemlock) handy.

Update, March 30, 2007: Over in the comment section, I posed the question of whether Feynman was Gell-Mann's archnemesis, as I suspected. Having recently read biographies of both men (here and here), I found it hard to ignore the subtle (and not-so-subtle) digs that each man made at the other through these stories. A fellow commenter, Elliot, who was at Caltech when Gell-Mann received his Nobel, confirmed that Feynman was indeed Gell-Mann's archnemesis, not for scientific reasons, but for social ones. Looking back over the rules of the game, Feynman does indeed satisfy all the criteria. Cute.

posted March 29, 2007 12:04 AM in Simply Academic | permalink | Comments (0)

February 17, 2007

What makes a good (peer) reviewer?

The peer review process is widely criticized for its failings, and many of them are legitimate complaints. But, to paraphrase Churchill, No one pretends that peer review is perfect or all-wise. In fact, peer review is the worst of all systems, except for all the others. Still, peer review is itself only occasionally studied. So, that makes the work of two medical researchers all the more interesting. Callaham and Tercier studied about 3000 reviews of about 1500 manuscripts by about 300 reviewers over a four-year period, and the corresponding quality scores given these reviews by editors.

Our study confirms that there are no easily identifiable types of formal training or experience that predict reviewer performance. Skill in scientific peer review may be as ill defined and hard to impart as is “common sense.” Without a better understanding of those skills, it seems unlikely journals and editors will be successful in systematically improving their selection of reviewers. This inability to predict performance makes it imperative that all but the smallest journals implement routine review ratings systems to routinely monitor the quality of their reviews (and thus the quality of the science they publish).

The other choice results of their study include a consistent negative correlation between the quality of the review and the number of years of experience. That is, younger reviewers write better reviews. To anyone in academia, this should be a truism for obvious reasons. Ironically, service on an Institutional Review Board (IRB; permission from such a board is required to conduct experiments with human subjects) consistently correlated with lower-quality reviews. The caveat here, of course, is that both these and the other factors were only slightly significant.

I've been reviewing for a variety of journals and conferences (across Computer Science, Physics, Biology and Political Science) for a number of years now, and I still find myself trying to write thoughtful, and sometimes lengthy, reviews. I think this is because I honestly believe in the system of peer review, and always appreciate thoughtful reviews myself. Over the years, I've changed some things about how I review papers. I often start earlier now, write a first draft of the review, and then put it down for several days. This lets my thoughts settle on the important points of the paper, rather than on the details that jump out initially. If the paper is good, I try to make small constructive suggestions. If the paper isn't so good, I try to point out the positive aspects, and couch my criticism on firm scientific grounds. In both, I try to see the large context that the results fit into. For some manuscripts, these things are harder than others, particularly if the work seems to have been done hastily, the methodology suspect or poorly described, the conclusions overly broad, etc. My hope is that, once I have a more time-consuming position, I'll have developed some tricks and habits that let me continue to be thoughtful in my reviews, but able to spend less time doing them.

Callaham and Tercier, "The Relationship of Previous Training and Experience of Journal Peer Reviewers to Subsequent Review Quality." PLoS Medicine 4(1): e40 (2007).

Tip to Ars Technica, which has its own take on the study.

posted February 17, 2007 04:47 PM in Simply Academic | permalink | Comments (0)

January 25, 2007

DIMACS - Complex networks and their applications (Day 3)

The third day of the workshop focused on applications to biochemical networks (no food webs), with a lot of that focus being on the difficulties of taking fuzzy biological data (e.g., gene expression data) and converting it into an accurate and meaningful form for further analysis or for hypothesis testing. Only a few of the talks were theoretical, but this perhaps reflects the current distribution of focus in biology today. After the workshop was done, I wondered just how much information crossed between the various disciplines represented at the workshop - certainly, I came away from it with a few new ideas, and a few new insights from the good talks I attended. And I think that's the sign of a successful workshop.

Complex Networks in Biology

Chris Wiggins (Columbia) delivered a great survey of interesting connections between machine learning and biochemical networks. It's probably fair to say that biologists are interested in constructing an understanding of cellular-level systems that compares favorably to an electrical engineer's understanding of circuits (Pointer: Can a Biologist Fix a Radio?). But, this is hard because living stuff is messy, inconsistent in funny ways, and has a tendency to change while you're studying it. So, it's harder to get a clean view of what's going on under the hood than it was with particle physics. This, of course, is where machine learning is going to save us - ML offers powerful and principled ways to sift through (torture) all this data.

The most interesting part of his talk, I think, was his presentation of NetBoost, a mechanism discriminator that can tell you which (among a specific suite of existing candidates) is the most likely to have generated your observed network data [1]. For instance, was it preferential attachment (PA) or duplication-mutation-complementation (DMC) that produced a given protein-interaction network? (Conclusion: the latter is better supported.) The method basically works by constructing a decision tree that looks at the subgraph decomposition of a network and scores its belief that each of the various mechanisms produced it [2]. With the ongoing proliferation of network mechanisms (theorists really don't have enough to do these days), this kind of approach serves as an excellent way to test a new mechanism against the data it's supposed to be emulating.

One point Chris made that resonated strongly with me - and which Cris and Mark made yesterday - is the problem with what you might call "soft validation" [3]. Typically, a study will cluster or do some other kind of analysis with the data, and then tell a biological story about why these results make sense. On the other hand, forcing the clustering to make testable predictions would be a stronger kind of validation.

Network Inference and Analysis for Systems Biology

Just before lunch, Joel Bader (Johns Hopkins) gave a brief talk about his work on building a good view of the protein-protein interaction network (PPIN). The main problems with this widely studied data are the high error rate, both for false positives (interactions that we think exist, but don't) and false negatives (interactions that we think don't exist, but do). To drive home just how bad the data is, he pointed out that two independent studies of the human PPIN showed just 1% overlap in the sets of "observed" interactions.

He's done a tremendous amount of work on trying to improve the accuracy of our understanding of PPINs, but here he described a recent approach that fits degree-based generative models [4] to the data using our old friend expectation-maximization (EM) [5]. His results suggest that we're seeing about 30-40% of the real edges, but that our false positive rate is about 10-15%. This is a depressing signal-to-noise ratio (roughly 1%), because the number of real interactions is O(n), while the number of candidate pairs, and hence of potential false positives, is O(n^2). Clearly, the biological methods used to infer the interactions need to be improved before we have a clear idea of what this network looks like, but it also suggests that a lot of the previous results on this network are almost surely wrong. Another question is whether it's possible to incorporate these kinds of uncertainties into our analyses of the network structure.
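The scaling argument can be made concrete with a little arithmetic. The sketch below is only an illustration, not Joel's model: the function name, the mean degree, and the per-pair error probability are all hypothetical values I picked to show how the ratio of correctly observed edges to spurious ones shrinks as the network grows.

```python
def expected_signal_to_noise(n, mean_degree, detect_prob, false_pos_prob):
    """Expected ratio of true observed edges to spurious observed edges.

    All four parameters are illustrative assumptions:
      n               number of proteins
      mean_degree     average number of real partners per protein
      detect_prob     chance that a real interaction is observed
      false_pos_prob  chance that a non-interacting pair is reported anyway
    """
    real_edges = n * mean_degree / 2      # grows like O(n)
    all_pairs = n * (n - 1) / 2           # grows like O(n^2)
    non_edges = all_pairs - real_edges
    return (detect_prob * real_edges) / (false_pos_prob * non_edges)


for n in (1000, 2000, 4000):
    ratio = expected_signal_to_noise(
        n, mean_degree=6, detect_prob=0.35, false_pos_prob=0.01
    )
    print(n, round(ratio, 3))
```

Whatever constants you plug in, doubling n roughly halves the ratio, which is the whole problem: even a small per-pair error probability swamps the real edges once the network is large.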

Activating Interaction Networks and the Dynamics of Biological Networks

Meredith Betterton (UC-Boulder) presented some interesting work on signaling and regulatory networks. One of the more surprising tidbits she used in her motivation is the following. In yeast, the mRNA transcription undergoes a consistent 40-minute genome-wide oscillation, but when exposed to an antidepressant (in this case, phenelzine), the period doubles [6]. (The fact that gene expression oscillates like this poses another serious problem for the results of gene expression analysis that doesn't account for such oscillations.)

The point Meredith wanted to drive home, though, was that we shouldn't just think of biochemical networks as static objects - they also represent the form that the cellular dynamics must follow. Using a simple dynamical model of activation and inhibition, she showed that the structures (who points to whom, and whether an edge inhibits or activates its target) of a real-world circadian rhythm network and a real-world membrane-based signal cascade behave basically as you would expect - one oscillates and the other doesn't. But, then she showed that it only takes a relatively small number of flips (activation to inhibition, or vice versa) to dramatically change the steady-state behavior of these cellular circuits. In a sense, this suggests that these circuits are highly adaptable, given a little pressure.
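To see how a single flip can matter, here is a toy sketch. It is emphatically not Meredith's model or her networks: the three-node loop, the tanh response, the gain, and the starting state are all assumptions I made up for illustration. Each node relaxes toward a saturating function of its signed inputs, and the only difference between the two circuits is the sign of one edge.

```python
import numpy as np


def simulate(weights, gain=4.0, dt=0.01, steps=6000, start=(0.5, -0.2, 0.1)):
    """Euler-integrate dx/dt = -x + tanh(gain * W x).

    weights[i][j] = +1 means node j activates node i, -1 means j inhibits i.
    """
    W = np.asarray(weights, dtype=float)
    x = np.array(start, dtype=float)
    trajectory = np.empty((steps, len(x)))
    for t in range(steps):
        x = x + dt * (-x + np.tanh(gain * (W @ x)))
        trajectory[t] = x
    return trajectory


def late_swing(trajectory):
    """Peak-to-peak range of node 0 over the last third of the run."""
    tail = trajectory[-len(trajectory) // 3:, 0]
    return tail.max() - tail.min()


# Loop 0 -> 1 -> 2 -> 0 in which node 2 inhibits node 0 (negative feedback).
negative_loop = [[0, 0, -1], [1, 0, 0], [0, 1, 0]]
# The same loop after flipping that one edge from inhibition to activation.
positive_loop = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]

print("negative loop swing:", late_swing(simulate(negative_loop)))
print("positive loop swing:", late_swing(simulate(positive_loop)))
```

In this toy, the negative-feedback loop keeps swinging long after the transient has died away, while the flipped version settles to a steady state.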

There are several interesting questions that came to mind while she was presenting. For instance, if we believe there are modules within the signaling pathways that accomplish a specific function, how can we identify them? Do sparsely connected dense subgraphs (assortative community structure) map onto these functional modules? What are the good models for understanding these dynamics: systems of differential equations, discrete time and matrix multiplication, or something more akin to a cellular version of Ohm's Law? [7]

-----

[1] M. Middendorf, E. Ziv and C. Wiggins, "Inferring Network Mechanisms: The Drosophila melanogaster Protein Interaction Network." PNAS USA 102 (9), 3192 (2005).

[2] Technically, it's using these subgraphs as generic features and then crunching the feature vectors from examples of each mechanism through a generalized decision tree in order to learn how to discriminate among them. Boosting is used within this process in order to reduce the error rates. The advantage of this approach to model selection and validation, as Chris pointed out, is that it doesn't assume a priori which features (e.g., degree distribution, clustering coefficient, distance distribution, whatever) are interesting, but rather chooses the ones that can actually discriminate between things we believe are different.

[3] Chris called it "biological validation," but the same thing happens in sociology and Internet modeling, too.

[4] I admit that I'm a little skeptical of degree-based models of these networks, since they seem to assume that we're getting the degree distribution roughly right. That assumption is only reasonable if our sampling of the interactions attached to a particular vertex is unbiased, which I'm not sure about.

[5] After some digging, I couldn't find the reference for this work. I did find this one, however, which illustrates a different technique for a related problem. I. Iossifov et al., "Probabilistic inference of molecular networks from noisy data sources." Bioinformatics 20 (8), 1205 (2004).

[6] C. M. Li and R. R. Klevecz, "A rapid genome-scale response of the transcriptional oscillator to perturbation reveals a period-doubling path to phenotypic change." PNAS USA 103 (44), 16254 (2006).

[7] Maribeth Oscamou pointed out to me during the talk that any attempt to construct such rules has to account for processes like the biochemical degradation of the signals. That is, unlike electric circuits, there's no strict conservation of the "charge" carrier.

posted January 25, 2007 01:20 PM in Scientifically Speaking | permalink | Comments (0)

January 24, 2007

DIMACS - Complex networks and their applications (Day 2)

There were several interesting talks today, or rather, I should say that there were several talks today that made me think about things beyond just what the presenters said. Here's a brief recap of the ones that made me think the most, and some commentary about what I thought about. There were other good talks today, too. For instance, I particularly enjoyed Frank McSherry's talk on doing PageRank on his laptop. There was also one talk on power laws and scale-free graphs that stimulated a lot of audience, ah, interaction - it seems that there's a lot of confusion both over what a scale-free graph is (admittedly the term has no consistent definition in the literature, although there have been some recent attempts to clarify it in a principled manner), and how to best show that some data exhibit power-law behavior. Tomorrow's talks will be more about networks in various biological contexts.

Complex Structures in Complex Networks

Mark Newman's (U. Michigan) plenary talk mainly focused on the importance of having good techniques to extract information from networks, and being able to do so without making a lot of assumptions about what the technique is supposed to look for. That is, rather than assume that some particular kind of structure exists and then look for it in our data, why not let the data tell you what kind of interesting structure it has to offer? [1] The tricky thing about this approach to network analysis, though, is working out a method that is flexible enough to find many different kinds of structure, and to present only that which is unusually strong. (Point to ponder: what should we mean by "unusually strong"?) This point was a common theme in a couple of the talks today.

The first example that Mark gave of a technique that has this nice property was a beautiful application of spectral graph theory to the task of finding a partition of the vertices that gives an extremal value of modularity. If we ask for the maximum modularity, this heuristic method [2], using the positive eigenvalues of the resulting solution, gives us a partition with very high modularity. But, using the negative eigenvalues gives a partition that minimizes the modularity. I think we normally think of modules as meaning assortative structures, i.e., sparsely connected dense subgraphs. But, some networks exhibit modules that are approximately bipartite, i.e., they are disassortative, being densely connected sparse subgraphs. Mark's method naturally allows you to look for either.

The second method he presented was a powerful probabilistic model of node clustering that can be appropriately parameterized (fitted to data) via expectation-maximization (EM). This method can be used to accomplish much the same results as the previous spectral method, except that it can look for both assortative and disassortative structure simultaneously in the same network.
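For the curious, here is a minimal sketch of the spectral idea, assuming the standard modularity matrix B = A - k k^T / 2m (A is the adjacency matrix, k the degree vector, m the number of edges). It splits the vertices by the signs of a single eigenvector, which is a simplification of what Mark described and not his code. The function names and the two toy graphs are my own.

```python
from itertools import combinations

import numpy as np


def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix for an undirected graph."""
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    return A


def spectral_split(A, assortative=True):
    """Two-way split from one eigenvector of the modularity matrix.

    assortative=True uses the most positive eigenvalue (high modularity);
    assortative=False uses the most negative one (low modularity).
    """
    k = A.sum(axis=1)
    two_m = k.sum()
    B = A - np.outer(k, k) / two_m
    eigenvalues, eigenvectors = np.linalg.eigh(B)   # sorted in ascending order
    v = eigenvectors[:, -1 if assortative else 0]
    s = np.where(v >= 0, 1, -1)                     # group labels +1 / -1
    Q = s @ B @ s / (2 * two_m)                     # modularity, s^T B s / 4m
    return s, Q


# Two 4-cliques joined by a single edge: an assortative toy example.
clique_edges = (list(combinations(range(4), 2))
                + list(combinations(range(4, 8), 2)) + [(3, 4)])
two_cliques = adjacency(8, clique_edges)

# Complete bipartite graph K_{3,3}: a disassortative toy example.
bipartite = adjacency(6, [(u, v) for u in range(3) for v in range(3, 6)])

print(spectral_split(two_cliques, assortative=True))
print(spectral_split(bipartite, assortative=False))
```

On the first graph the top eigenvector separates the two cliques. On the second the bottom eigenvector separates the two sides of the bipartite graph, which is the disassortative counterpart.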

Hierarchical Structure and the Prediction of Missing Links
In an afternoon talk, Cris Moore (U. New Mexico) presented a new and powerful model of network structure, the hierarchical random graph (HRG) [5]. (Disclaimer: this is joint work with myself and Mark Newman.) A lot of people in the complex networks literature have talked about hierarchy, and, presumably, when they do so, they mean something roughly along the lines of the HRG that Cris presented. That is, they mean that nodes with a common ancestor low in the hierarchical structure are more likely to be connected to each other, and that different cuts across it should produce partitions that look like communities. The HRG model Cris presented makes these notions explicit, but also naturally captures the kind of assortative hierarchical structure and the disassortative structure that Mark's methods find. (Test to do: use HRG to generate mixture of assortative and disassortative structure, then use Mark's second method to find it.) There are several other attractive qualities of the HRG, too. For instance, using a Monte Carlo Markov chain, you can find the hierarchical decomposition of a single real-world network, and then use the HRG to generate a whole ensemble of networks that are statistically similar to the original graph [6]. And, because the MCMC samples the entire posterior distribution of models-given-the-data, you can look not only at models that give the best fit to the data, but you can look at the large number of models that give an almost-best fit. Averaging properties over this ensemble can give you more robust estimates of unusual topological patterns, and Cris showed how it can also be used to predict missing edges. That is, suppose I hide some edges and then ask the model to predict which ones I hid. If it can do well at this task, then we've shown that the model is capturing real correlations in the topology of the real graph - it has the kind of explanatory power that comes from making correct predictions. 
These kinds of predictions could be extremely useful for laboratory or field scientists who manually collect network data (e.g., protein interaction networks or food webs) [7]. Okay, enough about my own work!
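
To make the hide-and-predict idea concrete, here is a minimal sketch in Python. It is not the method from the talk: it assumes a single, hand-picked dendrogram instead of sampling dendrograms with MCMC, and the function names (`fit_probabilities`, `rank_missing`, `sample_graph`) and the toy graph are invented for illustration. Each internal node of the dendrogram gets a connection probability equal to the fraction of observed edges running between its left and right subtrees, and unobserved pairs are ranked by the probability at their lowest common ancestor.

```python
import itertools
import random

# A dendrogram is either a leaf (an int vertex label) or a tuple (left, right).
def leaves(tree):
    """Return the list of leaf labels under this subtree."""
    if isinstance(tree, int):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)

def fit_probabilities(tree, edges):
    """Map every vertex pair to the fraction of observed edges that run
    between the two subtrees of the pair's lowest common ancestor."""
    edge_set = {frozenset(e) for e in edges}
    scores = {}

    def visit(node):
        if isinstance(node, int):
            return
        left, right = node
        pairs = [frozenset((a, b)) for a in leaves(left) for b in leaves(right)]
        p = sum(pair in edge_set for pair in pairs) / len(pairs)
        for pair in pairs:
            scores[pair] = p
        visit(left)
        visit(right)

    visit(tree)
    return scores

def rank_missing(scores, edges):
    """Rank unobserved pairs from most to least likely to be a hidden edge."""
    edge_set = {frozenset(e) for e in edges}
    candidates = [tuple(sorted(pair)) for pair in scores if pair not in edge_set]
    return sorted(candidates, key=lambda pair: -scores[frozenset(pair)])

def sample_graph(scores, rng):
    """Draw one graph from the fitted model: each pair is an edge with its probability."""
    return sorted(tuple(sorted(pair)) for pair, p in scores.items() if rng.random() < p)

# Toy data (assumed): two 4-cliques joined by one edge, with edge (0, 2) hidden.
tree = (((0, 1), (2, 3)), ((4, 5), (6, 7)))
clique_a = list(itertools.combinations(range(0, 4), 2))
clique_b = list(itertools.combinations(range(4, 8), 2))
hidden = (0, 2)
observed = [e for e in clique_a + clique_b + [(3, 4)] if e != hidden]

scores = fit_probabilities(tree, observed)
ranking = rank_missing(scores, observed)
sample = sample_graph(scores, random.Random(0))
```

On this toy graph the hidden pair (0, 2) comes out at the top of `ranking`, because its lowest common ancestor joins two densely connected subtrees. The full approach described above would average such scores over many dendrograms drawn from the posterior instead of trusting one.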

The Optimization Origins of Preferential Attachment
Although I've seen Raissa D'Souza (UC Davis) talk about competition-induced preferential attachment [8] before, it's such an elegant generalization of PA that I enjoyed it a second time today. Raissa began by pointing out that most power laws in the real world can't extend to infinity - in most systems, there are finite limits to the size that things can be (the energy released in an earthquake or the number of edges a vertex can have), and these finite-size effects will typically manifest themselves as exponential cutoffs in the far upper tail of the distribution, which take the probability of these super-large events to zero. She used this discussion as a springboard to introduce a relatively simple model of resource constraints and competition among vertices in a growing network that produces a power-law degree distribution with such an exponential cutoff. The thing I like most about this model is that it provides a way for (tempered) PA to emerge from microscopic and inherently local interactions (normally, to get pure PA to work, you need global information about the system). The next step, of course, is to find some way to measure evidence for this mechanism in real-world networks [9]. I also wonder how brittle the power-law result is, i.e., if you tweak the dynamics a little, does the power-law behavior disappear?
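
As a rough illustration of what an exponential cutoff does to the upper tail, the sketch below compares a pure power law with a tempered one. It only evaluates the distributional form, p(k) proportional to k^(-gamma) * exp(-k/kappa); it is not an implementation of the growth model from the talk, and the exponent `gamma`, the cutoff scale `kappa`, and the truncation point `k_max` are assumed values chosen for illustration.

```python
import math

def degree_pmf(gamma, kappa=None, k_max=10_000):
    """Normalized p(k) on k = 1..k_max, proportional to k**-gamma,
    multiplied by exp(-k / kappa) when a cutoff scale kappa is given."""
    weights = []
    for k in range(1, k_max + 1):
        w = k ** -gamma
        if kappa is not None:
            w *= math.exp(-k / kappa)
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]

def tail_mass(pmf, k_min):
    """Probability of observing a degree of at least k_min."""
    return sum(pmf[k_min - 1:])

# Assumed parameters: exponent 2.5, cutoff scale of 50 edges.
pure = degree_pmf(gamma=2.5)
tempered = degree_pmf(gamma=2.5, kappa=50.0)

pure_tail = tail_mass(pure, 500)
tempered_tail = tail_mass(tempered, 500)
```

With these values, `tempered_tail` is several orders of magnitude smaller than `pure_tail`, even though the two distributions look similar for degrees well below the cutoff scale.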

Web Search and Online Communities
Andrew Tomkins (of Yahoo! Research) is a data guy, and his plenary talk drove home the point that Web 2.0 applications (i.e., things that revolve around user-generated content) are creating a huge amount of data, and offering unparalleled challenges for combining, analyzing, and visualizing this data in meaningful ways. He used Flickr (a recent Y! acquisition) as a compelling example by showing an interactive (with fast-rewind and fast-forward features) visual stream of the trends in user-generated tags for user-posted images, annotated with notable examples of those images. He talked a little about the trickiness of the algorithms necessary to make such an application, but what struck me most was his plea for help and ideas in how to combine information drawn from social networks with user behavior, blog content, etc. to make more meaningful and more useful applications - there's all this data, and they only have a few ideas about how to combine it. The more I learn about Y! Research, the more impressed I am with both the quality of their scientists (they recently hired Duncan Watts), and the quality of their data. Web 2.0 stuff like this gives me the late-1990s shivers all over again. (Tomkins mentioned that in Korea, unlike in the US, PageRank-based search has been overtaken by an engine called Naver, which is driven by users building good sets of responses to common search queries.)

-----

[1] To be more concrete, and perhaps in lieu of having a better way of approaching the problem, much of the past work on network analysis has taken the following approach. First, think of some structure that you think might be interesting (e.g., the density of triangles or the division into sparsely connected dense subgraphs), design a measure that captures that structure, and then measure it in your data (it turns out to be non-trivial to do this in an algorithm independent way). Of course, the big problem with this approach is that you'll never know whether there is other structure that's just as important as, or maybe more important than, the kind you looked for, and that you just weren't clever enough to think to look for it.

[2] Heuristic because Mark's method is a polynomial time algorithm, while the problem of modularity maximization was recently (finally...) shown to be NP-complete. The proof is simple, and, in retrospect, obvious - just as most such proofs inevitably end up being. See U. Brandes et al. "Maximizing Modularity is hard." Preprint (2006).

[3] M. E. J. Newman, "Finding community structure in networks using the eigenvectors of matrices." PRE 74, 036104 (2006).

[4] M. E. J. Newman and E. A. Leicht, "Mixture models and exploratory data analysis in networks." Submitted to PNAS USA (2006).

[5] A. Clauset, C. Moore and M. E. J. Newman, "Structural Inference of Hierarchies in Networks." In Proc. of the 23rd ICML, Workshop on "Statistical Network Analysis", Springer LNCS (Pittsburgh, June 2006).

[6] This capability seems genuinely novel. Given that there are an astronomical number of ways to rearrange the edges on a graph, it's kind of amazing that the hierarchical decomposition gives you a way to do such a rearrangement, but one which preserves the statistical regularities in the original graph. We've demonstrated this for the degree distribution, the clustering coefficient, and the distribution of pair-wise distances. Because of the details of the model, it sometimes gets the clustering coefficient a little wrong, but I wonder just how powerful / how general this capability is.

[7] More generally though, I think the idea of testing a network model by asking how well it can predict things about real-world problems is an important step forward for the field; previously, "validation" consisted of showing only a qualitative (or worse, a subjective) agreement between some statistical measure of the model's behavior (e.g., degree distribution is right-skewed) and the same statistical measure on a real-world network. By being more quantitative - by being more stringent - we can say stronger things about the correctness of our mechanisms and models.

[8] R. M. D'Souza, C. Borgs, J. T. Chayes, N. Berger, and R. Kleinberg, "Emergence of Tempered Preferential Attachment From Optimization", To appear in PNAS USA, (2007).

[9] I think the best candidate here would be the BGP graph, since there is clearly competition there, although I suspect that the BGP graph structure is a lot more rich than the simple power-law-centric analysis has suggested. This is primarily due to the fact that almost all previous analyses have ignored the fact that the BGP graph exists as an expression of the interaction of business interests with the affordances of the Border Gateway Protocol itself. So, its topological structure is meaningless without accounting for the way it's used, and this means accounting for complexities of the customer-provider and peer-to-peer relationships on the edges (to say nothing of the sampling issues involved in getting an accurate BGP map).

posted January 24, 2007 02:18 AM in Scientifically Speaking | permalink | Comments (1)

January 23, 2007

DIMACS - Complex networks and their applications (Day 1)

Today and tomorrow, I'm at the DIMACS workshop on complex networks and their applications, held at Georgia Tech's College of Computing. Over the course of the workshop, I'll be blogging about the talks I see and whatever ideas they stimulate (sadly, I missed most of the first day because of travel).

The most interesting talk I saw Monday afternoon was by Ravi Kumar (Yahoo! Research), who took location data of users on LiveJournal, and asked, Do we see the same kind of routable structure - i.e., an inverse-square law relationship between the distance separating people and the likelihood that they have a LJ connection - that Kleinberg showed was optimal for distributed / local search? Surprisingly, they were able to show that in the US, once you correct for the fact that there can be many people at a single "location" in geographic space (approximated to the city level), you do indeed observe exactly the kind of power law that Kleinberg predicted [1]. Truly, this was a kind of stunning confirmation of Kleinberg's theory. So now, the logical question would be, What mechanism might produce this kind of structure in geographic space? Although you could probably get away with assuming a priori the population distribution, what linking dynamics would construct the observed topological pattern? My first project in graduate school asked exactly this question for the pure Kleinberg model, and I wonder if it could be adapted to the geographic version that Kumar et al. consider.

[1] D. Liben-Nowell, et al. "Geographic Routing in Social Networks." PNAS USA 102, 33 11623-1162 (2005).
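
For readers unfamiliar with the pure Kleinberg model mentioned above, here is a minimal Python sketch of it, under stated assumptions: vertices sit on an n-by-n grid, each vertex gets one long-range contact chosen with probability proportional to d^(-2) in lattice (Manhattan) distance d, and messages are forwarded greedily to whichever contact is closest to the target. The grid size, the random seed, and the function names are hypothetical, and this is the uniform-population lattice model, not the population-corrected geographic version that Kumar et al. analyzed.

```python
import random

def manhattan(a, b):
    """Lattice distance between two grid points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def long_range_contact(u, n, exponent, rng):
    """Pick one long-range contact for u with probability proportional to d**-exponent."""
    others = [(x, y) for x in range(n) for y in range(n) if (x, y) != u]
    weights = [manhattan(u, v) ** -exponent for v in others]
    return rng.choices(others, weights=weights, k=1)[0]

def greedy_route(source, target, n, contacts):
    """Forward to whichever known contact is closest to the target; return the hop count."""
    current, hops = source, 0
    while current != target:
        x, y = current
        neighbors = [(x + dx, y + dy)
                     for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                     if 0 <= x + dx < n and 0 <= y + dy < n]
        neighbors.append(contacts[current])
        # Some grid neighbor is always strictly closer, so this loop terminates.
        current = min(neighbors, key=lambda v: manhattan(v, target))
        hops += 1
    return hops

n = 12                      # assumed grid side length
rng = random.Random(2007)   # assumed seed, for repeatability
contacts = {(x, y): long_range_contact((x, y), n, 2.0, rng)
            for x in range(n) for y in range(n)}
hops = greedy_route((0, 0), (n - 1, n - 1), n, contacts)
```

Because each hop uses only local information (a vertex's own grid neighbors and its single long-range contact), the hop count can never exceed the lattice distance between source and target, and the long-range links can only shorten it.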

posted January 23, 2007 06:24 AM in Scientifically Speaking | permalink | Comments (0)

December 19, 2006

Phase change

This past weekend I graduated with distinction with my doctorate from the University of New Mexico's Department of Computer Science. My advisor Cristopher Moore hooded me at the main Commencement ceremony on Friday, and on Saturday, the School of Engineering had its own smaller (and nicer) Convocation ceremony for its graduates. I was invited to be the graduate speaker at this event, and I made a few brief remarks that you can read here.

It's been an intense and highly educational four and a half years, but it's nice to finally be done.

posted December 19, 2006 12:36 PM in Self Referential | permalink | Comments (1)

September 08, 2006

Academic publishing, tomorrow

Imagine a world where academic publishing is handled purely by academics, rather than ruthless, greedy corporate entities. [1] Imagine a world where hiring decisions were made on the technical merit of your work, rather than the coterie of journals associated with your c.v. Imagine a world where papers are living documents, actively discussed and modified (wikified?) by the relevant community of interested intellectuals. This, and a bit more, is the future, according to Adam Rogers, a senior associate editor at "Wired" magazine. (tip to The Geomblog)

The gist of Rogers' argument is that the Web will change academic publishing into this utopian paradise of open information. I seriously doubt things will be like he predicts, but he does raise some excellent points about how the Web is facilitating new ways of communicating technical results. For instance, he mentions a couple of on-going experiments in this area:

In other quarters, traditional peer review has already been abandoned. Physicists and mathematicians today mainly communicate via a Web site called arXiv. (The X is supposed to be the Greek letter chi; it's pronounced "archive." If you were a physicist, you'd find that hilarious.) Since 1991, arXiv has been allowing researchers to post prepublication papers for their colleagues to read. The online journal Biology Direct publishes any article for which the author can find three members of its editorial board to write reviews. (The journal also posts the reviews – author names attached.) And when PLoS ONE launches later this year, the papers on its site will have been evaluated only for technical merit – do the work right and acceptance is guaranteed.

It's a bit hasty to claim that peer review has been "abandoned", but the arxiv has certainly almost completely supplanted some journals in their role of disseminating new research [2]. This is probably most true for physicists, since they're the ones who started the arxiv; other fields, like biology, don't have a pre-print archive (that I know of), but they seem to be moving toward open access journals for the same purpose. In computer science, we already have something like this, since the primary venue for publication is conferences (which are peer reviewed, unlike conferences in just about every other discipline), whose papers are typically picked up by CiteSeer.

It seems that a lot of people are thinking or talking about open access this week. The Chronicle of Higher Education has a piece on the momentum for greater open access to journals. Its main subject is the new letter, signed by 53 presidents of liberal arts colleges (including my own Haverford College), in support of the bill currently in Congress (although unlikely to pass this year) that would mandate that all federally funded research eventually be made publicly available. The comments from the publishing industry are unsurprisingly self-interested and uninspiring, but they also betray a great deal of arrogance and greed. I wholeheartedly support more open access to articles - publicly funded research should be free to the public, just like public roads are free for everyone to use.

But, the bigger question here is, Could any of these various alternatives to the pay-for-access model really replace journals? I'm less sure of the future here, as journals also serve a couple of other roles that things like the arxiv were never intended to fill. That is, journals run the peer review process, which, at its best, prevents erroneous research from getting a stamp of "community approval" and thereby distracting researchers for a while as they a) figure out that it's mistaken, and b) write new papers to correct it. The absence of that filter is why, I think, there is a lot of crap on the arxiv. A lot of authors police themselves quite well, and end up submitting nearly error-free and highly competent work to journals, but the error-checking process is crucial, I think. Sure, peer review does miss a lot of errors (and frauds), but, to paraphrase Mason Porter paraphrasing Churchill on democracy, peer review is the worst form of quality control for research, except for all the others. The real point here is that until something comes along that can replace journals as being the "community approved" body of work, I doubt they'll disappear. I do hope, though, that they'll morph into more benign organizations. PNAS and PLoS are excellent role models for the future, I think. And, they also happen to publish really great research.

Another point Rogers makes about the changes the Web is encouraging is a social one.

[...] Today’s undergrads have ... never functioned without IM and Wikipedia and arXiv, and they’re going to demand different kinds of review for different kinds of papers.

It's certainly true that I conduct my research very differently because I have access to Wikipedia, arxiv, email, etc. In fact, I would say that the real change these technologies will have on the world of research will be to decentralize it a little. It's now much easier to be a productive, contributing member of a research community without being down the hall from your colleagues and collaborators than it was 20 years ago. These electronic modes of communication just make it easier for information to flow freely, and I think that ultimately has a very positive effect on research itself. Taking that role away from the journals suggests that they will become more about getting that stamp of approval, than anything else. With its increased relative importance, who knows, perhaps journals will do a better job at running the peer review process (they could certainly use the Web, etc. to do a better job at picking reviewers...).

(For some more thoughts on this, see a recent discussion of mine with Mason Porter.)

Update Sept. 9: Suresh points to a recent post of his own about the arxiv and the issue of time-stamping.

[1] Actually, computer science conferences, impressively, are a reasonable approximation to this, although they have their own fair share of issues.

[2] A side effect of the arXiv is that it presents tricky issues regarding citation, timing and proper attribution. For instance, if a research article becomes a "living" document, proper citation becomes rather problematic. Which version of an article do you cite? (Surely not all of them!) And, if you revise your article after someone posts a derivative work, are you obligated to cite it in your revision?

posted September 8, 2006 05:23 PM in Simply Academic | permalink | Comments (3)

August 12, 2006

Your academic-journal dollars at work

Having now returned from a relaxing and rejuvenating trip to a remote (read: no Internet) beach with my family, I am trying to catch up on where the world has moved since I last checked. Comfortably, it's still in one piece, although I'm not thrilled about the latest draconian attempts to scare people into feeling safe about flying in airplanes. Amazingly, only half of the 300 emails I received were spam, and what remained were relatively quickly dispatched. In catching up on science news, I find a new movement afoot to stop Elsevier - the ruthless, and notoriously over-priced, academic publishing house - from organizing arms fairs via one of its subsidiaries. Having recently watched the excellent documentary Why We Fight, on the modern military-industrial complex, this makes me a little concerned.

I've only refereed once for any Elsevier journal, and I now plan to never referee for any of them again. This idea is, apparently, not uncommon among other scientists, e.g., here, here and here. Charging exorbitant prices to under-funded academics who produce and vet the very same content being sold is one thing - exploitative, yes; deadly, no - but arms fairs are a whole different kind of serious. Idiolect is running a petition against this behavior.

Update, Aug. 24: Digging around on YouTube, I found this interview with Eugene Jarecki, the director of Why We Fight.

posted August 12, 2006 09:19 PM in Simply Academic | permalink | Comments (0)

July 10, 2006

That career thing

I'm sure this piece of advice to young scientists by John Baez (of quantum gravity fame) is old news now (3 years on). But, seeing as it was written before I was paying attention to this kind of stuff myself, and it seems like quite good advice, here it is, in a nutshell:

1. Read voraciously, ask questions, don't be scared of "experts", and figure out what are the good problems to work on in your field.
2. Go to the most prestigious school, and work with the best possible advisor.
3. Publish often and publish stuff people will want to read (and cite).
4. Go to conferences and give good, memorable talks.

Looking back over my success, so far, I think I've done a pretty good job on most of these things. His advice about going to a prestigious place seems to be more about getting a good advisor - I suppose that in physics, being a very old field, the best advisors can only be found at the most prestigious places. But, I'm not entirely convinced that this is true for the interdisciplinary mashup, which includes complex networks and the other things I like to study, yet...

posted July 10, 2006 12:50 AM in Simply Academic | permalink | Comments (3)

April 16, 2006

The view from the top

Richard Hamming, of coding theory fame, gave a talk at Bell Labs in 1986 as a retrospective on his career and his insights into how to do great research. In it, he tells many amusing anecdotes of his time at Bell Labs, including how he and Shannon were office mates at the same time Shannon was working on information theory, and why so many of the smart people he knew produced little great research by the end of their careers. A fascinating read.

Hamming on the subject of a researcher's drive:

You observe that most great scientists have tremendous drive. I worked for ten years with John Tukey at Bell Labs. He had tremendous drive. One day about three or four years after I joined, I discovered that John Tukey was slightly younger than I was. John was a genius and I clearly was not. Well I went storming into Bode's office and said, ``How can anybody my age know as much as John Tukey does?'' He leaned back in his chair, put his hands behind his head, grinned slightly, and said, ``You would be surprised Hamming, how much you would know if you worked as hard as he did that many years.'' I simply slunk out of the office!

On the topic of knowing the limitations of your theories:

Great scientists tolerate ambiguity very well. They believe the theory enough to go ahead; they doubt it enough to notice the errors and faults so they can step forward and create the new replacement theory. If you believe too much you'll never notice the flaws; if you doubt too much you won't get started. It requires a lovely balance. But most great scientists are well aware of why their theories are true and they are also well aware of some slight misfits which don't quite fit and they don't forget it.

The rest of his talk is more of the same, but with longer stories and amusing anecdotes.

posted April 16, 2006 10:37 PM in Simply Academic | permalink | Comments (0)

March 07, 2006

Running a conference (redux)

Once again, for the past eight months or so, I've been heavily involved in running a small conference. The second annual Computer Science UNM Student Conference (CSUSC) happened this past Friday and was, in every sense of the word, a resounding success. Originally, this little shindig was conceived as a way for students to show off their research to each other, to the faculty, and to folks at Sandia National Labs. As such, this year's forum was just as strong as last year's inaugural session, having ten well-done research talks and more than a dozen poster presentations. Our keynote address was delivered by the friendly and soft-spoken David Brooks (no, not that one) from Harvard University, on power efficiency in computing. (Naturally, power density has been an important constraint on computing for a long time.)

Having organized this conference twice now, I have a very healthy respect for how much time is involved in making such an event a success. Although most of one's time is spent making sure all the gears are turning at the proper speeds (which includes, metaphorically, keeping the wheels greased and free of obstructions) so that each part completes in time to hand off to the next, I'm also happy with how much of a learning experience it seems to have been for everyone involved (including me). This year's success was largely due to the excellent and tireless work of the Executive Committee, while, I'm confident in saying, all of the little hiccoughs we encountered were oversights on my part. Perhaps next year, those things will be done better by my successor.

But, the future success of the CSUSC is far from guaranteed: the probability of a fatal dip in the inertia of student interest in organizing it is non-trivial. This is a risk, I believe, that every small venue faces, since there are only ever a handful of students interested in taking time away from their usual menu of research and course work to try their hand at professional service. I wonder, What fraction of researchers are ever involved in organizing a conference? Reviewing papers is a standard professional duty, but the level of commitment required to run a conference is significantly larger - it takes a special degree of willingness (masochism?) and is yet another of the many parts of academic life that you have to learn in the trenches. For the CSUSC, I simply hope that the goodness that we've created so far continues on for a few more years, and am personally just glad we had such a good run over the past two.

With this out of the way, my conference calendar isn't quite empty, and is already rapidly refilling. Concurrent with my duties to the CSUSC, I've also been serving on the Program Committee for the 5th International Workshop on Experimental Algorithms (WEA), a medium-sized conference on the design, analysis and implementation of algorithms. It has been an interesting experience in itself, in part for broadening my perspective on the kind of research being done in algorithms. In May, always my busiest month for conferences, I'll be attending two events on network science. The first is CAIDA's Workshop on Internet Topology (WIT) in San Diego, while the second is NetSci 2006 in Bloomington, Indiana.

posted March 7, 2006 04:37 AM in Simply Academic | permalink | Comments (0)

February 21, 2006

Pirates off the Coast of Paradise

At the beginning of graduate school, few people have a clear idea of what area of research they ultimately want to get into. Many come in with vague or ill-informed notions of their likes and dislikes, most of which are due to the idiosyncrasies of their undergraduate major's curriculum, and perhaps scraps of advice from busy professors. For Computer Science, it seems that most undergraduate curricula emphasize the physical computer, i.e., the programming, the operating system and basic algorithm analysis, over the science, let alone the underlying theory that makes computing itself understandable. For instance, as a teaching assistant for an algorithms course during my first semester in grad school, I was disabused of any preconceptions when many students had trouble designing, carrying-out, and writing-up a simple numerical experiment to measure the running time of an algorithm as a function of its input size, and I distinctly remember seeing several minds explode (and, not in the Eureka! sense) during a sketch of Cantor's diagonalization argument. When you consider these anecdotes along with the flat or declining numbers of students enrolling in computer science, we have a grim picture of both the value that society attributes to Computer Science and the future of the discipline.

The naive inference here would be that students are (rightly) shying away from a field that serves little purpose to society, or to them, beyond providing programming talent for other fields (e.g., the various biological or medical sciences, or IT departments, which have a bottomless appetite for people who can manage information with a computer). And, with programming jobs being outsourced to India and China, one might wonder if the future holds anything but an increasing Dilbert-ization of Computer Science.

This brings us to a recent talk delivered by Prof. Bernard Chazelle (CS, Princeton) at the AAAS Annual Meeting about the relevance of the Theory of Computer Science (TCS for short). Chazelle's talk was covered briefly by PhysOrg, although his separate and longer essay really does a better job of making the point,

Moore's Law has fueled computer science's sizzle and sparkle, but it may have obscured its uncanny resemblance to pre-Einstein physics: healthy and plump and ripe for a revolution. Computing promises to be the most disruptive scientific paradigm since quantum mechanics. Unfortunately, it is the proverbial riddle wrapped in a mystery inside an enigma. The stakes are high, for our inability to “get” what computing is all about may well play iceberg to the Titanic of modern science.

He means that behind the glitz and glam of iPods, Internet porn, and unmanned autonomous vehicles armed with GPS-guided missiles, TCS has been drawing fundamental connections, through the paradigm of abstract computation, between previously disparate areas throughout science. Suresh Venkatasubramanian (see also Jeff Erickson and Lance Fortnow) phrases it in the form of something like a Buddhist koan,

Theoretical computer science would exist even if there were no computers.

Scott Aaronson, in his inimitable style, puts it more directly and draws an important connection with physics,

The first lesson is that computational complexity theory is really, really, really not about computers. Computers play the same role in complexity that clocks, trains, and elevators play in relativity. They're a great way to illustrate the point, they were probably essential for discovering the point, but they're not the point. The best definition of complexity theory I can think of is that it's quantitative theology: the mathematical study of hypothetical superintelligent beings such as gods.

Actually, that last bit may be overstating things a little, but the idea is fair. Just as theoretical physics describes the physical limits of reality, theoretical computer science describes both the limits of what can be computed and how. But, what is physically possible is tightly related to what is computationally possible; physics is a certain kind of computation. For instance, a guiding principle of physics is that of energy minimization, which is a specific kind of search problem, and search problems are the hallmark of CS.

The Theory of Computer Science is, quite to the contrary of the impression with which I was left after my several TCS courses in graduate school, much more than proving that certain problems are "hard" (NP-complete) or "easy" (in P), or that we can sometimes get "close" to the best much more easily than we can find the best itself (approximation algorithms), or especially that working in TCS requires learning a host of seemingly unrelated tricks, hacks and gimmicks. Were it only these, TCS would be interesting in the same way that Sudoku puzzles are interesting - mildly diverting for some time, but eventually you get tired of doing the same thing over and over.

Fortunately, TCS is much more than these things. It is the thin filament that connects the mathematics of every natural science, touching at once game theory, information theory, learning theory, search and optimization, number theory, and many more. Results in TCS, and in complexity theory specifically, have deep and profound implications for what the future will look like. (E.g., do we live in a world where no secret can actually be kept hidden from a nosey third party?) A few TCS-related topics that John Baez, a mathematical physicist at UC Riverside who's become a promoter of TCS, pointed to recently include "cryptographic hash functions, pseudo-random number generators, and the amazing theorem of Razborov and Rudich which says roughly that if P is not equal to NP, then this fact is hard to prove." (If you know what P and NP mean, then this last one probably doesn't seem that surprising, but that means you're thinking about it in the wrong direction!) In fact, the question of P versus NP may even have something to say about the kind of self-consistency we can expect in the laws of physics, and whether we can ever hope to find a Grand Unified Theory. (For those of you hoping for worm-hole-based FTL travel in the future, P vs. NP now concerns you, too.)

Alas, my enthusiasm for these implications and connections is stunted by a developing cynicism, not because of a failure of TCS to deliver on its founding promises (as, for instance, was the problem that ultimately toppled artificial intelligence), but rather because of its inability to convince not just funding agencies like NSF, but also the rest of Computer Science, that it matters. That is, TCS is a vitally important, but a needlessly remote, field of CS, and is valued by the rest of CS for reasons analogous to those for which CS is valued by other disciplines: its ability to get things done, i.e., actual algorithms. This problem is aggravated by the fact that the mathematical training necessary to build toward a career in TCS is not a part of the standard CS curriculum (I mean at the undergraduate level, but the graduate one seems equally at fault). Instead, you acquire that knowledge by either working with the luminaries of the field (if you end up at the right school), or by essentially picking up the equivalent of a degree in higher mathematics (e.g., analysis, measure theory, abstract algebra, group theory, etc.). As Chazelle puts it in his pre-talk interview, "Computer science ... is messy and infuriatingly complex." I argue that this complexity is what makes CS, and particularly TCS, inaccessible and hard to appreciate. If Computer Science as a discipline wants to survive to see the "revolution" Chazelle forecasts, it needs to reevaluate how it trains its future members, what it means to have a science of computers, and even further, what it means to have a theory of computers (a point on which CS does abysmally). No computer scientist likes to be told her particular area of study is glorified programming, but without significant internal and external restructuring, that is all Computer Science will be to the rest of the world.

posted February 21, 2006 12:06 AM in Scientifically Speaking | permalink | Comments (0)

February 01, 2006

Defending academic freedom

Michael Bérubé, a literature and culture studies professor at Penn. State University, has written a lecture (now an essay) on the academic freedom of the professoriat and the demands by (radical right) conservatives to demolish it, through state-oversight, in the name of... academic freedom. The Medium Lobster would indeed be proud.

As someone who believes deeply in the importance of the free pursuit of intellectual endeavors, and who has a strong interest in the institutions that facilitate that path (understandable given my current choice of careers), Bérubé's commentary resonated strongly with me. Primarily, I just want to advertise Bérubé's essay, but I can't help but editorialize a little. Let's start with the late Sidney Hook, a liberal who turned staunchly conservative as a result of pondering the threat of Communism, who wrote in his 1970 book Academic Freedom and Academic Anarchy that

The qualified teacher, whose qualifications may be inferred from his acquisition of tenure, has the right honestly to reach, and hold, and proclaim any conclusion in the field of his competence. In other words, academic freedom carries with it the right to heresy as well as the right to restate and defend the traditional views. This takes in considerable ground. If a teacher in honest pursuit of an inquiry or argument comes to a conclusion that appears fascist or communist or racist or what-not in the eyes of others, once he has been certified as professionally competent in the eyes of his peers, then those who believe in academic freedom must defend his right to be wrong—if they consider him wrong—whatever their orthodoxy may be.

That is, it doesn't matter what your political or religious stripes may be, academic freedom is a foundational part of having a free society. At its heart, Hook's statement is simply a more academic restatement of the famous assertion commonly attributed to Voltaire: "I disapprove of what you say, but I will defend to the death your right to say it." In today's age of unblinking irony (e.g., Bush's "Healthy Forests" initiative) for formerly shameful acts of corruption, cronyism and outright greed, such sentiments are depressingly rare.

Although I had read a little about the radical right's effort to install affirmative action for conservative professors in public universities (again, these people have no sense of irony), what I didn't know about is the national effort to introduce legislation (passed into law in Pennsylvania and pending in more than twenty other states) that gives the state oversight ability of the contents of the classroom, mostly by allowing students (non-experts) to sue professors (experts) for introducing controversial material in the classroom. Thus, the legislature and the courts (non-experts) would be able to define what is legally permissible classroom content, by clarifying the legal term "controversial", rather than professors (experts). Bérubé:

When [Ohio state senator Larry Mumper] introduced Senate Bill 24 [which allows students to sue professors, as described above] last year, he was asked by a Columbus Dispatch reporter what he would consider 'controversial matter' that should be barred from the classroom. "Religion and politics, those are the main things," he replied.

All I can say in response is that college is not a kind of dinner party. It can indeed be rude to bring up religion or politics at a dinner party, particularly if you are not familiar with all the guests. But at American universities, religion and politics are two of the hundreds of things we discuss on a daily basis. It really is part of our job, even — or especially — if some of us have unpopular opinions on those subjects.

How else do we learn but by having our pre- and misconceptions challenged by those people who have studied these things, been trained by other experts and been recognized by their peers as an authority? Without academic freedom as defined by Hook and defended by Bérubé, a university degree will signify nothing more than having received the official State-sanctioned version of truth. Few things would be more toxic to freedom and democracy.

posted February 1, 2006 10:45 PM in Simply Academic | permalink | Comments (0)

December 19, 2005

On modeling the human response time function; Part 3.

Much to my surprise, this morning I awoke to find several emails in my inbox apparently related to my commentary on the Barabasi paper in Nature. Anders Johansen pointed out to me and Luis Amaral (I can only assume that he has already communicated this to Barabasi) that in 2004 he published an article entitled "Probing human response times" in Physica A about the very same topic using the very same data as that of Barabasi's paper. In it, he displays the now familiar heavy-tailed distribution of response times and fits a power law of the form P(t) ~ 1/(t+c) where c is a constant estimated from the data. Asymptotically, this is the same as Barabasi's P(t) ~ 1/t; it differs in the lower tail, i.e., for t < c where it scales more uniformly. As an originating mechanism, he suggests something related to a spin-glass model of human dynamics.
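To make the comparison between the two functional forms concrete, here is a short worked sketch of the limiting behavior of Johansen's form. It only restates the asymptotics described above; the proportionality constant is left unspecified.

```latex
P(t) \propto \frac{1}{t + c} = \frac{1}{t}\cdot\frac{1}{1 + c/t}
\approx
\begin{cases}
  \dfrac{1}{t}, & t \gg c \quad \text{(upper tail: matches } P(t) \sim 1/t\text{)} \\[1.5ex]
  \dfrac{1}{c}, & t \ll c \quad \text{(lower tail: roughly constant)}
\end{cases}
```

So the two forms can only be told apart by data at response times comparable to or smaller than c.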

Although Johansen's paper raises other issues, which I'll discuss briefly in a moment, let's step back and think about this controversy from a scientific perspective. There are two slightly different approaches to modeling that are being employed to understand the response-time function of human behavior. The first is a purely "fit-the-data" approach, which is largely what Johansen has done, and certainly what Amaral's group has done. The other, employed by Barabasi, uses enough data analysis to extract some interesting features, posits a mechanism for the origin of those and then sets about connecting the two. The advantage of developing such a mechanistic explanation is that (if done properly) it provides falsifiable hypotheses and can move the discussion past simple data-analysis techniques. The trouble begins, as I've mentioned before, when either a possible mechanistic model is declared to be "correct" before being properly vetted, or when an insufficient amount of data analysis is done before positing a mechanism. This latter kind of trouble allows for a debate over how much support the data really provides to the proposed mechanism, and is exactly the source of the exchange between Barabasi et al. and Stouffer et al.

I tend to agree with the idea implicitly put forward by Stouffer et al.'s comment that Barabasi should have done more thorough data analysis before publishing, or alternatively, been a little more cautious in his claims of the universality of his mechanism. In light of Johansen's paper and Johansen's statement that he and Barabasi spoke at the talk in 2003 where Johansen presented his results, there is now the specter that either previous work was not cited that should have been, or something more egregious happened. This is not to say that this aspect of the story isn't an important issue in itself, but it is a separate one from the issues regarding the modeling, and it is those with which I am primarily concerned. But, given the high profile of articles published in journals like Nature, this kind of gross error in attribution does little to reassure me that such journals are not aggravating certain systemic problems in the scientific publication system. This will probably be a topic of a later post, if I ever get around to it. But let's get back to the modeling questions.

Seeking to be more physics and less statistics, the ultimate goal of such a study of human behavior should be to understand the mechanism at play, and at least Barabasi did put forward and analyze a plausible suggestion there, even if a) he may not have done enough data analysis to properly support it or his claims of universality, and b) his model assumes some reasonably unrealistic behavior on the part of humans. Indeed, the former is my chief complaint about his paper, and why I am grateful for the Stouffer et al. comment and the ensuing discussion. With regard to the latter, my preference would have been for Barabasi to have discussed the fragility of his model with respect to the particular assumptions he describes. That is, although he assumes it, humans probably don't assign priorities to their tasks with anything like a uniformly random distribution, nor do humans always execute their highest priority task next. For instance, can you decide, right now without thinking, what the most important email in your inbox is at this moment? Instead, he commits the crime of hubris and neglects these details in favor of the suggestiveness of his model given the data. On the other hand, regardless of their implausibility, both of these assumptions about human behavior can be tested through experiments with real people and through numerical simulation. That is, these assumptions become predictions about the world that, if they fail to agree with experiment, would falsify the model. This seems to me an advantage of Barabasi's mechanism over that proposed by Johansen, which, by relying on a spin glass model of human behavior, seems rather trickier to falsify.
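As an illustration of the kind of numerical simulation meant here, the following is a minimal sketch of a priority-queue process built from the two assumptions just described: priorities drawn uniformly at random, and the highest-priority task executed next. The fixed list length, the probability `p` of actually choosing the top task, and the function name `simulate_priority_queue` are my own assumptions for the sake of the example, not details taken from Barabasi's paper.

```python
import random

def simulate_priority_queue(list_length=2, p=0.999, steps=100_000, seed=0):
    """Return the waiting time (in steps) of every executed task.

    Assumptions (illustrative only): the to-do list has a fixed length,
    each new task gets a priority drawn uniformly from [0, 1), and at
    each step the highest-priority task is executed with probability p;
    otherwise a task is picked uniformly at random.
    """
    rng = random.Random(seed)
    # Each task is a (priority, arrival_step) pair.
    tasks = [(rng.random(), 0) for _ in range(list_length)]
    waits = []
    for step in range(1, steps + 1):
        if rng.random() < p:
            # Execute the task with the highest priority.
            idx = max(range(list_length), key=lambda i: tasks[i][0])
        else:
            # Occasionally execute a randomly chosen task instead.
            idx = rng.randrange(list_length)
        waits.append(step - tasks[idx][1])
        # The executed task is replaced by a fresh one.
        tasks[idx] = (rng.random(), step)
    return waits
```

Changing how priorities are drawn, or lowering `p`, is one way to probe how fragile the heavy tail in the waiting times is with respect to these assumptions.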

But let's get back to the topic of the data analysis and the argument between Stouffer et al. and Barabasi et al. (now also Johansen) over whether the data better supports a log-normal or a power-law distribution. The importance of this point is that if the log-normal is the better fit, then the mathematical model Barabasi proposes cannot be the originating mechanism. From my experience with distributions with heavy tails, it can be difficult to statistically (let alone visually) distinguish between a log-normal and various kinds of power laws. In human systems, there is almost never enough data (read: orders of magnitude) to distinguish these without using standard (but sophisticated) statistical tools. This is because for any finite sample of data from an asymptotic distribution, there will be deviations that will blur the functional form just enough to look rather like the other. For instance, if you look closely at the data of Barabasi or Johansen, there are deviations from the power-law distribution in the far upper tail. Stouffer et al. cite these as examples of the poor fit of the power law and as evidence supporting the log-normal. Unfortunately, they could simply be deviations due to finite-sample effects (not to be confused with finite-size effects), and the only way to determine if they could have been is to try resampling the hypothesized distribution and measuring the sample deviation against the observed one.

The approach that I tend to favor for resolving this kind of question combines a goodness-of-fit test with a statistical power test to distinguish between alternative models. It's a bit more labor-intensive than the Bayesian model selection employed by Stouffer et al., but this approach offers, in addition to others that I'll describe momentarily, the advantage of being able to say that, given the data, neither model is good or that both models are good.

Using Monte Carlo simulation and something like the Kolmogorov-Smirnov goodness-of-fit test, you can quantitatively gauge how likely a random sample drawn from your hypothesized function F (which can be derived using maximum likelihood parameter estimation or by something like a least-squares fit; it doesn't matter) will have a deviation from F at least as big as the one observed in the data. By then comparing the deviations with an alternative function G (e.g., a power law versus a log-normal), you get a measure of the power of F over G as an originating model of the data. For heavy-tailed distributions, particularly those with a sample-mean that converges slowly or never at all (as is the case for something like P(t) ~ 1/t), sampling deviations can cause pretty significant problems with model selection, and I suspect that the Bayesian model selection approach is sensitive to these. On the other hand, by incorporating sampling variation into the model selection process itself, one can get an idea of whether it is even possible to select one model over another. If someone were to use this approach to analyze the data of human response times, I suspect that the pure power law would be a poor fit (the data looks too curved for that), but that the power law suggested in Johansen's paper would be largely statistically indistinguishable from a log-normal. With this knowledge in hand, one is then free to posit mechanisms that generate either distribution and then proceed to validate the theory by testing its predictions (e.g., its assumptions).
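The procedure above can be sketched roughly as follows. This is a minimal illustration, not the exact analysis anyone in this exchange ran: the helper names (`ks_distance`, `fit_power_law`, `fit_log_normal`, `monte_carlo_p_value`) are hypothetical, the power law is the generic normalizable form with exponent alpha > 1 and a lower cutoff `t_min` (not the pure 1/t form), and each synthetic sample is re-fit before measuring its deviation, which is one design choice among several.

```python
import math
import random

def ks_distance(data, cdf):
    """Kolmogorov-Smirnov distance between the empirical CDF and a model CDF."""
    xs = sorted(data)
    n = len(xs)
    return max(
        max((i + 1) / n - cdf(x), cdf(x) - i / n)
        for i, x in enumerate(xs)
    )

def fit_power_law(data, t_min):
    """Maximum-likelihood fit of p(t) ~ t^(-alpha) for t >= t_min (alpha > 1)."""
    alpha = 1.0 + len(data) / sum(math.log(t / t_min) for t in data)
    cdf = lambda t: 1.0 - (t / t_min) ** (1.0 - alpha)
    # Inverse-transform sampling; 1 - u lies in (0, 1].
    sample = lambda rng: t_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
    return cdf, sample

def fit_log_normal(data):
    """Maximum-likelihood fit of a log-normal: mean and std of log(t)."""
    logs = [math.log(t) for t in data]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
    cdf = lambda t: 0.5 * (1.0 + math.erf((math.log(t) - mu) / (sigma * math.sqrt(2.0))))
    sample = lambda rng: rng.lognormvariate(mu, sigma)
    return cdf, sample

def monte_carlo_p_value(data, fit, trials=200, seed=0):
    """Fraction of synthetic samples whose KS distance is >= the observed one."""
    rng = random.Random(seed)
    cdf, sample = fit(data)
    d_obs = ks_distance(data, cdf)
    exceed = 0
    for _ in range(trials):
        synthetic = [sample(rng) for _ in data]
        syn_cdf, _ = fit(synthetic)  # re-fit the model to each synthetic sample
        if ks_distance(synthetic, syn_cdf) >= d_obs:
            exceed += 1
    return exceed / trials
```

Calling `monte_carlo_p_value(data, fit_log_normal)` and `monte_carlo_p_value(data, lambda d: fit_power_law(d, min(data)))` on the same data gives one number per candidate model. If both are large, the data cannot tell the models apart; if both are near zero, neither model is a good description.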

So, in the end, we may not have gained much in arguing about which heavy-tailed distribution the data likely came from, and instead should consider whether or not an equally plausible mechanism for generating the response-time data could be derived from the standard mechanisms for producing log-normal distributions. If we had such an alternative mechanism, then we could devise some experiments to distinguish between them and perhaps actually settle this question like scientists.

As a closing thought, my interest in this debate is not particularly in its politics. Rather, I think this story suggests some excellent questions about the practice of modeling, the questions a good modeler should ponder on the road to truth, and some of the potholes strewn about the field of complex systems. It also, unfortunately, provides some anecdotal evidence of some systemic problems with attribution, the scientific publishing industry and the current state of peer-review at high-profile, fast turn-around-time journals.

References for those interested in reading the source material.

A. Johansen, "Probing human response times." Physica A 338 (2004) 286-291.

A.-L. Barabasi, "The origin of bursts and heavy tails in human dynamics." Nature 435 (2005) 207-211.

D. B. Stouffer, R. D. Malmgren and L. A. N. Amaral "Comment on 'The origin of bursts and heavy tails in human dynamics'." e-print (2005).

J.-P. Eckmann, E. Moses and D. Sergi, "Entropy of dialogues creates coherent structures in e-mail traffic." PNAS USA 101 (2004) 14333-14337.

A.-L. Barabasi, K.-I. Goh, A. Vazquez, "Reply to Comment on 'The origin of bursts and heavy tails in human dynamics'." e-print (2005).

posted December 19, 2005 04:32 PM in Scientifically Speaking | permalink | Comments (0)

November 27, 2005

Irrational exuberance plus indelible sniping yields delectable entertainment

In a past entry (which sadly has not yet scrolled off the bottom of the front page - sad because it indicates how infrequently I am posting these days), I briefly discussed the amusing public debate by Barabasi et al. and Stouffer et al. over Barabasi's model of correspondence. At that point, I found the exchange amusing and was inclined to agree with the response article. However, let me rehash this topic and shed a little more light on the subject.

From the original abstract of the article posted on arxiv.org by Barabasi:

Current models of human dynamics, used from risk assessment to communications, assume that human actions are randomly distributed in time and thus well approximated by Poisson processes. In contrast, ... the timing of many human activities, ranging from communication to entertainment and work patterns, [are] ... characterized by bursts of rapidly occurring events separated by long periods of inactivity. Here we show that the bursty nature of human behavior is a consequence of a decision based queuing process: when individuals execute tasks based on some perceived priority, the timing of the tasks will be heavy tailed, most tasks being rapidly executed, while a few experience very long waiting times.

(Emphasis is mine.) Barabasi is not one to shy away from grand claims of universality. As such, he epitomizes the thing that many of those outside of the discipline hate about physicists, i.e., their apparent arrogance. My opinion is that most physicists accused of intellectual arrogance are misunderstood, but that's a topic for another time.

Stouffer et al. responded a few months after Barabasi's original idea, as published in Nature, with the following (abstract):

In a recent letter, Barabasi claims that the dynamics of a number of human activities are scale-free. He specifically reports that the probability distribution of time intervals tau between consecutive e-mails sent by a single user and time delays for e-mail replies follow a power-law with an exponent -1, and proposes a priority-queuing process as an explanation of the bursty nature of human activity. Here, we quantitatively demonstrate that the reported power-law distributions are solely an artifact of the analysis of the empirical data and that the proposed model is not representative of e-mail communication patterns.

(Emphasis is mine.) In this comment, Stouffer et al. strongly criticize the data analysis that Barabasi uses to argue for the plausibility and, indeed, the correctness of his priority-based queueing model. I admit that when I first read Barabasi's queueing model, I thought that surely the smart folks who have been dealing with queueing theory (a topic nearly a century old!) knew something like this already. Even if that were the case, the idea certainly qualifies as interesting, and I'm happy to see a) the idea published, although Nature was likely not the appropriate place and b) the press attention that Barabasi has brought to the discipline of complex systems and modeling. Anyway, the heart of the data-analysis based critique of Barabasi's work lies in distinguishing two different kinds of heavy-tailed distributions: the log-normal and the power law. Because a heavy tail is an asymptotic property, these two distributions can be extremely difficult to differentiate when the data only spans a few orders of magnitude (as is the case here). Fortunately, statisticians (and occasionally, myself) enjoy this sort of thing. Stouffer et al. employ such statistical tools in the form of Bayesian model selection to choose between the two hypotheses and find the evidence for the power law lacking. It was quite dissatisfying, however, that Stouffer et al. neglected to discuss their model selection procedure in detail, and instead chose to discuss the politicking over Barabasi's publication in Nature.

And so, it should come as no surprise that a rejoinder from Barabasi was soon issued. With each iteration of this process, the veneer of professionalism cracks away a little more:

[Stouffer et al.] revisit the datasets [we] studied..., making four technical observations. Some of [their] observations ... are based on the authors' unfamiliarity with the details of the data collection process and have little relevance to [our] findings ... and others are resolved in quantitative fashion by other authors.

In the response, Barabasi discusses the details of the dataset that Stouffer et al. fixated on: that the extreme short-time behavior of the data is actually an artifact of the way messages to multiple recipients were logged. They rightly emphasize that it is the existence of a heavy tail that is primarily interesting, rather than its exact form (of course, Barabasi made some noise about the exact form in the original paper). However, it is not sufficient to simply observe a heavy tail, posit an apparently plausible model that produces some kind of such tail and then declare victory, universality and issue a press release. (I'll return to this thought in a moment.) As a result, Barabasi's response, while clarifying a few details, does not address the fundamental problems with the original work. Problems that Stouffer et al. seem to intuit, but don't directly point out.

A month ago, Suresh over at the Geomblog published a comment on the controversy by Michael Mitzenmacher (whose work I greatly enjoy) in which he touches briefly on the real issue here.

While the rebuttal suggests the data is a better fit for the lognormal distribution, I am not a big believer in the fit-the-data approach to distinguish these distributions. The Barabasi paper actually suggested a model, which is nice, although the problem of how to verify such a model is challenge... This seems to be the real problem. Trust me, anyone can come up with a power law model. The challenge is figuring out how to show your model is actually right.

That is, first and foremost, the bursty nature of human activity is odd and, in that alluring voice only those fascinated by complex systems can hear, begs for an explanation. Second, a priority-based queueing process is merely one possible explanation (out of perhaps many) for the heaviness and burstiness. The emphasis is to point out that there is a real difficulty in nailing down causal mechanisms in human systems. Often the best we can do is concoct a theory and see if the data supports it. That is, it is exceedingly difficult to go beyond mere plausibility without an overwhelming weight of empirical evidence and, preferably, the vetting of falsifiable hypotheses. The theory of natural selection is an excellent example that has been validated by just such a method (and continues to be). Unfortunately, simply looking at the response time statistics for email or letters by Darwin or Einstein, while interesting from the socio-historical perspective, does not prove the model. On the contrary: it merely suggests it.

That is, Barabasi's work demonstrates the empirical evidence (heavy-tails in the response times of correspondence) and offers a mathematical model that generates statistics of a similar form. It does not show causality, nor does it provide falsifiable hypotheses by which it could be invalidated. Barabasi's work in this case is suggestive but not explanatory, and should be judged accordingly. To me, it seems that the contention over the result derives partly from the overstatement of its generality, i.e., the authors claim their model to be explanatory. Thus, the argument over the empirical data is really just an argument about how much plausibility it imparts to the model. Had Barabasi gone beyond suggestion, I seriously doubt the controversy would exist.

Considering the issues raised here, personally, I think it's okay to publish a result that is merely suggestive so long as it is honestly made, diligently investigated and embodies a compelling and plausible story. That is to say that, ideally, authors should discuss the weakness of their model, empirical results and/or mathematical analysis, avoid overstating the generality of the result (sadly, a frequent problem in many of the papers I referee), carefully investigate possible biases and sources of error, and ideally, discuss alternative explanations. Admittedly, this last one may be asking a bit much. In a sense, these are the things I think about when I read any paper, but particularly when I referee something. This thread of thought seems to be fashionable right now, as I just noticed that Cosma's latest post discusses criteria for accepting or rejecting papers in the peer review process.

posted November 27, 2005 05:00 AM in Scientifically Speaking | permalink | Comments (0)

November 06, 2005

Finding your audience

Some time ago, a discussion erupted on Crooked Timber about the etiquette of interdisciplinary research. This conversation was originally sparked by Eszter Hargittai, a sociologist with a distinct interest in social network analysis, who complained about some physicists working on social networks and failing to appropriately cite previous work in the area. I won't rehash the details, since you can read them for yourself. However, the point of the discussion that is salient for this post is the question of where and how one should publish and promote interdisciplinary work.

Over the better part of this past year, I have had my own journey with doing interdisciplinary research in political science. Long-time readers will know that I'm referring to my work with Maxwell (here, here and here). In our paper (old version via arxiv), we use tools from extremal statistics and physics to think carefully about the nature and evolution of terrorism, and, I think, uncover some interesting properties and trends at the global level. Throughout the process of getting our results published in an appropriate technical venue, I have espoused the belief that it should either go to an interdisciplinary journal or one that political scientists will read. That is, I felt that it should go to a journal with an audience that would both appreciate the results and understand their implications.

This idea of appropriateness and audience, I think, is a central problem for interdisciplinary researchers. In an ideal world, every piece of novel research would be communicated to exactly that group of people who would get the most out of learning about the new result and who would be able to utilize the advance to further deepen our knowledge of the natural world. Academic journals and conferences are a poor approximation of this ideal, but currently they're the best institutional mechanism we have. To correct for the non-idealness of these institutions, academics have always distributed preprints of their work to their colleagues (who often pass them to their own friends, etc.). Blogs, e-print archives and the world wide web in general constitute interesting new developments in this practice and show how the fundamental need to communicate ideas will co-opt whatever technology is available. Returning to the point, however, what is interesting about interdisciplinary research is that by definition it has multiple target audiences to which it could, or should, be communicated. Choosing that audience can become a question of choosing what aspects of the work you think are most important to science in general, i.e., what audience has the most potential to further develop your ideas? For physicists working on networks, some of their work can and should be sent to sociology journals, as its main contribution is in the form of understanding social structure and implication, and sociologists are best able to use these discoveries to explain other complex social phenomena and to incorporate them into their existing theoretical frameworks.

In our work on the statistics of terrorism, Maxwell and I have chosen a compromise strategy to address this question: while we selected general science or interdisciplinary journals to send our first manuscript on the topic, we have simultaneously been making contacts and promoting our ideas in political science so as to try to understand how to further develop these ideas within their framework (and perhaps how to encourage the establishment to engage in these ideas directly). This process has been educational in a number of ways, and recently has begun to bear fruit. For instance, at the end of October, Maxwell and I attended the International Security Annual Conference (in Denver this year) where we presented our work in the second of two panels on terrorism. Although it may have been because we announced ourselves as computer scientists, stood up to speak, used slides and showed lots of colorful figures, the audience (mostly political scientists, with apparently some government folk present as well) was extremely receptive to our presentation (despite the expected questions about statistics, the use of randomness and various other technical points that were unfamiliar to them). This led to several interesting contacts and conversations after the session, and an invitation to both of us to attend a workshop in Washington DC on predictive analysis for terrorism that will be attended by people from the entire alphabet soup of spook agencies. Also, thanks to the mention of our work in The Economist over the summer, we have similarly been contacted by a handful of political scientists who are doing rigorous quantitative work in a similar vein as ours. We're cautiously optimistic that this may all lead to some fruitful collaborations, and ultimately to communicating our ideas to the people to whom they will matter the most.

Despite the current popularity of the idea of interdisciplinary research (not to be confused with excitement about the topic itself, which would take the form of funding), if you are interested in pursuing a career in it, like many aspects of an academic career, there is little education about its pitfalls. The question of etiquette in academic research deserves much more attention in graduate school than it currently receives, as does its subtopic of interdisciplinary etiquette. Essentially, it is this last idea that lies at the heart of Eszter Hargittai's original complaint about physicists working on social networks: because science is a fundamentally social exercise, there are social consequences for not observing the accepted etiquette, and those consequences can be a little unpredictable when the etiquette is still being hammered out as in the case of interdisciplinary research. For our work on terrorism, our compromise strategy has worked so far, but I fully expect that, as we continue to work in the area, we will need to more fully adopt the mode and convention of our target audience in order to communicate effectively with them.

posted November 6, 2005 01:15 PM in Simply Academic | permalink | Comments (1)

October 27, 2005

Links, links, links.

The title is perhaps a modern variation on Hamlet's famous "words, words, words" quip to Lord Polonius. Some things I've read recently, with mild amounts of editorializing:

Tim Burke (History professor at Swarthmore College) recently discussed (again) his thoughts on the future of academia. That is, what would it take for college costs to actually decrease? I assume this arises at least partially as a result of the recent New York Times article on the ever increasing tuition rates for colleges in this country. He argues that modern college costs rise at least partially as a result of pressure from lawsuits and parents to provide in loco parentis to the kids attending. Given the degree of hand-holding I experienced at Haverford, perhaps the closest thing to Swarthmore without actually being Swat, this makes a lot of sense. I suspect, however, that tuition prices will continue to increase apace for the time being, if only because enrollment rates continue to remain high.

Speaking of high enrollment rates, Burke makes the interesting point

... the more highly selective a college or university is in its admission policies, the more useful it is for an employer as a device for identifying potentially valuable employees, even if the employer doesn’t know or care what happened to the potential employee while he or she was a student.

This assertion betrays an assumption whose pervasiveness I wonder about. Basically, Burke is claiming that selectivity is an objective measure of something. Indeed, it is. It's an objective measure of the popularity of the school, filtered through the finite size of a freshman class that the school can reasonably admit, and nothing else. A huge institution could catapult itself higher in the selectivity rankings simply by cutting the number of students it admits.

Barabasi's recent promotion of his ideas about the relationship between "bursty behavior" among humans and our managing a queue of tasks to accomplish continues to generate press. New Scientist and Physics Web both picked up the piece of work on Darwin's, Einstein's and modern email-usage communication patterns. To briefly summarize from Barabasi's own paper:

Here we show that the bursty nature of human behavior is a consequence of a decision based queueing process: when individuals execute tasks based on some perceived priority, the timing of the tasks will be heavy tailed, most tasks being rapidly executed, while a few experience very long waiting times.

A.-L. Barabasi (2005) "The origin of bursts and heavy tails in human dynamics." Nature 435, 207.

That is, the response times are described by a power law with exponent between 1.0 and 1.5. Once again, power laws are everywhere. (NB: In the interest of full disclosure, power laws are one focus of my research, although I've gone on record saying that there's something of an irrational exuberance for them these days.) To those of you experiencing power-law fatigue, it may not come as any surprise that last night in the daily arXiv mailing of new work, a very critical (I am even tempted to say scathing) comment on Barabasi's work appeared. Again, to briefly summarize from the comment:

... we quantitatively demonstrate that the reported power-law distributions are solely an artifact of the analysis of the empirical data and that the proposed model is not representative of e-mail communication patterns.

D. B. Stouffer, R. D. Malmgren and L. A. N. Amaral (2005) "Comment on The origin of bursts and heavy tails in human dynamics." e-print.
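For readers who want to see the mechanism in Barabasi's summary at work, here is a minimal sketch of a priority-queue model in that spirit. The details are my assumptions rather than a reproduction of the paper's exact setup: a fixed-length task list, uniformly random priorities, and a parameter p for how often the highest-priority task is the one executed (otherwise a task is picked at random).

```python
import random

def simulate_priority_queue(steps, list_size=2, p=0.99, seed=0):
    """Return the waiting time of the task executed at each step."""
    rng = random.Random(seed)
    # Each task is a pair: (priority, step at which it joined the list).
    tasks = [(rng.random(), 0) for _ in range(list_size)]
    waits = []
    for now in range(1, steps + 1):
        if rng.random() < p:
            # Usual case: execute the task with the highest priority.
            idx = max(range(list_size), key=lambda i: tasks[i][0])
        else:
            # Occasionally execute a task chosen uniformly at random.
            idx = rng.randrange(list_size)
        waits.append(now - tasks[idx][1])
        # Replace the executed task with a fresh one of random priority.
        tasks[idx] = (rng.random(), now)
    return waits

waits = simulate_priority_queue(steps=20000)
ordered = sorted(waits)
print("median wait:", ordered[len(ordered) // 2])
print("longest wait:", ordered[-1])
```

Most tasks are executed almost immediately, while a low-priority task can sit on the list for a long time. Whether the resulting waiting-time distribution is best called a power law is exactly what the comment disputes, and this sketch does not settle that.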

There are several interesting threads embedded in this discussion, the main one being on the twin supports of good empirical research: 1) rigorous quantitative tools for data analysis, and 2) a firm basis in empirical and statistical methods to support whatever conclusions you draw with the aforementioned tools. In this case, Stouffer, Malmgren and Amaral utilize Bayesian model selection to eliminate the power law as a model, and instead show that the distributions are better described by a log-normal distribution. This idea of the importance of good tools and good statistics is something I've written on before. Cosma Shalizi is a continual booster of these issues, particularly among physicists working in extremal statistics and social science.
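As a rough illustration of what comparing a power law against a log-normal involves, here is a small sketch. It is not the Bayesian model selection used in the comment; it is a cruder stand-in that fits each model by maximum likelihood and compares the maximized log-likelihoods. The data are synthetic (drawn from a log-normal, an assumption made so the right answer is known in advance), and the power law's lower cutoff is simply set to the smallest observation.

```python
import math
import random

def lognormal_loglik(xs):
    """Maximized log-likelihood of a log-normal fit (MLE for mu and sigma^2)."""
    logs = [math.log(x) for x in xs]
    n = len(logs)
    mu = sum(logs) / n
    var = sum((v - mu) ** 2 for v in logs) / n
    # log density: -ln(x) - 0.5*ln(2*pi*var) - (ln(x) - mu)^2 / (2*var)
    return sum(-v - 0.5 * math.log(2 * math.pi * var)
               - (v - mu) ** 2 / (2 * var) for v in logs)

def powerlaw_loglik(xs):
    """Maximized log-likelihood of a continuous power law on [xmin, inf)."""
    xmin = min(xs)  # assumption: cutoff fixed at the smallest observation
    n = len(xs)
    s = sum(math.log(x / xmin) for x in xs)
    alpha = 1 + n / s  # MLE of the exponent
    # log density: ln((alpha - 1) / xmin) - alpha * ln(x / xmin)
    return n * math.log((alpha - 1) / xmin) - alpha * s

rng = random.Random(1)
data = [rng.lognormvariate(0.0, 1.0) for _ in range(2000)]  # synthetic sample

ll_lognormal = lognormal_loglik(data)
ll_powerlaw = powerlaw_loglik(data)
print("log-normal log-likelihood:", ll_lognormal)
print("power-law log-likelihood:", ll_powerlaw)
```

The point is only that the choice between the two models can be made quantitatively rather than by eyeballing a straight-ish line on a log-log plot.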

And finally, Carl Zimmer, always excellent, on the evolution of language.

[Update: After Cosma linked to my post, I realized it needed a little bit of cleaning up.]

posted October 27, 2005 01:23 AM in Thinking Aloud | permalink | Comments (0)

September 29, 2005

Networks in our nation's capital

This past week, I attended the Statistics on Networks workshop at the National Academies of Science in Washington DC, where I saw many familiar faces and many new ones. In particular, I was very happy to finally meet Jon Kleinberg, John Doyle, Steve Borgatti and my collaborator Dimitris Achlioptas. And it was nice to see Walter Willinger and Chris Wiggins again, both of whom I met at the MSRI workshop on networks earlier this year. And naturally, it was nice to see my collaborator Mark Newman again, even though we correspond pretty regularly. Now that I've distributed the appropriate linkage for the search engines, let me get on with my thoughts.

This workshop was interesting for a couple of reasons. First, the audience contained statisticians, social scientists, computer science/physics people, and engineers/biologists. Certainly the latter two groups presented very different perspectives on networks, with the former being interested in universality properties and random models of networks, while the latter was much more interested in building or decomposing a particular kind or instance of a network. The social scientists present (and there were many of them) seemed to have a nicely balanced perspective on the usefulness of random models, with perhaps a slight leaning toward the computer science/physics side. Naturally, this all made for interesting dinner and wrap-up discussion. For myself, my bias is naturally in the direction of appreciating models that incorporate randomness. However, it's true that when translated through a particular network model, randomness can itself generate structure (e.g., random graphs with power law degree distributions tend to have a densely connected core of high degree vertices, a structure that is a poor model for the core of the internet, where mixing is disassortative). In the case of real world networks, I think random models yield the most benefit when used to explore the space of viable solutions to a particular constraint or control problem. Eve Marder's work (also at the workshop) on small networks of self-regulating neurons (in this case, those of the lobster gut) is a particularly good example of this approach.

Second, although there were very few graduate students in attendance (I counted three, myself included), the environment was friendly, supportive and generally interesting. The workshop coordinators did a good job of inviting people doing interesting work, and I enjoyed just about all of the talks. Finally, it was interesting to see inside the National Academies a little. This institution is the one that fulfills the scientific inquiries of Congress, although I can't imagine this Congress listens to its scientists very much.

posted September 29, 2005 09:43 PM in Simply Academic | permalink | Comments (0)

August 30, 2005

Reliability in the currency of ideas

The grist of the scientific mill is publications - these are the currency that academics use to prove their worth and contributions to society. When I first dreamt of becoming a scientist, I rationalized that while I would gain less materially than certain other careers, I would be contributing to society in a noble way. But what happens to the currency when its reliability is questionable, when the noblesse is in doubt?

A recent paper in the Public Library of Science (PLoS) Medicine by John Ioannidis discusses "Why most published research findings are false" (New Scientist has a lay-person summary available). While Ioannidis is primarily concerned with results in medicine and biochemistry, his criticism of experimental design, experimenter bias and scientific accuracy likely applies to a broad range of disciplines. In his own words,

The probability that a research claim is true may depend on the study power and bias, the number of other studies on the same question, and, importantly, the ratio of true to no relationships among the relationships probed in each scientific field.

Ioannidis argues that the current reliance upon the statistical significance p-value in only one direction, i.e., is the chance that the observed data is no different than the null hypothesis measured to be less than some threshold (typically, chance less than 1 in 20), is a dangerous precedent as it ignores the influence of research bias (from things such as finite-size effects, hypothesis and test flexibility, pressure to publish significant findings, etc.). Ioannidis goes on to argue that scientists are often careless in ruling out potential biases in data, methodology and even the hypotheses tested, and that replication by independent research groups is the best way of validating research findings as they constitute the most independent kind of trial possible. That is, confirming an already published result is at least as important as the original finding itself. Yet, he also argues that even then, significance may simply represent broadly shared assumptions.

... most research questions are addressed by many teams, and it is misleading to emphasize the statistically significant findings of any single team. What matters is the totality of the evidence. Diminishing bias through enhanced research standards and curtailing of prejudices may also help. However, this may require a change in scientific mentality that might be difficult to achieve.
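The dependence on power, bias and the ratio of true to no relationships can be made concrete with a small calculation. The sketch below is my own bookkeeping from first principles, not code from the paper: R is the pre-study odds that a probed relationship is true, alpha is the significance threshold, power is the chance of detecting a true relationship, and bias is the fraction of would-be non-findings that get reported as findings anyway. The function name and default values are assumptions for illustration.

```python
def ppv(R, alpha=0.05, power=0.8, bias=0.0):
    """Probability that a claimed (statistically significant) finding is true."""
    beta = 1.0 - power  # chance of missing a true relationship
    # Claimed findings among true relationships (weight R):
    # detected ones, plus missed ones that bias turns into claims.
    true_claims = R * ((1 - beta) + bias * beta)
    # Claimed findings among null relationships (weight 1):
    # false alarms, plus correct negatives that bias turns into claims.
    false_claims = alpha + bias * (1 - alpha)
    return true_claims / (true_claims + false_claims)

# Well-powered study of a plausible hypothesis (even odds), no bias.
print("even odds, good power:", ppv(R=1.0))
# Exploratory field: 1 true relationship per 10 nulls, weak power.
print("long odds, weak power:", ppv(R=0.1, power=0.2))
# The same well-powered study, with a substantial amount of bias.
print("even odds, with bias:", ppv(R=1.0, bias=0.3))
```

With long odds and weak power, a significant result is more likely false than true even before any bias enters, which is the sense of the paper's title.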

In the field of complex systems, where arguably there is a non-trivial amount of pressure to produce interesting and, pardon the expression, universal results, Ioannidis's concerns seem particularly relevant. Without beating the dead horse of finding power laws everywhere you look, shouldn't we who seek to explain the complexity of the natural (and man-made) world through simple organizing principles be held to exacting standards of rigor and significance? My work as a referee leads me to believe that my chosen field has insufficiently indoctrinated its practitioners as to the importance of experimental and methodological rigor, and of not over-generalizing or over-stating the importance of your results.

Ioannidis, J. P. A. (2005) "Why most published research findings are false." PLoS Med 2(8):e124

posted August 30, 2005 10:13 AM in Simply Academic | permalink | Comments (0)

July 26, 2005

Global patterns in terrorism; part III

Neil Johnson, a physicist at Lincoln College of Oxford University, with whom I've been corresponding about the mathematics of terrorism for several months, has recently put out a paper that considers the evolution of the conflicts in Iraq and Colombia. The paper (on arxiv, here) relies heavily on the work Maxwell Young and I did on the power law relationship between the frequency and severity of terrorist attacks worldwide.

Neil's article, much like our original one, has garnered some attention among the popular press, so far yielding an article at The Economist (July 21st) that also heavily references our previous work. I strongly suspect that there will be more, particularly considering the July 7th terrorist bombings in London, and Britain's continued conflicted relationship with its own involvement in the Iraq debacle.

Given the reasonably timely attention these analyses are garnering, the next obvious step in this kind of work is to make it more useful for policy-makers. What does it mean for law enforcement, for the United Nations, for forward-looking politicians that terrorism (and, if Neil is correct in his conjecture, the future of modern armed geopolitical conflict) has this stable mathematical structure? How should countries distribute their resources so as to minimize the fallout from the likely catastrophic terrorist attacks of the future? These are the questions that scientific papers typically stay as far from as possible - attempting to answer them takes one out of the scientific world and into the world of policy and politics (shark infested waters for sure). And yet, in order for this work to be relevant outside the world of intellectual edification, some sort of venture must be made.

posted July 26, 2005 12:57 AM in Scientifically Speaking | permalink | Comments (0)

May 21, 2005

The inter-disciplinary politics of interdisciplinary research or, "Hey, that was my idea first."

A few days ago, Eszter Hargittai posted a rant on the joint-blog Crooked Timber about the entrée of physicists into the sociological subfield of social networks and her perception of their contributing mostly nothing of value. Her entry was prompted by this paper about the EuroVision Contest. I learned about the entry first when she reproduced it on the social networking listserv SOCNET; a list on which I lurk mostly because I'm too cheap to pay the membership fee and also because I mainly use it as a way to collect journal references for sociology literature. References which I imagine to myself that I'll read or use one day, although given the poor job I'm currently doing at keeping up with the recent papers in my own field, I may realistically never get around to. (This point is salient, and I'll return to it momentarily.) In the ensuing and relatively lively debate in the post's comments section, someone called for and then received attention from friend Cosma Shalizi, who blogs his own thoughts on the subject in his usual lengthy, heavily cross-referenced and edifying way.

Several meta-commentary thoughts come immediately to mind:

1. Cosma's points are extremely thoughtful and are likely right on the money in terms of seeing the merits of both physicists' contributions to social sciences and the argument of their reinvention of wheels. Most relevant to the rant about physicists not contributing anything of value to the field of social networks, he gives four excellent and broad examples of how physicists have added to our knowledge.

2. One of these points, which bears rehashing here, is that physicists are not just interested in social networks (it unfortunately illustrates the irony of the sociologists' claims of academic injustice that this observation is absent from their complaints). Physics training, and particularly that of statistical mechanics, the subfield that most physicists interested in social networks hail from, emphasizes that items of inquiry can, to as great an extent as possible, be treated as interchangeable. Thus, complex networks is the idea that social networks are just one kind of network. The progress physicists have made in carving out the field of complex networks has been somewhat spotty, perhaps because of their not knowing entirely how much of statistical mechanics to import and how much of a reliance on numerical simulation is reasonable (this touches on a related point, that there is not a firm consensus on how computational modeling and simulation should be incorporated into science to the same degree that theory and empiricism have been). If they have been arrogant toward other fields in their attempts to do this, then they should be chastised through letters to the editor of the journals that publish the offending articles. With regard to the EuroVision Contest article, Eszter Hargittai and Kieran Healy's best recourse is to write such a letter to Physica A illustrating that the work is not novel.

3. A point which Cosma omits in his list is the connection to social network analysis, via complex network analysis, of a large body of mathematical techniques from physics such as percolation theory (he does point out the contribution via network epidemiology), group renormalization, random graph theory, ideas of entropy and techniques for modeling dynamic systems. I may be wrong on these contributions, since I will easily admit that I don't read enough sociology literature. (Update: Cosma notes that sociologists and affiliated statisticians were familiar with Erdos-Renyi random graph theory before the physicists came along.)

4. There's a deeper issue at play here, which Cosma has also discussed (his prolificness is truly impressive, even more so given its high quality). Namely, that there are more physicists than there is funding (or interest?) for physics problems. While I was at Haverford, one of my physics professors told me, without a hint of a smile, that in order to get a job in traditional physics, you basically had to work at one of the national laboratories, work at a particle accelerator laboratory, or work in condensed matter physics. None of these seemed particularly appealing, yet the ideas and approaches of physics were. So, it is perhaps entirely expected that folks in positions similar to mine eventually branch out into other fields. This is, after all, the nature of interdisciplinary research, and physicists (along with mathematicians and, to a lesser degree, chemists) seem particularly well-equipped for this kind of adventure. With the rising emphasis among both funding agencies and universities on interdisciplinary research (which may or may not be simply lip-service), the future likelihood of inter-disciplinary ego-bruising seems high.

5. Obviously, in any scientific endeavor, interdisciplinary or otherwise, scientists should understand the literature that came before (I dislike the term "master", because it implies an amount of time-commitment that I think few people can honestly claim to have spent with literature). In my recent referee work for Physical Review E, I have routinely chastised authors for not writing better introductions that leave a reader with a firm understanding of the context (past and present) in which the fundamental questions they seek to address sit. When it comes to interdisciplinary work, these problems are particularly acute; not only do you have multiple bodies of literature to quickly and succinctly review, but you must also do so in a way accessible to the members of each field. Some (but, by no means, all) physicists are certainly guilty of this when it comes to writing about social networks, as they are prone to reinventing the wheel. The most egregious example of which is the preferential attachment model of Barabasi and Albert, but it can (and should) be argued that this reinvention was extremely valuable, as it helped spark a wide degree of interest in the previous work and has prompted some excellent work on developing that idea since. So, the fundamental question that I think all of us who claim to be interdisciplinary must face and ultimately answer (in a way that can be communicated to future generations of interdisciplinary researchers, many of whom are in college right now) is, What is the most principled and reasonable way, given the constraints on attention, energy, time, knowledge, intelligence, etc., to allocate proper recognition (typically via citations and coauthorships) to previous and on-going work that is relevant to some interdisciplinary effort?

Or, more succinctly, what's the most practical way to mitigate the inter-disciplinary politics of interdisciplinary research while encouraging it to the fullest extent possible? Closely related are questions about adequately evaluating the merit of research that does not fall squarely within the domain of a large enough body of experts for peer-review. As is the question of how academic departments should value interdisciplinary researchers and what role they should fill in the highly compartmentalized and territorial realm of academic disciplines.

Manual TrackBack: Three-toed Sloth

posted May 21, 2005 04:09 PM in Simply Academic | permalink | Comments (1)

April 21, 2005

Social and anti-social

Over the past week, I have attended a workshop in my field. The workshop is relatively small, although the number of people registered is over 100. What's interesting is the degree to which the social structure of the workshop resembles high school. At the core, you have the popular kids, who have known and worked with each other for years. Their primary interest is in seeing their friends and talking about possible new collaborations. Surrounding this inner-circle is a set of groupies, who know the popular kids, but don't quite have the common history to be considered part of it. Surrounding that group is a possibly larger one of people who are just beginning their social climb in this hierarchy. These are often graduate students, or people who are just moving into the field.

In retrospect, this kind of hierarchy seems entirely natural, especially when you consider that smart/good people have limited time and would likely rather work with other known smart/good people than spend the time cultivating new contacts of unknown quality. The trouble, of course, is that the casual preference for old friendships will tend to lead to perceived exclusivity. That is, if no effort is made to keep the social circles open, they will naturally close.

For a while now, I've been mulling over the assumptions and stereotypes of academia/research being either a social or anti-social endeavor. Some recent thoughts: While certainly there are parts of research that are extremely collaborative, there's a great deal of it where you sit alone, thinking about something that few other people in the world are interested in. The peer-review process is, on the surface, fairly objective, yet the common single-blindedness of the review process makes it easy for reputation to substitute for quality. The job-search venue appears to be at least as much about who you know as about how good your work is - letters of introduction and reference from known people are often enough (or a requirement) to get a job in a specific field. This part would seem to make it harder for interdisciplinary people to get jobs in more traditional departments; something I'm slightly nervous about. And then, the conference world is largely run by pure social dynamics, with all the trappings of high school mixers, albeit obfuscated, unacknowledged or perhaps slightly ameliorated.

This is, of course, not to say that anyone is going to get a "swirly", or have their lunch money taken away from them. Academics are much too polite for that. But in the ultra-rational world of academia, there are certainly equivalents. I can't imagine that the business world is any better, and indeed, may be significantly worse. Perhaps this is just how human organizations operate: selfishly, irrationally and in a largely ad hoc manner...

posted April 21, 2005 12:34 PM in Simply Academic | permalink | Comments (1)

April 15, 2005

Academia trips over own hubris

It was only a matter of time before cheeky computer science students (from MIT, no less), perhaps inspired by the success of the ever witty and popular R. Robot's random blogging, developed a tool for creating random computer science papers (text, graphs, and citations). One of these random papers was accepted at WMSCI 2005.

What is WMSCI? In the traditionally easy-to-understand language of conference mission statements, it is

an international forum for scientists and engineers, researchers and, consultants, theoreticians and practitioners in the fields of Systemics, Cybernetics and Informatics.

Obviously. Perhaps for an encore, the students should host a randomly generated conference. To add layer upon layer of hubris to the embarrassment, the conference organizers defended their acceptance of the random paper. Academia rarely gets so tangled in its own contradictions...

Update: Lance Fortnow has an interesting take on the random paper: it's equivalent to academic fraud. His readers, however, seem to think that the prank is more akin to a validation of the review process at the SCI conference (which the conference failed).

posted April 15, 2005 02:59 AM in Simply Academic | permalink | Comments (0)

March 05, 2005

Running a conference

For the past eight months, I've been heavily involved in organizing a "mini" conference within my department. Originally hatched as a way to get graduate students to talk to each other about their research (and similarly to make professors aware of research being done by other groups in our department), it was supposed to replace the long dead "graduate tea" series that used to fill the same role on a weekly basis. And so, the other officers of the Computer Science Graduate Student Association and I decided to try to make this event as realistic a conference as possible, complete with a review committee, research talks, a poster session, a keynote address and all the trim.

After hundreds of hours of work, many meetings, a couple of free lunches (thank you CSGSA), the conference actually happened yesterday, Friday March 4th. We had 60+ attendees and 20+ presenters (about 10 talks, and 15 posters), a keynote address by Orran Krieger from IBM Research in New York, lots of free food courtesy of sponsorship from Sandia National Labs, and generally a really successful mini-conference. We even got a couple of nice emails from the faculty after-the-fact, thanking us for putting on the event. (About half of them showed up at some point (a few even stayed the entire day), and several sent nice apologies for not attending; it would have been nice to have seen all of them show, as a kind of voting-with-their-feet support for students and their research. I guess you can't win them all...) Orran said something very nice about the conference while I was chatting with him before the keynote - he said that his graduate department at Toronto would never have had something like this, which brought together so many people from such divergent aspects of computer science. Perhaps we really did do something unusual.

Having been the general chair for the mini-conference, I can safely say that organizing one of these things is a highly non-trivial task. Duh. Mostly, the pain of doing it revolves around coordinating people, setting time-lines and doing basic logistics, since you rely on other people to provide the content for the event. Being the general chair is a bit like being a potter - using only your hands, you have to mold a hunk of rapidly rotating wet clay (which basically wants to fly apart and get everything, including you, very messy) into a coherent, balanced and pleasing form, all before the water evaporates... :) For this kind of event, I'm very grateful that I'd done some things very similar in a previous life at Haverford College, when I was deeply involved in the Customs Program (a.k.a., the freshman orientation and residential advising program). It's definitely true that the more of this kind of thing you do, the easier the next one becomes. You're less scatter-brained, less fatigued, less frustrated, more likely to cover all the bases, more likely to manage the micro-crises that always pop-up, more likely to make good logistical decisions, etc. Yet, there is no part of me that wants to do this kind of thing for a living. It's fun occasionally, as a pleasant change of pace, but there is nothing so mind-numbing as logistics and endless massaging of egos to get things done.

In the next few months, I'll be attending both a high-powered workshop at the Mathematical Sciences Research Institute (MSRI) in Berkeley and the ACM Symposium on Theory of Computing (STOC) in Baltimore. I have much respect for the people who organize these large-scale events, since they can have hundreds of submissions, hundreds of attendees and budgets orders of magnitude larger than ours. But knowing myself, and my apparent complete inability to stay away from organizing things (indeed, I seem to have an almost compulsive desire to reshape my environment to suit my egotistical beliefs/desires), I'm a bit fearful for the day that I'll actually want to organize something so large!

But for now, it's nice to have another small line on my c.v., but more importantly to have added one more interesting life-experience to my history. Next on my list of life-experiences: a two week trip to Japan later this month.

posted March 5, 2005 02:37 PM in Simply Academic | permalink | Comments (1)

January 24, 2005

The Dark Underbelly

Fear and Loathing are not words that you typically associate with people engaged in research. Things like Serious and Measured, or even, for some people, Creative and Dramatic. I recently had a pair of extremely unpleasant experiences, in which the guilty, who shall remain nameless, exhibited all the open-mindedness and aplomb of a jealous and insecure thirteen year old. What on earth causes grown men, established academics no less, to behave like this?

Academic research, although it pretends to be a meritocracy, uses social constructs like reputation, affiliation and social-circles as a short-hand for quality. This is the heart of how we can avoid reading every paper or listening to every presentation with a totally open mind - after all, if someone has produced a lot of good work before, that's probably a pretty good indicator that they'll do it again. "The best predictor of the future is past behavior." Unfortunately, these social constructs eventually become themselves elements of optimization in a competitive system, and some people focus on them in lieu of doing good work. This, I believe, was the root cause of the overt and insulting hostility I experienced.

Ultimately, because everyone has a finite amount of time and energy, you do have to become more choosy about whom you collaborate with and what ideas you push on. But if everyone only ever did things that moved them "up" in these constructs, no one would ever work with anyone else. What's the point of being an intellectual if it's all turf wars and hostility? Shouldn't one work on things that bring pleasure instead of a constant stream of frustration over poor prestige or paranoia over being scooped? Shouldn't the whole point of being supported by the largess of society be to give as much back as possible, even if this means occasionally not being the most famous or not the guy who breaks the big news?

Maybe these guys don't, but I sure do.

posted January 24, 2005 10:58 AM in Rant | permalink | Comments (0)

January 21, 2005

Reality Distortion Fields

Charisma, they may call it. Jealousy being their reaction, while their disdain becomes a weapon of their retribution. Such are the slings and arrows of being both successful and unconventional within academia.

Some people (and institutions) are naturally media hounds. They thrive on the attention and, in turn, the attention drives them toward generating more of the same. For people, we call this "drama" and them "drama queens", but for institutions, we don't for some reason. But you have to admire places like the MIT Media Lab, which consistently pursues a radical vision of the future, despite disdain from the more traditional (provincial?) halls of the academy. Unfortunately, this is no surprise considering America's long tradition of love-hate for the people that the famous Chiat/Day advertising campaign for Apple Computer hailed when it said "the people who are crazy enough to think they can change the world are the ones who usually do." The tech boom of the 1990s seemed to suggest a cultural détente between the forces of tradition and the forces of freakdom, but in the increasingly conservative environment of today, we seem less accommodating.

I have been here for a week now, soaking up the cultural vibe that spilleth over so copiously. Surrounded by passionate people, clashing colored facades, ubiquitously snaking computer cables and omnipresent flashing monitors, the Media Lab feels like a perpetual start-up company that never has to go public or grow into a curmudgeonly hierarchy. As I sit now in a third floor office attached to the Borg Lab (a.k.a. the wearable computing lab), I think I have a sense of what makes this place special, what makes this place tick and why it both deserves and preserves the professional envy it receives. I remember that when I asked one of my professors at my alma mater about the Media Lab, which I was considering for graduate school, he demurred by saying that they were very creative people who often do pretty outlandish research.

Perhaps he didn't realize how accurate he was being - creative and outlandish are exactly what make the Media Lab unique, and exactly what attracts smart students and faculty bent on changing the world. Although they certainly do research, the pretty strange topics they explore could be more accurately described as "creative engineering".

With an emphasis on demo-able projects that can be shown-off to the corporate sponsors who keep the Lab flush with money, it's natural that there is both a degree of competition as to who can have the most flashy demo, and a natural drive toward creating the applications of technology that will define the future. Truly, the Media Lab is an outsourced research and development center, primed with the passions and ambitions of smart people in love with the possibility of changing the world through technology.

posted January 21, 2005 02:00 AM in Simply Academic | permalink | Comments (0)

January 16, 2005

The Democratization of the Academy

While news surfing the Web today, I came across an article on Slate about the decline of the real prestige that an Ivy League education garners within the business world. The article builds off of a recent paper by two Wharton School economists who chart the decline in the number of Ivy League degrees among the business executives in the Fortune 100 over the last 20 years. Although the Slate article is interesting, the paper itself yields some great insights:

"In 2001, ... executives were younger, more likely to be women, and less likely to have been Ivy League educated. Most important, they got to the executive suite about four years faster than in 1980 and did so by holding fewer jobs on the way to the top. (In particular, women in 2001 got to their executive jobs faster than their male counterparts -- there were no women executives in the Fortune 100 list in 1980)."

Although I'm less concerned in general with the business world side of this discussion, it closely mirrors an issue which sometimes seems painfully important to me as a graduate student at a public university that is not considered to be an elite institution. If the business world data supports an ending of the Ivy League hegemony, then one may wonder if the same is also happening within academia itself. Is the meritocratic, yet oddly idealistic dream coming true that one's worth in the academy will be based wholly on the work one has produced and not on the institution's name attached to one's resume?

Somehow, I don't think news of this revolution has reached the ears of the hiring committees at the elite institutions, but I'll leave that discussion for another entry. A narcissistic article published in Physical Review, and covered for popular consumption by the New York Times, documents the rise of scientific publications and Nobel prize winners coming from outside the U.S. The self-absorbed U.S. media reported this observation negatively, as being representative of the diminishing pre-eminence of U.S. science. I viewed it more optimistically: it would seem that the world community is becoming more active in science and that we may, in fact, be witnessing the forces of democracy assaulting the ivory towers themselves.

But what are the prospects of a talented post-graduate bearing a non-prestigious degree? My advisor frequently tries to deflect my concern about such prospects, saying that in the past 20 to 30 years, a significant trend in academia has been gaining momentum.

During this time, he sagely counsels me, a lot of great people have ended up at places that used to be not so great. And now, it's not so important where you went as much as who you worked for and what you produced.

In support of this egalitarian sentiment, when I served on the faculty search committee in my department in Spring 2004, I observed something surprisingly hopeful, something which I can only hope is an ascendant practice among hiring committees, although given my own previous experience at a prestigious institution, I'm not sure the forces of democracy have done much to assail the bastions of the elite. When we on the search committee looked at a candidate's resume, if they had graduated from an elite institution, we applied stricter standards and, generally, considered the list of publications to be paramount to their value.

"Given that they had all these resources available to them, what did they do with their time?", we asked.

"This person was in a really good lab at a really good school, but look at this small/weak publication list".

"This person has great publications," someone would say, without ever mentioning the school they went to.

So, despite occasional bouts of prestige-envy toward my fellows at MIT, Yale, Columbia, Berkeley and Stanford, I now nurture the slight optimism that the academy may be maturing into the meritocratic utopia that it pretends to be. Of course, the competing trends of the corporatization of universities and the down-conversion of tenure-track positions to part-time adjunct positions may mean this positive note is ultimately squelched before it can become widespread.

posted January 16, 2005 12:35 AM in Simply Academic | permalink | Comments (0)