Design and Analysis of Algorithms (CSCI 5454)
Back from paternity leave last semester, I am returning to the fire this semester with my large graduate-level algorithms class, Design and Analysis of Algorithms (CSCI 5454). This year, I am again shocked to find that it is the largest grad class in my department, with 60 enrolled students and 30 students on the waiting list. In the past, enrollment churn at the beginning of the semester has been high enough that everyone on the wait list has had an opportunity to take the class. I am not sure that will be true this year. This year will also be the last time I teach it for a while.
My version of this course covers much of the standard material in algorithms, but with a few twists. Having tinkered considerably with the material a year ago, I'm going to tinker a little less this year. That being said, improving is a never-ending process and there are a number of little things on the problem sets and lectures that I want to try differently this time. The overall goals remain the same: to show students how different strategies for algorithm design work (and sometimes don't work), to get them thinking rigorously about boundary cases, to get them to reason carefully about whether an algorithm is correct, and to introduce them to several advanced topics and algorithms they might encounter out in industry.
For those of you interested in following along, I will again be posting my lecture notes on the class website.
 Caused in part by some students enrolling in the belief that the class will be easy, and then dropping it when they realize that it requires real effort. To be honest, I am okay with this, although it does make for scary enrollment numbers on the first day.
2012: a year in review
This is probably it for the year, so here's a look back at 2012, by the numbers.
Papers published (or accepted): 4 (the pipeline is moving again)
Pre-prints posted on the arxiv: 5
Other publications: 0
Papers currently under review: 3
Manuscripts near completion: 7
Rejections: 9 (includes rejection without review)
New citations to past papers: 1495 (+20% over 2011)
Projects in-the-works: too many to count
Half-baked projects unlikely to be completed: already forgotten
Papers read: >202
Research talks given: 9
Invited talks: 8
Visitors hosted: 1
Conferences, workshops organized: 0
Conferences, workshops, summer schools attended: 4
Number of those at which I delivered a research talk: 3
Number of times other people have written about my research: >11
Number of times Nate Silver wrote about my research: 1 (cool)
Number of interviews about my research: 1
Students advised: 13 (7 PhD, 1 MS, 3 BS; 2 rotation students)
Students graduated: 1 MS
Thesis/dissertation committees: 6
Number of recommendation letters written: 10
Summer school faculty positions: 2
University courses taught: 1 (repeated)
Students enrolled in said courses: 51 grad
Number of problems assigned: 85
Pages of student work graded: 5282 (roughly 103 pages per student, with 2 graders)
Number of class-related emails received: >1174
Number of conversations with the faculty honor council liaison: 0
Guest lectures for colleagues: 0
Proposals refereed for grant-making agencies: 15
Manuscripts refereed for various journals, conferences: 16
Words written per referee report: 1528 (average)
Referee requests declined: 50 (-4% over 2011)
Conference program committees: 1
Grant proposals submitted: 6 (totaling $2,212,343)
Proposals rejected: 2
New grants awarded: 3 (totaling $1,379,260)
Proposals pending: 3
New proposals in the works: 2
Emails sent: >8061 (+9% over 2011)
Emails received (non-spam): >15503 (+18% over 2011)
Fraction about work-related topics: 0.89 (same as 2011)
Emails received about power-law distributions: 146 (3 per week)
Unique visitors to professional homepage: 31,000
Hits overall: 79,000
Fraction of visitors looking for power-law distributions: 0.63 (wow)
Unique visitors to blog: 11,500
Hits overall: 18,000
Most popular blog post among those visitors: Our ignorance of intelligence (from 2005)
Blog posts written: 14 (-30% from last year)
Most popular 2012 blog post: A crisis in higher education?
Number of twitter accounts: 1, my first (I blame peer pressure)
Retweets: 330ish (remarkably)
New followers on Twitter: >346 (astonishingly)
Number of computers purchased: 1
Movies/shows via Netflix: 32 DVDs, 100 instant
Books purchased: 11
Songs added to iTunes: 148
Photos added to iPhoto: 874
Jigsaw puzzle pieces assembled: >5,000
Major life / career changes: 2 (see next two entries)
Houses purchased: 1
Babies: 1, naturally born
Semesters of paternity leave: 1
Photos taken of baby so far: >683 (about 5 per day)
Fun trips with friends / family: 9
Half-marathons completed: 0
Trips to Las Vegas, NV: 1
Trips to New York, NY: 0
Trips to Santa Fe, NM: 7
States visited (in the US): 4
Foreign countries visited: 1 (Germany)
Other continents visited: 1
Airplane flights: 22
Here's to a great year, and hoping that 2013 is even better.
BioFrontiers is hiring
And, in more hiring news, the BioFrontiers Institute here at the University of Colorado Boulder is hiring new tenure-track faculty this year. One of the great things about this hiring line is that the candidate basically gets to select which of nine departments (including Computer Science, Physics, Applied Mathematics, and a variety of biology departments) they want to call home. Another is that there are genuinely interesting and exciting things happening in research and education within it. In short: it's a great place to work.
We invite applications for a tenure-track faculty position from candidates seeking to develop an innovative research program that addresses significant problems in biology or medicine at the interface with the physical sciences, mathematics, and/or computer science. Researchers in the area of biological engineering are asked to look for an advertisement next year, when we anticipate searching for faculty in that area.
Be an Omidyar Fellow at the Santa Fe Institute
Having spent four years at SFI as an Omidyar Fellow before coming to the University of Colorado, Boulder, I feel comfortable asserting that this fellowship is one of the best in higher education: it provides several years of full funding to work on your own big ideas about complex systems and to collaborate with other SFI researchers.
I also feel comfortable saying that SFI tends to prefer young scholars who are already fairly independent, as being an Omidyar Fellow is a lot like being a junior faculty member, except without the teaching responsibilities. No one tells you what to work on. If you have your own interests and your own ideas, this is ideal. SFI also tends to prefer young scholars with a strong quantitative background. If you plan to apply, I recommend writing a fresh research statement (rather than recycling the one you send to university positions) that focuses on your ideas about complex systems and your independent research plans for the next few years.
To help explain just how great a position it is, SFI recently put together a series of short videos, which you can find here.
Deadline: November 1
Wanting to hire
Postdoctoral Fellowship in Study of Networks
Along with Cris Moore, I am looking to hire a postdoc in the area of complex networks and statistical inference. There are two such positions available, one located at the Santa Fe Institute (working with Cris) and one at the University of Colorado Boulder (working with me). Both are funded by a grant from DARPA to investigate the use of generative models to perform statistical inference in complex networks. The larger team includes Mark Newman at the University of Michigan, and there will be ample opportunity to travel and collaborate among the three institutions.
The grant has a particular emphasis on community detection methods, including methods for detecting changes in community structure in dynamic graphs; functional groups that are not merely "clumps" of densely connected nodes; predicting missing links and identifying spurious ones; building on incomplete or noisy information about the network, generalizing from known attributes of nodes and edges to unknown ones; and identifying surprising or anomalous structures or events.
If you are interested in applying, or know someone with a strong quantitative background in physics, statistics, or computer science who might be, see the application information.
The application deadline is 13 January 2013 with an anticipated start date of May or August 2013.
Onward, upward (2012 edition)
Long-time readers will have noticed the distinct lack of blog activity over the past few months. I decided to take the summer off from blogging in order to focus on finishing or pushing along as many projects as possible (like this one, about "How large should whales be?") before the arrival of my daughter Parker Grace Clauset, who was naturally born on 31 July 2012. I am very proud of her momma and am excited to embark on this new and profound journey. Regular blogging will resume shortly.
If it disagrees with experiment, it is wrong
The eloquent Feynman on the essential nature of science. And, he nails it exactly: science is a process of making certain types of guesses about the world around us (what we call "theories" or "hypotheses"), deriving their consequences (what we call "predictions"), and then comparing those consequences with experiment (what we call "validation" or "testing").
Although he doesn't elaborate on them, the two transitions around the middle step are, I think, quite crucial.
First, how do we derive the consequences of our theories? It depends, in fact, on what kind of theory it is. Classically, these were mathematical equations and deriving the consequences meant deriving mathematical structures from them, in the form of relationships between variables. Today, with the rise of computational science, theories can be algorithmic and stochastic in nature, and this makes the derivation of their consequences trickier. The degree to which we can derive clean consequences from our theory is a measure of how well we have done in specifying our guess, but not a measure of how likely our guess is to be true. If a theory makes no measurable predictions, i.e., if there are no consequences that can be compared with experiment or if there is no experiment that could disagree with the theory, then it is not a scientific theory. Science is the process of learning about the world around us and measuring our mistakes relative to our expectations is how we learn. Thus, a theory that can make no mistakes teaches us nothing. 
Second, how do we compare our predictions with experiment? Here, the answer is clear: using statistics. This part remains true regardless of what kind of theory we have. If the theory predicts that two variables should be related in a certain way, when we measure those variables, we must decide to what degree the data support that relation. This is a subtle point even for experienced scientists because it requires making specific but defensible choices about what constitutes a genuine deviation from the target pattern and how large a deviation is allowed before the theory is discarded (or must be modified). Choosing an optimistic threshold is what gets many papers into trouble.
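To make the "compare predictions with experiment" step concrete, here is a minimal sketch of one standard way to do it: a permutation test that asks how often chance alone would produce a relationship between two variables at least as strong as the one observed. This is my own illustration, not anything from the post itself, and the data here are synthetic (a theory predicting y = 2x, measured with noise).

```python
# A hypothetical illustration of comparing a predicted relation with
# data via statistics: a permutation test on synthetic measurements.
import random

random.seed(42)

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Synthetic "experiment": the theory predicts y = 2x; measurements add noise.
xs = [i / 10 for i in range(50)]
ys = [2 * x + random.gauss(0, 1) for x in xs]

observed = pearson_r(xs, ys)

# Null hypothesis: no relation between x and y. Shuffling y destroys any
# pairing, so the shuffled correlations show what chance alone produces.
trials = 2000
exceed = 0
ys_shuffled = ys[:]
for _ in range(trials):
    random.shuffle(ys_shuffled)
    if abs(pearson_r(xs, ys_shuffled)) >= abs(observed):
        exceed += 1

p_value = exceed / trials
print(f"observed r = {observed:.3f}, permutation p < {max(p_value, 1/trials):.4f}")
```

The threshold question in the paragraph above lives in the last step: deciding how small the p-value must be (and whether the effect size is large enough) before we say the data support the predicted relation, rather than merely failing to contradict it.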
For experimental science, designing a better experiment can make it easier to compare predictions with data, although complicated experiments necessarily require sensitive comparisons, i.e., statistics. For observational science (which includes astronomy and paleontology, as well as many social and biological questions), we are often stuck with the data we can get rather than the data we want, and here careful statistics is the only path forward. The difficulty is knowing just how small or large a deviation is allowed by your theory. Here again, Feynman has something to say about what is required to be a good scientist:
I'm talking about a specific, extra type of integrity that is not lying, but bending over backwards to show how you are maybe wrong, that you ought to have when acting as a scientist. And this is our responsibility as scientists...
This is a very high standard to meet. But, that is the point. Most people, scientists included, find it difficult to be proven wrong, and Feynman is advocating the active self-infliction of these psychic wounds. I've heard a number of (sometimes quite eminent) scientists argue that it is not their responsibility to disprove their theories, only to show that their theory is plausible. Other people can repeat the experiments and check if it holds under other conditions. At its extreme, this is a very cynical perspective, because it can take a very long time for the self-corrective nature of science to get around to disproving celebrated but ultimately false ideas.
The problem, I think, is that externalizing the validation step in science, i.e., lowering the bar of what qualifies as a "plausible" claim, assumes that other scientists will actually check the work. But that's not really the case since almost all the glory and priority goes to the proposer, not the tester. There's little incentive to close that loop. Further, we teach the next generation of scientists to hold themselves to a lower standard of evidence, and this almost surely limits the forward progress of science. The solution is to strive for that high standard of truth, to torture our pet theories until the false ones confess and we are left with the good ideas.
 Duty obliges me to recommend Mayo's classic book "Error and the Growth of Experimental Knowledge." If you read one book on the role of statistics in science, read this one.
 A good historical example of precisely this problem is Millikan's efforts to measure the charge on an electron. The most insightful analysis I know of is the second chapter of Goodstein's "On Fact and Fraud". The key point is that Millikan had a very specific and scientifically grounded notion of what constituted a real deviation from his theory, and he used this notion to guide his data collection efforts. Fundamentally, the "controversy" over his results is about this specific question.
 I imagine Lord Rutherford would not be pleased with his disciples in high energy physics.
 There's a soft middle ground here as to how much should be done by the investigator and how much should be done by others. Fights with referees during the peer-review process often seem to come down to a disagreement about how much and what kind of torture methods should be used before publication. This kind of adversarial relationship with referees is also a problem, I think, as it encourages authors to do what is conventional rather than what is right, and it encourages referees to nitpick or to move the goal posts.