Proposal to Implement Community Detection Algorithms in NetworkX

Synopsis

NetworkX is a powerful network analysis toolkit for the Python programming language. While NetworkX has a very extensive algorithmic base, it has few community detection algorithms. This project would seek to complete several partially complete algorithms currently on the NetworkX development site, implement several new algorithms including Modularity Maximization algorithms, the Kernighan-Lin algorithm, Spectral Partitioning, as well as any additional algorithms identified by the community as essential to the NetworkX code base.

Project Detail

Background

The identification of communities within complex networks is a popular topic of research. This has lead to the proliferation of algorithms to detect communities in all manner of complex networks (see Fortunato "Community detection in graphs" Phys. Rep. 486, 75-174, 2010 and Freeman "Finding social groups" in Dynamic Social Network Modeling and Analysis, 2003 for a review).

New Algorithms

This project would implement a number of new algorithms, which fall under a variety of techniques.

Bisection Techniques

Some of the simplest methods of community detection involve the partitioning of the graph into two separate communities. As a part of this project I would implement two of the more well known bisection algorithms, the Kernighan-Lin algorithm and the spectral partitioning method of Fiedler.

Modularity and Spin Glass Techniques

Newman and Girvan introduced the idea of modularity as a measure of the quality of a community partition of a graph, and a number of modularity maximization techniques have been developed for community detection. I would propose to implement the basic greedy modularity maximization algorithm as well as the spectral modularity maximization algorithm. Variations to the modularity measure itself which account for small communities in large networks, would also be included.

A generalization of modularity maximization which uses ideas analogous to finding the ground state in ising spin glass systems was introduced by Reichardt and Bornholdt. My implementation of this algorithm would include the general community finding technique for a whole network, as well as a method for finding the community identification of a single node as described in the literature.

Other Techniques

Because these algorithms may not fulfill all the needs of the NetworkX community, this project will also seek to identify additional algorithms that are in high demand. This may include Block Model style algorithms, k-means clustering algorithms, or betweenness centrality algorithms.

Currently Open Tickets

Ticket #239, #158, and #245, on the NetworkX Developer site all contain code for community detection algorithms in various states of completion. As a part of this proposal the work needed to close these tickets would be completed.

  • Ticket #239 contains a modified modularity maximization algorithm, and simply requires the addition of tests and examples.
  • Ticket #245 contains a modified modularity maximization algorithm that contains a bug, and requires testing, examples and documentation.
  • Ticket #158 contains code for yet another modularity maximization algorithm, but has been reported to be slow and incorrect and requires further documentation and testing.

Implementation of Algorithms

All algorithms will be implemented in Pure Python, following with NetworkX's current standard. In addition to correct implementations of each algorithm, clear documentation with references to literature which gives concise descriptions of the algorithms, and unit tests which exercise key features of the algorithms will be included. Additionally, I will ask the NetworkX community for help in testing algorithms as there development progresses.

Mentors

Here is a list of potential mentors and their GSoC mentor IDs:

  • Aric Hagberg(ahagberg)
  • Dan Schult(dschult)

Benefits for NetworkX

Community detection algorithms are among the most frequently requested algorithms on the NetworkX mailing list, and implementing a number of them would make NetworkX a more attractive choice for complex networks research.

Success Criteria

  1. Close three currently open community detection algorithm tickets
  2. Implement above listed algorithms
  3. Have all new functionality include thorough documentation, tests, and examples.

Project Time line

Pre-Coding
Work with NetworkX community to identify any algorithms not already planned to be implemented. Begin collecting literature describing the algorithms, and any existing data sets and code for testing.
Week 1-2
Implement Kernighan-Lin algorithm and spectral bisection along with appropriate, documentation, tests and examples.
Week 3-4
Complete work to close 2 of the 3 open community detection tickets.
Week 5-8
Implement all Modularity Maximization algorithms, including documentation, tests, and examples.
Week 9-11
Close final open community detection ticket, implement any additional algorithms deemed essential by NetworkX community.
Week 12
Complete any remaining documentation and bug fixes, integrate patches into the NetworkX repository.

Biography

Personal History

I received by bachelor's degree in Mathematics and Computer Engineering in 2006 from the South Dakota School of Mines and Technology. I am currently seeking a PhD in Computer Science from the University of New Mexico under Stephanie Forrest. My current research focuses on evaluating Internet growth using agent based models. I am also interested in the structure of social networks and how demographic processes create small world social networks, as well as graph models embedded in metric spaces.

NetworkX

I have used Python and NetworkX extensively in my research. I became involved in the NetworkX project in the Summer of 2010, and in the past year have made several contributions which are in the current codebase: Tickets #356, #323, #357, #375, and #388. I also have contributed code in several pending tickets and discussions (Tickets #378, #359, #345, #390, #387, #371, #396, #533, #360, #395, and #355). This makes me very familiar with the codebase, and able to quickly develop new functionality for NetworkX.

Python

In addition to NetworkX I use Python and associated libraries extensively in my research, as it allows for quick manipulation and analysis of data in a variety of formats. I am familiar with scipy, numpy, matplotlib, and a number of other packages. I have produced a number of useful libraries as well as collected several functions written by others (some of which I have modified) that are very useful and are available in a library called python_lib.

Contact

Ben Edwards

bedwards <at> cs <dot> unm <dot> edu

Farris Engineering Center 355d

University of New Mexico

Albuquerque, NM 87131

This document is created using Org-mode and Org-babel. The original plain-text document is available at ben-edwards.org (preview).