[Colloquium] Tree-based Overlay Networks for Scalable, Reliable Tools and Applications

February 26, 2008

  • Date: Tuesday, February 26, 2008 
  • Time: 11 am to 12:15 pm
  • Place: ME 218

Dorian Arnold
Computer Sciences Department
University of Wisconsin-Madison

Abstract: HPC systems continue to grow in size and complexity, making the development of scalable software systems increasingly difficult. As a result, very few tools and applications run effectively, or at all, at today’s largest scales (tens and hundreds of thousands of processors). To make matters worse, million-processor systems are scheduled for availability within the next two to four years.

Tree-based Overlay Networks (TBONs) have proven to be an effective computational model for scalable distributed tools and applications. A TBON is a network of hierarchically organized processes that exploits the logarithmic scaling properties of trees to provide scalable data multicast, gather, and in-network aggregation. In this talk, I will describe the TBON model, demonstrating its power and flexibility with unprecedented scalability results from a variety of application domains. I will also describe our novel TBON failure recovery model, state compensation, which relies on inherent information redundancies among TBON processes. State compensation features fast, decentralized tree reconstruction and state recovery protocols that involve only a small subset of the tree and require no process coordination. The protocols are scalable because their performance is a function of the tree’s fan-out, not its total size. A tree with a fan-out of 64 recovers from failures in milliseconds: with only four levels, such a tree supports over 16,000,000 processes!
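As a rough illustration of the scaling argument in the abstract, the following Python sketch computes the leaf capacity of a complete tree and simulates a level-by-level reduction. It is not taken from the talk or from any TBON implementation: the names `leaf_capacity` and `tree_reduce` are hypothetical, and the sketch assumes that "four levels" means four levels of fan-out below the root and that the aggregation is a simple associative combine such as a sum.

```python
from typing import Callable, List, Sequence, Tuple


def leaf_capacity(fan_out: int, levels: int) -> int:
    """Leaves of a complete tree with `levels` levels of fan-out below the root."""
    return fan_out ** levels


def tree_reduce(
    values: Sequence[int],
    fan_out: int,
    combine: Callable[[List[int]], int],
) -> Tuple[int, int]:
    """Combine leaf values level by level, as a simulated in-network aggregation.

    Each simulated parent combines at most `fan_out` children. Returns the
    aggregated result and the number of tree levels traversed.
    """
    if not values:
        raise ValueError("tree_reduce needs at least one leaf value")
    level = list(values)
    hops = 0
    while len(level) > 1:
        # Group the current level into chunks of `fan_out` and combine each chunk.
        level = [
            combine(level[i:i + fan_out])
            for i in range(0, len(level), fan_out)
        ]
        hops += 1
    return level[0], hops


if __name__ == "__main__":
    # Fan-out 64 with four levels: 64**4 leaf processes.
    print(leaf_capacity(64, 4))
    # 4,096 leaves reduce in two levels when the fan-out is 64.
    total, hops = tree_reduce(list(range(4096)), 64, sum)
    print(total, hops)
```

Under these assumptions, 64 to the fourth power is 16,777,216, which is consistent with the "over 16,000,000 processes" figure, and the number of levels grows only logarithmically with the number of leaves.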

Bio: Dorian Arnold is a doctoral candidate and Intel Foundation Ph.D. fellow in the Computer Sciences Department at the University of Wisconsin. He holds an M.S. degree in Computer Science from the University of Tennessee and a B.S. degree in Mathematics and Computer Science from Regis University (Denver, CO). From 1999 to 2001, Dorian served as technical lead of the NetSolve project at the University of Tennessee’s Innovative Computing Laboratory. In 2006, Dorian was a technical scholar at Lawrence Livermore National Laboratory. His research focuses on the performance and scalability of large distributed systems, including efficient communication and runtime data analysis, fault tolerance, and system deployment.