News Archives

Fault-Localization in Distributed Resource Allocation

April 29, 2004

Date: Thursday April 29, 204
Time: 11am-12:15pm
Location: Woodward 149

Scott Pike <pike@aya.yale.edu>
Department of Computer Science and Engineering Ohio State University

Abstract: Ideally, faults should be isolated within small, local neighborhoods of impact. Failure locality quantifies impact as the radius of the largest set of processes disrupted by a given fault. The locality radius of a distributed algorithm demarcates a halo beyond which faults are masked. As such, fault-local algorithms are central to engineering survivable distributed systems, because they protect against cascading and epidemic failures. My work makes theoretical and practical contributions to fault-localization for the generalized dining philosophers problem, subject to crash failures. The optimal crash locality for synchronous dining is 0, but this metric degrades to 2 for asynchronous dining. My work resolves the apparent complexity gap by constructing the first known dining algorithms to achieve crash locality 1 under partial synchrony. The software-engineering context of my approach consists in augmenting existing dining algorithms with unreliable failure detection oracles. My extended results characterize optimal locality bounds for every detection oracle in the Chandra-Toueg hierarchy with respect to four practical models of mutual exclusion. Analysis of the resulting lattice identifies the weakest detector for solving each dining problem, and discovers two points of discontinuity indicating unresolved complexity gaps.

Bio: Scott Pike is a Doctoral Candidate in Computer Science & Engineering at Ohio State University. He received his M.S. in Computer Science from Ohio State in 2000, and his B.A. in Philosophy from Yale University in 1996. His research interests focus on software engineering and distributed computing, and, more concretely, on scalable approaches to building agile, adaptive, and survivable components for distributed systems.