Thesis and Dissertation Defenses
A Hierarchical Group Model for Programming Sensor Networks
Date: Monday, September 15th, 2008
Time: 10:00AM
Place: FEC 141
James Horey
Committee: Arthur Maccabe, chair Stephanie Forrest, Patrick Bridges, Wennie Wei Shu
Abstract:
A hierarchical group model that decouples computation from hardware can characterize and aid in the construction of sensor network software with minimal overhead. Future sensor network applications will move beyond static, homogeneous deployments to include dynamic, heterogeneous elements. These sensor networks will also gain new users, including casual users that will expect intuitive interfaces to interact with sensor networks. In order to address these challenges, a new computational model and systems implementing the model are necessary. This model must ensure that computation can be readily (re)assigned as sensor nodes are introduced or removed. Simultaneously, the model must include methods for communication that take into account these dynamic elements.
This dissertation presents a detailed description and design of a computational model that resolves these challenges using a hierarchical group mechanism. In this model, computation is tasked to logical groups and split into collective and local components that communicate hierarchically. Local computation is primarily used for data production and publishes data to the collective computation. Similarly, collective computation is primarily used for data aggregation and pushes results back to the local computation. Finally, the model also includes data-processing functions that sit between local and collective functions and are responsible for data conversion.
This dissertation also presents implementations and applications of the model. Implementations include Kensho, a C-based implementation of the hierarchical group model, that can be used for implementing a variety of user applications. Another implementation, Tables, presents a spreadsheet-inspired view of the sensor network that takes advantage of hierarchical groups for both computation and communication. Users are able to specify both local and collective functions that execute on the sensor network via the spreadsheet interface. Besides these implementations, applications of the model are also explored. One such application, FUSN, provides a set of methods for constructing filesystem-based interfaces for sensor networks. This demonstrates the general applicability of our model as applied to sensor network programming and management interfaces. Finally, we apply our model to novel privacy algorithms to demonstrate that our model isn't strictly limited to programming interfaces.
Analysis and Inference of Resource Usage Information
Date: Thursday, August 29th, 2008
Time: 8:00AM
Place: FEC 141
Jorge A. Navas
Committee : George Luger, Manuel Hermenegildo, Deepak Kapur, Stephanie Forrest
Static analysis is a powerful technique used traditionally for optimizing programs, and, more recently, in tools to aid the software development process, and in particular, in finding bugs and security vulnerabilities. More concretely, the importance of static analyses that can infer information about the costs of computations is well recognized since such information is very useful in a large number of applications.
Furthermore, the increasing relevance of analysis applications such as static debugging, resource bounds certification of mobile code, and granularity control in parallel computing makes it interesting to develop analyses for resource notions that are actually application-dependent. This may include, for example, bytes sent or received by an application, number of files left open, number of SMSs sent or received, energy consumption, number of accesses to a database, etc.
In this thesis, we present a resource usage analysis that aims at inferring upper and lower bounds on the cost of programs as a function of its data size for a given set of user-definable resources of interest. We use logic programming as our basic paradigm since it subsumes many others and this allows us treating the problem at a considerable level of generality.
Resource usage analysis requires various pre-analysis steps. An important one is the Set-Sharing analysis which attempts to detect mode information and which variables do not point to the same memory location, providing essential information to the resource usage analysis. Hence, this thesis also investigates the problem of efficient Set-Sharing analyses presenting two different alternatives: (1) via widening operators, and (2) defining compact and effective encodings.
Moving to the area of applications, a very interesting class involves certification of the resources used by mobile code. In this context, Java bytecode is widely used, mainly due to its security features and the fact that it is platform independent. Therefore, this thesis finally presents a resource usage analysis tool for Java bytecode that includes also a transformation which provides a block-level view of the bytecode, and can be used as a basis for developing analyses. We have also developed for this purpose, a generic, abstract interpretation-based fixpoint algorithm which is parametric in the abstract domain. By plugging appropriate abstract domains into it, the framework provides useful information that can improve the accuracy of the resource usage information.
Modeling the Heap: A Practical Approach
Date: Wednesday, August 28th, 2008
Time: 3:30 pm
Place: FEC 141
Mark Marron
Committee: Deepak Kapur, Co-Chair; Dark Stefanovic, Co- Chair; Manuel Hermenegildo; Rupak Majumdar
The ability to accurately model the evolution of the program heap during the execution of a program is critical for many types of program optimization and verification techniques. Past work on the topic of heap analysis has identified a number of important heap properties (connectivity, shape, heap based dependence information, and region identification) that are used in many client applications (parallelization, scheduling, memory management) and has explored a range of approaches for computing this information. However, past work has been unable to compute this information with the required level of accuracy and/or has not been computationally efficient enough to make the analysis practical. The inability to compute the required heap information has limited the scope or effectiveness of many branches of research (the inability to compute the required heap information made the optimization technique impractical or severely limited its effectiveness).
A general--purpose heap analysis technique is proposed. The major objective in the construction of this is to provide the range of heap information needed by the optimization applications stated above, on real world programs, in a computationally tractable manner. Keeping tractability in mind, a set of properties/information required by client optimization applications, e.g., automatic parallelization, classical scheduling, redundancy elimination transformation, stack allocation, pool allocation, object co--location or static garbage collection, is proposed. A set of design heuristics are selected to ensure that the analysis is fast and precise in the common case, and that ensure good computational performance by reducing precision in less common situations.
This general approach allows the construction of an analysis that satisfies a number of key requirements for producing a practical heap analysis: The ability to handle most of the Java 1.4 language including major language features; arrays, virtual calls, interfaces, exceptions, etc., as well as the standard Java libraries. No restrictions are placed on the heap structures that are constructed; no restrictions on the recursive data structures built or how these structures are connected/shared. The analysis is able to provide precise information needed for a wide range of optimization applications; aliasing, connectivity, shape, identification of logical data structures and heap carried data dependence information. Finally, that in practice this information can be computed efficiently.
The technique has been implemented and used on a range of small to medium size benchmarks from the JOlden and SPECjvm suites as well as a number of benchmarks that were developed during this work. The analysis technique is evaluated using a number of detailed case studies which focus on the heap structures that the analysis is able to identify in the program and how this information can be used in a variety of optimization techniques (focusing on vectorization/loop scheduling, thread-level parallelization and memory management). Most of these benchmarks are naturally amenable to thread--level parallelization, thus as a more empirical evaluation of the heap analysis results a detailed evaluation of the analysis is done using the results to drive the parallelization of the benchmark programs, resulting in a substantial speedup over the benchmark suites.
The runtime costs of the analysis are evaluated both in terms of the total analysis time and the general scalability of the analysis. The results of this evaluation indicate that the technique presented in this thesis is indeed a general purpose solution to the heap analysis problem (at least for small/medium size programs) that can be used to efficiently compute precise and useful information for program optimization over a wide range of real world Java programs.
A Generic Approach to Bytecode Analysis
Date: Wednesday, August 28th, 2008
Time: 9:00 am
Place: FEC 141
Mario Mendez
Committee: Manuel Hermenegildo, Deepak Kapur, Darko Stefanovic, German Puebla, George Luger
Analysis of the Java language (either in its source version or its compiled bytecode) using the framework of abstract interpretation has been the subject of significant research in the last decade. Most of this research concentrates on finding new abstract domains that better approximate a particular concrete property of the program analyzed in order to optimize compilation or statically verify certain properties about the run-time behavior of the code. In contrast to this concentration and progress on the development of new, refined domains, there has been comparatively little work in the development of extensible, generic frameworks that sustain the analysis. The first component in such a generic framework is a standard representation of the program that facilitates later analyses or optimizations. Although many times the description of that Control Flow Graph is omitted, we show that a uniform, compact representation is fundamental in order to manipulate similar constructions of the language in a uniform way. The Horn clause-based representation chosen is general enough to represent not only object-oriented programs, but also logic programming applications.
In the context of the abstract interpretation framework, the fixpoint algorithm that lies at its very core has been demonstrated to have dramatic impact in the efficiency and precision of any analysis. A particularly optimal combination of the two attributes can be achieved by a flow-sensitive, context-sensitive fixpoint algorithm. We present a detailed description of such an algorithm, which contemplates mutually recursive nodes in the Control Flow Graph and uses memoization for further efficiency.
Generic abstract interpretation frameworks work in conjunction with abstract domains. Each domain captures a particular property of the program. A very interesting characteristic to analyze is whether a set of variables share, i.e., they might reach the same memory location. The information gathered by a sharing domain is used for parallelization and/or for optimizing the compilation. What we present is a combination of domains (sharing, nullity and types) which can work together to refine their results, i.e, be more precise. The approach is shown to achieve better results than previous sharing analyses.
The combinatorial nature of the set sharing domain has been the subject of intense debate within the analysis community. The exponential complexity of some of the operations (e.g., abstract unification in Prolog or field load/store in Java) seemed to be a big obstacle that prevented the domain from being widely used. In this thesis, we present the first scalable implementation of set sharing, which is based in a special type of Binary Decision Diagrams, called Zero-supressed Binary Decision Diagrams (ZBDDs). By representing sets of sets with this data structure, we not only improve the memory requirements of the analysis but also achieve better efficiency in terms of the overall running time.
Leveraging Positive and Negative Representation of Information
Date: Monday, August 25th, 2008
Time: 9:00 am
Place: FEC 141
Eric Trias
Committee: Stephanie Forrest (chair), Greg Heileman, Terran Lane, Jedidiah Crandall
In this dissertation, I advance the concept known as negative databases, or negative representation of information, by developing a negative relational algebra, a negative database management system architecture, and by exploring how negative information can improve finding solutions to combinatorial problems, such as set-sharing analysis. This research strives to leverage both positive data and its complement, or negative data, to provide a secure and/or compact representation as required by an application.
Characterizing HPC Application Sensitivity to OS Interference Using a Kernel-level Noise Injection Framework
Date: Wednesday, May 14th, 2008
Time: 10:00 am
Place: FEC 141
Kurt Ferreira
Operating system interference or noise has been shown to be a key limiter of application scalability in high-end systems. While several studies have attempted to quantify the sources and effects of system interference using user-level mechanisms, there are few published studies on the effect of different kinds of kernel-generated OS noise on application performance at scale. In this thesis, we describe a kernel-level noise injection architecture and examine the sensitivity of real-world, large-scale applications to a range of OS noise patterns. This noise injection framework is built into the Catamount lightweight operating system that runs on the 10,000 nodes Cray Red Storm machine located at Sandia National Labs. Using this framework, we are able to demonstrate the importance of how how noise is generated, in terms of frequency and duration, and how this impact changes with application scale. For example, on Red Storm we are able to show that 2.5% net processor noise at 10,000 nodes can have no effect or can result in over a factor 20 slowdown for the same application, depending solely on how the noise is generated. In addition, this thesis discusses how the characteristics of the applications we study, for example computation/communication ratios, collective communication sizes, and other characteristics, relate to their tendency to amplify or absorb OS noise. The results of this work are important as they are key to understanding the trade-offs involved in the design and implementation of operating systems, middleware, and other system services of high-end parallel systems.
Thesis Committee: Patrick Bridges (advisor), Arthur (Barney) Maccabe, Rolf Riesen
Machine Learning of Human Behavior in Interactive Games
Date: Wednesday, May 7th, 2008
Time: 12:30 pm
Place: FEC 145
Nathan D Fabian
Abstract:
I present a technique for creating video game agents using machine learning to clone the behavior of human trainers playing the game. I compare the results of C4.5, a traditional attribute-value learning algorithm against first-order, relational learning algorithms: FOIL, PROGOL, and PROXIMITY. I establish a synthetic framework in which to analyze various attributes of behavior learning, and I will argue that although FOIL is more fragile as a result of being a greedy search, it is the most pragmatic option of the three. Finally I show a comparison of results of C4.5 and FOIL on human trainers.
Thesis Committee: Terran Lane, George Luger, and Cris Moore.
COSMOS: A Context-Sensitive Probabilistic Modeling System
Date: Tuesday, April 15, 2008
Time: 11 am
Place: FEC 141
Nikita A. Sakhanenko
Abstract:
Modeling real-time dynamic environments, whose information is recorded using distributed sensors, can require reestimation of current models and assumptions. Model reorganization is needed when the nature of the situation changes. Such context changes in the data can arise from sensor failure, when objects in the sensor field come and go, or when modalities of causal interaction change. Additionally, the dynamic nature of environments requires fast inference across interpretive models. Model pruning is needed to reduce the model size by eliminating redundant and irrelevant information. A general, context-sensitive probabilistic modeling problem is defined in this thesis.
This thesis describes a flexible, multi-layered architecture for context-sensitive stochastic modeling (COSMOS). At the core of this architecture is a form of probabilistic logic defined by sets of recursive relationships. The system consists of an ensemble of related probabilistic models paired with a mechanism for switching between context-activated models. Additionally, a causal reasoning component directs context activation. This architecture employs the failure-driven learning and repair procedure (similar to developmental learning in humans), which is based on a combination of abductive inference and expectation maximization (EM) parameter learning. Over time, these mechanisms incrementally populate the ensemble by generating new models when current models no longer perform acceptably. This thesis describes an interface representation of COSMOS characterizing contextual parameters of models and details the connection of interfaces with the rest of the system.
Thesis Committee: George Luger, Carl Stern, Tom Caudell and Lance Williams
Inference of Large-Scale Structure in Networks
Date: Monday, April 14, 2008
Time: 3:30 pm
Place: ECE 118
Tiffany A. S. Pierce
Abstract:
The analysis of large-scale structural patterns on networks can be a difficult task, especially when they consist of a large number of vertices. Often it is useful to classify the network vertices into a set of groups and analyze how the groups interact with each other. This work proposes a statistical way to infer group membership of vertices in a given graph based on the general graph structure. We will use an iterative approach that combines maximum likelihood and Markov Chain Monte Carlo (MCMC) methods to identify likely node groupings. One strength of this method is that it can be used to identify many kinds of structure, including node clusters and disassortative groups, without knowing a priori what large-scale patterns to look for. We investigate the performance of the algorithm on synthetic and real-world data sets to demonstrate its flexibility and robustness in the face of noisy data sets.
Thesis Committee:
Cristopher Moore, Terran Lane, and Melanie Moses
Virtual Windows: Designing and Implementing a System for Ad-hoc, Positional Based Rendering
Date: Friday, April 11th, 2008
Time: 1:30 p.m.
Place: FEC 141
Mark Waligora
Abstract:
This work presents the development and implementation of the Virtual Window Framework with which a user or users may view, through one or more displays each, the contents of a virtual environment which is overlaid on a physical environment. This augmented reality system tracks the positions of the users' eyes as well as the positions of the displays in order to render images as if a user is looking through a display like a window into a virtual world. The physically based input to the system is hoped to provide the user with a very intuitive way to view and interact with rendered images. This sort of positional based rendering isn't necessarily a new concept, but the manner in which it is implemented here is unique in its support of an extremely wide variety of displays as well as multiple users. In designing and implementing a real-time rendering system of this type, there are many system design considerations that must be taken into account. The implementation described here focuses its considerations on accuracy, performance, usability of hardware, and portability. This work demonstrates the Virtual Window Framework in action with several sample applications. These sample applications, however, only touch on the wide variety of applications that this framework could be used with.
Detecting community structure in financial markets
Date: Tuesday, March 11th, 2008
Time: 3:30 pm
Place: FEC 141
Todd Kaplan
Abstract:
In many ways, a stock market resembles an ecosystem. An ecosystem is defined by the interaction of its organisms and the environment. In financial markets, the organisms are traders and the environment is defined by the state of the market through which traders interact. Feedback loops exist between organisms and their environment. Traders impact their environment through trades. Changes in the market environment affect future decisions of traders. Ecosystems, as well as financial markets, display many of the traits associated with a complex adaptive system: aggregation, adaptation, feedback loops, and nonlinearity.
A trophic network, commonly referred to as a food web, describes the feeding relationships between different groups of species in an ecosystem. Ecologists construct trophic networks to aid their understanding of ecosystems. In the realm of financial markets, trophic networks could serve an analogous role. They could potentially illuminate the underlying dynamics responsible for commonly observed macro-level phenomena. For example, one might hypothesize that periods of market volatility occur after a keystone trader species becomes inactive and the trophic network subsequently restructures itself. My research investigates the following question: is it possible to detect trophic structure within a real-world financial market?
Towards this goal, I introduce three new tools contributing to the detection of community structure in complex networks. First, I introduce a two-phase macro-strategy for community detection that provides high-yield, robust results. Second, a fine-granularity measure of community structure is developed. Finally, I describe a dual-assortative measure (DAMM) of community structure. DAMM extends the domain of networks that can be analyzed for community structure to include those with negatively weighted edges.
After introducing the community detection tools, I shift my focus to the evolution of software agents that compete in an artificial financial market. The evolutionary framework is based on a stack-based language (Staq) that I developed for genetic programming (GP). I will examine the genetic program of an agent that consistently evolves. This evolutionary agent reveals a shortcoming in the simulated market: a lack of fundamentalism. To remedy this limitation, two value-based strategies are introduced.
I close the talk by introducing an algorithm for detecting trophic species in financial market data. The approach utilizes the community detection tools introduced earlier. Further, I describe a methodology for assessing the significance of detected structure. The efficacy of the trophic detection algorithm is demonstrated using simulated data. Finally, real-world data from the London Stock Exchange is examined.
Bio:
Todd Kaplan is a Ph.D. candidate in Computer Science at the University of New Mexico. He received his undergraduate degree in Biology at Brown University and holds graduate degrees in Computer Science (Cambridge University, UK) and Applied Mathematics (Oxford University, UK). He is a member of the Adaptive Computation group at UNM, led by Professor Stephanie Forrest. In this talk, Todd hopes to dispel rumors that he is a tenured graduate student.
Group learning using contrast NMF : Application to functional and structural MRI of schizophrenia.
Date: Monday, December 10th, 2007
Time: 11 am
Place: FEC 141
Vamsi Potluru
Committee: Shuang Luan (CS); Vince D Calhoun (EE); Paul Heintz (Radiology)
Abstract:
Non-negative Matrix factorization (NMF) has increasingly been used as a tool in signal processing in the last couple of years. NMF, like independent component analysis (ICA) is useful for decomposing high dimensional data sets into a lower dimensional space. Here, we use NMF to learn the features of both structural and functional magnetic resonance imaging (sMRI/fMRI) images. NMF can be applied to do group analysis of imaging data and we apply it to learn the spatio-temporal activations across subjects doing a common task. We add an additional contrast term to NMF (called co-NMF) to identify features distinctive between two groups. We apply our approach to a dataset consisting of schizophrenia patients and healthy controls. The results from co-NMF make sense in the light of expectations and are improved compared to the NMF results. Our method is general and may prove to be a useful tool for identifying differences between multiple groups.
A Scalable Experiment Management System
Date: Friday, November 9th, 2007
Time: 10 am
Place: FEC 141
Ahmad Douglas
Committee: Patrick Bridges, M.S. Thesis Advisor Barney Maccabe, UNM Computer Science Faculty Rolf Riesen, UNM Computer Science Faculty
Abstract:
In computer science research, limited hardware resources are often shared among several different projects or teams. Under such an arrangement, a robust system must exist to archive important experimental data and system configuration parameters. Without one, an organization risks the loss of valuable experimental data at minimum; a loss of credibility could even result if some of these lost data were later called into question.
Our thesis covers the development of a storage management system to promote systematic retention of critical experimental and configuration data, paying particular attention to the challenges imposed by systems research. Toward this end, we develop formal requirements for an acceptable system. We deploy and evaluate two separate candidate solutions against our requirements, testing against the backdrop of a real-world computer systems research laboratory.
The results of our work are mixed. We found the first solution, an existing open source tool adapted to our needs, to be insufficient in functionality and limited in scalability. The second solution, which was developed specifically to exploit the strengths and address the weaknesses found in the existing tool, met the requirements under certain assumptions. In general, we find that storage management has fundamental limitations which govern its scalability for our purpose.
Scale Invariant Raster Image Representation Through Topological Encoding
Date: Friday, November 9th, 2007
Time: 2:30 pm
Place: FEC 141
Warren Hunt
Abstract:
Raster images have proved themselves to be a pervasive and necessary medium, however their fixed resolution and band-limited "digital" nature impose significant limitations. Vector graphics formats achieve resolution independence and can represent discontinuous image features, but at the cost of real-time performance. A desire for interactive and real-time performance, coupled with a need for scale invariant rendering, drives the search for improved 2D and 3D dataset imaging techniques.
This work leverages a topological encoding of a dataset to attain scale invariance from a rasterized representation. This technique, called IStar, is a novel application of topology that captures the semantic characteristics in a graph-like structure. This structure, along with a reparametrized and sampled representation of the segmented dataset, is packaged as a standard raster image which can then be substantially downsampled and compressed. To view or render an IStar image, the encoded image is upsampled and the topological structure is used to reconstruct the original dataset. Unlike traditional raster approaches, this representation can preserve sharp discontinuities at any level of magnification, much like scalable vector graphics. Because this technique is raster-based, it is well suited to the real-time rendering pipeline. This technique is demonstrated, along with several extensions of the rendering function made possible by the topological encoding.
A Table Based Programming Language for Sensors
Date: Monday, November 5th, 2007
Time: 10 am
Place: HPC Seminar Room
Eric Nelson
Abstract:
Wireless sensor networks are becoming an important tool for a variety of different applications including environmental monitoring, urban sensing, personal networks, and wildlife tracking. However the growth of sensor network technology is hampered by the fact that sensor networks remain difficult to program and maintain. This is due to a variety of reasons including difficulty of programming a massively distributed application, programming under severe resource constraints, and the cost of installing and maintaining multiple networks. This problem is made even more difficult when considering that most users that are interested in using sensor networks do not have an extensive programming background. Our goal is to simplify programming complete applications for sensor network domain specialists. It is our goal to create a programming environment that allows such researchers to employ their expertise when creating a sensor network application.
To accomplish this goal, we present the Table Based Language Environment for Sensors, Tables. Tables is a flexible programming environment coupled with an intuitive user interface based on the concept of spreadsheet. Spread sheets provide a method to spatially organize and manipulate the sensor network and data coming from the network. Tables is intended to be implemented by leveraging lower level software stacks with layers like Kensho, but can also work in a more limited fashion with a less advanced API. The concepts of pivot tables and functions are the basis of Tables power and ease of use. Basic and complex applications are presented that demonstrate the data centric power of Tables. Implementation details are described and evaluated with comparisons of the overhead of Tables.
Light-Weight Online Performance Monitoring and Tuning with Embedded Gossip
Date: Tuesday, October 30th, 2007
Time: 8 am
Place: Seminar room at High Performance Computing Center at UNM
Wenbin Zhu
Abstract:
Understanding the performance of large-scale, long-running applications is difficult. Traditional trace-based performance monitoring can provide detailed performance data, however, correctly merging of traced data from all processes and interpreting the resulting traces make this technique only useful post-mortem. Statistical methods have lower overhead on monitoring, however, correlating raw performance data between processes in even more difficult than in trace-based systems, and again not appropriate for runtime execution.
This dissertation describes a new online performance monitoring approach, called Embedded Gossip (EG). EG uses an application's natural communication to propagate and gather performance data at each process. By piggybacking performance information on each outgoing message and extracting the information from each incoming message, each application process is eventually aware of performance problems on nodes with which they directly or indirectly communicate. This approach is useful for large-scale, long-running, dynamic applications, where static configuration may not be optimal, and traditional detailed performance monitoring is expensive.
To demonstrate the viability of Embedded Gossip for global system monitoring and adaptation, this dissertation describes a general architecture for implementing EG-based monitoring systems, including a new distributed agreement protocol, 3-wait, for EG-based global system adaptations. It then presents experimental results for the two monitoring systems and one global adaptation system implemented using this architecture, including global critical path profiling, global progress counting, and global scheduling of load balancing actions. The results show that Embedded Gossip can be used to implement a wide range of runtime monitoring systems. The resulting monitoring systems have low runtime overhead, and the information gathered using these systems can effectively drive global system adaptation decisions online.
What makes NP-complete problems hard?
Date: Thursday, October 25th, 2007
Time: 12:15 pm
Place: FEC 141
Haixia Jia
Abstract:
To understand what makes NP-complete problems so hard, I conduct my research through two approaches: construct hard problem instances and study how bad the solvers work. A simple way to construct problem instances is to choose a random truth assignment A, and then choose clauses randomly from among those satisfied by A. However, this method tends to produce easy problems, since the majority of literals point toward the "hidden"' assignment A. Previously, we proposed a problem generator that cancel, this effect by hiding both A and its complement. While the resulting formulas appear to be just as hard for DPLL algorithms as random 3-SAT formulas with no hidden assignment, they can be solved by WalkSAT in only polynomial time. Here we propose a new method to cancel the attraction to A, by choosing a clause with t > 0 literals satisfied by A with probability proportional to q^t for some q < 1. By varying q, we can generate formulas whose variables have no bias, i.e., which are equally likely to be true or false; we can even cause the formula to deceptively point away from A. We present theoretical and experimental results suggesting that these formulas are exponentially hard both for DPLL algorithms and for incomplete algorithms such as WalkSAT.
Next, we introduce a new application for constructing hard to solve SAT instances: Negative Databases (NDBs), a strange type of database that only stores records not in the original database. NDBs offer the promise of preserving the confidentiality of the individual records in a database, while permitting certain restricted queries. NDBs have many applications in the field of data security. Constructing an NDB corresponding to a DB is equivalent to constructing a SAT formula with a set of satisfiable truth assignments as its only satisfiable truth assignments. We present a new approach to construct such formulas, and we are able to show that the formula can only be satisfied by the truth assignments that are very close to one of the hidden assignments. We then demonstrate experimentally that the formulas generated by our scheme are much harder to solve than the formulas generated by the previously suggested prefix algorithm and RNDB algorithm, and indeed they are hard enough to ensure security. We hope that this new approach could create interest in constructing hard SAT formulas as a way to protect the confidentiality of databases.
Finally, to understand how bad the solvers work, we analyze the behavior of two natural variants of the Davis-Putnam-Logemann-Loveland (DPLL) algorithm for Graph 3-Coloring on sparse random graphs G(n,p=c/n). First, we calculate analytically the probability Pc(0) that these algorithms find a 3-coloring with no backtracking at all, and show that it goes to zero faster than any analytic function as c approach c* = 3.847... Then we show that even in the "easy" phase 1 < c < c*; where Pc(0) > 0, including just above the emergence of the giant component, the expected number of backtracks is exponentially large with positive probability. To our knowledge this is the first rigorous proof that the running time of a natural backtracking algorithm has a heavy tail for graph coloring. In addition, we give experimental evidence and heuristic arguments that this tail takes the form Pc(b) ~ b-1 up to an exponential cutoff.
Pretenuring Enhancements in the 64bit Older-First Collector
Date: Friday, September 21st, 2007
Time: 10 am
Place: FEC 141
Rotem Bentzur
Committee: Darko Stefanovic, Patrick Bridges, Sean Luan
Abstract:
This work describes the design and implementation of the Older-First
garbage collector in combination with pretenuring enhancements in the 64-bit version of the JikesRVM project. The addition of simple
pretenuring enhancements to the collection algorithm, helps mitigate
the collectors relative high copying cost and thus achieve better
behavior and performance in long-running server applications when
compared to the basic Older-First algorithm or to generational copying
algorithms such as the Appel-style generational collector.
Parallel Simulated Annealing Applied to RNAi Gene Family Knockdown
Date: Friday, June 8th, 2007
Time: 9:45 am
Place: FEC 141
M. Leigh Fanning
Committee: Shuang Luan, David Bader, Lance Williams.
Abstract:
Simulated Annealing has been widely used an effective heuristic for computationally intractable search problems. We employ the technique in the specific application of RNA Interference gene family knockdown, where both the problem, and the search technique, are extended. Potential applications include development of therapeutic gene based medicines and enhanced gene function inference.The multi-step RNA Interference (RNAi) process is functionally dependent on matching a double-stranded RNA or fragmented segment of RNA to an equivalent length along a target messenger RNA. Matching as part of a biochemical process renders messenger RNA incapable of completing its intended transcription, effectively preventing translation to amino acids and expression of the anticipated protein encoded by a particular gene. We extend the problem domain to include approximate, rather than exact matching. For gene family knockdown, matching must extend to all members of a target gene set while simultaneously leaving all remaining genes within the genome unmatched. Following exclusion of background genome substrings, we show that the problem naturally reduces to an instance of Set Cover. We transform this instance into a constraint optimization and search for the smallest possible substring set that can represent all members of a target gene family. We extend Simulated Annealing to incorporated parallelism, and present results showing smaller covers determined more efficiently for a parallel implementation in comparsion to a sequential implementation for several combinations of genomes and gene targets. We further speculate on additional ways to incorporate parallelism as a means of extending Simulated Annealing effectiveness.
Phylogenetic Networks: Simulation, Characterization, and Reconstruction
Date: Friday, June 8th, 2007
Time: 9:00 am
Place: FEC 141
Monique Morin
Committee: Bernard Moret (chair), Stephanie Forrest, David Bader, and Randal (Randy) Linder (UT Austin Biology)
Abstract:
Phylogenetic trees are topological depictions of evolutionary histories which involve birth and death events. This tree model inherently assumes that active lineages evolve independently from all others. However, inter-species events such as hybridization and lateral gene transfer are known to occur resulting in an intermingling of DNA information. In essence, this creates a bridge between two previously independent lineages. By definition, these processes are fundamentally dependent on other contemporary species and lead to a more complex history that is often referred to as a phylogenetic network. While tree topology software is abundant and mature, algorithms and tools that permit and address issues specific to phylogenetic networks are less common and still in their infancy.
This work examines the generation of phylogenetic networks and methods for both their reconstruction and characterization. This required the development of three new tools: NetGen for simulating source topologies, NetReconstruct for inferring histories, and NetMeasure for quantitatively comparing and categorizing network features. Phylogenetic networks are created by extending the traditional birth-death model to include hybridizations. This algorithm is unique in that the DNA sequences are evolved in conjunction with the topology, thereby permitting the outcome of hybrid events to be influenced by genetics. By utilizing the sequence information from NetGen's final active lineages, a proposed history containing a single diploid hybrid is inferred. This reconstruction process segments the species into sets for which subtrees are created and then merged to form the final topology. Finally, hybrid events in a topology can be quantitatively characterized with three new measures and two networks can be compared using the existing tripartition method.
Our results show that tripartition scores for this reconstruction model improve when the number of current day species increase and the branch lengths of the topology shorten. Additionally, topologies containing a single ancient hybridization (one which occurred early in time) result in reconstructions more analogous to their source histories as compared to those with hybrids that have occurred more recently. This work supports the ongoing effort of phylogenetic reconstruction in fields such as pharmacology, genetics, and systematics.
Thesis Defense: Psychological Measurement of the Perceptual Quality/Compression Ratio Tradeoff in JPEG and JPEG 2000
Date: Tuesday, June 5th, 2007
Time: 10:00 am
Place: FEC 141
Frederick Crawford
Committee: Lance Williams (chair), Sean Luan, and Melanie Moses
Abstract:
Previous efforts to measure the relative compression efficiencies of
the JPEG and JPEG 2000 lossy image compression algorithms have either been based
on easy to compute quantities, which are not well correlated with perceptual
quality, or on purely subjective methods. This work attempts to assess the tradeoff
between perceptual quality and compression ratio in the two algorithms by measuring
the compression ratio at which each algorithm first introduces noticeable differences.
This is accomplished using a two-alternative, forced-choice experimental design;
subjects are repeatedly presented with pairs of images where one image is either
a compressed or uncompressed copy of the other image. We objectively measure
the subjects’ abilities to detect differences between the two images as
a function of compression ratio. Somewhat surprisingly, our results show that
classical JPEG is superior to JPEG 2000 at low compression ratios and that neither
mean squared error (MSE), root mean squared error (RMSE), nor peak signal-to-noise
ratio (PSNR) are well correlated with just noticeable difference (JND).
Thesis Defense: A Unifying Framework for Computing the Robinson-Foulds Metric
Date: Friday, June 1st, 2007
Time: 10:00 am
Place: FEC 141
Eric Gottlieb
Committee: Bernard Moret(co-chair), George Luger (co-chair), Ed Angel
Abstract:
Phylogenetic Trees are commonly compared using the Robinson-Foulds metric. This in turn is normally computed using Day's Algorithm which works on pairs of trees. Other algorithms such as the Fast Algorithm or the Treehash Algorithm have been proposed as alternatives to Day's algorithm. While, at first glance, these algorithms seem disparate, they may all be understood to be variations of a single unifying algorithm and comparisons between them can be made from that perspective.
Thesis Defense: Psychophysical Investigation of Bistability and Anisotropy in Illusory Contour Shape Computation
Date: Friday, May 11th, 2007
Time: 11:00 am
Place: FEC 141
Robert Young
Thesis advisor: Professor Williams
Committee: Professors Williams, Caudell, Luger, and Moses
Abstract:
Illusory contours provide a glimpse of the mechanisms responsible for computation in human vision. We developed two psychophysical experiments to quantify the shape and sharpness of the illusory contour percept elicited by the Koffka cross figure. In the 1st experiment we developed a novel experimental design based on a two dimensional adaptive partitioning, which allowed us to quantify the shape and sharpness of the illusory contour percept based on participants’ responses to the location of a probe points relative to the illusory contour. Because we were unsatisfied with aspects of this experimental design, we developed a second design that addressed its weaknesses. In the second design, we presented the observer with a Koffka cross figure followed by a Koff;ka cross figure with a literal circle or square in the center gap and asked participants if they detected a shape change. This design allowed us to quickly quantify the illusory contour shape percept for a range of parameters. We discovered that our second experimental design could also be used to determine if the computation underlying illusory contour perception is isotropic by comparing shape results for two figure orientations. We found a statistically significant difierence in the figures at the two orientations which provides strong evidence that illusory contour computation is anisotropic.
Thesis Defense: Bayesian Belief Networks for Astronomical Object Recognition and Classification in CTI-II
Date: Monday, May 7th, 2007
Time: 1:00 pm
Place: FEC 141
Mike Ritthaler
Abstract:
The University of New Mexico (UNM) is currently designing and building
the CCD Transit Instrument II (CTI-II), a 1.8m transit survey
telescope. The stationary CTI-II uses the time delay and integrate
readout mode for a mosaic of charge coupled devices (CCD) to generate
over 100 gigapixels per night which is required to be analyzed within
a day of observation. This work describes the construction and
operation of the BACAN (Bayesian Astronomical Clue Analyzing Network)
system to help with real-time and near-real-time analysis of the data.
This document describes the theory of the Bayesian networks which
underlie BACAN, the choices of features which BACAN uses as evidence,
and the accuracy of numerous classification topologies with a
synthetic data set. An application of self-organizing maps for
discretization of continuous variables is also presented. Finally,
there is an in-depth discussion of the best topology results with an
emphasis on how the reasoning of the network can be shown to the
astronomer using it.
Thesis Defense: Viability of a Parallel Narrow-Band Level Set Method
Date: Wednesday, May 2nd, 2007
Time: 1:00 pm
Place: FEC 141
Eric Martin
Abstract:
Level set methods have brought many desirable properties to the
representation and tracking of arbitrary interfaces at a cost of increased
computation. Previous efforts have reduced the cost of the full-matrix
method significantly by only updating cell values along a dynamic
narrow-band surrounding the interface. The full-matrix method also
demonstrates near linear speedup when calculations are performed in parallel
on a distributed clustered computing environment. The work expressed herein
explores the potential performance increases obtained by combining the narrow-band
method with data dependent parallelism. A parallel narrow-band algorithm is
developed using MPI (Message Passing Interface) in order to test performance
and scalability of general level set methods when applied to very large problems.
Comparisons are made between the full-matrix and narrow-band methods in serial
and parallel by calculating the motion of interfaces due to curvature-driven
flow and an external velocity field. Data shows that significant gains may
be made with a limited number of processors dependent upon the size and partitioning
of the narrow-band. Algorithmic details are discussed and future work is examined
as these advances point to realistic performance increases for level set applications.
Dissertation Defense: Learning Bayesian Networks from Hierarchically Related Data with a Neuroimaging Data Application
Date: Friday, April 20th, 2007
Time: 1:00 pm
Place: FEC 141
John Burge
My primary contributions are twofold. First, I propose a novel application of discrete Bayesian networks—a modeling framework often employed in Machine Learning—to the challenging task of eliciting correlations among neuroanatomical regions of interest from a corpus of functional magnetic resonance imaging data. I demonstrate an application of this method on a dataset consisting of healthy and demented elderly adults, resulting in descriptive models for both populations of subjects. I show how the results of the models can be validated via robustness and confidence testing. Second, I propose three extensions to BN structure search that can be used to improve the results from the demented experiment. These extensions include a method for learning networks which identify correlations that change between classes of data and methods for improving both structure and parameter learning given a hierarchical relationship among modeled random variables. To validate these extensions, I perform a wide variety of experiments on both simulated datasets and five neuroimaging data sets. Overall, I show that incorporating my proposed methods allows for learned models to identify class-discriminative structures more effectively than traditional methods and, for domains with hierarchically related RVs, I demonstrate that higher scoring models can be found with parameters that generalize more effectively
Dissertation Defense: Residual Information on Sanitized Magnetic Media
Date: Tuesday, April 10th, 2007
Time: 11:00 am
Place: FEC 141
Torsten Staab
While a variety of software tools for the recovery of information from formatted, overwritten, or deleted computer hard drives and floppy disks already exist, none of these technologies are capable of recovering information from demagnetized magnetic storage media. De-magnetization, which is also known as degaussing, is a physical process in which magnetic storage media is passed through a very powerful oscillating or constant magnetic field. Degaussing is supposed to rearrange the magnetic domains on the media, thereby removing any semblance of the previously recorded signal. Besides physically distorting the original recording, degaussing also results in extremely low recording Signal-to-Noise Ratios (SNR). The SNR after degaussing is far below the sensitivity level of the storage unit's read head and outside its noise filtering and error correction capabilities. Therefore, degaussed (i.e., sanitized) magnetic media can no longer be read by its original recording system. Information recovery from degaussed magnetic storage media is widely considered to be the holy grail of computer forensics.
The objective of the research presented here was to advance the state-of-the-art in computer forensics by modeling and developing novel signal and image processing algorithms for recovering and "descrambling" of residual information patterns from degaussed magnetic storage media. This presentation will provide an overview of these computational data recovery models and algorithms, discuss experimental results, and outline future extensions.
Thesis Defense: Automated Quality Assurance of Plain Radiographs
Date: Friday, April 6th, 2007
Time: 11:00 am
Place: FEC 141
Rory L.P. McGuire
Committee: Shuang Luan, Lance Williams, Philip Heintz
Picture Archiving and Communication Systems (PACS) are becoming prevalent in many radiology departments. However, with such a system, the radiographic image retake rate is still between 2.3% and 4.6%. At this rate, tens of millions of dollars may be wasted and diagnoses for millions of patients delayed.
The goal of this research is to explore various ways to automatically qualify diagnostic radiographs based on criteria determined by radiologists. We have developed a prototype real-time radiograph quality assurance system that could significantly reduce the overall x-ray retake rate, lower public exposure to radiation, and implicitly train radiology technicians by providing instant feedback on imaging techniques. This system could also reduce time required by the patient, technician and radiologist, thereby reducing the total cost of treatment.
Our system has two main parts: (1) anatomical site categorization of an image, and (2) registration, learning, and classification of an image for each anatomical site. For the first part, we use mutual information to categorize knee and lung images. For the second part, we focus on knee images. We use the Hough transform, a well-known simple edge detection algorithm; the "eigenfaces" method of calculating the principal components of a set of images and reducing the dimensionality of the search space; and a simple, nearest-neighbor supervised learning and classification algorithm. We believe our system could serve as a springboard for similar systems to qualify medical imaging data of other anatomical sites.
Dissertation Defense: Automated Tactics Modeling: Techniques and Applications
Date: Thursday, April 5th, 2007
Time: 8:30 am
Place: FEC 141
Rob Abbott
Committee: Stephanie Forrest, Chair, Terran Lane, Peter Stone, Department of Computer Science, University of Texas at Austin, Jean-Paul Watson, Computer Science Research Institute, Sandia National Labs
Interactive computer simulation can be an efficient and effective tool for training, but faces several technical obstacles: first, agents in the scenario (simulated allies and opponents) must exhibit realistic tactics; second, the dynamics of the simulation must be valid in order to reinforce tactics which are applicable in reality; and third, the training system should provide corrective feedback when students commit errors. To be practical, a technical approach must accomplish all of these objectives quickly and inexpensively. In this dissertation I develop solutions for these requirements with techniques that enable subject matter experts (SMEs) to create models of their own tactics. Each of the three obstacles is addressed with an approach based on behavior modeling: first, the models provide a way to transfer real-world tactics to software agents. Second, the dynamics of the simulator are validated by measuring the effectiveness of real-world vs. synthetic tactics in the simulator. Third, students using the simulator for training are evaluated by comparing their actions to those predicted by models of SME behavior. Because the models are constructed by an automated process, knowledge engineers and software developers need not mediate between the SME and the student audience. The model itself becomes a medium for indirect educational interactions between the SME and student.
Improving the information derived from human brain mapping experiments
Date: February 21st, 2006
Time: 9:00 am
Place: FEC 141
Sergey Plis
Advances in the development of noninvasive imaging modalities allow us to examine human brain function at various ways and at different spatial and temporal scales. However, each modality provides limited information and no single modality covers the large range of spatial and temporal scales over which neural activity operates. There is considerable interest in combining different modalities to obtain a clearer and more complete picture of brain function. This work is focused on combining and improving information extraction from spatiotemporal data of electroencephalography and magnetoencephalography. Contributions of this thesis include 1) a spatiotemporal noise covariance model for better noise characterization and subsequent improvement of source analysis results, 2) a generalization of this covariance model that improves the running time of the analysis routines while maintaining most of the benefits of the full noise covariance model, and 3) a probabilistic forward model for electroencephalography. This probabilistic forward model explicitly accounts for uncertainties in the skull conductivity measurements and demonstrates a principled approach of incorporating uncertainties of the forward model into the Bayesian analysis and propagating them to the resulting posterior distribution.
Anomaly Detection for HTTP Intrusion Detection: Algorithm Comparisons and the Effect of Generalization on Accuracy
Date: December 1st, 2006
Time: 3:00 pm
Place: FEC 141
Kenneth Ingham
Network servers are vulnerable to attack, and this state of affairs shows no sign of abating. Therefore security measures to protect vulnerable software is an important part of keeping systems secure. Anomaly detection systems have the potential to improve the state of affairs, because they can independently learn a model of normal behavior from a set of training data, and then use the model to detect novel attacks. In most cases, this model represents more instances than were in the training data set---this is generalization, and it is necessary for accurate anomaly detection.
This dissertation describes a framework for testing anomaly detection algorithms under identical conditions. Because quality test data representative of today's web servers is not available, this dissertation also describes the Hypertext Transfer Protocol (HTTP) request data collected from four web sites to use as training and test data representing normal HTTP requests. A collection of attacks against web servers and their applications did not exist either, so prior to testing it was necessary to also build a database of HTTP attacks, the largest publicly-available one.
These data were used to test nine algorithms. This testing was more rigorous than any performed previously, and it shows that the previously-proposed algorithms are not accurate enough for production use on many of the web servers in use today, and might explain the lack of their widespread adoption. Two newer algorithms (deterministic finite automaton induction and $n$-grams) show more promise.
This dissertation shows that accurate anomaly detection requires carefully controlled generalization. Too much or too little will result inaccurate results. Calculating the growth rate of the set that describes the anomaly detector's model of normal provides a means of comparing anomaly detection algorithms and predicting their accuracy. Identification of undergeneralization locations can be automated, leading to more rapid discovery of the heuristics needed to allow an anomaly detection system to achieve the required accuracy for production use.
Ph.D. Dissertation Defense: Document Interrogation: Architecture, Information Extraction, Approximate Answers
Date: November 7th, 2006
Time: 10:00 am
Place: FEC 141
Soraya Abad-Mota
Organizations produce large numbers of documents. Often the contents of these documents do not reach the operational databases or data warehouses of the enterprise. With the world-wide accessibility to the web these documents are made available to a wide audience, but browsing through them manually is cumbersome, at best. The definition of the semantic web has led to fascinating possibilities of research in trying to make explicit the semantics of terabytes of unstructured data available today. Our research seeks to improve the use of these unstructured sources, in particular, textual sources.
We present an architecture for structuring and querying the contents of a set of documents which belong to an organization. The structure is a database which is semi-automatically populated using information extraction techniques. A language to interrogate the contents of the documents is provided and it is based on an ontology. The processing of queries in this language can provide approximate answers and triggers a mechanism for improving the answers by searching additional sources. Individual database items have associated quality metadata that can be used in the assessment of answers' quality. The interaction between information extraction and query processing is a pivotal aspect of this research.
Master's Thesis Defense: Implementation of Steerable Pyamide with Hexagonal Sampling
Date: November 7th, 2006
Time: 10:00 am
Place: Dean's
Office Conference Room
Ron Hospelhorn
Multi-scale image processing frequently serves as the first step in scene analysis, pattern recognition, texture analysis, image reconstruction, and many other early vision tasks. A pyramid architecture analyzes an image independently at several scales, and a steerable pyramid can extract oriented features at each of those scales. Hexagonal sampling of image data promises processing economies over rectangular sampling as well as higher angular resolution. This thesis discusses the implementation of filters for hexagonally sampled data and uses them in two steerable pyramid designs.
Master's Thesis Defense: Real-time Application in Dome Theater Systems
Date: November 3rd, 2006
Time: 2:30 pm
Place: FEC 141
Jin Xiong
Digital fulldome theaters provide an immersive environment for educational and scientific purposes. The architecture of the new generation of digital fulldome systems enables more flexibility in creating fulldome shows. The adoption of real-time rendering provides even more versatility. As an exploration of future applications of fulldome systems, we successfully implemented dome versions of the Pong game and a fractal generator. These research works exhibit the potential interactive capability of fulldome systems.
In this thesis, first we introduce the architecture of current fulldome theater systems and the process of presenting a fulldome show. We then explore possible approaches to real-time rendering in multi-projector system as a general case and the fulldome theater system as a special case. Finally we describe the implementation of interactive real-time applications that are based on the cluster architecture of fulldome theater systems.
Ph.D. Dissertation Defense: String Kernels and Their Applications to RNA Interference
Date: Friday, August 25th, 2006
Time: 8 am
Place: FEC 141
Shibin Qiu
Many bioinformatic problems require effective pattern recognition and machine learning algorithms. Kernel methods are powerful approaches for data classification, prediction and description. In this work, we develop kernel learning methods for bioinformatics data by introducing novel string kernels and unifying multiple kernels, and apply them to problems in RNA interference (RNAi), a cutting edge technology with many biological and pharmaceutical applications. We develop RNA string kernels to model RNA sequence similarities by simulating mismatch, G-U wobble, and bulge patterns. For the first time, we give RNAi a mathematical formalism using RNA kernels. To systematically quantify the off-target effects of RNAi, we define an off-target error rate and evaluate it using RNA kernels. Since computing the error rates requires large, transcriptomic scale searches, we develop efficient implementations for the RNA kernels, which have achieved speedups of six orders of magnitude. The resultant off-target error rates in multiple organisms characterize the quantitative relationship between the error rates and RNAi parameters, are consistent with biological findings, provide guidelines for RNAi designs, and project error rates for future RNAi biological applications. To further improve upon the RNA kernels, we propose a group of randomized string kernels and convert a random Hamming distance into a string kernel. For the first time, we use support vector regression to predict the exact silencing efficacy scores for siRNA sequences. We employ the RNA and randomized string kernels in regression prediction and have achieved better accuracies on biological data sets than do existing string kernels or numerical kernels computed from efficacy rules proposed in the RNAi literature. To better capture variations of the data, we develop a multiple kernel regression technique that unifies multiple kernels via a quadratically constrained quadratic programming (QCQP) formulation. Results demonstrate that using more than one kernel simultaneously improves prediction accuracy and reduces the number of support vectors, turning a complicated problem (requiring more support vectors) into an easier one. For simplification of the multiple kernel framework, we propose several heuristics based on the fitness between a kernel and a label set. Applying the heuristics to the siRNA efficacy problem shows that combining string kernels with numerical kernels from efficacy rules significantly improves prediction accuracy. The multiple kernel framework also estimates the relative importance of the kernels representing different siRNA efficacy rules, enabling RNAi designers to focus on important rules.
MS Thesis Defense: A New, Virtual-Machine-Independent Approach to Garbage Collection Tracing
Date: Friday, August 11th, 2006
Time: 9 am.
Place: FEC 141
James Foucar
A garbage collection trace (GC-trace) enumerates all the events that happen during program execution that affect the heap. Therefore, GC-tracing is an excellent way to examine the behavior of an object-oriented system. Knowing behavior expedites many forms of virtual machine research including garbage collection research. GC-traces were originally generated by the painfully slow brute-force technique. Subsequently, this technique was replaced by the Merlin algorithm which can generate GC-traces at a fraction of the cost of the brute-force algorithm.
Our work introduces a new GC-tracing system. Our system uses the Merlin algorithm within the context of a general shadow heap that is independent of the virtual machine. The shadow heap supports many optimizations to the Merlin algorithm and also supports analysis and verification. We examine the advantages and disadvantages of our approach, the viability of using a C-library for analysis in conjunction with a Java-in-Java virtual machine, the various costs of integrating our shadow heap with a virtual machine, and the effectiveness of our GC-tracing optimizations. Finally, we compare our GC-tracing system with the GC-tracing system that is currently bundled with JikesRVM.
Thesis & Dissertation Defenses Archive
