# UNM Computer Science

## Thesis and Dissertation Defenses

### Interactive Exploration of Multitask Dependency Networks

Date: May 2, 2013
Time: 3:00 pm
Place: FEC 141

Diane Oyen

Committee Chair:
Dr. George Lugar, UNM CS Department

Committee Co-Chair:
Dr. Terran Lane, UNM CS Department

Committee Member:
Dr. Eric Eaton, Computer Science Department, Bryn Mawr College

Committee Member:
Dr. Vincent Clark, UNM MIND Research Network

Committee Member:
Dr. Lance Williams, UNM CS Department

Abstract:
Scientists in various fields increasingly depend on machine learning algorithms to discover patterns in complex data. Two specific examples addressed in this dissertation are: (1) identifying how information sharing among regions of the brain develops due to learning or gets disrupted due to disease; and, (2) learning dependency networks of blood proteins associated with cancer. In both cases dependency networks, or graphical models, are learned from the observed data in order to make comparisons between the various sub-populations of the dataset. Rarely is there sufficient data to infer robust individual networks for each sub-population. Therefore, the multiple networks must be considered simultaneously; dramatically expanding the hypothesis space of the learning problem, and furthermore there are numerous solutions that are nearly score-equivalent. Effectively exploring this complex solution space requires input from the domain scientist to refine the objective function.

This dissertation proposes incorporating domain knowledge in transfer learning algorithms for network structure learning to facilitate the exploration of solutions. Existing transfer learning algorithms for network structure identification do not allow for any human input or feedback. This dissertation gives a framework for incorporating human input or prior knowledge into any transfer network structure learning algorithm and shows that it is a generalization of existing formulations. Furthermore, solutions produced with human input narrow down the variance of solutions to those solutions that are of interest to domain scientists. Transfer algorithms have been developed under the assumption that transfer will strengthen the learning power for those parts of the networks that are shared in common among various tasks. Yet, domain scientists are often more interested in learning the differences between tasks. This dissertation shows that differences between networks are learned with higher confidence using transfer learning algorithms than through the existing standard method of bootstrapping. This finding implies that transfer algorithms may be the ideal method for making comparisons among related dependency networks, whether looking for similarities or differences. Finally, an interactive tool is provided to enable domain scientists to explore the full space of solutions efficiently.

### Ontology-based Annotation Using Naive Bayes and Decision Trees

Date: April 26, 2013
Time: 10:00 am
Place: Centennial Engineering Center, Dean's Conference room, 3071

Jiawei Xu

Committee Chair:
Dr. George Lugar, UNM CS Department

Committee Member:
Dr. Lance R. Williams, UNM CS Department

Committee Member:
Dr. Jessica A. Turner, UNM MIND Research Network

Abstract:
The Cognitive Paradigm Ontology (CogPO) defines an ontological relationship between academic terms and experiments in the field of neuroscience. BrainMap is a database of literature describing these experiments, which are annotated by human experts based on the ontological framework defined in CogPO. We present a stochastic approach to automate this process. We model the human expert's domain knowledge with a group of naive Bayes classifiers, then explore the inherent relationship among different components defined by the ontology using a probabilistic decision tree model. Our solution simulates the basic decision making process of a human expert, and outperforms conventional text mining approaches by taking advantage of an ontology which structuralizes the specific knowledge domain.

### Font Rendering on the GPU

Date: March 1, 2013
Time: 3:00 pm
Place: FEC 141

Jeffrey Bowles

Committee Chair:
Dr. Joe Kniss, UNM CS Department

Committee Member:
Dr. George Lugar, UNM CS Department

Committee Member:
Dr. Edward Angel, UNM CS Department

Abstract:
The glyphs described in a font are usually rendered using software that runs on the CPU. This can be done very quickly and is a well understood process. These techniques work well when the glyph is rendered such that each point on the rendered glyph maps one-to-one to pixels on the screen. When this one-to-one mapping is removed, visual quality can degrade dramatically.

This thesis describes a method to avoid this degradation by moving the rendering of glyphs from the CPU to the Graphics Processing Unit (GPU). This method extracts glyph metrics from Truetype font files using the Freetype library, processes all glyph data in the font file and places it into a form that can be used by GPU shader programs to draw the glyphs such that the visual quality does not degrade regardless of the applied transformations.

### Scaling in the Immune System

Date: January 31, 2013
Time: 10:00 am
Place: FEC 141

Soumya Banerjee

Committee Chair:
Dr. Melanie Moses, UNM CS Department

Committee Member:
Dr. Stephanie Forrest, UNM CS Department

Committee Member:
Dr. Terran Lane, UNM CS Department

Committee Member:
Dr. Alan S. Perelson, Los Alamos National Laboratory

Committee Member:
Dr. Frederick Koster, UNM Center for Infectious Disease and Immunity

Abstract:
How different is the immune system in a human from that of a mouse? Do pathogens replicate at the same rate in different species? Answers to these questions have impact on human health since multi-host pathogens that jump from animals to humans affect millions worldwide.

It is not known how rates of immune response and viral dynamics vary from species to species and how they depend on species body size. Metabolic scaling theory predicts that intracellular processes are slower in larger animals since cellular metabolic rates are slower. We test how rates of pathogenesis and immune system response rates depend on species body size.

We hypothesize that immune response rates are invariant with body size. Our work suggests how the physical architecture of the immune system and chemical signals within it may lead to nearly scale-invariant immune search and response. We fit mathematical models to experimental West Nile Virus (WNV, a multi-host pathogen) infection data and investigate how model parameters characterizing the pathogen and the immune response change with respect to animal mass.

Phylogeny also affects pathogenesis and immune response. We use a hierarchical Bayesian model, that incorporates phylogeny, to test hypotheses about the role of mass and phylogeny on pathogen replication and immune response. We observe that:
1) Hierarchical models (informed by phylogeny) make more accurate predictions of experimental data and more realistic estimates of biologically relevant parameters characterizing WNV infection.
2) Rates of WNV production decline with species body mass, modified by a phylogenetic influence.

Our work is the first to systematically explore the role of host body mass in pathogenesis using mathematical models and empirical data. We investigate the complex interplay between the physical structure of the immune system and host body mass in determining immune response. The modeling strategies and tools outlined here are likely to be applicable to modeling of other multi-host pathogens. This work could also be extended to understand how drug and vaccine efficacy differ in humans from model organisms like mice where most immunological experiments are conducted.

### Hopscotch: Robust Multi-agent Search

Date: November 27, 2012
Time: 10:00 am
Place: FEC 141

Edward "Chris" Miles

Committee Chair:
Dr. Melanie Moses, UNM CS Department

Committee Member:
Dr. Lydia Tapia, UNM CS Department

Committee Member:
Dr. Rafael Fierro, UNM ECE Department

Abstract:
The task of searching a space is an important one with numerous applications. Because many times the search task takes place in a remote or hazardous location, and because the task is easily divisible, there has been a lot of work exploring the use of multi-robot teams to accomplish the search task. An important topic of research in this area is the division of the task among robot agents. Interrelated with subtask assignment is failure handling, in the sense that, when an agent fails, its part of the task must then be performed by other agents.

This thesis describes Hopscotch, a multi-agent search strategy that divides the search area into a grid of squares. Each agent is assigned responsibility to search one square lot at a time, and upon completing the search of that lot the agent is assigned a new lot. Assignment occurs in real time using a very simple contract net. Because lots that have been previously searched are skipped, the order of search from the point of view of a particular agent is reminiscent of the progression of steps in the playground game of hopscotch.

Decomposition of the search area is a common approach to multi-agent search, and auction-based contract net strategies have appeared in recent literature as a method of task allocation in multi-agent systems. The Hopscotch strategy combines the two, with a strong focus on robust tolerance of agent failures. Conventionally, contract nets divide all known tasks among available resources. In contrast, Hopscotch limits each agent to one assigned lot at a time, so that failure of an agent compels reallocation of only one lot search task. Furthermore, the contract net is implemented in an unconventional manner that empowers each agent with responsibility for contract management. This novel combination of real-time assignment and decentralized management allows Hopscotch to resiliently cope with agent failures.

The Hopscotch strategy was modeled and compared to other multi-agent strategies that tackle the search task in a variety of ways. Simulation results show that Hopscotch is very effective and failure-tolerant in comparison to the other approaches. Although the search task modeled here is a basic one, results from simulations show the promise of using this strategy for more complicated scenarios, and with actual robot agents.

### Speculatively Bootstrapping Better Intrusion Detection System Performance

Date: September 27, 2012
Time: 9:30 am
Place: FEC 141

Sunny Fugate

Committee Chair:
Dr. George Luger, UNM CS Department

Committee Co-Chair:
Dr. Jed Crandall, UNM CS Department

Committee Members:
Dr. Tom Hayes, UNM CS Department
Dr. Tom Caudell, UNM CS Department
Dr. LorRaine Duffy, US Navy

Abstract:
During the last 25 years, the designers of computer Intrusion Detection Systems (IDS) have been continuously challenged with performance bottlenecks and scalability issues. The number of threats is enormous. But the more we try to detect, the more of our computing resources are being used solely for protection, whittling away at what remains for performing useful computations. The performance of many IDS systems depends primarily on the quantity of input data and complexity of detected patterns. During noisy attacks, system load tends to increase proportional to increasing data rates, making IDS systems vulnerable to flooding and denial-of-service attacks. Unfortunately, the number, type, and sophistication of threats is quickly increasing, outpacing our ability to detect them.

This dissertation describes methods for assessing the current scaling performance of signature-based intrusion detection systems (IDS) and presents models for speculatively bootstrapping better IDS performance. Using measurements of the coverage and scaling performance of a modern signature-based IDS in the context of an anticipatory model I argue that maintaining compact, low-coverage signature-sets does not provide optimal protection for modern heterogeneous computing environments. Nonetheless, I argue that signature-based IDS will remain a necessary approach in our arsenal of detection approaches, providing the requisite precision for meaningful detection. My primary contribution is an analysis of how mechanisms of anticipatory bias can be used to achieve performance improvements.

To support my theoretical models, I have explored and implemented two principal approaches. The first uses a combination of anticipation and feedback in an attempt to decrease per-signature costs by (counter-intuitively) increasing system coverage. The approach uses learned sequence statistics to make predictions of future events. Each prediction above a chosen threshold is used to decrease per-stream detection cost by shunting traffic to smaller detectors (at the cost of increased error rates). The second approach applies primarily to improving the performance of IDS when under stress. When overburdened, an IDS will drop input data (often arbitrarily). I describe a "probabilistic signature activation" approach which improves error rates by decreasing the total amount of input data lost by probabilistically dropping signature activations based on learned event statistics and system load.

### Energy Conserving Privacy Enhancing Algorithms in Resource-Constrained Devices

Date: July 27, 2012
Time: 3:00 pm
Place: Room: FEC 141

Michael Groat

Committee Chair:
Dr. Stephanie Forrest, UNM CS Department

Committee Co-Chair:
Dr. Wenbo He, UNM CS Department

Committee Members:
Dr. Jared Saia, UNM CS Department
Dr. Terran Lane, UNM CS Department
Dr. Fernando Esponda, Professor of Computer Science at Instituto Tecnologico Autonomo de Mexico

Abstract:
Resource-constrained devices such as wireless sensor networks, body area networks, or smart phones increasingly collect confidential and sensitive information about their users. Traditional solutions to protect these data, such as encryption, consume a significant amount resources to be viable. In this dissertation, I present two energy efficient information collection protocols based on the notion that by relaxing the definition of privacy, for example by using indistinguishability, energy use can be reduced. The first protocol, multidimensional negative surveys (MDNSs), protects multivariate categorical data by perturbing data to something other than what was actually sensed and transmitting the perturbed value to a central information collection server, thus making the data indistinguishable from the other locations in transit and at the server. The second protocol, KIPDA, protects the privacy of data that are aggregated in wireless sensor networks. It is specialized for the maximum and minimum aggregation functions and is one of the first techniques to provide protection from other adversarial nodes in the network. Sensitive data are obfuscated by hiding them among a set of camouflage values, enabling k-indistinguishability data aggregation. Because the sensitive data are not encrypted, they can be easily and efficiently aggregated with minimal in-network processing delay. While radio usage is expensive, I show through analysis, simulations, and implementations that broadcasting a modest amount of camouflage data is more energy efficient when encryption is eliminated. Simulations and implementations on physical devices illustrate how both approaches can protect the privacy of a participant's data, while reducing energy use and allowing useful aggregate information to be collected.

### Multivalent Random Walkers: A Computational Model of Superdiffusive Transport at the Nanoscale

Date: July 24, 2012
Time: 1:00 pm
Place: Room: FEC 112 (Conference Room in the main CS Office)

Mark J. Olah

Committee Chair:
Dr. Darko Stefanovic, UNM CS Department
Committee Members:
Dr. Cris Moore, UNM CS Department
Dr. Lance Williams, UNM CS Department
Dr. Milan Stojanovic, Assistant Professor at Medicine at Columbia University Departmentent of Pharmacy

Abstract:
We present a new stochastic model and associated simulation framework for a synthetic nanoscale walker that can be used to transport materials and information at superdiffusive rates in artificial molecular systems. Our \emph-multivalent random walker- model describes the motion of walkers with flexible, enzymatic legs that bind to and modify chemical sites arranged as nanoscale paths and tracks.

We use Markov Chain Monte Carlo techniques to simulate both the equilibrium and non-equilibrium behavior of the walker systems at different time-scales, allowing us to model the influence of external load forces on the walker. We show that multivalent random walkers move superdiffusively and directionally even in opposition to a force, operating as a new type of molecular motor that transforms chemical energy from surface bound sites into biased, directional motion and ordered, mechanical work. To generate these results we developed a distributed object-oriented simulation and analysis framework that uses object-relational mapping to store simulation objects persistently in a relational database.

### Energy Consumption in Networks on Chip: Efficiency and Scaling

Date: July 5, 2012
Time: 8:00 am
Place: Room: FEC 141

George Bezerra

Committee Chair:
Dr. Stephanie Forrest, UNM CS Department
Committee Members:
Dr. Melanie Moses, UNM CS Department
Dr. Dorian Arnold, UNM CS Department
Dr. Payman Zarkesh-Ha, Assistant Professor at UNM Electrical and Computer Engineering Department

Abstract:
Computer architecture design is in a new era where performance is increased by replicating processing cores on a chip rather than making larger and faster CPUs. This design strategy is motivated by the superior energy efficiency of the multi-core architecture compared to the traditional monolithic CPU. If the trend continues as expected, the number of cores on a chip is predicted to grow exponentially over time as the density of transistors on a die increases.

A major challenge to the efficiency of multi-core chips is the energy cost of communication among cores over a Network on Chip (NoC). As the number of cores increases, this energy also increases, imposing serious constraints on design and performance of both applications and architectures. Therefore, understanding the impact of different design choices on NoC power and energy consumption is crucial to the success of the multi- and many-core designs.

This dissertation proposes methods for modeling and optimizing energy consumption in multi- and many-core chips, with special focus on the energy used for communication on the NoC. We present a number of tools and models to optimize energy consumption and model its scaling behavior as the number of cores increases. We use synthetic traffic patterns and full system simulations to test and validate our methods. Finally, we take a step back and look at the evolution of computer hardware in the last 40 years and, using a scaling theory from biology, present a predictive theory for power-performance scaling in microprocessor systems.

### Fault-tolerant Wireless Sensor Networks using Evolutionary Games

Date: June 27, 2012
Time: 10:00 am
Place: Room: FEC 145

Ricardo Villalon

Committee Chair:
Dr. Patrick Bridges, UNM CS Department
Committee Members:
Dr. Melanie Moses, UNM CS Department
Dr. David Ackley, UNM CS Department
Dr. Thomas Caudell, Associate Professor at UNM Electrical and Computer Engineering Department

Abstract:
This dissertation describes an agent-based approach to optimizing protocols for wireless sensor networks, where the behavior of the system is determined by the makeup of the population of agents. This approach optimizes the composition of the population of agents based on the use of biologically-inspired strategies, primarily evolutionary games. In addition, this dissertation describes several improvements to the general evolutionary game strategy of system optimization.

To enable the construction and evaluation of systems based on this approach, this dissertation also describes the design and implementation of a software framework for implementing agent-based wireless network protocols in the TinyOS software environment. This includes the design and implementation of a network routing protocol for TinyOS using the agent-based approach to create applications for wireless sensor networks. The evaluation of this approach demonstrates its effectiveness in optimizing wireless network routing in both failure-free and faulty networks.

### Privacy-Preserving Techniques for Computer and Network Forensics

Date: March 12, 2012
Time: 8am to 11am
Place: Room 118, UNM Electrical and Computer Engineering

Bilal Shebaro

Committee Chair:
Dr. Jedidiah Crandall, UNM CS Department
Committee Members:
Dr. Dorian Arnold, UNM CS Department
Dr. Fernando Perez-Gonzalez, Professor at UNM Electrical and Computer Engineering Department
Dr. Pedro Comesana Alfaro, Assistant Professor at UNM Electrical and Computer Engineering Department

Abstract:
Clients, administrators, and law enforcement personnel have many privacy concerns when it comes to network forensics. Clients would like to use network services in a free environment that protects their privacy and personal data. Administrators would like to monitor their network, and audit its behavior and functionality for debugging and statistical purposes (which could involve invading the privacy of its network users). Finally, members of law enforcement would like to track and identify any type of digital crimes that occur on the network, and charge the suspect with the appropriate crimes. Members of law enforcement could use some security back doors made available by network administrators, or other forensic tools, that could potentially invade the privacy of network users.

In my dissertation, I will be identifying and implementing techniques that each of these entities could use to achieve their goals while preserving the privacy of users on the network. I will show a privacy-preserving implementation of network flow recording that can allow administrators to monitor and audit their network behavior and functionality for debugging and statistical purposes without having this data contain any private information about its users. This implementation is based on identity-based encryption and differential privacy. I will also be showing how law enforcement could use timing channel fingerprinting techniques to identify suspect users who are running websites with illegal content and services anonymously. Finally I will show the results from a thought experiment about how network administrators can identify pattern-like software that is running on clients' machines remotely without any administrative privileges.

The goal of my work is to understand what privileges administrators or law enforcement need to achieve their goals, and the privacy issues inherent in this, and to develop technologies that help administrators and law enforcement achieve their goals while preserving the privacy of network users.

### Applications of Voronoi Partitions in Particle Therapy

Date: November 14, 2011
Time: 1pm to 3pm
Place: Room 190, Department of Physics and Astronomy ~ 1919 Lomas Blvd NE

Daniel Riofrio

Committee Chair:
Dr. Shuang Luan, UNM CS Department
Committee Members:
Dr. Michael Holzscheiter, Adjunct Professor at UNM Physics Department
Dr. Philip H. Heintz, Professor at UNM Department of Radiology

Abstract:
Treating cancer using charged particles is becoming more and more popular in modern cancer management due to its increased dose to the targeted tumors and improved sparing of surrounding normal tissues and critical structures. Many challenging and interesting mathematical optimization problems arise in the planning of charged particle radiation therapy. In this thesis, we study three important optimization problems in particle therapy, which includes dose optimization, energy modulation change reduction, and LET (linear energy transfer) painting.

Generally speaking the goal of treatment planning is to use computer based algorithms and software to find a therapeutic plan that maximizes the "dose" delivered to the target tumor while at the meantime minimizing that to the surrounding normal tissues and critical structures. The word "dose" here can refer to physical dose, i.e., energy imparted by the incoming ionizing radiation to the patient, radiobiological dose such as percentage of cells killed, or a combination of both. As an example, the LET painting problem that we studied can be viewed as a combination of physical dose and radiobiological dose, because the LET distribution of a plan can be viewed as a "surrogate" for beam quality and increasing the LET can lead to increased cell killing efficiency under certain circumstances. Various machine properties are also considered in these optimizations. In most particle facilities, changing the beam energies requires an undesirable delay; therefore, in the energy modulation change reduction we aim to find a high quality treatment plan with minimum number of energy changes without compromising the final physical dose distributions.

The contribution of the thesis include the following. (1) We have developed a parameterizable prototype treatment planning system for physical dose optimizations which implements kernel based dose calculations for non-uniform mediums, and dose optimization using non-negative least squares. (2) We found that Voronoi partitions can provide effective heuristic solutions to the energy modulation change reduction and LET painting problems. In addition, this thesis also identified an array of important and challenging computational problems that are not only of importance to the clinicians but also of considerable interests to computer scientists.

### Efficiently Bootstrapping Extreme Scale Software Systems

Date: September 7, 2011
Time: 9:30AM
Place: FEC 141

Josh Goehner

Committee Chair:
Dorian Arnold, Assistant Professor of Computer Science at UNM
Committee Members:
Patrick Bridges, Associate Professor of Computer Science at UNM
Lydia Tapia, Assistant Professor of Computer Science at UNM
Dong Ahn, Lawrence Livermore National Laboratory
Greg Lee, Lawrence Livermore National Laboratory

Abstract:
Many scientific fields and commercial industries need to solve computationally large problems. These problems can be broken into smaller sub-problems, which can be executed in parallel, on large computer systems. In pursuit of solving ever larger problems in a more timely manner, the number of computers in these large computer systems have grown to extremely large scales. All of the extreme scale software systems that run on these extreme scale computational systems go through what we call a bootstrapping phase. During this phase, the software system is deployed onto a set of computers and its initialization information is disseminated.

In this thesis, we present a framework called the Lightweight Infrastructure-Bootstrapping Infrastructure (LIBI) to support extreme scale software systems during their bootstrapping phase. The contributions of this thesis are as follows: a classification system for process launching strategies, an algorithm for optimal process launching, an implementation of LIBI and a performance evaluation of LIBI. Our performance evaluation demonstrates that we decreased the time required for software system bootstrapping by up to 50%.

### Practical, Scalable Algorithms for Byzantine Agreement

Date: August 23, 2011
Time: 2:00PM
Place: FEC 141

Olumuyiwa Oluwasanmi

Committee Chair:
Jared Saia, Associate Professor of Computer Science at UNM
Committee Members:
Valerie King, Professor of Computer Science at University of Victoria
Patrick Bridges, Associate Professor of Computer Science at UNM
Cris Moore, Professor of Computer Science at UNM

Abstract:
With the growth of the Internet, there has been a push toward designing reliable algorithms that scale effectively in terms of latency, bandwidth and other computational resources. Scalability has been a serious problem especially with peer-to-peer(p2p) networks which may have sizes of more than a million nodes.

An important problem in designing reliable algorithms is Byzantine agreement. For reasons of scalability, message complexity is a critical resource for this problem. Unfortunately, previous solutions to Byzantine agreement require each processor to send $O(n)$ messages, where $n$ is the total number of processors in the network.

In this dissertation, we show that the Byzantine Agreement problem can be solved with significantly less that a linear number of messages both in theory and in practice. We implement and test algorithms that solve the classical problem with each processor sending only $\tilde{O}(\sqrt{n})$ messages. Further, we consider the problem in the case where we assume the existence of a random beacon: a global source of random bits. We show that with this assumption, the required number of messages drops to $O(\log n)$, with small hidden constants.

Our algorithms are Monte-Carlo and succeed with high probability, that is probability $1-O(n^k)$ for some positive constant $k$. Our empirical results suggest that our algorithms may outperform classical solutions to Byzantine agreement for network of size larger than 30,000 nodes.

### Keeping Traditional Checkpoint/Restart Viable for Exascale Systems

Date: July 22, 2011
Time: 11:00AM
Place: FEC 145

Kurt Ferreira

Committee Chair:
Patrick Bridges, Associate Professor of Computer Science at UNM
Committee Members:
Dorian Arnold, Assistant Professor of Computer Science at UNM
Jed Crandall, Assistant Professor of Computer Science at UNM
Michela Taufer, Assistant Professor of Computer Science at University of Delaware

Abstract:
Next-generation exascale systems, those capable of performing a quintillion or 10^18 operations per second, are expected to be delivered in the next 8-10 years. These systems, which will be a 1,000 times faster than current systems, will be of unprecedented scale. As these systems continue to grow in size, faults will become increasing common even over the course of small calculations. Therefore, issues such as fault tolerance and reliability will limit application scalability. Current techniques to ensure progress across faults like checkpoint/restart, the dominant fault tolerance mechanism for the last 25 years, are increasing problematic the scales of future systems due to the excessive overheads predicted to more than double an application's time to solution. In this work we evaluate a number of methods to decrease the overhead of checkpoint/restart and therefore keep this method viable for exascale systems. More specifically, this work evaluates state-machine replication to dramatically increase the checkpoint interval (the time between successive checkpoint) and hash-based, probabilistic incremental checkpointing using graphics processing units to decrease the checkpoint commit time (the time to save one checkpoint). Using a combination of empirical analysis, modeling, and simulation we study the costs and benefits of these approaches on a wide range of parameters. These results, which cover of number of high-performance computing capability workloads, different failure distributions, hardware mean time to failures, and I/O bandwidths, show the potential benefit of these techniques for meeting the reliability demands of future exascale platforms.

### Improving peer review with ACORN: Ant Colony Optimization algorithm for Reviewer's Network

Date: June 14, 2011
Time: 10:00AM
Place: FEC 145

Mark Flynn

Committee Chair:
Melanie Moses, Assistant Professor of Computer Science at UNM
Committee Members:
George Luger, Professor and Associate Chair of Computer Science at UNM
Kshanti Greene, President of Social Logic Institute

Abstract:
Peer review, our current system for determining which papers to accept and which to reject by journals and conferences, has limitations that impair the quality of scientific communication. Under the current system, reviewers have only a limited amount of time to devote to evaluating papers and each paper receives an equal amount of attention regardless of how good the paper is. We propose to implement a new system for computer science peer review based on ant colony optimization (ACO) algorithms. In our model, each reviewer has a set of ants that goes out and finds articles. The reviewer assesses the paper that the ant brings according to the criteria specified by the conference organizers and the ant deposits pheromone that is proportional to the quality of the review. Each subsequent ant then samples the pheromones and probabilistically selects the next article based on the strength of the pheromones. We used an agent-based model to determine if an ACO-based paper selection system will direct reviewers. attention to the best articles and if the average quality of papers increases with each round of reviews. We also conducted an experiment in conjunction with the 2011 UNM Computer Science Graduate Student Association conference and compared the results with our simulation. To assess the usefulness of our approach, we compared our algorithm to a greedy algorithm that always takes the best un-reviewed paper and a latent factor analysis recommender-based system. We found that the ACO-based algorithm was more robust to biased initial reviews and correctly ranked these papers better than either of the greedy or recommender algorithms.

### Fine-grained Reasoning about the Security and Usability Trade-off in Modern Security Tools

Date: May 10, 2011
Time: 11:00AM
Place: FEC 141

Mohammed Al-Saleh

Committee Chair:
Jed Crandall, Assistant Professor of Computer Science at UNM
Committee Members:
Dorian Arnold, Assistant Professor of Computer Science at UNM
Terran Lane, Associate Professor of Computer Science at UNM
Rafael Fierro, Associate Professor of Electrical and Computer Engineering at UNM

Abstract:
Defense techniques detect or prevent attacks based on their ability to model the attacks. A balance between security and usability should always be established in any kind of defense technique. Attacks that exploit the weak points in security tools are very powerful and thus can go undetected. One source of those weak points in security tools comes when security is compromised for usability reasons, where if a security tool completely secures a system against attacks the whole system will not be usable because of the large false alarms or the very restricted policies it will create, or if the security tool decides not to secure a system against certain attacks, those attacks will simply and easily succeed.

The key contribution of this dissertation is that it digs deeply into modern security tools and reasons about the inherent security and usability trade-offs based on identifying the low-level, contributing factors to known issues. This is accomplished by implementing full systems and then testing those systems in realistic scenarios. The thesis that this dissertation tests is that this approach is more fruitful in terms of exposing the underlying causes of usability trade-offs than existing approaches which use controlled experiments in more limited environments. Furthermore, this dissertation provides practical solutions and suggestions to reach a good balance between security and usability. We study two modern security tools, Dynamic Information Flow Tracking (DIFT) and Antivirus (AV) software, for their importance and wide usage.

DIFT is a powerful technique that is used in various aspects of security systems. It works by tagging certain inputs and propagating the tags along with the inputs in the target system. However, current DIFT systems do not track implicit information flow because if all DIFT propagation rules are directly applied in a conservative way, the target system will be full of tagged data (a problem called overtagging) and thus useless because the tags tell us very little about the actual information flow of the system. So, current DIFT systems drop some security for usability. In this dissertation, we reason about the sources of the overtagging problem and provide practical ways to deal with it, while previous approaches have focused on abstract descriptions of the main causes of the problem based on limited experiments.

The second security tool we consider in this dissertation is antivirus (AV) software. AV is a very important tool that protects systems against worms and viruses by scanning data against a database of signatures. Despite its importance and wide usage, AV has received little attention from the security research community. In this dissertation, we examine the AV internals and reason about the possibility of creating timing channel attacks against AV software. The attacker could infer information about the AV based only on the scanning time the AV spends to scan benign inputs. The other aspect of AV this dissertation explores is the low-level AV performance impact on systems. Even though the performance overhead of AV is a well known issue, the exact reasons behind this overhead are not well-studied. In this dissertation, we design a methodology that utilizes Event Tracing for Windows technology (ETW), a technology that accounts for all OS events, to reason about AV performance impact from the OS point of view. We show that the main performance impact of the AV on a task is the longer waiting time the task spends waiting on events. The extra CPU usage time the task is caused to spend because of the AV's intrusiveness is also a factor.

### Parallel Network Protocol Stacks Using Replication

Date: April 29, 2011
Time: 11:00AM
Place: FEC 141

Donour Sizemore

Committee Chair:
Patrick Bridges, Associate Professor of Computer Science at UNM
Committee Members:
Dorian Arnold, Assistant Professor of Computer Science at UNM
Jed Crandall, Assistant Professor of Computer Science at UNM
Nasir Ghani, Associate Professor of Electrical and Computer Engineering at UNM

Abstract:
Computing applications demand good performance from networking systems. This includes high-bandwidth communication using protocols with sophisticated features such as ordering, reliability, and congestion control. Much of this protocol processing occurs in software, both on desktop systems and servers. Multi-processing is a requirement on today's computer architectures because their design does not allow for increased processor frequencies. At the same time, network bandwidths continue to increase. In order to meet application demand for throughput, protocol processing must be parallel to leverage the full capabilities of multi-processor or multi-core systems. Existing parallelization strategies have performance difficulties that limit their scalability and their application to single, high-speed data streams.

This dissertation introduces a new approach to parallelizing network protocol processing without the need for locks or for global state. Rather than maintain global states, each processor maintains its own copy of protocol state. Therefore, updates are local and don't require fine-grained locks or explicit synchronization. State management work is replicated, but logically independent work is parallelized. Along with the approach, this dissertation describes Dominoes, a new framework for implementing replicated processing systems. Dominoes organizes the state information into Domains and the communication into Channels. These two abstractions provide a powerful, but flexible model for testing the replication approach.

This dissertation uses Dominoes to build a replicated network protocol system. The performance of common protocols, such as TCP/IP, is increased by multiprocessing single connections. On commodity hardware, throughput increases between 15-300\% depending on the type of communication. Most gains are possible when communicating with unmodified peer implementations, such as Linux. In addition to quantitative results, protocol behavior is studied as it relates to the replication approach.

### Simulating the Evolution of Recruitment Behavior In Foraging Ants

Date: October 28, 2010
Time: 12:30PM
Place: FEC 141

Kenneth Letendre

Committee Chair:
Melanie Moses, Assistant Professor of Computer Science at UNM
Committee Members:
Stephanie Forrest, Chair of the Computer Science Department at UNM
Paul Watson, Research Professor from the Department of Biology at UNM

Abstract:
Spatial heterogeneity in the distribution of food is an important determinant of species' optimal foraging strategies, and of the dynamics of populations and communities. In order to explore the interaction of food heterogeneity and colony size in their effects on the behavior of foraging ant colonies, we built agent-based models of the foraging and recruitment behavior of harvester ants of the genus Pogonomyrmex. We optimized the behavior of these models using genetic algorithms over a variety of food distributions and colony sizes, and validated their behavior by comparison with data collected on harvester ants foraging for seeds in the field. We compared two models: one in which ants lay a pheromone trail each time they return to the nest with food; and another in which ants lay pheromone trails selectively, depending on the density of other food available in the area where food was found. We found that the density-dependent trail-laying model fit the field data better. We found that in this density-dependent recruitment model, colonies of all sizes evolved intense recruitment behavior, even when optimized for environments in which the majority of foods are distributed homogeneously. We discuss the implications of these models to the understanding of optimal foraging strategy and community dynamics among ants, and potential for application to ACO and other distributed problem-solving systems.

### Security in Network Games

Date: July 8th, 2010
Time: 10:00 AM
Place: FEC 141

Navin Rustagi

Committee Chair:
Jared Saia, Professor of Computer Science at UNM
Committee Members:
James Aspnes, Yale Computer Science
Josep Diaz, Universitat Politecnica de Catalunya
Thomas Hayes, Assistant Professor of Computer Science at UNM

Abstract:
Attacks on the Internet are characterized by several alarming trends: 1) increases in frequency; 2) increases in speed; and 3) increases in severity. Modern computer worms simply propagate too quickly for human detection. Since attacks are now occurring at a speed which prevents direct human intervention, there is a need to develop automated defenses. Since the financial, social and political stakes are so high, we need defenses which are provably good against a worst case attacks and are not too costly to deploy. In this talk we present two approaches to tackle these problems.

For the first part of the talk we consider a game between an alert and a worm over a large network. We show, for this game, that it is possible to design an algorithm for the alerts that can prevent any worm from infecting more than a vanishingly small fraction of the nodes with high probability. Critical to our result is designing a communication network for spreading the alerts that has high expansion. The expansion of the network is related to the gap between the 1st and 2nd eigenvalues of the adjacency matrix. Intuitively high expansion ensures redundant connectivity. We also present results simulating our algorithm on networks of size up to 2^25.

In the second part of the talk we consider the virus inoculation game which models the selfish behavior of the nodes involved. We present a technique for this game which makes it possible to achieve the "windfall of malice" even without the actual presence of malicious players. We also show the limitations of this technique for congestion games that are known to have a windfall of malice.

### Dynamic Photon Painting

Date: July 1st, 2010
Time: 10:00 AM
Place: ME427

Zhe Chen

Committee Chair:
Shuang Luan, Assistant Professor of Computer Science at UNM
Committee Members:
Adam Hecht, Assistant Professor of Nuclear Engineering at UNM
Daliang Cao, Medical Physicist in Radiation Oncology Physics at the Swedish Cancer Institute

Abstract:
awaiting abstract

### Molecular Simulations of Dendritic Molecules: a Study of PAMAM and Penyl-Acetylene Dendrimers

Date: May 10, 2010
Time: 1PM
Place: FEC 141

Mark Fleharty

Committee Chair:
Sean Luan, Assistant Professor of Computer Science at UNM
Committee Members:
Darko Stefanovic, Associate Professor of Computer Science at UNM
Deborah Evans, Associate Professor of Chemistry at UNM

Abstract:
Dendrimers are branched molecules that often have chemical properties similar to proteins and other large organic molecules. Dendrimers presently have applications as reactive surfaces for catalysis, and as hosts for drug delivery. Computer simulations of dendritic molecules are difficult due to their relatively large size and the tendency of atoms within a dendrimer to come within very close proximity to each other. The large number of steric interactions makes modeling of dendrimers difficult due to unphysically high energies that arise when a modeler attempts to construct a starting dendrimer from which to minimize its energy. Here we present a method based on rigid body mechanics and a Monte Carlo method to set up the initial conditions for a dendrimer and present our findings. We found that this method is able to rapidly find conformations of dendrimers that can be readily placed into molecular mechanics, and molecular dynamics codes for further study of the dendrimer.

### Programmer Feedback and Dynamic Analysis to Enable Optimization in Java Applications: The D.U.P.O. Framework

Date: April 9th, 2010
Time: 3:00PM
Place: Mechanical Engineering, room 427

David Mohr

Committee Chair:
Darko Stefanovic, Associate Professor of Computer Science at UNM
Committee Members:
Amer Diwan, Assistant Professor of Computer Science at the University of Colorado, Boulder
Patrick Bridges, Associate Professor of Computer Science at UNM

Abstract:
It is inherently difficult for static analysis to make precise decisions about dynamic features of modern object-oriented languages, making it more difficult to apply optimizations aggressively. This thesis introduces the D.U.P.O. framework to facilitate the use of dynamic analysis to enable performance optimizations. Since dynamic analysis cannot guarantee complete code coverage, a two part strategy is employed: Unit tests are used as a ed-facto specification, and the programmer provides final verification. The interaction can be kept at a minimum by using the rich information provided by the dynamic analysis. Object inlining is discussed as an example instance of the framework. :

### Collective belief models for representing consensus and divergence in communities of Bayesian decision-makers

Date: April 9th, 2010
Time: 2:00PM
Place: FEC 141

Kshanti Greene

Committee Chair:
George Luger, Professor of Computer Science at UNM
Committee Members:
Joe Kniss, Assistant Professor of Computer Science at UNM
Melanie Moses, Assistant Professor of Computer Science at UNM
Tim Ross, Professor of Civil Engineering at UNM
Carl Stern, Management Sciences Inc., Albuquerque

Abstract:
Bayesian belief aggregation is the process of forming a consensus model from the probabilistic beliefs of multiple individuals. Preference aggregation attempts to find an optimal solution for a population considering each individual's beliefs, desires and objectives. Belief and preference aggregation approaches that form a single consensus average away any diversity in a population. In the process they may fail to uphold a set of mathematical properties for rational aggregation defined by social choice theorists. I introduce a new aggregation approach that maintains the diversity of a population and allows the competitive aspects of a situation to emerge, enabling game theoretic analysis in large populations of decision-makers. Each individual's beliefs and preferences are represented by a Bayesian network. Based on the result of inference on the networks, a population is separated into collectives whose members agree on the relatively likelihood or desirability of the possible outcomes of a situation. An aggregate for each collective can then be computed such that the aggregate upholds the rationality properties. Game theoretic analysis is then applied using "super-agents" that represent each collective as the game players. In this manner, the set of Pareto optimal and Nash equilibrium solutions can be found, even in situations that cause single consensus models to return non-Pareto or otherwise "irrational"" solutions.

### Algorithms for self-healing in networks

Date: April 1st, 2010
Time: 9:30PM
Place: FEC 141

Amitabh Trehan

Committee Chair:
Jared Saia, Professor of Computer Science at UNM
Committee Members:
Cristopher Moore, Professor of Computer Science at UNM
Thomas Hayes, Assistant Professor of Computer Science at UNM
Tanya Berger-Wolf,

Abstract:
Many modern networks are reconfigurable, in the sense that the topology of the network can be changed by the nodes in the network. For example, peer-to-peer, wireless and ad-hoc networks are reconfigurable. More generally, many social networks, such as a company's organizational chart; infrastructure networks, such as an airline's transportation network; and biological networks, such as the human brain, are also reconfigurable. Modern reconfigurable networks have a complexity unprecedented in the history of engineering, resembling more a dynamic and evolving living animal rather than a structure of steel designed from a blueprint. Unfortunately, our mathematical and algorithmic tools have not yet developed enough to handle this complexity and fully exploit the flexibility of these networks.

We believe that it is no longer possible to build networks that are scalable and never have node failures. Instead, these networks should be able to admit small, and maybe, periodic failures and still recover like skin heals from a cut. This process, where the network can recover itself by maintaining key invariants in response to attack by a powerful adversary is what we call self-healing.

Here, we present several fast and provably good distributed algorithms for self-healing in reconfigurable dynamic networks. Each of these algorithms have different properties, a different set of guarantees and limitations. We also discuss future directions and theoretical questions we would like to answer.

### Efficient Algorithms for Phylogenetic Post-Analysis

Date: March 29, 2010
Time: 4:00PM
Place: FEC 141

Nick Pattengale

Committee Chair:
Bernard Moret, EPFL, Switzerland
Committee Members:
Alexandros Stamatakis, Technical University of Munich, Germany
Jared Saia, UNM CS
Cris Moore, UNM CS

Abstract:
A variety of tasks are typically performed after a phylogenetic reconstruction proper -- tasks which fall under the category 'phylogenetic post-analysis'. In this dissertation, we present novel approaches and efficient algorithms for three post-analysis tasks: taking distances between (typically, all pairs in a set of) trees, bootstrapping, and building consensus trees.

For instance, it is often the case that reconstruction finds multiple equally plausible trees. One basic way of addressing this situation is to take distances between pairs of trees, in order to gain an understanding of the extent to which the trees disagree. The most frequently employed manner for computing the distance between a tree pair is the Robinson-Foulds metric, a natural dissimilarity measure between a pair of phylogenetic trees. We present a novel family of algorithms for efficiently computing the Robinson-Foulds metric.

Bootstrapping is a post-analysis technique for assessing the extent to which the underlying data (e.g., molecular sequences) supports a reconstructed tree. The basis of the approach is to reconstruct many trees, called replicates, based on random subsampling of the original data. However, to date, there has been little treatment in phylogeny regarding the question of how many bootstrap replicates to generate. We propose 'bootstopping criteria' which are designed to provide on-the-fly (i.e., runtime) guidance for determining when enough bootstrap replicates have been reconstructed.

Another common post-analysis task is to build a consensus tree, a summary tree that attempts to capture the information agreed upon by bootstrap replicates. Unfortunately, the most popular consensus methods are susceptible to confusion by rogue taxa, i.e. taxa that cannot be placed with assurance anywhere within the tree. We present novel theory and efficient algorithms to identify rogue taxa, as well as a novel technique for interpreting the results (in the context of bootstrapping).

### Markov Random Field Modeling of the Spatial Distribution of Proteins on Cell Membranes

Date: March 5, 2010
Time: 9:00AM
Place: FEC 145

Jun Zhang

Committee Chair:
Lance Williams, Associate Professor of Computer Science at UNM
Committee Members:
Terran Lane, Associate Professor of Computer Science at UNM
Shuang Luan, Assistant Professor of Computer Science at UNM
Stanly Steinberg, Professor Emeritus of Mathematics & Statistics at UNM
Bridget Wilson, Professor of Pathology at UNM

Abstract:
Cell membranes display a range of receptors that bind ligands and activate signaling pathways. Signaling is characterized by dramatic changes in membrane molecular topography, including the co-clustering of receptors with signaling molecules and the segregation of other signaling molecules away from receptors. Electron microscopy of immunogoldlabeled membranes is a critical technique to generate topographical information at the 5-10 nm resolution needed to understand how signaling complexes assemble and function. However, due to experimental limitations, only two molecular species can usually be labeled at a time. A formidable challenge is to integrate experimental data across multiple experiments where there are from 10 to 100 different proteins and lipids of interest and only the positions of two species can be observed simultaneously. As a solution, Markov random field (MRF) modeling is proposed to reconstruct the distribution of multiple cell membrane constituents from pair-wise data sets. MRFs are a powerful mathematical formalism for modeling correlations between states associated with neighboring sites in spatial lattices. The presence or absence of a protein of a specific type at a point on the cell membrane is a state. Since only two protein types can be observed, i.e., those bound to particles, and the rest cannot be observed, the problem is one of deducing the conditional distribution of a MRF with unobservable (hidden) states. Here, a multiscale MRF model has been developed and mathematical programming techniques have been used to infer the conditional distribution of a MRF for proteins of three types from observations showing the spatial relationships between only two types. Application to synthesized data shows that the spatial distributions of three proteins can be reliably estimated. Application to experimental data provides the first maps of the spatial relationship between groups of three different signaling molecules. A small neighborhood system, 4-neighborhood, has been used in the MRF modeling. In order to improve reconstruction quality, a larger 8- neighborhood system is utilized in a multiscale Gibbs random field (GRF) modeling by exploiting the Markov-Gibbs equivalence. Application of the multiscale GRF modeling to synthesized and experimental data shows that the quality of reconstruction has been improved. The work is an important step towards a more complete understanding of membrane spatial organization and dynamics during signaling.

Thesis & Dissertation Defenses Archive