proposal-presentation-talk
==========================

Author: Eric Schulte
Date: 2012-08-15 12:48:14 MDT

Table of Contents
=================
1 Introduction
    1.1 Outline
    1.2 Natural Selection of Software
    1.3 Motivation
2 Related Work
    2.1 Biological Robustness Genotype and Phenotype
    2.2 Biological Robustness Mechanisms of robustness
    2.3 Biological Robustness Robustness and Evolution
    2.4 Evolutionary Computation Digital Evolution
    2.5 Evolutionary Computation Genetic Programming
    2.6 Software Engineering Acceptably Correct Computation
3 Preliminary Work
    3.1 Software Mutational Robustness
    3.2 Software Mutational Robustness -- Mutation Operators
    3.3 Software Mutational Robustness -- Semantic Space & Mutation Testing
    3.4 Software Mutational Robustness -- Benchmarks
    3.5 Software Mutational Robustness -- Preliminary Results
4 Investigation of Mutational Robustness
    4.1 Causes of Mutational Robustness -- Level: Source VS. Compiled
    4.2 Causes of Mutational Robustness -- Provenance: Evolved VS. Engineered
    4.3 Correlates of Mutational Robustness -- Evolvability
    4.4 Correlates of Mutational Robustness -- Environmental Robustness
5 Applications of Mutational Robustness
    5.1 Software Diversity Compilers and Linkers
    5.2 Software Diversity Program Atavism using Version Control
    5.3 Software Optimization
    5.4 Software Husbandry
6 Conclusion
    6.1 Work-plan and Timeline
    6.2 Conclusion

1 Introduction
===============

- Hello, my name is Eric Schulte
- just starting my fourth year
- today I will be presenting the proposed outline of my dissertation
  research
- Thank you all for attending and serving on my committee

1.1 Outline
------------

- Begin by introducing the main thesis of this work
- followed by a review of related work in a number of disparate fields
- present the results of some preliminary work
- outline further research and possible applications
- conclude with
  - a summary of the work performed so far, and
  - a schedule for the remainder of this dissertation

1.2 Natural Selection of Software
----------------------------------

The thesis of this work is that the software development environment,
including
- applications
- interfaces
- operating systems
- programming languages
- compilers and linkers
is the result of natural selection. In this case software developers
perform the selection, mutation, and reproduction.
- rather than view software as an implementation of a formal
  specification, this research embraces the messy side of software
  development as an iterative process of modification and evaluation

We will investigate properties which are common across any evolved
system, such as *robustness* and *evolvability*.

This work will be split into two main parts addressing two main
questions
1. first, an investigation of the /natural/ aspects of software
2. second, a look at some ways we can exploit software robustness and
   evolvability to develop new tools and techniques

1.3 Motivation
---------------

There is currently both a need and an opportunity for this work.
- Software development is an increasingly important aspect of the US
  economy, consuming the daily efforts of a large number of men and women
  - increased understanding of software and
  - new tools to efficiently develop and maintain software
  are important goals

Currently there is a confluence between the computational and
biological communities which promises to yield new insights into the
natural aspects of software.

I will now touch on the high points of this recent work across
- robustness in biological systems
- work in "evolutionary computation" and
- some "natural" trends in software engineering

2 Related Work
===============

2.1 Biological Robustness Genotype and Phenotype
-------------------------------------------------

I would like to begin by introducing a distinction which will run
throughout this work.

 genotype         phenotype
----------------+--------------------------
 specification    action in the world
----------------+--------------------------
 source code      program behavior
----------------+--------------------------
 can be mutated   has a measurable fitness
----------------+--------------------------
 changes          -> changes

Each of these realms has an associated notion of robustness
- robustness to internal or external variance
- biological
  - mutations
  - environment
- software
  - code, compiler
  - host system, inputs

2.2 Biological Robustness Mechanisms of robustness
---------------------------------------------------

There has been a great deal of research into the robustness of
biological systems. Mechanisms through which these systems achieve
robustness exist at many levels.
- amino acids
  - important amino acids over-represented, many encodings
  - similar codings lead to similar amino acids
- metabolic pathways
  - produce stable output over wide ranges of inputs
  - if one input is low/missing another can compensate
  - results in both mutational and environmental robustness
- degenerate vital functions
  - no single neuron is necessary, but
  - no two neurons are exactly the same

2.3 Biological Robustness Robustness and Evolution
---------------------------------------------------

There is an intimate link between mutational robustness and evolution
in biological systems.
- program fitness space
  - very high dimensional program space
  - points are programs
  - neighbors are separated by single mutations
  - each point has an associated fitness
- neutral spaces
  - neutral spaces are regions of equal fitness
  - as there are no selective forces, populations drift through these
    spaces
  - populations tend towards the center of neutral spaces
  - organisms with high mutational robustness have large neutral spaces
  - information accrued while traveling through neutral spaces allows
    innovation and jumps to new neutral spaces

2.4 Evolutionary Computation Digital Evolution
-----------------------------------------------

- computational platform for evolutionary experiments
  - simplified assembly languages or /chemistries/
  - evolved variants are responsible for copying themselves
- designed to allow investigations not possible /in situ/
  - evolutionary time frames
  - environmental controls
  - metrics
  - ability to re-run experiments

2.5 Evolutionary Computation Genetic Programming
-------------------------------------------------

Use of natural selection as a heuristic to guide the automatic
programming of computers.

While this dissertation describes the natural selection of software
systems, where humans write, select, and copy programs over decades, in
GP computers perform these tasks in seconds.
- GP has been an active area of research for decades with numerous
  dedicated conferences and journals
- generally operates over simplified languages and can only address
  problems with well defined, quickly executable fitness functions
- recently used, however, to modify existing real-world software written
  in normal programming languages, using the software's test suite as a
  fitness function

2.6 Software Engineering Acceptably Correct Computation
--------------------------------------------------------

The community is moving
- away from ideas of formal correctness
- towards approximate correctness or acceptable performance

- Failure oblivious
  - executes through errors, wrapping out of bounds reads/writes or
    returning dummy values
- Program Hallucination
  - replace unacceptable inputs with learned "normal" inputs
  - on the borders of the system
- Clearview
  - learns normal behavior (invariants)
  - enforces these behaviors at runtime
  - internal to the system
  - Red Team and Firefox: Clearview "detected and blocked" all attacks
- Loop Perforation
  - trades accuracy for savings in time and power

All anathema to formal correctness, but appropriate when
- safety
- continuity
- energy
- runtime
are more important.

3 Preliminary Work
===================

3.1 Software Mutational Robustness
-----------------------------------

- software mutational robustness, foundation of this dissertation
- grew from work using GP to repair programs
- one of the reasons that work was so surprisingly successful

I will define Software Mutational Robustness

Depends upon all of these...

Surprisingly independent of these factors, and seems to be a
fundamental property of modern software.

3.2 Software Mutational Robustness -- Mutation Operators
---------------------------------------------------------

Here we present operators at both
- source AST
- ASM level of linear vectors of ASM instructions
- I have developed techniques for manipulating ELF binaries directly,
  however these are very similar to ASM mutation operations and are not
  shown

These operations
- manipulate existing code (don't add new)
- are "natural"
- taken from the GP community
- capable of adding value to program representations
- and productively evolving software

These operations can be used to define a syntactic space of software
with these operations themselves as the distance metric.
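
A minimal sketch of ASM-level operators over a linear vector of
instructions. These notes do not list the operators, so the three shown
here (delete, copy, swap) and all function names are assumptions made
for illustration. The one property taken from the notes is that each
operator only rearranges existing code and never introduces new
instructions.

```python
import random

# Hypothetical operators over a non-empty list of instruction strings.
# Each returns a new list and leaves its input untouched.

def mutate_delete(genome, rng):
    """Remove one instruction chosen at random."""
    i = rng.randrange(len(genome))
    return genome[:i] + genome[i + 1:]

def mutate_copy(genome, rng):
    """Insert a copy of one existing instruction at a random position."""
    src = rng.randrange(len(genome))
    dst = rng.randrange(len(genome) + 1)
    return genome[:dst] + [genome[src]] + genome[dst:]

def mutate_swap(genome, rng):
    """Exchange the instructions at two random positions."""
    i, j = rng.randrange(len(genome)), rng.randrange(len(genome))
    out = list(genome)
    out[i], out[j] = out[j], out[i]
    return out

OPERATORS = (mutate_delete, mutate_copy, mutate_swap)

def mutate(genome, rng):
    """Apply one randomly chosen operator to the genome."""
    return rng.choice(OPERATORS)(genome, rng)

asm = ["movl $1, %eax", "addl %ebx, %eax", "ret"]
variant = mutate(asm, random.Random(0))
print(variant)
```

Under this sketch, the number of single edits needed to get from one
variant to another serves as the distance between them in the syntactic
space.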

3.3 Software Mutational Robustness -- Semantic Space & Mutation Testing
------------------------------------------------------------------------

Syntactic Space defined by our mutation operations, mapped to a
Semantic Space of programs (compilation)
- each point defines a set of equivalent program implementations
- many different source codes may compile to the same equivalent program
- blue ball is specification (if that exists), acceptable performance
- red ball is defined by the test suite
- hopefully
  - intersection large
  - exclusive disjunction small
  - program itself in their intersection

Talk briefly about the mutation testing community:
- Uses % of mutants which fail the test suite as a metric of test suite
  quality
- Does not recognize the possibility of non-faulty non-equivalent
  mutants
- Effectively reduces the test suite (red) to the point of the original
  program
- Test suites approach diffs against the source of the original program

This work, on the other hand, will call all mutants which satisfy the
test suite "neutral", and will seek to make use of them.

3.4 Software Mutational Robustness -- Benchmarks
-------------------------------------------------

Three classes
- Sorters
  - simple
  - exhaustive test coverage
- Siemens
  - long history in the mutation testing community
  - extremely high quality test suite (30 tests per execution branch)
- Real Programs
  - real world
  - real test suites

3.5 Software Mutational Robustness -- Preliminary Results
----------------------------------------------------------

- Surprisingly, contrary to common folk-wisdom
  - ~36% of mutations are neutral on average
  - bottoms out at 20%

Across both programs and test suites robustness is largely constant.
That leaves only the mutation operators.
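
The neutrality measurement behind these numbers can be sketched as
follows. This is an illustration under assumptions, not the harness
used for the reported results: `mutational_robustness`, `delete_one`,
and `toy_tests` are hypothetical names, the "program" is a list of
instruction strings, and the stand-in test suite only checks that one
instruction survives.

```python
import random

def mutational_robustness(program, mutate, passes_tests, trials=1000, seed=0):
    """Fraction of single-mutation variants that still pass the test suite."""
    rng = random.Random(seed)
    neutral = sum(1 for _ in range(trials) if passes_tests(mutate(program, rng)))
    return neutral / trials

def delete_one(program, rng):
    """Hypothetical mutation operator: drop one random instruction."""
    i = rng.randrange(len(program))
    return program[:i] + program[i + 1:]

def toy_tests(variant):
    """Stand-in test suite: passes as long as the 'ret' instruction remains."""
    return "ret" in variant

program = ["nop", "nop", "nop", "ret"]
print(mutational_robustness(program, delete_one, toy_tests))
```

A variant counts as neutral whenever the test suite accepts it, whether
or not it is equivalent to the original program.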

While one could certainly construct mutation operators with different
levels of mutational robustness, these operators are
- simple
- standard
- able to improve programs

4 Investigation of Mutational Robustness
=========================================

4.1 Causes of Mutational Robustness -- Level: Source VS. Compiled
------------------------------------------------------------------

Having established that mutational robustness exists, let's look for
causes.

Biological systems buffer the effects of mutation at many levels.
- changes in codons → similar proteins
- collection of proteins into functional units

Is the same true of the computational analogs? Do compilers and linkers
increase robustness?

Comparison of mutation operators across levels

Threats
- different mutation operators required at each level
- what are we measuring?

4.2 Causes of Mutational Robustness -- Provenance: Evolved VS. Engineered
--------------------------------------------------------------------------

Artifacts of evolutionary processes must be amenable to those
processes.

Compare robustness and evolvability across three classes of software.

Requires
- metric for robustness
- metric for evolvability

4.3 Correlates of Mutational Robustness -- Evolvability
--------------------------------------------------------

In biological systems, there are many known correlates of mutational
robustness, including
- evolvability and
- environmental robustness

With our evolvability and robustness metrics in hand and large program
benchmark suites...

Let's see if they're correlated.

If so, let's see if we can use one to change the other.

4.4 Correlates of Mutational Robustness -- Environmental Robustness
--------------------------------------------------------------------

Compare mutational and environmental robustness across a large program
benchmark suite.

Fuzz testing as a metric of environmental robustness.
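
One way the fuzz-testing metric might be computed, as a sketch under
assumptions: the notes do not define the score, so here it is taken to
be the fraction of randomly corrupted inputs a program processes
without raising an error. `fuzz`, `environmental_robustness`, and the
byte-replacement scheme are hypothetical choices for illustration.

```python
import random

def fuzz(data, rng, flips=1):
    """Return a copy of non-empty bytes with `flips` random bytes replaced."""
    buf = bytearray(data)
    for _ in range(flips):
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)

def environmental_robustness(run, seed_inputs, trials=200, seed=0):
    """Fraction of fuzzed inputs that `run` handles without raising."""
    rng = random.Random(seed)
    survived = 0
    for _ in range(trials):
        data = fuzz(rng.choice(seed_inputs), rng)
        try:
            run(data)
            survived += 1
        except Exception:
            # Any exception counts as a failure on this input.
            pass
    return survived / trials

def parse_int(data):
    """Toy program under test: parse ASCII bytes as an integer."""
    return int(data.decode("ascii"))

print(environmental_robustness(parse_int, [b"12345"]))
```

Scoring a whole benchmark suite this way would give one environmental
robustness value per program to set beside its mutational robustness.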

5 Applications of Mutational Robustness
========================================

5.1 Software Diversity Compilers and Linkers
---------------------------------------------

Using multiple compilers and compiler flags, many different executable
programs may be generated from a single source file.

These diverse variants would likely be similar enough to breed
productively (maybe).

This could be used to...

5.2 Software Diversity Program Atavism using Version Control
-------------------------------------------------------------

Biological systems carry around huge amounts of historical DNA which is
"turned off".

Similar information is available for software projects through their
version control repositories.

Challenges
- designing a representation which can hold patch/version information
- designing mutation operators for this representation

Benefits
- more genetic fodder
- more edits in buggy locations
- large jumps through neutral space

5.3 Software Optimization
--------------------------

Using
- evolutionary techniques with test-suite defined fitness
- system emulators or execution directly on target systems
will allow
- more flexibility than compilers
- explicit specification of priorities with multi-objective fitness
- incorporation of *all* system factors (e.g., ENV variables)

5.4 Software Husbandry
-----------------------

Multiple programs which all conform to the same test suite could be
combined using crossover.
- programs conforming to standards e.g.,
  - router implementations
  - YAML parsers
- allowing
  - optimization transfer
  - functionality transfer
- source of implementation diversity

6 Conclusion
=============

6.1 Work-plan and Timeline
---------------------------

- A timeline of how this work might fall out
- likely way too ambitious to think I'll actually be able to plan at
  this scale, but it is at least true that
  - I will be working on a mutational robustness publication this fall
  - I'm working on optimization now
  - I plan to look at diversity this fall

6.2 Conclusion
---------------

- Simply by persisting in the world, software is natural
- Despite common folk wisdom among engineers, software is not fragile
- Software robustness can be used to build new tools and techniques