# Publications

## 2009 |

Brun, Yuriy; Serugendo, Giovanna Di Marzo; Gacek, Cristina; Giese, Holger; Kienle, Holger; Litoiu, Marin; Müller, Hausi; Pezzè, Mauro; Shaw, Mary Engineering self-adaptive systems through feedback loops Incollection In: Cheng, Betty H. C.; Lemos, Rogério; Giese, Holger; Inverardi, Paola; Magee, Jeff (Ed.): Software Engineering for Self-Adaptive Systems, vol. 5525, pp. 48–70, Springer-Verlag, 2009, ISBN: 978-3-642-02160-2. @incollection{Brun09SEfSAS, To deal with the increasing complexity of software systems and uncertainty of their environments, software engineers have turned to self-adaptivity. Self-adaptive systems are capable of dealing with a continuously changing environment and emerging requirements that may be unknown at design-time. However, building such systems cost-effectively and in a predictable manner is a major engineering challenge. In this paper, we explore the state-of-the-art in engineering self-adaptive systems and identify potential improvements in the design process. Our most important finding is that in designing self-adaptive systems, the feedback loops that control self-adaptation must become first-class entities. We explore feedback loops from the perspective of control engineering and within existing self-adaptive systems in nature and biology. Finally, we identify the critical challenges our community must address to enable systematic and well-organized engineering of self-adaptive and self-managing software systems. |

## 2008 |

Brun, Yuriy Reducing tileset size: 3-SAT and beyond Inproceedings In: Proceedings of the 14th International Meeting on DNA Computing (DNA), pp. 178, Prague, Czech Republic, 2008. @inproceedings{Brun08dna-sat, In self-assembly research, reducing the number of distinct tiles necessary to compute functions can make it feasible to implement tile systems to solve complex problems. Existing methods for solving 3-SAT, a well-known NP-complete problem, in the tile assembly model involve either using $Theta(n^2)$ distinct tiles to nondeterministically decide whether an $n$-variable Boolean formula is satisfiable or simulating a cellular automata system simulating a Turing machine, which requires a constant but large number of tiles to deterministically make the decision. Here, I propose a system that solves k-SAT nondeterministically, for all $k in mathbbN$, in time linear in the input size using only 64 distinct tiles. Further, I propose a mechanism for converting the tilesets of tile systems for many NP-complete and other problems from tilesets whose size depends on the input into $Theta(1)$-size tilesets. |

Brun, Yuriy University of Southern California, 2008, ISBN: 978-0-549-60920-9. @phdthesis{Brun08PhD, When engineers compare biological and software systems, the former come out ahead in the majority of dimensions. For example, the human body is far more complex, better suited to deal with faulty components, more resistant to malicious agents such as viruses, and more adaptive to environmental changes than your favorite operating system. Thus it follows that we, the engineers, may be able to build better software systems than the ones we build today by borrowing technologies from nature and injecting them into our system design process. In this dissertation, I present an architectural style and accompanying implementation support for building distributed software systems that allow large networks, such as the Internet, to solve computationally intensive problems. This architectural style, the tile style, is based on a nature's system of crystal growth, and thus inherits some of nature's dependability, fault and adversary tolerance, scalability, and security. The tile style allows one to distribute computation onto a large network in a way that guarantees that unless someone controls a large fraction of the network, they cannot learn the private data within the computation or force the computation to fail. These systems are highly scalable, capable of dealing with faulty and malicious nodes, and are discreet since every sufficiently small group of nodes knows neither the problem nor the data. The tile style is based on a formal mathematical model of self-assembly. In order to leverage this model to build software, I define the notion of self-assembling computation and develop systems that compute functions such as adding, multiplying, factoring, and solving NP-complete problems SubsetSum and SAT. For each system, I prove its correctness, compute its probability of successful computation, and show that its running time and tileset size are asymptotically optimal. I use the mathematical nature of the tile assembly model to present a formal mathematical analysis of the tile style, proving that software systems built using this style are discreet, fault- and adversary-tolerant, and scalable. I further implement a tile-style system and use it to distribute computation to empirically evaluate the tile style's utility. |

Brun, Yuriy Solving NP-complete problems in the tile assembly model Journal Article In: Theoretical Computer Science, vol. 395, no. 1, pp. 31–46, 2008, ISSN: 0304-3975. @article{Brun08np-c, Formalized study of self-assembly has led to the definition of the tile assembly model, a highly distributed parallel model of computation that may be implemented using molecules or a large computer network such as the Internet. Previously, I defined deterministic and nondeterministic computation in the tile assembly model and showed how to add, multiply, and factor. Here, I extend the notion of computation to include deciding subsets of the natural numbers, and present a system that decides $mathitSubsetSum$, a well known NP-complete problem. The computation is nondeterministic and each parallel assembly executes in time linear in the input. The system requires only a constant number of different tile types: 49. I describe mechanisms for finding the successful solutions among the many parallel assemblies and explore bounds on the probability of such a nondeterministic system succeeding and prove that probability can be made arbitrarily close to 1. |

Brun, Yuriy Nondeterministic polynomial time factoring in the tile assembly model Journal Article In: Theoretical Computer Science, vol. 395, no. 1, pp. 3–23, 2008, ISSN: 0304-3975. @article{Brun08factor, Formalized study of self-assembly has led to the definition of the tile assembly model. Previously, I presented ways to compute arithmetic functions, such as addition and multiplication, in the tile assembly model: a highly distributed parallel model of computation that may be implemented using molecules or a large computer network such as the Internet. Here, I present tile assembly model systems that factor numbers nondeterministically using $Theta(1)$ distinct components. The computation takes advantage of nondeterminism, but theoretically, each of the nondeterministic paths is executed in parallel, yielding the solution in time linear in the size of the input, with high probability. I describe mechanisms for finding the successful solutions among the many parallel executions and explore bounds on the probability of such a nondeterministic system succeeding and prove that probability can be made arbitrarily close to 1. |

Brun, Yuriy Solving satisfiability in the tile assembly model with a constant-size tileset Journal Article In: Journal of Algorithms, vol. 63, no. 4, pp. 151–166, 2008, ISSN: 0196-6774. @article{Brun08sat, Biological systems are far more complex and robust than systems we can engineer today. One way to increase the complexity and robustness of our engineered systems is to study how biological systems function. The tile assembly model is a highly distributed parallel model of nature's self-assembly. Previously, I defined deterministic and nondeterministic computation in the tile assembly model and showed how to add, multiply, factor, and solve $mathitSubsetSum$. Here, I present a system that decides satisfiability, a well-known NP-complete problem. The computation is nondeterministic and each parallel assembly executes in time linear in the input. The system requires only a constant number of different tile types: $64$, an improvement over previously best known system that uses $Theta(n^2)$ tile types. I describe mechanisms for finding the successful solutions among the many parallel assemblies and explore bounds on the probability of such a nondeterministic system succeeding and prove that probability can be made arbitrarily close to $1$. |

Brun, Yuriy Building biologically-inspired self-adapting systems Inproceedings In: Cheng, Betty H. C.; Lemos, Rogério; Giese, Holger; Inverardi, Paola; Magee, Jeff (Ed.): Proceedings of the Schloss Dagstuhl seminar 08031: Software Engineering for Self-Adaptive Systems, Dagstuhl, Germany, 2008, ISSN: 1862-4405. @inproceedings{Brun08dagstuhl, Biological systems are far more complex than systems we design and build today. The human body alone has orders of magnitude more complexity than our most-intricate designed systems. Further, biological systems are decentralized in such a way that allows them to benefit from built-in error-correction, fault tolerance, and scalability. It follows that if we can extract certain properties of biological systems and inject them into our software design process, we may be able to build complex self-adaptive software systems. Biological systems' complexity makes them not only desirable to guide software design, but also difficult to fully understand. Thus one approach to building software similar to biological systems is by first building models of biology that we can understand. Then these models can guide the high-level design, or architecture of the software systems, resulting in systems that retain the model's fault tolerance, scalability, and other properties. I present a general outline of how one might use biology to create a model to guide the architecture of a software system, and develop one such model and the resulting architectural style, the tile style, for computational systems that can use a large distributed network of computers, such as the Internet, to solve computationally-intensive problems in a discreet, fault-tolerant, and scalable manner. |

Brun, Yuriy Constant-size tileset for solving an NP-complete problem in nondeterministic linear time Journal Article In: Lecture Notes on Computer Science, vol. 4848/2008, pp. 26–35, 2008, ISSN: 0302-9743. @article{Brun08dna-lncs, The tile assembly model, a formal model of crystal growth, is of special interest to computer scientists and mathematicians because it is universal. Therefore, tile assembly model systems can compute all the functions that computers compute. In this paper, I formally define what it means for a system to nondeterministically decide a set, and present a system that solves an NP-complete problem called $mathitSubsetSum$. Because of the nature of NP-complete problems, this system can be used to solve all NP problems in polynomial time, with high probability. While the proof that the tile assembly model is universal implies the construction of such systems, those systems are in some sense ``large'' and ``slow.'' The system presented here uses $49 = Theta(1)$ different tiles and computes in time linear in the input size. I also propose how such systems can be leveraged to program large distributed software systems. |

## 2007 |

Brun, Yuriy; Medvidovic, Nenad Fault and adversary tolerance as an emergent property of distributed systems' software architectures Inproceedings In: Proceedings of the 2nd International Workshop on Engineering Fault Tolerant Systems (EFTS), pp. 38–43, Dubrovnik, Croatia, 2007. @inproceedings{Brun07efts, Fault and adversary tolerance have become not only desirable but required properties of software systems because mission-critical systems are commonly distributed on large networks of insecure nodes. In this paper, we describe how the tile style, an architectural style designed to distribute computation, can inject fault and adversary tolerance. The result is a notion of tolerance that is entirely abstracted away from the functional properties of the software system. The client may specify what fraction of the network is faulty or malicious (e.g., $25%$) and the acceptable system failure rate (e.g., $2^-10$), and the system's architecture adjusts automatically to ensure a failure rate no higher than the one specified. The technique is entirely automated and consists of a ``smart redundancy'' mechanism that brings the failure rate exponentially close to $0$ by slowing down the execution speed linearly. |

Brun, Yuriy Arithmetic computation in the tile assembly model: Addition and multiplication Journal Article In: Theoretical Computer Science, vol. 378, no. 1, pp. 17–31, 2007, ISSN: 0304-3975. @article{Brun07arith, Formalized study of self-assembly has led to the definition of the tile assembly model. Research has identified two issues at the heart of self-assembling systems: the number of steps it takes for an assembly to complete, assuming maximum parallelism, and the minimal number of tiles necessary to assemble a shape. In this paper, I define the notion of a tile assembly system that computes a function, and tackle these issues for systems that compute the sum and product of two numbers. I demonstrate constructions of such systems with optimal $Theta(1)$ distinct tile types and prove the assembly time is linear in the size of the input. |

Brun, Yuriy; Medvidovic, Nenad An architectural style for solving computationally intensive problems on large networks Inproceedings In: Proceedings of Software Engineering for Adaptive and Self-Managing Systems (SEAMS), Minneapolis, MN, USA, 2007, (SEAMS 2020 Most Influential Paper Award.). @inproceedings{Brun07seams, Large networks, such as the Internet, pose an ideal medium for solving computationally intensive problems, such as NP-complete problems, yet no well-scaling architecture for computational Internet-sized systems exists. We propose a software architectural style for large networks, based on a formal mathematical study of crystal growth that will exhibit properties of (1) discreetness (nodes on the network cannot learn the algorithm or input of the computation), (2) fault-tolerance (malicious, faulty, and unstable nodes may not break the computation), and (3) scalability (communication among the nodes does not increase with network or problem size). |

Brun, Yuriy A discreet, fault-tolerant, and scalable software architectural style for Internet-sized networks Inproceedings In: Proceedings of the Doctoral Symposium at the 29th International Conference on Software Engineering (ICSE), pp. 83–84, Minneapolis, MN, USA, 2007. @inproceedings{Brun07icse-doc-symp, Large networks, such as the Internet, pose an ideal medium for solving computationally intensive problems, such as NP-complete problems, yet no well-scaling architecture for Internet-sized systems exists. I propose a software architectural style for large networks, based on a formal mathematical study of crystal growth that will exhibit properties of (1) discreetness (nodes on the network cannot learn the algorithm or input of the computation), (2) fault-tolerance (malicious, faulty, and unstable nodes cannot break the computation), and (3) scalability (communication among the nodes does not increase with network or problem size). I plan to evaluate the style both theoretically and empirically for these three properties. |

Brun, Yuriy Adding and multiplying in the tile assembly model Inproceedings In: Proceedings of the 4th Foundations of Nanoscience: Self-Assembled Architectures and Devices (FNANO), Snowbird, UT, USA, 2007. @inproceedings{Brun07fnano, The tile assembly model, a formal model of crystal growth, is of special interest to computer scientists and mathematicians because it is universal. Therefore, tile assembly model systems can compute all the functions that computers compute. In this paper, I formally define what it means for a system to compute a function deterministically and present systems that add and multiply. While the proof that the tile assembly model is universal implies the construction of such systems, those systems are in some sense ``large'' and ``slow.'' The systems presented here all use $Theta(1)$ different tiles (8 to add and 28 to multiply) and compute in time linear in the input size. |

## 2006 |

Brun, Yuriy; Gopalkrishnan, Manoj Toward in vivo disease diagnosis and treatment using DNA Inproceedings In: Proceedings of the 2006 International Conference on Bioinformatics & Computational Biology (BIOCOMP), pp. 182–186, Las Vegas, NV, USA, 2006, ISBN: 1-60132-002-7. @inproceedings{Brun06biocomp, We propose a technique to diagnose and treat individual cells in the human body. A virus-like system delivers a copy of a diagnosis and treatment DNA complex to each cell. The complex determines whether the cell has a specific disease based on the presence or absence of indicator mRNA molecules and, if the diagnosis is positive, releases a proper drug for treatment. As a tool for the diagnosis and treatment system, we develop a DNA implementation of an arbitrary finite state machine. |

## 2005 |

Reishus, Dustin; Shaw, Bilal; Brun, Yuriy; Chelyapov, Nickolas; Adleman, Leonard Self-assembly of DNA double-double crossover complexes into high-density, doubly connected, planar structures Journal Article In: Journal of the American Chemical Society (JACS), vol. 127, no. 50, pp. 17590–17591, 2005. @article{Reishus05, |

## 2004 |

Chelyapov, Nickolas; Brun, Yuriy; Gopalkrishnan, Manoj; Reishus, Dustin; Shaw, Bilal; Adleman, Leonard DNA triangles and self-assembled hexagonal tilings Journal Article In: Journal of the American Chemical Society (JACS), vol. 126, no. 43, pp. 13924–13925, 2004. @article{Chelyapov04, |

Brun, Yuriy; Ernst, Michael D. Finding latent code errors via machine learning over programs executions Inproceedings In: Proceedings of the 26th International Conference on Software Engineering (ICSE), pp. 480–490, Edinburgh, Scotland, 2004. @inproceedings{Brun04icse, This paper proposes a technique for identifying program properties that indicate errors. The technique generates machine learning models of program properties known to result from errors, and applies these models to program properties of user-written code to classify and rank properties that may lead the user to errors. Given a set of properties produced by the program analysis, the technique selects a subset of properties that are most likely to reveal an error. An implementation, the Fault Invariant Classifier, demonstrates the efficacy of the technique. The implementation uses dynamic invariant detection to generate program properties. It uses support vector machine and decision tree learning tools to classify those properties. In our experimental evaluation, the technique increases the relevance (the concentration of fault-revealing properties) by a factor of 50 on average for the C programs, and 4.8 for the Java programs. Preliminary experience suggests that most of the fault-revealing properties do lead a programmer to an error. |

Brun, Yuriy; Gopalkrishnan, Manoj; Reishus, Dustin; Shaw, Bilal; Chelyapov, Nickolas; Adleman, Leonard Building blocks for DNA self-assembly Inproceedings In: Proceedings of the 1st Foundations of Nanoscience: Self-Assembled Architectures and Devices (FNANO), pp. 2–15, Snowbird, UT, USA, 2004. @inproceedings{Brun04fnano, DNA complexes, like the double crossover, are used as building blocks for the assembly of higher-order structures. Currently, the number of experimentally proven reliable complexes is small. We have begun work on expanding the collection of such complexes. Here we report on our design concepts and initial experiments. In particular, we present experimental evidence of two new complexes: quadruple crossovers and triangles. In principle, quadruple crossovers can be extended to three-dimensional, space-filling lego brick complexes, while triangles are capable of hexagonally tiling the plane. |

## 2003 |

Brun, Yuriy Software fault identification via dynamic analysis and machine learning Masters Thesis Massachusetts Institute of Technology, Cambridge, MA, USA, 2003. @mastersthesis{Brun03masters, I propose a technique that identifies program properties that may indicate errors. The technique generates machine learning models of run-time program properties known to expose faults, and applies these models to program properties of user-written code to classify and rank properties that may lead the user to errors. I evaluate an implementation of the technique, the Fault Invariant Classifier, that demonstrates the efficacy of the error finding technique. The implementation uses dynamic invariant detection to generate program properties. It uses support vector machine and decision tree learning tools to classify those properties. Given a set of properties produced by the program analysis, some of which are indicative of errors, the technique selects a subset of properties that are most likely to reveal an error. The experimental evaluation over 941,000 lines of code, showed that a user must examine only the 2.2 highest-ranked properties for C programs and 1.7 for Java programs to find a fault-revealing property. The technique increases the relevance (the concentration of properties that reveal errors) by a factor of 50 on average for C programs, and 4.8 for Java programs. |

## 2002 |

Brun, Yuriy The four-color theorem Journal Article In: Undergraduate Journal of Mathematics, pp. 21–28, 2002. @article{Brun02four-color, In this review paper, I introduce graph theory, and discuss the Four Color Theorem. Then I prove several theorems, including Euler's formula and the Five Color Theorem. |

## 1997 |

Vekhter, Daniel; Rasin, Alex; Brun, Yuriy Mutual exclusion algorithms in distributed networks Journal Article In: Journal of Student Research, Science and Technology, vol. 2, no. 1, pp. 65–67, 1997. @article{Vekhter97, The problem of mutual exclusion is ever-present in distributed networks. Various algorithms have been suggested as a means to solve the problem. This paper discusses the efficiency of three such algorithms --- Test-and-Set bit, Dijkstra's, and Peterson's. The performance of each of the algorithms depends on the number of competing processes and the length of the remainder section. Our experimental results show which of the three algorithms are the most efficient to use depending on the size of the network and the number of remainder steps. |