# Theory Seminar

Welcome to the official page for the **UMass-Amherst Computer Science Theory Seminar** for Spring 2023. The seminar is **1-2 pm on Tuesdays** in **Room 140** of the Computer Science Building at UMass-Amherst, and is free and open to the public. The faculty host is **Hung Le**. If you are interested in giving a talk, please email Professor Le or **Rik Sengupta**. Note that in addition to being a public lecture series, this is also a one-credit graduate seminar (**CompSci 891M**) that can be taken repeatedly for credit.

All Tuesday talks are in **CS Room 140**, unless otherwise stated. Note that there are some additional theory talks this semester, typically on Thursdays.

*Spring 2023 Schedule of Speakers*

**NOTE:** In order to ensure you get weekly updates for all the talks, please make sure you are part of the seminars@cs.umass.edu mailing list. If you wish to give a talk, or would like to nominate someone to give one, please email us to let us know!

*Tuesday, February 7 @ 1 pm*

**Organizational Meeting**

**SPECIAL TIME:** Thursday, February 9 @ 4 pm in CS 150-151

**Foundations of Responsible Machine Learning**

Michael Kim (UC Berkeley)

## **Abstract**

Algorithms make predictions about people constantly. The spread of such prediction systems has raised concerns that machine learning algorithms may exhibit problematic behavior, especially against individuals from marginalized groups. This talk will provide an overview of my research building a theory of “responsible” machine learning. Specifically, I will highlight a notion of fairness in prediction, called Multicalibration (ICML’18), which formalizes the goals of fair prediction through the lens of complexity theory. Multicalibration requires that algorithmic predictions be well-calibrated, not simply overall, but simultaneously over a rich collection of subpopulations. This “multi-group” approach strengthens the guarantees of group fairness definitions, without incurring the costs (statistical and computational) associated with individual-level protections. Additionally, I will present a new paradigm for learning, Outcome Indistinguishability (STOC’21), which provides a broad framework for learning predictors satisfying formal guarantees of responsibility. Finally, I will discuss the threat of Undetectable Backdoors (FOCS’22), which represent a serious challenge for building trust in machine learning models.
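As a rough illustration of the multicalibration requirement described above (a sketch of the definition only, not the algorithms from the talk; the function and parameter names are illustrative), the following measures the worst calibration gap over a collection of subpopulations:

```python
def multicalibration_violation(preds, outcomes, groups):
    """Worst calibration gap over a collection of subpopulations: within each
    group, among the indices assigned prediction p, compare the average
    realized outcome to p. A multicalibrated predictor keeps this small."""
    worst = 0.0
    for group in groups:                    # each group is a set of indices
        by_pred = {}
        for i in group:
            by_pred.setdefault(preds[i], []).append(outcomes[i])
        for p, ys in by_pred.items():
            worst = max(worst, abs(sum(ys) / len(ys) - p))
    return worst

# Calibrated overall, and also on the subgroup {0, 1}
print(multicalibration_violation([0.5] * 4, [1, 0, 1, 0], [{0, 1, 2, 3}, {0, 1}]))  # 0.0
```

Taking `groups` to be a rich collection of subpopulations, rather than the single group of everyone, is what distinguishes multicalibration from ordinary calibration.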

## **Bio**

Michael P. Kim is a Postdoctoral Research Fellow at the Miller Institute for Basic Research in Science at UC Berkeley, hosted by Shafi Goldwasser. Before this, Kim completed his PhD in Computer Science at Stanford University, advised by Omer Reingold. Kim’s research addresses basic questions about the appropriate use of machine learning algorithms that make predictions about people.

*Tuesday, February 14*

No meeting

**SPECIAL TIME:** Thursday, February 16 @ 4 pm in CS 150-151

**Sequential Prediction: Calibration and Selectivity**

Mingda Qiao (Stanford)

## **Abstract**

This talk will discuss new perspectives and results on sequential prediction/learning under minimal assumptions on the data. In the first part, I will discuss a model of online binary prediction in which a forecaster observes a sequence of T bits one by one and, before each bit is revealed, predicts the “probability” that the bit is 1. The forecaster is “well-calibrated” if, for each value p, among the timesteps when probability p was predicted, a p-fraction of those bits were 1. The calibration error quantifies the extent to which the forecaster deviates from being well-calibrated. It has long been known that an O(T^{2/3}) calibration error is achievable even when the bits are chosen adversarially, whereas there is a trivial lower bound of Omega(T^{1/2}). I will present the first improvement over this T^{1/2} rate in the lower bound.
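The calibration error in the first paragraph can be made concrete with a short sketch. This computes one common (L1-style) flavor of the quantity, not necessarily the exact variant from the talk, and the names are illustrative:

```python
from collections import defaultdict

def calibration_error(predictions, bits):
    """For each predicted value p, compare the number of 1s observed at the
    timesteps where p was predicted to the expected p-fraction, and sum the
    absolute gaps; a well-calibrated forecaster scores 0."""
    count = defaultdict(int)  # timesteps on which each value p was predicted
    ones = defaultdict(int)   # how many of those bits turned out to be 1
    for p, b in zip(predictions, bits):
        count[p] += 1
        ones[p] += b
    return sum(abs(ones[p] - p * count[p]) for p in count)

# A well-calibrated run: 0.5 predicted four times, and exactly half were 1
print(calibration_error([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))  # 0.0
```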

The second part of the talk will cover new models of “selective prediction/learning”: The forecaster observes a data sequence one at a time. At any time of its choosing, the forecaster may select a window length w and make a prediction about the next w unseen data points. Surprisingly, we will show that the forecaster can obtain non-trivial prediction and learning guarantees even if the data are arbitrary.

This talk is based on joint work with Gregory Valiant.

## **Bio**

Mingda Qiao is a fifth-year Ph.D. student in Computer Science at Stanford University, advised by Gregory Valiant. He works on the theoretical foundations of machine learning and artificial intelligence. His doctoral research focuses on the theoretical aspects of prediction, learning, and decision-making in sequential settings, as well as decision tree learning. With his collaborators, his contributions include the first non-trivial lower bound for sequential calibration and a faster algorithm for properly learning decision trees. Prior to Stanford, Mingda received his BEng in Computer Science from Yao Class at Tsinghua University in 2018.

*Tuesday, February 21 @ 1 pm*

**Data Structures for Density Estimation**

Sandeep Silwal (MIT)

## **Abstract**

We consider the following problem: we are given knowledge of k discrete distributions v_i for 1 <= i <= k over the domain [n] = {1,…,n} which we can preprocess. Then we get samples from an unknown discrete distribution p, also over [n]. The goal is to output the closest distribution to p among the v_i’s in TV distance (up to some small additive error). State-of-the-art algorithms require Theta(log k) samples and run in near-linear time.

We introduce a fresh perspective on the problem and ask if we can output the closest distribution in sublinear time. This question is particularly motivated as the problem is a generalization of the traditional nearest neighbor search problem: if we take enough samples, we can learn p explicitly up to low TV distance and then find the closest v_i in o(k) time using standard nearest neighbor search algorithms.

However, this approach requires Omega(n) samples. Thus, it is natural to ask: can we obtain both sublinear number of samples and sublinear query time? We present some nice progress towards this question.
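The baseline sketched above (learn p explicitly, then scan all hypotheses) can be written in a few lines. This is the Omega(n)-sample, O(kn)-time strawman that motivates the question, not the data structure from the talk; the names are illustrative:

```python
from collections import Counter

def tv_distance(p, q, n):
    """Total variation distance between two distributions on [n]."""
    return 0.5 * sum(abs(p[i] - q[i]) for i in range(n))

def closest_distribution(hypotheses, samples, n):
    """Baseline: learn p empirically from the samples, then linearly scan
    all k hypotheses for the one at smallest TV distance. Accurate only
    once the empirical estimate is close, which needs Omega(n) samples."""
    counts = Counter(samples)
    p_hat = [counts[i] / len(samples) for i in range(n)]
    return min(range(len(hypotheses)),
               key=lambda i: tv_distance(hypotheses[i], p_hat, n))

# All samples land on element 0, so the point mass on 0 is closest
print(closest_distribution([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]], [0, 0, 0, 0], 3))  # 0
```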

This is joint work with Anders Aamand, Alex Andoni, Justin Chen, Piotr Indyk, and Shyam Narayanan.

## **Bio**

Sandeep is a 4th year PhD student at MIT advised by Piotr Indyk. He is broadly interested in the intersection of machine learning and algorithm design. Recently, he has worked on algorithms for processing large datasets as well as using machine learning to inspire algorithm design.

Website: sandeepsilwal.com.

**SPECIAL TIME:** Tuesday, February 21 @ 4 pm in CS 150-151

**Pseudorandomness and Space Complexity: New Methods, New Insights, and a New Milestone**

William Hoza (UC Berkeley)

## **Abstract**

Algorithm designers often introduce random choices into their algorithms in an effort to improve efficiency. However, random bits cannot necessarily be produced for free, so deterministic algorithms are preferable to randomized algorithms, all else being equal. Is randomness ever truly necessary for efficient computation? What, ultimately, is the role of randomness in computing?

In this talk, I will discuss the “L = BPL” conjecture, which says that for every clever randomized algorithm, there is an even cleverer deterministic algorithm that does the same job with roughly the same *space complexity.* The most traditional approach for trying to prove this conjecture is based on pseudorandom generators (PRGs), which have additional applications beyond derandomizing algorithms. There are also other approaches based on variants of the PRG concept, most notably “weighted PRGs” and “hitting set generators.” I will give an overview of my contributions in this area (with collaborators), consisting of new constructions and applications of these three types of generators.

## **Bio**

William Hoza is a postdoctoral fellow at the University of California, Berkeley through the Simons Institute for the Theory of Computing. His Ph.D. is from the University of Texas at Austin, where he was advised by David Zuckerman. He studies topics in computational complexity theory, especially pseudorandomness and derandomization.

**SPECIAL TIME:** Thursday, February 23 @ 4 pm in CS 150-151

**A New Approach for Simulating Randomness in Computation, and Resulting Directions of Research**

Roei Tell (IAS and DIMACS)

## **Abstract**

Randomness is used throughout computer science, but when does it actually make computation more efficient? A central area in theoretical computer science studies the value of randomness for solving various types of problems, and the ways to simulate true randomness by deterministic algorithms.

I will describe a new algorithmic approach that I introduced for simulating randomness, which avoids using pseudorandom generators, and instead tailors custom-made pseudorandomness to each individual algorithm. This approach enabled new directions of research that I have been pursuing, and I will mention two of those:

1. Free lunch theorems: Simulating randomness for rich classes of algorithms at *no observable cost* (under suitable assumptions), avoiding overheads that are inherent to classical pseudorandom generators.

2. Strengthening certain connections in computer science: Proving that simulating randomness is *equivalent* to solving problems from complexity theory, cryptography, learning theory, and other areas.

I will also present recent progress in circuit complexity, which is a path towards proving that P ≠ NP that is closely connected to simulating randomness. Specifically, I will describe new results and methods of mine on a main current frontier in this area: proving better limitations for simplistic models of shallow neural networks, called threshold circuits.

## **Bio**

Roei Tell is a theoretical computer scientist whose research focuses on randomness in computation, and more broadly on complexity theory. He is currently a postdoc at the Institute for Advanced Study, joint with DIMACS, and prior to that he was a postdoc at MIT. He completed his PhD and MSc at the Weizmann Institute of Science, following undergraduate studies in psychology, math, and computer science.

**SPECIAL TIME:** Monday, February 27 @ 4 pm in CS 150-151

**Distance-Estimation in Modern Graphs: Algorithms and Impossibility**

Nicole Wein (DIMACS)

## **Abstract**

The size and complexity of today’s graphs present challenges that necessitate the discovery of new algorithms. One central area of research in this endeavor is computing and estimating distances in graphs. In this talk I will discuss two fundamental families of distance problems in the context of modern graphs: Diameter/Radius/Eccentricities and Hopsets/Shortcut Sets.

The best known algorithm for computing the diameter (largest distance) of a graph is the naive algorithm of computing all-pairs shortest paths and returning the largest distance. Unfortunately, this can be prohibitively slow for massive graphs. Thus, it is important to understand how fast and how accurately the diameter of a graph can be approximated. I will present tight bounds for this problem via conditional lower bounds from fine-grained complexity.

Secondly, for a number of settings relevant to modern graphs (e.g. parallel algorithms, streaming algorithms, dynamic algorithms), distance computation is more efficient when the input graph has low hop-diameter. Thus, a useful preprocessing step is to add a set of edges (a hopset) to the graph that reduces the hop-diameter of the graph, while preserving important distance information. I will present progress on upper and lower bounds for hopsets.

## **Bio**

Nicole Wein is a Simons Postdoctoral Leader at DIMACS at Rutgers University. Previously, she obtained her Ph.D. from MIT, advised by Virginia Vassilevska Williams. She is a theoretical computer scientist whose research interests include graph algorithms and lower bounds, in areas including distance-estimation algorithms, dynamic algorithms, and fine-grained complexity.

*Tuesday, February 28*

No meeting

**SPECIAL TIME:** Wednesday, March 1 @ 4 pm in CS 150-151

**Sampling from Graphical Models via Spectral Independence**

Zongchen Chen (MIT)

## **Abstract**

In many scientific settings we use a statistical model to describe a high-dimensional distribution over many variables. Such models are often represented as a weighted graph encoding the dependencies between different variables and are known as graphical models. Graphical models arise in a wide variety of fields across science and engineering.

One fundamental task for graphical models is to generate random samples from the associated distribution. The Markov chain Monte Carlo (MCMC) method is one of the simplest and most popular approaches to tackle such problems. Despite the popularity of graphical models and MCMC algorithms, theoretical guarantees of their performance are not known even for some simple models. I will describe a new tool called “spectral independence” to analyze MCMC algorithms and more importantly to reveal the mathematical nature behind such models. I will also discuss how these structural properties can be applied to sampling when MCMC fails and to other statistical problems like parameter learning or model fitting.
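As a minimal concrete example of the MCMC method mentioned above, here is a textbook Glauber-dynamics sampler for the Ising model on a graph. This is only the basic chain whose analysis the talk addresses, not the spectral-independence machinery itself, and the names are illustrative:

```python
import math
import random

def glauber_ising(adj, beta, steps, seed=0):
    """Glauber dynamics for the Ising model on a graph: repeatedly pick a
    uniformly random vertex and resample its +/-1 spin conditioned on its
    neighbors' current spins. `adj[v]` lists the neighbors of vertex v."""
    rng = random.Random(seed)
    n = len(adj)
    spins = [rng.choice([-1, 1]) for _ in range(n)]
    for _ in range(steps):
        v = rng.randrange(n)
        field = sum(spins[u] for u in adj[v])
        # Conditional law of spin v under the Ising measure at inverse temp beta
        p_plus = math.exp(beta * field) / (math.exp(beta * field) + math.exp(-beta * field))
        spins[v] = 1 if rng.random() < p_plus else -1
    return spins

# 3-vertex path; after many steps the state is an approximate Ising sample
print(glauber_ising([[1], [0, 2], [1]], beta=0.5, steps=1000))
```

When the chain mixes rapidly, running it for enough steps yields approximate samples from the model; bounding that mixing time is exactly where tools like spectral independence enter.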

## **Bio**

Zongchen Chen is an instructor (postdoc) in Mathematics at MIT. He received his Ph.D. degree in Algorithms, Combinatorics and Optimization (ACO) at Georgia Tech in 2021 advised by Eric Vigoda. His thesis received the 2021 Georgia Tech College of Computing Outstanding Doctoral Dissertation Award. He received his BS degree in Mathematics & Applied Mathematics from Zhiyuan College at Shanghai Jiao Tong University in 2016. He is broadly interested in randomized algorithms, discrete probability, and machine learning. His current research interests include Markov chain Monte Carlo (MCMC) methods, approximate counting and sampling, and learning and testing for high-dimensional distributions.

*Tuesday, March 7 @ 1 pm*

**Fast and Accurate Least-Mean-Squares Solvers**

Alaa Maalouf (MIT)

## **Abstract**

Least-mean-squares (LMS) solvers such as Linear / Ridge-Regression and SVD not only solve fundamental machine learning problems but are also the building blocks in various other methods, such as matrix factorizations.

We suggest an algorithm that gets a finite set of n d-dimensional real vectors and returns a subset of d+1 vectors with positive weights whose weighted sum is *exactly* the same as that of the original set. The constructive proof of Caratheodory’s Theorem computes such a subset in O(n^2d^2) time and is thus not used in practice. Our algorithm computes this subset in O(nd + d^4 log(n)) time, using O(log n) calls to Caratheodory’s construction on small but “smart” subsets. This is based on a novel fusion paradigm between different data summarization techniques, sketches, and coresets.

For large values of d, we suggest a faster construction that takes O(nd) time and returns a weighted subset of O(d) sparsified input points. Here, a sparsified point means that some of its entries were set to zero.

As an application, we show how to boost the performance of existing LMS solvers, such as those in the scikit-learn library, by up to 100x. Generalization to streaming and distributed data is straightforward. Extensive experimental results and open-source code are provided.
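For intuition, here is a sketch of the classical Caratheodory construction the abstract refers to: the slow O(n^2 d^2) baseline, not the paper's accelerated algorithm. It is pure Python with illustrative names, using Gauss-Jordan elimination to find a kernel vector at each step:

```python
def null_vector(A, n):
    """Return a nonzero v with A v = 0, for an m x n matrix A with m < n.
    Plain Gauss-Jordan elimination; adequate for the small systems here."""
    m = len(A)
    A = [row[:] for row in A]
    pivots = {}  # pivot column -> row holding its leading 1
    r = 0
    for c in range(n):
        if r >= m:
            break
        piv = max(range(r, m), key=lambda i: abs(A[i][c]))
        if abs(A[piv][c]) < 1e-12:
            continue
        A[r], A[piv] = A[piv], A[r]
        lead = A[r][c]
        A[r] = [x / lead for x in A[r]]
        for i in range(m):
            if i != r and abs(A[i][c]) > 1e-12:
                f = A[i][c]
                A[i] = [x - f * y for x, y in zip(A[i], A[r])]
        pivots[c] = r
        r += 1
    free = next(c for c in range(n) if c not in pivots)  # a free column exists since m < n
    v = [0.0] * n
    v[free] = 1.0
    for c, row in pivots.items():
        v[c] = -A[row][free]
    return v

def caratheodory(points, weights):
    """Classical Caratheodory reduction: returns <= d+1 positively weighted
    points with the same weighted sum and the same total weight."""
    d = len(points[0])
    pts = [list(p) for p in points]
    w = list(weights)
    while len(pts) > d + 1:
        k = len(pts)
        # Kernel vector of the (d+1) x k system: sum v_i p_i = 0 and sum v_i = 0,
        # so moving the weights along v preserves both invariants.
        A = [[pts[i][j] for i in range(k)] for j in range(d)] + [[1.0] * k]
        v = null_vector(A, k)
        # Move until the first weight hits zero, then drop the zeroed points
        alpha = min(w[i] / v[i] for i in range(k) if v[i] > 1e-12)
        w = [w[i] - alpha * v[i] for i in range(k)]
        keep = [i for i in range(k) if w[i] > 1e-12]
        pts = [pts[i] for i in keep]
        w = [w[i] for i in keep]
    return pts, w

# Four unit-weight corners of a square reduce to <= 3 points, preserving
# the weighted sum (2, 2) and total weight 4
print(caratheodory([[0, 0], [1, 0], [0, 1], [1, 1]], [1, 1, 1, 1]))
```

Each iteration removes at least one point and costs one small elimination, which is what the paper's O(log n) calls on "smart" subsets are designed to speed up.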

## **Bio**

I am a Postdoctoral Researcher at the Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology (MIT), working with Prof. Daniela Rus. I received my Ph.D. from the Department of Computer Science at the University of Haifa, where I was fortunate to be supervised by Prof. Dan Feldman. During my Ph.D. I also worked as a Lead Machine Learning Researcher at DataHeroes, and prior to that, I worked as a chip design engineer at Mellanox Technologies. My main research interests are Deep Learning, Machine Learning, Robotics, and Big Data. I am passionate about contributing to society by making Machine/Deep Learning algorithms more efficient and reliable in order to enhance their (daily) use in fields such as Computer Vision, Natural Language Processing, and Robotics.

*Tuesday, March 14*

No meeting — Spring Recess!

*Tuesday, March 21 @ 1 pm*

**Efficient Locally Private Algorithms for Estimation Problems**

Huy Nguyen (Northeastern)

## **Abstract**

In federated computation, user data is distributed over many devices which each communicate to some central server for downstream analytics and/or machine learning tasks. Already in the classical setting, there are many desiderata to be balanced in the system, from accurate aggregation to low communication and fast runtime for the users and the server. Adding to this already delicate balancing act, privacy concerns necessitate new methods that are efficient and accurate while preserving the privacy of individual data. In this talk we will focus on two basic tasks underlying many applications: computing the histogram of user data and estimating their mean. While these problems are well studied, much recent work has been devoted to developing efficient and optimally accurate algorithms beyond asymptotic behavior. We will describe new algorithms achieving near-optimal error, fast runtime, and low communication for these tasks. This is based on joint work with Hilal Asi, Vitaly Feldman, Jelani Nelson, and Kunal Talwar.

## **Bio**

Huy Nguyen is an associate professor at Northeastern University. He received his MEng from MIT in 2009 and PhD from Princeton in 2014 under the supervision of Moses Charikar. In 2016, he joined Khoury College at Northeastern University. His research interests include algorithms for data streams and massively parallel computation, discrete and continuous optimization, differentially private algorithms, and machine learning.

*Tuesday, March 28 @ 1 pm*

**Designing Convex and Consistent Surrogates via Loss Embeddings**

Jessie Finocchiaro (Harvard)

## **Abstract**

We formalize and study the natural approach of designing convex surrogate loss functions via embeddings, for problems such as classification, ranking, or structured prediction. In this approach, one embeds each of the finitely many predictions (e.g. rankings) as a point in the d-dimensional reals, assigns the original loss values to these points, and “convexifies” the loss in some way to obtain a surrogate. We establish a strong connection between this approach and polyhedral (piecewise-linear convex) surrogate losses: every discrete loss is embedded by some polyhedral loss, and every polyhedral loss embeds some discrete loss. Moreover, an embedding gives rise to a consistent link function as well as linear surrogate regret bounds. Our results are constructive, as we illustrate with several examples. In particular, our framework gives succinct proofs of consistency or inconsistency for various polyhedral surrogates in the literature, and for inconsistent surrogates, it further reveals the discrete losses for which these surrogates are consistent. We go on to show additional structure of embeddings, such as the equivalence of embedding and matching Bayes risks, and the equivalence of various notions of non-redundancy. Using these results, we establish that indirect elicitation, a necessary condition for consistency, is also sufficient when working with polyhedral surrogates. Based on joint work with Raf Frongillo and Bo Waggoner.

## **Bio**

Jessie Finocchiaro (she/her) is a National Science Foundation Mathematical Sciences Postdoctoral Research Fellow and CRCS Fellow at Harvard University, working with Drs. Yiling Chen and Milind Tambe. She completed her PhD in the CS Theory group at the University of Colorado Boulder, advised by Dr. Rafael Frongillo. In general, her research interests intersect Theoretical Machine Learning, Algorithmic Game Theory, and Computational Economics. In particular, she is interested in how {the questions we ask of data, the objectives we optimize} affect what we learn and how this information affects people. Previously, she was a 2019 National Science Foundation Graduate Research Fellow.

*Tuesday, April 4 @ 1 pm*

**Smoothed Analysis of the Simplex Method**

Sophie Huiberts (Columbia)

## **Abstract**

The simplex method is an important algorithm for solving linear optimization problems. The algorithm is very efficient in practice, but theoretical explanations of this fact are lacking. In this talk, I will describe smoothed analysis, one of the primary theoretical frameworks for analyzing the simplex method, and present upper and lower bounds on the running time.

## **Bio**

TBD.

*Tuesday, April 11 @ 1 pm*

**TBD**

TBD

## **Abstract**

TBD.

## **Bio**

TBD.

*Tuesday, April 18*

No meeting — Patriots’ Day!

*Tuesday, April 25 @ 1 pm*

**TBD**

TBD

## **Abstract**

TBD.

## **Bio**

TBD.

*Tuesday, May 2 @ 1 pm*

**Reachability Shortcuts: New Bounds and Algorithms**

Merav Parter (Weizmann Institute)

## **Abstract**

For an n-vertex digraph G=(V,E), a *shortcut set* is a (small) subset of edges H taken from the transitive closure of G that, when added to G, guarantees that the diameter of G \cup H is small. Shortcut sets, introduced by Thorup in 1993, have a wide range of applications in algorithm design, especially in the context of parallel, distributed and dynamic computation on directed graphs. A folklore result in this context shows that every n-vertex digraph admits a shortcut set of linear size (i.e., of O(n) edges) that reduces the diameter to ~O(sqrt{n}). Despite extensive research over the years, the question of whether one can reduce the diameter to o(sqrt{n}) with ~O(n) shortcut edges has been left open.
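The folklore sqrt(n) tradeoff can be seen concretely on a directed path: adding ~n/s transitive-closure edges of span s brings the hop count down to about n/s + s, which is minimized around s = sqrt(n). This is only a toy illustration on one graph, not the general construction or the new result; the names are illustrative.

```python
from collections import deque

def hop_distance(n, edges, s, t):
    """Fewest edges on any s -> t path in the digraph (plain BFS)."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
    dist = [None] * n
    dist[s] = 0
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if dist[v] is None:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist[t]

n = 100
path = [(i, i + 1) for i in range(n - 1)]           # directed path 0 -> 1 -> ... -> 99
hubs = [(i, i + 10) for i in range(0, n - 10, 10)]  # ~n/s shortcut edges of span s = 10,
                                                    # all present in the transitive closure
print(hop_distance(n, path, 0, n - 1))         # 99 hops without shortcuts
print(hop_distance(n, path + hubs, 0, n - 1))  # 18 hops, roughly 2*sqrt(n)
```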

In this talk, I will present the first improved diameter-sparsity tradeoff for this problem, breaking the sqrt{n} diameter barrier. Specifically, we show an O(n^{\omega})-time randomized algorithm for computing a linear shortcut set that reduces the diameter of the digraph to ~O(n^{1/3}). I will also present time-efficient algorithms for computing these shortcuts, and discuss some open ends.

Based on a joint work with Shimon Kogan (SODA 2022, ICALP 2022, SODA 2023).

## **Bio**

Merav Parter is an Associate Professor in the Department of Computer Science and Applied Mathematics at the Weizmann Institute of Science. Before joining Weizmann, she was a Fulbright and Rothschild Postdoctoral Researcher in the EECS department of MIT. In the past, she was a Google European Fellow in Distributed Computing, 2012. Her research interests include reliable distributed communication, graph theory, and neural networks. Parter’s research is supported (in part) by the European Research Council (starting grant DistRES, 2020-2025), the Israeli Science Foundation (ISF) and the NSF-BSF organization. She is the recipient of the Krill prize for Excellence in Scientific Research of 2021.

*Tuesday, May 9 @ 1 pm*

**TBD**

TBD

## **Abstract**

TBD.

## **Bio**

TBD.

*Tuesday, May 16 @ 1 pm*

**TBD**

TBD

## **Abstract**

TBD.

## **Bio**

TBD.