Clustering and classification play a central role in machine learning systems for processing images, text, high-dimensional data, graph-based learning, and various other supervised and unsupervised learning tasks. Despite their long history, clustering and classification are still active areas of research, and in the past decade a variety of new techniques and models have been applied to these problems. This workshop will give an overview of new angles on this topic, including scalable algorithms for high-dimensional and graph-based data, novel models and objectives (including hierarchical and overlapping ones), applications in semi-supervised and unsupervised learning, beyond-worst-case analysis and robustness, fairness and privacy, and deep learning-augmented techniques.

When? September 18-20, 2019. Where? Toyota Technological Institute at Chicago. What? The program includes longer keynote talks, shorter spotlight talks, a panel discussion and a poster session. See the schedule and the list of speakers. Do I need to register? Yes, registration is free but required.

# Day 1: Wednesday, September 18

• 09:00 – 10:00

Breakfast

• 10:00 – 12:00
Jennifer Dy (Northeastern)
Iterative Spectral Method for Alternative Clustering

Given a dataset and an existing clustering as input, alternative clustering aims to find an alternative partition. One of the state-of-the-art approaches is Kernel Dimension Alternative Clustering (KDAC). We propose a novel Iterative Spectral Method (ISM) that greatly improves the scalability of KDAC. Our algorithm is intuitive, relies on easily implementable spectral decompositions, and comes with theoretical guarantees. Its computation time improves upon existing implementations of KDAC by as much as 5 orders of magnitude.

Distributed k-Center Clustering via Sublinear Coresets

Metric clustering is a fundamental primitive in machine learning with several applications for mining massive datasets. An important example of metric clustering is the $k$-center problem. While this problem has been extensively studied in distributed settings, all previous algorithms require $\Omega(k)$ space per machine and $\Omega(n k)$ total work. In this paper, we develop the first highly scalable approximation algorithm for $k$-center clustering requiring $o(k)$ space per machine with $o(n k)$ total work. In particular, our algorithm needs $\widetilde{O}(n^{\epsilon})$ space per machine and $\widetilde{O}(n^{1+\epsilon})$ total work, and computes an $O(\log \log \log n)$-approximation of the problem by selecting $(1+o(1))k$ centers in $O(\log \log n)$ rounds. This is achieved by introducing coresets of truly sublinear size. Finally, these theoretical results are complemented with an empirical study which shows that our distributed algorithm very quickly converges to a solution comparable to that of the sequential approximation algorithm.
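The distributed algorithm itself is beyond a short sketch, but the sequential baseline it is compared against is easy to illustrate. The following is a minimal, hypothetical sketch of the classical greedy 2-approximation for metric $k$-center (farthest-point traversal, due to Gonzalez), not the paper's algorithm:

```python
import math
import random

def gonzalez_k_center(points, k, seed=0):
    """Farthest-point traversal: repeatedly add the point farthest from
    the current centers. A classical sequential 2-approximation for
    k-center (the returned radius is at most twice the optimum)."""
    rng = random.Random(seed)
    centers = [points[rng.randrange(len(points))]]
    # Distance from each point to its nearest chosen center so far.
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        # The point farthest from all current centers becomes a center.
        i = max(range(len(points)), key=lambda j: dist[j])
        centers.append(points[i])
        dist = [min(d0, math.dist(p, points[i])) for p, d0 in zip(points, dist)]
    return centers, max(dist)  # chosen centers and resulting radius

# Two well-separated blobs: k = 2 should pick one center per blob.
pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
centers, radius = gonzalez_k_center(pts, 2)
```

Note that this baseline stores all $n$ points on one machine; the point of the talk is achieving comparable quality with sublinear space per machine.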

Zohar Karnin (Amazon NYC)
Discrepancy, Coresets, and Sketches in Machine Learning

We define the notion of class discrepancy for families of functions and show that low discrepancy classes admit small offline and streaming (additive) coresets. We provide general techniques for bounding the class discrepancy of machine learning problems. As corollaries of the general technique we bound the discrepancy, and therefore coreset complexity, of logistic regression, sigmoid activation loss, matrix covariance, kernel density and any analytic function of the dot product or the squared distance. This resolves the long-standing open problem regarding the coreset complexity of Gaussian kernel density estimation. We provide two more related but independent results. First, an exponential improvement of the widely used merge-and-reduce trick which gives improved streaming sketches for any low discrepancy problem. Second, an extremely simple deterministic algorithm for finding low discrepancy sequences (and therefore coresets) for any positive semi-definite kernel. This paper establishes some explicit connections between class discrepancy, coreset complexity, learnability, and streaming algorithms.

Aravindan Vijayaraghavan (Northwestern)
On Robustness to Adversarial Examples and Polynomial Optimization

The talk will focus on classification algorithms that are robust to adversarial (test time) perturbations. While there have been numerous recent works on this topic due to its connections to test time robustness of deep networks, there is limited theoretical understanding of several basic questions, including: (1) when and how can one design provably robust learning algorithms, and (2) what is the price of achieving robustness to adversarial examples in a computationally efficient manner?

In this talk, I will show a strong connection between achieving robustness to adversarial examples, and approximation algorithms for a rich class of polynomial optimization problems, thereby making progress on the above questions. In particular, this connection can be leveraged to design computationally efficient robust algorithms with provable guarantees, and also to give a precise characterization of the price of achieving robustness in a computationally efficient manner for some of these classes. Time permitting, I will also describe how we can design efficient algorithms to certify robustness and generate adversarial attacks in a principled manner for $2$-layer neural networks.

• 12:00 – 14:00

Lunch

• 14:00 – 15:00
Konstantin Makarychev (Northwestern)
Correlation Clustering

TBD

Yury Makarychev (TTIC)
Performance of Johnson-Lindenstrauss Transform for k-Means and k-Medians Clustering

Consider an instance of Euclidean k-means or k-medians clustering. We show that the cost of the optimal solution is preserved up to a factor of (1+ε) under a projection onto a random O(log(k/ε)/ε^2)-dimensional subspace. Further, the cost of every clustering is preserved within (1+ε). More generally, our result applies to any dimension reduction satisfying a mild sub-Gaussian-tail condition. Our bound on the dimension is nearly optimal. Additionally, our result applies to Euclidean k-clustering with the distances raised to the p-th power for any constant p.

For k-means, our result resolves an open problem posed by Cohen, Elder, Musco, Musco, and Persu (STOC 2015); for k-medians, it answers a question raised by Kannan.

Based on a joint work with Konstantin Makarychev and Ilya Razenshteyn.
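As a quick sanity check on the statement that the cost of a fixed clustering survives random projection, here is a self-contained sketch on hypothetical toy data. It uses a plain Gaussian JL map (one instance of the sub-Gaussian maps covered by the result) and compares the cost of the same partition before and after projection:

```python
import math
import random

def partition_cost(points, labels, k):
    """k-means cost of a fixed partition: sum of squared distances of
    each point to its own cluster's centroid."""
    cost = 0.0
    for c in range(k):
        cluster = [p for p, l in zip(points, labels) if l == c]
        centroid = [sum(col) / len(cluster) for col in zip(*cluster)]
        cost += sum(math.dist(p, centroid) ** 2 for p in cluster)
    return cost

def gaussian_project(points, m, seed=0):
    """Map points to m dimensions via an i.i.d. Gaussian matrix scaled
    by 1/sqrt(m) -- a standard Johnson-Lindenstrauss transform."""
    rng = random.Random(seed)
    d = len(points[0])
    G = [[rng.gauss(0.0, 1.0) / math.sqrt(m) for _ in range(d)] for _ in range(m)]
    return [[sum(g[j] * p[j] for j in range(d)) for g in G] for p in points]

# Two Gaussian blobs in R^50 with their ground-truth partition.
rng = random.Random(1)
k, d, m = 2, 50, 20
points, labels = [], []
for c in range(k):
    for _ in range(30):
        points.append([10.0 * c + rng.gauss(0.0, 1.0) for _ in range(d)])
        labels.append(c)

# Cost of the same partition in the projected space vs. the original.
ratio = (partition_cost(gaussian_project(points, m), labels, k)
         / partition_cost(points, labels, k))
```

On runs like this the ratio lands close to 1; the theorem makes this quantitative, with a target dimension depending only on $k$ and ε rather than on the original dimension.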

• 15:00 – 15:30

Tea time

• 15:30 – 16:30
Moses Charikar (Stanford)
Keynote: An Optimization Perspective on Hierarchical Clustering

Algorithms for hierarchical clustering have been studied since the 1960s. A few years ago, Dasgupta proposed a new objective function for this problem, spurring a flurry of work designing new algorithms and shedding light on old algorithms for hierarchical clustering. In this talk, I will provide an overview of some of these developments.

# Day 2: Thursday, September 19

• 09:00 – 09:30

Breakfast

• 09:30 – 10:30
Elchanan Mossel (MIT)
Keynote: Hierarchical Models and Depth in Inference

I will discuss rigorous attempts to explain the usefulness of depth in inference by depth in generation.

• 10:30 – 12:30
Distributed Balanced Partitioning and Applications in Large-scale Load Balancing

Balanced partitioning is often a crucial first step in solving large-scale optimization problems where it is used to break the problem into smaller, almost independent subproblems. We focus on a minimum-cut formulation, develop heuristics for the problem, and apply the resulting distributed algorithm to a host of applications in Web Search, Maps, Advertisement, and Job Scheduling. In this talk, I will explain a load-balancing application in the Web Search backend.

Sliding Window Algorithms for k-Clustering Problems

The sliding window model of computation captures scenarios in which data is arriving continuously, but only the latest $w$ elements should be used for analysis. The goal is to design algorithms that update the solution efficiently with each arrival rather than recomputing it from scratch. In this work, we focus on $k$-clustering problems, such as $k$-means and $k$-median, in this setting. We generalize our recent result on maximization in the sliding window setting (PODS'19) to a framework that allows minimization problems in the sliding window model. We then show how this gives a simple and practical algorithm that comes with stronger performance guarantees than previously known results for $k$-clustering problems. Empirically, we show that our methods store only a small fraction of the data, are orders of magnitude faster, and find solutions with cost only slightly worse than those returned by algorithms that have access to the full dataset.

Hierarchical Clustering: Objectives and Efficient Algorithms

Hierarchical clustering is a recursive partitioning of a dataset into clusters at an increasingly finer granularity. Hierarchical clustering has mostly been studied in procedural terms, i.e., a hierarchical cluster tree is obtained using certain bottom-up (agglomerative) or top-down (divisive) heuristics, rather than finding a hierarchical cluster tree as the solution obtained by optimizing some suitable objective. Dasgupta (2016) identified this as a reason why the theory of hierarchical clustering lagged behind that of flat clustering and proposed an objective function. In this talk, I will take an axiomatic approach to identifying suitable objective functions for hierarchical clustering. I will also describe a new random-graph model with a planted hierarchy. New and existing algorithms and their performance in theory and on some preliminary experiments will be discussed.

Based on joint work with Vincent Cohen-Addad, Claire Mathieu and Frederik Mallmann-Trenn.

Hierarchical Clustering with Triplet Constraints

Hierarchical clustering is a popular unsupervised data analysis method. For many real-world applications, we would like to exploit prior information about the data that imposes constraints on the clustering hierarchy, and is not captured by the set of features available to the algorithm. This gives rise to the problem of "hierarchical clustering with structural constraints". Structural constraints pose major challenges for bottom-up approaches like average/single linkage and even though they can be naturally incorporated into top-down divisive algorithms, no formal guarantees exist on the quality of their output. In this talk, I explain how we adapt simple top-down algorithms to cope with constraints with provable approximation guarantees, using a recently introduced optimization viewpoint of hierarchical clustering with pairwise similarity information [Dasgupta, 2016]. We further show how to find good solutions even in the presence of conflicting prior information, by formulating a constraint-based regularization of the objective. Finally, we demonstrate our approach on a real dataset for the taxonomy application.

• 12:30 – 14:00

Lunch

• 14:00 – 15:00
Connected Components In Large-Scale Graphs

We consider the problem of finding connected components in a distributed setting. We present state of the art algorithms that achieve near-optimal round complexity and work well in practice. Our main focus is the Massively Parallel Computation (MPC) model, which is the standard theoretical model for modern parallel frameworks such as MapReduce, Hadoop, or Spark.

This is joint work with Soheil Behnezhad, Laxman Dhulipala, Hossein Esfandiari, Vahab Mirrokni, Warren Schudy and Michał Włodarczyk.
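As background, a minimal single-machine sketch of connected components via minimum-label propagation, where each loop iteration mimics one synchronous round. This is an illustrative stand-in (its round count is proportional to the graph diameter), not the near-optimal MPC algorithm from the talk:

```python
def connected_components(n, edges):
    """Synchronous minimum-label propagation on an undirected graph:
    every node repeatedly adopts the smallest label among itself and
    its neighbors, until no label changes. Each node ends up with the
    minimum node id of its component."""
    labels = list(range(n))
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    changed = True
    while changed:  # one iteration ~ one synchronous round
        changed = False
        new = labels[:]
        for u in range(n):
            m = min([labels[u]] + [labels[v] for v in adj[u]])
            if m < new[u]:
                new[u], changed = m, True
        labels = new
    return labels

# Components {0,1,2}, {3,4}, {5}: labels converge to 0, 3, and 5.
labels = connected_components(6, [(0, 1), (1, 2), (3, 4)])
```

Techniques such as graph contraction and neighborhood doubling are what bring the round complexity down from the diameter to the near-optimal bounds discussed in the talk.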

Hierarchical Clustering Objectives

Hierarchical Clustering is a data analysis method that has been used for decades. Despite its wide use, the method has an underdeveloped analytical foundation. In this talk, we discuss a recent line of work on objective functions for hierarchical clustering, kicked off by Dasgupta in 2016.

• 15:00 – 15:30

Tea time

• 15:30 – 17:00
Michael Mahoney (Berkeley)
Minimax and Bayesian Experimental Design: Bridging the Gap between Statistical and Worst-Case Approaches to Least Squares Regression

In experimental design, we are given a large collection of vectors, each with a hidden response value, and we wish to pick a small subset of the vectors such that querying the corresponding responses will lead to a good estimator of the model. A classical approach in statistics is to assume the responses are linear, plus zero-mean i.i.d. Gaussian noise, in which case the goal is to provide an unbiased estimator with smallest mean squared error. A different approach, more common in computer science, is to assume the responses are arbitrary but fixed, in which case the goal is to estimate the least squares solution using few responses, as quickly as possible, for worst-case inputs. Characterizing the relationship between these two approaches has proven elusive. We address this by proposing a framework for experimental design where the responses are produced by an arbitrary unknown distribution. We show that there is an efficient randomized experimental design procedure that achieves strong variance bounds for an unbiased estimator using few responses in this general model. Nearly tight bounds for the classical A-optimality criterion, as well as improved bounds for worst-case responses, emerge as special cases of this result. Extensions to Bayesian experimental design and connections with determinantal point processes will also be described.

Andrew McGregor (UMass)
Exact Parameter Learning for Mixture Distributions with Applications

We present upper bounds on the sample complexity of learning the exact parameters of a mixture distribution. The bound applies for a range of distributions including Gaussian, Binomial, Poisson, and Chi-Squared and, at least in the case of Binomial mixtures, the bound is essentially tight. We then discuss applications to the problems of trace reconstruction (i.e., how many random subsequences of an unknown string x are required to determine x) and mixed sparse linear regression (i.e., how many linear measurements are required to learn a collection of sparse signals).

This is joint work with Akshay Krishnamurthy, Arya Mazumdar, and Soumyabrata Pal. Related papers appear at ESA 2019 and NeurIPS 2019.

Andrej Risteski (CMU)
Understanding Langevin Dynamics Beyond Log-Concavity

A key task in Bayesian approaches in machine learning is sampling from distributions of the form p(x) = e^{-f(x)}/Z where Z is a normalizing constant, for some function f whose values and gradients we can query. One prevalent example of this is sampling posteriors in parametric distributions, such as latent-variable generative models -- which is the natural Bayesian analogue of clustering. However sampling (even very approximately) can be #P-hard.

Classical results on sampling focus on log-concave distributions (i.e. f is convex), and show a natural Markov process called Langevin diffusion mixes in polynomial time. However, log-concavity is quite restrictive in practice: in particular such distributions are uni-modal. I will address in this talk some ways to move beyond log-concavity, by assuming benign forms of multimodality, and time-permitting concentration of the distribution near a manifold with good curvature properties.
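For concreteness, the Langevin diffusion can be simulated via its standard discretization, the unadjusted Langevin algorithm. Below is a minimal 1D sketch on a log-concave target (the hypothetical step size and iteration counts are arbitrary choices, and the unadjusted chain carries a small discretization bias):

```python
import math
import random

def ula_samples(grad_f, n_steps, step, seed=0, x0=0.0):
    """Unadjusted Langevin algorithm in 1D:
        x_{t+1} = x_t - step * f'(x_t) + sqrt(2 * step) * N(0, 1).
    For small step sizes and long runs, iterates approximately sample
    from the density proportional to exp(-f(x))."""
    rng = random.Random(seed)
    x, out = x0, []
    for _ in range(n_steps):
        x = x - step * grad_f(x) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        out.append(x)
    return out

# Sanity check on f(x) = x^2 / 2 (so f'(x) = x): the target exp(-f)
# is a standard Gaussian, so the chain's mean and variance should
# approach 0 and 1 after discarding a burn-in prefix.
samples = ula_samples(lambda x: x, n_steps=200_000, step=0.01)[50_000:]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

The talk concerns precisely the regimes where this simple picture breaks: multimodal targets, where the chain can stay stuck near one mode for exponentially long.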

• 17:00 – 18:00
Kamalika Chaudhuri (UCSD)
Keynote: Challenges in Reliable Machine Learning

As machine learning is increasingly used in real applications, there is a need for reliable and robust methods. In this talk, we will discuss two such challenges that arise in reliable machine learning. The first is sample selection bias, where training data is available from a distribution conditioned on a sample selection policy, but the resultant classifier needs to be evaluated on the entire population. We will show how we can use active learning to get a small amount of labeled data from the entire population that can be used to correct this kind of sample selection bias. The second is robustness to adversarial examples -- slight strategic perturbations of legitimate test inputs that cause misclassification. We look at adversarial examples in the context of a simple non-parametric classifier -- the k-nearest neighbor classifier -- and examine its robustness properties. We provide bounds on its robustness as a function of k, and propose a more robust 1-nearest neighbor classifier.

Joint work with Songbai Yan, Tara Javidi, Yaoyuan Yang, Cyrus Rastchian, Yizhen Wang and Somesh Jha.

# Day 3: Friday, September 20

• 09:00 – 09:30

Breakfast

• 09:30 – 10:30
Keynote: Privacy Amplification

A rigorous foundational approach to private data analysis has emerged in theoretical computer science in the last decade, with differential privacy and its close variants playing a central role. Recent works show that several machine learning problems can be solved accurately while giving strong differential privacy guarantees. The analyses of these algorithms rely on a class of results known as privacy amplification theorems. In this talk, I will describe two recent results of this kind.

• 10:30 – 12:30
Variance Reduction in Bipartite Experiments through Correlation Clustering

Causal inference in randomized experiments typically assumes that the units of randomization and the units of analysis are one and the same. In some applications, however, these two roles are played by distinct entities linked by a bipartite graph. The key challenge in such bipartite settings is how to avoid interference bias, which would typically arise if we simply randomized the treatment at the level of analysis units. One effective way of minimizing interference bias in standard experiments is through cluster randomization, but this design has not been studied in the bipartite setting where conventional clustering schemes can lead to poorly powered experiments. We introduce a novel clustering objective and a corresponding correlation clustering algorithm that partitions a bipartite graph so as to maximize the statistical power of a bipartite experiment on that graph. We use a publicly-available graph of Amazon user-item reviews to validate our solution and illustrate how it substantially increases the statistical power in bipartite experiments.

Krzysztof Onak (IBM Research)
Scalable Fair Clustering

In the fair variant of the classic k-median problem, the goal is to minimize the same average distance objective while ensuring that all clusters have an approximately equal number of points of each color. Chierichetti et al. (NIPS 2017) introduced a two-step framework for solving this problem. In the first step, the input set of points is decomposed into small balanced subsets, called fairlets, that satisfy the fairness requirement and approximately preserve the k-median objective. In the second step, fairlets are merged into k clusters by one of the existing k-median algorithms. In the original work, the running time was dominated by the complexity of the first step, which took super-quadratic time. I will present a practical approximate fairlet decomposition algorithm that runs in near linear time.

Joint work with Arturs Backurs, Piotr Indyk, Baruch Schieber, Ali Vakilian, and Tal Wagner.
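As a toy illustration of the first step of the framework, here is a hypothetical sketch of a (1,1)-fairlet decomposition for two equally sized color classes, using greedy nearest-pair matching. The near-linear-time algorithm from the talk is more sophisticated; this only shows the shape of the output:

```python
import math

def greedy_fairlets(red, blue):
    """Pair each red point with a remaining blue point by greedily
    choosing the nearest one. Each resulting pair is a (1,1)-fairlet:
    a balanced mini-cluster with one point of each color. Assumes
    len(red) <= len(blue)."""
    remaining = list(blue)
    fairlets = []
    for r in red:
        b = min(remaining, key=lambda q: math.dist(r, q))
        remaining.remove(b)
        fairlets.append((r, b))
    return fairlets

# Two red and two blue points, each red next to one blue: the greedy
# matching pairs each point with its nearby opposite-color partner.
pairs = greedy_fairlets([(0, 0), (10, 0)], [(0, 1), (10, 1)])
```

In the second step of the framework, a representative of each fairlet is fed to an off-the-shelf k-median algorithm, so the final clusters inherit the fairlets' balance.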

Haipeng Luo (USC)
Logistic Regression: The Importance of Being Improper

Learning linear predictors with the logistic loss (both in stochastic and online settings) is a fundamental task in machine learning and statistics. Existing "fast rates" for this setting exhibit exponential dependence on the predictor norm, and Hazan et al. (2014) showed that this is unfortunately unimprovable. In this work, however, using the simple observation that the logistic loss is 1-mixable, we show how improper learning can circumvent this negative result with a regret bound exhibiting a doubly-exponential improvement in the dependence on the predictor norm. This provides a positive resolution to a variant of the COLT 2012 open problem of McMahan and Streeter (2012). Based on this improvement, we also provide a series of applications resolving several more open problems, including better algorithms for online bandit multiclass learning and online boosting.

David Woodruff (CMU)
Strong Coresets for k-Median and Subspace Clustering: Goodbye Dimension

In this talk I will discuss recent progress on building small coresets for clustering and subspace approximation. A coreset allows one to efficiently summarize a data set, while still preserving useful properties, and has applications to low-memory streaming algorithms and low-communication distributed protocols. We obtain the first strong coresets for the k-median and subspace approximation problems with sum of distances objective function, on n points in d dimensions, with a number of weighted points that is independent of both n and d; namely, our coresets have size poly(k/ε). A strong coreset (1+ε)-approximates the cost function for all possible sets of centers simultaneously. We also give efficient nnz(A)+(n+d)poly(k/ε)+exp(poly(k/ε)) time algorithms for computing these coresets. We obtain the result by introducing a new dimensionality reduction technique for coresets that significantly generalizes an earlier result of Feldman, Sohler and Schmidt for squared Euclidean distances to sums of p-th powers of Euclidean distances for constant p≥1.

Joint work with Christian Sohler.

• 12:30 – 14:00

Lunch

• 14:00 – 15:00
Streaming Coresets for M-Estimators

We introduce a new method of maintaining a $(k,\epsilon)$-coreset for clustering M-estimators over insertion-only streams. Let $(P,w)$ be a weighted set of points (with weight function $w : P \to [0,\infty)$) in a $\rho$-metric space, meaning a set $X$ equipped with a positive-semidefinite symmetric function $D$ such that $D(x,z) \le \rho(D(x,y) + D(y,z))$ for all $x,y,z \in X$. For any set of points $C$, we define $\mathrm{COST}(P,w,C) = \sum_{p \in P} w(p) \min_{c \in C} D(p,c)$. A $(k,\epsilon)$-coreset for $(P,w)$ is a weighted set $(Q,v)$ such that for every set $C$ of $k$ points, $(1-\epsilon)\,\mathrm{COST}(P,w,C) \le \mathrm{COST}(Q,v,C) \le (1+\epsilon)\,\mathrm{COST}(P,w,C)$. M-estimators are functions $D(x,y)$ that can be written as $\psi(d(x,y))$, where $(X, d)$ is a true metric (i.e., $1$-metric) space. Special cases of M-estimators include the well-known $k$-median ($\psi(x) = x$) and $k$-means ($\psi(x) = x^2$) functions. Our technique takes an existing offline construction for an M-estimator coreset and converts it into the streaming setting, where $n$ data points arrive sequentially. Our streaming construction does not rely on the merge-and-reduce tree. For example, our coreset for streaming metric $k$-means uses $O(\epsilon^{-2} k \log k \log n)$ points of storage. The previous state-of-the-art required storing at least $O(\epsilon^{-2} k \log k \log^{4} n)$ points.

Joint work with Dan Feldman, Harry Lang and Daniela Rus (RANDOM 2019).

If time permits, we will also describe our recent results on coresets for ordered clustering (ICML 2019; joint work with Shaofeng H.-C. Jiang, Robert Krauthgamer and Xuan Wu) and sketched distributed SGD (NeurIPS 2019; joint work with Nikita Ivkin, Daniel Rothchild, Enayat Ullah, Ion Stoica and Raman Arora).
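The COST definition from the abstract above is concrete enough to check on a toy example. Below is a minimal sketch with a trivial zero-error coreset that merges duplicate points into weights (an illustration of the coreset guarantee, not the streaming construction from the talk):

```python
def cost(points, weights, centers, D):
    """COST(P, w, C): each point contributes its weight times the
    distance to its nearest center."""
    return sum(w * min(D(p, c) for c in centers)
               for p, w in zip(points, weights))

# k-median as an M-estimator: psi(x) = x on the metric |x - y|,
# which satisfies the rho-metric condition with rho = 1.
D = lambda x, y: abs(x - y)

# A toy weighted point set on the line, with repeated points...
P = [0.0, 0.0, 0.0, 5.0, 5.0, 9.0]
wP = [1.0] * len(P)
# ...and a "coreset" that merges duplicates into multiplicity weights.
Q = [0.0, 5.0, 9.0]
wQ = [3.0, 2.0, 1.0]

# For every candidate center set C the two costs agree exactly, so
# (Q, wQ) satisfies the (k, epsilon)-coreset guarantee with epsilon = 0.
for C in [[0.0], [4.0], [0.0, 7.0], [1.0, 5.0, 9.0]]:
    assert abs(cost(P, wP, C, D) - cost(Q, wQ, C, D)) < 1e-9
```

Real coresets trade this exactness for size: they keep only poly(k/ε) weighted points while preserving the cost up to (1 ± ε) for all center sets simultaneously.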

Quanquan Gu (UCLA)
Towards Understanding Overparameterized Deep Neural Networks: From Optimization To Generalization

Deep learning has achieved tremendous successes in many applications. However, why deep learning is so powerful is still less well understood. One of the mysteries is that deep neural networks used in practice are often heavily over-parameterized such that they can even fit random labels to the input data, while they can still achieve very small test error when trained with real labels. In order to understand this phenomenon, in this talk, I will first show that with over-parameterization and a proper random initialization, gradient-based methods can find the global minima of the training loss for DNNs with the ReLU activation function. Then I will show stochastic gradient descent with a proper random initialization is able to train a sufficiently over-parameterized DNN to achieve small generalization error. I will conclude by discussing implications, challenges and future work along this line of research.

• 15:00 – 15:30

Tea time

• 15:30 – 16:30
Po-Ling Loh (UWisc)
Robust Information Bottleneck

We propose a novel strategy for extracting features in supervised learning that can be used to construct a classifier which is more robust to small perturbations in the input space. Our method builds upon the idea of the information bottleneck by introducing an additional penalty term that encourages the Fisher information of the extracted features to be small, when parametrized by the inputs. We derive the optimal solution explicitly when the inputs and outputs are jointly Gaussian, proving that the optimally robust features are also jointly Gaussian in that setting. Furthermore, we propose a method for optimizing the robust information bottleneck objective in general settings using a form of stochastic gradient descent that may be implemented efficiently in neural networks. Our experimental results for synthetic and real data sets show that the proposed feature extraction method indeed produces classifiers with increased robustness to perturbations.

Ludwig Schmidt (Berkeley)
Do ImageNet Classifiers Generalize to ImageNet?

We build new test sets for the CIFAR-10 and ImageNet datasets. Both benchmarks have been the focus of intense research for almost a decade, raising the danger of overfitting to excessively re-used test sets. By closely following the original dataset creation processes, we test to what extent current classification models generalize to new data. We evaluate a broad range of classifiers and find accuracy drops of 3% - 15% on CIFAR-10 and 11% - 14% on ImageNet. However, accuracy gains on the original test sets translate to larger gains on the new test sets. Our results suggest that the accuracy drops are not caused by adaptivity, but by the classifiers' inability to generalize to slightly "harder" images than those found in the original test sets.

• 16:30 – 17:30
Aaron Roth (UPenn)
Keynote: Average Individual Fairness

The literature on algorithmic fairness has primarily focused on "statistical" definitions of fairness that are easy to impose and check, but offer only limited guarantees. In contrast, the smaller literature on "individual" fairness has studied more meaningful guarantees, but ones that are often only applicable under some kind of strong assumption. In this work, we identify an individual-level fairness constraint that can be applied without making assumptions in certain scenarios: in particular, when there is a distribution over *problems* as well as over individuals.

This is joint work with Michael Kearns and Saeed Sharifi.

# Speakers

## Keynote Speakers

• ### Moses Charikar (Stanford University)

Moses Charikar is the Donald E. Knuth professor of Computer Science at Stanford University. He obtained his PhD from Stanford in 2000, spent a year in the research group at Google, and was on the faculty at Princeton from 2001-2015. He is broadly interested in approximation algorithms (especially the power of mathematical programming approaches), metric embeddings, algorithmic techniques for big data, efficient algorithms for computational problems in high-dimensional statistics and optimization problems in machine learning. He won the best paper award at FOCS 2003 for his work on the impossibility of dimension reduction, the best paper award at COLT 2017 and the 10 year best paper award at VLDB 2017. He was jointly awarded the 2012 Paris Kanellakis Theory and Practice Award for his work on locality sensitive hashing, and was named a Simons Investigator in theoretical computer science in 2014.

• ### Kamalika Chaudhuri (University of California, San Diego)

Kamalika Chaudhuri received a Bachelor of Technology degree in Computer Science and Engineering in 2002 from Indian Institute of Technology, Kanpur, and a PhD in Computer Science from University of California at Berkeley in 2007. In July 2010, she joined the CSE department at UC San Diego as an assistant professor. She received an NSF CAREER Award in 2013 and a Hellman Faculty Fellowship in 2012.

Kamalika's research interests are in the statistical foundations of machine learning. She is particularly interested in the foundations of trustworthy machine learning, which includes problems such as learning from sensitive data while preserving privacy, learning under sampling bias, and in the presence of an adversary.

• ### Elchanan Mossel (Massachusetts Institute of Technology)

Elchanan Mossel became a professor of mathematics at MIT in 2016. He was previously a professor of statistics at the University of Pennsylvania. Elchanan was awarded a Ph.D. in mathematics from Hebrew University of Jerusalem. Since then, he was appointed as a postdoc in the Theory Group at Microsoft Research Redmond, an Alfred P. Sloan Research Fellow, a Miller Research Fellow at University of California, Berkeley, a professor of mathematics, applied mathematics and computer science at the Weizmann Institute of Science and a professor of statistics and computer science at University of California, Berkeley.

He studies fundamental problems in probability and analysis, computational complexity and algorithms, as well as applications to machine learning, Markov chain Monte Carlo methods, social choice and networks, and computational biology.

• ### Aaron Roth (University of Pennsylvania)

Aaron Roth is the Class of 1940 Bicentennial Term Associate Professor of Computer and Information Sciences at the University of Pennsylvania, and co-director of the Networked and Social Systems Engineering (NETS) program. He is the recipient of a Presidential Early Career Award for Scientists and Engineers (PECASE) awarded by President Obama in 2016, an Alfred P. Sloan Research Fellowship, and research awards from Amazon, Google, and Yahoo. His research focuses on data privacy, algorithmic fairness, game theory, and machine learning. Together with Cynthia Dwork, he is the author of "The Algorithmic Foundations of Differential Privacy". Together with Michael Kearns, he is the author of "The Ethical Algorithm", forthcoming in 2019 from Oxford University Press.

• ### Kunal Talwar (Google Brain, MTV)

Kunal Talwar is a Theoretical Computer Scientist, working in the areas of Differential Privacy, Machine Learning, Algorithms, and Data Structures. He is a Research Scientist in the Google Brain team. He got his PhD from UC Berkeley in 2004 and was at Microsoft Research 2004-2014.

## Spotlight Speakers

• ### Mohammad Hossein Bateni (Google NYC)

Mohammad Hossein Bateni is a staff research scientist at Google NYC working on large-scale graph and optimization problems. Before joining the team in 2011, he obtained Ph.D. and M.A. degrees in Computer Science from Princeton University, and a B.E. in Computer Engineering from Sharif University of Technology.

Hossein's research interests lie broadly in the design of approximation and distributed algorithms as well as combinatorial optimization. More recently he has focused on network design and distributed clustering problems.

• ### Jennifer Dy (Northeastern University)

Jennifer G. Dy is a Professor at the Department of Electrical and Computer Engineering, Northeastern University, Boston, MA, where she first joined the faculty in 2002. She received her M.S. and Ph.D. in 1997 and 2001 respectively from the School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, and her B.S. degree from the Department of Electrical Engineering, University of the Philippines, in 1993. Her research spans both fundamental research in machine learning and their application to biomedical imaging, health, science and engineering, with research contributions in clustering, multiple alternative clustering, dimensionality reduction, feature selection and sparse methods, learning from uncertain experts, active learning, and Bayesian nonparametric models. She received an NSF Career award in 2004. She has served or is serving as Secretary for the International Machine Learning Society, associate editor/editorial board member for the Journal of Machine Learning Research, Machine Learning journal, IEEE Transactions on Pattern Analysis and Machine Intelligence, organizing and or technical program committee member for premier conferences in machine learning and data mining (ICML, ACM SIGKDD, AAAI, IJCAI, UAI, AISTATS, SIAM SDM), and program co-chair for SIAM SDM 2013 and ICML 2018.

• ### Alessandro Epasto (Google AI, NYC)

Alessandro Epasto is a research scientist at Google NYC on the graph mining team, part of the Google NYC Algorithms and Optimization team led by Vahab Mirrokni. Before joining Google in 2016 he was a postdoc at Brown University advised by Prof. Eli Upfal. He received a Ph.D. in computer science from Sapienza University of Rome, where he was advised by Prof. Alessandro Panconesi and supported by the Google European Doctoral Fellowship. His research interests include algorithmic problems in large-scale data analysis.

• ### Quanquan Gu (University of California, Los Angeles)

Quanquan Gu is an Assistant Professor of Computer Science at UCLA. His current research is in the area of artificial intelligence and machine learning, with a focus on developing and analyzing nonconvex optimization algorithms for machine learning to understand large-scale, dynamic, complex and heterogeneous data, and building the theoretical foundations of deep learning. He received his Ph.D. degree in Computer Science from the University of Illinois at Urbana-Champaign. He is a recipient of the Yahoo! Academic Career Enhancement Award, NSF CAREER Award, Simons Berkeley Research Fellowship, Adobe Research Award, and Salesforce Deep Learning Research Award.

• ### Jakub Łącki (Google Research, NYC)

Jakub Łącki is a Research Scientist at Google New York. His interests include distributed algorithms, clustering and dynamic graph algorithms. He received his PhD from the University of Warsaw in 2015, advised by Piotr Sankowski. Before joining Google he was a postdoctoral researcher at Sapienza University of Rome, working with Stefano Leonardi.

• ### Haipeng Luo (University of Southern California)

Haipeng Luo is an assistant professor in the CS department of the University of Southern California. He obtained his PhD from Princeton University in 2016 and spent a year as a post-doc researcher at Microsoft Research, NYC afterwards. His research interest is in theoretical and applied machine learning, with a focus on online learning, bandit algorithms, and interactive machine learning. He has received several awards over the years, including best paper and best student paper awards at ICML, NeurIPS, and COLT.

• ### Varun Kanade (University of Oxford)

Varun Kanade is an Associate Professor in Computer Science at the University of Oxford, and a Turing Fellow at the Alan Turing Institute in London. He was awarded a Ph.D. from Harvard University in 2012. Subsequently, he was a postdoctoral fellow supported by the FSMP at ENS, Paris and by the Simons Foundation at UC Berkeley. His research interests lie in theoretical aspects of machine learning, and he has recently been working on various problems related to clustering, randomized graph algorithms and random graphs, and supervised learning.

• ### Zohar Karnin (Amazon Web Services, NYC)

Zohar Karnin is a Principal Scientist in Amazon AWS AI. His research interests are in the area of large scale and online machine learning algorithms, and computation with limited resources. He is the technical lead for the science efforts of several components in Amazon SageMaker, including the development of built-in machine learning algorithms.

• ### Michael Mahoney (University of California, Berkeley)

Michael Mahoney is in the Department of Statistics at the University of California, Berkeley. His research focuses on algorithmic and statistical aspects of modern large-scale data analysis and large-scale machine learning. Specific topics include randomized matrix algorithms and randomized numerical linear algebra, geometric network analysis tools for structure extraction in large informatics graphs, scalable implicit regularization methods, and applications in genetics, astronomy, medical imaging, social network analysis, and internet data analysis. He received his PhD from Yale University with a dissertation in computational statistical mechanics, and he has worked and taught at Yale University in the mathematics department, at Yahoo Research, and at Stanford University in the mathematics department. He served on the National Advisory Committee of the Statistical and Applied Mathematical Sciences Institute (SAMSI); he was on the National Research Council's Committee on the Analysis of Massive Data; he runs the biennial MMDS Workshops on Algorithms for Modern Massive Data Sets; and he spent fall 2013 at UC Berkeley co-organizing the Simons Foundation's program on the Theoretical Foundations of Big Data Analysis.

• ### Yury Makarychev (Toyota Technological Institute at Chicago)

Yury Makarychev is an associate professor at the Toyota Technological Institute at Chicago. His primary research interests include combinatorial optimization, beyond worst-case analysis, approximation algorithms, and metric geometry. Yury received his undergraduate degree in mathematics from Moscow State University and PhD in computer science from Princeton University. Before joining TTIC, he had spent two years at Microsoft Research as a postdoctoral fellow.

• ### Andrew McGregor (University of Massachusetts, Amherst)

Andrew McGregor is an Associate Professor at the University of Massachusetts, Amherst. He received a B.A. degree and the Certificate of Advanced Study in Mathematics from the University of Cambridge and a Ph.D. from the University of Pennsylvania. He also spent a couple of years as a post-doc at UC San Diego and Microsoft Research SVC. He is interested in many areas of theoretical computer science and specializes in data stream algorithms, linear sketching, and communication complexity. He received the NSF CAREER Award in 2010.

• ### Rad Niazadeh (Stanford University)

Rad Niazadeh is a Motwani postdoctoral fellow at Stanford University, Department of Computer Science, where he is hosted by Amin Saberi, Moses Charikar and Tim Roughgarden. He is joining Chicago Booth as an assistant professor in July 2020, and Google Research NYC as a research visitor in December 2019. Prior to Stanford, he obtained his Ph.D. in Computer Science from Cornell University. His research interests are broadly at the intersection of algorithms, game theory and optimization, with a focus on applications in market design, operations research, and machine learning.

• ### Krzysztof Onak (IBM T.J. Watson Research Center)

Krzysztof Onak is a computer scientist who works at the IBM T.J. Watson Research Center near Yorktown Heights, NY. He is interested in computation with limited resources, including sublinear-time algorithms, streaming algorithms, and algorithms for modern parallel systems. Krzysztof received his Master's degree from the University of Warsaw and his PhD from the Massachusetts Institute of Technology. Before joining IBM, he was a Simons Postdoctoral Fellow at Carnegie Mellon University.

• ### Jean Pouget-Abadie (Google Research, NYC)

Jean Pouget-Abadie is a research scientist at Google NYC on the Algorithms and Optimization team led by Vahab Mirrokni. He holds a PhD in Computer Science from Harvard University, advised by Edoardo Airoldi and Salil Vadhan. His recent research interests include causal inference and experimental design.

• ### Andrej Risteski (Carnegie Mellon University)

Andrej Risteski is an Assistant Professor in the Machine Learning Department at Carnegie Mellon University. His work lies in the intersection of machine learning and theoretical computer science. The broad goal of his research is theoretically understanding statistical and algorithmic phenomena and problems arising in modern machine learning. He was previously a Norbert Wiener Fellow and Applied Mathematics Instructor at MIT, and prior to that received his Ph.D. at Princeton University.

• ### Ludwig Schmidt (University of California, Berkeley)

Ludwig Schmidt is a postdoctoral researcher at UC Berkeley working with Moritz Hardt, Ben Recht, and Martin Wainwright. Ludwig’s research interests revolve around the foundations of machine learning, often with the goal of making machine learning more reliable. Before Berkeley, Ludwig completed his PhD at MIT under the supervision of Piotr Indyk. Ludwig received a Google PhD fellowship, a Microsoft Simons fellowship, a best paper award at the International Conference on Machine Learning (ICML), and the Sprowls dissertation award from MIT.

• ### Aravindan Vijayaraghavan (Northwestern University)

Aravindan Vijayaraghavan is an Assistant Professor of Computer Science at Northwestern University. After obtaining his PhD in Computer Science from Princeton University in 2012, he was a Simons Postdoctoral Fellow at Carnegie Mellon University. He also spent a year as a postdoc at the Courant Institute, with the Simons Collaboration on Algorithms & Geometry. His research interests are in designing efficient algorithms for problems in Combinatorial Optimization and Machine Learning, and in using paradigms that go Beyond Worst-Case Analysis to obtain good algorithmic guarantees.

• ### Joshua Wang (Google Research, Mountain View)

Joshua Wang is a Research Scientist at Google. Prior to that, he did a PhD at Stanford University, advised by Tim Roughgarden. His current research interests include submodular functions, graph algorithms, and algorithmic game theory. Joshua has won a SPAA 2016 Best Paper Award as well as an ESA 2014 Best Student Paper Award. His work has also been recognized with oral presentations at NeurIPS 2017 and 2018.

• ### David Woodruff (Carnegie Mellon University)

David Woodruff is an associate professor at Carnegie Mellon University, where he has been since 2017. Prior to that he was a researcher at the IBM Almaden Research Center from August 2007 to August 2017, which he joined after completing his Ph.D. at MIT in theoretical computer science. His research interests include communication complexity, data stream algorithms, machine learning, numerical linear algebra, sketching, and sparse recovery. He is the recipient of the 2014 Presburger Award and Best Paper Awards at STOC 2013 and PODS 2010. At IBM he was a member of the Academy of Technology and a Master Inventor.

# Organizers

• ### Vahab Mirrokni (Google Research, NYC)

Vahab Mirrokni is a Principal Research Scientist, heading the algorithms research group at Google Research, New York. He received his PhD from MIT in 2005 and his B.Sc. from Sharif University of Technology in 1999. He joined Google Research in New York in 2008, after spending a couple of years at Microsoft Research, MIT and Amazon.com. He is the co-winner of a SODA'05 best student paper award and ACM EC'08 best paper award. His research areas include algorithms, algorithmic game theory, combinatorial optimization, and social networks analysis. At Google, he is mainly working on algorithmic and economic problems related to search and online advertising. Recently he is working on online ad allocation problems, distributed algorithms for large-scale graph mining, and mechanism design for advertising exchanges.

• ### Grigory Yaroslavtsev (Indiana University, Bloomington)

Grigory Yaroslavtsev is an assistant professor of Computer Science at Indiana University and the founding director of the Center for Algorithms and Machine Learning at IU. Prior to that he was a postdoctoral fellow at the Warren Center for Network and Data Sciences at the University of Pennsylvania. He was previously a Postdoctoral Fellow in Mathematics at Brown University, ICERM. He received his Ph.D. in Theoretical Computer Science in 2014 from Pennsylvania State University and an M.Sc. in Applied Mathematics and Physics from the Academic University of the Russian Academy of Sciences in 2010. Grigory works on efficient algorithms for sparsification, summarization and testing properties of large data, including approximation, parallel and online algorithms, learning theory and property testing, communication and information complexity, and private data release.