The worst-case analysis of algorithms provides accurate guidance about how to solve some but not all computational problems. In this talk, I'll survey my recent work in three application domains where novel algorithm analysis frameworks enable meaningful performance guarantees: revenue-maximizing auctions, graph partitioning, and social network analysis.
Tim Roughgarden received his Ph.D. from Cornell University in 2002 and joined the Stanford CS department in 2004, where he is currently an associate professor. His research interests are in theoretical computer science, especially its interfaces with game theory and networks. He wrote the book "Selfish Routing and the Price of Anarchy" (MIT Press, 2005) and co-edited the book "Algorithmic Game Theory" (Cambridge, 2007) with Nisan, Tardos, and Vazirani. His awards include the 2002 ACM Doctoral Dissertation Award (Honorable Mention), the 2003 Tucker Prize, a 2007 PECASE Award, the 2008 Shapley Lectureship of the Game Theory Society, the 2009 ACM Grace Murray Hopper Award, and the 2012 EATCS-SIGACT Gödel Prize. He has taught two online courses on algorithms delivered via Coursera.
Machine learning is a vibrant field with many rich techniques. However, these approaches are often heuristic: we cannot prove good bounds on either their performance or their running time except in limited settings. This talk will focus on designing algorithms whose performance can be analyzed rigorously. I will describe recent work on the nonnegative matrix factorization problem, which has important applications throughout machine learning (and theory). As is often the case, this problem is NP-hard when considered in full generality. However, we study a subcase called separable nonnegative matrix factorization that we believe is the right notion in various contexts. We give a polynomial-time algorithm for this problem, and leverage this algorithm to efficiently learn the topics in a Latent Dirichlet Allocation model (and beyond). This is an encouraging example where theory can lead to inherently new algorithms that have highly practical performance on real data sets. I will also describe recent and ongoing work on dictionary learning, which is another important direction in advancing this agenda.
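Under separability, the topics correspond to extreme points that themselves appear among the data points, which admits a simple greedy geometric procedure. The sketch below illustrates this idea in the style of successive projection; it is not the exact algorithm from the talk, and the function name is ours:

```python
def successive_projection(points, k):
    """Greedy anchor finding under separability (illustrative sketch).
    Assumes every data point is a convex combination of k "anchor"
    points that themselves occur in the data. Repeatedly pick the
    remaining point of largest norm (an extreme point of the hull)
    and project all points onto its orthogonal complement."""
    pts = [list(p) for p in points]
    anchors = []
    for _ in range(k):
        # the maximizer of a convex function (the norm) over the data
        # is attained at an extreme point, i.e. an anchor
        i = max(range(len(pts)), key=lambda j: sum(x * x for x in pts[j]))
        anchors.append(i)
        a = list(pts[i])                      # copy before mutating
        na = sum(x * x for x in a)
        if na == 0:
            break
        # project every point onto the orthogonal complement of a,
        # so the chosen anchor (and its direction) is eliminated
        for p in pts:
            coef = sum(x * y for x, y in zip(p, a)) / na
            for t in range(len(p)):
                p[t] -= coef * a[t]
    return anchors
```

On data satisfying the separability assumption, the returned indices are exactly the anchors; on real topic-modeling data the same template is applied to a normalized word co-occurrence matrix.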
This is based on joint work with Sanjeev Arora and Rong Ge.
Peer-to-Peer (P2P) networks are highly dynamic decentralized networks that experience heavy node churn, i.e., nodes join and leave the network continuously over time. We model such P2P systems as synchronous dynamic networks. In each round, an adversary can add and remove a large number of nodes, and also rewire the network subject to some connectivity constraints. We are interested in solving the problem of storing and searching data items despite such a high churn rate and network dynamism. In the course of solving this problem, we develop a random-walk-based sampling technique to sample nodes uniformly at random from the network. While it is well known that random walks are useful for sampling, their application in our context is nontrivial because churn and network dynamism can bias or even destroy them. Furthermore, we believe that this sampling technique may prove useful in a variety of other applications.
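On a static graph, a plain random walk is biased toward high-degree vertices; the standard correction is a Metropolis-Hastings filter. The sketch below shows only this static-graph idea (the paper's contribution is making such sampling survive churn and adversarial rewiring, which this sketch does not attempt):

```python
import random

def mh_uniform_sample(adj, start, steps, rng=random):
    """Metropolis-Hastings random walk whose stationary distribution is
    UNIFORM over the vertices of a static graph. A proposed move from u
    to v is accepted with probability min(1, deg(u)/deg(v)), which
    cancels the plain walk's bias toward high-degree vertices."""
    u = start
    for _ in range(steps):
        v = rng.choice(adj[u])                      # propose a neighbor
        if rng.random() < min(1.0, len(adj[u]) / len(adj[v])):
            u = v                                   # accept the move
    return u
```

Running the walk for longer than the mixing time makes the returned vertex nearly uniform regardless of the starting vertex.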
More details can be found in our paper “Storage and Search in Dynamic Peer-to-Peer Networks,” joint with Anisur Molla, Ehab Morsy, Gopal Pandurangan, Peter Robinson, and Eli Upfal. This paper was presented in SPAA 2013.
John Augustine is an assistant professor in the Department of Computer Science and Engineering at the Indian Institute of Technology Madras. He earned his PhD in theoretical computer science from the University of California at Irvine. Since his PhD, he has held several positions in both academia and industry. He is broadly interested in designing and analysing algorithms.
A tour in a graph is a connected walk that visits every vertex at least once, and returns to the starting vertex. We give improved approximation results for a tour with the minimum number of edges in regular graphs.
For cubic bipartite graphs, we provide a polynomial-time (9/7)-approximation algorithm for minimum tours. For connected d-regular graphs with n vertices, we provide a method that constructs a tour of length at most (1 + O(d^{-1/2}))n, improving the previous result of Vishnoi (2012), which demonstrated a tour of length at most (1 + O((log d)^{-1/2}))n.
The former result uses the cubic bipartite graph structure to find a cycle cover with large average length. The latter finds a spanning tree with few odd-degree vertices and augments it to a tour. Finding such spanning trees to augment is related to the linear arboricity conjecture of Akiyama, Exoo and Harary (1981), or alternatively, to a conjecture of Magnant and Martin (2009) regarding the path cover number of regular graphs.
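The role of odd-degree vertices in the latter result can be seen in a few lines: every odd-degree vertex of a spanning tree must be touched by the augmentation, so trees with few odd vertices yield short tours. The following sketch (our own illustration, using a plain BFS tree rather than the carefully chosen trees from the talk) just measures that quantity:

```python
from collections import deque

def spanning_tree_odd_vertices(adj, root=0):
    """Build a BFS spanning tree and count its odd-degree vertices.
    A tree with f odd-degree vertices needs roughly f/2 extra paths
    to fix all parities before an Euler tour exists, so the number of
    odd vertices controls the length of the resulting tour."""
    deg = {root: 0}
    q = deque([root])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in deg:          # v joins the tree via edge (u, v)
                deg[v] = 1
                deg[u] += 1
                q.append(v)
    return sum(1 for d in deg.values() if d % 2 == 1)
```

On a cycle, a BFS tree is a path with just two odd vertices; on a complete graph rooted at one vertex, every vertex of the star tree is odd, which is the bad case the talk's tree-selection avoids.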
Joint work with Uriel Feige (Weizmann Institute), Jeremy Karp (CMU) and Mohit Singh (Microsoft Research).
Multi-armed bandit problems are the predominant theoretical model of exploration-exploitation tradeoffs in machine learning, and they have countless applications ranging from medical trials, to communication networks, to Web search and advertising, to dynamic pricing. In many of these application domains the learner may be constrained by one or more supply (or budget) limits, in addition to the customary limitation on the time horizon. The literature lacks a general model encompassing these sorts of problems. We introduce such a model, called "bandits with knapsacks", that combines aspects of stochastic integer programming with online learning. A distinctive feature of our problem, in comparison to the existing regret-minimization literature, is that the optimal policy for a given latent distribution may significantly outperform the policy that plays the optimal fixed arm. Consequently, achieving sublinear regret in the bandits-with-knapsacks problem is significantly more challenging than in conventional bandit problems. We present two algorithms whose reward is close to the information-theoretic optimum: one is based on a novel "balanced exploration" paradigm, while the other is a primal-dual algorithm that uses multiplicative updates. Further, we prove that the regret achieved by both algorithms is optimal up to polylogarithmic factors.
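To make the model concrete, here is a toy simulation of the setting: each pull yields a random reward and consumes random supply, and play stops when the budget runs out. The index policy below (a UCB on the empirical reward-to-cost ratio) is a simple heuristic of our own for illustration, not the paper's balanced-exploration or primal-dual algorithm:

```python
import math
import random

def bwk_ucb(arms, budget, horizon, seed=0):
    """Toy bandits-with-knapsacks simulation: Bernoulli rewards and
    Bernoulli costs against a single budget, with a heuristic UCB on
    the reward-to-cost ratio (illustrative only)."""
    rng = random.Random(seed)
    k = len(arms)
    pulls, rew, cost = [0] * k, [0.0] * k, [0.0] * k
    spent = gained = 0.0
    for t in range(1, horizon + 1):
        if spent >= budget:              # supply exhausted: must stop
            break
        def index(i):
            if pulls[i] == 0:
                return float("inf")      # try every arm once
            bonus = math.sqrt(2.0 * math.log(t) / pulls[i])
            r_hat = rew[i] / pulls[i] + bonus               # optimistic reward
            c_hat = max(cost[i] / pulls[i] - bonus, 1e-9)   # optimistic cost
            return r_hat / c_hat
        i = max(range(k), key=index)
        r = 1.0 if rng.random() < arms[i][0] else 0.0
        c = 1.0 if rng.random() < arms[i][1] else 0.0
        pulls[i] += 1; rew[i] += r; cost[i] += c
        spent += c; gained += r
    return gained, spent, pulls
```

The example already exhibits the phenomenon from the abstract: a high-reward, high-cost arm can be worse than a cheaper arm (or a mix of arms) once the budget, rather than the horizon, is the binding constraint.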
Joint work with Robert Kleinberg and Ashwinkumar Badanidiyuru. Appeared at FOCS 2013.
A full version of the paper is available online.
This is joint work with Ken Shirley of AT&T Labs and Richard Cole of NYU.
The intuition that profit is optimized by maximizing marginal revenue is a guiding principle in microeconomics. In the classical auction theory for agents with linear utility and single-dimensional preferences, Bulow and Roberts (1989) show that the optimal auction of Myerson (1981) is in fact optimizing marginal revenue. In particular Myerson's virtual values are exactly the derivative of an appropriate revenue curve.
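The Bulow-Roberts identity is easy to state concretely for a single bidder. A minimal sketch, assuming the ideal setting above (single-dimensional, linear utility, a regular value distribution); the helper names are ours:

```python
def virtual_value(v, F, f):
    """Myerson's virtual value phi(v) = v - (1 - F(v)) / f(v): the
    marginal revenue at value v, where F is the CDF and f the density
    of the bidder's value distribution. The optimal reserve price r
    solves phi(r) = 0."""
    return v - (1.0 - F(v)) / f(v)

# Uniform[0,1] values: F(v) = v and f(v) = 1, so phi(v) = 2v - 1,
# and the optimal reserve is r = 1/2.
uniform_phi = lambda v: virtual_value(v, lambda x: x, lambda x: 1.0)
```

Myerson's optimal auction allocates to the bidder with the highest nonnegative virtual value, i.e., it greedily maximizes marginal revenue.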
In this talk I will generalize the theory of marginal revenue maximization from the ideal setting above to agents with multi-dimensional and non-linear utility, a main challenge area for mechanism design. I will show that marginal revenue maximization, though no longer optimal, continues to be approximately optimal quite generally. Moreover, the result gives a reduction from auction design for multi-dimensional, non-linear agents to auction design for single-dimensional, linear agents, the most studied setting in auction theory. This approximate reduction implies that many known results extend, approximately and automatically.
Joint work with Saeed Alaei, Hu Fu, and Nima Haghpanah; appeared in FOCS 2013.
In this talk, I will describe some recent algorithmic results related to computing optimal flows and cuts in surface embedded graphs. These results are motivated by a desire to bring our understanding of efficient planar flow and cut algorithms to more general classes of graphs.
I will begin by sketching an algorithm to compute a global minimum cut in an embedded genus-g undirected graph. The algorithm runs in time g^{O(g)} n log log n. For constant g, this running time is a polylogarithmic improvement over the best known running time for sparse graphs, due to Karger. Unlike Karger's algorithm, the one I will describe is deterministic.
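For context, here is the randomized baseline mentioned above: Karger's contraction algorithm, which contracts random edges until two supervertices remain and reports the crossing edges as a cut. This is a minimal sketch of the classical algorithm (union-find bookkeeping, repeated trials), not the deterministic surface-embedded algorithm from the talk:

```python
import random

def karger_min_cut(n, edges, trials=60, seed=0):
    """Karger's randomized contraction algorithm for global min cut.
    Each trial contracts uniformly random edges until two components
    remain; the minimum crossing-edge count over many independent
    trials equals the min cut with high probability."""
    rng = random.Random(seed)

    def trial():
        parent = list(range(n))
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]   # path halving
                x = parent[x]
            return x
        comps = n
        while comps > 2:
            u, v = edges[rng.randrange(len(edges))]
            ru, rv = find(u), find(v)
            if ru != rv:                        # contract edge (u, v)
                parent[ru] = rv
                comps -= 1
        # count edges crossing the two remaining supervertices
        return sum(1 for u, v in edges if find(u) != find(v))

    return min(trial() for _ in range(trials))
```

Every trial outputs a valid cut, so the returned value never underestimates the min cut; the repetitions drive down the chance of overestimating it.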
Second, I will describe an algorithm for counting minimum s,t-cuts in an embedded genus-g directed graph. This problem has applications in image segmentation, and although it is #P-complete in general, the algorithm I will describe runs in 2^{O(g)} n^2 time. The algorithm can also be used to aid in sampling minimum s,t-cuts uniformly at random.
When the genus is constant, both algorithms I will describe match the running time of the best known planar graph algorithms for their respective problems. I will conclude the talk by discussing some future work and open problems in this area of research.
This talk is based on work done with Erin W. Chambers, Jeff Erickson, and Amir Nayyeri.
We present a new data structure for the c-approximate near neighbor problem (ANN) in Euclidean space. For n points in R^d, our algorithm achieves O(d n^ρ) query time and O(n^{1+ρ} + nd) space, where ρ ≤ 7/(8c^2) + O(1/c^3) + o(1). This is the first improvement over the result by Andoni and Indyk (FOCS 2006) and the first data structure that bypasses a locality-sensitive hashing lower bound proved by O'Donnell, Wu and Zhou (ICS 2011). By a standard reduction we obtain a data structure for the Hamming space and ℓ_1 norm with ρ ≤ 7/(8c) + O(1/c^{3/2}) + o(1), which is the first improvement over the result of Indyk and Motwani (STOC 1998).
Our data structure is able to bypass the locality-sensitive hashing barrier by using data-dependent hash functions (previous work used functions that are oblivious to the data).
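To see what "data-oblivious" means here, consider the textbook bit-sampling LSH for Hamming space in the Indyk-Motwani style: the hash functions are random coordinate projections chosen without looking at the data, which is exactly the regime covered by the lower bound. A minimal sketch (function names are ours):

```python
import random

def build_lsh(points, k=4, L=8, seed=0):
    """Bit-sampling LSH for Hamming space: each of L tables hashes a
    point to k randomly chosen coordinates. The hash functions are
    DATA-OBLIVIOUS -- they are drawn before seeing the points."""
    rng = random.Random(seed)
    d = len(points[0])
    tables = []
    for _ in range(L):
        coords = tuple(rng.randrange(d) for _ in range(k))
        buckets = {}
        for idx, p in enumerate(points):
            buckets.setdefault(tuple(p[c] for c in coords), []).append(idx)
        tables.append((coords, buckets))
    return tables

def nearest_candidate(tables, points, q):
    """Collect every point colliding with q in some table and return
    the candidate closest to q in Hamming distance (None if no hit)."""
    cand = set()
    for coords, buckets in tables:
        cand.update(buckets.get(tuple(q[c] for c in coords), []))
    hamming = lambda a, b: sum(x != y for x, y in zip(a, b))
    return min(cand, key=lambda i: hamming(points[i], q), default=None)
```

The data-dependent scheme of the paper instead partitions the point set first and tailors the hash functions to each part, which is what lets it beat the oblivious bound.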
Joint work with Alexandr Andoni, Piotr Indyk and Nguyễn Lê Huy. A short summary of the paper can be found at http://mittheory.wordpress.com/2013/11/22/beyond-lsh/.
In 1976, Christofides gave an approximation algorithm for the traveling salesman problem (TSP) with metric costs that was guaranteed to find a tour that was no more than 3/2 times the length of the shortest tour visiting a given set of n cities; it remains an open problem to give a polynomial-time algorithm with a better performance guarantee. There have been a number of recent results that yield improvements for significant special cases, and for related problems. In this talk, we shall present an approximation algorithm for the s-t path TSP with metric costs, which is guaranteed to find a solution of cost within a factor of the golden ratio of optimal in polynomial time; in this variant, in addition to the pairwise distances among n points, there are two prespecified endpoints s and t, and the problem is to find a shortest Hamiltonian path between s and t. Hoogeveen showed that the natural variant of Christofides' algorithm for this problem is a 5/3-approximation algorithm, and this asymptotically tight bound has been the best approximation ratio known until now. We modify this algorithm so that it chooses the initial spanning tree based on an optimal solution to a natural linear programming relaxation, rather than a minimum spanning tree; we prove this simple but crucial modification leads to an improved approximation ratio, surpassing the 20-year-old barrier set by the natural Christofides' algorithm variant.
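For reference, the classical Christofides algorithm that both results build on has three steps: a minimum spanning tree, a minimum-weight perfect matching on the tree's odd-degree vertices, and an Euler tour shortcut to a Hamiltonian cycle. The sketch below implements exactly that classical template on small complete metric instances (brute-force matching, so tiny n only); the An-Kleinberg-Shmoys improvement discussed in the talk instead draws the initial tree from an LP relaxation, which this sketch does not do:

```python
def christofides_small(dist):
    """Classic Christofides 3/2-approximation for metric TSP, with a
    brute-force matching step (small instances only). dist is a full
    symmetric matrix of metric distances."""
    n = len(dist)
    # 1. Minimum spanning tree (Prim).
    in_tree, adj = {0}, {i: [] for i in range(n)}
    while len(in_tree) < n:
        u, v = min(((a, b) for a in in_tree for b in range(n) if b not in in_tree),
                   key=lambda e: dist[e[0]][e[1]])
        adj[u].append(v); adj[v].append(u); in_tree.add(v)
    # 2. Min-weight perfect matching on odd-degree vertices (brute force).
    odd = [x for x in range(n) if len(adj[x]) % 2 == 1]
    def matchings(vs):
        if not vs:
            yield []
            return
        a, rest = vs[0], vs[1:]
        for i in range(len(rest)):
            for m in matchings(rest[:i] + rest[i + 1:]):
                yield [(a, rest[i])] + m
    for a, b in min(matchings(odd), key=lambda m: sum(dist[x][y] for x, y in m)):
        adj[a].append(b); adj[b].append(a)
    # 3. All degrees are now even: Euler circuit (Hierholzer), then
    #    shortcut repeated vertices using the triangle inequality.
    stack, walk = [0], []
    rem = {x: list(adj[x]) for x in range(n)}
    while stack:
        u = stack[-1]
        if rem[u]:
            v = rem[u].pop()
            rem[v].remove(u)
            stack.append(v)
        else:
            walk.append(stack.pop())
    seen, tour = set(), []
    for x in walk:
        if x not in seen:
            seen.add(x); tour.append(x)
    return tour + [tour[0]]
```

Hoogeveen's s-t path variant follows the same outline but fixes the endpoint parities, which is where the 5/3 bound (and the talk's golden-ratio improvement) comes in.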
This is joint work with Hyung-Chan An and Robert Kleinberg.
The amount of data available and requiring analysis has grown at an astonishing rate in recent years. To cope with this deluge of information, it is fundamental to design new algorithms that analyze data efficiently. In this talk, we describe our effort to build a large-scale graph-mining library. We first describe the general framework and a few relevant problems that we are trying to solve. We then describe in detail two results: one on local algorithms for clustering, and one on learning from noisy feedback in a crowdsourcing system.
To solve the first problem, we develop a random-walk-based method with better theoretical guarantees than all previous work, both in terms of the clustering accuracy and the conductance of the output set. We also prove that our analysis is tight, and perform empirical evaluations to support our theory on both synthetic and real data.
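The generic shape of such local clustering methods is a random-walk "sweep": diffuse probability mass from a seed vertex, order vertices by normalized mass, and return the prefix set of minimum conductance. The sketch below shows this standard template (our own simplified version, not the talk's algorithm, whose point is precisely to improve on this template's guarantees):

```python
def local_sweep_cut(adj, seed, steps=8):
    """Lazy random walk from a seed vertex followed by a conductance
    sweep. Returns (best prefix set, its conductance)."""
    deg = {v: len(adj[v]) for v in adj}
    p = {v: 0.0 for v in adj}
    p[seed] = 1.0
    for _ in range(steps):
        q = {v: p[v] / 2.0 for v in adj}          # lazy: stay w.p. 1/2
        for u in adj:
            share = p[u] / (2.0 * deg[u])
            for v in adj[u]:
                q[v] += share                     # spread to neighbors
        p = q
    # sweep: order by degree-normalized mass, take best-conductance prefix
    order = sorted(adj, key=lambda v: p[v] / deg[v], reverse=True)
    vol_total = sum(deg.values())
    best, best_phi, S, vol, cut = None, float("inf"), set(), 0, 0
    for v in order[:-1]:
        S.add(v)
        vol += deg[v]
        cut += deg[v] - 2 * sum(1 for w in adj[v] if w in S)
        phi = cut / min(vol, vol_total - vol)     # conductance of S
        if phi < best_phi:
            best_phi, best = phi, set(S)
    return best, best_phi
```

On a "barbell" of two triangles joined by one edge, seeding the walk in one triangle recovers that triangle as the low-conductance local cluster.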
For the second problem, we introduce a new model of Gaussian mixtures that captures the setting where the data points correspond to ratings on a set of items provided by users with widely varying expertise. In this setting we study the single-item case and obtain efficient algorithms for the problem, complemented by near-matching lower bounds; we also obtain preliminary results for the multiple-item case.
Joint work with Flavio Chierichetti, Anirban Dasgupta, Ravi Kumar, Vahab Mirrokni and Zeyuan Allen Zhu.
In this talk I will discuss new approximation algorithms for multiple classes of classical problems in large graphs. Such algorithms can be used to remove redundancy in distributed systems, cut their costs, simplify the structure of a large network, cluster a set of data points, etc.
For general directed graphs I will focus on sparsification with distance- and connectivity-preserving guarantees (directed spanners and Steiner forests). For planar graphs I will discuss algorithms for a class of problems in which costs are associated with the nodes of the graph.
I will also discuss new algorithms for modern massively parallel computational models. I will show new sketching techniques for geometric graphs in Euclidean space, which allow us to compute an approximate minimum spanning tree and a minimum-cost bichromatic matching for huge graphs with a small amount of communication overhead.
This talk is based on multiple papers by the speaker in collaboration with Alexandr Andoni, Piotr Berman, Arnab Bhattacharyya, Krzysztof Onak, Aleksandar Nikolov, Konstantin Makarychev and Sofya Raskhodnikova.
Recent advances in the study of approximate inference algorithms in the machine learning community have led to provable lower bounds for certain families of counting problems. I'll explain how simple dynamic programming algorithms combined with "lifts" of graphs have led to these new lower bounds, and I'll discuss a number of interesting conjectures related to classic counting problems in computer science. No prior knowledge of graphical models and/or machine learning is required.
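The "simple dynamic programming" in question is the kind every tree admits; for instance, counting independent sets by tracking, at each vertex, the number of solutions with that vertex included versus excluded. The sketch below shows only this textbook tree DP; the lower-bound arguments from the talk transport DPs of this shape to lifts of a graph:

```python
def count_independent_sets(adj, root=0):
    """Count independent sets in a tree by dynamic programming.
    For each vertex v: inc[v] = #sets containing v (children must be
    excluded), exc[v] = #sets avoiding v (children are free)."""
    exc, inc = {}, {}
    def solve(v, parent):
        exc[v], inc[v] = 1, 1
        for w in adj[v]:
            if w != parent:
                solve(w, v)
                inc[v] *= exc[w]               # w cannot join v
                exc[v] *= exc[w] + inc[w]      # w may go either way
        return exc[v] + inc[v]
    return solve(root, None)
```

On the 3-vertex path this counts 5 sets (the empty set, three singletons, and the two endpoints together); on a star with three leaves it counts 9 (all subsets of leaves, plus the center alone).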