Online capacity (or budget) planning is an important component of any ad serving system and a central problem in online ad allocation. I will survey primal-based and dual-based techniques borrowed from the online stochastic matching literature and report theoretical approximation guarantees as well as practical evaluations of these algorithms on Google display ads systems serving billions of ads. Then, inspired by practical applications, I will discuss simultaneous approximation guarantees for both adversarial and stochastic models and present some recent theoretical results in this context. Finally, I will describe a multi-objective online approximation algorithm for maximizing the weight and the cardinality of the matching at the same time. I will conclude with several open problems.
Vahab Mirrokni is a Senior Staff Research Scientist and the manager of the algorithms research group at Google Research, New York. He joined Google after two years at Microsoft Research and a year at MIT and Amazon. He received his PhD from MIT in 2005 and his B.Sc. from Sharif University in 2001. Vahab is a co-winner of a SODA best student paper award and an ACM EC best paper award. His research interests include algorithmic game theory, large-scale and social network analysis, and approximation algorithms. At Google, he works mainly on algorithmic and economic problems related to search and online advertising. Recently, he has been working on online ad allocation problems, distributed algorithms for large-scale graph mining, and mechanism design for advertising exchanges.
We introduce and study a new type of learning problem for probability distributions over the Boolean hypercube. As in the standard PAC learning model, a learning problem in our framework is defined by a class C of Boolean functions over the hypercube; unlike the standard PAC model, however, the learning algorithm is given uniform random satisfying assignments of an unknown function f in C, and its goal is to output a high-accuracy approximation of the uniform distribution over the space of satisfying assignments of f. This distribution learning problem may be viewed as a demanding variant of standard Boolean function learning, where the learning algorithm only receives positive examples and --- more importantly --- must output a hypothesis function with small *multiplicative* error (i.e., small error relative to the size of f^{-1}(1)).
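Spelled out in our notation (which may differ from the authors' exact formalization): given independent uniform samples from f^{-1}(1) for an unknown f in C, the algorithm must output (a sampler for) a distribution D with

    d_TV(D, U_{f^{-1}(1)}) <= epsilon,

where U_{f^{-1}(1)} denotes the uniform distribution over f^{-1}(1). The corresponding multiplicative-error guarantee for function learning asks for a hypothesis h with

    |h^{-1}(1) symmetric-difference f^{-1}(1)| <= epsilon * |f^{-1}(1)|,

in contrast to the usual additive guarantee of epsilon * 2^n.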
As our main results, we show that the two most widely studied classes of Boolean functions in computational learning theory --- linear threshold functions and DNF formulas --- have efficient distribution learning algorithms in our model. Our algorithm for linear threshold functions runs in time poly(n, 1/epsilon), and our algorithm for polynomial-size DNF runs in quasipolynomial time. On the other hand, we also prove complementary hardness results showing that, under cryptographic assumptions, distribution learning is hard for monotone 2-CNFs, intersections of two halfspaces, and degree-2 PTFs. This shows that our algorithms are close to the limits of what is efficiently learnable in this model.
Joint work with Anindya De and Ilias Diakonikolas.
A common theme in many areas of theoretical computer science is that it is sometimes much easier to compute approximations to a function than to compute the function exactly, while sometimes approximation does not make things any easier. The study of this general topic in different settings has led to many deep and far-reaching consequences, including for example much recent progress on the P vs. NP problem and the development of sublinear algorithms.
In this talk, we explore how approximation affects the circuit complexity of boolean functions. Specifically, we restrict our attention to depth-2 circuits (i.e., DNFs and CNFs) and we ask how large such a circuit must be to agree with a given target function on 99% of all possible inputs. As we will see during the talk, the study of this question leads to surprising results and to interesting connections to the design of randomized algorithms, to information theory, to the analysis of boolean functions, and to covering codes.
Joint work with Li-Yang Tan.
I will discuss how sketching techniques from the data stream literature can be used to speed up well-studied algorithms for problems occurring in numerical linear algebra, such as least squares regression and approximate singular value decomposition. I will also discuss how they can be used to achieve very efficient algorithms for variants of these problems, such as robust and structured regression. In many cases the algorithms obtained using sketching are the only known ways to obtain asymptotically optimal algorithms.
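As a toy illustration of the sketch-and-solve paradigm (a minimal sketch using a dense Gaussian sketching matrix, with dimensions chosen only for readability; the talk's point is that much faster, sparser sketches suffice):

    import numpy as np

    def sketched_least_squares(A, b, k, seed=None):
        """Approximate argmin_x ||Ax - b||_2 by solving the smaller sketched
        problem min_x ||SAx - Sb||_2, where S is a k x n Gaussian sketch.
        For k sufficiently large relative to d, the result is a
        (1+eps)-approximate least-squares solution with high probability."""
        rng = np.random.default_rng(seed)
        n, d = A.shape
        S = rng.standard_normal((k, n)) / np.sqrt(k)   # dense Gaussian sketch
        x_hat, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
        return x_hat

    # Toy usage on an overdetermined system (n >> d).
    rng = np.random.default_rng(0)
    A = rng.standard_normal((20000, 30))
    b = rng.standard_normal(20000)
    x_opt, *_ = np.linalg.lstsq(A, b, rcond=None)
    x_hat = sketched_least_squares(A, b, k=600, seed=1)
    print(np.linalg.norm(A @ x_hat - b) / np.linalg.norm(A @ x_opt - b))  # close to 1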
In recent years, parallel computing has become ubiquitous: modern computation platforms, from smartphones and personal computers to network routers, large clusters, and clouds, all contain multiple processors. On these parallel computers, scheduling decisions --- what happens when and where --- have a large impact on performance. This talk will present recent research on scheduling algorithms that provide provable guarantees on the performance of the parallel programs they schedule. In particular, the talk will cover two main topics. (1) Data structures are an important component in designing sequential algorithms. However, using data structures within parallel algorithms while guaranteeing speedup is challenging due to contention between concurrent operations and the resulting slowdown; amortized data structures, in particular, are extremely difficult to use from within parallel algorithms. This talk will present BATCHER, a scheduling algorithm that allows parallel algorithms to use data structures while still guaranteeing speedup. (2) Streaming computations represent an important class of parallel applications. We consider the problem of scheduling streaming applications to minimize the number of cache misses incurred by the application, and show that for a large class of streaming applications this problem can be reduced to a graph partitioning problem.
We prove super-polynomial lower bounds on the size of linear programming relaxations for approximation versions of constraint satisfaction problems. We show that for these problems, polynomial-sized linear programs are exactly as powerful as programs arising from a constant number of rounds of the Sherali-Adams hierarchy.
In particular, any polynomial-sized linear program for Max Cut has an integrality gap of 1/2 and any such linear program for Max 3-Sat has an integrality gap of 7/8.
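For reference (this gloss is ours): for a maximization problem, the integrality gap of a relaxation is

    inf over instances I of OPT(I) / LP(I),

where LP(I) >= OPT(I) is the optimal value of the relaxation. An integrality gap of 1/2 for Max Cut therefore means there are instances on which the LP value is roughly twice the size of the largest cut, so no rounding of such an LP can certify more than a factor-1/2 approximation --- matching what a uniformly random assignment already achieves; the 7/8 bound for Max 3-Sat matches the random-assignment guarantee in the same way.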
Joint work with James Lee, Prasad Raghavendra, and David Steurer.
We present sublinear algorithms for approximately testing properties of real-valued data under Lp distance measures (for p = 1,2). Our algorithms allow one to distinguish datasets which have a certain property from datasets which are far from having it with respect to Lp distance. While the classical property testing framework developed under the Hamming distance has been studied extensively, testing under Lp distances has received little attention.
For applications involving noisy real-valued data, using Lp distances is natural because, unlike the Hamming distance, they allow one to suppress the distortion introduced by the noise. Moreover, we show that they also allow one to design simple and fast algorithms for classic problems, such as testing monotonicity, convexity, and Lipschitz conditions (also known as "sensitivity"). Our algorithms require minimal assumptions on the choice of the sampled data (either uniform or easily samplable random points suffice). We also show connections between our Lp-testing model and the standard framework of property testing under the Hamming distance. In particular, some of our results improve existing bounds for the Hamming distance.
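One natural formalization (our phrasing; the talk's definitions may differ in constants): view the data as a function f from a finite domain D to [0,1] and define

    d_p(f, g) = ( (1/|D|) * sum_{x in D} |f(x) - g(x)|^p )^{1/p}.

A function f is epsilon-far from a property P under Lp distance if d_p(f, g) > epsilon for every g satisfying P, and a tester must (with constant probability) accept functions in P and reject functions that are epsilon-far. For {0,1}-valued functions and p = 1 this coincides with the normalized Hamming distance, which is one way to see the connection to the standard model.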
Joint work with Piotr Berman and Sofya Raskhodnikova.
The amount of data being routinely processed by computers is exploding, and this puts a new demand on the algorithms community: do our algorithms make efficient use of their data? Many familiar problems can be reexamined in this light to yield significant and practical improvements. A modest asymptotic improvement --- needing merely gigabytes of data instead of terabytes --- effectively cuts costs by a factor of a thousand in a data-driven world.
In this talk we consider a fundamental hypothesis testing problem: given data from a probability distribution, was it drawn from a known distribution P, or from a distribution far from P? Our analysis reveals several distinct features of the problem that previous algorithms failed to take advantage of, and culminates by showing that these are essentially all the features an algorithm could take advantage of. Namely, we introduce what we call an "instance-by-instance optimal" algorithm and show that, when testing against an input distribution P, it performs within constant factors of the best possible algorithm customized for P.
Our analysis introduces a new way of using linear programming duality to automatically analyze a broad class of inequalities that generalizes the Cauchy-Schwarz inequality, Hölder's inequality, and the monotonicity of Lp norms.
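For context, a standard (non-instance-optimal) baseline for this problem is a chi-squared-style statistic; the snippet below is our simplified illustration of that baseline, with the acceptance threshold deliberately left unspecified, and is not the algorithm from the talk.

    import numpy as np

    def identity_test_statistic(samples, p):
        """Chi-squared-style identity-testing statistic, a standard baseline
        (not the instance-optimal tester from the talk).

        samples: observed symbols in {0, ..., len(p)-1}
        p:       hypothesized distribution (entries summing to 1)

        Returns sum_i ((N_i - m*p_i)^2 - N_i) / p_i. Under Poissonized
        sampling its expectation is m^2 * sum_i (q_i - p_i)^2 / p_i when the
        samples come from q, so it concentrates near 0 when q = p and grows
        when q is far from p."""
        p = np.asarray(p, dtype=float)
        m = len(samples)
        counts = np.bincount(samples, minlength=len(p)).astype(float)
        return np.sum(((counts - m * p) ** 2 - counts) / p)

    # Toy usage; any real tester must calibrate a threshold to the sample
    # size, the domain size, and the distance parameter.
    rng = np.random.default_rng(0)
    p = np.full(100, 0.01)
    from_p = rng.integers(0, 100, size=5000)
    q = np.r_[np.full(50, 0.015), np.full(50, 0.005)]
    from_q = rng.choice(100, size=5000, p=q)
    print(identity_test_statistic(from_p, p), identity_test_statistic(from_q, p))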
Joint work with Gregory Valiant.
Proofs of Retrievability (PoR) enable a client to store a number of file blocks with a cloud server so that later the server can prove possession of all the data in a very efficient manner (i.e., with constant computation and bandwidth). Although many efficient PoR schemes for static data have been constructed, only two dynamic PoR schemes exist, which have various disadvantages such as large client storage or practical inefficiency.
We propose the first dynamic PoR scheme that is both theoretically and practically efficient. Specifically: (i) all complexity measures involved are at most logarithmic in the number of outsourced blocks; (ii) our system implementation is very practical.
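To fix ideas, here is a toy challenge-response audit in the spirit of static PoR-like schemes (per-block MACs and random spot checks). It only illustrates the client/server interaction pattern; it omits the erasure coding needed for actual retrievability guarantees and is not the dynamic scheme proposed here. All names and parameters are ours.

    import hmac, hashlib, secrets

    BLOCK_SIZE = 4096

    def tag_blocks(key, blocks):
        # Client tags each block with a MAC bound to its index before outsourcing.
        return [hmac.new(key, str(i).encode() + blk, hashlib.sha256).digest()
                for i, blk in enumerate(blocks)]

    def server_respond(blocks, tags, challenge):
        # Server returns the challenged blocks together with their stored tags.
        return [(i, blocks[i], tags[i]) for i in challenge]

    def client_audit(key, response, challenge):
        # Client checks that the right indices came back and that every MAC
        # verifies; a server that has lost data fails a random spot check with
        # probability growing in the number of challenged blocks.
        if [i for i, _, _ in response] != list(challenge):
            return False
        return all(hmac.compare_digest(
                       hmac.new(key, str(i).encode() + blk, hashlib.sha256).digest(), tag)
                   for i, blk, tag in response)

    # Toy usage.
    key = secrets.token_bytes(32)
    blocks = [secrets.token_bytes(BLOCK_SIZE) for _ in range(64)]
    tags = tag_blocks(key, blocks)
    challenge = [secrets.randbelow(64) for _ in range(8)]
    print(client_audit(key, server_respond(blocks, tags, challenge), challenge))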
Joint work with Elaine Shi and Emil Stefanov; to appear at CCS 2013.
We discuss algorithms for the private analysis of network data. Such algorithms work on data sets that contain sensitive relationship information (for example, romantic ties). Their goal is to compute approximations to global statistics of the graph while protecting information specific to individuals. Our algorithms satisfy a rigorous notion of privacy, called node differential privacy. Intuitively, it means that an algorithm's output distribution does not change significantly when a node and all its adjacent edges are removed from a graph. We present several techniques for designing node differentially private algorithms. We also develop methodology for analyzing the accuracy of such algorithms on realistic networks. Our techniques are based on combinatorial analysis, network flow, and linear and convex programming. Based on joint work with Shiva Kasiviswanathan, Kobbi Nissim, and Adam Smith.
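For reference, node differential privacy can be stated as follows (standard formulation, our wording): an algorithm A is epsilon-node-differentially private if for every pair of graphs G, G' that differ in a single node and the edges adjacent to it, and every set S of possible outputs,

    Pr[A(G) in S] <= e^epsilon * Pr[A(G') in S].

This is a stronger requirement than the more commonly studied edge differential privacy, where G and G' may differ in only a single edge.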
The cutting plane approach to optimal matchings has been discussed by several authors over the past decades [Padberg-Rao, Grotschel-Holland, Lovasz-Plummer, Trick, Fischetti-Lodi], and whether it converges in polynomial time has been an open question. We give a cutting plane algorithm for the minimum-cost perfect matching problem using Edmonds' blossom inequalities as cuts and prove polynomial convergence of this algorithm.
Our main insight is an LP-based method for deciding which candidate cutting planes to retain and which to drop, with cuts added so as to keep the family of inequalities laminar. This leads to a sequence of intermediate linear programs with a linear number of constraints whose optima are half-integral and supported on a disjoint union of odd cycles and edges. This structural property of the optima is instrumental in finding violated blossom inequalities (cuts) in linear time. The number of cycles in the support of the half-integral optima acts as a potential function showing efficient convergence to an integral solution.
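For concreteness, the LP relaxation of minimum-cost perfect matching minimizes sum_e c_e x_e subject to

    x(delta(v)) = 1 for every vertex v,    x_e >= 0,

together with Edmonds' blossom inequalities

    x(delta(S)) >= 1 for every odd set S with |S| >= 3,

where delta(S) is the set of edges with exactly one endpoint in S. With all blossom inequalities included the LP is integral (Edmonds); the algorithm above adds only a laminar subfamily of them as cuts.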
Joint work with Laszlo Vegh and Santosh Vempala.
An "oblivious subspace embedding" (OSE) is a distribution over matrices S such that for any low-dimensional subspace V, with high probability over the choice of S, ||Sx||_2 approximately equals ||x||_2 (up to 1+eps multiplicative error) for all x in V simultaneously. Sarlos in 2006 pioneered the use of OSE's for speeding up algorithms for several numerical linear algebra problems. Problems that benefit from OSE's include: approximate least squares regression, low-rank approximation, l_p regression, approximating leverage scores, and constructing good preconditioners.
We give a class of OSE distributions we call "oblivious sparse norm-approximating projections" (OSNAP) that yield matrices S with few rows that are also extremely sparse, improving over recent work in this area by Clarkson and Woodruff (STOC '13). In particular, we show S can have O(d^2) rows and one non-zero entry per column, or even O(d^{1+gamma}) rows and poly(1/gamma) non-zero entries per column for any desired constant gamma>0. When applying the latter bound, for example, to the approximate least squares regression problem of finding x to minimize ||Ax - b||_2 up to a constant factor, where A is n x d with n >> d, we obtain an algorithm with running time O(nnz(A) + d^{omega + gamma}). Here nnz(A) is the number of non-zero entries in A, and omega is the exponent of square matrix multiplication.
Our main technical result is essentially a Bai-Yin type theorem in random matrix theory and is likely to be of independent interest: we show that for any U in R^{n x d} with orthonormal columns and a random sparse S with appropriately chosen entries and sufficiently many rows, all singular values of SU lie in the interval [1-eps, 1+eps] with good probability.
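As a concrete illustration (a minimal sketch with parameters chosen only for readability, not the constants from the paper), the snippet below builds a sparse embedding with s non-zero entries of value +/- 1/sqrt(s) per column --- the regime the singular-value bound above concerns --- and uses it for sketch-and-solve least squares; the name osnap_sketch and all dimensions are ours.

    import numpy as np
    from scipy.sparse import csc_matrix

    def osnap_sketch(n, k, s, seed=None):
        """Sparse embedding: each of the n columns of S receives s entries of
        value +/- 1/sqrt(s) in uniformly random rows (s = 1 is CountSketch-like;
        row collisions within a column are simply summed in this illustration)."""
        rng = np.random.default_rng(seed)
        rows = rng.integers(0, k, size=n * s)
        cols = np.repeat(np.arange(n), s)
        vals = rng.choice([-1.0, 1.0], size=n * s) / np.sqrt(s)
        return csc_matrix((vals, (rows, cols)), shape=(k, n))

    # Sketch-and-solve least squares: forming S @ A costs O(s * nnz(A)).
    rng = np.random.default_rng(0)
    n, d = 50000, 40
    A = rng.standard_normal((n, d))
    b = rng.standard_normal(n)
    S = osnap_sketch(n, k=2000, s=4, seed=1)
    SA = np.asarray(S @ A)
    Sb = np.asarray(S @ b).ravel()
    x_hat, *_ = np.linalg.lstsq(SA, Sb, rcond=None)
    x_opt, *_ = np.linalg.lstsq(A, b, rcond=None)
    print(np.linalg.norm(A @ x_hat - b) / np.linalg.norm(A @ x_opt - b))  # close to 1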
Joint work with Huy Lê Nguyễn (Princeton).
Our approaches are based on computing the importance of the rows, known as leverage scores, in an iterative manner. We show that alternating between computing a short matrix estimate and finding more accurate approximate leverage scores leads to a series of geometrically smaller instances. This gives an algorithm whose runtime is input-sparsity time plus an overhead comparable to the cost of solving a regression problem on the smaller approximation. Our results build upon the close connection between randomized matrix algorithms, iterative methods, and graph sparsification.
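For reference, the leverage score of row i of A is the squared norm of the i-th row of an orthonormal basis for the column space of A; the snippet below (ours) computes the scores exactly via a thin QR factorization, which is precisely the dense computation that the iterative, sketching-based approach above is designed to avoid on large sparse inputs.

    import numpy as np

    def leverage_scores(A):
        """Exact leverage scores: with a thin QR factorization A = QR, the
        i-th score is ||Q[i, :]||^2. The scores sum to rank(A) and govern the
        row-sampling probabilities used in randomized regression algorithms."""
        Q, _ = np.linalg.qr(A, mode='reduced')
        return np.sum(Q * Q, axis=1)

    rng = np.random.default_rng(0)
    A = rng.standard_normal((1000, 20))
    scores = leverage_scores(A)
    print(scores.sum())  # equals 20 = rank(A) up to floating-point error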