Increasingly large datasets are being collected by governments, private companies, and research institutions. This motivates growing interest in the design and analysis of algorithms for the rigorous analysis of such data. In many cases, even linear space/time algorithms can be too slow. Therefore, there has been a growing body of work on "sub-linear algorithms", which use space or time sub-linear in the input size. In this course we will cover such algorithms, which can be used for the analysis of distributions, graphs, data streams, and high-dimensional real-valued data.
We focus on two types of sublinear algorithms: sub-linear time algorithms, and sketching/streaming algorithms. The former access only a small number of input elements and perform very limited computation; the answer is then inferred using approaches such as random sampling or property testing. The latter extract only a small amount of information about the input (a "sketch") and compute an answer based on it. Manifestations of this approach include data stream algorithms and randomized dimensionality reduction.
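As a small illustration of the sketching/streaming idea (an illustrative sketch, not part of the course materials), reservoir sampling maintains a uniform random sample of k elements from a stream while storing only those k elements, regardless of the stream's length:

```python
import random

def reservoir_sample(stream, k):
    """Maintain a uniform random sample of k items from a stream,
    using O(k) memory regardless of the stream's length."""
    sample = []
    for i, x in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k elements.
            sample.append(x)
        else:
            # Replace a random reservoir slot with probability k/(i+1),
            # which keeps every element equally likely to survive.
            j = random.randrange(i + 1)
            if j < k:
                sample[j] = x
    return sample
```

Here the k sampled elements act as the "sketch": the algorithm never revisits the input, yet the sample can be used to estimate properties of the full stream.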
This course is partially based on the Sublinear Algorithms class by Piotr Indyk and Ronitt Rubinfeld at MIT, the Big Data class by Jelani Nelson at Harvard, and the Sublinear Algorithms class by Sofya Raskhodnikova at Penn State. It also includes some recent work by the lecturer.