Next week I am launching a new graduate class at Penn called “Algorithms for Big Data”. I am really excited to be teaching my first full class, which has been in the works for a while. Last semester I co-taught “Computational Learning Theory” with Michael Kearns, which was a great experience, but developing a new class on my own was even more entertaining. The homepage is here.
A tentative list of topics is available, and I would appreciate any comments or suggestions.
Among the related “big data theory” classes listed here, mine will be one of the most focused on distributed algorithms for clusters and Hadoop/MapReduce. For example, most of the streaming and dimensionality reduction techniques introduced in the first parts of the class serve primarily as an introduction to linear sketching, which works in the distributed context as well.
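To make the point about linear sketching concrete, here is a minimal sketch in Python, not taken from the course materials. It assumes a Count-Min sketch as the example of a linear sketch, and the class name `CountMinSketch`, the helper `sketch_shard`, and the toy shards are all hypothetical. Because every update is an addition to a counter, sketches built on separate machines with the same hash functions can simply be added together, and the result matches the sketch of the whole input.

```python
import random


class CountMinSketch:
    """Count-Min sketch over non-negative integer keys (illustrative only)."""

    PRIME = 2**61 - 1  # modulus for the hash functions

    def __init__(self, width=64, depth=4, seed=0):
        rng = random.Random(seed)  # same seed => same hash functions everywhere
        self.width = width
        self.depth = depth
        self.hashes = [(rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
                       for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def _bucket(self, row, key):
        a, b = self.hashes[row]
        return ((a * key + b) % self.PRIME) % self.width

    def update(self, key, count=1):
        # Each update adds to one counter per row, so the sketch is linear.
        for row in range(self.depth):
            self.table[row][self._bucket(row, key)] += count

    def merge(self, other):
        # Merging is entrywise addition; both sketches must share hash functions.
        assert self.hashes == other.hashes and self.width == other.width
        for row in range(self.depth):
            for col in range(self.width):
                self.table[row][col] += other.table[row][col]

    def estimate(self, key):
        # Never underestimates the true frequency of the key.
        return min(self.table[row][self._bucket(row, key)]
                   for row in range(self.depth))


def sketch_shard(shard, seed=0):
    """What a single machine would compute on its part of the data."""
    sketch = CountMinSketch(seed=seed)
    for key in shard:
        sketch.update(key)
    return sketch


# Three "machines", each holding one shard of the input.
shards = [[1, 2, 2, 7], [2, 7, 7, 7], [1, 1, 2, 9]]

merged = sketch_shard(shards[0])
for shard in shards[1:]:
    merged.merge(sketch_shard(shard))

# The same sketch computed centrally on the concatenated input.
central = sketch_shard([key for shard in shards for key in shard])

print(merged.table == central.table)
```

In a MapReduce-style job the per-shard sketches would play the role of mapper output and `merge` the role of the reducer, so only small tables travel over the network rather than the raw records.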
On a related note, multiple shout-outs to Google, which makes its Compute Engine available for a 2-month free trial with a $200 credit. The demos in my class will run on this platform, which I think is the friendliest among the competitors.
I was also really excited to find out that this year Google has launched the first large online distributed algorithm competition that I am aware of: Distributed Code Jam. I’ve been expecting this for a while, and now you can finally get your hands on a nice set of algorithmic problems in distributed computing. The solutions are executed on 100 machines in parallel, which makes it easy to process inputs of 10^9 records. Practice Round problems include some classic theoretical problems such as distributed majority computation and finding a path on a cycle.
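For a flavor of the distributed majority problem, here is a rough two-round sketch in Python. It is not the contest's reference solution and does not use the contest's message-passing library: machines are simulated as plain lists, and the function names `local_candidate`, `merge_candidates`, and `distributed_majority` are hypothetical. The approach assumed here is the Boyer-Moore voting algorithm run on each shard, followed by a weighted merge of the per-shard candidates and a second counting round to verify the result.

```python
def local_candidate(shard):
    """Boyer-Moore vote on one machine: returns (candidate, surplus count)."""
    candidate, count = None, 0
    for x in shard:
        if count == 0:
            candidate, count = x, 1
        elif x == candidate:
            count += 1
        else:
            count -= 1
    return candidate, count


def merge_candidates(summaries):
    """Combine per-machine (candidate, count) pairs as weighted votes."""
    candidate, count = None, 0
    for c, k in summaries:
        if k == 0:
            continue  # this shard cancelled out completely
        if count == 0:
            candidate, count = c, k
        elif c == candidate:
            count += k
        elif k > count:
            candidate, count = c, k - count
        else:
            count -= k
    return candidate


def distributed_majority(shards):
    """Return the strict majority element across all shards, or None."""
    # Round 1: every machine sends a constant-size summary to the master.
    candidate = merge_candidates(local_candidate(s) for s in shards)
    # Round 2: every machine counts occurrences of the proposed candidate.
    total = sum(len(s) for s in shards)
    votes = sum(s.count(candidate) for s in shards)
    return candidate if 2 * votes > total else None


print(distributed_majority([[1, 2, 1], [3, 1, 1], [2, 1, 4, 1]]))
print(distributed_majority([[5, 5, 5], [1, 2, 3]]))
```

Each machine sends only a constant amount of data per round, which is what makes this kind of solution fit the 100-machine setting. The second round is needed because the voting step alone proposes a candidate even when no strict majority exists.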