Modern Algorithms or The Brave New O of the Big N

The Role of Algorithms

Driven by booming enrollments in Computer Science, the first theoretical class that majors usually take, “Introduction to Algorithms”, is also experiencing unprecedented growth. At top schools this is evidenced by the fact that dozens of TAs are now employed to teach this class to several hundred students each year. There are multiple reasons why “Introduction to Algorithms” is one of the cornerstones of the Computer Science curriculum. One of the key roles it plays for many students is serving as their first introduction to fully rigorous analysis of the performance of computer programs. It teaches students how to use a rigorous mathematical lens to see the abstract structure behind data that they haven’t seen before.

Furthermore, it introduces a basic set of tools that can be used to process large amounts of data without making any assumptions about the process that generated it (worst-case analysis). Unlike other popular approaches to algorithm design such as machine learning and its newest incarnation called “deep learning”, the core algorithms curriculum gives solutions which use no training data and behave robustly under any changes in their input.

What Should the “Introduction to Algorithms” Look Like?

The question asked here is provocative and doesn’t have a hard and fast answer for multiple reasons. First, the answers vary quite a bit depending on who you ask. I will illustrate this point later by comparing the curricula used at some of the top schools. I will therefore stress up front that all ideas expressed below are a matter of my personal taste. From this point on, all opinions expressed in this post are about how I would teach introduction to algorithms rather than a suggestion that the reader do the same. In fact, I believe that in the U.S. and all over the world we are very lucky to have enough diversity to create curricula which look quite different from each other, thus giving students more options. While the most fundamental basics are roughly the same, the choice of advanced topics is often driven by the instructor’s research interests. For those interested in pursuing a research career this gives an opportunity to get involved in research early on.

Second, unlike more traditional subjects such as math and physics, the subject itself is rapidly evolving. My rough estimate from looking at the history would be that once every 10-15 years a significant part of the curriculum has to undergo a shake-up. This is another reason why having an instructor who is an active researcher in the area is critical for keeping up with developments in the field. Stale curricula can sometimes even create room for doubt about whether algorithms are still relevant or whether some other class can be used as a replacement. While there is hardly any doubt that rigorous analysis of algorithms will be relevant for many years to come, concerns such as the one above can be seen as a call to action.

Despite the two fundamental challenges discussed above, I believe that there are some guiding principles that can be used to determine the choice of topics for the introductory classes. The first one is simplicity and clarity of the underlying ideas. The second one is that the ideas have passed the test of time and have been implemented and used in a variety of software packages. This process serves as a “natural selection” for algorithmic ideas. A 10-15 year period is usually enough for the hype around hot topics to settle down. Finally, universality and robustness to the choice of a particular model or architecture also play an important role. This is probably the hardest principle to use since it involves predicting the future.

The Shoulders of Giants

Books

Now let’s briefly discuss the existing literature and curricula at the top schools.

Probably the most canonical textbook on algorithms is the MIT book known as CLRS (first published in 1990; the most recent, third edition came out in 2009). I got my copy of the first edition in high school back in 2003. At the time this book was quite a breakthrough compared to the previous generation of textbooks such as Aho-Hopcroft-Ullman’s and Sedgewick’s. I was heavily influenced by CLRS, subsequently using it to teach introductory classes for high school students in the mid-to-late 2000s.

The fact that almost all algorithms from CLRS can be implemented as is using general purpose programming languages (C++, Java, Python) also made it very popular in the programming community, including the competitive part of it. E.g. in the Russian summer camps for high school students CLRS formed the core of the B/C-level classes while A/B-level classes covered topics roughly similar to Erik Demaine’s Advanced Algorithms and Advanced Data Structures. I don’t have the data but would be very surprised if CLRS hasn’t sold more copies than any other algorithms textbook ever published. A recent testament to the popularity of CLRS is the fact that its first author Thomas Cormen is about as popular on Quora as the President of the United States (this fact probably says more about the kind of people who are active on Quora though).

Over the years multiple alternatives have emerged, among which I would like to mention two: the Berkeley-UCSD “Algorithms” by Dasgupta, Papadimitriou and Vazirani and Cornell’s “Algorithm Design” by Kleinberg and Tardos. Both books were published in 2005-06 and to the best of my knowledge second editions aren’t available yet. One of the main differences between these newer books and CLRS is their concise style and focus on high-level ideas rather than low-level details. However, all the books discussed above are starting to show their age. A litmus test is the fact that they either don’t mention Chernoff bounds at all or mention them only as an exercise or in one of the last chapters where they are barely used. I would expect a modern algorithms textbook to introduce concentration bounds early on and then use them heavily throughout the course.

I haven’t seen any strong newcomers among textbooks recently, which might be partly due to the fact that books are somewhat passé these days. The only notable exception off the top of my head is the recent book “Foundations of Data Science” by Hopcroft and Kannan, which is very interesting but has a somewhat different goal, so I don’t see how an introductory algorithms class can be based solely on it.

Courses

In search of a modern algorithms curriculum let’s now turn to the classes taught recently at some of the schools in the U.S. whose class pages are publicly available. At some schools different instructors teach the class in different years, so here I just picked one at random to save space. Certain topics appear consistently in all of these classes (sorting/median, hashing, dynamic programming, greedy algorithms, cuts and flows, BFS/DFS, union-find, MST, FFT, shortest paths, etc.) so I will focus on the differences which make these classes unique.

At MIT the “Design and Analysis of Algorithms” class is taught by Erik Demaine. Here is the most recent page. Erik is one of the best living experts on data structures. Not surprisingly, his class is a little heavy on cool data structures, including van Emde Boas trees, skip lists and range trees, which aren't usually present in a typical algorithms curriculum.
At CMU the “Algorithms” class was recently taught by Anupam Gupta and Danny Sleator, page here. This is a very interesting class where the instructors made a great effort to include modern topics such as linear programming, zero-sum games, streaming algorithms for big data, online algorithms, machine learning and gradient descent, together with some advanced data structures (splay trees and segment trees).
At Berkeley the class was recently taught by David Wagner, page here. The class is based on the DPV book and also serves as an introduction to Theoretical Computer Science (primarily because it discusses NP-completeness in detail, which is either not present in other classes or only mentioned briefly). The non-standard topics include an intro to machine learning, streaming algorithms (CountMin sketch) and PageRank.
At Cornell the class was recently taught by Eva Tardos and David Steurer, page here. Not surprisingly, the class is heavily KT-based. Among the unusual topics there is a lot of NP-hardness and computability (Turing machines, Church-Turing, undecidability, etc.) plus a large module on approximation algorithms. Modern topics include Nash equilibria, the best expert algorithm (multiplicative weights) and stable matching. Overall, this class has a strong bias towards foundations and approximation algorithms, with an AGT/learning spin to it.
At Stanford the class is taught this quarter by Virginia Williams, page here. This is a traditional CLRS-based class. Since Stanford is on a quarter system this class is shorter than the others. For more advanced algorithms courses at Stanford see CS168, CS261, CS267 and CS367. In particular, CS168, "The Modern Algorithmic Toolbox", is a great example of an advanced modernized algorithms class. According to private channels a modernized version of the algorithms curriculum is currently under construction at Stanford.
At Harvard the Data Structures and Algorithms class is taught by Jelani Nelson, page here. This is also a fairly traditional CLRS/KT-based class with a touch of linear programming and approximation algorithms.
At UIUC the class is taught by Jeff Erickson, whose lecture notes basically form a book. Non-standard topics include matroids and a heavy emphasis on randomized algorithms and amortized data structures.

The Brave New O of the Big N

Finally, I would like to suggest some ideas for a modern algorithms curriculum. As I mentioned in the motivational discussion above I believe that there are three fundamental guidelines: simplicity, implementability / test of time and potential for the future. None of the proposed topics is particularly new and all of them have been tested in advanced graduate level classes at different schools with accessible expositions available. Petabytes of data are getting crunched daily using these techniques and most of them have been implemented in a variety of software packages.

Randomized and approximation algorithms. Concentration bounds and tail inequalities early on. Examples of simple randomized and approximation algorithms that are actually used in practice, e.g. PageRank, Set Cover, etc. There is a lot of mileage in these basic algorithms.
Linear programming. LP basics/duality + approximation algorithms. Since this topic has already made it into a large number of the courses discussed above, I won't discuss it in much detail. An implicit goal achieved here is teaching students to think of LP solvers as a solution that is readily available for a wide class of problems.
Basics of machine learning and learning theory. Core learning ideas: perceptron, boosting, VC-dimension, multiplicative weights. In order to strengthen connections with machine learning one can emphasize clustering problems in other parts of the course (SVD, k-means, single-linkage clustering, nearest neighbor, etc.).
Linear sketching. This is probably the most recent topic (see these two surveys by Andrew McGregor and David Woodruff), but I strongly believe that by now the field is mature enough to be covered in the intro class. A good example of a linear sketch is the CountMin data structure. It is a stronger version of the Bloom filter, which is one of the most widely used data structures. The basic philosophy here is surprisingly powerful: CountMin allows one to maintain an approximate version of the most basic data structure, an array, using space which is independent of the array's size. Taking this further, linear sketching is a very powerful tool for designing algorithms for massive data regardless of the computational model. Whether it is streaming, MapReduce or take your pick, linear sketches are often the best solution known and/or provably optimal. They can also be implemented using basic linear algebraic primitives (see next bullet).
Algorithms based on linear algebraic primitives. I think that avoiding combinatorial magic is the key to making algorithms robust to the choice of the computational model and also more parallel (see next bullet). Whenever there is a solution which only uses basic linear algebra, it might be a good idea to prefer it over a combinatorial algorithm even if the latter is a little bit faster and/or easier to implement from scratch. Regardless of the computational model one can expect linear algebraic primitives to be already implemented there (e.g. MATLAB). A good example here is the All-Pairs-Shortest-Paths problem (see Uri Zwick's slides for details). Other examples are PageRank, applications of SVD and the linear sketching algorithms described above.
Parallel algorithms and data structures. When faced with multiple algorithmic alternatives it might be a good idea to pick one that is parallelizable. E.g. among Prim's, Kruskal's and Boruvka's algorithms for MST, Boruvka's is the winner here because it is the only one that is not inherently sequential. This fact is used in a variety of parallel MST algorithms. Linear sketching is again going to be handy here. Algorithms based on sorting and hash tables are good since these primitives are often very efficiently implemented in parallel systems (e.g. Hadoop, DHT).
Data structures and NP-completeness => Advanced classes. In order to make some room for the suggestions described above I would suggest reducing discussion of these topics to the bare minimum that is necessary in order to cover the core algorithmic ideas. I believe that each of them by itself deserves to be covered in a separate class. With hundreds of students enrolled these topics start to feel too specialized for an introductory algorithms class. NP-completeness can be combined with other topics in computational complexity and automata theory to make it a semester-long course. Data structures seem to go naturally with advanced algorithms as another course. To spice things up one can even add purely functional data structures. This may sound a little controversial but there seems to be a general tendency towards moving away from data structures among the books and curricula discussed above. As for NP-completeness, I think it depends on whether a separate class on the theory of computing is offered, which for any good school I really believe should be the case.
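
To make the linear sketching item above a bit more concrete, here is a minimal sketch of a CountMin-style counter in Python. It is an illustration under stated assumptions rather than a reference implementation: the class name CountMinSketch, the parameters width and depth, and the merge method are my own choices for this example, and salted BLAKE2b from the standard library is only a stand-in for the pairwise independent hash functions used in the usual analysis.

```python
import hashlib


class CountMinSketch:
    """Approximate counters stored in a depth x width table (illustrative only)."""

    def __init__(self, width: int, depth: int) -> None:
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # Assumption: salting BLAKE2b with the row number stands in for
        # choosing an independent hash function per row.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item: str, count: int = 1) -> None:
        # Every row receives the update in the bucket its hash selects.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        # With non-negative updates each row can only overcount,
        # so the minimum over rows is the tightest available estimate.
        return min(
            self.table[row][self._index(item, row)]
            for row in range(self.depth)
        )

    def merge(self, other: "CountMinSketch") -> "CountMinSketch":
        # Linearity: the sketch of two combined streams is the
        # entrywise sum of the two sketches.
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketches must have the same shape")
        result = CountMinSketch(self.width, self.depth)
        for r in range(self.depth):
            for c in range(self.width):
                result.table[r][c] = self.table[r][c] + other.table[r][c]
        return result
```

The table size depends only on width and depth, not on how many distinct items are inserted, which is the point made above about space being independent of the array's size. Under the standard analysis with pairwise independent hashing, the estimate never undercounts and overcounts by more than (e/width) times the total count with probability at most e^(-depth). The merge method is what makes the sketch convenient in streaming or MapReduce-style settings: each worker can sketch its own part of the data and the partial sketches can simply be added.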
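
The remark above about Boruvka's algorithm being the parallel-friendly choice for MST can be illustrated with a short Python sketch. This is a plain sequential simulation written for this post, not a parallel implementation: the function name boruvka_mst and the (weight, u, v) edge format are assumptions made for the example. The thing to notice is that within each round the search for the cheapest outgoing edge of one component does not depend on any other component, which is the step a parallel version would distribute.

```python
def boruvka_mst(num_vertices, edges):
    """Return (total_weight, chosen_edges) of a minimum spanning forest.

    edges is a list of (weight, u, v) with vertices numbered 0..num_vertices-1.
    """
    parent = list(range(num_vertices))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    chosen = []
    total = 0
    components = num_vertices
    while components > 1:
        # Round step 1: for every component, find its cheapest outgoing edge.
        # Each component's search is independent of the others.
        cheapest = {}
        for i, (w, u, v) in enumerate(edges):
            ru, rv = find(u), find(v)
            if ru == rv:
                continue  # edge lies inside a single component
            for r in (ru, rv):
                # Ties are broken by edge index so all components agree
                # on one consistent ordering of the edges.
                if r not in cheapest or (w, i) < (edges[cheapest[r]][0], cheapest[r]):
                    cheapest[r] = i
        if not cheapest:
            break  # remaining components are disconnected from each other
        # Round step 2: add all selected edges and merge the components.
        for i in set(cheapest.values()):
            w, u, v = edges[i]
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                chosen.append((u, v, w))
                total += w
                components -= 1
    return total, chosen
```

Every round at least halves the number of components that still have an outgoing edge, so there are at most logarithmically many rounds. Prim's and Kruskal's algorithms, by contrast, commit to edges one at a time in an order that depends on all previous choices.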

Grigory Yaroslavtsev