Grigory Yaroslavtsev bio photo

Grigory Yaroslavtsev

Assistant Professor of Computer Science, George Mason University.

Email Twitter Facebook Google+ LinkedIn Github

In this blog post I want to suggest a simple reason why you should study your algorithms really well if you want to design algorithms that deal with big data. This reason comes from the way billings offered by cloud services work.

Maybe you remember yourself taking that algorithms class and thinking: “Who really cares if that algorithm uses a bit more time? Can't we just wait a little longer?”. Or “Ok, we can save some space here, but if it all fits into my RAM anyway then why bother?”. These are both great reasons not to care too much about efficiency of your algorithms if your data is small, fits into RAM and the running times aren't significant enough to matter anyway. So you would go on to program your favorite video game and not care about that professor talking about all that big-Oh nonsense. And in the short run you would be right. While you are developing a prototype of your favorite video game you shouldn't care. When I was working at a startup I remember myself learning the hard way that premature optimization is the root of all evil.


However, once your video game becomes successful and you get to deal with big data that has to be stored and processed in the cloud this reasoning starts to fall short. Let's say you developed Candy Crush Saga (valued at $7bn in 2014) and now you are interested in doing some data analytics about your >10 million active users. You are now considering outsourcing your data storage and computation to the cloud. Here is where you might want to learn why the design of space and time-efficient algorithms matters for the bottom line of your future business.

100x more efficient algorithms = 100x less money in billings

So that time and space your professor was talking about – what does it have to do with your spending on the cloud services? The answer is surprisingly simple – if you need 100x more time and space then your billing increases 100 times. Below I used the pricing calculator that comes with Google Compute Engine to see how the cost scales if I want to use 100/1000/10000 identical machines for a year.

I was myself surprised to find this out since I expected some economy of scale to kick in. In fact, sometimes it does but usually is quite negligible. Say, you can get an X% discount but that doesn't help much against linear scaling.