Let's talk numbers
Aug 21, 2013

We don’t talk numbers enough, in software business. You walk into a store to buy a gallon of milk and you know the price. You ask a software consultant how much the solution costs and he says, “it depends”. You cannot even get him to state assumptions and give a reasonable price.

At IITM, I heard a story about a German professor, and his style of evaluating the exam papers. In one of the questions, the student made a mistake in some order calculation. His answer was accurate but for the the trailing zero. The student got “0”.

Now, in general, if the student understood what he needs to do, and applied the formula, even if made some simple calculation mistake, he would get partial credit, in other courses. Getting zero marks was a shocker to him. But the German professor wanted was for the students to develop a good feel for the answer. For instance, if I were to calculate the weight (er…, mass) of a person and it came to 700 Kg, it should ring some warning bells, right?

In my job, lack of basic understanding about numbers holds back people, unfortunately. This ignorance shows up in places like sizing the hardware, designing the solutions, and creating budgets.

My inspiration for this post is a wonderful book “Programming pearls” that talks about these kind of back of the [envelope calculations](http://www.cs.bell- labs.com/cm/cs/pearls/bote.html).

Network Latencies

Suppose you are hosting a web application in San Francisco. Your users are in Delhi. How much minimum latency can you expect?

The distance is around 7500 miles or 12000 Km. Light can travel 186000 miles per second. So, it takes around .04 seconds. But light travels around 30% less speed in fiber than in vacuum. Also, it is not going to travel in a straight line – it may go through lot of segments. Besides, there are relays and other elements that delay the signal. All in all, we can say our signal will have an effective speed of say .15 seconds. Now, we need to a round trip, for any acknowledgement – so that makes it .3 seconds. A simple web page will have 15 components (images, fonts, CSS, and JS). That means around 4 round trips (Most browsers do four components at a time).

So, just in speed of light basis alone, your network is going to take up 1.2 seconds. That is going be the number you are going to add to your testing on your laptop.

Developing Java code

I met a seasoned developer of Java code. He has been developing systems, and well versed with optimization. We were reviewing some code and I noticed the he was optimizing for creating less number of objects.

How much time does it take to create an Object in Java? How much with initialization from String? How much time does it take to concatenate strings? To append to strings? Or, to parse an XML string of 1K size?

The usual answer from most people is “it depends”. Of course, it depends on the JVM, OS, machine, and so many other factors. My response was “pick your choice – your laptop, whatever JVM, and OS that you use, when the system is doing whatever it normally does”. It is surprising that most people have no idea about performance of their favorite language!

In fact, most of us who work with computers should have an idea about the following numbers.

Numbers every should know about computers

In a famous talk by Jeff Dean, he lays out the standard numbers that every engineer should know:

These may appear simple enough, but the implications of developing code are enormous. Consider some sample cases:

  1. What if you want to design a system for authentication for all the users in Google (ignore the security side of the question – only think of it as a lookup problem), how would you do it?
  2. If you want to design a system for a quote server for the stock market, how would you speed it up?
  3. You are developing an e-commerce for a medium retailer with 10000 SKU’s (Stock keeping Units). What options would you consider for speeding up the application?

Naturally, these kind of questions lead to some other numbers. What is the cost of the solution?

Cost of computing, storage, and network

There are two ways you can go about constructing infrastructure: the lexis- nexis way or the Google way. Remember that Lexis-Nexis is a search company that lets people search for legal, scientific, and specialty data. Their approach to build robust, fail-safe, humongous machines that serve the need. On the opposite spectrum is Google, which uses white boxed machines, with stripped down parts. I suppose it gives a new meaning to the phrase “Lean, mean machine”. (Incidentally, HDFS etc, take similar approach).

Our situation lies more towards Google. Let us look at the price of some machines.

  1. A two processor, 32 threaded blade server, with 256 GB is around $22000. You can run 32 average virtual machines on this beast. Even if you are going for high-performance machines, you can run at least 8 machines.
  2. If you are going for slightly lower end machines (because you are taking care of robustness in the architecture), you have other choices. For instance, you can do away with ECC memory etc. You can give up power management, KVM over IP etc, for simpler needs. [Example: setting up internal POC’s and labs.]. If that is the case, you can get 64 GB machine, with SSD of 512 GB and 5 TB storage at around $3000.

So, you have some rough numbers to play with, if you are constructing your own setup. What about the cost of cloud? There, pricing can be usage based, and can get complex. Let us take a 64GB machine with 8 hyper-threads. If we are running it most of the time, what is its cost?

Amazon’s cost tends to be slightly on the high side. You pay by the hour and it costs around $1.3 per hr. That is roughly equivalent to $1000 per month. If you know your usage patterns well, you may optimize it down to say, $500 per month.

Or, you could use one of my favorite hosting sites: OVH. There, a server like the above, would cost around $150 per month. Most of the others fall somewhere in between.

Now, do a small experiment in understanding the costs of the solutions: Say, to create an e-commerce site that keeps the entire catalogue cached in memory, what is the cost of the solution?

To truly understand the cost of solution, you also need to factor in people cost as well. That means, what is the effort to develop solutions, operate and support them.

TCO: Total cost of ownership

To understand the cost of operations and support, here is a rule of thumb. If you are automating tasks, reducing human intervention, you can assume that cost of design and development can range from $50 to $250. The variation is due to location, complexity of systems, effort estimation variations, choices of technology, and a few other details.

A few details worth noting:

  1. You can get a good idea of salary positions and skills by looking at the sites like dice.salary.com. Try indeed.com, for a more detailed look at, based on the history of postings.
  2. To get the loaded cost to the solution for labor, multiply the cost per hour by 2. For instance, if you need to pay $100 K salary, the hourly cost is going to be $100 (2000 hrs per year).
  3. By choosing a good offshore partner, you can get the ops cost as low as $15 to $30.
  4. The cost of good design and architecture – as they say, is priceless!

As per technology choices: which technology gets you low TCO? What kind of numbers can you use in your back of the envelope calculations?

That will be the topic for some other day.