Is Scalability A Bogey Created By Hardware Industry?

A friend recently asked me the following question via Twitter:

I read the linked HBR article on Blockchain technologies and replied as follows:

Many mobile payments use cases I never thought could go digital due to question marks on scalability did go digital after re/demonetization. Since then, I’ve begun to wonder if the role of scalability is exaggerated. IMO, current scalability of Blockchain is adequate for many use cases mentioned in the article, especially when combined with sidechains.


I got the feeling that scalability was exaggerated after two experiences in the last 4-5 years.

DIGITAL PAYMENTS

There was a huge cash crunch in the wake of re/demonetization of high value currency notes in India in 2016-17. A plethora of digital payment apps like PayTM, PayZapp, et al emerged to tide over the shortage of currency notes. But the digerati claimed that digital payments were a pipe dream for India for a variety of reasons such as:

  1. Low literacy levels in the population
  2. Patchy network coverage in many parts of the country
  3. Slower speed of digital payments compared to cash

More in #CashlessIndia – Why Putting Cart Before Horse Will Work.

But, in my actual shopping visits at the time, I noticed that PayTM et al did work fine at push cart vendors, paan-beedi shops, kirana stores, streetside chai ka tapris, and so on (for the uninitiated, these are all informal retail outlets that dot any Indian city).

Only then did it dawn on me that transaction volume in most of these usage scenarios was quite low. There was no line of people waiting to pay. If the mobile payment was slow, that was okay, as there was nobody behind you rushing you to finish. In the worst case, even if the mobile payment failed once or twice, it didn’t matter – you simply retried it until it succeeded.

(The one place where mobile payments did NOT take off was supermarket checkout. There was a line behind you, speed did matter and you couldn’t keep retrying failed payments. Scalability was on the critical path for this use case.)
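The retry-until-success behavior described above is easy to picture in code. A minimal sketch, where `send_payment` is a hypothetical stand-in for a real payment API call that fails intermittently, the way it would on a patchy network:

```python
import random
import time

def send_payment(amount: float) -> bool:
    """Stand-in for a real payment API call; fails ~30% of the
    time to mimic a patchy network. Purely illustrative."""
    return random.random() > 0.3

def pay_with_retries(amount: float, max_attempts: int = 5) -> bool:
    """Retry the payment until it succeeds or attempts run out,
    backing off briefly between tries."""
    for attempt in range(1, max_attempts + 1):
        if send_payment(amount):
            return True
        # Linear backoff is fine when nobody is queuing behind you.
        time.sleep(0.1 * attempt)
    return False
```

Even with a 30% per-attempt failure rate, five attempts drive the overall failure probability down to roughly 0.3^5, or about a quarter of a percent – which is why retrying works at a chai ka tapri but not at a supermarket checkout. (A real payment system would also attach the same idempotency key to every retry so the merchant isn’t charged twice.)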

As an aside, the above logic won’t work in advanced markets like the USA, where people are money-rich and time-poor, there’s a coolness factor associated with being an early adopter of mobile payments, and it’s extremely uncool if a mobile payment fails. From The Failure of Coin:

I bought a Coin under the promise I would be able to carry just one credit card, so when my shiny new Coin arrived, I emptied my wallet of all other credit cards and headed straight to my local coffee spot. I was feeling pretty slick when the barista commented on how cool the card looked, but when it failed to swipe I quickly moved from cool early adopter to embarrassed dork unable to pay for a $5 latte. Over the next ten days I tried to use my Coin 54 times. It succeeded a mere 28. Even if that first use was the only failure and the other 53 times were smashing successes, the nervousness I felt every time I handed my Coin over to be swiped made the experience negative every single time. A credit card needs to work every time, if it doesn’t, it causes a huge amount of pain.

RIDESHARE

I’ve written about this in How Rideshare Startups Flipped A Switch To Create A Billion Dollar Business Overnight. Let me copy-paste the relevant passage from that blog post:

As we all know, unlike conventional yellow top taxis, rideshare cabs don’t have a meter. The driver’s smartphone app uses GPS to track the route and clock to track the time of the journey. At the destination, the driver swipes the End Trip button on the app. The app sends the route and travel time to the server over the air. The server sitting in the cloud computes the fare and communicates it back to the driver’s smartphone. As a result, the whole rideshare model depends on robust network coverage all over the city. We all crib about frequent call drops and Internet outages in many parts of any city. So, according to common wisdom, the aggregator cab model should break down frequently. But, it does not. Regardless of where you go, the driver’s app manages to access the network and find out the fare. In the very rare occasion that it does not, the driver can call the rideshare company’s call center and get the fare – this has happened to me only once in the 200+ rides I must have taken in the last 3-4 years. That’s, like, 99.5% success rate.
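The server-side fare math described in that passage is essentially distance-plus-time pricing. A minimal sketch, with entirely made-up tariff numbers (no rideshare company publishes theirs):

```python
def compute_fare(distance_km: float, duration_min: float,
                 base: float = 50.0, per_km: float = 12.0,
                 per_min: float = 1.5, minimum: float = 60.0) -> float:
    """Compute a fare from the GPS-tracked distance and clocked
    duration the driver's app sends up. All tariff figures here
    are hypothetical, for illustration only."""
    fare = base + per_km * distance_km + per_min * duration_min
    return round(max(fare, minimum), 2)
```

The point is how little the server has to do per trip: one small message up, one multiplication-heavy calculation, one small message down. That is why even spotty network coverage yields the 99.5%-ish success rate described above – one failure in 200+ rides.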

These two experiences forced me to think more deeply about this subject.

Soon, reflecting on my own background, I began wondering whether scalability was an obsession only among hardware professionals.


I spent a decade in the hardware industry before entering software. I learned about scalability during my stint in hardware, and it stayed on my mind through my subsequent two decades in software. However, I seemed to be the only one who cared.

Software performance has been an afterthought for most customers. Programmers in every software company I’ve worked for have blissfully gone on coding with little awareness of the infra on which their programs would run, or concern for speed, response time and other nonfunctional parameters. Because of my background, the software industry’s nonchalance towards scalability has always made me extremely anxious.

But life went on.

Sure enough, the $hit would hit the fan in almost every project after going live. The server would keel over due to poor scalability. But the dev team would almost always implement a fix for the performance problem in 2-3 days. Easy peasy.

I’ve always come out of those experiences feeling that my deep anxieties about scalability were for nought.


I said “almost always” because there was one project where things were not so easy, and scalability became a make-or-break issue.

We developed a new payment system for a Top 5 UK bank. The customer put a clause in the contract stipulating that our software must deliver a throughput of 2400TPS at go-live. The figure was guided by the estimated volume of the new method of payment over the first five years following its go-live. We were required to benchmark our solution and prove this figure as part of the acceptance criteria.
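A contractual TPS target like that typically comes from back-of-envelope sizing: estimated daily volume divided into seconds, multiplied by a peak-to-average factor. A sketch with purely hypothetical numbers (the bank's actual volume estimates weren't shared with me):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def peak_tps(daily_volume: int, peak_factor: float) -> float:
    """Convert an estimated daily payment volume into a peak TPS
    target: average TPS times a peak-to-average multiplier."""
    return daily_volume / SECONDS_PER_DAY * peak_factor

# Hypothetical sizing: ~41.5 million payments/day averages ~480 TPS;
# a 5x peak-to-average ratio lands near the 2400TPS in the contract.
```

The peak factor is where such estimates usually go wrong: both the projected volume and the assumed peakiness get padded for safety, and the padding compounds.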

The project manager who worked on the contract conveniently forgot about this clause and nobody else in the delivery organization had read the contract.

Many months later, we delivered the software.

Customer asked us for certification of 2400TPS. Needless to say, we didn’t have it.

Customer told us to get it.

We ran performance tests on the limited infra available within our own dev org. Not surprisingly, the software delivered only 360TPS. Since that was woefully short of the exit criteria, the customer rejected the software.
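A throughput benchmark of this kind boils down to pushing transactions through the system concurrently for a while and dividing completions by wall-clock time. A minimal sketch, with a stubbed-out transaction standing in for the real payment engine:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def process_transaction(txn_id: int) -> None:
    """Stand-in for one payment transaction; a real benchmark
    would call the system under test here."""
    time.sleep(0.001)  # pretend each transaction takes ~1 ms

def measure_tps(total_txns: int = 2000, workers: int = 16) -> float:
    """Fire transactions through a worker pool and report
    completed transactions per second of wall-clock time."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Consume the iterator so we wait for every transaction.
        list(pool.map(process_transaction, range(total_txns)))
    elapsed = time.perf_counter() - start
    return total_txns / elapsed
```

The measured number is as much a property of the hardware as of the software – which is exactly why the same code delivered 360TPS on our dev infra and far more on bigger iron, as the rest of this story shows.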

We then went to the customer’s preferred hardware vendor. The highest configuration of the server that they could provide us in India gave a throughput of 1200TPS. Again, not enough.

Finally, we had to visit the vendor’s high-end lab in France to conduct the benchmark. Thankfully, after a bit of tweaking, our software did pass the customer’s approval criteria.

While our scalability-insensitive movie had a happy ending, it also cost the company nearly $100K in out-of-pocket expenses.

It’s now over ten years since the said payment system went live. Its actual throughput has not crossed 500TPS even once. In hindsight, all that hullabaloo about 2400TPS was a waste of time and money.

I found out during this period that the customer manager who put the 2400TPS clause in the contract began his career in hardware!


I’m now strongly inclined to believe that scalability is a bogey created by the hardware industry and that it only afflicts customer and vendor folks (like me) who came to software via hardware.