AWS Snowmobile: Because Data Is Still Transported Physically

AWS Snowmobile is an 18-wheeler truck that physically transports exabyte-scale data from customers’ premises to AWS Regions in a 45-foot shipping container.

For the uninitiated:

  • 1 Exabyte = 1,000 Petabytes = 1,000,000 Terabytes = 1,000,000,000 Gigabytes.
  • AWS Regions are separate geographic areas where Amazon houses its AWS infrastructure. They are distributed around the world so that customers can host their cloud infrastructure in the region closest to them. When I last checked, there were 34 AWS Regions worldwide, for example US East (N. Virginia) us-east-1, EU (Frankfurt) eu-central-1, and Asia Pacific (Mumbai) ap-south-1.

When I heard about AWS Snowmobile, I posted the following update on social media:

@gtm360: AWS Snowmobile, 45′ long ruggedized shipping container pulled by a semi-trailer truck, physically transports Exabyte-scale data from Customer to AWS. Uploading 1 EB (1M TB) of data via 1Gbps pipe will take 260 years. Snowmobile will do it in 6 months. https://aws.amazon.com/snowmobile/

Please see the exhibit on the right for a back-of-the-envelope calculation of the 260 years figure mentioned above.
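For readers who prefer code to an exhibit, here is a minimal sketch of the same back-of-the-envelope arithmetic. It assumes decimal units (1 EB = 10^18 bytes), a 1 Gbps link saturated around the clock, and zero protocol overhead; it also approximates "6 months" as 182.5 days. Under these assumptions the pipe needs roughly 254 years, consistent with the rounded 260-year figure, and matching Snowmobile's 6 months implies an effective rate of roughly 500 Gbps. Real-world overhead and downtime would only make the network transfer slower.

```python
# Back-of-the-envelope: how long to push 1 EB through a 1 Gbps pipe?
# Assumptions: decimal (SI) units, link saturated 24/7, no protocol overhead.

BYTES_PER_EB = 10**18              # 1 EB = 10^18 bytes (decimal)
BITS_PER_BYTE = 8
LINK_BPS = 10**9                   # 1 Gbps
SECONDS_PER_YEAR = 365 * 24 * 3600


def transfer_years(data_bytes: int, link_bps: int) -> float:
    """Years needed to send data_bytes over a link running at link_bps."""
    seconds = data_bytes * BITS_PER_BYTE / link_bps
    return seconds / SECONDS_PER_YEAR


def effective_gbps(data_bytes: int, days: float) -> float:
    """Average throughput (Gbps) needed to move data_bytes in `days` days."""
    seconds = days * 24 * 3600
    return data_bytes * BITS_PER_BYTE / seconds / 1e9


if __name__ == "__main__":
    years = transfer_years(BYTES_PER_EB, LINK_BPS)
    print(f"1 EB over 1 Gbps: {years:.0f} years")
    # "6 months" approximated as 182.5 days (an assumption).
    rate = effective_gbps(BYTES_PER_EB, 182.5)
    print(f"Effective rate for 1 EB in ~6 months: {rate:.0f} Gbps")
```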

LinkedIn user Naval S asked me the following question:

Naval S: I’m curious if there is a categorisation of data heavy industries vs data light ones. And also curious about the independent variables that determine the size of the data in these organisations.

According to the AWS Snowmobile website, the satellite imagery industry is data-heavy, with video libraries and image repositories running into hundreds of petabytes or even exabytes.

I’d also bet that any industry that uses a lot of instrumentation is data-heavy.

This includes chemical process plants, with arrays of instruments measuring pressure, temperature, volume, density, viscosity, and other vital process parameters. (There are exceptions; more on that in another post.)

Logs, traces, and metrics of IT systems are also data-heavy, regardless of the industry. These are records of system activity inside applications and infrastructure building blocks such as servers, storage, and networking.


In “Is Scalability A Bogey Created By Hardware Industry?”, I wrote about a benchmark we ran to prove that our payment hub could handle 2,400 TPS. The server vendor’s facility in India/APAC couldn’t support such high throughput, so we carried out the benchmark at its facility in southern France.

After completing the benchmarks, we had to analyze the logs, traces, and metrics, which turned out to be hundreds of gigabytes in size. There was no way we could transfer them over the Internet from France to our facility in India within a reasonable time. So our engineers copied the data onto multiple magneto-optical drives and hand-carried them to Pune. (Phew, the data survived all the scanning their luggage went through at the three airports they transited!)

That was in 2008.


Since then, WAN bandwidth has exploded. My ISP, for example, has been offering 1 Gbps fiber-to-the-home (FTTH) broadband for over a year.
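To put that jump in perspective, here is a hypothetical comparison for a batch of benchmark logs like the one from 2008. The 300 GB size stands in for "hundreds of gigabytes", and the 2 Mbps figure for a 2008-era link is an illustrative assumption, not the actual link we had. Ignoring protocol overhead, the same data takes roughly two weeks at 2 Mbps but about 40 minutes at 1 Gbps.

```python
# Hypothetical comparison: moving a few hundred GB of benchmark logs
# over an assumed 2008-era link versus a 1 Gbps FTTH connection today.
# The 300 GB size and the 2 Mbps link speed are illustrative assumptions.


def transfer_hours(size_gb: float, link_mbps: float) -> float:
    """Hours to send size_gb (decimal GB) at link_mbps, ignoring overhead."""
    bits = size_gb * 1e9 * 8
    seconds = bits / (link_mbps * 1e6)
    return seconds / 3600


LOGS_GB = 300  # hypothetical stand-in for "hundreds of gigabytes"

for label, mbps in [("assumed 2008 link, 2 Mbps", 2), ("1 Gbps FTTH", 1000)]:
    hours = transfer_hours(LOGS_GB, mbps)
    print(f"{label}: {hours:,.1f} hours ({hours / 24:,.1f} days)")
```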

One would think everything can be transmitted over the Internet now.

Nope.

As AWS Snowmobile shows, physical transportation of data is still in vogue.

That’s because, along with the growth in Internet speeds, the amount of data generated has also exploded during this period. I’m guessing that speeds have grown linearly whereas data has grown exponentially.
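A toy model makes this guess concrete. All starting values and growth rates below are hypothetical, chosen only to show the shape of the curves: bandwidth starts at 100 Mbps and gains 50 Mbps a year (linear), while data starts at 1 TB and doubles every year (exponential). Whatever the exact numbers, an exponential eventually outpaces any linear function, so the time needed to ship each year's data over the wire keeps growing.

```python
# Illustrative model of the guess above: bandwidth grows linearly,
# data volume grows exponentially. All parameters are hypothetical.


def bandwidth_mbps(year: int, start: float = 100.0, step: float = 50.0) -> float:
    """Linear growth: add `step` Mbps every year."""
    return start + step * year


def data_tb(year: int, start: float = 1.0, factor: float = 2.0) -> float:
    """Exponential growth: multiply the data volume by `factor` every year."""
    return start * factor ** year


def transfer_days(year: int) -> float:
    """Days to move that year's data over that year's link, ignoring overhead."""
    bits = data_tb(year) * 1e12 * 8
    seconds = bits / (bandwidth_mbps(year) * 1e6)
    return seconds / 86400


for year in range(0, 11, 2):
    print(f"year {year:2d}: {bandwidth_mbps(year):5.0f} Mbps, "
          f"{data_tb(year):6.0f} TB, {transfer_days(year):7.1f} days")
```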

My favorite example of the exponential growth in data comes from transportation. In the past, when we took a traditional yellow cab or auto rickshaw, the trip generated virtually no data. By contrast, when we take an Uber today, the trip generates a huge data exhaust comprising the expected route, actual route, fare, payment method, ratings, and so on.

It’s not just me. According to the Georgetown Law Technology Review technology explainer titled “Re-Identification of ‘Anonymized’ Data”:

Today, almost everything about our lives is digitally recorded and stored somewhere. Every interaction with technology creates data about that user. Each credit card purchase, medical diagnosis, Google search, Facebook post, or Netflix preferences is another recorded data point about that individual user. Beyond that, every census report, home purchase, voter registration, medical history, and cell phone geolocation is recorded and stored.

Ergo, data still needs to be transported physically, and AWS Snowmobile is a thing.