TECH | STORIES
Weekly Technical Case Studies #1–2025
Your weekly technical stories you wish you were part of
Hi, I am Dipto. Every week I read some interesting tech case studies and share them here.
Here are the stories for this week.
Pokemon GO Scaling using GCP and Kubernetes
When Pokemon GO launched in 2016, its popularity exploded. The engineers had expected 5x traffic, but traffic surpassed 50x within days. This was a problem that had to be resolved.
The engineers traced the problem to three main areas:
- Legacy infrastructure dependence
- Engineers needed to scale manually
- Keeping shared game state consistent was difficult
The migration of Pokemon GO to Kubernetes is a fantastic case study because it touches so many different types of cloud services:
- Migration to Docker containers
- Adopting tens of thousands of Kubernetes nodes to orchestrate containers at scale
- Migration from Google Datastore to more than 5,000 Cloud Spanner nodes
- Using Google Cloud Load Balancing (GCLB) to handle 400k requests per second and scale to 1 million during peak load
All these improvements helped Pokemon GO reach worldwide scale and build a strong infrastructure for millions of players. They were able to handle 50x more traffic and process 5 to 10 TB of data every day.
If you would like to read more, this is the link: https://cloud.google.com/blog/topics/developers-practitioners/how-pokemon-go-scales-millions-requests
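To make the move away from manual scaling a bit more concrete, here is a minimal sketch of the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler: desired = ceil(current replicas * current metric / target metric). The source does not say which autoscaling settings the Pokemon GO team used, so the function name and all numbers below are illustrative assumptions. The sketch also ignores the tolerance and stabilization behavior of the real autoscaler.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """Replica count suggested by the documented HPA ratio formula."""
    if current_replicas <= 0 or target_metric <= 0:
        raise ValueError("current_replicas and target_metric must be positive")
    return math.ceil(current_replicas * current_metric / target_metric)


# Hypothetical numbers: 100 pods sized for 1,000 requests/s each,
# suddenly receiving 50x the planned per-pod load.
print(desired_replicas(100, 50_000, 1_000))
```

With an autoscaler doing this arithmetic continuously, nobody has to notice the spike and resize the cluster by hand.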
Airbnb improving propagation delay
Airbnb had migrated to the Istio service mesh to monitor communication between its services effectively, something that could not scale with its previous service mesh, SmartStack.
However, there was a problem.
- After a version upgrade of Istio, the propagation delay increased. Propagation delay is the time it takes for configuration changes to travel from the control plane to workloads. This is where the investigation started.
- The investigation found that the LRU cache took the same lock for both reads and writes, which slowed processing whenever multiple threads contended for that lock.
- This in turn delayed Istio’s handling of resources such as endpoint discovery and cluster discovery, and it raised a key latency measurement that the propagation delay metric relied upon.
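The locking problem in the bullets above can be sketched as follows. This is a minimal illustration in Python, not Istio's actual Go implementation; the class and method names are hypothetical. The point it shows is that in an LRU cache even a read changes the recency order, so reads end up taking the same exclusive lock as writes and concurrent readers queue behind each other.

```python
import threading
from collections import OrderedDict


class LockedLRUCache:
    """LRU cache guarded by one lock; names are illustrative, not Istio's."""

    def __init__(self, capacity: int) -> None:
        self._capacity = capacity
        self._items: OrderedDict = OrderedDict()
        self._lock = threading.Lock()

    def get(self, key, default=None):
        # A read also mutates the recency order, so it needs the exclusive lock.
        with self._lock:
            if key not in self._items:
                return default
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]

    def put(self, key, value) -> None:
        with self._lock:
            self._items[key] = value
            self._items.move_to_end(key)
            if len(self._items) > self._capacity:
                self._items.popitem(last=False)  # evict least recently used
```

Every caller of `get` and `put` serializes on `_lock`, so the cache becomes a bottleneck as the number of threads grows.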
To investigate further, the CDS and RDS caches were turned off, and the analysis found a bug in the Istio codebase where an important flag was not being respected.
Furthermore, there was another major issue: the debounce process, which batches config changes and pushes them together, was taking longer than the configured time.
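Debouncing, as described above, means holding incoming changes until things go quiet and then pushing them as one batch. Here is a minimal sketch of that idea, assuming two hypothetical settings: `quiet_period_ms` (push once no new change has arrived for this long) and `max_wait_ms` (never hold a batch longer than this). The clock is passed in explicitly so the logic is easy to follow; it is not how Istio implements it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Debouncer:
    quiet_period_ms: int  # push once no new change arrived for this long
    max_wait_ms: int      # never hold a batch longer than this
    _pending: List[str] = field(default_factory=list)
    _first_at: Optional[int] = None
    _last_at: Optional[int] = None

    def add(self, change: str, now: int) -> None:
        """Record a config change observed at time `now` (milliseconds)."""
        if self._first_at is None:
            self._first_at = now
        self._last_at = now
        self._pending.append(change)

    def poll(self, now: int) -> List[str]:
        """Return the batch to push if it is due, else an empty list."""
        if self._first_at is None:
            return []
        quiet = now - self._last_at >= self.quiet_period_ms
        too_old = now - self._first_at >= self.max_wait_ms
        if not (quiet or too_old):
            return []
        batch, self._pending = self._pending, []
        self._first_at = self._last_at = None
        return batch
```

If the work done per change is slow, the batch keeps being extended and the push happens later than the configured window suggests, which matches the symptom in this story.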
The slowdown was due to expensive Go reflection used for deep copying data, and to inefficient processing of virtual services for each proxy.
It was optimized by passing data by reference and removing the expensive copystructure library from the Go code path. The long-term fix was to collaborate with the Istio community on further optimizations.
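The difference between deep copying and passing by reference can be sketched as below. Istio is written in Go, so this Python version is only an analogy: `copy.deepcopy` stands in for a reflection-based deep copy, and the `config` dictionary is a hypothetical stand-in for a large mesh configuration shared by every proxy.

```python
import copy
import timeit

# Hypothetical stand-in for a large mesh config shared by every proxy.
config = {
    f"service-{i}": {"hosts": [f"host-{i}-{j}" for j in range(20)]}
    for i in range(500)
}


def build_with_copy(cfg: dict) -> int:
    private = copy.deepcopy(cfg)  # full recursive copy for each proxy
    return len(private)


def build_by_reference(cfg: dict) -> int:
    shared = cfg  # read-only use of the one shared object, no copy made
    return len(shared)


copy_time = timeit.timeit(lambda: build_with_copy(config), number=20)
ref_time = timeit.timeit(lambda: build_by_reference(config), number=20)
print(f"deep copy: {copy_time:.4f}s, by reference: {ref_time:.6f}s")
```

Passing by reference is only safe when the consumers treat the shared data as read-only, which is the trade-off to check before removing a defensive copy.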
These changes reduced propagation delay significantly, improving system reliability during deployments.
Read more about it here -> https://medium.com/airbnb-engineering/improving-istio-propagation-delay-d4da9b5b9f90
Target’s failed expansion in Canada
In 2011, Target decided to expand into Canada. However, there were two major problems.
Canada used Canadian dollars rather than US dollars, and it used the metric system for measurements. This meant the inventory system, which was set up to handle US dollars, would need to be updated to handle Canadian dollars, and conversion methods would need to be added.
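The kind of conversion methods mentioned above can be sketched like this. It is a minimal illustration, not Target's system: the function names are hypothetical and the exchange rate is a made-up placeholder. The inch-to-centimetre and pound-to-kilogram factors are exact by definition. `Decimal` is used so that money is not rounded through binary floating point.

```python
from decimal import Decimal, ROUND_HALF_UP

CM_PER_INCH = Decimal("2.54")         # exact by definition
KG_PER_POUND = Decimal("0.45359237")  # exact by definition
USD_TO_CAD = Decimal("1.35")          # hypothetical rate, for illustration only


def usd_to_cad(amount_usd: Decimal, rate: Decimal = USD_TO_CAD) -> Decimal:
    """Convert a USD price to CAD, rounded to cents."""
    return (amount_usd * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def inches_to_cm(inches: Decimal) -> Decimal:
    """Convert a length to centimetres, rounded to one decimal place."""
    return (inches * CM_PER_INCH).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def pounds_to_kg(pounds: Decimal) -> Decimal:
    """Convert a weight to kilograms, rounded to grams."""
    return (pounds * KG_PER_POUND).quantize(Decimal("0.001"), rounding=ROUND_HALF_UP)


print(usd_to_cad(Decimal("19.99")), inches_to_cm(Decimal("12")), pounds_to_kg(Decimal("2")))
```

Keeping each conversion in one named function makes it harder for a record to end up with inches in a centimetre field.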
So Target aimed to streamline its supply chain by implementing a new ERP system tailored for its Canadian operations. However, tight deadlines and undertrained employees led to critical data entry errors during implementation.
Approximately 70% of the data entered into the ERP system was incorrect due to:
- poor training
- rushed timelines
For example, incorrect product dimensions caused shipping errors, while pricing discrepancies made goods unaffordable for Canadian consumers. The system failed to synchronize inventory levels between warehouses and stores effectively.
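Errors like the ones above are the kind that simple checks at data entry time can catch. Below is a minimal sketch of such a check. The record fields, the `validate` function, and the 300 cm size limit are all hypothetical choices for illustration; the source does not describe what validation Target's ERP system had.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProductRecord:
    sku: str
    length_cm: float
    width_cm: float
    height_cm: float
    price_cad: float


def validate(record: ProductRecord, max_side_cm: float = 300.0) -> List[str]:
    """Return a list of human-readable problems; empty means the record passed."""
    problems = []
    for name in ("length_cm", "width_cm", "height_cm"):
        value = getattr(record, name)
        # Reject zero, negative, or implausibly large dimensions.
        if not 0 < value <= max_side_cm:
            problems.append(f"{name}={value} outside (0, {max_side_cm}]")
    if record.price_cad <= 0:
        problems.append(f"price_cad={record.price_cad} must be positive")
    return problems


print(validate(ProductRecord("B2", 0, 20.0, 3050.0, -1)))
```

Rejecting a record at entry is much cheaper than discovering the bad dimensions when a shipment does not fit.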
The scale of the errors was so massive that Target had to close 133 stores, and by 2015 its Canadian business had to file for bankruptcy. The company was able to bounce back, but the damage was devastating.
Read more about it here -> https://www.henricodolfing.com/2019/09/case-study-target-canada-failure.html
I will publish interesting articles to read every week. Subscribe if you are interested in this kind of content and want to keep up with technical case studies.