By Meg Ramsey
Disney+ debuted with more than 500 movies, 7,500 TV show episodes … and more than 8,000 people unable to access the service.
As Disney explained, “demand for Disney+ has exceeded our highest expectations.”
While unexpectedly high demand is good news for Disney, it’s bad news for anyone who woke up ready to stream Toy Story, Avengers: Endgame, or The Mandalorian. But launch day problems are hardly unusual.
The Game of Thrones season 7 premiere crashed the HBO Now app. When WWE Network launched in 2014, its sign-up page crashed. More recently, Bungie had to take Destiny 1 and 2 offline for “emergency maintenance” after releasing an expansion for Destiny 2.
Why do so many companies have launch day outages? Do they underestimate demand? Do they fail to properly test their systems before launch?
The answer, at least in the case of Disney+, is likely a little more complicated.
Disney+ had the best technology available
Unlike Netflix, Hulu, and Amazon Prime, Disney hasn’t spent the past decade refining the design of its systems to handle growing demand for video streaming. Nor has it spent years practicing chaos engineering, as Netflix has, to anticipate how unexpected events such as failures and load spikes could affect resiliency.
Disney was planning to splash right into the space with an extensive content library and high expectations. It needed help.
Disney invested $1.8 billion to gain a majority stake in BAMtech, which has run streaming services for Major League Baseball, WWE, and Hulu. As Wired put it, “BAMtech is the gold standard” for online streaming, with more than a decade of experience delivering video to mobile devices.
Not only did Disney have the technology in its corner, it also tested capacity by running a pilot of its streaming service in the Netherlands. The pilot uncovered some technical glitches, which Disney quickly resolved.
So what was the likely culprit?
Cloud-native vs. legacy applications
You could argue that BAMtech gave Disney a false sense of resiliency, and that launching the service in the U.S., Canada, and the Netherlands on the same day might have been too optimistic.
But ultimately, the service’s Achilles’ heel was likely auto-scaling. Hyperscale cloud platforms leave auto-scaling configuration to their customers, so it’s possible that Disney and its support team didn’t set up their infrastructure to scale far enough or fast enough.
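Because the customer owns that configuration, it helps to see what a scaling rule actually looks like. The sketch below follows the proportional formula documented for the Kubernetes Horizontal Pod Autoscaler, desired = ceil(current × currentMetric / targetMetric). The function name, the replica bounds, and the 0.1 tolerance are illustrative assumptions; nothing here reflects how Disney or BAMtech actually configured their platform.

```python
import math


def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float,
                     min_replicas: int = 2,
                     max_replicas: int = 100,
                     tolerance: float = 0.1) -> int:
    """Hypothetical helper modeled on the Kubernetes HPA proportional rule.

    Utilization values are percentages (e.g. CPU); all defaults are
    illustrative assumptions, not recommended settings.
    """
    ratio = current_utilization / target_utilization
    # Leave the replica count alone when the metric is already close to target.
    if abs(ratio - 1.0) <= tolerance:
        proposed = current_replicas
    else:
        # Scale proportionally to how far the metric is from its target.
        proposed = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds. A max_replicas set too low caps
    # capacity no matter how much demand arrives.
    return max(min_replicas, min(max_replicas, proposed))
```

With 10 replicas running at 100% utilization against a 50% target, the rule asks for 20 replicas; set `max_replicas=12` and it stops at 12, which is exactly the kind of customer-side misconfiguration that leaves a platform short of capacity on launch day.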
But even if they did have auto-scaling enabled, their applications might not auto-scale all that well. Disney+ likely wasn’t developed as a completely cloud-native application, and its launch day outage could stem from a legacy application tied into the new platform that’s not as elastic as the platform needs it to be.
Combine that with the massive spike of eager Disney fans and you get the outage they experienced. Once Disney+ went down, the team was likely hunting for the application issue, or possibly even a database issue, that caused the failure. Only then could they restore service to users.
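A toy model shows why elastic infrastructure alone isn’t enough: every request has to pass through every tier, so end-to-end throughput is capped by the least elastic one. The sketch assumes a hypothetical two-tier service, an elastic "frontend" and a fixed-size "legacy-accounts" backend, with made-up capacity numbers. It illustrates the bottleneck argument only and is not a model of Disney+’s actual architecture.

```python
import math
from dataclasses import dataclass


@dataclass
class Tier:
    name: str                # hypothetical tier name
    per_instance_rps: float  # requests/second one instance can serve (assumed)
    instances: int
    elastic: bool            # can the autoscaler add instances to this tier?


def scale_for_demand(tiers: list[Tier], demand_rps: float) -> list[Tier]:
    """Grow elastic tiers until each can meet demand; fixed tiers stay as they are."""
    scaled = []
    for t in tiers:
        if t.elastic:
            needed = math.ceil(demand_rps / t.per_instance_rps)
            scaled.append(Tier(t.name, t.per_instance_rps,
                               max(t.instances, needed), True))
        else:
            scaled.append(t)
    return scaled


def served_rps(tiers: list[Tier], demand_rps: float) -> float:
    # Throughput is limited by the tier with the smallest total capacity.
    capacity = min(t.per_instance_rps * t.instances for t in tiers)
    return min(demand_rps, capacity)


tiers = [Tier("frontend", 500.0, 4, True),
         Tier("legacy-accounts", 2000.0, 1, False)]
scaled = scale_for_demand(tiers, 10_000)
print(scaled[0].instances)          # 20: the frontend scaled out as designed
print(served_rps(scaled, 10_000))   # 2000.0: the fixed legacy tier caps throughput
```

The frontend scales from 4 to 20 instances and could handle all 10,000 requests per second, yet only 2,000 get through because the legacy tier cannot grow. Scaling the platform without making that tier elastic simply moves the queue.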
How to prepare for outsized success
It’s not just launch days that can create big spikes in traffic. Black Friday shopping, for example, can overwhelm systems if you’re not prepared. Application performance monitoring (APM) tools can help ferret out problems like these, and they should be put to work before any launch.
In the streaming wars, new services have to be ready for the masses immediately. Legacy streaming players benefited from years of organic growth and intentional design to handle this usage and traffic. New entrants don’t have the benefit of time and must architect their streaming platforms to scale dynamically to meet hugely hyped demand.
But it’s not just the cloud platform that must scale; the applications must also be able to scale at the same rate. The two have to stay in sync, or else users will see less of Cinderella and more of Wreck-It Ralph.