If you are a regular listener of the podcast or an e-commerce watcher, you know that “the season” is important for us. It’s a yearly recurring theme in the podcast. You are probably also aware that the uptime and responsiveness of our app and website are important. And you might have seen that enabling our software engineers to perform at their peak is important to us. Enabling teams and engineers is what we do to build a great place to engineer.
And sometimes things just go sour. A perfect storm occurs that’s definitely not a tailwind…
As our CTO would say, “never waste a good crisis”. We have to learn from what happened. Let’s explore one of those incidents. We go back to the start of the 2019 season. Just before the start of the Friday Afternoon Drinks, a major incident started in our Android app. This caused downtime in other areas of the platform as well. And, just like when investigating a plane crash, there was not just one thing that was off: a series of unlikely things happened in a short span of time. Let’s dive into this.
What the episode covers
- Why is learning from failures an important topic to share?
- Some context: what part of the landscape are we talking about in the episode?
- What was your perspective? What were you doing and what happened?
- Taking a few steps back: what was the incident management process, and how did we fix the issue step by step?
- When the dust settled: what did we learn? What did we improve?
- Julius van Dis – Full-stack engineer at Flock. He was responsible for the app, especially its direct backend. Some of the projects he has done include making the app and service landscape multilingual, the migration and integration of a new gateway, the creation of a basket API, and improved app updates.
Peter Paul van de Beek