Culture at Skyscanner – Working at a Cloud Native business

In their Wired feature article, Skyscanner showcase their corporate culture.

The company has enjoyed explosive growth. Founded in Edinburgh in 2003, the business has grown quickly from a startup into a global giant with over 1,000 staff and 100 million customers.

VentureBeat describes their funding rounds and 2016 acquisition by Ctrip for an eye-watering £1.4 billion.

Check out their latest job postings here.

Squads and Tribes

In the Wired article CTO George Goodyer describes a culture and organizational model that is now known as a ‘Cloud Native’ approach.

Digital pioneers like Netflix and Amazon have entirely transformed how businesses develop and deploy software, iterating on new releases much faster and thus outperforming their competitors online.

This is achieved by breaking up the ‘monoliths’ of large enterprise systems into small, modular ‘microservices’ and, similarly, breaking up the corporate structures that maintained them into small, autonomous product teams.

Goodyer explains how this approach is implemented at Skyscanner through a system of ‘Squads and Tribes’:

Squads are small teams, focused on one particular part of the company’s digital infrastructure – the app design, for example, or security on the site. They’re made up of six to eight engineers, plus a line manager and, depending on the subject the squad deals with, there may also be a designer, a data scientist or someone from the commercial team. Put five to ten of these squads together, and you’ve got a tribe.

The ultimate objective is empowered, accountable teams. To maintain the startup dynamism that founded the company, Goodyer needs developers who move very quickly and innovate, while still taking responsibility for and learning from the mistakes that inevitably occur.

This speed can’t be achieved through a traditional departmental micro-management culture. Instead, the Cloud Native ethos is one of small, self-organizing product teams who take end-to-end ownership of the services they develop and, critically, maintain as well.

Scaling a digital business through a Cloud Native architecture on AWS

For an example of the technology aspects of the Cloud Native approach in action, check out this AWS tutorial video featuring Paul Gillespie from Skyscanner, who walks through their Kubernetes cluster architecture.

What’s the secret to making this reliable at scale? Diversification—multiple Availability Zones, multiple Regions, multiple clusters, and multiple Amazon EC2 instance types.

You’ll learn how they leverage EC2 Spot, Auto Scaling, ELB and some novel design patterns to make their Kubernetes clusters both cost-effective and highly available.
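The diversification idea can be made concrete with a small Python sketch. It treats each combination of Availability Zone and instance type as a separate Spot capacity pool and estimates how much of a cluster is lost if the largest pool is reclaimed. The zone names, instance types and node count are illustrative assumptions, not Skyscanner's actual configuration.

```python
from itertools import product

# Hypothetical values for illustration only.
ZONES = ["eu-west-1a", "eu-west-1b", "eu-west-1c"]
INSTANCE_TYPES = ["m5.xlarge", "m5a.xlarge", "c5.2xlarge", "r5.xlarge"]


def capacity_pools(zones, instance_types):
    """Return every (zone, instance type) pair; each pair is one Spot pool."""
    return list(product(zones, instance_types))


def spread_nodes(total_nodes, pools):
    """Spread nodes as evenly as possible across the pools."""
    base, extra = divmod(total_nodes, len(pools))
    # The first `extra` pools take one additional node each.
    return {pool: base + (1 if i < extra else 0) for i, pool in enumerate(pools)}


def worst_case_loss(allocation):
    """Fraction of nodes lost if the single largest pool is reclaimed."""
    return max(allocation.values()) / sum(allocation.values())


pools = capacity_pools(ZONES, INSTANCE_TYPES)
allocation = spread_nodes(120, pools)
print(len(pools), "pools; worst-case loss:", f"{worst_case_loss(allocation):.1%}")
```

With three zones and four instance types there are twelve pools, so losing any one of them removes only a small share of the nodes. Adding more zones, Regions or instance types shrinks that share further.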

At 1:17, Paul begins to explain why Kubernetes runs on EC2 instances in the Skyscanner infrastructure, instead of opting for other approaches.

He adds that the Kubernetes cluster is the central part of Skyscanner's infrastructure. At 1:59, he explains the role of Auto Scaling Groups (ASGs) and Availability Zones. He states that each individual ASG is deployed across multiple instances. Skyscanner uses five ASGs, which together provide the required diversification.

At 3:03, he explains how this infrastructure takes care of scalability at Skyscanner. With potentially 120 to 130 nodes, it can cater to the needs of many users. At 3:24, he begins to explain the types of queries and how they are handled.

A single cluster in a busy region will generally handle between 60 and 70 thousand queries per second. At 4:44, he explains that the Kubernetes clusters run on one hundred percent Spot instances. At 5:02, Paul states that the shelter script will prevent the Kubernetes scheduler from terminating the Spot instance.
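The summary does not show the script itself. As a rough illustration of how a Spot interruption can be handled on a Kubernetes node, the Python sketch below parses the two-minute interruption notice that EC2 exposes through instance metadata and builds the kubectl commands that would cordon and drain the node. The function names and the node name are hypothetical, and this is not presented as Skyscanner's implementation.

```python
import json
from datetime import datetime, timezone

# Path of the Spot interruption notice in the EC2 instance metadata service.
# It returns 404 until AWS schedules an interruption for the instance.
# A real handler would poll this URL (with an IMDSv2 session token where
# required); this sketch works on a canned response instead.
NOTICE_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"


def parse_notice(body):
    """Parse the metadata response body; None means no interruption is pending."""
    if not body:
        return None
    notice = json.loads(body)
    when = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    return notice["action"], when.replace(tzinfo=timezone.utc)


def drain_commands(node_name, notice):
    """Return the kubectl commands a handler could run for this notice."""
    if notice is None:
        return []
    return [
        # Mark the node unschedulable so no new pods are placed on it.
        ["kubectl", "cordon", node_name],
        # Evict running pods so they are rescheduled on other nodes.
        ["kubectl", "drain", node_name, "--ignore-daemonsets"],
    ]


# Example with a canned response instead of a live metadata call.
sample = '{"action": "terminate", "time": "2017-09-18T08:22:00Z"}'
notice = parse_notice(sample)
print(drain_commands("node-1", notice))
```

Returning the commands as lists, rather than running them, keeps the sketch testable without a cluster; a real handler could pass each list to subprocess.run.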

The presence of Auto Scaling Groups

At 6:08, Paul explains that Kubernetes must be notified about which nodes are present and which are currently unavailable.

At 6:27, he adds that kube-proxy and the DaemonSets run on all the nodes. He further adds that the team at Skyscanner encountered a problem with the cluster autoscaler while building the infrastructure. An issue arose with autoscaling when a single ASG reached zero, and the clusters often had to be restarted. At 7:16, he states that this problem was resolved using the Londo patch.

At 7:37, Paul states that reserved instances are used mainly for diversification, for their cost, and to build the multiple clusters efficiently. The reserved instances also allow the use of different instance types.

At 8:15, he states that different workloads are considered to ensure that all test case queries pass successfully. Paul, as part of the infrastructure team at Skyscanner, explains the importance of achieving scalability at minimal cost. He puts forward reserved instances and the patch as ways to achieve efficient Kubernetes cluster functionality.

 
