Rightmove’s Journey to Cloud


Rightmove is so much more than just a property portal. We provide a huge range of services to estate agents, new home developers, mortgage lenders, surveyors and more. Everything is delivered via an API, mobile app or web interface. We have around 160 microservices, a small remaining part of our original monolith and many other supporting applications.

We typically serve 50 million page views and upload 10,000 properties daily. Collectively we see 1.3 billion minutes spent on the website every month! We have around 17 product teams, consisting of 200+ software engineers, product, and data team members. The infrastructure services we use to power the application stack are constantly evolving.

The Journey So Far

Historically we’ve been running out of a heavily code-driven, active-active-active, three-datacentre architecture. It’s been incredibly resilient and served us well. However, as the application stack changed more rapidly and became more dynamic, this infrastructure was struggling to keep up, and the overheads of running a datacentre model were starting to add up. Alongside this, it was difficult to experiment with new features and provide true business agility. This is why we’ve turned to the cloud.

2021

Our cloud journey started in 2021 with the definition of a high-level blueprint to help shape our requirements and a detailed assessment of cloud providers. In April 2021 we selected Google Cloud Platform as our preferred partner, agreed on commercial terms, and began working on Phase 1 – Foundations and Landing Zone.

By the end of 2021, we had set up the people, processes and technology for a secure, scalable, hybrid cloud landing zone that could host containerised workloads in Google’s managed service for Kubernetes (GKE).

2022

In 2022 we built out the capability of the platform to handle large-scale 24/7 production traffic as well as expanding the feature set to support more applications and use cases. By the end of 2022, following a huge effort from our teams, Sold Property Search was running 100% in cloud and we had a small percentage of core Property Search traffic running in cloud.

This was running on a platform that would scale to the 14 million searches done every day and provide the industry-leading uptime we wanted. We had two further pilot applications running 100% in cloud and had migrated the entire Rightmove Landlord and Tenancy services platform to GCP.

By the end of 2022 we had also expanded the feature set. We enabled a machine learning service that scans all property images to stop the wrong content from getting onto the website, and secure cloud storage that can hold lettings information, plus many other things.

2023

This year, we plan to accelerate. We’re migrating nearly 100 of our microservices to the new platform as well as other key infrastructure services and large-scale databases. We’re also building out a new data architecture to provide better insights and enable us to build better products for customers and consumers.

What is the “Project Factory” and why do we use it?

In Google Cloud, a Project is a fundamental organisational unit used to manage and isolate resources (like Kubernetes clusters, storage buckets or networking resources). It is also used to isolate permissions and billing.

Projects are the basis for any new initiative that will be undertaken on the platform. They need to be configured with the correct identity and access management controls, network settings, common APIs, monitoring and logging, as well as Terraform configuration and pipelines for deployments.

We needed a way to enable a more self-service platform yet maintain governance and consistency in everything that gets deployed. Enter the Project Factory. It enables us to automatically provision Projects that are pre-configured for the Rightmove GCP platform, simply by submitting a Pull Request.

Our custom Project Factory Terraform module calls the Google-maintained open-source Terraform module, which aligns with Google Cloud Platform best practices. This makes us more efficient when developing the code and keeps us up to date with Google Cloud Platform API changes. We follow this pattern for most of our Terraform development where possible.
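The wrapper idea can be illustrated with a minimal Python sketch. It is not the actual Project Factory code, which is written in Terraform. The names `ProjectRequest`, `PLATFORM_BASELINE`, `ALLOWED_ENVIRONMENTS` and `build_project_config`, the environment names and the project ID format are all assumptions made for illustration. The sketch shows a small request being merged with a platform baseline so that every Project starts from the same governed configuration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical platform baseline applied to every Project (assumed values).
PLATFORM_BASELINE = {
    "apis": [
        "compute.googleapis.com",
        "logging.googleapis.com",
        "monitoring.googleapis.com",
    ],
    "labels": {"managed_by": "project-factory"},
}

# Hypothetical set of environments the platform accepts.
ALLOWED_ENVIRONMENTS = {"dev", "test", "prod"}


@dataclass(frozen=True)
class ProjectRequest:
    """The small set of inputs a team would submit in a Pull Request."""
    name: str
    environment: str
    team: str
    extra_apis: List[str] = field(default_factory=list)


def build_project_config(request: ProjectRequest) -> Dict[str, object]:
    """Merge a team's request with the platform baseline."""
    if request.environment not in ALLOWED_ENVIRONMENTS:
        raise ValueError(f"unknown environment: {request.environment}")
    # Baseline APIs are always enabled; teams can only add to them.
    apis = sorted(set(PLATFORM_BASELINE["apis"]) | set(request.extra_apis))
    labels = {
        **PLATFORM_BASELINE["labels"],
        "team": request.team,
        "environment": request.environment,
    }
    return {
        "project_id": f"{request.name}-{request.environment}",
        "apis": apis,
        "labels": labels,
    }


if __name__ == "__main__":
    request = ProjectRequest(
        name="search",
        environment="dev",
        team="property-search",
        extra_apis=["container.googleapis.com"],
    )
    print(build_project_config(request))
```

The design point is that the requester supplies only what is specific to their Project, while governance settings live in one place and cannot be omitted.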

Evolution of the Project Factory

As the Project Factory worked so well, we continued to add modules, including the code to configure independent Terraform CI/CD pipelines (to increase CI/CD security). As a result, the Project Factory Terraform statefiles became too large and cumbersome to manage when many people were making changes, much like a monolithic application.

We’re now in the process of splitting the statefiles into more discrete units based on service type and/or Project. Terraform pipelines will then run from refresh to apply on much smaller statefiles, which means much faster deployments. It also means multiple people will be able to make changes to multiple Projects, in multiple environments, at the same time.
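As a rough illustration of why smaller statefiles help, the Python sketch below groups resources into separate state keys by environment, Project and service type. The key layout, the field names and the functions `state_key` and `split_state` are hypothetical, not the actual Rightmove layout. Changes that land in different keys touch different statefiles, so they do not contend for the same state.

```python
from collections import defaultdict
from typing import Dict, List


def state_key(environment: str, project: str, service_type: str) -> str:
    """Build a hypothetical remote state path for one discrete unit."""
    return f"{environment}/{project}/{service_type}/terraform.tfstate"


def split_state(resources: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Group resource addresses by the statefile that would own them."""
    states = defaultdict(list)
    for resource in resources:
        key = state_key(
            resource["environment"],
            resource["project"],
            resource["service_type"],
        )
        states[key].append(resource["address"])
    return dict(states)


if __name__ == "__main__":
    # Illustrative resources only; the addresses are made up.
    monolith = [
        {"environment": "prod", "project": "search", "service_type": "gke",
         "address": "google_container_cluster.main"},
        {"environment": "prod", "project": "search", "service_type": "storage",
         "address": "google_storage_bucket.images"},
        {"environment": "dev", "project": "lettings", "service_type": "storage",
         "address": "google_storage_bucket.documents"},
    ]
    for key, addresses in split_state(monolith).items():
        print(key, addresses)
```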

Why are coding standards for Terraform important?

Terraform is the de facto standard language for infrastructure as code. Love it or loathe it, you’ve got to maintain it.

Having standards for your Terraform code is important for maintaining code quality, readability, reliability, and security. They ensure your platform remains reliable, agile and scalable.

It’s even more important when using a wrapper like Terragrunt. We use Terragrunt to simplify the management of multiple Terraform modules across different environments. We also use it for remote state management as it enables you to keep your code DRY (Don’t Repeat Yourself).

We have defined naming standards for Terraform modules, local resources, Terragrunt folders and repository names. Our coding standards say that no variable should have a default value and that variables must be explicitly declared in each Terragrunt environment. We have also defined a repository structure that separates Terraform code from Terragrunt environment configuration.
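A rule such as "no variable defaults" is straightforward to check automatically. The Python sketch below is a deliberately naive illustration, not the tooling used in the pipeline: it scans Terraform source text for `variable` blocks and reports any that set a `default`. The function name `variables_with_defaults` and the sample variables are assumptions, and the scanner does not handle braces inside strings or comments, so a real check should rely on a proper HCL parser or linter.

```python
import re
from typing import List

# Matches the opening of a block such as: variable "region" {
VARIABLE_HEADER = re.compile(r'variable\s+"([^"]+)"\s*\{')
# Matches a line inside the block that assigns the "default" attribute.
DEFAULT_ATTRIBUTE = re.compile(r"^\s*default\s*=", re.MULTILINE)


def variables_with_defaults(terraform_source: str) -> List[str]:
    """Return the names of variable blocks that declare a default value."""
    offenders = []
    for match in VARIABLE_HEADER.finditer(terraform_source):
        depth = 1
        position = match.end()
        # Walk forward until the brace opened by the header is closed.
        while position < len(terraform_source) and depth > 0:
            char = terraform_source[position]
            if char == "{":
                depth += 1
            elif char == "}":
                depth -= 1
            position += 1
        body = terraform_source[match.end():position - 1]
        if DEFAULT_ATTRIBUTE.search(body):
            offenders.append(match.group(1))
    return offenders


# Hypothetical Terraform input used only to demonstrate the check.
SAMPLE = '''
variable "project_name" {
  type = string
}

variable "region" {
  type    = string
  default = "europe-west2"
}
'''

if __name__ == "__main__":
    print(variables_with_defaults(SAMPLE))
```

Run against `SAMPLE`, the check flags `region` and passes `project_name`, which is the kind of signal a pipeline step could use to fail a Pull Request.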

These standards help to keep our environment deployable and our platform healthy. We enforce them with linting and “policy as code” tools that run as part of the pipeline, which also helps us comply with security guidelines.

What is our approach to living in the cloud?

We continue to work really closely with our Finance and Security teams, as we have done from day zero.

We work primarily with a single cloud provider to reduce complexity, shorten time to value and leverage economies of scale. Multi-cloud comes next as our datacentre footprint reduces.

We only run workloads in cloud that are cloud ready. We really only deal with containers, having moved from Docker Swarm/UCP on premises to Kubernetes on GKE in Google Cloud. No large-scale lift and shift. No VMs running website applications. To hear more about our journey to containers, see the post by Fin Evans, “Our journey into the world of containers”.

We use PaaS where we want the team to upskill and learn, and SaaS where we need to move services more quickly and don’t want to deal with the overheads.

We only purchase SaaS bolt-on products when we feel the pain of missing core functionality from the cloud provider.

We’re prioritising the migration of services that would otherwise need hardware replacement or datacentre spend.

We ensure the open source products we select have a strong community. 

We’re adopting a platform-as-a-product way of thinking and working, embedding a platform product owner in our team and delivering with cross functional squads.

This is only the beginning of our journey to cloud and we have lots more to do. Keep checking this blog for updates as we continue evolving the platform that powers Rightmove.

Cloud,  Software Engineering
Post written by Andrew Tate

Andrew's teams provide and progress the infrastructure, data stores, site reliability, platforms, and tooling that power the Rightmove website. They also enable Product teams to safely, rapidly, reliably and frequently release new functionality. In his spare time, Andrew keeps active and enjoys being outdoors.
