
Blue-Green Deployment Strategies for PCF Microservices



Summary
Pivotal Cloud Foundry’s powerful capabilities make it easy to implement a blue-green deployment in a microservices architecture. Here’s how to do it right.

Blue-green deployment is a well-known pattern for updating software components by switching between simultaneously available environments or services. The context in which the strategy is used varies: it can mean switching between data centers, between web servers in a single data center, or between microservices in a Pivotal Cloud Foundry (PCF) deployment.

In a microservices architecture, it’s often challenging to monitor and measure the performance of a microservice updated via blue-green deployment, specifically when determining the impact on consumers of the service, the overall business process, and existing service level agreements (SLAs).

But there’s good news. PCF—with its built-in router and commands for easily managing app requests, and its sound orchestration of containerized apps—makes implementing a blue-green deployment trivial.

Our Example App

In this post, we’ll focus on a simplified PCF Spring Boot microservice that implements a REST service to process orders, with a single orders endpoint for posting new orders. For simplicity’s sake, this service has a single “downstream” dependency on an account microservice, which it calls via the account service REST API.

You’ll find the example app here: https://github.com/jaholmes/orderapp.

The order-service uses the account-service to create an account. In this scenario, a user submits an order and requests that a new account be created as part of the order submission.

Deploying the Green Version of the Account Service

We will perform a blue-green deployment of a new version of the account-service, which includes some bug fixes, using the CF CLI map-route and unmap-route commands.

When we push the account-service app, we’ll adopt an app-naming strategy that appends -blue or -green to the app name, and assume our deployment pipeline automatically switches between the two suffixes from one deployment to the next.
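As a minimal sketch of how a pipeline might pick the next app name, the helper below flips the color based on whichever version is currently live. The function names `next_color` and `next_app_name` are hypothetical and are not part of the example repository; how the pipeline learns which color is live is left as an assumption.

```shell
#!/bin/sh
# Hypothetical helper: return the color that is NOT currently live.
next_color() {
  case "$1" in
    blue)  echo "green" ;;
    green) echo "blue" ;;
    *)     echo "unknown color: $1" >&2; return 1 ;;
  esac
}

# Build the app name for the next deployment, e.g. account-service-green.
next_app_name() {
  base="$1"
  live="$2"
  color="$(next_color "$live")" || return 1
  echo "${base}-${color}"
}

# Example: blue is live, so the next push targets the green app name.
next_app_name account-service blue
```

The resulting name could then be passed to cf push in place of a hard-coded app name.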

So our initial deployment, based on the manifest.yml here, would be:

$ cf push account-service-blue -i 5

After pushing this app, we create the production route and map it to the blue version of the app using cf map-route.

$ cf map-route account-service-blue apps.<pcf-domain> --hostname prod-account-service

Then we create the user-provided service.

$ cf cups account-service -p '{"route": "prod-account-service.apps.<pcf-domain>"}'

And bind the order-service to this user-provided service by referencing it in the order-service’s manifest.yml.

When pushed, the order-service app consumes the account-service route from its environment variables, and uses this route to communicate with the account-service, regardless of whether it’s the blue or green version of the account-service.

After the initial deployment, both the order-service and the blue account-service are running with five instances.

Both services have started the AppDynamics APM agent integrated in the Java buildpack and are reporting to an AppDynamics controller. The flowmap for the order service/orders endpoint shows the requests flowing from the order-service to the account-service, and the response time averaging 77 milliseconds (ms). A majority of that time is being consumed by the account-service. The instances are represented as nodes on the flowmap:

[Image: AppDynamics flowmap of the order-service and account-service]

Performing Blue/Green Deployment

Now we’re ready to push the updated “green” account-service, which implements fixes to known issues. We’ll push it under the app name “account-service-green” so it’s deployed separately from account-service-blue.

$ cf push account-service-green -i 5

At this point, the Apps Manager shows both versions of the account-service app running, but only the blue version is receiving traffic.

[Image: Apps Manager showing account-service-blue and account-service-green running]

We can validate this by referencing a monitoring dashboard that displays calls per minute and response time for the blue and green versions of the account-service. Below, the dashboard shows no activity for account-service-green.

[Image: blue-green monitoring dashboard showing no activity for account-service-green]

This dashboard distinguishes between blue and green versions by applying a filtering condition, which matches node names that start with “account-service-blue” or “account-service-green.”

[Image: dashboard filtering condition matching node names]

These node-matching criteria match the node names assigned by the AppDynamics integration included in the Java buildpack, which uses the pattern <pcf-app-name>:<instance id>. Below is a list of the node names reporting under the accountservice tier that shows this pattern.

[Image: node names reporting under the accountservice tier]
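To make the “starts with” matching concrete, the sketch below applies the same prefix rule to a few node names in plain shell. It only illustrates the matching logic and is not how the AppDynamics dashboard filter is implemented. The function name `node_color` and the sample node names are hypothetical, chosen to follow the <pcf-app-name>:<instance id> pattern.

```shell
#!/bin/sh
# Classify an APM node name of the form <pcf-app-name>:<instance id>
# using a "starts with" prefix match.
node_color() {
  case "$1" in
    account-service-blue*)  echo "blue" ;;
    account-service-green*) echo "green" ;;
    *)                      echo "other" ;;
  esac
}

# Hypothetical node names following the <pcf-app-name>:<instance id> pattern.
for node in account-service-blue:0 account-service-green:3 order-service:1; do
  echo "$node -> $(node_color "$node")"
done
```

Because the match is on the app-name prefix, every instance of a given color lands in the same group regardless of its instance id.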

To complete the blue-green deployment, we use the cf map-route and cf unmap-route commands to switch routing from the blue version to the green version.

$ cf map-route account-service-green apps.<pcf-domain> --hostname prod-account-service

$ cf unmap-route account-service-blue apps.<pcf-domain> --hostname prod-account-service

This instructs the CF router to send all requests for prod-account-service.apps.<pcf-domain> (the route used by the order-service app) to the green version. At this point, we want to evaluate the green version’s performance impact on the order-service as quickly as possible.

[Image: blue-green dashboard after the route switch]

Our blue-green dashboard shows the traffic has indeed switched from the blue to the green nodes, but performance has degraded. Our blue version of the account-service was averaging well below 100 ms, but the green version is showing an uptick to around 150 ms (artificially introduced for the sake of example).

We see a proportional impact on the order-service, which is taking an additional 100 ms to process requests. This could be a case where a rollback is necessary, which again is straightforward using cf map-route and cf unmap-route.
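A rollback is the same route switch with the apps reversed. The sketch below prints the two commands as a dry run rather than executing them. The function name `switch_route_cmds` is hypothetical, and `apps.example.com` is a placeholder standing in for apps.<pcf-domain>.

```shell
#!/bin/sh
# Print (dry run) the CF CLI commands that move the production route
# from one app to another. The new target is mapped before the old one
# is unmapped, so the route always has at least one app behind it.
switch_route_cmds() {
  from_app="$1"
  to_app="$2"
  domain="$3"
  host="$4"
  echo "cf map-route $to_app $domain --hostname $host"
  echo "cf unmap-route $from_app $domain --hostname $host"
}

# Rollback: send traffic back from green to blue.
# "apps.example.com" is a placeholder for apps.<pcf-domain>.
switch_route_cmds account-service-green account-service-blue \
  apps.example.com prod-account-service
```

Printing the commands first gives an operator or pipeline a chance to review them. Once reviewed, the output could be piped to sh to run it.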

Baselines and Health Rules

Rather than relying strictly on dashboards to decide whether to roll back a deployment, we can establish thresholds in health rules that compare performance to baselines based on historical behavior.

For example, we could create health rules with thresholds based on the baseline or average performance of the order-service, and alert if the performance exceeds that baseline by two standard deviations.
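The arithmetic behind a “two standard deviations above baseline” threshold can be sketched as follows. This only illustrates the calculation and is not how AppDynamics evaluates health rules internally. The function name `breaches_baseline` and the sample response times are hypothetical, and the sketch assumes the population standard deviation.

```shell
#!/bin/sh
# Decide whether a response time exceeds "baseline mean + 2 standard
# deviations". Usage: breaches_baseline CURRENT_MS SAMPLE_MS...
# Prints "alert" or "ok". Uses the population standard deviation.
breaches_baseline() {
  current="$1"
  shift
  printf '%s\n' "$@" | awk -v cur="$current" '
    { sum += $1; sumsq += $1 * $1; n++ }
    END {
      mean = sum / n
      var = sumsq / n - mean * mean
      if (var < 0) var = 0          # guard against rounding error
      limit = mean + 2 * sqrt(var)
      if (cur > limit) print "alert"; else print "ok"
    }'
}

# Hypothetical baseline samples (ms) for the order-service, followed by
# a check of a much slower current value against that baseline.
breaches_baseline 180 75 80 77 74 79
```

With a tight baseline like the one in this example, even a modest slowdown crosses the threshold, so the choice of baseline window affects how sensitive the alert is.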

[Image: health rule configuration with baseline-based thresholds]

When we deployed the green version, we were quickly alerted to a performance degradation of the order-service, as shown in our dashboard:

[Image: dashboard showing the order-service performance alert]

We also defined a health rule that focuses on the aggregate baseline of the account-service (including average performance of all blue and green versions) to determine if the latest deployment of the account-service is behaving poorly. Again, this produced an alert based on our poorly performing green version:

[Image: health rule alert for the account-service]

The ability to identify slower-than-normal transactions and capture detailed diagnostic data is also critical to finding the root cause.

[Image: slower-than-normal transactions with diagnostic data]

In the case of the slower account-service version, we can drill down within snapshots to the specific lines of code responsible for the slowness.

[Image: snapshot drill-down to the lines of code responsible for the slowness]

Monitoring Impact on a Business Process

In the previous dashboards, we tracked the impact of an updated microservice on clients and the overall application. If the microservice is part of a larger business process, it would also be important to compare the performance of the business process before and after the microservice was updated via blue-green deployment.

In the dashboard below, the order microservice, which depends on the account microservice, may be part of a business process that involves converting offers to orders. In this example, conversion rate is the key performance indicator we want to monitor, and a negative impact on conversion rates would be cause to consider a rollback.

The Power of APM on PCF

While PCF makes deploying microservices relatively simple, monitoring the impact of updates is complex. It’s critical to have a powerful monitoring solution like AppDynamics, which can quickly and automatically spot performance anomalies and identify impacts on service clients and overall business processes.