Scaling on Google App Engine

Last Updated: 24 Apr 2025

This blog post is part of the Writing An API series

On a recent mobile app project using Google Cloud Platform, we had issues with instances not scaling as expected. So we took a closer look to understand what was happening.

Instances

For context, we should start by defining instances: “...a virtual machine (VM) hosted on Google's infrastructure.”

On Google Cloud Platform, you are billed according to how many instances you are running and for how long. Getting the scaling right matters, because it means you only use resources when you need them and only pay for what you actually use.
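As a rough illustration of how instance time adds up, the sketch below totals the running time of a set of instances and multiplies it by an hourly rate. The rate and the uptimes are made-up placeholders, not real App Engine prices, and the function names are our own.

```python
# Hypothetical hourly rate, for illustration only; not a real App Engine price.
HOURLY_RATE = 0.05


def instance_hours(uptimes_minutes):
    """Total instance-hours for a list of per-instance uptimes in minutes."""
    return sum(uptimes_minutes) / 60


def estimated_cost(uptimes_minutes, hourly_rate=HOURLY_RATE):
    """Estimated bill: instance-hours multiplied by the hourly rate."""
    return instance_hours(uptimes_minutes) * hourly_rate


# Three instances that each ran for 20 minutes add up to the same
# instance-hours as one instance that ran for a full hour.
burst = [20, 20, 20]
steady = [60]
print(instance_hours(burst), instance_hours(steady))
```

The point is that an instance sitting idle counts exactly the same as one doing useful work, which is why unexpected long-running instances show up so quickly on the bill.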

App Engine Scaling

When we started out using App Engine, we used the basic scaling configuration as, at first glance, that seemed to suit our needs perfectly.

<basic-scaling>
 <max-instances>100</max-instances>
</basic-scaling>

Our scaling configuration.

Quite quickly we found that costs were spiralling and we seemed to have instances running permanently when we weren’t intentionally using them.

After a long investigation, we consulted Google Support, who informed us:

“The main issue is that App Engine starts a background thread (/_ah/background) in your module for logging purposes, and this background thread keeps running until the HTTP request times out.”

They then suggested we move to automatic scaling and set min idle instances to zero, so that the app scales down to no instances when there is no load.

Automatic scaling does not support background threads at all. This means that no background thread nonsense could stop our instances from scaling down. Although it felt like a workaround, we took their suggestion.

Basic versus automatic

Basic and automatic scaling appear to be broadly similar: both start instances in response to demand and shut them down when they go idle. The difference is that automatic scaling makes those decisions based on a few key metrics, such as request rate and response latencies.

https://cloud.google.com/appengine/docs/standard/java/an-overview-of-app-engine#scaling_types_and_instance_classes

<automatic-scaling>
 <min-idle-instances>0</min-idle-instances>
 <max-idle-instances>0</max-idle-instances>
</automatic-scaling>

Using this configuration we finally started to see the scaling behaviour we expected. Instances started under high load and shut down within 15 minutes of going idle. The pricing page gives more information, so take a look; you'll need to scroll down a little to the relevant section: https://cloud.google.com/appengine/pricing

High load vs cost

For us, cost was a significant consideration. So we theorised about how to optimise for cost in a high load scenario.

If you want your high load to be completed as quickly as possible, then you should just kick off all the tasks at the same time. This will cause App Engine to spin up loads of instances to get through your queues as quickly as it can.

However, each instance carries an inherent cost of up to 15 minutes of idle time before it shuts down, so the fewer instances you start, the lower the total billed instance time, up to a point. Therefore, if you have control over the load, it’s worth exploring strategically limiting it.
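To make the trade-off concrete, here is a minimal sketch of that reasoning. It assumes every task takes the same time, the work is split evenly across instances, and each instance is billed for about 15 idle minutes after its last task. The function names and numbers are illustrative, not measurements from our project.

```python
import math

# Assumed idle time billed before an instance shuts down.
IDLE_TAIL_MINUTES = 15


def wall_clock_minutes(tasks, task_minutes, instances):
    """Time until the last task finishes, with tasks split evenly."""
    return math.ceil(tasks / instances) * task_minutes


def billed_minutes(tasks, task_minutes, instances, idle_tail=IDLE_TAIL_MINUTES):
    """Billed instance-minutes: the work itself plus one idle tail per instance."""
    return tasks * task_minutes + instances * idle_tail


# Compare finishing time against billed time for 100 two-minute tasks.
for n in (1, 10, 100):
    print(n, wall_clock_minutes(100, 2, n), billed_minutes(100, 2, n))
```

Under these assumptions, 100 two-minute tasks on 100 instances finish in 2 minutes but bill 1,700 instance-minutes, while 10 instances take 20 minutes and bill 350. In this simplified model a single instance is always the cheapest, so in practice the limit is how long you can afford to wait for the work to finish.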

Enjoyed this post? Why not check out the next in the Writing An API series