Deploying the Pod and Scaling Up

The pod you’re about to deploy contains two containers. The first, Cloud SQL Proxy, establishes a connection to the Cloud SQL instance using permissions granted by the Google service account.

The second container holds the application itself. It is unaware that it is running on Google Cloud or inside a Kubernetes cluster; all it knows is that it needs to connect to a database, and the connection details it requires are supplied through environment variables.
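To make this concrete, here is a sketch of what the pod specification could look like, written as a Deployment whose pod template holds the two containers. The names, image paths, and database connection values are illustrative assumptions, not the exact values from the accompanying repository:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: fact-service
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: fact-service
      template:
        metadata:
          labels:
            app: fact-service
        spec:
          # Kubernetes service account bound to the Google service account
          # (e.g., via Workload Identity), which grants the proxy its permissions
          serviceAccountName: fact-service
          containers:
          - name: fact-service              # the application container
            image: us-docker.pkg.dev/PROJECT_ID/fact-repo/fact-service:latest
            ports:
            - containerPort: 8080
            env:                            # connection details the app needs
            - name: DB_HOST
              value: "127.0.0.1"            # the proxy listens on localhost
            - name: DB_PORT
              value: "5432"
            - name: DB_NAME
              value: facts
          - name: cloud-sql-proxy           # the Cloud SQL Proxy sidecar
            image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:latest
            args:
            - "--port=5432"
            - "PROJECT_ID:REGION:INSTANCE_NAME"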

Scaling with a Horizontal Pod Autoscaler

In GKE Autopilot, as in other Kubernetes distributions, the number of pods backing a service is not scaled up and down automatically by default, as it is in Cloud Run. Instead, you scale the number of pods using a HorizontalPodAutoscaler, which adjusts the pod count based on the pods' CPU usage. This also differs slightly from Cloud Run in what triggers scaling: new pods are created when a CPU or memory usage threshold is reached, rather than in response to the number of incoming requests.

In the k8s directory, autoscaler.yaml defines the autoscaler. It is configured to scale between 1 and 10 pods based on their CPU usage, measured over a 30-second window, with a target of 50%. If the pods' CPU usage stays above 50% for 30 seconds, a new pod is created; if it stays below 50% for 30 seconds, a pod is removed.
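A HorizontalPodAutoscaler matching that description might look roughly like the following. The target Deployment name is an assumption, and the 30-second window is expressed here through the HPA's behavior settings, which is one way to model it:

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: fact-service
    spec:
      scaleTargetRef:                 # the workload to scale
        apiVersion: apps/v1
        kind: Deployment
        name: fact-service
      minReplicas: 1
      maxReplicas: 10
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 50    # target 50% average CPU
      behavior:
        scaleUp:
          stabilizationWindowSeconds: 30
        scaleDown:
          stabilizationWindowSeconds: 30

After applying it with kubectl apply -f k8s/autoscaler.yaml, you can watch the autoscaler react to load with kubectl get hpa --watch.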

This helps ensure that there is sufficient capacity to handle requests, but it is not a guarantee: a sudden spike in traffic may arrive faster than new pods can be created.

However, because GKE Autopilot automatically scales the number of nodes in the cluster, there will likely be enough capacity to absorb the load.

Exposing with a Load Balancer

When using Cloud Run, you did not need to expose the application yourself; it was automatically made available on the internet through a load balancer. With GKE Autopilot, you need to expose the application to the internet using a Kubernetes load balancer and an ingress controller.

GKE Autopilot has an ingress controller built in, so you don't need to configure NGINX or anything similar yourself. You use it by creating an Ingress resource and then annotating your Service to use the ingress controller.

This is a point where you take the generic Kubernetes configuration and annotate it with Google Cloud-specific configuration. In this case, annotate the Service for the fact service so that the built-in ingress controller routes traffic to it:
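On GKE, the usual way to do this is the NEG annotation on the Service, which enables container-native load balancing so the built-in ingress controller can route traffic directly to the pods. A sketch, with illustrative names, paired with a minimal Ingress resource:

    apiVersion: v1
    kind: Service
    metadata:
      name: fact-service
      annotations:
        # route traffic to pods via network endpoint groups,
        # as used by GKE's built-in ingress controller
        cloud.google.com/neg: '{"ingress": true}'
    spec:
      type: ClusterIP
      selector:
        app: fact-service
      ports:
      - port: 80
        targetPort: 8080
    ---
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: fact-service
    spec:
      defaultBackend:
        service:
          name: fact-service
          port:
            number: 80

Once applied, GKE provisions an external HTTP(S) load balancer for the Ingress; kubectl get ingress shows the external IP when it is ready.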

Testing the service through this external IP, I saw sub-100ms response times, substantially better than with Cloud Run. It is a useful test for comparing how GKE and Cloud Run perform for different workloads.