Deploying the Pod – Scaling Up

The pod you’re about to deploy contains two containers. The first, Cloud SQL Proxy, establishes a connection to the Cloud SQL instance using permissions granted by the Google service account.

The second container holds the application. Unaware of its presence within Google Cloud or its deployment within a Kubernetes cluster, this application functions solely with the knowledge of its need to connect to a database. The connection details it requires are supplied through environment variables.
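
To make this concrete, here is a sketch of what the containers section of the pod spec might look like. The image names, port, instance connection name, and environment variable names are illustrative assumptions, not the chapter's actual listing:

containers:
  - name: fact-service                      # the application container
    image: europe-docker.pkg.dev/example-project/facts/fact-service   # placeholder image
    ports:
      - containerPort: 8080                 # assumed application port
    env:
      - name: SPRING_DATASOURCE_URL         # assumed variable name; the app only sees
        value: jdbc:postgresql://localhost:5432/facts   # a local PostgreSQL endpoint
  - name: cloud-sql-proxy                   # sidecar that opens the Cloud SQL connection
    image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.8.0   # version tag is an example
    args:
      - "--port=5432"
      - "example-project:europe-west1:facts-instance"   # placeholder instance connection name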

Scaling with a Horizontal Pod Autoscaler

In GKE Autopilot, as with other Kubernetes distributions, the number of instances (pods) for a service is not automatically scaled up and down by default, as it is in Cloud Run. Instead, you scale the number of pods in the cluster using a HorizontalPodAutoscaler, which adjusts the number of pods based on their CPU usage. This also differs slightly from Cloud Run: new pods are created when a threshold of CPU or memory usage is reached, rather than in response to the number of incoming requests.

In the k8s directory, autoscaler.yaml defines the autoscaler. It is configured to scale the number of pods between 1 and 10 based on the CPU usage of the pods. The CPU usage is measured over 30 seconds, and the target CPU usage is 50%. This means that if the CPU usage of the pods is over 50% for 30 seconds, then a new pod will be created. If the CPU usage is below 50% for 30 seconds, then a pod will be deleted.
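
As a sketch of how such a policy is typically expressed, the resource names and the way the 30-second window is encoded below are assumptions rather than the chapter's listing:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: fact-service               # assumed name
  namespace: facts
spec:
  scaleTargetRef:                  # the deployment whose replica count is managed
    apiVersion: apps/v1
    kind: Deployment
    name: fact-service
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # add pods when average CPU exceeds 50%
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 30   # one way to express the 30-second window (assumption)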

This helps ensure that there is sufficient capacity to handle requests, but it does not guarantee it. If there is a sudden spike in traffic, the existing pods may be overwhelmed before new ones are ready.

However, as GKE Autopilot will automatically scale the number of nodes in the cluster, there will likely be sufficient capacity to handle the requests.

Exposing with a Load Balancer

When using Cloud Run, you did not need to expose the application yourself; it was automatically made available on the internet via a load balancer. For GKE Autopilot, you need to expose the application to the internet using a Kubernetes load balancer and an ingress controller.

GKE Autopilot does have an ingress controller built in, so you don’t need to worry about configuring NGINX or similar. You can use this by creating an ingress resource and then annotating your service to use the ingress controller.

This is a point where you take the generic Kubernetes configuration and annotate it with Google Cloud-specific configuration. In this case, you annotate the service configuration for the fact service so that it uses the built-in ingress controller.
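
The exact annotation is not reproduced in this excerpt. A common choice for GKE's built-in ingress is the NEG annotation, which enables container-native load balancing; treat the following as an assumption rather than the original listing:

metadata:
  annotations:
    # Lets the GKE ingress route directly to pod endpoints (network endpoint groups)
    cloud.google.com/neg: '{"ingress": true}'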

For me, this returned sub-100 ms response times, which was substantially better than with Cloud Run. It is a useful test for seeing how GKE and Cloud Run compare for different workloads.

Kubernetes Configuration – Scaling Up

The project also contains several generic Kubernetes YAML configurations in the k8s directory. These would be the same for any Kubernetes platform and define how to deploy the application:

namespace.yaml

A namespace is a way to group resources in Kubernetes much like a project does in Google Cloud. This configuration defines a facts namespace.
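
This is one of the smallest manifests in the project; a sketch of what it contains:

apiVersion: v1
kind: Namespace
metadata:
  name: facts          # groups all the fact service resources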

deployment.yaml

In Kubernetes, the smallest deployable unit is a pod. This is made up of one or more containers. In this configuration, the pod contains two containers: the fact service instance and the Cloud SQL Proxy. A deployment is a way to deploy and scale an identical set of pods. It contains a template section with the actual pod spec.
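
A skeleton of such a deployment might look like the following, with the names, labels, and images as placeholder assumptions and the containers abbreviated to those sketched earlier:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: fact-service
  namespace: facts
spec:
  replicas: 1                        # adjusted at runtime by the HorizontalPodAutoscaler
  selector:
    matchLabels:
      app: fact-service
  template:                          # the template section containing the actual pod spec
    metadata:
      labels:
        app: fact-service
    spec:
      serviceAccountName: fact-service            # the Kubernetes service account bound later
      containers:
        - name: fact-service                      # application container (see earlier sketch)
          image: europe-docker.pkg.dev/example-project/facts/fact-service   # placeholder
        - name: cloud-sql-proxy                   # Cloud SQL Proxy sidecar, plus the args shown earlier
          image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.8.0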

service.yaml

A Kubernetes service is a way to provide a stable network endpoint for the pod, with an IP address and port. If there are multiple pod instances, it also distributes traffic between them and stops routing traffic to a pod whose readiness probe fails (a failing liveness probe causes the container to be restarted instead).
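
A sketch of what this might look like, with the ports and labels as assumptions:

apiVersion: v1
kind: Service
metadata:
  name: fact-service
  namespace: facts
  annotations:
    cloud.google.com/neg: '{"ingress": true}'   # the GKE ingress annotation discussed earlier
spec:
  type: ClusterIP
  selector:
    app: fact-service            # matches the labels on the deployment's pods
  ports:
    - port: 80                   # port exposed by the service
      targetPort: 8080           # assumed container port of the fact service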

ingress.yaml

An ingress is a way to expose Kubernetes services to the internet. Here you are using it to expose the fact service.
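
A sketch, assuming the ingress simply sends all traffic to the fact service:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: fact-service
  namespace: facts
spec:
  defaultBackend:                # GKE's built-in controller provisions an external load balancer
    service:
      name: fact-service
      port:
        number: 80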

serviceaccount.yaml

A Kubernetes service account provides a stable identity for the pod and is used to grant it access to other services.
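
A sketch of the manifest, with the account name an assumption:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: fact-service             # the identity the pod runs as
  namespace: facts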

Implementation

With the preparation done, you are now ready to deploy the application to GKE Autopilot. First, you will deploy the application to connect to Cloud SQL, as you did with the Cloud Run implementation. Then you will configure Cloud Spanner and use that as an alternative.

Create a GKE Autopilot Cluster

Unlike Cloud Run, GKE Autopilot is not a serverless service; it is a Kubernetes cluster, albeit a highly managed one. You need to provision a cluster to run your application on.

If you have the kubectx command installed, you can run it to list all the contexts in your kubeconfig file, that is, all the clusters available to you. You should see the context for the cluster you just created and possibly any other Kubernetes clusters you have, for example, a local Minikube.

As GKE Autopilot is a fully managed Kubernetes cluster, the nodes are managed by Google, and you do not have access to them. For most people, this is a good thing, as managing a Kubernetes cluster yourself can get complicated very quickly.

Service Account Binding with Workload Identity

Kubernetes, like Google Cloud, has the concept of service accounts. These are a way to grant permissions to pods running in the cluster. You will create a Kubernetes service account and bind it to the Google service account you created earlier using Workload Identity. This will allow the pods to access the Cloud SQL instance.

This is not particularly straightforward, but once it is working, it provides a clean way of integrating workloads on Kubernetes with Google Cloud services without an explicit dependency on Google Cloud.
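
On the Kubernetes side, the binding amounts to an annotation on the service account from k8s/serviceaccount.yaml naming the Google service account it should act as; on the Google Cloud side, that Google service account also needs a roles/iam.workloadIdentityUser binding for the Kubernetes service account. The email below is a placeholder:

metadata:
  annotations:
    # Links the Kubernetes service account to the Google service account created earlier
    iam.gke.io/gcp-service-account: fact-service@example-project.iam.gserviceaccount.com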

Executing this command doesn't directly create the service account. Instead, it sends a declarative configuration to the Kubernetes API server. This configuration describes the desired state of the service account, namely how you intend it to exist within your Kubernetes environment.

The kubectl apply command allows you to assert control over the system configuration. When invoked, Kubernetes compares your input (the desired state) with the current state of the system, making the necessary changes to align the two.

To put it simply, by running kubectl apply -f k8s/serviceaccount.yaml, you’re instructing Kubernetes, “This is how I want the service account setup to look. Please make it so.”

Preparation – Scaling Up

There are some small changes to the fact service needed to prepare it for Kubernetes and Cloud Spanner.

Getting Ready for Kubernetes

For Cloud Run, you are not strictly required to configure health checks for the application. For GKE Autopilot, you will need to use the Kubernetes readiness and liveness probes to check the health of the application. This is a great way to ensure that the application is running correctly and is ready to receive traffic:

Liveness check

This indicates that a pod is healthy. If it fails, Kubernetes restarts the container.

Readiness check

This indicates that the application is ready to receive traffic. Kubernetes will not send traffic to the pod until the probe succeeds.

As your Spring Boot application takes several seconds to start, it is helpful to use the readiness probe to ensure the application is ready to receive traffic before Kubernetes sends any.

Fortunately, Spring Boot provides a health endpoint that you can use for this purpose. You can configure the readiness and liveness probes to use this endpoint.
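
Assuming Spring Boot Actuator is on the classpath, an application.yaml snippet along these lines exposes the liveness and readiness groups at /actuator/health/liveness and /actuator/health/readiness. Spring Boot enables these probes automatically when it detects it is running on Kubernetes; the property just makes the behaviour explicit and testable locally:

management:
  endpoint:
    health:
      probes:
        enabled: true            # expose the liveness and readiness health groups
  endpoints:
    web:
      exposure:
        include: health          # make the health endpoint available over HTTP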

You use these endpoints in the Kubernetes configuration to configure the readiness and liveness probes.
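
A sketch of how the probes might be declared on the fact service container in k8s/deployment.yaml; the port and timings are illustrative assumptions:

livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  initialDelaySeconds: 20        # give Spring Boot time to start
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5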

Getting Ready for Spanner

There are a few things to consider when using Spanner. Although it is PostgreSQL compatible, it is not fully PostgreSQL compliant, and this means there are some limitations.

The first is that it does not support sequences, so it is not possible to have the database automatically generate primary keys, as it did with Cloud SQL. The version of the fact service in this chapter therefore uses universally unique identifiers (UUIDs) for primary keys instead of an auto-incremented ID.

Hibernate, the ORM library the fact service uses, has a nice feature of automatically updating schemas. This is not supported by Spanner, so you need to manually create the schema. Fortunately, the single table is simple in this case, so it’s not a big issue. However, this does add an extra step to the deployment process.

In Google Cloud Spanner, you can use the TIMESTAMP data type to store timestamp values. The TIMESTAMP type offers precision up to nanoseconds, but it does not store time zone information as Cloud SQL does. This means there is more information in the LocalDateTime Java type than can be stored in Spanner's TIMESTAMP type.

To solve this issue, the common practice is to use two fields in your entity, one for the timestamp and another for the time zone. You store the timestamp as a String in a standardized format, like ISO 8601, and you store the time zone as another String. When you retrieve the data, you can parse the timestamp and apply the time zone. This is what has been done in this version of the fact service.

These are the kinds of limitations you need to be aware of when using Spanner; they are small but significant. Spanner is not a drop-in replacement for PostgreSQL: an application written to work with Cloud SQL for PostgreSQL will not necessarily work with Spanner. However, an application written to work within the limitations of Spanner's PostgreSQL interface will likely work with Cloud SQL for PostgreSQL. If you target only PostgreSQL, you will likely not be able to use Spanner without modification.

Tip

This is the trade-off you make when using a cloud native database: you get scalability and performance, but you lose some features of a traditional database. In this case, however, the benefits are large and the limitations relatively small.