
Installing Terraform – Deploying Skills Mapper

Terraform is a command-line tool that you can install on your local machine. It’s compatible with Windows, Mac, and Linux, and you can download it directly from the Terraform website. After downloading, you’ll need to add it to your system’s path to enable command-line execution. You can verify the installation by running terraform --version, which should return the installed version.
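
A minimal sketch for Linux or Mac, assuming you have downloaded and unzipped the binary into the current directory:

    sudo mv terraform /usr/local/bin/   # put the binary somewhere on your PATH
    terraform --version                 # should print the installed version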

Terraform makes use of plugins that allow it to communicate with the APIs of service providers like Google Cloud. Not surprisingly, in this setup, you will mainly be using the Google Cloud provider. Terraform is not perfect, though, and it is common to come across small limitations. The Skills Mapper deployment is no exception, so there are a few workarounds required.

Terraform Workflow

Using the Terraform tool has four main steps (a sketch of a typical run follows this list):

terraform init

Initialize the Terraform environment and download any plugins needed.

terraform plan

Show what Terraform will do: it checks the current state, compares it to the desired state, and reports the changes needed to get there.

terraform apply

Apply the changes to the infrastructure: Terraform makes the changes needed to reach the desired state.

terraform destroy

Destroy the infrastructure. Terraform will remove all the infrastructure it created.
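
As a sketch of a typical run, assuming the configuration lives in the terraform directory of the repository:

    cd terraform
    terraform init      # download the Google Cloud provider plugin and initialize state
    terraform plan      # preview the changes Terraform would make
    terraform apply     # create or update the infrastructure to match the configuration
    terraform destroy   # tear everything down again when you are finished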

Terraform Configuration

Terraform uses configuration files to define the desired state. For Skills Mapper, these are in the terraform directory of the GitHub repository. There are many files in this configuration, and they are separated into modules, which is Terraform’s way of grouping functionality for reuse.

Preparing for Terraform

Several prerequisites need to be in place before you can deploy using Terraform.

Creating Projects

First, you need to create two projects, an application project and a management project, as you did earlier in the book. Both projects must have billing enabled. The instructions for this are in Chapter 4.

Ensure you have the names of these projects available as environment variables (e.g., skillsmapper-application and skillsmapper-management, respectively):

    APPLICATION_PROJECT_ID=skillsmapper-application
    MANAGEMENT_PROJECT_ID=skillsmapper-management
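
If the Terraform configuration reads these values as input variables, one common pattern is to pass them via TF_VAR_ environment variables or -var flags. The variable names application_project_id and management_project_id are assumptions here and may differ from those used in the repository:

    export TF_VAR_application_project_id="${APPLICATION_PROJECT_ID}"
    export TF_VAR_management_project_id="${MANAGEMENT_PROJECT_ID}"
    # equivalently, on the command line:
    terraform plan -var "application_project_id=${APPLICATION_PROJECT_ID}" \
                   -var "management_project_id=${MANAGEMENT_PROJECT_ID}"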

Reintroducing Terraform – Deploying Skills Mapper

In most of this book, you have been using gcloud commands to deploy everything. If you wanted to ship the product, you could do what I have done in the book and produce a step-by-step guide to the commands. However, it is easy to make a mistake when following instructions. What would be much better is to automate all those commands in a way that could consistently deploy everything for you with a single command.

One option would be to put all the commands in shell scripts. However, when using gcloud commands you are effectively calling the Google Cloud API in the background. What is better is to use a tool that makes the same API calls but is designed for this type of automation. This is the principle of infrastructure as code (IaC).

In this appendix, you have the opportunity to set up everything discussed in this book in one go with automation.

Note

The code for this chapter is in the terraform folder of the GitHub repository.

Reintroducing Terraform

The tool designated for automating the creation of infrastructure in this context is Terraform, an open source offering from HashiCorp. Terraform exemplifies an IaC tool, a concept briefly explored in Chapter 5 when it was utilized to deploy the tag updater.

While Google Cloud offers a similar tool called Deployment Manager, it is limited to supporting only Google Cloud. On the other hand, Terraform’s applicability extends to all public clouds and various other types of infrastructure. This broader compatibility has made Terraform more widely accepted, even within the Google Cloud ecosystem.

To understand the distinction between using Terraform and manual methods like gcloud commands or shell scripts, consider the difference between imperative and declarative approaches:

Imperative approach

Using gcloud commands or shell scripts is an imperative method. Here, you act as a micromanaging manager, explicitly directing the Google Cloud API on what actions to perform and how to execute them.

Declarative approach

Terraform operates on a declarative principle. Instead of micromanaging each step, you define a specific goal, and Terraform takes the necessary actions to achieve it. This approach is similar to how Kubernetes functions; you declare the desired state, and the tool works to realize that state.

The declarative nature of Terraform allows for a more streamlined and efficient process, aligning the tool with the objectives without requiring detailed command over each step.
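
As a small, hedged illustration of the difference (the bucket name is a placeholder, and these are two alternative approaches rather than steps to run in sequence): imperatively, you tell gcloud exactly what to do; declaratively, you describe the desired result and let Terraform work out the steps.

    # Imperative: issue the command yourself
    gcloud storage buckets create gs://example-bucket --location=EU

    # Declarative: describe the desired state, then let Terraform converge to it
    cat > main.tf <<'EOF'
    resource "google_storage_bucket" "example" {
      name     = "example-bucket"
      location = "EU"
    }
    EOF
    terraform init && terraform apply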

What Terraform is effectively doing is taking the destination defined in a declarative configuration (written in HashiCorp Configuration Language, or HCL) and working out the route to get there, provisioning the entire secure environment. This is reproducible and repeatable, so if you wanted to have multiple environments with the same configuration (e.g., dev, QA, and prod), you could build them with the same recipe, ensuring a consistent product.

Terraform also allows you to specify variables and compute values to customize the deployment. It also understands the dependencies between resources and creates them in the right order. Most importantly, it keeps track of everything that is created; if you want to remove everything, it can clean up after itself.

The code used to define the desired state also acts as a way of documenting all the infrastructure. If anyone wants to understand all the infrastructure used in the system, the Terraform configuration is a central source of truth. As it is code, it can be shared in a source code repository and versioned with an audited history. This means developers can issue pull requests for changes, for example, rather than having to raise tickets with an operations team. It is a great example of how a tool enables DevOps or SRE practices.

This appendix is here to help you use Terraform to deploy your own Skills Mapper environment. It is not intended to go into Terraform in depth. For that, I recommend the Terraform documentation or Terraform: Up and Running (O’Reilly) by Yevgeniy Brikman.

Professional Certification – Going Further-2

If you have diligently worked through this book, I suggest starting with the Associate Cloud Engineer exam, progressing to the Professional Cloud Architect, and thereafter, tailoring your certification journey based on your interests and career aspirations. Although there is no rigid sequence for taking the exams, there is some overlap between them, and the more you undertake, the easier they become. For instance, once you’ve prepared for the Professional Architect exam, the Professional Developer exam does not require a great deal of additional preparation. Following is the full list of certifications available at the time of writing:

Cloud Digital Leader

Focuses on a foundational understanding of Google Cloud’s capabilities and their benefits to organizations

Associate Cloud Engineer

Highlights the hands-on skills needed for managing operations within Google Cloud

Professional Cloud Architect

Concentrates on the design, management, and orchestration of solutions using a comprehensive range of Google Cloud products and services

Professional Cloud Database Engineer

Addresses the design, management, and troubleshooting of Google Cloud databases, with an emphasis on data migrations

Professional Cloud Developer

Emphasizes the design, build, test, and deployment cycle of applications operating on Google Cloud

Professional Data Engineer

Designed for professionals constructing and securing data processing systems

Professional Cloud DevOps Engineer

Covers DevOps, SRE, CI/CD, and observability aspects within Google Cloud

Professional Cloud Security Engineer

Prioritizes the security of Google Cloud, its applications, data, and users

Professional Cloud Network Engineer

Concentrates on the design, planning, and implementation of Google Cloud networks, having significant overlap with security concepts

Professional Google Workspace Administrator

Targets professionals managing and securing Google Workspace, formerly known as G Suite

Professional Machine Learning Engineer

Serves those involved in the design, construction, and operationalization of machine learning models on Google Cloud

The exams are not easy—that is what makes them valuable—but they are not impossible either. Different people will have different preferences for how to prepare. When I have prepared for exams, I prefer to do a little, often: an hour of reading or watching a video in the morning followed by an hour of hands-on experimentation in the evening. I find that this helps me to retain the information and to build up my knowledge over time. As I get closer to the exam, I do more practice exams; Google provides example questions for each exam in its exam guide, which help you get used to the style of questions and identify any gaps in knowledge to work on.

I have a ritual of booking my exam for 10 AM and having Starbucks tea and fruit toast followed by a walk before the exam. I arrive or set up in plenty of time, so I am relaxed. When the exam starts, I recommend reading questions very carefully, as there is often a small detail that makes all the difference to the answer.

Sometimes a difficult question can use up time; in this case, I flag it and move on. I also flag any questions I am not completely sure about and come back later. At the end of the exam, I am usually much more confident about my answers.

Often, there will be a piece of information in one question that may unlock a difficult question earlier on. Most importantly, if you are not sure, make a guess. You will not be penalized for a wrong answer, but you will be penalized for not answering a question.

When you finish and submit your exam, you will get a provisional pass or fail. Google does not give you a score or a breakdown to tell you which questions you got wrong (unlike AWS, for example, which does). You will get an email a few days later with your final result. You may also receive a code to redeem for a gift from Google (at the time of writing and depending on the exam), which is a nice touch. You can also list your certification in the Google Cloud Certified Directory. For example, you can see my profile on the Directory site.

Tip

Resist the temptation to use exam dumps for preparation. These question compilations are often shared in violation of the exam’s confidentiality agreement and tend to be outdated and misleading. The optimal way to prepare is to tap into the vast amount of learning material available, get hands-on experience, and take the official practice exams.

I’ve interviewed candidates who relied on exam dumps, and it’s usually clear: they struggle with basic questions. These exams are meant to gauge your understanding and proficiency with the platform, not rote memorization of facts. Encountering a familiar question in the exam is not as gratifying as being able to answer based on a solid understanding and practical experience.

It is a great feeling when you pass, and if you find the experience useful, there are many other specialties. One thing to note is that certification expires after two years, so if you do many exams at once, you will need to do them all again in two years to stay certified. The exception is that the Cloud Digital Leader and Associate Cloud Engineer certifications are valid for three years. Good luck on your certification journey!

Professional Certification – Going Further-1

This book has aimed to lay a solid foundation for you to build upon. If you’ve come this far, you have covered a lot of ground, but there’s still much more to learn.

Fortunately, there’s a vast community of people who are eager for you to succeed and willing to lend a hand. Regardless of how good the platform is, the applications that run on it are only as good as the people who build them. The most daunting task any platform faces is not just attracting skilled individuals but also nurturing their success. This is true for Google Cloud as well; a scarcity of necessary skills can make organizations apprehensive about adopting the platform.

In 2021, for instance, Google pledged to equip 40 million people with Google Cloud skills. That is a huge number, equivalent to the entire population of California. From my perspective, Google is addressing this by promoting four key areas for Google Cloud learning:

  • Professional certification
  • Online learning resources
  • Community groups
  • Conferences and events

Professional Certification

Google, in line with other cloud providers, offers certifications on many facets of Google Cloud. These certifications are structured into general certifications and specialist certifications, which align with the common job roles in the industry.

Each certification requires passing an exam that is normally two hours long. The exam typically consists of 50–60 multiple-choice or multiple-select questions. However, don’t be fooled into thinking that the exams are easy. The questions are designed to test your knowledge and understanding of the platform, often requiring you to make a judgment on the best answer from several possible options. The questions are not designed to trick you but to make you think. They are not designed to test your ability to remember facts but to test your ability to apply your knowledge to solve problems.

A third-party provider administers these exams. Professional-level exams are priced at $200 plus tax (as of the time of writing); the Associate Cloud Engineer costs $125 and the Cloud Digital Leader is around $90. All these exams can be undertaken either at a testing center or from the comfort of your home, with a remote proctor overseeing the process via your webcam. Further information about the exams and registration can be found on the certification site.

The Cloud Digital Leader certification serves as the entry point. It is a foundational-level exam intended for individuals with no prior Google Cloud experience. It is a good place to start if you are new to Google Cloud; this certification is often pursued by less technical people wishing to gain a basic understanding of Google Cloud. Nonetheless, it requires a surprisingly broad understanding of the diverse products and services Google Cloud provides.

The Associate Cloud Engineer certification is the next tier, aimed at individuals with 6+ months of Google Cloud experience. It is a good starting point for developers or administrators and covers the basics of Google Cloud, requiring a comprehensive understanding of the various products and services offered by Google Cloud. This exam also includes the most hands-on skills, such as gcloud commands, while remaining multiple choice. Even though it is promoted as an associate rather than a professional-level qualification, there is a substantial amount of material to cover, and the knowledge gap is not as large as it might initially seem.

In this book, you have covered content applicable to the Associate Cloud Engineer exam, Professional Cloud Architect, and Professional Cloud Developer. You also touched on aspects of the Professional Cloud DevOps Engineer in Chapters 12 and 13. The Professional Cloud Architect certification covers the broadest scope of the Google Cloud Platform and is often deemed the most challenging of the exams. All professional-level exams recommend over a year of Google Cloud experience.

How Will This Solution Scale? – Scaling Up

Here, you have seen a mixture of cloud native and traditional technologies. Although GKE Autopilot is not serverless, it is cloud native. As demand increases, more instances of the fact service will be created by the horizontal autoscaler. As more instances are scheduled, the GKE Autopilot cluster will automatically add additional nodes to deal with the extra pods.

GKE Autopilot also appears considerably faster to service requests than the same container running on Cloud Run. This could be down to the way networking is configured, with requests reaching the service by a more direct route.

This solution will not scale to zero in the way Cloud Run does; at least one pod always needs to be running to service requests. Remember, too, that if demand suddenly increases, it will take a few minutes for the GKE Autopilot cluster to provision the extra nodes required to run the new pods and then for those pods to start.

While the service can be scaled almost indefinitely, the real bottleneck is the Cloud SQL database, which is not cloud native. There are two related limitations. The first is that the database cannot be dynamically scaled. You have to specify the tier of the machine used for the database, and while this can be changed manually with a database restart, it cannot change automatically in response to load. More importantly, there is a limit to the number of database connections from the instances of the services.

This means that if the instances increase without limit, they will exhaust the number of connections available to the database and fail to connect. For this reason, it is important to limit the number of instances so that the number (instances × connections per instance) is below the maximum number of connections available to the database.
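
As a quick back-of-the-envelope check, with purely illustrative numbers (the actual limit depends on the Cloud SQL tier and the service’s connection pool settings):

    MAX_DB_CONNECTIONS=100        # limit for the chosen Cloud SQL tier (illustrative)
    CONNECTIONS_PER_INSTANCE=10   # connection pool size per fact service instance (illustrative)
    echo $(( MAX_DB_CONNECTIONS / CONNECTIONS_PER_INSTANCE ))   # => 10, the safe ceiling on instances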

However, you have seen that with some minor adjustments, you can allow the fact service to work with Google Cloud Spanner, a cloud native database with the potential to scale far beyond the limitations of Cloud SQL, creating a full cloud native solution.

How Much Will This Solution Cost?

Unlike Cloud Run, GKE Autopilot does not have a cost per request; you will be billed for the pods running on the cluster and a cluster management fee per hour. At the time of writing, the first 720 hours of cluster management per month are included per account, so you effectively get one cluster free.

The cost of pods is based on the amount of CPU, memory, and ephemeral storage requested by scheduled pods. This is billed per second. The most significant cost is for CPU. Therefore, it is very important to make sure the resources you request for your pod are adequate but not excessive. Remember that a Kubernetes pod can use additional resources up to the limit specified; the requested resources are the ones that are reserved.
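
For example, you could set or adjust the requests on the deployment with kubectl; the deployment name and values here are illustrative, not the book’s actual settings:

    # On Autopilot you are billed for what the pod requests, so keep this adequate but not excessive
    kubectl set resources deployment fact-service --requests=cpu=250m,memory=512Mi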

As each pod is charged per second, it does not make sense to keep a pod running for a second longer than it needs to. Therefore, using horizontal autoscaling to dynamically increase and decrease the number of running pods to fit demand will help keep costs down.

The cost of Cloud Spanner in this minimal configuration is under $100 per month. That is still ten times the cost of a minimal Cloud SQL instance. However, another advantage of the cloud is that it allows you to experiment with services like advanced databases for short periods, without the massive outlay of money or effort you would have if you were to experiment on-premises. On the cloud, you just switch off the service again and stop paying, so if you wanted to try Spanner for an hour for a few cents, you can.

Summary

This chapter should have given you a glimpse of how you can go further in Google Cloud. It is a powerful platform with many services and features, and there is a lot more to learn.

For this project, you used the following services directly:

  • GKE Autopilot is used as the container runtime to run the container.
  • Cloud SQL is used as the database backend for the application.
  • Secret Manager is used to securely store the database password.
  • Cloud Spanner is used as an alternative database backend for the application.

Chapter 15 wraps up your Google Cloud journey and looks at some options for further learning.

Switching to Spanner – Scaling Up

Google Cloud Spanner is Google’s fully managed, scalable, relational database service. Cloud Spanner is designed to offer the transactional consistency of a traditional relational database plus the scalability and performance of a NoSQL database.

Unlike Cloud SQL, Cloud Spanner is cloud native and can scale horizontally and globally. Although it can be very expensive, it is a good fit for large-scale applications. While it is certainly overkill for the fact service at the moment, it is useful to demonstrate how to use it and how switching from Cloud SQL is possible.

The cloud-agnostic version of the fact service used in this chapter knows nothing about Google Cloud. Although it connects to a Cloud SQL database, it does so through a proxy. As far as the Spring application is concerned, there is a PostgreSQL instance running on localhost that it can talk to using the PostgreSQL wire protocol. The Cloud SQL Proxy takes care of all the networking, encryption, and authentication required.

While you can connect to Cloud Spanner natively using client libraries, it is also possible to connect via a proxy, similar to how you do with Cloud SQL. PGAdapter provides a PostgreSQL-compatible interface to Cloud Spanner, so again the client application can treat it as a PostgreSQL database running on localhost. There are several options for running PGAdapter: as a standalone Java process, as a Java library, or as a Docker container. As the fact service runs on Kubernetes, the easiest option is to use the Docker image provided by Google as a sidecar container, in the same way as the Cloud SQL Proxy.
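
For reference, here is a hedged sketch of running PGAdapter locally as a container, using the image name and flags from the PGAdapter documentation; the project, instance, and database names are placeholders:

    docker run -d -p 5432:5432 \
      gcr.io/cloud-spanner-pg-adapter/pgadapter \
      -p my-project -i my-instance -d my-database   # then connect to localhost:5432 as if it were PostgreSQL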

Spanner instances are configured to have a specified number of processing units. This compute capacity determines the data throughput, queries per second (QPS), and storage limits of your instance. Capacity was previously specified as a number of nodes, with one node being equivalent to 1,000 processing units and one node being the smallest configuration.

This meant there was no cheap way of using Spanner. Now it is possible to specify a minimum of 100 processing units, which is equivalent to 0.1 nodes. This is a much more cost-effective way of using Spanner for small applications, development, and testing.
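
Creating such a minimal instance might look like the following; the instance name, region, and description are placeholders:

    gcloud spanner instances create skillsmapper \
      --config=regional-europe-west2 \
      --description="Skills Mapper" \
      --processing-units=100   # smallest configuration, equivalent to 0.1 nodes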

When creating a Google Cloud Spanner instance, you’ll often observe a notably quick provisioning time compared to Cloud SQL. This expedited setup stems from Spanner’s unique architecture, designed for horizontal scaling across a global and distributed system. Instead of allocating traditional “instance-specific” resources, as many relational databases do, Spanner simply reserves capacity within its pre-existing, distributed infrastructure. On the other hand, Cloud SQL requires time-intensive provisioning because it establishes traditional database instances with designated resources (CPU, memory, storage) based on the user’s configuration. With Spanner, you’re seamlessly integrating into a vast, already-established system, while with Cloud SQL, you’re carving out a more personalized, dedicated space.

Deploying the Pod– Scaling Up

The pod you’re about to deploy contains two containers. The first, Cloud SQL Proxy, establishes a connection to the Cloud SQL instance using permissions granted by the Google service account.

The second container holds the application. Unaware of its presence within Google Cloud or its deployment within a Kubernetes cluster, this application functions solely with the knowledge of its need to connect to a database. The connection details it requires are supplied through environment variables.
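
As an illustration only (these variable names follow Spring Boot conventions and are assumptions, not necessarily the names used in the book’s manifests), the connection details might be injected like this:

    kubectl set env deployment/fact-service \
      SPRING_DATASOURCE_URL=jdbc:postgresql://localhost:5432/facts \
      SPRING_DATASOURCE_USERNAME=fact-service   # the proxy sidecar handles the connection to Cloud SQL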

Scaling with a Horizontal Pod Autoscaler

In GKE Autopilot, as with other Kubernetes distributions, the number of instances (pods) for a service is not automatically scaled up and down by default, as it is in Cloud Run. Instead, you scale the number of pods using a HorizontalPodAutoscaler, which adjusts the pod count based on CPU usage. This is also slightly different from Cloud Run: new pods are created when a threshold of CPU or memory usage is reached, rather than scaling on the number of requests.

In the k8s directory, autoscaler.yaml defines the autoscaler. It is configured to scale the number of pods between 1 and 10 based on the CPU usage of the pods. The CPU usage is measured over 30 seconds, and the target CPU usage is 50%. This means that if the CPU usage of the pods is over 50% for 30 seconds, then a new pod will be created. If the CPU usage is below 50% for 30 seconds, then a pod will be deleted.
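
The autoscaler.yaml in the k8s directory is the source of truth; as a rough command-line equivalent (assuming the deployment is named fact-service, and leaving out the 30-second measurement window, which needs the full manifest):

    kubectl autoscale deployment fact-service --min=1 --max=10 --cpu-percent=50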

This helps ensure that there is sufficient capacity to handle requests, but it does not guarantee it. If there is a sudden spike in requests, the pods may not be able to keep up.

However, as GKE Autopilot will automatically scale the number of nodes in the cluster, there will likely be sufficient capacity to handle the requests.

Exposing with a Load Balancer

When using Cloud Run, you did not need to expose the application yourself; it was automatically exposed to the internet via a load balancer. For GKE Autopilot, you need to expose the application to the internet using a Kubernetes load balancer and an ingress controller.

GKE Autopilot does have an ingress controller built in, so you don’t need to worry about configuring NGINX or similar. You can use this by creating an ingress resource and then annotating your service to use the ingress controller.

This is a point where you take the generic Kubernetes configuration and add a Google Cloud-specific annotation: in this case, you annotate the Service for the fact service so that the built-in ingress controller can route traffic to it.
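
A hedged sketch, assuming the service is named fact-service and using GKE’s container-native load balancing (NEG) annotation; the exact annotation in the book’s k8s configuration may differ:

    kubectl annotate service fact-service cloud.google.com/neg='{"ingress": true}'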

For me, this returned sub-100 ms response times, which was substantially better than with Cloud Run. It is a useful test for seeing how GKE and Cloud Run compare for different workloads.