CloudBuild GKE Autopilot Private Cluster on New VPC

This method creates a brand new VPC dedicated to the Cloud Build Private Pool. It gives the Private Pool a fixed external IP without interfering with the existing VPC.

Scenario

  • You need all network traffic to stay internal but still want the convenience of Autopilot GKE, so a Private Cluster is needed
  • You want to connect to CloudSQL over a private IP without using CloudSQL Auth Proxy, which requires Autopilot GKE to be a Private Cluster
  • The red box in the diagram below is the scope of this implementation. GKE uses a private cluster, and the CloudBuild at the bottom uses a private pool with an independent VPC. The CloudBuild at the top, outside the red box, uses a public pool for CI. Details can be found in CloudBuild triggered by Pub/Sub

CloudBuild GKE Autopilot Private Cluster Architecture

Process and Explanation

  1. Create a VPC network, enable Private Google Access (PGA) on the subnet, and add a NAT Gateway, so the pods in GKE use the NAT Gateway for outbound internet traffic
  2. Create a GKE Autopilot Private Cluster
  3. (Optional) Use a GCE VM as a bastion host; its service account must have the Kubernetes Engine Developer role to get the GKE credentials
  4. Create an Artifact Registry repository, which the GKE Private Cluster can reach directly through PGA
  5. Create a VPC network (red box in the diagram above) dedicated to CloudBuild, along with the necessary routes
  6. Create Cloud Build with a Private Pool so that traffic stays within the private network and leaves from a fixed IP; this requires an independent VM acting as a bastion between CloudBuild and GKE
    • The path becomes CloudBuild (internal IP) -> VM (static internal IP & static public IP) -> GKE (public IP)
    • When setting up the private pool for CloudBuild, do not check the external IP option, so traffic stays within the VPC. This also means the pool cannot reach the internet directly; for steps that install packages, it is recommended to run them separately from the private pool
  7. Reserve the following two network segments so that no other service uses them:
    • CloudBuild reserves 192.168.10.0/24
    • Docker bridge reserves 172.17.0.0/16 (GCP resources must avoid this segment)

Execution Steps

Create GKE VPC

This VPC provides outbound access for the GKE Private Cluster and hosts the internal services. It is not used for the Cloud Build Private Pool's external access.

Create a new VPC

Enter the network name and choose Custom in Subnets

Create new VPC network

  1. Enter a name
  2. Enter a custom subnet
  3. Enable PGA
  4. Done

Create new VPC subnet

Click Create to create the VPC

Create new VPC
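The console steps above can also be scripted. The following is a minimal sketch using gcloud, assuming hypothetical names (gke-network, gke-subnet) and a hypothetical subnet range that are not taken from this guide. The run helper only prints each command unless DRY_RUN=0 is set, so the sketch can be reviewed before anything is created.

```shell
# Hypothetical names and range; replace with your own values.
NETWORK="gke-network"
SUBNET="gke-subnet"
REGION="asia-east1"
SUBNET_RANGE="10.0.0.0/24"

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

create_gke_vpc() {
  # Custom mode: no subnets are created automatically.
  run gcloud compute networks create "$NETWORK" --subnet-mode=custom
  # Subnet with Private Google Access (PGA) enabled.
  run gcloud compute networks subnets create "$SUBNET" \
    --network="$NETWORK" --region="$REGION" --range="$SUBNET_RANGE" \
    --enable-private-ip-google-access
}

create_gke_vpc
```

Run it once as-is to inspect the commands, then again with DRY_RUN=0 to apply them.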

Set up IAP Firewall

Ensure IAP can pass through the firewall by creating a rule to allow 35.235.240.0/20.

Set up new firewall rule

Click Create

Create new firewall rule
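As a command-line counterpart to the firewall rule above, here is a hedged sketch. The rule name and network name are hypothetical, and it assumes only SSH (tcp:22) needs to pass through IAP, which this guide does not state explicitly. The run helper prints the command unless DRY_RUN=0 is set.

```shell
# Hypothetical names; the source range is the IAP range from this guide.
NETWORK="gke-network"
RULE_NAME="allow-iap-ssh"
IAP_RANGE="35.235.240.0/20"

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

create_iap_rule() {
  # Assumption: only SSH is tunnelled through IAP, so only tcp:22 is opened.
  run gcloud compute firewall-rules create "$RULE_NAME" \
    --network="$NETWORK" --direction=INGRESS --action=ALLOW \
    --rules=tcp:22 --source-ranges="$IAP_RANGE"
}

create_iap_rule
```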

Set up NAT Gateway

Create a NAT Gateway so all traffic can go out through a specific IP.

Set up NAT Gateway

  1. Enter the gateway name
  2. Select VPC
  3. Choose the region
  4. Create a router

Set up NAT Gateway

Enter the router name and click Create

Create router

Finally, create Cloud NAT

Create NAT Gateway
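The NAT Gateway setup above could be sketched with gcloud as follows. All resource names are hypothetical. The sketch also assumes a reserved static address is wanted so that the outgoing IP stays fixed; the console screenshots may have used automatic allocation instead. The run helper prints the commands unless DRY_RUN=0 is set.

```shell
# Hypothetical names; replace with your own values.
NETWORK="gke-network"
REGION="asia-east1"
ROUTER="gke-router"
NAT="gke-nat"
NAT_IP_NAME="gke-nat-ip"

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

create_nat_gateway() {
  # Assumption: reserve a static external IP so the egress address never changes.
  run gcloud compute addresses create "$NAT_IP_NAME" --region="$REGION"
  # Cloud NAT is configured on a Cloud Router in the same region as the subnet.
  run gcloud compute routers create "$ROUTER" \
    --network="$NETWORK" --region="$REGION"
  # NAT all subnet ranges through the reserved address.
  run gcloud compute routers nats create "$NAT" \
    --router="$ROUTER" --region="$REGION" \
    --nat-all-subnet-ip-ranges --nat-external-ip-pool="$NAT_IP_NAME"
}

create_nat_gateway
```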

Create Artifact Registry

Click + Create

Create Artifact Registry

Enter the name, choose Docker format, and select the region to create.

Create repository
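For reference, a possible gcloud equivalent of the repository creation above. The repository name docker-repo is only an illustrative choice, and the run helper prints the command unless DRY_RUN=0 is set.

```shell
# Illustrative name; replace with your own repository name.
REPO="docker-repo"
REGION="asia-east1"

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

create_repo() {
  # Docker-format repository in the same region as the cluster.
  run gcloud artifacts repositories create "$REPO" \
    --repository-format=docker --location="$REGION"
}

create_repo
```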

Create GKE autopilot Private Cluster

Create GKE

  1. Enter the name
  2. Select the region
  3. Set up the network

Set up GKE cluster

  1. Select the network and subnet
  2. Choose private cluster
  3. Check the option to allow access to the control plane through its external IP address
  4. Check the options to allow connections from other regions and to restrict control plane access to specific external IPs (authorized networks)
  5. Create

Set up GKE cluster network
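A rough command-line sketch of the cluster settings above, under stated assumptions: the cluster, network, and subnet names are hypothetical, and 203.0.113.10/32 is a documentation-range placeholder for the external IP allowed to reach the control plane. The option for connections from other regions is not covered here. The run helper prints the command unless DRY_RUN=0 is set.

```shell
# Hypothetical names and a placeholder CIDR; replace with your own values.
CLUSTER="dev-cluster"
REGION="asia-east1"
NETWORK="gke-network"
SUBNET="gke-subnet"
AUTHORIZED_CIDR="203.0.113.10/32"

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

create_private_autopilot_cluster() {
  # Private nodes have no external IPs; the public control plane endpoint
  # stays enabled but is restricted to the authorized networks.
  run gcloud container clusters create-auto "$CLUSTER" \
    --region="$REGION" --network="$NETWORK" --subnetwork="$SUBNET" \
    --enable-private-nodes \
    --enable-master-authorized-networks \
    --master-authorized-networks="$AUTHORIZED_CIDR"
}

create_private_autopilot_cluster
```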

(Optional) Create Bastion VM

If you want the control plane to have no external access, you can use a VM in the same subnet as a bastion. In a production environment, you can instead use a VPN to connect from on-premises to the private cluster control plane.

Create Service Account

To allow the bastion VM to obtain GKE credentials and operate the cluster, create a service account and grant it the required roles.

Create service account

Enter the name, click Create and continue.

Set up service account

Grant the following roles, then finish:

  • Kubernetes Engine Developer
  • Logs Writer
  • Monitoring Metric Writer
  • Monitoring Viewer
  • Stackdriver Resource Metadata Writer
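The role list above maps to the following role IDs. This is a sketch only: the service account name bastion-sa and the project placeholder are hypothetical, and the run helper prints the commands unless DRY_RUN=0 is set.

```shell
# Hypothetical service account name and project placeholder.
PROJECT_ID="YOUR_PROJECT_ID"
SA_NAME="bastion-sa"
SA_EMAIL="${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

create_bastion_sa() {
  run gcloud iam service-accounts create "$SA_NAME" --project="$PROJECT_ID"
  # Role IDs for: Kubernetes Engine Developer, Logs Writer,
  # Monitoring Metric Writer, Monitoring Viewer,
  # Stackdriver Resource Metadata Writer.
  for role in \
    roles/container.developer \
    roles/logging.logWriter \
    roles/monitoring.metricWriter \
    roles/monitoring.viewer \
    roles/stackdriver.resourceMetadata.writer
  do
    run gcloud projects add-iam-policy-binding "$PROJECT_ID" \
      --member="serviceAccount:${SA_EMAIL}" --role="$role"
  done
}

create_bastion_sa
```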

(Optional) Provide Artifact Registry Permissions to Bastion VM service account

  • Add the previously created service account to Artifact Registry with the Artifact Registry Writer role.
  • Check the previously created repo, click show info panel, and then click ADD PRINCIPAL

Set Artifact repo permissions

  1. Paste the previously created service account
  2. Add the Artifact Registry Writer permission
  3. Save

Set service account write permissions for artifact repo

Create GCE VM

Create a VM within the same internal network as the GKE Cluster without an external IP.

Create GCE instance

Enter the name and choose an appropriate VM size

Choose GCE instance size and name

  1. Choose the operating system, for instance Ubuntu 22.04
  2. Select the previously created service account

Set boot disk and service account

Expand Advanced options and Networking

Set GCE networking

  1. Select the VPC and subnet created earlier
  2. Disable external IP
  3. Done

Set GCE network interface

Create the VM

Create VM

Test

Click SSH

SSH VM Test

Install gcloud

Refer to this document for installing gcloud commands.

GCP Bastion VM
sudo su -
sudo apt-get update
sudo apt-get install apt-transport-https ca-certificates gnupg curl sudo
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo gpg --dearmor -o /usr/share/keyrings/cloud.google.gpg
echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
sudo apt-get update && sudo apt-get install google-cloud-cli
# Install kubectl and related packages
apt install kubectl google-cloud-sdk-gke-gcloud-auth-plugin -y

Connect to the GKE cluster using the internal IP

GCP Bastion VM
gcloud container clusters get-credentials CLUSTER_NAME --project=PROJECT_NAME --region=asia-east1 --internal-ip
# View kube config
cat ~/.kube/config

# Use kubectl to list the cluster nodes; the list may be empty at this point because no resources have been created yet
kubectl get no -o wide

Install docker and push image to Artifact Registry

Follow this guide to install Docker.

GCP Bastion VM
# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg

# Add the repository to Apt sources:
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update

# Install Packages
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

Docker push image to Registry

Click the repo that was created in Artifact Registry

Click Artifacts repo

Copy the path, then click SETUP INSTRUCTIONS

Click Artifacts setup instructions

Copy configure docker command

Paste the command in the console

GCP Bastion VM
gcloud auth configure-docker asia-east1-docker.pkg.dev

Expected result; during the process, input Y to agree:

GCP Bastion VM
Adding credentials for: asia-east1-docker.pkg.dev
After update, the following will be written to your Docker config file located at
[/root/.docker/config.json]:
{
  "credHelpers": {
    "asia-east1-docker.pkg.dev": "gcloud"
  }
}

Do you want to continue (Y/n)? Y

Docker configuration file updated.

Pull Ubuntu image and push it to Artifact Registry

GCP Bastion VM
docker pull ubuntu

# Tag with the copied path and name + tag
docker tag ubuntu:latest asia-east1-docker.pkg.dev/PROJECT_NAME/REPO_NAME/demo:latest

# Push to the repo
docker push asia-east1-docker.pkg.dev/PROJECT_NAME/REPO_NAME/demo:latest

Now, the image should be stored in Artifact Registry.

Artifact Registry push results

Deploy image to GKE

Use the following yaml for deployment and change the image path to the one used earlier for push.

GCP Bastion VM
echo 'apiVersion: v1
kind: Pod
metadata:
  name: ubuntu
spec:
  containers:
  - name: ubuntu
    image: asia-east1-docker.pkg.dev/PROJECT_NAME/REPO_NAME/demo:latest
    # Just spin & wait forever
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 30000; done;" ]' > sleep.yaml

Deploy to GKE Autopilot. The deployment can take several minutes to complete.

GCP Bastion VM
kubectl apply -f sleep.yaml
kubectl get po

Expected result:

GCP Bastion VM
NAME     READY   STATUS    RESTARTS   AGE
ubuntu   1/1     Running   0          4m21s

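Check if the external IP is the NAT Gateway IP. The whole command chain is passed to bash inside the pod, so every step runs in the pod rather than on the bastion VM.

GCP Bastion VM
kubectl exec ubuntu -- bash -c 'apt update > /dev/null && apt install -y curl > /dev/null && curl -s https://api.ipify.org'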

After testing, remove the pod

GCP Bastion VM
kubectl delete -f sleep.yaml

Create CloudBuild Private Pool VPC

The following steps are based on this and this document.

Create new VPC

Same method as creating “GKE VPC”, configure as follows:

  • VPC Name: build-network
  • Subnet Name: nat-subnet
  • IP: 10.1.0.0/24
  • Region: asia-east1

Create PSA

Open VPC to create IP RANGE

Create VPC PSA

Allocate IP range

After creation, ensure the firewall allows this segment

Set VPC IP range

Create Private connections to services

Click Create connection

PSA Setup

Select Google Cloud Platform as the connected service and check the newly allocated IP range

Create private connection

Enable Export custom routes on the newly created connection

Open Export custom route
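The three PSA steps above (allocate a range, create the private connection, export custom routes) might look like this in gcloud. The range name is hypothetical, and the address 192.168.124.0/24 is an assumption chosen to mirror the firewall source range used in this section; substitute the range you actually allocated. The run helper prints the commands unless DRY_RUN=0 is set.

```shell
# Hypothetical range name; the address mirrors the firewall source range
# in this section and is an assumption, not a required value.
NETWORK="build-network"
RANGE_NAME="build-pool-range"
RANGE_START="192.168.124.0"
RANGE_PREFIX="24"

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

create_private_service_access() {
  # 1. Allocate an internal range for the service producer network.
  run gcloud compute addresses create "$RANGE_NAME" --global \
    --purpose=VPC_PEERING --addresses="$RANGE_START" \
    --prefix-length="$RANGE_PREFIX" --network="$NETWORK"
  # 2. Create the private connection (VPC peering) using that range.
  run gcloud services vpc-peerings connect \
    --service=servicenetworking.googleapis.com \
    --ranges="$RANGE_NAME" --network="$NETWORK"
  # 3. Export custom routes so the pool learns the routes created later.
  run gcloud compute networks peerings update servicenetworking-googleapis-com \
    --network="$NETWORK" --export-custom-routes
}

create_private_service_access
```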

Firewall Settings

  • Name: allow-pool-to-nat
  • Direction of traffic: Ingress
  • Action on match: Allow
  • Targets: Specified target tags
  • Target tags: nat-gateway
  • Source IPv4 Ranges: 192.168.124.0/24
  • Protocols and ports: Allow all

Create CloudBuild

Set Cloud Build Network Environment

Enable the necessary API, then refer to this document to create a Private Pool.

Create a worker pool.

Create worker pool

  1. Enter name and select region
  2. Choose Private network
  3. Select the created VPC network (build-network)
  4. Do not assign an external IP to ensure connections through private VPC
  5. Create, using default settings of e2-medium and providing 100GB disk space

Set up worker pool
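A possible gcloud equivalent of the worker pool settings above. The pool name test-build and the project placeholder are illustrative, and the run helper prints the command unless DRY_RUN=0 is set.

```shell
# Illustrative pool name and project placeholder.
PROJECT_ID="YOUR_PROJECT_ID"
REGION="asia-east1"
NETWORK="build-network"
POOL="test-build"

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

create_worker_pool() {
  # --no-public-egress corresponds to not assigning an external IP,
  # so all traffic leaves through the peered VPC.
  run gcloud builds worker-pools create "$POOL" \
    --region="$REGION" \
    --peered-network="projects/${PROJECT_ID}/global/networks/${NETWORK}" \
    --worker-machine-type=e2-medium \
    --worker-disk-size=100GB \
    --no-public-egress
}

create_worker_pool
```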

Create Bridging VM

Refer to this document to ensure CloudBuild uses a fixed IP when connecting to GKE.

Create VM, enter the name, choose the region and an appropriate size, e.g., e2-micro

Create Bridging VM

  1. Expand advanced settings for the network setup
  2. Input network tags: direct-gateway-access, nat-gateway to be used later
  3. Enable IP forwarding

Set up Bridging VM network

  1. Select the created VPC network (build-network) and its subnetwork
  2. Reserve static IP
  3. Done

Set up Bridging VM network - reserve internal static IP

  1. Enter the name for the reserved IP
  2. Customize the IP or let it auto-allocate
  3. Reserve

Set up Bridging VM network - reserve external static IP

Also, reserve a public IP

Set up Bridging VM network - set reserved external static IP

Paste the following startup script in the startup script field

Bridging VM startup script
sysctl -w net.ipv4.ip_forward=1
iptables -t nat -A POSTROUTING -o $(ip addr show scope global | head -1 | awk -F: '{print $2}') -j MASQUERADE

Set up Bridging VM startup script

Create VPC Routes

To ensure all outgoing connections use a fixed IP, create 4 routes. The VPC already has a default 0.0.0.0/0 route to the internet gateway. Rather than deleting it, use 0.0.0.0/1 and 128.0.0.0/1: together they cover the same address space, but they are more specific and therefore take precedence over the default route.

  • Redirect all traffic to the Bridging VM, priority 100
    • 0.0.0.0/1 -> VM
    • 128.0.0.0/1 -> VM
  • Redirect traffic from the Bridging VM to the Default internet gateway (i.e., out through the VM’s public IP), priority 10 (a lower number, which means a higher priority than the first two routes)
    • VM (0.0.0.0/1) -> Default internet gateway
    • VM (128.0.0.0/1) -> Default internet gateway

Open VPC Route and create

  1. Select Routes
  2. Click ROUTE MANAGEMENT
  3. Create ROUTE

Create route

The first two routes redirect all traffic to the VM as follows

  1. Name them through-nat, through-nat2 respectively
  2. Select the VPC (build-network)
  3. Enter 0.0.0.0/1 and 128.0.0.0/1 respectively
  4. Set priority as 100
  5. Specify the next hop for all outgoing traffic and enter the previously created VM internal static IP

Click Create at the bottom and create two routes.

Set inbound route

The next two routes allow traffic from the VM to the external IP. Configure as follows:

  1. Name them direct-to-gateway1 and direct-to-gateway2
  2. Select the VPC (build-network)
  3. Enter 0.0.0.0/1 and 128.0.0.0/1 respectively
  4. Set priority as 10 (a lower number than 100, so these routes take precedence over the first two)
  5. Enter the instance tag direct-gateway-access, so these routes apply only to the Bridging VM
  6. Next hop: Default internet gateway

Click Create at the bottom and create two routes.

Set outbound route
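The four routes described above can be sketched with gcloud as follows. The route names, tags, priorities, and network come from this guide; the Bridging VM internal IP 10.1.0.2 is a hypothetical placeholder for the static internal address you reserved. The run helper prints the commands unless DRY_RUN=0 is set.

```shell
NETWORK="build-network"
# Hypothetical value: use the static internal IP reserved for the Bridging VM.
VM_INTERNAL_IP="10.1.0.2"

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Route for everything in the network: next hop is the Bridging VM.
route_to_vm() {
  run gcloud compute routes create "$1" --network="$NETWORK" \
    --destination-range="$2" --next-hop-address="$VM_INTERNAL_IP" \
    --priority=100
}

# Route for the tagged Bridging VM only: next hop is the internet gateway.
# Priority 10 is a lower number than 100, so it wins on the tagged VM.
route_to_gateway() {
  run gcloud compute routes create "$1" --network="$NETWORK" \
    --destination-range="$2" --next-hop-gateway=default-internet-gateway \
    --tags=direct-gateway-access --priority=10
}

create_routes() {
  route_to_vm through-nat 0.0.0.0/1
  route_to_vm through-nat2 128.0.0.0/1
  route_to_gateway direct-to-gateway1 0.0.0.0/1
  route_to_gateway direct-to-gateway2 128.0.0.0/1
}

create_routes
```

Without the tagged routes, the Bridging VM's own outbound traffic would match the first two routes and loop back to itself.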

Set GKE Authorized Network

Configure to allow connection from the previously created Bridging VM. Click on the created GKE Cluster.

Set GKE

Click Control plane authorized networks and Edit

Edit control plane

  1. Add a new entry
  2. Enter name and Bridging VM public static IP
  3. Save

Edit control plane authorized network
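The same change could be made from the command line. In this sketch the cluster name is illustrative and 203.0.113.20 is a documentation-range placeholder for the Bridging VM's public static IP. Note that the flag sets the complete list of authorized networks, so any existing entries that should remain must be included, separated by commas. The run helper prints the command unless DRY_RUN=0 is set.

```shell
# Illustrative cluster name and a placeholder IP; replace with your own.
CLUSTER="dev-cluster"
REGION="asia-east1"
BRIDGE_PUBLIC_IP="203.0.113.20"

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

authorize_bridging_vm() {
  # /32 restricts the entry to the single Bridging VM address.
  run gcloud container clusters update "$CLUSTER" --region="$REGION" \
    --enable-master-authorized-networks \
    --master-authorized-networks="${BRIDGE_PUBLIC_IP}/32"
}

authorize_bridging_vm
```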

Test External Connectivity

Open Cloud Shell and create a cloudbuild.yaml file. Paste the following content and replace the placeholder values (YOUR_PROJECT_NAME and YOUR_POOL_NAME).

This build verifies that the outgoing IP is the Bridging VM’s public static IP.

cloudbuild.yaml
steps:
- name: alpine
  args:
  - sh
  - -exc
  - |
    apk update
    apk add bind-tools
    dig @resolver1.opendns.com myip.opendns.com
options:
  pool:
    name: 'projects/YOUR_PROJECT_NAME/locations/asia-east1/workerPools/YOUR_POOL_NAME'

Run the following command

Shell
gcloud builds submit --no-source

Expected result; the address in the ANSWER SECTION (shown as x.x.x.x here) should be the VM’s external IP

Shell
Created [https://cloudbuild.googleapis.com/v1/projects/YOUR_PROJECT_NAME/locations/asia-east1/builds/f25e0644-3d37-4a8a-89ac-xxxxxxx].
Logs are available at [ https://console.cloud.google.com/cloud-build/builds;region=asia-east1/f25e0644-3d37-4a8a-89ac-xxxxxxx?project=xxxxxxx ].
----------------------------------------------------------- REMOTE BUILD OUTPUT -----------------------------------------------------------
starting build "f25e0644-3d37-4a8a-89ac-xxxxxxx"

FETCHSOURCE
BUILD
Pulling image: alpine
Using default tag: latest
latest: Pulling from library/alpine
Digest: sha256:51b67269f354137895d43f3b3d810bfacd39454xxxxxxx
Status: Downloaded newer image for alpine:latest
docker.io/library/alpine:latest
+ apk update
fetch https://dl-cdn.alpinelinux.org/alpine/v3.19/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.19/community/x86_64/APKINDEX.tar.gz
v3.19.0-148-g1780794db9c [https://dl-cdn.alpinelinux.org/alpine/v3.19/main]
v3.19.0-149-gf57fb478059 [https://dl-cdn.alpinelinux.org/alpine/v3.19/community]
OK: 22981 distinct packages available
+ apk add bind-tools
(1/14) Installing fstrm (0.6.1-r4)
(2/14) Installing krb5-conf (1.0-r2)
(3/14) Installing libcom_err (1.47.0-r5)
(4/14) Installing keyutils-libs (1.6.3-r3)
(5/14) Installing libverto (0.3.2-r2)
(6/14) Installing krb5-libs (1.21.2-r0)
(7/14) Installing json-c (0.17-r0)
(8/14) Installing nghttp2-libs (1.58.0-r0)
(9/14) Installing protobuf-c (1.4.1-r7)
(10/14) Installing libuv (1.47.0-r0)
(11/14) Installing xz-libs (5.4.5-r0)
(12/14) Installing libxml2 (2.11.6-r0)
(13/14) Installing bind-libs (9.18.19-r1)
(14/14) Installing bind-tools (9.18.19-r1)
Executing busybox-1.36.1-r15.trigger
OK: 15 MiB in 29 packages
+ dig @resolver1.opendns.com myip.opendns.com

; <<>> DiG 9.18.19 <<>> @resolver1.opendns.com myip.opendns.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 52347
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;myip.opendns.com. IN A

;; ANSWER SECTION:
myip.opendns.com. 0 IN A x.x.x.x

;; Query time: 28 msec
;; SERVER: 208.67.222.222#53(resolver1.opendns.com) (UDP)
;; WHEN: Fri Dec 29 01:31:47 UTC 2023
;; MSG SIZE rcvd: 61

PUSH
DONE
-------------------------------------------------------------------------------------------------------------------------------------------
ID: f25e0644-3d37-4a8a-89ac-xxxxxxx
CREATE_TIME: 2023-12-29T01:30:57+00:00
DURATION: 9S
SOURCE: -
IMAGES: -
STATUS: SUCCESS

Cloudbuild yaml file example

Replace the values in the substitutions section with those for your environment. This example deploys a specific container image to GKE and can be combined with CloudBuild triggered by Pub/Sub for GKE deployment.

cloudbuild.yaml
---
steps:
- name: gcr.io/cloud-builders/gcloud
  entrypoint: gcloud
  args: ["container", "clusters", "get-credentials", "$_CLUSTER_NAME", "--region", "$_REGION_NAME", "--project", "$_PROJECT_ID"]

- name: gcr.io/cloud-builders/gcloud
  entrypoint: sh
  args: ["-c", "cat $_DEPLOY.template |sed -e 's%{{IMAGE}}%$_REGISTRY_NAME/$_PROJECT_ID/$_REGISTRY_REPO_NAME/$_REPO_NAME:$_TAG_NAME%g' > $_DEPLOY"]

- name: gcr.io/cloud-builders/gcloud
  entrypoint: kubectl
  args: ["apply", "-f", "$_DEPLOY", "-n", "$_NAMESPACE_NAME"]

options:
  env:
  - 'KUBECONFIG=/workspace/kubeconfig'
  logging: CLOUD_LOGGING_ONLY
  pool:
    name: 'projects/$_PROJECT_ID/locations/$_REGION_NAME/workerPools/$_WORKER_POOL_NAME'

substitutions:
  _PROJECT_ID: YOUR_PROJECT_ID
  _REGISTRY_NAME: YOUR_REGISTRY_LOCATION
  _REGISTRY_REPO_NAME: YOUR_REGISTRY_REPO_NAME
  _REPO_NAME: "YOUR_REPO_NAME"
  _TAG_NAME: "latest"
  _DEPLOY: kube-hello-change.yaml # YOUR K8S YAML
  _CLUSTER_NAME: YOUR_GKE_CLUSTER_NAME
  _REGION_NAME: asia-east1
  _NAMESPACE_NAME: default
  _WORKER_POOL_NAME: YOUR_WORKER_POOL_NAME

timeout: 6000s

Example

cloudbuild.yaml
---
steps:

- name: gcr.io/cloud-builders/gcloud
  entrypoint: gcloud
  args: ["container", "clusters", "get-credentials", "$_CLUSTER_NAME", "--region", "$_REGION_NAME", "--project", "$_PROJECT_ID"]

- name: gcr.io/cloud-builders/gcloud
  entrypoint: sh
  args: ["-c", "cat deploy/$_DEPLOY.template |sed -e 's%{{IMAGE}}%$_REGISTRY_NAME/$_PROJECT_ID/$_REGISTRY_REPO_NAME/$_REPO_NAME:$_TAG_NAME%g' > $_DEPLOY"]

- name: gcr.io/cloud-builders/gcloud
  entrypoint: kubectl
  args: ["apply", "-f", "$_DEPLOY", "-n", "$_NAMESPACE_NAME"]

options:
  logging: CLOUD_LOGGING_ONLY
  env:
  - 'KUBECONFIG=/workspace/kubeconfig'
  pool:
    name: 'projects/$_PROJECT_ID/locations/$_REGION_NAME/workerPools/$_WORKER_POOL_NAME'
substitutions:
  _PROJECT_ID: OOOOOOXXXXXXXX
  _REGISTRY_NAME: asia-east1-docker.pkg.dev
  _REGISTRY_REPO_NAME: docker-repo
  _REPO_NAME: "test-api"
  _TAG_NAME: "latest"
  _DIR_NAME: "./"
  _DEPLOY: kube-hello-change.yaml
  _CLUSTER_NAME: dev-cluster
  _REGION_NAME: asia-east1
  _NAMESPACE_NAME: default
  _WORKER_POOL_NAME: test-build

timeout: 6000s

kube-hello-change.yaml.template
apiVersion: v1
kind: Service
metadata:
  name: hello-test-service
  annotations:
    networking.gke.io/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  externalTrafficPolicy: Cluster
  selector:
    app: hello-test
  ports:
  - protocol: "TCP"
    port: 8080
    targetPort: 9999

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-test
spec:
  selector:
    matchLabels:
      app: hello-test
  replicas: 1
  template:
    metadata:
      labels:
        app: hello-test
    spec:
      containers:
      - name: test-pod
        image: {{IMAGE}}
        imagePullPolicy: Always
        ports:
        - name: http-server
          containerPort: 9999
        livenessProbe:
          httpGet:
            path: /actuator/health/liveness
            port: http-server
          initialDelaySeconds: 60
          periodSeconds: 5
        resources:
          limits:
            cpu: 500m
            ephemeral-storage: 1Gi
            memory: 2Gi
          requests:
            cpu: 500m
            ephemeral-storage: 1Gi
            memory: 2Gi
        readinessProbe:
          httpGet:
            path: /actuator/health/readiness
            port: http-server
          initialDelaySeconds: 60
          periodSeconds: 5