GKE Cluster Setup

Creating a GKE cluster involves setting up the Google Cloud infrastructure (VPC network, IAM roles), creating the cluster control plane, configuring worker nodes (Standard mode) or enabling Autopilot mode, and connecting your local kubectl to the cluster. This guide covers the complete setup process from prerequisites to deploying your first application.

Prerequisites

Before creating a GKE cluster, ensure you have:

Google Cloud Account Requirements

  • Google Cloud Project - Active project with billing enabled
  • IAM Permissions - Ability to create clusters, node pools, and manage resources
  • Service Quotas - Sufficient service quotas for Compute Engine, VPC, and GKE
  • Region Selection - Choose a Google Cloud region where GKE is available
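
Once the gcloud CLI (installed below) is configured, region availability and quota headroom can be checked before creating anything; a quick sketch assuming us-central1 as the target region:

# List regions available to the project
gcloud compute regions list

# Review quota usage and limits for the target region
gcloud compute regions describe us-central1 --format="yaml(quotas)"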

Local Tools

Install these tools on your local machine:

kubectl:

# macOS
brew install kubectl

# Linux
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl

# Verify installation
kubectl version --client

gcloud CLI:

# macOS
brew install --cask google-cloud-sdk

# Linux
curl https://sdk.cloud.google.com | bash
exec -l $SHELL

# Verify installation
gcloud --version

Google Cloud Authentication:

# Login to Google Cloud
gcloud auth login

# Set default project
gcloud config set project YOUR_PROJECT_ID

# Verify authentication
gcloud auth list

Enable Required APIs:

# Enable GKE API
gcloud services enable container.googleapis.com

# Enable Compute Engine API (required for nodes)
gcloud services enable compute.googleapis.com
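
A quick sanity check that both APIs are active (filtering the enabled-services list):

# Confirm the APIs are enabled
gcloud services list --enabled | grep -E "container|compute"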

Understanding GKE Components

Before creating a cluster, understand what gets created:

graph TB
    subgraph gcp_resources[Google Cloud Resources Created]
        GKE[GKE Cluster] --> CP[Control Plane]
        GKE --> VPC[VPC Network]
        GKE --> FW[Firewall Rules]
        GKE --> IAM_SVC[Service Account]
    end
    subgraph node_resources[Node Resources - Standard Mode]
        NP[Node Pool] --> INSTANCE_GROUP[Managed Instance Group]
        INSTANCE_GROUP --> VMs[Compute Engine VMs]
    end
    subgraph autopilot[Autopilot Mode]
        AUTO[Autopilot] --> AUTO_NODES[Auto-Managed Nodes]
    end
    GKE --> NP
    GKE --> AUTO
    style GKE fill:#e1f5ff
    style CP fill:#fff4e1
    style NP fill:#e8f5e9
    style AUTO fill:#f3e5f5

GKE Cluster:

  • Control plane managed by Google Cloud
  • API endpoint for cluster access
  • Cluster configuration and version

VPC Network and Networking:

  • VPC network for cluster isolation
  • Subnets across zones
  • Firewall rules for traffic control
  • Route tables for traffic routing

Service Accounts:

  • Cluster Service Account - Permissions for cluster components
  • Node Service Account - Permissions for worker nodes to access GCP services

Node Pool (Standard Mode):

  • Managed Instance Group for worker nodes
  • Compute Engine VM instances running Kubernetes node components

Autopilot Mode:

  • Fully managed nodes
  • Automatic provisioning and scaling
  • No node pool management required
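
Once a cluster exists, each of these pieces can be inspected from the CLI; a few illustrative commands (cluster name and zone are placeholders):

# Cluster details: endpoint, network, addons, node pools
gcloud container clusters describe my-cluster --zone us-central1-a

# Node pools backing the cluster (Standard mode)
gcloud container node-pools list --cluster my-cluster --zone us-central1-a

# Compute Engine VMs created for the nodes
gcloud compute instances list --filter="name~gke-my-cluster"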

Creating a Cluster

There are three main ways to create a GKE cluster:

Method 1: gcloud CLI

The gcloud CLI is the official tool for GKE cluster creation:

Simple Cluster Creation (Standard Mode):

# Create cluster with default settings
gcloud container clusters create my-cluster \
  --zone us-central1-a \
  --num-nodes 3 \
  --machine-type n1-standard-2

This single command creates:

  • GKE cluster
  • Default node pool with 3 nodes
  • VPC network (if not specified)
  • Firewall rules
  • kubeconfig configuration
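
A quick way to confirm the result (get-credentials ran implicitly, so kubectl is already wired up):

# Confirm the cluster reached RUNNING state
gcloud container clusters list

# kubeconfig was written automatically; the nodes should appear
kubectl get nodes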

Simple Cluster Creation (Autopilot Mode):

# Create Autopilot cluster
gcloud container clusters create-auto my-autopilot-cluster \
  --region us-central1

Advanced Cluster Configuration:

# Create cluster with custom configuration
gcloud container clusters create production-cluster \
  --zone us-central1-a \
  --num-nodes 3 \
  --machine-type n1-standard-2 \
  --enable-autoscaling \
  --min-nodes 1 \
  --max-nodes 10 \
  --enable-autorepair \
  --enable-autoupgrade \
  --network my-vpc \
  --subnetwork my-subnet \
  --enable-private-nodes \
  --enable-private-endpoint \
  --enable-shielded-nodes \
  --enable-binary-authorization \
  --workload-pool PROJECT_ID.svc.id.goog

Method 2: Google Cloud Console

Creating via the Google Cloud Console provides a visual interface:

  1. Navigate to GKE:

    • Go to Google Cloud Console → Kubernetes Engine → Clusters → Create
  2. Choose Cluster Mode:

    • Standard mode (manage nodes yourself)
    • Autopilot mode (fully managed)
  3. Configure Cluster (Standard):

    • Cluster name
    • Kubernetes version
    • Location (zone or region)
    • VPC network and subnet
    • Private cluster options
    • Security options
  4. Configure Node Pool (Standard):

    • Machine type and size
    • Number of nodes
    • Auto-scaling configuration
    • Node labels and taints
  5. Create Cluster:

    • Click Create
    • Wait for cluster creation
  6. Configure kubectl:

    • Click Connect
    • Run the provided command

Method 3: Terraform

For infrastructure as code, use Terraform:

# main.tf
terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
  }
}

provider "google" {
  project = "my-project-id"
  region  = "us-central1"
}

resource "google_container_cluster" "primary" {
  name     = "my-cluster"
  location = "us-central1-a"

  remove_default_node_pool = true
  initial_node_count       = 1

  network    = google_compute_network.vpc.name
  subnetwork = google_compute_subnetwork.subnet.name

  # Private clusters must be VPC-native; the empty block lets GKE pick secondary ranges
  networking_mode = "VPC_NATIVE"
  ip_allocation_policy {}

  private_cluster_config {
    enable_private_nodes    = true
    enable_private_endpoint = true
    master_ipv4_cidr_block  = "172.16.0.0/28"
  }

  workload_identity_config {
    workload_pool = "my-project-id.svc.id.goog"
  }

  # Enable both the addon and enforcement so NetworkPolicy objects take effect
  network_policy {
    enabled  = true
    provider = "CALICO"
  }

  addons_config {
    network_policy_config {
      disabled = false
    }
  }
}

resource "google_container_node_pool" "primary_nodes" {
  name       = "general-pool"
  location   = "us-central1-a"
  cluster    = google_container_cluster.primary.name
  # Use initial_node_count with autoscaling; node_count would fight the autoscaler
  initial_node_count = 3

  autoscaling {
    min_node_count = 1
    max_node_count = 10
  }

  management {
    auto_repair  = true
    auto_upgrade = true
  }

  node_config {
    preemptible  = false
    machine_type = "n1-standard-2"

    workload_metadata_config {
      mode = "GKE_METADATA"
    }

    oauth_scopes = [
      "https://www.googleapis.com/auth/cloud-platform"
    ]
  }
}

resource "google_compute_network" "vpc" {
  name                    = "gke-vpc"
  auto_create_subnetworks = false
}

resource "google_compute_subnetwork" "subnet" {
  name          = "gke-subnet"
  ip_cidr_range = "10.0.0.0/24"
  region        = "us-central1"
  network       = google_compute_network.vpc.id
}

Apply with:

terraform init
terraform plan
terraform apply

Standard vs Autopilot Clusters

Standard Mode

Standard mode gives you full control over node configuration:

Characteristics:

  • You manage node pools and nodes
  • Full control over machine types and sizes
  • Manual or automatic node scaling
  • Node lifecycle management
  • Lower cost for large, predictable workloads

Use When:

  • Need specific node configurations
  • Have predictable workloads
  • Want full control over nodes
  • Need custom node images or configurations

Autopilot Mode

Autopilot mode provides fully managed nodes:

Characteristics:

  • Google Cloud manages nodes automatically
  • Pay only for requested resources (CPU, memory)
  • Automatic scaling and optimization
  • Enhanced security defaults
  • No node pool management

Use When:

  • Want simplified operations
  • Have variable workloads
  • Prefer pay-per-pod pricing
  • Don’t need specific node configurations

Cluster Configuration Options

Kubernetes Version

Choose a Kubernetes version supported by GKE:

# List available versions
gcloud container get-server-config --zone us-central1-a

# Create cluster with specific version
gcloud container clusters create my-cluster \
  --zone us-central1-a \
  --cluster-version 1.28.0-gke.100

Version Considerations:

  • Use a recent stable version for new features
  • Check GKE version support lifecycle
  • Consider upgrade path when choosing version
  • Test version compatibility with your applications
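
Instead of pinning an exact patch version, you can enroll the cluster in a release channel and let GKE roll upgrades through it; a minimal sketch:

# Create a cluster on the regular release channel
gcloud container clusters create my-cluster \
  --zone us-central1-a \
  --release-channel regular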

VPC Network Configuration

GKE requires a VPC network with specific configuration:

Subnet Requirements:

  • At least one subnet with a primary range for nodes
  • A secondary (alias IP) range for pods
  • A secondary range for Services

IP Address Planning:

graph TB
    VPC[VPC: 10.0.0.0/16] --> S1[Subnet: 10.0.1.0/24<br/>Primary Range]
    S1 --> N1[Nodes: 10.0.1.0/26]
    S1 --> P1[Pods: 10.10.0.0/14<br/>Secondary Range]
    style VPC fill:#e1f5ff
    style S1 fill:#fff4e1
    style P1 fill:#e8f5e9

VPC-Native Networking:

  • Pods get IP addresses from secondary IP ranges
  • No overlay networks required
  • Direct VPC connectivity
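
To configure this explicitly, create the subnet with named secondary ranges and point the cluster at them; a sketch where every name and CIDR is illustrative:

# Subnet with named secondary ranges for pods and Services
gcloud compute networks subnets create gke-subnet \
  --network my-vpc \
  --region us-central1 \
  --range 10.0.1.0/24 \
  --secondary-range pods=10.10.0.0/14,services=10.20.0.0/20

# VPC-native cluster using those ranges
gcloud container clusters create my-cluster \
  --zone us-central1-a \
  --network my-vpc \
  --subnetwork gke-subnet \
  --enable-ip-alias \
  --cluster-secondary-range-name pods \
  --services-secondary-range-name services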

Private Clusters

Configure private clusters for enhanced security:

# Create private cluster
gcloud container clusters create private-cluster \
  --zone us-central1-a \
  --enable-private-nodes \
  --enable-private-endpoint \
  --master-ipv4-cidr 172.16.0.0/28 \
  --network my-vpc \
  --subnetwork my-subnet

Private Cluster Features:

  • Private nodes (no external IPs)
  • Private endpoint (API server only accessible from VPC)
  • Enhanced security
  • Requires VPN or bastion host for access
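
If you need access from outside the VPC, one option is to omit --enable-private-endpoint and instead restrict the public endpoint to known networks; a sketch (the CIDR is a placeholder for your office or VPN range):

# Restrict the control plane endpoint to an allowlist of networks
gcloud container clusters update private-cluster \
  --zone us-central1-a \
  --enable-master-authorized-networks \
  --master-authorized-networks 203.0.113.0/24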

Initial Node Pool Setup (Standard Mode)

After creating the cluster, configure node pools:

# Create node pool
gcloud container node-pools create general-pool \
  --cluster my-cluster \
  --zone us-central1-a \
  --num-nodes 3 \
  --machine-type n1-standard-2 \
  --enable-autoscaling \
  --min-nodes 1 \
  --max-nodes 10 \
  --enable-autorepair \
  --enable-autoupgrade

Node Pool Configuration Options:

  • Machine types and sizes
  • Minimum, maximum, and initial node count
  • Auto-scaling configuration
  • Auto-repair and auto-upgrade
  • Spot or preemptible VMs for cost savings (see the example after this list)
  • Labels and taints
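
For example, a pool of Spot VMs labeled and tainted so that only workloads which tolerate preemption are scheduled onto it (all names are illustrative):

# Spot VM pool with a label and a taint for batch workloads
gcloud container node-pools create spot-pool \
  --cluster my-cluster \
  --zone us-central1-a \
  --spot \
  --num-nodes 2 \
  --node-labels=pool=spot \
  --node-taints=workload=batch:NoSchedule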

Autopilot Configuration

Autopilot clusters don’t require node pool configuration:

# Create Autopilot cluster (Workload Identity is enabled by default,
# so no --workload-pool flag is needed)
gcloud container clusters create-auto my-autopilot-cluster \
  --region us-central1 \
  --release-channel regular

Autopilot Features:

  • Automatic node provisioning
  • Automatic scaling
  • Enhanced security defaults
  • Pay-per-pod pricing
  • No node management

Cluster Authentication

Configure kubectl to access your cluster:

Get Cluster Credentials

# Get cluster credentials
gcloud container clusters get-credentials my-cluster \
  --zone us-central1-a

# For regional clusters
gcloud container clusters get-credentials my-cluster \
  --region us-central1

This updates ~/.kube/config with cluster credentials.

Verify Access

# Test cluster access
kubectl get nodes

# Should show your worker nodes (Standard mode)
NAME                                        STATUS   ROLES    AGE   VERSION
gke-my-cluster-default-pool-xxx-yyy        Ready    <none>   5m    v1.28.0-gke.100
gke-my-cluster-default-pool-xxx-zzz        Ready    <none>   5m    v1.28.0-gke.100
gke-my-cluster-default-pool-xxx-aaa        Ready    <none>   5m    v1.28.0-gke.100

Cloud IAM Integration

GKE integrates with Google Cloud IAM for authentication:

# Grant IAM permissions
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="user:[email protected]" \
  --role="roles/container.developer"

# Grant cluster admin
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="user:[email protected]" \
  --role="roles/container.clusterAdmin"

IAM Roles:

  • container.viewer - Read-only access to GKE resources
  • container.developer - Full access to Kubernetes API objects inside clusters
  • container.clusterAdmin - Create, update, and delete clusters and node pools
  • container.admin - Full management of clusters and their Kubernetes API objects
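
To check what a user has actually been granted, you can flatten and filter the project's IAM policy; a minimal sketch:

# Show roles bound to a specific user
gcloud projects get-iam-policy PROJECT_ID \
  --flatten="bindings[].members" \
  --filter="bindings.members:user:[email protected]" \
  --format="table(bindings.role)"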

Post-Setup Configuration

After cluster creation, configure essential components:

Enable Workload Identity

Enable Workload Identity for pod-level authentication:

# Enable Workload Identity on cluster
gcloud container clusters update my-cluster \
  --zone us-central1-a \
  --workload-pool PROJECT_ID.svc.id.goog

# Enable Workload Identity on node pool
gcloud container node-pools update default-pool \
  --cluster my-cluster \
  --zone us-central1-a \
  --workload-metadata=GKE_METADATA
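
Enabling Workload Identity only turns the feature on; pods authenticate by linking a Kubernetes ServiceAccount (KSA) to a Google service account (GSA). A sketch of the binding, where all names are placeholders:

# Allow the KSA to impersonate the GSA
gcloud iam service-accounts add-iam-policy-binding \
  GSA_NAME@PROJECT_ID.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME]"

# Annotate the KSA so GKE knows which GSA it maps to
kubectl annotate serviceaccount KSA_NAME \
  --namespace NAMESPACE \
  iam.gke.io/gcp-service-account=GSA_NAME@PROJECT_ID.iam.gserviceaccount.com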

Configure Network Policy

Enable network policies for pod-to-pod isolation:

# Enable the network policy add-on, then turn on enforcement
gcloud container clusters update my-cluster \
  --zone us-central1-a \
  --update-addons=NetworkPolicy=ENABLED

gcloud container clusters update my-cluster \
  --zone us-central1-a \
  --enable-network-policy
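
Enabling the feature enforces nothing by itself; isolation begins when you apply NetworkPolicy objects. A common starting point is a default-deny ingress policy per namespace, applied inline here as a minimal sketch:

# Deny all ingress traffic to pods in the default namespace
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: default
spec:
  podSelector: {}
  policyTypes:
  - Ingress
EOF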

Enable Binary Authorization

Enable Binary Authorization for container image verification:

# Enable Binary Authorization
gcloud container clusters update my-cluster \
  --zone us-central1-a \
  --enable-binary-authorization

Set Up Monitoring

Enable Cloud Operations (monitoring and logging):

# Enable monitoring
gcloud container clusters update my-cluster \
  --zone us-central1-a \
  --monitoring=SYSTEM,WORKLOAD
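
Logging is configured with a parallel flag; enabling both keeps system and workload telemetry flowing into Cloud Operations:

# Enable logging alongside monitoring
gcloud container clusters update my-cluster \
  --zone us-central1-a \
  --logging=SYSTEM,WORKLOAD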

Best Practices

  1. Use Regional Clusters - For high availability across zones (see the example after this list)

  2. Enable Auto-Repair and Auto-Upgrade - Automatic node maintenance

  3. Enable Workload Identity - For secure pod-to-GCP authentication

  4. Use Private Clusters - For enhanced security in production

  5. Configure Auto-Scaling - For cost optimization

  6. Use Spot or Preemptible VMs - For cost savings on fault-tolerant, non-critical workloads

  7. Enable Network Policy - For pod-to-pod isolation

  8. Enable Binary Authorization - For container image security

  9. Set Resource Quotas - To prevent resource exhaustion

  10. Use Release Channels - For automatic version management
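
Following practice 1, a regional cluster replicates the control plane across zones and creates nodes per zone; note that --num-nodes counts per zone, not per cluster. A minimal sketch:

# Regional cluster: --num-nodes is per zone, so this yields 3 nodes total
gcloud container clusters create ha-cluster \
  --region us-central1 \
  --num-nodes 1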

Common Issues

Insufficient IP Addresses

Problem: Pods can’t get IP addresses

Solution:

  • Increase secondary IP range size
  • Create additional secondary ranges
  • Use larger subnet CIDR

Cluster Creation Fails

Problem: Cluster creation times out or fails

Solution:

  • Check IAM permissions
  • Verify service quotas
  • Check VPC network configuration
  • Review Cloud Logging for errors

kubectl Access Denied

Problem: Can’t access cluster with kubectl

Solution:

  • Verify cluster credentials are updated
  • Check Cloud IAM permissions
  • Verify you’re authenticated: gcloud auth list
  • Check private endpoint configuration

Node Pool Creation Fails

Problem: Node pool creation times out or fails

Solution:

  • Check service account permissions
  • Verify firewall rules
  • Check subnet configuration
  • Review Cloud Logging for errors

Next Steps

After cluster setup, deploy a test application to confirm the cluster works end to end.
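
A quick smoke test uses Google's public hello-app sample image (the image path follows the GKE quickstart); the sketch below deploys it and exposes it through a LoadBalancer Service:

# Deploy the sample application
kubectl create deployment hello-app \
  --image=us-docker.pkg.dev/google-samples/containers/gke/hello-app:1.0

# Expose it through an external load balancer (hello-app listens on 8080)
kubectl expose deployment hello-app --type=LoadBalancer --port 80 --target-port 8080

# Watch for the external IP, then curl http://EXTERNAL_IP
kubectl get service hello-app --watch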
