
GKE Provision Ephemeral Storage with Local SSDs

Scenario

  • Create a GKE cluster without a local SSD node pool
  • Update the GKE cluster with a local SSD node pool

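Note:

  • Local SSD requires machine type n1-standard-1 or larger; the default machine type, e2-medium, is not supported. You can learn more about machine types in the Compute Engine documentation.

  • Starting with version 1.25.3-gke.1800, GKE node pools can be configured to use local SSD with an NVMe interface as local ephemeral storage. To learn more about local SSD support on GKE, see About local SSDs.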

Steps

Create the GKE cluster

Create a GKE cluster without local SSD
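
The command for this step is not shown on the page. A minimal sketch, assuming the cluster name before-localssd and zone asia-east1-a that the later steps use, with every other setting left at its default. PROJECT_ID is a placeholder, and run is a hypothetical dry-run helper (not part of gcloud) that prints the command unless DRY_RUN=0 is set:

```shell
# Assumed values, taken from the commands used later in this walkthrough.
CLUSTER_NAME="before-localssd"
ZONE="asia-east1-a"
PROJECT_ID="${PROJECT_ID:-PROJECT_ID}"   # placeholder: replace with your project ID

# Hypothetical helper: print the command by default, execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "$*"; fi
}

# No local SSD flags here; the local SSD node pool is added in a later step.
run gcloud container clusters create "$CLUSTER_NAME" --zone "$ZONE" --project "$PROJECT_ID"
```

Review the printed command first, then rerun with DRY_RUN=0 to actually create the cluster.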

Connect to the GKE cluster

CloudShell
gcloud container clusters get-credentials before-localssd --zone asia-east1-a --project PROJECT_ID

Update the GKE cluster

CloudShell
gcloud container node-pools create ssd-pool --cluster=before-localssd --ephemeral-storage-local-ssd count=1 --machine-type=n1-standard-2 --zone=asia-east1-a

Check the node pool status

  • Nodes prefixed with gke-before-localssd-default-pool belong to the default node pool
  • Nodes prefixed with gke-before-localssd-ssd-pool belong to the local SSD node pool
CloudShell
$ kubectl get no
NAME STATUS ROLES AGE VERSION
gke-before-localssd-default-pool-77fef070-79g2 Ready <none> 34m v1.27.3-gke.100
gke-before-localssd-default-pool-77fef070-9rwj Ready <none> 34m v1.27.3-gke.100
gke-before-localssd-default-pool-77fef070-p285 Ready <none> 34m v1.27.3-gke.100
gke-before-localssd-ssd-pool-acba8d73-k2bd Ready <none> 27m v1.27.3-gke.100
gke-before-localssd-ssd-pool-acba8d73-ntk9 Ready <none> 27m v1.27.3-gke.100
gke-before-localssd-ssd-pool-acba8d73-p84z Ready <none> 27m v1.27.3-gke.100
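
With more nodes, checking the listing by eye gets tedious. A small sketch that counts Ready nodes per pool from the kubectl get no output. The function name count_ready_by_pool is hypothetical, and it assumes node names follow the gke-CLUSTER-POOL-HASH-ID pattern seen above:

```shell
# Count Ready nodes whose name contains "-<pool>-".
# Reads `kubectl get no` output on stdin; the header row is skipped.
count_ready_by_pool() {
  awk -v pool="$1" '
    $1 != "NAME" && $2 == "Ready" && index($1, "-" pool "-") { n++ }
    END { print n + 0 }
  '
}

# Usage against a live cluster:
#   kubectl get no | count_ready_by_pool ssd-pool
```

Matching on the node name is only a convenience. Selecting on the cloud.google.com/gke-nodepool node label (kubectl get no -l cloud.google.com/gke-nodepool=ssd-pool) avoids relying on the naming pattern.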

Check the local SSD

Check with kubectl

CloudShell
kubectl describe no gke-before-localssd-ssd-pool-acba8d73-k2bd | grep gke-ephemeral-storage-local-ssd=true

cloud.google.com/gke-ephemeral-storage-local-ssd=true

Check by logging in to the GCE instance

SSH into the GCE node that has the local SSD

CloudShell
# df -h | grep ssd
/dev/nvme0n1 369G 2.0G 348G 1% /mnt/stateful_partition/kube-ephemeral-ssd

# fdisk -l /dev/disk/by-id/google-local-nvme-ssd-0
Disk /dev/disk/by-id/google-local-nvme-ssd-0: 375 GiB, 402653184000 bytes, 98304000 sectors
Disk model: nvme_card
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
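
The figures in the fdisk output are consistent with each other: sector count times sector size gives the byte count, which is exactly 375 GiB. A quick check with shell arithmetic (the variable names are illustrative):

```shell
SECTORS=98304000      # sector count from the fdisk output
SECTOR_BYTES=4096     # logical sector size from the fdisk output

BYTES=$((SECTORS * SECTOR_BYTES))
GIB=$((BYTES / 1024 / 1024 / 1024))

echo "$BYTES bytes = $GIB GiB"   # prints "402653184000 bytes = 375 GiB"
```

df reports a smaller size (369G) for the mounted filesystem, presumably because of filesystem overhead.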

Create Pods that use the local SSD

Create a Pod that uses an ephemeral volume

CloudShell
echo 'apiVersion: v1
kind: Pod
metadata:
  name: ssd-pod
spec:
  containers:
  - name: before-localssd
    image: "nginx"
    resources:
      requests:
        ephemeral-storage: "200Gi"
    volumeMounts:
    - mountPath: /cache
      name: scratch-volume
  nodeSelector:
    cloud.google.com/gke-ephemeral-storage-local-ssd: "true"
  volumes:
  - name: scratch-volume
    emptyDir: {}' > ssd-pod.yaml
kubectl apply -f ssd-pod.yaml

fio benchmark

CloudShell
# Log in to the pod
kubectl exec -it ssd-pod -- bash

Install the fio package and run the benchmark

Test write throughput by running sequential writes with multiple parallel streams (16+), using an I/O block size of 1 MB and an I/O depth of at least 64. You can also refer to Local SSD performance for guidance on benchmarking the disk.

CloudShell
apt update
apt install fio -y
TEST_DIR=/cache

fio --name=write_throughput --directory=$TEST_DIR --numjobs=16 \
--size=10G --time_based --runtime=60s --ramp_time=2s --ioengine=libaio \
--direct=1 --verify=0 --bs=1M --iodepth=64 --rw=write \
--group_reporting=1 \
--iodepth_batch_submit=64 --iodepth_batch_complete_max=64

Result

CloudShell
write_throughput: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=64
...
fio-3.33
Starting 16 processes
write_throughput: Laying out IO file (1 file / 10240MiB)
write_throughput: Laying out IO file (1 file / 10240MiB)
write_throughput: Laying out IO file (1 file / 10240MiB)
write_throughput: Laying out IO file (1 file / 10240MiB)
write_throughput: Laying out IO file (1 file / 10240MiB)
write_throughput: Laying out IO file (1 file / 10240MiB)
write_throughput: Laying out IO file (1 file / 10240MiB)
write_throughput: Laying out IO file (1 file / 10240MiB)
write_throughput: Laying out IO file (1 file / 10240MiB)
write_throughput: Laying out IO file (1 file / 10240MiB)
write_throughput: Laying out IO file (1 file / 10240MiB)
write_throughput: Laying out IO file (1 file / 10240MiB)
write_throughput: Laying out IO file (1 file / 10240MiB)
write_throughput: Laying out IO file (1 file / 10240MiB)
write_throughput: Laying out IO file (1 file / 10240MiB)
write_throughput: Laying out IO file (1 file / 10240MiB)
Jobs: 1 (f=1): [_(8),W(1),_(7)][16.9%][w=406MiB/s][w=406 IOPS][eta 05m:29s]
write_throughput: (groupid=0, jobs=16): err= 0: pid=719: Thu Sep 14 07:56:37 2023
write: IOPS=382, BW=398MiB/s (417MB/s)(24.4GiB/62748msec); 0 zone resets
slat (usec): min=59, max=2834.4k, avg=12741.82, stdev=69312.45
clat (msec): min=4, max=6187, avg=2589.37, stdev=1428.72
lat (msec): min=5, max=6188, avg=2601.89, stdev=1418.38
clat percentiles (msec):
| 1.00th=[ 75], 5.00th=[ 284], 10.00th=[ 659], 20.00th=[ 1183],
| 30.00th=[ 1670], 40.00th=[ 2039], 50.00th=[ 2601], 60.00th=[ 3104],
| 70.00th=[ 3507], 80.00th=[ 3910], 90.00th=[ 4463], 95.00th=[ 4799],
| 99.00th=[ 5604], 99.50th=[ 5873], 99.90th=[ 5940], 99.95th=[ 5940],
| 99.99th=[ 6208]
bw ( MiB/s): min= 39, max= 2865, per=100.00%, avg=1090.83, stdev=49.21, samples=713
iops : min= 39, max= 2864, avg=1090.17, stdev=49.19, samples=713
lat (msec) : 10=0.12%, 20=0.08%, 50=0.43%, 100=0.82%, 250=3.51%
lat (msec) : 500=3.85%, 750=3.11%, 1000=4.46%, 2000=23.62%, >=2000=63.80%
cpu : usr=0.16%, sys=0.12%, ctx=9475, majf=0, minf=600
IO depths : 1=0.8%, 2=0.3%, 4=1.1%, 8=3.7%, 16=5.1%, 32=20.6%, >=64=68.1%
submit : 0=0.0%, 4=90.3%, 8=5.1%, 16=2.5%, 32=0.6%, 64=1.5%, >=64=0.0%
complete : 0=0.0%, 4=90.6%, 8=5.0%, 16=2.4%, 32=0.3%, 64=1.7%, >=64=0.0%
issued rwts: total=0,23973,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
WRITE: bw=398MiB/s (417MB/s), 398MiB/s-398MiB/s (417MB/s-417MB/s), io=24.4GiB (26.2GB), run=62748-62748msec

Disk stats (read/write):
nvme0n1: ios=0/26784, merge=0/3086, ticks=0/56818328, in_queue=56837263, util=97.53%
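
Most of this output is per-job detail; the READ:/WRITE: line under "Run status group" carries the aggregate bandwidth. A small sketch for pulling out just that figure with awk. The helper name fio_summary_bw and the log file name in the usage comment are hypothetical:

```shell
# Print "<DIRECTION> <bandwidth>" for each run-status summary line
# found in fio's human-readable output on stdin.
fio_summary_bw() {
  awk '$1 == "READ:" || $1 == "WRITE:" {
    dir = $1; sub(/:$/, "", dir)     # "WRITE:"       -> "WRITE"
    bw = $2;  sub(/^bw=/, "", bw)    # "bw=398MiB/s"  -> "398MiB/s"
    print dir, bw
  }'
}

# Usage: save the fio output to a file, then summarize it.
#   fio ... | tee write_throughput.log
#   fio_summary_bw < write_throughput.log
```

For the run above this yields WRITE 398MiB/s. The lowercase per-job line (write: IOPS=...) is deliberately not matched.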

Test write IOPS by running random writes, using an I/O block size of 4 KB and an I/O depth of at least 256

CloudShell
fio --name=write_iops --directory=$TEST_DIR --size=10G \
--time_based --runtime=60s --ramp_time=2s --ioengine=libaio --direct=1 \
--verify=0 --bs=4K --iodepth=256 --rw=randwrite --group_reporting=1 \
--iodepth_batch_submit=256 --iodepth_batch_complete_max=256

Result

CloudShell
write_iops: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=256
fio-3.33
Starting 1 process
write_iops: Laying out IO file (1 file / 10240MiB)
Jobs: 1 (f=1): [w(1)][100.0%][w=385MiB/s][w=98.6k IOPS][eta 00m:00s]
write_iops: (groupid=0, jobs=1): err= 0: pid=740: Thu Sep 14 08:10:54 2023
write: IOPS=45.2k, BW=177MiB/s (185MB/s)(10.3GiB/60002msec); 0 zone resets
slat (usec): min=2, max=61504, avg=3333.70, stdev=1846.47
clat (nsec): min=1901, max=61911k, avg=1515277.46, stdev=1814343.11
lat (usec): min=78, max=65958, avg=4848.96, stdev=2271.94
clat percentiles (usec):
| 1.00th=[ 5], 5.00th=[ 7], 10.00th=[ 8], 20.00th=[ 10],
| 30.00th=[ 13], 40.00th=[ 115], 50.00th=[ 1123], 60.00th=[ 1811],
| 70.00th=[ 2343], 80.00th=[ 2900], 90.00th=[ 3818], 95.00th=[ 4621],
| 99.00th=[ 5997], 99.50th=[ 6521], 99.90th=[ 9241], 99.95th=[21627],
| 99.99th=[48497]
bw ( KiB/s): min=135543, max=400080, per=99.11%, avg=179134.47, stdev=37486.89, samples=119
iops : min=33885, max=100020, avg=44783.47, stdev=9371.73, samples=119
lat (usec) : 2=0.01%, 4=0.12%, 10=21.30%, 20=16.35%, 50=0.82%
lat (usec) : 100=1.15%, 250=1.78%, 500=2.25%, 750=2.14%, 1000=2.36%
lat (msec) : 2=15.69%, 4=27.46%, 10=8.47%, 20=0.04%, 50=0.04%
lat (msec) : 100=0.01%
cpu : usr=2.76%, sys=55.13%, ctx=63570, majf=0, minf=37
IO depths : 1=0.1%, 2=0.2%, 4=0.5%, 8=0.7%, 16=1.3%, 32=4.8%, >=64=92.5%
submit : 0=0.0%, 4=4.0%, 8=3.7%, 16=4.9%, 32=7.7%, 64=16.1%, >=64=63.6%
complete : 0=0.0%, 4=3.3%, 8=3.6%, 16=4.5%, 32=6.4%, 64=9.8%, >=64=72.4%
issued rwts: total=0,2711032,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=256

Run status group 0 (all jobs):
WRITE: bw=177MiB/s (185MB/s), 177MiB/s-177MiB/s (185MB/s-185MB/s), io=10.3GiB (11.1GB), run=60002-60002msec

Disk stats (read/write):
nvme0n1: ios=0/2765681, merge=0/68424, ticks=0/1409012, in_queue=1409184, util=97.24%
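
The IOPS and bandwidth figures in this result describe the same measurement: bandwidth is roughly IOPS times block size. A quick sanity check of the reported 45.2k IOPS at 4 KiB against the reported 177MiB/s (the helper name iops_to_mibps is illustrative):

```shell
# Convert IOPS at a given block size (in KiB) to MiB/s,
# rounded to the nearest integer.
iops_to_mibps() {
  awk -v iops="$1" -v bs_kib="$2" 'BEGIN { printf "%.0f\n", iops * bs_kib / 1024 }'
}

iops_to_mibps 45200 4    # prints 177
```

The same check works for any fixed-block-size run; small differences are expected because fio rounds the IOPS figure it prints.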

Test read throughput by running sequential reads with multiple parallel streams (16+), using an I/O block size of 1 MB and an I/O depth of at least 64

CloudShell
fio --name=read_throughput --directory=$TEST_DIR --numjobs=16 \
--size=10G --time_based --runtime=60s --ramp_time=2s --ioengine=libaio \
--direct=1 --verify=0 --bs=1M --iodepth=64 --rw=read \
--group_reporting=1 \
--iodepth_batch_submit=64 --iodepth_batch_complete_max=64

Result

CloudShell
read_throughput: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=64
...
fio-3.33
Starting 16 processes
read_throughput: Laying out IO file (1 file / 10240MiB)
read_throughput: Laying out IO file (1 file / 10240MiB)
read_throughput: Laying out IO file (1 file / 10240MiB)
read_throughput: Laying out IO file (1 file / 10240MiB)
read_throughput: Laying out IO file (1 file / 10240MiB)
read_throughput: Laying out IO file (1 file / 10240MiB)
read_throughput: Laying out IO file (1 file / 10240MiB)
read_throughput: Laying out IO file (1 file / 10240MiB)
read_throughput: Laying out IO file (1 file / 10240MiB)
read_throughput: Laying out IO file (1 file / 10240MiB)
read_throughput: Laying out IO file (1 file / 10240MiB)
read_throughput: Laying out IO file (1 file / 10240MiB)
read_throughput: Laying out IO file (1 file / 10240MiB)
read_throughput: Laying out IO file (1 file / 10240MiB)
read_throughput: Laying out IO file (1 file / 10240MiB)
read_throughput: Laying out IO file (1 file / 10240MiB)
Jobs: 16 (f=16): [R(16)][25.8%][r=14.0MiB/s][r=14 IOPS][eta 03m:04s]
read_throughput: (groupid=0, jobs=16): err= 0: pid=713: Thu Sep 14 09:25:47 2023
read: IOPS=686, BW=703MiB/s (737MB/s)(42.2GiB/61514msec)
slat (usec): min=22, max=2499, avg=213.27, stdev=189.48
clat (msec): min=930, max=3353, avg=1471.58, stdev=237.58
lat (msec): min=931, max=3353, avg=1471.79, stdev=237.58
clat percentiles (msec):
| 1.00th=[ 1011], 5.00th=[ 1150], 10.00th=[ 1200], 20.00th=[ 1284],
| 30.00th=[ 1334], 40.00th=[ 1401], 50.00th=[ 1469], 60.00th=[ 1519],
| 70.00th=[ 1586], 80.00th=[ 1653], 90.00th=[ 1720], 95.00th=[ 1838],
| 99.00th=[ 2165], 99.50th=[ 2635], 99.90th=[ 3138], 99.95th=[ 3205],
| 99.99th=[ 3272]
bw ( KiB/s): min=51220, max=1790161, per=100.00%, avg=728539.93, stdev=25696.80, samples=1901
iops : min= 50, max= 1748, avg=711.12, stdev=25.09, samples=1901
lat (msec) : 1000=0.63%, 2000=100.28%, >=2000=1.43%
cpu : usr=0.01%, sys=0.25%, ctx=14884, majf=0, minf=584
IO depths : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=53.0%, >=64=46.7%
submit : 0=0.0%, 4=88.5%, 8=9.1%, 16=2.4%, 32=0.1%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=87.9%, 8=9.3%, 16=2.6%, 32=0.1%, 64=0.1%, >=64=0.0%
issued rwts: total=42233,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
READ: bw=703MiB/s (737MB/s), 703MiB/s-703MiB/s (737MB/s-737MB/s), io=42.2GiB (45.3GB), run=61514-61514msec

Disk stats (read/write):
nvme0n1: ios=56662/25, merge=2858/66, ticks=81023493/46174, in_queue=81088222, util=99.04%
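
The average latency of about 1.47 s in this run can look alarming, but it is largely a product of the queue depth. By Little's law, mean latency is roughly the number of outstanding I/Os divided by the completion rate. A rough sketch of that estimate, assuming all 16 jobs keep their 64-deep queues full (the helper name expected_latency_ms is illustrative):

```shell
# Little's law estimate: latency ≈ (jobs * iodepth) / IOPS, reported in ms.
expected_latency_ms() {
  awk -v jobs="$1" -v depth="$2" -v iops="$3" \
    'BEGIN { printf "%.1f\n", jobs * depth / iops * 1000 }'
}

expected_latency_ms 16 64 686    # prints 1492.7
```

That is close to the reported avg=1471.79 ms. The same estimate for the sequential write run (382 IOPS) gives about 2680.6 ms against the reported 2601.89 ms, so the high latencies mostly reflect 1,024 requests in flight rather than a slow device.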

Test read IOPS by running random reads, using an I/O block size of 4 KB and an I/O depth of at least 256

CloudShell
fio --name=read_iops --directory=$TEST_DIR --size=10G \
--time_based --runtime=60s --ramp_time=2s --ioengine=libaio --direct=1 \
--verify=0 --bs=4K --iodepth=256 --rw=randread --group_reporting=1 \
--iodepth_batch_submit=256 --iodepth_batch_complete_max=256

Result

CloudShell
read_iops: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=256
fio-3.33
Starting 1 process
read_iops: Laying out IO file (1 file / 10240MiB)
Jobs: 1 (f=1): [r(1)][100.0%][r=704MiB/s][r=180k IOPS][eta 00m:00s]
read_iops: (groupid=0, jobs=1): err= 0: pid=760: Thu Sep 14 08:21:50 2023
read: IOPS=180k, BW=702MiB/s (737MB/s)(41.2GiB/60002msec)
slat (usec): min=2, max=7014, avg=148.95, stdev=115.29
clat (usec): min=3, max=10971, avg=1239.28, stdev=482.94
lat (usec): min=115, max=11189, avg=1388.23, stdev=443.47
clat percentiles (usec):
| 1.00th=[ 227], 5.00th=[ 490], 10.00th=[ 685], 20.00th=[ 848],
| 30.00th=[ 963], 40.00th=[ 1057], 50.00th=[ 1172], 60.00th=[ 1369],
| 70.00th=[ 1549], 80.00th=[ 1663], 90.00th=[ 1795], 95.00th=[ 1909],
| 99.00th=[ 2638], 99.50th=[ 2900], 99.90th=[ 3228], 99.95th=[ 3556],
| 99.99th=[ 7111]
bw ( KiB/s): min=702896, max=723408, per=100.00%, avg=719921.18, stdev=2489.56, samples=119
iops : min=175724, max=180852, avg=179980.24, stdev=622.37, samples=119
lat (usec) : 4=0.01%, 10=0.25%, 20=0.22%, 50=0.09%, 100=0.06%
lat (usec) : 250=0.60%, 500=3.97%, 750=8.06%, 1000=20.59%
lat (msec) : 2=62.86%, 4=3.27%, 10=0.03%, 20=0.01%
cpu : usr=11.98%, sys=42.33%, ctx=141255, majf=0, minf=37
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.0%, 16=0.1%, 32=0.1%, >=64=99.9%
submit : 0=0.0%, 4=30.7%, 8=7.4%, 16=11.2%, 32=15.8%, 64=21.8%, >=64=13.2%
complete : 0=0.0%, 4=32.3%, 8=6.7%, 16=9.7%, 32=13.1%, 64=21.4%, >=64=16.9%
issued rwts: total=10789453,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=256

Run status group 0 (all jobs):
READ: bw=702MiB/s (737MB/s), 702MiB/s-702MiB/s (737MB/s-737MB/s), io=41.2GiB (44.2GB), run=60002-60002msec

Disk stats (read/write):
nvme0n1: ios=11138129/24, merge=11509/52, ticks=13567488/8, in_queue=13567499, util=99.82%

Create a Pod that uses a hostPath volume

CloudShell
echo 'apiVersion: v1
kind: Pod
metadata:
  name: "test-ssd"
spec:
  containers:
  - name: "shell"
    image: "nginx"
    volumeMounts:
    - mountPath: "/test-ssd"
      name: "test-ssd"
  volumes:
  - name: "test-ssd"
    hostPath:
      path: "/mnt/stateful_partition/kube-ephemeral-ssd"
  nodeSelector:
    cloud.google.com/gke-ephemeral-storage-local-ssd: "true"' > localssd-hostpath.yaml
kubectl apply -f localssd-hostpath.yaml

Check the local SSD

CloudShell
kubectl exec -it test-ssd -- bash
df -h /test-ssd
Filesystem Size Used Avail Use% Mounted on
/dev/nvme0n1 369G 2.5G 347G 1% /test-ssd

Common issues

Local SSD count exceeds the limit

The error message looks like this.

CloudShell
gcloud container node-pools create ssd-pool-3 --cluster=before-localssd --ephemeral-storage-local-ssd count=24 --machine-type=n1-standard-8 --zone=asia-east1-a

ERROR: (gcloud.container.node-pools.create) ResponseError: code=400, message=The number of local SSDs specified (24) exceeds the maximum allowed for this machine type (8).

Solution:

  • Reduce the number of local SSDs
  • Choose a machine type that supports more local SSDs

Local SSD cannot be mounted

The error message looks like this.

CloudShell
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedMount 12s (x7 over 60s) kubelet MountVolume.SetUp failed for volume "test-ssd" : hostPath type check failed: /mnt/stateful_partition/kube-ephemeral-ssd is not a directory

Solution:

  • Check that the node has a local SSD
  • Check that the local SSD mount path is correct
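
Both checks can be done with the tools used earlier in this walkthrough. A minimal sketch of the first one as a reusable filter; the helper name has_local_ssd_label is hypothetical, and NODE_NAME is a placeholder for the node the Pod was scheduled on:

```shell
# Succeeds when the text on stdin contains the local SSD
# ephemeral-storage node label (fixed-string match).
has_local_ssd_label() {
  grep -qF 'cloud.google.com/gke-ephemeral-storage-local-ssd=true'
}

# Usage against a live cluster:
#   kubectl describe no NODE_NAME | has_local_ssd_label \
#     && echo "node has local SSD ephemeral storage" \
#     || echo "label missing: Pod is on a node without local SSD"
#
# For the second check, SSH into the node and confirm the path exists:
#   ls -ld /mnt/stateful_partition/kube-ephemeral-ssd
```

If the label is missing, the Pod most likely landed on a node outside the local SSD node pool, for example because its nodeSelector was omitted.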

References

  1. Provision ephemeral storage with local SSDs
  2. Local SSD performance