GKE Provision Ephemeral Storage with Local SSDs
Scenario
- Create a GKE cluster without a local SSD node pool
- Update the GKE cluster by adding a local SSD node pool
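Notes:
- Local SSDs require machine type n1-standard-1 or larger; the default machine type, e2-medium, is not supported. You can learn more about machine types in the Compute Engine documentation.
- Starting with version 1.25.3-gke.1800, GKE node pools can be configured to use local SSDs with an NVMe interface as local ephemeral storage. To learn more about local SSD support on GKE, see About local SSDs.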
Steps
Create the GKE cluster
Create a GKE cluster without local SSDs
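A minimal sketch of this step, assuming the cluster name `before-localssd`, zone `asia-east1-a`, and `PROJECT_ID` used in the next step. The `create_cluster` function and the `DRY_RUN` switch are hypothetical conveniences, not part of gcloud. Leaving out `--machine-type` keeps GKE's default machine type, which (per the note above) cannot use local SSDs; that is fine here because the local SSD node pool is added separately.

```shell
# Hypothetical helper: prints the cluster-creation command by default;
# run with DRY_RUN=0 to execute it instead.
create_cluster() {
  # Arguments are expanded before `set --` replaces them.
  # No --machine-type: the default pool uses GKE's default machine type.
  set -- gcloud container clusters create "$1" --zone="$2" --project="$3"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

create_cluster before-localssd asia-east1-a PROJECT_ID
```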
Connect to the GKE cluster

    gcloud container clusters get-credentials before-localssd --zone asia-east1-a --project PROJECT_ID

Update the GKE cluster
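Add a node pool named ssd-pool in which each node gets one local SSD (count=1) used as ephemeral storage. The machine type is set explicitly because the default e2-medium does not support local SSDs.

    gcloud container node-pools create ssd-pool --cluster=before-localssd --ephemeral-storage-local-ssd count=1 --machine-type=n1-standard-2 --zone=asia-east1-a

Check the node pool status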
- gke-before-localssd-default-pool is the default node pool
- gke-before-localssd-ssd-pool is the local SSD node pool
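To summarize output like the `kubectl get no` listing below per node pool, a small awk sketch can help. It assumes GKE's node naming pattern `gke-<cluster>-<pool>-<hash>-<suffix>` and strips the last two fields; a more robust alternative is `kubectl get nodes -L cloud.google.com/gke-nodepool`, which prints each node's pool label as a column. `nodes_per_pool` is a hypothetical helper.

```shell
# Hypothetical helper: count nodes per node pool in `kubectl get no` output.
# Assumes node names follow gke-<cluster>-<pool>-<hash>-<suffix>, so dropping
# the last two hyphen-separated fields leaves the pool prefix.
nodes_per_pool() {
  awk 'NR > 1 {                      # skip the header line
         n = split($1, part, "-")
         prefix = part[1]
         for (i = 2; i <= n - 2; i++) prefix = prefix "-" part[i]
         count[prefix]++
       }
       END { for (p in count) print p, count[p] }' | sort
}

# Usage: kubectl get no | nodes_per_pool
# For the listing below this prints:
#   gke-before-localssd-default-pool 3
#   gke-before-localssd-ssd-pool 3
```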
    $ kubectl get no
    NAME                                             STATUS   ROLES    AGE   VERSION
    gke-before-localssd-default-pool-77fef070-79g2   Ready    <none>   34m   v1.27.3-gke.100
    gke-before-localssd-default-pool-77fef070-9rwj   Ready    <none>   34m   v1.27.3-gke.100
    gke-before-localssd-default-pool-77fef070-p285   Ready    <none>   34m   v1.27.3-gke.100
    gke-before-localssd-ssd-pool-acba8d73-k2bd       Ready    <none>   27m   v1.27.3-gke.100
    gke-before-localssd-ssd-pool-acba8d73-ntk9       Ready    <none>   27m   v1.27.3-gke.100
    gke-before-localssd-ssd-pool-acba8d73-p84z       Ready    <none>   27m   v1.27.3-gke.100

Check the local SSD
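Instead of grepping `kubectl describe` output one node at a time, a label selector lists every node that carries the local SSD label. A sketch; `list_local_ssd_nodes` and `DRY_RUN` are hypothetical. With the cluster above, it should return only the three ssd-pool nodes.

```shell
# Hypothetical helper: list nodes that carry the local SSD label.
# -l filters by label; -o name prints one "node/<name>" per line.
# Prints the command by default; run with DRY_RUN=0 to execute it.
list_local_ssd_nodes() {
  set -- kubectl get nodes \
    -l cloud.google.com/gke-ephemeral-storage-local-ssd=true -o name
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

list_local_ssd_nodes
```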
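Check with kubectl

    kubectl describe no gke-before-localssd-ssd-pool-acba8d73-k2bd | grep gke-ephemeral-storage-local-ssd=true

    cloud.google.com/gke-ephemeral-storage-local-ssd=true

Nodes in the local SSD node pool carry the label cloud.google.com/gke-ephemeral-storage-local-ssd=true.

Check by logging in to the GCE node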
SSH into one of the local SSD GCE nodes
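A minimal sketch of the SSH step, assuming the gcloud CLI is authorized for the project. GKE node names are also the Compute Engine instance names, so a node name from `kubectl get no` can be passed directly. `ssh_node` and `DRY_RUN` are hypothetical conveniences. The commands below are shown from a root shell (`#`), so `fdisk` in particular may need `sudo` after logging in.

```shell
# Hypothetical helper: GKE node names double as Compute Engine instance names,
# so a name from `kubectl get no` can be passed straight to gcloud compute ssh.
# Prints the command by default; run with DRY_RUN=0 to open the session.
ssh_node() {
  set -- gcloud compute ssh "$1" --zone="$2"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

ssh_node gke-before-localssd-ssd-pool-acba8d73-k2bd asia-east1-a
```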
    # df -h | grep ssd
    /dev/nvme0n1    369G  2.0G  348G   1% /mnt/stateful_partition/kube-ephemeral-ssd

    # fdisk -l /dev/disk/by-id/google-local-nvme-ssd-0
    Disk /dev/disk/by-id/google-local-nvme-ssd-0: 375 GiB, 402653184000 bytes, 98304000 sectors
    Disk model: nvme_card
    Units: sectors of 1 * 4096 = 4096 bytes
    Sector size (logical/physical): 4096 bytes / 4096 bytes
    I/O size (minimum/optimal): 4096 bytes / 4096 bytes

One 375 GiB NVMe local SSD is attached; df shows it mounted at /mnt/stateful_partition/kube-ephemeral-ssd (369G after filesystem overhead), where it backs the node's ephemeral storage.

Create a Pod that uses the local SSD
Create a Pod that uses an ephemeral (emptyDir) volume
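The Pod below requests 200Gi of ephemeral storage and is pinned to the local SSD nodes by its nodeSelector. A rough way to see what that means for scheduling, using the df output above (348G available): integer-divide the free space by the request. Both figures are in GiB, since `df -h` uses powers of 1024 and Kubernetes `Gi` is 2^30 bytes. This is only an estimate: the scheduler actually compares requests against each node's allocatable ephemeral-storage (visible in `kubectl describe node`), which can differ from df's Avail column. `max_pods_per_node` is a hypothetical helper.

```shell
# Hypothetical helper: rough count of Pods with a given ephemeral-storage
# request that fit into the free space `df -h` reports on one node.
# Both arguments are in GiB.
max_pods_per_node() {
  local avail_gib=$1 request_gib=$2
  echo $(( avail_gib / request_gib ))   # integer division rounds down
}

max_pods_per_node 348 200   # 348G available, 200Gi request -> 1
```

Under this estimate, each ssd-pool node holds at most one such Pod, so the three-node pool can run at most three copies at once.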
    echo 'apiVersion: v1
    kind: Pod
    metadata:
      name: ssd-pod
    spec:
      containers:
      - name: before-localssd
        image: "nginx"
        resources:
          requests:
            ephemeral-storage: "200Gi"
        volumeMounts:
        - mountPath: /cache
          name: scratch-volume
      nodeSelector:
        cloud.google.com/gke-ephemeral-storage-local-ssd: "true"
      volumes:
      - name: scratch-volume
        emptyDir: {}' > ssd-pod.yaml
    kubectl apply -f ssd-pod.yaml

fio benchmark
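Because the Pod requests 200Gi and is restricted to local SSD nodes, it can stay Pending if no node has room. A small sketch that waits for readiness before opening a shell; `kubectl wait` exits with an error on timeout, and `kubectl describe pod ssd-pod` then shows the scheduling events. The helper name and the 120s default are assumptions.

```shell
# Hypothetical helper: block until the Pod is Ready (default timeout 120s).
# kubectl wait exits non-zero on timeout, e.g. while the Pod is still Pending.
# Prints the command by default; run with DRY_RUN=0 to execute it.
wait_for_pod() {
  set -- kubectl wait --for=condition=Ready "pod/$1" --timeout="${2:-120s}"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

wait_for_pod ssd-pod
```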
    # Open a shell in the Pod
    kubectl exec -it ssd-pod -- bash

Install fio and run the benchmarks
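The four fio runs below share most of their options and differ only in job name, access pattern (`--rw`), block size, I/O depth, and number of jobs. A hypothetical helper that makes that structure explicit; it prints each command by default (`DRY_RUN=1`) and runs it with `DRY_RUN=0`. The IOPS runs below omit `--numjobs`, which defaults to 1 in fio, so passing 1 here is equivalent.

```shell
# Hypothetical helper: build one of the four fio runs from the five values
# that differ between them. Prints the command by default; DRY_RUN=0 runs it.
TEST_DIR=/cache

run_fio() {
  local name=$1 rw=$2 bs=$3 depth=$4 jobs=$5
  set -- fio --name="$name" --directory="$TEST_DIR" --numjobs="$jobs" \
    --size=10G --time_based --runtime=60s --ramp_time=2s \
    --ioengine=libaio --direct=1 --verify=0 \
    --bs="$bs" --iodepth="$depth" --rw="$rw" --group_reporting=1 \
    --iodepth_batch_submit="$depth" --iodepth_batch_complete_max="$depth"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

run_fio write_throughput write     1M 64  16
run_fio write_iops       randwrite 4K 256 1
run_fio read_throughput  read      1M 64  16
run_fio read_iops        randread  4K 256 1
```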
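Test write throughput by performing sequential writes with multiple parallel streams (16+), using a 1 MB I/O block size and an I/O depth of at least 64. You can also refer to the local SSD performance documentation for more ways to benchmark the disk.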
    apt update
    apt install fio -y
    TEST_DIR=/cache

    fio --name=write_throughput --directory=$TEST_DIR --numjobs=16 \
    --size=10G --time_based --runtime=60s --ramp_time=2s --ioengine=libaio \
    --direct=1 --verify=0 --bs=1M --iodepth=64 --rw=write \
    --group_reporting=1 --iodepth_batch_submit=64 --iodepth_batch_complete_max=64

Results
    write_throughput: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=64
    ...
    fio-3.33
    Starting 16 processes
    write_throughput: Laying out IO file (1 file / 10240MiB)
    write_throughput: Laying out IO file (1 file / 10240MiB)
    write_throughput: Laying out IO file (1 file / 10240MiB)
    write_throughput: Laying out IO file (1 file / 10240MiB)
    write_throughput: Laying out IO file (1 file / 10240MiB)
    write_throughput: Laying out IO file (1 file / 10240MiB)
    write_throughput: Laying out IO file (1 file / 10240MiB)
    write_throughput: Laying out IO file (1 file / 10240MiB)
    write_throughput: Laying out IO file (1 file / 10240MiB)
    write_throughput: Laying out IO file (1 file / 10240MiB)
    write_throughput: Laying out IO file (1 file / 10240MiB)
    write_throughput: Laying out IO file (1 file / 10240MiB)
    write_throughput: Laying out IO file (1 file / 10240MiB)
    write_throughput: Laying out IO file (1 file / 10240MiB)
    write_throughput: Laying out IO file (1 file / 10240MiB)
    write_throughput: Laying out IO file (1 file / 10240MiB)
    Jobs: 1 (f=1): [_(8),W(1),_(7)][16.9%][w=406MiB/s][w=406 IOPS][eta 05m:29s]
    write_throughput: (groupid=0, jobs=16): err= 0: pid=719: Thu Sep 14 07:56:37 2023
      write: IOPS=382, BW=398MiB/s (417MB/s)(24.4GiB/62748msec); 0 zone resets
        slat (usec): min=59, max=2834.4k, avg=12741.82, stdev=69312.45
        clat (msec): min=4, max=6187, avg=2589.37, stdev=1428.72
         lat (msec): min=5, max=6188, avg=2601.89, stdev=1418.38
        clat percentiles (msec):
         | 1.00th=[ 75], 5.00th=[ 284], 10.00th=[ 659], 20.00th=[ 1183],
         | 30.00th=[ 1670], 40.00th=[ 2039], 50.00th=[ 2601], 60.00th=[ 3104],
         | 70.00th=[ 3507], 80.00th=[ 3910], 90.00th=[ 4463], 95.00th=[ 4799],
         | 99.00th=[ 5604], 99.50th=[ 5873], 99.90th=[ 5940], 99.95th=[ 5940],
         | 99.99th=[ 6208]
       bw ( MiB/s): min= 39, max= 2865, per=100.00%, avg=1090.83, stdev=49.21, samples=713
       iops : min= 39, max= 2864, avg=1090.17, stdev=49.19, samples=713
      lat (msec) : 10=0.12%, 20=0.08%, 50=0.43%, 100=0.82%, 250=3.51%
      lat (msec) : 500=3.85%, 750=3.11%, 1000=4.46%, 2000=23.62%, >=2000=63.80%
      cpu : usr=0.16%, sys=0.12%, ctx=9475, majf=0, minf=600
      IO depths : 1=0.8%, 2=0.3%, 4=1.1%, 8=3.7%, 16=5.1%, 32=20.6%, >=64=68.1%
         submit : 0=0.0%, 4=90.3%, 8=5.1%, 16=2.5%, 32=0.6%, 64=1.5%, >=64=0.0%
         complete : 0=0.0%, 4=90.6%, 8=5.0%, 16=2.4%, 32=0.3%, 64=1.7%, >=64=0.0%
         issued rwts: total=0,23973,0,0 short=0,0,0,0 dropped=0,0,0,0
         latency : target=0, window=0, percentile=100.00%, depth=64

    Run status group 0 (all jobs):
      WRITE: bw=398MiB/s (417MB/s), 398MiB/s-398MiB/s (417MB/s-417MB/s), io=24.4GiB (26.2GB), run=62748-62748msec

    Disk stats (read/write):
      nvme0n1: ios=0/26784, merge=0/3086, ticks=0/56818328, in_queue=56837263, util=97.53%

Test write IOPS by performing random writes, using a 4 KB I/O block size and an I/O depth of at least 256
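For a fixed block size, fio's bandwidth and IOPS describe the same thing: bandwidth ≈ IOPS × block size. A sketch of that conversion with a hypothetical helper, `iops_to_mibs`, applied to the 4 KiB random runs in this section and the read IOPS section; it reproduces fio's reported BW to within rounding.

```shell
# Hypothetical helper: convert IOPS at a given block size (in KiB) to MiB/s,
# rounded to the nearest whole MiB/s.
iops_to_mibs() {
  awk -v iops="$1" -v bs_kib="$2" 'BEGIN { printf "%.0f\n", iops * bs_kib / 1024 }'
}

iops_to_mibs 45200 4    # write_iops: IOPS=45.2k -> 177 (fio reports BW=177MiB/s)
iops_to_mibs 180000 4   # read_iops:  IOPS=180k  -> 703 (fio reports BW=702MiB/s)
```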
    fio --name=write_iops --directory=$TEST_DIR --size=10G \
    --time_based --runtime=60s --ramp_time=2s --ioengine=libaio --direct=1 \
    --verify=0 --bs=4K --iodepth=256 --rw=randwrite --group_reporting=1 \
    --iodepth_batch_submit=256 --iodepth_batch_complete_max=256

Results
    write_iops: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=256
    fio-3.33
    Starting 1 process
    write_iops: Laying out IO file (1 file / 10240MiB)
    Jobs: 1 (f=1): [w(1)][100.0%][w=385MiB/s][w=98.6k IOPS][eta 00m:00s]
    write_iops: (groupid=0, jobs=1): err= 0: pid=740: Thu Sep 14 08:10:54 2023
      write: IOPS=45.2k, BW=177MiB/s (185MB/s)(10.3GiB/60002msec); 0 zone resets
        slat (usec): min=2, max=61504, avg=3333.70, stdev=1846.47
        clat (nsec): min=1901, max=61911k, avg=1515277.46, stdev=1814343.11
         lat (usec): min=78, max=65958, avg=4848.96, stdev=2271.94
        clat percentiles (usec):
         | 1.00th=[ 5], 5.00th=[ 7], 10.00th=[ 8], 20.00th=[ 10],
         | 30.00th=[ 13], 40.00th=[ 115], 50.00th=[ 1123], 60.00th=[ 1811],
         | 70.00th=[ 2343], 80.00th=[ 2900], 90.00th=[ 3818], 95.00th=[ 4621],
         | 99.00th=[ 5997], 99.50th=[ 6521], 99.90th=[ 9241], 99.95th=[21627],
         | 99.99th=[48497]
       bw ( KiB/s): min=135543, max=400080, per=99.11%, avg=179134.47, stdev=37486.89, samples=119
       iops : min=33885, max=100020, avg=44783.47, stdev=9371.73, samples=119
      lat (usec) : 2=0.01%, 4=0.12%, 10=21.30%, 20=16.35%, 50=0.82%
      lat (usec) : 100=1.15%, 250=1.78%, 500=2.25%, 750=2.14%, 1000=2.36%
      lat (msec) : 2=15.69%, 4=27.46%, 10=8.47%, 20=0.04%, 50=0.04%
      lat (msec) : 100=0.01%
      cpu : usr=2.76%, sys=55.13%, ctx=63570, majf=0, minf=37
      IO depths : 1=0.1%, 2=0.2%, 4=0.5%, 8=0.7%, 16=1.3%, 32=4.8%, >=64=92.5%
         submit : 0=0.0%, 4=4.0%, 8=3.7%, 16=4.9%, 32=7.7%, 64=16.1%, >=64=63.6%
         complete : 0=0.0%, 4=3.3%, 8=3.6%, 16=4.5%, 32=6.4%, 64=9.8%, >=64=72.4%
         issued rwts: total=0,2711032,0,0 short=0,0,0,0 dropped=0,0,0,0
         latency : target=0, window=0, percentile=100.00%, depth=256

    Run status group 0 (all jobs):
      WRITE: bw=177MiB/s (185MB/s), 177MiB/s-177MiB/s (185MB/s-185MB/s), io=10.3GiB (11.1GB), run=60002-60002msec

    Disk stats (read/write):
      nvme0n1: ios=0/2765681, merge=0/68424, ticks=0/1409012, in_queue=1409184, util=97.24%

Test read throughput by performing sequential reads with multiple parallel streams (16+), using a 1 MB I/O block size and an I/O depth of at least 64
    fio --name=read_throughput --directory=$TEST_DIR --numjobs=16 \
    --size=10G --time_based --runtime=60s --ramp_time=2s --ioengine=libaio \
    --direct=1 --verify=0 --bs=1M --iodepth=64 --rw=read \
    --group_reporting=1 \
    --iodepth_batch_submit=64 --iodepth_batch_complete_max=64

Results
    read_throughput: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=64
    ...
    fio-3.33
    Starting 16 processes
    read_throughput: Laying out IO file (1 file / 10240MiB)
    read_throughput: Laying out IO file (1 file / 10240MiB)
    read_throughput: Laying out IO file (1 file / 10240MiB)
    read_throughput: Laying out IO file (1 file / 10240MiB)
    read_throughput: Laying out IO file (1 file / 10240MiB)
    read_throughput: Laying out IO file (1 file / 10240MiB)
    read_throughput: Laying out IO file (1 file / 10240MiB)
    read_throughput: Laying out IO file (1 file / 10240MiB)
    read_throughput: Laying out IO file (1 file / 10240MiB)
    read_throughput: Laying out IO file (1 file / 10240MiB)
    read_throughput: Laying out IO file (1 file / 10240MiB)
    read_throughput: Laying out IO file (1 file / 10240MiB)
    read_throughput: Laying out IO file (1 file / 10240MiB)
    read_throughput: Laying out IO file (1 file / 10240MiB)
    read_throughput: Laying out IO file (1 file / 10240MiB)
    read_throughput: Laying out IO file (1 file / 10240MiB)
    Jobs: 16 (f=16): [R(16)][25.8%][r=14.0MiB/s][r=14 IOPS][eta 03m:04s]
    read_throughput: (groupid=0, jobs=16): err= 0: pid=713: Thu Sep 14 09:25:47 2023
      read: IOPS=686, BW=703MiB/s (737MB/s)(42.2GiB/61514msec)
        slat (usec): min=22, max=2499, avg=213.27, stdev=189.48
        clat (msec): min=930, max=3353, avg=1471.58, stdev=237.58
         lat (msec): min=931, max=3353, avg=1471.79, stdev=237.58
        clat percentiles (msec):
         | 1.00th=[ 1011], 5.00th=[ 1150], 10.00th=[ 1200], 20.00th=[ 1284],
         | 30.00th=[ 1334], 40.00th=[ 1401], 50.00th=[ 1469], 60.00th=[ 1519],
         | 70.00th=[ 1586], 80.00th=[ 1653], 90.00th=[ 1720], 95.00th=[ 1838],
         | 99.00th=[ 2165], 99.50th=[ 2635], 99.90th=[ 3138], 99.95th=[ 3205],
         | 99.99th=[ 3272]
       bw ( KiB/s): min=51220, max=1790161, per=100.00%, avg=728539.93, stdev=25696.80, samples=1901
       iops : min= 50, max= 1748, avg=711.12, stdev=25.09, samples=1901
      lat (msec) : 1000=0.63%, 2000=100.28%, >=2000=1.43%
      cpu : usr=0.01%, sys=0.25%, ctx=14884, majf=0, minf=584
      IO depths : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=53.0%, >=64=46.7%
         submit : 0=0.0%, 4=88.5%, 8=9.1%, 16=2.4%, 32=0.1%, 64=0.0%, >=64=0.0%
         complete : 0=0.0%, 4=87.9%, 8=9.3%, 16=2.6%, 32=0.1%, 64=0.1%, >=64=0.0%
         issued rwts: total=42233,0,0,0 short=0,0,0,0 dropped=0,0,0,0
         latency : target=0, window=0, percentile=100.00%, depth=64

    Run status group 0 (all jobs):
      READ: bw=703MiB/s (737MB/s), 703MiB/s-703MiB/s (737MB/s-737MB/s), io=42.2GiB (45.3GB), run=61514-61514msec

    Disk stats (read/write):
      nvme0n1: ios=56662/25, merge=2858/66, ticks=81023493/46174, in_queue=81088222, util=99.04%

Test read IOPS by performing random reads, using a 4 KB I/O block size and an I/O depth of at least 256
    fio --name=read_iops --directory=$TEST_DIR --size=10G \
    --time_based --runtime=60s --ramp_time=2s --ioengine=libaio --direct=1 \
    --verify=0 --bs=4K --iodepth=256 --rw=randread --group_reporting=1 \
    --iodepth_batch_submit=256 --iodepth_batch_complete_max=256

Results
    read_iops: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=256
    fio-3.33
    Starting 1 process
    read_iops: Laying out IO file (1 file / 10240MiB)
    Jobs: 1 (f=1): [r(1)][100.0%][r=704MiB/s][r=180k IOPS][eta 00m:00s]
    read_iops: (groupid=0, jobs=1): err= 0: pid=760: Thu Sep 14 08:21:50 2023
      read: IOPS=180k, BW=702MiB/s (737MB/s)(41.2GiB/60002msec)
        slat (usec): min=2, max=7014, avg=148.95, stdev=115.29
        clat (usec): min=3, max=10971, avg=1239.28, stdev=482.94
         lat (usec): min=115, max=11189, avg=1388.23, stdev=443.47
        clat percentiles (usec):
         | 1.00th=[ 227], 5.00th=[ 490], 10.00th=[ 685], 20.00th=[ 848],
         | 30.00th=[ 963], 40.00th=[ 1057], 50.00th=[ 1172], 60.00th=[ 1369],
         | 70.00th=[ 1549], 80.00th=[ 1663], 90.00th=[ 1795], 95.00th=[ 1909],
         | 99.00th=[ 2638], 99.50th=[ 2900], 99.90th=[ 3228], 99.95th=[ 3556],
         | 99.99th=[ 7111]
       bw ( KiB/s): min=702896, max=723408, per=100.00%, avg=719921.18, stdev=2489.56, samples=119
       iops : min=175724, max=180852, avg=179980.24, stdev=622.37, samples=119
      lat (usec) : 4=0.01%, 10=0.25%, 20=0.22%, 50=0.09%, 100=0.06%
      lat (usec) : 250=0.60%, 500=3.97%, 750=8.06%, 1000=20.59%
      lat (msec) : 2=62.86%, 4=3.27%, 10=0.03%, 20=0.01%
      cpu : usr=11.98%, sys=42.33%, ctx=141255, majf=0, minf=37
      IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.0%, 16=0.1%, 32=0.1%, >=64=99.9%
         submit : 0=0.0%, 4=30.7%, 8=7.4%, 16=11.2%, 32=15.8%, 64=21.8%, >=64=13.2%
         complete : 0=0.0%, 4=32.3%, 8=6.7%, 16=9.7%, 32=13.1%, 64=21.4%, >=64=16.9%
         issued rwts: total=10789453,0,0,0 short=0,0,0,0 dropped=0,0,0,0
         latency : target=0, window=0, percentile=100.00%, depth=256

    Run status group 0 (all jobs):
      READ: bw=702MiB/s (737MB/s), 702MiB/s-702MiB/s (737MB/s-737MB/s), io=41.2GiB (44.2GB), run=60002-60002msec

    Disk stats (read/write):
      nvme0n1: ios=11138129/24, merge=11509/52, ticks=13567488/8, in_queue=13567499, util=99.82%

Create a Pod that uses a hostPath volume
    echo 'apiVersion: v1
    kind: Pod
    metadata:
      name: "test-ssd"
    spec:
      containers:
      - name: "shell"
        image: "nginx"
        volumeMounts:
        - mountPath: "/test-ssd"
          name: "test-ssd"
      volumes:
      - name: "test-ssd"
        hostPath:
          path: "/mnt/stateful_partition/kube-ephemeral-ssd"
      nodeSelector:
        cloud.google.com/gke-ephemeral-storage-local-ssd: "true"' > localssd-hostpath.yaml
    kubectl apply -f localssd-hostpath.yaml

Check the local SSD
    kubectl exec -it test-ssd -- bash
    df -h /test-ssd
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/nvme0n1    369G  2.5G  347G   1% /test-ssd

FAQ
Local SSD count exceeds the limit for the machine type
The error message looks like this:
    gcloud container node-pools create ssd-pool-3 --cluster=before-localssd --ephemeral-storage-local-ssd count=24 --machine-type=n1-standard-8 --zone=asia-east1-a

    ERROR: (gcloud.container.node-pools.create) ResponseError: code=400, message=The number of local SSDs specified (24) exceeds the maximum allowed for this machine type (8).

Solution:
- Reduce the number of local SSDs
- Choose a machine type that supports more local SSDs
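A sketch of failing fast before calling gcloud, assuming you know the limit for your machine type. The only limit stated here is the one in the error message above (8 for n1-standard-8), so the hypothetical `check_ssd_count` takes the limit as an argument rather than hard-coding a table of machine types.

```shell
# Hypothetical pre-flight check: reject a local SSD count above the
# machine type's maximum (passed in by the caller).
check_ssd_count() {
  local count=$1 max=$2
  if [ "$count" -gt "$max" ]; then
    echo "local SSD count $count exceeds the maximum $max for this machine type" >&2
    return 1
  fi
}

# The failing request above: 24 SSDs on n1-standard-8, whose maximum was 8.
check_ssd_count 24 8 || echo "reduce the count or choose another machine type"
```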
Local SSD cannot be mounted
The error message looks like this:
    Events:
      Type     Reason       Age                From     Message
      ----     ------       ----               ----     -------
      Warning  FailedMount  12s (x7 over 60s)  kubelet  MountVolume.SetUp failed for volume "test-ssd" : hostPath type check failed: /mnt/stateful_partition/kube-ephemeral-ssd is not a directory

Solution:
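- Check that the node actually has a local SSD (the Pod's nodeSelector should target nodes labeled cloud.google.com/gke-ephemeral-storage-local-ssd=true)
- Check that the local SSD mount path is correct (/mnt/stateful_partition/kube-ephemeral-ssd in this example)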
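Both checks can be run on the node itself, for example after `gcloud compute ssh`. A minimal sketch with hypothetical function names; the default paths come from the df and fdisk output earlier in this post. The "hostPath type check failed" message is produced by the kubelet when the hostPath volume declares a `type` such as `Directory` and the path does not match it.

```shell
# Hypothetical helpers for the two checks in the solution list.
# Default paths are the ones seen in the df and fdisk output in this post.

# 1. Does the node have a local NVMe SSD? (looks for google-local-nvme-ssd-* links)
has_local_nvme_ssd() {
  local dir=${1:-/dev/disk/by-id} dev
  for dev in "$dir"/google-local-nvme-ssd-*; do
    # An unmatched glob stays literal, so -e filters that case out.
    if [ -e "$dev" ]; then
      return 0
    fi
  done
  return 1
}

# 2. Does the hostPath used by the test-ssd Pod exist as a directory?
check_local_ssd_path() {
  local path=${1:-/mnt/stateful_partition/kube-ephemeral-ssd}
  if [ -d "$path" ]; then
    echo "ok: $path is a directory"
  else
    echo "missing: $path is not a directory" >&2
    return 1
  fi
}

# Usage on the node:
#   has_local_nvme_ssd && check_local_ssd_path
```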