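Problem Description
At a customer site we used the vSAN cluster shutdown wizard, a new feature in 7.0 U3c, to shut down a vSAN cluster for maintenance. The cluster consists of four Dell R940xa nodes, and vCenter runs on a non-vSAN node. The shutdown passed every pre-check, and the vSAN hosts had shut down cleanly before power was removed. After the vSAN cluster was powered back on, all vSAN virtual machines were listed as inaccessible and nothing was visible when browsing the datastore (through either the GUI or the command line), although the reported vSAN capacity was normal.
The button to restart the cluster was not present, so our engineers followed the KB and restarted the cluster manually from the command line. However, the recovery script timed out: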
[root@esxi-ip21:/tmp] python /usr/lib/vmware/vsan/bin/reboot_helper.py recover
Begin to recover the cluster...
Time among connected hosts are synchronized.
Scheduled vSAN cluster restore task.
Waiting for the scheduled task...(18s left)
Checking network status...
Recovery is not ready, retry after 10s...
Recovery is not ready, retry after 10s...
Recovery is not ready, retry after 10s...
Timeout, please try again later
Trying each of the other vSAN nodes in turn produced the same timeout, yet the cluster appeared to have re-formed correctly:
[root@esxi-ip24:~] esxcli vsan health cluster list -w
Health Test Name Status
--------------------------------------------------------------------- ------
Overall health red (vSAN Object health)
Data red
vSAN object health (objecthealth) red
vSAN object format health (objectformat) green
Performance service red
Stats DB object (statsdb) green
Stats primary election (masterexist) red
Network green
Hosts with connectivity issues (hostconnectivity) green
vSAN cluster partition (clusterpartition) green
All hosts have a vSAN vmknic configured (vsanvmknic) green
vSAN: Basic (unicast) connectivity check (smallping) green
vSAN: MTU check (ping with large packet size) (largeping) green
vMotion: Basic (unicast) connectivity check (vmotionpingsmall) green
vMotion: MTU check (ping with large packet size) (vmotionpinglarge) green
Network latency check (hostlatencycheck) green
Physical disk green
Operation health (physdiskoverall) green
Disk capacity (physdiskcapacity) green
Congestion (physdiskcongestion) green
Component limit health (physdiskcomplimithealth) green
Component metadata health (componentmetadata) green
Memory pools (heaps) (lsomheap) green
Memory pools (slabs) (lsomslab) green
Cluster green
Advanced vSAN configuration in sync (advcfgsync) green
vSAN daemon liveness (clomdliveness) green
vSAN Disk Balance (diskbalance) green
Resync operations throttling (resynclimit) green
Software version compatibility (upgradesoftware) green
Disk format version (upgradelowerhosts) green
Capacity utilization green
Storage space (diskspace) green
Read cache reservations (rcreservation) green
Component (nodecomponentlimit) green
What if the most consumed host fails (limit1hf) green
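With this many checks in the health listing, the failing ones are easy to miss. The following is a minimal sketch, not part of the original troubleshooting, that pulls the non-green rows out of saved `esxcli vsan health cluster list -w` output. The function name `non_green_tests` is hypothetical, and the sketch assumes the status column only ever contains `green`, `yellow`, or `red`, as in the listing above.

```python
# Assumption: these are the only status words that appear in the listing.
STATUSES = {"green", "yellow", "red"}


def non_green_tests(health_output):
    """Return (test name, status) pairs for every row that is not green."""
    findings = []
    for line in health_output.splitlines():
        tokens = line.split()
        # Scan from the right so that a trailing note such as
        # "(vSAN Object health)" after the status does not hide it.
        for i in range(len(tokens) - 1, -1, -1):
            if tokens[i] in STATUSES:
                if tokens[i] != "green":
                    findings.append((" ".join(tokens[:i]), tokens[i]))
                break
    return findings


if __name__ == "__main__":
    import sys

    # Usage: pipe the saved command output into this script.
    for name, status in non_green_tests(sys.stdin.read()):
        print(status, name)
```

Applied to the listing above, this would reduce the output to the red rows under Overall health, Data, and Performance service.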
[root@esxi-ip21:~] esxcli vsan cluster get
Cluster Information
Enabled: true
Current Local Time: 2022-04-05T06:59:46Z
Local Node UUID: 61706d49-8294-acd0-d16d-0c42a188a480
Local Node Type: NORMAL
Local Node State: AGENT
Local Node Health State: HEALTHY
Sub-Cluster Master UUID: 61630ac8-a9d3-010e-0f3b-0c42a1891c44
Sub-Cluster Backup UUID: 61630325-2430-662c-2398-0c42a188cf94
Sub-Cluster UUID: 521dc69b-43c2-c545-9394-ed3e2a26d54a
Sub-Cluster Membership Entry Revision: 5
Sub-Cluster Member Count: 4
Sub-Cluster Member UUIDs: 61630ac8-a9d3-010e-0f3b-0c42a1891c44, 61630325-2430-662c-2398-0c42a188cf94, 61630d69-99d0-a086-19d6-0c42a188d324, 61706d49-8294-acd0-d16d-0c42a188a480
Sub-Cluster Member HostNames: esxi-ip23, esxi-ip24, esxi-ip22, esxi-ip21
Sub-Cluster Membership UUID: 80c34b62-e2af-95a3-3cbc-0c42a18906f4
Unicast Mode Enabled: true
Maintenance Mode State: OFF
Config Generation: be662064-2725-4b82-82aa-314d0f5628c8 14 2022-04-05T06:19:10.244
Mode: REGULAR
[root@esxi-ip22:~] localcli vsan cluster get
Cluster Information:
Enabled: true
Current Local Time: 2022-04-05T11:53:07Z
Local Node UUID: 61630d69-99d0-a086-19d6-0c42a188d324
Local Node Type: NORMAL
Local Node State: AGENT
Local Node Health State: HEALTHY
Sub-Cluster Master UUID: 61630ac8-a9d3-010e-0f3b-0c42a1891c44
Sub-Cluster Backup UUID: 61706d49-8294-acd0-d16d-0c42a188a480
Sub-Cluster UUID: 521dc69b-43c2-c545-9394-ed3e2a26d54a
Sub-Cluster Membership Entry Revision: 3
Sub-Cluster Member Count: 4
Sub-Cluster Member UUIDs: 61630ac8-a9d3-010e-0f3b-0c42a1891c44, 61706d49-8294-acd0-d16d-0c42a188a480, 61630325-2430-662c-2398-0c42a188cf94, 61630d69-99d0-a086-19d6-0c42a188d324
Sub-Cluster Member HostNames: esxi-ip23, esxi-ip21, esxi-ip24, esxi-ip22
Sub-Cluster Membership UUID: 3eed4b62-8a16-340c-ee17-0c42a18906f4
Unicast Mode Enabled: true
Maintenance Mode State: OFF
Config Generation: be662064-2725-4b82-82aa-314d0f5628c8 14 2022-04-05T06:19:10.237
Mode: REGULAR
[root@esxi-ip23:~] localcli vsan cluster get
Cluster Information:
Enabled: true
Current Local Time: 2022-04-05T11:53:15Z
Local Node UUID: 61630ac8-a9d3-010e-0f3b-0c42a1891c44
Local Node Type: NORMAL
Local Node State: MASTER
Local Node Health State: HEALTHY
Sub-Cluster Master UUID: 61630ac8-a9d3-010e-0f3b-0c42a1891c44
Sub-Cluster Backup UUID: 61706d49-8294-acd0-d16d-0c42a188a480
Sub-Cluster UUID: 521dc69b-43c2-c545-9394-ed3e2a26d54a
Sub-Cluster Membership Entry Revision: 3
Sub-Cluster Member Count: 4
Sub-Cluster Member UUIDs: 61630ac8-a9d3-010e-0f3b-0c42a1891c44, 61706d49-8294-acd0-d16d-0c42a188a480, 61630325-2430-662c-2398-0c42a188cf94, 61630d69-99d0-a086-19d6-0c42a188d324
Sub-Cluster Member HostNames: esxi-ip23, esxi-ip21, esxi-ip24, esxi-ip22
Sub-Cluster Membership UUID: 3eed4b62-8a16-340c-ee17-0c42a18906f4
Unicast Mode Enabled: true
Maintenance Mode State: OFF
Config Generation: be662064-2725-4b82-82aa-314d0f5628c8 14 2022-04-05T06:19:10.240
Mode: REGULAR
[root@esxi-ip23:~] esxcli vsan network list
[root@esxi-ip24:~] localcli vsan cluster get
Cluster Information:
Enabled: true
Current Local Time: 2022-04-05T11:53:22Z
Local Node UUID: 61630325-2430-662c-2398-0c42a188cf94
Local Node Type: NORMAL
Local Node State: AGENT
Local Node Health State: HEALTHY
Sub-Cluster Master UUID: 61630ac8-a9d3-010e-0f3b-0c42a1891c44
Sub-Cluster Backup UUID: 61706d49-8294-acd0-d16d-0c42a188a480
Sub-Cluster UUID: 521dc69b-43c2-c545-9394-ed3e2a26d54a
Sub-Cluster Membership Entry Revision: 3
Sub-Cluster Member Count: 4
Sub-Cluster Member UUIDs: 61630ac8-a9d3-010e-0f3b-0c42a1891c44, 61706d49-8294-acd0-d16d-0c42a188a480, 61630325-2430-662c-2398-0c42a188cf94, 61630d69-99d0-a086-19d6-0c42a188d324
Sub-Cluster Member HostNames: esxi-ip23, esxi-ip21, esxi-ip24, esxi-ip22
Sub-Cluster Membership UUID: 3eed4b62-8a16-340c-ee17-0c42a18906f4
Unicast Mode Enabled: true
Maintenance Mode State: OFF
Config Generation: be662064-2725-4b82-82aa-314d0f5628c8 14 2022-04-05T06:19:10.238
Mode: REGULAR
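Comparing four of these outputs by eye is error-prone, so the comparison can be sketched in Python. This is an illustrative helper, not something the engineers ran: the names `parse_cluster_get`, `differing_fields`, and `COMPARE_KEYS` are hypothetical, and the choice of fields to compare is an assumption. Note that the esxi-ip21 output above was captured about five hours before the other three, so any difference it shows may only reflect that.

```python
# Assumption: these are the fields worth comparing across hosts.
COMPARE_KEYS = (
    "Sub-Cluster UUID",
    "Sub-Cluster Master UUID",
    "Sub-Cluster Backup UUID",
    "Sub-Cluster Member Count",
    "Sub-Cluster Member UUIDs",
    "Sub-Cluster Membership UUID",
)
# Fields holding comma-separated lists whose order may differ per host.
LIST_KEYS = {"Sub-Cluster Member UUIDs"}


def parse_cluster_get(text):
    """Turn 'Key: Value' lines of 'vsan cluster get' output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep:  # skips headings such as "Cluster Information:"
            fields[key] = value.strip()
    return fields


def normalize(key, value):
    """Make list-valued fields order-independent before comparing."""
    if value is not None and key in LIST_KEYS:
        return tuple(sorted(item.strip() for item in value.split(",")))
    return value


def differing_fields(outputs, keys=COMPARE_KEYS):
    """outputs maps host name -> raw command output.

    Returns {field: {host: raw value}} for fields that are not identical
    on every host.
    """
    parsed = {host: parse_cluster_get(text) for host, text in outputs.items()}
    diffs = {}
    for key in keys:
        raw = {host: fields.get(key) for host, fields in parsed.items()}
        if len({normalize(key, v) for v in raw.values()}) > 1:
            diffs[key] = raw
    return diffs
```

Passing a dict such as `{"esxi-ip22": text22, "esxi-ip23": text23}` returns an empty dict when the hosts agree on every compared field.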
A check of the vSAN storage still showed no errors:
[root@esxi-ip24:~] localcli vsan storage list | grep CMMDS
In CMMDS: true
In CMMDS: true
In CMMDS: true
In CMMDS: true
In CMMDS: true
In CMMDS: true
In CMMDS: true
In CMMDS: true
In CMMDS: true
In CMMDS: true
In CMMDS: true
In CMMDS: true
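The same check can be tallied rather than read line by line. This is a small sketch under the assumption that each disk contributes exactly one `In CMMDS:` line to the `vsan storage list` output; the function name `cmmds_membership` is hypothetical.

```python
from collections import Counter


def cmmds_membership(storage_output):
    """Count the 'In CMMDS' values found in 'vsan storage list' output."""
    counts = Counter()
    for line in storage_output.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep and key == "In CMMDS":
            counts[value.strip().lower()] += 1
    return counts
```

Any count under a key other than `true` would point at a disk that has not joined CMMDS on that host.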
We went on to check the network on each node and still found no problems:
[root@esxi-ip24:~] esxcli vsan network list
Interface
VmkNic Name: vmk1
IP Protocol: IP
Interface UUID: 52b76165-9ffc-ef69-93ca-b0a31b7caf98
Agent Group Multicast Address: 224.2.3.4
Agent Group IPv6 Multicast Address: ff19::2:3:4
Agent Group Multicast Port: 23451
Master Group Multicast Address: 224.1.2.3
Master Group IPv6 Multicast Address: ff19::1:2:3
Master Group Multicast Port: 12345
Host Unicast Channel Bound Port: 12321
Data-in-Transit Encryption Key Exchange Port: 0
Multicast TTL: 5
Traffic Type: vsan
[root@esxi-ip24:~] vmkping -I vmk1 192.168.90.22
PING 192.168.90.22 (192.168.90.22): 56 data bytes
64 bytes from 192.168.90.22: icmp_seq=0 ttl=64 time=0.215 ms
64 bytes from 192.168.90.22: icmp_seq=1 ttl=64 time=0.108 ms
64 bytes from 192.168.90.22: icmp_seq=2 ttl=64 time=0.094 ms
Solution
https://kb.vmware.com/s/article/87350
Summary
If you run hyperconverged infrastructure, buy a support contract. Buy a support contract. Buy a support contract!