[613405.736532] mlx5_core 0000:2a:00.0: poll_health:971:(pid 0): device's health compromised - reached miss count
[613405.737166] mlx5_core 0000:2a:00.0: print_health_info:491:(pid 0): Health issue observed, firmware internal error, severity(3) ERROR:
[613405.738196] mlx5_core 0000:2a:00.0: print_health_info:495:(pid 0): assert_var[0] 0x00000000
[613405.738781] mlx5_core 0000:2a:00.0: print_health_info:495:(pid 0): assert_var[1] 0x00000000
[613405.739334] mlx5_core 0000:2a:00.0: print_health_info:495:(pid 0): assert_var[2] 0x00000000
[613405.739904] mlx5_core 0000:2a:00.0: print_health_info:495:(pid 0): assert_var[3] 0x00000000
[613405.740465] mlx5_core 0000:2a:00.0: print_health_info:495:(pid 0): assert_var[4] 0x00000000
[613405.741018] mlx5_core 0000:2a:00.0: print_health_info:495:(pid 0): assert_var[5] 0x00000000
[613405.741550] mlx5_core 0000:2a:00.0: print_health_info:498:(pid 0): assert_exit_ptr 0x20a202c8
[613405.742070] mlx5_core 0000:2a:00.0: print_health_info:499:(pid 0): assert_callra 0x20a26488
[613405.742589] mlx5_core 0000:2a:00.0: print_health_info:500:(pid 0): fw_ver 26.35.2000
[613405.743089] mlx5_core 0000:2a:00.0: print_health_info:502:(pid 0): time 0
[613405.743575] mlx5_core 0000:2a:00.0: print_health_info:503:(pid 0): hw_id 0x00000216
[613405.744054] mlx5_core 0000:2a:00.0: print_health_info:504:(pid 0): rfr 0
[613405.744522] mlx5_core 0000:2a:00.0: print_health_info:505:(pid 0): severity 3 (ERROR)
[613405.744989] mlx5_core 0000:2a:00.0: print_health_info:506:(pid 0): irisc_index 7
[613405.745405] mlx5_core 0000:2a:00.0: print_health_info:507:(pid 0): synd 0x1: firmware internal error
[613405.745840] mlx5_core 0000:2a:00.0: print_health_info:509:(pid 0): ext_synd 0x8a02
[613405.746281] mlx5_core 0000:2a:00.0: print_health_info:510:(pid 0): raw fw_ver 0x1a2307d0
[613406.278016] mlx5_core 0000:be:00.1 ens6f1np1: Link up
[613406.285205] 8021q: adding VLAN 0 to HW filter on device ens6f1np1
[613406.325260] IPv6: ADDRCONF(NETDEV_CHANGE): ens6f1np1: link becomes ready
[613406.824530] mlx5_core 0000:2a:00.1: poll_health:971:(pid 0): device's health compromised - reached miss count
[613406.825059] mlx5_core 0000:2a:00.1: print_health_info:491:(pid 0): Health issue observed, firmware internal error, severity(3) ERROR:
[613406.825930] mlx5_core 0000:2a:00.1: print_health_info:495:(pid 0): assert_var[0] 0x00000000
[613406.826328] mlx5_core 0000:2a:00.1: print_health_info:495:(pid 0): assert_var[1] 0x00000000
[613406.826758] mlx5_core 0000:2a:00.1: print_health_info:495:(pid 0): assert_var[2] 0x00000000
[613406.827118] mlx5_core 0000:2a:00.1: print_health_info:495:(pid 0): assert_var[3] 0x00000000
[613406.827503] mlx5_core 0000:2a:00.1: print_health_info:495:(pid 0): assert_var[4] 0x00000000
[613406.827840] mlx5_core 0000:2a:00.1: print_health_info:495:(pid 0): assert_var[5] 0x00000000
[613406.828167] mlx5_core 0000:2a:00.1: print_health_info:498:(pid 0): assert_exit_ptr 0x20a202c8
[613406.828489] mlx5_core 0000:2a:00.1: print_health_info:499:(pid 0): assert_callra 0x20a26488
[613406.828819] mlx5_core 0000:2a:00.1: print_health_info:500:(pid 0): fw_ver 26.35.2000
[613406.829126] mlx5_core 0000:2a:00.1: print_health_info:502:(pid 0): time 0
[613406.829434] mlx5_core 0000:2a:00.1: print_health_info:503:(pid 0): hw_id 0x00000216
[613406.829781] mlx5_core 0000:2a:00.1: print_health_info:504:(pid 0): rfr 0
[613406.830129] mlx5_core 0000:2a:00.1: print_health_info:505:(pid 0): severity 3 (ERROR)
[613406.830479] mlx5_core 0000:2a:00.1: print_health_info:506:(pid 0): irisc_index 7
[613406.830827] mlx5_core 0000:2a:00.1: print_health_info:507:(pid 0): synd 0x1: firmware internal error
[613406.831150] mlx5_core 0000:2a:00.1: print_health_info:509:(pid 0): ext_synd 0x8a02
[613406.831485] mlx5_core 0000:2a:00.1: print_health_info:510:(pid 0): raw fw_ver 0x1a2307d0
[613406.888534] mlx5_core 0000:be:00.0: poll_health:971:(pid 0): device's health compromised - reached miss count
[613406.888971] mlx5_core 0000:be:00.0: print_health_info:491:(pid 0): Health issue observed, firmware internal error, severity(3) ERROR:
[613406.889684] mlx5_core 0000:be:00.0: print_health_info:495:(pid 0): assert_var[0] 0x00000000
[613406.890047] mlx5_core 0000:be:00.0: print_health_info:495:(pid 0): assert_var[1] 0x00000000
[613406.890392] mlx5_core 0000:be:00.0: print_health_info:495:(pid 0): assert_var[2] 0x00000000
[613406.890720] mlx5_core 0000:be:00.0: print_health_info:495:(pid 0): assert_var[3] 0x00000000
[613406.891010] mlx5_core 0000:be:00.0: print_health_info:495:(pid 0): assert_var[4] 0x00000000
[613406.891308] mlx5_core 0000:be:00.0: print_health_info:495:(pid 0): assert_var[5] 0x00000000
[613406.891605] mlx5_core 0000:be:00.0: print_health_info:498:(pid 0): assert_exit_ptr 0x20a202c8
[613406.891893] mlx5_core 0000:be:00.0: print_health_info:499:(pid 0): assert_callra 0x20a26488
[613406.892171] mlx5_core 0000:be:00.0: print_health_info:500:(pid 0): fw_ver 26.35.2000
[613406.892438] mlx5_core 0000:be:00.0: print_health_info:502:(pid 0): time 0
[613406.892705] mlx5_core 0000:be:00.0: print_health_info:503:(pid 0): hw_id 0x00000216
[613406.892999] mlx5_core 0000:be:00.0: print_health_info:504:(pid 0): rfr 0
[613406.893296] mlx5_core 0000:be:00.0: print_health_info:505:(pid 0): severity 3 (ERROR)
[613406.893602] mlx5_core 0000:be:00.0: print_health_info:506:(pid 0): irisc_index 7
[613406.893909] mlx5_core 0000:be:00.0: print_health_info:507:(pid 0): synd 0x1: firmware internal error
[613406.894213] mlx5_core 0000:be:00.0: print_health_info:509:(pid 0): ext_synd 0x8a02
[613406.894519] mlx5_core 0000:be:00.0: print_health_info:510:(pid 0): raw fw_ver 0x1a2307d0
[613407.976530] mlx5_core 0000:be:00.1: poll_health:971:(pid 0): device's health compromised - reached miss count
[613407.976887] mlx5_core 0000:be:00.1: print_health_info:491:(pid 0): Health issue observed, firmware internal error, severity(3) ERROR:
[613407.977530] mlx5_core 0000:be:00.1: print_health_info:495:(pid 0): assert_var[0] 0x00000000
[613407.977875] mlx5_core 0000:be:00.1: print_health_info:495:(pid 0): assert_var[1] 0x00000000
[613407.978170] mlx5_core 0000:be:00.1: print_health_info:495:(pid 0): assert_var[2] 0x00000000
[613407.978499] mlx5_core 0000:be:00.1: print_health_info:495:(pid 0): assert_var[3] 0x00000000
[613407.978816] mlx5_core 0000:be:00.1: print_health_info:495:(pid 0): assert_var[4] 0x00000000
[613407.979092] mlx5_core 0000:be:00.1: print_health_info:495:(pid 0): assert_var[5] 0x00000000
[613407.979379] mlx5_core 0000:be:00.1: print_health_info:498:(pid 0): assert_exit_ptr 0x20a202c8
[613407.979656] mlx5_core 0000:be:00.1: print_health_info:499:(pid 0): assert_callra 0x20a26488
[613407.979932] mlx5_core 0000:be:00.1: print_health_info:500:(pid 0): fw_ver 26.35.2000
[613407.980196] mlx5_core 0000:be:00.1: print_health_info:502:(pid 0): time 0
[613407.980464] mlx5_core 0000:be:00.1: print_health_info:503:(pid 0): hw_id 0x00000216
[613407.980729] mlx5_core 0000:be:00.1: print_health_info:504:(pid 0): rfr 0
[613407.980995] mlx5_core 0000:be:00.1: print_health_info:505:(pid 0): severity 3 (ERROR)
[613407.981273] mlx5_core 0000:be:00.1: print_health_info:506:(pid 0): irisc_index 7
[613407.981559] mlx5_core 0000:be:00.1: print_health_info:507:(pid 0): synd 0x1: firmware internal error
[613407.981840] mlx5_core 0000:be:00.1: print_health_info:509:(pid 0): ext_synd 0x8a02
[613407.982121] mlx5_core 0000:be:00.1: print_health_info:510:(pid 0): raw fw_ver 0x1a2307d0
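The log above repeats the same health dump for four PCI functions (0000:2a:00.0, 0000:2a:00.1, 0000:be:00.0, 0000:be:00.1). A minimal sketch for condensing such a dump into one record per function is shown below. The helper names are hypothetical, and the byte layout used to decode "raw fw_ver" is inferred only from the pair of values in this log (0x1a2307d0 next to 26.35.2000), not taken from driver documentation.

```python
import re

# Matches mlx5_core health lines such as:
# [613405.746281] mlx5_core 0000:2a:00.0: print_health_info:510:(pid 0): raw fw_ver 0x1a2307d0
HEALTH_LINE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+mlx5_core\s+(?P<bdf>[0-9a-f:.]+):\s+"
    r"(?P<func>poll_health|print_health_info):\d+:\(pid \d+\):\s+(?P<msg>.*)$"
)


def decode_raw_fw_ver(raw):
    """Split a 32-bit raw fw_ver into major.minor.sub.

    Assumed layout (inferred from this log): 8-bit major, 8-bit minor,
    16-bit sub-minor.
    """
    major = (raw >> 24) & 0xFF
    minor = (raw >> 16) & 0xFF
    sub = raw & 0xFFFF
    return f"{major}.{minor}.{sub}"


def summarize(log_text):
    """Return {pci_address: {field: value}} for the health fields of interest."""
    devices = {}
    for line in log_text.splitlines():
        m = HEALTH_LINE.match(line.strip())
        if not m:
            continue  # e.g. "Link up" or 8021q lines
        fields = devices.setdefault(m["bdf"], {})
        msg = m["msg"]
        if msg.startswith("raw fw_ver "):
            fields["raw_fw_ver"] = int(msg.split()[-1], 16)
        elif msg.startswith("fw_ver "):
            fields["fw_ver"] = msg.split()[-1]
        elif msg.startswith("ext_synd "):
            fields["ext_synd"] = int(msg.split()[-1], 16)
        elif msg.startswith("synd "):
            # "synd 0x1: firmware internal error" -> 0x1
            fields["synd"] = int(msg.split()[1].rstrip(":"), 16)
    return devices


if __name__ == "__main__":
    print(decode_raw_fw_ver(0x1A2307D0))  # 26.35.2000
```

Running summarize() over the full dump makes it easy to confirm that all four functions report the same synd, ext_synd, and firmware version.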
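QuadEn parameter description:
QuadEn = 1 means the Flash operates in quad (four-wire) mode; QuadEn = 0 means the Flash operates in dual (two-wire) mode.
Quad mode and dual mode describe how the Flash communicates with the SPI Flash programmer and with the NIC firmware (FW). Quad mode is faster than dual mode. In some cases, when the FW reads data from the Flash while the Flash is in dual mode, the limited transfer rate can keep the Flash from answering the FW's requests in time, which causes problems while the FW is running.
During NIC power-on, the FW reads data from the Flash. The FW first checks whether the Flash supports quad mode: if it does, the FW communicates in quad mode; if not, it communicates in dual mode.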
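The power-on decision described above can be sketched as follows. This is only an illustration of the stated rule (quad mode if the Flash supports it, dual mode otherwise); the names SpiMode, select_mode, and quad_en_value are hypothetical and do not come from the firmware.

```python
from enum import Enum


class SpiMode(Enum):
    DUAL = 2  # two-wire mode, corresponds to QuadEn = 0
    QUAD = 4  # four-wire mode, corresponds to QuadEn = 1


def select_mode(flash_supports_quad):
    """Pick the Flash communication mode the way the note describes it."""
    return SpiMode.QUAD if flash_supports_quad else SpiMode.DUAL


def quad_en_value(mode):
    """Map a mode back to the QuadEn value from the parameter description."""
    return 1 if mode is SpiMode.QUAD else 0


if __name__ == "__main__":
    for supported in (True, False):
        mode = select_mode(supported)
        print(supported, mode.name, quad_en_value(mode))
```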
Conclusion:
Enable the QuadEn parameter in the firmware.
Solution:
The NICs used in this test had not gone through the FT stage of production; QuadEn is enabled during the production FT stage.
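How to change it:
Install the mft tool as described in "Link LED stays lit after ip link set down" (《ip link set down關閉后link燈依然點亮》), use it to change the firmware parameter QuadEn, and reboot for the change to take effect.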
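The referenced document is not reproduced here, so the exact procedure is not known. As a hedged sketch, the helper below only builds (and does not run) the command lines one would expect when using the flint utility from mft, assuming its "hw query" and "hw set QuadEn=1" subcommands are what that document uses. The function name and the device placeholder are hypothetical; substitute the real mst device path for your NIC.

```python
import shlex


def flint_quad_en_commands(device):
    """Return argv lists for querying Flash settings and enabling QuadEn.

    Assumption: the mft 'flint' tool is used with 'hw query' / 'hw set'.
    Nothing is executed here; the caller decides whether to run them.
    """
    return [
        ["flint", "-d", device, "hw", "query"],           # inspect current QuadEn
        ["flint", "-d", device, "hw", "set", "QuadEn=1"],  # enable quad mode
    ]


if __name__ == "__main__":
    # "<mst-device>" is a placeholder, not a real path.
    for argv in flint_quad_en_commands("<mst-device>"):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Building the commands as argument lists keeps them safe to pass to subprocess.run without a shell once the device path has been verified.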