From: "Bills, Jason M" <jason.m.bills@linux.intel.com>
To: openbmc@lists.ozlabs.org
Subject: Re: [Help] How to debug 'sometimes the specific sensors get no reading after AC cycle the system' issue?
Date: Mon, 9 Dec 2024 10:08:37 -0700
Message-ID: <8ea7caf7-2f46-4bf8-bf44-cc3061a3ebf0@linux.intel.com>
In-Reply-To: <LV8PR08MB94865788E99E0FC65AB42B119A372@LV8PR08MB9486.namprd08.prod.outlook.com>
On 12/3/2024 9:44 PM, Jacky Lee (TPI) wrote:
> Hi sir,
>
> We have an Intel Birch Stream platform, and our BMC FW developers are
> implementing OpenBMC on it with a DC-SCM module. The BMC chip is an
> ASPEED AST2600 and the RoT is an ASPEED AST1060.
>
> We have an issue where some specific sensors sometimes get no reading
> after an AC power cycle of the system. The failure rate is about 12%.
> Below is an example log:
>
> CPU1_PVCCA_EHV | 00h | ok | 0.1 | 0.39 Amps
> CPU1_PVCCA_EHV | 01h | ok | 0.1 | 2 Amps
> CPU1_PVCCD0 | 02h | ok | 0.1 | 0.16 Amps
> CPU1_PVCCD0 | 03h | ok | 0.1 | 2 Amps
> CPU1_PVCCFA_EHV_ | 04h | ok | 0.1 | 3.90 Amps
> CPU1_PVCCFA_EHV_ | 05h | ok | 0.1 | 29 Amps
> CPU1_PVCCINF | 06h | ok | 0.1 | 1.25 Amps
> CPU1_PVCCINF | 07h | ok | 0.1 | 17 Amps
> CPU1_PVNN | 08h | ok | 0.1 | 0.08 Amps
> CPU1_PVNN | 09h | ok | 0.1 | 1 Amps
> CPU1_VCCIN | 0Ah | ok | 0.1 | 255 Amps
> CPU2_PVCCA_EHV | 0Bh | ok | 0.1 | 0.39 Amps
> CPU2_PVCCA_EHV | 0Ch | ok | 0.1 | 2 Amps
> CPU2_PVCCD0 | 0Dh | ok | 0.1 | 0.31 Amps
> CPU2_PVCCD0 | 0Eh | ok | 0.1 | 3 Amps
> CPU2_PVCCFA_EHV_ | 0Fh | ok | 0.1 | 4.76 Amps
> CPU2_PVCCFA_EHV_ | 10h | ok | 0.1 | 26 Amps
> CPU2_PVCCINF | 11h | ok | 0.1 | 1.09 Amps
> CPU2_PVCCINF | 12h | ok | 0.1 | 15 Amps
> CPU2_PVNN | 13h | ok | 0.1 | 0.08 Amps
> CPU2_PVNN | 14h | ok | 0.1 | 1 Amps
> CPU2_VCCIN | 15h | ok | 0.1 | 255 Amps
> FAN0_INLET_PWM | 16h | ok | 0.1 | 29.79 unspecifi
> FAN0_OUTLET_PWM | 17h | ok | 0.1 | 29.79 unspecifi
> FAN1_INLET_PWM | 18h | ok | 0.1 | 29.79 unspecifi
> FAN1_OUTLET_PWM | 19h | ok | 0.1 | 29.79 unspecifi
> FAN2_INLET_PWM | 1Ah | ok | 0.1 | 29.79 unspecifi
> FAN2_OUTLET_PWM | 1Bh | ok | 0.1 | 29.79 unspecifi
> FAN3_INLET_PWM | 1Ch | ok | 0.1 | 29.79 unspecifi
> FAN3_OUTLET_PWM | 1Dh | ok | 0.1 | 29.79 unspecifi
> FAN0_INLET_TACH | 1Eh | ok | 0.1 | 5292 RPM
> FAN0_OUTLET_TACH | 1Fh | ok | 0.1 | 4508 RPM
> FAN1_INLET_TACH | 20h | ok | 0.1 | 5390 RPM
> FAN1_OUTLET_TACH | 21h | ok | 0.1 | 4508 RPM
> FAN2_INLET_TACH | 22h | ok | 0.1 | 5390 RPM
> FAN2_OUTLET_TACH | 23h | ok | 0.1 | 4606 RPM
> FAN3_INLET_TACH | 24h | ok | 0.1 | 5390 RPM
> FAN3_OUTLET_TACH | 25h | ok | 0.1 | 4606 RPM
> CPU1_PVCCA_EHV | 26h | ok | 0.1 | 0 Watts
> CPU1_PVCCA_EHV | 27h | ok | 0.1 | 0 Watts
> CPU1_PVCCFA_EHV_ | 28h | ok | 0.1 | 59 Watts
> CPU1_PVCCFA_EHV_ | 29h | ok | 0.1 | 47.20 Watts
> CPU1_PVCCINF | 2Ah | ok | 0.1 | 11.80 Watts
> CPU1_PVCCINF | 2Bh | ok | 0.1 | 11.80 Watts
> CPU1_VCCIN | 2Ch | ok | 0.1 | 82.60 Watts
> CPU1_VCCIN | 2Dh | ok | 0.1 | 70.80 Watts
> CPU2_PVCCA_EHV | 2Eh | ok | 0.1 | 0 Watts
> CPU2_PVCCA_EHV | 2Fh | ok | 0.1 | 0 Watts
> CPU2_PVCCFA_EHV_ | 30h | ok | 0.1 | 47.20 Watts
> CPU2_PVCCFA_EHV_ | 31h | ok | 0.1 | 47.20 Watts
> CPU2_PVCCINF | 32h | ok | 0.1 | 11.80 Watts
> CPU2_PVCCINF | 33h | ok | 0.1 | 11.80 Watts
> CPU2_VCCIN | 34h | ok | 0.1 | 82.60 Watts
> CPU2_VCCIN | 35h | ok | 0.1 | 70.80 Watts
> Cpu_Power_Averag | 36h | ok | 0.1 | 124 Watts
> *Cpu_Power_Averag | 37h | ns | 0.1 | No Reading*
> Cpu_Power_Cap_CP | 38h | ok | 0.1 | 0 Watts
> *Cpu_Power_Cap_CP | 39h | ns | 0.1 | No Reading*
> Dimm_Power_Avera | 3Ah | ok | 0.1 | 300 Watts
> *Dimm_Power_Avera | 3Bh | ns | 0.1 | No Reading*
> Dimm_Power_Cap_C | 3Ch | ok | 0.1 | 0 Watts
> CPU1_PVCCA_Contr | 3Eh | ok | 0.1 | 34 degrees C
> CPU1_PVCCA_EHV | 3Fh | ok | 0.1 | 34 degrees C
> CPU1_PVCCD0 | 40h | ok | 0.1 | 42 degrees C
> CPU1_PVCCFA_Cont | 41h | ok | 0.1 | 43 degrees C
> CPU1_PVCCFA_EHV_ | 42h | ok | 0.1 | 44 degrees C
> CPU1_VCCIN | 43h | ok | 0.1 | 49 degrees C
> CPU2_PVCCA_Contr | 44h | ok | 0.1 | 33 degrees C
> CPU2_PVCCA_EHV | 45h | ok | 0.1 | 33 degrees C
> CPU2_PVCCD0 | 46h | ok | 0.1 | 42 degrees C
> CPU2_PVCCFA_Cont | 47h | ok | 0.1 | 42 degrees C
> CPU2_PVCCFA_EHV_ | 48h | ok | 0.1 | 44 degrees C
> CPU2_VCCIN | 49h | ok | 0.1 | 51 degrees C
> *DIMM_A1_CPU1 | 4Ah | ns | 0.1 | No Reading*
> DIMM_A1_CPU2 | 4Bh | ok | 0.1 | 36 degrees C
> *DIMM_A2_CPU1 | 4Ch | ns | 0.1 | No Reading*
> DIMM_A2_CPU2 | 4Dh | ok | 0.1 | 36 degrees C
> DIMM_B1_CPU1 | 4Eh | ok | 0.1 | 36 degrees C
> DIMM_B1_CPU2 | 4Fh | ok | 0.1 | 36 degrees C
> DIMM_B2_CPU1 | 50h | ok | 0.1 | 36 degrees C
> DIMM_B2_CPU2 | 51h | ok | 0.1 | 36 degrees C
> DIMM_C1_CPU1 | 52h | ok | 0.1 | 35 degrees C
> DIMM_C1_CPU2 | 53h | ok | 0.1 | 36 degrees C
> DIMM_C2_CPU1 | 54h | ok | 0.1 | 35 degrees C
> DIMM_C2_CPU2 | 55h | ok | 0.1 | 36 degrees C
> DIMM_D1_CPU1 | 56h | ok | 0.1 | 34 degrees C
> DIMM_D1_CPU2 | 57h | ok | 0.1 | 36 degrees C
> DIMM_D2_CPU1 | 58h | ok | 0.1 | 34 degrees C
> DIMM_D2_CPU2 | 59h | ok | 0.1 | 36 degrees C
> DIMM_E1_CPU1 | 5Ah | ok | 0.1 | 34 degrees C
> DIMM_E1_CPU2 | 5Bh | ok | 0.1 | 35 degrees C
> DIMM_E2_CPU1 | 5Ch | ok | 0.1 | 34 degrees C
> DIMM_E2_CPU2 | 5Dh | ok | 0.1 | 35 degrees C
> DIMM_F1_CPU1 | 5Eh | ok | 0.1 | 32 degrees C
> DIMM_F1_CPU2 | 5Fh | ok | 0.1 | 34 degrees C
> DIMM_F2_CPU1 | 60h | ok | 0.1 | 32 degrees C
> DIMM_F2_CPU2 | 61h | ok | 0.1 | 34 degrees C
> DIMM_G1_CPU1 | 62h | ok | 0.1 | 37 degrees C
> DIMM_G1_CPU2 | 63h | ok | 0.1 | 35 degrees C
> DIMM_G2_CPU1 | 64h | ok | 0.1 | 37 degrees C
> DIMM_G2_CPU2 | 65h | ok | 0.1 | 35 degrees C
> DIMM_H1_CPU1 | 66h | ok | 0.1 | 37 degrees C
> DIMM_H1_CPU2 | 67h | ok | 0.1 | 35 degrees C
> DIMM_H2_CPU1 | 68h | ok | 0.1 | 37 degrees C
> DIMM_H2_CPU2 | 69h | ok | 0.1 | 35 degrees C
> DIMM_I1_CPU1 | 6Ah | ok | 0.1 | 36 degrees C
> DIMM_I1_CPU2 | 6Bh | ok | 0.1 | 35 degrees C
> DIMM_I2_CPU1 | 6Ch | ok | 0.1 | 36 degrees C
> DIMM_I2_CPU2 | 6Dh | ok | 0.1 | 35 degrees C
> DIMM_J1_CPU1 | 6Eh | ok | 0.1 | 35 degrees C
> DIMM_J1_CPU2 | 6Fh | ok | 0.1 | 35 degrees C
> DIMM_J2_CPU1 | 70h | ok | 0.1 | 35 degrees C
> DIMM_J2_CPU2 | 71h | ok | 0.1 | 35 degrees C
> DIMM_K1_CPU1 | 72h | ok | 0.1 | 35 degrees C
> DIMM_K1_CPU2 | 73h | ok | 0.1 | 34 degrees C
> DIMM_K2_CPU1 | 74h | ok | 0.1 | 35 degrees C
> DIMM_K2_CPU2 | 75h | ok | 0.1 | 34 degrees C
> DIMM_L1_CPU1 | 76h | ok | 0.1 | 35 degrees C
> DIMM_L1_CPU2 | 77h | ok | 0.1 | 34 degrees C
> DIMM_L2_CPU1 | 78h | ok | 0.1 | 35 degrees C
> DIMM_L2_CPU2 | 79h | ok | 0.1 | 34 degrees C
> DTS_CPU1 | 7Ah | ok | 0.1 | 57 degrees C
> *DTS_CPU2 | 7Bh | ns | 0.1 | No Reading*
> Die_CPU1 | 7Ch | ok | 0.1 | 57 degrees C
> *Die_CPU2 | 7Dh | ns | 0.1 | No Reading*
> T_DBB_U44 | 7Eh | ok | 0.1 | 28 degrees C
> T_DCSCMB_U91 | 7Fh | ok | 0.1 | 30 degrees C
> T_FIOB_U1 | 80h | ok | 0.1 | 30 degrees C
> T_MB_U30 | 81h | ok | 0.1 | 40 degrees C
> T_MB_U31 | 82h | ok | 0.1 | 39 degrees C
> T_MB_U32 | 83h | ok | 0.1 | 29 degrees C
> T_MB_U33 | 84h | ok | 0.1 | 29 degrees C
> T_NVME_E3S_1 | 85h | ok | 0.1 | 26.89 degrees C
> T_NVME_E3S_2 | 86h | ok | 0.1 | 26.89 degrees C
> T_NVME_E3S_3 | 87h | ok | 0.1 | 26.89 degrees C
> T_NVME_E3S_4 | 88h | ok | 0.1 | 26.89 degrees C
> T_NVME_E3S_5 | 89h | ok | 0.1 | 26.89 degrees C
> T_NVME_E3S_6 | 8Ah | ok | 0.1 | 26.89 degrees C
> T_NVME_E3S_7 | 8Bh | ok | 0.1 | 27.89 degrees C
> T_NVME_E3S_8 | 8Ch | ok | 0.1 | 27.89 degrees C
> T_NVME_M2_0 | 8Dh | ok | 0.1 | 44.82 degrees C
> T_NVME_M2_1 | 8Eh | ok | 0.1 | 45.82 degrees C
> T_PDB_U10 | 8Fh | ok | 0.1 | 41 degrees C
> T_PDB_U11 | 90h | ok | 0.1 | 41 degrees C
> CPU1_PVCCA_EHV | 91h | ok | 0.1 | 11.80 Volts
> CPU1_PVCCA_EHV | 92h | ok | 0.1 | 2 Volts
> CPU1_PVCCD0 | 93h | ok | 0.1 | 1 Volts
> CPU1_PVCCD1 | 94h | ok | 0.1 | 1 Volts
> CPU1_PVCCD | 95h | ok | 0.1 | 11.80 Volts
> CPU1_PVCCFA_EHV_ | 96h | ok | 0.1 | 11.80 Volts
> CPU1_PVCCFA_EHV_ | 97h | ok | 0.1 | 2 Volts
> CPU1_PVCCINF | 98h | ok | 0.1 | 1 Volts
> CPU1_PVNN | 99h | ok | 0.1 | 1 Volts
> CPU1_VCCIN | 9Ah | ok | 0.1 | 2 Volts
> CPU2_PVCCA_EHV | 9Bh | ok | 0.1 | 11.80 Volts
> CPU2_PVCCA_EHV | 9Ch | ok | 0.1 | 2 Volts
> CPU2_PVCCD0 | 9Dh | ok | 0.1 | 1 Volts
> CPU2_PVCCD1 | 9Eh | ok | 0.1 | 1 Volts
> CPU2_PVCCD | 9Fh | ok | 0.1 | 11.80 Volts
> CPU2_PVCCFA_EHV_ | A0h | ok | 0.1 | 11.80 Volts
> CPU2_PVCCFA_EHV_ | A1h | ok | 0.1 | 2 Volts
> CPU2_PVCCINF | A2h | ok | 0.1 | 1 Volts
> CPU2_PVNN | A3h | ok | 0.1 | 1 Volts
> CPU2_VCCIN | A4h | ok | 0.1 | 2 Volts
> V_DCSCMB_P1V05_U | A5h | ok | 0.1 | 1.05 Volts
> V_DCSCMB_P1V0 | A6h | ok | 0.1 | 1.00 Volts
> V_DCSCMB_P3V3_RG | A7h | ok | 0.1 | 3.29 Volts
> V_DCSCMB_P3V3_ST | A8h | ok | 0.1 | 3.29 Volts
> V_DCSCMB_P12V_AU | A9h | ok | 0.1 | 12.20 Volts
> V_HPM_P1V0_AUX | AAh | ok | 0.1 | 0.99 Volts
> V_HPM_P1V1_AUX | ABh | ok | 0.1 | 1.09 Volts
> V_HPM_P1V2_MAX10 | ACh | ok | 0.1 | 1.20 Volts
> V_HPM_P1V8_AUX | ADh | ok | 0.1 | 1.78 Volts
> V_HPM_P2V5_MAX10 | AEh | ok | 0.1 | 2.47 Volts
> V_HPM_P3V3 | AFh | ok | 0.1 | 3.27 Volts
> V_HPM_P3V3_AUX | B0h | ok | 0.1 | 3.27 Volts
> V_HPM_P5V_AUX | B1h | ok | 0.1 | 2.79 Volts
> V_HPM_P12V | B2h | ok | 0.1 | 12.18 Volts
> V_HPM_P12V_AUX | B3h | ok | 0.1 | 12.18 Volts
> V_HPM_P12V_STBY | B4h | ok | 0.1 | 11.92 Volts
> V_HPM_PVCC3V3_AU | B5h | ok | 0.1 | 3.27 Volts
>
> Our EE believes this is not a HW issue and has asked our BMC FW
> developers to debug it. We have also tried swapping the CPU1/CPU2
> locations, as well as the DIMM modules, but the issue stays with the
> slot, not with the CPU or DIMM itself. Also, once the issue occurs, it
> persists until the system is AC power cycled again.
>
> Because this issue only happens after an AC power cycle and cannot be
> reproduced with a DC power cycle test (in which the BMC does not reboot
> its firmware OS), we think it may be caused by a BMC firmware issue.
> However, we don't know how to debug it through the BMC firmware, even
> with the console log. Could you provide some direction on debugging it?
> Thank you.
>
> BTW, the OS on the system is Rocky Linux 9.4, and the sensor list was
> captured from the OS through ipmitool during the test.
>
> Best regards,
> *Jacky Lee*
Hi Jacky,
For issues related to Intel platforms, you can directly reach out to
your Intel support representative for assistance.
Thanks,
-Jason
>
>
> 2F, No.6, Sec.1, Jhongsing Rd., Wugu Township, New Taipei 248,
> Taiwan (R.O.C.)
> Tel(TW): 886-2-89771415
> Fax(TW): 886-2-89769773
> E-mail: Jacky.Lee@flex.com
>
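The sensor list in the quoted report appears to be the pipe-separated output of `ipmitool sdr elist` (name, sensor number, status, entity, reading), with the failing rows showing status `ns` and "No Reading". When repeating the AC-cycle test, a small script can flag those rows in each capture and estimate the failure rate. The following is a minimal Python sketch under that assumption; the function names are hypothetical, and the parser assumes exactly five `|`-separated columns, as in the log in this thread.

```python
from typing import List, Tuple


def sensors_without_reading(sdr_text: str) -> List[Tuple[str, str]]:
    """Return (name, sensor_id) for every row whose status column is 'ns'.

    Hypothetical helper: assumes rows of the form
    'name | ID | status | entity | reading', as in the quoted log.
    """
    missing = []
    for raw in sdr_text.splitlines():
        # Drop email quoting ("> ") and the "*...*" emphasis the mail
        # client added around the failing rows.
        line = raw.lstrip("> ").strip().strip("*")
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != 5:
            continue  # not a sensor row
        name, sensor_id, status, _entity, _reading = fields
        if status == "ns":
            missing.append((name, sensor_id))
    return missing


def failure_rate(captures: List[str]) -> float:
    """Fraction of captures (one per AC cycle) with at least one 'ns' sensor."""
    if not captures:
        return 0.0
    failed = sum(1 for c in captures if sensors_without_reading(c))
    return failed / len(captures)


if __name__ == "__main__":
    sample = (
        "> DTS_CPU1 | 7Ah | ok | 0.1 | 57 degrees C\n"
        "> *DTS_CPU2 | 7Bh | ns | 0.1 | No Reading*\n"
    )
    print(sensors_without_reading(sample))
```

Comparing the returned names across runs would show whether the same sensors (in this log, DIMM_A1/A2 on CPU1 and DTS/Die on CPU2) fail every time.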