From: "Bills, Jason M" <jason.m.bills@linux.intel.com>
To: openbmc@lists.ozlabs.org
Subject: Re: [Help] How to debug 'sometimes the specific sensors get no reading after AC cycle the system' issue?
Date: Mon, 9 Dec 2024 10:08:37 -0700
Message-ID: <8ea7caf7-2f46-4bf8-bf44-cc3061a3ebf0@linux.intel.com>
In-Reply-To: <LV8PR08MB94865788E99E0FC65AB42B119A372@LV8PR08MB9486.namprd08.prod.outlook.com>



On 12/3/2024 9:44 PM, Jacky Lee (TPI) wrote:
> Hi sir,
> 
> We have an Intel Birch Stream platform, and our BMC FW developers are 
> implementing OpenBMC on it with a DC-SCM module. The BMC chip is an 
> ASPEED AST2600 and the RoT is an ASPEED AST1060.
> 
> We have an issue where specific sensors sometimes get no reading after 
> an AC power cycle of the system. The failure rate is about 12%. Below is 
> an example log:
> 
> CPU1_PVCCA_EHV | 00h | ok | 0.1 | 0.39 Amps
> CPU1_PVCCA_EHV | 01h | ok | 0.1 | 2 Amps
> CPU1_PVCCD0 | 02h | ok | 0.1 | 0.16 Amps
> CPU1_PVCCD0 | 03h | ok | 0.1 | 2 Amps
> CPU1_PVCCFA_EHV_ | 04h | ok | 0.1 | 3.90 Amps
> CPU1_PVCCFA_EHV_ | 05h | ok | 0.1 | 29 Amps
> CPU1_PVCCINF | 06h | ok | 0.1 | 1.25 Amps
> CPU1_PVCCINF | 07h | ok | 0.1 | 17 Amps
> CPU1_PVNN | 08h | ok | 0.1 | 0.08 Amps
> CPU1_PVNN | 09h | ok | 0.1 | 1 Amps
> CPU1_VCCIN | 0Ah | ok | 0.1 | 255 Amps
> CPU2_PVCCA_EHV | 0Bh | ok | 0.1 | 0.39 Amps
> CPU2_PVCCA_EHV | 0Ch | ok | 0.1 | 2 Amps
> CPU2_PVCCD0 | 0Dh | ok | 0.1 | 0.31 Amps
> CPU2_PVCCD0 | 0Eh | ok | 0.1 | 3 Amps
> CPU2_PVCCFA_EHV_ | 0Fh | ok | 0.1 | 4.76 Amps
> CPU2_PVCCFA_EHV_ | 10h | ok | 0.1 | 26 Amps
> CPU2_PVCCINF | 11h | ok | 0.1 | 1.09 Amps
> CPU2_PVCCINF | 12h | ok | 0.1 | 15 Amps
> CPU2_PVNN | 13h | ok | 0.1 | 0.08 Amps
> CPU2_PVNN | 14h | ok | 0.1 | 1 Amps
> CPU2_VCCIN | 15h | ok | 0.1 | 255 Amps
> FAN0_INLET_PWM | 16h | ok | 0.1 | 29.79 unspecifi
> FAN0_OUTLET_PWM | 17h | ok | 0.1 | 29.79 unspecifi
> FAN1_INLET_PWM | 18h | ok | 0.1 | 29.79 unspecifi
> FAN1_OUTLET_PWM | 19h | ok | 0.1 | 29.79 unspecifi
> FAN2_INLET_PWM | 1Ah | ok | 0.1 | 29.79 unspecifi
> FAN2_OUTLET_PWM | 1Bh | ok | 0.1 | 29.79 unspecifi
> FAN3_INLET_PWM | 1Ch | ok | 0.1 | 29.79 unspecifi
> FAN3_OUTLET_PWM | 1Dh | ok | 0.1 | 29.79 unspecifi
> FAN0_INLET_TACH | 1Eh | ok | 0.1 | 5292 RPM
> FAN0_OUTLET_TACH | 1Fh | ok | 0.1 | 4508 RPM
> FAN1_INLET_TACH | 20h | ok | 0.1 | 5390 RPM
> FAN1_OUTLET_TACH | 21h | ok | 0.1 | 4508 RPM
> FAN2_INLET_TACH | 22h | ok | 0.1 | 5390 RPM
> FAN2_OUTLET_TACH | 23h | ok | 0.1 | 4606 RPM
> FAN3_INLET_TACH | 24h | ok | 0.1 | 5390 RPM
> FAN3_OUTLET_TACH | 25h | ok | 0.1 | 4606 RPM
> CPU1_PVCCA_EHV | 26h | ok | 0.1 | 0 Watts
> CPU1_PVCCA_EHV | 27h | ok | 0.1 | 0 Watts
> CPU1_PVCCFA_EHV_ | 28h | ok | 0.1 | 59 Watts
> CPU1_PVCCFA_EHV_ | 29h | ok | 0.1 | 47.20 Watts
> CPU1_PVCCINF | 2Ah | ok | 0.1 | 11.80 Watts
> CPU1_PVCCINF | 2Bh | ok | 0.1 | 11.80 Watts
> CPU1_VCCIN | 2Ch | ok | 0.1 | 82.60 Watts
> CPU1_VCCIN | 2Dh | ok | 0.1 | 70.80 Watts
> CPU2_PVCCA_EHV | 2Eh | ok | 0.1 | 0 Watts
> CPU2_PVCCA_EHV | 2Fh | ok | 0.1 | 0 Watts
> CPU2_PVCCFA_EHV_ | 30h | ok | 0.1 | 47.20 Watts
> CPU2_PVCCFA_EHV_ | 31h | ok | 0.1 | 47.20 Watts
> CPU2_PVCCINF | 32h | ok | 0.1 | 11.80 Watts
> CPU2_PVCCINF | 33h | ok | 0.1 | 11.80 Watts
> CPU2_VCCIN | 34h | ok | 0.1 | 82.60 Watts
> CPU2_VCCIN | 35h | ok | 0.1 | 70.80 Watts
> Cpu_Power_Averag | 36h | ok | 0.1 | 124 Watts
> *Cpu_Power_Averag | 37h | ns | 0.1 | No Reading*
> Cpu_Power_Cap_CP | 38h | ok | 0.1 | 0 Watts
> *Cpu_Power_Cap_CP | 39h | ns | 0.1 | No Reading*
> Dimm_Power_Avera | 3Ah | ok | 0.1 | 300 Watts
> *Dimm_Power_Avera | 3Bh | ns | 0.1 | No Reading*
> Dimm_Power_Cap_C | 3Ch | ok | 0.1 | 0 Watts
> CPU1_PVCCA_Contr | 3Eh | ok | 0.1 | 34 degrees C
> CPU1_PVCCA_EHV | 3Fh | ok | 0.1 | 34 degrees C
> CPU1_PVCCD0 | 40h | ok | 0.1 | 42 degrees C
> CPU1_PVCCFA_Cont | 41h | ok | 0.1 | 43 degrees C
> CPU1_PVCCFA_EHV_ | 42h | ok | 0.1 | 44 degrees C
> CPU1_VCCIN | 43h | ok | 0.1 | 49 degrees C
> CPU2_PVCCA_Contr | 44h | ok | 0.1 | 33 degrees C
> CPU2_PVCCA_EHV | 45h | ok | 0.1 | 33 degrees C
> CPU2_PVCCD0 | 46h | ok | 0.1 | 42 degrees C
> CPU2_PVCCFA_Cont | 47h | ok | 0.1 | 42 degrees C
> CPU2_PVCCFA_EHV_ | 48h | ok | 0.1 | 44 degrees C
> CPU2_VCCIN | 49h | ok | 0.1 | 51 degrees C
> *DIMM_A1_CPU1 | 4Ah | ns | 0.1 | No Reading*
> DIMM_A1_CPU2 | 4Bh | ok | 0.1 | 36 degrees C
> *DIMM_A2_CPU1 | 4Ch | ns | 0.1 | No Reading*
> DIMM_A2_CPU2 | 4Dh | ok | 0.1 | 36 degrees C
> DIMM_B1_CPU1 | 4Eh | ok | 0.1 | 36 degrees C
> DIMM_B1_CPU2 | 4Fh | ok | 0.1 | 36 degrees C
> DIMM_B2_CPU1 | 50h | ok | 0.1 | 36 degrees C
> DIMM_B2_CPU2 | 51h | ok | 0.1 | 36 degrees C
> DIMM_C1_CPU1 | 52h | ok | 0.1 | 35 degrees C
> DIMM_C1_CPU2 | 53h | ok | 0.1 | 36 degrees C
> DIMM_C2_CPU1 | 54h | ok | 0.1 | 35 degrees C
> DIMM_C2_CPU2 | 55h | ok | 0.1 | 36 degrees C
> DIMM_D1_CPU1 | 56h | ok | 0.1 | 34 degrees C
> DIMM_D1_CPU2 | 57h | ok | 0.1 | 36 degrees C
> DIMM_D2_CPU1 | 58h | ok | 0.1 | 34 degrees C
> DIMM_D2_CPU2 | 59h | ok | 0.1 | 36 degrees C
> DIMM_E1_CPU1 | 5Ah | ok | 0.1 | 34 degrees C
> DIMM_E1_CPU2 | 5Bh | ok | 0.1 | 35 degrees C
> DIMM_E2_CPU1 | 5Ch | ok | 0.1 | 34 degrees C
> DIMM_E2_CPU2 | 5Dh | ok | 0.1 | 35 degrees C
> DIMM_F1_CPU1 | 5Eh | ok | 0.1 | 32 degrees C
> DIMM_F1_CPU2 | 5Fh | ok | 0.1 | 34 degrees C
> DIMM_F2_CPU1 | 60h | ok | 0.1 | 32 degrees C
> DIMM_F2_CPU2 | 61h | ok | 0.1 | 34 degrees C
> DIMM_G1_CPU1 | 62h | ok | 0.1 | 37 degrees C
> DIMM_G1_CPU2 | 63h | ok | 0.1 | 35 degrees C
> DIMM_G2_CPU1 | 64h | ok | 0.1 | 37 degrees C
> DIMM_G2_CPU2 | 65h | ok | 0.1 | 35 degrees C
> DIMM_H1_CPU1 | 66h | ok | 0.1 | 37 degrees C
> DIMM_H1_CPU2 | 67h | ok | 0.1 | 35 degrees C
> DIMM_H2_CPU1 | 68h | ok | 0.1 | 37 degrees C
> DIMM_H2_CPU2 | 69h | ok | 0.1 | 35 degrees C
> DIMM_I1_CPU1 | 6Ah | ok | 0.1 | 36 degrees C
> DIMM_I1_CPU2 | 6Bh | ok | 0.1 | 35 degrees C
> DIMM_I2_CPU1 | 6Ch | ok | 0.1 | 36 degrees C
> DIMM_I2_CPU2 | 6Dh | ok | 0.1 | 35 degrees C
> DIMM_J1_CPU1 | 6Eh | ok | 0.1 | 35 degrees C
> DIMM_J1_CPU2 | 6Fh | ok | 0.1 | 35 degrees C
> DIMM_J2_CPU1 | 70h | ok | 0.1 | 35 degrees C
> DIMM_J2_CPU2 | 71h | ok | 0.1 | 35 degrees C
> DIMM_K1_CPU1 | 72h | ok | 0.1 | 35 degrees C
> DIMM_K1_CPU2 | 73h | ok | 0.1 | 34 degrees C
> DIMM_K2_CPU1 | 74h | ok | 0.1 | 35 degrees C
> DIMM_K2_CPU2 | 75h | ok | 0.1 | 34 degrees C
> DIMM_L1_CPU1 | 76h | ok | 0.1 | 35 degrees C
> DIMM_L1_CPU2 | 77h | ok | 0.1 | 34 degrees C
> DIMM_L2_CPU1 | 78h | ok | 0.1 | 35 degrees C
> DIMM_L2_CPU2 | 79h | ok | 0.1 | 34 degrees C
> DTS_CPU1 | 7Ah | ok | 0.1 | 57 degrees C
> *DTS_CPU2 | 7Bh | ns | 0.1 | No Reading*
> Die_CPU1 | 7Ch | ok | 0.1 | 57 degrees C
> *Die_CPU2 | 7Dh | ns | 0.1 | No Reading*
> T_DBB_U44 | 7Eh | ok | 0.1 | 28 degrees C
> T_DCSCMB_U91 | 7Fh | ok | 0.1 | 30 degrees C
> T_FIOB_U1 | 80h | ok | 0.1 | 30 degrees C
> T_MB_U30 | 81h | ok | 0.1 | 40 degrees C
> T_MB_U31 | 82h | ok | 0.1 | 39 degrees C
> T_MB_U32 | 83h | ok | 0.1 | 29 degrees C
> T_MB_U33 | 84h | ok | 0.1 | 29 degrees C
> T_NVME_E3S_1 | 85h | ok | 0.1 | 26.89 degrees C
> T_NVME_E3S_2 | 86h | ok | 0.1 | 26.89 degrees C
> T_NVME_E3S_3 | 87h | ok | 0.1 | 26.89 degrees C
> T_NVME_E3S_4 | 88h | ok | 0.1 | 26.89 degrees C
> T_NVME_E3S_5 | 89h | ok | 0.1 | 26.89 degrees C
> T_NVME_E3S_6 | 8Ah | ok | 0.1 | 26.89 degrees C
> T_NVME_E3S_7 | 8Bh | ok | 0.1 | 27.89 degrees C
> T_NVME_E3S_8 | 8Ch | ok | 0.1 | 27.89 degrees C
> T_NVME_M2_0 | 8Dh | ok | 0.1 | 44.82 degrees C
> T_NVME_M2_1 | 8Eh | ok | 0.1 | 45.82 degrees C
> T_PDB_U10 | 8Fh | ok | 0.1 | 41 degrees C
> T_PDB_U11 | 90h | ok | 0.1 | 41 degrees C
> CPU1_PVCCA_EHV | 91h | ok | 0.1 | 11.80 Volts
> CPU1_PVCCA_EHV | 92h | ok | 0.1 | 2 Volts
> CPU1_PVCCD0 | 93h | ok | 0.1 | 1 Volts
> CPU1_PVCCD1 | 94h | ok | 0.1 | 1 Volts
> CPU1_PVCCD | 95h | ok | 0.1 | 11.80 Volts
> CPU1_PVCCFA_EHV_ | 96h | ok | 0.1 | 11.80 Volts
> CPU1_PVCCFA_EHV_ | 97h | ok | 0.1 | 2 Volts
> CPU1_PVCCINF | 98h | ok | 0.1 | 1 Volts
> CPU1_PVNN | 99h | ok | 0.1 | 1 Volts
> CPU1_VCCIN | 9Ah | ok | 0.1 | 2 Volts
> CPU2_PVCCA_EHV | 9Bh | ok | 0.1 | 11.80 Volts
> CPU2_PVCCA_EHV | 9Ch | ok | 0.1 | 2 Volts
> CPU2_PVCCD0 | 9Dh | ok | 0.1 | 1 Volts
> CPU2_PVCCD1 | 9Eh | ok | 0.1 | 1 Volts
> CPU2_PVCCD | 9Fh | ok | 0.1 | 11.80 Volts
> CPU2_PVCCFA_EHV_ | A0h | ok | 0.1 | 11.80 Volts
> CPU2_PVCCFA_EHV_ | A1h | ok | 0.1 | 2 Volts
> CPU2_PVCCINF | A2h | ok | 0.1 | 1 Volts
> CPU2_PVNN | A3h | ok | 0.1 | 1 Volts
> CPU2_VCCIN | A4h | ok | 0.1 | 2 Volts
> V_DCSCMB_P1V05_U | A5h | ok | 0.1 | 1.05 Volts
> V_DCSCMB_P1V0 | A6h | ok | 0.1 | 1.00 Volts
> V_DCSCMB_P3V3_RG | A7h | ok | 0.1 | 3.29 Volts
> V_DCSCMB_P3V3_ST | A8h | ok | 0.1 | 3.29 Volts
> V_DCSCMB_P12V_AU | A9h | ok | 0.1 | 12.20 Volts
> V_HPM_P1V0_AUX | AAh | ok | 0.1 | 0.99 Volts
> V_HPM_P1V1_AUX | ABh | ok | 0.1 | 1.09 Volts
> V_HPM_P1V2_MAX10 | ACh | ok | 0.1 | 1.20 Volts
> V_HPM_P1V8_AUX | ADh | ok | 0.1 | 1.78 Volts
> V_HPM_P2V5_MAX10 | AEh | ok | 0.1 | 2.47 Volts
> V_HPM_P3V3 | AFh | ok | 0.1 | 3.27 Volts
> V_HPM_P3V3_AUX | B0h | ok | 0.1 | 3.27 Volts
> V_HPM_P5V_AUX | B1h | ok | 0.1 | 2.79 Volts
> V_HPM_P12V | B2h | ok | 0.1 | 12.18 Volts
> V_HPM_P12V_AUX | B3h | ok | 0.1 | 12.18 Volts
> V_HPM_P12V_STBY | B4h | ok | 0.1 | 11.92 Volts
> V_HPM_PVCC3V3_AU | B5h | ok | 0.1 | 3.27 Volts
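
To compare which sensors fail from one AC cycle to the next, it can help
to reduce each capture to just the rows reporting no reading. Below is a
minimal sketch, assuming the five-column layout shown in the capture
above (name | ID | status | entity | reading). The function names
parse_sensor_lines and no_reading are hypothetical helpers for
illustration, not part of ipmitool or OpenBMC.

```python
from typing import List, NamedTuple


class SensorRow(NamedTuple):
    name: str
    sensor_id: str
    status: str
    entity: str
    reading: str


def parse_sensor_lines(text: str) -> List[SensorRow]:
    """Parse pipe-separated sensor rows; skip lines that do not have 5 fields."""
    rows = []
    for line in text.splitlines():
        # Drop mail quoting ("> ") and the "*...*" emphasis markers, if present.
        cleaned = line.strip().lstrip(">").strip().strip("*").strip()
        fields = [field.strip() for field in cleaned.split("|")]
        if len(fields) != 5:
            continue
        rows.append(SensorRow(*fields))
    return rows


def no_reading(rows: List[SensorRow]) -> List[SensorRow]:
    """Return the rows whose status is "ns" or whose reading is "No Reading"."""
    return [r for r in rows if r.status == "ns" or r.reading == "No Reading"]


# A few rows copied from the capture above, quoting markers included.
SAMPLE = """\
> Cpu_Power_Averag | 36h | ok | 0.1 | 124 Watts
> *Cpu_Power_Averag | 37h | ns | 0.1 | No Reading*
> *DIMM_A1_CPU1 | 4Ah | ns | 0.1 | No Reading*
> DIMM_A1_CPU2 | 4Bh | ok | 0.1 | 36 degrees C
> *DTS_CPU2 | 7Bh | ns | 0.1 | No Reading*
"""

if __name__ == "__main__":
    for row in no_reading(parse_sensor_lines(SAMPLE)):
        print(f"{row.sensor_id} {row.name}")
```

Running this over the capture from every AC cycle and comparing the
resulting sets of sensor IDs would show whether the same sensors fail each
time or whether the affected set varies between cycles.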
> 
> Our EE believes this is not a HW issue and has asked our BMC FW 
> developers to debug it. We have also tried swapping the CPU1/CPU2 
> locations and the DIMM modules, but the issue stays with the slot, 
> not the CPU or DIMM itself. Also, once the issue occurs, it persists 
> until the system is AC power cycled again.
> 
> This issue only occurs after an AC power cycle. It cannot be reproduced 
> with DC power cycling, during which the BMC does not reboot its 
> firmware OS, so we suspect a BMC firmware issue. However, we don't know 
> how to debug it through the BMC firmware or even the console log. We 
> would appreciate some directions on debugging it. Thank you.
> 
> BTW, the OS we used on the system is Rocky Linux 9.4, and the sensor 
> list was captured from the OS through ipmitool during the test.
> 
> Best regards,
> *Jacky Lee*

Hi Jacky,

For issues related to Intel platforms, you can directly reach out to 
your Intel support representative for assistance.

Thanks,
-Jason

> 
> 
> 2F, No.6, Sec.1, Jhongsing Rd., Wugu
> 
> Township, New Taipei 248, Taiwan (R.O.C.)
> Tel(TW): 886-2-89771415
> 
> Fax(TW): 886-2-89769773
> 
> E-mail: Jacky.Lee@flex.com <mailto:Jacky.Lee@flex.com>
> 

