Linux-NVME Archive on lore.kernel.org
* nvme: machine check when running nvme subsystem-reset /dev/nvme0 against direct attach via PCIE slot
@ 2024-09-26 21:11 Laurence Oberman
  2024-09-27  6:10 ` Nilay Shroff
  2024-10-03 21:04 ` Keith Busch
  0 siblings, 2 replies; 8+ messages in thread
From: Laurence Oberman @ 2024-09-26 21:11 UTC
  To: busch, keith, linux-nvme

Hi Keith
Hope all is well

Quick question: is this expected or not?

It was reported to Red Hat that users are seeing issues when using the
"nvme subsystem-reset /dev/nvme0" command to test resets.

On multiple servers I tested two types of NVMe-attached devices.
These are not the rootfs devices.

1. The front-slot (hotplug) devices in a 2.5in form factor
reset and after some time recover (which is what is expected).

Example of one working:

It does not trap and end up as a machine check:

[ 2215.440468] pcieport 0000:10:01.1: AER: Multiple Uncorrected (Non-
Fatal) error received: 0000:12:13.0
[ 2215.440532] pcieport 0000:12:13.0: PCIe Bus Error:
severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester
ID)
[ 2215.440536] pcieport 0000:12:13.0:   device [10b5:8748] error
status/mask=00100000/00000000
[ 2215.440540] pcieport 0000:12:13.0:    [20] UnsupReq              
(First)
[ 2215.440544] pcieport 0000:12:13.0: AER:   TLP Header: 40009001
1000000f e9211000 12000000
[ 2215.441813] systemd-journald[2173]: Sent WATCHDOG=1 notification.
[ 2216.937498] {1}[Hardware Error]: Hardware error from APEI Generic
Hardware Error Source: 4
[ 2216.937505] {1}[Hardware Error]: event severity: info
[ 2216.937508] {1}[Hardware Error]:  Error 0, type: fatal
[ 2216.937511] {1}[Hardware Error]:  fru_text: PcieError
[ 2216.937514] {1}[Hardware Error]:   section_type: PCIe error
[ 2216.937515] {1}[Hardware Error]:   port_type: 4, root port
[ 2216.937517] {1}[Hardware Error]:   version: 0.2
[ 2216.937519] {1}[Hardware Error]:   command: 0x0407, status: 0x0010
[ 2216.937522] {1}[Hardware Error]:   device_id: 0000:10:01.1
[ 2216.937524] {1}[Hardware Error]:   slot: 3
[ 2216.937525] {1}[Hardware Error]:   secondary_bus: 0x11
[ 2216.937526] {1}[Hardware Error]:   vendor_id: 0x1022, device_id:
0x1453
[ 2216.937528] {1}[Hardware Error]:   class_code: 060400
[ 2216.937529] {1}[Hardware Error]:   bridge: secondary_status: 0x2000,
control: 0x0012
[ 2216.937530] {1}[Hardware Error]:   aer_uncor_status: 0x00000000,
aer_uncor_mask: 0x04500000
[ 2216.937532] {1}[Hardware Error]:   aer_uncor_severity: 0x004e2030
[ 2216.937532] {1}[Hardware Error]:   TLP Header: 00000000 00000000
00000000 00000000
[ 2216.937629] pcieport 0000:10:01.1: AER: aer_status: 0x00000000,
aer_mask: 0x04500000
[ 2216.937634] pcieport 0000:10:01.1: AER: aer_layer=Transaction Layer,
aer_agent=Receiver ID
[ 2216.937638] pcieport 0000:10:01.1: AER: aer_uncor_severity:
0x004e2030
[ 2216.937645] nvme nvme4: frozen state error detected, reset
controller
[ 2217.071095] nvme nvme10: frozen state error detected, reset
controller
[ 2217.096928] nvme nvme0: frozen state error detected, reset
controller
[ 2217.118947] nvme nvme18: frozen state error detected, reset
controller
[ 2217.138945] nvme nvme6: frozen state error detected, reset
controller
[ 2217.164918] nvme nvme14: frozen state error detected, reset
controller
[ 2217.186902] nvme nvme20: frozen state error detected, reset
controller
[ 2279.420266] nvme 0000:1a:00.0: Unable to change power state from
D3cold to D0, device inaccessible
[ 2279.420329] nvme nvme22: Disabling device after reset failure: -19
[ 2279.464727] pcieport 0000:12:13.0: AER: device recovery failed
[ 2279.464823] pcieport 0000:12:13.0: pciehp: pcie_do_write_cmd: no
response from device

The port resets and recovers:

[ 2279.593196] pcieport 0000:10:01.1: AER: Root Port link has been
reset (0)
[ 2279.593699] nvme nvme4: restart after slot reset
[ 2279.593949] nvme nvme10: restart after slot reset
[ 2279.594222] nvme nvme0: restart after slot reset
[ 2279.594453] nvme nvme18: restart after slot reset
[ 2279.594728] nvme nvme6: restart after slot reset
[ 2279.594984] nvme nvme14: restart after slot reset
[ 2279.595226] nvme nvme20: restart after slot reset
[ 2279.595435] pcieport 0000:12:13.0: pciehp: Slot(19): Card present
[ 2279.595441] pcieport 0000:12:13.0: pciehp: Slot(19): Link Up
[ 2279.609081] nvme nvme4: Shutdown timeout set to 8 seconds
[ 2279.617532] nvme nvme0: Shutdown timeout set to 8 seconds
[ 2279.617533] nvme nvme14: Shutdown timeout set to 8 seconds
[ 2279.618028] nvme nvme6: Shutdown timeout set to 8 seconds
[ 2279.618207] nvme nvme18: Shutdown timeout set to 8 seconds
[ 2279.618290] nvme nvme10: Shutdown timeout set to 8 seconds
[ 2279.618308] nvme nvme20: Shutdown timeout set to 8 seconds
[ 2279.631961] nvme nvme4: 32/0/0 default/read/poll queues
[ 2279.643293] nvme nvme14: 32/0/0 default/read/poll queues
[ 2279.643372] nvme nvme0: 32/0/0 default/read/poll queues
[ 2279.644881] nvme nvme6: 32/0/0 default/read/poll queues
[ 2279.644966] nvme nvme10: 32/0/0 default/read/poll queues
[ 2279.645030] nvme nvme18: 32/0/0 default/read/poll queues
[ 2279.645132] nvme nvme20: 32/0/0 default/read/poll queues
[ 2279.645202] pcieport 0000:10:01.1: AER: device recovery successful

2. Any kernel (upstream latest 6.11, RHEL8 or RHEL9) causes
a machine check and panics the box when it's run against an NVMe in a
PCIe slot:

[  263.862919] mce: [Hardware Error]: CPU 12: Machine Check Exception: 5
Bank 6: ba00000000000e0b
[  263.862924] mce: [Hardware Error]: RIP !INEXACT!
10:<ffffffff8571dce4> {intel_idle+0x54/0x90}
[  263.862931] mce: [Hardware Error]: TSC 7a47d8d62ba6dd MISC 83100000 
[  263.862933] mce: [Hardware Error]: PROCESSOR 0:606a6 TIME 1727384194
SOCKET 1 APIC 40 microcode d0003a5
[  263.862936] mce: [Hardware Error]: Run the above through 'mcelog --
ascii'
[  263.885254] mce: [Hardware Error]: Machine check: Processor context
corrupt
[  263.885259] Kernel panic - not syncing: Fatal machine check

Hardware event. This is not a software error.
CPU 0 BANK 0 TSC 7a47d8d62ba6dd 
RIP !INEXACT! 10:ffffffff8571dce4
TIME 1727384194 Thu Sep 26 16:56:34 2024
MCG status:
MCi status:
Machine check not valid
Corrected error
MCA: No Error
STATUS 0 MCGSTATUS 0
CPUID Vendor Intel Family 6 Model 106 Step 6
RIP: intel_idle+0x54/0x90
SOCKET 1 APIC 40 microcode d0003a5
Run the above through 'mcelog --ascii'
Machine check: Processor context corrupt

Regards
Laurence





* Re: nvme: machine check when running nvme subsystem-reset /dev/nvme0 against direct attach via PCIE slot
  2024-09-26 21:11 nvme: machine check when running nvme subsystem-reset /dev/nvme0 against direct attach via PCIE slot Laurence Oberman
@ 2024-09-27  6:10 ` Nilay Shroff
  2024-09-27 12:18   ` Laurence Oberman
  2024-10-03 21:04 ` Keith Busch
  1 sibling, 1 reply; 8+ messages in thread
From: Nilay Shroff @ 2024-09-27  6:10 UTC
  To: Laurence Oberman, busch, keith, linux-nvme, Keith Busch



On 9/27/24 02:41, Laurence Oberman wrote:
> Hi Keith
> Hope all is well
> 
> Quick question: is this expected or not?
> 
> It was reported to Red Hat that users are seeing issues when using the
> "nvme subsystem-reset /dev/nvme0" command to test resets.
> 
> On multiple servers I tested two types of NVMe-attached devices.
> These are not the rootfs devices.
> 
> 1. The front-slot (hotplug) devices in a 2.5in form factor
> reset and after some time recover (which is what is expected).
> 
> Example of one working:
> 
> It does not trap and end up as a machine check:
> 
> <snip>
> 
> 2. Any kernel (upstream latest 6.11, RHEL8 or RHEL9) causes
> a machine check and panics the box when it's run against an NVMe in a
> PCIe slot:
> 
> [  263.862919] mce: [Hardware Error]: CPU 12: Machine Check Exception: 5
> Bank 6: ba00000000000e0b
> [  263.862924] mce: [Hardware Error]: RIP !INEXACT!
> 10:<ffffffff8571dce4> {intel_idle+0x54/0x90}
> [  263.862931] mce: [Hardware Error]: TSC 7a47d8d62ba6dd MISC 83100000 
> [  263.862933] mce: [Hardware Error]: PROCESSOR 0:606a6 TIME 1727384194
> SOCKET 1 APIC 40 microcode d0003a5
> [  263.862936] mce: [Hardware Error]: Run the above through 'mcelog --
> ascii'
> [  263.885254] mce: [Hardware Error]: Machine check: Processor context
> corrupt
> [  263.885259] Kernel panic - not syncing: Fatal machine check
> 
> <snip>

I think Keith's email address is not correct; adding his correct email address here.

BTW, Keith recently helped fix an issue in kernel v6.11 with the nvme subsystem-reset
command to ensure that we recover the NVMe disk on PPC. On the PPC architecture we use
EEH to recover the disk after a subsystem-reset, but yours is Intel, which uses AER
for recovery. So I'm not sure whether that same commit 210b1f6576e8 ("nvme-pci: do not
directly handle subsys reset fallout"), which was merged in kernel v6.11, is causing a
side effect on the Intel machine.

Would you please revert the above commit and see if that helps fix the observed
symptom on your Intel machine?
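
Something like this should do it (a rough sketch, assuming a standard upstream
build workflow; adjust the install steps for your distro):

  git revert 210b1f6576e8   # "nvme-pci: do not directly handle subsys reset fallout"
  make -j"$(nproc)"
  sudo make modules_install install
  # reboot into the new kernel, then re-run the failing test:
  nvme subsystem-reset /dev/nvme0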

Thanks,
--Nilay








* Re: nvme: machine check when running nvme subsystem-reset /dev/nvme0 against direct attach via PCIE slot
  2024-09-27  6:10 ` Nilay Shroff
@ 2024-09-27 12:18   ` Laurence Oberman
  2024-09-27 13:06     ` Nilay Shroff
  0 siblings, 1 reply; 8+ messages in thread
From: Laurence Oberman @ 2024-09-27 12:18 UTC
  To: Nilay Shroff, linux-nvme, Keith Busch

On Fri, 2024-09-27 at 11:40 +0530, Nilay Shroff wrote:
> 
> 
> On 9/27/24 02:41, Laurence Oberman wrote:
> > <snip>
> 
> I think Keith's email address is not correct; adding his correct email
> address here.
> 
> BTW, Keith recently helped fix an issue in kernel v6.11 with the nvme
> subsystem-reset command to ensure that we recover the NVMe disk on PPC.
> On the PPC architecture we use EEH to recover the disk after a
> subsystem-reset, but yours is Intel, which uses AER for recovery. So I'm
> not sure whether that same commit 210b1f6576e8 ("nvme-pci: do not
> directly handle subsys reset fallout"), which was merged in kernel
> v6.11, is causing a side effect on the Intel machine.
> 
> Would you please revert the above commit and see if that helps fix the
> observed symptom on your Intel machine?
> 
> Thanks,
> --Nilay
> 
Hello Nilay,
Thanks, I will try that.
Was your IBM PPC issue also seen only with direct-attached, PCIe
slot-based NVMe?
I will report back after testing with the revert.




* Re: nvme: machine check when running nvme subsystem-reset /dev/nvme0 against direct attach via PCIE slot
  2024-09-27 12:18   ` Laurence Oberman
@ 2024-09-27 13:06     ` Nilay Shroff
  0 siblings, 0 replies; 8+ messages in thread
From: Nilay Shroff @ 2024-09-27 13:06 UTC
  To: Laurence Oberman, linux-nvme, Keith Busch



On 9/27/24 17:48, Laurence Oberman wrote:
> On Fri, 2024-09-27 at 11:40 +0530, Nilay Shroff wrote:
>>
>> <snip>
>>
>> Would you please revert the above commit and see if that helps fix the
>> observed symptom on your Intel machine?
>>
>> Thanks,
>> --Nilay
>>
> Hello Nilay,
> Thanks, I will try that.
> Was your IBM PPC issue also seen only with direct-attached, PCIe
> slot-based NVMe?
> I will report back after testing with the revert.
> 
On PPC it doesn't matter whether the NVMe disk is directly attached to the PHB
or attached through another PCIe bridge. On PPC we saw that when the nvme
subsystem-reset command was executed on an NVMe disk, EEH couldn't recover the
disk, and that's where the above commit (from Keith) helped: the disk is now
recovered using EEH after the subsystem-reset command.
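
Either recovery path is easy to watch from the kernel log while testing; a
minimal sketch (plain dmesg/grep, nothing platform-specific assumed):

  # follow EEH (PPC) or AER (Intel) recovery messages during the reset
  dmesg -w | grep -iE 'eeh|aer|nvme'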

Thanks,
--Nilay





* Re: nvme: machine check when running nvme subsystem-reset /dev/nvme0 against direct attach via PCIE slot
  2024-09-26 21:11 nvme: machine check when running nvme subsystem-reset /dev/nvme0 against direct attach via PCIE slot Laurence Oberman
  2024-09-27  6:10 ` Nilay Shroff
@ 2024-10-03 21:04 ` Keith Busch
  2024-10-07 15:56   ` Laurence Oberman
  1 sibling, 1 reply; 8+ messages in thread
From: Keith Busch @ 2024-10-03 21:04 UTC
  To: Laurence Oberman; +Cc: busch, keith, linux-nvme

On Thu, Sep 26, 2024 at 05:11:05PM -0400, Laurence Oberman wrote:
> It was reported to Red Hat that users are seeing issues when using the
> "nvme subsystem-reset /dev/nvme0" command to test resets.

I really dislike that command. The side effects are overkill for the PCI
transport...
 
> On multiple servers I tested two types of NVMe-attached devices.
> These are not the rootfs devices.
>
> 1. The front-slot (hotplug) devices in a 2.5in form factor
> reset and after some time recover (which is what is expected).
> 
> Example of one working:
> 
> It does not trap and end up as a machine check:

<snip>

> 2. Any kernel (upstream latest 6.11, RHEL8 or RHEL9) causes
> a machine check and panics the box when it's run against an NVMe in a
> PCIe slot:
> 
> [  263.862919] mce: [Hardware Error]: CPU 12: Machine Check Exception: 5 Bank 6: ba00000000000e0b
> [  263.862924] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8571dce4> {intel_idle+0x54/0x90}

So this wasn't failing before 6.11? As Nilay mentioned, there are some
changes in how nvme subsystem reset is handled, the main one being that
this ioctl no longer automatically triggers an nvme controller reset. I
expected delayed recovery might happen, but machine checks are not
expected. If this was working before, I can only guess right now that the
previous behavior accessed MMIO and config space sooner and triggered a
different error path. If you're successful with the PPC patch reverted, I
would be interested to hear about it.
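
For what it's worth, with the new behavior the controller reset has to be
kicked off explicitly after the subsystem reset; a minimal sketch, assuming
the device is still responsive enough to accept it:

  # manually request a controller reset from the driver
  echo 1 > /sys/class/nvme/nvme0/reset_controller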



* Re: nvme: machine check when running nvme subsystem-reset /dev/nvme0 against direct attach via PCIE slot
  2024-10-03 21:04 ` Keith Busch
@ 2024-10-07 15:56   ` Laurence Oberman
  2024-10-29 16:07     ` Laurence Oberman
  0 siblings, 1 reply; 8+ messages in thread
From: Laurence Oberman @ 2024-10-07 15:56 UTC
  To: Keith Busch; +Cc: busch, keith, linux-nvme

On Thu, 2024-10-03 at 15:04 -0600, Keith Busch wrote:
> On Thu, Sep 26, 2024 at 05:11:05PM -0400, Laurence Oberman wrote:
> > It was reported to Red Hat that users are seeing issues when using the
> > "nvme subsystem-reset /dev/nvme0" command to test resets.
> 
> I really dislike that command. The side effects are overkill for the
> PCI transport...
>  
> > On multiple servers I tested two types of NVMe-attached devices.
> > These are not the rootfs devices.
> > 
> > 1. The front-slot (hotplug) devices in a 2.5in form factor
> > reset and after some time recover (which is what is expected).
> > 
> > Example of one working:
> > 
> > It does not trap and end up as a machine check:
> 
> <snip>
> 
> > 2. Any kernel (upstream latest 6.11, RHEL8 or RHEL9) causes
> > a machine check and panics the box when it's run against an NVMe in a
> > PCIe slot:
> > 
> > [  263.862919] mce: [Hardware Error]: CPU 12: Machine Check
> > Exception: 5 Bank 6: ba00000000000e0b
> > [  263.862924] mce: [Hardware Error]: RIP !INEXACT!
> > 10:<ffffffff8571dce4> {intel_idle+0x54/0x90}
> 
> So this wasn't failing before 6.11? As Nilay mentioned, there are some
> changes in how nvme subsystem reset is handled, the main one being that
> this ioctl no longer automatically triggers an nvme controller reset. I
> expected delayed recovery might happen, but machine checks are not
> expected. If this was working before, I can only guess right now that
> the previous behavior accessed MMIO and config space sooner and
> triggered a different error path. If you're successful with the PPC
> patch reverted, I would be interested to hear about it.
> 

Hello

A quick update about this.
I went back all the way to 6.8 and this still happens.
I started to think that these HPE servers were more susceptible to
machine checks on the PCIe state changes.

So I tested on a Lenovo and still had panics.
I do not think this is worth pursuing, given that Keith already
confirmed this is not recommended and is way too heavy-handed on the
PCIe path.

I have told the reporter of this that they are not to use this type of
fault injection on directly attached NVMe devices.

Thanks
Laurence




* Re: nvme: machine check when running nvme subsystem-reset /dev/nvme0 against direct attach via PCIE slot
  2024-10-07 15:56   ` Laurence Oberman
@ 2024-10-29 16:07     ` Laurence Oberman
  2024-10-29 16:42       ` Keith Busch
  0 siblings, 1 reply; 8+ messages in thread
From: Laurence Oberman @ 2024-10-29 16:07 UTC
  To: Keith Busch; +Cc: busch, keith, linux-nvme

On Mon, 2024-10-07 at 11:56 -0400, Laurence Oberman wrote:
> On Thu, 2024-10-03 at 15:04 -0600, Keith Busch wrote:
> > <snip>
> 
> Hello
> 
> A quick update about this.
> I went back all the way to 6.8 and this still happens.
> I started to think that these HPE servers were more susceptible to
> machine checks on the PCIe state changes.
> 
> So I tested on a Lenovo and still had panics.
> I do not think this is worth pursuing, given that Keith already
> confirmed this is not recommended and is way too heavy-handed on the
> PCIe path.
> 
> I have told the reporter of this that they are not to use this type of
> fault injection on directly attached NVMe devices.
> 
> Thanks
> Laurence
> 
Hello

Finishing this thread off, but I have a final question.
The bottom line is that certain server hardware sees the nvme
subsystem-reset command create a machine check for PCIe-plugged NVMe
devices, going back quite far in kernel versions, and we panic.

As Keith said, that nvme subsystem-reset command has too much impact.

There is a final simple question for M.2-connected NVMe devices:
are these expected to automatically reconnect after an nvme
subsystem-reset is issued?

The complaint is the following:

nvme subsystem-reset /dev/nvme0

The device is disconnected as expected, but it requires the following to
reconnect:

echo 1 > /sys/bus/pci/devices/0000:02:00.0/remove
echo 1 > /sys/bus/pci/rescan

Then it is reconnected.
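
If that sequence is needed routinely, a small wrapper keeps it in one place
(a sketch only; the controller name and the 0000:02:00.0 BDF come from the
example above and would need adjusting):

  #!/bin/sh
  # Reset the NVM subsystem, then force a remove/rescan so the
  # device re-enumerates on platforms without hotplug handling.
  DEV=/dev/nvme0
  BDF=0000:02:00.0
  nvme subsystem-reset "$DEV"
  sleep 5   # give the link a moment to settle
  echo 1 > "/sys/bus/pci/devices/$BDF/remove"
  echo 1 > /sys/bus/pci/rescan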

Thanks
Laurence




* Re: nvme: machine check when running nvme subsystem-reset /dev/nvme0 against direct attach via PCIE slot
  2024-10-29 16:07     ` Laurence Oberman
@ 2024-10-29 16:42       ` Keith Busch
  0 siblings, 0 replies; 8+ messages in thread
From: Keith Busch @ 2024-10-29 16:42 UTC
  To: Laurence Oberman; +Cc: busch, keith, linux-nvme

On Tue, Oct 29, 2024 at 12:07:26PM -0400, Laurence Oberman wrote:
> Finishing this thread off, but I have a final question.
> The bottom line is that certain server hardware sees the nvme
> subsystem-reset command create a machine check for PCIe-plugged NVMe
> devices, going back quite far in kernel versions, and we panic.
> 
> As Keith said, that nvme subsystem-reset command has too much impact.

Sure, it takes the PCIe link down, and handling for that, if at all, is
platform specific.
 
> There is a final simple question for M.2-connected NVMe devices:
> are these expected to automatically reconnect after an nvme
> subsystem-reset is issued?
> 
> The complaint is the following:
> 
> nvme subsystem-reset /dev/nvme0
> 
> The device is disconnected as expected, but it requires the following to
> reconnect:
> 
> echo 1 > /sys/bus/pci/devices/0000:02:00.0/remove
> echo 1 > /sys/bus/pci/rescan
> 
> Then it is reconnected.

For platforms that don't support link-detected hotplug, that sequence
should get the device back to a usable state.



end of thread

Thread overview: 8 messages
2024-09-26 21:11 nvme: machine check when running nvme subsystem-reset /dev/nvme0 against direct attach via PCIE slot Laurence Oberman
2024-09-27  6:10 ` Nilay Shroff
2024-09-27 12:18   ` Laurence Oberman
2024-09-27 13:06     ` Nilay Shroff
2024-10-03 21:04 ` Keith Busch
2024-10-07 15:56   ` Laurence Oberman
2024-10-29 16:07     ` Laurence Oberman
2024-10-29 16:42       ` Keith Busch
