Linux CXL
* Some thoughts and questions about CXL & MCE
@ 2023-12-22 16:33 Shiyang Ruan
  2024-01-02 17:29 ` Jonathan Cameron
  2024-01-03 16:45 ` Dan Williams
  0 siblings, 2 replies; 7+ messages in thread
From: Shiyang Ruan @ 2023-12-22 16:33 UTC
  To: linux-cxl
  Cc: dan.j.williams@intel.com, Jonathan Cameron, dave.jiang,
	vishal.l.verma

Hi guys,

I have some thoughts and questions about the CXL & MCE mechanism.

CXL type-3 devices can be used as volatile or persistent memory, so a
poisoned page on them should also trigger a memory failure, so that the
OS can handle the processes using the page and the device driver can
recover the page.  I am now investigating this.

Currently, CXL RAS is under development.  We can now inject POISON on a
CXL device via QEMU (QMP commands), and then `cxl list -L` shows those
poisoned areas.  But the POISON injection is silent; I think we need a
signal here to notify the OS to handle those poisoned areas at injection
time.  According to CXL 3.0 spec Figure 12-5, there are 2 methods to
send the signal: FW-First and OS-First.
My understanding of them is:
- FW-First method:
   a. The CXL device reports POISON to firmware
   b. GHES calls the CXL driver handler[1], which handles the POISON
   c. The CXL driver handler translates DPA to HPA, constructs an MCE
      instance, then calls mce_log() to queue this MCE (? not sure)
- OS-First method:
   a. The CXL device reports POISON to the OS by MSI
   b. The CXL driver handles the POISON
   c. Same as c. above
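
A toy model of the two flows listed above, only to show where they
converge. Everything in it is illustrative: the names `PoisonReport`,
`dpa_to_hpa` and `handle_poison`, the 4 KiB page size and the linear
(non-interleaved) DPA-to-HPA mapping are assumptions, not the kernel's or
QEMU's actual code.

```python
from dataclasses import dataclass
from typing import Callable, List

PAGE_SHIFT = 12  # assumption: 4 KiB pages


@dataclass
class PoisonReport:
    dpa: int     # device physical address where the poison starts
    length: int  # length of the poisoned range in bytes (assumed > 0)
    path: str    # "fw-first" (CPER via GHES) or "os-first" (MSI to driver)


def dpa_to_hpa(dpa: int, region_hpa_base: int, region_dpa_base: int = 0) -> int:
    """Linear translation; interleaved regions are not modelled here."""
    return region_hpa_base + (dpa - region_dpa_base)


def handle_poison(report: PoisonReport, region_hpa_base: int,
                  on_bad_page: Callable[[int], None]) -> List[int]:
    """Common step c: translate, then report every affected page frame."""
    hpa = dpa_to_hpa(report.dpa, region_hpa_base)
    first_pfn = hpa >> PAGE_SHIFT
    last_pfn = (hpa + report.length - 1) >> PAGE_SHIFT
    pfns = list(range(first_pfn, last_pfn + 1))
    for pfn in pfns:
        # Stand-in for whatever finally ends up calling memory_failure().
        on_bad_page(pfn)
    return pfns


if __name__ == "__main__":
    failed: List[int] = []
    for path in ("fw-first", "os-first"):
        handle_poison(PoisonReport(0x1000, 0x2000, path),
                      0x1_0000_0000, failed.append)
    print([hex(pfn) for pfn in failed])
```

The `path` field is deliberately unused by `handle_poison()`: in this
model the two methods differ only in how the report arrives, not in what
happens afterwards.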

So, I think:
Firstly, and obviously, we need to add a signal when injecting POISON in
QEMU.  For example, call `cxl_event_insert()` after injection.

Secondly, implement a method in the CXL driver to turn POISON into an MCE
and push it into the mce_evt_pool for the decode chain to process; then
mce_uc_nb on this chain will finally call memory_failure().

And a question:
How do we configure the CXL device to choose the FW-First or OS-First
signal method (for both QEMU and bare metal if possible)?


I don't fully understand the CXL spec yet (it's difficult for me), so 
the above ideas may be immature, but I really want to figure out how we 
can make CXL & MCE work.  I'd really appreciate it if you could help me 
on this!

[1] 
https://lore.kernel.org/linux-cxl/20231220-cxl-cper-v5-0-1bb8a4ca2c7a@intel.com/T/#u


--
Thanks,
Ruan


* Re: Some thoughts and questions about CXL & MCE
  2023-12-22 16:33 Some thoughts and questions about CXL & MCE Shiyang Ruan
@ 2024-01-02 17:29 ` Jonathan Cameron
  2024-01-03 16:45 ` Dan Williams
  1 sibling, 0 replies; 7+ messages in thread
From: Jonathan Cameron @ 2024-01-02 17:29 UTC
  To: Shiyang Ruan
  Cc: linux-cxl, dan.j.williams@intel.com, dave.jiang, vishal.l.verma,
	qemu-devel

On Sat, 23 Dec 2023 00:33:43 +0800
Shiyang Ruan <ruansy.fnst@fujitsu.com> wrote:

> Hi guys,
> 
> I have some thoughts and questions about CXL & MCE mechanism.

+CC qemu-devel as quite a bit of this is QEMU related.

> 
> CXL type-3 devices can be used as volatile or persistent memory, so a 
> poisoned page on them should also trigger a memory failure, to let OS 
> handle process using the page and let device driver recover the page.  I 
> am now investigating this.
> 
> Currently, CXL RAS is under development.  We can now inject POISON on a 
> CXL device by qemu (qmp commands), and then `cxl list -L` could show 
> those poisoned areas.  But the POISON injection is silent, I think we 
> need a signal here to notify OS to handle those poisoned areas when 
> injecting. 

Agreed. The emulation is far from complete.  It should also kick off
the relevant event log entry additions under at least some
circumstances.  That depends on whether we think we are injecting poison
to be discovered later, which it currently won't be because we don't
check for poison when doing reads and writes (I've not yet figured out
how to do that in QEMU).  If we are using the inject poison opcode from
the host OS then we are missing the bit in 8.2.9.9.4.2 (CXL r3.1):
"In addition, the device shall add an appropriate poison creation
event to its internal informational event log, update the event
status register and if configured, interrupt the host".
So that should do a General Media Event of type 04h - host inject poison.

For the qmp interface we should add control of whether we are injecting
poison that is intended to trigger an error now (e.g. what would result
from a scrub detecting it) or poison for detection later - either by
triggering a media scan, or by a host read / write.

If it's a scrub poison detection that we are emulating then we
should issue an uncorrectable GMR Event record with Memory Event Type
of Scrub media or maybe a 00h (Media ECC error) if we think some
other reason might cause it and transaction type 05 Media patrol scrub.
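
The cases described above could be captured as a small decision table.
This is only a sketch: the function name, the `cause` selector and the
field names are made up for illustration, and the numeric values are
just the ones quoted in this mail (04h, 00h, 05), not checked against
the spec tables.

```python
from typing import Optional


def event_for_injection(cause: str) -> Optional[dict]:
    """Pick the event record (if any) to queue when poison is injected.

    `cause` is a hypothetical selector:
      "host-inject": the host OS used the inject poison opcode
      "scrub":       emulate a scrub detecting the poison right now
      "latent":      poison to be found later (media scan, host read/write)
    """
    if cause == "host-inject":
        # Entry in the informational event log, "type 04h" as quoted above.
        return {"record": "general-media", "log": "informational",
                "type": 0x04}
    if cause == "scrub":
        # Uncorrectable record, Media ECC error (00h), transaction type 05.
        return {"record": "general-media", "uncorrectable": True,
                "memory_event_type": 0x00, "transaction_type": 0x05}
    if cause == "latent":
        # Nothing is reported at injection time.
        return None
    raise ValueError(f"unknown poison cause: {cause!r}")


if __name__ == "__main__":
    for cause in ("host-inject", "scrub", "latent"):
        print(cause, "->", event_for_injection(cause))
```

A qmp-side control for the injection cause would then only need to pass
one extra argument through to a helper of this shape.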

Note that, IIRC, you can manually inject these records, which will result
in appropriate events being reported in Linux; they just aren't currently
hooked up to the QEMU poison injection (qmp or host interface).

If you have time to look at filling in these more complex flows, that
would be great as it would make the QEMU side of things easier to use.

> According to CXL 3.0 spec Figure 12-5, there are 2 methods 
> to send the signal: FW-First and OS-First.
> My understanding about them is:
> - FW-First method:
>    a. CXL device report POISON to Firmware
>    b. GHES calls CXL driver handler[1], which will handle the POISON

I'm in two minds about how to emulate the firmware first paths.
In the short term I'll get some old code I have running again that
lets us do general CPER record injection. However, we might want to
actually push the record creation into EDK2.  Meh - let's do it in
QEMU first and see how bad it looks.

>    c. CXL driver handler translates DPA to HPA, construct a mce 
> instance, then call mce_log() to queue this MCE (? not sure)

Yes, I think the last step is currently missing (I'm losing track
of some of the RAS flows).

> - OS-First method:
>    a. CXL device report POISON to OS by MSI
>    b. CXL driver will handle the POISON
>    c. same with the c. above
> 
> So, I think:
> Firstly, and obviously, we need to add a signal when injecting POISON in 
> qemu.  For example, call `cxl_event_insert()` after injection.
Yes - create the appropriate records and add them.  However, we'll need
to support injecting poison with different causes, so that we know whether
to add the records immediately or to rely on later queries.

> 
> Secondly, implement a method in CXL driver to turn POISON to MCE and 
> push it into the mce_evt_pool for decode chain to process, then 
> mce_uc_nb on this chain will finally call memory_failure().
> 
> And a question:
> How to configure the CXL device to choose FW-First or OS-First signal 
> methods (methods for qemu and bare metal if possible)?

There is an _OSC for this.  We can hook that up in QEMU but it may be
controversial to do it there rather than in EDK2.

> 
> 
> I don't fully understand the CXL spec yet (it's difficult for me), so 
> the above ideas may be immature, but I really want to figure out how we 
> can make CXL & MCE work.  I'd really appreciate it if you could help me 
> on this!
> 
> [1] 
> https://lore.kernel.org/linux-cxl/20231220-cxl-cper-v5-0-1bb8a4ca2c7a@intel.com/T/#u

Great if you can look at filling in the details in this area.
There are still quite a few flows we haven't fully realized in emulation
or in the kernel.

Jonathan




* RE: Some thoughts and questions about CXL & MCE
  2023-12-22 16:33 Some thoughts and questions about CXL & MCE Shiyang Ruan
  2024-01-02 17:29 ` Jonathan Cameron
@ 2024-01-03 16:45 ` Dan Williams
  2024-01-08 12:37   ` Jonathan Cameron
  1 sibling, 1 reply; 7+ messages in thread
From: Dan Williams @ 2024-01-03 16:45 UTC
  To: Shiyang Ruan, linux-cxl
  Cc: dan.j.williams@intel.com, Jonathan Cameron, dave.jiang,
	vishal.l.verma

Shiyang Ruan wrote:
> Hi guys,
> 
> I have some thoughts and questions about CXL & MCE mechanism.
> 
> CXL type-3 devices can be used as volatile or persistent memory, so a 
> poisoned page on them should also trigger a memory failure, to let OS 
> handle process using the page and let device driver recover the page.  I 
> am now investigating this.
> 
> Currently, CXL RAS is under development.  We can now inject POISON on a 
> CXL device by qemu (qmp commands), and then `cxl list -L` could show 
> those poisoned areas.  But the POISON injection is silent, I think we 
> need a signal here to notify OS to handle those poisoned areas when 
> injecting.  According to CXL 3.0 spec Figure 12-5, there are 2 methods 
> to send the signal: FW-First and OS-First.
> My understanding about them is:
> - FW-First method:
>    a. CXL device report POISON to Firmware
>    b. GHES calls CXL driver handler[1], which will handle the POISON

Yes, GHES conveys a CPER record to the OS and the CPER handler forwards the
record to the CXL driver for address translation.

>    c. CXL driver handler translates DPA to HPA, construct a mce 

Close, but an "MCE" is the native notification of a machine check
exception. In firmware-first the error is translated to an EDAC event via
ghes_edac_report_mem_error(). That also gets translated to an mce_log()
call to support legacy consumers of that event format.

> instance, then call mce_log() to queue this MCE (? not sure)

CXL firmware-first events are all reported in terms of DPA. The BIOS can
additionally do the address translation itself and emit the generic
memory CPER record, but Linux is assuming it needs to supplement CPER
records with OS translation.

> - OS-First method:
>    a. CXL device report POISON to OS by MSI
>    b. CXL driver will handle the POISON
>    c. same with the c. above
> 
> So, I think:
> Firstly, and obviously, we need to add a signal when injecting POISON in 
> qemu.  For example, call `cxl_event_insert()` after injection.
> 
> Secondly, implement a method in CXL driver to turn POISON to MCE and 
> push it into the mce_evt_pool for decode chain to process, then 
> mce_uc_nb on this chain will finally call memory_failure().

The driver needs to arrange to emit the event via the existing memory
error record communication mechanism, which is
ghes_edac_report_mem_error() or one of its helpers.

The EDAC memory error reporting ABI is well established, and perhaps CXL
memory errors can skip emitting legacy mce_log() events, but that depends
on what userspace remains that still consumes mce_log().

> And a question:
> How to configure the CXL device to choose FW-First or OS-First signal 
> methods (methods for qemu and bare metal if possible)?

Honestly I am not sure it is worthwhile to have QEMU support firmware
first vs just have a way to trigger CPER record events. The goal of the
CXL QEMU enabling is to test OS enabling and the virtual device does not
actually need to simulate all of the Firmware-First mechanisms, the CPER
record result is sufficient.

> I don't fully understand the CXL spec yet (it's difficult for me), so 
> the above ideas may be immature, but I really want to figure out how we 
> can make CXL & MCE work.  I'd really appreciate it if you could help me 
> on this!
> 
> [1] 
> https://lore.kernel.org/linux-cxl/20231220-cxl-cper-v5-0-1bb8a4ca2c7a@intel.com/T/#u

Following this work from Smita and Ira is the right path.


* Re: Some thoughts and questions about CXL & MCE
  2024-01-03 16:45 ` Dan Williams
@ 2024-01-08 12:37   ` Jonathan Cameron
  2024-01-08 21:14     ` Dan Williams
  0 siblings, 1 reply; 7+ messages in thread
From: Jonathan Cameron @ 2024-01-08 12:37 UTC
  To: Dan Williams; +Cc: Shiyang Ruan, linux-cxl, dave.jiang, vishal.l.verma


> > And a question:
> > How to configure the CXL device to choose FW-First or OS-First signal 
> > methods (methods for qemu and bare metal if possible)?  
> 
> Honestly I am not sure it is worthwhile to have QEMU support firmware
> first vs just have a way to trigger CPER record events. The goal of the
> CXL QEMU enabling is to test OS enabling and the virtual device does not
> actually need to simulate all of the Firmware-First mechanisms, the CPER
> record result is sufficient.

Agreed - one emulation wrinkle is that it must not do some other things
when simulating them.  I.e. we don't want to fill up event logs etc with
events that we are pretending the firmware dealt with and already cleared.

To make it usable it has to do some magic to fill in the CPER record, but
that is easy enough. Control wise, reuse the existing error injection
commands.

One other wrinkle I'm working through is the control of CPER vs normal reporting.
Current thought is we do what ACPI allows and start in firmware first, until the
_OSC call.  If that requests native handling we go back to what we currently
support (native only emulation).
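
As a rough illustration of that "start in firmware first, switch on
_OSC" idea, here is a toy state model. It is not QEMU code: the class
name, the `firmware_allows_native` knob and the strings returned by
`report()` are all made up for the sketch.

```python
class ErrorReportingModel:
    """Toy model: firmware-first until _OSC grants native handling."""

    def __init__(self, firmware_allows_native: bool = True):
        self.firmware_allows_native = firmware_allows_native
        self.mode = "firmware-first"  # state before any _OSC call

    def osc(self, os_requests_native: bool) -> bool:
        """Handle an _OSC call; return True if native control is granted."""
        granted = os_requests_native and self.firmware_allows_native
        self.mode = "native" if granted else "firmware-first"
        return granted

    def report(self, error: str) -> str:
        """Say which path an error would take in the current mode."""
        if self.mode == "native":
            return f"device event log + interrupt: {error}"
        return f"CPER record via GHES: {error}"


if __name__ == "__main__":
    model = ErrorReportingModel()
    print(model.report("poison"))   # before _OSC: firmware-first path
    model.osc(os_requests_native=True)
    print(model.report("poison"))   # after a granted _OSC: native path
```

Testing the firmware-first handling with the driver stack up then comes
down to having some way of keeping `mode` at "firmware-first" after
boot, which is the awkward part discussed next.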

However, there isn't a convenient way to mess with what Linux asks for,
which we'd want in order to make it easy to test the handling once the
driver stack is up.

I'm not sure anyone would be keen on a pci_aer=no-ask,cxl-mem-error=no-ask type
kernel boot parameter to instruct the kernel to never ask for control.

I also don't much like a qemu parameter which basically says 'report aer as
broken so the OS can't grab it'. Anyhow those are details.

Ah well. Getting the _OSC handshake to save what was negotiated on the QEMU
side was fiddly, but I got that working on Friday, so I have all the pieces
for protocol errors done (ARM only for now - I need to look at notifications
in ACPI on x86 + enable HEST in general on qemu-x86).  Will post an RFC for
ARM shortly.

Bare metal will be buried in BIOS config most likely.

Jonathan




* Re: Some thoughts and questions about CXL & MCE
  2024-01-08 12:37   ` Jonathan Cameron
@ 2024-01-08 21:14     ` Dan Williams
  2024-01-09 16:18       ` Jonathan Cameron
  0 siblings, 1 reply; 7+ messages in thread
From: Dan Williams @ 2024-01-08 21:14 UTC
  To: Jonathan Cameron, Dan Williams
  Cc: Shiyang Ruan, linux-cxl, dave.jiang, vishal.l.verma

Jonathan Cameron wrote:
[..]
> One other wrinkle I'm working through is the control of CPER vs normal reporting.
> Current thought is we do what ACPI allows and start in firmware first, until the
> _OSC call.  If that requests native handling we go back to what we currently
> support (native only emulation).
> 
> However, there isn't a convenient way to mess with what Linux asks for which we'd
> want to make it easy to test the handling once the driver stack is up.
> 
> I'm not sure anyone would be keen on a pci_aer=no-ask,cxl-mem-error=no-ask type
> kernel boot parameter to instruct the kernel to never ask for control.

I just expressed a similar lament to someone else asking about this, and
claimed that it is up to the BIOS to say "no", not for Linux to skip
asking. It turns out that the Linux pci=noaer knob predated _OSC:

   7ece14175376 PCI/AER: Remove aerdriver.forceload kernel parameter

..., so there was legacy to carry forward. Otherwise, in a post-_OSC
world it's the OS's responsibility to ask and the firmware's responsibility
to optionally say "no".





* Re: Some thoughts and questions about CXL & MCE
  2024-01-08 21:14     ` Dan Williams
@ 2024-01-09 16:18       ` Jonathan Cameron
  2024-01-09 19:59         ` Dan Williams
  0 siblings, 1 reply; 7+ messages in thread
From: Jonathan Cameron @ 2024-01-09 16:18 UTC
  To: Dan Williams; +Cc: Shiyang Ruan, linux-cxl, dave.jiang, vishal.l.verma

On Mon, 8 Jan 2024 13:14:06 -0800
Dan Williams <dan.j.williams@intel.com> wrote:

> Jonathan Cameron wrote:
> [..]
> > One other wrinkle I'm working through is the control of CPER vs normal reporting.
> > Current thought is we do what ACPI allows and start in firmware first, until the
> > _OSC call.  If that requests native handling we go back to what we currently
> > support (native only emulation).
> > 
> > However, there isn't a convenient way to mess with what Linux asks for which we'd
> > want to make it easy to test the handling once the driver stack is up.
> > 
> > I'm not sure anyone would be keen on a pci_aer=no-ask,cxl-mem-error=no-ask type
> > kernel boot parameter to instruct the kernel to never ask for control.  
> 
> I just expressed a similar lament to someone else asking about this, and
> claimed that it is up to the BIOS to say "no", not for Linux to skip
> asking. It turns out that the Linux pci=noaer knob predated _OSC:
> 
>    7ece14175376 PCI/AER: Remove aerdriver.forceload kernel parameter
> 
> ..., so there was legacy to carry forward. Otherwise, in a post _OSC
> world it's the OS responsibility to ask and the firmware responsibility
> to optionally say, "no".

PCI Firmware specification rev 3.3 Section 4.5.1.
(right at the end of page 48)

"System firmware must only mask a Control Field bit to zero if it has explicit
knowledge that the feature will not work properly under native operating system
control, due to platform errata or other incompatibilities."

Meh, I guess I could add a 'native-aer=broken' parameter to the qemu boot -
I'm sure that will sail through reviews :)
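
For what it's worth, the rule quoted above boils down to a bit mask. A
minimal sketch, where the bit names and positions are placeholders
invented for the example (not the real Control Field layout) and
`known_broken` plays the role of a 'native-aer=broken' style knob:

```python
# Hypothetical control-field bits; the values are placeholders only.
CTRL_AER = 1 << 0
CTRL_CXL_MEM_ERROR = 1 << 1


def osc_grant(requested: int, known_broken: int) -> int:
    """Return the control bits granted to the OS.

    Firmware clears a requested bit only if it has explicit knowledge
    that the feature is broken under native OS control.
    """
    return requested & ~known_broken


if __name__ == "__main__":
    asked = CTRL_AER | CTRL_CXL_MEM_ERROR
    print(bin(osc_grant(asked, 0)))         # nothing broken: all granted
    print(bin(osc_grant(asked, CTRL_AER)))  # AER stays firmware-first
```

Anything the OS does not ask for stays with the firmware in this model,
which is why the only test knobs are "OS does not ask" or "firmware
claims it is broken".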




* Re: Some thoughts and questions about CXL & MCE
  2024-01-09 16:18       ` Jonathan Cameron
@ 2024-01-09 19:59         ` Dan Williams
  0 siblings, 0 replies; 7+ messages in thread
From: Dan Williams @ 2024-01-09 19:59 UTC
  To: Jonathan Cameron, Dan Williams
  Cc: Shiyang Ruan, linux-cxl, dave.jiang, vishal.l.verma

Jonathan Cameron wrote:
> On Mon, 8 Jan 2024 13:14:06 -0800
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > Jonathan Cameron wrote:
> > [..]
> > > One other wrinkle I'm working through is the control of CPER vs normal reporting.
> > > Current thought is we do what ACPI allows and start in firmware first, until the
> > > _OSC call.  If that requests native handling we go back to what we currently
> > > support (native only emulation).
> > > 
> > > However, there isn't a convenient way to mess with what Linux asks for which we'd
> > > want to make it easy to test the handling once the driver stack is up.
> > > 
> > > I'm not sure anyone would be keen on a pci_aer=no-ask,cxl-mem-error=no-ask type
> > > kernel boot parameter to instruct the kernel to never ask for control.  
> > 
> > I just expressed a similar lament to someone else asking about this, and
> > claimed that it is up to the BIOS to say "no", not for Linux to skip
> > asking. It turns out that the Linux pci=noaer knob predated _OSC:
> > 
> >    7ece14175376 PCI/AER: Remove aerdriver.forceload kernel parameter
> > 
> > ..., so there was legacy to carry forward. Otherwise, in a post _OSC
> > world it's the OS responsibility to ask and the firmware responsibility
> > to optionally say, "no".
> 
> PCI Firmware specification rev 3.3 Section 4.5.1.
> (right at the end of page 48)
> 
> "System firmware must only mask a Control Field bit to zero if it has explicit
> knowledge that the feature will not work properly under native operating system
> control, due to platform errata or other incompatibilities."

Feels like "other incompatibilities" is the catch-all gray area for
firmware to force firmware-first operation when that ensures
"compatibility" with some 3rd party expectation.

> 
> Meh, I guess I could add a 'native-aer=broken' parameter to the qemu boot -
> I'm sure that will sail through reviews :)

Kidding aside, it still feels like a bunch of work to emulate a flow in
QEMU for a firmware no one is going to run outside of kernel development
testing.  I.e. the QEMU CXL emulation is for testing the Linux kernel,
not the Tianocore implementation.


