Linux CXL
 help / color / mirror / Atom feed
* [RFC PATCH 0/2] CXL UE RAS Multiple Header Logging support
@ 2023-01-13 15:40 Jonathan Cameron
  2023-01-13 15:40 ` [RFC PATCH 1/2] cxl: RAS: Multiple header recording support Jonathan Cameron
  2023-01-13 15:40 ` [RFC PATCH 2/2] cxl: Add tprintk support for header log hex dump Jonathan Cameron
  0 siblings, 2 replies; 8+ messages in thread
From: Jonathan Cameron @ 2023-01-13 15:40 UTC (permalink / raw)
  To: linux-cxl, dan.j.williams
  Cc: linuxarm, ira.weiny, vishal.l.verma, alison.schofield, Dave Jiang

CXL UE RAS Error reporting allows an EP to report the capability of
recording Multiple Header Logs for uncorrectable errors.
Unlike equivalent feature in PCIe, there is no enable control
for this feature, so a supporting device may be expecting
a more complex software flow than that necessary for devices
that do not support this feature. Documentation of this feature
is sparse, with assumption it works the same as PCIe.

There are hardware implementation choices allowed in the
equivalent PCIe r6.0 base spec section (6.4.2.4) that could
be safely used with the existing code, even with Multiple
Header Recording support but there are others that cannot.

The issue is what happens when the EP is doing Multiple Header
Recording but then the software writes 1 to clear more than one
status bit at the time (PCIe spec warns against doing this
 - but it is what the current kernel code will do):
Option 1)
  It does the nice thing and clears all matching errors.
  Note this is a bit strange for the case where the device
  supports logging multiple instances of a given error - so
  the two can't be combined cleanly.  With that feature
  I can't see how anyone could implement hardware that coped
  cleanly with the wrong software flow.
Option 2)
  It clears only the first error bit leaving a bunch of error
  bits set (note that if it has recorded multiple errors of
  same type it might not even do that). These are sticky
  across resets, so you will probably end up coming back up
  and immediately seeing an error.

So whilst you can design an EP to safe against non MH recording
aware software, it isn't generally the case. As we don't have
an explicit enable on CXL we have to handle anything reporting
the capability in a MH safe fashion.

This feature was developed against emulation in QEMU.
The relevant patches have not yet been posted but can be found on
https://gitlab.com/jic23/qemu/-/commits/cxl-2023-01-11
along with description of how to inject errors in the patch
descriptions.  I'll post them for review for QEMU inclusion
shortly.

RFC simply because the lack of specification detail means I am
less sure on this code than I would normally be. Unfortunately it
could be argued that the first patch is a fix for the
current upstream CXL RAS support.  If we want a simpler fix
one option would be to just fail to enable RAS support if
Multiple Header recording capability bit is set.  Or we
decide that it doesn't matter for now and add support for this
feature via the normal merge cycle.

Second patch is just there to make this easier to test as
no additional software is needed to print the header log.

Base is rather messy due to a clash between multiple cxl tree
branches.
cxl/fixes with the trace move on cxl/next cherry picked on top
as it moves the code that was fixed.

Jonathan Cameron (2):
  cxl: RAS: Multiple header recording support
  cxl: Add tprintk support for header log hex dump

 drivers/cxl/core/pci.c   | 17 ++++++++++++-----
 drivers/cxl/core/trace.h |  7 +++++--
 drivers/cxl/cxl.h        |  1 +
 3 files changed, 18 insertions(+), 7 deletions(-)

-- 
2.37.2


^ permalink raw reply	[flat|nested] 8+ messages in thread
* [RFC PATCH 0/2] CXL UE RAS Multiple Header Logging support
@ 2023-01-13 15:40 Jonathan Cameron
  2023-01-13 15:40 ` [RFC PATCH 1/2] cxl: RAS: Multiple header recording support Jonathan Cameron
  0 siblings, 1 reply; 8+ messages in thread
From: Jonathan Cameron @ 2023-01-13 15:40 UTC (permalink / raw)
  To: linux-cxl, dan.j.williams
  Cc: linuxarm, ira.weiny, vishal.l.verma, alison.schofield

CXL UE RAS Error reporting allows an EP to report the capability of
recording Multiple Header Logs for uncorrectable errors.
Unlike equivalent feature in PCIe, there is no enable control
for this feature, so a supporting device may be expecting
a more complex software flow than that necessary for devices
that do not support this feature. Documentation of this feature
is sparse, with assumption it works the same as PCIe.

There are hardware implementation choices allowed in the
equivalent PCIe r6.0 base spec section (6.4.2.4) that could
be safely used with the existing code, even with Multiple
Header Recording support but there are others that cannot.

The issue is what happens when the EP is doing Multiple Header
Recording but then the software writes 1 to clear more than one
status bit at the time (PCIe spec warns against doing this
 - but it is what the current kernel code will do):
Option 1)
  It does the nice thing and clears all matching errors.
  Note this is a bit strange for the case where the device
  supports logging multiple instances of a given error - so
  the two can't be combined cleanly.  With that feature
  I can't see how anyone could implement hardware that coped
  cleanly with the wrong software flow.
Option 2)
  It clears only the first error bit leaving a bunch of error
  bits set (note that if it has recorded multiple errors of
  same type it might not even do that). These are sticky
  across resets, so you will probably end up coming back up
  and immediately seeing an error.

So whilst you can design an EP to safe against non MH recording
aware software, it isn't generally the case. As we don't have
an explicit enable on CXL we have to handle anything reporting
the capability in a MH safe fashion.

This feature was developed against emulation in QEMU.
The relevant patches have not yet been posted but can be found on
https://gitlab.com/jic23/qemu/-/commits/cxl-2023-01-11
along with description of how to inject errors in the patch
descriptions.  I'll post them for review for QEMU inclusion
shortly.

RFC simply because the lack of specification detail means I am
less sure on this code than I would normally be. Unfortunately it
could be argued that the first patch is a fix for the
current upstream CXL RAS support.  If we want a simpler fix
one option would be to just fail to enable RAS support if
Multiple Header recording capability bit is set.  Or we
decide that it doesn't matter for now and add support for this
feature via the normal merge cycle.

Second patch is just there to make this easier to test as
no additional software is needed to print the header log.

Base is rather messy due to a clash between multiple cxl tree
branches.
cxl/fixes with the trace move on cxl/next cherry picked on top
as it moves the code that was fixed.

Jonathan Cameron (2):
  cxl: RAS: Multiple header recording support
  cxl: Add tprintk support for header log hex dump

 drivers/cxl/core/pci.c   | 17 ++++++++++++-----
 drivers/cxl/core/trace.h |  7 +++++--
 drivers/cxl/cxl.h        |  1 +
 3 files changed, 18 insertions(+), 7 deletions(-)

-- 
2.37.2


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-01-18 15:22 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-01-13 15:40 [RFC PATCH 0/2] CXL UE RAS Multiple Header Logging support Jonathan Cameron
2023-01-13 15:40 ` [RFC PATCH 1/2] cxl: RAS: Multiple header recording support Jonathan Cameron
2023-01-17 18:05   ` Dave Jiang
2023-01-13 15:40 ` [RFC PATCH 2/2] cxl: Add tprintk support for header log hex dump Jonathan Cameron
2023-01-17 18:08   ` Dave Jiang
2023-01-18  9:43     ` Jonathan Cameron
2023-01-18 15:19       ` Dave Jiang
  -- strict thread matches above, loose matches on Subject: below --
2023-01-13 15:40 [RFC PATCH 0/2] CXL UE RAS Multiple Header Logging support Jonathan Cameron
2023-01-13 15:40 ` [RFC PATCH 1/2] cxl: RAS: Multiple header recording support Jonathan Cameron

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox