Linux CXL
 help / color / mirror / Atom feed
From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
To: <linux-cxl@vger.kernel.org>, <dan.j.williams@intel.com>
Cc: <linuxarm@huawei.com>, <ira.weiny@intel.com>,
	<vishal.l.verma@intel.com>, <alison.schofield@intel.com>,
	Dave Jiang <dave.jiang@intel.com>
Subject: [RFC PATCH 0/2] CXL UE RAS Multiple Header Logging support
Date: Fri, 13 Jan 2023 15:40:56 +0000	[thread overview]
Message-ID: <20230113154058.16227-1-Jonathan.Cameron@huawei.com> (raw)

CXL UE RAS Error reporting allows an EP to report the capability of
recording Multiple Header Logs for uncorrectable errors.
Unlike equivalent feature in PCIe, there is no enable control
for this feature, so a supporting device may be expecting
a more complex software flow than that necessary for devices
that do not support this feature. Documentation of this feature
is sparse, with assumption it works the same as PCIe.

There are hardware implementation choices allowed in the
equivalent PCIe r6.0 base spec section (6.4.2.4) that could
be safely used with the existing code, even with Multiple
Header Recording support but there are others that cannot.

The issue is what happens when the EP is doing Multiple Header
Recording but then the software writes 1 to clear more than one
status bit at the time (PCIe spec warns against doing this
 - but it is what the current kernel code will do):
Option 1)
  It does the nice thing and clears all matching errors.
  Note this is a bit strange for the case where the device
  supports logging multiple instances of a given error - so
  the two can't be combined cleanly.  With that feature
  I can't see how anyone could implement hardware that coped
  cleanly with the wrong software flow.
Option 2)
  It clears only the first error bit leaving a bunch of error
  bits set (note that if it has recorded multiple errors of
  same type it might not even do that). These are sticky
  across resets, so you will probably end up coming back up
  and immediately seeing an error.

So whilst you can design an EP to safe against non MH recording
aware software, it isn't generally the case. As we don't have
an explicit enable on CXL we have to handle anything reporting
the capability in a MH safe fashion.

This feature was developed against emulation in QEMU.
The relevant patches have not yet been posted but can be found on
https://gitlab.com/jic23/qemu/-/commits/cxl-2023-01-11
along with description of how to inject errors in the patch
descriptions.  I'll post them for review for QEMU inclusion
shortly.

RFC simply because the lack of specification detail means I am
less sure on this code than I would normally be. Unfortunately it
could be argued that the first patch is a fix for the
current upstream CXL RAS support.  If we want a simpler fix
one option would be to just fail to enable RAS support if
Multiple Header recording capability bit is set.  Or we
decide that it doesn't matter for now and add support for this
feature via the normal merge cycle.

Second patch is just there to make this easier to test as
no additional software is needed to print the header log.

Base is rather messy due to a clash between multiple cxl tree
branches.
cxl/fixes with the trace move on cxl/next cherry picked on top
as it moves the code that was fixed.

Jonathan Cameron (2):
  cxl: RAS: Multiple header recording support
  cxl: Add tprintk support for header log hex dump

 drivers/cxl/core/pci.c   | 17 ++++++++++++-----
 drivers/cxl/core/trace.h |  7 +++++--
 drivers/cxl/cxl.h        |  1 +
 3 files changed, 18 insertions(+), 7 deletions(-)

-- 
2.37.2


             reply	other threads:[~2023-01-13 15:49 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-13 15:40 Jonathan Cameron [this message]
2023-01-13 15:40 ` [RFC PATCH 1/2] cxl: RAS: Multiple header recording support Jonathan Cameron
2023-01-17 18:05   ` Dave Jiang
2023-01-13 15:40 ` [RFC PATCH 2/2] cxl: Add tprintk support for header log hex dump Jonathan Cameron
2023-01-17 18:08   ` Dave Jiang
2023-01-18  9:43     ` Jonathan Cameron
2023-01-18 15:19       ` Dave Jiang
  -- strict thread matches above, loose matches on Subject: below --
2023-01-13 15:40 [RFC PATCH 0/2] CXL UE RAS Multiple Header Logging support Jonathan Cameron
2023-01-13 15:53 ` Jonathan Cameron
2023-01-17 17:36   ` Dave Jiang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230113154058.16227-1-Jonathan.Cameron@huawei.com \
    --to=jonathan.cameron@huawei.com \
    --cc=alison.schofield@intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=ira.weiny@intel.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linuxarm@huawei.com \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox