From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
To: Dave Jiang <dave.jiang@intel.com>
Cc: <linux-cxl@vger.kernel.org>, <alison.schofield@intel.com>,
	<vishal.l.verma@intel.com>, <bwidawsk@kernel.org>,
	<dan.j.williams@intel.com>, <shiju.jose@huawei.com>,
	<rrichter@amd.com>
Subject: Re: [PATCH RFC v2 0/9] cxl/pci: Add fundamental error handling
Date: Mon, 24 Oct 2022 17:01:02 +0100
Message-ID: <20221024170102.00000c4b@huawei.com>
In-Reply-To: <ae8330db-ab77-7952-e846-de7dc527890c@intel.com>

On Wed, 19 Oct 2022 10:38:13 -0700
Dave Jiang <dave.jiang@intel.com> wrote:

> On 10/19/2022 10:30 AM, Jonathan Cameron wrote:
> > On Tue, 11 Oct 2022 18:19:15 +0100
> > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> >  
> >> On Tue, 11 Oct 2022 08:18:34 -0700
> >> Dave Jiang <dave.jiang@intel.com> wrote:
> >>  
> >>> On 10/11/2022 7:17 AM, Jonathan Cameron wrote:  
> >>>> On Fri, 16 Sep 2022 16:10:53 -0700
> >>>> Dave Jiang <dave.jiang@intel.com> wrote:
> >>>>       
> >>>>> Series set to RFC since there's no means to test. Would like to get opinions
> >>>>> on whether using trace events as the reporting mechanism is OK.
> >>>>>
> >>>>> Jonathan,
> >>>>> We currently don't have any way to test AER events. Do you have any plans
> >>>>> to support AER events via QEMU emulation?  
> >>>> Sorry - missed this entirely as I've gotten a bit behind reading CXL emails.
> > Hi Dave,
> >
> > Quick update.
> >
> > Working QEMU emulation - but needs some/lots of cleanup. Particularly fun was
> > figuring out why I wasn't getting messages past the upstream switch port.
> > Turned out the serial number ECAP was on top of the AER ECAP. Oops - thankfully
> > that patch isn't upstream yet.
> > Also QEMU AER routing seems to be based on some older PCIe spec
> > so needed some tweaks to get the device to actually issue ERR_FATAL etc.
> >
> > Anyhow, should have something you can play with in a day or two.  
> 
> Awesome! Thanks! :)

Took a little longer than expected.

Anyhow, now at
https://gitlab.com/jic23/qemu/-/commits/cxl-2022-10-24

That tree is carrying far too many things right now for it to make much sense
to me to email this to qemu-devel - though I may pull
hw/pci/aer: Add missing routing for AER errors
out in advance, as it closes a spec difference between QEMU's emulation of AER
and what the PCI spec says.

Hopefully the set of out-of-tree patches will start to shrink soon - v9 of the DOE
patches has been on the list for a week or so.

The top patch includes a very short 'how to' in its description. Basically, fire
up QMP by adding something like -qmp tcp:localhost:444,server=on,wait=off to your
QEMU command line, then use commands like:

{ "execute": "qmp_capabilities" }
...
{ "execute": "cxl-inject-uncorrectable-error",
    "arguments": {
        "path": "/machine/peripheral/cxl-pmem0",
        "type": "cache-address-parity",
        "header": [ 3, 4]
    } }
...
{ "execute": "cxl-inject-correctable-error",
    "arguments": {
        "path": "/machine/peripheral/cxl-pmem0",
        "type": "physical",
        "header": [ 3, 4]
    } }
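If you would rather script the injection than type into a raw QMP session, the exchange above can be driven from a small client. A minimal Python sketch follows, assuming the QMP socket from the example command line (localhost, port 444) and the command names and arguments exactly as used in this tree (cxl-inject-uncorrectable-error / cxl-inject-correctable-error with path, type and header); other QEMU versions may name or shape these differently. build_inject_cmd(), read_reply() and qmp_run() are hypothetical helpers, not part of QEMU. QMP is newline-delimited JSON: the server sends a greeting, the client must issue qmp_capabilities first, and asynchronous events can arrive between replies, so the reader skips anything carrying an "event" key.

```python
import json
import socket


def build_inject_cmd(path, err_type, header, correctable=False):
    """Build one CXL error-injection command as a QMP dict.

    Command names and argument keys mirror the examples in this mail; they
    are tree-specific (assumption) and may differ in other QEMU versions.
    """
    name = ("cxl-inject-correctable-error" if correctable
            else "cxl-inject-uncorrectable-error")
    return {"execute": name,
            "arguments": {"path": path, "type": err_type,
                          "header": list(header)}}


def read_reply(rfile):
    """Return the next non-event QMP message from a line-oriented reader."""
    for line in rfile:
        if not line.strip():
            continue  # tolerate blank lines
        msg = json.loads(line)
        if "event" in msg:
            continue  # asynchronous event, not a reply to our command
        return msg
    raise ConnectionError("QMP connection closed")


def qmp_run(host, port, commands):
    """Negotiate capabilities, then send each command and collect replies.

    Returns one reply dict per entry in `commands`; each is either
    {"return": ...} or {"error": ...}.
    """
    replies = []
    with socket.create_connection((host, port)) as sock:
        # Read through a text wrapper; write directly to the socket so the
        # wrapper's read-ahead buffer is never discarded by a write.
        with sock.makefile("r", encoding="utf-8", newline="") as rfile:
            greeting = read_reply(rfile)
            if "QMP" not in greeting:
                raise RuntimeError(f"unexpected greeting: {greeting}")
            for cmd in [{"execute": "qmp_capabilities"}, *commands]:
                sock.sendall((json.dumps(cmd) + "\n").encode("utf-8"))
                replies.append(read_reply(rfile))
    return replies[1:]  # drop the qmp_capabilities reply
```

For example, qmp_run("localhost", 444, [build_inject_cmd("/machine/peripheral/cxl-pmem0", "cache-address-parity", [3, 4])]) mirrors the first injection above. Reads and writes are kept separate on purpose: a read/write text wrapper over a socket drops any buffered, not-yet-consumed input when you write to it.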



> 
> 
> > In the meantime, an example dump (not writing the header log yet!)
> >
> > pcieport 0000:0c:00.0: AER: Uncorrected (Non-Fatal) error received: 0000:0f:00.0
> > cxl_pci 0000:0f:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
> > cxl_pci 0000:0f:00.0:   device [8086:0d93] error status/mask=00004000/00000000
> > cxl_pci 0000:0f:00.0:    [14] CmpltTO                (First)
> > cxl_ras_uc: mem3: status: 'Cache Data Parity Error' first_error: 'Cache Data Parity Error' header log: {0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0}
> > cxl_pci 0000:0f:00.0: mem3: restart CXL.mem after slot reset
> > cxl_port endpoint6: No CMA mailbox
> > cxl_pci 0000:0f:00.0: mem3: error resume successful
> > pcieport 0000:0e:00.0: AER: device recovery successful
> >
> > Jonathan  
> 
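The AER lines in that dump follow a fixed pattern: the driver prints the Uncorrectable Error Status and Mask registers as hex, then one line per bit that is set in status and not masked. A minimal Python sketch of that decoding follows, assuming the bit positions of the PCIe AER Uncorrectable Error Status register and the short names the Linux AER driver prints (only a subset is listed). AER_UNCOR_NAMES and decode_uncor() are hypothetical names for illustration. Applied to the dumped pair 00004000/00000000, only bit 14 is set and unmasked, which matches the "[14] CmpltTO" line.

```python
# Subset of PCIe AER Uncorrectable Error Status register bits, keyed by bit
# position, using the short names the Linux AER driver prints (assumption:
# subset only, other bits fall back to 'bitN').
AER_UNCOR_NAMES = {
    4: "DLP",            # Data Link Protocol Error
    5: "SDES",           # Surprise Down Error
    12: "TLP",           # Poisoned TLP
    13: "FCP",           # Flow Control Protocol Error
    14: "CmpltTO",       # Completion Timeout
    15: "CmpltAbrt",     # Completer Abort
    16: "UnxCmplt",      # Unexpected Completion
    17: "RxOF",          # Receiver Overflow
    18: "MalfTLP",       # Malformed TLP
    19: "ECRC",          # ECRC Error
    20: "UnsupReq",      # Unsupported Request
    21: "ACSViol",       # ACS Violation
    22: "UncorrIntErr",  # Uncorrectable Internal Error
}


def decode_uncor(status_mask):
    """Decode a 'status/mask' pair such as '00004000/00000000'.

    Returns the names of bits that are set in status and not masked,
    lowest bit first; unknown bits are reported as 'bitN'.
    """
    status_hex, mask_hex = status_mask.split("/")
    active = int(status_hex, 16) & ~int(mask_hex, 16)
    return [AER_UNCOR_NAMES.get(bit, f"bit{bit}")
            for bit in range(32) if (active >> bit) & 1]


print(decode_uncor("00004000/00000000"))  # ['CmpltTO']
```

Masking is applied before decoding because a masked bit can still latch in the status register without being reported as an error.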

