From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
To: Dave Jiang <dave.jiang@intel.com>
Cc: <linux-cxl@vger.kernel.org>, <alison.schofield@intel.com>,
	<vishal.l.verma@intel.com>, <bwidawsk@kernel.org>,
	<dan.j.williams@intel.com>, <shiju.jose@huawei.com>,
	<rrichter@amd.com>
Subject: Re: [PATCH RFC v2 0/9] cxl/pci: Add fundamental error handling
Date: Thu, 3 Nov 2022 12:58:51 +0000
Message-ID: <20221103125851.00000ce9@huawei.com>
In-Reply-To: <20221024170102.00000c4b@huawei.com>

On Mon, 24 Oct 2022 17:01:02 +0100
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:

> On Wed, 19 Oct 2022 10:38:13 -0700
> Dave Jiang <dave.jiang@intel.com> wrote:
> 
> > On 10/19/2022 10:30 AM, Jonathan Cameron wrote:  
> > > On Tue, 11 Oct 2022 18:19:15 +0100
> > > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> > >    
> > >> On Tue, 11 Oct 2022 08:18:34 -0700
> > >> Dave Jiang <dave.jiang@intel.com> wrote:
> > >>    
> > >>> On 10/11/2022 7:17 AM, Jonathan Cameron wrote:    
> > >>>> On Fri, 16 Sep 2022 16:10:53 -0700
> > >>>> Dave Jiang <dave.jiang@intel.com> wrote:
> > >>>>         
> > >>>>> Series set to RFC since there's no means to test. Would like to get opinion
> > >>>>> on whether going with using trace events as reporting mechanism is ok.
> > >>>>>
> > >>>>> Jonathan,
> > >>>>> We currently don't have any ways to test AER events. Do you have any plans
> > >>>>> to support AER events via QEMU emulation?    
> > >>>> Sorry - missed this entirely as gotten a bit behind reading CXL emails.    
> > > Hi Dave,
> > >
> > > Quick update.
> > >
> > > Working QEMU emulation - but needs some/lots of cleanup. Particularly fun was
> > > figuring out why I wasn't getting messages past the upstream switch port.
> > > Turned out the serial number ECAP was on top of the AER ECAP. Oops - thankfully
> > > that patch isn't upstream yet.
> > > Also QEMU AER routing seems to be based on some older PCIe spec
> > > so needed some tweaks to get the device to actually issue ERR_FATAL etc.
> > >
> > > Anyhow, should have something you can play with in a day or two.    
> > 
> > Awesome! Thanks! :)  
> 
> Took a little longer than expected..
> 
> Anyhow, now at
> https://gitlab.com/jic23/qemu/-/commits/cxl-2022-10-24
> 
> That tree is carrying far too many things right now for it to make much sense
> to me to email this to qemu-devel - though I may pull
> hw/pci/aer: Add missing routing for AER errors
> out in advance as that's closing a difference between QEMU emulation of AER
> and what the PCI spec says.
> 
> Hopefully the set of out-of-tree patches will start to shrink soon - v9 of the DOE
> patches has been on list for a week or so.
> 
> Top patch includes a very short 'how to' in patch description.  Basically fire
> up QMP: Add something like -qmp tcp:localhost:444,server=on,wait=off to your
> qemu commandline and use commands like:
> 
> { "execute": "qmp_capabilities" }
> ...
> { "execute": "cxl-inject-uncorrectable-error",
>     "arguments": {
>         "path": "/machine/peripheral/cxl-pmem0",
>         "type": "cache-address-parity",
>         "header": [ 3, 4]
>     } }
> ...
> { "execute": "cxl-inject-correctable-error",
>     "arguments": {
>         "path": "/machine/peripheral/cxl-pmem0",
>         "type": "physical",
>         "header": [ 3, 4]
>     } }
> 
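
For anyone who would rather script the QMP steps quoted above than type them by
hand, here is a minimal Python sketch. The command names, argument keys and the
localhost:444 endpoint are taken from the examples above; the helper names
build_inject_command() and qmp_session() are hypothetical, and the session
helper assumes one reply per command with any asynchronous events skipped.

```python
import json
import socket


def build_inject_command(kind, path, error_type, header):
    """Build a QMP command dict shaped like the examples quoted above."""
    if kind not in ("correctable", "uncorrectable"):
        raise ValueError(f"unknown error kind: {kind}")
    return {
        "execute": f"cxl-inject-{kind}-error",
        "arguments": {"path": path, "type": error_type, "header": list(header)},
    }


def qmp_session(commands, host="localhost", port=444, timeout=5.0):
    """Send commands over a QMP TCP socket; return one reply per command."""
    replies = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        stream = sock.makefile("rw", encoding="utf-8")
        json.loads(stream.readline())  # consume the QMP greeting banner
        # qmp_capabilities must be negotiated before any other command.
        for cmd in [{"execute": "qmp_capabilities"}, *commands]:
            stream.write(json.dumps(cmd) + "\n")
            stream.flush()
            reply = json.loads(stream.readline())
            while "event" in reply:  # skip asynchronous event messages
                reply = json.loads(stream.readline())
            replies.append(reply)
    return replies[1:]  # drop the qmp_capabilities reply


# Usage, with a guest already running and -qmp enabled as described above:
#   cmd = build_inject_command("uncorrectable",
#                              "/machine/peripheral/cxl-pmem0",
#                              "cache-address-parity", [3, 4])
#   print(qmp_session([cmd]))
```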

So Dave reported that this wasn't working on x86 QEMU machines.

A fun bit of debugging later (I hate AML) and I think I have found the issue +
have a hack to work around it for now.

First, some background.
1) CXL code is based on QEMU's PCI expander bridge root bridge - there is a complex
   bit of handling to create appropriate ACPI DSDT magic.
2) The CXL root port is based on pcie_root_port.c
3) Both CXL root port and PCIe root port use traditional PCI interrupts, not MSI/MSI-X
   for their signaling.
4) Q35 machine uses an IOAPIC and the resulting PCI bus interrupt routing lands the
   actual interrupt on line 23 for my particular configuration
5) The ACPI table says it's on line 11.
6) x86 code for creating the PRT has an informative comment...
https://elixir.bootlin.com/qemu/latest/source/hw/i386/acpi-build.c#L697
 * The main goal is to equaly distribute the interrupts
 * over the 4 existing ACPI links (works only for i440fx).
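
To make the mismatch concrete, here is a rough Python model of the link
selection in build_prt(). It is a sketch under assumptions, not the QEMU code:
I am taking lnk_idx = (slot + pin) & 3 with pin running 0..127 and
slot = pin >> 2, ignoring the is_pci0_prt special case, and assuming q35 wires
GSIA..GSIH to IOAPIC inputs 16..23. The function names are made up for
illustration.

```python
# Link names in index order 0..3, as passed to initialize_route().
LNK_NAMES = ("LNKD", "LNKA", "LNKB", "LNKC")  # i440fx-style links
GSI_NAMES = ("GSIH", "GSIE", "GSIF", "GSIG")  # names used by the q35 hack

# Assumption: q35 GSIA..GSIH correspond to IOAPIC inputs 16..23.
GSI_LINE = {f"GSI{c}": 16 + i for i, c in enumerate("ABCDEFGH")}


def link_index(slot, intx):
    """Model of lnk_idx: pin counts 0..127 across all slots, slot = pin >> 2."""
    pin = (slot << 2) | intx
    return ((pin >> 2) + pin) & 3


def prt_target(slot, intx, hack=False):
    """Name of the link a given slot/INTx is routed to in the generated _PRT."""
    names = GSI_NAMES if hack else LNK_NAMES
    return names[link_index(slot, intx)]


if __name__ == "__main__":
    for intx in range(4):
        before = prt_target(0, intx)
        after = prt_target(0, intx, hack=True)
        print(f"slot 0 INT{'ABCD'[intx]}: {before} -> {after} "
              f"(IOAPIC line {GSI_LINE[after]})")
```

Under those assumptions slot 0 INTA moves from LNKD to GSIH, i.e. line 23,
which matches where the interrupt actually lands in my configuration.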

So the hack I'm running is below (note the UID thing is a separate bug that stops
iasl from disassembling the DSDT due to a duplicate entry - I'll send out a fix
for that shortly).

There are a bunch of possible approaches to fix this if my identification of
the issue is correct.

1) Clean equivalent of this hack that runs on appropriate machines only.
2) Use MSI instead. (ioh3420 root port takes this approach I think).

From 286c8f9b6d229d9e71f64657b6b3ccb70cb98306 Mon Sep 17 00:00:00 2001
From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Date: Thu, 3 Nov 2022 12:20:25 +0000
Subject: [PATCH] HACK: Fix-up interrupt routing for CXL on q35.

I need to do some more thinking to figure out correct approach
to solve this problem.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 hw/i386/acpi-build.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 4f54b61904..8055253e68 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -746,7 +746,7 @@ static Aml *build_prt(bool is_pci0_prt)
                       lnk_idx));
 
         /* route[2] = "LNK[D|A|B|C]", selection based on pin % 3  */
-        aml_append(while_ctx, initialize_route(route, "LNKD", lnk_idx, 0));
+        aml_append(while_ctx, initialize_route(route, "GSIH", lnk_idx, 0));
         if (is_pci0_prt) {
             Aml *if_device_1, *if_pin_4, *else_pin_4;
 
@@ -762,16 +762,16 @@ static Aml *build_prt(bool is_pci0_prt)
                 else_pin_4 = aml_else();
                 {
                     aml_append(else_pin_4,
-                        aml_store(build_prt_entry("LNKA"), route));
+                        aml_store(build_prt_entry("GSIE"), route));
                 }
                 aml_append(if_device_1, else_pin_4);
             }
             aml_append(while_ctx, if_device_1);
         } else {
-            aml_append(while_ctx, initialize_route(route, "LNKA", lnk_idx, 1));
+            aml_append(while_ctx, initialize_route(route, "GSIE", lnk_idx, 1));
         }
-        aml_append(while_ctx, initialize_route(route, "LNKB", lnk_idx, 2));
-        aml_append(while_ctx, initialize_route(route, "LNKC", lnk_idx, 3));
+        aml_append(while_ctx, initialize_route(route, "GSIF", lnk_idx, 2));
+        aml_append(while_ctx, initialize_route(route, "GSIG", lnk_idx, 3));
 
         /* route[0] = 0x[slot]FFFF */
         aml_append(while_ctx,
@@ -1627,7 +1627,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
                 aml_append(pkg, aml_eisaid("PNP0A03"));
                 aml_append(dev, aml_name_decl("_CID", pkg));
                 aml_append(dev, aml_name_decl("_ADR", aml_int(0)));
-                aml_append(dev, aml_name_decl("_UID", aml_int(bus_num)));
+//                aml_append(dev, aml_name_decl("_UID", aml_int(bus_num)));
                 build_cxl_osc_method(dev);
             } else if (pci_bus_is_express(bus)) {
                 aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A08")));
-- 
2.37.2
> 
> > 
> >   
> > > In meantime an example dump (not writing the header log yet!)
> > >
> > > pcieport 0000:0c:00.0: AER: Uncorrected (Non-Fatal) error received: 0000:0f:00.0
> > > cxl_pci 0000:0f:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
> > > cxl_pci 0000:0f:00.0:   device [8086:0d93] error status/mask=00004000/00000000
> > > cxl_pci 0000:0f:00.0:    [14] CmpltTO                (First)
> > > cxl_ras_uc: mem3: status: 'Cache Data Parity Error' first_error: 'Cache Data Parity Error' header log: {0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0}
> > > cxl_pci 0000:0f:00.0: mem3: restart CXL.mem after slot reset
> > > cxl_port endpoint6: No CMA mailbox
> > > cxl_pci 0000:0f:00.0: mem3: error resume successful
> > > pcieport 0000:0e:00.0: AER: device recovery successful
> > >
> > > Jonathan    
> >   
> 
> 
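
On the example dump quoted above: the "error status/mask=00004000/00000000"
line is the AER uncorrectable status and mask, and the "[14] CmpltTO" line is
just bit 14 of that status decoded. A small Python sketch of that decoding
follows. Only bit 14 is confirmed by the dump; the other short names are my
recollection of what the kernel log prints and should be treated as
assumptions, as should the helper name decode_uncor_status().

```python
# Partial table of AER uncorrectable error status bits.
# Assumption: short names follow the kernel's log strings; only bit 14
# ("CmpltTO") is confirmed by the dump quoted above.
UNCOR_BITS = {
    4: "DLP",
    12: "TLP",
    13: "FCP",
    14: "CmpltTO",
    15: "CmpltAbrt",
    16: "UnxCmplt",
    17: "RxOF",
    18: "MalfTLP",
    19: "ECRC",
    20: "UnsupReq",
}


def decode_uncor_status(status, mask=0):
    """Return (bit, name) pairs for every unmasked bit set in status."""
    active = status & ~mask & 0xFFFFFFFF
    decoded = []
    for bit in range(32):
        if active & (1 << bit):
            decoded.append((bit, UNCOR_BITS.get(bit, "unknown")))
    return decoded


if __name__ == "__main__":
    for bit, name in decode_uncor_status(0x00004000, 0x00000000):
        print(f"[{bit:2d}] {name}")
```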

