From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: Yuquan Wang <wangyuquan1236@phytium.com.cn>
Cc: <linux-cxl@vger.kernel.org>, <qemu-devel@nongnu.org>,
Robert Richter <rrichter@amd.com>,
Terry Bowman <Terry.Bowman@amd.com>, <dan.williams@intel.com>
Subject: Enabling internal errors for VH CXL devices: [was: Re: Questions about CXL RAS injection test in qemu]
Date: Wed, 6 Mar 2024 13:23:59 +0000
Message-ID: <20240306132359.00001956@Huawei.com>
In-Reply-To: <20240306112707.3116081-1-wangyuquan1236@phytium.com.cn>
On Wed, 6 Mar 2024 19:27:07 +0800
Yuquan Wang <wangyuquan1236@phytium.com.cn> wrote:
> Hello, Jonathan
>
> Recently I ran into some problems with CXL RAS tests.
>
> I tried to use the "cxl-inject-uncorrectable-errors" and "cxl-inject-correctable-error"
> QMP commands to inject CXL errors; however, the kernel in my QEMU machine printed
> nothing. The QMP connection was also unstable, and the machine kept
> "terminating on signal 2".
The QMP connection being unstable is odd. It might be related to the CXL code, but
I'm not sure how.
>
> In addition, I successfully used the hmp "pcie_aer_inject_error" in the same conditions.
> The kernel showed relevant print information.
IIRC the AER paths print under all circumstances, whereas CXL errors do not; they simply
trigger tracepoints. You should still have seen device resets, though.
However, I spun up a test and I think the issue is more straightforward.
The uncorrectable internal error and correctable internal error are masked on the device.
I thought we changed the default on this in Linux, but maybe not :(
A hack is to find the relevant device with lspci -tv and then use
setpci -s 0d:00.0 0x208.l=0
to clear all the mask bits for uncorrectable errors.
Note I tested this on a convenient arm64 setup, so it is always possible there is yet
another problem on x86.
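
For reference, the raw offset in the setpci hack can be derived rather than
hard-coded. A minimal Python sketch, assuming the AER extended capability is
what sits at 0x200 on this device (so that 0x208 is its Uncorrectable Error
Mask register, at capability offset 0x08 per the PCIe spec). The helper names
are made up for illustration, and the sysfs write needs root:

```python
import struct
from typing import Optional

PCI_EXT_CAP_START = 0x100    # extended capabilities start here in config space
PCI_EXT_CAP_ID_AER = 0x0001  # Advanced Error Reporting capability ID
AER_UNCOR_MASK = 0x08        # Uncorrectable Error Mask, relative to the AER cap


def find_ext_cap(config: bytes, cap_id: int) -> Optional[int]:
    """Walk the extended capability list; return the cap offset or None."""
    offset = PCI_EXT_CAP_START
    seen = set()  # guard against a looping list
    while offset and offset + 4 <= len(config) and offset not in seen:
        seen.add(offset)
        (header,) = struct.unpack_from("<I", config, offset)
        if header in (0, 0xFFFFFFFF):
            return None
        if header & 0xFFFF == cap_id:      # bits 15:0 hold the capability ID
            return offset
        offset = (header >> 20) & 0xFFC    # bits 31:20 hold the next offset
    return None


def uncor_mask_offset(config: bytes) -> Optional[int]:
    """Config-space offset of the AER Uncorrectable Error Mask, if present."""
    aer = find_ext_cap(config, PCI_EXT_CAP_ID_AER)
    return None if aer is None else aer + AER_UNCOR_MASK


def clear_uncor_mask(bdf: str) -> None:
    """Hypothetical helper: unmask all uncorrectable errors via sysfs."""
    path = f"/sys/bus/pci/devices/{bdf}/config"
    with open(path, "r+b", buffering=0) as f:
        # Non-root readers only get the first 256 bytes, so no AER is found.
        offset = uncor_mask_offset(f.read(4096))
        if offset is None:
            raise RuntimeError(f"{bdf}: no AER capability found")
        f.seek(offset)
        f.write(struct.pack("<I", 0))
```

clear_uncor_mask("0000:0d:00.0") would then be the equivalent of the setpci
line above, using the full domain:bus:dev.fn form that sysfs expects.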
Robert / Terry, I tracked down the patch where you enabled this for RCHs, and there was
some discussion on walking out on VH as well to enable this, but it seems it
never happened. Can you remember why? Was it just kicked back for a future occasion?
Jonathan
>
> Question:
> 1) Are my CXL RAS test operations standard?
> 2) The error injected by "pcie_aer_inject_error" is "protocol & link errors" of cxl.io?
> The error injected by "cxl-inject-uncorrectable-errors" or "cxl-inject-correctable-error" is "protocol & link errors" of cxl.cachemem?
>
> I hope I can get some help here; any help will be greatly appreciated.
>
>
> My qemu command line:
> qemu-system-x86_64 \
> -M q35,nvdimm=on,cxl=on \
> -m 4G \
> -smp 4 \
> -object memory-backend-ram,size=2G,id=mem0 \
> -numa node,nodeid=0,cpus=0-1,memdev=mem0 \
> -object memory-backend-ram,size=2G,id=mem1 \
> -numa node,nodeid=1,cpus=2-3,memdev=mem1 \
> -object memory-backend-ram,size=256M,id=cxl-mem0 \
> -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
> -device cxl-rp,port=0,bus=cxl.1,id=root_port0,chassis=0,slot=0 \
> -device cxl-type3,bus=root_port0,volatile-memdev=cxl-mem0,id=cxl-mem0 \
> -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=4k \
> -hda ../disk/ubuntu_x86_test_new.qcow2 \
> -nographic \
> -qmp tcp:127.0.0.1:4444,server,nowait \
>
> Qemu version: 8.2.50, the latest commit of branch cxl-2024-03-05 in "https://gitlab.com/jic23/qemu"
> Kernel version: 6.8.0-rc6
>
> My steps in the Qemu qmp:
> 1) telnet 127.0.0.1 4444
>
> result:
> Trying 127.0.0.1...
> Connected to 127.0.0.1.
> Escape character is '^]'.
> {"QMP": {"version": {"qemu": {"micro": 50, "minor": 2, "major": 8}, "package": "v6.2.0-19482-gccfb4fe221"}, "capabilities": ["oob"]}}
>
> 2) { "execute": "qmp_capabilities" }
>
> result:
> {"return": {}}
>
> 3) If inject correctable error:
> { "execute": "cxl-inject-correctable-error",
> "arguments": {
> "path": "/machine/peripheral/cxl-mem0",
> "type": "physical"
> } }
>
> result:
> {"return": {}}
>
> 3) If inject uncorrectable error:
> { "execute": "cxl-inject-uncorrectable-errors",
> "arguments": {
> "path": "/machine/peripheral/cxl-mem0",
> "errors": [
> {
> "type": "cache-address-parity",
> "header": [ 3, 4]
> },
> {
> "type": "cache-data-parity",
> "header": [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31]
> },
> {
> "type": "internal",
> "header": [ 1, 2, 4]
> }
> ]
> }}
>
> result:
> {"return": {}}
> {"timestamp": {"seconds": 1709721640, "microseconds": 275345}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-signal"}}
>
> Many thanks
> Yuquan
>
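
The telnet steps quoted above can also be scripted, which makes it easier to
see whether the QMP socket itself drops. A rough Python sketch, assuming the
same tcp:127.0.0.1:4444 QMP endpoint and device path as in the quoted command
line; the function names are illustrative only and nothing here is taken from
the QEMU tree:

```python
import json
import socket


def build_correctable(path: str, err_type: str) -> dict:
    """QMP command matching the quoted correctable-error injection."""
    return {"execute": "cxl-inject-correctable-error",
            "arguments": {"path": path, "type": err_type}}


def build_uncorrectable(path: str, errors) -> dict:
    """errors is an iterable of (type, header_words) pairs."""
    return {"execute": "cxl-inject-uncorrectable-errors",
            "arguments": {"path": path,
                          "errors": [{"type": t, "header": list(h)}
                                     for t, h in errors]}}


def run_qmp(commands, host: str = "127.0.0.1", port: int = 4444) -> list:
    """Negotiate capabilities, send each command, return the replies."""
    replies = []
    with socket.create_connection((host, port)) as sock:
        f = sock.makefile("rw", encoding="utf-8", newline="\n")
        json.loads(f.readline())  # greeting banner from QEMU
        for cmd in [{"execute": "qmp_capabilities"}, *commands]:
            f.write(json.dumps(cmd) + "\n")
            f.flush()
            while True:
                line = f.readline()
                if not line:
                    raise ConnectionError("QMP socket closed by QEMU")
                msg = json.loads(line)
                if "event" not in msg:  # skip asynchronous events
                    break
            replies.append(msg)
    return replies
```

For example, run_qmp([build_correctable("/machine/peripheral/cxl-mem0",
"physical")]) sends the same two commands as steps 2) and 3) in the quoted
session.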