From: Jonathan Cameron via <qemu-devel@nongnu.org>
To: Markus Armbruster <armbru@redhat.com>
Cc: qemu-devel@nongnu.org, "Michael Tsirkin" <mst@redhat.com>,
	"Ben Widawsky" <bwidawsk@kernel.org>, linux-cxl@vger.kernel.org,
	linuxarm@huawei.com, "Ira Weiny" <ira.weiny@intel.com>,
	"Gregory Price" <gourry.memverge@gmail.com>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"Mike Maslenkin" <mike.maslenkin@gmail.com>,
	"Dave Jiang" <dave.jiang@intel.com>
Subject: Re: [PATCH v5 8/8] hw/mem/cxl_type3: Add CXL RAS Error Injection Support.
Date: Tue, 31 Oct 2023 17:55:22 +0000
Message-ID: <20231031175522.00006073@Huawei.com>
In-Reply-To: <87cyx04qcw.fsf@pond.sub.org>

On Fri, 27 Oct 2023 06:54:39 +0200
Markus Armbruster <armbru@redhat.com> wrote:

> I'm trying to fill in QMP documentation holes, and found one in commit
> 415442a1b4a (this patch). Details inline.
>
> Jonathan Cameron <Jonathan.Cameron@huawei.com> writes:
>
> > CXL uses PCI AER Internal errors to signal to the host that an error has
> > occurred. The host can then read more detailed status from the CXL RAS
> > capability.
> >
> > For uncorrectable errors: support multiple injection in one operation
> > as this is needed to reliably test multiple header logging support in an
> > OS. The equivalent feature doesn't exist for correctable errors, so only
> > one error need be injected at a time.
> >
> > Note:
> > - Header content needs to be manually specified in a fashion that
> >   matches the specification for what can be in the header for each
> >   error type.
> >
> > Injection via QMP:
> > { "execute": "qmp_capabilities" }
> > ...
> > { "execute": "cxl-inject-uncorrectable-errors",
> >   "arguments": {
> >     "path": "/machine/peripheral/cxl-pmem0",
> >     "errors": [
> >         {
> >             "type": "cache-address-parity",
> >             "header": [ 3, 4]
> >         },
> >         {
> >             "type": "cache-data-parity",
> >             "header": [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31]
> >         },
> >         {
> >             "type": "internal",
> >             "header": [ 1, 2, 4]
> >         }
> >         ]
> >   }}
> > ...
> > { "execute": "cxl-inject-correctable-error",
> >     "arguments": {
> >         "path": "/machine/peripheral/cxl-pmem0",
> >         "type": "physical"
> >     } }
> >
> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>
> [...]
>
> > diff --git a/qapi/cxl.json b/qapi/cxl.json
> > new file mode 100644
> > index 0000000000..ac7e167fa2
> > --- /dev/null
> > +++ b/qapi/cxl.json
> > @@ -0,0 +1,118 @@
> > +# -*- Mode: Python -*-
> > +# vim: filetype=python
> > +
> > +##
> > +# = CXL devices
> > +##
> > +
> > +##
> > +# @CxlUncorErrorType:
> > +#
> > +# Type of uncorrectable CXL error to inject. These errors are reported via
> > +# an AER uncorrectable internal error with additional information logged at
> > +# the CXL device.
> > +#
> > +# @cache-data-parity: Data error such as data parity or data ECC error CXL.cache
> > +# @cache-address-parity: Address parity or other errors associated with the
> > +#                        address field on CXL.cache
> > +# @cache-be-parity: Byte enable parity or other byte enable errors on CXL.cache
> > +# @cache-data-ecc: ECC error on CXL.cache
> > +# @mem-data-parity: Data error such as data parity or data ECC error on CXL.mem
> > +# @mem-address-parity: Address parity or other errors associated with the
> > +#                      address field on CXL.mem
> > +# @mem-be-parity: Byte enable parity or other byte enable errors on CXL.mem.
> > +# @mem-data-ecc: Data ECC error on CXL.mem.
> > +# @reinit-threshold: REINIT threshold hit.
> > +# @rsvd-encoding: Received unrecognized encoding.
> > +# @poison-received: Received poison from the peer.
> > +# @receiver-overflow: Buffer overflows (first 3 bits of header log indicate which)
> > +# @internal: Component specific error
> > +# @cxl-ide-tx: Integrity and data encryption tx error.
> > +# @cxl-ide-rx: Integrity and data encryption rx error.
> > +##
> > +
> > +{ 'enum': 'CxlUncorErrorType',
> > +  'data': ['cache-data-parity',
> > +           'cache-address-parity',
> > +           'cache-be-parity',
> > +           'cache-data-ecc',
> > +           'mem-data-parity',
> > +           'mem-address-parity',
> > +           'mem-be-parity',
> > +           'mem-data-ecc',
> > +           'reinit-threshold',
> > +           'rsvd-encoding',
> > +           'poison-received',
> > +           'receiver-overflow',
> > +           'internal',
> > +           'cxl-ide-tx',
> > +           'cxl-ide-rx'
> > +           ]
> > + }
> > +
> > +##
> > +# @CXLUncorErrorRecord:
> > +#
> > +# Record of a single error including header log.
> > +#
> > +# @type: Type of error
> > +# @header: 16 DWORD of header.
> > +##
> > +{ 'struct': 'CXLUncorErrorRecord',
> > +  'data': {
> > +      'type': 'CxlUncorErrorType',
> > +      'header': [ 'uint32' ]
> > +  }
> > +}
> > +
> > +##
> > +# @cxl-inject-uncorrectable-errors:
> > +#
> > +# Command to allow injection of multiple errors in one go. This allows testing
> > +# of multiple header log handling in the OS.
> > +#
> > +# @path: CXL Type 3 device canonical QOM path
> > +# @errors: Errors to inject
> > +##
> > +{ 'command': 'cxl-inject-uncorrectable-errors',
> > +  'data': { 'path': 'str',
> > +             'errors': [ 'CXLUncorErrorRecord' ] }}
> > +
> > +##
> > +# @CxlCorErrorType:
> > +#
> > +# Type of CXL correctable error to inject
> > +#
> > +# @cache-data-ecc: Data ECC error on CXL.cache
> > +# @mem-data-ecc: Data ECC error on CXL.mem
>
> Missing:
>
>    # @retry-threshold: ...
>
> I need suitable description text. Can you help me?

Spec says:
"Retry Threshold Hit. (NUM_RETRY>=MAX_NUM_RETRY). See Section 4.2.8.5.1 for the
definitions of NUM_RETRY and MAX_NUM_RETRY."

Following the reference:
"NUM_RETRY: This counter is used to count the number of RETRY.Req requests
sent to retry the same flit. The counter remains enabled during the whole
retry sequence (state is not RETRY_LOCAL_NORMAL). It is reset to 0 at
initialization. It is also reset to 0 when a RETRY.Ack sequence is received
with the Empty bit set or whenever the LRSM state is RETRY_LOCAL_NORMAL and
an error-free retryable flit is received. The counter is incremented whenever
the LRSM state changes from RETRY_LLRREQ to RETRY_LOCAL_IDLE. If the counter
reaches a threshold (called MAX_NUM_RETRY), then the local retry state machine
transitions to the RETRY_PHY_REINIT. The NUM_RETRY counter is also reset when
the Physical layer exits from LTSSM recovery state (the LRSM transition
through RETRY_PHY_REINIT to RETRY_LLRREQ)."

So based on my failure to understand much of that beyond it has something to do
with low level retries, maybe just

"Number of times the retry threshold was hit."

Thanks for tidying this up!

?

>
> > +# @crc-threshold: Component specific and applicable to 68 byte Flit mode only.
> > +# @cache-poison-received: Received poison from a peer on CXL.cache.
> > +# @mem-poison-received: Received poison from a peer on CXL.mem
> > +# @physical: Received error indication from the physical layer.
> > +##
> > +{ 'enum': 'CxlCorErrorType',
> > +  'data': ['cache-data-ecc',
> > +           'mem-data-ecc',
> > +           'crc-threshold',
> > +           'retry-threshold',
> > +           'cache-poison-received',
> > +           'mem-poison-received',
> > +           'physical']
> > +}
> > +
> > +##
> > +# @cxl-inject-correctable-error:
> > +#
> > +# Command to inject a single correctable error. Multiple error injection
> > +# of this error type is not interesting as there is no associated header log.
> > +# These errors are reported via AER as a correctable internal error, with
> > +# additional detail available from the CXL device.
> > +#
> > +# @path: CXL Type 3 device canonical QOM path
> > +# @type: Type of error.
> > +##
> > +{ 'command': 'cxl-inject-correctable-error',
> > +  'data': { 'path': 'str',
> > +            'type': 'CxlCorErrorType'
> > +  }
> > +}
>
> [...]
>
>
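For anyone wanting to script the injection commands quoted above, here is a
minimal sketch in Python that only builds the QMP messages as dictionaries.
It does not talk to QEMU. The helper names (uncorrectable_cmd,
correctable_cmd) are made up for illustration; the command names, argument
names and enum values are copied from the quoted qapi/cxl.json. The only
checks are that the type name is in the quoted enum and that each header
entry fits in a uint32. No limit on header length is enforced here, since
the quoted example passes 32 entries while the doc comment says 16 DWORD.

```python
import json

# Enum values copied from the quoted CxlUncorErrorType definition.
UNCOR_ERROR_TYPES = {
    "cache-data-parity", "cache-address-parity", "cache-be-parity",
    "cache-data-ecc", "mem-data-parity", "mem-address-parity",
    "mem-be-parity", "mem-data-ecc", "reinit-threshold", "rsvd-encoding",
    "poison-received", "receiver-overflow", "internal",
    "cxl-ide-tx", "cxl-ide-rx",
}

# Enum values copied from the quoted CxlCorErrorType definition.
COR_ERROR_TYPES = {
    "cache-data-ecc", "mem-data-ecc", "crc-threshold", "retry-threshold",
    "cache-poison-received", "mem-poison-received", "physical",
}


def uncorrectable_cmd(path, errors):
    """Build a cxl-inject-uncorrectable-errors message (hypothetical helper).

    errors is a list of (type_name, header_dwords) pairs.
    """
    records = []
    for err_type, header in errors:
        if err_type not in UNCOR_ERROR_TYPES:
            raise ValueError("unknown uncorrectable error type: " + err_type)
        for dword in header:
            # The schema declares header as a list of uint32.
            if not 0 <= dword <= 0xFFFFFFFF:
                raise ValueError("header entry does not fit in uint32")
        records.append({"type": err_type, "header": list(header)})
    return {
        "execute": "cxl-inject-uncorrectable-errors",
        "arguments": {"path": path, "errors": records},
    }


def correctable_cmd(path, err_type):
    """Build a cxl-inject-correctable-error message (hypothetical helper)."""
    if err_type not in COR_ERROR_TYPES:
        raise ValueError("unknown correctable error type: " + err_type)
    return {
        "execute": "cxl-inject-correctable-error",
        "arguments": {"path": path, "type": err_type},
    }


if __name__ == "__main__":
    dev = "/machine/peripheral/cxl-pmem0"  # device path from the example above
    print(json.dumps(uncorrectable_cmd(dev, [
        ("cache-address-parity", [3, 4]),
        ("internal", [1, 2, 4]),
    ])))
    print(json.dumps(correctable_cmd(dev, "physical")))
```

Each printed line is one JSON message that could be sent over a QMP
connection after the usual qmp_capabilities handshake; how the connection
is set up is left out here.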