From: Markus Armbruster <armbru@redhat.com>
To: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Cc: qemu-devel@nongnu.org, "Michael Tsirkin" <mst@redhat.com>,
"Ben Widawsky" <bwidawsk@kernel.org>,
linux-cxl@vger.kernel.org, linuxarm@huawei.com,
"Ira Weiny" <ira.weiny@intel.com>,
"Gregory Price" <gourry.memverge@gmail.com>,
"Philippe Mathieu-Daudé" <philmd@linaro.org>,
"Mike Maslenkin" <mike.maslenkin@gmail.com>,
"Dave Jiang" <dave.jiang@intel.com>
Subject: Re: [PATCH v5 8/8] hw/mem/cxl_type3: Add CXL RAS Error Injection Support.
Date: Thu, 02 Nov 2023 07:47:14 +0100
Message-ID: <87wmv0d53h.fsf@pond.sub.org>
In-Reply-To: <20231031175522.00006073@Huawei.com> (Jonathan Cameron's message of "Tue, 31 Oct 2023 17:55:22 +0000")
Jonathan Cameron <Jonathan.Cameron@Huawei.com> writes:
> On Fri, 27 Oct 2023 06:54:39 +0200
> Markus Armbruster <armbru@redhat.com> wrote:
>
>> I'm trying to fill in QMP documentation holes, and found one in commit
>> 415442a1b4a (this patch). Details inline.
>>
>> Jonathan Cameron <Jonathan.Cameron@huawei.com> writes:
>>
>> > CXL uses PCI AER Internal errors to signal to the host that an error has
>> > occurred. The host can then read more detailed status from the CXL RAS
>> > capability.
>> >
>> > For uncorrectable errors: support multiple injection in one operation
>> > as this is needed to reliably test multiple header logging support in an
>> > OS. The equivalent feature doesn't exist for correctable errors, so only
>> > one error need be injected at a time.
>> >
>> > Note:
>> > - Header content needs to be manually specified in a fashion that
>> > matches the specification for what can be in the header for each
>> > error type.
>> >
>> > Injection via QMP:
>> > { "execute": "qmp_capabilities" }
>> > ...
>> > { "execute": "cxl-inject-uncorrectable-errors",
>> > "arguments": {
>> > "path": "/machine/peripheral/cxl-pmem0",
>> > "errors": [
>> > {
>> > "type": "cache-address-parity",
>> > "header": [ 3, 4]
>> > },
>> > {
>> > "type": "cache-data-parity",
>> > "header": [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31]
>> > },
>> > {
>> > "type": "internal",
>> > "header": [ 1, 2, 4]
>> > }
>> > ]
>> > }}
>> > ...
>> > { "execute": "cxl-inject-correctable-error",
>> > "arguments": {
>> > "path": "/machine/peripheral/cxl-pmem0",
>> > "type": "physical"
>> > } }
>> >
>> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>>
>> [...]
>>
>> > diff --git a/qapi/cxl.json b/qapi/cxl.json
>> > new file mode 100644
>> > index 0000000000..ac7e167fa2
>> > --- /dev/null
>> > +++ b/qapi/cxl.json
>> > @@ -0,0 +1,118 @@
>> > +# -*- Mode: Python -*-
>> > +# vim: filetype=python
>> > +
>> > +##
>> > +# = CXL devices
>> > +##
>> > +
>> > +##
>> > +# @CxlUncorErrorType:
>> > +#
>> > +# Type of uncorrectable CXL error to inject. These errors are reported via
>> > +# an AER uncorrectable internal error with additional information logged at
>> > +# the CXL device.
>> > +#
>> > +# @cache-data-parity: Data error such as data parity or data ECC error CXL.cache
>> > +# @cache-address-parity: Address parity or other errors associated with the
>> > +# address field on CXL.cache
>> > +# @cache-be-parity: Byte enable parity or other byte enable errors on CXL.cache
>> > +# @cache-data-ecc: ECC error on CXL.cache
>> > +# @mem-data-parity: Data error such as data parity or data ECC error on CXL.mem
>> > +# @mem-address-parity: Address parity or other errors associated with the
>> > +# address field on CXL.mem
>> > +# @mem-be-parity: Byte enable parity or other byte enable errors on CXL.mem.
>> > +# @mem-data-ecc: Data ECC error on CXL.mem.
>> > +# @reinit-threshold: REINIT threshold hit.
>> > +# @rsvd-encoding: Received unrecognized encoding.
>> > +# @poison-received: Received poison from the peer.
>> > +# @receiver-overflow: Buffer overflows (first 3 bits of header log indicate which)
>> > +# @internal: Component specific error
>> > +# @cxl-ide-tx: Integrity and data encryption tx error.
>> > +# @cxl-ide-rx: Integrity and data encryption rx error.
>> > +##
>> > +
>> > +{ 'enum': 'CxlUncorErrorType',
>> > + 'data': ['cache-data-parity',
>> > + 'cache-address-parity',
>> > + 'cache-be-parity',
>> > + 'cache-data-ecc',
>> > + 'mem-data-parity',
>> > + 'mem-address-parity',
>> > + 'mem-be-parity',
>> > + 'mem-data-ecc',
>> > + 'reinit-threshold',
>> > + 'rsvd-encoding',
>> > + 'poison-received',
>> > + 'receiver-overflow',
>> > + 'internal',
>> > + 'cxl-ide-tx',
>> > + 'cxl-ide-rx'
>> > + ]
>> > + }
>> > +
>> > +##
>> > +# @CXLUncorErrorRecord:
>> > +#
>> > +# Record of a single error including header log.
>> > +#
>> > +# @type: Type of error
>> > +# @header: 16 DWORD of header.
>> > +##
>> > +{ 'struct': 'CXLUncorErrorRecord',
>> > + 'data': {
>> > + 'type': 'CxlUncorErrorType',
>> > + 'header': [ 'uint32' ]
>> > + }
>> > +}
>> > +
>> > +##
>> > +# @cxl-inject-uncorrectable-errors:
>> > +#
>> > +# Command to allow injection of multiple errors in one go. This allows testing
>> > +# of multiple header log handling in the OS.
>> > +#
>> > +# @path: CXL Type 3 device canonical QOM path
>> > +# @errors: Errors to inject
>> > +##
>> > +{ 'command': 'cxl-inject-uncorrectable-errors',
>> > + 'data': { 'path': 'str',
>> > + 'errors': [ 'CXLUncorErrorRecord' ] }}
>> > +
>> > +##
>> > +# @CxlCorErrorType:
>> > +#
>> > +# Type of CXL correctable error to inject
>> > +#
>> > +# @cache-data-ecc: Data ECC error on CXL.cache
>> > +# @mem-data-ecc: Data ECC error on CXL.mem
>>
>> Missing:
>>
>> # @retry-threshold: ...
>>
>> I need suitable description text. Can you help me?
>
> Spec says:
> "Retry Threshold Hit. (NUM_RETRY>=MAX_NUM_RETRY).
> See Section 4.2.8.5.1 for the definitions of NUM_RETRY and MAX_NUM_RETRY."
>
> Following the reference:
> "NUM_RETRY: This counter is used to count the number of RETRY.Req requests
> sent to retry the same flit. The counter remains enabled during the whole retry
> sequence (state is not RETRY_LOCAL_NORMAL). It is reset to 0 at initialization. It is
> also reset to 0 when a RETRY.Ack sequence is received with the Empty bit set or
> whenever the LRSM state is RETRY_LOCAL_NORMAL and an error-free retryable flit
> is received. The counter is incremented whenever the LRSM state changes from
> RETRY_LLRREQ to RETRY_LOCAL_IDLE. If the counter reaches a threshold (called
> MAX_NUM_RETRY), then the local retry state machine transitions to the
> RETRY_PHY_REINIT. The NUM_RETRY counter is also reset when the Physical layer
> exits from LTSSM recovery state (the LRSM transition through RETRY_PHY_REINIT
> to RETRY_LLRREQ)."
>
> So based on my failure to understand much of that beyond it has something
> to do with low level retries, maybe just
>
> "Number of times the retry threshold was hit."

Sold! Thanks for your help.

> Thanks for tidying this up!

You're welcome!

I intend to post the patch as part of a series filling in documentation
holes all over the place. Will take some time, I'm afraid.
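
For illustration only, the NUM_RETRY behaviour in the spec text quoted
above could be modelled roughly as follows. This is a toy sketch, not
QEMU code: the class and method names are hypothetical, and only the
state names and the reset/increment rules are taken from the quoted
text. Forcing the state to RETRY_PHY_REINIT inside set_state() is a
simplification.

```python
class RetryCounter:
    """Toy model of the NUM_RETRY counter (hypothetical names)."""

    def __init__(self, max_num_retry):
        self.max_num_retry = max_num_retry  # MAX_NUM_RETRY threshold
        self.num_retry = 0                  # reset to 0 at initialization
        self.state = "RETRY_LOCAL_NORMAL"

    def set_state(self, new_state):
        """Move the LRSM to new_state; return True if the threshold was hit."""
        hit = False
        # Incremented whenever the LRSM goes RETRY_LLRREQ -> RETRY_LOCAL_IDLE.
        if self.state == "RETRY_LLRREQ" and new_state == "RETRY_LOCAL_IDLE":
            self.num_retry += 1
            if self.num_retry >= self.max_num_retry:
                # Threshold reached: go to RETRY_PHY_REINIT instead.
                new_state = "RETRY_PHY_REINIT"
                hit = True
        self.state = new_state
        return hit

    def retry_ack(self, empty):
        # Reset when a RETRY.Ack sequence arrives with the Empty bit set.
        if empty:
            self.num_retry = 0

    def good_retryable_flit(self):
        # Reset when an error-free retryable flit is received while the
        # LRSM state is RETRY_LOCAL_NORMAL.
        if self.state == "RETRY_LOCAL_NORMAL":
            self.num_retry = 0

    def phy_recovery_exit(self):
        # Reset when the Physical layer exits LTSSM recovery; the LRSM
        # passes through RETRY_PHY_REINIT to RETRY_LLRREQ.
        self.num_retry = 0
        self.state = "RETRY_LLRREQ"
```

With max_num_retry=2, two RETRY_LLRREQ -> RETRY_LOCAL_IDLE transitions
in a row make set_state() report the threshold hit, which is the
condition the retry-threshold error type refers to.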
[...]
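
As a rough sketch of how a test script might assemble the
cxl-inject-uncorrectable-errors command from the schema quoted above:
the helper names below are hypothetical, the error type list is copied
from the quoted CxlUncorErrorType enum, and the only header check is
that each entry fits in a uint32 as the schema requires. Header length
is deliberately not checked. Sending the result over a QMP socket is
left out.

```python
import json

# Values copied from the CxlUncorErrorType enum in the quoted patch.
UNCOR_ERROR_TYPES = {
    "cache-data-parity", "cache-address-parity", "cache-be-parity",
    "cache-data-ecc", "mem-data-parity", "mem-address-parity",
    "mem-be-parity", "mem-data-ecc", "reinit-threshold",
    "rsvd-encoding", "poison-received", "receiver-overflow",
    "internal", "cxl-ide-tx", "cxl-ide-rx",
}


def uncor_error(err_type, header):
    """Build one CXLUncorErrorRecord as a plain dict (hypothetical helper)."""
    if err_type not in UNCOR_ERROR_TYPES:
        raise ValueError(f"unknown error type: {err_type}")
    for dword in header:
        # The schema declares 'header' as a list of uint32.
        if not 0 <= dword <= 0xFFFFFFFF:
            raise ValueError(f"header entry out of uint32 range: {dword}")
    return {"type": err_type, "header": list(header)}


def inject_uncorrectable(path, errors):
    """Build the QMP command dict for several errors (hypothetical helper)."""
    return {
        "execute": "cxl-inject-uncorrectable-errors",
        "arguments": {"path": path, "errors": list(errors)},
    }


cmd = inject_uncorrectable(
    "/machine/peripheral/cxl-pmem0",
    [
        uncor_error("cache-address-parity", [3, 4]),
        uncor_error("internal", [1, 2, 4]),
    ],
)
print(json.dumps(cmd, indent=2))
```

The device path and the two records mirror the example in the quoted
commit message.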