From: Markus Armbruster <armbru@redhat.com>
To: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Cc: qemu-devel@nongnu.org, "Michael Tsirkin" <mst@redhat.com>,
	"Ben Widawsky" <bwidawsk@kernel.org>,
	linux-cxl@vger.kernel.org, linuxarm@huawei.com,
	"Ira Weiny" <ira.weiny@intel.com>,
	"Gregory Price" <gourry.memverge@gmail.com>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"Mike Maslenkin" <mike.maslenkin@gmail.com>,
	"Dave Jiang" <dave.jiang@intel.com>
Subject: Re: [PATCH v5 8/8] hw/mem/cxl_type3: Add CXL RAS Error Injection Support.
Date: Thu, 02 Nov 2023 07:47:14 +0100
Message-ID: <87wmv0d53h.fsf@pond.sub.org>
In-Reply-To: <20231031175522.00006073@Huawei.com> (Jonathan Cameron's message of "Tue, 31 Oct 2023 17:55:22 +0000")

Jonathan Cameron <Jonathan.Cameron@Huawei.com> writes:

> On Fri, 27 Oct 2023 06:54:39 +0200
> Markus Armbruster <armbru@redhat.com> wrote:
>
>> I'm trying to fill in QMP documentation holes, and found one in commit
>> 415442a1b4a (this patch).  Details inline.
>> 
>> Jonathan Cameron <Jonathan.Cameron@huawei.com> writes:
>> 
>> > CXL uses PCI AER Internal errors to signal to the host that an error has
>> > occurred. The host can then read more detailed status from the CXL RAS
>> > capability.
>> >
>> > For uncorrectable errors: support multiple injection in one operation
>> > as this is needed to reliably test multiple header logging support in an
>> > OS. The equivalent feature doesn't exist for correctable errors, so only
>> > one error need be injected at a time.
>> >
>> > Note:
>> >  - Header content needs to be manually specified in a fashion that
>> >    matches the specification for what can be in the header for each
>> >    error type.
>> >
>> > Injection via QMP:
>> > { "execute": "qmp_capabilities" }
>> > ...
>> > { "execute": "cxl-inject-uncorrectable-errors",
>> >   "arguments": {
>> >     "path": "/machine/peripheral/cxl-pmem0",
>> >     "errors": [
>> >         {
>> >             "type": "cache-address-parity",
>> >             "header": [ 3, 4]
>> >         },
>> >         {
>> >             "type": "cache-data-parity",
>> >             "header": [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31]
>> >         },
>> >         {
>> >             "type": "internal",
>> >             "header": [ 1, 2, 4]
>> >         }
>> >         ]
>> >   }}
>> > ...
>> > { "execute": "cxl-inject-correctable-error",
>> >     "arguments": {
>> >         "path": "/machine/peripheral/cxl-pmem0",
>> >         "type": "physical"
>> >     } }
>> >
>> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>  
>> 
>> [...]
>> 
>> > diff --git a/qapi/cxl.json b/qapi/cxl.json
>> > new file mode 100644
>> > index 0000000000..ac7e167fa2
>> > --- /dev/null
>> > +++ b/qapi/cxl.json
>> > @@ -0,0 +1,118 @@
>> > +# -*- Mode: Python -*-
>> > +# vim: filetype=python
>> > +
>> > +##
>> > +# = CXL devices
>> > +##
>> > +
>> > +##
>> > +# @CxlUncorErrorType:
>> > +#
>> > +# Type of uncorrectable CXL error to inject. These errors are reported via
>> > +# an AER uncorrectable internal error with additional information logged at
>> > +# the CXL device.
>> > +#
>> > +# @cache-data-parity: Data error such as data parity or data ECC error CXL.cache
>> > +# @cache-address-parity: Address parity or other errors associated with the
>> > +#                        address field on CXL.cache
>> > +# @cache-be-parity: Byte enable parity or other byte enable errors on CXL.cache
>> > +# @cache-data-ecc: ECC error on CXL.cache
>> > +# @mem-data-parity: Data error such as data parity or data ECC error on CXL.mem
>> > +# @mem-address-parity: Address parity or other errors associated with the
>> > +#                      address field on CXL.mem
>> > +# @mem-be-parity: Byte enable parity or other byte enable errors on CXL.mem.
>> > +# @mem-data-ecc: Data ECC error on CXL.mem.
>> > +# @reinit-threshold: REINIT threshold hit.
>> > +# @rsvd-encoding: Received unrecognized encoding.
>> > +# @poison-received: Received poison from the peer.
>> > +# @receiver-overflow: Buffer overflows (first 3 bits of header log indicate which)
>> > +# @internal: Component specific error
>> > +# @cxl-ide-tx: Integrity and data encryption tx error.
>> > +# @cxl-ide-rx: Integrity and data encryption rx error.
>> > +##
>> > +
>> > +{ 'enum': 'CxlUncorErrorType',
>> > +  'data': ['cache-data-parity',
>> > +           'cache-address-parity',
>> > +           'cache-be-parity',
>> > +           'cache-data-ecc',
>> > +           'mem-data-parity',
>> > +           'mem-address-parity',
>> > +           'mem-be-parity',
>> > +           'mem-data-ecc',
>> > +           'reinit-threshold',
>> > +           'rsvd-encoding',
>> > +           'poison-received',
>> > +           'receiver-overflow',
>> > +           'internal',
>> > +           'cxl-ide-tx',
>> > +           'cxl-ide-rx'
>> > +           ]
>> > + }
>> > +
>> > +##
>> > +# @CXLUncorErrorRecord:
>> > +#
>> > +# Record of a single error including header log.
>> > +#
>> > +# @type: Type of error
>> > +# @header: 16 DWORD of header.
>> > +##
>> > +{ 'struct': 'CXLUncorErrorRecord',
>> > +  'data': {
>> > +      'type': 'CxlUncorErrorType',
>> > +      'header': [ 'uint32' ]
>> > +  }
>> > +}
>> > +
>> > +##
>> > +# @cxl-inject-uncorrectable-errors:
>> > +#
>> > +# Command to allow injection of multiple errors in one go. This allows testing
>> > +# of multiple header log handling in the OS.
>> > +#
>> > +# @path: CXL Type 3 device canonical QOM path
>> > +# @errors: Errors to inject
>> > +##
>> > +{ 'command': 'cxl-inject-uncorrectable-errors',
>> > +  'data': { 'path': 'str',
>> > +             'errors': [ 'CXLUncorErrorRecord' ] }}
>> > +
>> > +##
>> > +# @CxlCorErrorType:
>> > +#
>> > +# Type of CXL correctable error to inject
>> > +#
>> > +# @cache-data-ecc: Data ECC error on CXL.cache
>> > +# @mem-data-ecc: Data ECC error on CXL.mem  
>> 
>> Missing:
>> 
>>    # @retry-threshold: ...
>> 
>> I need suitable description text.  Can you help me?
>
> Spec says:
> "Retry Threshold Hit. (NUM_RETRY>=MAX_NUM_RETRY).
> See Section 4.2.8.5.1 for the definitions of NUM_RETRY and MAX_NUM_RETRY."
>
> Following the reference:
> "NUM_RETRY: This counter is used to count the number of RETRY.Req requests
> sent to retry the same flit. The counter remains enabled during the whole retry
> sequence (state is not RETRY_LOCAL_NORMAL). It is reset to 0 at initialization. It is
> also reset to 0 when a RETRY.Ack sequence is received with the Empty bit set or
> whenever the LRSM state is RETRY_LOCAL_NORMAL and an error-free retryable flit
> is received. The counter is incremented whenever the LRSM state changes from
> RETRY_LLRREQ to RETRY_LOCAL_IDLE. If the counter reaches a threshold (called
> MAX_NUM_RETRY), then the local retry state machine transitions to the
> RETRY_PHY_REINIT. The NUM_RETRY counter is also reset when the Physical layer
> exits from LTSSM recovery state (the LRSM transition through RETRY_PHY_REINIT
> to RETRY_LLRREQ)."
>
> So based on my failure to understand much of that beyond it has something
> to do with low level retries, maybe just
>
> "Number of times the retry threshold was hit."

Sold!  Thanks for your help.

> Thanks for tidying this up!

You're welcome!

I intend to post the patch as part of a series filling in documentation
holes all over the place.  Will take some time, I'm afraid.
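
For illustration only, here is a minimal Python sketch of what a
"documentation hole" means here: an enum member with no matching
"# @name:" line in the doc comment.  The helper name
undocumented_members() is hypothetical and is not part of QEMU's QAPI
tooling, and the member list is deliberately partial: it holds only the
three CxlCorErrorType members named in this thread.

```python
import re


def undocumented_members(doc_comment: str, members: list[str]) -> list[str]:
    """Return the members that have no '# @name:' line in doc_comment."""
    documented = set(
        re.findall(r"^#\s*@([\w-]+):", doc_comment, flags=re.MULTILINE)
    )
    return [m for m in members if m not in documented]


# Doc comment as quoted above, before the fix.
doc = """\
##
# @CxlCorErrorType:
#
# Type of CXL correctable error to inject
#
# @cache-data-ecc: Data ECC error on CXL.cache
# @mem-data-ecc: Data ECC error on CXL.mem
##
"""

# Partial list (assumption): only the members mentioned in this thread.
members = ["cache-data-ecc", "mem-data-ecc", "retry-threshold"]

missing = undocumented_members(doc, members)
print(missing)

# Adding the description agreed on above closes the hole.
fixed = doc.replace(
    "# @mem-data-ecc: Data ECC error on CXL.mem\n",
    "# @mem-data-ecc: Data ECC error on CXL.mem\n"
    "# @retry-threshold: Number of times the retry threshold was hit.\n",
)
print(undocumented_members(fixed, members))
```

The first call reports retry-threshold as undocumented; the second
reports nothing once the agreed description line is present.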

[...]
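
As a companion to the QMP examples quoted from the commit message, the
following Python sketch builds the same two command objects
programmatically.  The helper names are hypothetical; only the command
names, argument names, QOM path and error types are taken from the
quoted text.  Sending the JSON over a QMP socket (after
"qmp_capabilities") is left out, so this only shows how the payloads are
shaped.

```python
import json


def uncorrectable_errors_command(path, errors):
    """Build a cxl-inject-uncorrectable-errors command.

    errors is a list of (type, header) pairs; header is a list of
    integers, passed through unchanged.
    """
    return {
        "execute": "cxl-inject-uncorrectable-errors",
        "arguments": {
            "path": path,
            "errors": [{"type": t, "header": list(h)} for t, h in errors],
        },
    }


def correctable_error_command(path, error_type):
    """Build a cxl-inject-correctable-error command."""
    return {
        "execute": "cxl-inject-correctable-error",
        "arguments": {"path": path, "type": error_type},
    }


device = "/machine/peripheral/cxl-pmem0"

uncor = uncorrectable_errors_command(
    device,
    [("cache-address-parity", [3, 4]), ("internal", [1, 2, 4])],
)
cor = correctable_error_command(device, "physical")

# Serialized form, one JSON object per command.
uncor_wire = json.dumps(uncor)
cor_wire = json.dumps(cor)
print(uncor_wire)
print(cor_wire)
```

Several records can be passed in one uncorrectable command, while the
correctable command carries a single type, matching the commit message's
note that only one correctable error need be injected at a time.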

