From: Nathan Lynch <nathan.lynch@amd.com>
To: Jonathan Cameron <jonathan.cameron@huawei.com>,
"Nathan Lynch via B4 Relay"
<devnull+nathan.lynch.amd.com@kernel.org>
Cc: Vinod Koul <vkoul@kernel.org>, Wei Huang <wei.huang2@amd.com>,
"Mario Limonciello" <mario.limonciello@amd.com>,
Bjorn Helgaas <bhelgaas@google.com>, <linux-pci@vger.kernel.org>,
<linux-kernel@vger.kernel.org>, <dmaengine@vger.kernel.org>
Subject: Re: [PATCH RFC 06/13] dmaengine: sdxi: Add error reporting support
Date: Mon, 15 Sep 2025 15:42:37 -0500
Message-ID: <87frcna1mq.fsf@AUSNATLYNCH.amd.com>
In-Reply-To: <20250915131151.00005f26@huawei.com>

Jonathan Cameron <jonathan.cameron@huawei.com> writes:
> On Fri, 05 Sep 2025 13:48:29 -0500
> Nathan Lynch via B4 Relay <devnull+nathan.lynch.amd.com@kernel.org> wrote:
>
>> From: Nathan Lynch <nathan.lynch@amd.com>
>>
>> SDXI implementations provide software with detailed information about
>> error conditions using a per-device ring buffer in system memory. When
>> an error condition is signaled via interrupt, the driver retrieves any
>> pending error log entries and reports them to the kernel log.
>>
>> Co-developed-by: Wei Huang <wei.huang2@amd.com>
>> Signed-off-by: Wei Huang <wei.huang2@amd.com>
>> Signed-off-by: Nathan Lynch <nathan.lynch@amd.com>
> Hi,
> A few more comments inline. Kind of similar stuff around
> having both register definitions for unpacking and the structure
> definitions in patch 2.
>
> Thanks,
>
> Jonathan
>> ---
>> drivers/dma/sdxi/error.c | 340 +++++++++++++++++++++++++++++++++++++++++++++++
>> drivers/dma/sdxi/error.h | 16 +++
>> 2 files changed, 356 insertions(+)
>>
>> diff --git a/drivers/dma/sdxi/error.c b/drivers/dma/sdxi/error.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..c5e33f5989250352f6b081a3049b3b1f972c85a6
>> --- /dev/null
>> +++ b/drivers/dma/sdxi/error.c
>
>> +/* The "unpacked" counterpart to ERRLOG_HD_ENT. */
>> +struct errlog_entry {
>> + u64 dsc_index;
>> + u16 cxt_num;
>> + u16 err_class;
>> + u16 type;
>> + u8 step;
>> + u8 buf;
>> + u8 sub_step;
>> + u8 re;
>> + bool vl;
>> + bool cv;
>> + bool div;
>> + bool bv;
>> +};
>> +
>> +#define ERRLOG_ENTRY_FIELD(hi_, lo_, name_) \
>> + PACKED_FIELD(hi_, lo_, struct errlog_entry, name_)
>> +#define ERRLOG_ENTRY_FLAG(nr_, name_) \
>> + ERRLOG_ENTRY_FIELD(nr_, nr_, name_)
>> +
>> +/* Refer to "Error Log Header Entry (ERRLOG_HD_ENT)" */
>> +static const struct packed_field_u16 errlog_hd_ent_fields[] = {
>> + ERRLOG_ENTRY_FLAG(0, vl),
>> + ERRLOG_ENTRY_FIELD(13, 8, step),
>> + ERRLOG_ENTRY_FIELD(26, 16, type),
>> + ERRLOG_ENTRY_FLAG(32, cv),
>> + ERRLOG_ENTRY_FLAG(33, div),
>> + ERRLOG_ENTRY_FLAG(34, bv),
>> + ERRLOG_ENTRY_FIELD(38, 36, buf),
>> + ERRLOG_ENTRY_FIELD(43, 40, sub_step),
>> + ERRLOG_ENTRY_FIELD(46, 44, re),
>> + ERRLOG_ENTRY_FIELD(63, 48, cxt_num),
>> + ERRLOG_ENTRY_FIELD(127, 64, dsc_index),
>> + ERRLOG_ENTRY_FIELD(367, 352, err_class),
>
> The association between the fields here and struct sdxi_err_log_hd_ent
> to me should be via some defines in patch 2 for the various fields
> embedded in misc0 etc.
>
>> +};
>
>> +static void sdxi_print_err(struct sdxi_dev *sdxi, u64 err_rd)
>> +{
>> + struct errlog_entry ent;
>> + size_t index;
>> +
>> + index = err_rd % ERROR_LOG_ENTRIES;
>> +
>> + unpack_fields(&sdxi->err_log[index], sizeof(sdxi->err_log[0]),
>> + &ent, errlog_hd_ent_fields, SDXI_PACKING_QUIRKS);
>> +
>> + if (!ent.vl) {
>> + dev_err_ratelimited(sdxi_to_dev(sdxi),
>> + "Ignoring error log entry with vl=0\n");
>> + return;
>> + }
>> +
>> + if (ent.type != OP_TYPE_ERRLOG) {
>> + dev_err_ratelimited(sdxi_to_dev(sdxi),
>> + "Ignoring error log entry with type=%#x\n",
>> + ent.type);
>> + return;
>> + }
>> +
>> + sdxi_err(sdxi, "error log entry[%zu], MMIO_ERR_RD=%#llx:\n",
>> + index, err_rd);
>> + sdxi_err(sdxi, " re: %#x (%s)\n", ent.re, reaction_str(ent.re));
>> + sdxi_err(sdxi, " step: %#x (%s)\n", ent.step, step_str(ent.step));
>> + sdxi_err(sdxi, " sub_step: %#x (%s)\n",
>> + ent.sub_step, sub_step_str(ent.sub_step));
>> + sdxi_err(sdxi, " cv: %u div: %u bv: %u\n", ent.cv, ent.div, ent.bv);
>> + if (ent.bv)
>> + sdxi_err(sdxi, " buf: %u\n", ent.buf);
>> + if (ent.cv)
>> + sdxi_err(sdxi, " cxt_num: %#x\n", ent.cxt_num);
>> + if (ent.div)
>> + sdxi_err(sdxi, " dsc_index: %#llx\n", ent.dsc_index);
>> + sdxi_err(sdxi, " err_class: %#x\n", ent.err_class);
> Consider using tracepoints for error logging rather than large splats
> in the log.

Agreed, context-level errors (which will be user-triggerable once there
is an ABI exposed to user space) should not be dumped to the kernel log
by default.

Some function-level errors may be appropriate to print.

> I'd then just fill the tracepoint in directly rather than have an
> unpacking step.

Yes, I can do that.
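
For illustration only, here is a minimal userspace sketch of reading the
fields straight out of the entry's 64-bit words, with no intermediate
unpacked struct. The bit positions are the ones listed in
errlog_hd_ent_fields[] above. Everything else is an assumption made for
the example: MASK64() and FIELD64() are local stand-ins for the kernel's
GENMASK_ULL() and FIELD_GET(), the ERRLOG_W0_*/ERRLOG_W5_* names and the
errlog_*() accessors are hypothetical, and the words are assumed to have
already been converted to CPU byte order.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for GENMASK_ULL() and FIELD_GET(); not the kernel macros. */
#define MASK64(hi, lo)     ((~0ULL >> (63 - (hi))) & (~0ULL << (lo)))
#define FIELD64(mask, val) (((val) & (mask)) / ((mask) & -(mask)))

/* Hypothetical field masks, bit positions relative to each 64-bit word. */
#define ERRLOG_W0_VL        MASK64(0, 0)
#define ERRLOG_W0_STEP      MASK64(13, 8)
#define ERRLOG_W0_TYPE      MASK64(26, 16)
#define ERRLOG_W0_CV        MASK64(32, 32)
#define ERRLOG_W0_DIV       MASK64(33, 33)
#define ERRLOG_W0_BV        MASK64(34, 34)
#define ERRLOG_W0_BUF       MASK64(38, 36)
#define ERRLOG_W0_SUB_STEP  MASK64(43, 40)
#define ERRLOG_W0_RE        MASK64(46, 44)
#define ERRLOG_W0_CXT_NUM   MASK64(63, 48)
/* Bits 367:352 of the entry are bits 47:32 of word 5 (5 * 64 = 320). */
#define ERRLOG_W5_ERR_CLASS MASK64(47, 32)

/*
 * Accessors take a pointer to at least six 64-bit words of one entry,
 * assumed to be in CPU byte order already.
 */
static inline bool errlog_vl(const uint64_t *w)
{
	return FIELD64(ERRLOG_W0_VL, w[0]) != 0;
}

static inline bool errlog_cv(const uint64_t *w)
{
	return FIELD64(ERRLOG_W0_CV, w[0]) != 0;
}

static inline bool errlog_div(const uint64_t *w)
{
	return FIELD64(ERRLOG_W0_DIV, w[0]) != 0;
}

static inline bool errlog_bv(const uint64_t *w)
{
	return FIELD64(ERRLOG_W0_BV, w[0]) != 0;
}

static inline uint8_t errlog_step(const uint64_t *w)
{
	return (uint8_t)FIELD64(ERRLOG_W0_STEP, w[0]);
}

static inline uint16_t errlog_type(const uint64_t *w)
{
	return (uint16_t)FIELD64(ERRLOG_W0_TYPE, w[0]);
}

static inline uint8_t errlog_buf(const uint64_t *w)
{
	return (uint8_t)FIELD64(ERRLOG_W0_BUF, w[0]);
}

static inline uint8_t errlog_sub_step(const uint64_t *w)
{
	return (uint8_t)FIELD64(ERRLOG_W0_SUB_STEP, w[0]);
}

static inline uint8_t errlog_re(const uint64_t *w)
{
	return (uint8_t)FIELD64(ERRLOG_W0_RE, w[0]);
}

static inline uint16_t errlog_cxt_num(const uint64_t *w)
{
	return (uint16_t)FIELD64(ERRLOG_W0_CXT_NUM, w[0]);
}

/* Bits 127:64 of the entry are the whole of word 1. */
static inline uint64_t errlog_dsc_index(const uint64_t *w)
{
	return w[1];
}

static inline uint16_t errlog_err_class(const uint64_t *w)
{
	return (uint16_t)FIELD64(ERRLOG_W5_ERR_CLASS, w[5]);
}
```

In the driver, a tracepoint's assignment block could call accessors of
this kind directly, presumably using GENMASK_ULL(), FIELD_GET() and
le64_to_cpu() in place of the stand-ins.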
>
>> +}
>
>> +/* Refer to "Error Log Initialization" */
>> +int sdxi_error_init(struct sdxi_dev *sdxi)
>> +{
>> + u64 reg;
>> + int err;
>> +
>> + /* 1. Clear MMIO_ERR_CFG. Error interrupts are inhibited until step 6. */
>> + sdxi_write64(sdxi, SDXI_MMIO_ERR_CFG, 0);
>> +
>> + /* 2. Clear MMIO_ERR_STS. The flags in this register are RW1C. */
>> + reg = FIELD_PREP(SDXI_MMIO_ERR_STS_STS_BIT, 1) |
>> + FIELD_PREP(SDXI_MMIO_ERR_STS_OVF_BIT, 1) |
>> + FIELD_PREP(SDXI_MMIO_ERR_STS_ERR_BIT, 1);
>> + sdxi_write64(sdxi, SDXI_MMIO_ERR_STS, reg);
>> +
>> + /* 3. Allocate memory for the error log ring buffer, initialize to zero. */
>> + sdxi->err_log = dma_alloc_coherent(sdxi_to_dev(sdxi), ERROR_LOG_SZ,
>> + &sdxi->err_log_dma, GFP_KERNEL);
>> + if (!sdxi->err_log)
>> + return -ENOMEM;
>> +
>> + /*
>> + * 4. Set MMIO_ERR_CTL.intr_en to 1 if interrupts on
>> + * context-level errors are desired.
>> + */
>> + reg = sdxi_read64(sdxi, SDXI_MMIO_ERR_CTL);
>> + FIELD_MODIFY(SDXI_MMIO_ERR_CTL_EN, &reg, 1);
>> + sdxi_write64(sdxi, SDXI_MMIO_ERR_CTL, reg);
>> +
>> + /*
>> + * The spec is not explicit about when to do this, but this
>> + * seems like the right time: enable interrupt on
>> + * function-level transition to error state.
>> + */
>> + reg = sdxi_read64(sdxi, SDXI_MMIO_CTL0);
>> + FIELD_MODIFY(SDXI_MMIO_CTL0_FN_ERR_INTR_EN, &reg, 1);
>> + sdxi_write64(sdxi, SDXI_MMIO_CTL0, reg);
>> +
>> + /* 5. Clear MMIO_ERR_WRT and MMIO_ERR_RD. */
>> + sdxi_write64(sdxi, SDXI_MMIO_ERR_WRT, 0);
>> + sdxi_write64(sdxi, SDXI_MMIO_ERR_RD, 0);
>> +
>> + /*
>> + * Error interrupts can be generated once MMIO_ERR_CFG.en is
>> + * set in step 6, so set up the handler now.
>> + */
>> + err = request_threaded_irq(sdxi->error_irq, NULL, sdxi_irq_thread,
>> + IRQF_TRIGGER_NONE, "SDXI error", sdxi);
>> + if (err)
>> + goto free_errlog;
>> +
>> + /* 6. Program MMIO_ERR_CFG. */
>
> I'm guessing these are numbered steps in some bit of the spec?
> If not, some of these comments like this one provide no value. We can
> see what is being written from the code! Perhaps add a very specific
> spec reference if you want to show why the numbering is here.

Perhaps it's understated, but at the beginning of this function:

  /* Refer to "Error Log Initialization" */
  int sdxi_error_init(struct sdxi_dev *sdxi)

The numbered steps in the function correspond to the numbered steps in
that part of the spec.

I could make the comment something like:

  /*
   * The numbered steps below correspond to the sequence outlined in 3.4.2
   * "Error Log Initialization".
   */

though I'm unsure how stable the section numbering in the SDXI spec will
be over time.