From: Nathan Lynch <nathan.lynch@amd.com>
To: Jonathan Cameron <jonathan.cameron@huawei.com>,
	"Nathan Lynch via B4 Relay"
	<devnull+nathan.lynch.amd.com@kernel.org>
Cc: Vinod Koul <vkoul@kernel.org>, Wei Huang <wei.huang2@amd.com>,
	"Mario Limonciello" <mario.limonciello@amd.com>,
	Bjorn Helgaas <bhelgaas@google.com>, <linux-pci@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, <dmaengine@vger.kernel.org>
Subject: Re: [PATCH RFC 06/13] dmaengine: sdxi: Add error reporting support
Date: Mon, 15 Sep 2025 15:42:37 -0500
Message-ID: <87frcna1mq.fsf@AUSNATLYNCH.amd.com>
In-Reply-To: <20250915131151.00005f26@huawei.com>

Jonathan Cameron <jonathan.cameron@huawei.com> writes:

> On Fri, 05 Sep 2025 13:48:29 -0500
> Nathan Lynch via B4 Relay <devnull+nathan.lynch.amd.com@kernel.org> wrote:
>
>> From: Nathan Lynch <nathan.lynch@amd.com>
>> 
>> SDXI implementations provide software with detailed information about
>> error conditions using a per-device ring buffer in system memory. When
>> an error condition is signaled via interrupt, the driver retrieves any
>> pending error log entries and reports them to the kernel log.
>> 
>> Co-developed-by: Wei Huang <wei.huang2@amd.com>
>> Signed-off-by: Wei Huang <wei.huang2@amd.com>
>> Signed-off-by: Nathan Lynch <nathan.lynch@amd.com>
> Hi,
> A few more comments inline, mostly the same concern as before about
> having both register definitions for unpacking and the structure
> definitions in patch 2.
>
> Thanks,
>
> Jonathan
>> ---
>>  drivers/dma/sdxi/error.c | 340 +++++++++++++++++++++++++++++++++++++++++++++++
>>  drivers/dma/sdxi/error.h |  16 +++
>>  2 files changed, 356 insertions(+)
>> 
>> diff --git a/drivers/dma/sdxi/error.c b/drivers/dma/sdxi/error.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..c5e33f5989250352f6b081a3049b3b1f972c85a6
>> --- /dev/null
>> +++ b/drivers/dma/sdxi/error.c
>
>> +/* The "unpacked" counterpart to ERRLOG_HD_ENT. */
>> +struct errlog_entry {
>> +	u64 dsc_index;
>> +	u16 cxt_num;
>> +	u16 err_class;
>> +	u16 type;
>> +	u8 step;
>> +	u8 buf;
>> +	u8 sub_step;
>> +	u8 re;
>> +	bool vl;
>> +	bool cv;
>> +	bool div;
>> +	bool bv;
>> +};
>> +
>> +#define ERRLOG_ENTRY_FIELD(hi_, lo_, name_)				\
>> +	PACKED_FIELD(hi_, lo_, struct errlog_entry, name_)
>> +#define ERRLOG_ENTRY_FLAG(nr_, name_) \
>> +	ERRLOG_ENTRY_FIELD(nr_, nr_, name_)
>> +
>> +/* Refer to "Error Log Header Entry (ERRLOG_HD_ENT)" */
>> +static const struct packed_field_u16 errlog_hd_ent_fields[] = {
>> +	ERRLOG_ENTRY_FLAG(0, vl),
>> +	ERRLOG_ENTRY_FIELD(13, 8, step),
>> +	ERRLOG_ENTRY_FIELD(26, 16, type),
>> +	ERRLOG_ENTRY_FLAG(32, cv),
>> +	ERRLOG_ENTRY_FLAG(33, div),
>> +	ERRLOG_ENTRY_FLAG(34, bv),
>> +	ERRLOG_ENTRY_FIELD(38, 36, buf),
>> +	ERRLOG_ENTRY_FIELD(43, 40, sub_step),
>> +	ERRLOG_ENTRY_FIELD(46, 44, re),
>> +	ERRLOG_ENTRY_FIELD(63, 48, cxt_num),
>> +	ERRLOG_ENTRY_FIELD(127, 64, dsc_index),
>> +	ERRLOG_ENTRY_FIELD(367, 352, err_class),
>
> The association between the fields here and struct sdxi_err_log_hd_ent
> to me should be via some defines in patch 2 for the various fields
> embedded in misc0 etc.
>
>> +};
>
>> +static void sdxi_print_err(struct sdxi_dev *sdxi, u64 err_rd)
>> +{
>> +	struct errlog_entry ent;
>> +	size_t index;
>> +
>> +	index = err_rd % ERROR_LOG_ENTRIES;
>> +
>> +	unpack_fields(&sdxi->err_log[index], sizeof(sdxi->err_log[0]),
>> +		      &ent, errlog_hd_ent_fields, SDXI_PACKING_QUIRKS);
>> +
>> +	if (!ent.vl) {
>> +		dev_err_ratelimited(sdxi_to_dev(sdxi),
>> +				    "Ignoring error log entry with vl=0\n");
>> +		return;
>> +	}
>> +
>> +	if (ent.type != OP_TYPE_ERRLOG) {
>> +		dev_err_ratelimited(sdxi_to_dev(sdxi),
>> +				    "Ignoring error log entry with type=%#x\n",
>> +				    ent.type);
>> +		return;
>> +	}
>> +
>> +	sdxi_err(sdxi, "error log entry[%zu], MMIO_ERR_RD=%#llx:\n",
>> +		 index, err_rd);
>> +	sdxi_err(sdxi, "  re: %#x (%s)\n", ent.re, reaction_str(ent.re));
>> +	sdxi_err(sdxi, "  step: %#x (%s)\n", ent.step, step_str(ent.step));
>> +	sdxi_err(sdxi, "  sub_step: %#x (%s)\n",
>> +		 ent.sub_step, sub_step_str(ent.sub_step));
>> +	sdxi_err(sdxi, "  cv: %u div: %u bv: %u\n", ent.cv, ent.div, ent.bv);
>> +	if (ent.bv)
>> +		sdxi_err(sdxi, "  buf: %u\n", ent.buf);
>> +	if (ent.cv)
>> +		sdxi_err(sdxi, "  cxt_num: %#x\n", ent.cxt_num);
>> +	if (ent.div)
>> +		sdxi_err(sdxi, "  dsc_index: %#llx\n", ent.dsc_index);
>> +	sdxi_err(sdxi, "  err_class: %#x\n", ent.err_class);
> Consider using tracepoints for error logging rather than large splats
> in the log.

Agreed, context-level errors (which will be user-triggerable once there
is an ABI exposed to user space) should not be dumped to the kernel log
by default.

Some function-level errors may be appropriate to print.

> I'd then just fill the tracepoint in directly rather than have an
> unpacking step.

Yes, I can do that.
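
For illustration only, here is a minimal userspace sketch of decoding the
first 64 bits of an entry with plain shift-and-mask accessors instead of a
field table. The bit positions are the ones listed in
errlog_hd_ent_fields[] above; BITS64(), GET64(), struct errlog_event and
errlog_decode_word0() are hypothetical names, not part of the posted
patch, and dsc_index (127:64) and err_class (367:352) are left out for
brevity. In the driver the destination would be the tracepoint's own
fields and the accessors would be GENMASK_ULL()/FIELD_GET().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers; kernel code would use GENMASK_ULL() and FIELD_GET(). */
#define BITS64(hi, lo)		((~0ULL >> (63 - (hi))) & (~0ULL << (lo)))
#define GET64(hi, lo, val)	(((val) & BITS64(hi, lo)) >> (lo))

/* Hypothetical stand-in for the fields a tracepoint would record. */
struct errlog_event {
	uint16_t cxt_num;
	uint16_t type;
	uint8_t step;
	uint8_t buf;
	uint8_t sub_step;
	uint8_t re;
	bool vl;
	bool cv;
	bool div;
	bool bv;
};

/*
 * Decode bits 63:0 of an error log entry. 'w0' is assumed to already be
 * in CPU byte order; a driver would convert from the device's byte
 * order before calling this.
 */
static void errlog_decode_word0(uint64_t w0, struct errlog_event *ev)
{
	ev->vl       = GET64(0, 0, w0) != 0;
	ev->step     = (uint8_t)GET64(13, 8, w0);
	ev->type     = (uint16_t)GET64(26, 16, w0);
	ev->cv       = GET64(32, 32, w0) != 0;
	ev->div      = GET64(33, 33, w0) != 0;
	ev->bv       = GET64(34, 34, w0) != 0;
	ev->buf      = (uint8_t)GET64(38, 36, w0);
	ev->sub_step = (uint8_t)GET64(43, 40, w0);
	ev->re       = (uint8_t)GET64(46, 44, w0);
	ev->cxt_num  = (uint16_t)GET64(63, 48, w0);
}
```

Keeping the masks as named defines next to the structure definition would
also give the explicit association between fields and structure members
that was asked for earlier in the review.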


>
>> +}
>
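
As an aside on the index computation in sdxi_print_err(): the sketch below
shows, in plain userspace C, how a consumer might walk the pending slots
of such a ring. It assumes the read and write indexes are free-running
counters reduced modulo the ring size only when addressing a slot, which
matches the err_rd % ERROR_LOG_ENTRIES expression above but is otherwise
an assumption about the hardware. The ring size of 64, errlog_drain(),
struct slot_trace and record_slot() are all made up for the example, and
overflow handling is omitted.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed ring size, for illustration only. */
#define ERROR_LOG_ENTRIES 64u

typedef void (*errlog_visit_fn)(size_t slot, void *ctx);

/*
 * Visit every pending slot between the read and write indexes and
 * return the updated read index, which the caller would write back
 * to the device.
 */
static uint64_t errlog_drain(uint64_t rd, uint64_t wrt,
			     errlog_visit_fn visit, void *ctx)
{
	while (rd != wrt) {
		visit(rd % ERROR_LOG_ENTRIES, ctx);
		rd++;
	}
	return rd;
}

/* Example visitor: remember up to eight visited slots, count all of them. */
struct slot_trace {
	size_t slots[8];
	size_t count;
};

static void record_slot(size_t slot, void *ctx)
{
	struct slot_trace *t = ctx;

	if (t->count < 8)
		t->slots[t->count] = slot;
	t->count++;
}
```

With rd = 62 and wrt = 66, the visitor sees slots 62, 63, 0 and 1, in that
order, and the returned read index is 66.
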
>> +/* Refer to "Error Log Initialization" */
>> +int sdxi_error_init(struct sdxi_dev *sdxi)
>> +{
>> +	u64 reg;
>> +	int err;
>> +
>> +	/* 1. Clear MMIO_ERR_CFG. Error interrupts are inhibited until step 6. */
>> +	sdxi_write64(sdxi, SDXI_MMIO_ERR_CFG, 0);
>> +
>> +	/* 2. Clear MMIO_ERR_STS. The flags in this register are RW1C. */
>> +	reg = FIELD_PREP(SDXI_MMIO_ERR_STS_STS_BIT, 1) |
>> +	      FIELD_PREP(SDXI_MMIO_ERR_STS_OVF_BIT, 1) |
>> +	      FIELD_PREP(SDXI_MMIO_ERR_STS_ERR_BIT, 1);
>> +	sdxi_write64(sdxi, SDXI_MMIO_ERR_STS, reg);
>> +
>> +	/* 3. Allocate memory for the error log ring buffer, initialize to zero. */
>> +	sdxi->err_log = dma_alloc_coherent(sdxi_to_dev(sdxi), ERROR_LOG_SZ,
>> +					   &sdxi->err_log_dma, GFP_KERNEL);
>> +	if (!sdxi->err_log)
>> +		return -ENOMEM;
>> +
>> +	/*
>> +	 * 4. Set MMIO_ERR_CTL.intr_en to 1 if interrupts on
>> +	 * context-level errors are desired.
>> +	 */
>> +	reg = sdxi_read64(sdxi, SDXI_MMIO_ERR_CTL);
>> +	FIELD_MODIFY(SDXI_MMIO_ERR_CTL_EN, &reg, 1);
>> +	sdxi_write64(sdxi, SDXI_MMIO_ERR_CTL, reg);
>> +
>> +	/*
>> +	 * The spec is not explicit about when to do this, but this
>> +	 * seems like the right time: enable interrupt on
>> +	 * function-level transition to error state.
>> +	 */
>> +	reg = sdxi_read64(sdxi, SDXI_MMIO_CTL0);
>> +	FIELD_MODIFY(SDXI_MMIO_CTL0_FN_ERR_INTR_EN, &reg, 1);
>> +	sdxi_write64(sdxi, SDXI_MMIO_CTL0, reg);
>> +
>> +	/* 5. Clear MMIO_ERR_WRT and MMIO_ERR_RD. */
>> +	sdxi_write64(sdxi, SDXI_MMIO_ERR_WRT, 0);
>> +	sdxi_write64(sdxi, SDXI_MMIO_ERR_RD, 0);
>> +
>> +	/*
>> +	 * Error interrupts can be generated once MMIO_ERR_CFG.en is
>> +	 * set in step 6, so set up the handler now.
>> +	 */
>> +	err = request_threaded_irq(sdxi->error_irq, NULL, sdxi_irq_thread,
>> +				   IRQF_TRIGGER_NONE, "SDXI error", sdxi);
>> +	if (err)
>> +		goto free_errlog;
>> +
>> +	/* 6. Program MMIO_ERR_CFG. */
>
> I'm guessing these are numbered steps in some bit of the spec?
> If not, comments like this one provide no value.  We can
> see what is being written from the code!  Perhaps add a very specific
> spec reference if you want to show why the numbering is here.

Perhaps it's understated, but there is a reference at the beginning of
this function:

  /* Refer to "Error Log Initialization" */
  int sdxi_error_init(struct sdxi_dev *sdxi)

The numbered steps in the function correspond to the numbered steps in
that part of the spec.

I could make the comment something like:

/*
 * The numbered steps below correspond to the sequence outlined in 3.4.2
 * "Error Log Initialization".
 */

though I'm unsure how stable the section numbering in the SDXI spec will
be over time.
