From: Nathan Lynch <nathan.lynch@amd.com>
To: Jonathan Cameron <jonathan.cameron@huawei.com>,
	"Nathan Lynch via B4 Relay"
	<devnull+nathan.lynch.amd.com@kernel.org>
Cc: Vinod Koul <vkoul@kernel.org>, Wei Huang <wei.huang2@amd.com>,
	"Mario Limonciello" <mario.limonciello@amd.com>,
	Bjorn Helgaas <bhelgaas@google.com>, <linux-pci@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, <dmaengine@vger.kernel.org>
Subject: Re: [PATCH RFC 06/13] dmaengine: sdxi: Add error reporting support
Date: Mon, 15 Sep 2025 15:42:37 -0500
Message-ID: <87frcna1mq.fsf@AUSNATLYNCH.amd.com>
In-Reply-To: <20250915131151.00005f26@huawei.com>

Jonathan Cameron <jonathan.cameron@huawei.com> writes:

> On Fri, 05 Sep 2025 13:48:29 -0500
> Nathan Lynch via B4 Relay <devnull+nathan.lynch.amd.com@kernel.org> wrote:
>
>> From: Nathan Lynch <nathan.lynch@amd.com>
>> 
>> SDXI implementations provide software with detailed information about
>> error conditions using a per-device ring buffer in system memory. When
>> an error condition is signaled via interrupt, the driver retrieves any
>> pending error log entries and reports them to the kernel log.
>> 
>> Co-developed-by: Wei Huang <wei.huang2@amd.com>
>> Signed-off-by: Wei Huang <wei.huang2@amd.com>
>> Signed-off-by: Nathan Lynch <nathan.lynch@amd.com>
> Hi,
> A few more comments inline. Kind of similar stuff around
> having both register definitions for unpacking and the structure
> definitions in patch 2.
>
> Thanks,
>
> Jonathan
>> ---
>>  drivers/dma/sdxi/error.c | 340 +++++++++++++++++++++++++++++++++++++++++++++++
>>  drivers/dma/sdxi/error.h |  16 +++
>>  2 files changed, 356 insertions(+)
>> 
>> diff --git a/drivers/dma/sdxi/error.c b/drivers/dma/sdxi/error.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..c5e33f5989250352f6b081a3049b3b1f972c85a6
>> --- /dev/null
>> +++ b/drivers/dma/sdxi/error.c
>
>> +/* The "unpacked" counterpart to ERRLOG_HD_ENT. */
>> +struct errlog_entry {
>> +	u64 dsc_index;
>> +	u16 cxt_num;
>> +	u16 err_class;
>> +	u16 type;
>> +	u8 step;
>> +	u8 buf;
>> +	u8 sub_step;
>> +	u8 re;
>> +	bool vl;
>> +	bool cv;
>> +	bool div;
>> +	bool bv;
>> +};
>> +
>> +#define ERRLOG_ENTRY_FIELD(hi_, lo_, name_)				\
>> +	PACKED_FIELD(hi_, lo_, struct errlog_entry, name_)
>> +#define ERRLOG_ENTRY_FLAG(nr_, name_) \
>> +	ERRLOG_ENTRY_FIELD(nr_, nr_, name_)
>> +
>> +/* Refer to "Error Log Header Entry (ERRLOG_HD_ENT)" */
>> +static const struct packed_field_u16 errlog_hd_ent_fields[] = {
>> +	ERRLOG_ENTRY_FLAG(0, vl),
>> +	ERRLOG_ENTRY_FIELD(13, 8, step),
>> +	ERRLOG_ENTRY_FIELD(26, 16, type),
>> +	ERRLOG_ENTRY_FLAG(32, cv),
>> +	ERRLOG_ENTRY_FLAG(33, div),
>> +	ERRLOG_ENTRY_FLAG(34, bv),
>> +	ERRLOG_ENTRY_FIELD(38, 36, buf),
>> +	ERRLOG_ENTRY_FIELD(43, 40, sub_step),
>> +	ERRLOG_ENTRY_FIELD(46, 44, re),
>> +	ERRLOG_ENTRY_FIELD(63, 48, cxt_num),
>> +	ERRLOG_ENTRY_FIELD(127, 64, dsc_index),
>> +	ERRLOG_ENTRY_FIELD(367, 352, err_class),
>
> The association between the fields here and struct sdxi_err_log_hd_ent
> to me should be via some defines in patch 2 for the various fields
> embedded in misc0 etc.
>
>> +};
>
>> +static void sdxi_print_err(struct sdxi_dev *sdxi, u64 err_rd)
>> +{
>> +	struct errlog_entry ent;
>> +	size_t index;
>> +
>> +	index = err_rd % ERROR_LOG_ENTRIES;
>> +
>> +	unpack_fields(&sdxi->err_log[index], sizeof(sdxi->err_log[0]),
>> +		      &ent, errlog_hd_ent_fields, SDXI_PACKING_QUIRKS);
>> +
>> +	if (!ent.vl) {
>> +		dev_err_ratelimited(sdxi_to_dev(sdxi),
>> +				    "Ignoring error log entry with vl=0\n");
>> +		return;
>> +	}
>> +
>> +	if (ent.type != OP_TYPE_ERRLOG) {
>> +		dev_err_ratelimited(sdxi_to_dev(sdxi),
>> +				    "Ignoring error log entry with type=%#x\n",
>> +				    ent.type);
>> +		return;
>> +	}
>> +
>> +	sdxi_err(sdxi, "error log entry[%zu], MMIO_ERR_RD=%#llx:\n",
>> +		 index, err_rd);
>> +	sdxi_err(sdxi, "  re: %#x (%s)\n", ent.re, reaction_str(ent.re));
>> +	sdxi_err(sdxi, "  step: %#x (%s)\n", ent.step, step_str(ent.step));
>> +	sdxi_err(sdxi, "  sub_step: %#x (%s)\n",
>> +		 ent.sub_step, sub_step_str(ent.sub_step));
>> +	sdxi_err(sdxi, "  cv: %u div: %u bv: %u\n", ent.cv, ent.div, ent.bv);
>> +	if (ent.bv)
>> +		sdxi_err(sdxi, "  buf: %u\n", ent.buf);
>> +	if (ent.cv)
>> +		sdxi_err(sdxi, "  cxt_num: %#x\n", ent.cxt_num);
>> +	if (ent.div)
>> +		sdxi_err(sdxi, "  dsc_index: %#llx\n", ent.dsc_index);
>> +	sdxi_err(sdxi, "  err_class: %#x\n", ent.err_class);
> Consider using tracepoints for error logging rather than large splats
> in the log.

Agreed, context-level errors (which will be user-triggerable once there
is an ABI exposed to user space) should not be dumped to the kernel log
by default.

Some function-level errors may be appropriate to print.

> I'd then just fill the tracepoint in directly rather than have an
> unpacking step.

Yes, I can do that.
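
For illustration only, a rough user-space sketch of reading fields
straight out of the first 64 bits of an error log entry, with no
intermediate struct errlog_entry. The bit positions are the ones listed
in errlog_hd_ent_fields quoted above. Everything else is an assumption
made for the example and not taken from the driver: the bits64() helper,
the HD_* macro names, hd_is_reportable(), and the premise that the entry
has already been loaded as a CPU-order 64-bit word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: return bits hi..lo (inclusive) of a 64-bit word. */
static inline uint64_t bits64(uint64_t w, unsigned int hi, unsigned int lo)
{
	uint64_t mask;

	assert(hi >= lo && hi < 64);
	/* Avoid an undefined 64-bit shift when the field spans the whole word. */
	mask = (hi - lo == 63) ? ~0ULL : ((1ULL << (hi - lo + 1)) - 1);
	return (w >> lo) & mask;
}

/*
 * Hypothetical accessors for bits 63:0 of the entry. The positions match
 * the quoted errlog_hd_ent_fields table. dsc_index (bits 127:64) would be
 * the entire second 64-bit word and needs no masking.
 */
#define HD_VL(w)	bits64((w), 0, 0)
#define HD_STEP(w)	bits64((w), 13, 8)
#define HD_TYPE(w)	bits64((w), 26, 16)
#define HD_CV(w)	bits64((w), 32, 32)
#define HD_DIV(w)	bits64((w), 33, 33)
#define HD_BV(w)	bits64((w), 34, 34)
#define HD_BUF(w)	bits64((w), 38, 36)
#define HD_SUB_STEP(w)	bits64((w), 43, 40)
#define HD_RE(w)	bits64((w), 46, 44)
#define HD_CXT_NUM(w)	bits64((w), 63, 48)

/*
 * Mirrors the two sanity checks in the quoted sdxi_print_err(): the entry
 * must be valid and carry the expected type. The expected type is passed
 * in because the value of OP_TYPE_ERRLOG is not shown in this thread.
 */
static inline bool hd_is_reportable(uint64_t w0, uint64_t expected_type)
{
	return HD_VL(w0) && HD_TYPE(w0) == expected_type;
}
```

With accessors along these lines, a tracepoint's assignment block could
take each value directly from the entry, which is the shape of the
suggestion above. In the kernel proper, GENMASK_ULL() and FIELD_GET()
from <linux/bits.h> and <linux/bitfield.h> would play the role of
bits64().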


>
>> +}
>
>> +/* Refer to "Error Log Initialization" */
>> +int sdxi_error_init(struct sdxi_dev *sdxi)
>> +{
>> +	u64 reg;
>> +	int err;
>> +
>> +	/* 1. Clear MMIO_ERR_CFG. Error interrupts are inhibited until step 6. */
>> +	sdxi_write64(sdxi, SDXI_MMIO_ERR_CFG, 0);
>> +
>> +	/* 2. Clear MMIO_ERR_STS. The flags in this register are RW1C. */
>> +	reg = FIELD_PREP(SDXI_MMIO_ERR_STS_STS_BIT, 1) |
>> +	      FIELD_PREP(SDXI_MMIO_ERR_STS_OVF_BIT, 1) |
>> +	      FIELD_PREP(SDXI_MMIO_ERR_STS_ERR_BIT, 1);
>> +	sdxi_write64(sdxi, SDXI_MMIO_ERR_STS, reg);
>> +
>> +	/* 3. Allocate memory for the error log ring buffer, initialize to zero. */
>> +	sdxi->err_log = dma_alloc_coherent(sdxi_to_dev(sdxi), ERROR_LOG_SZ,
>> +					   &sdxi->err_log_dma, GFP_KERNEL);
>> +	if (!sdxi->err_log)
>> +		return -ENOMEM;
>> +
>> +	/*
>> +	 * 4. Set MMIO_ERR_CTL.intr_en to 1 if interrupts on
>> +	 * context-level errors are desired.
>> +	 */
>> +	reg = sdxi_read64(sdxi, SDXI_MMIO_ERR_CTL);
>> +	FIELD_MODIFY(SDXI_MMIO_ERR_CTL_EN, &reg, 1);
>> +	sdxi_write64(sdxi, SDXI_MMIO_ERR_CTL, reg);
>> +
>> +	/*
>> +	 * The spec is not explicit about when to do this, but this
>> +	 * seems like the right time: enable interrupt on
>> +	 * function-level transition to error state.
>> +	 */
>> +	reg = sdxi_read64(sdxi, SDXI_MMIO_CTL0);
>> +	FIELD_MODIFY(SDXI_MMIO_CTL0_FN_ERR_INTR_EN, &reg, 1);
>> +	sdxi_write64(sdxi, SDXI_MMIO_CTL0, reg);
>> +
>> +	/* 5. Clear MMIO_ERR_WRT and MMIO_ERR_RD. */
>> +	sdxi_write64(sdxi, SDXI_MMIO_ERR_WRT, 0);
>> +	sdxi_write64(sdxi, SDXI_MMIO_ERR_RD, 0);
>> +
>> +	/*
>> +	 * Error interrupts can be generated once MMIO_ERR_CFG.en is
>> +	 * set in step 6, so set up the handler now.
>> +	 */
>> +	err = request_threaded_irq(sdxi->error_irq, NULL, sdxi_irq_thread,
>> +				   IRQF_TRIGGER_NONE, "SDXI error", sdxi);
>> +	if (err)
>> +		goto free_errlog;
>> +
>> +	/* 6. Program MMIO_ERR_CFG. */
>
> I'm guessing these are numbered steps in some bit of the spec?
> If not, some of these comments, like this one, provide no value.  We can
> see what is being written from the code!  Perhaps add a very specific
> spec reference if you want to show why the numbering is here.

Perhaps it's understated, but there is a comment at the beginning of
this function:

  /* Refer to "Error Log Initialization" */
  int sdxi_error_init(struct sdxi_dev *sdxi)

The numbered steps in the function correspond to the numbered steps in
that part of the spec.

I could make the comment something like:

/*
 * The numbered steps below correspond to the sequence outlined in 3.4.2
 * "Error Log Initialization".
 */

though I'm unsure how stable the section numbering in the SDXI spec will
be over time.
