From: sashiko-bot@kernel.org
To: "Anisa Su" <anisa.su887@gmail.com>
Cc: linux-cxl@vger.kernel.org, nvdimm@lists.linux.dev
Subject: Re: [PATCH v11 13/31] cxl/mem: Add 20 second timeout for stalled DC_ADD_CAPACITY chains
Date: Thu, 25 Jun 2026 18:15:43 +0000 [thread overview]
Message-ID: <20260625181544.67B691F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260625112638.550691-14-anisa.su@samsung.com>
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] Error paths in handle_add_event fail to abort the DCD event chain properly, leading to a stalled device, leaked watchdog timers, and partial chain acceptance.
- [High] Watchdog timeout clears the pending list without transitioning to a discard state, leading to a double-response protocol violation if the chain completes.
--
commit f4b4a95fd0c56d91f06baa4811e9a05ea46012ee
Author: Anisa Su <anisa.su887@gmail.com>
cxl/mem: Add 20 second timeout for stalled DC_ADD_CAPACITY chains
A DC_ADD_CAPACITY event can span multiple event records. This patch adds a
delayed-work watchdog to prevent indefinitely stalled chains and enforces a
maximum limit on the number of pending extents.
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 7dd40fb8d613c..4e887b5cdc3ed 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
[ ... ]
> @@ -1208,15 +1208,78 @@ static void clear_pending_extents(void *_mds)
>
> list_for_each_entry_safe(pos, tmp, &mds->add_ctx.pending_extents, list)
> delete_extent_node(pos);
> + mds->add_ctx.nr_pending = 0;
> mds->add_ctx.group = NULL;
> }
[ ... ]
> +static void cxl_dc_add_timeout(struct work_struct *work)
> +{
> + struct pending_add_ctx *ctx = container_of(to_delayed_work(work),
> + struct pending_add_ctx,
> + timeout_work);
[ ... ]
> + if (cxl_send_dc_response(mds, CXL_MBOX_OP_ADD_DC_RESPONSE,
> + &ctx->pending_extents, 0))
> + dev_dbg(dev, "Failed to send empty ADD_DC_RESPONSE on timeout\n");
> +
> + clear_pending_extents(mds);
> + ctx->armed = false;
> +}
[Severity: High]
If the device is just slow and later sends the remainder of the chain
(including the closing More=0 event), will the driver treat the delayed
events as a fresh chain?
Because cxl_dc_add_timeout() resets ctx->armed = false and nr_pending = 0
without marking the chain as discarded, it appears handle_add_event() would
process a delayed tail of the chain as a new chain.
When the More=0 event is processed, wouldn't the driver issue a second
ADD_DC_RESPONSE for this single event chain, violating the CXL protocol?
[ ... ]
> @@ -1239,10 +1303,20 @@ static int add_to_pending_list(struct list_head *pending_list,
> static int handle_add_event(struct cxl_memdev_state *mds,
> struct cxl_event_dcd *event)
> {
> + struct pending_add_ctx *ctx = &mds->add_ctx;
> struct device *dev = mds->cxlds.dev;
> int rc;
>
> - rc = add_to_pending_list(&mds->add_ctx.pending_extents, &event->extent);
> + guard(mutex)(&ctx->lock);
> +
> + if (ctx->nr_pending >= CXL_DC_MAX_PENDING_EXTENTS) {
> + dev_warn(dev, "DC add chain exceeds %u extents; dropping (firmware bug)\n",
> + CXL_DC_MAX_PENDING_EXTENTS);
> + clear_pending_extents(mds);
> + return -ENOSPC;
> + }
> +
> + rc = add_to_pending_list(&ctx->pending_extents, &event->extent);
> if (rc) {
> clear_pending_extents(mds);
> return rc;
[Severity: High]
Do these early error returns fail to abort the event chain properly?
If the limit is exceeded or memory allocation fails, the code calls
clear_pending_extents() and returns an error without sending an
ADD_DC_RESPONSE.
If this happens on the first extent, wouldn't the device never receive a
response and stall indefinitely?
If it occurs on a subsequent extent, does this leave the watchdog timer
armed to fire spuriously later?
Additionally, because clear_pending_extents() resets nr_pending to 0, if
the device continues sending More=1 events for the same chain, would the
driver incorrectly process them as a new chain and potentially accept the
tail of a chain after dropping the head?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260625112638.550691-1-anisa.su@samsung.com?part=13
next prev parent reply other threads:[~2026-06-25 18:15 UTC|newest]
Thread overview: 61+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-25 11:04 DCD: Add support for Dynamic Capacity Devices (DCD) Anisa Su
2026-06-25 11:04 ` [PATCH v11 01/31] cxl/mbox: Flag " Anisa Su
2026-06-26 21:43 ` Dave Jiang
2026-06-25 11:04 ` [PATCH v11 02/31] cxl/mem: Read dynamic capacity configuration from the device Anisa Su
2026-06-25 18:16 ` sashiko-bot
2026-06-26 22:26 ` Dave Jiang
2026-06-25 11:04 ` [PATCH v11 04/31] cxl/core: Enforce partition order/simplify partition calls Anisa Su
2026-06-26 22:37 ` Dave Jiang
2026-06-25 11:04 ` [PATCH v11 05/31] cxl/mem: Expose dynamic ram 1 partition in sysfs Anisa Su
2026-06-25 18:12 ` sashiko-bot
2026-06-26 23:08 ` Dave Jiang
2026-06-25 11:04 ` [PATCH v11 06/31] cxl/port: Add 'dynamic_ram_1' to endpoint decoder mode Anisa Su
2026-06-25 11:04 ` [PATCH v11 07/31] cxl/region: Add DC DAX region support Anisa Su
2026-06-25 18:16 ` sashiko-bot
2026-06-26 23:18 ` Dave Jiang
2026-06-25 11:04 ` [PATCH v11 08/31] cxl/events: Split event msgnum configuration from irq setup Anisa Su
2026-06-25 11:04 ` [PATCH v11 09/31] cxl/pci: Factor out interrupt policy check Anisa Su
2026-06-25 11:04 ` [PATCH v11 10/31] cxl/mem: Configure dynamic capacity interrupts Anisa Su
2026-06-25 18:14 ` sashiko-bot
2026-06-25 11:04 ` [PATCH v11 11/31] cxl/core: Return endpoint decoder information from region search Anisa Su
2026-06-25 11:04 ` [PATCH v11 12/31] cxl/mem: Set up framework for handling DC Events Anisa Su
2026-06-25 18:12 ` sashiko-bot
2026-06-26 21:54 ` Dave Jiang
2026-06-25 11:04 ` [PATCH v11 13/31] cxl/mem: Add 20 second timeout for stalled DC_ADD_CAPACITY chains Anisa Su
2026-06-25 18:15 ` sashiko-bot [this message]
2026-06-25 11:04 ` [PATCH v11 14/31] cxl/extent: Handle DC Add Capacity events Anisa Su
2026-06-25 18:16 ` sashiko-bot
2026-06-25 11:04 ` [PATCH v11 15/31] cxl/mem: Drop misaligned DCD extent groups Anisa Su
2026-06-25 18:19 ` sashiko-bot
2026-06-25 11:04 ` [PATCH v11 16/31] cxl/extent: Validate DC extent partition Anisa Su
2026-06-25 18:20 ` sashiko-bot
2026-06-25 11:04 ` [PATCH v11 17/31] cxl/mem: Enforce tag-group semantics Anisa Su
2026-06-25 18:24 ` sashiko-bot
2026-06-25 11:04 ` [PATCH v11 18/31] cxl/extent: Handle DC Release Capacity events Anisa Su
2026-06-25 18:23 ` sashiko-bot
2026-06-25 11:04 ` [PATCH v11 19/31] cxl/extent: Enforce cross-region tag uniqueness Anisa Su
2026-06-25 18:23 ` sashiko-bot
2026-06-25 11:04 ` [PATCH v11 20/31] cxl/region/extent: Expose dc_extent information in sysfs Anisa Su
2026-06-25 18:33 ` sashiko-bot
2026-06-25 11:04 ` [PATCH v11 21/31] cxl + dax: Surface dax_resources on DCD Add Capacity events Anisa Su
2026-06-25 18:29 ` sashiko-bot
2026-06-25 11:04 ` [PATCH v11 22/31] cxl + dax: Release dax_resources on DCD Release " Anisa Su
2026-06-25 18:36 ` sashiko-bot
2026-06-25 11:05 ` [PATCH v11 23/31] dax/bus: Factor out dev dax resize logic Anisa Su
2026-06-25 18:27 ` sashiko-bot
2026-06-25 11:05 ` [PATCH v11 24/31] dax/bus: Add uuid sysfs attribute to dax devices Anisa Su
2026-06-25 11:05 ` [PATCH v11 25/31] dax/bus: Reject resize on DC dax devices and enforce 0-size creation Anisa Su
2026-06-25 11:05 ` [PATCH v11 26/31] dax/bus: Tag-aware uuid claim and show on DC dax devices Anisa Su
2026-06-25 18:26 ` sashiko-bot
2026-06-25 11:05 ` [PATCH v11 27/31] cxl/region: Read existing extents on region creation Anisa Su
2026-06-25 18:32 ` sashiko-bot
2026-06-25 11:05 ` [PATCH v11 28/31] cxl/mem: Trace Dynamic capacity Event Record Anisa Su
2026-06-25 18:29 ` sashiko-bot
2026-06-25 11:05 ` [PATCH v11 29/31] tools/testing/cxl: Make event logs dynamic Anisa Su
2026-06-25 18:31 ` sashiko-bot
2026-06-25 11:05 ` [PATCH v11 30/31] tools/testing/cxl: Add DC Regions to mock mem data Anisa Su
2026-06-25 18:34 ` sashiko-bot
2026-06-25 11:05 ` [PATCH v11 31/31] Documentation/cxl: Document DCD extent handling and DC-backed DAX regions Anisa Su
2026-06-25 18:24 ` sashiko-bot
2026-06-25 18:00 ` [PATCH v11 03/31] cxl/cdat: Gather DSMAS data for DCD partitions Anisa Su
2026-06-26 22:30 ` Dave Jiang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260625181544.67B691F000E9@smtp.kernel.org \
--to=sashiko-bot@kernel.org \
--cc=anisa.su887@gmail.com \
--cc=linux-cxl@vger.kernel.org \
--cc=nvdimm@lists.linux.dev \
--cc=sashiko-reviews@lists.linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.