From: Dave Jiang <dave.jiang@intel.com>
To: Anisa Su <anisa.su887@gmail.com>,
linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: nvdimm@lists.linux.dev, Dan Williams <djbw@kernel.org>,
Jonathan Cameron <jic23@kernel.org>,
Davidlohr Bueso <dave@stgolabs.net>,
Vishal Verma <vishal.l.verma@intel.com>,
Ira Weiny <iweiny@kernel.org>,
Alison Schofield <alison.schofield@intel.com>,
John Groves <John@Groves.net>, Gregory Price <gourry@gourry.net>,
Anisa Su <anisa.su@samsung.com>
Subject: Re: [PATCH v11 13/31] cxl/mem: Add 20 second timeout for stalled DC_ADD_CAPACITY chains
Date: Tue, 30 Jun 2026 14:11:12 -0700 [thread overview]
Message-ID: <3fb3b2c2-9be6-483b-94bd-d0515d5df9b2@intel.com> (raw)
In-Reply-To: <20260625112638.550691-14-anisa.su@samsung.com>
On 6/25/26 4:04 AM, Anisa Su wrote:
> A DC_ADD_CAPACITY event can span multiple event records grouped together
> by the CXL_DCD_EVENT_MORE flag. Extents are staged in the pending list until
> the last event record ('More'=0) is received, at which point the pending
> list is processed. If the device opens such a chain (More=1) but never
> sends the closing record, the staged list sits indefinitely.
>
> Add a delayed-work watchdog that, on expiry, refuses the chain with an
> empty ADD_DC_RESPONSE and drops the staged list.
>
> The 20s timeout is a conservative upper bound and may be tightened
> later. The timeout is purely defensive — the spec does not require it,
> but prevents issues from a lost mailbox response or a crashed fabric manager.
>
> The watchdog bounds how long a chain may stall, but a device could still
> defeat it by streaming More=1 records faster than the timeout, growing the
> staged list without bound. Also cap a runtime chain at
> CXL_DC_MAX_PENDING_EXTENTS and refuse it once exceeded; existing-extent
> recovery is bounded separately by the device's reported extent count.
>
> Signed-off-by: Anisa Su <anisa.su@samsung.com>
Minor comment below in addition to sashiko
>
> ---
> Changes:
> 1. mbox.c: Fix comment in handle_add_event(), before closing the 'More'
> chain and disabling the watchdog. The comment incorrectly claimed
> handle_add_event() runs in system_wq.
> 2. mbox.c: Drop unnecessary initialization of add_ctx.armed=false in
> cxl_memdev_state_create(), as allocated memory is already zeroed
> 3. mbox.c: assert add_ctx.lock is held in add_to_pending_list(); it
> serializes access to add_ctx.pending_extents.
> 4. mbox.c: cap a runtime More=1 chain at CXL_DC_MAX_PENDING_EXTENTS in
> handle_add_event() so a buggy device cannot grow the staged list
> without bound (the watchdog bounds time, not memory).
> ---
> drivers/cxl/core/mbox.c | 98 ++++++++++++++++++++++++++++++++++++++++-
> drivers/cxl/cxlmem.h | 24 ++++++++--
> 2 files changed, 117 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 7dd40fb8d613..4e887b5cdc3e 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -1208,15 +1208,78 @@ static void clear_pending_extents(void *_mds)
>
> list_for_each_entry_safe(pos, tmp, &mds->add_ctx.pending_extents, list)
> delete_extent_node(pos);
> + mds->add_ctx.nr_pending = 0;
> mds->add_ctx.group = NULL;
> }
>
> +/*
> + * Defensive cap on extents staged in one runtime More=1 chain: a buggy
> + * device could otherwise grow the list without bound. Not spec-defined.
> + */
> +#define CXL_DC_MAX_PENDING_EXTENTS 100
> +
> +/*
> + * Bound on how long the host will wait for a device to finish a
> + * multi-record DC_ADD_CAPACITY chain (More=1 ... More=0) before
> + * refusing the chain.
> + * The timeout is not defined in the spec, but added for defensive purposes.
> + * Since there is no spec-defined timeout, 20s is chosen as a generous
> + * upper bound and matches the GPF timeout.
> + */
> +#define CXL_DC_ADD_TIMEOUT (20 * HZ)
> +
> +static void cxl_dc_add_timeout(struct work_struct *work)
> +{
> + struct pending_add_ctx *ctx = container_of(to_delayed_work(work),
> + struct pending_add_ctx,
> + timeout_work);
> + struct cxl_memdev_state *mds = container_of(ctx,
> + struct cxl_memdev_state,
> + add_ctx);
> + struct device *dev = mds->cxlds.dev;
> +
> + guard(mutex)(&ctx->lock);
> +
> + /*
> + * handle_add_event() cancels this work non-synchronously (a sync
> + * cancel would deadlock on @ctx->lock, which the chain-close path
> + * holds), so a callback that already started running can reach here
> + * after its chain has moved on. Abort only if a chain is still armed
> + * AND the timer has not been re-armed since this expiry fired: a fresh
> + * mod_delayed_work() (a later extent in this chain, or a new chain)
> + * makes delayed_work_pending() true, meaning this expiry belongs to a
> + * superseded deadline and must not abort the current chain.
> + */
> + if (!ctx->armed || delayed_work_pending(&ctx->timeout_work))
> + return;
> +
> + dev_warn(dev, "DC add chain timed out; refusing staged extents\n");
> +
> + if (cxl_send_dc_response(mds, CXL_MBOX_OP_ADD_DC_RESPONSE,
> + &ctx->pending_extents, 0))
> + dev_dbg(dev, "Failed to send empty ADD_DC_RESPONSE on timeout\n");
> +
> + clear_pending_extents(mds);
> + ctx->armed = false;
> +}
> +
> +static void cxl_cancel_dcd_add_chain_work(void *_mds)
> +{
> + struct cxl_memdev_state *mds = _mds;
> +
> + cancel_delayed_work_sync(&mds->add_ctx.timeout_work);
> +}
> +
> static int add_to_pending_list(struct list_head *pending_list,
> struct cxl_extent *to_add)
> {
> + struct pending_add_ctx *ctx =
> + container_of(pending_list, struct pending_add_ctx, pending_extents);
> struct cxl_extent_list_node *node = kzalloc(sizeof(*node), GFP_KERNEL);
> struct cxl_extent *extent;
>
> + lockdep_assert_held(&ctx->lock);
> +
> if (!node)
> return -ENOMEM;
> extent = kmemdup(to_add, sizeof(*extent), GFP_KERNEL);
> @@ -1227,6 +1290,7 @@ static int add_to_pending_list(struct list_head *pending_list,
>
> node->extent = extent;
> list_add_tail(&node->list, pending_list);
> + ctx->nr_pending++;
> return 0;
> }
>
> @@ -1239,10 +1303,20 @@ static int add_to_pending_list(struct list_head *pending_list,
> static int handle_add_event(struct cxl_memdev_state *mds,
> struct cxl_event_dcd *event)
> {
> + struct pending_add_ctx *ctx = &mds->add_ctx;
> struct device *dev = mds->cxlds.dev;
> int rc;
>
> - rc = add_to_pending_list(&mds->add_ctx.pending_extents, &event->extent);
> + guard(mutex)(&ctx->lock);
> +
> + if (ctx->nr_pending >= CXL_DC_MAX_PENDING_EXTENTS) {
> + dev_warn(dev, "DC add chain exceeds %u extents; dropping (firmware bug)\n",
> + CXL_DC_MAX_PENDING_EXTENTS);
> + clear_pending_extents(mds);
> + return -ENOSPC;
> + }
> +
> + rc = add_to_pending_list(&ctx->pending_extents, &event->extent);
> if (rc) {
> clear_pending_extents(mds);
> return rc;
> @@ -1250,9 +1324,19 @@ static int handle_add_event(struct cxl_memdev_state *mds,
>
> if (event->flags & CXL_DCD_EVENT_MORE) {
> dev_dbg(dev, "more bit set; delay the surfacing of extent\n");
> + mod_delayed_work(system_wq, &ctx->timeout_work,
> + CXL_DC_ADD_TIMEOUT);
> + ctx->armed = true;
> return 0;
> }
>
> + /*
> + * Chain is closing. Disarm before flushing so a pending watchdog
> + * (queued but blocked on @ctx->lock) sees !armed and bails out.
> + */
> + ctx->armed = false;
> + cancel_delayed_work(&ctx->timeout_work);
> +
> rc = cxl_send_dc_response(mds, CXL_MBOX_OP_ADD_DC_RESPONSE,
> &mds->add_ctx.pending_extents, 0);
> clear_pending_extents(mds);
> @@ -2036,11 +2120,23 @@ struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev, u64 serial,
>
> mutex_init(&mds->event.log_lock);
> INIT_LIST_HEAD(&mds->add_ctx.pending_extents);
> + mutex_init(&mds->add_ctx.lock);
> + INIT_DELAYED_WORK(&mds->add_ctx.timeout_work,
> + cxl_dc_add_timeout);
>
> rc = devm_add_action_or_reset(dev, clear_pending_extents, mds);
> if (rc)
> return ERR_PTR(rc);
>
> + /*
> + * Registered after clear_pending_extents so devm's reverse-order
> + * unwind cancels (and waits for) the watchdog first, then the list
> + * cleanup runs with the watchdog guaranteed not to refire.
> + */
> + rc = devm_add_action_or_reset(dev, cxl_cancel_dcd_add_chain_work, mds);
> + if (rc)
> + return ERR_PTR(rc);
> +
> rc = devm_cxl_register_mce_notifier(dev, &mds->mce_notifier);
> if (rc == -EOPNOTSUPP)
> dev_warn(dev, "CXL MCE unsupported\n");
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 4ffa7bd1e5f1..81498d47f309 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -8,6 +8,8 @@
> #include <linux/uuid.h>
> #include <linux/node.h>
> #include <linux/list.h>
> +#include <linux/mutex.h>
> +#include <linux/workqueue.h>
> #include <cxl/event.h>
> #include <cxl/mailbox.h>
> #include "cxl.h"
> @@ -407,19 +409,33 @@ static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox)
>
> /**
> * struct pending_add_ctx - Staging state for an in-progress
> - * DCD_ADD_CAPACITY event chain
> + * DCD_ADD_CAPACITY event chain
> * @pending_extents: extents received so far in the chain; flushed when
> - * the chain closes (More=0)
> + * the chain closes (More=0)
> * @group: tag group being assembled from the chain
> + * @timeout_work: watchdog that fires if a chain is opened with
> + * CXL_DCD_EVENT_MORE but the closing record never arrives
> + * @lock: serialises updates to the chain state against the watchdog
> + * @armed: set when a More=1 chain opens; cleared when the chain closes,
> + * either by a More=0 event record or by the watchdog firing.
> *
> * A DCD_ADD_CAPACITY notification can span multiple event records
> * stitched together by the CXL_DCD_EVENT_MORE flag. Records are staged
> - * here until the device clears More, at which point the staged batch is
> - * processed and responded to as a single Add_DC_Response.
> + * here until an event record with 'More'=0 is received, at which point the
> + * staged batch is processed and responded to as a single Add_DC_Response.
> + *
> + * If a chain is opened (More=1) but the device never sends the closing
> + * record, the staged list would otherwise sit indefinitely. @timeout_work
> + * is a defensive watchdog that refuses such a chain with an empty response
> + * and drops the staged list.
> */
> struct pending_add_ctx {
> struct list_head pending_extents;
> struct cxl_dc_tag_group *group;
> + struct delayed_work timeout_work;
> + struct mutex lock;
> + unsigned int nr_pending;
Missing kdoc in comment section
> + bool armed;
> };
>
> /**
next prev parent reply other threads:[~2026-06-30 21:11 UTC|newest]
Thread overview: 65+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-25 11:04 DCD: Add support for Dynamic Capacity Devices (DCD) Anisa Su
2026-06-25 11:04 ` [PATCH v11 01/31] cxl/mbox: Flag " Anisa Su
2026-06-26 21:43 ` Dave Jiang
2026-06-25 11:04 ` [PATCH v11 02/31] cxl/mem: Read dynamic capacity configuration from the device Anisa Su
2026-06-25 18:16 ` sashiko-bot
2026-06-26 22:26 ` Dave Jiang
2026-06-25 11:04 ` [PATCH v11 04/31] cxl/core: Enforce partition order/simplify partition calls Anisa Su
2026-06-26 22:37 ` Dave Jiang
2026-06-25 11:04 ` [PATCH v11 05/31] cxl/mem: Expose dynamic ram 1 partition in sysfs Anisa Su
2026-06-25 18:12 ` sashiko-bot
2026-06-26 23:08 ` Dave Jiang
2026-06-25 11:04 ` [PATCH v11 06/31] cxl/port: Add 'dynamic_ram_1' to endpoint decoder mode Anisa Su
2026-06-25 11:04 ` [PATCH v11 07/31] cxl/region: Add DC DAX region support Anisa Su
2026-06-25 18:16 ` sashiko-bot
2026-06-26 23:18 ` Dave Jiang
2026-06-25 11:04 ` [PATCH v11 08/31] cxl/events: Split event msgnum configuration from irq setup Anisa Su
2026-06-25 11:04 ` [PATCH v11 09/31] cxl/pci: Factor out interrupt policy check Anisa Su
2026-06-25 11:04 ` [PATCH v11 10/31] cxl/mem: Configure dynamic capacity interrupts Anisa Su
2026-06-25 18:14 ` sashiko-bot
2026-06-25 11:04 ` [PATCH v11 11/31] cxl/core: Return endpoint decoder information from region search Anisa Su
2026-06-25 11:04 ` [PATCH v11 12/31] cxl/mem: Set up framework for handling DC Events Anisa Su
2026-06-25 18:12 ` sashiko-bot
2026-06-26 21:54 ` Dave Jiang
2026-06-25 11:04 ` [PATCH v11 13/31] cxl/mem: Add 20 second timeout for stalled DC_ADD_CAPACITY chains Anisa Su
2026-06-25 18:15 ` sashiko-bot
2026-06-30 21:11 ` Dave Jiang [this message]
2026-06-25 11:04 ` [PATCH v11 14/31] cxl/extent: Handle DC Add Capacity events Anisa Su
2026-06-25 18:16 ` sashiko-bot
2026-06-25 11:04 ` [PATCH v11 15/31] cxl/mem: Drop misaligned DCD extent groups Anisa Su
2026-06-25 18:19 ` sashiko-bot
2026-06-30 21:23 ` Dave Jiang
2026-06-25 11:04 ` [PATCH v11 16/31] cxl/extent: Validate DC extent partition Anisa Su
2026-06-25 18:20 ` sashiko-bot
2026-06-30 22:49 ` Dave Jiang
2026-06-25 11:04 ` [PATCH v11 17/31] cxl/mem: Enforce tag-group semantics Anisa Su
2026-06-25 18:24 ` sashiko-bot
2026-06-25 11:04 ` [PATCH v11 18/31] cxl/extent: Handle DC Release Capacity events Anisa Su
2026-06-25 18:23 ` sashiko-bot
2026-06-25 11:04 ` [PATCH v11 19/31] cxl/extent: Enforce cross-region tag uniqueness Anisa Su
2026-06-25 18:23 ` sashiko-bot
2026-06-25 11:04 ` [PATCH v11 20/31] cxl/region/extent: Expose dc_extent information in sysfs Anisa Su
2026-06-25 18:33 ` sashiko-bot
2026-06-25 11:04 ` [PATCH v11 21/31] cxl + dax: Surface dax_resources on DCD Add Capacity events Anisa Su
2026-06-25 18:29 ` sashiko-bot
2026-06-25 11:04 ` [PATCH v11 22/31] cxl + dax: Release dax_resources on DCD Release " Anisa Su
2026-06-25 18:36 ` sashiko-bot
2026-06-25 11:05 ` [PATCH v11 23/31] dax/bus: Factor out dev dax resize logic Anisa Su
2026-06-25 18:27 ` sashiko-bot
2026-06-25 11:05 ` [PATCH v11 24/31] dax/bus: Add uuid sysfs attribute to dax devices Anisa Su
2026-06-30 23:21 ` Dave Jiang
2026-06-25 11:05 ` [PATCH v11 25/31] dax/bus: Reject resize on DC dax devices and enforce 0-size creation Anisa Su
2026-06-25 11:05 ` [PATCH v11 26/31] dax/bus: Tag-aware uuid claim and show on DC dax devices Anisa Su
2026-06-25 18:26 ` sashiko-bot
2026-06-25 11:05 ` [PATCH v11 27/31] cxl/region: Read existing extents on region creation Anisa Su
2026-06-25 18:32 ` sashiko-bot
2026-06-25 11:05 ` [PATCH v11 28/31] cxl/mem: Trace Dynamic capacity Event Record Anisa Su
2026-06-25 18:29 ` sashiko-bot
2026-06-25 11:05 ` [PATCH v11 29/31] tools/testing/cxl: Make event logs dynamic Anisa Su
2026-06-25 18:31 ` sashiko-bot
2026-06-25 11:05 ` [PATCH v11 30/31] tools/testing/cxl: Add DC Regions to mock mem data Anisa Su
2026-06-25 18:34 ` sashiko-bot
2026-06-25 11:05 ` [PATCH v11 31/31] Documentation/cxl: Document DCD extent handling and DC-backed DAX regions Anisa Su
2026-06-25 18:24 ` sashiko-bot
2026-06-25 18:00 ` [PATCH v11 03/31] cxl/cdat: Gather DSMAS data for DCD partitions Anisa Su
2026-06-26 22:30 ` Dave Jiang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=3fb3b2c2-9be6-483b-94bd-d0515d5df9b2@intel.com \
--to=dave.jiang@intel.com \
--cc=John@Groves.net \
--cc=alison.schofield@intel.com \
--cc=anisa.su887@gmail.com \
--cc=anisa.su@samsung.com \
--cc=dave@stgolabs.net \
--cc=djbw@kernel.org \
--cc=gourry@gourry.net \
--cc=iweiny@kernel.org \
--cc=jic23@kernel.org \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=nvdimm@lists.linux.dev \
--cc=vishal.l.verma@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox