From: Dave Jiang <dave.jiang@intel.com>
To: Anisa Su <anisa.su887@gmail.com>,
linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: nvdimm@lists.linux.dev, Dan Williams <djbw@kernel.org>,
Jonathan Cameron <jic23@kernel.org>,
Davidlohr Bueso <dave@stgolabs.net>,
Vishal Verma <vishal.l.verma@intel.com>,
Ira Weiny <iweiny@kernel.org>,
Alison Schofield <alison.schofield@intel.com>,
John Groves <John@Groves.net>, Gregory Price <gourry@gourry.net>,
Anisa Su <anisa.su@samsung.com>
Subject: Re: [PATCH v11 13/31] cxl/mem: Add 20 second timeout for stalled DC_ADD_CAPACITY chains
Date: Tue, 30 Jun 2026 14:11:12 -0700 [thread overview]
Message-ID: <3fb3b2c2-9be6-483b-94bd-d0515d5df9b2@intel.com> (raw)
In-Reply-To: <20260625112638.550691-14-anisa.su@samsung.com>
On 6/25/26 4:04 AM, Anisa Su wrote:
> A DC_ADD_CAPACITY event can span multiple event records grouped together
> by the CXL_DCD_EVENT_MORE flag. Extents are staged in the pending list until
> the last event record ('More'=0) is received, at which point the pending
> list is processed. If the device opens such a chain (More=1) but never
> sends the closing record, the staged list sits indefinitely.
>
> Add a delayed-work watchdog that, on expiry, refuses the chain with an
> empty ADD_DC_RESPONSE and drops the staged list.
>
> The 20s timeout is a conservative upper bound and may be tightened
> later. The timeout is purely defensive — the spec does not require it,
> but prevents issues from a lost mailbox response or a crashed fabric manager.
>
> The watchdog bounds how long a chain may stall, but a device could still
> defeat it by streaming More=1 records faster than the timeout, growing the
> staged list without bound. Also cap a runtime chain at
> CXL_DC_MAX_PENDING_EXTENTS and refuse it once exceeded; existing-extent
> recovery is bounded separately by the device's reported extent count.
>
> Signed-off-by: Anisa Su <anisa.su@samsung.com>
Minor comment below in addition to sashiko
>
> ---
> Changes:
> 1. mbox.c: Fix comment in handle_add_event(), before closing the 'More'
> chain and disabling the watchdog. The comment incorrectly claimed
> handle_add_event() runs in system_wq.
> 2. mbox.c: Drop unnecessary initialization of add_ctx.armed=false in
> cxl_memdev_state_create(), as allocated memory is already zeroed
> 3. mbox.c: assert add_ctx.lock is held in add_to_pending_list(); it
> serializes access to add_ctx.pending_extents.
> 4. mbox.c: cap a runtime More=1 chain at CXL_DC_MAX_PENDING_EXTENTS in
> handle_add_event() so a buggy device cannot grow the staged list
> without bound (the watchdog bounds time, not memory).
> ---
> drivers/cxl/core/mbox.c | 98 ++++++++++++++++++++++++++++++++++++++++-
> drivers/cxl/cxlmem.h | 24 ++++++++--
> 2 files changed, 117 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 7dd40fb8d613..4e887b5cdc3e 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -1208,15 +1208,78 @@ static void clear_pending_extents(void *_mds)
>
> list_for_each_entry_safe(pos, tmp, &mds->add_ctx.pending_extents, list)
> delete_extent_node(pos);
> + mds->add_ctx.nr_pending = 0;
> mds->add_ctx.group = NULL;
> }
>
> +/*
> + * Defensive cap on extents staged in one runtime More=1 chain: a buggy
> + * device could otherwise grow the list without bound. Not spec-defined.
> + */
> +#define CXL_DC_MAX_PENDING_EXTENTS 100
> +
> +/*
> + * Bound on how long the host will wait for a device to finish a
> + * multi-record DC_ADD_CAPACITY chain (More=1 ... More=0) before
> + * refusing the chain.
> + * The timeout is not defined in the spec, but added for defensive purposes.
> + * Since there is no spec-defined timeout, 20s is chosen as a generous
> + * upper bound and matches the GPF timeout.
> + */
> +#define CXL_DC_ADD_TIMEOUT (20 * HZ)
> +
> +static void cxl_dc_add_timeout(struct work_struct *work)
> +{
> + struct pending_add_ctx *ctx = container_of(to_delayed_work(work),
> + struct pending_add_ctx,
> + timeout_work);
> + struct cxl_memdev_state *mds = container_of(ctx,
> + struct cxl_memdev_state,
> + add_ctx);
> + struct device *dev = mds->cxlds.dev;
> +
> + guard(mutex)(&ctx->lock);
> +
> + /*
> + * handle_add_event() cancels this work non-synchronously (a sync
> + * cancel would deadlock on @ctx->lock, which the chain-close path
> + * holds), so a callback that already started running can reach here
> + * after its chain has moved on. Abort only if a chain is still armed
> + * AND the timer has not been re-armed since this expiry fired: a fresh
> + * mod_delayed_work() (a later extent in this chain, or a new chain)
> + * makes delayed_work_pending() true, meaning this expiry belongs to a
> + * superseded deadline and must not abort the current chain.
> + */
> + if (!ctx->armed || delayed_work_pending(&ctx->timeout_work))
> + return;
> +
> + dev_warn(dev, "DC add chain timed out; refusing staged extents\n");
> +
> + if (cxl_send_dc_response(mds, CXL_MBOX_OP_ADD_DC_RESPONSE,
> + &ctx->pending_extents, 0))
> + dev_dbg(dev, "Failed to send empty ADD_DC_RESPONSE on timeout\n");
> +
> + clear_pending_extents(mds);
> + ctx->armed = false;
> +}
> +
> +static void cxl_cancel_dcd_add_chain_work(void *_mds)
> +{
> + struct cxl_memdev_state *mds = _mds;
> +
> + cancel_delayed_work_sync(&mds->add_ctx.timeout_work);
> +}
> +
> static int add_to_pending_list(struct list_head *pending_list,
> struct cxl_extent *to_add)
> {
> + struct pending_add_ctx *ctx =
> + container_of(pending_list, struct pending_add_ctx, pending_extents);
> struct cxl_extent_list_node *node = kzalloc(sizeof(*node), GFP_KERNEL);
> struct cxl_extent *extent;
>
> + lockdep_assert_held(&ctx->lock);
> +
> if (!node)
> return -ENOMEM;
> extent = kmemdup(to_add, sizeof(*extent), GFP_KERNEL);
> @@ -1227,6 +1290,7 @@ static int add_to_pending_list(struct list_head *pending_list,
>
> node->extent = extent;
> list_add_tail(&node->list, pending_list);
> + ctx->nr_pending++;
> return 0;
> }
>
> @@ -1239,10 +1303,20 @@ static int add_to_pending_list(struct list_head *pending_list,
> static int handle_add_event(struct cxl_memdev_state *mds,
> struct cxl_event_dcd *event)
> {
> + struct pending_add_ctx *ctx = &mds->add_ctx;
> struct device *dev = mds->cxlds.dev;
> int rc;
>
> - rc = add_to_pending_list(&mds->add_ctx.pending_extents, &event->extent);
> + guard(mutex)(&ctx->lock);
> +
> + if (ctx->nr_pending >= CXL_DC_MAX_PENDING_EXTENTS) {
> + dev_warn(dev, "DC add chain exceeds %u extents; dropping (firmware bug)\n",
> + CXL_DC_MAX_PENDING_EXTENTS);
> + clear_pending_extents(mds);
> + return -ENOSPC;
> + }
> +
> + rc = add_to_pending_list(&ctx->pending_extents, &event->extent);
> if (rc) {
> clear_pending_extents(mds);
> return rc;
> @@ -1250,9 +1324,19 @@ static int handle_add_event(struct cxl_memdev_state *mds,
>
> if (event->flags & CXL_DCD_EVENT_MORE) {
> dev_dbg(dev, "more bit set; delay the surfacing of extent\n");
> + mod_delayed_work(system_wq, &ctx->timeout_work,
> + CXL_DC_ADD_TIMEOUT);
> + ctx->armed = true;
> return 0;
> }
>
> + /*
> + * Chain is closing. Disarm before flushing so a pending watchdog
> + * (queued but blocked on @ctx->lock) sees !armed and bails out.
> + */
> + ctx->armed = false;
> + cancel_delayed_work(&ctx->timeout_work);
> +
> rc = cxl_send_dc_response(mds, CXL_MBOX_OP_ADD_DC_RESPONSE,
> &mds->add_ctx.pending_extents, 0);
> clear_pending_extents(mds);
> @@ -2036,11 +2120,23 @@ struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev, u64 serial,
>
> mutex_init(&mds->event.log_lock);
> INIT_LIST_HEAD(&mds->add_ctx.pending_extents);
> + mutex_init(&mds->add_ctx.lock);
> + INIT_DELAYED_WORK(&mds->add_ctx.timeout_work,
> + cxl_dc_add_timeout);
>
> rc = devm_add_action_or_reset(dev, clear_pending_extents, mds);
> if (rc)
> return ERR_PTR(rc);
>
> + /*
> + * Registered after clear_pending_extents so devm's reverse-order
> + * unwind cancels (and waits for) the watchdog first, then the list
> + * cleanup runs with the watchdog guaranteed not to refire.
> + */
> + rc = devm_add_action_or_reset(dev, cxl_cancel_dcd_add_chain_work, mds);
> + if (rc)
> + return ERR_PTR(rc);
> +
> rc = devm_cxl_register_mce_notifier(dev, &mds->mce_notifier);
> if (rc == -EOPNOTSUPP)
> dev_warn(dev, "CXL MCE unsupported\n");
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 4ffa7bd1e5f1..81498d47f309 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -8,6 +8,8 @@
> #include <linux/uuid.h>
> #include <linux/node.h>
> #include <linux/list.h>
> +#include <linux/mutex.h>
> +#include <linux/workqueue.h>
> #include <cxl/event.h>
> #include <cxl/mailbox.h>
> #include "cxl.h"
> @@ -407,19 +409,33 @@ static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox)
>
> /**
> * struct pending_add_ctx - Staging state for an in-progress
> - * DCD_ADD_CAPACITY event chain
> + * DCD_ADD_CAPACITY event chain
> * @pending_extents: extents received so far in the chain; flushed when
> - * the chain closes (More=0)
> + * the chain closes (More=0)
> * @group: tag group being assembled from the chain
> + * @timeout_work: watchdog that fires if a chain is opened with
> + * CXL_DCD_EVENT_MORE but the closing record never arrives
> + * @lock: serialises updates to the chain state against the watchdog
> + * @armed: set when a More=1 chain opens; cleared when the chain closes,
> + * either by a More=0 event record or by the watchdog firing.
> *
> * A DCD_ADD_CAPACITY notification can span multiple event records
> * stitched together by the CXL_DCD_EVENT_MORE flag. Records are staged
> - * here until the device clears More, at which point the staged batch is
> - * processed and responded to as a single Add_DC_Response.
> + * here until an event record with 'More'=0 is received, at which point the
> + * staged batch is processed and responded to as a single Add_DC_Response.
> + *
> + * If a chain is opened (More=1) but the device never sends the closing
> + * record, the staged list would otherwise sit indefinitely. @timeout_work
> + * is a defensive watchdog that refuses such a chain with an empty response
> + * and drops the staged list.
> */
> struct pending_add_ctx {
> struct list_head pending_extents;
> struct cxl_dc_tag_group *group;
> + struct delayed_work timeout_work;
> + struct mutex lock;
> + unsigned int nr_pending;
Missing kdoc in comment section
> + bool armed;
> };
>
> /**
next prev parent reply other threads:[~2026-06-30 21:11 UTC|newest]
Thread overview: 65+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-25 11:04 DCD: Add support for Dynamic Capacity Devices (DCD) Anisa Su
2026-06-25 11:04 ` [PATCH v11 01/31] cxl/mbox: Flag " Anisa Su
2026-06-26 21:43 ` Dave Jiang
2026-06-25 11:04 ` [PATCH v11 02/31] cxl/mem: Read dynamic capacity configuration from the device Anisa Su
2026-06-25 18:16 ` sashiko-bot
2026-06-26 22:26 ` Dave Jiang
2026-06-25 11:04 ` [PATCH v11 04/31] cxl/core: Enforce partition order/simplify partition calls Anisa Su
2026-06-26 22:37 ` Dave Jiang
2026-06-25 11:04 ` [PATCH v11 05/31] cxl/mem: Expose dynamic ram 1 partition in sysfs Anisa Su
2026-06-25 18:12 ` sashiko-bot
2026-06-26 23:08 ` Dave Jiang
2026-06-25 11:04 ` [PATCH v11 06/31] cxl/port: Add 'dynamic_ram_1' to endpoint decoder mode Anisa Su
2026-06-25 11:04 ` [PATCH v11 07/31] cxl/region: Add DC DAX region support Anisa Su
2026-06-25 18:16 ` sashiko-bot
2026-06-26 23:18 ` Dave Jiang
2026-06-25 11:04 ` [PATCH v11 08/31] cxl/events: Split event msgnum configuration from irq setup Anisa Su
2026-06-25 11:04 ` [PATCH v11 09/31] cxl/pci: Factor out interrupt policy check Anisa Su
2026-06-25 11:04 ` [PATCH v11 10/31] cxl/mem: Configure dynamic capacity interrupts Anisa Su
2026-06-25 18:14 ` sashiko-bot
2026-06-25 11:04 ` [PATCH v11 11/31] cxl/core: Return endpoint decoder information from region search Anisa Su
2026-06-25 11:04 ` [PATCH v11 12/31] cxl/mem: Set up framework for handling DC Events Anisa Su
2026-06-25 18:12 ` sashiko-bot
2026-06-26 21:54 ` Dave Jiang
2026-06-25 11:04 ` [PATCH v11 13/31] cxl/mem: Add 20 second timeout for stalled DC_ADD_CAPACITY chains Anisa Su
2026-06-25 18:15 ` sashiko-bot
2026-06-30 21:11 ` Dave Jiang [this message]
2026-06-25 11:04 ` [PATCH v11 14/31] cxl/extent: Handle DC Add Capacity events Anisa Su
2026-06-25 18:16 ` sashiko-bot
2026-06-25 11:04 ` [PATCH v11 15/31] cxl/mem: Drop misaligned DCD extent groups Anisa Su
2026-06-25 18:19 ` sashiko-bot
2026-06-30 21:23 ` Dave Jiang
2026-06-25 11:04 ` [PATCH v11 16/31] cxl/extent: Validate DC extent partition Anisa Su
2026-06-25 18:20 ` sashiko-bot
2026-06-30 22:49 ` Dave Jiang
2026-06-25 11:04 ` [PATCH v11 17/31] cxl/mem: Enforce tag-group semantics Anisa Su
2026-06-25 18:24 ` sashiko-bot
2026-06-25 11:04 ` [PATCH v11 18/31] cxl/extent: Handle DC Release Capacity events Anisa Su
2026-06-25 18:23 ` sashiko-bot
2026-06-25 11:04 ` [PATCH v11 19/31] cxl/extent: Enforce cross-region tag uniqueness Anisa Su
2026-06-25 18:23 ` sashiko-bot
2026-06-25 11:04 ` [PATCH v11 20/31] cxl/region/extent: Expose dc_extent information in sysfs Anisa Su
2026-06-25 18:33 ` sashiko-bot
2026-06-25 11:04 ` [PATCH v11 21/31] cxl + dax: Surface dax_resources on DCD Add Capacity events Anisa Su
2026-06-25 18:29 ` sashiko-bot
2026-06-25 11:04 ` [PATCH v11 22/31] cxl + dax: Release dax_resources on DCD Release " Anisa Su
2026-06-25 18:36 ` sashiko-bot
2026-06-25 11:05 ` [PATCH v11 23/31] dax/bus: Factor out dev dax resize logic Anisa Su
2026-06-25 18:27 ` sashiko-bot
2026-06-25 11:05 ` [PATCH v11 24/31] dax/bus: Add uuid sysfs attribute to dax devices Anisa Su
2026-06-30 23:21 ` Dave Jiang
2026-06-25 11:05 ` [PATCH v11 25/31] dax/bus: Reject resize on DC dax devices and enforce 0-size creation Anisa Su
2026-06-25 11:05 ` [PATCH v11 26/31] dax/bus: Tag-aware uuid claim and show on DC dax devices Anisa Su
2026-06-25 18:26 ` sashiko-bot
2026-06-25 11:05 ` [PATCH v11 27/31] cxl/region: Read existing extents on region creation Anisa Su
2026-06-25 18:32 ` sashiko-bot
2026-06-25 11:05 ` [PATCH v11 28/31] cxl/mem: Trace Dynamic capacity Event Record Anisa Su
2026-06-25 18:29 ` sashiko-bot
2026-06-25 11:05 ` [PATCH v11 29/31] tools/testing/cxl: Make event logs dynamic Anisa Su
2026-06-25 18:31 ` sashiko-bot
2026-06-25 11:05 ` [PATCH v11 30/31] tools/testing/cxl: Add DC Regions to mock mem data Anisa Su
2026-06-25 18:34 ` sashiko-bot
2026-06-25 11:05 ` [PATCH v11 31/31] Documentation/cxl: Document DCD extent handling and DC-backed DAX regions Anisa Su
2026-06-25 18:24 ` sashiko-bot
2026-06-25 18:00 ` [PATCH v11 03/31] cxl/cdat: Gather DSMAS data for DCD partitions Anisa Su
2026-06-26 22:30 ` Dave Jiang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=3fb3b2c2-9be6-483b-94bd-d0515d5df9b2@intel.com \
--to=dave.jiang@intel.com \
--cc=John@Groves.net \
--cc=alison.schofield@intel.com \
--cc=anisa.su887@gmail.com \
--cc=anisa.su@samsung.com \
--cc=dave@stgolabs.net \
--cc=djbw@kernel.org \
--cc=gourry@gourry.net \
--cc=iweiny@kernel.org \
--cc=jic23@kernel.org \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=nvdimm@lists.linux.dev \
--cc=vishal.l.verma@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.