From: Dave Jiang <dave.jiang@intel.com>
To: Shuai Xue <xueshuai@linux.alibaba.com>,
vinicius.gomes@intel.com, fenghuay@nvidia.com, vkoul@kernel.org
Cc: dmaengine@vger.kernel.org, colin.i.king@gmail.com,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1 1/2] dmaengine: idxd: Fix race condition between WQ enable and reset paths
Date: Thu, 22 May 2025 07:55:26 -0700 [thread overview]
Message-ID: <a03e4f97-2289-4af7-8bfc-ad2d38ec8677@intel.com> (raw)
In-Reply-To: <20250522063329.51156-2-xueshuai@linux.alibaba.com>
On 5/21/25 11:33 PM, Shuai Xue wrote:
> A device reset command disables all WQs in hardware. If issued while a WQ
> is being enabled, it can cause a mismatch between the software and hardware
> states.
>
> When a hardware error occurs, the IDXD driver calls idxd_device_reset() to
> send a reset command and clear the state (wq->state) of all WQs. It then
> uses wq_enable_map (a bitmask tracking enabled WQs) to re-enable them and
> ensure consistency between the software and hardware states.
>
> However, a race condition exists between the WQ enable path and the
> reset/recovery path. For example:
>
> A: WQ enable path B: Reset and recovery path
> ------------------ ------------------------
> a1. issue IDXD_CMD_ENABLE_WQ
> b1. issue IDXD_CMD_RESET_DEVICE
> b2. clear wq->state
> b3. check wq_enable_map bit, not set
> a2. set wq->state = IDXD_WQ_ENABLED
> a3. set wq_enable_map
>
> In this case, b1 issues a reset command that disables all WQs in hardware.
> Since b3 checks wq_enable_map before a2, it doesn't re-enable the WQ,
> leading to an inconsistency between wq->state (software) and the actual
> hardware state (IDXD_WQ_DISABLED).
Would it lessen the complication to just have wq enable path grab the device lock before proceeding?
DJ
>
> To fix this, the WQ state should be set to IDXD_WQ_ENABLED before issuing
> the IDXD_CMD_ENABLE_WQ command. This ensures the software state aligns with
> the expected hardware behavior during resets:
>
> A: WQ enable path B: Reset and recovery path
> ------------------ ------------------------
> b1. issue IDXD_CMD_RESET_DEVICE
> b2. clear wq->state
> a1. set wq->state = IDXD_WQ_ENABLED
> a2. set wq_enable_map
> b3. check wq_enable_map bit, true
> b4. check wq state, return
> a3. issue IDXD_CMD_ENABLE_WQ
>
> By updating the state early, the reset path can safely re-enable the WQ
> based on wq_enable_map.
>
> A corner case arises when both paths attempt to enable the same WQ:
>
> A: WQ enable path B: Reset and recovery path
> ------------------ ------------------------
> b1. issue IDXD_CMD_RESET_DEVICE
> b2. clear wq->state
> a1. set wq->state = IDXD_WQ_ENABLED
> a2. set wq_enable_map
> b3. check wq_enable_map bit
> b4. check wq state, not reset
> b5. issue IDXD_CMD_ENABLE_WQ
> a3. issue IDXD_CMD_ENABLE_WQ
>
> Here, the command status (CMDSTS) returns IDXD_CMDSTS_ERR_WQ_ENABLED,
> but the driver treats it as success (IDXD_CMDSTS_SUCCESS), ensuring the WQ
> remains enabled.
>
> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
> ---
> drivers/dma/idxd/device.c | 20 ++++++++++++++++++--
> 1 file changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c
> index 5cf419fe6b46..b424c3c8f359 100644
> --- a/drivers/dma/idxd/device.c
> +++ b/drivers/dma/idxd/device.c
> @@ -188,16 +188,32 @@ int idxd_wq_enable(struct idxd_wq *wq)
> return 0;
> }
>
> + /*
> + * A device reset command disables all WQs in hardware. If issued
> + * while a WQ is being enabled, it can cause a mismatch between the
> + * software and hardware states.
> + *
> + * When a hardware error occurs, the IDXD driver calls
> + * idxd_device_reset() to send a reset command and clear the state
> + * (wq->state) of all WQs. It then uses wq_enable_map to re-enable
> + * them and ensure consistency between the software and hardware states.
> + *
> + * To avoid inconsistency between software and hardware states,
> + * issue the IDXD_CMD_ENABLE_WQ command as the final step.
> + */
> + wq->state = IDXD_WQ_ENABLED;
> + set_bit(wq->id, idxd->wq_enable_map);
> +
> idxd_cmd_exec(idxd, IDXD_CMD_ENABLE_WQ, wq->id, &status);
>
> if (status != IDXD_CMDSTS_SUCCESS &&
> status != IDXD_CMDSTS_ERR_WQ_ENABLED) {
> + clear_bit(wq->id, idxd->wq_enable_map);
> + wq->state = IDXD_WQ_DISABLED;
> dev_dbg(dev, "WQ enable failed: %#x\n", status);
> return -ENXIO;
> }
>
> - wq->state = IDXD_WQ_ENABLED;
> - set_bit(wq->id, idxd->wq_enable_map);
> dev_dbg(dev, "WQ %d enabled\n", wq->id);
> return 0;
> }
next prev parent reply other threads:[~2025-05-22 14:55 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-22 6:33 [PATCH v1 0/2] dmaengine: idxd: minor fixes for idxd driver Shuai Xue
2025-05-22 6:33 ` [PATCH v1 1/2] dmaengine: idxd: Fix race condition between WQ enable and reset paths Shuai Xue
2025-05-22 14:55 ` Dave Jiang [this message]
2025-05-23 5:20 ` Shuai Xue
2025-05-23 14:54 ` Dave Jiang
2025-05-24 5:40 ` Shuai Xue
2025-05-28 2:21 ` Vinicius Costa Gomes
2025-06-03 14:32 ` Dave Jiang
2025-06-04 8:55 ` Shuai Xue
2025-06-04 14:19 ` Dave Jiang
2025-06-05 7:40 ` Shuai Xue
2025-06-06 14:32 ` Dave Jiang
2025-06-16 1:54 ` Shuai Xue
2025-06-16 15:22 ` Dave Jiang
2025-05-22 6:33 ` [PATCH v1 2/2] dmaengine: idxd: fix potential NULL pointer dereference on wq setup error path Shuai Xue
2025-05-22 14:56 ` Dave Jiang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=a03e4f97-2289-4af7-8bfc-ad2d38ec8677@intel.com \
--to=dave.jiang@intel.com \
--cc=colin.i.king@gmail.com \
--cc=dmaengine@vger.kernel.org \
--cc=fenghuay@nvidia.com \
--cc=linux-kernel@vger.kernel.org \
--cc=vinicius.gomes@intel.com \
--cc=vkoul@kernel.org \
--cc=xueshuai@linux.alibaba.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.