* [PATCH v1 0/2] dmaengine: idxd: minor fixes for idxd driver
@ 2025-05-22  6:33 Shuai Xue
  2025-05-22  6:33 ` [PATCH v1 1/2] dmaengine: idxd: Fix race condition between WQ enable and reset paths Shuai Xue
  2025-05-22  6:33 ` [PATCH v1 2/2] dmaengine: idxd: fix potential NULL pointer dereference on wq setup error path Shuai Xue
  0 siblings, 2 replies; 16+ messages in thread
From: Shuai Xue @ 2025-05-22  6:33 UTC (permalink / raw)
  To: vinicius.gomes, dave.jiang, fenghuay, vkoul
  Cc: xueshuai, dmaengine, colin.i.king, linux-kernel

Add two minor fixes for idxd driver.

Shuai Xue (2):
  dmaengine: idxd: Fix race condition between WQ enable and reset paths
  dmaengine: idxd: fix potential NULL pointer dereference on wq setup
    error path

 drivers/dma/idxd/device.c | 20 ++++++++++++++++++--
 drivers/dma/idxd/init.c   |  3 ++-
 2 files changed, 20 insertions(+), 3 deletions(-)

-- 
2.43.5



* [PATCH v1 1/2] dmaengine: idxd: Fix race condition between WQ enable and reset paths
  2025-05-22  6:33 [PATCH v1 0/2] dmaengine: idxd: minor fixes for idxd driver Shuai Xue
@ 2025-05-22  6:33 ` Shuai Xue
  2025-05-22 14:55   ` Dave Jiang
  2025-05-22  6:33 ` [PATCH v1 2/2] dmaengine: idxd: fix potential NULL pointer dereference on wq setup error path Shuai Xue
  1 sibling, 1 reply; 16+ messages in thread
From: Shuai Xue @ 2025-05-22  6:33 UTC (permalink / raw)
  To: vinicius.gomes, dave.jiang, fenghuay, vkoul
  Cc: xueshuai, dmaengine, colin.i.king, linux-kernel

A device reset command disables all WQs in hardware. If issued while a WQ
is being enabled, it can cause a mismatch between the software and hardware
states.

When a hardware error occurs, the IDXD driver calls idxd_device_reset() to
send a reset command and clear the state (wq->state) of all WQs. It then
uses wq_enable_map (a bitmask tracking enabled WQs) to re-enable them and
ensure consistency between the software and hardware states.

However, a race condition exists between the WQ enable path and the
reset/recovery path. For example:

A: WQ enable path                   B: Reset and recovery path
------------------                 ------------------------
a1. issue IDXD_CMD_ENABLE_WQ
                                   b1. issue IDXD_CMD_RESET_DEVICE
                                   b2. clear wq->state
                                   b3. check wq_enable_map bit, not set
a2. set wq->state = IDXD_WQ_ENABLED
a3. set wq_enable_map

In this case, b1 issues a reset command that disables all WQs in hardware.
Since b3 checks wq_enable_map before a2, it doesn't re-enable the WQ,
leading to an inconsistency between wq->state (software) and the actual
hardware state (IDXD_WQ_DISABLED).
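
For illustration only, the same lost-update window can be reproduced in a
small userspace program (a hedged analogue using pthreads and C11 atomics,
not the driver code): the recovery thread samples the enable flag before the
enabling thread publishes it, so the re-enable is skipped.

/* Illustrative analogue of the race (userspace, pthreads), not idxd code. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int hw_enabled;      /* stands in for the hardware WQ state   */
static atomic_int wq_enable_map;   /* stands in for the driver's bitmap bit */

static void *enable_path(void *arg)
{
	atomic_store(&hw_enabled, 1);     /* a1: issue the enable command  */
	atomic_store(&wq_enable_map, 1);  /* a2/a3: publish software state */
	return NULL;
}

static void *reset_path(void *arg)
{
	atomic_store(&hw_enabled, 0);     /* b1/b2: reset disables the WQ  */
	if (atomic_load(&wq_enable_map))  /* b3: re-enable only mapped WQs */
		atomic_store(&hw_enabled, 1);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, enable_path, NULL);
	pthread_create(&b, NULL, reset_path, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);

	/* If b3 runs between a1 and a2, the run ends with map=1 but hw=0:
	 * software believes the WQ is enabled while the "hardware" is not,
	 * which is the mismatch described above. */
	printf("map=%d hw=%d\n", atomic_load(&wq_enable_map),
	       atomic_load(&hw_enabled));
	return 0;
}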

To fix this, the WQ state should be set to IDXD_WQ_ENABLED before issuing
the IDXD_CMD_ENABLE_WQ command. This ensures the software state aligns with
the expected hardware behavior during resets:

A: WQ enable path                   B: Reset and recovery path
------------------                 ------------------------
                                   b1. issue IDXD_CMD_RESET_DEVICE
                                   b2. clear wq->state
a1. set wq->state = IDXD_WQ_ENABLED
a2. set wq_enable_map
                                   b3. check wq_enable_map bit, true
                                   b4. check wq state, return
a3. issue IDXD_CMD_ENABLE_WQ

By updating the state early, the reset path can safely re-enable the WQ
based on wq_enable_map.

A corner case arises when both paths attempt to enable the same WQ:

A: WQ enable path                   B: Reset and recovery path
------------------                 ------------------------
                                   b1. issue IDXD_CMD_RESET_DEVICE
                                   b2. clear wq->state
a1. set wq->state = IDXD_WQ_ENABLED
a2. set wq_enable_map
                                   b3. check wq_enable_map bit
                                   b4. check wq state, not reset
                                   b5. issue IDXD_CMD_ENABLE_WQ
a3. issue IDXD_CMD_ENABLE_WQ

Here, the command status (CMDSTS) returns IDXD_CMDSTS_ERR_WQ_ENABLED,
but the driver treats it as success (IDXD_CMDSTS_SUCCESS), ensuring the WQ
remains enabled.

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
 drivers/dma/idxd/device.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c
index 5cf419fe6b46..b424c3c8f359 100644
--- a/drivers/dma/idxd/device.c
+++ b/drivers/dma/idxd/device.c
@@ -188,16 +188,32 @@ int idxd_wq_enable(struct idxd_wq *wq)
 		return 0;
 	}
 
+	/*
+	 * A device reset command disables all WQs in hardware. If issued
+	 * while a WQ is being enabled, it can cause a mismatch between the
+	 * software and hardware states.
+	 *
+	 * When a hardware error occurs, the IDXD driver calls
+	 * idxd_device_reset() to send a reset command and clear the state
+	 * (wq->state) of all WQs. It then uses wq_enable_map to re-enable
+	 * them and ensure consistency between the software and hardware states.
+	 *
+	 * To avoid inconsistency between software and hardware states,
+	 * issue the IDXD_CMD_ENABLE_WQ command as the final step.
+	 */
+	wq->state = IDXD_WQ_ENABLED;
+	set_bit(wq->id, idxd->wq_enable_map);
+
 	idxd_cmd_exec(idxd, IDXD_CMD_ENABLE_WQ, wq->id, &status);
 
 	if (status != IDXD_CMDSTS_SUCCESS &&
 	    status != IDXD_CMDSTS_ERR_WQ_ENABLED) {
+		clear_bit(wq->id, idxd->wq_enable_map);
+		wq->state = IDXD_WQ_DISABLED;
 		dev_dbg(dev, "WQ enable failed: %#x\n", status);
 		return -ENXIO;
 	}
 
-	wq->state = IDXD_WQ_ENABLED;
-	set_bit(wq->id, idxd->wq_enable_map);
 	dev_dbg(dev, "WQ %d enabled\n", wq->id);
 	return 0;
 }
-- 
2.43.5



* [PATCH v1 2/2] dmaengine: idxd: fix potential NULL pointer dereference on wq setup error path
  2025-05-22  6:33 [PATCH v1 0/2] dmaengine: idxd: minor fixes for idxd driver Shuai Xue
  2025-05-22  6:33 ` [PATCH v1 1/2] dmaengine: idxd: Fix race condition between WQ enable and reset paths Shuai Xue
@ 2025-05-22  6:33 ` Shuai Xue
  2025-05-22 14:56   ` Dave Jiang
  1 sibling, 1 reply; 16+ messages in thread
From: Shuai Xue @ 2025-05-22  6:33 UTC (permalink / raw)
  To: vinicius.gomes, dave.jiang, fenghuay, vkoul
  Cc: xueshuai, dmaengine, colin.i.king, linux-kernel

If wq allocation fails during the initial iteration of the loop,
`conf_dev` is still NULL.  However, the existing error handling via
`goto err` would attempt to call `put_device(conf_dev)`, leading to a
NULL pointer dereference.

This issue occurs because there's no dedicated error label for the WQ
allocation failure case, causing it to fall through to an incorrect
error path.

Fix this by introducing a new error label `err_wq`, ensuring that we
only invoke `put_device(conf_dev)` when `conf_dev` is valid.
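
The fix follows the usual "unwind with dedicated error labels" idiom. A
minimal userspace analogue (a hedged illustration, not the idxd code) shows
why a failed allocation must skip the label that tears down the current slot:

/* Hedged userspace analogue of the error-label fix, not the idxd code. */
#include <stdlib.h>
#include <string.h>

struct item { int id; char *name; };

static int setup_items(struct item **items, int n)
{
	struct item *it;
	int i, rc;

	for (i = 0; i < n; i++) {
		it = calloc(1, sizeof(*it));
		if (!it) {
			rc = -1;
			goto err_item;	/* nothing to tear down for this slot */
		}

		it->name = strdup("item");
		if (!it->name) {
			rc = -1;
			goto err;	/* the current slot must be torn down */
		}
		it->id = i;
		items[i] = it;
	}
	return 0;

err:
	free(it->name);		/* dereferences 'it': must never be reached   */
	free(it);		/* with it == NULL, hence the dedicated label */
err_item:
	while (--i >= 0) {	/* unwind only the fully constructed slots    */
		free(items[i]->name);
		free(items[i]);
		items[i] = NULL;
	}
	return rc;
}

int main(void)
{
	struct item *items[4] = { NULL };

	return setup_items(items, 4) ? 1 : 0;	/* leaks reclaimed at exit */
}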

Fixes: 3fd2f4bc010c ("dmaengine: idxd: fix memory leak in error handling path of idxd_setup_wqs")
Cc: stable@vger.kernel.org
Reported-by: Colin King <colin.i.king@gmail.com>
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
 drivers/dma/idxd/init.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index 760b7d81fcd8..bf57ad30d613 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -196,7 +196,7 @@ static int idxd_setup_wqs(struct idxd_device *idxd)
 		wq = kzalloc_node(sizeof(*wq), GFP_KERNEL, dev_to_node(dev));
 		if (!wq) {
 			rc = -ENOMEM;
-			goto err;
+			goto err_wq;
 		}
 
 		idxd_dev_set_type(&wq->idxd_dev, IDXD_DEV_WQ);
@@ -246,6 +246,7 @@ static int idxd_setup_wqs(struct idxd_device *idxd)
 	put_device(conf_dev);
 	kfree(wq);
 
+err_wq:
 	while (--i >= 0) {
 		wq = idxd->wqs[i];
 		if (idxd->hw.wq_cap.op_config)
-- 
2.43.5



* Re: [PATCH v1 1/2] dmaengine: idxd: Fix race condition between WQ enable and reset paths
  2025-05-22  6:33 ` [PATCH v1 1/2] dmaengine: idxd: Fix race condition between WQ enable and reset paths Shuai Xue
@ 2025-05-22 14:55   ` Dave Jiang
  2025-05-23  5:20     ` Shuai Xue
  0 siblings, 1 reply; 16+ messages in thread
From: Dave Jiang @ 2025-05-22 14:55 UTC (permalink / raw)
  To: Shuai Xue, vinicius.gomes, fenghuay, vkoul
  Cc: dmaengine, colin.i.king, linux-kernel



On 5/21/25 11:33 PM, Shuai Xue wrote:
> A device reset command disables all WQs in hardware. If issued while a WQ
> is being enabled, it can cause a mismatch between the software and hardware
> states.
> 
> When a hardware error occurs, the IDXD driver calls idxd_device_reset() to
> send a reset command and clear the state (wq->state) of all WQs. It then
> uses wq_enable_map (a bitmask tracking enabled WQs) to re-enable them and
> ensure consistency between the software and hardware states.
> 
> However, a race condition exists between the WQ enable path and the
> reset/recovery path. For example:
> 
> A: WQ enable path                   B: Reset and recovery path
> ------------------                 ------------------------
> a1. issue IDXD_CMD_ENABLE_WQ
>                                    b1. issue IDXD_CMD_RESET_DEVICE
>                                    b2. clear wq->state
>                                    b3. check wq_enable_map bit, not set
> a2. set wq->state = IDXD_WQ_ENABLED
> a3. set wq_enable_map
> 
> In this case, b1 issues a reset command that disables all WQs in hardware.
> Since b3 checks wq_enable_map before a2, it doesn't re-enable the WQ,
> leading to an inconsistency between wq->state (software) and the actual
> hardware state (IDXD_WQ_DISABLED).


Would it lessen the complication to just have the wq enable path grab the device lock before proceeding?

DJ

> 
> To fix this, the WQ state should be set to IDXD_WQ_ENABLED before issuing
> the IDXD_CMD_ENABLE_WQ command. This ensures the software state aligns with
> the expected hardware behavior during resets:
> 
> A: WQ enable path                   B: Reset and recovery path
> ------------------                 ------------------------
>                                    b1. issue IDXD_CMD_RESET_DEVICE
>                                    b2. clear wq->state
> a1. set wq->state = IDXD_WQ_ENABLED
> a2. set wq_enable_map
>                                    b3. check wq_enable_map bit, true
>                                    b4. check wq state, return
> a3. issue IDXD_CMD_ENABLE_WQ
> 
> By updating the state early, the reset path can safely re-enable the WQ
> based on wq_enable_map.
> 
> A corner case arises when both paths attempt to enable the same WQ:
> 
> A: WQ enable path                   B: Reset and recovery path
> ------------------                 ------------------------
>                                    b1. issue IDXD_CMD_RESET_DEVICE
>                                    b2. clear wq->state
> a1. set wq->state = IDXD_WQ_ENABLED
> a2. set wq_enable_map
>                                    b3. check wq_enable_map bit
>                                    b4. check wq state, not reset
>                                    b5. issue IDXD_CMD_ENABLE_WQ
> a3. issue IDXD_CMD_ENABLE_WQ
> 
> Here, the command status (CMDSTS) returns IDXD_CMDSTS_ERR_WQ_ENABLED,
> but the driver treats it as success (IDXD_CMDSTS_SUCCESS), ensuring the WQ
> remains enabled.
> 
> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
> ---
>  drivers/dma/idxd/device.c | 20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c
> index 5cf419fe6b46..b424c3c8f359 100644
> --- a/drivers/dma/idxd/device.c
> +++ b/drivers/dma/idxd/device.c
> @@ -188,16 +188,32 @@ int idxd_wq_enable(struct idxd_wq *wq)
>  		return 0;
>  	}
>  
> +	/*
> +	 * A device reset command disables all WQs in hardware. If issued
> +	 * while a WQ is being enabled, it can cause a mismatch between the
> +	 * software and hardware states.
> +	 *
> +	 * When a hardware error occurs, the IDXD driver calls
> +	 * idxd_device_reset() to send a reset command and clear the state
> +	 * (wq->state) of all WQs. It then uses wq_enable_map to re-enable
> +	 * them and ensure consistency between the software and hardware states.
> +	 *
> +	 * To avoid inconsistency between software and hardware states,
> +	 * issue the IDXD_CMD_ENABLE_WQ command as the final step.
> +	 */
> +	wq->state = IDXD_WQ_ENABLED;
> +	set_bit(wq->id, idxd->wq_enable_map);
> +
>  	idxd_cmd_exec(idxd, IDXD_CMD_ENABLE_WQ, wq->id, &status);
>  
>  	if (status != IDXD_CMDSTS_SUCCESS &&
>  	    status != IDXD_CMDSTS_ERR_WQ_ENABLED) {
> +		clear_bit(wq->id, idxd->wq_enable_map);
> +		wq->state = IDXD_WQ_DISABLED;
>  		dev_dbg(dev, "WQ enable failed: %#x\n", status);
>  		return -ENXIO;
>  	}
>  
> -	wq->state = IDXD_WQ_ENABLED;
> -	set_bit(wq->id, idxd->wq_enable_map);
>  	dev_dbg(dev, "WQ %d enabled\n", wq->id);
>  	return 0;
>  }



* Re: [PATCH v1 2/2] dmaengine: idxd: fix potential NULL pointer dereference on wq setup error path
  2025-05-22  6:33 ` [PATCH v1 2/2] dmaengine: idxd: fix potential NULL pointer dereference on wq setup error path Shuai Xue
@ 2025-05-22 14:56   ` Dave Jiang
  0 siblings, 0 replies; 16+ messages in thread
From: Dave Jiang @ 2025-05-22 14:56 UTC (permalink / raw)
  To: Shuai Xue, vinicius.gomes, fenghuay, vkoul
  Cc: dmaengine, colin.i.king, linux-kernel



On 5/21/25 11:33 PM, Shuai Xue wrote:
> If wq allocation fails during the initial iteration of the loop,
> `conf_dev` is still NULL.  However, the existing error handling via
> `goto err` would attempt to call `put_device(conf_dev)`, leading to a
> NULL pointer dereference.
> 
> This issue occurs because there's no dedicated error label for the WQ
> allocation failure case, causing it to fall through to an incorrect
> error path.
> 
> Fix this by introducing a new error label `err_wq`, ensuring that we
> only invoke `put_device(conf_dev)` when `conf_dev` is valid.
> 
> Fixes: 3fd2f4bc010c ("dmaengine: idxd: fix memory leak in error handling path of idxd_setup_wqs")
> Cc: stable@vger.kernel.org
> Reported-by: Colin King <colin.i.king@gmail.com>
> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
>  drivers/dma/idxd/init.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
> index 760b7d81fcd8..bf57ad30d613 100644
> --- a/drivers/dma/idxd/init.c
> +++ b/drivers/dma/idxd/init.c
> @@ -196,7 +196,7 @@ static int idxd_setup_wqs(struct idxd_device *idxd)
>  		wq = kzalloc_node(sizeof(*wq), GFP_KERNEL, dev_to_node(dev));
>  		if (!wq) {
>  			rc = -ENOMEM;
> -			goto err;
> +			goto err_wq;
>  		}
>  
>  		idxd_dev_set_type(&wq->idxd_dev, IDXD_DEV_WQ);
> @@ -246,6 +246,7 @@ static int idxd_setup_wqs(struct idxd_device *idxd)
>  	put_device(conf_dev);
>  	kfree(wq);
>  
> +err_wq:
>  	while (--i >= 0) {
>  		wq = idxd->wqs[i];
>  		if (idxd->hw.wq_cap.op_config)



* Re: [PATCH v1 1/2] dmaengine: idxd: Fix race condition between WQ enable and reset paths
  2025-05-22 14:55   ` Dave Jiang
@ 2025-05-23  5:20     ` Shuai Xue
  2025-05-23 14:54       ` Dave Jiang
  0 siblings, 1 reply; 16+ messages in thread
From: Shuai Xue @ 2025-05-23  5:20 UTC (permalink / raw)
  To: Dave Jiang, vinicius.gomes, fenghuay, vkoul
  Cc: dmaengine, colin.i.king, linux-kernel



On 2025/5/22 22:55, Dave Jiang wrote:
> 
> 
> On 5/21/25 11:33 PM, Shuai Xue wrote:
>> A device reset command disables all WQs in hardware. If issued while a WQ
>> is being enabled, it can cause a mismatch between the software and hardware
>> states.
>>
>> When a hardware error occurs, the IDXD driver calls idxd_device_reset() to
>> send a reset command and clear the state (wq->state) of all WQs. It then
>> uses wq_enable_map (a bitmask tracking enabled WQs) to re-enable them and
>> ensure consistency between the software and hardware states.
>>
>> However, a race condition exists between the WQ enable path and the
>> reset/recovery path. For example:
>>
>> A: WQ enable path                   B: Reset and recovery path
>> ------------------                 ------------------------
>> a1. issue IDXD_CMD_ENABLE_WQ
>>                                     b1. issue IDXD_CMD_RESET_DEVICE
>>                                     b2. clear wq->state
>>                                     b3. check wq_enable_map bit, not set
>> a2. set wq->state = IDXD_WQ_ENABLED
>> a3. set wq_enable_map
>>
>> In this case, b1 issues a reset command that disables all WQs in hardware.
>> Since b3 checks wq_enable_map before a2, it doesn't re-enable the WQ,
>> leading to an inconsistency between wq->state (software) and the actual
>> hardware state (IDXD_WQ_DISABLED).
> 
> 
> Would it lessen the complication to just have wq enable path grab the device lock before proceeding?
> 
> DJ

Yep, how about adding a spin lock to the WQ enable and device reset paths?

diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c
index 38633ec5b60e..c0dc904b2a94 100644
--- a/drivers/dma/idxd/device.c
+++ b/drivers/dma/idxd/device.c
@@ -203,6 +203,29 @@ int idxd_wq_enable(struct idxd_wq *wq)
  }
  EXPORT_SYMBOL_GPL(idxd_wq_enable);
  
+/*
+ * This function enables a WQ in hardware and updates the driver-maintained
+ * wq->state to IDXD_WQ_ENABLED. It takes the dev_lock itself to prevent
+ * race conditions with IDXD_CMD_RESET_DEVICE, which could otherwise
+ * disable the WQ without the driver's state being properly
+ * updated.
+ *
+ * For IDXD_CMD_DISABLE_DEVICE, this function is safe because it is only
+ * called after the WQ has been explicitly disabled, so no concurrency
+ * issues arise.
+ */
+int idxd_wq_enable_locked(struct idxd_wq *wq)
+{
+       struct idxd_device *idxd = wq->idxd;
+       int ret;
+
+       spin_lock(&idxd->dev_lock);
+       ret = idxd_wq_enable(wq);
+       spin_unlock(&idxd->dev_lock);
+
+       return ret;
+}
+
  int idxd_wq_disable(struct idxd_wq *wq, bool reset_config)
  {
         struct idxd_device *idxd = wq->idxd;
@@ -330,7 +353,7 @@ int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid)
  
         __idxd_wq_set_pasid_locked(wq, pasid);
  
-       rc = idxd_wq_enable(wq);
+       rc = idxd_wq_enable_locked(wq);
         if (rc < 0)
                 return rc;
  
@@ -380,7 +403,7 @@ int idxd_wq_disable_pasid(struct idxd_wq *wq)
         iowrite32(wqcfg.bits[WQCFG_PASID_IDX], idxd->reg_base + offset);
         spin_unlock(&idxd->dev_lock);
  
-       rc = idxd_wq_enable(wq);
+       rc = idxd_wq_enable_locked(wq);
         if (rc < 0)
                 return rc;
  
@@ -644,7 +667,11 @@ int idxd_device_disable(struct idxd_device *idxd)
  
  void idxd_device_reset(struct idxd_device *idxd)
  {
+
+       spin_lock(&idxd->dev_lock);
         idxd_cmd_exec(idxd, IDXD_CMD_RESET_DEVICE, 0, NULL);
+       spin_unlock(&idxd->dev_lock);
+

(The dev_lock should also be applied in idxd_wq_enable(); I did not paste that here.)

Also, I found a new bug: idxd_device_config() is called without
holding idxd->dev_lock.

idxd_device_config() explicitly asserts that idxd->dev_lock is held (see the sketch after the diff below).

+++ b/drivers/dma/idxd/irq.c
@@ -33,12 +33,17 @@ static void idxd_device_reinit(struct work_struct *work)
  {
         struct idxd_device *idxd = container_of(work, struct idxd_device, work);
         struct device *dev = &idxd->pdev->dev;
-       int rc, i;
+       int rc = 0, i;
  
         idxd_device_reset(idxd);
-       rc = idxd_device_config(idxd);
-       if (rc < 0)
+       spin_lock(&idxd->dev_lock);
+       if (test_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags))
+               rc = idxd_device_config(idxd);
+       spin_unlock(&idxd->dev_lock);
+       if (rc < 0) {
+               dev_err(dev, "Reinit: %s config fails\n", dev_name(idxd_confdev(idxd)));
                 goto out;
+       }
  
         rc = idxd_device_enable(idxd);
         if (rc < 0)
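
The assertion mentioned above is typically done with lockdep_assert_held(),
which warns under CONFIG_PROVE_LOCKING when the caller does not hold the
lock. A generic kernel-style sketch of that pattern (not a quote of the
driver):

/* Generic kernel-style sketch of a held-lock assertion, not idxd code. */
#include <linux/spinlock.h>
#include <linux/lockdep.h>

static void example_config(spinlock_t *lock)
{
	lockdep_assert_held(lock);	/* warns if the caller does not hold it */

	/* ... touch the state that the lock protects ... */
}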

Please correct me if I missed anything.

Thanks.
Shuai



* Re: [PATCH v1 1/2] dmaengine: idxd: Fix race condition between WQ enable and reset paths
  2025-05-23  5:20     ` Shuai Xue
@ 2025-05-23 14:54       ` Dave Jiang
  2025-05-24  5:40         ` Shuai Xue
  0 siblings, 1 reply; 16+ messages in thread
From: Dave Jiang @ 2025-05-23 14:54 UTC (permalink / raw)
  To: Shuai Xue, vinicius.gomes, fenghuay, vkoul
  Cc: dmaengine, colin.i.king, linux-kernel



On 5/22/25 10:20 PM, Shuai Xue wrote:
> 
> 
> On 2025/5/22 22:55, Dave Jiang wrote:
>>
>>
>> On 5/21/25 11:33 PM, Shuai Xue wrote:
>>> A device reset command disables all WQs in hardware. If issued while a WQ
>>> is being enabled, it can cause a mismatch between the software and hardware
>>> states.
>>>
>>> When a hardware error occurs, the IDXD driver calls idxd_device_reset() to
>>> send a reset command and clear the state (wq->state) of all WQs. It then
>>> uses wq_enable_map (a bitmask tracking enabled WQs) to re-enable them and
>>> ensure consistency between the software and hardware states.
>>>
>>> However, a race condition exists between the WQ enable path and the
>>> reset/recovery path. For example:
>>>
>>> A: WQ enable path                   B: Reset and recovery path
>>> ------------------                 ------------------------
>>> a1. issue IDXD_CMD_ENABLE_WQ
>>>                                     b1. issue IDXD_CMD_RESET_DEVICE
>>>                                     b2. clear wq->state
>>>                                     b3. check wq_enable_map bit, not set
>>> a2. set wq->state = IDXD_WQ_ENABLED
>>> a3. set wq_enable_map
>>>
>>> In this case, b1 issues a reset command that disables all WQs in hardware.
>>> Since b3 checks wq_enable_map before a2, it doesn't re-enable the WQ,
>>> leading to an inconsistency between wq->state (software) and the actual
>>> hardware state (IDXD_WQ_DISABLED).
>>
>>
>> Would it lessen the complication to just have wq enable path grab the device lock before proceeding?
>>
>> DJ
> 
> Yep, how about add a spin lock to enable wq and reset device path.
> 
> diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c
> index 38633ec5b60e..c0dc904b2a94 100644
> --- a/drivers/dma/idxd/device.c
> +++ b/drivers/dma/idxd/device.c
> @@ -203,6 +203,29 @@ int idxd_wq_enable(struct idxd_wq *wq)
>  }
>  EXPORT_SYMBOL_GPL(idxd_wq_enable);
>  
> +/*
> + * This function enables a WQ in hareware and updates the driver maintained
> + * wq->state to IDXD_WQ_ENABLED. It should be called with the dev_lock held
> + * to prevent race conditions with IDXD_CMD_RESET_DEVICE, which could
> + * otherwise disable the WQ without the driver's state being properly
> + * updated.
> + *
> + * For IDXD_CMD_DISABLE_DEVICE, this function is safe because it is only
> + * called after the WQ has been explicitly disabled, so no concurrency
> + * issues arise.
> + */
> +int idxd_wq_enable_locked(struct idxd_wq *wq)
> +{
> +       struct idxd_device *idxd = wq->idxd;
> +       int ret;
> +
> +       spin_lock(&idxd->dev_lock);

Let's start using the new cleanup macro going forward:
guard(spinlock)(&idxd->dev_lock);

On a side note, there's been a cleanup on my mind WRT this driver's locking. I think we can replace idxd->dev_lock with the idxd_confdev(idxd) device lock. You can end up just doing:
guard(device)(idxd_confdev(idxd));

And also drop the wq->wq_lock and replace it with the wq_confdev(wq) device lock:
guard(device)(wq_confdev(wq));

If you are up for it that is. 
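
For reference, guard() from <linux/cleanup.h> drops the lock automatically
when the guard goes out of scope (it is built on the compiler's cleanup
attribute), so the helper sketched earlier in the thread could shrink to
something like this hedged, untested sketch (assuming idxd_wq_enable() as
the callee):

/* Hedged sketch of the guard()-based variant, untested. */
#include <linux/cleanup.h>
#include <linux/spinlock.h>

int idxd_wq_enable_locked(struct idxd_wq *wq)
{
	struct idxd_device *idxd = wq->idxd;

	guard(spinlock)(&idxd->dev_lock);	/* released on every return path */
	return idxd_wq_enable(wq);
}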


> +       ret = idxd_wq_enable_locked(wq);
> +       spin_unlock(&idxd->dev_lock);
> +
> +       return ret;
> +}
> +
>  int idxd_wq_disable(struct idxd_wq *wq, bool reset_config)
>  {
>         struct idxd_device *idxd = wq->idxd;
> @@ -330,7 +353,7 @@ int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid)
>  
>         __idxd_wq_set_pasid_locked(wq, pasid);
>  
> -       rc = idxd_wq_enable(wq);
> +       rc = idxd_wq_enable_locked(wq);
>         if (rc < 0)
>                 return rc;
>  
> @@ -380,7 +403,7 @@ int idxd_wq_disable_pasid(struct idxd_wq *wq)
>         iowrite32(wqcfg.bits[WQCFG_PASID_IDX], idxd->reg_base + offset);
>         spin_unlock(&idxd->dev_lock);
>  
> -       rc = idxd_wq_enable(wq);
> +       rc = idxd_wq_enable_locked(wq);
>         if (rc < 0)
>                 return rc;
>  
> @@ -644,7 +667,11 @@ int idxd_device_disable(struct idxd_device *idxd)
>  
>  void idxd_device_reset(struct idxd_device *idxd)
>  {
> +
> +       spin_lock(&idxd->dev_lock);
>         idxd_cmd_exec(idxd, IDXD_CMD_RESET_DEVICE, 0, NULL);
> +       spin_unlock(&idxd->dev_lock);
> +
> 

I think you just need the wq enable path locked, and also, in idxd_device_clear_state(), to extend the lock to the whole function. Locking the reset function around just the command execution won't protect the wq enable path against the software state changes on the reset side.

DJ

> (The dev_lock should also apply to idxd_wq_enable(), I did not paste here)
> 
> Also, I found a new bug that idxd_device_config() is called without
> hold idxd->dev_lock.
> > idxd_device_config() explictly asserts the hold of idxd->dev_lock.
> 
> +++ b/drivers/dma/idxd/irq.c
> @@ -33,12 +33,17 @@ static void idxd_device_reinit(struct work_struct *work)
>  {
>         struct idxd_device *idxd = container_of(work, struct idxd_device, work);
>         struct device *dev = &idxd->pdev->dev;
> -       int rc, i;
> +       int rc = 0, i;
>  
>         idxd_device_reset(idxd);
> -       rc = idxd_device_config(idxd);
> -       if (rc < 0)
> +       spin_lock(&idxd->dev_lock);
I wonder if you should also just lock the idxd_device_reset() and the idxd_device_enable() as well in this case, as you don't want anything to interfere with the entire reinit path.

> +       if (test_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags))
> +               rc = idxd_device_config(idxd);
> +       spin_unlock(&idxd->dev_lock);
> +       if (rc < 0) {
> +               dev_err(dev, "Reinit: %s config fails\n", dev_name(idxd_confdev(idxd)));
>                 goto out;
> +       }
>  
>         rc = idxd_device_enable(idxd);
>         if (rc < 0)
> 
> Please correct me if I missed anything.
> 
> Thanks.
> Shuai
> 



* Re: [PATCH v1 1/2] dmaengine: idxd: Fix race condition between WQ enable and reset paths
  2025-05-23 14:54       ` Dave Jiang
@ 2025-05-24  5:40         ` Shuai Xue
  2025-05-28  2:21           ` Vinicius Costa Gomes
  0 siblings, 1 reply; 16+ messages in thread
From: Shuai Xue @ 2025-05-24  5:40 UTC (permalink / raw)
  To: Dave Jiang, vinicius.gomes, fenghuay, vkoul
  Cc: dmaengine, colin.i.king, linux-kernel



On 2025/5/23 22:54, Dave Jiang wrote:
> 
> 
> On 5/22/25 10:20 PM, Shuai Xue wrote:
>>
>>
>> On 2025/5/22 22:55, Dave Jiang wrote:
>>>
>>>
>>> On 5/21/25 11:33 PM, Shuai Xue wrote:
>>>> A device reset command disables all WQs in hardware. If issued while a WQ
>>>> is being enabled, it can cause a mismatch between the software and hardware
>>>> states.
>>>>
>>>> When a hardware error occurs, the IDXD driver calls idxd_device_reset() to
>>>> send a reset command and clear the state (wq->state) of all WQs. It then
>>>> uses wq_enable_map (a bitmask tracking enabled WQs) to re-enable them and
>>>> ensure consistency between the software and hardware states.
>>>>
>>>> However, a race condition exists between the WQ enable path and the
>>>> reset/recovery path. For example:
>>>>
>>>> A: WQ enable path                   B: Reset and recovery path
>>>> ------------------                 ------------------------
>>>> a1. issue IDXD_CMD_ENABLE_WQ
>>>>                                      b1. issue IDXD_CMD_RESET_DEVICE
>>>>                                      b2. clear wq->state
>>>>                                      b3. check wq_enable_map bit, not set
>>>> a2. set wq->state = IDXD_WQ_ENABLED
>>>> a3. set wq_enable_map
>>>>
>>>> In this case, b1 issues a reset command that disables all WQs in hardware.
>>>> Since b3 checks wq_enable_map before a2, it doesn't re-enable the WQ,
>>>> leading to an inconsistency between wq->state (software) and the actual
>>>> hardware state (IDXD_WQ_DISABLED).
>>>
>>>
>>> Would it lessen the complication to just have wq enable path grab the device lock before proceeding?
>>>
>>> DJ
>>
>> Yep, how about add a spin lock to enable wq and reset device path.
>>
>> diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c
>> index 38633ec5b60e..c0dc904b2a94 100644
>> --- a/drivers/dma/idxd/device.c
>> +++ b/drivers/dma/idxd/device.c
>> @@ -203,6 +203,29 @@ int idxd_wq_enable(struct idxd_wq *wq)
>>   }
>>   EXPORT_SYMBOL_GPL(idxd_wq_enable);
>>   
>> +/*
>> + * This function enables a WQ in hareware and updates the driver maintained
>> + * wq->state to IDXD_WQ_ENABLED. It should be called with the dev_lock held
>> + * to prevent race conditions with IDXD_CMD_RESET_DEVICE, which could
>> + * otherwise disable the WQ without the driver's state being properly
>> + * updated.
>> + *
>> + * For IDXD_CMD_DISABLE_DEVICE, this function is safe because it is only
>> + * called after the WQ has been explicitly disabled, so no concurrency
>> + * issues arise.
>> + */
>> +int idxd_wq_enable_locked(struct idxd_wq *wq)
>> +{
>> +       struct idxd_device *idxd = wq->idxd;
>> +       int ret;
>> +
>> +       spin_lock(&idxd->dev_lock);
> 
> Let's start using the new cleanup macro going forward:
> guard(spinlock)(&idxd->dev_lock);
> 
> On a side note, there's been a cleanup on my mind WRT this driver's locking. I think we can replace idxd->dev_lock with idxd_confdev(idxd) device lock. You can end up just do:
> guard(device)(idxd_confdev(idxd));

Then we would need to change the lock from a spinlock to a mutex?

> 
> And also drop the wq->wq_lock and replace with wq_confdev(wq) device lock:
> guard(device)(wq_confdev(wq));
> 
> If you are up for it that is.

We create a hierarchy: pdev -> idxd device -> wq device.
idxd_confdev(idxd) is the parent of wq_confdev(wq) because:

     (wq_confdev(wq))->parent = idxd_confdev(idxd);

Is it safe to grab the lock of idxd_confdev(idxd) while holding the
lock of wq_confdev(wq)?

We have plenty of code that takes the idxd->dev_lock spinlock while
holding wq->wq_lock.
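
The underlying worry is the classic ABBA ordering problem: if one path takes
the child's lock and then the parent's while another path takes them in the
opposite order, the two can deadlock. A minimal generic illustration with
pthread mutexes (not the driver code):

/* Generic ABBA illustration with pthread mutexes, not idxd code. */
#include <pthread.h>

static pthread_mutex_t parent_lock = PTHREAD_MUTEX_INITIALIZER; /* idxd conf dev */
static pthread_mutex_t child_lock  = PTHREAD_MUTEX_INITIALIZER; /* wq conf dev   */

/* Path 1: child first, then parent (what the wq code effectively does today). */
static void child_then_parent(void)
{
	pthread_mutex_lock(&child_lock);
	pthread_mutex_lock(&parent_lock);
	/* ... */
	pthread_mutex_unlock(&parent_lock);
	pthread_mutex_unlock(&child_lock);
}

/* Path 2: parent first, then child. If this ever runs concurrently with
 * path 1, each thread can hold one lock while waiting forever for the
 * other. The fix is one documented order (e.g. always parent before
 * child) on every path. */
static void parent_then_child(void)
{
	pthread_mutex_lock(&parent_lock);
	pthread_mutex_lock(&child_lock);
	/* ... */
	pthread_mutex_unlock(&child_lock);
	pthread_mutex_unlock(&parent_lock);
}

int main(void)
{
	child_then_parent();
	parent_then_child();	/* harmless when run sequentially */
	return 0;
}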

> 
> 
>> +       ret = idxd_wq_enable_locked(wq);
>> +       spin_unlock(&idxd->dev_lock);
>> +
>> +       return ret;
>> +}
>> +
>>   int idxd_wq_disable(struct idxd_wq *wq, bool reset_config)
>>   {
>>          struct idxd_device *idxd = wq->idxd;
>> @@ -330,7 +353,7 @@ int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid)
>>   
>>          __idxd_wq_set_pasid_locked(wq, pasid);
>>   
>> -       rc = idxd_wq_enable(wq);
>> +       rc = idxd_wq_enable_locked(wq);
>>          if (rc < 0)
>>                  return rc;
>>   
>> @@ -380,7 +403,7 @@ int idxd_wq_disable_pasid(struct idxd_wq *wq)
>>          iowrite32(wqcfg.bits[WQCFG_PASID_IDX], idxd->reg_base + offset);
>>          spin_unlock(&idxd->dev_lock);
>>   
>> -       rc = idxd_wq_enable(wq);
>> +       rc = idxd_wq_enable_locked(wq);
>>          if (rc < 0)
>>                  return rc;
>>   
>> @@ -644,7 +667,11 @@ int idxd_device_disable(struct idxd_device *idxd)
>>   
>>   void idxd_device_reset(struct idxd_device *idxd)
>>   {
>> +
>> +       spin_lock(&idxd->dev_lock);
>>          idxd_cmd_exec(idxd, IDXD_CMD_RESET_DEVICE, 0, NULL);
>> +       spin_unlock(&idxd->dev_lock);
>> +
>>
> 
> I think you just need the wq_enable locked and also in idxd_device_clear_state(), extend the lock to the whole function. Locking the reset function around just the command execute won't protect the wq enable path against the changing of the software states on the reset side.

Quite agreed.

> 
> DJ
> 
>> (The dev_lock should also apply to idxd_wq_enable(), I did not paste here)
>>
>> Also, I found a new bug that idxd_device_config() is called without
>> hold idxd->dev_lock.
>>> idxd_device_config() explictly asserts the hold of idxd->dev_lock.
>>
>> +++ b/drivers/dma/idxd/irq.c
>> @@ -33,12 +33,17 @@ static void idxd_device_reinit(struct work_struct *work)
>>   {
>>          struct idxd_device *idxd = container_of(work, struct idxd_device, work);
>>          struct device *dev = &idxd->pdev->dev;
>> -       int rc, i;
>> +       int rc = 0, i;
>>   
>>          idxd_device_reset(idxd);
>> -       rc = idxd_device_config(idxd);
>> -       if (rc < 0)
>> +       spin_lock(&idxd->dev_lock);
> I wonder if you should also just lock the idxd_device_reset() and the idxd_device_enable() as well in this case as you don't anything to interfere with the entire reinit path.

During reset, any operation that enables a wq should indeed be avoided,
but there isn't a suitable lock for that currently. idxd->dev_lock is a
lightweight lock, only used when updating the device state, and it is
taken while holding wq->wq_lock. Therefore, holding idxd->dev_lock alone
currently cannot provide mutual exclusion against the wq->wq_lock paths.

And the functions called from idxd_device_reset(), e.g.
idxd_device_clear_state(), also spin to take idxd->dev_lock.

A hacky way is to grab the wq_lock of all wqs before reinit, but
this is hardly elegant (:

Thanks.
Have a nice holiday!

Best regards,
Shuai



* Re: [PATCH v1 1/2] dmaengine: idxd: Fix race condition between WQ enable and reset paths
  2025-05-24  5:40         ` Shuai Xue
@ 2025-05-28  2:21           ` Vinicius Costa Gomes
  2025-06-03 14:32             ` Dave Jiang
  0 siblings, 1 reply; 16+ messages in thread
From: Vinicius Costa Gomes @ 2025-05-28  2:21 UTC (permalink / raw)
  To: Shuai Xue, Dave Jiang, fenghuay, vkoul
  Cc: dmaengine, colin.i.king, linux-kernel

Shuai Xue <xueshuai@linux.alibaba.com> writes:

> On 2025/5/23 22:54, Dave Jiang wrote:
>> 
>> 
>> On 5/22/25 10:20 PM, Shuai Xue wrote:
>>>
>>>
>>> On 2025/5/22 22:55, Dave Jiang wrote:
>>>>
>>>>
>>>> On 5/21/25 11:33 PM, Shuai Xue wrote:
>>>>> A device reset command disables all WQs in hardware. If issued while a WQ
>>>>> is being enabled, it can cause a mismatch between the software and hardware
>>>>> states.
>>>>>
>>>>> When a hardware error occurs, the IDXD driver calls idxd_device_reset() to
>>>>> send a reset command and clear the state (wq->state) of all WQs. It then
>>>>> uses wq_enable_map (a bitmask tracking enabled WQs) to re-enable them and
>>>>> ensure consistency between the software and hardware states.
>>>>>
>>>>> However, a race condition exists between the WQ enable path and the
>>>>> reset/recovery path. For example:
>>>>>
>>>>> A: WQ enable path                   B: Reset and recovery path
>>>>> ------------------                 ------------------------
>>>>> a1. issue IDXD_CMD_ENABLE_WQ
>>>>>                                      b1. issue IDXD_CMD_RESET_DEVICE
>>>>>                                      b2. clear wq->state
>>>>>                                      b3. check wq_enable_map bit, not set
>>>>> a2. set wq->state = IDXD_WQ_ENABLED
>>>>> a3. set wq_enable_map
>>>>>
>>>>> In this case, b1 issues a reset command that disables all WQs in hardware.
>>>>> Since b3 checks wq_enable_map before a2, it doesn't re-enable the WQ,
>>>>> leading to an inconsistency between wq->state (software) and the actual
>>>>> hardware state (IDXD_WQ_DISABLED).
>>>>
>>>>
>>>> Would it lessen the complication to just have wq enable path grab the device lock before proceeding?
>>>>
>>>> DJ
>>>
>>> Yep, how about add a spin lock to enable wq and reset device path.
>>>
>>> diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c
>>> index 38633ec5b60e..c0dc904b2a94 100644
>>> --- a/drivers/dma/idxd/device.c
>>> +++ b/drivers/dma/idxd/device.c
>>> @@ -203,6 +203,29 @@ int idxd_wq_enable(struct idxd_wq *wq)
>>>   }
>>>   EXPORT_SYMBOL_GPL(idxd_wq_enable);
>>>   
>>> +/*
>>> + * This function enables a WQ in hareware and updates the driver maintained
>>> + * wq->state to IDXD_WQ_ENABLED. It should be called with the dev_lock held
>>> + * to prevent race conditions with IDXD_CMD_RESET_DEVICE, which could
>>> + * otherwise disable the WQ without the driver's state being properly
>>> + * updated.
>>> + *
>>> + * For IDXD_CMD_DISABLE_DEVICE, this function is safe because it is only
>>> + * called after the WQ has been explicitly disabled, so no concurrency
>>> + * issues arise.
>>> + */
>>> +int idxd_wq_enable_locked(struct idxd_wq *wq)
>>> +{
>>> +       struct idxd_device *idxd = wq->idxd;
>>> +       int ret;
>>> +
>>> +       spin_lock(&idxd->dev_lock);
>> 
>> Let's start using the new cleanup macro going forward:
>> guard(spinlock)(&idxd->dev_lock);
>> 
>> On a side note, there's been a cleanup on my mind WRT this driver's locking. I think we can replace idxd->dev_lock with idxd_confdev(idxd) device lock. You can end up just do:
>> guard(device)(idxd_confdev(idxd));
>
> Then we need to replace the lock from spinlock to mutex lock?

We still need a (spin) lock that we could hold in interrupt contexts.
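
A hedged kernel-style sketch of the distinction (generic, not idxd code): a
mutex may sleep and so cannot be taken from hard interrupt context, while a
spinlock taken with interrupts disabled can.

/* Generic kernel-style sketch, not idxd code: the interrupt path needs a
 * spinlock; a mutex may sleep and cannot be taken here. */
#include <linux/spinlock.h>
#include <linux/interrupt.h>

static DEFINE_SPINLOCK(state_lock);

static irqreturn_t example_irq(int irq, void *data)
{
	unsigned long flags;

	spin_lock_irqsave(&state_lock, flags);	/* fine in interrupt context */
	/* ... update device/WQ state ... */
	spin_unlock_irqrestore(&state_lock, flags);

	/* mutex_lock() here would be a bug: mutexes may sleep. */
	return IRQ_HANDLED;
}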

>
>> 
>> And also drop the wq->wq_lock and replace with wq_confdev(wq) device lock:
>> guard(device)(wq_confdev(wq));
>> 
>> If you are up for it that is.
>
> We creates a hierarchy: pdev -> idxd device -> wq device.
> idxd_confdev(idxd) is the parent of wq_confdev(wq) because:
>
>      (wq_confdev(wq))->parent = idxd_confdev(idxd);
>
> Is it safe to grap lock of idxd_confdev(idxd) under hold
> lock of wq_confdev(wq)?
>
> We have mounts of code use spinlock of idxd->dev_lock under
> hold of wq->wq_lock.
>

I agree with Dave that the locking could be simplified, but I don't
think that we should hold this series because of that. That
simplification can be done later.

>> 
>> 
>>> +       ret = idxd_wq_enable_locked(wq);
>>> +       spin_unlock(&idxd->dev_lock);
>>> +
>>> +       return ret;
>>> +}
>>> +
>>>   int idxd_wq_disable(struct idxd_wq *wq, bool reset_config)
>>>   {
>>>          struct idxd_device *idxd = wq->idxd;
>>> @@ -330,7 +353,7 @@ int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid)
>>>   
>>>          __idxd_wq_set_pasid_locked(wq, pasid);
>>>   
>>> -       rc = idxd_wq_enable(wq);
>>> +       rc = idxd_wq_enable_locked(wq);
>>>          if (rc < 0)
>>>                  return rc;
>>>   
>>> @@ -380,7 +403,7 @@ int idxd_wq_disable_pasid(struct idxd_wq *wq)
>>>          iowrite32(wqcfg.bits[WQCFG_PASID_IDX], idxd->reg_base + offset);
>>>          spin_unlock(&idxd->dev_lock);
>>>   
>>> -       rc = idxd_wq_enable(wq);
>>> +       rc = idxd_wq_enable_locked(wq);
>>>          if (rc < 0)
>>>                  return rc;
>>>   
>>> @@ -644,7 +667,11 @@ int idxd_device_disable(struct idxd_device *idxd)
>>>   
>>>   void idxd_device_reset(struct idxd_device *idxd)
>>>   {
>>> +
>>> +       spin_lock(&idxd->dev_lock);
>>>          idxd_cmd_exec(idxd, IDXD_CMD_RESET_DEVICE, 0, NULL);
>>> +       spin_unlock(&idxd->dev_lock);
>>> +
>>>
>> 
>> I think you just need the wq_enable locked and also in idxd_device_clear_state(), extend the lock to the whole function. Locking the reset function around just the command execute won't protect the wq enable path against the changing of the software states on the reset side.
>
> Quite agreed.
>
>> 
>> DJ
>> 
>>> (The dev_lock should also apply to idxd_wq_enable(), I did not paste here)
>>>
>>> Also, I found a new bug that idxd_device_config() is called without
>>> hold idxd->dev_lock.
>>>> idxd_device_config() explictly asserts the hold of idxd->dev_lock.
>>>
>>> +++ b/drivers/dma/idxd/irq.c
>>> @@ -33,12 +33,17 @@ static void idxd_device_reinit(struct work_struct *work)
>>>   {
>>>          struct idxd_device *idxd = container_of(work, struct idxd_device, work);
>>>          struct device *dev = &idxd->pdev->dev;
>>> -       int rc, i;
>>> +       int rc = 0, i;
>>>   
>>>          idxd_device_reset(idxd);
>>> -       rc = idxd_device_config(idxd);
>>> -       if (rc < 0)
>>> +       spin_lock(&idxd->dev_lock);
>> I wonder if you should also just lock the idxd_device_reset() and the idxd_device_enable() as well in this case as you don't anything to interfere with the entire reinit path.
>
> During reset, any operation to enable wq should indeed be avoided,
> but there isn't a suitable lock currently. idxd->dev_lock is a
> lightweight lock, only used when updating the device state, and
> it's used while holding wq->wq_lock. Therefore, holding idxd->dev_lock
> currently cannot form mutual exclusion with wq->wq_lock.
>
> And the sub caller of idxd_device_reset(), e.g. idxd_device_clear_state()
> also spins to hold idxd->dev_lock.
>
> A hack way it to grab wq_lock of all wqs before before reinit, but
> this is hardly elegant (:
>
> Thanks.
> Have a nice holiday!
>
> Best regards,
> Shuai
>


Cheers,
-- 
Vinicius


* Re: [PATCH v1 1/2] dmaengine: idxd: Fix race condition between WQ enable and reset paths
  2025-05-28  2:21           ` Vinicius Costa Gomes
@ 2025-06-03 14:32             ` Dave Jiang
  2025-06-04  8:55               ` Shuai Xue
  0 siblings, 1 reply; 16+ messages in thread
From: Dave Jiang @ 2025-06-03 14:32 UTC (permalink / raw)
  To: Vinicius Costa Gomes, Shuai Xue, fenghuay, vkoul
  Cc: dmaengine, colin.i.king, linux-kernel



On 5/27/25 7:21 PM, Vinicius Costa Gomes wrote:
> Shuai Xue <xueshuai@linux.alibaba.com> writes:
> 
>> On 2025/5/23 22:54, Dave Jiang wrote:
>>>
>>>
>>> On 5/22/25 10:20 PM, Shuai Xue wrote:
>>>>
>>>>
>>>> On 2025/5/22 22:55, Dave Jiang wrote:
>>>>>
>>>>>
>>>>> On 5/21/25 11:33 PM, Shuai Xue wrote:
>>>>>> A device reset command disables all WQs in hardware. If issued while a WQ
>>>>>> is being enabled, it can cause a mismatch between the software and hardware
>>>>>> states.
>>>>>>
>>>>>> When a hardware error occurs, the IDXD driver calls idxd_device_reset() to
>>>>>> send a reset command and clear the state (wq->state) of all WQs. It then
>>>>>> uses wq_enable_map (a bitmask tracking enabled WQs) to re-enable them and
>>>>>> ensure consistency between the software and hardware states.
>>>>>>
>>>>>> However, a race condition exists between the WQ enable path and the
>>>>>> reset/recovery path. For example:
>>>>>>
>>>>>> A: WQ enable path                   B: Reset and recovery path
>>>>>> ------------------                 ------------------------
>>>>>> a1. issue IDXD_CMD_ENABLE_WQ
>>>>>>                                      b1. issue IDXD_CMD_RESET_DEVICE
>>>>>>                                      b2. clear wq->state
>>>>>>                                      b3. check wq_enable_map bit, not set
>>>>>> a2. set wq->state = IDXD_WQ_ENABLED
>>>>>> a3. set wq_enable_map
>>>>>>
>>>>>> In this case, b1 issues a reset command that disables all WQs in hardware.
>>>>>> Since b3 checks wq_enable_map before a2, it doesn't re-enable the WQ,
>>>>>> leading to an inconsistency between wq->state (software) and the actual
>>>>>> hardware state (IDXD_WQ_DISABLED).
>>>>>
>>>>>
>>>>> Would it lessen the complication to just have wq enable path grab the device lock before proceeding?
>>>>>
>>>>> DJ
>>>>
>>>> Yep, how about add a spin lock to enable wq and reset device path.
>>>>
>>>> diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c
>>>> index 38633ec5b60e..c0dc904b2a94 100644
>>>> --- a/drivers/dma/idxd/device.c
>>>> +++ b/drivers/dma/idxd/device.c
>>>> @@ -203,6 +203,29 @@ int idxd_wq_enable(struct idxd_wq *wq)
>>>>   }
>>>>   EXPORT_SYMBOL_GPL(idxd_wq_enable);
>>>>   
>>>> +/*
>>>> + * This function enables a WQ in hareware and updates the driver maintained
>>>> + * wq->state to IDXD_WQ_ENABLED. It should be called with the dev_lock held
>>>> + * to prevent race conditions with IDXD_CMD_RESET_DEVICE, which could
>>>> + * otherwise disable the WQ without the driver's state being properly
>>>> + * updated.
>>>> + *
>>>> + * For IDXD_CMD_DISABLE_DEVICE, this function is safe because it is only
>>>> + * called after the WQ has been explicitly disabled, so no concurrency
>>>> + * issues arise.
>>>> + */
>>>> +int idxd_wq_enable_locked(struct idxd_wq *wq)
>>>> +{
>>>> +       struct idxd_device *idxd = wq->idxd;
>>>> +       int ret;
>>>> +
>>>> +       spin_lock(&idxd->dev_lock);
>>>
>>> Let's start using the new cleanup macro going forward:
>>> guard(spinlock)(&idxd->dev_lock);
>>>
>>> On a side note, there's been a cleanup on my mind WRT this driver's locking. I think we can replace idxd->dev_lock with idxd_confdev(idxd) device lock. You can end up just do:
>>> guard(device)(idxd_confdev(idxd));
>>
>> Then we need to replace the lock from spinlock to mutex lock?
> 
> We still need a (spin) lock that we could hold in interrupt contexts.
> 
>>
>>>
>>> And also drop the wq->wq_lock and replace with wq_confdev(wq) device lock:
>>> guard(device)(wq_confdev(wq));
>>>
>>> If you are up for it that is.
>>
>> We creates a hierarchy: pdev -> idxd device -> wq device.
>> idxd_confdev(idxd) is the parent of wq_confdev(wq) because:
>>
>>      (wq_confdev(wq))->parent = idxd_confdev(idxd);
>>
>> Is it safe to grap lock of idxd_confdev(idxd) under hold
>> lock of wq_confdev(wq)?
>>
>> We have mounts of code use spinlock of idxd->dev_lock under
>> hold of wq->wq_lock.
>>
> 
> I agree with Dave that the locking could be simplified, but I don't
> think that we should hold this series because of that. That
> simplification can be done later.

I agree. Just musing in passing on the current code.

> 
>>>
>>>
>>>> +       ret = idxd_wq_enable_locked(wq);
>>>> +       spin_unlock(&idxd->dev_lock);
>>>> +
>>>> +       return ret;
>>>> +}
>>>> +
>>>>   int idxd_wq_disable(struct idxd_wq *wq, bool reset_config)
>>>>   {
>>>>          struct idxd_device *idxd = wq->idxd;
>>>> @@ -330,7 +353,7 @@ int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid)
>>>>   
>>>>          __idxd_wq_set_pasid_locked(wq, pasid);
>>>>   
>>>> -       rc = idxd_wq_enable(wq);
>>>> +       rc = idxd_wq_enable_locked(wq);
>>>>          if (rc < 0)
>>>>                  return rc;
>>>>   
>>>> @@ -380,7 +403,7 @@ int idxd_wq_disable_pasid(struct idxd_wq *wq)
>>>>          iowrite32(wqcfg.bits[WQCFG_PASID_IDX], idxd->reg_base + offset);
>>>>          spin_unlock(&idxd->dev_lock);
>>>>   
>>>> -       rc = idxd_wq_enable(wq);
>>>> +       rc = idxd_wq_enable_locked(wq);
>>>>          if (rc < 0)
>>>>                  return rc;
>>>>   
>>>> @@ -644,7 +667,11 @@ int idxd_device_disable(struct idxd_device *idxd)
>>>>   
>>>>   void idxd_device_reset(struct idxd_device *idxd)
>>>>   {
>>>> +
>>>> +       spin_lock(&idxd->dev_lock);
>>>>          idxd_cmd_exec(idxd, IDXD_CMD_RESET_DEVICE, 0, NULL);
>>>> +       spin_unlock(&idxd->dev_lock);
>>>> +
>>>>
>>>
>>> I think you just need the wq_enable locked and also in idxd_device_clear_state(), extend the lock to the whole function. Locking the reset function around just the command execute won't protect the wq enable path against the changing of the software states on the reset side.
>>
>> Quite agreed.
>>
>>>
>>> DJ
>>>
>>>> (The dev_lock should also apply to idxd_wq_enable(), I did not paste here)
>>>>
>>>> Also, I found a new bug that idxd_device_config() is called without
>>>> hold idxd->dev_lock.
>>>>> idxd_device_config() explictly asserts the hold of idxd->dev_lock.
>>>>
>>>> +++ b/drivers/dma/idxd/irq.c
>>>> @@ -33,12 +33,17 @@ static void idxd_device_reinit(struct work_struct *work)
>>>>   {
>>>>          struct idxd_device *idxd = container_of(work, struct idxd_device, work);
>>>>          struct device *dev = &idxd->pdev->dev;
>>>> -       int rc, i;
>>>> +       int rc = 0, i;
>>>>   
>>>>          idxd_device_reset(idxd);
>>>> -       rc = idxd_device_config(idxd);
>>>> -       if (rc < 0)
>>>> +       spin_lock(&idxd->dev_lock);
>>> I wonder if you should also just lock the idxd_device_reset() and the idxd_device_enable() as well in this case as you don't anything to interfere with the entire reinit path.
>>
>> During reset, any operation to enable wq should indeed be avoided,
>> but there isn't a suitable lock currently. idxd->dev_lock is a
>> lightweight lock, only used when updating the device state, and
>> it's used while holding wq->wq_lock. Therefore, holding idxd->dev_lock
>> currently cannot form mutual exclusion with wq->wq_lock.
>>
>> And the sub caller of idxd_device_reset(), e.g. idxd_device_clear_state()
>> also spins to hold idxd->dev_lock.
>>
>> A hack way it to grab wq_lock of all wqs before before reinit, but
>> this is hardly elegant (:
>>
>> Thanks.
>> Have a nice holiday!
>>
>> Best regards,
>> Shuai
>>
> 
> 
> Cheers,



* Re: [PATCH v1 1/2] dmaengine: idxd: Fix race condition between WQ enable and reset paths
  2025-06-03 14:32             ` Dave Jiang
@ 2025-06-04  8:55               ` Shuai Xue
  2025-06-04 14:19                 ` Dave Jiang
  0 siblings, 1 reply; 16+ messages in thread
From: Shuai Xue @ 2025-06-04  8:55 UTC (permalink / raw)
  To: Dave Jiang, Vinicius Costa Gomes, fenghuay, vkoul
  Cc: dmaengine, colin.i.king, linux-kernel



On 2025/6/3 22:32, Dave Jiang wrote:
> 
> 
> On 5/27/25 7:21 PM, Vinicius Costa Gomes wrote:
>> Shuai Xue <xueshuai@linux.alibaba.com> writes:
>>
>>> On 2025/5/23 22:54, Dave Jiang wrote:
>>>>
>>>>
>>>> On 5/22/25 10:20 PM, Shuai Xue wrote:
>>>>>
>>>>>
>>>>> On 2025/5/22 22:55, Dave Jiang wrote:
>>>>>>
>>>>>>
>>>>>> On 5/21/25 11:33 PM, Shuai Xue wrote:
>>>>>>> A device reset command disables all WQs in hardware. If issued while a WQ
>>>>>>> is being enabled, it can cause a mismatch between the software and hardware
>>>>>>> states.
>>>>>>>
>>>>>>> When a hardware error occurs, the IDXD driver calls idxd_device_reset() to
>>>>>>> send a reset command and clear the state (wq->state) of all WQs. It then
>>>>>>> uses wq_enable_map (a bitmask tracking enabled WQs) to re-enable them and
>>>>>>> ensure consistency between the software and hardware states.
>>>>>>>
>>>>>>> However, a race condition exists between the WQ enable path and the
>>>>>>> reset/recovery path. For example:
>>>>>>>
>>>>>>> A: WQ enable path                   B: Reset and recovery path
>>>>>>> ------------------                 ------------------------
>>>>>>> a1. issue IDXD_CMD_ENABLE_WQ
>>>>>>>                                       b1. issue IDXD_CMD_RESET_DEVICE
>>>>>>>                                       b2. clear wq->state
>>>>>>>                                       b3. check wq_enable_map bit, not set
>>>>>>> a2. set wq->state = IDXD_WQ_ENABLED
>>>>>>> a3. set wq_enable_map
>>>>>>>
>>>>>>> In this case, b1 issues a reset command that disables all WQs in hardware.
>>>>>>> Since b3 checks wq_enable_map before a2, it doesn't re-enable the WQ,
>>>>>>> leading to an inconsistency between wq->state (software) and the actual
>>>>>>> hardware state (IDXD_WQ_DISABLED).
>>>>>>
>>>>>>
>>>>>> Would it lessen the complication to just have wq enable path grab the device lock before proceeding?
>>>>>>
>>>>>> DJ
>>>>>
>>>>> Yep, how about add a spin lock to enable wq and reset device path.
>>>>>
>>>>> diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c
>>>>> index 38633ec5b60e..c0dc904b2a94 100644
>>>>> --- a/drivers/dma/idxd/device.c
>>>>> +++ b/drivers/dma/idxd/device.c
>>>>> @@ -203,6 +203,29 @@ int idxd_wq_enable(struct idxd_wq *wq)
>>>>>    }
>>>>>    EXPORT_SYMBOL_GPL(idxd_wq_enable);
>>>>>    
>>>>> +/*
>>>>> + * This function enables a WQ in hareware and updates the driver maintained
>>>>> + * wq->state to IDXD_WQ_ENABLED. It should be called with the dev_lock held
>>>>> + * to prevent race conditions with IDXD_CMD_RESET_DEVICE, which could
>>>>> + * otherwise disable the WQ without the driver's state being properly
>>>>> + * updated.
>>>>> + *
>>>>> + * For IDXD_CMD_DISABLE_DEVICE, this function is safe because it is only
>>>>> + * called after the WQ has been explicitly disabled, so no concurrency
>>>>> + * issues arise.
>>>>> + */
>>>>> +int idxd_wq_enable_locked(struct idxd_wq *wq)
>>>>> +{
>>>>> +       struct idxd_device *idxd = wq->idxd;
>>>>> +       int ret;
>>>>> +
>>>>> +       spin_lock(&idxd->dev_lock);
>>>>
>>>> Let's start using the new cleanup macro going forward:
>>>> guard(spinlock)(&idxd->dev_lock);
>>>>
>>>> On a side note, there's been a cleanup on my mind WRT this driver's locking. I think we can replace idxd->dev_lock with idxd_confdev(idxd) device lock. You can end up just do:
>>>> guard(device)(idxd_confdev(idxd));
>>>
>>> Then we need to replace the lock from spinlock to mutex lock?
>>
>> We still need a (spin) lock that we could hold in interrupt contexts.
>>
>>>
>>>>
>>>> And also drop the wq->wq_lock and replace with wq_confdev(wq) device lock:
>>>> guard(device)(wq_confdev(wq));
>>>>
>>>> If you are up for it that is.
>>>
>>> We creates a hierarchy: pdev -> idxd device -> wq device.
>>> idxd_confdev(idxd) is the parent of wq_confdev(wq) because:
>>>
>>>       (wq_confdev(wq))->parent = idxd_confdev(idxd);
>>>
>>> Is it safe to grap lock of idxd_confdev(idxd) under hold
>>> lock of wq_confdev(wq)?
>>>
>>> We have mounts of code use spinlock of idxd->dev_lock under
>>> hold of wq->wq_lock.
>>>
>>
>> I agree with Dave that the locking could be simplified, but I don't
>> think that we should hold this series because of that. That
>> simplification can be done later.
> 
> I agree. Just passing musing on the current code.

Got it, do I need to send a separate patch for Patch 2?

Thanks.
Shuai


* Re: [PATCH v1 1/2] dmaengine: idxd: Fix race condition between WQ enable and reset paths
  2025-06-04  8:55               ` Shuai Xue
@ 2025-06-04 14:19                 ` Dave Jiang
  2025-06-05  7:40                   ` Shuai Xue
  0 siblings, 1 reply; 16+ messages in thread
From: Dave Jiang @ 2025-06-04 14:19 UTC (permalink / raw)
  To: Shuai Xue, Vinicius Costa Gomes, fenghuay, vkoul
  Cc: dmaengine, colin.i.king, linux-kernel



On 6/4/25 1:55 AM, Shuai Xue wrote:
> 
> 
> On 2025/6/3 22:32, Dave Jiang wrote:
>>
>>
>> On 5/27/25 7:21 PM, Vinicius Costa Gomes wrote:
>>> Shuai Xue <xueshuai@linux.alibaba.com> writes:
>>>
>>>> On 2025/5/23 22:54, Dave Jiang wrote:
>>>>>
>>>>>
>>>>> On 5/22/25 10:20 PM, Shuai Xue wrote:
>>>>>>
>>>>>>
>>>>>> On 2025/5/22 22:55, Dave Jiang wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 5/21/25 11:33 PM, Shuai Xue wrote:
>>>>>>>> A device reset command disables all WQs in hardware. If issued while a WQ
>>>>>>>> is being enabled, it can cause a mismatch between the software and hardware
>>>>>>>> states.
>>>>>>>>
>>>>>>>> When a hardware error occurs, the IDXD driver calls idxd_device_reset() to
>>>>>>>> send a reset command and clear the state (wq->state) of all WQs. It then
>>>>>>>> uses wq_enable_map (a bitmask tracking enabled WQs) to re-enable them and
>>>>>>>> ensure consistency between the software and hardware states.
>>>>>>>>
>>>>>>>> However, a race condition exists between the WQ enable path and the
>>>>>>>> reset/recovery path. For example:
>>>>>>>>
>>>>>>>> A: WQ enable path                   B: Reset and recovery path
>>>>>>>> ------------------                 ------------------------
>>>>>>>> a1. issue IDXD_CMD_ENABLE_WQ
>>>>>>>>                                       b1. issue IDXD_CMD_RESET_DEVICE
>>>>>>>>                                       b2. clear wq->state
>>>>>>>>                                       b3. check wq_enable_map bit, not set
>>>>>>>> a2. set wq->state = IDXD_WQ_ENABLED
>>>>>>>> a3. set wq_enable_map
>>>>>>>>
>>>>>>>> In this case, b1 issues a reset command that disables all WQs in hardware.
>>>>>>>> Since b3 checks wq_enable_map before a2, it doesn't re-enable the WQ,
>>>>>>>> leading to an inconsistency between wq->state (software) and the actual
>>>>>>>> hardware state (IDXD_WQ_DISABLED).
>>>>>>>
>>>>>>>
>>>>>>> Would it lessen the complication to just have wq enable path grab the device lock before proceeding?
>>>>>>>
>>>>>>> DJ
>>>>>>
>>>>>> Yep, how about add a spin lock to enable wq and reset device path.
>>>>>>
>>>>>> diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c
>>>>>> index 38633ec5b60e..c0dc904b2a94 100644
>>>>>> --- a/drivers/dma/idxd/device.c
>>>>>> +++ b/drivers/dma/idxd/device.c
>>>>>> @@ -203,6 +203,29 @@ int idxd_wq_enable(struct idxd_wq *wq)
>>>>>>    }
>>>>>>    EXPORT_SYMBOL_GPL(idxd_wq_enable);
>>>>>>    +/*
>>>>>> + * This function enables a WQ in hareware and updates the driver maintained
>>>>>> + * wq->state to IDXD_WQ_ENABLED. It should be called with the dev_lock held
>>>>>> + * to prevent race conditions with IDXD_CMD_RESET_DEVICE, which could
>>>>>> + * otherwise disable the WQ without the driver's state being properly
>>>>>> + * updated.
>>>>>> + *
>>>>>> + * For IDXD_CMD_DISABLE_DEVICE, this function is safe because it is only
>>>>>> + * called after the WQ has been explicitly disabled, so no concurrency
>>>>>> + * issues arise.
>>>>>> + */
>>>>>> +int idxd_wq_enable_locked(struct idxd_wq *wq)
>>>>>> +{
>>>>>> +       struct idxd_device *idxd = wq->idxd;
>>>>>> +       int ret;
>>>>>> +
>>>>>> +       spin_lock(&idxd->dev_lock);
>>>>>
>>>>> Let's start using the new cleanup macro going forward:
>>>>> guard(spinlock)(&idxd->dev_lock);
>>>>>
>>>>> On a side note, there's been a cleanup on my mind WRT this driver's locking. I think we can replace idxd->dev_lock with the idxd_confdev(idxd) device lock. You can end up just doing:
>>>>> guard(device)(idxd_confdev(idxd));
>>>>
>>>> Then we need to replace the lock from spinlock to mutex lock?
>>>
>>> We still need a (spin) lock that we could hold in interrupt contexts.
>>>
>>>>
>>>>>
>>>>> And also drop the wq->wq_lock and replace with wq_confdev(wq) device lock:
>>>>> guard(device)(wq_confdev(wq));
>>>>>
>>>>> If you are up for it that is.
>>>>
>>>> We create a hierarchy: pdev -> idxd device -> wq device.
>>>> idxd_confdev(idxd) is the parent of wq_confdev(wq) because:
>>>>
>>>>       (wq_confdev(wq))->parent = idxd_confdev(idxd);
>>>>
>>>> Is it safe to grab the lock of idxd_confdev(idxd) while holding
>>>> the lock of wq_confdev(wq)?
>>>>
>>>> We have lots of code that takes the idxd->dev_lock spinlock while
>>>> holding wq->wq_lock.
>>>>
>>>
>>> I agree with Dave that the locking could be simplified, but I don't
>>> think that we should hold this series because of that. That
>>> simplification can be done later.
>>
>> I agree. Just a passing musing on the current code.
> 
> Got it, do I need to send a separate patch for Patch 2?

Not sure what you mean. Do you mean whether you need to send patch 2 again?
> 
> Thanks.
> Shuai


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v1 1/2] dmaengine: idxd: Fix race condition between WQ enable and reset paths
  2025-06-04 14:19                 ` Dave Jiang
@ 2025-06-05  7:40                   ` Shuai Xue
  2025-06-06 14:32                     ` Dave Jiang
  0 siblings, 1 reply; 16+ messages in thread
From: Shuai Xue @ 2025-06-05  7:40 UTC (permalink / raw)
  To: Dave Jiang, Vinicius Costa Gomes, fenghuay, vkoul
  Cc: dmaengine, colin.i.king, linux-kernel



在 2025/6/4 22:19, Dave Jiang 写道:
> 
> 
> On 6/4/25 1:55 AM, Shuai Xue wrote:
>>
>>
>> 在 2025/6/3 22:32, Dave Jiang 写道:
>>>
>>>
>>> On 5/27/25 7:21 PM, Vinicius Costa Gomes wrote:
>>>> Shuai Xue <xueshuai@linux.alibaba.com> writes:
>>>>
>>>>> 在 2025/5/23 22:54, Dave Jiang 写道:
>>>>>>
>>>>>>
>>>>>> On 5/22/25 10:20 PM, Shuai Xue wrote:
>>>>>>>
>>>>>>>
>>>>>>> 在 2025/5/22 22:55, Dave Jiang 写道:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 5/21/25 11:33 PM, Shuai Xue wrote:
>>>>>>>>> A device reset command disables all WQs in hardware. If issued while a WQ
>>>>>>>>> is being enabled, it can cause a mismatch between the software and hardware
>>>>>>>>> states.
>>>>>>>>>
>>>>>>>>> When a hardware error occurs, the IDXD driver calls idxd_device_reset() to
>>>>>>>>> send a reset command and clear the state (wq->state) of all WQs. It then
>>>>>>>>> uses wq_enable_map (a bitmask tracking enabled WQs) to re-enable them and
>>>>>>>>> ensure consistency between the software and hardware states.
>>>>>>>>>
>>>>>>>>> However, a race condition exists between the WQ enable path and the
>>>>>>>>> reset/recovery path. For example:
>>>>>>>>>
>>>>>>>>> A: WQ enable path                   B: Reset and recovery path
>>>>>>>>> ------------------                 ------------------------
>>>>>>>>> a1. issue IDXD_CMD_ENABLE_WQ
>>>>>>>>>                                        b1. issue IDXD_CMD_RESET_DEVICE
>>>>>>>>>                                        b2. clear wq->state
>>>>>>>>>                                        b3. check wq_enable_map bit, not set
>>>>>>>>> a2. set wq->state = IDXD_WQ_ENABLED
>>>>>>>>> a3. set wq_enable_map
>>>>>>>>>
>>>>>>>>> In this case, b1 issues a reset command that disables all WQs in hardware.
>>>>>>>>> Since b3 checks wq_enable_map before a2, it doesn't re-enable the WQ,
>>>>>>>>> leading to an inconsistency between wq->state (software) and the actual
>>>>>>>>> hardware state (IDXD_WQ_DISABLED).
>>>>>>>>
>>>>>>>>
>>>>>>>> Would it lessen the complication to just have wq enable path grab the device lock before proceeding?
>>>>>>>>
>>>>>>>> DJ
>>>>>>>
>>>>>>> Yep, how about adding a spin lock to the WQ enable and device reset paths?
>>>>>>>
>>>>>>> diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c
>>>>>>> index 38633ec5b60e..c0dc904b2a94 100644
>>>>>>> --- a/drivers/dma/idxd/device.c
>>>>>>> +++ b/drivers/dma/idxd/device.c
>>>>>>> @@ -203,6 +203,29 @@ int idxd_wq_enable(struct idxd_wq *wq)
>>>>>>>     }
>>>>>>>     EXPORT_SYMBOL_GPL(idxd_wq_enable);
>>>>>>>     +/*
>>>>>>> + * This function enables a WQ in hardware and updates the driver-maintained
>>>>>>> + * wq->state to IDXD_WQ_ENABLED. It should be called with the dev_lock held
>>>>>>> + * to prevent race conditions with IDXD_CMD_RESET_DEVICE, which could
>>>>>>> + * otherwise disable the WQ without the driver's state being properly
>>>>>>> + * updated.
>>>>>>> + *
>>>>>>> + * For IDXD_CMD_DISABLE_DEVICE, this function is safe because it is only
>>>>>>> + * called after the WQ has been explicitly disabled, so no concurrency
>>>>>>> + * issues arise.
>>>>>>> + */
>>>>>>> +int idxd_wq_enable_locked(struct idxd_wq *wq)
>>>>>>> +{
>>>>>>> +       struct idxd_device *idxd = wq->idxd;
>>>>>>> +       int ret;
>>>>>>> +
>>>>>>> +       spin_lock(&idxd->dev_lock);
>>>>>>
>>>>>> Let's start using the new cleanup macro going forward:
>>>>>> guard(spinlock)(&idxd->dev_lock);
>>>>>>
>>>>>> On a side note, there's been a cleanup on my mind WRT this driver's locking. I think we can replace idxd->dev_lock with the idxd_confdev(idxd) device lock. You can end up just doing:
>>>>>> guard(device)(idxd_confdev(idxd));
>>>>>
>>>>> Then we need to replace the lock from spinlock to mutex lock?
>>>>
>>>> We still need a (spin) lock that we could hold in interrupt contexts.
>>>>
>>>>>
>>>>>>
>>>>>> And also drop the wq->wq_lock and replace with wq_confdev(wq) device lock:
>>>>>> guard(device)(wq_confdev(wq));
>>>>>>
>>>>>> If you are up for it that is.
>>>>>
>>>>> We create a hierarchy: pdev -> idxd device -> wq device.
>>>>> idxd_confdev(idxd) is the parent of wq_confdev(wq) because:
>>>>>
>>>>>        (wq_confdev(wq))->parent = idxd_confdev(idxd);
>>>>>
>>>>> Is it safe to grab the lock of idxd_confdev(idxd) while holding
>>>>> the lock of wq_confdev(wq)?
>>>>>
>>>>> We have lots of code that takes the idxd->dev_lock spinlock while
>>>>> holding wq->wq_lock.
>>>>>
>>>>
>>>> I agree with Dave that the locking could be simplified, but I don't
>>>> think that we should hold this series because of that. That
>>>> simplification can be done later.
>>>
>>> I agree. Just a passing musing on the current code.
>>
>> Got it, do I need to send a separate patch for Patch 2?
> 
> Not sure what you mean. Do you mean whether you need to send patch 2 again?

Yep, the locking issue is more complicated and can be done later.
(I could split patch 2 from this patch set if you prefer a new patch.)

>>
>> Thanks.
>> Shuai


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v1 1/2] dmaengine: idxd: Fix race condition between WQ enable and reset paths
  2025-06-05  7:40                   ` Shuai Xue
@ 2025-06-06 14:32                     ` Dave Jiang
  2025-06-16  1:54                       ` Shuai Xue
  0 siblings, 1 reply; 16+ messages in thread
From: Dave Jiang @ 2025-06-06 14:32 UTC (permalink / raw)
  To: Shuai Xue, Vinicius Costa Gomes, fenghuay, vkoul
  Cc: dmaengine, colin.i.king, linux-kernel



On 6/5/25 12:40 AM, Shuai Xue wrote:
> 
> 
> 在 2025/6/4 22:19, Dave Jiang 写道:
>>
>>
>> On 6/4/25 1:55 AM, Shuai Xue wrote:
>>>
>>>
>>> 在 2025/6/3 22:32, Dave Jiang 写道:
>>>>
>>>>
>>>> On 5/27/25 7:21 PM, Vinicius Costa Gomes wrote:
>>>>> Shuai Xue <xueshuai@linux.alibaba.com> writes:
>>>>>
>>>>>> 在 2025/5/23 22:54, Dave Jiang 写道:
>>>>>>>
>>>>>>>
>>>>>>> On 5/22/25 10:20 PM, Shuai Xue wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> 在 2025/5/22 22:55, Dave Jiang 写道:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 5/21/25 11:33 PM, Shuai Xue wrote:
>>>>>>>>>> A device reset command disables all WQs in hardware. If issued while a WQ
>>>>>>>>>> is being enabled, it can cause a mismatch between the software and hardware
>>>>>>>>>> states.
>>>>>>>>>>
>>>>>>>>>> When a hardware error occurs, the IDXD driver calls idxd_device_reset() to
>>>>>>>>>> send a reset command and clear the state (wq->state) of all WQs. It then
>>>>>>>>>> uses wq_enable_map (a bitmask tracking enabled WQs) to re-enable them and
>>>>>>>>>> ensure consistency between the software and hardware states.
>>>>>>>>>>
>>>>>>>>>> However, a race condition exists between the WQ enable path and the
>>>>>>>>>> reset/recovery path. For example:
>>>>>>>>>>
>>>>>>>>>> A: WQ enable path                   B: Reset and recovery path
>>>>>>>>>> ------------------                 ------------------------
>>>>>>>>>> a1. issue IDXD_CMD_ENABLE_WQ
>>>>>>>>>>                                        b1. issue IDXD_CMD_RESET_DEVICE
>>>>>>>>>>                                        b2. clear wq->state
>>>>>>>>>>                                        b3. check wq_enable_map bit, not set
>>>>>>>>>> a2. set wq->state = IDXD_WQ_ENABLED
>>>>>>>>>> a3. set wq_enable_map
>>>>>>>>>>
>>>>>>>>>> In this case, b1 issues a reset command that disables all WQs in hardware.
>>>>>>>>>> Since b3 checks wq_enable_map before a2, it doesn't re-enable the WQ,
>>>>>>>>>> leading to an inconsistency between wq->state (software) and the actual
>>>>>>>>>> hardware state (IDXD_WQ_DISABLED).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Would it lessen the complication to just have wq enable path grab the device lock before proceeding?
>>>>>>>>>
>>>>>>>>> DJ
>>>>>>>>
>>>>>>>> Yep, how about adding a spin lock to the WQ enable and device reset paths?
>>>>>>>>
>>>>>>>> diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c
>>>>>>>> index 38633ec5b60e..c0dc904b2a94 100644
>>>>>>>> --- a/drivers/dma/idxd/device.c
>>>>>>>> +++ b/drivers/dma/idxd/device.c
>>>>>>>> @@ -203,6 +203,29 @@ int idxd_wq_enable(struct idxd_wq *wq)
>>>>>>>>     }
>>>>>>>>     EXPORT_SYMBOL_GPL(idxd_wq_enable);
>>>>>>>>     +/*
>>>>>>>> + * This function enables a WQ in hardware and updates the driver-maintained
>>>>>>>> + * wq->state to IDXD_WQ_ENABLED. It should be called with the dev_lock held
>>>>>>>> + * to prevent race conditions with IDXD_CMD_RESET_DEVICE, which could
>>>>>>>> + * otherwise disable the WQ without the driver's state being properly
>>>>>>>> + * updated.
>>>>>>>> + *
>>>>>>>> + * For IDXD_CMD_DISABLE_DEVICE, this function is safe because it is only
>>>>>>>> + * called after the WQ has been explicitly disabled, so no concurrency
>>>>>>>> + * issues arise.
>>>>>>>> + */
>>>>>>>> +int idxd_wq_enable_locked(struct idxd_wq *wq)
>>>>>>>> +{
>>>>>>>> +       struct idxd_device *idxd = wq->idxd;
>>>>>>>> +       int ret;
>>>>>>>> +
>>>>>>>> +       spin_lock(&idxd->dev_lock);
>>>>>>>
>>>>>>> Let's start using the new cleanup macro going forward:
>>>>>>> guard(spinlock)(&idxd->dev_lock);
>>>>>>>
>>>>>>> On a side note, there's been a cleanup on my mind WRT this driver's locking. I think we can replace idxd->dev_lock with the idxd_confdev(idxd) device lock. You can end up just doing:
>>>>>>> guard(device)(idxd_confdev(idxd));
>>>>>>
>>>>>> Then we need to replace the lock from spinlock to mutex lock?
>>>>>
>>>>> We still need a (spin) lock that we could hold in interrupt contexts.
>>>>>
>>>>>>
>>>>>>>
>>>>>>> And also drop the wq->wq_lock and replace with wq_confdev(wq) device lock:
>>>>>>> guard(device)(wq_confdev(wq));
>>>>>>>
>>>>>>> If you are up for it that is.
>>>>>>
>>>>>> We create a hierarchy: pdev -> idxd device -> wq device.
>>>>>> idxd_confdev(idxd) is the parent of wq_confdev(wq) because:
>>>>>>
>>>>>>        (wq_confdev(wq))->parent = idxd_confdev(idxd);
>>>>>>
>>>>>> Is it safe to grab the lock of idxd_confdev(idxd) while holding
>>>>>> the lock of wq_confdev(wq)?
>>>>>>
>>>>>> We have lots of code that takes the idxd->dev_lock spinlock while
>>>>>> holding wq->wq_lock.
>>>>>>
>>>>>
>>>>> I agree with Dave that the locking could be simplified, but I don't
>>>>> think that we should hold this series because of that. That
>>>>> simplification can be done later.
>>>>
>>>> I agree. Just a passing musing on the current code.
>>>
>>> Got it, do I need to send a separate patch for Patch 2?
>>
>> Not sure what you mean. Do you mean whether you need to send patch 2 again?
> 
> Yep, the locking issue is more complicated and can be done later.
> (I could split patch 2 from this patch set if you prefer a new patch.)

Yes split it out and send it ahead if it can go independently.

> 
>>>
>>> Thanks.
>>> Shuai
> 
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v1 1/2] dmaengine: idxd: Fix race condition between WQ enable and reset paths
  2025-06-06 14:32                     ` Dave Jiang
@ 2025-06-16  1:54                       ` Shuai Xue
  2025-06-16 15:22                         ` Dave Jiang
  0 siblings, 1 reply; 16+ messages in thread
From: Shuai Xue @ 2025-06-16  1:54 UTC (permalink / raw)
  To: Dave Jiang, Vinicius Costa Gomes, fenghuay, vkoul
  Cc: dmaengine, colin.i.king, linux-kernel



在 2025/6/6 22:32, Dave Jiang 写道:
> 
> 
> On 6/5/25 12:40 AM, Shuai Xue wrote:
>>
>>
>> 在 2025/6/4 22:19, Dave Jiang 写道:
>>>
>>>
>>> On 6/4/25 1:55 AM, Shuai Xue wrote:
>>>>
>>>>
>>>> 在 2025/6/3 22:32, Dave Jiang 写道:
>>>>>
>>>>>
>>>>> On 5/27/25 7:21 PM, Vinicius Costa Gomes wrote:
>>>>>> Shuai Xue <xueshuai@linux.alibaba.com> writes:
>>>>>>
>>>>>>> 在 2025/5/23 22:54, Dave Jiang 写道:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 5/22/25 10:20 PM, Shuai Xue wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 在 2025/5/22 22:55, Dave Jiang 写道:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 5/21/25 11:33 PM, Shuai Xue wrote:
>>>>>>>>>>> A device reset command disables all WQs in hardware. If issued while a WQ
>>>>>>>>>>> is being enabled, it can cause a mismatch between the software and hardware
>>>>>>>>>>> states.
>>>>>>>>>>>
>>>>>>>>>>> When a hardware error occurs, the IDXD driver calls idxd_device_reset() to
>>>>>>>>>>> send a reset command and clear the state (wq->state) of all WQs. It then
>>>>>>>>>>> uses wq_enable_map (a bitmask tracking enabled WQs) to re-enable them and
>>>>>>>>>>> ensure consistency between the software and hardware states.
>>>>>>>>>>>
>>>>>>>>>>> However, a race condition exists between the WQ enable path and the
>>>>>>>>>>> reset/recovery path. For example:
>>>>>>>>>>>
>>>>>>>>>>> A: WQ enable path                   B: Reset and recovery path
>>>>>>>>>>> ------------------                 ------------------------
>>>>>>>>>>> a1. issue IDXD_CMD_ENABLE_WQ
>>>>>>>>>>>                                         b1. issue IDXD_CMD_RESET_DEVICE
>>>>>>>>>>>                                         b2. clear wq->state
>>>>>>>>>>>                                         b3. check wq_enable_map bit, not set
>>>>>>>>>>> a2. set wq->state = IDXD_WQ_ENABLED
>>>>>>>>>>> a3. set wq_enable_map
>>>>>>>>>>>
>>>>>>>>>>> In this case, b1 issues a reset command that disables all WQs in hardware.
>>>>>>>>>>> Since b3 checks wq_enable_map before a2, it doesn't re-enable the WQ,
>>>>>>>>>>> leading to an inconsistency between wq->state (software) and the actual
>>>>>>>>>>> hardware state (IDXD_WQ_DISABLED).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Would it lessen the complication to just have wq enable path grab the device lock before proceeding?
>>>>>>>>>>
>>>>>>>>>> DJ
>>>>>>>>>
>>>>>>>>> Yep, how about adding a spin lock to the WQ enable and device reset paths?
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c
>>>>>>>>> index 38633ec5b60e..c0dc904b2a94 100644
>>>>>>>>> --- a/drivers/dma/idxd/device.c
>>>>>>>>> +++ b/drivers/dma/idxd/device.c
>>>>>>>>> @@ -203,6 +203,29 @@ int idxd_wq_enable(struct idxd_wq *wq)
>>>>>>>>>      }
>>>>>>>>>      EXPORT_SYMBOL_GPL(idxd_wq_enable);
>>>>>>>>>      +/*
>>>>>>>>> + * This function enables a WQ in hardware and updates the driver-maintained
>>>>>>>>> + * wq->state to IDXD_WQ_ENABLED. It should be called with the dev_lock held
>>>>>>>>> + * to prevent race conditions with IDXD_CMD_RESET_DEVICE, which could
>>>>>>>>> + * otherwise disable the WQ without the driver's state being properly
>>>>>>>>> + * updated.
>>>>>>>>> + *
>>>>>>>>> + * For IDXD_CMD_DISABLE_DEVICE, this function is safe because it is only
>>>>>>>>> + * called after the WQ has been explicitly disabled, so no concurrency
>>>>>>>>> + * issues arise.
>>>>>>>>> + */
>>>>>>>>> +int idxd_wq_enable_locked(struct idxd_wq *wq)
>>>>>>>>> +{
>>>>>>>>> +       struct idxd_device *idxd = wq->idxd;
>>>>>>>>> +       int ret;
>>>>>>>>> +
>>>>>>>>> +       spin_lock(&idxd->dev_lock);
>>>>>>>>
>>>>>>>> Let's start using the new cleanup macro going forward:
>>>>>>>> guard(spinlock)(&idxd->dev_lock);
>>>>>>>>
>>>>>>>> On a side note, there's been a cleanup on my mind WRT this driver's locking. I think we can replace idxd->dev_lock with the idxd_confdev(idxd) device lock. You can end up just doing:
>>>>>>>> guard(device)(idxd_confdev(idxd));
>>>>>>>
>>>>>>> Then we need to replace the lock from spinlock to mutex lock?
>>>>>>
>>>>>> We still need a (spin) lock that we could hold in interrupt contexts.
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> And also drop the wq->wq_lock and replace with wq_confdev(wq) device lock:
>>>>>>>> guard(device)(wq_confdev(wq));
>>>>>>>>
>>>>>>>> If you are up for it that is.
>>>>>>>
>>>>>>> We create a hierarchy: pdev -> idxd device -> wq device.
>>>>>>> idxd_confdev(idxd) is the parent of wq_confdev(wq) because:
>>>>>>>
>>>>>>>         (wq_confdev(wq))->parent = idxd_confdev(idxd);
>>>>>>>
>>>>>>> Is it safe to grab the lock of idxd_confdev(idxd) while holding
>>>>>>> the lock of wq_confdev(wq)?
>>>>>>>
>>>>>>> We have lots of code that takes the idxd->dev_lock spinlock while
>>>>>>> holding wq->wq_lock.
>>>>>>>
>>>>>>
>>>>>> I agree with Dave that the locking could be simplified, but I don't
>>>>>> think that we should hold this series because of that. That
>>>>>> simplification can be done later.
>>>>>
>>>>> I agree. Just a passing musing on the current code.
>>>>
>>>> Got it, do I need to send a separate patch for Patch 2?
>>>
>>> Not sure what you mean. Do you mean whether you need to send patch 2 again?
>>
>> Yep, the locking issue is more complicated and can be done later.
>> (I could split patch 2 from this patch set if you prefer a new patch.)
> 
> Yes split it out and send it ahead if it can go independently.

Hi, Dave,

I split patch 2 out.

As for this race issue between the WQ enable and reset paths, do you have any plans
or ideas?


> 
>>
>>>>
>>>> Thanks.
>>>> Shuai
>>
>>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v1 1/2] dmaengine: idxd: Fix race condition between WQ enable and reset paths
  2025-06-16  1:54                       ` Shuai Xue
@ 2025-06-16 15:22                         ` Dave Jiang
  0 siblings, 0 replies; 16+ messages in thread
From: Dave Jiang @ 2025-06-16 15:22 UTC (permalink / raw)
  To: Shuai Xue, Vinicius Costa Gomes, fenghuay, vkoul
  Cc: dmaengine, colin.i.king, linux-kernel



On 6/15/25 6:54 PM, Shuai Xue wrote:
> 
> 
> 在 2025/6/6 22:32, Dave Jiang 写道:
>>
>>
>> On 6/5/25 12:40 AM, Shuai Xue wrote:
>>>
>>>
>>> 在 2025/6/4 22:19, Dave Jiang 写道:
>>>>
>>>>
>>>> On 6/4/25 1:55 AM, Shuai Xue wrote:
>>>>>
>>>>>
>>>>> 在 2025/6/3 22:32, Dave Jiang 写道:
>>>>>>
>>>>>>
>>>>>> On 5/27/25 7:21 PM, Vinicius Costa Gomes wrote:
>>>>>>> Shuai Xue <xueshuai@linux.alibaba.com> writes:
>>>>>>>
>>>>>>>> 在 2025/5/23 22:54, Dave Jiang 写道:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 5/22/25 10:20 PM, Shuai Xue wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 在 2025/5/22 22:55, Dave Jiang 写道:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 5/21/25 11:33 PM, Shuai Xue wrote:
>>>>>>>>>>>> A device reset command disables all WQs in hardware. If issued while a WQ
>>>>>>>>>>>> is being enabled, it can cause a mismatch between the software and hardware
>>>>>>>>>>>> states.
>>>>>>>>>>>>
>>>>>>>>>>>> When a hardware error occurs, the IDXD driver calls idxd_device_reset() to
>>>>>>>>>>>> send a reset command and clear the state (wq->state) of all WQs. It then
>>>>>>>>>>>> uses wq_enable_map (a bitmask tracking enabled WQs) to re-enable them and
>>>>>>>>>>>> ensure consistency between the software and hardware states.
>>>>>>>>>>>>
>>>>>>>>>>>> However, a race condition exists between the WQ enable path and the
>>>>>>>>>>>> reset/recovery path. For example:
>>>>>>>>>>>>
>>>>>>>>>>>> A: WQ enable path                   B: Reset and recovery path
>>>>>>>>>>>> ------------------                 ------------------------
>>>>>>>>>>>> a1. issue IDXD_CMD_ENABLE_WQ
>>>>>>>>>>>>                                         b1. issue IDXD_CMD_RESET_DEVICE
>>>>>>>>>>>>                                         b2. clear wq->state
>>>>>>>>>>>>                                         b3. check wq_enable_map bit, not set
>>>>>>>>>>>> a2. set wq->state = IDXD_WQ_ENABLED
>>>>>>>>>>>> a3. set wq_enable_map
>>>>>>>>>>>>
>>>>>>>>>>>> In this case, b1 issues a reset command that disables all WQs in hardware.
>>>>>>>>>>>> Since b3 checks wq_enable_map before a2, it doesn't re-enable the WQ,
>>>>>>>>>>>> leading to an inconsistency between wq->state (software) and the actual
>>>>>>>>>>>> hardware state (IDXD_WQ_DISABLED).
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Would it lessen the complication to just have wq enable path grab the device lock before proceeding?
>>>>>>>>>>>
>>>>>>>>>>> DJ
>>>>>>>>>>
>>>>>>>>>> Yep, how about adding a spin lock to the WQ enable and device reset paths?
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c
>>>>>>>>>> index 38633ec5b60e..c0dc904b2a94 100644
>>>>>>>>>> --- a/drivers/dma/idxd/device.c
>>>>>>>>>> +++ b/drivers/dma/idxd/device.c
>>>>>>>>>> @@ -203,6 +203,29 @@ int idxd_wq_enable(struct idxd_wq *wq)
>>>>>>>>>>      }
>>>>>>>>>>      EXPORT_SYMBOL_GPL(idxd_wq_enable);
>>>>>>>>>>      +/*
>>>>>>>>>> + * This function enables a WQ in hardware and updates the driver-maintained
>>>>>>>>>> + * wq->state to IDXD_WQ_ENABLED. It should be called with the dev_lock held
>>>>>>>>>> + * to prevent race conditions with IDXD_CMD_RESET_DEVICE, which could
>>>>>>>>>> + * otherwise disable the WQ without the driver's state being properly
>>>>>>>>>> + * updated.
>>>>>>>>>> + *
>>>>>>>>>> + * For IDXD_CMD_DISABLE_DEVICE, this function is safe because it is only
>>>>>>>>>> + * called after the WQ has been explicitly disabled, so no concurrency
>>>>>>>>>> + * issues arise.
>>>>>>>>>> + */
>>>>>>>>>> +int idxd_wq_enable_locked(struct idxd_wq *wq)
>>>>>>>>>> +{
>>>>>>>>>> +       struct idxd_device *idxd = wq->idxd;
>>>>>>>>>> +       int ret;
>>>>>>>>>> +
>>>>>>>>>> +       spin_lock(&idxd->dev_lock);
>>>>>>>>>
>>>>>>>>> Let's start using the new cleanup macro going forward:
>>>>>>>>> guard(spinlock)(&idxd->dev_lock);
>>>>>>>>>
>>>>>>>>> On a side note, there's been a cleanup on my mind WRT this driver's locking. I think we can replace idxd->dev_lock with the idxd_confdev(idxd) device lock. You can end up just doing:
>>>>>>>>> guard(device)(idxd_confdev(idxd));
>>>>>>>>
>>>>>>>> Then we need to replace the lock from spinlock to mutex lock?
>>>>>>>
>>>>>>> We still need a (spin) lock that we could hold in interrupt contexts.
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> And also drop the wq->wq_lock and replace with wq_confdev(wq) device lock:
>>>>>>>>> guard(device)(wq_confdev(wq));
>>>>>>>>>
>>>>>>>>> If you are up for it that is.
>>>>>>>>
>>>>>>>> We create a hierarchy: pdev -> idxd device -> wq device.
>>>>>>>> idxd_confdev(idxd) is the parent of wq_confdev(wq) because:
>>>>>>>>
>>>>>>>>         (wq_confdev(wq))->parent = idxd_confdev(idxd);
>>>>>>>>
>>>>>>>> Is it safe to grab the lock of idxd_confdev(idxd) while holding
>>>>>>>> the lock of wq_confdev(wq)?
>>>>>>>>
>>>>>>>> We have lots of code that takes the idxd->dev_lock spinlock while
>>>>>>>> holding wq->wq_lock.
>>>>>>>>
>>>>>>>
>>>>>>> I agree with Dave that the locking could be simplified, but I don't
>>>>>>> think that we should hold this series because of that. That
>>>>>>> simplification can be done later.
>>>>>>
>>>>>> I agree. Just a passing musing on the current code.
>>>>>
>>>>> Got it, do I need to send a separate patch for Patch 2?
>>>>
>>>> Not sure what you mean. Do you mean whether you need to send patch 2 again?
>>>
>>> Yep, the locking issue is more complicated and can be done later.
>>> (I could split patch 2 from this patch set if you prefer a new patch.)
>>
>> Yes split it out and send it ahead if it can go independently.
> 
> Hi, Dave,
> 
> I split patch 2 out.
> 
> As for this race issue between the WQ enable and reset paths, do you have any plans
> or ideas?

A lock is needed to protect both paths. Maybe a patch that uses the existing wq lock for now, and somebody can clean up the locking after that? I don't work on this driver anymore, so either you can propose a locking cleanup or, if Vinicius has time, maybe he can take a look.
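
A rough sketch of that interim direction, illustrative only and not a tested
patch: have the post-reset recovery loop take each WQ's existing wq_lock
before it consults wq_enable_map and re-enables the WQ, with the enable path
updating wq->state and the bitmap under the same lock. The helper name below
is made up, and the fields it touches (wq->wq_lock, idxd->wq_enable_map,
idxd->max_wqs, idxd->wqs) are taken from this discussion rather than verified
against a particular tree.

/*
 * Sketch only: serialize the post-reset re-enable loop against the WQ
 * enable path by holding the per-WQ mutex around the bitmap check and
 * the enable command, so steps a1-a3 and b1-b3 from the changelog can
 * no longer interleave.  Assumes the driver's internal idxd.h types.
 */
static void idxd_reenable_wqs_locked(struct idxd_device *idxd)
{
	int i;

	for (i = 0; i < idxd->max_wqs; i++) {
		struct idxd_wq *wq = idxd->wqs[i];

		mutex_lock(&wq->wq_lock);
		if (test_bit(wq->id, idxd->wq_enable_map))
			idxd_wq_enable(wq);
		mutex_unlock(&wq->wq_lock);
	}
}

Whether wq_lock can be taken safely in the recovery context, and whether the
enable path already holds it, would need to be confirmed before turning this
into a real patch.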

> 
> 
>>
>>>
>>>>>
>>>>> Thanks.
>>>>> Shuai
>>>
>>>
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2025-06-16 15:22 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-05-22  6:33 [PATCH v1 0/2] dmaengine: idxd: minor fixes for idxd driver Shuai Xue
2025-05-22  6:33 ` [PATCH v1 1/2] dmaengine: idxd: Fix race condition between WQ enable and reset paths Shuai Xue
2025-05-22 14:55   ` Dave Jiang
2025-05-23  5:20     ` Shuai Xue
2025-05-23 14:54       ` Dave Jiang
2025-05-24  5:40         ` Shuai Xue
2025-05-28  2:21           ` Vinicius Costa Gomes
2025-06-03 14:32             ` Dave Jiang
2025-06-04  8:55               ` Shuai Xue
2025-06-04 14:19                 ` Dave Jiang
2025-06-05  7:40                   ` Shuai Xue
2025-06-06 14:32                     ` Dave Jiang
2025-06-16  1:54                       ` Shuai Xue
2025-06-16 15:22                         ` Dave Jiang
2025-05-22  6:33 ` [PATCH v1 2/2] dmaengine: idxd: fix potential NULL pointer dereference on wq setup error path Shuai Xue
2025-05-22 14:56   ` Dave Jiang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).