netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [[PATCH v2 iwl-next] v2 0/4]
@ 2024-08-26 18:10 Manoj Vishwanathan
  2024-08-26 18:10 ` [[PATCH v2 iwl-next] v2 1/4] idpf: address an rtnl lock splat in tx timeout recovery path Manoj Vishwanathan
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Manoj Vishwanathan @ 2024-08-26 18:10 UTC (permalink / raw)
  To: Tony Nguyen, Przemek Kitszel, David S. Miller, Eric Dumazet,
	intel-wired-lan
  Cc: netdev, linux-kernel, google-lan-reviews, Manoj Vishwanathan

IDPF Virtchnl: Enhance error reporting & fix locking/workqueue issues

This patch series addresses several IDPF virtchnl issues:

* Improved error reporting for better diagnostics.
* Fixed locking sequence in virtchnl message handling to avoid potential race conditions.
* Converted idpf workqueues to unbound to prevent virtchnl processing delays under heavy load.

Previously, CPU-bound kworkers for virtchnl processing could be starved,
leading to transaction timeouts and connection failures.
This was particularly problematic when IRQ traffic and user space processes contended for the same CPU. 

By making the workqueues unbound, we ensure virtchnl processing is not tied to a specific CPU,
improving responsiveness even under high system load.

---
V2:
 - Dropped patch from Willem
 - RCS/RCT variable naming
 - Improved commit message on feddback
v1: https://lore.kernel.org/netdev/20240813182747.1770032-2-manojvishy@google.com/T/

David Decotigny (1):
  idpf: address an rtnl lock splat in tx timeout recovery path

Manoj Vishwanathan (2):
  idpf: Acquire the lock before accessing the xn->salt
  idpf: add more info during virtchnl transaction time out

Marco Leogrande (1):
  idpf: convert workqueues to unbound

 drivers/net/ethernet/intel/idpf/idpf_main.c     | 15 ++++++++++-----
 drivers/net/ethernet/intel/idpf/idpf_txrx.c     | 14 +++++++++++++-
 drivers/net/ethernet/intel/idpf/idpf_virtchnl.c | 12 +++++++-----
 3 files changed, 30 insertions(+), 11 deletions(-)

-- 
2.46.0.295.g3b9ea8a38a-goog


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [[PATCH v2 iwl-next] v2 1/4] idpf: address an rtnl lock splat in tx timeout recovery path
  2024-08-26 18:10 [[PATCH v2 iwl-next] v2 0/4] Manoj Vishwanathan
@ 2024-08-26 18:10 ` Manoj Vishwanathan
  2024-08-28 21:28   ` [Intel-wired-lan] " Jacob Keller
  2024-08-26 18:10 ` [[PATCH v2 iwl-next] v2 2/4] idpf: Acquire the lock before accessing the xn->salt Manoj Vishwanathan
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Manoj Vishwanathan @ 2024-08-26 18:10 UTC (permalink / raw)
  To: Tony Nguyen, Przemek Kitszel, David S. Miller, Eric Dumazet,
	intel-wired-lan
  Cc: netdev, linux-kernel, google-lan-reviews, David Decotigny,
	Manoj Vishwanathan

From: David Decotigny <decot@google.com>

Adopt the same pattern as in other places in the code to take the rtnl
lock during hard resets.
Tested the patch by injecting tx timeout in IDPF , observe that idpf
recovers and IDPF comes back reachable

Without this patch causes there is a splat:
[  270.145214] WARNING: CPU:  PID:  at net/sched/sch_generic.c:534 dev_watchdog

Fixes: d4d5587182664 (idpf: initialize interrupts and enable vport)
Signed-off-by: Manoj Vishwanathan <manojvishy@google.com>
---
 drivers/net/ethernet/intel/idpf/idpf_txrx.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
index af2879f03b8d..806a8b6ea5c5 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -4326,6 +4326,7 @@ int idpf_vport_intr_alloc(struct idpf_vport *vport)
  */
 int idpf_vport_intr_init(struct idpf_vport *vport)
 {
+	bool hr_reset_in_prog;
 	char *int_name;
 	int err;
 
@@ -4334,8 +4335,19 @@ int idpf_vport_intr_init(struct idpf_vport *vport)
 		return err;
 
 	idpf_vport_intr_map_vector_to_qs(vport);
+	/** 
+	 * If we're in normal up path, the stack already takes the
+	 * rtnl_lock for us, however, if we're doing up as a part of a
+	 * hard reset, we'll need to take the lock ourself before
+	 * touching the netdev.
+	 */
+	hr_reset_in_prog = test_bit(IDPF_HR_RESET_IN_PROG,
+					vport->adapter->flags);
+	if (hr_reset_in_prog)
+		rtnl_lock();
 	idpf_vport_intr_napi_add_all(vport);
-
+	if (hr_reset_in_prog)
+		rtnl_unlock();
 	err = vport->adapter->dev_ops.reg_ops.intr_reg_init(vport);
 	if (err)
 		goto unroll_vectors_alloc;
-- 
2.46.0.295.g3b9ea8a38a-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [[PATCH v2 iwl-next] v2 2/4] idpf: Acquire the lock before accessing the xn->salt
  2024-08-26 18:10 [[PATCH v2 iwl-next] v2 0/4] Manoj Vishwanathan
  2024-08-26 18:10 ` [[PATCH v2 iwl-next] v2 1/4] idpf: address an rtnl lock splat in tx timeout recovery path Manoj Vishwanathan
@ 2024-08-26 18:10 ` Manoj Vishwanathan
  2024-08-28 21:29   ` [Intel-wired-lan] " Jacob Keller
  2024-08-29 15:54   ` Linga, Pavan Kumar
  2024-08-26 18:10 ` [[PATCH v2 iwl-next] v2 3/4] idpf: convert workqueues to unbound Manoj Vishwanathan
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 15+ messages in thread
From: Manoj Vishwanathan @ 2024-08-26 18:10 UTC (permalink / raw)
  To: Tony Nguyen, Przemek Kitszel, David S. Miller, Eric Dumazet,
	intel-wired-lan
  Cc: netdev, linux-kernel, google-lan-reviews, Manoj Vishwanathan

The transaction salt was being accessed before acquiring the
idpf_vc_xn_lock when idpf has to forward the virtchnl reply.

Fixes: 34c21fa894a1a (“idpf: implement virtchnl transaction manager”)
Signed-off-by: Manoj Vishwanathan <manojvishy@google.com>
---
 drivers/net/ethernet/intel/idpf/idpf_virtchnl.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
index 70986e12da28..30eec674d594 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
@@ -612,14 +612,15 @@ idpf_vc_xn_forward_reply(struct idpf_adapter *adapter,
 		return -EINVAL;
 	}
 	xn = &adapter->vcxn_mngr->ring[xn_idx];
+	idpf_vc_xn_lock(xn);
 	salt = FIELD_GET(IDPF_VC_XN_SALT_M, msg_info);
 	if (xn->salt != salt) {
 		dev_err_ratelimited(&adapter->pdev->dev, "Transaction salt does not match (%02x != %02x)\n",
 				    xn->salt, salt);
+		idpf_vc_xn_unlock(xn);
 		return -EINVAL;
 	}
 
-	idpf_vc_xn_lock(xn);
 	switch (xn->state) {
 	case IDPF_VC_XN_WAITING:
 		/* success */
-- 
2.46.0.295.g3b9ea8a38a-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [[PATCH v2 iwl-next] v2 3/4] idpf: convert workqueues to unbound
  2024-08-26 18:10 [[PATCH v2 iwl-next] v2 0/4] Manoj Vishwanathan
  2024-08-26 18:10 ` [[PATCH v2 iwl-next] v2 1/4] idpf: address an rtnl lock splat in tx timeout recovery path Manoj Vishwanathan
  2024-08-26 18:10 ` [[PATCH v2 iwl-next] v2 2/4] idpf: Acquire the lock before accessing the xn->salt Manoj Vishwanathan
@ 2024-08-26 18:10 ` Manoj Vishwanathan
  2024-08-28 22:02   ` [Intel-wired-lan] " Jacob Keller
  2024-08-29 16:02   ` Linga, Pavan Kumar
  2024-08-26 18:10 ` [[PATCH v2 iwl-next] v2 4/4] idpf: add more info during virtchnl transaction time out Manoj Vishwanathan
  2024-08-29 20:11 ` [Intel-wired-lan] [[PATCH v2 iwl-next] v2 0/4] Tony Nguyen
  4 siblings, 2 replies; 15+ messages in thread
From: Manoj Vishwanathan @ 2024-08-26 18:10 UTC (permalink / raw)
  To: Tony Nguyen, Przemek Kitszel, David S. Miller, Eric Dumazet,
	intel-wired-lan
  Cc: netdev, linux-kernel, google-lan-reviews, Marco Leogrande,
	Manoj Vishwanathan

From: Marco Leogrande <leogrande@google.com>

When a workqueue is created with `WQ_UNBOUND`, its work items are
served by special worker-pools, whose host workers are not bound to
any specific CPU. In the default configuration (i.e. when
`queue_delayed_work` and friends do not specify which CPU to run the
work item on), `WQ_UNBOUND` allows the work item to be executed on any
CPU in the same node of the CPU it was enqueued on. While this
solution potentially sacrifices locality, it avoids contention with
other processes that might dominate the CPU time of the processor the
work item was scheduled on.

This is not just a theoretical problem: in a praticular scenario
misconfigured process was hogging most of the time from CPU0, leaving
less than 0.5% of its CPU time to the kworker. The IDPF workqueues
that were using the kworker on CPU0 suffered large completion delays
as a result, causing performance degradation, timeouts and eventual
system crash.

Tested:

* I have also run a manual test to gauge the performance
  improvement. The test consists of an antagonist process
  (`./stress --cpu 2`) consuming as much of CPU 0 as possible. This
  process is run under `taskset 01` to bind it to CPU0, and its
  priority is changed with `chrt -pQ 9900 10000 ${pid}` and
  `renice -n -20 ${pid}` after start.

  Then, the IDPF driver is forced to prefer CPU0 by editing all calls
  to `queue_delayed_work`, `mod_delayed_work`, etc... to use CPU 0.

  Finally, `ktraces` for the workqueue events are collected.

  Without the current patch, the antagonist process can force
  arbitrary delays between `workqueue_queue_work` and
  `workqueue_execute_start`, that in my tests were as high as
  `30ms`. With the current patch applied, the workqueue can be
  migrated to another unloaded CPU in the same node, and, keeping
  everything else equal, the maximum delay I could see was `6us`.

Fixes: 0fe45467a1041 (idpf: add create vport and netdev configuration)
Signed-off-by: Marco Leogrande <leogrande@google.com>
Signed-off-by: Manoj Vishwanathan <manojvishy@google.com>
---
 drivers/net/ethernet/intel/idpf/idpf_main.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/idpf/idpf_main.c b/drivers/net/ethernet/intel/idpf/idpf_main.c
index db476b3314c8..dfd56fc5ff65 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_main.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_main.c
@@ -174,7 +174,8 @@ static int idpf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	pci_set_master(pdev);
 	pci_set_drvdata(pdev, adapter);
 
-	adapter->init_wq = alloc_workqueue("%s-%s-init", 0, 0,
+	adapter->init_wq = alloc_workqueue("%s-%s-init",
+					   WQ_UNBOUND | WQ_MEM_RECLAIM, 0,
 					   dev_driver_string(dev),
 					   dev_name(dev));
 	if (!adapter->init_wq) {
@@ -183,7 +184,8 @@ static int idpf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 		goto err_free;
 	}
 
-	adapter->serv_wq = alloc_workqueue("%s-%s-service", 0, 0,
+	adapter->serv_wq = alloc_workqueue("%s-%s-service",
+					   WQ_UNBOUND | WQ_MEM_RECLAIM, 0,
 					   dev_driver_string(dev),
 					   dev_name(dev));
 	if (!adapter->serv_wq) {
@@ -192,7 +194,8 @@ static int idpf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 		goto err_serv_wq_alloc;
 	}
 
-	adapter->mbx_wq = alloc_workqueue("%s-%s-mbx", 0, 0,
+	adapter->mbx_wq = alloc_workqueue("%s-%s-mbx",
+					  WQ_UNBOUND | WQ_MEM_RECLAIM, 0,
 					  dev_driver_string(dev),
 					  dev_name(dev));
 	if (!adapter->mbx_wq) {
@@ -201,7 +204,8 @@ static int idpf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 		goto err_mbx_wq_alloc;
 	}
 
-	adapter->stats_wq = alloc_workqueue("%s-%s-stats", 0, 0,
+	adapter->stats_wq = alloc_workqueue("%s-%s-stats",
+					    WQ_UNBOUND | WQ_MEM_RECLAIM, 0,
 					    dev_driver_string(dev),
 					    dev_name(dev));
 	if (!adapter->stats_wq) {
@@ -210,7 +214,8 @@ static int idpf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 		goto err_stats_wq_alloc;
 	}
 
-	adapter->vc_event_wq = alloc_workqueue("%s-%s-vc_event", 0, 0,
+	adapter->vc_event_wq = alloc_workqueue("%s-%s-vc_event",
+					       WQ_UNBOUND | WQ_MEM_RECLAIM, 0,
 					       dev_driver_string(dev),
 					       dev_name(dev));
 	if (!adapter->vc_event_wq) {
-- 
2.46.0.295.g3b9ea8a38a-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [[PATCH v2 iwl-next] v2 4/4] idpf: add more info during virtchnl transaction time out
  2024-08-26 18:10 [[PATCH v2 iwl-next] v2 0/4] Manoj Vishwanathan
                   ` (2 preceding siblings ...)
  2024-08-26 18:10 ` [[PATCH v2 iwl-next] v2 3/4] idpf: convert workqueues to unbound Manoj Vishwanathan
@ 2024-08-26 18:10 ` Manoj Vishwanathan
  2024-08-28 22:03   ` [Intel-wired-lan] " Jacob Keller
  2024-08-29 16:07   ` Linga, Pavan Kumar
  2024-08-29 20:11 ` [Intel-wired-lan] [[PATCH v2 iwl-next] v2 0/4] Tony Nguyen
  4 siblings, 2 replies; 15+ messages in thread
From: Manoj Vishwanathan @ 2024-08-26 18:10 UTC (permalink / raw)
  To: Tony Nguyen, Przemek Kitszel, David S. Miller, Eric Dumazet,
	intel-wired-lan
  Cc: netdev, linux-kernel, google-lan-reviews, Manoj Vishwanathan

Add more information related to the transaction like cookie, vc_op,
salt when transaction times out and include similar information
when transaction salt does not match.

Info output for transaction timeout:
-------------------
(op:5015 cookie:45fe vc_op:5015 salt:45 timeout:60000ms)
-------------------

Signed-off-by: Manoj Vishwanathan <manojvishy@google.com>
---
 drivers/net/ethernet/intel/idpf/idpf_virtchnl.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
index 30eec674d594..d8294f31fdf9 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
@@ -517,8 +517,9 @@ static ssize_t idpf_vc_xn_exec(struct idpf_adapter *adapter,
 		retval = -ENXIO;
 		goto only_unlock;
 	case IDPF_VC_XN_WAITING:
-		dev_notice_ratelimited(&adapter->pdev->dev, "Transaction timed-out (op %d, %dms)\n",
-				       params->vc_op, params->timeout_ms);
+		dev_notice_ratelimited(&adapter->pdev->dev,
+				"Transaction timed-out (op:%d cookie:%04x vc_op:%d salt:%02x timeout:%dms)\n",
+				params->vc_op, cookie, xn->vc_op, xn->salt, params->timeout_ms);
 		retval = -ETIME;
 		break;
 	case IDPF_VC_XN_COMPLETED_SUCCESS:
@@ -615,8 +616,8 @@ idpf_vc_xn_forward_reply(struct idpf_adapter *adapter,
 	idpf_vc_xn_lock(xn);
 	salt = FIELD_GET(IDPF_VC_XN_SALT_M, msg_info);
 	if (xn->salt != salt) {
-		dev_err_ratelimited(&adapter->pdev->dev, "Transaction salt does not match (%02x != %02x)\n",
-				    xn->salt, salt);
+		dev_err_ratelimited(&adapter->pdev->dev, "Transaction salt does not match (exp:%d@%02x(%d) != got:%d@%02x)\n",
+				xn->vc_op, xn->salt, xn->state, ctlq_msg->cookie.mbx.chnl_opcode, salt);
 		idpf_vc_xn_unlock(xn);
 		return -EINVAL;
 	}
-- 
2.46.0.295.g3b9ea8a38a-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [Intel-wired-lan] [[PATCH v2 iwl-next] v2 1/4] idpf: address an rtnl lock splat in tx timeout recovery path
  2024-08-26 18:10 ` [[PATCH v2 iwl-next] v2 1/4] idpf: address an rtnl lock splat in tx timeout recovery path Manoj Vishwanathan
@ 2024-08-28 21:28   ` Jacob Keller
  0 siblings, 0 replies; 15+ messages in thread
From: Jacob Keller @ 2024-08-28 21:28 UTC (permalink / raw)
  To: Manoj Vishwanathan, Tony Nguyen, Przemek Kitszel, David S. Miller,
	Eric Dumazet, intel-wired-lan
  Cc: netdev, David Decotigny, linux-kernel, google-lan-reviews



On 8/26/2024 11:10 AM, Manoj Vishwanathan wrote:
> From: David Decotigny <decot@google.com>
> 
> Adopt the same pattern as in other places in the code to take the rtnl
> lock during hard resets.
> Tested the patch by injecting tx timeout in IDPF , observe that idpf
> recovers and IDPF comes back reachable
> 
> Without this patch causes there is a splat:
> [  270.145214] WARNING: CPU:  PID:  at net/sched/sch_generic.c:534 dev_watchdog
> 
> Fixes: d4d5587182664 (idpf: initialize interrupts and enable vport)
> Signed-off-by: Manoj Vishwanathan <manojvishy@google.com>
> ---
>  drivers/net/ethernet/intel/idpf/idpf_txrx.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
> index af2879f03b8d..806a8b6ea5c5 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_txrx.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
> @@ -4326,6 +4326,7 @@ int idpf_vport_intr_alloc(struct idpf_vport *vport)
>   */
>  int idpf_vport_intr_init(struct idpf_vport *vport)
>  {
> +	bool hr_reset_in_prog;
>  	char *int_name;
>  	int err;
>  
> @@ -4334,8 +4335,19 @@ int idpf_vport_intr_init(struct idpf_vport *vport)
>  		return err;
>  
>  	idpf_vport_intr_map_vector_to_qs(vport);
> +	/** 
> +	 * If we're in normal up path, the stack already takes the
> +	 * rtnl_lock for us, however, if we're doing up as a part of a
> +	 * hard reset, we'll need to take the lock ourself before
> +	 * touching the netdev.
> +	 */
> +	hr_reset_in_prog = test_bit(IDPF_HR_RESET_IN_PROG,
> +					vport->adapter->flags);
> +	if (hr_reset_in_prog)
> +		rtnl_lock();
>  	idpf_vport_intr_napi_add_all(vport);
> -
> +	if (hr_reset_in_prog)
> +		rtnl_unlock();

This feels a little fragile. Why not pass the reset in progress as a
flag from the caller? Surely the caller knows whether this is happening
due to an interface up or due to a reset?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Intel-wired-lan] [[PATCH v2 iwl-next] v2 2/4] idpf: Acquire the lock before accessing the xn->salt
  2024-08-26 18:10 ` [[PATCH v2 iwl-next] v2 2/4] idpf: Acquire the lock before accessing the xn->salt Manoj Vishwanathan
@ 2024-08-28 21:29   ` Jacob Keller
  2024-08-30  6:04     ` Przemek Kitszel
  2024-08-29 15:54   ` Linga, Pavan Kumar
  1 sibling, 1 reply; 15+ messages in thread
From: Jacob Keller @ 2024-08-28 21:29 UTC (permalink / raw)
  To: Manoj Vishwanathan, Tony Nguyen, Przemek Kitszel, David S. Miller,
	Eric Dumazet, intel-wired-lan
  Cc: netdev, linux-kernel, google-lan-reviews



On 8/26/2024 11:10 AM, Manoj Vishwanathan wrote:
> The transaction salt was being accessed before acquiring the
> idpf_vc_xn_lock when idpf has to forward the virtchnl reply.
> 
> Fixes: 34c21fa894a1a (“idpf: implement virtchnl transaction manager”)
> Signed-off-by: Manoj Vishwanathan <manojvishy@google.com>
> ---

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>

>  drivers/net/ethernet/intel/idpf/idpf_virtchnl.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> index 70986e12da28..30eec674d594 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> @@ -612,14 +612,15 @@ idpf_vc_xn_forward_reply(struct idpf_adapter *adapter,
>  		return -EINVAL;
>  	}
>  	xn = &adapter->vcxn_mngr->ring[xn_idx];
> +	idpf_vc_xn_lock(xn);

Could look at implementing cleanup.h based locking here so that we could
use guard or scope_guard and not have to litter the exit paths with unlocks.

I don't think that needs to be done in this patch, though.

>  	salt = FIELD_GET(IDPF_VC_XN_SALT_M, msg_info);
>  	if (xn->salt != salt) {
>  		dev_err_ratelimited(&adapter->pdev->dev, "Transaction salt does not match (%02x != %02x)\n",
>  				    xn->salt, salt);
> +		idpf_vc_xn_unlock(xn);
>  		return -EINVAL;
>  	}
>  
> -	idpf_vc_xn_lock(xn);
>  	switch (xn->state) {
>  	case IDPF_VC_XN_WAITING:
>  		/* success */

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Intel-wired-lan] [[PATCH v2 iwl-next] v2 3/4] idpf: convert workqueues to unbound
  2024-08-26 18:10 ` [[PATCH v2 iwl-next] v2 3/4] idpf: convert workqueues to unbound Manoj Vishwanathan
@ 2024-08-28 22:02   ` Jacob Keller
  2024-08-29 16:02   ` Linga, Pavan Kumar
  1 sibling, 0 replies; 15+ messages in thread
From: Jacob Keller @ 2024-08-28 22:02 UTC (permalink / raw)
  To: Manoj Vishwanathan, Tony Nguyen, Przemek Kitszel, David S. Miller,
	Eric Dumazet, intel-wired-lan
  Cc: netdev, linux-kernel, google-lan-reviews, Marco Leogrande



On 8/26/2024 11:10 AM, Manoj Vishwanathan wrote:
> From: Marco Leogrande <leogrande@google.com>
> 
> When a workqueue is created with `WQ_UNBOUND`, its work items are
> served by special worker-pools, whose host workers are not bound to
> any specific CPU. In the default configuration (i.e. when
> `queue_delayed_work` and friends do not specify which CPU to run the
> work item on), `WQ_UNBOUND` allows the work item to be executed on any
> CPU in the same node of the CPU it was enqueued on. While this
> solution potentially sacrifices locality, it avoids contention with
> other processes that might dominate the CPU time of the processor the
> work item was scheduled on.
> 
> This is not just a theoretical problem: in a praticular scenario

Nit: s/praticular/particular/

> misconfigured process was hogging most of the time from CPU0, leaving
> less than 0.5% of its CPU time to the kworker. The IDPF workqueues
> that were using the kworker on CPU0 suffered large completion delays
> as a result, causing performance degradation, timeouts and eventual
> system crash.
> 

Curious how the delay could result in a full system crash. That seems
like some other concurrency issue. I guess something like a Tx timeout
could happen though.

> Tested:
> 
> * I have also run a manual test to gauge the performance
>   improvement. The test consists of an antagonist process
>   (`./stress --cpu 2`) consuming as much of CPU 0 as possible. This
>   process is run under `taskset 01` to bind it to CPU0, and its
>   priority is changed with `chrt -pQ 9900 10000 ${pid}` and
>   `renice -n -20 ${pid}` after start.
> 
>   Then, the IDPF driver is forced to prefer CPU0 by editing all calls
>   to `queue_delayed_work`, `mod_delayed_work`, etc... to use CPU 0.
> 
>   Finally, `ktraces` for the workqueue events are collected.
> 
>   Without the current patch, the antagonist process can force
>   arbitrary delays between `workqueue_queue_work` and
>   `workqueue_execute_start`, that in my tests were as high as
>   `30ms`. With the current patch applied, the workqueue can be
>   migrated to another unloaded CPU in the same node, and, keeping
>   everything else equal, the maximum delay I could see was `6us`.
> 

Hmm. I don't have a direct issue with using WQ_UNBOUND, and I can't
think of any reason these work queue tasks *need* to be CPU bound.

I do feel like there may be other solutions to managing the tasks on the
system such that this isn't necessary.

However, if using WQ_UNBOUND solves these problems and is simpler in
that system administrators are less likely to screw things up, I think
its a net positive.

I do not know if there are any other side effects of WQ_UNBOUND, so take
this with a grain of salt:
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>


> Fixes: 0fe45467a1041 (idpf: add create vport and netdev configuration)
> Signed-off-by: Marco Leogrande <leogrande@google.com>
> Signed-off-by: Manoj Vishwanathan <manojvishy@google.com>
> ---
>  drivers/net/ethernet/intel/idpf/idpf_main.c | 15 ++++++++++-----
>  1 file changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_main.c b/drivers/net/ethernet/intel/idpf/idpf_main.c
> index db476b3314c8..dfd56fc5ff65 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_main.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_main.c
> @@ -174,7 +174,8 @@ static int idpf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>  	pci_set_master(pdev);
>  	pci_set_drvdata(pdev, adapter);
>  
> -	adapter->init_wq = alloc_workqueue("%s-%s-init", 0, 0,
> +	adapter->init_wq = alloc_workqueue("%s-%s-init",
> +					   WQ_UNBOUND | WQ_MEM_RECLAIM, 0,
>  					   dev_driver_string(dev),
>  					   dev_name(dev));
>  	if (!adapter->init_wq) {
> @@ -183,7 +184,8 @@ static int idpf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>  		goto err_free;
>  	}
>  
> -	adapter->serv_wq = alloc_workqueue("%s-%s-service", 0, 0,
> +	adapter->serv_wq = alloc_workqueue("%s-%s-service",
> +					   WQ_UNBOUND | WQ_MEM_RECLAIM, 0,
>  					   dev_driver_string(dev),
>  					   dev_name(dev));
>  	if (!adapter->serv_wq) {
> @@ -192,7 +194,8 @@ static int idpf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>  		goto err_serv_wq_alloc;
>  	}
>  
> -	adapter->mbx_wq = alloc_workqueue("%s-%s-mbx", 0, 0,
> +	adapter->mbx_wq = alloc_workqueue("%s-%s-mbx",
> +					  WQ_UNBOUND | WQ_MEM_RECLAIM, 0,
>  					  dev_driver_string(dev),
>  					  dev_name(dev));
>  	if (!adapter->mbx_wq) {
> @@ -201,7 +204,8 @@ static int idpf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>  		goto err_mbx_wq_alloc;
>  	}
>  
> -	adapter->stats_wq = alloc_workqueue("%s-%s-stats", 0, 0,
> +	adapter->stats_wq = alloc_workqueue("%s-%s-stats",
> +					    WQ_UNBOUND | WQ_MEM_RECLAIM, 0,
>  					    dev_driver_string(dev),
>  					    dev_name(dev));
>  	if (!adapter->stats_wq) {
> @@ -210,7 +214,8 @@ static int idpf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>  		goto err_stats_wq_alloc;
>  	}
>  
> -	adapter->vc_event_wq = alloc_workqueue("%s-%s-vc_event", 0, 0,
> +	adapter->vc_event_wq = alloc_workqueue("%s-%s-vc_event",
> +					       WQ_UNBOUND | WQ_MEM_RECLAIM, 0,
>  					       dev_driver_string(dev),
>  					       dev_name(dev));
>  	if (!adapter->vc_event_wq) {

This seems like quite a lot of work queues for a driver :D

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Intel-wired-lan] [[PATCH v2 iwl-next] v2 4/4] idpf: add more info during virtchnl transaction time out
  2024-08-26 18:10 ` [[PATCH v2 iwl-next] v2 4/4] idpf: add more info during virtchnl transaction time out Manoj Vishwanathan
@ 2024-08-28 22:03   ` Jacob Keller
  2024-08-29 16:07   ` Linga, Pavan Kumar
  1 sibling, 0 replies; 15+ messages in thread
From: Jacob Keller @ 2024-08-28 22:03 UTC (permalink / raw)
  To: Manoj Vishwanathan, Tony Nguyen, Przemek Kitszel, David S. Miller,
	Eric Dumazet, intel-wired-lan
  Cc: netdev, linux-kernel, google-lan-reviews



On 8/26/2024 11:10 AM, Manoj Vishwanathan wrote:
> Add more information related to the transaction like cookie, vc_op,
> salt when transaction times out and include similar information
> when transaction salt does not match.
> 
> Info output for transaction timeout:
> -------------------
> (op:5015 cookie:45fe vc_op:5015 salt:45 timeout:60000ms)
> -------------------
> 
> Signed-off-by: Manoj Vishwanathan <manojvishy@google.com>
> ---

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [[PATCH v2 iwl-next] v2 2/4] idpf: Acquire the lock before accessing the xn->salt
  2024-08-26 18:10 ` [[PATCH v2 iwl-next] v2 2/4] idpf: Acquire the lock before accessing the xn->salt Manoj Vishwanathan
  2024-08-28 21:29   ` [Intel-wired-lan] " Jacob Keller
@ 2024-08-29 15:54   ` Linga, Pavan Kumar
  1 sibling, 0 replies; 15+ messages in thread
From: Linga, Pavan Kumar @ 2024-08-29 15:54 UTC (permalink / raw)
  To: Manoj Vishwanathan, Tony Nguyen, Przemek Kitszel, David S. Miller,
	Eric Dumazet, intel-wired-lan
  Cc: netdev, linux-kernel, google-lan-reviews



On 8/26/2024 11:10 AM, Manoj Vishwanathan wrote:
> The transaction salt was being accessed before acquiring the
> idpf_vc_xn_lock when idpf has to forward the virtchnl reply.
> 
> Fixes: 34c21fa894a1a (“idpf: implement virtchnl transaction manager”)
> Signed-off-by: Manoj Vishwanathan <manojvishy@google.com>

Reviewed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>

> ---
>   drivers/net/ethernet/intel/idpf/idpf_virtchnl.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> index 70986e12da28..30eec674d594 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> @@ -612,14 +612,15 @@ idpf_vc_xn_forward_reply(struct idpf_adapter *adapter,
>   		return -EINVAL;
>   	}
>   	xn = &adapter->vcxn_mngr->ring[xn_idx];
> +	idpf_vc_xn_lock(xn);
>   	salt = FIELD_GET(IDPF_VC_XN_SALT_M, msg_info);
>   	if (xn->salt != salt) {
>   		dev_err_ratelimited(&adapter->pdev->dev, "Transaction salt does not match (%02x != %02x)\n",
>   				    xn->salt, salt);
> +		idpf_vc_xn_unlock(xn);
>   		return -EINVAL;
>   	}
>   
> -	idpf_vc_xn_lock(xn);
>   	switch (xn->state) {
>   	case IDPF_VC_XN_WAITING:
>   		/* success */

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Intel-wired-lan] [[PATCH v2 iwl-next] v2 3/4] idpf: convert workqueues to unbound
  2024-08-26 18:10 ` [[PATCH v2 iwl-next] v2 3/4] idpf: convert workqueues to unbound Manoj Vishwanathan
  2024-08-28 22:02   ` [Intel-wired-lan] " Jacob Keller
@ 2024-08-29 16:02   ` Linga, Pavan Kumar
  1 sibling, 0 replies; 15+ messages in thread
From: Linga, Pavan Kumar @ 2024-08-29 16:02 UTC (permalink / raw)
  To: Manoj Vishwanathan, Tony Nguyen, Przemek Kitszel, David S. Miller,
	Eric Dumazet, intel-wired-lan
  Cc: netdev, linux-kernel, google-lan-reviews, Marco Leogrande



On 8/26/2024 11:10 AM, Manoj Vishwanathan wrote:
> From: Marco Leogrande <leogrande@google.com>
> 
> When a workqueue is created with `WQ_UNBOUND`, its work items are
> served by special worker-pools, whose host workers are not bound to
> any specific CPU. In the default configuration (i.e. when
> `queue_delayed_work` and friends do not specify which CPU to run the
> work item on), `WQ_UNBOUND` allows the work item to be executed on any
> CPU in the same node of the CPU it was enqueued on. While this
> solution potentially sacrifices locality, it avoids contention with
> other processes that might dominate the CPU time of the processor the
> work item was scheduled on.
> 
> This is not just a theoretical problem: in a praticular scenario > misconfigured process was hogging most of the time from CPU0, leaving
> less than 0.5% of its CPU time to the kworker. The IDPF workqueues
> that were using the kworker on CPU0 suffered large completion delays
> as a result, causing performance degradation, timeouts and eventual
> system crash.
> 
> Tested:
> 
> * I have also run a manual test to gauge the performance
>    improvement. The test consists of an antagonist process
>    (`./stress --cpu 2`) consuming as much of CPU 0 as possible. This
>    process is run under `taskset 01` to bind it to CPU0, and its
>    priority is changed with `chrt -pQ 9900 10000 ${pid}` and
>    `renice -n -20 ${pid}` after start.
> 
>    Then, the IDPF driver is forced to prefer CPU0 by editing all calls
>    to `queue_delayed_work`, `mod_delayed_work`, etc... to use CPU 0.
> 
>    Finally, `ktraces` for the workqueue events are collected.
> 
>    Without the current patch, the antagonist process can force
>    arbitrary delays between `workqueue_queue_work` and
>    `workqueue_execute_start`, that in my tests were as high as
>    `30ms`. With the current patch applied, the workqueue can be
>    migrated to another unloaded CPU in the same node, and, keeping
>    everything else equal, the maximum delay I could see was `6us`.
> 
> Fixes: 0fe45467a1041 (idpf: add create vport and netdev configuration)
> Signed-off-by: Marco Leogrande <leogrande@google.com>
> Signed-off-by: Manoj Vishwanathan <manojvishy@google.com>

Except the nit (s/praticular/particular) what Jake mentioned, changes 
look good to me.

Reviewed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>

> ---
>   drivers/net/ethernet/intel/idpf/idpf_main.c | 15 ++++++++++-----
>   1 file changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_main.c b/drivers/net/ethernet/intel/idpf/idpf_main.c
> index db476b3314c8..dfd56fc5ff65 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_main.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_main.c
> @@ -174,7 +174,8 @@ static int idpf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>   	pci_set_master(pdev);
>   	pci_set_drvdata(pdev, adapter);
>   
> -	adapter->init_wq = alloc_workqueue("%s-%s-init", 0, 0,
> +	adapter->init_wq = alloc_workqueue("%s-%s-init",
> +					   WQ_UNBOUND | WQ_MEM_RECLAIM, 0,
>   					   dev_driver_string(dev),
>   					   dev_name(dev));
>   	if (!adapter->init_wq) {
> @@ -183,7 +184,8 @@ static int idpf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>   		goto err_free;
>   	}
>   
> -	adapter->serv_wq = alloc_workqueue("%s-%s-service", 0, 0,
> +	adapter->serv_wq = alloc_workqueue("%s-%s-service",
> +					   WQ_UNBOUND | WQ_MEM_RECLAIM, 0,
>   					   dev_driver_string(dev),
>   					   dev_name(dev));
>   	if (!adapter->serv_wq) {
> @@ -192,7 +194,8 @@ static int idpf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>   		goto err_serv_wq_alloc;
>   	}
>   
> -	adapter->mbx_wq = alloc_workqueue("%s-%s-mbx", 0, 0,
> +	adapter->mbx_wq = alloc_workqueue("%s-%s-mbx",
> +					  WQ_UNBOUND | WQ_MEM_RECLAIM, 0,
>   					  dev_driver_string(dev),
>   					  dev_name(dev));
>   	if (!adapter->mbx_wq) {
> @@ -201,7 +204,8 @@ static int idpf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>   		goto err_mbx_wq_alloc;
>   	}
>   
> -	adapter->stats_wq = alloc_workqueue("%s-%s-stats", 0, 0,
> +	adapter->stats_wq = alloc_workqueue("%s-%s-stats",
> +					    WQ_UNBOUND | WQ_MEM_RECLAIM, 0,
>   					    dev_driver_string(dev),
>   					    dev_name(dev));
>   	if (!adapter->stats_wq) {
> @@ -210,7 +214,8 @@ static int idpf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>   		goto err_stats_wq_alloc;
>   	}
>   
> -	adapter->vc_event_wq = alloc_workqueue("%s-%s-vc_event", 0, 0,
> +	adapter->vc_event_wq = alloc_workqueue("%s-%s-vc_event",
> +					       WQ_UNBOUND | WQ_MEM_RECLAIM, 0,
>   					       dev_driver_string(dev),
>   					       dev_name(dev));
>   	if (!adapter->vc_event_wq) {

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Intel-wired-lan] [[PATCH v2 iwl-next] v2 4/4] idpf: add more info during virtchnl transaction time out
  2024-08-26 18:10 ` [[PATCH v2 iwl-next] v2 4/4] idpf: add more info during virtchnl transaction time out Manoj Vishwanathan
  2024-08-28 22:03   ` [Intel-wired-lan] " Jacob Keller
@ 2024-08-29 16:07   ` Linga, Pavan Kumar
  1 sibling, 0 replies; 15+ messages in thread
From: Linga, Pavan Kumar @ 2024-08-29 16:07 UTC (permalink / raw)
  To: Manoj Vishwanathan, Tony Nguyen, Przemek Kitszel, David S. Miller,
	Eric Dumazet, intel-wired-lan
  Cc: netdev, linux-kernel, google-lan-reviews



On 8/26/2024 11:10 AM, Manoj Vishwanathan wrote:
> Add more information related to the transaction like cookie, vc_op,
> salt when transaction times out and include similar information
> when transaction salt does not match.
> 
> Info output for transaction timeout:
> -------------------
> (op:5015 cookie:45fe vc_op:5015 salt:45 timeout:60000ms)
> -------------------
> 
> Signed-off-by: Manoj Vishwanathan <manojvishy@google.com>

Reviewed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>

> ---
>   drivers/net/ethernet/intel/idpf/idpf_virtchnl.c | 9 +++++----
>   1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> index 30eec674d594..d8294f31fdf9 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> @@ -517,8 +517,9 @@ static ssize_t idpf_vc_xn_exec(struct idpf_adapter *adapter,
>   		retval = -ENXIO;
>   		goto only_unlock;
>   	case IDPF_VC_XN_WAITING:
> -		dev_notice_ratelimited(&adapter->pdev->dev, "Transaction timed-out (op %d, %dms)\n",
> -				       params->vc_op, params->timeout_ms);
> +		dev_notice_ratelimited(&adapter->pdev->dev,
> +				"Transaction timed-out (op:%d cookie:%04x vc_op:%d salt:%02x timeout:%dms)\n",
> +				params->vc_op, cookie, xn->vc_op, xn->salt, params->timeout_ms);
>   		retval = -ETIME;
>   		break;
>   	case IDPF_VC_XN_COMPLETED_SUCCESS:
> @@ -615,8 +616,8 @@ idpf_vc_xn_forward_reply(struct idpf_adapter *adapter,
>   	idpf_vc_xn_lock(xn);
>   	salt = FIELD_GET(IDPF_VC_XN_SALT_M, msg_info);
>   	if (xn->salt != salt) {
> -		dev_err_ratelimited(&adapter->pdev->dev, "Transaction salt does not match (%02x != %02x)\n",
> -				    xn->salt, salt);
> +		dev_err_ratelimited(&adapter->pdev->dev, "Transaction salt does not match (exp:%d@%02x(%d) != got:%d@%02x)\n",
> +				xn->vc_op, xn->salt, xn->state, ctlq_msg->cookie.mbx.chnl_opcode, salt);
>   		idpf_vc_xn_unlock(xn);
>   		return -EINVAL;
>   	}

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Intel-wired-lan] [[PATCH v2 iwl-next] v2 0/4]
  2024-08-26 18:10 [[PATCH v2 iwl-next] v2 0/4] Manoj Vishwanathan
                   ` (3 preceding siblings ...)
  2024-08-26 18:10 ` [[PATCH v2 iwl-next] v2 4/4] idpf: add more info during virtchnl transaction time out Manoj Vishwanathan
@ 2024-08-29 20:11 ` Tony Nguyen
  4 siblings, 0 replies; 15+ messages in thread
From: Tony Nguyen @ 2024-08-29 20:11 UTC (permalink / raw)
  To: Manoj Vishwanathan, Przemek Kitszel, David S. Miller,
	Eric Dumazet, intel-wired-lan
  Cc: netdev, linux-kernel, google-lan-reviews



On 8/26/2024 11:10 AM, Manoj Vishwanathan wrote:
> IDPF Virtchnl: Enhance error reporting & fix locking/workqueue issues
> 
> This patch series addresses several IDPF virtchnl issues:

How are you sending these patches? Are you sending it all in one 
command/send? The threading is not showing up correctly in Patchwork, 
though lore does show it correctly.

> * Improved error reporting for better diagnostics.
> * Fixed locking sequence in virtchnl message handling to avoid potential race conditions.
> * Converted idpf workqueues to unbound to prevent virtchnl processing delays under heavy load.
>
> Previously, CPU-bound kworkers for virtchnl processing could be starved,
> leading to transaction timeouts and connection failures.
> This was particularly problematic when IRQ traffic and user space processes contended for the same CPU.
> 
> By making the workqueues unbound, we ensure virtchnl processing is not tied to a specific CPU,
> improving responsiveness even under high system load.

This should be broken up between bugs and improvements, one for iwl-net 
(fixes) and one for iwl-next (improvements). Patches 1-3 should go to 
iwl-net and 4 to iwl-next.

All the Fixes: are not in the correct format
e.g.
WARNING: Please use correct Fixes: style 'Fixes: <12 chars of sha1> 
("<title line>")' - ie: 'Fixes: 34c21fa894a1 ("idpf: implement virtchnl 
transaction manager")'

They all have 1 extra char. Patches 1 & 3 are missing the '"' and patch 
is using the wrong '"' char.

Finally, what tree are you using for your patches? Some of the patches 
don't apply to the iwl tree or the netdev tree.

For IWL patches, please use these trees (dev-queue branch)

https://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue.git/

https://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue.git/

Thanks,
Tony

> ---
> V2:
>   - Dropped patch from Willem
>   - RCS/RCT variable naming
>   - Improved commit message on feddback
> v1: https://lore.kernel.org/netdev/20240813182747.1770032-2-manojvishy@google.com/T/
> 
> David Decotigny (1):
>    idpf: address an rtnl lock splat in tx timeout recovery path
> 
> Manoj Vishwanathan (2):
>    idpf: Acquire the lock before accessing the xn->salt
>    idpf: add more info during virtchnl transaction time out
> 
> Marco Leogrande (1):
>    idpf: convert workqueues to unbound
> 
>   drivers/net/ethernet/intel/idpf/idpf_main.c     | 15 ++++++++++-----
>   drivers/net/ethernet/intel/idpf/idpf_txrx.c     | 14 +++++++++++++-
>   drivers/net/ethernet/intel/idpf/idpf_virtchnl.c | 12 +++++++-----
>   3 files changed, 30 insertions(+), 11 deletions(-)
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Intel-wired-lan] [[PATCH v2 iwl-next] v2 2/4] idpf: Acquire the lock before accessing the xn->salt
  2024-08-28 21:29   ` [Intel-wired-lan] " Jacob Keller
@ 2024-08-30  6:04     ` Przemek Kitszel
  2024-08-30 21:31       ` Keller, Jacob E
  0 siblings, 1 reply; 15+ messages in thread
From: Przemek Kitszel @ 2024-08-30  6:04 UTC (permalink / raw)
  To: Jacob Keller, Tony Nguyen
  Cc: netdev, linux-kernel, google-lan-reviews, Manoj Vishwanathan,
	David S. Miller, Eric Dumazet, intel-wired-lan

On 8/28/24 23:29, Jacob Keller wrote:
> 
> 
> On 8/26/2024 11:10 AM, Manoj Vishwanathan wrote:
>> The transaction salt was being accessed before acquiring the
>> idpf_vc_xn_lock when idpf has to forward the virtchnl reply.
>>
>> Fixes: 34c21fa894a1a (“idpf: implement virtchnl transaction manager”)
>> Signed-off-by: Manoj Vishwanathan <manojvishy@google.com>
>> ---
> 
> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
> 
>>   drivers/net/ethernet/intel/idpf/idpf_virtchnl.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
>> index 70986e12da28..30eec674d594 100644
>> --- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
>> +++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
>> @@ -612,14 +612,15 @@ idpf_vc_xn_forward_reply(struct idpf_adapter *adapter,
>>   		return -EINVAL;
>>   	}
>>   	xn = &adapter->vcxn_mngr->ring[xn_idx];
>> +	idpf_vc_xn_lock(xn);
> 
> Could look at implementing cleanup.h based locking here so that we could
> use guard or scope_guard and not have to litter the exit paths with unlocks.

only scope_guard() for networking code

> 
> I don't think that needs to be done in this patch, though.

+1

> 
>>   	salt = FIELD_GET(IDPF_VC_XN_SALT_M, msg_info);
>>   	if (xn->salt != salt) {
>>   		dev_err_ratelimited(&adapter->pdev->dev, "Transaction salt does not match (%02x != %02x)\n",
>>   				    xn->salt, salt);
>> +		idpf_vc_xn_unlock(xn);
>>   		return -EINVAL;
>>   	}
>>   
>> -	idpf_vc_xn_lock(xn);
>>   	switch (xn->state) {
>>   	case IDPF_VC_XN_WAITING:
>>   		/* success */


^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [Intel-wired-lan] [[PATCH v2 iwl-next] v2 2/4] idpf: Acquire the lock before accessing the xn->salt
  2024-08-30  6:04     ` Przemek Kitszel
@ 2024-08-30 21:31       ` Keller, Jacob E
  0 siblings, 0 replies; 15+ messages in thread
From: Keller, Jacob E @ 2024-08-30 21:31 UTC (permalink / raw)
  To: Kitszel, Przemyslaw, Nguyen, Anthony L
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	google-lan-reviews@googlegroups.com, Manoj Vishwanathan,
	David S. Miller, Eric Dumazet, intel-wired-lan@lists.osuosl.org



> -----Original Message-----
> From: Kitszel, Przemyslaw <przemyslaw.kitszel@intel.com>
> Sent: Thursday, August 29, 2024 11:05 PM
> To: Keller, Jacob E <jacob.e.keller@intel.com>; Nguyen, Anthony L
> <anthony.l.nguyen@intel.com>
> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; google-lan-
> reviews@googlegroups.com; Manoj Vishwanathan <manojvishy@google.com>;
> David S. Miller <davem@davemloft.net>; Eric Dumazet
> <edumazet@google.com>; intel-wired-lan@lists.osuosl.org
> Subject: Re: [Intel-wired-lan] [[PATCH v2 iwl-next] v2 2/4] idpf: Acquire the lock
> before accessing the xn->salt
> 
> On 8/28/24 23:29, Jacob Keller wrote:
> >
> >
> > On 8/26/2024 11:10 AM, Manoj Vishwanathan wrote:
> >> The transaction salt was being accessed before acquiring the
> >> idpf_vc_xn_lock when idpf has to forward the virtchnl reply.
> >>
> >> Fixes: 34c21fa894a1a (“idpf: implement virtchnl transaction manager”)
> >> Signed-off-by: Manoj Vishwanathan <manojvishy@google.com>
> >> ---
> >
> > Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
> >
> >>   drivers/net/ethernet/intel/idpf/idpf_virtchnl.c | 3 ++-
> >>   1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> >> index 70986e12da28..30eec674d594 100644
> >> --- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> >> +++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> >> @@ -612,14 +612,15 @@ idpf_vc_xn_forward_reply(struct idpf_adapter
> *adapter,
> >>   		return -EINVAL;
> >>   	}
> >>   	xn = &adapter->vcxn_mngr->ring[xn_idx];
> >> +	idpf_vc_xn_lock(xn);
> >
> > Could look at implementing cleanup.h based locking here so that we could
> > use guard or scope_guard and not have to litter the exit paths with unlocks.
> 
> only scope_guard() for networking code
> 

Yea, leaving it as-is is fine. I personally find cleanup-based locking better, but it appears the maintainers and majority feel otherwise.


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2024-08-30 21:31 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-26 18:10 [[PATCH v2 iwl-next] v2 0/4] Manoj Vishwanathan
2024-08-26 18:10 ` [[PATCH v2 iwl-next] v2 1/4] idpf: address an rtnl lock splat in tx timeout recovery path Manoj Vishwanathan
2024-08-28 21:28   ` [Intel-wired-lan] " Jacob Keller
2024-08-26 18:10 ` [[PATCH v2 iwl-next] v2 2/4] idpf: Acquire the lock before accessing the xn->salt Manoj Vishwanathan
2024-08-28 21:29   ` [Intel-wired-lan] " Jacob Keller
2024-08-30  6:04     ` Przemek Kitszel
2024-08-30 21:31       ` Keller, Jacob E
2024-08-29 15:54   ` Linga, Pavan Kumar
2024-08-26 18:10 ` [[PATCH v2 iwl-next] v2 3/4] idpf: convert workqueues to unbound Manoj Vishwanathan
2024-08-28 22:02   ` [Intel-wired-lan] " Jacob Keller
2024-08-29 16:02   ` Linga, Pavan Kumar
2024-08-26 18:10 ` [[PATCH v2 iwl-next] v2 4/4] idpf: add more info during virtchnl transaction time out Manoj Vishwanathan
2024-08-28 22:03   ` [Intel-wired-lan] " Jacob Keller
2024-08-29 16:07   ` Linga, Pavan Kumar
2024-08-29 20:11 ` [Intel-wired-lan] [[PATCH v2 iwl-next] v2 0/4] Tony Nguyen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).