netdev.vger.kernel.org archive mirror
* [iwl-next PATCH v3 0/3] IDPF Virtchnl: Enhance error reporting & fix locking/workqueue issues
@ 2024-12-12 23:33 Brian Vazquez
  2024-12-12 23:33 ` [iwl-next PATCH v3 1/3] idpf: Acquire the lock before accessing the xn->salt Brian Vazquez
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Brian Vazquez @ 2024-12-12 23:33 UTC
  To: Brian Vazquez, Tony Nguyen, Przemek Kitszel, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, intel-wired-lan
  Cc: David Decotigny, Vivek Kumar, Anjali Singhai, Sridhar Samudrala,
	linux-kernel, netdev, emil.s.tantilov, Brian Vazquez

This patch series addresses several IDPF virtchnl issues:

* Improved error reporting for better diagnostics.
* Fixed locking sequence in virtchnl message handling to avoid potential race conditions.
* Converted idpf workqueues to unbound to prevent virtchnl processing delays under heavy load.

Previously, CPU-bound kworkers for virtchnl processing could be starved,
leading to transaction timeouts and connection failures.
This was particularly problematic when IRQ traffic and user-space processes contended for the same CPU.

By making the workqueues unbound, we ensure virtchnl processing is not tied to a specific CPU,
improving responsiveness even under high system load.

---
V3:
 - Taking over Manoj's v2 series
 - Dropped "idpf: address an rtnl lock splat in tx timeout recovery
   path" it needs more rework and will be submitted later
 - Addressed nit typo
 - Addressed checkpatch.pl errors and warnings
V2:
 - Dropped patch from Willem
 - RCS/RCT variable naming
 - Improved commit message on feedback
v1: https://lore.kernel.org/netdev/20240813182747.1770032-2-manojvishy@google.com/T/


Manoj Vishwanathan (2):
  idpf: Acquire the lock before accessing the xn->salt
  idpf: add more info during virtchnl transaction time out

Marco Leogrande (1):
  idpf: convert workqueues to unbound

 drivers/net/ethernet/intel/idpf/idpf_main.c     | 15 ++++++++++-----
 drivers/net/ethernet/intel/idpf/idpf_virtchnl.c | 16 +++++++++++-----
 2 files changed, 21 insertions(+), 10 deletions(-)

-- 
2.47.1.613.gc27f4b7a9f-goog



* [iwl-next PATCH v3 1/3] idpf: Acquire the lock before accessing the xn->salt
  2024-12-12 23:33 [iwl-next PATCH v3 0/3] IDPF Virtchnl: Enhance error reporting & fix locking/workqueue issues Brian Vazquez
@ 2024-12-12 23:33 ` Brian Vazquez
  2024-12-12 23:33 ` [iwl-next PATCH v3 2/3] idpf: convert workqueues to unbound Brian Vazquez
  2024-12-12 23:33 ` [iwl-next PATCH v3 3/3] idpf: add more info during virtchnl transaction time out Brian Vazquez
  2 siblings, 0 replies; 8+ messages in thread
From: Brian Vazquez @ 2024-12-12 23:33 UTC
  To: Brian Vazquez, Tony Nguyen, Przemek Kitszel, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, intel-wired-lan
  Cc: David Decotigny, Vivek Kumar, Anjali Singhai, Sridhar Samudrala,
	linux-kernel, netdev, emil.s.tantilov, Manoj Vishwanathan,
	Brian Vazquez, Jacob Keller, Pavan Kumar Linga

From: Manoj Vishwanathan <manojvishy@google.com>

The transaction salt was being accessed before acquiring the
idpf_vc_xn_lock when idpf has to forward the virtchnl reply.

Fixes: 34c21fa894a1 ("idpf: implement virtchnl transaction manager")
Signed-off-by: Manoj Vishwanathan <manojvishy@google.com>
Signed-off-by: David Decotigny <decot@google.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
---
 drivers/net/ethernet/intel/idpf/idpf_virtchnl.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
index d46c95f91b0d..13274544f7f4 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
@@ -612,14 +612,15 @@ idpf_vc_xn_forward_reply(struct idpf_adapter *adapter,
 		return -EINVAL;
 	}
 	xn = &adapter->vcxn_mngr->ring[xn_idx];
+	idpf_vc_xn_lock(xn);
 	salt = FIELD_GET(IDPF_VC_XN_SALT_M, msg_info);
 	if (xn->salt != salt) {
 		dev_err_ratelimited(&adapter->pdev->dev, "Transaction salt does not match (%02x != %02x)\n",
 				    xn->salt, salt);
+		idpf_vc_xn_unlock(xn);
 		return -EINVAL;
 	}
 
-	idpf_vc_xn_lock(xn);
 	switch (xn->state) {
 	case IDPF_VC_XN_WAITING:
 		/* success */
-- 
2.47.1.613.gc27f4b7a9f-goog
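
The ordering change in this patch can be illustrated outside the kernel. What follows is only a minimal user-space sketch, assuming a pthread mutex in place of idpf_vc_xn_lock(); `struct xn_slot`, `xn_slot_init()` and `xn_forward_reply()` are hypothetical names, and the error returned for a slot that is not waiting is chosen arbitrarily for the sketch. The point it shows is that the salt is compared only while the lock is held, and that the mismatch path drops the lock before returning.

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for one transaction slot; not the driver's struct. */
struct xn_slot {
	pthread_mutex_t lock;
	uint8_t salt;	/* changes every time the slot is reused */
	int state;	/* 0 = idle, 1 = waiting for a reply */
};

/* Set up a slot that is waiting for a reply tagged with `salt`. */
static void xn_slot_init(struct xn_slot *xn, uint8_t salt)
{
	pthread_mutex_init(&xn->lock, NULL);
	xn->salt = salt;
	xn->state = 1;
}

/*
 * Match a reply against the slot. The salt is read only after the lock
 * is taken, and every return path releases the lock.
 */
static int xn_forward_reply(struct xn_slot *xn, uint8_t reply_salt)
{
	int ret = 0;

	pthread_mutex_lock(&xn->lock);
	if (xn->salt != reply_salt) {
		ret = -EINVAL;	/* stale or foreign reply */
		goto unlock;
	}
	if (xn->state != 1) {
		ret = -ENXIO;	/* arbitrary choice for this sketch */
		goto unlock;
	}
	xn->state = 0;		/* reply consumed, slot is idle again */
unlock:
	pthread_mutex_unlock(&xn->lock);
	return ret;
}
```

Calling `xn_forward_reply()` with a wrong salt leaves the slot state untouched, which is the property the lock ordering is meant to protect.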



* [iwl-next PATCH v3 2/3] idpf: convert workqueues to unbound
  2024-12-12 23:33 [iwl-next PATCH v3 0/3] IDPF Virtchnl: Enhance error reporting & fix locking/workqueue issues Brian Vazquez
  2024-12-12 23:33 ` [iwl-next PATCH v3 1/3] idpf: Acquire the lock before accessing the xn->salt Brian Vazquez
@ 2024-12-12 23:33 ` Brian Vazquez
  2024-12-13 12:36   ` Hillf Danton
  2024-12-12 23:33 ` [iwl-next PATCH v3 3/3] idpf: add more info during virtchnl transaction time out Brian Vazquez
  2 siblings, 1 reply; 8+ messages in thread
From: Brian Vazquez @ 2024-12-12 23:33 UTC
  To: Brian Vazquez, Tony Nguyen, Przemek Kitszel, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, intel-wired-lan
  Cc: David Decotigny, Vivek Kumar, Anjali Singhai, Sridhar Samudrala,
	linux-kernel, netdev, emil.s.tantilov, Marco Leogrande,
	Manoj Vishwanathan, Brian Vazquez, Jacob Keller,
	Pavan Kumar Linga

From: Marco Leogrande <leogrande@google.com>

When a workqueue is created with `WQ_UNBOUND`, its work items are
served by special worker-pools, whose host workers are not bound to
any specific CPU. In the default configuration (i.e. when
`queue_delayed_work` and friends do not specify which CPU to run the
work item on), `WQ_UNBOUND` allows the work item to be executed on any
CPU in the same node of the CPU it was enqueued on. While this
solution potentially sacrifices locality, it avoids contention with
other processes that might dominate the CPU time of the processor the
work item was scheduled on.

This is not just a theoretical problem: in a particular scenario a
misconfigured process was hogging most of the time from CPU0, leaving
less than 0.5% of its CPU time to the kworker. The IDPF workqueues
that were using the kworker on CPU0 suffered large completion delays
as a result, causing performance degradation, timeouts and an eventual
system crash.

Tested:

* I have also run a manual test to gauge the performance
  improvement. The test consists of an antagonist process
  (`./stress --cpu 2`) consuming as much of CPU 0 as possible. This
  process is run under `taskset 01` to bind it to CPU0, and its
  priority is changed with `chrt -pQ 9900 10000 ${pid}` and
  `renice -n -20 ${pid}` after start.

  Then, the IDPF driver is forced to prefer CPU0 by editing all calls
  to `queue_delayed_work`, `mod_delayed_work`, etc. to use CPU 0.

  Finally, `ktraces` for the workqueue events are collected.

  Without the current patch, the antagonist process can force
  arbitrary delays between `workqueue_queue_work` and
  `workqueue_execute_start`, which in my tests were as high as
  `30ms`. With the current patch applied, the work item can be
  migrated to another unloaded CPU in the same node and, keeping
  everything else equal, the maximum delay I could see was `6us`.

Fixes: 0fe45467a104 ("idpf: add create vport and netdev configuration")
Signed-off-by: Marco Leogrande <leogrande@google.com>
Signed-off-by: Manoj Vishwanathan <manojvishy@google.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
---
 drivers/net/ethernet/intel/idpf/idpf_main.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/idpf/idpf_main.c b/drivers/net/ethernet/intel/idpf/idpf_main.c
index 305958c4c230..da1e3525719f 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_main.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_main.c
@@ -198,7 +198,8 @@ static int idpf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	pci_set_master(pdev);
 	pci_set_drvdata(pdev, adapter);
 
-	adapter->init_wq = alloc_workqueue("%s-%s-init", 0, 0,
+	adapter->init_wq = alloc_workqueue("%s-%s-init",
+					   WQ_UNBOUND | WQ_MEM_RECLAIM, 0,
 					   dev_driver_string(dev),
 					   dev_name(dev));
 	if (!adapter->init_wq) {
@@ -207,7 +208,8 @@ static int idpf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 		goto err_free;
 	}
 
-	adapter->serv_wq = alloc_workqueue("%s-%s-service", 0, 0,
+	adapter->serv_wq = alloc_workqueue("%s-%s-service",
+					   WQ_UNBOUND | WQ_MEM_RECLAIM, 0,
 					   dev_driver_string(dev),
 					   dev_name(dev));
 	if (!adapter->serv_wq) {
@@ -216,7 +218,8 @@ static int idpf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 		goto err_serv_wq_alloc;
 	}
 
-	adapter->mbx_wq = alloc_workqueue("%s-%s-mbx", 0, 0,
+	adapter->mbx_wq = alloc_workqueue("%s-%s-mbx",
+					  WQ_UNBOUND | WQ_MEM_RECLAIM, 0,
 					  dev_driver_string(dev),
 					  dev_name(dev));
 	if (!adapter->mbx_wq) {
@@ -225,7 +228,8 @@ static int idpf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 		goto err_mbx_wq_alloc;
 	}
 
-	adapter->stats_wq = alloc_workqueue("%s-%s-stats", 0, 0,
+	adapter->stats_wq = alloc_workqueue("%s-%s-stats",
+					    WQ_UNBOUND | WQ_MEM_RECLAIM, 0,
 					    dev_driver_string(dev),
 					    dev_name(dev));
 	if (!adapter->stats_wq) {
@@ -234,7 +238,8 @@ static int idpf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 		goto err_stats_wq_alloc;
 	}
 
-	adapter->vc_event_wq = alloc_workqueue("%s-%s-vc_event", 0, 0,
+	adapter->vc_event_wq = alloc_workqueue("%s-%s-vc_event",
+					       WQ_UNBOUND | WQ_MEM_RECLAIM, 0,
 					       dev_driver_string(dev),
 					       dev_name(dev));
 	if (!adapter->vc_event_wq) {
-- 
2.47.1.613.gc27f4b7a9f-goog
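
The placement rule described in the commit message can be pictured with a toy model. This is not kernel code and does not use the workqueue API; `struct toy_cpu`, `pick_cpu_bound()` and `pick_cpu_unbound()` are hypothetical names, and picking the least-loaded CPU is only a stand-in for whatever the scheduler would actually decide. The sketch assumes nothing beyond what the message states: a bound work item stays on the CPU that queued it, while an unbound one may run on any CPU of the same node.

```c
#include <stddef.h>

/* Toy description of one CPU: its NUMA node and how busy it is. */
struct toy_cpu {
	int node;	/* NUMA node id */
	int load_pct;	/* share of CPU time taken by other tasks, 0..100 */
};

/* Bound (per-CPU) queue: the work item stays on the CPU that queued it. */
static int pick_cpu_bound(const struct toy_cpu *cpus, size_t n, int queued_on)
{
	(void)cpus;
	(void)n;
	return queued_on;
}

/*
 * Unbound queue: any CPU on the same node is eligible. The least-loaded
 * one is chosen here as a stand-in for the scheduler's decision.
 */
static int pick_cpu_unbound(const struct toy_cpu *cpus, size_t n, int queued_on)
{
	int best = queued_on;

	for (size_t i = 0; i < n; i++) {
		if (cpus[i].node != cpus[queued_on].node)
			continue;	/* other node: not eligible */
		if (cpus[i].load_pct < cpus[best].load_pct)
			best = (int)i;
	}
	return best;
}
```

With CPU 0 almost fully consumed by an antagonist, the bound variant still returns CPU 0, whereas the unbound variant returns an idle CPU on the same node and never one from a different node.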



* [iwl-next PATCH v3 3/3] idpf: add more info during virtchnl transaction time out
  2024-12-12 23:33 [iwl-next PATCH v3 0/3] IDPF Virtchnl: Enhance error reporting & fix locking/workqueue issues Brian Vazquez
  2024-12-12 23:33 ` [iwl-next PATCH v3 1/3] idpf: Acquire the lock before accessing the xn->salt Brian Vazquez
  2024-12-12 23:33 ` [iwl-next PATCH v3 2/3] idpf: convert workqueues to unbound Brian Vazquez
@ 2024-12-12 23:33 ` Brian Vazquez
  2024-12-13  9:36   ` [Intel-wired-lan] " Paul Menzel
  2 siblings, 1 reply; 8+ messages in thread
From: Brian Vazquez @ 2024-12-12 23:33 UTC
  To: Brian Vazquez, Tony Nguyen, Przemek Kitszel, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, intel-wired-lan
  Cc: David Decotigny, Vivek Kumar, Anjali Singhai, Sridhar Samudrala,
	linux-kernel, netdev, emil.s.tantilov, Manoj Vishwanathan,
	Brian Vazquez, Jacob Keller, Pavan Kumar Linga

From: Manoj Vishwanathan <manojvishy@google.com>

Add more information about the transaction (cookie, vc_op, salt) to
the message logged when a transaction times out, and include similar
information when the transaction salt does not match.

Info output for transaction timeout:
-------------------
(op:5015 cookie:45fe vc_op:5015 salt:45 timeout:60000ms)
-------------------

Signed-off-by: Manoj Vishwanathan <manojvishy@google.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
---
 drivers/net/ethernet/intel/idpf/idpf_virtchnl.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
index 13274544f7f4..c7d82f142f4e 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
@@ -517,8 +517,10 @@ static ssize_t idpf_vc_xn_exec(struct idpf_adapter *adapter,
 		retval = -ENXIO;
 		goto only_unlock;
 	case IDPF_VC_XN_WAITING:
-		dev_notice_ratelimited(&adapter->pdev->dev, "Transaction timed-out (op %d, %dms)\n",
-				       params->vc_op, params->timeout_ms);
+		dev_notice_ratelimited(&adapter->pdev->dev,
+				       "Transaction timed-out (op:%d cookie:%04x vc_op:%d salt:%02x timeout:%dms)\n",
+				       params->vc_op, cookie, xn->vc_op,
+				       xn->salt, params->timeout_ms);
 		retval = -ETIME;
 		break;
 	case IDPF_VC_XN_COMPLETED_SUCCESS:
@@ -615,8 +617,9 @@ idpf_vc_xn_forward_reply(struct idpf_adapter *adapter,
 	idpf_vc_xn_lock(xn);
 	salt = FIELD_GET(IDPF_VC_XN_SALT_M, msg_info);
 	if (xn->salt != salt) {
-		dev_err_ratelimited(&adapter->pdev->dev, "Transaction salt does not match (%02x != %02x)\n",
-				    xn->salt, salt);
+		dev_err_ratelimited(&adapter->pdev->dev, "Transaction salt does not match (exp:%d@%02x(%d) != got:%d@%02x)\n",
+				    xn->vc_op, xn->salt, xn->state,
+				    ctlq_msg->cookie.mbx.chnl_opcode, salt);
 		idpf_vc_xn_unlock(xn);
 		return -EINVAL;
 	}
-- 
2.47.1.613.gc27f4b7a9f-goog
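
For readers decoding the new timeout line, here is a small user-space sketch. It reuses the conversion specifiers from the patch with snprintf() in place of dev_notice_ratelimited(). The cookie layout (salt in the upper byte, ring index in the lower byte) is an assumption inferred from the sample output above, where cookie 45fe appears next to salt 45; it is not taken from the driver headers. `cookie_salt()`, `cookie_idx()` and `format_timeout()` are hypothetical helper names.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Assumed layout: salt in bits 15..8, ring index in bits 7..0.
 * Inferred from the sample log line, not from the driver source.
 */
static uint8_t cookie_salt(uint16_t cookie)
{
	return (uint8_t)(cookie >> 8);
}

static uint8_t cookie_idx(uint16_t cookie)
{
	return (uint8_t)(cookie & 0xffu);
}

/* Build the timeout message with the same format specifiers as the patch. */
static int format_timeout(char *buf, size_t len, int op, uint16_t cookie,
			  int xn_vc_op, uint8_t salt, int timeout_ms)
{
	return snprintf(buf, len,
			"Transaction timed-out (op:%d cookie:%04x vc_op:%d salt:%02x timeout:%dms)",
			op, (unsigned int)cookie, xn_vc_op,
			(unsigned int)salt, timeout_ms);
}
```

Feeding the values from the sample (op 5015, cookie 0x45fe, timeout 60000) reproduces the parenthesised part of the line shown in the commit message. Under the assumed layout, the salt printed separately matches the upper byte of the cookie.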



* Re: [Intel-wired-lan] [iwl-next PATCH v3 3/3] idpf: add more info during virtchnl transaction time out
  2024-12-12 23:33 ` [iwl-next PATCH v3 3/3] idpf: add more info during virtchnl transaction time out Brian Vazquez
@ 2024-12-13  9:36   ` Paul Menzel
  2024-12-16 16:25     ` Brian Vazquez
  0 siblings, 1 reply; 8+ messages in thread
From: Paul Menzel @ 2024-12-13  9:36 UTC
  To: Brian Vazquez, Manoj Vishwanathan
  Cc: Brian Vazquez, Tony Nguyen, Przemek Kitszel, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, intel-wired-lan,
	David Decotigny, Vivek Kumar, Anjali Singhai, Sridhar Samudrala,
	linux-kernel, netdev, emil.s.tantilov, Jacob Keller,
	Pavan Kumar Linga

Dear Brian, dear Manoj,


Thank you for your patch.

Am 13.12.24 um 00:33 schrieb Brian Vazquez:
> From: Manoj Vishwanathan <manojvishy@google.com>
> 
> Add more information related to the transaction like cookie, vc_op,
> salt when transaction times out and include similar information
> when transaction salt does not match.

If possible, the salt mismatch should also go into the summary/title. Maybe:

idpf: Add more info during virtchnl transaction timeout/salt mismatch

> Info output for transaction timeout:
> -------------------
> (op:5015 cookie:45fe vc_op:5015 salt:45 timeout:60000ms)
> -------------------

For easier comparison, before it was:

(op 5015, 60000ms)

> Signed-off-by: Manoj Vishwanathan <manojvishy@google.com>
> Signed-off-by: Brian Vazquez <brianvv@google.com>
> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
> Reviewed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
> ---
>   drivers/net/ethernet/intel/idpf/idpf_virtchnl.c | 13 +++++++++----
>   1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> index 13274544f7f4..c7d82f142f4e 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> @@ -517,8 +517,10 @@ static ssize_t idpf_vc_xn_exec(struct idpf_adapter *adapter,
>   		retval = -ENXIO;
>   		goto only_unlock;
>   	case IDPF_VC_XN_WAITING:
> -		dev_notice_ratelimited(&adapter->pdev->dev, "Transaction timed-out (op %d, %dms)\n",
> -				       params->vc_op, params->timeout_ms);
> +		dev_notice_ratelimited(&adapter->pdev->dev,
> +				       "Transaction timed-out (op:%d cookie:%04x vc_op:%d salt:%02x timeout:%dms)\n",
> +				       params->vc_op, cookie, xn->vc_op,
> +				       xn->salt, params->timeout_ms);
>   		retval = -ETIME;
>   		break;
>   	case IDPF_VC_XN_COMPLETED_SUCCESS:
> @@ -615,8 +617,9 @@ idpf_vc_xn_forward_reply(struct idpf_adapter *adapter,
>   	idpf_vc_xn_lock(xn);
>   	salt = FIELD_GET(IDPF_VC_XN_SALT_M, msg_info);
>   	if (xn->salt != salt) {
> -		dev_err_ratelimited(&adapter->pdev->dev, "Transaction salt does not match (%02x != %02x)\n",
> -				    xn->salt, salt);
> +		dev_err_ratelimited(&adapter->pdev->dev, "Transaction salt does not match (exp:%d@%02x(%d) != got:%d@%02x)\n",
> +				    xn->vc_op, xn->salt, xn->state,
> +				    ctlq_msg->cookie.mbx.chnl_opcode, salt);
>   		idpf_vc_xn_unlock(xn);
>   		return -EINVAL;
>   	}

Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>


Kind regards,

Paul


* Re: [iwl-next PATCH v3 2/3] idpf: convert workqueues to unbound
  2024-12-12 23:33 ` [iwl-next PATCH v3 2/3] idpf: convert workqueues to unbound Brian Vazquez
@ 2024-12-13 12:36   ` Hillf Danton
  2024-12-16 16:35     ` Brian Vazquez
  0 siblings, 1 reply; 8+ messages in thread
From: Hillf Danton @ 2024-12-13 12:36 UTC
  To: Brian Vazquez, Marco Leogrande; +Cc: Eric Dumazet, linux-kernel, netdev

On Thu, 12 Dec 2024 23:33:32 +0000 Brian Vazquez <brianvv@google.com>
> When a workqueue is created with `WQ_UNBOUND`, its work items are
> served by special worker-pools, whose host workers are not bound to
> any specific CPU. In the default configuration (i.e. when
> `queue_delayed_work` and friends do not specify which CPU to run the
> work item on), `WQ_UNBOUND` allows the work item to be executed on any
> CPU in the same node of the CPU it was enqueued on. While this
> solution potentially sacrifices locality, it avoids contention with
> other processes that might dominate the CPU time of the processor the
> work item was scheduled on.
> 
> This is not just a theoretical problem: in a particular scenario

The CPU hog caused by the (user-space) misconfiguration exists
regardless of whether the workqueue is bound or not. On top of that,
the Linux kernel is never the blue pill that kills all pains, so extra
support for the unbound wq is needed.

> misconfigured process was hogging most of the time from CPU0, leaving
> less than 0.5% of its CPU time to the kworker. The IDPF workqueues
> that were using the kworker on CPU0 suffered large completion delays
> as a result, causing performance degradation, timeouts and eventual
> system crash.


* Re: [Intel-wired-lan] [iwl-next PATCH v3 3/3] idpf: add more info during virtchnl transaction time out
  2024-12-13  9:36   ` [Intel-wired-lan] " Paul Menzel
@ 2024-12-16 16:25     ` Brian Vazquez
  0 siblings, 0 replies; 8+ messages in thread
From: Brian Vazquez @ 2024-12-16 16:25 UTC
  To: Paul Menzel
  Cc: Manoj Vishwanathan, Brian Vazquez, Tony Nguyen, Przemek Kitszel,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	intel-wired-lan, David Decotigny, Vivek Kumar, Anjali Singhai,
	Sridhar Samudrala, linux-kernel, netdev, emil.s.tantilov,
	Jacob Keller, Pavan Kumar Linga

Thanks for the feedback, I will address it in the next version.



* Re: [iwl-next PATCH v3 2/3] idpf: convert workqueues to unbound
  2024-12-13 12:36   ` Hillf Danton
@ 2024-12-16 16:35     ` Brian Vazquez
  0 siblings, 0 replies; 8+ messages in thread
From: Brian Vazquez @ 2024-12-16 16:35 UTC
  To: Hillf Danton; +Cc: Marco Leogrande, Eric Dumazet, linux-kernel, netdev

On Fri, Dec 13, 2024 at 7:51 AM Hillf Danton <hdanton@sina.com> wrote:
>
> On Thu, 12 Dec 2024 23:33:32 +0000 Brian Vazquez <brianvv@google.com>
> > When a workqueue is created with `WQ_UNBOUND`, its work items are
> > served by special worker-pools, whose host workers are not bound to
> > any specific CPU. In the default configuration (i.e. when
> > `queue_delayed_work` and friends do not specify which CPU to run the
> > work item on), `WQ_UNBOUND` allows the work item to be executed on any
> > CPU in the same node of the CPU it was enqueued on. While this
> > solution potentially sacrifices locality, it avoids contention with
> > other processes that might dominate the CPU time of the processor the
> > work item was scheduled on.
> >
> > This is not just a theoretical problem: in a particular scenario
>
> The CPU hog caused by the (user-space) misconfiguration exists
> regardless of whether the workqueue is bound or not. On top of that,
> the Linux kernel is never the blue pill that kills all pains, so extra
> support for the unbound wq is needed.
>

I agree that the misconfig could exist even with an unbound wq. Still, an
unbound wq gives the process an opportunity to run if resources are
available. If not, it means that the system is under stress and users
should take a deeper look anyway.

> > misconfigured process was hogging most of the time from CPU0, leaving
> > less than 0.5% of its CPU time to the kworker. The IDPF workqueues
> > that were using the kworker on CPU0 suffered large completion delays
> > as a result, causing performance degradation, timeouts and eventual
> > system crash.

