* [PATCH v1 0/5] IDPF Virtchnl fixes
@ 2024-08-13 18:27 Manoj Vishwanathan
2024-08-13 18:27 ` [PATCH v1 1/5] idpf: address an rtnl lock splat in tx timeout recovery path Manoj Vishwanathan
` (5 more replies)
0 siblings, 6 replies; 9+ messages in thread
From: Manoj Vishwanathan @ 2024-08-13 18:27 UTC (permalink / raw)
To: Tony Nguyen, Przemek Kitszel, David S. Miller, Eric Dumazet,
intel-wired-lan
Cc: netdev, linux-kernel, google-lan-reviews, Manoj Vishwanathan
This patch series is to enhance IDPF virtchnl error reporting and some
minor fixes to the locking sequence in virtchnl message handling
we encountered while testing.
Also includes a minor clean up with regards to warning we encountered
in controlq section of IDPF.
The issue we had here was a virtchnl processing delay leading to the
"xn->salt" mismatch, transaction timeout and connection not recovering.
This was due to default CPU bounded kworker workqueue for virtchnl message
processing being starved by aggressive userspace load causing the
virtchnl processing to be delayed and causing a transaction timeout.
The reason the virtchnl process kworker was stalled as it
was bound to CPU0 by default and there was immense IRQ traffic to CPU0.
All of the above with an aggressive user space process on the same core
lead to the change from Marco Leogrande to convert the idpf workqueues
to unbound.
Manoj Vishwanathan (3):
idpf: address an rtnl lock splat in tx timeout recovery path
idpf: Acquire the lock before accessing the xn->salt
idpf: more info during virtchnl transaction time out
Marco Leogrande (1):
idpf: convert workqueues to unbound
Willem de Bruijn (1):
idpf: warn on possible ctlq overflow
drivers/net/ethernet/intel/idpf/idpf_main.c | 15 ++++++++-----
drivers/net/ethernet/intel/idpf/idpf_txrx.c | 14 ++++++++++++-
.../net/ethernet/intel/idpf/idpf_virtchnl.c | 21 ++++++++++++++-----
3 files changed, 39 insertions(+), 11 deletions(-)
--
2.46.0.76.ge559c4bf1a-goog
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v1 1/5] idpf: address an rtnl lock splat in tx timeout recovery path
2024-08-13 18:27 [PATCH v1 0/5] IDPF Virtchnl fixes Manoj Vishwanathan
@ 2024-08-13 18:27 ` Manoj Vishwanathan
2024-08-13 18:27 ` [PATCH v1 2/5] idpf: Acquire the lock before accessing the xn->salt Manoj Vishwanathan
` (4 subsequent siblings)
5 siblings, 0 replies; 9+ messages in thread
From: Manoj Vishwanathan @ 2024-08-13 18:27 UTC (permalink / raw)
To: Tony Nguyen, Przemek Kitszel, David S. Miller, Eric Dumazet,
intel-wired-lan
Cc: netdev, linux-kernel, google-lan-reviews, Manoj Vishwanathan
Adopt the same pattern as in other places in the code to take the rtnl
lock during hard resets.
Tested the patch by injecting tx timeout in IDPF , observe that idpf
recovers and IDPF comes back reachable
Without this patch causes there is a splat:
[ 270.145214] WARNING: CPU: PID: at net/sched/sch_generic.c:534 dev_watchdog
Signed-off-by: Manoj Vishwanathan <manojvishy@google.com>
---
drivers/net/ethernet/intel/idpf/idpf_txrx.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
index af2879f03b8d..3c01be90fa75 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -4328,14 +4328,26 @@ int idpf_vport_intr_init(struct idpf_vport *vport)
{
char *int_name;
int err;
+ bool hr_reset_in_prog;
err = idpf_vport_intr_init_vec_idx(vport);
if (err)
return err;
idpf_vport_intr_map_vector_to_qs(vport);
+ /**
+ * If we're in normal up path, the stack already takes the
+ * rtnl_lock for us, however, if we're doing up as a part of a
+ * hard reset, we'll need to take the lock ourself before
+ * touching the netdev.
+ */
+ hr_reset_in_prog = test_bit(IDPF_HR_RESET_IN_PROG,
+ vport->adapter->flags);
+ if (hr_reset_in_prog)
+ rtnl_lock();
idpf_vport_intr_napi_add_all(vport);
-
+ if (hr_reset_in_prog)
+ rtnl_unlock();
err = vport->adapter->dev_ops.reg_ops.intr_reg_init(vport);
if (err)
goto unroll_vectors_alloc;
--
2.46.0.76.ge559c4bf1a-goog
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH v1 2/5] idpf: Acquire the lock before accessing the xn->salt
2024-08-13 18:27 [PATCH v1 0/5] IDPF Virtchnl fixes Manoj Vishwanathan
2024-08-13 18:27 ` [PATCH v1 1/5] idpf: address an rtnl lock splat in tx timeout recovery path Manoj Vishwanathan
@ 2024-08-13 18:27 ` Manoj Vishwanathan
2024-08-13 18:27 ` [PATCH v1 3/5] idpf: convert workqueues to unbound Manoj Vishwanathan
` (3 subsequent siblings)
5 siblings, 0 replies; 9+ messages in thread
From: Manoj Vishwanathan @ 2024-08-13 18:27 UTC (permalink / raw)
To: Tony Nguyen, Przemek Kitszel, David S. Miller, Eric Dumazet,
intel-wired-lan
Cc: netdev, linux-kernel, google-lan-reviews, Manoj Vishwanathan
The transaction salt was being accessed before acquiring the
idpf_vc_xn_lock when idpf has to forward the virtchnl reply.
Fixes: 34c21fa894a1 ("idpf: implement virtchnl transaction manager")
Signed-off-by: Manoj Vishwanathan <manojvishy@google.com>
---
drivers/net/ethernet/intel/idpf/idpf_virtchnl.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
index 70986e12da28..30eec674d594 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
@@ -612,14 +612,15 @@ idpf_vc_xn_forward_reply(struct idpf_adapter *adapter,
return -EINVAL;
}
xn = &adapter->vcxn_mngr->ring[xn_idx];
+ idpf_vc_xn_lock(xn);
salt = FIELD_GET(IDPF_VC_XN_SALT_M, msg_info);
if (xn->salt != salt) {
dev_err_ratelimited(&adapter->pdev->dev, "Transaction salt does not match (%02x != %02x)\n",
xn->salt, salt);
+ idpf_vc_xn_unlock(xn);
return -EINVAL;
}
- idpf_vc_xn_lock(xn);
switch (xn->state) {
case IDPF_VC_XN_WAITING:
/* success */
--
2.46.0.76.ge559c4bf1a-goog
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH v1 3/5] idpf: convert workqueues to unbound
2024-08-13 18:27 [PATCH v1 0/5] IDPF Virtchnl fixes Manoj Vishwanathan
2024-08-13 18:27 ` [PATCH v1 1/5] idpf: address an rtnl lock splat in tx timeout recovery path Manoj Vishwanathan
2024-08-13 18:27 ` [PATCH v1 2/5] idpf: Acquire the lock before accessing the xn->salt Manoj Vishwanathan
@ 2024-08-13 18:27 ` Manoj Vishwanathan
2024-08-13 18:27 ` [PATCH v1 4/5] idpf: more info during virtchnl transaction time out Manoj Vishwanathan
` (2 subsequent siblings)
5 siblings, 0 replies; 9+ messages in thread
From: Manoj Vishwanathan @ 2024-08-13 18:27 UTC (permalink / raw)
To: Tony Nguyen, Przemek Kitszel, David S. Miller, Eric Dumazet,
intel-wired-lan
Cc: netdev, linux-kernel, google-lan-reviews, Marco Leogrande,
Manoj Vishwanathan
From: Marco Leogrande <leogrande@google.com>
When a workqueue is created with `WQ_UNBOUND`, its work items are
served by special worker-pools, whose host workers are not bound to
any specific CPU. In the default configuration (i.e. when
`queue_delayed_work` and friends do not specify which CPU to run the
work item on), `WQ_UNBOUND` allows the work item to be executed on any
CPU in the same node of the CPU it was enqueued on. While this
solution potentially sacrifices locality, it avoids contention with
other processes that might dominate the CPU time of the processor the
work item was scheduled on.
This is not just a theoretical problem: in b/317234476, a
misconfigured process was hogging most of the time from CPU0, leaving
less than 0.5% of its CPU time to the kworker. The IDPF workqueues
that were using the kworker on CPU0 suffered large completion delays
as a result, causing performance degradation, timeouts and eventual
system crash.
Tested:
* I have also run a manual test to gauge the performance
improvement. The test consists of an antagonist process
(`./stress --cpu 2`) consuming as much of CPU 0 as possible. This
process is run under `taskset 01` to bind it to CPU0, and its
priority is changed with `chrt -pQ 9900 10000 ${pid}` and
`renice -n -20 ${pid}` after start.
Then, the IDPF driver is forced to prefer CPU0 by editing all calls
to `queue_delayed_work`, `mod_delayed_work`, etc... to use CPU 0.
Finally, `ktraces` for the workqueue events are collected.
Without the current patch, the antagonist process can force
arbitrary delays between `workqueue_queue_work` and
`workqueue_execute_start`, that in my tests were as high as
`30ms`. With the current patch applied, the workqueue can be
migrated to another unloaded CPU in the same node, and, keeping
everything else equal, the maximum delay I could see was `6us`.
Signed-off-by: Marco Leogrande <leogrande@google.com>
Signed-off-by: Manoj Vishwanathan <manojvishy@google.com>
---
drivers/net/ethernet/intel/idpf/idpf_main.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/intel/idpf/idpf_main.c b/drivers/net/ethernet/intel/idpf/idpf_main.c
index db476b3314c8..dfd56fc5ff65 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_main.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_main.c
@@ -174,7 +174,8 @@ static int idpf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
pci_set_master(pdev);
pci_set_drvdata(pdev, adapter);
- adapter->init_wq = alloc_workqueue("%s-%s-init", 0, 0,
+ adapter->init_wq = alloc_workqueue("%s-%s-init",
+ WQ_UNBOUND | WQ_MEM_RECLAIM, 0,
dev_driver_string(dev),
dev_name(dev));
if (!adapter->init_wq) {
@@ -183,7 +184,8 @@ static int idpf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
goto err_free;
}
- adapter->serv_wq = alloc_workqueue("%s-%s-service", 0, 0,
+ adapter->serv_wq = alloc_workqueue("%s-%s-service",
+ WQ_UNBOUND | WQ_MEM_RECLAIM, 0,
dev_driver_string(dev),
dev_name(dev));
if (!adapter->serv_wq) {
@@ -192,7 +194,8 @@ static int idpf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
goto err_serv_wq_alloc;
}
- adapter->mbx_wq = alloc_workqueue("%s-%s-mbx", 0, 0,
+ adapter->mbx_wq = alloc_workqueue("%s-%s-mbx",
+ WQ_UNBOUND | WQ_MEM_RECLAIM, 0,
dev_driver_string(dev),
dev_name(dev));
if (!adapter->mbx_wq) {
@@ -201,7 +204,8 @@ static int idpf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
goto err_mbx_wq_alloc;
}
- adapter->stats_wq = alloc_workqueue("%s-%s-stats", 0, 0,
+ adapter->stats_wq = alloc_workqueue("%s-%s-stats",
+ WQ_UNBOUND | WQ_MEM_RECLAIM, 0,
dev_driver_string(dev),
dev_name(dev));
if (!adapter->stats_wq) {
@@ -210,7 +214,8 @@ static int idpf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
goto err_stats_wq_alloc;
}
- adapter->vc_event_wq = alloc_workqueue("%s-%s-vc_event", 0, 0,
+ adapter->vc_event_wq = alloc_workqueue("%s-%s-vc_event",
+ WQ_UNBOUND | WQ_MEM_RECLAIM, 0,
dev_driver_string(dev),
dev_name(dev));
if (!adapter->vc_event_wq) {
--
2.46.0.76.ge559c4bf1a-goog
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH v1 4/5] idpf: more info during virtchnl transaction time out
2024-08-13 18:27 [PATCH v1 0/5] IDPF Virtchnl fixes Manoj Vishwanathan
` (2 preceding siblings ...)
2024-08-13 18:27 ` [PATCH v1 3/5] idpf: convert workqueues to unbound Manoj Vishwanathan
@ 2024-08-13 18:27 ` Manoj Vishwanathan
2024-08-13 19:28 ` [Intel-wired-lan] " Paul Menzel
2024-08-13 18:27 ` [PATCH v1 5/5] idpf: warn on possible ctlq overflow Manoj Vishwanathan
2024-08-15 22:55 ` [PATCH v1 0/5] IDPF Virtchnl fixes Tony Nguyen
5 siblings, 1 reply; 9+ messages in thread
From: Manoj Vishwanathan @ 2024-08-13 18:27 UTC (permalink / raw)
To: Tony Nguyen, Przemek Kitszel, David S. Miller, Eric Dumazet,
intel-wired-lan
Cc: netdev, linux-kernel, google-lan-reviews, Manoj Vishwanathan
Add more information related to the transaction like cookie, vc_op, salt
when transaction times out and include info like state, vc_op, chnl_opcode
when transaction salt does not match.
Sample output for transaction timeout:
-------------------
Transaction timed-out (op:5015 cookie:45fe vc_op:5015 salt:45 timeout:60000ms)
-------------------
Signed-off-by: Manoj Vishwanathan <manojvishy@google.com>
---
drivers/net/ethernet/intel/idpf/idpf_virtchnl.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
index 30eec674d594..07239afb285e 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
@@ -517,8 +517,9 @@ static ssize_t idpf_vc_xn_exec(struct idpf_adapter *adapter,
retval = -ENXIO;
goto only_unlock;
case IDPF_VC_XN_WAITING:
- dev_notice_ratelimited(&adapter->pdev->dev, "Transaction timed-out (op %d, %dms)\n",
- params->vc_op, params->timeout_ms);
+ dev_notice_ratelimited(&adapter->pdev->dev,
+ "Transaction timed-out (op:%d cookie:%04x vc_op:%d salt:%02x timeout:%dms)\n",
+ params->vc_op, cookie, xn->vc_op, xn->salt, params->timeout_ms);
retval = -ETIME;
break;
case IDPF_VC_XN_COMPLETED_SUCCESS:
@@ -615,8 +616,8 @@ idpf_vc_xn_forward_reply(struct idpf_adapter *adapter,
idpf_vc_xn_lock(xn);
salt = FIELD_GET(IDPF_VC_XN_SALT_M, msg_info);
if (xn->salt != salt) {
- dev_err_ratelimited(&adapter->pdev->dev, "Transaction salt does not match (%02x != %02x)\n",
- xn->salt, salt);
+ dev_err_ratelimited(&adapter->pdev->dev, "Transaction salt does not match (exp:%d@%02x(%d) != got:%d@%02x)\n",
+ xn->vc_op, xn->salt, xn->state, ctlq_msg->cookie.mbx.chnl_opcode, salt);
idpf_vc_xn_unlock(xn);
return -EINVAL;
}
--
2.46.0.76.ge559c4bf1a-goog
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH v1 5/5] idpf: warn on possible ctlq overflow
2024-08-13 18:27 [PATCH v1 0/5] IDPF Virtchnl fixes Manoj Vishwanathan
` (3 preceding siblings ...)
2024-08-13 18:27 ` [PATCH v1 4/5] idpf: more info during virtchnl transaction time out Manoj Vishwanathan
@ 2024-08-13 18:27 ` Manoj Vishwanathan
2024-08-13 19:19 ` Willem de Bruijn
2024-08-15 22:55 ` [PATCH v1 0/5] IDPF Virtchnl fixes Tony Nguyen
5 siblings, 1 reply; 9+ messages in thread
From: Manoj Vishwanathan @ 2024-08-13 18:27 UTC (permalink / raw)
To: Tony Nguyen, Przemek Kitszel, David S. Miller, Eric Dumazet,
intel-wired-lan
Cc: netdev, linux-kernel, google-lan-reviews, Willem de Bruijn,
Manoj Vishwanathan
From: Willem de Bruijn <willemb@google.com>
The virtchannel control queue is lossy to avoid deadlock. Ensure that
no losses occur in practice. Detect a full queue, when overflows may
have happened.
In practice, virtchnl is synchronous currenty and messages generally
take a single slot. Using up anywhere near the full ring is not
expected.
Tested: Running several traffic tests and no logs seen in the dmesg
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Manoj Vishwanathan <manojvishy@google.com>
---
drivers/net/ethernet/intel/idpf/idpf_virtchnl.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
index 07239afb285e..1852836d81e4 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
@@ -218,6 +218,15 @@ static int idpf_mb_clean(struct idpf_adapter *adapter)
if (err)
goto err_kfree;
+ /* Warn if messages may have been dropped */
+ if (num_q_msg == IDPF_DFLT_MBX_Q_LEN) {
+ static atomic_t mbx_full = ATOMIC_INIT(0);
+ int cnt;
+
+ cnt = atomic_inc_return(&mbx_full);
+ net_warn_ratelimited("%s: ctlq full (%d)\n", __func__, cnt);
+ }
+
for (i = 0; i < num_q_msg; i++) {
if (!q_msg[i])
continue;
--
2.46.0.76.ge559c4bf1a-goog
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v1 5/5] idpf: warn on possible ctlq overflow
2024-08-13 18:27 ` [PATCH v1 5/5] idpf: warn on possible ctlq overflow Manoj Vishwanathan
@ 2024-08-13 19:19 ` Willem de Bruijn
0 siblings, 0 replies; 9+ messages in thread
From: Willem de Bruijn @ 2024-08-13 19:19 UTC (permalink / raw)
To: Manoj Vishwanathan, Tony Nguyen, Przemek Kitszel, David S. Miller,
Eric Dumazet, intel-wired-lan
Cc: netdev, linux-kernel, google-lan-reviews, Willem de Bruijn,
Manoj Vishwanathan
Manoj Vishwanathan wrote:
> From: Willem de Bruijn <willemb@google.com>
>
> The virtchannel control queue is lossy to avoid deadlock. Ensure that
> no losses occur in practice. Detect a full queue, when overflows may
> have happened.
>
> In practice, virtchnl is synchronous currenty and messages generally
> take a single slot. Using up anywhere near the full ring is not
> expected.
>
> Tested: Running several traffic tests and no logs seen in the dmesg
>
> Signed-off-by: Willem de Bruijn <willemb@google.com>
This was an internal patch. Not really intended for upstream as is.
> Signed-off-by: Manoj Vishwanathan <manojvishy@google.com>
> ---
> drivers/net/ethernet/intel/idpf/idpf_virtchnl.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> index 07239afb285e..1852836d81e4 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> @@ -218,6 +218,15 @@ static int idpf_mb_clean(struct idpf_adapter *adapter)
> if (err)
> goto err_kfree;
>
> + /* Warn if messages may have been dropped */
> + if (num_q_msg == IDPF_DFLT_MBX_Q_LEN) {
> + static atomic_t mbx_full = ATOMIC_INIT(0);
> + int cnt;
> +
> + cnt = atomic_inc_return(&mbx_full);
> + net_warn_ratelimited("%s: ctlq full (%d)\n", __func__, cnt);
A single static variable across all devices.
If this indeed should never happen, a WARN_ON_ONCE will suffice.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Intel-wired-lan] [PATCH v1 4/5] idpf: more info during virtchnl transaction time out
2024-08-13 18:27 ` [PATCH v1 4/5] idpf: more info during virtchnl transaction time out Manoj Vishwanathan
@ 2024-08-13 19:28 ` Paul Menzel
0 siblings, 0 replies; 9+ messages in thread
From: Paul Menzel @ 2024-08-13 19:28 UTC (permalink / raw)
To: Manoj Vishwanathan
Cc: Tony Nguyen, Przemek Kitszel, David S. Miller, Eric Dumazet,
intel-wired-lan, netdev, linux-kernel, google-lan-reviews
Dear Manoj,
Thank you for your patch.
It’d be great if you made the summary a statement, that means, adding a
verb (in imperative mood), like:
idpf: Add more info during virtchnl transaction timeout
Am 13.08.24 um 20:27 schrieb Manoj Vishwanathan:
> Add more information related to the transaction like cookie, vc_op, salt
> when transaction times out and include info like state, vc_op, chnl_opcode
> when transaction salt does not match.
>
> Sample output for transaction timeout:
> -------------------
> Transaction timed-out (op:5015 cookie:45fe vc_op:5015 salt:45 timeout:60000ms)
> -------------------
>
> Signed-off-by: Manoj Vishwanathan <manojvishy@google.com>
> ---
> drivers/net/ethernet/intel/idpf/idpf_virtchnl.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> index 30eec674d594..07239afb285e 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> @@ -517,8 +517,9 @@ static ssize_t idpf_vc_xn_exec(struct idpf_adapter *adapter,
> retval = -ENXIO;
> goto only_unlock;
> case IDPF_VC_XN_WAITING:
> - dev_notice_ratelimited(&adapter->pdev->dev, "Transaction timed-out (op %d, %dms)\n",
> - params->vc_op, params->timeout_ms);
> + dev_notice_ratelimited(&adapter->pdev->dev,
> + "Transaction timed-out (op:%d cookie:%04x vc_op:%d salt:%02x timeout:%dms)\n",
> + params->vc_op, cookie, xn->vc_op, xn->salt, params->timeout_ms);
> retval = -ETIME;
> break;
> case IDPF_VC_XN_COMPLETED_SUCCESS:
> @@ -615,8 +616,8 @@ idpf_vc_xn_forward_reply(struct idpf_adapter *adapter,
> idpf_vc_xn_lock(xn);
> salt = FIELD_GET(IDPF_VC_XN_SALT_M, msg_info);
> if (xn->salt != salt) {
> - dev_err_ratelimited(&adapter->pdev->dev, "Transaction salt does not match (%02x != %02x)\n",
> - xn->salt, salt);
> + dev_err_ratelimited(&adapter->pdev->dev, "Transaction salt does not match (exp:%d@%02x(%d) != got:%d@%02x)\n",
> + xn->vc_op, xn->salt, xn->state, ctlq_msg->cookie.mbx.chnl_opcode, salt);
> idpf_vc_xn_unlock(xn);
> return -EINVAL;
> }
Kind regards,
Paul
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v1 0/5] IDPF Virtchnl fixes
2024-08-13 18:27 [PATCH v1 0/5] IDPF Virtchnl fixes Manoj Vishwanathan
` (4 preceding siblings ...)
2024-08-13 18:27 ` [PATCH v1 5/5] idpf: warn on possible ctlq overflow Manoj Vishwanathan
@ 2024-08-15 22:55 ` Tony Nguyen
5 siblings, 0 replies; 9+ messages in thread
From: Tony Nguyen @ 2024-08-15 22:55 UTC (permalink / raw)
To: Manoj Vishwanathan, Przemek Kitszel, David S. Miller,
Eric Dumazet, intel-wired-lan
Cc: netdev, linux-kernel, google-lan-reviews
On 8/13/2024 11:27 AM, Manoj Vishwanathan wrote:
Hi Manoj,
It would be great if you could familiarize yourself with the netdev process:
https://docs.kernel.org/process/maintainer-netdev.html#netdev-faq
For the Intel drivers, we follow the same guidelines other than we use
iwl-net|iwl-next rather than net|net-next respectively.
> This patch series is to enhance IDPF virtchnl error reporting and some
> minor fixes to the locking sequence in virtchnl message handling
> we encountered while testing.
> Also includes a minor clean up with regards to warning we encountered
> in controlq section of IDPF.
The bug fixes should be broken out and designated for iwl-net and the
others to iwl-next. Fix patches should include a Fixes: tag.
Also, please use RCT for networking patches (patch 1).
Thanks,
Tony
> The issue we had here was a virtchnl processing delay leading to the
> "xn->salt" mismatch, transaction timeout and connection not recovering.
> This was due to default CPU bounded kworker workqueue for virtchnl message
> processing being starved by aggressive userspace load causing the
> virtchnl processing to be delayed and causing a transaction timeout.
> The reason the virtchnl process kworker was stalled as it
> was bound to CPU0 by default and there was immense IRQ traffic to CPU0.
> All of the above with an aggressive user space process on the same core
> lead to the change from Marco Leogrande to convert the idpf workqueues
> to unbound.
>
> Manoj Vishwanathan (3):
> idpf: address an rtnl lock splat in tx timeout recovery path
> idpf: Acquire the lock before accessing the xn->salt
> idpf: more info during virtchnl transaction time out
>
> Marco Leogrande (1):
> idpf: convert workqueues to unbound
>
> Willem de Bruijn (1):
> idpf: warn on possible ctlq overflow
>
> drivers/net/ethernet/intel/idpf/idpf_main.c | 15 ++++++++-----
> drivers/net/ethernet/intel/idpf/idpf_txrx.c | 14 ++++++++++++-
> .../net/ethernet/intel/idpf/idpf_virtchnl.c | 21 ++++++++++++++-----
> 3 files changed, 39 insertions(+), 11 deletions(-)
>
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2024-08-15 22:55 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-13 18:27 [PATCH v1 0/5] IDPF Virtchnl fixes Manoj Vishwanathan
2024-08-13 18:27 ` [PATCH v1 1/5] idpf: address an rtnl lock splat in tx timeout recovery path Manoj Vishwanathan
2024-08-13 18:27 ` [PATCH v1 2/5] idpf: Acquire the lock before accessing the xn->salt Manoj Vishwanathan
2024-08-13 18:27 ` [PATCH v1 3/5] idpf: convert workqueues to unbound Manoj Vishwanathan
2024-08-13 18:27 ` [PATCH v1 4/5] idpf: more info during virtchnl transaction time out Manoj Vishwanathan
2024-08-13 19:28 ` [Intel-wired-lan] " Paul Menzel
2024-08-13 18:27 ` [PATCH v1 5/5] idpf: warn on possible ctlq overflow Manoj Vishwanathan
2024-08-13 19:19 ` Willem de Bruijn
2024-08-15 22:55 ` [PATCH v1 0/5] IDPF Virtchnl fixes Tony Nguyen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox