* [PATCHv2 0/2] ipoib bugfix
@ 2023-11-21 13:03 Jack Wang
From: Jack Wang @ 2023-11-21 13:03 UTC
To: linux-rdma; +Cc: leon, jgg
We often run into transmit queue timeouts, with a call trace like this:
NETDEV WATCHDOG: ib0.beef (): transmit queue 26 timed out
Call Trace:
call_timer_fn+0x27/0x100
__run_timers.part.0+0x1be/0x230
? mlx5_cq_tasklet_cb+0x6d/0x140 [mlx5_core]
run_timer_softirq+0x26/0x50
__do_softirq+0xbc/0x26d
asm_call_irq_on_stack+0xf/0x20
ib0.beef: transmit timeout: latency 10 msecs
ib0.beef: queue stopped 0, tx_head 0, tx_tail 0, global_tx_head 0, global_tx_tail 0
The last two messages repeated for days.
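For context, the NETDEV WATCHDOG line comes from the networking core's transmit watchdog. For each tx queue, it checks whether the queue is stopped and whether more than dev->watchdog_timeo jiffies have passed since that queue last transmitted, using the wraparound-safe time_after() comparison. When both conditions hold, it calls the driver's ndo_tx_timeout handler, which is ipoib_timeout() here. The following is a simplified user-space model of that check, not the kernel code. jiffies_after() and txq_timed_out() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Wraparound-safe "a is after b" on a jiffies-style counter,
 * modeled on the kernel's time_after() macro. */
static bool jiffies_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Simplified model of the per-queue watchdog test: a queue is reported
 * only if it is stopped AND nothing was transmitted on it for more than
 * watchdog_timeo ticks (strictly after trans_start + watchdog_timeo).
 */
static bool txq_timed_out(bool queue_stopped, unsigned long now,
			  unsigned long trans_start, unsigned long watchdog_timeo)
{
	assert(watchdog_timeo > 0);
	return queue_stopped && jiffies_after(now, trans_start + watchdog_timeo);
}
```

Assume a hypothetical HZ of 1000. Then txq_timed_out(true, 1500, 0, 1000) is true, while txq_timed_out(true, 1500, 0, 10 * 1000) is false. Raising watchdog_timeo from HZ to 10 * HZ, as patch 2/2 does, only lengthens how long a stopped queue may stay idle before the watchdog fires.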
After cross-checking with Mellanox OFED, I noticed that some bugfixes are
missing upstream, so I am taking the liberty of sending them out.
Thx!
v2:
Fix the build error caused by the NAPI API change in v6.7
Jack Wang (2):
ipoib: Fix error code return in ipoib_mcast_join
ipoib: Add tx timeout work to recover queue stop situation
drivers/infiniband/ulp/ipoib/ipoib.h | 4 +++
drivers/infiniband/ulp/ipoib/ipoib_ib.c | 26 ++++++++++++++-
drivers/infiniband/ulp/ipoib/ipoib_main.c | 33 +++++++++++++++++--
.../infiniband/ulp/ipoib/ipoib_multicast.c | 1 +
4 files changed, 61 insertions(+), 3 deletions(-)
--
2.34.1
* [PATCHv2 1/2] ipoib: Fix error code return in ipoib_mcast_join
From: Jack Wang @ 2023-11-21 13:03 UTC
To: linux-rdma; +Cc: leon, jgg
Return the error code to the caller instead of 0 when ib_sa_join_multicast fails.
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
---
drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index 5b3154503bf4..9e6967a40042 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -546,6 +546,7 @@ static int ipoib_mcast_join(struct net_device *dev, struct ipoib_mcast *mcast)
spin_unlock_irq(&priv->lock);
complete(&mcast->done);
spin_lock_irq(&priv->lock);
+ return ret;
}
return 0;
}
--
2.34.1
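The fix changes only the error path. Below is a reduced model of that control flow. fake_sa_join(), mcast_join_before() and mcast_join_after() are hypothetical names. The real code also drops and retakes priv->lock around complete(&mcast->done), as the hunk shows. The model demonstrates the difference: before the fix, the caller sees 0 even though the join failed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the join request: fails with -ENOMEM on demand. */
static int fake_sa_join(bool fail)
{
	return fail ? -ENOMEM : 0;
}

/* Shape of the error path before the fix: cleanup runs, but 0 is returned. */
static int mcast_join_before(bool fail, bool *done)
{
	int ret;

	assert(done != NULL);
	ret = fake_sa_join(fail);
	if (ret)
		*done = true;	/* stands in for complete(&mcast->done) */
	return 0;
}

/* Shape after the fix: the error path propagates ret to the caller. */
static int mcast_join_after(bool fail, bool *done)
{
	int ret;

	assert(done != NULL);
	ret = fake_sa_join(fail);
	if (ret) {
		*done = true;	/* same cleanup as before */
		return ret;	/* the one-line fix */
	}
	return 0;
}
```

Both versions perform the same cleanup. Only the reported status differs: mcast_join_after(true, &done) returns -ENOMEM, while mcast_join_before(true, &done) returns 0.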
* [PATCHv2 2/2] ipoib: Add tx timeout work to recover queue stop situation
From: Jack Wang @ 2023-11-21 13:03 UTC
To: linux-rdma; +Cc: leon, jgg
We sometimes hit tx timeouts in ipoib where the queue appears to stay
stopped and never recovers. A diff against Mellanox OFED shows that
the Mellanox driver has a timeout work item to recover in this case.
Add tx timeout work and NAPI reschedule work to recover from this situation.
Also increase watchdog_timeo to 10 seconds to make the driver more
tolerant of errors.
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
---
drivers/infiniband/ulp/ipoib/ipoib.h | 4 +++
drivers/infiniband/ulp/ipoib/ipoib_ib.c | 26 +++++++++++++++++-
drivers/infiniband/ulp/ipoib/ipoib_main.c | 33 +++++++++++++++++++++--
3 files changed, 60 insertions(+), 3 deletions(-)
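This patch uses the bool return value of napi_schedule(). A return of false means this call did not schedule NAPI, for example because NAPI is already scheduled. The retry loop in ipoib_napi_schedule_work() can be modeled in user space as follows. This is a sketch under stated assumptions:

* struct fake_priv, fake_napi_schedule() and napi_schedule_work() are hypothetical.
* fake_napi_schedule() simply fails a configurable number of times.
* The msleep(3) back-off is replaced by a counter, so the model runs instantly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the state the work item inspects. */
struct fake_priv {
	int busy_polls;		/* fake_napi_schedule() fails this many more times */
	bool queue_stopped;	/* models netif_queue_stopped(priv->dev) */
	bool initialized;	/* models test_bit(IPOIB_FLAG_INITIALIZED, ...) */
	int sleeps;		/* counts back-offs instead of calling msleep(3) */
};

/* Models a bool-returning napi_schedule(): false while it cannot schedule. */
static bool fake_napi_schedule(struct fake_priv *p)
{
	if (p->busy_polls > 0) {
		p->busy_polls--;
		return false;
	}
	return true;
}

/* Same loop shape as ipoib_napi_schedule_work(); returns whether NAPI was
 * scheduled (the driver's work function returns void). */
static bool napi_schedule_work(struct fake_priv *p)
{
	bool ret;

	assert(p != NULL);
	do {
		ret = fake_napi_schedule(p);
		if (!ret)
			p->sleeps++;	/* the driver sleeps 3 ms here */
	} while (!ret && p->queue_stopped && p->initialized);
	return ret;
}
```

The loop stops as soon as one of three things happens:

* NAPI is scheduled.
* The queue is no longer stopped.
* The device is no longer initialized.

The last two conditions keep the work item from retrying once a retry can no longer help, for example while the device is going down.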
diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 35e9c8a330e2..963e936da5e3 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -351,10 +351,12 @@ struct ipoib_dev_priv {
struct workqueue_struct *wq;
struct delayed_work mcast_task;
struct work_struct carrier_on_task;
+ struct work_struct reschedule_napi_work;
struct work_struct flush_light;
struct work_struct flush_normal;
struct work_struct flush_heavy;
struct work_struct restart_task;
+ struct work_struct tx_timeout_work;
struct delayed_work ah_reap_task;
struct delayed_work neigh_reap_task;
struct ib_device *ca;
@@ -499,6 +501,7 @@ int ipoib_send(struct net_device *dev, struct sk_buff *skb,
struct ib_ah *address, u32 dqpn);
void ipoib_reap_ah(struct work_struct *work);
+void ipoib_napi_schedule_work(struct work_struct *work);
struct ipoib_path *__path_find(struct net_device *dev, void *gid);
void ipoib_mark_paths_invalid(struct net_device *dev);
void ipoib_flush_paths(struct net_device *dev);
@@ -510,6 +513,7 @@ void ipoib_ib_tx_timer_func(struct timer_list *t);
void ipoib_ib_dev_flush_light(struct work_struct *work);
void ipoib_ib_dev_flush_normal(struct work_struct *work);
void ipoib_ib_dev_flush_heavy(struct work_struct *work);
+void ipoib_ib_tx_timeout_work(struct work_struct *work);
void ipoib_pkey_event(struct work_struct *work);
void ipoib_ib_dev_cleanup(struct net_device *dev);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 7f84d9866cef..5cde275daa94 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -531,11 +531,35 @@ void ipoib_ib_rx_completion(struct ib_cq *cq, void *ctx_ptr)
napi_schedule(&priv->recv_napi);
}
+/* The function will force napi_schedule */
+void ipoib_napi_schedule_work(struct work_struct *work)
+{
+ struct ipoib_dev_priv *priv =
+ container_of(work, struct ipoib_dev_priv, reschedule_napi_work);
+ bool ret;
+
+ do {
+ ret = napi_schedule(&priv->send_napi);
+ if (!ret)
+ msleep(3);
+ } while (!ret && netif_queue_stopped(priv->dev) &&
+ test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags));
+}
+
void ipoib_ib_tx_completion(struct ib_cq *cq, void *ctx_ptr)
{
struct ipoib_dev_priv *priv = ctx_ptr;
+ bool ret;
- napi_schedule(&priv->send_napi);
+ ret = napi_schedule(&priv->send_napi);
+ /*
+ * if the queue is closed the driver must be able to schedule napi,
+ * otherwise we can end with closed queue forever, because no new
+ * packets to send and napi callback might not get new event after
+ * its re-arm of the napi.
+ */
+ if (!ret && netif_queue_stopped(priv->dev))
+ schedule_work(&priv->reschedule_napi_work);
}
static inline int post_send(struct ipoib_dev_priv *priv,
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 967004ccad98..7a5be705d718 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -1200,7 +1200,34 @@ static void ipoib_timeout(struct net_device *dev, unsigned int txqueue)
netif_queue_stopped(dev), priv->tx_head, priv->tx_tail,
priv->global_tx_head, priv->global_tx_tail);
- /* XXX reset QP, etc. */
+
+ schedule_work(&priv->tx_timeout_work);
+}
+
+void ipoib_ib_tx_timeout_work(struct work_struct *work)
+{
+ struct ipoib_dev_priv *priv = container_of(work,
+ struct ipoib_dev_priv,
+ tx_timeout_work);
+ int err;
+
+ rtnl_lock();
+
+ if (!test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))
+ goto unlock;
+
+ ipoib_stop(priv->dev);
+ err = ipoib_open(priv->dev);
+ if (err) {
+ ipoib_warn(priv, "ipoib_open failed recovering from a tx_timeout, err(%d).\n",
+ err);
+ goto unlock;
+ }
+
+ netif_tx_wake_all_queues(priv->dev);
+unlock:
+ rtnl_unlock();
+
}
static int ipoib_hard_header(struct sk_buff *skb,
@@ -2112,7 +2139,7 @@ void ipoib_setup_common(struct net_device *dev)
ipoib_set_ethtool_ops(dev);
- dev->watchdog_timeo = HZ;
+ dev->watchdog_timeo = 10 * HZ;
dev->flags |= IFF_BROADCAST | IFF_MULTICAST;
@@ -2150,10 +2177,12 @@ static void ipoib_build_priv(struct net_device *dev)
INIT_DELAYED_WORK(&priv->mcast_task, ipoib_mcast_join_task);
INIT_WORK(&priv->carrier_on_task, ipoib_mcast_carrier_on_task);
+ INIT_WORK(&priv->reschedule_napi_work, ipoib_napi_schedule_work);
INIT_WORK(&priv->flush_light, ipoib_ib_dev_flush_light);
INIT_WORK(&priv->flush_normal, ipoib_ib_dev_flush_normal);
INIT_WORK(&priv->flush_heavy, ipoib_ib_dev_flush_heavy);
INIT_WORK(&priv->restart_task, ipoib_mcast_restart_task);
+ INIT_WORK(&priv->tx_timeout_work, ipoib_ib_tx_timeout_work);
INIT_DELAYED_WORK(&priv->ah_reap_task, ipoib_reap_ah);
INIT_DELAYED_WORK(&priv->neigh_reap_task, ipoib_reap_neigh);
}
--
2.34.1
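ipoib_timeout() runs from the watchdog timer (the call_timer_fn and run_timer_softirq frames in the cover letter's trace). That is softirq context, where sleeping is not allowed, and rtnl_lock() is a mutex that can sleep. That is presumably why the recovery is deferred to tx_timeout_work through schedule_work(). The control flow of ipoib_ib_tx_timeout_work() is modeled below, with these assumptions:

* struct fake_dev and the fake_* helpers are hypothetical.
* A bool flag stands in for the RTNL mutex, so the model can assert that stop and open only run while it is held.
* Unlike the real work function, the model returns the open error so it can be checked.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical device model; none of these names are IPoIB APIs. */
struct fake_dev {
	bool admin_up;		/* models IPOIB_FLAG_ADMIN_UP */
	bool open_fails;	/* makes fake_open() fail with -EIO */
	bool rtnl_held;		/* models holding rtnl_lock() */
	bool queues_awake;	/* models netif_tx_wake_all_queues() having run */
	int stops, opens;	/* how many times each step ran */
};

static void fake_stop(struct fake_dev *d)
{
	assert(d->rtnl_held);	/* stop must run under the lock */
	d->stops++;
	d->queues_awake = false;
}

static int fake_open(struct fake_dev *d)
{
	assert(d->rtnl_held);	/* open must run under the lock */
	d->opens++;
	return d->open_fails ? -EIO : 0;
}

/* Same control flow as ipoib_ib_tx_timeout_work(). */
static int tx_timeout_work(struct fake_dev *d)
{
	int err = 0;

	d->rtnl_held = true;		/* rtnl_lock() */
	if (!d->admin_up)
		goto unlock;		/* administratively down: nothing to recover */

	fake_stop(d);
	err = fake_open(d);
	if (err) {
		fprintf(stderr, "open failed recovering from a tx_timeout, err(%d)\n", err);
		goto unlock;
	}
	d->queues_awake = true;		/* netif_tx_wake_all_queues() */
unlock:
	d->rtnl_held = false;		/* rtnl_unlock() */
	return err;
}
```

If the interface is administratively down, the model does nothing. If reopening fails, the queues stay stopped and the error is only reported. Both behaviors match the goto unlock paths in the patch.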
* Re: [PATCHv2 0/2] ipoib bugfix
From: Leon Romanovsky @ 2023-11-26 9:34 UTC
To: linux-rdma, Jack Wang; +Cc: jgg
On Tue, 21 Nov 2023 14:03:14 +0100, Jack Wang wrote:
> We run into queue timeout often with call trace as such:
> [...]
Applied, thanks!
[1/2] ipoib: Fix error code return in ipoib_mcast_join
https://git.kernel.org/rdma/rdma/c/753fff78f43070
[2/2] ipoib: Add tx timeout work to recover queue stop situation
https://git.kernel.org/rdma/rdma/c/50af5d12f7e24b
Best regards,
--
Leon Romanovsky <leon@kernel.org>