* [PATCH v2] net/sched: adjust device watchdog timer to detect stopped queue at right time
@ 2024-05-06 13:59 Praveen Kumar Kannoju
2024-05-08 0:30 ` Jakub Kicinski
From: Praveen Kumar Kannoju @ 2024-05-06 13:59 UTC
To: jhs, xiyou.wangcong, jiri, davem, netdev, linux-kernel
Cc: rajesh.sivaramasubramaniom, rama.nichanamatlu, manjunath.b.patil,
Praveen Kumar Kannoju
Applications, particularly heartbeat monitoring ones, are sensitive to
long network latency. The longer the tx timeout recovery takes, the
higher the risk for such applications on production machines. This patch
shortens the recovery time while still honoring the device-set tx timeout.
Modify the watchdog's next timeout to be shorter than the device-specified
one. Compute the next timeout as the device watchdog timeout less the time
elapsed since the oldest queue stop. At the next watchdog timeout, the tx
timeout handler is called if the queue is still stopped. Whether or not the
handler is called, the watchdog timeout is restored to the device-specified value.
Signed-off-by: Praveen Kumar Kannoju <praveen.kannoju@oracle.com>
---
v2:
- Identify the oldest trans_start from all the queues and use it.
v1: https://lore.kernel.org/netdev/20240430140010.5005-1-praveen.kannoju@oracle.com/
---
net/sched/sch_generic.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 4a2c763e2d11..840b995c7233 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -506,19 +506,22 @@ static void dev_watchdog(struct timer_list *t)
 			unsigned int timedout_ms = 0;
 			unsigned int i;
 			unsigned long trans_start;
+			unsigned long oldest_start = jiffies;

 			for (i = 0; i < dev->num_tx_queues; i++) {
 				struct netdev_queue *txq;

 				txq = netdev_get_tx_queue(dev, i);
 				trans_start = READ_ONCE(txq->trans_start);
-				if (netif_xmit_stopped(txq) &&
-				    time_after(jiffies, (trans_start +
-							 dev->watchdog_timeo))) {
+				if (!netif_xmit_stopped(txq))
+					continue;
+				if (time_after(jiffies, (trans_start + dev->watchdog_timeo))) {
 					timedout_ms = jiffies_to_msecs(jiffies - trans_start);
 					atomic_long_inc(&txq->trans_timeout);
 					break;
 				}
+				if (time_after(oldest_start, trans_start))
+					oldest_start = trans_start;
 			}

 			if (unlikely(timedout_ms)) {
@@ -531,7 +534,7 @@ static void dev_watchdog(struct timer_list *t)
 				netif_unfreeze_queues(dev);
 			}
 			if (!mod_timer(&dev->watchdog_timer,
-				       round_jiffies(jiffies +
+				       round_jiffies(oldest_start +
 						     dev->watchdog_timeo)))
 				release = false;
 		}
--
2.31.1
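The timing arithmetic described in the commit message can be sketched in userspace C. This is only an illustration under stated assumptions: the sketch_* names and struct sketch_queue are hypothetical stand-ins, not kernel APIs, and the rounding done by round_jiffies() and all locking are left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's time_after(): true if a is
 * later than b, and still correct when the tick counter wraps around. */
static bool sketch_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Hypothetical, minimal model of a tx queue: only the two fields the
 * watchdog logic looks at. */
struct sketch_queue {
	bool stopped;              /* models netif_xmit_stopped(txq) */
	unsigned long trans_start; /* tick of the last transmit start */
};

/*
 * Returns the tick at which the watchdog would be re-armed.
 * *timed_out is set to the index of the first stopped queue whose
 * timeout has already expired, or -1 if there is none.
 */
static unsigned long sketch_next_expiry(const struct sketch_queue *q,
					unsigned int n,
					unsigned long now,
					unsigned long timeo,
					int *timed_out)
{
	unsigned long oldest_start = now;
	unsigned int i;

	*timed_out = -1;
	for (i = 0; i < n; i++) {
		if (!q[i].stopped)
			continue;
		if (sketch_time_after(now, q[i].trans_start + timeo)) {
			*timed_out = (int)i;
			break;
		}
		/* Remember the oldest trans_start among stopped queues. */
		if (sketch_time_after(oldest_start, q[i].trans_start))
			oldest_start = q[i].trans_start;
	}

	/* Re-arming from "now" would give now + timeo instead. */
	return oldest_start + timeo;
}
```

For example, with now = 1000, a timeout of 500, and two stopped queues whose trans_start values are 800 and 900, the function returns 1300 rather than the 1500 that re-arming from the current time would give. If no queue is stopped, oldest_start stays at now and the result is the full device-specified timeout.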
* Re: [PATCH v2] net/sched: adjust device watchdog timer to detect stopped queue at right time
2024-05-06 13:59 [PATCH v2] net/sched: adjust device watchdog timer to detect stopped queue at right time Praveen Kumar Kannoju
@ 2024-05-08 0:30 ` Jakub Kicinski
From: Jakub Kicinski @ 2024-05-08 0:30 UTC
To: Praveen Kumar Kannoju
Cc: jhs, xiyou.wangcong, jiri, davem, netdev, linux-kernel,
rajesh.sivaramasubramaniom, rama.nichanamatlu, manjunath.b.patil
LGTM!
One nit..
On Mon, 6 May 2024 19:29:44 +0530 Praveen Kumar Kannoju wrote:
> + if (time_after(jiffies, (trans_start + dev->watchdog_timeo))) {
                            ^                                 ^
Would you mind dropping these brackets while you're touching this line?
They are unnecessary.
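The nit concerns the inner parentheses around the second argument of time_after(). A minimal userspace sketch of why they are redundant follows. It assumes a simplified stand-in for the kernel macro (the real time_after() in include/linux/jiffies.h additionally type-checks its arguments), and the sketch_* and expired_* names are hypothetical.

```c
/* Simplified stand-in for time_after(): the macro parenthesizes its own
 * parameters, so callers need not wrap an expression argument again. */
#define sketch_time_after(a, b) ((long)((b) - (a)) < 0)

/* Spelling used in the patch, with the extra parentheses. */
static int expired_with_brackets(unsigned long now, unsigned long start,
				 unsigned long timeo)
{
	return sketch_time_after(now, (start + timeo));
}

/* Spelling requested in the review, without them. */
static int expired_without_brackets(unsigned long now, unsigned long start,
				    unsigned long timeo)
{
	return sketch_time_after(now, start + timeo);
}
```

Both functions expand to the same comparison, so dropping the parentheses changes nothing but readability.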