* [PATCH] net: sched: print jiffies when transmit queue time out
@ 2023-04-19 11:56 Yajun Deng
2023-04-19 12:02 ` Eric Dumazet
2023-04-20 1:27 ` Jakub Kicinski
0 siblings, 2 replies; 5+ messages in thread
From: Yajun Deng @ 2023-04-19 11:56 UTC (permalink / raw)
To: jhs, xiyou.wangcong, jiri, davem, edumazet, kuba, pabeni
Cc: netdev, linux-kernel, Yajun Deng
Although watchdog_timeo defines how long a transmit queue may stay stopped
before it is treated as stalled, dev_watchdog() only runs at intervals. By
the time it fires, the elapsed time (jiffies - trans_start) is always
greater than watchdog_timeo.
To let users work out exactly when the stall started, print the elapsed
jiffies when the transmit queue times out.
Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
---
net/sched/sch_generic.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index a9aadc4e6858..67b78e260d28 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -502,7 +502,7 @@ static void dev_watchdog(struct timer_list *t)
if (netif_device_present(dev) &&
netif_running(dev) &&
netif_carrier_ok(dev)) {
- int some_queue_timedout = 0;
+ unsigned long some_queue_timedout = 0;
unsigned int i;
unsigned long trans_start;
@@ -514,7 +514,7 @@ static void dev_watchdog(struct timer_list *t)
if (netif_xmit_stopped(txq) &&
time_after(jiffies, (trans_start +
dev->watchdog_timeo))) {
- some_queue_timedout = 1;
+ some_queue_timedout = jiffies - trans_start;
atomic_long_inc(&txq->trans_timeout);
break;
}
@@ -522,8 +522,9 @@ static void dev_watchdog(struct timer_list *t)
if (unlikely(some_queue_timedout)) {
trace_net_dev_xmit_timeout(dev, i);
- WARN_ONCE(1, KERN_INFO "NETDEV WATCHDOG: %s (%s): transmit queue %u timed out\n",
- dev->name, netdev_drivername(dev), i);
+ WARN_ONCE(1, KERN_INFO "NETDEV WATCHDOG: %s (%s): "
+ "transmit queue %u timed out %lu jiffies\n",
+ dev->name, netdev_drivername(dev), i, some_queue_timedout);
netif_freeze_queues(dev);
dev->netdev_ops->ndo_tx_timeout(dev, i);
netif_unfreeze_queues(dev);
--
2.25.1
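
The check the patch touches can be modelled outside the kernel. The following is a minimal userspace sketch, not the kernel code: model_time_after() mirrors the wraparound-safe comparison that time_after() performs, and stalled_ticks() is a hypothetical helper that returns the value the patch stores in some_queue_timedout (jiffies - trans_start), or 0 while the timeout has not yet expired.

```c
#include <assert.h>
#include <limits.h>

/* Userspace model of the kernel's time_after(a, b): true when a is later
 * than b, even if the tick counter has wrapped around in between. */
static int model_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Hypothetical helper: number of ticks the queue has been stalled, or 0
 * if the watchdog timeout has not expired yet. */
static unsigned long stalled_ticks(unsigned long now,
				   unsigned long trans_start,
				   unsigned long watchdog_timeo)
{
	assert(watchdog_timeo > 0);
	if (model_time_after(now, trans_start + watchdog_timeo))
		return now - trans_start; /* unsigned math survives wraparound */
	return 0;
}
```

Because the result is only non-zero once the elapsed time exceeds watchdog_timeo, it can still serve as the "some queue timed out" flag while also carrying the duration.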
* Re: [PATCH] net: sched: print jiffies when transmit queue time out
From: Eric Dumazet @ 2023-04-19 12:02 UTC (permalink / raw)
To: Yajun Deng
Cc: jhs, xiyou.wangcong, jiri, davem, kuba, pabeni, netdev,
linux-kernel
On Wed, Apr 19, 2023 at 1:56 PM Yajun Deng <yajun.deng@linux.dev> wrote:
>
> Although there is watchdog_timeo to let users know when the transmit queue
> begin stall, but dev_watchdog() is called with an interval. The jiffies
> will always be greater than watchdog_timeo.
>
> To let users know the exact time the stall started, print jiffies when
> the transmit queue time out.
>
> Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
> ---
> atomic_long_inc(&txq->trans_timeout);
> break;
> }
> @@ -522,8 +522,9 @@ static void dev_watchdog(struct timer_list *t)
>
> if (unlikely(some_queue_timedout)) {
> trace_net_dev_xmit_timeout(dev, i);
> - WARN_ONCE(1, KERN_INFO "NETDEV WATCHDOG: %s (%s): transmit queue %u timed out\n",
> - dev->name, netdev_drivername(dev), i);
> + WARN_ONCE(1, KERN_INFO "NETDEV WATCHDOG: %s (%s): \
> + transmit queue %u timed out %lu jiffies\n",
> + dev->name, netdev_drivername(dev), i, some_queue_timedout);
If we really want this, I suggest we export a time in ms units, using
jiffies_to_msecs()
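
A rough userspace sketch of what the suggested message could look like. The kernel's jiffies_to_msecs() is not available outside the kernel, so model_jiffies_to_msecs() below is a hypothetical stand-in that takes the tick rate as an explicit parameter and simply truncates; format_timeout_msg() and the exact wording of the message are likewise assumptions, not what was eventually merged.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's jiffies_to_msecs(). The tick
 * rate hz is passed in because CONFIG_HZ is not known here. */
static unsigned int model_jiffies_to_msecs(unsigned long j, unsigned int hz)
{
	assert(hz > 0);
	return (unsigned int)((unsigned long long)j * 1000ULL / hz);
}

/* Build the warning text with the stall duration in milliseconds.
 * Returns what snprintf() returns. */
static int format_timeout_msg(char *buf, size_t len, const char *dev,
			      const char *drv, unsigned int queue,
			      unsigned long stalled_jiffies, unsigned int hz)
{
	return snprintf(buf, len,
			"NETDEV WATCHDOG: %s (%s): transmit queue %u timed out %u ms",
			dev, drv, queue,
			model_jiffies_to_msecs(stalled_jiffies, hz));
}
```

Printing milliseconds keeps the message independent of the tick rate, so the same number means the same thing on kernels built with different HZ values.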
* Re: [PATCH] net: sched: print jiffies when transmit queue time out
From: Jakub Kicinski @ 2023-04-20 1:27 UTC (permalink / raw)
To: Yajun Deng
Cc: jhs, xiyou.wangcong, jiri, davem, edumazet, pabeni, netdev,
linux-kernel
On Wed, 19 Apr 2023 19:56:32 +0800 Yajun Deng wrote:
> Although there is watchdog_timeo to let users know when the transmit queue
> begin stall, but dev_watchdog() is called with an interval. The jiffies
> will always be greater than watchdog_timeo.
>
> To let users know the exact time the stall started, print jiffies when
> the transmit queue time out.
Please add an explanation of how this information is useful in practice.
* Re: [PATCH] net: sched: print jiffies when transmit queue time out
From: Yajun Deng @ 2023-04-20 2:17 UTC (permalink / raw)
To: Jakub Kicinski
Cc: jhs, xiyou.wangcong, jiri, davem, edumazet, pabeni, netdev,
linux-kernel
April 20, 2023 9:27 AM, "Jakub Kicinski" <kuba@kernel.org> wrote:
> On Wed, 19 Apr 2023 19:56:32 +0800 Yajun Deng wrote:
>
>> Although there is watchdog_timeo to let users know when the transmit queue
>> begin stall, but dev_watchdog() is called with an interval. The jiffies
>> will always be greater than watchdog_timeo.
>>
>> To let users know the exact time the stall started, print jiffies when
>> the transmit queue time out.
>
> Please add an explanation of how this information is useful in practice.
We found some cases with several warnings. We want to confirm which happened first.
First warning:
16:37:57 kernel: [ 7100.097547] ------------[ cut here ]------------
16:37:57 kernel: [ 7100.097550] NETDEV WATCHDOG: eno2 (i40e): transmit queue 8 timed out
16:37:57 kernel: [ 7100.097571] WARNING: CPU: 8 PID: 0 at net/sched/sch_generic.c:467 dev_watchdog+0x260/0x270
...
Second warning:
16:38:44 kernel: [ 7147.756952] rcu: INFO: rcu_preempt self-detected stall on CPU
16:38:44 kernel: [ 7147.756958] rcu: 24-....: (59999 ticks this GP) idle=546/1/0x4000000000000000 softirq=3673137/3673146 fqs=13844
16:38:44 kernel: [ 7147.756960] (t=60001 jiffies g=4322709 q=133381)
16:38:44 kernel: [ 7147.756962] NMI backtrace for cpu 24
...
As we can see, the transmit queue must have started stalling before 16:37:52, and the RCU stall started at 16:37:44.
These two times are close, so we want to confirm which happened first.
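
The arithmetic behind those two times can be sketched as follows. This is only an illustration under assumptions that the logs do not state: a tick rate of 1000 per second (so t=60001 jiffies is about 60 s) and a 5-second watchdog_timeo for the driver. The helper names are hypothetical.

```c
#include <assert.h>

/* Seconds since midnight for a wall-clock log timestamp. */
static long hms_to_sec(int h, int m, int s)
{
	return h * 3600L + m * 60L + s;
}

/* Start of a stall: the time it was reported minus the elapsed ticks,
 * converted to whole seconds with an assumed tick rate hz. */
static long stall_start_sec(long reported_sec, unsigned long elapsed_jiffies,
			    unsigned int hz)
{
	assert(hz > 0);
	return reported_sec - (long)(elapsed_jiffies / hz);
}
```

With these assumptions, the RCU report at 16:38:44 with 60001 elapsed jiffies points back to 16:37:44, while the watchdog warning at 16:37:57 only gives an upper bound of 16:37:52, because the warning does not say how far past watchdog_timeo the queue already was. Printing the elapsed time would close that gap.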
* Re: [PATCH] net: sched: print jiffies when transmit queue time out
From: Yajun Deng @ 2023-04-20 2:49 UTC (permalink / raw)
To: Eric Dumazet
Cc: jhs, xiyou.wangcong, jiri, davem, kuba, pabeni, netdev,
linux-kernel
April 19, 2023 8:02 PM, "Eric Dumazet" <edumazet@google.com> wrote:
> On Wed, Apr 19, 2023 at 1:56 PM Yajun Deng <yajun.deng@linux.dev> wrote:
>
>> Although there is watchdog_timeo to let users know when the transmit queue
>> begin stall, but dev_watchdog() is called with an interval. The jiffies
>> will always be greater than watchdog_timeo.
>>
>> To let users know the exact time the stall started, print jiffies when
>> the transmit queue time out.
>>
>> Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
>> ---
>>
>> atomic_long_inc(&txq->trans_timeout);
>> break;
>> }
>> @@ -522,8 +522,9 @@ static void dev_watchdog(struct timer_list *t)
>>
>> if (unlikely(some_queue_timedout)) {
>> trace_net_dev_xmit_timeout(dev, i);
>> - WARN_ONCE(1, KERN_INFO "NETDEV WATCHDOG: %s (%s): transmit queue %u timed out\n",
>> - dev->name, netdev_drivername(dev), i);
>> + WARN_ONCE(1, KERN_INFO "NETDEV WATCHDOG: %s (%s): \
>> + transmit queue %u timed out %lu jiffies\n",
>> + dev->name, netdev_drivername(dev), i, some_queue_timedout);
>
> If we really want this, I suggest we export a time in ms units, using
> jiffies_to_msecs()
OK.