* [PATCH net] net: poll tx timeout only on active tx queues
@ 2016-06-30 13:58 Saeed Mahameed
2016-06-30 14:28 ` Eric Dumazet
0 siblings, 1 reply; 7+ messages in thread
From: Saeed Mahameed @ 2016-06-30 13:58 UTC (permalink / raw)
To: David S. Miller
Cc: netdev, Eric Dumazet, Tom Herbert, Mohamad Haj Yahia,
Saeed Mahameed
From: Mohamad Haj Yahia <mohamad@mellanox.com>
Change the netdev watchdog to poll only the real active tx queues
instead of polling all tx queues.
The netdev driver doesn't necessarily have to start/stop all the
tx queues including the inactive tx queues.
Fixes: fd2ea0a79faa ('net: Use queue aware tests throughout.')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
net/sched/sch_generic.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index f9e0e9c..a10f0ff 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -253,7 +253,7 @@ static void dev_watchdog(unsigned long arg)
unsigned int i;
unsigned long trans_start;
- for (i = 0; i < dev->num_tx_queues; i++) {
+ for (i = 0; i < dev->real_num_tx_queues; i++) {
struct netdev_queue *txq;
txq = netdev_get_tx_queue(dev, i);
--
2.8.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH net] net: poll tx timeout only on active tx queues
2016-06-30 13:58 [PATCH net] net: poll tx timeout only on active tx queues Saeed Mahameed
@ 2016-06-30 14:28 ` Eric Dumazet
2016-06-30 14:47 ` Saeed Mahameed
0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2016-06-30 14:28 UTC (permalink / raw)
To: Saeed Mahameed
Cc: David S. Miller, netdev, Eric Dumazet, Tom Herbert,
Mohamad Haj Yahia
On Thu, 2016-06-30 at 16:58 +0300, Saeed Mahameed wrote:
> From: Mohamad Haj Yahia <mohamad@mellanox.com>
>
> Change the netdev watchdog to poll only the real active tx queues
> instead of polling all tx queues.
> The netdev driver doesn't necessarily have to start/stop all the
> tx queues including the inactive tx queues.
>
> Fixes: fd2ea0a79faa ('net: Use queue aware tests throughout.')
> Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> ---
> net/sched/sch_generic.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index f9e0e9c..a10f0ff 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -253,7 +253,7 @@ static void dev_watchdog(unsigned long arg)
> unsigned int i;
> unsigned long trans_start;
>
> - for (i = 0; i < dev->num_tx_queues; i++) {
> + for (i = 0; i < dev->real_num_tx_queues; i++) {
> struct netdev_queue *txq;
>
> txq = netdev_get_tx_queue(dev, i);
Strange, why don't you change all others helpers that are using
num_tx_queues ?
Which driver had a problem with this code ?
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net] net: poll tx timeout only on active tx queues
2016-06-30 14:28 ` Eric Dumazet
@ 2016-06-30 14:47 ` Saeed Mahameed
2016-07-01 4:50 ` Yuval Mintz
0 siblings, 1 reply; 7+ messages in thread
From: Saeed Mahameed @ 2016-06-30 14:47 UTC (permalink / raw)
To: Eric Dumazet
Cc: Saeed Mahameed, David S. Miller, Linux Netdev List, Eric Dumazet,
Tom Herbert, Mohamad Haj Yahia
On Thu, Jun 30, 2016 at 5:28 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2016-06-30 at 16:58 +0300, Saeed Mahameed wrote:
>> - for (i = 0; i < dev->num_tx_queues; i++) {
>> + for (i = 0; i < dev->real_num_tx_queues; i++) {
>> struct netdev_queue *txq;
>>
>> txq = netdev_get_tx_queue(dev, i);
>
> Strange, why don't you change all others helpers that are using
> num_tx_queues ?
>
which other helpers ?
since this function assumes that all tx queues are started and if a
non real_txq is stopped for more that timeout period it will start
shouting call traces and warnings.
> Which driver had a problem with this code ?
non yet.
currently all the device driver call netif_tx_start_all_queues(dev)
on open to W/A this issue. which is strange since only
real_num_tx_queues are active.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net] net: poll tx timeout only on active tx queues
2016-06-30 14:47 ` Saeed Mahameed
@ 2016-07-01 4:50 ` Yuval Mintz
2016-07-01 5:18 ` Eric Dumazet
0 siblings, 1 reply; 7+ messages in thread
From: Yuval Mintz @ 2016-07-01 4:50 UTC (permalink / raw)
To: Saeed Mahameed, Eric Dumazet
Cc: Saeed Mahameed, David Miller, netdev, Eric Dumazet, Tom Herbert,
Mohamad Haj Yahia
> currently all the device driver call netif_tx_start_all_queues(dev)
> on open to W/A this issue. which is strange since only
> real_num_tx_queues are active.
You could also argue that netif_tx_start_all_queues() should
only enable the real_num_tx_queues.
[Although that would obviously cause all drivers to reach the
'problem' you're currently fixing].
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net] net: poll tx timeout only on active tx queues
2016-07-01 4:50 ` Yuval Mintz
@ 2016-07-01 5:18 ` Eric Dumazet
2016-07-01 12:03 ` Saeed Mahameed
0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2016-07-01 5:18 UTC (permalink / raw)
To: Yuval Mintz
Cc: Saeed Mahameed, Saeed Mahameed, David Miller, netdev,
Eric Dumazet, Tom Herbert, Mohamad Haj Yahia
On Fri, 2016-07-01 at 04:50 +0000, Yuval Mintz wrote:
> > currently all the device driver call netif_tx_start_all_queues(dev)
> > on open to W/A this issue. which is strange since only
> > real_num_tx_queues are active.
>
> You could also argue that netif_tx_start_all_queues() should
> only enable the real_num_tx_queues.
> [Although that would obviously cause all drivers to reach the
> 'problem' you're currently fixing].
Yep. Basically what I pointed out.
It seems inconsistent to have loops using num_tx_queues, and others
using real_num_tx_queues.
Instead of 'fixing' one of them, we should take a deeper look, even if
the change looks fine.
num_tx_queues should be used in code that runs once, like
netdev_lockdep_set_classes(), but other loops should probably use
real_num_tx_queues.
Anyway all these changes should definitely target net-next, not net
tree.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net] net: poll tx timeout only on active tx queues
2016-07-01 5:18 ` Eric Dumazet
@ 2016-07-01 12:03 ` Saeed Mahameed
2016-07-06 6:42 ` Yuval Mintz
0 siblings, 1 reply; 7+ messages in thread
From: Saeed Mahameed @ 2016-07-01 12:03 UTC (permalink / raw)
To: Eric Dumazet
Cc: Yuval Mintz, Saeed Mahameed, David Miller, netdev, Eric Dumazet,
Tom Herbert, Mohamad Haj Yahia
On Fri, Jul 1, 2016 at 8:18 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2016-07-01 at 04:50 +0000, Yuval Mintz wrote:
>> > currently all the device driver call netif_tx_start_all_queues(dev)
>> > on open to W/A this issue. which is strange since only
>> > real_num_tx_queues are active.
>>
>> You could also argue that netif_tx_start_all_queues() should
>> only enable the real_num_tx_queues.
>> [Although that would obviously cause all drivers to reach the
>> 'problem' you're currently fixing].
>
> Yep. Basically what I pointed out.
>
> It seems inconsistent to have loops using num_tx_queues, and others
> using real_num_tx_queues.
>
> Instead of 'fixing' one of them, we should take a deeper look, even if
> the change looks fine.
>
> num_tx_queues should be used in code that runs once, like
> netdev_lockdep_set_classes(), but other loops should probably use
> real_num_tx_queues.
>
> Anyway all these changes should definitely target net-next, not net
> tree.
>
Thank you Eric and Yuval,
Although i slightly disagree, this patch is good as is, even with the
inconsistency, which is there due to a bad design.
I don't' see why new drivers need to keep copy from old wrong
implementations and workarounds.
But for the long term, you have a point.
We will consider a deeper fix for net-next as you suggested, and drop
this temporary fix.
Thanks
Saeed.
^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: [PATCH net] net: poll tx timeout only on active tx queues
2016-07-01 12:03 ` Saeed Mahameed
@ 2016-07-06 6:42 ` Yuval Mintz
0 siblings, 0 replies; 7+ messages in thread
From: Yuval Mintz @ 2016-07-06 6:42 UTC (permalink / raw)
To: Saeed Mahameed, Eric Dumazet
Cc: Saeed Mahameed, David Miller, netdev, Eric Dumazet, Tom Herbert,
Mohamad Haj Yahia
> >> > currently all the device driver call
> >> > netif_tx_start_all_queues(dev) on open to W/A this issue. which is
> >> > strange since only real_num_tx_queues are active.
> >>
> >> You could also argue that netif_tx_start_all_queues() should only
> >> enable the real_num_tx_queues.
> >> [Although that would obviously cause all drivers to reach the
> >> 'problem' you're currently fixing].
> >
> > Yep. Basically what I pointed out.
> >
> > It seems inconsistent to have loops using num_tx_queues, and others
> > using real_num_tx_queues.
> >
> > Instead of 'fixing' one of them, we should take a deeper look, even if
> > the change looks fine.
> >
> > num_tx_queues should be used in code that runs once, like
> > netdev_lockdep_set_classes(), but other loops should probably use
> > real_num_tx_queues.
> >
> > Anyway all these changes should definitely target net-next, not net
> > tree.
> >
>
> But for the long term, you have a point.
> We will consider a deeper fix for net-next as you suggested, and drop this
> temporary fix.
I think we've actually managed to hit an issue with qede [& modified bnx2x]
due to netif_tx_start_all_queues() starting all Tx-queues -
While reducing the number of channels on an interface driver reloads
following which the xmit function receives an SKB using a too-high txq.
Investigation seem to indicate that some TCP traffic arrived during the
reload, got enqueued on the qdisc with high txq and then got transmitted
as-is after re-enabling tx.
[Removing the modulo from bnx2x's select_queue() lead to same issue.]
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2016-07-06 6:43 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-06-30 13:58 [PATCH net] net: poll tx timeout only on active tx queues Saeed Mahameed
2016-06-30 14:28 ` Eric Dumazet
2016-06-30 14:47 ` Saeed Mahameed
2016-07-01 4:50 ` Yuval Mintz
2016-07-01 5:18 ` Eric Dumazet
2016-07-01 12:03 ` Saeed Mahameed
2016-07-06 6:42 ` Yuval Mintz
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).