netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net] net: poll tx timeout only on active tx queues
@ 2016-06-30 13:58 Saeed Mahameed
  2016-06-30 14:28 ` Eric Dumazet
  0 siblings, 1 reply; 7+ messages in thread
From: Saeed Mahameed @ 2016-06-30 13:58 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Eric Dumazet, Tom Herbert, Mohamad Haj Yahia,
	Saeed Mahameed

From: Mohamad Haj Yahia <mohamad@mellanox.com>

Change the netdev watchdog to poll only the real active tx queues
instead of polling all tx queues.
The netdev driver doesn't necessarily have to start/stop all the
tx queues including the inactive tx queues.

Fixes: fd2ea0a79faa ('net: Use queue aware tests throughout.')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 net/sched/sch_generic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index f9e0e9c..a10f0ff 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -253,7 +253,7 @@ static void dev_watchdog(unsigned long arg)
 			unsigned int i;
 			unsigned long trans_start;
 
-			for (i = 0; i < dev->num_tx_queues; i++) {
+			for (i = 0; i < dev->real_num_tx_queues; i++) {
 				struct netdev_queue *txq;
 
 				txq = netdev_get_tx_queue(dev, i);
-- 
2.8.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH net] net: poll tx timeout only on active tx queues
  2016-06-30 13:58 [PATCH net] net: poll tx timeout only on active tx queues Saeed Mahameed
@ 2016-06-30 14:28 ` Eric Dumazet
  2016-06-30 14:47   ` Saeed Mahameed
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2016-06-30 14:28 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, netdev, Eric Dumazet, Tom Herbert,
	Mohamad Haj Yahia

On Thu, 2016-06-30 at 16:58 +0300, Saeed Mahameed wrote:
> From: Mohamad Haj Yahia <mohamad@mellanox.com>
> 
> Change the netdev watchdog to poll only the real active tx queues
> instead of polling all tx queues.
> The netdev driver doesn't necessarily have to start/stop all the
> tx queues including the inactive tx queues.
> 
> Fixes: fd2ea0a79faa ('net: Use queue aware tests throughout.')
> Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> ---
>  net/sched/sch_generic.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index f9e0e9c..a10f0ff 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -253,7 +253,7 @@ static void dev_watchdog(unsigned long arg)
>  			unsigned int i;
>  			unsigned long trans_start;
>  
> -			for (i = 0; i < dev->num_tx_queues; i++) {
> +			for (i = 0; i < dev->real_num_tx_queues; i++) {
>  				struct netdev_queue *txq;
>  
>  				txq = netdev_get_tx_queue(dev, i);

Strange, why don't you change all others helpers that are using
num_tx_queues ?

Which driver had a problem with this code ?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net] net: poll tx timeout only on active tx queues
  2016-06-30 14:28 ` Eric Dumazet
@ 2016-06-30 14:47   ` Saeed Mahameed
  2016-07-01  4:50     ` Yuval Mintz
  0 siblings, 1 reply; 7+ messages in thread
From: Saeed Mahameed @ 2016-06-30 14:47 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Saeed Mahameed, David S. Miller, Linux Netdev List, Eric Dumazet,
	Tom Herbert, Mohamad Haj Yahia

On Thu, Jun 30, 2016 at 5:28 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2016-06-30 at 16:58 +0300, Saeed Mahameed wrote:
>> -                     for (i = 0; i < dev->num_tx_queues; i++) {
>> +                     for (i = 0; i < dev->real_num_tx_queues; i++) {
>>                               struct netdev_queue *txq;
>>
>>                               txq = netdev_get_tx_queue(dev, i);
>
> Strange, why don't you change all others helpers that are using
> num_tx_queues ?
>

which other helpers ?
since this function assumes that all tx queues are started and if a
non real_txq is stopped for more that timeout period it will start
shouting call traces and warnings.

> Which driver had a problem with this code ?

non yet.
currently all the device driver call  netif_tx_start_all_queues(dev)
on open to W/A this issue. which is strange since only
real_num_tx_queues are active.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net] net: poll tx timeout only on active tx queues
  2016-06-30 14:47   ` Saeed Mahameed
@ 2016-07-01  4:50     ` Yuval Mintz
  2016-07-01  5:18       ` Eric Dumazet
  0 siblings, 1 reply; 7+ messages in thread
From: Yuval Mintz @ 2016-07-01  4:50 UTC (permalink / raw)
  To: Saeed Mahameed, Eric Dumazet
  Cc: Saeed Mahameed, David Miller, netdev, Eric Dumazet, Tom Herbert,
	Mohamad Haj Yahia

> currently all the device driver call  netif_tx_start_all_queues(dev)
> on open to W/A this issue. which is strange since only
> real_num_tx_queues are active.

You could also argue that netif_tx_start_all_queues() should
only enable the real_num_tx_queues.
[Although that would obviously cause all drivers to reach the
'problem' you're currently fixing].

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net] net: poll tx timeout only on active tx queues
  2016-07-01  4:50     ` Yuval Mintz
@ 2016-07-01  5:18       ` Eric Dumazet
  2016-07-01 12:03         ` Saeed Mahameed
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2016-07-01  5:18 UTC (permalink / raw)
  To: Yuval Mintz
  Cc: Saeed Mahameed, Saeed Mahameed, David Miller, netdev,
	Eric Dumazet, Tom Herbert, Mohamad Haj Yahia

On Fri, 2016-07-01 at 04:50 +0000, Yuval Mintz wrote:
> > currently all the device driver call  netif_tx_start_all_queues(dev)
> > on open to W/A this issue. which is strange since only
> > real_num_tx_queues are active.
> 
> You could also argue that netif_tx_start_all_queues() should
> only enable the real_num_tx_queues.
> [Although that would obviously cause all drivers to reach the
> 'problem' you're currently fixing].

Yep. Basically what I pointed out.

It seems inconsistent to have loops using num_tx_queues, and others
using real_num_tx_queues.

Instead of 'fixing' one of them, we should take a deeper look, even if
the change looks fine.

num_tx_queues should be used in code that runs once, like
netdev_lockdep_set_classes(), but other loops should probably use
real_num_tx_queues.

Anyway all these changes should definitely target net-next, not net
tree.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net] net: poll tx timeout only on active tx queues
  2016-07-01  5:18       ` Eric Dumazet
@ 2016-07-01 12:03         ` Saeed Mahameed
  2016-07-06  6:42           ` Yuval Mintz
  0 siblings, 1 reply; 7+ messages in thread
From: Saeed Mahameed @ 2016-07-01 12:03 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Yuval Mintz, Saeed Mahameed, David Miller, netdev, Eric Dumazet,
	Tom Herbert, Mohamad Haj Yahia

On Fri, Jul 1, 2016 at 8:18 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2016-07-01 at 04:50 +0000, Yuval Mintz wrote:
>> > currently all the device driver call  netif_tx_start_all_queues(dev)
>> > on open to W/A this issue. which is strange since only
>> > real_num_tx_queues are active.
>>
>> You could also argue that netif_tx_start_all_queues() should
>> only enable the real_num_tx_queues.
>> [Although that would obviously cause all drivers to reach the
>> 'problem' you're currently fixing].
>
> Yep. Basically what I pointed out.
>
> It seems inconsistent to have loops using num_tx_queues, and others
> using real_num_tx_queues.
>
> Instead of 'fixing' one of them, we should take a deeper look, even if
> the change looks fine.
>
> num_tx_queues should be used in code that runs once, like
> netdev_lockdep_set_classes(), but other loops should probably use
> real_num_tx_queues.
>
> Anyway all these changes should definitely target net-next, not net
> tree.
>

Thank you Eric and Yuval,

Although i slightly disagree, this patch is good as is, even with the
inconsistency, which is there due to a bad design.
I don't' see why new drivers need to keep copy from old wrong
implementations and workarounds.

But for the long term, you have a point.
We will consider a deeper fix for net-next as you suggested, and drop
this temporary fix.

Thanks
Saeed.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH net] net: poll tx timeout only on active tx queues
  2016-07-01 12:03         ` Saeed Mahameed
@ 2016-07-06  6:42           ` Yuval Mintz
  0 siblings, 0 replies; 7+ messages in thread
From: Yuval Mintz @ 2016-07-06  6:42 UTC (permalink / raw)
  To: Saeed Mahameed, Eric Dumazet
  Cc: Saeed Mahameed, David Miller, netdev, Eric Dumazet, Tom Herbert,
	Mohamad Haj Yahia

> >> > currently all the device driver call
> >> > netif_tx_start_all_queues(dev) on open to W/A this issue. which is
> >> > strange since only real_num_tx_queues are active.
> >>
> >> You could also argue that netif_tx_start_all_queues() should only
> >> enable the real_num_tx_queues.
> >> [Although that would obviously cause all drivers to reach the
> >> 'problem' you're currently fixing].
> >
> > Yep. Basically what I pointed out.
> >
> > It seems inconsistent to have loops using num_tx_queues, and others
> > using real_num_tx_queues.
> >
> > Instead of 'fixing' one of them, we should take a deeper look, even if
> > the change looks fine.
> >
> > num_tx_queues should be used in code that runs once, like
> > netdev_lockdep_set_classes(), but other loops should probably use
> > real_num_tx_queues.
> >
> > Anyway all these changes should definitely target net-next, not net
> > tree.
> >
> 
> But for the long term, you have a point.
> We will consider a deeper fix for net-next as you suggested, and drop this
> temporary fix.

I think we've actually managed to hit an issue with qede [& modified bnx2x]
due to netif_tx_start_all_queues() starting all Tx-queues - 
While reducing the number of channels on an interface driver reloads
following which the xmit function receives an SKB using a too-high txq.

Investigation seem to indicate that some TCP traffic arrived during the
reload, got enqueued on the qdisc with high txq and then got transmitted
as-is after re-enabling tx.
[Removing the modulo from bnx2x's select_queue() lead to same issue.]

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2016-07-06  6:43 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-06-30 13:58 [PATCH net] net: poll tx timeout only on active tx queues Saeed Mahameed
2016-06-30 14:28 ` Eric Dumazet
2016-06-30 14:47   ` Saeed Mahameed
2016-07-01  4:50     ` Yuval Mintz
2016-07-01  5:18       ` Eric Dumazet
2016-07-01 12:03         ` Saeed Mahameed
2016-07-06  6:42           ` Yuval Mintz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).