* [PATCH net] net: poll tx timeout only on active tx queues
@ 2016-06-30 13:58 Saeed Mahameed
2016-06-30 14:28 ` Eric Dumazet
0 siblings, 1 reply; 7+ messages in thread
From: Saeed Mahameed @ 2016-06-30 13:58 UTC (permalink / raw)
To: David S. Miller
Cc: netdev, Eric Dumazet, Tom Herbert, Mohamad Haj Yahia,
Saeed Mahameed
From: Mohamad Haj Yahia <mohamad@mellanox.com>
Change the netdev watchdog to poll only the real active tx queues
instead of polling all tx queues.
The netdev driver doesn't necessarily have to start/stop all the
tx queues including the inactive tx queues.
Fixes: fd2ea0a79faa ('net: Use queue aware tests throughout.')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
net/sched/sch_generic.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index f9e0e9c..a10f0ff 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -253,7 +253,7 @@ static void dev_watchdog(unsigned long arg)
unsigned int i;
unsigned long trans_start;
- for (i = 0; i < dev->num_tx_queues; i++) {
+ for (i = 0; i < dev->real_num_tx_queues; i++) {
struct netdev_queue *txq;
txq = netdev_get_tx_queue(dev, i);
--
2.8.0
^ permalink raw reply related [flat|nested] 7+ messages in thread* Re: [PATCH net] net: poll tx timeout only on active tx queues 2016-06-30 13:58 [PATCH net] net: poll tx timeout only on active tx queues Saeed Mahameed @ 2016-06-30 14:28 ` Eric Dumazet 2016-06-30 14:47 ` Saeed Mahameed 0 siblings, 1 reply; 7+ messages in thread From: Eric Dumazet @ 2016-06-30 14:28 UTC (permalink / raw) To: Saeed Mahameed Cc: David S. Miller, netdev, Eric Dumazet, Tom Herbert, Mohamad Haj Yahia On Thu, 2016-06-30 at 16:58 +0300, Saeed Mahameed wrote: > From: Mohamad Haj Yahia <mohamad@mellanox.com> > > Change the netdev watchdog to poll only the real active tx queues > instead of polling all tx queues. > The netdev driver doesn't necessarily have to start/stop all the > tx queues including the inactive tx queues. > > Fixes: fd2ea0a79faa ('net: Use queue aware tests throughout.') > Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> > Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> > --- > net/sched/sch_generic.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c > index f9e0e9c..a10f0ff 100644 > --- a/net/sched/sch_generic.c > +++ b/net/sched/sch_generic.c > @@ -253,7 +253,7 @@ static void dev_watchdog(unsigned long arg) > unsigned int i; > unsigned long trans_start; > > - for (i = 0; i < dev->num_tx_queues; i++) { > + for (i = 0; i < dev->real_num_tx_queues; i++) { > struct netdev_queue *txq; > > txq = netdev_get_tx_queue(dev, i); Strange, why don't you change all others helpers that are using num_tx_queues ? Which driver had a problem with this code ? ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net] net: poll tx timeout only on active tx queues 2016-06-30 14:28 ` Eric Dumazet @ 2016-06-30 14:47 ` Saeed Mahameed 2016-07-01 4:50 ` Yuval Mintz 0 siblings, 1 reply; 7+ messages in thread From: Saeed Mahameed @ 2016-06-30 14:47 UTC (permalink / raw) To: Eric Dumazet Cc: Saeed Mahameed, David S. Miller, Linux Netdev List, Eric Dumazet, Tom Herbert, Mohamad Haj Yahia On Thu, Jun 30, 2016 at 5:28 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Thu, 2016-06-30 at 16:58 +0300, Saeed Mahameed wrote: >> - for (i = 0; i < dev->num_tx_queues; i++) { >> + for (i = 0; i < dev->real_num_tx_queues; i++) { >> struct netdev_queue *txq; >> >> txq = netdev_get_tx_queue(dev, i); > > Strange, why don't you change all others helpers that are using > num_tx_queues ? > which other helpers ? since this function assumes that all tx queues are started and if a non real_txq is stopped for more that timeout period it will start shouting call traces and warnings. > Which driver had a problem with this code ? non yet. currently all the device driver call netif_tx_start_all_queues(dev) on open to W/A this issue. which is strange since only real_num_tx_queues are active. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net] net: poll tx timeout only on active tx queues 2016-06-30 14:47 ` Saeed Mahameed @ 2016-07-01 4:50 ` Yuval Mintz 2016-07-01 5:18 ` Eric Dumazet 0 siblings, 1 reply; 7+ messages in thread From: Yuval Mintz @ 2016-07-01 4:50 UTC (permalink / raw) To: Saeed Mahameed, Eric Dumazet Cc: Saeed Mahameed, David Miller, netdev, Eric Dumazet, Tom Herbert, Mohamad Haj Yahia > currently all the device driver call netif_tx_start_all_queues(dev) > on open to W/A this issue. which is strange since only > real_num_tx_queues are active. You could also argue that netif_tx_start_all_queues() should only enable the real_num_tx_queues. [Although that would obviously cause all drivers to reach the 'problem' you're currently fixing]. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net] net: poll tx timeout only on active tx queues 2016-07-01 4:50 ` Yuval Mintz @ 2016-07-01 5:18 ` Eric Dumazet 2016-07-01 12:03 ` Saeed Mahameed 0 siblings, 1 reply; 7+ messages in thread From: Eric Dumazet @ 2016-07-01 5:18 UTC (permalink / raw) To: Yuval Mintz Cc: Saeed Mahameed, Saeed Mahameed, David Miller, netdev, Eric Dumazet, Tom Herbert, Mohamad Haj Yahia On Fri, 2016-07-01 at 04:50 +0000, Yuval Mintz wrote: > > currently all the device driver call netif_tx_start_all_queues(dev) > > on open to W/A this issue. which is strange since only > > real_num_tx_queues are active. > > You could also argue that netif_tx_start_all_queues() should > only enable the real_num_tx_queues. > [Although that would obviously cause all drivers to reach the > 'problem' you're currently fixing]. Yep. Basically what I pointed out. It seems inconsistent to have loops using num_tx_queues, and others using real_num_tx_queues. Instead of 'fixing' one of them, we should take a deeper look, even if the change looks fine. num_tx_queues should be used in code that runs once, like netdev_lockdep_set_classes(), but other loops should probably use real_num_tx_queues. Anyway all these changes should definitely target net-next, not net tree. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net] net: poll tx timeout only on active tx queues 2016-07-01 5:18 ` Eric Dumazet @ 2016-07-01 12:03 ` Saeed Mahameed 2016-07-06 6:42 ` Yuval Mintz 0 siblings, 1 reply; 7+ messages in thread From: Saeed Mahameed @ 2016-07-01 12:03 UTC (permalink / raw) To: Eric Dumazet Cc: Yuval Mintz, Saeed Mahameed, David Miller, netdev, Eric Dumazet, Tom Herbert, Mohamad Haj Yahia On Fri, Jul 1, 2016 at 8:18 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Fri, 2016-07-01 at 04:50 +0000, Yuval Mintz wrote: >> > currently all the device driver call netif_tx_start_all_queues(dev) >> > on open to W/A this issue. which is strange since only >> > real_num_tx_queues are active. >> >> You could also argue that netif_tx_start_all_queues() should >> only enable the real_num_tx_queues. >> [Although that would obviously cause all drivers to reach the >> 'problem' you're currently fixing]. > > Yep. Basically what I pointed out. > > It seems inconsistent to have loops using num_tx_queues, and others > using real_num_tx_queues. > > Instead of 'fixing' one of them, we should take a deeper look, even if > the change looks fine. > > num_tx_queues should be used in code that runs once, like > netdev_lockdep_set_classes(), but other loops should probably use > real_num_tx_queues. > > Anyway all these changes should definitely target net-next, not net > tree. > Thank you Eric and Yuval, Although i slightly disagree, this patch is good as is, even with the inconsistency, which is there due to a bad design. I don't' see why new drivers need to keep copy from old wrong implementations and workarounds. But for the long term, you have a point. We will consider a deeper fix for net-next as you suggested, and drop this temporary fix. Thanks Saeed. ^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: [PATCH net] net: poll tx timeout only on active tx queues 2016-07-01 12:03 ` Saeed Mahameed @ 2016-07-06 6:42 ` Yuval Mintz 0 siblings, 0 replies; 7+ messages in thread From: Yuval Mintz @ 2016-07-06 6:42 UTC (permalink / raw) To: Saeed Mahameed, Eric Dumazet Cc: Saeed Mahameed, David Miller, netdev, Eric Dumazet, Tom Herbert, Mohamad Haj Yahia > >> > currently all the device driver call > >> > netif_tx_start_all_queues(dev) on open to W/A this issue. which is > >> > strange since only real_num_tx_queues are active. > >> > >> You could also argue that netif_tx_start_all_queues() should only > >> enable the real_num_tx_queues. > >> [Although that would obviously cause all drivers to reach the > >> 'problem' you're currently fixing]. > > > > Yep. Basically what I pointed out. > > > > It seems inconsistent to have loops using num_tx_queues, and others > > using real_num_tx_queues. > > > > Instead of 'fixing' one of them, we should take a deeper look, even if > > the change looks fine. > > > > num_tx_queues should be used in code that runs once, like > > netdev_lockdep_set_classes(), but other loops should probably use > > real_num_tx_queues. > > > > Anyway all these changes should definitely target net-next, not net > > tree. > > > > But for the long term, you have a point. > We will consider a deeper fix for net-next as you suggested, and drop this > temporary fix. I think we've actually managed to hit an issue with qede [& modified bnx2x] due to netif_tx_start_all_queues() starting all Tx-queues - While reducing the number of channels on an interface driver reloads following which the xmit function receives an SKB using a too-high txq. Investigation seem to indicate that some TCP traffic arrived during the reload, got enqueued on the qdisc with high txq and then got transmitted as-is after re-enabling tx. [Removing the modulo from bnx2x's select_queue() lead to same issue.] ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2016-07-06 6:43 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-06-30 13:58 [PATCH net] net: poll tx timeout only on active tx queues Saeed Mahameed 2016-06-30 14:28 ` Eric Dumazet 2016-06-30 14:47 ` Saeed Mahameed 2016-07-01 4:50 ` Yuval Mintz 2016-07-01 5:18 ` Eric Dumazet 2016-07-01 12:03 ` Saeed Mahameed 2016-07-06 6:42 ` Yuval Mintz
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).