public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 net-next] net/sched: do not reset queues in graft operations
@ 2026-03-07 16:34 Eric Dumazet
  2026-03-07 21:37 ` Jamal Hadi Salim
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Eric Dumazet @ 2026-03-07 16:34 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, bpf, Simon Horman, Jamal Hadi Salim, Victor Nogueira,
	Cong Wang, Jiri Pirko, Toke Høiland-Jørgensen,
	eric.dumazet, Eric Dumazet, Yunsheng Lin

Following typical script is extremely disruptive,
because each graft operation calls dev_deactivate()
which resets all the queues of the device.

QPARAM="limit 100000 flow_limit 1000 buckets 4096"
TXQS=64
for ETH in eth1
do
 tc qd del dev $ETH root 2>/dev/null
 tc qd add dev $ETH root handle 1: mq
 for i in `seq 1 $TXQS`
 do
   slot=$( printf %x $(( i )) )
   tc qd add dev $ETH parent 1:$slot fq $QPARAM
 done
done

One can add "ip link set dev $ETH down/up" to reduce the disruption time:

QPARAM="limit 100000 flow_limit 1000 buckets 4096"
TXQS=64
for ETH in eth1
do
 ip link set dev $ETH down
 tc qd del dev $ETH root 2>/dev/null
 tc qd add dev $ETH root handle 1: mq
 for i in `seq 1 $TXQS`
 do
   slot=$( printf %x $(( i )) )
   tc qd add dev $ETH parent 1:$slot fq $QPARAM
 done
 ip link set dev $ETH up
done

Or we can add a @reset_needed flag to dev_deactivate() and
dev_deactivate_many().

This flag is set to true at device dismantle or linkwatch_do_dev(),
and to false for graft operations.

In the future, we might only stop one queue instead of the whole
device, ie call dev_deactivate_queue() instead of dev_deactivate().

I think the problem (quadratic behavior) was added in commit
2fb541c862c9 ("net: sch_generic: aviod concurrent reset and enqueue op
for lockless qdisc") but this does not look serious enough to deserve
risky backports.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yunsheng Lin <linyunsheng@huawei.com>
---
v2: clarified the changelog
    added kdoc missing part (kernel build bots)

 include/net/sch_generic.h |  4 ++--
 net/core/dev.c            |  2 +-
 net/core/link_watch.c     |  2 +-
 net/sched/sch_api.c       |  2 +-
 net/sched/sch_generic.c   | 20 ++++++++++++--------
 net/sched/sch_htb.c       |  4 ++--
 net/sched/sch_mq.c        |  2 +-
 net/sched/sch_mqprio.c    |  2 +-
 net/sched/sch_taprio.c    |  2 +-
 9 files changed, 22 insertions(+), 18 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index c355300893a3921740cf6b919cc58de953cd610c..16beba40914ee5a9454f9eec2ad6e600543c89f2 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -710,8 +710,8 @@ void dev_qdisc_change_real_num_tx(struct net_device *dev,
 void dev_init_scheduler(struct net_device *dev);
 void dev_shutdown(struct net_device *dev);
 void dev_activate(struct net_device *dev);
-void dev_deactivate(struct net_device *dev);
-void dev_deactivate_many(struct list_head *head);
+void dev_deactivate(struct net_device *dev, bool reset_needed);
+void dev_deactivate_many(struct list_head *head, bool reset_needed);
 struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue,
 			      struct Qdisc *qdisc);
 void qdisc_reset(struct Qdisc *qdisc);
diff --git a/net/core/dev.c b/net/core/dev.c
index 203dc36aaed55e706f5d978ec02c6bf48201e3d4..6fc9350f0be8756ae849d5b0ca9b222055d96ced 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1756,7 +1756,7 @@ static void __dev_close_many(struct list_head *head)
 		smp_mb__after_atomic(); /* Commit netif_running(). */
 	}
 
-	dev_deactivate_many(head);
+	dev_deactivate_many(head, true);
 
 	list_for_each_entry(dev, head, close_list) {
 		const struct net_device_ops *ops = dev->netdev_ops;
diff --git a/net/core/link_watch.c b/net/core/link_watch.c
index 25c455c10a01cf535d6a7d2952d51434fe462690..ff2c1d4538efbc89cd49d8d8477542a6e7bacbad 100644
--- a/net/core/link_watch.c
+++ b/net/core/link_watch.c
@@ -181,7 +181,7 @@ static void linkwatch_do_dev(struct net_device *dev)
 		if (netif_carrier_ok(dev))
 			dev_activate(dev);
 		else
-			dev_deactivate(dev);
+			dev_deactivate(dev, true);
 
 		netif_state_change(dev);
 	}
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index cc43e3f7574fae203989f5c28b4934f0720e64c2..c0bab092ea809955b3a5aaa08c169bca55cfaf7f 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1120,7 +1120,7 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
 		}
 
 		if (dev->flags & IFF_UP)
-			dev_deactivate(dev);
+			dev_deactivate(dev, false);
 
 		qdisc_offload_graft_root(dev, new, old, extack);
 
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 556e0d8003161d3dfe68e484f6349714cf989c0c..d4fe907c4ad5895b1dda5249c55e4c1e0168023e 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -1370,11 +1370,12 @@ static bool some_qdisc_is_busy(struct net_device *dev)
 /**
  * 	dev_deactivate_many - deactivate transmissions on several devices
  * 	@head: list of devices to deactivate
+ *	@reset_needed: qdisc should be reset if true.
  *
  *	This function returns only when all outstanding transmissions
  *	have completed, unless all devices are in dismantle phase.
  */
-void dev_deactivate_many(struct list_head *head)
+void dev_deactivate_many(struct list_head *head, bool reset_needed)
 {
 	bool sync_needed = false;
 	struct net_device *dev;
@@ -1393,11 +1394,14 @@ void dev_deactivate_many(struct list_head *head)
 	if (sync_needed)
 		synchronize_net();
 
-	list_for_each_entry(dev, head, close_list) {
-		netdev_for_each_tx_queue(dev, dev_reset_queue, NULL);
+	if (reset_needed) {
+		list_for_each_entry(dev, head, close_list) {
+			netdev_for_each_tx_queue(dev, dev_reset_queue, NULL);
 
-		if (dev_ingress_queue(dev))
-			dev_reset_queue(dev, dev_ingress_queue(dev), NULL);
+			if (dev_ingress_queue(dev))
+				dev_reset_queue(dev, dev_ingress_queue(dev),
+						NULL);
+		}
 	}
 
 	/* Wait for outstanding qdisc_run calls. */
@@ -1412,12 +1416,12 @@ void dev_deactivate_many(struct list_head *head)
 	}
 }
 
-void dev_deactivate(struct net_device *dev)
+void dev_deactivate(struct net_device *dev, bool reset_needed)
 {
 	LIST_HEAD(single);
 
 	list_add(&dev->close_list, &single);
-	dev_deactivate_many(&single);
+	dev_deactivate_many(&single, reset_needed);
 	list_del(&single);
 }
 EXPORT_SYMBOL(dev_deactivate);
@@ -1473,7 +1477,7 @@ int dev_qdisc_change_tx_queue_len(struct net_device *dev)
 	int ret = 0;
 
 	if (up)
-		dev_deactivate(dev);
+		dev_deactivate(dev, false);
 
 	for (i = 0; i < dev->num_tx_queues; i++) {
 		ret = qdisc_change_tx_queue_len(dev, &dev->_tx[i]);
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index cf6cd4ccfa2029d9a45e35d6780520290690732d..eb12381795ce1bb0f3b8c5f502e16ad64c4408c8 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -1387,7 +1387,7 @@ htb_graft_helper(struct netdev_queue *dev_queue, struct Qdisc *new_q)
 	struct Qdisc *old_q;
 
 	if (dev->flags & IFF_UP)
-		dev_deactivate(dev);
+		dev_deactivate(dev, false);
 	old_q = dev_graft_qdisc(dev_queue, new_q);
 	if (new_q)
 		new_q->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT;
@@ -1421,7 +1421,7 @@ static void htb_offload_move_qdisc(struct Qdisc *sch, struct htb_class *cl_old,
 		struct Qdisc *qdisc;
 
 		if (dev->flags & IFF_UP)
-			dev_deactivate(dev);
+			dev_deactivate(dev, false);
 		qdisc = dev_graft_qdisc(queue_old, NULL);
 		WARN_ON(qdisc != cl_old->leaf.q);
 	}
diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index 0ed199fa18f04a06197116b6ab64ff3647dbace9..a0133a7b9d3b09a0d2a6064234c8fdef60dbf955 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -201,7 +201,7 @@ static int mq_graft(struct Qdisc *sch, unsigned long cl, struct Qdisc *new,
 	struct net_device *dev = qdisc_dev(sch);
 
 	if (dev->flags & IFF_UP)
-		dev_deactivate(dev);
+		dev_deactivate(dev, false);
 
 	*old = dev_graft_qdisc(dev_queue, new);
 	if (new)
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index b83276409416fabeccc441e5127211632ddcfedb..002add5ce9e0ab04a6260495d1bec02983c2a204 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -469,7 +469,7 @@ static int mqprio_graft(struct Qdisc *sch, unsigned long cl, struct Qdisc *new,
 		return -EINVAL;
 
 	if (dev->flags & IFF_UP)
-		dev_deactivate(dev);
+		dev_deactivate(dev, false);
 
 	*old = dev_graft_qdisc(dev_queue, new);
 
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index f721c03514f6008ecc59fe4c4ca4a082099dc125..8e375281195061da848fb2bfaf79cf125afccac0 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -2184,7 +2184,7 @@ static int taprio_graft(struct Qdisc *sch, unsigned long cl,
 		return -EINVAL;
 
 	if (dev->flags & IFF_UP)
-		dev_deactivate(dev);
+		dev_deactivate(dev, false);
 
 	/* In offload mode, the child Qdisc is directly attached to the netdev
 	 * TX queue, and thus, we need to keep its refcount elevated in order
-- 
2.53.0.473.g4a7958ca14-goog


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 net-next] net/sched: do not reset queues in graft operations
  2026-03-07 16:34 [PATCH v2 net-next] net/sched: do not reset queues in graft operations Eric Dumazet
@ 2026-03-07 21:37 ` Jamal Hadi Salim
  2026-03-09 11:44 ` Toke Høiland-Jørgensen
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Jamal Hadi Salim @ 2026-03-07 21:37 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	David S . Miller, Jakub Kicinski, Paolo Abeni, netdev, bpf,
	Simon Horman, Victor Nogueira, Cong Wang, Jiri Pirko,
	Toke Høiland-Jørgensen, eric.dumazet, Yunsheng Lin

On Sat, Mar 7, 2026 at 11:34 AM Eric Dumazet <edumazet@google.com> wrote:
>
> Following typical script is extremely disruptive,
> because each graft operation calls dev_deactivate()
> which resets all the queues of the device.
>
> QPARAM="limit 100000 flow_limit 1000 buckets 4096"
> TXQS=64
> for ETH in eth1
> do
>  tc qd del dev $ETH root 2>/dev/null
>  tc qd add dev $ETH root handle 1: mq
>  for i in `seq 1 $TXQS`
>  do
>    slot=$( printf %x $(( i )) )
>    tc qd add dev $ETH parent 1:$slot fq $QPARAM
>  done
> done
>
> One can add "ip link set dev $ETH down/up" to reduce the disruption time:
>
> QPARAM="limit 100000 flow_limit 1000 buckets 4096"
> TXQS=64
> for ETH in eth1
> do
>  ip link set dev $ETH down
>  tc qd del dev $ETH root 2>/dev/null
>  tc qd add dev $ETH root handle 1: mq
>  for i in `seq 1 $TXQS`
>  do
>    slot=$( printf %x $(( i )) )
>    tc qd add dev $ETH parent 1:$slot fq $QPARAM
>  done
>  ip link set dev $ETH up
> done
>
> Or we can add a @reset_needed flag to dev_deactivate() and
> dev_deactivate_many().
>
> This flag is set to true at device dismantle or linkwatch_do_dev(),
> and to false for graft operations.
>
> In the future, we might only stop one queue instead of the whole
> device, ie call dev_deactivate_queue() instead of dev_deactivate().
>
> I think the problem (quadratic behavior) was added in commit
> 2fb541c862c9 ("net: sch_generic: aviod concurrent reset and enqueue op
> for lockless qdisc") but this does not look serious enough to deserve
> risky backports.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>

cheers,
jamal

> Cc: Yunsheng Lin <linyunsheng@huawei.com>
> ---
> v2: clarified the changelog
>     added kdoc missing part (kernel build bots)
>
>  include/net/sch_generic.h |  4 ++--
>  net/core/dev.c            |  2 +-
>  net/core/link_watch.c     |  2 +-
>  net/sched/sch_api.c       |  2 +-
>  net/sched/sch_generic.c   | 20 ++++++++++++--------
>  net/sched/sch_htb.c       |  4 ++--
>  net/sched/sch_mq.c        |  2 +-
>  net/sched/sch_mqprio.c    |  2 +-
>  net/sched/sch_taprio.c    |  2 +-
>  9 files changed, 22 insertions(+), 18 deletions(-)
>
> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index c355300893a3921740cf6b919cc58de953cd610c..16beba40914ee5a9454f9eec2ad6e600543c89f2 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -710,8 +710,8 @@ void dev_qdisc_change_real_num_tx(struct net_device *dev,
>  void dev_init_scheduler(struct net_device *dev);
>  void dev_shutdown(struct net_device *dev);
>  void dev_activate(struct net_device *dev);
> -void dev_deactivate(struct net_device *dev);
> -void dev_deactivate_many(struct list_head *head);
> +void dev_deactivate(struct net_device *dev, bool reset_needed);
> +void dev_deactivate_many(struct list_head *head, bool reset_needed);
>  struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue,
>                               struct Qdisc *qdisc);
>  void qdisc_reset(struct Qdisc *qdisc);
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 203dc36aaed55e706f5d978ec02c6bf48201e3d4..6fc9350f0be8756ae849d5b0ca9b222055d96ced 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1756,7 +1756,7 @@ static void __dev_close_many(struct list_head *head)
>                 smp_mb__after_atomic(); /* Commit netif_running(). */
>         }
>
> -       dev_deactivate_many(head);
> +       dev_deactivate_many(head, true);
>
>         list_for_each_entry(dev, head, close_list) {
>                 const struct net_device_ops *ops = dev->netdev_ops;
> diff --git a/net/core/link_watch.c b/net/core/link_watch.c
> index 25c455c10a01cf535d6a7d2952d51434fe462690..ff2c1d4538efbc89cd49d8d8477542a6e7bacbad 100644
> --- a/net/core/link_watch.c
> +++ b/net/core/link_watch.c
> @@ -181,7 +181,7 @@ static void linkwatch_do_dev(struct net_device *dev)
>                 if (netif_carrier_ok(dev))
>                         dev_activate(dev);
>                 else
> -                       dev_deactivate(dev);
> +                       dev_deactivate(dev, true);
>
>                 netif_state_change(dev);
>         }
> diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
> index cc43e3f7574fae203989f5c28b4934f0720e64c2..c0bab092ea809955b3a5aaa08c169bca55cfaf7f 100644
> --- a/net/sched/sch_api.c
> +++ b/net/sched/sch_api.c
> @@ -1120,7 +1120,7 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
>                 }
>
>                 if (dev->flags & IFF_UP)
> -                       dev_deactivate(dev);
> +                       dev_deactivate(dev, false);
>
>                 qdisc_offload_graft_root(dev, new, old, extack);
>
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 556e0d8003161d3dfe68e484f6349714cf989c0c..d4fe907c4ad5895b1dda5249c55e4c1e0168023e 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -1370,11 +1370,12 @@ static bool some_qdisc_is_busy(struct net_device *dev)
>  /**
>   *     dev_deactivate_many - deactivate transmissions on several devices
>   *     @head: list of devices to deactivate
> + *     @reset_needed: qdisc should be reset if true.
>   *
>   *     This function returns only when all outstanding transmissions
>   *     have completed, unless all devices are in dismantle phase.
>   */
> -void dev_deactivate_many(struct list_head *head)
> +void dev_deactivate_many(struct list_head *head, bool reset_needed)
>  {
>         bool sync_needed = false;
>         struct net_device *dev;
> @@ -1393,11 +1394,14 @@ void dev_deactivate_many(struct list_head *head)
>         if (sync_needed)
>                 synchronize_net();
>
> -       list_for_each_entry(dev, head, close_list) {
> -               netdev_for_each_tx_queue(dev, dev_reset_queue, NULL);
> +       if (reset_needed) {
> +               list_for_each_entry(dev, head, close_list) {
> +                       netdev_for_each_tx_queue(dev, dev_reset_queue, NULL);
>
> -               if (dev_ingress_queue(dev))
> -                       dev_reset_queue(dev, dev_ingress_queue(dev), NULL);
> +                       if (dev_ingress_queue(dev))
> +                               dev_reset_queue(dev, dev_ingress_queue(dev),
> +                                               NULL);
> +               }
>         }
>
>         /* Wait for outstanding qdisc_run calls. */
> @@ -1412,12 +1416,12 @@ void dev_deactivate_many(struct list_head *head)
>         }
>  }
>
> -void dev_deactivate(struct net_device *dev)
> +void dev_deactivate(struct net_device *dev, bool reset_needed)
>  {
>         LIST_HEAD(single);
>
>         list_add(&dev->close_list, &single);
> -       dev_deactivate_many(&single);
> +       dev_deactivate_many(&single, reset_needed);
>         list_del(&single);
>  }
>  EXPORT_SYMBOL(dev_deactivate);
> @@ -1473,7 +1477,7 @@ int dev_qdisc_change_tx_queue_len(struct net_device *dev)
>         int ret = 0;
>
>         if (up)
> -               dev_deactivate(dev);
> +               dev_deactivate(dev, false);
>
>         for (i = 0; i < dev->num_tx_queues; i++) {
>                 ret = qdisc_change_tx_queue_len(dev, &dev->_tx[i]);
> diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
> index cf6cd4ccfa2029d9a45e35d6780520290690732d..eb12381795ce1bb0f3b8c5f502e16ad64c4408c8 100644
> --- a/net/sched/sch_htb.c
> +++ b/net/sched/sch_htb.c
> @@ -1387,7 +1387,7 @@ htb_graft_helper(struct netdev_queue *dev_queue, struct Qdisc *new_q)
>         struct Qdisc *old_q;
>
>         if (dev->flags & IFF_UP)
> -               dev_deactivate(dev);
> +               dev_deactivate(dev, false);
>         old_q = dev_graft_qdisc(dev_queue, new_q);
>         if (new_q)
>                 new_q->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT;
> @@ -1421,7 +1421,7 @@ static void htb_offload_move_qdisc(struct Qdisc *sch, struct htb_class *cl_old,
>                 struct Qdisc *qdisc;
>
>                 if (dev->flags & IFF_UP)
> -                       dev_deactivate(dev);
> +                       dev_deactivate(dev, false);
>                 qdisc = dev_graft_qdisc(queue_old, NULL);
>                 WARN_ON(qdisc != cl_old->leaf.q);
>         }
> diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
> index 0ed199fa18f04a06197116b6ab64ff3647dbace9..a0133a7b9d3b09a0d2a6064234c8fdef60dbf955 100644
> --- a/net/sched/sch_mq.c
> +++ b/net/sched/sch_mq.c
> @@ -201,7 +201,7 @@ static int mq_graft(struct Qdisc *sch, unsigned long cl, struct Qdisc *new,
>         struct net_device *dev = qdisc_dev(sch);
>
>         if (dev->flags & IFF_UP)
> -               dev_deactivate(dev);
> +               dev_deactivate(dev, false);
>
>         *old = dev_graft_qdisc(dev_queue, new);
>         if (new)
> diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
> index b83276409416fabeccc441e5127211632ddcfedb..002add5ce9e0ab04a6260495d1bec02983c2a204 100644
> --- a/net/sched/sch_mqprio.c
> +++ b/net/sched/sch_mqprio.c
> @@ -469,7 +469,7 @@ static int mqprio_graft(struct Qdisc *sch, unsigned long cl, struct Qdisc *new,
>                 return -EINVAL;
>
>         if (dev->flags & IFF_UP)
> -               dev_deactivate(dev);
> +               dev_deactivate(dev, false);
>
>         *old = dev_graft_qdisc(dev_queue, new);
>
> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
> index f721c03514f6008ecc59fe4c4ca4a082099dc125..8e375281195061da848fb2bfaf79cf125afccac0 100644
> --- a/net/sched/sch_taprio.c
> +++ b/net/sched/sch_taprio.c
> @@ -2184,7 +2184,7 @@ static int taprio_graft(struct Qdisc *sch, unsigned long cl,
>                 return -EINVAL;
>
>         if (dev->flags & IFF_UP)
> -               dev_deactivate(dev);
> +               dev_deactivate(dev, false);
>
>         /* In offload mode, the child Qdisc is directly attached to the netdev
>          * TX queue, and thus, we need to keep its refcount elevated in order
> --
> 2.53.0.473.g4a7958ca14-goog
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 net-next] net/sched: do not reset queues in graft operations
  2026-03-07 16:34 [PATCH v2 net-next] net/sched: do not reset queues in graft operations Eric Dumazet
  2026-03-07 21:37 ` Jamal Hadi Salim
@ 2026-03-09 11:44 ` Toke Høiland-Jørgensen
  2026-03-09 19:36 ` Victor Nogueira
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Toke Høiland-Jørgensen @ 2026-03-09 11:44 UTC (permalink / raw)
  To: Eric Dumazet, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, bpf, Simon Horman, Jamal Hadi Salim, Victor Nogueira,
	Cong Wang, Jiri Pirko, eric.dumazet, Eric Dumazet, Yunsheng Lin

Eric Dumazet <edumazet@google.com> writes:

> Following typical script is extremely disruptive,
> because each graft operation calls dev_deactivate()
> which resets all the queues of the device.
>
> QPARAM="limit 100000 flow_limit 1000 buckets 4096"
> TXQS=64
> for ETH in eth1
> do
>  tc qd del dev $ETH root 2>/dev/null
>  tc qd add dev $ETH root handle 1: mq
>  for i in `seq 1 $TXQS`
>  do
>    slot=$( printf %x $(( i )) )
>    tc qd add dev $ETH parent 1:$slot fq $QPARAM
>  done
> done
>
> One can add "ip link set dev $ETH down/up" to reduce the disruption time:
>
> QPARAM="limit 100000 flow_limit 1000 buckets 4096"
> TXQS=64
> for ETH in eth1
> do
>  ip link set dev $ETH down
>  tc qd del dev $ETH root 2>/dev/null
>  tc qd add dev $ETH root handle 1: mq
>  for i in `seq 1 $TXQS`
>  do
>    slot=$( printf %x $(( i )) )
>    tc qd add dev $ETH parent 1:$slot fq $QPARAM
>  done
>  ip link set dev $ETH up
> done
>
> Or we can add a @reset_needed flag to dev_deactivate() and
> dev_deactivate_many().
>
> This flag is set to true at device dismantle or linkwatch_do_dev(),
> and to false for graft operations.
>
> In the future, we might only stop one queue instead of the whole
> device, ie call dev_deactivate_queue() instead of dev_deactivate().
>
> I think the problem (quadratic behavior) was added in commit
> 2fb541c862c9 ("net: sch_generic: aviod concurrent reset and enqueue op
> for lockless qdisc") but this does not look serious enough to deserve
> risky backports.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Yunsheng Lin <linyunsheng@huawei.com>

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 net-next] net/sched: do not reset queues in graft operations
  2026-03-07 16:34 [PATCH v2 net-next] net/sched: do not reset queues in graft operations Eric Dumazet
  2026-03-07 21:37 ` Jamal Hadi Salim
  2026-03-09 11:44 ` Toke Høiland-Jørgensen
@ 2026-03-09 19:36 ` Victor Nogueira
  2026-03-10  1:58 ` Jakub Kicinski
  2026-03-10  2:00 ` patchwork-bot+netdevbpf
  4 siblings, 0 replies; 6+ messages in thread
From: Victor Nogueira @ 2026-03-09 19:36 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	David S . Miller, Jakub Kicinski, Paolo Abeni, netdev, bpf,
	Simon Horman, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
	Toke Høiland-Jørgensen, eric.dumazet, Yunsheng Lin

On Sat, Mar 7, 2026 at 1:34 PM Eric Dumazet <edumazet@google.com> wrote:
>
> Following typical script is extremely disruptive,
> because each graft operation calls dev_deactivate()
> which resets all the queues of the device.
>
> QPARAM="limit 100000 flow_limit 1000 buckets 4096"
> TXQS=64
> for ETH in eth1
> do
>  tc qd del dev $ETH root 2>/dev/null
>  tc qd add dev $ETH root handle 1: mq
>  for i in `seq 1 $TXQS`
>  do
>    slot=$( printf %x $(( i )) )
>    tc qd add dev $ETH parent 1:$slot fq $QPARAM
>  done
> done
>
> One can add "ip link set dev $ETH down/up" to reduce the disruption time:
>
> QPARAM="limit 100000 flow_limit 1000 buckets 4096"
> TXQS=64
> for ETH in eth1
> do
>  ip link set dev $ETH down
>  tc qd del dev $ETH root 2>/dev/null
>  tc qd add dev $ETH root handle 1: mq
>  for i in `seq 1 $TXQS`
>  do
>    slot=$( printf %x $(( i )) )
>    tc qd add dev $ETH parent 1:$slot fq $QPARAM
>  done
>  ip link set dev $ETH up
> done
>
> Or we can add a @reset_needed flag to dev_deactivate() and
> dev_deactivate_many().
>
> This flag is set to true at device dismantle or linkwatch_do_dev(),
> and to false for graft operations.
>
> In the future, we might only stop one queue instead of the whole
> device, ie call dev_deactivate_queue() instead of dev_deactivate().
>
> I think the problem (quadratic behavior) was added in commit
> 2fb541c862c9 ("net: sch_generic: aviod concurrent reset and enqueue op
> for lockless qdisc") but this does not look serious enough to deserve
> risky backports.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Victor Nogueira <victor@mojatatu.com>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 net-next] net/sched: do not reset queues in graft operations
  2026-03-07 16:34 [PATCH v2 net-next] net/sched: do not reset queues in graft operations Eric Dumazet
                   ` (2 preceding siblings ...)
  2026-03-09 19:36 ` Victor Nogueira
@ 2026-03-10  1:58 ` Jakub Kicinski
  2026-03-10  2:00 ` patchwork-bot+netdevbpf
  4 siblings, 0 replies; 6+ messages in thread
From: Jakub Kicinski @ 2026-03-10  1:58 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	David S . Miller, Paolo Abeni, netdev, bpf, Simon Horman,
	Jamal Hadi Salim, Victor Nogueira, Cong Wang, Jiri Pirko,
	Toke Høiland-Jørgensen, eric.dumazet, Yunsheng Lin

On Sat,  7 Mar 2026 16:34:30 +0000 Eric Dumazet wrote:
> I think the problem (quadratic behavior) was added in commit
> 2fb541c862c9 ("net: sch_generic: aviod concurrent reset and enqueue op
> for lockless qdisc") but this does not look serious enough to deserve
> risky backports.

FWIW I was wondering whether with your recent llist in place we could
retire the NOLOCK concept. So many workarounds grew around it at
this point :( But in a naive pktgen+dummy test llist ends up dropping
most packets. Not sure what it looks like on a more realistic setup.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 net-next] net/sched: do not reset queues in graft operations
  2026-03-07 16:34 [PATCH v2 net-next] net/sched: do not reset queues in graft operations Eric Dumazet
                   ` (3 preceding siblings ...)
  2026-03-10  1:58 ` Jakub Kicinski
@ 2026-03-10  2:00 ` patchwork-bot+netdevbpf
  4 siblings, 0 replies; 6+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-03-10  2:00 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: ast, daniel, andrii, davem, kuba, pabeni, netdev, bpf, horms, jhs,
	victor, xiyou.wangcong, jiri, toke, eric.dumazet, linyunsheng

Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Sat,  7 Mar 2026 16:34:30 +0000 you wrote:
> Following typical script is extremely disruptive,
> because each graft operation calls dev_deactivate()
> which resets all the queues of the device.
> 
> QPARAM="limit 100000 flow_limit 1000 buckets 4096"
> TXQS=64
> for ETH in eth1
> do
>  tc qd del dev $ETH root 2>/dev/null
>  tc qd add dev $ETH root handle 1: mq
>  for i in `seq 1 $TXQS`
>  do
>    slot=$( printf %x $(( i )) )
>    tc qd add dev $ETH parent 1:$slot fq $QPARAM
>  done
> done
> 
> [...]

Here is the summary with links:
  - [v2,net-next] net/sched: do not reset queues in graft operations
    https://git.kernel.org/netdev/net-next/c/47e8dbb6e763

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-03-10  2:00 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-07 16:34 [PATCH v2 net-next] net/sched: do not reset queues in graft operations Eric Dumazet
2026-03-07 21:37 ` Jamal Hadi Salim
2026-03-09 11:44 ` Toke Høiland-Jørgensen
2026-03-09 19:36 ` Victor Nogueira
2026-03-10  1:58 ` Jakub Kicinski
2026-03-10  2:00 ` patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox