public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 net 0/6] net/sched: Fixes for sch_ingress and sch_clsact
@ 2023-05-24  1:16 Peilin Ye
  2023-05-24  1:17 ` [PATCH v5 net 1/6] net/sched: sch_ingress: Only create under TC_H_INGRESS Peilin Ye
                   ` (6 more replies)
  0 siblings, 7 replies; 20+ messages in thread
From: Peilin Ye @ 2023-05-24  1:16 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Jamal Hadi Salim, Cong Wang, Jiri Pirko
  Cc: Peilin Ye, Daniel Borkmann, John Fastabend, Vlad Buslov,
	Pedro Tammela, Hillf Danton, netdev, linux-kernel, Cong Wang,
	Peilin Ye

Link to v4: https://lore.kernel.org/r/cover.1684825171.git.peilin.ye@bytedance.com/
Link to v3 (incomplete): https://lore.kernel.org/r/cover.1684821877.git.peilin.ye@bytedance.com/
Link to v2: https://lore.kernel.org/r/cover.1684796705.git.peilin.ye@bytedance.com/
Link to v1: https://lore.kernel.org/r/cover.1683326865.git.peilin.ye@bytedance.com/

Hi all,

These are v5 fixes for ingress and clsact Qdiscs.  Please take another
look at patch 1, 2 and 6, thanks!

Changes in v5:
  - for [6/6], reinitialize @q, @p (suggested by Vlad) and @tcm after the
    "replay:" tag
  - for [1,2/6], do nothing in ->destroy() if ->parent isn't ffff:fff1, as
    reported by Pedro

Change in v3, v4:
  - add in-body From: tags

Changes in v2:
  - for [1-5/6], include tags from Jamal and Pedro
  - for [6/6], as suggested by Vlad, replay the request if the current
    Qdisc has any ongoing (RTNL-unlocked) filter requests, instead of
    returning -EBUSY to the user
  - use Closes: tag as warned by checkpatch

[1,2/6]: ingress and clsact Qdiscs should only be created under ffff:fff1
  [3/6]: Under ffff:fff1, only create ingress and clsact Qdiscs (for now,
         at least)
  [4/6]: After creating ingress and clsact Qdiscs under ffff:fff1, do not
         graft them again to anywhere else (e.g. as the inner Qdisc of a
         TBF Qdisc)
  [5/6]: Prepare for [6/6], do not reuse that for-loop in qdisc_graft()
         for ingress and clsact Qdiscs
  [6/6]: Fix use-after-free [a] in mini_qdisc_pair_swap()

[a] https://syzkaller.appspot.com/bug?extid=b53a9c0d1ea4ad62da8b

Thanks,
Peilin Ye (6):
  net/sched: sch_ingress: Only create under TC_H_INGRESS
  net/sched: sch_clsact: Only create under TC_H_CLSACT
  net/sched: Reserve TC_H_INGRESS (TC_H_CLSACT) for ingress (clsact)
    Qdiscs
  net/sched: Prohibit regrafting ingress or clsact Qdiscs
  net/sched: Refactor qdisc_graft() for ingress and clsact Qdiscs
  net/sched: qdisc_destroy() old ingress and clsact Qdiscs before
    grafting

 include/net/sch_generic.h |  8 +++++
 net/sched/sch_api.c       | 68 ++++++++++++++++++++++++++++-----------
 net/sched/sch_generic.c   | 14 ++++++--
 net/sched/sch_ingress.c   | 16 +++++++--
 4 files changed, 83 insertions(+), 23 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v5 net 1/6] net/sched: sch_ingress: Only create under TC_H_INGRESS
  2023-05-24  1:16 [PATCH v5 net 0/6] net/sched: Fixes for sch_ingress and sch_clsact Peilin Ye
@ 2023-05-24  1:17 ` Peilin Ye
  2023-05-24 15:37   ` Pedro Tammela
  2023-05-24 15:57   ` Jamal Hadi Salim
  2023-05-24  1:18 ` [PATCH v5 net 2/6] net/sched: sch_clsact: Only create under TC_H_CLSACT Peilin Ye
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 20+ messages in thread
From: Peilin Ye @ 2023-05-24  1:17 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Jamal Hadi Salim, Cong Wang, Jiri Pirko
  Cc: Peilin Ye, Daniel Borkmann, John Fastabend, Vlad Buslov,
	Pedro Tammela, Hillf Danton, netdev, linux-kernel, Cong Wang,
	Peilin Ye

From: Peilin Ye <peilin.ye@bytedance.com>

ingress Qdiscs are only supposed to be created under TC_H_INGRESS.
Return -EOPNOTSUPP if 'parent' is not TC_H_INGRESS, similar to
mq_init().

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+b53a9c0d1ea4ad62da8b@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/r/0000000000006cf87705f79acf1a@google.com/
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
---
change in v5:
  - avoid underflowing @ingress_needed_key in ->destroy(), reported by
    Pedro

change in v3, v4:
  - add in-body From: tag

 net/sched/sch_ingress.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c
index 84838128b9c5..f9ef6deb2770 100644
--- a/net/sched/sch_ingress.c
+++ b/net/sched/sch_ingress.c
@@ -80,6 +80,9 @@ static int ingress_init(struct Qdisc *sch, struct nlattr *opt,
 	struct net_device *dev = qdisc_dev(sch);
 	int err;
 
+	if (sch->parent != TC_H_INGRESS)
+		return -EOPNOTSUPP;
+
 	net_inc_ingress_queue();
 
 	mini_qdisc_pair_init(&q->miniqp, sch, &dev->miniq_ingress);
@@ -101,6 +104,9 @@ static void ingress_destroy(struct Qdisc *sch)
 {
 	struct ingress_sched_data *q = qdisc_priv(sch);
 
+	if (sch->parent != TC_H_INGRESS)
+		return;
+
 	tcf_block_put_ext(q->block, sch, &q->block_info);
 	net_dec_ingress_queue();
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v5 net 2/6] net/sched: sch_clsact: Only create under TC_H_CLSACT
  2023-05-24  1:16 [PATCH v5 net 0/6] net/sched: Fixes for sch_ingress and sch_clsact Peilin Ye
  2023-05-24  1:17 ` [PATCH v5 net 1/6] net/sched: sch_ingress: Only create under TC_H_INGRESS Peilin Ye
@ 2023-05-24  1:18 ` Peilin Ye
  2023-05-24 15:38   ` Pedro Tammela
  2023-05-24  1:19 ` [PATCH v5 net 3/6] net/sched: Reserve TC_H_INGRESS (TC_H_CLSACT) for ingress (clsact) Qdiscs Peilin Ye
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Peilin Ye @ 2023-05-24  1:18 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Jamal Hadi Salim, Cong Wang, Jiri Pirko
  Cc: Peilin Ye, Daniel Borkmann, John Fastabend, Vlad Buslov,
	Pedro Tammela, Hillf Danton, netdev, linux-kernel, Cong Wang,
	Peilin Ye

From: Peilin Ye <peilin.ye@bytedance.com>

clsact Qdiscs are only supposed to be created under TC_H_CLSACT (which
equals TC_H_INGRESS).  Return -EOPNOTSUPP if 'parent' is not
TC_H_CLSACT.

Fixes: 1f211a1b929c ("net, sched: add clsact qdisc")
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
---
change in v5:
  - avoid underflowing @egress_needed_key in ->destroy(), reported by
    Pedro

change in v3, v4:
  - add in-body From: tag

 net/sched/sch_ingress.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c
index f9ef6deb2770..35963929e117 100644
--- a/net/sched/sch_ingress.c
+++ b/net/sched/sch_ingress.c
@@ -225,6 +225,9 @@ static int clsact_init(struct Qdisc *sch, struct nlattr *opt,
 	struct net_device *dev = qdisc_dev(sch);
 	int err;
 
+	if (sch->parent != TC_H_CLSACT)
+		return -EOPNOTSUPP;
+
 	net_inc_ingress_queue();
 	net_inc_egress_queue();
 
@@ -254,6 +257,9 @@ static void clsact_destroy(struct Qdisc *sch)
 {
 	struct clsact_sched_data *q = qdisc_priv(sch);
 
+	if (sch->parent != TC_H_CLSACT)
+		return;
+
 	tcf_block_put_ext(q->egress_block, sch, &q->egress_block_info);
 	tcf_block_put_ext(q->ingress_block, sch, &q->ingress_block_info);
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v5 net 3/6] net/sched: Reserve TC_H_INGRESS (TC_H_CLSACT) for ingress (clsact) Qdiscs
  2023-05-24  1:16 [PATCH v5 net 0/6] net/sched: Fixes for sch_ingress and sch_clsact Peilin Ye
  2023-05-24  1:17 ` [PATCH v5 net 1/6] net/sched: sch_ingress: Only create under TC_H_INGRESS Peilin Ye
  2023-05-24  1:18 ` [PATCH v5 net 2/6] net/sched: sch_clsact: Only create under TC_H_CLSACT Peilin Ye
@ 2023-05-24  1:19 ` Peilin Ye
  2023-05-24 15:38   ` Pedro Tammela
  2023-05-24  1:19 ` [PATCH v5 net 4/6] net/sched: Prohibit regrafting ingress or clsact Qdiscs Peilin Ye
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Peilin Ye @ 2023-05-24  1:19 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Jamal Hadi Salim, Cong Wang, Jiri Pirko
  Cc: Peilin Ye, Daniel Borkmann, John Fastabend, Vlad Buslov,
	Pedro Tammela, Hillf Danton, netdev, linux-kernel, Cong Wang,
	Peilin Ye

From: Peilin Ye <peilin.ye@bytedance.com>

Currently it is possible to add e.g. an HTB Qdisc under ffff:fff1
(TC_H_INGRESS, TC_H_CLSACT):

  $ ip link add name ifb0 type ifb
  $ tc qdisc add dev ifb0 parent ffff:fff1 htb
  $ tc qdisc add dev ifb0 clsact
  Error: Exclusivity flag on, cannot modify.
  $ drgn
  ...
  >>> ifb0 = netdev_get_by_name(prog, "ifb0")
  >>> qdisc = ifb0.ingress_queue.qdisc_sleeping
  >>> print(qdisc.ops.id.string_().decode())
  htb
  >>> qdisc.flags.value_() # TCQ_F_INGRESS
  2

Only allow ingress and clsact Qdiscs under ffff:fff1.  Return -EINVAL
for everything else.  Make TCQ_F_INGRESS a static flag of ingress and
clsact Qdiscs.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Fixes: 1f211a1b929c ("net, sched: add clsact qdisc")
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
---
change in v3, v4:
  - add in-body From: tag

 net/sched/sch_api.c     | 7 ++++++-
 net/sched/sch_ingress.c | 4 ++--
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index fdb8f429333d..383195955b7d 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1252,7 +1252,12 @@ static struct Qdisc *qdisc_create(struct net_device *dev,
 	sch->parent = parent;
 
 	if (handle == TC_H_INGRESS) {
-		sch->flags |= TCQ_F_INGRESS;
+		if (!(sch->flags & TCQ_F_INGRESS)) {
+			NL_SET_ERR_MSG(extack,
+				       "Specified parent ID is reserved for ingress and clsact Qdiscs");
+			err = -EINVAL;
+			goto err_out3;
+		}
 		handle = TC_H_MAKE(TC_H_INGRESS, 0);
 	} else {
 		if (handle == 0) {
diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c
index 35963929e117..e43a45499372 100644
--- a/net/sched/sch_ingress.c
+++ b/net/sched/sch_ingress.c
@@ -140,7 +140,7 @@ static struct Qdisc_ops ingress_qdisc_ops __read_mostly = {
 	.cl_ops			=	&ingress_class_ops,
 	.id			=	"ingress",
 	.priv_size		=	sizeof(struct ingress_sched_data),
-	.static_flags		=	TCQ_F_CPUSTATS,
+	.static_flags		=	TCQ_F_INGRESS | TCQ_F_CPUSTATS,
 	.init			=	ingress_init,
 	.destroy		=	ingress_destroy,
 	.dump			=	ingress_dump,
@@ -281,7 +281,7 @@ static struct Qdisc_ops clsact_qdisc_ops __read_mostly = {
 	.cl_ops			=	&clsact_class_ops,
 	.id			=	"clsact",
 	.priv_size		=	sizeof(struct clsact_sched_data),
-	.static_flags		=	TCQ_F_CPUSTATS,
+	.static_flags		=	TCQ_F_INGRESS | TCQ_F_CPUSTATS,
 	.init			=	clsact_init,
 	.destroy		=	clsact_destroy,
 	.dump			=	ingress_dump,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v5 net 4/6] net/sched: Prohibit regrafting ingress or clsact Qdiscs
  2023-05-24  1:16 [PATCH v5 net 0/6] net/sched: Fixes for sch_ingress and sch_clsact Peilin Ye
                   ` (2 preceding siblings ...)
  2023-05-24  1:19 ` [PATCH v5 net 3/6] net/sched: Reserve TC_H_INGRESS (TC_H_CLSACT) for ingress (clsact) Qdiscs Peilin Ye
@ 2023-05-24  1:19 ` Peilin Ye
  2023-05-24 15:38   ` Pedro Tammela
  2023-05-24  1:20 ` [PATCH v5 net 5/6] net/sched: Refactor qdisc_graft() for ingress and " Peilin Ye
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Peilin Ye @ 2023-05-24  1:19 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Jamal Hadi Salim, Cong Wang, Jiri Pirko
  Cc: Peilin Ye, Daniel Borkmann, John Fastabend, Vlad Buslov,
	Pedro Tammela, Hillf Danton, netdev, linux-kernel, Cong Wang,
	Peilin Ye

From: Peilin Ye <peilin.ye@bytedance.com>

Currently, after creating an ingress (or clsact) Qdisc and grafting it
under TC_H_INGRESS (TC_H_CLSACT), it is possible to graft it again under
e.g. a TBF Qdisc:

  $ ip link add ifb0 type ifb
  $ tc qdisc add dev ifb0 handle 1: root tbf rate 20kbit buffer 1600 limit 3000
  $ tc qdisc add dev ifb0 clsact
  $ tc qdisc link dev ifb0 handle ffff: parent 1:1
  $ tc qdisc show dev ifb0
  qdisc tbf 1: root refcnt 2 rate 20Kbit burst 1600b lat 560.0ms
  qdisc clsact ffff: parent ffff:fff1 refcnt 2
                                      ^^^^^^^^

clsact's refcount has increased: it is now grafted under both
TC_H_CLSACT and 1:1.

ingress and clsact Qdiscs should only be used under TC_H_INGRESS
(TC_H_CLSACT).  Prohibit regrafting them.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Fixes: 1f211a1b929c ("net, sched: add clsact qdisc")
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
---
change in v3, v4:
  - add in-body From: tag

 net/sched/sch_api.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 383195955b7d..49b9c1bbfdd9 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1596,6 +1596,11 @@ static int tc_modify_qdisc(struct sk_buff *skb, struct nlmsghdr *n,
 					NL_SET_ERR_MSG(extack, "Invalid qdisc name");
 					return -EINVAL;
 				}
+				if (q->flags & TCQ_F_INGRESS) {
+					NL_SET_ERR_MSG(extack,
+						       "Cannot regraft ingress or clsact Qdiscs");
+					return -EINVAL;
+				}
 				if (q == p ||
 				    (p && check_loop(q, p, 0))) {
 					NL_SET_ERR_MSG(extack, "Qdisc parent/child loop detected");
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v5 net 5/6] net/sched: Refactor qdisc_graft() for ingress and clsact Qdiscs
  2023-05-24  1:16 [PATCH v5 net 0/6] net/sched: Fixes for sch_ingress and sch_clsact Peilin Ye
                   ` (3 preceding siblings ...)
  2023-05-24  1:19 ` [PATCH v5 net 4/6] net/sched: Prohibit regrafting ingress or clsact Qdiscs Peilin Ye
@ 2023-05-24  1:20 ` Peilin Ye
  2023-05-24  1:20 ` [PATCH v5 net 6/6] net/sched: qdisc_destroy() old ingress and clsact Qdiscs before grafting Peilin Ye
  2023-05-25 17:16 ` [PATCH v5 net 0/6] net/sched: Fixes for sch_ingress and sch_clsact Vlad Buslov
  6 siblings, 0 replies; 20+ messages in thread
From: Peilin Ye @ 2023-05-24  1:20 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Jamal Hadi Salim, Cong Wang, Jiri Pirko
  Cc: Peilin Ye, Daniel Borkmann, John Fastabend, Vlad Buslov,
	Pedro Tammela, Hillf Danton, netdev, linux-kernel, Cong Wang,
	Peilin Ye

From: Peilin Ye <peilin.ye@bytedance.com>

Grafting ingress and clsact Qdiscs does not need a for-loop in
qdisc_graft().  Refactor it.  No functional changes intended.

Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Tested-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
---
change in v3, v4:
  - add in-body From: tag

 net/sched/sch_api.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 49b9c1bbfdd9..f72a581666a2 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1073,12 +1073,12 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
 
 	if (parent == NULL) {
 		unsigned int i, num_q, ingress;
+		struct netdev_queue *dev_queue;
 
 		ingress = 0;
 		num_q = dev->num_tx_queues;
 		if ((q && q->flags & TCQ_F_INGRESS) ||
 		    (new && new->flags & TCQ_F_INGRESS)) {
-			num_q = 1;
 			ingress = 1;
 			if (!dev_ingress_queue(dev)) {
 				NL_SET_ERR_MSG(extack, "Device does not have an ingress queue");
@@ -1094,18 +1094,18 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
 		if (new && new->ops->attach && !ingress)
 			goto skip;
 
-		for (i = 0; i < num_q; i++) {
-			struct netdev_queue *dev_queue = dev_ingress_queue(dev);
-
-			if (!ingress)
+		if (!ingress) {
+			for (i = 0; i < num_q; i++) {
 				dev_queue = netdev_get_tx_queue(dev, i);
+				old = dev_graft_qdisc(dev_queue, new);
 
-			old = dev_graft_qdisc(dev_queue, new);
-			if (new && i > 0)
-				qdisc_refcount_inc(new);
-
-			if (!ingress)
+				if (new && i > 0)
+					qdisc_refcount_inc(new);
 				qdisc_put(old);
+			}
+		} else {
+			dev_queue = dev_ingress_queue(dev);
+			old = dev_graft_qdisc(dev_queue, new);
 		}
 
 skip:
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v5 net 6/6] net/sched: qdisc_destroy() old ingress and clsact Qdiscs before grafting
  2023-05-24  1:16 [PATCH v5 net 0/6] net/sched: Fixes for sch_ingress and sch_clsact Peilin Ye
                   ` (4 preceding siblings ...)
  2023-05-24  1:20 ` [PATCH v5 net 5/6] net/sched: Refactor qdisc_graft() for ingress and " Peilin Ye
@ 2023-05-24  1:20 ` Peilin Ye
  2023-05-24 15:39   ` Pedro Tammela
  2023-05-25 17:16 ` [PATCH v5 net 0/6] net/sched: Fixes for sch_ingress and sch_clsact Vlad Buslov
  6 siblings, 1 reply; 20+ messages in thread
From: Peilin Ye @ 2023-05-24  1:20 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Jamal Hadi Salim, Cong Wang, Jiri Pirko
  Cc: Peilin Ye, Daniel Borkmann, John Fastabend, Vlad Buslov,
	Pedro Tammela, Hillf Danton, netdev, linux-kernel, Cong Wang,
	Peilin Ye

From: Peilin Ye <peilin.ye@bytedance.com>

mini_Qdisc_pair::p_miniq is a double pointer to mini_Qdisc, initialized in
ingress_init() to point to net_device::miniq_ingress.  ingress Qdiscs
access this per-net_device pointer in mini_qdisc_pair_swap().  Similar for
clsact Qdiscs and miniq_egress.

Unfortunately, after introducing RTNL-unlocked RTM_{NEW,DEL,GET}TFILTER
requests (thanks Hillf Danton for the hint), when replacing ingress or
clsact Qdiscs, for example, the old Qdisc ("@old") could access the same
miniq_{in,e}gress pointer(s) concurrently with the new Qdisc ("@new"),
causing race conditions [1] including a use-after-free bug in
mini_qdisc_pair_swap() reported by syzbot:

 BUG: KASAN: slab-use-after-free in mini_qdisc_pair_swap+0x1c2/0x1f0 net/sched/sch_generic.c:1573
 Write of size 8 at addr ffff888045b31308 by task syz-executor690/14901
...
 Call Trace:
  <TASK>
  __dump_stack lib/dump_stack.c:88 [inline]
  dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
  print_address_description.constprop.0+0x2c/0x3c0 mm/kasan/report.c:319
  print_report mm/kasan/report.c:430 [inline]
  kasan_report+0x11c/0x130 mm/kasan/report.c:536
  mini_qdisc_pair_swap+0x1c2/0x1f0 net/sched/sch_generic.c:1573
  tcf_chain_head_change_item net/sched/cls_api.c:495 [inline]
  tcf_chain0_head_change.isra.0+0xb9/0x120 net/sched/cls_api.c:509
  tcf_chain_tp_insert net/sched/cls_api.c:1826 [inline]
  tcf_chain_tp_insert_unique net/sched/cls_api.c:1875 [inline]
  tc_new_tfilter+0x1de6/0x2290 net/sched/cls_api.c:2266
...

@old and @new should not affect each other.  In other words, @old should
never modify miniq_{in,e}gress after @new, and @new should not update
@old's RCU state.  Fixing without changing sch_api.c turned out to be
difficult (please refer to Closes: for discussions).  Instead, make sure
@new's first call always happen after @old's last call, in
qdisc_destroy(), has finished:

In qdisc_graft(), return -EAGAIN and tell the caller to replay
(suggested by Vlad Buslov) if @old has any ongoing RTNL-unlocked filter
requests, and call qdisc_destroy() for @old before grafting @new.

Introduce qdisc_refcount_dec_if_one() as the counterpart of
qdisc_refcount_inc_nz() used for RTNL-unlocked filter requests.  Introduce
a non-static version of qdisc_destroy() that does a TCQ_F_BUILTIN check,
just like qdisc_put() etc.

Depends on patch "net/sched: Refactor qdisc_graft() for ingress and clsact
Qdiscs".

[1] To illustrate, the syzkaller reproducer adds ingress Qdiscs under
TC_H_ROOT (no longer possible after patch "net/sched: sch_ingress: Only
create under TC_H_INGRESS") on eth0 that has 8 transmission queues:

  Thread 1 creates ingress Qdisc A (containing mini Qdisc a1 and a2), then
  adds a flower filter X to A.

  Thread 2 creates another ingress Qdisc B (containing mini Qdisc b1 and
  b2) to replace A, then adds a flower filter Y to B.

 Thread 1               A's refcnt   Thread 2
  RTM_NEWQDISC (A, RTNL-locked)
   qdisc_create(A)               1
   qdisc_graft(A)                9

  RTM_NEWTFILTER (X, RTNL-unlocked)
   __tcf_qdisc_find(A)          10
   tcf_chain0_head_change(A)
   mini_qdisc_pair_swap(A) (1st)
            |
            |                         RTM_NEWQDISC (B, RTNL-locked)
         RCU sync                2     qdisc_graft(B)
            |                    1     notify_and_destroy(A)
            |
   tcf_block_release(A)          0    RTM_NEWTFILTER (Y, RTNL-unlocked)
   qdisc_destroy(A)                    tcf_chain0_head_change(B)
   tcf_chain0_head_change_cb_del(A)    mini_qdisc_pair_swap(B) (2nd)
   mini_qdisc_pair_swap(A) (3rd)                |
           ...                                 ...

Here, B calls mini_qdisc_pair_swap(), pointing eth0->miniq_ingress to its
mini Qdisc, b1.  Then, A calls mini_qdisc_pair_swap() again during
ingress_destroy(), setting eth0->miniq_ingress to NULL, so ingress packets
on eth0 will not find filter Y in sch_handle_ingress().

This is only one of the possible consequences of concurrently accessing
miniq_{in,e}gress pointers.  The point is clear though: again, A should
never modify those per-net_device pointers after B, and B should not
update A's RCU state.

Fixes: 7a096d579e8e ("net: sched: ingress: set 'unlocked' flag for Qdisc ops")
Fixes: 87f373921c4e ("net: sched: ingress: set 'unlocked' flag for clsact Qdisc ops")
Reported-by: syzbot+b53a9c0d1ea4ad62da8b@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/r/0000000000006cf87705f79acf1a@google.com/
Cc: Hillf Danton <hdanton@sina.com>
Cc: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
---
change in v5:
  - reinitialize @q, @p (suggested by Vlad) and @tcm before replaying,
    just like @flags in tc_new_tfilter()

change in v3, v4:
  - add in-body From: tag

changes in v2:
  - replay the request if the current Qdisc has any ongoing RTNL-unlocked
    filter requests (Vlad)
  - minor changes in code comments and commit log

 include/net/sch_generic.h |  8 ++++++++
 net/sched/sch_api.c       | 40 ++++++++++++++++++++++++++++++---------
 net/sched/sch_generic.c   | 14 +++++++++++---
 3 files changed, 50 insertions(+), 12 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index fab5ba3e61b7..3e9cc43cbc90 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -137,6 +137,13 @@ static inline void qdisc_refcount_inc(struct Qdisc *qdisc)
 	refcount_inc(&qdisc->refcnt);
 }
 
+static inline bool qdisc_refcount_dec_if_one(struct Qdisc *qdisc)
+{
+	if (qdisc->flags & TCQ_F_BUILTIN)
+		return true;
+	return refcount_dec_if_one(&qdisc->refcnt);
+}
+
 /* Intended to be used by unlocked users, when concurrent qdisc release is
  * possible.
  */
@@ -652,6 +659,7 @@ void dev_deactivate_many(struct list_head *head);
 struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue,
 			      struct Qdisc *qdisc);
 void qdisc_reset(struct Qdisc *qdisc);
+void qdisc_destroy(struct Qdisc *qdisc);
 void qdisc_put(struct Qdisc *qdisc);
 void qdisc_put_unlocked(struct Qdisc *qdisc);
 void qdisc_tree_reduce_backlog(struct Qdisc *qdisc, int n, int len);
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index f72a581666a2..286b7c58f5b9 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1080,10 +1080,18 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
 		if ((q && q->flags & TCQ_F_INGRESS) ||
 		    (new && new->flags & TCQ_F_INGRESS)) {
 			ingress = 1;
-			if (!dev_ingress_queue(dev)) {
+			dev_queue = dev_ingress_queue(dev);
+			if (!dev_queue) {
 				NL_SET_ERR_MSG(extack, "Device does not have an ingress queue");
 				return -ENOENT;
 			}
+
+			/* Replay if the current ingress (or clsact) Qdisc has ongoing
+			 * RTNL-unlocked filter request(s).  This is the counterpart of that
+			 * qdisc_refcount_inc_nz() call in __tcf_qdisc_find().
+			 */
+			if (!qdisc_refcount_dec_if_one(dev_queue->qdisc_sleeping))
+				return -EAGAIN;
 		}
 
 		if (dev->flags & IFF_UP)
@@ -1104,8 +1112,16 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
 				qdisc_put(old);
 			}
 		} else {
-			dev_queue = dev_ingress_queue(dev);
-			old = dev_graft_qdisc(dev_queue, new);
+			old = dev_graft_qdisc(dev_queue, NULL);
+
+			/* {ingress,clsact}_destroy() @old before grafting @new to avoid
+			 * unprotected concurrent accesses to net_device::miniq_{in,e}gress
+			 * pointer(s) in mini_qdisc_pair_swap().
+			 */
+			qdisc_notify(net, skb, n, classid, old, new, extack);
+			qdisc_destroy(old);
+
+			dev_graft_qdisc(dev_queue, new);
 		}
 
 skip:
@@ -1119,8 +1135,6 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
 
 			if (new && new->ops->attach)
 				new->ops->attach(new);
-		} else {
-			notify_and_destroy(net, skb, n, classid, old, new, extack);
 		}
 
 		if (dev->flags & IFF_UP)
@@ -1450,19 +1464,22 @@ static int tc_get_qdisc(struct sk_buff *skb, struct nlmsghdr *n,
 			struct netlink_ext_ack *extack)
 {
 	struct net *net = sock_net(skb->sk);
-	struct tcmsg *tcm = nlmsg_data(n);
 	struct nlattr *tca[TCA_MAX + 1];
 	struct net_device *dev;
+	struct Qdisc *q, *p;
+	struct tcmsg *tcm;
 	u32 clid;
-	struct Qdisc *q = NULL;
-	struct Qdisc *p = NULL;
 	int err;
 
+replay:
 	err = nlmsg_parse_deprecated(n, sizeof(*tcm), tca, TCA_MAX,
 				     rtm_tca_policy, extack);
 	if (err < 0)
 		return err;
 
+	tcm = nlmsg_data(n);
+	q = p = NULL;
+
 	dev = __dev_get_by_index(net, tcm->tcm_ifindex);
 	if (!dev)
 		return -ENODEV;
@@ -1515,8 +1532,11 @@ static int tc_get_qdisc(struct sk_buff *skb, struct nlmsghdr *n,
 			return -ENOENT;
 		}
 		err = qdisc_graft(dev, p, skb, n, clid, NULL, q, extack);
-		if (err != 0)
+		if (err != 0) {
+			if (err == -EAGAIN)
+				goto replay;
 			return err;
+		}
 	} else {
 		qdisc_notify(net, skb, n, clid, NULL, q, NULL);
 	}
@@ -1704,6 +1724,8 @@ static int tc_modify_qdisc(struct sk_buff *skb, struct nlmsghdr *n,
 	if (err) {
 		if (q)
 			qdisc_put(q);
+		if (err == -EAGAIN)
+			goto replay;
 		return err;
 	}
 
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 37e41f972f69..e14ed47f961c 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -1046,7 +1046,7 @@ static void qdisc_free_cb(struct rcu_head *head)
 	qdisc_free(q);
 }
 
-static void qdisc_destroy(struct Qdisc *qdisc)
+static void __qdisc_destroy(struct Qdisc *qdisc)
 {
 	const struct Qdisc_ops  *ops = qdisc->ops;
 
@@ -1070,6 +1070,14 @@ static void qdisc_destroy(struct Qdisc *qdisc)
 	call_rcu(&qdisc->rcu, qdisc_free_cb);
 }
 
+void qdisc_destroy(struct Qdisc *qdisc)
+{
+	if (qdisc->flags & TCQ_F_BUILTIN)
+		return;
+
+	__qdisc_destroy(qdisc);
+}
+
 void qdisc_put(struct Qdisc *qdisc)
 {
 	if (!qdisc)
@@ -1079,7 +1087,7 @@ void qdisc_put(struct Qdisc *qdisc)
 	    !refcount_dec_and_test(&qdisc->refcnt))
 		return;
 
-	qdisc_destroy(qdisc);
+	__qdisc_destroy(qdisc);
 }
 EXPORT_SYMBOL(qdisc_put);
 
@@ -1094,7 +1102,7 @@ void qdisc_put_unlocked(struct Qdisc *qdisc)
 	    !refcount_dec_and_rtnl_lock(&qdisc->refcnt))
 		return;
 
-	qdisc_destroy(qdisc);
+	__qdisc_destroy(qdisc);
 	rtnl_unlock();
 }
 EXPORT_SYMBOL(qdisc_put_unlocked);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 net 1/6] net/sched: sch_ingress: Only create under TC_H_INGRESS
  2023-05-24  1:17 ` [PATCH v5 net 1/6] net/sched: sch_ingress: Only create under TC_H_INGRESS Peilin Ye
@ 2023-05-24 15:37   ` Pedro Tammela
  2023-05-24 15:57   ` Jamal Hadi Salim
  1 sibling, 0 replies; 20+ messages in thread
From: Pedro Tammela @ 2023-05-24 15:37 UTC (permalink / raw)
  To: Peilin Ye, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Jamal Hadi Salim, Cong Wang, Jiri Pirko
  Cc: Peilin Ye, Daniel Borkmann, John Fastabend, Vlad Buslov,
	Hillf Danton, netdev, linux-kernel, Cong Wang

On 23/05/2023 22:17, Peilin Ye wrote:
> From: Peilin Ye <peilin.ye@bytedance.com>
> 
> ingress Qdiscs are only supposed to be created under TC_H_INGRESS.
> Return -EOPNOTSUPP if 'parent' is not TC_H_INGRESS, similar to
> mq_init().
> 
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Reported-by: syzbot+b53a9c0d1ea4ad62da8b@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/r/0000000000006cf87705f79acf1a@google.com/
> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>

Tested-by: Pedro Tammela <pctammela@mojatatu.com>

> ---
> change in v5:
>    - avoid underflowing @ingress_needed_key in ->destroy(), reported by
>      Pedro
> 
> change in v3, v4:
>    - add in-body From: tag
> 
>   net/sched/sch_ingress.c | 6 ++++++
>   1 file changed, 6 insertions(+)
> 
> diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c
> index 84838128b9c5..f9ef6deb2770 100644
> --- a/net/sched/sch_ingress.c
> +++ b/net/sched/sch_ingress.c
> @@ -80,6 +80,9 @@ static int ingress_init(struct Qdisc *sch, struct nlattr *opt,
>   	struct net_device *dev = qdisc_dev(sch);
>   	int err;
>   
> +	if (sch->parent != TC_H_INGRESS)
> +		return -EOPNOTSUPP;
> +
>   	net_inc_ingress_queue();
>   
>   	mini_qdisc_pair_init(&q->miniqp, sch, &dev->miniq_ingress);
> @@ -101,6 +104,9 @@ static void ingress_destroy(struct Qdisc *sch)
>   {
>   	struct ingress_sched_data *q = qdisc_priv(sch);
>   
> +	if (sch->parent != TC_H_INGRESS)
> +		return;
> +
>   	tcf_block_put_ext(q->block, sch, &q->block_info);
>   	net_dec_ingress_queue();
>   }


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 net 2/6] net/sched: sch_clsact: Only create under TC_H_CLSACT
  2023-05-24  1:18 ` [PATCH v5 net 2/6] net/sched: sch_clsact: Only create under TC_H_CLSACT Peilin Ye
@ 2023-05-24 15:38   ` Pedro Tammela
  2023-05-24 15:58     ` Jamal Hadi Salim
  0 siblings, 1 reply; 20+ messages in thread
From: Pedro Tammela @ 2023-05-24 15:38 UTC (permalink / raw)
  To: Peilin Ye, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Jamal Hadi Salim, Cong Wang, Jiri Pirko
  Cc: Peilin Ye, Daniel Borkmann, John Fastabend, Vlad Buslov,
	Hillf Danton, netdev, linux-kernel, Cong Wang

On 23/05/2023 22:18, Peilin Ye wrote:
> From: Peilin Ye <peilin.ye@bytedance.com>
> 
> clsact Qdiscs are only supposed to be created under TC_H_CLSACT (which
> equals TC_H_INGRESS).  Return -EOPNOTSUPP if 'parent' is not
> TC_H_CLSACT.
> 
> Fixes: 1f211a1b929c ("net, sched: add clsact qdisc")
> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>

Tested-by: Pedro Tammela <pctammela@mojatatu.com>

> ---
> change in v5:
>    - avoid underflowing @egress_needed_key in ->destroy(), reported by
>      Pedro
> 
> change in v3, v4:
>    - add in-body From: tag
> 
>   net/sched/sch_ingress.c | 6 ++++++
>   1 file changed, 6 insertions(+)
> 
> diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c
> index f9ef6deb2770..35963929e117 100644
> --- a/net/sched/sch_ingress.c
> +++ b/net/sched/sch_ingress.c
> @@ -225,6 +225,9 @@ static int clsact_init(struct Qdisc *sch, struct nlattr *opt,
>   	struct net_device *dev = qdisc_dev(sch);
>   	int err;
>   
> +	if (sch->parent != TC_H_CLSACT)
> +		return -EOPNOTSUPP;
> +
>   	net_inc_ingress_queue();
>   	net_inc_egress_queue();
>   
> @@ -254,6 +257,9 @@ static void clsact_destroy(struct Qdisc *sch)
>   {
>   	struct clsact_sched_data *q = qdisc_priv(sch);
>   
> +	if (sch->parent != TC_H_CLSACT)
> +		return;
> +
>   	tcf_block_put_ext(q->egress_block, sch, &q->egress_block_info);
>   	tcf_block_put_ext(q->ingress_block, sch, &q->ingress_block_info);
>   


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 net 3/6] net/sched: Reserve TC_H_INGRESS (TC_H_CLSACT) for ingress (clsact) Qdiscs
  2023-05-24  1:19 ` [PATCH v5 net 3/6] net/sched: Reserve TC_H_INGRESS (TC_H_CLSACT) for ingress (clsact) Qdiscs Peilin Ye
@ 2023-05-24 15:38   ` Pedro Tammela
  0 siblings, 0 replies; 20+ messages in thread
From: Pedro Tammela @ 2023-05-24 15:38 UTC (permalink / raw)
  To: Peilin Ye, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Jamal Hadi Salim, Cong Wang, Jiri Pirko
  Cc: Peilin Ye, Daniel Borkmann, John Fastabend, Vlad Buslov,
	Hillf Danton, netdev, linux-kernel, Cong Wang

On 23/05/2023 22:19, Peilin Ye wrote:
> From: Peilin Ye <peilin.ye@bytedance.com>
> 
> Currently it is possible to add e.g. an HTB Qdisc under ffff:fff1
> (TC_H_INGRESS, TC_H_CLSACT):
> 
>    $ ip link add name ifb0 type ifb
>    $ tc qdisc add dev ifb0 parent ffff:fff1 htb
>    $ tc qdisc add dev ifb0 clsact
>    Error: Exclusivity flag on, cannot modify.
>    $ drgn
>    ...
>    >>> ifb0 = netdev_get_by_name(prog, "ifb0")
>    >>> qdisc = ifb0.ingress_queue.qdisc_sleeping
>    >>> print(qdisc.ops.id.string_().decode())
>    htb
>    >>> qdisc.flags.value_() # TCQ_F_INGRESS
>    2
> 
> Only allow ingress and clsact Qdiscs under ffff:fff1.  Return -EINVAL
> for everything else.  Make TCQ_F_INGRESS a static flag of ingress and
> clsact Qdiscs.
> 
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Fixes: 1f211a1b929c ("net, sched: add clsact qdisc")
> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>

Tested-by: Pedro Tammela <pctammela@mojatatu.com>

> ---
> change in v3, v4:
>    - add in-body From: tag
> 
>   net/sched/sch_api.c     | 7 ++++++-
>   net/sched/sch_ingress.c | 4 ++--
>   2 files changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
> index fdb8f429333d..383195955b7d 100644
> --- a/net/sched/sch_api.c
> +++ b/net/sched/sch_api.c
> @@ -1252,7 +1252,12 @@ static struct Qdisc *qdisc_create(struct net_device *dev,
>   	sch->parent = parent;
>   
>   	if (handle == TC_H_INGRESS) {
> -		sch->flags |= TCQ_F_INGRESS;
> +		if (!(sch->flags & TCQ_F_INGRESS)) {
> +			NL_SET_ERR_MSG(extack,
> +				       "Specified parent ID is reserved for ingress and clsact Qdiscs");
> +			err = -EINVAL;
> +			goto err_out3;
> +		}
>   		handle = TC_H_MAKE(TC_H_INGRESS, 0);
>   	} else {
>   		if (handle == 0) {
> diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c
> index 35963929e117..e43a45499372 100644
> --- a/net/sched/sch_ingress.c
> +++ b/net/sched/sch_ingress.c
> @@ -140,7 +140,7 @@ static struct Qdisc_ops ingress_qdisc_ops __read_mostly = {
>   	.cl_ops			=	&ingress_class_ops,
>   	.id			=	"ingress",
>   	.priv_size		=	sizeof(struct ingress_sched_data),
> -	.static_flags		=	TCQ_F_CPUSTATS,
> +	.static_flags		=	TCQ_F_INGRESS | TCQ_F_CPUSTATS,
>   	.init			=	ingress_init,
>   	.destroy		=	ingress_destroy,
>   	.dump			=	ingress_dump,
> @@ -281,7 +281,7 @@ static struct Qdisc_ops clsact_qdisc_ops __read_mostly = {
>   	.cl_ops			=	&clsact_class_ops,
>   	.id			=	"clsact",
>   	.priv_size		=	sizeof(struct clsact_sched_data),
> -	.static_flags		=	TCQ_F_CPUSTATS,
> +	.static_flags		=	TCQ_F_INGRESS | TCQ_F_CPUSTATS,
>   	.init			=	clsact_init,
>   	.destroy		=	clsact_destroy,
>   	.dump			=	ingress_dump,


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 net 4/6] net/sched: Prohibit regrafting ingress or clsact Qdiscs
  2023-05-24  1:19 ` [PATCH v5 net 4/6] net/sched: Prohibit regrafting ingress or clsact Qdiscs Peilin Ye
@ 2023-05-24 15:38   ` Pedro Tammela
  0 siblings, 0 replies; 20+ messages in thread
From: Pedro Tammela @ 2023-05-24 15:38 UTC (permalink / raw)
  To: Peilin Ye, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Jamal Hadi Salim, Cong Wang, Jiri Pirko
  Cc: Peilin Ye, Daniel Borkmann, John Fastabend, Vlad Buslov,
	Hillf Danton, netdev, linux-kernel, Cong Wang

On 23/05/2023 22:19, Peilin Ye wrote:
> From: Peilin Ye <peilin.ye@bytedance.com>
> 
> Currently, after creating an ingress (or clsact) Qdisc and grafting it
> under TC_H_INGRESS (TC_H_CLSACT), it is possible to graft it again under
> e.g. a TBF Qdisc:
> 
>    $ ip link add ifb0 type ifb
>    $ tc qdisc add dev ifb0 handle 1: root tbf rate 20kbit buffer 1600 limit 3000
>    $ tc qdisc add dev ifb0 clsact
>    $ tc qdisc link dev ifb0 handle ffff: parent 1:1
>    $ tc qdisc show dev ifb0
>    qdisc tbf 1: root refcnt 2 rate 20Kbit burst 1600b lat 560.0ms
>    qdisc clsact ffff: parent ffff:fff1 refcnt 2
>                                        ^^^^^^^^
> 
> clsact's refcount has increased: it is now grafted under both
> TC_H_CLSACT and 1:1.
> 
> ingress and clsact Qdiscs should only be used under TC_H_INGRESS
> (TC_H_CLSACT).  Prohibit regrafting them.
> 
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Fixes: 1f211a1b929c ("net, sched: add clsact qdisc")
> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>

Tested-by: Pedro Tammela <pctammela@mojatatu.com>

> ---
> change in v3, v4:
>    - add in-body From: tag
> 
>   net/sched/sch_api.c | 5 +++++
>   1 file changed, 5 insertions(+)
> 
> diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
> index 383195955b7d..49b9c1bbfdd9 100644
> --- a/net/sched/sch_api.c
> +++ b/net/sched/sch_api.c
> @@ -1596,6 +1596,11 @@ static int tc_modify_qdisc(struct sk_buff *skb, struct nlmsghdr *n,
>   					NL_SET_ERR_MSG(extack, "Invalid qdisc name");
>   					return -EINVAL;
>   				}
> +				if (q->flags & TCQ_F_INGRESS) {
> +					NL_SET_ERR_MSG(extack,
> +						       "Cannot regraft ingress or clsact Qdiscs");
> +					return -EINVAL;
> +				}
>   				if (q == p ||
>   				    (p && check_loop(q, p, 0))) {
>   					NL_SET_ERR_MSG(extack, "Qdisc parent/child loop detected");


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 net 6/6] net/sched: qdisc_destroy() old ingress and clsact Qdiscs before grafting
  2023-05-24  1:20 ` [PATCH v5 net 6/6] net/sched: qdisc_destroy() old ingress and clsact Qdiscs before grafting Peilin Ye
@ 2023-05-24 15:39   ` Pedro Tammela
  2023-05-24 16:09     ` Jamal Hadi Salim
  2023-05-26 12:20     ` Jamal Hadi Salim
  0 siblings, 2 replies; 20+ messages in thread
From: Pedro Tammela @ 2023-05-24 15:39 UTC (permalink / raw)
  To: Peilin Ye, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Jamal Hadi Salim, Cong Wang, Jiri Pirko
  Cc: Peilin Ye, Daniel Borkmann, John Fastabend, Vlad Buslov,
	Hillf Danton, netdev, linux-kernel, Cong Wang

On 23/05/2023 22:20, Peilin Ye wrote:
> From: Peilin Ye <peilin.ye@bytedance.com>
> 
> mini_Qdisc_pair::p_miniq is a double pointer to mini_Qdisc, initialized in
> ingress_init() to point to net_device::miniq_ingress.  ingress Qdiscs
> access this per-net_device pointer in mini_qdisc_pair_swap().  Similar for
> clsact Qdiscs and miniq_egress.
> 
> Unfortunately, after introducing RTNL-unlocked RTM_{NEW,DEL,GET}TFILTER
> requests (thanks Hillf Danton for the hint), when replacing ingress or
> clsact Qdiscs, for example, the old Qdisc ("@old") could access the same
> miniq_{in,e}gress pointer(s) concurrently with the new Qdisc ("@new"),
> causing race conditions [1] including a use-after-free bug in
> mini_qdisc_pair_swap() reported by syzbot:
> 
>   BUG: KASAN: slab-use-after-free in mini_qdisc_pair_swap+0x1c2/0x1f0 net/sched/sch_generic.c:1573
>   Write of size 8 at addr ffff888045b31308 by task syz-executor690/14901
> ...
>   Call Trace:
>    <TASK>
>    __dump_stack lib/dump_stack.c:88 [inline]
>    dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
>    print_address_description.constprop.0+0x2c/0x3c0 mm/kasan/report.c:319
>    print_report mm/kasan/report.c:430 [inline]
>    kasan_report+0x11c/0x130 mm/kasan/report.c:536
>    mini_qdisc_pair_swap+0x1c2/0x1f0 net/sched/sch_generic.c:1573
>    tcf_chain_head_change_item net/sched/cls_api.c:495 [inline]
>    tcf_chain0_head_change.isra.0+0xb9/0x120 net/sched/cls_api.c:509
>    tcf_chain_tp_insert net/sched/cls_api.c:1826 [inline]
>    tcf_chain_tp_insert_unique net/sched/cls_api.c:1875 [inline]
>    tc_new_tfilter+0x1de6/0x2290 net/sched/cls_api.c:2266
> ...
> 
> @old and @new should not affect each other.  In other words, @old should
> never modify miniq_{in,e}gress after @new, and @new should not update
> @old's RCU state.  Fixing without changing sch_api.c turned out to be
> difficult (please refer to Closes: for discussions).  Instead, make sure
> @new's first call always happen after @old's last call, in
> qdisc_destroy(), has finished:
> 
> In qdisc_graft(), return -EAGAIN and tell the caller to replay
> (suggested by Vlad Buslov) if @old has any ongoing RTNL-unlocked filter
> requests, and call qdisc_destroy() for @old before grafting @new.
> 
> Introduce qdisc_refcount_dec_if_one() as the counterpart of
> qdisc_refcount_inc_nz() used for RTNL-unlocked filter requests.  Introduce
> a non-static version of qdisc_destroy() that does a TCQ_F_BUILTIN check,
> just like qdisc_put() etc.
> 
> Depends on patch "net/sched: Refactor qdisc_graft() for ingress and clsact
> Qdiscs".
> 
> [1] To illustrate, the syzkaller reproducer adds ingress Qdiscs under
> TC_H_ROOT (no longer possible after patch "net/sched: sch_ingress: Only
> create under TC_H_INGRESS") on eth0 that has 8 transmission queues:
> 
>    Thread 1 creates ingress Qdisc A (containing mini Qdisc a1 and a2), then
>    adds a flower filter X to A.
> 
>    Thread 2 creates another ingress Qdisc B (containing mini Qdisc b1 and
>    b2) to replace A, then adds a flower filter Y to B.
> 
>   Thread 1               A's refcnt   Thread 2
>    RTM_NEWQDISC (A, RTNL-locked)
>     qdisc_create(A)               1
>     qdisc_graft(A)                9
> 
>    RTM_NEWTFILTER (X, RTNL-unlocked)
>     __tcf_qdisc_find(A)          10
>     tcf_chain0_head_change(A)
>     mini_qdisc_pair_swap(A) (1st)
>              |
>              |                         RTM_NEWQDISC (B, RTNL-locked)
>           RCU sync                2     qdisc_graft(B)
>              |                    1     notify_and_destroy(A)
>              |
>     tcf_block_release(A)          0    RTM_NEWTFILTER (Y, RTNL-unlocked)
>     qdisc_destroy(A)                    tcf_chain0_head_change(B)
>     tcf_chain0_head_change_cb_del(A)    mini_qdisc_pair_swap(B) (2nd)
>     mini_qdisc_pair_swap(A) (3rd)                |
>             ...                                 ...
> 
> Here, B calls mini_qdisc_pair_swap(), pointing eth0->miniq_ingress to its
> mini Qdisc, b1.  Then, A calls mini_qdisc_pair_swap() again during
> ingress_destroy(), setting eth0->miniq_ingress to NULL, so ingress packets
> on eth0 will not find filter Y in sch_handle_ingress().
> 
> This is only one of the possible consequences of concurrently accessing
> miniq_{in,e}gress pointers.  The point is clear though: again, A should
> never modify those per-net_device pointers after B, and B should not
> update A's RCU state.
> 
> Fixes: 7a096d579e8e ("net: sched: ingress: set 'unlocked' flag for Qdisc ops")
> Fixes: 87f373921c4e ("net: sched: ingress: set 'unlocked' flag for clsact Qdisc ops")
> Reported-by: syzbot+b53a9c0d1ea4ad62da8b@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/r/0000000000006cf87705f79acf1a@google.com/
> Cc: Hillf Danton <hdanton@sina.com>
> Cc: Vlad Buslov <vladbu@mellanox.com>
> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>

Tested-by: Pedro Tammela <pctammela@mojatatu.com>

> ---
> change in v5:
>    - reinitialize @q, @p (suggested by Vlad) and @tcm before replaying,
>      just like @flags in tc_new_tfilter()
> 
> change in v3, v4:
>    - add in-body From: tag
> 
> changes in v2:
>    - replay the request if the current Qdisc has any ongoing RTNL-unlocked
>      filter requests (Vlad)
>    - minor changes in code comments and commit log
> 
>   include/net/sch_generic.h |  8 ++++++++
>   net/sched/sch_api.c       | 40 ++++++++++++++++++++++++++++++---------
>   net/sched/sch_generic.c   | 14 +++++++++++---
>   3 files changed, 50 insertions(+), 12 deletions(-)
> 
> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index fab5ba3e61b7..3e9cc43cbc90 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -137,6 +137,13 @@ static inline void qdisc_refcount_inc(struct Qdisc *qdisc)
>   	refcount_inc(&qdisc->refcnt);
>   }
>   
> +static inline bool qdisc_refcount_dec_if_one(struct Qdisc *qdisc)
> +{
> +	if (qdisc->flags & TCQ_F_BUILTIN)
> +		return true;
> +	return refcount_dec_if_one(&qdisc->refcnt);
> +}
> +
>   /* Intended to be used by unlocked users, when concurrent qdisc release is
>    * possible.
>    */
> @@ -652,6 +659,7 @@ void dev_deactivate_many(struct list_head *head);
>   struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue,
>   			      struct Qdisc *qdisc);
>   void qdisc_reset(struct Qdisc *qdisc);
> +void qdisc_destroy(struct Qdisc *qdisc);
>   void qdisc_put(struct Qdisc *qdisc);
>   void qdisc_put_unlocked(struct Qdisc *qdisc);
>   void qdisc_tree_reduce_backlog(struct Qdisc *qdisc, int n, int len);
> diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
> index f72a581666a2..286b7c58f5b9 100644
> --- a/net/sched/sch_api.c
> +++ b/net/sched/sch_api.c
> @@ -1080,10 +1080,18 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
>   		if ((q && q->flags & TCQ_F_INGRESS) ||
>   		    (new && new->flags & TCQ_F_INGRESS)) {
>   			ingress = 1;
> -			if (!dev_ingress_queue(dev)) {
> +			dev_queue = dev_ingress_queue(dev);
> +			if (!dev_queue) {
>   				NL_SET_ERR_MSG(extack, "Device does not have an ingress queue");
>   				return -ENOENT;
>   			}
> +
> +			/* Replay if the current ingress (or clsact) Qdisc has ongoing
> +			 * RTNL-unlocked filter request(s).  This is the counterpart of that
> +			 * qdisc_refcount_inc_nz() call in __tcf_qdisc_find().
> +			 */
> +			if (!qdisc_refcount_dec_if_one(dev_queue->qdisc_sleeping))
> +				return -EAGAIN;
>   		}
>   
>   		if (dev->flags & IFF_UP)
> @@ -1104,8 +1112,16 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
>   				qdisc_put(old);
>   			}
>   		} else {
> -			dev_queue = dev_ingress_queue(dev);
> -			old = dev_graft_qdisc(dev_queue, new);
> +			old = dev_graft_qdisc(dev_queue, NULL);
> +
> +			/* {ingress,clsact}_destroy() @old before grafting @new to avoid
> +			 * unprotected concurrent accesses to net_device::miniq_{in,e}gress
> +			 * pointer(s) in mini_qdisc_pair_swap().
> +			 */
> +			qdisc_notify(net, skb, n, classid, old, new, extack);
> +			qdisc_destroy(old);
> +
> +			dev_graft_qdisc(dev_queue, new);
>   		}
>   
>   skip:
> @@ -1119,8 +1135,6 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
>   
>   			if (new && new->ops->attach)
>   				new->ops->attach(new);
> -		} else {
> -			notify_and_destroy(net, skb, n, classid, old, new, extack);
>   		}
>   
>   		if (dev->flags & IFF_UP)
> @@ -1450,19 +1464,22 @@ static int tc_get_qdisc(struct sk_buff *skb, struct nlmsghdr *n,
>   			struct netlink_ext_ack *extack)
>   {
>   	struct net *net = sock_net(skb->sk);
> -	struct tcmsg *tcm = nlmsg_data(n);
>   	struct nlattr *tca[TCA_MAX + 1];
>   	struct net_device *dev;
> +	struct Qdisc *q, *p;
> +	struct tcmsg *tcm;
>   	u32 clid;
> -	struct Qdisc *q = NULL;
> -	struct Qdisc *p = NULL;
>   	int err;
>   
> +replay:
>   	err = nlmsg_parse_deprecated(n, sizeof(*tcm), tca, TCA_MAX,
>   				     rtm_tca_policy, extack);
>   	if (err < 0)
>   		return err;
>   
> +	tcm = nlmsg_data(n);
> +	q = p = NULL;
> +
>   	dev = __dev_get_by_index(net, tcm->tcm_ifindex);
>   	if (!dev)
>   		return -ENODEV;
> @@ -1515,8 +1532,11 @@ static int tc_get_qdisc(struct sk_buff *skb, struct nlmsghdr *n,
>   			return -ENOENT;
>   		}
>   		err = qdisc_graft(dev, p, skb, n, clid, NULL, q, extack);
> -		if (err != 0)
> +		if (err != 0) {
> +			if (err == -EAGAIN)
> +				goto replay;
>   			return err;
> +		}
>   	} else {
>   		qdisc_notify(net, skb, n, clid, NULL, q, NULL);
>   	}
> @@ -1704,6 +1724,8 @@ static int tc_modify_qdisc(struct sk_buff *skb, struct nlmsghdr *n,
>   	if (err) {
>   		if (q)
>   			qdisc_put(q);
> +		if (err == -EAGAIN)
> +			goto replay;
>   		return err;
>   	}
>   
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 37e41f972f69..e14ed47f961c 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -1046,7 +1046,7 @@ static void qdisc_free_cb(struct rcu_head *head)
>   	qdisc_free(q);
>   }
>   
> -static void qdisc_destroy(struct Qdisc *qdisc)
> +static void __qdisc_destroy(struct Qdisc *qdisc)
>   {
>   	const struct Qdisc_ops  *ops = qdisc->ops;
>   
> @@ -1070,6 +1070,14 @@ static void qdisc_destroy(struct Qdisc *qdisc)
>   	call_rcu(&qdisc->rcu, qdisc_free_cb);
>   }
>   
> +void qdisc_destroy(struct Qdisc *qdisc)
> +{
> +	if (qdisc->flags & TCQ_F_BUILTIN)
> +		return;
> +
> +	__qdisc_destroy(qdisc);
> +}
> +
>   void qdisc_put(struct Qdisc *qdisc)
>   {
>   	if (!qdisc)
> @@ -1079,7 +1087,7 @@ void qdisc_put(struct Qdisc *qdisc)
>   	    !refcount_dec_and_test(&qdisc->refcnt))
>   		return;
>   
> -	qdisc_destroy(qdisc);
> +	__qdisc_destroy(qdisc);
>   }
>   EXPORT_SYMBOL(qdisc_put);
>   
> @@ -1094,7 +1102,7 @@ void qdisc_put_unlocked(struct Qdisc *qdisc)
>   	    !refcount_dec_and_rtnl_lock(&qdisc->refcnt))
>   		return;
>   
> -	qdisc_destroy(qdisc);
> +	__qdisc_destroy(qdisc);
>   	rtnl_unlock();
>   }
>   EXPORT_SYMBOL(qdisc_put_unlocked);


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 net 1/6] net/sched: sch_ingress: Only create under TC_H_INGRESS
  2023-05-24  1:17 ` [PATCH v5 net 1/6] net/sched: sch_ingress: Only create under TC_H_INGRESS Peilin Ye
  2023-05-24 15:37   ` Pedro Tammela
@ 2023-05-24 15:57   ` Jamal Hadi Salim
  1 sibling, 0 replies; 20+ messages in thread
From: Jamal Hadi Salim @ 2023-05-24 15:57 UTC (permalink / raw)
  To: Peilin Ye
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Cong Wang, Jiri Pirko, Peilin Ye, Daniel Borkmann, John Fastabend,
	Vlad Buslov, Pedro Tammela, Hillf Danton, netdev, linux-kernel,
	Cong Wang

On Tue, May 23, 2023 at 9:18 PM Peilin Ye <yepeilin.cs@gmail.com> wrote:
>
> From: Peilin Ye <peilin.ye@bytedance.com>
>
> ingress Qdiscs are only supposed to be created under TC_H_INGRESS.
> Return -EOPNOTSUPP if 'parent' is not TC_H_INGRESS, similar to
> mq_init().
>
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Reported-by: syzbot+b53a9c0d1ea4ad62da8b@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/r/0000000000006cf87705f79acf1a@google.com/
> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>

Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

cheers,
jamal
> ---
> change in v5:
>   - avoid underflowing @ingress_needed_key in ->destroy(), reported by
>     Pedro
>
> change in v3, v4:
>   - add in-body From: tag
>
>  net/sched/sch_ingress.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c
> index 84838128b9c5..f9ef6deb2770 100644
> --- a/net/sched/sch_ingress.c
> +++ b/net/sched/sch_ingress.c
> @@ -80,6 +80,9 @@ static int ingress_init(struct Qdisc *sch, struct nlattr *opt,
>         struct net_device *dev = qdisc_dev(sch);
>         int err;
>
> +       if (sch->parent != TC_H_INGRESS)
> +               return -EOPNOTSUPP;
> +
>         net_inc_ingress_queue();
>
>         mini_qdisc_pair_init(&q->miniqp, sch, &dev->miniq_ingress);
> @@ -101,6 +104,9 @@ static void ingress_destroy(struct Qdisc *sch)
>  {
>         struct ingress_sched_data *q = qdisc_priv(sch);
>
> +       if (sch->parent != TC_H_INGRESS)
> +               return;
> +
>         tcf_block_put_ext(q->block, sch, &q->block_info);
>         net_dec_ingress_queue();
>  }
> --
> 2.20.1
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 net 2/6] net/sched: sch_clsact: Only create under TC_H_CLSACT
  2023-05-24 15:38   ` Pedro Tammela
@ 2023-05-24 15:58     ` Jamal Hadi Salim
  0 siblings, 0 replies; 20+ messages in thread
From: Jamal Hadi Salim @ 2023-05-24 15:58 UTC (permalink / raw)
  To: Pedro Tammela
  Cc: Peilin Ye, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Cong Wang, Jiri Pirko, Peilin Ye, Daniel Borkmann,
	John Fastabend, Vlad Buslov, Hillf Danton, netdev, linux-kernel,
	Cong Wang

On Wed, May 24, 2023 at 11:38 AM Pedro Tammela <pctammela@mojatatu.com> wrote:
>
> On 23/05/2023 22:18, Peilin Ye wrote:
> > From: Peilin Ye <peilin.ye@bytedance.com>
> >
> > clsact Qdiscs are only supposed to be created under TC_H_CLSACT (which
> > equals TC_H_INGRESS).  Return -EOPNOTSUPP if 'parent' is not
> > TC_H_CLSACT.
> >
> > Fixes: 1f211a1b929c ("net, sched: add clsact qdisc")
> > Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
>
> Tested-by: Pedro Tammela <pctammela@mojatatu.com>
>

Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

cheers,
jamal
> > ---
> > change in v5:
> >    - avoid underflowing @egress_needed_key in ->destroy(), reported by
> >      Pedro
> >
> > change in v3, v4:
> >    - add in-body From: tag
> >
> >   net/sched/sch_ingress.c | 6 ++++++
> >   1 file changed, 6 insertions(+)
> >
> > diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c
> > index f9ef6deb2770..35963929e117 100644
> > --- a/net/sched/sch_ingress.c
> > +++ b/net/sched/sch_ingress.c
> > @@ -225,6 +225,9 @@ static int clsact_init(struct Qdisc *sch, struct nlattr *opt,
> >       struct net_device *dev = qdisc_dev(sch);
> >       int err;
> >
> > +     if (sch->parent != TC_H_CLSACT)
> > +             return -EOPNOTSUPP;
> > +
> >       net_inc_ingress_queue();
> >       net_inc_egress_queue();
> >
> > @@ -254,6 +257,9 @@ static void clsact_destroy(struct Qdisc *sch)
> >   {
> >       struct clsact_sched_data *q = qdisc_priv(sch);
> >
> > +     if (sch->parent != TC_H_CLSACT)
> > +             return;
> > +
> >       tcf_block_put_ext(q->egress_block, sch, &q->egress_block_info);
> >       tcf_block_put_ext(q->ingress_block, sch, &q->ingress_block_info);
> >
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 net 6/6] net/sched: qdisc_destroy() old ingress and clsact Qdiscs before grafting
  2023-05-24 15:39   ` Pedro Tammela
@ 2023-05-24 16:09     ` Jamal Hadi Salim
  2023-05-25  9:25       ` Paolo Abeni
  2023-05-26 12:20     ` Jamal Hadi Salim
  1 sibling, 1 reply; 20+ messages in thread
From: Jamal Hadi Salim @ 2023-05-24 16:09 UTC (permalink / raw)
  To: Pedro Tammela
  Cc: Peilin Ye, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Cong Wang, Jiri Pirko, Peilin Ye, Daniel Borkmann,
	John Fastabend, Vlad Buslov, Hillf Danton, netdev, linux-kernel,
	Cong Wang

Pedro,
When you have a moment - could you run tc monitor in parallel to the
reproducer and double check it generates the correct events...

cheers,
jamal

On Wed, May 24, 2023 at 11:39 AM Pedro Tammela <pctammela@mojatatu.com> wrote:
>
> On 23/05/2023 22:20, Peilin Ye wrote:
> > From: Peilin Ye <peilin.ye@bytedance.com>
> >
> > mini_Qdisc_pair::p_miniq is a double pointer to mini_Qdisc, initialized in
> > ingress_init() to point to net_device::miniq_ingress.  ingress Qdiscs
> > access this per-net_device pointer in mini_qdisc_pair_swap().  Similar for
> > clsact Qdiscs and miniq_egress.
> >
> > Unfortunately, after introducing RTNL-unlocked RTM_{NEW,DEL,GET}TFILTER
> > requests (thanks Hillf Danton for the hint), when replacing ingress or
> > clsact Qdiscs, for example, the old Qdisc ("@old") could access the same
> > miniq_{in,e}gress pointer(s) concurrently with the new Qdisc ("@new"),
> > causing race conditions [1] including a use-after-free bug in
> > mini_qdisc_pair_swap() reported by syzbot:
> >
> >   BUG: KASAN: slab-use-after-free in mini_qdisc_pair_swap+0x1c2/0x1f0 net/sched/sch_generic.c:1573
> >   Write of size 8 at addr ffff888045b31308 by task syz-executor690/14901
> > ...
> >   Call Trace:
> >    <TASK>
> >    __dump_stack lib/dump_stack.c:88 [inline]
> >    dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
> >    print_address_description.constprop.0+0x2c/0x3c0 mm/kasan/report.c:319
> >    print_report mm/kasan/report.c:430 [inline]
> >    kasan_report+0x11c/0x130 mm/kasan/report.c:536
> >    mini_qdisc_pair_swap+0x1c2/0x1f0 net/sched/sch_generic.c:1573
> >    tcf_chain_head_change_item net/sched/cls_api.c:495 [inline]
> >    tcf_chain0_head_change.isra.0+0xb9/0x120 net/sched/cls_api.c:509
> >    tcf_chain_tp_insert net/sched/cls_api.c:1826 [inline]
> >    tcf_chain_tp_insert_unique net/sched/cls_api.c:1875 [inline]
> >    tc_new_tfilter+0x1de6/0x2290 net/sched/cls_api.c:2266
> > ...
> >
> > @old and @new should not affect each other.  In other words, @old should
> > never modify miniq_{in,e}gress after @new, and @new should not update
> > @old's RCU state.  Fixing without changing sch_api.c turned out to be
> > difficult (please refer to Closes: for discussions).  Instead, make sure
> > @new's first call always happen after @old's last call, in
> > qdisc_destroy(), has finished:
> >
> > In qdisc_graft(), return -EAGAIN and tell the caller to replay
> > (suggested by Vlad Buslov) if @old has any ongoing RTNL-unlocked filter
> > requests, and call qdisc_destroy() for @old before grafting @new.
> >
> > Introduce qdisc_refcount_dec_if_one() as the counterpart of
> > qdisc_refcount_inc_nz() used for RTNL-unlocked filter requests.  Introduce
> > a non-static version of qdisc_destroy() that does a TCQ_F_BUILTIN check,
> > just like qdisc_put() etc.
> >
> > Depends on patch "net/sched: Refactor qdisc_graft() for ingress and clsact
> > Qdiscs".
> >
> > [1] To illustrate, the syzkaller reproducer adds ingress Qdiscs under
> > TC_H_ROOT (no longer possible after patch "net/sched: sch_ingress: Only
> > create under TC_H_INGRESS") on eth0 that has 8 transmission queues:
> >
> >    Thread 1 creates ingress Qdisc A (containing mini Qdisc a1 and a2), then
> >    adds a flower filter X to A.
> >
> >    Thread 2 creates another ingress Qdisc B (containing mini Qdisc b1 and
> >    b2) to replace A, then adds a flower filter Y to B.
> >
> >   Thread 1               A's refcnt   Thread 2
> >    RTM_NEWQDISC (A, RTNL-locked)
> >     qdisc_create(A)               1
> >     qdisc_graft(A)                9
> >
> >    RTM_NEWTFILTER (X, RTNL-unlocked)
> >     __tcf_qdisc_find(A)          10
> >     tcf_chain0_head_change(A)
> >     mini_qdisc_pair_swap(A) (1st)
> >              |
> >              |                         RTM_NEWQDISC (B, RTNL-locked)
> >           RCU sync                2     qdisc_graft(B)
> >              |                    1     notify_and_destroy(A)
> >              |
> >     tcf_block_release(A)          0    RTM_NEWTFILTER (Y, RTNL-unlocked)
> >     qdisc_destroy(A)                    tcf_chain0_head_change(B)
> >     tcf_chain0_head_change_cb_del(A)    mini_qdisc_pair_swap(B) (2nd)
> >     mini_qdisc_pair_swap(A) (3rd)                |
> >             ...                                 ...
> >
> > Here, B calls mini_qdisc_pair_swap(), pointing eth0->miniq_ingress to its
> > mini Qdisc, b1.  Then, A calls mini_qdisc_pair_swap() again during
> > ingress_destroy(), setting eth0->miniq_ingress to NULL, so ingress packets
> > on eth0 will not find filter Y in sch_handle_ingress().
> >
> > This is only one of the possible consequences of concurrently accessing
> > miniq_{in,e}gress pointers.  The point is clear though: again, A should
> > never modify those per-net_device pointers after B, and B should not
> > update A's RCU state.
> >
> > Fixes: 7a096d579e8e ("net: sched: ingress: set 'unlocked' flag for Qdisc ops")
> > Fixes: 87f373921c4e ("net: sched: ingress: set 'unlocked' flag for clsact Qdisc ops")
> > Reported-by: syzbot+b53a9c0d1ea4ad62da8b@syzkaller.appspotmail.com
> > Closes: https://lore.kernel.org/r/0000000000006cf87705f79acf1a@google.com/
> > Cc: Hillf Danton <hdanton@sina.com>
> > Cc: Vlad Buslov <vladbu@mellanox.com>
> > Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
>
> Tested-by: Pedro Tammela <pctammela@mojatatu.com>
>
> > ---
> > change in v5:
> >    - reinitialize @q, @p (suggested by Vlad) and @tcm before replaying,
> >      just like @flags in tc_new_tfilter()
> >
> > change in v3, v4:
> >    - add in-body From: tag
> >
> > changes in v2:
> >    - replay the request if the current Qdisc has any ongoing RTNL-unlocked
> >      filter requests (Vlad)
> >    - minor changes in code comments and commit log
> >
> >   include/net/sch_generic.h |  8 ++++++++
> >   net/sched/sch_api.c       | 40 ++++++++++++++++++++++++++++++---------
> >   net/sched/sch_generic.c   | 14 +++++++++++---
> >   3 files changed, 50 insertions(+), 12 deletions(-)
> >
> > diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> > index fab5ba3e61b7..3e9cc43cbc90 100644
> > --- a/include/net/sch_generic.h
> > +++ b/include/net/sch_generic.h
> > @@ -137,6 +137,13 @@ static inline void qdisc_refcount_inc(struct Qdisc *qdisc)
> >       refcount_inc(&qdisc->refcnt);
> >   }
> >
> > +static inline bool qdisc_refcount_dec_if_one(struct Qdisc *qdisc)
> > +{
> > +     if (qdisc->flags & TCQ_F_BUILTIN)
> > +             return true;
> > +     return refcount_dec_if_one(&qdisc->refcnt);
> > +}
> > +
> >   /* Intended to be used by unlocked users, when concurrent qdisc release is
> >    * possible.
> >    */
> > @@ -652,6 +659,7 @@ void dev_deactivate_many(struct list_head *head);
> >   struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue,
> >                             struct Qdisc *qdisc);
> >   void qdisc_reset(struct Qdisc *qdisc);
> > +void qdisc_destroy(struct Qdisc *qdisc);
> >   void qdisc_put(struct Qdisc *qdisc);
> >   void qdisc_put_unlocked(struct Qdisc *qdisc);
> >   void qdisc_tree_reduce_backlog(struct Qdisc *qdisc, int n, int len);
> > diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
> > index f72a581666a2..286b7c58f5b9 100644
> > --- a/net/sched/sch_api.c
> > +++ b/net/sched/sch_api.c
> > @@ -1080,10 +1080,18 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
> >               if ((q && q->flags & TCQ_F_INGRESS) ||
> >                   (new && new->flags & TCQ_F_INGRESS)) {
> >                       ingress = 1;
> > -                     if (!dev_ingress_queue(dev)) {
> > +                     dev_queue = dev_ingress_queue(dev);
> > +                     if (!dev_queue) {
> >                               NL_SET_ERR_MSG(extack, "Device does not have an ingress queue");
> >                               return -ENOENT;
> >                       }
> > +
> > +                     /* Replay if the current ingress (or clsact) Qdisc has ongoing
> > +                      * RTNL-unlocked filter request(s).  This is the counterpart of that
> > +                      * qdisc_refcount_inc_nz() call in __tcf_qdisc_find().
> > +                      */
> > +                     if (!qdisc_refcount_dec_if_one(dev_queue->qdisc_sleeping))
> > +                             return -EAGAIN;
> >               }
> >
> >               if (dev->flags & IFF_UP)
> > @@ -1104,8 +1112,16 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
> >                               qdisc_put(old);
> >                       }
> >               } else {
> > -                     dev_queue = dev_ingress_queue(dev);
> > -                     old = dev_graft_qdisc(dev_queue, new);
> > +                     old = dev_graft_qdisc(dev_queue, NULL);
> > +
> > +                     /* {ingress,clsact}_destroy() @old before grafting @new to avoid
> > +                      * unprotected concurrent accesses to net_device::miniq_{in,e}gress
> > +                      * pointer(s) in mini_qdisc_pair_swap().
> > +                      */
> > +                     qdisc_notify(net, skb, n, classid, old, new, extack);
> > +                     qdisc_destroy(old);
> > +
> > +                     dev_graft_qdisc(dev_queue, new);
> >               }
> >
> >   skip:
> > @@ -1119,8 +1135,6 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
> >
> >                       if (new && new->ops->attach)
> >                               new->ops->attach(new);
> > -             } else {
> > -                     notify_and_destroy(net, skb, n, classid, old, new, extack);
> >               }
> >
> >               if (dev->flags & IFF_UP)
> > @@ -1450,19 +1464,22 @@ static int tc_get_qdisc(struct sk_buff *skb, struct nlmsghdr *n,
> >                       struct netlink_ext_ack *extack)
> >   {
> >       struct net *net = sock_net(skb->sk);
> > -     struct tcmsg *tcm = nlmsg_data(n);
> >       struct nlattr *tca[TCA_MAX + 1];
> >       struct net_device *dev;
> > +     struct Qdisc *q, *p;
> > +     struct tcmsg *tcm;
> >       u32 clid;
> > -     struct Qdisc *q = NULL;
> > -     struct Qdisc *p = NULL;
> >       int err;
> >
> > +replay:
> >       err = nlmsg_parse_deprecated(n, sizeof(*tcm), tca, TCA_MAX,
> >                                    rtm_tca_policy, extack);
> >       if (err < 0)
> >               return err;
> >
> > +     tcm = nlmsg_data(n);
> > +     q = p = NULL;
> > +
> >       dev = __dev_get_by_index(net, tcm->tcm_ifindex);
> >       if (!dev)
> >               return -ENODEV;
> > @@ -1515,8 +1532,11 @@ static int tc_get_qdisc(struct sk_buff *skb, struct nlmsghdr *n,
> >                       return -ENOENT;
> >               }
> >               err = qdisc_graft(dev, p, skb, n, clid, NULL, q, extack);
> > -             if (err != 0)
> > +             if (err != 0) {
> > +                     if (err == -EAGAIN)
> > +                             goto replay;
> >                       return err;
> > +             }
> >       } else {
> >               qdisc_notify(net, skb, n, clid, NULL, q, NULL);
> >       }
> > @@ -1704,6 +1724,8 @@ static int tc_modify_qdisc(struct sk_buff *skb, struct nlmsghdr *n,
> >       if (err) {
> >               if (q)
> >                       qdisc_put(q);
> > +             if (err == -EAGAIN)
> > +                     goto replay;
> >               return err;
> >       }
> >
> > diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> > index 37e41f972f69..e14ed47f961c 100644
> > --- a/net/sched/sch_generic.c
> > +++ b/net/sched/sch_generic.c
> > @@ -1046,7 +1046,7 @@ static void qdisc_free_cb(struct rcu_head *head)
> >       qdisc_free(q);
> >   }
> >
> > -static void qdisc_destroy(struct Qdisc *qdisc)
> > +static void __qdisc_destroy(struct Qdisc *qdisc)
> >   {
> >       const struct Qdisc_ops  *ops = qdisc->ops;
> >
> > @@ -1070,6 +1070,14 @@ static void qdisc_destroy(struct Qdisc *qdisc)
> >       call_rcu(&qdisc->rcu, qdisc_free_cb);
> >   }
> >
> > +void qdisc_destroy(struct Qdisc *qdisc)
> > +{
> > +     if (qdisc->flags & TCQ_F_BUILTIN)
> > +             return;
> > +
> > +     __qdisc_destroy(qdisc);
> > +}
> > +
> >   void qdisc_put(struct Qdisc *qdisc)
> >   {
> >       if (!qdisc)
> > @@ -1079,7 +1087,7 @@ void qdisc_put(struct Qdisc *qdisc)
> >           !refcount_dec_and_test(&qdisc->refcnt))
> >               return;
> >
> > -     qdisc_destroy(qdisc);
> > +     __qdisc_destroy(qdisc);
> >   }
> >   EXPORT_SYMBOL(qdisc_put);
> >
> > @@ -1094,7 +1102,7 @@ void qdisc_put_unlocked(struct Qdisc *qdisc)
> >           !refcount_dec_and_rtnl_lock(&qdisc->refcnt))
> >               return;
> >
> > -     qdisc_destroy(qdisc);
> > +     __qdisc_destroy(qdisc);
> >       rtnl_unlock();
> >   }
> >   EXPORT_SYMBOL(qdisc_put_unlocked);
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 net 6/6] net/sched: qdisc_destroy() old ingress and clsact Qdiscs before grafting
  2023-05-24 16:09     ` Jamal Hadi Salim
@ 2023-05-25  9:25       ` Paolo Abeni
  2023-05-26 12:19         ` Jamal Hadi Salim
  0 siblings, 1 reply; 20+ messages in thread
From: Paolo Abeni @ 2023-05-25  9:25 UTC (permalink / raw)
  To: Jamal Hadi Salim, Pedro Tammela
  Cc: Peilin Ye, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Cong Wang, Jiri Pirko, Peilin Ye, Daniel Borkmann, John Fastabend,
	Vlad Buslov, Hillf Danton, netdev, linux-kernel, Cong Wang

On Wed, 2023-05-24 at 12:09 -0400, Jamal Hadi Salim wrote:
> When you have a moment - could you run tc monitor in parallel to the
> reproducer and double check it generates the correct events...

FTR, I'll wait a bit to apply this series, to allow for the above
tests. Unless someone will scream very loudly very soon, it's not going
to enter today's PR. Since the addressed issue is an ancient one, it
should not a problem, I hope.

Cheers,

Paolo


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 net 0/6] net/sched: Fixes for sch_ingress and sch_clsact
  2023-05-24  1:16 [PATCH v5 net 0/6] net/sched: Fixes for sch_ingress and sch_clsact Peilin Ye
                   ` (5 preceding siblings ...)
  2023-05-24  1:20 ` [PATCH v5 net 6/6] net/sched: qdisc_destroy() old ingress and clsact Qdiscs before grafting Peilin Ye
@ 2023-05-25 17:16 ` Vlad Buslov
  6 siblings, 0 replies; 20+ messages in thread
From: Vlad Buslov @ 2023-05-25 17:16 UTC (permalink / raw)
  To: Peilin Ye
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Jamal Hadi Salim, Cong Wang, Jiri Pirko, Peilin Ye,
	Daniel Borkmann, John Fastabend, Vlad Buslov, Pedro Tammela,
	Hillf Danton, netdev, linux-kernel, Cong Wang

On Tue 23 May 2023 at 18:16, Peilin Ye <yepeilin.cs@gmail.com> wrote:
> Link to v4: https://lore.kernel.org/r/cover.1684825171.git.peilin.ye@bytedance.com/
> Link to v3 (incomplete): https://lore.kernel.org/r/cover.1684821877.git.peilin.ye@bytedance.com/
> Link to v2: https://lore.kernel.org/r/cover.1684796705.git.peilin.ye@bytedance.com/
> Link to v1: https://lore.kernel.org/r/cover.1683326865.git.peilin.ye@bytedance.com/
>
> Hi all,
>
> These are v5 fixes for ingress and clsact Qdiscs.  Please take another
> look at patch 1, 2 and 6, thanks!
>

Thanks again!

Reviewed-by: Vlad Buslov <vladbu@nvidia.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 net 6/6] net/sched: qdisc_destroy() old ingress and clsact Qdiscs before grafting
  2023-05-25  9:25       ` Paolo Abeni
@ 2023-05-26 12:19         ` Jamal Hadi Salim
  0 siblings, 0 replies; 20+ messages in thread
From: Jamal Hadi Salim @ 2023-05-26 12:19 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Pedro Tammela, Peilin Ye, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Cong Wang, Jiri Pirko, Peilin Ye, Daniel Borkmann,
	John Fastabend, Vlad Buslov, Hillf Danton, netdev, linux-kernel,
	Cong Wang

On Thu, May 25, 2023 at 5:25 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Wed, 2023-05-24 at 12:09 -0400, Jamal Hadi Salim wrote:
> > When you have a moment - could you run tc monitor in parallel to the
> > reproducer and double check it generates the correct events...
>
> FTR, I'll wait a bit to apply this series, to allow for the above
> tests. Unless someone will scream very loudly very soon, it's not going
> to enter today's PR. Since the addressed issue is an ancient one, it
> should not a problem, I hope.

Sorry I do not mean to hold this back - I think we should have applied
V1 then worried about making things better.  We can worry about events
after.
So Acking this.

cheers,
jamal

> Cheers,
>
> Paolo
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 net 6/6] net/sched: qdisc_destroy() old ingress and clsact Qdiscs before grafting
  2023-05-24 15:39   ` Pedro Tammela
  2023-05-24 16:09     ` Jamal Hadi Salim
@ 2023-05-26 12:20     ` Jamal Hadi Salim
  2023-05-26 19:47       ` Jamal Hadi Salim
  1 sibling, 1 reply; 20+ messages in thread
From: Jamal Hadi Salim @ 2023-05-26 12:20 UTC (permalink / raw)
  To: Pedro Tammela
  Cc: Peilin Ye, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Cong Wang, Jiri Pirko, Peilin Ye, Daniel Borkmann,
	John Fastabend, Vlad Buslov, Hillf Danton, netdev, linux-kernel,
	Cong Wang

On Wed, May 24, 2023 at 11:39 AM Pedro Tammela <pctammela@mojatatu.com> wrote:
>
> On 23/05/2023 22:20, Peilin Ye wrote:
> > From: Peilin Ye <peilin.ye@bytedance.com>
> >
> > mini_Qdisc_pair::p_miniq is a double pointer to mini_Qdisc, initialized in
> > ingress_init() to point to net_device::miniq_ingress.  ingress Qdiscs
> > access this per-net_device pointer in mini_qdisc_pair_swap().  Similar for
> > clsact Qdiscs and miniq_egress.
> >
> > Unfortunately, after introducing RTNL-unlocked RTM_{NEW,DEL,GET}TFILTER
> > requests (thanks Hillf Danton for the hint), when replacing ingress or
> > clsact Qdiscs, for example, the old Qdisc ("@old") could access the same
> > miniq_{in,e}gress pointer(s) concurrently with the new Qdisc ("@new"),
> > causing race conditions [1] including a use-after-free bug in
> > mini_qdisc_pair_swap() reported by syzbot:
> >
> >   BUG: KASAN: slab-use-after-free in mini_qdisc_pair_swap+0x1c2/0x1f0 net/sched/sch_generic.c:1573
> >   Write of size 8 at addr ffff888045b31308 by task syz-executor690/14901
> > ...
> >   Call Trace:
> >    <TASK>
> >    __dump_stack lib/dump_stack.c:88 [inline]
> >    dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
> >    print_address_description.constprop.0+0x2c/0x3c0 mm/kasan/report.c:319
> >    print_report mm/kasan/report.c:430 [inline]
> >    kasan_report+0x11c/0x130 mm/kasan/report.c:536
> >    mini_qdisc_pair_swap+0x1c2/0x1f0 net/sched/sch_generic.c:1573
> >    tcf_chain_head_change_item net/sched/cls_api.c:495 [inline]
> >    tcf_chain0_head_change.isra.0+0xb9/0x120 net/sched/cls_api.c:509
> >    tcf_chain_tp_insert net/sched/cls_api.c:1826 [inline]
> >    tcf_chain_tp_insert_unique net/sched/cls_api.c:1875 [inline]
> >    tc_new_tfilter+0x1de6/0x2290 net/sched/cls_api.c:2266
> > ...
> >
> > @old and @new should not affect each other.  In other words, @old should
> > never modify miniq_{in,e}gress after @new, and @new should not update
> > @old's RCU state.  Fixing without changing sch_api.c turned out to be
> > difficult (please refer to Closes: for discussions).  Instead, make sure
> > @new's first call always happen after @old's last call, in
> > qdisc_destroy(), has finished:
> >
> > In qdisc_graft(), return -EAGAIN and tell the caller to replay
> > (suggested by Vlad Buslov) if @old has any ongoing RTNL-unlocked filter
> > requests, and call qdisc_destroy() for @old before grafting @new.
> >
> > Introduce qdisc_refcount_dec_if_one() as the counterpart of
> > qdisc_refcount_inc_nz() used for RTNL-unlocked filter requests.  Introduce
> > a non-static version of qdisc_destroy() that does a TCQ_F_BUILTIN check,
> > just like qdisc_put() etc.
> >
> > Depends on patch "net/sched: Refactor qdisc_graft() for ingress and clsact
> > Qdiscs".
> >
> > [1] To illustrate, the syzkaller reproducer adds ingress Qdiscs under
> > TC_H_ROOT (no longer possible after patch "net/sched: sch_ingress: Only
> > create under TC_H_INGRESS") on eth0 that has 8 transmission queues:
> >
> >    Thread 1 creates ingress Qdisc A (containing mini Qdisc a1 and a2), then
> >    adds a flower filter X to A.
> >
> >    Thread 2 creates another ingress Qdisc B (containing mini Qdisc b1 and
> >    b2) to replace A, then adds a flower filter Y to B.
> >
> >   Thread 1               A's refcnt   Thread 2
> >    RTM_NEWQDISC (A, RTNL-locked)
> >     qdisc_create(A)               1
> >     qdisc_graft(A)                9
> >
> >    RTM_NEWTFILTER (X, RTNL-unlocked)
> >     __tcf_qdisc_find(A)          10
> >     tcf_chain0_head_change(A)
> >     mini_qdisc_pair_swap(A) (1st)
> >              |
> >              |                         RTM_NEWQDISC (B, RTNL-locked)
> >           RCU sync                2     qdisc_graft(B)
> >              |                    1     notify_and_destroy(A)
> >              |
> >     tcf_block_release(A)          0    RTM_NEWTFILTER (Y, RTNL-unlocked)
> >     qdisc_destroy(A)                    tcf_chain0_head_change(B)
> >     tcf_chain0_head_change_cb_del(A)    mini_qdisc_pair_swap(B) (2nd)
> >     mini_qdisc_pair_swap(A) (3rd)                |
> >             ...                                 ...
> >
> > Here, B calls mini_qdisc_pair_swap(), pointing eth0->miniq_ingress to its
> > mini Qdisc, b1.  Then, A calls mini_qdisc_pair_swap() again during
> > ingress_destroy(), setting eth0->miniq_ingress to NULL, so ingress packets
> > on eth0 will not find filter Y in sch_handle_ingress().
> >
> > This is only one of the possible consequences of concurrently accessing
> > miniq_{in,e}gress pointers.  The point is clear though: again, A should
> > never modify those per-net_device pointers after B, and B should not
> > update A's RCU state.
> >
> > Fixes: 7a096d579e8e ("net: sched: ingress: set 'unlocked' flag for Qdisc ops")
> > Fixes: 87f373921c4e ("net: sched: ingress: set 'unlocked' flag for clsact Qdisc ops")
> > Reported-by: syzbot+b53a9c0d1ea4ad62da8b@syzkaller.appspotmail.com
> > Closes: https://lore.kernel.org/r/0000000000006cf87705f79acf1a@google.com/
> > Cc: Hillf Danton <hdanton@sina.com>
> > Cc: Vlad Buslov <vladbu@mellanox.com>
> > Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
>
> Tested-by: Pedro Tammela <pctammela@mojatatu.com>


Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>


cheers,
jamal

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 net 6/6] net/sched: qdisc_destroy() old ingress and clsact Qdiscs before grafting
  2023-05-26 12:20     ` Jamal Hadi Salim
@ 2023-05-26 19:47       ` Jamal Hadi Salim
  0 siblings, 0 replies; 20+ messages in thread
From: Jamal Hadi Salim @ 2023-05-26 19:47 UTC (permalink / raw)
  To: Pedro Tammela
  Cc: Peilin Ye, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Cong Wang, Jiri Pirko, Peilin Ye, Daniel Borkmann,
	John Fastabend, Vlad Buslov, Hillf Danton, netdev, linux-kernel,
	Cong Wang

On Fri, May 26, 2023 at 8:20 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> On Wed, May 24, 2023 at 11:39 AM Pedro Tammela <pctammela@mojatatu.com> wrote:
> >
> > On 23/05/2023 22:20, Peilin Ye wrote:
> > > From: Peilin Ye <peilin.ye@bytedance.com>

> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

I apologize i am going to take this back, lets please hold the series for now.

In pursuit for the effect on events, Pedro and I spent a few _hours_
chasing this - and regardless of the events, there are still
challenges with the concurrency issue. The current reproducer
unfortunately cant cause damage after patch 2, so really patch 6 was
not being tested. We hacked the repro to hit codepath patch 6 fixes.
We are not sure what the root cause is - but it certainly due to the
series. Peilin, Pedro will post the new repro.

cheers,
jamal

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2023-05-26 19:47 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-05-24  1:16 [PATCH v5 net 0/6] net/sched: Fixes for sch_ingress and sch_clsact Peilin Ye
2023-05-24  1:17 ` [PATCH v5 net 1/6] net/sched: sch_ingress: Only create under TC_H_INGRESS Peilin Ye
2023-05-24 15:37   ` Pedro Tammela
2023-05-24 15:57   ` Jamal Hadi Salim
2023-05-24  1:18 ` [PATCH v5 net 2/6] net/sched: sch_clsact: Only create under TC_H_CLSACT Peilin Ye
2023-05-24 15:38   ` Pedro Tammela
2023-05-24 15:58     ` Jamal Hadi Salim
2023-05-24  1:19 ` [PATCH v5 net 3/6] net/sched: Reserve TC_H_INGRESS (TC_H_CLSACT) for ingress (clsact) Qdiscs Peilin Ye
2023-05-24 15:38   ` Pedro Tammela
2023-05-24  1:19 ` [PATCH v5 net 4/6] net/sched: Prohibit regrafting ingress or clsact Qdiscs Peilin Ye
2023-05-24 15:38   ` Pedro Tammela
2023-05-24  1:20 ` [PATCH v5 net 5/6] net/sched: Refactor qdisc_graft() for ingress and " Peilin Ye
2023-05-24  1:20 ` [PATCH v5 net 6/6] net/sched: qdisc_destroy() old ingress and clsact Qdiscs before grafting Peilin Ye
2023-05-24 15:39   ` Pedro Tammela
2023-05-24 16:09     ` Jamal Hadi Salim
2023-05-25  9:25       ` Paolo Abeni
2023-05-26 12:19         ` Jamal Hadi Salim
2023-05-26 12:20     ` Jamal Hadi Salim
2023-05-26 19:47       ` Jamal Hadi Salim
2023-05-25 17:16 ` [PATCH v5 net 0/6] net/sched: Fixes for sch_ingress and sch_clsact Vlad Buslov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox