* [PATCH net] net/sched: Fix mirred deadlock on device recursion
@ 2024-04-15 21:07 Victor Nogueira
2024-04-18 1:40 ` patchwork-bot+netdevbpf
2024-06-07 14:40 ` Johannes Berg
0 siblings, 2 replies; 8+ messages in thread
From: Victor Nogueira @ 2024-04-15 21:07 UTC (permalink / raw)
To: edumazet, davem, kuba, pabeni
Cc: jhs, jiri, xiyou.wangcong, netdev, renmingshuai, pctammela
From: Eric Dumazet <edumazet@google.com>
When the mirred action is used on a classful egress qdisc and a packet is
mirrored or redirected to self we hit a qdisc lock deadlock.
See trace below.
[..... other info removed for brevity....]
[ 82.890906]
[ 82.890906] ============================================
[ 82.890906] WARNING: possible recursive locking detected
[ 82.890906] 6.8.0-05205-g77fadd89fe2d-dirty #213 Tainted: G W
[ 82.890906] --------------------------------------------
[ 82.890906] ping/418 is trying to acquire lock:
[ 82.890906] ffff888006994110 (&sch->q.lock){+.-.}-{3:3}, at:
__dev_queue_xmit+0x1778/0x3550
[ 82.890906]
[ 82.890906] but task is already holding lock:
[ 82.890906] ffff888006994110 (&sch->q.lock){+.-.}-{3:3}, at:
__dev_queue_xmit+0x1778/0x3550
[ 82.890906]
[ 82.890906] other info that might help us debug this:
[ 82.890906] Possible unsafe locking scenario:
[ 82.890906]
[ 82.890906] CPU0
[ 82.890906] ----
[ 82.890906] lock(&sch->q.lock);
[ 82.890906] lock(&sch->q.lock);
[ 82.890906]
[ 82.890906] *** DEADLOCK ***
[ 82.890906]
[..... other info removed for brevity....]
Example setup (eth0->eth0) to recreate
tc qdisc add dev eth0 root handle 1: htb default 30
tc filter add dev eth0 handle 1: protocol ip prio 2 matchall \
action mirred egress redirect dev eth0
Another example (eth0->eth1->eth0) to recreate
tc qdisc add dev eth0 root handle 1: htb default 30
tc filter add dev eth0 handle 1: protocol ip prio 2 matchall \
action mirred egress redirect dev eth1
tc qdisc add dev eth1 root handle 1: htb default 30
tc filter add dev eth1 handle 1: protocol ip prio 2 matchall \
action mirred egress redirect dev eth0
We fix this by adding an owner field (CPU id) to struct Qdisc, which is set
once the root qdisc is entered. When the softirq enters it a second time and
the qdisc owner is the same CPU, the packet is dropped to break the loop.
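A minimal userspace sketch of the owner check described above, assuming a toy model rather than the kernel code: `toy_qdisc`, `toy_xmit` and `NO_OWNER` are hypothetical names, the mirred action is modelled as a plain recursive call, and locking, softirq context and READ_ONCE()/WRITE_ONCE() are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; not the kernel API. */
#define NO_OWNER (-1)

enum xmit_rc { XMIT_SUCCESS = 0, XMIT_DROP = 1 };

struct toy_qdisc {
	int owner;                   /* CPU currently inside enqueue, or NO_OWNER */
	int enqueued;                /* packets accepted by this qdisc */
	int dropped;                 /* packets dropped by the recursion guard */
	struct toy_qdisc *mirror_to; /* models a mirred target, may be NULL */
};

static enum xmit_rc toy_xmit(struct toy_qdisc *q, int cpu)
{
	assert(q != NULL);

	/* Same CPU is already inside this qdisc: break the loop by dropping. */
	if (q->owner == cpu) {
		q->dropped++;
		return XMIT_DROP;
	}

	q->owner = cpu;                         /* mark the qdisc as entered */
	if (q->mirror_to)
		(void)toy_xmit(q->mirror_to, cpu);  /* action re-enters transmit */
	q->enqueued++;
	q->owner = NO_OWNER;                    /* enqueue finished */
	return XMIT_SUCCESS;
}
```

With `mirror_to` pointing back at the same object (the eth0->eth0 case), the nested call is dropped and the outer call still completes. In the real code the nested call would instead try to take the qdisc lock that the outer call already holds.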
Reported-by: Mingshuai Ren <renmingshuai@huawei.com>
Closes: https://lore.kernel.org/netdev/20240314111713.5979-1-renmingshuai@huawei.com/
Fixes: 3bcb846ca4cf ("net: get rid of spin_trylock() in net_tx_action()")
Fixes: e578d9c02587 ("net: sched: use counter to break reclassify loops")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Tested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
---
include/net/sch_generic.h | 1 +
net/core/dev.c | 6 ++++++
net/sched/sch_generic.c | 1 +
3 files changed, 8 insertions(+)
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 76db6be16083..f561dfb79743 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -117,6 +117,7 @@ struct Qdisc {
 	struct qdisc_skb_head q;
 	struct gnet_stats_basic_sync bstats;
 	struct gnet_stats_queue qstats;
+	int owner;
 	unsigned long state;
 	unsigned long state2; /* must be written under qdisc spinlock */
 	struct Qdisc *next_sched;
diff --git a/net/core/dev.c b/net/core/dev.c
index 854a3a28a8d8..f6c6e494f0a9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3808,6 +3808,10 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 		return rc;
 	}
 
+	if (unlikely(READ_ONCE(q->owner) == smp_processor_id())) {
+		kfree_skb_reason(skb, SKB_DROP_REASON_TC_RECLASSIFY_LOOP);
+		return NET_XMIT_DROP;
+	}
 	/*
 	 * Heuristic to force contended enqueues to serialize on a
 	 * separate lock before trying to get qdisc main lock.
@@ -3847,7 +3851,9 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 		qdisc_run_end(q);
 		rc = NET_XMIT_SUCCESS;
 	} else {
+		WRITE_ONCE(q->owner, smp_processor_id());
 		rc = dev_qdisc_enqueue(skb, q, &to_free, txq);
+		WRITE_ONCE(q->owner, -1);
 		if (qdisc_run_begin(q)) {
 			if (unlikely(contended)) {
 				spin_unlock(&q->busylock);
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index ff5336493777..4a2c763e2d11 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -974,6 +974,7 @@ struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
 	sch->enqueue = ops->enqueue;
 	sch->dequeue = ops->dequeue;
 	sch->dev_queue = dev_queue;
+	sch->owner = -1;
 	netdev_hold(dev, &sch->dev_tracker, GFP_KERNEL);
 	refcount_set(&sch->refcnt, 1);
 
--
2.34.1
^ permalink raw reply related [flat|nested] 8+ messages in thread

* Re: [PATCH net] net/sched: Fix mirred deadlock on device recursion
2024-04-15 21:07 [PATCH net] net/sched: Fix mirred deadlock on device recursion Victor Nogueira
@ 2024-04-18 1:40 ` patchwork-bot+netdevbpf
2024-06-07 14:40 ` Johannes Berg
1 sibling, 0 replies; 8+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-04-18 1:40 UTC (permalink / raw)
To: Victor Nogueira
Cc: edumazet, davem, kuba, pabeni, jhs, jiri, xiyou.wangcong, netdev, renmingshuai, pctammela

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Mon, 15 Apr 2024 18:07:28 -0300 you wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> When the mirred action is used on a classful egress qdisc and a packet is
> mirrored or redirected to self we hit a qdisc lock deadlock.
> See trace below.
>
> [..... other info removed for brevity....]
> [ 82.890906]
> [ 82.890906] ============================================
> [ 82.890906] WARNING: possible recursive locking detected
> [ 82.890906] 6.8.0-05205-g77fadd89fe2d-dirty #213 Tainted: G W
> [ 82.890906] --------------------------------------------
> [ 82.890906] ping/418 is trying to acquire lock:
> [ 82.890906] ffff888006994110 (&sch->q.lock){+.-.}-{3:3}, at:
> __dev_queue_xmit+0x1778/0x3550
> [ 82.890906]
> [ 82.890906] but task is already holding lock:
> [ 82.890906] ffff888006994110 (&sch->q.lock){+.-.}-{3:3}, at:
> __dev_queue_xmit+0x1778/0x3550
> [ 82.890906]
> [ 82.890906] other info that might help us debug this:
> [ 82.890906] Possible unsafe locking scenario:
> [ 82.890906]
> [ 82.890906] CPU0
> [ 82.890906] ----
> [ 82.890906] lock(&sch->q.lock);
> [ 82.890906] lock(&sch->q.lock);
> [ 82.890906]
> [ 82.890906] *** DEADLOCK ***
> [ 82.890906]
> [..... other info removed for brevity....]
>
> [...]

Here is the summary with links:
- [net] net/sched: Fix mirred deadlock on device recursion
https://git.kernel.org/netdev/net/c/0f022d32c3ec

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html

^ permalink raw reply [flat|nested] 8+ messages in thread

* Re: [PATCH net] net/sched: Fix mirred deadlock on device recursion
2024-04-15 21:07 [PATCH net] net/sched: Fix mirred deadlock on device recursion Victor Nogueira
2024-04-18 1:40 ` patchwork-bot+netdevbpf
@ 2024-06-07 14:40 ` Johannes Berg
2024-06-07 14:54 ` Eric Dumazet
1 sibling, 1 reply; 8+ messages in thread
From: Johannes Berg @ 2024-06-07 14:40 UTC (permalink / raw)
To: Victor Nogueira, edumazet, davem, kuba, pabeni
Cc: jhs, jiri, xiyou.wangcong, netdev, renmingshuai, pctammela

Hi all,

I noticed today that this causes a userspace visible change in behaviour
(and a regression in some of our tests) for transmitting to a device
when it has no carrier, when noop_qdisc is assigned to it. Instead of
silently dropping the packets, -ENOBUFS will be returned if the socket
opted in to RECVERR.

The reason for this is that the static noop_qdisc:

struct Qdisc noop_qdisc = {
	.enqueue = noop_enqueue,
	.dequeue = noop_dequeue,
	.flags = TCQ_F_BUILTIN,
	.ops = &noop_qdisc_ops,
	.q.lock = __SPIN_LOCK_UNLOCKED(noop_qdisc.q.lock),
	.dev_queue = &noop_netdev_queue,
	.busylock = __SPIN_LOCK_UNLOCKED(noop_qdisc.busylock),
	.gso_skb = {
		.next = (struct sk_buff *)&noop_qdisc.gso_skb,
		.prev = (struct sk_buff *)&noop_qdisc.gso_skb,
		.qlen = 0,
		.lock = __SPIN_LOCK_UNLOCKED(noop_qdisc.gso_skb.lock),
	},
	.skb_bad_txq = {
		.next = (struct sk_buff *)&noop_qdisc.skb_bad_txq,
		.prev = (struct sk_buff *)&noop_qdisc.skb_bad_txq,
		.qlen = 0,
		.lock = __SPIN_LOCK_UNLOCKED(noop_qdisc.skb_bad_txq.lock),
	},
};

doesn't have an owner set, and it's obviously not allocated via
qdisc_alloc(). Thus, it defaults to 0, so if you get to it on CPU 0 (I
was using ARCH=um which isn't even SMP) then it will just always run
into the

> +	if (unlikely(READ_ONCE(q->owner) == smp_processor_id())) {
> +		kfree_skb_reason(skb, SKB_DROP_REASON_TC_RECLASSIFY_LOOP);
> +		return NET_XMIT_DROP;
> +	}

case.

I'm not sure I understand the busylock logic well enough, so almost
seems to me we shouldn't do this whole thing on the noop_qdisc at all,
e.g. via tagging owner with -2 to say don't do it:

--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3865,9 +3865,11 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 		qdisc_run_end(q);
 		rc = NET_XMIT_SUCCESS;
 	} else {
-		WRITE_ONCE(q->owner, smp_processor_id());
+		if (q->owner != -2)
+			WRITE_ONCE(q->owner, smp_processor_id());
 		rc = dev_qdisc_enqueue(skb, q, &to_free, txq);
-		WRITE_ONCE(q->owner, -1);
+		if (q->owner != -2)
+			WRITE_ONCE(q->owner, -1);
 		if (qdisc_run_begin(q)) {
 			if (unlikely(contended)) {
 				spin_unlock(&q->busylock);
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 2a637a17061b..e857e4638671 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -657,6 +657,7 @@ static struct netdev_queue noop_netdev_queue = {
 };
 
 struct Qdisc noop_qdisc = {
+	.owner = -2,
 	.enqueue = noop_enqueue,
 	.dequeue = noop_dequeue,
 	.flags = TCQ_F_BUILTIN,

(and yes, I believe it doesn't need to be READ_ONCE for the check
against -2 since that's mutually exclusive with all other values)

Or maybe simply ignoring the value for the noop_qdisc:

--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3822,7 +3822,7 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 		return rc;
 	}
 
-	if (unlikely(READ_ONCE(q->owner) == smp_processor_id())) {
+	if (unlikely(q != &noop_qdisc && READ_ONCE(q->owner) == smp_processor_id())) {
 		kfree_skb_reason(skb, SKB_DROP_REASON_TC_RECLASSIFY_LOOP);
 		return NET_XMIT_DROP;
 	}

That's shorter, but I'm not sure if there might be other special
cases...

Or maybe someone can think of an even better fix?

Thanks,
johannes

^ permalink raw reply related [flat|nested] 8+ messages in thread

* Re: [PATCH net] net/sched: Fix mirred deadlock on device recursion
2024-06-07 14:40 ` Johannes Berg
@ 2024-06-07 14:54 ` Eric Dumazet
2024-06-07 14:56 ` Johannes Berg
2024-06-07 15:53 ` [PATCH] net/sched: initialize noop_qdisc owner Johannes Berg
0 siblings, 2 replies; 8+ messages in thread
From: Eric Dumazet @ 2024-06-07 14:54 UTC (permalink / raw)
To: Johannes Berg
Cc: Victor Nogueira, davem, kuba, pabeni, jhs, jiri, xiyou.wangcong, netdev, renmingshuai, pctammela

On Fri, Jun 7, 2024 at 4:40 PM Johannes Berg <johannes@sipsolutions.net> wrote:
>
> Hi all,
>
> I noticed today that this causes a userspace visible change in behaviour
> (and a regression in some of our tests) for transmitting to a device
> when it has no carrier, when noop_qdisc is assigned to it. Instead of
> silently dropping the packets, -ENOBUFS will be returned if the socket
> opted in to RECVERR.
>
> The reason for this is that the static noop_qdisc:
>
> struct Qdisc noop_qdisc = {
>	.enqueue = noop_enqueue,
>	.dequeue = noop_dequeue,
>	.flags = TCQ_F_BUILTIN,
>	.ops = &noop_qdisc_ops,
>	.q.lock = __SPIN_LOCK_UNLOCKED(noop_qdisc.q.lock),
>	.dev_queue = &noop_netdev_queue,
>	.busylock = __SPIN_LOCK_UNLOCKED(noop_qdisc.busylock),
>	.gso_skb = {
>		.next = (struct sk_buff *)&noop_qdisc.gso_skb,
>		.prev = (struct sk_buff *)&noop_qdisc.gso_skb,
>		.qlen = 0,
>		.lock = __SPIN_LOCK_UNLOCKED(noop_qdisc.gso_skb.lock),
>	},
>	.skb_bad_txq = {
>		.next = (struct sk_buff *)&noop_qdisc.skb_bad_txq,
>		.prev = (struct sk_buff *)&noop_qdisc.skb_bad_txq,
>		.qlen = 0,
>		.lock = __SPIN_LOCK_UNLOCKED(noop_qdisc.skb_bad_txq.lock),
>	},
> };
>
> doesn't have an owner set, and it's obviously not allocated via
> qdisc_alloc(). Thus, it defaults to 0, so if you get to it on CPU 0 (I
> was using ARCH=um which isn't even SMP) then it will just always run
> into the
>
> > +	if (unlikely(READ_ONCE(q->owner) == smp_processor_id())) {
> > +		kfree_skb_reason(skb, SKB_DROP_REASON_TC_RECLASSIFY_LOOP);
> > +		return NET_XMIT_DROP;
> > +	}
>
> case.
>
> I'm not sure I understand the busylock logic well enough, so almost
> seems to me we shouldn't do this whole thing on the noop_qdisc at all,
> e.g. via tagging owner with -2 to say don't do it:
>
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3865,9 +3865,11 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
>  		qdisc_run_end(q);
>  		rc = NET_XMIT_SUCCESS;
>  	} else {
> -		WRITE_ONCE(q->owner, smp_processor_id());
> +		if (q->owner != -2)
> +			WRITE_ONCE(q->owner, smp_processor_id());
>  		rc = dev_qdisc_enqueue(skb, q, &to_free, txq);
> -		WRITE_ONCE(q->owner, -1);
> +		if (q->owner != -2)
> +			WRITE_ONCE(q->owner, -1);
>  		if (qdisc_run_begin(q)) {
>  			if (unlikely(contended)) {
>  				spin_unlock(&q->busylock);
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 2a637a17061b..e857e4638671 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -657,6 +657,7 @@ static struct netdev_queue noop_netdev_queue = {
>  };
>
>  struct Qdisc noop_qdisc = {
> +	.owner = -2,
>  	.enqueue = noop_enqueue,
>  	.dequeue = noop_dequeue,
>  	.flags = TCQ_F_BUILTIN,
>
>
> (and yes, I believe it doesn't need to be READ_ONCE for the check
> against -2 since that's mutually exclusive with all other values)
>
> Or maybe simply ignoring the value for the noop_qdisc:
>
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3822,7 +3822,7 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
>  		return rc;
>  	}
>
> -	if (unlikely(READ_ONCE(q->owner) == smp_processor_id())) {
> +	if (unlikely(q != &noop_qdisc && READ_ONCE(q->owner) == smp_processor_id())) {
>  		kfree_skb_reason(skb, SKB_DROP_REASON_TC_RECLASSIFY_LOOP);
>  		return NET_XMIT_DROP;
>  	}
>
> That's shorter, but I'm not sure if there might be other special
> cases...
>
> Or maybe someone can think of an even better fix?

Why not simply initialize noop_qdisc.owner to -1 ?

^ permalink raw reply [flat|nested] 8+ messages in thread

* Re: [PATCH net] net/sched: Fix mirred deadlock on device recursion
2024-06-07 14:54 ` Eric Dumazet
@ 2024-06-07 14:56 ` Johannes Berg
2024-06-07 15:53 ` [PATCH] net/sched: initialize noop_qdisc owner Johannes Berg
1 sibling, 0 replies; 8+ messages in thread
From: Johannes Berg @ 2024-06-07 14:56 UTC (permalink / raw)
To: Eric Dumazet
Cc: Victor Nogueira, davem, kuba, pabeni, jhs, jiri, xiyou.wangcong, netdev, renmingshuai, pctammela

On Fri, 2024-06-07 at 16:54 +0200, Eric Dumazet wrote:
> >
> > I'm not sure I understand the busylock logic well enough
> Why not simply initialize noop_qdisc.owner to -1 ?

I didn't understand the locking logic, so I was worried you could still
have it be used in parallel since it can be assigned to any number of
devices.

johannes

^ permalink raw reply [flat|nested] 8+ messages in thread

* [PATCH] net/sched: initialize noop_qdisc owner
2024-06-07 14:54 ` Eric Dumazet
2024-06-07 14:56 ` Johannes Berg
@ 2024-06-07 15:53 ` Johannes Berg
2024-06-08 8:05 ` Eric Dumazet
2024-06-11 2:40 ` patchwork-bot+netdevbpf
1 sibling, 2 replies; 8+ messages in thread
From: Johannes Berg @ 2024-06-07 15:53 UTC (permalink / raw)
To: netdev; +Cc: Johannes Berg

From: Johannes Berg <johannes.berg@intel.com>

When the noop_qdisc owner isn't initialized, then it will be 0,
so packets will erroneously be regarded as having been subject
to recursion as long as only CPU 0 queues them. For non-SMP,
that's all packets, of course. This causes a change in what's
reported to userspace, normally noop_qdisc would drop packets
silently, but with this change the syscall returns -ENOBUFS if
RECVERR is also set on the socket.

Fix this by initializing the owner field to -1, just like it
would be for dynamically allocated qdiscs by qdisc_alloc().

Fixes: 0f022d32c3ec ("net/sched: Fix mirred deadlock on device recursion")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
---
net/sched/sch_generic.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index d3f6006b563c..fb32984d7a16 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -673,6 +673,7 @@ struct Qdisc noop_qdisc = {
 		.qlen = 0,
 		.lock = __SPIN_LOCK_UNLOCKED(noop_qdisc.skb_bad_txq.lock),
 	},
+	.owner = -1,
 };
 EXPORT_SYMBOL(noop_qdisc);
 
--
2.45.2

^ permalink raw reply related [flat|nested] 8+ messages in thread
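
The zero default described in this patch follows from C initialization rules: members omitted from the initializer of a static object are set to zero. A small sketch of that effect, assuming a toy struct rather than the kernel's struct Qdisc (`toy_qdisc`, `looks_like_recursion` and `NO_OWNER` are hypothetical names):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; not the kernel API. */
#define NO_OWNER (-1)

struct toy_qdisc {
	int flags;
	int owner; /* CPU id inside enqueue, or NO_OWNER */
};

/* owner is omitted here, so it is implicitly 0, which is also a valid CPU id. */
static struct toy_qdisc toy_noop_before = { .flags = 1 };

/* Explicit sentinel, matching what a dynamically allocated qdisc would get. */
static struct toy_qdisc toy_noop_after = { .flags = 1, .owner = NO_OWNER };

/* Same comparison as the recursion check: does this CPU already own q? */
static int looks_like_recursion(const struct toy_qdisc *q, int cpu)
{
	assert(q != NULL);
	return q->owner == cpu;
}
```

On CPU 0 the first object is always mistaken for a recursive entry, while the second never is. Other CPUs are unaffected in both cases, which is why the problem shows up mainly on CPU 0 and on non-SMP builds.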

* Re: [PATCH] net/sched: initialize noop_qdisc owner
2024-06-07 15:53 ` [PATCH] net/sched: initialize noop_qdisc owner Johannes Berg
@ 2024-06-08 8:05 ` Eric Dumazet
2024-06-11 2:40 ` patchwork-bot+netdevbpf
1 sibling, 0 replies; 8+ messages in thread
From: Eric Dumazet @ 2024-06-08 8:05 UTC (permalink / raw)
To: Johannes Berg, netdev; +Cc: Johannes Berg

On 6/7/24 17:53, Johannes Berg wrote:
> From: Johannes Berg <johannes.berg@intel.com>
>
> When the noop_qdisc owner isn't initialized, then it will be 0,
> so packets will erroneously be regarded as having been subject
> to recursion as long as only CPU 0 queues them. For non-SMP,
> that's all packets, of course. This causes a change in what's
> reported to userspace, normally noop_qdisc would drop packets
> silently, but with this change the syscall returns -ENOBUFS if
> RECVERR is also set on the socket.
>
> Fix this by initializing the owner field to -1, just like it
> would be for dynamically allocated qdiscs by qdisc_alloc().
>
> Fixes: 0f022d32c3ec ("net/sched: Fix mirred deadlock on device recursion")
> Signed-off-by: Johannes Berg <johannes.berg@intel.com>

I found this quite by luck.

Please CC maintainers next time, and blamed patch authors :/

Believe it or not, I do not follow netdev@ traffic.

Reviewed-by: Eric Dumazet <edumazet@google.com>

Thanks.

^ permalink raw reply [flat|nested] 8+ messages in thread

* Re: [PATCH] net/sched: initialize noop_qdisc owner
2024-06-07 15:53 ` [PATCH] net/sched: initialize noop_qdisc owner Johannes Berg
2024-06-08 8:05 ` Eric Dumazet
@ 2024-06-11 2:40 ` patchwork-bot+netdevbpf
1 sibling, 0 replies; 8+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-06-11 2:40 UTC (permalink / raw)
To: Johannes Berg; +Cc: netdev, johannes.berg

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Fri, 7 Jun 2024 17:53:32 +0200 you wrote:
> From: Johannes Berg <johannes.berg@intel.com>
>
> When the noop_qdisc owner isn't initialized, then it will be 0,
> so packets will erroneously be regarded as having been subject
> to recursion as long as only CPU 0 queues them. For non-SMP,
> that's all packets, of course. This causes a change in what's
> reported to userspace, normally noop_qdisc would drop packets
> silently, but with this change the syscall returns -ENOBUFS if
> RECVERR is also set on the socket.
>
> [...]

Here is the summary with links:
- net/sched: initialize noop_qdisc owner
https://git.kernel.org/netdev/net/c/44180feaccf2

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html

^ permalink raw reply [flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-06-11 2:40 UTC | newest]

Thread overview: 8+ messages
2024-04-15 21:07 [PATCH net] net/sched: Fix mirred deadlock on device recursion Victor Nogueira
2024-04-18 1:40 ` patchwork-bot+netdevbpf
2024-06-07 14:40 ` Johannes Berg
2024-06-07 14:54   ` Eric Dumazet
2024-06-07 14:56     ` Johannes Berg
2024-06-07 15:53     ` [PATCH] net/sched: initialize noop_qdisc owner Johannes Berg
2024-06-08 8:05       ` Eric Dumazet
2024-06-11 2:40       ` patchwork-bot+netdevbpf