netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net] netlink: fix potential deadlock in netlink_set_err()
@ 2023-06-21 15:43 Eric Dumazet
  2023-06-22  8:04 ` Jiri Pirko
  2023-06-23  2:40 ` patchwork-bot+netdevbpf
  0 siblings, 2 replies; 12+ messages in thread
From: Eric Dumazet @ 2023-06-21 15:43 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, eric.dumazet, Eric Dumazet, syzbot+a7d200a347f912723e5c,
	Johannes Berg

syzbot reported a possible deadlock in netlink_set_err() [1]

A similar issue was fixed in commit 1d482e666b8e ("netlink: disable IRQs
for netlink_lock_table()") in netlink_lock_table()

This patch adds IRQ safety to netlink_set_err() and __netlink_diag_dump()
which were not covered by cited commit.

[1]

WARNING: possible irq lock inversion dependency detected
6.4.0-rc6-syzkaller-00240-g4e9f0ec38852 #0 Not tainted

syz-executor.2/23011 just changed the state of lock:
ffffffff8e1a7a58 (nl_table_lock){.+.?}-{2:2}, at: netlink_set_err+0x2e/0x3a0 net/netlink/af_netlink.c:1612
but this lock was taken by another, SOFTIRQ-safe lock in the past:
 (&local->queue_stop_reason_lock){..-.}-{2:2}

and interrupts could create inverse lock ordering between them.

other info that might help us debug this:
 Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(nl_table_lock);
                               local_irq_disable();
                               lock(&local->queue_stop_reason_lock);
                               lock(nl_table_lock);
  <Interrupt>
    lock(&local->queue_stop_reason_lock);

 *** DEADLOCK ***

Fixes: 1d482e666b8e ("netlink: disable IRQs for netlink_lock_table()")
Reported-by: syzbot+a7d200a347f912723e5c@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=a7d200a347f912723e5c
Link: https://lore.kernel.org/netdev/000000000000e38d1605fea5747e@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Johannes Berg <johannes.berg@intel.com>
---
 net/netlink/af_netlink.c | 5 +++--
 net/netlink/diag.c       | 5 +++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 3a1e0fd5bf149c3ece9e0f004107efe149e3f2c3..5968b6450d828a52a3990d18ea597687ab03170e 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1600,6 +1600,7 @@ static int do_one_set_err(struct sock *sk, struct netlink_set_err_data *p)
 int netlink_set_err(struct sock *ssk, u32 portid, u32 group, int code)
 {
 	struct netlink_set_err_data info;
+	unsigned long flags;
 	struct sock *sk;
 	int ret = 0;
 
@@ -1609,12 +1610,12 @@ int netlink_set_err(struct sock *ssk, u32 portid, u32 group, int code)
 	/* sk->sk_err wants a positive error value */
 	info.code = -code;
 
-	read_lock(&nl_table_lock);
+	read_lock_irqsave(&nl_table_lock, flags);
 
 	sk_for_each_bound(sk, &nl_table[ssk->sk_protocol].mc_list)
 		ret += do_one_set_err(sk, &info);
 
-	read_unlock(&nl_table_lock);
+	read_unlock_irqrestore(&nl_table_lock, flags);
 	return ret;
 }
 EXPORT_SYMBOL(netlink_set_err);
diff --git a/net/netlink/diag.c b/net/netlink/diag.c
index c6255eac305c7b64f6e00f10d3bf8cf42c680c15..4143b2ea4195aeacbd4eb828430c836cfb79d859 100644
--- a/net/netlink/diag.c
+++ b/net/netlink/diag.c
@@ -94,6 +94,7 @@ static int __netlink_diag_dump(struct sk_buff *skb, struct netlink_callback *cb,
 	struct net *net = sock_net(skb->sk);
 	struct netlink_diag_req *req;
 	struct netlink_sock *nlsk;
+	unsigned long flags;
 	struct sock *sk;
 	int num = 2;
 	int ret = 0;
@@ -152,7 +153,7 @@ static int __netlink_diag_dump(struct sk_buff *skb, struct netlink_callback *cb,
 	num++;
 
 mc_list:
-	read_lock(&nl_table_lock);
+	read_lock_irqsave(&nl_table_lock, flags);
 	sk_for_each_bound(sk, &tbl->mc_list) {
 		if (sk_hashed(sk))
 			continue;
@@ -173,7 +174,7 @@ static int __netlink_diag_dump(struct sk_buff *skb, struct netlink_callback *cb,
 		}
 		num++;
 	}
-	read_unlock(&nl_table_lock);
+	read_unlock_irqrestore(&nl_table_lock, flags);
 
 done:
 	cb->args[0] = num;
-- 
2.41.0.178.g377b9f9a00-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH net] netlink: fix potential deadlock in netlink_set_err()
  2023-06-21 15:43 [PATCH net] netlink: fix potential deadlock in netlink_set_err() Eric Dumazet
@ 2023-06-22  8:04 ` Jiri Pirko
  2023-06-22  8:14   ` Eric Dumazet
  2023-06-23  2:40 ` patchwork-bot+netdevbpf
  1 sibling, 1 reply; 12+ messages in thread
From: Jiri Pirko @ 2023-06-22  8:04 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	eric.dumazet, syzbot+a7d200a347f912723e5c, Johannes Berg

Wed, Jun 21, 2023 at 05:43:37PM CEST, edumazet@google.com wrote:
>syzbot reported a possible deadlock in netlink_set_err() [1]
>
>A similar issue was fixed in commit 1d482e666b8e ("netlink: disable IRQs
>for netlink_lock_table()") in netlink_lock_table()
>
>This patch adds IRQ safety to netlink_set_err() and __netlink_diag_dump()
>which were not covered by cited commit.
>
>[1]
>
>WARNING: possible irq lock inversion dependency detected
>6.4.0-rc6-syzkaller-00240-g4e9f0ec38852 #0 Not tainted
>
>syz-executor.2/23011 just changed the state of lock:
>ffffffff8e1a7a58 (nl_table_lock){.+.?}-{2:2}, at: netlink_set_err+0x2e/0x3a0 net/netlink/af_netlink.c:1612
>but this lock was taken by another, SOFTIRQ-safe lock in the past:
> (&local->queue_stop_reason_lock){..-.}-{2:2}
>
>and interrupts could create inverse lock ordering between them.
>
>other info that might help us debug this:
> Possible interrupt unsafe locking scenario:
>
>       CPU0                    CPU1
>       ----                    ----
>  lock(nl_table_lock);
>                               local_irq_disable();
>                               lock(&local->queue_stop_reason_lock);
>                               lock(nl_table_lock);
>  <Interrupt>
>    lock(&local->queue_stop_reason_lock);
>
> *** DEADLOCK ***
>
>Fixes: 1d482e666b8e ("netlink: disable IRQs for netlink_lock_table()")

I don't think that this "fixes" tag is correct. The referenced commit
is a fix to the same issue on a different codepath, not the one who
actually introduced the issue.

The code itself looks fine to me.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net] netlink: fix potential deadlock in netlink_set_err()
  2023-06-22  8:04 ` Jiri Pirko
@ 2023-06-22  8:14   ` Eric Dumazet
  2023-06-22  8:25     ` Johannes Berg
  2023-06-22  8:29     ` Jiri Pirko
  0 siblings, 2 replies; 12+ messages in thread
From: Eric Dumazet @ 2023-06-22  8:14 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	eric.dumazet, syzbot+a7d200a347f912723e5c, Johannes Berg

On Thu, Jun 22, 2023 at 10:04 AM Jiri Pirko <jiri@resnulli.us> wrote:
>
> Wed, Jun 21, 2023 at 05:43:37PM CEST, edumazet@google.com wrote:
> >syzbot reported a possible deadlock in netlink_set_err() [1]
> >
> >A similar issue was fixed in commit 1d482e666b8e ("netlink: disable IRQs
> >for netlink_lock_table()") in netlink_lock_table()
> >
> >This patch adds IRQ safety to netlink_set_err() and __netlink_diag_dump()
> >which were not covered by cited commit.
> >
> >[1]
> >
> >WARNING: possible irq lock inversion dependency detected
> >6.4.0-rc6-syzkaller-00240-g4e9f0ec38852 #0 Not tainted
> >
> >syz-executor.2/23011 just changed the state of lock:
> >ffffffff8e1a7a58 (nl_table_lock){.+.?}-{2:2}, at: netlink_set_err+0x2e/0x3a0 net/netlink/af_netlink.c:1612
> >but this lock was taken by another, SOFTIRQ-safe lock in the past:
> > (&local->queue_stop_reason_lock){..-.}-{2:2}
> >
> >and interrupts could create inverse lock ordering between them.
> >
> >other info that might help us debug this:
> > Possible interrupt unsafe locking scenario:
> >
> >       CPU0                    CPU1
> >       ----                    ----
> >  lock(nl_table_lock);
> >                               local_irq_disable();
> >                               lock(&local->queue_stop_reason_lock);
> >                               lock(nl_table_lock);
> >  <Interrupt>
> >    lock(&local->queue_stop_reason_lock);
> >
> > *** DEADLOCK ***
> >
> >Fixes: 1d482e666b8e ("netlink: disable IRQs for netlink_lock_table()")
>
> I don't think that this "fixes" tag is correct. The referenced commit
> is a fix to the same issue on a different codepath, not the one who
> actually introduced the issue.
>
> The code itself looks fine to me.

Note that the 1d482e666b8e had no Fixes: tag, otherwise I would have taken it.

I presume that it would make no sense to backport my patch on stable branches
if the cited commit was not backported yet.

Now, if you think we can be more precise, I will let Johannes do the
archeology in ieee80211 code.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net] netlink: fix potential deadlock in netlink_set_err()
  2023-06-22  8:14   ` Eric Dumazet
@ 2023-06-22  8:25     ` Johannes Berg
  2023-06-22  8:39       ` Eric Dumazet
  2023-06-22  8:29     ` Jiri Pirko
  1 sibling, 1 reply; 12+ messages in thread
From: Johannes Berg @ 2023-06-22  8:25 UTC (permalink / raw)
  To: Eric Dumazet, Jiri Pirko
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni,
	netdev@vger.kernel.org, eric.dumazet@gmail.com,
	syzbot+a7d200a347f912723e5c@syzkaller.appspotmail.com

On Thu, 2023-06-22 at 08:14 +0000, Eric Dumazet wrote:
> On Thu, Jun 22, 2023 at 10:04 AM Jiri Pirko <jiri@resnulli.us> wrote:
> > 
> > Wed, Jun 21, 2023 at 05:43:37PM CEST, edumazet@google.com wrote:
> > > syzbot reported a possible deadlock in netlink_set_err() [1]
> > > 
> > > A similar issue was fixed in commit 1d482e666b8e ("netlink: disable IRQs
> > > for netlink_lock_table()") in netlink_lock_table()
> > > 
> > > This patch adds IRQ safety to netlink_set_err() and __netlink_diag_dump()
> > > which were not covered by cited commit.
> > > 
> > > [1]
> > > 
> > > WARNING: possible irq lock inversion dependency detected
> > > 6.4.0-rc6-syzkaller-00240-g4e9f0ec38852 #0 Not tainted
> > > 
> > > syz-executor.2/23011 just changed the state of lock:
> > > ffffffff8e1a7a58 (nl_table_lock){.+.?}-{2:2}, at: netlink_set_err+0x2e/0x3a0 net/netlink/af_netlink.c:1612
> > > but this lock was taken by another, SOFTIRQ-safe lock in the past:
> > > (&local->queue_stop_reason_lock){..-.}-{2:2}
> > > 
> > > and interrupts could create inverse lock ordering between them.
> > > 
> > > other info that might help us debug this:
> > > Possible interrupt unsafe locking scenario:
> > > 
> > >       CPU0                    CPU1
> > >       ----                    ----
> > >  lock(nl_table_lock);
> > >                               local_irq_disable();
> > >                               lock(&local->queue_stop_reason_lock);
> > >                               lock(nl_table_lock);
> > >  <Interrupt>
> > >    lock(&local->queue_stop_reason_lock);
> > > 
> > > *** DEADLOCK ***
> > > 
> > > Fixes: 1d482e666b8e ("netlink: disable IRQs for netlink_lock_table()")
> > 
> > I don't think that this "fixes" tag is correct. The referenced commit
> > is a fix to the same issue on a different codepath, not the one who
> > actually introduced the issue.
> > 
> > The code itself looks fine to me.
> 
> Note that the 1d482e666b8e had no Fixes: tag, otherwise I would have taken it.
> 
> I presume that it would make no sense to backport my patch on stable branches
> if the cited commit was not backported yet.

I'd tend to even say it doesn't make sense to backport this at all, it's
very unlikely to happen in practice since that code path ...

> Now, if you think we can be more precise, I will let Johannes do the
> archeology in ieee80211 code.

I first thought that'd be commit d4fa14cd62bd ("mac80211: use
ieee80211_free_txskb in a few more places") then, but that didn't call
to netlink yet ... so commit 8a2fbedcdc9b ("mac80211: combine
status/drop reporting"), but that's almost as old (and really old too,
kernel 3.8).

But again, I'm not sure it's worth worrying about ... Actually I'm
pretty sure it's _not_ worth worrying about :)

johannes

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net] netlink: fix potential deadlock in netlink_set_err()
  2023-06-22  8:14   ` Eric Dumazet
  2023-06-22  8:25     ` Johannes Berg
@ 2023-06-22  8:29     ` Jiri Pirko
  2023-06-22  8:42       ` Eric Dumazet
  1 sibling, 1 reply; 12+ messages in thread
From: Jiri Pirko @ 2023-06-22  8:29 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	eric.dumazet, syzbot+a7d200a347f912723e5c, Johannes Berg

Thu, Jun 22, 2023 at 10:14:56AM CEST, edumazet@google.com wrote:
>On Thu, Jun 22, 2023 at 10:04 AM Jiri Pirko <jiri@resnulli.us> wrote:
>>
>> Wed, Jun 21, 2023 at 05:43:37PM CEST, edumazet@google.com wrote:
>> >syzbot reported a possible deadlock in netlink_set_err() [1]
>> >
>> >A similar issue was fixed in commit 1d482e666b8e ("netlink: disable IRQs
>> >for netlink_lock_table()") in netlink_lock_table()
>> >
>> >This patch adds IRQ safety to netlink_set_err() and __netlink_diag_dump()
>> >which were not covered by cited commit.
>> >
>> >[1]
>> >
>> >WARNING: possible irq lock inversion dependency detected
>> >6.4.0-rc6-syzkaller-00240-g4e9f0ec38852 #0 Not tainted
>> >
>> >syz-executor.2/23011 just changed the state of lock:
>> >ffffffff8e1a7a58 (nl_table_lock){.+.?}-{2:2}, at: netlink_set_err+0x2e/0x3a0 net/netlink/af_netlink.c:1612
>> >but this lock was taken by another, SOFTIRQ-safe lock in the past:
>> > (&local->queue_stop_reason_lock){..-.}-{2:2}
>> >
>> >and interrupts could create inverse lock ordering between them.
>> >
>> >other info that might help us debug this:
>> > Possible interrupt unsafe locking scenario:
>> >
>> >       CPU0                    CPU1
>> >       ----                    ----
>> >  lock(nl_table_lock);
>> >                               local_irq_disable();
>> >                               lock(&local->queue_stop_reason_lock);
>> >                               lock(nl_table_lock);
>> >  <Interrupt>
>> >    lock(&local->queue_stop_reason_lock);
>> >
>> > *** DEADLOCK ***
>> >
>> >Fixes: 1d482e666b8e ("netlink: disable IRQs for netlink_lock_table()")
>>
>> I don't think that this "fixes" tag is correct. The referenced commit
>> is a fix to the same issue on a different codepath, not the one who
>> actually introduced the issue.
>>
>> The code itself looks fine to me.
>
>Note that the 1d482e666b8e had no Fixes: tag, otherwise I would have taken it.

I'm aware it didn't. But that does not implicate this patch should have
that commit as a "Fixes:" tag. Either have the correct one pointing out
which commit introduced the issue or omit the "Fixes:" tag entirely.
That's my point.


>
>I presume that it would make no sense to backport my patch on stable branches
>if the cited commit was not backported yet.
>
>Now, if you think we can be more precise, I will let Johannes do the
>archeology in ieee80211 code.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net] netlink: fix potential deadlock in netlink_set_err()
  2023-06-22  8:25     ` Johannes Berg
@ 2023-06-22  8:39       ` Eric Dumazet
  2023-06-22  8:44         ` Johannes Berg
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2023-06-22  8:39 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Jiri Pirko, David S . Miller, Jakub Kicinski, Paolo Abeni,
	netdev@vger.kernel.org, eric.dumazet@gmail.com,
	syzbot+a7d200a347f912723e5c@syzkaller.appspotmail.com

On Thu, Jun 22, 2023 at 10:26 AM Johannes Berg
<johannes@sipsolutions.net> wrote:
>
> On Thu, 2023-06-22 at 08:14 +0000, Eric Dumazet wrote:
> > On Thu, Jun 22, 2023 at 10:04 AM Jiri Pirko <jiri@resnulli.us> wrote:
> > >
> > > Wed, Jun 21, 2023 at 05:43:37PM CEST, edumazet@google.com wrote:
> > > > syzbot reported a possible deadlock in netlink_set_err() [1]
> > > >
> > > > A similar issue was fixed in commit 1d482e666b8e ("netlink: disable IRQs
> > > > for netlink_lock_table()") in netlink_lock_table()
> > > >
> > > > This patch adds IRQ safety to netlink_set_err() and __netlink_diag_dump()
> > > > which were not covered by cited commit.
> > > >
> > > > [1]
> > > >
> > > > WARNING: possible irq lock inversion dependency detected
> > > > 6.4.0-rc6-syzkaller-00240-g4e9f0ec38852 #0 Not tainted
> > > >
> > > > syz-executor.2/23011 just changed the state of lock:
> > > > ffffffff8e1a7a58 (nl_table_lock){.+.?}-{2:2}, at: netlink_set_err+0x2e/0x3a0 net/netlink/af_netlink.c:1612
> > > > but this lock was taken by another, SOFTIRQ-safe lock in the past:
> > > > (&local->queue_stop_reason_lock){..-.}-{2:2}
> > > >
> > > > and interrupts could create inverse lock ordering between them.
> > > >
> > > > other info that might help us debug this:
> > > > Possible interrupt unsafe locking scenario:
> > > >
> > > >       CPU0                    CPU1
> > > >       ----                    ----
> > > >  lock(nl_table_lock);
> > > >                               local_irq_disable();
> > > >                               lock(&local->queue_stop_reason_lock);
> > > >                               lock(nl_table_lock);
> > > >  <Interrupt>
> > > >    lock(&local->queue_stop_reason_lock);
> > > >
> > > > *** DEADLOCK ***
> > > >
> > > > Fixes: 1d482e666b8e ("netlink: disable IRQs for netlink_lock_table()")
> > >
> > > I don't think that this "fixes" tag is correct. The referenced commit
> > > is a fix to the same issue on a different codepath, not the one who
> > > actually introduced the issue.
> > >
> > > The code itself looks fine to me.
> >
> > Note that the 1d482e666b8e had no Fixes: tag, otherwise I would have taken it.
> >
> > I presume that it would make no sense to backport my patch on stable branches
> > if the cited commit was not backported yet.
>
> I'd tend to even say it doesn't make sense to backport this at all, it's
> very unlikely to happen in practice since that code path ...
>
> > Now, if you think we can be more precise, I will let Johannes do the
> > archeology in ieee80211 code.
>
> I first thought that'd be commit d4fa14cd62bd ("mac80211: use
> ieee80211_free_txskb in a few more places") then, but that didn't call
> to netlink yet ... so commit 8a2fbedcdc9b ("mac80211: combine
> status/drop reporting"), but that's almost as old (and really old too,
> kernel 3.8).
>
> But again, I'm not sure it's worth worrying about ... Actually I'm
> pretty sure it's _not_ worth worrying about :)

OK, should we revert your patch then ?

I am slightly confused, you are saying this bug is not worth fixing ?

This will prevent syzbot from discovering other bugs, this is your call.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net] netlink: fix potential deadlock in netlink_set_err()
  2023-06-22  8:29     ` Jiri Pirko
@ 2023-06-22  8:42       ` Eric Dumazet
  2023-06-22  9:26         ` Jiri Pirko
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2023-06-22  8:42 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	eric.dumazet, syzbot+a7d200a347f912723e5c, Johannes Berg

On Thu, Jun 22, 2023 at 10:29 AM Jiri Pirko <jiri@resnulli.us> wrote:
>
> Thu, Jun 22, 2023 at 10:14:56AM CEST, edumazet@google.com wrote:
> >On Thu, Jun 22, 2023 at 10:04 AM Jiri Pirko <jiri@resnulli.us> wrote:
> >>
> >> Wed, Jun 21, 2023 at 05:43:37PM CEST, edumazet@google.com wrote:
> >> >syzbot reported a possible deadlock in netlink_set_err() [1]
> >> >
> >> >A similar issue was fixed in commit 1d482e666b8e ("netlink: disable IRQs
> >> >for netlink_lock_table()") in netlink_lock_table()
> >> >
> >> >This patch adds IRQ safety to netlink_set_err() and __netlink_diag_dump()
> >> >which were not covered by cited commit.
> >> >
> >> >[1]
> >> >
> >> >WARNING: possible irq lock inversion dependency detected
> >> >6.4.0-rc6-syzkaller-00240-g4e9f0ec38852 #0 Not tainted
> >> >
> >> >syz-executor.2/23011 just changed the state of lock:
> >> >ffffffff8e1a7a58 (nl_table_lock){.+.?}-{2:2}, at: netlink_set_err+0x2e/0x3a0 net/netlink/af_netlink.c:1612
> >> >but this lock was taken by another, SOFTIRQ-safe lock in the past:
> >> > (&local->queue_stop_reason_lock){..-.}-{2:2}
> >> >
> >> >and interrupts could create inverse lock ordering between them.
> >> >
> >> >other info that might help us debug this:
> >> > Possible interrupt unsafe locking scenario:
> >> >
> >> >       CPU0                    CPU1
> >> >       ----                    ----
> >> >  lock(nl_table_lock);
> >> >                               local_irq_disable();
> >> >                               lock(&local->queue_stop_reason_lock);
> >> >                               lock(nl_table_lock);
> >> >  <Interrupt>
> >> >    lock(&local->queue_stop_reason_lock);
> >> >
> >> > *** DEADLOCK ***
> >> >
> >> >Fixes: 1d482e666b8e ("netlink: disable IRQs for netlink_lock_table()")
> >>
> >> I don't think that this "fixes" tag is correct. The referenced commit
> >> is a fix to the same issue on a different codepath, not the one who
> >> actually introduced the issue.
> >>
> >> The code itself looks fine to me.
> >
> >Note that the 1d482e666b8e had no Fixes: tag, otherwise I would have taken it.
>
> I'm aware it didn't. But that does not implicate this patch should have
> that commit as a "Fixes:" tag. Either have the correct one pointing out
> which commit introduced the issue or omit the "Fixes:" tag entirely.
> That's my point.

My point is that the cited commit should have fixed all points
where the nl_table_lock was read locked.

When we do locking changes, we have to look at the whole picture,
not the precise point where lockdep complained.

For instance, this is the reason this patch also changes  __netlink_diag_dump(),
even if the report had nothing to do about it yet.

So this Fixes: tag is fine, thank you.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net] netlink: fix potential deadlock in netlink_set_err()
  2023-06-22  8:39       ` Eric Dumazet
@ 2023-06-22  8:44         ` Johannes Berg
  0 siblings, 0 replies; 12+ messages in thread
From: Johannes Berg @ 2023-06-22  8:44 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jiri Pirko, David S . Miller, Jakub Kicinski, Paolo Abeni,
	netdev@vger.kernel.org, eric.dumazet@gmail.com,
	syzbot+a7d200a347f912723e5c@syzkaller.appspotmail.com

On Thu, 2023-06-22 at 10:39 +0200, Eric Dumazet wrote:
> > 
> > I first thought that'd be commit d4fa14cd62bd ("mac80211: use
> > ieee80211_free_txskb in a few more places") then, but that didn't call
> > to netlink yet ... so commit 8a2fbedcdc9b ("mac80211: combine
> > status/drop reporting"), but that's almost as old (and really old too,
> > kernel 3.8).
> > 
> > But again, I'm not sure it's worth worrying about ... Actually I'm
> > pretty sure it's _not_ worth worrying about :)
> 
> OK, should we revert your patch then ?

No no, I think we need that original commit, and also the fix now.

> I am slightly confused, you are saying this bug is not worth fixing ?

Sorry, yeah, that wasn't really clear. I'm saying it's no worth worrying
about backporting it to stable and really figuring out where it was
introduced, since it's never actually going to happen in practice. We
never even hit the code path that made lockdep show it in any prior
testing with lockdep (which we do too), and actually causing the
deadlock is far, far, less likely since you have to race simultaneously
with an interrupt from the same device (where the netdev down is
happening) doing something to the queue state ... so that already needs
multiple interfaces since one is just going down, etc.

> This will prevent syzbot from discovering other bugs, this is your call.

Oh sure, I agree we should fix it and I think your fix is fine. I'm just
not sure why we're having a discussion about backporting it and what it
really fixes when it's never really going to happen in practice, and
even syzbot couldn't figure out how to reproduce it due to the race
nature of the issue.

johannes

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net] netlink: fix potential deadlock in netlink_set_err()
  2023-06-22  8:42       ` Eric Dumazet
@ 2023-06-22  9:26         ` Jiri Pirko
  2023-06-22 10:10           ` Eric Dumazet
  0 siblings, 1 reply; 12+ messages in thread
From: Jiri Pirko @ 2023-06-22  9:26 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	eric.dumazet, syzbot+a7d200a347f912723e5c, Johannes Berg

Thu, Jun 22, 2023 at 10:42:34AM CEST, edumazet@google.com wrote:
>On Thu, Jun 22, 2023 at 10:29 AM Jiri Pirko <jiri@resnulli.us> wrote:
>>
>> Thu, Jun 22, 2023 at 10:14:56AM CEST, edumazet@google.com wrote:
>> >On Thu, Jun 22, 2023 at 10:04 AM Jiri Pirko <jiri@resnulli.us> wrote:
>> >>
>> >> Wed, Jun 21, 2023 at 05:43:37PM CEST, edumazet@google.com wrote:
>> >> >syzbot reported a possible deadlock in netlink_set_err() [1]
>> >> >
>> >> >A similar issue was fixed in commit 1d482e666b8e ("netlink: disable IRQs
>> >> >for netlink_lock_table()") in netlink_lock_table()
>> >> >
>> >> >This patch adds IRQ safety to netlink_set_err() and __netlink_diag_dump()
>> >> >which were not covered by cited commit.
>> >> >
>> >> >[1]
>> >> >
>> >> >WARNING: possible irq lock inversion dependency detected
>> >> >6.4.0-rc6-syzkaller-00240-g4e9f0ec38852 #0 Not tainted
>> >> >
>> >> >syz-executor.2/23011 just changed the state of lock:
>> >> >ffffffff8e1a7a58 (nl_table_lock){.+.?}-{2:2}, at: netlink_set_err+0x2e/0x3a0 net/netlink/af_netlink.c:1612
>> >> >but this lock was taken by another, SOFTIRQ-safe lock in the past:
>> >> > (&local->queue_stop_reason_lock){..-.}-{2:2}
>> >> >
>> >> >and interrupts could create inverse lock ordering between them.
>> >> >
>> >> >other info that might help us debug this:
>> >> > Possible interrupt unsafe locking scenario:
>> >> >
>> >> >       CPU0                    CPU1
>> >> >       ----                    ----
>> >> >  lock(nl_table_lock);
>> >> >                               local_irq_disable();
>> >> >                               lock(&local->queue_stop_reason_lock);
>> >> >                               lock(nl_table_lock);
>> >> >  <Interrupt>
>> >> >    lock(&local->queue_stop_reason_lock);
>> >> >
>> >> > *** DEADLOCK ***
>> >> >
>> >> >Fixes: 1d482e666b8e ("netlink: disable IRQs for netlink_lock_table()")
>> >>
>> >> I don't think that this "fixes" tag is correct. The referenced commit
>> >> is a fix to the same issue on a different codepath, not the one who
>> >> actually introduced the issue.
>> >>
>> >> The code itself looks fine to me.
>> >
>> >Note that the 1d482e666b8e had no Fixes: tag, otherwise I would have taken it.
>>
>> I'm aware it didn't. But that does not implicate this patch should have
>> that commit as a "Fixes:" tag. Either have the correct one pointing out
>> which commit introduced the issue or omit the "Fixes:" tag entirely.
>> That's my point.
>
>My point is that the cited commit should have fixed all points
>where the nl_table_lock was read locked.

Yeah, it was incomplete. I agree. I don't argue with that.

>
>When we do locking changes, we have to look at the whole picture,
>not the precise point where lockdep complained.
>
>For instance, this is the reason this patch also changes  __netlink_diag_dump(),
>even if the report had nothing to do about it yet.
>
>So this Fixes: tag is fine, thank you.

Then we have to agree to disagree I guess. It is not fine.

Quoting from Documentation/process/handling-regressions.rst:
  Add a "Fixes:" tag to specify the commit causing the regression.

Quoting from Documentation/process/submitting-patches.rst:
  A Fixes: tag indicates that the patch fixes an issue in a previous commit. It
  is used to make it easy to determine where a bug originated, which can help
  review a bug fix.

1d482e666b8e is not causing any regression (to be known of), definitelly
not the one this patch is fixing.

Misusing "Fixes" tag like this only adds unnecessary confusion.
That is my point from the beginning. I don't understand your resistance
to be honest.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net] netlink: fix potential deadlock in netlink_set_err()
  2023-06-22  9:26         ` Jiri Pirko
@ 2023-06-22 10:10           ` Eric Dumazet
  2023-06-22 11:27             ` Jiri Pirko
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2023-06-22 10:10 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	eric.dumazet, syzbot+a7d200a347f912723e5c, Johannes Berg

On Thu, Jun 22, 2023 at 11:27 AM Jiri Pirko <jiri@resnulli.us> wrote:
>
> Thu, Jun 22, 2023 at 10:42:34AM CEST, edumazet@google.com wrote:
> >On Thu, Jun 22, 2023 at 10:29 AM Jiri Pirko <jiri@resnulli.us> wrote:
> >>
> >> Thu, Jun 22, 2023 at 10:14:56AM CEST, edumazet@google.com wrote:
> >> >On Thu, Jun 22, 2023 at 10:04 AM Jiri Pirko <jiri@resnulli.us> wrote:
> >> >>
> >> >> Wed, Jun 21, 2023 at 05:43:37PM CEST, edumazet@google.com wrote:
> >> >> >syzbot reported a possible deadlock in netlink_set_err() [1]
> >> >> >
> >> >> >A similar issue was fixed in commit 1d482e666b8e ("netlink: disable IRQs
> >> >> >for netlink_lock_table()") in netlink_lock_table()
> >> >> >
> >> >> >This patch adds IRQ safety to netlink_set_err() and __netlink_diag_dump()
> >> >> >which were not covered by cited commit.
> >> >> >
> >> >> >[1]
> >> >> >
> >> >> >WARNING: possible irq lock inversion dependency detected
> >> >> >6.4.0-rc6-syzkaller-00240-g4e9f0ec38852 #0 Not tainted
> >> >> >
> >> >> >syz-executor.2/23011 just changed the state of lock:
> >> >> >ffffffff8e1a7a58 (nl_table_lock){.+.?}-{2:2}, at: netlink_set_err+0x2e/0x3a0 net/netlink/af_netlink.c:1612
> >> >> >but this lock was taken by another, SOFTIRQ-safe lock in the past:
> >> >> > (&local->queue_stop_reason_lock){..-.}-{2:2}
> >> >> >
> >> >> >and interrupts could create inverse lock ordering between them.
> >> >> >
> >> >> >other info that might help us debug this:
> >> >> > Possible interrupt unsafe locking scenario:
> >> >> >
> >> >> >       CPU0                    CPU1
> >> >> >       ----                    ----
> >> >> >  lock(nl_table_lock);
> >> >> >                               local_irq_disable();
> >> >> >                               lock(&local->queue_stop_reason_lock);
> >> >> >                               lock(nl_table_lock);
> >> >> >  <Interrupt>
> >> >> >    lock(&local->queue_stop_reason_lock);
> >> >> >
> >> >> > *** DEADLOCK ***
> >> >> >
> >> >> >Fixes: 1d482e666b8e ("netlink: disable IRQs for netlink_lock_table()")
> >> >>
> >> >> I don't think that this "fixes" tag is correct. The referenced commit
> >> >> is a fix to the same issue on a different codepath, not the one who
> >> >> actually introduced the issue.
> >> >>
> >> >> The code itself looks fine to me.
> >> >
> >> >Note that the 1d482e666b8e had no Fixes: tag, otherwise I would have taken it.
> >>
> >> I'm aware it didn't. But that does not implicate this patch should have
> >> that commit as a "Fixes:" tag. Either have the correct one pointing out
> >> which commit introduced the issue or omit the "Fixes:" tag entirely.
> >> That's my point.
> >
> >My point is that the cited commit should have fixed all points
> >where the nl_table_lock was read locked.
>
> Yeah, it was incomplete. I agree. I don't argue with that.
>
> >
> >When we do locking changes, we have to look at the whole picture,
> >not the precise point where lockdep complained.
> >
> >For instance, this is the reason this patch also changes  __netlink_diag_dump(),
> >even if the report had nothing to do about it yet.
> >
> >So this Fixes: tag is fine, thank you.
>
> Then we have to agree to disagree I guess. It is not fine.
>
> Quoting from Documentation/process/handling-regressions.rst:
>   Add a "Fixes:" tag to specify the commit causing the regression.
>
> Quoting from Documentation/process/submitting-patches.rst:
>   A Fixes: tag indicates that the patch fixes an issue in a previous commit. It
>   is used to make it easy to determine where a bug originated, which can help
>   review a bug fix.

The previous commit definitely had an issue, because it was not complete.

We have many other cases of commits that complete the work started earlier.

I will continue doing so, and I will continue writing changelogs that
displease you.

>
> 1d482e666b8e is not causing any regression (to be known of), definitelly
> not the one this patch is fixing.

>
> Misusing "Fixes" tag like this only adds unnecessary confusion.
> That is my point from the beginning. I don't understand your resistance
> to be honest.

To be honest, I do not understand you either, I suggest we move on,
we obviously are not on the same page.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net] netlink: fix potential deadlock in netlink_set_err()
  2023-06-22 10:10           ` Eric Dumazet
@ 2023-06-22 11:27             ` Jiri Pirko
  0 siblings, 0 replies; 12+ messages in thread
From: Jiri Pirko @ 2023-06-22 11:27 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	eric.dumazet, syzbot+a7d200a347f912723e5c, Johannes Berg

Thu, Jun 22, 2023 at 12:10:19PM CEST, edumazet@google.com wrote:
>On Thu, Jun 22, 2023 at 11:27 AM Jiri Pirko <jiri@resnulli.us> wrote:
>>
>> Thu, Jun 22, 2023 at 10:42:34AM CEST, edumazet@google.com wrote:
>> >On Thu, Jun 22, 2023 at 10:29 AM Jiri Pirko <jiri@resnulli.us> wrote:
>> >>
>> >> Thu, Jun 22, 2023 at 10:14:56AM CEST, edumazet@google.com wrote:
>> >> >On Thu, Jun 22, 2023 at 10:04 AM Jiri Pirko <jiri@resnulli.us> wrote:
>> >> >>
>> >> >> Wed, Jun 21, 2023 at 05:43:37PM CEST, edumazet@google.com wrote:
>> >> >> >syzbot reported a possible deadlock in netlink_set_err() [1]
>> >> >> >
>> >> >> >A similar issue was fixed in commit 1d482e666b8e ("netlink: disable IRQs
>> >> >> >for netlink_lock_table()") in netlink_lock_table()
>> >> >> >
>> >> >> >This patch adds IRQ safety to netlink_set_err() and __netlink_diag_dump()
>> >> >> >which were not covered by cited commit.
>> >> >> >
>> >> >> >[1]
>> >> >> >
>> >> >> >WARNING: possible irq lock inversion dependency detected
>> >> >> >6.4.0-rc6-syzkaller-00240-g4e9f0ec38852 #0 Not tainted
>> >> >> >
>> >> >> >syz-executor.2/23011 just changed the state of lock:
>> >> >> >ffffffff8e1a7a58 (nl_table_lock){.+.?}-{2:2}, at: netlink_set_err+0x2e/0x3a0 net/netlink/af_netlink.c:1612
>> >> >> >but this lock was taken by another, SOFTIRQ-safe lock in the past:
>> >> >> > (&local->queue_stop_reason_lock){..-.}-{2:2}
>> >> >> >
>> >> >> >and interrupts could create inverse lock ordering between them.
>> >> >> >
>> >> >> >other info that might help us debug this:
>> >> >> > Possible interrupt unsafe locking scenario:
>> >> >> >
>> >> >> >       CPU0                    CPU1
>> >> >> >       ----                    ----
>> >> >> >  lock(nl_table_lock);
>> >> >> >                               local_irq_disable();
>> >> >> >                               lock(&local->queue_stop_reason_lock);
>> >> >> >                               lock(nl_table_lock);
>> >> >> >  <Interrupt>
>> >> >> >    lock(&local->queue_stop_reason_lock);
>> >> >> >
>> >> >> > *** DEADLOCK ***
>> >> >> >
>> >> >> >Fixes: 1d482e666b8e ("netlink: disable IRQs for netlink_lock_table()")
>> >> >>
>> >> >> I don't think that this "fixes" tag is correct. The referenced commit
>> >> >> is a fix to the same issue on a different codepath, not the one who
>> >> >> actually introduced the issue.
>> >> >>
>> >> >> The code itself looks fine to me.
>> >> >
>> >> >Note that the 1d482e666b8e had no Fixes: tag, otherwise I would have taken it.
>> >>
>> >> I'm aware it didn't. But that does not implicate this patch should have
>> >> that commit as a "Fixes:" tag. Either have the correct one pointing out
>> >> which commit introduced the issue or omit the "Fixes:" tag entirely.
>> >> That's my point.
>> >
>> >My point is that the cited commit should have fixed all points
>> >where the nl_table_lock was read locked.
>>
>> Yeah, it was incomplete. I agree. I don't argue with that.
>>
>> >
>> >When we do locking changes, we have to look at the whole picture,
>> >not the precise point where lockdep complained.
>> >
>> >For instance, this is the reason this patch also changes  __netlink_diag_dump(),
>> >even if the report had nothing to do about it yet.
>> >
>> >So this Fixes: tag is fine, thank you.
>>
>> Then we have to agree to disagree I guess. It is not fine.
>>
>> Quoting from Documentation/process/handling-regressions.rst:
>>   Add a "Fixes:" tag to specify the commit causing the regression.
>>
>> Quoting from Documentation/process/submitting-patches.rst:
>>   A Fixes: tag indicates that the patch fixes an issue in a previous commit. It
>>   is used to make it easy to determine where a bug originated, which can help
>>   review a bug fix.
>
>The previous commit definitely had an issue, because it was not complete.

Is it the originator of the bug? No.


>
>We have many other cases of commits that complete the work started earlier.

The fact something was done wrongly in the past should not serve
as an argument to continue.


>
>I will continue doing so, and I will continue writing changelogs that
>displease you.

Awesome. Basically you say you don't care about the rules that others
are required to stick to. Making your statement personal is a bit
offending I have to say.


>
>>
>> 1d482e666b8e is not causing any regression (to be known of), definitelly
>> not the one this patch is fixing.
>
>>
>> Misusing "Fixes" tag like this only adds unnecessary confusion.
>> That is my point from the beginning. I don't understand your resistance
>> to be honest.
>
>To be honest, I do not understand you either, I suggest we move on,
>we obviously are not on the same page.

Yeah, well rules are here for very good reasons and should apply
to anyone. It's a matter of principles. Who else than one of the
maintainers should obey them and require people to obey them?

Just my 2 cents. I'm done with this thread.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net] netlink: fix potential deadlock in netlink_set_err()
  2023-06-21 15:43 [PATCH net] netlink: fix potential deadlock in netlink_set_err() Eric Dumazet
  2023-06-22  8:04 ` Jiri Pirko
@ 2023-06-23  2:40 ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 12+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-06-23  2:40 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: davem, kuba, pabeni, netdev, eric.dumazet,
	syzbot+a7d200a347f912723e5c, johannes.berg

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 21 Jun 2023 15:43:37 +0000 you wrote:
> syzbot reported a possible deadlock in netlink_set_err() [1]
> 
> A similar issue was fixed in commit 1d482e666b8e ("netlink: disable IRQs
> for netlink_lock_table()") in netlink_lock_table()
> 
> This patch adds IRQ safety to netlink_set_err() and __netlink_diag_dump()
> which were not covered by cited commit.
> 
> [...]

Here is the summary with links:
  - [net] netlink: fix potential deadlock in netlink_set_err()
    https://git.kernel.org/netdev/net/c/8d61f926d420

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-06-23  2:40 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-06-21 15:43 [PATCH net] netlink: fix potential deadlock in netlink_set_err() Eric Dumazet
2023-06-22  8:04 ` Jiri Pirko
2023-06-22  8:14   ` Eric Dumazet
2023-06-22  8:25     ` Johannes Berg
2023-06-22  8:39       ` Eric Dumazet
2023-06-22  8:44         ` Johannes Berg
2023-06-22  8:29     ` Jiri Pirko
2023-06-22  8:42       ` Eric Dumazet
2023-06-22  9:26         ` Jiri Pirko
2023-06-22 10:10           ` Eric Dumazet
2023-06-22 11:27             ` Jiri Pirko
2023-06-23  2:40 ` patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).