From: Florian Westphal <fw@strlen.de>
To: Florian Westphal <fw@strlen.de>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>,
	syzbot+8ea26396ff85d23a8929@syzkaller.appspotmail.com,
	davem@davemloft.net, dsahern@kernel.org, edumazet@google.com,
	kuba@kernel.org, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, pabeni@redhat.com,
	syzkaller-bugs@googlegroups.com
Subject: Re: [syzbot] [net?] WARNING: refcount bug in inet_twsk_kill
Date: Sun, 11 Aug 2024 18:28:50 +0200
Message-ID: <20240811162850.GE13736@breakpoint.cc>
In-Reply-To: <20240811145443.GD13736@breakpoint.cc>

Florian Westphal <fw@strlen.de> wrote:
> https://syzkaller.appspot.com/x/log.txt?x=117f3182980000
> 
> ... shows two cores racing:
> 
> [ 3127.234402][ T1396] CPU: 3 PID: 1396 Comm: syz-executor.3 Not
> and
> [ 3127.257864][   T13] CPU: 1 PID: 13 Comm: kworker/u32:1 Not tainted 6.9.0-syzkalle (netns cleanup net).
> 
> 
> first splat backtrace shows invocation of tcp_sk_exit_batch() from
> netns error unwinding code.
> 
> Second one lacks backtrace, but it's also in tcp_sk_exit_batch(),

... which doesn't work.  Does this look like a plausible
theory/explanation?

Given:
1 exiting netns, has >= 1 tw sk.
1 (unrelated) netns that failed in setup_net

... we run into the following race:

exiting netns, from cleanup wq, calls tcp_sk_exit_batch(), which calls
inet_twsk_purge(&tcp_hashinfo).

At the same time, from the error unwinding code, we also call
tcp_sk_exit_batch().

Both threads walk tcp_hashinfo ehash buckets.

From the work queue (normal netns exit path), we hit

        if (state == TCP_TIME_WAIT) {
                inet_twsk_deschedule_put(inet_twsk(sk));

Because both threads operate on tcp_hashinfo, the unrelated
struct net (exiting net) is also visible to the error-unwinding thread.

So the error unwinding code will call

        if (state == TCP_TIME_WAIT) {
                inet_twsk_deschedule_put(inet_twsk(sk));

for the same tw sk and both threads do

void inet_twsk_deschedule_put(struct inet_timewait_sock *tw)
{
        if (del_timer_sync(&tw->tw_timer))
                inet_twsk_kill(tw);
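
The property that matters in this excerpt is that, for one pending timer,
only one of the two callers sees a nonzero return from the timer cancel and
therefore only one of them calls inet_twsk_kill(). A minimal userspace
sketch of that "single winner" behaviour, assuming a hypothetical
model_timer type and model_cancel()/model_deschedule() helpers built on C11
atomics (these names are illustrative, not kernel APIs):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a pending kernel timer; not a kernel type. */
struct model_timer {
	atomic_bool pending;
};

/*
 * Models only the return-value contract relied on above: true for the
 * one caller that actually deactivated a pending timer, false otherwise.
 */
bool model_cancel(struct model_timer *t)
{
	assert(t);
	return atomic_exchange(&t->pending, false);
}

/*
 * Models the quoted part of inet_twsk_deschedule_put() and returns how
 * many times this caller would have invoked inet_twsk_kill() (0 or 1).
 */
int model_deschedule(struct model_timer *t)
{
	int kills = 0;

	if (model_cancel(t))
		kills++;	/* corresponds to inet_twsk_kill(tw) */
	return kills;
}
```

Calling model_deschedule() twice on the same timer gives one kill for the
first caller and none for the second, which is the situation the work queue
ends up in below.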

The error unwind path cancels the timer and calls inet_twsk_kill(), while
the work queue sees the timer as already shut down, so it ends up
returning to tcp_sk_exit_batch(), where it will WARN here:

  WARN_ON_ONCE(!refcount_dec_and_test(&net->ipv4.tcp_death_row.tw_refcount));

... because the supposedly-last tw_refcount decrement did not drop
it down to 0.

Meanwhile, the error unwinding thread calls refcount_dec() on
tw_refcount, which now drops down to 0 instead of 1, which
provides another warn splat.
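
The two splats follow from the order of the decrements alone. A rough
sketch of that arithmetic, assuming a single tw sk so that tw_refcount
starts at 2, and using a hypothetical model_ref counter rather than the
kernel's refcount_t (all names here are made up for illustration):

```c
#include <assert.h>

/* Hypothetical userspace model of tw_refcount; not the kernel refcount_t. */
struct model_ref {
	int count;
	int warnings;	/* number of WARN-like events observed */
};

/* Like refcount_dec() in inet_twsk_kill(): reaching 0 here is a bug. */
void model_ref_dec(struct model_ref *r)
{
	assert(r->count > 0);
	if (--r->count == 0)
		r->warnings++;
}

/* Like WARN_ON_ONCE(!refcount_dec_and_test(...)): must be the last put. */
void model_final_put(struct model_ref *r)
{
	assert(r->count > 0);
	if (--r->count != 0)
		r->warnings++;
}

/* Expected order: the tw sk is killed first, then the final put runs. */
struct model_ref model_ordered(void)
{
	struct model_ref r = { .count = 2, .warnings = 0 };

	model_ref_dec(&r);	/* inet_twsk_kill(): 2 -> 1 */
	model_final_put(&r);	/* tcp_sk_exit_batch(): 1 -> 0 */
	return r;
}

/* Raced order: the final put runs before the other thread's kill. */
struct model_ref model_raced(void)
{
	struct model_ref r = { .count = 2, .warnings = 0 };

	model_final_put(&r);	/* work queue: 2 -> 1, first splat */
	model_ref_dec(&r);	/* error unwind: 1 -> 0, second splat */
	return r;
}
```

In this model the ordered sequence produces no warnings, while the raced
sequence produces two even though the counter still ends at 0.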

I'll ponder ways to fix this tomorrow unless someone
else already has a better theory/solution.
