* [PATCH net] Revert "rds: tcp: Fix use-after-free of net in reqsk_timer_handler()."
@ 2024-05-21 14:49 Jason Xing
2024-05-21 15:49 ` Eric Dumazet
0 siblings, 1 reply; 7+ messages in thread
From: Jason Xing @ 2024-05-21 14:49 UTC (permalink / raw)
To: edumazet, dsahern, kuba, pabeni, davem, kuniyu
Cc: netdev, kerneljasonxing, Jason Xing, syzbot+2eca27bdcb48ed330251
From: Jason Xing <kernelxing@tencent.com>
This reverts commit 2a750d6a5b365265dbda33330a6188547ddb5c24.
Syzbot[1] reported the drecrement of reference count hits leaking memory.
If we failed in setup_net() and try to undo the setup process, the
reference now is 1 which shouldn't be decremented. However, it happened
actually.
After applying this patch which allows us to check the reference first,
it will not hit zero anymore in tcp_twsk_purge() without calling
inet_twsk_purge() one more time.
[1]
refcount_t: decrement hit 0; leaking memory.
WARNING: CPU: 3 PID: 1396 at lib/refcount.c:31 refcount_warn_saturate+0x1ed/0x210 lib/refcount.c:31
Modules linked in:
CPU: 3 PID: 1396 Comm: syz-executor.3 Not tainted 6.9.0-syzkaller-07370-g33e02dc69afb #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:refcount_warn_saturate+0x1ed/0x210 lib/refcount.c:31
RSP: 0018:ffffc9000480fa70 EFLAGS: 00010282
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffc9002ce28000
RDX: 0000000000040000 RSI: ffffffff81505406 RDI: 0000000000000001
RBP: ffff88804d8b3f80 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000002 R12: ffff88804d8b3f80
R13: ffff888031c601c0 R14: ffffc900013c04f8 R15: 000000002a3e5567
FS: 00007f56d897c6c0(0000) GS:ffff88806b300000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b3182b000 CR3: 0000000034ed6000 CR4: 0000000000350ef0
Call Trace:
<TASK>
__refcount_dec include/linux/refcount.h:336 [inline]
refcount_dec include/linux/refcount.h:351 [inline]
inet_twsk_kill+0x758/0x9c0 net/ipv4/inet_timewait_sock.c:70
inet_twsk_deschedule_put net/ipv4/inet_timewait_sock.c:221 [inline]
inet_twsk_purge+0x725/0x890 net/ipv4/inet_timewait_sock.c:304
tcp_twsk_purge+0x115/0x150 net/ipv4/tcp_minisocks.c:402
tcp_sk_exit_batch+0x1c/0x170 net/ipv4/tcp_ipv4.c:3522
ops_exit_list+0x128/0x180 net/core/net_namespace.c:178
setup_net+0x714/0xb40 net/core/net_namespace.c:375
copy_net_ns+0x2f0/0x670 net/core/net_namespace.c:508
create_new_namespaces+0x3ea/0xb10 kernel/nsproxy.c:110
unshare_nsproxy_namespaces+0xc0/0x1f0 kernel/nsproxy.c:228
ksys_unshare+0x419/0x970 kernel/fork.c:3323
__do_sys_unshare kernel/fork.c:3394 [inline]
__se_sys_unshare kernel/fork.c:3392 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:3392
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcf/0x260 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f56d7c7cee9
Fixes: 2a750d6a5b36 ("rds: tcp: Fix use-after-free of net in reqsk_timer_handler().")
Reported-by: syzbot+2eca27bdcb48ed330251@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=2eca27bdcb48ed330251
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
The reverted patch trying to solve another issue causes unexpected error as above. I
think that issue can be properly analyzed and handled later. So can we revert it first?
---
net/ipv4/tcp_minisocks.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index b93619b2384b..46e6f9db4227 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -399,6 +399,10 @@ void tcp_twsk_purge(struct list_head *net_exit_list)
/* Even if tw_refcount == 1, we must clean up kernel reqsk */
inet_twsk_purge(net->ipv4.tcp_death_row.hashinfo);
} else if (!purged_once) {
+ /* The last refcount is decremented in tcp_sk_exit_batch() */
+ if (refcount_read(&net->ipv4.tcp_death_row.tw_refcount) == 1)
+ continue;
+
inet_twsk_purge(&tcp_hashinfo);
purged_once = true;
}
--
2.37.3
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH net] Revert "rds: tcp: Fix use-after-free of net in reqsk_timer_handler()."
2024-05-21 14:49 [PATCH net] Revert "rds: tcp: Fix use-after-free of net in reqsk_timer_handler()." Jason Xing
@ 2024-05-21 15:49 ` Eric Dumazet
2024-05-22 6:27 ` Jason Xing
0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2024-05-21 15:49 UTC (permalink / raw)
To: Jason Xing
Cc: dsahern, kuba, pabeni, davem, kuniyu, netdev, Jason Xing,
syzbot+2eca27bdcb48ed330251
On Tue, May 21, 2024 at 4:49 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> From: Jason Xing <kernelxing@tencent.com>
>
> This reverts commit 2a750d6a5b365265dbda33330a6188547ddb5c24.
>
> Syzbot[1] reported the drecrement of reference count hits leaking memory.
>
> If we failed in setup_net() and try to undo the setup process, the
> reference now is 1 which shouldn't be decremented. However, it happened
> actually.
>
> After applying this patch which allows us to check the reference first,
> it will not hit zero anymore in tcp_twsk_purge() without calling
> inet_twsk_purge() one more time.
>
> [1]
> refcount_t: decrement hit 0; leaking memory.
> WARNING: CPU: 3 PID: 1396 at lib/refcount.c:31 refcount_warn_saturate+0x1ed/0x210 lib/refcount.c:31
> Modules linked in:
> CPU: 3 PID: 1396 Comm: syz-executor.3 Not tainted 6.9.0-syzkaller-07370-g33e02dc69afb #0
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> RIP: 0010:refcount_warn_saturate+0x1ed/0x210 lib/refcount.c:31
> RSP: 0018:ffffc9000480fa70 EFLAGS: 00010282
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffc9002ce28000
> RDX: 0000000000040000 RSI: ffffffff81505406 RDI: 0000000000000001
> RBP: ffff88804d8b3f80 R08: 0000000000000001 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000002 R12: ffff88804d8b3f80
> R13: ffff888031c601c0 R14: ffffc900013c04f8 R15: 000000002a3e5567
> FS: 00007f56d897c6c0(0000) GS:ffff88806b300000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000001b3182b000 CR3: 0000000034ed6000 CR4: 0000000000350ef0
> Call Trace:
> <TASK>
> __refcount_dec include/linux/refcount.h:336 [inline]
> refcount_dec include/linux/refcount.h:351 [inline]
> inet_twsk_kill+0x758/0x9c0 net/ipv4/inet_timewait_sock.c:70
> inet_twsk_deschedule_put net/ipv4/inet_timewait_sock.c:221 [inline]
> inet_twsk_purge+0x725/0x890 net/ipv4/inet_timewait_sock.c:304
> tcp_twsk_purge+0x115/0x150 net/ipv4/tcp_minisocks.c:402
> tcp_sk_exit_batch+0x1c/0x170 net/ipv4/tcp_ipv4.c:3522
> ops_exit_list+0x128/0x180 net/core/net_namespace.c:178
> setup_net+0x714/0xb40 net/core/net_namespace.c:375
> copy_net_ns+0x2f0/0x670 net/core/net_namespace.c:508
> create_new_namespaces+0x3ea/0xb10 kernel/nsproxy.c:110
> unshare_nsproxy_namespaces+0xc0/0x1f0 kernel/nsproxy.c:228
> ksys_unshare+0x419/0x970 kernel/fork.c:3323
> __do_sys_unshare kernel/fork.c:3394 [inline]
> __se_sys_unshare kernel/fork.c:3392 [inline]
> __x64_sys_unshare+0x31/0x40 kernel/fork.c:3392
> do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> do_syscall_64+0xcf/0x260 arch/x86/entry/common.c:83
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7f56d7c7cee9
>
> Fixes: 2a750d6a5b36 ("rds: tcp: Fix use-after-free of net in reqsk_timer_handler().")
> Reported-by: syzbot+2eca27bdcb48ed330251@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=2eca27bdcb48ed330251
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
> The reverted patch trying to solve another issue causes unexpected error as above. I
> think that issue can be properly analyzed and handled later. So can we revert it first?
> ---
> net/ipv4/tcp_minisocks.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
> index b93619b2384b..46e6f9db4227 100644
> --- a/net/ipv4/tcp_minisocks.c
> +++ b/net/ipv4/tcp_minisocks.c
> @@ -399,6 +399,10 @@ void tcp_twsk_purge(struct list_head *net_exit_list)
> /* Even if tw_refcount == 1, we must clean up kernel reqsk */
> inet_twsk_purge(net->ipv4.tcp_death_row.hashinfo);
> } else if (!purged_once) {
> + /* The last refcount is decremented in tcp_sk_exit_batch() */
> + if (refcount_read(&net->ipv4.tcp_death_row.tw_refcount) == 1)
> + continue;
> +
> inet_twsk_purge(&tcp_hashinfo);
> purged_once = true;
> }
> --
This can not be a fix for a race condition.
By definition a TW has a refcount on tcp_death_row.tw_refcount only
if its timer is armed.
And inet_twsk_deschedule_put() does
if (del_timer_sync(&tw->tw_timer))
inet_twsk_kill(tw);
I think you need to provide a full explanation, instead of a shot in the dark.
Before releasing this syzbot, I thought that maybe the refcount_inc()
was done too late,
but considering other invariants, this should not matter.
diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
index e28075f0006e333897ad379ebc8c87fc3f9643bd..d8f4d93c45331be64d70af96de33f783870e1dcc
100644
--- a/net/ipv4/inet_timewait_sock.c
+++ b/net/ipv4/inet_timewait_sock.c
@@ -255,8 +255,8 @@ void __inet_twsk_schedule(struct
inet_timewait_sock *tw, int timeo, bool rearm)
__NET_INC_STATS(twsk_net(tw), kill ? LINUX_MIB_TIMEWAITKILLED :
LINUX_MIB_TIMEWAITED);
- BUG_ON(mod_timer(&tw->tw_timer, jiffies + timeo));
refcount_inc(&tw->tw_dr->tw_refcount);
+ BUG_ON(mod_timer(&tw->tw_timer, jiffies + timeo));
} else {
mod_timer_pending(&tw->tw_timer, jiffies + timeo);
}
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net] Revert "rds: tcp: Fix use-after-free of net in reqsk_timer_handler()."
2024-05-21 15:49 ` Eric Dumazet
@ 2024-05-22 6:27 ` Jason Xing
2024-05-22 6:51 ` Eric Dumazet
0 siblings, 1 reply; 7+ messages in thread
From: Jason Xing @ 2024-05-22 6:27 UTC (permalink / raw)
To: Eric Dumazet
Cc: dsahern, kuba, pabeni, davem, kuniyu, netdev, Jason Xing,
syzbot+2eca27bdcb48ed330251
Hello Eric,
On Tue, May 21, 2024 at 11:49 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Tue, May 21, 2024 at 4:49 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > From: Jason Xing <kernelxing@tencent.com>
> >
> > This reverts commit 2a750d6a5b365265dbda33330a6188547ddb5c24.
> >
> > Syzbot[1] reported the drecrement of reference count hits leaking memory.
> >
> > If we failed in setup_net() and try to undo the setup process, the
> > reference now is 1 which shouldn't be decremented. However, it happened
> > actually.
> >
> > After applying this patch which allows us to check the reference first,
> > it will not hit zero anymore in tcp_twsk_purge() without calling
> > inet_twsk_purge() one more time.
> >
> > [1]
> > refcount_t: decrement hit 0; leaking memory.
> > WARNING: CPU: 3 PID: 1396 at lib/refcount.c:31 refcount_warn_saturate+0x1ed/0x210 lib/refcount.c:31
> > Modules linked in:
> > CPU: 3 PID: 1396 Comm: syz-executor.3 Not tainted 6.9.0-syzkaller-07370-g33e02dc69afb #0
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > RIP: 0010:refcount_warn_saturate+0x1ed/0x210 lib/refcount.c:31
> > RSP: 0018:ffffc9000480fa70 EFLAGS: 00010282
> > RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffc9002ce28000
> > RDX: 0000000000040000 RSI: ffffffff81505406 RDI: 0000000000000001
> > RBP: ffff88804d8b3f80 R08: 0000000000000001 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000002 R12: ffff88804d8b3f80
> > R13: ffff888031c601c0 R14: ffffc900013c04f8 R15: 000000002a3e5567
> > FS: 00007f56d897c6c0(0000) GS:ffff88806b300000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000001b3182b000 CR3: 0000000034ed6000 CR4: 0000000000350ef0
> > Call Trace:
> > <TASK>
> > __refcount_dec include/linux/refcount.h:336 [inline]
> > refcount_dec include/linux/refcount.h:351 [inline]
> > inet_twsk_kill+0x758/0x9c0 net/ipv4/inet_timewait_sock.c:70
> > inet_twsk_deschedule_put net/ipv4/inet_timewait_sock.c:221 [inline]
> > inet_twsk_purge+0x725/0x890 net/ipv4/inet_timewait_sock.c:304
> > tcp_twsk_purge+0x115/0x150 net/ipv4/tcp_minisocks.c:402
> > tcp_sk_exit_batch+0x1c/0x170 net/ipv4/tcp_ipv4.c:3522
> > ops_exit_list+0x128/0x180 net/core/net_namespace.c:178
> > setup_net+0x714/0xb40 net/core/net_namespace.c:375
> > copy_net_ns+0x2f0/0x670 net/core/net_namespace.c:508
> > create_new_namespaces+0x3ea/0xb10 kernel/nsproxy.c:110
> > unshare_nsproxy_namespaces+0xc0/0x1f0 kernel/nsproxy.c:228
> > ksys_unshare+0x419/0x970 kernel/fork.c:3323
> > __do_sys_unshare kernel/fork.c:3394 [inline]
> > __se_sys_unshare kernel/fork.c:3392 [inline]
> > __x64_sys_unshare+0x31/0x40 kernel/fork.c:3392
> > do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> > do_syscall_64+0xcf/0x260 arch/x86/entry/common.c:83
> > entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > RIP: 0033:0x7f56d7c7cee9
> >
> > Fixes: 2a750d6a5b36 ("rds: tcp: Fix use-after-free of net in reqsk_timer_handler().")
> > Reported-by: syzbot+2eca27bdcb48ed330251@syzkaller.appspotmail.com
> > Closes: https://syzkaller.appspot.com/bug?extid=2eca27bdcb48ed330251
> > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > ---
> > The reverted patch trying to solve another issue causes unexpected error as above. I
> > think that issue can be properly analyzed and handled later. So can we revert it first?
> > ---
> > net/ipv4/tcp_minisocks.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
> > index b93619b2384b..46e6f9db4227 100644
> > --- a/net/ipv4/tcp_minisocks.c
> > +++ b/net/ipv4/tcp_minisocks.c
> > @@ -399,6 +399,10 @@ void tcp_twsk_purge(struct list_head *net_exit_list)
> > /* Even if tw_refcount == 1, we must clean up kernel reqsk */
> > inet_twsk_purge(net->ipv4.tcp_death_row.hashinfo);
> > } else if (!purged_once) {
> > + /* The last refcount is decremented in tcp_sk_exit_batch() */
> > + if (refcount_read(&net->ipv4.tcp_death_row.tw_refcount) == 1)
> > + continue;
> > +
> > inet_twsk_purge(&tcp_hashinfo);
> > purged_once = true;
> > }
> > --
>
> This can not be a fix for a race condition.
>
> By definition a TW has a refcount on tcp_death_row.tw_refcount only
> if its timer is armed.
>
> And inet_twsk_deschedule_put() does
>
> if (del_timer_sync(&tw->tw_timer))
> inet_twsk_kill(tw);
>
> I think you need to provide a full explanation, instead of a shot in the dark.
>
> Before releasing this syzbot, I thought that maybe the refcount_inc()
> was done too late,
> but considering other invariants, this should not matter.
>
> diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
> index e28075f0006e333897ad379ebc8c87fc3f9643bd..d8f4d93c45331be64d70af96de33f783870e1dcc
> 100644
> --- a/net/ipv4/inet_timewait_sock.c
> +++ b/net/ipv4/inet_timewait_sock.c
> @@ -255,8 +255,8 @@ void __inet_twsk_schedule(struct
> inet_timewait_sock *tw, int timeo, bool rearm)
>
> __NET_INC_STATS(twsk_net(tw), kill ? LINUX_MIB_TIMEWAITKILLED :
> LINUX_MIB_TIMEWAITED);
> - BUG_ON(mod_timer(&tw->tw_timer, jiffies + timeo));
> refcount_inc(&tw->tw_dr->tw_refcount);
> + BUG_ON(mod_timer(&tw->tw_timer, jiffies + timeo));
> } else {
> mod_timer_pending(&tw->tw_timer, jiffies + timeo);
> }
Thanks for your information.
What you wrote is right, I think for a while. In
inet_twsk_deschedule_put(), there are two possible cases:
1) if it finds the timer is armed, then it can kill the tw socket by
decrementing the refcount in the right way. So it's a good/lucky thing
for us.
or
2) if it misses the point, then the tw socket arms the timer which
will expire in 60 seconds in the initialization phase. The tw socket
would have a chance to be destroyed when the timer expires.
It seems that you don't think your code could solve the race issue?
Thanks,
Jason
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net] Revert "rds: tcp: Fix use-after-free of net in reqsk_timer_handler()."
2024-05-22 6:27 ` Jason Xing
@ 2024-05-22 6:51 ` Eric Dumazet
2024-05-22 10:41 ` Jason Xing
0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2024-05-22 6:51 UTC (permalink / raw)
To: Jason Xing
Cc: dsahern, kuba, pabeni, davem, kuniyu, netdev, Jason Xing,
syzbot+2eca27bdcb48ed330251
On Wed, May 22, 2024 at 8:28 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> Hello Eric,
>
> On Tue, May 21, 2024 at 11:49 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Tue, May 21, 2024 at 4:49 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > >
> > > From: Jason Xing <kernelxing@tencent.com>
> > >
> > > This reverts commit 2a750d6a5b365265dbda33330a6188547ddb5c24.
> > >
> > > Syzbot[1] reported the drecrement of reference count hits leaking memory.
> > >
> > > If we failed in setup_net() and try to undo the setup process, the
> > > reference now is 1 which shouldn't be decremented. However, it happened
> > > actually.
> > >
> > > After applying this patch which allows us to check the reference first,
> > > it will not hit zero anymore in tcp_twsk_purge() without calling
> > > inet_twsk_purge() one more time.
> > >
> > > [1]
> > > refcount_t: decrement hit 0; leaking memory.
> > > WARNING: CPU: 3 PID: 1396 at lib/refcount.c:31 refcount_warn_saturate+0x1ed/0x210 lib/refcount.c:31
> > > Modules linked in:
> > > CPU: 3 PID: 1396 Comm: syz-executor.3 Not tainted 6.9.0-syzkaller-07370-g33e02dc69afb #0
> > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > > RIP: 0010:refcount_warn_saturate+0x1ed/0x210 lib/refcount.c:31
> > > RSP: 0018:ffffc9000480fa70 EFLAGS: 00010282
> > > RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffc9002ce28000
> > > RDX: 0000000000040000 RSI: ffffffff81505406 RDI: 0000000000000001
> > > RBP: ffff88804d8b3f80 R08: 0000000000000001 R09: 0000000000000000
> > > R10: 0000000000000000 R11: 0000000000000002 R12: ffff88804d8b3f80
> > > R13: ffff888031c601c0 R14: ffffc900013c04f8 R15: 000000002a3e5567
> > > FS: 00007f56d897c6c0(0000) GS:ffff88806b300000(0000) knlGS:0000000000000000
> > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 0000001b3182b000 CR3: 0000000034ed6000 CR4: 0000000000350ef0
> > > Call Trace:
> > > <TASK>
> > > __refcount_dec include/linux/refcount.h:336 [inline]
> > > refcount_dec include/linux/refcount.h:351 [inline]
> > > inet_twsk_kill+0x758/0x9c0 net/ipv4/inet_timewait_sock.c:70
> > > inet_twsk_deschedule_put net/ipv4/inet_timewait_sock.c:221 [inline]
> > > inet_twsk_purge+0x725/0x890 net/ipv4/inet_timewait_sock.c:304
> > > tcp_twsk_purge+0x115/0x150 net/ipv4/tcp_minisocks.c:402
> > > tcp_sk_exit_batch+0x1c/0x170 net/ipv4/tcp_ipv4.c:3522
> > > ops_exit_list+0x128/0x180 net/core/net_namespace.c:178
> > > setup_net+0x714/0xb40 net/core/net_namespace.c:375
> > > copy_net_ns+0x2f0/0x670 net/core/net_namespace.c:508
> > > create_new_namespaces+0x3ea/0xb10 kernel/nsproxy.c:110
> > > unshare_nsproxy_namespaces+0xc0/0x1f0 kernel/nsproxy.c:228
> > > ksys_unshare+0x419/0x970 kernel/fork.c:3323
> > > __do_sys_unshare kernel/fork.c:3394 [inline]
> > > __se_sys_unshare kernel/fork.c:3392 [inline]
> > > __x64_sys_unshare+0x31/0x40 kernel/fork.c:3392
> > > do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> > > do_syscall_64+0xcf/0x260 arch/x86/entry/common.c:83
> > > entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > > RIP: 0033:0x7f56d7c7cee9
> > >
> > > Fixes: 2a750d6a5b36 ("rds: tcp: Fix use-after-free of net in reqsk_timer_handler().")
> > > Reported-by: syzbot+2eca27bdcb48ed330251@syzkaller.appspotmail.com
> > > Closes: https://syzkaller.appspot.com/bug?extid=2eca27bdcb48ed330251
> > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > ---
> > > The reverted patch trying to solve another issue causes unexpected error as above. I
> > > think that issue can be properly analyzed and handled later. So can we revert it first?
> > > ---
> > > net/ipv4/tcp_minisocks.c | 4 ++++
> > > 1 file changed, 4 insertions(+)
> > >
> > > diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
> > > index b93619b2384b..46e6f9db4227 100644
> > > --- a/net/ipv4/tcp_minisocks.c
> > > +++ b/net/ipv4/tcp_minisocks.c
> > > @@ -399,6 +399,10 @@ void tcp_twsk_purge(struct list_head *net_exit_list)
> > > /* Even if tw_refcount == 1, we must clean up kernel reqsk */
> > > inet_twsk_purge(net->ipv4.tcp_death_row.hashinfo);
> > > } else if (!purged_once) {
> > > + /* The last refcount is decremented in tcp_sk_exit_batch() */
> > > + if (refcount_read(&net->ipv4.tcp_death_row.tw_refcount) == 1)
> > > + continue;
> > > +
> > > inet_twsk_purge(&tcp_hashinfo);
> > > purged_once = true;
> > > }
> > > --
> >
> > This can not be a fix for a race condition.
> >
> > By definition a TW has a refcount on tcp_death_row.tw_refcount only
> > if its timer is armed.
> >
> > And inet_twsk_deschedule_put() does
> >
> > if (del_timer_sync(&tw->tw_timer))
> > inet_twsk_kill(tw);
> >
> > I think you need to provide a full explanation, instead of a shot in the dark.
> >
> > Before releasing this syzbot, I thought that maybe the refcount_inc()
> > was done too late,
> > but considering other invariants, this should not matter.
> >
> > diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
> > index e28075f0006e333897ad379ebc8c87fc3f9643bd..d8f4d93c45331be64d70af96de33f783870e1dcc
> > 100644
> > --- a/net/ipv4/inet_timewait_sock.c
> > +++ b/net/ipv4/inet_timewait_sock.c
> > @@ -255,8 +255,8 @@ void __inet_twsk_schedule(struct
> > inet_timewait_sock *tw, int timeo, bool rearm)
> >
> > __NET_INC_STATS(twsk_net(tw), kill ? LINUX_MIB_TIMEWAITKILLED :
> > LINUX_MIB_TIMEWAITED);
> > - BUG_ON(mod_timer(&tw->tw_timer, jiffies + timeo));
> > refcount_inc(&tw->tw_dr->tw_refcount);
> > + BUG_ON(mod_timer(&tw->tw_timer, jiffies + timeo));
> > } else {
> > mod_timer_pending(&tw->tw_timer, jiffies + timeo);
> > }
>
> Thanks for your information.
>
> What you wrote is right, I think for a while. In
> inet_twsk_deschedule_put(), there are two possible cases:
> 1) if it finds the timer is armed, then it can kill the tw socket by
> decrementing the refcount in the right way. So it's a good/lucky thing
> for us.
> or
> 2) if it misses the point, then the tw socket arms the timer which
> will expire in 60 seconds in the initialization phase. The tw socket
> would have a chance to be destroyed when the timer expires.
>
> It seems that you don't think your code could solve the race issue?
inet_twsk_schedule(tw) is called while the tw refcount is still 0, so
this tw can be be found in inet_twsk_purge()
at the same time. Look at inet_twsk_hashdance() for details.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net] Revert "rds: tcp: Fix use-after-free of net in reqsk_timer_handler()."
2024-05-22 6:51 ` Eric Dumazet
@ 2024-05-22 10:41 ` Jason Xing
2024-05-22 10:53 ` Eric Dumazet
0 siblings, 1 reply; 7+ messages in thread
From: Jason Xing @ 2024-05-22 10:41 UTC (permalink / raw)
To: Eric Dumazet
Cc: dsahern, kuba, pabeni, davem, kuniyu, netdev, Jason Xing,
syzbot+2eca27bdcb48ed330251
On Wed, May 22, 2024 at 2:51 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Wed, May 22, 2024 at 8:28 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > Hello Eric,
> >
> > On Tue, May 21, 2024 at 11:49 PM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > On Tue, May 21, 2024 at 4:49 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > >
> > > > From: Jason Xing <kernelxing@tencent.com>
> > > >
> > > > This reverts commit 2a750d6a5b365265dbda33330a6188547ddb5c24.
> > > >
> > > > Syzbot[1] reported the drecrement of reference count hits leaking memory.
> > > >
> > > > If we failed in setup_net() and try to undo the setup process, the
> > > > reference now is 1 which shouldn't be decremented. However, it happened
> > > > actually.
> > > >
> > > > After applying this patch which allows us to check the reference first,
> > > > it will not hit zero anymore in tcp_twsk_purge() without calling
> > > > inet_twsk_purge() one more time.
> > > >
> > > > [1]
> > > > refcount_t: decrement hit 0; leaking memory.
> > > > WARNING: CPU: 3 PID: 1396 at lib/refcount.c:31 refcount_warn_saturate+0x1ed/0x210 lib/refcount.c:31
> > > > Modules linked in:
> > > > CPU: 3 PID: 1396 Comm: syz-executor.3 Not tainted 6.9.0-syzkaller-07370-g33e02dc69afb #0
> > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > > > RIP: 0010:refcount_warn_saturate+0x1ed/0x210 lib/refcount.c:31
> > > > RSP: 0018:ffffc9000480fa70 EFLAGS: 00010282
> > > > RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffc9002ce28000
> > > > RDX: 0000000000040000 RSI: ffffffff81505406 RDI: 0000000000000001
> > > > RBP: ffff88804d8b3f80 R08: 0000000000000001 R09: 0000000000000000
> > > > R10: 0000000000000000 R11: 0000000000000002 R12: ffff88804d8b3f80
> > > > R13: ffff888031c601c0 R14: ffffc900013c04f8 R15: 000000002a3e5567
> > > > FS: 00007f56d897c6c0(0000) GS:ffff88806b300000(0000) knlGS:0000000000000000
> > > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > CR2: 0000001b3182b000 CR3: 0000000034ed6000 CR4: 0000000000350ef0
> > > > Call Trace:
> > > > <TASK>
> > > > __refcount_dec include/linux/refcount.h:336 [inline]
> > > > refcount_dec include/linux/refcount.h:351 [inline]
> > > > inet_twsk_kill+0x758/0x9c0 net/ipv4/inet_timewait_sock.c:70
> > > > inet_twsk_deschedule_put net/ipv4/inet_timewait_sock.c:221 [inline]
> > > > inet_twsk_purge+0x725/0x890 net/ipv4/inet_timewait_sock.c:304
> > > > tcp_twsk_purge+0x115/0x150 net/ipv4/tcp_minisocks.c:402
> > > > tcp_sk_exit_batch+0x1c/0x170 net/ipv4/tcp_ipv4.c:3522
> > > > ops_exit_list+0x128/0x180 net/core/net_namespace.c:178
> > > > setup_net+0x714/0xb40 net/core/net_namespace.c:375
> > > > copy_net_ns+0x2f0/0x670 net/core/net_namespace.c:508
> > > > create_new_namespaces+0x3ea/0xb10 kernel/nsproxy.c:110
> > > > unshare_nsproxy_namespaces+0xc0/0x1f0 kernel/nsproxy.c:228
> > > > ksys_unshare+0x419/0x970 kernel/fork.c:3323
> > > > __do_sys_unshare kernel/fork.c:3394 [inline]
> > > > __se_sys_unshare kernel/fork.c:3392 [inline]
> > > > __x64_sys_unshare+0x31/0x40 kernel/fork.c:3392
> > > > do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> > > > do_syscall_64+0xcf/0x260 arch/x86/entry/common.c:83
> > > > entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > > > RIP: 0033:0x7f56d7c7cee9
> > > >
> > > > Fixes: 2a750d6a5b36 ("rds: tcp: Fix use-after-free of net in reqsk_timer_handler().")
> > > > Reported-by: syzbot+2eca27bdcb48ed330251@syzkaller.appspotmail.com
> > > > Closes: https://syzkaller.appspot.com/bug?extid=2eca27bdcb48ed330251
> > > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > > ---
> > > > The reverted patch trying to solve another issue causes unexpected error as above. I
> > > > think that issue can be properly analyzed and handled later. So can we revert it first?
> > > > ---
> > > > net/ipv4/tcp_minisocks.c | 4 ++++
> > > > 1 file changed, 4 insertions(+)
> > > >
> > > > diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
> > > > index b93619b2384b..46e6f9db4227 100644
> > > > --- a/net/ipv4/tcp_minisocks.c
> > > > +++ b/net/ipv4/tcp_minisocks.c
> > > > @@ -399,6 +399,10 @@ void tcp_twsk_purge(struct list_head *net_exit_list)
> > > > /* Even if tw_refcount == 1, we must clean up kernel reqsk */
> > > > inet_twsk_purge(net->ipv4.tcp_death_row.hashinfo);
> > > > } else if (!purged_once) {
> > > > + /* The last refcount is decremented in tcp_sk_exit_batch() */
> > > > + if (refcount_read(&net->ipv4.tcp_death_row.tw_refcount) == 1)
> > > > + continue;
> > > > +
> > > > inet_twsk_purge(&tcp_hashinfo);
> > > > purged_once = true;
> > > > }
> > > > --
> > >
> > > This can not be a fix for a race condition.
> > >
> > > By definition a TW has a refcount on tcp_death_row.tw_refcount only
> > > if its timer is armed.
> > >
> > > And inet_twsk_deschedule_put() does
> > >
> > > if (del_timer_sync(&tw->tw_timer))
> > > inet_twsk_kill(tw);
> > >
> > > I think you need to provide a full explanation, instead of a shot in the dark.
> > >
> > > Before releasing this syzbot, I thought that maybe the refcount_inc()
> > > was done too late,
> > > but considering other invariants, this should not matter.
> > >
> > > diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
> > > index e28075f0006e333897ad379ebc8c87fc3f9643bd..d8f4d93c45331be64d70af96de33f783870e1dcc
> > > 100644
> > > --- a/net/ipv4/inet_timewait_sock.c
> > > +++ b/net/ipv4/inet_timewait_sock.c
> > > @@ -255,8 +255,8 @@ void __inet_twsk_schedule(struct
> > > inet_timewait_sock *tw, int timeo, bool rearm)
> > >
> > > __NET_INC_STATS(twsk_net(tw), kill ? LINUX_MIB_TIMEWAITKILLED :
> > > LINUX_MIB_TIMEWAITED);
> > > - BUG_ON(mod_timer(&tw->tw_timer, jiffies + timeo));
> > > refcount_inc(&tw->tw_dr->tw_refcount);
> > > + BUG_ON(mod_timer(&tw->tw_timer, jiffies + timeo));
> > > } else {
> > > mod_timer_pending(&tw->tw_timer, jiffies + timeo);
> > > }
> >
> > Thanks for your information.
> >
> > What you wrote is right, I think for a while. In
> > inet_twsk_deschedule_put(), there are two possible cases:
> > 1) if it finds the timer is armed, then it can kill the tw socket by
> > decrementing the refcount in the right way. So it's a good/lucky thing
> > for us.
> > or
> > 2) if it misses the point, then the tw socket arms the timer which
> > will expire in 60 seconds in the initialization phase. The tw socket
> > would have a chance to be destroyed when the timer expires.
> >
> > It seems that you don't think your code could solve the race issue?
>
> inet_twsk_schedule(tw) is called while the tw refcount is still 0, so
> this tw can be be found in inet_twsk_purge()
> at the same time. Look at inet_twsk_hashdance() for details.
Yes, after inet_twsk_purge() finds the tw, there are two cases like my
previous email said after applying your code.
For 1) case, everything is good. inet_twsk_purge() will finish it up
because it can decrement the refcount safely.
For 2) case, even though inet_twsk_purge() finds it, it's not the time
to destroy it until the expired tw timer will finally handle the
process of destruction by calling inet_twsk_kill(), right? Let the
timer handle the destruction until its end of life, which I think is a
normal process for all the timewait sockets.
The only difference in 2) case is that inet_twsk_purge() calls
inet_twsk_put() twice while tw_timer_handler() only calls it one time.
Am I missing something?
Thanks,
Jason
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net] Revert "rds: tcp: Fix use-after-free of net in reqsk_timer_handler()."
2024-05-22 10:41 ` Jason Xing
@ 2024-05-22 10:53 ` Eric Dumazet
2024-05-22 15:37 ` Jason Xing
0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2024-05-22 10:53 UTC (permalink / raw)
To: Jason Xing
Cc: dsahern, kuba, pabeni, davem, kuniyu, netdev, Jason Xing,
syzbot+2eca27bdcb48ed330251
On Wed, May 22, 2024 at 12:42 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> On Wed, May 22, 2024 at 2:51 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Wed, May 22, 2024 at 8:28 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > >
> > > Hello Eric,
> > >
> > > On Tue, May 21, 2024 at 11:49 PM Eric Dumazet <edumazet@google.com> wrote:
> > > >
> > > > On Tue, May 21, 2024 at 4:49 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > > >
> > > > > From: Jason Xing <kernelxing@tencent.com>
> > > > >
> > > > > This reverts commit 2a750d6a5b365265dbda33330a6188547ddb5c24.
> > > > >
> > > > > Syzbot[1] reported the drecrement of reference count hits leaking memory.
> > > > >
> > > > > If we failed in setup_net() and try to undo the setup process, the
> > > > > reference now is 1 which shouldn't be decremented. However, it happened
> > > > > actually.
> > > > >
> > > > > After applying this patch which allows us to check the reference first,
> > > > > it will not hit zero anymore in tcp_twsk_purge() without calling
> > > > > inet_twsk_purge() one more time.
> > > > >
> > > > > [1]
> > > > > refcount_t: decrement hit 0; leaking memory.
> > > > > WARNING: CPU: 3 PID: 1396 at lib/refcount.c:31 refcount_warn_saturate+0x1ed/0x210 lib/refcount.c:31
> > > > > Modules linked in:
> > > > > CPU: 3 PID: 1396 Comm: syz-executor.3 Not tainted 6.9.0-syzkaller-07370-g33e02dc69afb #0
> > > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > > > > RIP: 0010:refcount_warn_saturate+0x1ed/0x210 lib/refcount.c:31
> > > > > RSP: 0018:ffffc9000480fa70 EFLAGS: 00010282
> > > > > RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffc9002ce28000
> > > > > RDX: 0000000000040000 RSI: ffffffff81505406 RDI: 0000000000000001
> > > > > RBP: ffff88804d8b3f80 R08: 0000000000000001 R09: 0000000000000000
> > > > > R10: 0000000000000000 R11: 0000000000000002 R12: ffff88804d8b3f80
> > > > > R13: ffff888031c601c0 R14: ffffc900013c04f8 R15: 000000002a3e5567
> > > > > FS: 00007f56d897c6c0(0000) GS:ffff88806b300000(0000) knlGS:0000000000000000
> > > > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > CR2: 0000001b3182b000 CR3: 0000000034ed6000 CR4: 0000000000350ef0
> > > > > Call Trace:
> > > > > <TASK>
> > > > > __refcount_dec include/linux/refcount.h:336 [inline]
> > > > > refcount_dec include/linux/refcount.h:351 [inline]
> > > > > inet_twsk_kill+0x758/0x9c0 net/ipv4/inet_timewait_sock.c:70
> > > > > inet_twsk_deschedule_put net/ipv4/inet_timewait_sock.c:221 [inline]
> > > > > inet_twsk_purge+0x725/0x890 net/ipv4/inet_timewait_sock.c:304
> > > > > tcp_twsk_purge+0x115/0x150 net/ipv4/tcp_minisocks.c:402
> > > > > tcp_sk_exit_batch+0x1c/0x170 net/ipv4/tcp_ipv4.c:3522
> > > > > ops_exit_list+0x128/0x180 net/core/net_namespace.c:178
> > > > > setup_net+0x714/0xb40 net/core/net_namespace.c:375
> > > > > copy_net_ns+0x2f0/0x670 net/core/net_namespace.c:508
> > > > > create_new_namespaces+0x3ea/0xb10 kernel/nsproxy.c:110
> > > > > unshare_nsproxy_namespaces+0xc0/0x1f0 kernel/nsproxy.c:228
> > > > > ksys_unshare+0x419/0x970 kernel/fork.c:3323
> > > > > __do_sys_unshare kernel/fork.c:3394 [inline]
> > > > > __se_sys_unshare kernel/fork.c:3392 [inline]
> > > > > __x64_sys_unshare+0x31/0x40 kernel/fork.c:3392
> > > > > do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> > > > > do_syscall_64+0xcf/0x260 arch/x86/entry/common.c:83
> > > > > entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > > > > RIP: 0033:0x7f56d7c7cee9
> > > > >
> > > > > Fixes: 2a750d6a5b36 ("rds: tcp: Fix use-after-free of net in reqsk_timer_handler().")
> > > > > Reported-by: syzbot+2eca27bdcb48ed330251@syzkaller.appspotmail.com
> > > > > Closes: https://syzkaller.appspot.com/bug?extid=2eca27bdcb48ed330251
> > > > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > > > ---
> > > > > The reverted patch trying to solve another issue causes unexpected error as above. I
> > > > > think that issue can be properly analyzed and handled later. So can we revert it first?
> > > > > ---
> > > > > net/ipv4/tcp_minisocks.c | 4 ++++
> > > > > 1 file changed, 4 insertions(+)
> > > > >
> > > > > diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
> > > > > index b93619b2384b..46e6f9db4227 100644
> > > > > --- a/net/ipv4/tcp_minisocks.c
> > > > > +++ b/net/ipv4/tcp_minisocks.c
> > > > > @@ -399,6 +399,10 @@ void tcp_twsk_purge(struct list_head *net_exit_list)
> > > > > /* Even if tw_refcount == 1, we must clean up kernel reqsk */
> > > > > inet_twsk_purge(net->ipv4.tcp_death_row.hashinfo);
> > > > > } else if (!purged_once) {
> > > > > + /* The last refcount is decremented in tcp_sk_exit_batch() */
> > > > > + if (refcount_read(&net->ipv4.tcp_death_row.tw_refcount) == 1)
> > > > > + continue;
> > > > > +
> > > > > inet_twsk_purge(&tcp_hashinfo);
> > > > > purged_once = true;
> > > > > }
> > > > > --
> > > >
> > > > This can not be a fix for a race condition.
> > > >
> > > > By definition a TW has a refcount on tcp_death_row.tw_refcount only
> > > > if its timer is armed.
> > > >
> > > > And inet_twsk_deschedule_put() does
> > > >
> > > > if (del_timer_sync(&tw->tw_timer))
> > > > inet_twsk_kill(tw);
> > > >
> > > > I think you need to provide a full explanation, instead of a shot in the dark.
> > > >
> > > > Before releasing this syzbot, I thought that maybe the refcount_inc()
> > > > was done too late,
> > > > but considering other invariants, this should not matter.
> > > >
> > > > diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
> > > > index e28075f0006e333897ad379ebc8c87fc3f9643bd..d8f4d93c45331be64d70af96de33f783870e1dcc
> > > > 100644
> > > > --- a/net/ipv4/inet_timewait_sock.c
> > > > +++ b/net/ipv4/inet_timewait_sock.c
> > > > @@ -255,8 +255,8 @@ void __inet_twsk_schedule(struct
> > > > inet_timewait_sock *tw, int timeo, bool rearm)
> > > >
> > > > __NET_INC_STATS(twsk_net(tw), kill ? LINUX_MIB_TIMEWAITKILLED :
> > > > LINUX_MIB_TIMEWAITED);
> > > > - BUG_ON(mod_timer(&tw->tw_timer, jiffies + timeo));
> > > > refcount_inc(&tw->tw_dr->tw_refcount);
> > > > + BUG_ON(mod_timer(&tw->tw_timer, jiffies + timeo));
> > > > } else {
> > > > mod_timer_pending(&tw->tw_timer, jiffies + timeo);
> > > > }
> > >
> > > Thanks for your information.
> > >
> > > What you wrote is right, I think for a while. In
> > > inet_twsk_deschedule_put(), there are two possible cases:
> > > 1) if it finds the timer is armed, then it can kill the tw socket by
> > > decrementing the refcount in the right way. So it's a good/lucky thing
> > > for us.
> > > or
> > > 2) if it misses the point, then the tw socket arms the timer which
> > > will expire in 60 seconds in the initialization phase. The tw socket
> > > would have a chance to be destroyed when the timer expires.
> > >
> > > It seems that you don't think your code could solve the race issue?
> >
> > inet_twsk_schedule(tw) is called while the tw refcount is still 0, so
> > this tw can be be found in inet_twsk_purge()
> > at the same time. Look at inet_twsk_hashdance() for details.
>
> Yes, after inet_twsk_purge() finds the tw, there are two cases like my
> previous email said after applying your code.
>
> For 1) case, everything is good. inet_twsk_purge() will finish it up
> because it can decrement the refcount safely.
>
> For 2) case, even though inet_twsk_purge() finds it, it's not the time
> to destroy it until the expired tw timer will finally handle the
> process of destruction by calling inet_twsk_kill(), right? Let the
> timer handle the destruction until its end of life, which I think is a
> normal process for all the timewait sockets.
>
> The only difference in 2) case is that inet_twsk_purge() calls
> inet_twsk_put() twice while tw_timer_handler() only calls it one time.
>
> Am I missing something?
You are missing that inet_twsk_deschedule_put() is doing :
if (del_timer_sync(&tw->tw_timer))
inet_twsk_kill(tw);
You can not have both tw_timer_handler() and
inet_twsk_deschedule_put() calling inet_twsk_kill(tw);
Take a look at del_timer_sync()
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net] Revert "rds: tcp: Fix use-after-free of net in reqsk_timer_handler()."
2024-05-22 10:53 ` Eric Dumazet
@ 2024-05-22 15:37 ` Jason Xing
0 siblings, 0 replies; 7+ messages in thread
From: Jason Xing @ 2024-05-22 15:37 UTC (permalink / raw)
To: Eric Dumazet
Cc: dsahern, kuba, pabeni, davem, kuniyu, netdev, Jason Xing,
syzbot+2eca27bdcb48ed330251
On Wed, May 22, 2024 at 6:54 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Wed, May 22, 2024 at 12:42 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > On Wed, May 22, 2024 at 2:51 PM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > On Wed, May 22, 2024 at 8:28 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > >
> > > > Hello Eric,
> > > >
> > > > On Tue, May 21, 2024 at 11:49 PM Eric Dumazet <edumazet@google.com> wrote:
> > > > >
> > > > > On Tue, May 21, 2024 at 4:49 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > > > >
> > > > > > From: Jason Xing <kernelxing@tencent.com>
> > > > > >
> > > > > > This reverts commit 2a750d6a5b365265dbda33330a6188547ddb5c24.
> > > > > >
> > > > > > Syzbot[1] reported the drecrement of reference count hits leaking memory.
> > > > > >
> > > > > > If we failed in setup_net() and try to undo the setup process, the
> > > > > > reference now is 1 which shouldn't be decremented. However, it happened
> > > > > > actually.
> > > > > >
> > > > > > After applying this patch which allows us to check the reference first,
> > > > > > it will not hit zero anymore in tcp_twsk_purge() without calling
> > > > > > inet_twsk_purge() one more time.
> > > > > >
> > > > > > [1]
> > > > > > refcount_t: decrement hit 0; leaking memory.
> > > > > > WARNING: CPU: 3 PID: 1396 at lib/refcount.c:31 refcount_warn_saturate+0x1ed/0x210 lib/refcount.c:31
> > > > > > Modules linked in:
> > > > > > CPU: 3 PID: 1396 Comm: syz-executor.3 Not tainted 6.9.0-syzkaller-07370-g33e02dc69afb #0
> > > > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > > > > > RIP: 0010:refcount_warn_saturate+0x1ed/0x210 lib/refcount.c:31
> > > > > > RSP: 0018:ffffc9000480fa70 EFLAGS: 00010282
> > > > > > RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffc9002ce28000
> > > > > > RDX: 0000000000040000 RSI: ffffffff81505406 RDI: 0000000000000001
> > > > > > RBP: ffff88804d8b3f80 R08: 0000000000000001 R09: 0000000000000000
> > > > > > R10: 0000000000000000 R11: 0000000000000002 R12: ffff88804d8b3f80
> > > > > > R13: ffff888031c601c0 R14: ffffc900013c04f8 R15: 000000002a3e5567
> > > > > > FS: 00007f56d897c6c0(0000) GS:ffff88806b300000(0000) knlGS:0000000000000000
> > > > > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > CR2: 0000001b3182b000 CR3: 0000000034ed6000 CR4: 0000000000350ef0
> > > > > > Call Trace:
> > > > > > <TASK>
> > > > > > __refcount_dec include/linux/refcount.h:336 [inline]
> > > > > > refcount_dec include/linux/refcount.h:351 [inline]
> > > > > > inet_twsk_kill+0x758/0x9c0 net/ipv4/inet_timewait_sock.c:70
> > > > > > inet_twsk_deschedule_put net/ipv4/inet_timewait_sock.c:221 [inline]
> > > > > > inet_twsk_purge+0x725/0x890 net/ipv4/inet_timewait_sock.c:304
> > > > > > tcp_twsk_purge+0x115/0x150 net/ipv4/tcp_minisocks.c:402
> > > > > > tcp_sk_exit_batch+0x1c/0x170 net/ipv4/tcp_ipv4.c:3522
> > > > > > ops_exit_list+0x128/0x180 net/core/net_namespace.c:178
> > > > > > setup_net+0x714/0xb40 net/core/net_namespace.c:375
> > > > > > copy_net_ns+0x2f0/0x670 net/core/net_namespace.c:508
> > > > > > create_new_namespaces+0x3ea/0xb10 kernel/nsproxy.c:110
> > > > > > unshare_nsproxy_namespaces+0xc0/0x1f0 kernel/nsproxy.c:228
> > > > > > ksys_unshare+0x419/0x970 kernel/fork.c:3323
> > > > > > __do_sys_unshare kernel/fork.c:3394 [inline]
> > > > > > __se_sys_unshare kernel/fork.c:3392 [inline]
> > > > > > __x64_sys_unshare+0x31/0x40 kernel/fork.c:3392
> > > > > > do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> > > > > > do_syscall_64+0xcf/0x260 arch/x86/entry/common.c:83
> > > > > > entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > > > > > RIP: 0033:0x7f56d7c7cee9
> > > > > >
> > > > > > Fixes: 2a750d6a5b36 ("rds: tcp: Fix use-after-free of net in reqsk_timer_handler().")
> > > > > > Reported-by: syzbot+2eca27bdcb48ed330251@syzkaller.appspotmail.com
> > > > > > Closes: https://syzkaller.appspot.com/bug?extid=2eca27bdcb48ed330251
> > > > > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > > > > ---
> > > > > > The reverted patch trying to solve another issue causes unexpected error as above. I
> > > > > > think that issue can be properly analyzed and handled later. So can we revert it first?
> > > > > > ---
> > > > > > net/ipv4/tcp_minisocks.c | 4 ++++
> > > > > > 1 file changed, 4 insertions(+)
> > > > > >
> > > > > > diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
> > > > > > index b93619b2384b..46e6f9db4227 100644
> > > > > > --- a/net/ipv4/tcp_minisocks.c
> > > > > > +++ b/net/ipv4/tcp_minisocks.c
> > > > > > @@ -399,6 +399,10 @@ void tcp_twsk_purge(struct list_head *net_exit_list)
> > > > > > /* Even if tw_refcount == 1, we must clean up kernel reqsk */
> > > > > > inet_twsk_purge(net->ipv4.tcp_death_row.hashinfo);
> > > > > > } else if (!purged_once) {
> > > > > > + /* The last refcount is decremented in tcp_sk_exit_batch() */
> > > > > > + if (refcount_read(&net->ipv4.tcp_death_row.tw_refcount) == 1)
> > > > > > + continue;
> > > > > > +
> > > > > > inet_twsk_purge(&tcp_hashinfo);
> > > > > > purged_once = true;
> > > > > > }
> > > > > > --
> > > > >
> > > > > This can not be a fix for a race condition.
> > > > >
> > > > > By definition a TW has a refcount on tcp_death_row.tw_refcount only
> > > > > if its timer is armed.
> > > > >
> > > > > And inet_twsk_deschedule_put() does
> > > > >
> > > > > if (del_timer_sync(&tw->tw_timer))
> > > > > inet_twsk_kill(tw);
> > > > >
> > > > > I think you need to provide a full explanation, instead of a shot in the dark.
> > > > >
> > > > > Before releasing this syzbot, I thought that maybe the refcount_inc()
> > > > > was done too late,
> > > > > but considering other invariants, this should not matter.
> > > > >
> > > > > diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
> > > > > index e28075f0006e333897ad379ebc8c87fc3f9643bd..d8f4d93c45331be64d70af96de33f783870e1dcc
> > > > > 100644
> > > > > --- a/net/ipv4/inet_timewait_sock.c
> > > > > +++ b/net/ipv4/inet_timewait_sock.c
> > > > > @@ -255,8 +255,8 @@ void __inet_twsk_schedule(struct
> > > > > inet_timewait_sock *tw, int timeo, bool rearm)
> > > > >
> > > > > __NET_INC_STATS(twsk_net(tw), kill ? LINUX_MIB_TIMEWAITKILLED :
> > > > > LINUX_MIB_TIMEWAITED);
> > > > > - BUG_ON(mod_timer(&tw->tw_timer, jiffies + timeo));
> > > > > refcount_inc(&tw->tw_dr->tw_refcount);
> > > > > + BUG_ON(mod_timer(&tw->tw_timer, jiffies + timeo));
> > > > > } else {
> > > > > mod_timer_pending(&tw->tw_timer, jiffies + timeo);
> > > > > }
> > > >
> > > > Thanks for your information.
> > > >
> > > > What you wrote is right, I think for a while. In
> > > > inet_twsk_deschedule_put(), there are two possible cases:
> > > > 1) if it finds the timer is armed, then it can kill the tw socket by
> > > > decrementing the refcount in the right way. So it's a good/lucky thing
> > > > for us.
> > > > or
> > > > 2) if it misses the point, then the tw socket arms the timer which
> > > > will expire in 60 seconds in the initialization phase. The tw socket
> > > > would have a chance to be destroyed when the timer expires.
> > > >
> > > > It seems that you don't think your code could solve the race issue?
> > >
> > > inet_twsk_schedule(tw) is called while the tw refcount is still 0, so
> > > this tw can be be found in inet_twsk_purge()
> > > at the same time. Look at inet_twsk_hashdance() for details.
> >
> > Yes, after inet_twsk_purge() finds the tw, there are two cases like my
> > previous email said after applying your code.
> >
> > For 1) case, everything is good. inet_twsk_purge() will finish it up
> > because it can decrement the refcount safely.
> >
> > For 2) case, even though inet_twsk_purge() finds it, it's not the time
> > to destroy it until the expired tw timer will finally handle the
> > process of destruction by calling inet_twsk_kill(), right? Let the
> > timer handle the destruction until its end of life, which I think is a
> > normal process for all the timewait sockets.
> >
> > The only difference in 2) case is that inet_twsk_purge() calls
> > inet_twsk_put() twice while tw_timer_handler() only calls it one time.
> >
> > Am I missing something?
>
> You are missing that inet_twsk_deschedule_put() is doing :
>
> if (del_timer_sync(&tw->tw_timer))
> inet_twsk_kill(tw);
>
> You can not have both tw_timer_handler() and
> inet_twsk_deschedule_put() calling inet_twsk_kill(tw);
[...]
>
> Take a look at del_timer_sync()
Oops, I missed this function...Thanks for pointing it out.
I can test if the timer is active or queued before we can delete, like
this on top of your patch:
diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
index e28075f0006e..b890d1c280a1 100644
--- a/net/ipv4/inet_timewait_sock.c
+++ b/net/ipv4/inet_timewait_sock.c
@@ -255,8 +255,8 @@ void __inet_twsk_schedule(struct
inet_timewait_sock *tw, int timeo, bool rearm)
__NET_INC_STATS(twsk_net(tw), kill ? LINUX_MIB_TIMEWAITKILLED :
LINUX_MIB_TIMEWAITED);
- BUG_ON(mod_timer(&tw->tw_timer, jiffies + timeo));
refcount_inc(&tw->tw_dr->tw_refcount);
+ BUG_ON(mod_timer(&tw->tw_timer, jiffies + timeo));
} else {
mod_timer_pending(&tw->tw_timer, jiffies + timeo);
}
@@ -301,7 +301,14 @@ void inet_twsk_purge(struct inet_hashinfo *hashinfo)
rcu_read_unlock();
local_bh_disable();
if (state == TCP_TIME_WAIT) {
- inet_twsk_deschedule_put(inet_twsk(sk));
+ struct inet_timewait_sock *tw = inet_twsk(sk);
+
+ /* If the timer is armed, we can safely destroy
+ * it, or else we postpone the process
of destruction
+ * to tw_timer_handler().
+ */
+ if (timer_pending(&tw->tw_timer))
+ inet_twsk_deschedule_put(tw);
} else {
struct request_sock *req = inet_reqsk(sk);
As above, if we find the timer is armed, then we can destroy it
naturally in inet_twsk_deschedule_put(). If not, we can skip it in the
inet_twsk_purge() and then postpone the destruction process in the
timer handler.
I think I need to add another test about whether the timer is running or not.
If testing the status of timer is not a good way to go, perhaps I
would like to introduce a new flag to indicate whether the tw socket
still exists in the initialization phase.
Sorry, It's __not__ tested because it's too late for me, I will spend
more time taking care of this race condition tomorrow.
Thanks,
Jason
^ permalink raw reply related [flat|nested] 7+ messages in thread
end of thread, other threads:[~2024-05-22 15:38 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-21 14:49 [PATCH net] Revert "rds: tcp: Fix use-after-free of net in reqsk_timer_handler()." Jason Xing
2024-05-21 15:49 ` Eric Dumazet
2024-05-22 6:27 ` Jason Xing
2024-05-22 6:51 ` Eric Dumazet
2024-05-22 10:41 ` Jason Xing
2024-05-22 10:53 ` Eric Dumazet
2024-05-22 15:37 ` Jason Xing
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).