* [PATCH v1 net] tcp/dccp: Don't use timer_pending() in reqsk_queue_unlink().
@ 2024-10-07 14:15 Kuniyuki Iwashima
2024-10-07 23:26 ` Jakub Kicinski
0 siblings, 1 reply; 8+ messages in thread
From: Kuniyuki Iwashima @ 2024-10-07 14:15 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
David Ahern
Cc: Martin KaFai Lau, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev
Martin KaFai Lau reported use-after-free [0] in reqsk_timer_handler().
"""
We are seeing a use-after-free from a bpf prog attached to
trace_tcp_retransmit_synack. The program passes the req->sk to the
bpf_sk_storage_get_tracing kernel helper which does check for null
before using it.
"""
The commit 83fccfc3940c ("inet: fix potential deadlock in
reqsk_queue_unlink()") added timer_pending() in reqsk_queue_unlink() not
to call del_timer_sync() from reqsk_timer_handler(), but it introduced a
small race window.
Before the timer is called, expire_timers() calls detach_timer(timer, true)
to clear timer->entry.pprev and marks it as not pending.
If reqsk_queue_unlink() checks timer_pending() just before expire_timers()
calls detach_timer(), TCP will miss del_timer_sync(); the reqsk timer will
continue running and send multiple SYN+ACKs until it expires.
The reported UAF could happen if req->sk is close()d earlier than the timer
expiration.
Let's not use timer_pending() by passing the caller context to
__inet_csk_reqsk_queue_drop().
[0]
BUG: KFENCE: use-after-free read in bpf_sk_storage_get_tracing+0x2e/0x1b0
Use-after-free read at 0x00000000a891fb3a (in kfence-#1):
bpf_sk_storage_get_tracing+0x2e/0x1b0
bpf_prog_5ea3e95db6da0438_tcp_retransmit_synack+0x1d20/0x1dda
bpf_trace_run2+0x4c/0xc0
tcp_rtx_synack+0xf9/0x100
reqsk_timer_handler+0xda/0x3d0
run_timer_softirq+0x292/0x8a0
irq_exit_rcu+0xf5/0x320
sysvec_apic_timer_interrupt+0x6d/0x80
asm_sysvec_apic_timer_interrupt+0x16/0x20
intel_idle_irq+0x5a/0xa0
cpuidle_enter_state+0x94/0x273
cpu_startup_entry+0x15e/0x260
start_secondary+0x8a/0x90
secondary_startup_64_no_verify+0xfa/0xfb
kfence-#1: 0x00000000a72cc7b6-0x00000000d97616d9, size=2376, cache=TCPv6
allocated by task 0 on cpu 9 at 260507.901592s:
sk_prot_alloc+0x35/0x140
sk_clone_lock+0x1f/0x3f0
inet_csk_clone_lock+0x15/0x160
tcp_create_openreq_child+0x1f/0x410
tcp_v6_syn_recv_sock+0x1da/0x700
tcp_check_req+0x1fb/0x510
tcp_v6_rcv+0x98b/0x1420
ipv6_list_rcv+0x2258/0x26e0
napi_complete_done+0x5b1/0x2990
mlx5e_napi_poll+0x2ae/0x8d0
net_rx_action+0x13e/0x590
irq_exit_rcu+0xf5/0x320
common_interrupt+0x80/0x90
asm_common_interrupt+0x22/0x40
cpuidle_enter_state+0xfb/0x273
cpu_startup_entry+0x15e/0x260
start_secondary+0x8a/0x90
secondary_startup_64_no_verify+0xfa/0xfb
freed by task 0 on cpu 9 at 260507.927527s:
rcu_core_si+0x4ff/0xf10
irq_exit_rcu+0xf5/0x320
sysvec_apic_timer_interrupt+0x6d/0x80
asm_sysvec_apic_timer_interrupt+0x16/0x20
cpuidle_enter_state+0xfb/0x273
cpu_startup_entry+0x15e/0x260
start_secondary+0x8a/0x90
secondary_startup_64_no_verify+0xfa/0xfb
Fixes: 83fccfc3940c ("inet: fix potential deadlock in reqsk_queue_unlink()")
Reported-by: Martin KaFai Lau <martin.lau@kernel.org>
Closes: https://lore.kernel.org/netdev/eb6684d0-ffd9-4bdc-9196-33f690c25824@linux.dev/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
net/ipv4/inet_connection_sock.c | 21 ++++++++++++++++-----
1 file changed, 16 insertions(+), 5 deletions(-)
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 2c5632d4fddb..36f03d51356e 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -1045,12 +1045,13 @@ static bool reqsk_queue_unlink(struct request_sock *req)
found = __sk_nulls_del_node_init_rcu(sk);
spin_unlock(lock);
}
- if (timer_pending(&req->rsk_timer) && del_timer_sync(&req->rsk_timer))
- reqsk_put(req);
+
return found;
}
-bool inet_csk_reqsk_queue_drop(struct sock *sk, struct request_sock *req)
+static bool __inet_csk_reqsk_queue_drop(struct sock *sk,
+ struct request_sock *req,
+ bool from_timer)
{
bool unlinked = reqsk_queue_unlink(req);
@@ -1058,8 +1059,17 @@ bool inet_csk_reqsk_queue_drop(struct sock *sk, struct request_sock *req)
reqsk_queue_removed(&inet_csk(sk)->icsk_accept_queue, req);
reqsk_put(req);
}
+
+ if (!from_timer && timer_delete_sync(&req->rsk_timer))
+ reqsk_put(req);
+
return unlinked;
}
+
+bool inet_csk_reqsk_queue_drop(struct sock *sk, struct request_sock *req)
+{
+ return __inet_csk_reqsk_queue_drop(sk, req, false);
+}
EXPORT_SYMBOL(inet_csk_reqsk_queue_drop);
void inet_csk_reqsk_queue_drop_and_put(struct sock *sk, struct request_sock *req)
@@ -1152,7 +1162,7 @@ static void reqsk_timer_handler(struct timer_list *t)
if (!inet_ehash_insert(req_to_sk(nreq), req_to_sk(oreq), NULL)) {
/* delete timer */
- inet_csk_reqsk_queue_drop(sk_listener, nreq);
+ __inet_csk_reqsk_queue_drop(sk_listener, nreq, true);
goto no_ownership;
}
@@ -1178,7 +1188,8 @@ static void reqsk_timer_handler(struct timer_list *t)
}
drop:
- inet_csk_reqsk_queue_drop_and_put(oreq->rsk_listener, oreq);
+ __inet_csk_reqsk_queue_drop(sk_listener, nreq, true);
+ reqsk_put(req);
}
static bool reqsk_queue_hash_req(struct request_sock *req,
--
2.30.2
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v1 net] tcp/dccp: Don't use timer_pending() in reqsk_queue_unlink().
2024-10-07 14:15 [PATCH v1 net] tcp/dccp: Don't use timer_pending() in reqsk_queue_unlink() Kuniyuki Iwashima
@ 2024-10-07 23:26 ` Jakub Kicinski
2024-10-07 23:52 ` Kuniyuki Iwashima
0 siblings, 1 reply; 8+ messages in thread
From: Jakub Kicinski @ 2024-10-07 23:26 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: David S. Miller, Eric Dumazet, Paolo Abeni, David Ahern,
Martin KaFai Lau, Kuniyuki Iwashima, netdev
On Mon, 7 Oct 2024 07:15:57 -0700 Kuniyuki Iwashima wrote:
> Martin KaFai Lau reported use-after-free [0] in reqsk_timer_handler().
>
> """
> We are seeing a use-after-free from a bpf prog attached to
> trace_tcp_retransmit_synack. The program passes the req->sk to the
> bpf_sk_storage_get_tracing kernel helper which does check for null
> before using it.
> """
I think this crashes a bunch of selftests, example:
https://netdev-3.bots.linux.dev/vmksft-nf-dbg/results/805581/8-nft-queue-sh/stderr
--
pw-bot: cr
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v1 net] tcp/dccp: Don't use timer_pending() in reqsk_queue_unlink().
2024-10-07 23:26 ` Jakub Kicinski
@ 2024-10-07 23:52 ` Kuniyuki Iwashima
2024-10-08 9:54 ` Eric Dumazet
0 siblings, 1 reply; 8+ messages in thread
From: Kuniyuki Iwashima @ 2024-10-07 23:52 UTC (permalink / raw)
To: kuba; +Cc: davem, dsahern, edumazet, kuni1840, kuniyu, martin.lau, netdev,
pabeni
From: Jakub Kicinski <kuba@kernel.org>
Date: Mon, 7 Oct 2024 16:26:10 -0700
> On Mon, 7 Oct 2024 07:15:57 -0700 Kuniyuki Iwashima wrote:
> > Martin KaFai Lau reported use-after-free [0] in reqsk_timer_handler().
> >
> > """
> > We are seeing a use-after-free from a bpf prog attached to
> > trace_tcp_retransmit_synack. The program passes the req->sk to the
> > bpf_sk_storage_get_tracing kernel helper which does check for null
> > before using it.
> > """
>
> I think this crashes a bunch of selftests, example:
>
> https://netdev-3.bots.linux.dev/vmksft-nf-dbg/results/805581/8-nft-queue-sh/stderr
Oops, sorry, I copy-and-pasted __inet_csk_reqsk_queue_drop()
for different reqsk. I'll squash the diff below.
---8<---
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 36f03d51356e..433c80dc57d5 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -1188,7 +1190,7 @@ static void reqsk_timer_handler(struct timer_list *t)
}
drop:
- __inet_csk_reqsk_queue_drop(sk_listener, nreq, true);
+ __inet_csk_reqsk_queue_drop(sk_listener, oreq, true);
reqsk_put(req);
}
---8<---
Thanks!
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v1 net] tcp/dccp: Don't use timer_pending() in reqsk_queue_unlink().
2024-10-07 23:52 ` Kuniyuki Iwashima
@ 2024-10-08 9:54 ` Eric Dumazet
2024-10-08 14:21 ` Kuniyuki Iwashima
0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2024-10-08 9:54 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: kuba, davem, dsahern, kuni1840, martin.lau, netdev, pabeni
On Tue, Oct 8, 2024 at 1:53 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> From: Jakub Kicinski <kuba@kernel.org>
> Date: Mon, 7 Oct 2024 16:26:10 -0700
> > On Mon, 7 Oct 2024 07:15:57 -0700 Kuniyuki Iwashima wrote:
> > > Martin KaFai Lau reported use-after-free [0] in reqsk_timer_handler().
> > >
> > > """
> > > We are seeing a use-after-free from a bpf prog attached to
> > > trace_tcp_retransmit_synack. The program passes the req->sk to the
> > > bpf_sk_storage_get_tracing kernel helper which does check for null
> > > before using it.
> > > """
> >
> > I think this crashes a bunch of selftests, example:
> >
> > https://netdev-3.bots.linux.dev/vmksft-nf-dbg/results/805581/8-nft-queue-sh/stderr
>
> Oops, sorry, I copy-and-pasted __inet_csk_reqsk_queue_drop()
> for different reqsk. I'll squash the diff below.
>
> ---8<---
> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> index 36f03d51356e..433c80dc57d5 100644
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -1188,7 +1190,7 @@ static void reqsk_timer_handler(struct timer_list *t)
> }
>
> drop:
> - __inet_csk_reqsk_queue_drop(sk_listener, nreq, true);
> + __inet_csk_reqsk_queue_drop(sk_listener, oreq, true);
> reqsk_put(req);
> }
>
> ---8<---
>
> Thanks!
Just to clarify. In the old times rsk_timer was pinned, right ?
83fccfc3940c4 ("inet: fix potential deadlock in reqsk_queue_unlink()")
was fine I think.
So the bug was added recently ?
Can we give a precise Fixes: tag ?
Thank you.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v1 net] tcp/dccp: Don't use timer_pending() in reqsk_queue_unlink().
2024-10-08 9:54 ` Eric Dumazet
@ 2024-10-08 14:21 ` Kuniyuki Iwashima
2024-10-08 14:28 ` Eric Dumazet
0 siblings, 1 reply; 8+ messages in thread
From: Kuniyuki Iwashima @ 2024-10-08 14:21 UTC (permalink / raw)
To: edumazet; +Cc: davem, dsahern, kuba, kuni1840, kuniyu, martin.lau, netdev,
pabeni
From: Eric Dumazet <edumazet@google.com>
Date: Tue, 8 Oct 2024 11:54:21 +0200
> On Tue, Oct 8, 2024 at 1:53 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> >
> > From: Jakub Kicinski <kuba@kernel.org>
> > Date: Mon, 7 Oct 2024 16:26:10 -0700
> > > On Mon, 7 Oct 2024 07:15:57 -0700 Kuniyuki Iwashima wrote:
> > > > Martin KaFai Lau reported use-after-free [0] in reqsk_timer_handler().
> > > >
> > > > """
> > > > We are seeing a use-after-free from a bpf prog attached to
> > > > trace_tcp_retransmit_synack. The program passes the req->sk to the
> > > > bpf_sk_storage_get_tracing kernel helper which does check for null
> > > > before using it.
> > > > """
> > >
> > > I think this crashes a bunch of selftests, example:
> > >
> > > https://netdev-3.bots.linux.dev/vmksft-nf-dbg/results/805581/8-nft-queue-sh/stderr
> >
> > Oops, sorry, I copy-and-pasted __inet_csk_reqsk_queue_drop()
> > for different reqsk. I'll squash the diff below.
> >
> > ---8<---
> > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> > index 36f03d51356e..433c80dc57d5 100644
> > --- a/net/ipv4/inet_connection_sock.c
> > +++ b/net/ipv4/inet_connection_sock.c
> > @@ -1188,7 +1190,7 @@ static void reqsk_timer_handler(struct timer_list *t)
> > }
> >
> > drop:
> > - __inet_csk_reqsk_queue_drop(sk_listener, nreq, true);
> > + __inet_csk_reqsk_queue_drop(sk_listener, oreq, true);
> > reqsk_put(req);
> > }
> >
> > ---8<---
> >
> > Thanks!
>
> Just to clarify. In the old times rsk_timer was pinned, right ?
>
> 83fccfc3940c4 ("inet: fix potential deadlock in reqsk_queue_unlink()")
> was fine I think.
>
> So the bug was added recently ?
>
> Can we give a precise Fixes: tag ?
TIMER_PINNED was used in reqsk_queue_hash_req() in v6.4 mentioned
by Martin and still used in the latest net-next.
$ git blame -L:reqsk_queue_hash_req net/ipv4/inet_connection_sock.c v6.4
079096f103fac (Eric Dumazet 2015-10-02 11:43:32 -0700 1095) static void reqsk_queue_hash_req(struct request_sock *req,
079096f103fac (Eric Dumazet 2015-10-02 11:43:32 -0700 1096) unsigned long timeout)
fa76ce7328b28 (Eric Dumazet 2015-03-19 19:04:20 -0700 1097) {
59f379f9046a9 (Kees Cook 2017-10-16 17:29:19 -0700 1098) timer_setup(&req->rsk_timer, reqsk_timer_handler, TIMER_PINNED);
Maybe the connection was localhost, or unlikely but RPS was
configured after SYN+ACK, or setup like ff46e3b44219 was used ??
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v1 net] tcp/dccp: Don't use timer_pending() in reqsk_queue_unlink().
2024-10-08 14:21 ` Kuniyuki Iwashima
@ 2024-10-08 14:28 ` Eric Dumazet
2024-10-08 14:42 ` Kuniyuki Iwashima
0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2024-10-08 14:28 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: davem, dsahern, kuba, kuni1840, martin.lau, netdev, pabeni
On Tue, Oct 8, 2024 at 4:21 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> From: Eric Dumazet <edumazet@google.com>
> Date: Tue, 8 Oct 2024 11:54:21 +0200
> > On Tue, Oct 8, 2024 at 1:53 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> > >
> > > From: Jakub Kicinski <kuba@kernel.org>
> > > Date: Mon, 7 Oct 2024 16:26:10 -0700
> > > > On Mon, 7 Oct 2024 07:15:57 -0700 Kuniyuki Iwashima wrote:
> > > > > Martin KaFai Lau reported use-after-free [0] in reqsk_timer_handler().
> > > > >
> > > > > """
> > > > > We are seeing a use-after-free from a bpf prog attached to
> > > > > trace_tcp_retransmit_synack. The program passes the req->sk to the
> > > > > bpf_sk_storage_get_tracing kernel helper which does check for null
> > > > > before using it.
> > > > > """
> > > >
> > > > I think this crashes a bunch of selftests, example:
> > > >
> > > > https://netdev-3.bots.linux.dev/vmksft-nf-dbg/results/805581/8-nft-queue-sh/stderr
> > >
> > > Oops, sorry, I copy-and-pasted __inet_csk_reqsk_queue_drop()
> > > for different reqsk. I'll squash the diff below.
> > >
> > > ---8<---
> > > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> > > index 36f03d51356e..433c80dc57d5 100644
> > > --- a/net/ipv4/inet_connection_sock.c
> > > +++ b/net/ipv4/inet_connection_sock.c
> > > @@ -1188,7 +1190,7 @@ static void reqsk_timer_handler(struct timer_list *t)
> > > }
> > >
> > > drop:
> > > - __inet_csk_reqsk_queue_drop(sk_listener, nreq, true);
> > > + __inet_csk_reqsk_queue_drop(sk_listener, oreq, true);
> > > reqsk_put(req);
> > > }
> > >
> > > ---8<---
> > >
> > > Thanks!
> >
> > Just to clarify. In the old times rsk_timer was pinned, right ?
> >
> > 83fccfc3940c4 ("inet: fix potential deadlock in reqsk_queue_unlink()")
> > was fine I think.
> >
> > So the bug was added recently ?
> >
> > Can we give a precise Fixes: tag ?
>
> TIMER_PINNED was used in reqsk_queue_hash_req() in v6.4 mentioned
> by Martin and still used in the latest net-next.
>
> $ git blame -L:reqsk_queue_hash_req net/ipv4/inet_connection_sock.c v6.4
> 079096f103fac (Eric Dumazet 2015-10-02 11:43:32 -0700 1095) static void reqsk_queue_hash_req(struct request_sock *req,
> 079096f103fac (Eric Dumazet 2015-10-02 11:43:32 -0700 1096) unsigned long timeout)
> fa76ce7328b28 (Eric Dumazet 2015-03-19 19:04:20 -0700 1097) {
> 59f379f9046a9 (Kees Cook 2017-10-16 17:29:19 -0700 1098) timer_setup(&req->rsk_timer, reqsk_timer_handler, TIMER_PINNED);
>
> Maybe the connection was localhost, or unlikely but RPS was
> configured after SYN+ACK, or setup like ff46e3b44219 was used ??
I do not really understand the issue.
How a sk can be 'closed' with outstanding request sock ?
They hold a refcount on the listener.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v1 net] tcp/dccp: Don't use timer_pending() in reqsk_queue_unlink().
2024-10-08 14:28 ` Eric Dumazet
@ 2024-10-08 14:42 ` Kuniyuki Iwashima
2024-10-12 4:01 ` Martin KaFai Lau
0 siblings, 1 reply; 8+ messages in thread
From: Kuniyuki Iwashima @ 2024-10-08 14:42 UTC (permalink / raw)
To: edumazet; +Cc: davem, dsahern, kuba, kuni1840, kuniyu, martin.lau, netdev,
pabeni
From: Eric Dumazet <edumazet@google.com>
Date: Tue, 8 Oct 2024 16:28:53 +0200
> On Tue, Oct 8, 2024 at 4:21 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> >
> > From: Eric Dumazet <edumazet@google.com>
> > Date: Tue, 8 Oct 2024 11:54:21 +0200
> > > On Tue, Oct 8, 2024 at 1:53 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> > > >
> > > > From: Jakub Kicinski <kuba@kernel.org>
> > > > Date: Mon, 7 Oct 2024 16:26:10 -0700
> > > > > On Mon, 7 Oct 2024 07:15:57 -0700 Kuniyuki Iwashima wrote:
> > > > > > Martin KaFai Lau reported use-after-free [0] in reqsk_timer_handler().
> > > > > >
> > > > > > """
> > > > > > We are seeing a use-after-free from a bpf prog attached to
> > > > > > trace_tcp_retransmit_synack. The program passes the req->sk to the
> > > > > > bpf_sk_storage_get_tracing kernel helper which does check for null
> > > > > > before using it.
> > > > > > """
> > > > >
> > > > > I think this crashes a bunch of selftests, example:
> > > > >
> > > > > https://netdev-3.bots.linux.dev/vmksft-nf-dbg/results/805581/8-nft-queue-sh/stderr
> > > >
> > > > Oops, sorry, I copy-and-pasted __inet_csk_reqsk_queue_drop()
> > > > for different reqsk. I'll squash the diff below.
> > > >
> > > > ---8<---
> > > > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> > > > index 36f03d51356e..433c80dc57d5 100644
> > > > --- a/net/ipv4/inet_connection_sock.c
> > > > +++ b/net/ipv4/inet_connection_sock.c
> > > > @@ -1188,7 +1190,7 @@ static void reqsk_timer_handler(struct timer_list *t)
> > > > }
> > > >
> > > > drop:
> > > > - __inet_csk_reqsk_queue_drop(sk_listener, nreq, true);
> > > > + __inet_csk_reqsk_queue_drop(sk_listener, oreq, true);
> > > > reqsk_put(req);
> > > > }
> > > >
> > > > ---8<---
> > > >
> > > > Thanks!
> > >
> > > Just to clarify. In the old times rsk_timer was pinned, right ?
> > >
> > > 83fccfc3940c4 ("inet: fix potential deadlock in reqsk_queue_unlink()")
> > > was fine I think.
> > >
> > > So the bug was added recently ?
> > >
> > > Can we give a precise Fixes: tag ?
> >
> > TIMER_PINNED was used in reqsk_queue_hash_req() in v6.4 mentioned
> > by Martin and still used in the latest net-next.
> >
> > $ git blame -L:reqsk_queue_hash_req net/ipv4/inet_connection_sock.c v6.4
> > 079096f103fac (Eric Dumazet 2015-10-02 11:43:32 -0700 1095) static void reqsk_queue_hash_req(struct request_sock *req,
> > 079096f103fac (Eric Dumazet 2015-10-02 11:43:32 -0700 1096) unsigned long timeout)
> > fa76ce7328b28 (Eric Dumazet 2015-03-19 19:04:20 -0700 1097) {
> > 59f379f9046a9 (Kees Cook 2017-10-16 17:29:19 -0700 1098) timer_setup(&req->rsk_timer, reqsk_timer_handler, TIMER_PINNED);
> >
> > Maybe the connection was localhost, or unlikely but RPS was
> > configured after SYN+ACK, or setup like ff46e3b44219 was used ??
>
> I do not really understand the issue.
> How a sk can be 'closed' with outstanding request sock ?
> They hold a refcount on the listener.
My understanding is
1. inet_csk_complete_hashdance() calls inet_csk_reqsk_queue_drop(),
but del_timer_sync() is missed
2. reqsk timer is executed and scheduled again
3. req->sk is accept()ed, but inet_csk_accept() does not clear
req->sk for non-TFO sockets, and reqsk_put() decrements one
refcnt, but still reqsk timer has another one
4. sk is close()d
5. reqsk timer is executed again, and BPF touches req->sk
reqsk timer will run for 63s by default, so I think it's possible
that sk is close()d earlier than the timer expiration.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v1 net] tcp/dccp: Don't use timer_pending() in reqsk_queue_unlink().
2024-10-08 14:42 ` Kuniyuki Iwashima
@ 2024-10-12 4:01 ` Martin KaFai Lau
0 siblings, 0 replies; 8+ messages in thread
From: Martin KaFai Lau @ 2024-10-12 4:01 UTC (permalink / raw)
To: Kuniyuki Iwashima, edumazet
Cc: davem, dsahern, kuba, kuni1840, martin.lau, netdev, pabeni
On 10/8/24 7:42 AM, Kuniyuki Iwashima wrote:
> From: Eric Dumazet <edumazet@google.com>
> Date: Tue, 8 Oct 2024 16:28:53 +0200
>> On Tue, Oct 8, 2024 at 4:21 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>>>
>>> From: Eric Dumazet <edumazet@google.com>
>>> Date: Tue, 8 Oct 2024 11:54:21 +0200
>>>> On Tue, Oct 8, 2024 at 1:53 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>>>>>
>>>>> From: Jakub Kicinski <kuba@kernel.org>
>>>>> Date: Mon, 7 Oct 2024 16:26:10 -0700
>>>>>> On Mon, 7 Oct 2024 07:15:57 -0700 Kuniyuki Iwashima wrote:
>>>>>>> Martin KaFai Lau reported use-after-free [0] in reqsk_timer_handler().
>>>>>>>
>>>>>>> """
>>>>>>> We are seeing a use-after-free from a bpf prog attached to
>>>>>>> trace_tcp_retransmit_synack. The program passes the req->sk to the
>>>>>>> bpf_sk_storage_get_tracing kernel helper which does check for null
>>>>>>> before using it.
>>>>>>> """
>>>>>>
>>>>>> I think this crashes a bunch of selftests, example:
>>>>>>
>>>>>> https://netdev-3.bots.linux.dev/vmksft-nf-dbg/results/805581/8-nft-queue-sh/stderr
>>>>>
>>>>> Oops, sorry, I copy-and-pasted __inet_csk_reqsk_queue_drop()
>>>>> for different reqsk. I'll squash the diff below.
>>>>>
>>>>> ---8<---
>>>>> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
>>>>> index 36f03d51356e..433c80dc57d5 100644
>>>>> --- a/net/ipv4/inet_connection_sock.c
>>>>> +++ b/net/ipv4/inet_connection_sock.c
>>>>> @@ -1188,7 +1190,7 @@ static void reqsk_timer_handler(struct timer_list *t)
>>>>> }
>>>>>
>>>>> drop:
>>>>> - __inet_csk_reqsk_queue_drop(sk_listener, nreq, true);
>>>>> + __inet_csk_reqsk_queue_drop(sk_listener, oreq, true);
>>>>> reqsk_put(req);
>>>>> }
>>>>>
>>>>> ---8<---
>>>>>
>>>>> Thanks!
>>>>
>>>> Just to clarify. In the old times rsk_timer was pinned, right ?
>>>>
>>>> 83fccfc3940c4 ("inet: fix potential deadlock in reqsk_queue_unlink()")
>>>> was fine I think.
>>>>
>>>> So the bug was added recently ?
>>>>
>>>> Can we give a precise Fixes: tag ?
>>>
>>> TIMER_PINNED was used in reqsk_queue_hash_req() in v6.4 mentioned
>>> by Martin and still used in the latest net-next.
>>>
>>> $ git blame -L:reqsk_queue_hash_req net/ipv4/inet_connection_sock.c v6.4
>>> 079096f103fac (Eric Dumazet 2015-10-02 11:43:32 -0700 1095) static void reqsk_queue_hash_req(struct request_sock *req,
>>> 079096f103fac (Eric Dumazet 2015-10-02 11:43:32 -0700 1096) unsigned long timeout)
>>> fa76ce7328b28 (Eric Dumazet 2015-03-19 19:04:20 -0700 1097) {
>>> 59f379f9046a9 (Kees Cook 2017-10-16 17:29:19 -0700 1098) timer_setup(&req->rsk_timer, reqsk_timer_handler, TIMER_PINNED);
>>>
>>> Maybe the connection was localhost, or unlikely but RPS was
>>> configured after SYN+ACK, or setup like ff46e3b44219 was used ??
I don't know what exactly caused the ack to be handled on a different CPU. We
have a recent packet steering test, so it could be caused by this test
adjusting the steering config.
>>
>> I do not really understand the issue.
>> How a sk can be 'closed' with outstanding request sock ?
>> They hold a refcount on the listener.
>
> My understanding is
>
> 1. inet_csk_complete_hashdance() calls inet_csk_reqsk_queue_drop(),
> but del_timer_sync() is missed
>
> 2. reqsk timer is executed and scheduled again
>
> 3. req->sk is accept()ed, but inet_csk_accept() does not clear
> req->sk for non-TFO sockets, and reqsk_put() decrements one
> refcnt, but still reqsk timer has another one
>
> 4. sk is close()d
>
> 5. reqsk timer is executed again, and BPF touches req->sk
The above is also what I think is happening.
The kernel reqsk_timer_handler() is not using req->sk, so it has not been an issue.
>
> reqsk timer will run for 63s by default, so I think it's possible
> that sk is close()d earlier than the timer expiration.
>
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2024-10-12 4:02 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-07 14:15 [PATCH v1 net] tcp/dccp: Don't use timer_pending() in reqsk_queue_unlink() Kuniyuki Iwashima
2024-10-07 23:26 ` Jakub Kicinski
2024-10-07 23:52 ` Kuniyuki Iwashima
2024-10-08 9:54 ` Eric Dumazet
2024-10-08 14:21 ` Kuniyuki Iwashima
2024-10-08 14:28 ` Eric Dumazet
2024-10-08 14:42 ` Kuniyuki Iwashima
2024-10-12 4:01 ` Martin KaFai Lau
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).