From: Martin KaFai Lau <martin.lau@linux.dev>
To: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: bpf@vger.kernel.org, edumazet@google.com, kuba@kernel.org,
netdev@vger.kernel.org
Subject: Re: [Question]: A non NULL req->sk in tcp_rtx_synack. Not a fastopen connection.
Date: Thu, 3 Oct 2024 21:00:20 -0700 [thread overview]
Message-ID: <341af7e1-7817-4aca-97dc-8f2813a086df@linux.dev> (raw)
In-Reply-To: <20241004020255.36532-1-kuniyu@amazon.com>
On 10/3/24 7:02 PM, Kuniyuki Iwashima wrote:
> From: Martin KaFai Lau <martin.lau@linux.dev>
> Date: Thu, 3 Oct 2024 18:14:09 -0700
>> Hi,
>>
>> We are seeing a use-after-free from a bpf prog attached to
>> trace_tcp_retransmit_synack. The program passes the req->sk to the
>> bpf_sk_storage_get_tracing kernel helper which does check for null before using it.
>>
>> fastopen is not used.
>>
>> We got a kfence report on use-after-free (pasted at the end). It is running with
>> an older 6.4 kernel and we hardly hit this in production.
>>
>> From the upstream code, del_timer_sync() should have been done by
>> inet_csk_reqsk_queue_drop() before "req->sk = child;" is assigned in
>> inet_csk_reqsk_queue_add(). My understanding is the req->rsk_timer should have
>> been stopped before the "req->sk = child;" assignment.
>
> There seems to be a small race window in reqsk_queue_unlink().
>
> expire_timers() first calls detach_timer(, true), which marks the timer
> as not pending, and then calls reqsk_timer_handler().
>
> If reqsk_queue_unlink() calls timer_pending() just before expire_timers()
> calls reqsk_timer_handler(), reqsk_queue_unlink() could miss
> del_timer_sync() ?
This seems to explain it. :)
Does it mean there is a chance that the reqsk_timer_handler() may rearm the
timer again and I guess only a few more synack will be sent in this case and
should be no harm?
>
> ---8<---
> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> index 2c5632d4fddb..4ba47ee6c9da 100644
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -1045,7 +1045,7 @@ static bool reqsk_queue_unlink(struct request_sock *req)
> found = __sk_nulls_del_node_init_rcu(sk);
> spin_unlock(lock);
> }
> - if (timer_pending(&req->rsk_timer) && del_timer_sync(&req->rsk_timer))
> + if (del_timer_sync(&req->rsk_timer))
It seems the reqsk_timer_handler() will also call reqsk_queue_unlink() through
inet_csk_reqsk_queue_drop_and_put(). Not sure if the reqsk_timer_handler() can
del_timer_sync() itself.
> reqsk_put(req);
> return found;
> }
> ---8<---
>
>
>>
>> or there are cases that req->sk is not NULL in the reqsk_timer_handler()?
>>
>> BUG: KFENCE: use-after-free read in bpf_sk_storage_get_tracing+0x2e/0x1b0
>>
>> Use-after-free read at 0x00000000a891fb3a (in kfence-#1):
>> bpf_sk_storage_get_tracing+0x2e/0x1b0
>> bpf_prog_5ea3e95db6da0438_tcp_retransmit_synack+0x1d20/0x1dda
>> bpf_trace_run2+0x4c/0xc0
>> tcp_rtx_synack+0xf9/0x100
>> reqsk_timer_handler+0xda/0x3d0
>> run_timer_softirq+0x292/0x8a0
>> irq_exit_rcu+0xf5/0x320
>> sysvec_apic_timer_interrupt+0x6d/0x80
>> asm_sysvec_apic_timer_interrupt+0x16/0x20
>> intel_idle_irq+0x5a/0xa0
>> cpuidle_enter_state+0x94/0x273
>> cpu_startup_entry+0x15e/0x260
>> start_secondary+0x8a/0x90
>> secondary_startup_64_no_verify+0xfa/0xfb
>>
>> kfence-#1: 0x00000000a72cc7b6-0x00000000d97616d9, size=2376, cache=TCPv6
>>
>> allocated by task 0 on cpu 9 at 260507.901592s:
>> sk_prot_alloc+0x35/0x140
>> sk_clone_lock+0x1f/0x3f0
>> inet_csk_clone_lock+0x15/0x160
>> tcp_create_openreq_child+0x1f/0x410
>> tcp_v6_syn_recv_sock+0x1da/0x700
>> tcp_check_req+0x1fb/0x510
>> tcp_v6_rcv+0x98b/0x1420
>> ipv6_list_rcv+0x2258/0x26e0
>> napi_complete_done+0x5b1/0x2990
>> mlx5e_napi_poll+0x2ae/0x8d0
>> net_rx_action+0x13e/0x590
>> irq_exit_rcu+0xf5/0x320
>> common_interrupt+0x80/0x90
>> asm_common_interrupt+0x22/0x40
>> cpuidle_enter_state+0xfb/0x273
>> cpu_startup_entry+0x15e/0x260
>> start_secondary+0x8a/0x90
>> secondary_startup_64_no_verify+0xfa/0xfb
>>
>> freed by task 0 on cpu 9 at 260507.927527s:
>> rcu_core_si+0x4ff/0xf10
>> irq_exit_rcu+0xf5/0x320
>> sysvec_apic_timer_interrupt+0x6d/0x80
>> asm_sysvec_apic_timer_interrupt+0x16/0x20
>> cpuidle_enter_state+0xfb/0x273
>> cpu_startup_entry+0x15e/0x260
>> start_secondary+0x8a/0x90
>> secondary_startup_64_no_verify+0xfa/0xfb
>>
>> Thanks,
>> Martin
next prev parent reply other threads:[~2024-10-04 4:00 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-04 1:14 [Question]: A non NULL req->sk in tcp_rtx_synack. Not a fastopen connection Martin KaFai Lau
2024-10-04 2:02 ` Kuniyuki Iwashima
2024-10-04 4:00 ` Martin KaFai Lau [this message]
2024-10-04 20:37 ` Kuniyuki Iwashima
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=341af7e1-7817-4aca-97dc-8f2813a086df@linux.dev \
--to=martin.lau@linux.dev \
--cc=bpf@vger.kernel.org \
--cc=edumazet@google.com \
--cc=kuba@kernel.org \
--cc=kuniyu@amazon.com \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.