From: Martin KaFai Lau <martin.lau@linux.dev>
To: Kuniyuki Iwashima <kuniyu@amazon.com>, edumazet@google.com
Cc: davem@davemloft.net, dsahern@kernel.org, kuba@kernel.org,
kuni1840@gmail.com, martin.lau@kernel.org,
netdev@vger.kernel.org, pabeni@redhat.com
Subject: Re: [PATCH v1 net] tcp/dccp: Don't use timer_pending() in reqsk_queue_unlink().
Date: Fri, 11 Oct 2024 21:01:59 -0700
Message-ID: <b55e2ca0-42f2-4b7c-b445-6ffd87ca74a0@linux.dev>
In-Reply-To: <20241008144205.83199-1-kuniyu@amazon.com>

On 10/8/24 7:42 AM, Kuniyuki Iwashima wrote:
> From: Eric Dumazet <edumazet@google.com>
> Date: Tue, 8 Oct 2024 16:28:53 +0200
>> On Tue, Oct 8, 2024 at 4:21 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>>>
>>> From: Eric Dumazet <edumazet@google.com>
>>> Date: Tue, 8 Oct 2024 11:54:21 +0200
>>>> On Tue, Oct 8, 2024 at 1:53 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>>>>>
>>>>> From: Jakub Kicinski <kuba@kernel.org>
>>>>> Date: Mon, 7 Oct 2024 16:26:10 -0700
>>>>>> On Mon, 7 Oct 2024 07:15:57 -0700 Kuniyuki Iwashima wrote:
>>>>>>> Martin KaFai Lau reported use-after-free [0] in reqsk_timer_handler().
>>>>>>>
>>>>>>> """
>>>>>>> We are seeing a use-after-free from a bpf prog attached to
>>>>>>> trace_tcp_retransmit_synack. The program passes the req->sk to the
>>>>>>> bpf_sk_storage_get_tracing kernel helper which does check for null
>>>>>>> before using it.
>>>>>>> """
>>>>>>
>>>>>> I think this crashes a bunch of selftests, example:
>>>>>>
>>>>>> https://netdev-3.bots.linux.dev/vmksft-nf-dbg/results/805581/8-nft-queue-sh/stderr
>>>>>
>>>>> Oops, sorry, I copy-and-pasted __inet_csk_reqsk_queue_drop()
>>>>> for different reqsk. I'll squash the diff below.
>>>>>
>>>>> ---8<---
>>>>> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
>>>>> index 36f03d51356e..433c80dc57d5 100644
>>>>> --- a/net/ipv4/inet_connection_sock.c
>>>>> +++ b/net/ipv4/inet_connection_sock.c
>>>>> @@ -1188,7 +1190,7 @@ static void reqsk_timer_handler(struct timer_list *t)
>>>>>  	}
>>>>>
>>>>>  drop:
>>>>> -	__inet_csk_reqsk_queue_drop(sk_listener, nreq, true);
>>>>> +	__inet_csk_reqsk_queue_drop(sk_listener, oreq, true);
>>>>>  	reqsk_put(req);
>>>>>  }
>>>>>
>>>>> ---8<---
>>>>>
>>>>> Thanks!
>>>>
>>>> Just to clarify. In the old times rsk_timer was pinned, right ?
>>>>
>>>> 83fccfc3940c4 ("inet: fix potential deadlock in reqsk_queue_unlink()")
>>>> was fine I think.
>>>>
>>>> So the bug was added recently ?
>>>>
>>>> Can we give a precise Fixes: tag ?
>>>
>>> TIMER_PINNED was used in reqsk_queue_hash_req() in v6.4 mentioned
>>> by Martin and still used in the latest net-next.
>>>
>>> $ git blame -L:reqsk_queue_hash_req net/ipv4/inet_connection_sock.c v6.4
>>> 079096f103fac (Eric Dumazet 2015-10-02 11:43:32 -0700 1095) static void reqsk_queue_hash_req(struct request_sock *req,
>>> 079096f103fac (Eric Dumazet 2015-10-02 11:43:32 -0700 1096) unsigned long timeout)
>>> fa76ce7328b28 (Eric Dumazet 2015-03-19 19:04:20 -0700 1097) {
>>> 59f379f9046a9 (Kees Cook 2017-10-16 17:29:19 -0700 1098) timer_setup(&req->rsk_timer, reqsk_timer_handler, TIMER_PINNED);
>>>
>>> Maybe the connection was localhost, or unlikely but RPS was
>>> configured after SYN+ACK, or setup like ff46e3b44219 was used ??

I don't know exactly what caused the ACK to be handled on a different CPU. We
recently added a packet steering test, so it could be that test adjusting the
steering config.

>>
>> I do not really understand the issue.
>> How a sk can be 'closed' with outstanding request sock ?
>> They hold a refcount on the listener.
>
> My understanding is
>
> 1. inet_csk_complete_hashdance() calls inet_csk_reqsk_queue_drop(),
> but del_timer_sync() is missed
>
> 2. reqsk timer is executed and scheduled again
>
> 3. req->sk is accept()ed, but inet_csk_accept() does not clear
> req->sk for non-TFO sockets, and reqsk_put() decrements one
> refcnt, but still reqsk timer has another one
>
> 4. sk is close()d
>
> 5. reqsk timer is executed again, and BPF touches req->sk

The above is also what I think is happening.

The kernel's reqsk_timer_handler() itself does not use req->sk, so this has not
been an issue until now.

>
> reqsk timer will run for 63s by default, so I think it's possible
> that sk is close()d earlier than the timer expiration.
>
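
For illustration only, the five-step sequence quoted above can be modelled in
userspace C. This is a toy sketch, not kernel code: struct toy_sock,
struct toy_reqsk, struct toy_result and simulate() are hypothetical names, the
two CPUs are serialized into one fixed interleaving, and the refcount is
simplified to two references (one for the timer, one dropped at accept()).
The "synchronous cancel" branch only models the idea of a cancel that waits
for a running handler; it is not claimed to match the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; none of these are the kernel's real types. */
struct toy_sock { bool closed; };

struct toy_reqsk {
	int refcnt;		/* simplified: timer ref + accept-queue ref */
	bool pending;		/* timer is queued and its handler is not running */
	bool cancelled;		/* a synchronous cancel has been requested */
	struct toy_sock *sk;
};

struct toy_result {
	bool stale_sk_seen;	/* last timer run saw an already closed sk */
	int refcnt_left;	/* references still held on the request */
};

static struct toy_result simulate(bool check_timer_pending)
{
	struct toy_sock child = { .closed = false };
	struct toy_reqsk req = { .refcnt = 2, .pending = true,
				 .cancelled = false, .sk = NULL };
	struct toy_result res = { .stale_sk_seen = false, .refcnt_left = 0 };

	/* CPU A: the timer fires; a running handler is no longer "pending". */
	req.pending = false;

	/* CPU B: the final ACK creates the child and unlinks the request. */
	req.sk = &child;
	if (check_timer_pending) {
		if (req.pending) {	/* false here, so the cancel is skipped */
			req.cancelled = true;
			req.refcnt--;
		}
	} else {
		/* Synchronous cancel: once the handler returns, the timer is
		 * neither running nor queued, and its reference is dropped. */
		req.cancelled = true;
		req.refcnt--;
	}

	/* CPU A: the handler finishes and re-arms itself unless cancelled. */
	if (!req.cancelled)
		req.pending = true;

	/* accept() drops one reference but leaves req.sk set (non-TFO). */
	req.refcnt--;

	/* close() on the accepted socket. */
	child.closed = true;

	/* The timer fires again only if it was re-armed. */
	if (req.pending)
		res.stale_sk_seen = (req.sk != NULL && req.sk->closed);

	res.refcnt_left = req.refcnt;
	return res;
}
```

In this model, simulate(true) leaves one reference held by the re-armed timer
and the last run sees a closed sk, while simulate(false) drops both references
and the timer never runs again.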