From: Martin KaFai Lau <martin.lau@linux.dev>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
Paolo Abeni <pabeni@redhat.com>
Cc: Network Development <netdev@vger.kernel.org>,
bpf <bpf@vger.kernel.org>, MPTCP Upstream <mptcp@lists.linux.dev>,
Matthieu Baerts <matttbe@kernel.org>,
Mat Martineau <martineau@kernel.org>,
Jakub Kicinski <kuba@kernel.org>,
Martin KaFai Lau <martin.lau@kernel.org>
Subject: Re: mptcp splat
Date: Wed, 27 Mar 2024 15:16:16 -0700 [thread overview]
Message-ID: <649dc1dc-ca80-4686-ae37-62d7c81dde8b@linux.dev> (raw)
In-Reply-To: <CAADnVQKXcEhL680E85=rrYuu4eVvVTH60kYRY_VnAKzZo1qKYg@mail.gmail.com>
On 3/27/24 11:50 AM, Alexei Starovoitov wrote:
> On Wed, Mar 27, 2024 at 11:33 AM Paolo Abeni <pabeni@redhat.com> wrote:
>>
>> On Wed, 2024-03-27 at 10:00 -0700, Alexei Starovoitov wrote:
>>> On Wed, Mar 27, 2024 at 9:56 AM Paolo Abeni <pabeni@redhat.com> wrote:
>>>>
>>>> On Wed, 2024-03-27 at 09:43 -0700, Alexei Starovoitov wrote:
>>>>> I ffwded bpf tree with the recent net fixes and caught this:
>>>>>
>>>>> [ 48.386337] WARNING: CPU: 32 PID: 3276 at net/mptcp/subflow.c:1430
>>>>> subflow_data_ready+0x147/0x1c0
>>>>> [ 48.392012] Modules linked in: dummy bpf_testmod(O) [last unloaded:
>>>>> bpf_test_no_cfi(O)]
>>>>> [ 48.396609] CPU: 32 PID: 3276 Comm: test_progs Tainted: G
>>>>> O 6.8.0-12873-g2c43c33bfd23 #1014
>>>>> #[ 48.467143] Call Trace:
>>>>> [ 48.469094] <TASK>
>>>>> [ 48.472159] ? __warn+0x80/0x180
>>>>> [ 48.475019] ? subflow_data_ready+0x147/0x1c0
>>>>> [ 48.478068] ? report_bug+0x189/0x1c0
>>>>> [ 48.480725] ? handle_bug+0x36/0x70
>>>>> [ 48.483061] ? exc_invalid_op+0x13/0x60
>>>>> [ 48.485809] ? asm_exc_invalid_op+0x16/0x20
>>>>> [ 48.488754] ? subflow_data_ready+0x147/0x1c0
>>>>> [ 48.492159] mptcp_set_rcvlowat+0x79/0x1d0
>>>>> [ 48.495026] sk_setsockopt+0x6c0/0x1540
>>>>>
>>>>> It doesn't reproduce all the time though.
>>>>> Some race?
>>>>> Known issue?
>>>>
>>>> It was not known to me. Looks like something related to not so recent
>>>> changes (rcvlowat support).
>>>>
>>>> Definitely looks lie a race.
>>>>
>>>> If you could share more info about the running context and/or a full
>>>> decoded splat it could help, thanks!
>>>
>>> This is just running bpf selftests in parallel:
>>> test_progs -j
>>>
>>> The end of the splat:
>>> [ 48.500075] __bpf_setsockopt+0x6f/0x90
>>> [ 48.503124] bpf_sock_ops_setsockopt+0x3c/0x90
>>> [ 48.506053] bpf_prog_509ce5db2c7f9981_bpf_test_sockopt_int+0xb4/0x11b
>>> [ 48.510178] bpf_prog_dce07e362d941d2b_bpf_test_socket_sockopt+0x12b/0x132
>>> [ 48.515070] bpf_prog_348c9b5faaf10092_skops_sockopt+0x954/0xe86
>>> [ 48.519050] __cgroup_bpf_run_filter_sock_ops+0xbc/0x250
>>> [ 48.523836] tcp_connect+0x879/0x1160
>>> [ 48.527239] ? ktime_get_with_offset+0x8d/0x140
>>> [ 48.531362] tcp_v6_connect+0x50c/0x870
>>> [ 48.534609] ? mptcp_connect+0x129/0x280
>>> [ 48.538483] mptcp_connect+0x129/0x280
>>> [ 48.542436] __inet_stream_connect+0xce/0x370
>>> [ 48.546664] ? rcu_is_watching+0xd/0x40
>>> [ 48.549063] ? lock_release+0x1c4/0x280
>>> [ 48.553497] ? inet_stream_connect+0x22/0x50
>>> [ 48.557289] ? rcu_is_watching+0xd/0x40
>>> [ 48.560430] inet_stream_connect+0x36/0x50
>>> [ 48.563604] bpf_trampoline_6442491565+0x49/0xef
>>> [ 48.567770] ? security_socket_connect+0x34/0x50
>>> [ 48.575400] inet_stream_connect+0x5/0x50
>>> [ 48.577721] __sys_connect+0x63/0x90
>>> [ 48.580189] ? bpf_trace_run2+0xb0/0x1a0
>>> [ 48.583171] ? rcu_is_watching+0xd/0x40
>>> [ 48.585802] ? syscall_trace_enter+0xfb/0x1e0
>>> [ 48.588836] __x64_sys_connect+0x14/0x20
>>
>> Ouch, it looks bad. BPF should not allow any action on mptcp subflows
>> that go through sk_socket. They touch the mptcp main socket, which is
>> _not_ protected by the subflow socket lock.
>>
>> AFICS currently the relevant set of racing sockopt allowed by bpf boils
>> down to SO_RCVLOWAT only - sk_setsockopt(SO_RCVLOWAT) will call sk-
>>> sk_socket->ops->set_rcvlowat()
>>
>> So something like the following (completely untested) should possibly
>> address the issue at hand, but I think it would be better/safer
>> completely disable ebpf on mptcp subflows, WDYT?
>>
>> Thanks,
>>
>> Paolo
>>
>> ---
>> diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
>> index dcd1c76d2a3b..6e5e64c2cf89 100644
>> --- a/net/mptcp/sockopt.c
>> +++ b/net/mptcp/sockopt.c
>> @@ -1493,6 +1493,9 @@ int mptcp_set_rcvlowat(struct sock *sk, int val)
>> struct mptcp_subflow_context *subflow;
>> int space, cap;
>>
>> + if (has_current_bpf_ctx())
>> + return -EINVAL;
>> +
>
> Looks fine to me.
>
> Martin,
>
> Do you have any better ideas?
>
> The splat explains the race.
> In this case setget_sockopt test happen to run in parallel
> with mptcp/bpf test and the former one was TCP connect request
> but it was for subflow.
>
> We can disable that callback when tcp flow is a subflow,
> but that doesn't feel right.
I am also not sure if we can disable all set/getsockopt for tcp subflows. Not
clear if there is use case that depends on this to setsockopt the subflow.
> I think Paolo's targeted fix is cleaner.
I will also go with Paolo's fix. The radius is smaller. I don't have a better idea.
Unrelated, is there a way to tell if a tcp_sock is a subflow? bpf prog can use
it to decide if it wants to setsockopt on a subflow or not.
next prev parent reply other threads:[~2024-03-27 22:16 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-03-27 16:43 mptcp splat Alexei Starovoitov
2024-03-27 16:56 ` Paolo Abeni
2024-03-27 17:00 ` Alexei Starovoitov
2024-03-27 18:32 ` Paolo Abeni
2024-03-27 18:45 ` Mat Martineau
2024-03-27 18:50 ` Alexei Starovoitov
2024-03-27 22:16 ` Martin KaFai Lau [this message]
2024-03-28 17:35 ` Matthieu Baerts
2024-03-29 16:26 ` Alexei Starovoitov
2024-03-29 16:34 ` Paolo Abeni
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=649dc1dc-ca80-4686-ae37-62d7c81dde8b@linux.dev \
--to=martin.lau@linux.dev \
--cc=alexei.starovoitov@gmail.com \
--cc=bpf@vger.kernel.org \
--cc=kuba@kernel.org \
--cc=martin.lau@kernel.org \
--cc=martineau@kernel.org \
--cc=matttbe@kernel.org \
--cc=mptcp@lists.linux.dev \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.