From: Eric Dumazet <eric.dumazet@gmail.com>
To: Neal Cardwell <ncardwell@google.com>, Lawrence Brakmo <brakmo@fb.com>
Cc: Yuchung Cheng <ycheng@google.com>,
"davem@davemloft.net" <davem@davemloft.net>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"edumazet@google.com" <edumazet@google.com>,
"soheil@google.com" <soheil@google.com>
Subject: Re: [PATCH net] bpf: always re-init the congestion control after switching to it
Date: Tue, 23 Jan 2018 11:50:04 -0800 [thread overview]
Message-ID: <1516737004.3715.8.camel@gmail.com> (raw)
In-Reply-To: <CADVnQym8+Z_OXDk7a5tVH7a4HHkXLFQ8AnjJ0qxF6dmC8=tODA@mail.gmail.com>
On Tue, 2018-01-23 at 14:39 -0500, Neal Cardwell wrote:
> On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo <brakmo@fb.com> wrote:
> > On 1/23/18, 9:30 AM, "Yuchung Cheng" <ycheng@google.com> wrote:
> >
> > The original patch that changes TCP's congestion control via eBPF only
> > re-initializes the new congestion control, if the BPF op is set to an
> > (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP will
> >
> > What do you mean by “(invalid) value”?
> >
> > run the new congestion control from random states.
> >
> > This has always been possible with setsockopt, no?
> >
> > This patch fixes
> > the issue by always re-init the congestion control like other means
> > such as setsockopt and sysctl changes.
> >
> > The current code re-inits the congestion control when calling
> > tcp_set_congestion_control except when it is called early on (i.e. op <=
> > BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to re-initialize
> > since it will be initialized later by TCP when the connection is established.
> >
> > Otherwise, if we always call tcp_reinit_congestion_control we would call
> > tcp_cleanup_congestion_control when the congestion control has not been
> > initialized yet.
>
> On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo <brakmo@fb.com> wrote:
> > On 1/23/18, 9:30 AM, "Yuchung Cheng" <ycheng@google.com> wrote:
> >
> > The original patch that changes TCP's congestion control via eBPF only
> > re-initializes the new congestion control, if the BPF op is set to an
> > (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP will
> >
> > What do you mean by “(invalid) value”?
> >
> > run the new congestion control from random states.
> >
> > This has always been possible with setsockopt, no?
> >
> > This patch fixes
> > the issue by always re-init the congestion control like other means
> > such as setsockopt and sysctl changes.
> >
> > The current code re-inits the congestion control when calling
> > tcp_set_congestion_control except when it is called early on (i.e. op <=
> > BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to re-initialize
> > since it will be initialized later by TCP when the connection is established.
> >
> > Otherwise, if we always call tcp_reinit_congestion_control we would call
> > tcp_cleanup_congestion_control when the congestion control has not been
> > initialized yet.
>
> Interesting. So I wonder if the symptoms we were seeing were due to
> kernels that did not yet have this fix:
>
> 27204aaa9dc6 ("tcp: uniform the set up of sockets after successful
> connection):
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=27204aaa9dc67b833b77179fdac556288ec3a4bf
>
> Before that fix, there could be TFO passive connections that at SYN time called:
> tcp_init_congestion_control(child);
> and then:
> tcp_call_bpf(child, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB);
>
> So that if the CC was switched in the
> BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB handler then there would be no
> init for the new module?
Note that bpf_sock->op can be written by a malicious BPF filter.
So, a malicious filter can switch from Cubic to BBR without re-init,
and bad things can happen.
I do not believe we should trust BPF here.
next prev parent reply other threads:[~2018-01-23 19:50 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-01-23 17:30 [PATCH net] bpf: always re-init the congestion control after switching to it Yuchung Cheng
2018-01-23 19:20 ` Lawrence Brakmo
2018-01-23 19:39 ` Neal Cardwell
2018-01-23 19:50 ` Eric Dumazet [this message]
2018-01-23 20:19 ` Lawrence Brakmo
2018-01-23 21:39 ` Yuchung Cheng
2018-01-23 23:25 ` Alexei Starovoitov
2018-01-23 23:30 ` Lawrence Brakmo
2018-01-23 23:36 ` Yuchung Cheng
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1516737004.3715.8.camel@gmail.com \
--to=eric.dumazet@gmail.com \
--cc=brakmo@fb.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=ncardwell@google.com \
--cc=netdev@vger.kernel.org \
--cc=soheil@google.com \
--cc=ycheng@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).