From: Daniel Borkmann <daniel@iogearbox.net>
To: Lawrence Brakmo <brakmo@fb.com>, netdev <netdev@vger.kernel.org>
Cc: Kernel Team <Kernel-team@fb.com>, Blake Matheny <bmatheny@fb.com>,
Alexei Starovoitov <ast@fb.com>,
David Ahern <dsa@cumulusnetworks.com>
Subject: Re: [RFC PATCH net-next v2 10/15] bpf: Add support for changing congestion control
Date: Tue, 20 Jun 2017 00:34:06 +0200 [thread overview]
Message-ID: <5948515E.1090607@iogearbox.net> (raw)
In-Reply-To: <59B6F12A-9740-4177-BCB3-72CD65EB1225@fb.com>
On 06/18/2017 04:39 AM, Lawrence Brakmo wrote:
> On 6/16/17, 6:58 AM, "Daniel Borkmann" <daniel@iogearbox.net> wrote:
[...]
> > /* Change congestion control for socket */
> > -int tcp_set_congestion_control(struct sock *sk, const char *name)
> > +int tcp_set_congestion_control(struct sock *sk, const char *name, bool load)
> > {
> > struct inet_connection_sock *icsk = inet_csk(sk);
> > const struct tcp_congestion_ops *ca;
> > @@ -344,7 +344,10 @@ int tcp_set_congestion_control(struct sock *sk, const char *name)
> > return -EPERM;
> >
> > rcu_read_lock();
> > - ca = __tcp_ca_find_autoload(name);
> > + if (!load)
> > + ca = tcp_ca_find(name);
> > + else
> > + ca = __tcp_ca_find_autoload(name);
>
> From BPF program side, we call with !load since we're not allowed
> to sleep under RCU, that's correct ...
>
> > /* No change asking for existing value */
> > if (ca == icsk->icsk_ca_ops) {
> > icsk->icsk_ca_setsockopt = 1;
> > @@ -352,8 +355,10 @@ int tcp_set_congestion_control(struct sock *sk, const char *name)
> > }
> > if (!ca)
> > err = -ENOENT;
> > + else if (!load)
> > + icsk->icsk_ca_ops = ca;
>
> ... but don't we also need to hold a module ref in this case as done
> below?
>
> Meaning, tcp_ca_find() could return a ca that was previously loaded
> to the tcp_cong_list as module, then resulting in ref count imbalance
> when set from BPF?
>
> As I mentioned above, this can be called before congestion has been
> initialized (op <= BPF_SOCKET_OPS_NEEDS_ECN) in which case
> tcp_init_congestion_control will be called later. If op > ..OPS_NEEDS_ECN
> then bpf_setsockopt() will call the reinit_congestion_control().
>
> But this points to an issue where someone else could call
> tcp_set_congestion_control() with load == false not knowing they
> need to call either init or reinit. I will add a comment to the function
> to make it clear.
Hm, I'm not sure it answers my question. What I meant was that from BPF
prog, you're setting tcp_set_congestion_control(..., false) so if
tcp_ca_find() returns a ca that was loaded earlier as a from a module
(so it becomes available in tcp_cong_list), the above...
[...]
else if (!load)
icsk->icsk_ca_ops = ca;
[...]
... will basically prevent the later try_module_get() on the ca. So any
later tcp_reinit_congestion_control() or tcp_init_congestion_control()
will still run not having the refcount held on the owner module. Meaning
a module unload would let the machine crash due to the refcnt imbalance?
What am I missing?
next prev parent reply other threads:[~2017-06-19 22:34 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-06-15 20:08 [RFC PATCH net-next v2 00/15] bpf: BPF support for socket ops Lawrence Brakmo
2017-06-15 20:08 ` [RFC PATCH net-next v2 01/15] " Lawrence Brakmo
2017-06-16 12:07 ` Daniel Borkmann
2017-06-16 23:41 ` Lawrence Brakmo
2017-06-19 18:44 ` Daniel Borkmann
2017-06-19 20:49 ` Lawrence Brakmo
2017-06-17 21:48 ` Lawrence Brakmo
2017-06-19 18:52 ` Daniel Borkmann
2017-06-19 20:49 ` Lawrence Brakmo
2017-06-15 20:08 ` [RFC PATCH net-next v2 02/15] bpf: program to load socketops BPF programs Lawrence Brakmo
2017-06-15 20:08 ` [RFC PATCH net-next v2 03/15] bpf: Support for per connection SYN/SYN-ACK RTOs Lawrence Brakmo
2017-06-15 20:08 ` [RFC PATCH net-next v2 04/15] bpf: Sample bpf program to set " Lawrence Brakmo
2017-06-15 20:08 ` [RFC PATCH net-next v2 05/15] bpf: Support for setting initial receive window Lawrence Brakmo
2017-06-15 20:08 ` [RFC PATCH net-next v2 06/15] bpf: Sample bpf program to set initial window Lawrence Brakmo
2017-06-15 20:08 ` [RFC PATCH net-next v2 07/15] bpf: Add setsockopt helper function to bpf Lawrence Brakmo
2017-06-16 13:27 ` Daniel Borkmann
2017-06-17 23:17 ` Lawrence Brakmo
2017-06-15 20:08 ` [RFC PATCH net-next v2 08/15] bpf: Add TCP connection BPF callbacks Lawrence Brakmo
2017-06-15 20:08 ` [RFC PATCH net-next v2 09/15] bpf: Sample BPF program to set buffer sizes Lawrence Brakmo
2017-06-15 20:08 ` [RFC PATCH net-next v2 10/15] bpf: Add support for changing congestion control Lawrence Brakmo
2017-06-16 13:58 ` Daniel Borkmann
2017-06-18 2:39 ` Lawrence Brakmo
2017-06-19 22:34 ` Daniel Borkmann [this message]
2017-06-20 0:35 ` Lawrence Brakmo
2017-06-15 20:08 ` [RFC PATCH net-next v2 11/15] bpf: Sample BPF program to set " Lawrence Brakmo
2017-06-15 20:08 ` [RFC PATCH net-next v2 12/15] bpf: Adds support for setting initial cwnd Lawrence Brakmo
2017-06-15 20:08 ` [RFC PATCH net-next v2 13/15] bpf: Sample BPF program to set " Lawrence Brakmo
2017-06-15 20:08 ` [RFC PATCH net-next v2 14/15] bpf: Adds support for setting sndcwnd clamp Lawrence Brakmo
2017-06-15 20:08 ` [RFC PATCH net-next v2 15/15] bpf: Sample bpf program to set " Lawrence Brakmo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5948515E.1090607@iogearbox.net \
--to=daniel@iogearbox.net \
--cc=Kernel-team@fb.com \
--cc=ast@fb.com \
--cc=bmatheny@fb.com \
--cc=brakmo@fb.com \
--cc=dsa@cumulusnetworks.com \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.