[PATCH net] bpf: always re-init the congestion control after switching to it

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH net] bpf: always re-init the congestion control after switching to it
@ 2018-01-23 17:30 Yuchung Cheng
  2018-01-23 19:20 ` Lawrence Brakmo
  0 siblings, 1 reply; 9+ messages in thread
From: Yuchung Cheng @ 2018-01-23 17:30 UTC (permalink / raw)
  To: davem; +Cc: netdev, edumazet, ncardwell, soheil, brakmo, Yuchung Cheng

The original patch that changes TCP's congestion control via eBPF only
re-initializes the new congestion control, if the BPF op is set to an
(invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP will
run the new congestion control from random states. This patch fixes
the issue by always re-init the congestion control like other means
such as setsockopt and sysctl changes.

Fixes: 91b5b21c7c16 ("bpf: Add support for changing congestion control")
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
---
 include/net/tcp.h   |  2 +-
 net/core/filter.c   |  3 +--
 net/ipv4/tcp.c      |  2 +-
 net/ipv4/tcp_cong.c | 11 ++---------
 4 files changed, 5 insertions(+), 13 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 6da880d2f022..f94a71b62ba5 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1006,7 +1006,7 @@ void tcp_get_default_congestion_control(struct net *net, char *name);
 void tcp_get_available_congestion_control(char *buf, size_t len);
 void tcp_get_allowed_congestion_control(char *buf, size_t len);
 int tcp_set_allowed_congestion_control(char *allowed);
-int tcp_set_congestion_control(struct sock *sk, const char *name, bool load, bool reinit);
+int tcp_set_congestion_control(struct sock *sk, const char *name, bool load);
 u32 tcp_slow_start(struct tcp_sock *tp, u32 acked);
 void tcp_cong_avoid_ai(struct tcp_sock *tp, u32 w, u32 acked);
 
diff --git a/net/core/filter.c b/net/core/filter.c
index 6a85e67fafce..757d52adccfc 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3233,12 +3233,11 @@ BPF_CALL_5(bpf_setsockopt, struct bpf_sock_ops_kern *, bpf_sock,
 		   sk->sk_prot->setsockopt == tcp_setsockopt) {
 		if (optname == TCP_CONGESTION) {
 			char name[TCP_CA_NAME_MAX];
-			bool reinit = bpf_sock->op > BPF_SOCK_OPS_NEEDS_ECN;
 
 			strncpy(name, optval, min_t(long, optlen,
 						    TCP_CA_NAME_MAX-1));
 			name[TCP_CA_NAME_MAX-1] = 0;
-			ret = tcp_set_congestion_control(sk, name, false, reinit);
+			ret = tcp_set_congestion_control(sk, name, false);
 		} else {
 			struct tcp_sock *tp = tcp_sk(sk);
 
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index f08eebe60446..21e2a07e857e 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2550,7 +2550,7 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
 		name[val] = 0;
 
 		lock_sock(sk);
-		err = tcp_set_congestion_control(sk, name, true, true);
+		err = tcp_set_congestion_control(sk, name, true);
 		release_sock(sk);
 		return err;
 	}
diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index bc6c02f16243..70895bee3026 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -332,7 +332,7 @@ int tcp_set_allowed_congestion_control(char *val)
  * tcp_reinit_congestion_control (if the current congestion control was
  * already initialized.
  */
-int tcp_set_congestion_control(struct sock *sk, const char *name, bool load, bool reinit)
+int tcp_set_congestion_control(struct sock *sk, const char *name, bool load)
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	const struct tcp_congestion_ops *ca;
@@ -356,15 +356,8 @@ int tcp_set_congestion_control(struct sock *sk, const char *name, bool load, boo
 	if (!ca) {
 		err = -ENOENT;
 	} else if (!load) {
-		const struct tcp_congestion_ops *old_ca = icsk->icsk_ca_ops;
-
 		if (try_module_get(ca->owner)) {
-			if (reinit) {
-				tcp_reinit_congestion_control(sk, ca);
-			} else {
-				icsk->icsk_ca_ops = ca;
-				module_put(old_ca->owner);
-			}
+			tcp_reinit_congestion_control(sk, ca);
 		} else {
 			err = -EBUSY;
 		}
-- 
2.16.0.rc1.238.g530d649a79-goog

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH net] bpf: always re-init the congestion control after switching to it
  2018-01-23 17:30 [PATCH net] bpf: always re-init the congestion control after switching to it Yuchung Cheng
@ 2018-01-23 19:20 ` Lawrence Brakmo
  2018-01-23 19:39   ` Neal Cardwell
  0 siblings, 1 reply; 9+ messages in thread
From: Lawrence Brakmo @ 2018-01-23 19:20 UTC (permalink / raw)
  To: Yuchung Cheng, davem@davemloft.net
  Cc: netdev@vger.kernel.org, edumazet@google.com, ncardwell@google.com,
	soheil@google.com

On 1/23/18, 9:30 AM, "Yuchung Cheng" <ycheng@google.com> wrote:

    The original patch that changes TCP's congestion control via eBPF only
    re-initializes the new congestion control, if the BPF op is set to an
    (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP will

What do you mean by “(invalid) value”?

    run the new congestion control from random states. 

This has always been possible with setsockopt, no?

   This patch fixes
    the issue by always re-init the congestion control like other means
    such as setsockopt and sysctl changes.

The current code re-inits the congestion control when calling 
tcp_set_congestion_control except when it is called early on (i.e. op <= 
BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to re-initialize
since it will be initialized later by TCP when the connection is established.

Otherwise, if we always call tcp_reinit_congestion_control we would call 
tcp_cleanup_congestion_control when the congestion control has not been
initialized yet.


    Fixes: 91b5b21c7c16 ("bpf: Add support for changing congestion control")
    Signed-off-by: Yuchung Cheng <ycheng@google.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Neal Cardwell <ncardwell@google.com>
    Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
    ---
     include/net/tcp.h   |  2 +-
     net/core/filter.c   |  3 +--
     net/ipv4/tcp.c      |  2 +-
     net/ipv4/tcp_cong.c | 11 ++---------
     4 files changed, 5 insertions(+), 13 deletions(-)
    
    diff --git a/include/net/tcp.h b/include/net/tcp.h
    index 6da880d2f022..f94a71b62ba5 100644
    --- a/include/net/tcp.h
    +++ b/include/net/tcp.h
    @@ -1006,7 +1006,7 @@ void tcp_get_default_congestion_control(struct net *net, char *name);
     void tcp_get_available_congestion_control(char *buf, size_t len);
     void tcp_get_allowed_congestion_control(char *buf, size_t len);
     int tcp_set_allowed_congestion_control(char *allowed);
    -int tcp_set_congestion_control(struct sock *sk, const char *name, bool load, bool reinit);
    +int tcp_set_congestion_control(struct sock *sk, const char *name, bool load);
     u32 tcp_slow_start(struct tcp_sock *tp, u32 acked);
     void tcp_cong_avoid_ai(struct tcp_sock *tp, u32 w, u32 acked);
     
    diff --git a/net/core/filter.c b/net/core/filter.c
    index 6a85e67fafce..757d52adccfc 100644
    --- a/net/core/filter.c
    +++ b/net/core/filter.c
    @@ -3233,12 +3233,11 @@ BPF_CALL_5(bpf_setsockopt, struct bpf_sock_ops_kern *, bpf_sock,
     		   sk->sk_prot->setsockopt == tcp_setsockopt) {
     		if (optname == TCP_CONGESTION) {
     			char name[TCP_CA_NAME_MAX];
    -			bool reinit = bpf_sock->op > BPF_SOCK_OPS_NEEDS_ECN;
     
     			strncpy(name, optval, min_t(long, optlen,
     						    TCP_CA_NAME_MAX-1));
     			name[TCP_CA_NAME_MAX-1] = 0;
    -			ret = tcp_set_congestion_control(sk, name, false, reinit);
    +			ret = tcp_set_congestion_control(sk, name, false);
     		} else {
     			struct tcp_sock *tp = tcp_sk(sk);
     
    diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
    index f08eebe60446..21e2a07e857e 100644
    --- a/net/ipv4/tcp.c
    +++ b/net/ipv4/tcp.c
    @@ -2550,7 +2550,7 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
     		name[val] = 0;
     
     		lock_sock(sk);
    -		err = tcp_set_congestion_control(sk, name, true, true);
    +		err = tcp_set_congestion_control(sk, name, true);
     		release_sock(sk);
     		return err;
     	}
    diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
    index bc6c02f16243..70895bee3026 100644
    --- a/net/ipv4/tcp_cong.c
    +++ b/net/ipv4/tcp_cong.c
    @@ -332,7 +332,7 @@ int tcp_set_allowed_congestion_control(char *val)
      * tcp_reinit_congestion_control (if the current congestion control was
      * already initialized.
      */
    -int tcp_set_congestion_control(struct sock *sk, const char *name, bool load, bool reinit)
    +int tcp_set_congestion_control(struct sock *sk, const char *name, bool load)
     {
     	struct inet_connection_sock *icsk = inet_csk(sk);
     	const struct tcp_congestion_ops *ca;
    @@ -356,15 +356,8 @@ int tcp_set_congestion_control(struct sock *sk, const char *name, bool load, boo
     	if (!ca) {
     		err = -ENOENT;
     	} else if (!load) {
    -		const struct tcp_congestion_ops *old_ca = icsk->icsk_ca_ops;
    -
     		if (try_module_get(ca->owner)) {
    -			if (reinit) {
    -				tcp_reinit_congestion_control(sk, ca);
    -			} else {
    -				icsk->icsk_ca_ops = ca;
    -				module_put(old_ca->owner);
    -			}
    +			tcp_reinit_congestion_control(sk, ca);
     		} else {
     			err = -EBUSY;
     		}
    -- 
    2.16.0.rc1.238.g530d649a79-goog
    
    


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net] bpf: always re-init the congestion control after switching to it
  2018-01-23 19:20 ` Lawrence Brakmo
@ 2018-01-23 19:39   ` Neal Cardwell
  2018-01-23 19:50     ` Eric Dumazet
  0 siblings, 1 reply; 9+ messages in thread
From: Neal Cardwell @ 2018-01-23 19:39 UTC (permalink / raw)
  To: Lawrence Brakmo
  Cc: Yuchung Cheng, davem@davemloft.net, netdev@vger.kernel.org,
	edumazet@google.com, soheil@google.com

On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo <brakmo@fb.com> wrote:
> On 1/23/18, 9:30 AM, "Yuchung Cheng" <ycheng@google.com> wrote:
>
>     The original patch that changes TCP's congestion control via eBPF only
>     re-initializes the new congestion control, if the BPF op is set to an
>     (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP will
>
> What do you mean by “(invalid) value”?
>
>     run the new congestion control from random states.
>
> This has always been possible with setsockopt, no?
>
>    This patch fixes
>     the issue by always re-init the congestion control like other means
>     such as setsockopt and sysctl changes.
>
> The current code re-inits the congestion control when calling
> tcp_set_congestion_control except when it is called early on (i.e. op <=
> BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to re-initialize
> since it will be initialized later by TCP when the connection is established.
>
> Otherwise, if we always call tcp_reinit_congestion_control we would call
> tcp_cleanup_congestion_control when the congestion control has not been
> initialized yet.

On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo <brakmo@fb.com> wrote:
> On 1/23/18, 9:30 AM, "Yuchung Cheng" <ycheng@google.com> wrote:
>
>     The original patch that changes TCP's congestion control via eBPF only
>     re-initializes the new congestion control, if the BPF op is set to an
>     (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP will
>
> What do you mean by “(invalid) value”?
>
>     run the new congestion control from random states.
>
> This has always been possible with setsockopt, no?
>
>    This patch fixes
>     the issue by always re-init the congestion control like other means
>     such as setsockopt and sysctl changes.
>
> The current code re-inits the congestion control when calling
> tcp_set_congestion_control except when it is called early on (i.e. op <=
> BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to re-initialize
> since it will be initialized later by TCP when the connection is established.
>
> Otherwise, if we always call tcp_reinit_congestion_control we would call
> tcp_cleanup_congestion_control when the congestion control has not been
> initialized yet.

Interesting. So I wonder if the symptoms we were seeing were due to
kernels that did not yet have this fix:

  27204aaa9dc6 ("tcp: uniform the set up of sockets after successful
connection):
  https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=27204aaa9dc67b833b77179fdac556288ec3a4bf

Before that fix, there could be TFO passive connections that at SYN time called:
  tcp_init_congestion_control(child);
and then:
  tcp_call_bpf(child, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB);

So that if the CC was switched in the
BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB handler then there would be no
init for the new module?

neal

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net] bpf: always re-init the congestion control after switching to it
  2018-01-23 19:39   ` Neal Cardwell
@ 2018-01-23 19:50     ` Eric Dumazet
  2018-01-23 20:19       ` Lawrence Brakmo
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2018-01-23 19:50 UTC (permalink / raw)
  To: Neal Cardwell, Lawrence Brakmo
  Cc: Yuchung Cheng, davem@davemloft.net, netdev@vger.kernel.org,
	edumazet@google.com, soheil@google.com

On Tue, 2018-01-23 at 14:39 -0500, Neal Cardwell wrote:
> On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo <brakmo@fb.com> wrote:
> > On 1/23/18, 9:30 AM, "Yuchung Cheng" <ycheng@google.com> wrote:
> > 
> >     The original patch that changes TCP's congestion control via eBPF only
> >     re-initializes the new congestion control, if the BPF op is set to an
> >     (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP will
> > 
> > What do you mean by “(invalid) value”?
> > 
> >     run the new congestion control from random states.
> > 
> > This has always been possible with setsockopt, no?
> > 
> >    This patch fixes
> >     the issue by always re-init the congestion control like other means
> >     such as setsockopt and sysctl changes.
> > 
> > The current code re-inits the congestion control when calling
> > tcp_set_congestion_control except when it is called early on (i.e. op <=
> > BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to re-initialize
> > since it will be initialized later by TCP when the connection is established.
> > 
> > Otherwise, if we always call tcp_reinit_congestion_control we would call
> > tcp_cleanup_congestion_control when the congestion control has not been
> > initialized yet.
> 
> On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo <brakmo@fb.com> wrote:
> > On 1/23/18, 9:30 AM, "Yuchung Cheng" <ycheng@google.com> wrote:
> > 
> >     The original patch that changes TCP's congestion control via eBPF only
> >     re-initializes the new congestion control, if the BPF op is set to an
> >     (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP will
> > 
> > What do you mean by “(invalid) value”?
> > 
> >     run the new congestion control from random states.
> > 
> > This has always been possible with setsockopt, no?
> > 
> >    This patch fixes
> >     the issue by always re-init the congestion control like other means
> >     such as setsockopt and sysctl changes.
> > 
> > The current code re-inits the congestion control when calling
> > tcp_set_congestion_control except when it is called early on (i.e. op <=
> > BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to re-initialize
> > since it will be initialized later by TCP when the connection is established.
> > 
> > Otherwise, if we always call tcp_reinit_congestion_control we would call
> > tcp_cleanup_congestion_control when the congestion control has not been
> > initialized yet.
> 
> Interesting. So I wonder if the symptoms we were seeing were due to
> kernels that did not yet have this fix:
> 
>   27204aaa9dc6 ("tcp: uniform the set up of sockets after successful
> connection):
>   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=27204aaa9dc67b833b77179fdac556288ec3a4bf
> 
> Before that fix, there could be TFO passive connections that at SYN time called:
>   tcp_init_congestion_control(child);
> and then:
>   tcp_call_bpf(child, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB);
> 
> So that if the CC was switched in the
> BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB handler then there would be no
> init for the new module?


Note that bpf_sock->op can be written by a malicious BPF filter.

So, a malicious filter can switch from Cubic to BBR without re-init,
and bad things can happen.

I do not believe we should trust BPF here.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net] bpf: always re-init the congestion control after switching to it
  2018-01-23 19:50     ` Eric Dumazet
@ 2018-01-23 20:19       ` Lawrence Brakmo
  2018-01-23 21:39         ` Yuchung Cheng
  2018-01-23 23:25         ` Alexei Starovoitov
  0 siblings, 2 replies; 9+ messages in thread
From: Lawrence Brakmo @ 2018-01-23 20:19 UTC (permalink / raw)
  To: Eric Dumazet, Neal Cardwell
  Cc: Yuchung Cheng, davem@davemloft.net, netdev@vger.kernel.org,
	edumazet@google.com, soheil@google.com

On 1/23/18, 11:50 AM, "Eric Dumazet" <eric.dumazet@gmail.com> wrote:

    On Tue, 2018-01-23 at 14:39 -0500, Neal Cardwell wrote:
    > On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo <brakmo@fb.com> wrote:
    > > On 1/23/18, 9:30 AM, "Yuchung Cheng" <ycheng@google.com> wrote:
    > > 
    > >     The original patch that changes TCP's congestion control via eBPF only
    > >     re-initializes the new congestion control, if the BPF op is set to an
    > >     (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP will
    > > 
    > > What do you mean by “(invalid) value”?
    > > 
    > >     run the new congestion control from random states.
    > > 
    > > This has always been possible with setsockopt, no?
    > > 
    > >    This patch fixes
    > >     the issue by always re-init the congestion control like other means
    > >     such as setsockopt and sysctl changes.
    > > 
    > > The current code re-inits the congestion control when calling
    > > tcp_set_congestion_control except when it is called early on (i.e. op <=
    > > BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to re-initialize
    > > since it will be initialized later by TCP when the connection is established.
    > > 
    > > Otherwise, if we always call tcp_reinit_congestion_control we would call
    > > tcp_cleanup_congestion_control when the congestion control has not been
    > > initialized yet.
    > 
    > On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo <brakmo@fb.com> wrote:
    > > On 1/23/18, 9:30 AM, "Yuchung Cheng" <ycheng@google.com> wrote:
    > > 
    > >     The original patch that changes TCP's congestion control via eBPF only
    > >     re-initializes the new congestion control, if the BPF op is set to an
    > >     (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP will
    > > 
    > > What do you mean by “(invalid) value”?
    > > 
    > >     run the new congestion control from random states.
    > > 
    > > This has always been possible with setsockopt, no?
    > > 
    > >    This patch fixes
    > >     the issue by always re-init the congestion control like other means
    > >     such as setsockopt and sysctl changes.
    > > 
    > > The current code re-inits the congestion control when calling
    > > tcp_set_congestion_control except when it is called early on (i.e. op <=
    > > BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to re-initialize
    > > since it will be initialized later by TCP when the connection is established.
    > > 
    > > Otherwise, if we always call tcp_reinit_congestion_control we would call
    > > tcp_cleanup_congestion_control when the congestion control has not been
    > > initialized yet.
    > 
    > Interesting. So I wonder if the symptoms we were seeing were due to
    > kernels that did not yet have this fix:
    > 
    >   27204aaa9dc6 ("tcp: uniform the set up of sockets after successful
    > connection):
    >   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=27204aaa9dc67b833b77179fdac556288ec3a4bf
    > 
    > Before that fix, there could be TFO passive connections that at SYN time called:
    >   tcp_init_congestion_control(child);
    > and then:
    >   tcp_call_bpf(child, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB);
    > 
    > So that if the CC was switched in the
    > BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB handler then there would be no
    > init for the new module?
    
    
    Note that bpf_sock->op can be written by a malicious BPF filter.
    
    So, a malicious filter can switch from Cubic to BBR without re-init,
    and bad things can happen.
    
    I do not believe we should trust BPF here.
    
Very good point Eric. One solution would be to make bpf_sock->op not writeable by
the BPF program. 

Neal, you are correct that would have been a problem. I leave it up to you guys whether
making bpf_sock->op not writeable by BPF program is enough or if it is safer to always
re-init (as long as there is no problem calling tcp_cleanup_congestion_control when it
has not been initialized.

    


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net] bpf: always re-init the congestion control after switching to it
  2018-01-23 20:19       ` Lawrence Brakmo
@ 2018-01-23 21:39         ` Yuchung Cheng
  2018-01-23 23:25         ` Alexei Starovoitov
  1 sibling, 0 replies; 9+ messages in thread
From: Yuchung Cheng @ 2018-01-23 21:39 UTC (permalink / raw)
  To: Lawrence Brakmo
  Cc: Eric Dumazet, Neal Cardwell, davem@davemloft.net,
	netdev@vger.kernel.org, edumazet@google.com, soheil@google.com

On Tue, Jan 23, 2018 at 12:19 PM, Lawrence Brakmo <brakmo@fb.com> wrote:
>
> On 1/23/18, 11:50 AM, "Eric Dumazet" <eric.dumazet@gmail.com> wrote:
>
>     On Tue, 2018-01-23 at 14:39 -0500, Neal Cardwell wrote:
>     > On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo <brakmo@fb.com> wrote:
>     > > On 1/23/18, 9:30 AM, "Yuchung Cheng" <ycheng@google.com> wrote:
>     > >
>     > >     The original patch that changes TCP's congestion control via eBPF only
>     > >     re-initializes the new congestion control, if the BPF op is set to an
>     > >     (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP will
>     > >
>     > > What do you mean by “(invalid) value”?
>     > >
>     > >     run the new congestion control from random states.
>     > >
>     > > This has always been possible with setsockopt, no?
>     > >
>     > >    This patch fixes
>     > >     the issue by always re-init the congestion control like other means
>     > >     such as setsockopt and sysctl changes.
>     > >
>     > > The current code re-inits the congestion control when calling
>     > > tcp_set_congestion_control except when it is called early on (i.e. op <=
>     > > BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to re-initialize
>     > > since it will be initialized later by TCP when the connection is established.
>     > >
>     > > Otherwise, if we always call tcp_reinit_congestion_control we would call
>     > > tcp_cleanup_congestion_control when the congestion control has not been
>     > > initialized yet.
>     >
>     > On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo <brakmo@fb.com> wrote:
>     > > On 1/23/18, 9:30 AM, "Yuchung Cheng" <ycheng@google.com> wrote:
>     > >
>     > >     The original patch that changes TCP's congestion control via eBPF only
>     > >     re-initializes the new congestion control, if the BPF op is set to an
>     > >     (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP will
>     > >
>     > > What do you mean by “(invalid) value”?
>     > >
>     > >     run the new congestion control from random states.
>     > >
>     > > This has always been possible with setsockopt, no?
>     > >
>     > >    This patch fixes
>     > >     the issue by always re-init the congestion control like other means
>     > >     such as setsockopt and sysctl changes.
>     > >
>     > > The current code re-inits the congestion control when calling
>     > > tcp_set_congestion_control except when it is called early on (i.e. op <=
>     > > BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to re-initialize
>     > > since it will be initialized later by TCP when the connection is established.
>     > >
>     > > Otherwise, if we always call tcp_reinit_congestion_control we would call
>     > > tcp_cleanup_congestion_control when the congestion control has not been
>     > > initialized yet.
>     >
>     > Interesting. So I wonder if the symptoms we were seeing were due to
>     > kernels that did not yet have this fix:
>     >
>     >   27204aaa9dc6 ("tcp: uniform the set up of sockets after successful
>     > connection):
>     >   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=27204aaa9dc67b833b77179fdac556288ec3a4bf
>     >
>     > Before that fix, there could be TFO passive connections that at SYN time called:
>     >   tcp_init_congestion_control(child);
>     > and then:
>     >   tcp_call_bpf(child, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB);
>     >
>     > So that if the CC was switched in the
>     > BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB handler then there would be no
>     > init for the new module?
>
>
>     Note that bpf_sock->op can be written by a malicious BPF filter.
>
>     So, a malicious filter can switch from Cubic to BBR without re-init,
>     and bad things can happen.
>
>     I do not believe we should trust BPF here.
>
> Very good point Eric. One solution would be to make bpf_sock->op not writeable by
> the BPF program.
>
> Neal, you are correct that would have been a problem. I leave it up to you guys whether
> making bpf_sock->op not writeable by BPF program is enough or if it is safer to always
> re-init (as long as there is no problem calling tcp_cleanup_congestion_control when it
> has not been initialized.
Thank you Larry for the clarification. I prefer the latter approach
and will respin.

>
>
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net] bpf: always re-init the congestion control after switching to it
  2018-01-23 20:19       ` Lawrence Brakmo
  2018-01-23 21:39         ` Yuchung Cheng
@ 2018-01-23 23:25         ` Alexei Starovoitov
  2018-01-23 23:30           ` Lawrence Brakmo
  1 sibling, 1 reply; 9+ messages in thread
From: Alexei Starovoitov @ 2018-01-23 23:25 UTC (permalink / raw)
  To: Lawrence Brakmo
  Cc: Eric Dumazet, Neal Cardwell, Yuchung Cheng, davem@davemloft.net,
	netdev@vger.kernel.org, edumazet@google.com, soheil@google.com

On Tue, Jan 23, 2018 at 08:19:54PM +0000, Lawrence Brakmo wrote:
> On 1/23/18, 11:50 AM, "Eric Dumazet" <eric.dumazet@gmail.com> wrote:
> 
>     On Tue, 2018-01-23 at 14:39 -0500, Neal Cardwell wrote:
>     > On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo <brakmo@fb.com> wrote:
>     > > On 1/23/18, 9:30 AM, "Yuchung Cheng" <ycheng@google.com> wrote:
>     > > 
>     > >     The original patch that changes TCP's congestion control via eBPF only
>     > >     re-initializes the new congestion control, if the BPF op is set to an
>     > >     (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP will
>     > > 
>     > > What do you mean by “(invalid) value”?
>     > > 
>     > >     run the new congestion control from random states.
>     > > 
>     > > This has always been possible with setsockopt, no?
>     > > 
>     > >    This patch fixes
>     > >     the issue by always re-init the congestion control like other means
>     > >     such as setsockopt and sysctl changes.
>     > > 
>     > > The current code re-inits the congestion control when calling
>     > > tcp_set_congestion_control except when it is called early on (i.e. op <=
>     > > BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to re-initialize
>     > > since it will be initialized later by TCP when the connection is established.
>     > > 
>     > > Otherwise, if we always call tcp_reinit_congestion_control we would call
>     > > tcp_cleanup_congestion_control when the congestion control has not been
>     > > initialized yet.
>     > 
>     > On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo <brakmo@fb.com> wrote:
>     > > On 1/23/18, 9:30 AM, "Yuchung Cheng" <ycheng@google.com> wrote:
>     > > 
>     > >     The original patch that changes TCP's congestion control via eBPF only
>     > >     re-initializes the new congestion control, if the BPF op is set to an
>     > >     (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP will
>     > > 
>     > > What do you mean by “(invalid) value”?
>     > > 
>     > >     run the new congestion control from random states.
>     > > 
>     > > This has always been possible with setsockopt, no?
>     > > 
>     > >    This patch fixes
>     > >     the issue by always re-init the congestion control like other means
>     > >     such as setsockopt and sysctl changes.
>     > > 
>     > > The current code re-inits the congestion control when calling
>     > > tcp_set_congestion_control except when it is called early on (i.e. op <=
>     > > BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to re-initialize
>     > > since it will be initialized later by TCP when the connection is established.
>     > > 
>     > > Otherwise, if we always call tcp_reinit_congestion_control we would call
>     > > tcp_cleanup_congestion_control when the congestion control has not been
>     > > initialized yet.
>     > 
>     > Interesting. So I wonder if the symptoms we were seeing were due to
>     > kernels that did not yet have this fix:
>     > 
>     >   27204aaa9dc6 ("tcp: uniform the set up of sockets after successful
>     > connection):
>     >   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=27204aaa9dc67b833b77179fdac556288ec3a4bf
>     > 
>     > Before that fix, there could be TFO passive connections that at SYN time called:
>     >   tcp_init_congestion_control(child);
>     > and then:
>     >   tcp_call_bpf(child, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB);
>     > 
>     > So that if the CC was switched in the
>     > BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB handler then there would be no
>     > init for the new module?
>     
>     
>     Note that bpf_sock->op can be written by a malicious BPF filter.
>     
>     So, a malicious filter can switch from Cubic to BBR without re-init,
>     and bad things can happen.
>     
>     I do not believe we should trust BPF here.
>     
> Very good point Eric. One solution would be to make bpf_sock->op not writeable by
> the BPF program. 
> 
> Neal, you are correct that would have been a problem. I leave it up to you guys whether
> making bpf_sock->op not writeable by BPF program is enough or if it is safer to always
> re-init (as long as there is no problem calling tcp_cleanup_congestion_control when it
> has not been initialized.

I think allowing write into 'op' and 'replylong' was a mistake.
Only 'reply' field is used by tcp_call_bpf().
No reason to let programs write into other fields.
I think we have to fix it now before programs start to rely
on this undefined behavior.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net] bpf: always re-init the congestion control after switching to it
  2018-01-23 23:25         ` Alexei Starovoitov
@ 2018-01-23 23:30           ` Lawrence Brakmo
  2018-01-23 23:36             ` Yuchung Cheng
  0 siblings, 1 reply; 9+ messages in thread
From: Lawrence Brakmo @ 2018-01-23 23:30 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Eric Dumazet, Neal Cardwell, Yuchung Cheng, davem@davemloft.net,
	netdev@vger.kernel.org, edumazet@google.com, soheil@google.com



On 1/23/18, 3:26 PM, "Alexei Starovoitov" <alexei.starovoitov@gmail.com> wrote:

    On Tue, Jan 23, 2018 at 08:19:54PM +0000, Lawrence Brakmo wrote:
    > On 1/23/18, 11:50 AM, "Eric Dumazet" <eric.dumazet@gmail.com> wrote:
    > 
    >     On Tue, 2018-01-23 at 14:39 -0500, Neal Cardwell wrote:
    >     > On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo <brakmo@fb.com> wrote:
    >     > > On 1/23/18, 9:30 AM, "Yuchung Cheng" <ycheng@google.com> wrote:
    >     > > 
    >     > >     The original patch that changes TCP's congestion control via eBPF only
    >     > >     re-initializes the new congestion control, if the BPF op is set to an
    >     > >     (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP will
    >     > > 
    >     > > What do you mean by “(invalid) value”?
    >     > > 
    >     > >     run the new congestion control from random states.
    >     > > 
    >     > > This has always been possible with setsockopt, no?
    >     > > 
    >     > >    This patch fixes
    >     > >     the issue by always re-init the congestion control like other means
    >     > >     such as setsockopt and sysctl changes.
    >     > > 
    >     > > The current code re-inits the congestion control when calling
    >     > > tcp_set_congestion_control except when it is called early on (i.e. op <=
    >     > > BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to re-initialize
    >     > > since it will be initialized later by TCP when the connection is established.
    >     > > 
    >     > > Otherwise, if we always call tcp_reinit_congestion_control we would call
    >     > > tcp_cleanup_congestion_control when the congestion control has not been
    >     > > initialized yet.
    >     > 
    >     > On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo <brakmo@fb.com> wrote:
    >     > > On 1/23/18, 9:30 AM, "Yuchung Cheng" <ycheng@google.com> wrote:
    >     > > 
    >     > >     The original patch that changes TCP's congestion control via eBPF only
    >     > >     re-initializes the new congestion control, if the BPF op is set to an
    >     > >     (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP will
    >     > > 
    >     > > What do you mean by “(invalid) value”?
    >     > > 
    >     > >     run the new congestion control from random states.
    >     > > 
    >     > > This has always been possible with setsockopt, no?
    >     > > 
    >     > >    This patch fixes
    >     > >     the issue by always re-init the congestion control like other means
    >     > >     such as setsockopt and sysctl changes.
    >     > > 
    >     > > The current code re-inits the congestion control when calling
    >     > > tcp_set_congestion_control except when it is called early on (i.e. op <=
    >     > > BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to re-initialize
    >     > > since it will be initialized later by TCP when the connection is established.
    >     > > 
    >     > > Otherwise, if we always call tcp_reinit_congestion_control we would call
    >     > > tcp_cleanup_congestion_control when the congestion control has not been
    >     > > initialized yet.
    >     > 
    >     > Interesting. So I wonder if the symptoms we were seeing were due to
    >     > kernels that did not yet have this fix:
    >     > 
    >     >   27204aaa9dc6 ("tcp: uniform the set up of sockets after successful
    >     > connection):
    >     >   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=27204aaa9dc67b833b77179fdac556288ec3a4bf
    >     > 
    >     > Before that fix, there could be TFO passive connections that at SYN time called:
    >     >   tcp_init_congestion_control(child);
    >     > and then:
    >     >   tcp_call_bpf(child, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB);
    >     > 
    >     > So that if the CC was switched in the
    >     > BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB handler then there would be no
    >     > init for the new module?
    >     
    >     
    >     Note that bpf_sock->op can be written by a malicious BPF filter.
    >     
    >     So, a malicious filter can switch from Cubic to BBR without re-init,
    >     and bad things can happen.
    >     
    >     I do not believe we should trust BPF here.
    >     
    > Very good point Eric. One solution would be to make bpf_sock->op not writeable by
    > the BPF program. 
    > 
    > Neal, you are correct that would have been a problem. I leave it up to you guys whether
    > making bpf_sock->op not writeable by BPF program is enough or if it is safer to always
    > re-init (as long as there is no problem calling tcp_cleanup_congestion_control when it
    > has not been initialized.
    
    I think allowing write into 'op' and 'replylong' was a mistake.
    Only 'reply' field is used by tcp_call_bpf().
    No reason to let programs write into other fields.
    I think we have to fix it now before programs start to rely
    on this undefined behavior.
    
write into ‘op’ is a mistake. Writing to replylong is a mistake until we have a calling op
that uses the longer reply. I will do a patch to fix this once my outstanding patch is
accepted since otherwise I would need to update my current patch.



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net] bpf: always re-init the congestion control after switching to it
  2018-01-23 23:30           ` Lawrence Brakmo
@ 2018-01-23 23:36             ` Yuchung Cheng
  0 siblings, 0 replies; 9+ messages in thread
From: Yuchung Cheng @ 2018-01-23 23:36 UTC (permalink / raw)
  To: Lawrence Brakmo
  Cc: Alexei Starovoitov, Eric Dumazet, Neal Cardwell,
	davem@davemloft.net, netdev@vger.kernel.org, edumazet@google.com,
	soheil@google.com

On Tue, Jan 23, 2018 at 3:30 PM, Lawrence Brakmo <brakmo@fb.com> wrote:
>
>
>
> On 1/23/18, 3:26 PM, "Alexei Starovoitov" <alexei.starovoitov@gmail.com> wrote:
>
>     On Tue, Jan 23, 2018 at 08:19:54PM +0000, Lawrence Brakmo wrote:
>     > On 1/23/18, 11:50 AM, "Eric Dumazet" <eric.dumazet@gmail.com> wrote:
>     >
>     >     On Tue, 2018-01-23 at 14:39 -0500, Neal Cardwell wrote:
>     >     > On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo <brakmo@fb.com> wrote:
>     >     > > On 1/23/18, 9:30 AM, "Yuchung Cheng" <ycheng@google.com> wrote:
>     >     > >
>     >     > >     The original patch that changes TCP's congestion control via eBPF only
>     >     > >     re-initializes the new congestion control, if the BPF op is set to an
>     >     > >     (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP will
>     >     > >
>     >     > > What do you mean by “(invalid) value”?
>     >     > >
>     >     > >     run the new congestion control from random states.
>     >     > >
>     >     > > This has always been possible with setsockopt, no?
>     >     > >
>     >     > >    This patch fixes
>     >     > >     the issue by always re-init the congestion control like other means
>     >     > >     such as setsockopt and sysctl changes.
>     >     > >
>     >     > > The current code re-inits the congestion control when calling
>     >     > > tcp_set_congestion_control except when it is called early on (i.e. op <=
>     >     > > BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to re-initialize
>     >     > > since it will be initialized later by TCP when the connection is established.
>     >     > >
>     >     > > Otherwise, if we always call tcp_reinit_congestion_control we would call
>     >     > > tcp_cleanup_congestion_control when the congestion control has not been
>     >     > > initialized yet.
>     >     >
>     >     > On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo <brakmo@fb.com> wrote:
>     >     > > On 1/23/18, 9:30 AM, "Yuchung Cheng" <ycheng@google.com> wrote:
>     >     > >
>     >     > >     The original patch that changes TCP's congestion control via eBPF only
>     >     > >     re-initializes the new congestion control, if the BPF op is set to an
>     >     > >     (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP will
>     >     > >
>     >     > > What do you mean by “(invalid) value”?
>     >     > >
>     >     > >     run the new congestion control from random states.
>     >     > >
>     >     > > This has always been possible with setsockopt, no?
>     >     > >
>     >     > >    This patch fixes
>     >     > >     the issue by always re-init the congestion control like other means
>     >     > >     such as setsockopt and sysctl changes.
>     >     > >
>     >     > > The current code re-inits the congestion control when calling
>     >     > > tcp_set_congestion_control except when it is called early on (i.e. op <=
>     >     > > BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to re-initialize
>     >     > > since it will be initialized later by TCP when the connection is established.
>     >     > >
>     >     > > Otherwise, if we always call tcp_reinit_congestion_control we would call
>     >     > > tcp_cleanup_congestion_control when the congestion control has not been
>     >     > > initialized yet.
>     >     >
>     >     > Interesting. So I wonder if the symptoms we were seeing were due to
>     >     > kernels that did not yet have this fix:
>     >     >
>     >     >   27204aaa9dc6 ("tcp: uniform the set up of sockets after successful
>     >     > connection):
>     >     >   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=27204aaa9dc67b833b77179fdac556288ec3a4bf
>     >     >
>     >     > Before that fix, there could be TFO passive connections that at SYN time called:
>     >     >   tcp_init_congestion_control(child);
>     >     > and then:
>     >     >   tcp_call_bpf(child, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB);
>     >     >
>     >     > So that if the CC was switched in the
>     >     > BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB handler then there would be no
>     >     > init for the new module?
>     >
>     >
>     >     Note that bpf_sock->op can be written by a malicious BPF filter.
>     >
>     >     So, a malicious filter can switch from Cubic to BBR without re-init,
>     >     and bad things can happen.
>     >
>     >     I do not believe we should trust BPF here.
>     >
>     > Very good point Eric. One solution would be to make bpf_sock->op not writeable by
>     > the BPF program.
>     >
>     > Neal, you are correct that would have been a problem. I leave it up to you guys whether
>     > making bpf_sock->op not writeable by BPF program is enough or if it is safer to always
>     > re-init (as long as there is no problem calling tcp_cleanup_congestion_control when it
>     > has not been initialized.
>
>     I think allowing write into 'op' and 'replylong' was a mistake.
>     Only 'reply' field is used by tcp_call_bpf().
>     No reason to let programs write into other fields.
>     I think we have to fix it now before programs start to rely
>     on this undefined behavior.
>
> write into ‘op’ is a mistake. Writing to replylong is a mistake until we have a calling op
> that uses the longer reply. I will do a patch to fix this once my outstanding patch is
> accepted since otherwise I would need to update my current patch.
That'd be great. Thanks Larry. IMO conditioning reinit TCP C.C. module on bpf op
is also worth addressing. Otherwise that's just inviting future bugs.
So I am still interested in addressing
that (unless you'd like to address that too)

>
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2018-01-23 23:36 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-01-23 17:30 [PATCH net] bpf: always re-init the congestion control after switching to it Yuchung Cheng
2018-01-23 19:20 ` Lawrence Brakmo
2018-01-23 19:39   ` Neal Cardwell
2018-01-23 19:50     ` Eric Dumazet
2018-01-23 20:19       ` Lawrence Brakmo
2018-01-23 21:39         ` Yuchung Cheng
2018-01-23 23:25         ` Alexei Starovoitov
2018-01-23 23:30           ` Lawrence Brakmo
2018-01-23 23:36             ` Yuchung Cheng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).