* [PATCH v2 bpf-next 0/5] bpf: Remove recursion check for struct_ops prog
@ 2022-09-23 22:44 Martin KaFai Lau
[not found] ` <20220923224518.2353383-1-kafai@fb.com>
0 siblings, 1 reply; 7+ messages in thread
From: Martin KaFai Lau @ 2022-09-23 22:44 UTC (permalink / raw)
To: bpf, netdev
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
David Miller, Eric Dumazet, Jakub Kicinski, kernel-team,
Paolo Abeni
From: Martin KaFai Lau <martin.lau@kernel.org>
The struct_ops currently shares the tracing trampoline's enter/exit
functions, which track prog->active to avoid recursion. It turns
out a struct_ops bpf prog can trip over this prog->active check and
be unnecessarily skipped. e.g. '.ssthresh' may run in_task() and
then be interrupted by a softirq that runs the same '.ssthresh'.
The kernel does not call the tcp-cc's ops recursively, so this set
removes the recursion check for struct_ops progs.
v2:
v1 [0] turned into a long discussion about a few corner cases and
about whether the check needs to follow the bpf_run_ctx chain when
a tracing bpf_run_ctx (kprobe/tracepoint/trampoline) runs in
between. That is a good signal that the approach is not obvious
enough to reason about and that a trade-off toward a more
straightforward approach is needed. This revision uses one bit out
of an existing 1-byte hole in tcp_sock. It is in Patch 4.
[0]: https://lore.kernel.org/bpf/20220922225616.3054840-1-kafai@fb.com/T/#md98d40ac5ec295fdadef476c227a3401b2b6b911
Martin KaFai Lau (5):
bpf: Add __bpf_prog_{enter,exit}_struct_ops for struct_ops trampoline
bpf: Move the "cdg" tcp-cc check to the common sol_tcp_sockopt()
bpf: Refactor bpf_setsockopt(TCP_CONGESTION) handling into another
function
bpf: tcp: Stop bpf_setsockopt(TCP_CONGESTION) in init ops to recur
itself
selftests/bpf: Check -EBUSY for the recurred
bpf_setsockopt(TCP_CONGESTION)
arch/x86/net/bpf_jit_comp.c | 3 +
include/linux/bpf.h | 4 ++
include/linux/tcp.h | 6 ++
kernel/bpf/trampoline.c | 23 ++++++
net/core/filter.c | 70 ++++++++++++++-----
.../selftests/bpf/prog_tests/bpf_tcp_ca.c | 4 ++
tools/testing/selftests/bpf/progs/bpf_dctcp.c | 25 ++++---
7 files changed, 111 insertions(+), 24 deletions(-)
--
2.30.2
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v2 bpf-next 4/5] bpf: tcp: Stop bpf_setsockopt(TCP_CONGESTION) in init ops to recur itself
[not found] ` <20220923224518.2353383-1-kafai@fb.com>
@ 2022-09-27 3:34 ` Alexei Starovoitov
2022-09-29 1:12 ` Alexei Starovoitov
2022-09-29 2:04 ` Eric Dumazet
1 sibling, 1 reply; 7+ messages in thread
From: Alexei Starovoitov @ 2022-09-27 3:34 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: bpf, Network Development, Alexei Starovoitov, Andrii Nakryiko,
Daniel Borkmann, David Miller, Eric Dumazet, Jakub Kicinski,
Kernel Team, Paolo Abeni, Martin KaFai Lau
On Fri, Sep 23, 2022 at 3:48 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> From: Martin KaFai Lau <martin.lau@kernel.org>
>
> When a bad bpf prog '.init' calls
> bpf_setsockopt(TCP_CONGESTION, "itself"), it will trigger this loop:
>
> .init => bpf_setsockopt(tcp_cc) => .init => bpf_setsockopt(tcp_cc) ...
> ... => .init => bpf_setsockopt(tcp_cc).
>
> It was prevented by the prog->active counter before, but the
> prog->active detection cannot be used in struct_ops, as explained
> in the earlier patch of this set.
>
> In this patch, a second (nested) bpf_setsockopt(tcp_cc) is not
> allowed, in order to break the loop. This is done by using one
> bit of an existing 1-byte hole in tcp_sock to track whether a
> bpf_setsockopt(TCP_CONGESTION) is ongoing on this tcp_sock.
>
> Note that this essentially means only the first '.init' can call
> bpf_setsockopt(TCP_CONGESTION) to pick a fallback cc (e.g. the
> peer does not support ECN); the nested '.init' cannot fall back
> to yet another cc. This applies even when the second
> bpf_setsockopt(TCP_CONGESTION) would not cause a loop.
>
> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
> ---
> include/linux/tcp.h | 6 ++++++
> net/core/filter.c | 28 +++++++++++++++++++++++++++-
> 2 files changed, 33 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/tcp.h b/include/linux/tcp.h
> index a9fbe22732c3..3bdf687e2fb3 100644
> --- a/include/linux/tcp.h
> +++ b/include/linux/tcp.h
> @@ -388,6 +388,12 @@ struct tcp_sock {
> u8 bpf_sock_ops_cb_flags; /* Control calling BPF programs
> * values defined in uapi/linux/tcp.h
> */
> + u8 bpf_chg_cc_inprogress:1; /* In the middle of
> + * bpf_setsockopt(TCP_CONGESTION),
> + * it is to avoid the bpf_tcp_cc->init()
> + * to recur itself by calling
> + * bpf_setsockopt(TCP_CONGESTION, "itself").
> + */
> #define BPF_SOCK_OPS_TEST_FLAG(TP, ARG) (TP->bpf_sock_ops_cb_flags & ARG)
> #else
> #define BPF_SOCK_OPS_TEST_FLAG(TP, ARG) 0
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 96f2f7a65e65..ac4c45c02da5 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -5105,6 +5105,9 @@ static int bpf_sol_tcp_setsockopt(struct sock *sk, int optname,
> static int sol_tcp_sockopt_congestion(struct sock *sk, char *optval,
> int *optlen, bool getopt)
> {
> + struct tcp_sock *tp;
> + int ret;
> +
> if (*optlen < 2)
> return -EINVAL;
>
> @@ -5125,8 +5128,31 @@ static int sol_tcp_sockopt_congestion(struct sock *sk, char *optval,
> if (*optlen >= sizeof("cdg") - 1 && !strncmp("cdg", optval, *optlen))
> return -ENOTSUPP;
>
> - return do_tcp_setsockopt(sk, SOL_TCP, TCP_CONGESTION,
> + /* It stops this loop:
> + *
> + * .init => bpf_setsockopt(tcp_cc) => .init =>
> + * bpf_setsockopt(tcp_cc) => .init => ....
> + *
> + * The second bpf_setsockopt(tcp_cc) is not allowed,
> + * in order to break the loop when both .init
> + * are the same bpf prog.
> + *
> + * This applies even when the second
> + * bpf_setsockopt(tcp_cc) does not cause a loop. Only
> + * the first '.init' can call bpf_setsockopt(TCP_CONGESTION)
> + * to pick a fallback cc (eg. peer does not support ECN);
> + * the second '.init' cannot fall back to
> + * another.
> + */
> + tp = tcp_sk(sk);
> + if (tp->bpf_chg_cc_inprogress)
> + return -EBUSY;
> +
> + tp->bpf_chg_cc_inprogress = 1;
> + ret = do_tcp_setsockopt(sk, SOL_TCP, TCP_CONGESTION,
> KERNEL_SOCKPTR(optval), *optlen);
> + tp->bpf_chg_cc_inprogress = 0;
> + return ret;
Eric,
Could you please ack this patch?
* Re: [PATCH v2 bpf-next 4/5] bpf: tcp: Stop bpf_setsockopt(TCP_CONGESTION) in init ops to recur itself
2022-09-27 3:34 ` [PATCH v2 bpf-next 4/5] bpf: tcp: Stop bpf_setsockopt(TCP_CONGESTION) in init ops to recur itself Alexei Starovoitov
@ 2022-09-29 1:12 ` Alexei Starovoitov
0 siblings, 0 replies; 7+ messages in thread
From: Alexei Starovoitov @ 2022-09-29 1:12 UTC (permalink / raw)
To: Martin KaFai Lau, Eric Dumazet
Cc: bpf, Network Development, Alexei Starovoitov, Andrii Nakryiko,
Daniel Borkmann, David Miller, Eric Dumazet, Jakub Kicinski,
Kernel Team, Paolo Abeni, Martin KaFai Lau
Eric,
Ping! This is an important fix for anyone using bpf-based tcp-cc.
On Mon, Sep 26, 2022 at 8:34 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Sep 23, 2022 at 3:48 PM Martin KaFai Lau <kafai@fb.com> wrote:
> >
> > From: Martin KaFai Lau <martin.lau@kernel.org>
> >
> > When a bad bpf prog '.init' calls
> > bpf_setsockopt(TCP_CONGESTION, "itself"), it will trigger this loop:
> >
> > .init => bpf_setsockopt(tcp_cc) => .init => bpf_setsockopt(tcp_cc) ...
> > ... => .init => bpf_setsockopt(tcp_cc).
> >
> > It was prevented by the prog->active counter before but the prog->active
> > detection cannot be used in struct_ops as explained in the earlier
> > patch of the set.
> >
> > In this patch, the second bpf_setsockopt(tcp_cc) is not allowed
> > in order to break the loop. This is done by using a bit of
> > an existing 1 byte hole in tcp_sock to check if there is
> > on-going bpf_setsockopt(TCP_CONGESTION) in this tcp_sock.
> >
> > Note that this essentially limits only the first '.init' can
> > call bpf_setsockopt(TCP_CONGESTION) to pick a fallback cc (eg. peer
> > does not support ECN) and the second '.init' cannot fallback to
> > another cc. This applies even the second
> > bpf_setsockopt(TCP_CONGESTION) will not cause a loop.
> >
> > Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
> > ---
> > include/linux/tcp.h | 6 ++++++
> > net/core/filter.c | 28 +++++++++++++++++++++++++++-
> > 2 files changed, 33 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/tcp.h b/include/linux/tcp.h
> > index a9fbe22732c3..3bdf687e2fb3 100644
> > --- a/include/linux/tcp.h
> > +++ b/include/linux/tcp.h
> > @@ -388,6 +388,12 @@ struct tcp_sock {
> > u8 bpf_sock_ops_cb_flags; /* Control calling BPF programs
> > * values defined in uapi/linux/tcp.h
> > */
> > + u8 bpf_chg_cc_inprogress:1; /* In the middle of
> > + * bpf_setsockopt(TCP_CONGESTION),
> > + * it is to avoid the bpf_tcp_cc->init()
> > + * to recur itself by calling
> > + * bpf_setsockopt(TCP_CONGESTION, "itself").
> > + */
> > #define BPF_SOCK_OPS_TEST_FLAG(TP, ARG) (TP->bpf_sock_ops_cb_flags & ARG)
> > #else
> > #define BPF_SOCK_OPS_TEST_FLAG(TP, ARG) 0
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 96f2f7a65e65..ac4c45c02da5 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -5105,6 +5105,9 @@ static int bpf_sol_tcp_setsockopt(struct sock *sk, int optname,
> > static int sol_tcp_sockopt_congestion(struct sock *sk, char *optval,
> > int *optlen, bool getopt)
> > {
> > + struct tcp_sock *tp;
> > + int ret;
> > +
> > if (*optlen < 2)
> > return -EINVAL;
> >
> > @@ -5125,8 +5128,31 @@ static int sol_tcp_sockopt_congestion(struct sock *sk, char *optval,
> > if (*optlen >= sizeof("cdg") - 1 && !strncmp("cdg", optval, *optlen))
> > return -ENOTSUPP;
> >
> > - return do_tcp_setsockopt(sk, SOL_TCP, TCP_CONGESTION,
> > + /* It stops this looping
> > + *
> > + * .init => bpf_setsockopt(tcp_cc) => .init =>
> > + * bpf_setsockopt(tcp_cc)" => .init => ....
> > + *
> > + * The second bpf_setsockopt(tcp_cc) is not allowed
> > + * in order to break the loop when both .init
> > + * are the same bpf prog.
> > + *
> > + * This applies even the second bpf_setsockopt(tcp_cc)
> > + * does not cause a loop. This limits only the first
> > + * '.init' can call bpf_setsockopt(TCP_CONGESTION) to
> > + * pick a fallback cc (eg. peer does not support ECN)
> > + * and the second '.init' cannot fallback to
> > + * another.
> > + */
> > + tp = tcp_sk(sk);
> > + if (tp->bpf_chg_cc_inprogress)
> > + return -EBUSY;
> > +
> > + tp->bpf_chg_cc_inprogress = 1;
> > + ret = do_tcp_setsockopt(sk, SOL_TCP, TCP_CONGESTION,
> > KERNEL_SOCKPTR(optval), *optlen);
> > + tp->bpf_chg_cc_inprogress = 0;
> > + return ret;
>
> Eric,
>
> Could you please ack this patch?
* Re: [PATCH v2 bpf-next 4/5] bpf: tcp: Stop bpf_setsockopt(TCP_CONGESTION) in init ops to recur itself
[not found] ` <20220923224518.2353383-1-kafai@fb.com>
2022-09-27 3:34 ` [PATCH v2 bpf-next 4/5] bpf: tcp: Stop bpf_setsockopt(TCP_CONGESTION) in init ops to recur itself Alexei Starovoitov
@ 2022-09-29 2:04 ` Eric Dumazet
2022-09-29 5:31 ` Martin KaFai Lau
1 sibling, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2022-09-29 2:04 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: bpf, netdev, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
David Miller, Jakub Kicinski, kernel-team, Paolo Abeni,
Martin KaFai Lau
On Fri, Sep 23, 2022 at 3:48 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> From: Martin KaFai Lau <martin.lau@kernel.org>
>
> When a bad bpf prog '.init' calls
> bpf_setsockopt(TCP_CONGESTION, "itself"), it will trigger this loop:
>
> .init => bpf_setsockopt(tcp_cc) => .init => bpf_setsockopt(tcp_cc) ...
> ... => .init => bpf_setsockopt(tcp_cc).
>
> It was prevented by the prog->active counter before but the prog->active
> detection cannot be used in struct_ops as explained in the earlier
> patch of the set.
>
> In this patch, the second bpf_setsockopt(tcp_cc) is not allowed
> in order to break the loop. This is done by using a bit of
> an existing 1 byte hole in tcp_sock to check if there is
> on-going bpf_setsockopt(TCP_CONGESTION) in this tcp_sock.
>
> Note that this essentially limits only the first '.init' can
> call bpf_setsockopt(TCP_CONGESTION) to pick a fallback cc (eg. peer
> does not support ECN) and the second '.init' cannot fallback to
> another cc. This applies even the second
> bpf_setsockopt(TCP_CONGESTION) will not cause a loop.
>
> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
> ---
> include/linux/tcp.h | 6 ++++++
> net/core/filter.c | 28 +++++++++++++++++++++++++++-
> 2 files changed, 33 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/tcp.h b/include/linux/tcp.h
> index a9fbe22732c3..3bdf687e2fb3 100644
> --- a/include/linux/tcp.h
> +++ b/include/linux/tcp.h
> @@ -388,6 +388,12 @@ struct tcp_sock {
> u8 bpf_sock_ops_cb_flags; /* Control calling BPF programs
> * values defined in uapi/linux/tcp.h
> */
> + u8 bpf_chg_cc_inprogress:1; /* In the middle of
> + * bpf_setsockopt(TCP_CONGESTION),
> + * it is to avoid the bpf_tcp_cc->init()
> + * to recur itself by calling
> + * bpf_setsockopt(TCP_CONGESTION, "itself").
> + */
> #define BPF_SOCK_OPS_TEST_FLAG(TP, ARG) (TP->bpf_sock_ops_cb_flags & ARG)
> #else
> #define BPF_SOCK_OPS_TEST_FLAG(TP, ARG) 0
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 96f2f7a65e65..ac4c45c02da5 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -5105,6 +5105,9 @@ static int bpf_sol_tcp_setsockopt(struct sock *sk, int optname,
> static int sol_tcp_sockopt_congestion(struct sock *sk, char *optval,
> int *optlen, bool getopt)
> {
> + struct tcp_sock *tp;
> + int ret;
> +
> if (*optlen < 2)
> return -EINVAL;
>
> @@ -5125,8 +5128,31 @@ static int sol_tcp_sockopt_congestion(struct sock *sk, char *optval,
> if (*optlen >= sizeof("cdg") - 1 && !strncmp("cdg", optval, *optlen))
> return -ENOTSUPP;
>
> - return do_tcp_setsockopt(sk, SOL_TCP, TCP_CONGESTION,
> + /* It stops this looping
> + *
> + * .init => bpf_setsockopt(tcp_cc) => .init =>
> + * bpf_setsockopt(tcp_cc)" => .init => ....
> + *
> + * The second bpf_setsockopt(tcp_cc) is not allowed
> + * in order to break the loop when both .init
> + * are the same bpf prog.
> + *
> + * This applies even the second bpf_setsockopt(tcp_cc)
> + * does not cause a loop. This limits only the first
> + * '.init' can call bpf_setsockopt(TCP_CONGESTION) to
> + * pick a fallback cc (eg. peer does not support ECN)
> + * and the second '.init' cannot fallback to
> + * another.
> + */
> + tp = tcp_sk(sk);
> + if (tp->bpf_chg_cc_inprogress)
> + return -EBUSY;
> +
Is the socket locked (and owned by current thread) at this point ?
If not, changing bpf_chg_cc_inprogress would be racy.
> + tp->bpf_chg_cc_inprogress = 1;
> + ret = do_tcp_setsockopt(sk, SOL_TCP, TCP_CONGESTION,
> KERNEL_SOCKPTR(optval), *optlen);
> + tp->bpf_chg_cc_inprogress = 0;
> + return ret;
> }
>
> static int sol_tcp_sockopt(struct sock *sk, int optname,
> --
> 2.30.2
>
* Re: [PATCH v2 bpf-next 4/5] bpf: tcp: Stop bpf_setsockopt(TCP_CONGESTION) in init ops to recur itself
2022-09-29 2:04 ` Eric Dumazet
@ 2022-09-29 5:31 ` Martin KaFai Lau
2022-09-29 5:37 ` Eric Dumazet
0 siblings, 1 reply; 7+ messages in thread
From: Martin KaFai Lau @ 2022-09-29 5:31 UTC (permalink / raw)
To: Eric Dumazet
Cc: bpf, netdev, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
David Miller, Jakub Kicinski, kernel-team, Paolo Abeni,
Martin KaFai Lau
On 9/28/22 7:04 PM, Eric Dumazet wrote:
> On Fri, Sep 23, 2022 at 3:48 PM Martin KaFai Lau <kafai@fb.com> wrote:
>>
>> From: Martin KaFai Lau <martin.lau@kernel.org>
>>
>> When a bad bpf prog '.init' calls
>> bpf_setsockopt(TCP_CONGESTION, "itself"), it will trigger this loop:
>>
>> .init => bpf_setsockopt(tcp_cc) => .init => bpf_setsockopt(tcp_cc) ...
>> ... => .init => bpf_setsockopt(tcp_cc).
>>
>> It was prevented by the prog->active counter before but the prog->active
>> detection cannot be used in struct_ops as explained in the earlier
>> patch of the set.
>>
>> In this patch, the second bpf_setsockopt(tcp_cc) is not allowed
>> in order to break the loop. This is done by using a bit of
>> an existing 1 byte hole in tcp_sock to check if there is
>> on-going bpf_setsockopt(TCP_CONGESTION) in this tcp_sock.
>>
>> Note that this essentially limits only the first '.init' can
>> call bpf_setsockopt(TCP_CONGESTION) to pick a fallback cc (eg. peer
>> does not support ECN) and the second '.init' cannot fallback to
>> another cc. This applies even the second
>> bpf_setsockopt(TCP_CONGESTION) will not cause a loop.
>>
>> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
>> ---
>> include/linux/tcp.h | 6 ++++++
>> net/core/filter.c | 28 +++++++++++++++++++++++++++-
>> 2 files changed, 33 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/tcp.h b/include/linux/tcp.h
>> index a9fbe22732c3..3bdf687e2fb3 100644
>> --- a/include/linux/tcp.h
>> +++ b/include/linux/tcp.h
>> @@ -388,6 +388,12 @@ struct tcp_sock {
>> u8 bpf_sock_ops_cb_flags; /* Control calling BPF programs
>> * values defined in uapi/linux/tcp.h
>> */
>> + u8 bpf_chg_cc_inprogress:1; /* In the middle of
>> + * bpf_setsockopt(TCP_CONGESTION),
>> + * it is to avoid the bpf_tcp_cc->init()
>> + * to recur itself by calling
>> + * bpf_setsockopt(TCP_CONGESTION, "itself").
>> + */
>> #define BPF_SOCK_OPS_TEST_FLAG(TP, ARG) (TP->bpf_sock_ops_cb_flags & ARG)
>> #else
>> #define BPF_SOCK_OPS_TEST_FLAG(TP, ARG) 0
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index 96f2f7a65e65..ac4c45c02da5 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
>> @@ -5105,6 +5105,9 @@ static int bpf_sol_tcp_setsockopt(struct sock *sk, int optname,
>> static int sol_tcp_sockopt_congestion(struct sock *sk, char *optval,
>> int *optlen, bool getopt)
>> {
>> + struct tcp_sock *tp;
>> + int ret;
>> +
>> if (*optlen < 2)
>> return -EINVAL;
>>
>> @@ -5125,8 +5128,31 @@ static int sol_tcp_sockopt_congestion(struct sock *sk, char *optval,
>> if (*optlen >= sizeof("cdg") - 1 && !strncmp("cdg", optval, *optlen))
>> return -ENOTSUPP;
>>
>> - return do_tcp_setsockopt(sk, SOL_TCP, TCP_CONGESTION,
>> + /* It stops this looping
>> + *
>> + * .init => bpf_setsockopt(tcp_cc) => .init =>
>> + * bpf_setsockopt(tcp_cc)" => .init => ....
>> + *
>> + * The second bpf_setsockopt(tcp_cc) is not allowed
>> + * in order to break the loop when both .init
>> + * are the same bpf prog.
>> + *
>> + * This applies even the second bpf_setsockopt(tcp_cc)
>> + * does not cause a loop. This limits only the first
>> + * '.init' can call bpf_setsockopt(TCP_CONGESTION) to
>> + * pick a fallback cc (eg. peer does not support ECN)
>> + * and the second '.init' cannot fallback to
>> + * another.
>> + */
>> + tp = tcp_sk(sk);
>> + if (tp->bpf_chg_cc_inprogress)
>> + return -EBUSY;
>> +
>
> Is the socket locked (and owned by current thread) at this point ?
> If not, changing bpf_chg_cc_inprogress would be racy.
Yes, the socket is locked and owned. There is a sock_owned_by_me check earlier
in _bpf_setsockopt().
>
>
>> + tp->bpf_chg_cc_inprogress = 1;
>> + ret = do_tcp_setsockopt(sk, SOL_TCP, TCP_CONGESTION,
>> KERNEL_SOCKPTR(optval), *optlen);
>> + tp->bpf_chg_cc_inprogress = 0;
>> + return ret;
>> }
>>
>> static int sol_tcp_sockopt(struct sock *sk, int optname,
>> --
>> 2.30.2
>>
* Re: [PATCH v2 bpf-next 4/5] bpf: tcp: Stop bpf_setsockopt(TCP_CONGESTION) in init ops to recur itself
2022-09-29 5:31 ` Martin KaFai Lau
@ 2022-09-29 5:37 ` Eric Dumazet
2022-09-29 6:17 ` Martin KaFai Lau
0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2022-09-29 5:37 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: bpf, netdev, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
David Miller, Jakub Kicinski, kernel-team, Paolo Abeni,
Martin KaFai Lau
On Wed, Sep 28, 2022 at 10:31 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 9/28/22 7:04 PM, Eric Dumazet wrote:
> > On Fri, Sep 23, 2022 at 3:48 PM Martin KaFai Lau <kafai@fb.com> wrote:
> >>
> >> From: Martin KaFai Lau <martin.lau@kernel.org>
> >>
> >> When a bad bpf prog '.init' calls
> >> bpf_setsockopt(TCP_CONGESTION, "itself"), it will trigger this loop:
> >>
> >> .init => bpf_setsockopt(tcp_cc) => .init => bpf_setsockopt(tcp_cc) ...
> >> ... => .init => bpf_setsockopt(tcp_cc).
> >>
> >> It was prevented by the prog->active counter before but the prog->active
> >> detection cannot be used in struct_ops as explained in the earlier
> >> patch of the set.
> >>
> >> In this patch, the second bpf_setsockopt(tcp_cc) is not allowed
> >> in order to break the loop. This is done by using a bit of
> >> an existing 1 byte hole in tcp_sock to check if there is
> >> on-going bpf_setsockopt(TCP_CONGESTION) in this tcp_sock.
> >>
> >> Note that this essentially limits only the first '.init' can
> >> call bpf_setsockopt(TCP_CONGESTION) to pick a fallback cc (eg. peer
> >> does not support ECN) and the second '.init' cannot fallback to
> >> another cc. This applies even the second
> >> bpf_setsockopt(TCP_CONGESTION) will not cause a loop.
> >>
> >> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
> >> ---
> >> include/linux/tcp.h | 6 ++++++
> >> net/core/filter.c | 28 +++++++++++++++++++++++++++-
> >> 2 files changed, 33 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/include/linux/tcp.h b/include/linux/tcp.h
> >> index a9fbe22732c3..3bdf687e2fb3 100644
> >> --- a/include/linux/tcp.h
> >> +++ b/include/linux/tcp.h
> >> @@ -388,6 +388,12 @@ struct tcp_sock {
> >> u8 bpf_sock_ops_cb_flags; /* Control calling BPF programs
> >> * values defined in uapi/linux/tcp.h
> >> */
> >> + u8 bpf_chg_cc_inprogress:1; /* In the middle of
> >> + * bpf_setsockopt(TCP_CONGESTION),
> >> + * it is to avoid the bpf_tcp_cc->init()
> >> + * to recur itself by calling
> >> + * bpf_setsockopt(TCP_CONGESTION, "itself").
> >> + */
> >> #define BPF_SOCK_OPS_TEST_FLAG(TP, ARG) (TP->bpf_sock_ops_cb_flags & ARG)
> >> #else
> >> #define BPF_SOCK_OPS_TEST_FLAG(TP, ARG) 0
> >> diff --git a/net/core/filter.c b/net/core/filter.c
> >> index 96f2f7a65e65..ac4c45c02da5 100644
> >> --- a/net/core/filter.c
> >> +++ b/net/core/filter.c
> >> @@ -5105,6 +5105,9 @@ static int bpf_sol_tcp_setsockopt(struct sock *sk, int optname,
> >> static int sol_tcp_sockopt_congestion(struct sock *sk, char *optval,
> >> int *optlen, bool getopt)
> >> {
> >> + struct tcp_sock *tp;
> >> + int ret;
> >> +
> >> if (*optlen < 2)
> >> return -EINVAL;
> >>
> >> @@ -5125,8 +5128,31 @@ static int sol_tcp_sockopt_congestion(struct sock *sk, char *optval,
> >> if (*optlen >= sizeof("cdg") - 1 && !strncmp("cdg", optval, *optlen))
> >> return -ENOTSUPP;
> >>
> >> - return do_tcp_setsockopt(sk, SOL_TCP, TCP_CONGESTION,
> >> + /* It stops this looping
> >> + *
> >> + * .init => bpf_setsockopt(tcp_cc) => .init =>
> >> + * bpf_setsockopt(tcp_cc)" => .init => ....
> >> + *
> >> + * The second bpf_setsockopt(tcp_cc) is not allowed
> >> + * in order to break the loop when both .init
> >> + * are the same bpf prog.
> >> + *
> >> + * This applies even the second bpf_setsockopt(tcp_cc)
> >> + * does not cause a loop. This limits only the first
> >> + * '.init' can call bpf_setsockopt(TCP_CONGESTION) to
> >> + * pick a fallback cc (eg. peer does not support ECN)
> >> + * and the second '.init' cannot fallback to
> >> + * another.
> >> + */
> >> + tp = tcp_sk(sk);
> >> + if (tp->bpf_chg_cc_inprogress)
> >> + return -EBUSY;
> >> +
> >
> > Is the socket locked (and owned by current thread) at this point ?
> > If not, changing bpf_chg_cc_inprogress would be racy.
>
> Yes, the socket is locked and owned. There is a sock_owned_by_me check earlier
> in _bpf_setsockopt().
Good to know. Note that a listener can be cloned without the socket lock being held.
In order to avoid surprises, I would clear bpf_chg_cc_inprogress in
tcp_create_openreq_child()
* Re: [PATCH v2 bpf-next 4/5] bpf: tcp: Stop bpf_setsockopt(TCP_CONGESTION) in init ops to recur itself
2022-09-29 5:37 ` Eric Dumazet
@ 2022-09-29 6:17 ` Martin KaFai Lau
0 siblings, 0 replies; 7+ messages in thread
From: Martin KaFai Lau @ 2022-09-29 6:17 UTC (permalink / raw)
To: Eric Dumazet
Cc: bpf, netdev, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
David Miller, Jakub Kicinski, kernel-team, Paolo Abeni
On 9/28/22 10:37 PM, Eric Dumazet wrote:
> On Wed, Sep 28, 2022 at 10:31 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>
>> On 9/28/22 7:04 PM, Eric Dumazet wrote:
>>> On Fri, Sep 23, 2022 at 3:48 PM Martin KaFai Lau <kafai@fb.com> wrote:
>>>>
>>>> From: Martin KaFai Lau <martin.lau@kernel.org>
>>>>
>>>> When a bad bpf prog '.init' calls
>>>> bpf_setsockopt(TCP_CONGESTION, "itself"), it will trigger this loop:
>>>>
>>>> .init => bpf_setsockopt(tcp_cc) => .init => bpf_setsockopt(tcp_cc) ...
>>>> ... => .init => bpf_setsockopt(tcp_cc).
>>>>
>>>> It was prevented by the prog->active counter before but the prog->active
>>>> detection cannot be used in struct_ops as explained in the earlier
>>>> patch of the set.
>>>>
>>>> In this patch, the second bpf_setsockopt(tcp_cc) is not allowed
>>>> in order to break the loop. This is done by using a bit of
>>>> an existing 1 byte hole in tcp_sock to check if there is
>>>> on-going bpf_setsockopt(TCP_CONGESTION) in this tcp_sock.
>>>>
>>>> Note that this essentially limits only the first '.init' can
>>>> call bpf_setsockopt(TCP_CONGESTION) to pick a fallback cc (eg. peer
>>>> does not support ECN) and the second '.init' cannot fallback to
>>>> another cc. This applies even the second
>>>> bpf_setsockopt(TCP_CONGESTION) will not cause a loop.
>>>>
>>>> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
>>>> ---
>>>> include/linux/tcp.h | 6 ++++++
>>>> net/core/filter.c | 28 +++++++++++++++++++++++++++-
>>>> 2 files changed, 33 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/include/linux/tcp.h b/include/linux/tcp.h
>>>> index a9fbe22732c3..3bdf687e2fb3 100644
>>>> --- a/include/linux/tcp.h
>>>> +++ b/include/linux/tcp.h
>>>> @@ -388,6 +388,12 @@ struct tcp_sock {
>>>> u8 bpf_sock_ops_cb_flags; /* Control calling BPF programs
>>>> * values defined in uapi/linux/tcp.h
>>>> */
>>>> + u8 bpf_chg_cc_inprogress:1; /* In the middle of
>>>> + * bpf_setsockopt(TCP_CONGESTION),
>>>> + * it is to avoid the bpf_tcp_cc->init()
>>>> + * to recur itself by calling
>>>> + * bpf_setsockopt(TCP_CONGESTION, "itself").
>>>> + */
>>>> #define BPF_SOCK_OPS_TEST_FLAG(TP, ARG) (TP->bpf_sock_ops_cb_flags & ARG)
>>>> #else
>>>> #define BPF_SOCK_OPS_TEST_FLAG(TP, ARG) 0
>>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>>> index 96f2f7a65e65..ac4c45c02da5 100644
>>>> --- a/net/core/filter.c
>>>> +++ b/net/core/filter.c
>>>> @@ -5105,6 +5105,9 @@ static int bpf_sol_tcp_setsockopt(struct sock *sk, int optname,
>>>> static int sol_tcp_sockopt_congestion(struct sock *sk, char *optval,
>>>> int *optlen, bool getopt)
>>>> {
>>>> + struct tcp_sock *tp;
>>>> + int ret;
>>>> +
>>>> if (*optlen < 2)
>>>> return -EINVAL;
>>>>
>>>> @@ -5125,8 +5128,31 @@ static int sol_tcp_sockopt_congestion(struct sock *sk, char *optval,
>>>> if (*optlen >= sizeof("cdg") - 1 && !strncmp("cdg", optval, *optlen))
>>>> return -ENOTSUPP;
>>>>
>>>> - return do_tcp_setsockopt(sk, SOL_TCP, TCP_CONGESTION,
>>>> + /* It stops this looping
>>>> + *
>>>> + * .init => bpf_setsockopt(tcp_cc) => .init =>
>>>> + * bpf_setsockopt(tcp_cc)" => .init => ....
>>>> + *
>>>> + * The second bpf_setsockopt(tcp_cc) is not allowed
>>>> + * in order to break the loop when both .init
>>>> + * are the same bpf prog.
>>>> + *
>>>> + * This applies even the second bpf_setsockopt(tcp_cc)
>>>> + * does not cause a loop. This limits only the first
>>>> + * '.init' can call bpf_setsockopt(TCP_CONGESTION) to
>>>> + * pick a fallback cc (eg. peer does not support ECN)
>>>> + * and the second '.init' cannot fallback to
>>>> + * another.
>>>> + */
>>>> + tp = tcp_sk(sk);
>>>> + if (tp->bpf_chg_cc_inprogress)
>>>> + return -EBUSY;
>>>> +
>>>
>>> Is the socket locked (and owned by current thread) at this point ?
>>> If not, changing bpf_chg_cc_inprogress would be racy.
>>
>> Yes, the socket is locked and owned. There is a sock_owned_by_me check earlier
>> in _bpf_setsockopt().
>
> Good to know. Note a listener can be cloned without socket lock being held.
>
> In order to avoid surprises, I would clear bpf_chg_cc_inprogress in
> tcp_create_openreq_child()
Ah, makes sense. I will re-spin.