All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stanislav Fomichev <sdf@google.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Geliang Tang <geliang.tang@suse.com>,
	Alexei Starovoitov <ast@kernel.org>,
	 Daniel Borkmann <daniel@iogearbox.net>,
	John Fastabend <john.fastabend@gmail.com>,
	 Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Song Liu <song@kernel.org>,  Yonghong Song <yhs@fb.com>,
	KP Singh <kpsingh@kernel.org>, Hao Luo <haoluo@google.com>,
	 Jiri Olsa <jolsa@kernel.org>, bpf <bpf@vger.kernel.org>,
	 MPTCP Upstream <mptcp@lists.linux.dev>,
	netdev@vger.kernel.org
Subject: Re: [RFC bpf-next 0/8] BPF 'force to MPTCP'
Date: Thu, 13 Jul 2023 16:09:34 -0700	[thread overview]
Message-ID: <ZLCELtTGksxGwaFZ@google.com> (raw)
In-Reply-To: <CAADnVQ+aPyWea4QUD9TFNpr43u052zuqOXzGaqmM8-EeMrW6rg@mail.gmail.com>

On 07/13, Alexei Starovoitov wrote:
> On Thu, Jul 13, 2023 at 10:14 AM Stanislav Fomichev <sdf@google.com> wrote:
> >
> > On 07/13, Geliang Tang wrote:
> > > On Thu, Jul 06, 2023 at 10:02:43AM -0700, Stanislav Fomichev wrote:
> > > > On 07/06, Geliang Tang wrote:
> > > > > As is described in the "How to use MPTCP?" section in MPTCP wiki [1]:
> > > > >
> > > > > "Your app can create sockets with IPPROTO_MPTCP as the proto:
> > > > > ( socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP); ). Legacy apps can be
> > > > > forced to create and use MPTCP sockets instead of TCP ones via the
> > > > > mptcpize command bundled with the mptcpd daemon."
> > > > >
> > > > > But the mptcpize (LD_PRELOAD technique) command has some limitations
> > > > > [2]:
> > > > >
> > > > >  - it doesn't work if the application is not using libc (e.g. GoLang
> > > > > apps)
> > > > >  - in some envs, it might not be easy to set env vars / change the way
> > > > > apps are launched, e.g. on Android
> > > > >  - mptcpize needs to be launched with all apps that want MPTCP: we could
> > > > > have more control from BPF to enable MPTCP only for some apps or all the
> > > > > ones of a netns or a cgroup, etc.
> > > > >  - it is not in BPF, we cannot talk about it at netdev conf.
> > > > >
> > > > > So this patchset attempts to use BPF to implement functions similer to
> > > > > mptcpize.
> > > > >
> > > > > The main idea is add a hook in sys_socket() to change the protocol id
> > > > > from IPPROTO_TCP (or 0) to IPPROTO_MPTCP.
> > > > >
> > > > > 1. This first solution [3] is using "cgroup/sock_create":
> > > > >
> > > > > Implement a new helper bpf_mptcpify() to change the protocol id:
> > > > >
> > > > > +BPF_CALL_1(bpf_mptcpify, struct sock *, sk)
> > > > > +{
> > > > > + if (sk && sk_fullsock(sk) && sk->sk_protocol == IPPROTO_TCP) {
> > > > > +         sk->sk_protocol = IPPROTO_MPTCP;
> > > > > +         return (unsigned long)sk;
> > > > > + }
> > > > > +
> > > > > + return (unsigned long)NULL;
> > > > > +}
> > > > > +
> > > > > +const struct bpf_func_proto bpf_mptcpify_proto = {
> > > > > + .func           = bpf_mptcpify,
> > > > > + .gpl_only       = false,
> > > > > + .ret_type       = RET_PTR_TO_BTF_ID_OR_NULL,
> > > > > + .ret_btf_id     = &btf_sock_ids[BTF_SOCK_TYPE_SOCK],
> > > > > + .arg1_type      = ARG_PTR_TO_CTX,
> > > > > +};
> > > > >
> > > > > Add a hook in "cgroup/sock_create" section, invoking bpf_mptcpify()
> > > > > helper in this hook:
> > > > >
> > > > > +SEC("cgroup/sock_create")
> > > > > +int sock(struct bpf_sock *ctx)
> > > > > +{
> > > > > + struct sock *sk;
> > > > > +
> > > > > + if (ctx->type != SOCK_STREAM)
> > > > > +         return 1;
> > > > > +
> > > > > + sk = bpf_mptcpify(ctx);
> > > > > + if (!sk)
> > > > > +         return 1;
> > > > > +
> > > > > + protocol = sk->sk_protocol;
> > > > > + return 1;
> > > > > +}
> > > > >
> > > > > But this solution doesn't work, because the sock_create section is
> > > > > hooked at the end of inet_create(). It's too late to change the protocol
> > > > > id.
> > > > >
> > > > > 2. The second solution [4] is using "fentry":
> > > > >
> > > > > Implement a bpf_mptcpify() helper:
> > > > >
> > > > > +BPF_CALL_1(bpf_mptcpify, struct socket_args *, args)
> > > > > +{
> > > > > + if (args->family == AF_INET &&
> > > > > +     args->type == SOCK_STREAM &&
> > > > > +     (!args->protocol || args->protocol == IPPROTO_TCP))
> > > > > +         args->protocol = IPPROTO_MPTCP;
> > > > > +
> > > > > + return 0;
> > > > > +}
> > > > > +
> > > > > +BTF_ID_LIST(bpf_mptcpify_btf_ids)
> > > > > +BTF_ID(struct, socket_args)
> > > > > +
> > > > > +static const struct bpf_func_proto bpf_mptcpify_proto = {
> > > > > + .func           = bpf_mptcpify,
> > > > > + .gpl_only       = false,
> > > > > + .ret_type       = RET_INTEGER,
> > > > > + .arg1_type      = ARG_PTR_TO_BTF_ID,
> > > > > + .arg1_btf_id    = &bpf_mptcpify_btf_ids[0],
> > > > > +};
> > > > >
> > > > > Add a new wrapper socket_create() in sys_socket() path, passing a
> > > > > pointer of struct socket_args (int family; int type; int protocol) to
> > > > > the wrapper.
> > > > >
> > > > > +int socket_create(struct socket_args *args, struct socket **res)
> > > > > +{
> > > > > + return sock_create(args->family, args->type, args->protocol, res);
> > > > > +}
> > > > > +EXPORT_SYMBOL(socket_create);
> > > > > +
> > > > >  /**
> > > > >   *       sock_create_kern - creates a socket (kernel space)
> > > > >   *       @net: net namespace
> > > > > @@ -1608,6 +1614,7 @@  EXPORT_SYMBOL(sock_create_kern);
> > > > >
> > > > >  static struct socket *__sys_socket_create(int family, int type, int protocol)
> > > > >  {
> > > > > + struct socket_args args = { 0 };
> > > > >   struct socket *sock;
> > > > >   int retval;
> > > > >
> > > > > @@ -1621,7 +1628,10 @@  static struct socket *__sys_socket_create(int family, int type, int protocol)
> > > > >           return ERR_PTR(-EINVAL);
> > > > >   type &= SOCK_TYPE_MASK;
> > > > >
> > > > > - retval = sock_create(family, type, protocol, &sock);
> > > > > + args.family = family;
> > > > > + args.type = type;
> > > > > + args.protocol = protocol;
> > > > > + retval = socket_create(&args, &sock);
> > > > >   if (retval < 0)
> > > > >           return ERR_PTR(retval);
> > > > >
> > > > > Add "fentry" hook to the newly added wrapper, invoking bpf_mptcpify()
> > > > > in the hook:
> > > > >
> > > > > +SEC("fentry/socket_create")
> > > > > +int BPF_PROG(trace_socket_create, void *args,
> > > > > +         struct socket **res)
> > > > > +{
> > > > > + bpf_mptcpify(args);
> > > > > + return 0;
> > > > > +}
> > > > >
> > > > > This version works. But it's just a work around version. Since the code
> > > > > to add a wrapper to sys_socket() is not very elegant indeed, and it
> > > > > shouldn't be accepted by upstream.
> > > > >
> > > > > 3. The last solution is this patchset itself:
> > > > >
> > > > > Introduce new program type BPF_PROG_TYPE_CGROUP_SOCKINIT and attach type
> > > > > BPF_CGROUP_SOCKINIT on cgroup basis.
> > > > >
> > > > > Define BPF_CGROUP_RUN_PROG_SOCKINIT() helper, and implement
> > > > > __cgroup_bpf_run_sockinit() helper to run a sockinit program:
> > > > >
> > > > > +#define BPF_CGROUP_RUN_PROG_SOCKINIT(family, type, protocol)                    \
> > > > > +({                                                                              \
> > > > > + int __ret = 0;                                                         \
> > > > > + if (cgroup_bpf_enabled(CGROUP_SOCKINIT))                               \
> > > > > +         __ret = __cgroup_bpf_run_sockinit(family, type, protocol,      \
> > > > > +                                           CGROUP_SOCKINIT);            \
> > > > > + __ret;                                                                 \
> > > > > +})
> > > > > +
> > > > >
> > > > > Invoke BPF_CGROUP_RUN_PROG_SOCKINIT() in __socket_create() to change
> > > > > the arguments:
> > > > >
> > > > > @@ -1469,6 +1469,12 @@  int __sock_create(struct net *net, int family, int type, int protocol,
> > > > >   struct socket *sock;
> > > > >   const struct net_proto_family *pf;
> > > > >
> > > > > + if (!kern) {
> > > > > +         err = BPF_CGROUP_RUN_PROG_SOCKINIT(&family, &type, &protocol);
> > > > > +         if (err)
> > > > > +                 return err;
> > > > > + }
> > > > > +
> > > > >   /*
> > > > >    *      Check protocol is in range
> > > > >    */
> > > > >
> > > > > Define BPF program in this newly added 'sockinit' SEC, so it will be
> > > > > hooked in BPF_CGROUP_RUN_PROG_SOCKINIT() in __socket_create().
> > > > >
> > > > > @@ -158,6 +158,11 @@  static struct sec_name_test tests[] = {
> > > > >           {0, BPF_PROG_TYPE_CGROUP_SOCKOPT, BPF_CGROUP_SETSOCKOPT},
> > > > >           {0, BPF_CGROUP_SETSOCKOPT},
> > > > >   },
> > > > > + {
> > > > > +         "cgroup/sockinit",
> > > > > +         {0, BPF_PROG_TYPE_CGROUP_SOCKINIT, BPF_CGROUP_SOCKINIT},
> > > > > +         {0, BPF_CGROUP_SOCKINIT},
> > > > > + },
> > > > >  };
> > > > >
> > > > > +SEC("cgroup/sockinit")
> > > > > +int mptcpify(struct bpf_sockinit_ctx *ctx)
> > > > > +{
> > > > > + if ((ctx->family == AF_INET || ctx->family == AF_INET6) &&
> > > > > +     ctx->type == SOCK_STREAM &&
> > > > > +     (!ctx->protocol || ctx->protocol == IPPROTO_TCP)) {
> > > > > +         ctx->protocol = IPPROTO_MPTCP;
> > > > > + }
> > > > > +
> > > > > + return 1;
> > > > > +}
> > > > >
> > > > > This version is the best solution we have found so far.
> > > > >
> > > > > [1]
> > > > > https://github.com/multipath-tcp/mptcp_net-next/wiki
> > > > > [2]
> > > > > https://github.com/multipath-tcp/mptcp_net-next/issues/79
> > > > > [3]
> > > > > https://patchwork.kernel.org/project/mptcp/cover/cover.1688215769.git.geliang.tang@suse.com/
> > > > > [4]
> > > > > https://patchwork.kernel.org/project/mptcp/cover/cover.1688366249.git.geliang.tang@suse.com/
> > > >
> > > > cgroup/sock_create being weird and triggering late and only for af_inet4/6
> > > > was the reason we added ability to attach to lsm hooks on a per-cgroup
> > > > basis:
> > > > https://lore.kernel.org/bpf/20220628174314.1216643-1-sdf@google.com/
> > > >
> > > > Unfortunately, using it here won't help either :-( The following
> > > > hook triggers early enough but doesn't allow changing the arguments (I was
> > > > interested only in filtering based on the arguments, not changing them):
> > > >
> > > > LSM_HOOK(int, 0, socket_create, int family, int type, int protocol, int kern)
> > > >
> > > > So maybe another alternative might be to change its definition to int *family,
> > > > int *type, int *protocol and use lsm_cgroup/socket_create to rewrite the
> > >
> > > Thanks Stanislav, this lsm_cgroup/socket_create works. The test patch
> > > is attached.
> > >
> > > > protocol? (some verifier changes might needed to make those writable)
> > >
> > > But I got some verification errors here:
> > >
> > >    invalid mem access 'scalar'.
> > >
> > > I tried many times but couldn't solve it, so I simply skipped the
> > > verifier in the test patch (I marked TODO in front of this code).
> > > Could you please give me some suggestions for verification?
> >
> > Right, that's the core issue here, to add support to the verifier
> > to let it write into scalar pointer arguments. I don't have a good suggestion,
> > but we've discussed it multiple times elsewhere that maybe we
> > should do it :-)
> >
> > Can we use some new btf_type_tag("allow_write") to annotate places
> > where we know that it's safe to write to them? Then we can extend
> > the verifier to check for this condition hopefully. Maybe other BPF
> > folks have better suggestions here?
> >
> > We should also CC LSM people to make sure it's not a no-no to
> > change LSM_HOOK(s) in a way to allow changing the arguments. I'm
> > not sure how rigid those definitions are :-(
> 
> imo all 3 options including this 4th one are too hacky.
> I understand ld_preload limitations and desire to have it per-cgroup,
> but messing this much with user space feels a little bit too much.
> What side effects will it cause?

Maybe all that is really needed is some new per-netns sysctl to automatically
upgrade from IPPROTO_TCP to IPPROTO_MPTCP? Or is it too broad of a
brush?

I've also CC'd netdev for visibility...

> Meaning is this enough to just change the proto?
> Nothing in user space later on needs to be aware the protocol is so different?

IIUC, if you use IPPROTO_MPTCP, you just get regular TCP until you start
adding extra routes (via netlink). That's why their current
unconditional IPPROTO_TCP->IPPROTO_MPTCP rewrite via ld_preload also somewhat
works.

> I feel the consequences are too drastic to support such thing
> through an official/stable hook.
> We can consider an fmod_ret unstable hook somewhere in the kernel
> that bpf prog can attach to and tweak the ret value and/or args,
> but the production environment won't be using it.
> It will be a temporary gap until user space is properly converted to mptcp.

Asking every app to do s/IPPROTO_TCP/IPPROTO_MPTCP/ might be annoying
though? (don't have a horse in this race, but have some v4->v6 migration
vibes from this)

  reply	other threads:[~2023-07-13 23:09 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-06  4:08 [RFC bpf-next 0/8] BPF 'force to MPTCP' Geliang Tang
2023-07-06  4:08 ` [RFC bpf-next 1/8] bpf: Add new prog type sockinit Geliang Tang
2023-07-06  4:08 ` [RFC bpf-next 2/8] bpf: Run a sockinit program Geliang Tang
2023-07-06  4:08 ` [RFC bpf-next 3/8] net: socket: run sockinit hooks Geliang Tang
2023-07-06  4:08 ` [RFC bpf-next 4/8] libbpf: Support sockinit hook Geliang Tang
2023-07-06  4:08 ` [RFC bpf-next 5/8] selftests/bpf: Add mptcpify program Geliang Tang
2023-07-06  4:08 ` [RFC bpf-next 6/8] selftests/bpf: use random netns name for mptcp Geliang Tang
2023-07-06  4:08 ` [RFC bpf-next 7/8] selftests/bpf: add two mptcp netns helpers Geliang Tang
2023-07-06  4:08 ` [RFC bpf-next 8/8] selftests/bpf: Add mptcpify selftest Geliang Tang
2023-07-06 17:02 ` [RFC bpf-next 0/8] BPF 'force to MPTCP' Stanislav Fomichev
2023-07-13  5:47   ` Geliang Tang
2023-07-13 17:14     ` Stanislav Fomichev
2023-07-13 22:46       ` Alexei Starovoitov
2023-07-13 23:09         ` Stanislav Fomichev [this message]
2023-07-14  7:56           ` Paolo Abeni

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZLCELtTGksxGwaFZ@google.com \
    --to=sdf@google.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=geliang.tang@suse.com \
    --cc=haoluo@google.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=martin.lau@linux.dev \
    --cc=mptcp@lists.linux.dev \
    --cc=netdev@vger.kernel.org \
    --cc=song@kernel.org \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.