All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stanislav Fomichev <sdf@google.com>
To: Geliang Tang <geliang.tang@suse.com>
Cc: Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	 Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Song Liu <song@kernel.org>,  Yonghong Song <yhs@fb.com>,
	John Fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@kernel.org>,  Hao Luo <haoluo@google.com>,
	Jiri Olsa <jolsa@kernel.org>,
	bpf@vger.kernel.org,  mptcp@lists.linux.dev
Subject: Re: [RFC bpf-next 0/8] BPF 'force to MPTCP'
Date: Thu, 6 Jul 2023 10:02:43 -0700	[thread overview]
Message-ID: <ZKbzs7foUw+eeNnn@google.com> (raw)
In-Reply-To: <cover.1688616142.git.geliang.tang@suse.com>

On 07/06, Geliang Tang wrote:
> As is described in the "How to use MPTCP?" section in MPTCP wiki [1]:
> 
> "Your app can create sockets with IPPROTO_MPTCP as the proto:
> ( socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP); ). Legacy apps can be
> forced to create and use MPTCP sockets instead of TCP ones via the
> mptcpize command bundled with the mptcpd daemon."
> 
> But the mptcpize (LD_PRELOAD technique) command has some limitations
> [2]:
> 
>  - it doesn't work if the application is not using libc (e.g. GoLang
> apps)
>  - in some envs, it might not be easy to set env vars / change the way
> apps are launched, e.g. on Android
>  - mptcpize needs to be launched with all apps that want MPTCP: we could
> have more control from BPF to enable MPTCP only for some apps or all the
> ones of a netns or a cgroup, etc.
>  - it is not in BPF, we cannot talk about it at netdev conf.
> 
> So this patchset attempts to use BPF to implement functions similer to
> mptcpize.
> 
> The main idea is add a hook in sys_socket() to change the protocol id
> from IPPROTO_TCP (or 0) to IPPROTO_MPTCP.
> 
> 1. This first solution [3] is using "cgroup/sock_create":
> 
> Implement a new helper bpf_mptcpify() to change the protocol id:
> 
> +BPF_CALL_1(bpf_mptcpify, struct sock *, sk)
> +{
> +	if (sk && sk_fullsock(sk) && sk->sk_protocol == IPPROTO_TCP) {
> +		sk->sk_protocol = IPPROTO_MPTCP;
> +		return (unsigned long)sk;
> +	}
> +
> +	return (unsigned long)NULL;
> +}
> +
> +const struct bpf_func_proto bpf_mptcpify_proto = {
> +	.func		= bpf_mptcpify,
> +	.gpl_only	= false,
> +	.ret_type	= RET_PTR_TO_BTF_ID_OR_NULL,
> +	.ret_btf_id	= &btf_sock_ids[BTF_SOCK_TYPE_SOCK],
> +	.arg1_type	= ARG_PTR_TO_CTX,
> +};
> 
> Add a hook in "cgroup/sock_create" section, invoking bpf_mptcpify()
> helper in this hook:
> 
> +SEC("cgroup/sock_create")
> +int sock(struct bpf_sock *ctx)
> +{
> +	struct sock *sk;
> +
> +	if (ctx->type != SOCK_STREAM)
> +		return 1;
> +
> +	sk = bpf_mptcpify(ctx);
> +	if (!sk)
> +		return 1;
> +
> +	protocol = sk->sk_protocol;
> +	return 1;
> +}
> 
> But this solution doesn't work, because the sock_create section is
> hooked at the end of inet_create(). It's too late to change the protocol
> id.
> 
> 2. The second solution [4] is using "fentry":
> 
> Implement a bpf_mptcpify() helper:
> 
> +BPF_CALL_1(bpf_mptcpify, struct socket_args *, args)
> +{
> +	if (args->family == AF_INET &&
> +	    args->type == SOCK_STREAM &&
> +	    (!args->protocol || args->protocol == IPPROTO_TCP))
> +		args->protocol = IPPROTO_MPTCP;
> +
> +	return 0;
> +}
> +
> +BTF_ID_LIST(bpf_mptcpify_btf_ids)
> +BTF_ID(struct, socket_args)
> +
> +static const struct bpf_func_proto bpf_mptcpify_proto = {
> +	.func		= bpf_mptcpify,
> +	.gpl_only	= false,
> +	.ret_type	= RET_INTEGER,
> +	.arg1_type	= ARG_PTR_TO_BTF_ID,
> +	.arg1_btf_id	= &bpf_mptcpify_btf_ids[0],
> +};
> 
> Add a new wrapper socket_create() in sys_socket() path, passing a
> pointer of struct socket_args (int family; int type; int protocol) to
> the wrapper.
> 
> +int socket_create(struct socket_args *args, struct socket **res)
> +{
> +	return sock_create(args->family, args->type, args->protocol, res);
> +}
> +EXPORT_SYMBOL(socket_create);
> +
>  /**
>   *	sock_create_kern - creates a socket (kernel space)
>   *	@net: net namespace
> @@ -1608,6 +1614,7 @@  EXPORT_SYMBOL(sock_create_kern);
>  
>  static struct socket *__sys_socket_create(int family, int type, int protocol)
>  {
> +	struct socket_args args = { 0 };
>  	struct socket *sock;
>  	int retval;
>  
> @@ -1621,7 +1628,10 @@  static struct socket *__sys_socket_create(int family, int type, int protocol)
>  		return ERR_PTR(-EINVAL);
>  	type &= SOCK_TYPE_MASK;
>  
> -	retval = sock_create(family, type, protocol, &sock);
> +	args.family = family;
> +	args.type = type;
> +	args.protocol = protocol;
> +	retval = socket_create(&args, &sock);
>  	if (retval < 0)
>  		return ERR_PTR(retval);
> 
> Add "fentry" hook to the newly added wrapper, invoking bpf_mptcpify()
> in the hook:
> 
> +SEC("fentry/socket_create")
> +int BPF_PROG(trace_socket_create, void *args,
> +		struct socket **res)
> +{
> +	bpf_mptcpify(args);
> +	return 0;
> +}
> 
> This version works. But it's just a work around version. Since the code
> to add a wrapper to sys_socket() is not very elegant indeed, and it
> shouldn't be accepted by upstream.
> 
> 3. The last solution is this patchset itself:
> 
> Introduce new program type BPF_PROG_TYPE_CGROUP_SOCKINIT and attach type
> BPF_CGROUP_SOCKINIT on cgroup basis.
> 
> Define BPF_CGROUP_RUN_PROG_SOCKINIT() helper, and implement
> __cgroup_bpf_run_sockinit() helper to run a sockinit program:
> 
> +#define BPF_CGROUP_RUN_PROG_SOCKINIT(family, type, protocol)		       \
> +({									       \
> +	int __ret = 0;							       \
> +	if (cgroup_bpf_enabled(CGROUP_SOCKINIT))			       \
> +		__ret = __cgroup_bpf_run_sockinit(family, type, protocol,      \
> +						  CGROUP_SOCKINIT);	       \
> +	__ret;								       \
> +})
> +
> 
> Invoke BPF_CGROUP_RUN_PROG_SOCKINIT() in __socket_create() to change
> the arguments:
> 
> @@ -1469,6 +1469,12 @@  int __sock_create(struct net *net, int family, int type, int protocol,
>  	struct socket *sock;
>  	const struct net_proto_family *pf;
>  
> +	if (!kern) {
> +		err = BPF_CGROUP_RUN_PROG_SOCKINIT(&family, &type, &protocol);
> +		if (err)
> +			return err;
> +	}
> +
>  	/*
>  	 *      Check protocol is in range
>  	 */
> 
> Define BPF program in this newly added 'sockinit' SEC, so it will be
> hooked in BPF_CGROUP_RUN_PROG_SOCKINIT() in __socket_create().
> 
> @@ -158,6 +158,11 @@  static struct sec_name_test tests[] = {
>  		{0, BPF_PROG_TYPE_CGROUP_SOCKOPT, BPF_CGROUP_SETSOCKOPT},
>  		{0, BPF_CGROUP_SETSOCKOPT},
>  	},
> +	{
> +		"cgroup/sockinit",
> +		{0, BPF_PROG_TYPE_CGROUP_SOCKINIT, BPF_CGROUP_SOCKINIT},
> +		{0, BPF_CGROUP_SOCKINIT},
> +	},
>  };
> 
> +SEC("cgroup/sockinit")
> +int mptcpify(struct bpf_sockinit_ctx *ctx)
> +{
> +	if ((ctx->family == AF_INET || ctx->family == AF_INET6) &&
> +	    ctx->type == SOCK_STREAM &&
> +	    (!ctx->protocol || ctx->protocol == IPPROTO_TCP)) {
> +		ctx->protocol = IPPROTO_MPTCP;
> +	}
> +
> +	return 1;
> +}
> 
> This version is the best solution we have found so far.
> 
> [1]
> https://github.com/multipath-tcp/mptcp_net-next/wiki
> [2]
> https://github.com/multipath-tcp/mptcp_net-next/issues/79
> [3]
> https://patchwork.kernel.org/project/mptcp/cover/cover.1688215769.git.geliang.tang@suse.com/
> [4]
> https://patchwork.kernel.org/project/mptcp/cover/cover.1688366249.git.geliang.tang@suse.com/

cgroup/sock_create being weird and triggering late and only for af_inet4/6
was the reason we added ability to attach to lsm hooks on a per-cgroup
basis:
https://lore.kernel.org/bpf/20220628174314.1216643-1-sdf@google.com/

Unfortunately, using it here won't help either :-( The following
hook triggers early enough but doesn't allow changing the arguments (I was
interested only in filtering based on the arguments, not changing them):

LSM_HOOK(int, 0, socket_create, int family, int type, int protocol, int kern)

So maybe another alternative might be to change its definition to int *family,
int *type, int *protocol and use lsm_cgroup/socket_create to rewrite the
protocol? (some verifier changes might needed to make those writable)

  parent reply	other threads:[~2023-07-06 17:02 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-06  4:08 [RFC bpf-next 0/8] BPF 'force to MPTCP' Geliang Tang
2023-07-06  4:08 ` [RFC bpf-next 1/8] bpf: Add new prog type sockinit Geliang Tang
2023-07-06  4:08 ` [RFC bpf-next 2/8] bpf: Run a sockinit program Geliang Tang
2023-07-06  4:08 ` [RFC bpf-next 3/8] net: socket: run sockinit hooks Geliang Tang
2023-07-06  4:08 ` [RFC bpf-next 4/8] libbpf: Support sockinit hook Geliang Tang
2023-07-06  4:08 ` [RFC bpf-next 5/8] selftests/bpf: Add mptcpify program Geliang Tang
2023-07-06  4:08 ` [RFC bpf-next 6/8] selftests/bpf: use random netns name for mptcp Geliang Tang
2023-07-06  4:08 ` [RFC bpf-next 7/8] selftests/bpf: add two mptcp netns helpers Geliang Tang
2023-07-06  4:08 ` [RFC bpf-next 8/8] selftests/bpf: Add mptcpify selftest Geliang Tang
2023-07-06 17:02 ` Stanislav Fomichev [this message]
2023-07-13  5:47   ` [RFC bpf-next 0/8] BPF 'force to MPTCP' Geliang Tang
2023-07-13 17:14     ` Stanislav Fomichev
2023-07-13 22:46       ` Alexei Starovoitov
2023-07-13 23:09         ` Stanislav Fomichev
2023-07-14  7:56           ` Paolo Abeni

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZKbzs7foUw+eeNnn@google.com \
    --to=sdf@google.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=geliang.tang@suse.com \
    --cc=haoluo@google.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=martin.lau@linux.dev \
    --cc=mptcp@lists.linux.dev \
    --cc=song@kernel.org \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.