All of lore.kernel.org
 help / color / mirror / Atom feed
From: Martin KaFai Lau <martin.lau@linux.dev>
To: Raman Shukhau <ramasha@fb.com>
Cc: bpf@vger.kernel.org, ast@kernel.org, andrii@kernel.org,
	daniel@iogearbox.net
Subject: Re: [PATCH bpf-next 1/1] Fix for bpf_sysctl_set_new_value
Date: Tue, 7 May 2024 16:20:36 -0700	[thread overview]
Message-ID: <ca8136e0-5d2a-402b-ad03-cc8a218affd4@linux.dev> (raw)
In-Reply-To: <20240504102312.3137741-2-ramasha@fb.com>

On 5/4/24 3:23 AM, Raman Shukhau wrote:
> Noticed that call to bpf_sysctl_set_new_value doesn't change final value
> of the parameter, when called from cgroup/syscall bpf handler. No error
> thrown in this case, new value is simply ignored and original value, sent
> to sysctl, is set. Example (see test added to this change for BPF handler
> logic):
> 
> sysctl -w net.ipv4.ip_local_reserved_ports = 11111
> ... cgroup/syscal handler call bpf_sysctl_set_new_value	and set 22222
> sysctl net.ipv4.ip_local_reserved_ports
> ... returns 11111
> 
> On investigation I found 2 things that needs to be changed:
> * return value check
> * new_len provided by bpf back to sysctl. proc_sys_call_handler	expects
>    this value NOT to include \0 symbol, e.g. if user do

Thanks for the report and the patch.

This patch is changing a few things (1 fix, 1 improvement, 1 test).

Separate these individual changes into its own patch. Patch 1 fixes the return 
value. Patch 2 improves the '\0' and *pcount situation. Patch 3 adds the test.

btw, I am curious what is missed in the test_sysctl.c that didn't catch the 
return value case?

> 
> 	```
>    open("/proc/sys/net/ipv4/ip_local_reserved_ports", ...)
>    write(fd, "11111", sizeof("22222"))
>    ```
> 
>    or `echo -n "11111" > /proc/sys/net/ipv4/ip_local_reserved_ports`
> 
>    or `sysctl -w	net.ipv4.ip_local_reserved_ports=11111
> 
>    proc_sys_call_handler receives count equal to `5`. To make it consistent
>    with bpf_sysctl_set_new_value, this change also adjust `new_len` with
>    `-1`, if `\0` passed as last character. Alternatively, using
>    `sizeof("11111") - 1` in BPF handler should work, but it might not be
>    obvious and spark confusion. Note: if incorrect count is used, sysctl
>    returns EINVAL to the user.
> 
> Signed-off-by: Raman Shukhau <ramasha@fb.com>
> ---
>   kernel/bpf/cgroup.c                           |  7 ++-
>   .../bpf/progs/test_sysctl_overwrite.c         | 47 +++++++++++++++++++
>   tools/testing/selftests/bpf/test_sysctl.c     | 35 +++++++++++++-
>   3 files changed, 85 insertions(+), 4 deletions(-)
>   create mode 100644 tools/testing/selftests/bpf/progs/test_sysctl_overwrite.c
> 
> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> index 8ba73042a239..23736aed1b53 100644
> --- a/kernel/bpf/cgroup.c
> +++ b/kernel/bpf/cgroup.c
> @@ -1739,10 +1739,13 @@ int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head,
>   
>   	kfree(ctx.cur_val);
>   
> -	if (ret == 1 && ctx.new_updated) {
> +	if (ret == 0 && ctx.new_updated) {
>   		kfree(*buf);
>   		*buf = ctx.new_val;
> -		*pcount = ctx.new_len;
> +		if (!(*buf)[ctx.new_len])
> +			*pcount = ctx.new_len - 1;

 From looking at how new_updated is set, my understanding is new_len cannot be 0 
here. just want to double check.


> +		else
> +			*pcount = ctx.new_len;
>   	} else {
>   		kfree(ctx.new_val);
>   	}
> diff --git a/tools/testing/selftests/bpf/progs/test_sysctl_overwrite.c b/tools/testing/selftests/bpf/progs/test_sysctl_overwrite.c
> new file mode 100644
> index 000000000000..e44b429fcfc1
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/test_sysctl_overwrite.c
> @@ -0,0 +1,47 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (c) 2019 Facebook
> +
> +#include <string.h>
> +
> +#include <linux/bpf.h>
> +
> +#include <bpf/bpf_helpers.h>
> +
> +#include "bpf_compiler.h"
> +
> +static const char sysctl_value[] = "31337";
> +static const char sysctl_name[] = "net/ipv4/ip_local_reserved_ports";
> +static __always_inline int is_expected_name(struct bpf_sysctl *ctx)
> +{
> +	unsigned char i;
> +	char name[sizeof(sysctl_name)];
> +	int ret;
> +
> +	memset(name, 0, sizeof(name));
> +	ret = bpf_sysctl_get_name(ctx, name, sizeof(name), 0);
> +	if (ret < 0 || ret != sizeof(sysctl_name) - 1)
> +		return 0;
> +
> +	__pragma_loop_unroll_full
> +	for (i = 0; i < sizeof(sysctl_name); ++i)
> +		if (name[i] != sysctl_name[i])

bpf_strncmp() should be useful here.

> +			return 0;
> +
> +	return 1;
> +}
> +
> +SEC("cgroup/sysctl")
> +int test_value_overwrite(struct bpf_sysctl *ctx)
> +{
> +	if (!ctx->write)
> +		return 1;
> +
> +	if (!is_expected_name(ctx))
> +		return 0;
> +
> +	if (bpf_sysctl_set_new_value(ctx, sysctl_value, sizeof(sysctl_value)) == 0)
> +		return 1;
> +	return 0;
> +}
> +
> +char _license[] SEC("license") = "GPL";
> diff --git a/tools/testing/selftests/bpf/test_sysctl.c b/tools/testing/selftests/bpf/test_sysctl.c
> index bcdbd27f22f0..dfa479861d3a 100644
> --- a/tools/testing/selftests/bpf/test_sysctl.c
> +++ b/tools/testing/selftests/bpf/test_sysctl.c
> @@ -35,6 +35,7 @@ struct sysctl_test {
>   	int seek;
>   	const char *newval;
>   	const char *oldval;
> +	const char *updval;
>   	enum {
>   		LOAD_REJECT,
>   		ATTACH_REJECT,
> @@ -1395,6 +1396,16 @@ static struct sysctl_test tests[] = {
>   		.open_flags = O_RDONLY,
>   		.result = SUCCESS,
>   	},
> +	{
> +		"C prog: override write to ip_local_reserved_ports",
> +		.prog_file = "./test_sysctl_overwrite.bpf.o",

test_sysctl.c is not run in bpf CI. It is not very useful to extend this test 
further. Lets take this chance to create a new progs/cgrp_sysctl.c test that 
will be exercised by ./test_progs in bpf CI. Then it can use the newer skel 
open_and_load also.

Not asking to to migrate the existing tests in test_sysctl.c to the new 
progs/cgrp_sysctl.c in this patch set. The new cgrp_sysctl.c can only have the 
tests that exercise the changes in this patch set. However, it will be useful if 
progs/cgrp_sysctl.c can be bootstrapped in a way that the future test_sysctl.c 
migration will be easier. I also wouldn't worry too much on the existing raw 
insns tests in test_sysctl.c for now. They will need to be moved to either C or 
bpf asm in the future.

pw-bot: cr

> +		.attach_type = BPF_CGROUP_SYSCTL,
> +		.sysctl = "net/ipv4/ip_local_reserved_ports",
> +		.open_flags = O_RDWR,
> +		.newval = "11111",
> +		.updval = "31337",
> +		.result = SUCCESS,
> +	},
>   };
>   
>   static size_t probe_prog_length(const struct bpf_insn *fp)
> @@ -1520,13 +1531,33 @@ static int access_sysctl(const char *sysctl_path,
>   			log_err("Read value %s != %s", buf, test->oldval);
>   			goto err;
>   		}
> -	} else if (test->open_flags == O_WRONLY) {
> +	} else if (test->open_flags == O_WRONLY || test->open_flags == O_RDWR) {
>   		if (!test->newval) {
>   			log_err("New value for sysctl is not set");
>   			goto err;
>   		}
> -		if (write(fd, test->newval, strlen(test->newval)) == -1)
> +		if (write(fd, test->newval, strlen(test->newval)) == -1) {
> +			log_err("Unable to write sysctl value");
>   			goto err;
> +		}
> +		if (test->open_flags == O_RDWR) {
> +			char buf[128];
> +
> +			if (!test->updval) {
> +				log_err("Expected value for sysctl is not set");
> +				goto err;
> +			}
> +
> +			lseek(fd, 0, SEEK_SET);
> +			if (read(fd, buf, sizeof(buf)) == -1) {
> +				log_err("Unable to read updated value");
> +				goto err;
> +			}
> +			if (strncmp(buf, test->updval, strlen(test->updval))) {
> +				log_err("Overwritten value %s != %s", buf, test->updval);
> +				goto err;
> +			}
> +		}
>   	} else {
>   		log_err("Unexpected sysctl access: neither read nor write");
>   		goto err;


  reply	other threads:[~2024-05-07 23:20 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-04 10:23 [PATCH bpf-next 0/1] Fix for bpf_sysctl_set_new_value Raman Shukhau
2024-05-04 10:23 ` [PATCH bpf-next 1/1] " Raman Shukhau
2024-05-07 23:20   ` Martin KaFai Lau [this message]
2024-05-16 21:16     ` Raman Shukhau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ca8136e0-5d2a-402b-ad03-cc8a218affd4@linux.dev \
    --to=martin.lau@linux.dev \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=ramasha@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.