From: Stanislav Fomichev <sdf@google.com>
To: bpf@vger.kernel.org
Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
martin.lau@linux.dev, song@kernel.org, yhs@fb.com,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com,
haoluo@google.com, jolsa@kernel.org
Subject: [PATCH bpf-next 6/6] bpf: Document EFAULT changes for sockopt
Date: Tue, 18 Apr 2023 15:53:43 -0700 [thread overview]
Message-ID: <20230418225343.553806-7-sdf@google.com> (raw)
In-Reply-To: <20230418225343.553806-1-sdf@google.com>
And add examples for how to correctly handle large optlens.
This is less relevant now when we don't EFAULT anymore, but
that's still the correct thing to do.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
Documentation/bpf/prog_cgroup_sockopt.rst | 64 +++++++++++++++++++++--
1 file changed, 60 insertions(+), 4 deletions(-)
diff --git a/Documentation/bpf/prog_cgroup_sockopt.rst b/Documentation/bpf/prog_cgroup_sockopt.rst
index 172f957204bf..dcb8d4681257 100644
--- a/Documentation/bpf/prog_cgroup_sockopt.rst
+++ b/Documentation/bpf/prog_cgroup_sockopt.rst
@@ -29,7 +29,7 @@ chain finish (i.e. kernel ``setsockopt`` handling will *not* be executed).
Note, that ``optlen`` can not be increased beyond the user-supplied
value. It can only be decreased or set to -1. Any other value will
-trigger ``EFAULT``.
+ignore the BPF program's changes to ``optlen``/``optval``.
Return Type
-----------
@@ -45,13 +45,14 @@ sockopt. The BPF hook can observe ``optval``, ``optlen`` and ``retval``
if it's interested in whatever kernel has returned. BPF hook can override
the values above, adjust ``optlen`` and reset ``retval`` to 0. If ``optlen``
has been increased above initial ``getsockopt`` value (i.e. userspace
-buffer is too small), ``EFAULT`` is returned.
+buffer is too small), BPF program's ``optlen`` and ``optval`` are
+ignored.
This hook has access to the cgroup and socket local storage.
Note, that the only acceptable value to set to ``retval`` is 0 and the
original value that the kernel returned. Any other value will trigger
-``EFAULT``.
+ignore BPF program's changes.
Return Type
-----------
@@ -98,10 +99,65 @@ When the ``optval`` is greater than the ``PAGE_SIZE``, the BPF program
indicates that the kernel should use BPF's trimmed ``optval``.
When the BPF program returns with the ``optlen`` greater than
-``PAGE_SIZE``, the userspace will receive ``EFAULT`` errno.
+``PAGE_SIZE``, the userspace will receive original kernel
+buffers without any modifications that the BPF program might have
+applied.
Example
=======
+Recommended way to handle BPF programs is as follows:
+
+```
+SEC("cgroup/getsockopt")
+int getsockopt(struct bpf_sockopt *ctx)
+{
+ /* Custom socket option. */
+ if (ctx->level == MY_SOL && ctx->optname == MY_OPTNAME) {
+ ctx->retval = 0;
+ optval[0] = ...;
+ ctx->optlen = 1;
+ return 1;
+ }
+
+ /* Modify kernel's socket option. */
+ if (ctx->level == SOL_IP && ctx->optname == IP_FREEBIND) {
+ ctx->retval = 0;
+ optval[0] = ...;
+ ctx->optlen = 1;
+ return 1;
+ }
+
+ /* optval larger than PAGE_SIZE use kernel's buffer. */
+ if (ctx->optlen > 4096)
+ ctx->optlen = 0;
+
+ return 1;
+}
+
+SEC("cgroup/setsockopt")
+int setsockopt(struct bpf_sockopt *ctx)
+{
+ /* Custom socket option. */
+ if (ctx->level == MY_SOL && ctx->optname == MY_OPTNAME) {
+ /* do something */
+ ctx->optlen = -1;
+ return 1;
+ }
+
+ /* Modify kernel's socket option. */
+ if (ctx->level == SOL_IP && ctx->optname == IP_FREEBIND) {
+ optval[0] = ...;
+ return 1;
+ }
+
+ /* optval larger than PAGE_SIZE use kernel's buffer. */
+ if (ctx->optlen > 4096)
+ ctx->optlen = 0;
+
+ return 1;
+}
+```
+
See ``tools/testing/selftests/bpf/progs/sockopt_sk.c`` for an example
of BPF program that handles socket options.
--
2.40.0.634.g4ca3ef3211-goog
next prev parent reply other threads:[~2023-04-18 22:54 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-18 22:53 [PATCH bpf-next 0/6] bpf: handle another corner case in getsockopt Stanislav Fomichev
2023-04-18 22:53 ` [PATCH bpf-next 1/6] bpf: Don't EFAULT for getsockopt with optval=NULL Stanislav Fomichev
2023-04-18 22:53 ` [PATCH bpf-next 2/6] selftests/bpf: Verify optval=NULL case Stanislav Fomichev
2023-04-18 22:53 ` [PATCH bpf-next 3/6] bpf: Don't EFAULT for {g,s}setsockopt with wrong optlen Stanislav Fomichev
2023-04-21 15:24 ` Daniel Borkmann
2023-04-21 16:09 ` Stanislav Fomichev
2023-04-25 0:56 ` Martin KaFai Lau
2023-04-25 17:12 ` Stanislav Fomichev
2023-04-25 18:31 ` Martin KaFai Lau
2023-04-26 17:27 ` Stanislav Fomichev
2023-04-26 18:07 ` Martin KaFai Lau
2023-04-18 22:53 ` [PATCH bpf-next 4/6] selftests/bpf: Update EFAULT {g,s}etsockopt selftests Stanislav Fomichev
2023-04-18 22:53 ` [PATCH bpf-next 5/6] selftests/bpf: Correctly handle optlen > 4096 Stanislav Fomichev
2023-04-18 22:53 ` Stanislav Fomichev [this message]
2023-04-19 20:08 ` [PATCH bpf-next 6/6] bpf: Document EFAULT changes for sockopt kernel test robot
2023-04-20 18:17 ` Stanislav Fomichev
2023-04-21 15:20 ` [PATCH bpf-next 0/6] bpf: handle another corner case in getsockopt patchwork-bot+netdevbpf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230418225343.553806-7-sdf@google.com \
--to=sdf@google.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=haoluo@google.com \
--cc=john.fastabend@gmail.com \
--cc=jolsa@kernel.org \
--cc=kpsingh@kernel.org \
--cc=martin.lau@linux.dev \
--cc=song@kernel.org \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.