From: Stanislav Fomichev <sdf@google.com>
To: bpf@vger.kernel.org
Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
martin.lau@linux.dev, song@kernel.org, yhs@fb.com,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com,
haoluo@google.com, jolsa@kernel.org
Subject: [PATCH bpf-next 6/6] bpf: Document EFAULT changes for sockopt
Date: Tue, 18 Apr 2023 15:53:43 -0700 [thread overview]
Message-ID: <20230418225343.553806-7-sdf@google.com> (raw)
In-Reply-To: <20230418225343.553806-1-sdf@google.com>
And add examples for how to correctly handle large optlens.
This is less relevant now when we don't EFAULT anymore, but
that's still the correct thing to do.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
Documentation/bpf/prog_cgroup_sockopt.rst | 64 +++++++++++++++++++++--
1 file changed, 60 insertions(+), 4 deletions(-)
diff --git a/Documentation/bpf/prog_cgroup_sockopt.rst b/Documentation/bpf/prog_cgroup_sockopt.rst
index 172f957204bf..dcb8d4681257 100644
--- a/Documentation/bpf/prog_cgroup_sockopt.rst
+++ b/Documentation/bpf/prog_cgroup_sockopt.rst
@@ -29,7 +29,7 @@ chain finish (i.e. kernel ``setsockopt`` handling will *not* be executed).
Note, that ``optlen`` can not be increased beyond the user-supplied
value. It can only be decreased or set to -1. Any other value will
-trigger ``EFAULT``.
+ignore the BPF program's changes to ``optlen``/``optval``.
Return Type
-----------
@@ -45,13 +45,14 @@ sockopt. The BPF hook can observe ``optval``, ``optlen`` and ``retval``
if it's interested in whatever kernel has returned. BPF hook can override
the values above, adjust ``optlen`` and reset ``retval`` to 0. If ``optlen``
has been increased above initial ``getsockopt`` value (i.e. userspace
-buffer is too small), ``EFAULT`` is returned.
+buffer is too small), BPF program's ``optlen`` and ``optval`` are
+ignored.
This hook has access to the cgroup and socket local storage.
Note, that the only acceptable value to set to ``retval`` is 0 and the
original value that the kernel returned. Any other value will trigger
-``EFAULT``.
+ignore BPF program's changes.
Return Type
-----------
@@ -98,10 +99,65 @@ When the ``optval`` is greater than the ``PAGE_SIZE``, the BPF program
indicates that the kernel should use BPF's trimmed ``optval``.
When the BPF program returns with the ``optlen`` greater than
-``PAGE_SIZE``, the userspace will receive ``EFAULT`` errno.
+``PAGE_SIZE``, the userspace will receive original kernel
+buffers without any modifications that the BPF program might have
+applied.
Example
=======
+Recommended way to handle BPF programs is as follows:
+
+```
+SEC("cgroup/getsockopt")
+int getsockopt(struct bpf_sockopt *ctx)
+{
+ /* Custom socket option. */
+ if (ctx->level == MY_SOL && ctx->optname == MY_OPTNAME) {
+ ctx->retval = 0;
+ optval[0] = ...;
+ ctx->optlen = 1;
+ return 1;
+ }
+
+ /* Modify kernel's socket option. */
+ if (ctx->level == SOL_IP && ctx->optname == IP_FREEBIND) {
+ ctx->retval = 0;
+ optval[0] = ...;
+ ctx->optlen = 1;
+ return 1;
+ }
+
+ /* optval larger than PAGE_SIZE use kernel's buffer. */
+ if (ctx->optlen > 4096)
+ ctx->optlen = 0;
+
+ return 1;
+}
+
+SEC("cgroup/setsockopt")
+int setsockopt(struct bpf_sockopt *ctx)
+{
+ /* Custom socket option. */
+ if (ctx->level == MY_SOL && ctx->optname == MY_OPTNAME) {
+ /* do something */
+ ctx->optlen = -1;
+ return 1;
+ }
+
+ /* Modify kernel's socket option. */
+ if (ctx->level == SOL_IP && ctx->optname == IP_FREEBIND) {
+ optval[0] = ...;
+ return 1;
+ }
+
+ /* optval larger than PAGE_SIZE use kernel's buffer. */
+ if (ctx->optlen > 4096)
+ ctx->optlen = 0;
+
+ return 1;
+}
+```
+
See ``tools/testing/selftests/bpf/progs/sockopt_sk.c`` for an example
of BPF program that handles socket options.
--
2.40.0.634.g4ca3ef3211-goog
next prev parent reply other threads:[~2023-04-18 22:54 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-18 22:53 [PATCH bpf-next 0/6] bpf: handle another corner case in getsockopt Stanislav Fomichev
2023-04-18 22:53 ` [PATCH bpf-next 1/6] bpf: Don't EFAULT for getsockopt with optval=NULL Stanislav Fomichev
2023-04-18 22:53 ` [PATCH bpf-next 2/6] selftests/bpf: Verify optval=NULL case Stanislav Fomichev
2023-04-18 22:53 ` [PATCH bpf-next 3/6] bpf: Don't EFAULT for {g,s}setsockopt with wrong optlen Stanislav Fomichev
2023-04-21 15:24 ` Daniel Borkmann
2023-04-21 16:09 ` Stanislav Fomichev
2023-04-25 0:56 ` Martin KaFai Lau
2023-04-25 17:12 ` Stanislav Fomichev
2023-04-25 18:31 ` Martin KaFai Lau
2023-04-26 17:27 ` Stanislav Fomichev
2023-04-26 18:07 ` Martin KaFai Lau
2023-04-18 22:53 ` [PATCH bpf-next 4/6] selftests/bpf: Update EFAULT {g,s}etsockopt selftests Stanislav Fomichev
2023-04-18 22:53 ` [PATCH bpf-next 5/6] selftests/bpf: Correctly handle optlen > 4096 Stanislav Fomichev
2023-04-18 22:53 ` Stanislav Fomichev [this message]
2023-04-19 20:08 ` [PATCH bpf-next 6/6] bpf: Document EFAULT changes for sockopt kernel test robot
2023-04-20 18:17 ` Stanislav Fomichev
2023-04-21 15:20 ` [PATCH bpf-next 0/6] bpf: handle another corner case in getsockopt patchwork-bot+netdevbpf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230418225343.553806-7-sdf@google.com \
--to=sdf@google.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=haoluo@google.com \
--cc=john.fastabend@gmail.com \
--cc=jolsa@kernel.org \
--cc=kpsingh@kernel.org \
--cc=martin.lau@linux.dev \
--cc=song@kernel.org \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox