From: Daniel Rosenberg <drosen@google.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>,
bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
Amir Goldstein <amir73il@gmail.com>,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-unionfs@vger.kernel.org,
Daniel Borkmann <daniel@iogearbox.net>,
John Fastabend <john.fastabend@gmail.com>,
Andrii Nakryiko <andrii@kernel.org>,
Martin KaFai Lau <martin.lau@linux.dev>,
Song Liu <song@kernel.org>, Yonghong Song <yhs@fb.com>,
KP Singh <kpsingh@kernel.org>,
Stanislav Fomichev <sdf@google.com>, Hao Luo <haoluo@google.com>,
Jiri Olsa <jolsa@kernel.org>, Shuah Khan <shuah@kernel.org>,
Jonathan Corbet <corbet@lwn.net>,
Joanne Koong <joannelkoong@gmail.com>,
Mykola Lysenko <mykolal@fb.com>,
kernel-team@android.com
Subject: Re: [RFC PATCH v3 28/37] WIP: bpf: Add fuse_ops struct_op programs
Date: Tue, 2 May 2023 18:53:25 -0700 [thread overview]
Message-ID: <CA+PiJmTQ-mpSywa-P6aW5VSUQu-49XtxNDMZP4Ks84UtVjDsNA@mail.gmail.com> (raw)
In-Reply-To: <CAEf4Bzb+suNcvM_tXBmYMzG0j9EH9kPW4F_x1s6ZEM118=tAAw@mail.gmail.com>
On Wed, Apr 26, 2023 at 9:18 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> Have you considered grouping this huge amount of callbacks into a
> smaller set of more generic callbacks where each callback would get
> enum argument specifying what sort of operation it is called for? This
> has many advantages, starting from not having to deal with struct_ops
> limits, ending with not needing to instantiate dozens of individual
> BPF programs.
>
> E.g., for a lot of operations the difference between pre- and
> post-filter is in having in argument as read-only and maybe having
> extra out argument for post-filter. One way to unify such post/pre
> filters into one callback would be to record whether in has to be
> read-only or read-write and not allow to create r/w dynptr for the
> former case. Pass bool or enum specifying if it's post or pre filter.
> For that optional out argument, you can simulate effectively the same
> by always supplying it, but making sure that out parameter is
> read-only and zero-sized, for example.
>
> That would cut the number of callbacks in two, which I'd say still is
> not great :) I think it would be better still to have even larger
> groups of callbacks for whole families of operations with the same (or
> "unifiable") interface (domain experts like you would need to do an
> analysis here to see what makes sense to group, of course).
>
> We'll probably touch on that tomorrow at BPF office hours, but I
> wanted to point this out beforehand, so that you have time to think
> about it.
>
The meta info struct we pass in includes the opcode which contains
whether it is a prefilter or postfilter, although I guess that may be
less accessible to the verifier than a separate bool. In the v1
version, we handled all op codes in a single program, although I think
we were running into some slowdowns when we had every opcode in a
giant switch statement, plus we were incurring the cost of the bpf
program even when we didn't need to do anything in it. The struct_op
version lets us entirely skip calling the bpf for opcodes we don't
need to handle.
Many of the arguments we pass currently are structs. If they were all
dynptrs, we could set the output related ones to empty/readonly, but
that removes one of the other strengths of the struct_op setup, where
we can actually label the inputs as the structs they are instead of a
void* equivalent. There are definitely some cases where we could
easily merge opcode callbacks, like FUSE_FSYNCDIR/FUSE_FSYNC and
FUSE_OPEN/FUSE_OPENDIR. I set them up as separate since it's easy to
assign the same program to both callbacks in the case where you want
both to be handled the same, while maintaining flexibility to handle
them separately.
> +void bpf_fuse_get_rw_dynptr(struct fuse_buffer *buffer, struct bpf_dynptr_kern *dynptr__uninit, u64 size, bool copy)
>
> not clear why size is passed from outside instead of instantiating
> dynptr with buffer->size? See [0] for bpf_dynptr_adjust and
> bpf_dynptr_clone that allow you to adjust buffer as necessary.
>
> As for the copy parameter, can you elaborate on the idea behind it?
>
> [0] https://patchwork.kernel.org/project/netdevbpf/list/?series=741584&state=*
>
We're storing these buffers as fuse_buffers initially because of the
additional metadata we're carrying. Some fields have variable lengths,
or are backed by const data. For instance, names. If you wanted to
alter the name we use on the lower filesystem, you cannot change it
directly since it's being backed by the dentry name. If you wanted to
adjust something, like perhaps adding an extension, you would pass
bpf_fuse_get_rw_dynptr the size you'd want for the new buffer, and
copy=true to get the preexisting data. Fuse_buffer tracks that data
was allocated so Fuse can clean up after the call. Additionally, say
you wanted to trim half the data returned by an xattr for some reason.
You would give it a size less than the buffer size to inform fuse that
it should ignore the second half of the data. That part could be
handled by bpf_dynptr_adjust if we didn't also need to handle the
allocation case.
Say you wanted to have the lower file name be the hash of the one you
created. In that case, you could get bpf_fuse_get_ro_dynptr to get
access to compute the hash, and then bpf_fuse_get_rw_dynptr to get a
buffer to write the hash to. Since the data is not directly related to
the original data, there would be no benefit to getting a copy.
I initially intended for bpf_fuse_get_ro_dynptr/bpf_fuse_get_rw_dynptr
to be called at most once for each field, but that may be too
restrictive. At the moment, if you make two calls that require
reallocating, any pointers to the old buffer would be invalid. This is
not the case for the original name, as we aren't touching the original
source. There are two possible approaches here. I could either
refcount the buffer and have a put kfunc, or I could invalidate old
dynpointers when bpf_fuse_get_rw_dynptr is called, similar to what
skb/xdp do. I'm leaning towards the latter to disallow having many
allocations active at once by calling bpf_fuse_get_rw_dynptr for
increasing sizes, though I could also just disallow reallocating a
buffer that already was reallocated.
The new dynptr helpers are pretty exciting since they'll make it much
easier to deal with chunks of data, which we may end up doing in
read/write filters. I haven't fully set those up since I was waiting
to see what the dynptr helpers ended up looking like.
> > +void bpf_fuse_get_ro_dynptr(const struct fuse_buffer *buffer, struct bpf_dynptr_kern *dynptr__uninit)
>
> these kfuncs probably should be more consistently named as
> bpf_dynptr_from_fuse_buffer_{ro,rw}() ?
>
Yeah, that fits in much better with the skb/xdp functions.
> > +
> > +uint32_t bpf_fuse_return_len(struct fuse_buffer *buffer)
> > +{
> > + return buffer->size;
>
> you should be able to get this with bpf_dynptr_size() (once you create
> it from fuse_buffer).
>
Yes, this might be unnecessary. I added it while testing kfuncs, and
had intended to use it with a fuse_buffer strncmp before I saw that
there's now a bpf_strncmp :) I had tried using it with
bpf_dynptr_slice, but that requires a known constant at verification
time, which may make using it in real cases a bit difficult...
bpf_strncmp also has some restrictions around the second string being
a fixed map, or something like that.
>
> you probably should be fine with just using bpf_tracing_func_proto as is
>
> > + .is_valid_access = bpf_fuse_is_valid_access,
>
> similarly, why custom no-op callback?
>
Those are largely carried over from iterations when I was less sure
what I would need. A lot of the work I was doing in the v1 code is
handled by default with the struct_op setup now, or is otherwise
unnecessary. This area in particular needs a lot of cleanup.
> > +static int bpf_fuse_init(struct btf *btf)
> > +{
> > + s32 type_id;
> > +
> > + type_id = btf_find_by_name_kind(btf, "fuse_buffer", BTF_KIND_STRUCT);
> > + if (type_id < 0)
> > + return -EINVAL;
> > + fuse_buffer_struct_type = btf_type_by_id(btf, type_id);
> > +
>
> see BTF_ID and BTF_ID_LIST uses for how to get ID for your custom
> well-known type
>
Thanks, I'll look into those.
next prev parent reply other threads:[~2023-05-03 1:53 UTC|newest]
Thread overview: 55+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-18 1:40 [RFC PATCH bpf-next v3 00/37] FUSE BPF: A Stacked Filesystem Extension for FUSE Daniel Rosenberg
2023-04-18 1:40 ` [RFC PATCH v3 01/37] bpf: verifier: Accept dynptr mem as mem in herlpers Daniel Rosenberg
2023-04-18 1:40 ` [RFC PATCH v3 02/37] bpf: Allow NULL buffers in bpf_dynptr_slice(_rw) Daniel Rosenberg
2023-04-18 1:40 ` [RFC PATCH v3 03/37] selftests/bpf: Test allowing NULL buffer in dynptr slice Daniel Rosenberg
2023-04-18 1:40 ` [RFC PATCH v3 04/37] fs: Generic function to convert iocb to rw flags Daniel Rosenberg
2023-04-18 1:40 ` [RFC PATCH v3 05/37] fuse-bpf: Update fuse side uapi Daniel Rosenberg
2023-04-18 1:40 ` [RFC PATCH v3 06/37] fuse-bpf: Add data structures for fuse-bpf Daniel Rosenberg
2023-04-18 1:40 ` [RFC PATCH v3 07/37] fuse-bpf: Prepare for fuse-bpf patch Daniel Rosenberg
2023-04-18 1:40 ` [RFC PATCH v3 08/37] fuse: Add fuse-bpf, a stacked fs extension for FUSE Daniel Rosenberg
2023-05-02 3:38 ` Alexei Starovoitov
2023-05-03 3:45 ` Amir Goldstein
2023-04-18 1:40 ` [RFC PATCH v3 09/37] fuse-bpf: Add ioctl interface for /dev/fuse Daniel Rosenberg
2023-04-18 1:40 ` [RFC PATCH v3 10/37] fuse-bpf: Don't support export_operations Daniel Rosenberg
2023-04-18 1:40 ` [RFC PATCH v3 11/37] fuse-bpf: Add support for access Daniel Rosenberg
2023-04-18 1:40 ` [RFC PATCH v3 12/37] fuse-bpf: Partially add mapping support Daniel Rosenberg
2023-04-18 1:40 ` [RFC PATCH v3 13/37] fuse-bpf: Add lseek support Daniel Rosenberg
2023-04-18 1:40 ` [RFC PATCH v3 14/37] fuse-bpf: Add support for fallocate Daniel Rosenberg
2023-04-18 1:40 ` [RFC PATCH v3 15/37] fuse-bpf: Support file/dir open/close Daniel Rosenberg
2023-04-18 1:40 ` [RFC PATCH v3 16/37] fuse-bpf: Support mknod/unlink/mkdir/rmdir Daniel Rosenberg
2023-04-18 1:40 ` [RFC PATCH v3 17/37] fuse-bpf: Add support for read/write iter Daniel Rosenberg
2023-05-16 20:21 ` Amir Goldstein
2023-04-18 1:40 ` [RFC PATCH v3 18/37] fuse-bpf: support readdir Daniel Rosenberg
2023-04-18 1:40 ` [RFC PATCH v3 19/37] fuse-bpf: Add support for sync operations Daniel Rosenberg
2023-04-18 1:40 ` [RFC PATCH v3 20/37] fuse-bpf: Add Rename support Daniel Rosenberg
2023-04-18 1:40 ` [RFC PATCH v3 21/37] fuse-bpf: Add attr support Daniel Rosenberg
2023-04-18 1:40 ` [RFC PATCH v3 22/37] fuse-bpf: Add support for FUSE_COPY_FILE_RANGE Daniel Rosenberg
2023-04-18 1:40 ` [RFC PATCH v3 23/37] fuse-bpf: Add xattr support Daniel Rosenberg
2023-04-18 1:40 ` [RFC PATCH v3 24/37] fuse-bpf: Add symlink/link support Daniel Rosenberg
2023-04-18 1:40 ` [RFC PATCH v3 25/37] fuse-bpf: allow mounting with no userspace daemon Daniel Rosenberg
2023-04-18 1:40 ` [RFC PATCH v3 26/37] bpf: Increase struct_op limits Daniel Rosenberg
2023-04-18 1:40 ` [RFC PATCH v3 27/37] fuse-bpf: Add fuse-bpf constants Daniel Rosenberg
2023-04-18 1:40 ` [RFC PATCH v3 28/37] WIP: bpf: Add fuse_ops struct_op programs Daniel Rosenberg
2023-04-27 4:18 ` Andrii Nakryiko
2023-05-03 1:53 ` Daniel Rosenberg [this message]
2023-05-03 18:22 ` Andrii Nakryiko
2023-04-18 1:40 ` [RFC PATCH v3 29/37] fuse-bpf: Export Functions Daniel Rosenberg
2023-04-18 1:40 ` [RFC PATCH v3 30/37] fuse: Provide registration functions for fuse-bpf Daniel Rosenberg
2023-04-18 1:40 ` [RFC PATCH v3 31/37] fuse-bpf: Set fuse_ops at mount or lookup time Daniel Rosenberg
2023-04-18 1:40 ` [RFC PATCH v3 32/37] fuse-bpf: Call bpf for pre/post filters Daniel Rosenberg
2023-04-18 1:40 ` [RFC PATCH v3 33/37] fuse-bpf: Add userspace " Daniel Rosenberg
2023-04-18 1:40 ` [RFC PATCH v3 34/37] WIP: fuse-bpf: add error_out Daniel Rosenberg
2023-04-18 1:40 ` [RFC PATCH v3 35/37] tools: Add FUSE, update bpf includes Daniel Rosenberg
2023-04-27 4:24 ` Andrii Nakryiko
2023-04-28 0:48 ` Daniel Rosenberg
2023-04-18 1:40 ` [RFC PATCH v3 36/37] fuse-bpf: Add selftests Daniel Rosenberg
2023-04-18 1:40 ` [RFC PATCH v3 37/37] fuse: Provide easy way to test fuse struct_op call Daniel Rosenberg
2023-04-18 5:33 ` [RFC PATCH bpf-next v3 00/37] FUSE BPF: A Stacked Filesystem Extension for FUSE Amir Goldstein
2023-04-21 1:41 ` Daniel Rosenberg
2023-04-23 14:50 ` Amir Goldstein
2023-04-24 15:32 ` Miklos Szeredi
2023-05-02 0:07 ` Daniel Rosenberg
2023-05-17 2:50 ` Gao Xiang
2023-05-17 6:51 ` Amir Goldstein
2023-05-17 7:05 ` Gao Xiang
2023-05-17 7:19 ` Gao Xiang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CA+PiJmTQ-mpSywa-P6aW5VSUQu-49XtxNDMZP4Ks84UtVjDsNA@mail.gmail.com \
--to=drosen@google.com \
--cc=amir73il@gmail.com \
--cc=andrii.nakryiko@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=corbet@lwn.net \
--cc=daniel@iogearbox.net \
--cc=haoluo@google.com \
--cc=joannelkoong@gmail.com \
--cc=john.fastabend@gmail.com \
--cc=jolsa@kernel.org \
--cc=kernel-team@android.com \
--cc=kpsingh@kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-unionfs@vger.kernel.org \
--cc=martin.lau@linux.dev \
--cc=miklos@szeredi.hu \
--cc=mykolal@fb.com \
--cc=sdf@google.com \
--cc=shuah@kernel.org \
--cc=song@kernel.org \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).