From: sdf@google.com
To: Hao Luo <haoluo@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>,
Yonghong Song <yhs@fb.com>, KP Singh <kpsingh@kernel.org>,
Shakeel Butt <shakeelb@google.com>,
Joe Burton <jevburton.kernel@gmail.com>,
bpf@vger.kernel.org
Subject: Re: [PATCH RFC bpf-next v1 0/8] Pinning bpf objects outside bpffs
Date: Fri, 7 Jan 2022 11:25:34 -0800 [thread overview]
Message-ID: <YdiTrq4Y7JwmQumc@google.com> (raw)
In-Reply-To: <CA+khW7h4OG0=w5RXnentwnsi614wZdpYW4EUwN6k7Vce3unBKw@mail.gmail.com>
On 01/07, Hao Luo wrote:
> On Thu, Jan 6, 2022 at 3:03 PM <sdf@google.com> wrote:
> >
> > On 01/06, Hao Luo wrote:
> > > Bpffs is a pseudo file system that persists bpf objects. Previously
> > > bpf objects can only be pinned in bpffs, this patchset extends pinning
> > > to allow bpf objects to be pinned (or exposed) to other file systems.
> >
> > > In particular, this patchset allows pinning bpf objects in kernfs.
> This
> > > creates a new file entry in the kernfs file system and the created
> file
> > > is able to reference the bpf object. By doing so, bpf can be used to
> > > customize the file's operations, such as seq_show.
> >
> > > As a concrete usecase of this feature, this patchset introduces a
> > > simple new program type called 'bpf_view', which can be used to format
> > > a seq file by a kernel object's state. By pinning a bpf_view program
> > > into a cgroup directory, userspace is able to read the cgroup's state
> > > from file in a format defined by the bpf program.
> >
> > > Different from bpffs, kernfs doesn't have a callback when a kernfs
> node
> > > is freed, which is problem if we allow the kernfs node to hold an
> extra
> > > reference of the bpf object, because there is no chance to dec the
> > > object's refcnt. Therefore the kernfs node created by pinning doesn't
> > > hold reference of the bpf object. The lifetime of the kernfs node
> > > depends on the lifetime of the bpf object. Rather than "pinning in
> > > kernfs", it is "exposing to kernfs". We require the bpf object to be
> > > pinned in bpffs first before it can be pinned in kernfs. When the
> > > object is unpinned from bpffs, their kernfs nodes will be removed
> > > automatically. This somehow treats a pinned bpf object as a persistent
> > > "device".
> >
> > > We rely on fsnotify to monitor the inode events in bpffs. A new
> function
> > > bpf_watch_inode() is introduced. It allows registering a callback
> > > function at inode destruction. For the kernfs case, a callback that
> > > removes kernfs node is registered at the destruction of bpffs inodes.
> > > For other file systems such as sockfs, bpf_watch_inode() can monitor
> the
> > > destruction of sockfs inodes and the created file entry can hold the
> bpf
> > > object's reference. In this case, it is truly "pinning".
> >
> > > File operations other than seq_show can also be implemented using bpf.
> > > For example, bpf may be of help for .poll and .mmap in kernfs.
> >
> > This looks awesome!
> >
> > One thing I don't understand is: why did go through the pinning
> > interface VS regular attach/detach? IOW, why not allow regular
> > sys_bpf(BPF_PROG_ATTACH, prog_id, cgroup_id) and attach to the cgroup
> > (which, in turn, creates the kernfs nodes). Seems like this way you can
> drop
> > the requirement on the object being pinned in the bpffs first?
> Thanks Stan.
> Yeah, the attach/detach approach is definitely another option. IIUC,
> in comparison to pinning, does attach/detach only work for cgroups?
attach has target_fd argument that, in theory, can be whatever. We can
add support for different fd types.
> Pinning may be used on other file systems, sockfs, sysfs or resctrl.
> But I don't know whether this generality is welcome and implementing
> seq_show is the only concrete use case I can think of right now. If
> people think the ability of creating files in other subsystems is not
> good, I'd be happy to take a look at the attach/detach approach and
> that may be the right way.
The reason I started thinking about attach/detach is because of clunky
unlink that you have to do (aka echo "rm" > file). IMO, having standard
attach/detach is a much more clear. But I might be missing some
complexity associated with non-cgroup filesystems.
next prev parent reply other threads:[~2022-01-07 19:25 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-01-06 21:50 [PATCH RFC bpf-next v1 0/8] Pinning bpf objects outside bpffs Hao Luo
2022-01-06 21:50 ` [PATCH RFC bpf-next v1 1/8] bpf: Support pinning in non-bpf file system Hao Luo
2022-01-07 0:33 ` Yonghong Song
2022-01-06 21:50 ` [PATCH RFC bpf-next v1 2/8] bpf: Record back pointer to the inode in bpffs Hao Luo
2022-01-06 21:50 ` [PATCH RFC bpf-next v1 3/8] bpf: Expose bpf object in kernfs Hao Luo
2022-01-06 21:50 ` [PATCH RFC bpf-next v1 4/8] bpf: Support removing kernfs entries Hao Luo
2022-01-06 21:50 ` [PATCH RFC bpf-next v1 5/8] bpf: Introduce a new program type bpf_view Hao Luo
2022-01-07 0:35 ` Yonghong Song
2022-01-06 21:50 ` [PATCH RFC bpf-next v1 6/8] libbpf: Support of bpf_view prog type Hao Luo
2022-01-06 21:50 ` [PATCH RFC bpf-next v1 7/8] bpf: Add seq_show operation for bpf in cgroupfs Hao Luo
2022-01-06 21:50 ` [PATCH RFC bpf-next v1 8/8] selftests/bpf: Test exposing bpf objects in kernfs Hao Luo
2022-01-06 23:02 ` [PATCH RFC bpf-next v1 0/8] Pinning bpf objects outside bpffs sdf
2022-01-07 18:59 ` Hao Luo
2022-01-07 19:25 ` sdf [this message]
2022-01-10 18:55 ` Hao Luo
2022-01-10 19:22 ` Stanislav Fomichev
2022-01-11 3:33 ` Alexei Starovoitov
2022-01-11 17:06 ` Stanislav Fomichev
2022-01-11 18:20 ` Hao Luo
2022-01-12 18:55 ` Song Liu
2022-01-12 19:19 ` Hao Luo
2022-01-07 0:30 ` Yonghong Song
2022-01-07 20:43 ` Hao Luo
2022-01-10 17:30 ` Yonghong Song
2022-01-10 18:56 ` Hao Luo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YdiTrq4Y7JwmQumc@google.com \
--to=sdf@google.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=haoluo@google.com \
--cc=jevburton.kernel@gmail.com \
--cc=kafai@fb.com \
--cc=kpsingh@kernel.org \
--cc=shakeelb@google.com \
--cc=songliubraving@fb.com \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).