bpf.vger.kernel.org archive mirror
From: sdf@google.com
To: Hao Luo <haoluo@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>,
	Yonghong Song <yhs@fb.com>, KP Singh <kpsingh@kernel.org>,
	Shakeel Butt <shakeelb@google.com>,
	Joe Burton <jevburton.kernel@gmail.com>,
	bpf@vger.kernel.org
Subject: Re: [PATCH RFC bpf-next v1 0/8] Pinning bpf objects outside bpffs
Date: Fri, 7 Jan 2022 11:25:34 -0800
Message-ID: <YdiTrq4Y7JwmQumc@google.com>
In-Reply-To: <CA+khW7h4OG0=w5RXnentwnsi614wZdpYW4EUwN6k7Vce3unBKw@mail.gmail.com>

On 01/07, Hao Luo wrote:
> On Thu, Jan 6, 2022 at 3:03 PM <sdf@google.com> wrote:
> >
> > On 01/06, Hao Luo wrote:
> > > Bpffs is a pseudo file system that persists bpf objects. Previously
> > > bpf objects can only be pinned in bpffs, this patchset extends pinning
> > > to allow bpf objects to be pinned (or exposed) to other file systems.
> >
> > > In particular, this patchset allows pinning bpf objects in kernfs. This
> > > creates a new file entry in the kernfs file system, and the created file
> > > is able to reference the bpf object. By doing so, bpf can be used to
> > > customize the file's operations, such as seq_show.
> >
> > > As a concrete use case of this feature, this patchset introduces a
> > > simple new program type called 'bpf_view', which can be used to format
> > > a seq file from a kernel object's state. By pinning a bpf_view program
> > > into a cgroup directory, userspace is able to read the cgroup's state
> > > from a file in a format defined by the bpf program.
> >
> > > Unlike bpffs, kernfs doesn't have a callback when a kernfs node is
> > > freed. That is a problem if we allow the kernfs node to hold an extra
> > > reference to the bpf object, because there is no chance to decrement
> > > the object's refcnt. Therefore the kernfs node created by pinning
> > > doesn't hold a reference to the bpf object, and the lifetime of the
> > > kernfs node depends on the lifetime of the bpf object. Rather than
> > > "pinning in kernfs", it is "exposing to kernfs". We require the bpf
> > > object to be pinned in bpffs before it can be pinned in kernfs. When the
> > > object is unpinned from bpffs, its kernfs nodes are removed
> > > automatically. In a sense, this treats a pinned bpf object as a
> > > persistent "device".
> >
> > > We rely on fsnotify to monitor inode events in bpffs. A new function,
> > > bpf_watch_inode(), is introduced. It allows registering a callback
> > > function that runs at inode destruction. For the kernfs case, a callback
> > > that removes the kernfs node is registered at the destruction of bpffs
> > > inodes. For other file systems such as sockfs, bpf_watch_inode() can
> > > monitor the destruction of sockfs inodes, so the created file entry can
> > > hold a reference to the bpf object. In this case, it is truly "pinning".
> >
> > > File operations other than seq_show can also be implemented using bpf.
> > > For example, bpf may be of help for .poll and .mmap in kernfs.
> >
> > This looks awesome!
> >
> > One thing I don't understand is: why did you go through the pinning
> > interface vs. regular attach/detach? IOW, why not allow regular
> > sys_bpf(BPF_PROG_ATTACH, prog_id, cgroup_id) and attach to the cgroup
> > (which, in turn, creates the kernfs nodes)? Wouldn't this way let you
> > drop the requirement that the object be pinned in bpffs first?
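The lifetime rules in the cover letter can be modeled in a few lines of userspace C. This is a toy model, not kernel code: model_obj, model_bpffs_inode, model_kernfs_node, pin_bpffs(), expose_kernfs() and unpin_bpffs() are hypothetical names, and the watcher array only stands in for the fsnotify-based bpf_watch_inode() callback, whose real signature is not shown in this thread. It illustrates the two properties the design relies on: exposing in kernfs takes no reference and requires a prior bpffs pin, and unpinning from bpffs removes every exposed node before the pin's reference is dropped.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model of the RFC's lifetime rules; all names hypothetical. */
#define MAX_NODES 4

struct model_obj {
	int refcnt;   /* references held on the bpf object */
	bool freed;
};

/* A kernfs file that points at the object but owns no reference. */
struct model_kernfs_node {
	struct model_obj *obj;
	bool present;
};

/* A bpffs inode: owns one reference and carries destruction watchers
 * (standing in for callbacks registered via bpf_watch_inode()). */
struct model_bpffs_inode {
	struct model_obj *obj;
	struct model_kernfs_node *watchers[MAX_NODES];
	int nr_watchers;
};

static void obj_put(struct model_obj *obj)
{
	if (--obj->refcnt == 0)
		obj->freed = true;
}

/* Pinning in bpffs takes a reference on the object. */
static void pin_bpffs(struct model_bpffs_inode *inode, struct model_obj *obj)
{
	obj->refcnt++;
	inode->obj = obj;
	inode->nr_watchers = 0;
}

/* Exposing in kernfs: only allowed once the object is pinned in bpffs.
 * No reference is taken; a removal watcher is registered instead. */
static bool expose_kernfs(struct model_bpffs_inode *inode,
			  struct model_kernfs_node *kn)
{
	if (!inode->obj || inode->nr_watchers == MAX_NODES)
		return false;
	kn->obj = inode->obj;
	kn->present = true;
	inode->watchers[inode->nr_watchers++] = kn;
	return true;
}

/* Unpinning destroys the bpffs inode: every watcher removes its kernfs
 * node first, then the pin's reference is dropped. */
static void unpin_bpffs(struct model_bpffs_inode *inode)
{
	if (!inode->obj)
		return;
	for (int i = 0; i < inode->nr_watchers; i++) {
		inode->watchers[i]->present = false;
		inode->watchers[i]->obj = NULL;
	}
	inode->nr_watchers = 0;
	obj_put(inode->obj);
	inode->obj = NULL;
}
```

Because the kernfs node never owns a reference, the missing free callback in kernfs cannot leak the object; the price is the pin-in-bpffs-first requirement questioned above.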

> Thanks Stan.

> Yeah, the attach/detach approach is definitely another option. IIUC,
> in comparison to pinning, does attach/detach only work for cgroups?

attach has a target_fd argument that, in theory, can refer to anything.
We can add support for different fd types.

> Pinning may be used on other file systems, such as sockfs, sysfs or
> resctrl. But I don't know whether this generality is welcome, and
> implementing seq_show is the only concrete use case I can think of right
> now. If people think the ability to create files in other subsystems is
> undesirable, I'd be happy to take a look at the attach/detach approach;
> that may be the right way.
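For reference, pinning goes through the BPF_OBJ_PIN command of bpf(2): the caller passes the object's fd in bpf_fd and the destination path in pathname. The sketch below issues it through the raw syscall. The helper names fill_pin_attr() and obj_pin() are illustrative, and whether this RFC accepts a kernfs (for example cgroup) path through the same command is an assumption suggested by the patch titles, not something confirmed in this thread.

```c
#define _GNU_SOURCE
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* BPF_OBJ_PIN: bpf_fd is the object to pin, pathname is the new file. */
static void fill_pin_attr(union bpf_attr *attr, int obj_fd, const char *path)
{
	memset(attr, 0, sizeof(*attr));
	attr->bpf_fd = (__u32)obj_fd;
	attr->pathname = (__u64)(unsigned long)path;
}

/* Returns 0 on success, -1 with errno set on failure. */
static int obj_pin(int obj_fd, const char *path)
{
	union bpf_attr attr;

	fill_pin_attr(&attr, obj_fd, path);
	return (int)syscall(__NR_bpf, BPF_OBJ_PIN, &attr, sizeof(attr));
}
```

Under the rules in the cover letter, a caller would pin in bpffs first, e.g. obj_pin(prog_fd, "/sys/fs/bpf/my_view"), and only then under a cgroup directory, e.g. obj_pin(prog_fd, "/sys/fs/cgroup/my_cg/my_view"); both paths are hypothetical.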

The reason I started thinking about attach/detach is the clunky unlink
that you have to do (aka echo "rm" > file). IMO, having standard
attach/detach is much clearer. But I might be missing some complexity
associated with non-cgroup filesystems.
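For comparison, sys_bpf(BPF_PROG_ATTACH, prog_id, cgroup_id) above is shorthand: bpf(2) actually takes file descriptors, with target_fd set to an fd for the target (for a cgroup, its directory opened read-only) and attach_bpf_fd set to the program fd. The sketch below fills those fields and issues either BPF_PROG_ATTACH or BPF_PROG_DETACH through the raw syscall. There is no upstream attach type for a bpf_view program, so the type is left to the caller; fill_attach_attr() and cgroup_prog_cmd() are illustrative names.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* BPF_PROG_ATTACH and BPF_PROG_DETACH use the same attr fields. */
static void fill_attach_attr(union bpf_attr *attr, int target_fd, int prog_fd,
			     enum bpf_attach_type type)
{
	memset(attr, 0, sizeof(*attr));
	attr->target_fd = (__u32)target_fd;
	attr->attach_bpf_fd = (__u32)prog_fd;
	attr->attach_type = type;
}

/* Run cmd (BPF_PROG_ATTACH or BPF_PROG_DETACH) for prog_fd on the cgroup
 * directory at cgroup_path. Returns 0 on success, -1 with errno set. */
static int cgroup_prog_cmd(int cmd, const char *cgroup_path, int prog_fd,
			   enum bpf_attach_type type)
{
	union bpf_attr attr;
	int cg_fd, err;

	cg_fd = open(cgroup_path, O_RDONLY | O_DIRECTORY);
	if (cg_fd < 0)
		return -1;
	fill_attach_attr(&attr, cg_fd, prog_fd, type);
	err = (int)syscall(__NR_bpf, cmd, &attr, sizeof(attr));
	close(cg_fd);
	return err;
}
```

Detaching is the same call with BPF_PROG_DETACH, which is the operation that would replace the echo "rm" > file step. Closing the cgroup fd right after the call is fine for this (non-link) attach, since the attachment is held by the cgroup rather than by the fd.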


Thread overview: 25+ messages
2022-01-06 21:50 [PATCH RFC bpf-next v1 0/8] Pinning bpf objects outside bpffs Hao Luo
2022-01-06 21:50 ` [PATCH RFC bpf-next v1 1/8] bpf: Support pinning in non-bpf file system Hao Luo
2022-01-07  0:33   ` Yonghong Song
2022-01-06 21:50 ` [PATCH RFC bpf-next v1 2/8] bpf: Record back pointer to the inode in bpffs Hao Luo
2022-01-06 21:50 ` [PATCH RFC bpf-next v1 3/8] bpf: Expose bpf object in kernfs Hao Luo
2022-01-06 21:50 ` [PATCH RFC bpf-next v1 4/8] bpf: Support removing kernfs entries Hao Luo
2022-01-06 21:50 ` [PATCH RFC bpf-next v1 5/8] bpf: Introduce a new program type bpf_view Hao Luo
2022-01-07  0:35   ` Yonghong Song
2022-01-06 21:50 ` [PATCH RFC bpf-next v1 6/8] libbpf: Support of bpf_view prog type Hao Luo
2022-01-06 21:50 ` [PATCH RFC bpf-next v1 7/8] bpf: Add seq_show operation for bpf in cgroupfs Hao Luo
2022-01-06 21:50 ` [PATCH RFC bpf-next v1 8/8] selftests/bpf: Test exposing bpf objects in kernfs Hao Luo
2022-01-06 23:02 ` [PATCH RFC bpf-next v1 0/8] Pinning bpf objects outside bpffs sdf
2022-01-07 18:59   ` Hao Luo
2022-01-07 19:25     ` sdf [this message]
2022-01-10 18:55       ` Hao Luo
2022-01-10 19:22         ` Stanislav Fomichev
2022-01-11  3:33         ` Alexei Starovoitov
2022-01-11 17:06           ` Stanislav Fomichev
2022-01-11 18:20           ` Hao Luo
2022-01-12 18:55             ` Song Liu
2022-01-12 19:19               ` Hao Luo
2022-01-07  0:30 ` Yonghong Song
2022-01-07 20:43   ` Hao Luo
2022-01-10 17:30     ` Yonghong Song
2022-01-10 18:56       ` Hao Luo
