All of lore.kernel.org
 help / color / mirror / Atom feed
From: Martin KaFai Lau <martin.lau@linux.dev>
To: Daniel Rosenberg <drosen@google.com>
Cc: bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>
Subject: Re: Struct_ops Questions
Date: Tue, 17 Jan 2023 11:05:57 -0800	[thread overview]
Message-ID: <0c4e8a7f-9891-338e-1308-599f2c2af447@linux.dev> (raw)
In-Reply-To: <CA+PiJmTMrWq-BGAMZgd317q0sT7tN-6=r3sDbZdb8iVzwPKdsw@mail.gmail.com>

On 1/13/23 5:05 PM, Daniel Rosenberg wrote:
> I'm currently working on switching Fuse-BPF over to using struct_ops,
> and have a few questions.
> 
> I noticed that the lifespan of the [name]_struct_op struct passed in
> reg, and the associated maps have a lifespan that can't be influenced
> by the subsystem. Fuse-bpf keeps struct ops on an arbitrary number of
> inodes which will potentially outlive the op structure. My current
> plan is to have fuse treat unreg similar to if the daemon simply
> stopped responding, failing all impacted calls with ENOTCONN. I'm
> currently looking at two approaches.
> 
> 1. Copy the struct received in reg, and have a flag to indicate if the
> structure is live, with unreg just marking it as dead and some RCU
> style sync to ensure we don't lose the function pointers post check.
> 2. Maintain an additional struct that points to the reg provided
> struct. Null out that pointer in unreg, with the same sort of RCU
> sync.
> 
> I'm currently leaning towards 1, but not sure what the best approach
> is here, or if there should be some way for the subsystem to grab a
> kref on the maps/struct_op structures. Given the analogue of the FUSE
> daemon being able to disappear at any given time, I think one of the
> above options should be enough.
> 
> The old fuse-bpf implementation, which had its own program type
> defined, would use the program fd as an identifier, which allowed it
> to increment the ref count as needed. That sort of information isn't
> exposed to the register function, and you can't reach the struct_ops
> structure from a map fd as is. Either of those would allow us to
> directly use the map/struct objects without needing to maintain an
> extra layer of duplicated data. Currently all the register function
> does is add information to be able to check if the map still exists,
> which wouldn't be needed if we could just grab a ref.
> 
> I'm not sure how to handle Fuse being built as a module. Currently,
> bpf_struct_ops uses an include file list to define all available
> struct_ops, along with externs for their bpf_struct_ops structure. If
> we build fuse as a module, that either would not be available, or
> would require us to build extra things into the kernel when we make
> fuse as a module, which sort of seems to defeat the point.
> 
> Do we need a module interface here? At the moment I'm messing around
> with just having the reg/unreg functions implemented within the FUSE
> module, with all of the verification functions built in on the bpf
> side. I've got a rough prototype working, but there's some messiness
> around unloading the module while there are still struct_op programs
> registered. If you unload and reload the module, the struct_op program
> will still show up via bpftools, but would be unusable since it would
> no longer be registered. All of that goes away if we can directly use
> the map fd as an identifier. That wouldn't be useful for anything that
> requires extra setup in their reg function of course.
> 
> It seems like a fair bit of these issues go away if we allow for a way
> to grab the specific struct_op structure from the map fd. Would that
> be a reasonable thing to expose to a module?

I think the first question is related to the refcnt of the struct_ops. Whenever 
a new tcp connection is established, it also needs to bump the refcnt of a (bpf) 
tcp_congestion_ops. The tcp subsystem currently does a bpf_try_module_get() 
which then calls bpf_struct_ops_get() if 'owner == BPF_MODULE_OWNER'. 
bpf_struct_ops_get() will inc the refcnt of the struct_ops.

Regarding unreg, the st_map->st_ops->unreg() will unregister the struct_ops from 
the tcp subsystem. The future tcp connection won't be able to find this (bpf) 
tcp_congestion_ops.

After unreg(), the existing tcp connections could still hold a refcnt to the 
already unregistered (bpf) tcp_congestion_ops. The refcnt will eventually be 
released when all these old connections are closed. This is similar to all other 
kernel tcp-cc modules (eg. when tcp_dctcp.c is built as a module).  Note that 
once unreg, the struct_ops will still be shown in 'bpftools struct_ops dump 
id...' but the status will be in BPF_STRUCT_OPS_STATE_TOBEFREE and it won't be 
usable by new connection.

Does the above behavior work for FUSE also? If not, how is FUSE different?

I think the next set of question is about when FUSE itself is built as a module. 
The first is how to register 'struct bpf_struct_ops bpf_fuse_struct_ops'. You 
are correct that right now it is done during compile time in 
bpf_struct_ops_types.h. To make FUSE itself built as a module and 
bpf_fuse_struct_ops defined in the FUSE module, some work is needed in 
bpf_struct_ops.c to allow registering struct_ops in runtime during module_init. 
For module reference counting, I think the bpf_fuse_struct_ops should be able to 
hold one refcnt of the fuse module. Not sure how bpf_fuse_struct_ops looks like 
and I also don't know how grabbing the map fd will help. It seems you already 
have something working, so it may be easier to discuss base on that.

  reply	other threads:[~2023-01-17 20:17 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-14  1:05 Struct_ops Questions Daniel Rosenberg
2023-01-17 19:05 ` Martin KaFai Lau [this message]
2023-01-24  1:55   ` Daniel Rosenberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0c4e8a7f-9891-338e-1308-599f2c2af447@linux.dev \
    --to=martin.lau@linux.dev \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=drosen@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.