From: "Darrick J. Wong" <djwong@kernel.org>
To: John Groves <John@groves.net>
Cc: Amir Goldstein <amir73il@gmail.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Miklos Szeredi <miklos@szeredi.hu>,
	Bernd Schubert <bschubert@ddn.com>,
	John Groves <jgroves@micron.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Vishal Verma <vishal.l.verma@intel.com>,
	Dave Jiang <dave.jiang@intel.com>,
	Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.cz>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>,
	Randy Dunlap <rdunlap@infradead.org>,
	Jeff Layton <jlayton@kernel.org>,
	Kent Overstreet <kent.overstreet@linux.dev>,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org,
	linux-fsdevel@vger.kernel.org,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	Stefan Hajnoczi <shajnocz@redhat.com>,
	Joanne Koong <joannelkoong@gmail.com>,
	Josef Bacik <josef@toxicpanda.com>,
	Aravind Ramesh <arramesh@micron.com>,
	Ajay Joshi <ajayjoshi@micron.com>
Subject: Re: [RFC V2 10/18] famfs_fuse: Basic fuse kernel ABI enablement for famfs
Date: Tue, 8 Jul 2025 18:53:48 -0700
Message-ID: <20250709015348.GD2672029@frogsfrogsfrogs>
In-Reply-To: <ueepqz3oqeqzwiidk2wlf3f7enxxte4ws27gtxhakfmdiq4t26@cvfmozym5rme>

On Tue, Jul 08, 2025 at 07:02:03AM -0500, John Groves wrote:
> On 25/07/07 10:39AM, Darrick J. Wong wrote:
> > On Fri, Jul 04, 2025 at 08:39:59AM -0500, John Groves wrote:
> > > On 25/07/04 09:54AM, Amir Goldstein wrote:
> > > > On Thu, Jul 3, 2025 at 8:51 PM John Groves <John@groves.net> wrote:
> > > > >
> > > > > * FUSE_DAX_FMAP flag in INIT request/reply
> > > > >
> > > > > * fuse_conn->famfs_iomap (enable famfs-mapped files) to denote a
> > > > >   famfs-enabled connection
> > > > >
> > > > > Signed-off-by: John Groves <john@groves.net>
> > > > > ---
> > > > >  fs/fuse/fuse_i.h          |  3 +++
> > > > >  fs/fuse/inode.c           | 14 ++++++++++++++
> > > > >  include/uapi/linux/fuse.h |  4 ++++
> > > > >  3 files changed, 21 insertions(+)
> > > > >
> > > > > diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> > > > > index 9d87ac48d724..a592c1002861 100644
> > > > > --- a/fs/fuse/fuse_i.h
> > > > > +++ b/fs/fuse/fuse_i.h
> > > > > @@ -873,6 +873,9 @@ struct fuse_conn {
> > > > >         /* Use io_uring for communication */
> > > > >         unsigned int io_uring;
> > > > >
> > > > > +       /* dev_dax_iomap support for famfs */
> > > > > +       unsigned int famfs_iomap:1;
> > > > > +
> > > > 
> > > > pls move up to the bit fields members.
> > > 
> > > Oops, done, thanks.
> > > 
> > > > 
> > > > >         /** Maximum stack depth for passthrough backing files */
> > > > >         int max_stack_depth;
> > > > >
> > > > > diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> > > > > index 29147657a99f..e48e11c3f9f3 100644
> > > > > --- a/fs/fuse/inode.c
> > > > > +++ b/fs/fuse/inode.c
> > > > > @@ -1392,6 +1392,18 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
> > > > >                         }
> > > > >                         if (flags & FUSE_OVER_IO_URING && fuse_uring_enabled())
> > > > >                                 fc->io_uring = 1;
> > > > > +                       if (IS_ENABLED(CONFIG_FUSE_FAMFS_DAX) &&
> > > > > +                           flags & FUSE_DAX_FMAP) {
> > > > > +                               /* XXX: Should also check that fuse server
> > > > > +                                * has CAP_SYS_RAWIO and/or CAP_SYS_ADMIN,
> > > > > +                                * since it is directing the kernel to access
> > > > > +                                * dax memory directly - but this function
> > > > > +                                * appears not to be called in fuse server
> > > > > +                                * process context (b/c even if it drops
> > > > > +                                * those capabilities, they are held here).
> > > > > +                                */
> > > > > +                               fc->famfs_iomap = 1;
> > > > > +                       }
> > > > 
> > > > 1. As long as the mapping requests are checking capabilities we should be ok
> > > >     Right?
> > > 
> > > It depends on the definition of "are", or maybe of "mapping requests" ;)
> > > 
> > > Forgive me if this *is* obvious, but the fuse server capabilities are what
> > > I think need to be checked here - not the app that is accessing a file.
> > > 
> > > An app accessing a regular file doesn't need permission to do raw access to
> > > the underlying block dev, but the fuse server does - because it is directing
> > > the kernel to access that for apps.
> > > 
> > > > 2. What's the deal with capable(CAP_SYS_ADMIN) in process_init_limits then?
> > > 
> > > I *think* that's checking the capabilities of the app that is accessing the
> > > file, and not the fuse server. But I might be wrong - I have not pulled very
> > > hard on that thread yet.
> > 
> > The init reply should be processed in the context of the fuse server.
> > At that point the kernel hasn't exposed the fs to user programs, so
> > (AFAICT) there won't be any other programs accessing that fuse mount.
> 
> Hmm. It would be good if you're right about that. My fuse server *is* running
> as root, and when I check those capabilities in process_init_reply(), I
> find those capabilities. So far so good.
> 
> Then I added code to my fuse server to drop those capabilities prior to
> starting the fuse session (prctl(PR_CAPBSET_DROP, CAP_SYS_RAWIO) and
> prctl(PR_CAPBSET_DROP, CAP_SYS_ADMIN)). I expected (hoped?) to see those
> capabilities disappear in process_init_reply() - but they did not disappear.
> 
> I'm all ears if somebody can see a flaw in my logic here. Otherwise, the
> capabilities need to be stashed away before the reply is processed, when
> fs/fuse *is* running in fuse server context.
> 
> I'd be somewhat surprised if that isn't already happening somewhere...

Hrm.  I *thought* that since FUSE_INIT isn't queued as a background
command, it should still execute in the same process context as the fuse
server.

OTOH it also occurs to me that I have this code in fuse_send_init:

	if (has_capability_noaudit(current, CAP_SYS_RAWIO))
		flags |= FUSE_IOMAP | FUSE_IOMAP_DIRECTIO | FUSE_IOMAP_PAGECACHE;
	...
	ia->in.flags = flags;
	ia->in.flags2 = flags >> 32;

which means that we only advertise iomap support in FUSE_INIT if the
process running fuse_fill_super (which you hope is the fuse server)
actually has CAP_SYS_RAWIO.  Would that work for you?  Or are you
dropping privileges before you even open /dev/fuse?

Note: I might decide to relax that approach later on, since iomap
requires you to have opened a block device ... which implies that the
process had read/write access to start with; and maybe we're ok with
unprivileged fuse2fs servers running on a chmod 666 block device?

<shrug> always easier to /relax/ the privilege checks. :)

> > > > 3. Darrick mentioned the need for a synchronic INIT variant for his work on
> > > >     blockdev iomap support [1]
> > > 
> > > I'm not sure that's the same thing (Darrick?), but I do think Darrick's
> > > use case probably needs to check capabilities for a server that is sending
> > > apps (via files) off to access extents of block devices.
> > 
> > I don't know either, Miklos hasn't responded to my questions.  I think
> > the motivation for a synchronous 
> 
> ?

..."I don't know what his motivations for synchronous FUSE_INIT are."

I guess I fubar'd vim. :(

> > As for fuse/iomap, I only need to ask the kernel if iomap support
> > is available before calling ext2fs_open2() because the iomap question
> > has some implications for how we open the ext4 filesystem.
> > 
> > > > I also wonder how much of your patches and Darrick's patches end up
> > > > being an overlap?
> > > 
> > > Darrick and I spent some time hashing through this, and came to the conclusion
> > > that the actual overlap is slim-to-none. 
> > 
> > Yeah.  The neat thing about FMAPs is that you can establish repeating
> > patterns, which is useful for interleaved DRAM/pmem devices.  Disk
> > filesystems don't do repeating patterns, so they'd much rather manage
> > non-repeating mappings.
> 
> Right. Interleaving is critical to how we use memory, so fmaps are designed
> to support it.
> 
> Tangent: at some point a broader-than-just-me discussion of how block devices
> have the device mapper, but memory has no such layout tools, might be good
> to have. Without such a thing (which might or might not be possible/practical),
> it's essential that famfs do the interleaving. Lacking a mapper layer also
> means that we need dax to provide a clean "device abstraction" (meaning
> a single CXL allocation [which has a uuid/tag] needs to appear as a single
> dax device whether or not it's HPA-contiguous).

Well it's not as simple as device-mapper, where we can intercept struct
bio and remap/split it to our heart's content.  I guess you could do
that with an iovec...?  Would be sorta amusing if you could software
RAID10 some DRAM. :P

--D

> Cheers,
> John
> 
> 


Thread overview: 91+ messages
2025-07-03 18:50 [RFC V2 00/18] famfs: port into fuse John Groves
2025-07-03 18:50 ` [RFC V2 01/18] dev_dax_iomap: Move dax_pgoff_to_phys() from device.c to bus.c John Groves
2025-07-03 18:50 ` [RFC V2 02/18] dev_dax_iomap: Add fs_dax_get() func to prepare dax for fs-dax usage John Groves
2025-07-04 10:39   ` Jonathan Cameron
2025-07-04 12:54     ` John Groves
2025-07-03 18:50 ` [RFC V2 03/18] dev_dax_iomap: Save the kva from memremap John Groves
2025-07-04 11:11   ` Jonathan Cameron
2025-07-03 18:50 ` [RFC V2 04/18] dev_dax_iomap: Add dax_operations for use by fs-dax on devdax John Groves
2025-07-04 12:47   ` Jonathan Cameron
2025-07-05 22:56     ` John Groves
2025-07-03 18:50 ` [RFC V2 05/18] dev_dax_iomap: export dax_dev_get() John Groves
2025-07-03 18:50 ` [RFC V2 06/18] dev_dax_iomap: (ignore!) Drop poisoned page warning in fs/dax.c John Groves
2025-07-03 18:50 ` [RFC V2 07/18] famfs_fuse: magic.h: Add famfs magic numbers John Groves
2025-07-03 18:50 ` [RFC V2 08/18] famfs_fuse: Kconfig John Groves
2025-07-03 18:50 ` [RFC V2 09/18] famfs_fuse: Update macro s/FUSE_IS_DAX/FUSE_IS_VIRTIO_DAX/ John Groves
2025-07-04  8:44   ` Amir Goldstein
2025-07-03 18:50 ` [RFC V2 10/18] famfs_fuse: Basic fuse kernel ABI enablement for famfs John Groves
2025-07-03 22:45   ` John Groves
2025-07-07 17:32     ` Darrick J. Wong
2025-07-04  7:54   ` Amir Goldstein
2025-07-04 13:39     ` John Groves
2025-07-07 17:39       ` Darrick J. Wong
2025-07-08 12:02         ` John Groves
2025-07-09  1:53           ` Darrick J. Wong [this message]
2025-07-11  1:32             ` John Groves
2025-07-12  4:49               ` Darrick J. Wong
2025-08-11 18:30               ` John Groves
2025-08-12 16:37                 ` Darrick J. Wong
2025-08-13 13:07                   ` John Groves
2025-08-14 17:16                     ` Darrick J. Wong
2025-07-03 18:50 ` [RFC V2 11/18] famfs_fuse: Basic famfs mount opts John Groves
2025-07-09  3:59   ` Darrick J. Wong
2025-07-11 15:28     ` John Groves
2025-07-12  5:54       ` Darrick J. Wong
2025-08-14 10:37         ` Miklos Szeredi
2025-08-14 14:39           ` John Groves
2025-08-14 15:19             ` Miklos Szeredi
2025-08-14 23:52               ` John Groves
2025-07-03 18:50 ` [RFC V2 12/18] famfs_fuse: Plumb the GET_FMAP message/response John Groves
2025-07-04  8:54   ` Amir Goldstein
2025-07-04 20:30     ` John Groves
2025-07-05  0:06       ` John Groves
2025-07-05  7:58         ` Amir Goldstein
2025-07-05 19:17           ` John Groves
2025-07-09  4:27   ` Darrick J. Wong
2025-07-11 13:46     ` John Groves
2025-08-14 13:36   ` Miklos Szeredi
2025-08-14 14:36     ` Miklos Szeredi
2025-08-14 18:20       ` Darrick J. Wong
2025-08-15 15:06         ` John Groves
2025-08-19 21:55           ` Darrick J. Wong
2025-08-15 16:53       ` John Groves
2025-08-19 22:13         ` Darrick J. Wong
2025-08-14 18:05     ` Darrick J. Wong
2025-08-16 15:00       ` John Groves
2025-08-19 22:17         ` Darrick J. Wong
2025-08-15  0:38     ` John Groves
2025-07-03 18:50 ` [RFC V2 13/18] famfs_fuse: Create files with famfs fmaps John Groves
2025-07-04  9:01   ` Amir Goldstein
2025-07-05 19:27     ` John Groves
2025-07-03 18:50 ` [RFC V2 14/18] famfs_fuse: GET_DAXDEV message and daxdev_table John Groves
2025-07-04 13:20   ` Jonathan Cameron
2025-07-06 17:07     ` John Groves
2025-08-14 13:58   ` Miklos Szeredi
2025-08-14 17:19     ` Darrick J. Wong
2025-08-14 18:25       ` Miklos Szeredi
2025-08-14 18:55         ` Darrick J. Wong
2025-08-14 19:19           ` Miklos Szeredi
2025-08-16 16:22         ` John Groves
2025-08-19 22:32           ` Darrick J. Wong
2025-08-15 16:38     ` John Groves
2025-08-19 22:34       ` Darrick J. Wong
2025-07-03 18:50 ` [RFC V2 15/18] famfs_fuse: Plumb dax iomap and fuse read/write/mmap John Groves
2025-07-04  9:13   ` Amir Goldstein
2025-07-05 19:44     ` John Groves
2025-07-03 18:50 ` [RFC V2 16/18] famfs_fuse: Add holder_operations for dax notify_failure() John Groves
2025-07-03 18:50 ` [RFC V2 17/18] famfs_fuse: Add famfs metadata documentation John Groves
2025-07-03 18:50 ` [RFC V2 18/18] famfs_fuse: Add documentation John Groves
2025-07-04  0:27   ` Bagas Sanjaya
2025-07-04  2:22     ` Jonathan Corbet
2025-07-04  3:53       ` Bagas Sanjaya
2025-07-04 18:58         ` Matthew Wilcox
2025-07-04 23:29           ` Bagas Sanjaya
2025-07-04 23:43             ` Matthew Wilcox
2025-07-05  1:11               ` Bagas Sanjaya
2025-07-04  6:09   ` Randy Dunlap
2025-07-04  8:27   ` Amir Goldstein
2025-07-04 23:36     ` Bagas Sanjaya
2025-07-03 18:56 ` [RFC V2 00/18] famfs: port into fuse John Groves
2025-07-09  3:26   ` Miklos Szeredi
2025-07-11  1:18     ` John Groves
