From: "J. Bruce Fields" <bfields@fieldses.org>
To: NeilBrown <neilb@suse.de>
Cc: Josef Bacik <josef@toxicpanda.com>,
Christoph Hellwig <hch@infradead.org>,
Chuck Lever <chuck.lever@oracle.com>, Chris Mason <clm@fb.com>,
David Sterba <dsterba@suse.com>,
linux-nfs@vger.kernel.org, Wang Yugui <wangyugui@e16-tech.com>,
Ulli Horlacher <framstag@rus.uni-stuttgart.de>,
linux-btrfs@vger.kernel.org
Subject: Re: [PATCH/RFC] NFSD: handle BTRFS subvolumes better.
Date: Mon, 19 Jul 2021 11:49:07 -0400
Message-ID: <20210719154907.GA28482@fieldses.org>
In-Reply-To: <162638862766.13764.8566962032225976326@noble.neil.brown.name>
On Fri, Jul 16, 2021 at 08:37:07AM +1000, NeilBrown wrote:
> On Fri, 16 Jul 2021, Josef Bacik wrote:
> > On 7/15/21 1:24 PM, Christoph Hellwig wrote:
> > > On Thu, Jul 15, 2021 at 01:11:29PM -0400, Josef Bacik wrote:
> > >> Because there's no alternative. We need a way to tell userspace they've
> > >> wandered into a different inode namespace. There's no argument that what
> > >> we're doing is ugly, but there's never been a clear "do X instead". Just a
> > >> lot of whinging that btrfs is broken. This makes userspace happy and is
> > >> simple and straightforward. I'm open to alternatives, but there have been 0
> > >> workable alternatives proposed in the last decade of complaining about it.
> > >
> > > Make sure we cross a vfsmount when crossing the "st_dev" domain so
> > > that it is properly reported. Suggested many times and ignored all
> > > the time because it requires a bit of work.
> > >
> >
> > You keep telling me this but forgetting that I did all this work when you
> > originally suggested it. The problem I ran into was the automount stuff
> > requires that we have a completely different superblock for every vfsmount.
> > This is fine for things like nfs or samba where the automount literally points
> > to a completely different mount, but doesn't work for btrfs where it's on the
> > same file system. If you have 1000 subvolumes and run sync() you're going to
> > write the superblock 1000 times for the same file system. You are going to
> > reclaim inodes on the same file system 1000 times. You are going to reclaim
> > dcache on the same filesystem 1000 times. You are also going to pin 1000
> > dentries/inodes into memory whenever you wander into these things because the
> > super is going to hold them open.
> >
> > This is not a workable solution. It's not a matter of simply tying into
> > existing infrastructure, we'd have to completely rework how the VFS deals with
> > this stuff in order to be reasonable. And when I brought this up to Al he told
> > me I was insane and we absolutely had to have a different SB for every vfsmount,
> > which means we can't use vfsmount for this, which means we don't have any other
> > options. Thanks,
>
> When I was first looking at this, I thought that separate vfsmnts
> and auto-mounting was the way to go "just like NFS". NFS still shares a
> lot between the multiple superblocks - certainly it shares the same
> connection to the server.
>
> But I dropped the idea when Bruce pointed out that nfsd is not set up to
> export auto-mounted filesystems.

Yes. I wish it was.... But we'd need some way to look up a
not-currently-mounted filesystem by filehandle:

> It needs to be able to find a
> filesystem given a UUID (extracted from a filehandle), and it does this
> by walking through the mount table to find one that matches. So unless
> all btrfs subvols were mounted all the time (which I wouldn't propose),
> it would need major work to fix.
>
> NFSv4 describes the fsid as having a "major" and "minor" component.
> We've never treated these as having an important meaning - just extra
> bits to encode uniqueness in. Maybe we should have used "major" for the
> vfsmnt, and kept "minor" for the subvol.....

So nfsd would use the "major" ID to find the parent export, and then
btrfs would use the "minor" ID to identify the subvolume?
--b.
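
To make the two-level lookup in that question concrete, here is a minimal
sketch in Python. It is only an illustration under stated assumptions:
Export, resolve_fsid, the export table, and every ID value below are
hypothetical and are not nfsd or btrfs interfaces. The "major" ID selects
an already-mounted export, and the "minor" ID is then handed to the
filesystem to pick a subvolume.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Export:
    """Hypothetical stand-in for one exported, mounted filesystem."""
    path: str
    # Hypothetical map of "minor" ID -> subvolume name inside this export.
    subvolumes: Dict[int, str] = field(default_factory=dict)


def resolve_fsid(exports: Dict[int, Export], major: int,
                 minor: int) -> Optional[Tuple[str, str]]:
    """Resolve (major, minor) to (export path, subvolume name), or None."""
    # Step 1 (server side): find the parent export by its "major" ID.
    export = exports.get(major)
    if export is None:
        return None
    # Step 2 (filesystem side): find the subvolume by its "minor" ID.
    subvol = export.subvolumes.get(minor)
    if subvol is None:
        return None
    return export.path, subvol


# Made-up table: one export with two subvolumes.
exports = {
    7: Export("/srv/btrfs", {256: "home", 257: "snapshots/2021-07-19"}),
}
```

In this sketch only the parent export has to be present in the table;
subvolumes are never looked up as separate mounts.
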
> The idea for a single vfsmnt exposing multiple inode-name-spaces does
> appeal to me. The "st_dev" is just part of the name, and already a
> fairly blurry part. Thanks to bind mounts, multiple mounts can have the
> same st_dev. I see no intrinsic reason that a single mount should not
> have multiple fsids, provided that a coherent picture is provided to
> userspace which doesn't contain too many surprises.
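
The remark that "st_dev" is just part of the name can be seen from
userspace, where a file is identified by the (st_dev, st_ino) pair. The
following is a small sketch using Python's os.stat; the helper names
file_identity and same_inode_namespace are made up for illustration.

```python
import os


def file_identity(path):
    """Return the (st_dev, st_ino) pair reported for path."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def same_inode_namespace(path_a, path_b):
    """True if both paths report the same st_dev.

    Inode numbers are only comparable between files that share an st_dev.
    """
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

Tools such as "find -xdev" and "du -x" rely on a change in st_dev to
notice that they have crossed into a different filesystem, which is why
a coherent st_dev picture matters to userspace.
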