public inbox for linux-nfs@vger.kernel.org
 help / color / mirror / Atom feed
From: "J. Bruce Fields" <bfields@fieldses.org>
To: NeilBrown <neilb@suse.de>
Cc: Josef Bacik <josef@toxicpanda.com>,
	Christoph Hellwig <hch@infradead.org>,
	Chuck Lever <chuck.lever@oracle.com>, Chris Mason <clm@fb.com>,
	David Sterba <dsterba@suse.com>,
	linux-nfs@vger.kernel.org, Wang Yugui <wangyugui@e16-tech.com>,
	Ulli Horlacher <framstag@rus.uni-stuttgart.de>,
	linux-btrfs@vger.kernel.org
Subject: Re: [PATCH/RFC] NFSD: handle BTRFS subvolumes better.
Date: Mon, 19 Jul 2021 11:49:07 -0400	[thread overview]
Message-ID: <20210719154907.GA28482@fieldses.org> (raw)
In-Reply-To: <162638862766.13764.8566962032225976326@noble.neil.brown.name>

On Fri, Jul 16, 2021 at 08:37:07AM +1000, NeilBrown wrote:
> On Fri, 16 Jul 2021, Josef Bacik wrote:
> > On 7/15/21 1:24 PM, Christoph Hellwig wrote:
> > > On Thu, Jul 15, 2021 at 01:11:29PM -0400, Josef Bacik wrote:
> > >> Because there's no alternative.  We need a way to tell userspace they've
> > >> wandered into a different inode namespace.  There's no argument that what
> > >> we're doing is ugly, but there's never been a clear "do X instead".  Just a
> > >> lot of whinging that btrfs is broken.  This makes userspace happy and is
> > >> simple and straightforward.  I'm open to alternatives, but there have been 0
> > >> workable alternatives proposed in the last decade of complaining about it.
> > > 
> > > Make sure we cross a vfsmount when crossing the "st_dev" domain so
> > > that it is properly reported.   Suggested many times and ignored all
> > > the time beause it requires a bit of work.
> > > 
> > 
> > You keep telling me this but forgetting that I did all this work when you 
> > originally suggested it.  The problem I ran into was the automount stuff 
> > requires that we have a completely different superblock for every vfsmount. 
> > This is fine for things like nfs or samba where the automount literally points 
> > to a completely different mount, but doesn't work for btrfs where it's on the 
> > same file system.  If you have 1000 subvolumes and run sync() you're going to 
> > write the superblock 1000 times for the same file system.  You are going to 
> > reclaim inodes on the same file system 1000 times.  You are going to reclaim 
> > dcache on the same filesytem 1000 times.  You are also going to pin 1000 
> > dentries/inodes into memory whenever you wander into these things because the 
> > super is going to hold them open.
> > 
> > This is not a workable solution.  It's not a matter of simply tying into 
> > existing infrastructure, we'd have to completely rework how the VFS deals with 
> > this stuff in order to be reasonable.  And when I brought this up to Al he told 
> > me I was insane and we absolutely had to have a different SB for every vfsmount, 
> > which means we can't use vfsmount for this, which means we don't have any other 
> > options.  Thanks,
> 
> When I was first looking at this, I thought that separate vfsmnts
> and auto-mounting was the way to go "just like NFS".  NFS still shares a
> lot between the multiple superblock - certainly it shares the same
> connection to the server.
> 
> But I dropped the idea when Bruce pointed out that nfsd is not set up to
> export auto-mounted filesystems.

Yes.  I wish it was....  But we'd need some way to look a
not-currently-mounted filesystem by filehandle:

> It needs to be able to find a
> filesystem given a UUID (extracted from a filehandle), and it does this
> by walking through the mount table to find one that matches.  So unless
> all btrfs subvols were mounted all the time (which I wouldn't propose),
> it would need major work to fix.
> 
> NFSv4 describes the fsid as having a "major" and "minor" component.
> We've never treated these as having an important meaning - just extra
> bits to encode uniqueness in.  Maybe we should have used "major" for the
> vfsmnt, and kept "minor" for the subvol.....

So nfsd would use the "major" ID to find the parent export, and then
btrfs would use the "minor" ID to identify the subvolume?

--b.

> The idea for a single vfsmnt exposing multiple inode-name-spaces does
> appeal to me.  The "st_dev" is just part of the name, and already a
> fairly blurry part.  Thanks to bind mounts, multiple mounts can have the
> same st_dev.  I see no intrinsic reason that a single mount should not
> have multiple fsids, provided that a coherent picture is provided to
> userspace which doesn't contain too many surprises.

  parent reply	other threads:[~2021-07-19 16:12 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-13  3:53 any idea about auto export multiple btrfs snapshots? Wang Yugui
2021-06-14 22:50 ` NeilBrown
2021-06-15 15:13   ` Wang Yugui
2021-06-15 15:41     ` Wang Yugui
2021-06-16  5:47     ` Wang Yugui
2021-06-17  3:02     ` NeilBrown
2021-06-17  4:28       ` Wang Yugui
2021-06-18  0:32         ` NeilBrown
2021-06-18  7:26           ` Wang Yugui
2021-06-18 13:34             ` Wang Yugui
2021-06-19  6:47               ` Wang Yugui
2021-06-20 12:27             ` Wang Yugui
2021-06-21  4:52             ` NeilBrown
2021-06-21  5:13               ` NeilBrown
2021-06-21  8:34                 ` Wang Yugui
2021-06-22  1:28                   ` NeilBrown
2021-06-22  3:22                     ` Wang Yugui
2021-06-22  7:14                       ` Wang Yugui
2021-06-23  0:59                         ` NeilBrown
2021-06-23  6:14                           ` Wang Yugui
2021-06-23  6:29                             ` NeilBrown
2021-06-23  9:34                               ` Wang Yugui
2021-06-23 23:38                                 ` NeilBrown
2021-06-23 15:35                           ` J. Bruce Fields
2021-06-23 22:04                             ` NeilBrown
2021-06-23 22:25                               ` J. Bruce Fields
2021-06-23 23:29                                 ` NeilBrown
2021-06-23 23:41                                   ` Frank Filz
2021-06-24  0:01                                   ` J. Bruce Fields
2021-06-24 21:58                               ` Patrick Goetz
2021-06-24 23:27                                 ` NeilBrown
2021-06-21 14:35               ` Frank Filz
2021-06-21 14:55                 ` Wang Yugui
2021-06-21 17:49                   ` Frank Filz
2021-06-21 22:41                     ` Wang Yugui
2021-06-22 17:34                       ` Frank Filz
2021-06-22 22:48                         ` Wang Yugui
2021-06-17  2:15   ` Wang Yugui
     [not found] ` <20210310074620.GA2158@tik.uni-stuttgart.de>
     [not found]   ` <162632387205.13764.6196748476850020429@noble.neil.brown.name>
2021-07-15 14:09     ` [PATCH/RFC] NFSD: handle BTRFS subvolumes better Josef Bacik
2021-07-15 16:45       ` Christoph Hellwig
2021-07-15 17:11         ` Josef Bacik
2021-07-15 17:24           ` Christoph Hellwig
2021-07-15 18:01             ` Josef Bacik
2021-07-15 22:37               ` NeilBrown
2021-07-19 15:40                 ` Josef Bacik
2021-07-19 20:00                   ` J. Bruce Fields
2021-07-19 20:44                     ` Josef Bacik
2021-07-19 23:53                       ` NeilBrown
2021-07-19 15:49                 ` J. Bruce Fields [this message]
2021-07-20  0:02                   ` NeilBrown
2021-07-19  9:16               ` Christoph Hellwig
2021-07-19 23:54                 ` NeilBrown
2021-07-20  6:23                   ` Christoph Hellwig
2021-07-20  7:17                     ` NeilBrown
2021-07-20  8:00                       ` Christoph Hellwig
2021-07-20 23:11                         ` NeilBrown
2021-07-20 22:10               ` J. Bruce Fields
2021-07-15 23:02       ` NeilBrown
2021-07-15 15:45     ` J. Bruce Fields
2021-07-15 23:08       ` NeilBrown

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210719154907.GA28482@fieldses.org \
    --to=bfields@fieldses.org \
    --cc=chuck.lever@oracle.com \
    --cc=clm@fb.com \
    --cc=dsterba@suse.com \
    --cc=framstag@rus.uni-stuttgart.de \
    --cc=hch@infradead.org \
    --cc=josef@toxicpanda.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=neilb@suse.de \
    --cc=wangyugui@e16-tech.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox