public inbox for linux-nfs@vger.kernel.org
From: Josef Bacik <josef@toxicpanda.com>
To: NeilBrown <neilb@suse.de>
Cc: Christoph Hellwig <hch@infradead.org>,
	"J. Bruce Fields" <bfields@fieldses.org>,
	Chuck Lever <chuck.lever@oracle.com>, Chris Mason <clm@fb.com>,
	David Sterba <dsterba@suse.com>,
	linux-nfs@vger.kernel.org, Wang Yugui <wangyugui@e16-tech.com>,
	Ulli Horlacher <framstag@rus.uni-stuttgart.de>,
	linux-btrfs@vger.kernel.org
Subject: Re: [PATCH/RFC] NFSD: handle BTRFS subvolumes better.
Date: Mon, 19 Jul 2021 11:40:28 -0400
Message-ID: <15d0f450-cae5-22bc-eef3-8a973e6dda27@toxicpanda.com>
In-Reply-To: <162638862766.13764.8566962032225976326@noble.neil.brown.name>

On 7/15/21 6:37 PM, NeilBrown wrote:
> On Fri, 16 Jul 2021, Josef Bacik wrote:
>> On 7/15/21 1:24 PM, Christoph Hellwig wrote:
>>> On Thu, Jul 15, 2021 at 01:11:29PM -0400, Josef Bacik wrote:
>>>> Because there's no alternative.  We need a way to tell userspace they've
>>>> wandered into a different inode namespace.  There's no argument that what
>>>> we're doing is ugly, but there's never been a clear "do X instead".  Just a
>>>> lot of whinging that btrfs is broken.  This makes userspace happy and is
>>>> simple and straightforward.  I'm open to alternatives, but there have been 0
>>>> workable alternatives proposed in the last decade of complaining about it.
>>>
>>> Make sure we cross a vfsmount when crossing the "st_dev" domain so
>>> that it is properly reported.   Suggested many times and ignored all
>>> the time because it requires a bit of work.
>>>
>>
>> You keep telling me this but forgetting that I did all this work when you
>> originally suggested it.  The problem I ran into was the automount stuff
>> requires that we have a completely different superblock for every vfsmount.
>> This is fine for things like nfs or samba where the automount literally points
>> to a completely different mount, but doesn't work for btrfs where it's on the
>> same file system.  If you have 1000 subvolumes and run sync() you're going to
>> write the superblock 1000 times for the same file system.  You are going to
>> reclaim inodes on the same file system 1000 times.  You are going to reclaim
>> dcache on the same filesystem 1000 times.  You are also going to pin 1000
>> dentries/inodes into memory whenever you wander into these things because the
>> super is going to hold them open.
>>
>> This is not a workable solution.  It's not a matter of simply tying into
>> existing infrastructure, we'd have to completely rework how the VFS deals with
>> this stuff in order to be reasonable.  And when I brought this up to Al he told
>> me I was insane and we absolutely had to have a different SB for every vfsmount,
>> which means we can't use vfsmount for this, which means we don't have any other
>> options.  Thanks,
> 
> When I was first looking at this, I thought that separate vfsmnts
> and auto-mounting was the way to go "just like NFS".  NFS still shares a
> lot between the multiple superblocks - certainly it shares the same
> connection to the server.
> 
> But I dropped the idea when Bruce pointed out that nfsd is not set up to
> export auto-mounted filesystems.  It needs to be able to find a
> filesystem given a UUID (extracted from a filehandle), and it does this
> by walking through the mount table to find one that matches.  So unless
> all btrfs subvols were mounted all the time (which I wouldn't propose),
> it would need major work to fix.
> 
> NFSv4 describes the fsid as having a "major" and "minor" component.
> We've never treated these as having an important meaning - just extra
> bits to encode uniqueness in.  Maybe we should have used "major" for the
> vfsmnt, and kept "minor" for the subvol.....
> 
> The idea for a single vfsmnt exposing multiple inode-name-spaces does
> appeal to me.  The "st_dev" is just part of the name, and already a
> fairly blurry part.  Thanks to bind mounts, multiple mounts can have the
> same st_dev.  I see no intrinsic reason that a single mount should not
> have multiple fsids, provided that a coherent picture is provided to
> userspace which doesn't contain too many surprises.
> 

Ok so setting aside btrfs for the moment, how does NFS deal with exporting a 
directory that has multiple other file systems under that tree?  I assume the 
same sort of problem doesn't occur, but why is that?  Is it because it's a 
different vfsmount/sb or is there some other magic making this work?  Thanks,

Josef
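
For anyone following along from userspace: the "different inode namespace"
signal discussed above is the st_dev value returned by stat(2).  A minimal
sketch of how a tool might spot such boundaries under a directory tree is
below.  The function names are made up for illustration, and this only shows
what userspace can observe; it says nothing about how nfsd or btrfs should
implement the boundary.

```python
import os


def crosses_device_boundary(parent, child):
    """Return True if child reports a different st_dev than parent."""
    return os.stat(parent).st_dev != os.stat(child).st_dev


def find_boundaries(root):
    """Yield directories under root whose st_dev differs from their parent's.

    Hypothetical helper: on btrfs this would be expected to flag subvolume
    roots, and on any filesystem it flags ordinary mount points.
    """
    stack = [(root, os.lstat(root).st_dev)]
    while stack:
        path, dev = stack.pop()
        try:
            entries = list(os.scandir(path))
        except OSError:
            continue  # unreadable directory, skip it
        for entry in entries:
            if not entry.is_dir(follow_symlinks=False):
                continue
            try:
                child_dev = entry.stat(follow_symlinks=False).st_dev
            except OSError:
                continue
            if child_dev != dev:
                yield entry.path
            # Keep walking, comparing grandchildren against this child.
            stack.append((entry.path, child_dev))
```

Tools such as find -xdev and du -x rely on the same comparison, which is why
an st_dev change without a matching mount point tends to surprise them.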

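The nfsd lookup Neil describes (take a UUID out of the filehandle, walk the
mount table until something matches) can be modelled in a few lines.  This is
a toy model only: MountEntry and find_mount_by_uuid are hypothetical names,
not nfsd code, and the real table is the kernel's mount list, not a Python
list.  It is here to show why a subvolume that is not currently mounted
cannot be resolved by that scheme.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class MountEntry:
    """One row of a toy mount table (hypothetical structure)."""
    mountpoint: str
    fs_uuid: uuid.UUID


def find_mount_by_uuid(mount_table, wanted):
    """Linear walk over the table; return the first match or None."""
    for entry in mount_table:
        if entry.fs_uuid == wanted:
            return entry
    return None


# Example: only the top-level filesystem is mounted.
top = uuid.UUID(int=1)
subvol = uuid.UUID(int=2)   # stands in for an automounted, currently absent subvol
table = [MountEntry("/export", top)]

found = find_mount_by_uuid(table, top)
missing = find_mount_by_uuid(table, subvol)
```

Here found is the /export entry and missing is None, which is the failure
mode that made auto-mounted subvolumes awkward for nfsd unless every subvol
stays mounted.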
