From: Josef Bacik <josef@toxicpanda.com>
To: "J. Bruce Fields" <bfields@fieldses.org>
Cc: NeilBrown <neilb@suse.de>, Christoph Hellwig <hch@infradead.org>,
Chuck Lever <chuck.lever@oracle.com>, Chris Mason <clm@fb.com>,
David Sterba <dsterba@suse.com>,
linux-nfs@vger.kernel.org, Wang Yugui <wangyugui@e16-tech.com>,
Ulli Horlacher <framstag@rus.uni-stuttgart.de>,
linux-btrfs@vger.kernel.org
Subject: Re: [PATCH/RFC] NFSD: handle BTRFS subvolumes better.
Date: Mon, 19 Jul 2021 16:44:00 -0400
Message-ID: <56bd8b67-a72c-1946-e877-838d9c0c65bd@toxicpanda.com>
In-Reply-To: <20210719200003.GA32471@fieldses.org>
On 7/19/21 4:00 PM, J. Bruce Fields wrote:
> On Mon, Jul 19, 2021 at 11:40:28AM -0400, Josef Bacik wrote:
>> Ok so setting aside btrfs for the moment, how does NFS deal with
>> exporting a directory that has multiple other file systems under
>> that tree? I assume the same sort of problem doesn't occur, but why
>> is that? Is it because it's a different vfsmount/sb or is there
>> some other magic making this work? Thanks,
>
> There are two main ways an NFS client can look up a file: by name or by
> filehandle. The former's the normal filesystem directory lookup that
> we're used to. If the name refers to a mountpoint, the server can cross
> into the mounted filesystem like anyone else.
>
> It's the lookup by filehandle that's interesting. Typically the
> filehandle includes a UUID and an inode number. The server looks up the
> UUID with some help from mountd, and that gives a superblock that nfsd
> can use for the inode lookup.
>
> As Neil says, mountd does that basically by searching among mounted
> filesystems for one with that uuid.
>
> So if you wanted to be able to handle a uuid for a filesystem that's not
> even mounted yet, you'd need some new mechanism to look up such uuids.
>
> That's something we don't currently support but that we'd need to
> support if BTRFS subvolumes were automounted. (And it might have other
> uses as well.)
>
> But I'm not entirely sure if that answers your question....
>
Right, because in btrfs we handle the filehandles ourselves, properly, via
export_operations, and we encode the subvolume IDs into them to make sure we
can always do the proper lookup.

I suppose the real problem is that NFS is exposing the inode->i_ino to the
client without understanding that it's on a different subvolume.

Our trick of simply allocating an anonymous bdev every time you wander into a
subvolume to get a unique st_dev doesn't help you guys, because you are looking
for mounted file systems.

I'm not concerned about the FH case, because there the handle has already been
crafted by btrfs and we know what to do with it, so it's always going to be
correct.

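As a rough illustration of why the filehandle path is unambiguous, here is a minimal sketch of a handle that carries the subvolume ID alongside the inode number. The layout, the names `FH_FORMAT`, `encode_fh` and `decode_fh`, and the ID values are all hypothetical, chosen only for illustration; this is not the format btrfs actually puts on the wire.

```python
import struct

# Hypothetical layout (an assumption, not the real btrfs handle format):
# 64-bit subvolume ID, 64-bit inode number, 32-bit generation, little-endian.
FH_FORMAT = "<QQI"


def encode_fh(subvol_id: int, ino: int, generation: int) -> bytes:
    """Pack the identifying triple into an opaque handle for the client."""
    return struct.pack(FH_FORMAT, subvol_id, ino, generation)


def decode_fh(fh: bytes) -> tuple:
    """Recover (subvol_id, ino, generation) from a handle the client sent back."""
    return struct.unpack(FH_FORMAT, fh)


# Same inode number in two different subvolumes (illustrative IDs).
fh_root = encode_fh(5, 257, 1)
fh_snap = encode_fh(260, 257, 1)
```

Because the subvolume ID is part of the handle, `fh_root` and `fh_snap` differ even though the inode number is the same, so a lookup by handle can always pick the right subvolume.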
The actual problem is that we can do

	getattr(/file1)
	getattr(/snap/file1)

on the client, and the NFS server just blindly sends i_ino with the same fsid,
because / and /snap have the same fsid.

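The collision can be sketched with a toy model. It assumes a client that treats the pair (fsid, inode number) as a file's identity; the `Attrs` type, the `SERVER_VIEW` table and every numeric value are hypothetical, not taken from nfsd or btrfs.

```python
from typing import NamedTuple


class Attrs(NamedTuple):
    fsid: int    # filesystem identifier the server reports
    fileid: int  # inode number (i_ino) the server reports


# Hypothetical values: /snap is a snapshot of /, so file1 keeps the same
# inode number in both, and the export reports a single fsid for both paths.
EXPORT_FSID = 0x1234
SERVER_VIEW = {
    "/file1": Attrs(EXPORT_FSID, 257),
    "/snap/file1": Attrs(EXPORT_FSID, 257),
}


def getattr_reply(path: str) -> Attrs:
    """What the toy server returns for a getattr on `path`."""
    return SERVER_VIEW[path]


def client_sees_same_file(a: Attrs, b: Attrs) -> bool:
    """The client's identity check: same fsid and same inode number."""
    return a == b
```

In this model, `client_sees_same_file(getattr_reply("/file1"), getattr_reply("/snap/file1"))` is true, although the two paths name different files. Giving /snap its own fsid would make the pairs distinct again.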
Which brings us back to what HCH is complaining about. In his view, if we had a
vfsmount for /snap then you would know that it was a different fs. However, that
would only actually work if we generated a completely different superblock and
thus gave /snap a unique fsid, right?

If we did the automount thing, and the NFS server went down and came back up and
got a getattr(/snap/file1) from a previously generated FH, it would still work,
right? It would come into the export_operations with the format that btrfs is
expecting, and btrfs would be able to do the lookup. This FH lookup would do the
automount magic it needs to, and then NFS would have the fsid it needs,
correct? Thanks,

Josef