From: Wang Yugui <wangyugui@e16-tech.com>
To: "NeilBrown" <neilb@suse.de>, linux-nfs@vger.kernel.org
Subject: Re: any idea about auto export multiple btrfs snapshots?
Date: Tue, 22 Jun 2021 15:14:07 +0800
Message-ID: <20210622151407.C002.409509F4@e16-tech.com>
In-Reply-To: <20210622112253.DAEE.409509F4@e16-tech.com>
[-- Attachment #1: Type: text/plain, Size: 2210 bytes --]
Hi,
> > >
> > > > > > It seems more fixes are needed.
> > > > >
> > > > > I think the problem is that the submount doesn't appear in /proc/mounts.
> > > > > "nfsd_fh()" in nfs-utils needs to be able to map from the uuid for a
> > > > > filesystem to the mount point. To do this it walks through /proc/mounts
> > > > > checking the uuid of each filesystem. If a filesystem isn't listed
> > > > > there, it obviously fails.
> > > > >
> > > > > I guess you could add code to nfs-utils to do whatever "btrfs subvol
> > > > > list" does to make up for the fact that btrfs doesn't register in
> > > > > /proc/mounts.
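A rough user-space sketch of the lookup described above, to make the failure mode concrete. This is not the nfs-utils code: nfs-utils matches on the filesystem uuid, while this sketch compares st_dev as a stand-in, and find_mount_for_path() is a hypothetical name. It walks a mounts table with getmntent(3) and reports the first entry that sits on the same device as the target path.

```c
#include <assert.h>
#include <mntent.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper, not taken from nfs-utils.
 * Scan the mounts table `mounts_file` for the first entry whose mount
 * point has the same st_dev as `target` (st_dev is an assumption used
 * here in place of the uuid comparison).
 * Returns 1 and fills `out` on a match, 0 if no entry matches,
 * -1 if `target` or the mounts table cannot be read.
 */
static int find_mount_for_path(const char *mounts_file, const char *target,
                               char *out, size_t outlen)
{
    struct stat want, have;
    struct mntent *ent;
    FILE *fp;
    int found = 0;

    assert(out != NULL && outlen > 0);

    if (stat(target, &want) != 0)
        return -1;

    fp = setmntent(mounts_file, "r");
    if (fp == NULL)
        return -1;

    while ((ent = getmntent(fp)) != NULL) {
        /* Skip entries we cannot stat (stale or inaccessible mounts). */
        if (stat(ent->mnt_dir, &have) != 0)
            continue;
        if (have.st_dev == want.st_dev) {
            snprintf(out, outlen, "%s", ent->mnt_dir);
            found = 1;
            break;
        }
    }
    endmntent(fp);
    return found;
}
```

Called as find_mount_for_path("/proc/mounts", path, buf, sizeof(buf)). For a path inside a btrfs subvol that has no entry of its own, I would expect the scan to return 0, since the commit message of the attached patch notes that st_dev differs between subvols.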
> > > >
> > > > Another approach might be to just change svcxdr_encode_fattr3() and
> > > > nfsd4_encode_fattr() in the 'FSIDSOURCE_UUID' case to check if
> > > > dentry->d_inode has a different btrfs volume id to
> > > > exp->ex_path.dentry->d_inode.
> > > > If it does, then mix the volume id into the fsid somehow.
> > > >
> > > > With that, you wouldn't want the first change I suggested.
> > >
> > > This is what I have done. It is based on Linux 5.10.44.
> > >
> > > But it still does not work, so more work is needed.
> > >
> >
> > The following is more what I had in mind. It doesn't quite work and I
> > cannot work out why.
> >
> > If you 'stat' a file inside the subvol, then 'find' will not complete.
> > If you don't, then it will.
> >
> > Doing that 'stat' changes the st_dev number of the main filesystem,
> > which seems really weird.
> > I'm probably missing something obvious. Maybe a more careful analysis
> > of what is changing when will help.
>
> We compared the trace output of crossmnt with that of a btrfs subvol,
> and found that we need to add subvol support to
> follow_down().
>
> A btrfs subvol should be treated as a virtual 'mount point' for nfsd in follow_down().
btrfs subvol crossmnt begins to work, although it is buggy.
Some subvols are crossmnt-ed, some are not yet, and some directories are
wrongly crossmnt-ed.
'stat /nfs/test /nfs/test/sub1' causes btrfs subvol crossmnt to start
happening.
This is the current patch, based on 5.10.44.
At least nfsd_follow_up() is buggy.
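For the FSIDSOURCE_UUID case, the attached patch fills the 16-byte fsid of a subvol from statfs f_fsid: the two 32-bit words in network byte order, followed by two zero words. The byte layout can be checked in user space with the sketch below; pack_subvol_fsid() and SKETCH_UUID_LEN are hypothetical names, and the function only mirrors the encoding, not the surrounding nfsd code.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_UUID_LEN 16 /* same size as EX_UUID_LEN in the patch context */

/*
 * Pack the two f_fsid words into a 16-byte buffer the way the patch
 * encodes them: val0, val1 big-endian, then eight zero bytes.
 */
static void pack_subvol_fsid(int32_t val0, int32_t val1,
                             unsigned char out[SKETCH_UUID_LEN])
{
    uint32_t words[4] = {
        htonl((uint32_t)val0),
        htonl((uint32_t)val1),
        0,
        0,
    };

    assert(out != NULL);
    memcpy(out, words, sizeof(words));
}
```

Two subvols then get different fsid bytes exactly when their f_fsid values differ, which is the property the patch relies on.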
Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2021/06/22
[-- Attachment #2: 0001-nfsd-btrfs-subvol-support.txt --]
[-- Type: application/octet-stream, Size: 6044 bytes --]
From 57e6b3cec9b8ac396b661c190511af80839ddbe5 Mon Sep 17 00:00:00 2001
From: wangyugui <wangyugui@e16-tech.com>
Date: Thu, 17 Jun 2021 08:33:06 +0800
Subject: [PATCH] nfsd: btrfs subvol support
(struct statfs).f_fsid: unique between btrfs subvols
(struct stat).st_dev: unique between btrfs subvols
(struct statx).stx_mnt_id: NOT unique between btrfs subvols, but not yet used in nfs/nfsd
kernel samples/vfs/test-statx.c
stx_rdev_major/stx_rdev_minor seem to be truncated by something
like old_encode_dev()/old_decode_dev()?
TODO: (struct nfs_fattr).fsid
TODO: FSIDSOURCE_FSID in nfs3xdr.c/nfsxdr.c
---
fs/nfsd/nfs3xdr.c | 2 +-
fs/nfsd/nfs4xdr.c | 16 ++++++++++++----
fs/nfsd/nfsd.h | 42 ++++++++++++++++++++++++++++++++++++++++++
fs/nfsd/vfs.c | 10 ++++++++--
4 files changed, 63 insertions(+), 7 deletions(-)
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
index 716566d..0de2953 100644
--- a/fs/nfsd/nfs3xdr.c
+++ b/fs/nfsd/nfs3xdr.c
@@ -877,7 +877,7 @@ compose_entry_fh(struct nfsd3_readdirres *cd, struct svc_fh *fhp,
dchild = lookup_positive_unlocked(name, dparent, namlen);
if (IS_ERR(dchild))
return rv;
- if (d_mountpoint(dchild))
+ if (d_mountpoint(dchild) || unlikely(d_is_btrfs_subvol(dchild)))
goto out;
if (dchild->d_inode->i_ino != ino)
goto out;
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 5f5169b..ee335fc 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -2457,7 +2457,7 @@ static __be32 nfsd4_encode_path(struct xdr_stream *xdr,
if (path_equal(&cur, root))
break;
if (cur.dentry == cur.mnt->mnt_root) {
- if (follow_up(&cur))
+ if (nfsd_follow_up(&cur))
continue;
goto out_free;
}
@@ -2648,7 +2648,7 @@ static int get_parent_attributes(struct svc_export *exp, struct kstat *stat)
int err;
path_get(&path);
- while (follow_up(&path)) {
+ while (nfsd_follow_up(&path)) {
if (path.dentry != path.mnt->mnt_root)
break;
}
@@ -2728,6 +2728,7 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
.dentry = dentry,
};
struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
+ bool is_btrfs_subvol= d_is_btrfs_subvol(dentry);
BUG_ON(bmval1 & NFSD_WRITEONLY_ATTRS_WORD1);
BUG_ON(!nfsd_attrs_supported(minorversion, bmval));
@@ -2744,7 +2745,8 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
if ((bmval0 & (FATTR4_WORD0_FILES_AVAIL | FATTR4_WORD0_FILES_FREE |
FATTR4_WORD0_FILES_TOTAL | FATTR4_WORD0_MAXNAME)) ||
(bmval1 & (FATTR4_WORD1_SPACE_AVAIL | FATTR4_WORD1_SPACE_FREE |
- FATTR4_WORD1_SPACE_TOTAL))) {
+ FATTR4_WORD1_SPACE_TOTAL)) ||
+ unlikely(is_btrfs_subvol)) {
err = vfs_statfs(&path, &statfs);
if (err)
goto out_nfserr;
@@ -2895,7 +2897,13 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
*p++ = cpu_to_be32(MINOR(stat.dev));
break;
case FSIDSOURCE_UUID:
- p = xdr_encode_opaque_fixed(p, exp->ex_uuid,
+ if (unlikely(is_btrfs_subvol)){
+ *p++ = cpu_to_be32(statfs.f_fsid.val[0]);
+ *p++ = cpu_to_be32(statfs.f_fsid.val[1]);
+ *p++ = cpu_to_be32(0);
+ *p++ = cpu_to_be32(0);
+ } else
+ p = xdr_encode_opaque_fixed(p, exp->ex_uuid,
EX_UUID_LEN);
break;
}
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index cb742e1..42e14d6 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -487,4 +487,47 @@ static inline int nfsd4_is_junction(struct dentry *dentry)
#endif /* CONFIG_NFSD_V4 */
+/* btrfs subvol support */
+/*
+ * same logic as fs/btrfs is_subvolume_inode(struct inode *inode)
+ * #define BTRFS_FIRST_FREE_OBJECTID 256ULL
+ * #define BTRFS_SUPER_MAGIC 0x9123683E
+ */
+static inline bool d_is_btrfs_subvol(const struct dentry *dentry)
+{
+ bool ret = dentry->d_inode && unlikely(dentry->d_inode->i_ino == 256ULL) &&
+ dentry->d_sb && dentry->d_sb->s_magic == BTRFS_SUPER_MAGIC;
+ //printk(KERN_INFO "d_is_btrfs_subvol(%s)=%d\n", dentry->d_name.name, ret);
+ return ret;
+}
+#include <linux/namei.h>
+/* add btrfs subvol support that only used in nfsd */
+/* FIXME: free clone_private_mount()? */
+static inline int nfsd_follow_down(struct path *path)
+{
+ if(unlikely(d_is_btrfs_subvol(path->dentry))){
+ //struct dentry *mnt_root=path->dentry;
+ struct vfsmount *mounted = clone_private_mount(path);
+ if (!IS_ERR(mounted)) {
+ //mounted->mnt_root=mnt_root;
+ //? dput(path->dentry);
+ //? mntput(path->mnt);
+ path->mnt = mounted;
+ path->dentry = dget(mounted->mnt_root);
+ return 0;
+ }
+ }
+ return follow_down(path);
+}
+/* add btrfs subvol support that only used in nfsd */
+/* FIXME: free clone_private_mount()? */
+static inline int nfsd_follow_up(struct path *path)
+{
+ printk(KERN_INFO "nfsd_follow_up(%s)\n", path->dentry->d_name.name);
+ if(unlikely(d_is_btrfs_subvol(path->dentry))){
+ return 0;
+ }
+ return follow_up(path);
+}
+
#endif /* LINUX_NFSD_NFSD_H */
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 1ecacee..3ab9b7f 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -65,9 +65,13 @@ nfsd_cross_mnt(struct svc_rqst *rqstp, struct dentry **dpp,
.dentry = dget(dentry)};
int err = 0;
- err = follow_down(&path);
+ err = nfsd_follow_down(&path);
if (err < 0)
goto out;
+ if (unlikely(d_is_btrfs_subvol(dentry))){
+ path_put(&path);
+ goto out;
+ } else
if (path.mnt == exp->ex_path.mnt && path.dentry == dentry &&
nfsd_mountpoint(dentry, exp) == 2) {
/* This is only a mountpoint in some other namespace */
@@ -114,7 +118,7 @@ static void follow_to_parent(struct path *path)
{
struct dentry *dp;
- while (path->dentry == path->mnt->mnt_root && follow_up(path))
+ while (path->dentry == path->mnt->mnt_root && nfsd_follow_up(path))
;
dp = dget_parent(path->dentry);
dput(path->dentry);
@@ -160,6 +164,8 @@ int nfsd_mountpoint(struct dentry *dentry, struct svc_export *exp)
return 1;
if (nfsd4_is_junction(dentry))
return 1;
+ if (d_is_btrfs_subvol(dentry))
+ return 1;
if (d_mountpoint(dentry))
/*
* Might only be a mountpoint in a different namespace,
--
2.30.2
[-- Attachment #3: 0002-trace-nfsd-btrfs-subvol-support.txt --]
[-- Type: application/octet-stream, Size: 5897 bytes --]
From 639489a60b84f9d16955143f52fc6316205ac57a Mon Sep 17 00:00:00 2001
From: wangyugui <wangyugui@e16-tech.com>
Date: Thu, 17 Jun 2021 08:33:06 +0800
Subject: [PATCH] trace nfsd: btrfs subvol support
[ 235.831136] set_version_and_fsid_type fsid_type=7
[ 235.842483] nfsd_cross_mnt(test)=0
[ 235.845882] nfsd: nfsd_lookup(fh 28: 00070001 00440001 00000000 73fb4b0a 31596b2e 7be9789b, test)=/
[ 235.854902] set_version_and_fsid_type fsid_type=6
[ 235.859686] nfs_d_automount(test)
[ 235.863069] nfsd_cross_mnt(test)=0
[ 235.866478] nfsd: nfsd_lookup(fh 28: 00070001 00440001 00000000 73fb4b0a 31596b2e 7be9789b, test)=/
[ 235.875500] set_version_and_fsid_type fsid_type=6
[ 239.204677] lookup_positive_unlocked(name=xfs2) dentry=xfs2
[ 239.210311] nfsd_cross_mnt(xfs2)=0
[ 239.213708] set_version_and_fsid_type fsid_type=6
[ 239.218406] nfsd4_encode_dirent_fattr(/) FATTR4_WORD0_FSID=1 FATTR4_WORD1_MOUNTED_ON_FILEID=1
why /?
[ 239.227078] nfs_d_automount(xfs2)
why?
[ 239.230437] nfsd_cross_mnt(xfs2)=0
[ 239.233838] nfsd: nfsd_lookup(fh 20: 00060001 2b031f7d c249fdd0 1aa84b8e 045d774a 00000000, xfs2)=/
[ 239.242854] set_version_and_fsid_type fsid_type=6
[ 373.332124] set_version_and_fsid_type fsid_type=7
[ 373.337639] nfsd_cross_mnt(test)=0
[ 373.341035] nfsd: nfsd_lookup(fh 28: 00070001 00440001 00000000 73fb4b0a 31596b2e 7be9789b, test)=/
[ 373.350047] set_version_and_fsid_type fsid_type=6
[ 373.354781] nfs_d_automount(test)
[ 373.358125] nfsd_cross_mnt(test)=0
[ 373.361524] nfsd: nfsd_lookup(fh 28: 00070001 00440001 00000000 73fb4b0a 31596b2e 7be9789b, test)=/
[ 373.370537] set_version_and_fsid_type fsid_type=6
[ 377.521908] lookup_positive_unlocked(name=sub1) dentry=sub1
[ 377.527477] nfsd_cross_mnt(sub1)=0
[ 377.530879] set_version_and_fsid_type fsid_type=6
[ 377.535572] nfsd4_encode_dirent_fattr(sub1) FATTR4_WORD0_FSID=1 FATTR4_WORD1_MOUNTED_ON_FILEID=1
[ 377.544420] lookup_positive_unlocked(name=.snapshot) dentry=.snapshot
btrfs subvols => force crossmnt
subvol nfs/umount: OS shutdown or manual nfs/umount?
special status (BTRFS_LAST_FREE_OBJECTID, only returned to nfs)?
#define BTRFS_LAST_FREE_OBJECTID -256ULL
(struct file )->(struct inode *f_inode)->(struct super_block *i_sb;)->(unsigned long s_magic)
btrfs->xfs =>still need crossmnt
xfs->btrfs =>still need crossmnt
NFSEXP_CROSSMOUNT
NFSD_JUNCTION_XATTR_NAME
AT_NO_AUTOMOUNT
NFS_ATTR_FATTR_MOUNTPOINT
S_AUTOMOUNT
---
fs/nfs/dir.c | 2 ++
fs/nfs/namespace.c | 1 +
fs/nfsd/nfs4xdr.c | 5 +++++
fs/nfsd/nfsfh.c | 1 +
fs/nfsd/vfs.c | 5 +++++
5 files changed, 14 insertions(+)
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index c837675..975440d 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1799,6 +1799,8 @@ nfs4_do_lookup_revalidate(struct inode *dir, struct dentry *dentry,
if (!(flags & LOOKUP_OPEN) || (flags & LOOKUP_DIRECTORY))
goto full_reval;
+ if (dentry->d_inode && dentry->d_inode->i_ino == 256ULL && dentry->d_sb)
+ printk(KERN_INFO "nfs4_do_lookup_revalidate(%s)=%lx\n", dentry->d_name.name, dentry->d_sb->s_magic);
if (d_mountpoint(dentry))
goto full_reval;
diff --git a/fs/nfs/namespace.c b/fs/nfs/namespace.c
index 2bcbe38..f69715c 100644
--- a/fs/nfs/namespace.c
+++ b/fs/nfs/namespace.c
@@ -152,6 +152,7 @@ struct vfsmount *nfs_d_automount(struct path *path)
int timeout = READ_ONCE(nfs_mountpoint_expiry_timeout);
int ret;
+ printk(KERN_INFO "nfs_d_automount(%s)\n", path->dentry->d_name.name);
if (IS_ROOT(path->dentry))
return ERR_PTR(-ESTALE);
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 6255b06..257ee17 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3307,6 +3307,7 @@ nfsd4_encode_dirent_fattr(struct xdr_stream *xdr, struct nfsd4_readdir *cd,
dentry = lookup_positive_unlocked(name, cd->rd_fhp->fh_dentry, namlen);
if (IS_ERR(dentry))
return nfserrno(PTR_ERR(dentry));
+ printk(KERN_INFO "lookup_positive_unlocked(name=%s) dentry=%s\n", name, dentry->d_name.name);
exp_get(exp);
/*
@@ -3345,6 +3346,10 @@ nfsd4_encode_dirent_fattr(struct xdr_stream *xdr, struct nfsd4_readdir *cd,
out_put:
+ printk(KERN_INFO "nfsd4_encode_dirent_fattr(%s) FATTR4_WORD0_FSID=%d FATTR4_WORD1_MOUNTED_ON_FILEID=%d\n",
+ dentry->d_name.name,
+ !!(cd->rd_bmval[0]&FATTR4_WORD0_FSID),
+ !!(cd->rd_bmval[1]&FATTR4_WORD1_MOUNTED_ON_FILEID));
dput(dentry);
exp_put(exp);
return nfserr;
}
diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
index c81dbba..28eaea3 100644
--- a/fs/nfsd/nfsfh.c
+++ b/fs/nfsd/nfsfh.c
@@ -530,6 +530,7 @@ static void set_version_and_fsid_type(struct svc_fh *fhp, struct svc_export *exp
fhp->fh_handle.fh_version = version;
if (version)
fhp->fh_handle.fh_fsid_type = fsid_type;
+ printk(KERN_INFO "set_version_and_fsid_type fsid_type=%d\n", fsid_type);
}
__be32
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index ae34ffc..6c55010 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -66,6 +66,8 @@ nfsd_cross_mnt(struct svc_rqst *rqstp, struct dentry **dpp,
int err = 0;
err = nfsd_follow_down(&path);
+ printk(KERN_INFO "follow_down()=%d path.mnt=%s path.dentry=%s\n", err,
+ path.mnt->mnt_root->d_name.name, path.dentry->d_name.name);
if (err < 0)
goto out;
if (unlikely(d_is_btrfs_subvol(dentry))){
@@ -111,6 +113,7 @@ nfsd_cross_mnt(struct svc_rqst *rqstp, struct dentry **dpp,
path_put(&path);
exp_put(exp2);
out:
+ printk(KERN_INFO "nfsd_cross_mnt(%s)=%d\n", dentry->d_name.name, err);
return err;
}
@@ -233,9 +236,11 @@ nfsd_lookup_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp,
}
*dentry_ret = dentry;
*exp_ret = exp;
+ // printk(KERN_INFO "nfsd: nfsd_lookup(fh %s, %.*s)=%s\n", SVCFH_fmt(fhp), len, name, dentry->d_name.name);
return 0;
out_nfserr:
+ // printk(KERN_INFO "nfsd: nfsd_lookup(fh %s, %.*s) error\n", SVCFH_fmt(fhp), len, name);
exp_put(exp);
return nfserrno(host_err);
}
--
2.30.2