From: "Dilger, Andreas" <andreas.dilger-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
To: Jeff Layton <jlayton-vpEMnDpepFuMZCB2o+C8xQ@public.gmane.org>,
"bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org"
<bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
Cc: "cluster-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org"
<cluster-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
"linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
"fuse-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org"
<fuse-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org>,
"linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
"HPDD-discuss-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org"
<HPDD-discuss-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org>,
"linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
"ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
"ocfs2-devel-N0ozoZBvEnrZJqsBc5GL+g@public.gmane.org"
<ocfs2-devel-N0ozoZBvEnrZJqsBc5GL+g@public.gmane.org>
Subject: Re: [HPDD-discuss] [PATCH] nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations
Date: Sat, 12 Sep 2015 04:41:33 +0000 [thread overview]
Message-ID: <D21906EE.10A741%andreas.dilger@intel.com> (raw)
In-Reply-To: <1441966830-5517-1-git-send-email-jeff.layton-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>
On 2015/09/11, 4:20 AM, "HPDD-discuss on behalf of Jeff Layton"
<hpdd-discuss-bounces-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org on behalf of jlayton-vpEMnDpepFuMZCB2o+C8xQ@public.gmane.org>
wrote:
>With NFSv3 nfsd will always attempt to send along WCC data to the
>client. This generally involves saving off the in-core inode information
>prior to doing the operation on the given filehandle, and then issuing a
>vfs_getattr to it after the op.
>
>Some filesystems (particularly clustered or networked ones) have an
>expensive ->getattr inode operation. Atomicitiy is also often difficult
>or impossible to guarantee on such filesystems. For those, we're best
>off not trying to provide WCC information to the client at all, and to
>simply allow it to poll for that information as needed with a GETATTR
>RPC.
>
>This patch adds a new flags field to struct export_operations, and
>defines a new EXPORT_OP_NOWCC flag that filesystems can use to indicate
>that nfsd should not attempt to provide WCC info in NFSv3 replies. It
>also adds a blurb about the new flags field and flag to the exporting
>documentation.
>
>The server will also now skip collecting this information for NFSv2 as
>well, since that info is never used there anyway.
>
>Note that this patch does not add this flag to any filesystem
>export_operations structures. This was originally developed to allow
>reexporting nfs via nfsd. That code is not (and may never be) suitable
>for merging into mainline.
>
>Other filesystems may want to consider enabling this flag too. It's hard
>to tell however which ones have export operations to enable export via
>knfsd and which ones mostly rely on them for open-by-filehandle support,
>so I'm leaving that up to the individual maintainers to decide. I am
>cc'ing the relevant lists for those filesystems that I think may want to
>consider adding this though.
>
>Cc: HPDD-discuss-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org
>Cc: ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>Cc: cluster-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
>Cc: fuse-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
>Cc: ocfs2-devel-N0ozoZBvEnrZJqsBc5GL+g@public.gmane.org
>Signed-off-by: Jeff Layton <jeff.layton-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>
>---
> Documentation/filesystems/nfs/Exporting | 27 +++++++++++++++++++++++++++
> fs/nfsd/nfs3xdr.c | 5 ++++-
> fs/nfsd/nfsfh.c | 14 ++++++++++++++
> fs/nfsd/nfsfh.h | 5 ++++-
> include/linux/exportfs.h | 2 ++
> 5 files changed, 51 insertions(+), 2 deletions(-)
>
>diff --git a/Documentation/filesystems/nfs/Exporting
>b/Documentation/filesystems/nfs/Exporting
>index 520a4becb75c..fa636cde3907 100644
>--- a/Documentation/filesystems/nfs/Exporting
>+++ b/Documentation/filesystems/nfs/Exporting
>@@ -138,6 +138,11 @@ struct which has the following members:
> to find potential names, and matches inode numbers to find the
>correct
> match.
>
>+ flags
>+ Some filesystems may need to be handled differently than others. The
>+ export_operations struct also includes a flags field that allows the
>+ filesystem to communicate such information to nfsd. See the Export
>+ Operations Flags section below for more explanation.
>
> A filehandle fragment consists of an array of 1 or more 4byte words,
> together with a one byte "type".
>@@ -147,3 +152,25 @@ generated by encode_fh, in which case it will have
>been padded with
> nuls. Rather, the encode_fh routine should choose a "type" which
> indicates the decode_fh how much of the filehandle is valid, and how
> it should be interpreted.
>+
>+Export Operations Flags
>+-----------------------
>+In addition to the operation vector pointers, struct export_operations
>also
>+contains a "flags" field that allows the filesystem to communicate to
>nfsd
>+that it may want to do things differently when dealing with it. The
>+following flags are defined:
>+
>+ EXPORT_OP_NOWCC
>+ RFC 1813 recommends that servers always send weak cache consistency
>+ (WCC) data to the client after each operation. The server should
>+ atomically collect attributes about the inode, do an operation on it,
>+ and then collect the attributes afterward. This allows the client to
>+ skip issuing GETATTRs in some situations but means that the server
>+ is calling vfs_getattr for almost all RPCs. On some filesystems
>+ (particularly those that are clustered or networked) this is
>expensive
>+ and atomicity is difficult to guarantee. This flag indicates to nfsd
>+ that it should skip providing WCC attributes to the client in NFSv3
>+ replies when doing operations on this filesystem. Consider enabling
>+ this on filesystems that have an expensive ->getattr inode operation,
>+ or when atomicity between pre and post operation attribute collection
>+ is impossible to guarantee.
>diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
>index 01dcd494f781..c30c8c604e2a 100644
>--- a/fs/nfsd/nfs3xdr.c
>+++ b/fs/nfsd/nfs3xdr.c
>@@ -203,7 +203,7 @@ static __be32 *
> encode_post_op_attr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh
>*fhp)
> {
> struct dentry *dentry = fhp->fh_dentry;
>- if (dentry && d_really_is_positive(dentry)) {
>+ if (!fhp->fh_no_wcc && dentry && d_really_is_positive(dentry)) {
> __be32 err;
> struct kstat stat;
>
>@@ -256,6 +256,9 @@ void fill_post_wcc(struct svc_fh *fhp)
> {
> __be32 err;
>
>+ if (fhp->fh_no_wcc)
>+ return;
>+
> if (fhp->fh_post_saved)
> printk("nfsd: inode locked twice during operation.\n");
>
>diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
>index 350041a40fe5..29ae37f62b9b 100644
>--- a/fs/nfsd/nfsfh.c
>+++ b/fs/nfsd/nfsfh.c
>@@ -267,6 +267,16 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst
>*rqstp, struct svc_fh *fhp)
>
> fhp->fh_dentry = dentry;
> fhp->fh_export = exp;
>+
>+ switch (rqstp->rq_vers) {
>+ case 3:
>+ if (!(dentry->d_sb->s_export_op->flags & EXPORT_OP_NOWCC))
>+ break;
>+ /* Fallthrough */
>+ case 2:
>+ fhp->fh_no_wcc = true;
>+ }
>+
> return 0;
> out:
> exp_put(exp);
>@@ -535,6 +545,9 @@ fh_compose(struct svc_fh *fhp, struct svc_export
>*exp, struct dentry *dentry,
> */
> set_version_and_fsid_type(fhp, exp, ref_fh);
>
>+ /* If we have a ref_fh, then copy the fh_no_wcc setting from it. */
>+ fhp->fh_no_wcc = ref_fh ? ref_fh->fh_no_wcc : false;
>+
> if (ref_fh == fhp)
> fh_put(ref_fh);
>
>@@ -641,6 +654,7 @@ fh_put(struct svc_fh *fhp)
> exp_put(exp);
> fhp->fh_export = NULL;
> }
>+ fhp->fh_no_wcc = false;
> return;
> }
>
>diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
>index 1e90dad4926b..9ddead4d98f8 100644
>--- a/fs/nfsd/nfsfh.h
>+++ b/fs/nfsd/nfsfh.h
>@@ -32,6 +32,7 @@ typedef struct svc_fh {
>
> unsigned char fh_locked; /* inode locked by us */
> unsigned char fh_want_write; /* remount protection taken */
>+ bool fh_no_wcc; /* no wcc data needed */
This increases the size of svc_fh because it splits the four unsigned
chars.
You could change all of these (fh_locked, fh_want_write,
fh_{pre,post}saved)
to be bools to avoid that and make it more clear they are only used as
booleans (I verified that they all are only assigned 0 or 1).
Cheers, Andreas
>
> #ifdef CONFIG_NFSD_V3
> unsigned char fh_post_saved; /* post-op attrs saved */
>@@ -51,7 +52,6 @@ typedef struct svc_fh {
> struct kstat fh_post_attr; /* full attrs after operation */
> u64 fh_post_change; /* nfsv4 change; see above */
> #endif /* CONFIG_NFSD_V3 */
>-
> } svc_fh;
>
> enum nfsd_fsid {
>@@ -225,6 +225,9 @@ fill_pre_wcc(struct svc_fh *fhp)
> {
> struct inode *inode;
>
>+ if (fhp->fh_no_wcc)
>+ return;
>+
> inode = d_inode(fhp->fh_dentry);
> if (!fhp->fh_pre_saved) {
> fhp->fh_pre_mtime = inode->i_mtime;
>diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
>index fa05e04c5531..600c3fccc999 100644
>--- a/include/linux/exportfs.h
>+++ b/include/linux/exportfs.h
>@@ -214,6 +214,8 @@ struct export_operations {
> bool write, u32 *device_generation);
> int (*commit_blocks)(struct inode *inode, struct iomap *iomaps,
> int nr_iomaps, struct iattr *iattr);
>+#define EXPORT_OP_NOWCC (0x1) /* Don't collect wcc data for NFSv3
>replies */
>+ unsigned long flags;
> };
>
> extern int exportfs_encode_inode_fh(struct inode *inode, struct fid *fid,
>--
>2.4.3
>
>_______________________________________________
>HPDD-discuss mailing list
>HPDD-discuss-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org
>https://lists.01.org/mailman/listinfo/hpdd-discuss
>
Cheers, Andreas
--
Andreas Dilger
Lustre Software Architect
Intel High Performance Data Division
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
WARNING: multiple messages have this Message-ID (diff)
From: Dilger, Andreas <andreas.dilger@intel.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [HPDD-discuss] [PATCH] nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations
Date: Sat, 12 Sep 2015 04:41:33 +0000 [thread overview]
Message-ID: <D21906EE.10A741%andreas.dilger@intel.com> (raw)
In-Reply-To: <1441966830-5517-1-git-send-email-jeff.layton@primarydata.com>
On 2015/09/11, 4:20 AM, "HPDD-discuss on behalf of Jeff Layton"
<hpdd-discuss-bounces at lists.01.org on behalf of jlayton@poochiereds.net>
wrote:
>With NFSv3 nfsd will always attempt to send along WCC data to the
>client. This generally involves saving off the in-core inode information
>prior to doing the operation on the given filehandle, and then issuing a
>vfs_getattr to it after the op.
>
>Some filesystems (particularly clustered or networked ones) have an
>expensive ->getattr inode operation. Atomicitiy is also often difficult
>or impossible to guarantee on such filesystems. For those, we're best
>off not trying to provide WCC information to the client at all, and to
>simply allow it to poll for that information as needed with a GETATTR
>RPC.
>
>This patch adds a new flags field to struct export_operations, and
>defines a new EXPORT_OP_NOWCC flag that filesystems can use to indicate
>that nfsd should not attempt to provide WCC info in NFSv3 replies. It
>also adds a blurb about the new flags field and flag to the exporting
>documentation.
>
>The server will also now skip collecting this information for NFSv2 as
>well, since that info is never used there anyway.
>
>Note that this patch does not add this flag to any filesystem
>export_operations structures. This was originally developed to allow
>reexporting nfs via nfsd. That code is not (and may never be) suitable
>for merging into mainline.
>
>Other filesystems may want to consider enabling this flag too. It's hard
>to tell however which ones have export operations to enable export via
>knfsd and which ones mostly rely on them for open-by-filehandle support,
>so I'm leaving that up to the individual maintainers to decide. I am
>cc'ing the relevant lists for those filesystems that I think may want to
>consider adding this though.
>
>Cc: HPDD-discuss at lists.01.org
>Cc: ceph-devel at vger.kernel.org
>Cc: cluster-devel at redhat.com
>Cc: fuse-devel at lists.sourceforge.net
>Cc: ocfs2-devel at oss.oracle.com
>Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
>---
> Documentation/filesystems/nfs/Exporting | 27 +++++++++++++++++++++++++++
> fs/nfsd/nfs3xdr.c | 5 ++++-
> fs/nfsd/nfsfh.c | 14 ++++++++++++++
> fs/nfsd/nfsfh.h | 5 ++++-
> include/linux/exportfs.h | 2 ++
> 5 files changed, 51 insertions(+), 2 deletions(-)
>
>diff --git a/Documentation/filesystems/nfs/Exporting
>b/Documentation/filesystems/nfs/Exporting
>index 520a4becb75c..fa636cde3907 100644
>--- a/Documentation/filesystems/nfs/Exporting
>+++ b/Documentation/filesystems/nfs/Exporting
>@@ -138,6 +138,11 @@ struct which has the following members:
> to find potential names, and matches inode numbers to find the
>correct
> match.
>
>+ flags
>+ Some filesystems may need to be handled differently than others. The
>+ export_operations struct also includes a flags field that allows the
>+ filesystem to communicate such information to nfsd. See the Export
>+ Operations Flags section below for more explanation.
>
> A filehandle fragment consists of an array of 1 or more 4byte words,
> together with a one byte "type".
>@@ -147,3 +152,25 @@ generated by encode_fh, in which case it will have
>been padded with
> nuls. Rather, the encode_fh routine should choose a "type" which
> indicates the decode_fh how much of the filehandle is valid, and how
> it should be interpreted.
>+
>+Export Operations Flags
>+-----------------------
>+In addition to the operation vector pointers, struct export_operations
>also
>+contains a "flags" field that allows the filesystem to communicate to
>nfsd
>+that it may want to do things differently when dealing with it. The
>+following flags are defined:
>+
>+ EXPORT_OP_NOWCC
>+ RFC 1813 recommends that servers always send weak cache consistency
>+ (WCC) data to the client after each operation. The server should
>+ atomically collect attributes about the inode, do an operation on it,
>+ and then collect the attributes afterward. This allows the client to
>+ skip issuing GETATTRs in some situations but means that the server
>+ is calling vfs_getattr for almost all RPCs. On some filesystems
>+ (particularly those that are clustered or networked) this is
>expensive
>+ and atomicity is difficult to guarantee. This flag indicates to nfsd
>+ that it should skip providing WCC attributes to the client in NFSv3
>+ replies when doing operations on this filesystem. Consider enabling
>+ this on filesystems that have an expensive ->getattr inode operation,
>+ or when atomicity between pre and post operation attribute collection
>+ is impossible to guarantee.
>diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
>index 01dcd494f781..c30c8c604e2a 100644
>--- a/fs/nfsd/nfs3xdr.c
>+++ b/fs/nfsd/nfs3xdr.c
>@@ -203,7 +203,7 @@ static __be32 *
> encode_post_op_attr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh
>*fhp)
> {
> struct dentry *dentry = fhp->fh_dentry;
>- if (dentry && d_really_is_positive(dentry)) {
>+ if (!fhp->fh_no_wcc && dentry && d_really_is_positive(dentry)) {
> __be32 err;
> struct kstat stat;
>
>@@ -256,6 +256,9 @@ void fill_post_wcc(struct svc_fh *fhp)
> {
> __be32 err;
>
>+ if (fhp->fh_no_wcc)
>+ return;
>+
> if (fhp->fh_post_saved)
> printk("nfsd: inode locked twice during operation.\n");
>
>diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
>index 350041a40fe5..29ae37f62b9b 100644
>--- a/fs/nfsd/nfsfh.c
>+++ b/fs/nfsd/nfsfh.c
>@@ -267,6 +267,16 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst
>*rqstp, struct svc_fh *fhp)
>
> fhp->fh_dentry = dentry;
> fhp->fh_export = exp;
>+
>+ switch (rqstp->rq_vers) {
>+ case 3:
>+ if (!(dentry->d_sb->s_export_op->flags & EXPORT_OP_NOWCC))
>+ break;
>+ /* Fallthrough */
>+ case 2:
>+ fhp->fh_no_wcc = true;
>+ }
>+
> return 0;
> out:
> exp_put(exp);
>@@ -535,6 +545,9 @@ fh_compose(struct svc_fh *fhp, struct svc_export
>*exp, struct dentry *dentry,
> */
> set_version_and_fsid_type(fhp, exp, ref_fh);
>
>+ /* If we have a ref_fh, then copy the fh_no_wcc setting from it. */
>+ fhp->fh_no_wcc = ref_fh ? ref_fh->fh_no_wcc : false;
>+
> if (ref_fh == fhp)
> fh_put(ref_fh);
>
>@@ -641,6 +654,7 @@ fh_put(struct svc_fh *fhp)
> exp_put(exp);
> fhp->fh_export = NULL;
> }
>+ fhp->fh_no_wcc = false;
> return;
> }
>
>diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
>index 1e90dad4926b..9ddead4d98f8 100644
>--- a/fs/nfsd/nfsfh.h
>+++ b/fs/nfsd/nfsfh.h
>@@ -32,6 +32,7 @@ typedef struct svc_fh {
>
> unsigned char fh_locked; /* inode locked by us */
> unsigned char fh_want_write; /* remount protection taken */
>+ bool fh_no_wcc; /* no wcc data needed */
This increases the size of svc_fh because it splits the four unsigned
chars.
You could change all of these (fh_locked, fh_want_write,
fh_{pre,post}saved)
to be bools to avoid that and make it more clear they are only used as
booleans (I verified that they all are only assigned 0 or 1).
Cheers, Andreas
>
> #ifdef CONFIG_NFSD_V3
> unsigned char fh_post_saved; /* post-op attrs saved */
>@@ -51,7 +52,6 @@ typedef struct svc_fh {
> struct kstat fh_post_attr; /* full attrs after operation */
> u64 fh_post_change; /* nfsv4 change; see above */
> #endif /* CONFIG_NFSD_V3 */
>-
> } svc_fh;
>
> enum nfsd_fsid {
>@@ -225,6 +225,9 @@ fill_pre_wcc(struct svc_fh *fhp)
> {
> struct inode *inode;
>
>+ if (fhp->fh_no_wcc)
>+ return;
>+
> inode = d_inode(fhp->fh_dentry);
> if (!fhp->fh_pre_saved) {
> fhp->fh_pre_mtime = inode->i_mtime;
>diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
>index fa05e04c5531..600c3fccc999 100644
>--- a/include/linux/exportfs.h
>+++ b/include/linux/exportfs.h
>@@ -214,6 +214,8 @@ struct export_operations {
> bool write, u32 *device_generation);
> int (*commit_blocks)(struct inode *inode, struct iomap *iomaps,
> int nr_iomaps, struct iattr *iattr);
>+#define EXPORT_OP_NOWCC (0x1) /* Don't collect wcc data for NFSv3
>replies */
>+ unsigned long flags;
> };
>
> extern int exportfs_encode_inode_fh(struct inode *inode, struct fid *fid,
>--
>2.4.3
>
>_______________________________________________
>HPDD-discuss mailing list
>HPDD-discuss at lists.01.org
>https://lists.01.org/mailman/listinfo/hpdd-discuss
>
Cheers, Andreas
--
Andreas Dilger
Lustre Software Architect
Intel High Performance Data Division
WARNING: multiple messages have this Message-ID (diff)
From: "Dilger, Andreas" <andreas.dilger@intel.com>
To: Jeff Layton <jlayton@poochiereds.net>,
"bfields@fieldses.org" <bfields@fieldses.org>
Cc: "cluster-devel@redhat.com" <cluster-devel@redhat.com>,
"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
"fuse-devel@lists.sourceforge.net"
<fuse-devel@lists.sourceforge.net>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"HPDD-discuss@lists.01.org" <HPDD-discuss@ml01.01.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>,
"ocfs2-devel@oss.oracle.com" <ocfs2-devel@oss.oracle.com>
Subject: Re: [HPDD-discuss] [PATCH] nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations
Date: Sat, 12 Sep 2015 04:41:33 +0000 [thread overview]
Message-ID: <D21906EE.10A741%andreas.dilger@intel.com> (raw)
In-Reply-To: <1441966830-5517-1-git-send-email-jeff.layton@primarydata.com>
On 2015/09/11, 4:20 AM, "HPDD-discuss on behalf of Jeff Layton"
<hpdd-discuss-bounces@lists.01.org on behalf of jlayton@poochiereds.net>
wrote:
>With NFSv3 nfsd will always attempt to send along WCC data to the
>client. This generally involves saving off the in-core inode information
>prior to doing the operation on the given filehandle, and then issuing a
>vfs_getattr to it after the op.
>
>Some filesystems (particularly clustered or networked ones) have an
>expensive ->getattr inode operation. Atomicitiy is also often difficult
>or impossible to guarantee on such filesystems. For those, we're best
>off not trying to provide WCC information to the client at all, and to
>simply allow it to poll for that information as needed with a GETATTR
>RPC.
>
>This patch adds a new flags field to struct export_operations, and
>defines a new EXPORT_OP_NOWCC flag that filesystems can use to indicate
>that nfsd should not attempt to provide WCC info in NFSv3 replies. It
>also adds a blurb about the new flags field and flag to the exporting
>documentation.
>
>The server will also now skip collecting this information for NFSv2 as
>well, since that info is never used there anyway.
>
>Note that this patch does not add this flag to any filesystem
>export_operations structures. This was originally developed to allow
>reexporting nfs via nfsd. That code is not (and may never be) suitable
>for merging into mainline.
>
>Other filesystems may want to consider enabling this flag too. It's hard
>to tell however which ones have export operations to enable export via
>knfsd and which ones mostly rely on them for open-by-filehandle support,
>so I'm leaving that up to the individual maintainers to decide. I am
>cc'ing the relevant lists for those filesystems that I think may want to
>consider adding this though.
>
>Cc: HPDD-discuss@lists.01.org
>Cc: ceph-devel@vger.kernel.org
>Cc: cluster-devel@redhat.com
>Cc: fuse-devel@lists.sourceforge.net
>Cc: ocfs2-devel@oss.oracle.com
>Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
>---
> Documentation/filesystems/nfs/Exporting | 27 +++++++++++++++++++++++++++
> fs/nfsd/nfs3xdr.c | 5 ++++-
> fs/nfsd/nfsfh.c | 14 ++++++++++++++
> fs/nfsd/nfsfh.h | 5 ++++-
> include/linux/exportfs.h | 2 ++
> 5 files changed, 51 insertions(+), 2 deletions(-)
>
>diff --git a/Documentation/filesystems/nfs/Exporting
>b/Documentation/filesystems/nfs/Exporting
>index 520a4becb75c..fa636cde3907 100644
>--- a/Documentation/filesystems/nfs/Exporting
>+++ b/Documentation/filesystems/nfs/Exporting
>@@ -138,6 +138,11 @@ struct which has the following members:
> to find potential names, and matches inode numbers to find the
>correct
> match.
>
>+ flags
>+ Some filesystems may need to be handled differently than others. The
>+ export_operations struct also includes a flags field that allows the
>+ filesystem to communicate such information to nfsd. See the Export
>+ Operations Flags section below for more explanation.
>
> A filehandle fragment consists of an array of 1 or more 4byte words,
> together with a one byte "type".
>@@ -147,3 +152,25 @@ generated by encode_fh, in which case it will have
>been padded with
> nuls. Rather, the encode_fh routine should choose a "type" which
> indicates the decode_fh how much of the filehandle is valid, and how
> it should be interpreted.
>+
>+Export Operations Flags
>+-----------------------
>+In addition to the operation vector pointers, struct export_operations
>also
>+contains a "flags" field that allows the filesystem to communicate to
>nfsd
>+that it may want to do things differently when dealing with it. The
>+following flags are defined:
>+
>+ EXPORT_OP_NOWCC
>+ RFC 1813 recommends that servers always send weak cache consistency
>+ (WCC) data to the client after each operation. The server should
>+ atomically collect attributes about the inode, do an operation on it,
>+ and then collect the attributes afterward. This allows the client to
>+ skip issuing GETATTRs in some situations but means that the server
>+ is calling vfs_getattr for almost all RPCs. On some filesystems
>+ (particularly those that are clustered or networked) this is
>expensive
>+ and atomicity is difficult to guarantee. This flag indicates to nfsd
>+ that it should skip providing WCC attributes to the client in NFSv3
>+ replies when doing operations on this filesystem. Consider enabling
>+ this on filesystems that have an expensive ->getattr inode operation,
>+ or when atomicity between pre and post operation attribute collection
>+ is impossible to guarantee.
>diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
>index 01dcd494f781..c30c8c604e2a 100644
>--- a/fs/nfsd/nfs3xdr.c
>+++ b/fs/nfsd/nfs3xdr.c
>@@ -203,7 +203,7 @@ static __be32 *
> encode_post_op_attr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh
>*fhp)
> {
> struct dentry *dentry = fhp->fh_dentry;
>- if (dentry && d_really_is_positive(dentry)) {
>+ if (!fhp->fh_no_wcc && dentry && d_really_is_positive(dentry)) {
> __be32 err;
> struct kstat stat;
>
>@@ -256,6 +256,9 @@ void fill_post_wcc(struct svc_fh *fhp)
> {
> __be32 err;
>
>+ if (fhp->fh_no_wcc)
>+ return;
>+
> if (fhp->fh_post_saved)
> printk("nfsd: inode locked twice during operation.\n");
>
>diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
>index 350041a40fe5..29ae37f62b9b 100644
>--- a/fs/nfsd/nfsfh.c
>+++ b/fs/nfsd/nfsfh.c
>@@ -267,6 +267,16 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst
>*rqstp, struct svc_fh *fhp)
>
> fhp->fh_dentry = dentry;
> fhp->fh_export = exp;
>+
>+ switch (rqstp->rq_vers) {
>+ case 3:
>+ if (!(dentry->d_sb->s_export_op->flags & EXPORT_OP_NOWCC))
>+ break;
>+ /* Fallthrough */
>+ case 2:
>+ fhp->fh_no_wcc = true;
>+ }
>+
> return 0;
> out:
> exp_put(exp);
>@@ -535,6 +545,9 @@ fh_compose(struct svc_fh *fhp, struct svc_export
>*exp, struct dentry *dentry,
> */
> set_version_and_fsid_type(fhp, exp, ref_fh);
>
>+ /* If we have a ref_fh, then copy the fh_no_wcc setting from it. */
>+ fhp->fh_no_wcc = ref_fh ? ref_fh->fh_no_wcc : false;
>+
> if (ref_fh == fhp)
> fh_put(ref_fh);
>
>@@ -641,6 +654,7 @@ fh_put(struct svc_fh *fhp)
> exp_put(exp);
> fhp->fh_export = NULL;
> }
>+ fhp->fh_no_wcc = false;
> return;
> }
>
>diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
>index 1e90dad4926b..9ddead4d98f8 100644
>--- a/fs/nfsd/nfsfh.h
>+++ b/fs/nfsd/nfsfh.h
>@@ -32,6 +32,7 @@ typedef struct svc_fh {
>
> unsigned char fh_locked; /* inode locked by us */
> unsigned char fh_want_write; /* remount protection taken */
>+ bool fh_no_wcc; /* no wcc data needed */
This increases the size of svc_fh because it splits the four unsigned
chars.
You could change all of these (fh_locked, fh_want_write,
fh_{pre,post}saved)
to be bools to avoid that and make it more clear they are only used as
booleans (I verified that they all are only assigned 0 or 1).
Cheers, Andreas
>
> #ifdef CONFIG_NFSD_V3
> unsigned char fh_post_saved; /* post-op attrs saved */
>@@ -51,7 +52,6 @@ typedef struct svc_fh {
> struct kstat fh_post_attr; /* full attrs after operation */
> u64 fh_post_change; /* nfsv4 change; see above */
> #endif /* CONFIG_NFSD_V3 */
>-
> } svc_fh;
>
> enum nfsd_fsid {
>@@ -225,6 +225,9 @@ fill_pre_wcc(struct svc_fh *fhp)
> {
> struct inode *inode;
>
>+ if (fhp->fh_no_wcc)
>+ return;
>+
> inode = d_inode(fhp->fh_dentry);
> if (!fhp->fh_pre_saved) {
> fhp->fh_pre_mtime = inode->i_mtime;
>diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
>index fa05e04c5531..600c3fccc999 100644
>--- a/include/linux/exportfs.h
>+++ b/include/linux/exportfs.h
>@@ -214,6 +214,8 @@ struct export_operations {
> bool write, u32 *device_generation);
> int (*commit_blocks)(struct inode *inode, struct iomap *iomaps,
> int nr_iomaps, struct iattr *iattr);
>+#define EXPORT_OP_NOWCC (0x1) /* Don't collect wcc data for NFSv3
>replies */
>+ unsigned long flags;
> };
>
> extern int exportfs_encode_inode_fh(struct inode *inode, struct fid *fid,
>--
>2.4.3
>
>_______________________________________________
>HPDD-discuss mailing list
>HPDD-discuss@lists.01.org
>https://lists.01.org/mailman/listinfo/hpdd-discuss
>
Cheers, Andreas
--
Andreas Dilger
Lustre Software Architect
Intel High Performance Data Division
next prev parent reply other threads:[~2015-09-12 4:41 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-09-11 10:20 [PATCH] nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations Jeff Layton
2015-09-11 10:20 ` Jeff Layton
2015-09-11 10:20 ` [Cluster-devel] " Jeff Layton
2015-09-11 19:13 ` [lustre-devel] FW: [HPDD-discuss] " Patrick Farrell
2015-09-12 4:44 ` Dilger, Andreas
2015-09-11 21:29 ` J. Bruce Fields
2015-09-11 21:29 ` J. Bruce Fields
2015-09-11 21:29 ` [Cluster-devel] " J. Bruce Fields
[not found] ` <20150911212957.GH11677-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
2015-09-11 21:44 ` Jeff Layton
2015-09-11 21:44 ` Jeff Layton
2015-09-11 21:44 ` [Cluster-devel] " Jeff Layton
[not found] ` <1441966830-5517-1-git-send-email-jeff.layton-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>
2015-09-12 4:41 ` Dilger, Andreas [this message]
2015-09-12 4:41 ` [HPDD-discuss] " Dilger, Andreas
2015-09-12 4:41 ` [Cluster-devel] " Dilger, Andreas
[not found] ` <D21906EE.10A741%andreas.dilger-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2015-09-12 10:24 ` Jeff Layton
2015-09-12 10:24 ` Jeff Layton
2015-09-12 10:24 ` [Cluster-devel] " Jeff Layton
2015-09-14 16:10 ` bfields
2015-09-14 16:10 ` bfields
2015-09-14 16:10 ` [Cluster-devel] " bfields
2015-09-16 17:18 ` Jeff Layton
2015-09-16 17:18 ` Jeff Layton
2015-09-16 17:18 ` [Cluster-devel] " Jeff Layton
2015-09-16 21:30 ` bfields
2015-09-16 21:30 ` bfields
2015-09-16 21:30 ` [Cluster-devel] " bfields
[not found] ` <20150916213044.GB5169-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
2015-09-16 22:04 ` Jeff Layton
2015-09-16 22:04 ` Jeff Layton
2015-09-16 22:04 ` [Cluster-devel] " Jeff Layton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=D21906EE.10A741%andreas.dilger@intel.com \
--to=andreas.dilger-ral2jqcrhueavxtiumwx3w@public.gmane.org \
--cc=HPDD-discuss-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org \
--cc=bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org \
--cc=ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=cluster-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
--cc=fuse-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org \
--cc=jlayton-vpEMnDpepFuMZCB2o+C8xQ@public.gmane.org \
--cc=linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=ocfs2-devel-N0ozoZBvEnrZJqsBc5GL+g@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.