From: "J. Bruce Fields" <bfields@fieldses.org>
To: Jeff Layton <jlayton@poochiereds.net>
Cc: linux-nfs@vger.kernel.org
Subject: Re: [PATCH] nfsd: add a new nowcc export option to disable WCC attrs in v3 replies
Date: Thu, 3 Sep 2015 16:20:16 -0400 [thread overview]
Message-ID: <20150903202016.GH10088@fieldses.org> (raw)
In-Reply-To: <20150903155417.468aaca0@tlielax.poochiereds.net>
On Thu, Sep 03, 2015 at 03:54:17PM -0400, Jeff Layton wrote:
> On Thu, 3 Sep 2015 15:19:14 -0400
> "J. Bruce Fields" <bfields@fieldses.org> wrote:
>
> > On Thu, Sep 03, 2015 at 02:52:25PM -0400, Jeff Layton wrote:
> > > On Thu, 3 Sep 2015 14:43:27 -0400
> > > "J. Bruce Fields" <bfields@fieldses.org> wrote:
> > >
> > > > On Thu, Sep 03, 2015 at 01:33:14PM -0400, Jeff Layton wrote:
> > > > > There are cases with NFSv3 where the client doesn't actually care about
> > > > > WCC attributes in replies. If the server is mainly acting as a DS for
> > > > > flexfiles, then the client just throws out those attributes anyway.
> > > > > Also, in the case where the client is primarily doing direct I/O, post
> > > > > op attributes aren't terribly useful
> > > > >
> > > > > Another reason to allow turning these off is that NFS will flush all
> > > > > buffered writes prior to issuing a GETATTR, and it also takes the
> > > > > i_mutex in its ->getattr operation.
> > > > >
> > > > > If we're doing a vfs_getattr after most RPCs, then we can end up
> > > > > deadlocking or (at best) prematurely flushing buffered writes, which
> > > > > kills performance.
> > > >
> > > > So you're talking about the NFS re-export case? Do we know of any other
> > > > case when a ->getattr is so expensive?
> > > >
> > >
> > > That's the main one that I have experience with, but getattr can be
> > > pretty expensive in clustered filesystems. For instance, on ceph:
> > >
> > > err = ceph_do_getattr(inode, CEPH_STAT_CAP_INODE_ALL, false);
> > > if (!err) {
> > > generic_fillattr(inode, stat);
> > > stat->ino = ceph_translate_ino(inode->i_sb, inode->i_ino);
> > >
> > >
> > > ...and it looks like ceph_do_getattr can issue a request on the network
> > > (though I'm not familiar with that code and I imagine that it's
> > > sometimes optimized out).
> >
> > OK. Could we get something like that in the changelog? The change
> > really needs to stand on the non-NFS case alone as long as NFS
> > reexport's not upstream. For that reason (and because without the
> > context that second paragraph's kind of confusing), it'd be helpful to
> > preface the NFS discussion by smoething like "In the (out-of-tree) NFS
> > re-export case".
> >
>
> Yeah, no problem. I'll respin the changelog on both patches and resend
> within the next day or two.
>
> > What's keeping that out of upstream, anyway? Apparently there's some
> > use case, and if it's inspiring a lot of changes in generic code, then
> > it'd simplify life to have it upstream.
> >
> >
>
> There are several problems. Here are few but there are others:
>
> 1) it is at least somewhat of a potential security concern. By mounting
> on a box that has access and then reexporting it, you can circumvent
> the export restrictions on the original server. Granted you can do that
> today with samba or something, but still -- it's a little sketchy.
Or with ganesha, or you could run a web server, or just mail the file
contents to someone.... This just isn't the way to enforce anything.
I've hard Trond argue something like this before, but I think his point
was a little different: not that we have to deny reexports for security
reasons, but just that such a policy-circumventing use case isn't worth
supporting. So he wasn't interested in reexports as long as that was
the only use case he'd heard about.
> 2) getattrs: We're working around the problem with this new export
> option, but if you don't use that then you can potentially deadlock
> with NFS. It wants to take the i_mutex in its ->getattr operation but
> knfsd calls vfs_getattr with that held to do post-op attrs. My initial
> workaround was to drop the i_mutex before calling fh_getattr instead of
> after, but then I hit the performance problem I described.
>
> 3) locking: proxying v3 locking is a painful mess. If the reexporter
> reboots, it'll lose its lease on the main server, which will kick out
> all of its state. At that point you can end up with another client
> racing and getting your lock before the reexporter can come back up and
> reclaim it.
>
> Our main use-case for this is pretty limited and doesn't involve file
> locking (so far!).
So this is the interest part, I guess.--b.
> We'll probably have to code up some mechanism to
> deal with that at some point, but that won't be an in-kernel solution.
>
> Once we ship this thing, the kernel changes will be GPLed of course for
> anyone who wants to use them, but I'm not sure any of us really want the
> headache of merging this upstream as a first-class feature.
>
> > >
> > >
> > > >
> > > > > This patch adds a new export option -- "nowcc" that disables the
> > > > > return of WCC attributes in NFSv3 replies. I also have a userland
> > > > > patch that adds support for the same option to nfs-utils that I'll
> > > > > send along as well.
> > > > >
> > > > > Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
> > > > > ---
> > > > > fs/nfsd/export.c | 1 +
> > > > > fs/nfsd/nfs3xdr.c | 5 ++++-
> > > > > fs/nfsd/nfsfh.c | 4 ++++
> > > > > fs/nfsd/nfsfh.h | 5 ++++-
> > > > > include/uapi/linux/nfsd/export.h | 3 ++-
> > > > > 5 files changed, 15 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
> > > > > index b4d84b579f20..97258009ce1e 100644
> > > > > --- a/fs/nfsd/export.c
> > > > > +++ b/fs/nfsd/export.c
> > > > > @@ -1092,6 +1092,7 @@ static struct flags {
> > > > > { NFSEXP_NOAUTHNLM, {"insecure_locks", ""}},
> > > > > { NFSEXP_V4ROOT, {"v4root", ""}},
> > > > > { NFSEXP_PNFS, {"pnfs", ""}},
> > > > > + { NFSEXP_NOWCC, {"nowcc", ""}},
> > > > > { 0, {"", ""}}
> > > > > };
> > > > >
> > > > > diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
> > > > > index 01dcd494f781..c30c8c604e2a 100644
> > > > > --- a/fs/nfsd/nfs3xdr.c
> > > > > +++ b/fs/nfsd/nfs3xdr.c
> > > > > @@ -203,7 +203,7 @@ static __be32 *
> > > > > encode_post_op_attr(struct svc_rqst *rqstp, __be32 *p, struct
> > > > > svc_fh *fhp) {
> > > > > struct dentry *dentry = fhp->fh_dentry;
> > > > > - if (dentry && d_really_is_positive(dentry)) {
> > > > > + if (!fhp->fh_no_wcc && dentry &&
> > > > > d_really_is_positive(dentry)) { __be32 err;
> > > > > struct kstat stat;
> > > > >
> > > > > @@ -256,6 +256,9 @@ void fill_post_wcc(struct svc_fh *fhp)
> > > > > {
> > > > > __be32 err;
> > > > >
> > > > > + if (fhp->fh_no_wcc)
> > > > > + return;
> > > > > +
> > > > > if (fhp->fh_post_saved)
> > > > > printk("nfsd: inode locked twice during
> > > > > operation.\n");
> > > > > diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
> > > > > index 350041a40fe5..32093b7dce55 100644
> > > > > --- a/fs/nfsd/nfsfh.c
> > > > > +++ b/fs/nfsd/nfsfh.c
> > > > > @@ -267,6 +267,9 @@ static __be32 nfsd_set_fh_dentry(struct
> > > > > svc_rqst *rqstp, struct svc_fh *fhp)
> > > > > fhp->fh_dentry = dentry;
> > > > > fhp->fh_export = exp;
> > > > > + if (exp->ex_flags & NFSEXP_NOWCC && rqstp->rq_vers == 3)
> > > > > + fhp->fh_no_wcc = true;
> > > > > +
> > > > > return 0;
> > > > > out:
> > > > > exp_put(exp);
> > > > > @@ -641,6 +644,7 @@ fh_put(struct svc_fh *fhp)
> > > > > exp_put(exp);
> > > > > fhp->fh_export = NULL;
> > > > > }
> > > > > + fhp->fh_no_wcc = false;
> > > > > return;
> > > > > }
> > > > >
> > > > > diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
> > > > > index 1e90dad4926b..9ddead4d98f8 100644
> > > > > --- a/fs/nfsd/nfsfh.h
> > > > > +++ b/fs/nfsd/nfsfh.h
> > > > > @@ -32,6 +32,7 @@ typedef struct svc_fh {
> > > > >
> > > > > unsigned char fh_locked; /* inode
> > > > > locked by us */ unsigned char
> > > > > fh_want_write; /* remount protection taken */
> > > > > + bool fh_no_wcc; /* no wcc
> > > > > data needed */
> > > > > #ifdef CONFIG_NFSD_V3
> > > > > unsigned char fh_post_saved; /*
> > > > > post-op attrs saved */ @@ -51,7 +52,6 @@ typedef struct svc_fh {
> > > > > struct kstat fh_post_attr; /* full
> > > > > attrs after operation */ u64
> > > > > fh_post_change; /* nfsv4 change; see above */ #endif /*
> > > > > CONFIG_NFSD_V3 */ -
> > > > > } svc_fh;
> > > > >
> > > > > enum nfsd_fsid {
> > > > > @@ -225,6 +225,9 @@ fill_pre_wcc(struct svc_fh *fhp)
> > > > > {
> > > > > struct inode *inode;
> > > > >
> > > > > + if (fhp->fh_no_wcc)
> > > > > + return;
> > > > > +
> > > > > inode = d_inode(fhp->fh_dentry);
> > > > > if (!fhp->fh_pre_saved) {
> > > > > fhp->fh_pre_mtime = inode->i_mtime;
> > > > > diff --git a/include/uapi/linux/nfsd/export.h
> > > > > b/include/uapi/linux/nfsd/export.h index 0df7bd5d2fb1..4c132290f414
> > > > > 100644 --- a/include/uapi/linux/nfsd/export.h
> > > > > +++ b/include/uapi/linux/nfsd/export.h
> > > > > @@ -51,9 +51,10 @@
> > > > > */
> > > > > #define NFSEXP_V4ROOT 0x10000
> > > > > #define NFSEXP_PNFS 0x20000
> > > > > +#define NFSEXP_NOWCC 0x40000
> > > > >
> > > > > /* All flags that we claim to support. (Note we don't support
> > > > > NOACL.) */ -#define NFSEXP_ALLFLAGS 0x3FE7F
> > > > > +#define NFSEXP_ALLFLAGS 0x7FE7F
> > > > >
> > > > > /* The flags that may vary depending on security flavor: */
> > > > > #define NFSEXP_SECINFO_FLAGS (NFSEXP_READONLY |
> > > > > NFSEXP_ROOTSQUASH \ --
> > > > > 2.4.3
> > >
> > >
> > > --
> > > Jeff Layton <jlayton@poochiereds.net>
>
>
> --
> Jeff Layton <jlayton@poochiereds.net>
next prev parent reply other threads:[~2015-09-03 20:20 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-09-03 17:33 [PATCH] nfsd: add a new nowcc export option to disable WCC attrs in v3 replies Jeff Layton
2015-09-03 18:43 ` J. Bruce Fields
2015-09-03 18:52 ` Jeff Layton
2015-09-03 19:19 ` J. Bruce Fields
2015-09-03 19:54 ` Jeff Layton
2015-09-03 20:20 ` J. Bruce Fields [this message]
2015-09-03 20:35 ` Jeff Layton
2015-09-11 13:35 ` J. Bruce Fields
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150903202016.GH10088@fieldses.org \
--to=bfields@fieldses.org \
--cc=jlayton@poochiereds.net \
--cc=linux-nfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).