From: Jeff Layton <jlayton@poochiereds.net>
To: "Mkrtchyan, Tigran" <tigran.mkrtchyan@desy.de>
Cc: Trond Myklebust <trondmy@primarydata.com>,
linux-nfs@vger.kernel.org,
Anna Schumaker <Anna.Schumaker@netapp.com>,
hch@infradead.org
Subject: Re: [RFC PATCH] nfs: allow nfs client to handle servers that hand out multiple layout types
Date: Tue, 07 Jun 2016 08:43:10 -0400 [thread overview]
Message-ID: <1465303390.3024.15.camel@poochiereds.net> (raw)
In-Reply-To: <797047869.16381632.1465302372374.JavaMail.zimbra@desy.de>
On Tue, 2016-06-07 at 14:26 +0200, Mkrtchyan, Tigran wrote:
>
> ----- Original Message -----
> >
> > From: "Jeff Layton" <jlayton@poochiereds.net>
> > To: "Mkrtchyan, Tigran" <tigran.mkrtchyan@desy.de>
> > Cc: "Trond Myklebust" <trondmy@primarydata.com>, linux-nfs@vger.ker
> > nel.org, "Anna Schumaker"
> > <Anna.Schumaker@netapp.com>, hch@infradead.org
> > Sent: Thursday, June 2, 2016 1:04:19 PM
> > Subject: Re: [RFC PATCH] nfs: allow nfs client to handle servers
> > that hand out multiple layout types
> >
> > On Thu, 2016-06-02 at 09:12 +0200, Mkrtchyan, Tigran wrote:
> > >
> > >
> > > ----- Original Message -----
> > > >
> > > > From: "Jeff Layton" <jlayton@poochiereds.net>
> > > > To: "Trond Myklebust" <trondmy@primarydata.com>, linux-nfs@vger
> > > > .kernel.org
> > > > Cc: "tigran mkrtchyan" <tigran.mkrtchyan@desy.de>, "Anna
> > > > Schumaker"
> > > > <Anna.Schumaker@netapp.com>, hch@infradead.org
> > > > Sent: Wednesday, June 1, 2016 11:53:03 PM
> > > > Subject: Re: [RFC PATCH] nfs: allow nfs client to handle
> > > > servers that hand out
> > > > multiple layout types
> > > >
> > > > On Tue, 2016-05-31 at 17:54 -0400, Jeff Layton wrote:
> > > > >
> > > > > On Tue, 2016-05-31 at 21:41 +0000, Trond Myklebust wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > > On 5/31/16, 17:09, "Jeff Layton" <jlayton@poochiereds.net>
> > > > > > wrote:
> > > > > >
> > > > > > >
> > > > > > > On Tue, 2016-05-31 at 16:03 +0000, Trond Myklebust wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > On 5/30/16, 12:35, "Jeff Layton" <jlayton@poochiereds.n
> > > > > > > > et> wrote:
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Allow the client to deal with servers that hand out
> > > > > > > > > multiple layout
> > > > > > > > > types for the same filesystem. When this happens, we
> > > > > > > > > pick the "best" one,
> > > > > > > > > based on a hardcoded assumed order in the client
> > > > > > > > > code.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Jeff Layton <jeff.layton@primarydata.c
> > > > > > > > > om>
> > > > > > > > > ---
> > > > > > > > > fs/nfs/client.c | 2 +-
> > > > > > > > > fs/nfs/nfs4proc.c | 2 +-
> > > > > > > > > fs/nfs/nfs4xdr.c | 41 +++++++++++++-------------
> > > > > > > > > fs/nfs/pnfs.c | 76
> > > > > > > > > ++++++++++++++++++++++++++++++++++++++-----------
> > > > > > > > > include/linux/nfs_xdr.h | 2 +-
> > > > > > > > > 5 files changed, 85 insertions(+), 38 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> > > > > > > > > index 0c96528db94a..53b41f4bd45a 100644
> > > > > > > > > --- a/fs/nfs/client.c
> > > > > > > > > +++ b/fs/nfs/client.c
> > > > > > > > > @@ -787,7 +787,7 @@ int nfs_probe_fsinfo(struct
> > > > > > > > > nfs_server *server, struct
> > > > > > > > > nfs_fh *mntfh, struct nfs
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > fsinfo.fattr = fattr;
> > > > > > > > > - fsinfo.layouttype = 0;
> > > > > > > > > + fsinfo.layouttypes = 0;
> > > > > > > > > error = clp->rpc_ops->fsinfo(server, mntfh, &fsinfo);
> > > > > > > > > if (error < 0)
> > > > > > > > > goto out_error;
> > > > > > > > > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> > > > > > > > > index de97567795a5..9446aef89b48 100644
> > > > > > > > > --- a/fs/nfs/nfs4proc.c
> > > > > > > > > +++ b/fs/nfs/nfs4proc.c
> > > > > > > > > @@ -4252,7 +4252,7 @@ static int
> > > > > > > > > nfs4_proc_fsinfo(struct nfs_server *server,
> > > > > > > > > struct nfs_fh *fhandle, s
> > > > > > > > > if (error == 0) {
> > > > > > > > > /* block layout checks this! */
> > > > > > > > > server->pnfs_blksize = fsinfo->blksize;
> > > > > > > > > - set_pnfs_layoutdriver(server, fhandle,
> > > > > > > > > fsinfo->layouttype);
> > > > > > > > > + set_pnfs_layoutdriver(server, fhandle,
> > > > > > > > > fsinfo->layouttypes);
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > return error;
> > > > > > > > > diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
> > > > > > > > > index 661e753fe1c9..876a80802c1d 100644
> > > > > > > > > --- a/fs/nfs/nfs4xdr.c
> > > > > > > > > +++ b/fs/nfs/nfs4xdr.c
> > > > > > > > > @@ -4723,33 +4723,36 @@ static int
> > > > > > > > > decode_getfattr(struct xdr_stream *xdr,
> > > > > > > > > struct nfs_fattr *fattr,
> > > > > > > > > * Decode potentially multiple layout types. Currently
> > > > > > > > > we only support
> > > > > > > > > * one layout driver per file system.
> > > > > > > > > */
> > > > > > > > > -static int decode_first_pnfs_layout_type(struct
> > > > > > > > > xdr_stream *xdr,
> > > > > > > > > - uint32_t *layouttype)
> > > > > > > > > +static int decode_pnfs_layout_types(struct
> > > > > > > > > xdr_stream *xdr, u32 *layouttypes)
> > > > > > > > > {
> > > > > > > > > __be32 *p;
> > > > > > > > > int num;
> > > > > > > > > + u32 type;
> > > > > > > > >
> > > > > > > > > p = xdr_inline_decode(xdr, 4);
> > > > > > > > > if (unlikely(!p))
> > > > > > > > > goto out_overflow;
> > > > > > > > > num = be32_to_cpup(p);
> > > > > > > > >
> > > > > > > > > - /* pNFS is not supported by the underlying
> > > > > > > > > file system */
> > > > > > > > > - if (num == 0) {
> > > > > > > > > - *layouttype = 0;
> > > > > > > > > - return 0;
> > > > > > > > > - }
> > > > > > > > > - if (num > 1)
> > > > > > > > > - printk(KERN_INFO "NFS: %s: Warning:
> > > > > > > > > Multiple pNFS layout "
> > > > > > > > > - "drivers per filesystem not supported\n",
> > > > > > > > > __func__);
> > > > > > > > > + *layouttypes = 0;
> > > > > > > > >
> > > > > > > > > - /* Decode and set first layout type, move
> > > > > > > > > xdr->p past unused types */
> > > > > > > > > - p = xdr_inline_decode(xdr, num * 4);
> > > > > > > > > - if (unlikely(!p))
> > > > > > > > > - goto out_overflow;
> > > > > > > > > - *layouttype = be32_to_cpup(p);
> > > > > > > > > + for (; num; --num) {
> > > > > > > > > + p = xdr_inline_decode(xdr, 4);
> > > > > > > > > +
> > > > > > > > > + if (unlikely(!p))
> > > > > > > > > + goto out_overflow;
> > > > > > > > > +
> > > > > > > > > + type = be32_to_cpup(p);
> > > > > > > > > +
> > > > > > > > > + /* Ignore any that we don't understand */
> > > > > > > > > + if (unlikely(type >= LAYOUT_TYPE_MAX))
> > > > > > > >
> > > > > > > > This will in effect hard code the layouts that the
> > > > > > > > client supports.
> > > > > > > > LAYOUT_TYPE_MAX is something that applies to knfsd only
> > > > > > > > for now.
> > > > > > > > Let’s not leak it into the client. I suggest just
> > > > > > > > making this
> > > > > > > > 8*sizeof(*layouttypes).
> > > > > > > >
> > > > > > > Fair enough. I'll make that change.
> > > > > > >
> > > > > > > That said...LAYOUT_TYPE_MAX is a value in the
> > > > > > > pnfs_layouttype enum, and
> > > > > > > that enum is used in both the client and the server code,
> > > > > > > AFAICT. If we
> > > > > > > add a new LAYOUT_* value to that enum for the client,
> > > > > > > then we'll need
> > > > > > > to increase that value anyway. So, I'm not sure I
> > > > > > > understand how this
> > > > > > > limits the client in any way...
> > > > > > No, the client doesn’t use enum pnfs_layouttype anywhere.
> > > > > > If you look
> > > > > > at set_pnfs_layoutdriver(), you’ll note that we currently
> > > > > > support all
> > > > > > values for the layout type.
> > > > > >
> > > > > Ok, I see. So if someone were to (for instance) create a 3rd
> > > > > party
> > > > > layout driver module that had used a value above
> > > > > LAYOUT_TYPE_MAX then
> > > > > this would prevent it from working.
> > > > >
> > > > > Hmmm...so even if I make the change that you're suggesting,
> > > > > this will
> > > > > still limit the client to working with layout types that are
> > > > > below a
> > > > > value of 32. Is that also a problem? If so, then maybe I
> > > > > should respin
> > > > > this to be more like the one Tigran had: make an array or
> > > > > something to
> > > > > hold those values.
> > > > >
> > > > > Thoughts?
> > > > >
> > > > Yecchhhhh...ok after thinking about this, the whole out-of-tree
> > > > layout
> > > > driver possibility really throws a wrench into this plan...
> > > >
> > > > Suppose someone creates such a layout driver, drops the module
> > > > onto the
> > > > client and the core kernel knows nothing about it. With the
> > > > current
> > > > patch, it'd be ignored. I don't think that's what we want
> > > > though.
> > > >
> > > > Where should that driver fit in the selection order in
> > > > set_pnfs_layoutdriver?
> > > >
> > > > Tigran's patch had the client start with the second element and
> > > > only
> > > > pick the first one in the list if nothing else worked. That's
> > > > sort of
> > > > icky though.
> > > >
> > > > Another idea might be to just attempt unrecognized ones as the
> > > > driver
> > > > of last resort, when no other driver has worked?
> > > >
> > > > Alternately, we could add a mount option or something that
> > > > would affect
> > > > the selection order? If so, how should such an option work?
> > > >
> > > > I'm really open to suggestions here -- I've no idea what the
> > > > right
> > > > thing to do is at this point...sigh.
> > >
> > > There are two things in my patch what I don't like:
> > >
> > > - an int array to store layouts, which mostly will be used by a
> > > single element
> > > only
> > > - server must know client implementation to achieve desired
> > > result
> > >
> > Meh, the array is not too big a deal. We only allocate a fsinfo
> > struct
> > to handle the call. Once we've selected the layout type, it gets
> > discarded. The second problem is the bigger one, IMO.
> >
> > >
> > > In your approach other two problems:
> > >
> > > - max layout type id 32
> > > - hard coded supported layout types and the order
> > >
> > Right, both are problems. For now, I'm not too worried about
> > getting
> > _official_ layout type values that are above 32, but the spec says:
> >
> > Types within the range 0x00000001-0x7FFFFFFF are
> > globally unique and are assigned according to the description in
> > Section 22.4; they are maintained by IANA. Types within the
> > range
> > 0x80000000-0xFFFFFFFF are site specific and for private use
> > only.
> >
> > So both of the above problems in my RFC patch make it difficult to
> > experiment with new layout types.
> >
> > >
> > > Any of them will help in adoption of flexfile layout, especially
> > > if we get it
> > > into
> > > RHEL7.
> > >
> > > In discussion with Christoph Hellwig back in March, I have
> > > proposed a mount
> > > option:
> > >
> > > mount -o preferred_layout=nfs4_file,vers=4.1
> > >
> > > or may be even an nfs kernel module option.
> > >
> > >
> > > This will allow server to send layout in any order, but let
> > > client to re-order
> > > them by it's own rules.
> > >
> > Yeah, I was thinking something along the same lines.
> >
> > The problem with a mount option is that you can transit to
> > different
> > filesystems in multiple ways with NFS these days (referrals,
> > etc...).
> > Propagating and handling mount options in those cases can quickly
> > become quite messy.
> >
> > A module option to set the selection order might be best. For
> > instance:
> >
> >
> > nfs4.pnfs_layout_order=0x80000006:scsi:block:object:flexfile:file
> Hi Jeff,
>
> after some mental exercises around this topic, I came to a
> conclusion, that
> module option is a wrong approach. The module configuration is a
> global
> setting for kernel nfs client. Imagine a situation in which you want
> to use
> flexfiles with one server and nfs4_files with another server, but
> both
> support both layout types.
>
> Looks like there is no way around mount option.
>
> Tigran.
>
>
Sure, that sort of thing is possible. For now though most servers still
only send a list of 1 layout type, with a few sending a list of two or
three. I don't know that we really need to plumb in that level of
granularity just yet.
The reason I'm hesitant to add a mount option is that because of the
way that structures are aggressively shared, it can be difficult to set
this type of thing on a per-mount basis.
The set I sent this morning sidesteps the whole configuration issue,
but should make it possible to add that in later once the maintainers
express a preference on how they'd like that to work (hint, hint)...
--
Jeff Layton <jlayton@poochiereds.net>
prev parent reply other threads:[~2016-06-07 12:43 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-05-30 16:35 [RFC PATCH] nfs: allow nfs client to handle servers that hand out multiple layout types Jeff Layton
2016-05-31 16:03 ` Trond Myklebust
2016-05-31 21:09 ` Jeff Layton
2016-05-31 21:41 ` Trond Myklebust
2016-05-31 21:54 ` Jeff Layton
2016-06-01 21:53 ` Jeff Layton
2016-06-02 7:12 ` Mkrtchyan, Tigran
2016-06-02 11:04 ` Jeff Layton
2016-06-07 12:26 ` Mkrtchyan, Tigran
2016-06-07 12:43 ` Jeff Layton [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1465303390.3024.15.camel@poochiereds.net \
--to=jlayton@poochiereds.net \
--cc=Anna.Schumaker@netapp.com \
--cc=hch@infradead.org \
--cc=linux-nfs@vger.kernel.org \
--cc=tigran.mkrtchyan@desy.de \
--cc=trondmy@primarydata.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).