All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff Layton <jlayton@poochiereds.net>
To: "Mkrtchyan, Tigran" <tigran.mkrtchyan@desy.de>
Cc: Trond Myklebust <trondmy@primarydata.com>,
	linux-nfs@vger.kernel.org,
	Anna Schumaker <Anna.Schumaker@netapp.com>,
	hch@infradead.org
Subject: Re: [RFC PATCH] nfs: allow nfs client to handle servers that hand out multiple layout types
Date: Tue, 07 Jun 2016 08:43:10 -0400	[thread overview]
Message-ID: <1465303390.3024.15.camel@poochiereds.net> (raw)
In-Reply-To: <797047869.16381632.1465302372374.JavaMail.zimbra@desy.de>

On Tue, 2016-06-07 at 14:26 +0200, Mkrtchyan, Tigran wrote:
> 
> ----- Original Message -----
> > 
> > From: "Jeff Layton" <jlayton@poochiereds.net>
> > To: "Mkrtchyan, Tigran" <tigran.mkrtchyan@desy.de>
> > Cc: "Trond Myklebust" <trondmy@primarydata.com>, linux-nfs@vger.ker
> > nel.org, "Anna Schumaker"
> > <Anna.Schumaker@netapp.com>, hch@infradead.org
> > Sent: Thursday, June 2, 2016 1:04:19 PM
> > Subject: Re: [RFC PATCH] nfs: allow nfs client to handle servers
> > that hand out multiple layout types
> > 
> > On Thu, 2016-06-02 at 09:12 +0200, Mkrtchyan, Tigran wrote:
> > > 
> > > 
> > > ----- Original Message -----
> > > > 
> > > > From: "Jeff Layton" <jlayton@poochiereds.net>
> > > > To: "Trond Myklebust" <trondmy@primarydata.com>, linux-nfs@vger
> > > > .kernel.org
> > > > Cc: "tigran mkrtchyan" <tigran.mkrtchyan@desy.de>, "Anna
> > > > Schumaker"
> > > > <Anna.Schumaker@netapp.com>, hch@infradead.org
> > > > Sent: Wednesday, June 1, 2016 11:53:03 PM
> > > > Subject: Re: [RFC PATCH] nfs: allow nfs client to handle
> > > > servers that hand out
> > > > multiple layout types
> > > > 
> > > > On Tue, 2016-05-31 at 17:54 -0400, Jeff Layton wrote:
> > > > > 
> > > > > On Tue, 2016-05-31 at 21:41 +0000, Trond Myklebust wrote:
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > On 5/31/16, 17:09, "Jeff Layton" <jlayton@poochiereds.net>
> > > > > > wrote:
> > > > > > 
> > > > > > > 
> > > > > > > On Tue, 2016-05-31 at 16:03 +0000, Trond Myklebust wrote:
> > > > > > > > 
> > > > > > > >  
> > > > > > > > On 5/30/16, 12:35, "Jeff Layton" <jlayton@poochiereds.n
> > > > > > > > et> wrote:
> > > > > > > >  
> > > > > > > > > 
> > > > > > > > > Allow the client to deal with servers that hand out
> > > > > > > > > multiple layout
> > > > > > > > > types for the same filesystem. When this happens, we
> > > > > > > > > pick the "best" one,
> > > > > > > > > based on a hardcoded assumed order in the client
> > > > > > > > > code.
> > > > > > > > >  
> > > > > > > > > Signed-off-by: Jeff Layton <jeff.layton@primarydata.c
> > > > > > > > > om>
> > > > > > > > > ---
> > > > > > > > > fs/nfs/client.c | 2 +-
> > > > > > > > > fs/nfs/nfs4proc.c | 2 +-
> > > > > > > > > fs/nfs/nfs4xdr.c | 41 +++++++++++++-------------
> > > > > > > > > fs/nfs/pnfs.c | 76
> > > > > > > > > ++++++++++++++++++++++++++++++++++++++-----------
> > > > > > > > > include/linux/nfs_xdr.h | 2 +-
> > > > > > > > > 5 files changed, 85 insertions(+), 38 deletions(-)
> > > > > > > > >  
> > > > > > > > > diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> > > > > > > > > index 0c96528db94a..53b41f4bd45a 100644
> > > > > > > > > --- a/fs/nfs/client.c
> > > > > > > > > +++ b/fs/nfs/client.c
> > > > > > > > > @@ -787,7 +787,7 @@ int nfs_probe_fsinfo(struct
> > > > > > > > > nfs_server *server, struct
> > > > > > > > > nfs_fh *mntfh, struct nfs
> > > > > > > > > }
> > > > > > > > >  
> > > > > > > > > fsinfo.fattr = fattr;
> > > > > > > > > -	fsinfo.layouttype = 0;
> > > > > > > > > +	fsinfo.layouttypes = 0;
> > > > > > > > > error = clp->rpc_ops->fsinfo(server, mntfh, &fsinfo);
> > > > > > > > > if (error < 0)
> > > > > > > > > goto out_error;
> > > > > > > > > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> > > > > > > > > index de97567795a5..9446aef89b48 100644
> > > > > > > > > --- a/fs/nfs/nfs4proc.c
> > > > > > > > > +++ b/fs/nfs/nfs4proc.c
> > > > > > > > > @@ -4252,7 +4252,7 @@ static int
> > > > > > > > > nfs4_proc_fsinfo(struct nfs_server *server,
> > > > > > > > > struct nfs_fh *fhandle, s
> > > > > > > > > if (error == 0) {
> > > > > > > > > /* block layout checks this! */
> > > > > > > > > server->pnfs_blksize = fsinfo->blksize;
> > > > > > > > > -	 set_pnfs_layoutdriver(server, fhandle,
> > > > > > > > > fsinfo->layouttype);
> > > > > > > > > +	 set_pnfs_layoutdriver(server, fhandle,
> > > > > > > > > fsinfo->layouttypes);
> > > > > > > > > }
> > > > > > > > >  
> > > > > > > > > return error;
> > > > > > > > > diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
> > > > > > > > > index 661e753fe1c9..876a80802c1d 100644
> > > > > > > > > --- a/fs/nfs/nfs4xdr.c
> > > > > > > > > +++ b/fs/nfs/nfs4xdr.c
> > > > > > > > > @@ -4723,33 +4723,36 @@ static int
> > > > > > > > > decode_getfattr(struct xdr_stream *xdr,
> > > > > > > > > struct nfs_fattr *fattr,
> > > > > > > > > * Decode potentially multiple layout types. Currently
> > > > > > > > > we only support
> > > > > > > > > * one layout driver per file system.
> > > > > > > > > */
> > > > > > > > > -static int decode_first_pnfs_layout_type(struct
> > > > > > > > > xdr_stream *xdr,
> > > > > > > > > -	 uint32_t *layouttype)
> > > > > > > > > +static int decode_pnfs_layout_types(struct
> > > > > > > > > xdr_stream *xdr, u32 *layouttypes)
> > > > > > > > > {
> > > > > > > > > __be32 *p;
> > > > > > > > > int num;
> > > > > > > > > +	u32 type;
> > > > > > > > >  
> > > > > > > > > p = xdr_inline_decode(xdr, 4);
> > > > > > > > > if (unlikely(!p))
> > > > > > > > > goto out_overflow;
> > > > > > > > > num = be32_to_cpup(p);
> > > > > > > > >  
> > > > > > > > > -	/* pNFS is not supported by the underlying
> > > > > > > > > file system */
> > > > > > > > > -	if (num == 0) {
> > > > > > > > > -	 *layouttype = 0;
> > > > > > > > > -	 return 0;
> > > > > > > > > -	}
> > > > > > > > > -	if (num > 1)
> > > > > > > > > -	 printk(KERN_INFO "NFS: %s: Warning:
> > > > > > > > > Multiple pNFS layout "
> > > > > > > > > -	 "drivers per filesystem not supported\n",
> > > > > > > > > __func__);
> > > > > > > > > +	*layouttypes = 0;
> > > > > > > > >  
> > > > > > > > > -	/* Decode and set first layout type, move
> > > > > > > > > xdr->p past unused types */
> > > > > > > > > -	p = xdr_inline_decode(xdr, num * 4);
> > > > > > > > > -	if (unlikely(!p))
> > > > > > > > > -	 goto out_overflow;
> > > > > > > > > -	*layouttype = be32_to_cpup(p);
> > > > > > > > > +	for (; num; --num) {
> > > > > > > > > +	 p = xdr_inline_decode(xdr, 4);
> > > > > > > > > +
> > > > > > > > > +	 if (unlikely(!p))
> > > > > > > > > +	 goto out_overflow;
> > > > > > > > > +
> > > > > > > > > +	 type = be32_to_cpup(p);
> > > > > > > > > +
> > > > > > > > > +	 /* Ignore any that we don't understand */
> > > > > > > > > +	 if (unlikely(type >= LAYOUT_TYPE_MAX))
> > > > > > > >  
> > > > > > > > This will in effect hard code the layouts that the
> > > > > > > > client supports.
> > > > > > > > LAYOUT_TYPE_MAX is something that applies to knfsd only
> > > > > > > > for now.
> > > > > > > > Let’s not leak it into the client. I suggest just
> > > > > > > > making this
> > > > > > > > 8*sizeof(*layouttypes).
> > > > > > > >  
> > > > > > > Fair enough. I'll make that change.
> > > > > > > 
> > > > > > > That said...LAYOUT_TYPE_MAX is a value in the
> > > > > > > pnfs_layouttype enum, and
> > > > > > > that enum is used in both the client and the server code,
> > > > > > > AFAICT. If we
> > > > > > > add a new LAYOUT_* value to that enum for the client,
> > > > > > > then we'll need
> > > > > > > to increase that value anyway. So, I'm not sure I
> > > > > > > understand how this
> > > > > > > limits the client in any way...
> > > > > > No, the client doesn’t use enum pnfs_layouttype anywhere.
> > > > > > If you look
> > > > > > at set_pnfs_layoutdriver(), you’ll note that we currently
> > > > > > support all
> > > > > > values for the layout type.
> > > > > > 
> > > > > Ok, I see. So if someone were to (for instance) create a 3rd
> > > > > party
> > > > > layout driver module that had used a value above
> > > > > LAYOUT_TYPE_MAX then
> > > > > this would prevent it from working.
> > > > > 
> > > > > Hmmm...so even if I make the change that you're suggesting,
> > > > > this will
> > > > > still limit the client to working with layout types that are
> > > > > below a
> > > > > value of 32. Is that also a problem? If so, then maybe I
> > > > > should respin
> > > > > this to be more like the one Tigran had: make an array or
> > > > > something to
> > > > > hold those values.
> > > > > 
> > > > > Thoughts?
> > > > > 
> > > > Yecchhhhh...ok after thinking about this, the whole out-of-tree 
> > > > layout
> > > > driver possibility really throws a wrench into this plan...
> > > > 
> > > > Suppose someone creates such a layout driver, drops the module
> > > > onto the
> > > > client and the core kernel knows nothing about it.  With the
> > > > current
> > > > patch, it'd be ignored. I don't think that's what we want
> > > > though.
> > > > 
> > > > Where should that driver fit in the selection order in
> > > > set_pnfs_layoutdriver?
> > > > 
> > > > Tigran's patch had the client start with the second element and
> > > > only
> > > > pick the first one in the list if nothing else worked. That's
> > > > sort of
> > > > icky though.
> > > > 
> > > > Another idea might be to just attempt unrecognized ones as the
> > > > driver
> > > > of last resort, when no other driver has worked?
> > > > 
> > > > Alternately, we could add a mount option or something that
> > > > would affect
> > > > the selection order? If so, how should such an option work?
> > > > 
> > > > I'm really open to suggestions here -- I've no idea what the
> > > > right
> > > > thing to do is at this point...sigh.
> > > 
> > > There are two things in my patch what I don't like:
> > > 
> > >   - an int array to store layouts, which mostly will be used by a
> > > single element
> > >   only
> > >   - server must know client implementation to achieve desired
> > > result
> > > 
> > Meh, the array is not too big a deal. We only allocate a fsinfo
> > struct
> > to handle the call. Once we've selected the layout type, it gets
> > discarded. The second problem is the bigger one, IMO.
> > 
> > > 
> > > In your approach other two problems:
> > > 
> > >   - max layout type id 32
> > >   - hard coded supported layout types and the order
> > > 
> > Right, both are problems. For now, I'm not too worried about
> > getting
> > _official_ layout type values that are above 32, but the spec says:
> > 
> >    Types within the range 0x00000001-0x7FFFFFFF are
> >    globally unique and are assigned according to the description in
> >    Section 22.4; they are maintained by IANA.  Types within the
> > range
> >    0x80000000-0xFFFFFFFF are site specific and for private use
> > only.
> > 
> > So both of the above problems in my RFC patch make it difficult to
> > experiment with new layout types.
> > 
> > > 
> > > Any of them will help in adoption of flexfile layout, especially
> > > if we get it
> > > into
> > > RHEL7.
> > > 
> > > In discussion with Christoph Hellwig back in March, I have
> > > proposed a mount
> > > option:
> > > 
> > >    mount -o preferred_layout=nfs4_file,vers=4.1
> > > 
> > > or may be even an nfs kernel module option.
> > > 
> > > 
> > > This will allow server to send layout in any order, but let
> > > client to re-order
> > > them by it's own rules.
> > > 
> > Yeah, I was thinking something along the same lines.
> > 
> > The problem with a mount option is that you can transit to
> > different
> > filesystems in multiple ways with NFS these days (referrals,
> > etc...).
> > Propagating and handling mount options in those cases can quickly
> > become quite messy.
> > 
> > A module option to set the selection order might be best. For
> > instance:
> > 
> >    
> > nfs4.pnfs_layout_order=0x80000006:scsi:block:object:flexfile:file
> Hi Jeff,
> 
> after some mental exercises around this topic, I came to a
> conclusion, that
> module option is a wrong approach. The module configuration is a
> global
> setting for kernel nfs client. Imagine a situation in which you want
> to use
> flexfiles with one server and nfs4_files with another server, but
> both
> support both layout types.
> 
> Looks like there is no way around mount option.
> 
> Tigran.
> 
> 

Sure, that sort of thing is possible. For now though most servers still
only send a list of 1 layout type, with a few sending a list of two or
three. I don't know that we really need to plumb in that level of
granularity just yet.

The reason I'm hesitant to add a mount option is that because of the
way that structures are aggressively shared, it can be difficult to set
this type of thing on a per-mount basis.

The set I sent this morning sidesteps the whole configuration issue,
but should make it possible to add that in later once the maintainers
express a preference on how they'd like that to work (hint, hint)...

-- 
Jeff Layton <jlayton@poochiereds.net>


      reply	other threads:[~2016-06-07 12:43 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-05-30 16:35 [RFC PATCH] nfs: allow nfs client to handle servers that hand out multiple layout types Jeff Layton
2016-05-31 16:03 ` Trond Myklebust
2016-05-31 21:09   ` Jeff Layton
2016-05-31 21:41     ` Trond Myklebust
2016-05-31 21:54       ` Jeff Layton
2016-06-01 21:53         ` Jeff Layton
2016-06-02  7:12           ` Mkrtchyan, Tigran
2016-06-02 11:04             ` Jeff Layton
2016-06-07 12:26               ` Mkrtchyan, Tigran
2016-06-07 12:43                 ` Jeff Layton [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1465303390.3024.15.camel@poochiereds.net \
    --to=jlayton@poochiereds.net \
    --cc=Anna.Schumaker@netapp.com \
    --cc=hch@infradead.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=tigran.mkrtchyan@desy.de \
    --cc=trondmy@primarydata.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.