All of lore.kernel.org
 help / color / mirror / Atom feed
From: Boaz Harrosh <bharrosh@panasas.com>
To: Benny Halevy <bhalevy@panasas.com>, Andy Adamson <andros@netapp.com>
Cc: linux-nfs@vger.kernel.org
Subject: Re: [PATCH 5/5] FIXME: pnfs-post-submit: per mount layout driver private data
Date: Sun, 09 May 2010 19:25:04 +0300	[thread overview]
Message-ID: <4BE6E1E0.8020706@panasas.com> (raw)
In-Reply-To: <1273173832-27786-1-git-send-email-bhalevy@panasas.com>

On 05/06/2010 10:23 PM, Benny Halevy wrote:
> Temporary relief until we convert to use generic device cache.
> 

[In short]
Rrrr, No! This is a complete crash. The private data is per-server but
is called per super-block. In the case of two mounts of same "server",
the user of this will:
- Leak a device cache on second mount
- Crash after close of first super block.

[The long story]
I'm commenting here for the complete series Andy's and Benny included.

What Andy tried to do was move the per super-block device cache to a
per "server" device cache. (This is the per server struct at the NFS
client side not to be confused with an NFS-server what so ever). This
is because as mandated by the Protocol each device id uniqueness is
governed by a per-server-per-client-per-layoput_type, so multiple
mounts can share the device-cache and save resources. The old code
of per-mount-point was correct only not optimal.

But he did not finish his job. Because he still calls the device
cache initialization at per-mount-point init. ->initialize_mountpoint is
called from set_pnfs_layoutdriver() which is called with a super-block
per mount-point. He went to grate length (I hope, I did not check) to
make sure only the first mount, allocates the cache, and the last mount
destroys it.

But otherwise he noticed (and Benny tried to help) that now
initialize_mountpoint is per-server, and not per-sb. Hence the pointer
to struct server. 

So the old code is now a layering violation and hence the mix-up and the bug:

- If it is a per-server? name it ->initialize_server() receiving server
  pointer, No?

- If it is a per-server then shift the all set_pnfs_layoutdriver() to
  be called once per-server construction (At the server constructor code)
  then you don't need to sync in multiple places the initialization/destruction
  of sb(s) vs. servers vs device-caches. Server struct life-cycle will govern that.

Accommodating future needs:
- In objects layout (In code not yet released) I have a per-super-block: pages-
  cache-pool, raid-engine governing struct, and some other raid related information.
  I use per-super-block because this is the most natural In the Linux VFS API. So
  global stuff per super-block directly pointed by every inode for easy (random)
  access at every API level. I could shift this to be per-server in NFS-client. I surly
  don't want it global, (Rrrrr) and per-inode is two small. I will need to work harder
  to optimize for the extra contention (or maybe not).

  So the per-server model is fine, I guess, but don't let me slave over a broken API that
  forces me to duplicate lifetime rules of things that are already taken care of, only
  not seen by the layout driver.

  If moving to a per-server model then some current structures referencing and pointing
  could change to remove the SB from the picture and directly point to server.

I know this is lots of work and who's going to do it, but I was not the one who suggested
the optimization in the first place. A per-SB is some much easier because of the Linux
environment we live in, but if we do it, it must be done right.

Boaz

> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
> ---
>  include/linux/nfs_fs_sb.h |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
> index cad56a7..00a4e7e 100644
> --- a/include/linux/nfs_fs_sb.h
> +++ b/include/linux/nfs_fs_sb.h
> @@ -164,6 +164,7 @@ struct nfs_server {
>  
>  #ifdef CONFIG_NFS_V4_1
>  	struct pnfs_layoutdriver_type  *pnfs_curr_ld; /* Active layout driver */
> +	void			       *pnfs_ld_data; /* Per-mount data */
>  	unsigned int			ds_rsize;  /* Data server read size */
>  	unsigned int			ds_wsize;  /* Data server write size */
>  #endif /* CONFIG_NFS_V4_1 */


  reply	other threads:[~2010-05-09 16:25 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-05-06 19:20 [PATCH 0/7] pnfs-submit api touch ups Benny Halevy
2010-05-06 19:22 ` [PATCH 1/2] SQUASHME: pnfs-submit: have initialize_mountpoint return status Benny Halevy
2010-05-06 19:23 ` [PATCH 2/2] SQUASHME: pnfs-submit: pass struct nfs_server * to getdeviceinfo Benny Halevy
2010-05-06 19:23 ` [PATCH 3/5] pnfs-post-submit: pass struct nfs_server * to getdevicelist Benny Halevy
2010-05-06 19:23 ` [PATCH 4/5] pnfs-post-submit: pass mntfh down the init_pnfs path Benny Halevy
2010-05-06 19:23 ` [PATCH 5/5] FIXME: pnfs-post-submit: per mount layout driver private data Benny Halevy
2010-05-09 16:25   ` Boaz Harrosh [this message]
2010-05-10 14:24     ` Andy Adamson
2010-05-11  8:46       ` Boaz Harrosh
2010-05-11 15:02         ` William A. (Andy) Adamson
2010-05-06 19:24 ` [PATCH 6/6] SQUASHME: pnfs-block: convert APIs pnfs-post-submit Benny Halevy
2010-05-06 19:24 ` [PATCH 7/7] SQUASHME: pnfs-obj: " Benny Halevy
2010-05-06 19:33 ` [PATCH 0/7] pnfs-submit api touch ups William A. (Andy) Adamson
     [not found]   ` <o2o89c397151005061233vcaefdd27s4513be2b641348a2-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-05-06 20:24     ` Benny Halevy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4BE6E1E0.8020706@panasas.com \
    --to=bharrosh@panasas.com \
    --cc=andros@netapp.com \
    --cc=bhalevy@panasas.com \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.