All of lore.kernel.org
 help / color / mirror / Atom feed
From: Boaz Harrosh <bharrosh@panasas.com>
To: Andy Adamson <andros@netapp.com>
Cc: Benny Halevy <bhalevy@panasas.com>, linux-nfs@vger.kernel.org
Subject: Re: [PATCH 5/5] FIXME: pnfs-post-submit: per mount layout driver private data
Date: Tue, 11 May 2010 11:46:43 +0300	[thread overview]
Message-ID: <4BE91973.60505@panasas.com> (raw)
In-Reply-To: <CC482AF1-FEF0-4D01-A307-4CE5751F7689@netapp.com>

On 05/10/2010 05:24 PM, Andy Adamson wrote:
> 
> On May 9, 2010, at 12:25 PM, Boaz Harrosh wrote:
> 
>> On 05/06/2010 10:23 PM, Benny Halevy wrote:
>>> Temporary relief until we convert to use generic device cache.
>>>
>>
>> [In short]
>> Rrrr, No! This is a complete crash. The private data is per-server but
>> is called per super-block. In the case of two mounts of same "server",
>> the user of this will:
>> - Leak a device cache on second mount
>> - Crash after close of first super block.
>>
>> [The long story]
>> I'm commenting here for the complete series Andy's and Benny included.
>>
>> What Andy tried to do was move the per super-block device cache to a
>> per "server" device cache. (This is the per server struct at the NFS
>> client side not to be confused with an NFS-server what so ever). This
>> is because as mandated by the Protocol each device id uniqueness is
>> governed by a per-server-per-client-per-layoput_type, so multiple
>> mounts can share the device-cache and save resources. The old code
>> of per-mount-point was correct only not optimal.
>>
>> But he did not finish his job. Because he still calls the device
>> cache initialization at per-mount-point init. - 
>>> initialize_mountpoint is
>> called from set_pnfs_layoutdriver() which is called with a super-block
>> per mount-point. He went to grate length (I hope, I did not check) to
>> make sure only the first mount, allocates the cache, and the last  
>> mount
>> destroys it.
>>
>> But otherwise he noticed (and Benny tried to help) that now
>> initialize_mountpoint is per-server, and not per-sb. Hence the pointer
>> to struct server.
>>
>> So the old code is now a layering violation and hence the mix-up and  
>> the bug
> 
>>
>> - If it is a per-server? name it ->initialize_server() receiving  
>> server
>>  pointer, No?
>>

What about the name ?

>> - If it is a per-server then shift the all set_pnfs_layoutdriver() to
>>  be called once per-server construction (At the server constructor  
>> code)
> 
> set_pnfs_layoutdriver checks to see if nfs_server->pnfs_curr_ld is set.

Yes set_pnfs_layoutdriver does, again a layering violation in my opinion.
However ->uninitialize_mountpoint() is still called for every sb-unmount
how useful is that? (at super.c nfs4_kill_super)

It is very simple really.

- ->initialize_server() is called from nfs_server_set_fsinfo() for every
  mount the check should be there. If you need it that late? Perhaps it could
  be called earlier at nfs_init_server()?

- Call ->uninitialize_server() from nfs_free_server(). And all is well.


- Give me a void* at server to keep my stuff.

> Put the private pointer into struct pnfs_layoutdriver_type and your  
> problem will be solved.
> 

No pnfs_layoutdriver_type is global. I might as well just put it at
the data_segment. I wanted a per mount. Willing to compromise on per server

> -->Andy
> 

OK I finally read the code. and forget everything I said!

So current code is one big bug in regard to filelayout_uninitialize_mountpoint
called for every nfs4_kill_super and the destruction of the cache.

Struct server has nothing to do with it. it is just on the way to receive
the struct *client* pointer. (My god the server/client relationships in 
Linux-nfs-client I still don't understand it).

So everything I said above but exchange the name "server" with *"client"*

And most importantly. YOU DO NOT NEED THE REFERENCE COUNTS. (right, there
are two)

if you stick the device-cache on the lifetime of the client structure
then:
- There will not be any super-blocks alive before/after a particular client
  dies. (right? sb->ref->server->ref->client)
- There will not be any inodes alive after a super-blocks dies, hence any
  IOs nor any layouts.

So a device cache is what you said it is per client structure no more no
less. No?

(I can't find the last version of patches you sent to the mailing list
 I wanted to comment on them (Sorry for the delay was busy). I'll look
 in Benny's git I hope he did not squash them yet. And will comment
 on that)

Boaz

>>  then you don't need to sync in multiple places the initialization/ 
>> destruction
>>  of sb(s) vs. servers vs device-caches. Server struct life-cycle  
>> will govern that.
>>
>> Accommodating future needs:
>> - In objects layout (In code not yet released) I have a per-super- 
>> block: pages-
>>  cache-pool, raid-engine governing struct, and some other raid  
>> related information.
>>  I use per-super-block because this is the most natural In the Linux  
>> VFS API. So
>>  global stuff per super-block directly pointed by every inode for  
>> easy (random)
>>  access at every API level. I could shift this to be per-server in  
>> NFS-client. I surly
>>  don't want it global, (Rrrrr) and per-inode is two small. I will  
>> need to work harder
>>  to optimize for the extra contention (or maybe not).
>>
>>  So the per-server model is fine, I guess, but don't let me slave  
>> over a broken API that
>>  forces me to duplicate lifetime rules of things that are already  
>> taken care of, only
>>  not seen by the layout driver.
>>
>>  If moving to a per-server model then some current structures  
>> referencing and pointing
>>  could change to remove the SB from the picture and directly point  
>> to server.
>>
>> I know this is lots of work and who's going to do it, but I was not  
>> the one who suggested
>> the optimization in the first place. A per-SB is some much easier  
>> because of the Linux
>> environment we live in, but if we do it, it must be done right.
>>
>> Boaz
>>
>>> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
>>> ---
>>> include/linux/nfs_fs_sb.h |    1 +
>>> 1 files changed, 1 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
>>> index cad56a7..00a4e7e 100644
>>> --- a/include/linux/nfs_fs_sb.h
>>> +++ b/include/linux/nfs_fs_sb.h
>>> @@ -164,6 +164,7 @@ struct nfs_server {
>>>
>>> #ifdef CONFIG_NFS_V4_1
>>> 	struct pnfs_layoutdriver_type  *pnfs_curr_ld; /* Active layout  
>>> driver */
>>> +	void			       *pnfs_ld_data; /* Per-mount data */
>>> 	unsigned int			ds_rsize;  /* Data server read size */
>>> 	unsigned int			ds_wsize;  /* Data server write size */
>>> #endif /* CONFIG_NFS_V4_1 */
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs"  
>> in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


  reply	other threads:[~2010-05-11  8:46 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-05-06 19:20 [PATCH 0/7] pnfs-submit api touch ups Benny Halevy
2010-05-06 19:22 ` [PATCH 1/2] SQUASHME: pnfs-submit: have initialize_mountpoint return status Benny Halevy
2010-05-06 19:23 ` [PATCH 2/2] SQUASHME: pnfs-submit: pass struct nfs_server * to getdeviceinfo Benny Halevy
2010-05-06 19:23 ` [PATCH 3/5] pnfs-post-submit: pass struct nfs_server * to getdevicelist Benny Halevy
2010-05-06 19:23 ` [PATCH 4/5] pnfs-post-submit: pass mntfh down the init_pnfs path Benny Halevy
2010-05-06 19:23 ` [PATCH 5/5] FIXME: pnfs-post-submit: per mount layout driver private data Benny Halevy
2010-05-09 16:25   ` Boaz Harrosh
2010-05-10 14:24     ` Andy Adamson
2010-05-11  8:46       ` Boaz Harrosh [this message]
2010-05-11 15:02         ` William A. (Andy) Adamson
2010-05-06 19:24 ` [PATCH 6/6] SQUASHME: pnfs-block: convert APIs pnfs-post-submit Benny Halevy
2010-05-06 19:24 ` [PATCH 7/7] SQUASHME: pnfs-obj: " Benny Halevy
2010-05-06 19:33 ` [PATCH 0/7] pnfs-submit api touch ups William A. (Andy) Adamson
     [not found]   ` <o2o89c397151005061233vcaefdd27s4513be2b641348a2-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-05-06 20:24     ` Benny Halevy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4BE91973.60505@panasas.com \
    --to=bharrosh@panasas.com \
    --cc=andros@netapp.com \
    --cc=bhalevy@panasas.com \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.