From: Kinglong Mee <kinglongmee@gmail.com>
To: NeilBrown <neilb@suse.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>,
"J. Bruce Fields" <bfields@fieldses.org>,
"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
linux-fsdevel@vger.kernel.org,
Trond Myklebust <trond.myklebust@primarydata.com>,
kinglongmee@gmail.com
Subject: Re: [PATCH 10/10 v7] nfsd: Allows user un-mounting filesystem where nfsd exports base on
Date: Mon, 27 Jul 2015 11:17:41 +0800 [thread overview]
Message-ID: <55B5A2D5.4000004@gmail.com> (raw)
In-Reply-To: <20150727125121.23655367@noble>
On 7/27/2015 10:51, NeilBrown wrote:
> On Mon, 27 Jul 2015 10:28:52 +0800 Kinglong Mee <kinglongmee@gmail.com>
> wrote:
>
>> On 7/24/2015 10:05, NeilBrown wrote:
>>> On Mon, 13 Jul 2015 05:45:53 +0100 Al Viro <viro@ZenIV.linux.org.uk>
>>> wrote:
>>>
>>>> On Mon, Jul 13, 2015 at 02:20:59PM +1000, NeilBrown wrote:
>>>>
>>>>> Actually, with that change to pin_kill, this side of things becomes
>>>>> really easy.
>>>>> All expXXX_pin_kill needs to do is call your new cache_delete_entry.
>>>>> If that doesn't cause the entry to be put, then something else has a
>>>>> temporary reference which will be put soon. In any case, pin_kill()
>>>>> will wait long enough, but not indefinitely.
>>>>> No need for kref_get_unless_zero() or any of that.
>>>>
>>>> No. You are seriously misunderstanding what ->kill() is for and what the
>>>> existing instances are doing. Again, there is no promise whatsoever that
>>>> the object containing fs_pin instance will *survive* past ->kill().
>>>> At all.
>>>>
>>>> RTFS, please. What is sorely missing in this recurring patchset is a clear
>>>> description of lifetime rules and ordering (who waits for whom and how long).
>>>> For all the objects involved.
>>>
>>> Good point. Let me try.
>>>
>>> Entries in the sunrpc 'cache' each contain some 'key' fields and some
>>> 'content' fields.
>>>
>>> The key fields are set by the .init() method when the entry is
>>> created, which can happen in a call to sunrpc_cache_lookup() or to
>>> sunrpc_cache_update().
>>>
>>> The content fields are set by the .update() method when a value is
>>> provided for the cache entry. This happens in sunrpc_cache_update();
>>>
>>> A cache entry can be not-valid, negative, or valid.
>>> It starts non-valid when sunrpc_cache_lookup() fails to find the search
>>> key and so creates a new entry (and sets up the key with .init).
>>> It then transitions to either negative or valid.
>>> This can happen through sunrpc_cache_update() or through an error when
>>> instigating an up-call, in which case it goes to negative.
>>> Once it is negative or valid, it stays that way until it is released.
>>> If sunrpc_cache_update is called on an entry that is not not-valid,
>>> then a new entry is created and the old one is marked as expired.
>>> A cache search will find the new one before the old.
>>>
>>> The vfsmount object is involved in two separate caches.
>>> It is part of the content of svc_expkey and part of the key of
>>> svc_export.
>>>
>>> An svc_expkey entry is only ever held transiently. It is held while an
>>> update is being processed, and it is held briefly while mapping a
>>> filehandle to a mnt+dentry.
>>> Firstly part of the filehandle is used to acccess the svc_expkey cache
>>> to get the vfsmnt. Then that vfsmnt plus the client identification is
>>> looked up in the svc_export cache to find the export options. Then the
>>> svc_expkey cache entry is released.
>>>
>>> So it is only held during a lookup of another cache. This can take an
>>> arbitrarily long time as the lookup can go to rpc.mountd in user-space.
>>>
>>>
>>> The svc_export cache entry can be held for the duration of a single NFS
>>> request. It is stored in the 'struct svc_fh' file handle structure
>>> which is release at the end of handling the request.
>>>
>>> The vfsmnt and dentry are only "used" to validate the filehandle and
>>> then while that filehandle is still active.
>>>
>>>
>>> To avoid having unmount hang while nfsd is performing an upcall to
>>> mountd, we need to legitimize the vfsmnt in the svc_expkey. If that
>>> fails, exp_find_key() can fail and we would never perform the lookup on
>>> svc_export.
>>>
>>> If it succeeds, then the legitimacy can be handed over to the svc_export
>>> cache entry, which could then continue to own it, or could hand it on
>>> to the svc_fh.
>>>
>>> The latter is *probably* cleanest.
>>> i.e. an svc_fh should always own a reference to exp->ex_path.mnt, and
>>> fh_put must put it.
>>
>> I don't agree adding new argument (eg, fh_vfsmnt) in svc_fh.
>
> I wasn't suggesting that a new field be added to svc_fh.
> Just that if svc_fh->fh_export was not NULL, then the svc_fh "owned" a
> reference to svc_fh->fh_export->ex_path.mnt which it had to mnt_put()
> when it released ->fh_export.
>
> So fh_put would need to change, but not much else.
>
> It isn't the only way to handle that references - it just seemed the
> neatest as I was writing the description. Something else might work
> better in the code.
Got it, thanks for your comments.
>
>>
>> With it, should nfsd using fh_vfsmnt always, never using exp->ex_path.mnt
>> outside of export.c/export.h ?
>>
>> If choose fh_vfsmnt, so many codes need be updated, especially functions.
>> If exp->ex_path.mnt, the new argument fh_vfsmnt seems redundant.
>>
>> Thanks for your work.
>>
>> It reminders a new method,
>>
>> 1. There are only one outlet from each cache, exp_find_key() for expkey,
>> exp_get_by_name() for export.
>> 2. Any fsid to export or filehandle to export will call the function.
>> 3. exp_get()/exp_put() increase/decrease the reference of export.
>>
>> Like the fh_vfsmnt (not same), call legitimize_mntget() in the only
>> outlet function exp_find_key()/exp_get_by_name(), if fail return STALE,
>> otherwise, any valid expkey/export from the cache is validated (Have
>> get the reference of vfsmnt).
>>
>> Add mntget() in exp_get() and mntput() in exp_put(), because the export
>> passed to exp_get/exp_put are returned from exp_find_key/exp_get_by_name.
>>
>>>
>>> exp_find_key needs to legitimize ek->ek_path.mnt, so a successful
>>> return from exp_find implies an active refernece to ->ex_path.mnt.
>>> If exp_find fails, it needs to mnt_put(ek->ek_path.mnt).
>>
>> Yes, it's great.
>>
>>> All callers of exp_find need to mnt_put(exp->ex_path.mnt) when they
>>> decide not to use the exp, and must otherwise store it in an svc_fh.
>>>
>>> With this, pin_kill() should only need to wait for exp_find_key() to
>>> discover that it cannot legitimize the mount, or for expkey_path() to
>>> replace the key via sunrpc_cache_update(), or maybe for cache_clean()
>>> to discard an old entry.
>>>
>>> Hopefully that makes it all clear.
>>
>> Yes, thanks again.
>>
>> With my method, for expkey cache,
>> 1. At first, a fsid is passed to exp_find_key, and lookup a cache
>> in svc_expkey_lookup, if success, ekey->ek_path is pined to mount.
>> 2. Then call legitimize_mntget getting a reference of vfsmnt
>> before return from exp_find_key.
>> 3. Any calling exp_find_key with valid cache must put the vfsmnt.
>>
>> for export cache,
>> 1. At first, a path (returned from exp_find_key) with validate vfsmnt
>> is passed to exp_get_by_name, if success, exp->ex_path is pined to mount.
>> 2. Then call legitimize_mntget getting a reference of vfsmnt
>> before return from exp_get_by_name.
>
> I don't see any point in calling legitimise_mntget here. exp_find_key
> already did the 'legitimize' bit so there is no need to do it again.
I just think they are in two logical.
But, does export cache contains a different vfsmnt as expkey exist?
thanks,
Kinglong Mee
next prev parent reply other threads:[~2015-07-27 3:17 UTC|newest]
Thread overview: 68+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-07-11 12:46 [PATCH 00/10 v7] NFSD: Pin to vfsmount for nfsd exports cache Kinglong Mee
2015-07-11 12:46 ` Kinglong Mee
2015-07-11 12:47 ` [PATCH 01/10 v7] fs_pin: Initialize value for fs_pin explicitly Kinglong Mee
2015-07-11 12:47 ` Kinglong Mee
2015-07-11 12:47 ` [PATCH 02/10 v7] fs_pin: Export functions for specific filesystem Kinglong Mee
2015-07-11 12:47 ` Kinglong Mee
2015-07-11 12:48 ` [PATCH 03/10 v7] path: New helpers path_get_pin/path_put_unpin for path pin Kinglong Mee
2015-07-11 12:48 ` Kinglong Mee
2015-07-11 12:48 ` [PATCH 04/10 v7] fs: New helper legitimize_mntget() for getting a legitimize mnt Kinglong Mee
2015-07-11 12:48 ` Kinglong Mee
2015-07-11 12:49 ` [PATCH 05/10 v7] sunrpc: Store cache_detail in seq_file's private, directly Kinglong Mee
2015-07-11 12:49 ` [PATCH 06/10 v7] sunrpc/nfsd: Remove redundant code by exports seq_operations functions Kinglong Mee
2015-07-11 12:50 ` [PATCH 07/10 v7] sunrpc: Switch to using list_head instead single list Kinglong Mee
2015-07-11 12:50 ` Kinglong Mee
2015-07-11 12:54 ` Christoph Hellwig
2015-07-11 12:54 ` Christoph Hellwig
2015-07-13 1:30 ` NeilBrown
2015-07-13 8:27 ` Kinglong Mee
2015-07-13 8:27 ` Kinglong Mee
2015-07-11 12:51 ` [PATCH 08/10 v7] sunrpc: New helper cache_delete_entry for deleting cache_head directly Kinglong Mee
2015-07-11 12:51 ` [PATCH 09/10 v7] sunrpc: Support get_ref/put_ref for reference change in cache_head Kinglong Mee
2015-07-11 12:51 ` Kinglong Mee
2015-07-11 12:52 ` [PATCH 10/10 v7] nfsd: Allows user un-mounting filesystem where nfsd exports base on Kinglong Mee
2015-07-11 12:52 ` Kinglong Mee
2015-07-13 3:39 ` NeilBrown
2015-07-13 3:39 ` NeilBrown
2015-07-13 4:02 ` Al Viro
2015-07-13 5:19 ` NeilBrown
2015-07-13 5:19 ` NeilBrown
2015-07-13 6:02 ` Al Viro
2015-07-13 6:02 ` Al Viro
2015-07-13 4:20 ` NeilBrown
2015-07-13 4:45 ` Al Viro
2015-07-13 4:45 ` Al Viro
2015-07-13 5:21 ` NeilBrown
2015-07-13 5:21 ` NeilBrown
2015-07-13 6:02 ` NeilBrown
2015-07-13 6:02 ` NeilBrown
2015-07-13 6:08 ` Al Viro
2015-07-13 6:08 ` Al Viro
2015-07-13 6:32 ` NeilBrown
2015-07-13 6:32 ` NeilBrown
2015-07-13 6:43 ` Al Viro
2015-07-13 6:43 ` Al Viro
2015-07-15 3:49 ` NeilBrown
2015-07-15 4:57 ` Al Viro
2015-07-15 4:57 ` Al Viro
2015-07-15 6:51 ` NeilBrown
2015-07-24 2:05 ` NeilBrown
2015-07-27 2:28 ` Kinglong Mee
2015-07-27 2:51 ` NeilBrown
2015-07-27 2:51 ` NeilBrown
2015-07-27 3:17 ` Kinglong Mee [this message]
2015-07-15 21:07 ` J. Bruce Fields
2015-07-15 21:07 ` J. Bruce Fields
2015-07-15 23:40 ` NeilBrown
2015-07-15 23:40 ` NeilBrown
2015-07-16 20:51 ` J. Bruce Fields
2015-07-16 20:51 ` J. Bruce Fields
2015-07-21 21:58 ` NeilBrown
2015-07-21 21:58 ` NeilBrown
2015-07-22 15:08 ` J. Bruce Fields
2015-07-22 15:08 ` J. Bruce Fields
2015-07-23 23:46 ` export table lookup: was " NeilBrown
2015-07-23 23:46 ` NeilBrown
2015-07-24 19:48 ` J. Bruce Fields
2015-07-24 19:48 ` J. Bruce Fields
2015-07-25 0:40 ` NeilBrown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=55B5A2D5.4000004@gmail.com \
--to=kinglongmee@gmail.com \
--cc=bfields@fieldses.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=neilb@suse.com \
--cc=trond.myklebust@primarydata.com \
--cc=viro@ZenIV.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.