From: Kinglong Mee <kinglongmee@gmail.com>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: "J. Bruce Fields" <bfields@fieldses.org>,
linux-fsdevel@vger.kernel.org,
"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
NeilBrown <neilb@suse.de>,
Trond Myklebust <trond.myklebust@primarydata.com>,
Steve Dickson <SteveD@redhat.com>,
kinglongmee@gmail.com
Subject: Re: [PATCH 5/5] nfsd: allows user un-mounting filesystem where nfsd exports base on
Date: Sat, 06 Jun 2015 21:38:06 +0800
Message-ID: <5572F7BE.2070300@gmail.com>
In-Reply-To: <20150606022158.GZ7232@ZenIV.linux.org.uk>
On 6/6/2015 10:21 AM, Al Viro wrote:
> On Fri, Jun 05, 2015 at 04:02:13PM +0100, Al Viro wrote:
>> On Sun, May 24, 2015 at 11:10:50PM +0800, Kinglong Mee wrote:
>>> --- a/fs/nfsd/export.c
>>> +++ b/fs/nfsd/export.c
>>> @@ -43,9 +43,9 @@ static void expkey_put(struct kref *ref)
>>>
>>> if (test_bit(CACHE_VALID, &key->h.flags) &&
>>> !test_bit(CACHE_NEGATIVE, &key->h.flags))
>>> - path_put(&key->ek_path);
>>> + path_put_unpin(&key->ek_path, &key->ek_pin);
>>> auth_domain_put(key->ek_client);
>>> - kfree(key);
>>> + kfree_rcu(key, rcu_head);
>>> }
>>
>> That looks wrong. OK, so you want umount() to proceed; fine, no problem
>> with that. However, what happens if the final mntput() hits while you
>> are just approaching that path_put_unpin()? ->kill() will be triggered,
>> and it would bloody better
>> a) make sure that expkey_put() is called for that key if it hadn't
>> already been done and
>> b) do not return until such expkey_put() completes. Including the
>> ones that might have been already entered by the time we'd got to ->kill().
You are right.
Sorry, my patch above does not handle that race.
>>
>> Am I missing something subtle here?
>
> Having looked through that code... It *is* wrong. Note that the normal
> approach is to have pin_remove() called via pin_kill(), directly or triggered
> from group_pin_kill() and/or cleanup_mnt() on the mount it's attached to.
> pin_remove() should never be called outside of ->kill() callbacks. It should
> be called at the point where you are OK with fs being shut down.
Thank you very much for your comments.
I will try to use fs_pin within those restrictions.
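For reference, here is a rough userspace model of the fs_pin protocol Al describes, based on my reading of fs/fs_pin.c: the first pin_kill() caller sets done to -1 and runs ->kill(); later callers wait until ->kill() calls pin_remove(), which sets done to 1. The pin_model type and the model_* functions are hypothetical. A pthread mutex and condition variable replace the pin's wait queue lock, and RCU is not modelled at all.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical userspace model of struct fs_pin. */
struct pin_model {
	pthread_mutex_t lock;   /* stands in for the pin's wait.lock */
	pthread_cond_t cond;    /* stands in for the pin's wait queue */
	int done;               /* 0: live, -1: being killed, 1: removed */
	int kill_calls;         /* instrumentation for the example only */
	void (*kill)(struct pin_model *);
};

/* Models pin_remove(): mark the pin gone and wake every waiter. */
static void model_pin_remove(struct pin_model *p)
{
	pthread_mutex_lock(&p->lock);
	/* model-only check: pin_remove() belongs inside ->kill() */
	assert(p->done == -1);
	p->done = 1;
	pthread_cond_broadcast(&p->cond);
	pthread_mutex_unlock(&p->lock);
}

/* Models pin_kill(): the first caller runs ->kill(); later callers
 * block until that callback has called model_pin_remove(). */
static void model_pin_kill(struct pin_model *p)
{
	pthread_mutex_lock(&p->lock);
	if (p->done == 0) {
		p->done = -1;
		pthread_mutex_unlock(&p->lock);
		p->kill(p);
		return;
	}
	while (p->done < 0)
		pthread_cond_wait(&p->cond, &p->lock);
	pthread_mutex_unlock(&p->lock);
}

/* Example ->kill(): release what the pin protects, then remove it. */
static void example_kill(struct pin_model *p)
{
	p->kill_calls++;
	model_pin_remove(p);
}
```

Calling model_pin_kill() twice on the same pin runs example_kill() only once; the second call sees done == 1 and returns immediately.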
>
> The fundamental reason why it's broken is different, though - you *can't*
> grab a reference if all you've got is a pin. By the time the callback is
> called, the mount in question is already irretrievably committed to being
> killed. There's one hell of a wide window between the point of no return
> and the point where you are notified of anything, and that's by design -
> you might very well have had several mounts doomed by a syscall and they
> all get through cleanup_mnt() just before return to userland. One by one.
> So between the point where this puppy is doomed and the call of your callback
> there might have been several filesystems going through shutdown, with tons
> of IO, waiting for remote servers, etc.
>
> We could add a primitive that would _try_ to grab a reference - that can
> be done (lock_mount_hash(), check if it has MNT_DOOMED or MNT_SYNC_UMOUNT,
> fail if it does, otherwise mnt_add_count(mnt, 1) and succeed, doing
> unlock_mount_hash() on both exit paths). HOWEVER, you'll need to think
> very carefully where to use that primitive - unlike mntget() it _can_
> fail and lock_mount_hash() can inflict quite a bit of cacheline pingpong
> if used heavily.
Do you mean adding a new primitive?
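A minimal userspace sketch of the primitive Al outlines, to make the failure case concrete. model_try_mntget(), the MODEL_MNT_* flags and model_mount_lock are hypothetical stand-ins for a try-variant of mntget(), MNT_DOOMED, MNT_SYNC_UMOUNT and the lock taken by lock_mount_hash(). A plain mutex is used, so the cacheline cost Al warns about is not modelled.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical flag values; the real MNT_* bits differ. */
#define MODEL_MNT_DOOMED       0x1
#define MODEL_MNT_SYNC_UMOUNT  0x2

/* Stand-in for the lock behind lock_mount_hash()/unlock_mount_hash(). */
static pthread_mutex_t model_mount_lock = PTHREAD_MUTEX_INITIALIZER;

struct model_mount {
	int mnt_flags;
	int mnt_count;
};

/* Unlike mntget(), this can fail: once the mount is committed to
 * being killed, no new reference may be taken. */
static bool model_try_mntget(struct model_mount *mnt)
{
	bool ok;

	pthread_mutex_lock(&model_mount_lock);          /* lock_mount_hash() */
	ok = !(mnt->mnt_flags & (MODEL_MNT_DOOMED | MODEL_MNT_SYNC_UMOUNT));
	if (ok)
		mnt->mnt_count++;                       /* mnt_add_count(mnt, 1) */
	pthread_mutex_unlock(&model_mount_lock);        /* both exit paths unlock */
	return ok;
}
```

Once a mount is marked doomed the call fails and the count is left untouched, so every caller needs a fallback for the failure case.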
>
> Could you give details on lifecycle of those objects, including the stages
> at which we might try to grab references? Combination of such primitive with
> a pin (doing just "NULL the references to vfsmount/dentry, do dput() on
> what that dentry used to be and call pin_remove()") might work, if the
> lifecycle is good enough.
NFSD has two caches, expkey and export, both managed by the sunrpc cache
infrastructure. I will only describe export below, since expkey is similar.
struct cache_head {
	struct kref ref;
	... ...
};

struct svc_export {
	struct cache_head h;
	struct path ex_path;
	... ...
};
1. svc_export is reference counted; it is freed when its reference count drops to zero.
2. ex_path must be put when the svc_export is freed (I want to replace the mntget()
   on ex_path's vfsmount with an fs_pin).
3. With fs_pin, there are two paths that can free an svc_export: the normal path
   and the pin_kill() path.
4. On the normal path the reference count has already dropped to zero; on the
   pin_kill() path it may not have. The pin_kill() path decreases the reference
   count indirectly: if it drops to zero, umount goes through the normal path's
   code, which finally frees the svc_export; if it does not, umount must not
   free the svc_export.
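The normal path in points 1 and 2 is the usual kref pattern: the final put invokes a release callback that drops ex_path and frees the object. A userspace sketch, assuming C11 atomics in place of struct kref; model_kref, model_export and model_export_release() are hypothetical names, and path_put() is reduced to a counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct kref. */
struct model_kref {
	atomic_int refcount;
};

static void model_kref_get(struct model_kref *k)
{
	int old = atomic_fetch_add(&k->refcount, 1);

	assert(old > 0);        /* getting a dead object is a bug */
}

/* Like kref_put(): calls release() and returns 1 when the count hits 0. */
static int model_kref_put(struct model_kref *k,
			  void (*release)(struct model_kref *))
{
	if (atomic_fetch_sub(&k->refcount, 1) == 1) {
		release(k);
		return 1;
	}
	return 0;
}

/* Hypothetical cut-down svc_export. */
struct model_export {
	int ex_id;                      /* stands in for the cache key */
	struct model_kref ref;          /* stands in for h.ref */
};

static int model_paths_put;             /* counts simulated path_put() calls */

/* Normal path: the last reference drops ex_path and frees the export. */
static void model_export_release(struct model_kref *k)
{
	struct model_export *exp =
		model_container_of(k, struct model_export, ref);

	model_paths_put++;              /* path_put(&exp->ex_path) in the real code */
	free(exp);
}
```

Only the second put releases the object; until then the export and its path stay alive.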
I am trying to close the window as follows:
struct svc_export {
	struct cache_head h;
	struct path ex_path;
	... ...
	struct fs_pin ex_pin;
	struct rcu_head rcu_head;
	/* For the window between cache_put() and fs unmounting */
	struct completion ex_done;
	struct work_struct ex_work;
};
1. ex_done lets umount wait until the reference count has dropped to zero.
2. ex_work lets umount decrease the reference count indirectly.
3. The normal path does not free the svc_export itself; it calls complete()
   and then goes through the pin_kill() path. svc_export_put() is called when
   the reference count drops to zero:
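static void svc_export_put(struct kref *ref)
{
	struct svc_export *exp = container_of(ref, struct svc_export, h.ref);

	/* pin_kill() must be called under rcu_read_lock(); it drops the lock itself */
	rcu_read_lock();
	/* wake an umount that may be waiting in export_pin_kill() */
	complete(&exp->ex_done);
	pin_kill(&exp->ex_pin);
}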
4. The pin_kill() path schedules ex_work to decrease the reference count, waits
   on ex_done if the count has not yet reached zero, and finally calls
   path_put_unpin() and destroys the svc_export:
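static void export_pin_kill(struct fs_pin *pin)
{
	struct svc_export *exp = container_of(pin, struct svc_export, ex_pin);

	if (!completion_done(&exp->ex_done)) {
		/* references remain: drop one indirectly via ex_work ... */
		schedule_work(&exp->ex_work);
		/* ... and wait until the last reference is gone */
		wait_for_completion(&exp->ex_done);
	}
	/* release ex_path and the pin, then free the export */
	path_put_unpin(&exp->ex_path, &exp->ex_pin);
	svc_export_destroy(exp);
}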
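The ex_done/ex_work handshake in points 1 to 4 can be exercised in userspace with pthreads. This is a simplified sketch under several assumptions: model_completion stands in for struct completion, a thread created with pthread_create() stands in for schedule_work(), the ex_work handler simply drops one reference, and setting destroyed replaces path_put_unpin() plus svc_export_destroy(). It models only the umount side; the normal path's call into pin_kill() after complete() is left out. All model_* names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct completion. */
struct model_completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void model_complete(struct model_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static bool model_completion_done(struct model_completion *c)
{
	bool done;

	pthread_mutex_lock(&c->lock);
	done = c->done;
	pthread_mutex_unlock(&c->lock);
	return done;
}

static void model_wait_for_completion(struct model_completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Hypothetical cut-down svc_export: a count plus the handshake. */
struct model_export {
	pthread_mutex_t ref_lock;
	int refcount;                   /* protected by ref_lock */
	struct model_completion ex_done;
	bool destroyed;
};

/* Like svc_export_put(): the last put signals ex_done instead of freeing. */
static void model_export_put(struct model_export *exp)
{
	bool last;

	pthread_mutex_lock(&exp->ref_lock);
	last = (--exp->refcount == 0);
	pthread_mutex_unlock(&exp->ref_lock);
	if (last)
		model_complete(&exp->ex_done);
}

/* Stand-in for the ex_work handler: drop a reference from another context. */
static void *model_ex_work(void *arg)
{
	model_export_put(arg);
	return NULL;
}

/* Like export_pin_kill(): wait for the last reference, then destroy. */
static void model_export_pin_kill(struct model_export *exp)
{
	if (!model_completion_done(&exp->ex_done)) {
		pthread_t work;
		/* schedule_work(&exp->ex_work) */
		bool threaded = pthread_create(&work, NULL, model_ex_work, exp) == 0;

		if (!threaded)
			model_ex_work(exp);     /* fallback: drop it synchronously */
		model_wait_for_completion(&exp->ex_done);
		if (threaded)
			pthread_join(work, NULL);
	}
	assert(exp->refcount == 0);     /* every reference has been dropped */
	exp->destroyed = true;          /* path_put_unpin() + svc_export_destroy() */
}
```

With a single outstanding reference, model_export_pin_kill() hands that reference to the worker, waits for the completion and then destroys the export; if the last reference was already dropped, it skips the worker entirely. The model joins the worker before returning because the worker may still be inside model_complete() when the waiter wakes.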
The full patches will be sent later. Thanks again.
thanks,
Kinglong Mee
Thread overview: 11+ messages
2015-05-24 15:01 [PATCH 0/4 v2] NFSD: Pin to vfsmount for some nfsd exports cache Kinglong Mee
2015-05-24 15:10 ` [PATCH 1/5 v2] fs_pin: Fix uninitialized value in fs_pin Kinglong Mee
2015-05-24 15:10 ` [PATCH 2/5 v2] fs_pin: Export functions for specific filesystem Kinglong Mee
2015-05-24 15:10 ` [PATCH 3/5 v2] path: New helpers path_get_pin/path_put_unpin for path pin Kinglong Mee
2015-05-24 15:10 ` [PATCH 4/5 v2] sunrpc: New helper cache_force_expire for cache cleanup Kinglong Mee
2015-05-24 15:10 ` [PATCH 5/5] nfsd: allows user un-mounting filesystem where nfsd exports base on Kinglong Mee
2015-06-05 15:02 ` Al Viro
2015-06-06 2:21 ` Al Viro
2015-06-06 13:38 ` Kinglong Mee [this message]
2015-06-01 18:21 ` [PATCH 0/4 v2] NFSD: Pin to vfsmount for some nfsd exports cache J. Bruce Fields
2015-06-02 1:41 ` Kinglong Mee