From: Stanislav Kinsbursky <skinsbursky@parallels.com>
To: "bfields@fieldses.org" <bfields@fieldses.org>
Cc: "Trond.Myklebust@netapp.com" <Trond.Myklebust@netapp.com>,
"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: Grace period
Date: Tue, 10 Apr 2012 18:10:27 +0400
Message-ID: <4F843F53.9050301@parallels.com>
In-Reply-To: <20120410133735.GE18465@fieldses.org>
On 10.04.2012 17:37, bfields@fieldses.org wrote:
> On Tue, Apr 10, 2012 at 03:29:11PM +0400, Stanislav Kinsbursky wrote:
>> On 10.04.2012 03:26, bfields@fieldses.org wrote:
>>> On Mon, Apr 09, 2012 at 03:24:19PM +0400, Stanislav Kinsbursky wrote:
>>>> On 07.04.2012 03:40, bfields@fieldses.org wrote:
>>>>> On Fri, Apr 06, 2012 at 09:08:26PM +0400, Stanislav Kinsbursky wrote:
>>>>>> Hello, Bruce.
>>>>>> Could you please clarify the reason why the grace list is used?
>>>>>> I.e. why is a list used instead of some atomic variable, for example?
>>>>>
>>>>> Like just a reference count? Yeah, that would be OK.
>>>>>
>>>>> In theory it could provide some sort of debugging help. (E.g. we could
>>>>> print out the list of "lock managers" currently keeping us in grace.) I
>>>>> had some idea we'd make those lock manager objects more complicated, and
>>>>> might have more for individual containerized services.
>>>>
>>>> Could you share this idea, please?
>>>>
>>>> Anyway, I have nothing against lists. I was just curious why one was used.
>>>> I added Trond and the lists to this reply.
>>>>
>>>> Let me explain what problem with the grace period I'm facing
>>>> right now, and what I'm thinking about it.
>>>> So, one of the things to be containerized during the "NFSd per net ns"
>>>> work is the grace period, and these are its basic components:
>>>> 1) Grace period start.
>>>> 2) Grace period end.
>>>> 3) Grace period check.
>>>> 4) Grace period restart.
>>>
>>> For restart, you're thinking of the fs/lockd/svc.c:restart_grace()
>>> that's called on a signal in lockd()?
>>>
>>> I wonder if there's any way to figure out if that's actually used by
>>> anyone? (E.g. by any distro init scripts). It strikes me as possibly
>>> impossible to use correctly. Perhaps we could deprecate it....
>>>
>>
>> Or (since the lockd kthread is visible only from the initial pid namespace)
>> we can just hardcode "init_net" in this case. But it means that
>> this "kill" logic will be broken if two containers share one pid
>> namespace but have separate network namespaces.
>> Anyway, both solutions (this one or Bruce's) suit me.
>>
>>>> So, the simplest, most straightforward way is to make all the internal
>>>> stuff ("grace_list", "grace_lock", "grace_period_end" work, and both
>>>> "lockd_manager" and "nfsd4_manager") per network namespace. Also,
>>>> "laundromat_work" has to be per-net as well.
>>>> In this case:
>>>> 1) Start - the grace period can be started per net ns in
>>>> "lockd_up_net()" (thus it has to be moved there from "lockd()") and
>>>> "nfs4_state_start()".
>>>> 2) End - the grace period can be ended per net ns in "lockd_down_net()"
>>>> (thus it has to be moved there from "lockd()"), "nfsd4_end_grace()" and
>>>> "nfs4_state_shutdown()".
>>>> 3) Check - looks easy. Either an svc_rqst or a net context can
>>>> be passed to the function.
>>>> 4) Restart - this is the tricky part. It would be great to restart
>>>> the grace period only for the network namespace of the sender of the
>>>> kill signal. So, the idea is to check siginfo_t for the pid of the
>>>> sender, then try to locate the task and, if found, get the sender's
>>>> network namespace and restart the grace period only for this namespace
>>>> (of course, if lockd was started for this namespace - see below).
>>>
>>> If it's really the signalling that's the problem--perhaps we can get
>>> away from the signal-based interface.
>>>
>>> At least in the case of lockd I suspect we could.
>>>
>>
>> I'm ok with that. So, if no objections follow, I'll drop it and
>> send the patch. Or do you want to do it?
>
> Please do go ahead.
>
> The safest approach might be:
> - leave lockd's signal handling there (just accept that it may
> behave incorrectly in container case), assuming that's safe.
> - add a printk ("signalling lockd to restart is deprecated",
> or something) if it's used.
>
> Then eventually we'll remove it entirely.
>
> (But if that doesn't work, it'd likely also be OK just to remove it
> completely now.)
>
Well, I can do this: restart grace only for "init_net", and add a printk with
your message plus a note that it affects only init_net.
Does that look good to you?
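
For illustration, the behaviour proposed above could be modelled roughly as
follows. This is a user-space sketch only, not the fs/lockd/svc.c code:
"struct net_model", "init_net_model", the helper names and the exact warning
text are all hypothetical stand-ins, and fprintf() stands in for printk().
Each namespace carries its own grace state, and the signal path touches only
the initial one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's struct net: only grace state. */
struct net_model {
	const char *name;
	int grace_holders;	/* lock managers keeping this ns in grace */
	bool lockd_in_grace;	/* does lockd currently hold grace here? */
};

/* Assumed model of the initial network namespace (init_net). */
static struct net_model init_net_model = { "init_net", 0, false };

/* 1) Start: lockd enters grace for one namespace only. */
static void lockd_grace_start(struct net_model *net)
{
	if (!net->lockd_in_grace) {
		net->lockd_in_grace = true;
		net->grace_holders++;
	}
}

/* 2) End: lockd leaves grace for one namespace only. */
static void lockd_grace_end(struct net_model *net)
{
	if (net->lockd_in_grace) {
		assert(net->grace_holders > 0);
		net->lockd_in_grace = false;
		net->grace_holders--;
	}
}

/* 3) Check: the caller passes the namespace it is serving. */
static bool in_grace(const struct net_model *net)
{
	return net->grace_holders > 0;
}

/* 4) Restart on signal: warn, then restart grace for init_net only. */
static void restart_grace_on_signal(void)
{
	fprintf(stderr,
		"lockd: signalling lockd to restart is deprecated; only %s is affected\n",
		init_net_model.name);
	lockd_grace_end(&init_net_model);
	lockd_grace_start(&init_net_model);
}
```

With this model, calling restart_grace_on_signal() leaves any other
namespace's grace state untouched, which is exactly the limitation the
warning is meant to announce.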
--
Best regards,
Stanislav Kinsbursky