linux-nfs.vger.kernel.org archive mirror
From: "bfields@fieldses.org" <bfields@fieldses.org>
To: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: "Trond.Myklebust@netapp.com" <Trond.Myklebust@netapp.com>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: Grace period
Date: Mon, 9 Apr 2012 19:26:18 -0400
Message-ID: <20120409232618.GI10508@fieldses.org>
In-Reply-To: <4F82C6E3.3030009@parallels.com>

On Mon, Apr 09, 2012 at 03:24:19PM +0400, Stanislav Kinsbursky wrote:
> On 07.04.2012 03:40, bfields@fieldses.org wrote:
> >On Fri, Apr 06, 2012 at 09:08:26PM +0400, Stanislav Kinsbursky wrote:
> >>Hello, Bruce.
> >>Could you please clarify the reason why the grace list is used?
> >>I.e. why is a list used instead of, for example, some atomic variable?
> >
> >Like just a reference count?  Yeah, that would be OK.
> >
> >In theory it could provide some sort of debugging help.  (E.g. we could
> >print out the list of "lock managers" currently keeping us in grace.)  I
> >had some idea we'd make those lock manager objects more complicated, and
> >might have more for individual containerized services.
> 
> Could you share this idea, please?
> 
> Anyway, I have nothing against lists. I was just curious why one was used.
> I added Trond and lists to this reply.
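A minimal userspace sketch of the trade-off discussed above, assuming a simplified model rather than the kernel's actual data structures: keeping the "lock managers" on a list answers the same "are we in grace?" question a counter would, but also lets us name who is holding the grace period. The struct layout and the names grace_start(), grace_end(), any_in_grace() and grace_holders() are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of a lock manager; not the kernel's definition. */
struct lock_manager {
    const char *name;            /* e.g. "lockd" or "nfsd4" */
    struct lock_manager *next;   /* singly linked for brevity */
    bool queued;                 /* true while on grace_list */
};

static struct lock_manager *grace_list;  /* managers currently in grace */

/* Put a manager on the list: it now keeps us in grace. */
void grace_start(struct lock_manager *lm)
{
    assert(!lm->queued);
    lm->queued = true;
    lm->next = grace_list;
    grace_list = lm;
}

/* Take a manager off the list; a no-op if it is not queued. */
void grace_end(struct lock_manager *lm)
{
    for (struct lock_manager **pp = &grace_list; *pp; pp = &(*pp)->next) {
        if (*pp == lm) {
            *pp = lm->next;
            lm->next = NULL;
            lm->queued = false;
            return;
        }
    }
}

/* The check itself needs nothing a reference count could not provide. */
bool any_in_grace(void)
{
    return grace_list != NULL;
}

/*
 * The debugging help a bare counter cannot give: write the names of the
 * current holders into buf, space separated. Returns how many names
 * were written in full; stops at the first name that does not fit.
 */
size_t grace_holders(char *buf, size_t len)
{
    size_t n = 0, off = 0;

    if (len)
        buf[0] = '\0';
    for (const struct lock_manager *lm = grace_list; lm; lm = lm->next) {
        int w = snprintf(buf + off, len - off, "%s%s", n ? " " : "", lm->name);
        if (w < 0 || (size_t)w >= len - off)
            break;
        off += (size_t)w;
        n++;
    }
    return n;
}
```

If only any_in_grace() is ever needed, a counter would do the same job; the list earns its keep only once something like grace_holders() is wanted. The sketch has no locking, whereas the real code protects its list with "grace_lock".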
> 
> Let me explain the problem with the grace period that I'm facing
> right now, and what I'm thinking about it.
> So, one of the things to be containerized during the "NFSd per net ns"
> work is the grace period, and these are its basic components:
> 1) Grace period start.
> 2) Grace period end.
> 3) Grace period check.
> 4) Grace period restart.

For restart, you're thinking of the fs/lockd/svc.c:restart_grace()
that's called on a signal in lockd()?

I wonder if there's any way to figure out if that's actually used by
anyone?  (E.g. by any distro init scripts).  It strikes me as possibly
impossible to use correctly.  Perhaps we could deprecate it....

> So, the simplest straightforward way is to make all the internal stuff
> ("grace_list", "grace_lock", the "grace_period_end" work, and both
> "lockd_manager" and "nfsd4_manager") per network namespace. Also,
> "laundromat_work" has to be per-net as well.
> In this case:
> 1) Start - the grace period can be started per net ns in
> "lockd_up_net()" (thus it has to be moved there from "lockd()") and
> "nfs4_state_start()".
> 2) End - the grace period can be ended per net ns in "lockd_down_net()"
> (thus it has to be moved there from "lockd()"), "nfsd4_end_grace()" and
> "nfs4_state_shutdown()".
> 3) Check - looks easy. Either the svc_rqst or the net context can
> be passed to the function.
> 4) Restart - this is a tricky place. It would be great to restart
> the grace period only for the network namespace of the sender of the
> kill signal. So, the idea is to check siginfo_t for the pid of the
> sender, then try to locate the task, and if found, get the sender's
> network namespace, and restart the grace period only for this namespace
> (of course, if lockd was started for this namespace - see below).
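The four components above can be sketched as a small userspace model, assuming each network namespace is identified by a small integer and only lockd is modelled (no nfsd4 manager, no laundromat, no timers, no locking). Every name here (struct grace_ns, ns_table, lockd_up_ns(), end_grace_ns(), lockd_down_ns(), in_grace_ns(), restart_grace_ns()) is hypothetical and not an existing kernel interface.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_NS 4  /* hypothetical fixed table; a kernel would use per-net data */

/* Hypothetical per-namespace grace state; models lockd only. */
struct grace_ns {
    bool lockd_up;   /* lockd was started for this namespace */
    bool in_grace;   /* a grace period is currently running here */
};

static struct grace_ns ns_table[MAX_NS];

/* Look up a namespace's state; NULL for an unknown namespace. */
static struct grace_ns *ns_get(int ns)
{
    return (ns >= 0 && ns < MAX_NS) ? &ns_table[ns] : NULL;
}

/* 1) Start: bringing lockd up in a namespace begins its grace period. */
bool lockd_up_ns(int ns)
{
    struct grace_ns *g = ns_get(ns);

    if (!g)
        return false;
    g->lockd_up = true;
    g->in_grace = true;
    return true;
}

/* 2) End: the period expires for one namespace only. */
void end_grace_ns(int ns)
{
    struct grace_ns *g = ns_get(ns);

    if (g)
        g->in_grace = false;
}

/* 2) End, second path: taking lockd down also ends the period. */
void lockd_down_ns(int ns)
{
    struct grace_ns *g = ns_get(ns);

    if (g) {
        g->lockd_up = false;
        g->in_grace = false;
    }
}

/* 3) Check: the caller passes the namespace it is serving. */
bool in_grace_ns(int ns)
{
    const struct grace_ns *g = ns_get(ns);

    return g && g->in_grace;
}

/*
 * 4) Restart: only for the sender's namespace. Returns false, changing
 * nothing, when the namespace is unknown or lockd was never started
 * there (the "silently dropped" option).
 */
bool restart_grace_ns(int sender_ns)
{
    struct grace_ns *g = ns_get(sender_ns);

    if (!g || !g->lockd_up)
        return false;
    g->in_grace = true;
    return true;
}
```

The sketch takes the sender's namespace as an argument and so sidesteps the hard part: mapping the pid in siginfo_t back to a namespace. How a failed lookup should be handled remains the open question.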

If it's really the signalling that's the problem--perhaps we can get
away from the signal-based interface.

At least in the case of lockd I suspect we could.

Or perhaps the decision to share a single lockd thread (or set of nfsd
threads) among multiple network namespaces was a poor one.  But I
realize multithreading lockd doesn't look easy.

--b.

> If the task is not found, or if lockd wasn't started for its namespace,
> then the grace period can be either restarted for all namespaces, or
> just silently dropped. This is the place where I'm not sure what to
> do, because restarting the grace period for all namespaces would be
> overkill...
> 
> There is also another problem with the "task by pid" search: the task
> found may not actually be the sender (which has already died), but some
> other new task with the same pid number. In this case, I think, we can
> just neglect this probability and always assume that we located the
> sender (if, of course, lockd was started for the sender's network
> namespace).
> 
> Trond, Bruce, could you please comment on these ideas?
> 
> -- 
> Best regards,
> Stanislav Kinsbursky
