From: Jeff Layton <jlayton@redhat.com>
To: Steve Dickson <SteveD@redhat.com>
Cc: Jeff Layton <jlayton@redhat.com>, linux-nfs@vger.kernel.org
Subject: Re: [PATCH v4 09/11] nfsdcld: reopen pipe if it's deleted and recreated
Date: Thu, 26 Jan 2012 10:41:57 -0500
Message-ID: <20120126104157.671a10c2@corrin.poochiereds.net>
In-Reply-To: <4F2171B5.2030103@RedHat.com>

On Thu, 26 Jan 2012 10:31:01 -0500
Steve Dickson <SteveD@redhat.com> wrote:

> 
> 
> On 01/26/2012 09:30 AM, Jeff Layton wrote:
> > On Thu, 26 Jan 2012 08:28:30 -0500
> > Jeff Layton <jlayton@redhat.com> wrote:
> > 
> >> On Thu, 26 Jan 2012 07:47:51 -0500
> >> Steve Dickson <SteveD@redhat.com> wrote:
> >>
> >>>
> >>>
> >>> On 01/25/2012 06:32 PM, Jeff Layton wrote:
> >>>> On Wed, 25 Jan 2012 17:04:44 -0500
> >>>> Steve Dickson <SteveD@redhat.com> wrote:
> >>>>
> >>>>>
> >>>>>
> >>>>> On 01/25/2012 03:28 PM, Jeff Layton wrote:
> >>>>>> On Wed, 25 Jan 2012 14:31:10 -0500
> >>>>>> Steve Dickson <SteveD@redhat.com> wrote:
> >>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On 01/25/2012 02:09 PM, Jeff Layton wrote:
> >>>>>>>> On Wed, 25 Jan 2012 13:16:24 -0500
> >>>>>>>> Steve Dickson <SteveD@redhat.com> wrote:
> >>>>>>>>
> >>>>>>>>> Hey Jeff,
> >>>>>>>>>
> >>>>>>>>> Comment inline...
> >>>>>>>>>
> >>>>>>>>> On 01/23/2012 03:02 PM, Jeff Layton wrote:
> >>>>>>>>>> This can happen if nfsd is shut down and restarted. If that occurs,
> >>>>>>>>>> then reopen the pipe so we're not waiting for data on the defunct
> >>>>>>>>>> pipe.
> >>>>>>>>>>
> >>>>>>>>>> Signed-off-by: Jeff Layton <jlayton@redhat.com>
> >>>>>>>>>> ---
> >>>>>>>>>>  utils/nfsdcld/nfsdcld.c |   84 +++++++++++++++++++++++++++++++++++++++++-----
> >>>>>>>>>>  1 files changed, 74 insertions(+), 10 deletions(-)
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/utils/nfsdcld/nfsdcld.c b/utils/nfsdcld/nfsdcld.c
> >>>>>>>>>> index b0c08e2..0dc5b37 100644
> >>>>>>>>>> --- a/utils/nfsdcld/nfsdcld.c
> >>>>>>>>>> +++ b/utils/nfsdcld/nfsdcld.c
> >>>>>>>>>> @@ -57,6 +57,8 @@ struct cld_client {
> >>>>>>>>>>  
> >>>>>>>>>>  /* global variables */
> >>>>>>>>>>  static char *pipepath = DEFAULT_CLD_PATH;
> >>>>>>>>>> +static int 		inotify_fd = -1;
> >>>>>>>>>> +static struct event	pipedir_event;
> >>>>>>>>>>  
> >>>>>>>>>>  static struct option longopts[] =
> >>>>>>>>>>  {
> >>>>>>>>>> @@ -68,8 +70,10 @@ static struct option longopts[] =
> >>>>>>>>>>  	{ NULL, 0, 0, 0 },
> >>>>>>>>>>  };
> >>>>>>>>>>  
> >>>>>>>>>> +
> >>>>>>>>>>  /* forward declarations */
> >>>>>>>>>>  static void cldcb(int UNUSED(fd), short which, void *data);
> >>>>>>>>>> +static void cld_pipe_reopen(struct cld_client *clnt);
> >>>>>>>>>>  
> >>>>>>>>>>  static void
> >>>>>>>>>>  usage(char *progname)
> >>>>>>>>>> @@ -80,10 +84,62 @@ usage(char *progname)
> >>>>>>>>>>  
> >>>>>>>>>>  #define INOTIFY_EVENT_MAX (sizeof(struct inotify_event) + NAME_MAX)
> >>>>>>>>>>  
> >>>>>>>>>> +static void
> >>>>>>>>>> +cld_inotify_cb(int UNUSED(fd), short which, void *data)
> >>>>>>>>>> +{
> >>>>>>>>>> +	int ret, oldfd;
> >>>>>>>>>> +	char evbuf[INOTIFY_EVENT_MAX];
> >>>>>>>>>> +	char *dirc = NULL, *pname;
> >>>>>>>>>> +	struct inotify_event *event = (struct inotify_event *)evbuf;
> >>>>>>>>>> +	struct cld_client *clnt = data;
> >>>>>>>>>> +
> >>>>>>>>>> +	if (which != EV_READ)
> >>>>>>>>>> +		return;
> >>>>>>>>>> +
> >>>>>>>>>> +	dirc = strndup(pipepath, PATH_MAX);
> >>>>>>>>>> +	if (!dirc) {
> >>>>>>>>>> +		xlog_err("%s: unable to allocate memory", __func__);
> >>>>>>>>>> +		goto out;
> >>>>>>>>>> +	}
> >>>>>>>>>> +
> >>>>>>>>>> +	ret = read(inotify_fd, evbuf, INOTIFY_EVENT_MAX);
> >>>>>>>>>> +	if (ret < 0) {
> >>>>>>>>>> +		xlog_err("%s: read from inotify fd failed: %m", __func__);
> >>>>>>>>>> +		goto out;
> >>>>>>>>>> +	}
> >>>>>>>>>> +
> >>>>>>>>>> +	/* check to see if we have a filename in the evbuf */
> >>>>>>>>>> +	if (!event->len)
> >>>>>>>>>> +		goto out;
> >>>>>>>>>> +
> >>>>>>>>>> +	pname = basename(dirc);
> >>>>>>>>>> +
> >>>>>>>>>> +	/* does the filename match our pipe? */
> >>>>>>>>>> +	if (strncmp(pname, event->name, event->len))
> >>>>>>>>>> +		goto out;
> >>>>>>>>>> +
> >>>>>>>>>> +	/*
> >>>>>>>>>> +	 * reopen the pipe. The old fd is not closed until the new one is
> >>>>>>>>>> +	 * opened, so we know they should be different if the reopen is
> >>>>>>>>>> +	 * successful.
> >>>>>>>>>> +	 */
> >>>>>>>>>> +	oldfd = clnt->cl_fd;
> >>>>>>>>>> +	do {
> >>>>>>>>>> +		cld_pipe_reopen(clnt);
> >>>>>>>>>> +	} while (oldfd == clnt->cl_fd);
> >>>>>>>>> Doesn't this have a potential for an infinite loop? 
> >>>>>>>>>
> >>>>>>>>> steved.  
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Yes. If reopening the new pipe continually fails then it will loop
> >>>>>>>> forever.
> >>>>>>> Would it be more accurate to say it would be spinning forever? 
> >>>>>>> Since there is no sleep or delay in cld_pipe_reopen, what's
> >>>>>>> going to stop the daemon from spinning in a CPU bound loop?
> >>>>>>>
> >>>>>>
> >>>>>> Well, not spinning in a userspace loop...it'll continually be cycling on
> >>>>>> an open() call that's not working for whatever reason. We sort of have
> >>>>>> to loop on that though. I think the best we can do is add a sleep(1) in
> >>>>>> there or something. Would that be sufficient?
> >>>>>>
> >>>>> I still think it's going to needlessly suck up CPU cycles...
> >>>>>
> >>>>> The way I handled this in the rpc.idmapd daemon was to do the
> >>>>> reopen on a SIGHUP signal. Then in the NFS server initscript
> >>>>> I did the following:
> >>>>>     /usr/bin/pkill -HUP rpc.idmapd
> >>>>>
> >>>>> Thoughts?
> >>>>>
> >>>>
> >>>> Ugh, that requires manual intervention if the pipe is removed and
> >>>> recreated. If someone restarts nfsd and doesn't send the signal, then
> >>>> they won't get the upcalls. I'd prefer something that "just works".
> >>> I have not seen any bz open saying rpc.idmapd doesn't just work... 
> >>>
> >>>>
> >>>> Seriously, is it that big a deal to just loop here? One open(2) call
> >>>> every second doesn't seem that bad, and honestly if a new pipe pops up
> >>>> and the daemon can't open it then a few CPU cycles is the least of your
> >>>> worries.
> >>>>
> >>> Put the daemon in that loop and then run the top command in another
> >>> window. If the daemon is at the top of the list then it is a big
> >>> deal, because that daemon will stay at the top forever for no reason
> >>> in the case of the NFS server not coming back.
> >>>
> >>
> >> This situation is really unlikely. The daemon does not reopen the pipe
> >> when the old one goes away. It reopens it when a new one with the same
> >> name is recreated in the directory.
> >>
> >> That's an important distinction because in order to get into this loop,
> >> you'd need to:
> >>
> >> 1/ remove the old pipe -- this happens when the daemon is shut down
> Just to be clear, when this happens, that while loop is *not* executed.
> 
> >>
> >> 2/ create a new pipe -- this happens when the daemon is restarted
> Then when this happens that while loop is *not* executed.
> 
> >>
> > 
> > To clarify, the above happens when knfsd is stopped and started...
> > 
> >> 3/ not be able to open the new pipe for some reason, even though you
> >> were able to open the old one
> Only when 1, 2, and 3 happen synchronously will that while loop be executed, correct?
> 
> More to the point, stopping the server will *not* cause this while
> loop to be executed until the server is restarted, correct?
> 

Correct. When knfsd starts back up, a new pipe will be created. At that
point, the daemon will try to open the new one and will then close the
old if the open succeeds. It will only loop here if that open fails for
some reason.
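
A minimal sketch of that ordering, for illustration only. The struct and
function names here are hypothetical stand-ins, not the code from the
patch, and the O_RDWR | O_NONBLOCK open flags are an assumption. The
point is that the new descriptor is opened before the old one is closed,
so a failed open leaves the old fd in place and a successful one always
yields a different fd number:

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical stand-in for the daemon's client struct; only the fd matters. */
struct pipe_client {
	int fd;
};

/*
 * Open the pipe at `path` and, only if that succeeds, close the old
 * descriptor and swap in the new one. Returns 0 on success, or -1 on
 * failure, in which case clnt->fd is left untouched.
 */
static int
pipe_reopen(struct pipe_client *clnt, const char *path)
{
	/* Assumed flags; the real daemon may open the pipe differently. */
	int newfd = open(path, O_RDWR | O_NONBLOCK);

	if (newfd < 0)
		return -1;

	/* The old fd is still open here, so newfd cannot reuse its number. */
	if (clnt->fd >= 0)
		close(clnt->fd);
	clnt->fd = newfd;
	return 0;
}
```

With this shape, the caller can tell success from failure by the return
value rather than by comparing the old and new fd numbers.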

> >>
> >> The reason I put this in a loop is because it's possible (though not
> >> likely) that you'd hit condition #3 temporarily. In that event, looping
> >> and retrying an open(2) call every second seems entirely reasonable and
> >> is more fault tolerant than just dying here. The open of a pipe takes
> >> much less than 1s, so there's plenty of time between open attempts for
> >> the machine to get other things done.
> By no means am I saying not to make it fault tolerant... Please do! I'm
> just worried about the daemon spinning out of control.. :-)
> 

Agreed. Adding a small sleep between open attempts should keep it from
going crazy if reopening the pipe continually fails. I should probably
also ratelimit the log message when the reopen fails. I'll plan to
add that in the next iteration.
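
Roughly, the retry could look like the sketch below. This is not the
code for the next iteration, just an illustration: the function name,
the try_open callback, the LOG_INTERVAL value and the max_tries
parameter are all assumptions made so the example is self-contained.
The demo_open stub only exists to exercise the loop and returns a dummy
number rather than a real descriptor.

```c
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define LOG_INTERVAL 30	/* assumed: seconds between repeated failure messages */

/*
 * Call try_open() until it returns a valid fd, sleeping delay_sec
 * seconds between attempts. A failure is logged at most once per
 * LOG_INTERVAL seconds. Returns the new fd, or -1 once max_tries
 * attempts have failed (pass 0 for max_tries to retry forever).
 */
static int
reopen_with_delay(int (*try_open)(void), unsigned int delay_sec,
		  unsigned int max_tries)
{
	time_t last_log = 0;
	unsigned int tries = 0;

	for (;;) {
		int fd = try_open();
		time_t now;

		if (fd >= 0)
			return fd;

		/* Rate-limit the log message so a long outage doesn't flood it. */
		now = time(NULL);
		if (last_log == 0 || now - last_log >= LOG_INTERVAL) {
			fprintf(stderr, "pipe reopen failed, retrying\n");
			last_log = now;
		}

		if (max_tries && ++tries >= max_tries)
			return -1;

		/* Sleep so the daemon doesn't sit in a CPU-bound loop. */
		sleep(delay_sec);
	}
}

/* Demo stub: fails twice, then "succeeds" with a dummy descriptor number. */
static int demo_calls;

static int
demo_open(void)
{
	return ++demo_calls < 3 ? -1 : 42;
}
```

Calling reopen_with_delay(demo_open, 1, 0) would make three attempts
about a second apart and log the failure once.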

> >>
> >> If it turns out that there's a problem, the admin can shut down the
> >> daemon at that point. They may need to do so anyway in order to resolve
> >> the situation if the thing preventing the opening of the pipe isn't
> >> temporary.
> I guess I would rather figure this out now, during the design, than
> after the bits hit the street... 
> 

Yep, definitely.

Thanks for the review!
-- 
Jeff Layton <jlayton@redhat.com>

