From: "J. Bruce Fields" <bfields@fieldses.org>
To: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Miklos Szeredi <miklos@szeredi.hu>,
	Marc Eshel <eshel@almaden.ibm.com>,
	neilb@suse.de, akpm@linux-foundation.org,
	linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: nfs: infinite loop in fcntl(F_SETLKW)
Date: Thu, 10 Apr 2008 17:54:10 -0400
Message-ID: <20080410215410.GF22324@fieldses.org>
In-Reply-To: <1207862436.8180.30.camel@heimdal.trondhjem.org>

On Thu, Apr 10, 2008 at 05:20:36PM -0400, Trond Myklebust wrote:
> 
> On Thu, 2008-04-10 at 17:07 -0400, Trond Myklebust wrote:
> > On Thu, 2008-04-10 at 17:02 -0400, Trond Myklebust wrote:
> > > On Thu, 2008-04-10 at 21:51 +0200, Miklos Szeredi wrote:
> > > > Another infinite loop, this one involving both client and server.
> > > > 
> > > > Basically what happens is that on the server nlm_fopen() calls
> > > > nfsd_open() which returns -EACCES, to which nlm_fopen() returns
> > > > NLM_LCK_DENIED.
> > > > 
> > > > On the client this will turn into a -EAGAIN (nlm_stat_to_errno()),
> > > > which in turn will cause fcntl_setlk() to retry forever.
> > > > 
> > > > I _think_ the solution is to turn NLM_LCK_DENIED into ENOLCK for
> > > > blocking locks, as NLM_LCK_BLOCKED is for the contended case.  For
> > > > testing the lock, leave NLM_LCK_DENIED as EAGAIN.  That still could be
> > > > misleading, but at least there's no infinite loop in that case.
> > > > 
> > > > I've minimally tested this patch to verify that it cures the lockup,
> > > > and that simple blocking locks keep working.
> > > > 
> > > > Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
> > > > ---
> > > >  fs/lockd/clntproc.c |    3 +++
> > > >  1 file changed, 3 insertions(+)
> > > > 
> > > > Index: linux/fs/lockd/clntproc.c
> > > > ===================================================================
> > > > --- linux.orig/fs/lockd/clntproc.c	2008-04-02 13:34:57.000000000 +0200
> > > > +++ linux/fs/lockd/clntproc.c	2008-04-10 21:23:46.000000000 +0200
> > > > @@ -536,6 +536,9 @@ again:
> > > >  		up_read(&host->h_rwsem);
> > > >  	}
> > > >  	status = nlm_stat_to_errno(resp->status);
> > > > +	/* Don't return EAGAIN, as that would make fcntl_setlk() loop */
> > > > +	if (status == -EAGAIN)
> > > > +		status = -ENOLCK;
> > > >  out_unblock:
> > > >  	nlmclnt_finish_block(block);
> > > >  	/* Cancel the blocked request if it is still pending */
> > > 
> > > 
> > > Wait. There is something really weird going on here.
> > > 
> > > According to the spec, LCK_DENIED means 'the request failed' (i.e.
> > > ENOLCK is definitely correct)
> > > 
> > > OTOH, LCK_DENIED_NOLOCKS and LCK_DENIED_GRACE_PERIOD are both temporary
> > > failures, the first because the server had a resource problem, and the
> > > second because the server rebooted and is in the grace period (i.e.
> > > EAGAIN would appear to be more appropriate). See
> > > 
> > > http://www.opengroup.org/onlinepubs/9629799/chap10.htm#tagcjh_11_02_02_02
> > > 
> > > AFAICS, the correct thing to do is to fix nlm_stat_to_errno() by
> > > swapping the return values for NLM_LCK_DENIED and
> > > NLM_LCK_DENIED_NOLOCKS/NLM_LCK_DENIED_GRACE_PERIOD.
> > > 
> > > The problem is that there appears to be a similar confusion on the Linux
> > > server side in nlmsvc_lock(). :-(
> > 
> > Duh... Sorry, EAGAIN is indeed the correct return value for fcntl() when
> > the lock attempt failed. I should have reread the manpage/posix spec
> > before replying.
> 
> OK. So the correct fix here should really be applied to fcntl_setlk().
> There is absolutely no reason why we should be looping at all if the
> filesystem has a ->lock() method.
> 
> In fact, this looping behaviour was introduced recently in commit
> 7723ec9777d9832849b76475b1a21a2872a40d20.

Apologies, that was indeed a behavioral change introduced in a commit
that claimed just to be shuffling code around.

> Marc, Bruce, why was this
> done, and how are filesystems now supposed to behave?
> 

The assumption must have been that -EAGAIN could only mean that we
needed to keep blocking, and hence was a nonsensical return from a
filesystem lock method that waited itself for the lock to become
available--such a method would return 0, -EINTR (or some more exotic
error), or continue waiting.

If we agree that EAGAIN is actually a legitimate error to return from a
blocking lock, then, yes, we need to take ->lock() back out of this loop.

And I don't think there's any real reason we need the new behavior.  So
we should probably revert that--I'll take a closer look tomorrow....

--b.

Thread overview: 34+ messages
2008-04-09 15:57 [patch] fix infinite loop in generic_file_splice_read() Miklos Szeredi
2008-04-09 17:05 ` Oliver Pinter
2008-04-09 18:57 ` Andrew Morton
2008-04-09 19:25   ` Miklos Szeredi
2008-04-09 19:52   ` Jens Axboe
2008-04-10  6:29   ` Allard Hoeve
2008-04-10 19:51 ` nfs: infinite loop in fcntl(F_SETLKW) Miklos Szeredi
2008-04-10 21:02   ` Trond Myklebust
2008-04-10 21:07     ` Trond Myklebust
2008-04-10 21:20         ` Trond Myklebust
2008-04-10 21:54           ` J. Bruce Fields [this message]
2008-04-11 19:12             ` Miklos Szeredi
2008-04-11 19:19               ` J. Bruce Fields
2008-04-11 19:22                 ` Miklos Szeredi
2008-04-13  0:08               ` J. Bruce Fields
2008-04-13  8:13                 ` Miklos Szeredi
2008-04-14 17:07                   ` J. Bruce Fields
2008-04-14 19:03                     ` [PATCH] locks: fix possible infinite loop in fcntl(F_SETLKW) over nfs J. Bruce Fields
2008-04-13  8:28             ` nfs: infinite loop in fcntl(F_SETLKW) Miklos Szeredi
2008-04-14 17:19               ` J. Bruce Fields
2008-04-14 21:15                 ` Miklos Szeredi
2008-04-15 18:58                   ` J. Bruce Fields
2008-04-16 16:28                     ` Miklos Szeredi
2008-04-17 22:26                       ` J. Bruce Fields
2008-04-18 12:47                         ` Miklos Szeredi