All of lore.kernel.org
 help / color / mirror / Atom feed
From: "J. Bruce Fields" <bfields@fieldses.org>
To: Jeff Layton <jlayton@redhat.com>
Cc: linux-nfs@vger.kernel.org, teigland@redhat.com
Subject: Re: [PATCH] lockd: handle fl_grant callbacks with coalesced locks (RFC)
Date: Fri, 21 Nov 2008 20:15:55 -0500	[thread overview]
Message-ID: <20081122011555.GA28485@fieldses.org> (raw)
In-Reply-To: <1227130623-31607-1-git-send-email-jlayton@redhat.com>

On Wed, Nov 19, 2008 at 04:37:03PM -0500, Jeff Layton wrote:
> Dave Teigland opened a bug stating that he was having some problems with
> DLM and lockd. Essentially, the problem boiled down to the fact that
> when you do a posix lock of a region that touches another lock, the VFS
> will coalesce the locks and return a lock that encompasses the whole
> range.
> 
> The problem here is that DLM then does a fl_grant callback to lockd with
> the new coalesced lock. The fl_grant callback then looks through all
> of the blocks and eventually returns -ENOENT since none match the
> coalesced lock.

Ugh.

> I'm having a very hard time tracking down info about how the fl_grant
> callback is supposed to work. Is it OK to send an fl_grant callback
> with a lock that's larger than the one requested? If so, then lockd
> needs to account for that possibility. Also, what exactly is the
> purpose of the second arg on fl_grant ("conf" in nlmsvc_grant_deferred)?

It's only used in the case the lock failed, and it can optionally be set
to a copy of the conflicting lock.

> What follows is a patch that changes nlmsvc_grant_deferred to account
> for the possibility that it may receive an fl_grant that has already
> been coalesced. It changes nlmsvc_grant_deferred to walk the entire
> list of blocks and grant any blocks that are covered by the range of
> the lock in the grant callback, if doing so will not conflict with an
> earlier grant.

Hm.  That might work.

> The patch is still very rough and is probably broken in subtle (and
> maybe overt) ways, but it fixes the reproducer that Dave provided. It's
> just intended as a starting point for discussion. Can anyone clarify how
> fl_grant is supposed to work? Who's wrong here? DLM or NLM?

I think this wasn't thought through, apologies.  (It was my
responsibility to make sure it was!)

I also occasionally think that it would be better to keep any actual
lock application the caller's responsibility, to the extent that's
possible--so fl_grant would just be a notification, and it would be up
to lockd to try the lock again and check the result.  That's more or
less the way it always worked before, and it seems a simpler model.

Some work in the filesystem would be required to make sure it would be
ready to return an answer on that second request.

I also think there's a problem with lock cancelling in the current code.

--b.

> 
> Signed-off-by: Jeff Layton <jlayton@redhat.com>
> ---
>  fs/lockd/svclock.c          |   58 ++++++++++++++++++++++++++++++++++---------
>  include/linux/lockd/lockd.h |   35 ++++++++++++++++++++++++++
>  2 files changed, 81 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c
> index 6063a8e..5ecd1db 100644
> --- a/fs/lockd/svclock.c
> +++ b/fs/lockd/svclock.c
> @@ -611,6 +611,35 @@ nlmsvc_cancel_blocked(struct nlm_file *file, struct nlm_lock *lock)
>  }
>  
>  /*
> + * Would granting this block cause a conflict with one already granted? Check
> + * by walking the f_blocks list for the file.
> + */
> +static int
> +nlmsvc_check_overlap(struct nlm_block *block)
> +{
> +	struct nlm_block *fblock, *next;
> +
> +	list_for_each_entry_safe(fblock, next, &block->b_file->f_blocks,
> +				 b_flist) {
> +		/* skip myself */
> +		if (fblock == block)
> +			continue;
> +
> +		/* skip any locks that aren't granted */
> +		if (!fblock->b_granted)
> +			continue;
> +
> +		/* do block and fblock have any overlap in their ranges? */
> +		if (nlm_overlapping_locks(&block->b_call->a_args.lock.fl,
> +					  &fblock->b_call->a_args.lock.fl))
> +			return 1;
> +	}
> +
> +	/* no conflicts found */
> +	return 0;
> +}
> +
> +/*
>   * This is a callback from the filesystem for VFS file lock requests.
>   * It will be used if fl_grant is defined and the filesystem can not
>   * respond to the request immediately.
> @@ -625,9 +654,10 @@ nlmsvc_update_deferred_block(struct nlm_block *block, struct file_lock *conf,
>  			     int result)
>  {
>  	block->b_flags |= B_GOT_CALLBACK;
> -	if (result == 0)
> -		block->b_granted = 1;
> -	else
> +	if (result == 0) {
> +		if (!nlmsvc_check_overlap(block))
> +			block->b_granted = 1;
> +	} else
>  		block->b_flags |= B_TIMED_OUT;
>  	if (conf) {
>  		if (block->b_fl)
> @@ -643,22 +673,26 @@ static int nlmsvc_grant_deferred(struct file_lock *fl, struct file_lock *conf,
>  
>  	lock_kernel();
>  	list_for_each_entry(block, &nlm_blocked, b_list) {
> -		if (nlm_compare_locks(&block->b_call->a_args.lock.fl, fl)) {
> -			dprintk("lockd: nlmsvc_notify_blocked block %p flags %d\n",
> -							block, block->b_flags);
> +		if (nlm_lock_in_range(&block->b_call->a_args.lock.fl, fl)) {
> +			dprintk("lockd: nlmsvc_notify_blocked block %p flags "
> +				"%d\n", block, block->b_flags);
> +
>  			if (block->b_flags & B_QUEUED) {
>  				if (block->b_flags & B_TIMED_OUT) {
> -					rc = -ENOLCK;
> -					break;
> +					if (rc == -ENOENT)
> +						rc = -ENOLCK;
> +					continue;
>  				}
> -				nlmsvc_update_deferred_block(block, conf, result);
> -			} else if (result == 0)
> -				block->b_granted = 1;
> +				nlmsvc_update_deferred_block(block, conf,
> +							     result);
> +			} else if (result == 0) {
> +				if (!nlmsvc_check_overlap(block))
> +					block->b_granted = 1;
> +			}
>  
>  			nlmsvc_insert_block(block, 0);
>  			svc_wake_up(block->b_daemon);
>  			rc = 0;
> -			break;
>  		}
>  	}
>  	unlock_kernel();
> diff --git a/include/linux/lockd/lockd.h b/include/linux/lockd/lockd.h
> index b56d5aa..aa38d47 100644
> --- a/include/linux/lockd/lockd.h
> +++ b/include/linux/lockd/lockd.h
> @@ -367,6 +367,41 @@ static inline int nlm_compare_locks(const struct file_lock *fl1,
>  	     &&(fl1->fl_type  == fl2->fl_type || fl2->fl_type == F_UNLCK);
>  }
>  
> +/*
> + * Compare two NLM locks.
> + * When the second lock is of type F_UNLCK, this acts like a wildcard.
> + * Similar to nlm_compare_locks, but also return true as long as fl1's range
> + * is completely covered by fl2.
> + */
> +static inline int nlm_lock_in_range(const struct file_lock *fl1,
> +				    const struct file_lock *fl2)
> +{
> +	return	fl1->fl_pid   == fl2->fl_pid
> +	     && fl1->fl_owner == fl2->fl_owner
> +	     && fl1->fl_start >= fl2->fl_start
> +	     && fl1->fl_end   <= fl2->fl_end
> +	     && (fl1->fl_type  == fl2->fl_type || fl2->fl_type == F_UNLCK);
> +}
> +
> +static inline int nlm_overlapping_locks(const struct file_lock *fl1,
> +					const struct file_lock *fl2)
> +{
> +	/* does fl1 completely cover fl2? */
> +	if (fl1->fl_start <= fl2->fl_start && fl1->fl_end >= fl2->fl_end)
> +		return 1;
> +
> +	/* does fl2 completely cover fl1 */
> +	if (fl2->fl_start <= fl1->fl_start && fl2->fl_end >= fl1->fl_end)
> +		return 1;
> +
> +	/* does the start or end of fl1 sit inside of fl2? */
> +	if ((fl1->fl_start >= fl2->fl_start && fl1->fl_start <= fl2->fl_end) ||
> +	    (fl1->fl_end >= fl2->fl_start && fl1->fl_end <= fl2->fl_end))
> +		return 1;
> +
> +	return 0;
> +}
> +
>  extern struct lock_manager_operations nlmsvc_lock_operations;
>  
>  #endif /* __KERNEL__ */
> -- 
> 1.5.5.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2008-11-22  1:16 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-11-19 21:37 [PATCH] lockd: handle fl_grant callbacks with coalesced locks (RFC) Jeff Layton
2008-11-22  1:15 ` J. Bruce Fields [this message]
2008-11-24 15:33   ` Jeff Layton
     [not found]     ` <20081124103313.0c779324-RtJpwOs3+0O+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
2008-11-24 17:06       ` J. Bruce Fields
2008-11-25 15:12         ` Jeff Layton
2008-12-13 12:40         ` Jeff Layton
     [not found]           ` <20081213074042.2e8223c3-RtJpwOs3+0O+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
2008-12-16 19:38             ` J. Bruce Fields
2008-12-16 19:56               ` J. Bruce Fields
2008-12-16 21:11                 ` Jeff Layton
     [not found]                   ` <20081216161158.2d173667-RtJpwOs3+0O+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
2008-12-17 19:14                     ` David Teigland
2008-12-17 20:01                       ` J. Bruce Fields
2008-12-17 21:28                         ` David Teigland
2009-01-20 23:05                           ` J. Bruce Fields
2009-01-20 23:15                             ` J. Bruce Fields
2009-01-15 16:30                         ` David Teigland
2009-01-19 22:54                           ` David Teigland

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20081122011555.GA28485@fieldses.org \
    --to=bfields@fieldses.org \
    --cc=jlayton@redhat.com \
    --cc=linux-nfs@vger.kernel.org \
    --cc=teigland@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.