From: "J. Bruce Fields" <bfields@fieldses.org>
To: Jeff Layton <jlayton@redhat.com>
Cc: linux-nfs@vger.kernel.org, teigland@redhat.com
Subject: Re: [PATCH] lockd: handle fl_grant callbacks with coalesced locks (RFC)
Date: Fri, 21 Nov 2008 20:15:55 -0500 [thread overview]
Message-ID: <20081122011555.GA28485@fieldses.org> (raw)
In-Reply-To: <1227130623-31607-1-git-send-email-jlayton@redhat.com>
On Wed, Nov 19, 2008 at 04:37:03PM -0500, Jeff Layton wrote:
> Dave Teigland opened a bug stating that he was having some problems with
> DLM and lockd. Essentially, the problem boiled down to the fact that
> when you do a posix lock of a region that touches another lock, the VFS
> will coalesce the locks and return a lock that encompasses the whole
> range.
>
> The problem here is that DLM then does a fl_grant callback to lockd with
> the new coalesced lock. The fl_grant callback then looks through all
> of the blocks and eventually returns -ENOENT since none match the
> coalesced lock.
Ugh.
> I'm having a very hard time tracking down info about how the fl_grant
> callback is supposed to work. Is it OK to send an fl_grant callback
> with a lock that's larger than the one requested? If so, then lockd
> needs to account for that possibility. Also, what exactly is the
> purpose of the second arg on fl_grant ("conf" in nlmsvc_grant_deferred)?
It's only used in the case the lock failed, and it can optionally be set
to a copy of the conflicting lock.
> What follows is a patch that changes nlmsvc_grant_deferred to account
> for the possibility that it may receive an fl_grant that has already
> been coalesced. It changes nlmsvc_grant_deferred to walk the entire
> list of blocks and grant any blocks that are covered by the range of
> the lock in the grant callback, if doing so will not conflict with an
> earlier grant.
Hm. That might work.
> The patch is still very rough and is probably broken in subtle (and
> maybe overt) ways, but it fixes the reproducer that Dave provided. It's
> just intended as a starting point for discussion. Can anyone clarify how
> fl_grant is supposed to work? Who's wrong here? DLM or NLM?
I think this wasn't thought through, apologies. (It was my
responsibility to make sure it was!)
I also occasionally think that it would be better to keep any actual
lock application the caller's responsibility, to the extent that's
possible--so fl_grant would just be a notification, and it would be up
to lockd to try the lock again and check the result. That's more or
less the way it always worked before, and it seems a simpler model.
Some work in the filesystem would be required to make sure it would be
ready to return an answer on that second request.
I also think there's a problem with lock cancelling in the current code.
--b.
>
> Signed-off-by: Jeff Layton <jlayton@redhat.com>
> ---
> fs/lockd/svclock.c | 58 ++++++++++++++++++++++++++++++++++---------
> include/linux/lockd/lockd.h | 35 ++++++++++++++++++++++++++
> 2 files changed, 81 insertions(+), 12 deletions(-)
>
> diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c
> index 6063a8e..5ecd1db 100644
> --- a/fs/lockd/svclock.c
> +++ b/fs/lockd/svclock.c
> @@ -611,6 +611,35 @@ nlmsvc_cancel_blocked(struct nlm_file *file, struct nlm_lock *lock)
> }
>
> /*
> + * Would granting this block cause a conflict with one already granted? Check
> + * by walking the f_blocks list for the file.
> + */
> +static int
> +nlmsvc_check_overlap(struct nlm_block *block)
> +{
> + struct nlm_block *fblock, *next;
> +
> + list_for_each_entry_safe(fblock, next, &block->b_file->f_blocks,
> + b_flist) {
> + /* skip myself */
> + if (fblock == block)
> + continue;
> +
> + /* skip any locks that aren't granted */
> + if (!fblock->b_granted)
> + continue;
> +
> + /* do block and fblock have any overlap in their ranges? */
> + if (nlm_overlapping_locks(&block->b_call->a_args.lock.fl,
> + &fblock->b_call->a_args.lock.fl))
> + return 1;
> + }
> +
> + /* no conflicts found */
> + return 0;
> +}
> +
> +/*
> * This is a callback from the filesystem for VFS file lock requests.
> * It will be used if fl_grant is defined and the filesystem can not
> * respond to the request immediately.
> @@ -625,9 +654,10 @@ nlmsvc_update_deferred_block(struct nlm_block *block, struct file_lock *conf,
> int result)
> {
> block->b_flags |= B_GOT_CALLBACK;
> - if (result == 0)
> - block->b_granted = 1;
> - else
> + if (result == 0) {
> + if (!nlmsvc_check_overlap(block))
> + block->b_granted = 1;
> + } else
> block->b_flags |= B_TIMED_OUT;
> if (conf) {
> if (block->b_fl)
> @@ -643,22 +673,26 @@ static int nlmsvc_grant_deferred(struct file_lock *fl, struct file_lock *conf,
>
> lock_kernel();
> list_for_each_entry(block, &nlm_blocked, b_list) {
> - if (nlm_compare_locks(&block->b_call->a_args.lock.fl, fl)) {
> - dprintk("lockd: nlmsvc_notify_blocked block %p flags %d\n",
> - block, block->b_flags);
> + if (nlm_lock_in_range(&block->b_call->a_args.lock.fl, fl)) {
> + dprintk("lockd: nlmsvc_notify_blocked block %p flags "
> + "%d\n", block, block->b_flags);
> +
> if (block->b_flags & B_QUEUED) {
> if (block->b_flags & B_TIMED_OUT) {
> - rc = -ENOLCK;
> - break;
> + if (rc == -ENOENT)
> + rc = -ENOLCK;
> + continue;
> }
> - nlmsvc_update_deferred_block(block, conf, result);
> - } else if (result == 0)
> - block->b_granted = 1;
> + nlmsvc_update_deferred_block(block, conf,
> + result);
> + } else if (result == 0) {
> + if (!nlmsvc_check_overlap(block))
> + block->b_granted = 1;
> + }
>
> nlmsvc_insert_block(block, 0);
> svc_wake_up(block->b_daemon);
> rc = 0;
> - break;
> }
> }
> unlock_kernel();
> diff --git a/include/linux/lockd/lockd.h b/include/linux/lockd/lockd.h
> index b56d5aa..aa38d47 100644
> --- a/include/linux/lockd/lockd.h
> +++ b/include/linux/lockd/lockd.h
> @@ -367,6 +367,41 @@ static inline int nlm_compare_locks(const struct file_lock *fl1,
> &&(fl1->fl_type == fl2->fl_type || fl2->fl_type == F_UNLCK);
> }
>
> +/*
> + * Compare two NLM locks.
> + * When the second lock is of type F_UNLCK, this acts like a wildcard.
> + * Similar to nlm_compare_locks, but also return true as long as fl1's range
> + * is completely covered by fl2.
> + */
> +static inline int nlm_lock_in_range(const struct file_lock *fl1,
> + const struct file_lock *fl2)
> +{
> + return fl1->fl_pid == fl2->fl_pid
> + && fl1->fl_owner == fl2->fl_owner
> + && fl1->fl_start >= fl2->fl_start
> + && fl1->fl_end <= fl2->fl_end
> + && (fl1->fl_type == fl2->fl_type || fl2->fl_type == F_UNLCK);
> +}
> +
> +static inline int nlm_overlapping_locks(const struct file_lock *fl1,
> + const struct file_lock *fl2)
> +{
> + /* does fl1 completely cover fl2? */
> + if (fl1->fl_start <= fl2->fl_start && fl1->fl_end >= fl2->fl_end)
> + return 1;
> +
> + /* does fl2 completely cover fl1 */
> + if (fl2->fl_start <= fl1->fl_start && fl2->fl_end >= fl1->fl_end)
> + return 1;
> +
> + /* does the start or end of fl1 sit inside of fl2? */
> + if ((fl1->fl_start >= fl2->fl_start && fl1->fl_start <= fl2->fl_end) ||
> + (fl1->fl_end >= fl2->fl_start && fl1->fl_end <= fl2->fl_end))
> + return 1;
> +
> + return 0;
> +}
> +
> extern struct lock_manager_operations nlmsvc_lock_operations;
>
> #endif /* __KERNEL__ */
> --
> 1.5.5.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2008-11-22 1:16 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-11-19 21:37 [PATCH] lockd: handle fl_grant callbacks with coalesced locks (RFC) Jeff Layton
2008-11-22 1:15 ` J. Bruce Fields [this message]
2008-11-24 15:33 ` Jeff Layton
[not found] ` <20081124103313.0c779324-RtJpwOs3+0O+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
2008-11-24 17:06 ` J. Bruce Fields
2008-11-25 15:12 ` Jeff Layton
2008-12-13 12:40 ` Jeff Layton
[not found] ` <20081213074042.2e8223c3-RtJpwOs3+0O+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
2008-12-16 19:38 ` J. Bruce Fields
2008-12-16 19:56 ` J. Bruce Fields
2008-12-16 21:11 ` Jeff Layton
[not found] ` <20081216161158.2d173667-RtJpwOs3+0O+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
2008-12-17 19:14 ` David Teigland
2008-12-17 20:01 ` J. Bruce Fields
2008-12-17 21:28 ` David Teigland
2009-01-20 23:05 ` J. Bruce Fields
2009-01-20 23:15 ` J. Bruce Fields
2009-01-15 16:30 ` David Teigland
2009-01-19 22:54 ` David Teigland
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20081122011555.GA28485@fieldses.org \
--to=bfields@fieldses.org \
--cc=jlayton@redhat.com \
--cc=linux-nfs@vger.kernel.org \
--cc=teigland@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox