linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: NeilBrown <neilb@suse.com>
To: "J. Bruce Fields" <bfields@fieldses.org>,
	Jeff Layton <jlayton@kernel.org>
Cc: linux-fsdevel@vger.kernel.org, asn@redhat.com
Subject: Re: [PATCH] locks: ignore same lock in blocked_lock_hash
Date: Fri, 22 Mar 2019 12:58:47 +1100	[thread overview]
Message-ID: <87sgvf6560.fsf@notabene.neil.brown.name> (raw)
In-Reply-To: <20190322004233.GA28329@fieldses.org>

[-- Attachment #1: Type: text/plain, Size: 3828 bytes --]

On Thu, Mar 21 2019, J. Bruce Fields wrote:

> On Thu, Mar 21, 2019 at 07:27:44AM -0400, Jeff Layton wrote:
>> Andreas reported that he was seeing the tdbtorture test fail in
>> some cases with -EDEADLCK when it wasn't before. Some debugging
>> showed that deadlock detection was sometimes discovering the
>> caller's lock request itself in a dependency chain.
>> 
>> If posix_locks_deadlock() fails to find a deadlock, the caller_fl
>> will be passed to __locks_insert_block(), and this wakes up all
>> locks that are blocked on caller_fl, clearing the fl_blocker link.
>> 
>> So if posix_locks_deadlock() finds caller_fl while searching for
>> a deadlock,
>
> I'm feeling dense.  Could you step me through the scenario in a little
> more detail?

Three processes: A, B, C.  One byte in one file: X

A gets a read lock on X.  Doesn't block, so nothing goes in the hash
  table.

B gets a read lock on X.  Doesn't block.

C tries to get a write lock on X.  It cannot so it can block on either A
  or B.  It chooses B.  It check if this might cause a deadlock, but the
  hash is empty so it doesn't.  It gets added to the hash. "C -> B"

A tries to get a write lock on X. It cannot because B has a read lock,
  so it needs to block on B, or some child.  C is a child and A's
  request conflicts with C's request, so A blocks on C.
  It first checks: Does A->C cause a loop in the hash-table?  No. So
  it proceeds to block and adds  "A->C" to the hash table.

B drops its read lock. This wakes up the first child: C.

C tries (again) to get a write lock on X.  It cannot because A still
  holds a read lock, so it will have to block on A.  First it must
  check if "C->A" will cause a loop in the hash-table ... Yes it will
  because "A->C" is already there - error!

This is a false-positive because the very next thing to be done after
posix_locks_deadlock() (while still holding all spinlocks), is
__locks_insert_block(C->A) which calls __locks_wake_up_blocks(C) which
calls __locks_delete_block() for any pending block ??->C and so will
remove A->C from the hash.

Maybe we could call __locks_wake_up_block() *before* calling
posix_locks_deadlock().  That *might* make more sense. 
  

>
> Also, how do we know this catches every such case?

The only links in the hash table that are about to be removed are those
that record that something is blocked by C (the lock we are trying to
get).  So these are the only ones that it is reasonable to special-case
in posix_locks_deadlock().

>
> And why aren't we unhashing blocks when we wake them up?

We do - that is exactly when we unhash them.  The problem is they we
wake them up *after* we check for errors, rather than before.

Thanks,
NeilBrown


>
> --b.
>
>> it can be sure that link in the cycle is about to be
>> broken and it need not treat it as the cause of a deadlock.
>> 
>> More details here: https://bugzilla.kernel.org/show_bug.cgi?id=202975
>> 
>> Cc: stable@vger.kernel.org
>> Fixes: 5946c4319ebb ("fs/locks: allow a lock request to block other requests.")
>> Reported-by: Andreas Schneider <asn@redhat.com>
>> Signed-off-by: Neil Brown <neilb@suse.com>
>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
>> ---
>>  fs/locks.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>> 
>> diff --git a/fs/locks.c b/fs/locks.c
>> index eaa1cfaf73b0..b074f6d7fd2d 100644
>> --- a/fs/locks.c
>> +++ b/fs/locks.c
>> @@ -1023,6 +1023,10 @@ static int posix_locks_deadlock(struct file_lock *caller_fl,
>>  	while ((block_fl = what_owner_is_waiting_for(block_fl))) {
>>  		if (i++ > MAX_DEADLK_ITERATIONS)
>>  			return 0;
>> +
>> +		if (caller_fl == block_fl)
>> +			return 0;
>> +
>>  		if (posix_same_owner(caller_fl, block_fl))
>>  			return 1;
>>  	}
>> -- 
>> 2.20.1

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

  reply	other threads:[~2019-03-22  1:58 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-21 11:27 [PATCH] locks: ignore same lock in blocked_lock_hash Jeff Layton
2019-03-21 21:51 ` NeilBrown
2019-03-22  0:42 ` J. Bruce Fields
2019-03-22  1:58   ` NeilBrown [this message]
2019-03-22 20:27     ` J. Bruce Fields
2019-03-23 12:08       ` [PATCH v2] " Jeff Layton
2019-03-24  0:51         ` NeilBrown
2019-03-25 12:31           ` Jeff Layton
2019-03-25 12:32             ` [PATCH v3] locks: wake any locks blocked on request before deadlock check Jeff Layton
2019-03-25 22:09               ` J. Bruce Fields
2019-03-25 22:33               ` NeilBrown
2019-03-23 12:05     ` [PATCH] locks: ignore same lock in blocked_lock_hash Jeff Layton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87sgvf6560.fsf@notabene.neil.brown.name \
    --to=neilb@suse.com \
    --cc=asn@redhat.com \
    --cc=bfields@fieldses.org \
    --cc=jlayton@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).