Re: Lockups with btrfs on 3.16-rc1 - bisected

linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Chris Mason <clm@fb.com>
To: Waiman Long <waiman.long@hp.com>
Cc: Marc Dionne <marc.c.dionne@gmail.com>,
	Josef Bacik <jbacik@fb.com>, <linux-btrfs@vger.kernel.org>,
	<t-itoh@jp.fujitsu.com>
Subject: Re: Lockups with btrfs on 3.16-rc1 - bisected
Date: Thu, 19 Jun 2014 17:50:41 -0400	[thread overview]
Message-ID: <53A35B31.4060203@fb.com> (raw)
In-Reply-To: <53A343B9.3000900@fb.com>

On 06/19/2014 04:10 PM, Chris Mason wrote:
> On 06/19/2014 01:52 PM, Waiman Long wrote:
>> On 06/19/2014 12:51 PM, Chris Mason wrote:
>>> On 06/18/2014 11:21 PM, Waiman Long wrote:
>>>> On 06/18/2014 10:11 PM, Chris Mason wrote:
>>>>> On 06/18/2014 10:03 PM, Marc Dionne wrote:
>>>>>> On Wed, Jun 18, 2014 at 8:41 PM, Marc
>>>>>> Dionne<marc.c.dionne@gmail.com>   wrote:
>>>>>>> On Wed, Jun 18, 2014 at 8:08 PM, Waiman Long<waiman.long@hp.com>
>>>>>>> wrote:
>>>>>>>> On 06/18/2014 08:03 PM, Marc Dionne wrote:
>>>>>> And for an additional data point, just removing those
>>>>>> CONFIG_DEBUG_LOCK_ALLOC ifdefs looks like it's sufficient to prevent
>>>>>> the symptoms when lockdep is not enabled.
>>>>> Ok, somehow we've added a lock inversion here that wasn't here before.
>>>>> Thanks for confirming, I'll nail it down.
>>>>>
>>>>> -chris
>>>>>
>>>> I am pretty sure that the hangup is caused by the following kind of code
>>>> fragment in the locking.c file:
>>>>
>>>>   if (eb->lock_nested) {
>>>>                  read_lock(&eb->lock);
>>>>                  if (eb->lock_nested&&  current->pid ==
>>>> eb->lock_owner) {
>>>>
>>>> Is it possible to do the check without taking the read_lock?
>>> I think you're right, we haven't added any new recursive takers of the
>>> lock.  The path where we are deadlocking has an extent buffer that isn't
>>> in the path yet locked.  I think we're taking the read lock while that
>>> one is write locked.
>>>
>>> Reworking the nesting a big here.
>>>
>>> -chris
>>
>> I would like to take back my comments. I took out the read_lock, but the
>> process still hang while doing file activities on btrfs filesystem. So
>> the problem is trickier than I thought. Below are the stack backtraces
>> of some of the relevant processes.
>>
> 
> You weren't wrong, but it was also the tree trylock code.  Our trylocks
> only back off if the blocking lock is held.  btrfs_next_leaf needs it to
> be a true trylock.  The confusing part is this hasn't really changed,
> but one of the callers must be a spinner where we used to have a blocker.

This is what I have queued up, it's working here.

-chris

commit ea4ebde02e08558b020c4b61bb9a4c0fcf63028e
Author: Chris Mason <clm@fb.com>
Date:   Thu Jun 19 14:16:52 2014 -0700

    Btrfs: fix deadlocks with trylock on tree nodes
    
    The Btrfs tree trylock function is poorly named.  It always takes
    the spinlock and backs off if the blocking lock is held.  This
    can lead to surprising lockups because people expect it to really be a
    trylock.
    
    This commit makes it a pure trylock, both for the spinlock and the
    blocking lock.  It also reworks the nested lock handling slightly to
    avoid taking the read lock while a spinning write lock might be held.
    
    Signed-off-by: Chris Mason <clm@fb.com>

diff --git a/fs/btrfs/locking.c b/fs/btrfs/locking.c
index 01277b8..5665d21 100644
--- a/fs/btrfs/locking.c
+++ b/fs/btrfs/locking.c
@@ -33,14 +33,14 @@ static void btrfs_assert_tree_read_locked(struct extent_buffer *eb);
  */
 void btrfs_set_lock_blocking_rw(struct extent_buffer *eb, int rw)
 {
-	if (eb->lock_nested) {
-		read_lock(&eb->lock);
-		if (eb->lock_nested && current->pid == eb->lock_owner) {
-			read_unlock(&eb->lock);
-			return;
-		}
-		read_unlock(&eb->lock);
-	}
+	/*
+	 * no lock is required.  The lock owner may change if
+	 * we have a read lock, but it won't change to or away
+	 * from us.  If we have the write lock, we are the owner
+	 * and it'll never change.
+	 */
+	if (eb->lock_nested && current->pid == eb->lock_owner)
+		return;
 	if (rw == BTRFS_WRITE_LOCK) {
 		if (atomic_read(&eb->blocking_writers) == 0) {
 			WARN_ON(atomic_read(&eb->spinning_writers) != 1);
@@ -65,14 +65,15 @@ void btrfs_set_lock_blocking_rw(struct extent_buffer *eb, int rw)
  */
 void btrfs_clear_lock_blocking_rw(struct extent_buffer *eb, int rw)
 {
-	if (eb->lock_nested) {
-		read_lock(&eb->lock);
-		if (eb->lock_nested && current->pid == eb->lock_owner) {
-			read_unlock(&eb->lock);
-			return;
-		}
-		read_unlock(&eb->lock);
-	}
+	/*
+	 * no lock is required.  The lock owner may change if
+	 * we have a read lock, but it won't change to or away
+	 * from us.  If we have the write lock, we are the owner
+	 * and it'll never change.
+	 */
+	if (eb->lock_nested && current->pid == eb->lock_owner)
+		return;
+
 	if (rw == BTRFS_WRITE_LOCK_BLOCKING) {
 		BUG_ON(atomic_read(&eb->blocking_writers) != 1);
 		write_lock(&eb->lock);
@@ -99,6 +100,9 @@ void btrfs_clear_lock_blocking_rw(struct extent_buffer *eb, int rw)
 void btrfs_tree_read_lock(struct extent_buffer *eb)
 {
 again:
+	BUG_ON(!atomic_read(&eb->blocking_writers) &&
+	       current->pid == eb->lock_owner);
+
 	read_lock(&eb->lock);
 	if (atomic_read(&eb->blocking_writers) &&
 	    current->pid == eb->lock_owner) {
@@ -132,7 +136,9 @@ int btrfs_try_tree_read_lock(struct extent_buffer *eb)
 	if (atomic_read(&eb->blocking_writers))
 		return 0;
 
-	read_lock(&eb->lock);
+	if (!read_trylock(&eb->lock))
+		return 0;
+
 	if (atomic_read(&eb->blocking_writers)) {
 		read_unlock(&eb->lock);
 		return 0;
@@ -151,7 +157,10 @@ int btrfs_try_tree_write_lock(struct extent_buffer *eb)
 	if (atomic_read(&eb->blocking_writers) ||
 	    atomic_read(&eb->blocking_readers))
 		return 0;
-	write_lock(&eb->lock);
+
+	if (!write_trylock(&eb->lock))
+		return 0;
+
 	if (atomic_read(&eb->blocking_writers) ||
 	    atomic_read(&eb->blocking_readers)) {
 		write_unlock(&eb->lock);
@@ -168,14 +177,15 @@ int btrfs_try_tree_write_lock(struct extent_buffer *eb)
  */
 void btrfs_tree_read_unlock(struct extent_buffer *eb)
 {
-	if (eb->lock_nested) {
-		read_lock(&eb->lock);
-		if (eb->lock_nested && current->pid == eb->lock_owner) {
-			eb->lock_nested = 0;
-			read_unlock(&eb->lock);
-			return;
-		}
-		read_unlock(&eb->lock);
+	/*
+	 * if we're nested, we have the write lock.  No new locking
+	 * is needed as long as we are the lock owner.
+	 * The write unlock will do a barrier for us, and the lock_nested
+	 * field only matters to the lock owner.
+	 */
+	if (eb->lock_nested && current->pid == eb->lock_owner) {
+		eb->lock_nested = 0;
+		return;
 	}
 	btrfs_assert_tree_read_locked(eb);
 	WARN_ON(atomic_read(&eb->spinning_readers) == 0);
@@ -189,14 +199,15 @@ void btrfs_tree_read_unlock(struct extent_buffer *eb)
  */
 void btrfs_tree_read_unlock_blocking(struct extent_buffer *eb)
 {
-	if (eb->lock_nested) {
-		read_lock(&eb->lock);
-		if (eb->lock_nested && current->pid == eb->lock_owner) {
-			eb->lock_nested = 0;
-			read_unlock(&eb->lock);
-			return;
-		}
-		read_unlock(&eb->lock);
+	/*
+	 * if we're nested, we have the write lock.  No new locking
+	 * is needed as long as we are the lock owner.
+	 * The write unlock will do a barrier for us, and the lock_nested
+	 * field only matters to the lock owner.
+	 */
+	if (eb->lock_nested && current->pid == eb->lock_owner) {
+		eb->lock_nested = 0;
+		return;
 	}
 	btrfs_assert_tree_read_locked(eb);
 	WARN_ON(atomic_read(&eb->blocking_readers) == 0);
@@ -244,6 +255,7 @@ void btrfs_tree_unlock(struct extent_buffer *eb)
 	BUG_ON(blockers > 1);
 
 	btrfs_assert_tree_locked(eb);
+	eb->lock_owner = 0;
 	atomic_dec(&eb->write_locks);
 
 	if (blockers) {

next prev parent reply	other threads:[~2014-06-19 21:45 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-06-18 20:57 Lockups with btrfs on 3.16-rc1 - bisected Marc Dionne
2014-06-18 22:17 ` Waiman Long
2014-06-18 22:27   ` Josef Bacik
2014-06-18 22:47     ` Waiman Long
2014-06-18 23:10       ` Josef Bacik
2014-06-18 23:19         ` Waiman Long
2014-06-18 23:27           ` Chris Mason
2014-06-18 23:30             ` Waiman Long
2014-06-18 23:53               ` Chris Mason
2014-06-19  0:03                 ` Marc Dionne
2014-06-19  0:08                   ` Waiman Long
2014-06-19  0:41                     ` Marc Dionne
2014-06-19  2:03                       ` Marc Dionne
2014-06-19  2:11                         ` Chris Mason
2014-06-19  3:21                           ` Waiman Long
2014-06-19 16:51                             ` Chris Mason
2014-06-19 17:52                               ` Waiman Long
2014-06-19 20:10                                 ` Chris Mason
2014-06-19 21:50                                   ` Chris Mason [this message]
2014-06-19 23:21                                     ` Waiman Long
2014-06-20  3:20                                       ` Tsutomu Itoh
2014-06-21  1:09                                         ` Long, Wai Man
2014-06-19  9:49 ` btrfs-transacti:516 blocked 120 seconds on 3.16-rc1 Konstantinos Skarlatos

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:01277b8 dfblob:5665d21 )
 OR (
bs:"Re: Lockups with btrfs on 3.16-rc1 - bisected" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53A35B31.4060203@fb.com \
    --to=clm@fb.com \
    --cc=jbacik@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=marc.c.dionne@gmail.com \
    --cc=t-itoh@jp.fujitsu.com \
    --cc=waiman.long@hp.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).