All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oleg Nesterov <oleg@redhat.com>
To: Al Viro <viro@zeniv.linux.org.uk>,
	Dave Chinner <david@fromorbit.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Jan Kara <jack@suse.cz>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v2 2/8] fix the broken lockdep logic in __sb_start_write()
Date: Tue, 11 Aug 2015 19:04:01 +0200	[thread overview]
Message-ID: <20150811170401.GA26904@redhat.com> (raw)
In-Reply-To: <20150811170343.GA26881@redhat.com>

1. wait_event(frozen < level) without rwsem_acquire_read() is just
   wrong from lockdep perspective. If we are going to deadlock
   because the caller is buggy, lockdep detect this problem.

2. __sb_start_write() can race with thaw_super() + freeze_super(),
   and after "goto retry" the 2nd  acquire_freeze_lock() is wrong.

3. The "tell lockdep we are doing trylock" hack doesn't look nice.

   I think this is correct, but this logic should be more explicit.
   Yes, the recursive read_lock() is fine if we hold the lock on a
   higher level. But we do not need to fool lockdep. If we can not
   deadlock in this case then try-lock must not fail and we can use
   use wait == F throughout this code.

Note: as Dave Chinner explains, the "trylock" hack and the fat comment
can be probably removed. But this needs a separate change and it will
be trivial: just kill __sb_start_write() and rename do_sb_start_write()
back to __sb_start_write().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/super.c |   73 ++++++++++++++++++++++++++++++++---------------------------
 1 files changed, 40 insertions(+), 33 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 928c20f..d0fdd49 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1158,38 +1158,11 @@ void __sb_end_write(struct super_block *sb, int level)
 }
 EXPORT_SYMBOL(__sb_end_write);
 
-#ifdef CONFIG_LOCKDEP
-/*
- * We want lockdep to tell us about possible deadlocks with freezing but
- * it's it bit tricky to properly instrument it. Getting a freeze protection
- * works as getting a read lock but there are subtle problems. XFS for example
- * gets freeze protection on internal level twice in some cases, which is OK
- * only because we already hold a freeze protection also on higher level. Due
- * to these cases we have to tell lockdep we are doing trylock when we
- * already hold a freeze protection for a higher freeze level.
- */
-static void acquire_freeze_lock(struct super_block *sb, int level, bool trylock,
+static int do_sb_start_write(struct super_block *sb, int level, bool wait,
 				unsigned long ip)
 {
-	int i;
-
-	if (!trylock) {
-		for (i = 0; i < level - 1; i++)
-			if (lock_is_held(&sb->s_writers.lock_map[i])) {
-				trylock = true;
-				break;
-			}
-	}
-	rwsem_acquire_read(&sb->s_writers.lock_map[level-1], 0, trylock, ip);
-}
-#endif
-
-/*
- * This is an internal function, please use sb_start_{write,pagefault,intwrite}
- * instead.
- */
-int __sb_start_write(struct super_block *sb, int level, bool wait)
-{
+	if (wait)
+		rwsem_acquire_read(&sb->s_writers.lock_map[level-1], 0, 0, ip);
 retry:
 	if (unlikely(sb->s_writers.frozen >= level)) {
 		if (!wait)
@@ -1198,9 +1171,6 @@ retry:
 			   sb->s_writers.frozen < level);
 	}
 
-#ifdef CONFIG_LOCKDEP
-	acquire_freeze_lock(sb, level, !wait, _RET_IP_);
-#endif
 	percpu_counter_inc(&sb->s_writers.counter[level-1]);
 	/*
 	 * Make sure counter is updated before we check for frozen.
@@ -1211,8 +1181,45 @@ retry:
 		__sb_end_write(sb, level);
 		goto retry;
 	}
+
+	if (!wait)
+		rwsem_acquire_read(&sb->s_writers.lock_map[level-1], 0, 1, ip);
 	return 1;
 }
+
+/*
+ * This is an internal function, please use sb_start_{write,pagefault,intwrite}
+ * instead.
+ */
+int __sb_start_write(struct super_block *sb, int level, bool wait)
+{
+	bool force_trylock = false;
+	int ret;
+
+#ifdef CONFIG_LOCKDEP
+	/*
+	 * We want lockdep to tell us about possible deadlocks with freezing
+	 * but it's it bit tricky to properly instrument it. Getting a freeze
+	 * protection works as getting a read lock but there are subtle
+	 * problems. XFS for example gets freeze protection on internal level
+	 * twice in some cases, which is OK only because we already hold a
+	 * freeze protection also on higher level. Due to these cases we have
+	 * to use wait == F (trylock mode) which must not fail.
+	 */
+	if (wait) {
+		int i;
+
+		for (i = 0; i < level - 1; i++)
+			if (lock_is_held(&sb->s_writers.lock_map[i])) {
+				force_trylock = true;
+				break;
+			}
+	}
+#endif
+	ret = do_sb_start_write(sb, level, wait && !force_trylock, _RET_IP_);
+	WARN_ON(force_trylock & !ret);
+	return ret;
+}
 EXPORT_SYMBOL(__sb_start_write);
 
 /**
-- 
1.5.5.1

  parent reply	other threads:[~2015-08-11 17:04 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-08-11 17:03 [PATCH v2 0/8] change sb_writers to use percpu_rw_semaphore Oleg Nesterov
2015-08-11 17:03 ` [PATCH v2 1/8] introduce __sb_{acquire,release}_write() helpers Oleg Nesterov
2015-08-13  9:45   ` Jan Kara
2015-08-13  9:56     ` Jan Kara
2015-08-13 13:17       ` Oleg Nesterov
2015-08-13 13:32         ` Jan Kara
2015-08-13 13:37           ` Oleg Nesterov
2015-08-11 17:04 ` Oleg Nesterov [this message]
2015-08-13 10:02   ` [PATCH v2 2/8] fix the broken lockdep logic in __sb_start_write() Jan Kara
2015-08-13 13:22     ` Oleg Nesterov
2015-08-13 13:29       ` Jan Kara
2015-08-11 17:04 ` [PATCH v2 3/8] document rwsem_release() in sb_wait_write() Oleg Nesterov
2015-08-13 10:22   ` Jan Kara
2015-08-11 17:04 ` [PATCH v2 4/8] percpu-rwsem: introduce percpu_down_read_trylock() Oleg Nesterov
2015-08-11 17:04 ` [PATCH v2 5/8] percpu-rwsem: introduce percpu_rwsem_release() and percpu_rwsem_acquire() Oleg Nesterov
2015-08-11 17:04 ` [PATCH v2 6/8] percpu-rwsem: kill CONFIG_PERCPU_RWSEM Oleg Nesterov
2015-08-11 17:04 ` [PATCH v2 7/8] shift percpu_counter_destroy() into destroy_super_work() Oleg Nesterov
2015-08-13 10:35   ` Jan Kara
2015-08-13 13:36     ` Oleg Nesterov
2015-08-13 14:09       ` Jan Kara
2015-08-13 15:20         ` Oleg Nesterov
2015-08-11 17:04 ` [PATCH v2 8/8] change sb_writers to use percpu_rw_semaphore Oleg Nesterov
2015-08-13 10:48   ` Jan Kara
2015-08-12 13:11 ` [PATCH v2 9/8] don't fool lockdep in freeze_super() and thaw_super() paths Oleg Nesterov
2015-08-13 11:01   ` Jan Kara
2015-08-13 13:58     ` Oleg Nesterov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150811170401.GA26904@redhat.com \
    --to=oleg@redhat.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@fromorbit.com \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.