From: Peter Zijlstra <peterz@infradead.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Qu Wenruo <quwenruo@cn.fujitsu.com>,
"Darrick J. Wong" <darrick.wong@oracle.com>,
xfs@oss.sgi.com, linux-mm@kvack.org,
Michal Hocko <mhocko@kernel.org>, Ingo Molnar <mingo@kernel.org>
Subject: Re: Xfs lockdep warning with for-dave-for-4.6 branch
Date: Wed, 18 May 2016 09:20:05 +0200 [thread overview]
Message-ID: <20160518072005.GA3193@twins.programming.kicks-ass.net> (raw)
In-Reply-To: <20160517223549.GV26977@dastard>
On Wed, May 18, 2016 at 08:35:49AM +1000, Dave Chinner wrote:
> On Tue, May 17, 2016 at 04:49:12PM +0200, Peter Zijlstra wrote:
> >
> > Thanks for writing all that down Dave!
> >
> > On Tue, May 17, 2016 at 09:10:56AM +1000, Dave Chinner wrote:
> >
> > > The reason we don't have lock clases for the ilock is that we aren't
> > > supposed to call memory reclaim with that lock held in exclusive
> > > mode. This is because reclaim can run transactions, and that may
> > > need to flush dirty inodes to make progress. Flushing dirty inode
> > > requires taking the ilock in shared mode.
> > >
> > > In the code path that was reported, we hold the ilock in /shared/
> > > mode with no transaction context (we are doing a read-only
> > > operation). This means we can run transactions in memory reclaim
> > > because a) we can't deadlock on the inode we hold locks on, and b)
> > > transaction reservations will be able to make progress as we don't
> > > hold any locks it can block on.
> >
> > Just to clarify; I read the above as that we cannot block on recursive
> > shared locks, is this correct?
> >
> > Because we can in fact block on down_read()+down_read() just fine, so if
> > you're assuming that, then something's busted.
>
> The transaction reservation path will run down_read_trylock() on the
> inode, not down_read(). Hence if there are no pending writers, it
> will happily take the lock twice and make progress, otherwise it
> will skip the inode and there's no deadlock. If there's a pending
> writer, then we have another context that is already in a
> transaction context and has already pushed the item, hence it is
> only in the scope of the current push because IO hasn't completed
> yet and removed it from the list.
>
> > Otherwise, I'm not quite reading it right, which is, given the
> > complexity of that stuff, entirely possible.
>
> There's a maze of dark, grue-filled twisty passages here...
I'm sure. Thanks for the extra detail; yes the down_read_trylock() thing
will work.
> > In any case; would something like this work for you? Its entirely
> > untested, but the idea is to mark an entire class to skip reclaim
> > validation, instead of marking individual sites.
>
> Probably would, but it seems like swatting a fly with runaway
> train. I'd much prefer a per-site annotation (e.g. as a GFP_ flag)
> so that we don't turn off something that will tell us we've made a
> mistake while developing new code...
Fair enough; if the mm folks don't object to 'wasting' a GFP flag on
this the below ought to do I think.
---
include/linux/gfp.h | 9 ++++++++-
kernel/locking/lockdep.c | 3 +++
2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 570383a41853..7b5b9db4c821 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -41,6 +41,7 @@ struct vm_area_struct;
#define ___GFP_OTHER_NODE 0x800000u
#define ___GFP_WRITE 0x1000000u
#define ___GFP_KSWAPD_RECLAIM 0x2000000u
+#define ___GFP_NOVALIDATE 0x4000000u
/* If the above are modified, __GFP_BITS_SHIFT may need updating */
/*
@@ -146,6 +147,11 @@ struct vm_area_struct;
* return NULL when direct reclaim and memory compaction have failed to allow
* the allocation to succeed. The OOM killer is not called with the current
* implementation.
+ *
+ * __GFP_NOVALIDATE: lockdep annotation; do not use this allocation to set
+ * LOCK_USED_IN_RECLAIM_FS and thereby avoid possible false positives.
+ * Use of this flag should have a comment explaining the false positive
+ * in detail.
*/
#define __GFP_IO ((__force gfp_t)___GFP_IO)
#define __GFP_FS ((__force gfp_t)___GFP_FS)
@@ -155,6 +161,7 @@ struct vm_area_struct;
#define __GFP_REPEAT ((__force gfp_t)___GFP_REPEAT)
#define __GFP_NOFAIL ((__force gfp_t)___GFP_NOFAIL)
#define __GFP_NORETRY ((__force gfp_t)___GFP_NORETRY)
+#define __GFP_NOVALIDATE ((__force gfp_t)___GFP_NOVALIDATE)
/*
* Action modifiers
@@ -188,7 +195,7 @@ struct vm_area_struct;
#define __GFP_OTHER_NODE ((__force gfp_t)___GFP_OTHER_NODE)
/* Room for N __GFP_FOO bits */
-#define __GFP_BITS_SHIFT 26
+#define __GFP_BITS_SHIFT 27
#define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
/*
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 81f1a7107c0e..88bf43c3bf76 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2853,6 +2853,9 @@ static void __lockdep_trace_alloc(gfp_t gfp_mask, unsigned long flags)
if (unlikely(!debug_locks))
return;
+ if (gfp_mask & __GFP_NOVALIDATE)
+ return;
+
/* no reclaim without waiting on it */
if (!(gfp_mask & __GFP_DIRECT_RECLAIM))
return;
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
WARNING: multiple messages have this Message-ID (diff)
From: Peter Zijlstra <peterz@infradead.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Michal Hocko <mhocko@kernel.org>,
"Darrick J. Wong" <darrick.wong@oracle.com>,
Qu Wenruo <quwenruo@cn.fujitsu.com>,
xfs@oss.sgi.com, linux-mm@kvack.org,
Ingo Molnar <mingo@kernel.org>
Subject: Re: Xfs lockdep warning with for-dave-for-4.6 branch
Date: Wed, 18 May 2016 09:20:05 +0200 [thread overview]
Message-ID: <20160518072005.GA3193@twins.programming.kicks-ass.net> (raw)
In-Reply-To: <20160517223549.GV26977@dastard>
On Wed, May 18, 2016 at 08:35:49AM +1000, Dave Chinner wrote:
> On Tue, May 17, 2016 at 04:49:12PM +0200, Peter Zijlstra wrote:
> >
> > Thanks for writing all that down Dave!
> >
> > On Tue, May 17, 2016 at 09:10:56AM +1000, Dave Chinner wrote:
> >
> > > The reason we don't have lock clases for the ilock is that we aren't
> > > supposed to call memory reclaim with that lock held in exclusive
> > > mode. This is because reclaim can run transactions, and that may
> > > need to flush dirty inodes to make progress. Flushing dirty inode
> > > requires taking the ilock in shared mode.
> > >
> > > In the code path that was reported, we hold the ilock in /shared/
> > > mode with no transaction context (we are doing a read-only
> > > operation). This means we can run transactions in memory reclaim
> > > because a) we can't deadlock on the inode we hold locks on, and b)
> > > transaction reservations will be able to make progress as we don't
> > > hold any locks it can block on.
> >
> > Just to clarify; I read the above as that we cannot block on recursive
> > shared locks, is this correct?
> >
> > Because we can in fact block on down_read()+down_read() just fine, so if
> > you're assuming that, then something's busted.
>
> The transaction reservation path will run down_read_trylock() on the
> inode, not down_read(). Hence if there are no pending writers, it
> will happily take the lock twice and make progress, otherwise it
> will skip the inode and there's no deadlock. If there's a pending
> writer, then we have another context that is already in a
> transaction context and has already pushed the item, hence it is
> only in the scope of the current push because IO hasn't completed
> yet and removed it from the list.
>
> > Otherwise, I'm not quite reading it right, which is, given the
> > complexity of that stuff, entirely possible.
>
> There's a maze of dark, grue-filled twisty passages here...
I'm sure. Thanks for the extra detail; yes the down_read_trylock() thing
will work.
> > In any case; would something like this work for you? Its entirely
> > untested, but the idea is to mark an entire class to skip reclaim
> > validation, instead of marking individual sites.
>
> Probably would, but it seems like swatting a fly with runaway
> train. I'd much prefer a per-site annotation (e.g. as a GFP_ flag)
> so that we don't turn off something that will tell us we've made a
> mistake while developing new code...
Fair enough; if the mm folks don't object to 'wasting' a GFP flag on
this the below ought to do I think.
---
include/linux/gfp.h | 9 ++++++++-
kernel/locking/lockdep.c | 3 +++
2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 570383a41853..7b5b9db4c821 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -41,6 +41,7 @@ struct vm_area_struct;
#define ___GFP_OTHER_NODE 0x800000u
#define ___GFP_WRITE 0x1000000u
#define ___GFP_KSWAPD_RECLAIM 0x2000000u
+#define ___GFP_NOVALIDATE 0x4000000u
/* If the above are modified, __GFP_BITS_SHIFT may need updating */
/*
@@ -146,6 +147,11 @@ struct vm_area_struct;
* return NULL when direct reclaim and memory compaction have failed to allow
* the allocation to succeed. The OOM killer is not called with the current
* implementation.
+ *
+ * __GFP_NOVALIDATE: lockdep annotation; do not use this allocation to set
+ * LOCK_USED_IN_RECLAIM_FS and thereby avoid possible false positives.
+ * Use of this flag should have a comment explaining the false positive
+ * in detail.
*/
#define __GFP_IO ((__force gfp_t)___GFP_IO)
#define __GFP_FS ((__force gfp_t)___GFP_FS)
@@ -155,6 +161,7 @@ struct vm_area_struct;
#define __GFP_REPEAT ((__force gfp_t)___GFP_REPEAT)
#define __GFP_NOFAIL ((__force gfp_t)___GFP_NOFAIL)
#define __GFP_NORETRY ((__force gfp_t)___GFP_NORETRY)
+#define __GFP_NOVALIDATE ((__force gfp_t)___GFP_NOVALIDATE)
/*
* Action modifiers
@@ -188,7 +195,7 @@ struct vm_area_struct;
#define __GFP_OTHER_NODE ((__force gfp_t)___GFP_OTHER_NODE)
/* Room for N __GFP_FOO bits */
-#define __GFP_BITS_SHIFT 26
+#define __GFP_BITS_SHIFT 27
#define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
/*
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 81f1a7107c0e..88bf43c3bf76 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2853,6 +2853,9 @@ static void __lockdep_trace_alloc(gfp_t gfp_mask, unsigned long flags)
if (unlikely(!debug_locks))
return;
+ if (gfp_mask & __GFP_NOVALIDATE)
+ return;
+
/* no reclaim without waiting on it */
if (!(gfp_mask & __GFP_DIRECT_RECLAIM))
return;
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2016-05-18 7:20 UTC|newest]
Thread overview: 71+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-05-12 5:53 Xfs lockdep warning with for-dave-for-4.6 branch Qu Wenruo
2016-05-12 5:57 ` Darrick J. Wong
2016-05-12 8:03 ` Dave Chinner
2016-05-13 16:03 ` Michal Hocko
2016-05-13 16:03 ` Michal Hocko
2016-05-16 10:41 ` Peter Zijlstra
2016-05-16 10:41 ` Peter Zijlstra
2016-05-16 13:05 ` Michal Hocko
2016-05-16 13:05 ` Michal Hocko
2016-05-16 13:25 ` Peter Zijlstra
2016-05-16 13:25 ` Peter Zijlstra
2016-05-16 23:10 ` Dave Chinner
2016-05-16 23:10 ` Dave Chinner
2016-05-17 14:49 ` Peter Zijlstra
2016-05-17 14:49 ` Peter Zijlstra
2016-05-17 22:35 ` Dave Chinner
2016-05-17 22:35 ` Dave Chinner
2016-05-18 7:20 ` Peter Zijlstra [this message]
2016-05-18 7:20 ` Peter Zijlstra
2016-05-18 8:25 ` Michal Hocko
2016-05-18 8:25 ` Michal Hocko
2016-05-18 9:49 ` Peter Zijlstra
2016-05-18 9:49 ` Peter Zijlstra
2016-05-18 11:31 ` Michal Hocko
2016-05-18 11:31 ` Michal Hocko
2016-05-19 8:11 ` Peter Zijlstra
2016-05-19 8:11 ` Peter Zijlstra
2016-05-20 0:17 ` Dave Chinner
2016-05-20 0:17 ` Dave Chinner
2016-06-01 13:17 ` Michal Hocko
2016-06-01 13:17 ` Michal Hocko
2016-06-01 18:16 ` Peter Zijlstra
2016-06-01 18:16 ` Peter Zijlstra
2016-06-02 14:50 ` Michal Hocko
2016-06-02 14:50 ` Michal Hocko
2016-06-02 15:11 ` Peter Zijlstra
2016-06-02 15:11 ` Peter Zijlstra
2016-06-02 15:46 ` Michal Hocko
2016-06-02 15:46 ` Michal Hocko
2016-06-02 23:22 ` Dave Chinner
2016-06-02 23:22 ` Dave Chinner
2016-06-06 12:20 ` Michal Hocko
2016-06-06 12:20 ` Michal Hocko
2016-06-15 7:21 ` Dave Chinner
2016-06-15 7:21 ` Dave Chinner
2016-06-21 14:26 ` Michal Hocko
2016-06-21 14:26 ` Michal Hocko
2016-06-22 1:03 ` Dave Chinner
2016-06-22 1:03 ` Dave Chinner
2016-06-22 12:38 ` Michal Hocko
2016-06-22 12:38 ` Michal Hocko
2016-06-22 22:58 ` Dave Chinner
2016-06-22 22:58 ` Dave Chinner
2016-06-23 11:35 ` Michal Hocko
2016-06-23 11:35 ` Michal Hocko
2016-10-06 13:04 ` Michal Hocko
2016-10-06 13:04 ` Michal Hocko
2016-10-17 13:49 ` Michal Hocko
2016-10-17 13:49 ` Michal Hocko
2016-10-19 0:33 ` Dave Chinner
2016-10-19 0:33 ` Dave Chinner
2016-10-19 5:30 ` Dave Chinner
2016-10-19 5:30 ` Dave Chinner
2016-10-19 8:33 ` Peter Zijlstra
2016-10-19 8:33 ` Peter Zijlstra
2016-10-19 12:06 ` Michal Hocko
2016-10-19 12:06 ` Michal Hocko
2016-10-19 21:49 ` Dave Chinner
2016-10-19 21:49 ` Dave Chinner
2016-10-20 7:15 ` Michal Hocko
2016-10-20 7:15 ` Michal Hocko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160518072005.GA3193@twins.programming.kicks-ass.net \
--to=peterz@infradead.org \
--cc=darrick.wong@oracle.com \
--cc=david@fromorbit.com \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=mingo@kernel.org \
--cc=quwenruo@cn.fujitsu.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.