* [PATCH v4] futex: Remove requirement for lock_page in get_futex_key
@ 2016-01-23 17:14 Davidlohr Bueso
2016-01-27 20:10 ` Thomas Gleixner
2016-01-28 3:02 ` Hugh Dickins
0 siblings, 2 replies; 5+ messages in thread
From: Davidlohr Bueso @ 2016-01-23 17:14 UTC (permalink / raw)
To: Mel Gorman, Peter Zijlstra, Thomas Gleixner, Ingo Molnar
Cc: Sebastian Andrzej Siewior, Chris Mason, Darren Hart, dave,
linux-kernel, Davidlohr Bueso, Mel Gorman
From: Mel Gorman <mgorman@suse.de>
When dealing with key handling for shared futexes, we can drastically reduce
the usage/need of the page lock. 1) For anonymous pages, the associated futex
object is the mm_struct which does not require the page lock. 2) For inode
based, keys, we can check under RCU read lock if the page mapping is still
valid and take reference to the inode. This just leaves one rare race that
requires the page lock in the slow path when examining the swapcache.
Additionally realtime users currently have a problem with the page lock being
contended for unbounded periods of time during futex operations.
Task A
get_futex_key()
lock_page()
---> preempted
Now any other task trying to lock that page will have to wait until
task A gets scheduled back in, which is an unbound time.
With this patch, we pretty much have a lockless futex_get_key().
Experiments show that this patch can boost/speedup the hashing of shared
futexes with the perf futex benchmarks (which is good for measuring such
change) by up to 45% when there are high (> 100) thread counts on a 60 core
Westmere. Lower counts are pretty much in the noise range or less than 10%,
but mid range can be seen at over 30% overall throughput (hash ops/sec).
This makes anon-mem shared futexes much closer to its private counterpart.
Not-yet-signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
[ported on top of thp refcount rework, changelog, comments, fixes]
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
Changes from v3:
- Redo mapping sanity check, now do not halt the kernel.
Changes from v2:
- Minor adjustments by peterz.
- Applies on top of -next-20160118
Changes from v1:
- Remove unnecesary mb, as atomic_inc returning does what we need.
- Fix bogus mapping load.
- Minor code cleanups/comments.
kernel/futex.c | 93 +++++++++++++++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 85 insertions(+), 8 deletions(-)
diff --git a/kernel/futex.c b/kernel/futex.c
index 0773f2b..6b02b5b 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -520,7 +520,20 @@ again:
else
err = 0;
- lock_page(page);
+ /*
+ * The treatment of mapping from this point on is critical. The page
+ * lock protects many things but in this context the page lock
+ * stabilizes mapping, prevents inode freeing in the shared
+ * file-backed region case and guards against movement to swap cache.
+ *
+ * Strictly speaking the page lock is not needed in all cases being
+ * considered here and page lock forces unnecessarily serialization
+ * From this point on, mapping will be re-verified if necessary and
+ * page lock will be acquired only if it is unavoidable
+ */
+ page = compound_head(page);
+ mapping = READ_ONCE(page->mapping);
+
/*
* If page->mapping is NULL, then it cannot be a PageAnon
* page; but it might be the ZERO_PAGE or in the gate area or
@@ -536,19 +549,32 @@ again:
* shmem_writepage move it from filecache to swapcache beneath us:
* an unlikely race, but we do need to retry for page->mapping.
*/
- mapping = compound_head(page)->mapping;
- if (!mapping) {
- int shmem_swizzled = PageSwapCache(page);
+ if (unlikely(!mapping)) {
+ int shmem_swizzled;
+
+ /*
+ * Page lock is required to identify which special case above
+ * applies. If this is really a shmem page then the page lock
+ * will prevent unexpected transitions.
+ */
+ lock_page(page);
+ shmem_swizzled = PageSwapCache(page);
unlock_page(page);
put_page(page);
+ WARN_ON_ONCE(READ_ONCE(page->mapping));
+
if (shmem_swizzled)
goto again;
+
return -EFAULT;
}
/*
* Private mappings are handled in a simple way.
*
+ * If the futex key is stored on an anonymous page, then the associated
+ * object is the mm which is implicitly pinned by the calling process.
+ *
* NOTE: When userspace waits on a MAP_SHARED mapping, even if
* it's a read-only handle, it's expected that futexes attach to
* the object not the particular process.
@@ -566,16 +592,67 @@ again:
key->both.offset |= FUT_OFF_MMSHARED; /* ref taken on mm */
key->private.mm = mm;
key->private.address = address;
+
+ get_futex_key_refs(key); /* implies MB (B) */
+
} else {
+ struct inode *inode;
+
+ /*
+ * The associtated futex object in this case is the inode and
+ * the page->mapping must be traversed. Ordinarily this should
+ * be stabilised under page lock but it's not strictly
+ * necessary in this case as we just want to pin the inode, not
+ * update radix tree or anything like that.
+ *
+ * The RCU read lock is taken as the inode is finally freed
+ * under RCU. If the mapping still matches expectations then the
+ * mapping->host can be safely accessed as being a valid inode.
+ */
+ rcu_read_lock();
+ if (READ_ONCE(page->mapping) != mapping ||
+ !mapping->host) {
+ rcu_read_unlock();
+ put_page(page);
+
+ goto again;
+ }
+ inode = READ_ONCE(mapping->host);
+
+ /*
+ * Take a reference unless it is about to be freed. Previously
+ * this reference was taken by ihold under the page lock
+ * pinning the inode in place so i_lock was unnecessary. The
+ * only way for this check to fail is if the inode was
+ * truncated in parallel so warn for now if this happens.
+ *
+ * We are not calling into get_futex_key_refs() in file-backed
+ * cases, therefore a successful atomic_inc return below will
+ * guarantee that get_futex_key() will continue to imply MB (B).
+ */
+ if (WARN_ON_ONCE(!atomic_inc_not_zero(&inode->i_count))) {
+ rcu_read_unlock();
+ put_page(page);
+
+ goto again;
+ }
+
+ /* Should be impossible but lets be paranoid for now */
+ if (WARN_ON_ONCE(inode->i_mapping != mapping)) {
+ err = -EFAULT;
+ iput(inode);
+ rcu_read_unlock();
+
+ goto out;
+ }
+
key->both.offset |= FUT_OFF_INODE; /* inode-based key */
- key->shared.inode = mapping->host;
+ key->shared.inode = inode;
key->shared.pgoff = basepage_index(page);
+ rcu_read_unlock();
}
- get_futex_key_refs(key); /* implies MB (B) */
-
out:
- unlock_page(page);
put_page(page);
return err;
}
--
2.1.4
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v4] futex: Remove requirement for lock_page in get_futex_key
2016-01-23 17:14 [PATCH v4] futex: Remove requirement for lock_page in get_futex_key Davidlohr Bueso
@ 2016-01-27 20:10 ` Thomas Gleixner
2016-01-28 3:02 ` Hugh Dickins
1 sibling, 0 replies; 5+ messages in thread
From: Thomas Gleixner @ 2016-01-27 20:10 UTC (permalink / raw)
To: Davidlohr Bueso
Cc: Mel Gorman, Peter Zijlstra, Ingo Molnar,
Sebastian Andrzej Siewior, Chris Mason, Darren Hart, dave,
linux-kernel, Mel Gorman
On Sat, 23 Jan 2016, Davidlohr Bueso wrote:
> + if (unlikely(!mapping)) {
> + int shmem_swizzled;
> +
> + /*
> + * Page lock is required to identify which special case above
> + * applies. If this is really a shmem page then the page lock
> + * will prevent unexpected transitions.
> + */
> + lock_page(page);
> + shmem_swizzled = PageSwapCache(page);
> unlock_page(page);
> put_page(page);
> + WARN_ON_ONCE(READ_ONCE(page->mapping));
You just did put_page(page). So dereferencing page is a nono. This needs to be
flipped around.
Thanks,
tglx
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v4] futex: Remove requirement for lock_page in get_futex_key
2016-01-23 17:14 [PATCH v4] futex: Remove requirement for lock_page in get_futex_key Davidlohr Bueso
2016-01-27 20:10 ` Thomas Gleixner
@ 2016-01-28 3:02 ` Hugh Dickins
2016-01-29 18:35 ` Davidlohr Bueso
1 sibling, 1 reply; 5+ messages in thread
From: Hugh Dickins @ 2016-01-28 3:02 UTC (permalink / raw)
To: Davidlohr Bueso
Cc: Mel Gorman, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
Sebastian Andrzej Siewior, Chris Mason, Darren Hart, dave,
linux-kernel, Mel Gorman
On Sat, 23 Jan 2016, Davidlohr Bueso wrote:
> From: Mel Gorman <mgorman@suse.de>
>
> When dealing with key handling for shared futexes, we can drastically reduce
> the usage/need of the page lock. 1) For anonymous pages, the associated futex
> object is the mm_struct which does not require the page lock. 2) For inode
> based, keys, we can check under RCU read lock if the page mapping is still
> valid and take reference to the inode. This just leaves one rare race that
> requires the page lock in the slow path when examining the swapcache.
>
> Additionally realtime users currently have a problem with the page lock being
> contended for unbounded periods of time during futex operations.
>
> Task A
> get_futex_key()
> lock_page()
> ---> preempted
>
> Now any other task trying to lock that page will have to wait until
> task A gets scheduled back in, which is an unbound time.
>
> With this patch, we pretty much have a lockless futex_get_key().
>
> Experiments show that this patch can boost/speedup the hashing of shared
> futexes with the perf futex benchmarks (which is good for measuring such
> change) by up to 45% when there are high (> 100) thread counts on a 60 core
> Westmere. Lower counts are pretty much in the noise range or less than 10%,
> but mid range can be seen at over 30% overall throughput (hash ops/sec).
> This makes anon-mem shared futexes much closer to its private counterpart.
>
> Not-yet-signed-off-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> [ported on top of thp refcount rework, changelog, comments, fixes]
> Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
> ---
>
> Changes from v3:
> - Redo mapping sanity check, now do not halt the kernel.
>
> Changes from v2:
>
> - Minor adjustments by peterz.
> - Applies on top of -next-20160118
>
> Changes from v1:
> - Remove unnecesary mb, as atomic_inc returning does what we need.
> - Fix bogus mapping load.
> - Minor code cleanups/comments.
>
> kernel/futex.c | 93 +++++++++++++++++++++++++++++++++++++++++++++++++++++-----
> 1 file changed, 85 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/futex.c b/kernel/futex.c
> index 0773f2b..6b02b5b 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -520,7 +520,20 @@ again:
> else
> err = 0;
>
> - lock_page(page);
> + /*
> + * The treatment of mapping from this point on is critical. The page
> + * lock protects many things but in this context the page lock
> + * stabilizes mapping, prevents inode freeing in the shared
> + * file-backed region case and guards against movement to swap cache.
> + *
> + * Strictly speaking the page lock is not needed in all cases being
> + * considered here and page lock forces unnecessarily serialization
> + * From this point on, mapping will be re-verified if necessary and
> + * page lock will be acquired only if it is unavoidable
> + */
> + page = compound_head(page);
> + mapping = READ_ONCE(page->mapping);
> +
> /*
> * If page->mapping is NULL, then it cannot be a PageAnon
> * page; but it might be the ZERO_PAGE or in the gate area or
> @@ -536,19 +549,32 @@ again:
> * shmem_writepage move it from filecache to swapcache beneath us:
> * an unlikely race, but we do need to retry for page->mapping.
> */
> - mapping = compound_head(page)->mapping;
> - if (!mapping) {
> - int shmem_swizzled = PageSwapCache(page);
> + if (unlikely(!mapping)) {
> + int shmem_swizzled;
> +
> + /*
> + * Page lock is required to identify which special case above
> + * applies. If this is really a shmem page then the page lock
> + * will prevent unexpected transitions.
> + */
> + lock_page(page);
> + shmem_swizzled = PageSwapCache(page);
> unlock_page(page);
> put_page(page);
> + WARN_ON_ONCE(READ_ONCE(page->mapping));
Good point from Thomas, but it's worse than that: the patch as it stands
makes no sense here. There is no point in doing a lock_page() just to
look at the PageSwapCache bit; which like page->mapping may change again
immediately after the unlock_page(). (Certainly very unlikely, but...)
What the page lock is here for, is to take a snapshot of page->mapping
and PageSwapCache(page) together, to prevent either one of them changing
while we decide. So you need something like
lock_page(page);
shmem_swizzled = PageSwapCache(page) || page->mapping;
unlock_page(page);
put_page(page);
Just drop the WARN_ON_ONCE. And the whole case will be so very rare,
after the preceding get_user_pages_fast(), that you're absolutely right
not to bother to try to avoid the lock_page/unlock_page in just this block.
> +
> if (shmem_swizzled)
> goto again;
> +
> return -EFAULT;
> }
>
> /*
> * Private mappings are handled in a simple way.
> *
> + * If the futex key is stored on an anonymous page, then the associated
> + * object is the mm which is implicitly pinned by the calling process.
> + *
> * NOTE: When userspace waits on a MAP_SHARED mapping, even if
> * it's a read-only handle, it's expected that futexes attach to
> * the object not the particular process.
> @@ -566,16 +592,67 @@ again:
> key->both.offset |= FUT_OFF_MMSHARED; /* ref taken on mm */
> key->private.mm = mm;
> key->private.address = address;
> +
> + get_futex_key_refs(key); /* implies MB (B) */
> +
> } else {
> + struct inode *inode;
> +
> + /*
> + * The associtated futex object in this case is the inode and
> + * the page->mapping must be traversed. Ordinarily this should
> + * be stabilised under page lock but it's not strictly
> + * necessary in this case as we just want to pin the inode, not
> + * update radix tree or anything like that.
> + *
> + * The RCU read lock is taken as the inode is finally freed
> + * under RCU. If the mapping still matches expectations then the
> + * mapping->host can be safely accessed as being a valid inode.
> + */
> + rcu_read_lock();
> + if (READ_ONCE(page->mapping) != mapping ||
> + !mapping->host) {
If you're being as paranoid as all the WARN_ON_ONCEs hereabouts imply,
then it would be better to do the inode = READ_ONCE(mapping->host)
before checking !inode rather than !mapping->host.
> + rcu_read_unlock();
> + put_page(page);
> +
> + goto again;
> + }
> + inode = READ_ONCE(mapping->host);
> +
> + /*
> + * Take a reference unless it is about to be freed. Previously
> + * this reference was taken by ihold under the page lock
> + * pinning the inode in place so i_lock was unnecessary. The
> + * only way for this check to fail is if the inode was
> + * truncated in parallel so warn for now if this happens.
> + *
> + * We are not calling into get_futex_key_refs() in file-backed
> + * cases, therefore a successful atomic_inc return below will
> + * guarantee that get_futex_key() will continue to imply MB (B).
> + */
> + if (WARN_ON_ONCE(!atomic_inc_not_zero(&inode->i_count))) {
> + rcu_read_unlock();
> + put_page(page);
> +
> + goto again;
> + }
> +
> + /* Should be impossible but lets be paranoid for now */
> + if (WARN_ON_ONCE(inode->i_mapping != mapping)) {
> + err = -EFAULT;
> + iput(inode);
> + rcu_read_unlock();
I think this is probably a WARN_ON_ONCE too many (but I'm error-prone on
inode -> i_mapping -> host relationships, so ignore me); but if it's kept
then I think you ought to do the iput(inode) after the rcu_read_unlock() -
iput() can get into lots more work than you expect.
Otherwise it appeared to be good to me (but years since I've been near here).
Hugh
> +
> + goto out;
> + }
> +
> key->both.offset |= FUT_OFF_INODE; /* inode-based key */
> - key->shared.inode = mapping->host;
> + key->shared.inode = inode;
> key->shared.pgoff = basepage_index(page);
> + rcu_read_unlock();
> }
>
> - get_futex_key_refs(key); /* implies MB (B) */
> -
> out:
> - unlock_page(page);
> put_page(page);
> return err;
> }
> --
> 2.1.4
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v4] futex: Remove requirement for lock_page in get_futex_key
2016-01-28 3:02 ` Hugh Dickins
@ 2016-01-29 18:35 ` Davidlohr Bueso
2016-01-29 18:46 ` Hugh Dickins
0 siblings, 1 reply; 5+ messages in thread
From: Davidlohr Bueso @ 2016-01-29 18:35 UTC (permalink / raw)
To: Hugh Dickins
Cc: Davidlohr Bueso, Mel Gorman, Peter Zijlstra, Thomas Gleixner,
Ingo Molnar, Sebastian Andrzej Siewior, Chris Mason, Darren Hart,
linux-kernel, Mel Gorman
On Wed, 27 Jan 2016, Hugh Dickins wrote:
>> + *
>> + * The RCU read lock is taken as the inode is finally freed
>> + * under RCU. If the mapping still matches expectations then the
>> + * mapping->host can be safely accessed as being a valid inode.
>> + */
>> + rcu_read_lock();
>> + if (READ_ONCE(page->mapping) != mapping ||
>> + !mapping->host) {
>
>If you're being as paranoid as all the WARN_ON_ONCEs hereabouts imply,
>then it would be better to do the inode = READ_ONCE(mapping->host)
>before checking !inode rather than !mapping->host.
Ok, it also reads a bit nicer than the above, which was simply avoiding
a load in the again case.
rcu_read_lock();
inode = READ_ONCE(mapping->host);
if (!inode || READ_ONCE(page->mapping) != mapping)
rcu_read_unlock();
put_page(page);
goto again;
}
[...]
>> +
>> + /* Should be impossible but lets be paranoid for now */
>> + if (WARN_ON_ONCE(inode->i_mapping != mapping)) {
>> + err = -EFAULT;
>> + iput(inode);
>> + rcu_read_unlock();
>
>I think this is probably a WARN_ON_ONCE too many (but I'm error-prone on
>inode -> i_mapping -> host relationships, so ignore me); but if it's kept
>then I think you ought to do the iput(inode) after the rcu_read_unlock() -
>iput() can get into lots more work than you expect.
I'd rather keep some verbosity if something screws up here, albeit very rare, yes.
You make a good point about doing the inode cleanup while holding rcu, and moving
it down should be safe here even if the inode disappears after that iput (ie no
dereferencing). Secondly, we still hold reference to the inode by the time we drop
the rcu read lock, so thats safe too.
>
>Otherwise it appeared to be good to me (but years since I've been near here).
Thanks for taking a look at this. I'll send a v5 with your suggestions.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v4] futex: Remove requirement for lock_page in get_futex_key
2016-01-29 18:35 ` Davidlohr Bueso
@ 2016-01-29 18:46 ` Hugh Dickins
0 siblings, 0 replies; 5+ messages in thread
From: Hugh Dickins @ 2016-01-29 18:46 UTC (permalink / raw)
To: Davidlohr Bueso
Cc: Hugh Dickins, Davidlohr Bueso, Mel Gorman, Peter Zijlstra,
Thomas Gleixner, Ingo Molnar, Sebastian Andrzej Siewior,
Chris Mason, Darren Hart, linux-kernel, Mel Gorman
On Fri, 29 Jan 2016, Davidlohr Bueso wrote:
> On Wed, 27 Jan 2016, Hugh Dickins wrote:
>
> > > + *
> > > + * The RCU read lock is taken as the inode is finally freed
> > > + * under RCU. If the mapping still matches expectations then
> > > the
> > > + * mapping->host can be safely accessed as being a valid
> > > inode.
> > > + */
> > > + rcu_read_lock();
> > > + if (READ_ONCE(page->mapping) != mapping ||
> > > + !mapping->host) {
> >
> > If you're being as paranoid as all the WARN_ON_ONCEs hereabouts imply,
> > then it would be better to do the inode = READ_ONCE(mapping->host)
> > before checking !inode rather than !mapping->host.
>
> Ok, it also reads a bit nicer than the above, which was simply avoiding
> a load in the again case.
>
> rcu_read_lock();
> inode = READ_ONCE(mapping->host);
Just a quick unthinking from-the-hip response to that, something
for you to think over before sending v5: without looking back
at the code, it's not obvious to me that it's safe to read
mapping->host before we've confirmed under rcu_read_lock()
that page->mapping still matches mapping.
>
> if (!inode || READ_ONCE(page->mapping) != mapping)
> rcu_read_unlock();
> put_page(page);
>
> goto again;
> }
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2016-01-29 18:46 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-01-23 17:14 [PATCH v4] futex: Remove requirement for lock_page in get_futex_key Davidlohr Bueso
2016-01-27 20:10 ` Thomas Gleixner
2016-01-28 3:02 ` Hugh Dickins
2016-01-29 18:35 ` Davidlohr Bueso
2016-01-29 18:46 ` Hugh Dickins
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.