From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Zhang Yi <zhang.yi20@zte.com.cn>,
Jiang Biao <jiang.biao2@zte.com.cn>,
Ma Chenggong <ma.chenggong@zte.com.cn>,
Mel Gorman <mgorman@suse.de>,
Darren Hart <dvhart@linux.intel.com>,
Peter Zijlstra <peterz@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>
Subject: [ 11/19] futex: Take hugepages into account when generating futex_key
Date: Thu, 11 Jul 2013 15:01:28 -0700
Message-ID: <20130711214831.923621274@linuxfoundation.org>
In-Reply-To: <20130711214830.611455274@linuxfoundation.org>
3.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhang Yi <wetpzy@gmail.com>
commit 13d60f4b6ab5b702dc8d2ee20999f98a93728aec upstream.
The futex_keys of process shared futexes are generated from the page
offset, the mapping host and the mapping index of the futex user space
address. This should result in a unique identifier for each futex.
However, this does not hold when futexes are located in different
subpages of a hugepage, because the mapping index for all those
futexes evaluates to the index of the base page of the hugetlbfs
mapping. So a futex at offset 0 of the hugepage mapping and another
one at offset PAGE_SIZE of the same hugepage mapping have identical
futex_keys. This happens because the futex code blindly uses
page->index.
Steps to reproduce the bug:
1. Map a file from hugetlbfs. Initialize pthread_mutex1 at offset 0
   and pthread_mutex2 at offset PAGE_SIZE of the hugetlbfs
   mapping.
   The mutexes must be initialized as PTHREAD_PROCESS_SHARED because
   PTHREAD_PROCESS_PRIVATE mutexes are not affected by this issue as
   their keys solely depend on the user space address.
2. Lock mutex1 and mutex2.
3. Create thread1 and in the thread function lock mutex1, which
   results in thread1 blocking on the locked mutex1.
4. Create thread2 and in the thread function lock mutex2, which
   results in thread2 blocking on the locked mutex2.
5. Unlock mutex2. Despite the fact that mutex2 got unlocked, thread2
   still blocks on mutex2 because the futex_key points to mutex1.
To solve this issue we need to take the normal page index of the page
which contains the futex into account, if the futex is in a hugetlbfs
mapping. In other words, we calculate the normal page mapping index of
the subpage in the hugetlbfs mapping.
Mappings which are not based on hugetlbfs are not affected and still
use page->index.
Thanks to Mel Gorman who provided a patch for adding proper evaluation
functions to the hugetlbfs code to avoid exposing hugetlbfs specific
details to the futex code.
[ tglx: Massaged changelog ]
Signed-off-by: Zhang Yi <zhang.yi20@zte.com.cn>
Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>
Tested-by: Ma Chenggong <ma.chenggong@zte.com.cn>
Reviewed-by: 'Mel Gorman' <mgorman@suse.de>
Acked-by: 'Darren Hart' <dvhart@linux.intel.com>
Cc: 'Peter Zijlstra' <peterz@infradead.org>
Link: http://lkml.kernel.org/r/000101ce71a6%24a83c5880%24f8b50980%24@com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
include/linux/hugetlb.h | 16 ++++++++++++++++
kernel/futex.c | 3 ++-
mm/hugetlb.c | 17 +++++++++++++++++
3 files changed, 35 insertions(+), 1 deletion(-)
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -358,6 +358,17 @@ static inline int hstate_index(struct hs
 	return h - hstates;
 }

+pgoff_t __basepage_index(struct page *page);
+
+/* Return page->index in PAGE_SIZE units */
+static inline pgoff_t basepage_index(struct page *page)
+{
+	if (!PageCompound(page))
+		return page->index;
+
+	return __basepage_index(page);
+}
+
 #else	/* CONFIG_HUGETLB_PAGE */
 struct hstate {};
 #define alloc_huge_page_node(h, nid) NULL
@@ -378,6 +389,11 @@ static inline unsigned int pages_per_hug
 }
 #define hstate_index_to_shift(index) 0
 #define hstate_index(h) 0
+
+static inline pgoff_t basepage_index(struct page *page)
+{
+	return page->index;
+}
 #endif	/* CONFIG_HUGETLB_PAGE */

 #endif /* _LINUX_HUGETLB_H */
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -61,6 +61,7 @@
 #include <linux/nsproxy.h>
 #include <linux/ptrace.h>
 #include <linux/sched/rt.h>
+#include <linux/hugetlb.h>

 #include <asm/futex.h>

@@ -365,7 +366,7 @@ again:
 	} else {
 		key->both.offset |= FUT_OFF_INODE; /* inode-based key */
 		key->shared.inode = page_head->mapping->host;
-		key->shared.pgoff = page_head->index;
+		key->shared.pgoff = basepage_index(page);
 	}

 	get_futex_key_refs(key);
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -690,6 +690,23 @@ int PageHuge(struct page *page)
 }
 EXPORT_SYMBOL_GPL(PageHuge);

+pgoff_t __basepage_index(struct page *page)
+{
+	struct page *page_head = compound_head(page);
+	pgoff_t index = page_index(page_head);
+	unsigned long compound_idx;
+
+	if (!PageHuge(page_head))
+		return page_index(page);
+
+	if (compound_order(page_head) >= MAX_ORDER)
+		compound_idx = page_to_pfn(page) - page_to_pfn(page_head);
+	else
+		compound_idx = page - page_head;
+
+	return (index << compound_order(page_head)) + compound_idx;
+}
+
 static struct page *alloc_fresh_huge_page_node(struct hstate *h, int nid)
 {
 	struct page *page;