[PATCH/RFC] Futex mmap_sem deadlock

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: olof@austin.ibm.com (Olof Johansson)
To: linux-kernel@vger.kernel.org
Cc: akpm@osdl.org, torvalds@osdl.org, jamie@shareable.org,
	rusty@rustcorp.com.au
Subject: [PATCH/RFC] Futex mmap_sem deadlock
Date: Tue, 22 Feb 2005 13:06:46 -0600	[thread overview]
Message-ID: <20050222190646.GA7079@austin.ibm.com> (raw)

Hi,

Consider a small testcase that spawns off two threads, either thread
doing a loop of:

	buf = mmap /dev/zero MAP_SHARED for 0x100000 bytes
	call sys_futex (buf+page, FUTEX_WAIT, 1, NULL, NULL) for each page in said mmap
	munmap(buf)
	repeat

This will quickly lock up, since the futex_wait code dows a
down_read(mmap_sem), then a get_user().

The do_page_fault code on ppc64 (as well as other architectures) needs
to take the same semaphore for reading. This is all good until the
second thread comes into play: Its mmap call tries to take the same
semaphore for writing which causes in the do_page_fault down_read()
to get stuck. Classic deadlock.

Stacks are as follows:

a.out         S 000000000ff9772c 12288  6491   6484  6492               (NOTLB)
Call Trace:
[c0000003eb72fa10] [0000000030ecbd28] 0x30ecbd28 (unreliable)
[c0000003eb72fbe0] [c0000000000119ec] .__switch_to+0xa4/0xf0
[c0000003eb72fc70] [c00000000043ffbc] .schedule+0x4d4/0xe5c
[c0000003eb72fd90] [c0000000000247c8] .sys32_rt_sigsuspend+0xe0/0x120
[c0000003eb72fe30] [c00000000000d7e8] .ppc32_rt_sigsuspend+0x8/0x14
a.out         D 000000000ff0ad38 13632  6492   6491  6493               (NOTLB)
Call Trace:
[c00000003a1a7830] [c00000003a1a7850] 0xc00000003a1a7850 (unreliable)
[c00000003a1a7a00] [c0000000000119ec] .__switch_to+0xa4/0xf0
[c00000003a1a7a90] [c00000000043ffbc] .schedule+0x4d4/0xe5c
[c00000003a1a7bb0] [c000000000441ec0] .rwsem_down_write_failed+0x138/0x2f0
[c00000003a1a7c90] [c000000000098db0] .sys_mprotect+0x7bc/0x81c
[c00000003a1a7e30] [c00000000000d600] syscall_exit+0x0/0x18
a.out         D 0000000010000654 12832  6493   6492                     (NOTLB)
Call Trace:
[c00000000f2674f0] [c0000000000119ec] .__switch_to+0xa4/0xf0
[c00000000f267580] [c00000000043ffbc] .schedule+0x4d4/0xe5c
[c00000000f2676a0] [c0000000004421b4] .rwsem_down_read_failed+0x13c/0x2f4
[c00000000f267780] [c00000000003ba6c] .do_page_fault+0x66c/0x738
[c00000000f267900] [c00000000000b2dc] .handle_page_fault+0x20/0x54
--- Exception: 301 at .do_futex+0x5d4/0x720
    LR = .do_futex+0x1e0/0x720
[c00000000f267d60] [c000000000078124] .compat_sys_futex+0xa4/0x16c
[c00000000f267e30] [c00000000000d600] syscall_exit+0x0/0x18


One attempt to fix this is included below. It works, but I'm not entirely
happy with the fact that it's a bit messy solution. If anyone has a
better idea for how to solve it I'd be all ears.

Auditing other read-takers of mmap_sem, I found one more exposure that
could be solved by just moving code around.

-----

Some futex functions do get_user calls while holding mmap_sem for
reading. If get_user() faults, and another thread happens to be in mmap
(or somewhere else holding waiting on down_write for the same semaphore),
then do_page_fault will deadlock. Most architectures seem to be exposed
to this.

To avoid it, make sure the page is available. If not, release the
semaphore, fault it in and retake it.

I also found another exposure by inspection, moving some of the code
around avoids the possible deadlock there.

Signed-off-by: Olof Johansson <olof@austin.ibm.com>


Index: linux-2.5/kernel/futex.c
===================================================================
--- linux-2.5.orig/kernel/futex.c	2005-02-21 16:09:38.000000000 -0600
+++ linux-2.5/kernel/futex.c	2005-02-22 12:35:01.000000000 -0600
@@ -326,11 +326,22 @@
 	struct futex_hash_bucket *bh1, *bh2;
 	struct list_head *head1;
 	struct futex_q *this, *next;
-	int ret, drop_count = 0;
+	int ret, curval, drop_count = 0;
 	unsigned int nqueued;
 
 	down_read(&current->mm->mmap_sem);
 
+	/* Make sure we can access the page, since there's a risk to
+	 * deadlock with do_page_fault() if get_user() faults.
+	 */
+	while (!check_user_page_readable(current->mm, uaddr1)) {
+		up_read(&current->mm->mmap_sem);
+		/* Fault in the page through get_user() but discard result */
+		if (get_user(curval, (int __user *)uaddr1) != 0)
+			return -EFAULT;
+		down_read(&current->mm->mmap_sem);
+	}
+
 	ret = get_futex_key(uaddr1, &key1);
 	if (unlikely(ret != 0))
 		goto out;
@@ -343,8 +354,6 @@
 
 	nqueued = bh1->nqueued;
 	if (likely(valp != NULL)) {
-		int curval;
-
 		/* In order to avoid doing get_user while
 		   holding bh1->lock and bh2->lock, nqueued
 		   (monotonically increasing field) must be first
@@ -482,6 +491,19 @@
 
 	down_read(&current->mm->mmap_sem);
 
+	/* Make sure we can access the page, since there's a risk to
+	 * deadlock with do_page_fault() if get_user() faults.
+	 */
+	while (!check_user_page_readable(current->mm, uaddr)) {
+		up_read(&current->mm->mmap_sem);
+
+		/* Fault in the page through get_user() but discard result */
+		if (get_user(curval, (int __user *)uaddr) != 0)
+			return -EFAULT;
+
+		down_read(&current->mm->mmap_sem);
+	}
+
 	ret = get_futex_key(uaddr, &q.key);
 	if (unlikely(ret != 0))
 		goto out_release_sem;
Index: linux-2.5/mm/mempolicy.c
===================================================================
--- linux-2.5.orig/mm/mempolicy.c	2005-02-04 00:27:40.000000000 -0600
+++ linux-2.5/mm/mempolicy.c	2005-02-21 16:43:08.000000000 -0600
@@ -486,6 +486,7 @@
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma = NULL;
 	struct mempolicy *pol = current->mempolicy;
+	DECLARE_BITMAP(nodes, MAX_NUMNODES);
 
 	if (flags & ~(unsigned long)(MPOL_F_NODE|MPOL_F_ADDR))
 		return -EINVAL;
@@ -524,16 +525,21 @@
 	} else
 		pval = pol->policy;
 
+	if (nmask)
+		get_zonemask(pol, nodes);
+
+	if (vma) {
+		up_read(&current->mm->mmap_sem);
+		vma = NULL;
+	}
+
 	err = -EFAULT;
 	if (policy && put_user(pval, policy))
 		goto out;
 
 	err = 0;
-	if (nmask) {
-		DECLARE_BITMAP(nodes, MAX_NUMNODES);
-		get_zonemask(pol, nodes);
+	if (nmask)
 		err = copy_nodes_to_user(nmask, maxnode, nodes, sizeof(nodes));
-	}
 
  out:
 	if (vma)

next             reply	other threads:[~2005-02-22 19:15 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-02-22 19:06 Olof Johansson [this message]
2005-02-22 19:36 ` [PATCH/RFC] Futex mmap_sem deadlock Linus Torvalds
2005-02-22 21:16   ` Benjamin Herrenschmidt
2005-02-22 21:19     ` Benjamin Herrenschmidt
2005-02-22 21:31     ` Linus Torvalds
2005-02-22 21:42       ` Benjamin Herrenschmidt
2005-02-22 22:10         ` Linus Torvalds
2005-02-22 22:24           ` Benjamin Herrenschmidt
2005-02-22 23:08           ` Greg KH
2005-02-23 11:24   ` David Howells
2005-02-22 19:55 ` Andrew Morton
2005-02-22 21:07   ` Jamie Lokier
2005-02-22 21:19     ` Olof Johansson
2005-02-22 22:09       ` Jamie Lokier
2005-02-22 21:19     ` Chris Friesen
2005-02-22 21:27       ` Jamie Lokier
2005-02-22 21:30     ` Linus Torvalds
2005-02-22 22:34       ` Jamie Lokier
2005-02-22 22:42         ` Olof Johansson
2005-02-22 23:20           ` Andrew Morton
2005-02-22 23:23             ` Olof Johansson
2005-02-23 11:39       ` David Howells
2005-02-23 16:22         ` Olof Johansson
2005-02-23 18:44           ` David Howells
2005-02-23 14:49       ` Joe Korty
2005-02-23 15:54         ` Linus Torvalds
2005-02-23 17:10           ` Olof Johansson
2005-02-23 17:37             ` Arjan van de Ven
2005-02-23 18:22             ` Jamie Lokier
2005-02-23 18:34               ` Linus Torvalds
2005-02-23 18:49                 ` Jamie Lokier
2005-02-23 19:12                   ` Olof Johansson
2005-02-23 22:00                     ` Linus Torvalds
2005-02-24  0:00                     ` Jamie Lokier
2005-02-23 18:37               ` Olof Johansson
2005-02-22 21:40     ` Andrew Morton
2005-02-22 21:59       ` Linus Torvalds
2005-02-23 11:42       ` David Howells

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20050222190646.GA7079@austin.ibm.com \
    --to=olof@austin.ibm.com \
    --cc=akpm@osdl.org \
    --cc=jamie@shareable.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=rusty@rustcorp.com.au \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox