From: Brice Goglin <Brice.Goglin@inria.fr>
To: Andrew Morton <akpm@osdl.org>,
Christoph Lameter <cl@linux-foundation.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Subject: [RFC/PATCH] No get_user/put_user while holding mmap_sem in do_pages_stat?
Date: Sun, 07 Dec 2008 03:14:17 +0100 [thread overview]
Message-ID: <493B3179.7030204@inria.fr> (raw)
Hello,
I have been seeing some deadlocks that seem to be related to do_pages_stat()
page-faulting while holding the mmap_sem. Part of my sys_move_pages() rework
has been applied to 2.6.28-rc. So do_pages_stat() now gets page addresses
from user-space (and puts the result back to user-space) while holding the
mmap_sem for read. If there's a page-fault there, the page-fault handler
grabs the mmap_sem for read again. But if another thread took it for write
in the meantime (for instance in mprotect), it deadlocks since rwsem readers
are blocked if a writer is already waiting.
Reading the archives, I see some similar deadlocks a couple years ago but
I can't find the final answer/fix. From what I understand, the mmap_sem
fairness could not be changed easily. So I am not sure whether accessing
user-space while holding mmap_sem for read is still valid/recommended today.
But the behavior of do_pages_stat() changed between 2.6.27 and 2.6.28-rc
because of my patch, and this deadlock seems to be happening for real.
So I would like to fix this small regression in 2.6.28. The patch below seems
to make my do_pages_stat() deadlock disappear here.
Brice
[PATCH] No get_user/put_user while holding mmap_sem in do_pages_stat
Since commit 2f007e74bb85b9fc4eab28524052161703300f1a, do_pages_stat()
gets the page address from user-space and puts the corresponding status
back while holding the mmap_sem for read. There is no need to hold
mmap_sem there while some page-faults may occur.
This patch adds a temporary address and status buffer so as to only hold
mmap_sem while working on these kernel buffers. This is implemented by
extracting do_pages_stat_array() out of do_pages_stat().
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
diff --git a/mm/migrate.c b/mm/migrate.c
index 1e0d6b2..4350101 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -987,25 +987,18 @@ out:
/*
* Determine the nodes of an array of pages and store it in an array of status.
*/
-static int do_pages_stat(struct mm_struct *mm, unsigned long nr_pages,
- const void __user * __user *pages,
- int __user *status)
+static void do_pages_stat_array(struct mm_struct *mm, unsigned long nr_pages,
+ const void * __user *pages, int *status)
{
unsigned long i;
- int err;
down_read(&mm->mmap_sem);
- for (i = 0; i < nr_pages; i++) {
- const void __user *p;
- unsigned long addr;
+ for (i = 0; i < nr_pages; i++, pages++, status++) {
+ unsigned long addr = (unsigned long)(*pages);
struct vm_area_struct *vma;
struct page *page;
-
- err = -EFAULT;
- if (get_user(p, pages+i))
- goto out;
- addr = (unsigned long) p;
+ int err;
vma = find_vma(mm, addr);
if (!vma)
@@ -1024,12 +1017,59 @@ static int do_pages_stat(struct mm_struct *mm, unsigned long nr_pages,
err = page_to_nid(page);
set_status:
- put_user(err, status+i);
+ *status = err;
+ }
+
+ up_read(&mm->mmap_sem);
+}
+
+/*
+ * Determine the nodes of a user array of pages and store it in
+ * a user array of status.
+ */
+static int do_pages_stat(struct mm_struct *mm, unsigned long nr_pages,
+ const void __user * __user *pages,
+ int __user *status)
+{
+ const void * __user *chunk_pages;
+ int *chunk_status;
+ unsigned long i,chunk_nr;
+ int err;
+
+ err = -ENOMEM;
+ chunk_pages = (const void * __user *)__get_free_page(GFP_KERNEL);
+ if (!chunk_pages)
+ goto out;
+ chunk_status = (int *)__get_free_page(GFP_KERNEL);
+ if (!chunk_status)
+ goto out_with_chunk_pages;
+
+ chunk_nr = PAGE_SIZE/max(sizeof(*chunk_pages), sizeof(*chunk_status));
+ for(i = 0; i < nr_pages; i += chunk_nr, pages += chunk_nr, status += chunk_nr) {
+ if (chunk_nr + i > nr_pages)
+ chunk_nr = nr_pages - i;
+
+ err = copy_from_user(chunk_pages, pages, chunk_nr * sizeof(*chunk_pages));
+ if (err) {
+ err = -EFAULT;
+ goto out_with_chunk_status;
+ }
+
+ do_pages_stat_array(mm, chunk_nr, chunk_pages, chunk_status);
+
+ err = copy_to_user(status, chunk_status, chunk_nr * sizeof(*chunk_status));
+ if (err) {
+ err = -EFAULT;
+ goto out_with_chunk_status;
+ }
}
err = 0;
+out_with_chunk_status:
+ free_page((unsigned long)chunk_status);
+out_with_chunk_pages:
+ free_page((unsigned long)chunk_pages);
out:
- up_read(&mm->mmap_sem);
return err;
}
next reply other threads:[~2008-12-07 2:38 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-12-07 2:14 Brice Goglin [this message]
2008-12-07 2:50 ` [RFC/PATCH] No get_user/put_user while holding mmap_sem in do_pages_stat? Andrew Morton
2008-12-07 14:21 ` Brice Goglin
2008-12-09 14:19 ` Christoph Lameter
2008-12-09 16:35 ` Peter Zijlstra
2008-12-09 16:47 ` Brice Goglin
2008-12-09 17:46 ` Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=493B3179.7030204@inria.fr \
--to=brice.goglin@inria.fr \
--cc=akpm@osdl.org \
--cc=cl@linux-foundation.org \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox