From: Ingo Molnar <mingo@elte.hu>
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Robin Holt <holt@sgi.com>, Andrew Morton <akpm@osdl.org>,
Linus Torvalds <torvalds@osdl.org>,
Roland McGrath <roland@redhat.com>,
Hugh Dickins <hugh@veritas.com>,
linux-mm@kvack.org, linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [patch 2.6.13-rc4] fix get_user_pages bug
Date: Mon, 1 Aug 2005 11:19:56 +0200 [thread overview]
Message-ID: <20050801091956.GA3950@elte.hu> (raw)
In-Reply-To: <42EDDB82.1040900@yahoo.com.au>
* Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> Hi,
>
> Not sure if this should be fixed for 2.6.13. It can result in
> pagecache corruption: so I guess that answers my own question.
>
> This was tested by Robin and appears to solve the problem. Roland had
> a quick look and thought the basic idea was sound. I'd like to get a
> couple more acks before going forward, and in particular Robin was
> contemplating possible efficiency improvements (although efficiency
> can wait on correctness).
>
> Feedback please, anyone.
it looks good to me, but wouldnt it be simpler (in terms of patch and
architecture impact) to always retry the follow_page() in
get_user_pages(), in case of a minor fault? The sequence of minor faults
must always be finite so it's a safe solution, and should solve the race
too. The extra overhead of an additional follow_page() is small.
Especially with 2.6.13 being close this looks like the safest bet, any
improvement to this (i.e. your patch) can then be done in 2.6.14.
Ingo
When get_user_pages for write access races with another process in the page
fault handler that has established the pte for read access, handle_mm_fault
in get_user_pages will return VM_FAULT_MINOR even if it hasn't made the page
correctly writeable (for example, broken COW).
Thus the assumption that get_user_pages has a writeable page at the mapping
after handle_mm_fault returns is incorrect. Fix this by retrying the lookup
and fault in get_user_pages before making the assumption that we have a
writeable page.
Great work by Robin Holt <holt@sgi.com> to debug the problem.
Originally-from: Nick Piggin <npiggin@suse.de>
simplified the patch into a two-liner:
Signed-off-by: Ingo Molnar <mingo@elte.hu>
mm/memory.c | 7 +++++++
1 files changed, 7 insertions(+)
Index: linux-sched-curr/mm/memory.c
===================================================================
--- linux-sched-curr.orig/mm/memory.c
+++ linux-sched-curr/mm/memory.c
@@ -961,6 +961,13 @@ int get_user_pages(struct task_struct *t
switch (handle_mm_fault(mm,vma,start,write)) {
case VM_FAULT_MINOR:
tsk->min_flt++;
+ /*
+ * We might have raced with a readonly
+ * pagefault, retry to make sure we
+ * got write access:
+ */
+ if (write)
+ continue;
break;
case VM_FAULT_MAJOR:
tsk->maj_flt++;
WARNING: multiple messages have this Message-ID (diff)
From: Ingo Molnar <mingo@elte.hu>
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Robin Holt <holt@sgi.com>, Andrew Morton <akpm@osdl.org>,
Linus Torvalds <torvalds@osdl.org>,
Roland McGrath <roland@redhat.com>,
Hugh Dickins <hugh@veritas.com>,
linux-mm@kvack.org, linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [patch 2.6.13-rc4] fix get_user_pages bug
Date: Mon, 1 Aug 2005 11:19:56 +0200 [thread overview]
Message-ID: <20050801091956.GA3950@elte.hu> (raw)
In-Reply-To: <42EDDB82.1040900@yahoo.com.au>
* Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> Hi,
>
> Not sure if this should be fixed for 2.6.13. It can result in
> pagecache corruption: so I guess that answers my own question.
>
> This was tested by Robin and appears to solve the problem. Roland had
> a quick look and thought the basic idea was sound. I'd like to get a
> couple more acks before going forward, and in particular Robin was
> contemplating possible efficiency improvements (although efficiency
> can wait on correctness).
>
> Feedback please, anyone.
it looks good to me, but wouldnt it be simpler (in terms of patch and
architecture impact) to always retry the follow_page() in
get_user_pages(), in case of a minor fault? The sequence of minor faults
must always be finite so it's a safe solution, and should solve the race
too. The extra overhead of an additional follow_page() is small.
Especially with 2.6.13 being close this looks like the safest bet, any
improvement to this (i.e. your patch) can then be done in 2.6.14.
Ingo
When get_user_pages for write access races with another process in the page
fault handler that has established the pte for read access, handle_mm_fault
in get_user_pages will return VM_FAULT_MINOR even if it hasn't made the page
correctly writeable (for example, broken COW).
Thus the assumption that get_user_pages has a writeable page at the mapping
after handle_mm_fault returns is incorrect. Fix this by retrying the lookup
and fault in get_user_pages before making the assumption that we have a
writeable page.
Great work by Robin Holt <holt@sgi.com> to debug the problem.
Originally-from: Nick Piggin <npiggin@suse.de>
simplified the patch into a two-liner:
Signed-off-by: Ingo Molnar <mingo@elte.hu>
mm/memory.c | 7 +++++++
1 files changed, 7 insertions(+)
Index: linux-sched-curr/mm/memory.c
===================================================================
--- linux-sched-curr.orig/mm/memory.c
+++ linux-sched-curr/mm/memory.c
@@ -961,6 +961,13 @@ int get_user_pages(struct task_struct *t
switch (handle_mm_fault(mm,vma,start,write)) {
case VM_FAULT_MINOR:
tsk->min_flt++;
+ /*
+ * We might have raced with a readonly
+ * pagefault, retry to make sure we
+ * got write access:
+ */
+ if (write)
+ continue;
break;
case VM_FAULT_MAJOR:
tsk->maj_flt++;
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2005-08-01 9:20 UTC|newest]
Thread overview: 133+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-07-30 20:53 get_user_pages() with write=1 and force=1 gets read-only pages Robin Holt
2005-07-30 22:13 ` Hugh Dickins
2005-07-31 1:52 ` Nick Piggin
2005-07-31 10:52 ` Robin Holt
2005-07-31 11:07 ` Nick Piggin
2005-07-31 11:30 ` Robin Holt
2005-07-31 11:39 ` Robin Holt
2005-07-31 12:09 ` Robin Holt
2005-07-31 22:27 ` Nick Piggin
2005-08-01 3:22 ` Roland McGrath
2005-08-01 8:21 ` [patch 2.6.13-rc4] fix get_user_pages bug Nick Piggin
2005-08-01 9:19 ` Ingo Molnar [this message]
2005-08-01 9:19 ` Ingo Molnar
2005-08-01 9:27 ` Nick Piggin
2005-08-01 9:27 ` Nick Piggin
2005-08-01 10:15 ` Ingo Molnar
2005-08-01 10:15 ` Ingo Molnar
2005-08-01 10:57 ` Nick Piggin
2005-08-01 10:57 ` Nick Piggin
2005-08-01 19:43 ` Hugh Dickins
2005-08-01 19:43 ` Hugh Dickins
2005-08-01 20:08 ` Linus Torvalds
2005-08-01 20:08 ` Linus Torvalds
2005-08-01 21:06 ` Hugh Dickins
2005-08-01 21:06 ` Hugh Dickins
2005-08-01 21:51 ` Linus Torvalds
2005-08-01 21:51 ` Linus Torvalds
2005-08-01 22:01 ` Linus Torvalds
2005-08-01 22:01 ` Linus Torvalds
2005-08-02 12:01 ` Martin Schwidefsky
2005-08-02 12:01 ` Martin Schwidefsky
2005-08-02 12:26 ` Hugh Dickins
2005-08-02 12:26 ` Hugh Dickins
2005-08-02 12:28 ` Nick Piggin
2005-08-02 15:19 ` Martin Schwidefsky
2005-08-02 15:19 ` Martin Schwidefsky
2005-08-02 15:30 ` Linus Torvalds
2005-08-02 15:30 ` Linus Torvalds
2005-08-02 16:03 ` Hugh Dickins
2005-08-02 16:03 ` Hugh Dickins
2005-08-02 16:25 ` Linus Torvalds
2005-08-02 16:25 ` Linus Torvalds
2005-08-02 17:02 ` Linus Torvalds
2005-08-02 17:02 ` Linus Torvalds
2005-08-02 17:27 ` Hugh Dickins
2005-08-02 17:27 ` Hugh Dickins
2005-08-02 17:21 ` Hugh Dickins
2005-08-02 17:21 ` Hugh Dickins
2005-08-02 18:47 ` Linus Torvalds
2005-08-02 18:47 ` Linus Torvalds
2005-08-02 19:20 ` Hugh Dickins
2005-08-02 19:20 ` Hugh Dickins
2005-08-02 19:54 ` Linus Torvalds
2005-08-02 19:54 ` Linus Torvalds
2005-08-02 20:55 ` Hugh Dickins
2005-08-02 20:55 ` Hugh Dickins
2005-08-03 10:24 ` Nick Piggin
2005-08-03 11:47 ` Hugh Dickins
2005-08-03 11:47 ` Hugh Dickins
2005-08-03 12:13 ` Nick Piggin
2005-08-03 12:13 ` Nick Piggin
2005-08-03 16:12 ` Linus Torvalds
2005-08-03 16:12 ` Linus Torvalds
2005-08-03 16:39 ` Linus Torvalds
2005-08-03 16:39 ` Linus Torvalds
2005-08-03 16:42 ` Linus Torvalds
2005-08-03 16:42 ` Linus Torvalds
2005-08-03 17:12 ` Hugh Dickins
2005-08-03 17:12 ` Hugh Dickins
2005-08-03 23:03 ` Nick Piggin
2005-08-03 23:03 ` Nick Piggin
2005-08-04 14:14 ` Alexander Nyberg
2005-08-04 14:14 ` Alexander Nyberg
2005-08-04 14:30 ` Nick Piggin
2005-08-04 14:30 ` Nick Piggin
2005-08-04 15:00 ` Alexander Nyberg
2005-08-04 15:00 ` Alexander Nyberg
2005-08-04 15:35 ` Hugh Dickins
2005-08-04 15:35 ` Hugh Dickins
2005-08-04 16:32 ` Russell King
2005-08-04 16:32 ` Russell King
2005-08-04 15:36 ` Linus Torvalds
2005-08-04 15:36 ` Linus Torvalds
2005-08-04 16:29 ` Russell King
2005-08-04 16:29 ` Russell King
2005-08-03 10:24 ` Martin Schwidefsky
2005-08-03 10:24 ` Martin Schwidefsky
2005-08-03 11:57 ` Hugh Dickins
2005-08-03 11:57 ` Hugh Dickins
2005-08-02 16:44 ` Martin Schwidefsky
2005-08-02 16:44 ` Martin Schwidefsky
2005-08-01 15:42 ` Linus Torvalds
2005-08-01 15:42 ` Linus Torvalds
2005-08-01 18:18 ` Linus Torvalds
2005-08-01 18:18 ` Linus Torvalds
2005-08-03 8:24 ` Robin Holt
2005-08-03 8:24 ` Robin Holt
2005-08-03 11:31 ` Hugh Dickins
2005-08-03 11:31 ` Hugh Dickins
2005-08-04 11:48 ` Robin Holt
2005-08-04 11:48 ` Robin Holt
2005-08-04 13:04 ` Hugh Dickins
2005-08-04 13:04 ` Hugh Dickins
2005-08-01 19:29 ` Hugh Dickins
2005-08-01 19:29 ` Hugh Dickins
2005-08-01 19:48 ` Linus Torvalds
2005-08-01 19:48 ` Linus Torvalds
2005-08-02 8:07 ` Martin Schwidefsky
2005-08-02 8:07 ` Martin Schwidefsky
2005-08-01 19:57 ` Andrew Morton
2005-08-01 19:57 ` Andrew Morton
2005-08-01 20:16 ` Linus Torvalds
2005-08-01 20:16 ` Linus Torvalds
2005-08-02 0:14 ` Nick Piggin
2005-08-02 0:14 ` Nick Piggin
2005-08-02 1:27 ` Nick Piggin
2005-08-02 1:27 ` Nick Piggin
2005-08-02 3:45 ` Linus Torvalds
2005-08-02 3:45 ` Linus Torvalds
2005-08-02 4:25 ` Nick Piggin
2005-08-02 4:25 ` Nick Piggin
2005-08-02 4:35 ` Linus Torvalds
2005-08-02 4:35 ` Linus Torvalds
2005-08-01 20:03 ` Hugh Dickins
2005-08-01 20:03 ` Hugh Dickins
2005-08-01 20:12 ` Andrew Morton
2005-08-01 20:12 ` Andrew Morton
2005-08-01 20:26 ` Linus Torvalds
2005-08-01 20:26 ` Linus Torvalds
2005-08-01 20:51 ` Hugh Dickins
2005-08-01 20:51 ` Hugh Dickins
-- strict thread matches above, loose matches on Subject: below --
2005-08-02 14:02 Dan Higgins
2005-08-02 14:02 ` Dan Higgins
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20050801091956.GA3950@elte.hu \
--to=mingo@elte.hu \
--cc=akpm@osdl.org \
--cc=holt@sgi.com \
--cc=hugh@veritas.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=nickpiggin@yahoo.com.au \
--cc=roland@redhat.com \
--cc=torvalds@osdl.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.