From: Andrea Arcangeli <aarcange@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>, Nick Piggin <npiggin@novell.com>,
Hugh Dickins <hugh@veritas.com>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
linux-mm@kvack.org
Subject: Re: [aarcange@redhat.com: [PATCH] fork vs gup(-fast) fix]
Date: Wed, 11 Mar 2009 22:57:21 +0100 [thread overview]
Message-ID: <20090311215721.GS27823@random.random> (raw)
In-Reply-To: <alpine.LFD.2.00.0903111417230.32478@localhost.localdomain>
On Wed, Mar 11, 2009 at 02:28:08PM -0700, Linus Torvalds wrote:
> The fact that the non-fast "get_user_pages()" takes the mmap semaphore for
> reading doesn't even protect that. It just means that the pages made sense
> at the time the get_user_pages() happened, not necessarily at the time
> when the actual use of them did.
Indeed this is a generic problem, not specific to
get_user_pages_fast. get_user_pages_fast just adds a few complications
to serialize against.
> O_DIRECT is actually the _simple_ case, since we won't be returning until
> it is done (ie it's not actually a async interface). So no, O_DIRECT
> doesn't need any interrupt handler games. It would just need to hold the
> sem over the actual call to the filesystem (ie just over the ->direct_IO()
> call).
I don't see how you can solve the race by only holding the sem only
over the direct_IO call (and not until the I/O completion handler
fires). I think to solve the race using mmap_sem only, the bio I/O
completion handler that eventually calls into direct-io.c from irq
context would need to up_read(&mmap_sem).
The way my patch avoids to alter the I/O completion path running from
irq context is by ensuring no I/O is going on at all to the pages that
are being shared with the child, and by ensuring that any gup or
gup-fast will trigger cow before it can write to the shared
page. Pages simply can't be shared before I/O is complete.
> People want the relaxed synchronization we give them, and that's literally
> why get_user_pages_fast exists - because people don't want _more_
> synchronization, they want _less_.
>
> But the thing is, with less synchronization, the behavior really is
> surprising in the edge cases. Which is why I think "threaded fork" plus
> "get_user_pages_fast" just doesn't make sense to even _worry_ about. If
> you use O_DIRECT and mix it with fork, you get what you get, and it's
> random - exactly because people who want O_DIRECT don't want any locking.
>
> It's a user-space issue, not a kernel issue.
I think your point of view is clear, I sure can write userland code
that copes it the currently altered memory protection semantics of
read vs fork if fd is opened with O_DIRECT or drivers using gup, so
I'll let the userland folks comment on it, some are in CC.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2009-03-11 21:57 UTC|newest]
Thread overview: 83+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20090311170611.GA2079@elte.hu>
2009-03-11 17:33 ` [aarcange@redhat.com: [PATCH] fork vs gup(-fast) fix] Linus Torvalds
2009-03-11 17:41 ` Ingo Molnar
2009-03-11 17:58 ` Linus Torvalds
2009-03-11 18:37 ` Andrea Arcangeli
2009-03-11 18:46 ` Linus Torvalds
2009-03-11 19:01 ` Linus Torvalds
2009-03-11 19:59 ` Andrea Arcangeli
2009-03-11 20:19 ` Linus Torvalds
2009-03-11 20:33 ` Linus Torvalds
2009-03-11 20:55 ` Andrea Arcangeli
2009-03-11 21:28 ` Linus Torvalds
2009-03-11 21:57 ` Andrea Arcangeli [this message]
2009-03-11 22:06 ` Linus Torvalds
2009-03-11 22:07 ` Linus Torvalds
2009-03-11 22:22 ` Davide Libenzi
2009-03-11 22:32 ` Linus Torvalds
2009-03-14 5:07 ` Benjamin Herrenschmidt
2009-03-11 20:48 ` Andrea Arcangeli
2009-03-14 5:06 ` Benjamin Herrenschmidt
2009-03-14 5:20 ` Nick Piggin
2009-03-16 16:01 ` KOSAKI Motohiro
2009-03-16 16:23 ` Nick Piggin
2009-03-16 16:32 ` Linus Torvalds
2009-03-16 16:50 ` Nick Piggin
2009-03-16 17:02 ` Linus Torvalds
2009-03-16 17:19 ` Nick Piggin
2009-03-16 17:42 ` Linus Torvalds
2009-03-16 18:02 ` Nick Piggin
2009-03-16 18:05 ` Nick Piggin
2009-03-16 18:17 ` Linus Torvalds
2009-03-16 18:33 ` Nick Piggin
2009-03-16 19:22 ` Linus Torvalds
2009-03-17 5:44 ` Nick Piggin
2009-03-16 18:14 ` Linus Torvalds
2009-03-16 18:29 ` Nick Piggin
2009-03-16 19:17 ` Linus Torvalds
2009-03-17 5:42 ` Nick Piggin
2009-03-17 5:58 ` Nick Piggin
2009-03-16 18:37 ` Andrea Arcangeli
2009-03-16 18:28 ` Andrea Arcangeli
2009-03-16 23:59 ` KAMEZAWA Hiroyuki
2009-03-18 2:04 ` KOSAKI Motohiro
2009-03-22 12:23 ` KOSAKI Motohiro
2009-03-23 0:13 ` KOSAKI Motohiro
2009-03-23 16:29 ` Ingo Molnar
2009-03-23 16:46 ` Linus Torvalds
2009-03-24 5:08 ` KOSAKI Motohiro
2009-03-24 13:43 ` Nick Piggin
2009-03-24 17:56 ` Linus Torvalds
2009-03-30 10:52 ` KOSAKI Motohiro
[not found] ` <200904022307.12043.nickpiggin@yahoo.com.au>
2009-04-03 3:49 ` Nick Piggin
2009-03-17 0:44 ` Linus Torvalds
2009-03-17 0:56 ` KAMEZAWA Hiroyuki
2009-03-17 12:19 ` Andrea Arcangeli
2009-03-17 16:43 ` Linus Torvalds
2009-03-17 17:01 ` Linus Torvalds
2009-03-17 17:10 ` Andrea Arcangeli
2009-03-17 17:43 ` Linus Torvalds
2009-03-17 18:09 ` Linus Torvalds
2009-03-17 18:19 ` Linus Torvalds
2009-03-17 18:46 ` Andrea Arcangeli
2009-03-17 19:03 ` Linus Torvalds
2009-03-17 19:35 ` Andrea Arcangeli
2009-03-17 19:55 ` Linus Torvalds
2009-03-11 19:06 ` Andrea Arcangeli
2009-03-12 5:36 ` Nick Piggin
2009-03-12 16:23 ` Nick Piggin
2009-03-12 17:00 ` Andrea Arcangeli
2009-03-12 17:20 ` Nick Piggin
2009-03-12 17:23 ` Nick Piggin
2009-03-12 18:06 ` Andrea Arcangeli
2009-03-12 18:58 ` Andrea Arcangeli
2009-03-13 16:09 ` Nick Piggin
2009-03-13 19:34 ` Andrea Arcangeli
2009-03-14 4:59 ` Nick Piggin
2009-03-16 13:56 ` Andrea Arcangeli
2009-03-16 16:01 ` Nick Piggin
2009-03-14 4:46 ` Nick Piggin
2009-03-14 5:06 ` Nick Piggin
2009-03-11 18:53 ` Andrea Arcangeli
2009-03-11 18:22 ` Andrea Arcangeli
2009-03-11 19:06 ` Ingo Molnar
2009-03-11 19:15 ` Andrea Arcangeli
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090311215721.GS27823@random.random \
--to=aarcange@redhat.com \
--cc=hugh@veritas.com \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=kosaki.motohiro@jp.fujitsu.com \
--cc=linux-mm@kvack.org \
--cc=mingo@elte.hu \
--cc=npiggin@novell.com \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).