From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail191.messagelabs.com (mail191.messagelabs.com [216.82.242.19])
	by kanga.kvack.org (Postfix) with SMTP id A8DF86B0047
	for ; Wed, 11 Mar 2009 17:57:33 -0400 (EDT)
Date: Wed, 11 Mar 2009 22:57:21 +0100
From: Andrea Arcangeli
Subject: Re: [aarcange@redhat.com: [PATCH] fork vs gup(-fast) fix]
Message-ID: <20090311215721.GS27823@random.random>
References: <20090311174103.GA11979@elte.hu>
	<20090311183748.GK27823@random.random>
	<20090311195935.GO27823@random.random>
	<20090311205529.GR27823@random.random>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
Sender: owner-linux-mm@kvack.org
To: Linus Torvalds
Cc: Ingo Molnar, Nick Piggin, Hugh Dickins, KOSAKI Motohiro,
	KAMEZAWA Hiroyuki, linux-mm@kvack.org
List-ID:

On Wed, Mar 11, 2009 at 02:28:08PM -0700, Linus Torvalds wrote:
> The fact that the non-fast "get_user_pages()" takes the mmap semaphore for
> reading doesn't even protect that. It just means that the pages made sense
> at the time the get_user_pages() happened, not necessarily at the time
> when the actual use of them did.

Indeed this is a generic problem, not specific to get_user_pages_fast.
get_user_pages_fast just adds a few complications to serialize
against.

> O_DIRECT is actually the _simple_ case, since we won't be returning until
> it is done (ie it's not actually a async interface). So no, O_DIRECT
> doesn't need any interrupt handler games. It would just need to hold the
> sem over the actual call to the filesystem (ie just over the ->direct_IO()
> call).

I don't see how you can solve the race by holding the sem only over
the direct_IO call (and not until the I/O completion handler fires).
I think to solve the race using mmap_sem only, the bio I/O completion
handler that eventually calls into direct-io.c from irq context would
need to up_read(&mmap_sem).

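To make that ordering concrete, here is a rough userspace analogy in C.
It is not kernel code and not part of the patch: fake_mmap_sem,
fake_io_completion and fork_side_sees_io_in_flight are names made up
for illustration, and mmap_sem is modelled as a plain binary POSIX
semaphore, ignoring the reader/writer distinction. The only point is
that the submitter takes the sem and returns, and the release happens
in the completion path.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

static sem_t fake_mmap_sem;	/* hypothetical stand-in for mmap_sem */
static int io_in_flight;	/* 1 while the pretend DMA is running */

/* Plays the role of the bio I/O completion handler. */
static void *fake_io_completion(void *unused)
{
	(void)unused;
	usleep(50 * 1000);		/* pretend the device takes a while */
	io_in_flight = 0;		/* data has landed in the user page */
	sem_post(&fake_mmap_sem);	/* the "up_read" is done here */
	return NULL;
}

/* Returns the io_in_flight value observed by the "fork" side. */
int fork_side_sees_io_in_flight(void)
{
	pthread_t completion;
	int rc, seen;

	rc = sem_init(&fake_mmap_sem, 0, 1);
	assert(rc == 0);

	/* Submitter: take the sem, start the I/O, return without releasing. */
	rc = sem_wait(&fake_mmap_sem);
	assert(rc == 0);
	io_in_flight = 1;
	rc = pthread_create(&completion, NULL, fake_io_completion, NULL);
	assert(rc == 0);

	/* "fork" side: blocks until the completion thread posts the sem. */
	rc = sem_wait(&fake_mmap_sem);
	assert(rc == 0);
	seen = io_in_flight;
	sem_post(&fake_mmap_sem);

	pthread_join(completion, NULL);
	sem_destroy(&fake_mmap_sem);
	return seen;
}
```

If the submitter posted the sem right after starting the I/O instead,
the "fork" side could get in while io_in_flight is still 1, which is
the window described above.
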
My patch avoids altering the I/O completion path that runs from irq
context by ensuring that no I/O is in flight to the pages that are
being shared with the child, and by ensuring that any gup or gup-fast
will trigger COW before it can write to the shared page. Pages simply
can't be shared before I/O is complete.

> People want the relaxed synchronization we give them, and that's literally
> why get_user_pages_fast exists - because people don't want _more_
> synchronization, they want _less_.
>
> But the thing is, with less synchronization, the behavior really is
> surprising in the edge cases. Which is why I think "threaded fork" plus
> "get_user_pages_fast" just doesn't make sense to even _worry_ about. If
> you use O_DIRECT and mix it with fork, you get what you get, and it's
> random - exactly because people who want O_DIRECT don't want any locking.
>
> It's a user-space issue, not a kernel issue.

I think your point of view is clear. I sure can write userland code
that copes with the currently altered memory protection semantics of
read vs fork when the fd is opened with O_DIRECT or when drivers use
gup, so I'll let the userland folks comment on it; some are in CC.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org