From: Peter Zijlstra <peterz@infradead.org>
To: Nick Piggin <npiggin@suse.de>
Cc: Edward Shishkin <edward.shishkin@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
Ryan Hope <rmh3093@gmail.com>,
Randy Dunlap <randy.dunlap@oracle.com>,
linux-kernel@vger.kernel.org,
ReiserFS Mailing List <reiserfs-devel@vger.kernel.org>
Subject: set_page_dirty races (was: Re: [patch 2/4] vfs: add set_page_dirty_notag)
Date: Tue, 17 Feb 2009 11:40:00 +0100
Message-ID: <1234867200.4744.65.camel@laptop>
In-Reply-To: <20090217102443.GA26402@wotan.suse.de>

On Tue, 2009-02-17 at 11:24 +0100, Nick Piggin wrote:
> On Tue, Feb 17, 2009 at 11:05:16AM +0100, Peter Zijlstra wrote:
> > On Tue, 2009-02-17 at 10:38 +0100, Nick Piggin wrote:
> >
> > > It is a great shame that filesystems are not properly notified
> > > that a page may become dirty before the actual set_page_dirty
> > > event (which is not allowed to fail and is called after the
> > > page is already dirty).
> >
> > Not quite true; for example, the set_page_dirty() done by the write
> > fault code is done _before_ the page becomes dirty.
> >
> > This before/after thing was the reason for that horrid file corruption
> > bug that dragged on for a few weeks back in .19 (IIRC).
>
> Yeah, there are actually races though. The page can become cleaned
> before set_page_dirty is reached, and there are also nasty races with
> truncate.

Hmm, so you're saying that never got properly fixed?

> > > This is a big problem I have with fsblock simply in trying to
> > > make the memory allocation robust. page_mkwrite unfortunately
> > > is racy and I've fixed problems there... the big problem though
> > > is get_user_pages. Fixing that properly seems to require fixing
> > > callers so it is not really realistic in the short term.
> >
> > Right, I'm just not sure what we can do. Even with a
> > prepare_page_dirty() function, what are you going to do, fail the fault?
>
> Oh, for regular page fault functions using page_mkwrite, they
> definitely want to fail the fault with a SIGBUS, and actually XFS
> already does that (for fsblock robust memory allocations you
> would also want to fail OOM on metadata allocation failure). What
> is the other option? Silently fail the write?

OK, agreed.

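A minimal userspace sketch of the "fail the fault" behaviour agreed on above, assuming a toy filesystem with a block counter. It is not kernel code: every `toy_*` name, the `fault_result` enum and the `metadata_alloc_fails` flag are hypothetical stand-ins chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-ins, not the kernel's types or return codes. */
enum fault_result { FAULT_OK, FAULT_SIGBUS, FAULT_OOM };

struct toy_fs {
    long free_blocks;           /* blocks still available for allocation */
    bool metadata_alloc_fails;  /* simulate a metadata allocation failure */
};

struct toy_page {
    bool has_block;  /* backing store already reserved */
    bool dirty;
};

/* Hypothetical "about to become dirty" hook: reserve what the page needs. */
static int toy_page_mkwrite(struct toy_fs *fs, struct toy_page *page)
{
    if (page->has_block)
        return 0;
    if (fs->metadata_alloc_fails)
        return -ENOMEM;
    if (fs->free_blocks == 0)
        return -ENOSPC;
    fs->free_blocks--;
    page->has_block = true;
    return 0;
}

/* Write fault: ask the filesystem first, and only then let the page go dirty. */
static enum fault_result toy_write_fault(struct toy_fs *fs, struct toy_page *page)
{
    int err = toy_page_mkwrite(fs, page);

    if (err == -ENOMEM)
        return FAULT_OOM;
    if (err)
        return FAULT_SIGBUS;  /* e.g. -ENOSPC: refuse rather than lose the write */

    assert(page->has_block);  /* invariant: never dirty a page without backing store */
    page->dirty = true;
    return FAULT_OK;
}
```

The point of the ordering is that the failure is reported to the faulting task before any data can land in the page, instead of being discovered at writeback time when nothing can be done about it.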
> For XFS's purposes (i.e. -ENOSPC handling), the current code is reasonable,
> although there could be some truncate races with block allocation. But
> it probably mostly works. For something like fsblock it can be much more
> common to have the metadata refcount reach 0 and the metadata freed
> before set_page_dirty is called. In that case the code actually hits a
> bug, so it is a bit more critical.
>
> But no, that's the "easy" part. The hard part is get_user_pages
> because the caller can hold onto the page indefinitely simply with a
> refcount, and go along happily dirtying it at any stage (actually
> writing to the page memory) before actually calling set_page_dirty.

Shouldn't a gup user specify .write=1 if it wants to dirty the page? At
that point follow_page() will do the dirty-fault thingy.

Ah, but then the page can be cleaned again because we're not holding the
page lock. I see.

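The window described here can be replayed deterministically in a small userspace model. This is only a sketch under stated assumptions: the `toy_*` functions are hypothetical, writeback is reduced to a `memcpy`, and the interleaving is hard-coded rather than produced by real concurrency.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct toy_page {
    char data[8];   /* in-memory contents */
    char disk[8];   /* what writeback last stored */
    bool dirty;
    int refcount;
};

static void toy_set_page_dirty(struct toy_page *p)
{
    p->dirty = true;
}

/* Writeback stores the page and cleans it; a bare refcount does not stop it. */
static void toy_writeback(struct toy_page *p)
{
    if (!p->dirty)
        return;
    memcpy(p->disk, p->data, sizeof(p->disk));
    p->dirty = false;
}

/* Model of gup(.write=1): dirty fault up front, then pin by refcount only. */
static char *toy_gup_write(struct toy_page *p)
{
    toy_set_page_dirty(p);
    p->refcount++;
    return p->data;
}

static void toy_put_page(struct toy_page *p)
{
    assert(p->refcount > 0);
    p->refcount--;
}

/* True if memory holds data that is neither on disk nor marked dirty. */
static bool toy_data_at_risk(const struct toy_page *p)
{
    return !p->dirty && memcmp(p->data, p->disk, sizeof(p->disk)) != 0;
}

/* Replays the interleaving; returns whether the unsafe window was observed. */
static bool toy_race_demo(void)
{
    struct toy_page page = { .data = "old", .disk = "old" };
    char *buf = toy_gup_write(&page);  /* dirty fault, page pinned */

    toy_writeback(&page);              /* page cleaned while still pinned */
    strcpy(buf, "new");                /* holder writes through its pointer */

    bool at_risk = toy_data_at_risk(&page);

    toy_set_page_dirty(&page);         /* the late set_page_dirty */
    toy_put_page(&page);
    return at_risk;
}
```

Between the writeback and the late `toy_set_page_dirty()` call, the model page is modified but clean, which is the state in which the filesystem has no idea the data exists.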
> The "cleanest" way to fix this from VM point of view is probably to
> force gup callers to hold the page locked for the duration to
> prevent truncation or writeout after the filesystem notification.
> Don't know if that would be very popular, however.

Right, so you'd want to keep the page locked over gup(.write=1)
sections.

So should we extend gup() with a put_user_page()?

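A rough userspace sketch of that idea, assuming the page lock is taken in the gup step and released by a paired put_user_page() step. The `toy_*` names and the boolean "lock" are hypothetical; a real page lock sleeps, and this model deliberately ignores the latency cost of holding it for a long time.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct toy_page {
    char data[8];   /* in-memory contents */
    char disk[8];   /* what writeback last stored */
    bool dirty;
    bool locked;
    int refcount;
};

/* Writeback skips locked pages; returns true if it actually wrote the page. */
static bool toy_writeback(struct toy_page *p)
{
    if (p->locked || !p->dirty)
        return false;
    memcpy(p->disk, p->data, sizeof(p->disk));
    p->dirty = false;
    return true;
}

/* gup(.write=1) variant that keeps the page locked for the whole section. */
static char *toy_gup_write_locked(struct toy_page *p)
{
    assert(!p->locked);  /* toy model: no waiting on a contended lock */
    p->locked = true;
    p->dirty = true;     /* filesystem notification would happen here */
    p->refcount++;
    return p->data;
}

/* Paired release: mark dirty while still locked, then drop pin and lock. */
static void toy_put_user_page(struct toy_page *p)
{
    assert(p->locked && p->refcount > 0);
    p->dirty = true;
    p->refcount--;
    p->locked = false;
}
```

With the lock held from the gup step to the put step, the model's writeback cannot clean the page in the middle of the section, so the dirty state seen at release time still covers everything the holder wrote.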