public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Nick Piggin <npiggin@suse.de>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Edward Shishkin <edward.shishkin@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Ryan Hope <rmh3093@gmail.com>,
	Randy Dunlap <randy.dunlap@oracle.com>,
	linux-kernel@vger.kernel.org,
	ReiserFS Mailing List <reiserfs-devel@vger.kernel.org>
Subject: Re: [patch 2/4] vfs: add set_page_dirty_notag
Date: Tue, 17 Feb 2009 11:24:43 +0100	[thread overview]
Message-ID: <20090217102443.GA26402@wotan.suse.de> (raw)
In-Reply-To: <1234865116.4744.46.camel@laptop>

On Tue, Feb 17, 2009 at 11:05:16AM +0100, Peter Zijlstra wrote:
> On Tue, 2009-02-17 at 10:38 +0100, Nick Piggin wrote:
> 
> > It is a great shame that filesystems are not properly notified
> > that a page may become dirty before the actual set_page_dirty
> > event (which is not allowed to fail and is called after the
> > page is already dirty).
> 
> Not quite true, for example the set_page_dirty() done by the write fault
> code is done _before_ the page becomes dirty.
> 
> This before/after thing was the reason for that horrid file corruption
> bug that dragged on for a few weeks back in .19 (IIRC).

Yeah, there are actually races though. The page can become cleaned
before set_page_dirty is reached, and there are also nasty races with
truncate.

 
> > This is a big problem I have with fsblock simply in trying to
> > make the memory allocation robust. page_mkwrite unfortunately
> > is racy and I've fixed problems there... the big problem though
> > is get_user_pages. Fixing that properly seems to require fixing
> > callers so it is not really realistic in the short term.
> 
> Right, I'm just not sure what we can do, even with a
> prepage_page_dirty() function, what are you going to do, fail the fault?

Oh, for regular page fault functions using page_mkwrite, they
definitely want to fail the fault with a SIGBUS, and actually XFS
already does that (for fsblock robust memory allocations you
would also want to fail OOM on metadata allocation failure). What
is the other option? Silently fail the write?

For XFS purpose (ie. -ENOSPC handling), the current code is reasonable
although there could be some truncate races with block allocation. But
mostly probably works. For something like fsblock it can be much more
common to have the metadata refcount reach 0 and freed before spd is
called. In that case the code actually goes into a bug situation so it
is a bit more critical.

But no that's the "easy" part. The hard part is get_user_pages
because the caller can hold onto the page indefinitely simply with a
refcount, and go along happily dirtying it at any stage (actually
writing to the page memory) before actually calling set_page_dirty.

The "cleanest" way to fix this from VM point of view is probably to
force gup callers to hold the page locked for the duration to
prevent truncation or writeout after the filesystem notification.
Don't know if that would be very popular, however.

  reply	other threads:[~2009-02-17 10:25 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-02-13 11:56 [patch 2/4] vfs: add set_page_dirty_notag Edward Shishkin
2009-02-13 13:08 ` Peter Zijlstra
2009-02-13 13:57   ` Edward Shishkin
2009-02-13 14:09     ` Peter Zijlstra
2009-02-14 13:11       ` Edward Shishkin
2009-02-14 21:11         ` Peter Zijlstra
2009-02-16 22:43           ` Edward Shishkin
2009-02-17  9:09             ` Peter Zijlstra
2009-02-17  9:38               ` Nick Piggin
2009-02-17 10:05                 ` Peter Zijlstra
2009-02-17 10:24                   ` Nick Piggin [this message]
2009-02-17 10:40                     ` set_page_dirty races (was: Re: [patch 2/4] vfs: add set_page_dirty_notag) Peter Zijlstra
2009-02-17 11:25                       ` Nick Piggin
2009-02-17 11:39                         ` Peter Zijlstra
2009-02-17 11:55                           ` Nick Piggin
2009-02-17 12:05                             ` Peter Zijlstra
2009-02-17 12:30                               ` Nick Piggin
2009-02-17 22:35             ` [patch 2/4] vfs: add set_page_dirty_notag Andrew Morton
2009-02-18  0:26               ` Edward Shishkin
2009-02-18  0:38                 ` Andrew Morton
2009-02-18 13:27                   ` [patch 1/2] vfs: add/use update_page_accounting Edward Shishkin
2009-02-18 14:06                     ` Nick Piggin
2009-02-18 18:23                       ` Andrew Morton
2009-02-18 13:27                   ` [patch 2/2] vfs: (take 2)add set_page_dirty_notag Edward Shishkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090217102443.GA26402@wotan.suse.de \
    --to=npiggin@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=edward.shishkin@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=randy.dunlap@oracle.com \
    --cc=reiserfs-devel@vger.kernel.org \
    --cc=rmh3093@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox