Re: Confused by commit 56a76f [fs: fix page_mkwrite error cases in core code and btrfs]

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Joel Becker <Joel.Becker@oracle.com>
To: Nick Piggin <npiggin@gmail.com>
Cc: linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
	ocfs2-devel@oss.oracle.com
Subject: Re: Confused by commit 56a76f [fs: fix page_mkwrite error cases in core code and btrfs]
Date: Fri, 31 Dec 2010 03:39:50 -0800	[thread overview]
Message-ID: <20101231113949.GA21179@mail.oracle.com> (raw)
In-Reply-To: <AANLkTik3_=Jo2ubML1hhanFFKC3MkEibrCPruto2oY_v@mail.gmail.com>

On Fri, Dec 31, 2010 at 10:16:35PM +1100, Nick Piggin wrote:
> On Fri, Dec 31, 2010 at 10:00 PM, Joel Becker <jlbec@evilplan.org> wrote:
> > On Fri, Dec 31, 2010 at 08:31:41PM +1100, Nick Piggin wrote:
> >> On Fri, Dec 31, 2010 at 10:29 AM, Joel Becker <Joel.Becker@oracle.com> wrote:
> 
> >> The better way to do this would be to just return VM_FAULT_NOPAGE
> >> in any case you need the VM to retry the fault. When you reach the
> >> business end of your handler, you want to hold the page locked, after
> >> you verify it is correct, and return that to the fault handler.
> >
> >        This is going to be hard.  Our write_end() assumes it must
> > unlock the pages (which is normal behavior for write(2)), but in the
> > page_mkwrite() case we need to avoid the unlock to follow your
> > recommendation (we use our write_begin/write_end pair to trigger any
> > allocation or zeroing needed before the page is writable).
> 
> Yes that would be the best option. It is possible to use the unlocked
> return, but that still gives possibility of races in some cases. Consider

	Yes, I am aware of that.
 
> Basically it would be nice for all filesystems to move to this convention
> so we can remove the old cruft.

	I can appreciate that.  And you've just answered the "Do you
want us to get there, or are minor faults in the old-0-style OK?"
question.
 
> >        The find_or_create_page() is deep at the meat of the function,
> > not the cursory check at the top.  The idea is that at this point,
> > find_or_create_page() will return a locked page that must, by
> > definition, be part of the correct mapping.
> 
> But you must still handle failures there, because find_or_create_page
> may return -ENOMEM. So just lock the page, recheck the mapping
> there, and then do exactly the same error handling.

	Of course it can error, but error is different than clean
restart.  Though cleanup should be the same; I'm looking at our code
trying to convince myself that this is so ;-)  Btw, -ENOMEM is that OOM
fault error, right? ;-)

> >        If the VM is rechecking the pte after we return from
> > page_mkwrite(), won't it see any new page created?
> 
> But the point of page_mkwrite is a dirty notifier for the fs. If this new
> different page was installed due to say truncate then a read-only
> fault, the filesystem would miss the notification. So it just does the
> simple thing and retries the whole sequence (if needed).

	Fair enough, even if we caused the new page and thus are already
notified ;-)
	Essentially, you would really like us (and ext4, and gfs2, etc)
to start returning VM_FAULT_NOPAGE for any place we aren't sure what
happened to our page, VM_FAULT_OOM when we get an -ENOMEM, and
VM_FAULT_LOCKED for all successful operations.  That the long and short
of it?  Thank you for helping me understand your plan for this code.
	Btw, what not use VM_FAULT_RETRY for "I'd like you to retry"?
Especially since the comment for NOPAGE doesn't read anything like its
actual behavior.

Joel

-- 

"Friends may come and go, but enemies accumulate." 
        - Thomas Jones

Joel Becker
Senior Development Manager
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

next prev parent reply	other threads:[~2010-12-31 11:39 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-12-30 23:29 Confused by commit 56a76f [fs: fix page_mkwrite error cases in core code and btrfs] Joel Becker
2010-12-31  9:31 ` Nick Piggin
2010-12-31 11:00   ` Joel Becker
2010-12-31 11:16     ` Nick Piggin
2010-12-31 11:39       ` Joel Becker [this message]
2010-12-31 11:51         ` Nick Piggin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20101231113949.GA21179@mail.oracle.com \
    --to=joel.becker@oracle.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=npiggin@gmail.com \
    --cc=ocfs2-devel@oss.oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).