Re: Confused by commit 56a76f [fs: fix page_mkwrite error cases in core code and btrfs]

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Joel Becker <Joel.Becker@oracle.com>
To: Nick Piggin <npiggin@gmail.com>
Cc: linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
	ocfs2-devel@oss.oracle.com
Subject: Re: Confused by commit 56a76f [fs: fix page_mkwrite error cases in core code and btrfs]
Date: Fri, 31 Dec 2010 03:39:50 -0800	[thread overview]
Message-ID: <20101231113949.GA21179@mail.oracle.com> (raw)
In-Reply-To: <AANLkTik3_=Jo2ubML1hhanFFKC3MkEibrCPruto2oY_v@mail.gmail.com>

On Fri, Dec 31, 2010 at 10:16:35PM +1100, Nick Piggin wrote:
> On Fri, Dec 31, 2010 at 10:00 PM, Joel Becker <jlbec@evilplan.org> wrote:
> > On Fri, Dec 31, 2010 at 08:31:41PM +1100, Nick Piggin wrote:
> >> On Fri, Dec 31, 2010 at 10:29 AM, Joel Becker <Joel.Becker@oracle.com> wrote:
> 
> >> The better way to do this would be to just return VM_FAULT_NOPAGE
> >> in any case you need the VM to retry the fault. When you reach the
> >> business end of your handler, you want to hold the page locked, after
> >> you verify it is correct, and return that to the fault handler.
> >
> >        This is going to be hard.  Our write_end() assumes it must
> > unlock the pages (which is normal behavior for write(2)), but in the
> > page_mkwrite() case we need to avoid the unlock to follow your
> > recommendation (we use our write_begin/write_end pair to trigger any
> > allocation or zeroing needed before the page is writable).
> 
> Yes that would be the best option. It is possible to use the unlocked
> return, but that still gives possibility of races in some cases. Consider

	Yes, I am aware of that.
 
> Basically it would be nice for all filesystems to move to this convention
> so we can remove the old cruft.

	I can appreciate that.  And you've just answered the "Do you
want us to get there, or are minor faults in the old-0-style OK?"
question.
 
> >        The find_or_create_page() is deep at the meat of the function,
> > not the cursory check at the top.  The idea is that at this point,
> > find_or_create_page() will return a locked page that must, by
> > definition, be part of the correct mapping.
> 
> But you must still handle failures there, because find_or_create_page
> may return -ENOMEM. So just lock the page, recheck the mapping
> there, and then do exactly the same error handling.

	Of course it can error, but error is different than clean
restart.  Though cleanup should be the same; I'm looking at our code
trying to convince myself that this is so ;-)  Btw, -ENOMEM is that OOM
fault error, right? ;-)

> >        If the VM is rechecking the pte after we return from
> > page_mkwrite(), won't it see any new page created?
> 
> But the point of page_mkwrite is a dirty notifier for the fs. If this new
> different page was installed due to say truncate then a read-only
> fault, the filesystem would miss the notification. So it just does the
> simple thing and retries the whole sequence (if needed).

	Fair enough, even if we caused the new page and thus are already
notified ;-)
	Essentially, you would really like us (and ext4, and gfs2, etc)
to start returning VM_FAULT_NOPAGE for any place we aren't sure what
happened to our page, VM_FAULT_OOM when we get an -ENOMEM, and
VM_FAULT_LOCKED for all successful operations.  That the long and short
of it?  Thank you for helping me understand your plan for this code.
	Btw, what not use VM_FAULT_RETRY for "I'd like you to retry"?
Especially since the comment for NOPAGE doesn't read anything like its
actual behavior.

Joel

-- 

"Friends may come and go, but enemies accumulate." 
        - Thomas Jones

Joel Becker
Senior Development Manager
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

WARNING: multiple messages have this Message-ID (diff)

From: Joel Becker <Joel.Becker@oracle.com>
To: Nick Piggin <npiggin@gmail.com>
Cc: linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
	ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] Confused by commit 56a76f [fs: fix page_mkwrite error cases in core code and btrfs]
Date: Fri, 31 Dec 2010 03:39:50 -0800	[thread overview]
Message-ID: <20101231113949.GA21179@mail.oracle.com> (raw)
In-Reply-To: <AANLkTik3_=Jo2ubML1hhanFFKC3MkEibrCPruto2oY_v@mail.gmail.com>

On Fri, Dec 31, 2010 at 10:16:35PM +1100, Nick Piggin wrote:
> On Fri, Dec 31, 2010 at 10:00 PM, Joel Becker <jlbec@evilplan.org> wrote:
> > On Fri, Dec 31, 2010 at 08:31:41PM +1100, Nick Piggin wrote:
> >> On Fri, Dec 31, 2010 at 10:29 AM, Joel Becker <Joel.Becker@oracle.com> wrote:
> 
> >> The better way to do this would be to just return VM_FAULT_NOPAGE
> >> in any case you need the VM to retry the fault. When you reach the
> >> business end of your handler, you want to hold the page locked, after
> >> you verify it is correct, and return that to the fault handler.
> >
> > ? ? ? ?This is going to be hard. ?Our write_end() assumes it must
> > unlock the pages (which is normal behavior for write(2)), but in the
> > page_mkwrite() case we need to avoid the unlock to follow your
> > recommendation (we use our write_begin/write_end pair to trigger any
> > allocation or zeroing needed before the page is writable).
> 
> Yes that would be the best option. It is possible to use the unlocked
> return, but that still gives possibility of races in some cases. Consider

	Yes, I am aware of that.
 
> Basically it would be nice for all filesystems to move to this convention
> so we can remove the old cruft.

	I can appreciate that.  And you've just answered the "Do you
want us to get there, or are minor faults in the old-0-style OK?"
question.
 
> > ? ? ? ?The find_or_create_page() is deep at the meat of the function,
> > not the cursory check at the top. ?The idea is that at this point,
> > find_or_create_page() will return a locked page that must, by
> > definition, be part of the correct mapping.
> 
> But you must still handle failures there, because find_or_create_page
> may return -ENOMEM. So just lock the page, recheck the mapping
> there, and then do exactly the same error handling.

	Of course it can error, but error is different than clean
restart.  Though cleanup should be the same; I'm looking at our code
trying to convince myself that this is so ;-)  Btw, -ENOMEM is that OOM
fault error, right? ;-)

> > ? ? ? ?If the VM is rechecking the pte after we return from
> > page_mkwrite(), won't it see any new page created?
> 
> But the point of page_mkwrite is a dirty notifier for the fs. If this new
> different page was installed due to say truncate then a read-only
> fault, the filesystem would miss the notification. So it just does the
> simple thing and retries the whole sequence (if needed).

	Fair enough, even if we caused the new page and thus are already
notified ;-)
	Essentially, you would really like us (and ext4, and gfs2, etc)
to start returning VM_FAULT_NOPAGE for any place we aren't sure what
happened to our page, VM_FAULT_OOM when we get an -ENOMEM, and
VM_FAULT_LOCKED for all successful operations.  That the long and short
of it?  Thank you for helping me understand your plan for this code.
	Btw, what not use VM_FAULT_RETRY for "I'd like you to retry"?
Especially since the comment for NOPAGE doesn't read anything like its
actual behavior.

Joel

-- 

"Friends may come and go, but enemies accumulate." 
        - Thomas Jones

Joel Becker
Senior Development Manager
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127

next prev parent reply	other threads:[~2010-12-31 11:41 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-12-30 23:29 Confused by commit 56a76f [fs: fix page_mkwrite error cases in core code and btrfs] Joel Becker
2010-12-30 23:29 ` [Ocfs2-devel] " Joel Becker
2010-12-31  9:31 ` Nick Piggin
2010-12-31  9:31   ` [Ocfs2-devel] " Nick Piggin
2010-12-31 11:00   ` Joel Becker
2010-12-31 11:00     ` [Ocfs2-devel] " Joel Becker
2010-12-31 11:16     ` Nick Piggin
2010-12-31 11:16       ` [Ocfs2-devel] " Nick Piggin
2010-12-31 11:39       ` Joel Becker [this message]
2010-12-31 11:39         ` Joel Becker
2010-12-31 11:51         ` Nick Piggin
2010-12-31 11:51           ` [Ocfs2-devel] " Nick Piggin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20101231113949.GA21179@mail.oracle.com \
    --to=joel.becker@oracle.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=npiggin@gmail.com \
    --cc=ocfs2-devel@oss.oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.