From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Rik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@suse.de>, Andi Kleen <ak@linux.intel.com>,
	Matthew Wilcox <matthew.r.wilcox@intel.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Dave Chinner <david@fromorbit.com>, linux-mm <linux-mm@kvack.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [RFC, PATCHv2 0/2] mm: map few pages around fault address if they are in page cache
Date: Tue, 18 Feb 2014 16:23:05 +0200
Message-ID: <20140218142305.GA5933@node.dhcp.inet.fi>
In-Reply-To: <53035FE2.4080300@redhat.com>

On Tue, Feb 18, 2014 at 08:28:02AM -0500, Rik van Riel wrote:
> On 02/17/2014 02:01 PM, Linus Torvalds wrote:
> 
> >  - increment the page _mapcount (iow, do "page_add_file_rmap()"
> > early). This guarantees that any *subsequent* unmap activity on this
> > page will walk the file mapping lists, and become serialized by the
> > page table lock we hold.
> > 
> >  - mb_after_atomic_inc() (this is generally free)
> > 
> >  - test that the page is still unlocked and uptodate, and the page
> > mapping still points to our page.
> > 
> >  - if that is true, we're all good, we can use the page, otherwise we
> > decrement the mapcount (page_remove_rmap()) and skip the page.
> > 
> > Hmm? Doing something like this means that we would never lock the
> > pages we prefault, and you can go back to your gang lookup rather than
> > that "one page at a time". And the race case is basically never going
> > to trigger.
> > 
> > Comments?
> 
> What would the direct io code do when it runs into a page with
> elevated mapcount, but for which a mapping cannot be found yet?
> 
> Looking at the code, it looks like the above scheme could cause
> some trouble with invalidate_inode_pages2_range(), which has
> the following sequence:
> 
> 			if (page_mapped(page)) {
> 				... unmap page
> 			}
> 			BUG_ON(page_mapped(page));
> 
> In other words, it looks like incrementing _mapcount first could
> lead to an oops in the truncate and direct IO code.
> 
> The page lock is used to prevent such races.
> 
> *sigh*

What if we retry the unmap once more before triggering BUG()?
The second unmap would be serialized by the page table lock, right?
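
For illustration only, here is a minimal user-space sketch of the ordering
discussed above, written with C11 atomics. It is not kernel code:
struct model_page, model_try_map_unlocked(), model_unmap() and
model_invalidate() are hypothetical names, the mapcount is 0-based rather
than the kernel's -1-based _mapcount, and the page table lock is not
modelled at all. It only shows the "bump mapcount, re-check, back out"
sequence and an unmap that is retried a bounded number of times before
reporting the state in which the kernel would hit BUG_ON().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; not the kernel's layout. */
struct model_page {
	atomic_int mapcount;       /* 0 means unmapped (kernel uses -1) */
	atomic_int installed_ptes; /* mappings an rmap walk could find */
	atomic_bool locked;
	atomic_bool uptodate;
	_Atomic(void *) mapping;
};

/*
 * Proposed fault-side sequence: raise the mapcount first, then check that
 * the page is still unlocked, uptodate and attached to our mapping.
 * The seq_cst atomic increment stands in for the explicit barrier step.
 */
static bool model_try_map_unlocked(struct model_page *page, void *mapping)
{
	atomic_fetch_add(&page->mapcount, 1);
	if (!atomic_load(&page->locked) &&
	    atomic_load(&page->uptodate) &&
	    atomic_load(&page->mapping) == mapping)
		return true;	/* caller may now install the PTE */
	atomic_fetch_sub(&page->mapcount, 1);	/* lost the race: back out */
	return false;
}

/* Unmap only what is visible; a mapcount bumped early is not found. */
static void model_unmap(struct model_page *page)
{
	int ptes = atomic_exchange(&page->installed_ptes, 0);

	atomic_fetch_sub(&page->mapcount, ptes);
}

/*
 * Invalidate side with a bounded retry. Returns true if the page ends up
 * unmapped, false in the state where the kernel would trigger BUG_ON().
 */
static bool model_invalidate(struct model_page *page, int max_tries)
{
	assert(max_tries > 0);
	for (int i = 0; i < max_tries; i++) {
		if (atomic_load(&page->mapcount) == 0)
			return true;
		model_unmap(page);
	}
	return atomic_load(&page->mapcount) == 0;
}
```

In this single-threaded model a retry helps only if the faulting side has
either installed its PTE or backed out by the time of the second pass.
Whether the page table lock really guarantees that in the kernel is the
open question here, and the sketch does not answer it.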

-- 
 Kirill A. Shutemov