linux-fsdevel.vger.kernel.org archive mirror
From: Matthew Wilcox <willy@linux.intel.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Wilcox, Matthew R" <matthew.r.wilcox@intel.com>,
	Rik van Riel <riel@redhat.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@suse.de>, Andi Kleen <ak@linux.intel.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Dave Chinner <david@fromorbit.com>, linux-mm <linux-mm@kvack.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [RFC, PATCHv2 0/2] mm: map few pages around fault address if they are in page cache
Date: Tue, 18 Feb 2014 13:53:23 -0500
Message-ID: <20140218185323.GB5744@linux.intel.com>
In-Reply-To: <CA+55aFzqZ2S==NyWG67hNV1YsY-oXLjLvCR0JeiHGJOfnoGJBg@mail.gmail.com>

On Tue, Feb 18, 2014 at 10:02:26AM -0800, Linus Torvalds wrote:
> On Tue, Feb 18, 2014 at 6:15 AM, Wilcox, Matthew R
> <matthew.r.wilcox@intel.com> wrote:
> > We don't really need to lock all the pages being returned to protect
> > against truncate.  We only need to lock the one at the highest index,
> > and check i_size while that lock is held since truncate_inode_pages_range()
> > will block on any page that is locked.
> >
> > We're still vulnerable to holepunches, but there's no locking currently
> > between holepunches and truncate, so we're no worse off now.
> 
> It's not "holepunches and truncate", it's "holepunches and page
> mapping", and I do think we currently serialize the two - the whole
> "check page->mapping still being non-NULL" before mapping it while
> having the page locked does that.

Yes, I did mean "holepunches and page faults".  But here's the race I see:

Process A			Process B
ext4_fallocate()
ext4_punch_hole()
filemap_write_and_wait_range()
mutex_lock(&inode->i_mutex);
truncate_pagecache_range()
unmap_mapping_range()
				__do_fault()
				filemap_fault()
				lock_page_or_retry()
				(page->mapping == mapping at this point)
				set_pte_at()
				unlock_page()
truncate_inode_pages_range()
(now the pte is pointing at a page that
 is no longer attached to this file)
mutex_unlock(&inode->i_mutex);

Would we solve the problem by putting in a second call to
unmap_mapping_range() after calling truncate_inode_pages_range() in
truncate_pagecache_range(), like truncate_pagecache() does?
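
A minimal userspace sketch of the interleaving above, to make the question
concrete.  Everything named toy_* is hypothetical and only models the kernel
functions in the diagram; it replays the steps in a fixed order on a single
page and reports whether a PTE is left pointing at a detached page.  It says
nothing about whether a second unmap is sufficient in the real kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: one page-cache page and one PTE that may point at it. */
struct toy_page {
	bool in_page_cache;	/* models page->mapping == mapping */
	bool pte_mapped;	/* models a PTE pointing at this page */
};

/* Models unmap_mapping_range(): tear down any PTE for the page. */
static void toy_unmap(struct toy_page *p)
{
	p->pte_mapped = false;
}

/* Models the fault path: map the page only if it is still attached. */
static void toy_fault(struct toy_page *p)
{
	if (p->in_page_cache)
		p->pte_mapped = true;
}

/* Models truncate_inode_pages_range(): detach the page from the file. */
static void toy_truncate(struct toy_page *p)
{
	p->in_page_cache = false;
}

/*
 * Replay the interleaving from the diagram.  Returns true if a PTE is
 * left pointing at a page that is no longer attached to the file.
 */
static bool toy_punch_hole_leaves_stale_pte(bool second_unmap)
{
	struct toy_page page = { .in_page_cache = true, .pte_mapped = true };

	toy_unmap(&page);		/* Process A: unmap_mapping_range() */
	assert(!page.pte_mapped);
	toy_fault(&page);		/* Process B: fault re-maps the page */
	toy_truncate(&page);		/* Process A: page leaves the cache */
	if (second_unmap)
		toy_unmap(&page);	/* the extra call asked about above */

	return page.pte_mapped && !page.in_page_cache;
}
```

In this model, toy_punch_hole_leaves_stale_pte(false) reports the stale PTE
and toy_punch_hole_leaves_stale_pte(true) does not.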

> Besides, that per-page locking should serialize against truncate too.
> No, there is no "global" serialization, but there *is* exactly that
> page-level serialization where both truncation and hole punching end
> up making sure that the page no longer exists in the page cache and
> isn't mapped.

What I'm suggesting is going back to Kirill's earlier patch, but only
locking the page with the highest index instead of all of the pages.
truncate() will block on that page and then we'll notice that some or
all of the other pages are also now past i_size and give up.
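
Roughly, as a single-threaded userspace sketch (not kernel code): struct
toy_file, toy_map_around() and the page-granular i_size are all assumptions
made for illustration, and "give up" is modelled as mapping nothing at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_PAGES 16

struct toy_file {
	size_t i_size_pages;		/* assumption: i_size in whole pages */
	bool locked[TOY_NR_PAGES];	/* models PG_locked per page index */
};

/*
 * Try to map nr pages starting at index first.  Only the page with the
 * highest index is locked, and i_size is checked while that lock is held.
 * Returns the number of pages mapped: nr, or 0 if we gave up.
 */
static size_t toy_map_around(struct toy_file *f, size_t first, size_t nr)
{
	size_t last;
	size_t mapped;

	if (nr == 0 || first >= TOY_NR_PAGES || nr > TOY_NR_PAGES - first)
		return 0;
	last = first + nr - 1;

	assert(!f->locked[last]);	/* single-threaded model: no contention */
	f->locked[last] = true;		/* models lock_page() on the highest index */

	/* If the highest page is past i_size, give up on the whole batch. */
	mapped = (last < f->i_size_pages) ? nr : 0;

	f->locked[last] = false;	/* models unlock_page() */
	return mapped;
}
```

With i_size_pages = 8, toy_map_around(&f, 2, 4) maps all four pages; after
shrinking i_size_pages to 4, the same call gives up because index 5 is past
the end of the file.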

Thread overview: 18+ messages
2014-02-17 18:38 [RFC, PATCHv2 0/2] mm: map few pages around fault address if they are in page cache Kirill A. Shutemov
2014-02-17 18:38 ` [PATCH 1/2] mm: introduce vm_ops->fault_nonblock() Kirill A. Shutemov
2014-02-17 18:38 ` [PATCH 2/2] mm: implement ->fault_nonblock() for page cache Kirill A. Shutemov
2014-02-17 19:01 ` [RFC, PATCHv2 0/2] mm: map few pages around fault address if they are in " Linus Torvalds
2014-02-17 19:49   ` Kirill A. Shutemov
2014-02-17 20:24     ` Linus Torvalds
2014-02-18 13:28   ` Rik van Riel
2014-02-18 14:15     ` Wilcox, Matthew R
2014-02-18 18:02       ` Linus Torvalds
2014-02-18 18:53         ` Matthew Wilcox [this message]
2014-02-18 19:07           ` Linus Torvalds
2014-02-18 14:23     ` Kirill A. Shutemov
2014-02-18 17:51     ` Linus Torvalds
2014-02-18 17:59   ` Kirill A. Shutemov
2014-02-18 18:07     ` Kirill A. Shutemov
2014-02-18 18:28       ` Linus Torvalds
2014-02-18 23:57         ` Kirill A. Shutemov
2014-02-19  0:29           ` Linus Torvalds
