From: Andrew Morton <akpm@linux-foundation.org>
To: Nick Piggin <npiggin@suse.de>
Cc: Linux Memory Management <linux-mm@kvack.org>,
Andrew Morton <akpm@linux-foundation.org>,
Linux Kernel <linux-kernel@vger.kernel.org>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>
Subject: Re: [patch 3/6] mm: fix fault vs invalidate race for linear mappings
Date: Tue, 6 Mar 2007 22:36:41 -0800 [thread overview]
Message-ID: <20070306223641.505db0e0.akpm@linux-foundation.org> (raw)
In-Reply-To: <20070221023724.6306.53097.sendpatchset@linux.site>
On Wed, 21 Feb 2007 05:50:05 +0100 (CET) Nick Piggin <npiggin@suse.de> wrote:
> Fix the race between invalidate_inode_pages and do_no_page.
>
> Andrea Arcangeli identified a subtle race between invalidation of
> pages from pagecache with userspace mappings, and do_no_page.
>
> The issue is that invalidation has to shoot down all mappings to the
> page, before it can be discarded from the pagecache. Between shooting
> down ptes to a particular page, and actually dropping the struct page
> from the pagecache, do_no_page from any process might fault on that
> page and establish a new mapping to the page just before it gets
> discarded from the pagecache.
>
> The most common case where such invalidation is used is in file
> truncation. This case was catered for by doing a sort of open-coded
> seqlock between the file's i_size, and its truncate_count.
>
> Truncation will decrease i_size, then increment truncate_count before
> unmapping userspace pages; do_no_page will read truncate_count, then
> find the page if it is within i_size, and then check truncate_count
> under the page table lock and back out and retry if it had
> subsequently been changed (ptl will serialise against unmapping, and
> ensure a potentially updated truncate_count is actually visible).
>
> Complexity and documentation issues aside, the locking protocol fails
> in the case where we would like to invalidate pagecache inside i_size.
> do_no_page can come in anytime and filemap_nopage is not aware of the
> invalidation in progress (as it is when it is outside i_size). The
> end result is that dangling (->mapping == NULL) pages that appear to
> be from a particular file may be mapped into userspace with nonsense
> data. Valid mappings to the same place will see a different page.
>
> Andrea implemented two working fixes, one using a real seqlock,
> another using a page->flags bit. He also proposed using the page lock
> in do_no_page, but that was initially considered too heavyweight.
> However, it is not a global or per-file lock, and the page cacheline
> is modified in do_no_page to increment _count and _mapcount anyway, so
> a further modification should not be a large performance hit.
> Scalability is not an issue.
>
> This patch implements this latter approach. ->nopage implementations
> return with the page locked if it is possible for their underlying
> file to be invalidated (in that case, they must set a special vm_flags
> bit to indicate so). do_no_page only unlocks the page after setting
> up the mapping completely. invalidation is excluded because it holds
> the page lock during invalidation of each page (and ensures that the
> page is not mapped while holding the lock).
>
> This also allows significant simplifications in do_no_page, because
> we have the page locked in the right place in the pagecache from the
> start.
>
Why was truncate_inode_pages_range() altered to unmap the page if it got
mapped again?
Oh. Because the unmap_mapping_range() call got removed from vmtruncate().
Why? (Please send suitable updates to the changelog).
I guess truncate of a mmapped area isn't sufficiently common to worry about
the inefficiency of this change.
Lots of memory barriers got removed in memory.c, unchangeloggedly.
Gratuitous renaming of locals in do_no_page() makes the change hard to
review. Should have been a separate patch.
In fact, the patch would have been heaps clearer if that renaming had been
a separate patch.
WARNING: multiple messages have this Message-ID (diff)
From: Andrew Morton <akpm@linux-foundation.org>
To: Nick Piggin <npiggin@suse.de>
Cc: Linux Memory Management <linux-mm@kvack.org>,
Andrew Morton <akpm@linux-foundation.org>,
Linux Kernel <linux-kernel@vger.kernel.org>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>
Subject: Re: [patch 3/6] mm: fix fault vs invalidate race for linear mappings
Date: Tue, 6 Mar 2007 22:36:41 -0800 [thread overview]
Message-ID: <20070306223641.505db0e0.akpm@linux-foundation.org> (raw)
In-Reply-To: <20070221023724.6306.53097.sendpatchset@linux.site>
On Wed, 21 Feb 2007 05:50:05 +0100 (CET) Nick Piggin <npiggin@suse.de> wrote:
> Fix the race between invalidate_inode_pages and do_no_page.
>
> Andrea Arcangeli identified a subtle race between invalidation of
> pages from pagecache with userspace mappings, and do_no_page.
>
> The issue is that invalidation has to shoot down all mappings to the
> page, before it can be discarded from the pagecache. Between shooting
> down ptes to a particular page, and actually dropping the struct page
> from the pagecache, do_no_page from any process might fault on that
> page and establish a new mapping to the page just before it gets
> discarded from the pagecache.
>
> The most common case where such invalidation is used is in file
> truncation. This case was catered for by doing a sort of open-coded
> seqlock between the file's i_size, and its truncate_count.
>
> Truncation will decrease i_size, then increment truncate_count before
> unmapping userspace pages; do_no_page will read truncate_count, then
> find the page if it is within i_size, and then check truncate_count
> under the page table lock and back out and retry if it had
> subsequently been changed (ptl will serialise against unmapping, and
> ensure a potentially updated truncate_count is actually visible).
>
> Complexity and documentation issues aside, the locking protocol fails
> in the case where we would like to invalidate pagecache inside i_size.
> do_no_page can come in anytime and filemap_nopage is not aware of the
> invalidation in progress (as it is when it is outside i_size). The
> end result is that dangling (->mapping == NULL) pages that appear to
> be from a particular file may be mapped into userspace with nonsense
> data. Valid mappings to the same place will see a different page.
>
> Andrea implemented two working fixes, one using a real seqlock,
> another using a page->flags bit. He also proposed using the page lock
> in do_no_page, but that was initially considered too heavyweight.
> However, it is not a global or per-file lock, and the page cacheline
> is modified in do_no_page to increment _count and _mapcount anyway, so
> a further modification should not be a large performance hit.
> Scalability is not an issue.
>
> This patch implements this latter approach. ->nopage implementations
> return with the page locked if it is possible for their underlying
> file to be invalidated (in that case, they must set a special vm_flags
> bit to indicate so). do_no_page only unlocks the page after setting
> up the mapping completely. invalidation is excluded because it holds
> the page lock during invalidation of each page (and ensures that the
> page is not mapped while holding the lock).
>
> This also allows significant simplifications in do_no_page, because
> we have the page locked in the right place in the pagecache from the
> start.
>
Why was truncate_inode_pages_range() altered to unmap the page if it got
mapped again?
Oh. Because the unmap_mapping_range() call got removed from vmtruncate().
Why? (Please send suitable updates to the changelog).
I guess truncate of a mmapped area isn't sufficiently common to worry about
the inefficiency of this change.
Lots of memory barriers got removed in memory.c, unchangeloggedly.
Gratuitous renaming of locals in do_no_page() makes the change hard to
review. Should have been a separate patch.
In fact, the patch would have been heaps clearer if that renaming had been
a separate patch.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2007-03-07 6:36 UTC|newest]
Thread overview: 198+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-02-21 4:49 [patch 0/6] fault vs truncate/invalidate race fix Nick Piggin
2007-02-21 4:49 ` Nick Piggin
2007-02-21 4:49 ` [patch 1/6] mm: debug check for the fault vs invalidate race Nick Piggin
2007-02-21 4:49 ` Nick Piggin
2007-02-21 4:49 ` [patch 2/6] mm: simplify filemap_nopage Nick Piggin
2007-02-21 4:49 ` Nick Piggin
2007-02-21 4:50 ` [patch 3/6] mm: fix fault vs invalidate race for linear mappings Nick Piggin
2007-02-21 4:50 ` Nick Piggin
2007-03-07 6:36 ` Andrew Morton [this message]
2007-03-07 6:36 ` Andrew Morton
2007-03-07 6:57 ` Nick Piggin
2007-03-07 6:57 ` Nick Piggin
2007-03-07 7:08 ` Andrew Morton
2007-03-07 7:08 ` Andrew Morton
2007-03-07 7:25 ` Nick Piggin
2007-03-07 7:25 ` Nick Piggin
2007-02-21 4:50 ` [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear) Nick Piggin
2007-02-21 4:50 ` Nick Piggin
2007-03-07 6:51 ` Andrew Morton
2007-03-07 6:51 ` Andrew Morton
2007-03-07 7:08 ` Nick Piggin
2007-03-07 7:08 ` Nick Piggin
2007-03-07 8:19 ` Nick Piggin
2007-03-07 8:19 ` Nick Piggin
2007-03-07 8:27 ` Ingo Molnar
2007-03-07 8:27 ` Ingo Molnar
2007-03-07 8:35 ` Andrew Morton
2007-03-07 8:35 ` Andrew Morton
2007-03-07 8:53 ` Ingo Molnar
2007-03-07 8:53 ` Ingo Molnar
2007-03-07 9:28 ` Nick Piggin
2007-03-07 9:28 ` Nick Piggin
2007-03-07 9:44 ` Bill Irwin
2007-03-07 9:44 ` Bill Irwin
2007-03-07 9:49 ` Nick Piggin
2007-03-07 9:49 ` Nick Piggin
2007-03-07 10:02 ` Nick Piggin
2007-03-07 10:02 ` Nick Piggin
2007-03-12 23:01 ` Blaisorblade
2007-03-12 23:01 ` Blaisorblade
2007-03-13 1:19 ` Nick Piggin
2007-03-13 1:19 ` Nick Piggin
2007-03-17 12:17 ` Blaisorblade
2007-03-17 12:17 ` Blaisorblade
2007-03-18 2:50 ` Nick Piggin
2007-03-18 2:50 ` Nick Piggin
2007-03-18 13:09 ` Jeff Dike
2007-03-18 13:09 ` Jeff Dike
2007-03-19 12:04 ` Bill Irwin
2007-03-19 12:04 ` Bill Irwin
2007-03-19 20:44 ` Blaisorblade
2007-03-19 20:44 ` Blaisorblade
2007-03-20 6:00 ` Nick Piggin
2007-03-20 6:00 ` Nick Piggin
2007-03-21 19:45 ` Blaisorblade
2007-03-21 19:45 ` Blaisorblade
2007-03-08 12:39 ` Blaisorblade
2007-03-08 12:39 ` Blaisorblade
2007-03-07 9:29 ` Bill Irwin
2007-03-07 9:29 ` Bill Irwin
2007-03-07 9:39 ` Andrew Morton
2007-03-07 9:39 ` Andrew Morton
2007-03-07 10:09 ` Bill Irwin
2007-03-07 10:09 ` Bill Irwin
2007-03-07 8:38 ` Miklos Szeredi
2007-03-07 8:38 ` Miklos Szeredi
2007-03-07 8:47 ` Andrew Morton
2007-03-07 8:47 ` Andrew Morton
2007-03-07 8:51 ` Miklos Szeredi
2007-03-07 8:51 ` Miklos Szeredi
2007-03-07 9:07 ` Andrew Morton
2007-03-07 9:07 ` Andrew Morton
2007-03-07 9:18 ` Nick Piggin
2007-03-07 9:18 ` Nick Piggin
2007-03-07 9:26 ` Andrew Morton
2007-03-07 9:26 ` Andrew Morton
2007-03-07 9:28 ` Miklos Szeredi
2007-03-07 9:28 ` Miklos Szeredi
2007-03-07 9:38 ` Nick Piggin
2007-03-07 9:38 ` Nick Piggin
2007-03-07 9:25 ` Miklos Szeredi
2007-03-07 9:25 ` Miklos Szeredi
2007-03-07 9:32 ` Peter Zijlstra
2007-03-07 9:32 ` Peter Zijlstra
2007-03-07 9:45 ` Nick Piggin
2007-03-07 9:45 ` Nick Piggin
2007-03-07 10:04 ` Nick Piggin
2007-03-07 10:04 ` Nick Piggin
2007-03-07 10:06 ` Peter Zijlstra
2007-03-07 10:06 ` Peter Zijlstra
2007-03-07 10:13 ` Miklos Szeredi
2007-03-07 10:13 ` Miklos Szeredi
2007-03-07 10:21 ` Nick Piggin
2007-03-07 10:21 ` Nick Piggin
2007-03-07 10:24 ` Peter Zijlstra
2007-03-07 10:24 ` Peter Zijlstra
2007-03-07 10:38 ` Nick Piggin
2007-03-07 10:38 ` Nick Piggin
2007-03-07 10:47 ` Peter Zijlstra
2007-03-07 10:47 ` Peter Zijlstra
2007-03-07 11:00 ` Nick Piggin
2007-03-07 11:00 ` Nick Piggin
2007-03-07 11:48 ` Peter Zijlstra
2007-03-07 11:48 ` Peter Zijlstra
2007-03-07 12:17 ` Nick Piggin
2007-03-07 12:17 ` Nick Piggin
2007-03-07 12:41 ` Peter Zijlstra
2007-03-07 12:41 ` Peter Zijlstra
2007-03-07 13:08 ` Nick Piggin
2007-03-07 13:08 ` Nick Piggin
2007-03-07 13:19 ` Peter Zijlstra
2007-03-07 13:19 ` Peter Zijlstra
2007-03-07 13:36 ` Nick Piggin
2007-03-07 13:36 ` Nick Piggin
2007-03-07 13:52 ` Peter Zijlstra
2007-03-07 13:52 ` Peter Zijlstra
2007-03-07 13:56 ` Miklos Szeredi
2007-03-07 13:56 ` Miklos Szeredi
2007-03-07 14:34 ` Peter Zijlstra
2007-03-07 14:34 ` Peter Zijlstra
2007-03-07 15:01 ` Nick Piggin
2007-03-07 15:01 ` Nick Piggin
2007-03-07 16:58 ` [RFC][PATCH] mm: fix page_mkclean() vs non-linear vmas Peter Zijlstra
2007-03-07 16:58 ` Peter Zijlstra
2007-03-07 18:00 ` Linus Torvalds
2007-03-07 18:00 ` Linus Torvalds
2007-03-07 18:12 ` Peter Zijlstra
2007-03-07 18:12 ` Peter Zijlstra
2007-03-07 18:24 ` Peter Zijlstra
2007-03-07 18:24 ` Peter Zijlstra
2007-03-08 11:21 ` Miklos Szeredi
2007-03-08 11:21 ` Miklos Szeredi
2007-03-08 11:37 ` Peter Zijlstra
2007-03-08 11:37 ` Peter Zijlstra
2007-03-08 11:48 ` Miklos Szeredi
2007-03-08 11:48 ` Miklos Szeredi
2007-03-08 12:11 ` Peter Zijlstra
2007-03-08 12:11 ` Peter Zijlstra
2007-03-08 12:19 ` Nick Piggin
2007-03-08 12:19 ` Nick Piggin
2007-03-08 12:25 ` Miklos Szeredi
2007-03-08 12:25 ` Miklos Szeredi
2007-03-08 11:58 ` Nick Piggin
2007-03-08 11:58 ` Nick Piggin
2007-03-08 12:09 ` Miklos Szeredi
2007-03-08 12:09 ` Miklos Szeredi
2007-03-07 15:10 ` [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear) Jeff Dike
2007-03-07 15:10 ` Jeff Dike
2007-03-07 13:53 ` Miklos Szeredi
2007-03-07 13:53 ` Miklos Szeredi
2007-03-07 14:50 ` Nick Piggin
2007-03-07 14:50 ` Nick Piggin
2007-03-07 12:22 ` Bill Irwin
2007-03-07 12:22 ` Bill Irwin
2007-03-07 12:36 ` Nick Piggin
2007-03-07 12:36 ` Nick Piggin
2007-03-07 10:30 ` [rfc][patch 7/6] mm: merge page_mkwrite Nick Piggin
2007-03-07 10:30 ` Nick Piggin
2007-03-07 8:59 ` [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear) Nick Piggin
2007-03-07 8:59 ` Nick Piggin
2007-03-07 9:11 ` Nick Piggin
2007-03-07 9:11 ` Nick Piggin
2007-03-07 9:22 ` Ingo Molnar
2007-03-07 9:22 ` Ingo Molnar
2007-03-07 9:32 ` Bill Irwin
2007-03-07 9:32 ` Bill Irwin
2007-03-07 9:35 ` Ingo Molnar
2007-03-07 9:35 ` Ingo Molnar
2007-03-07 9:50 ` Bill Irwin
2007-03-07 9:50 ` Bill Irwin
2007-03-07 9:52 ` Nick Piggin
2007-03-07 9:52 ` Nick Piggin
2007-03-07 7:19 ` Bill Irwin
2007-03-07 7:19 ` Bill Irwin
2007-03-07 10:05 ` Benjamin Herrenschmidt
2007-03-07 10:05 ` Benjamin Herrenschmidt
2007-03-07 10:17 ` Nick Piggin
2007-03-07 10:17 ` Nick Piggin
2007-03-07 10:46 ` Benjamin Herrenschmidt
2007-03-07 10:46 ` Benjamin Herrenschmidt
2007-02-21 4:50 ` [patch 5/6] mm: merge nopfn into fault Nick Piggin
2007-02-21 4:50 ` Nick Piggin
2007-02-21 5:13 ` Nick Piggin
2007-02-21 5:13 ` Nick Piggin
2007-02-21 4:50 ` [patch 6/6] mm: remove legacy cruft Nick Piggin
2007-02-21 4:50 ` Nick Piggin
2007-02-27 4:36 ` [patch 0/6] fault vs truncate/invalidate race fix Dave Airlie
2007-02-27 4:36 ` Dave Airlie
2007-02-27 5:32 ` Andrew Morton
2007-02-27 5:32 ` Andrew Morton
2007-02-27 6:26 ` Dave Airlie
2007-02-27 6:26 ` Dave Airlie
2007-02-27 6:54 ` Benjamin Herrenschmidt
2007-02-27 6:54 ` Benjamin Herrenschmidt
2007-03-18 23:13 ` Dave Airlie
2007-03-18 23:13 ` Dave Airlie
2007-02-27 8:50 ` Nick Piggin
2007-02-27 8:50 ` Nick Piggin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20070306223641.505db0e0.akpm@linux-foundation.org \
--to=akpm@linux-foundation.org \
--cc=benh@kernel.crashing.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=npiggin@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.