linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Andrea Arcangeli <aarcange@redhat.com>
To: Christoph Lameter <cl@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Izik Eidus <ieidus@redhat.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	kvm@vger.kernel.org, chrisw@redhat.com, avi@redhat.com,
	izike@qumranet.com
Subject: Re: [PATCH 2/4] Add replace_page(), change the mapping of pte from one page into another
Date: Wed, 12 Nov 2008 18:32:58 +0100	[thread overview]
Message-ID: <20081112173258.GX10818@random.random> (raw)
In-Reply-To: <Pine.LNX.4.64.0811112109390.10501@quilx.com>

On Tue, Nov 11, 2008 at 09:10:45PM -0600, Christoph Lameter wrote:
> get_user_pages() cannot get to it since the pagetables have already been
> modified. If get_user_pages runs then the fault handling will occur
> which will block the thread until migration is complete.

migrate.c does nothing for ptes pointing to swap entries and
do_swap_page won't wait for them either. Assume follow_page in
migrate.c returns a swapcache not mapped but with a pte pointing to
it. That means page_count 1 (+1 after you isolate it from the lru),
page_mapcount 0, page_mapped 0, page_mapping = swap address space,
swap_count = 2 (1 swapcache, 1 the pte with the swapentry). Now assume
one thread does o_direct read from disk that triggers a minor fault in
do_swap_cache called by get_user_pages. The other cpu is running
sys_move_pages and the expected count will match the page count in
migrate_page_move_mapping. Page is still in swapcache. So after the
expected count matches in the migrate.c thread, the other thread
continues in do_swap_page and runs lookup_swap_cache that succeeds
(the page wasn't removed from swapcache yet as migrate.c needs to bail
out if the expected count doesn't match, so it can't mess with the
oldpage until it's sure it can migrate it). After that do_swap_page
gets a reference on the swapcache (at that point migrate.c continues
despite the expected count isn't 2 anymore! just a second after having
verified that it was 2). lock_page blocks do_swap_page until migration
is complete but pte_same in do_swap_page won't fail because the pte is
still pointing to the same swapentry (it's just the swapcache inode
radix tree that points to a different page, the swapentry is still the
same as before the migration - is_swap_pte will succeed but
is_migration_entry failed when restoring the pte). Finally the pte is
overwritten with the old page and any data written to the new page in
between is lost.

However it's not exactly the same bug as the one in fork, I was
talking about before, it's also not o_direct specific. Still
page_wrprotect + replace_page is orders of magnitude simpler logic
than migrate.c and it has no bugs or at least it's certainly much
simpler to proof as correct. Furthermore we never 'stall' any userland
task while we do our work. We only mark the pte wrprotected, the task
can cow or takeover it if refcount allows anytime, and later we'll
bailout during replace_page if something has happened in between
page_wrprotect and replace_page. So our logic is simpler and tuned for
max performance and fewer interference with userland runtime. Not
really sure if it worth for us to call into migrate.c.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2008-11-12 17:32 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-11-11 13:21 [PATCH 0/4] ksm - dynamic page sharing driver for linux Izik Eidus
2008-11-11 13:21 ` [PATCH 1/4] rmap: add page_wrprotect() function, Izik Eidus, Izik Eidus
2008-11-11 13:21   ` [PATCH 2/4] Add replace_page(), change the mapping of pte from one page into another Izik Eidus, Izik Eidus
2008-11-11 13:21     ` [PATCH 3/4] add ksm kernel shared memory driver Izik Eidus, Izik Eidus
2008-11-11 13:21       ` [PATCH 4/4] MMU_NOTIFIRES: add set_pte_at_notify() Izik Eidus, Izik Eidus
2008-11-11 20:38       ` [PATCH 3/4] add ksm kernel shared memory driver Andrew Morton
2008-11-11 22:03         ` Andrea Arcangeli
2008-11-11 22:03       ` Jonathan Corbet
2008-11-11 22:17         ` Izik Eidus
2008-11-11 22:25           ` Jonathan Corbet
2008-11-11 22:31             ` Izik Eidus
2008-11-11 22:30           ` Jonathan Corbet
2008-11-11 22:38             ` Izik Eidus
2008-11-11 23:02             ` Izik Eidus
2008-11-11 23:03             ` Andrea Arcangeli
2008-11-11 22:49           ` Avi Kivity
2008-11-11 22:40         ` Valdis.Kletnieks
2008-11-13  6:13           ` Eric Rannaud
2008-11-11 22:43         ` Avi Kivity
2008-11-11 19:45     ` [PATCH 2/4] Add replace_page(), change the mapping of pte from one page into another Andrew Morton
2008-11-11 20:57       ` Izik Eidus
2008-11-11 21:21         ` Christoph Lameter
2008-11-11 21:23           ` Izik Eidus
2008-11-11 21:31             ` Christoph Lameter
2008-11-11 21:37               ` Izik Eidus
2008-11-11 22:24               ` Andrea Arcangeli
2008-11-12  2:19                 ` KAMEZAWA Hiroyuki
2008-11-12 10:05                   ` Avi Kivity
2008-11-12 11:11                     ` Izik Eidus
2008-11-13  6:11                       ` KAMEZAWA Hiroyuki
2008-11-13 10:38                         ` Izik Eidus
2008-11-13 11:32                           ` KAMEZAWA Hiroyuki
2008-11-11 21:35           ` Andrea Arcangeli
2008-11-11 21:06       ` Andrea Arcangeli
2008-11-11 21:26         ` Christoph Lameter
2008-11-11 21:39           ` Avi Kivity
2008-11-11 21:47             ` Christoph Lameter
2008-11-11 21:55               ` Izik Eidus
2008-11-11 22:36               ` Avi Kivity
2008-11-11 22:17           ` Andrea Arcangeli
2008-11-11 22:30             ` Christoph Lameter
2008-11-11 23:17               ` Andrea Arcangeli
2008-11-11 23:25                 ` Andrea Arcangeli
2008-11-12  0:27                 ` Christoph Lameter
2008-11-12  2:27                   ` Andrea Arcangeli
2008-11-12  3:10                     ` Christoph Lameter
2008-11-12 17:32                       ` Andrea Arcangeli [this message]
2008-11-12 20:08                         ` Lee Schermerhorn
2008-11-12 20:31                           ` Christoph Lameter
2008-11-12 20:27                         ` Christoph Lameter
2008-11-12 22:09                           ` Lee Schermerhorn
2008-11-13  2:00                             ` Andrea Arcangeli
2008-11-13  2:31                               ` Andrea Arcangeli
2008-11-13  4:02                                 ` Nick Piggin
2008-11-11 19:39   ` [PATCH 1/4] rmap: add page_wrprotect() function, Andrew Morton
2008-11-11 20:38     ` Andrea Arcangeli
2008-11-11 21:01       ` Andrew Morton
2008-11-11 21:17         ` Andrea Arcangeli
2008-11-11 18:30 ` [PATCH 0/4] ksm - dynamic page sharing driver for linux Andrew Morton
2008-11-11 18:48   ` Avi Kivity
2008-11-11 19:08     ` Izik Eidus
2008-11-11 19:11     ` Andrew Morton
2008-11-11 19:18       ` Izik Eidus
2008-11-11 19:32         ` Andrew Morton
2008-11-11 19:52           ` Izik Eidus
2008-11-11 20:08             ` Izik Eidus
2008-11-11 19:29       ` Avi Kivity
2008-11-11 19:55       ` Andrea Arcangeli
2008-11-11 19:07   ` Izik Eidus
2008-11-11 19:20     ` Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20081112173258.GX10818@random.random \
    --to=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=avi@redhat.com \
    --cc=chrisw@redhat.com \
    --cc=cl@linux-foundation.org \
    --cc=ieidus@redhat.com \
    --cc=izike@qumranet.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).