From: Andrea Arcangeli <aarcange@redhat.com>
To: Christoph Lameter <cl@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Izik Eidus <ieidus@redhat.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	kvm@vger.kernel.org, chrisw@redhat.com, avi@redhat.com,
	izike@qumranet.com
Subject: Re: [PATCH 2/4] Add replace_page(), change the mapping of pte from one page into another
Date: Wed, 12 Nov 2008 03:27:01 +0100
Message-ID: <20081112022701.GT10818@random.random>
In-Reply-To: <Pine.LNX.4.64.0811111823030.31625@quilx.com>

On Tue, Nov 11, 2008 at 06:27:09PM -0600, Christoph Lameter wrote:
> Then page migration will not occur because there is an unresolved
> reference.

So are you checking for an unresolved reference only in the very
place I just quoted in the previous email? If the answer is yes: what
prevents get_user_pages from running in parallel from another
thread? get_user_pages will trigger a minor fault and take the
elevated reference just after you read page_count. To you it looks
like there is no O_DIRECT in progress when you proceed to the core of
the migration code, but in effect O_DIRECT started a moment after you
read the page count.

What can protect you is the PG lock or mmap_sem in _write_ mode (and
they have to be held for the whole duration of the migration). I don't
see either of the two being held while you read the page count... You
don't seem to be using stop_machine either (stop_machine is pretty
expensive on the 4096-way, I guess).
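
As a toy user-space model of the window described above (nothing here
is kernel code; Page, try_pin and migrate are hypothetical names made
up for illustration, and the "other thread" is simulated by a callback
so the interleaving is deterministic), the check-then-act problem and
the effect of holding a lock across both steps could be sketched
roughly like this:

```python
import threading


class Page:
    """Toy stand-in for a page: a reference count and a lock (both hypothetical)."""

    def __init__(self):
        self.count = 1                # the single mapping reference
        self.lock = threading.Lock()  # plays the role of the PG lock
        self.migrated = False


def try_pin(page, take_lock):
    """Model of a get_user_pages caller taking an extra reference."""
    if not take_lock:
        page.count += 1
        return True
    # Non-blocking only because this model runs in one thread;
    # a real pinner would sleep until the lock is released.
    if not page.lock.acquire(blocking=False):
        return False
    try:
        page.count += 1
        return True
    finally:
        page.lock.release()


def migrate(page, hold_lock, racing_pin):
    """Check the count, let racing_pin run in the window, then migrate.

    Returns "skipped", "ok", or "raced" (migrated with a pin outstanding).
    """
    if hold_lock:
        page.lock.acquire()
    try:
        if page.count != 1:
            return "skipped"          # extra reference seen: back off
        racing_pin(page)              # another thread scheduled right here
        page.migrated = True
        return "ok" if page.count == 1 else "raced"
    finally:
        if hold_lock:
            page.lock.release()


unlocked = migrate(Page(), False, lambda p: try_pin(p, take_lock=False))
locked = migrate(Page(), True, lambda p: try_pin(p, take_lock=True))
print(unlocked, locked)
```

The model only helps if the pinning side takes the same lock as the
migrating side, which is the assumption baked into try_pin here.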

This wasn't reproduced in practice, but it should be possible to
reproduce it by writing a testcase with three threads: the first forks
in a loop (the child just quits); the second, in a loop, memsets the
first 512 bytes of a page to 0, then does an O_DIRECT read from a
512-byte region of 0xff into them and checks that the first 512 bytes
are all non-zero; and the third writes 1 byte to the last 512 bytes of
the page in a loop. Eventually the comparison should show zero data in
the page.

To reproduce it with migration, just start the thread that memsets 0,
reads a 0xff region with O_DIRECT, and checks it's all 0xff in a loop,
and then migrate the memory of this thread back and forth between two
nodes with sys_move_pages (mpol is safe by luck because it surrounds
migrate_pages with the mmap_sem in write mode). Eventually you should
see zero bytes even though the I/O is complete.
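
A rough, untested-against-the-bug sketch of just the checker thread
from the paragraph above, in Python for brevity. The names first_zero
and checker_loop are hypothetical, the file at `path` is assumed to
already hold at least 512 bytes of 0xff on a filesystem that accepts
O_DIRECT with 512-byte alignment, and the side that calls
sys_move_pages is deliberately not shown:

```python
import mmap
import os

SECTOR = 512  # size of the 0xff region in the description above


def first_zero(buf, length=SECTOR):
    """Return the offset of the first zero byte in buf[:length], or -1 if none."""
    return bytes(buf[:length]).find(b"\x00")


def checker_loop(path, iterations):
    """memset 0, O_DIRECT-read 0xff over it, verify; count the bad rounds.

    Assumption: `path` names a file whose first 512 bytes are all 0xff.
    """
    # Private anonymous mapping: page-aligned, as O_DIRECT requires.
    page = mmap.mmap(-1, mmap.PAGESIZE, flags=mmap.MAP_PRIVATE)
    view = memoryview(page)[:SECTOR]
    fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    bad = 0
    try:
        for _ in range(iterations):
            page[:SECTOR] = bytes(SECTOR)    # memset 0 the first 512 bytes
            got = os.preadv(fd, [view], 0)   # direct read lands on the same bytes
            if got != SECTOR or first_zero(page) != -1:
                bad += 1                     # zero bytes survived a completed read
    finally:
        os.close(fd)
    return bad
```

Running checker_loop() from one thread while something else migrates
that memory between nodes is the experiment; a non-zero return value
would be the symptom described above.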

Reproducing this in normal life would take time, and the fork bug may
not be reproducible at all depending on what the app is doing. Mixing
sys_move_pages with O_DIRECT in the same process on two different
threads, instead, should eventually reproduce it. And with gup_fast
it is now unfixable until more infrastructure is added to slow down
gup_fast a bit (unless Nick finds an RCU way of doing it).
