From: Andrea Arcangeli <aarcange@redhat.com>
To: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Christoph Lameter <cl@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Izik Eidus <ieidus@redhat.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	kvm@vger.kernel.org, chrisw@redhat.com, avi@redhat.com,
	izike@qumranet.com
Subject: Re: [PATCH 2/4] Add replace_page(), change the mapping of pte from one page into another
Date: Thu, 13 Nov 2008 03:31:35 +0100
Message-ID: <20081113023135.GD10818@random.random>
In-Reply-To: <20081113020059.GC10818@random.random>

On Thu, Nov 13, 2008 at 03:00:59AM +0100, Andrea Arcangeli wrote:
> CPU0 migrate.c			CPU1 filemap.c
> -------				----------
> 				find_get_page
> 				radix_tree_lookup_slot returns the oldpage
> page_count still = expected_count
> freeze_ref (oldpage->count = 0)
> radix_tree_replace (too late, other side already got the oldpage)
> unfreeze_ref (oldpage->count = 2)
> 				page_cache_get_speculative(old_page)
> 				set count to 3 and succeeds

After reading more of this lockless radix tree code, I realized that
the check below is the one intended to restart find_get_page and
prevent it from returning the oldpage:

				    if (unlikely(page != *pagep)) {

But there's no barrier there: atomic_add_unless would need to provide
an smp_mb() _after_ it has executed. In the old days the atomic_*
routines had no implied memory barriers, and you had to use
smp_mb__after_atomic_add_unless if you wanted to avoid the race. I
don't see much in the ppc implementation of atomic_add_unless that
would provide an implicit smp_mb after page_cache_get_speculative
returns, so I can't see why find_get_page can't read *pagep before
the other cpu executes radix_tree_replace in the above timeline.
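
To make the reader side concrete, here is a minimal userspace sketch
of the lookup path with the barrier in place. It is not the kernel
code: C11 atomics stand in for the kernel primitives, and
page_model, get_speculative_relaxed and lookup_get are hypothetical
names made up for this illustration. The relaxed compare-exchange
models an atomic_add_unless with no implied ordering (the assumption
made above for ppc), and the seq_cst fence models the smp_mb().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only the refcount matters here. */
struct page_model {
	atomic_int count;
};

/*
 * Models atomic_add_unless(&page->count, 1, 0) with NO ordering
 * guarantee (memory_order_relaxed). Returns true if a reference
 * was taken, false if the count was 0 (page frozen or being freed).
 */
static bool get_speculative_relaxed(struct page_model *page)
{
	int old = atomic_load_explicit(&page->count, memory_order_relaxed);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current count. */
		if (atomic_compare_exchange_weak_explicit(&page->count,
							  &old, old + 1,
							  memory_order_relaxed,
							  memory_order_relaxed))
			return true;
	}
	return false;
}

/*
 * Models the find_get_page fast path. 'slot' plays the role of the
 * radix tree slot (pagep). Returns the page with a reference held,
 * or NULL if the slot is empty.
 */
static struct page_model *lookup_get(struct page_model *_Atomic *slot)
{
	for (;;) {
		struct page_model *page =
			atomic_load_explicit(slot, memory_order_acquire);

		if (page == NULL)
			return NULL;
		if (!get_speculative_relaxed(page))
			continue;	/* frozen: restart the lookup */

		/*
		 * The barrier discussed above: order the refcount update
		 * before the re-read of the slot. Models smp_mb().
		 */
		atomic_thread_fence(memory_order_seq_cst);

		if (page != atomic_load_explicit(slot, memory_order_relaxed)) {
			/* Slot changed under us: drop the reference, retry. */
			atomic_fetch_sub_explicit(&page->count, 1,
						  memory_order_release);
			continue;
		}
		return page;
	}
}
```

Without the fence, nothing in this model stops the second slot load
from being satisfied before the refcount update, which is the
reordering the timeline above depends on.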

I guess you intended to put an smp_mb() between
page_cache_get_speculative and the *pagep check to make the code safe
on ppc too, but there isn't one, and I think it must be fixed; either
that or I don't understand ppc assembly right. The other side has an
smp_wmb implicit inside radix_tree_replace_slot, so it should already
be ok there to ensure we see the refcount going to 0 before we see the
pagep changed (the fact that the other side has a memory barrier
further confirms this side needs one too).
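
For the other side, a similar userspace sketch of the ordering the
writer provides, again with C11 atomics as stand-ins. The names
page_model, freeze_refs and replace_page_model are hypothetical, and
the sequence is only a rough model of the freeze / replace / unfreeze
steps in the quoted timeline, not the migrate.c code. The release
store to the slot plays the role of the smp_wmb implied by
radix_tree_replace_slot.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: only the refcount matters here. */
struct page_model {
	atomic_int count;
};

/*
 * Models freeze_ref: succeeds only if the count equals 'expected',
 * in which case the count is set to 0.
 */
static bool freeze_refs(struct page_model *page, int expected)
{
	return atomic_compare_exchange_strong_explicit(&page->count,
						       &expected, 0,
						       memory_order_acq_rel,
						       memory_order_relaxed);
}

/*
 * Models the writer side of the timeline. Returns true if the slot
 * was switched from oldpage to newpage.
 */
static bool replace_page_model(struct page_model *_Atomic *slot,
			       struct page_model *oldpage,
			       struct page_model *newpage,
			       int expected)
{
	if (!freeze_refs(oldpage, expected))
		return false;	/* someone else holds a reference */

	/* The slot will now hold a reference on the new page. */
	atomic_fetch_add_explicit(&newpage->count, 1, memory_order_relaxed);

	/*
	 * Release store: the frozen count (0) becomes visible no later
	 * than the new slot contents. Models the implied smp_wmb.
	 */
	atomic_store_explicit(slot, newpage, memory_order_release);

	/*
	 * Unfreeze. Release ordering keeps the slot update visible
	 * before the count is seen leaving 0.
	 */
	atomic_store_explicit(&oldpage->count, expected,
			      memory_order_release);
	return true;
}
```

The release ordering here only helps a reader that orders its own
refcount update before its re-read of the slot, which is why the
barrier is needed on both sides.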

BTW, radix_tree_deref_slot might be missing an
smp_read_barrier_depends() after it returns, but I'm not entirely
sure, and only alpha would be affected in the worst case.
