linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>,
	Christoph Lameter <cl@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Izik Eidus <ieidus@redhat.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	kvm@vger.kernel.org, chrisw@redhat.com, avi@redhat.com,
	izike@qumranet.com
Subject: Re: [PATCH 2/4] Add replace_page(), change the mapping of pte from one page into another
Date: Thu, 13 Nov 2008 15:02:19 +1100	[thread overview]
Message-ID: <200811131502.20102.nickpiggin@yahoo.com.au> (raw)
In-Reply-To: <20081113023135.GD10818@random.random>

On Thursday 13 November 2008 13:31, Andrea Arcangeli wrote:
> On Thu, Nov 13, 2008 at 03:00:59AM +0100, Andrea Arcangeli wrote:
> > CPU0 migrate.c			CPU1 filemap.c
> > -------				----------
> > 				find_get_page
> > 				radix_tree_lookup_slot returns the oldpage
> > page_count still = expected_count
> > freeze_ref (oldpage->count = 0)
> > radix_tree_replace (too late, other side already got the oldpage)
> > unfreeze_ref (oldpage->count = 2)
> > 				page_cache_get_speculative(old_page)
> > 				set count to 3 and succeeds
>
> After reading more of this lockless radix tree code, I realized this
> below check is the one that was intended to restart find_get_page and
> prevent it to return the oldpage:
>
> 				    if (unlikely(page != *pagep)) {
>
> But there's no barrier there, atomic_add_unless would need to provide
> an atomic smp_mb() _after_ atomic_add_unless executed. In the old days
> the atomic_* routines had no implied memory barriers, you had to use
> smp_mb__after_atomic_add_unless if you wanted to avoid the race. I
> don't see much in the ppc implementation of atomic_add_unless that
> would provide an implicit smb_mb after the page_cache_get_speculative
> returns, so I can't see why the pagep can't be by find_get_page read
> before the other cpu executes radix_tree_replace in the above
> timeline.

All atomic functions that both return a value and modify their memory
are defined to provide full barriers before and after the operation.

powerpc should be OK

        __asm__ __volatile__ (
        LWSYNC_ON_SMP
"1:     lwarx   %0,0,%1         # atomic_add_unless\n\
        cmpw    0,%0,%3 \n\
        beq-    2f \n\
        add     %0,%2,%0 \n"
        PPC405_ERR77(0,%2)
"       stwcx.  %0,0,%1 \n\
        bne-    1b \n"
        ISYNC_ON_SMP
"       subf    %0,%2,%0 \n\
2:"
        : "=&r" (t)
        : "r" (&v->counter), "r" (a), "r" (u)
        : "cc", "memory");

lwsync instruction prevents all reorderings except allows loads to
pass stores. But because it is followed by a LL/SC sequence, the
store part of that sequence is prevented from passing stores, so
therefore it will fail if the load had executed earlier and the
value has changed.

isync prevents speculative execution, so the branch has to be
resolved before any subsequent instructions start. The branch
depends on the result of the LL/SC, so that has to be complete.

So that prevents loads from passing the LL/SC.

For the SC to complete, I think by definition the store has to be
visible, because it has to check the value has not changed (so it
would have to own the cacheline).

I think that's how it works. I'm not an expert at powerpc's weird
barriers, but I'm pretty sure they work.


> I guess you intended to put an smp_mb() in between the
> page_cache_get_speculative and the *pagep to make the code safe on ppc
> too, but there isn't, and I think it must be fixed, either that or I
> don't understand ppc assembly right. The other side has a smp_wmb
> implicit inside radix_tree_replace_slot so it should be ok already to
> ensure we see the refcount going to 0 before we see the pagep changed
> (the fact the other side has a memory barrier, further confirms this
> side needs it too).

I think it is OK. It should have a comment there however to say it
relies on a barrier in get_page_unless_zero (I thought I had one..)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2008-11-13  4:02 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-11-11 13:21 [PATCH 0/4] ksm - dynamic page sharing driver for linux Izik Eidus
2008-11-11 13:21 ` [PATCH 1/4] rmap: add page_wrprotect() function, Izik Eidus, Izik Eidus
2008-11-11 13:21   ` [PATCH 2/4] Add replace_page(), change the mapping of pte from one page into another Izik Eidus, Izik Eidus
2008-11-11 13:21     ` [PATCH 3/4] add ksm kernel shared memory driver Izik Eidus, Izik Eidus
2008-11-11 13:21       ` [PATCH 4/4] MMU_NOTIFIRES: add set_pte_at_notify() Izik Eidus, Izik Eidus
2008-11-11 20:38       ` [PATCH 3/4] add ksm kernel shared memory driver Andrew Morton
2008-11-11 22:03         ` Andrea Arcangeli
2008-11-11 22:03       ` Jonathan Corbet
2008-11-11 22:17         ` Izik Eidus
2008-11-11 22:25           ` Jonathan Corbet
2008-11-11 22:31             ` Izik Eidus
2008-11-11 22:30           ` Jonathan Corbet
2008-11-11 22:38             ` Izik Eidus
2008-11-11 23:02             ` Izik Eidus
2008-11-11 23:03             ` Andrea Arcangeli
2008-11-11 22:49           ` Avi Kivity
2008-11-11 22:40         ` Valdis.Kletnieks
2008-11-13  6:13           ` Eric Rannaud
2008-11-11 22:43         ` Avi Kivity
2008-11-11 19:45     ` [PATCH 2/4] Add replace_page(), change the mapping of pte from one page into another Andrew Morton
2008-11-11 20:57       ` Izik Eidus
2008-11-11 21:21         ` Christoph Lameter
2008-11-11 21:23           ` Izik Eidus
2008-11-11 21:31             ` Christoph Lameter
2008-11-11 21:37               ` Izik Eidus
2008-11-11 22:24               ` Andrea Arcangeli
2008-11-12  2:19                 ` KAMEZAWA Hiroyuki
2008-11-12 10:05                   ` Avi Kivity
2008-11-12 11:11                     ` Izik Eidus
2008-11-13  6:11                       ` KAMEZAWA Hiroyuki
2008-11-13 10:38                         ` Izik Eidus
2008-11-13 11:32                           ` KAMEZAWA Hiroyuki
2008-11-11 21:35           ` Andrea Arcangeli
2008-11-11 21:06       ` Andrea Arcangeli
2008-11-11 21:26         ` Christoph Lameter
2008-11-11 21:39           ` Avi Kivity
2008-11-11 21:47             ` Christoph Lameter
2008-11-11 21:55               ` Izik Eidus
2008-11-11 22:36               ` Avi Kivity
2008-11-11 22:17           ` Andrea Arcangeli
2008-11-11 22:30             ` Christoph Lameter
2008-11-11 23:17               ` Andrea Arcangeli
2008-11-11 23:25                 ` Andrea Arcangeli
2008-11-12  0:27                 ` Christoph Lameter
2008-11-12  2:27                   ` Andrea Arcangeli
2008-11-12  3:10                     ` Christoph Lameter
2008-11-12 17:32                       ` Andrea Arcangeli
2008-11-12 20:08                         ` Lee Schermerhorn
2008-11-12 20:31                           ` Christoph Lameter
2008-11-12 20:27                         ` Christoph Lameter
2008-11-12 22:09                           ` Lee Schermerhorn
2008-11-13  2:00                             ` Andrea Arcangeli
2008-11-13  2:31                               ` Andrea Arcangeli
2008-11-13  4:02                                 ` Nick Piggin [this message]
2008-11-11 19:39   ` [PATCH 1/4] rmap: add page_wrprotect() function, Andrew Morton
2008-11-11 20:38     ` Andrea Arcangeli
2008-11-11 21:01       ` Andrew Morton
2008-11-11 21:17         ` Andrea Arcangeli
2008-11-11 18:30 ` [PATCH 0/4] ksm - dynamic page sharing driver for linux Andrew Morton
2008-11-11 18:48   ` Avi Kivity
2008-11-11 19:08     ` Izik Eidus
2008-11-11 19:11     ` Andrew Morton
2008-11-11 19:18       ` Izik Eidus
2008-11-11 19:32         ` Andrew Morton
2008-11-11 19:52           ` Izik Eidus
2008-11-11 20:08             ` Izik Eidus
2008-11-11 19:29       ` Avi Kivity
2008-11-11 19:55       ` Andrea Arcangeli
2008-11-11 19:07   ` Izik Eidus
2008-11-11 19:20     ` Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200811131502.20102.nickpiggin@yahoo.com.au \
    --to=nickpiggin@yahoo.com.au \
    --cc=Lee.Schermerhorn@hp.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=avi@redhat.com \
    --cc=chrisw@redhat.com \
    --cc=cl@linux-foundation.org \
    --cc=ieidus@redhat.com \
    --cc=izike@qumranet.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).