From: Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
To: Linus Torvalds
<torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Cc: Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>,
Nick Piggin <npiggin-l3A5Bk7waGM@public.gmane.org>,
Andrew Morton
<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
shaggy-V7BBcbaFuwjMbYB6QlFGEg@public.gmane.org,
axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
linux-arch-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Clark Williams <williams-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>
Subject: Re: [patch 2/2]: introduce fast_gup
Date: Mon, 21 Apr 2008 15:00:04 +0300 [thread overview]
Message-ID: <480C81C4.8030200@qumranet.com> (raw)
In-Reply-To: <alpine.LFD.1.00.0804170814090.2879-5CScLwifNT1QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>
Linus Torvalds wrote:
> Finally, I don't think that comment is correct in the first place. It's
> not that simple. The thing is, even *with* the memory barrier in place, we
> may have:
>
> CPU#1 CPU#2
> ===== =====
>
> fast_gup:
> - read low word
>
> native_set_pte_present:
> - set low word to 0
> - set high word to new value
>
> - read high word
>
> - set low word to new value
>
> and so you read a low word that is associated with a *different* high
> word! Notice?
>
> So trivial memory ordering is _not_ enough.
>
> So I think the code literally needs to be something like this
>
> #ifdef CONFIG_X86_PAE
>
> static inline pte_t native_get_pte(pte_t *ptep)
> {
> pte_t pte;
>
> retry:
> pte.pte_low = ptep->pte_low;
> smp_rmb();
> pte.pte_high = ptep->pte_high;
> smp_rmb();
> if (unlikely(pte.pte_low != ptep->pte_low)
> goto retry;
> return pte;
> }
>
>
I think this is still broken. Suppose that after reading pte_high
native_set_pte() is called again on another cpu, changing pte_low back
to the original value (but with a different pte_high). You now have
pte_low from second native_set_pte() but pte_high from the first
native_set_pte().
You could use cmpxchg8b to atomically load the pte, but at the expense
of taking the cacheline for exclusive access.
--
error compiling committee.c: too many arguments to function
WARNING: multiple messages have this Message-ID (diff)
From: Avi Kivity <avi@qumranet.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Nick Piggin <npiggin@suse.de>,
Andrew Morton <akpm@linux-foundation.org>,
shaggy@austin.ibm.com, axboe@kernel.dk, linux-mm@kvack.org,
linux-arch@vger.kernel.org, Clark Williams <williams@redhat.com>,
Ingo Molnar <mingo@elte.hu>
Subject: Re: [patch 2/2]: introduce fast_gup
Date: Mon, 21 Apr 2008 15:00:04 +0300 [thread overview]
Message-ID: <480C81C4.8030200@qumranet.com> (raw)
Message-ID: <20080421120004.wCKrDu-GkBVG6KhNZYjbPPpK0jn-qBDKeYsRm3tnMsI@z> (raw)
In-Reply-To: <alpine.LFD.1.00.0804170814090.2879@woody.linux-foundation.org>
Linus Torvalds wrote:
> Finally, I don't think that comment is correct in the first place. It's
> not that simple. The thing is, even *with* the memory barrier in place, we
> may have:
>
> CPU#1 CPU#2
> ===== =====
>
> fast_gup:
> - read low word
>
> native_set_pte_present:
> - set low word to 0
> - set high word to new value
>
> - read high word
>
> - set low word to new value
>
> and so you read a low word that is associated with a *different* high
> word! Notice?
>
> So trivial memory ordering is _not_ enough.
>
> So I think the code literally needs to be something like this
>
> #ifdef CONFIG_X86_PAE
>
> static inline pte_t native_get_pte(pte_t *ptep)
> {
> pte_t pte;
>
> retry:
> pte.pte_low = ptep->pte_low;
> smp_rmb();
> pte.pte_high = ptep->pte_high;
> smp_rmb();
> if (unlikely(pte.pte_low != ptep->pte_low)
> goto retry;
> return pte;
> }
>
>
I think this is still broken. Suppose that after reading pte_high
native_set_pte() is called again on another cpu, changing pte_low back
to the original value (but with a different pte_high). You now have
pte_low from second native_set_pte() but pte_high from the first
native_set_pte().
You could use cmpxchg8b to atomically load the pte, but at the expense
of taking the cacheline for exclusive access.
--
error compiling committee.c: too many arguments to function
WARNING: multiple messages have this Message-ID (diff)
From: Avi Kivity <avi@qumranet.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Nick Piggin <npiggin@suse.de>,
Andrew Morton <akpm@linux-foundation.org>,
shaggy@austin.ibm.com, axboe@kernel.dk, linux-mm@kvack.org,
linux-arch@vger.kernel.org, Clark Williams <williams@redhat.com>,
Ingo Molnar <mingo@elte.hu>
Subject: Re: [patch 2/2]: introduce fast_gup
Date: Mon, 21 Apr 2008 15:00:04 +0300 [thread overview]
Message-ID: <480C81C4.8030200@qumranet.com> (raw)
In-Reply-To: <alpine.LFD.1.00.0804170814090.2879@woody.linux-foundation.org>
Linus Torvalds wrote:
> Finally, I don't think that comment is correct in the first place. It's
> not that simple. The thing is, even *with* the memory barrier in place, we
> may have:
>
> CPU#1 CPU#2
> ===== =====
>
> fast_gup:
> - read low word
>
> native_set_pte_present:
> - set low word to 0
> - set high word to new value
>
> - read high word
>
> - set low word to new value
>
> and so you read a low word that is associated with a *different* high
> word! Notice?
>
> So trivial memory ordering is _not_ enough.
>
> So I think the code literally needs to be something like this
>
> #ifdef CONFIG_X86_PAE
>
> static inline pte_t native_get_pte(pte_t *ptep)
> {
> pte_t pte;
>
> retry:
> pte.pte_low = ptep->pte_low;
> smp_rmb();
> pte.pte_high = ptep->pte_high;
> smp_rmb();
> if (unlikely(pte.pte_low != ptep->pte_low)
> goto retry;
> return pte;
> }
>
>
I think this is still broken. Suppose that after reading pte_high
native_set_pte() is called again on another cpu, changing pte_low back
to the original value (but with a different pte_high). You now have
pte_low from second native_set_pte() but pte_high from the first
native_set_pte().
You could use cmpxchg8b to atomically load the pte, but at the expense
of taking the cacheline for exclusive access.
--
error compiling committee.c: too many arguments to function
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2008-04-21 12:00 UTC|newest]
Thread overview: 106+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-03-28 2:54 [patch 0/2]: lockless get_user_pages patchset Nick Piggin
2008-03-28 2:54 ` Nick Piggin
2008-03-28 2:54 ` Nick Piggin
[not found] ` <20080328025455.GA8083-B4tOwbsTzaBolqkO4TVVkw@public.gmane.org>
2008-03-28 2:55 ` [patch 1/2]: x86: implement pte_special Nick Piggin
2008-03-28 2:55 ` Nick Piggin
2008-03-28 2:55 ` Nick Piggin
[not found] ` <20080328025541.GB8083-B4tOwbsTzaBolqkO4TVVkw@public.gmane.org>
2008-03-28 3:23 ` David Miller
2008-03-28 3:23 ` David Miller, Nick Piggin
2008-03-28 3:23 ` David Miller
[not found] ` <20080327.202334.250213398.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
2008-03-28 3:31 ` Nick Piggin
2008-03-28 3:31 ` Nick Piggin
2008-03-28 3:31 ` Nick Piggin
[not found] ` <20080328033149.GD8083-B4tOwbsTzaBolqkO4TVVkw@public.gmane.org>
2008-03-28 3:44 ` David Miller
2008-03-28 3:44 ` David Miller, Nick Piggin
2008-03-28 3:44 ` David Miller
[not found] ` <20080327.204431.201380891.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
2008-03-28 4:04 ` Nick Piggin
2008-03-28 4:04 ` Nick Piggin
2008-03-28 4:04 ` Nick Piggin
[not found] ` <20080328040442.GE8083-B4tOwbsTzaBolqkO4TVVkw@public.gmane.org>
2008-03-28 4:09 ` David Miller
2008-03-28 4:09 ` David Miller, Nick Piggin
2008-03-28 4:09 ` David Miller
[not found] ` <20080327.210910.101408473.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
2008-03-28 4:15 ` Nick Piggin
2008-03-28 4:15 ` Nick Piggin
2008-03-28 4:15 ` Nick Piggin
[not found] ` <20080328041519.GF8083-B4tOwbsTzaBolqkO4TVVkw@public.gmane.org>
2008-03-28 4:16 ` David Miller
2008-03-28 4:16 ` David Miller, Nick Piggin
2008-03-28 4:16 ` David Miller
[not found] ` <20080327.211632.02770342.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
2008-03-28 4:19 ` Nick Piggin
2008-03-28 4:19 ` Nick Piggin
2008-03-28 4:19 ` Nick Piggin
2008-03-28 4:17 ` Nick Piggin
2008-03-28 4:17 ` Nick Piggin
2008-03-28 4:17 ` Nick Piggin
2008-03-28 3:00 ` [patch 2/2]: introduce fast_gup Nick Piggin
2008-03-28 3:00 ` Nick Piggin
2008-03-28 3:00 ` Nick Piggin
[not found] ` <20080328030023.GC8083-B4tOwbsTzaBolqkO4TVVkw@public.gmane.org>
2008-03-28 10:01 ` Jens Axboe
2008-03-28 10:01 ` Jens Axboe
2008-03-28 10:01 ` Jens Axboe
2008-04-17 15:03 ` Peter Zijlstra
2008-04-17 15:03 ` Peter Zijlstra
2008-04-17 15:03 ` Peter Zijlstra
2008-04-17 15:25 ` Linus Torvalds
2008-04-17 15:25 ` Linus Torvalds
2008-04-17 15:25 ` Linus Torvalds
[not found] ` <alpine.LFD.1.00.0804170814090.2879-5CScLwifNT1QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>
2008-04-17 16:12 ` Peter Zijlstra
2008-04-17 16:12 ` Peter Zijlstra
2008-04-17 16:12 ` Peter Zijlstra
2008-04-17 16:18 ` Linus Torvalds
2008-04-17 16:18 ` Linus Torvalds
2008-04-17 16:18 ` Linus Torvalds
[not found] ` <alpine.LFD.1.00.0804170916470.2879-5CScLwifNT1QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>
2008-04-17 16:35 ` Peter Zijlstra
2008-04-17 16:35 ` Peter Zijlstra
2008-04-17 16:35 ` Peter Zijlstra
2008-04-17 16:40 ` Linus Torvalds
2008-04-17 16:40 ` Linus Torvalds
2008-04-17 16:40 ` Linus Torvalds
[not found] ` <alpine.LFD.1.00.0804170940270.2879-5CScLwifNT1QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>
2008-04-17 17:23 ` Peter Zijlstra
2008-04-17 17:23 ` Peter Zijlstra
2008-04-17 17:23 ` Peter Zijlstra
2008-04-17 18:28 ` Linus Torvalds
2008-04-17 18:28 ` Linus Torvalds
2008-04-17 18:28 ` Linus Torvalds
[not found] ` <alpine.LFD.1.00.0804171127310.2879-5CScLwifNT1QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>
2008-04-22 3:14 ` Nick Piggin
2008-04-22 3:14 ` Nick Piggin
2008-04-22 3:14 ` Nick Piggin
2008-04-18 6:31 ` Geert Uytterhoeven
2008-04-18 6:31 ` Geert Uytterhoeven
2008-04-18 6:31 ` Geert Uytterhoeven
2008-04-18 14:40 ` Linus Torvalds
2008-04-18 14:40 ` Linus Torvalds
2008-04-18 14:40 ` Linus Torvalds
2008-04-18 9:58 ` Jeremy Fitzhardinge
2008-04-18 9:58 ` Jeremy Fitzhardinge
2008-04-18 9:58 ` Jeremy Fitzhardinge
2008-04-21 12:00 ` Avi Kivity [this message]
2008-04-21 12:00 ` Avi Kivity
2008-04-21 12:00 ` Avi Kivity
[not found] ` <480C81C4.8030200-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2008-04-21 12:30 ` Peter Zijlstra
2008-04-21 12:30 ` Peter Zijlstra
2008-04-21 12:30 ` Peter Zijlstra
2008-04-21 13:26 ` Avi Kivity
2008-04-21 13:26 ` Avi Kivity
2008-04-21 13:26 ` Avi Kivity
[not found] ` <480C9619.2050201-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2008-04-21 14:35 ` Peter Zijlstra
2008-04-21 14:35 ` Peter Zijlstra
2008-04-21 14:35 ` Peter Zijlstra
2008-04-22 3:23 ` Nick Piggin
2008-04-22 3:23 ` Nick Piggin
2008-04-22 3:23 ` Nick Piggin
[not found] ` <20080422032319.GB21993-B4tOwbsTzaBolqkO4TVVkw@public.gmane.org>
2008-04-22 7:19 ` Avi Kivity
2008-04-22 7:19 ` Avi Kivity
2008-04-22 7:19 ` Avi Kivity
2008-04-22 8:07 ` Ingo Molnar
2008-04-22 8:07 ` Ingo Molnar
2008-04-22 8:07 ` Ingo Molnar
2008-04-22 9:42 ` Peter Zijlstra
2008-04-22 9:42 ` Peter Zijlstra
2008-04-22 9:42 ` Peter Zijlstra
2008-04-22 9:46 ` Nick Piggin
2008-04-22 9:46 ` Nick Piggin
2008-04-22 9:46 ` Nick Piggin
2008-05-14 18:33 ` Dave Kleikamp
2008-05-14 18:33 ` Dave Kleikamp
2008-05-15 1:13 ` Nick Piggin
2008-05-15 1:13 ` Nick Piggin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=480C81C4.8030200@qumranet.com \
--to=avi-atkuwr5tajbwk0htik3j/w@public.gmane.org \
--cc=akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org \
--cc=axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org \
--cc=linux-arch-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org \
--cc=mingo-X9Un+BFzKDI@public.gmane.org \
--cc=npiggin-l3A5Bk7waGM@public.gmane.org \
--cc=peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org \
--cc=shaggy-V7BBcbaFuwjMbYB6QlFGEg@public.gmane.org \
--cc=torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org \
--cc=williams-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.