From: Zachary Amsden <zamsden@redhat.com>
To: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Glauber Costa <glommer@redhat.com>,
"H. Peter Anvin" <hpa@zytor.com>,
Thomas Gleixner <tglx@linutronix.de>, Avi Kivity <avi@redhat.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] x86: fix ordering constraints on crX read/writes
Date: Wed, 14 Jul 2010 14:28:39 -1000 [thread overview]
Message-ID: <4C3E5637.4010300@redhat.com> (raw)
In-Reply-To: <4C3E363B.7060804@goop.org>
On 07/14/2010 12:12 PM, Jeremy Fitzhardinge wrote:
> Change d3ca901f94b3299 introduces a new mechanism for sequencing
> accesses to the control registers using a variable to act as a
> dependency. (However, the patch description only says it is unifying
> parts of system.h and makes no mention of this new code.)
>
> This sequencing implementation is flawed in two ways:
> - Firstly, it gets the input/outputs for __force_order wrong on
> the asm statements. __force_order is a proxy for the control
> registers themselves, so a write_crX should write __force_order,
> and conversely for read_crX. The original code got this backwards,
> making write_crX read from __force_order, which in principle would
> allow gcc to eliminate a "redundant" write_crX.
>
> - Secondly, writes to control registers can have drastic
> side-effects on all memory operations (write to cr3 changes the
> current pagetable and redefines the meaning of all addresses,
> for example), so they must clobber "memory" in order to be
> ordered with respect to memory operations.
>
> We seem to have been saved by the implicit ordering that "asm volatile"
> gives us.
>
> Signed-off-by: Jeremy Fitzhardinge<jeremy.fitzhardinge@citrix.com>
> Cc: Glauber de Oliveira Costa<gcosta@redhat.com>
> Cc: Ingo Molnar<mingo@elte.hu>
> Cc: "H. Peter Anvin"<hpa@zytor.com>
> Cc: Thomas Gleixner<tglx@linutronix.de>
> Cc: Avi Kivity<avi@redhat.com>
> Cc: Zachary Amsden<zamsden@redhat.com>
>
> diff --git a/arch/x86/include/asm/system.h b/arch/x86/include/asm/system.h
> index e7f4d33..b782af2 100644
> --- a/arch/x86/include/asm/system.h
> +++ b/arch/x86/include/asm/system.h
> @@ -212,53 +212,68 @@ static inline void native_clts(void)
>
> /*
> * Volatile isn't enough to prevent the compiler from reordering the
> - * read/write functions for the control registers and messing everything up.
> - * A memory clobber would solve the problem, but would prevent reordering of
> - * all loads stores around it, which can hurt performance. Solution is to
> - * use a variable and mimic reads and writes to it to enforce serialization
> + * read/write functions for the control registers and messing
> + * everything up. A memory clobber would solve the problem, but would
> + * prevent reordering of all loads stores around it, which can hurt
> + * performance (however, control register writes can have drastic
> + * effects on memory accesses - like switching pagetables and thereby
> + * redefining what an address means - so they still need to clobber
> + * memory).
> + *
> + * Solution is to use a variable and mimic reads and writes to it to
> + * enforce serialization. __force_order acts as a proxy for the
> + * control registers, so a read_crX reads __force_order, and write_crX
> + * writes it (actually both reads and writes it to indicate that
> + * write-over-write can't be "optimised" away).
> + *
> + * This assumes there's no global optimisation between object files,
> + * so using a static per-file "__force_order" is OK. (In theory we
> + * don't need __force_order to be instantiated at all, since it is
> + * never actually read or written to. But gcc might decide to
> + * generate a reference to it anyway, so we need to keep it around.)
> */
> static unsigned long __force_order;
>
> static inline unsigned long native_read_cr0(void)
> {
> unsigned long val;
> - asm volatile("mov %%cr0,%0\n\t" : "=r" (val), "=m" (__force_order));
> + asm volatile("mov %%cr0,%0\n\t" : "=r" (val) : "m" (__force_order));
> return val;
> }
>
> static inline void native_write_cr0(unsigned long val)
> {
> - asm volatile("mov %0,%%cr0": : "r" (val), "m" (__force_order));
> + asm volatile("mov %1,%%cr0": "+m" (__force_order) : "r" (val) : "memory");
> }
>
> static inline unsigned long native_read_cr2(void)
> {
> unsigned long val;
> - asm volatile("mov %%cr2,%0\n\t" : "=r" (val), "=m" (__force_order));
> + asm volatile("mov %%cr2,%0\n\t" : "=r" (val) : "m" (__force_order));
> return val;
> }
>
> static inline void native_write_cr2(unsigned long val)
> {
> - asm volatile("mov %0,%%cr2": : "r" (val), "m" (__force_order));
> + asm volatile("mov %1,%%cr2": "+m" (__force_order) : "r" (val) : "memory");
> }
>
You don't need the memory clobber there. Technically, this should never
be used, however.
>
> static inline unsigned long native_read_cr3(void)
> {
> unsigned long val;
> - asm volatile("mov %%cr3,%0\n\t" : "=r" (val), "=m" (__force_order));
> + asm volatile("mov %%cr3,%0\n\t" : "=r" (val) : "m" (__force_order));
> return val;
> }
>
> static inline void native_write_cr3(unsigned long val)
> {
> - asm volatile("mov %0,%%cr3": : "r" (val), "m" (__force_order));
> + asm volatile("mov %1,%%cr3": "+m" (__force_order) : "r" (val) : "memory");
> }
>
> static inline unsigned long native_read_cr4(void)
> {
> unsigned long val;
> - asm volatile("mov %%cr4,%0\n\t" : "=r" (val), "=m" (__force_order));
> + asm volatile("mov %%cr4,%0\n\t" : "=r" (val) : "m" (__force_order));
> return val;
> }
>
> @@ -271,7 +286,7 @@ static inline unsigned long native_read_cr4_safe(void)
> asm volatile("1: mov %%cr4, %0\n"
> "2:\n"
> _ASM_EXTABLE(1b, 2b)
> - : "=r" (val), "=m" (__force_order) : "0" (0));
> + : "=r" (val) : "m" (__force_order), "0" (0));
> #else
> val = native_read_cr4();
> #endif
> @@ -280,7 +295,7 @@ static inline unsigned long native_read_cr4_safe(void)
>
> static inline void native_write_cr4(unsigned long val)
> {
> - asm volatile("mov %0,%%cr4": : "r" (val), "m" (__force_order));
> + asm volatile("mov %1,%%cr4": "+m" (__force_order) : "r" (val) : "memory");
> }
>
> #ifdef CONFIG_X86_64
>
>
>
Looks good. I really hope __force_order gets pruned however. Does it
actually?
next prev parent reply other threads:[~2010-07-15 0:28 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-07-14 22:12 [PATCH] x86: fix ordering constraints on crX read/writes Jeremy Fitzhardinge
2010-07-15 0:28 ` Zachary Amsden [this message]
2010-07-15 0:55 ` Jeremy Fitzhardinge
2010-07-15 1:00 ` Zachary Amsden
2010-07-15 1:29 ` H. Peter Anvin
2010-07-15 14:34 ` Jeremy Fitzhardinge
2010-07-15 18:54 ` H. Peter Anvin
2010-07-15 19:28 ` Jeremy Fitzhardinge
2010-07-15 19:36 ` H. Peter Anvin
2010-07-15 19:57 ` Jeremy Fitzhardinge
2010-07-15 7:07 ` Avi Kivity
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4C3E5637.4010300@redhat.com \
--to=zamsden@redhat.com \
--cc=avi@redhat.com \
--cc=glommer@redhat.com \
--cc=hpa@zytor.com \
--cc=jeremy@goop.org \
--cc=linux-kernel@vger.kernel.org \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.