* [PATCH] x86: fix ordering constraints on crX read/writes
@ 2010-07-14 22:12 Jeremy Fitzhardinge
2010-07-15 0:28 ` Zachary Amsden
0 siblings, 1 reply; 11+ messages in thread
From: Jeremy Fitzhardinge @ 2010-07-14 22:12 UTC (permalink / raw)
To: Glauber Costa
Cc: H. Peter Anvin, Thomas Gleixner, Avi Kivity, Zachary Amsden,
Linux Kernel Mailing List
Change d3ca901f94b3299 introduces a new mechanism for sequencing
accesses to the control registers using a variable to act as a
dependency. (However, the patch description only says it is unifying
parts of system.h and makes no mention of this new code.)
This sequencing implementation is flawed in two ways:
- Firstly, it gets the input/outputs for __force_order wrong on
the asm statements. __force_order is a proxy for the control
registers themselves, so a write_crX should write __force_order,
and conversely for read_crX. The original code got this backwards,
making write_crX read from __force_order, which in principle would
allow gcc to eliminate a "redundant" write_crX.
- Secondly, writes to control registers can have drastic
side-effects on all memory operations (write to cr3 changes the
current pagetable and redefines the meaning of all addresses,
for example), so they must clobber "memory" in order to be
ordered with respect to memory operations.
We seem to have been saved by the implicit ordering that "asm volatile"
gives us.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Glauber de Oliveira Costa <gcosta@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Avi Kivity <avi@redhat.com>
Cc: Zachary Amsden <zamsden@redhat.com>
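To make the fix concrete, here is the before/after constraint pattern
for one register (an illustrative sketch distilled from the diff below,
using cr3 as the example):

    /* Before: __force_order appears only as an input to the write asm,
     * so gcc sees nothing produced here that later reads/writes would
     * depend on.
     */
    asm volatile("mov %0,%%cr3" : : "r" (val), "m" (__force_order));

    /* After: the write also updates __force_order ("+m"), making each
     * write_crX visibly depend on the previous one, and the "memory"
     * clobber orders it against surrounding memory accesses.
     */
    asm volatile("mov %1,%%cr3" : "+m" (__force_order) : "r" (val) : "memory");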
diff --git a/arch/x86/include/asm/system.h b/arch/x86/include/asm/system.h
index e7f4d33..b782af2 100644
--- a/arch/x86/include/asm/system.h
+++ b/arch/x86/include/asm/system.h
@@ -212,53 +212,68 @@ static inline void native_clts(void)
/*
* Volatile isn't enough to prevent the compiler from reordering the
- * read/write functions for the control registers and messing everything up.
- * A memory clobber would solve the problem, but would prevent reordering of
- * all loads stores around it, which can hurt performance. Solution is to
- * use a variable and mimic reads and writes to it to enforce serialization
+ * read/write functions for the control registers and messing
+ * everything up. A memory clobber would solve the problem, but would
+ * prevent reordering of all loads stores around it, which can hurt
+ * performance (however, control register writes can have drastic
+ * effects on memory accesses - like switching pagetables and thereby
+ * redefining what an address means - so they still need to clobber
+ * memory).
+ *
+ * Solution is to use a variable and mimic reads and writes to it to
+ * enforce serialization. __force_order acts as a proxy for the
+ * control registers, so a read_crX reads __force_order, and write_crX
+ * writes it (actually both reads and writes it to indicate that
+ * write-over-write can't be "optimised" away).
+ *
+ * This assumes there's no global optimisation between object files,
+ * so using a static per-file "__force_order" is OK. (In theory we
+ * don't need __force_order to be instantiated at all, since it is
+ * never actually read or written to. But gcc might decide to
+ * generate a reference to it anyway, so we need to keep it around.)
*/
static unsigned long __force_order;
static inline unsigned long native_read_cr0(void)
{
unsigned long val;
- asm volatile("mov %%cr0,%0\n\t" : "=r" (val), "=m" (__force_order));
+ asm volatile("mov %%cr0,%0\n\t" : "=r" (val) : "m" (__force_order));
return val;
}
static inline void native_write_cr0(unsigned long val)
{
- asm volatile("mov %0,%%cr0": : "r" (val), "m" (__force_order));
+ asm volatile("mov %1,%%cr0": "+m" (__force_order) : "r" (val) : "memory");
}
static inline unsigned long native_read_cr2(void)
{
unsigned long val;
- asm volatile("mov %%cr2,%0\n\t" : "=r" (val), "=m" (__force_order));
+ asm volatile("mov %%cr2,%0\n\t" : "=r" (val) : "m" (__force_order));
return val;
}
static inline void native_write_cr2(unsigned long val)
{
- asm volatile("mov %0,%%cr2": : "r" (val), "m" (__force_order));
+ asm volatile("mov %1,%%cr2": "+m" (__force_order) : "r" (val) : "memory");
}
static inline unsigned long native_read_cr3(void)
{
unsigned long val;
- asm volatile("mov %%cr3,%0\n\t" : "=r" (val), "=m" (__force_order));
+ asm volatile("mov %%cr3,%0\n\t" : "=r" (val) : "m" (__force_order));
return val;
}
static inline void native_write_cr3(unsigned long val)
{
- asm volatile("mov %0,%%cr3": : "r" (val), "m" (__force_order));
+ asm volatile("mov %1,%%cr3": "+m" (__force_order) : "r" (val) : "memory");
}
static inline unsigned long native_read_cr4(void)
{
unsigned long val;
- asm volatile("mov %%cr4,%0\n\t" : "=r" (val), "=m" (__force_order));
+ asm volatile("mov %%cr4,%0\n\t" : "=r" (val) : "m" (__force_order));
return val;
}
@@ -271,7 +286,7 @@ static inline unsigned long native_read_cr4_safe(void)
asm volatile("1: mov %%cr4, %0\n"
"2:\n"
_ASM_EXTABLE(1b, 2b)
- : "=r" (val), "=m" (__force_order) : "0" (0));
+ : "=r" (val) : "m" (__force_order), "0" (0));
#else
val = native_read_cr4();
#endif
@@ -280,7 +295,7 @@ static inline unsigned long native_read_cr4_safe(void)
static inline void native_write_cr4(unsigned long val)
{
- asm volatile("mov %0,%%cr4": : "r" (val), "m" (__force_order));
+ asm volatile("mov %1,%%cr4": "+m" (__force_order) : "r" (val) : "memory");
}
#ifdef CONFIG_X86_64
* Re: [PATCH] x86: fix ordering constraints on crX read/writes
2010-07-14 22:12 [PATCH] x86: fix ordering constraints on crX read/writes Jeremy Fitzhardinge
@ 2010-07-15 0:28 ` Zachary Amsden
2010-07-15 0:55 ` Jeremy Fitzhardinge
2010-07-15 7:07 ` Avi Kivity
0 siblings, 2 replies; 11+ messages in thread
From: Zachary Amsden @ 2010-07-15 0:28 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: Glauber Costa, H. Peter Anvin, Thomas Gleixner, Avi Kivity,
Linux Kernel Mailing List
On 07/14/2010 12:12 PM, Jeremy Fitzhardinge wrote:
> Change d3ca901f94b3299 introduces a new mechanism for sequencing
> accesses to the control registers using a variable to act as a
> dependency. (However, the patch description only says it is unifying
> parts of system.h and makes no mention of this new code.)
>
> This sequencing implementation is flawed in two ways:
> - Firstly, it gets the input/outputs for __force_order wrong on
> the asm statements. __force_order is a proxy for the control
> registers themselves, so a write_crX should write __force_order,
> and conversely for read_crX. The original code got this backwards,
> making write_crX read from __force_order, which in principle would
> allow gcc to eliminate a "redundant" write_crX.
>
> - Secondly, writes to control registers can have drastic
> side-effects on all memory operations (write to cr3 changes the
> current pagetable and redefines the meaning of all addresses,
> for example), so they must clobber "memory" in order to be
> ordered with respect to memory operations.
>
> We seem to have been saved by the implicit ordering that "asm volatile"
> gives us.
>
> Signed-off-by: Jeremy Fitzhardinge<jeremy.fitzhardinge@citrix.com>
> Cc: Glauber de Oliveira Costa<gcosta@redhat.com>
> Cc: Ingo Molnar<mingo@elte.hu>
> Cc: "H. Peter Anvin"<hpa@zytor.com>
> Cc: Thomas Gleixner<tglx@linutronix.de>
> Cc: Avi Kivity<avi@redhat.com>
> Cc: Zachary Amsden<zamsden@redhat.com>
>
> diff --git a/arch/x86/include/asm/system.h b/arch/x86/include/asm/system.h
> index e7f4d33..b782af2 100644
> --- a/arch/x86/include/asm/system.h
> +++ b/arch/x86/include/asm/system.h
> @@ -212,53 +212,68 @@ static inline void native_clts(void)
>
> /*
> * Volatile isn't enough to prevent the compiler from reordering the
> - * read/write functions for the control registers and messing everything up.
> - * A memory clobber would solve the problem, but would prevent reordering of
> - * all loads stores around it, which can hurt performance. Solution is to
> - * use a variable and mimic reads and writes to it to enforce serialization
> + * read/write functions for the control registers and messing
> + * everything up. A memory clobber would solve the problem, but would
> + * prevent reordering of all loads stores around it, which can hurt
> + * performance (however, control register writes can have drastic
> + * effects on memory accesses - like switching pagetables and thereby
> + * redefining what an address means - so they still need to clobber
> + * memory).
> + *
> + * Solution is to use a variable and mimic reads and writes to it to
> + * enforce serialization. __force_order acts as a proxy for the
> + * control registers, so a read_crX reads __force_order, and write_crX
> + * writes it (actually both reads and writes it to indicate that
> + * write-over-write can't be "optimised" away).
> + *
> + * This assumes there's no global optimisation between object files,
> + * so using a static per-file "__force_order" is OK. (In theory we
> + * don't need __force_order to be instantiated at all, since it is
> + * never actually read or written to. But gcc might decide to
> + * generate a reference to it anyway, so we need to keep it around.)
> */
> static unsigned long __force_order;
>
> static inline unsigned long native_read_cr0(void)
> {
> unsigned long val;
> - asm volatile("mov %%cr0,%0\n\t" : "=r" (val), "=m" (__force_order));
> + asm volatile("mov %%cr0,%0\n\t" : "=r" (val) : "m" (__force_order));
> return val;
> }
>
> static inline void native_write_cr0(unsigned long val)
> {
> - asm volatile("mov %0,%%cr0": : "r" (val), "m" (__force_order));
> + asm volatile("mov %1,%%cr0": "+m" (__force_order) : "r" (val) : "memory");
> }
>
> static inline unsigned long native_read_cr2(void)
> {
> unsigned long val;
> - asm volatile("mov %%cr2,%0\n\t" : "=r" (val), "=m" (__force_order));
> + asm volatile("mov %%cr2,%0\n\t" : "=r" (val) : "m" (__force_order));
> return val;
> }
>
> static inline void native_write_cr2(unsigned long val)
> {
> - asm volatile("mov %0,%%cr2": : "r" (val), "m" (__force_order));
> + asm volatile("mov %1,%%cr2": "+m" (__force_order) : "r" (val) : "memory");
> }
>
You don't need the memory clobber there. Technically, this should never
be used, however.
>
> static inline unsigned long native_read_cr3(void)
> {
> unsigned long val;
> - asm volatile("mov %%cr3,%0\n\t" : "=r" (val), "=m" (__force_order));
> + asm volatile("mov %%cr3,%0\n\t" : "=r" (val) : "m" (__force_order));
> return val;
> }
>
> static inline void native_write_cr3(unsigned long val)
> {
> - asm volatile("mov %0,%%cr3": : "r" (val), "m" (__force_order));
> + asm volatile("mov %1,%%cr3": "+m" (__force_order) : "r" (val) : "memory");
> }
>
> static inline unsigned long native_read_cr4(void)
> {
> unsigned long val;
> - asm volatile("mov %%cr4,%0\n\t" : "=r" (val), "=m" (__force_order));
> + asm volatile("mov %%cr4,%0\n\t" : "=r" (val) : "m" (__force_order));
> return val;
> }
>
> @@ -271,7 +286,7 @@ static inline unsigned long native_read_cr4_safe(void)
> asm volatile("1: mov %%cr4, %0\n"
> "2:\n"
> _ASM_EXTABLE(1b, 2b)
> - : "=r" (val), "=m" (__force_order) : "0" (0));
> + : "=r" (val) : "m" (__force_order), "0" (0));
> #else
> val = native_read_cr4();
> #endif
> @@ -280,7 +295,7 @@ static inline unsigned long native_read_cr4_safe(void)
>
> static inline void native_write_cr4(unsigned long val)
> {
> - asm volatile("mov %0,%%cr4": : "r" (val), "m" (__force_order));
> + asm volatile("mov %1,%%cr4": "+m" (__force_order) : "r" (val) : "memory");
> }
>
> #ifdef CONFIG_X86_64
>
>
>
Looks good. I really hope __force_order gets pruned however. Does it
actually?
* Re: [PATCH] x86: fix ordering constraints on crX read/writes
2010-07-15 0:28 ` Zachary Amsden
@ 2010-07-15 0:55 ` Jeremy Fitzhardinge
2010-07-15 1:00 ` Zachary Amsden
2010-07-15 1:29 ` H. Peter Anvin
2010-07-15 7:07 ` Avi Kivity
1 sibling, 2 replies; 11+ messages in thread
From: Jeremy Fitzhardinge @ 2010-07-15 0:55 UTC (permalink / raw)
To: Zachary Amsden
Cc: Glauber Costa, H. Peter Anvin, Thomas Gleixner, Avi Kivity,
Linux Kernel Mailing List
On 07/14/2010 05:28 PM, Zachary Amsden wrote:
>
>> static inline void native_write_cr2(unsigned long val)
>> {
>> - asm volatile("mov %0,%%cr2": : "r" (val), "m" (__force_order));
>> + asm volatile("mov %1,%%cr2": "+m" (__force_order) : "r" (val) :
>> "memory");
>> }
>>
>
>
> You don't need the memory clobber there. Technically, this should
> never be used, however.
Yes. I just did it for consistency. Likewise, I didn't pore over the
manuals to work out whether writes to any crX could really have memory
side-effects.
>>
>> static inline unsigned long native_read_cr3(void)
>> {
>> unsigned long val;
>> - asm volatile("mov %%cr3,%0\n\t" : "=r" (val), "=m"
>> (__force_order));
>> + asm volatile("mov %%cr3,%0\n\t" : "=r" (val) : "m"
>> (__force_order));
>> return val;
>> }
>>
>> static inline void native_write_cr3(unsigned long val)
>> {
>> - asm volatile("mov %0,%%cr3": : "r" (val), "m" (__force_order));
>> + asm volatile("mov %1,%%cr3": "+m" (__force_order) : "r" (val) :
>> "memory");
>> }
>>
>> static inline unsigned long native_read_cr4(void)
>> {
>> unsigned long val;
>> - asm volatile("mov %%cr4,%0\n\t" : "=r" (val), "=m"
>> (__force_order));
>> + asm volatile("mov %%cr4,%0\n\t" : "=r" (val) : "m"
>> (__force_order));
>> return val;
>> }
>>
>> @@ -271,7 +286,7 @@ static inline unsigned long
>> native_read_cr4_safe(void)
>> asm volatile("1: mov %%cr4, %0\n"
>> "2:\n"
>> _ASM_EXTABLE(1b, 2b)
>> - : "=r" (val), "=m" (__force_order) : "0" (0));
>> + : "=r" (val) : "m" (__force_order), "0" (0));
>> #else
>> val = native_read_cr4();
>> #endif
>> @@ -280,7 +295,7 @@ static inline unsigned long
>> native_read_cr4_safe(void)
>>
>> static inline void native_write_cr4(unsigned long val)
>> {
>> - asm volatile("mov %0,%%cr4": : "r" (val), "m" (__force_order));
>> + asm volatile("mov %1,%%cr4": "+m" (__force_order) : "r" (val) :
>> "memory");
>> }
>>
>> #ifdef CONFIG_X86_64
>>
>>
>>
>
> Looks good. I really hope __force_order gets pruned however. Does it
> actually?
There's a couple of instances in my vmlinux. I didn't try to track them
back to specific .o files. gcc tends to generate references by putting
its address into a register and passing that into the asms.
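Something like this in the generated code (a hypothetical listing; the
exact output depends on gcc version and flags):

    lea __force_order(%rip), %rax   # address of the "m" operand materialized
    mov %cr0, %rdx                  # asm body, which never actually uses it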
J
* Re: [PATCH] x86: fix ordering constraints on crX read/writes
2010-07-15 0:55 ` Jeremy Fitzhardinge
@ 2010-07-15 1:00 ` Zachary Amsden
2010-07-15 1:29 ` H. Peter Anvin
1 sibling, 0 replies; 11+ messages in thread
From: Zachary Amsden @ 2010-07-15 1:00 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: Glauber Costa, H. Peter Anvin, Thomas Gleixner, Avi Kivity,
Linux Kernel Mailing List
On 07/14/2010 02:55 PM, Jeremy Fitzhardinge wrote:
> On 07/14/2010 05:28 PM, Zachary Amsden wrote:
>
>>
>>> static inline void native_write_cr2(unsigned long val)
>>> {
>>> - asm volatile("mov %0,%%cr2": : "r" (val), "m" (__force_order));
>>> + asm volatile("mov %1,%%cr2": "+m" (__force_order) : "r" (val) :
>>> "memory");
>>> }
>>>
>>>
>>
>> You don't need the memory clobber there. Technically, this should
>> never be used, however.
>>
> Yes. I just did it for consistency. Likewise, I didn't pore over the
> manuals to work out whether writes to any crX could really have memory
> side-effects.
>
cr0, cr3 and cr4 all can.
>>> static inline unsigned long native_read_cr3(void)
>>> {
>>> unsigned long val;
>>> - asm volatile("mov %%cr3,%0\n\t" : "=r" (val), "=m"
>>> (__force_order));
>>> + asm volatile("mov %%cr3,%0\n\t" : "=r" (val) : "m"
>>> (__force_order));
>>> return val;
>>> }
>>>
>>> static inline void native_write_cr3(unsigned long val)
>>> {
>>> - asm volatile("mov %0,%%cr3": : "r" (val), "m" (__force_order));
>>> + asm volatile("mov %1,%%cr3": "+m" (__force_order) : "r" (val) :
>>> "memory");
>>> }
>>>
>>> static inline unsigned long native_read_cr4(void)
>>> {
>>> unsigned long val;
>>> - asm volatile("mov %%cr4,%0\n\t" : "=r" (val), "=m"
>>> (__force_order));
>>> + asm volatile("mov %%cr4,%0\n\t" : "=r" (val) : "m"
>>> (__force_order));
>>> return val;
>>> }
>>>
>>> @@ -271,7 +286,7 @@ static inline unsigned long
>>> native_read_cr4_safe(void)
>>> asm volatile("1: mov %%cr4, %0\n"
>>> "2:\n"
>>> _ASM_EXTABLE(1b, 2b)
>>> - : "=r" (val), "=m" (__force_order) : "0" (0));
>>> + : "=r" (val) : "m" (__force_order), "0" (0));
>>> #else
>>> val = native_read_cr4();
>>> #endif
>>> @@ -280,7 +295,7 @@ static inline unsigned long
>>> native_read_cr4_safe(void)
>>>
>>> static inline void native_write_cr4(unsigned long val)
>>> {
>>> - asm volatile("mov %0,%%cr4": : "r" (val), "m" (__force_order));
>>> + asm volatile("mov %1,%%cr4": "+m" (__force_order) : "r" (val) :
>>> "memory");
>>> }
>>>
>>> #ifdef CONFIG_X86_64
>>>
>>>
>>>
>>>
>> Looks good. I really hope __force_order gets pruned however. Does it
>> actually?
>>
> There's a couple of instances in my vmlinux. I didn't try to track them
> back to specific .o files. gcc tends to generate references by putting
> its address into a register and passing that into the asms.
>
Can you make it extern so at least there's only one in the final bss?
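Something like this (a hypothetical sketch; the header/definition split
and the choice of .c file are illustrative, not what the patch does):

    /* arch/x86/include/asm/system.h */
    extern unsigned long __force_order;

    /* exactly one definition in some .c file, e.g. arch/x86/kernel/cpu/common.c */
    unsigned long __force_order;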
* Re: [PATCH] x86: fix ordering constraints on crX read/writes
2010-07-15 0:55 ` Jeremy Fitzhardinge
2010-07-15 1:00 ` Zachary Amsden
@ 2010-07-15 1:29 ` H. Peter Anvin
2010-07-15 14:34 ` Jeremy Fitzhardinge
1 sibling, 1 reply; 11+ messages in thread
From: H. Peter Anvin @ 2010-07-15 1:29 UTC (permalink / raw)
To: Jeremy Fitzhardinge, Zachary Amsden
Cc: Glauber Costa, Thomas Gleixner, Avi Kivity,
Linux Kernel Mailing List
Yes, it will definitely NOT be pruned. I'm going to file a gcc documentation request to see if any of this is actually needed, though. There may also be a need for gcc to handle *inbound* general memory constraints.
"Jeremy Fitzhardinge" <jeremy@goop.org> wrote:
>On 07/14/2010 05:28 PM, Zachary Amsden wrote:
>>
>>> static inline void native_write_cr2(unsigned long val)
>>> {
>>> - asm volatile("mov %0,%%cr2": : "r" (val), "m" (__force_order));
>>> + asm volatile("mov %1,%%cr2": "+m" (__force_order) : "r" (val) :
>>> "memory");
>>> }
>>>
>>
>>
>> You don't need the memory clobber there. Technically, this should
>> never be used, however.
>
>Yes. I just did it for consistency. Likewise, I didn't pore over the
>manuals to work out whether writes to any crX could really have memory
>side-effects.
>
>>>
>>> static inline unsigned long native_read_cr3(void)
>>> {
>>> unsigned long val;
>>> - asm volatile("mov %%cr3,%0\n\t" : "=r" (val), "=m"
>>> (__force_order));
>>> + asm volatile("mov %%cr3,%0\n\t" : "=r" (val) : "m"
>>> (__force_order));
>>> return val;
>>> }
>>>
>>> static inline void native_write_cr3(unsigned long val)
>>> {
>>> - asm volatile("mov %0,%%cr3": : "r" (val), "m" (__force_order));
>>> + asm volatile("mov %1,%%cr3": "+m" (__force_order) : "r" (val) :
>>> "memory");
>>> }
>>>
>>> static inline unsigned long native_read_cr4(void)
>>> {
>>> unsigned long val;
>>> - asm volatile("mov %%cr4,%0\n\t" : "=r" (val), "=m"
>>> (__force_order));
>>> + asm volatile("mov %%cr4,%0\n\t" : "=r" (val) : "m"
>>> (__force_order));
>>> return val;
>>> }
>>>
>>> @@ -271,7 +286,7 @@ static inline unsigned long
>>> native_read_cr4_safe(void)
>>> asm volatile("1: mov %%cr4, %0\n"
>>> "2:\n"
>>> _ASM_EXTABLE(1b, 2b)
>>> - : "=r" (val), "=m" (__force_order) : "0" (0));
>>> + : "=r" (val) : "m" (__force_order), "0" (0));
>>> #else
>>> val = native_read_cr4();
>>> #endif
>>> @@ -280,7 +295,7 @@ static inline unsigned long
>>> native_read_cr4_safe(void)
>>>
>>> static inline void native_write_cr4(unsigned long val)
>>> {
>>> - asm volatile("mov %0,%%cr4": : "r" (val), "m" (__force_order));
>>> + asm volatile("mov %1,%%cr4": "+m" (__force_order) : "r" (val) :
>>> "memory");
>>> }
>>>
>>> #ifdef CONFIG_X86_64
>>>
>>>
>>>
>>
>> Looks good. I really hope __force_order gets pruned however. Does it
>> actually?
>
>There's a couple of instances in my vmlinux. I didn't try to track them
>back to specific .o files. gcc tends to generate references by putting
>its address into a register and passing that into the asms.
>
> J
>
--
Sent from my mobile phone. Please pardon any lack of formatting.
* Re: [PATCH] x86: fix ordering constraints on crX read/writes
2010-07-15 1:29 ` H. Peter Anvin
@ 2010-07-15 14:34 ` Jeremy Fitzhardinge
2010-07-15 18:54 ` H. Peter Anvin
0 siblings, 1 reply; 11+ messages in thread
From: Jeremy Fitzhardinge @ 2010-07-15 14:34 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Zachary Amsden, Glauber Costa, Thomas Gleixner, Avi Kivity,
Linux Kernel Mailing List
On 07/14/2010 06:29 PM, H. Peter Anvin wrote:
> Yes, it will definitely NOT be pruned. I'm going to file a gcc documentation request to see if any of this is actually needed, though. There may also be a need for gcc to handle *inbound* general memory constraints.
>
You mean "depends on all prior memory updates"? We have been relying on
"memory" to do that (barrier(), for example), but it would be nice to
explicitly confirm that's OK, or get something which is guaranteed to be OK.
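For reference, barrier() is exactly this shape - an empty volatile asm
whose only stated effect is the "memory" clobber:

    #define barrier() __asm__ __volatile__("" : : : "memory")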
J
* Re: [PATCH] x86: fix ordering constraints on crX read/writes
2010-07-15 14:34 ` Jeremy Fitzhardinge
@ 2010-07-15 18:54 ` H. Peter Anvin
2010-07-15 19:28 ` Jeremy Fitzhardinge
0 siblings, 1 reply; 11+ messages in thread
From: H. Peter Anvin @ 2010-07-15 18:54 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: Zachary Amsden, Glauber Costa, Thomas Gleixner, Avi Kivity,
Linux Kernel Mailing List
On 07/15/2010 07:34 AM, Jeremy Fitzhardinge wrote:
> On 07/14/2010 06:29 PM, H. Peter Anvin wrote:
>> Yes, it will definitely NOT be pruned. I'm going to file a gcc documentation request to see if any of this is actually needed, though. There may also be a need for gcc to handle *inbound* general memory constraints.
>>
>
> You mean "depends on all prior memory updates"? We have been relying on
> "memory" to do that (barrier(), for example), but it would be nice to
> explicitly confirm that's OK, or get something which is guaranteed to be OK.
>
No, we haven't. You're misunderstanding what a "memory" clobber does.
A clobber affects the output side only, but doesn't inherently provide
ordering on the input side. Apparently this is implicit in "asm
volatile", which is a very important property.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
* Re: [PATCH] x86: fix ordering constraints on crX read/writes
2010-07-15 18:54 ` H. Peter Anvin
@ 2010-07-15 19:28 ` Jeremy Fitzhardinge
2010-07-15 19:36 ` H. Peter Anvin
0 siblings, 1 reply; 11+ messages in thread
From: Jeremy Fitzhardinge @ 2010-07-15 19:28 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Zachary Amsden, Glauber Costa, Thomas Gleixner, Avi Kivity,
Linux Kernel Mailing List
On 07/15/2010 11:54 AM, H. Peter Anvin wrote:
> No, we haven't. You're misunderstanding what a "memory" clobber does.
> A clobber affects the output side only, but doesn't inherently provide
> ordering on the input side. Apparently this is implicit in "asm
> volatile", which is a very important property.
OK. It would be nice to get that confirmed.
J
* Re: [PATCH] x86: fix ordering constraints on crX read/writes
2010-07-15 19:28 ` Jeremy Fitzhardinge
@ 2010-07-15 19:36 ` H. Peter Anvin
2010-07-15 19:57 ` Jeremy Fitzhardinge
0 siblings, 1 reply; 11+ messages in thread
From: H. Peter Anvin @ 2010-07-15 19:36 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: Zachary Amsden, Glauber Costa, Thomas Gleixner, Avi Kivity,
Linux Kernel Mailing List
On 07/15/2010 12:28 PM, Jeremy Fitzhardinge wrote:
> On 07/15/2010 11:54 AM, H. Peter Anvin wrote:
>> No, we haven't. You're misunderstanding what a "memory" clobber does.
>> A clobber affects the output side only, but doesn't inherently provide
>> ordering on the input side. Apparently this is implicit in "asm
>> volatile", which is a very important property.
>
> OK. It would be nice to get that confirmed.
>
The section in the docs (gcc 4.4.4 section 5.37) reads as:
If your assembler instructions access memory in an unpredictable
fashion, add `memory' to the list of clobbered registers. This will
cause GCC to not keep memory values cached in registers across the
assembler instruction and not optimize stores or loads to that memory.
You will also want to add the `volatile' keyword if the memory affected
is not listed in the inputs or outputs of the `asm', as the `memory'
clobber does not count as a side-effect of the `asm'. If you know how
large the accessed memory is, you can add it as input or output but if
this is not known, you should add `memory'.
This was clearer to me when I read it last evening, either because I was
tired and on an airplane ;) or because I read too much into it... it's
worth noting that an asm() in gcc is really nothing more than an
internal compiler event exposed to the user; the terms "output", "input"
and "clobber" have pretty specific meaning in compiler theory, and they
at least appear to be used that way.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
* Re: [PATCH] x86: fix ordering constraints on crX read/writes
2010-07-15 19:36 ` H. Peter Anvin
@ 2010-07-15 19:57 ` Jeremy Fitzhardinge
0 siblings, 0 replies; 11+ messages in thread
From: Jeremy Fitzhardinge @ 2010-07-15 19:57 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Zachary Amsden, Glauber Costa, Thomas Gleixner, Avi Kivity,
Linux Kernel Mailing List
On 07/15/2010 12:36 PM, H. Peter Anvin wrote:
> On 07/15/2010 12:28 PM, Jeremy Fitzhardinge wrote:
>
>> On 07/15/2010 11:54 AM, H. Peter Anvin wrote:
>>
>>> No, we haven't. You're misunderstanding what a "memory" clobber does.
>>> A clobber affects the output side only, but doesn't inherently provide
>>> ordering on the input side. Apparently this is implicit in "asm
>>> volatile", which is a very important property.
>>>
>> OK. It would be nice to get that confirmed.
>>
>>
> The section in the docs (gcc 4.4.4 section 5.37) reads as:
>
> If your assembler instructions access memory in an unpredictable
> fashion, add `memory' to the list of clobbered registers. This will
> cause GCC to not keep memory values cached in registers across the
> assembler instruction and not optimize stores or loads to that memory.
> You will also want to add the `volatile' keyword if the memory affected
> is not listed in the inputs or outputs of the `asm', as the `memory'
> clobber does not count as a side-effect of the `asm'. If you know how
> large the accessed memory is, you can add it as input or output but if
> this is not known, you should add `memory'.
>
> This was clearer to me when I read it last evening, either because I was
> tired and on an airplane ;) or because I read too much into it...
Yes, I think it reads fairly ambiguously. The first two and last
sentences definitely read as if "memory" says not only that all memory
could be modified by the asm, but also that it could be used as an
input by the asm, and therefore prevents two-way reordering of the asm
with respect to memory operations.
But I don't know how to parse the "volatile" sentence, I guess because
they're using the term "side-effect" in a *very* specific way which
means something other than "accessed in an unpredictable way". Maybe
the memory clobber means it doesn't cache things in registers, but the
most recent version of some memory contents may not be stored in its
"home" location? Or something to do with alias analysis?
> it's
> worth noting that an asm() in gcc is really nothing more than an
> internal compiler event exposed to the user; the terms "output", "input"
> and "clobber" have pretty specific meaning in compiler theory, and they
> at least appear to be used that way.
>
Yes, and it means they're stuck trying to support a compiler-internal
implementation as an external API. But it also means you end up having
to go to the source and rummage around in the machine-description (md)
files to work out what the syntax is, let alone what the more subtle
semantics are.
J
* Re: [PATCH] x86: fix ordering constraints on crX read/writes
2010-07-15 0:28 ` Zachary Amsden
2010-07-15 0:55 ` Jeremy Fitzhardinge
@ 2010-07-15 7:07 ` Avi Kivity
1 sibling, 0 replies; 11+ messages in thread
From: Avi Kivity @ 2010-07-15 7:07 UTC (permalink / raw)
To: Zachary Amsden
Cc: Jeremy Fitzhardinge, Glauber Costa, H. Peter Anvin,
Thomas Gleixner, Linux Kernel Mailing List
On 07/15/2010 03:28 AM, Zachary Amsden wrote:
>> static inline void native_write_cr2(unsigned long val)
>> {
>> - asm volatile("mov %0,%%cr2": : "r" (val), "m" (__force_order));
>> + asm volatile("mov %1,%%cr2": "+m" (__force_order) : "r" (val) :
>> "memory");
>> }
>
>
> You don't need the memory clobber there. Technically, this should
> never be used, however.
kvm writes cr2 in order to present the correct value to the guest. It
doesn't use native_write_cr2(), however.
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.