public inbox for linux-kernel@vger.kernel.org
* [PATCH] x86: do not allow to optimize flag_is_changeable_p()
@ 2008-09-29 18:06 Krzysztof Helt
  2008-09-29 18:17 ` H. Peter Anvin
  2008-09-30  6:14 ` Jeremy Fitzhardinge
  0 siblings, 2 replies; 7+ messages in thread
From: Krzysztof Helt @ 2008-09-29 18:06 UTC (permalink / raw)
  To: linux-kernel, Ingo Molnar; +Cc: Thomas Gleixner, H. Peter Anvin, Yinghai Lu

From: Krzysztof Helt <krzysztof.h1@wp.pl>

flag_is_changeable_p() is used by
have_cpuid_p(), which can return different results
at the two call sites in the code sequence below:

    if (!have_cpuid_p())
        identify_cpu_without_cpuid(c);

    /* cyrix could have cpuid enabled via c_identify() */
    if (!have_cpuid_p())
        return;

Without this patch, gcc 3.4.6 merges these two calls
into one, which breaks the code. Cyrix cpus have the
CPUID instruction enabled by the second call, but it
is not detected due to the gcc optimization. Thus the
ARR registers (mtrr-like) are not detected on such
a cpu.

Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
---

I have tested the 6x86MX cpu with the CPUID
disabled. I have used linux-next tree (20080819)
and Yinghai Lu's patch:

x86: identify_cpu_without_cpuid v2

http://marc.info/?l=linux-kernel&m=122138380004347&w=2

The patch below is required for the patch
above to work correctly.

diff -urp linux-orig/arch/x86/kernel/cpu/common.c linux-2.6.27/arch/x86/kernel/cpu/common.c
--- linux-orig/arch/x86/kernel/cpu/common.c	2008-09-29 07:11:54.000000000 +0200
+++ linux-2.6.27/arch/x86/kernel/cpu/common.c	2008-09-29 18:07:27.667392725 +0200
@@ -124,18 +124,18 @@ static inline int flag_is_changeable_p(u
 {
 	u32 f1, f2;
 
-	asm("pushfl\n\t"
-	    "pushfl\n\t"
-	    "popl %0\n\t"
-	    "movl %0,%1\n\t"
-	    "xorl %2,%0\n\t"
-	    "pushl %0\n\t"
-	    "popfl\n\t"
-	    "pushfl\n\t"
-	    "popl %0\n\t"
-	    "popfl\n\t"
-	    : "=&r" (f1), "=&r" (f2)
-	    : "ir" (flag));
+	asm volatile ("pushfl\n\t"
+		      "pushfl\n\t"
+		      "popl %0\n\t"
+		      "movl %0,%1\n\t"
+		      "xorl %2,%0\n\t"
+		      "pushl %0\n\t"
+		      "popfl\n\t"
+		      "pushfl\n\t"
+		      "popl %0\n\t"
+		      "popfl\n\t"
+		      : "=&r" (f1), "=&r" (f2)
+		      : "ir" (flag));
 
 	return ((f1^f2) & flag) != 0;
 }


* Re: [PATCH] x86: do not allow to optimize flag_is_changeable_p()
  2008-09-29 18:06 [PATCH] x86: do not allow to optimize flag_is_changeable_p() Krzysztof Helt
@ 2008-09-29 18:17 ` H. Peter Anvin
  2008-09-30  6:14 ` Jeremy Fitzhardinge
  1 sibling, 0 replies; 7+ messages in thread
From: H. Peter Anvin @ 2008-09-29 18:17 UTC (permalink / raw)
  To: Krzysztof Helt; +Cc: linux-kernel, Ingo Molnar, Thomas Gleixner, Yinghai Lu

Krzysztof Helt wrote:
> From: Krzysztof Helt <krzysztof.h1@wp.pl>
> 
> flag_is_changeable_p() is used by
> have_cpuid_p(), which can return different results
> at the two call sites in the code sequence below:
> 
>     if (!have_cpuid_p())
>         identify_cpu_without_cpuid(c);
> 
>     /* cyrix could have cpuid enabled via c_identify() */
>     if (!have_cpuid_p())
>         return;
> 
> Without this patch, gcc 3.4.6 merges these two calls
> into one, which breaks the code. Cyrix cpus have the
> CPUID instruction enabled by the second call, but it
> is not detected due to the gcc optimization. Thus the
> ARR registers (mtrr-like) are not detected on such
> a cpu.
> 

I suspect we should also out-of-line this function.  It's actually 
relatively sizable and certainly there is no point in inlining it.

	-hpa


* Re: [PATCH] x86: do not allow to optimize flag_is_changeable_p()
  2008-09-29 18:06 [PATCH] x86: do not allow to optimize flag_is_changeable_p() Krzysztof Helt
  2008-09-29 18:17 ` H. Peter Anvin
@ 2008-09-30  6:14 ` Jeremy Fitzhardinge
  2008-09-30  6:34   ` Yinghai Lu
  1 sibling, 1 reply; 7+ messages in thread
From: Jeremy Fitzhardinge @ 2008-09-30  6:14 UTC (permalink / raw)
  To: Krzysztof Helt
  Cc: linux-kernel, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Yinghai Lu

Krzysztof Helt wrote:
> From: Krzysztof Helt <krzysztof.h1@wp.pl>
>
> flag_is_changeable_p() is used by
> have_cpuid_p(), which can return different results
> at the two call sites in the code sequence below:
>
>     if (!have_cpuid_p())
>         identify_cpu_without_cpuid(c);
>
>     /* cyrix could have cpuid enabled via c_identify() */
>     if (!have_cpuid_p())
>         return;
>
> Without this patch, gcc 3.4.6 merges these two calls
> into one, which breaks the code. Cyrix cpus have the
> CPUID instruction enabled by the second call, but it
> is not detected due to the gcc optimization. Thus the
> ARR registers (mtrr-like) are not detected on such
> a cpu.
>   

If "asm volatile" changes the code and fixes the bug, it seems like
you're making use of an undocumented - or at least non-portable - behaviour.

Does adding a "memory" clobber also fix the problem?  That would have
better defined characteristics.

    J


* Re: [PATCH] x86: do not allow to optimize flag_is_changeable_p()
  2008-09-30  6:14 ` Jeremy Fitzhardinge
@ 2008-09-30  6:34   ` Yinghai Lu
  2008-09-30  6:54     ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 7+ messages in thread
From: Yinghai Lu @ 2008-09-30  6:34 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Krzysztof Helt, linux-kernel, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin

On Mon, Sep 29, 2008 at 11:14 PM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> Krzysztof Helt wrote:
>> From: Krzysztof Helt <krzysztof.h1@wp.pl>
>>
>> flag_is_changeable_p() is used by
>> have_cpuid_p(), which can return different results
>> at the two call sites in the code sequence below:
>>
>>     if (!have_cpuid_p())
>>         identify_cpu_without_cpuid(c);
>>
>>     /* cyrix could have cpuid enabled via c_identify() */
>>     if (!have_cpuid_p())
>>         return;
>>
>> Without this patch, gcc 3.4.6 merges these two calls
>> into one, which breaks the code. Cyrix cpus have the
>> CPUID instruction enabled by the second call, but it
>> is not detected due to the gcc optimization. Thus the
>> ARR registers (mtrr-like) are not detected on such
>> a cpu.
>>
>
> If "asm volatile" changes the code and fixes the bug, it seems like
> you're making use of an undocumented - or at least non-portable - behaviour.
>
> Does adding a "memory" clobber also fix the problem?  That would have
> better defined characteristics.
>

how about

        if (!have_cpuid_p()) {
                identify_cpu_without_cpuid(c);

                /* cyrix could have cpuid enabled via c_identify()*/
                if (!have_cpuid_p())
                        return;
        }


YH


* Re: [PATCH] x86: do not allow to optimize flag_is_changeable_p()
  2008-09-30  6:34   ` Yinghai Lu
@ 2008-09-30  6:54     ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 7+ messages in thread
From: Jeremy Fitzhardinge @ 2008-09-30  6:54 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Krzysztof Helt, linux-kernel, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin

Yinghai Lu wrote:
> On Mon, Sep 29, 2008 at 11:14 PM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>   
>> Krzysztof Helt wrote:
>>     
>>> From: Krzysztof Helt <krzysztof.h1@wp.pl>
>>>
>>> flag_is_changeable_p() is used by
>>> have_cpuid_p(), which can return different results
>>> at the two call sites in the code sequence below:
>>>
>>>     if (!have_cpuid_p())
>>>         identify_cpu_without_cpuid(c);
>>>
>>>     /* cyrix could have cpuid enabled via c_identify() */
>>>     if (!have_cpuid_p())
>>>         return;
>>>
>>> Without this patch, gcc 3.4.6 merges these two calls
>>> into one, which breaks the code. Cyrix cpus have the
>>> CPUID instruction enabled by the second call, but it
>>> is not detected due to the gcc optimization. Thus the
>>> ARR registers (mtrr-like) are not detected on such
>>> a cpu.
>>>
>>>       
>> If "asm volatile" changes the code and fixes the bug, it seems like
>> you're making use of an undocumented - or at least non-portable - behaviour.
>>
>> Does adding a "memory" clobber also fix the problem?  That would have
>> better defined characteristics.
>>
>>     
>
> how about
>
>         if (!have_cpuid_p()) {
>                 identify_cpu_without_cpuid(c);
>
>                 /* cyrix could have cpuid enabled via c_identify()*/
>                 if (!have_cpuid_p())
>                         return;
>         }
>   

That doesn't help, does it?  If gcc thinks it can get away with
evaluating have_cpuid_p() once, then that's the same as:

	if (!have_cpuid_p()) {
		identify_cpu_without_cpuid(c);

		return;
	}

even though identify_cpu_without_cpuid() can cause the cpu to suddenly
start supporting cpuid.
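
To make that concrete, here is a minimal userspace model in plain C (not
kernel code). The cpuid_enabled variable, the two identify_*() wrappers
and check_model() are hypothetical names invented for this sketch, and
the "merged" variant writes out by hand what the code effectively becomes
once the compiler assumes the second have_cpuid_p() call returns the same
value as the first:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the hidden cpu state (CPUID on or off). */
static bool cpuid_enabled;

/* Model of have_cpuid_p(): reports the current state. */
static bool have_cpuid_p(void)
{
	return cpuid_enabled;
}

/* Model of the Cyrix path: identification switches CPUID on. */
static void identify_cpu_without_cpuid(void)
{
	cpuid_enabled = true;
}

/* Intended flow: probe again after identification.
 * Returns true if the caller would go on to use CPUID. */
static bool identify_probing_twice(void)
{
	if (!have_cpuid_p()) {
		identify_cpu_without_cpuid();
		if (!have_cpuid_p())
			return false;
	}
	return true;
}

/* Same flow with the second probe merged into the first. */
static bool identify_probe_merged(void)
{
	bool has_cpuid = have_cpuid_p();	/* evaluated only once */

	if (!has_cpuid) {
		identify_cpu_without_cpuid();
		if (!has_cpuid)			/* stale result reused */
			return false;
	}
	return true;
}

/* Run both variants from the "CPUID disabled" starting state. */
static void check_model(void)
{
	cpuid_enabled = false;
	assert(identify_probing_twice());	/* second probe sees CPUID */

	cpuid_enabled = false;
	assert(!identify_probe_merged());	/* returns early anyway */
}
```

In the merged variant the inner test can never fail to fire, so the
function always returns early on a cpu that started with CPUID disabled,
even though CPUID is enabled by then.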

The trouble is that flag_is_changeable_p() doesn't have any obvious
global dependencies; it just takes a constant argument and returns a
result.   The asm() needs to be updated to have a "memory" clobber as
a stand-in for the more specific dependency on "cpu has switched into
cpuid-supporting state".

    J


* Re: [PATCH] x86: do not allow to optimize flag_is_changeable_p()
@ 2008-09-30  8:27 krzysztof.h1
  2008-09-30 15:23 ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 7+ messages in thread
From: krzysztof.h1 @ 2008-09-30  8:27 UTC (permalink / raw)
  To: Jeremy Fitzhardinge, Yinghai Lu
  Cc: Krzysztof Helt, linux-kernel@vger.kernel.org, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin

> Yinghai Lu wrote:
> > On Mon, Sep 29, 2008 at 11:14 PM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> >   
> >> Krzysztof Helt wrote:
> >>     
> >>> From: Krzysztof Helt <krzysztof.h1@wp.pl>
> >>>
> >>> flag_is_changeable_p() is used by
> >>> have_cpuid_p(), which can return different results
> >>> at the two call sites in the code sequence below:
> >>>
> >>>     if (!have_cpuid_p())
> >>>         identify_cpu_without_cpuid(c);
> >>>
> >>>     /* cyrix could have cpuid enabled via c_identify() */
> >>>     if (!have_cpuid_p())
> >>>         return;
> >>>
> >>> Without this patch, gcc 3.4.6 merges these two calls
> >>> into one, which breaks the code. Cyrix cpus have the
> >>> CPUID instruction enabled by the second call, but it
> >>> is not detected due to the gcc optimization. Thus the
> >>> ARR registers (mtrr-like) are not detected on such
> >>> a cpu.
> >>>
> >>>       
> >> If "asm volatile" changes the code and fixes the bug, it seems like
> >> you're making use of an undocumented - or at least non-portable - behaviour.
> >>

Why do you call it undocumented? This information can be found with "info gcc" in the Extended Asm section:

If your assembler instructions access memory in an unpredictable
fashion, add `memory' to the list of clobbered registers.  This will
cause GCC to not keep memory values cached in registers across the
assembler instruction and not optimize stores or loads to that memory.
You will also want to add the `volatile' keyword if the memory affected
is not listed in the inputs or outputs of the `asm', as the `memory'
clobber does not count as a side-effect of the `asm'.  If you know how
large the accessed memory is, you can add it as input or output but if
this is not known, you should add `memory'.

> >> Does adding a "memory" clobber also fix the problem?  That would have
> >> better defined characteristics.
> >>

A changeable flag bit is hardly a memory side effect. IMO, the volatile qualifier is better, as it says that each evaluation may give a different result even though the inputs are the same.

> 
> The trouble is that flag_is_changeable_p() doesn't have any obvious
> global dependencies; it just takes a constant argument and returns a
> result.   The asm() needs to be updated to have a "memory" clobber as
> a stand-in for the more specific dependency on "cpu has switched into
> cpuid-supporting state".
> 

See above about adding the memory clobber.

Kind regards,
Krzysztof


* Re: [PATCH] x86: do not allow to optimize flag_is_changeable_p()
  2008-09-30  8:27 krzysztof.h1
@ 2008-09-30 15:23 ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 7+ messages in thread
From: Jeremy Fitzhardinge @ 2008-09-30 15:23 UTC (permalink / raw)
  To: krzysztof.h1
  Cc: Yinghai Lu, linux-kernel@vger.kernel.org, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin

krzysztof.h1@poczta.fm wrote:
> If your assembler instructions access memory in an unpredictable
> fashion, add `memory' to the list of clobbered registers.  This will
> cause GCC to not keep memory values cached in registers across the
> assembler instruction and not optimize stores or loads to that memory.
> You will also want to add the `volatile' keyword if the memory affected
> is not listed in the inputs or outputs of the `asm', as the `memory'
> clobber does not count as a side-effect of the `asm'.  If you know how
> large the accessed memory is, you can add it as input or output but if
> this is not known, you should add `memory'.
>   

Yes, you're right.  The pertinent part of the manual is:

    The `volatile' keyword indicates that the instruction has important
    side-effects.  GCC will not delete a volatile `asm' if it is reachable.
    (The instruction can still be deleted if GCC can prove that
    control-flow will never reach the location of the instruction.)  Note
    that even a volatile `asm' instruction can be moved relative to other
    code, including across jump instructions.
      

I normally do my "asm volatile" rant when people try to use it to
enforce ordering, but in this case we just want gcc to not elide the
second use.

So, yes, I think your patch is fine as-is, but it would be worth adding
a comment on the asm (it's not necessarily obvious that the
cpuid-capability of a cpu can change).
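
Something along these lines, as a rough sketch only: the comment wording
is just a suggestion, and the u32 typedef, the X86_EFLAGS_ID define
(EFLAGS bit 21, which does not appear in the patch itself) and the
non-i386 stub are additions so that the snippet builds on its own outside
the kernel tree:

```c
#include <stdint.h>

typedef uint32_t u32;			/* kernel type, defined so the sketch stands alone */
#define X86_EFLAGS_ID	0x00200000	/* EFLAGS bit 21, toggled to detect CPUID */

#if defined(__i386__)
static inline int flag_is_changeable_p(u32 flag)
{
	u32 f1, f2;

	/*
	 * Some cpus (e.g. Cyrix) can have CPUID switched on at run
	 * time, so this probe may give different results before and
	 * after identify_cpu_without_cpuid().  "volatile" is there
	 * so that gcc does not elide a repeated execution of the asm.
	 */
	asm volatile ("pushfl\n\t"
		      "pushfl\n\t"
		      "popl %0\n\t"
		      "movl %0,%1\n\t"
		      "xorl %2,%0\n\t"
		      "pushl %0\n\t"
		      "popfl\n\t"
		      "pushfl\n\t"
		      "popl %0\n\t"
		      "popfl\n\t"
		      : "=&r" (f1), "=&r" (f2)
		      : "ir" (flag));

	return ((f1^f2) & flag) != 0;
}
#else
/* Not 32-bit x86: stub so the sketch still compiles; -1 means "not probed". */
static inline int flag_is_changeable_p(u32 flag)
{
	(void)flag;
	return -1;
}
#endif
```

The asm body is unchanged from the patch; only the comment is new.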

    J
