public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* SSE related security hole
@ 2002-04-17 14:51 Jan Hubicka
  2002-04-17 15:23 ` Jan Hubicka
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Hubicka @ 2002-04-17 14:51 UTC (permalink / raw)
  To: bugtraq, linux-kernel, Pavel Machek, jakub, aj, ak

[-- Attachment #1: Type: text/plain, Size: 1072 bytes --]

Hi,
while debugging GCC bugreport, I noticed following behaviour of simple
program with no syscalls:

hubicka@nikam:~$ ./a.out
sum of 7 ints: 28
hubicka@nikam:~$ ./a.out
sum of 7 ints: 56
Bad sum (seen with gcc -O -march=pentiumpro -msse)
hubicka@nikam:~$ ./a.out
sum of 7 ints: 84
Bad sum (seen with gcc -O -march=pentiumpro -msse)
hubicka@nikam:~$ ./a.out
sum of 7 ints: 112
Bad sum (seen with gcc -O -march=pentiumpro -msse)
hubicka@nikam:~$ echo

hubicka@nikam:~$ ./a.out
sum of 7 ints: 28


ie it always returns different value, moreover when something else
is run in meantime (verified by loading WWW page served by same machine),
the counter is reinitialized to 28.

I am attaching the source, but it needs to be compiled by cfg-branch GCC
with settings -O2 -march=pentium3 -mfpmath=sse, so I've placed static
binary to http://atrey.karlin.mff.cuni.cz/~hubicka/badsum.bin

The problem appears to be reproducible only on pentium3 and athlon4 systems,
not pentium4 system, where it appears to work as expected.  Reproduced on
both 2.4.9-RH and 2.4.16 kernels.

Honza

[-- Attachment #2: badsum.c --]
[-- Type: text/x-csrc, Size: 495 bytes --]

#include <stdlib.h>
#include <stdio.h>

#define t(x) asm("rdtsc":"=A"(x));
	char str[256],b;
int m(){
	long long a,b,c,d;
int i,n=7;	//choose any n < sqrt(1/FLT_EPSILON)
    float comp,sum=0;
	sin(1);
    for(i=1;i<=n;++i)
	sum += i;
    sprintf(str,"sum of %d ints: %g\n",n,sum);
    comp = n*(n*.5f+.5f);
    if(sum != comp){
	    b=1;
	}
    printf(str);
    if (b)
	printf("Bad sum (seen with gcc -O -march=pentiumpro -msse)\n");
    return 0;
}
main()
{
	/*sin(1);*/
	asm("finit");
	m();
}

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SSE related security hole
  2002-04-17 14:51 Jan Hubicka
@ 2002-04-17 15:23 ` Jan Hubicka
  2002-04-18 14:57   ` Denis Vlasenko
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Hubicka @ 2002-04-17 15:23 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: bugtraq, linux-kernel, Pavel Machek, jakub, aj, ak

[-- Attachment #1: Type: text/plain, Size: 302 bytes --]

Hi,
Jakub asked me to cleanup the source and post assembly file.  Here it comes.

#include <stdlib.h>
#include <stdio.h>

int
m ()
{
  int i, n = 7;	
  float comp, sum = 0;
  sin(1);
  for (i = 1; i <= n; ++i)
    sum += i;
  printf ("sum of %d ints: %g\n", n, sum);
  return 0;
}

main ()
{
  m ();
}

[-- Attachment #2: bad3.s --]
[-- Type: text/plain, Size: 810 bytes --]

	.file	"bad3.c"
	.section	.rodata
.LC1:
	.string	"sum of %d ints: %g\n"
	.text
	.align 2
	.p2align 4,,15
.globl m
	.type	m,@function
m:
	pushl	%ebp
	movl	%esp, %ebp
	pxor	%xmm1, %xmm1
	subl	$24, %esp
	movss	%xmm1, -4(%ebp)
	movl	$0, (%esp)
	movl	$1072693248, 4(%esp)
	call	sin
	fstp	%st(0)
	movl	$1, %eax
	.p2align 4,,15
.L6:
	cvtsi2ss	%eax, %xmm1
	incl	%eax
	cmpl	$7, %eax
	addss	-4(%ebp), %xmm1
	movss	%xmm1, -4(%ebp)
	jle	.L6
	flds	-4(%ebp)
	movl	$.LC1, (%esp)
	movl	$7, 4(%esp)
	fstpl	8(%esp)
	call	printf
	leave
	xorl	%eax, %eax
	ret
.Lfe1:
	.size	m,.Lfe1-m
	.align 2
	.p2align 4,,15
.globl main
	.type	main,@function
main:
	pushl	%ebp
	movl	%esp, %ebp
	subl	$8, %esp
	andl	$-16, %esp
	call	m
	movl	%ebp, %esp
	popl	%ebp
	ret
.Lfe2:
	.size	main,.Lfe2-main
	.ident	"GCC: (GNU) 3.2 20020415 (experimental)"

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SSE related security hole
@ 2002-04-17 23:42 Doug Ledford
  2002-04-18  5:26 ` Andrea Arcangeli
  2002-04-18  8:22 ` Andi Kleen
  0 siblings, 2 replies; 30+ messages in thread
From: Doug Ledford @ 2002-04-17 23:42 UTC (permalink / raw)
  To: jh; +Cc: linux-kernel, jakub, aj, ak, pavel

Subject: Re: SSE related security hole
Reply-To: 
In-Reply-To: <200204171513.g3HFD3n04056@fenrus.demon.nl>; from arjan@fenrus.demon.nl on Wed, Apr 17, 2002 at 04:13:03PM +0100

(NOTE: This may have already been answered by someone else, but I haven't 
seen it if it has, so I'm sending this through)

> Hi,
> while debugging GCC bugreport, I noticed following behaviour of simple
> program with no syscalls:
> 
> hubicka@nikam:~$ ./a.out
> sum of 7 ints: 28
> hubicka@nikam:~$ ./a.out
> sum of 7 ints: 56
> Bad sum (seen with gcc -O -march=pentiumpro -msse)
> hubicka@nikam:~$ ./a.out
> sum of 7 ints: 84
> Bad sum (seen with gcc -O -march=pentiumpro -msse)
> hubicka@nikam:~$ ./a.out
> sum of 7 ints: 112
> Bad sum (seen with gcc -O -march=pentiumpro -msse)
> hubicka@nikam:~$ echo
> 
> hubicka@nikam:~$ ./a.out
> sum of 7 ints: 28
> 
> 
> ie it always returns different value, moreover when something else
> is run in meantime (verified by loading WWW page served by same machine),
> the counter is reinitialized to 28.
> 
> I am attaching the source, but it needs to be compiled by cfg-branch GCC
> with settings -O2 -march=pentium3 -mfpmath=sse, so I've placed static
> binary to http://atrey.karlin.mff.cuni.cz/~hubicka/badsum.bin

Compiling the asm source with a different compiler will also make it fail.

> The problem appears to be reproducible only on pentium3 and athlon4 systems,
> not pentium4 system, where it appears to work as expected.  Reproduced on
> both 2.4.9-RH and 2.4.16 kernels.

[ program snipped ]

So there are two different issues at play here.  First, the kernel uses
the fninit instruction to initialize the fpu on first use.  Nothing in the
Intel manuals says anything about the fninit instruction clearing the mmx
or sse registers, and experimentally we now know for sure that it doesn't.  
That means that when the first time your program was ran it left 28 in
register xmm1.  The next time the program was run, the fninit did nothing
to clear register xmm1 so it still held 28.  Now, the pxor instruction
that is part of the m() function and intended to 0 out the xmm1 register
is an sse2 instruction.  It just so happens that it doesn't work on sse
only processors such as P3 CPUs.  So, when 28 was left in xmm1, then the
pxor failed to 0 out xmm1, we saved 28 as the starting value for the loop
and then looped through 7 additions until we had, you guessed it, 56.  In
fact, if you do a while :;do bad; done loop the it will increment by 28
each time it is run except when something else intervenes.  Replacing the
pxor instruction with xorps instead makes it work.  So, that's a bug in
gcc I suspect, using sse2 instructions when only called to use sse
instructions.  It seems odd to me that the CPU wouldn't generate an
illegal instruction exception, but oh well, it evidently doesn't.

So, we really should change arch/i386/kernel/i387.c something like this:

(WARNING, totally untested and not even compile checked change follows)

--- i387.c.save	Wed Apr 17 19:22:47 2002
+++ i387.c	Wed Apr 17 19:28:27 2002
@@ -33,8 +33,26 @@
 void init_fpu(void)
 {
 	__asm__("fninit");
-	if ( cpu_has_xmm )
+	if ( cpu_has_mmx )
+		asm volatile("xorq %%mm0, %%mm0;
+			      xorq %%mm1, %%mm1;
+			      xorq %%mm2, %%mm2;
+			      xorq %%mm3, %%mm3;
+			      xorq %%mm4, %%mm4;
+			      xorq %%mm5, %%mm5;
+			      xorq %%mm6, %%mm6;
+			      xorq %%mm7, %%mm7");
+	if ( cpu_has_xmm ) {
+		asm volatile("xorps %%xmm0, %%xmm0;
+			      xorps %%xmm1, %%xmm1;
+			      xorps %%xmm2, %%xmm2;
+			      xorps %%xmm3, %%xmm3;
+			      xorps %%xmm4, %%xmm4;
+			      xorps %%xmm5, %%xmm5;
+			      xorps %%xmm6, %%xmm6;
+			      xorps %%xmm7, %%xmm7");
 		load_mxcsr(0x1f80);
+	}
 		
 	current->used_math = 1;
 }


The rest of the problem is a gcc bug and possibly something that Intel 
should make a note of on the p3 processors (that is, that the p3 will 
silently fail to execute some sse2 instructions without generating the 
expected exception).

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606
  

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SSE related security hole
  2002-04-17 23:42 Doug Ledford
@ 2002-04-18  5:26 ` Andrea Arcangeli
  2002-04-18  9:10   ` Arjan van de Ven
                     ` (2 more replies)
  2002-04-18  8:22 ` Andi Kleen
  1 sibling, 3 replies; 30+ messages in thread
From: Andrea Arcangeli @ 2002-04-18  5:26 UTC (permalink / raw)
  To: Doug Ledford, jh, linux-kernel, jakub, aj, ak, pavel

On Wed, Apr 17, 2002 at 07:42:49PM -0400, Doug Ledford wrote:
> --- i387.c.save	Wed Apr 17 19:22:47 2002
> +++ i387.c	Wed Apr 17 19:28:27 2002
> @@ -33,8 +33,26 @@
>  void init_fpu(void)
>  {
>  	__asm__("fninit");
> -	if ( cpu_has_xmm )
> +	if ( cpu_has_mmx )
> +		asm volatile("xorq %%mm0, %%mm0;
> +			      xorq %%mm1, %%mm1;
> +			      xorq %%mm2, %%mm2;
> +			      xorq %%mm3, %%mm3;
> +			      xorq %%mm4, %%mm4;
> +			      xorq %%mm5, %%mm5;
> +			      xorq %%mm6, %%mm6;
> +			      xorq %%mm7, %%mm7");

This mean the mmx isn't really backwards compatible and that's
potentially a problem for all the legacy x86 multiuser operative
systems.  That's an hardware design bug, not a software problem.  In
short running a 2.[02] kernel on a MMX capable CPU isn't secure, the
same potentially applies to windows NT and other unix, no matter of SSE.

I verified with this simple proggy:

main()
{
	long long x = 2;
	long long z = 3;

	asm volatile("movq %0, %%mm0":: "m" (x));
	asm volatile("fninit");
	asm volatile("movq %%mm0, %0": "=m" (z):);

	printf("%d\n", z);
}

it prints 2 here, while it should print zero or at least random to be
backwards compatible.

SSE was a completly different issue, that is a software bug.  SSE is
disabled by non aware OS, and so if we enable it we also must take care
of clearing it at the first math fault.

> +	if ( cpu_has_xmm ) {
> +		asm volatile("xorps %%xmm0, %%xmm0;
> +			      xorps %%xmm1, %%xmm1;
> +			      xorps %%xmm2, %%xmm2;
> +			      xorps %%xmm3, %%xmm3;
> +			      xorps %%xmm4, %%xmm4;
> +			      xorps %%xmm5, %%xmm5;
> +			      xorps %%xmm6, %%xmm6;
> +			      xorps %%xmm7, %%xmm7");

The patch has a couple of problems. xorq doesn't exists. Since there are
no params you should also drop one %. Also I think we need an emms after
the mmx operations to remain binary compatible with the x86 ABI.

How does this look?

--- 2.4.19pre7aa1/arch/i386/kernel/i387.c.~1~	Thu Apr 18 05:23:12 2002
+++ 2.4.19pre7aa1/arch/i386/kernel/i387.c	Thu Apr 18 07:20:26 2002
@@ -33,8 +33,28 @@
 void init_fpu(void)
 {
 	__asm__("fninit");
-	if ( cpu_has_xmm )
+	if (cpu_has_mmx) {
+		asm volatile("pxor %mm0, %mm0\n\t"
+			     "movq %mm0, %mm1\n\t"
+			     "movq %mm0, %mm2\n\t"
+			     "movq %mm0, %mm3\n\t"
+			     "movq %mm0, %mm4\n\t"
+			     "movq %mm0, %mm5\n\t"
+			     "movq %mm0, %mm6\n\t"
+			     "movq %mm0, %mm7\n\t"
+			     "emms\n");
+	}
+	if ( cpu_has_xmm ) {
+		asm volatile("xorps %xmm0, %xmm0\n\t"
+			     "xorps %xmm1, %xmm1\n\t"
+			     "xorps %xmm2, %xmm2\n\t"
+			     "xorps %xmm3, %xmm3\n\t"
+			     "xorps %xmm4, %xmm4\n\t"
+			     "xorps %xmm5, %xmm5\n\t"
+			     "xorps %xmm6, %xmm6\n\t"
+			     "xorps %xmm7, %xmm7\n");
 		load_mxcsr(0x1f80);
+	}
 		
 	current->used_math = 1;
 }

Andrea

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SSE related security hole
  2002-04-17 23:42 Doug Ledford
  2002-04-18  5:26 ` Andrea Arcangeli
@ 2002-04-18  8:22 ` Andi Kleen
  1 sibling, 0 replies; 30+ messages in thread
From: Andi Kleen @ 2002-04-18  8:22 UTC (permalink / raw)
  To: Doug Ledford, jh, linux-kernel, jakub, aj, ak, pavel

> --- i387.c.save	Wed Apr 17 19:22:47 2002
> +++ i387.c	Wed Apr 17 19:28:27 2002
> @@ -33,8 +33,26 @@
>  void init_fpu(void)
>  {
>  	__asm__("fninit");
> -	if ( cpu_has_xmm )
> +	if ( cpu_has_mmx )

Shouldn't that be cpu_has_mmx && !cpu_has_sse2  ?


-Andi

> +		asm volatile("xorq %%mm0, %%mm0;
> +			      xorq %%mm1, %%mm1;
> +			      xorq %%mm2, %%mm2;


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SSE related security hole
  2002-04-18  5:26 ` Andrea Arcangeli
@ 2002-04-18  9:10   ` Arjan van de Ven
  2002-04-18 11:18   ` Alan Cox
  2002-04-18 13:44   ` Doug Ledford
  2 siblings, 0 replies; 30+ messages in thread
From: Arjan van de Ven @ 2002-04-18  9:10 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel

Andrea Arcangeli wrote:
> 
> On Wed, Apr 17, 2002 at 07:42:49PM -0400, Doug Ledford wrote:
> > --- i387.c.save       Wed Apr 17 19:22:47 2002
> > +++ i387.c    Wed Apr 17 19:28:27 2002
> > @@ -33,8 +33,26 @@
> >  void init_fpu(void)
> >  {
> >       __asm__("fninit");
> > -     if ( cpu_has_xmm )
> > +     if ( cpu_has_mmx )
> > +             asm volatile("xorq %%mm0, %%mm0;
> > +                           xorq %%mm1, %%mm1;
> > +                           xorq %%mm2, %%mm2;
> > +                           xorq %%mm3, %%mm3;
> > +                           xorq %%mm4, %%mm4;
> > +                           xorq %%mm5, %%mm5;
> > +                           xorq %%mm6, %%mm6;
> > +                           xorq %%mm7, %%mm7");
> 
> This mean the mmx isn't really backwards compatible and that's
> potentially a problem for all the legacy x86 multiuser operative
> systems.  That's an hardware design bug, not a software problem.  In
> short running a 2.[02] kernel on a MMX capable CPU isn't secure, the
> same potentially applies to windows NT and other unix, no matter of SSE.
> 
> I verified with this simple proggy:
> 
> main()
> {
>         long long x = 2;
>         long long z = 3;
> 
>         asm volatile("movq %0, %%mm0":: "m" (x));
>         asm volatile("fninit");
>         asm volatile("movq %%mm0, %0": "=m" (z):);
> 
>         printf("%d\n", z);
> }
> 
> it prints 2 here, while it should print zero or at least random to be
> backwards compatible.
> 
> SSE was a completly different issue, that is a software bug.  SSE is
> disabled by non aware OS, and so if we enable it we also must take care
> of clearing it at the first math fault.
> 
> > +     if ( cpu_has_xmm ) {
> > +             asm volatile("xorps %%xmm0, %%xmm0;
> > +                           xorps %%xmm1, %%xmm1;
> > +                           xorps %%xmm2, %%xmm2;
> > +                           xorps %%xmm3, %%xmm3;
> > +                           xorps %%xmm4, %%xmm4;
> > +                           xorps %%xmm5, %%xmm5;
> > +                           xorps %%xmm6, %%xmm6;
> > +                           xorps %%xmm7, %%xmm7");
> 
> The patch has a couple of problems. xorq doesn't exists. Since there are
> no params you should also drop one %. Also I think we need an emms after
> the mmx operations to remain binary compatible with the x86 ABI.
> 
> How does this look?
> 
> --- 2.4.19pre7aa1/arch/i386/kernel/i387.c.~1~   Thu Apr 18 05:23:12 2002
> +++ 2.4.19pre7aa1/arch/i386/kernel/i387.c       Thu Apr 18 07:20:26 2002
> @@ -33,8 +33,28 @@
>  void init_fpu(void)
>  {
>         __asm__("fninit");
> -       if ( cpu_has_xmm )
> +       if (cpu_has_mmx) {
> +               asm volatile("pxor %mm0, %mm0\n\t"
> +                            "movq %mm0, %mm1\n\t"
> +                            "movq %mm0, %mm2\n\t"
> +                            "movq %mm0, %mm3\n\t"
> +                            "movq %mm0, %mm4\n\t"
> +                            "movq %mm0, %mm5\n\t"
> +                            "movq %mm0, %mm6\n\t"
> +                            "movq %mm0, %mm7\n\t"
> +                            "emms\n");
> +       }
> +       if ( cpu_has_xmm ) {
> +               asm volatile("xorps %xmm0, %xmm0\n\t"
> +                            "xorps %xmm1, %xmm1\n\t"
> +                            "xorps %xmm2, %xmm2\n\t"
> +                            "xorps %xmm3, %xmm3\n\t"
> +                            "xorps %xmm4, %xmm4\n\t"
> +                            "xorps %xmm5, %xmm5\n\t"
> +                            "xorps %xmm6, %xmm6\n\t"
> +                            "xorps %xmm7, %xmm7\n");
>                 load_mxcsr(0x1f80);
> +       }
> 
>         current->used_math = 1;
>  }

Looks good; I just did the same thing ;)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SSE related security hole
  2002-04-18 11:18   ` Alan Cox
@ 2002-04-18 11:14     ` Andi Kleen
  2002-04-18 11:53       ` Alan Cox
  0 siblings, 1 reply; 30+ messages in thread
From: Andi Kleen @ 2002-04-18 11:14 UTC (permalink / raw)
  To: Alan Cox
  Cc: Andrea Arcangeli, Doug Ledford, jh, linux-kernel, jakub, aj, ak,
	pavel

On Thu, Apr 18, 2002 at 12:18:34PM +0100, Alan Cox wrote:
> > This mean the mmx isn't really backwards compatible and that's
> > potentially a problem for all the legacy x86 multiuser operative
> > systems.  That's an hardware design bug, not a software problem.  In
> > short running a 2.[02] kernel on a MMX capable CPU isn't secure, the
> > same potentially applies to windows NT and other unix, no matter of SSE.
> 
> That was my initial reaction but when I reread the documentation the 
> Intel folks are actually saying even back in Pentium MMX days that it isnt
> guaranteed that the FP/MMX state are not seperate registers

In this case it would be possible to only do the explicit clear
when the CPU does support sse1. For mmx only it shouldn't be needed.
For sse2 also not.


-Andi

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SSE related security hole
  2002-04-18  5:26 ` Andrea Arcangeli
  2002-04-18  9:10   ` Arjan van de Ven
@ 2002-04-18 11:18   ` Alan Cox
  2002-04-18 11:14     ` Andi Kleen
  2002-04-18 13:44   ` Doug Ledford
  2 siblings, 1 reply; 30+ messages in thread
From: Alan Cox @ 2002-04-18 11:18 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Doug Ledford, jh, linux-kernel, jakub, aj, ak, pavel

> This mean the mmx isn't really backwards compatible and that's
> potentially a problem for all the legacy x86 multiuser operative
> systems.  That's an hardware design bug, not a software problem.  In
> short running a 2.[02] kernel on a MMX capable CPU isn't secure, the
> same potentially applies to windows NT and other unix, no matter of SSE.

That was my initial reaction but when I reread the documentation the 
Intel folks are actually saying even back in Pentium MMX days that it isnt
guaranteed that the FP/MMX state are not seperate registers

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SSE related security hole
  2002-04-18 11:53       ` Alan Cox
@ 2002-04-18 11:46         ` Andi Kleen
  2002-04-18 11:55         ` Andi Kleen
  1 sibling, 0 replies; 30+ messages in thread
From: Andi Kleen @ 2002-04-18 11:46 UTC (permalink / raw)
  To: Alan Cox
  Cc: Andi Kleen, Andrea Arcangeli, Doug Ledford, jh, linux-kernel,
	jakub, aj, pavel

On Thu, Apr 18, 2002 at 12:53:12PM +0100, Alan Cox wrote:
> > > Intel folks are actually saying even back in Pentium MMX days that it isnt
> > > guaranteed that the FP/MMX state are not seperate registers
> > 
> > In this case it would be possible to only do the explicit clear
> > when the CPU does support sse1. For mmx only it shouldn't be needed.
> > For sse2 also not.
> 
> Do you have a documentation cite for that claim ?

Never mind. It was a bogus suggestion and fninit indeed also doesn't clear
XMM on P4 and other SSE2 implementation.

-Andi


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SSE related security hole
  2002-04-18 11:14     ` Andi Kleen
@ 2002-04-18 11:53       ` Alan Cox
  2002-04-18 11:46         ` Andi Kleen
  2002-04-18 11:55         ` Andi Kleen
  0 siblings, 2 replies; 30+ messages in thread
From: Alan Cox @ 2002-04-18 11:53 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Alan Cox, Andrea Arcangeli, Doug Ledford, jh, linux-kernel, jakub,
	aj, ak, pavel

> > Intel folks are actually saying even back in Pentium MMX days that it isnt
> > guaranteed that the FP/MMX state are not seperate registers
> 
> In this case it would be possible to only do the explicit clear
> when the CPU does support sse1. For mmx only it shouldn't be needed.
> For sse2 also not.

Do you have a documentation cite for that claim ?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SSE related security hole
  2002-04-18 11:53       ` Alan Cox
  2002-04-18 11:46         ` Andi Kleen
@ 2002-04-18 11:55         ` Andi Kleen
  1 sibling, 0 replies; 30+ messages in thread
From: Andi Kleen @ 2002-04-18 11:55 UTC (permalink / raw)
  To: Alan Cox
  Cc: Andi Kleen, Andrea Arcangeli, Doug Ledford, jh, linux-kernel,
	jakub, aj, pavel

On Thu, Apr 18, 2002 at 12:53:12PM +0100, Alan Cox wrote:
> > > Intel folks are actually saying even back in Pentium MMX days that it isnt
> > > guaranteed that the FP/MMX state are not seperate registers
> > 
> > In this case it would be possible to only do the explicit clear
> > when the CPU does support sse1. For mmx only it shouldn't be needed.
> > For sse2 also not.
> 
> Do you have a documentation cite for that claim ?

Actually I did some more tests: 

test program

main()
{
	unsigned int i[4], o[4]; 

	i[0] = 1; i[1] = 2; i[2] = 3; i[3] = 4;
	asm("movups %1,%%xmm1 ; fninit ; movups %%xmm1,%0" : "=m" (o) : "m" (i)); 
	printf("%x %x %x %x\n",o[0],o[1],o[2],o[3]);

	asm("movups %1,%%xmm1 ; movups %%xmm1,%0" : "=m" (o) : "m" (i)); 
	printf("%x %x %x %x\n",o[0],o[1],o[2],o[3]);
}

Result on a pentium4: 

./xmm
bffff68c 8048431 8049640 8049660
bffff68c bffff68c bffff68c 8048431

So fninit seems to change something in XMM1. 

and pentium 3: 

bffff81c 8048431 8049640 8049660
bffff81c bffff81c bffff81c 8048431

changes something different ? 

If even Intel cannot agree on this it is probably safest to do an explicit
zeroing like Andrea's patch does. I retract the origina suggestion.

-Andi

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SSE related security hole
  2002-04-18  5:26 ` Andrea Arcangeli
  2002-04-18  9:10   ` Arjan van de Ven
  2002-04-18 11:18   ` Alan Cox
@ 2002-04-18 13:44   ` Doug Ledford
  2002-04-18 19:20     ` Pavel Machek
  2 siblings, 1 reply; 30+ messages in thread
From: Doug Ledford @ 2002-04-18 13:44 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: jh, linux-kernel, jakub, aj, ak, pavel

On Thu, Apr 18, 2002 at 07:26:15AM +0200, Andrea Arcangeli wrote:
> On Wed, Apr 17, 2002 at 07:42:49PM -0400, Doug Ledford wrote:
> > --- i387.c.save	Wed Apr 17 19:22:47 2002
> > +++ i387.c	Wed Apr 17 19:28:27 2002
> > @@ -33,8 +33,26 @@
> >  void init_fpu(void)
> >  {
> >  	__asm__("fninit");
> > -	if ( cpu_has_xmm )
> > +	if ( cpu_has_mmx )
> > +		asm volatile("xorq %%mm0, %%mm0;
> > +			      xorq %%mm1, %%mm1;
> > +			      xorq %%mm2, %%mm2;
> > +			      xorq %%mm3, %%mm3;
> > +			      xorq %%mm4, %%mm4;
> > +			      xorq %%mm5, %%mm5;
> > +			      xorq %%mm6, %%mm6;
> > +			      xorq %%mm7, %%mm7");
> 
> This mean the mmx isn't really backwards compatible and that's
> potentially a problem for all the legacy x86 multiuser operative
> systems.  That's an hardware design bug, not a software problem.  In
> short running a 2.[02] kernel on a MMX capable CPU isn't secure, the
> same potentially applies to windows NT and other unix, no matter of SSE.

Why is that not backwards compatible?  I've never heard of anywhere that 
specifies that the starting value in the mmx registers will be anything of 
consequence?  Also, even though register space is (possibly) shared with 
the FP register stack, clearing out the MMX registers does not actually 
harm the FP register stack since the fninit already blows the stack away, 
which forces the application to load fp data before it can use the fpu 
again.

> I verified with this simple proggy:
> 
> main()
> {
> 	long long x = 2;
> 	long long z = 3;
> 
> 	asm volatile("movq %0, %%mm0":: "m" (x));
> 	asm volatile("fninit");
> 	asm volatile("movq %%mm0, %0": "=m" (z):);
> 
> 	printf("%d\n", z);
> }
> 
> it prints 2 here, while it should print zero or at least random to be
> backwards compatible.

I verified the same here last night with a similar test.

> The patch has a couple of problems. xorq doesn't exists. Since there are
> no params you should also drop one %. Also I think we need an emms after
> the mmx operations to remain binary compatible with the x86 ABI.

I did mention that it was an untested and even uncompiled patch in my 
email after all ;-)

> How does this look?

I'm fine with this.

> --- 2.4.19pre7aa1/arch/i386/kernel/i387.c.~1~	Thu Apr 18 05:23:12 2002
> +++ 2.4.19pre7aa1/arch/i386/kernel/i387.c	Thu Apr 18 07:20:26 2002
> @@ -33,8 +33,28 @@
>  void init_fpu(void)
>  {
>  	__asm__("fninit");
> -	if ( cpu_has_xmm )
> +	if (cpu_has_mmx) {
> +		asm volatile("pxor %mm0, %mm0\n\t"
> +			     "movq %mm0, %mm1\n\t"
> +			     "movq %mm0, %mm2\n\t"
> +			     "movq %mm0, %mm3\n\t"
> +			     "movq %mm0, %mm4\n\t"
> +			     "movq %mm0, %mm5\n\t"
> +			     "movq %mm0, %mm6\n\t"
> +			     "movq %mm0, %mm7\n\t"
> +			     "emms\n");
> +	}
> +	if ( cpu_has_xmm ) {
> +		asm volatile("xorps %xmm0, %xmm0\n\t"
> +			     "xorps %xmm1, %xmm1\n\t"
> +			     "xorps %xmm2, %xmm2\n\t"
> +			     "xorps %xmm3, %xmm3\n\t"
> +			     "xorps %xmm4, %xmm4\n\t"
> +			     "xorps %xmm5, %xmm5\n\t"
> +			     "xorps %xmm6, %xmm6\n\t"
> +			     "xorps %xmm7, %xmm7\n");
>  		load_mxcsr(0x1f80);
> +	}
>  		
>  	current->used_math = 1;
>  }
> 
> Andrea

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606
  

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SSE related security hole
  2002-04-17 15:23 ` Jan Hubicka
@ 2002-04-18 14:57   ` Denis Vlasenko
  0 siblings, 0 replies; 30+ messages in thread
From: Denis Vlasenko @ 2002-04-18 14:57 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: bugtraq, linux-kernel, Pavel Machek, jakub, aj, ak

On 17 April 2002 13:23, Jan Hubicka wrote:
> #include <stdlib.h>
> #include <stdio.h>
>
> int
> m ()
> {
>   int i, n = 7;
>   float comp, sum = 0;
          ^^^^
unused?

>   sin(1);

So, removing this sin() stops the bug?

>   for (i = 1; i <= n; ++i)
>     sum += i;
>   printf ("sum of %d ints: %g\n", n, sum);
>   return 0;
> }
>
> main ()
> {
>   m ();
> }
Can m() body be placed directly in main()?
--
vda

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SSE related security hole
@ 2002-04-18 18:36 linux
  2002-04-18 18:53 ` Richard B. Johnson
  2002-04-18 21:06 ` H. Peter Anvin
  0 siblings, 2 replies; 30+ messages in thread
From: linux @ 2002-04-18 18:36 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux

Um, people here seem to be assuming that, in the absence of MMX,
fninit *doesn't* leak information.

I thought it was well-known to just clear (set to all-ones) the
tag register and not alter the actual floating-point registers.

Thus, it seems quite feasible to reset the tag word with FLDENV and
store out the FPU registers, even on an 80387.

Isn't this the same security hole?  Shouldn't there be 8 FLDZ instructions
(or equivalent) in the processor state initialization?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SSE related security hole
  2002-04-18 18:36 SSE related security hole linux
@ 2002-04-18 18:53 ` Richard B. Johnson
  2002-04-21 19:52   ` Pavel Machek
  2002-04-21 22:11   ` David Wagner
  2002-04-18 21:06 ` H. Peter Anvin
  1 sibling, 2 replies; 30+ messages in thread
From: Richard B. Johnson @ 2002-04-18 18:53 UTC (permalink / raw)
  To: linux; +Cc: linux-kernel

On 18 Apr 2002 linux@horizon.com wrote:

> Um, people here seem to be assuming that, in the absence of MMX,
> fninit *doesn't* leak information.
> 
> I thought it was well-known to just clear (set to all-ones) the
> tag register and not alter the actual floating-point registers.
> 
> Thus, it seems quite feasible to reset the tag word with FLDENV and
> store out the FPU registers, even on an 80387.
> 
> Isn't this the same security hole?  Shouldn't there be 8 FLDZ instructions
> (or equivalent) in the processor state initialization?

Well, if what's on the internal stack of the FPU can actually leak
information, I think the notion of "leak" has expanded just a bit
too much.

A rogue process could not even know what instruction was about to
be executed, nor what the previous instruction was, nor when since
boot it was executed, nor by whom. The 'data' associated with those
unknown instructions would make a good random number generator,
or a round-about method of obtaining PI (st0 almost always contains it).
That's about all.

Cheers,
Dick Johnson

Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).

                 Windows-2000/Professional isn't.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SSE related security hole
  2002-04-18 13:44   ` Doug Ledford
@ 2002-04-18 19:20     ` Pavel Machek
  2002-04-18 19:32       ` Doug Ledford
  0 siblings, 1 reply; 30+ messages in thread
From: Pavel Machek @ 2002-04-18 19:20 UTC (permalink / raw)
  To: Doug Ledford, Andrea Arcangeli, jh, linux-kernel, jakub, aj, ak,
	pavel, 20020417194249.B23438, 20020418072615.I14322

Hi!

> > > +		asm volatile("xorq %%mm0, %%mm0;
> > > +			      xorq %%mm1, %%mm1;
> > > +			      xorq %%mm2, %%mm2;
> > > +			      xorq %%mm3, %%mm3;
> > > +			      xorq %%mm4, %%mm4;
> > > +			      xorq %%mm5, %%mm5;
> > > +			      xorq %%mm6, %%mm6;
> > > +			      xorq %%mm7, %%mm7");
> > 
> > This mean the mmx isn't really backwards compatible and that's
> > potentially a problem for all the legacy x86 multiuser operative
> > systems.  That's an hardware design bug, not a software problem.  In
> > short running a 2.[02] kernel on a MMX capable CPU isn't secure, the
> > same potentially applies to windows NT and other unix, no matter of SSE.
> 
> Why is that not backwards compatible?  I've never heard of anywhere that 
> specifies that the starting value in the mmx registers will be anything of 
> consequence?  Also, even though register space is (possibly) shared with 
> the FP register stack, clearing out the MMX registers does not actually 
> harm the FP register stack since the fninit already blows the stack away, 
> which forces the application to load fp data before it can use the fpu 
> again.

It introduces security hole: Unrelated tasks now have your top secret
value you stored in one of your registers.
								Pavel
-- 
Casualities in World Trade Center: ~3k dead inside the building,
cryptography in U.S.A. and free speech in Czech Republic.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SSE related security hole
  2002-04-18 19:20     ` Pavel Machek
@ 2002-04-18 19:32       ` Doug Ledford
  2002-04-21 19:54         ` Pavel Machek
  0 siblings, 1 reply; 30+ messages in thread
From: Doug Ledford @ 2002-04-18 19:32 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Andrea Arcangeli, jh, linux-kernel, jakub, aj, ak, pavel,
	20020417194249.B23438, 20020418072615.I14322

On Thu, Apr 18, 2002 at 09:20:03PM +0200, Pavel Machek wrote:
> It introduces security hole: Unrelated tasks now have your top secret
> value you stored in one of your registers.

Well, that's been my point all along and why I sent the patch.  I was not 
asking why leaving the registers alone instead of 0ing them out was not a 
security hole.  I was asking why doing so was not backward compatible?

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606
  

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SSE related security hole
  2002-04-18 18:36 SSE related security hole linux
  2002-04-18 18:53 ` Richard B. Johnson
@ 2002-04-18 21:06 ` H. Peter Anvin
  1 sibling, 0 replies; 30+ messages in thread
From: H. Peter Anvin @ 2002-04-18 21:06 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <20020418183639.20946.qmail@science.horizon.com>
By author:    linux@horizon.com
In newsgroup: linux.dev.kernel
>
> Um, people here seem to be assuming that, in the absence of MMX,
> fninit *doesn't* leak information.
> 
> I thought it was well-known to just clear (set to all-ones) the
> tag register and not alter the actual floating-point registers.
> 
> Thus, it seems quite feasible to reset the tag word with FLDENV and
> store out the FPU registers, even on an 80387.
> 
> Isn't this the same security hole?  Shouldn't there be 8 FLDZ instructions
> (or equivalent) in the processor state initialization?
> 

Perhaps the right thing to do is to have a description in data of the
desired initialization state and just F[NX]RSTOR it?

	-hpa
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt	<amsp@zytor.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SSE related security hole
       [not found] <200204182320.53095.nahshon@actcom.co.il>
@ 2002-04-19 11:22 ` Alan Cox
  0 siblings, 0 replies; 30+ messages in thread
From: Alan Cox @ 2002-04-19 11:22 UTC (permalink / raw)
  To: nahshon
  Cc: Andi Kleen, Alan Cox, Andrea Arcangeli, Doug Ledford, jh,
	linux-kernel, jakub, aj, pavel

> If the FP/MMX state _are_ separate regs then they must also be
> stored/reloaded separately on a context switch.
> Is that already done? (if yes, where?)

FXSAVE/FXRSTOR

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SSE related security hole
       [not found] ` <a9ncgs$2s2$1@cesium.transmeta.com.suse.lists.linux.kernel>
@ 2002-04-19 14:06   ` Andi Kleen
  2002-04-19 18:00     ` Doug Ledford
  0 siblings, 1 reply; 30+ messages in thread
From: Andi Kleen @ 2002-04-19 14:06 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel

"H. Peter Anvin" <hpa@zytor.com> writes:
> 
> Perhaps the right thing to do is to have a description in data of the
> desired initialization state and just F[NX]RSTOR it?

Sounds like the cleanest solution. The state could be saved at CPU bootup
with just MXCSR initialized.

I'll implement that for x86-64.

-Andi

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SSE related security hole
  2002-04-19 14:06   ` Andi Kleen
@ 2002-04-19 18:00     ` Doug Ledford
  2002-04-19 21:04       ` Andrea Arcangeli
  0 siblings, 1 reply; 30+ messages in thread
From: Doug Ledford @ 2002-04-19 18:00 UTC (permalink / raw)
  To: Andi Kleen; +Cc: H. Peter Anvin, linux-kernel

On Fri, Apr 19, 2002 at 04:06:17PM +0200, Andi Kleen wrote:
> "H. Peter Anvin" <hpa@zytor.com> writes:
> > 
> > Perhaps the right thing to do is to have a description in data of the
> > desired initialization state and just F[NX]RSTOR it?
> 
> Sounds like the cleanest solution. The state could be saved at CPU bootup
> with just MXCSR initialized.
> 
> I'll implement that for x86-64.

Ummm...last I knew, fxrstor is *expensive*.  The fninit/xor regs setup is 
likely *very* much faster.  Someone should check this before we sacrifice 
100 cycles needlessly or something.

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606
  

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SSE related security hole
  2002-04-19 18:00     ` Doug Ledford
@ 2002-04-19 21:04       ` Andrea Arcangeli
  2002-04-19 21:35         ` H. Peter Anvin
  2002-04-19 22:18         ` Jan Hubicka
  0 siblings, 2 replies; 30+ messages in thread
From: Andrea Arcangeli @ 2002-04-19 21:04 UTC (permalink / raw)
  To: Andi Kleen, H. Peter Anvin, linux-kernel; +Cc: Jan Hubicka

On Fri, Apr 19, 2002 at 02:00:31PM -0400, Doug Ledford wrote:
> On Fri, Apr 19, 2002 at 04:06:17PM +0200, Andi Kleen wrote:
> > "H. Peter Anvin" <hpa@zytor.com> writes:
> > > 
> > > Perhaps the right thing to do is to have a description in data of the
> > > desired initialization state and just F[NX]RSTOR it?
> > 
> > Sounds like the cleanest solution. The state could be saved at CPU bootup
> > with just MXCSR initialized.
> > 
> > I'll implement that for x86-64.
> 
> Ummm...last I knew, fxrstor is *expensive*.  The fninit/xor regs setup is 
> likely *very* much faster.  Someone should check this before we sacrifice 
> 100 cycles needlessly or something.

most probably yes, fxrestor needs to read ram, pxor also takes some
icache and bytecode ram but it sounds like it will be faster.

Maybe we could also interleave the pxor with the xorps, since they uses
different parts of the cpu, Honza?

diff -urN ref/arch/x86_64/kernel/i387.c xmm/arch/x86_64/kernel/i387.c
--- ref/arch/x86_64/kernel/i387.c	Fri Apr 19 19:37:30 2002
+++ xmm/arch/x86_64/kernel/i387.c	Fri Apr 19 19:39:02 2002
@@ -34,6 +34,31 @@
 	struct task_struct *me = current;
 	__asm__("fninit");
 	load_mxcsr(0x1f80);
+	asm volatile("pxor %mm0, %mm0\n\t"
+		     "pxor %mm1, %mm1\n\t"
+		     "pxor %mm2, %mm2\n\t"
+		     "pxor %mm3, %mm3\n\t"
+		     "pxor %mm4, %mm4\n\t"
+		     "pxor %mm5, %mm5\n\t"
+		     "pxor %mm6, %mm6\n\t"
+		     "pxor %mm7, %mm7\n\t"
+		     "emms\n\t"
+		     "xorps %xmm0,  %xmm0\n\t"
+		     "xorps %xmm1,  %xmm1\n\t"
+		     "xorps %xmm2,  %xmm2\n\t"
+		     "xorps %xmm3,  %xmm3\n\t"
+		     "xorps %xmm4,  %xmm4\n\t"
+		     "xorps %xmm5,  %xmm5\n\t"
+		     "xorps %xmm6,  %xmm6\n\t"
+		     "xorps %xmm7,  %xmm7\n\t"
+		     "xorps %xmm8,  %xmm8\n\t"
+		     "xorps %xmm9,  %xmm9\n\t"
+		     "xorps %xmm10, %xmm10\n\t"
+		     "xorps %xmm11, %xmm11\n\t"
+		     "xorps %xmm12, %xmm12\n\t"
+		     "xorps %xmm13, %xmm13\n\t"
+		     "xorps %xmm14, %xmm14\n\t"
+		     "xorps %xmm15, %xmm15\n");
 	me->used_math = 1;
 }
 

Andrea

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SSE related security hole
  2002-04-19 21:04       ` Andrea Arcangeli
@ 2002-04-19 21:35         ` H. Peter Anvin
  2002-04-19 21:42           ` Andi Kleen
  2002-04-19 22:18         ` Jan Hubicka
  1 sibling, 1 reply; 30+ messages in thread
From: H. Peter Anvin @ 2002-04-19 21:35 UTC (permalink / raw)
  To: andrea; +Cc: ak, hpa, linux-kernel, jh

>>
>> Ummm...last I knew, fxrstor is *expensive*.  The fninit/xor regs setup
>> is  likely *very* much faster.  Someone should check this before we
>> sacrifice  100 cycles needlessly or something.
>
> most probably yes, fxrestor needs to read ram, pxor also takes some
> icache and bytecode ram but it sounds like it will be faster.
>
> Maybe we could also interleave the pxor with the xorps, since they uses
> different parts of the cpu, Honza?
>

You almost certainly should.  The reason I suggested FXRSTOR is that it
would initialize the entire FPU, including any state that future
processors may add, thus reducing the likelihood of any funnies in the
future.




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SSE related security hole
  2002-04-19 21:35         ` H. Peter Anvin
@ 2002-04-19 21:42           ` Andi Kleen
  2002-04-20  3:23             ` Andrea Arcangeli
  0 siblings, 1 reply; 30+ messages in thread
From: Andi Kleen @ 2002-04-19 21:42 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: andrea, ak, linux-kernel, jh

On Fri, Apr 19, 2002 at 02:35:57PM -0700, H. Peter Anvin wrote:
> >>
> >> Ummm...last I knew, fxrstor is *expensive*.  The fninit/xor regs setup
> >> is  likely *very* much faster.  Someone should check this before we
> >> sacrifice  100 cycles needlessly or something.
> >
> > most probably yes, fxrestor needs to read ram, pxor also takes some
> > icache and bytecode ram but it sounds like it will be faster.
> >
> > Maybe we could also interleave the pxor with the xorps, since they uses
> > different parts of the cpu, Honza?
> >
> 
> You almost certainly should.  The reason I suggested FXRSTOR is that it
> would initialize the entire FPU, including any state that future
> processors may add, thus reducing the likelihood of any funnies in the
> future.

That's also why I like it. 

-Andi

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SSE related security hole
  2002-04-19 21:04       ` Andrea Arcangeli
  2002-04-19 21:35         ` H. Peter Anvin
@ 2002-04-19 22:18         ` Jan Hubicka
  1 sibling, 0 replies; 30+ messages in thread
From: Jan Hubicka @ 2002-04-19 22:18 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Andi Kleen, H. Peter Anvin, linux-kernel, Jan Hubicka

> On Fri, Apr 19, 2002 at 02:00:31PM -0400, Doug Ledford wrote:
> > On Fri, Apr 19, 2002 at 04:06:17PM +0200, Andi Kleen wrote:
> > > "H. Peter Anvin" <hpa@zytor.com> writes:
> > > > 
> > > > Perhaps the right thing to do is to have a description in data of the
> > > > desired initialization state and just F[NX]RSTOR it?
> > > 
> > > Sounds like the cleanest solution. The state could be saved at CPU bootup
> > > with just MXCSR initialized.
> > > 
> > > I'll implement that for x86-64.
> > 
> > Ummm...last I knew, fxrstor is *expensive*.  The fninit/xor regs setup is 
> > likely *very* much faster.  Someone should check this before we sacrifice 
> > 100 cycles needlessly or something.
> 
> most probably yes, fxrestor needs to read ram, pxor also takes some
> icache and bytecode ram but it sounds like it will be faster.
> 
> Maybe we could also interleave the pxor with the xorps, since they uses
> different parts of the cpu, Honza?

Yes, I guess that should help to at least some chips.
Definitly nothing to loose :)

Honza

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SSE related security hole
  2002-04-19 21:42           ` Andi Kleen
@ 2002-04-20  3:23             ` Andrea Arcangeli
  0 siblings, 0 replies; 30+ messages in thread
From: Andrea Arcangeli @ 2002-04-20  3:23 UTC (permalink / raw)
  To: Andi Kleen; +Cc: H. Peter Anvin, linux-kernel, jh, bgerst

On Fri, Apr 19, 2002 at 11:42:06PM +0200, Andi Kleen wrote:
> On Fri, Apr 19, 2002 at 02:35:57PM -0700, H. Peter Anvin wrote:
> > would initialize the entire FPU, including any state that future
> > processors may add, thus reducing the likelihood of any funnies in the
> > future.
> 
> That's also why I like it. 

Trusting the "boot state" of the cpu would require the BIOS to match the
linux ABI. The FPU must be in a known initialized state at the linux
level, not at the BIOS level, as first for the mxcsr, but also the other
registers should be set to zero by default so gcc can exploit that (I
guess that's what gcc is just doing and that's why Honza noticed it). so
if new future processors will add new stuff, the new stuff will have to
be initialized again in the "fxrestor" default payload in linux (so
requiring a modification to the OS), and having to change the default
rxrestor payload for a new cpu is equivalent to add another xor there
(modulo the runtime check on the cpu features that could be avoided with
two separate exception handlers for each cpu revision but it's fast
enough that it doesn't matter at the moment on x86).

Andrea

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SSE related security hole
  2002-04-18 18:53 ` Richard B. Johnson
@ 2002-04-21 19:52   ` Pavel Machek
  2002-04-21 22:11   ` David Wagner
  1 sibling, 0 replies; 30+ messages in thread
From: Pavel Machek @ 2002-04-21 19:52 UTC (permalink / raw)
  To: Richard B. Johnson; +Cc: linux, linux-kernel

Hi!

> > Um, people here seem to be assuming that, in the absence of MMX,
> > fninit *doesn't* leak information.
> > 
> > I thought it was well-known to just clear (set to all-ones) the
> > tag register and not alter the actual floating-point registers.
> > 
> > Thus, it seems quite feasible to reset the tag word with FLDENV and
> > store out the FPU registers, even on an 80387.
> > 
> > Isn't this the same security hole?  Shouldn't there be 8 FLDZ instructions
> > (or equivalent) in the processor state initialization?
> 
> Well, if what's on the internal stack of the FPU can actually leak
> information, I think the notion of "leak" has expanded just a bit
> too much.
> 
> A rogue process could not even know what instruction was about to
> be executed, nor what the previous instruction was, nor when since
> boot it was executed, nor by whom. The 'data' associated with those

If fpu unit was used to memcpy your .ssh/identity, well, you might
change your mind.
									Pavel
-- 
(about SSSCA) "I don't say this lightly.  However, I really think that the U.S.
no longer is classifiable as a democracy, but rather as a plutocracy." --hpa

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SSE related security hole
  2002-04-18 19:32       ` Doug Ledford
@ 2002-04-21 19:54         ` Pavel Machek
  0 siblings, 0 replies; 30+ messages in thread
From: Pavel Machek @ 2002-04-21 19:54 UTC (permalink / raw)
  To: Doug Ledford, Andrea Arcangeli, jh, linux-kernel, jakub, aj, ak,
	pavel

Hi!

> > It introduces security hole: Unrelated tasks now have your top secret
> > value you stored in one of your registers.
> 
> Well, that's been my point all along and why I sent the patch.  I was not 
> asking why leaving the registers alone instead of 0ing them out was not a 
> security hole.  I was asking why doing so was not backward compatible?

Introducing security hole counts as "poor backcompatibility" to me.
									Pavel
-- 
(about SSSCA) "I don't say this lightly.  However, I really think that the U.S.
no longer is classifiable as a democracy, but rather as a plutocracy." --hpa

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: SSE related security hole
  2002-04-18 18:53 ` Richard B. Johnson
  2002-04-21 19:52   ` Pavel Machek
@ 2002-04-21 22:11   ` David Wagner
  1 sibling, 0 replies; 30+ messages in thread
From: David Wagner @ 2002-04-21 22:11 UTC (permalink / raw)
  To: linux-kernel

Richard B. Johnson wrote:
>Well, if what's on the internal stack of the FPU can actually leak
>information, I think the notion of "leak" has expanded just a bit
>too much.

Note that some crypto implemenations use the FPU heavily to speed up
the encryption process.  Thus, if FPU data can leak, secret keys are
at risk.  I don't know about you, but that doesn't sound good to me.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: SSE related security hole
@ 2002-04-22 22:24 Saxena, Sunil
  0 siblings, 0 replies; 30+ messages in thread
From: Saxena, Sunil @ 2002-04-22 22:24 UTC (permalink / raw)
  To: 'rhkernel-dept-list@redhat.com'
  Cc: linux-kernel, jakub, aj, ak, pavel

Hi Everyone,

Sorry it took us some time to respond to this issue.

The Intel prescribed method for correct execution of SSE/SSE2 instructions
requires that software detect the support for these instructions via the
CPUID feature flags bits, prior to executing the instruction (see
<<ftp://download.intel.com/design/perftool/cbts/appnotes/ap900/cpuosid.pdf>>
). Also, Section 2.2 of the IA-32 Intel(R) Architecture Software Developer's
Manual, Vol 2, indicates that the use of the operand size prefix with
MMX/SSE/SSE2 instructions is reserved and may cause unpredictable behavior.


We recognized that there is a discrepancy in the individual instruction
descriptions in Vol 2 where it is indicated that the instruction would
generate a UD#. We will be rectifying this discrepancy in the next revision
of Vol 2 as well as via the monthly Specification Updates.

Thanks
Sunil


-----Original Message-----
From: Doug Ledford [mailto:dledford@redhat.com] 
Sent: Wednesday, April 17, 2002 4:56 PM
To: sunil.saxena@intel.com
Cc: linux-kernel@vger.kernel.org; jakub@redhat.com; aj@suse.de; ak@suse.de;
pavel@atrey.karlin.mff.cuni.cz
Subject: Re: SSE related security hole


Subject: Re: SSE related security hole
Reply-To: 
In-Reply-To: <200204171513.g3HFD3n04056@fenrus.demon.nl>; from
arjan@fenrus.demon.nl on Wed, Apr 17, 2002 at 04:13:03PM +0100

(NOTE: This may have already been answered by someone else, but I haven't 
seen it if it has, so I'm sending this through)

> Hi,
> while debugging GCC bugreport, I noticed following behaviour of simple
> program with no syscalls:
> 
> hubicka@nikam:~$ ./a.out
> sum of 7 ints: 28
> hubicka@nikam:~$ ./a.out
> sum of 7 ints: 56
> Bad sum (seen with gcc -O -march=pentiumpro -msse)
> hubicka@nikam:~$ ./a.out
> sum of 7 ints: 84
> Bad sum (seen with gcc -O -march=pentiumpro -msse)
> hubicka@nikam:~$ ./a.out
> sum of 7 ints: 112
> Bad sum (seen with gcc -O -march=pentiumpro -msse)
> hubicka@nikam:~$ echo
> 
> hubicka@nikam:~$ ./a.out
> sum of 7 ints: 28
> 
> 
> ie it always returns different value, moreover when something else
> is run in meantime (verified by loading WWW page served by same machine),
> the counter is reinitialized to 28.
> 
> I am attaching the source, but it needs to be compiled by cfg-branch GCC
> with settings -O2 -march=pentium3 -mfpmath=sse, so I've placed static
> binary to http://atrey.karlin.mff.cuni.cz/~hubicka/badsum.bin

Compiling the asm source with a different compiler will also make it fail.

> The problem appears to be reproducible only on pentium3 and athlon4
systems,
> not pentium4 system, where it appears to work as expected.  Reproduced on
> both 2.4.9-RH and 2.4.16 kernels.

[ program snipped ]

So there are two different issues at play here.  First, the kernel uses
the fninit instruction to initialize the fpu on first use.  Nothing in the
Intel manuals says anything about the fninit instruction clearing the mmx
or sse registers, and experimentally we now know for sure that it doesn't.  
That means that when the first time your program was ran it left 28 in
register xmm1.  The next time the program was run, the fninit did nothing
to clear register xmm1 so it still held 28.  Now, the pxor instruction
that is part of the m() function and intended to 0 out the xmm1 register
is an sse2 instruction.  It just so happens that it doesn't work on sse
only processors such as P3 CPUs.  So, when 28 was left in xmm1, then the
pxor failed to 0 out xmm1, we saved 28 as the starting value for the loop
and then looped through 7 additions until we had, you guessed it, 56.  In
fact, if you do a while :;do bad; done loop the it will increment by 28
each time it is run except when something else intervenes.  Replacing the
pxor instruction with xorps instead makes it work.  So, that's a bug in
gcc I suspect, using sse2 instructions when only called to use sse
instructions.  It seems odd to me that the CPU wouldn't generate an
illegal instruction exception, but oh well, it evidently doesn't.

So, we really should change arch/i386/kernel/i387.c something like this:

(WARNING, totally untested and not even compile checked change follows)

--- i387.c.save	Wed Apr 17 19:22:47 2002
+++ i387.c	Wed Apr 17 19:28:27 2002
@@ -33,8 +33,26 @@
 void init_fpu(void)
 {
 	__asm__("fninit");
-	if ( cpu_has_xmm )
+	if ( cpu_has_mmx )
+		asm volatile("xorq %%mm0, %%mm0;
+			      xorq %%mm1, %%mm1;
+			      xorq %%mm2, %%mm2;
+			      xorq %%mm3, %%mm3;
+			      xorq %%mm4, %%mm4;
+			      xorq %%mm5, %%mm5;
+			      xorq %%mm6, %%mm6;
+			      xorq %%mm7, %%mm7");
+	if ( cpu_has_xmm ) {
+		asm volatile("xorps %%xmm0, %%xmm0;
+			      xorps %%xmm1, %%xmm1;
+			      xorps %%xmm2, %%xmm2;
+			      xorps %%xmm3, %%xmm3;
+			      xorps %%xmm4, %%xmm4;
+			      xorps %%xmm5, %%xmm5;
+			      xorps %%xmm6, %%xmm6;
+			      xorps %%xmm7, %%xmm7");
 		load_mxcsr(0x1f80);
+	}
 		
 	current->used_math = 1;
 }


The rest of the problem is a gcc bug and possibly something that Intel 
should make a note of on the p3 processors (that is, that the p3 will 
silently fail to execute some sse2 instructions without generating the 
expected exception).

-- 
  Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc. 
         1801 Varsity Dr.
         Raleigh, NC 27606
  



_______________________________________________
rhkernel-dept-list mailing list
rhkernel-dept-list@redhat.com
http://post-office.corp.redhat.com/mailman/listinfo/rhkernel-dept-list

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2002-04-22 22:25 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2002-04-18 18:36 SSE related security hole linux
2002-04-18 18:53 ` Richard B. Johnson
2002-04-21 19:52   ` Pavel Machek
2002-04-21 22:11   ` David Wagner
2002-04-18 21:06 ` H. Peter Anvin
  -- strict thread matches above, loose matches on Subject: below --
2002-04-22 22:24 Saxena, Sunil
     [not found] <20020418183639.20946.qmail@science.horizon.com.suse.lists.linux.kernel>
     [not found] ` <a9ncgs$2s2$1@cesium.transmeta.com.suse.lists.linux.kernel>
2002-04-19 14:06   ` Andi Kleen
2002-04-19 18:00     ` Doug Ledford
2002-04-19 21:04       ` Andrea Arcangeli
2002-04-19 21:35         ` H. Peter Anvin
2002-04-19 21:42           ` Andi Kleen
2002-04-20  3:23             ` Andrea Arcangeli
2002-04-19 22:18         ` Jan Hubicka
     [not found] <200204182320.53095.nahshon@actcom.co.il>
2002-04-19 11:22 ` Alan Cox
2002-04-17 23:42 Doug Ledford
2002-04-18  5:26 ` Andrea Arcangeli
2002-04-18  9:10   ` Arjan van de Ven
2002-04-18 11:18   ` Alan Cox
2002-04-18 11:14     ` Andi Kleen
2002-04-18 11:53       ` Alan Cox
2002-04-18 11:46         ` Andi Kleen
2002-04-18 11:55         ` Andi Kleen
2002-04-18 13:44   ` Doug Ledford
2002-04-18 19:20     ` Pavel Machek
2002-04-18 19:32       ` Doug Ledford
2002-04-21 19:54         ` Pavel Machek
2002-04-18  8:22 ` Andi Kleen
2002-04-17 14:51 Jan Hubicka
2002-04-17 15:23 ` Jan Hubicka
2002-04-18 14:57   ` Denis Vlasenko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox