[patch] x86: fix taking DNA during 64bit sigreturn

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [patch] x86: fix taking DNA during 64bit sigreturn
@ 2007-11-11 19:27 Siddha, Suresh B
  2007-11-12 19:08 ` Linus Torvalds
  0 siblings, 1 reply; 6+ messages in thread
From: Siddha, Suresh B @ 2007-11-11 19:27 UTC (permalink / raw)
  To: torvalds, ak, linux-kernel; +Cc: mingo, hpa, tglx, akpm

restore sigcontext is taking a DNA exception while restoring FP context from
the user stack, during the sigreturn. Appended patch fixes it by doing clts()
if the app doesn't touch FP during the signal handler execution. This will
stop generating a DNA, during the fxrstor in the sigreturn.

This improves 64-bit lat_sig numbers by ~30% on my core2 platform.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---

diff --git a/arch/x86/kernel/i387_64.c b/arch/x86/kernel/i387_64.c
index 56c1f11..bfaff28 100644
--- a/arch/x86/kernel/i387_64.c
+++ b/arch/x86/kernel/i387_64.c
@@ -92,13 +92,14 @@ int save_i387(struct _fpstate __user *buf)
 	if (task_thread_info(tsk)->status & TS_USEDFPU) {
 		err = save_i387_checking((struct i387_fxsave_struct __user *)buf);
 		if (err) return err;
+		task_thread_info(tsk)->status &= ~TS_USEDFPU;
 		stts();
-		} else {
-		if (__copy_to_user(buf, &tsk->thread.i387.fxsave, 
+	} else {
+		if (__copy_to_user(buf, &tsk->thread.i387.fxsave,
 				   sizeof(struct i387_fxsave_struct)))
 			return -1;
-	} 
-		return 1;
+	}
+	return 1;
 }
 
 /*
diff --git a/include/asm-x86/i387_64.h b/include/asm-x86/i387_64.h
index 0217b74..3a4ffba 100644
--- a/include/asm-x86/i387_64.h
+++ b/include/asm-x86/i387_64.h
@@ -203,6 +203,11 @@ static inline void save_init_fpu(struct task_struct *tsk)
  */
 static inline int restore_i387(struct _fpstate __user *buf)
 {
+	set_used_math();
+	if (!(task_thread_info(current)->status & TS_USEDFPU)) {
+		clts();
+		task_thread_info(current)->status |= TS_USEDFPU;
+	}
 	return restore_fpu_checking((__force struct i387_fxsave_struct *)buf);
 }
 

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [patch] x86: fix taking DNA during 64bit sigreturn
  2007-11-11 19:27 [patch] x86: fix taking DNA during 64bit sigreturn Siddha, Suresh B
@ 2007-11-12 19:08 ` Linus Torvalds
  2007-11-12 19:33   ` H. Peter Anvin
  2007-11-12 21:16   ` Andi Kleen
  0 siblings, 2 replies; 6+ messages in thread
From: Linus Torvalds @ 2007-11-12 19:08 UTC (permalink / raw)
  To: Siddha, Suresh B; +Cc: ak, linux-kernel, mingo, hpa, tglx, akpm

On Sun, 11 Nov 2007, Siddha, Suresh B wrote:
>
> restore sigcontext is taking a DNA exception while restoring FP context from
> the user stack, during the sigreturn. Appended patch fixes it by doing clts()
> if the app doesn't touch FP during the signal handler execution. This will
> stop generating a DNA, during the fxrstor in the sigreturn.
> 
> This improves 64-bit lat_sig numbers by ~30% on my core2 platform.

The sad part here is that the 32-bit kernel doesn't seem to have this 
problem at all, and has it fixed thanks to just having cleaner 
abstractions (eg the "unlazy_fpu()" function will just automatically clear 
TS_USEDFPU as part of __save_init_fpu()).

In comparison, the x86-64 math handling is a total friggin mess. Maybe 
it's not *beautiful* in the 32-bit path either, but most of the complexity 
in the 32-bit stuff is due to just the complexity of interactions. In the 
64-bit code, the complexity seems to come largely from just being crap, 
crap, crap.

The *real* fix for this is almost certainly to just get rid of the 64-bit 
code entirely, and use the 32-bit code as the base for one single unified 
setup. The 32-bit code should be largely a superset of the 64-bit code 
anyway, since it has to handle more cases, and does it more cleanly.

Oh, well. Your patch looks good for now.

Btw: the 64-bit code looks like it might have preemption issues too. 
Another bug that seems to be fixed in the 32-bit codepath.

Anybody willing to try to do that unification around the 32-bit code? The 
biggest issue with the 64-bit code is to avoid the extra rex prefix for 
"fxsave", which means that the fxsave/fxrstor thing is done with

	"rex64/fxsave %P2(%1)"
		: "m" (tsk->thread.i387.fxsave)
		: "cdaSDb" (tsk),
		  "i" (offsetof(__typeof__(*tsk), thread.i387.fxsave)));

which isn't exactly pretty, but the memory address generation works fine 
in 32-bit code too, and the rex override is easily done with

	#ifdef CONFIG_X86_64
	  #define REX64 "rex64/"
	#else
	  #define REX64 ""
	#endif

and then you just use

	REX64 "fxsave"

in the asm statements instead.

So I'll apply this patch for 2.6.24, but considering that it really looks 
like x86-64 is preempt-broken too, if somebody does a larger patch that 
does the above merge and tests it, I would be open to apply that too. It 
does seem to be a real bug (having a scheduling event in the signal path 
between the actual fxsave and the clearing of the bits would *seem* to 
possibly screw up the FP state!)

But maybe I'm missing some reason why it doesn't matter. The 32-bit code 
was fixed back in 2003 (commit 5bff44fc272b948a85e893a007d01b9dfb3ad04f 
"Be a lot more careful about TS_USEDFPU and preemption" in the historical 
tree with a comment like this:

    We had some races where we tested (or set) TS_USEDFPU together
    with sequences that depended on the setting (like clearing or
    setting the TS flag in %cr0) and we could be preempted in between,
    which screws up the FPU state, since preemption will itself change
    USEDFPU and the TS flag.

and maybe it's just the 32-bit tree being unnecessarily careful. It's long 
enough ago that I don't remember the details, but I have a dim memory of 
there actually being some state corruption involved.

			Linus

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [patch] x86: fix taking DNA during 64bit sigreturn
  2007-11-12 19:08 ` Linus Torvalds
@ 2007-11-12 19:33   ` H. Peter Anvin
  2007-11-12 21:16   ` Andi Kleen
  1 sibling, 0 replies; 6+ messages in thread
From: H. Peter Anvin @ 2007-11-12 19:33 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Siddha, Suresh B, ak, linux-kernel, mingo, tglx, akpm

Linus Torvalds wrote:
> Anybody willing to try to do that unification around the 32-bit code?

Heck yes.  I'll grab this one.

	-hpa

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [patch] x86: fix taking DNA during 64bit sigreturn
  2007-11-12 19:08 ` Linus Torvalds
  2007-11-12 19:33   ` H. Peter Anvin
@ 2007-11-12 21:16   ` Andi Kleen
  2007-11-13  0:11     ` H. Peter Anvin
  1 sibling, 1 reply; 6+ messages in thread
From: Andi Kleen @ 2007-11-12 21:16 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Siddha, Suresh B, linux-kernel, mingo, hpa, tglx, akpm

> 
> The *real* fix for this is almost certainly to just get rid of the 64-bit 
> code entirely, and use the 32-bit code as the base for one single unified 
> setup.

That would likely break the ABI. x86-64 ABI is completely different here --
no ibcs, just pure x86 ISA.

I always thought direct FXSAVE from/to user space to be a cute trick, but yes 
the exception Suresh noticed makes it lose some of its beauty.

> The 32-bit code should be largely a superset of the 64-bit code  
> anyway, since it has to handle more cases, and does it more cleanly.

If you consider compat code 64bit handles as many cases as 32bit.

> which isn't exactly pretty, but the memory address generation works fine 
> in 32-bit code too, and the rex override is easily done with
> 
> 	#ifdef CONFIG_X86_64
> 	  #define REX64 "rex64/"
> 	#else
> 	  #define REX64 ""
> 	#endif
> 
> and then you just use
> 
> 	REX64 "fxsave"

That didn't work on older assemblers.

> But maybe I'm missing some reason why it doesn't matter. The 32-bit code 
> was fixed back in 2003 (commit 5bff44fc272b948a85e893a007d01b9dfb3ad04f 

64bit FPU semantics are somewhat different. I don't remember if this
particular issue was addressed or not, but I fixed a few shared
bugs in a quite different way on 64bit vs 32bit. If anybody wants
to change something here don't assume they are the same.

>From a cursory look it's probably broken though.

-Andi

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [patch] x86: fix taking DNA during 64bit sigreturn
  2007-11-12 21:16   ` Andi Kleen
@ 2007-11-13  0:11     ` H. Peter Anvin
  2007-11-13  0:43       ` Andi Kleen
  0 siblings, 1 reply; 6+ messages in thread
From: H. Peter Anvin @ 2007-11-13  0:11 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Linus Torvalds, Siddha, Suresh B, linux-kernel, mingo, tglx, akpm

Andi Kleen wrote:
>> The *real* fix for this is almost certainly to just get rid of the 64-bit 
>> code entirely, and use the 32-bit code as the base for one single unified 
>> setup.
> 
> That would likely break the ABI. x86-64 ABI is completely different here --
> no ibcs, just pure x86 ISA.
> 

Different ABIs clearly have to be handled, but I don't think that is a 
huge deal.  The i387 code overall (not just asm/i387.h) is very 
different, though, and I find it unlikely that a properly unified code 
is going to be ready and working in the 2.6.24 timeframe.  I'm exploring 
if a partial merge with high confidence level is feasible; either way, a 
proper merge for 2.6.25 is probably the right thing.

	-hpa

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [patch] x86: fix taking DNA during 64bit sigreturn
  2007-11-13  0:11     ` H. Peter Anvin
@ 2007-11-13  0:43       ` Andi Kleen
  0 siblings, 0 replies; 6+ messages in thread
From: Andi Kleen @ 2007-11-13  0:43 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Siddha, Suresh B, linux-kernel, mingo, tglx, akpm

On Tuesday 13 November 2007 01:11, H. Peter Anvin wrote:
> Andi Kleen wrote:
> >> The *real* fix for this is almost certainly to just get rid of the
> >> 64-bit code entirely, and use the 32-bit code as the base for one single
> >> unified setup.
> >
> > That would likely break the ABI. x86-64 ABI is completely different here
> > -- no ibcs, just pure x86 ISA.
>
> Different ABIs clearly have to be handled, 

Well all four cases are quite different.

> but I don't think that is a 
> huge deal.  The i387 code overall (not just asm/i387.h) is very
> different,

Yes because x86-64 is vastly simpler. e.g. the 64bit path 
does not touch the state itself at all. And all the state convert
code moved into the compat layer only. There were also
various other cleanups.

Also 64bit e.g. defers some of the checking to the context
switch, which also simplifies things.

> though, and I find it unlikely that a properly unified code 
> is going to be ready and working in the 2.6.24 timeframe. 

To be honest it is unclear to me how you can do better than
direct FXSAVE from/to user space on 64bit.  Sure you could
put a manual convert pass inbetween like i386, but that
would be rather pointless because it matches 1:1 and in fact would
make the code to be worse (no transparent support of future
FXSAVE extensions) . Not sure what tripped
Linus up so much either. Was it the broken white space?

> I'm exploring 
> if a partial merge with high confidence level is feasible; either way, a
> proper merge for 2.6.25 is probably the right thing.

At least on 64bit the result would be probably more complicated
and likely slower code.

If anything it might make sense to add a few more of the 64bit 
improvements to 32bit (e.g. delayed exception checking).
The direct FXSAVE path cannot be moved over because it
would break the ABI, otherwise I would have done that 
long ago.

BTW if you're looking for real fixes to do iirc some of the FPU
context switch workarounds are missing in KVM which
does mostly its own thing currently.

-Andi

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2007-11-13  0:47 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-11-11 19:27 [patch] x86: fix taking DNA during 64bit sigreturn Siddha, Suresh B
2007-11-12 19:08 ` Linus Torvalds
2007-11-12 19:33   ` H. Peter Anvin
2007-11-12 21:16   ` Andi Kleen
2007-11-13  0:11     ` H. Peter Anvin
2007-11-13  0:43       ` Andi Kleen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox