public inbox for linux-kernel@vger.kernel.org
* Re: FPU-intensive programs crashing with floating point exception on Cyrix MII
From: Chuck Ebbert @ 2005-08-21  9:47 UTC (permalink / raw)
  To: Ondrej Zary; +Cc: linux-kernel, Linus Torvalds, Ingo Molnar

On Thu, 18 Aug 2005 12:37:30 +0200, Ondrej Zary wrote:

> >   Could you modify this to print the full values of cwd and swd like this?
> > 
> >         printk("MATH ERROR: cwd = 0x%hx, swd = 0x%hx\n", cwd, swd);
> > 
> > Then post the result.
> MATH ERROR: cwd = 0x37f, swd = 0x5020
> MATH ERROR: cwd = 0x37f, swd = 0x20
> MATH ERROR: cwd = 0x37f, swd = 0x20
> MATH ERROR: cwd = 0x37f, swd = 0x2020
> MATH ERROR: cwd = 0x37f, swd = 0x20
> MATH ERROR: cwd = 0x37f, swd = 0x1820
> MATH ERROR: cwd = 0x37f, swd = 0x1820
> MATH ERROR: cwd = 0x37f, swd = 0x2020
> MATH ERROR: cwd = 0x37f, swd = 0x20
> MATH ERROR: cwd = 0x37f, swd = 0x2800     <===========
> MATH ERROR: cwd = 0x37f, swd = 0x1820
> MATH ERROR: cwd = 0x37f, swd = 0x820
> MATH ERROR: cwd = 0x37f, swd = 0x2820
> MATH ERROR: cwd = 0x37f, swd = 0x2820
> MATH ERROR: cwd = 0x37f, swd = 0x1820
> MATH ERROR: cwd = 0x37f, swd = 0x820
> MATH ERROR: cwd = 0x37f, swd = 0x1a20

 The error I marked has no exception flags set.  The rest are all (masked)
denormal exceptions.  Why your Cyrix MII would cause an FPU exception in these
cases is beyond me.  Could you try the statically-linked mprime program?
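The status words in the log can be decoded by hand. Here is a minimal sketch, assuming the standard x87 layout: bits 0 to 5 of both words are IE, DE, ZE, OE, UE and PE (so the 0x20 in most log lines is the PE, precision, bit), and bits 11 to 13 of the status word hold the stack top. The helper names are made up for illustration and are not kernel functions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Exception bits shared by the x87 control and status words. */
#define X87_EXC_MASK 0x003fu

/* Stack-top pointer, bits 11..13 of the status word. */
static unsigned x87_top(unsigned swd)
{
    return (swd >> 11) & 7u;
}

/* Exceptions that are flagged in swd and NOT masked in cwd. */
static unsigned x87_unmasked(unsigned cwd, unsigned swd)
{
    return ~cwd & swd & X87_EXC_MASK;
}

/* Write the names of all flagged exceptions (masked or not) into buf. */
static void x87_flag_names(unsigned swd, char *buf, size_t len)
{
    static const char *const names[6] = { "IE", "DE", "ZE", "OE", "UE", "PE" };

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (int bit = 0; bit < 6; bit++) {
        if (swd & (1u << bit)) {
            if (buf[0] != '\0')
                strncat(buf, "|", len - strlen(buf) - 1);
            strncat(buf, names[bit], len - strlen(buf) - 1);
        }
    }
}
```

With cwd = 0x37f all six exceptions are masked, so x87_unmasked() is zero for every line in the log; for swd = 0x2800 the flag list is empty as well and only the stack-top field is set.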

 I had hoped someone who knew more about FPU error handling would jump in.
The code below, from arch/i386/kernel/traps.c, sends a signal back to
userspace even when the status word shows that only a masked exception
(or none at all) has occurred.  The 'case 0x000' strongly suggests this
is deliberate, but I don't know why.


        /*
         * (~cwd & swd) will mask out exceptions that are not set to unmasked
         * status.  0x3f is the exception bits in these regs, 0x200 is the
         * C1 reg you need in case of a stack fault, 0x040 is the stack
         * fault bit.  We should only be taking one exception at a time,
         * so if this combination doesn't produce any single exception,
         * then we have a bad program that isn't synchronizing its FPU usage
         * and it will suffer the consequences since we won't be able to
         * fully reproduce the context of the exception
         */
        cwd = get_fpu_cwd(task);
        swd = get_fpu_swd(task);
        switch (((~cwd) & swd & 0x3f) | (swd & 0x240)) {
                case 0x000:
                default:
                        break;
                case 0x001: /* Invalid Op */
                case 0x041: /* Stack Fault */
                case 0x241: /* Stack Fault | Direction */
                        info.si_code = FPE_FLTINV;
                        /* Should we clear the SF or let user space do it ???? */
                        break;


(And it looks like there is a small bug in there.  The switch should be:

        switch (((~cwd) & swd & 0x3f) | (swd & 1 ? swd & 0x240 : 0)) {

because the SF and CC1 bits are only relevant when IE is set.)
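To make the difference concrete, here is a small user-space sketch that evaluates both selector expressions. The two function names are hypothetical; the expressions themselves are copied from the excerpt and from the proposed replacement.

```c
/* Selector as written in the quoted traps.c excerpt. */
static unsigned fpe_selector_current(unsigned short cwd, unsigned short swd)
{
    return ((~cwd) & swd & 0x3f) | (swd & 0x240);
}

/*
 * Selector with the proposed change: SF (0x040) and C1 (0x200) are
 * only taken into account when IE (0x001) is set in the status word.
 */
static unsigned fpe_selector_proposed(unsigned short cwd, unsigned short swd)
{
    return ((~cwd) & swd & 0x3f) | ((swd & 1) ? (swd & 0x240) : 0);
}
```

For the logged pair cwd = 0x37f, swd = 0x1a20 the current expression yields 0x200, which matches none of the cases shown in the excerpt, while the proposed one yields 0x000. For an unmasked invalid operation with a stack fault (for example cwd = 0x37e, swd = 0x241) both yield 0x241.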

__
Chuck


* Re: FPU-intensive programs crashing with floating point exception on Cyrix MII
From: Linus Torvalds @ 2005-08-21 17:52 UTC (permalink / raw)
  To: Chuck Ebbert; +Cc: Ondrej Zary, linux-kernel, Ingo Molnar



On Sun, 21 Aug 2005, Chuck Ebbert wrote:
>
> > MATH ERROR: cwd = 0x37f, swd = 0x2800     <===========
> 
>  The error I marked has no exception flags set.  The rest are all (masked)
> denormal exceptions.  Why your Cyrix MII would cause an FPU exception in these
> cases is beyond me.  Could you try the statically-linked mprime program?

Also, please try this one, to see where it happens.

			Linus

---
diff --git a/arch/i386/kernel/i8259.c b/arch/i386/kernel/i8259.c
--- a/arch/i386/kernel/i8259.c
+++ b/arch/i386/kernel/i8259.c
@@ -357,11 +357,11 @@ void init_8259A(int auto_eoi)
 
 static irqreturn_t math_error_irq(int cpl, void *dev_id, struct pt_regs *regs)
 {
-	extern void math_error(void __user *);
+	extern void math_error(struct pt_regs *);
 	outb(0,0xF0);
 	if (ignore_fpu_irq || !boot_cpu_data.hard_math)
 		return IRQ_NONE;
-	math_error((void __user *)regs->eip);
+	math_error(regs);
 	return IRQ_HANDLED;
 }
 
diff --git a/arch/i386/kernel/traps.c b/arch/i386/kernel/traps.c
--- a/arch/i386/kernel/traps.c
+++ b/arch/i386/kernel/traps.c
@@ -774,8 +774,9 @@ clear_TF_reenable:
  * the correct behaviour even in the presence of the asynchronous
  * IRQ13 behaviour
  */
-void math_error(void __user *eip)
+void math_error(struct pt_regs *regs)
 {
+	void __user *eip = (void __user *)regs->eip;
 	struct task_struct * task;
 	siginfo_t info;
 	unsigned short cwd, swd;
@@ -805,6 +806,7 @@ void math_error(void __user *eip)
 	swd = get_fpu_swd(task);
 	switch (((~cwd) & swd & 0x3f) | (swd & 0x240)) {
 		case 0x000:
+			show_regs(regs);
 		default:
 			break;
 		case 0x001: /* Invalid Op */
@@ -833,7 +835,7 @@ void math_error(void __user *eip)
 fastcall void do_coprocessor_error(struct pt_regs * regs, long error_code)
 {
 	ignore_fpu_irq = 1;
-	math_error((void __user *)regs->eip);
+	math_error(regs);
 }
 
 static void simd_math_error(void __user *eip)


* Re: FPU-intensive programs crashing with floating point exception on Cyrix MII
From: Ondrej Zary @ 2005-08-21 21:28 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Chuck Ebbert, linux-kernel, Ingo Molnar

Linus Torvalds wrote:
> 
> On Sun, 21 Aug 2005, Chuck Ebbert wrote:
> 
>>>MATH ERROR: cwd = 0x37f, swd = 0x2800     <===========
>>
>> The error I marked has no exception flags set.  The rest are all (masked)
>>denormal exceptions.  Why your Cyrix MII would cause an FPU exception in these
>>cases is beyond me.  Could you try the statically-linked mprime program?

I use only the statically linked mprime.

> Also, please try this one, to see where it happens.
I modified the code so that it calls show_regs() in both cases where I
see problems, and added a return so that the program does not crash.
The code looks like this:
---
         printk("MATH ERROR: cwd = 0x%hx, swd = 0x%hx\n", cwd, swd);
         switch (((~cwd) & swd & 0x3f) | (swd & 0x240)) {
                 case 0x000:
                 case 0x200:
                         show_regs(regs);
                         return;
---
Here are the results.

MATH ERROR: cwd = 0x37f, swd = 0x1820

Pid: 1699, comm:               mprime
EIP: 0073:[<08181c73>] CPU: 0
EIP is at 0x8181c73
  ESP: 007b:bf927ab4 EFLAGS: 00010202    Not tainted  (2.6.12-pentium)
EAX: 00000001 EBX: 00000000 ECX: 0000808d EDX: b7f09480
ESI: b7455340 EDI: 080e01f0 EBP: bf927bf8 DS: 007b ES: 007b
CR0: 8005003b CR2: b7ed6058 CR3: 006f0000 CR4: 00000080
MATH ERROR: cwd = 0x37f, swd = 0x1020

Pid: 1699, comm:               mprime
EIP: 0073:[<0818ca5f>] CPU: 0
EIP is at 0x818ca5f
  ESP: 007b:bf927ab0 EFLAGS: 00010207    Not tainted  (2.6.12-pentium)
EAX: 00000005 EBX: 00000000 ECX: 00008407 EDX: b7f08140
ESI: b789aea0 EDI: b7f08200 EBP: bf927bf8 DS: 007b ES: 007b
CR0: 8005003b CR2: b75c6000 CR3: 006f0000 CR4: 00000080
MATH ERROR: cwd = 0x37f, swd = 0x2820

Pid: 1699, comm:               mprime
EIP: 0073:[<0818c4b1>] CPU: 0
EIP is at 0x818c4b1
  ESP: 007b:bf927ab0 EFLAGS: 00010247    Not tainted  (2.6.12-pentium)
EAX: 00000000 EBX: 00000000 ECX: 0000880f EDX: b7f09480
ESI: b741fc20 EDI: 080e0160 EBP: bf927bf8 DS: 007b ES: 007b
CR0: 8005003b CR2: b75c6000 CR3: 006f0000 CR4: 00000080
MATH ERROR: cwd = 0x37f, swd = 0x20

Pid: 1699, comm:               mprime
EIP: 0073:[<08181ca1>] CPU: 0
EIP is at 0x8181ca1
  ESP: 007b:bf927ab4 EFLAGS: 00010202    Not tainted  (2.6.12-pentium)
EAX: 00000002 EBX: 00000000 ECX: 00000084 EDX: b7f09480
ESI: b74f86c0 EDI: 080e01f0 EBP: bf927bf8 DS: 007b ES: 007b
CR0: 8005003b CR2: b75c6000 CR3: 006f0000 CR4: 00000080
MATH ERROR: cwd = 0x37f, swd = 0x1a20

Pid: 1699, comm:               mprime
EIP: 0073:[<08193c68>] CPU: 0
EIP is at 0x8193c68
  ESP: 007b:bf927ab8 EFLAGS: 00010206    Not tainted  (2.6.12-pentium)
EAX: 00000042 EBX: 00000000 ECX: 00154306 EDX: b7e3ba40
ESI: b7a1e680 EDI: b7e3be40 EBP: bf927bf8 DS: 007b ES: 007b
CR0: 8005003b CR2: b7499000 CR3: 006f0000 CR4: 00000080

MATH ERROR: cwd = 0x37f, swd = 0x20

Pid: 1699, comm:               mprime
EIP: 0073:[<0818de05>] CPU: 0
EIP is at 0x818de05
  ESP: 007b:bf927ab4 EFLAGS: 00010247    Not tainted  (2.6.12-pentium)
EAX: 00000004 EBX: 00000000 ECX: 0000880f EDX: b7f06b40
ESI: b7426400 EDI: 080e1960 EBP: bf927bf8 DS: 007b ES: 007b
CR0: 8005003b CR2: b7499000 CR3: 006f0000 CR4: 00000080
MATH ERROR: cwd = 0x37f, swd = 0x20

Pid: 1699, comm:               mprime
EIP: 0073:[<0818dfe4>] CPU: 0
EIP is at 0x818dfe4
  ESP: 007b:bf927ab4 EFLAGS: 00010247    Not tainted  (2.6.12-pentium)
EAX: 00000200 EBX: 00000000 ECX: 0000880f EDX: b7f06b40
ESI: b742a680 EDI: 080e1c60 EBP: bf927bf8 DS: 007b ES: 007b
CR0: 8005003b CR2: b7499000 CR3: 006f0000 CR4: 00000080

-- 
Ondrej Zary


* Re: FPU-intensive programs crashing with floating point exception on Cyrix MII
From: Linus Torvalds @ 2005-08-21 23:10 UTC (permalink / raw)
  To: Ondrej Zary; +Cc: Chuck Ebbert, linux-kernel, Ingo Molnar



On Sun, 21 Aug 2005, Ondrej Zary wrote:
> 
> MATH ERROR: cwd = 0x37f, swd = 0x1820
> 
> Pid: 1699, comm:               mprime
> EIP: 0073:[<08181c73>] CPU: 0
> EIP is at 0x8181c73
>   ESP: 007b:bf927ab4 EFLAGS: 00010202    Not tainted  (2.6.12-pentium)
> EAX: 00000001 EBX: 00000000 ECX: 0000808d EDX: b7f09480
> ESI: b7455340 EDI: 080e01f0 EBP: bf927bf8 DS: 007b ES: 007b
> CR0: 8005003b CR2: b7ed6058 CR3: 006f0000 CR4: 00000080

Ahh, so it's actually all in user space. I was thinking that the Cyrix
chip might use the old external interrupt-based (as opposed to exception
16) FP error reporting, and that it could be some kind of asynchronous
error that raced with the kernel task switching (ie the interrupt had
triggered, and then the FPU control register had been modified before the
irq handler actually got to run).

But that doesn't seem to be the case.
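Both observations can be read off the register dump. A rough sketch, assuming the usual x86 conventions that the low two bits of a segment selector give the privilege level and that CR0 bit 5 (NE) selects native exception 16 reporting over the external interrupt path; the helper names are invented for illustration:

```c
/* CR0.NE: when set, x87 errors are reported through exception 16. */
#define CR0_NE (1ul << 5)

/* A code segment selector with RPL 3 means the CPU was in user mode. */
static int selector_is_user(unsigned short cs)
{
    return (cs & 3) == 3;
}

/* Non-zero if CR0 asks for native (exception 16) FPU error reporting. */
static int cr0_native_fpu_errors(unsigned long cr0)
{
    return (cr0 & CR0_NE) != 0;
}
```

For the dump above, CS = 0x0073 gives RPL 3 and CR0 = 0x8005003b has NE set. Whether the Cyrix part fully honours NE is a separate question that this check cannot answer.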

I don't see _why_ that exception would happen, other than a CPU bug.

Can you dump more of the FP state (the kernel doesn't have helpers for
doing that, so you'd have to write the code to print out the state by
hand)? Maybe there's some clue there - denormals or something..
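As a starting point for such a dump, here is a user-space sketch of the two pieces of decoding it would need. It assumes the standard x87 encodings: a tag word with two bits per physical register (0 valid, 1 zero, 2 special, 3 empty), and 80-bit registers stored as an 8-byte little-endian significand followed by a 2-byte sign and exponent. The function names are hypothetical, and reading the saved state out of the task structure is not shown.

```c
#include <stdint.h>

/* Tag for physical register i (0..7): 0 valid, 1 zero, 2 special, 3 empty. */
static unsigned x87_tag(uint16_t twd, unsigned i)
{
    return (twd >> (2 * (i & 7))) & 3u;
}

/*
 * Classify one saved 80-bit register image.  Returns non-zero for a
 * denormal or pseudo-denormal: biased exponent 0, significand non-zero.
 */
static int x87_reg_is_denormal(const unsigned char raw[10])
{
    uint64_t significand = 0;
    unsigned exponent;

    /* Assemble the fields byte by byte so host endianness does not matter. */
    for (int i = 7; i >= 0; i--)
        significand = (significand << 8) | raw[i];
    exponent = (raw[8] | ((unsigned)raw[9] << 8)) & 0x7fffu;

    return exponent == 0 && significand != 0;
}
```

A dump loop would print cwd, swd and twd, then walk the eight register images and flag any that x87_reg_is_denormal() reports.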

		Linus


* Re: FPU-intensive programs crashing with floating point
From: Ingo Molnar @ 2005-08-22  7:19 UTC (permalink / raw)
  To: Chuck Ebbert; +Cc: Ondrej Zary, linux-kernel, Linus Torvalds, Andrew Morton


* Chuck Ebbert <76306.1226@compuserve.com> wrote:

> (And it looks like there is a small bug in there.  The switch should 
> be:
> 
>         switch (((~cwd) & swd & 0x3f) | (swd & 1 ? swd & 0x240 : 0)) {
> 
> because the SF and CC1 bits are only relevant when IE is set.)

please send a separate patch for that against -mm to Andrew, we want 
this fixed too.

	Ingo

