public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] x86: warn on apic error
@ 2008-07-18 17:28 Vegard Nossum
  2008-07-18 17:44 ` Cyrill Gorcunov
  2008-07-18 19:02 ` Maciej W. Rozycki
  0 siblings, 2 replies; 10+ messages in thread
From: Vegard Nossum @ 2008-07-18 17:28 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Maciej W. Rozycki, linux-kernel

>From e89f2a9f33d01a2df7553b63cb1df525c6e75ad4 Mon Sep 17 00:00:00 2001
From: Vegard Nossum <vegard.nossum@gmail.com>
Date: Fri, 18 Jul 2008 19:14:06 +0200
Subject: [PATCH] x86: warn on apic error

There are certain APIC errors which are obviously programmer errors,
e.g. writing to illegal APIC registers, or sending invalid interrupt
vectors. Since the error interrupt happens spot on the erroneous code,
we might as well make a bit of noise about it and display the stack-
trace.

Cc: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
---
 arch/x86/kernel/apic_32.c |    1 +
 arch/x86/kernel/apic_64.c |    1 +
 2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/apic_32.c b/arch/x86/kernel/apic_32.c
index a437d02..b9289cb 100644
--- a/arch/x86/kernel/apic_32.c
+++ b/arch/x86/kernel/apic_32.c
@@ -1317,6 +1317,7 @@ void smp_error_interrupt(struct pt_regs *regs)
 	*/
 	printk(KERN_DEBUG "APIC error on CPU%d: %02lx(%02lx)\n",
 		smp_processor_id(), v , v1);
+	WARN_ON(v1 & ((1 << 0) | (1 << 2) | (1 << 5) | (1 << 7)));
 	irq_exit();
 }
 
diff --git a/arch/x86/kernel/apic_64.c b/arch/x86/kernel/apic_64.c
index 1e3d32e..2d959f2 100644
--- a/arch/x86/kernel/apic_64.c
+++ b/arch/x86/kernel/apic_64.c
@@ -997,6 +997,7 @@ asmlinkage void smp_error_interrupt(void)
 	*/
 	printk(KERN_DEBUG "APIC error on CPU%d: %02x(%02x)\n",
 		smp_processor_id(), v , v1);
+	WARN_ON(v1 & ((1 << 0) | (1 << 2) | (1 << 5) | (1 << 7)));
 	irq_exit();
 }
 
-- 
1.5.5.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] x86: warn on apic error
  2008-07-18 17:28 [PATCH] x86: warn on apic error Vegard Nossum
@ 2008-07-18 17:44 ` Cyrill Gorcunov
  2008-07-18 17:45   ` Vegard Nossum
  2008-07-18 19:02 ` Maciej W. Rozycki
  1 sibling, 1 reply; 10+ messages in thread
From: Cyrill Gorcunov @ 2008-07-18 17:44 UTC (permalink / raw)
  To: Vegard Nossum; +Cc: Ingo Molnar, Maciej W. Rozycki, linux-kernel

[Vegard Nossum - Fri, Jul 18, 2008 at 07:28:21PM +0200]
| From e89f2a9f33d01a2df7553b63cb1df525c6e75ad4 Mon Sep 17 00:00:00 2001
| From: Vegard Nossum <vegard.nossum@gmail.com>
| Date: Fri, 18 Jul 2008 19:14:06 +0200
| Subject: [PATCH] x86: warn on apic error
| 
| There are certain APIC errors which are obviously programmer errors,
| e.g. writing to illegal APIC registers, or sending invalid interrupt
| vectors. Since the error interrupt happens spot on the erroneous code,
| we might as well make a bit of noise about it and display the stack-
| trace.
| 
| Cc: Maciej W. Rozycki <macro@linux-mips.org>
| Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
| ---
|  arch/x86/kernel/apic_32.c |    1 +
|  arch/x86/kernel/apic_64.c |    1 +
|  2 files changed, 2 insertions(+), 0 deletions(-)
| 
| diff --git a/arch/x86/kernel/apic_32.c b/arch/x86/kernel/apic_32.c
| index a437d02..b9289cb 100644
| --- a/arch/x86/kernel/apic_32.c
| +++ b/arch/x86/kernel/apic_32.c
| @@ -1317,6 +1317,7 @@ void smp_error_interrupt(struct pt_regs *regs)
|  	*/
|  	printk(KERN_DEBUG "APIC error on CPU%d: %02lx(%02lx)\n",
|  		smp_processor_id(), v , v1);
| +	WARN_ON(v1 & ((1 << 0) | (1 << 2) | (1 << 5) | (1 << 7)));
|  	irq_exit();
|  }
|  
| diff --git a/arch/x86/kernel/apic_64.c b/arch/x86/kernel/apic_64.c
| index 1e3d32e..2d959f2 100644
| --- a/arch/x86/kernel/apic_64.c
| +++ b/arch/x86/kernel/apic_64.c
| @@ -997,6 +997,7 @@ asmlinkage void smp_error_interrupt(void)
|  	*/
|  	printk(KERN_DEBUG "APIC error on CPU%d: %02x(%02x)\n",
|  		smp_processor_id(), v , v1);
| +	WARN_ON(v1 & ((1 << 0) | (1 << 2) | (1 << 5) | (1 << 7)));
|  	irq_exit();
|  }
|  
| -- 
| 1.5.5.1
| 
| 

Hi Vegard, i think you better should use  #APIC_ESR_... macroses
from apicdef.h instead of hardcoded bits.

		- Cyrill -

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] x86: warn on apic error
  2008-07-18 17:44 ` Cyrill Gorcunov
@ 2008-07-18 17:45   ` Vegard Nossum
  2008-07-18 17:49     ` Cyrill Gorcunov
  0 siblings, 1 reply; 10+ messages in thread
From: Vegard Nossum @ 2008-07-18 17:45 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Ingo Molnar, Maciej W. Rozycki, linux-kernel

On Fri, Jul 18, 2008 at 7:44 PM, Cyrill Gorcunov <gorcunov@gmail.com> wrote:
> [Vegard Nossum - Fri, Jul 18, 2008 at 07:28:21PM +0200]
> | From e89f2a9f33d01a2df7553b63cb1df525c6e75ad4 Mon Sep 17 00:00:00 2001
> | From: Vegard Nossum <vegard.nossum@gmail.com>
> | Date: Fri, 18 Jul 2008 19:14:06 +0200
> | Subject: [PATCH] x86: warn on apic error
> |
> | There are certain APIC errors which are obviously programmer errors,
> | e.g. writing to illegal APIC registers, or sending invalid interrupt
> | vectors. Since the error interrupt happens spot on the erroneous code,
> | we might as well make a bit of noise about it and display the stack-
> | trace.

> Hi Vegard, i think you better should use  #APIC_ESR_... macroses
> from apicdef.h instead of hardcoded bits.

OOps. I actually had this in my commit message, but it disappeared mysteriously:

    In particular, the errors we do this for are:

    - Send CS error
    - Send accept error
    - Send illegal vector
    - Illegal register address

    (The error codes are listed in a comment just above the code in
    question.)

But if these definitions exist, then I will use them. Thanks!


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] x86: warn on apic error
  2008-07-18 17:45   ` Vegard Nossum
@ 2008-07-18 17:49     ` Cyrill Gorcunov
  2008-07-18 19:09       ` Maciej W. Rozycki
  0 siblings, 1 reply; 10+ messages in thread
From: Cyrill Gorcunov @ 2008-07-18 17:49 UTC (permalink / raw)
  To: Vegard Nossum; +Cc: Ingo Molnar, Maciej W. Rozycki, linux-kernel

[Vegard Nossum - Fri, Jul 18, 2008 at 07:45:53PM +0200]
| On Fri, Jul 18, 2008 at 7:44 PM, Cyrill Gorcunov <gorcunov@gmail.com> wrote:
| > [Vegard Nossum - Fri, Jul 18, 2008 at 07:28:21PM +0200]
| > | From e89f2a9f33d01a2df7553b63cb1df525c6e75ad4 Mon Sep 17 00:00:00 2001
| > | From: Vegard Nossum <vegard.nossum@gmail.com>
| > | Date: Fri, 18 Jul 2008 19:14:06 +0200
| > | Subject: [PATCH] x86: warn on apic error
| > |
| > | There are certain APIC errors which are obviously programmer errors,
| > | e.g. writing to illegal APIC registers, or sending invalid interrupt
| > | vectors. Since the error interrupt happens spot on the erroneous code,
| > | we might as well make a bit of noise about it and display the stack-
| > | trace.
| 
| > Hi Vegard, i think you better should use  #APIC_ESR_... macroses
| > from apicdef.h instead of hardcoded bits.
| 
| OOps. I actually had this in my commit message, but it disappeared mysteriously:
| 
|     In particular, the errors we do this for are:
| 
|     - Send CS error
|     - Send accept error
|     - Send illegal vector
|     - Illegal register address
| 
|     (The error codes are listed in a comment just above the code in
|     question.)
| 
| But if these definitions exist, then I will use them. Thanks!
| 
| 
| Vegard
| 
| -- 
| "The animistic metaphor of the bug that maliciously sneaked in while
| the programmer was not looking is intellectually dishonest as it
| disguises that the error is the programmer's own creation."
| 	-- E. W. Dijkstra, EWD1036
| 

iirc, they all were there (though some error codes are specific for
particular processor classes like P4, pentium and other - don't remeber
you could check intel dev manual for this).

		- Cyrill -

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] x86: warn on apic error
  2008-07-18 17:28 [PATCH] x86: warn on apic error Vegard Nossum
  2008-07-18 17:44 ` Cyrill Gorcunov
@ 2008-07-18 19:02 ` Maciej W. Rozycki
  1 sibling, 0 replies; 10+ messages in thread
From: Maciej W. Rozycki @ 2008-07-18 19:02 UTC (permalink / raw)
  To: Vegard Nossum; +Cc: Ingo Molnar, linux-kernel

On Fri, 18 Jul 2008, Vegard Nossum wrote:

> There are certain APIC errors which are obviously programmer errors,
> e.g. writing to illegal APIC registers, or sending invalid interrupt
> vectors. Since the error interrupt happens spot on the erroneous code,
> we might as well make a bit of noise about it and display the stack-
> trace.
[...]
> @@ -1317,6 +1317,7 @@ void smp_error_interrupt(struct pt_regs *regs)
>  	*/
>  	printk(KERN_DEBUG "APIC error on CPU%d: %02lx(%02lx)\n",
>  		smp_processor_id(), v , v1);
> +	WARN_ON(v1 & ((1 << 0) | (1 << 2) | (1 << 5) | (1 << 7)));

 Hmm, I think there is no point in dumping state on send checksum errors
as these normally signify a problem like a data line driven low on the
inter-APIC bus by another agent while the sender drove it high.  Barring a
faulty chip, it would normally happen as a result of a problem with
arbitration, such as when duplicate IDs would happen on the bus.  In this
case finding the triggering command would not help at all as the culprit
would lie elsewhere.  It could result from crosstalk or some other problem
with board hardware too.

 The rest is fine with me.

 You need to check (v | v1) though as for Pentium processor integrated
APICs the error triggering the interrupt is provided by the first read of
the ESR register and for later implementations it is the second read that
does that after the latches have been updated by the write access between.

  Maciej

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] x86: warn on apic error
  2008-07-18 17:49     ` Cyrill Gorcunov
@ 2008-07-18 19:09       ` Maciej W. Rozycki
  2008-07-18 19:13         ` Cyrill Gorcunov
  2008-07-19 12:59         ` Cyrill Gorcunov
  0 siblings, 2 replies; 10+ messages in thread
From: Maciej W. Rozycki @ 2008-07-18 19:09 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Vegard Nossum, Ingo Molnar, linux-kernel

On Fri, 18 Jul 2008, Cyrill Gorcunov wrote:

> iirc, they all were there (though some error codes are specific for
> particular processor classes like P4, pentium and other - don't remeber
> you could check intel dev manual for this).

 This is an error in some newer Intel documents -- the set of supported
bits is the same for all APICs implementing the ESR register and the error
interrupt (the 82489DX is the one to provide neither).

  Maciej

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] x86: warn on apic error
  2008-07-18 19:09       ` Maciej W. Rozycki
@ 2008-07-18 19:13         ` Cyrill Gorcunov
  2008-07-19 12:59         ` Cyrill Gorcunov
  1 sibling, 0 replies; 10+ messages in thread
From: Cyrill Gorcunov @ 2008-07-18 19:13 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Vegard Nossum, Ingo Molnar, linux-kernel

[Maciej W. Rozycki - Fri, Jul 18, 2008 at 08:09:20PM +0100]
| On Fri, 18 Jul 2008, Cyrill Gorcunov wrote:
| 
| > iirc, they all were there (though some error codes are specific for
| > particular processor classes like P4, pentium and other - don't remeber
| > you could check intel dev manual for this).
| 
|  This is an error in some newer Intel documents -- the set of supported
| bits is the same for all APICs implementing the ESR register and the error
| interrupt (the 82489DX is the one to provide neither).
| 
|   Maciej
| 

thanks, Maciej

		- Cyrill -

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] x86: warn on apic error
  2008-07-18 19:09       ` Maciej W. Rozycki
  2008-07-18 19:13         ` Cyrill Gorcunov
@ 2008-07-19 12:59         ` Cyrill Gorcunov
  2008-07-19 23:08           ` Maciej W. Rozycki
  1 sibling, 1 reply; 10+ messages in thread
From: Cyrill Gorcunov @ 2008-07-19 12:59 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Vegard Nossum, Ingo Molnar, linux-kernel

[Maciej W. Rozycki - Fri, Jul 18, 2008 at 08:09:20PM +0100]
| On Fri, 18 Jul 2008, Cyrill Gorcunov wrote:
| 
| > iirc, they all were there (though some error codes are specific for
| > particular processor classes like P4, pentium and other - don't remeber
| > you could check intel dev manual for this).
| 
|  This is an error in some newer Intel documents -- the set of supported
| bits is the same for all APICs implementing the ESR register and the error
| interrupt (the 82489DX is the one to provide neither).
| 
|   Maciej
|

But Maciej, Intel doesn't claim about absence of bit but
rather about their 'reserved' state so checking the reserved
bit could be false alarm if it was being set by some reason.

		- Cyrill -

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] x86: warn on apic error
  2008-07-19 12:59         ` Cyrill Gorcunov
@ 2008-07-19 23:08           ` Maciej W. Rozycki
  2008-07-20  6:38             ` Cyrill Gorcunov
  0 siblings, 1 reply; 10+ messages in thread
From: Maciej W. Rozycki @ 2008-07-19 23:08 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Vegard Nossum, Ingo Molnar, linux-kernel

On Sat, 19 Jul 2008, Cyrill Gorcunov wrote:

> [Maciej W. Rozycki - Fri, Jul 18, 2008 at 08:09:20PM +0100]
> | On Fri, 18 Jul 2008, Cyrill Gorcunov wrote:
> | 
> | > iirc, they all were there (though some error codes are specific for
> | > particular processor classes like P4, pentium and other - don't remeber
> | > you could check intel dev manual for this).
> | 
> |  This is an error in some newer Intel documents -- the set of supported
> | bits is the same for all APICs implementing the ESR register and the error
> | interrupt (the 82489DX is the one to provide neither).
> | 
> |   Maciej
> |
> 
> But Maciej, Intel doesn't claim about absence of bit but
> rather about their 'reserved' state so checking the reserved
> bit could be false alarm if it was being set by some reason.

 As I say -- all later manuals have an error here (not the first and
presumably not the last one), where they state the invalid register error
has been only implemented since the P6.  This is not true -- go find the
Pentium manual or experiment with actual hardware to see otherwise.  No
new bits have been added since the introduction of the ESR, so there is no
issue with bits to mask out on older silicon.

 Obviously the checksum and accept error bits are valid for APICs using
the dedicated inter-APIC bus only as for the FSB they are irrelevant.  
They have to be hardwired to zero for backward compatibility though.

  Maciej

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] x86: warn on apic error
  2008-07-19 23:08           ` Maciej W. Rozycki
@ 2008-07-20  6:38             ` Cyrill Gorcunov
  0 siblings, 0 replies; 10+ messages in thread
From: Cyrill Gorcunov @ 2008-07-20  6:38 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Vegard Nossum, Ingo Molnar, linux-kernel

[Maciej W. Rozycki - Sun, Jul 20, 2008 at 12:08:23AM +0100]
| On Sat, 19 Jul 2008, Cyrill Gorcunov wrote:
| 
| > [Maciej W. Rozycki - Fri, Jul 18, 2008 at 08:09:20PM +0100]
| > | On Fri, 18 Jul 2008, Cyrill Gorcunov wrote:
| > | 
| > | > iirc, they all were there (though some error codes are specific for
| > | > particular processor classes like P4, pentium and other - don't remeber
| > | > you could check intel dev manual for this).
| > | 
| > |  This is an error in some newer Intel documents -- the set of supported
| > | bits is the same for all APICs implementing the ESR register and the error
| > | interrupt (the 82489DX is the one to provide neither).
| > | 
| > |   Maciej
| > |
| > 
| > But Maciej, Intel doesn't claim about absence of bit but
| > rather about their 'reserved' state so checking the reserved
| > bit could be false alarm if it was being set by some reason.
| 
|  As I say -- all later manuals have an error here (not the first and
| presumably not the last one), where they state the invalid register error
| has been only implemented since the P6.  This is not true -- go find the
| Pentium manual or experiment with actual hardware to see otherwise.  No
| new bits have been added since the introduction of the ESR, so there is no
| issue with bits to mask out on older silicon.
| 
|  Obviously the checksum and accept error bits are valid for APICs using
| the dedicated inter-APIC bus only as for the FSB they are irrelevant.  
| They have to be hardwired to zero for backward compatibility though.
| 
|   Maciej
| 

Thanks a lot, Maciej!

		- Cyrill -

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2008-07-20  6:39 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-07-18 17:28 [PATCH] x86: warn on apic error Vegard Nossum
2008-07-18 17:44 ` Cyrill Gorcunov
2008-07-18 17:45   ` Vegard Nossum
2008-07-18 17:49     ` Cyrill Gorcunov
2008-07-18 19:09       ` Maciej W. Rozycki
2008-07-18 19:13         ` Cyrill Gorcunov
2008-07-19 12:59         ` Cyrill Gorcunov
2008-07-19 23:08           ` Maciej W. Rozycki
2008-07-20  6:38             ` Cyrill Gorcunov
2008-07-18 19:02 ` Maciej W. Rozycki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox