* [RFC][PATCH] fix target_cpus() for summit subarch
@ 2004-08-28 0:24 john stultz
2004-08-28 2:02 ` William Lee Irwin III
2004-08-28 6:17 ` Martin J. Bligh
0 siblings, 2 replies; 7+ messages in thread
From: john stultz @ 2004-08-28 0:24 UTC
To: lkml
Cc: William Lee Irwin III, James, keith maanthey, Chris McDermott,
Martin J. Bligh
I've been hunting down a bug affecting IBM x440/x445 systems where the
floppy driver would get spurious interrupts and would not initialize
properly.
After digging, James Cleverdon pointed out that target_cpus() is routing
the interrupts to the clustered apic broadcast mask. This was causing
multiple interrupts to show up, breaking the floppy init code.
This one-liner fix simply routes interrupts to the first cpu to resolve
this issue.
Any comments or feedback would be appreciated.
thanks
-john
===== include/asm-i386/mach-summit/mach_apic.h 1.38 vs edited =====
--- 1.38/include/asm-i386/mach-summit/mach_apic.h 2004-06-24 01:55:52 -07:00
+++ edited/include/asm-i386/mach-summit/mach_apic.h 2004-08-27 16:43:22 -07:00
@@ -19,7 +19,7 @@
 static inline cpumask_t target_cpus(void)
 {
-	return CPU_MASK_ALL;
+	return cpumask_of_cpu(0);
 }
 #define TARGET_CPUS (target_cpus())
* Re: [RFC][PATCH] fix target_cpus() for summit subarch
2004-08-28 0:24 [RFC][PATCH] fix target_cpus() for summit subarch john stultz
@ 2004-08-28 2:02 ` William Lee Irwin III
2004-08-28 6:17 ` Martin J. Bligh
1 sibling, 0 replies; 7+ messages in thread
From: William Lee Irwin III @ 2004-08-28 2:02 UTC
To: john stultz; +Cc: lkml, James, keith maanthey, Chris McDermott
On Fri, Aug 27, 2004 at 05:24:48PM -0700, john stultz wrote:
> I've been hunting down a bug affecting IBM x440/x445 systems where the
> floppy driver would get spurious interrupts and would not initialize
> properly.
> After digging, James Cleverdon pointed out that target_cpus() is routing
> the interrupts to the clustered apic broadcast mask. This was causing
> multiple interrupts to show up, breaking the floppy init code.
> This one-liner fix simply routes interrupts to the first cpu to resolve
> this issue.
> Any comments or feedback would be appreciated.
You're using fixed delivery mode, so non-singleton destinations break.
If lowest prio delivery mode saw this, you'd have IO-APIC errata.
-- wli
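
To make the point above concrete, here is a toy model of the two delivery
modes. It is only a sketch, not kernel or hardware code: deliver(),
MODE_FIXED, MODE_LOWEST_PRIO and the prio[] array are hypothetical names,
and prio[] merely stands in for the real priority arbitration. With a
multi-CPU destination, fixed mode hands the same interrupt to every CPU
that matches, while lowest priority picks exactly one.

```c
/* Toy model only: these names are hypothetical, not kernel or APIC APIs. */
enum delivery_mode { MODE_FIXED, MODE_LOWEST_PRIO };

/*
 * Return the bitmask of CPUs that would accept a single interrupt.
 * dest is a flat logical destination bitmask (bit n = CPU n).
 * prio[n] is a stand-in for CPU n's arbitration priority.
 */
static unsigned deliver(enum delivery_mode mode, unsigned dest,
                        const int *prio, int ncpus)
{
    unsigned accepted = 0;
    int best = -1;

    for (int cpu = 0; cpu < ncpus; cpu++) {
        if (!(dest & (1u << cpu)))
            continue;                   /* CPU is not in the destination */
        if (mode == MODE_FIXED)
            accepted |= 1u << cpu;      /* every matching CPU takes it */
        else if (best < 0 || prio[cpu] < prio[best])
            best = cpu;                 /* remember the lowest priority */
    }
    if (mode == MODE_LOWEST_PRIO && best >= 0)
        accepted = 1u << best;          /* exactly one CPU takes it */
    return accepted;
}
```

In this model a destination of 0xF in fixed mode is accepted by all four
CPUs, which is the "multiple interrupts" symptom the floppy driver saw; a
singleton destination behaves the same in either mode.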
* Re: [RFC][PATCH] fix target_cpus() for summit subarch
2004-08-28 0:24 [RFC][PATCH] fix target_cpus() for summit subarch john stultz
2004-08-28 2:02 ` William Lee Irwin III
@ 2004-08-28 6:17 ` Martin J. Bligh
2004-08-30 18:03 ` john stultz
1 sibling, 1 reply; 7+ messages in thread
From: Martin J. Bligh @ 2004-08-28 6:17 UTC
To: john stultz, lkml
Cc: William Lee Irwin III, James, keith maanthey, Chris McDermott
--john stultz <johnstul@us.ibm.com> wrote (on Friday, August 27, 2004 17:24:48 -0700):
> I've been hunting down a bug affecting IBM x440/x445 systems where the
> floppy driver would get spurious interrupts and would not initialize
> properly.
>
> After digging, James Cleverdon pointed out that target_cpus() is routing
> the interrupts to the clustered apic broadcast mask. This was causing
> multiple interrupts to show up, breaking the floppy init code.
>
> This one-liner fix simply routes interrupts to the first cpu to resolve
> this issue.
I'd say that means your hardware is horribly broken ... but I guess this
might be a suitable workaround given we're going to reprogram them all
later.
So ... do all your interrupts end up on the first cpu now, or does
irqbalance take care of it?
M.
* Re: [RFC][PATCH] fix target_cpus() for summit subarch
2004-08-28 6:17 ` Martin J. Bligh
@ 2004-08-30 18:03 ` john stultz
2004-08-30 20:46 ` john stultz
2004-08-30 21:24 ` James Cleverdon
0 siblings, 2 replies; 7+ messages in thread
From: john stultz @ 2004-08-30 18:03 UTC
To: Martin J. Bligh
Cc: lkml, William Lee Irwin III, James, keith maanthey,
Chris McDermott
On Fri, 2004-08-27 at 23:17, Martin J. Bligh wrote:
> --john stultz <johnstul@us.ibm.com> wrote (on Friday, August 27, 2004 17:24:48 -0700):
>
> > I've been hunting down a bug affecting IBM x440/x445 systems where the
> > floppy driver would get spurious interrupts and would not initialize
> > properly.
> >
> > After digging, James Cleverdon pointed out that target_cpus() is routing
> > the interrupts to the clustered apic broadcast mask. This was causing
> > multiple interrupts to show up, breaking the floppy init code.
> >
> > This one-liner fix simply routes interrupts to the first cpu to resolve
> > this issue.
>
> I'd say that means your hardware is horribly broken ... but I guess this
> might be a suitable workaround given we're going to reprogram them all
> later.
Ok, then my patch probably isn't correct. Let me grab James and we'll
sit down and work this out later today.
thanks
-john
* Re: [RFC][PATCH] fix target_cpus() for summit subarch
2004-08-30 18:03 ` john stultz
@ 2004-08-30 20:46 ` john stultz
2004-08-30 21:24 ` James Cleverdon
1 sibling, 0 replies; 7+ messages in thread
From: john stultz @ 2004-08-30 20:46 UTC
To: Martin J. Bligh
Cc: lkml, William Lee Irwin III, James, keith maanthey,
Chris McDermott
On Mon, 2004-08-30 at 11:03, john stultz wrote:
> On Fri, 2004-08-27 at 23:17, Martin J. Bligh wrote:
> > --john stultz <johnstul@us.ibm.com> wrote (on Friday, August 27, 2004 17:24:48 -0700):
> >
> > > I've been hunting down a bug affecting IBM x440/x445 systems where the
> > > floppy driver would get spurious interrupts and would not initialize
> > > properly.
> > >
> > > After digging, James Cleverdon pointed out that target_cpus() is routing
> > > the interrupts to the clustered apic broadcast mask. This was causing
> > > multiple interrupts to show up, breaking the floppy init code.
> > >
> > > This one-liner fix simply routes interrupts to the first cpu to resolve
> > > this issue.
> >
> > I'd say that means your hardware is horribly broken ... but I guess this
> > might be a suitable workaround given we're going to reprogram them all
> > later.
>
> Ok, then my patch probably isn't correct. Let me grab James and we'll
> sit down and work this out later today.
So after talking more with James and Martin, the correct fix looks to be
Bill's suggestion of using Lowest Priority instead of Fixed for the
delivery mode.
James still claims CPU_MASK_ALL (0xff) is wrong for target_cpus(), but
I'll let him make his case for that.
After some testing, I'll resend the corrected patch.
thanks
-john
* Re: [RFC][PATCH] fix target_cpus() for summit subarch
2004-08-30 18:03 ` john stultz
2004-08-30 20:46 ` john stultz
@ 2004-08-30 21:24 ` James Cleverdon
2004-08-31 1:06 ` john stultz
1 sibling, 1 reply; 7+ messages in thread
From: James Cleverdon @ 2004-08-30 21:24 UTC
To: john stultz, Martin J. Bligh
Cc: lkml, William Lee Irwin III, keith maanthey, Chris McDermott
I'm fine with changing the delivery mode to dest_LowestPrio. However,
someone changed the default destination mask that target_cpus() returns
from XAPIC_DEST_CPUS_MASK (0F) to APIC_ALL_CPUS (FF). The latter value
is a bad idea. I'm unaware of anyone's hardware that will correctly
arbitrate dest_LowestPrio among all CPUs of all clusters. (Please
correct me if I'm wrong.) By chance, FF mostly works on IBM Summit
(EXA) chips, but we can't rely on that in the future.
And if the delivery mode is dest_Fixed, then FF means "broadcast to all
CPUs", which is plainly wrong too.
It would be safer to change back to some value whose behavior is well
defined in Intel's docs, like XAPIC_DEST_CPUS_MASK or John's
suggestion, cpumask_of_cpu(0). Either one will cause almost all
interrupts to land on CPU 0 for P4s. IRQ balancing will shift them to
other processors soon enough.
I note that in 2.6.8.1 the other clustered sub-arches do something
similar to John's or my suggestion. Only numaq uses APIC_ALL_CPUS, and
it has special APIC cluster controllers. (Even there, 0F is arguably
clearer than FF, but the custom chips never route dest_LowestPrio
interrupts outside of the local cluster, so the upper nibble doesn't
matter.)
Alternative patch for include/asm-i386/mach-summit/mach_apic.h:
 static inline cpumask_t target_cpus(void)
 {
-	return CPU_MASK_ALL;
+	/* Start on cluster 0. IRQ balancing will spread load soon. */
+	return XAPIC_DEST_CPUS_MASK;
 }
 #define TARGET_CPUS (target_cpus())
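
For readers unfamiliar with the 0F versus FF distinction, the following is
a simplified sketch of how an 8-bit logical destination is read in xAPIC
cluster mode: the upper nibble names a cluster and the lower nibble is a
bitmask of up to four CPUs inside it, with all ones acting as a broadcast.
The helper names are hypothetical and the match rule is a model of the
documented behaviour, not code from the kernel or a statement about what
Summit hardware actually does.

```c
#include <stdint.h>

/* Hypothetical helpers; the layout modelled is xAPIC cluster mode:
 * high nibble = cluster ID, low nibble = one bit per CPU in the cluster. */
static inline uint8_t dest_cluster(uint8_t dest)  { return dest >> 4; }
static inline uint8_t dest_cpu_bits(uint8_t dest) { return dest & 0x0F; }

/* Would a local APIC whose logical ID is ldr accept destination dest? */
static int cluster_dest_matches(uint8_t ldr, uint8_t dest)
{
    if (dest == 0xFF)                   /* all ones: every CPU, every cluster */
        return 1;
    return dest_cluster(ldr) == dest_cluster(dest) &&
           (dest_cpu_bits(ldr) & dest_cpu_bits(dest)) != 0;
}
```

Under this model 0x0F selects the four CPUs of cluster 0 only, whereas 0xFF
matches CPUs in every cluster, which is why it is a poor default for either
delivery mode.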
On Monday 30 August 2004 11:03 am, john stultz wrote:
> On Fri, 2004-08-27 at 23:17, Martin J. Bligh wrote:
> > --john stultz <johnstul@us.ibm.com> wrote (on Friday, August 27, 2004 17:24:48 -0700):
> > > I've been hunting down a bug affecting IBM x440/x445 systems
> > > where the floppy driver would get spurious interrupts and would
> > > not initialize properly.
> > >
> > > After digging, James Cleverdon pointed out that target_cpus() is
> > > routing the interrupts to the clustered apic broadcast mask. This
> > > was causing multiple interrupts to show up, breaking the floppy
> > > init code.
> > >
> > > This one-liner fix simply routes interrupts to the first cpu to
> > > resolve this issue.
> >
> > I'd say that means your hardware is horribly broken ... but I guess
> > this might be a suitable workaround given we're going to reprogram
> > them all later.
>
> Ok, then my patch probably isn't correct. Let me grab James and we'll
> sit down and work this out later today.
>
> thanks
> -john
--
James Cleverdon
IBM LTC (xSeries Linux Solutions)
{jamesclv(Unix, preferred), cleverdj(Notes)} at us dot ibm dot comm
* Re: [RFC][PATCH] fix target_cpus() for summit subarch
2004-08-30 21:24 ` James Cleverdon
@ 2004-08-31 1:06 ` john stultz
0 siblings, 0 replies; 7+ messages in thread
From: john stultz @ 2004-08-31 1:06 UTC
To: James
Cc: Martin J. Bligh, lkml, William Lee Irwin III, keith maanthey,
Chris McDermott
On Mon, 2004-08-30 at 14:24, James Cleverdon wrote:
> I'm fine with changing the delivery mode to dest_LowestPrio. However,
> someone changed the default destination mask that target_cpus() returns
> from XAPIC_DEST_CPUS_MASK (0F) to APIC_ALL_CPUS (FF). The latter value
> is a bad idea. I'm unaware of anyone's hardware that will correctly
> arbitrate dest_LowestPrio among all CPUs of all clusters. (Please
> correct me if I'm wrong.) By chance, FF mostly works on IBM Summit
> (EXA) chips, but we can't rely on that in the future.
Ok, here is the corrected patch. Ran it through LTP for a while and
tested a few hotplug USB devices.
If there are no other comments, I'll submit this to Andrew later this
week.
thanks
linux-2.6.9-rc1_summit-target-cpus-fix_A1
-----------------------------------------
diff -Nru a/include/asm-i386/mach-summit/mach_apic.h b/include/asm-i386/mach-summit/mach_apic.h
--- a/include/asm-i386/mach-summit/mach_apic.h 2004-08-30 17:33:02 -07:00
+++ b/include/asm-i386/mach-summit/mach_apic.h 2004-08-30 17:33:02 -07:00
@@ -19,11 +19,15 @@
 static inline cpumask_t target_cpus(void)
 {
-	return CPU_MASK_ALL;
+	/* CPU_MASK_ALL (0xff) has undefined behaviour with
+	 * logical clustered apic interrupt routing.
+	 * Just start on cpu 0. IRQ balancing will spread load
+	 */
+	return cpumask_of_cpu(0);
 }
 #define TARGET_CPUS (target_cpus())
-#define INT_DELIVERY_MODE (dest_Fixed)
+#define INT_DELIVERY_MODE (dest_LowestPrio)
 #define INT_DEST_MODE 1 /* logical delivery broadcast to all procs */
 static inline unsigned long check_apicid_used(physid_mask_t bitmap, int apicid)
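
As a rough illustration of where the three settings touched by this patch
end up, the sketch below packs a vector, delivery mode, destination mode and
destination into a 64-bit IO-APIC redirection table entry. make_rte() and
the DM_* names are hypothetical, the vector and destination values in the
note that follows are made up, and the remaining fields (polarity, trigger
mode, mask) are left at zero for brevity; only the bit positions are taken
from the IO-APIC register layout.

```c
#include <stdint.h>

/* Delivery-mode encodings in the redirection entry (hypothetical names). */
enum { DM_FIXED = 0, DM_LOWEST_PRIO = 1 };

/* Pack the fields discussed in this thread into a redirection entry. */
static uint64_t make_rte(uint8_t vector, unsigned delivery_mode,
                         unsigned logical_dest, uint8_t dest)
{
    uint64_t rte = 0;

    rte |= (uint64_t)vector;                      /* bits 0-7: vector */
    rte |= (uint64_t)(delivery_mode & 0x7) << 8;  /* bits 8-10: delivery mode */
    rte |= (uint64_t)(logical_dest & 0x1) << 11;  /* bit 11: 1 = logical */
    rte |= (uint64_t)dest << 56;                  /* bits 56-63: destination */
    return rte;
}
```

For example, make_rte(0x31, DM_LOWEST_PRIO, 1, 0x01) describes a logical,
lowest-priority entry aimed at a single CPU, while
make_rte(0x31, DM_FIXED, 1, 0xFF) is the fixed-mode broadcast shape that the
thread identifies as the problem.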
Thread overview: 7+ messages
2004-08-28 0:24 [RFC][PATCH] fix target_cpus() for summit subarch john stultz
2004-08-28 2:02 ` William Lee Irwin III
2004-08-28 6:17 ` Martin J. Bligh
2004-08-30 18:03 ` john stultz
2004-08-30 20:46 ` john stultz
2004-08-30 21:24 ` James Cleverdon
2004-08-31 1:06 ` john stultz