RE: [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs

linux-kernel.vger.kernel.org archive mirror

* RE: [PATCH][2.4]  generic cluster APIC support for systems with more than 8 CPUs
@ 2002-12-20 15:46 ` Van Maren, Kevin
  2002-12-20 16:30   ` Martin J. Bligh
  2002-12-20 17:16   ` William Lee Irwin III
  0 siblings, 2 replies; 24+ messages in thread
From: Van Maren, Kevin @ 2002-12-20 15:46 UTC (permalink / raw)
  To: 'William Lee Irwin III', Christoph Hellwig,
	James Cleverdon, Pallipadi, Venkatesh, Linux Kernel, Martin Bligh,
	John Stultz, Nakajima, Jun, Mallick, Asit K, Saxena, Sunil,
	Van Maren, Kevin
  Cc: Protasevich, Natalie

> On Thu, Dec 19, 2002 at 06:04:55PM -0800, James Cleverdon wrote:
> >>> A generic patch should also support Unisys' new box, the ES7000 or
> >>> some such.
> 
> On Fri, Dec 20, 2002 at 08:00:50AM +0000, Christoph Hellwig wrote:
> >> That box needs more changes than just the apic setup.  Unfortunately
> >> unisys thinks they shouldn't send their patches to lkml, but when you see
> >> them e.g. in the suse tree it's a bit understandable that they don't want
> >> anyone to really see their mess :)

No need to sugar-coat anything :-)

Natalie is the engineer who added support for the ES7000 to Linux.
Fortunately she is in the cube next to me.

She has sent the patches to SuSE/United Linux, and is in the final process
of testing them on 2.5.5x before submitting them to LKML for comment.

> >> And btw, the box isn't that new, but three years ago or so when they first
> >> showed it on cebit they even refused to talk about linux due to their
> >> restrictive agreements with Microsoft..
>
> On Fri, Dec 20, 2002 at 03:24:01AM -0800, William Lee Irwin III wrote:
> > Kevin, you're the only lkml-posting contact point I know of within Unisys.
> > Is there any chance you could flag down some of the ia32 crew there for
> > some commentary on this stuff? (or do so yourself if you're in it)

I mostly work on our 16-32p IA64 machines.  Natalie or someone else will
have to comment on the clustered-apic code.

I do know that a lot of the code for the ES7000 is optional, and only
required to support value-added management functionality, which is
especially useful if you are running more than one OS instance on the
machine (it supports 8 fully-independent partitions).

Also, as a clarification, our 32-processor systems are NOT NUMA: there
is a full non-blocking crossbar to memory.  So clustered APIC support
should not be dependent on NUMA.

Kevin


* RE: [PATCH][2.4]  generic cluster APIC support for systems with more than 8 CPUs
  2002-12-20 15:46 ` Van Maren, Kevin
@ 2002-12-20 16:30   ` Martin J. Bligh
  2002-12-20 17:16   ` William Lee Irwin III
  1 sibling, 0 replies; 24+ messages in thread
From: Martin J. Bligh @ 2002-12-20 16:30 UTC (permalink / raw)
  To: Van Maren, Kevin, 'William Lee Irwin III',
	Christoph Hellwig, James Cleverdon, Pallipadi, Venkatesh,
	Linux Kernel, John Stultz, Nakajima, Jun, Mallick, Asit K,
	Saxena, Sunil
  Cc: Protasevich, Natalie

> Natalie is the engineer who added support for the ES7000 to Linux.
> Fortunately she is in the cube next to me.
>
> She has sent the patches to SuSE/United Linux, and is in the final process
> of testing them on 2.5.5x before submitting them to LKML for comment.

Are they under subarch or somehow abstracted this time, or are there
going to be 10,000 ifdef's again? If the latter, I can predict what
the first set of review comments you get are going to be ;-)

M.



* Re: [PATCH][2.4]  generic cluster APIC support for systems with more than 8 CPUs
  2002-12-20 15:46 ` Van Maren, Kevin
  2002-12-20 16:30   ` Martin J. Bligh
@ 2002-12-20 17:16   ` William Lee Irwin III
  1 sibling, 0 replies; 24+ messages in thread
From: William Lee Irwin III @ 2002-12-20 17:16 UTC (permalink / raw)
  To: Van Maren, Kevin
  Cc: Christoph Hellwig, James Cleverdon, Pallipadi, Venkatesh,
	Linux Kernel, Martin Bligh, John Stultz, Nakajima, Jun,
	Mallick, Asit K, Saxena, Sunil, Protasevich, Natalie

On Fri, Dec 20, 2002 at 09:46:19AM -0600, Van Maren, Kevin wrote:
> I mostly work on our 16-32p IA64 machines.  Natalie or someone else will
> have to comment on the clustered-apic code.

Okay, that's not too big a deal. I didn't expect you'd field it directly.


On Fri, Dec 20, 2002 at 09:46:19AM -0600, Van Maren, Kevin wrote:
> Also, as a clarification, our 32-processor systems are NOT NUMA: there
> is a full non-blocking crossbar to memory.  So clustered APIC support
> should not be dependent on NUMA.

That one's easy to fix (and apparently for you to spot despite not
actually working on the things).


Thanks,
Bill


* RE: [PATCH][2.4]  generic cluster APIC support for systems with more than 8 CPUs
@ 2002-12-20 22:57 Protasevich, Natalie
  2002-12-20 23:33 ` William Lee Irwin III
  2002-12-25 21:41 ` Alan Cox
  0 siblings, 2 replies; 24+ messages in thread
From: Protasevich, Natalie @ 2002-12-20 22:57 UTC (permalink / raw)
  To: 'William Lee Irwin III', 'Christoph Hellwig',
	'James Cleverdon', 'Pallipadi, Venkatesh',
	'Linux Kernel', 'Martin Bligh',
	'John Stultz', 'Nakajima, Jun',
	'Mallick, Asit K', 'Saxena, Sunil',
	Van Maren, Kevin
  Cc: 'Andi Kleen', 'Hubert Mantel'

> On Thu, Dec 19, 2002 at 06:04:55PM -0800, James Cleverdon wrote:
> >>> A generic patch should also support Unisys' new box, the ES7000 or
> >>> some such.
> 
> On Fri, Dec 20, 2002 at 08:00:50AM +0000, Christoph Hellwig wrote:
> >> That box needs more changes than just the apic setup.  Unfortunately
> >> unisys thinks they shouldn't send their patches to lkml, but when you see
> >> them e.g. in the suse tree it's a bit understandable that they don't want
> >> anyone to really see their mess :)

Briefly, our ES7000 boxes are non-NUMA, but use clustered APICs (logical
with Cascades, and physical with Gallatins/Fosters). Our code is pretty much
within the clustered APIC code (when both physical and logical are
implemented). Even with NUMA that is forced in clustered APIC case, we are
usually OK as a single-node case.
There are only a few problems with porting the Linux kernel to the ES7000:
	we use 8-bit APIC IDs - this makes us use APIC_LDR instead of
	  APIC_ID throughout the code;
	we have special RTE destination values on IO-APIC - the "if" in the
	  programming IO-APIC line code;
	we introduce a severe IRQ override case - we remap ISA interrupts to a
	  different interrupt range (all the "i < 16" clauses).

Also, I usually have to add things like XTPR mechanism for Fosters/Gallatins
and disable conventional IRQ balancing, since our IO-APIC doesn't work this
way... (All of the above is in the SuSE code base).

I worked with the SuSE tree which has clustered code (at first glance)
close to the patch being discussed here.
The 2.5 tree gives us the benefit of the subarch that will accommodate
(hopefully) our special cases.
But I may need to add more hooks.

>No need to sugar-coat anything :-)

>Natalie is the engineer who added support for the ES7000 to Linux.
>Fortunately she is in the cube next to me.

>She has sent the patches to SuSE/United Linux, and is in the final process
>of testing them on 2.5.5x before submitting them to LKML for comment.

> >> And btw, the box isn't that new, but three years ago or so when they first
> >> showed it on cebit they even refused to talk about linux due to their
> >> restrictive agreements with Microsoft..
>
> On Fri, Dec 20, 2002 at 03:24:01AM -0800, William Lee Irwin III wrote:
> > Kevin, you're the only lkml-posting contact point I know of within Unisys.
> > Is there any chance you could flag down some of the ia32 crew there for
> > some commentary on this stuff? (or do so yourself if you're in it)

I will be looking at the Intel patch submitted against 2.4 with support for
the ES7000 in mind. I am trying to get the ES7000 patch for 2.5.x out
sometime next week (my boss won't let me have a life until I get ES7000
support in 2.5 (:-<)). At the same time, we are very interested in any
clustered APIC patch that goes in the 2.5 tree (the sooner the better).  Having
physical cluster support in 2.5 would greatly reduce the size of diffs for
the ES7000.

>I mostly work on our 16-32p IA64 machines.  Natalie or someone else will
>have to comment on the clustered-apic code.

>I do know that a lot of the code for the ES7000 is optional, and only
>required to support value-added management functionality, which is
>especially useful if you are running more than one OS instance on the
>machine (it supports 8 fully-independent partitions).

>Also, as a clarification, our 32-processor systems are NOT NUMA: there
>is a full non-blocking crossbar to memory.  So clustered APIC support
>should not be dependent on NUMA.

>Kevin


* Re: [PATCH][2.4]  generic cluster APIC support for systems with more than 8 CPUs
  2002-12-20 22:57 Protasevich, Natalie
@ 2002-12-20 23:33 ` William Lee Irwin III
  2002-12-25 21:41 ` Alan Cox
  1 sibling, 0 replies; 24+ messages in thread
From: William Lee Irwin III @ 2002-12-20 23:33 UTC (permalink / raw)
  To: Protasevich, Natalie
  Cc: 'Christoph Hellwig', 'James Cleverdon',
	'Pallipadi, Venkatesh', 'Linux Kernel',
	'Martin Bligh', 'John Stultz',
	'Nakajima, Jun', 'Mallick, Asit K',
	'Saxena, Sunil', Van Maren, Kevin, 'Andi Kleen',
	'Hubert Mantel'

On Fri, Dec 20, 2002 at 04:57:28PM -0600, Protasevich, Natalie wrote:
> Briefly, our ES7000 boxes are non-NUMA, but use clustered APICs (logical
> with Cascades, and physical with Gallatins/Fosters). Our code is pretty much
> within the clustered APIC code (when both physical and logical are
> implemented). Even with NUMA that is forced in clustered APIC case, we are
> usually OK as a single-node case.

Okay, so nothing wild like a non-APIC interrupt controller is going on
here. (c.f. Voyager for an example).

On Fri, Dec 20, 2002 at 04:57:28PM -0600, Protasevich, Natalie wrote:
> There are only a few problems with porting the Linux kernel to the ES7000:
> 	we use 8-bit APIC IDs - this makes us use APIC_LDR instead of
> APIC_ID throughout the code;
> 	we have special RTE destination values on IO-APIC - the "if" in the
> programming IO-APIC line code;
> 	we introduce severe IRQ override case - we remap ISA interrupts to a
> different interrupt range (all the "i < 16" clauses).
> Also, I usually have to add things like XTPR mechanism for Fosters/Gallatins
> and disable conventional IRQ balancing, since our IO-APIC doesn't work this
> way... (All of the above is in the SuSE code base).

Venkatesh, do you think you can handle these generically? Aside from
machine-specific configurations this all looks perfectly generic.

If it's publicly discussable, what's the difference wrt. the IO-APIC?
IIRC NUMA-Q had a similar issue, where flat logical destinations were
being programmed into the IO-APIC by the IRQ balancing code, but the
NUMA-Q IO-APIC was programmed to accept physical destinations in the
RTE's via the DESTMOD bit, using physical broadcast by default, and
achieving node-locality as physical destinations may not refer to
off-node cpus. There probably isn't an issue of node locality, but even
if the IO-APIC's are programmed for logical DESTMOD it won't work with
the flat logical gunk the original IRQ balance patch programmed up.

From 2.5.52 include/asm-i386/smp.h:

#ifdef CONFIG_CLUSTERED_APIC
 #define INT_DELIVERY_MODE 0     /* physical delivery on LOCAL quad */
#else
 #define INT_DELIVERY_MODE 1     /* logical delivery broadcast to all procs */
#endif

From 2.5.52 arch/i386/mach-generic/mach_apic.h:

#ifdef CONFIG_SMP
 #define TARGET_CPUS (clustered_apic_mode ? 0xf : cpu_online_map)
#else
 #define TARGET_CPUS 0x01
#endif

And while setting up the RTE's in io_apic.c:

                entry.delivery_mode = dest_LowestPrio;
                entry.dest_mode = INT_DELIVERY_MODE;
                entry.mask = 0;                         /* enable IRQ */
                entry.dest.logical.logical_dest = TARGET_CPUS;

... which is rather blatant abuse of entry.dest.logical.logical_dest
for the NUMA-Q case, but never mind that.

On Fri, Dec 20, 2002 at 04:57:28PM -0600, Protasevich, Natalie wrote:
> I worked with the SuSE tree which has clustered code (at the first glance)
> close to the patch being discussed here.
> The 2.5 tree gives us a benefit of the subarch that will accommodate
> (hopefully) our special cases. 
> But I may need to add more hooks.

It'd be great to have the APIC interface general enough to handle all
these machines.

Bill


* RE: [PATCH][2.4]  generic cluster APIC support for systems with more than 8 CPUs
@ 2002-12-22  4:00 Pallipadi, Venkatesh
  2002-12-22  4:05 ` Martin J. Bligh
  0 siblings, 1 reply; 24+ messages in thread
From: Pallipadi, Venkatesh @ 2002-12-22  4:00 UTC (permalink / raw)
  To: William Lee Irwin III, Protasevich, Natalie
  Cc: Christoph Hellwig, James Cleverdon, Linux Kernel, Martin Bligh,
	John Stultz, Nakajima, Jun, Mallick, Asit K, Saxena, Sunil,
	Van Maren, Kevin, Andi Kleen, Hubert Mantel, Kamble, Nitin A



> -----Original Message-----
> From: William Lee Irwin III [mailto:wli@holomorphy.com]
> On Fri, Dec 20, 2002 at 04:57:28PM -0600, Protasevich, Natalie wrote:
> > There are only a few problems with porting the Linux kernel to the ES7000:
> > 	we use 8-bit APIC IDs - this makes us use APIC_LDR instead of
> > APIC_ID throughout the code;
> > 	we have special RTE destination values on IO-APIC - the "if" in the
> > programming IO-APIC line code;
> > 	we introduce severe IRQ override case - we remap ISA interrupts to a
> > different interrupt range (all the "i < 16" clauses).
> > Also, I usually have to add things like XTPR mechanism for Fosters/Gallatins
> > and disable conventional IRQ balancing, since our IO-APIC doesn't work this
> > way... (All of the above is in the SuSE code base).
> 
> Venkatesh, do you think you can handle these generically? Aside from
> machine-specific configurations this all looks perfectly generic.
> 
> If it's publicly discussable, what's the difference wrt. the IO-APIC?
> IIRC NUMA-Q had a similar issue, where flat logical destinations were
> being programmed into the IO-APIC by the IRQ balancing code, but the
> NUMA-Q IO-APIC was programmed to accept physical destinations in the
> RTE's via the DESTMOD bit, using physical broadcast by default, and
> achieving node-locality as physical destinations may not refer to
> off-node cpus. There probably isn't an issue of node locality, but even
> if the IO-APIC's are programmed for logical DESTMOD it won't work with
> the flat logical gunk the original IRQ balance patch programmed up.
> 
> From 2.5.52 include/asm-i386/smp.h:
> 
> #ifdef CONFIG_CLUSTERED_APIC
>  #define INT_DELIVERY_MODE 0     /* physical delivery on LOCAL quad */
> #else
>  #define INT_DELIVERY_MODE 1     /* logical delivery broadcast to all procs */
> #endif
> 
> 
> From 2.5.52 arch/i386/mach-generic/mach_apic.h:
> 
> #ifdef CONFIG_SMP
>  #define TARGET_CPUS (clustered_apic_mode ? 0xf : cpu_online_map)
> #else
>  #define TARGET_CPUS 0x01
> #endif
> 
> And while setting up the RTE's in io_apic.c:
> 
>                 entry.delivery_mode = dest_LowestPrio;
>                 entry.dest_mode = INT_DELIVERY_MODE;
>                 entry.mask = 0;                         /* enable IRQ */
>                 entry.dest.logical.logical_dest = TARGET_CPUS;
> 
> ... which is rather blatant abuse of entry.dest.logical.logical_dest
> for the NUMA-Q case, but never mind that.
> 
> 
> On Fri, Dec 20, 2002 at 04:57:28PM -0600, Protasevich, Natalie wrote:
> > I worked with the SuSE tree which has clustered code (at the first glance)
> > close to the patch being discussed here.
> > The 2.5 tree gives us a benefit of the subarch that will accommodate
> > (hopefully) our special cases.
> > But I may need to add more hooks.
> 
> It'd be great to have the APIC interface general enough to handle all
> these machines.

Yes, our feeling is that it is possible to handle all non-NUMAQ systems pretty generically in terms of APIC setup and interrupt routing. We can use either logical clustered or physical destination modes.
But for NUMAQ systems, interrupt routing has to know about the local nodes and have the necessary logic to do the routing within the local node.

Thanks,
-Venkatesh 

 


* RE: [PATCH][2.4]  generic cluster APIC support for systems with more than 8 CPUs
  2002-12-22  4:00 Pallipadi, Venkatesh
@ 2002-12-22  4:05 ` Martin J. Bligh
  0 siblings, 0 replies; 24+ messages in thread
From: Martin J. Bligh @ 2002-12-22  4:05 UTC (permalink / raw)
  To: Pallipadi, Venkatesh, William Lee Irwin III, Protasevich, Natalie
  Cc: Christoph Hellwig, James Cleverdon, Linux Kernel, John Stultz,
	Nakajima, Jun, Mallick, Asit K, Saxena, Sunil, Van Maren, Kevin,
	Andi Kleen, Hubert Mantel, Kamble, Nitin A

> Yes, our feeling is that it is possible to handle all non-NUMAQ systems
> pretty generically in terms of APIC setup and interrupt routing. We can
> use either logical clustered or physical destination modes. But for NUMAQ
> systems, interrupt routing has to know about the local nodes and have the
> necessary logic to do the routing within the local node.

NUMA-Q doesn't have to know about the local nodes. I set it up to use
physical delivery broadcast, which is a node-local broadcast ... gave
me NUMA affinity for free. I could also use logical clustered (p3 style)
addressing, and work out all the node locality, but I don't see the point.

M.



* RE: [PATCH][2.4]  generic cluster APIC support for systems with more than 8 CPUs
@ 2002-12-22  6:19 Pallipadi, Venkatesh
  2002-12-22  6:39 ` William Lee Irwin III
                   ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Pallipadi, Venkatesh @ 2002-12-22  6:19 UTC (permalink / raw)
  To: Martin J. Bligh, William Lee Irwin III, Protasevich, Natalie
  Cc: Christoph Hellwig, James Cleverdon, Linux Kernel, John Stultz,
	Nakajima, Jun, Mallick, Asit K, Saxena, Sunil, Van Maren, Kevin,
	Andi Kleen, Hubert Mantel, Kamble, Nitin A



> -----Original Message-----
> From: Martin J. Bligh [mailto:mbligh@aracnet.com]
> > Yes, our feeling is that it is possible to handle all non-NUMAQ systems
> > pretty generically in terms of APIC setup and interrupt routing. We can
> > use either logical clustered or physical destination modes. But for NUMAQ
> > systems, interrupt routing has to know about the local nodes and have the
> > necessary logic to do the routing within the local node.
> 
> NUMA-Q doesn't have to know about the local nodes. I set it up to use
> physical delivery broadcast, which is a node-local broadcast ... gave
> me NUMA affinity for free. I could also use logical clustered (p3 style)
> addressing, and work out all the node locality, but I don't see the point.
> 

I actually meant interrupt distribution (rather than interrupt routing). AFAIK, interrupt distribution right now assumes a flat logical setup and tries to distribute the interrupt, and it is disabled in clustered APIC mode.
I was just thinking out loud about the changes the interrupt distribution code would need for systems using clustered APIC/physical mode (NUMAQ and non-NUMAQ).

Thanks,
-Venkatesh


* Re: [PATCH][2.4]  generic cluster APIC support for systems with more than 8 CPUs
  2002-12-22  6:19 [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs Pallipadi, Venkatesh
@ 2002-12-22  6:39 ` William Lee Irwin III
  2002-12-22 17:21 ` Martin J. Bligh
  2002-12-22 17:23 ` Martin J. Bligh
  2 siblings, 0 replies; 24+ messages in thread
From: William Lee Irwin III @ 2002-12-22  6:39 UTC (permalink / raw)
  To: Pallipadi, Venkatesh
  Cc: Martin J. Bligh, Protasevich, Natalie, Christoph Hellwig,
	James Cleverdon, Linux Kernel, John Stultz, Nakajima, Jun,
	Mallick, Asit K, Saxena, Sunil, Van Maren, Kevin, Andi Kleen,
	Hubert Mantel, Kamble, Nitin A

On Sat, Dec 21, 2002 at 10:19:20PM -0800, Pallipadi, Venkatesh wrote:
> I actually meant interrupt distribution (rather than interrupt
> routing). AFAIK, interrupt distribution right now assumes flat
> logical setup and tries to distribute the interrupt. And is disabled
> in case of clustered APIC mode. I was just thinking out loud about the
> changes interrupt distribution code should have for systems using
> clustered APIC/physical mode (NUMAQ and non-NUMAQ).

IIRC the physical DESTMOD in the IO-APIC's RTE's is not essential,
just somewhat more optimal given generalized node affinity. It also
dodged the need for infrastructure to associate various kinds of
devices with nodes in the 2.4.x timeframe.

Dumping small tidbits of decision-making and destination formatting
into headers that can be swizzled across subarches somehow would be ideal.


Thanks,
Bill


* RE: [PATCH][2.4]  generic cluster APIC support for systems with more than 8 CPUs
  2002-12-22  6:19 [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs Pallipadi, Venkatesh
  2002-12-22  6:39 ` William Lee Irwin III
@ 2002-12-22 17:21 ` Martin J. Bligh
  2002-12-22 17:23 ` Martin J. Bligh
  2 siblings, 0 replies; 24+ messages in thread
From: Martin J. Bligh @ 2002-12-22 17:21 UTC (permalink / raw)
  To: Pallipadi, Venkatesh, William Lee Irwin III, Protasevich, Natalie
  Cc: Christoph Hellwig, James Cleverdon, Linux Kernel, John Stultz,
	Nakajima, Jun, Mallick, Asit K, Saxena, Sunil, Van Maren, Kevin,
	Andi Kleen, Hubert Mantel, Kamble, Nitin A

> I actually meant interrupt distribution (rather than interrupt routing).
> AFAIK, interrupt distribution right now assumes flat logical setup and
> tries to distribute the interrupt. And is disabled in case of clustered
> APIC mode.  I was just thinking out loud about the changes interrupt
> distribution code should have for systems using clustered APIC/physical
> mode (NUMAQ and non-NUMAQ).

Oh, you mean irq_balance? I'm happy to leave that turned off on NUMA-Q
until it does something less random than it does now. Getting some sort
of affinity for interrupts over a longer period is much more interesting
than providing pretty numbers under /proc/interrupts. Giving each of
the frequently used interrupts their own local CPU to process it would
be cool, but I see no purpose in continually moving them around. If you're
concerned about fairness, that's a scheduler problem to account for and
deal with, IMHO.

The provided topology functions should be able to do node_to_cpumask
and cpu_to_node mappings once that's sorted out. Treat each node as a
separate "system" and balance within that.
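
As a rough illustration of "balance within a node", here is a minimal
sketch in C. It is not kernel code: the toy_ helpers only stand in for the
node_to_cpumask and cpu_to_node topology functions mentioned above, the
layout of 4 CPUs per node is an assumption, and pick_cpu_on_same_node and
the irq_load array are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS       16
#define CPUS_PER_NODE 4	/* assumption: 4 CPUs per node, numbered contiguously */

/* Toy stand-ins for the topology functions; not the kernel's versions. */
static int toy_cpu_to_node(int cpu)
{
	return cpu / CPUS_PER_NODE;
}

static uint32_t toy_node_to_cpumask(int node)
{
	return 0xfu << (node * CPUS_PER_NODE);
}

/*
 * Pick the CPU with the lowest interrupt load on the same node as the CPU
 * that currently services the IRQ. CPUs on other nodes are never
 * considered, and ties leave the IRQ where it is.
 */
int pick_cpu_on_same_node(int current_cpu, const unsigned long irq_load[NR_CPUS])
{
	uint32_t mask;
	int cpu, best = current_cpu;

	assert(current_cpu >= 0 && current_cpu < NR_CPUS);
	mask = toy_node_to_cpumask(toy_cpu_to_node(current_cpu));

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!(mask & (1u << cpu)))
			continue;	/* CPU is on another node */
		if (irq_load[cpu] < irq_load[best])
			best = cpu;
	}
	return best;
}
```

Because the comparison is strict, an IRQ only moves when some CPU on the
same node is strictly less loaded, which keeps it from bouncing around.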

M.


* RE: [PATCH][2.4]  generic cluster APIC support for systems with more than 8 CPUs
  2002-12-22  6:19 [PATCH][2.4] generic cluster APIC support for systems with more than 8 CPUs Pallipadi, Venkatesh
  2002-12-22  6:39 ` William Lee Irwin III
  2002-12-22 17:21 ` Martin J. Bligh
@ 2002-12-22 17:23 ` Martin J. Bligh
  2 siblings, 0 replies; 24+ messages in thread
From: Martin J. Bligh @ 2002-12-22 17:23 UTC (permalink / raw)
  To: Pallipadi, Venkatesh, William Lee Irwin III, Protasevich, Natalie
  Cc: Christoph Hellwig, James Cleverdon, Linux Kernel, John Stultz,
	Nakajima, Jun, Mallick, Asit K, Saxena, Sunil, Van Maren, Kevin,
	Andi Kleen, Hubert Mantel, Kamble, Nitin A

>> > Yes, our feeling is that it is possible to handle all non-NUMAQ systems
>> > pretty generically in terms of APIC setup and interrupt routing. We can
>> > use either logical clustered or physical destination modes. But for NUMAQ
>> > systems, interrupt routing has to know about the local nodes and have the
>> > necessary logic to do the routing within the local node.
>>
>> NUMA-Q doesn't have to know about the local nodes. I set it up to use
>> physical delivery broadcast, which is a node-local broadcast ... gave
>> me NUMA affinity for free. I could also use logical clustered (p3 style)
>> addressing, and work out all the node locality, but I don't see the point.
>
> I actually meant interrupt distribution (rather than interrupt routing).
> AFAIK, interrupt distribution right now assumes flat logical setup and
> tries to distribute the interrupt. And is disabled in case of clustered
> APIC mode.  I was just thinking out loud about the changes interrupt
> distribution code should have for systems using clustered APIC/physical
> mode (NUMAQ and non-NUMAQ).

Actually, if you're talking about irq_balance, that needs fixing for all
NUMA systems to get affinity, not just NUMA-Q. It then needs an
abstraction layer to do "program the IO-APIC with a cpu_bitmask" that's
different for each apic addressing mode used.
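
A minimal sketch of what such an abstraction might look like, in C. All
names here (apic_dest_model, cpu_mask_to_rte_dest) are hypothetical, not
existing kernel interfaces. It assumes CPU n sits in cluster n / 4 and has
physical APIC ID n, which real hardware need not follow. The cluster
encoding (cluster number in the high nibble, per-CPU bits in the low
nibble) is the xAPIC logical cluster layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical addressing models; not existing kernel definitions. */
enum apic_dest_model {
	DEST_FLAT_LOGICAL,	/* one bit per CPU, at most 8 CPUs */
	DEST_CLUSTER_LOGICAL,	/* high nibble = cluster, low nibble = CPU bits */
	DEST_PHYSICAL		/* a single APIC ID, no bitmask possible */
};

/*
 * Turn a mask of CPU numbers into an 8-bit RTE destination field.
 * Returns -1 when the mask cannot be expressed in the given model.
 */
int cpu_mask_to_rte_dest(enum apic_dest_model model, uint32_t cpu_mask)
{
	unsigned int cluster;
	int id;

	assert(model == DEST_FLAT_LOGICAL || model == DEST_CLUSTER_LOGICAL ||
	       model == DEST_PHYSICAL);

	if (cpu_mask == 0)
		return -1;

	switch (model) {
	case DEST_FLAT_LOGICAL:
		/* The mask is the destination, if it fits in 8 bits. */
		return cpu_mask <= 0xffu ? (int)cpu_mask : -1;

	case DEST_CLUSTER_LOGICAL:
		for (cluster = 0; cluster < 8; cluster++) {
			uint32_t bits = (cpu_mask >> (cluster * 4)) & 0xfu;

			if (bits == 0)
				continue;
			/* A destination cannot span two clusters. */
			if (cpu_mask != (bits << (cluster * 4)))
				return -1;
			return (int)((cluster << 4) | bits);
		}
		return -1;

	case DEST_PHYSICAL:
		/* Physical mode names exactly one CPU. */
		if (cpu_mask & (cpu_mask - 1))
			return -1;
		for (id = 0; !(cpu_mask & 1u); id++)
			cpu_mask >>= 1;
		return id;
	}
	return -1;
}
```

The point of returning -1 is that a balancer written against CPU masks
finds out when a mask has no encoding in the current mode, instead of
programming a wrong value into the RTE.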

M.



* RE: [PATCH][2.4]  generic cluster APIC support for systems with more than 8 CPUs
@ 2002-12-22 20:41 ` Protasevich, Natalie
  2002-12-22 20:52   ` Martin J. Bligh
  0 siblings, 1 reply; 24+ messages in thread
From: Protasevich, Natalie @ 2002-12-22 20:41 UTC (permalink / raw)
  To: 'Martin J. Bligh', Pallipadi, Venkatesh,
	William Lee Irwin III, Protasevich, Natalie
  Cc: Christoph Hellwig, James Cleverdon, Linux Kernel, John Stultz,
	Nakajima, Jun, Mallick, Asit K, Saxena, Sunil, Van Maren, Kevin,
	Andi Kleen, Hubert Mantel, Kamble, Nitin A


>>> > Yes, our feeling is that it is possible to handle all non-NUMAQ systems
>>> > pretty generically in terms of APIC setup and interrupt routing. We can
>>> > use either logical clustered or physical destination modes. But for NUMAQ
>>> > systems, interrupt routing has to know about the local nodes and have the
>>> > necessary logic to do the routing within the local node.
>>>
>>> NUMA-Q doesn't have to know about the local nodes. I set it up to use
>>> physical delivery broadcast, which is a node-local broadcast ... gave
>>> me NUMA affinity for free. I could also use logical clustered (p3 style)
>>> addressing, and work out all the node locality, but I don't see the point.
>>
>> I actually meant interrupt distribution (rather than interrupt routing).
>> AFAIK, interrupt distribution right now assumes flat logical setup and
>> tries to distribute the interrupt. And is disabled in case of clustered
>> APIC mode.  I was just thinking out loud about the changes interrupt
>> distribution code should have for systems using clustered APIC/physical
>> mode (NUMAQ and non-NUMAQ).

>Actually, if you're talking about irq_balance, that needs fixing for all
>NUMA systems to get affinity, not just NUMA-Q. It then needs an
>abstraction layer to do "program the IO-APIC with a cpu_bitmask" that's
>different for each apic addressing mode used.

Some platforms (like certain ES7000s) won't tolerate any bit masks
programmed into the RTE because their balancing is done entirely in
hardware, similar to the XTPR mechanism for Fosters. For those I suggest
having an escape hatch, in the form of a boot parameter such as
"irq_balance=no". It was suggested to us by SuSE and worked great - I
could turn it off in our platform code unconditionally. It could also
help those who can use irq balancing as is but might want to implement
their own balancing scheme.

--Natalie

>M.


* RE: [PATCH][2.4]  generic cluster APIC support for systems with more than 8 CPUs
  2002-12-22 20:41 ` Protasevich, Natalie
@ 2002-12-22 20:52   ` Martin J. Bligh
  0 siblings, 0 replies; 24+ messages in thread
From: Martin J. Bligh @ 2002-12-22 20:52 UTC (permalink / raw)
  To: Protasevich, Natalie, Pallipadi, Venkatesh, William Lee Irwin III
  Cc: Christoph Hellwig, James Cleverdon, Linux Kernel, John Stultz,
	Nakajima, Jun, Mallick, Asit K, Saxena, Sunil, Van Maren, Kevin,
	Andi Kleen, Hubert Mantel, Kamble, Nitin A

> Some platforms (like certain ES7000s) won't tolerate any bit masks
> programmed into the RTE because their balancing is done entirely in
> hardware, similar to the XTPR mechanism for Fosters. For those I suggest
> having an escape hatch, in the form of a boot parameter such as
> "irq_balance=no". It was suggested to us by SuSE and worked great - I
> could turn it off in our platform code unconditionally. It could also
> help those who can use irq balancing as is but might want to implement
> their own balancing scheme.

Having a boot-time parameter is useful, but I'd like it to default to off
without a parameter for the platforms where it's just broken. At the
moment there's an "if (clustered_apic_mode) return;" stuck at the top
of balance_irq, the latest series of patches changes that to
"if (no_balance_irq) return;", which is set by NUMA-Q. If we can set
up no_balance_irq to default correctly, but be possibly overridden by
the boot-time parameter, I think we'd have the best of both worlds.
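
A minimal sketch of that "platform default, boot-time override" logic, in
C. Only no_balance_irq and "irq_balance=no" come from this thread;
"irq_balance=yes", the enum, and both function names are assumptions made
for illustration, not an existing kernel interface.

```c
#include <assert.h>
#include <string.h>

/* What the boot command line asked for, if anything (hypothetical). */
enum irq_balance_override {
	IRQ_BALANCE_UNSET,	/* no parameter given: use the platform default */
	IRQ_BALANCE_FORCE_ON,	/* "irq_balance=yes" (assumed spelling) */
	IRQ_BALANCE_FORCE_OFF	/* "irq_balance=no" */
};

/* Parse the value that follows "irq_balance=" on the command line. */
enum irq_balance_override parse_irq_balance(const char *arg)
{
	if (arg && strcmp(arg, "no") == 0)
		return IRQ_BALANCE_FORCE_OFF;
	if (arg && strcmp(arg, "yes") == 0)
		return IRQ_BALANCE_FORCE_ON;
	return IRQ_BALANCE_UNSET;
}

/*
 * Compute the final no_balance_irq flag. platform_broken is 1 on platforms
 * where software balancing is known not to work (NUMA-Q, say), else 0.
 * An explicit boot parameter wins over the platform default.
 */
int resolve_no_balance_irq(int platform_broken, enum irq_balance_override ov)
{
	assert(platform_broken == 0 || platform_broken == 1);

	switch (ov) {
	case IRQ_BALANCE_FORCE_ON:
		return 0;
	case IRQ_BALANCE_FORCE_OFF:
		return 1;
	case IRQ_BALANCE_UNSET:
		break;
	}
	return platform_broken;
}
```

Platform code would pass its own default, so the ES7000 could set
platform_broken unconditionally and still honour an explicit override.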

M.


* RE: [PATCH][2.4]  generic cluster APIC support for systems with more than 8 CPUs
@ 2002-12-23  7:29 Kamble, Nitin A
  2002-12-23  7:52 ` Martin J. Bligh
  0 siblings, 1 reply; 24+ messages in thread
From: Kamble, Nitin A @ 2002-12-23  7:29 UTC (permalink / raw)
  To: Martin J. Bligh, William Lee Irwin III
  Cc: Protasevich, Natalie, Pallipadi, Venkatesh, Christoph Hellwig,
	James Cleverdon, Linux Kernel, John Stultz, Nakajima, Jun,
	Mallick, Asit K, Saxena, Sunil, Van Maren, Kevin, Andi Kleen,
	Hubert Mantel

	Martin, a couple of days back I posted a kernel IRQ distribution patch with some discussion. There we tried to do the same things you are interested in here. We have made the interval flexible and longer, and removed the randomness from the algorithm.
	  Also, about fairness: the scheduler will not always be able to solve the fairness issues caused by interrupts. For example, at very high interrupt load, some of the CPUs may get 100% busy just servicing interrupts. Here the scheduler cannot do anything. To get fairness, we need the interrupt distribution mechanism to move interrupts as required.
	  Maybe we can add some generic NUMA awareness to it. But I am not fully aware of the way interrupt routing happens in various NUMA systems. If I can get this information, I can look into how we can have generic NUMA support in the new IRQ distribution code.

Thanks,
Nitin

-----Original Message-----
From: Martin J. Bligh [mailto:mbligh@aracnet.com]
Sent: Sunday, December 22, 2002 9:21 AM
To: Pallipadi, Venkatesh; William Lee Irwin III; Protasevich, Natalie
Cc: Christoph Hellwig; James Cleverdon; Linux Kernel; John Stultz;
Nakajima, Jun; Mallick, Asit K; Saxena, Sunil; Van Maren, Kevin; Andi
Kleen; Hubert Mantel; Kamble, Nitin A
Subject: RE: [PATCH][2.4] generic cluster APIC support for systems with
more than 8 CPUs


> I actually meant interrupt distribution (rather than interrupt routing).
> AFAIK, interrupt distribution right now assumes flat logical setup and
> tries to distribute the interrupt. And is disabled in case of clustered
> APIC mode.  I was just thinking out loud about the changes interrupt
> distribution code should have for systems using clustered APIC/physical
> mode (NUMAQ and non-NUMAQ).

Oh, you mean irq_balance? I'm happy to leave that turned off on NUMA-Q
until it does something less random than it does now. Getting some sort
of affinity for interrupts over a longer period is much more interesting
than providing pretty numbers under /proc/interrupts. Giving each of
the frequently used interrupts their own local CPU to process it would
be cool, but I see no purpose in continually moving them around. If you're
concerned about fairness, that's a scheduler problem to account for and
deal with, IMHO.

The provided topology functions should be able to do node_to_cpumask
and cpu_to_node mappings once that's sorted out. Treat each node as a
separate "system" and balance within that.

M.


* RE: [PATCH][2.4]  generic cluster APIC support for systems with m ore than 8 CPUs
  2002-12-23  7:29 Kamble, Nitin A
@ 2002-12-23  7:52 ` Martin J. Bligh
  2002-12-23  9:46   ` Zwane Mwaikambo
  0 siblings, 1 reply; 24+ messages in thread
From: Martin J. Bligh @ 2002-12-23  7:52 UTC (permalink / raw)
  To: Kamble, Nitin A, William Lee Irwin III
  Cc: Protasevich, Natalie, Pallipadi, Venkatesh, Christoph Hellwig,
	James Cleverdon, Linux Kernel, John Stultz, Nakajima, Jun,
	Mallick, Asit K, Saxena, Sunil, Van Maren, Kevin, Andi Kleen,
	Hubert Mantel

> 	Martin, a couple of days ago I posted a kernel IRQ distribution patch with some discussion. In it we tried to do the same things you are interested in here: we made the balancing interval flexible and longer, and removed the randomness from the algorithm.

Yup, saw it, but haven't given it the inspection it really deserves yet.
That code does need some work, and it sounds like you're doing the
right things to it.

> 	  Also, about fairness: the scheduler cannot always solve fairness problems caused by interrupts. For example, under very high interrupt load, some CPUs may be 100% busy just servicing interrupts, and the scheduler can do nothing about that. To get fairness, we need the interrupt distribution mechanism to move interrupts as required.

Well, if the scheduler didn't ding the process for time spent in interrupts,
I think it'd work out - it could always run processes on another CPU ;-) 
But that may not be practical to do in reality.

> 	  Maybe we can add some generic NUMA awareness to it. But I am not fully aware of how interrupt routing works on the various NUMA systems. If I can get this information, I can look into how to add generic NUMA support to the new IRQ distribution code.

Mmm... well it's late and I'm tired, but off the top of my head ... you
need to map from each PCI bus to the closest set of cpus - for me that's
a simple bus_to_node mapping (not sure that bit is added to the topology
infrastructure yet, but it's a trivial patch that's floating around ...
I'll try to dig it out and add it to the 2.5-mjb tree). Then just limit
the distribution for an interrupt to the closest set of CPUs (for UMA SMP
that would just be cpu_online_map), and have another abstracted function that
sets IO-APIC distribution up to a certain CPU (if doing balancing explicitly)
or group thereof. But it's late, so if that makes no sense, I'll take it
all back in the morning ;-)
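
A minimal sketch of the idea above, assuming a toy two-node topology. The
names bus_to_node, node_to_cpumask and the notion of an online CPU map come
from the discussion; the tables, the 32-bit cpumask_t stand-in,
irq_allowed_cpus() and pick_cpu() are hypothetical and only illustrate
limiting an interrupt to the CPUs nearest its PCI bus:

```c
#include <stdint.h>

/* Hypothetical topology for illustration: 2 nodes, 4 CPUs each, 4 PCI buses. */
#define NR_NODES 2
#define NR_BUSES 4

typedef uint32_t cpumask_t;   /* one bit per CPU; a toy stand-in, not the kernel type */

static const int bus_node[NR_BUSES] = { 0, 0, 1, 1 };
static const cpumask_t node_cpus[NR_NODES] = { 0x0f, 0xf0 };
static const cpumask_t online_cpus = 0xff;

/* Map a PCI bus to the node it hangs off. */
static int bus_to_node(int bus)
{
    return bus_node[bus];
}

/* Map a node to the set of CPUs that live on it. */
static cpumask_t node_to_cpumask(int node)
{
    return node_cpus[node];
}

/* CPUs that an interrupt arriving on 'bus' may be distributed across. */
static cpumask_t irq_allowed_cpus(int bus, int numa)
{
    if (!numa)
        return online_cpus;   /* UMA SMP: every online CPU is equally close */
    return node_to_cpumask(bus_to_node(bus)) & online_cpus;
}

/* Pick the least-loaded CPU inside the allowed mask, or -1 if it is empty. */
static int pick_cpu(cpumask_t allowed, const unsigned long load[], int ncpus)
{
    int best = -1;

    for (int cpu = 0; cpu < ncpus; cpu++) {
        if (!(allowed & ((cpumask_t)1 << cpu)))
            continue;
        if (best < 0 || load[cpu] < load[best])
            best = cpu;
    }
    return best;
}
```

With these tables, an interrupt from bus 2 on a NUMA box is confined to CPUs
4-7, while on UMA SMP the same call returns every online CPU.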

If you're interested in working on it, I'm very happy to test it ...
(should probably be kept separate from your other stuff though).
I'll see if I can find someone in our performance team to evaluate 
how your existing patch runs on SMP for us ...

M.


* RE: [PATCH][2.4]  generic cluster APIC support for systems with m ore than 8 CPUs
  2002-12-23  7:52 ` Martin J. Bligh
@ 2002-12-23  9:46   ` Zwane Mwaikambo
  2002-12-23 15:30     ` Martin J. Bligh
  0 siblings, 1 reply; 24+ messages in thread
From: Zwane Mwaikambo @ 2002-12-23  9:46 UTC (permalink / raw)
  To: Martin J. Bligh
  Cc: Kamble, Nitin A, William Lee Irwin III, Protasevich, Natalie,
	Pallipadi, Venkatesh, Christoph Hellwig, James Cleverdon,
	Linux Kernel, John Stultz, Nakajima, Jun, Mallick, Asit K,
	Saxena, Sunil, Van Maren, Kevin, Andi Kleen, Hubert Mantel

On Sun, 22 Dec 2002, Martin J. Bligh wrote:

> > 	  Maybe we can add some generic NUMA awareness to it. But I am not fully aware of how interrupt routing works on the various NUMA systems. If I can get this information, I can look into how to add generic NUMA support to the new IRQ distribution code.
>
> Mmm... well it's late and I'm tired, but off the top of my head ... you
> need to map from each PCI bus to the closest set of cpus - for me that's
> a simple bus_to_node mapping (not sure that bit is added to the topology
> infrastructure yet, but it's a trivial patch that's floating around ...
> I'll try to dig it out and add it to the 2.5-mjb tree). Then just limit
> the distribution for an interrupt to the closest set of CPUs (for UMA SMP
> that would just be cpu_online_map), and have another abstracted function that
> sets IO-APIC distribution up to a certain CPU (if doing balancing explicitly)
> or group thereof. But it's late, so if that makes no sense, I'll take it
> all back in the morning ;-)

How about using logical destination mode when programming the IOAPIC?
Currently we do physical in io_apic.c (the reason why it breaks on NUMAQ)
This way we can get node affinity just by setting the Destination Field
for an IOREDTBL to APIC_BROADCAST_ID and also targeting single cpus on a
node becomes node generic.
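
For reference, a rough sketch of what a logical-mode redirection entry looks
like, following the 82093AA I/O APIC register layout (vector in bits 0-7,
delivery mode in bits 8-10, destination mode in bit 11, mask in bit 16,
destination in bits 56-63). The macro and function names are made up for
illustration, and the choice of lowest-priority delivery is an assumption,
not something stated above:

```c
#include <stdint.h>

/* Field encodings for one 64-bit I/O APIC redirection table entry. */
#define IOREDTBL_VECTOR(v)        ((uint64_t)(v) & 0xff)
#define IOREDTBL_DELIVERY_FIXED   ((uint64_t)0 << 8)
#define IOREDTBL_DELIVERY_LOWPRI  ((uint64_t)1 << 8)
#define IOREDTBL_DEST_PHYSICAL    ((uint64_t)0 << 11)
#define IOREDTBL_DEST_LOGICAL     ((uint64_t)1 << 11)
#define IOREDTBL_MASKED           ((uint64_t)1 << 16)
#define IOREDTBL_DEST(d)          (((uint64_t)(d) & 0xff) << 56)

/*
 * Build an unmasked entry that uses logical destination mode.
 * 'logical_dest' is matched against each local APIC's logical ID, so a
 * value covering several CPUs lets the hardware choose among them.
 * Lowest-priority delivery is assumed here so that only one of the
 * matching CPUs takes the interrupt.
 */
static uint64_t ioredtbl_logical(uint8_t vector, uint8_t logical_dest)
{
    return IOREDTBL_VECTOR(vector)
         | IOREDTBL_DELIVERY_LOWPRI
         | IOREDTBL_DEST_LOGICAL
         | IOREDTBL_DEST(logical_dest);
}
```

Which destination value stands for "every CPU on this node" depends on how
the platform sets up its logical IDs; that part is not modelled here.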

Cheers,
	Zwane Mwaikambo

PS This suggestion also comes with a possible nonsense disclaimer as I'm
also about to go to bed ;)

-- 
function.linuxpower.ca


* RE: [PATCH][2.4]  generic cluster APIC support for systems with m ore than 8 CPUs
  2002-12-23  9:46   ` Zwane Mwaikambo
@ 2002-12-23 15:30     ` Martin J. Bligh
  0 siblings, 0 replies; 24+ messages in thread
From: Martin J. Bligh @ 2002-12-23 15:30 UTC (permalink / raw)
  To: Zwane Mwaikambo
  Cc: Kamble, Nitin A, William Lee Irwin III, Protasevich, Natalie,
	Pallipadi, Venkatesh, Christoph Hellwig, James Cleverdon,
	Linux Kernel, John Stultz, Nakajima, Jun, Mallick, Asit K,
	Saxena, Sunil, Van Maren, Kevin, Andi Kleen, Hubert Mantel

> How about using logical destination mode when programming the IOAPIC?
> Currently we do physical in io_apic.c (the reason why it breaks on NUMAQ)
> This way we can get node affinity just by setting the Destination Field
> for an IOREDTBL to APIC_BROADCAST_ID and also targeting single cpus on a
> node becomes node generic.

Yup, that'll work fine once we have balance_IRQ set up with node affinity.
Using phys is just a cheapo lazy hacker's way to steal node affinity for
free from the mouths of babes.

M.



* RE: [PATCH][2.4]  generic cluster APIC support for systems with m ore than 8 CPUs
  2002-12-20 22:57 Protasevich, Natalie
  2002-12-20 23:33 ` William Lee Irwin III
@ 2002-12-25 21:41 ` Alan Cox
  1 sibling, 0 replies; 24+ messages in thread
From: Alan Cox @ 2002-12-25 21:41 UTC (permalink / raw)
  To: Protasevich, Natalie
  Cc: 'William Lee Irwin III', 'Christoph Hellwig',
	'James Cleverdon', 'Pallipadi, Venkatesh',
	'Linux Kernel', 'Martin Bligh',
	'John Stultz', 'Nakajima, Jun',
	'Mallick, Asit K', 'Saxena, Sunil',
	Van Maren, Kevin, 'Andi Kleen', 'Hubert Mantel'

One thing I will say. Your code would be a hell of a lot saner for
merging if you mapped the ISA/Legacy IRQ's as 0-15 (to software) and the
PCI ones to 16+ like everyone else does. That would kill a _lot_ of
ifdefs and the IRQ0 corner case




* RE: [PATCH][2.4]  generic cluster APIC support for systems with m ore than 8 CPUs
@ 2002-12-26  1:14 Protasevich, Natalie
  2002-12-27 23:39 ` Alan Cox
  0 siblings, 1 reply; 24+ messages in thread
From: Protasevich, Natalie @ 2002-12-26  1:14 UTC (permalink / raw)
  To: 'Alan Cox', Protasevich, Natalie
  Cc: 'William Lee Irwin III', 'Christoph Hellwig',
	'James Cleverdon', 'Pallipadi, Venkatesh',
	'Linux Kernel', 'Martin Bligh',
	'John Stultz', 'Nakajima, Jun',
	'Mallick, Asit K', 'Saxena, Sunil',
	Van Maren, Kevin, 'Andi Kleen', 'Hubert Mantel'

>One thing I will say. Your code would be a hell of a lot saner for
>merging if you mapped the ISA/Legacy IRQ's as 0-15 (to software) and the
>PCI ones to 16+ like everyone else does. That would kill a _lot_ of
>ifdefs and the IRQ0 corner case

Alan, do you mean the case implemented in the IA64 tree? I was terribly out
of time so I had to do something quick and dirty. The IRQ0 was not nearly as
bad as the rest of the legacy drivers asking for the "IRQ3" and "4" etc. I
haven't looked into other arch's implementations - who else has done it? Has
any other architecture ever had a case similar to ours?

Thanks,

--Natalie


* RE: [PATCH][2.4]  generic cluster APIC support for systems with m ore than 8 CPUs
@ 2002-12-26  2:18 Van Maren, Kevin
  2002-12-27 23:38 ` Alan Cox
  0 siblings, 1 reply; 24+ messages in thread
From: Van Maren, Kevin @ 2002-12-26  2:18 UTC (permalink / raw)
  To: 'Alan Cox ', Protasevich, Natalie
  Cc: ''William Lee Irwin III' ',
	''Christoph Hellwig' ',
	''James Cleverdon' ',
	''Pallipadi, Venkatesh' ',
	''Linux Kernel' ',
	''Martin Bligh' ',
	''John Stultz' ',
	''Nakajima, Jun' ',
	''Mallick, Asit K' ',
	''Saxena, Sunil' ', Van Maren, Kevin,
	''Andi Kleen' ',
	''Hubert Mantel' '

> One thing I will say. Your code would be a hell of a lot saner for
> merging if you mapped the ISA/Legacy IRQ's as 0-15 (to software) and the
> PCI ones to 16+ like everyone else does. That would kill a _lot_ of
> ifdefs and the IRQ0 corner case

If you have a suggestion on how to do that, I am sure we would
all be grateful to hear it.

Note that the reason the code _exists_ is because the interrupt
lines are physically connected to different pins on the APIC
than they are in "normal" systems.  The legitimacy of that
decision is not up for debate at this point -- that is the way
the system was built, and Linux needs to deal with it in
order to run on it.

So the PCI interrupts are in the table at IRQs < 16 (because
it tells which pin is being used), which makes it difficult
to tell whether a PCI or an ISA interrupt is being requested
if you tell the code "irq 3": if ISA, you need to use pin f(X),
while if PCI, you use pin X.

ACPI should have the ISA redirection information, but as
Natalie was saying, drivers hard-code the ISA vectors without
checking the ACPI info.

I suppose it would be possible to detect the ES7000 and have
the kernel re-write the PCI vectors (say, add 16 to them all)
and then re-mangle them based on a "< 16" criteria.
But I don't believe that is a "clean" solution either
(and would break when the ACPI isa redirection table is
properly used).

Anyway, this was the reason for the "severe irq override"
comment by Natalie.

Kevin


* RE: [PATCH][2.4]  generic cluster APIC support for systems with m ore than 8 CPUs
  2002-12-26  2:18 Van Maren, Kevin
@ 2002-12-27 23:38 ` Alan Cox
  0 siblings, 0 replies; 24+ messages in thread
From: Alan Cox @ 2002-12-27 23:38 UTC (permalink / raw)
  To: Van Maren, Kevin
  Cc: Protasevich, Natalie, ''William Lee Irwin III' ',
	''Christoph Hellwig' ',
	''James Cleverdon' ',
	"''Pallipadi, Venkatesh' "',
	''Linux Kernel' ',
	''Martin Bligh' ',
	''John Stultz' ',
	''Nakajima, Jun' ',
	''Mallick, Asit K' ',
	''Saxena, Sunil' ',
	''Andi Kleen' ',
	''Hubert Mantel' '

On Thu, 2002-12-26 at 02:18, Van Maren, Kevin wrote:
> If you have a suggestion on how to do that, I am sure we would
> all be grateful to hear it.
> 
> Note that the reason the code _exists_ is because the interrupt
> lines are physically connected to different pins on the APIC
> than they are in "normal" systems.  The legitimacy of that
> decision is not up for debate at this point -- that is the way
> the system was built, and Linux needs to deal with it in
> order to run on it.

The IRQ number is a cookie. Linux knows that on x86 ISA IRQ is mapped
0-15 and the ISA drivers sometimes know about this stuff too. What
exception number comes back off the processor and what function you call
is really quite unrelated. So request/free_irq functionality in the i386
layer can happily remap the irqs back and forth so ISA comes 0-15, and
keep the drivers and core oblivious to this.
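
A minimal sketch of that remapping, assuming a made-up board where PCI
interrupts sit on pins 0-15 and the ISA lines land on pins 16-31. All names
here (toy_request_irq, toy_free_irq, toy_dispatch, irq_to_pin, pin_to_irq,
record_irq) are hypothetical; the point is only that drivers see ISA as 0-15
while the pin numbers stay inside the arch layer:

```c
#include <stddef.h>

#define NR_ISA_IRQS 16
#define NR_PINS     32   /* hypothetical wiring: PCI on pins 0-15, ISA on pins 16-31 */

typedef void (*irq_handler_t)(int irq);

/* Handlers are stored by hardware pin; drivers never see this index. */
static irq_handler_t pin_handler[NR_PINS];

/* Driver-visible IRQ number -> hardware pin (swaps the two 16-entry halves). */
static int irq_to_pin(int irq)
{
    if (irq < 0 || irq >= NR_PINS)
        return -1;
    return irq < NR_ISA_IRQS ? irq + NR_ISA_IRQS : irq - NR_ISA_IRQS;
}

/* Hardware pin -> driver-visible IRQ number; the swap is its own inverse. */
static int pin_to_irq(int pin)
{
    return irq_to_pin(pin);
}

/* Register a handler under the logical IRQ number. Returns 0 or -1. */
static int toy_request_irq(int irq, irq_handler_t handler)
{
    int pin = irq_to_pin(irq);

    if (pin < 0 || handler == NULL || pin_handler[pin] != NULL)
        return -1;
    pin_handler[pin] = handler;
    return 0;
}

/* Drop the handler registered under the logical IRQ number. */
static void toy_free_irq(int irq)
{
    int pin = irq_to_pin(irq);

    if (pin >= 0)
        pin_handler[pin] = NULL;
}

/* Called by low-level entry code with the pin that actually fired. */
static void toy_dispatch(int pin)
{
    if (pin >= 0 && pin < NR_PINS && pin_handler[pin] != NULL)
        pin_handler[pin](pin_to_irq(pin));
}

/* Sample handler that just remembers which logical IRQ it was called with. */
static int last_irq = -1;

static void record_irq(int irq)
{
    last_irq = irq;
}
```

A driver that hard-codes "IRQ 3" calls toy_request_irq(3, ...) as usual; the
handler ends up on pin 19 and is still told "3" when that pin fires.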

> So the PCI interrupts are in the table at IRQs < 16 (because
> it tells which pin is being used), which makes it difficult
> to tell whether a PCI or an ISA interrupt is being requested
> if you tell the code "irq 3": if ISA, you need to use pin f(X),
> while if PCI, you use pin X.

Internal detail - doesn't matter for how you number IRQs outside of your
arch/i386/kernel internal bits

> ACPI should have the ISA redirection information, but as
> Natalie was saying, drivers hard-code the ISA vectors without
> checking the ACPI info.

ACPI is basically PC specific gunge. Drivers don't and should not know
about it.

> I suppose it would be possible to detect the ES7000 and have
> the kernel re-write the PCI vectors (say, add 16 to them all)

Linux doesn't touch the PCI interrupt line values or really care about
them, so that's trivial to do.

> and then re-mangle them based on a "< 16" criteria.
> But I don't believe that is a "clean" solution either
> (and would break when the ACPI isa redirection table is
> properly used).

No, because any ACPI ISA redirect must be done portably, so it would be used
to keep the ISA IRQs logically 0-15 on x86

> 
> Anyway, this was the reason for the "severe irq override"
> comment by Natalie.

I understand why it was done - I think it's the wrong abstraction, as it
pushes too much board knowledge into drivers and into places that make
assumptions about ISA (a legacy of Linux's original design being for the ISA
PC)



* RE: [PATCH][2.4]  generic cluster APIC support for systems with m ore than 8 CPUs
  2002-12-26  1:14 Protasevich, Natalie
@ 2002-12-27 23:39 ` Alan Cox
  0 siblings, 0 replies; 24+ messages in thread
From: Alan Cox @ 2002-12-27 23:39 UTC (permalink / raw)
  To: Protasevich, Natalie
  Cc: 'William Lee Irwin III', 'Christoph Hellwig',
	'James Cleverdon', 'Pallipadi, Venkatesh',
	'Linux Kernel', 'Martin Bligh',
	'John Stultz', 'Nakajima, Jun',
	'Mallick, Asit K', 'Saxena, Sunil',
	Van Maren, Kevin, 'Andi Kleen', 'Hubert Mantel'

On Thu, 2002-12-26 at 01:14, Protasevich, Natalie wrote:
> Alan, do you mean the case implemented in the IA64 tree? I was terribly out

x86. IA-64 isn't something I pay any attention to anyway




* RE: [PATCH][2.4]  generic cluster APIC support for systems with m ore than 8 CPUs
@ 2003-01-06 18:58 Protasevich, Natalie
  2003-01-08 14:53 ` Alan Cox
  0 siblings, 1 reply; 24+ messages in thread
From: Protasevich, Natalie @ 2003-01-06 18:58 UTC (permalink / raw)
  To: 'Alan Cox', Protasevich, Natalie
  Cc: 'William Lee Irwin III', 'Christoph Hellwig',
	'James Cleverdon', 'Pallipadi, Venkatesh',
	'Linux Kernel', 'Martin Bligh',
	'John Stultz', 'Nakajima, Jun',
	'Mallick, Asit K', 'Saxena, Sunil',
	Van Maren, Kevin, 'Andi Kleen', 'Hubert Mantel'

>One thing I will say. Your code would be a hell of a lot saner for
>merging if you mapped the ISA/Legacy IRQ's as 0-15 (to software) and the
>PCI ones to 16+ like everyone else does. That would kill a _lot_ of
>ifdefs and the IRQ0 corner case

Alan,

You were right: my new IRQ overwrite code (done the way you suggested) is
getting much smaller now.
I got it down to ... one line :-)! 

I have to say, that either the Linux code got greatly perfected or our
numerous BIOS changes helped (one or the other, maybe  both) but in earlier
days I couldn't boot the system with generic SMP kernel past the first delay
calibration (off of the PIC). That's why I had to tinker with the IRQ0 and
do the rest of ugly IRQ transformations you noticed earlier. APIC and XTPR
issues  are still there (I will wait for Venkatesh's patch), but I am only
concentrating on interrupts this time. Now, it only stumbles on the IO-APIC
setup, which I can  fix with one line of code... Unfortunately, this line
cannot be justified without bringing up "knowledge of the platform". 

I am working with the MP table for now; the ACPI case gives me same results
but I haven't looked at it yet.

The problem is that current IRQ overwrite code handles everything perfectly
except it cannot handle PCI IRQ range being placed  over the ISA range:

static int pin_2_irq(int idx, int apic, int pin)
{
	.....
        switch (mp_bus_id_to_type[bus])
        {
                case MP_BUS_ISA: /* ISA pin */
                case MP_BUS_EISA:
                case MP_BUS_MCA:
                {
                        irq = mp_irqs[idx].mpc_srcbusirq;
                        break;
                }
                case MP_BUS_PCI: /* PCI pin */
                {
                        /*
                         * PCI IRQs are mapped in order
                         */
                        i = irq = 0;
                        while (i < apic)
                                irq += nr_ioapic_registers[i++];
                        /* Here, it just takes the pin (0-16 in our case) and returns it as IRQ: */
                        irq += pin;
                        /*
                         * Knowing the above and the fact that our first IO-APIC has
                         * the ISA range, I just shift it off the ISA range.
                         * NBP - my line. Could be "if (irq < 16)" instead.
                         */
                        if (!apic)
                                irq += 16;
                        break;
                }
                default:
                {
	....

The original code makes assumptions of its own... but it is a question of how
generic I want to be to handle our case.
I guess I could:

1) place pin_2_irq and the one that fixes the ACPI case (and which I haven't
found yet) in our sub-arch making those routines platform defined
2) try to fit into the generic case, which would take something like changing
mp_irqs on a per-platform basis, or finding something that fixes every possible
case of this kind. For example, in the IA64 case the irq code is arranged
pretty smartly: they made a one-to-one correspondence between vectors and IRQs.
Then they set up the ISA range within 0x20-0x2f, and all others go from 0x30 on,
so the two never get mixed up. (BTW, you mentioned the x86 case once, but to me
their IRQ code looked identical to the i386 case, unless I missed something.)
3) ??? - what would you recommend? - ??? (Everyone's comments are VERY
welcome!)
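
The IA64-style layout mentioned in option 2 can be sketched roughly as
follows. Only the 0x20-0x2f and 0x30+ ranges come from the description above;
the function names and the trivial bump allocator are hypothetical, and real
code reserves some vectors for other purposes:

```c
#define ISA_VECTOR_BASE   0x20
#define NR_ISA_IRQS       16
#define OTHER_VECTOR_BASE (ISA_VECTOR_BASE + NR_ISA_IRQS)   /* 0x30 */
#define LAST_VECTOR       0xff

/* Next vector to hand out to a non-ISA interrupt source. */
static int next_vector = OTHER_VECTOR_BASE;

/* ISA IRQ n always gets vector 0x20 + n; returns -1 if n is not an ISA IRQ. */
static int isa_irq_to_vector(int isa_irq)
{
    if (isa_irq < 0 || isa_irq >= NR_ISA_IRQS)
        return -1;
    return ISA_VECTOR_BASE + isa_irq;
}

/* Everything else is allocated from 0x30 upward, so it cannot hit the ISA range. */
static int alloc_other_vector(void)
{
    if (next_vector > LAST_VECTOR)
        return -1;
    return next_vector++;
}

/* With disjoint ranges, the vector alone says whether the source is ISA. */
static int vector_is_isa(int vector)
{
    return vector >= ISA_VECTOR_BASE && vector < OTHER_VECTOR_BASE;
}
```

Because the vector doubles as the IRQ number in this scheme, "irq 0x23" can
only mean ISA IRQ 3, whichever pin it arrives on.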

This is a crucial issue for ES7000, since everything else seems to fit in
sub-arch. 
Another one that I am worried about is XTPR, hopefully someone is looking at
its implementation... 

Thanks,

--Natalie



* RE: [PATCH][2.4]  generic cluster APIC support for systems with m ore than 8 CPUs
  2003-01-06 18:58 Protasevich, Natalie
@ 2003-01-08 14:53 ` Alan Cox
  0 siblings, 0 replies; 24+ messages in thread
From: Alan Cox @ 2003-01-08 14:53 UTC (permalink / raw)
  To: Protasevich, Natalie
  Cc: 'William Lee Irwin III', 'Christoph Hellwig',
	'James Cleverdon', 'Pallipadi, Venkatesh',
	'Linux Kernel', 'Martin Bligh',
	'John Stultz', 'Nakajima, Jun',
	'Mallick, Asit K', 'Saxena, Sunil',
	Van Maren, Kevin, 'Andi Kleen', 'Hubert Mantel'

On Mon, 2003-01-06 at 18:58, Protasevich, Natalie wrote:
> 1) place pin_2_irq and the one that fixes the ACPI case (and which I haven't
> found yet) in our sub-arch making those routines platform defined

Does cpu_to_pci_irq()/pci_to_cpu_irq() work for this? That is sort of
the equivalent we have in mapping functions for other purposes. You
could then do the 16 irq shift, while other platforms would define it
in default/*.h to be a nop. 
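
A sketch of how such a pair of mapping hooks might look. The names
pci_to_cpu_irq and cpu_to_pci_irq are taken from the suggestion above, but
their signatures here are assumed, and the default_/shifted_ helpers and the
TOY_SHIFTED_PCI_IRQS switch are hypothetical stand-ins for per-sub-arch
headers:

```c
#define PCI_IRQ_SHIFT 16   /* size of the ISA range the PCI IRQs must clear */

/* Default platforms: PCI IRQ numbers are already fine, so both are no-ops. */
static inline int default_pci_to_cpu_irq(int irq) { return irq; }
static inline int default_cpu_to_pci_irq(int irq) { return irq; }

/* Platform whose PCI pins overlap the ISA range: move them up by 16. */
static inline int shifted_pci_to_cpu_irq(int irq) { return irq + PCI_IRQ_SHIFT; }
static inline int shifted_cpu_to_pci_irq(int irq) { return irq - PCI_IRQ_SHIFT; }

/* Each sub-arch header would pick one pair; generic code uses only these. */
#ifdef TOY_SHIFTED_PCI_IRQS
#define pci_to_cpu_irq(irq) shifted_pci_to_cpu_irq(irq)
#define cpu_to_pci_irq(irq) shifted_cpu_to_pci_irq(irq)
#else
#define pci_to_cpu_irq(irq) default_pci_to_cpu_irq(irq)
#define cpu_to_pci_irq(irq) default_cpu_to_pci_irq(irq)
#endif
```

Generic code would then call pci_to_cpu_irq() where it currently uses the raw
pin-derived number, and only the shifted platform pays for the translation.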

> Another one that I am worried about is XTPR, hopefully someone is looking at
> its implementation... 

XTPR I really don't know anything about, alas.



Thread overview: 24+ messages
2002-12-22  6:19 [PATCH][2.4] generic cluster APIC support for systems with m ore than 8 CPUs Pallipadi, Venkatesh
2002-12-22  6:39 ` William Lee Irwin III
2002-12-22 17:21 ` Martin J. Bligh
2002-12-22 17:23 ` Martin J. Bligh
  -- strict thread matches above, loose matches on Subject: below --
2003-01-06 18:58 Protasevich, Natalie
2003-01-08 14:53 ` Alan Cox
2002-12-26  2:18 Van Maren, Kevin
2002-12-27 23:38 ` Alan Cox
2002-12-26  1:14 Protasevich, Natalie
2002-12-27 23:39 ` Alan Cox
2002-12-23  7:29 Kamble, Nitin A
2002-12-23  7:52 ` Martin J. Bligh
2002-12-23  9:46   ` Zwane Mwaikambo
2002-12-23 15:30     ` Martin J. Bligh
     [not found] <3FAD1088D4556046AEC48D80B47B478C1AEC75@usslc-exch-4.slc.unisys.com>
2002-12-22 20:41 ` Protasevich, Natalie
2002-12-22 20:52   ` Martin J. Bligh
2002-12-22  4:00 Pallipadi, Venkatesh
2002-12-22  4:05 ` Martin J. Bligh
2002-12-20 22:57 Protasevich, Natalie
2002-12-20 23:33 ` William Lee Irwin III
2002-12-25 21:41 ` Alan Cox
     [not found] <3FAD1088D4556046AEC48D80B47B478C0101F55D@usslc-exch-4.slc.unisys.com>
2002-12-20 15:46 ` Van Maren, Kevin
2002-12-20 16:30   ` Martin J. Bligh
2002-12-20 17:16   ` William Lee Irwin III
