* [patch] Pad irq_desc to internode cacheline size
@ 2007-04-09 19:56 Ravikiran G Thirumalai
2007-04-09 20:57 ` Andrew Morton
0 siblings, 1 reply; 5+ messages in thread
From: Ravikiran G Thirumalai @ 2007-04-09 19:56 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
We noticed a drop in n/w performance due to the irq_desc being cacheline
aligned rather than internode aligned. We see 50% of expected performance
when two e1000 nics local to two different nodes have consecutive irq
descriptors allocated, due to false sharing.
Note that this patch does away with cacheline padding for the UP case, as it
does not seem useful for UP configurations.
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Index: linux-2.6.21-rc5/include/linux/irq.h
===================================================================
--- linux-2.6.21-rc5.orig/include/linux/irq.h 2007-04-09 10:16:23.560848473 -0700
+++ linux-2.6.21-rc5/include/linux/irq.h 2007-04-09 10:16:45.401177929 -0700
@@ -175,7 +175,7 @@ struct irq_desc {
struct proc_dir_entry *dir;
#endif
const char *name;
-} ____cacheline_aligned;
+} ____cacheline_internodealigned_in_smp;
extern struct irq_desc irq_desc[NR_IRQS];
Index: linux-2.6.21-rc5/kernel/irq/handle.c
===================================================================
--- linux-2.6.21-rc5.orig/kernel/irq/handle.c 2007-02-04 10:44:54.000000000 -0800
+++ linux-2.6.21-rc5/kernel/irq/handle.c 2007-04-09 12:26:40.473326023 -0700
@@ -48,7 +48,7 @@ handle_bad_irq(unsigned int irq, struct
*
* Controller mappings for all interrupt sources:
*/
-struct irq_desc irq_desc[NR_IRQS] __cacheline_aligned = {
+struct irq_desc irq_desc[NR_IRQS] __cacheline_aligned_in_smp = {
[0 ... NR_IRQS-1] = {
.status = IRQ_DISABLED,
.chip = &no_irq_chip,
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [patch] Pad irq_desc to internode cacheline size
2007-04-09 19:56 [patch] Pad irq_desc to internode cacheline size Ravikiran G Thirumalai
@ 2007-04-09 20:57 ` Andrew Morton
2007-04-09 21:47 ` Eric W. Biederman
0 siblings, 1 reply; 5+ messages in thread
From: Andrew Morton @ 2007-04-09 20:57 UTC (permalink / raw)
To: Ravikiran G Thirumalai; +Cc: linux-kernel, Eric W. Biederman
On Mon, 9 Apr 2007 12:56:27 -0700
Ravikiran G Thirumalai <kiran@scalex86.org> wrote:
> We noticed a drop in n/w performance due to the irq_desc being cacheline
> aligned rather than internode aligned. We see 50% of expected performance
> when two e1000 nics local to two different nodes have consecutive irq
> descriptors allocated, due to false sharing.
>
> Note that this patch does away with cacheline padding for the UP case, as it
> does not seem useful for UP configurations.
>
> Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
> Signed-off-by: Shai Fultheim <shai@scalex86.org>
>
> Index: linux-2.6.21-rc5/include/linux/irq.h
> ===================================================================
> --- linux-2.6.21-rc5.orig/include/linux/irq.h 2007-04-09 10:16:23.560848473 -0700
> +++ linux-2.6.21-rc5/include/linux/irq.h 2007-04-09 10:16:45.401177929 -0700
> @@ -175,7 +175,7 @@ struct irq_desc {
> struct proc_dir_entry *dir;
> #endif
> const char *name;
> -} ____cacheline_aligned;
> +} ____cacheline_internodealigned_in_smp;
>
> extern struct irq_desc irq_desc[NR_IRQS];
This will consume nearly 4k per irq won't it? What is the upper bound
here, across all configs and all hardware?
Is VSMP the only arch which has ____cacheline_internodealigned_in_smp
larger than ____cacheline_aligned_in_smp?
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [patch] Pad irq_desc to internode cacheline size
2007-04-09 20:57 ` Andrew Morton
@ 2007-04-09 21:47 ` Eric W. Biederman
2007-04-09 22:09 ` Ravikiran G Thirumalai
0 siblings, 1 reply; 5+ messages in thread
From: Eric W. Biederman @ 2007-04-09 21:47 UTC (permalink / raw)
To: Andrew Morton; +Cc: Ravikiran G Thirumalai, linux-kernel
Andrew Morton <akpm@linux-foundation.org> writes:
> This will consume nearly 4k per irq won't it? What is the upper bound
> here, across all configs and all hardware?
>
> Is VSMP the only arch which has ____cacheline_internodealigned_in_smp
> larger than ____cacheline_aligned_in_smp?
Ugh. We set internode aligned to 4k for all of x86_64.
I believe this ups our worst case memory consumption for
the array from 1M to 32M. Although the low end might be 2M.
I can't recall if an irq_desc takes one cache line or two
after we have put the cpu masks in it.
My gut feel says that what we want to do is delay this until
we are dynamically allocating the array members. Then we can at
least have the chance of allocating the memory on the proper NUMA
node, and won't need the extra NUMA alignment.
I'm not at all certain I'm impressed by an architecture that has
4K aligned cache lines. That seems terribly piggy. We might as
well do distributed shared memory in software on a cluster...
Eric
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [patch] Pad irq_desc to internode cacheline size
2007-04-09 21:47 ` Eric W. Biederman
@ 2007-04-09 22:09 ` Ravikiran G Thirumalai
2007-04-09 22:33 ` Eric W. Biederman
0 siblings, 1 reply; 5+ messages in thread
From: Ravikiran G Thirumalai @ 2007-04-09 22:09 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Andrew Morton, linux-kernel, shai
On Mon, Apr 09, 2007 at 03:47:52PM -0600, Eric W. Biederman wrote:
> Andrew Morton <akpm@linux-foundation.org> writes:
>
> > This will consume nearly 4k per irq won't it? What is the upper bound
> > here, across all configs and all hardware?
> >
> > Is VSMP the only arch which has ____cacheline_internodealigned_in_smp
> > larger than ____cacheline_aligned_in_smp?
>
> Ugh. We set internode aligned to 4k for all of x86_64.
!!! No. internode aligned is 4k only if CONFIG_X86_VSMP is chosen. The
internode line size defaults to SMP_CACHE_BYTES for all other machine types.
Please note that an INTERNODE_CACHE_SHIFT of 12 is defined under
#ifdef CONFIG_X86_VSMP in include/asm-x86_64/cache.h
>
> I believe this ups our worst case memory consumption for
> the array from 1M to 32M. Although the low end might be 2M.
> I can't recall if an irq_desc takes one cache line or two
> after we have put the cpu masks in it.
>
> My gut feel says that what we want to do is delay this until
> we are dynamically allocating the array members. Then we can at
> least have the chance of allocating the memory on the proper NUMA
> node, and won't need the extra NUMA alignment.
I was thinking on those lines as well. But, until we get there, can we have
this in as stop gap? The patch does not increase the memory footprint for
any other architecture other than vSMPowered systems.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [patch] Pad irq_desc to internode cacheline size
2007-04-09 22:09 ` Ravikiran G Thirumalai
@ 2007-04-09 22:33 ` Eric W. Biederman
0 siblings, 0 replies; 5+ messages in thread
From: Eric W. Biederman @ 2007-04-09 22:33 UTC (permalink / raw)
To: Ravikiran G Thirumalai; +Cc: Andrew Morton, linux-kernel, shai
Ravikiran G Thirumalai <kiran@scalex86.org> writes:
> !!! No. internode aligned is 4k only if CONFIG_X86_VSMP is chosen. The
> internode line size defaults to SMP_CACHE_BYTES for all other machine types.
> Please note that an INTERNODE_CACHE_SHIFT of 12 is defined under
> #ifdef CONFIG_X86_VSMP in include/asm-x86_64/cache.h
You are right. I got confused by the single definition in
asm-x86_64/cache.h
Think you can rewrite that concept so it is readable and maintainable?
Maybe all in Kconfig like INTERNODE_CACHE_BYTES or all in an sub
architecture specific header?
>> I believe this ups our worst case memory consumption for
>> the array from 1M to 32M. Although the low end might be 2M.
>> I can't recall if an irq_desc takes one cache line or two
>> after we have put the cpu masks in it.
>>
>> My gut feel says that what we want to do is delay this until
>> we are dynamically allocating the array members. Then we can at
>> least have the chance of allocating the memory on the proper NUMA
>> node, and won't need the extra NUMA alignment.
>
> I was thinking on those lines as well. But, until we get there, can we have
> this in as stop gap? The patch does not increase the memory footprint for
> any other architecture other than vSMPowered systems.
I don't think so because it doesn't make sense, and we are talking
about extremely generic code. And several other architectures are
already using the INTERNODE_CACHE_SHIFT concept. So even if VSMP
is the only architecture that defines it today, it doesn't look like
the only architecture that will define it tomorrow.
If you are sufficient NUMA to care you will destroy your performance
by handling the irq on the wrong node. So it make no sense to have
an irq_desc that is optimized for handling an irq on the wrong node.
So until we can handle this cleanly let's not let your platform
specific pain and insanity spill into other architectures.
I'm about half way there towards a patchset to get irq_desc
dynamically allocated, and if can find a day or two in the next
couple of weeks I 2.6.23 sounds like a possibility. So this shouldn't
be something that is impossibly distant.
Eric
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2007-04-09 22:34 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-04-09 19:56 [patch] Pad irq_desc to internode cacheline size Ravikiran G Thirumalai
2007-04-09 20:57 ` Andrew Morton
2007-04-09 21:47 ` Eric W. Biederman
2007-04-09 22:09 ` Ravikiran G Thirumalai
2007-04-09 22:33 ` Eric W. Biederman
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.