* RE: [RFC][PATCH take2 0/13] Support vector domain on ia64
2007-06-19 8:13 [RFC][PATCH take2 0/13] Support vector domain on ia64 Yasuaki Ishimatsu
@ 2007-06-29 21:04 ` Luck, Tony
2007-06-29 22:30 ` Luck, Tony
` (6 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Luck, Tony @ 2007-06-29 21:04 UTC (permalink / raw)
To: linux-ia64
> Here is a series of patches for ia64 vector domain. By these patches, we can
> use more than 256 irqs. The patchset is based on existing x86-64 vector domain
> code. This is for 2.6.22-rc5 and I tested them on my ia64 box.
There are a few whitespace issues (<space><tab>) amongst these patches (I think
in parts 6 & 13). Here's the summary of the broken bits:
< + unsigned int gsi_base; /* GSI base */
< + unsigned short num_rte; /* # of RTEs on this IOSAPIC */
---
> + unsigned int gsi_base; /* GSI base */
> + unsigned short num_rte; /* # of RTEs on this IOSAPIC */
< + if (irq < 0)
---
> + if (irq < 0)
< +#define IRQ_RSVD (2)
---
> +#define IRQ_RSVD (2)
< + return vector;
---
> + return vector;
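For anyone wanting to check a patch for this pattern before posting, here is a minimal sketch in C. It only looks for a space immediately followed by a tab anywhere in a line; has_space_before_tab() is a hypothetical helper written for this note, not part of the patchset or of any existing tool.

```c
#include <assert.h>
#include <string.h>

/*
 * Return 1 if the line contains a space immediately followed by a tab
 * (the "<space><tab>" pattern mentioned above), otherwise return 0.
 * Hypothetical helper, for illustration only.
 */
static int has_space_before_tab(const char *line)
{
	return strstr(line, " \t") != NULL;
}
```

Running each "+" line of a patch through a check like this should flag the lines listed above.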
I also got two build errors which broke several configurations:
tiger-up (this is arch/ia64/configs/tiger_defconfig with CONFIG_SMP deleted)
arch/ia64/kernel/irq_ia64.c: In function `parse_vector_domain':
arch/ia64/kernel/irq_ia64.c:270: error: `no_int_routing' undeclared (first use in this function)
arch/ia64/kernel/irq_ia64.c:270: error: (Each undeclared identifier is reported only once
arch/ia64/kernel/irq_ia64.c:270: error: for each function it appears in.)
make[1]: *** [arch/ia64/kernel/irq_ia64.o] Error 1
generic-up (arch/ia64/defconfig with CONFIG_SMP deleted)
same error as tiger-up
generic-smp (arch/ia64/defconfig)
CC arch/ia64/kernel/asm-offsets.s
In file included from include/linux/hardirq.h:7,
from include/linux/interrupt.h:11,
from include/asm-ia64/mca.h:16,
from arch/ia64/kernel/asm-offsets.c:15:
include/asm/hardirq.h:30:3: #error HARDIRQ_BITS is too low!
make[1]: *** [arch/ia64/kernel/asm-offsets.s] Error 1
sn2-smp (arch/ia64/configs/sn2_defconfig)
same error as generic-smp
generic-sparse (arch/ia64/configs/gensparse_defconfig)
same error as generic-smp
allnoconfig (make allnoconfig)
same error as tiger-up
Booting the arch/ia64/configs/tiger_defconfig kernel on my 4-socket Montecito tiger platform, I
get an almost immediate oops. I don't have the full stack backtrace, but the highlights were:
die
ia64_do_page_fault
ia64_leave_kernel
ia64_handle_irq
ia64_leave_kernel
unlock_ipi_calllock
start_secondary
I tried both with and without the new "vector=percpu" boot option (which needs to be
documented in Documentation/kernel-parameters.txt), but it dies with the same
stack trace both ways.
-Tony
* RE: [RFC][PATCH take2 0/13] Support vector domain on ia64
From: Luck, Tony @ 2007-06-29 22:30 UTC (permalink / raw)
To: linux-ia64
> arch/ia64/kernel/irq_ia64.c: In function `parse_vector_domain':
> arch/ia64/kernel/irq_ia64.c:270: error: `no_int_routing' undeclared (first use in this function)
Fix for this is just:
diff --git a/arch/ia64/kernel/irq_ia64.c b/arch/ia64/kernel/irq_ia64.c
index 8efb6e1..91abd1b 100644
--- a/arch/ia64/kernel/irq_ia64.c
+++ b/arch/ia64/kernel/irq_ia64.c
@@ -248,7 +248,7 @@ void __setup_vector_irq(int cpu)
}
}
-#if defined(CONFIG_IA64_GENERIC) || defined(CONFIG_IA64_DIG)
+#if defined(CONFIG_SMP) && (defined(CONFIG_IA64_GENERIC) || defined(CONFIG_IA64_DIG))
static enum vector_domain_type {
VECTOR_DOMAIN_NONE,
VECTOR_DOMAIN_PERCPU
-Tony
* RE: [RFC][PATCH take2 0/13] Support vector domain on ia64
From: Luck, Tony @ 2007-06-29 23:20 UTC (permalink / raw)
To: linux-ia64
> include/asm/hardirq.h:30:3: #error HARDIRQ_BITS is too low!
This one is a direct consequence of the new definition of NR_IRQS:
#define NR_IRQS (NR_VECTORS + (32 * NR_CPUS))
With a large NR_CPUS value, this gets too big. Do we really need to scale
it with the number of cpus? I don't think this is the right thing to do.
While large cpu count systems may also have a large number of I/O devices,
the two parameters aren't strongly connected.
We could prevent it blowing up by doing:
/* NR_IRQS is limited by HARDIRQ_BITS */
#if (NR_VECTORS + (32 * NR_CPUS)) < 16383
#define NR_IRQS (NR_VECTORS + (32 * NR_CPUS))
#else
#define NR_IRQS 16383
#endif
But that looks rather ugly and still fails to build because of overflow of the percpu area.
This gets big because of include/linux/kernel_stat.h:
DECLARE_PER_CPU(struct kernel_stat, kstat);
With the current allocation of percpu stuff, it looks like we can push NR_IRQS up
to around 7.5K, but that would leave no space for other additions to percpu space.
So if large systems are going to need as many as 7.5K IRQs, then we'll also need to
do something about kstat.
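To make the build failure concrete, here is a rough sketch of the arithmetic in C. It assumes the check behind the #error compares NR_IRQS against (1 << HARDIRQ_BITS), and it assumes HARDIRQ_BITS is 14 and NR_VECTORS is 256; those values and the SKETCH_ names are illustrative assumptions, not quotes from the tree.

```c
#include <assert.h>

/* Illustrative values only; the real ones come from the kernel headers. */
#define SKETCH_NR_VECTORS   256u
#define SKETCH_HARDIRQ_BITS 14u

/* NR_IRQS as the patchset defines it, for a given NR_CPUS. */
static unsigned int scaled_nr_irqs(unsigned int nr_cpus)
{
	return SKETCH_NR_VECTORS + 32u * nr_cpus;
}

/* Nonzero when nr_irqs fits in the hardirq count field (no #error). */
static int fits_hardirq_bits(unsigned int nr_irqs)
{
	return nr_irqs <= (1u << SKETCH_HARDIRQ_BITS);
}
```

Under these assumptions a 4-cpu build gets 384 irqs and passes, while a 512-cpu build gets 16640, which is over the 16384 limit.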
-Tony
* Re: [RFC][PATCH take2 0/13] Support vector domain on ia64
From: Yasuaki Ishimatsu @ 2007-07-02 10:26 UTC (permalink / raw)
To: linux-ia64
Luck, Tony wrote:
>> Here is a series of patches for ia64 vector domain. By these patches, we can
>> use more than 256 irqs. The patchset is based on existing x86-64 vector domain
>> code. This is for 2.6.22-rc5 and I tested them on my ia64 box.
>
> There are a few whitespace issues (<space><tab>) amongst these patches (I think
> in parts 6 & 13). Here's the summary of the broken bits:
Sorry. I'll find the whitespace errors in my patchset and fix them in the
take3 patchset for vector domain support.
> < + unsigned int gsi_base; /* GSI base */
> < + unsigned short num_rte; /* # of RTEs on this IOSAPIC */
> ---
>> + unsigned int gsi_base; /* GSI base */
>> + unsigned short num_rte; /* # of RTEs on this IOSAPIC */
>
> < + if (irq < 0)
> ---
>> + if (irq < 0)
>
> < +#define IRQ_RSVD (2)
> ---
>> +#define IRQ_RSVD (2)
>
> < + return vector;
> ---
>> + return vector;
>
> I also got two build errors which broke several configurations:
>
> tiger-up (this is arch/ia64/configs/tiger_defconfig with CONFIG_SMP deleted)
>
> arch/ia64/kernel/irq_ia64.c: In function `parse_vector_domain':
> arch/ia64/kernel/irq_ia64.c:270: error: `no_int_routing' undeclared (first use in this function)
> arch/ia64/kernel/irq_ia64.c:270: error: (Each undeclared identifier is reported only once
> arch/ia64/kernel/irq_ia64.c:270: error: for each function it appears in.)
> make[1]: *** [arch/ia64/kernel/irq_ia64.o] Error 1
>
> generic-up (arch/ia64/defconfig with CONFIG_SMP deleted)
>
> same error as tiger-up
>
> generic-smp (arch/ia64/defconfig)
>
> CC arch/ia64/kernel/asm-offsets.s
> In file included from include/linux/hardirq.h:7,
> from include/linux/interrupt.h:11,
> from include/asm-ia64/mca.h:16,
> from arch/ia64/kernel/asm-offsets.c:15:
> include/asm/hardirq.h:30:3: #error HARDIRQ_BITS is too low!
> make[1]: *** [arch/ia64/kernel/asm-offsets.s] Error 1
>
>
> sn2-smp (arch/ia64/configs/sn2_defconfig)
>
> same error as generic-smp
>
> generic-sparse (arch/ia64/configs/gensparse_defconfig)
>
> same error as generic-smp
>
> allnoconfig (make allnoconfig)
>
> same error as tiger-up
>
I tested only tiger_defconfig, so I didn't notice this problem.
I confirmed that my patchset causes this problem and that your following patch fixes it:
http://www.spinics.net/lists/linux-ia64/msg03352.html
Thanks, Tony.
>
> Booting the arch/ia64/configs/tiger_defconfig kernel on my 4-socket Montecito tiger platform, I
> get an almost immediate oops. I don't have the full stack backtrace, but the highlights were:
>
> die
> ia64_do_page_fault
> ia64_leave_kernel
> ia64_handle_irq
> ia64_leave_kernel
> unlock_ipi_calllock
> start_secondary
>
> I tried both with and without the new "vector=percpu" boot option (which needs to be
> documented in Documentation/kernel-parameters.txt), but it dies with the same
> stack trace both ways.
I have never seen these messages. I will check them.
Regards,
Yasuaki Ishimatsu
> -Tony
* Re: [RFC][PATCH take2 0/13] Support vector domain on ia64
From: Yasuaki Ishimatsu @ 2007-07-04 3:14 UTC (permalink / raw)
To: linux-ia64
Luck, Tony wrote:
>> include/asm/hardirq.h:30:3: #error HARDIRQ_BITS is too low!
>
> This one is a direct consequence of the new definition of NR_IRQS:
>
> #define NR_IRQS (NR_VECTORS + (32 * NR_CPUS))
>
> With a large NR_CPUS value, this gets too big. Do we really need to scale
> it with the number of cpus? I don't think this is the right thing to do.
> While large cpu count systems may also have a large number of I/O devices,
> the two parameters aren't strongly connected.
>
> We could prevent it blowing up by doing:
>
> /* NR_IRQS is limited by HARDIRQ_BITS */
> #if (NR_VECTORS + (32 * NR_CPUS)) < 16383
> #define NR_IRQS (NR_VECTORS + (32 * NR_CPUS))
> #else
> #define NR_IRQS 16383
> #endif
I think the value of NR_IRQS does depend on NR_CPUS when the number of
CPUs is small, but the relation weakens as the number of CPUs grows:
in a typical machine configuration, the number of I/O devices does not
increase in proportion to the number of CPUs. So a threshold such as
16383 is too large. I'd like to define NR_IRQS as follows:
#if (NR_VECTORS + (32 * NR_CPUS)) < 1024
#define NR_IRQS (NR_VECTORS + (32 * NR_CPUS))
#else
#define NR_IRQS 1024
#endif
I will also make NR_IRQS tunable, so that the user can set it with a
boot parameter.
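As a quick illustration of how this cap behaves, here is a small sketch in C that evaluates the same expression at run time. NR_VECTORS is assumed to be 256 here, and the SKETCH_ names and capped_nr_irqs() are hypothetical, used only for this example.

```c
#include <assert.h>

#define SKETCH_NR_VECTORS 256u   /* assumed value, for illustration */
#define SKETCH_IRQ_CAP    1024u  /* the cap proposed above */

/* Same rule as the #if/#else above, evaluated for a given NR_CPUS. */
static unsigned int capped_nr_irqs(unsigned int nr_cpus)
{
	unsigned int scaled = SKETCH_NR_VECTORS + 32u * nr_cpus;

	return scaled < SKETCH_IRQ_CAP ? scaled : SKETCH_IRQ_CAP;
}
```

With these assumed values, NR_IRQS scales with the CPU count up to 23 CPUs and stays at 1024 from 24 CPUs upward.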
Thanks,
Yasuaki Ishimatsu
>
> But that looks rather ugly and still fails to build because of overflow of the percpu area.
> This gets big because of include/linux/kernel_stat.h:
>
> DECLARE_PER_CPU(struct kernel_stat, kstat);
>
> With the current allocation of percpu stuff, it looks like we can push NR_IRQS up
> to around 7.5K, but that would leave no space for other additions to percpu space.
> So if large systems are going to need as many as 7.5K IRQs, then we'll also need to
> do something about kstat.
>
> -Tony
* Re: [RFC][PATCH take2 0/13] Support vector domain on ia64
From: Yasuaki Ishimatsu @ 2007-07-05 11:55 UTC (permalink / raw)
To: linux-ia64
Hi Tony,
Luck, Tony wrote:
>> Here is a series of patches for ia64 vector domain. By these patches, we can
>> use more than 256 irqs. The patchset is based on existing x86-64 vector domain
>> code. This is for 2.6.22-rc5 and I tested them on my ia64 box.
>
> There are a few whitespace issues (<space><tab>) amongst these patches (I think
> in parts 6 & 13). Here's the summary of the broken bits:
>
> < + unsigned int gsi_base; /* GSI base */
> < + unsigned short num_rte; /* # of RTEs on this IOSAPIC */
> ---
>> + unsigned int gsi_base; /* GSI base */
>> + unsigned short num_rte; /* # of RTEs on this IOSAPIC */
>
> < + if (irq < 0)
> ---
>> + if (irq < 0)
>
> < +#define IRQ_RSVD (2)
> ---
>> +#define IRQ_RSVD (2)
>
> < + return vector;
> ---
>> + return vector;
>
> I also got two build errors which broke several configurations:
>
> tiger-up (this is arch/ia64/configs/tiger_defconfig with CONFIG_SMP deleted)
>
> arch/ia64/kernel/irq_ia64.c: In function `parse_vector_domain':
> arch/ia64/kernel/irq_ia64.c:270: error: `no_int_routing' undeclared (first use in this function)
> arch/ia64/kernel/irq_ia64.c:270: error: (Each undeclared identifier is reported only once
> arch/ia64/kernel/irq_ia64.c:270: error: for each function it appears in.)
> make[1]: *** [arch/ia64/kernel/irq_ia64.o] Error 1
>
> generic-up (arch/ia64/defconfig with CONFIG_SMP deleted)
>
> same error as tiger-up
>
> generic-smp (arch/ia64/defconfig)
>
> CC arch/ia64/kernel/asm-offsets.s
> In file included from include/linux/hardirq.h:7,
> from include/linux/interrupt.h:11,
> from include/asm-ia64/mca.h:16,
> from arch/ia64/kernel/asm-offsets.c:15:
> include/asm/hardirq.h:30:3: #error HARDIRQ_BITS is too low!
> make[1]: *** [arch/ia64/kernel/asm-offsets.s] Error 1
>
>
> sn2-smp (arch/ia64/configs/sn2_defconfig)
>
> same error as generic-smp
>
> generic-sparse (arch/ia64/configs/gensparse_defconfig)
>
> same error as generic-smp
>
> allnoconfig (make allnoconfig)
>
> same error as tiger-up
>
>
>
> Booting the arch/ia64/configs/tiger_defconfig kernel on my 4-socket Montecito tiger platform, I
> get an almost immediate oops. I don't have the full stack backtrace, but the highlights were:
>
> die
> ia64_do_page_fault
> ia64_leave_kernel
> ia64_handle_irq
> ia64_leave_kernel
> unlock_ipi_calllock
> start_secondary
>
> I tried both with and without the new "vector=percpu" boot option (which needs to be
> documented in Documentation/kernel-parameters.txt), but it dies with the same
> stack trace both ways.
>
I tried it on my 4-socket Montecito machine, but the oops didn't happen.
Could you send me your .config file and the entire stack trace?
Thanks,
Yasuaki Ishimatsu
> -Tony
* RE: [RFC][PATCH take2 0/13] Support vector domain on ia64
From: Luck, Tony @ 2007-07-10 18:12 UTC (permalink / raw)
To: linux-ia64
> I have never seen these messages. I will check them.
This seems to be an intermittent problem. I just got my serial console
working again to get you a full stack trace, and the first time I booted the
kernel with your vector domain patch it didn't crash.
Second boot did crash. Here's the stack trace part of the console log:
CPU 9: synchronized ITC with CPU 0 (last diff 0 cycles, maxerr 159 cycles)
CPU 10: synchronized ITC with CPU 0 (last diff 0 cycles, maxerr 40 cycles)
CPU 11: synchronized ITC with CPU 0 (last diff 0 cycles, maxerr 161 cycles)
Unable to handle kernel paging request at virtual address a000008100974780
swapper[0]: Oops 8813272891392 [1]
Modules linked in:
Pid: 0, CPU 12, comm: swapper
psr : 0000101008022018 ifs : 8000000000000389 ip : [<a000000100011c50>] Not tainted
ip is at ia64_handle_irq+0x190/0x2a0
unat: 0000000000000000 pfs : 03e0000000000389 rsc : 0000000000000003
rnat: 0000000000000000 bsps: 0000000000000000 pr : 00000000000095a5
ldrs: 0000000000000000 ccv : 0000000000000fff fpsr: 0009804c0270033f
csd : 0000000000000000 ssd : 0000000000000000
b0 : a000000100011b10 b6 : a000000100003320 b7 : e00000007fb1bde0
f6 : 000000000000000000000 f7 : 1003e0000000155557000
f8 : 000000000000000000000 f9 : 000000000000000000000
f10 : 000000000000000000000 f11 : 000000000000000000000
r1 : a000000100c2db30 r2 : 0000000000010000 r3 : 0000000000000000
r8 : 00000000000000fd r9 : 00000000000000fc r10 : 0000000000000000
r11 : 0009804c0270033f r12 : e000000180eafc60 r13 : e000000180ea0000
r14 : ffffffffffff4b00 r15 : ffffffffffff0428 r16 : 00000000ffffffff
r17 : a000000100a45c58 r18 : e00000007fe131b0 r19 : e00000007fb1bde0
r20 : a000000100a4b6a0 r21 : a000000100970000 r22 : a000008100974780
r23 : 0000007fffffff80 r24 : a000000100974800 r25 : a000000100a2ef30
r26 : e0000001800c004c r27 : 000000000000004c r28 : e0000001800c0000
r29 : ffffffffffff0000 r30 : e0000001800d0000 r31 : a000000100a44ea0
Call Trace:
[<a000000100012bb0>] show_stack+0x50/0xa0
sp=e000000180eaf830 bsp=e000000180ea0d28
[<a000000100013480>] show_regs+0x820/0x840
sp=e000000180eafa00 bsp=e000000180ea0ce0
[<a0000001000370c0>] die+0x1a0/0x280
sp=e000000180eafa00 bsp=e000000180ea0c98
[<a000000100061230>] ia64_do_page_fault+0x810/0x900
sp=e000000180eafa00 bsp=e000000180ea0c38
[<a00000010000bcc0>] ia64_leave_kernel+0x0/0x270
sp=e000000180eafa90 bsp=e000000180ea0c38
[<a000000100011c50>] ia64_handle_irq+0x190/0x2a0
sp=e000000180eafc60 bsp=e000000180ea0be8
[<a00000010000bcc0>] ia64_leave_kernel+0x0/0x270
sp=e000000180eafc60 bsp=e000000180ea0be8
[<a0000001000541d0>] unlock_ipi_calllock+0x30/0x60
sp=e000000180eafe30 bsp=e000000180ea0bd0
[<a000000100055fb0>] start_secondary+0x2d0/0x580
sp=e000000180eafe30 bsp=e000000180ea0b80
[<a0000001000089e0>] __end_ivt_text+0x6c0/0x6f0
sp=e000000180eafe30 bsp=e000000180ea0b80
Kernel panic - not syncing: Aiee, killing interrupt handler!
The bit of ia64_handle_irq where we crashed looks like this:
a000000100011c30: 02 80 00 34 10 10 [MII] ld4 r16=[r26]
a000000100011c36: 00 00 00 02 00 e0 nop.i 0x0;;
a000000100011c3c: 02 c1 7d 53 dep.z r23=r16,7,32
a000000100011c40: 03 00 00 00 01 00 [MII] nop.m 0x0
a000000100011c46: 70 02 40 00 42 c0 mov r39=r16;;
a000000100011c4c: 72 c1 00 80 add r22=r23,r24;;
a000000100011c50: 0d 70 00 2c 18 10 [MFI] ld8 r14=[r22]
r22 has the very bad value of 0xa000008100974780 ... so we die.
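The bad address can be reproduced from the register dump with a little arithmetic. The sketch below, in C, replays the dep.z and add instructions using the dumped values r16 = 0xffffffff and r24 = 0xa000000100974800. Reading r16 as a 32-bit index of -1 and r24 as the base of a table of 128-byte entries is an interpretation of the dump, not something established here; dep_z() and replay_r22() are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of ia64 "dep.z dst=src,pos,len": keep the low len bits of src,
 * shift them left by pos, and zero everything else.
 * Valid for 0 < len < 64.
 */
static uint64_t dep_z(uint64_t src, unsigned int pos, unsigned int len)
{
	uint64_t mask = (UINT64_C(1) << len) - 1;

	return (src & mask) << pos;
}

/* Replay "dep.z r23=r16,7,32" followed by "add r22=r23,r24". */
static uint64_t replay_r22(uint64_t r16, uint64_t r24)
{
	return dep_z(r16, 7, 32) + r24;
}
```

With the dumped inputs, dep_z() gives 0x7fffffff80, which matches r23 in the dump, and the sum is 0xa000008100974780, which matches both r22 and the faulting address.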
-Tony
* RE: [RFC][PATCH take2 0/13] Support vector domain on ia64
From: Luck, Tony @ 2007-07-10 23:36 UTC (permalink / raw)
To: linux-ia64
> This seems to be an intermittent problem.
One more note on that ... my test system often kicks out some
"Unexpected irq vector 0x13 on CPU xx!"
messages as it boots. Looking at the logs of previous boots, I see between
zero and two such messages in the last few dozen boots. When the
kernel with the vector-domain patch applied successfully booted this
morning, this was one of the times where there were no unexpected
interrupts.
Since the crash looks like it occurred while dereferencing the result of
"desc = irq_desc + irq" in generic_handle_irq(), I think the two are most likely related.
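To illustrate the kind of failure suspected here, below is a minimal sketch in C of a descriptor lookup that refuses an out-of-range irq number instead of indexing the table with it. This is not the kernel's code and not a proposed fix; the struct, the table, its size, and sketch_irq_to_desc() are all hypothetical names made up for the example.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NR_IRQS 1024	/* assumed table size, for illustration */

struct sketch_irq_desc {
	unsigned int status;	/* placeholder field */
};

static struct sketch_irq_desc sketch_descs[SKETCH_NR_IRQS];

/*
 * Return the descriptor for irq, or NULL when irq is outside the table,
 * e.g. -1 for a vector that has no irq bound to it.
 */
static struct sketch_irq_desc *sketch_irq_to_desc(int irq)
{
	if (irq < 0 || irq >= SKETCH_NR_IRQS)
		return NULL;
	return sketch_descs + irq;
}
```

An unguarded "table + irq" with irq = -1 read as an unsigned 32-bit value would point far past the end of the table, which is consistent with the address seen in the oops.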
-Tony