From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Morton
Date: Tue, 14 Nov 2006 10:05:17 +0000
Subject: Re: [PATCH] some irq_chip variables initiate end to point to NULL
Message-Id: <20061114020517.2222dd08.akpm@osdl.org>
List-Id: 
References: <1163495289.4311.68.camel@ymzhang-perf.sh.intel.com>
In-Reply-To: <1163495289.4311.68.camel@ymzhang-perf.sh.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
To: "Zhang, Yanmin"
Cc: LKML , "linux-ia64@vger.kernel.org" , Ingo Molnar , Thomas Gleixner

On Tue, 14 Nov 2006 17:08:10 +0800 "Zhang, Yanmin" wrote:

> I got an oops when booting 2.6.19-rc5-mm1 on my ia64 machine.
>
> Below is the log.
>
> Oops 11012296146944 [1]
> Modules linked in: binfmt_misc dm_mirror dm_multipath dm_mod thermal processor fan container button sg eepro100 e100 mii
>
> Pid: 0, CPU 0, comm: swapper
> psr : 0000121008022038 ifs : 800000000000040b ip : [] Not tainted
> ip is at __do_IRQ+0x371/0x3e0
> unat: 0000000000000000 pfs : 000000000000040b rsc : 0000000000000003
> rnat: 656960155aa56aa5 bsps: a00000010058b890 pr : 656960155aa55a65
> ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c0270033f
> csd : 0000000000000000 ssd : 0000000000000000
> b0 : a0000001000e1390 b6 : a0000001005beac0 b7 : e00000007f01aa00
> f6 : 000000000000000000000 f7 : 0ffe69090000000000000
> f8 : 1000a9090000000000000 f9 : 0ffff8000000000000000
> f10 : 1000a908ffffff6f70000 f11 : 1003e0000000000000909
> r1 : a000000100fbbff0 r2 : 0000000000010002 r3 : 0000000000010001
> r8 : fffffffffffbffff r9 : a000000100bd8060 r10 : a000000100dd83b8
> r11 : fffffffffffeffff r12 : a000000100bcbbb0 r13 : a000000100bc4000
> r14 : 0000000000010000 r15 : 0000000000010000 r16 : a000000100c01aa8
> r17 : a000000100d2c350 r18 : 0000000000000000 r19 : a000000100d2c300
> r20 : a000000100c01a88 r21 : 0000000080010100 r22 : a000000100c01ac0
> r23 : a0000001000108e0 r24 : e000000477980004 r25 : 0000000000000000
> r26 : 0000000000000000 r27 : e00000000913400c r28 : e0000004799ee51c
> r29 : e0000004778b87f0 r30 : a000000100d2c300 r31 : a00000010005c7e0
>
> Call Trace:
>  [] show_stack+0x40/0xa0
>                                 sp=A000000100bcb760 bsp=A000000100bc4f40
>  [] show_regs+0x840/0x880
>                                 sp=A000000100bcb930 bsp=A000000100bc4ee8
>  [] die+0x250/0x320
>                                 sp=A000000100bcb930 bsp=A000000100bc4ea0
>  [] ia64_do_page_fault+0x8d0/0xa20
>                                 sp=A000000100bcb950 bsp=A000000100bc4e50
>  [] ia64_leave_kernel+0x0/0x290
>                                 sp=A000000100bcb9e0 bsp=A000000100bc4e50
>  [] __do_IRQ+0x370/0x3e0
>                                 sp=A000000100bcbbb0 bsp=A000000100bc4df0
>  [] ia64_handle_irq+0x170/0x220
>                                 sp=A000000100bcbbb0 bsp=A000000100bc4dc0
>  [] ia64_leave_kernel+0x0/0x290
>                                 sp=A000000100bcbbb0 bsp=A000000100bc4dc0
>  [] ia64_pal_call_static+0x90/0xc0
>                                 sp=A000000100bcbd80 bsp=A000000100bc4d78
>  [] default_idle+0x90/0x160
>                                 sp=A000000100bcbd80 bsp=A000000100bc4d58
>  [] cpu_idle+0x1f0/0x440
>                                 sp=A000000100bcbe20 bsp=A000000100bc4d18
>  [] rest_init+0xc0/0xe0
>                                 sp=A000000100bcbe20 bsp=A000000100bc4d00
>  [] start_kernel+0x6a0/0x6c0
>                                 sp=A000000100bcbe20 bsp=A000000100bc4ca0
>  [] __end_ivt_text+0x6d0/0x6f0
>                                 sp=A000000100bcbe30 bsp=A000000100bc4c00
> <0>Kernel panic - not syncing: Aiee, killing interrupt handler!
>
>
> The root cause is that some irq_chip variables, especially ia64_msi_chip,
> initialize their member end to NULL. __do_IRQ doesn't check whether
> irq_chip->end is NULL and just calls it after processing the interrupt.
>
> Because irq_chip->end is called in many places, I fix it by initializing
> irq_chip->end to dummy_irq_chip.end, i.e., a no-op function.
>
> The patch below, against 2.6.19-rc5-mm1, fixes it.
>
> Signed-off-by: Zhang Yanmin
>
> ---
>
> --- linux-2.6.19-rc5-mm1/kernel/irq/chip.c	2006-11-14 14:16:16.000000000 +0800
> +++ linux-2.6.19-rc5-mm1_fix/kernel/irq/chip.c	2006-11-14 14:14:25.000000000 +0800
> @@ -233,6 +233,8 @@ void irq_chip_set_defaults(struct irq_ch
>  		chip->shutdown = chip->disable;
>  	if (!chip->name)
>  		chip->name = chip->typename;
> +	if (!chip->end)
> +		chip->end = dummy_irq_chip.end;
>  }
>  

The same bug should be hitting mainline too, shouldn't it?
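The failure mode in the report (a chip whose ->end is NULL, called unconditionally after the interrupt is handled) and the defaulting fix can be sketched in user space as below. This is a minimal model, not kernel code: struct irq_chip is cut down to the members the patch hunk touches, and dispatch_tail(), count_disable() and demo_msi_chip() are hypothetical names introduced only for illustration. The real irq_chip_set_defaults() also fills in other defaults that are not modelled here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for struct irq_chip: only the
 * members touched by the patch hunk are modelled. */
struct irq_chip {
	const char *name;
	const char *typename;
	void (*disable)(unsigned int irq);
	void (*shutdown)(unsigned int irq);
	void (*end)(unsigned int irq);
};

/* Empty callback, playing the role of dummy_irq_chip.end. */
static void noop(unsigned int irq)
{
	(void)irq;
}

static struct irq_chip dummy_irq_chip = {
	.name = "none",
	.end  = noop,
};

/* The assignments visible in the patched irq_chip_set_defaults();
 * the real function sets further defaults not modelled here. */
static void irq_chip_set_defaults(struct irq_chip *chip)
{
	if (!chip->shutdown)
		chip->shutdown = chip->disable;
	if (!chip->name)
		chip->name = chip->typename;
	if (!chip->end)
		chip->end = dummy_irq_chip.end;
}

/* Hypothetical simplified tail of interrupt dispatch: like __do_IRQ(),
 * it calls ->end with no NULL check. The assert only documents the
 * invariant that irq_chip_set_defaults() now guarantees; without the
 * default, this would be a call through a NULL function pointer. */
static void dispatch_tail(struct irq_chip *chip, unsigned int irq)
{
	assert(chip->end != NULL);
	chip->end(irq);
}

static unsigned int disable_calls;

static void count_disable(unsigned int irq)
{
	(void)irq;
	disable_calls++;
}

/* Hypothetical chip declared without .end, as the report says of
 * ia64_msi_chip. Returns 1 if the defaults made it safe to dispatch. */
static int demo_msi_chip(void)
{
	struct irq_chip msi = {
		.typename = "PCI-MSI",
		.disable  = count_disable,
	};

	irq_chip_set_defaults(&msi);
	dispatch_tail(&msi, 42);   /* calls noop(), not address 0 */
	msi.shutdown(42);          /* defaulted to ->disable */

	return msi.end == noop && msi.name == msi.typename &&
	       disable_calls == 1;
}
```

Filling the default once, at chip registration time, matches the reasoning in the report: ->end is called from many sites, so a single default avoids adding a NULL check at each call site.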