* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared [not found] <4ABBB8C2.2080901@sbg.ac.at> @ 2009-09-24 19:24 ` Frans Pop 2009-09-24 19:30 ` Alexander Huemer 2009-09-24 19:40 ` Frans Pop 0 siblings, 2 replies; 29+ messages in thread From: Frans Pop @ 2009-09-24 19:24 UTC (permalink / raw) To: Alexander Huemer; +Cc: linux-kernel, linux-ide Adding linux-ide to CC. Alexander Huemer wrote: > the problem appears under heavy system load and slows down the system to > unusable speed. > kernels before .30 were not affected. > irqpoll does not change behavior. > > error message from .31: > [157152.418524] irq 23: nobody cared (try booting with the "irqpoll" option) > [157152.418530] Pid: 1359, comm: cc1plus Tainted: G W 2.6.31-gentoo-blackbit #2 > [157152.418532] Call Trace: > [157152.418534] <IRQ> [<ffffffff81066e3f>] ? __report_bad_irq+0x30/0x7d > [157152.418544] [<ffffffff81066f93>] ? note_interrupt+0x107/0x170 > [157152.418547] [<ffffffff81067580>] ? handle_fasteoi_irq+0x8a/0xaa > [157152.418551] [<ffffffff8100d1cf>] ? handle_irq+0x17/0x1d > [157152.418554] [<ffffffff8100c84b>] ? do_IRQ+0x54/0xb2 > [157152.418558] [<ffffffff8100b6d3>] ? ret_from_intr+0x0/0xa > [157152.418559] <EOI> > [157152.418560] handlers: > [157152.418562] [<ffffffff813d2a6f>] (ahci_interrupt+0x0/0x426) > [157152.418566] Disabling IRQ #23 > > bios of the machine is up to date, > i tried all related bios settings, no change. > > kernel config for .31 http://xx.vu/~ahuemer/config_ahuemer_20090923.gz > lspci -vxxx http://xx.vu/~ahuemer/lspci_ahuemer_20090923 > lsusb -v http://xx.vu/~ahuemer/lsusb_ahuemer_20090923 > /proc/interrupts http://xx.vu/~ahuemer/proc_interrupts_ahuemer_20090923 > thread in gentoo forums http://forums.gentoo.org/viewtopic-t-780725-start-0.html > > please tell me what additional info is needed. A full dmesg (or kernel log) starting from a clean boot up to the error could be useful. 
If no others reply and the issue can be reproduced reliably, running a git bisect between v2.6.29 and v2.6.30 to trace the cause of the regression could be an option. Cheers, FJP ^ permalink raw reply [flat|nested] 29+ messages in thread
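Editorial note: for readers unfamiliar with the idea behind the suggested git bisect, the search it performs can be sketched as below. This is only an illustration under stated assumptions, not git's implementation (git walks a commit graph, not a flat list). The commit names `c0`..`c9`, the culprit `c6` and the `is_bad` predicate are hypothetical, and the sketch assumes a linear history in which every commit after the first bad one is also bad.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit in `commits`.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the first bad one is also bad (linear history).
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the regression was introduced at mid or earlier
        else:
            lo = mid  # the regression was introduced after mid
    return commits[hi]


if __name__ == "__main__":
    # Hypothetical history: c0 stands in for v2.6.29, c9 for v2.6.30.
    history = [f"c{i}" for i in range(10)]
    culprit = "c6"  # made up for the demonstration
    tested = []

    def is_bad(commit: str) -> bool:
        tested.append(commit)  # in reality: build, boot, run the gcc compile
        return history.index(commit) >= history.index(culprit)

    print(first_bad(history, is_bad), "found after", len(tested), "tests")
```

With git itself the equivalent session would start with `git bisect start`, `git bisect bad v2.6.30` and `git bisect good v2.6.29`, followed by one `git bisect good` or `git bisect bad` per tested kernel. The number of kernels to build grows only with the logarithm of the number of commits in the range, which is why a reliable reproducer matters more than a fast one.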
* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared 2009-09-24 19:24 ` 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared Frans Pop @ 2009-09-24 19:30 ` Alexander Huemer 2009-09-24 19:40 ` Frans Pop 1 sibling, 0 replies; 29+ messages in thread From: Alexander Huemer @ 2009-09-24 19:30 UTC (permalink / raw) To: Frans Pop; +Cc: linux-kernel, linux-ide, alexander.huemer Frans Pop wrote: > Adding linux-ide to CC. > > Alexander Huemer wrote: >> the problem appears under heavy system load and slows down the system to >> unusable speed. >> kernels before .30 were not affected. >> irqpoll does not change behavior. >> >> error message from .31: >> [157152.418524] irq 23: nobody cared (try booting with the "irqpoll" option) >> [157152.418530] Pid: 1359, comm: cc1plus Tainted: G W 2.6.31-gentoo-blackbit #2 >> [157152.418532] Call Trace: >> [157152.418534] <IRQ> [<ffffffff81066e3f>] ? __report_bad_irq+0x30/0x7d >> [157152.418544] [<ffffffff81066f93>] ? note_interrupt+0x107/0x170 >> [157152.418547] [<ffffffff81067580>] ? handle_fasteoi_irq+0x8a/0xaa >> [157152.418551] [<ffffffff8100d1cf>] ? handle_irq+0x17/0x1d >> [157152.418554] [<ffffffff8100c84b>] ? do_IRQ+0x54/0xb2 >> [157152.418558] [<ffffffff8100b6d3>] ? ret_from_intr+0x0/0xa >> [157152.418559] <EOI> >> [157152.418560] handlers: >> [157152.418562] [<ffffffff813d2a6f>] (ahci_interrupt+0x0/0x426) >> [157152.418566] Disabling IRQ #23 >> >> bios of the machine is up to date, >> i tried all related bios settings, no change. >> >> kernel config for .31 http://xx.vu/~ahuemer/config_ahuemer_20090923.gz >> lspci -vxxx http://xx.vu/~ahuemer/lspci_ahuemer_20090923 >> lsusb -v http://xx.vu/~ahuemer/lsusb_ahuemer_20090923 >> /proc/interrupts http://xx.vu/~ahuemer/proc_interrupts_ahuemer_20090923 >> thread in gentoo forums http://forums.gentoo.org/viewtopic-t-780725-start-0.html >> >> please tell me what additional info is needed. 
> > A full dmesg (or kernel log) starting from a clean boot up to the error > could be useful. > > If no others reply and the issue can be reproduced reliably, running a > git bisect between v2.6.29 and v2.6.30 to trace the cause of the regression > could be an option. > > Cheers, > FJP http://xx.vu/~ahuemer/dmesg_ahuemer_20090923 i rebooted and will try to reproduce the error. the last time, the problem appeared during compilation of gcc-4.3.4. regards -alex
* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared 2009-09-24 19:24 ` 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared Frans Pop 2009-09-24 19:30 ` Alexander Huemer @ 2009-09-24 19:40 ` Frans Pop 2009-09-24 19:43 ` Alexander Huemer 2009-09-25 0:02 ` Alexander Huemer 1 sibling, 2 replies; 29+ messages in thread From: Frans Pop @ 2009-09-24 19:40 UTC (permalink / raw) To: Alexander Huemer; +Cc: linux-kernel, linux-ide On Thursday 24 September 2009, Frans Pop wrote: > > error message from .31: > > [157152.418524] irq 23: nobody cared > > If no others reply and the issue can be reproduced reliably, running a > git bisect between v2.6.29 and v2.6.30 to trace the cause of the > regression could be an option. Looking at the changes in drivers/ata/ahci.c, it might be worth to try if reverting the following commit fixes the issue: commit a5bfc4714b3f01365aef89a92673f2ceb1ccf246 Author: Tejun Heo <tj@kernel.org> Date: Fri Jan 23 11:31:39 2009 +0900 ahci: drop intx manipulation on msi enable It's a bit of a wild guess though. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared 2009-09-24 19:40 ` Frans Pop @ 2009-09-24 19:43 ` Alexander Huemer 2009-09-25 0:02 ` Alexander Huemer 1 sibling, 0 replies; 29+ messages in thread From: Alexander Huemer @ 2009-09-24 19:43 UTC (permalink / raw) To: Frans Pop; +Cc: linux-kernel, linux-ide Frans Pop wrote: > On Thursday 24 September 2009, Frans Pop wrote: > >>> error message from .31: >>> [157152.418524] irq 23: nobody cared >>> >> If no others reply and the issue can be reproduced reliably, running a >> git bisect between v2.6.29 and v2.6.30 to trace the cause of the >> regression could be an option. >> > > Looking at the changes in drivers/ata/ahci.c, it might be worth to try if > reverting the following commit fixes the issue: > > commit a5bfc4714b3f01365aef89a92673f2ceb1ccf246 > Author: Tejun Heo <tj@kernel.org> > Date: Fri Jan 23 11:31:39 2009 +0900 > > ahci: drop intx manipulation on msi enable > > It's a bit of a wild guess though. > thanks for the hint. i'll wait for the end of the compilation of gcc-4.3.4. that will take ~ 45m. afterwards i'll check out the kernel sources from git and try the revert. many thanks till then. -alex ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared 2009-09-24 19:40 ` Frans Pop 2009-09-24 19:43 ` Alexander Huemer @ 2009-09-25 0:02 ` Alexander Huemer 2009-09-25 11:28 ` Alexander Huemer 1 sibling, 1 reply; 29+ messages in thread From: Alexander Huemer @ 2009-09-25 0:02 UTC (permalink / raw) To: Frans Pop; +Cc: linux-kernel, linux-ide, alexander.huemer Frans Pop wrote: > On Thursday 24 September 2009, Frans Pop wrote: >>> error message from .31: >>> [157152.418524] irq 23: nobody cared >> If no others reply and the issue can be reproduced reliably, running a >> git bisect between v2.6.29 and v2.6.30 to trace the cause of the >> regression could be an option. > > Looking at the changes in drivers/ata/ahci.c, it might be worth to try if > reverting the following commit fixes the issue: > > commit a5bfc4714b3f01365aef89a92673f2ceb1ccf246 > Author: Tejun Heo <tj@kernel.org> > Date: Fri Jan 23 11:31:39 2009 +0900 > > ahci: drop intx manipulation on msi enable > > It's a bit of a wild guess though. i reproduced the issue. [ 3486.747729] Pid: 9573, comm: jc1 Tainted: G W 2.6.31-gentoo-blackbit #2 [ 3486.747731] Call Trace: [ 3486.747733] <IRQ> [<ffffffff81066e3f>] ? __report_bad_irq+0x30/0x7d [ 3486.747743] [<ffffffff81066f93>] ? note_interrupt+0x107/0x170 [ 3486.747746] [<ffffffff81067580>] ? handle_fasteoi_irq+0x8a/0xaa [ 3486.747750] [<ffffffff8100d1cf>] ? handle_irq+0x17/0x1d [ 3486.747752] [<ffffffff8100c84b>] ? do_IRQ+0x54/0xb2 [ 3486.747756] [<ffffffff8100b6d3>] ? ret_from_intr+0x0/0xa [ 3486.747758] <EOI> [ 3486.747759] handlers: [ 3486.747761] [<ffffffff813d2a6f>] (ahci_interrupt+0x0/0x426) [ 3486.747765] Disabling IRQ #23 i will report back after a compile run of gcc-4.3.4 with a kernel without the commit you suggested. -alex ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared 2009-09-25 0:02 ` Alexander Huemer @ 2009-09-25 11:28 ` Alexander Huemer 2009-09-25 12:24 ` Frans Pop 0 siblings, 1 reply; 29+ messages in thread From: Alexander Huemer @ 2009-09-25 11:28 UTC (permalink / raw) To: Frans Pop; +Cc: linux-kernel, linux-ide, alexander.huemer Alexander Huemer wrote: > Frans Pop wrote: >> On Thursday 24 September 2009, Frans Pop wrote: >>>> error message from .31: >>>> [157152.418524] irq 23: nobody cared >>> If no others reply and the issue can be reproduced reliably, running a >>> git bisect between v2.6.29 and v2.6.30 to trace the cause of the >>> regression could be an option. >> Looking at the changes in drivers/ata/ahci.c, it might be worth to try if >> reverting the following commit fixes the issue: >> >> commit a5bfc4714b3f01365aef89a92673f2ceb1ccf246 >> Author: Tejun Heo <tj@kernel.org> >> Date: Fri Jan 23 11:31:39 2009 +0900 >> >> ahci: drop intx manipulation on msi enable >> >> It's a bit of a wild guess though. > i reproduced the issue. > > [ 3486.747729] Pid: 9573, comm: jc1 Tainted: G W > 2.6.31-gentoo-blackbit #2 > [ 3486.747731] Call Trace: > [ 3486.747733] <IRQ> [<ffffffff81066e3f>] ? __report_bad_irq+0x30/0x7d > [ 3486.747743] [<ffffffff81066f93>] ? note_interrupt+0x107/0x170 > [ 3486.747746] [<ffffffff81067580>] ? handle_fasteoi_irq+0x8a/0xaa > [ 3486.747750] [<ffffffff8100d1cf>] ? handle_irq+0x17/0x1d > [ 3486.747752] [<ffffffff8100c84b>] ? do_IRQ+0x54/0xb2 > [ 3486.747756] [<ffffffff8100b6d3>] ? ret_from_intr+0x0/0xa > [ 3486.747758] <EOI> > [ 3486.747759] handlers: > [ 3486.747761] [<ffffffff813d2a6f>] (ahci_interrupt+0x0/0x426) > [ 3486.747765] Disabling IRQ #23 > > i will report back after a compile run of gcc-4.3.4 with a kernel > without the commit you suggested. > > -alex 4 compilation runs of gcc-4.3.4 finished without the issue re-appearing. it seems like you guessed right, Frans. 
i also found this: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=31b239ad1ba7225435e13f5afc47e48eb674c0cc i'll report on bugzilla. thanks for the help. -alex
* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared 2009-09-25 11:28 ` Alexander Huemer @ 2009-09-25 12:24 ` Frans Pop 2009-09-25 12:27 ` Alexander Huemer 0 siblings, 1 reply; 29+ messages in thread From: Frans Pop @ 2009-09-25 12:24 UTC (permalink / raw) To: Alexander Huemer; +Cc: linux-kernel, linux-ide, Tejun Heo, stable On Friday 25 September 2009, Alexander Huemer wrote: > Alexander Huemer wrote: > > Frans Pop wrote: > >> On Thursday 24 September 2009, Frans Pop wrote: > >>>> error message from .31: > >>>> [157152.418524] irq 23: nobody cared > >> > >> Looking at the changes in drivers/ata/ahci.c, it might be worth to > >> try if reverting the following commit fixes the issue: > >> > >> commit a5bfc4714b3f01365aef89a92673f2ceb1ccf246 > >> Author: Tejun Heo <tj@kernel.org> > >> Date: Fri Jan 23 11:31:39 2009 +0900 > >> > >> ahci: drop intx manipulation on msi enable > > > > i reproduced the issue. > > > > [ 3486.747729] Pid: 9573, comm: jc1 Tainted: G W 2.6.31-gentoo-blackbit #2 > > [ 3486.747731] Call Trace: > > [ 3486.747733] <IRQ> [<ffffffff81066e3f>] ? __report_bad_irq+0x30/0x7d > > [ 3486.747743] [<ffffffff81066f93>] ? note_interrupt+0x107/0x170 > > [ 3486.747746] [<ffffffff81067580>] ? handle_fasteoi_irq+0x8a/0xaa > > [ 3486.747750] [<ffffffff8100d1cf>] ? handle_irq+0x17/0x1d > > [ 3486.747752] [<ffffffff8100c84b>] ? do_IRQ+0x54/0xb2 > > [ 3486.747756] [<ffffffff8100b6d3>] ? ret_from_intr+0x0/0xa > > [ 3486.747758] <EOI> > > [ 3486.747759] handlers: > > [ 3486.747761] [<ffffffff813d2a6f>] (ahci_interrupt+0x0/0x426) > > [ 3486.747765] Disabling IRQ #23 > > > > i will report back after a compile run of gcc-4.3.4 with a kernel > > without the commit you suggested. > > 4 compilation runs of gcc-4.3.4 finished without the issue re-appearing. > it seems like you guessed right, Frans. Great. Glad to hear it worked out. 
> i also found this: > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=31b239ad1ba7225435e13f5afc47e48eb674c0cc > i'll report on bugzilla. So with the revert already in mainline for .32, the only thing left is for that to get included in stable updates for .30 and .31. Cheers, FJP
* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared 2009-09-25 12:24 ` Frans Pop @ 2009-09-25 12:27 ` Alexander Huemer 2009-09-25 12:48 ` Frans Pop 0 siblings, 1 reply; 29+ messages in thread From: Alexander Huemer @ 2009-09-25 12:27 UTC (permalink / raw) To: Frans Pop; +Cc: linux-kernel, linux-ide, Tejun Heo, stable Frans Pop wrote: > On Friday 25 September 2009, Alexander Huemer wrote: > >> Alexander Huemer wrote: >> >>> Frans Pop wrote: >>> >>>> On Thursday 24 September 2009, Frans Pop wrote: >>>> >>>>>> error message from .31: >>>>>> [157152.418524] irq 23: nobody cared >>>>>> >>>> Looking at the changes in drivers/ata/ahci.c, it might be worth to >>>> try if reverting the following commit fixes the issue: >>>> >>>> commit a5bfc4714b3f01365aef89a92673f2ceb1ccf246 >>>> Author: Tejun Heo <tj@kernel.org> >>>> Date: Fri Jan 23 11:31:39 2009 +0900 >>>> >>>> ahci: drop intx manipulation on msi enable >>>> >>> i reproduced the issue. >>> >>> [ 3486.747729] Pid: 9573, comm: jc1 Tainted: G W 2.6.31-gentoo-blackbit #2 >>> [ 3486.747731] Call Trace: >>> [ 3486.747733] <IRQ> [<ffffffff81066e3f>] ? __report_bad_irq+0x30/0x7d >>> [ 3486.747743] [<ffffffff81066f93>] ? note_interrupt+0x107/0x170 >>> [ 3486.747746] [<ffffffff81067580>] ? handle_fasteoi_irq+0x8a/0xaa >>> [ 3486.747750] [<ffffffff8100d1cf>] ? handle_irq+0x17/0x1d >>> [ 3486.747752] [<ffffffff8100c84b>] ? do_IRQ+0x54/0xb2 >>> [ 3486.747756] [<ffffffff8100b6d3>] ? ret_from_intr+0x0/0xa >>> [ 3486.747758] <EOI> >>> [ 3486.747759] handlers: >>> [ 3486.747761] [<ffffffff813d2a6f>] (ahci_interrupt+0x0/0x426) >>> [ 3486.747765] Disabling IRQ #23 >>> >>> i will report back after a compile run of gcc-4.3.4 with a kernel >>> without the commit you suggested. >>> >> 4 compilation runs of gcc-4.3.4 finished without the issue re-appearing. >> it seems like you guessed right, Frans. >> > > Great. Glad to hear it worked out. 
> > >> i also found this: >> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=31b239ad1ba7225435e13f5afc47e48eb674c0cc >> i'll report on bugzilla. >> > > So with the revert already in mainline for .32, the only thing left is for > that to get included in stable updates for .30 and .31. > > Cheers, > FJP > please see the last comment in [1]. can i do anything else to help ? thanks again -alex [1] http://bugzilla.kernel.org/show_bug.cgi?id=14124
* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared 2009-09-25 12:27 ` Alexander Huemer @ 2009-09-25 12:48 ` Frans Pop 2009-10-08 12:00 ` Alexander Huemer 0 siblings, 1 reply; 29+ messages in thread From: Frans Pop @ 2009-09-25 12:48 UTC (permalink / raw) To: Alexander Huemer; +Cc: linux-kernel, linux-ide, Tejun Heo, stable On Friday 25 September 2009, Alexander Huemer wrote: > > So with the revert already in mainline for .32, the only thing left is > > for that to get included in stable updates for .30 and .31. > > please see the last comment in [1]. > can i do anything else to help ? > [1] http://bugzilla.kernel.org/show_bug.cgi?id=14124 Yes, adding that comment was excellent. I also added the relevant people in the CC of my previous mail, so it should get taken care of now. Unless they have additional questions no further action from you should be needed. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared 2009-09-25 12:48 ` Frans Pop @ 2009-10-08 12:00 ` Alexander Huemer 2009-10-09 21:30 ` Alexander Huemer 2009-10-10 13:13 ` Frans Pop 0 siblings, 2 replies; 29+ messages in thread From: Alexander Huemer @ 2009-10-08 12:00 UTC (permalink / raw) To: Frans Pop; +Cc: linux-kernel, linux-ide, Tejun Heo, stable Frans Pop wrote: > On Friday 25 September 2009, Alexander Huemer wrote: >>> So with the revert already in mainline for .32, the only thing left is >>> for that to get included in stable updates for .30 and .31. >> please see the last comment in [1]. >> can i do anything else to help ? > >> [1] http://bugzilla.kernel.org/show_bug.cgi?id=14124 > > Yes, adding that comment was excellent. I also added the relevant people > in the CC of my previous mail, so it should get taken care of now. Unless > they have additional questions no further action from you should be > needed. it seems like the problem is _not_ solved. i just booted with 2.6.31.3. 2.6.31-gentoo-r2 is vanilla-2.6.31-r2 with a few unrelated patches. did the usual verification (compilation of gcc-4.3.4), and got this again: [ 1018.059729] irq 23: nobody cared (try booting with the "irqpoll" option) [ 1018.059734] Pid: 8656, comm: sh Tainted: G W 2.6.31-gentoo-r2-blackbit #1 [ 1018.059736] Call Trace: [ 1018.059738] <IRQ> [<ffffffff81066ecf>] ? __report_bad_irq+0x30/0x7d [ 1018.059748] [<ffffffff81067023>] ? note_interrupt+0x107/0x170 [ 1018.059751] [<ffffffff81067610>] ? handle_fasteoi_irq+0x8a/0xaa [ 1018.059755] [<ffffffff8100d1cf>] ? handle_irq+0x17/0x1d [ 1018.059757] [<ffffffff8100c84b>] ? do_IRQ+0x54/0xb2 [ 1018.059761] [<ffffffff8100b6d3>] ? ret_from_intr+0x0/0xa [ 1018.059762] <EOI> [<ffffffff815c7d2c>] ? do_page_fault+0xed/0x2ef [ 1018.059769] [<ffffffff815c7f12>] ? do_page_fault+0x2d3/0x2ef [ 1018.059773] [<ffffffff812dd5ed>] ? __put_user_4+0x1d/0x30 [ 1018.059776] [<ffffffff815c5fdf>] ? 
page_fault+0x1f/0x30 [ 1018.059777] handlers: [ 1018.059778] [<ffffffff813d2d8c>] (ahci_interrupt+0x0/0x426) [ 1018.059783] Disabling IRQ #23 so in my opinion reverting commit [1] with commit [2] missed the point. please comment. -alex [1] a5bfc4714b3f01365aef89a92673f2ceb1ccf246 [2] 31b239ad1ba7225435e13f5afc47e48eb674c0cc ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared 2009-10-08 12:00 ` Alexander Huemer @ 2009-10-09 21:30 ` Alexander Huemer 2009-10-10 13:13 ` Frans Pop 1 sibling, 0 replies; 29+ messages in thread From: Alexander Huemer @ 2009-10-09 21:30 UTC (permalink / raw) To: Frans Pop; +Cc: linux-kernel, linux-ide, Tejun Heo, stable Alexander Huemer wrote: > Frans Pop wrote: >> On Friday 25 September 2009, Alexander Huemer wrote: >>>> So with the revert already in mainline for .32, the only thing left is >>>> for that to get included in stable updates for .30 and .31. >>> please see the last comment in [1]. >>> can i do anything else to help ? >>> [1] http://bugzilla.kernel.org/show_bug.cgi?id=14124 >> Yes, adding that comment was excellent. I also added the relevant people >> in the CC of my previous mail, so it should get taken care of now. Unless >> they have additional questions no further action from you should be >> needed. > it seems like the problem is _not_ solved. > i just booted with 2.6.31.3. > 2.6.31-gentoo-r2 is vanilla-2.6.31-r2 with a few unrelated patches. > did the usual verification (compilation of gcc-4.3.4), > and got this again: > > [ 1018.059729] irq 23: nobody cared (try booting with the "irqpoll" > option) > [ 1018.059734] Pid: 8656, comm: sh Tainted: G W > 2.6.31-gentoo-r2-blackbit #1 > [ 1018.059736] Call Trace: > [ 1018.059738] <IRQ> [<ffffffff81066ecf>] ? __report_bad_irq+0x30/0x7d > [ 1018.059748] [<ffffffff81067023>] ? note_interrupt+0x107/0x170 > [ 1018.059751] [<ffffffff81067610>] ? handle_fasteoi_irq+0x8a/0xaa > [ 1018.059755] [<ffffffff8100d1cf>] ? handle_irq+0x17/0x1d > [ 1018.059757] [<ffffffff8100c84b>] ? do_IRQ+0x54/0xb2 > [ 1018.059761] [<ffffffff8100b6d3>] ? ret_from_intr+0x0/0xa > [ 1018.059762] <EOI> [<ffffffff815c7d2c>] ? do_page_fault+0xed/0x2ef > [ 1018.059769] [<ffffffff815c7f12>] ? do_page_fault+0x2d3/0x2ef > [ 1018.059773] [<ffffffff812dd5ed>] ? __put_user_4+0x1d/0x30 > [ 1018.059776] [<ffffffff815c5fdf>] ? 
page_fault+0x1f/0x30 > [ 1018.059777] handlers: > [ 1018.059778] [<ffffffff813d2d8c>] (ahci_interrupt+0x0/0x426) > [ 1018.059783] Disabling IRQ #23 > > so in my opinion reverting commit [1] with commit [2] missed the point. > please comment. > > -alex > > [1] a5bfc4714b3f01365aef89a92673f2ceb1ccf246 > [2] 31b239ad1ba7225435e13f5afc47e48eb674c0cc > i hope i do not annoy anybody by posting again, but i am afraid my last message was not noticed by anybody. is there something i don't know but should? it seems the problem still exists. i would be happy to test whatever is needed to trace the problem. please respond. regards -alex
* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared 2009-10-08 12:00 ` Alexander Huemer 2009-10-09 21:30 ` Alexander Huemer @ 2009-10-10 13:13 ` Frans Pop 2009-10-11 20:57 ` Alexander Huemer 2009-10-12 7:49 ` Tejun Heo 1 sibling, 2 replies; 29+ messages in thread From: Frans Pop @ 2009-10-10 13:13 UTC (permalink / raw) To: Alexander Huemer; +Cc: linux-kernel, linux-ide, Tejun Heo, Jeff Garzik (dropped stable from CC) On Thursday 08 October 2009, you wrote: > Frans Pop wrote: > > On Friday 25 September 2009, Alexander Huemer wrote: > >>> So with the revert already in mainline for .32, the only thing left > >>> is for that to get included in stable updates for .30 and .31. > >> > >> please see the last comment in [1]. > >> can i do anything else to help ? > >> > >> [1] http://bugzilla.kernel.org/show_bug.cgi?id=14124 > > it seems like the problem is _not_ solved. > i just booted with 2.6.31.3. > 2.6.31-gentoo-r2 is vanilla-2.6.31-r2 with a few unrelated patches. I don't know what vanilla-2.6.31-r2 is, but I assume it's based on either 2.6.31.3 or 2.6.31.2. > did the usual verification (compilation of gcc-4.3.4), > so in my opinion reverting commit [1] with commit [2] missed the point. > > [1] a5bfc4714b3f01365aef89a92673f2ceb1ccf246 > [2] 31b239ad1ba7225435e13f5afc47e48eb674c0cc The most likely explanation is that your earlier test from which you concluded that the revert did fix the problem was incorrect. It seems unlikely that some other stable commit interferes here. So basically we're back where we started. > [ 1018.059729] irq 23: nobody cared (try booting with the "irqpoll" option) > [ 1018.059734] Pid: 8656, comm: sh Tainted: G W 2.6.31-gentoo-r2-blackbit #1 > [ 1018.059736] Call Trace: > [ 1018.059738] <IRQ> [<ffffffff81066ecf>] ? __report_bad_irq+0x30/0x7d > [ 1018.059748] [<ffffffff81067023>] ? note_interrupt+0x107/0x170 > [ 1018.059751] [<ffffffff81067610>] ? handle_fasteoi_irq+0x8a/0xaa > [ 1018.059755] [<ffffffff8100d1cf>] ? 
handle_irq+0x17/0x1d > [ 1018.059757] [<ffffffff8100c84b>] ? do_IRQ+0x54/0xb2 > [ 1018.059761] [<ffffffff8100b6d3>] ? ret_from_intr+0x0/0xa > [ 1018.059762] <EOI> [<ffffffff815c7d2c>] ? do_page_fault+0xed/0x2ef > [ 1018.059769] [<ffffffff815c7f12>] ? do_page_fault+0x2d3/0x2ef > [ 1018.059773] [<ffffffff812dd5ed>] ? __put_user_4+0x1d/0x30 > [ 1018.059776] [<ffffffff815c5fdf>] ? page_fault+0x1f/0x30 > [ 1018.059777] handlers: > [ 1018.059778] [<ffffffff813d2d8c>] (ahci_interrupt+0x0/0x426) > [ 1018.059783] Disabling IRQ #23 How reproducible is the error for you? Do you see it every time or not? If it is reliably reproducible, can you think of any explanation why your earlier test was a success while we now see that the revert does not help? Does the error *only* occur during gcc compilation, or was that just the simplest way to reproduce it? Does it always occur at the same point during the compilation or does it vary? Can you create a test case that does not require doing the whole compilation, but only executes the step that triggers the error? If you can find a reliable and fairly quick way to reproduce the error, I would suggest doing a bisection. Jeff, Tejun: do you have any ideas what could cause this issue to suddenly appear or how to debug/instrument it? Cheers, FJP ^ permalink raw reply [flat|nested] 29+ messages in thread
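Editorial note: to answer whether the error shows up at a consistent point, it helps to pull every "nobody cared" report out of a saved kernel log together with its timestamp. The following is a minimal sketch, assuming the log uses the printk timestamp prefix seen in the traces in this thread. The helper name `find_spurious_irqs` is hypothetical, and the sample text merely reuses lines quoted earlier in the thread (taken from different boots) for illustration.

```python
import re
from typing import List, NamedTuple


class SpuriousIrq(NamedTuple):
    timestamp: float  # seconds since boot, from the printk prefix
    irq: int


# Matches lines such as:
# [157152.418524] irq 23: nobody cared (try booting with the "irqpoll" option)
_NOBODY_CARED = re.compile(r"\[\s*(\d+\.\d+)\]\s+irq (\d+): nobody cared")


def find_spurious_irqs(log_text: str) -> List[SpuriousIrq]:
    """Return one entry per 'nobody cared' report found in a kernel log."""
    return [
        SpuriousIrq(float(m.group(1)), int(m.group(2)))
        for m in _NOBODY_CARED.finditer(log_text)
    ]


if __name__ == "__main__":
    # Lines copied from the reports in this thread; they come from
    # different boots and are combined here only as sample input.
    sample = (
        '[157152.418524] irq 23: nobody cared (try booting with the "irqpoll" option)\n'
        "[157152.418566] Disabling IRQ #23\n"
        '[ 1018.059729] irq 23: nobody cared (try booting with the "irqpoll" option)\n'
    )
    for event in find_spurious_irqs(sample):
        print(f"irq {event.irq} reported unhandled at {event.timestamp:.6f}s")
```

Comparing the timestamps against the start time of each compile run would show whether the failure tracks a particular build step or simply the accumulated load.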
* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared 2009-10-10 13:13 ` Frans Pop @ 2009-10-11 20:57 ` Alexander Huemer 2009-10-12 7:49 ` Tejun Heo 1 sibling, 0 replies; 29+ messages in thread From: Alexander Huemer @ 2009-10-11 20:57 UTC (permalink / raw) To: Frans Pop Cc: linux-kernel, linux-ide, Tejun Heo, Jeff Garzik, alexander.huemer

> I don't know what vanilla-2.6.31-r2 is, but I assume it's based on either
> 2.6.31.3 or 2.6.31.2.

vanilla just means the unpatched kernel from kernel.org.

> The most likely explanation is that your earlier test from which you
> concluded that the revert did fix the problem was incorrect. It seems
> unlikely that some other stable commit interferes here.
>
> So basically we're back where we started.

unfortunately you seem to be right.

> How reproducible is the error for you? Do you see it every time or not?
> If it is reliably reproducible, can you think of any explanation why your
> earlier test was a success while we now see that the revert does not help?

the error is reproducible. i'll try to pin it down to certain kernel versions over the next few days.

> Does the error *only* occur during gcc compilation, or was that just the
> simplest way to reproduce it? Does it always occur at the same point during
> the compilation or does it vary?

it was the simplest way. i don't know how i could find out whether the error always happens at exactly the same point. i'll think about that.

> Can you create a test case that does not require doing the whole
> compilation, but only executes the step that triggers the error?

sure, once i know what happens when the error occurs.

> If you can find a reliable and fairly quick way to reproduce the error, I
> would suggest doing a bisection.

i would be happy to do that. thanks for now.
* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared 2009-10-10 13:13 ` Frans Pop 2009-10-11 20:57 ` Alexander Huemer @ 2009-10-12 7:49 ` Tejun Heo 2009-10-12 9:48 ` Frans Pop 1 sibling, 1 reply; 29+ messages in thread From: Tejun Heo @ 2009-10-12 7:49 UTC (permalink / raw) To: Frans Pop; +Cc: Alexander Huemer, linux-kernel, linux-ide, Jeff Garzik Hello, Frans Pop wrote: >> so in my opinion reverting commit [1] with commit [2] missed the point. >> >> [1] a5bfc4714b3f01365aef89a92673f2ceb1ccf246 >> [2] 31b239ad1ba7225435e13f5afc47e48eb674c0cc > > The most likely explanation is that your earlier test from which you > concluded that the revert did fix the problem was incorrect. It seems > unlikely that some other stable commit interferes here. Hmm... > So basically we're back where we started. > >> [ 1018.059729] irq 23: nobody cared (try booting with the "irqpoll" option) >> [ 1018.059734] Pid: 8656, comm: sh Tainted: G W 2.6.31-gentoo-r2-blackbit #1 >> [ 1018.059736] Call Trace: >> [ 1018.059738] <IRQ> [<ffffffff81066ecf>] ? __report_bad_irq+0x30/0x7d >> [ 1018.059748] [<ffffffff81067023>] ? note_interrupt+0x107/0x170 >> [ 1018.059751] [<ffffffff81067610>] ? handle_fasteoi_irq+0x8a/0xaa >> [ 1018.059755] [<ffffffff8100d1cf>] ? handle_irq+0x17/0x1d >> [ 1018.059757] [<ffffffff8100c84b>] ? do_IRQ+0x54/0xb2 >> [ 1018.059761] [<ffffffff8100b6d3>] ? ret_from_intr+0x0/0xa >> [ 1018.059762] <EOI> [<ffffffff815c7d2c>] ? do_page_fault+0xed/0x2ef >> [ 1018.059769] [<ffffffff815c7f12>] ? do_page_fault+0x2d3/0x2ef >> [ 1018.059773] [<ffffffff812dd5ed>] ? __put_user_4+0x1d/0x30 >> [ 1018.059776] [<ffffffff815c5fdf>] ? page_fault+0x1f/0x30 >> [ 1018.059777] handlers: >> [ 1018.059778] [<ffffffff813d2d8c>] (ahci_interrupt+0x0/0x426) >> [ 1018.059783] Disabling IRQ #23 > > How reproducible is the error for you? Do you see it every time or not? 
> If it is reliably reproducible, can you think of any explanation why your > earlier test was a success while we now see that the revert does not help? > > Does the error *only* occur during gcc compilation, or was that just the > simplest way to reproduce it? Does it always occur at the same point during > the compilation or does it vary? > Can you create a test case that does not require doing the whole > compilation, but only executes the step that triggers the error? > > If you can find a reliable and fairly quick way to reproduce the error, I > would suggest doing a bisection. > > Jeff, Tejun: do you have any ideas what could cause this issue to suddenly > appear or how to debug/instrument it? Alexander, can you please attach full boot log and the output of "lspci -nn"? Also, how reproducible is the problem? You already answered to Frans' question but can you be more specific? Thanks. -- tejun ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared 2009-10-12 7:49 ` Tejun Heo @ 2009-10-12 9:48 ` Frans Pop 2009-10-12 9:52 ` Tejun Heo 0 siblings, 1 reply; 29+ messages in thread From: Frans Pop @ 2009-10-12 9:48 UTC (permalink / raw) To: Tejun Heo; +Cc: Alexander Huemer, linux-kernel, linux-ide, Jeff Garzik On Monday 12 October 2009, Tejun Heo wrote: > Alexander, can you please attach full boot log and the output of > "lspci -nn"? Also, how reproducible is the problem? You already > answered to Frans' question but can you be more specific? Full dmesg was made available earlier at: http://xx.vu/~ahuemer/dmesg_ahuemer_20090923 ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared 2009-10-12 9:48 ` Frans Pop @ 2009-10-12 9:52 ` Tejun Heo 2009-10-12 9:55 ` Alexander Huemer 0 siblings, 1 reply; 29+ messages in thread From: Tejun Heo @ 2009-10-12 9:52 UTC (permalink / raw) To: Frans Pop; +Cc: Alexander Huemer, linux-kernel, linux-ide, Jeff Garzik Frans Pop wrote: > On Monday 12 October 2009, Tejun Heo wrote: >> Alexander, can you please attach full boot log and the output of >> "lspci -nn"? Also, how reproducible is the problem? You already >> answered to Frans' question but can you be more specific? > > Full dmesg was made available earlier at: > http://xx.vu/~ahuemer/dmesg_ahuemer_20090923 Does blacklisting i801_smbus make any difference? -- tejun ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared 2009-10-12 9:52 ` Tejun Heo @ 2009-10-12 9:55 ` Alexander Huemer 2009-10-12 10:07 ` Tejun Heo 0 siblings, 1 reply; 29+ messages in thread From: Alexander Huemer @ 2009-10-12 9:55 UTC (permalink / raw) To: Tejun Heo Cc: Frans Pop, linux-kernel, linux-ide, Jeff Garzik, alexander.huemer Tejun Heo wrote: > Frans Pop wrote: > >> On Monday 12 October 2009, Tejun Heo wrote: >> >>> Alexander, can you please attach full boot log and the output of >>> "lspci -nn"? Also, how reproducible is the problem? You already >>> answered to Frans' question but can you be more specific? >>> >> Full dmesg was made available earlier at: >> http://xx.vu/~ahuemer/dmesg_ahuemer_20090923 >> > > Does blacklisting i801_smbus make any difference? > > lspci -nn: http://xx.vu/~ahuemer/lspci_nn_ahuemer_20091012 what do you mean by "blacklisting i801_smbus"? regards -alex
* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared 2009-10-12 9:55 ` Alexander Huemer @ 2009-10-12 10:07 ` Tejun Heo 2009-10-12 10:11 ` Alexander Huemer 0 siblings, 1 reply; 29+ messages in thread From: Tejun Heo @ 2009-10-12 10:07 UTC (permalink / raw) To: Alexander Huemer; +Cc: Frans Pop, linux-kernel, linux-ide, Jeff Garzik Alexander Huemer wrote: > Tejun Heo wrote: >> Frans Pop wrote: >> >>> On Monday 12 October 2009, Tejun Heo wrote: >>> >>>> Alexander, can you please attach full boot log and the output of >>>> "lspci -nn"? Also, how reproducible is the problem? You already >>>> answered to Frans' question but can you be more specific? >>>> >>> Full dmesg was made available earlier at: >>> http://xx.vu/~ahuemer/dmesg_ahuemer_20090923 >>> >> >> Does blacklisting i801_smbus make any difference? >> >> > lspci -nn: > http://xx.vu/~ahuemer/lspci_nn_ahuemer_20091012 > > what do you mean with "blacklisting i801_smbus" ? From your dmesg: [ 3.872387] i2c /dev entries driver [ 3.873943] i801_smbus 0000:00:1f.3: PCI INT B -> GSI 23 (level, low) -> IRQ 23 [ 3.875580] w83627hf: Found W83627HF chip at 0x290 IRQ 23 is also used by i801_smbus and it would be nice to confirm whether the problem can still be triggered with that driver not loaded. Adding "blacklist i2c_i801" to /etc/modprobe.d/blacklist should probably do the trick. Thanks. -- tejun ^ permalink raw reply [flat|nested] 29+ messages in thread
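The check Tejun does by eye here (spotting a second device routed to IRQ 23 in the boot log) can be sketched as a small script. This is only an illustration: the regular expression is my own guess at the shape of these "PCI INT x -> GSI n ... -> IRQ n" lines, the function name `devices_by_irq` is hypothetical, and the sample text is just the three log lines quoted in the message above, not the full dmesg.

```python
import re
from collections import defaultdict

# Assumed shape of a routing line, modelled on the one quoted in the thread:
#   i801_smbus 0000:00:1f.3: PCI INT B -> GSI 23 (level, low) -> IRQ 23
ROUTE_RE = re.compile(
    r"(?P<driver>\S+) (?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"PCI INT (?P<pin>[A-D]) -> GSI (?P<gsi>\d+) .*-> IRQ (?P<irq>\d+)"
)


def devices_by_irq(dmesg_text):
    """Group (driver, PCI address) pairs by the IRQ they were routed to."""
    by_irq = defaultdict(list)
    for line in dmesg_text.splitlines():
        match = ROUTE_RE.search(line)
        if match:
            by_irq[int(match.group("irq"))].append(
                (match.group("driver"), match.group("bdf"))
            )
    return dict(by_irq)


# The three boot-log lines quoted in the message above.
SAMPLE = """\
[    3.872387] i2c /dev entries driver
[    3.873943] i801_smbus 0000:00:1f.3: PCI INT B -> GSI 23 (level, low) -> IRQ 23
[    3.875580] w83627hf: Found W83627HF chip at 0x290
"""

for irq, devices in sorted(devices_by_irq(SAMPLE).items()):
    print(irq, devices)
```

Fed the complete boot log instead of this excerpt, any IRQ with more than one entry would be a candidate for the kind of sharing discussed here.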
* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared 2009-10-12 10:07 ` Tejun Heo @ 2009-10-12 10:11 ` Alexander Huemer 2009-10-12 15:03 ` Alexander Huemer 0 siblings, 1 reply; 29+ messages in thread From: Alexander Huemer @ 2009-10-12 10:11 UTC (permalink / raw) To: Tejun Heo Cc: Frans Pop, linux-kernel, linux-ide, Jeff Garzik, alexander.huemer Tejun Heo wrote: > Alexander Huemer wrote: > >> Tejun Heo wrote: >> >>> Frans Pop wrote: >>> >>> >>>> On Monday 12 October 2009, Tejun Heo wrote: >>>> >>>> >>>>> Alexander, can you please attach full boot log and the output of >>>>> "lspci -nn"? Also, how reproducible is the problem? You already >>>>> answered to Frans' question but can you be more specific? >>>>> >>>>> >>>> Full dmesg was made available earlier at: >>>> http://xx.vu/~ahuemer/dmesg_ahuemer_20090923 >>>> >>>> >>> Does blacklisting i801_smbus make any difference? >>> >>> >>> >> lspci -nn: >> http://xx.vu/~ahuemer/lspci_nn_ahuemer_20091012 >> >> what do you mean with "blacklisting i801_smbus" ? >> > > [ 3.872387] i2c /dev entries driver > [ 3.873943] i801_smbus 0000:00:1f.3: PCI INT B -> GSI 23 (level, low) -> IRQ 23 > [ 3.875580] w83627hf: Found W83627HF chip at 0x290 > > IRQ23 is also used by i801_smbus and it would be nice to confirm > whether the problem can still be triggered with that driver not > loaded. Adding "blacklist i2c_i801" to /etc/modprobe.d/blacklist > should probabaly do the trick. > > Thanks. > > okay, i think you assume that i2c_i801 is a module. it is indeed built into the kernel. i'll rebuild the kernel without that component and run a test again. regards -alex ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared 2009-10-12 10:11 ` Alexander Huemer @ 2009-10-12 15:03 ` Alexander Huemer 2009-10-12 17:28 ` Robert Hancock 2009-10-13 2:17 ` Tejun Heo 0 siblings, 2 replies; 29+ messages in thread From: Alexander Huemer @ 2009-10-12 15:03 UTC (permalink / raw) To: Tejun Heo Cc: Frans Pop, linux-kernel, linux-ide, Jeff Garzik, alexander.huemer Alexander Huemer wrote: > Tejun Heo wrote: >> Alexander Huemer wrote: >> >>> Tejun Heo wrote: >>> >>>> Frans Pop wrote: >>>> >>>> >>>>> On Monday 12 October 2009, Tejun Heo wrote: >>>>> >>>>>> Alexander, can you please attach full boot log and the output of >>>>>> "lspci -nn"? Also, how reproducible is the problem? You already >>>>>> answered to Frans' question but can you be more specific? >>>>>> >>>>> Full dmesg was made available earlier at: >>>>> http://xx.vu/~ahuemer/dmesg_ahuemer_20090923 >>>>> >>>> Does blacklisting i801_smbus make any difference? >>>> >>>> >>> lspci -nn: >>> http://xx.vu/~ahuemer/lspci_nn_ahuemer_20091012 >>> >>> what do you mean with "blacklisting i801_smbus" ? >>> >> >> [ 3.872387] i2c /dev entries driver >> [ 3.873943] i801_smbus 0000:00:1f.3: PCI INT B -> GSI 23 (level, >> low) -> IRQ 23 >> [ 3.875580] w83627hf: Found W83627HF chip at 0x290 >> >> IRQ23 is also used by i801_smbus and it would be nice to confirm >> whether the problem can still be triggered with that driver not >> loaded. Adding "blacklist i2c_i801" to /etc/modprobe.d/blacklist >> should probably do the trick. >> >> Thanks. >> >> > okay, i think you assume that i2c_i801 is a module. > it is indeed built into the kernel. > i'll rebuild the kernel without that component and run a test again. > > regards > -alex tejun, it seems you hit an interesting point. i compiled kernel-2.6.31.3 with my usual config _without_ i2c_i801. my usual test (compilation of gcc-4.3.2) finished 5 times without the error. i'll let it run some more times over night.
does anybody have an idea how i can trace what exactly causes the error during the compilation run so that i can create a short test program ? regards -alex ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared 2009-10-12 15:03 ` Alexander Huemer @ 2009-10-12 17:28 ` Robert Hancock 2009-10-13 2:17 ` Tejun Heo 1 sibling, 0 replies; 29+ messages in thread From: Robert Hancock @ 2009-10-12 17:28 UTC (permalink / raw) To: Alexander Huemer Cc: Tejun Heo, Frans Pop, linux-kernel, linux-ide, Jeff Garzik On 10/12/2009 09:03 AM, Alexander Huemer wrote: > Alexander Huemer wrote: >> Tejun Heo wrote: >>> Alexander Huemer wrote: >>> >>>> Tejun Heo wrote: >>>>> Frans Pop wrote: >>>>> >>>>>> On Monday 12 October 2009, Tejun Heo wrote: >>>>>>> Alexander, can you please attach full boot log and the output of >>>>>>> "lspci -nn"? Also, how reproducible is the problem? You already >>>>>>> answered to Frans' question but can you be more specific? >>>>>> Full dmesg was made available earlier at: >>>>>> http://xx.vu/~ahuemer/dmesg_ahuemer_20090923 >>>>> Does blacklisting i801_smbus make any difference? >>>>> >>>> lspci -nn: >>>> http://xx.vu/~ahuemer/lspci_nn_ahuemer_20091012 >>>> >>>> what do you mean with "blacklisting i801_smbus" ? >>> >>> [ 3.872387] i2c /dev entries driver >>> [ 3.873943] i801_smbus 0000:00:1f.3: PCI INT B -> GSI 23 (level, low) >>> -> IRQ 23 >>> [ 3.875580] w83627hf: Found W83627HF chip at 0x290 >>> >>> IRQ23 is also used by i801_smbus and it would be nice to confirm >>> whether the problem can still be triggered with that driver not >>> loaded. Adding "blacklist i2c_i801" to /etc/modprobe.d/blacklist >>> should probabaly do the trick. >>> >>> Thanks. >>> >> okay, i think you assume that i2c_i801 is a module. >> it is indeed built into the kernel. >> i'll rebuild the kernel without that component and run a test again. >> >> regards >> -alex > tejun, it seems you hit an interesting point. > i compiled kernel-2.6.31.3 with my ususal config _without_ i2c_i801. > my usual test (compilation of gcc-4.3.2) finished 5 times without the > error. > i'll let it run some more times over night. 
> does anybody have an idea how i can trace what exactly causes the error > during the compilation run so that i can create a short test program ? Do you have any hardware sensors monitoring software running (such as the GNOME sensors panel applet or something?) Something like that would be the most likely cause for something to access the smbus driver. Interesting that the device seems to be on the same interrupt but it hasn't registered itself as a handler (it looks like that driver doesn't use interrupts). If the device did generate an interrupt though, it would indeed cause this problem. ^ permalink raw reply [flat|nested] 29+ messages in thread
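Robert's point, that an interrupt from a device with no registered handler would lead to "nobody cared", can be illustrated with a toy model. The sketch below is not kernel code. The window of 100,000 interrupts and the 99,900 threshold are my recollection of the accounting in `kernel/irq/spurious.c` of that era and should be treated as assumptions; the model also ignores any time-based resetting the kernel may do. `IrqLine`, `ahci_handler` and the source names are hypothetical.

```python
WINDOW = 100_000     # interrupts per accounting window (assumed value)
THRESHOLD = 99_900   # unhandled count above which the line is disabled (assumed value)


class IrqLine:
    """Toy model of one shared interrupt line and its registered handlers."""

    def __init__(self, handlers):
        self.handlers = handlers
        self.count = 0
        self.unhandled = 0
        self.disabled = False

    def raise_irq(self, source):
        if self.disabled:
            return
        # Every registered handler is asked whether the interrupt is its own.
        results = [handler(source) for handler in self.handlers]
        if not any(results):
            self.unhandled += 1
        self.count += 1
        if self.count < WINDOW:
            return
        # End of a window: disable the line if almost nothing was claimed.
        if self.unhandled > THRESHOLD:
            self.disabled = True  # corresponds to "Disabling IRQ #23"
        self.count = 0
        self.unhandled = 0


def ahci_handler(source):
    """The only registered handler: claims interrupts raised by the AHCI controller."""
    return source == "ahci"


line = IrqLine([ahci_handler])
for _ in range(WINDOW):
    line.raise_irq("smbus")  # a device on the same line with no handler registered
print(line.disabled)
```

In this model the AHCI handler is the collateral damage: once the line is disabled, its own interrupts no longer arrive either, which would match the reported slowdown.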
* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared 2009-10-12 15:03 ` Alexander Huemer 2009-10-12 17:28 ` Robert Hancock @ 2009-10-13 2:17 ` Tejun Heo 2009-10-13 6:49 ` Alexander Huemer 1 sibling, 1 reply; 29+ messages in thread From: Tejun Heo @ 2009-10-13 2:17 UTC (permalink / raw) To: Alexander Huemer, Jean Delvare Cc: Frans Pop, linux-kernel, linux-ide, Jeff Garzik [cc'ing Jean and quoting whole body] Hello, Jean. It seems i2c_i801 is triggering IRQ storm on Alexander's machine. The original thread is http://thread.gmane.org/gmane.linux.kernel/894187 Any ideas? Thanks. Alexander Huemer wrote: > Alexander Huemer wrote: >> Tejun Heo wrote: >>> Alexander Huemer wrote: >>> >>>> Tejun Heo wrote: >>>> >>>>> Frans Pop wrote: >>>>> >>>>> >>>>>> On Monday 12 October 2009, Tejun Heo wrote: >>>>>> >>>>>>> Alexander, can you please attach full boot log and the output of >>>>>>> "lspci -nn"? Also, how reproducible is the problem? You already >>>>>>> answered to Frans' question but can you be more specific? >>>>>>> >>>>>> Full dmesg was made available earlier at: >>>>>> http://xx.vu/~ahuemer/dmesg_ahuemer_20090923 >>>>>> >>>>> Does blacklisting i801_smbus make any difference? >>>>> >>>>> >>>> lspci -nn: >>>> http://xx.vu/~ahuemer/lspci_nn_ahuemer_20091012 >>>> >>>> what do you mean with "blacklisting i801_smbus" ? >>>> >>> >>> [ 3.872387] i2c /dev entries driver >>> [ 3.873943] i801_smbus 0000:00:1f.3: PCI INT B -> GSI 23 (level, >>> low) -> IRQ 23 >>> [ 3.875580] w83627hf: Found W83627HF chip at 0x290 >>> >>> IRQ23 is also used by i801_smbus and it would be nice to confirm >>> whether the problem can still be triggered with that driver not >>> loaded. Adding "blacklist i2c_i801" to /etc/modprobe.d/blacklist >>> should probabaly do the trick. >>> >>> Thanks. >>> >>> >> okay, i think you assume that i2c_i801 is a module. >> it is indeed built into the kernel. >> i'll rebuild the kernel without that component and run a test again. 
>> >> regards >> -alex > tejun, it seems you hit an interesting point. > i compiled kernel-2.6.31.3 with my ususal config _without_ i2c_i801. > my usual test (compilation of gcc-4.3.2) finished 5 times without the > error. > i'll let it run some more times over night. > does anybody have an idea how i can trace what exactly causes the error > during the compilation run so that i can create a short test program ? > > regards > -alex -- tejun ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared 2009-10-13 2:17 ` Tejun Heo @ 2009-10-13 6:49 ` Alexander Huemer 2009-10-13 12:35 ` Tejun Heo 0 siblings, 1 reply; 29+ messages in thread From: Alexander Huemer @ 2009-10-13 6:49 UTC (permalink / raw) To: Tejun Heo Cc: Jean Delvare, Frans Pop, linux-kernel, linux-ide, Jeff Garzik, alexander.huemer Tejun Heo wrote: > [cc'ing Jean and quoting whole body] > > Hello, Jean. > > It seems i2c_i801 is triggering IRQ storm on Alexander's machine. The > original thread is > > http://thread.gmane.org/gmane.linux.kernel/894187 > > Any ideas? > > Thanks. > > Alexander Huemer wrote: > >> Alexander Huemer wrote: >> >>> Tejun Heo wrote: >>> >>>> Alexander Huemer wrote: >>>> >>>> >>>>> Tejun Heo wrote: >>>>> >>>>> >>>>>> Frans Pop wrote: >>>>>> >>>>>> >>>>>> >>>>>>> On Monday 12 October 2009, Tejun Heo wrote: >>>>>>> >>>>>>> >>>>>>>> Alexander, can you please attach full boot log and the output of >>>>>>>> "lspci -nn"? Also, how reproducible is the problem? You already >>>>>>>> answered to Frans' question but can you be more specific? >>>>>>>> >>>>>>>> >>>>>>> Full dmesg was made available earlier at: >>>>>>> http://xx.vu/~ahuemer/dmesg_ahuemer_20090923 >>>>>>> >>>>>>> >>>>>> Does blacklisting i801_smbus make any difference? >>>>>> >>>>>> >>>>>> >>>>> lspci -nn: >>>>> http://xx.vu/~ahuemer/lspci_nn_ahuemer_20091012 >>>>> >>>>> what do you mean with "blacklisting i801_smbus" ? >>>>> >>>>> >>>> [ 3.872387] i2c /dev entries driver >>>> [ 3.873943] i801_smbus 0000:00:1f.3: PCI INT B -> GSI 23 (level, >>>> low) -> IRQ 23 >>>> [ 3.875580] w83627hf: Found W83627HF chip at 0x290 >>>> >>>> IRQ23 is also used by i801_smbus and it would be nice to confirm >>>> whether the problem can still be triggered with that driver not >>>> loaded. Adding "blacklist i2c_i801" to /etc/modprobe.d/blacklist >>>> should probabaly do the trick. >>>> >>>> Thanks. >>>> >>>> >>>> >>> okay, i think you assume that i2c_i801 is a module. 
>>> it is indeed built into the kernel. >>> i'll rebuild the kernel without that component and run a test again. >>> >>> regards >>> -alex >>> >> tejun, it seems you hit an interesting point. >> i compiled kernel-2.6.31.3 with my usual config _without_ i2c_i801. >> my usual test (compilation of gcc-4.3.2) finished 5 times without the >> error. >> i'll let it run some more times over night. >> does anybody have an idea how i can trace what exactly causes the error >> during the compilation run so that i can create a short test program ? >> >> regards >> -alex >> > > hi, i compiled gcc in a loop over night, 14 times. no error. it really seems i2c_i801 was the cause... unfortunately i still don't know how i can extract the part of the gcc compilation process that causes the error on an affected kernel. that would enable me to create a simple test program. regards -alex ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared 2009-10-13 6:49 ` Alexander Huemer @ 2009-10-13 12:35 ` Tejun Heo 2009-10-14 11:45 ` Jean Delvare 2009-10-21 8:38 ` Jean Delvare 0 siblings, 2 replies; 29+ messages in thread From: Tejun Heo @ 2009-10-13 12:35 UTC (permalink / raw) To: Alexander Huemer Cc: Jean Delvare, Frans Pop, linux-kernel, linux-ide, Jeff Garzik Alexander Huemer wrote: > i compiled gcc in a loop over night, 14 times. no error. > it really seams i2c_i801 was the cause... > unfortunately i still don't know how i can extract the part of the gcc > compilation process that causes the error on an affected kernel. > that would enable me to create a simple test program. Given that i2c is used for temperature monitoring, I think it is not triggered by any single step of the compiling but rather by the accumulated heat load during compilation. Let's wait for Jean to chime in. :-) Thanks. -- tejun ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared 2009-10-13 12:35 ` Tejun Heo @ 2009-10-14 11:45 ` Jean Delvare 2009-10-21 8:38 ` Jean Delvare 1 sibling, 0 replies; 29+ messages in thread From: Jean Delvare @ 2009-10-14 11:45 UTC (permalink / raw) To: Tejun Heo Cc: Alexander Huemer, Frans Pop, linux-kernel, linux-ide, Jeff Garzik Le mardi 13 octobre 2009, Tejun Heo a écrit : > Alexander Huemer wrote: > > i compiled gcc in a loop over night, 14 times. no error. > > it really seams i2c_i801 was the cause... > > unfortunately i still don't know how i can extract the part of the gcc > > compilation process that causes the error on an affected kernel. > > that would enable me to create a simple test program. > > Given that i2c is used for temperature monitoring, I think it is not > triggered by any single step of the compiling but rather by the > accumulated heat load during compilation. Let's wait for Jean to > chime in. :-) Sorry, I'm somewhat busy at the moment, I'll give it a look as soon as I get a moment. -- Jean Delvare Suse L3 ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared 2009-10-13 12:35 ` Tejun Heo 2009-10-14 11:45 ` Jean Delvare @ 2009-10-21 8:38 ` Jean Delvare 2009-10-21 10:01 ` Alexander Huemer 1 sibling, 1 reply; 29+ messages in thread From: Jean Delvare @ 2009-10-21 8:38 UTC (permalink / raw) To: Tejun Heo Cc: Alexander Huemer, Frans Pop, linux-kernel, linux-ide, Jeff Garzik Hi Tejun, Alexander, Le mardi 13 octobre 2009, Tejun Heo a écrit : > Alexander Huemer wrote: > > i compiled gcc in a loop over night, 14 times. no error. > > it really seams i2c_i801 was the cause... > > unfortunately i still don't know how i can extract the part of the gcc > > compilation process that causes the error on an affected kernel. > > that would enable me to create a simple test program. > > Given that i2c is used for temperature monitoring, I think it is not > triggered by any single step of the compiling but rather by the > accumulated heat load during compilation. Let's wait for Jean to > chime in. :-) OK, here I am, sorry for the delay. I've read the discussion thread. Here are the few data points I can offer, in the hope it will help: * While the i2c-i801 driver received some changes in kernel 2.6.30, none of these are related to PCI nor interrupts. So as the problem is new in kernel 2.6.30, the i2c-i801 driver alone is unlikely to cause it. This may, however, be a combination of something i2c-i801 does and something the pci subsystem does since kernel 2.6.30. For this reason, I would still recommend a bisection if the problem can be reliably reproduced. I know it takes time, but it is always easier to fix a bug when we know which commit introduced it. * The i2c-i801 driver does _not_ make use of interrupts. It is poll-based (I am not exactly proud of that, but that's the way it is.) #define ENABLE_INT9 0 /* set to 0x01 to enable - untested */ So I am very surprised to read that this driver would cause an IRQ storm. 
* One thing the i2c-i801 driver does on the PCI device is: err = pci_enable_device(dev); I presume this is what causes the following message in dmesg: i801_smbus 0000:00:1f.3: PCI INT B -> GSI 23 (level, low) -> IRQ 23 Basically, even though the driver doesn't make use of interrupts, the IRQ is still registered because this is how the hardware is set up. As a conclusion, I suspect that one of two things may be happening: either the SMBus is triggering interrupts when told not to. The ICH6 is a bit different from all the other supported chips, I'll double check if we may have missed something. Or, something else is triggering SMBus transactions. SMI and ACPI come to mind. If this is the case then you do not want to use i2c-i801 on this motherboard. Questions to Alexander : * Can I please see the output of "sensors" on your system? * What are the brand and model of your motherboard? * Can we get an acpidump for your system? -- Jean Delvare Suse L3 ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared 2009-10-21 8:38 ` Jean Delvare @ 2009-10-21 10:01 ` Alexander Huemer 2009-10-21 11:28 ` Jean Delvare 0 siblings, 1 reply; 29+ messages in thread From: Alexander Huemer @ 2009-10-21 10:01 UTC (permalink / raw) To: Jean Delvare Cc: Tejun Heo, Frans Pop, linux-kernel, linux-ide, Jeff Garzik, alexander.huemer Jean Delvare wrote: > Hi Tejun, Alexander, > > Le mardi 13 octobre 2009, Tejun Heo a écrit : > >> Alexander Huemer wrote: >> >>> i compiled gcc in a loop over night, 14 times. no error. >>> it really seams i2c_i801 was the cause... >>> unfortunately i still don't know how i can extract the part of the gcc >>> compilation process that causes the error on an affected kernel. >>> that would enable me to create a simple test program. >>> >> Given that i2c is used for temperature monitoring, I think it is not >> triggered by any single step of the compiling but rather by the >> accumulated heat load during compilation. Let's wait for Jean to >> chime in. :-) >> > > OK, here I am, sorry for the delay. I've read the discussion thread. > Here are the few data points I can offer, in the hope it will help: > > * While the i2c-i801 driver received some changes in kernel 2.6.30, > none of these are related to PCI nor interrupts. So as the problem > is new in kernel 2.6.30, the i2c-i801 driver alone is unlikely to > cause it. This may, however, be a combination of something i2c-i801 > does and something the pci subsystem does since kernel 2.6.30. For > this reason, I would still recommend a bisection if the problem can > be reliably reproduced. I know it takes time, but it is always > easier to fix a bug when we know which commit introduced it. > > * The i2c-i801 driver does _not_ make use of interrupts. It is > poll-based (I am not exactly proud of that, but that's the way it > is.) 
> > #define ENABLE_INT9 0 /* set to 0x01 to enable - untested */ > > So I am very surprised to read that this driver would cause an IRQ > storm. > > * One thing the i2c-i801 driver does on the PCI device is: > > err = pci_enable_device(dev); > > I presume this is what causes the following message in dmesg: > > i801_smbus 0000:00:1f.3: PCI INT B -> GSI 23 (level, low) -> IRQ 23 > > Basically, even though the driver doesn't make use of interrupts, > the IRQ is still registered because this is how the hardware is > setup. > > As a conclusion, I suspect that 2 things may be happening: either > the SMBus is triggering interrupts when told not to. The ICH6 is a > bit different from all the other supported chips, I'll double check > if we may have missed something. Or, something else is triggering > SMBus transactions. SMI and ACPI come to mind. If this is the case > then you do not want to use i2c-i801 on this motherboard. > > Questions to Alexander : > > * Can I please see the output of "sensors" on your system? > * What are the brand and model of your motherboard? > * Can we get an acpidump for your system? > > many thanks for your response. i appreciate that. first, the data you requested: sensors: http://xx.vu/~ahuemer/sensors-ahuemer-20091021.txt acpidump: http://xx.vu/~ahuemer/acpidump-ahuemer-20091021.txt motherboard: tyan tempest i5400pw/s5397 with one intel xeon e5420. the output of sensors was made _without_ i801_smbus in the kernel. i noticed that the data of w83627hf-isa-0290 is quite weird. i do not have an explanation for that. if a bisection is what will bring light into this, i am willing to take the time. so that would be a bisection between 2.6.29 and 2.6.30 ? a quicker test case would be good for that, but i don't have one yet, just the compilation of gcc, which takes time, even on this machine with tmpfs and ccache. ^ permalink raw reply [flat|nested] 29+ messages in thread
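Because the failure is intermittent, each bisection step needs a rule for when a kernel counts as good. A minimal sketch of such a rule follows, assuming the kernel log is available as text and that a fixed number of clean gcc builds is taken as sufficient. `verdict`, `RUNS_REQUIRED` and the value 5 are hypothetical; the thread only reports that 5 and later 14 clean builds were observed without i2c_i801.

```python
MARKER = "nobody cared"   # text of the kernel message reported in this thread
RUNS_REQUIRED = 5         # hypothetical number of clean builds needed to call a kernel good


def verdict(kernel_log, clean_runs, runs_required=RUNS_REQUIRED):
    """Return 'bad', 'good' or 'undecided' for the kernel under test."""
    if MARKER in kernel_log:
        return "bad"        # the error showed up at least once
    if clean_runs >= runs_required:
        return "good"       # enough clean builds to trust the result
    return "undecided"      # keep running the gcc build


log = '[157152.418524] irq 23: nobody cared (try booting with the "irqpoll" option)'
print(verdict(log, clean_runs=3))
print(verdict("", clean_runs=5))
```

Since every step means booting a different kernel, the result would most likely be fed to `git bisect good` or `git bisect bad` by hand. A wrongly declared "good" sends the bisection down the wrong half, so a higher `runs_required` trades time for reliability.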
* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared 2009-10-21 10:01 ` Alexander Huemer @ 2009-10-21 11:28 ` Jean Delvare 2009-10-26 15:01 ` Alexander Huemer 0 siblings, 1 reply; 29+ messages in thread From: Jean Delvare @ 2009-10-21 11:28 UTC (permalink / raw) To: Alexander Huemer Cc: Tejun Heo, Frans Pop, linux-kernel, linux-ide, Jeff Garzik Le mercredi 21 octobre 2009, Alexander Huemer a écrit : > Jean Delvare wrote: > > OK, here I am, sorry for the delay. I've read the discussion thread. > > Here are the few data points I can offer, in the hope it will help: > > > > * While the i2c-i801 driver received some changes in kernel 2.6.30, > > none of these are related to PCI nor interrupts. So as the problem > > is new in kernel 2.6.30, the i2c-i801 driver alone is unlikely to > > cause it. This may, however, be a combination of something i2c-i801 > > does and something the pci subsystem does since kernel 2.6.30. For > > this reason, I would still recommend a bisection if the problem can > > be reliably reproduced. I know it takes time, but it is always > > easier to fix a bug when we know which commit introduced it. > > > > * The i2c-i801 driver does _not_ make use of interrupts. It is > > poll-based (I am not exactly proud of that, but that's the way it > > is.) > > > > #define ENABLE_INT9 0 /* set to 0x01 to enable - untested */ > > > > So I am very surprised to read that this driver would cause an IRQ > > storm. > > > > * One thing the i2c-i801 driver does on the PCI device is: > > > > err = pci_enable_device(dev); > > > > I presume this is what causes the following message in dmesg: > > > > i801_smbus 0000:00:1f.3: PCI INT B -> GSI 23 (level, low) -> IRQ 23 > > > > Basically, even though the driver doesn't make use of interrupts, > > the IRQ is still registered because this is how the hardware is > > setup. > > > > As a conclusion, I suspect that 2 things may be happening: either > > the SMBus is triggering interrupts when told not to. 
The ICH6 is a > > bit different from all the other supported chips, I'll double check My bad, it's a 63xxESB-based board, not ICH6. I must have been mixing data from a different bug. > > if we may have missed something. Or, something else is triggering > > SMBus transactions. SMI and ACPI come to mind. If this is the case > > then you do not want to use i2c-i801 on this motherboard. > > > > Questions to Alexander : > > > > * Can I please see the output of "sensors" on your system? > > * What are the brand and model of your motherboard? > > * Can we get an acpidump for your system? > > > > > many thanks for your response. i appreciate that. > first, the data you requested: > > sensors: http://xx.vu/~ahuemer/sensors-ahuemer-20091021.txt > acpidump: http://xx.vu/~ahuemer/acpidump-ahuemer-20091021.txt The good news is that I can't see any access to the SMBus in the ACPI tables. Nothing can be said about the SMIs though, without an intimate knowledge of the BIOS. > motherboard: tyan tempest i5400pw/s5397 with one intel xeon e5420. > > the output of sensors was made _without_ i801_smbus in the kernel. Then please send it once again, this time with i801_smbus included. My whole point was to know whether there was any hardware monitoring chip connected to the SMBus. Your initial kernel configuration suggests that you have a W83793G chip there. > i noticed that the data of w83627hf-isa-0290 is quite weird. i do not > have an explanation for that. I do. This happens when the manufacturer decides that the hardware monitoring features of the Super-I/O are insufficient for their needs. They add a dedicated chip for the hardware monitoring. This is particularly frequent on server boards from Tyan and SuperMicro. Ideally they would _also_ disable the feature on the Super-I/O side, but often they do not, so the driver still loads, but outputs garbage.
You can see the following messages in your log: [ 3.878703] w83627hf w83627hf.656: Enabling temp2, readings might not make sense [ 3.881708] w83627hf w83627hf.656: Enabling temp3, readings might not make sense This is a good hint that this is the case (if the nonsensical data displayed by "sensors" wasn't enough to convince you.) So you should stop loading/including kernel module w83627hf. > if a bisection is what will bring light into this, i am willing to take > the time. > so that would be a bisection between 2.6.29 and 2.6.30 ? > a quicker test case would be good for that, but i don't have one yet, > just the compilation of gcc, which takes time, even on this machine with > tmpfs and ccache. -- Jean Delvare Suse L3 ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared 2009-10-21 11:28 ` Jean Delvare @ 2009-10-26 15:01 ` Alexander Huemer 0 siblings, 0 replies; 29+ messages in thread From: Alexander Huemer @ 2009-10-26 15:01 UTC (permalink / raw) To: Jean Delvare Cc: Tejun Heo, Frans Pop, linux-kernel, linux-ide, Jeff Garzik, alexander.huemer Jean Delvare wrote: > Le mercredi 21 octobre 2009, Alexander Huemer a écrit : > >> Jean Delvare wrote: >> >>> OK, here I am, sorry for the delay. I've read the discussion thread. >>> Here are the few data points I can offer, in the hope it will help: >>> >>> * While the i2c-i801 driver received some changes in kernel 2.6.30, >>> none of these are related to PCI nor interrupts. So as the problem >>> is new in kernel 2.6.30, the i2c-i801 driver alone is unlikely to >>> cause it. This may, however, be a combination of something i2c-i801 >>> does and something the pci subsystem does since kernel 2.6.30. For >>> this reason, I would still recommend a bisection if the problem can >>> be reliably reproduced. I know it takes time, but it is always >>> easier to fix a bug when we know which commit introduced it. >>> >>> * The i2c-i801 driver does _not_ make use of interrupts. It is >>> poll-based (I am not exactly proud of that, but that's the way it >>> is.) >>> >>> #define ENABLE_INT9 0 /* set to 0x01 to enable - untested */ >>> >>> So I am very surprised to read that this driver would cause an IRQ >>> storm. >>> >>> * One thing the i2c-i801 driver does on the PCI device is: >>> >>> err = pci_enable_device(dev); >>> >>> I presume this is what causes the following message in dmesg: >>> >>> i801_smbus 0000:00:1f.3: PCI INT B -> GSI 23 (level, low) -> IRQ 23 >>> >>> Basically, even though the driver doesn't make use of interrupts, >>> the IRQ is still registered because this is how the hardware is >>> setup. >>> >>> As a conclusion, I suspect that 2 things may be happening: either >>> the SMBus is triggering interrupts when told not to. 
The ICH6 is a >>> bit different from all the other supported chips, I'll double check >>> > > My bad, it's an 63xxESB-based board, not ICH6. I must have been > mixing data from a different bug. > > >>> if we may have missed something. Or, something else is triggering >>> SMBus transactions. SMI and ACPI come to mind. If this is the case >>> then you do not want to use i2c-i801 on this motherboard. >>> >>> Questions to Alexander : >>> >>> * Can I please see the output of "sensors" on your system? >>> * What are the brand and model of your motherboard? >>> * Can we get an acpidump for your system? >>> >>> >>> >> many thanks for your response. i appreciate that. >> first, the data you requested: >> >> sensors: http://xx.vu/~ahuemer/sensors-ahuemer-20091021.txt >> acpidump: http://xx.vu/~ahuemer/acpidump-ahuemer-20091021.txt >> > > The good news is that I can't see any access to the SMBus in the > ACPI tables. Nothing can be said about the SMIs though, without an > intimate knowledge of the BIOS. > > >> motherboard: tyan tempest i5400pw/s5397 with one intel xeon e5420. >> >> the output of sensors was made _without_ i801_smbus in the kernel. >> > > Then please once again with it. My whole point was to know whether > there was any hardware monitoring chip connected to the SMBus. Your > initial kernel configuration suggests that you have a W83793G chip > there. > > >> i noticed that the data of w83627hf-isa-0290 is quite weird. i do not >> have an explanation for that. >> > > I do. This happens when the manufacturer decides that the hardware > monitoring features of the Super-I/O are insufficient for their > needs. They add a dedicated chip for the hardware monitoring. This > is particularly frequent on server boards from Tyan and SuperMicro. > Ideally they would _also_ disable the feature on the Super-I/O side, > but often then do not, so the driver still loads, but outputs > garbage. 
> > You can see the following messages in your log: > [ 3.878703] w83627hf w83627hf.656: Enabling temp2, readings might not make sense > [ 3.881708] w83627hf w83627hf.656: Enabling temp3, readings might not make sense > This is a good hint that this is the case (if the nonsensical data > displayed by "sensors" wasn't enough to convince you.) > > So you should stop loading/including kernel module w83627hf. > > >> if a bisection is what will bring light into this, i am willing to take >> the time. >> so that would be a bisection between 2.6.29 and 2.6.30 ? >> a quicker test case would be good for that, but i don't have one yet, >> just the compilation of gcc, which takes time, even on this machine with >> tmpfs and ccache. >> > > here is the output you requested: http://xx.vu/~ahuemer/sensors_ahuemer_with_i801_20091026.txt i am currently in the middle of a bisection between 2.6.29 and 2.6.30, 8 steps left. many thanks for the info on hardware monitoring. i'll report back when bisection is finished. regards -alex ^ permalink raw reply [flat|nested] 29+ messages in thread
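For a feel of what "8 steps left" means, here is an idealised estimate, assuming each bisection step halves the set of candidate commits exactly. Real history is not linear, so the figure git itself prints can differ slightly. The function names are hypothetical and no actual commit count for the 2.6.29 to 2.6.30 range is assumed.

```python
def steps_needed(candidates):
    """Idealised number of halving steps to isolate one commit among `candidates`."""
    if candidates < 1:
        raise ValueError("need at least one candidate commit")
    # ceil(log2(n)) computed with integers to avoid floating-point rounding.
    return (candidates - 1).bit_length()


def max_candidates(steps_left):
    """Largest candidate set that `steps_left` idealised halving steps can resolve."""
    return 2 ** steps_left


print(max_candidates(8))   # upper bound on commits still in play with 8 steps left
print(steps_needed(1000))  # steps for a hypothetical range of 1000 commits
```

Under this model, 8 remaining steps correspond to at most 256 candidate commits, and with several gcc builds per step the build time dominates the total effort.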
Thread overview: 29+ messages
[not found] <4ABBB8C2.2080901@sbg.ac.at>
2009-09-24 19:24 ` 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared Frans Pop
2009-09-24 19:30 ` Alexander Huemer
2009-09-24 19:40 ` Frans Pop
2009-09-24 19:43 ` Alexander Huemer
2009-09-25 0:02 ` Alexander Huemer
2009-09-25 11:28 ` Alexander Huemer
2009-09-25 12:24 ` Frans Pop
2009-09-25 12:27 ` Alexander Huemer
2009-09-25 12:48 ` Frans Pop
2009-10-08 12:00 ` Alexander Huemer
2009-10-09 21:30 ` Alexander Huemer
2009-10-10 13:13 ` Frans Pop
2009-10-11 20:57 ` Alexander Huemer
2009-10-12 7:49 ` Tejun Heo
2009-10-12 9:48 ` Frans Pop
2009-10-12 9:52 ` Tejun Heo
2009-10-12 9:55 ` Alexander Huemer
2009-10-12 10:07 ` Tejun Heo
2009-10-12 10:11 ` Alexander Huemer
2009-10-12 15:03 ` Alexander Huemer
2009-10-12 17:28 ` Robert Hancock
2009-10-13 2:17 ` Tejun Heo
2009-10-13 6:49 ` Alexander Huemer
2009-10-13 12:35 ` Tejun Heo
2009-10-14 11:45 ` Jean Delvare
2009-10-21 8:38 ` Jean Delvare
2009-10-21 10:01 ` Alexander Huemer
2009-10-21 11:28 ` Jean Delvare
2009-10-26 15:01 ` Alexander Huemer