* Regression: EEE PC hangs when booting off battery
@ 2009-04-11 9:41 Alan Jenkins
[not found] ` <49E065CF.6040408-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org>
2009-04-11 19:40 ` Regression: " Kristoffer Ericson
0 siblings, 2 replies; 32+ messages in thread
From: Alan Jenkins @ 2009-04-11 9:41 UTC (permalink / raw)
To: linux-acpi-u79uwXL29TY76Z2rM5mHXA; +Cc: linux-kernel, Kernel Testers List
Regression #6 on latest git! (last known good is 2.6.29-rc8)
My EEE PC hangs on bootup if it is not connected to mains power. The
hang happens during udev coldplug. I've left it for >60 seconds with
nmi_watchdog=1, soft lockup and hung task detection are enabled, and I
get no trace. Key presses are not echoed to the screen.
Magic SysRQ keys work though. Unfortunately, SysRq-W / SysRq-T generate
too much output, so the useful bits immediately scroll off the top of
the screen. I can't press SysRq-P (even if that would help), because
the SysRQ key on my laptop is accessed using a Fn modifier key, which
also turns the left hand side of the keyboard into a numeric pad.
Magic SysRQs on my USB keyboard generate no response. (Sometimes the
numlock key seems to work though. I can't tell on the laptop keyboard
because the laptop doesn't have a numlock led).
I also tried booting with vga=ask (anticipating nmi_watchdog output),
and selecting the highest mode, 80x60. It turns out that this doesn't
really work, the screen flickers constantly until userspace changes the
console font. However, it avoided the hang in udev coldplug. Instead,
the hang occurred when X started, leaving me with a black screen and a
non-responsive X cursor.
Booting with acpi=off fixes this.
Regards
Alan
^ permalink raw reply [flat|nested] 32+ messages in thread[parent not found: <49E065CF.6040408-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org>]
* Re: Regression: EEE PC hangs when booting off battery [not found] ` <49E065CF.6040408-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org> @ 2009-04-11 18:07 ` Tzy-Jye Daniel Lin [not found] ` <66dc75180904111107q645fbb77i3cf927ffab7a7ef0-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 32+ messages in thread From: Tzy-Jye Daniel Lin @ 2009-04-11 18:07 UTC (permalink / raw) To: Alan Jenkins Cc: linux-acpi-u79uwXL29TY76Z2rM5mHXA, linux-kernel, Kernel Testers List On Sat, Apr 11, 2009 at 5:41 AM, Alan Jenkins <alan-jenkins-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org> wrote: > Regression #6 on latest git! (last known good is 2.6.29-rc8) > > Magic SysRQ keys work though. Unfortunately, SysRq-W / SysRq-T generate > too much output, so the useful bits immediately scroll off the top of > the screen. I can't press SysRq-P (even if that would help), because > the SysRQ key on my laptop is accessed using a Fn modifier key, which > also turns the left hand side of the keyboard into a numeric pad. I don't own an Eee, but having played with one, the key sequence Alt down Fn down SysRq down, up Fn up P down, up Alt up should work just fine. As long as you keep holding Alt, you don't need to continue holding SysRq after the initial keypress. > Magic SysRQs on my USB keyboard generate no response. (Sometimes the > numlock key seems to work though. I can't tell on the laptop keyboard > because the laptop doesn't have a numlock led). SysRq over USB has never worked. On PC hardware, Magic SysRq depends on features of the built-in i8042 controller, which only handles PS/2 input. ^ permalink raw reply [flat|nested] 32+ messages in thread
[parent not found: <66dc75180904111107q645fbb77i3cf927ffab7a7ef0-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: Regression: EEE PC hangs when booting off battery [not found] ` <66dc75180904111107q645fbb77i3cf927ffab7a7ef0-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2009-04-12 9:00 ` Alan Jenkins [not found] ` <49E1ADAE.2030103-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org> 0 siblings, 1 reply; 32+ messages in thread From: Alan Jenkins @ 2009-04-12 9:00 UTC (permalink / raw) To: Tzy-Jye Daniel Lin Cc: linux-acpi-u79uwXL29TY76Z2rM5mHXA, linux-kernel, Kernel Testers List Tzy-Jye Daniel Lin wrote: > On Sat, Apr 11, 2009 at 5:41 AM, Alan Jenkins > <alan-jenkins-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org> wrote: > >> Regression #6 on latest git! (last known good is 2.6.29-rc8) >> >> Magic SysRQ keys work though. Unfortunately, SysRq-W / SysRq-T generate >> too much output, so the useful bits immediately scroll off the top of >> the screen. I can't press SysRq-P (even if that would help), because >> the SysRQ key on my laptop is accessed using a Fn modifier key, which >> also turns the left hand side of the keyboard into a numeric pad. >> > > I don't own an Eee, but having played with one, the key sequence > > Alt down > Fn down > SysRq down, up > Fn up > P down, up > Alt up > > should work just fine. As long as you keep holding Alt, you don't > need to continue holding SysRq after the initial keypress. > > >> Magic SysRQs on my USB keyboard generate no response. (Sometimes the >> numlock key seems to work though. I can't tell on the laptop keyboard >> because the laptop doesn't have a numlock led). >> > > SysRq over USB has never worked. On PC hardware, Magic SysRq depends > on features of the built-in i8042 controller, which only handles PS/2 > input. > Great, thanks a lot! I was able to run SysRq-P, and found the following backtrace - Pid: 0 EIP is at acpi_idle_enter_bm+0x1df/0x208 [processor] cpud_idle_call cpu_idle rest_init start_kernel i386_start_kernel ^ permalink raw reply [flat|nested] 32+ messages in thread
[parent not found: <49E1ADAE.2030103-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org>]
* [BISECTED] EEE PC hangs when booting off battery [not found] ` <49E1ADAE.2030103-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org> @ 2009-04-12 13:11 ` Alan Jenkins [not found] ` <49E1E89D.7040502-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org> 0 siblings, 1 reply; 32+ messages in thread From: Alan Jenkins @ 2009-04-12 13:11 UTC (permalink / raw) To: Bjorn Helgaas Cc: linux-acpi-u79uwXL29TY76Z2rM5mHXA, linux-kernel, Kernel Testers List Alan Jenkins wrote: > Tzy-Jye Daniel Lin wrote: > >> On Sat, Apr 11, 2009 at 5:41 AM, Alan Jenkins >> <alan-jenkins-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org> wrote: >> >> >>> Regression #6 on latest git! (last known good is 2.6.29-rc8) >>> >>> Magic SysRQ keys work though. Unfortunately, SysRq-W / SysRq-T generate >>> too much output, so the useful bits immediately scroll off the top of >>> the screen. I can't press SysRq-P (even if that would help), because >>> the SysRQ key on my laptop is accessed using a Fn modifier key, which >>> also turns the left hand side of the keyboard into a numeric pad. >>> >>> >> I don't own an Eee, but having played with one, the key sequence >> >> Alt down >> Fn down >> SysRq down, up >> Fn up >> P down, up >> Alt up >> >> should work just fine. As long as you keep holding Alt, you don't >> need to continue holding SysRq after the initial keypress. >> >> >> >>> Magic SysRQs on my USB keyboard generate no response. (Sometimes the >>> numlock key seems to work though. I can't tell on the laptop keyboard >>> because the laptop doesn't have a numlock led). >>> >>> >> SysRq over USB has never worked. On PC hardware, Magic SysRq depends >> on features of the built-in i8042 controller, which only handles PS/2 >> input. >> >> > Great, thanks a lot! > > I was able to run SysRq-P, and found the following backtrace - > > Pid: 0 > EIP is at acpi_idle_enter_bm+0x1df/0x208 [processor] > cpud_idle_call > cpu_idle > rest_init > start_kernel > i386_start_kernel > 7ec0a7290797f57b780f792d12f4bcc19c83aa4f is first bad commit commit 7ec0a7290797f57b780f792d12f4bcc19c83aa4f Author: Bjorn Helgaas <bjorn.helgaas-VXdhtT5mjnY@public.gmane.org> Date: Mon Mar 30 17:48:24 2009 +0000 ACPI: processor: use .notify method instead of installing handler directly This patch adds a .notify() method. The presence of .notify() causes Linux/ACPI to manage event handlers and notify handlers on our behalf, so we don't have to install and remove them ourselves. Signed-off-by: Bjorn Helgaas <bjorn.helgaas-VXdhtT5mjnY@public.gmane.org> CC: Zhang Rui <rui.zhang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> CC: Zhao Yakui <yakui.zhao-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> CC: Venki Pallipadi <venkatesh.pallipadi-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> CC: Anil S Keshavamurthy <anil.s.keshavamurthy-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> Signed-off-by: Len Brown <len.brown-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> However, reverting this commit from v2.6.30-rc1 doesn't solve the hang. Alan ^ permalink raw reply [flat|nested] 32+ messages in thread
[parent not found: <49E1E89D.7040502-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org>]
* Re: [BISECTED] EEE PC hangs when booting off battery [not found] ` <49E1E89D.7040502-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org> @ 2009-04-13 19:15 ` Bjorn Helgaas [not found] ` <200904131315.55519.bjorn.helgaas-VXdhtT5mjnY@public.gmane.org> 0 siblings, 1 reply; 32+ messages in thread From: Bjorn Helgaas @ 2009-04-13 19:15 UTC (permalink / raw) To: Alan Jenkins Cc: linux-acpi-u79uwXL29TY76Z2rM5mHXA, linux-kernel, Kernel Testers List, Venkatesh Pallipadi On Sunday 12 April 2009 07:11:57 am Alan Jenkins wrote: > Alan Jenkins wrote: > > Tzy-Jye Daniel Lin wrote: > >> On Sat, Apr 11, 2009 at 5:41 AM, Alan Jenkins > >> <alan-jenkins-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org> wrote: > >> > >>> Regression #6 on latest git! (last known good is 2.6.29-rc8) You mention that this occurs when booting off battery. So I assume everything works fine when the EEE is plugged in to the wall socket? > >>> Magic SysRQ keys work though. ... > > I was able to run SysRq-P, and found the following backtrace - > > > > Pid: 0 > > EIP is at acpi_idle_enter_bm+0x1df/0x208 [processor] Can you figure out where this is in acpi_idle_enter_bm() or maybe just email me your processor.ko module? Does it always happen at the same point? If you blacklist or rename the processor module to prevent it from loading, does that keep the hang from occurring? > > cpud_idle_call > > cpu_idle > > rest_init > > start_kernel > > i386_start_kernel > > 7ec0a7290797f57b780f792d12f4bcc19c83aa4f is first bad commit > commit 7ec0a7290797f57b780f792d12f4bcc19c83aa4f > Author: Bjorn Helgaas <bjorn.helgaas-VXdhtT5mjnY@public.gmane.org> > Date: Mon Mar 30 17:48:24 2009 +0000 Ouch, sorry about that. Thanks for doing all the bisection work. > ACPI: processor: use .notify method instead of installing handler > directly > > This patch adds a .notify() method. The presence of .notify() causes > Linux/ACPI to manage event handlers and notify handlers on our behalf, > so we don't have to install and remove them ourselves. > > Signed-off-by: Bjorn Helgaas <bjorn.helgaas-VXdhtT5mjnY@public.gmane.org> > CC: Zhang Rui <rui.zhang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> > CC: Zhao Yakui <yakui.zhao-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> > CC: Venki Pallipadi <venkatesh.pallipadi-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> > CC: Anil S Keshavamurthy <anil.s.keshavamurthy-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> > Signed-off-by: Len Brown <len.brown-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> > > However, reverting this commit from v2.6.30-rc1 doesn't solve the hang. I don't see the problem in that commit yet, and if there is a problem with it, I would think that reverting it from 2.6.30-rc1 would solve it. But maybe it'd be useful to revert the whole .notify series to make sure. From 2.6.30-rc1, you should be able to revert these: 7ec0a7290797f57b780f792d12f4bcc19c83aa4f processor 373cfc360ec773be2f7615e59a19f3313255db7c button 46ec8598fde74ba59703575c22a6fb0b6b151bb6 Linux/ACPI infrastructure What happens with those commits reverted? Bjorn ^ permalink raw reply [flat|nested] 32+ messages in thread
[parent not found: <200904131315.55519.bjorn.helgaas-VXdhtT5mjnY@public.gmane.org>]
* Re: [BISECTED] EEE PC hangs when booting off battery [not found] ` <200904131315.55519.bjorn.helgaas-VXdhtT5mjnY@public.gmane.org> @ 2009-04-13 19:57 ` Alan Jenkins 2009-04-13 22:28 ` Bjorn Helgaas 0 siblings, 1 reply; 32+ messages in thread From: Alan Jenkins @ 2009-04-13 19:57 UTC (permalink / raw) To: Bjorn Helgaas Cc: linux-acpi-u79uwXL29TY76Z2rM5mHXA, linux-kernel, Kernel Testers List, Venkatesh Pallipadi Bjorn Helgaas wrote: > On Sunday 12 April 2009 07:11:57 am Alan Jenkins wrote: > > You mention that this occurs when booting off battery. So I > assume everything works fine when the EEE is plugged in to the > wall socket? > When I tested it before, that was what I found. However, I now find that's not quite right. It only works (i.e. doesn't hang) if I remove the battery as well as plugging it into the wall. If I have the battery in, it hangs. >>>>> Magic SysRQ keys work though. ... >>>>> >>> I was able to run SysRq-P, and found the following backtrace - >>> >>> Pid: 0 >>> EIP is at acpi_idle_enter_bm+0x1df/0x208 [processor] >>> > > Can you figure out where this is in acpi_idle_enter_bm() or > maybe just email me your processor.ko module? > > Does it always happen at the same point? > Yes, it always happens at the same point. It turns out I can read the runes, but I don't understand what they're saying :-). objdump -dr processor.o 00001bd0 <acpi_idle_enter_bm>: ... 00001bd0 + 0x1df = 00001daf ... 1d70: b8 03 00 00 00 mov $0x3,%eax 1d75: e8 90 f3 ff ff call 110a <tsc_halts_in_c> 1d7a: 85 c0 test %eax,%eax 1d7c: 74 0a je 1d88 <acpi_idle_enter_bm+0x1b8> 1d7e: b8 0e 09 00 00 mov $0x90e,%eax 1d7f: R_386_32 .rodata.str1.1 1d83: e8 fc ff ff ff call 1d84 <acpi_idle_enter_bm+0x1b4> 1d84: R_386_PC32 mark_tsc_unstable 1d88: 8b 45 e8 mov -0x18(%ebp),%eax 1d8b: 8b 55 ec mov -0x14(%ebp),%edx 1d8e: e8 ab fd ff ff call 1b3e <us_to_pm_timer_ticks> 1d93: 89 c3 mov %eax,%ebx 1d95: b8 17 01 00 00 mov $0x117,%eax 1d9a: 69 ca 17 01 00 00 imul $0x117,%edx,%ecx 1da0: 89 d6 mov %edx,%esi 1da2: f7 e3 mul %ebx 1da4: 8d 14 11 lea (%ecx,%edx,1),%edx 1da7: e8 fc ff ff ff call 1da8 <acpi_idle_enter_bm+0x1d8> 1da8: R_386_PC32 sched_clock_idle_wakeup_event 1dac: fb sti 1dad: 89 e0 mov %esp,%eax -> 1daf: 31 c9 xor %ecx,%ecx <--------- 1db1: 25 00 e0 ff ff and $0xffffe000,%eax 1db6: 89 fa mov %edi,%edx 1db8: 83 48 0c 04 orl $0x4,0xc(%eax) 1dbc: ff 47 18 incl 0x18(%edi) 1dbf: 8b 45 e4 mov -0x1c(%ebp),%eax 1dc2: e8 a4 f5 ff ff call 136b <acpi_state_timer_broadcast> 1dc7: 01 5f 1c add %ebx,0x1c(%edi) 1dca: 11 77 20 adc %esi,0x20(%edi) 1dcd: 8b 45 e8 mov -0x18(%ebp),%eax 1dd0: 83 c4 10 add $0x10,%esp 1dd3: 5b pop %ebx 1dd4: 5e pop %esi 1dd5: 5f pop %edi 1dd6: 5d pop %ebp 1dd7: c3 ret > If you blacklist or rename the processor module to prevent it > from loading, does that keep the hang from occurring? > No. In that case I get the hang in default_idle+0x59/0x95 0000007a <default_idle>: 7a: 55 push %ebp 7b: 89 e5 mov %esp,%ebp 7d: 56 push %esi 7e: 53 push %ebx 7f: 83 ec 18 sub $0x18,%esp 82: 83 3d 18 00 00 00 00 cmpl $0x0,0x18 84: R_386_32 .bss 89: 75 7a jne 105 <default_idle+0x8b> 8b: 80 3d 05 00 00 00 00 cmpb $0x0,0x5 8d: R_386_32 boot_cpu_data 92: 74 71 je 105 <default_idle+0x8b> 94: 83 3d 04 00 00 00 00 cmpl $0x0,0x4 96: R_386_32 __tracepoint_power_start 9b: 74 23 je c0 <default_idle+0x46> 9d: 8b 1d 08 00 00 00 mov 0x8,%ebx 9f: R_386_32 __tracepoint_power_start a3: 85 db test %ebx,%ebx a5: 74 19 je c0 <default_idle+0x46> a7: 8d 75 e0 lea -0x20(%ebp),%esi aa: b9 01 00 00 00 mov $0x1,%ecx af: ba 01 00 00 00 mov $0x1,%edx b4: 89 f0 mov %esi,%eax b6: ff 13 call *(%ebx) b8: 83 c3 04 add $0x4,%ebx bb: 83 3b 00 cmpl $0x0,(%ebx) be: 75 ea jne aa <default_idle+0x30> c0: 89 e0 mov %esp,%eax c2: 25 00 e0 ff ff and $0xffffe000,%eax c7: 83 60 0c fb andl $0xfffffffb,0xc(%eax) cb: f6 40 08 08 testb $0x8,0x8(%eax) cf: 75 04 jne d5 <default_idle+0x5b> d1: fb sti d2: f4 hlt --> d3: eb 01 jmp d6 <default_idle+0x5c> <-------- d5: fb sti d6: 89 e0 mov %esp,%eax d8: 25 00 e0 ff ff and $0xffffe000,%eax dd: 83 48 0c 04 orl $0x4,0xc(%eax) e1: 83 3d 04 00 00 00 00 cmpl $0x0,0x4 e3: R_386_32 __tracepoint_power_end e8: 74 1e je 108 <default_idle+0x8e> >> 7ec0a7290797f57b780f792d12f4bcc19c83aa4f is first bad commit >> commit 7ec0a7290797f57b780f792d12f4bcc19c83aa4f >> Author: Bjorn Helgaas <bjorn.helgaas-VXdhtT5mjnY@public.gmane.org> >> Date: Mon Mar 30 17:48:24 2009 +0000 >> > > Ouch, sorry about that. Thanks for doing all the bisection work. > > >> ACPI: processor: use .notify method instead of installing handler >> directly >> >> This patch adds a .notify() method. The presence of .notify() causes >> Linux/ACPI to manage event handlers and notify handlers on our behalf, >> so we don't have to install and remove them ourselves. >> >> Signed-off-by: Bjorn Helgaas <bjorn.helgaas-VXdhtT5mjnY@public.gmane.org> >> CC: Zhang Rui <rui.zhang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> >> CC: Zhao Yakui <yakui.zhao-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> >> CC: Venki Pallipadi <venkatesh.pallipadi-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> >> CC: Anil S Keshavamurthy <anil.s.keshavamurthy-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> >> Signed-off-by: Len Brown <len.brown-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> >> >> However, reverting this commit from v2.6.30-rc1 doesn't solve the hang. >> > > I don't see the problem in that commit yet, and if there is a problem > with it, I would think that reverting it from 2.6.30-rc1 would solve > it. But maybe it'd be useful to revert the whole .notify series to > make sure. From 2.6.30-rc1, you should be able to revert these: > > 7ec0a7290797f57b780f792d12f4bcc19c83aa4f processor > 373cfc360ec773be2f7615e59a19f3313255db7c button > 46ec8598fde74ba59703575c22a6fb0b6b151bb6 Linux/ACPI infrastructure > > What happens with those commits reverted? > I'll find out tomorrow. Thanks Alan ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [BISECTED] EEE PC hangs when booting off battery 2009-04-13 19:57 ` Alan Jenkins @ 2009-04-13 22:28 ` Bjorn Helgaas 2009-04-14 8:06 ` Alan Jenkins 0 siblings, 1 reply; 32+ messages in thread From: Bjorn Helgaas @ 2009-04-13 22:28 UTC (permalink / raw) To: Alan Jenkins Cc: linux-acpi, linux-kernel, Kernel Testers List, Venkatesh Pallipadi On Monday 13 April 2009 01:57:00 pm Alan Jenkins wrote: > Bjorn Helgaas wrote: > > On Sunday 12 April 2009 07:11:57 am Alan Jenkins wrote: > > > > You mention that this occurs when booting off battery. So I > > assume everything works fine when the EEE is plugged in to the > > wall socket? > > When I tested it before, that was what I found. > > However, I now find that's not quite right. It only works (i.e. doesn't > hang) if I remove the battery as well as plugging it into the wall. If > I have the battery in, it hangs. > > >>>>> Magic SysRQ keys work though. ... > >>>>> > >>> I was able to run SysRq-P, and found the following backtrace - > >>> > >>> Pid: 0 > >>> EIP is at acpi_idle_enter_bm+0x1df/0x208 [processor] > > > > Can you figure out where this is in acpi_idle_enter_bm() or > > maybe just email me your processor.ko module? > > > > Does it always happen at the same point? > > Yes, it always happens at the same point. > > It turns out I can read the runes, but I don't understand what they're > saying :-). I'm not much good with x86 assembly either :-) I think that in both cases below, you're right after enabling interrupts and about to exit the idle routine. My guess is the system is not really hung; it just doesn't think it has anything to do and is spending all its time in the idle loop. > 00001bd0 <acpi_idle_enter_bm>: > > ... > 00001bd0 + 0x1df = 00001daf > ... > 1d70: b8 03 00 00 00 mov $0x3,%eax > 1d75: e8 90 f3 ff ff call 110a <tsc_halts_in_c> > 1d7a: 85 c0 test %eax,%eax > 1d7c: 74 0a je 1d88 <acpi_idle_enter_bm+0x1b8> > 1d7e: b8 0e 09 00 00 mov $0x90e,%eax > 1d7f: R_386_32 .rodata.str1.1 > 1d83: e8 fc ff ff ff call 1d84 <acpi_idle_enter_bm+0x1b4> > 1d84: R_386_PC32 mark_tsc_unstable > 1d88: 8b 45 e8 mov -0x18(%ebp),%eax > 1d8b: 8b 55 ec mov -0x14(%ebp),%edx > 1d8e: e8 ab fd ff ff call 1b3e <us_to_pm_timer_ticks> > 1d93: 89 c3 mov %eax,%ebx > 1d95: b8 17 01 00 00 mov $0x117,%eax > 1d9a: 69 ca 17 01 00 00 imul $0x117,%edx,%ecx > 1da0: 89 d6 mov %edx,%esi > 1da2: f7 e3 mul %ebx > 1da4: 8d 14 11 lea (%ecx,%edx,1),%edx > 1da7: e8 fc ff ff ff call 1da8 <acpi_idle_enter_bm+0x1d8> > 1da8: R_386_PC32 sched_clock_idle_wakeup_event > 1dac: fb sti > 1dad: 89 e0 mov %esp,%eax > -> 1daf: 31 c9 xor %ecx,%ecx <--------- > 1db1: 25 00 e0 ff ff and $0xffffe000,%eax > 1db6: 89 fa mov %edi,%edx > 1db8: 83 48 0c 04 orl $0x4,0xc(%eax) > 1dbc: ff 47 18 incl 0x18(%edi) > 1dbf: 8b 45 e4 mov -0x1c(%ebp),%eax > 1dc2: e8 a4 f5 ff ff call 136b <acpi_state_timer_broadcast> > 1dc7: 01 5f 1c add %ebx,0x1c(%edi) > 1dca: 11 77 20 adc %esi,0x20(%edi) > 1dcd: 8b 45 e8 mov -0x18(%ebp),%eax > 1dd0: 83 c4 10 add $0x10,%esp > 1dd3: 5b pop %ebx > 1dd4: 5e pop %esi > 1dd5: 5f pop %edi > 1dd6: 5d pop %ebp > 1dd7: c3 ret > > > If you blacklist or rename the processor module to prevent it > > from loading, does that keep the hang from occurring? > > No. In that case I get the hang in default_idle+0x59/0x95 > > 0000007a <default_idle>: > 7a: 55 push %ebp > 7b: 89 e5 mov %esp,%ebp > 7d: 56 push %esi > 7e: 53 push %ebx > 7f: 83 ec 18 sub $0x18,%esp > 82: 83 3d 18 00 00 00 00 cmpl $0x0,0x18 > 84: R_386_32 .bss > 89: 75 7a jne 105 <default_idle+0x8b> > 8b: 80 3d 05 00 00 00 00 cmpb $0x0,0x5 > 8d: R_386_32 boot_cpu_data > 92: 74 71 je 105 <default_idle+0x8b> > 94: 83 3d 04 00 00 00 00 cmpl $0x0,0x4 > 96: R_386_32 __tracepoint_power_start > 9b: 74 23 je c0 <default_idle+0x46> > 9d: 8b 1d 08 00 00 00 mov 0x8,%ebx > 9f: R_386_32 __tracepoint_power_start > a3: 85 db test %ebx,%ebx > a5: 74 19 je c0 <default_idle+0x46> > a7: 8d 75 e0 lea -0x20(%ebp),%esi > aa: b9 01 00 00 00 mov $0x1,%ecx > af: ba 01 00 00 00 mov $0x1,%edx > b4: 89 f0 mov %esi,%eax > b6: ff 13 call *(%ebx) > b8: 83 c3 04 add $0x4,%ebx > bb: 83 3b 00 cmpl $0x0,(%ebx) > be: 75 ea jne aa <default_idle+0x30> > c0: 89 e0 mov %esp,%eax > c2: 25 00 e0 ff ff and $0xffffe000,%eax > c7: 83 60 0c fb andl $0xfffffffb,0xc(%eax) > cb: f6 40 08 08 testb $0x8,0x8(%eax) > cf: 75 04 jne d5 <default_idle+0x5b> > d1: fb sti > d2: f4 hlt > --> d3: eb 01 jmp d6 <default_idle+0x5c> <-------- > d5: fb sti > d6: 89 e0 mov %esp,%eax > d8: 25 00 e0 ff ff and $0xffffe000,%eax > dd: 83 48 0c 04 orl $0x4,0xc(%eax) > e1: 83 3d 04 00 00 00 00 cmpl $0x0,0x4 > e3: R_386_32 __tracepoint_power_end > e8: 74 1e je 108 <default_idle+0x8e> > > > >> 7ec0a7290797f57b780f792d12f4bcc19c83aa4f is first bad commit > >> commit 7ec0a7290797f57b780f792d12f4bcc19c83aa4f > >> Author: Bjorn Helgaas <bjorn.helgaas@hp.com> > >> Date: Mon Mar 30 17:48:24 2009 +0000 > > > > Ouch, sorry about that. Thanks for doing all the bisection work. > > > >> ACPI: processor: use .notify method instead of installing handler > >> directly > >> > >> This patch adds a .notify() method. The presence of .notify() causes > >> Linux/ACPI to manage event handlers and notify handlers on our behalf, > >> so we don't have to install and remove them ourselves. > >> > >> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> > >> CC: Zhang Rui <rui.zhang@intel.com> > >> CC: Zhao Yakui <yakui.zhao@intel.com> > >> CC: Venki Pallipadi <venkatesh.pallipadi@intel.com> > >> CC: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> > >> Signed-off-by: Len Brown <len.brown@intel.com> > >> > >> However, reverting this commit from v2.6.30-rc1 doesn't solve the hang. > > > > I don't see the problem in that commit yet, and if there is a problem > > with it, I would think that reverting it from 2.6.30-rc1 would solve > > it. But maybe it'd be useful to revert the whole .notify series to > > make sure. From 2.6.30-rc1, you should be able to revert these: > > > > 7ec0a7290797f57b780f792d12f4bcc19c83aa4f processor > > 373cfc360ec773be2f7615e59a19f3313255db7c button > > 46ec8598fde74ba59703575c22a6fb0b6b151bb6 Linux/ACPI infrastructure > > > > What happens with those commits reverted? > > I'll find out tomorrow. The fact that it still hangs even when you don't load the processor driver at all suggests that the 7ec0a729079 commit identified by the bisection is not the real problem. That commit only touches drivers/acpi/processor_core.c. I think it's more likely some kind of race or missed wakeup. Since it seems to be sensitive to whether the battery is present, I guess you could try blacklisting the battery.ko driver. There have been a few changes to it since 2.6.29-rc8. If things work without battery.ko, we can look through those changes. Bjorn ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [BISECTED] EEE PC hangs when booting off battery 2009-04-13 22:28 ` Bjorn Helgaas @ 2009-04-14 8:06 ` Alan Jenkins 2009-04-14 9:26 ` Alan Jenkins 0 siblings, 1 reply; 32+ messages in thread From: Alan Jenkins @ 2009-04-14 8:06 UTC (permalink / raw) To: Bjorn Helgaas Cc: linux-acpi, linux-kernel, Kernel Testers List, Venkatesh Pallipadi Bjorn Helgaas wrote: > On Monday 13 April 2009 01:57:00 pm Alan Jenkins wrote: > >> Bjorn Helgaas wrote: >> >>> On Sunday 12 April 2009 07:11:57 am Alan Jenkins wrote: >>> >>> You mention that this occurs when booting off battery. So I >>> assume everything works fine when the EEE is plugged in to the >>> wall socket? >>> >> When I tested it before, that was what I found. >> >> However, I now find that's not quite right. It only works (i.e. doesn't >> hang) if I remove the battery as well as plugging it into the wall. If >> I have the battery in, it hangs. >> ... and right now, I can only reproduce it by booting with it plugged into the wall and the battery present. If I unplug it from the wall, it boots fine. It must be affected by something else as well, maybe battery level or charging / discharging status. >>>>>>> Magic SysRQ keys work though. ... >>>>>>> >>>>>>> >>>>> I was able to run SysRq-P, and found the following backtrace - >>>>> >>>>> Pid: 0 >>>>> EIP is at acpi_idle_enter_bm+0x1df/0x208 [processor] >>>>> >>> Can you figure out where this is in acpi_idle_enter_bm() or >>> maybe just email me your processor.ko module? >>> >>> Does it always happen at the same point? >>> >> Yes, it always happens at the same point. >> >> It turns out I can read the runes, but I don't understand what they're >> saying :-). >> > > I'm not much good with x86 assembly either :-) > > I think that in both cases below, you're right after enabling > interrupts and about to exit the idle routine. My guess is the > system is not really hung; it just doesn't think it has anything > to do and is spending all its time in the idle loop. > > >> 00001bd0 <acpi_idle_enter_bm>: >> >> ... >> 00001bd0 + 0x1df = 00001daf >> ... >> 1d70: b8 03 00 00 00 mov $0x3,%eax >> 1d75: e8 90 f3 ff ff call 110a <tsc_halts_in_c> >> 1d7a: 85 c0 test %eax,%eax >> 1d7c: 74 0a je 1d88 <acpi_idle_enter_bm+0x1b8> >> 1d7e: b8 0e 09 00 00 mov $0x90e,%eax >> 1d7f: R_386_32 .rodata.str1.1 >> 1d83: e8 fc ff ff ff call 1d84 <acpi_idle_enter_bm+0x1b4> >> 1d84: R_386_PC32 mark_tsc_unstable >> 1d88: 8b 45 e8 mov -0x18(%ebp),%eax >> 1d8b: 8b 55 ec mov -0x14(%ebp),%edx >> 1d8e: e8 ab fd ff ff call 1b3e <us_to_pm_timer_ticks> >> 1d93: 89 c3 mov %eax,%ebx >> 1d95: b8 17 01 00 00 mov $0x117,%eax >> 1d9a: 69 ca 17 01 00 00 imul $0x117,%edx,%ecx >> 1da0: 89 d6 mov %edx,%esi >> 1da2: f7 e3 mul %ebx >> 1da4: 8d 14 11 lea (%ecx,%edx,1),%edx >> 1da7: e8 fc ff ff ff call 1da8 <acpi_idle_enter_bm+0x1d8> >> 1da8: R_386_PC32 sched_clock_idle_wakeup_event >> 1dac: fb sti >> 1dad: 89 e0 mov %esp,%eax >> -> 1daf: 31 c9 xor %ecx,%ecx <--------- >> 1db1: 25 00 e0 ff ff and $0xffffe000,%eax >> 1db6: 89 fa mov %edi,%edx >> 1db8: 83 48 0c 04 orl $0x4,0xc(%eax) >> 1dbc: ff 47 18 incl 0x18(%edi) >> 1dbf: 8b 45 e4 mov -0x1c(%ebp),%eax >> 1dc2: e8 a4 f5 ff ff call 136b <acpi_state_timer_broadcast> >> 1dc7: 01 5f 1c add %ebx,0x1c(%edi) >> 1dca: 11 77 20 adc %esi,0x20(%edi) >> 1dcd: 8b 45 e8 mov -0x18(%ebp),%eax >> 1dd0: 83 c4 10 add $0x10,%esp >> 1dd3: 5b pop %ebx >> 1dd4: 5e pop %esi >> 1dd5: 5f pop %edi >> 1dd6: 5d pop %ebp >> 1dd7: c3 ret >> >> >>> If you blacklist or rename the processor module to prevent it >>> from loading, does that keep the hang from occurring? >>> >> No. In that case I get the hang in default_idle+0x59/0x95 >> >> 0000007a <default_idle>: >> 7a: 55 push %ebp >> 7b: 89 e5 mov %esp,%ebp >> 7d: 56 push %esi >> 7e: 53 push %ebx >> 7f: 83 ec 18 sub $0x18,%esp >> 82: 83 3d 18 00 00 00 00 cmpl $0x0,0x18 >> 84: R_386_32 .bss >> 89: 75 7a jne 105 <default_idle+0x8b> >> 8b: 80 3d 05 00 00 00 00 cmpb $0x0,0x5 >> 8d: R_386_32 boot_cpu_data >> 92: 74 71 je 105 <default_idle+0x8b> >> 94: 83 3d 04 00 00 00 00 cmpl $0x0,0x4 >> 96: R_386_32 __tracepoint_power_start >> 9b: 74 23 je c0 <default_idle+0x46> >> 9d: 8b 1d 08 00 00 00 mov 0x8,%ebx >> 9f: R_386_32 __tracepoint_power_start >> a3: 85 db test %ebx,%ebx >> a5: 74 19 je c0 <default_idle+0x46> >> a7: 8d 75 e0 lea -0x20(%ebp),%esi >> aa: b9 01 00 00 00 mov $0x1,%ecx >> af: ba 01 00 00 00 mov $0x1,%edx >> b4: 89 f0 mov %esi,%eax >> b6: ff 13 call *(%ebx) >> b8: 83 c3 04 add $0x4,%ebx >> bb: 83 3b 00 cmpl $0x0,(%ebx) >> be: 75 ea jne aa <default_idle+0x30> >> c0: 89 e0 mov %esp,%eax >> c2: 25 00 e0 ff ff and $0xffffe000,%eax >> c7: 83 60 0c fb andl $0xfffffffb,0xc(%eax) >> cb: f6 40 08 08 testb $0x8,0x8(%eax) >> cf: 75 04 jne d5 <default_idle+0x5b> >> d1: fb sti >> d2: f4 hlt >> --> d3: eb 01 jmp d6 <default_idle+0x5c> <-------- >> d5: fb sti >> d6: 89 e0 mov %esp,%eax >> d8: 25 00 e0 ff ff and $0xffffe000,%eax >> dd: 83 48 0c 04 orl $0x4,0xc(%eax) >> e1: 83 3d 04 00 00 00 00 cmpl $0x0,0x4 >> e3: R_386_32 __tracepoint_power_end >> e8: 74 1e je 108 <default_idle+0x8e> >> >> >> >>>> 7ec0a7290797f57b780f792d12f4bcc19c83aa4f is first bad commit >>>> commit 7ec0a7290797f57b780f792d12f4bcc19c83aa4f >>>> Author: Bjorn Helgaas <bjorn.helgaas@hp.com> >>>> Date: Mon Mar 30 17:48:24 2009 +0000 >>>> >>> Ouch, sorry about that. Thanks for doing all the bisection work. >>> >>> >>>> ACPI: processor: use .notify method instead of installing handler >>>> directly >>>> >>>> This patch adds a .notify() method. The presence of .notify() causes >>>> Linux/ACPI to manage event handlers and notify handlers on our behalf, >>>> so we don't have to install and remove them ourselves. >>>> >>>> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> >>>> CC: Zhang Rui <rui.zhang@intel.com> >>>> CC: Zhao Yakui <yakui.zhao@intel.com> >>>> CC: Venki Pallipadi <venkatesh.pallipadi@intel.com> >>>> CC: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> >>>> Signed-off-by: Len Brown <len.brown@intel.com> >>>> >>>> However, reverting this commit from v2.6.30-rc1 doesn't solve the hang. >>>> >>> I don't see the problem in that commit yet, and if there is a problem >>> with it, I would think that reverting it from 2.6.30-rc1 would solve >>> it. But maybe it'd be useful to revert the whole .notify series to >>> make sure. From 2.6.30-rc1, you should be able to revert these: >>> >>> 7ec0a7290797f57b780f792d12f4bcc19c83aa4f processor >>> 373cfc360ec773be2f7615e59a19f3313255db7c button >>> 46ec8598fde74ba59703575c22a6fb0b6b151bb6 Linux/ACPI infrastructure >>> >>> What happens with those commits reverted? >>> >> I'll find out tomorrow. >> > > The fact that it still hangs even when you don't load the processor > driver at all suggests that the 7ec0a729079 commit identified by the > bisection is not the real problem. That commit only touches > drivers/acpi/processor_core.c. > Yah. > I think it's more likely some kind of race or missed wakeup. > > Since it seems to be sensitive to whether the battery is present, > I guess you could try blacklisting the battery.ko driver. There > have been a few changes to it since 2.6.29-rc8. If things work > without battery.ko, we can look through those changes. > Good guess :-). I tried a couple of times either way, and blacklisting "battery" definitely avoids the hang. Thanks Alan ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [BISECTED] EEE PC hangs when booting off battery 2009-04-14 8:06 ` Alan Jenkins @ 2009-04-14 9:26 ` Alan Jenkins 2009-04-14 14:59 ` Bjorn Helgaas 0 siblings, 1 reply; 32+ messages in thread From: Alan Jenkins @ 2009-04-14 9:26 UTC (permalink / raw) To: Bjorn Helgaas Cc: linux-acpi, linux-kernel, Kernel Testers List, Venkatesh Pallipadi, Arjan van de Ven Alan Jenkins wrote: > Bjorn Helgaas wrote: > >> On Monday 13 April 2009 01:57:00 pm Alan Jenkins wrote: >> >> >>> Bjorn Helgaas wrote: >>> >>> >>>> On Sunday 12 April 2009 07:11:57 am Alan Jenkins wrote: >>>> >>>> You mention that this occurs when booting off battery. So I >>>> assume everything works fine when the EEE is plugged in to the >>>> wall socket? >>>> >>>> >>> When I tested it before, that was what I found. >>> >>> However, I now find that's not quite right. It only works (i.e. doesn't >>> hang) if I remove the battery as well as plugging it into the wall. If >>> I have the battery in, it hangs. >>> >>> > > ... and right now, I can only reproduce it by booting with it plugged > into the wall and the battery present. If I unplug it from the wall, it > boots fine. > > It must be affected by something else as well, maybe battery level or > charging / discharging status. > > >>>>>>>> Magic SysRQ keys work though. ... >>>>>>>> >>>>>>>> >>>>>>>> >>>>>> I was able to run SysRq-P, and found the following backtrace - >>>>>> >>>>>> Pid: 0 >>>>>> EIP is at acpi_idle_enter_bm+0x1df/0x208 [processor] >>>>>> >>>>>> >>>> Can you figure out where this is in acpi_idle_enter_bm() or >>>> maybe just email me your processor.ko module? >>>> >>>> Does it always happen at the same point? >>>> >>>> >>> Yes, it always happens at the same point. >>> >>> It turns out I can read the runes, but I don't understand what they're >>> saying :-). >>> >>> >> I'm not much good with x86 assembly either :-) >> >> I think that in both cases below, you're right after enabling >> interrupts and about to exit the idle routine. My guess is the >> system is not really hung; it just doesn't think it has anything >> to do and is spending all its time in the idle loop. >> >> >> >>> 00001bd0 <acpi_idle_enter_bm>: >>> >>> ... >>> 00001bd0 + 0x1df = 00001daf >>> ... >>> 1d70: b8 03 00 00 00 mov $0x3,%eax >>> 1d75: e8 90 f3 ff ff call 110a <tsc_halts_in_c> >>> 1d7a: 85 c0 test %eax,%eax >>> 1d7c: 74 0a je 1d88 <acpi_idle_enter_bm+0x1b8> >>> 1d7e: b8 0e 09 00 00 mov $0x90e,%eax >>> 1d7f: R_386_32 .rodata.str1.1 >>> 1d83: e8 fc ff ff ff call 1d84 <acpi_idle_enter_bm+0x1b4> >>> 1d84: R_386_PC32 mark_tsc_unstable >>> 1d88: 8b 45 e8 mov -0x18(%ebp),%eax >>> 1d8b: 8b 55 ec mov -0x14(%ebp),%edx >>> 1d8e: e8 ab fd ff ff call 1b3e <us_to_pm_timer_ticks> >>> 1d93: 89 c3 mov %eax,%ebx >>> 1d95: b8 17 01 00 00 mov $0x117,%eax >>> 1d9a: 69 ca 17 01 00 00 imul $0x117,%edx,%ecx >>> 1da0: 89 d6 mov %edx,%esi >>> 1da2: f7 e3 mul %ebx >>> 1da4: 8d 14 11 lea (%ecx,%edx,1),%edx >>> 1da7: e8 fc ff ff ff call 1da8 <acpi_idle_enter_bm+0x1d8> >>> 1da8: R_386_PC32 sched_clock_idle_wakeup_event >>> 1dac: fb sti >>> 1dad: 89 e0 mov %esp,%eax >>> -> 1daf: 31 c9 xor %ecx,%ecx <--------- >>> 1db1: 25 00 e0 ff ff and $0xffffe000,%eax >>> 1db6: 89 fa mov %edi,%edx >>> 1db8: 83 48 0c 04 orl $0x4,0xc(%eax) >>> 1dbc: ff 47 18 incl 0x18(%edi) >>> 1dbf: 8b 45 e4 mov -0x1c(%ebp),%eax >>> 1dc2: e8 a4 f5 ff ff call 136b <acpi_state_timer_broadcast> >>> 1dc7: 01 5f 1c add %ebx,0x1c(%edi) >>> 1dca: 11 77 20 adc %esi,0x20(%edi) >>> 1dcd: 8b 45 e8 mov -0x18(%ebp),%eax >>> 1dd0: 83 c4 10 add $0x10,%esp >>> 1dd3: 5b pop %ebx >>> 1dd4: 5e pop %esi >>> 1dd5: 5f pop %edi >>> 1dd6: 5d pop %ebp >>> 1dd7: c3 ret >>> >>> >>> >>>> If you blacklist or rename the processor module to prevent it >>>> from loading, does that keep the hang from occurring? >>>> >>>> >>> No. In that case I get the hang in default_idle+0x59/0x95 >>> >>> 0000007a <default_idle>: >>> 7a: 55 push %ebp >>> 7b: 89 e5 mov %esp,%ebp >>> 7d: 56 push %esi >>> 7e: 53 push %ebx >>> 7f: 83 ec 18 sub $0x18,%esp >>> 82: 83 3d 18 00 00 00 00 cmpl $0x0,0x18 >>> 84: R_386_32 .bss >>> 89: 75 7a jne 105 <default_idle+0x8b> >>> 8b: 80 3d 05 00 00 00 00 cmpb $0x0,0x5 >>> 8d: R_386_32 boot_cpu_data >>> 92: 74 71 je 105 <default_idle+0x8b> >>> 94: 83 3d 04 00 00 00 00 cmpl $0x0,0x4 >>> 96: R_386_32 __tracepoint_power_start >>> 9b: 74 23 je c0 <default_idle+0x46> >>> 9d: 8b 1d 08 00 00 00 mov 0x8,%ebx >>> 9f: R_386_32 __tracepoint_power_start >>> a3: 85 db test %ebx,%ebx >>> a5: 74 19 je c0 <default_idle+0x46> >>> a7: 8d 75 e0 lea -0x20(%ebp),%esi >>> aa: b9 01 00 00 00 mov $0x1,%ecx >>> af: ba 01 00 00 00 mov $0x1,%edx >>> b4: 89 f0 mov %esi,%eax >>> b6: ff 13 call *(%ebx) >>> b8: 83 c3 04 add $0x4,%ebx >>> bb: 83 3b 00 cmpl $0x0,(%ebx) >>> be: 75 ea jne aa <default_idle+0x30> >>> c0: 89 e0 mov %esp,%eax >>> c2: 25 00 e0 ff ff and $0xffffe000,%eax >>> c7: 83 60 0c fb andl $0xfffffffb,0xc(%eax) >>> cb: f6 40 08 08 testb $0x8,0x8(%eax) >>> cf: 75 04 jne d5 <default_idle+0x5b> >>> d1: fb sti >>> d2: f4 hlt >>> --> d3: eb 01 jmp d6 <default_idle+0x5c> <-------- >>> d5: fb sti >>> d6: 89 e0 mov %esp,%eax >>> d8: 25 00 e0 ff ff and $0xffffe000,%eax >>> dd: 83 48 0c 04 orl $0x4,0xc(%eax) >>> e1: 83 3d 04 00 00 00 00 cmpl $0x0,0x4 >>> e3: R_386_32 __tracepoint_power_end >>> e8: 74 1e je 108 <default_idle+0x8e> >>> >>> >>> >>> >>>>> 7ec0a7290797f57b780f792d12f4bcc19c83aa4f is first bad commit >>>>> commit 7ec0a7290797f57b780f792d12f4bcc19c83aa4f >>>>> Author: Bjorn Helgaas <bjorn.helgaas@hp.com> >>>>> Date: Mon Mar 30 17:48:24 2009 +0000 >>>>> >>>>> >>>> Ouch, sorry about that. Thanks for doing all the bisection work. >>>> >>>> >>>> >>>>> ACPI: processor: use .notify method instead of installing handler >>>>> directly >>>>> >>>>> This patch adds a .notify() method. The presence of .notify() causes >>>>> Linux/ACPI to manage event handlers and notify handlers on our behalf, >>>>> so we don't have to install and remove them ourselves. >>>>> >>>>> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> >>>>> CC: Zhang Rui <rui.zhang@intel.com> >>>>> CC: Zhao Yakui <yakui.zhao@intel.com> >>>>> CC: Venki Pallipadi <venkatesh.pallipadi@intel.com> >>>>> CC: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> >>>>> Signed-off-by: Len Brown <len.brown@intel.com> >>>>> >>>>> However, reverting this commit from v2.6.30-rc1 doesn't solve the hang. >>>>> >>>>> >>>> I don't see the problem in that commit yet, and if there is a problem >>>> with it, I would think that reverting it from 2.6.30-rc1 would solve >>>> it. But maybe it'd be useful to revert the whole .notify series to >>>> make sure. From 2.6.30-rc1, you should be able to revert these: >>>> >>>> 7ec0a7290797f57b780f792d12f4bcc19c83aa4f processor >>>> 373cfc360ec773be2f7615e59a19f3313255db7c button >>>> 46ec8598fde74ba59703575c22a6fb0b6b151bb6 Linux/ACPI infrastructure >>>> >>>> What happens with those commits reverted? >>>> >>>> >>> I'll find out tomorrow. >>> >>> >> The fact that it still hangs even when you don't load the processor >> driver at all suggests that the 7ec0a729079 commit identified by the >> bisection is not the real problem. That commit only touches >> drivers/acpi/processor_core.c. >> >> > > Yah. > > >> I think it's more likely some kind of race or missed wakeup. >> >> Since it seems to be sensitive to whether the battery is present, >> I guess you could try blacklisting the battery.ko driver. There >> have been a few changes to it since 2.6.29-rc8. If things work >> without battery.ko, we can look through those changes. >> >> > > Good guess :-). I tried a couple of times either way, and blacklisting > "battery" definitely avoids the hang. > Ok, I tried reverting 0f66af530116e9f4dd97f328d91718b56a6fc5a4 "ACPI: battery: asynchronous init" and that fixed it. Thanks for your help so far, Bjorn. Alan ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [BISECTED] EEE PC hangs when booting off battery 2009-04-14 9:26 ` Alan Jenkins @ 2009-04-14 14:59 ` Bjorn Helgaas 2009-04-14 15:17 ` Arjan van de Ven 0 siblings, 1 reply; 32+ messages in thread From: Bjorn Helgaas @ 2009-04-14 14:59 UTC (permalink / raw) To: Alan Jenkins Cc: linux-acpi, linux-kernel, Kernel Testers List, Venkatesh Pallipadi, Arjan van de Ven On Tuesday 14 April 2009 03:26:13 am Alan Jenkins wrote: > Alan Jenkins wrote: > > Bjorn Helgaas wrote: > >> On Monday 13 April 2009 01:57:00 pm Alan Jenkins wrote: > >>> Bjorn Helgaas wrote: > >>>> On Sunday 12 April 2009 07:11:57 am Alan Jenkins wrote: > >>>> > >>>> You mention that this occurs when booting off battery. So I > >>>> assume everything works fine when the EEE is plugged in to the > >>>> wall socket? > >>>> > >>> When I tested it before, that was what I found. > >>> > >>> However, I now find that's not quite right. It only works (i.e. doesn't > >>> hang) if I remove the battery as well as plugging it into the wall. If > >>> I have the battery in, it hangs. > > > > ... and right now, I can only reproduce it by booting with it plugged > > into the wall and the battery present. If I unplug it from the wall, it > > boots fine. > > > > It must be affected by something else as well, maybe battery level or > > charging / discharging status. > > > >>>>>>>> Magic SysRQ keys work though. ... > >>>>>> I was able to run SysRq-P, and found the following backtrace - > >>>>>> > >>>>>> Pid: 0 > >>>>>> EIP is at acpi_idle_enter_bm+0x1df/0x208 [processor] > >>>>>> > >>>> Can you figure out where this is in acpi_idle_enter_bm() or > >>>> maybe just email me your processor.ko module? > >>>> > >>>> Does it always happen at the same point? > >>>> > >>> Yes, it always happens at the same point. > >>> > >>> It turns out I can read the runes, but I don't understand what they're > >>> saying :-). > >>> > >> I'm not much good with x86 assembly either :-) > >> > >> I think that in both cases below, you're right after enabling > >> interrupts and about to exit the idle routine. My guess is the > >> system is not really hung; it just doesn't think it has anything > >> to do and is spending all its time in the idle loop. > >> > >>> 00001bd0 <acpi_idle_enter_bm>: > >>> ... > >>> 00001bd0 + 0x1df = 00001daf > >>> ... > >>> 1d70: b8 03 00 00 00 mov $0x3,%eax > >>> 1d75: e8 90 f3 ff ff call 110a <tsc_halts_in_c> > >>> 1d7a: 85 c0 test %eax,%eax > >>> 1d7c: 74 0a je 1d88 <acpi_idle_enter_bm+0x1b8> > >>> 1d7e: b8 0e 09 00 00 mov $0x90e,%eax > >>> 1d7f: R_386_32 .rodata.str1.1 > >>> 1d83: e8 fc ff ff ff call 1d84 <acpi_idle_enter_bm+0x1b4> > >>> 1d84: R_386_PC32 mark_tsc_unstable > >>> 1d88: 8b 45 e8 mov -0x18(%ebp),%eax > >>> 1d8b: 8b 55 ec mov -0x14(%ebp),%edx > >>> 1d8e: e8 ab fd ff ff call 1b3e <us_to_pm_timer_ticks> > >>> 1d93: 89 c3 mov %eax,%ebx > >>> 1d95: b8 17 01 00 00 mov $0x117,%eax > >>> 1d9a: 69 ca 17 01 00 00 imul $0x117,%edx,%ecx > >>> 1da0: 89 d6 mov %edx,%esi > >>> 1da2: f7 e3 mul %ebx > >>> 1da4: 8d 14 11 lea (%ecx,%edx,1),%edx > >>> 1da7: e8 fc ff ff ff call 1da8 <acpi_idle_enter_bm+0x1d8> > >>> 1da8: R_386_PC32 sched_clock_idle_wakeup_event > >>> 1dac: fb sti > >>> 1dad: 89 e0 mov %esp,%eax > >>> -> 1daf: 31 c9 xor %ecx,%ecx <--------- > >>> 1db1: 25 00 e0 ff ff and $0xffffe000,%eax > >>> 1db6: 89 fa mov %edi,%edx > >>> 1db8: 83 48 0c 04 orl $0x4,0xc(%eax) > >>> 1dbc: ff 47 18 incl 0x18(%edi) > >>> 1dbf: 8b 45 e4 mov -0x1c(%ebp),%eax > >>> 1dc2: e8 a4 f5 ff ff call 136b <acpi_state_timer_broadcast> > >>> 1dc7: 01 5f 1c add %ebx,0x1c(%edi) > >>> 1dca: 11 77 20 adc %esi,0x20(%edi) > >>> 1dcd: 8b 45 e8 mov -0x18(%ebp),%eax > >>> 1dd0: 83 c4 10 add $0x10,%esp > >>> 1dd3: 5b pop %ebx > >>> 1dd4: 5e pop %esi > >>> 1dd5: 5f pop %edi > >>> 1dd6: 5d pop %ebp > >>> 1dd7: c3 ret > >>> > >>>> If you blacklist or rename the processor module to prevent it > >>>> from loading, does that keep the hang from occurring? > >>>> > >>> No. In that case I get the hang in default_idle+0x59/0x95 > >>> > >>> 0000007a <default_idle>: > >>> 7a: 55 push %ebp > >>> 7b: 89 e5 mov %esp,%ebp > >>> 7d: 56 push %esi > >>> 7e: 53 push %ebx > >>> 7f: 83 ec 18 sub $0x18,%esp > >>> 82: 83 3d 18 00 00 00 00 cmpl $0x0,0x18 > >>> 84: R_386_32 .bss > >>> 89: 75 7a jne 105 <default_idle+0x8b> > >>> 8b: 80 3d 05 00 00 00 00 cmpb $0x0,0x5 > >>> 8d: R_386_32 boot_cpu_data > >>> 92: 74 71 je 105 <default_idle+0x8b> > >>> 94: 83 3d 04 00 00 00 00 cmpl $0x0,0x4 > >>> 96: R_386_32 __tracepoint_power_start > >>> 9b: 74 23 je c0 <default_idle+0x46> > >>> 9d: 8b 1d 08 00 00 00 mov 0x8,%ebx > >>> 9f: R_386_32 __tracepoint_power_start > >>> a3: 85 db test %ebx,%ebx > >>> a5: 74 19 je c0 <default_idle+0x46> > >>> a7: 8d 75 e0 lea -0x20(%ebp),%esi > >>> aa: b9 01 00 00 00 mov $0x1,%ecx > >>> af: ba 01 00 00 00 mov $0x1,%edx > >>> b4: 89 f0 mov %esi,%eax > >>> b6: ff 13 call *(%ebx) > >>> b8: 83 c3 04 add $0x4,%ebx > >>> bb: 83 3b 00 cmpl $0x0,(%ebx) > >>> be: 75 ea jne aa <default_idle+0x30> > >>> c0: 89 e0 mov %esp,%eax > >>> c2: 25 00 e0 ff ff and $0xffffe000,%eax > >>> c7: 83 60 0c fb andl $0xfffffffb,0xc(%eax) > >>> cb: f6 40 08 08 testb $0x8,0x8(%eax) > >>> cf: 75 04 jne d5 <default_idle+0x5b> > >>> d1: fb sti > >>> d2: f4 hlt > >>> --> d3: eb 01 jmp d6 <default_idle+0x5c> <-------- > >>> d5: fb sti > >>> d6: 89 e0 mov %esp,%eax > >>> d8: 25 00 e0 ff ff and $0xffffe000,%eax > >>> dd: 83 48 0c 04 orl $0x4,0xc(%eax) > >>> e1: 83 3d 04 00 00 00 00 cmpl $0x0,0x4 > >>> e3: R_386_32 __tracepoint_power_end > >>> e8: 74 1e je 108 <default_idle+0x8e> > >>> > >>>>> 7ec0a7290797f57b780f792d12f4bcc19c83aa4f is first bad commit > >>>>> commit 7ec0a7290797f57b780f792d12f4bcc19c83aa4f > >>>>> Author: Bjorn Helgaas <bjorn.helgaas@hp.com> > >>>>> Date: Mon Mar 30 17:48:24 2009 +0000 > >>>>> > >>>> Ouch, sorry about that. Thanks for doing all the bisection work. > >>>> > >>>>> ACPI: processor: use .notify method instead of installing handler > >>>>> directly > >>>>> > >>>>> This patch adds a .notify() method. The presence of .notify() causes > >>>>> Linux/ACPI to manage event handlers and notify handlers on our behalf, > >>>>> so we don't have to install and remove them ourselves. > >>>>> > >>>>> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> > >>>>> CC: Zhang Rui <rui.zhang@intel.com> > >>>>> CC: Zhao Yakui <yakui.zhao@intel.com> > >>>>> CC: Venki Pallipadi <venkatesh.pallipadi@intel.com> > >>>>> CC: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> > >>>>> Signed-off-by: Len Brown <len.brown@intel.com> > >>>>> > >>>>> However, reverting this commit from v2.6.30-rc1 doesn't solve the hang. > >>>>> > >>>> I don't see the problem in that commit yet, and if there is a problem > >>>> with it, I would think that reverting it from 2.6.30-rc1 would solve > >>>> it. But maybe it'd be useful to revert the whole .notify series to > >>>> make sure. From 2.6.30-rc1, you should be able to revert these: > >>>> > >>>> 7ec0a7290797f57b780f792d12f4bcc19c83aa4f processor > >>>> 373cfc360ec773be2f7615e59a19f3313255db7c button > >>>> 46ec8598fde74ba59703575c22a6fb0b6b151bb6 Linux/ACPI infrastructure > >>>> > >>>> What happens with those commits reverted? > >>>> > >>> I'll find out tomorrow. > >>> > >> The fact that it still hangs even when you don't load the processor > >> driver at all suggests that the 7ec0a729079 commit identified by the > >> bisection is not the real problem. That commit only touches > >> drivers/acpi/processor_core.c. > > > > Yah. > > > >> I think it's more likely some kind of race or missed wakeup. > >> > >> Since it seems to be sensitive to whether the battery is present, > >> I guess you could try blacklisting the battery.ko driver. There > >> have been a few changes to it since 2.6.29-rc8. If things work > >> without battery.ko, we can look through those changes. > > > > Good guess :-). I tried a couple of times either way, and blacklisting > > "battery" definitely avoids the hang. > > Ok, I tried reverting > > 0f66af530116e9f4dd97f328d91718b56a6fc5a4 "ACPI: battery: asynchronous init" > > and that fixed it. I can't help with the real problem of why the asynchronous battery init causes the hang. But I do object to the magic makefile ordering change in that commit. Nobody reading the makefile can tell why battery is down at the end, and moving it apparently slows down boot significantly. So the ordering change just feels like a band-aid that covers up a place where ACPI could be improved. I don't see anything unusual in what the battery init is doing, so it's probably just some ACPI methods that take a long time to execute. Other drivers could easily have similar problems. Bjorn ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [BISECTED] EEE PC hangs when booting off battery 2009-04-14 14:59 ` Bjorn Helgaas @ 2009-04-14 15:17 ` Arjan van de Ven 2009-04-14 15:37 ` Alan Jenkins [not found] ` <20090414081728.10de978a-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> 0 siblings, 2 replies; 32+ messages in thread From: Arjan van de Ven @ 2009-04-14 15:17 UTC (permalink / raw) To: Bjorn Helgaas Cc: Alan Jenkins, linux-acpi, linux-kernel, Kernel Testers List, Venkatesh Pallipadi On Tue, 14 Apr 2009 08:59:01 -0600 Bjorn Helgaas <bjorn.helgaas@hp.com> wrote: > I can't help with the real problem of why the asynchronous battery > init causes the hang. that got fixed already for the module case. > > But I do object to the magic makefile ordering change in that commit. > Nobody reading the makefile can tell why battery is down at the end, > and moving it apparently slows down boot significantly. for all cases I've seen it actually speeds it up, because the battery now runs concurrently with the disk probe. > So the > ordering change just feels like a band-aid that covers up a place > where ACPI could be improved. the reason for the move is that both the battery and other pieces take the big acpi lock; which defeats the parallelism. So the battery needs to happen at the end instead. -- Arjan van de Ven Intel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [BISECTED] EEE PC hangs when booting off battery 2009-04-14 15:17 ` Arjan van de Ven @ 2009-04-14 15:37 ` Alan Jenkins [not found] ` <20090414081728.10de978a-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> 1 sibling, 0 replies; 32+ messages in thread From: Alan Jenkins @ 2009-04-14 15:37 UTC (permalink / raw) To: Arjan van de Ven Cc: Bjorn Helgaas, linux-acpi, linux-kernel, Kernel Testers List, Venkatesh Pallipadi Arjan van de Ven wrote: > On Tue, 14 Apr 2009 08:59:01 -0600 > Bjorn Helgaas <bjorn.helgaas@hp.com> wrote: > > >> I can't help with the real problem of why the asynchronous battery >> init causes the hang. >> > > that got fixed already for the module case. > Are you not thinking of the BUG due to calling __init functions from async? I have a distinct _hang_ which occurs in v2.6.30-rc1-205-g08f4416. Do you have a fix for that I can try? Thanks Alan ^ permalink raw reply [flat|nested] 32+ messages in thread
[parent not found: <20090414081728.10de978a-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>]
* Re: [BISECTED] EEE PC hangs when booting off battery [not found] ` <20090414081728.10de978a-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> @ 2009-04-14 15:48 ` Bjorn Helgaas [not found] ` <200904140948.37633.bjorn.helgaas-VXdhtT5mjnY@public.gmane.org> 0 siblings, 1 reply; 32+ messages in thread From: Bjorn Helgaas @ 2009-04-14 15:48 UTC (permalink / raw) To: Arjan van de Ven Cc: Alan Jenkins, linux-acpi-u79uwXL29TY76Z2rM5mHXA, linux-kernel, Kernel Testers List, Venkatesh Pallipadi On Tuesday 14 April 2009 09:17:28 am Arjan van de Ven wrote: > On Tue, 14 Apr 2009 08:59:01 -0600 > Bjorn Helgaas <bjorn.helgaas-VXdhtT5mjnY@public.gmane.org> wrote: > > > I can't help with the real problem of why the asynchronous battery > > init causes the hang. > > that got fixed already for the module case. But apparently still broken for the builtin case? I think Alan is running pretty new bits -- he said "latest git" on April 11. > > But I do object to the magic makefile ordering change in that commit. > > Nobody reading the makefile can tell why battery is down at the end, > > and moving it apparently slows down boot significantly. > > for all cases I've seen it actually speeds it up, because the battery > now runs concurrently with the disk probe. I understand; I just meant that if somebody moves it back where it was, we'll mysteriously lose the speedup you got. Maybe a comment in the makefile would be a short-term solution. > > So the > > ordering change just feels like a band-aid that covers up a place > > where ACPI could be improved. > > the reason for the move is that both the battery and other pieces take > the big acpi lock; which defeats the parallelism. So the battery needs > to happen at the end instead. Yep. But I don't think it's anything about the Linux battery driver itself that makes it slow. I think it's more likely that some of the ACPI methods it executes happen to be slow. And that could afflict *any* driver, depending on the whim of a BIOS writer. My guess is that if we could run ACPI methods concurrently and avoid that big lock, the ordering wouldn't matter. I know we probably can't do that any time soon, but I think it's good to make the problem visible at least with a "we need this magic order because ACPI doesn't support concurrent method execution" sort of comment. Bjorn ^ permalink raw reply [flat|nested] 32+ messages in thread
[parent not found: <200904140948.37633.bjorn.helgaas-VXdhtT5mjnY@public.gmane.org>]
* RE: [BISECTED] EEE PC hangs when booting off battery [not found] ` <200904140948.37633.bjorn.helgaas-VXdhtT5mjnY@public.gmane.org> @ 2009-04-14 15:55 ` Moore, Robert [not found] ` <4911F71203A09E4D9981D27F9D8308581D3722EB-osO9UTpF0URQxe9IK+vIArfspsVTdybXVpNB7YpNyf8@public.gmane.org> 2009-04-15 14:32 ` [FIXED] " Alan Jenkins 1 sibling, 1 reply; 32+ messages in thread From: Moore, Robert @ 2009-04-14 15:55 UTC (permalink / raw) To: Bjorn Helgaas, Arjan van de Ven Cc: Alan Jenkins, linux-acpi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-kernel, Kernel Testers List, Pallipadi, Venkatesh In fact, ACPI methods can execute concurrently -- constrained by the ACPI specification itself. The "big lock" is released before anything that will block for a significant amount of time, allowing other methods to run. Bob >-----Original Message----- >From: linux-acpi-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-acpi- >owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Bjorn Helgaas >Sent: Tuesday, April 14, 2009 8:49 AM >To: Arjan van de Ven >Cc: Alan Jenkins; linux-acpi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux-kernel; Kernel Testers >List; Pallipadi, Venkatesh >Subject: Re: [BISECTED] EEE PC hangs when booting off battery > >On Tuesday 14 April 2009 09:17:28 am Arjan van de Ven wrote: >> On Tue, 14 Apr 2009 08:59:01 -0600 >> Bjorn Helgaas <bjorn.helgaas-VXdhtT5mjnY@public.gmane.org> wrote: >> >> > I can't help with the real problem of why the asynchronous battery >> > init causes the hang. >> >> that got fixed already for the module case. > >But apparently still broken for the builtin case? I think Alan is >running pretty new bits -- he said "latest git" on April 11. > >> > But I do object to the magic makefile ordering change in that commit. >> > Nobody reading the makefile can tell why battery is down at the end, >> > and moving it apparently slows down boot significantly. >> >> for all cases I've seen it actually speeds it up, because the battery >> now runs concurrently with the disk probe. > >I understand; I just meant that if somebody moves it back where it >was, we'll mysteriously lose the speedup you got. Maybe a comment >in the makefile would be a short-term solution. > >> > So the >> > ordering change just feels like a band-aid that covers up a place >> > where ACPI could be improved. >> >> the reason for the move is that both the battery and other pieces take >> the big acpi lock; which defeats the parallelism. So the battery needs >> to happen at the end instead. > >Yep. But I don't think it's anything about the Linux battery driver >itself that makes it slow. I think it's more likely that some of the >ACPI methods it executes happen to be slow. And that could afflict >*any* driver, depending on the whim of a BIOS writer. > >My guess is that if we could run ACPI methods concurrently and avoid >that big lock, the ordering wouldn't matter. I know we probably can't >do that any time soon, but I think it's good to make the problem >visible at least with a "we need this magic order because ACPI doesn't >support concurrent method execution" sort of comment. > >Bjorn >-- >To unsubscribe from this list: send the line "unsubscribe linux-acpi" in >the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org >More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 32+ messages in thread
[parent not found: <4911F71203A09E4D9981D27F9D8308581D3722EB-osO9UTpF0URQxe9IK+vIArfspsVTdybXVpNB7YpNyf8@public.gmane.org>]
* Re: [BISECTED] EEE PC hangs when booting off battery [not found] ` <4911F71203A09E4D9981D27F9D8308581D3722EB-osO9UTpF0URQxe9IK+vIArfspsVTdybXVpNB7YpNyf8@public.gmane.org> @ 2009-04-14 16:56 ` Bjorn Helgaas 2009-04-14 17:22 ` Moore, Robert [not found] ` <200904141056.14159.bjorn.helgaas-VXdhtT5mjnY@public.gmane.org> 0 siblings, 2 replies; 32+ messages in thread From: Bjorn Helgaas @ 2009-04-14 16:56 UTC (permalink / raw) To: Moore, Robert Cc: Arjan van de Ven, Alan Jenkins, linux-acpi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-kernel, Kernel Testers List, Pallipadi, Venkatesh On Tuesday 14 April 2009 09:55:58 am Moore, Robert wrote: > In fact, ACPI methods can execute concurrently -- constrained by the ACPI specification itself. The "big lock" is released before anything that will block for a significant amount of time, allowing other methods to run. We had a discussion about this a couple years ago: http://www.mail-archive.com/linux-acpi%40vger.kernel.org/msg06976.html Here's my understanding (please correct me if I'm wrong): - The ACPI CA holds a mutex while executing an AML method (see acpi_ex_enter_interpreter()). - When an AML method blocks, the ACPI CA releases the mutex, which allows another AML method to run while the first is blocked. - Because of the mutex, at most one AML method can be executing at any given time. Others may have started, but they are blocked until the current method completes or blocks itself. - The undocumented "acpi_serialize" option makes the ACPI CA hold the mutex for the entire duration of a method, even while a method is blocked. Based on that, my guess about the battery init being slow because of a long-running AML method might be incorrect, although I suppose it's still possible to write AML busy-loops that never block. I'd really like to understand what's making it slow. The only thing that looks at all unusual in acpi_battery_add() is acpi_battery_update(), and that evaluates _STA and _BIF and not much else. Arjan, do you have any more information? Is the battery driver slow on all laptops? If not, where did you see the problem? Do you have a DSDT for them? Bjorn > >-----Original Message----- > >From: linux-acpi-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-acpi- > >owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Bjorn Helgaas > >Sent: Tuesday, April 14, 2009 8:49 AM > >To: Arjan van de Ven > >Cc: Alan Jenkins; linux-acpi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux-kernel; Kernel Testers > >List; Pallipadi, Venkatesh > >Subject: Re: [BISECTED] EEE PC hangs when booting off battery > > > >On Tuesday 14 April 2009 09:17:28 am Arjan van de Ven wrote: > >> On Tue, 14 Apr 2009 08:59:01 -0600 > >> Bjorn Helgaas <bjorn.helgaas-VXdhtT5mjnY@public.gmane.org> wrote: > >> > >> > I can't help with the real problem of why the asynchronous battery > >> > init causes the hang. > >> > >> that got fixed already for the module case. > > > >But apparently still broken for the builtin case? I think Alan is > >running pretty new bits -- he said "latest git" on April 11. > > > >> > But I do object to the magic makefile ordering change in that commit. > >> > Nobody reading the makefile can tell why battery is down at the end, > >> > and moving it apparently slows down boot significantly. > >> > >> for all cases I've seen it actually speeds it up, because the battery > >> now runs concurrently with the disk probe. > > > >I understand; I just meant that if somebody moves it back where it > >was, we'll mysteriously lose the speedup you got. Maybe a comment > >in the makefile would be a short-term solution. > > > >> > So the > >> > ordering change just feels like a band-aid that covers up a place > >> > where ACPI could be improved. > >> > >> the reason for the move is that both the battery and other pieces take > >> the big acpi lock; which defeats the parallelism. So the battery needs > >> to happen at the end instead. > > > >Yep. But I don't think it's anything about the Linux battery driver > >itself that makes it slow. I think it's more likely that some of the > >ACPI methods it executes happen to be slow. And that could afflict > >*any* driver, depending on the whim of a BIOS writer. > > > >My guess is that if we could run ACPI methods concurrently and avoid > >that big lock, the ordering wouldn't matter. I know we probably can't > >do that any time soon, but I think it's good to make the problem > >visible at least with a "we need this magic order because ACPI doesn't > >support concurrent method execution" sort of comment. > > > >Bjorn ^ permalink raw reply [flat|nested] 32+ messages in thread
* RE: [BISECTED] EEE PC hangs when booting off battery 2009-04-14 16:56 ` Bjorn Helgaas @ 2009-04-14 17:22 ` Moore, Robert [not found] ` <4911F71203A09E4D9981D27F9D8308581D372489-osO9UTpF0URQxe9IK+vIArfspsVTdybXVpNB7YpNyf8@public.gmane.org> [not found] ` <200904141056.14159.bjorn.helgaas-VXdhtT5mjnY@public.gmane.org> 1 sibling, 1 reply; 32+ messages in thread From: Moore, Robert @ 2009-04-14 17:22 UTC (permalink / raw) To: Bjorn Helgaas Cc: Arjan van de Ven, Alan Jenkins, linux-acpi@vger.kernel.org, linux-kernel, Kernel Testers List, Pallipadi, Venkatesh Your understanding is correct. The only interesting thing that happens when executing the battery methods like _STA, _BIF, and _BST is that the EC is usually involved - and this can be slow. Note that the AML interpreter is unlocked before accessing an EC operation region. >-----Original Message----- >From: Bjorn Helgaas [mailto:bjorn.helgaas@hp.com] >Sent: Tuesday, April 14, 2009 9:56 AM >To: Moore, Robert >Cc: Arjan van de Ven; Alan Jenkins; linux-acpi@vger.kernel.org; linux- >kernel; Kernel Testers List; Pallipadi, Venkatesh >Subject: Re: [BISECTED] EEE PC hangs when booting off battery > >On Tuesday 14 April 2009 09:55:58 am Moore, Robert wrote: >> In fact, ACPI methods can execute concurrently -- constrained by the ACPI >specification itself. The "big lock" is released before anything that will >block for a significant amount of time, allowing other methods to run. > >We had a discussion about this a couple years ago: > http://www.mail-archive.com/linux-acpi%40vger.kernel.org/msg06976.html > >Here's my understanding (please correct me if I'm wrong): > >- The ACPI CA holds a mutex while executing an AML method (see > acpi_ex_enter_interpreter()). > >- When an AML method blocks, the ACPI CA releases the mutex, which > allows another AML method to run while the first is blocked. > >- Because of the mutex, at most one AML method can be executing at > any given time. Others may have started, but they are blocked > until the current method completes or blocks itself. > >- The undocumented "acpi_serialize" option makes the ACPI CA hold > the mutex for the entire duration of a method, even while a method > is blocked. > >Based on that, my guess about the battery init being slow because of >a long-running AML method might be incorrect, although I suppose >it's still possible to write AML busy-loops that never block. > >I'd really like to understand what's making it slow. The only >thing that looks at all unusual in acpi_battery_add() is >acpi_battery_update(), and that evaluates _STA and _BIF and >not much else. > >Arjan, do you have any more information? Is the battery driver >slow on all laptops? If not, where did you see the problem? Do >you have a DSDT for them? > >Bjorn > >> >-----Original Message----- >> >From: linux-acpi-owner@vger.kernel.org [mailto:linux-acpi- >> >owner@vger.kernel.org] On Behalf Of Bjorn Helgaas >> >Sent: Tuesday, April 14, 2009 8:49 AM >> >To: Arjan van de Ven >> >Cc: Alan Jenkins; linux-acpi@vger.kernel.org; linux-kernel; Kernel >Testers >> >List; Pallipadi, Venkatesh >> >Subject: Re: [BISECTED] EEE PC hangs when booting off battery >> > >> >On Tuesday 14 April 2009 09:17:28 am Arjan van de Ven wrote: >> >> On Tue, 14 Apr 2009 08:59:01 -0600 >> >> Bjorn Helgaas <bjorn.helgaas@hp.com> wrote: >> >> >> >> > I can't help with the real problem of why the asynchronous battery >> >> > init causes the hang. >> >> >> >> that got fixed already for the module case. >> > >> >But apparently still broken for the builtin case? I think Alan is >> >running pretty new bits -- he said "latest git" on April 11. >> > >> >> > But I do object to the magic makefile ordering change in that >commit. >> >> > Nobody reading the makefile can tell why battery is down at the end, >> >> > and moving it apparently slows down boot significantly. >> >> >> >> for all cases I've seen it actually speeds it up, because the battery >> >> now runs concurrently with the disk probe. >> > >> >I understand; I just meant that if somebody moves it back where it >> >was, we'll mysteriously lose the speedup you got. Maybe a comment >> >in the makefile would be a short-term solution. >> > >> >> > So the >> >> > ordering change just feels like a band-aid that covers up a place >> >> > where ACPI could be improved. >> >> >> >> the reason for the move is that both the battery and other pieces take >> >> the big acpi lock; which defeats the parallelism. So the battery needs >> >> to happen at the end instead. >> > >> >Yep. But I don't think it's anything about the Linux battery driver >> >itself that makes it slow. I think it's more likely that some of the >> >ACPI methods it executes happen to be slow. And that could afflict >> >*any* driver, depending on the whim of a BIOS writer. >> > >> >My guess is that if we could run ACPI methods concurrently and avoid >> >that big lock, the ordering wouldn't matter. I know we probably can't >> >do that any time soon, but I think it's good to make the problem >> >visible at least with a "we need this magic order because ACPI doesn't >> >support concurrent method execution" sort of comment. >> > >> >Bjorn ^ permalink raw reply [flat|nested] 32+ messages in thread
[parent not found: <4911F71203A09E4D9981D27F9D8308581D372489-osO9UTpF0URQxe9IK+vIArfspsVTdybXVpNB7YpNyf8@public.gmane.org>]
* Re: [BISECTED] EEE PC hangs when booting off battery [not found] ` <4911F71203A09E4D9981D27F9D8308581D372489-osO9UTpF0URQxe9IK+vIArfspsVTdybXVpNB7YpNyf8@public.gmane.org> @ 2009-04-14 17:28 ` Alan Jenkins 0 siblings, 0 replies; 32+ messages in thread From: Alan Jenkins @ 2009-04-14 17:28 UTC (permalink / raw) To: Moore, Robert Cc: Bjorn Helgaas, Arjan van de Ven, linux-acpi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-kernel, Kernel Testers List, Pallipadi, Venkatesh So I guess the problem is that the other ACPI modules execute methods which also need to access the EC. And it's the EC that causes them to be serialized against the battery module. Moore, Robert wrote: > Your understanding is correct. > > The only interesting thing that happens when executing the battery methods like _STA, _BIF, and _BST is that the EC is usually involved - and this can be slow. Note that the AML interpreter is unlocked before accessing an EC operation region. > > >> -----Original Message----- >> From: Bjorn Helgaas [mailto:bjorn.helgaas-VXdhtT5mjnY@public.gmane.org] >> Sent: Tuesday, April 14, 2009 9:56 AM >> To: Moore, Robert >> Cc: Arjan van de Ven; Alan Jenkins; linux-acpi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux- >> kernel; Kernel Testers List; Pallipadi, Venkatesh >> Subject: Re: [BISECTED] EEE PC hangs when booting off battery >> >> On Tuesday 14 April 2009 09:55:58 am Moore, Robert wrote: >> >>> In fact, ACPI methods can execute concurrently -- constrained by the ACPI >>> >> specification itself. The "big lock" is released before anything that will >> block for a significant amount of time, allowing other methods to run. >> >> We had a discussion about this a couple years ago: >> http://www.mail-archive.com/linux-acpi%40vger.kernel.org/msg06976.html >> >> Here's my understanding (please correct me if I'm wrong): >> >> - The ACPI CA holds a mutex while executing an AML method (see >> acpi_ex_enter_interpreter()). >> >> - When an AML method blocks, the ACPI CA releases the mutex, which >> allows another AML method to run while the first is blocked. >> >> - Because of the mutex, at most one AML method can be executing at >> any given time. Others may have started, but they are blocked >> until the current method completes or blocks itself. >> >> - The undocumented "acpi_serialize" option makes the ACPI CA hold >> the mutex for the entire duration of a method, even while a method >> is blocked. >> >> Based on that, my guess about the battery init being slow because of >> a long-running AML method might be incorrect, although I suppose >> it's still possible to write AML busy-loops that never block. >> >> I'd really like to understand what's making it slow. The only >> thing that looks at all unusual in acpi_battery_add() is >> acpi_battery_update(), and that evaluates _STA and _BIF and >> not much else. >> >> Arjan, do you have any more information? Is the battery driver >> slow on all laptops? If not, where did you see the problem? Do >> you have a DSDT for them? >> >> Bjorn >> >> >>>> -----Original Message----- >>>> From: linux-acpi-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-acpi- >>>> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Bjorn Helgaas >>>> Sent: Tuesday, April 14, 2009 8:49 AM >>>> To: Arjan van de Ven >>>> Cc: Alan Jenkins; linux-acpi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux-kernel; Kernel >>>> >> Testers >> >>>> List; Pallipadi, Venkatesh >>>> Subject: Re: [BISECTED] EEE PC hangs when booting off battery >>>> >>>> On Tuesday 14 April 2009 09:17:28 am Arjan van de Ven wrote: >>>> >>>>> On Tue, 14 Apr 2009 08:59:01 -0600 >>>>> Bjorn Helgaas <bjorn.helgaas-VXdhtT5mjnY@public.gmane.org> wrote: >>>>> >>>>> >>>>>> I can't help with the real problem of why the asynchronous battery >>>>>> init causes the hang. >>>>>> >>>>> that got fixed already for the module case. >>>>> >>>> But apparently still broken for the builtin case? I think Alan is >>>> running pretty new bits -- he said "latest git" on April 11. >>>> >>>> >>>>>> But I do object to the magic makefile ordering change in that >>>>>> >> commit. >> >>>>>> Nobody reading the makefile can tell why battery is down at the end, >>>>>> and moving it apparently slows down boot significantly. >>>>>> >>>>> for all cases I've seen it actually speeds it up, because the battery >>>>> now runs concurrently with the disk probe. >>>>> >>>> I understand; I just meant that if somebody moves it back where it >>>> was, we'll mysteriously lose the speedup you got. Maybe a comment >>>> in the makefile would be a short-term solution. >>>> >>>> >>>>>> So the >>>>>> ordering change just feels like a band-aid that covers up a place >>>>>> where ACPI could be improved. >>>>>> >>>>> the reason for the move is that both the battery and other pieces take >>>>> the big acpi lock; which defeats the parallelism. So the battery needs >>>>> to happen at the end instead. >>>>> >>>> Yep. But I don't think it's anything about the Linux battery driver >>>> itself that makes it slow. I think it's more likely that some of the >>>> ACPI methods it executes happen to be slow. And that could afflict >>>> *any* driver, depending on the whim of a BIOS writer. >>>> >>>> My guess is that if we could run ACPI methods concurrently and avoid >>>> that big lock, the ordering wouldn't matter. I know we probably can't >>>> do that any time soon, but I think it's good to make the problem >>>> visible at least with a "we need this magic order because ACPI doesn't >>>> support concurrent method execution" sort of comment. >>>> >>>> Bjorn >>>> ^ permalink raw reply [flat|nested] 32+ messages in thread
[parent not found: <200904141056.14159.bjorn.helgaas-VXdhtT5mjnY@public.gmane.org>]
* Re: [BISECTED] EEE PC hangs when booting off battery [not found] ` <200904141056.14159.bjorn.helgaas-VXdhtT5mjnY@public.gmane.org> @ 2009-04-15 15:38 ` Len Brown 0 siblings, 0 replies; 32+ messages in thread From: Len Brown @ 2009-04-15 15:38 UTC (permalink / raw) To: Bjorn Helgaas Cc: Moore, Robert, Arjan van de Ven, Alan Jenkins, linux-acpi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-kernel, Kernel Testers List, Pallipadi, Venkatesh > > In fact, ACPI methods can execute concurrently -- constrained by the ACPI specification itself. The "big lock" is released before anything that will block for a significant amount of time, allowing other methods to run. > > We had a discussion about this a couple years ago: > http://www.mail-archive.com/linux-acpi%40vger.kernel.org/msg06976.html > > Here's my understanding (please correct me if I'm wrong): > > - The ACPI CA holds a mutex while executing an AML method (see > acpi_ex_enter_interpreter()). > > - When an AML method blocks, the ACPI CA releases the mutex, which > allows another AML method to run while the first is blocked. > > - Because of the mutex, at most one AML method can be executing at > any given time. Others may have started, but they are blocked > until the current method completes or blocks itself. > > - The undocumented "acpi_serialize" option makes the ACPI CA hold > the mutex for the entire duration of a method, even while a method > is blocked. > > Based on that, my guess about the battery init being slow because of > a long-running AML method might be incorrect, although I suppose > it's still possible to write AML busy-loops that never block. > > I'd really like to understand what's making it slow. The only > thing that looks at all unusual in acpi_battery_add() is > acpi_battery_update(), and that evaluates _STA and _BIF and > not much else. > > Arjan, do you have any more information? Is the battery driver > slow on all laptops? If not, where did you see the problem? Do > you have a DSDT for them? Before I added Arjan's async battery patch, I booted a couple of laptops with async disabled and enabled. I saw about 40 usec battery init (T61, nx6325). Obviously, this was too fast for async battery to have any measurable benefit on these machines (though async disk probing, by comparison, was a big win). But Arjan saw O(300) usec serial battery init on an EEE PC, which is why he optimized it. Indeed, this thread was started b/c a failure was seen on an EEEPC, probably because of its unusually slow timing. -Len ^ permalink raw reply [flat|nested] 32+ messages in thread
* [FIXED] EEE PC hangs when booting off battery [not found] ` <200904140948.37633.bjorn.helgaas-VXdhtT5mjnY@public.gmane.org> 2009-04-14 15:55 ` Moore, Robert @ 2009-04-15 14:32 ` Alan Jenkins 2009-04-22 12:17 ` Alan Jenkins 1 sibling, 1 reply; 32+ messages in thread From: Alan Jenkins @ 2009-04-15 14:32 UTC (permalink / raw) To: Bjorn Helgaas Cc: Arjan van de Ven, linux-acpi-u79uwXL29TY76Z2rM5mHXA, linux-kernel, Kernel Testers List, Venkatesh Pallipadi Bjorn Helgaas wrote: > On Tuesday 14 April 2009 09:17:28 am Arjan van de Ven wrote: > >> On Tue, 14 Apr 2009 08:59:01 -0600 >> Bjorn Helgaas <bjorn.helgaas-VXdhtT5mjnY@public.gmane.org> wrote: >> >> >>> I can't help with the real problem of why the asynchronous battery >>> init causes the hang. >>> >> that got fixed already for the module case. >> > > But apparently still broken for the builtin case? I think Alan is > running pretty new bits -- he said "latest git" on April 11. > It's now fixed, in 2.6.30-rc2. My battery is modular btw. I suspect 5d38258ec026921a7b266f4047ebeaa75db358e5 "ACPI battery: fix async boot oops" [removal of __init] was not sufficient to fix my problem, but it was solved by the "real" fix, d6de2c80e9d758d2e36c21699117db6178c0f517 "async: Fix module loading async-work regression" [module loading waits on async work] I would argue there's still a question over why the asynchronous battery init (_with_ the oops fix) should cause a hang in the idle routine. But since the regression is fixed, it's not exactly an urgent question. Thanks Alan ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [FIXED] EEE PC hangs when booting off battery 2009-04-15 14:32 ` [FIXED] " Alan Jenkins @ 2009-04-22 12:17 ` Alan Jenkins 2009-04-26 11:34 ` Alan Jenkins 0 siblings, 1 reply; 32+ messages in thread From: Alan Jenkins @ 2009-04-22 12:17 UTC (permalink / raw) To: Arjan van de Ven Cc: Bjorn Helgaas, linux-acpi, linux-kernel, Kernel Testers List, Venkatesh Pallipadi Alan Jenkins wrote: > Bjorn Helgaas wrote: > >> On Tuesday 14 April 2009 09:17:28 am Arjan van de Ven wrote: >> >> >>> On Tue, 14 Apr 2009 08:59:01 -0600 >>> Bjorn Helgaas <bjorn.helgaas@hp.com> wrote: >>> >>> >>> >>>> I can't help with the real problem of why the asynchronous battery >>>> init causes the hang. >>>> >>>> >>> that got fixed already for the module case. >>> >>> >> But apparently still broken for the builtin case? I think Alan is >> running pretty new bits -- he said "latest git" on April 11. >> >> > > It's now fixed, in 2.6.30-rc2. My battery is modular btw. I suspect > > 5d38258ec026921a7b266f4047ebeaa75db358e5 "ACPI battery: fix async boot > oops" [removal of __init] > > was not sufficient to fix my problem, but it was solved by the "real" fix, > > d6de2c80e9d758d2e36c21699117db6178c0f517 "async: Fix module loading > async-work regression" [module loading waits on async work] > > > I would argue there's still a question over why the asynchronous battery > init (_with_ the oops fix) should cause a hang in the idle routine. But > since the regression is fixed, it's not exactly an urgent question. > Ugh. Recently I tried building the battery driver into the kernel, to benefit from the async work. Later, I tried booting from the battery and it hung again. This time, the kernel did not even respond to SysRq. I tried nmi_watchdog=1 and waiting 2 minutes, but the watchdog didn't trigger either. As before, it doesn't happen with acpi=off. I checked that this still happened in todays rc3, and it doesn't happen if I revert 0f66af530116e9f4dd97f328d91718b56a6fc5a4 "ACPI: battery: asynchronous init" Regards Alan ^ permalink raw reply [flat|nested] 32+ messages in thread
* EEE PC hangs when booting off battery 2009-04-22 12:17 ` Alan Jenkins @ 2009-04-26 11:34 ` Alan Jenkins [not found] ` <49F446AE.6070607-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org> 0 siblings, 1 reply; 32+ messages in thread From: Alan Jenkins @ 2009-04-26 11:34 UTC (permalink / raw) To: linux-wireless@vger.kernel.org Cc: Arjan van de Ven, linux acpi, linux-kernel, Kernel Testers List, Venkatesh Pallipadi, Bjorn Helgaas Alan Jenkins wrote: > Alan Jenkins wrote: > >> Bjorn Helgaas wrote: >> >> >>> On Tuesday 14 April 2009 09:17:28 am Arjan van de Ven wrote: >>> >>> >>> >>>> On Tue, 14 Apr 2009 08:59:01 -0600 >>>> Bjorn Helgaas <bjorn.helgaas@hp.com> wrote: >>>> >>>> >>>> >>>> >>>>> I can't help with the real problem of why the asynchronous battery >>>>> init causes the hang. >>>>> >>>>> >>>>> >>>> that got fixed already for the module case. >>>> >>>> >>>> >>> But apparently still broken for the builtin case? I think Alan is >>> running pretty new bits -- he said "latest git" on April 11. >>> >>> >>> >> It's now fixed, in 2.6.30-rc2. My battery is modular btw. I suspect >> >> 5d38258ec026921a7b266f4047ebeaa75db358e5 "ACPI battery: fix async boot >> oops" [removal of __init] >> >> was not sufficient to fix my problem, but it was solved by the "real" fix, >> >> d6de2c80e9d758d2e36c21699117db6178c0f517 "async: Fix module loading >> async-work regression" [module loading waits on async work] >> >> >> I would argue there's still a question over why the asynchronous battery >> init (_with_ the oops fix) should cause a hang in the idle routine. But >> since the regression is fixed, it's not exactly an urgent question. >> >> > > Ugh. Recently I tried building the battery driver into the kernel, to > benefit from the async work. Later, I tried booting from the battery > and it hung again. > > This time, the kernel did not even respond to SysRq. I tried > nmi_watchdog=1 and waiting 2 minutes, but the watchdog didn't trigger > either. As before, it doesn't happen with acpi=off. > > I checked that this still happened in todays rc3, and it doesn't happen > if I revert > > 0f66af530116e9f4dd97f328d91718b56a6fc5a4 "ACPI: battery: asynchronous init" > > It looks like my hang is caused by linkwatch_event() deadlocking on rtnl_lock(). I can't see any direct connection to asynchronous battery init, so perhaps that is just revealing a bug by changing the timing. It appears I wasn't patient enough for hung task detection. If I leave it long enough, I see: <scrolled off top of screen> ? kobject_uevent_env ? kobject_uevent_env __mutex_lock_slowpath mutex_lock rtnl_lock linkwatch_event worker_thread ? linkwatch_event ? autoremove_wake_function ? worker_thread kthread kernel_thread_helper INFO: task modprobe:485 blocked for more than 120 seconds Call trace: ? __atomic_notifier_call_chain schedule schedule_timeout ? notify_update ? do_con_write ? __wake_up wait_for_common ? default_wake_function wait_for_completion flush_cpu_workqueue ? wq_barrier_func flush_workqueue flush_scheduled_work tty_ldisc_release ? tty_fasyc tty_release_dev ? free_pgtables tty_release __fput filp_close sys_close syscall_call ? __send_remote_softirq ? usecs_to_jiffies I then seem to get another repetition of the second calltrace, followed by a new one INFO: task swapper:1 blocked for more than 120 seconds Call trace: schedule schedule_timeout ? __wake_up_common ? wake_up wait_for_common wait_for_completion call_usermodehelper_exec __request_module crypto_larval_lookup ? extract_entropy crypto_alg_mod_lookup crypto_alloc_base ieee80211_wep_init ieee80211_register_hw ? ath5k_hw_set_bss ath5k+pci_probe local_pci_probe pci_device_probe driver_probe_device __driver_attach bus_for_each_dev driver_attach ? __driver_attach buad_add_driver driver_register ? ktime_get_ts __pci_register_driver init_ath5k_pci _stext ? init_ath5k_pci ? proc_create_data ? register_ieq_proc kernel_init ? kernel_init kernel_thread_helper The hang happens at this point: [ 0.967588] scsi 1:0:0:0: Direct-Access ATA SILICONMOTION SM n/a PQ: 0 ANSI: 5 [ 0.968049] calling 4_sd_probe_async+0x0/0x225 @ 323 [ 0.968313] initcall 3_async_port_probe+0x0/0x95 returned 0 after 343051 usecs <mark> (see below). [ 0.968786] sd 1:0:0:0: [sda] 7815024 512-byte hardware sectors: (4.00 GB/3.72 GiB) [ 0.968964] sd 1:0:0:0: [sda] Write Protect is off [ 0.969062] sd 1:0:0:0: [sda] Mode Sense: 00 3a 00 00 [ 0.969132] sd 1:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA [ 0.969543] sda: sda1 sda2 [ 0.970965] sd 1:0:0:0: [sda] Attached SCSI disk [ 0.971073] initcall 4_sd_probe_async+0x0/0x225 returned 0 after 2849 usecs On a successful boot, the next lines are [ 0.971188] async_continuing @ 1 after 2483 usec [ 0.971305] Freeing unused kernel memory: 256k freed [ 1.071724] calling ata_generic_init+0x0/0x19 [ata_generic] @ 574 [ 1.073798] initcall ata_generic_init+0x0/0x19 [ata_generic] returned 0 after 144 usecs [ 1.183372] Clocksource tsc unstable (delta = -128600689 ns) [ 2.035932] EXT4-fs: delayed allocation enabled Also, on a successful boot, I see these additional lines at the point <mark> above. [ 0.968461] async_continuing @ 1 after 76663 usec [ 0.968556] async_waiting @ 1 In fact, when the hang happens I can see no "async_waiting @ 1" on my 50-line screen. Which makes sense if the kernel init process is hung in init_athk_pci(). Thanks Alan ^ permalink raw reply [flat|nested] 32+ messages in thread
[parent not found: <49F446AE.6070607-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org>]
* [PATCH] [RFC] EEE PC hangs when booting off battery [not found] ` <49F446AE.6070607-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org> @ 2009-04-28 9:19 ` Alan Jenkins [not found] ` <49F6CA0E.5040101-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org> 0 siblings, 1 reply; 32+ messages in thread From: Alan Jenkins @ 2009-04-28 9:19 UTC (permalink / raw) To: linux-wireless-u79uwXL29TY76Z2rM5mHXA@public.gmane.org Cc: Arjan van de Ven, linux-kernel, Kernel Testers List I found a regression where my EEE hangs at boot time, if the battery is present. I'm confident it's a regression because it disappears if I revert Arjan's asynchronous battery initialisation. However, the evidence points to a deadlock in the wireless stack which has simply been uncovered by timing changes. If I leave the system long enough, I get a series of hung task warnings. They suggest the following deadlock: - ieee80211_wep_init(), which is called with rtnl_lock() held, is blocked in request_module() [waiting for modprobe to load a crypto module]. - modprobe is blocked in a call to flush_workqueue(), caused by closing a TTY. - worker_thread is blocked because the workqueue item linkwatch_event() is blocked on rtnl_lock. I've hacked up a test patch to move wep_init() outside of rtnl_lock, and it solved the problem. My one caveat is that it would probably be cleaner to move it after rtnl_unlock(), instead of before rtnl_lock(). I just wasn't 100% sure if that would be safe. Here's the patch: ---8<--- diff --git a/net/mac80211/main.c b/net/mac80211/main.c index fbcbed6..fffa7f9 100644 --- a/net/mac80211/main.c +++ b/net/mac80211/main.c @@ -909,6 +909,13 @@ int ieee80211_register_hw(struct ieee80211_hw *hw) if (result < 0) goto fail_sta_info; + result = ieee80211_wep_init(local); + if (result < 0) { + printk(KERN_DEBUG "%s: Failed to initialize wep: %d\n", + wiphy_name(local->hw.wiphy), result); + goto fail_wep; + } + rtnl_lock(); result = dev_alloc_name(local->mdev, local->mdev->name); if (result < 0) @@ -930,14 +937,6 @@ int ieee80211_register_hw(struct ieee80211_hw *hw) goto fail_rate; } - result = ieee80211_wep_init(local); - - if (result < 0) { - printk(KERN_DEBUG "%s: Failed to initialize wep: %d\n", - wiphy_name(local->hw.wiphy), result); - goto fail_wep; - } ^ permalink raw reply related [flat|nested] 32+ messages in thread
[parent not found: <49F6CA0E.5040101-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org>]
* Re: [PATCH] [RFC] EEE PC hangs when booting off battery [not found] ` <49F6CA0E.5040101-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org> @ 2009-04-28 9:58 ` Johannes Berg [not found] ` <1240912688.28835.10.camel-YfaajirXv2244ywRPIzf9A@public.gmane.org> 0 siblings, 1 reply; 32+ messages in thread From: Johannes Berg @ 2009-04-28 9:58 UTC (permalink / raw) To: Alan Jenkins Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Arjan van de Ven, linux-kernel, Kernel Testers List [-- Attachment #1: Type: text/plain, Size: 1400 bytes --] It seems the only reason lockdep doesn't warn is the detour to userspace (modprobe) and the waiting for a userspace task (waiting isn't lockdep annotated and I don't think it can be) It seems you cannot load a module while under _any_ lock, ever, since userspace might turn around and do something that requires that lock? I think we should probably add a warning to the code that waits for the userspace task: debug_check_no_locks_held(current); Or not? Some purely internal locks _might_ be ok... but how could you verify that? > - ieee80211_wep_init(), which is called with rtnl_lock() held, is > blocked in request_module() [waiting for modprobe to load a crypto module]. Right. > - modprobe is blocked in a call to flush_workqueue(), caused by closing > a TTY. Can you point out where? I can't find that. > - worker_thread is blocked because the workqueue item linkwatch_event() > is blocked on rtnl_lock. This I know about. > I've hacked up a test patch to move wep_init() outside of rtnl_lock, and > it solved the problem. My one caveat is that it would probably be > cleaner to move it after rtnl_unlock(), instead of before rtnl_lock(). > I just wasn't 100% sure if that would be safe. Here's the patch: That doesn't seem relevant, this just does some initialisation. However, you definitely missed adding a call to wep_free(). johannes [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 32+ messages in thread
[parent not found: <1240912688.28835.10.camel-YfaajirXv2244ywRPIzf9A@public.gmane.org>]
* Re: [PATCH] [RFC] EEE PC hangs when booting off battery [not found] ` <1240912688.28835.10.camel-YfaajirXv2244ywRPIzf9A@public.gmane.org> @ 2009-04-28 10:27 ` Alan Jenkins [not found] ` <49F6DA14.7030608-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org> 0 siblings, 1 reply; 32+ messages in thread From: Alan Jenkins @ 2009-04-28 10:27 UTC (permalink / raw) To: Johannes Berg Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Arjan van de Ven, linux-kernel, Kernel Testers List Johannes Berg wrote: > It seems the only reason lockdep doesn't warn is the detour to userspace > (modprobe) and the waiting for a userspace task (waiting isn't lockdep > annotated and I don't think it can be) > > It seems you cannot load a module while under _any_ lock, ever, since > userspace might turn around and do something that requires that lock? I > think we should probably add a warning to the code that waits for the > userspace task: > > debug_check_no_locks_held(current); > > Or not? Some purely internal locks _might_ be ok... but how could you > verify that? > > >> - ieee80211_wep_init(), which is called with rtnl_lock() held, is >> blocked in request_module() [waiting for modprobe to load a crypto module]. >> > > Right. > > >> - modprobe is blocked in a call to flush_workqueue(), caused by closing >> a TTY. >> > > Can you point out where? I can't find that. > I posted the calltraces earlier at <http://marc.info/?l=linux-wireless&m=124074566523506&w=2>. wait_for_completion flush_cpu_workqueue ? wq_barrier_func flush_workqueue flush_scheduled_work tty_ldisc_release ? tty_fasyc tty_release_dev ? free_pgtables tty_release __fput filp_close sys_close syscall_call >> - worker_thread is blocked because the workqueue item linkwatch_event() >> is blocked on rtnl_lock. >> > > This I know about. > > >> I've hacked up a test patch to move wep_init() outside of rtnl_lock, and >> it solved the problem. My one caveat is that it would probably be >> cleaner to move it after rtnl_unlock(), instead of before rtnl_lock(). >> I just wasn't 100% sure if that would be safe. Here's the patch: >> > > That doesn't seem relevant, this just does some initialisation. However, > you definitely missed adding a call to wep_free(). > Hah, I should have realized something was wrong when I noticed I was removing more lines that I added. The crypto init does cause the module load: wait_for_completion call_usermodehelper_exec __request_module crypto_larval_lookup ? extract_entropy crypto_alg_mod_lookup crypto_alloc_base ieee80211_wep_init ieee80211_register_hw ? ath5k_hw_set_bss ath5k+pci_probe local_pci_probe pci_device_probe driver_probe_device __driver_attach bus_for_each_dev driver_attach ? __driver_attach buad_add_driver driver_register ? ktime_get_ts __pci_register_driver init_ath5k_pci _stext ? init_ath5k_pci ? proc_create_data ? register_ieq_proc kernel_init Thanks Alan ^ permalink raw reply [flat|nested] 32+ messages in thread
[parent not found: <49F6DA14.7030608-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org>]
* Re: [PATCH] [RFC] EEE PC hangs when booting off battery [not found] ` <49F6DA14.7030608-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org> @ 2009-04-28 10:35 ` Johannes Berg 2009-04-29 11:14 ` Alan Jenkins 1 sibling, 0 replies; 32+ messages in thread From: Johannes Berg @ 2009-04-28 10:35 UTC (permalink / raw) To: Alan Jenkins Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Arjan van de Ven, linux-kernel, Kernel Testers List [-- Attachment #1: Type: text/plain, Size: 411 bytes --] On Tue, 2009-04-28 at 11:27 +0100, Alan Jenkins wrote: > > Can you point out where? I can't find that. > > I posted the calltraces earlier at > <http://marc.info/?l=linux-wireless&m=124074566523506&w=2>. > > wait_for_completion > flush_cpu_workqueue > ? wq_barrier_func > flush_workqueue > flush_scheduled_work > tty_ldisc_release Ah, I was just grepping for the wrong thing :) johannes [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] [RFC] EEE PC hangs when booting off battery [not found] ` <49F6DA14.7030608-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org> 2009-04-28 10:35 ` Johannes Berg @ 2009-04-29 11:14 ` Alan Jenkins [not found] ` <49F83699.3000307-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org> 1 sibling, 1 reply; 32+ messages in thread From: Alan Jenkins @ 2009-04-29 11:14 UTC (permalink / raw) To: John Linville Cc: Johannes Berg, linux-wireless-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Arjan van de Ven, linux-kernel, Kernel Testers List Alan Jenkins wrote: > Johannes Berg wrote: > >> That doesn't seem relevant, this just does some initialisation. However, >> you definitely missed adding a call to wep_free(). >> >> > > Hah, I should have realized something was wrong when I noticed I was > removing more lines that I added. > > The crypto init does cause the module load: > > wait_for_completion > call_usermodehelper_exec > __request_module > crypto_larval_lookup > ? extract_entropy > crypto_alg_mod_lookup > crypto_alloc_base > ieee80211_wep_init > ieee80211_register_hw > Here's a corrected patch complete with changelog. If there are no other problems with it, can you please apply this for 2.6.30 to keep my EeePC regression-free? Thanks Alan ------> From c5e9dc036247e70956d1a28e8850c3810385dda0 Mon Sep 17 00:00:00 2001 From: Alan Jenkins <alan-jenkins-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org> Date: Wed, 29 Apr 2009 11:41:24 +0100 Subject: [PATCH] mac80211: fix modprobe deadlock by not calling wep_init under rtnl_lock - ieee80211_wep_init(), which is called with rtnl_lock held, blocks in request_module() [waiting for modprobe to load a crypto module]. - modprobe blocks in a call to flush_workqueue(), when it closes a TTY [presumably when it exits]. - The workqueue item linkwatch_event() blocks on rtnl_lock. There's no reason for wep_init() to be called with rtnl_lock held, so just move it outside the critical section. Signed-off-by: Alan Jenkins <alan-jenkins-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org> --- net/mac80211/main.c | 19 +++++++++---------- 1 files changed, 9 insertions(+), 10 deletions(-) diff --git a/net/mac80211/main.c b/net/mac80211/main.c index fbcbed6..00968c2 100644 --- a/net/mac80211/main.c +++ b/net/mac80211/main.c @@ -909,6 +909,13 @@ int ieee80211_register_hw(struct ieee80211_hw *hw) if (result < 0) goto fail_sta_info; + result = ieee80211_wep_init(local); + if (result < 0) { + printk(KERN_DEBUG "%s: Failed to initialize wep: %d\n", + wiphy_name(local->hw.wiphy), result); + goto fail_wep; + } + rtnl_lock(); result = dev_alloc_name(local->mdev, local->mdev->name); if (result < 0) @@ -930,14 +937,6 @@ int ieee80211_register_hw(struct ieee80211_hw *hw) goto fail_rate; } - result = ieee80211_wep_init(local); - - if (result < 0) { - printk(KERN_DEBUG "%s: Failed to initialize wep: %d\n", - wiphy_name(local->hw.wiphy), result); - goto fail_wep; - } - /* add one default STA interface if supported */ if (local->hw.wiphy->interface_modes & BIT(NL80211_IFTYPE_STATION)) { result = ieee80211_if_add(local, "wlan%d", NULL, @@ -967,13 +966,13 @@ int ieee80211_register_hw(struct ieee80211_hw *hw) return 0; -fail_wep: - rate_control_deinitialize(local); fail_rate: unregister_netdevice(local->mdev); local->mdev = NULL; fail_dev: rtnl_unlock(); + ieee80211_wep_free(local); +fail_wep: sta_info_stop(local); fail_sta_info: debugfs_hw_del(local); -- 1.5.4.3 -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 32+ messages in thread
[parent not found: <49F83699.3000307-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org>]
* Re: [PATCH] [RFC] EEE PC hangs when booting off battery [not found] ` <49F83699.3000307-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org> @ 2009-04-29 11:20 ` Johannes Berg 2009-05-15 21:48 ` Rafael J. Wysocki 1 sibling, 0 replies; 32+ messages in thread From: Johannes Berg @ 2009-04-29 11:20 UTC (permalink / raw) To: Alan Jenkins Cc: John Linville, linux-wireless-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Arjan van de Ven, linux-kernel, Kernel Testers List [-- Attachment #1: Type: text/plain, Size: 2590 bytes --] On Wed, 2009-04-29 at 12:14 +0100, Alan Jenkins wrote: > From c5e9dc036247e70956d1a28e8850c3810385dda0 Mon Sep 17 00:00:00 2001 > From: Alan Jenkins <alan-jenkins-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org> > Date: Wed, 29 Apr 2009 11:41:24 +0100 > Subject: [PATCH] mac80211: fix modprobe deadlock by not calling wep_init under rtnl_lock > > - ieee80211_wep_init(), which is called with rtnl_lock held, blocks in > request_module() [waiting for modprobe to load a crypto module]. > > - modprobe blocks in a call to flush_workqueue(), when it closes a TTY > [presumably when it exits]. > > - The workqueue item linkwatch_event() blocks on rtnl_lock. > > There's no reason for wep_init() to be called with rtnl_lock held, so > just move it outside the critical section. Looks correct to me. > Signed-off-by: Alan Jenkins <alan-jenkins-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org> Reviewed-by: Johannes Berg <johannes-cdvu00un1VgdHxzADdlk8Q@public.gmane.org> > --- > net/mac80211/main.c | 19 +++++++++---------- > 1 files changed, 9 insertions(+), 10 deletions(-) > > diff --git a/net/mac80211/main.c b/net/mac80211/main.c > index fbcbed6..00968c2 100644 > --- a/net/mac80211/main.c > +++ b/net/mac80211/main.c > @@ -909,6 +909,13 @@ int ieee80211_register_hw(struct ieee80211_hw *hw) > if (result < 0) > goto fail_sta_info; > > + result = ieee80211_wep_init(local); > + if (result < 0) { > + printk(KERN_DEBUG "%s: Failed to initialize wep: %d\n", > + wiphy_name(local->hw.wiphy), result); > + goto fail_wep; > + } > + > rtnl_lock(); > result = dev_alloc_name(local->mdev, local->mdev->name); > if (result < 0) > @@ -930,14 +937,6 @@ int ieee80211_register_hw(struct ieee80211_hw *hw) > goto fail_rate; > } > > - result = ieee80211_wep_init(local); > - > - if (result < 0) { > - printk(KERN_DEBUG "%s: Failed to initialize wep: %d\n", > - wiphy_name(local->hw.wiphy), result); > - goto fail_wep; > - } > - > /* add one default STA interface if supported */ > if (local->hw.wiphy->interface_modes & BIT(NL80211_IFTYPE_STATION)) { > result = ieee80211_if_add(local, "wlan%d", NULL, > @@ -967,13 +966,13 @@ int ieee80211_register_hw(struct ieee80211_hw *hw) > > return 0; > > -fail_wep: > - rate_control_deinitialize(local); > fail_rate: > unregister_netdevice(local->mdev); > local->mdev = NULL; > fail_dev: > rtnl_unlock(); > + ieee80211_wep_free(local); > +fail_wep: > sta_info_stop(local); > fail_sta_info: > debugfs_hw_del(local); [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] [RFC] EEE PC hangs when booting off battery [not found] ` <49F83699.3000307-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org> 2009-04-29 11:20 ` Johannes Berg @ 2009-05-15 21:48 ` Rafael J. Wysocki [not found] ` <200905152348.56449.rjw-KKrjLPT3xs0@public.gmane.org> 1 sibling, 1 reply; 32+ messages in thread From: Rafael J. Wysocki @ 2009-05-15 21:48 UTC (permalink / raw) To: Alan Jenkins Cc: John Linville, Johannes Berg, linux-wireless-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Arjan van de Ven, linux-kernel, Kernel Testers List On Wednesday 29 April 2009, Alan Jenkins wrote: > Alan Jenkins wrote: > > Johannes Berg wrote: > > > >> That doesn't seem relevant, this just does some initialisation. However, > >> you definitely missed adding a call to wep_free(). > >> > >> > > > > Hah, I should have realized something was wrong when I noticed I was > > removing more lines that I added. > > > > The crypto init does cause the module load: > > > > wait_for_completion > > call_usermodehelper_exec > > __request_module > > crypto_larval_lookup > > ? extract_entropy > > crypto_alg_mod_lookup > > crypto_alloc_base > > ieee80211_wep_init > > ieee80211_register_hw > > > > Here's a corrected patch complete with changelog. If there are no other > problems with it, can you please apply this for 2.6.30 to keep my EeePC > regression-free? > > Thanks > Alan > > ------> > From c5e9dc036247e70956d1a28e8850c3810385dda0 Mon Sep 17 00:00:00 2001 > From: Alan Jenkins <alan-jenkins-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org> > Date: Wed, 29 Apr 2009 11:41:24 +0100 > Subject: [PATCH] mac80211: fix modprobe deadlock by not calling wep_init under rtnl_lock > > - ieee80211_wep_init(), which is called with rtnl_lock held, blocks in > request_module() [waiting for modprobe to load a crypto module]. > > - modprobe blocks in a call to flush_workqueue(), when it closes a TTY > [presumably when it exits]. > > - The workqueue item linkwatch_event() blocks on rtnl_lock. > > There's no reason for wep_init() to be called with rtnl_lock held, so > just move it outside the critical section. Has it been merged already or is it queued up for merging? Rafael > Signed-off-by: Alan Jenkins <alan-jenkins-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org> > --- > net/mac80211/main.c | 19 +++++++++---------- > 1 files changed, 9 insertions(+), 10 deletions(-) > > diff --git a/net/mac80211/main.c b/net/mac80211/main.c > index fbcbed6..00968c2 100644 > --- a/net/mac80211/main.c > +++ b/net/mac80211/main.c > @@ -909,6 +909,13 @@ int ieee80211_register_hw(struct ieee80211_hw *hw) > if (result < 0) > goto fail_sta_info; > > + result = ieee80211_wep_init(local); > + if (result < 0) { > + printk(KERN_DEBUG "%s: Failed to initialize wep: %d\n", > + wiphy_name(local->hw.wiphy), result); > + goto fail_wep; > + } > + > rtnl_lock(); > result = dev_alloc_name(local->mdev, local->mdev->name); > if (result < 0) > @@ -930,14 +937,6 @@ int ieee80211_register_hw(struct ieee80211_hw *hw) > goto fail_rate; > } > > - result = ieee80211_wep_init(local); > - > - if (result < 0) { > - printk(KERN_DEBUG "%s: Failed to initialize wep: %d\n", > - wiphy_name(local->hw.wiphy), result); > - goto fail_wep; > - } > - > /* add one default STA interface if supported */ > if (local->hw.wiphy->interface_modes & BIT(NL80211_IFTYPE_STATION)) { > result = ieee80211_if_add(local, "wlan%d", NULL, > @@ -967,13 +966,13 @@ int ieee80211_register_hw(struct ieee80211_hw *hw) > > return 0; > > -fail_wep: > - rate_control_deinitialize(local); > fail_rate: > unregister_netdevice(local->mdev); > local->mdev = NULL; > fail_dev: > rtnl_unlock(); > + ieee80211_wep_free(local); > +fail_wep: > sta_info_stop(local); > fail_sta_info: > debugfs_hw_del(local); -- Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it? --- Brian Kernighan ^ permalink raw reply [flat|nested] 32+ messages in thread
[parent not found: <200905152348.56449.rjw-KKrjLPT3xs0@public.gmane.org>]
* Re: [PATCH] [RFC] EEE PC hangs when booting off battery [not found] ` <200905152348.56449.rjw-KKrjLPT3xs0@public.gmane.org> @ 2009-05-16 8:39 ` Alan Jenkins 0 siblings, 0 replies; 32+ messages in thread From: Alan Jenkins @ 2009-05-16 8:39 UTC (permalink / raw) To: Rafael J. Wysocki Cc: John Linville, Johannes Berg, linux-wireless-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Arjan van de Ven, linux-kernel, Kernel Testers List Rafael J. Wysocki wrote: > On Wednesday 29 April 2009, Alan Jenkins wrote: > >> Alan Jenkins wrote: >> >>> Johannes Berg wrote: >>> >>> >>>> That doesn't seem relevant, this just does some initialisation. However, >>>> you definitely missed adding a call to wep_free(). >>>> >>>> >>>> >>> Hah, I should have realized something was wrong when I noticed I was >>> removing more lines that I added. >>> >>> The crypto init does cause the module load: >>> >>> wait_for_completion >>> call_usermodehelper_exec >>> __request_module >>> crypto_larval_lookup >>> ? extract_entropy >>> crypto_alg_mod_lookup >>> crypto_alloc_base >>> ieee80211_wep_init >>> ieee80211_register_hw >>> >>> >> Here's a corrected patch complete with changelog. If there are no other >> problems with it, can you please apply this for 2.6.30 to keep my EeePC >> regression-free? >> >> Thanks >> Alan >> >> ------> >> From c5e9dc036247e70956d1a28e8850c3810385dda0 Mon Sep 17 00:00:00 2001 >> From: Alan Jenkins <alan-jenkins-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org> >> Date: Wed, 29 Apr 2009 11:41:24 +0100 >> Subject: [PATCH] mac80211: fix modprobe deadlock by not calling wep_init under rtnl_lock >> >> - ieee80211_wep_init(), which is called with rtnl_lock held, blocks in >> request_module() [waiting for modprobe to load a crypto module]. >> >> - modprobe blocks in a call to flush_workqueue(), when it closes a TTY >> [presumably when it exits]. >> >> - The workqueue item linkwatch_event() blocks on rtnl_lock. >> >> There's no reason for wep_init() to be called with rtnl_lock held, so >> just move it outside the critical section. >> > > Has it been merged already or is it queued up for merging? > > Rafael > It looks like it's been merged as v2.6.30-rc6~2^2~40^2~1. Thanks for checking up Alan ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Regression: EEE PC hangs when booting off battery 2009-04-11 9:41 Regression: EEE PC hangs when booting off battery Alan Jenkins [not found] ` <49E065CF.6040408-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org> @ 2009-04-11 19:40 ` Kristoffer Ericson [not found] ` <20090411214045.7bdd497f.kristoffer.ericson-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> 1 sibling, 1 reply; 32+ messages in thread From: Kristoffer Ericson @ 2009-04-11 19:40 UTC (permalink / raw) To: Alan Jenkins; +Cc: linux-acpi, linux-kernel, Kernel Testers List On Sat, 11 Apr 2009 10:41:35 +0100 Alan Jenkins <alan-jenkins@tuffmail.co.uk> wrote: > Regression #6 on latest git! (last known good is 2.6.29-rc8) > > My EEE PC hangs on bootup if it is not connected to mains power. The > hang happens during udev coldplug. I've left it for >60 seconds with > nmi_watchdog=1, soft lockup and hung task detection are enabled, and I > get no trace. Key presses are not echoed to the screen. Getting this myself, even if power cord attached. Its an tainted kernel though (archlinux 2.6.28) so wasnt sure if vanilla was affected. BUG: unable to handle kernel NULL pointer dereference at 00000000 IP: [<f850d031>] acpi_ac_get_state+0x1b/0x59 [ac] *pde = 00000000 Oops: 0000 [#1] PREEMPT SMP last sysfs file: /sys/devices/platform/i8042/modalias Modules linked in: joydev psmouse snd_pcsp serio_raw i2c_i801 iTCO_wdt iTCO_vendor_support i2c_core uvcvideo compat_ioctl32 videodev v4l1_compat sg option usbserial video output eeepc_laptop rfkill intel_agp agpgart thermal processor fan evdev button battery ac snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_hda_intel snd_hwdep snd_pcm_oss snd_pcm snd_timer snd_page_alloc snd_mixer_oss snd soundcore ath_pci wlan ath_hal(P) arc4 ecb ath5k mac80211 led_class cfg80211 atl2 rtc_cmos rtc_core rtc_lib ext3 jbd mbcache usbhid hid usb_storage sd_mod uhci_hcd ehci_hcd usbcore ata_piix ahci ata_generic pata_acpi libata scsi_mod Pid: 1794, comm: fsck.ext3 Tainted: P (2.6.28-ARCH #1) 701 EIP: 0060:[<f850d031>] EFLAGS: 00010246 CPU: 0 EIP is at acpi_ac_get_state+0x1b/0x59 [ac] EAX: f69559dc EBX: 00000000 ECX: 00000000 EDX: f850d418 ESI: f6955980 EDI: 00000001 EBP: f6834480 ESP: f6853f0c DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 Process fsck.ext3 (pid: 1794, ti=f6852000 task=f737e400 task.ti=f6852000) Stack: f69559dc f6955980 f692e360 f850d2fd 00000001 f692e360 c01ab0e0 00000400 b80cb000 f6834480 f692e380 00000000 00000000 00000001 00000000 00000000 f7327c00 fffffffb c01aafb0 f6834480 c01cead4 f6853fa0 00000400 b80cb000 Call Trace: [<f850d2fd>] acpi_ac_seq_show+0x12/0x5d [ac] [<c01ab0e0>] seq_read+0x130/0x2e0 [<c01aafb0>] seq_read+0x0/0x2e0 [<c01cead4>] proc_reg_read+0x64/0xa0 [<c01cea70>] proc_reg_read+0x0/0xa0 [<c019346d>] vfs_read+0x9d/0x160 [<c0193601>] sys_read+0x41/0x80 [<c0103f13>] sysenter_do_call+0x12/0x33 [<c0320000>] serial_pnp_probe+0x150/0x250 Code: b8 ea ff ff ff 75 07 8b 43 5c 89 01 31 c0 5b c3 56 89 c6 85 f6 b8 ea ff ff ff 53 74 49 8b 5e 58 8d 46 5c 31 c9 50 ba 18 d4 50 f8 <8b> 03 e8 e0 be d2 c7 89 c2 58 31 c0 85 d2 74 2b 68 1d d4 50 f8 EIP: [<f850d031>] acpi_ac_get_state+0x1b/0x59 [ac] SS:ESP 0068:f6853f0c ---[ end trace eb12d69e27c56472 ]--- > > Magic SysRQ keys work though. Unfortunately, SysRq-W / SysRq-T generate > too much output, so the useful bits immediately scroll off the top of > the screen. I can't press SysRq-P (even if that would help), because > the SysRQ key on my laptop is accessed using a Fn modifier key, which > also turns the left hand side of the keyboard into a numeric pad. > > Magic SysRQs on my USB keyboard generate no response. (Sometimes the > numlock key seems to work though. I can't tell on the laptop keyboard > because the laptop doesn't have a numlock led). > > I also tried booting with vga=ask (anticipating nmi_watchdog output), > and selecting the highest mode, 80x60. It turns out that this doesn't > really work, the screen flickers constantly until userspace changes the > console font. However, it avoided the hang in udev coldplug. Instead, > the hang occurred when X started, leaving me with a black screen and a > non-responsive X cursor. > > Booting with acpi=off fixes this. > > Regards > Alan > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- Kristoffer Ericson <kristoffer.ericson@gmail.com> ^ permalink raw reply [flat|nested] 32+ messages in thread
[parent not found: <20090411214045.7bdd497f.kristoffer.ericson-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>]
* Re: Regression: EEE PC hangs when booting off battery [not found] ` <20090411214045.7bdd497f.kristoffer.ericson-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> @ 2009-04-12 9:22 ` Alan Jenkins 2009-04-15 15:49 ` archlinux 2.6.28 ac oops (was Re: Regression: EEE PC hangs when booting off battery) Len Brown 0 siblings, 1 reply; 32+ messages in thread From: Alan Jenkins @ 2009-04-12 9:22 UTC (permalink / raw) To: Kristoffer Ericson Cc: linux-acpi-u79uwXL29TY76Z2rM5mHXA, linux-kernel, Kernel Testers List Kristoffer Ericson wrote: > On Sat, 11 Apr 2009 10:41:35 +0100 > Alan Jenkins <alan-jenkins-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org> wrote: > > >> Regression #6 on latest git! (last known good is 2.6.29-rc8) >> >> My EEE PC hangs on bootup if it is not connected to mains power. The >> hang happens during udev coldplug. I've left it for >60 seconds with >> nmi_watchdog=1, soft lockup and hung task detection are enabled, and I >> get no trace. Key presses are not echoed to the screen. >> > > Getting this myself, even if power cord attached. Its an tainted > kernel though (archlinux 2.6.28) so wasnt sure if > vanilla was affected. > > BUG: unable to handle kernel NULL pointer dereference at 00000000 > IP: [<f850d031>] acpi_ac_get_state+0x1b/0x59 [ac] > *pde = 00000000 > Oops: 0000 [#1] PREEMPT SMP > last sysfs file: /sys/devices/platform/i8042/modalias > Modules linked in: joydev psmouse snd_pcsp serio_raw i2c_i801 iTCO_wdt iTCO_vendor_support i2c_core uvcvideo compat_ioctl32 videodev v4l1_compat sg option usbserial video output eeepc_laptop rfkill intel_agp agpgart thermal processor fan evdev button battery ac snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_hda_intel snd_hwdep snd_pcm_oss snd_pcm snd_timer snd_page_alloc snd_mixer_oss snd soundcore ath_pci wlan ath_hal(P) arc4 ecb ath5k mac80211 led_class cfg80211 atl2 rtc_cmos rtc_core rtc_lib ext3 jbd mbcache usbhid hid usb_storage sd_mod uhci_hcd ehci_hcd usbcore ata_piix ahci ata_generic pata_acpi libata scsi_mod > > Pid: 1794, comm: fsck.ext3 Tainted: P (2.6.28-ARCH #1) 701 > EIP: 0060:[<f850d031>] EFLAGS: 00010246 CPU: 0 > EIP is at acpi_ac_get_state+0x1b/0x59 [ac] > EAX: f69559dc EBX: 00000000 ECX: 00000000 EDX: f850d418 > ESI: f6955980 EDI: 00000001 EBP: f6834480 ESP: f6853f0c > DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 > Process fsck.ext3 (pid: 1794, ti=f6852000 task=f737e400 task.ti=f6852000) > Stack: > f69559dc f6955980 f692e360 f850d2fd 00000001 f692e360 c01ab0e0 00000400 > b80cb000 f6834480 f692e380 00000000 00000000 00000001 00000000 00000000 > f7327c00 fffffffb c01aafb0 f6834480 c01cead4 f6853fa0 00000400 b80cb000 > Call Trace: > [<f850d2fd>] acpi_ac_seq_show+0x12/0x5d [ac] > [<c01ab0e0>] seq_read+0x130/0x2e0 > [<c01aafb0>] seq_read+0x0/0x2e0 > [<c01cead4>] proc_reg_read+0x64/0xa0 > [<c01cea70>] proc_reg_read+0x0/0xa0 > [<c019346d>] vfs_read+0x9d/0x160 > [<c0193601>] sys_read+0x41/0x80 > [<c0103f13>] sysenter_do_call+0x12/0x33 > [<c0320000>] serial_pnp_probe+0x150/0x250 > Code: b8 ea ff ff ff 75 07 8b 43 5c 89 01 31 c0 5b c3 56 89 c6 85 f6 b8 ea ff ff ff 53 74 49 8b 5e 58 8d 46 5c 31 c9 50 ba 18 d4 50 f8 <8b> 03 e8 e0 be d2 c7 89 c2 58 31 c0 85 d2 74 2b 68 1d d4 50 f8 > EIP: [<f850d031>] acpi_ac_get_state+0x1b/0x59 [ac] SS:ESP 0068:f6853f0c > ---[ end trace eb12d69e27c56472 ]--- > I'm pretty sure that's a separate bug :-). As you say, it happens under different circumstances, it's a BUG as opposed to a hang, and it's in a different version (I didn't have my problem in 2.6.28). www.kerneloops.org says no-one else has reported your bug. So maybe you're right, and Arch have applied some patches which nobody else runs with. I don't know how to find Arch source packages, so I can't really tell. But the only _taint_ as such is due to the ath_hal module. Let's see, Arch linux... <http://wiki.archlinux.org/index.php/Udev#Blacklisting_Modules> describes how you can (reversibly) blacklist modules. If you blacklist ath_pci (and ath_hal too, just to make sure), you could boot without any taint and generate a "clean" backtrace. Regards Alan ^ permalink raw reply [flat|nested] 32+ messages in thread
* archlinux 2.6.28 ac oops (was Re: Regression: EEE PC hangs when booting off battery) 2009-04-12 9:22 ` Alan Jenkins @ 2009-04-15 15:49 ` Len Brown 0 siblings, 0 replies; 32+ messages in thread From: Len Brown @ 2009-04-15 15:49 UTC (permalink / raw) To: Alan Jenkins Cc: Kristoffer Ericson, linux-acpi, linux-kernel, Kernel Testers List > Kristoffer Ericson wrote: > > Getting this myself, even if power cord attached. Its an tainted > > kernel though (archlinux 2.6.28) so wasnt sure if > > vanilla was affected. > > > > BUG: unable to handle kernel NULL pointer dereference at 00000000 > > IP: [<f850d031>] acpi_ac_get_state+0x1b/0x59 [ac] > > *pde = 00000000 > > Oops: 0000 [#1] PREEMPT SMP > > last sysfs file: /sys/devices/platform/i8042/modalias > > Modules linked in: joydev psmouse snd_pcsp serio_raw i2c_i801 iTCO_wdt iTCO_vendor_support i2c_core uvcvideo compat_ioctl32 videodev v4l1_compat sg option usbserial video output eeepc_laptop rfkill intel_agp agpgart thermal processor fan evdev button battery ac snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_hda_intel snd_hwdep snd_pcm_oss snd_pcm snd_timer snd_page_alloc snd_mixer_oss snd soundcore ath_pci wlan ath_hal(P) arc4 ecb ath5k mac80211 led_class cfg80211 atl2 rtc_cmos rtc_core rtc_lib ext3 jbd mbcache usbhid hid usb_storage sd_mod uhci_hcd ehci_hcd usbcore ata_piix ahci ata_generic pata_acpi libata scsi_mod > > > > Pid: 1794, comm: fsck.ext3 Tainted: P (2.6.28-ARCH #1) 701 > > EIP: 0060:[<f850d031>] EFLAGS: 00010246 CPU: 0 > > EIP is at acpi_ac_get_state+0x1b/0x59 [ac] > > EAX: f69559dc EBX: 00000000 ECX: 00000000 EDX: f850d418 > > ESI: f6955980 EDI: 00000001 EBP: f6834480 ESP: f6853f0c > > DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 > > Process fsck.ext3 (pid: 1794, ti=f6852000 task=f737e400 task.ti=f6852000) > > Stack: > > f69559dc f6955980 f692e360 f850d2fd 00000001 f692e360 c01ab0e0 00000400 > > b80cb000 f6834480 f692e380 00000000 00000000 00000001 00000000 00000000 > > f7327c00 fffffffb c01aafb0 f6834480 c01cead4 f6853fa0 00000400 b80cb000 > > Call Trace: > > [<f850d2fd>] acpi_ac_seq_show+0x12/0x5d [ac] > > [<c01ab0e0>] seq_read+0x130/0x2e0 > > [<c01aafb0>] seq_read+0x0/0x2e0 > > [<c01cead4>] proc_reg_read+0x64/0xa0 > > [<c01cea70>] proc_reg_read+0x0/0xa0 > > [<c019346d>] vfs_read+0x9d/0x160 > > [<c0193601>] sys_read+0x41/0x80 > > [<c0103f13>] sysenter_do_call+0x12/0x33 > > [<c0320000>] serial_pnp_probe+0x150/0x250 > > Code: b8 ea ff ff ff 75 07 8b 43 5c 89 01 31 c0 5b c3 56 89 c6 85 f6 b8 ea ff ff ff 53 74 49 8b 5e 58 8d 46 5c 31 c9 50 ba 18 d4 50 f8 <8b> 03 e8 e0 be d2 c7 89 c2 58 31 c0 85 d2 74 2b 68 1d d4 50 f8 > > EIP: [<f850d031>] acpi_ac_get_state+0x1b/0x59 [ac] SS:ESP 0068:f6853f0c > > ---[ end trace eb12d69e27c56472 ]--- > > > > I'm pretty sure that's a separate bug :-). As you say, it happens under > different circumstances, it's a BUG as opposed to a hang, and it's in a > different version (I didn't have my problem in 2.6.28). > > www.kerneloops.org says no-one else has reported your bug. So maybe > you're right, and Arch have applied some patches which nobody else runs > with. I don't know how to find Arch source packages, so I can't really > tell. > > But the only _taint_ as such is due to the ath_hal module. Let's see, > Arch linux... > > <http://wiki.archlinux.org/index.php/Udev#Blacklisting_Modules> > describes how you can (reversibly) blacklist modules. If you blacklist > ath_pci (and ath_hal too, just to make sure), you could boot without any > taint and generate a "clean" backtrace. This oops is happening in the ac driver when user-space is reading /proc/acpi/ac_adapter/*/state it would be interesting if... you can reproduce this using a kernel.org kernel if you delay loading of the ac module till after the system is up and see if you can reproduce this at will by reading the file above. thanks, -Len ^ permalink raw reply [flat|nested] 32+ messages in thread
end of thread, other threads:[~2009-05-16 8:39 UTC | newest]
Thread overview: 32+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-04-11 9:41 Regression: EEE PC hangs when booting off battery Alan Jenkins
[not found] ` <49E065CF.6040408-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org>
2009-04-11 18:07 ` Tzy-Jye Daniel Lin
[not found] ` <66dc75180904111107q645fbb77i3cf927ffab7a7ef0-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2009-04-12 9:00 ` Alan Jenkins
[not found] ` <49E1ADAE.2030103-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org>
2009-04-12 13:11 ` [BISECTED] " Alan Jenkins
[not found] ` <49E1E89D.7040502-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org>
2009-04-13 19:15 ` Bjorn Helgaas
[not found] ` <200904131315.55519.bjorn.helgaas-VXdhtT5mjnY@public.gmane.org>
2009-04-13 19:57 ` Alan Jenkins
2009-04-13 22:28 ` Bjorn Helgaas
2009-04-14 8:06 ` Alan Jenkins
2009-04-14 9:26 ` Alan Jenkins
2009-04-14 14:59 ` Bjorn Helgaas
2009-04-14 15:17 ` Arjan van de Ven
2009-04-14 15:37 ` Alan Jenkins
[not found] ` <20090414081728.10de978a-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2009-04-14 15:48 ` Bjorn Helgaas
[not found] ` <200904140948.37633.bjorn.helgaas-VXdhtT5mjnY@public.gmane.org>
2009-04-14 15:55 ` Moore, Robert
[not found] ` <4911F71203A09E4D9981D27F9D8308581D3722EB-osO9UTpF0URQxe9IK+vIArfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2009-04-14 16:56 ` Bjorn Helgaas
2009-04-14 17:22 ` Moore, Robert
[not found] ` <4911F71203A09E4D9981D27F9D8308581D372489-osO9UTpF0URQxe9IK+vIArfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2009-04-14 17:28 ` Alan Jenkins
[not found] ` <200904141056.14159.bjorn.helgaas-VXdhtT5mjnY@public.gmane.org>
2009-04-15 15:38 ` Len Brown
2009-04-15 14:32 ` [FIXED] " Alan Jenkins
2009-04-22 12:17 ` Alan Jenkins
2009-04-26 11:34 ` Alan Jenkins
[not found] ` <49F446AE.6070607-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org>
2009-04-28 9:19 ` [PATCH] [RFC] " Alan Jenkins
[not found] ` <49F6CA0E.5040101-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org>
2009-04-28 9:58 ` Johannes Berg
[not found] ` <1240912688.28835.10.camel-YfaajirXv2244ywRPIzf9A@public.gmane.org>
2009-04-28 10:27 ` Alan Jenkins
[not found] ` <49F6DA14.7030608-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org>
2009-04-28 10:35 ` Johannes Berg
2009-04-29 11:14 ` Alan Jenkins
[not found] ` <49F83699.3000307-cCz0Lq7MMjm9FHfhHBbuYA@public.gmane.org>
2009-04-29 11:20 ` Johannes Berg
2009-05-15 21:48 ` Rafael J. Wysocki
[not found] ` <200905152348.56449.rjw-KKrjLPT3xs0@public.gmane.org>
2009-05-16 8:39 ` Alan Jenkins
2009-04-11 19:40 ` Regression: " Kristoffer Ericson
[not found] ` <20090411214045.7bdd497f.kristoffer.ericson-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2009-04-12 9:22 ` Alan Jenkins
2009-04-15 15:49 ` archlinux 2.6.28 ac oops (was Re: Regression: EEE PC hangs when booting off battery) Len Brown
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).