ACPI breakage (Re: 2.6.19-rc6: known regressions (v2))

public inbox for linux-acpi@vger.kernel.org

* ACPI breakage (Re: 2.6.19-rc6: known regressions (v2))
  2006-11-17 20:40 ` 2.6.19-rc6: known regressions (v2) Adrian Bunk
@ 2006-11-17 23:58   ` Linus Torvalds
  2006-11-18  1:25     ` Linus Torvalds
  0 siblings, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2006-11-17 23:58 UTC (permalink / raw)
  To: Len Brown, Adrian Bunk, Andrew Morton; +Cc: David Brownell, linux-acpi

On Fri, 17 Nov 2006, Adrian Bunk wrote:
> 
> Subject    : nasty ACPI regression, AE_TIME errors
> References : http://lkml.org/lkml/2006/11/15/12
> Submitter  : David Brownell <david-b@pacbell.net>
> Handled-By : Len Brown <len.brown@intel.com>
>              Alexey Starikovskiy <alexey.y.starikovskiy@linux.intel.com>
> Status     : problem is being debugged

I do not know if this is related, but testing one of my laptops (always a 
good idea to check the week before release) shows that my trusty old 
Compaq N620c locks up rather quickly at boot with the current -git tree. 

Total lockup - no sysrq, no messages, no nothing.

I've mostly bisected it (what the _hell_ did we do before "git bisect"?), 
and right now I know:

commit 9aaed2b42d00d4abb2748d72d599a8033600e2bf is bad (that's Len's "pull 
trivial into test branch" commit).

v2.6.19-rc2 seems all good.

Which leaves a chunk of just a few ACPI commits left to bisect. 

I'll do five or so more reboots, and I should be able to tell exactly 
which commit breaks. It almost always locks up very early during boot 
(generally during the "initializing udev" phase), although sometimes it 
survives a bit further..

		Linus

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: ACPI breakage (Re: 2.6.19-rc6: known regressions (v2))
  2006-11-17 23:58   ` ACPI breakage (Re: 2.6.19-rc6: known regressions (v2)) Linus Torvalds
@ 2006-11-18  1:25     ` Linus Torvalds
  0 siblings, 0 replies; 15+ messages in thread
From: Linus Torvalds @ 2006-11-18  1:25 UTC (permalink / raw)
  To: Len Brown, Alexey Starikovskiy, Adrian Bunk, Andrew Morton
  Cc: David Brownell, linux-acpi

On Fri, 17 Nov 2006, Linus Torvalds wrote:
> 
> Total lockup - no sysrq, no messages, no nothing.

Dammit.

It looks like 37605a6900f6b4d886d995751fcfeef88c4e462c, and I should have 
realized that immediately.

That commit re-introduces the bug that we already reverted once.

Why the hell did that idiotic thing go in, when we had to revert it once 
already (see commit 72945b2b90a5554975b8f72673ab7139d232a121 for the 
earlier revert).

It was broken then, it is broken now. Nothing has changed.

Why did you guys try to sneak it in again? Last time this same "use a 
second workqueue" patch went in (in a different form), we had _exactly_ 
the same problems, with total lockups, and way too high CPU usage.

The bugzilla entry that you refer to in that commit is even the same one 
that discussed why the _original_ patch was totally broken.

It's even the same AUTHOR who wrote the original buggy patch, that pushed 
through the same buggy patch AGAIN.

Dammit, this is frustrating.

Why did people expect it to suddenly not be buggy?

		Linus

* RE: ACPI breakage (Re: 2.6.19-rc6: known regressions (v2))
@ 2006-11-18 16:23 Starikovskiy, Alexey Y
  2006-11-18 17:12 ` Linus Torvalds
  0 siblings, 1 reply; 15+ messages in thread
From: Starikovskiy, Alexey Y @ 2006-11-18 16:23 UTC (permalink / raw)
  To: Linus Torvalds, Brown, Len, Adrian Bunk, Andrew Morton
  Cc: David Brownell, linux-acpi

Maybe because it does not have a single line in common with the previous
patch?
Or maybe because it fixes all the current AMD-HP notebooks? 
Or maybe because it did not fail while being in -mm?

I will not "sneak it in" again, I promise. 

Regards,
	Alex. 

* RE: ACPI breakage (Re: 2.6.19-rc6: known regressions (v2))
  2006-11-18 16:23 ACPI breakage (Re: 2.6.19-rc6: known regressions (v2)) Starikovskiy, Alexey Y
@ 2006-11-18 17:12 ` Linus Torvalds
  2006-11-18 19:05   ` David Brownell
  0 siblings, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2006-11-18 17:12 UTC (permalink / raw)
  To: Starikovskiy, Alexey Y
  Cc: Brown, Len, Adrian Bunk, Andrew Morton, David Brownell,
	linux-acpi

On Sat, 18 Nov 2006, Starikovskiy, Alexey Y wrote:
>
> Maybe because it does not have a single line in common with the previous
> patch?

Yeah, I do agree that it _looks_ very different as a patch, but it ends up 
having all the same execution profiles..

It's been too long since I debugged the previous problem, so I don't 
remember the exact details any more (back then I enabled ACPI debugging 
and watched the messages scroll by etc - this time I initially thought it 
was interrupt-related due to the other irq problems we've had, so I 
started bisecting immediately _without_ doing any ACPI debugging stuff, 
and by the time I actually bisected down enough, I recognized the problem, 
so I didn't do all the same "enable ACPI messages and look deeply into 
what is going on" thing).

But if I remember correctly, what happens is _roughly_ something like 
this:

 - thermal event happens - the CPU is getting warm, and the fan needs to 
   start up. Quite often, this happened early during boot (which is quite 
   busy - some init scripts are disgustingly CPU-intensive mainly due to 
   using inefficient scripting languages), but if it didn't happen there, 
   it's easy enough to force to happen other ways.

 - part of the handling is "acpi_os_execute()" for something (don't ask me 
   what), but the interesting thing is how that "acpi_os_execute()" then 
   ends up causing a _recursive_ event.

 - we handle the original event in kacpid, and hand over the new one as a 
   notification event. But the event keeps on happening, and kacpid keeps 
   on running, and the other thread doesn't actually ever _run_ because 
   kacpid holds the ACPI lock and is constantly busy.

 - we not only are constantly running in kernel space, we also end up 
   eventually running out of memory for allocating all the work queue 
   entries.

So the reason the old code works is because everything is done in a single 
thread, and yes, we end up getting multiple events, but because the queue 
is all done onto the same queue that is _handling_ the events in the first 
place, and because it's a FIFO queue, the notification events get handled 
_before_ the later events.

So with the single-threaded situation, you basically end up always doing 
the events in the same order they came in. In the "two separate threads" 
case, you don't, and one thread will end up generating events forever, 
waiting for them to happen, but they never _do_ happen, so you have a 
lockup _and_ eventually an infinite event queue for the other thread.

> Or maybe because it fixes all the current AMD-HP notebooks? 
> Or maybe because it did not fail while being in -mm?

I'm afraid that -mm doesn't get as much testing as it used to get. 

Also, I do realize that the patch fixes other problems, but we have long 
had a very strict policy that we do NOT accept regressions. Immediately 
when you start accepting regressions, you will never know whether you're 
going forward or backwards. It's better to have a known _old_ bug than to 
introduce a new one.

So the "no regressions!" rule ends up trumping pretty much every single 
other issue. It's unacceptable to have machines that used to work, 
suddenly stop working. Even if it fixes another machine. 

ACPI didn't use to have that rule, and it was wild and crazy. Maybe more 
bugs got fixed, but the problem with accepting regressions is that nobody 
can _ever_ trust that system. You do not want to have people _afraid_ of 
upgrading - they should feel confident that upgrading never introduces any 
new problems.

(Of course, that can never be reached 100%, but it's very much part of the 
goal. It kind of falls into the same "backwards compatibility on 
interfaces" absolute goal: it's ok to do new things, but you can never 
allow them to break old programs)

> I will not "sneak it in" again, I promise. 

Feel free to send me test patches when working on these things, because I 
have no trouble at all testing my particular machine.

I think you'll find the ACPI dumps etc for that machine in your archives, 
because I've sent them to Len and the acpi lists several times, but if you 
want to get AML disassemblies etc, just tell me how. I've done them 
before, but I work on this seldom enough that I always forget what the 
magic incantations are, and where to get the tools etc.

		Linus

* RE: ACPI breakage (Re: 2.6.19-rc6: known regressions (v2))
@ 2006-11-18 19:01 Starikovskiy, Alexey Y
  2006-11-18 19:05 ` Linus Torvalds
  0 siblings, 1 reply; 15+ messages in thread
From: Starikovskiy, Alexey Y @ 2006-11-18 19:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Brown, Len, Adrian Bunk, Andrew Morton, David Brownell,
	linux-acpi

>Feel free to send me test patches when working on these things,
>because I have no trouble at all testing my particular machine.

I sent you a test patch back in July, but did not get a reply. Maybe
due to OLS?

Thanks,
	Alex.

* Re: ACPI breakage (Re: 2.6.19-rc6: known regressions (v2))
  2006-11-18 17:12 ` Linus Torvalds
@ 2006-11-18 19:05   ` David Brownell
  2006-11-18 22:09     ` Linus Torvalds
  2006-11-19  4:33     ` David Brownell
  0 siblings, 2 replies; 15+ messages in thread
From: David Brownell @ 2006-11-18 19:05 UTC (permalink / raw)
  To: Alexey Starikovskiy, Linus Torvalds
  Cc: Adrian Bunk, Andrew Morton, Brown, Len, linux-acpi

> On Sat, 18 Nov 2006, Starikovskiy, Alexey Y wrote:
> 
> > Or maybe because it fixes all the current AMD-HP notebooks? 

Whatever "it" is sure broke mine though... the one that's
currently on my lap!  :)

Running right now with a patch reverting the update which
made trouble on Linus' machine, but without Alexey's two
tweaks to the EC interrupt handler.  So far so good, even
after doing things which had previously caused AE_TIME
errors pretty quickly.  But then, the errors weren't what
I'd call reproducible either.

Linus' explanation of what went wrong looks compatible with
the symptoms I've seen, FWIW.

- Dave

* RE: ACPI breakage (Re: 2.6.19-rc6: known regressions (v2))
  2006-11-18 19:01 Starikovskiy, Alexey Y
@ 2006-11-18 19:05 ` Linus Torvalds
       [not found]   ` <455FB44C.8050103@linux.intel.com>
  0 siblings, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2006-11-18 19:05 UTC (permalink / raw)
  To: Starikovskiy, Alexey Y
  Cc: Brown, Len, Adrian Bunk, Andrew Morton, David Brownell,
	linux-acpi



On Sat, 18 Nov 2006, Starikovskiy, Alexey Y wrote:
> 
> I sent you a test patch back in July, but did not get a reply. Maybe
> due to OLS?

Heh. Whenever you send me something like that, and I don't answer within a 
few days, you can pretty much depend on me not answering - my mailqueue 
just fills up too fast. And yeah, it might have been during OLS. Just 
re-send when it happens.

		Linus

* Re: ACPI breakage (Re: 2.6.19-rc6: known regressions (v2))
  2006-11-18 19:05   ` David Brownell
@ 2006-11-18 22:09     ` Linus Torvalds
  2006-11-18 22:16       ` Adrian Bunk
  2006-11-19  4:33     ` David Brownell
  1 sibling, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2006-11-18 22:09 UTC (permalink / raw)
  To: David Brownell
  Cc: Alexey Starikovskiy, Adrian Bunk, Andrew Morton, Brown, Len,
	linux-acpi



On Sat, 18 Nov 2006, David Brownell wrote:
> 
> Running right now with a patch reverting the update which
> made trouble on Linus' machine, but without Alexey's two
> tweaks to the EC interrupt handler.  So far so good, even
> after doing things which had previously caused AE_TIME
> errors pretty quickly.  But then, the errors weren't what
> I'd call reproducible either.

Ok, goodie. 

Adrian, that means that there's one less regression on your list, unless 
David reports that he can reproduce it again (I don't think he will be 
able to: all the other ACPI changes looked relatively harmless, at least 
in the particular area of ACPI changes I looked at)

		Linus

* Re: ACPI breakage (Re: 2.6.19-rc6: known regressions (v2))
  2006-11-18 22:09     ` Linus Torvalds
@ 2006-11-18 22:16       ` Adrian Bunk
  0 siblings, 0 replies; 15+ messages in thread
From: Adrian Bunk @ 2006-11-18 22:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Brownell, Alexey Starikovskiy, Andrew Morton, Brown, Len,
	linux-acpi

On Sat, Nov 18, 2006 at 02:09:56PM -0800, Linus Torvalds wrote:
> 
> 
> On Sat, 18 Nov 2006, David Brownell wrote:
> > 
> > Running right now with a patch reverting the update which
> > made trouble on Linus' machine, but without Alexey's two
> > tweaks to the EC interrupt handler.  So far so good, even
> > after doing things which had previously caused AE_TIME
> > errors pretty quickly.  But then, the errors weren't what
> > I'd call reproducible either.
> 
> Ok, goodie. 
> 
> Adrian, that means that there's one less regression on your list, unless 
> David reports that he can reproduce it again (I don't think he will be 
> able to: all the other ACPI changes looked relatively harmless, at least 
> in the particular area of ACPI changes I looked at)

I had already removed it from my list based on David's email.

> 		Linus

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


* Re: ACPI breakage (Re: 2.6.19-rc6: known regressions (v2))
  2006-11-18 19:05   ` David Brownell
  2006-11-18 22:09     ` Linus Torvalds
@ 2006-11-19  4:33     ` David Brownell
  2006-11-20 18:46       ` David Brownell
  1 sibling, 1 reply; 15+ messages in thread
From: David Brownell @ 2006-11-19  4:33 UTC (permalink / raw)
  To: Alexey Starikovskiy
  Cc: Linus Torvalds, Adrian Bunk, Andrew Morton, Brown, Len,
	linux-acpi

On Saturday 18 November 2006 11:05 am, David Brownell wrote:
> 
> Running right now with a patch reverting the update which
> made trouble on Linus' machine, but without Alexey's two
> tweaks to the EC interrupt handler.  So far so good, even
> after doing things which had previously caused AE_TIME
> errors pretty quickly.  But then, the errors weren't what
> I'd call reproducible either.

Hmm, well after a reboot to sort out some other patches,
and at uptime of ~2 hours, I noticed confusion about
whether AC or battery power was active, then the old:

ACPI Exception (evregion-0424): AE_TIME, Returned by Handler for [EmbeddedControl] [20060707]
ACPI Exception (dswexec-0458): AE_TIME, While resolving operands for [OpcodeName unavailable] [20060707]
ACPI Error (psparse-0537): Method parse/execution failed [\_TZ_.THRM._TMP] (Node ffff810002032d10), AE_TIME

So maybe that's not the entire story; sigh.

- Dave

* Re: ACPI breakage (Re: 2.6.19-rc6: known regressions (v2))
       [not found]         ` <Pine.LNX.4.64.0611201003540.3692@woody.osdl.org>
@ 2006-11-20 18:27           ` Linus Torvalds
  2006-11-20 19:31             ` Alexey Starikovskiy
  2006-11-21  3:10             ` Sanjoy Mahajan
  2006-11-20 22:13           ` Alexey Starikovskiy
  1 sibling, 2 replies; 15+ messages in thread
From: Linus Torvalds @ 2006-11-20 18:27 UTC (permalink / raw)
  To: Alexey Starikovskiy; +Cc: Brown, Len, linux-acpi

[ Digression from testing Alexey's patch that makes the Evo work again 
  with two separate threads ]

On Mon, 20 Nov 2006, Linus Torvalds wrote:
> 
> Ok, this one works for me too, and looks much simpler.

Hmm. Some more testing shows that fan behaviour after a suspend-to-ram 
event seems broken, but I suspect the breakage isn't new.

It seems that ACPI remembers fan state from before the suspend, and then 
(incorrectly) uses that to decide whether it should turn fans on or off. 
So for example, it seems to remember that the fan was already on, so it 
won't ever turn it on again - even though the suspend will obviously have 
turned off all fans too.

So after running for a while, I get (for example):

	cat /proc/acpi/thermal_zone/TZ1/*

	..
	state:                   active[0]
	temperature:             92 C
	critical (S5):           99 C
	passive:                 95 C: tc1=1 tc2=2 tsp=100 devices=0xf7e42338
	active[0]:               80 C: devices=0xc18d78ec
	active[1]:               70 C: devices=0xc18d7888
	active[2]:               60 C: devices=0xc18d7838
	active[3]:               45 C: devices=0xc18d77e8

(it thinks all fans are on), but no fans are actually on:

	cat /proc/acpi/fan/*/*

	status:                  off
	status:                  off
	status:                  off
	status:                  off

Of course, I'm not exactly having a lot of trust in ACPI in general, so 
for all I know this is just more unfixable crap from the firmware. But it 
smells like "I remember state from before the suspend".

			Linus

* Re: ACPI breakage (Re: 2.6.19-rc6: known regressions (v2))
  2006-11-19  4:33     ` David Brownell
@ 2006-11-20 18:46       ` David Brownell
  0 siblings, 0 replies; 15+ messages in thread
From: David Brownell @ 2006-11-20 18:46 UTC (permalink / raw)
  To: Alexey Starikovskiy
  Cc: Linus Torvalds, Adrian Bunk, Andrew Morton, Brown, Len,
	linux-acpi

On Saturday 18 November 2006 8:33 pm, David Brownell wrote:
> On Saturday 18 November 2006 11:05 am, David Brownell wrote:
> > 
> > Running right now with a patch reverting the update which
> > made trouble on Linus' machine, but without Alexey's two
> > tweaks to the EC interrupt handler.  So far so good, even
> > after doing things which had previously caused AE_TIME
> > errors pretty quickly.  But then, the errors weren't what
> > I'd call reproducible either.
> 
> Hmm, well after a reboot to sort out some other patches,
> and at uptime of ~2 hours, I noticed confusion about
> whether AC or battery power was active, then the old:
> 
> ACPI Exception (evregion-0424): AE_TIME, Returned by Handler for [EmbeddedControl] [20060707]
> ACPI Exception (dswexec-0458): AE_TIME, While resolving operands for [OpcodeName unavailable] [20060707]
> ACPI Error (psparse-0537): Method parse/execution failed [\_TZ_.THRM._TMP] (Node ffff810002032d10), AE_TIME
> 
> So maybe that's not the entire story; sigh.

Whatever it is, it hasn't shown its ugly little face since then.
So while it doesn't seem completely fixed ... it's nowhere near
as broken as it was previously.

- Dave


* Re: ACPI breakage (Re: 2.6.19-rc6: known regressions (v2))
  2006-11-20 18:27           ` Linus Torvalds
@ 2006-11-20 19:31             ` Alexey Starikovskiy
  2006-11-21  3:10             ` Sanjoy Mahajan
  1 sibling, 0 replies; 15+ messages in thread
From: Alexey Starikovskiy @ 2006-11-20 19:31 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Brown, Len, linux-acpi


Linus Torvalds wrote:
> [ Digression from testing Alexey's patch that makes the Evo work again 
>   with two separate threads ]
>
> On Mon, 20 Nov 2006, Linus Torvalds wrote:
>   
>> Ok, this one works for me too, and looks much simpler.
>>     
>
> Hmm. Some more testing shows that fan behaviour after a suspend-to-ram 
> event seems broken, but I suspect the breakage isn't new.
>
> It seems that ACPI remembers fan state from before the suspend, and then 
> (incorrectly) uses that to decide whether it should turn fans on or off. 
> So for example, it seems to remember that the fan was already on, so it 
> won't ever turn it on again - even though the suspend will obviously have 
> turned off all fans too.
>
>   
We have patches in #7122 for a similar issue with suspend-to-disk; maybe
they fix suspend-to-ram too?
It's related to the order of ACPI device resume and _WAK method execution.

Thanks,
    Alex.

* Re: ACPI breakage (Re: 2.6.19-rc6: known regressions (v2))
       [not found]         ` <Pine.LNX.4.64.0611201003540.3692@woody.osdl.org>
  2006-11-20 18:27           ` Linus Torvalds
@ 2006-11-20 22:13           ` Alexey Starikovskiy
  1 sibling, 0 replies; 15+ messages in thread
From: Alexey Starikovskiy @ 2006-11-20 22:13 UTC (permalink / raw)
  To: Linus Torvalds, linux-acpi, David Brownell

Linus Torvalds wrote:
> On Sun, 19 Nov 2006, Alexey Starikovskiy wrote:
>   
>> I agree to all your comments with one exception, please see below. Attached is
>> the reworked patch against latest git. Please test.
>>     
>
> Ok, this one works for me too, and looks much simpler.
>
>   
>> Linus Torvalds wrote:
>>     
>>> And we might as well do it when we add an entry to the _deferred_ queue, no? 
>>>       
>>   
>> acpi_os_execute() is called from interrupt context for insertion into
>> _deferred_ queue, so it's not possible to yield in it, no?
>>     
>
> Hmm. Yes. Anyway, the new patch looks acceptable, and certainly much 
> simpler than trying to count events.
>
> It probably causes tons of new unnecessary scheduling events, but I doubt 
> we really care.
>
> That said, what we _really_ want here is a "priority queue" for the 
> events, and some way to put an event back on the queue while running it 
> (eg ACPI "Sleep" event). But I guess the ACPI interpreter isn't done that 
> way (ie you can't just push and pop ACPI state).
>
> 		Linus
>   
Linus, thanks for diagnosing and testing. Yes, the interpreter is not 
currently able to put its stack aside.

David, could you try this patch too?

Regards,
    Alex.






[-- Attachment #2: yield_on_deferred_events.patch --]
[-- Type: text/plain, Size: 3673 bytes --]

ACPI: created a dedicated workqueue for notify() execution

From:  Alexey Starikovskiy <alexey.y.starikovskiy@linux.intel.com>

Needed to handle while loop in GPE handler of HP notebooks.    
http://bugzilla.kernel.org/show_bug.cgi?id=5534

Yield processor before execution of deferred event queue. Needed to avoid
flooding of Compaq n620c with events.
---

 drivers/acpi/osl.c |   51 +++++++++++++++++++++++++++++++++++----------------
 1 files changed, 35 insertions(+), 16 deletions(-)

diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 068fe4f..169ca04 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -34,6 +34,7 @@ #include <linux/smp_lock.h>
 #include <linux/interrupt.h>
 #include <linux/kmod.h>
 #include <linux/delay.h>
+#include <linux/syscalls.h>
 #include <linux/workqueue.h>
 #include <linux/nmi.h>
 #include <acpi/acpi.h>
@@ -73,6 +74,7 @@ static unsigned int acpi_irq_irq;
 static acpi_osd_handler acpi_irq_handler;
 static void *acpi_irq_context;
 static struct workqueue_struct *kacpid_wq;
+static struct workqueue_struct *kacpi_notify_wq;
 
 acpi_status acpi_os_initialize(void)
 {
@@ -91,8 +93,9 @@ acpi_status acpi_os_initialize1(void)
 		return AE_NULL_ENTRY;
 	}
 	kacpid_wq = create_singlethread_workqueue("kacpid");
+	kacpi_notify_wq = create_singlethread_workqueue("kacpi_notify");
 	BUG_ON(!kacpid_wq);
-
+	BUG_ON(!kacpi_notify_wq);
 	return AE_OK;
 }
 
@@ -104,6 +107,7 @@ acpi_status acpi_os_terminate(void)
 	}
 
 	destroy_workqueue(kacpid_wq);
+	destroy_workqueue(kacpi_notify_wq);
 
 	return AE_OK;
 }
@@ -566,10 +570,23 @@ void acpi_os_derive_pci_id(acpi_handle r
 
 static void acpi_os_execute_deferred(void *context)
 {
-	struct acpi_os_dpc *dpc = NULL;
+	struct acpi_os_dpc *dpc = (struct acpi_os_dpc *)context;
+	if (!dpc) {
+		printk(KERN_ERR PREFIX "Invalid (NULL) context\n");
+		return;
+	}
+
+	sys_sched_yield();
+	dpc->function(dpc->context);
+
+	kfree(dpc);
 
+	return;
+}
 
-	dpc = (struct acpi_os_dpc *)context;
+static void acpi_os_execute_notify(void *context)
+{
+	struct acpi_os_dpc *dpc = (struct acpi_os_dpc *)context;
 	if (!dpc) {
 		printk(KERN_ERR PREFIX "Invalid (NULL) context\n");
 		return;
@@ -604,14 +621,12 @@ acpi_status acpi_os_execute(acpi_execute
 	struct acpi_os_dpc *dpc;
 	struct work_struct *task;
 
-	ACPI_FUNCTION_TRACE("os_queue_for_execution");
-
 	ACPI_DEBUG_PRINT((ACPI_DB_EXEC,
 			  "Scheduling function [%p(%p)] for deferred execution.\n",
 			  function, context));
 
 	if (!function)
-		return_ACPI_STATUS(AE_BAD_PARAMETER);
+		return AE_BAD_PARAMETER;
 
 	/*
 	 * Allocate/initialize DPC structure.  Note that this memory will be
@@ -624,9 +639,8 @@ acpi_status acpi_os_execute(acpi_execute
 	 * from the same memory.
 	 */
 
-	dpc =
-	    kmalloc(sizeof(struct acpi_os_dpc) + sizeof(struct work_struct),
-		    GFP_ATOMIC);
+	dpc = kzalloc(sizeof(struct acpi_os_dpc) +
+			sizeof(struct work_struct), GFP_ATOMIC);
 	if (!dpc)
 		return_ACPI_STATUS(AE_NO_MEMORY);
 
@@ -634,13 +648,18 @@ acpi_status acpi_os_execute(acpi_execute
 	dpc->context = context;
 
 	task = (void *)(dpc + 1);
-	INIT_WORK(task, acpi_os_execute_deferred, (void *)dpc);
-
-	if (!queue_work(kacpid_wq, task)) {
-		ACPI_DEBUG_PRINT((ACPI_DB_ERROR,
-				  "Call to queue_work() failed.\n"));
-		kfree(dpc);
-		status = AE_ERROR;
+	if (type == OSL_NOTIFY_HANDLER) {
+		INIT_WORK(task, acpi_os_execute_notify, (void *)dpc);
+		if (!queue_work(kacpi_notify_wq, task)) {
+			status = AE_ERROR;
+			kfree(dpc);
+		}
+	} else {
+		INIT_WORK(task, acpi_os_execute_deferred, (void *)dpc);
+		if (!queue_work(kacpid_wq, task)) {
+			status = AE_ERROR;
+			kfree(dpc);
+		}
 	}
 
 	return_ACPI_STATUS(status);

* Re: ACPI breakage (Re: 2.6.19-rc6: known regressions (v2))
  2006-11-20 18:27           ` Linus Torvalds
  2006-11-20 19:31             ` Alexey Starikovskiy
@ 2006-11-21  3:10             ` Sanjoy Mahajan
  1 sibling, 0 replies; 15+ messages in thread
From: Sanjoy Mahajan @ 2006-11-21  3:10 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Alexey Starikovskiy, Brown, Len, linux-acpi

Linus Torvalds wrote:

  (it thinks all fans are on), but no fans are actually on:

	  cat /proc/acpi/fan/*/*

	  status:                  off
	  status:                  off
	  status:                  off
	  status:                  off

I saw a related problem with suspend to disk. I couldn't test the effect
of suspend to RAM on the fans, because a second suspend to RAM would hang
while suspending (bugzilla 5989, 6749).

After resuming, the fans would be off but the system thought they were
on, and the fan/*/* files would confirm that wrong idea.  Because the
fans were allegedly on, they would never get turned on, even as the
temperature climbed into the sky.  I reported this problem to the acpi
list and bugzilla, and the fan driver was fixed to have suspend/resume
methods, and that plus other patches eventually fixed it for my box
(IBM TP 600X).  Though with some of the intermediate patches, I saw
the behavior you are seeing (with 'off' status in fan/*/*, but showing
'on' in thermal_zone/*/* or in 'acpi -t').

Also, there was some question there about whether exactly the right
version of the patch got merged.

See the discussions in
<http://bugzilla.kernel.org/show_bug.cgi?id=5000>.

Unfortunately I haven't kept testing it because my 600X's screen died,
and its replacement (T60) doesn't export any fan control or trip
points to ACPI (it's all done at a lower level, alas, so the damn fan
is on way too much).

-Sanjoy

`Never underestimate the evil of which men of power are capable.'
         --Bertrand Russell, _War Crimes in Vietnam_, chapter 1.
