public inbox for linux-kernel@vger.kernel.org
* RE: 2.6.16-rc5: known regressions
@ 2006-02-27  9:04 Yu, Luming
  2006-03-03  2:59 ` Sanjoy Mahajan
  2006-03-10  5:26 ` 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT] Sanjoy Mahajan
  0 siblings, 2 replies; 90+ messages in thread
From: Yu, Luming @ 2006-02-27  9:04 UTC (permalink / raw)
  To: linux-kernel, Linus Torvalds, Andrew Morton
  Cc: Tom Seeley, Dave Jones, Jiri Slaby, michael, mchehab,
	v4l-dvb-maintainer, video4linux-list, Brian Marete, Ryan Phillips,
	gregkh, linux-usb-devel, Sanjoy Mahajan, Brown, Len, linux-acpi,
	Mark Lord, Randy Dunlap, jgarzik, linux-ide, Duncan,
	Pavlik Vojtech, linux-input, Meelis Roos

>Subject    : S3 sleep hangs the second time - 600X
>References : http://bugzilla.kernel.org/show_bug.cgi?id=5989
>Submitter  : Sanjoy Mahajan <sanjoy@mrao.cam.ac.uk>
>Handled-By : Luming Yu <luming.yu@intel.com>
>Status     : is being debugged,
>             we might want to change the default back for 2.6.16:
>             http://lkml.org/lkml/2006/2/25/101
>

According to the bug report, the BIOS DSDT was modified.
I don't know how those changes affect the results
of suspend/resume, but it is clear that this is NOT the right approach
to fixing the problem. Hence, I need a test report with
an unmodified DSDT on the TP 600X, BIOS 1.11.

--Luming


^ permalink raw reply	[flat|nested] 90+ messages in thread
* RE: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]
@ 2006-03-10  6:12 Yu, Luming
  2006-03-10  6:27 ` Sanjoy Mahajan
  0 siblings, 1 reply; 90+ messages in thread
From: Yu, Luming @ 2006-03-10  6:12 UTC (permalink / raw)
  To: Sanjoy Mahajan
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, Tom Seeley,
	Dave Jones, Jiri Slaby, michael, mchehab, v4l-dvb-maintainer,
	video4linux-list, Brian Marete, Ryan Phillips, gregkh,
	linux-usb-devel, Brown, Len, linux-acpi, Mark Lord, Randy Dunlap,
	jgarzik, linux-ide, Duncan, Pavlik Vojtech, linux-input,
	Meelis Roos

>From: "Yu, Luming" <luming.yu@intel.com>
>> I suggest you to retest, and post dmesg with UN-modified BIOS.
>
>I'm now running/testing an unmodified DSDT with 2.6.16-rc5.  For a while
>I had no S3 hangs, but I just noticed them again.  The error is the same
>as with the modified DSDT (with slightly different offsets):

I assume you have tested ec_intr=0 and ec_intr=1.

>
>exregion-0185 [36] ex_system_memory_space: system_memory 0 (32 width) Address=0000000023FDFFC0
>exregion-0185 [36] ex_system_memory_space: system_memory 1 (32 width) Address=0000000023FDFFC0
>exregion-0290 [36] ex_system_io_space_han: system_iO 1 (8 width) Address=00000000000000B2
>
>repeated endlessly.

I need a call trace for this.

>
>I think the problem resurfaced once I decided to let my sleep.sh script
>leave the thermal driver loaded before going into S3 (suspecting that
>the bug might come back if I did that).

Clearly, it's thermal related. We need to narrow it down from here.

>
>So I suspect that my modified DSDT didn't cause the S3 problems, it
>merely exposed one even in the minimal configuration discussed in the
>#5989 report.

The ground rule is: don't use a modified DSDT.
If you do, the results can't be trusted.

>
>Which makes me wonder about another bug that disappeared when I switched
>to the vanilla DSDT: While printing (via gs+hpijs to an HP photosmart
>2710 via the wireless card), the system makes double-beeps as if it were
>having the AC adapter plugged and unplugged.  These noises happen when
>printing via the wireless card or via USB (to a different HP inkjet),

Interesting. Please open a separate bug for this.

>but not when printing via the parallel port to a Lexmark laserprinter
>(using just gs).  Since I didn't do anything to the battery code in the
>DSDT, I now wonder whether changing the DSDT merely exposed the issue
>but didn't create it.
>
>[From an earlier msg:]
>> I think the truth is, for 5989, we need to fix thermal and processor
>> driver issue.
>
>I agree, although I think the processor driver is not the culprit.  My
>earlier testing (with the modified DSDT) worked fine with the
>processor module loaded, but hung with processor + thermal loaded.
>

OK, we need to start with thermal.

BTW, do you still think this is a regression?

Thanks,
Luming


^ permalink raw reply	[flat|nested] 90+ messages in thread
* RE: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]
@ 2006-03-10  6:46 Yu, Luming
  2006-03-10 13:27 ` Sanjoy Mahajan
  2006-03-10 13:36 ` Sanjoy Mahajan
  0 siblings, 2 replies; 90+ messages in thread
From: Yu, Luming @ 2006-03-10  6:46 UTC (permalink / raw)
  To: Sanjoy Mahajan
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, Tom Seeley,
	Dave Jones, Jiri Slaby, michael, mchehab, v4l-dvb-maintainer,
	video4linux-list, Brian Marete, Ryan Phillips, gregkh,
	linux-usb-devel, Brown, Len, linux-acpi, Mark Lord, Randy Dunlap,
	jgarzik, linux-ide, Duncan, Pavlik Vojtech, linux-input,
	Meelis Roos

>>> exregion-0290 [36] ex_system_io_space_han: system_iO 1 (8 width) Address=00000000000000B2
>>>
>>> repeated endlessly.
>
>> I need calltrace for this 
>
>Looking at /proc/acpi/debug_level, I see several debugging choices
>that might give the calltrace you want.  Let me know which ones are
>essential (I'd turn all of them on; however, I found when trying to
>track this down earlier that the bug would slither away if I had too
>much debugging turned on):

What do you mean by "slither away"?
Does the bug go away?

>
>ACPI_LV_DISPATCH	       0x00000100 [ ]
>ACPI_LV_EXEC		       0x00000200 [ ]
>ACPI_LV_NAMES		       0x00000400 [ ]
>ACPI_LV_FUNCTIONS	       0x00200000 [ ]
>
>By the way, a long standing buglet for me is that 'cat
>/proc/acpi/debug_level' truncates the output to 1024 bytes.  So I have
>to do 'cat /proc/acpi/debug_level | cat' so that the first cat doesn't
>find that its stdout is a tty and try to reduce its buffer size from
>4096 (big enough) to 1024.  A patch is available at
><http://bugzilla.kernel.org/show_bug.cgi?id=5076>

Let's start with:

echo -n 0x10 > /proc/acpi/debug_layer
echo -n 0x10 > /proc/acpi/debug_level
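The debug_level flags quoted above are plain bit masks, so several of them can be combined with a bitwise OR before being written to the proc file. A minimal sketch of that arithmetic follows, assuming only the four values shown in the quoted listing; the helper names build_debug_level and format_debug_level are hypothetical and are not part of the kernel.

```c
#include <assert.h>
#include <stdio.h>

/* Flag values copied from the /proc/acpi/debug_level listing quoted above. */
#define ACPI_LV_DISPATCH  0x00000100u
#define ACPI_LV_EXEC      0x00000200u
#define ACPI_LV_NAMES     0x00000400u
#define ACPI_LV_FUNCTIONS 0x00200000u

/* OR together the n flags in `flags` (hypothetical helper). */
unsigned int build_debug_level(const unsigned int *flags, int n)
{
    unsigned int mask = 0;

    assert(flags != NULL && n >= 0);
    for (int i = 0; i < n; i++)
        mask |= flags[i];
    return mask;
}

/* Format the mask as a hex string of the kind echoed into the proc file.
 * Returns the number of characters snprintf wanted to write. */
int format_debug_level(unsigned int mask, char *buf, size_t len)
{
    assert(buf != NULL);
    return snprintf(buf, len, "0x%08X", mask);
}
```

Combining all four flags gives 0x00200700; the resulting string could then be written to /proc/acpi/debug_level in the same way as the echo commands above.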

>
>> BTW, do you still think this is a regression?
>
>I'm 95% sure, because booting with ec_intr=0 avoids the problem, so
>the commit that made ec_intr=1 the default almost certainly also makes
>this bug appear.

why NOT 100% sure? :-)

^ permalink raw reply	[flat|nested] 90+ messages in thread
* RE: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]
@ 2006-03-13  2:00 Yu, Luming
  2006-03-13  4:38 ` Sanjoy Mahajan
  0 siblings, 1 reply; 90+ messages in thread
From: Yu, Luming @ 2006-03-13  2:00 UTC (permalink / raw)
  To: Sanjoy Mahajan
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, Tom Seeley,
	Dave Jones, Jiri Slaby, michael, mchehab, v4l-dvb-maintainer,
	video4linux-list, Brian Marete, Ryan Phillips, gregkh,
	linux-usb-devel, Brown, Len, linux-acpi, Mark Lord, Randy Dunlap,
	jgarzik, linux-ide, Duncan, Pavlik Vojtech, linux-input,
	Meelis Roos

>width) Address=0000000023FDFFC0
>exregion-0290 [36] ex_system_io_space_han: system_iO 1 (8 width) Address=00000000000000B2
>exregion-0185 [35] ex_system_memory_space: system_memory 0 (32 width) Address=0000000023FDFFC0
>exregion-0185 [36] ex_system_memory_space: system_memory 0 (32 width) Address=0000000023FDFFC0
>exregion-0185 [36] ex_system_memory_space: system_memory 1 (32 width) Address=0000000023FDFFC0
>exregion-0290 [36] ex_system_io_space_han: system_iO 1 (8 width) Address=00000000000000B2
>
>And then these above four lines (exregion-0185, -0185, -0185, -0290)
>repeat until I reboot.
>

If I understand correctly, the loop is caused by LEqual (S_AH, 0xA6) always
being true.
Does the SMM BIOS code fail to respond, or fail to respond correctly,
to the request made by "Store (0x81, APMD)" because of an issue caused by
the thermal module?
I need the ACPI trace log from before _PTS to see which thermal-related
methods get called.

    Method (SMPI, 1, NotSerialized)
    {
        Store (S_AX, Local0)
        Store (0x81, APMD)
        While (LEqual (S_AH, 0xA6))
        {
            Sleep (0x64)
            Store (Local0, S_AX)
            Store (0x81, APMD)
        }
    }
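
The control flow of SMPI can be modelled in ordinary C to see why it never returns when the firmware keeps reporting 0xA6. This is only a sketch under stated assumptions: struct fake_bios, write_apmd and smpi_model are hypothetical names, S_AH is assumed to be the high byte of S_AX, and the retry cap is an addition for the model that the DSDT method does not have.

```c
#include <assert.h>
#include <stdint.h>

#define SMI_BUSY 0xA6   /* status value tested by the While loop in SMPI */

/* Toy stand-in for the firmware. Entirely hypothetical. */
struct fake_bios {
    uint16_t s_ax;      /* models the S_AX field */
    int busy_left;      /* busy answers still to give; < 0 means forever */
};

/* Models "Store (0x81, APMD)": the write triggers the firmware handler. */
static void write_apmd(struct fake_bios *b)
{
    if (b->busy_left != 0) {
        /* Firmware answers "busy": high byte (assumed S_AH) becomes 0xA6. */
        b->s_ax = (uint16_t)((SMI_BUSY << 8) | (b->s_ax & 0xFF));
        if (b->busy_left > 0)
            b->busy_left--;
    } else {
        b->s_ax &= 0x00FF;  /* high byte cleared: request accepted */
    }
}

/* Same control flow as SMPI, plus a retry cap that the DSDT lacks.
 * Returns the number of retries, or -1 if the cap was hit. */
int smpi_model(struct fake_bios *b, int max_retries)
{
    assert(b != 0 && max_retries >= 0);

    uint16_t saved = b->s_ax;               /* Store (S_AX, Local0) */
    int retries = 0;

    write_apmd(b);                          /* Store (0x81, APMD) */
    while ((b->s_ax >> 8) == SMI_BUSY) {    /* While (LEqual (S_AH, 0xA6)) */
        if (retries == max_retries)
            return -1;
        retries++;                          /* Sleep (0x64) would go here */
        b->s_ax = saved;                    /* Store (Local0, S_AX) */
        write_apmd(b);                      /* Store (0x81, APMD) */
    }
    return retries;
}
```

With a firmware model that never stops answering 0xA6 (busy_left < 0), only the cap ends the loop, which matches the endlessly repeating exregion lines in the log.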


^ permalink raw reply	[flat|nested] 90+ messages in thread
* RE: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]
@ 2006-03-13  4:51 Yu, Luming
  2006-03-13  7:28 ` Sanjoy Mahajan
  0 siblings, 1 reply; 90+ messages in thread
From: Yu, Luming @ 2006-03-13  4:51 UTC (permalink / raw)
  To: Sanjoy Mahajan
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, Tom Seeley,
	Dave Jones, Jiri Slaby, michael, mchehab, v4l-dvb-maintainer,
	video4linux-list, Brian Marete, Ryan Phillips, gregkh,
	linux-usb-devel, Brown, Len, linux-acpi, Mark Lord, Randy Dunlap,
	jgarzik, linux-ide, Duncan, Pavlik Vojtech, linux-input,
	Meelis Roos

>> I need the acpi trace log before _PTS to see what kind of thermal
>> related methods got called.
>
>Alas, I've included all the dmesg's.  

I need the full log for the S3 suspend failure, not just snippets.
Please attach it at bugzilla.kernel.org.

The log from a successful S3 suspend cannot help me track this down.


>
>Below is the script that I use to enter S3 sleep.  It unloads
>troublesome modules and stops services that don't sleep well.  Then
>(for debugging) it sends the kernel version and boot parameters across
>the serial console (the @@@@ SLEEP line), raises the debug level to
>0x1F, does a sync (in case the sleep hangs, since this is my
>production machine), and then enters mem sleep.
>
>So nothing in it should trigger any thermal methods; except that I
>usually have the THM2 trip point raised to 45C with a polling time of
>100 seconds.  So once in a while a thermal poll will happen while sleep is
>being set up.  I am not sure whether it would be reported in the
>dmesgs if it happened; but the S3 failure happens much more often than
>such a thermal polling would happen, so I doubt the S3 failure
>requires a thermal poll.

Could you try to mute thermal poll?

^ permalink raw reply	[flat|nested] 90+ messages in thread
* RE: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]
@ 2006-03-13  8:35 Yu, Luming
  2006-03-13 15:21 ` Sanjoy Mahajan
  0 siblings, 1 reply; 90+ messages in thread
From: Yu, Luming @ 2006-03-13  8:35 UTC (permalink / raw)
  To: Sanjoy Mahajan
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, Tom Seeley,
	Dave Jones, Jiri Slaby, michael, mchehab, v4l-dvb-maintainer,
	video4linux-list, Brian Marete, Ryan Phillips, gregkh,
	linux-usb-devel, Brown, Len, linux-acpi, Mark Lord, Randy Dunlap,
	jgarzik, linux-ide, Duncan, Pavlik Vojtech, linux-input,
	Meelis Roos


Thanks for your debug information.

>
>> Could you try to mute thermal poll?
>
>Done.  The sleep.sh script now has
>
>echo 0 > /proc/acpi/thermal_zone/THM2/polling_frequency
>echo 0 > /proc/acpi/thermal_zone/THM0/polling_frequency
>sleep 1

Hmm, could you file dmesgs with the thermal module loaded and
unloaded?

>
>> I need the full log  for S3 suspend failure not just snippets.
>> Please attach it on bugzilla.kernel.org
>
>Done.

I saw that you used acpi_debug=0xffffffff.

I used to use acpi_debug_layer=0x10 acpi_debug_level=0x10.
Could you try that?


^ permalink raw reply	[flat|nested] 90+ messages in thread
* RE: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]
@ 2006-03-14  1:48 Yu, Luming
  2006-03-14  8:28 ` Sanjoy Mahajan
  0 siblings, 1 reply; 90+ messages in thread
From: Yu, Luming @ 2006-03-14  1:48 UTC (permalink / raw)
  To: Sanjoy Mahajan
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, Tom Seeley,
	Dave Jones, Jiri Slaby, michael, mchehab, v4l-dvb-maintainer,
	video4linux-list, Brian Marete, Ryan Phillips, gregkh,
	linux-usb-devel, Brown, Len, linux-acpi, Mark Lord, Randy Dunlap,
	jgarzik, linux-ide, Duncan, Pavlik Vojtech, linux-input,
	Meelis Roos

>> Hmm, could you file dmesgs with thermal module loaded and unloaded?
>
>Filed at bugzilla.

Excellent!

>Let me know if there's a different permutation of debug options that I
>should try.  I wasn't sure whether you meant that I should leave all
>the debug values at 0x10.  Or whether I should still include
>acpi_debug=0xffffffff on top of the other options.

So far, it's OK. I saw the method calls listed below. Could you do a
bisection to find out which methods or which thermal zone cause the trouble?
To do that, you have to hack thermal.c by commenting out
some of the calls that evaluate the methods below.
I hope it is easy for you!  :-)

Thanks,
Luming

Execute Method: [\_TZ_.THM0._TMP] (Node c157bf88)
Execute Method: [\_TZ_.THM0._PSV] (Node c157be48)
Execute Method: [\_TZ_.THM0._TC1] (Node c157bdc8)
Execute Method: [\_TZ_.THM0._TC2] (Node c157bd88)
Execute Method: [\_TZ_.THM0._TSP] (Node c157bd48)
Execute Method: [\_TZ_.THM0._AC0] (Node c157bf48)
Execute Method: [\_TZ_.THM0._SCP] (Node c157bec8)
Execute Method: [\_TZ_.THM0._TMP] (Node c157bf88)
ACPI: Thermal Zone [THM0] (47 C)
Execute Method: [\_TZ_.THM2._TMP] (Node c157bb88)
Execute Method: [\_TZ_.THM2._AC0] (Node c157bb48)
Execute Method: [\_TZ_.THM2._SCP] (Node c157bac8)
Execute Method: [\_TZ_.THM2._TMP] (Node c157bb88)
Execute Method: [\_TZ_.PFN0._ON_] (Node c157a2c8)
Execute Method: [\_TZ_.PFN0._STA] (Node c157a308)
ACPI: Thermal Zone [THM2] (40 C)
Execute Method: [\_TZ_.THM6._TMP] (Node c157b948)
Execute Method: [\_TZ_.THM6._AC0] (Node c157b908)
Execute Method: [\_TZ_.THM6._SCP] (Node c157b888)
Execute Method: [\_TZ_.THM6._TMP] (Node c157b948)
ACPI: Thermal Zone [THM6] (30 C)
Execute Method: [\_TZ_.THM7._TMP] (Node c157b708)
Execute Method: [\_TZ_.THM7._AC0] (Node c157b6c8)
Execute Method: [\_TZ_.THM7._SCP] (Node c157b648)
Execute Method: [\_TZ_.THM7._TMP] (Node c157b708)
ACPI: Thermal Zone [THM7] (33 C)


^ permalink raw reply	[flat|nested] 90+ messages in thread
* RE: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]
@ 2006-03-15  1:46 Yu, Luming
  2006-03-15  5:40 ` Sanjoy Mahajan
  2006-03-15  5:57 ` Sanjoy Mahajan
  0 siblings, 2 replies; 90+ messages in thread
From: Yu, Luming @ 2006-03-15  1:46 UTC (permalink / raw)
  To: Sanjoy Mahajan
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, Tom Seeley,
	Dave Jones, Jiri Slaby, michael, mchehab, Brian Marete,
	Ryan Phillips, gregkh, Brown, Len, linux-acpi, Mark Lord,
	Randy Dunlap, jgarzik, Duncan, Pavlik Vojtech, Meelis Roos

>
>[I've trimmed non-relevant lists (v4l-dvb-maintainer@linuxtv.org,
>video4linux-list@redhat.com, linux-ide@vger.kernel.org,
>linux-input@atrey.karlin.mff.cuni.cz,
>linux-usb-devel@lists.sourceforge.net) from the CC.  Let me know if
>anyone else wants to be trimmed.]
>
>> Could you do bisection to find out which methods or which thermal
>> zone cause trouble?  To do that, you have to hack thermal.c by
>> commenting out some calls of evaluating methods below.  I hope it is
>> easy for you!  :-)
>
>I eventually muddled my way there.  The short story is that I can
>reproduce the hang -- on the FIRST S3 cycle -- when the _TMP method is
>called a few times, just for THM0.  

Excellent!
Could you just comment out _TMP in the kernel or in the DSDT,
and do several S3 suspend/resume cycles without removing the thermal
module?
I want to make sure we are drilling down in the right place.

Thanks for your testing reports. They're impressive. :-)

--Luming

^ permalink raw reply	[flat|nested] 90+ messages in thread
* RE: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]
@ 2006-03-15  6:16 Yu, Luming
  2006-03-15  6:35 ` Sanjoy Mahajan
  0 siblings, 1 reply; 90+ messages in thread
From: Yu, Luming @ 2006-03-15  6:16 UTC (permalink / raw)
  To: Sanjoy Mahajan
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, Tom Seeley,
	Dave Jones, Jiri Slaby, michael, mchehab, Brian Marete,
	Ryan Phillips, gregkh, Brown, Len, linux-acpi, Mark Lord,
	Randy Dunlap, jgarzik, Duncan, Pavlik Vojtech, Meelis Roos

>> Could you just comment out _TMP in kernel or in DSDT, 
>
>I think it needs both excisions: If I comment out just the kernel _TMP
>calls, the DSDT might slip one in through the interpreter.  If I
>comment out just the DSDT _TMP calls, then the kernel can still call
>_TMP.  So instead I modified acpi_evaluate_integer() to return 27 C
>(3000 dK) if it's ever asked for a temperature, without doing any
>actual work:
>
>--- utils.c.orig	2006-02-27 00:09:35.000000000 -0500
>+++ utils.c		2006-03-14 23:36:59.000000000 -0500
>@@ -270,7 +270,15 @@ acpi_evaluate_integer(acpi_handle handle
> 	memset(element, 0, sizeof(union acpi_object));
> 	buffer.length = sizeof(union acpi_object);
> 	buffer.pointer = element;
>-	status = acpi_evaluate_object(handle, pathname, arguments, &buffer);
>+	if (strcmp(pathname, "_TMP") != 0)
>+	  status = acpi_evaluate_object(handle, pathname, arguments, &buffer);
>+	else {
>+	  printk(KERN_INFO PREFIX "acpi_evaluate_integer: Faking _TMP\n");
>+	  status = AE_OK;
>+	  element->type = ACPI_TYPE_INTEGER;
>+	  element->integer.value = 3000; /* 27 C, in deciKelvins */
>+	}
>+
> 	if (ACPI_FAILURE(status)) {
> 		acpi_util_eval_error(handle, pathname, status);
> 		return_ACPI_STATUS(status);
>
>This diff is in addition to the previous debugging changes to
>thermal.c.

If you do it this way, every thermal zone's _TMP will be faked.
If you remove the real THM0._TMP, and fake a dummy THM0._TMP
in the DSDT, and don't change anything in the kernel, then if S3 works
well, I will be convinced that THM0._TMP was causing the trouble.
Yes, I'm asking you to override the DSDT for debugging. :-)
But please make sure you don't change anything else in the DSDT; otherwise
the results still won't be trusted. :-)
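
The difference between faking by method name and faking one zone's method can be sketched in a few lines of C. This is an illustration only: the quoted utils.c hack sees just a handle and a pathname, so the zone string passed to fake_by_zone is an assumption made for the sketch, and both function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Mirrors the test in the quoted utils.c hack: the decision depends on
 * the method name alone, so every thermal zone is affected. */
int fake_by_name(const char *method)
{
    assert(method != NULL);
    return strcmp(method, "_TMP") == 0;
}

/* Hypothetical zone-scoped variant: fake _TMP only for one named zone,
 * which is what a dummy THM0._TMP in the DSDT achieves. */
int fake_by_zone(const char *zone, const char *method, const char *target_zone)
{
    assert(zone != NULL && method != NULL && target_zone != NULL);
    return strcmp(zone, target_zone) == 0 && strcmp(method, "_TMP") == 0;
}
```

With target_zone set to "THM0", the second predicate leaves THM2, THM6 and THM7 evaluating their real _TMP methods, so a clean result would point at THM0 specifically.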

Anyway, I'm studying THM0._TMP and trying to figure out how it is related
to the EC.

Thanks,
Luming

^ permalink raw reply	[flat|nested] 90+ messages in thread
* RE: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]
@ 2006-03-15  6:25 Yu, Luming
  0 siblings, 0 replies; 90+ messages in thread
From: Yu, Luming @ 2006-03-15  6:25 UTC (permalink / raw)
  To: Sanjoy Mahajan
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, Tom Seeley,
	Dave Jones, Jiri Slaby, michael, mchehab, Brian Marete,
	Ryan Phillips, gregkh, Brown, Len, linux-acpi, Mark Lord,
	Randy Dunlap, jgarzik, Duncan, Pavlik Vojtech, Meelis Roos


>One sad piece of data that I came across, perhaps worth investigating
>further after this one is chased down:
>
>As described in the last email, the combination of _TMP fakery (in
>utils.c) plus the bisecting version of thermal.c (loading only the
>zone THM0 and then only up to bisect_get_info=1) got rid of the hangs.
>
>So I got bold and tried _TMP fakery but with the vanilla thermal.c.
>The idea being that if _TMP is to blame for all the problems, then S3
>sleep should work fine with this setup.  But it hung in the usual way,
>on the second sleep.  Below are the dmesgs after the usual boot-time
>ones.
>
>This experiment produces a hang even with _TMP faked, whereas the
>previous experiment didn't (also with _TMP faked but, after the boot,
>loading only the THM0 zone and only doing the _TMP methods of it, even
>on wake).  So one of the non-TMP methods below must be causing a
>problem?  My suspicion is that it's one of the methods called on wake
>(_THM0._PSV or ._TC1, etc. or maybe one of the other zone's methods),
>which would explain why the first sleep goes fine but the second one
>fails.
>
>I don't think it's any of the calls made when 'thermal' is loading at
>boot time, because the same calls happen in the previous experiment.
>In that experiment, thermal loads normally (with _TMP faked), and only
>after boot do I unload it and replace it with
>
>  modprobe thermal zone_to_keep=0 bisect_get_info=1
>
>Anyway, here are the dmesgs for this experiment (hangs on 2nd sleep):

OK, let's change the way of hacking. Let's start the bisection
without touching the kernel, working in the DSDT instead.
First, you need to find out which THM zone.
Then, which methods.
Finally, which statements trigger the S3 hang.
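
The three-stage search above (zone, then method, then statement) is a bisection at each stage. A minimal sketch of one stage follows, assuming a single culprit among n candidates and a tester-supplied oracle that reports whether the hang reproduces with only a given range left enabled. Both assumptions are simplifications (more than one method may be involved), and bisect_culprit and single_culprit_oracle are hypothetical names.

```c
#include <assert.h>

/* Oracle supplied by the tester: returns nonzero if the hang reproduces
 * when only candidates [lo, hi) are left enabled. */
typedef int (*hang_oracle)(int lo, int hi, void *ctx);

/* Binary search for a single culprit among n candidates.
 * Stores the number of test runs in *tests_run if it is non-null. */
int bisect_culprit(int n, hang_oracle hangs, void *ctx, int *tests_run)
{
    assert(n > 0 && hangs != 0);

    int lo = 0, hi = n, count = 0;

    while (hi - lo > 1) {
        int mid = lo + (hi - lo) / 2;

        count++;
        if (hangs(lo, mid, ctx))
            hi = mid;       /* culprit is in the lower half */
        else
            lo = mid;       /* otherwise assume it is in the upper half */
    }
    if (tests_run)
        *tests_run = count;
    return lo;
}

/* Example oracle for a dry run: ctx points at the culprit's index. */
int single_culprit_oracle(int lo, int hi, void *ctx)
{
    int culprit = *(int *)ctx;

    return culprit >= lo && culprit < hi;
}
```

For the four zones in the earlier log (THM0, THM2, THM6, THM7, indexed 0 to 3) this needs two test runs per stage; in practice each run is several suspend cycles, since the hang is intermittent.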

Thanks,
Luming


^ permalink raw reply	[flat|nested] 90+ messages in thread
* RE: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]
@ 2006-03-15  6:47 Yu, Luming
  2006-03-15  7:06 ` Sanjoy Mahajan
  0 siblings, 1 reply; 90+ messages in thread
From: Yu, Luming @ 2006-03-15  6:47 UTC (permalink / raw)
  To: Sanjoy Mahajan
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, Tom Seeley,
	Dave Jones, Jiri Slaby, michael, mchehab, Brian Marete,
	Ryan Phillips, gregkh, Brown, Len, linux-acpi, Mark Lord,
	Randy Dunlap, jgarzik, Duncan, Pavlik Vojtech, Meelis Roos


>> If you remove the real THM0._TMP, and fake a dummy THM0._TMP in
>> DSDT, and don't change anything in kernel, then if S3 works well, I
>> will be convinced that THM0._TMP was causing trouble.
>
>I'll try it, to test my theory above!  But one clarification first: Do
>you mean that I use a vanilla thermal.c, or should I keep using the
>modified thermal.c with zone_to_keep=0 as the module parameter?  I
>don't think I should revert to the vanilla thermal.c.  Suppose that there are
>two bugs, which I think is likely (see previous email).  Commenting
>out only THM0._TMP but preserving everything else in the DSDT & kernel
>might eliminate any bug caused by THM0._TMP.  But if it still hangs --
>and I'm pretty sure it will -- it means there's another bug
>somewhere else.

OK, I'm fine with whichever way you choose to start. But I think you need to
verify the findings with the unmodified kernel, unmodified thermal.c, and
everything else that can reproduce the S3 hang with the unmodified DSDT.




^ permalink raw reply	[flat|nested] 90+ messages in thread
* RE: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]
@ 2006-03-15  8:02 Yu, Luming
  2006-03-16  0:03 ` Sanjoy Mahajan
                   ` (2 more replies)
  0 siblings, 3 replies; 90+ messages in thread
From: Yu, Luming @ 2006-03-15  8:02 UTC (permalink / raw)
  To: Sanjoy Mahajan
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, Tom Seeley,
	Dave Jones, Jiri Slaby, michael, mchehab, Brian Marete,
	Ryan Phillips, gregkh, Brown, Len, linux-acpi, Mark Lord,
	Randy Dunlap, jgarzik, Duncan, Pavlik Vojtech, Meelis Roos

>So, before I begin that search, which THM0 methods can I safely get
>rid of?  All of AC0M, AC1M, PSL, TC1, TC2, TSP, TBL0, MODP, _CRT, AL0,
>PSL?  That'll leave MODE, _TMP, _AC0, _SCP, PSV to bisect among.

For example, I would fake these methods this way:

            Method (_TMP, 0, NotSerialized)
            {
                Return (0x0BB8)
            }

            Method (_PSV, 0, NotSerialized)
            {
                Return (0)
            }

Execute Method: [\_TZ_.THM0._TMP] (Node c157bf88)
Execute Method: [\_TZ_.THM0._PSV] (Node c157be48)
Execute Method: [\_TZ_.THM0._TC1] (Node c157bdc8)
Execute Method: [\_TZ_.THM0._TC2] (Node c157bd88)
Execute Method: [\_TZ_.THM0._TSP] (Node c157bd48)
Execute Method: [\_TZ_.THM0._AC0] (Node c157bf48)
Execute Method: [\_TZ_.THM0._SCP] (Node c157bec8)
Execute Method: [\_TZ_.THM0._TMP] (Node c157bf88)
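
The dummy _TMP above returns 0x0BB8, which is 3000 in decimal; ACPI temperatures are in tenths of a kelvin, so this is the same 27 C value used in the earlier utils.c hack. A small sketch of the conversion, assuming round-to-nearest and a 273.2 K offset; dk_to_celsius is a hypothetical helper, not the kernel's macro.

```c
#include <assert.h>

/* Convert tenths of a kelvin to whole degrees Celsius, rounding to
 * nearest. The 2732 offset approximates 273.15 K in tenths. */
long dk_to_celsius(long dk)
{
    assert(dk >= 0);

    long tenths = dk - 2732;

    return tenths >= 0 ? (tenths + 5) / 10 : (tenths - 5) / 10;
}
```

Here 3000 - 2732 = 268 tenths of a degree, which rounds to 27 C.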

^ permalink raw reply	[flat|nested] 90+ messages in thread
* RE: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]
@ 2006-03-16  6:41 Yu, Luming
  2006-03-16  6:54 ` Sanjoy Mahajan
  2006-03-16  7:14 ` Sanjoy Mahajan
  0 siblings, 2 replies; 90+ messages in thread
From: Yu, Luming @ 2006-03-16  6:41 UTC (permalink / raw)
  To: Sanjoy Mahajan
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, Tom Seeley,
	Dave Jones, Jiri Slaby, michael, mchehab, Brian Marete,
	Ryan Phillips, gregkh, Brown, Len, linux-acpi, Mark Lord,
	Randy Dunlap, jgarzik, Duncan, Pavlik Vojtech, Meelis Roos

>   hang iff (TMP & (PSV | AC0)).

Very interesting! 

I found this common code in _PSV and _AC0:

 Store (DerefOf (Index (DerefOf (MODP (0x01)), Local1)), Local0)

Could you just comment that out?

We are very near the root cause.

Thanks,
luming




^ permalink raw reply	[flat|nested] 90+ messages in thread
* RE: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]
@ 2006-03-16  7:28 Yu, Luming
  2006-03-16  7:57 ` Sanjoy Mahajan
  0 siblings, 1 reply; 90+ messages in thread
From: Yu, Luming @ 2006-03-16  7:28 UTC (permalink / raw)
  To: Sanjoy Mahajan
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, Tom Seeley,
	Dave Jones, Jiri Slaby, michael, mchehab, Brian Marete,
	Ryan Phillips, gregkh, Brown, Len, linux-acpi, Mark Lord,
	Randy Dunlap, jgarzik, Duncan, Pavlik Vojtech, Meelis Roos

>It doesn't hang.  Though it seemed close to hanging a couple times,
>but after a 5-10 second pause always managed to go to sleep.  I tried
>about 15 sleep cycles, with a few echo 1 > polling_frequency thrown in.

The ACPI spec defines:

_PSV:  thermal zone object that returns the passive trip point in
	tenths of degrees Kelvin.

_ACx:  thermal zone object that returns active cooling policy
	threshold values in tenths of degrees Kelvin.

I suspect that, at the time of the hang, the system was trying to start
active cooling with the fan in the function acpi_thermal_active, and that
this request somehow conflicted with _PTS's call to SMPI in the BIOS.
So, the solution is:

	Disable active/passive cooling requests before suspend.

To verify this, please hack acpi_thermal_active.

We need a suspend/resume method for ACPI thermal to solve your problem
cleanly.
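
The idea of a suspend/resume method for the thermal driver can be sketched as a flag that the periodic check consults before evaluating any AML. This is a userspace model under stated assumptions, not the driver: struct zone_model and the zone_* functions are hypothetical, and a real implementation would also need to wait for checks that are already in flight.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of one thermal zone (hypothetical). */
struct zone_model {
    bool suspended;     /* set before entering S3, cleared on resume */
    int evaluations;    /* how many times AML would have been run */
};

void zone_suspend(struct zone_model *z)
{
    assert(z != 0);
    z->suspended = true;
}

void zone_resume(struct zone_model *z)
{
    assert(z != 0);
    z->suspended = false;
}

/* Stand-in for the periodic thermal check. Returns true if it ran. */
bool zone_check(struct zone_model *z)
{
    assert(z != 0);
    if (z->suspended)
        return false;   /* no _TMP/_PSV/_ACx evaluation while suspending */
    z->evaluations++;
    return true;
}
```

Between zone_suspend and zone_resume the check is a no-op, so no cooling request can overlap with the _PTS call in this model.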

Thanks,
Luming

^ permalink raw reply	[flat|nested] 90+ messages in thread
* RE: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]
@ 2006-03-16  8:18 Yu, Luming
  2006-03-16 15:15 ` Sanjoy Mahajan
  0 siblings, 1 reply; 90+ messages in thread
From: Yu, Luming @ 2006-03-16  8:18 UTC (permalink / raw)
  To: Sanjoy Mahajan
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, Tom Seeley,
	Dave Jones, Jiri Slaby, michael, mchehab, Brian Marete,
	Ryan Phillips, gregkh, Brown, Len, linux-acpi, Mark Lord,
	Randy Dunlap, jgarzik, Duncan, Pavlik Vojtech, Meelis Roos


>> To verify this, please hack acpi_thermal_active.
>
>Do you mean hack it for now to return without doing anything (like if
>'tz' wasn't valid)?  Or do it farther in the function, like by
>changing
>
>				result =
>				    acpi_bus_set_power(active->devices.
>						       handles[j],
>						       ACPI_STATE_D0);
>to 
>
>				result = 1;
>
>>  Disable active/passive cooling request before suspend.

Yes, just return, and DON'T do anything that could have an impact on the platform.

>
>Do I need to hack acpi_thermal_passive() as well?

Yes.

Please also make sure you have the vanilla DSDT, the vanilla kernel, and just
the hacked acpi_thermal_active/passive.

I'm waiting for your good news.
If it is the root cause, you will probably need to come up with a real patch.
:-)


Thanks,
Luming

^ permalink raw reply	[flat|nested] 90+ messages in thread
* RE: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]
@ 2006-03-17  1:17 Yu, Luming
  2006-03-17  6:28 ` Sanjoy Mahajan
  0 siblings, 1 reply; 90+ messages in thread
From: Yu, Luming @ 2006-03-17  1:17 UTC (permalink / raw)
  To: Sanjoy Mahajan
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, Tom Seeley,
	Dave Jones, Jiri Slaby, michael, mchehab, Brian Marete,
	Ryan Phillips, gregkh, Brown, Len, linux-acpi, Mark Lord,
	Randy Dunlap, jgarzik, Duncan, Pavlik Vojtech, Meelis Roos

>Bad news.  It hangs when I do the usual stress test:

Hmm,  we can continue to have fun with debugging. Right?

>
>echo 1 > THM0/polling_frequency
>sleep.sh
>sleep.sh
>
>The second sleep.sh hangs going to sleep.  It is in an endless loop
>printing the following line, once per second (from the
>polling_frequency):
>
>  Execute Method: [\_TZ_.THM0._TMP] (Node c157bf88)

This should be a different problem from the previously reported hang.
I recall that one hung in a loop in SMPI, waiting for the BIOS's response.
Please confirm. Also, please mute THM0 polling.

>
>> Please also make sure you have vanilla DSDT
>
>$ grep DSDT /boot/config-2.6.16-rc5.fake-thermal_active+passive
># CONFIG_ACPI_CUSTOM_DSDT is not set
>
>> vanilla Kernel, and just hacked acpi_thermal_active/passive.
>
>Only diff between pristine 2.6.16-rc5 tree and mine is:
>
>diff -rup /tmp/linux-2.6.16-rc5/drivers/acpi/thermal.c /usr/src/linux-2.6.16-rc5/drivers/acpi/thermal.c
>--- /tmp/linux-2.6.16-rc5/drivers/acpi/thermal.c	2006-02-27 00:09:35.000000000 -0500
>+++ /usr/src/linux-2.6.16-rc5/drivers/acpi/thermal.c	2006-03-16 09:45:30.000000000 -0500
>@@ -526,6 +526,8 @@ static void acpi_thermal_passive(struct 
> 
> 	ACPI_FUNCTION_TRACE("acpi_thermal_passive");
> 
>+	return;
>+
> 	if (!tz || !tz->trips.passive.flags.valid)
> 		return;
> 
>@@ -615,6 +617,8 @@ static void acpi_thermal_active(struct a
> 
> 	ACPI_FUNCTION_TRACE("acpi_thermal_active");
> 
>+	return;
>+
> 	if (!tz)
> 		return;
> 
>

This looks ok for debugging.

^ permalink raw reply	[flat|nested] 90+ messages in thread
* RE: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]
@ 2006-03-17  6:57 Yu, Luming
  2006-03-17  7:11 ` Sanjoy Mahajan
  2006-03-17  7:32 ` Sanjoy Mahajan
  0 siblings, 2 replies; 90+ messages in thread
From: Yu, Luming @ 2006-03-17  6:57 UTC (permalink / raw)
  To: Sanjoy Mahajan
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, Tom Seeley,
	Dave Jones, Jiri Slaby, michael, mchehab, Brian Marete,
	Ryan Phillips, gregkh, Brown, Len, linux-acpi, Mark Lord,
	Randy Dunlap, jgarzik, Duncan, Pavlik Vojtech, Meelis Roos

>> Hmm,  we can continue to have fun with debugging. Right?
>
>Definitely, I haven't given up.

Great!

>
>>> The second sleep.sh hangs going to sleep.  It is in an endless loop
>>> printing the following line, once per second (from the
>>> polling_frequency):
>>>
>>>  Execute Method: [\_TZ_.THM0._TMP] (Node c157bf88)
>
>I don't think these lines are a problem.  They just reflect that
>thermal polling is happening once per second.  So even though the ACPI
>system is hanging in the SMPI loop (as you say below), it is alive
>enough to poll the temperature sensors.
>
>> Also please mute THM0 polling.
>
>I retested the hacked kernel (with faked thermal_active/passive)
>but with no thermal polling, just doing
>
>  cat THM*/polling_frequency (they were all 'polling disabled')
>  sleep.sh  (works)
>  sleep.sh  (hangs in the usual SMPI loop)
>
>and it hangs as usual.

Good news: no new branch needs tracking.
I assume the problem is still like _TMP & (_PSV | _AC0).

How about re-testing the dummy _PSV and dummy _AC0 in the DSDT?
Your test result with the dummy _PSV and dummy _AC0
is NOT consistent with the result of hacking
acpi_thermal_passive/active.
Maybe I need to reconsider the impact of _PSV or _AC0 on the
platform.

How about just faking _TMP in the DSDT? I'm sure you have done this before,
but I need to confirm that the problem is NOT _TMP | _PSV | _AC0.

Thanks,
Luming

^ permalink raw reply	[flat|nested] 90+ messages in thread
* RE: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]
@ 2006-03-17  7:50 Yu, Luming
  2006-03-17 18:43 ` Sanjoy Mahajan
  0 siblings, 1 reply; 90+ messages in thread
From: Yu, Luming @ 2006-03-17  7:50 UTC (permalink / raw)
  To: Sanjoy Mahajan
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, Tom Seeley,
	Dave Jones, Jiri Slaby, michael, mchehab, Brian Marete,
	Ryan Phillips, gregkh, Brown, Len, linux-acpi, Mark Lord,
	Randy Dunlap, jgarzik, Duncan, Pavlik Vojtech, Meelis Roos

>> How about re-testing dummy _PSV and dummy _AC0 in DSDT?
>
>Just retested and you were right.  This time I managed to get it to
>hang, after many cycles of sleep.sh and "modprobe -r thermal ;
>modprobe thermal" mixed in.
>

Hmm, may I conclude that this is a problem with _TMP alone?

It is neither _TMP && (_PSV || _AC0),
nor _TMP || _PSV || _AC0.
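
The three candidate conditions can be written as predicates and compared over all eight combinations of "real method left in place", which shows which experiments can tell them apart. This is only a logical sketch; the function names are hypothetical and it encodes no test results from this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Each predicate answers: does this hypothesis predict a hang when the
 * given real methods are left in place? */
bool h_tmp_and_either(bool tmp, bool psv, bool ac0)
{
    return tmp && (psv || ac0);
}

bool h_any(bool tmp, bool psv, bool ac0)
{
    return tmp || psv || ac0;
}

bool h_tmp_only(bool tmp, bool psv, bool ac0)
{
    (void)psv;
    (void)ac0;
    return tmp;
}

typedef bool (*hypothesis)(bool, bool, bool);

/* Count the input combinations (out of 8) on which a and b disagree. */
int disagreements(hypothesis a, hypothesis b)
{
    assert(a != 0 && b != 0);

    int n = 0;

    for (int bits = 0; bits < 8; bits++) {
        bool tmp = bits & 4, psv = bits & 2, ac0 = bits & 1;

        if (a(tmp, psv, ac0) != b(tmp, psv, ac0))
            n++;
    }
    return n;
}
```

The "_TMP alone" and "_TMP && (_PSV || _AC0)" hypotheses differ in exactly one configuration: real _TMP with both _PSV and _AC0 faked. That is the configuration that separates them experimentally.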

So, please try hacking thermal.c by removing the calls to _TMP.
Then do the stress test with the vanilla kernel and vanilla DSDT, with just
the hacked thermal.c.

Anyway, the clean way to fix your problem might be:

 Suspend the thermal driver, disabling the AML method invocations
that might cause or trigger BIOS issues.

Thanks,
Luming

^ permalink raw reply	[flat|nested] 90+ messages in thread
* RE: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]
@ 2006-03-18  2:02 Yu, Luming
  2006-03-18  7:23 ` Sanjoy Mahajan
  0 siblings, 1 reply; 90+ messages in thread
From: Yu, Luming @ 2006-03-18  2:02 UTC (permalink / raw)
  To: Sanjoy Mahajan
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, Tom Seeley,
	Dave Jones, Jiri Slaby, michael, mchehab, Brian Marete,
	Ryan Phillips, gregkh, Brown, Len, linux-acpi, Mark Lord,
	Randy Dunlap, jgarzik, Duncan, Pavlik Vojtech, Meelis Roos

>> So, please try hack thermal.c by removing calls to _TMP.
>
>I did something like that before, by changing acpi_evaluate_integer()
>to return 3000 if it is asked for _TMP.  
>
>--- a/utils.c	2006-03-15 01:42:34.000000000 -0500
>+++ b/utils.c	2006-03-14 23:36:59.000000000 -0500
>@@ -270,7 +270,15 @@ acpi_evaluate_integer(acpi_handle handle
> 	memset(element, 0, sizeof(union acpi_object));
> 	buffer.length = sizeof(union acpi_object);
> 	buffer.pointer = element;
>-	status = acpi_evaluate_object(handle, pathname, 
>arguments, &buffer);
>+	if (strcmp(pathname, "_TMP") != 0)
>+	  status = acpi_evaluate_object(handle, pathname, 
>arguments, &buffer);
>+	else {
>+	  printk(KERN_INFO PREFIX "acpi_evaluate_integer: 
>Faking _TMP\n");
>+	  status = AE_OK;
>+	  element->type = ACPI_TYPE_INTEGER;
>+	  element->integer.value = 3000; /* 27 C, in deciKelvins */
>+	}
>+
> 	if (ACPI_FAILURE(status)) {
> 		acpi_util_eval_error(handle, pathname, status);
> 		return_ACPI_STATUS(status);
>
>
>The alternative, obvious change in thermal.c (diff below) turns out
>not to be a minimal change.  If acpi_thermal_get_temperature() returns
>with a failure, then most of the later methods in THM0 aren't
>executed, so one is actually commenting out much more than _TMP.
>
>Which is why I think the minimal change is the diff above to utils.c.
>With that change the system never hung.

Good, this is exactly what I wanted.  How many times did you test with
this hack without a hang?  If the S3 hang really goes away, then you
can probably move on and come up with a real patch that could go into
2.6.16.  What do you think? :-)

The short-term proper way could be:
1. add a global variable: acpi_in_suspend.
2. in acpi_pm_prepare:
	a.call acpi_os_wait_events_complete()
	b.set acpi_in_suspend = YES.
   in acpi_pm_finish :
	set acpi_in_suspend = NO.
3. in acpi_thermal_run:
	if (acpi_in_suspend == YES)
		do nothing.
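
A minimal userspace sketch of the three steps above, for illustration
only.  It is not kernel code: pm_prepare_model, pm_finish_model,
thermal_run_model and the tmp_evaluations counter are hypothetical
stand-ins, and wait_events_model is an empty placeholder for
acpi_os_wait_events_complete().

```c
#include <stdbool.h>

/* Step 1: global flag (models the proposed acpi_in_suspend). */
static bool acpi_in_suspend_model;

/* Counts how often the simulated thermal poll "evaluates _TMP". */
static int tmp_evaluations;

/* Placeholder for draining already-queued deferred work (step 2a). */
static void wait_events_model(void)
{
}

/* Step 2: models acpi_pm_prepare. */
static void pm_prepare_model(void)
{
	wait_events_model();
	acpi_in_suspend_model = true;
}

/* Step 2 (resume side): models acpi_pm_finish. */
static void pm_finish_model(void)
{
	acpi_in_suspend_model = false;
}

/* Step 3: models acpi_thermal_run, which skips its work while suspending. */
static void thermal_run_model(void)
{
	if (acpi_in_suspend_model)
		return;		/* do nothing: no AML is invoked */
	tmp_evaluations++;	/* stands in for evaluating _TMP */
}
```

In this model a poll between pm_prepare_model() and pm_finish_model()
leaves tmp_evaluations unchanged, and polling resumes afterwards.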

The long-term proper way should be:
1. The ACPI subsystem should stop invoking the BIOS before suspend,
except for the few AML methods that are required to put the platform
into the S3 state.  Otherwise, an untested BIOS code path could cause
trouble for Linux, because I assume such platforms were tested
under Windows.

Thanks,
Luming
 

^ permalink raw reply	[flat|nested] 90+ messages in thread
* RE: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]
@ 2006-03-18 13:24 Yu, Luming
  2006-03-18 14:37 ` Sanjoy Mahajan
  0 siblings, 1 reply; 90+ messages in thread
From: Yu, Luming @ 2006-03-18 13:24 UTC (permalink / raw)
  To: Sanjoy Mahajan
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, Tom Seeley,
	Dave Jones, Jiri Slaby, michael, mchehab, Brian Marete,
	Ryan Phillips, gregkh, Brown, Len, linux-acpi, Mark Lord,
	Randy Dunlap, jgarzik, Duncan, Pavlik Vojtech, Meelis Roos

>> The short-term proper way could be:
>> 1. add a global variable: acpi_in_suspend.
>> 2. in acpi_pm_prepare:
>>	a.call acpi_os_wait_events_complete()
>> 	b.set acpi_in_suspend = YES.
>>    in acpi_pm_finish :
>> 	set acpi_in_suspend = NO.
>> 3. in acpi_thermal_run:
>> 	if (acpi_in_suspend == YES)
>>		do nothing.
>
>I tested the included diff to implement the above short-term fix.  It
>also hung on the second sleep.  BUT, it's the same reason that the
>utils.c change didn't help: because acpi_thermal_add() was loading
>THM[0267].  After the usual modification to acpi_thermal_add() to have
>it ignore THM[267], the system didn't hang (12 cycles).  Which is
>progress.

Hmm, you probably also need to do:

4. in acpi_thermal_notify,
	if (acpi_in_suspend == YES)
		do nothing.

^ permalink raw reply	[flat|nested] 90+ messages in thread
* RE: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]
@ 2006-03-18 15:10 Yu, Luming
  2006-03-18 15:48 ` Sanjoy Mahajan
  0 siblings, 1 reply; 90+ messages in thread
From: Yu, Luming @ 2006-03-18 15:10 UTC (permalink / raw)
  To: Sanjoy Mahajan
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, Tom Seeley,
	Dave Jones, Jiri Slaby, michael, mchehab, Brian Marete,
	Ryan Phillips, gregkh, Brown, Len, linux-acpi, Mark Lord,
	Randy Dunlap, jgarzik, Duncan, Pavlik Vojtech, Meelis Roos


>> Hmm,  probably, you need to do :
>>
>> 4. in acpi_thermal_notify,
>>       if (acpi_in_suspend == YES)
>>               do nothing.
>
>I've just tested that.  It suspended twice without problem, which made
>me think the problem was solved.  But it hung on the third suspend!

I'm NOT surprised by that hang, because kacpid is a kernel worker
thread that has the PF_NOFREEZE flag, which means kacpid won't be
frozen.  I tried to freeze kacpid, but ended up with this conclusion.
From my understanding, for safety, kernel worker threads should be
frozen, because kacpid could invoke the AML methods that we are trying
to avoid during suspend.

Please try an additional ugly hack:
 5. in acpi_os_queue_for_execution:
	if(acpi_in_suspend == YES)
		do nothing.

Also, please add the acpi_debug_layer=0x10 acpi_debug_level=0x10
boot options; then you can observe which methods were executed
before suspend.

Thanks,
Luming

^ permalink raw reply	[flat|nested] 90+ messages in thread
* RE: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]
@ 2006-03-18 15:58 Yu, Luming
  2006-03-18 16:27 ` Sanjoy Mahajan
  0 siblings, 1 reply; 90+ messages in thread
From: Yu, Luming @ 2006-03-18 15:58 UTC (permalink / raw)
  To: Sanjoy Mahajan
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, Tom Seeley,
	Dave Jones, Jiri Slaby, michael, mchehab, Brian Marete,
	Ryan Phillips, gregkh, Brown, Len, linux-acpi, Mark Lord,
	Randy Dunlap, jgarzik, Duncan, Pavlik Vojtech, Meelis Roos


>> Please try additional ugly hack
>>  5. in acpi_os_queue_for_execution:
>>	if(acpi_in_suspend == YES)
>>		do nothing.
>
>Am compiling it.  If acpi_in_suspend, I've had it do
>return_ACPI_STATUS(AE_BAD_PARAMETER).  Is there a better error code to
>use?  I didn't want to use AE_OK, since the caller might think that
>the function will be executed eventually, and might do something silly
>like wait for it to be executed -- and produce another hang.  I didn't
>know, but to be safe I wanted to return an error code.

Just return AE_OK, because we are hacking. :-)
The only place that could have an issue is acpi_ev_global_lock_handler;
you can add a printk there to see what happens.

>
>> Also, please add acpi_debug_layer=0x10 acpi_debug_leve=0x10 boot
>> option, then you can observe what methods were executed before
>> suspend.
>
>That's in my lilo.conf so all kernels I test use those options.  I can
>send you the dmesgs from the suspends without the ugly hack (and will
>send them from the upcoming suspends, with the ugly hack).

Thanks, I'm waiting for that to understand whether the hack is a clean
way to kill the unwanted AML method calls.

^ permalink raw reply	[flat|nested] 90+ messages in thread
* RE: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]
@ 2006-03-18 16:37 Yu, Luming
  2006-03-18 17:03 ` Sanjoy Mahajan
  0 siblings, 1 reply; 90+ messages in thread
From: Yu, Luming @ 2006-03-18 16:37 UTC (permalink / raw)
  To: Sanjoy Mahajan
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, Tom Seeley,
	Dave Jones, Jiri Slaby, michael, mchehab, Brian Marete,
	Ryan Phillips, gregkh, Brown, Len, linux-acpi, Mark Lord,
	Randy Dunlap, jgarzik, Duncan, Pavlik Vojtech, Meelis Roos


>Here first are the dmesgs from suspending with a vanilla 2.6.16-rc5.  I
>did only one cycle so that it didn't hang and I could edit this email
>without rebooting (but later suspends produce the same method 
>calls, I'm
>90% sure):
>
># the sleep dmesgs
>PM: Preparing system for mem sleep
>Stopping tasks: 
>=======================================================|
Did you see any methods before and after this line on the screen in
the hang case?  If yes, do you recall what they were?

^ permalink raw reply	[flat|nested] 90+ messages in thread
* RE: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]
@ 2006-03-18 17:08 Yu, Luming
  2006-03-18 20:12 ` Sanjoy Mahajan
  0 siblings, 1 reply; 90+ messages in thread
From: Yu, Luming @ 2006-03-18 17:08 UTC (permalink / raw)
  To: Sanjoy Mahajan
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, Tom Seeley,
	Dave Jones, Jiri Slaby, michael, mchehab, Brian Marete,
	Ryan Phillips, gregkh, Brown, Len, linux-acpi, Mark Lord,
	Randy Dunlap, jgarzik, Duncan, Pavlik Vojtech, Meelis Roos


>>> PM: Preparing system for mem sleep
>>> Stopping tasks: 
>>> =======================================================|
>
>> Did you see any methods before and after this line in hang case on
>> screen?  If yes, do you recall what they are?
>
>I capture across a serial console, so here are the exact msgs (I just
>ran the second sleep and got the usual hang).  This is with vanilla
>2.6.16-rc5 (and vanilla DSDT):
>
>Stopping tasks: 
>=========================================================|
>Execute Method: [\_SB_.LID0._PSW] (Node c1564808)
>Execute Method: [\_SB_.SLPB._PSW] (Node c1564708)
>Execute Method: [\_S3_] (Node c157a988)
>Execute Method: [\_PTS] (Node c157ab48)
>
>The screen itself is full of garbage because the first 
>sleep/wake messes
>up the console.  Along with a giant white square that fills most of the
>screen, I see a fuzzy, dotted version of the above messages, plus one
>more line "ACPI" and then a flashing underscore cursor after that.  I
>don't know if it was trying to printk "ACPI" but then the rest of the
>message got lost, or it hung before printing it, or whether the ACPI is
>from a previous dmesg (i.e. the first sleep/wake) that didn't get
>cleared properly.

Do you load the processor driver?

^ permalink raw reply	[flat|nested] 90+ messages in thread
* RE: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]
@ 2006-03-19  4:12 Yu, Luming
  2006-03-19 14:33 ` Sanjoy Mahajan
  2006-03-20  6:39 ` Sanjoy Mahajan
  0 siblings, 2 replies; 90+ messages in thread
From: Yu, Luming @ 2006-03-19  4:12 UTC (permalink / raw)
  To: Sanjoy Mahajan
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, Tom Seeley,
	Dave Jones, Jiri Slaby, michael, mchehab, Brian Marete,
	Ryan Phillips, gregkh, Brown, Len, linux-acpi, Mark Lord,
	Randy Dunlap, jgarzik, Duncan, Pavlik Vojtech, Meelis Roos

>> Do you load processor driver?
>
>It's loads at boot.  When thermal loads, it pulls in processor:
>
>$ lsmod | grep thermal
>thermal                17224  0 
>processor              30080  1 thermal
>

Maybe I need to summarize this issue here:
1. The S3 hang is in a While loop in SMPI that looks like it is
waiting for a BIOS response.
2. If THM2, THM6, and THM7 are disabled, disabling THM0._TMP
fixes the S3 hang.

I think you need to continue to find out which THMs and which methods
cause the S3 hang when THM0._TMP is disabled.
I assume the problem is:
THM0._TMP && THMx._XXX && THMy._YYY..

Thanks,
Luming

^ permalink raw reply	[flat|nested] 90+ messages in thread
* RE: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]
@ 2006-03-21  1:38 Yu, Luming
  2006-03-21  7:27 ` Sanjoy Mahajan
  2006-03-21  8:47 ` Sanjoy Mahajan
  0 siblings, 2 replies; 90+ messages in thread
From: Yu, Luming @ 2006-03-21  1:38 UTC (permalink / raw)
  To: Sanjoy Mahajan
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, Tom Seeley,
	Dave Jones, Jiri Slaby, michael, mchehab, Brian Marete,
	Ryan Phillips, gregkh, Brown, Len, linux-acpi, Mark Lord,
	Randy Dunlap, jgarzik, Duncan, Pavlik Vojtech, Meelis Roos

>> I think you need to continue to find out which THMs, which methods
>> cause s3 hang when THM0._TMP disabled.
>
>So far I've found that if (with no THM0 loaded) I load exactly one of
>THM2, THM6, or THM7, then there's no hang.  Now I am looking for which
>combinations of the THM[0267] zones cause the problem.

Hmm, I guess you don't need to try each combination of THM[0267].
From previous experience, we know THM0._TMP causes the problem.
If you fake _TMP for all THM zones, what happens?

If you verify that _TMP causes the issue by faking them in the DSDT,
we probably need to continue digging into Method UPDT:

                    Method (UPDT, 0, NotSerialized)
                    {
                        If (IGNR)
                        {
                            Decrement (IGNR)
                        }
                        Else
                        {
                            If (H8DR)
                            {
                                If (Acquire (I2CM, 0x0064)) {}
                                Else
                                {
                                    Store (I2RB (Zero, 0x01, 0x04), Local7)
                                    If (Local7)
                                    {
                                        Fatal (0x01, 0x80000003, Local7)
                                    }
                                    Else
                                    {
                                        Store (HBS0, TMP0)
                                        Store (HBS2, TMP2)
                                        Store (HBS6, TMP6)
                                        Store (HBS7, TMP7)
                                    }

                                    Release (I2CM)
                                }
                            }
                        }
                    }

Thanks,
Luming

^ permalink raw reply	[flat|nested] 90+ messages in thread
* RE: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]
@ 2006-03-21  9:11 Yu, Luming
  2006-03-21 20:37 ` Sanjoy Mahajan
  2006-03-21 22:09 ` Sanjoy Mahajan
  0 siblings, 2 replies; 90+ messages in thread
From: Yu, Luming @ 2006-03-21  9:11 UTC (permalink / raw)
  To: Sanjoy Mahajan
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, Tom Seeley,
	Dave Jones, Jiri Slaby, michael, mchehab, Brian Marete,
	Ryan Phillips, gregkh, Brown, Len, linux-acpi, Mark Lord,
	Randy Dunlap, jgarzik, Duncan, Pavlik Vojtech, Meelis Roos

>With _TMP faked in the kernel and one whole zone ignored, this 
>is what I
>get:
>
>Zone to ignore	|	Result
>---------------------------------------------------------------
>---------
>THM0			OK (10 cycles)
>THM2			"kernel panic! attempted to kill init"

I guess that if you fake the DSDT by completely removing THM2,
you won't see this.

>THM6			Hangs (4th cycle)
Does it still hang at SMPI?

>THM7			OK (8 cycles)
>
>So THM6 seems healthy, but THM0 and THM7 (and maybe THM2) interact
>badly.  If I unload THM2, THM6, and THM7, then it's okay (previous
>experiments with faking _TMP but with only THM0 loaded).  But unloading
>THM6 is not enough.

Please try removing THM2 to judge whether it is JUST the
problem of THM0 && THM7.

>
>The kernel panic for the don't-load-THM2 kernel is very strange.  I had
>another kernel panic while doing another set of tests, which I also
>couldn't explain.  The only difference between the no-THM0 and the
>no-THM2 kernels is:

Could you just printk device->pnp?  It could be a null pointer (due to
your hack?)

>
>diff -r b7ad6c906aba -r 213308f0ec31 drivers/acpi/thermal.c
>--- a/drivers/acpi/thermal.c	Tue Mar 21 02:23:30 2006 -0500
>+++ b/drivers/acpi/thermal.c	Tue Mar 21 02:36:42 2006 -0500
>@@ -1324,7 +1324,7 @@ static int acpi_thermal_add(struct acpi_
> 
> 	if (!device)
> 		return_VALUE(-EINVAL);
>-	if (strcmp("THM2", device->pnp.bus_id) == 0) {
>+	if (strcmp("THM0", device->pnp.bus_id) == 0) {
> 	    printk(KERN_INFO PREFIX "thermal_add: ignoring %s\n",
> 		   device->pnp.bus_id);
> 	    return_VALUE(-EINVAL);
>
>

^ permalink raw reply	[flat|nested] 90+ messages in thread
* RE: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]
@ 2006-03-22  1:30 Yu, Luming
  2006-03-22  4:35 ` Sanjoy Mahajan
  2006-03-22  7:15 ` Sanjoy Mahajan
  0 siblings, 2 replies; 90+ messages in thread
From: Yu, Luming @ 2006-03-22  1:30 UTC (permalink / raw)
  To: Sanjoy Mahajan
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, Tom Seeley,
	Dave Jones, Jiri Slaby, michael, mchehab, Brian Marete,
	Ryan Phillips, gregkh, Brown, Len, linux-acpi, Mark Lord,
	Randy Dunlap, jgarzik, Duncan, Pavlik Vojtech, Meelis Roos

>Two more experiments:
>
>  With a vanilla kernel, I faked EC0.UPDT() to just return 
>0x00, and the
>  system hung on the second sleep.
>
>  Then, again in the DSDT, I also faked the 4 _TMP methods (one in each
>  thermal zone), and the system hung on the second sleep.
>
>I think we've raced too far ahead by trying to debug many thermal zones
>at once.  Perhaps there are two bugs.  So let's find them one by one.

Hmm, you seem to prefer a depth-first search algorithm?
I like it too. :-)


>
>One bug is quite repeatable and we know a lot about it. With all zones
>except THM0 commented out, the system hung.  With the EC0.UPDT line in
>THM0._TMP also commented out, the system didn't hang.  So there's a
>problem related to the EC, even with only THM0.  And finding that
>problem may giveideas for what else may be wrong.

Can we do a bisection in EC0.UPDT to find out which statement causes
the hang?  Hmm, we are going to fix the BIOS. :-)

My assumption is that since Windows works well, this BIOS code
should have been tested OK.  The only possible excuse for the BIOS is
that Linux is using an unnecessary/untested code path for
suspend/resume.  So, eventually, we need to disable unnecessary BIOS
calls for suspend/resume.

Thanks,
Luming

^ permalink raw reply	[flat|nested] 90+ messages in thread
* RE: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]
@ 2006-03-22  1:34 Yu, Luming
  2006-03-22  7:00 ` Sanjoy Mahajan
  0 siblings, 1 reply; 90+ messages in thread
From: Yu, Luming @ 2006-03-22  1:34 UTC (permalink / raw)
  To: Yu, Luming, Sanjoy Mahajan
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, Tom Seeley,
	Dave Jones, Jiri Slaby, michael, mchehab, Brian Marete,
	Ryan Phillips, gregkh, Brown, Len, linux-acpi, Mark Lord,
	Randy Dunlap, jgarzik, Duncan, Pavlik Vojtech, Meelis Roos

>
>Hmm, you seems to prefer depth-first search algorithm?
>I like it too. :-)
>
>
>>
>>One bug is quite repeatable and we know a lot about it. With all zones
>>except THM0 commented out, the system hung.  With the EC0.UPDT line in
>>THM0._TMP also commented out, the system didn't hang.  So there's a
>>problem related to the EC, even with only THM0.  And finding that
>>problem may giveideas for what else may be wrong.
>
>We can do bisection in EC0.UPDT to find out which statement cause hang?
>Hmm, we are going to fix BIOS. :-)

You can insert debug statements in EC0.UPDT to help debug:

Store (IGNR, Debug)
Store (" before release I2CM", Debug)
Store (HBS7, TMP7)	
....

>
>My assumption is that since Windows works well, then these BIOS code
>should have been tested ok. The only possible excuse for BIOS is that
>Linux is using unnecessary/untested code path for Suspend/resume.
>So, Eventually, we need to disable unnecessary BIOS call for 
>suspend/resume

^ permalink raw reply	[flat|nested] 90+ messages in thread
* RE: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]
@ 2006-03-22  4:58 Yu, Luming
  2006-03-22  5:13 ` Sanjoy Mahajan
  2006-03-24  1:17 ` Sanjoy Mahajan
  0 siblings, 2 replies; 90+ messages in thread
From: Yu, Luming @ 2006-03-22  4:58 UTC (permalink / raw)
  To: Sanjoy Mahajan
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, Tom Seeley,
	Dave Jones, Jiri Slaby, michael, mchehab, Brian Marete,
	Ryan Phillips, gregkh, Brown, Len, linux-acpi, Mark Lord,
	Randy Dunlap, jgarzik, Duncan, Pavlik Vojtech, Meelis Roos

>> We can do bisection in EC0.UPDT to find out which statement cause
>> hang?
>
>Yes, though see below for why I don't think it'll help no 
>matter what we
>find there.

Please don't give up. :-)

I need to know which statement in EC0.UPDT triggers the problem.
That is very important for understanding the problem correctly.
If we cannot find that statement, then I will doubt the test
results that guided us here.

>
>> My assumption is that since Windows works well, then these BIOS code
>> should have been tested ok. The only possible excuse for BIOS is that
>> Linux is using unnecessary/untested code path for 
>Suspend/resume.  So,
>> Eventually, we need to disable unnecessary BIOS call for
>> suspend/resume
>
>Maybe we're not collecting the right data in that case.  We know that
>commenting out the call to UPDT in THM0.TMP fixes the hang.  
>But it does
>not follow that the osl suspend code should avoid running UPDT.

It is still my assumption that some AML code needs to be avoided
during suspend/resume, but I need data to support it.  So, we need to
dig more into EC0.UPDT.


>
>The hang may work like this: Between boot and sleep, calling 
>UPDT messes
>up something in the ec [which is why it takes >1 sleep to 
>cause a hang].
>When the system tries to sleep, that something triggers and the ec
>hangs.  But it may hang somewhere else than UPDT, and avoiding UPDT
>during sleep will not fix it.

If the BIOS does NOT behave correctly, then anything can happen.

>
>However, we do have one more piece of data.  When it hangs, it hangs in
>\_SI._SST, because I see that line on successful sleeps (as the last

I didn't know this.  I had always assumed the hang is at _PTS.SMPI.

>method before the beep) but not when it hangs (and then I also don't
>hear a beep).  There are lots of calls to EC0.XXX, including to
>EC0.BEEP, within _SST, which isn't surprising if the EC is the problem.

It could be.  But there should be something that triggers it.

>So perhaps I should bisect in _SST and put in the debug lines there?
>
>Here's another idea, which is a terrible hack.  But there are lots of
>lines in the DSDT like
>   If (LOr (SPS, WNTF))
>which I imagine is saying "If something or if WinNT".  So, 
>what if Linux
>pretends to be WinNT (or W98F -- which is another common 
>test), at least
>for the 600x?  Maybe those code paths are known to work.
>
Yes, you can try that.

Thanks,
Luming

^ permalink raw reply	[flat|nested] 90+ messages in thread
* RE: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]
@ 2006-03-22  7:28 Yu, Luming
  2006-03-22 14:16 ` Sanjoy Mahajan
  0 siblings, 1 reply; 90+ messages in thread
From: Yu, Luming @ 2006-03-22  7:28 UTC (permalink / raw)
  To: Sanjoy Mahajan
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, Tom Seeley,
	Dave Jones, Jiri Slaby, michael, mchehab, Brian Marete,
	Ryan Phillips, gregkh, Brown, Len, linux-acpi, Mark Lord,
	Randy Dunlap, jgarzik, Duncan, Pavlik Vojtech, Meelis Roos

>Since I don't think Fatal() isn't being called, I guess the problem is
>in I2RB.  But all those magic numbers in I2RB make me recultant to take
>out lines, unless you tell me which changes won't harm the hardware.
>

How about this?  The side effect of this change is that _BIF and _BST
could NOT work.  But I think that's OK.


                    Method (I2RB, 3, NotSerialized)
                    {
                        Store (Arg0, HCSL)
                        Store (ShiftLeft (Arg1, 0x01), HMAD)
                        Store (Arg2, HMCM)
                        Store (0x0B, HMPR)
          /*              Return (CHKS ())*/
                    }

^ permalink raw reply	[flat|nested] 90+ messages in thread
* RE: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]
@ 2006-03-23  4:46 Yu, Luming
  2006-03-23  6:25 ` Sanjoy Mahajan
  0 siblings, 1 reply; 90+ messages in thread
From: Yu, Luming @ 2006-03-23  4:46 UTC (permalink / raw)
  To: Sanjoy Mahajan
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, Tom Seeley,
	Dave Jones, Jiri Slaby, michael, mchehab, Brian Marete,
	Ryan Phillips, gregkh, Brown, Len, linux-acpi, Mark Lord,
	Randy Dunlap, jgarzik, Duncan, Pavlik Vojtech, Meelis Roos

>  How about this. The side effect of this change is that _BIF, 
>_BST could
>  NOT work. But I think it's just ok.
>
>
>		      Method (I2RB, 3, NotSerialized)
>		      {
>			  Store (Arg0, HCSL)
>			  Store (ShiftLeft (Arg1, 0x01), HMAD)
>			  Store (Arg2, HMCM)
>			  Store (0x0B, HMPR)
>	    /*              Return (CHKS ())*/
>		      }
>
>It hangs in the usual way (2nd sleep).  The boot messages had two Fatal
>opcodes, but that must be the _BIF and _BST that you mentioned:

Good, then the hang should be caused by:

			  Store (Arg0, HCSL)
			  Store (ShiftLeft (Arg1, 0x01), HMAD)
			  Store (Arg2, HMCM)
			  Store (0x0B, HMPR)

Could you add this at the beginning of this block:
	Store (Arg0,  Debug)
And add this at the end of this block:
	Store( HMPR, Debug)

Also change the boot options to acpi_debug_layer=0x00100010
acpi_debug_level=0x10.
Let me verify whether the EC access is OK.

>
>  Execute Method: [\_TZ_.THM0._TMP] (Node e3f8bf88)
>  ACPI: Fatal opcode executed
>  Execute Method: [\_TZ_.THM0._PSV] (Node e3f8be48)
>  Execute Method: [\_TZ_.THM0._TC1] (Node e3f8bdc8)
>  Execute Method: [\_TZ_.THM0._TC2] (Node e3f8bd88)
>  Execute Method: [\_TZ_.THM0._TSP] (Node e3f8bd48)
>  Execute Method: [\_TZ_.THM0._AC0] (Node e3f8bf48)
>  Execute Method: [\_TZ_.THM0._SCP] (Node e3f8bec8)
>  Execute Method: [\_TZ_.THM0._TMP] (Node e3f8bf88)
>  ACPI: Fatal opcode executed
>  ACPI: Thermal Zone [THM0] (47 C)
>
>With later modifications (e.g. commenting out one of the Store 
>lines), I
>could Return(0x00) instead of commenting out the line.  Let me know
>which ones to try.  

Probably yes.

>
>One more thought.  We know that commenting out the UPDT call in _TMP
>fixes the hang.  By bisecting the UPDT method, however, we change every
>call to UPDT, including the one in THM0._TMP.  So we're making extra
>changes beyond what is needed to fix the hang (and maybe producing
>another hang?).
>
>But let's continue this bisection since it's almost done.  If we
>eventually find the offending statement, we can use the information in
>order to find the smallest change that fixes the hang.  We make a copy
>of the original UPDT method, call it UPDTCOPY, say; same for 
>I2RB.  Then
>THM0._TMP can call EC0.UPDTCOPY(), which calls I2RBCOPY.  And we modify
>I2RBCOPY, but we leave I2RB and UPDT alone.
>
Yes, it's a good idea to have a separate I2RB copy for THM0, which we
are hacking.

^ permalink raw reply	[flat|nested] 90+ messages in thread
* RE: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]
@ 2006-03-23  9:10 Yu, Luming
  2006-03-23 19:19 ` Sanjoy Mahajan
  0 siblings, 1 reply; 90+ messages in thread
From: Yu, Luming @ 2006-03-23  9:10 UTC (permalink / raw)
  To: Sanjoy Mahajan
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, Tom Seeley,
	Dave Jones, Jiri Slaby, michael, mchehab, Brian Marete,
	Ryan Phillips, gregkh, Brown, Len, linux-acpi, Mark Lord,
	Randy Dunlap, jgarzik, Duncan, Pavlik Vojtech, Meelis Roos

>   Good, then the hang should be caused by:
>
>			     Store (Arg0, HCSL)
>			     Store (ShiftLeft (Arg1, 0x01), HMAD)
>			     Store (Arg2, HMCM)
>			     Store (0x0B, HMPR)
>
>   Could you add this at the beginning of this block:
>	   Store (Arg0,  Debug)
>   And add this at the end of this block:
>	   Store( HMPR, Debug)
>
>I added those two lines to the DSDT with only THM0 zone, but with
>nothing else commented out.  Below are the dmesgs for one sleep-wake
>cycle, plus an 'acpi -t'.  I thought it would hang if I did one more
>cycle, but it didn't.  So I tried five more, and it was fine too.
>
>Then I reset /proc/acpi/acpi_debug_layer to 0x10 (the boot paramater is
>acpi_dbg_layer although the /proc file is acpi_debug_layer), and
>unloaded and reloaded the thermal module.  And it hung in the 
>(expected)
>two cycles.  I've seen this behavior before: It won't hang with lots of
>debugging turned on, but it does hang with less debugging.  Strange!

Hmmm, so I cannot get the EC access log for the hang case?!

	acpi_hw_low_level_read(8, data, &ec->common.data_addr);
	ACPI_DEBUG_PRINT((ACPI_DB_INFO, "Read [%02x] from address [%02x]\n",
			  *data, address));

Does this mean we need to slow down acpi_ec_intr_read/write?
Could you try inserting acpi_os_stall(100) after the ACPI_DEBUG_PRINT
statement in both acpi_ec_intr_read and acpi_ec_intr_write?

^ permalink raw reply	[flat|nested] 90+ messages in thread
* RE: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]
@ 2006-03-24  1:31 Yu, Luming
  2006-04-04  6:49 ` Sanjoy Mahajan
  0 siblings, 1 reply; 90+ messages in thread
From: Yu, Luming @ 2006-03-24  1:31 UTC (permalink / raw)
  To: Sanjoy Mahajan
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, Tom Seeley,
	Dave Jones, Jiri Slaby, michael, mchehab, Brian Marete,
	Ryan Phillips, gregkh, Brown, Len, linux-acpi, Mark Lord,
	Randy Dunlap, jgarzik, Duncan, Pavlik Vojtech, Meelis Roos

>> Does it mean we need to slow down  acpi_ec_intr_read/write ?
>> Could you try to insert acpi_os_stall (100)  after  ACPI_DEBUG_PRINT
>> statement both in acpi_ec_intr_read/write.
>
>I added that line in those two places.  The result refused to hang with
>acpi_debug_layer=0x00100010, but it did hang (on the usual 
>second sleep)
>with it set to 0x10.

Really strange that a few printks could change the results.
Could you try replacing acpi_os_stall with acpi_os_sleep(1)
in acpi_ec_intr_read/write?

>
>> Hmmm, then I cannot get the ec access log for hang case?!
>
>It seems difficult, but let's keep trying if you have other ideas for
>how to get it.
>

Also, please change the I2RB copy to:

                    Method (I2RBcopy, 3, NotSerialized)
                    {
                        Store (Arg0, HCSL)
                        Store (ShiftLeft (Arg1, 0x01), HMAD)
                        Store (Arg2, HMCM)
                        Store (0x0B, HMPR)

                        Store (CHKS (), local0)
                        Store (local0, Debug)

                        Return (local0)
                    }
And boot with acpi_dbg_layer=0x10 acpi_dbg_level=0x10, and
post the full logs (unedited) for both the non-hang case and the hang
case on bugzilla.
There should be some clues.

Thanks,
Luming

^ permalink raw reply	[flat|nested] 90+ messages in thread
* RE: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]
@ 2006-04-05  3:03 Yu, Luming
  0 siblings, 0 replies; 90+ messages in thread
From: Yu, Luming @ 2006-04-05  3:03 UTC (permalink / raw)
  To: Sanjoy Mahajan, linux-acpi; +Cc: linux-kernel, Andrew Morton, Brown, Len


>diff -r ac486e270597 -r abd89292c539 drivers/acpi/osl.c
>--- a/drivers/acpi/osl.c	Sat Mar 18 08:35:34 2006 -0500
>+++ b/drivers/acpi/osl.c	Thu Mar 30 10:59:57 2006 -0500
>@@ -634,6 +634,8 @@ static void acpi_os_execute_deferred(voi
> 	return_VOID;
> }
> 
>+extern int acpi_in_suspend;
>+
> acpi_status
> acpi_os_queue_for_execution(u32 priority,
> 			    acpi_osd_exec_callback function, 
>void *context)
>@@ -643,6 +645,8 @@ acpi_os_queue_for_execution(u32 priority
> 	struct work_struct *task;
> 
> 	ACPI_FUNCTION_TRACE("os_queue_for_execution");
>+	if (acpi_in_suspend)	/* in case kacpid is causing 
>the queue */
>+		return_ACPI_STATUS(AE_OK);

The request will be dropped silently, so this is ugly.
At the very least, you need to print a warning here.
The long-term solution is to fix the invoker so that it does NOT ask
kacpid to invoke AML methods during the suspend/resume period.
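
To illustrate the "at least warn" suggestion, here is a small
userspace sketch, not the real osl.c code.  The names
queue_for_execution_model, suspend_flag_model, MODEL_OK and both
counters are hypothetical; fprintf() stands in for a printk(), and
MODEL_OK stands in for AE_OK.

```c
#include <stdbool.h>
#include <stdio.h>

enum model_status { MODEL_OK = 0 };	/* stands in for AE_OK */

static bool suspend_flag_model;		/* models acpi_in_suspend */
static unsigned int dropped_requests;	/* requests refused during suspend */
static unsigned int queued_requests;	/* requests accepted normally */

/*
 * Models the hacked acpi_os_queue_for_execution(): while suspending,
 * the request is dropped, but the drop is reported and counted
 * instead of happening silently.
 */
static enum model_status
queue_for_execution_model(void (*function)(void *), void *context)
{
	(void)function;		/* the model never runs the callback */
	(void)context;

	if (suspend_flag_model) {
		dropped_requests++;
		fprintf(stderr,
			"warning: deferred request dropped during suspend "
			"(%u so far)\n", dropped_requests);
		return MODEL_OK;	/* caller still sees success */
	}

	queued_requests++;
	return MODEL_OK;
}
```

The caller still gets a success status in this sketch, so the warning
is the only trace that a request was lost; that is the reason for
counting and printing it.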

> 
> 	ACPI_DEBUG_PRINT((ACPI_DB_EXEC,
> 			  "Scheduling function [%p(%p)] for 
>deferred execution.\n",
>diff -r ac486e270597 -r abd89292c539 drivers/acpi/sleep/main.c
>--- a/drivers/acpi/sleep/main.c	Sat Mar 18 08:35:34 2006 -0500
>+++ b/drivers/acpi/sleep/main.c	Thu Mar 30 10:59:57 2006 -0500
>@@ -19,6 +19,12 @@
> #include <acpi/acpi_drivers.h>
> #include "sleep.h"
> 
>+/* for functions putting machine to sleep to know that we're
>+   suspending, so that they can careful about what AML methods they
>+   invoke (to avoid trying untested BIOS code paths) */
>+int acpi_in_suspend;
>+EXPORT_SYMBOL(acpi_in_suspend);
>+
> u8 sleep_states[ACPI_S_STATE_COUNT];
> 
> static struct pm_ops acpi_pm_ops;
>@@ -55,6 +61,8 @@ static int acpi_pm_prepare(suspend_state
> 		printk("acpi_pm_prepare does not support %d 
>\n", pm_state);
> 		return -EPERM;
> 	}
>+	acpi_os_wait_events_complete(NULL);
>+	acpi_in_suspend = TRUE;
> 	return acpi_sleep_prepare(acpi_state);

There is a race condition here.
Probably, it should be:
	acpi_in_suspend = TRUE;
	acpi_os_wait_events_complete(NULL);
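
A deterministic, single-threaded sketch of why the order may matter.
This is only a model under stated assumptions: the hook argument
simulates a timer firing in the window between the two steps, and
every name here (queue_work_model, drain_work_model, the two
prepare_* functions) is hypothetical, not a kernel symbol.

```c
#include <stdbool.h>

static bool in_suspend_model;	/* models acpi_in_suspend */
static int pending_work;	/* deferred requests not yet executed */

/* Queue a deferred request; refused once the suspend flag is set. */
static bool queue_work_model(void)
{
	if (in_suspend_model)
		return false;
	pending_work++;
	return true;
}

/* Models acpi_os_wait_events_complete(): run everything queued. */
static void drain_work_model(void)
{
	pending_work = 0;
}

typedef void (*window_hook)(void);

/* Simulated thermal timer that fires between the two steps. */
static void timer_fires_model(void)
{
	(void)queue_work_model();
}

/* Order in the posted patch: drain first, then set the flag. */
static int prepare_drain_first(window_hook hook)
{
	in_suspend_model = false;
	pending_work = 0;
	drain_work_model();
	hook();			/* request slips in after the drain */
	in_suspend_model = true;
	return pending_work;	/* work left over at suspend time */
}

/* Suggested order: set the flag first, then drain. */
static int prepare_flag_first(window_hook hook)
{
	in_suspend_model = false;
	pending_work = 0;
	in_suspend_model = true;
	hook();			/* request is refused by the flag */
	drain_work_model();
	return pending_work;
}
```

With the timer hook, the drain-first order leaves one request pending
when suspend starts, while the flag-first order leaves none.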

> }
> 
>@@ -132,6 +140,7 @@ static int acpi_pm_finish(suspend_state_
> 	u32 acpi_state = acpi_suspend_states[pm_state];
> 
> 	acpi_leave_sleep_state(acpi_state);
>+	acpi_in_suspend = FALSE;
> 	acpi_disable_wakeup_device(acpi_state);
> 
> 	/* reset firmware waking vector */
>diff -r ac486e270597 -r abd89292c539 drivers/acpi/thermal.c
>--- a/drivers/acpi/thermal.c	Sat Mar 18 08:35:34 2006 -0500
>+++ b/drivers/acpi/thermal.c	Thu Mar 30 10:59:57 2006 -0500
>@@ -79,6 +79,8 @@ static int tzp;
> static int tzp;
> module_param(tzp, int, 0);
> MODULE_PARM_DESC(tzp, "Thermal zone polling frequency, in 1/10 seconds.\n");
>+
>+extern int acpi_in_suspend;
> 
> static int acpi_thermal_add(struct acpi_device *device);
> static int acpi_thermal_remove(struct acpi_device *device, int type);
>@@ -683,6 +685,8 @@ static void acpi_thermal_run(unsigned lo
> static void acpi_thermal_run(unsigned long data)
> {
> 	struct acpi_thermal *tz = (struct acpi_thermal *)data;
>+	if (acpi_in_suspend)	/* thermal methods might cause a hang */
>+		return_VOID;	/* so don't do them */

If you fixed kacpid, then this part could be removed.

> 	if (!tz->zombie)
> 		acpi_os_queue_for_execution(OSD_PRIORITY_GPE,
> 					    acpi_thermal_check, (void *)data);
>@@ -705,6 +709,8 @@ static void acpi_thermal_check(void *dat
> 
> 	state = tz->state;
> 
>+	if (acpi_in_suspend)
>+		return_VOID;

Could it cause trouble for the caller?

> 	result = acpi_thermal_get_temperature(tz);
> 	if (result)
> 		return_VOID;
>@@ -1224,6 +1230,9 @@ static void acpi_thermal_notify(acpi_han
> 	struct acpi_device *device = NULL;
> 
> 	ACPI_FUNCTION_TRACE("acpi_thermal_notify");
>+
>+	if (acpi_in_suspend)	/* thermal methods might cause a hang */
>+		return_VOID;	/* so don't do them */

Could it cause trouble for the caller?

> 
> 	if (!tz)
> 		return_VOID;

* RE: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]
@ 2006-05-23 13:29 Yu, Luming
  2006-05-23 17:12 ` Sanjoy Mahajan
  0 siblings, 1 reply; 90+ messages in thread
From: Yu, Luming @ 2006-05-23 13:29 UTC (permalink / raw)
  To: trenn, Sanjoy Mahajan
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, Tom Seeley,
	Dave Jones, Jiri Slaby, michael, mchehab, v4l-dvb-maintainer,
	video4linux-list, Brian Marete, Ryan Phillips, gregkh,
	linux-usb-devel, Brown, Len, linux-acpi, Mark Lord, Randy Dunlap,
	jgarzik, linux-ide, Duncan, Pavlik Vojtech, linux-input,
	Meelis Roos, Carl-Daniel Hailfinger


>> exregion-0185 [36] ex_system_memory_space: system_memory 0 (32 width) Address=0000000023FDFFC0
>> exregion-0185 [36] ex_system_memory_space: system_memory 1 (32 width) Address=0000000023FDFFC0
>> exregion-0290 [36] ex_system_io_space_han: system_iO 1 (8 width) Address=00000000000000B2
>> 
>> repeated endlessly.

Hmm, interesting. This looks like the same error as on the TP 600X.

>
>This sounds like the problem Daniel had on his Samsung P35 recently.
>He could fix it by getting rid of some asus_unhide_smbus stuff or the
>other way around, adding asus_unhide_smbus quirks in the S3 resume code.
>
>This thread was recently posted on lkml:
>Re: [patch] smbus unhiding kills thermal management
>
>Here are some more details, for me that sounds related...:
>https://bugzilla.novell.com/show_bug.cgi?id=173420
>

But this Samsung P35 doesn't have _GLK. So I think the TP 600X has
a different problem from the Samsung P35.

Actually, Sanjoy has a workaround that solves the TP 600X S3 issue.
What we need to do is come up with a clean patch.
It is on the to-do list.

Thanks,
Luming


end of thread, other threads:[~2006-05-23 17:13 UTC | newest]

Thread overview: 90+ messages
2006-02-27  9:04 2.6.16-rc5: known regressions Yu, Luming
2006-03-03  2:59 ` Sanjoy Mahajan
2006-03-03 16:51   ` Matthew Garrett
2006-03-03 21:04     ` Sanjoy Mahajan
2006-03-10  5:26 ` 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT] Sanjoy Mahajan
2006-05-19 13:44   ` Thomas Renninger
2006-05-21  0:12     ` Sanjoy Mahajan
2006-05-21  0:40       ` Carl-Daniel Hailfinger
2006-05-21  1:30         ` Joshua Hudson
2006-05-21  3:53           ` Lee Revell
2006-05-22  9:55       ` Pavel Machek
  -- strict thread matches above, loose matches on Subject: below --
2006-03-10  6:12 Yu, Luming
2006-03-10  6:27 ` Sanjoy Mahajan
2006-03-10  6:46 Yu, Luming
2006-03-10 13:27 ` Sanjoy Mahajan
2006-03-10 13:36 ` Sanjoy Mahajan
2006-03-13  2:00 Yu, Luming
2006-03-13  4:38 ` Sanjoy Mahajan
2006-03-13  4:51 Yu, Luming
2006-03-13  7:28 ` Sanjoy Mahajan
2006-03-13  8:35 Yu, Luming
2006-03-13 15:21 ` Sanjoy Mahajan
2006-03-14  1:48 Yu, Luming
2006-03-14  8:28 ` Sanjoy Mahajan
2006-03-15  1:46 Yu, Luming
2006-03-15  5:40 ` Sanjoy Mahajan
2006-03-15  5:57 ` Sanjoy Mahajan
2006-03-15  6:16 Yu, Luming
2006-03-15  6:35 ` Sanjoy Mahajan
2006-03-15  6:25 Yu, Luming
2006-03-15  6:47 Yu, Luming
2006-03-15  7:06 ` Sanjoy Mahajan
2006-03-15  8:02 Yu, Luming
2006-03-16  0:03 ` Sanjoy Mahajan
2006-03-16  5:47 ` Sanjoy Mahajan
2006-03-16  6:46 ` Sanjoy Mahajan
2006-03-16  6:41 Yu, Luming
2006-03-16  6:54 ` Sanjoy Mahajan
2006-03-16  7:14 ` Sanjoy Mahajan
2006-03-16  7:28 Yu, Luming
2006-03-16  7:57 ` Sanjoy Mahajan
2006-03-16  8:18 Yu, Luming
2006-03-16 15:15 ` Sanjoy Mahajan
2006-03-17  1:17 Yu, Luming
2006-03-17  6:28 ` Sanjoy Mahajan
2006-03-17  6:57 Yu, Luming
2006-03-17  7:11 ` Sanjoy Mahajan
2006-03-17  7:32 ` Sanjoy Mahajan
2006-03-17  7:50 Yu, Luming
2006-03-17 18:43 ` Sanjoy Mahajan
2006-03-18  2:02 Yu, Luming
2006-03-18  7:23 ` Sanjoy Mahajan
2006-03-18 13:24 Yu, Luming
2006-03-18 14:37 ` Sanjoy Mahajan
2006-03-18 15:10 Yu, Luming
2006-03-18 15:48 ` Sanjoy Mahajan
2006-03-18 15:58 Yu, Luming
2006-03-18 16:27 ` Sanjoy Mahajan
2006-03-18 16:37 Yu, Luming
2006-03-18 17:03 ` Sanjoy Mahajan
2006-03-18 17:08 Yu, Luming
2006-03-18 20:12 ` Sanjoy Mahajan
2006-03-19  4:12 Yu, Luming
2006-03-19 14:33 ` Sanjoy Mahajan
2006-03-20  6:39 ` Sanjoy Mahajan
2006-03-21  1:38 Yu, Luming
2006-03-21  7:27 ` Sanjoy Mahajan
2006-03-21  8:47 ` Sanjoy Mahajan
2006-03-21  9:11 Yu, Luming
2006-03-21 20:37 ` Sanjoy Mahajan
2006-03-21 22:09 ` Sanjoy Mahajan
2006-03-22  1:30 Yu, Luming
2006-03-22  4:35 ` Sanjoy Mahajan
2006-03-22  7:15 ` Sanjoy Mahajan
2006-03-22  1:34 Yu, Luming
2006-03-22  7:00 ` Sanjoy Mahajan
2006-03-22  4:58 Yu, Luming
2006-03-22  5:13 ` Sanjoy Mahajan
2006-03-24  1:17 ` Sanjoy Mahajan
2006-03-22  7:28 Yu, Luming
2006-03-22 14:16 ` Sanjoy Mahajan
2006-03-23  4:46 Yu, Luming
2006-03-23  6:25 ` Sanjoy Mahajan
2006-03-23  9:10 Yu, Luming
2006-03-23 19:19 ` Sanjoy Mahajan
2006-03-24  1:31 Yu, Luming
2006-04-04  6:49 ` Sanjoy Mahajan
2006-04-05  3:03 Yu, Luming
2006-05-23 13:29 Yu, Luming
2006-05-23 17:12 ` Sanjoy Mahajan
