* Resume problems @ 2007-10-22 16:15 Gabriel C 2007-10-22 22:35 ` Rafael J. Wysocki 0 siblings, 1 reply; 9+ messages in thread From: Gabriel C @ 2007-10-22 16:15 UTC (permalink / raw) To: Linux Kernel Mailing List; +Cc: linux-pm, Rafael J. Wysocki, linux-acpi Hi all , I'm running current git + aic7xxx suspend patch from http://bugzilla.kernel.org/show_bug.cgi?id=3062 on a Dell Precision WorkStation 530 MT SMP box ( HT enabled ). Suspend works fine but on resume I have some problems. All CPU's but boot CPU won't come back , everything else seems fine. ... Oct 22 15:02:28 lara [ 49.618795] Enabling non-boot CPUs ... Oct 22 15:02:28 lara [ 49.622211] PM: Adding info for No Bus:msr1 Oct 22 15:02:28 lara [ 49.622259] PM: Adding info for No Bus:cpu1 Oct 22 15:02:28 lara [ 49.622302] SMP alternatives: switching to SMP code Oct 22 15:02:28 lara [ 49.623536] Booting processor 1/1 eip 3000 Oct 22 15:02:28 lara [ 54.638093] Not responding. Oct 22 15:02:28 lara [ 54.638096] Inquiring remote APIC #1... Oct 22 15:02:28 lara [ 54.638099] ... APIC #1 ID: failed Oct 22 15:02:28 lara [ 54.638204] ... APIC #1 VERSION: failed Oct 22 15:02:28 lara [ 54.638307] ... APIC #1 SPIV: failed Oct 22 15:02:28 lara [ 54.638427] skipping cpu1, didn't come online Oct 22 15:02:28 lara [ 54.638602] PM: Removing info for No Bus:msr1 Oct 22 15:02:28 lara [ 54.638643] PM: Removing info for No Bus:cpu1 Oct 22 15:02:28 lara [ 54.638678] Error taking CPU1 up: -5 Oct 22 15:02:28 lara [ 54.640908] PM: Adding info for No Bus:msr2 Oct 22 15:02:28 lara [ 54.640939] PM: Adding info for No Bus:cpu2 Oct 22 15:02:28 lara [ 54.640976] SMP alternatives: switching to SMP code Oct 22 15:02:28 lara [ 54.641961] Booting processor 2/2 eip 3000 Oct 22 15:02:28 lara [ 59.656795] Not responding. Oct 22 15:02:28 lara [ 59.656799] Inquiring remote APIC #2... Oct 22 15:02:28 lara [ 59.656803] ... APIC #2 ID: failed Oct 22 15:02:28 lara [ 59.656907] ... APIC #2 VERSION: failed Oct 22 15:02:28 lara [ 59.657011] ... APIC #2 SPIV: failed Oct 22 15:02:28 lara [ 59.657131] skipping cpu2, didn't come online Oct 22 15:02:28 lara [ 59.657300] PM: Removing info for No Bus:msr2 Oct 22 15:02:28 lara [ 59.657343] PM: Removing info for No Bus:cpu2 Oct 22 15:02:28 lara [ 59.657379] Error taking CPU2 up: -5 Oct 22 15:02:28 lara [ 59.659605] PM: Adding info for No Bus:msr3 Oct 22 15:02:28 lara [ 59.659637] PM: Adding info for No Bus:cpu3 Oct 22 15:02:28 lara [ 59.659673] SMP alternatives: switching to SMP code Oct 22 15:02:28 lara [ 59.660725] Booting processor 3/3 eip 3000 Oct 22 15:02:28 lara [ 64.675517] Not responding. Oct 22 15:02:28 lara [ 64.675520] Inquiring remote APIC #3... Oct 22 15:02:28 lara [ 64.675524] ... APIC #3 ID: failed Oct 22 15:02:28 lara [ 64.675628] ... APIC #3 VERSION: failed Oct 22 15:02:28 lara [ 64.675731] ... APIC #3 SPIV: failed Oct 22 15:02:28 lara [ 64.675859] skipping cpu3, didn't come online Oct 22 15:02:28 lara [ 64.676017] PM: Removing info for No Bus:msr3 Oct 22 15:02:28 lara [ 64.676059] PM: Removing info for No Bus:cpu3 Oct 22 15:02:28 lara [ 64.676092] Error taking CPU3 up: -5 Oct 22 15:02:28 lara [ 64.676326] evxfevnt-0079 [00] enable : System is already in ACPI mode ... After I've played with a lot boot options I found out booting with ' acpi=ht ' will make the CPU's work again but now I have a problem on Suspend. Everything seems to just go down disks etc but the box itself is for some reason still on. So I've tested reboot=<> options with no luck. ( after waiting 5 minutes to be sure everything is really off I can just hit power button). On resume now everything is fine. I'm not really sure what is wrong here acpi/hibernation/cpu-hotplug or a mix of all so I'm CC'ing linux-acpi as well. The only thing I noticed is the 'Breaking affinity for irq XX' on suspend without acpi=ht messages. I can't even tell whatever other kernel versions are working because aic7xxx driver didn't got suspend support till now ( or at least never worked here ). I know suspend worked fine on windows with that box. There is my config and dmesg ( good and bad one ) : http://194.231.229.228/suspend/acpi=ht_working_dmesg.txt http://194.231.229.228/suspend/dmesg_broken_cpus_on_resume.txt http://194.231.229.228/suspend/config Regards, Gabriel ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Resume problems 2007-10-22 16:15 Resume problems Gabriel C @ 2007-10-22 22:35 ` Rafael J. Wysocki 2007-10-22 23:00 ` Gabriel C 0 siblings, 1 reply; 9+ messages in thread From: Rafael J. Wysocki @ 2007-10-22 22:35 UTC (permalink / raw) To: Gabriel C; +Cc: Linux Kernel Mailing List, linux-pm, linux-acpi On Monday, 22 October 2007 18:15, Gabriel C wrote: > Hi all , > > I'm running current git + aic7xxx suspend patch from http://bugzilla.kernel.org/show_bug.cgi?id=3062 > on a Dell Precision WorkStation 530 MT SMP box ( HT enabled ). > > Suspend works fine but on resume I have some problems. > All CPU's but boot CPU won't come back , everything else seems fine. Can you please try to disable HT and suspend? > ... > > Oct 22 15:02:28 lara [ 49.618795] Enabling non-boot CPUs ... > Oct 22 15:02:28 lara [ 49.622211] PM: Adding info for No Bus:msr1 > Oct 22 15:02:28 lara [ 49.622259] PM: Adding info for No Bus:cpu1 > Oct 22 15:02:28 lara [ 49.622302] SMP alternatives: switching to SMP code > Oct 22 15:02:28 lara [ 49.623536] Booting processor 1/1 eip 3000 > Oct 22 15:02:28 lara [ 54.638093] Not responding. > Oct 22 15:02:28 lara [ 54.638096] Inquiring remote APIC #1... > Oct 22 15:02:28 lara [ 54.638099] ... APIC #1 ID: failed > Oct 22 15:02:28 lara [ 54.638204] ... APIC #1 VERSION: failed > Oct 22 15:02:28 lara [ 54.638307] ... APIC #1 SPIV: failed > Oct 22 15:02:28 lara [ 54.638427] skipping cpu1, didn't come online > Oct 22 15:02:28 lara [ 54.638602] PM: Removing info for No Bus:msr1 > Oct 22 15:02:28 lara [ 54.638643] PM: Removing info for No Bus:cpu1 > Oct 22 15:02:28 lara [ 54.638678] Error taking CPU1 up: -5 > Oct 22 15:02:28 lara [ 54.640908] PM: Adding info for No Bus:msr2 > Oct 22 15:02:28 lara [ 54.640939] PM: Adding info for No Bus:cpu2 > Oct 22 15:02:28 lara [ 54.640976] SMP alternatives: switching to SMP code > Oct 22 15:02:28 lara [ 54.641961] Booting processor 2/2 eip 3000 > Oct 22 15:02:28 lara [ 59.656795] Not responding. > Oct 22 15:02:28 lara [ 59.656799] Inquiring remote APIC #2... > Oct 22 15:02:28 lara [ 59.656803] ... APIC #2 ID: failed > Oct 22 15:02:28 lara [ 59.656907] ... APIC #2 VERSION: failed > Oct 22 15:02:28 lara [ 59.657011] ... APIC #2 SPIV: failed > Oct 22 15:02:28 lara [ 59.657131] skipping cpu2, didn't come online > Oct 22 15:02:28 lara [ 59.657300] PM: Removing info for No Bus:msr2 > Oct 22 15:02:28 lara [ 59.657343] PM: Removing info for No Bus:cpu2 > Oct 22 15:02:28 lara [ 59.657379] Error taking CPU2 up: -5 > Oct 22 15:02:28 lara [ 59.659605] PM: Adding info for No Bus:msr3 > Oct 22 15:02:28 lara [ 59.659637] PM: Adding info for No Bus:cpu3 > Oct 22 15:02:28 lara [ 59.659673] SMP alternatives: switching to SMP code > Oct 22 15:02:28 lara [ 59.660725] Booting processor 3/3 eip 3000 > Oct 22 15:02:28 lara [ 64.675517] Not responding. > Oct 22 15:02:28 lara [ 64.675520] Inquiring remote APIC #3... > Oct 22 15:02:28 lara [ 64.675524] ... APIC #3 ID: failed > Oct 22 15:02:28 lara [ 64.675628] ... APIC #3 VERSION: failed > Oct 22 15:02:28 lara [ 64.675731] ... APIC #3 SPIV: failed > Oct 22 15:02:28 lara [ 64.675859] skipping cpu3, didn't come online > Oct 22 15:02:28 lara [ 64.676017] PM: Removing info for No Bus:msr3 > Oct 22 15:02:28 lara [ 64.676059] PM: Removing info for No Bus:cpu3 > Oct 22 15:02:28 lara [ 64.676092] Error taking CPU3 up: -5 > Oct 22 15:02:28 lara [ 64.676326] evxfevnt-0079 [00] enable : System is already in ACPI mode > > ... > > After I've played with a lot boot options I found out booting with ' acpi=ht ' will make the CPU's work again but now > I have a problem on Suspend. Everything seems to just go down disks etc but the box itself is for some reason still on. > So I've tested reboot=<> options with no luck. > ( after waiting 5 minutes to be sure everything is really off I can just hit power button). On resume now everything is fine. > > I'm not really sure what is wrong here acpi/hibernation/cpu-hotplug or a mix of all so I'm CC'ing linux-acpi as well. > The only thing I noticed is the 'Breaking affinity for irq XX' on suspend without acpi=ht messages. > > I can't even tell whatever other kernel versions are working because aic7xxx driver didn't got suspend support till now > ( or at least never worked here ). I know suspend worked fine on windows with that box. > > There is my config and dmesg ( good and bad one ) : > > > http://194.231.229.228/suspend/acpi=ht_working_dmesg.txt > http://194.231.229.228/suspend/dmesg_broken_cpus_on_resume.txt > http://194.231.229.228/suspend/config Well, I think we have a problem with the CPU hotplug. Can you try to offline-online CPUs (without suspending) and see if that works? Greetings, Rafael ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Resume problems 2007-10-22 22:35 ` Rafael J. Wysocki @ 2007-10-22 23:00 ` Gabriel C 2007-10-22 23:31 ` Rafael J. Wysocki 0 siblings, 1 reply; 9+ messages in thread From: Gabriel C @ 2007-10-22 23:00 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: Linux Kernel Mailing List, linux-pm, linux-acpi Rafael J. Wysocki wrote: > On Monday, 22 October 2007 18:15, Gabriel C wrote: >> Hi all , >> >> I'm running current git + aic7xxx suspend patch from http://bugzilla.kernel.org/show_bug.cgi?id=3062 >> on a Dell Precision WorkStation 530 MT SMP box ( HT enabled ). >> >> Suspend works fine but on resume I have some problems. >> All CPU's but boot CPU won't come back , everything else seems fine. > > Can you please try to disable HT and suspend? So only 'Hibernation' is enabled in kernel and HT disabled in BIOS ? If you mean that , sure I can try doing so. I also could disable Suspend to RAM completly from BIOS as well if you want. > >> ... >> >> Oct 22 15:02:28 lara [ 49.618795] Enabling non-boot CPUs ... >> Oct 22 15:02:28 lara [ 49.622211] PM: Adding info for No Bus:msr1 >> Oct 22 15:02:28 lara [ 49.622259] PM: Adding info for No Bus:cpu1 >> Oct 22 15:02:28 lara [ 49.622302] SMP alternatives: switching to SMP code >> Oct 22 15:02:28 lara [ 49.623536] Booting processor 1/1 eip 3000 >> Oct 22 15:02:28 lara [ 54.638093] Not responding. >> Oct 22 15:02:28 lara [ 54.638096] Inquiring remote APIC #1... >> Oct 22 15:02:28 lara [ 54.638099] ... APIC #1 ID: failed >> Oct 22 15:02:28 lara [ 54.638204] ... APIC #1 VERSION: failed >> Oct 22 15:02:28 lara [ 54.638307] ... APIC #1 SPIV: failed >> Oct 22 15:02:28 lara [ 54.638427] skipping cpu1, didn't come online >> Oct 22 15:02:28 lara [ 54.638602] PM: Removing info for No Bus:msr1 >> Oct 22 15:02:28 lara [ 54.638643] PM: Removing info for No Bus:cpu1 >> Oct 22 15:02:28 lara [ 54.638678] Error taking CPU1 up: -5 >> Oct 22 15:02:28 lara [ 54.640908] PM: Adding info for No Bus:msr2 >> Oct 22 15:02:28 lara [ 54.640939] PM: Adding info for No Bus:cpu2 >> Oct 22 15:02:28 lara [ 54.640976] SMP alternatives: switching to SMP code >> Oct 22 15:02:28 lara [ 54.641961] Booting processor 2/2 eip 3000 >> Oct 22 15:02:28 lara [ 59.656795] Not responding. >> Oct 22 15:02:28 lara [ 59.656799] Inquiring remote APIC #2... >> Oct 22 15:02:28 lara [ 59.656803] ... APIC #2 ID: failed >> Oct 22 15:02:28 lara [ 59.656907] ... APIC #2 VERSION: failed >> Oct 22 15:02:28 lara [ 59.657011] ... APIC #2 SPIV: failed >> Oct 22 15:02:28 lara [ 59.657131] skipping cpu2, didn't come online >> Oct 22 15:02:28 lara [ 59.657300] PM: Removing info for No Bus:msr2 >> Oct 22 15:02:28 lara [ 59.657343] PM: Removing info for No Bus:cpu2 >> Oct 22 15:02:28 lara [ 59.657379] Error taking CPU2 up: -5 >> Oct 22 15:02:28 lara [ 59.659605] PM: Adding info for No Bus:msr3 >> Oct 22 15:02:28 lara [ 59.659637] PM: Adding info for No Bus:cpu3 >> Oct 22 15:02:28 lara [ 59.659673] SMP alternatives: switching to SMP code >> Oct 22 15:02:28 lara [ 59.660725] Booting processor 3/3 eip 3000 >> Oct 22 15:02:28 lara [ 64.675517] Not responding. >> Oct 22 15:02:28 lara [ 64.675520] Inquiring remote APIC #3... >> Oct 22 15:02:28 lara [ 64.675524] ... APIC #3 ID: failed >> Oct 22 15:02:28 lara [ 64.675628] ... APIC #3 VERSION: failed >> Oct 22 15:02:28 lara [ 64.675731] ... APIC #3 SPIV: failed >> Oct 22 15:02:28 lara [ 64.675859] skipping cpu3, didn't come online >> Oct 22 15:02:28 lara [ 64.676017] PM: Removing info for No Bus:msr3 >> Oct 22 15:02:28 lara [ 64.676059] PM: Removing info for No Bus:cpu3 >> Oct 22 15:02:28 lara [ 64.676092] Error taking CPU3 up: -5 >> Oct 22 15:02:28 lara [ 64.676326] evxfevnt-0079 [00] enable : System is already in ACPI mode >> >> ... >> >> After I've played with a lot boot options I found out booting with ' acpi=ht ' will make the CPU's work again but now >> I have a problem on Suspend. Everything seems to just go down disks etc but the box itself is for some reason still on. >> So I've tested reboot=<> options with no luck. >> ( after waiting 5 minutes to be sure everything is really off I can just hit power button). On resume now everything is fine. >> >> I'm not really sure what is wrong here acpi/hibernation/cpu-hotplug or a mix of all so I'm CC'ing linux-acpi as well. >> The only thing I noticed is the 'Breaking affinity for irq XX' on suspend without acpi=ht messages. >> >> I can't even tell whatever other kernel versions are working because aic7xxx driver didn't got suspend support till now >> ( or at least never worked here ). I know suspend worked fine on windows with that box. >> >> There is my config and dmesg ( good and bad one ) : >> >> >> http://194.231.229.228/suspend/acpi=ht_working_dmesg.txt >> http://194.231.229.228/suspend/dmesg_broken_cpus_on_resume.txt >> http://194.231.229.228/suspend/config > > Well, I think we have a problem with the CPU hotplug. > > Can you try to offline-online CPUs (without suspending) and see if that works? Yes does work when I do it manually : [ 6687.595842] CPU 1 is now offline [ 6687.711425] CPU 2 is now offline [ 6687.819330] CPU 3 is now offline [ 6687.819337] SMP alternatives: switching to UP code [ 6702.109605] SMP alternatives: switching to SMP code [ 6702.110634] Booting processor 1/1 eip 3000 [ 6702.122140] Initializing CPU#1 [ 6702.182045] Calibrating delay using timer specific routine.. 3989.26 BogoMIPS (lpj=1994633) [ 6702.182063] CPU: After generic identify, caps: bfebfbff 00000000 00000000 00000000 00004400 00000000 00000000 00000000 [ 6702.182085] CPU: Trace cache: 12K uops, L1 D cache: 8K [ 6702.182091] CPU: L2 cache: 512K [ 6702.182096] CPU: Physical Processor ID: 0 [ 6702.182102] CPU: After all inits, caps: bfebfbff 00000000 00000000 0000b080 00004400 00000000 00000000 00000000 [ 6702.182118] Intel machine check architecture supported. [ 6702.182130] Intel machine check reporting enabled on CPU#1. [ 6702.182137] CPU1: Intel P4/Xeon Extended MCE MSRs (12) available [ 6702.182143] CPU1: Thermal monitoring enabled [ 6702.183563] CPU1: Intel(R) Xeon(TM) CPU 2.00GHz stepping 07 [ 6702.184488] checking TSC synchronization [CPU#0 -> CPU#1]: passed. [ 6702.205500] Switched to high resolution mode on CPU 1 [ 6702.210400] SMP alternatives: switching to SMP code [ 6702.212196] Booting processor 2/2 eip 3000 [ 6702.222693] Initializing CPU#2 [ 6702.282950] Calibrating delay using timer specific routine.. 3988.88 BogoMIPS (lpj=1994443) [ 6702.282962] CPU: After generic identify, caps: 3febfbff 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 6702.282974] CPU: Trace cache: 12K uops, L1 D cache: 8K [ 6702.282977] CPU: L2 cache: 512K [ 6702.282980] CPU: Physical Processor ID: 3 [ 6702.282983] CPU: After all inits, caps: 3febfbff 00000000 00000000 0000b080 00000000 00000000 00000000 00000000 [ 6702.282991] Intel machine check architecture supported. [ 6702.282998] Intel machine check reporting enabled on CPU#2. [ 6702.283001] CPU2: Intel P4/Xeon Extended MCE MSRs (12) available [ 6702.283005] CPU2: Thermal monitoring enabled [ 6702.283300] CPU2: Intel(R) XEON(TM) CPU 2.00GHz stepping 04 [ 6702.284296] checking TSC synchronization [CPU#1 -> CPU#2]: passed. [ 6702.305317] Switched to high resolution mode on CPU 2 [ 6702.312356] SMP alternatives: switching to SMP code [ 6702.313995] Booting processor 3/3 eip 3000 [ 6702.324511] Initializing CPU#3 [ 6702.384864] Calibrating delay using timer specific routine.. 3988.87 BogoMIPS (lpj=1994438) [ 6702.384875] CPU: After generic identify, caps: 3febfbff 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 6702.384888] CPU: Trace cache: 12K uops, L1 D cache: 8K [ 6702.384891] CPU: L2 cache: 512K [ 6702.384894] CPU: Physical Processor ID: 3 [ 6702.384897] CPU: After all inits, caps: 3febfbff 00000000 00000000 0000b080 00000000 00000000 00000000 00000000 [ 6702.384905] Intel machine check architecture supported. [ 6702.384912] Intel machine check reporting enabled on CPU#3. [ 6702.384915] CPU3: Intel P4/Xeon Extended MCE MSRs (12) available [ 6702.384919] CPU3: Thermal monitoring enabled [ 6702.385146] CPU3: Intel(R) XEON(TM) CPU 2.00GHz stepping 04 [ 6702.386252] checking TSC synchronization [CPU#1 -> CPU#3]: passed. [ 6702.407259] Switched to high resolution mode on CPU 3 ... done with : for i in cpu1 cpu2 cpu3; do echo 0 >/sys/devices/system/cpu/$i/online; done for i in cpu1 cpu2 cpu3; do echo 1 >/sys/devices/system/cpu/$i/online; done > > Greetings, > Rafael > Gabriel ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Resume problems 2007-10-22 23:00 ` Gabriel C @ 2007-10-22 23:31 ` Rafael J. Wysocki 2007-10-22 23:22 ` Gabriel C 0 siblings, 1 reply; 9+ messages in thread From: Rafael J. Wysocki @ 2007-10-22 23:31 UTC (permalink / raw) To: Gabriel C; +Cc: Linux Kernel Mailing List, linux-pm, linux-acpi On Tuesday, 23 October 2007 01:00, Gabriel C wrote: > Rafael J. Wysocki wrote: > > On Monday, 22 October 2007 18:15, Gabriel C wrote: > >> Hi all , > >> > >> I'm running current git + aic7xxx suspend patch from http://bugzilla.kernel.org/show_bug.cgi?id=3062 > >> on a Dell Precision WorkStation 530 MT SMP box ( HT enabled ). > >> > >> Suspend works fine but on resume I have some problems. > >> All CPU's but boot CPU won't come back , everything else seems fine. > > > > Can you please try to disable HT and suspend? > > So only 'Hibernation' is enabled in kernel and HT disabled in BIOS ? > > If you mean that , sure I can try doing so. With suspend or hibernation enabled in the kernel, but with HT disabled in the BIOS. > I also could disable Suspend to RAM completly from BIOS as well if you want. No, that rather won't work. > > > >> ... > >> > >> Oct 22 15:02:28 lara [ 49.618795] Enabling non-boot CPUs ... > >> Oct 22 15:02:28 lara [ 49.622211] PM: Adding info for No Bus:msr1 > >> Oct 22 15:02:28 lara [ 49.622259] PM: Adding info for No Bus:cpu1 > >> Oct 22 15:02:28 lara [ 49.622302] SMP alternatives: switching to SMP code > >> Oct 22 15:02:28 lara [ 49.623536] Booting processor 1/1 eip 3000 > >> Oct 22 15:02:28 lara [ 54.638093] Not responding. > >> Oct 22 15:02:28 lara [ 54.638096] Inquiring remote APIC #1... > >> Oct 22 15:02:28 lara [ 54.638099] ... APIC #1 ID: failed > >> Oct 22 15:02:28 lara [ 54.638204] ... APIC #1 VERSION: failed > >> Oct 22 15:02:28 lara [ 54.638307] ... APIC #1 SPIV: failed > >> Oct 22 15:02:28 lara [ 54.638427] skipping cpu1, didn't come online > >> Oct 22 15:02:28 lara [ 54.638602] PM: Removing info for No Bus:msr1 > >> Oct 22 15:02:28 lara [ 54.638643] PM: Removing info for No Bus:cpu1 > >> Oct 22 15:02:28 lara [ 54.638678] Error taking CPU1 up: -5 > >> Oct 22 15:02:28 lara [ 54.640908] PM: Adding info for No Bus:msr2 > >> Oct 22 15:02:28 lara [ 54.640939] PM: Adding info for No Bus:cpu2 > >> Oct 22 15:02:28 lara [ 54.640976] SMP alternatives: switching to SMP code > >> Oct 22 15:02:28 lara [ 54.641961] Booting processor 2/2 eip 3000 > >> Oct 22 15:02:28 lara [ 59.656795] Not responding. > >> Oct 22 15:02:28 lara [ 59.656799] Inquiring remote APIC #2... > >> Oct 22 15:02:28 lara [ 59.656803] ... APIC #2 ID: failed > >> Oct 22 15:02:28 lara [ 59.656907] ... APIC #2 VERSION: failed > >> Oct 22 15:02:28 lara [ 59.657011] ... APIC #2 SPIV: failed > >> Oct 22 15:02:28 lara [ 59.657131] skipping cpu2, didn't come online > >> Oct 22 15:02:28 lara [ 59.657300] PM: Removing info for No Bus:msr2 > >> Oct 22 15:02:28 lara [ 59.657343] PM: Removing info for No Bus:cpu2 > >> Oct 22 15:02:28 lara [ 59.657379] Error taking CPU2 up: -5 > >> Oct 22 15:02:28 lara [ 59.659605] PM: Adding info for No Bus:msr3 > >> Oct 22 15:02:28 lara [ 59.659637] PM: Adding info for No Bus:cpu3 > >> Oct 22 15:02:28 lara [ 59.659673] SMP alternatives: switching to SMP code > >> Oct 22 15:02:28 lara [ 59.660725] Booting processor 3/3 eip 3000 > >> Oct 22 15:02:28 lara [ 64.675517] Not responding. > >> Oct 22 15:02:28 lara [ 64.675520] Inquiring remote APIC #3... > >> Oct 22 15:02:28 lara [ 64.675524] ... APIC #3 ID: failed > >> Oct 22 15:02:28 lara [ 64.675628] ... APIC #3 VERSION: failed > >> Oct 22 15:02:28 lara [ 64.675731] ... APIC #3 SPIV: failed > >> Oct 22 15:02:28 lara [ 64.675859] skipping cpu3, didn't come online > >> Oct 22 15:02:28 lara [ 64.676017] PM: Removing info for No Bus:msr3 > >> Oct 22 15:02:28 lara [ 64.676059] PM: Removing info for No Bus:cpu3 > >> Oct 22 15:02:28 lara [ 64.676092] Error taking CPU3 up: -5 > >> Oct 22 15:02:28 lara [ 64.676326] evxfevnt-0079 [00] enable : System is already in ACPI mode > >> > >> ... > >> > >> After I've played with a lot boot options I found out booting with ' acpi=ht ' will make the CPU's work again but now > >> I have a problem on Suspend. Everything seems to just go down disks etc but the box itself is for some reason still on. > >> So I've tested reboot=<> options with no luck. > >> ( after waiting 5 minutes to be sure everything is really off I can just hit power button). On resume now everything is fine. > >> > >> I'm not really sure what is wrong here acpi/hibernation/cpu-hotplug or a mix of all so I'm CC'ing linux-acpi as well. > >> The only thing I noticed is the 'Breaking affinity for irq XX' on suspend without acpi=ht messages. > >> > >> I can't even tell whatever other kernel versions are working because aic7xxx driver didn't got suspend support till now > >> ( or at least never worked here ). I know suspend worked fine on windows with that box. > >> > >> There is my config and dmesg ( good and bad one ) : > >> > >> > >> http://194.231.229.228/suspend/acpi=ht_working_dmesg.txt > >> http://194.231.229.228/suspend/dmesg_broken_cpus_on_resume.txt > >> http://194.231.229.228/suspend/config > > > > Well, I think we have a problem with the CPU hotplug. > > > > Can you try to offline-online CPUs (without suspending) and see if that works? > > Yes does work when I do it manually : > > [ 6687.595842] CPU 1 is now offline > [ 6687.711425] CPU 2 is now offline > [ 6687.819330] CPU 3 is now offline > [ 6687.819337] SMP alternatives: switching to UP code > [ 6702.109605] SMP alternatives: switching to SMP code > [ 6702.110634] Booting processor 1/1 eip 3000 > [ 6702.122140] Initializing CPU#1 > [ 6702.182045] Calibrating delay using timer specific routine.. 3989.26 BogoMIPS (lpj=1994633) > [ 6702.182063] CPU: After generic identify, caps: bfebfbff 00000000 00000000 00000000 00004400 00000000 00000000 00000000 > [ 6702.182085] CPU: Trace cache: 12K uops, L1 D cache: 8K > [ 6702.182091] CPU: L2 cache: 512K > [ 6702.182096] CPU: Physical Processor ID: 0 > [ 6702.182102] CPU: After all inits, caps: bfebfbff 00000000 00000000 0000b080 00004400 00000000 00000000 00000000 > [ 6702.182118] Intel machine check architecture supported. > [ 6702.182130] Intel machine check reporting enabled on CPU#1. > [ 6702.182137] CPU1: Intel P4/Xeon Extended MCE MSRs (12) available > [ 6702.182143] CPU1: Thermal monitoring enabled > [ 6702.183563] CPU1: Intel(R) Xeon(TM) CPU 2.00GHz stepping 07 > [ 6702.184488] checking TSC synchronization [CPU#0 -> CPU#1]: passed. > [ 6702.205500] Switched to high resolution mode on CPU 1 > [ 6702.210400] SMP alternatives: switching to SMP code > [ 6702.212196] Booting processor 2/2 eip 3000 > [ 6702.222693] Initializing CPU#2 > [ 6702.282950] Calibrating delay using timer specific routine.. 3988.88 BogoMIPS (lpj=1994443) > [ 6702.282962] CPU: After generic identify, caps: 3febfbff 00000000 00000000 00000000 00000000 00000000 00000000 00000000 > [ 6702.282974] CPU: Trace cache: 12K uops, L1 D cache: 8K > [ 6702.282977] CPU: L2 cache: 512K > [ 6702.282980] CPU: Physical Processor ID: 3 > [ 6702.282983] CPU: After all inits, caps: 3febfbff 00000000 00000000 0000b080 00000000 00000000 00000000 00000000 > [ 6702.282991] Intel machine check architecture supported. > [ 6702.282998] Intel machine check reporting enabled on CPU#2. > [ 6702.283001] CPU2: Intel P4/Xeon Extended MCE MSRs (12) available > [ 6702.283005] CPU2: Thermal monitoring enabled > [ 6702.283300] CPU2: Intel(R) XEON(TM) CPU 2.00GHz stepping 04 > [ 6702.284296] checking TSC synchronization [CPU#1 -> CPU#2]: passed. > [ 6702.305317] Switched to high resolution mode on CPU 2 > [ 6702.312356] SMP alternatives: switching to SMP code > [ 6702.313995] Booting processor 3/3 eip 3000 > [ 6702.324511] Initializing CPU#3 > [ 6702.384864] Calibrating delay using timer specific routine.. 3988.87 BogoMIPS (lpj=1994438) > [ 6702.384875] CPU: After generic identify, caps: 3febfbff 00000000 00000000 00000000 00000000 00000000 00000000 00000000 > [ 6702.384888] CPU: Trace cache: 12K uops, L1 D cache: 8K > [ 6702.384891] CPU: L2 cache: 512K > [ 6702.384894] CPU: Physical Processor ID: 3 > [ 6702.384897] CPU: After all inits, caps: 3febfbff 00000000 00000000 0000b080 00000000 00000000 00000000 00000000 > [ 6702.384905] Intel machine check architecture supported. > [ 6702.384912] Intel machine check reporting enabled on CPU#3. > [ 6702.384915] CPU3: Intel P4/Xeon Extended MCE MSRs (12) available > [ 6702.384919] CPU3: Thermal monitoring enabled > [ 6702.385146] CPU3: Intel(R) XEON(TM) CPU 2.00GHz stepping 04 > [ 6702.386252] checking TSC synchronization [CPU#1 -> CPU#3]: passed. > [ 6702.407259] Switched to high resolution mode on CPU 3 > > ... > > done with : > for i in cpu1 cpu2 cpu3; do echo 0 >/sys/devices/system/cpu/$i/online; done > > for i in cpu1 cpu2 cpu3; do echo 1 >/sys/devices/system/cpu/$i/online; done Hm, well. Please apply the appended patch and then try: # echo 8 > /proc/sys/kernel/printk # echo 5 > /sys/power/pm_test_level # echo mem > /sys/power/state (should wait for approx. 3 sec. and return to the boot prompt) # echo 4 > /sys/power/pm_test_level # echo mem > /sys/power/state (should wait for approx. 3 sec. and return to the boot prompt) ... # echo 1 > /sys/power/pm_test_level # echo mem > /sys/power/state (should wait for approx. 3 sec. and return to the boot prompt) and see if you can reproduce the problem and for which test level. [Echoing 0 to /sys/power/pm_test_level restores the normal behavior.] Greetings, Rafael --- kernel/power/main.c | 75 ++++++++++++++++++++++++++++++++++++++++++++------- kernel/power/power.h | 10 ++++++ 2 files changed, 76 insertions(+), 9 deletions(-) Index: linux-2.6/kernel/power/main.c =================================================================== --- linux-2.6.orig/kernel/power/main.c +++ linux-2.6/kernel/power/main.c @@ -28,6 +28,46 @@ BLOCKING_NOTIFIER_HEAD(pm_chain_head); DEFINE_MUTEX(pm_mutex); +#ifdef CONFIG_PM_DEBUG +int pm_test_level = TEST_NONE; + +static int suspend_test(int level) +{ + if (pm_test_level == level) { + printk(KERN_INFO "suspend debug: Waiting for 3 seconds.\n"); + mdelay(3000); + return 1; + } + return 0; +} + +static ssize_t pm_test_level_show(struct kset *kset, char *buf) +{ + return sprintf(buf, "%d\n", pm_test_level); +} + +static ssize_t +pm_test_level_store(struct kset *kset, const char *buf, size_t n) +{ + int val; + + if (sscanf(buf, "%d", &val) != 1) + return -EINVAL; + + if (val < TEST_NONE || val > TEST_FREEZER) + return -EINVAL; + + pm_test_level = val; + + return n; +} + +power_attr(pm_test_level); +#else /* !CONFIG_PM_DEBUG */ +static inline int suspend_test(int level) { return 0; } +#endif /* !CONFIG_PM_DEBUG */ + + #ifdef CONFIG_SUSPEND /* This is just an arbitrary number */ @@ -133,7 +173,10 @@ static int suspend_enter(suspend_state_t printk(KERN_ERR "Some devices failed to power down\n"); goto Done; } - error = suspend_ops->enter(state); + + if (!suspend_test(TEST_CORE)) + error = suspend_ops->enter(state); + device_power_up(); Done: arch_suspend_enable_irqs(); @@ -164,16 +207,25 @@ int suspend_devices_and_enter(suspend_st printk(KERN_ERR "Some devices failed to suspend\n"); goto Resume_console; } + + if (suspend_test(TEST_DEVICES)) + goto Resume_devices; + if (suspend_ops->prepare) { error = suspend_ops->prepare(); if (error) goto Resume_devices; } + + if (suspend_test(TEST_PLATFORM)) + goto Finish; + error = disable_nonboot_cpus(); - if (!error) + if (!error && !suspend_test(TEST_CPUS)) suspend_enter(state); enable_nonboot_cpus(); + Finish: if (suspend_ops->finish) suspend_ops->finish(); Resume_devices: @@ -240,12 +292,17 @@ static int enter_state(suspend_state_t s printk("done.\n"); pr_debug("PM: Preparing system for %s sleep\n", pm_states[state]); - if ((error = suspend_prepare())) + error = suspend_prepare(); + if (error) goto Unlock; + if (suspend_test(TEST_FREEZER)) + goto Finish; + pr_debug("PM: Entering %s sleep\n", pm_states[state]); error = suspend_devices_and_enter(state); + Finish: pr_debug("PM: Finishing wakeup.\n"); suspend_finish(); Unlock: @@ -363,18 +420,18 @@ pm_trace_store(struct kset *kset, const } power_attr(pm_trace); +#endif /* CONFIG_PM_TRACE */ static struct attribute * g[] = { &state_attr.attr, +#ifdef CONFIG_PM_TRACE &pm_trace_attr.attr, +#endif +#ifdef CONFIG_PM_DEBUG + &pm_test_level_attr.attr, +#endif NULL, }; -#else -static struct attribute * g[] = { - &state_attr.attr, - NULL, -}; -#endif /* CONFIG_PM_TRACE */ static struct attribute_group attr_group = { .attrs = g, Index: linux-2.6/kernel/power/power.h =================================================================== --- linux-2.6.orig/kernel/power/power.h +++ linux-2.6/kernel/power/power.h @@ -211,3 +211,13 @@ static inline int pm_notifier_call_chain return (blocking_notifier_call_chain(&pm_chain_head, val, NULL) == NOTIFY_BAD) ? -EINVAL : 0; } + +/* Suspend test levels */ +enum { + TEST_NONE, + TEST_CORE, + TEST_CPUS, + TEST_PLATFORM, + TEST_DEVICES, + TEST_FREEZER +}; ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Resume problems 2007-10-22 23:31 ` Rafael J. Wysocki @ 2007-10-22 23:22 ` Gabriel C 2007-10-23 0:11 ` Gabriel C 0 siblings, 1 reply; 9+ messages in thread From: Gabriel C @ 2007-10-22 23:22 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: Linux Kernel Mailing List, linux-pm, linux-acpi Rafael J. Wysocki wrote: > On Tuesday, 23 October 2007 01:00, Gabriel C wrote: >> Rafael J. Wysocki wrote: >>> On Monday, 22 October 2007 18:15, Gabriel C wrote: >>>> Hi all , >>>> >>>> I'm running current git + aic7xxx suspend patch from http://bugzilla.kernel.org/show_bug.cgi?id=3062 >>>> on a Dell Precision WorkStation 530 MT SMP box ( HT enabled ). >>>> >>>> Suspend works fine but on resume I have some problems. >>>> All CPU's but boot CPU won't come back , everything else seems fine. >>> Can you please try to disable HT and suspend? >> So only 'Hibernation' is enabled in kernel and HT disabled in BIOS ? >> >> If you mean that , sure I can try doing so. > > With suspend or hibernation enabled in the kernel, but with HT disabled in the > BIOS. Ok trying in some minutes. > >> I also could disable Suspend to RAM completly from BIOS as well if you want. > > No, that rather won't work. > >>>> ... >>>> >>>> Oct 22 15:02:28 lara [ 49.618795] Enabling non-boot CPUs ... >>>> Oct 22 15:02:28 lara [ 49.622211] PM: Adding info for No Bus:msr1 >>>> Oct 22 15:02:28 lara [ 49.622259] PM: Adding info for No Bus:cpu1 >>>> Oct 22 15:02:28 lara [ 49.622302] SMP alternatives: switching to SMP code >>>> Oct 22 15:02:28 lara [ 49.623536] Booting processor 1/1 eip 3000 >>>> Oct 22 15:02:28 lara [ 54.638093] Not responding. >>>> Oct 22 15:02:28 lara [ 54.638096] Inquiring remote APIC #1... >>>> Oct 22 15:02:28 lara [ 54.638099] ... APIC #1 ID: failed >>>> Oct 22 15:02:28 lara [ 54.638204] ... APIC #1 VERSION: failed >>>> Oct 22 15:02:28 lara [ 54.638307] ... APIC #1 SPIV: failed >>>> Oct 22 15:02:28 lara [ 54.638427] skipping cpu1, didn't come online >>>> Oct 22 15:02:28 lara [ 54.638602] PM: Removing info for No Bus:msr1 >>>> Oct 22 15:02:28 lara [ 54.638643] PM: Removing info for No Bus:cpu1 >>>> Oct 22 15:02:28 lara [ 54.638678] Error taking CPU1 up: -5 >>>> Oct 22 15:02:28 lara [ 54.640908] PM: Adding info for No Bus:msr2 >>>> Oct 22 15:02:28 lara [ 54.640939] PM: Adding info for No Bus:cpu2 >>>> Oct 22 15:02:28 lara [ 54.640976] SMP alternatives: switching to SMP code >>>> Oct 22 15:02:28 lara [ 54.641961] Booting processor 2/2 eip 3000 >>>> Oct 22 15:02:28 lara [ 59.656795] Not responding. >>>> Oct 22 15:02:28 lara [ 59.656799] Inquiring remote APIC #2... >>>> Oct 22 15:02:28 lara [ 59.656803] ... APIC #2 ID: failed >>>> Oct 22 15:02:28 lara [ 59.656907] ... APIC #2 VERSION: failed >>>> Oct 22 15:02:28 lara [ 59.657011] ... APIC #2 SPIV: failed >>>> Oct 22 15:02:28 lara [ 59.657131] skipping cpu2, didn't come online >>>> Oct 22 15:02:28 lara [ 59.657300] PM: Removing info for No Bus:msr2 >>>> Oct 22 15:02:28 lara [ 59.657343] PM: Removing info for No Bus:cpu2 >>>> Oct 22 15:02:28 lara [ 59.657379] Error taking CPU2 up: -5 >>>> Oct 22 15:02:28 lara [ 59.659605] PM: Adding info for No Bus:msr3 >>>> Oct 22 15:02:28 lara [ 59.659637] PM: Adding info for No Bus:cpu3 >>>> Oct 22 15:02:28 lara [ 59.659673] SMP alternatives: switching to SMP code >>>> Oct 22 15:02:28 lara [ 59.660725] Booting processor 3/3 eip 3000 >>>> Oct 22 15:02:28 lara [ 64.675517] Not responding. >>>> Oct 22 15:02:28 lara [ 64.675520] Inquiring remote APIC #3... >>>> Oct 22 15:02:28 lara [ 64.675524] ... APIC #3 ID: failed >>>> Oct 22 15:02:28 lara [ 64.675628] ... APIC #3 VERSION: failed >>>> Oct 22 15:02:28 lara [ 64.675731] ... APIC #3 SPIV: failed >>>> Oct 22 15:02:28 lara [ 64.675859] skipping cpu3, didn't come online >>>> Oct 22 15:02:28 lara [ 64.676017] PM: Removing info for No Bus:msr3 >>>> Oct 22 15:02:28 lara [ 64.676059] PM: Removing info for No Bus:cpu3 >>>> Oct 22 15:02:28 lara [ 64.676092] Error taking CPU3 up: -5 >>>> Oct 22 15:02:28 lara [ 64.676326] evxfevnt-0079 [00] enable : System is already in ACPI mode >>>> >>>> ... >>>> >>>> After I've played with a lot boot options I found out booting with ' acpi=ht ' will make the CPU's work again but now >>>> I have a problem on Suspend. Everything seems to just go down disks etc but the box itself is for some reason still on. >>>> So I've tested reboot=<> options with no luck. >>>> ( after waiting 5 minutes to be sure everything is really off I can just hit power button). On resume now everything is fine. >>>> >>>> I'm not really sure what is wrong here acpi/hibernation/cpu-hotplug or a mix of all so I'm CC'ing linux-acpi as well. >>>> The only thing I noticed is the 'Breaking affinity for irq XX' on suspend without acpi=ht messages. >>>> >>>> I can't even tell whatever other kernel versions are working because aic7xxx driver didn't got suspend support till now >>>> ( or at least never worked here ). I know suspend worked fine on windows with that box. >>>> >>>> There is my config and dmesg ( good and bad one ) : >>>> >>>> >>>> http://194.231.229.228/suspend/acpi=ht_working_dmesg.txt >>>> http://194.231.229.228/suspend/dmesg_broken_cpus_on_resume.txt >>>> http://194.231.229.228/suspend/config >>> Well, I think we have a problem with the CPU hotplug. >>> >>> Can you try to offline-online CPUs (without suspending) and see if that works? >> Yes does work when I do it manually : >> >> [ 6687.595842] CPU 1 is now offline >> [ 6687.711425] CPU 2 is now offline >> [ 6687.819330] CPU 3 is now offline >> [ 6687.819337] SMP alternatives: switching to UP code >> [ 6702.109605] SMP alternatives: switching to SMP code >> [ 6702.110634] Booting processor 1/1 eip 3000 >> [ 6702.122140] Initializing CPU#1 >> [ 6702.182045] Calibrating delay using timer specific routine.. 3989.26 BogoMIPS (lpj=1994633) >> [ 6702.182063] CPU: After generic identify, caps: bfebfbff 00000000 00000000 00000000 00004400 00000000 00000000 00000000 >> [ 6702.182085] CPU: Trace cache: 12K uops, L1 D cache: 8K >> [ 6702.182091] CPU: L2 cache: 512K >> [ 6702.182096] CPU: Physical Processor ID: 0 >> [ 6702.182102] CPU: After all inits, caps: bfebfbff 00000000 00000000 0000b080 00004400 00000000 00000000 00000000 >> [ 6702.182118] Intel machine check architecture supported. >> [ 6702.182130] Intel machine check reporting enabled on CPU#1. >> [ 6702.182137] CPU1: Intel P4/Xeon Extended MCE MSRs (12) available >> [ 6702.182143] CPU1: Thermal monitoring enabled >> [ 6702.183563] CPU1: Intel(R) Xeon(TM) CPU 2.00GHz stepping 07 >> [ 6702.184488] checking TSC synchronization [CPU#0 -> CPU#1]: passed. >> [ 6702.205500] Switched to high resolution mode on CPU 1 >> [ 6702.210400] SMP alternatives: switching to SMP code >> [ 6702.212196] Booting processor 2/2 eip 3000 >> [ 6702.222693] Initializing CPU#2 >> [ 6702.282950] Calibrating delay using timer specific routine.. 3988.88 BogoMIPS (lpj=1994443) >> [ 6702.282962] CPU: After generic identify, caps: 3febfbff 00000000 00000000 00000000 00000000 00000000 00000000 00000000 >> [ 6702.282974] CPU: Trace cache: 12K uops, L1 D cache: 8K >> [ 6702.282977] CPU: L2 cache: 512K >> [ 6702.282980] CPU: Physical Processor ID: 3 >> [ 6702.282983] CPU: After all inits, caps: 3febfbff 00000000 00000000 0000b080 00000000 00000000 00000000 00000000 >> [ 6702.282991] Intel machine check architecture supported. >> [ 6702.282998] Intel machine check reporting enabled on CPU#2. >> [ 6702.283001] CPU2: Intel P4/Xeon Extended MCE MSRs (12) available >> [ 6702.283005] CPU2: Thermal monitoring enabled >> [ 6702.283300] CPU2: Intel(R) XEON(TM) CPU 2.00GHz stepping 04 >> [ 6702.284296] checking TSC synchronization [CPU#1 -> CPU#2]: passed. >> [ 6702.305317] Switched to high resolution mode on CPU 2 >> [ 6702.312356] SMP alternatives: switching to SMP code >> [ 6702.313995] Booting processor 3/3 eip 3000 >> [ 6702.324511] Initializing CPU#3 >> [ 6702.384864] Calibrating delay using timer specific routine.. 3988.87 BogoMIPS (lpj=1994438) >> [ 6702.384875] CPU: After generic identify, caps: 3febfbff 00000000 00000000 00000000 00000000 00000000 00000000 00000000 >> [ 6702.384888] CPU: Trace cache: 12K uops, L1 D cache: 8K >> [ 6702.384891] CPU: L2 cache: 512K >> [ 6702.384894] CPU: Physical Processor ID: 3 >> [ 6702.384897] CPU: After all inits, caps: 3febfbff 00000000 00000000 0000b080 00000000 00000000 00000000 00000000 >> [ 6702.384905] Intel machine check architecture supported. >> [ 6702.384912] Intel machine check reporting enabled on CPU#3. >> [ 6702.384915] CPU3: Intel P4/Xeon Extended MCE MSRs (12) available >> [ 6702.384919] CPU3: Thermal monitoring enabled >> [ 6702.385146] CPU3: Intel(R) XEON(TM) CPU 2.00GHz stepping 04 >> [ 6702.386252] checking TSC synchronization [CPU#1 -> CPU#3]: passed. >> [ 6702.407259] Switched to high resolution mode on CPU 3 >> >> ... >> >> done with : >> for i in cpu1 cpu2 cpu3; do echo 0 >/sys/devices/system/cpu/$i/online; done >> >> for i in cpu1 cpu2 cpu3; do echo 1 >/sys/devices/system/cpu/$i/online; done > > Hm, well. > > Please apply the appended patch and then try: Ok will do right after I test with disabled HT > > # echo 8 > /proc/sys/kernel/printk > # echo 5 > /sys/power/pm_test_level > # echo mem > /sys/power/state > (should wait for approx. 3 sec. and return to the boot prompt) > # echo 4 > /sys/power/pm_test_level > # echo mem > /sys/power/state > (should wait for approx. 3 sec. and return to the boot prompt) > ... > # echo 1 > /sys/power/pm_test_level > # echo mem > /sys/power/state > (should wait for approx. 3 sec. and return to the boot prompt) > > and see if you can reproduce the problem and for which test level. > > [Echoing 0 to /sys/power/pm_test_level restores the normal behavior.] > > Greetings, > Rafael > > > --- > kernel/power/main.c | 75 ++++++++++++++++++++++++++++++++++++++++++++------- > kernel/power/power.h | 10 ++++++ > 2 files changed, 76 insertions(+), 9 deletions(-) > > Index: linux-2.6/kernel/power/main.c > =================================================================== > --- linux-2.6.orig/kernel/power/main.c > +++ linux-2.6/kernel/power/main.c > @@ -28,6 +28,46 @@ BLOCKING_NOTIFIER_HEAD(pm_chain_head); > > DEFINE_MUTEX(pm_mutex); > > +#ifdef CONFIG_PM_DEBUG > +int pm_test_level = TEST_NONE; > + > +static int suspend_test(int level) > +{ > + if (pm_test_level == level) { > + printk(KERN_INFO "suspend debug: Waiting for 3 seconds.\n"); > + mdelay(3000); > + return 1; > + } > + return 0; > +} > + > +static ssize_t pm_test_level_show(struct kset *kset, char *buf) > +{ > + return sprintf(buf, "%d\n", pm_test_level); > +} > + > +static ssize_t > +pm_test_level_store(struct kset *kset, const char *buf, size_t n) > +{ > + int val; > + > + if (sscanf(buf, "%d", &val) != 1) > + return -EINVAL; > + > + if (val < TEST_NONE || val > TEST_FREEZER) > + return -EINVAL; > + > + pm_test_level = val; > + > + return n; > +} > + > +power_attr(pm_test_level); > +#else /* !CONFIG_PM_DEBUG */ > +static inline int suspend_test(int level) { return 0; } > +#endif /* !CONFIG_PM_DEBUG */ > + > + > #ifdef CONFIG_SUSPEND > > /* This is just an arbitrary number */ > @@ -133,7 +173,10 @@ static int suspend_enter(suspend_state_t > printk(KERN_ERR "Some devices failed to power down\n"); > goto Done; > } > - error = suspend_ops->enter(state); > + > + if (!suspend_test(TEST_CORE)) > + error = suspend_ops->enter(state); > + > device_power_up(); > Done: > arch_suspend_enable_irqs(); > @@ -164,16 +207,25 @@ int suspend_devices_and_enter(suspend_st > printk(KERN_ERR "Some devices failed to suspend\n"); > goto Resume_console; > } > + > + if (suspend_test(TEST_DEVICES)) > + goto Resume_devices; > + > if (suspend_ops->prepare) { > error = suspend_ops->prepare(); > if (error) > goto Resume_devices; > } > + > + if (suspend_test(TEST_PLATFORM)) > + goto Finish; > + > error = disable_nonboot_cpus(); > - if (!error) > + if (!error && !suspend_test(TEST_CPUS)) > suspend_enter(state); > > enable_nonboot_cpus(); > + Finish: > if (suspend_ops->finish) > suspend_ops->finish(); > Resume_devices: > @@ -240,12 +292,17 @@ static int enter_state(suspend_state_t s > printk("done.\n"); > > pr_debug("PM: Preparing system for %s sleep\n", pm_states[state]); > - if ((error = suspend_prepare())) > + error = suspend_prepare(); > + if (error) > goto Unlock; > > + if (suspend_test(TEST_FREEZER)) > + goto Finish; > + > pr_debug("PM: Entering %s sleep\n", pm_states[state]); > error = suspend_devices_and_enter(state); > > + Finish: > pr_debug("PM: Finishing wakeup.\n"); > suspend_finish(); > Unlock: > @@ -363,18 +420,18 @@ pm_trace_store(struct kset *kset, const > } > > power_attr(pm_trace); > +#endif /* CONFIG_PM_TRACE */ > > static struct attribute * g[] = { > &state_attr.attr, > +#ifdef CONFIG_PM_TRACE > &pm_trace_attr.attr, > +#endif > +#ifdef CONFIG_PM_DEBUG > + &pm_test_level_attr.attr, > +#endif > NULL, > }; > -#else > -static struct attribute * g[] = { > - &state_attr.attr, > - NULL, > -}; > -#endif /* CONFIG_PM_TRACE */ > > static struct attribute_group attr_group = { > .attrs = g, > Index: linux-2.6/kernel/power/power.h > =================================================================== > --- linux-2.6.orig/kernel/power/power.h > +++ linux-2.6/kernel/power/power.h > @@ -211,3 +211,13 @@ static inline int pm_notifier_call_chain > return (blocking_notifier_call_chain(&pm_chain_head, val, NULL) > == NOTIFY_BAD) ? -EINVAL : 0; > } > + > +/* Suspend test levels */ > +enum { > + TEST_NONE, > + TEST_CORE, > + TEST_CPUS, > + TEST_PLATFORM, > + TEST_DEVICES, > + TEST_FREEZER > +}; > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Resume problems 2007-10-22 23:22 ` Gabriel C @ 2007-10-23 0:11 ` Gabriel C 2007-10-23 1:01 ` Gabriel C 0 siblings, 1 reply; 9+ messages in thread From: Gabriel C @ 2007-10-23 0:11 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: Linux Kernel Mailing List, linux-pm, linux-acpi Gabriel C wrote: > Rafael J. Wysocki wrote: >> On Tuesday, 23 October 2007 01:00, Gabriel C wrote: >>> Rafael J. Wysocki wrote: >>>> On Monday, 22 October 2007 18:15, Gabriel C wrote: >>>>> Hi all , >>>>> >>>>> I'm running current git + aic7xxx suspend patch from http://bugzilla.kernel.org/show_bug.cgi?id=3062 >>>>> on a Dell Precision WorkStation 530 MT SMP box ( HT enabled ). >>>>> >>>>> Suspend works fine but on resume I have some problems. >>>>> All CPU's but boot CPU won't come back , everything else seems fine. >>>> Can you please try to disable HT and suspend? >>> So only 'Hibernation' is enabled in kernel and HT disabled in BIOS ? >>> >>> If you mean that , sure I can try doing so. >> With suspend or hibernation enabled in the kernel, but with HT disabled in the >> BIOS. > > Ok trying in some minutes. Disabling HT does not make any difference , nor disabling / enabling only one Hibernation or Suspend in kernel and BIOS nor any combination of these. > >>> I also could disable Suspend to RAM completly from BIOS as well if you want. >> No, that rather won't work. >> >>>>> ... >>>>> >>>>> Oct 22 15:02:28 lara [ 49.618795] Enabling non-boot CPUs ... >>>>> Oct 22 15:02:28 lara [ 49.622211] PM: Adding info for No Bus:msr1 >>>>> Oct 22 15:02:28 lara [ 49.622259] PM: Adding info for No Bus:cpu1 >>>>> Oct 22 15:02:28 lara [ 49.622302] SMP alternatives: switching to SMP code >>>>> Oct 22 15:02:28 lara [ 49.623536] Booting processor 1/1 eip 3000 >>>>> Oct 22 15:02:28 lara [ 54.638093] Not responding. >>>>> Oct 22 15:02:28 lara [ 54.638096] Inquiring remote APIC #1... >>>>> Oct 22 15:02:28 lara [ 54.638099] ... APIC #1 ID: failed >>>>> Oct 22 15:02:28 lara [ 54.638204] ... APIC #1 VERSION: failed >>>>> Oct 22 15:02:28 lara [ 54.638307] ... APIC #1 SPIV: failed >>>>> Oct 22 15:02:28 lara [ 54.638427] skipping cpu1, didn't come online >>>>> Oct 22 15:02:28 lara [ 54.638602] PM: Removing info for No Bus:msr1 >>>>> Oct 22 15:02:28 lara [ 54.638643] PM: Removing info for No Bus:cpu1 >>>>> Oct 22 15:02:28 lara [ 54.638678] Error taking CPU1 up: -5 >>>>> Oct 22 15:02:28 lara [ 54.640908] PM: Adding info for No Bus:msr2 >>>>> Oct 22 15:02:28 lara [ 54.640939] PM: Adding info for No Bus:cpu2 >>>>> Oct 22 15:02:28 lara [ 54.640976] SMP alternatives: switching to SMP code >>>>> Oct 22 15:02:28 lara [ 54.641961] Booting processor 2/2 eip 3000 >>>>> Oct 22 15:02:28 lara [ 59.656795] Not responding. >>>>> Oct 22 15:02:28 lara [ 59.656799] Inquiring remote APIC #2... >>>>> Oct 22 15:02:28 lara [ 59.656803] ... APIC #2 ID: failed >>>>> Oct 22 15:02:28 lara [ 59.656907] ... APIC #2 VERSION: failed >>>>> Oct 22 15:02:28 lara [ 59.657011] ... APIC #2 SPIV: failed >>>>> Oct 22 15:02:28 lara [ 59.657131] skipping cpu2, didn't come online >>>>> Oct 22 15:02:28 lara [ 59.657300] PM: Removing info for No Bus:msr2 >>>>> Oct 22 15:02:28 lara [ 59.657343] PM: Removing info for No Bus:cpu2 >>>>> Oct 22 15:02:28 lara [ 59.657379] Error taking CPU2 up: -5 >>>>> Oct 22 15:02:28 lara [ 59.659605] PM: Adding info for No Bus:msr3 >>>>> Oct 22 15:02:28 lara [ 59.659637] PM: Adding info for No Bus:cpu3 >>>>> Oct 22 15:02:28 lara [ 59.659673] SMP alternatives: switching to SMP code >>>>> Oct 22 15:02:28 lara [ 59.660725] Booting processor 3/3 eip 3000 >>>>> Oct 22 15:02:28 lara [ 64.675517] Not responding. >>>>> Oct 22 15:02:28 lara [ 64.675520] Inquiring remote APIC #3... >>>>> Oct 22 15:02:28 lara [ 64.675524] ... APIC #3 ID: failed >>>>> Oct 22 15:02:28 lara [ 64.675628] ... APIC #3 VERSION: failed >>>>> Oct 22 15:02:28 lara [ 64.675731] ... APIC #3 SPIV: failed >>>>> Oct 22 15:02:28 lara [ 64.675859] skipping cpu3, didn't come online >>>>> Oct 22 15:02:28 lara [ 64.676017] PM: Removing info for No Bus:msr3 >>>>> Oct 22 15:02:28 lara [ 64.676059] PM: Removing info for No Bus:cpu3 >>>>> Oct 22 15:02:28 lara [ 64.676092] Error taking CPU3 up: -5 >>>>> Oct 22 15:02:28 lara [ 64.676326] evxfevnt-0079 [00] enable : System is already in ACPI mode >>>>> >>>>> ... >>>>> >>>>> After I've played with a lot boot options I found out booting with ' acpi=ht ' will make the CPU's work again but now >>>>> I have a problem on Suspend. Everything seems to just go down disks etc but the box itself is for some reason still on. >>>>> So I've tested reboot=<> options with no luck. >>>>> ( after waiting 5 minutes to be sure everything is really off I can just hit power button). On resume now everything is fine. >>>>> >>>>> I'm not really sure what is wrong here acpi/hibernation/cpu-hotplug or a mix of all so I'm CC'ing linux-acpi as well. >>>>> The only thing I noticed is the 'Breaking affinity for irq XX' on suspend without acpi=ht messages. >>>>> >>>>> I can't even tell whatever other kernel versions are working because aic7xxx driver didn't got suspend support till now >>>>> ( or at least never worked here ). I know suspend worked fine on windows with that box. >>>>> >>>>> There is my config and dmesg ( good and bad one ) : >>>>> >>>>> >>>>> http://194.231.229.228/suspend/acpi=ht_working_dmesg.txt >>>>> http://194.231.229.228/suspend/dmesg_broken_cpus_on_resume.txt >>>>> http://194.231.229.228/suspend/config >>>> Well, I think we have a problem with the CPU hotplug. >>>> >>>> Can you try to offline-online CPUs (without suspending) and see if that works? >>> Yes does work when I do it manually : >>> >>> [ 6687.595842] CPU 1 is now offline >>> [ 6687.711425] CPU 2 is now offline >>> [ 6687.819330] CPU 3 is now offline >>> [ 6687.819337] SMP alternatives: switching to UP code >>> [ 6702.109605] SMP alternatives: switching to SMP code >>> [ 6702.110634] Booting processor 1/1 eip 3000 >>> [ 6702.122140] Initializing CPU#1 >>> [ 6702.182045] Calibrating delay using timer specific routine.. 3989.26 BogoMIPS (lpj=1994633) >>> [ 6702.182063] CPU: After generic identify, caps: bfebfbff 00000000 00000000 00000000 00004400 00000000 00000000 00000000 >>> [ 6702.182085] CPU: Trace cache: 12K uops, L1 D cache: 8K >>> [ 6702.182091] CPU: L2 cache: 512K >>> [ 6702.182096] CPU: Physical Processor ID: 0 >>> [ 6702.182102] CPU: After all inits, caps: bfebfbff 00000000 00000000 0000b080 00004400 00000000 00000000 00000000 >>> [ 6702.182118] Intel machine check architecture supported. >>> [ 6702.182130] Intel machine check reporting enabled on CPU#1. >>> [ 6702.182137] CPU1: Intel P4/Xeon Extended MCE MSRs (12) available >>> [ 6702.182143] CPU1: Thermal monitoring enabled >>> [ 6702.183563] CPU1: Intel(R) Xeon(TM) CPU 2.00GHz stepping 07 >>> [ 6702.184488] checking TSC synchronization [CPU#0 -> CPU#1]: passed. >>> [ 6702.205500] Switched to high resolution mode on CPU 1 >>> [ 6702.210400] SMP alternatives: switching to SMP code >>> [ 6702.212196] Booting processor 2/2 eip 3000 >>> [ 6702.222693] Initializing CPU#2 >>> [ 6702.282950] Calibrating delay using timer specific routine.. 3988.88 BogoMIPS (lpj=1994443) >>> [ 6702.282962] CPU: After generic identify, caps: 3febfbff 00000000 00000000 00000000 00000000 00000000 00000000 00000000 >>> [ 6702.282974] CPU: Trace cache: 12K uops, L1 D cache: 8K >>> [ 6702.282977] CPU: L2 cache: 512K >>> [ 6702.282980] CPU: Physical Processor ID: 3 >>> [ 6702.282983] CPU: After all inits, caps: 3febfbff 00000000 00000000 0000b080 00000000 00000000 00000000 00000000 >>> [ 6702.282991] Intel machine check architecture supported. >>> [ 6702.282998] Intel machine check reporting enabled on CPU#2. >>> [ 6702.283001] CPU2: Intel P4/Xeon Extended MCE MSRs (12) available >>> [ 6702.283005] CPU2: Thermal monitoring enabled >>> [ 6702.283300] CPU2: Intel(R) XEON(TM) CPU 2.00GHz stepping 04 >>> [ 6702.284296] checking TSC synchronization [CPU#1 -> CPU#2]: passed. >>> [ 6702.305317] Switched to high resolution mode on CPU 2 >>> [ 6702.312356] SMP alternatives: switching to SMP code >>> [ 6702.313995] Booting processor 3/3 eip 3000 >>> [ 6702.324511] Initializing CPU#3 >>> [ 6702.384864] Calibrating delay using timer specific routine.. 3988.87 BogoMIPS (lpj=1994438) >>> [ 6702.384875] CPU: After generic identify, caps: 3febfbff 00000000 00000000 00000000 00000000 00000000 00000000 00000000 >>> [ 6702.384888] CPU: Trace cache: 12K uops, L1 D cache: 8K >>> [ 6702.384891] CPU: L2 cache: 512K >>> [ 6702.384894] CPU: Physical Processor ID: 3 >>> [ 6702.384897] CPU: After all inits, caps: 3febfbff 00000000 00000000 0000b080 00000000 00000000 00000000 00000000 >>> [ 6702.384905] Intel machine check architecture supported. >>> [ 6702.384912] Intel machine check reporting enabled on CPU#3. >>> [ 6702.384915] CPU3: Intel P4/Xeon Extended MCE MSRs (12) available >>> [ 6702.384919] CPU3: Thermal monitoring enabled >>> [ 6702.385146] CPU3: Intel(R) XEON(TM) CPU 2.00GHz stepping 04 >>> [ 6702.386252] checking TSC synchronization [CPU#1 -> CPU#3]: passed. >>> [ 6702.407259] Switched to high resolution mode on CPU 3 >>> >>> ... >>> >>> done with : >>> for i in cpu1 cpu2 cpu3; do echo 0 >/sys/devices/system/cpu/$i/online; done >>> >>> for i in cpu1 cpu2 cpu3; do echo 1 >/sys/devices/system/cpu/$i/online; done >> Hm, well. >> >> Please apply the appended patch and then try: > > Ok will do right after I test with disabled HT > >> # echo 8 > /proc/sys/kernel/printk >> # echo 5 > /sys/power/pm_test_level >> # echo mem > /sys/power/state >> (should wait for approx. 3 sec. and return to the boot prompt) >> # echo 4 > /sys/power/pm_test_level >> # echo mem > /sys/power/state >> (should wait for approx. 3 sec. and return to the boot prompt) >> ... >> # echo 1 > /sys/power/pm_test_level >> # echo mem > /sys/power/state >> (should wait for approx. 3 sec. and return to the boot prompt) >> >> and see if you can reproduce the problem and for which test level. >> >> [Echoing 0 to /sys/power/pm_test_level restores the normal behavior.] Also box just froze on level 3 but I got a ACPI error at least which I didn't got in any other dmesg till now : ( also patch was tested with HT disabled and Suspend and Hibernation enabled in kernel and BIOS ) ... Oct 23 01:51:05 lara [ 273.512374] PM: Removing info for No Bus:input0 Oct 23 01:51:05 lara [ 274.545158] PM: Removing info for No Bus:mouse0 Oct 23 01:51:05 lara [ 274.551435] PM: Removing info for No Bus:event1 Oct 23 01:51:05 lara [ 274.559493] PM: Removing info for No Bus:input1 Oct 23 01:53:06 lara [ 394.869468] ACPI Error (evevent-0303): No installed handler for fixed event [00000002] [20070126] .... ( I hard reseted after that ) I try level 2 and 1 now I just wanted to let you know. >> >> Greetings, >> Rafael >> Gabriel ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Resume problems 2007-10-23 0:11 ` Gabriel C @ 2007-10-23 1:01 ` Gabriel C 2007-10-23 22:57 ` Rafael J. Wysocki 0 siblings, 1 reply; 9+ messages in thread From: Gabriel C @ 2007-10-23 1:01 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: Linux Kernel Mailing List, linux-pm, linux-acpi > Also box just froze on level 3 but I got a ACPI error at least which I didn't got in any other dmesg till now : > ( also patch was tested with HT disabled and Suspend and Hibernation enabled in kernel and BIOS ) > > ... > > Oct 23 01:51:05 lara [ 273.512374] PM: Removing info for No Bus:input0 > Oct 23 01:51:05 lara [ 274.545158] PM: Removing info for No Bus:mouse0 > Oct 23 01:51:05 lara [ 274.551435] PM: Removing info for No Bus:event1 > Oct 23 01:51:05 lara [ 274.559493] PM: Removing info for No Bus:input1 > Oct 23 01:53:06 lara [ 394.869468] ACPI Error (evevent-0303): No installed handler for fixed event [00000002] [20070126] > > .... > > ( I hard reseted after that ) > > I try level 2 and 1 now I just wanted to let you know. > Same issues with level 2 and 1. BTW I found out why my box does not shutdown with acpi=ht. It seems like libata does not like that acpi mode =) dropping the '... read http://linux-ata.org/shutdown.html , power down manually' message. That works perfectly with full acpi here. After all I think all this problems may be some who ACPI related but the question is why they get triggered by Suspend/Hibernation. If you want me to test something else just let me know. Gabriel ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Resume problems 2007-10-23 1:01 ` Gabriel C @ 2007-10-23 22:57 ` Rafael J. Wysocki 2007-10-25 13:09 ` Gabriel C 0 siblings, 1 reply; 9+ messages in thread From: Rafael J. Wysocki @ 2007-10-23 22:57 UTC (permalink / raw) To: Gabriel C; +Cc: Linux Kernel Mailing List, linux-pm, linux-acpi On Tuesday, 23 October 2007 03:01, Gabriel C wrote: > > > Also box just froze on level 3 but I got a ACPI error at least which I didn't got in any other dmesg till now : > > ( also patch was tested with HT disabled and Suspend and Hibernation enabled in kernel and BIOS ) > > > > ... > > > > Oct 23 01:51:05 lara [ 273.512374] PM: Removing info for No Bus:input0 > > Oct 23 01:51:05 lara [ 274.545158] PM: Removing info for No Bus:mouse0 > > Oct 23 01:51:05 lara [ 274.551435] PM: Removing info for No Bus:event1 > > Oct 23 01:51:05 lara [ 274.559493] PM: Removing info for No Bus:input1 > > Oct 23 01:53:06 lara [ 394.869468] ACPI Error (evevent-0303): No installed handler for fixed event [00000002] [20070126] > > > > .... > > > > ( I hard reseted after that ) > > > > I try level 2 and 1 now I just wanted to let you know. > > > > Same issues with level 2 and 1. Yes. If you have a problem at level n, it should always reappear for n-1 etc. > BTW I found out why my box does not shutdown with acpi=ht. It seems like libata does not like that > acpi mode =) dropping the '... read http://linux-ata.org/shutdown.html , power down manually' message. > > That works perfectly with full acpi here. > > After all I think all this problems may be some who ACPI related > but the question is why they get triggered by Suspend/Hibernation. They certainly are ACPI-related, because the only difference between level 4 and level 3 suspend testing is that some global ACPI methods are executed at level 3 (in addition to level 4). Unfortunately, I have no idea what to do next, for now. I think you can file a bug report at http://bugzilla.kernel.org and put a link to this thread in there (against ACPI and please add my address to the CC list). Greetings, Rafael ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Resume problems 2007-10-23 22:57 ` Rafael J. Wysocki @ 2007-10-25 13:09 ` Gabriel C 0 siblings, 0 replies; 9+ messages in thread From: Gabriel C @ 2007-10-25 13:09 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: Linux Kernel Mailing List, linux-pm, linux-acpi Rafael J. Wysocki wrote: >> >> After all I think all this problems may be some who ACPI related >> but the question is why they get triggered by Suspend/Hibernation. > > They certainly are ACPI-related, because the only difference between level 4 > and level 3 suspend testing is that some global ACPI methods are executed > at level 3 (in addition to level 4). > > Unfortunately, I have no idea what to do next, for now. > > I think you can file a bug report at http://bugzilla.kernel.org and put a link > to this thread in there (against ACPI and please add my address to the CC > list). Also I patched 2.6.23 with that patch and Hibernation works out of box , Suspend to Ram seems to work fine , just my video card is acting up ( old nvidia card ) I'll play with vbe tool on weekend. Also I can reproduce that bug in 2.6.23 when I use standby. I've started to bisect but it will take some time. When I'm done I will post an bug report. Thanks for your help so far. > > Greetings, > Rafael > Gabriel ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2007-10-25 13:11 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-10-22 16:15 Resume problems Gabriel C 2007-10-22 22:35 ` Rafael J. Wysocki 2007-10-22 23:00 ` Gabriel C 2007-10-22 23:31 ` Rafael J. Wysocki 2007-10-22 23:22 ` Gabriel C 2007-10-23 0:11 ` Gabriel C 2007-10-23 1:01 ` Gabriel C 2007-10-23 22:57 ` Rafael J. Wysocki 2007-10-25 13:09 ` Gabriel C
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox