* System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED]
From: Doug Smythies @ 2015-09-03 21:40 UTC
To: 'Viresh Kumar', 'Rafael J. Wysocki', 'Saravana Kannan'
Cc: Doug Smythies, linux-pm

As of, or about, Kernel 4.2RC1 if I take my highest numbered
CPU offline (7 in my case), the system will not suspend.
The issue persists through Kernel 4.2.
This is on my test computer with an i7-2600K.
I do not normally use suspend on this computer,
but was doing so while working on a bug report.

The kernel was bisected, and the result was:

$ git bisect bad
87549141d516aee71d511138e27117c41e8aef68 is the first bad commit
commit 87549141d516aee71d511138e27117c41e8aef68
Author: Viresh Kumar <viresh.kumar@linaro.org>
Date:   Wed Jun 10 02:13:21 2015 +0200

    cpufreq: Stop migrating sysfs files on hotplug

See also several e-mails with the above subject line
between June 8th and 10th.

With any other combination of taking CPUs offline,
not including CPU 7, suspend seems to work properly.

Since I sometimes mess up using git bisect, and end up at
some random result, the conclusion was double checked manually:

87549141d516aee71d511138e27117c41e8aef68 has the issue.
11e584cfb8a9d2226151fd39bfa74d09e575f72d (the previous commit) does not have the issue.
* Re: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED]
From: Rafael J. Wysocki @ 2015-09-04 14:59 UTC
To: Doug Smythies
Cc: 'Viresh Kumar', 'Rafael J. Wysocki', 'Saravana Kannan', linux-pm

On Thursday, September 03, 2015 02:40:43 PM Doug Smythies wrote:
> As of, or about, Kernel 4.2RC1 if I take my highest numbered
> CPU offline (7 in my case), the system will not suspend.

Does "will not suspend" mean that the suspend will fail with an error
or will it crash or hang or something else?

> The issue persists through Kernel 4.2.
> This is on my test computer with an i7-2600K.
> I do not normally use suspend on this computer,
> but was doing so while working on a bug report.
>
> The kernel was bisected, and the result was:
>
> $ git bisect bad
> 87549141d516aee71d511138e27117c41e8aef68 is the first bad commit
> commit 87549141d516aee71d511138e27117c41e8aef68
> Author: Viresh Kumar <viresh.kumar@linaro.org>
> Date:   Wed Jun 10 02:13:21 2015 +0200
>
>     cpufreq: Stop migrating sysfs files on hotplug
>
> See also several e-mails with the above subject line
> between June 8th and 10th.
>
> With any other combination of taking CPUs offline,
> not including CPU 7, suspend seems to work properly.

Well, we'll need to debug it some.

Can you please check what's in the

/sys/devices/system/cpu/cpuX/cpufreq/related_cpus

files for all CPUs on your system?

Thanks,
Rafael
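Collecting the requested related_cpus values is easier to read if each value is labelled with its CPU, which a plain `cat cpu?/cpufreq/related_cpus` does not do. The sketch below is not from the thread: `dump_cpufreq_cpus` is a hypothetical helper name, it assumes the standard sysfs layout under /sys/devices/system/cpu, and it also prints the online and affected_cpus values for comparison. The optional root-directory argument exists only so it can be tried on a copy of the tree.

```shell
#!/bin/sh
# Hypothetical helper: print each CPU's online state plus its cpufreq
# affected_cpus and related_cpus files, one labelled line per CPU.
# $1 (optional): sysfs CPU root; defaults to the real location.
dump_cpufreq_cpus() {
    root=${1:-/sys/devices/system/cpu}
    # The glob sorts lexically (cpu10 would come before cpu2), which is
    # fine for the 8-CPU machine in this report.
    for dir in "$root"/cpu[0-9]*; do
        [ -d "$dir" ] || continue
        cpu=${dir##*/}
        # cpu0 usually has no "online" file because it cannot be offlined.
        if [ -r "$dir/online" ]; then
            online=$(cat "$dir/online")
        else
            online="n/a"
        fi
        if [ -d "$dir/cpufreq" ]; then
            affected=$(cat "$dir/cpufreq/affected_cpus" 2>/dev/null)
            related=$(cat "$dir/cpufreq/related_cpus" 2>/dev/null)
            printf '%s: online=%s affected_cpus=[%s] related_cpus=[%s]\n' \
                "$cpu" "$online" "$affected" "$related"
        else
            printf '%s: online=%s (no cpufreq directory)\n' "$cpu" "$online"
        fi
    done
}
```

Running `dump_cpufreq_cpus` once with CPU 6 offline and once with CPU 7 offline gives two directly comparable listings; an empty `affected_cpus=[]` makes it obvious which policy has no online CPU.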
* Re: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED]
From: Viresh Kumar @ 2015-09-04 14:42 UTC
To: Rafael J. Wysocki
Cc: Doug Smythies, 'Rafael J. Wysocki', 'Saravana Kannan', linux-pm

On 04-09-15, 16:59, Rafael J. Wysocki wrote:
> On Thursday, September 03, 2015 02:40:43 PM Doug Smythies wrote:
> > As of, or about, Kernel 4.2RC1 if I take my highest numbered
> > CPU offline (7 in my case), the system will not suspend.
>
> Does "will not suspend" mean that the suspend will fail with an error
> or will it crash or hang or something else?
>
> > The issue persists through Kernel 4.2.
> > This is on my test computer with an i7-2600K.
> > I do not normally use suspend on this computer,
> > but was doing so while working on a bug report.
> >
> > The kernel was bisected, and the result was:
> >
> > $ git bisect bad
> > 87549141d516aee71d511138e27117c41e8aef68 is the first bad commit
> > commit 87549141d516aee71d511138e27117c41e8aef68
> > Author: Viresh Kumar <viresh.kumar@linaro.org>
> > Date:   Wed Jun 10 02:13:21 2015 +0200
> >
> >     cpufreq: Stop migrating sysfs files on hotplug
> >
> > See also several e-mails with the above subject line
> > between June 8th and 10th.
> >
> > With any other combination of taking CPUs offline,
> > not including CPU 7, suspend seems to work properly.
>
> Well, we'll need to debug it some.
>
> Can you please check what's in the
>
> /sys/devices/system/cpu/cpuX/cpufreq/related_cpus
>
> files for all CPUs on your system?

I wanted to give him some patch to debug it a bit more, but couldn't
do that whole day.
@Doug: Can you please enable DEBUG for cpufreq with this:

diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
index 9fde14544ead..c09945aa7f17 100644
--- a/drivers/cpufreq/Makefile
+++ b/drivers/cpufreq/Makefile
@@ -1,3 +1,4 @@
+subdir-ccflags-y := -DDEBUG
 # CPUfreq core
 obj-$(CONFIG_CPU_FREQ) += cpufreq.o freq_table.o


And give us the outputs of both successful and unsuccessful logs?

+ the values of both affected_cpus and related_cpus fields for all
CPUs.

-- 
viresh
* RE: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED]
From: Doug Smythies @ 2015-09-04 18:41 UTC
To: 'Viresh Kumar', 'Rafael J. Wysocki'
Cc: 'Rafael J. Wysocki', 'Saravana Kannan', linux-pm

[-- Attachment #1: Type: text/plain, Size: 1643 bytes --]

On 2015.09.04 07:43 Viresh Kumar wrote:
> On 04-09-15, 16:59, Rafael J. Wysocki wrote:
>> On Thursday, September 03, 2015 02:40:43 PM Doug Smythies wrote:
>>> As of, or about, Kernel 4.2RC1 if I take my highest numbered
>>> CPU offline (7 in my case), the system will not suspend.

> I wanted to give him some patch to debug it a bit more, but couldn't
> do that whole day.

> @Doug: Can you please enable DEBUG for cpufreq with this:
>
> diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
> index 9fde14544ead..c09945aa7f17 100644
> --- a/drivers/cpufreq/Makefile
> +++ b/drivers/cpufreq/Makefile
> @@ -1,3 +1,4 @@
> +subdir-ccflags-y := -DDEBUG
>  # CPUfreq core
>  obj-$(CONFIG_CPU_FREQ) += cpufreq.o freq_table.o
>
>
> And give us the outputs of both successful and unsuccessful logs?

Edited /var/log/kern.log attached (might get stripped for
on-list e-mail deliveries)

> + the values of both affected_cpus and related_cpus fields for all
> CPUs.
Step 1: CPU 6 offline (sudo pm-suspend works):

root@s15:/home/doug# echo -n 0 > /sys/devices/system/cpu/cpu6/online
root@s15:/home/doug# cat /sys/devices/system/cpu/cpu*/online
1 1 1 1 1 0 1
root@s15:/sys/devices/system/cpu# cat cpu?/cpufreq/affected_cpus
0 1 2 3 4 5 7
root@s15:/sys/devices/system/cpu# cat cpu?/cpufreq/related_cpus
0 1 2 3 4 5 6 7

Step 2: CPU 7 offline (sudo pm-suspend does not work):

root@s15:/sys/devices/system/cpu# cat /sys/devices/system/cpu/cpu*/online
1 1 1 1 1 1 0
root@s15:/sys/devices/system/cpu# cat cpu?/cpufreq/affected_cpus
0 1 2 3 4 5 6
root@s15:/sys/devices/system/cpu# cat cpu?/cpufreq/related_cpus
0 1 2 3 4 5 6 7

[-- Attachment #2: log.txt --]
[-- Type: text/plain, Size: 21696 bytes --]

>>>>>> Smythies 2015.09.04 Edited /var/log/kern.log file for Viresh, with Makefile modified.
>>>>>> Linux s15 4.2.0viresh #46 SMP Fri Sep 4 09:00:41 PDT 2015 x86_64 x86_64 x86_64 GNU/Linux
>>>>>> Take CPU 6 offline.
Sep 4 09:21:02 s15 kernel: [ 145.256813] cpufreq: __cpufreq_remove_dev_prepare: unregistering CPU 6
Sep 4 09:21:02 s15 kernel: [ 145.256819] intel_pstate: CPU 6 exiting
Sep 4 09:21:02 s15 kernel: [ 145.274158] smpboot: CPU 6 is now offline
>>>>>> Do a "sudo pm-suspend" that will work properly. Subsequently turn computer on again.
Sep 4 09:26:11 s15 kernel: [ 454.524941] cpufreq: setting new policy for CPU 0: 1600000 - 3800000 kHz
Sep 4 09:26:11 s15 kernel: [ 454.524948] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:26:11 s15 kernel: [ 454.524949] cpufreq: setting range
Sep 4 09:26:11 s15 kernel: [ 454.526917] cpufreq: setting new policy for CPU 1: 1600000 - 3800000 kHz
Sep 4 09:26:11 s15 kernel: [ 454.526923] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:26:11 s15 kernel: [ 454.526924] cpufreq: setting range
Sep 4 09:26:11 s15 kernel: [ 454.528952] cpufreq: setting new policy for CPU 2: 1600000 - 3800000 kHz
Sep 4 09:26:11 s15 kernel: [ 454.528958] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:26:11 s15 kernel: [ 454.528959] cpufreq: setting range
Sep 4 09:26:11 s15 kernel: [ 454.530887] cpufreq: setting new policy for CPU 3: 1600000 - 3800000 kHz
Sep 4 09:26:11 s15 kernel: [ 454.530892] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:26:11 s15 kernel: [ 454.530893] cpufreq: setting range
Sep 4 09:26:11 s15 kernel: [ 454.532526] cpufreq: setting new policy for CPU 4: 1600000 - 3800000 kHz
Sep 4 09:26:11 s15 kernel: [ 454.532531] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:26:11 s15 kernel: [ 454.532533] cpufreq: setting range
Sep 4 09:26:11 s15 kernel: [ 454.534520] cpufreq: setting new policy for CPU 5: 1600000 - 3800000 kHz
Sep 4 09:26:11 s15 kernel: [ 454.534525] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:26:11 s15 kernel: [ 454.534527] cpufreq: setting range
Sep 4 09:26:11 s15 kernel: [ 454.537764] cpufreq: setting new policy for CPU 7: 1600000 - 3800000 kHz
Sep 4 09:26:11 s15 kernel: [ 454.537766] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:26:11 s15 kernel: [ 454.537767] cpufreq: setting range
Sep 4 09:26:28 s15 kernel: [ 454.675889] PM: Syncing filesystems ... done.
Sep 4 09:26:28 s15 kernel: [ 454.782119] PM: Preparing system for sleep (mem)
Sep 4 09:26:28 s15 kernel: [ 454.782272] Freezing user space processes ... (elapsed 0.001 seconds) done.
Sep 4 09:26:28 s15 kernel: [ 454.783462] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
Sep 4 09:26:28 s15 kernel: [ 454.784612] PM: Suspending system (mem)
Sep 4 09:26:28 s15 kernel: [ 454.784628] Suspending console(s) (use no_console_suspend to debug)
... deleted some lines ...
Sep 4 09:26:28 s15 kernel: [ 456.490899] PM: suspend of devices complete after 1707.250 msecs
Sep 4 09:26:28 s15 kernel: [ 456.506867] PM: late suspend of devices complete after 15.976 msecs
Sep 4 09:26:28 s15 kernel: [ 456.507316] pcieport 0000:00:01.0: System wakeup enabled by ACPI
Sep 4 09:26:28 s15 kernel: [ 456.507328] xhci_hcd 0000:07:00.0: System wakeup enabled by ACPI
Sep 4 09:26:28 s15 kernel: [ 456.507374] ehci-pci 0000:00:1d.0: System wakeup enabled by ACPI
Sep 4 09:26:28 s15 kernel: [ 456.507375] r8169 0000:03:00.0: System wakeup enabled by ACPI
Sep 4 09:26:28 s15 kernel: [ 456.507527] ehci-pci 0000:00:1a.0: System wakeup enabled by ACPI
Sep 4 09:26:28 s15 kernel: [ 456.522894] PM: noirq suspend of devices complete after 16.036 msecs
Sep 4 09:26:28 s15 kernel: [ 456.523192] ACPI: Preparing to enter system sleep state S3
Sep 4 09:26:28 s15 kernel: [ 456.523380] PM: Saving platform NVS memory
Sep 4 09:26:28 s15 kernel: [ 456.523389] Disabling non-boot CPUs ...
Sep 4 09:26:28 s15 kernel: [ 456.523415] cpufreq: __cpufreq_remove_dev_prepare: unregistering CPU 1
Sep 4 09:26:28 s15 kernel: [ 456.523416] intel_pstate: CPU 1 exiting
Sep 4 09:26:28 s15 kernel: [ 456.524616] smpboot: CPU 1 is now offline
Sep 4 09:26:28 s15 kernel: [ 456.547041] cpufreq: __cpufreq_remove_dev_prepare: unregistering CPU 2
Sep 4 09:26:28 s15 kernel: [ 456.547043] intel_pstate: CPU 2 exiting
Sep 4 09:26:28 s15 kernel: [ 456.547173] Broke affinity for irq 19
Sep 4 09:26:28 s15 kernel: [ 456.548215] smpboot: CPU 2 is now offline
Sep 4 09:26:28 s15 kernel: [ 456.567001] cpufreq: __cpufreq_remove_dev_prepare: unregistering CPU 3
Sep 4 09:26:28 s15 kernel: [ 456.567003] intel_pstate: CPU 3 exiting
Sep 4 09:26:28 s15 kernel: [ 456.567129] Broke affinity for irq 19
Sep 4 09:26:28 s15 kernel: [ 456.567163] Broke affinity for irq 33
Sep 4 09:26:28 s15 kernel: [ 456.568172] smpboot: CPU 3 is now offline
Sep 4 09:26:28 s15 kernel: [ 456.582960] cpufreq: __cpufreq_remove_dev_prepare: unregistering CPU 4
Sep 4 09:26:28 s15 kernel: [ 456.582961] intel_pstate: CPU 4 exiting
Sep 4 09:26:28 s15 kernel: [ 456.583069] Broke affinity for irq 19
Sep 4 09:26:28 s15 kernel: [ 456.583104] Broke affinity for irq 33
Sep 4 09:26:28 s15 kernel: [ 456.584113] smpboot: CPU 4 is now offline
Sep 4 09:26:28 s15 kernel: [ 456.598918] cpufreq: __cpufreq_remove_dev_prepare: unregistering CPU 5
Sep 4 09:26:28 s15 kernel: [ 456.598920] intel_pstate: CPU 5 exiting
Sep 4 09:26:28 s15 kernel: [ 456.599016] Broke affinity for irq 19
Sep 4 09:26:28 s15 kernel: [ 456.599023] Broke affinity for irq 25
Sep 4 09:26:28 s15 kernel: [ 456.599049] Broke affinity for irq 33
Sep 4 09:26:28 s15 kernel: [ 456.600058] smpboot: CPU 5 is now offline
Sep 4 09:26:28 s15 kernel: [ 456.614896] cpufreq: __cpufreq_remove_dev_prepare: unregistering CPU 7
Sep 4 09:26:28 s15 kernel: [ 456.614898] intel_pstate: CPU 7 exiting
Sep 4 09:26:28 s15 kernel: [ 456.614988] Broke affinity for irq 19
Sep 4 09:26:28 s15 kernel: [ 456.614991] Broke affinity for irq 23
Sep 4 09:26:28 s15 kernel: [ 456.614995] Broke affinity for irq 25
Sep 4 09:26:28 s15 kernel: [ 456.615021] Broke affinity for irq 33
Sep 4 09:26:28 s15 kernel: [ 456.616030] smpboot: CPU 7 is now offline
Sep 4 09:26:28 s15 kernel: [ 456.631985] ACPI: Low-level resume complete
Sep 4 09:26:28 s15 kernel: [ 456.632020] PM: Restoring platform NVS memory
Sep 4 09:26:28 s15 kernel: [ 456.632338] Enabling non-boot CPUs ...
Sep 4 09:26:28 s15 kernel: [ 456.632374] x86: Booting SMP configuration:
Sep 4 09:26:28 s15 kernel: [ 456.632374] smpboot: Booting Node 0 Processor 1 APIC 0x2
Sep 4 09:26:28 s15 kernel: [ 456.644260] cache: parent cpu1 should not be sleeping
Sep 4 09:26:28 s15 kernel: [ 456.644318] cpufreq: adding CPU 1
Sep 4 09:26:28 s15 kernel: [ 456.644323] intel_pstate: controlling: cpu 1
Sep 4 09:26:28 s15 kernel: [ 456.644325] cpufreq: setting new policy for CPU 1: 1600000 - 3800000 kHz
Sep 4 09:26:28 s15 kernel: [ 456.644327] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:26:28 s15 kernel: [ 456.644327] cpufreq: setting range
Sep 4 09:26:28 s15 kernel: [ 456.644328] cpufreq: initialization complete
Sep 4 09:26:28 s15 kernel: [ 456.644364] CPU1 is up
Sep 4 09:26:28 s15 kernel: [ 456.644381] smpboot: Booting Node 0 Processor 2 APIC 0x4
Sep 4 09:26:28 s15 kernel: [ 456.652292] cache: parent cpu2 should not be sleeping
Sep 4 09:26:28 s15 kernel: [ 456.652348] cpufreq: adding CPU 2
Sep 4 09:26:28 s15 kernel: [ 456.652352] intel_pstate: controlling: cpu 2
Sep 4 09:26:28 s15 kernel: [ 456.652355] cpufreq: setting new policy for CPU 2: 1600000 - 3800000 kHz
Sep 4 09:26:28 s15 kernel: [ 456.652356] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:26:28 s15 kernel: [ 456.652357] cpufreq: setting range
Sep 4 09:26:28 s15 kernel: [ 456.652357] cpufreq: initialization complete
Sep 4 09:26:28 s15 kernel: [ 456.652390] CPU2 is up
Sep 4 09:26:28 s15 kernel: [ 456.652406] smpboot: Booting Node 0 Processor 3 APIC 0x6
Sep 4 09:26:28 s15 kernel: [ 456.660308] cache: parent cpu3 should not be sleeping
Sep 4 09:26:28 s15 kernel: [ 456.660366] cpufreq: adding CPU 3
Sep 4 09:26:28 s15 kernel: [ 456.660370] intel_pstate: controlling: cpu 3
Sep 4 09:26:28 s15 kernel: [ 456.660373] cpufreq: setting new policy for CPU 3: 1600000 - 3800000 kHz
Sep 4 09:26:28 s15 kernel: [ 456.660374] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:26:28 s15 kernel: [ 456.660375] cpufreq: setting range
Sep 4 09:26:28 s15 kernel: [ 456.660375] cpufreq: initialization complete
Sep 4 09:26:28 s15 kernel: [ 456.660409] CPU3 is up
Sep 4 09:26:28 s15 kernel: [ 456.660426] smpboot: Booting Node 0 Processor 4 APIC 0x1
Sep 4 09:26:28 s15 kernel: [ 456.668263] cache: parent cpu4 should not be sleeping
Sep 4 09:26:28 s15 kernel: [ 456.668300] cpufreq: adding CPU 4
Sep 4 09:26:28 s15 kernel: [ 456.668304] intel_pstate: controlling: cpu 4
Sep 4 09:26:28 s15 kernel: [ 456.668307] cpufreq: setting new policy for CPU 4: 1600000 - 3800000 kHz
Sep 4 09:26:28 s15 kernel: [ 456.668308] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:26:28 s15 kernel: [ 456.668308] cpufreq: setting range
Sep 4 09:26:28 s15 kernel: [ 456.668309] cpufreq: initialization complete
Sep 4 09:26:28 s15 kernel: [ 456.668336] CPU4 is up
Sep 4 09:26:28 s15 kernel: [ 456.668349] smpboot: Booting Node 0 Processor 5 APIC 0x3
Sep 4 09:26:28 s15 kernel: [ 456.680270] cache: parent cpu5 should not be sleeping
Sep 4 09:26:28 s15 kernel: [ 456.680307] cpufreq: adding CPU 5
Sep 4 09:26:28 s15 kernel: [ 456.680311] intel_pstate: controlling: cpu 5
Sep 4 09:26:28 s15 kernel: [ 456.680313] cpufreq: setting new policy for CPU 5: 1600000 - 3800000 kHz
Sep 4 09:26:28 s15 kernel: [ 456.680314] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:26:28 s15 kernel: [ 456.680314] cpufreq: setting range
Sep 4 09:26:28 s15 kernel: [ 456.680315] cpufreq: initialization complete
Sep 4 09:26:28 s15 kernel: [ 456.680341] CPU5 is up
Sep 4 09:26:28 s15 kernel: [ 456.680355] smpboot: Booting Node 0 Processor 7 APIC 0x7
Sep 4 09:26:28 s15 kernel: [ 456.688288] cache: parent cpu7 should not be sleeping
Sep 4 09:26:28 s15 kernel: [ 456.688326] cpufreq: adding CPU 7
Sep 4 09:26:28 s15 kernel: [ 456.688330] intel_pstate: controlling: cpu 7
Sep 4 09:26:28 s15 kernel: [ 456.688333] cpufreq: setting new policy for CPU 7: 1600000 - 3800000 kHz
Sep 4 09:26:28 s15 kernel: [ 456.688333] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:26:28 s15 kernel: [ 456.688334] cpufreq: setting range
Sep 4 09:26:28 s15 kernel: [ 456.688334] cpufreq: initialization complete
Sep 4 09:26:28 s15 kernel: [ 456.688360] CPU7 is up
Sep 4 09:26:28 s15 kernel: [ 456.693743] ACPI: Waking up from system sleep state S3
Sep 4 09:26:28 s15 kernel: [ 456.708292] ehci-pci 0000:00:1a.0: System wakeup disabled by ACPI
Sep 4 09:26:28 s15 kernel: [ 456.708418] xhci_hcd 0000:07:00.0: System wakeup disabled by ACPI
Sep 4 09:26:28 s15 kernel: [ 456.708583] ehci-pci 0000:00:1d.0: System wakeup disabled by ACPI
Sep 4 09:26:28 s15 kernel: [ 456.708607] PM: noirq resume of devices complete after 14.593 msecs
Sep 4 09:26:28 s15 kernel: [ 456.708929] PM: early resume of devices complete after 0.295 msecs
Sep 4 09:26:28 s15 kernel: [ 456.709055] pcieport 0000:00:01.0: System wakeup disabled by ACPI
Sep 4 09:26:28 s15 kernel: [ 456.709059] r8169 0000:03:00.0: System wakeup disabled by ACPI
Sep 4 09:26:28 s15 kernel: [ 456.709106] rtc_cmos 00:02: System wakeup disabled by ACPI
Sep 4 09:26:28 s15 kernel: [ 456.709123] usb usb1: root hub lost power or was reset
Sep 4 09:26:28 s15 kernel: [ 456.709124] usb usb2: root hub lost power or was reset
Sep 4 09:26:28 s15 kernel: [ 456.709537] parport_pc 00:05: activated
Sep 4 09:26:28 s15 kernel: [ 456.709572] i8042 kbd 00:07: System wakeup disabled by ACPI
Sep 4 09:26:28 s15 kernel: [ 456.710359] serial 00:08: activated
Sep 4 09:26:28 s15 kernel: [ 456.713382] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20150619/psargs-359)
Sep 4 09:26:28 s15 kernel: [ 456.713386] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.CHN0.DRV0._GTF] (Node ffff88040ecc8438), AE_NOT_FOUND (20150619/psparse-536)
Sep 4 09:26:28 s15 kernel: [ 456.713407] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20150619/psargs-359)
Sep 4 09:26:28 s15 kernel: [ 456.713410] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.CHN0.DRV1._GTF] (Node ffff88040ecc84b0), AE_NOT_FOUND (20150619/psparse-536)
Sep 4 09:26:28 s15 kernel: [ 456.716059] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20150619/psargs-359)
Sep 4 09:26:28 s15 kernel: [ 456.716068] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT1.CHN0.DRV0._GTF] (Node ffff88040ecc87d0), AE_NOT_FOUND (20150619/psparse-536)
Sep 4 09:26:28 s15 kernel: [ 456.718140] sd 0:0:0:0: [sda] Starting disk
Sep 4 09:26:28 s15 kernel: [ 456.718142] sd 0:0:1:0: [sdb] Starting disk
Sep 4 09:26:28 s15 kernel: [ 456.830264] r8169 0000:03:00.0 eth0: link down
Sep 4 09:26:28 s15 kernel: [ 456.830272] br0: port 1(eth0) entered disabled state
Sep 4 09:26:28 s15 kernel: [ 457.035873] ata5: SATA link down (SStatus 0 SControl 300)
Sep 4 09:26:28 s15 kernel: [ 457.035921] ata6: SATA link down (SStatus 0 SControl 300)
Sep 4 09:26:28 s15 kernel: [ 457.191764] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 330)
Sep 4 09:26:28 s15 kernel: [ 457.195886] usb 1-2: reset low-speed USB device number 2 using xhci_hcd
Sep 4 09:26:28 s15 kernel: [ 457.199927] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20150619/psargs-359)
Sep 4 09:26:28 s15 kernel: [ 457.199930] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT1.CHN0.DRV0._GTF] (Node ffff88040ecc87d0), AE_NOT_FOUND (20150619/psparse-536)
Sep 4 09:26:28 s15 kernel: [ 457.215848] ata3.00: configured for UDMA/100
Sep 4 09:26:28 s15 kernel: [ 457.473072] usb 1-2: ep 0x81 - rounding interval to 64 microframes, ep desc says 80 microframes
Sep 4 09:26:28 s15 kernel: [ 457.476193] PM: resume of devices complete after 767.759 msecs
Sep 4 09:26:28 s15 kernel: [ 457.476362] PM: Finishing wakeup.
Sep 4 09:26:28 s15 kernel: [ 457.476363] Restarting tasks ... done.
... deleted some lines...
Sep 4 09:26:36 s15 kernel: [ 464.735231] cpufreq: setting new policy for CPU 0: 1600000 - 3800000 kHz
Sep 4 09:26:36 s15 kernel: [ 464.735235] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:26:36 s15 kernel: [ 464.735236] cpufreq: setting range
Sep 4 09:26:36 s15 kernel: [ 464.735678] cpufreq: setting new policy for CPU 1: 1600000 - 3800000 kHz
Sep 4 09:26:36 s15 kernel: [ 464.735680] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:26:36 s15 kernel: [ 464.735681] cpufreq: setting range
Sep 4 09:26:36 s15 kernel: [ 464.736112] cpufreq: setting new policy for CPU 2: 1600000 - 3800000 kHz
Sep 4 09:26:36 s15 kernel: [ 464.736114] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:26:36 s15 kernel: [ 464.736115] cpufreq: setting range
Sep 4 09:26:36 s15 kernel: [ 464.736547] cpufreq: setting new policy for CPU 3: 1600000 - 3800000 kHz
Sep 4 09:26:36 s15 kernel: [ 464.736549] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:26:36 s15 kernel: [ 464.736550] cpufreq: setting range
Sep 4 09:26:36 s15 kernel: [ 464.736968] cpufreq: setting new policy for CPU 4: 1600000 - 3800000 kHz
Sep 4 09:26:36 s15 kernel: [ 464.736971] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:26:36 s15 kernel: [ 464.736972] cpufreq: setting range
Sep 4 09:26:36 s15 kernel: [ 464.737388] cpufreq: setting new policy for CPU 5: 1600000 - 3800000 kHz
Sep 4 09:26:36 s15 kernel: [ 464.737390] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:26:36 s15 kernel: [ 464.737391] cpufreq: setting range
Sep 4 09:26:36 s15 kernel: [ 464.738325] cpufreq: setting new policy for CPU 7: 1600000 - 3800000 kHz
Sep 4 09:26:36 s15 kernel: [ 464.738328] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:26:36 s15 kernel: [ 464.738329] cpufreq: setting range
Sep 4 09:26:39 s15 kernel: [ 468.184616] br0: port 1(eth0) entered forwarding state
>>>>>> Now put CPU 6 back online and take CPU 7 offline.
Sep 4 09:30:52 s15 kernel: [ 720.581828] smpboot: Booting Node 0 Processor 6 APIC 0x5
Sep 4 09:30:52 s15 kernel: [ 720.602104] cpufreq: adding CPU 6
Sep 4 09:30:52 s15 kernel: [ 720.602142] intel_pstate: controlling: cpu 6
Sep 4 09:30:52 s15 kernel: [ 720.602147] cpufreq: setting new policy for CPU 6: 1600000 - 3800000 kHz
Sep 4 09:30:52 s15 kernel: [ 720.602150] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:30:52 s15 kernel: [ 720.602151] cpufreq: setting range
Sep 4 09:30:52 s15 kernel: [ 720.602153] cpufreq: initialization complete
Sep 4 09:31:03 s15 kernel: [ 732.221929] cpufreq: __cpufreq_remove_dev_prepare: unregistering CPU 7
Sep 4 09:31:03 s15 kernel: [ 732.221932] intel_pstate: CPU 7 exiting
Sep 4 09:31:03 s15 kernel: [ 732.235076] smpboot: CPU 7 is now offline
>>>>>> about to do "sudo pm-suspend" that will fail.
Sep 4 09:32:43 s15 kernel: [ 831.613558] cpufreq: setting new policy for CPU 0: 1600000 - 3800000 kHz
Sep 4 09:32:43 s15 kernel: [ 831.613562] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:32:43 s15 kernel: [ 831.613563] cpufreq: setting range
Sep 4 09:32:43 s15 kernel: [ 831.614373] cpufreq: setting new policy for CPU 1: 1600000 - 3800000 kHz
Sep 4 09:32:43 s15 kernel: [ 831.614375] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:32:43 s15 kernel: [ 831.614376] cpufreq: setting range
Sep 4 09:32:43 s15 kernel: [ 831.615165] cpufreq: setting new policy for CPU 2: 1600000 - 3800000 kHz
Sep 4 09:32:43 s15 kernel: [ 831.615166] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:32:43 s15 kernel: [ 831.615167] cpufreq: setting range
Sep 4 09:32:43 s15 kernel: [ 831.615956] cpufreq: setting new policy for CPU 3: 1600000 - 3800000 kHz
Sep 4 09:32:43 s15 kernel: [ 831.615957] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:32:43 s15 kernel: [ 831.615958] cpufreq: setting range
Sep 4 09:32:43 s15 kernel: [ 831.616740] cpufreq: setting new policy for CPU 4: 1600000 - 3800000 kHz
Sep 4 09:32:43 s15 kernel: [ 831.616741] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:32:43 s15 kernel: [ 831.616742] cpufreq: setting range
Sep 4 09:32:43 s15 kernel: [ 831.617594] cpufreq: setting new policy for CPU 5: 1600000 - 3800000 kHz
Sep 4 09:32:43 s15 kernel: [ 831.617596] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:32:43 s15 kernel: [ 831.617596] cpufreq: setting range
Sep 4 09:32:43 s15 kernel: [ 831.618386] cpufreq: setting new policy for CPU 6: 1600000 - 3800000 kHz
Sep 4 09:32:43 s15 kernel: [ 831.618387] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 09:32:43 s15 kernel: [ 831.618388] cpufreq: setting range
>>>>>> During write up, realize that the last entries above must have been during the "sudo pm-suspend" that fails.
>>>>>> Verify that these log entries are all that occur for the failed "sudo pm-suspend" by doing another:
Sep 4 11:15:38 s15 kernel: [ 7002.787685] cpufreq: setting new policy for CPU 0: 1600000 - 3800000 kHz
Sep 4 11:15:38 s15 kernel: [ 7002.787689] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 11:15:38 s15 kernel: [ 7002.787690] cpufreq: setting range
Sep 4 11:15:38 s15 kernel: [ 7002.788544] cpufreq: setting new policy for CPU 1: 1600000 - 3800000 kHz
Sep 4 11:15:38 s15 kernel: [ 7002.788546] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 11:15:38 s15 kernel: [ 7002.788547] cpufreq: setting range
Sep 4 11:15:38 s15 kernel: [ 7002.789389] cpufreq: setting new policy for CPU 2: 1600000 - 3800000 kHz
Sep 4 11:15:38 s15 kernel: [ 7002.789391] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 11:15:38 s15 kernel: [ 7002.789391] cpufreq: setting range
Sep 4 11:15:38 s15 kernel: [ 7002.790210] cpufreq: setting new policy for CPU 3: 1600000 - 3800000 kHz
Sep 4 11:15:38 s15 kernel: [ 7002.790212] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 11:15:38 s15 kernel: [ 7002.790213] cpufreq: setting range
Sep 4 11:15:38 s15 kernel: [ 7002.790992] cpufreq: setting new policy for CPU 4: 1600000 - 3800000 kHz
Sep 4 11:15:38 s15 kernel: [ 7002.790994] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 11:15:38 s15 kernel: [ 7002.790994] cpufreq: setting range
Sep 4 11:15:38 s15 kernel: [ 7002.791891] cpufreq: setting new policy for CPU 5: 1600000 - 3800000 kHz
Sep 4 11:15:38 s15 kernel: [ 7002.791893] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 11:15:38 s15 kernel: [ 7002.791894] cpufreq: setting range
Sep 4 11:15:38 s15 kernel: [ 7002.792739] cpufreq: setting new policy for CPU 6: 1600000 - 3800000 kHz
Sep 4 11:15:38 s15 kernel: [ 7002.792741] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep 4 11:15:38 s15 kernel: [ 7002.792741] cpufreq: setting range
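When comparing the working and failing runs in the attached log, it helps to list, in order, which CPUs the cpufreq core reported "setting new policy for CPU N" for. A minimal sketch, assuming a kern.log excerpt with one kernel message per line and a kernel built with Viresh's -DDEBUG Makefile change (without it these debug messages do not appear); `policy_cpus` is a hypothetical helper name, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: print the CPU numbers from every
# "cpufreq: setting new policy for CPU N:" line, space-separated,
# in the order they appear in the log.
# $1 (optional): log file; /var/log/kern.log is an assumed default.
policy_cpus() {
    log=${1:-/var/log/kern.log}
    # Keep only the CPU number from matching lines, then join them.
    sed -n 's/.*cpufreq: setting new policy for CPU \([0-9][0-9]*\):.*/\1/p' "$log" |
        tr '\n' ' ' | sed 's/ $//'
    echo
}
```

Running it on an excerpt that holds a single pm-suspend attempt and comparing the result with /sys/devices/system/cpu/online shows whether every online CPU had its policy set before the output stops.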
* Re: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED]
From: Rafael J. Wysocki @ 2015-09-04 22:26 UTC
To: Doug Smythies
Cc: Viresh Kumar, Rafael J. Wysocki, Saravana Kannan, linux-pm@vger.kernel.org

On Fri, Sep 4, 2015 at 8:41 PM, Doug Smythies <dsmythies@telus.net> wrote:
> On 2015.09.04 07:43 Viresh Kumar wrote:
>> On 04-09-15, 16:59, Rafael J. Wysocki wrote:
>>> On Thursday, September 03, 2015 02:40:43 PM Doug Smythies wrote:
>>>> As of, or about, Kernel 4.2RC1 if I take my highest numbered
>>>> CPU offline (7 in my case), the system will not suspend.
>
>> I wanted to give him some patch to debug it a bit more, but couldn't
>> do that whole day.
>
>> @Doug: Can you please enable DEBUG for cpufreq with this:
>>
>> diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
>> index 9fde14544ead..c09945aa7f17 100644
>> --- a/drivers/cpufreq/Makefile
>> +++ b/drivers/cpufreq/Makefile
>> @@ -1,3 +1,4 @@
>> +subdir-ccflags-y := -DDEBUG
>>  # CPUfreq core
>>  obj-$(CONFIG_CPU_FREQ) += cpufreq.o freq_table.o
>>
>>
>> And give us the outputs of both successful and unsuccessful logs?
>
> Edited /var/log/kern.log attached (might get stripped for
> on-list e-mail deliveries)

Hmm.

I suspect that your user space does something that fails during the pm-suspend.

Instead of invoking the pm-suspend command, can you simply do (as root)

# echo mem > /sys/power/state

and see if that behaves in the same way?

Thanks,
Rafael
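Rafael's suggestion bypasses whatever user-space steps the pm-suspend command performs before suspending. A slightly more defensive form of the same command is sketched below as a hypothetical shell function (`suspend_to_ram` is not from the thread): it first checks that "mem" is listed in /sys/power/state and then reports whether the write itself returned an error. The optional state-file argument exists only so the function can be exercised against a scratch file.

```shell
#!/bin/sh
# Hypothetical helper: request suspend-to-RAM through sysfs and report
# the result. Must run as root against the real /sys/power/state.
# $1 (optional): state file; defaults to /sys/power/state.
suspend_to_ram() {
    state_file=${1:-/sys/power/state}
    # Only try "mem" if the kernel lists it as a supported state.
    if ! grep -qw mem "$state_file"; then
        echo "mem not listed in $state_file" >&2
        return 1
    fi
    # On the real file the write returns only after resume; a failed
    # write means the kernel reported an error for the transition.
    if echo mem > "$state_file"; then
        echo "resumed OK"
    else
        echo "write to $state_file failed" >&2
        return 1
    fi
}
```

With CPU 7 offline, `suspend_to_ram` either prints "resumed OK" after wake-up or exits non-zero, which separates a kernel-side failure from a problem in the pm-suspend script.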
* RE: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED]
From: Doug Smythies @ 2015-09-04 23:05 UTC
To: 'Rafael J. Wysocki'
Cc: 'Viresh Kumar', 'Rafael J. Wysocki', 'Saravana Kannan', linux-pm

On 2015.09.04 15:26 Rafael J. Wysocki wrote:
> On Fri, Sep 4, 2015 at 8:41 PM, Doug Smythies <dsmythies@telus.net> wrote:
>> On 2015.09.04 07:43 Viresh Kumar wrote:
>>> On 04-09-15, 16:59, Rafael J. Wysocki wrote:
>>>> On Thursday, September 03, 2015 02:40:43 PM Doug Smythies wrote:
>>>>> As of, or about, Kernel 4.2RC1 if I take my highest numbered
>>>>> CPU offline (7 in my case), the system will not suspend.
>
>>> @Doug: Can you please enable DEBUG for cpufreq with this:
>>>
>>> diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
>>> index 9fde14544ead..c09945aa7f17 100644
>>> --- a/drivers/cpufreq/Makefile
>>> +++ b/drivers/cpufreq/Makefile
>>> @@ -1,3 +1,4 @@
>>> +subdir-ccflags-y := -DDEBUG
>>>  # CPUfreq core
>>>  obj-$(CONFIG_CPU_FREQ) += cpufreq.o freq_table.o
>>>
>>>
>>> And give us the outputs of both successful and unsuccessful logs?
>>
>> Edited /var/log/kern.log attached (might get stripped for
>> on-list e-mail deliveries)

> Hmm.
> I suspect that your user space does something that fails during the pm-suspend.

Are you saying that the patch might be O.K., but reveals
an issue with pm-suspend that was always there?

> Instead of invoking the pm-suspend command, can you simply do (as root)
> # echo mem > /sys/power/state
> and see if that behaves in the same way?

With CPU 7 offline, that method seems to suspend just fine.
I did not check any other scenarios.
* Re: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED] 2015-09-04 23:05 ` Doug Smythies @ 2015-09-05 0:22 ` Rafael J. Wysocki 2015-09-05 1:41 ` Rafael J. Wysocki 2015-09-05 2:34 ` Doug Smythies 0 siblings, 2 replies; 24+ messages in thread From: Rafael J. Wysocki @ 2015-09-05 0:22 UTC (permalink / raw) To: Doug Smythies Cc: Rafael J. Wysocki, Viresh Kumar, Rafael J. Wysocki, Saravana Kannan, linux-pm@vger.kernel.org On Sat, Sep 5, 2015 at 1:05 AM, Doug Smythies <dsmythies@telus.net> wrote: > On 2015.09.04 15:26 Rafael J. Wysocki wrote: >> On Fri, Sep 4, 2015 at 8:41 PM, Doug Smythies <dsmythies@telus.net> wrote: >>> On 2015.09.04 07:43 Viresh Kumar wrote: >>>> On 04-09-15, 16:59, Rafael J. Wysocki wrote: >>>>> On Thursday, September 03, 2015 02:40:43 PM Doug Smythies wrote: >>>>>> As of, or about, Kernel 4.2RC1 if I take my highest numbered >>>>>> CPU offline (7 in my case), the system will not suspend. >> >>>> @Doug: Can you please enable DEBUG for cpufreq with this: >>>> >>>> diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile >>>> index 9fde14544ead..c09945aa7f17 100644 >>>> --- a/drivers/cpufreq/Makefile >>>> +++ b/drivers/cpufreq/Makefile >>>> @@ -1,3 +1,4 @@ >>>> +subdir-ccflags-y := -DDEBUG >>>> # CPUfreq core >>>> obj-$(CONFIG_CPU_FREQ) += cpufreq.o freq_table.o >>>> >>>> >>>> And give us the outputs of both successful and unsuccessful logs? >>> >>> Edited /var/log/kern.log attached (might get stripped for >>> on-list e-mail deliveries) > >> Hmm. >> I suspect that your user space does something that fails during the pm-suspend. > > Are you saying that the patch might be O.K., but reveals > and issue with pm-suspend that was always there? Or it breaks something that pm-suspend does before suspending. It would be good to know what it is. 
:-) The "setting new policy for" messages in your log are from cpufreq_set_policy() and the last thing printed before pm-suspend exits in the failing case is before calling cpufreq_driver->setpolicy(). I guess we need to focus on that one, will send you a debug patch shortly. >> Instead of invoking the pm-suspend command, can you simply do (as root) >> # echo mem > /sys/power/state >> and see if that behaves in the same way? > > With CPU 7 offline, that method seems to suspend just fine. > I did not check any other scenarios. OK I'm now suspecting that the change in question might break something in intel_pstate which causes ->setpolicy() to fail for the last online CPU. Can you try to offline CPU7 and try to play with min and max for CPU6? Thanks, Rafael
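The experiment Rafael asks for (offline CPU7, then change the min/max limits of CPU6) can be scripted roughly as below. This is a sketch, assuming root on the real machine and the standard `online`, `scaling_min_freq` and `scaling_max_freq` sysfs attributes; the helper names and the `SYSFS_CPU_DIR` override are hypothetical, added only so the steps can be dry-run against a scratch directory.

```shell
#!/bin/sh
# Sketch only: take one CPU offline, then change another CPU's limits.
# SYSFS_CPU_DIR, offline_cpu and set_limits_khz are hypothetical names.

SYSFS_CPU_DIR=${SYSFS_CPU_DIR:-/sys/devices/system/cpu}

offline_cpu() {
	# $1 = CPU number
	echo 0 > "$SYSFS_CPU_DIR/cpu$1/online"
}

set_limits_khz() {
	# $1 = CPU number, $2 = new minimum (kHz), $3 = new maximum (kHz)
	pol="$SYSFS_CPU_DIR/cpu$1/cpufreq"
	echo "$2" > "$pol/scaling_min_freq" &&
		echo "$3" > "$pol/scaling_max_freq"
}
```

For example, `offline_cpu 7` followed by `set_limits_khz 6 1600000 2200000` requests the same 1600000 - 2200000 kHz range that shows up for CPU 6 in the kern.log segment Doug posted in reply.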
* Re: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED] 2015-09-05 0:22 ` Rafael J. Wysocki @ 2015-09-05 1:41 ` Rafael J. Wysocki 2015-09-05 2:34 ` Doug Smythies 1 sibling, 0 replies; 24+ messages in thread From: Rafael J. Wysocki @ 2015-09-05 1:41 UTC (permalink / raw) To: Doug Smythies Cc: Rafael J. Wysocki, Viresh Kumar, Saravana Kannan, linux-pm@vger.kernel.org On Saturday, September 05, 2015 02:22:39 AM Rafael J. Wysocki wrote: > On Sat, Sep 5, 2015 at 1:05 AM, Doug Smythies <dsmythies@telus.net> wrote: > > On 2015.09.04 15:26 Rafael J. Wysocki wrote: > >> On Fri, Sep 4, 2015 at 8:41 PM, Doug Smythies <dsmythies@telus.net> wrote: > >>> On 2015.09.04 07:43 Viresh Kumar wrote: > >>>> On 04-09-15, 16:59, Rafael J. Wysocki wrote: > >>>>> On Thursday, September 03, 2015 02:40:43 PM Doug Smythies wrote: > >>>>>> As of, or about, Kernel 4.2RC1 if I take my highest numbered > >>>>>> CPU offline (7 in my case), the system will not suspend. > >> > >>>> @Doug: Can you please enable DEBUG for cpufreq with this: > >>>> > >>>> diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile > >>>> index 9fde14544ead..c09945aa7f17 100644 > >>>> --- a/drivers/cpufreq/Makefile > >>>> +++ b/drivers/cpufreq/Makefile > >>>> @@ -1,3 +1,4 @@ > >>>> +subdir-ccflags-y := -DDEBUG > >>>> # CPUfreq core > >>>> obj-$(CONFIG_CPU_FREQ) += cpufreq.o freq_table.o > >>>> > >>>> > >>>> And give us the outputs of both successful and unsuccessful logs? > >>> > >>> Edited /var/log/kern.log attached (might get stripped for > >>> on-list e-mail deliveries) > > > >> Hmm. > >> I suspect that your user space does something that fails during the pm-suspend. > > > > Are you saying that the patch might be O.K., but reveals > > and issue with pm-suspend that was always there? > > Or it breaks something that pm-suspend does before suspending. > > It would be good to know what it is. 
:-) > > The "setting new policy for" messages in your log are from > cpufreq_set_policy() and the last thing printed before pm-suspend > exits in the failing case is before calling > cpufreq_driver->setpolicy(). > > I guess we need to focus on that one, will send you a debug patch shortly. Please apply this one and see if there are any messages from intel_pstate_set_policy() in the failing case.
---
 drivers/cpufreq/intel_pstate.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Index: linux-pm/drivers/cpufreq/intel_pstate.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -972,8 +972,10 @@ static unsigned int intel_pstate_get(uns
 
 static int intel_pstate_set_policy(struct cpufreq_policy *policy)
 {
-	if (!policy->cpuinfo.max_freq)
+	if (!policy->cpuinfo.max_freq) {
+		pr_err("%s: failed for CPU%d\n", __func__, policy->cpu);
 		return -ENODEV;
+	}
 
 	if (policy->policy == CPUFREQ_POLICY_PERFORMANCE &&
 	    policy->max >= policy->cpuinfo.max_freq) {
* RE: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED] 2015-09-05 0:22 ` Rafael J. Wysocki 2015-09-05 1:41 ` Rafael J. Wysocki @ 2015-09-05 2:34 ` Doug Smythies 2015-09-05 7:46 ` Doug Smythies 1 sibling, 1 reply; 24+ messages in thread From: Doug Smythies @ 2015-09-05 2:34 UTC (permalink / raw) To: 'Rafael J. Wysocki' Cc: 'Viresh Kumar', 'Rafael J. Wysocki', 'Saravana Kannan', linux-pm [-- Attachment #1: Type: text/plain, Size: 2866 bytes --] On 2015.09.04 17:23 Rafael J. Wysocki wrote: > On Sat, Sep 5, 2015 at 1:05 AM, Doug Smythies <dsmythies@telus.net> wrote: >> On 2015.09.04 15:26 Rafael J. Wysocki wrote: >>> On Fri, Sep 4, 2015 at 8:41 PM, Doug Smythies <dsmythies@telus.net> wrote: >>>> On 2015.09.04 07:43 Viresh Kumar wrote: >>>>> On 04-09-15, 16:59, Rafael J. Wysocki wrote: >>>>>> On Thursday, September 03, 2015 02:40:43 PM Doug Smythies wrote: >>>>>>> As of, or about, Kernel 4.2RC1 if I take my highest numbered >>>>>>> CPU offline (7 in my case), the system will not suspend. >>> >>>>> @Doug: Can you please enable DEBUG for cpufreq with this: >>>>> >>>>> diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile >>>>> index 9fde14544ead..c09945aa7f17 100644 >>>>> --- a/drivers/cpufreq/Makefile >>>>> +++ b/drivers/cpufreq/Makefile >>>>> @@ -1,3 +1,4 @@ >>>>> +subdir-ccflags-y := -DDEBUG >>>>> # CPUfreq core >>>>> obj-$(CONFIG_CPU_FREQ) += cpufreq.o freq_table.o >>>>> >>>>> >>>>> And give us the outputs of both successful and unsuccessful logs? >>>> >>>> Edited /var/log/kern.log attached (might get stripped for >>>> on-list e-mail deliveries) >> >>> Hmm. >>> I suspect that your user space does something that fails during the pm-suspend. >> >> Are you saying that the patch might be O.K., but reveals >> and issue with pm-suspend that was always there? > > Or it breaks something that pm-suspend does before suspending. > > It would be good to know what it is. 
:-) While researching pm-utils bugs, I found a reference to /var/log/pm-suspend.log, which I had not noticed before. A relevant extract is attached. It is not clear to me why that echo line (there is only one) would fail. > The "setting new policy for" messages in your log are from > cpufreq_set_policy() and the last thing printed before pm-suspend > exits in the failing case is before calling > cpufreq_driver->setpolicy(). > > I guess we need to focus on that one, will send you a debug patch shortly. > >>> Instead of invoking the pm-suspend command, can you simply do (as root) >>> # echo mem > /sys/power/state >>> and see if that behaves in the same way? >> >> With CPU 7 offline, that method seems to suspend just fine. >> I did not check any other scenarios. > > OK > > I'm now suspecting that the change in question might break something > in intel_pstate which causes ->setpolicy() to fail for the last online > CPU. While, by far, most of my work on this has been done using the intel_pstate scaling driver, I have also tested using the acpi-cpufreq scaling driver, with the same results. > Can you try to offline CPU7 and try to play with min and max for CPU6? Yes. I did, on a kernel with your intel_pstate.c patch from your subsequent e-mail. I didn't notice any problem, but maybe I did not do it in a way that triggers the issue. A small /var/log/kern.log segment is attached. ... Doug
[-- Attachment #2: pm_log.txt --]
[-- Type: text/plain, Size: 1091 bytes --]

Running hook /usr/lib/pm-utils/sleep.d/60_wpa_supplicant suspend suspend:
Failed to connect to non-global ctrl_ifname: (null) error: No such file or directory
/usr/lib/pm-utils/sleep.d/60_wpa_supplicant suspend suspend: success.
Running hook /usr/lib/pm-utils/sleep.d/75modules suspend suspend:
/usr/lib/pm-utils/sleep.d/75modules suspend suspend: not applicable.
Running hook /usr/lib/pm-utils/sleep.d/90clock suspend suspend:
/usr/lib/pm-utils/sleep.d/90clock suspend suspend: not applicable.
Running hook /usr/lib/pm-utils/sleep.d/94cpufreq suspend suspend:
sh: echo: I/O error
/usr/lib/pm-utils/sleep.d/94cpufreq suspend suspend: Returned exit code 1.
Fri Sep 4 18:25:00 PDT 2015: Inhibit found, will not perform suspend
Fri Sep 4 18:25:00 PDT 2015: Running hooks for resume
Running hook /usr/lib/pm-utils/sleep.d/90clock resume suspend:
/usr/lib/pm-utils/sleep.d/90clock resume suspend: not applicable.
Running hook /usr/lib/pm-utils/sleep.d/75modules resume suspend:
Reloaded unloaded modules.
/usr/lib/pm-utils/sleep.d/75modules resume suspend: success.

[-- Attachment #3: log_2.txt --]
[-- Type: text/plain, Size: 795 bytes --]

Sep 4 19:14:31 s15 kernel: [ 115.529407] cpufreq: __cpufreq_remove_dev_prepare: unregistering CPU 7
Sep 4 19:14:31 s15 kernel: [ 115.529413] intel_pstate: CPU 7 exiting
Sep 4 19:14:31 s15 kernel: [ 115.542754] smpboot: CPU 7 is now offline
Sep 4 19:20:19 s15 kernel: [ 463.426743] cpufreq: setting new policy for CPU 6: 1600000 - 2400000 kHz
Sep 4 19:20:19 s15 kernel: [ 463.426750] cpufreq: new min and max freqs are 1600000 - 2400000 kHz
Sep 4 19:20:19 s15 kernel: [ 463.426752] cpufreq: setting range
Sep 4 19:23:05 s15 kernel: [ 628.598506] cpufreq: setting new policy for CPU 6: 1600000 - 2200000 kHz
Sep 4 19:23:05 s15 kernel: [ 628.598511] cpufreq: new min and max freqs are 1600000 - 2200000 kHz
Sep 4 19:23:05 s15 kernel: [ 628.598513] cpufreq: setting range
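The failing hook in a pm-suspend.log like the extract above can be picked out mechanically. A minimal sketch, assuming the line format seen in that extract (`<hook path> <args>: Returned exit code N.`); `failed_hooks` is a hypothetical helper, not part of pm-utils, and the format is inferred from this one log rather than from pm-utils documentation.

```shell
#!/bin/sh
# Sketch: print "<hook path> exit <N>" for every hook that pm-utils
# logged as returning a non-zero exit code.

failed_hooks() {
	log=${1:-/var/log/pm-suspend.log}
	sed -n 's|^\([^ ]*\) .*: Returned exit code \([0-9][0-9]*\)\.$|\1 exit \2|p' "$log"
}
```

Run against Doug's extract, this would report only the 94cpufreq hook, which is the one whose `echo` hit the I/O error.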
* RE: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED] 2015-09-05 2:34 ` Doug Smythies @ 2015-09-05 7:46 ` Doug Smythies 2015-09-05 8:14 ` Viresh Kumar 2015-09-07 13:07 ` Rafael J. Wysocki 0 siblings, 2 replies; 24+ messages in thread From: Doug Smythies @ 2015-09-05 7:46 UTC (permalink / raw) To: 'Rafael J. Wysocki' Cc: 'Viresh Kumar', 'Rafael J. Wysocki', 'Saravana Kannan', linux-pm On 2015.09.05 19:35 Doug Smythies wrote: > On 2015.09.04 17:23 Rafael J. Wysocki wrote: >> On Sat, Sep 5, 2015 at 1:05 AM, Doug Smythies <dsmythies@telus.net> wrote: >>> On 2015.09.04 15:26 Rafael J. Wysocki wrote: >>>> On Fri, Sep 4, 2015 at 8:41 PM, Doug Smythies <dsmythies@telus.net> wrote: >>>>> On 2015.09.04 07:43 Viresh Kumar wrote: >>>>>> On 04-09-15, 16:59, Rafael J. Wysocki wrote: >>>>>>> On Thursday, September 03, 2015 02:40:43 PM Doug Smythies wrote: >>>>>>>> As of, or about, Kernel 4.2RC1 if I take my highest numbered >>>>>>>> CPU offline (7 in my case), the system will not suspend. >>>> Hmm. >>>> I suspect that your user space does something that fails during the pm-suspend. >>> >>> Are you saying that the patch might be O.K., but reveals >>> and issue with pm-suspend that was always there? >> >> Or it breaks something that pm-suspend does before suspending. >> >> It would be good to know what it is. :-) > While researching pm-utils bugs, I found reference to > /var/log/pm-suspend.log, which I had not noticed before. > Relevant extract attached. > It is not clear to me why that echo line (there is only one) > would fail. The echo line fails because the related CPU is offline. If the failed echo is the last pass through the loop, then the script interprets the overall execution of 94cpufreq as a failure and aborts the suspend. If the failed echo is not the last pass through the loop, then the bad exit code gets overwritten with a good one before the loop exits. 
Since the loop is merely setting a temporary governor, for the test I simply used performance mode and commented out the echo. pm-suspend with CPU 7 offline then worked fine. I have not yet gone back to a kernel from before the patch to determine why it used to work (it is late in my time zone). However, I would have to assume that before the commit in question, the echo worked even if the CPU was offline. Could someone please confirm or deny the above conclusion? The relevant code segment, with some added debug echo stuff, from /usr/lib/pm-utils/sleep.d/94cpufreq:

hibernate_cpufreq()
{
	(
	cd /sys/devices/system/cpu/
	for x in cpu[0-9]*; do
		# if cpufreq is a symlink, it is handled by another cpu. Skip.
		[ -L "$x/cpufreq" ] && continue
		gov="$x/cpufreq/scaling_governor"
		# if we do not have a scaling_governor file, skip.
		[ -f "$gov" ] || continue
		# if our temporary governor is not available, skip.
		grep -q "$TEMPORARY_CPUFREQ_GOVERNOR" \
			"$x/cpufreq/scaling_available_governors" || continue
		savestate "${x}_governor" < "$gov"
		# I added the next 3 lines
		echo "$x"
		echo "$TEMPORARY_CPUFREQ_GOVERNOR"
		echo "$gov"
		# For a test, do not do the echo, as I already set performance mode
		# echo "$TEMPORARY_CPUFREQ_GOVERNOR" > "$gov"
	done
	)
}
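Doug's diagnosis rests on a shell rule: the exit status of a `for` loop is the status of the last command it executed, so a failed write only survives if it happens on the final pass. (A pass skipped via `|| continue` also ends with status 0, because `continue` itself succeeds.) The sketch below is not pm-utils code; the CPU names and the `[ ... ]` stand-in for the governor write are made up to show the effect in isolation.

```shell
#!/bin/sh
# Illustration only: a "for" loop's exit status is the exit status of
# the last command it ran.

run_loop() {
	offline=$1
	for x in cpu0 cpu1 cpu2 cpu3; do
		# Stand-in for 'echo "$TEMPORARY_CPUFREQ_GOVERNOR" > "$gov"':
		# it "fails" only for the CPU passed in as offline.
		[ "$x" != "$offline" ]
	done
	# No explicit return: the function's status is the loop's status.
}

run_loop cpu1; echo "cpu1 offline -> status $?"   # prints: cpu1 offline -> status 0
run_loop cpu3; echo "cpu3 offline -> status $?"   # prints: cpu3 offline -> status 1
```

This matches the report: only with the highest numbered CPU (the last one the glob visits) offline does the hook return 1 and abort the suspend.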
* Re: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED] 2015-09-05 7:46 ` Doug Smythies @ 2015-09-05 8:14 ` Viresh Kumar 2015-09-07 13:32 ` Rafael J. Wysocki 2015-09-07 13:07 ` Rafael J. Wysocki 1 sibling, 1 reply; 24+ messages in thread From: Viresh Kumar @ 2015-09-05 8:14 UTC (permalink / raw) To: Doug Smythies Cc: 'Rafael J. Wysocki', 'Rafael J. Wysocki', 'Saravana Kannan', linux-pm On 05-09-15, 00:46, Doug Smythies wrote: > > It is not clear to me why that echo line (there is only one) > > would fail. To me it is clear now :) > The echo line fails because the related CPU is offline. > If the failed echo is the last pass through the loop, > then the script interprets the overall execution of > 94cpufreq as a failure and aborts the suspend. If the > failed echo is not the last pass through the loop, then > the bad exit code gets overwritten with a good one before > the loop exits. > > Since the loop is merely setting a temporary governor, > to test I just used performance mode anyway, and commented > out the echo. pm-suspend with CPU 7 offline then worked fine. > > I have not yet gone back to any before the patch kernel > to determine why it used to work (it is late in my time zone). > However, I would have to assume that before the commit in > question, the echo worked even if the CPU was offline. So here is the story behind it. - In your system all CPUs are independent, that is there are no links to cpufreq directory, so that check in the script is useless for you. - The $COMMIT in question did a significant change. Earlier, while offlining the CPU, we used to remove the cpufreq directory from sysfs, which is not the case any more. - So to be precise, following lines came to your rescue earlier: # if we do not have a scaling_governor file, skip. # [ -f "$gov" ] || continue - But they don't after the patch, as the file and directory are present even if the CPU is offline. 
- But because the CPU is offline, writing to those files isn't allowed and so the echo failed. The solution is to also check whether the CPU is offline at the beginning of the script, and skip it if so. -- viresh
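Viresh's suggested fix could look roughly like the following. This is a minimal sketch, not the actual pm-utils change: pm-utils' `savestate` call is omitted, `SYSFS_CPU_DIR` and `cpu_is_online` are hypothetical names, and treating a CPU without an `online` file (commonly cpu0) as online is an assumption.

```shell
#!/bin/sh
# Sketch only: the 94cpufreq governor loop with an extra "is this CPU
# online?" check. SYSFS_CPU_DIR is a hypothetical override for dry runs.

SYSFS_CPU_DIR=${SYSFS_CPU_DIR:-/sys/devices/system/cpu}
TEMPORARY_CPUFREQ_GOVERNOR=${TEMPORARY_CPUFREQ_GOVERNOR:-performance}

cpu_is_online() {
	# Assumption: a CPU without an "online" file (often cpu0) is online.
	[ ! -f "$1/online" ] || [ "$(cat "$1/online")" = 1 ]
}

set_temporary_governor() {
	(
	cd "$SYSFS_CPU_DIR" || exit 1
	for x in cpu[0-9]*; do
		# After commit 87549141d516 an offline CPU's cpufreq files can
		# still exist, but writes to them fail, so skip offline CPUs.
		cpu_is_online "$x" || continue
		[ -L "$x/cpufreq" ] && continue
		gov="$x/cpufreq/scaling_governor"
		[ -f "$gov" ] || continue
		grep -q "$TEMPORARY_CPUFREQ_GOVERNOR" \
			"$x/cpufreq/scaling_available_governors" || continue
		echo "$TEMPORARY_CPUFREQ_GOVERNOR" > "$gov"
	done
	)
}
```

With CPU7 offline, its pass now ends in `continue`, so the loop (and the hook) returns 0 instead of the status of the failed write.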
* Re: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED] 2015-09-05 8:14 ` Viresh Kumar @ 2015-09-07 13:32 ` Rafael J. Wysocki 2015-09-08 2:40 ` Viresh Kumar 0 siblings, 1 reply; 24+ messages in thread From: Rafael J. Wysocki @ 2015-09-07 13:32 UTC (permalink / raw) To: Viresh Kumar Cc: Doug Smythies, 'Rafael J. Wysocki', 'Saravana Kannan', linux-pm On Saturday, September 05, 2015 01:44:07 PM Viresh Kumar wrote: > On 05-09-15, 00:46, Doug Smythies wrote: > > > It is not clear to me why that echo line (there is only one) > > > would fail. > > To me it is clear now :) > > > The echo line fails because the related CPU is offline. > > If the failed echo is the last pass through the loop, > > then the script interprets the overall execution of > > 94cpufreq as a failure and aborts the suspend. If the > > failed echo is not the last pass through the loop, then > > the bad exit code gets overwritten with a good one before > > the loop exits. > > > > Since the loop is merely setting a temporary governor, > > to test I just used performance mode anyway, and commented > > out the echo. pm-suspend with CPU 7 offline then worked fine. > > > > I have not yet gone back to any before the patch kernel > > to determine why it used to work (it is late in my time zone). > > However, I would have to assume that before the commit in > > question, the echo worked even if the CPU was offline. > > So here is the story behind it. > - In your system all CPUs are independent, that is there are no links > to cpufreq directory, so that check in the script is useless for > you. > - The $COMMIT in question did a significant change. Earlier, while > offlining the CPU, we used to remove the cpufreq directory from > sysfs, which is not the case any more. > > - So to be precise, following lines came to your rescue earlier: > > # if we do not have a scaling_governor file, skip. 
> # [ -f "$gov" ] || continue > > - But they don't after the patch, as the file and directory are > present even if the CPU is offline. > - But because the CPU is offline, writing to those files isn't allowed > and so the echo failed. > > Solution to that is that we check for CPU offline as well in the > beginning of the script, and skip if the CPU is offline. That's a bug in the script. It should discard all errors from the entire inner loop, but it doesn't discard errors from the last iteration of it. That said, what store() in cpufreq.c does is questionable too. First, if policy->cpu is offline, the policy will be inactive to my eyes, so we don't need the second check. But if the policy is active (and policy->cpu is online), it will not generally fail for an offline CPU. So, if the policy applies to more than 1 CPU, you can use any of them to manipulate it, even if one of them is offline, as long as there are any online CPUs in the set. This isn't entirely consistent. We should either fail store() for any offline CPU or make the changes for offline CPUs too. And in the particular case of the governor, I'm wondering what the problem would be with changing last_governor for an inactive policy? Thanks, Rafael
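Rafael's reading of the script bug (errors from every iteration should be discarded, not just the non-final ones) suggests a different fix from Viresh's: tolerate per-CPU write failures and end the loop with an explicit success. A minimal sketch under that assumption; `write_sysfs`, `set_temporary_governor_tolerant` and `SYSFS_CPU_DIR` are hypothetical names, `savestate` is omitted, and the warning text is invented.

```shell
#!/bin/sh
# Sketch: a 94cpufreq-style loop in which a failed governor write for one
# CPU is reported but never becomes the hook's exit status.

SYSFS_CPU_DIR=${SYSFS_CPU_DIR:-/sys/devices/system/cpu}
TEMPORARY_CPUFREQ_GOVERNOR=${TEMPORARY_CPUFREQ_GOVERNOR:-performance}

write_sysfs() {
	# $1 = file, $2 = value
	echo "$2" > "$1"
}

set_temporary_governor_tolerant() {
	(
	cd "$SYSFS_CPU_DIR" || exit 1
	for x in cpu[0-9]*; do
		[ -L "$x/cpufreq" ] && continue
		gov="$x/cpufreq/scaling_governor"
		[ -f "$gov" ] || continue
		grep -q "$TEMPORARY_CPUFREQ_GOVERNOR" \
			"$x/cpufreq/scaling_available_governors" || continue
		write_sysfs "$gov" "$TEMPORARY_CPUFREQ_GOVERNOR" ||
			echo "94cpufreq: could not set governor for $x" >&2
	done
	# Per-CPU failures were already reported; do not let the last
	# iteration decide whether the whole suspend is aborted.
	exit 0
	)
}
```

Unlike the online check, this also covers writes that fail for reasons other than the CPU being offline, at the cost of silently keeping the old governor on such CPUs.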
* Re: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED] 2015-09-07 13:32 ` Rafael J. Wysocki @ 2015-09-08 2:40 ` Viresh Kumar 2015-09-11 20:43 ` Saravana Kannan 0 siblings, 1 reply; 24+ messages in thread From: Viresh Kumar @ 2015-09-08 2:40 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Doug Smythies, 'Rafael J. Wysocki', 'Saravana Kannan', linux-pm On 07-09-15, 15:32, Rafael J. Wysocki wrote: > First, if policy->cpu is offline, the policy will be inactive to my eyes, so > we don't need the second check. Hmm, or maybe just drop the first check. > But if the policy is active (and policy->cpu is online), it will not generally > fail for an offline CPU. Right. > So, if the policy applies to more than 1 CPU, you > can use any of them to manipulate it, even if one of them is offline as long > as there are any online CPUs in the set. Right. > This isn't entirely consistent. We should either fail store() for any offline > CPU At that point we have no idea of the CPU for which the sysfs operation is called. So we have to go ahead without failing if the policy is active. > or make the changes for offline CPUs to. What does that mean? Most of the stuff we do is for the policy, rather than per-cpu. And if there is per-cpu stuff, then we *only* should be doing that for the online ones. Not sure if I understood what you meant here. :( > And in the particular case of > the governor, I'm wondering what will be the problem with changing last_governor > for an inactive policy? I don't think we should be adding special cases for updating sysfs attributes of an inactive policy. It's not just about the last_governor thing, but other sysfs attributes as well. -- viresh
* Re: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED] 2015-09-08 2:40 ` Viresh Kumar @ 2015-09-11 20:43 ` Saravana Kannan 2015-09-11 21:30 ` Rafael J. Wysocki 0 siblings, 1 reply; 24+ messages in thread From: Saravana Kannan @ 2015-09-11 20:43 UTC (permalink / raw) To: Viresh Kumar Cc: Rafael J. Wysocki, Doug Smythies, 'Rafael J. Wysocki', linux-pm Sorry about the late reply and for not helping out earlier. I hadn't checked this email account for some time. On 09/07/2015 07:40 PM, Viresh Kumar wrote: > On 07-09-15, 15:32, Rafael J. Wysocki wrote: >> First, if policy->cpu is offline, the policy will be inactive to my eyes, so >> we don't need the second check. > > Hmm, or maybe just drop the first check. > >> But if the policy is active (and policy->cpu is online), it will not generally >> fail for an offline CPU. > > Right. > >> So, if the policy applies to more than 1 CPU, you >> can use any of them to manipulate it, even if one of them is offline as long >> as there are any online CPUs in the set. > > Right. > >> This isn't entirely consistent. We should either fail store() for any offline >> CPU > > At that point we have no idea of the CPU for which the sysfs operation > is called. And so we have to go ahead without failing, if policy is > active. > >> or make the changes for offline CPUs to. > > What does that mean? Most of the stuff we do is for the policy, rather > than per-cpu. And if there is per-cpu stuff, then we *only* should be > doing that for the online ones. > > Not sure if I understood what you meant here. :( > >> And in the particular case of >> the governor, I'm wondering what will be the problem with changing last_governor >> for an inactive policy? > > I don't think we should be adding special cases for updating sysfs > attributes of an inactive policy. Its not just about the last_governor > thing, but other sysfs attributes as well.
> The way I see it, having the cpufreq policy control sysfs "bits" under every CPU directory is what's causing some semantic confusion/inconsistency. Every single node under a cpufreq folder is for policy control and not CPU control. But by putting the policy control bits under the cpuX directory, we give the wrong semantic impression that it's a per-CPU attribute when it's really per-policy. Ideally (in terms of semantics) we would have put all the policy control bits in a per-policy directory under /sys/devices/system/cpu/cpufreq/policyX/ where X would/could be tied to the first CPU in related CPUs -- so that it's easy to correlate and also to avoid having the policy numbering differ depending on the order in which CPUs get hotplugged. But we can't go about breaking userspace ABI by moving the cpufreq directories out of the cpu directories just because of the semantic confusion. Well, we COULD still put the policy directories under cpu/cpufreq/ and then make every cpuX/cpufreq directory a symlink to the actual policy directory. But that is not going to help with this specific issue/discussion. Having said all that, I still think that stores to all these sysfs files should work. I'm not saying it's a trivial change (setting a governor's polling time, etc. would need some checks to cache the value and not start a timer immediately, etc.), but I think it's a more consistent and user-friendly API. If the user wants to set a min CPU freq, why should they care if the CPU is online at that very instant? It gets especially painful if you have a thermal daemon that's plugging in/out CPUs while the user or a script is trying to set the parameters. Thanks, Saravana -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
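Saravana's alternative layout (policy directories under cpu/cpufreq/, with every cpuX/cpufreq turned into a symlink) would interact with scripts such as 94cpufreq: their `[ -L "$x/cpufreq" ] && continue` test would then be true for every CPU, so every CPU would be skipped. A sketch, under that hypothetical layout, of visiting each policy once by its resolved path instead; `list_policies` and `SYSFS_CPU_DIR` are illustrative names, and `cd -P`/`pwd -P` are used because `readlink -f` is not POSIX.

```shell
#!/bin/sh
# Sketch: print "<first cpu> <policy path>" once per distinct cpufreq
# policy, deduplicating by physical path rather than by symlink checks.

SYSFS_CPU_DIR=${SYSFS_CPU_DIR:-/sys/devices/system/cpu}

list_policies() {
	seen=""
	for x in "$SYSFS_CPU_DIR"/cpu[0-9]*; do
		[ -d "$x/cpufreq" ] || continue
		# Resolve any symlinks to the real policy directory.
		real=$(cd -P "$x/cpufreq" && pwd -P) || continue
		case " $seen " in
		*" $real "*) continue ;;  # policy already visited via another CPU
		esac
		seen="$seen $real"
		printf '%s %s\n' "${x##*/}" "$real"
	done
}
```

The same function also works with today's layout, where each CPU has its own real cpufreq directory: every resolved path is then distinct, so every CPU is listed.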
* Re: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED] 2015-09-11 20:43 ` Saravana Kannan @ 2015-09-11 21:30 ` Rafael J. Wysocki 2015-09-11 22:07 ` Saravana Kannan 0 siblings, 1 reply; 24+ messages in thread From: Rafael J. Wysocki @ 2015-09-11 21:30 UTC (permalink / raw) To: Saravana Kannan Cc: Viresh Kumar, Doug Smythies, 'Rafael J. Wysocki', linux-pm On Friday, September 11, 2015 01:43:51 PM Saravana Kannan wrote: > Sorry about the late reply and not helping out earlier. Didn't check > this email account for sometime. > > On 09/07/2015 07:40 PM, Viresh Kumar wrote: > > On 07-09-15, 15:32, Rafael J. Wysocki wrote: > >> First, if policy->cpu is offline, the policy will be inactive to my eyes, so > >> we don't need the second check. > > > > Hmm, or maybe just drop the first check. > > > >> But if the policy is active (and policy->cpu is online), it will not generally > >> fail for an offline CPU. > > > > Right. > > > >> So, if the policy applies to more than 1 CPU, you > >> can use any of them to manipulate it, even if one of them is offline as long > >> as there are any online CPUs in the set. > > > > Right. > > > >> This isn't entirely consistent. We should either fail store() for any offline > >> CPU > > > > At that point we have no idea of the CPU for which the sysfs operation > > is called. And so we have to go ahead without failing, if policy is > > active. > > > >> or make the changes for offline CPUs to. > > > > What does that mean? Most of the stuff we do is for the policy, rather > > than per-cpu. And if there is per-cpu stuff, then we *only* should be > > doing that for the online ones. > > > > Not sure if I understood what you meant here. :( > > > >> And in the particular case of > >> the governor, I'm wondering what will be the problem with changing last_governor > >> for an inactive policy? > > > > I don't think we should be adding special cases for updating sysfs > > attributes of an inactive policy. 
Its not just about the last_governor > > thing, but other sysfs attributes as well. > > > > The way I see it, having the cpufreq policy control sysfs "bits" under > every CPU directory is what's causing some semantic confusion/inconsistency. > > Every single node under a cpufreq folder is for policy control and not > CPU control. But by putting the policy control bits under the cpuX > directory, we give the wrong semantic impression that it's a per CPU > attribute when it's really per-policy. > > Ideally (in terms of semantics) we would have put all the policy control > bits in a per policy directory under > /sys/devices/system/cpu/cpufreq/policyX/ where X would/could be the tied > to the first CPU in related CPUs -- so that it's easy to correlate and > also to avoid having the policy numbering being different depending on > the order in which CPUs get hotplugged. > > But we can't go about breaking userspace ABI by removing the cpufreq > directories out of the cpu directories just because of the semantic > confusion. > > Well, we COULD still put the policy directories under cpu/cpufreq/ and > then make every cpuX/cpufreq directory a symlink to the actual policy > directory. But that is not going to help with this specific > issue/discussion. It isn't, but it'd be a good change in my view. > Having said all that, I still think that stores to all these sysfs files > should work. I'm not saying it's a trivial change (like setting a > governor's polling time, etc would need some checks to cache the value > and not start a timer immediately, etc), but I think it's a more > consistent and user friendly API. I agree. It also is backwards compatible with scripts that walk the cpufreq directories for all CPUs without checking the online attribute and expect things to work. > If the user wants to set a min CPU freq, why should they care if the CPU > is online at that very instant? 
It gets especially painful if you have a > thermal daemon that's plugging in/out CPUs while the user or a script is > trying to set the parameters. You mean putting them offline/online I suppose? Thanks, Rafael ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED] 2015-09-11 21:30 ` Rafael J. Wysocki @ 2015-09-11 22:07 ` Saravana Kannan 2015-10-11 9:47 ` Viresh Kumar 0 siblings, 1 reply; 24+ messages in thread From: Saravana Kannan @ 2015-09-11 22:07 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Viresh Kumar, Doug Smythies, 'Rafael J. Wysocki', linux-pm On 09/11/2015 02:30 PM, Rafael J. Wysocki wrote: > On Friday, September 11, 2015 01:43:51 PM Saravana Kannan wrote: >> Sorry about the late reply and not helping out earlier. Didn't check >> this email account for sometime. >> >> On 09/07/2015 07:40 PM, Viresh Kumar wrote: >>> On 07-09-15, 15:32, Rafael J. Wysocki wrote: >>>> First, if policy->cpu is offline, the policy will be inactive to my eyes, so >>>> we don't need the second check. >>> >>> Hmm, or maybe just drop the first check. >>> >>>> But if the policy is active (and policy->cpu is online), it will not generally >>>> fail for an offline CPU. >>> >>> Right. >>> >>>> So, if the policy applies to more than 1 CPU, you >>>> can use any of them to manipulate it, even if one of them is offline as long >>>> as there are any online CPUs in the set. >>> >>> Right. >>> >>>> This isn't entirely consistent. We should either fail store() for any offline >>>> CPU >>> >>> At that point we have no idea of the CPU for which the sysfs operation >>> is called. And so we have to go ahead without failing, if policy is >>> active. >>> >>>> or make the changes for offline CPUs to. >>> >>> What does that mean? Most of the stuff we do is for the policy, rather >>> than per-cpu. And if there is per-cpu stuff, then we *only* should be >>> doing that for the online ones. >>> >>> Not sure if I understood what you meant here. :( >>> >>>> And in the particular case of >>>> the governor, I'm wondering what will be the problem with changing last_governor >>>> for an inactive policy? 
>>> >>> I don't think we should be adding special cases for updating sysfs >>> attributes of an inactive policy. Its not just about the last_governor >>> thing, but other sysfs attributes as well. >>> >> >> The way I see it, having the cpufreq policy control sysfs "bits" under >> every CPU directory is what's causing some semantic confusion/inconsistency. >> >> Every single node under a cpufreq folder is for policy control and not >> CPU control. But by putting the policy control bits under the cpuX >> directory, we give the wrong semantic impression that it's a per CPU >> attribute when it's really per-policy. >> >> Ideally (in terms of semantics) we would have put all the policy control >> bits in a per policy directory under >> /sys/devices/system/cpu/cpufreq/policyX/ where X would/could be the tied >> to the first CPU in related CPUs -- so that it's easy to correlate and >> also to avoid having the policy numbering being different depending on >> the order in which CPUs get hotplugged. >> >> But we can't go about breaking userspace ABI by removing the cpufreq >> directories out of the cpu directories just because of the semantic >> confusion. >> >> Well, we COULD still put the policy directories under cpu/cpufreq/ and >> then make every cpuX/cpufreq directory a symlink to the actual policy >> directory. But that is not going to help with this specific >> issue/discussion. > > It isn't, but it'd be a good change in my view. >> Having said all that, I still think that stores to all these sysfs files >> should work. I'm not saying it's a trivial change (like setting a >> governor's polling time, etc would need some checks to cache the value >> and not start a timer immediately, etc), but I think it's a more >> consistent and user friendly API. > > I agree. > > It also is backwards compatible with scripts that walk the cpufreq directories > for all CPUs without checking the online attribute and expect things to work. Good to see some support. 
I do know Viresh doesn't like this :)

Thinking more about it, it'll also make the code simpler since we don't
have to decide which CPU has the real files vs which ones have symlinks.
We probably won't need policy->cpu or kobj_cpu anymore.

I'd love to do all these changes, but I doubt I'll find the time with
the official job responsibilities I have. We'll see.

>> If the user wants to set a min CPU freq, why should they care if the CPU
>> is online at that very instant? It gets especially painful if you have a
>> thermal daemon that's plugging in/out CPUs while the user or a script is
>> trying to set the parameters.
>
> You mean putting them offline/online I suppose?

Yup, I meant that a thermal daemon is putting them online/offline -- so
used to using the terms plugging in/out internally since we don't have
to deal with physical removals on mobile devices (YET?!). It'll be quite
an achievement to see a daemon actually plugging a CPU in/out :)

Thanks,
Saravana

--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
* Re: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED]
  2015-09-11 22:07 ` Saravana Kannan
@ 2015-10-11 9:47 ` Viresh Kumar
  2015-10-12 19:43 ` Saravana Kannan
  0 siblings, 1 reply; 24+ messages in thread
From: Viresh Kumar @ 2015-10-11 9:47 UTC
  To: Saravana Kannan
  Cc: Rafael J. Wysocki, Doug Smythies, 'Rafael J. Wysocki', linux-pm

On 11-09-15, 15:07, Saravana Kannan wrote:
> Good to see some support. I do know Viresh doesn't like this :)

Sorry for catching up late, but looks like you guys did convince me on
this.

Let me get some patches out :)

--
viresh
* Re: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED]
  2015-10-11 9:47 ` Viresh Kumar
@ 2015-10-12 19:43 ` Saravana Kannan
  2015-10-13 3:47 ` Viresh Kumar
  0 siblings, 1 reply; 24+ messages in thread
From: Saravana Kannan @ 2015-10-12 19:43 UTC
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, Doug Smythies, 'Rafael J. Wysocki', linux-pm

On 10/11/2015 02:47 AM, Viresh Kumar wrote:
> On 11-09-15, 15:07, Saravana Kannan wrote:
>> Good to see some support. I do know Viresh doesn't like this :)
>
> Sorry for catching up late, but looks like you guys did convince me on
> this.

Did I also convince you about allowing changing of parameters for
offline CPUs? Which would also include inactive policies?

>
> Let me get some patches out :)
>

Thanks!
-Saravana
* Re: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED]
  2015-10-12 19:43 ` Saravana Kannan
@ 2015-10-13 3:47 ` Viresh Kumar
  2015-10-13 19:23 ` Saravana Kannan
  0 siblings, 1 reply; 24+ messages in thread
From: Viresh Kumar @ 2015-10-13 3:47 UTC
  To: Saravana Kannan
  Cc: Rafael J. Wysocki, Doug Smythies, 'Rafael J. Wysocki', linux-pm

On 12-10-15, 12:43, Saravana Kannan wrote:
> Did I also convince you about allowing changing of parameters for
> offline CPUs? Which would also include inactive policies?

Yes, but that requires some careful modification of the code as there
are various paths of the sysfs write path. And we need to see that we
don't do anything more than just updating the files. So, I left it for
now.

--
viresh
* Re: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED]
  2015-10-13 3:47 ` Viresh Kumar
@ 2015-10-13 19:23 ` Saravana Kannan
  0 siblings, 0 replies; 24+ messages in thread
From: Saravana Kannan @ 2015-10-13 19:23 UTC
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, Doug Smythies, 'Rafael J. Wysocki', linux-pm

On 10/12/2015 08:47 PM, Viresh Kumar wrote:
> On 12-10-15, 12:43, Saravana Kannan wrote:
>> Did I also convince you about allowing changing of parameters for
>> offline CPUs? Which would also include inactive policies?
>
> Yes, but that requires some careful modification of the code as there
> are various paths of the sysfs write path. And we need to see that we
> don't do anything more than just updating the files. So, I left it for
> now.
>

Agreed. It's non-trivial and we can do that separately.

-Saravana
* Re: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED]
  2015-09-05 7:46 ` Doug Smythies
  2015-09-05 8:14 ` Viresh Kumar
@ 2015-09-07 13:07 ` Rafael J. Wysocki
  2015-09-07 14:03 ` Doug Smythies
  1 sibling, 1 reply; 24+ messages in thread
From: Rafael J. Wysocki @ 2015-09-07 13:07 UTC
  To: Doug Smythies
  Cc: 'Rafael J. Wysocki', 'Viresh Kumar', 'Saravana Kannan', linux-pm

On Saturday, September 05, 2015 12:46:40 AM Doug Smythies wrote:
> On 2015.09.05 19:35 Doug Smythies wrote:
> > On 2015.09.04 17:23 Rafael J. Wysocki wrote:
> >> On Sat, Sep 5, 2015 at 1:05 AM, Doug Smythies <dsmythies@telus.net> wrote:
> >>> On 2015.09.04 15:26 Rafael J. Wysocki wrote:
> >>>> On Fri, Sep 4, 2015 at 8:41 PM, Doug Smythies <dsmythies@telus.net> wrote:
> >>>>> On 2015.09.04 07:43 Viresh Kumar wrote:
> >>>>>> On 04-09-15, 16:59, Rafael J. Wysocki wrote:
> >>>>>>> On Thursday, September 03, 2015 02:40:43 PM Doug Smythies wrote:
> >>>>>>>> As of, or about, Kernel 4.2RC1 if I take my highest numbered
> >>>>>>>> CPU offline (7 in my case), the system will not suspend.
>
> >>>> Hmm.
> >>>> I suspect that your user space does something that fails during the pm-suspend.
> >>>
> >>> Are you saying that the patch might be O.K., but reveals
> >>> an issue with pm-suspend that was always there?
> >>
> >> Or it breaks something that pm-suspend does before suspending.
> >>
> >> It would be good to know what it is. :-)
>
> > While researching pm-utils bugs, I found reference to
> > /var/log/pm-suspend.log, which I had not noticed before.
> > Relevant extract attached.
>
> > It is not clear to me why that echo line (there is only one)
> > would fail.
>
> The echo line fails because the related CPU is offline.
> If the failed echo is the last pass through the loop,
> then the script interprets the overall execution of
> 94cpufreq as a failure and aborts the suspend.

That's correct AFAICS.
> If the failed echo is not the last pass through the loop, then
> the bad exit code gets overwritten with a good one before
> the loop exits.

Right.

> Since the loop is merely setting a temporary governor,
> to test I just used performance mode anyway, and commented
> out the echo. pm-suspend with CPU 7 offline then worked fine.
>
> I have not yet gone back to any before the patch kernel
> to determine why it used to work (it is late in my time zone).
> However, I would have to assume that before the commit in
> question, the echo worked even if the CPU was offline.

It didn't have to, because the cpufreq directory was not present, so the

	[ -L "$x/cpufreq" ] && continue

line would trigger.

> Could someone please confirm or deny the above conclusion.
>
> The relevant code segment, with some added debug
> echo stuff, from /usr/lib/pm-utils/sleep.d/94cpufreq
>
> hibernate_cpufreq()
> {
> 	( cd /sys/devices/system/cpu/
> 	for x in cpu[0-9]*; do
> 		# if cpufreq is a symlink, it is handled by another cpu. Skip.
> 		[ -L "$x/cpufreq" ] && continue
> 		gov="$x/cpufreq/scaling_governor"
> 		# if we do not have a scaling_governor file, skip.
> 		[ -f "$gov" ] || continue
> 		# if our temporary governor is not available, skip.
> 		grep -q "$TEMPORARY_CPUFREQ_GOVERNOR" \
> 			"$x/cpufreq/scaling_available_governors" || continue
> 		savestate "${x}_governor" < "$gov"
> # I added the next 3 lines
> 		echo "$x"
> 		echo "$TEMPORARY_CPUFREQ_GOVERNOR"
> 		echo "$gov"
> # For a test, do not do the echo, as I already set performance mode
> #		echo "$TEMPORARY_CPUFREQ_GOVERNOR" > "$gov"
> 	done )
> }

Thanks,
Rafael
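The exit-status behaviour confirmed in this exchange (only a failed echo on the loop's final pass makes the hook fail) can be reproduced without touching sysfs. A minimal sketch, assuming a POSIX shell; `demo_loop` is a hypothetical stand-in for `hibernate_cpufreq()`, and `true`/`false` simulate the governor writes:

```shell
#!/bin/sh
# demo_loop is a hypothetical stand-in for hibernate_cpufreq(): it walks three
# fake CPUs and simulates the governor write with true/false. The CPU named in
# $1 gets the failing write, like an offline CPU's scaling_governor.
demo_loop() {
    failing_cpu=$1
    ( for x in cpu0 cpu1 cpu2; do
        if [ "$x" = "$failing_cpu" ]; then
            false    # simulated failed echo into scaling_governor
        else
            true     # simulated successful echo
        fi
    done )
    # No explicit return: the function's status is the loop's status, i.e.
    # the status of the last command run on the final pass.
}

for cpu in cpu1 cpu2; do
    if demo_loop "$cpu"; then rc=0; else rc=$?; fi
    echo "failed write on $cpu -> hook exit status $rc"
done
```

Running it reports status 0 when the failure is on cpu1 (a middle pass) and status 1 when it is on cpu2 (the last pass), which is the difference between a working and an aborted suspend described above.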
* RE: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED]
  2015-09-07 13:07 ` Rafael J. Wysocki
@ 2015-09-07 14:03 ` Doug Smythies
  2015-09-07 20:35 ` Rafael J. Wysocki
  0 siblings, 1 reply; 24+ messages in thread
From: Doug Smythies @ 2015-09-07 14:03 UTC
  To: 'Rafael J. Wysocki', 'Viresh Kumar'
  Cc: 'Rafael J. Wysocki', 'Saravana Kannan', linux-pm

To wrap this up, I was thinking to file a bug report
on the pm-utils bug system and then to file a bug against
the distribution that I use (Ubuntu Server), linking to the
upstream bug report.

I don't have it working correctly yet, but I was hoping
to suggest a fix with the bug reports.

Something like (still has all my debug stuff also):

hibernate_cpufreq()
{
	( cd /sys/devices/system/cpu/
	for x in cpu[0-9]*; do
		# if cpufreq is a symlink, it is handled by another cpu. Skip.
		[ -L "$x/cpufreq" ] && continue
		gov="$x/cpufreq/scaling_governor"
		# if we do not have a scaling_governor file, skip.
		[ -f "$gov" ] || continue
		echo "before $x online check"

+		# if the CPU is offline, skip, unless no file, i.e. CPU0.
+		[ $(cat "$x/online") = "1" -o ! -f "$x/online" ] || continue
Or
+		if [ $(cat "$x/online") = "1" ] || [ ! -f "$x/online" ]; then
+			continue;
+		fi
Or something similar that actually works.

		echo "after $x online check"
		# if our temporary governor is not available, skip.
		grep -q "$TEMPORARY_CPUFREQ_GOVERNOR" \
			"$x/cpufreq/scaling_available_governors" || continue
		savestate "${x}_governor" < "$gov"
		echo "$x"
		echo "$TEMPORARY_CPUFREQ_GOVERNOR"
		echo "$gov"
		echo "$TEMPORARY_CPUFREQ_GOVERNOR" > "$gov"
	done )
}

With the proposed fix not dependent on CPU0 at all, just the condition that if
the file exists, that the CPU be online, and if it doesn't exist then assume the
CPU is online. As you both pointed out, there is a previous check and skip for
the no governor or no CPU condition or older kernel conditions.
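The rule Doug describes (skip a CPU only if its `online` file exists and does not read 1) can be written so that the file is tested before it is read and the command substitution is quoted; in the draft, `cat` runs before the `-f` test, so it complains on CPUs without the file, and the `if ... then continue` variant has the test inverted, so it would skip the online CPUs. A minimal sketch, assuming a POSIX shell with `mktemp -d`; `cpu_is_online` is a hypothetical helper name, not part of pm-utils, and the temporary directory stands in for `/sys/devices/system/cpu`:

```shell
#!/bin/sh
# cpu_is_online is a hypothetical helper name (not part of pm-utils).
# A CPU directory counts as online if its "online" file reads 1, or if it has
# no "online" file at all (as with cpu0 in the listing earlier in this thread).
cpu_is_online() {
    [ -f "$1/online" ] || return 0       # no file: assume online
    [ "$(cat "$1/online")" = "1" ]       # file present: must read 1
}

# Demo against a throwaway directory standing in for /sys/devices/system/cpu.
tmp=$(mktemp -d)
mkdir "$tmp/cpu0" "$tmp/cpu6" "$tmp/cpu7"
echo 1 > "$tmp/cpu6/online"
echo 0 > "$tmp/cpu7/online"
for x in "$tmp"/cpu[0-9]*; do
    # Inside hibernate_cpufreq() this would be: cpu_is_online "$x" || continue
    if cpu_is_online "$x"; then
        echo "${x##*/}: online, handle it"
    else
        echo "${x##*/}: offline, skip"
    fi
done
rm -rf "$tmp"
```

Placed after the `scaling_governor` check, `cpu_is_online "$x" || continue` keeps the existing behaviour for online CPUs and for CPUs without an `online` file, and skips only CPUs that are explicitly offline.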
* Re: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED]
  2015-09-07 14:03 ` Doug Smythies
@ 2015-09-07 20:35 ` Rafael J. Wysocki
  0 siblings, 0 replies; 24+ messages in thread
From: Rafael J. Wysocki @ 2015-09-07 20:35 UTC
  To: Doug Smythies
  Cc: 'Viresh Kumar', 'Rafael J. Wysocki', 'Saravana Kannan', linux-pm

On Monday, September 07, 2015 07:03:16 AM Doug Smythies wrote:
> To wrap this up, I was thinking to file a bug report
> on the pm-utils bug system and then to file a bug against
> the distribution that I use (Ubuntu Server), linking to the
> upstream bug report.
>
> I don't have it working correctly yet, but I was hoping
> to suggest a fix with the bug reports.
>
> Something like (still has all my debug stuff also):
>
> hibernate_cpufreq()
> {
> 	( cd /sys/devices/system/cpu/
> 	for x in cpu[0-9]*; do
> 		# if cpufreq is a symlink, it is handled by another cpu. Skip.
> 		[ -L "$x/cpufreq" ] && continue
> 		gov="$x/cpufreq/scaling_governor"
> 		# if we do not have a scaling_governor file, skip.
> 		[ -f "$gov" ] || continue
> 		echo "before $x online check"
>
> +		# if the CPU is offline, skip, unless no file, i.e. CPU0.
> +		[ $(cat "$x/online") = "1" -o ! -f "$x/online" ] || continue
> Or
> +		if [ $(cat "$x/online") = "1" ] || [ ! -f "$x/online" ]; then
> +			continue;
> +		fi
> Or something similar that actually works.
>
> 		echo "after $x online check"
> 		# if our temporary governor is not available, skip.
> 		grep -q "$TEMPORARY_CPUFREQ_GOVERNOR" \
> 			"$x/cpufreq/scaling_available_governors" || continue
> 		savestate "${x}_governor" < "$gov"
> 		echo "$x"
> 		echo "$TEMPORARY_CPUFREQ_GOVERNOR"
> 		echo "$gov"
> 		echo "$TEMPORARY_CPUFREQ_GOVERNOR" > "$gov"
> 	done )
> }
>
> With the proposed fix not dependent on CPU0 at all, just the condition that if
> the file exists, that the CPU be online, and if it doesn't exist then assume the
> CPU is online.
> As you both pointed out, there is a previous check and skip for
> the no governor or no CPU condition or older kernel conditions.

I guess it also would work if you added

	return 0

at the end of hibernate_cpufreq().

Thanks,
Rafael
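Rafael's alternative makes the function's status independent of whichever write happened to run last. A minimal sketch of that shape, assuming a POSIX shell; `set_temp_governors` is a hypothetical stand-in for `hibernate_cpufreq()`, and each argument simulates the outcome of one governor write:

```shell
#!/bin/sh
# set_temp_governors is a hypothetical stand-in for hibernate_cpufreq().
# Each argument simulates the outcome of one governor write: "ok" or "fail".
set_temp_governors() {
    for result in "$@"; do
        [ "$result" = ok ]    # simulated echo; fails for "fail"
    done
    # Rafael's suggestion: report success regardless of the last write,
    # so a failure on the final pass no longer aborts the suspend.
    return 0
}

if set_temp_governors ok ok fail; then
    echo "hook returned 0: suspend would proceed"
fi
```

The trade-off compared with an explicit online check is that no failed write is reported at all; the online check instead avoids the failing write for offline CPUs in the first place.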
* RE: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED]
  2015-09-04 14:59 ` Rafael J. Wysocki
  2015-09-04 14:42 ` Viresh Kumar
@ 2015-09-04 15:26 ` Doug Smythies
  1 sibling, 0 replies; 24+ messages in thread
From: Doug Smythies @ 2015-09-04 15:26 UTC
  To: 'Rafael J. Wysocki'
  Cc: 'Viresh Kumar', 'Rafael J. Wysocki', 'Saravana Kannan', linux-pm

On 2015.09.04 08:00 Rafael J. Wysocki wrote:
> On Thursday, September 03, 2015 02:40:43 PM Doug Smythies wrote:
>> As of, or about, Kernel 4.2RC1 if I take my highest numbered
>> CPU offline (7 in my case), the system will not suspend.

> Does "will not suspend" mean that the suspend will fail with an error
> or will it crash or hang or something else?

Something else: Nothing at all happens, no error, no crash, no hang,
no log entries (at least that I have been able to find).
I use the "sudo pm-suspend" command, and it is as if it is a no-op.

doug@s15:/sys/devices/system/cpu$ cat /sys/devices/system/cpu/cpu*/online
1
1
1
1
1
1
0

Someone suggested to me that I have to take the core offline, not just
the 1 of 2 CPUs on the same core. However it makes no difference, and
CPU 3 offline by itself works fine.

>> The issue persists through Kernel 4.2.
>> This is on my test computer with an i7-2600K.
>> I do not normally use suspend on this computer,
>> but was doing so while working on a bug report.
>>
>> The kernel was bisected, and the result was:
>>
>> $ git bisect bad
>> 87549141d516aee71d511138e27117c41e8aef68 is the first bad commit
>> commit 87549141d516aee71d511138e27117c41e8aef68
>> Author: Viresh Kumar <viresh.kumar@linaro.org>
>> Date:   Wed Jun 10 02:13:21 2015 +0200
>>
>>     cpufreq: Stop migrating sysfs files on hotplug
>>
>> See also several e-mails with the above subject line
>> between June 8th and 10th.
>>
>> With any other combination of taking CPUs offline,
>> not including CPU 7, suspend seems to work properly.

> Well, we'll need to debug it some.
>
> Can you please check what's in the
> /sys/devices/system/cpu/cpuX/cpufreq/related_cpus
> files for all CPUs on your system?

doug@s15:/sys/devices/system/cpu$ cat cpu?/cpufreq/related_cpus
0
1
2
3
4
5
6
7
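A glob such as `cpu*/online` prints only the values, so in the listing above the CPU without an `online` file (cpu0) simply drops out and the seven values have to be matched to CPUs by position. A minimal sketch of a helper that labels each value with its CPU, assuming a POSIX shell; `show_cpu_attr` is a hypothetical name, and the base directory is a parameter so it can be tried on a mock tree as well as on `/sys/devices/system/cpu`:

```shell
#!/bin/sh
# show_cpu_attr is a hypothetical helper: print "cpuN: value" for one per-CPU
# attribute, and say so explicitly when a CPU has no such file.
# $1: base directory (e.g. /sys/devices/system/cpu)
# $2: attribute path relative to each cpuN directory (e.g. online)
show_cpu_attr() {
    base=$1 attr=$2
    for d in "$base"/cpu[0-9]*; do
        [ -d "$d" ] || continue               # glob matched nothing
        if [ -f "$d/$attr" ]; then
            printf '%s: %s\n' "${d##*/}" "$(cat "$d/$attr")"
        else
            printf '%s: (no %s file)\n' "${d##*/}" "$attr"
        fi
    done
}

# Example uses on a real system:
#   show_cpu_attr /sys/devices/system/cpu online
#   show_cpu_attr /sys/devices/system/cpu cpufreq/related_cpus
```

The glob sorts lexically, so on machines with ten or more CPUs `cpu10` is listed before `cpu2`; the labels keep the output unambiguous either way.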
end of thread, other threads:[~2015-10-13 19:23 UTC | newest]

Thread overview: 24+ messages
2015-09-03 21:40 System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED] Doug Smythies
2015-09-04 14:59 ` Rafael J. Wysocki
2015-09-04 14:42 ` Viresh Kumar
2015-09-04 18:41 ` Doug Smythies
2015-09-04 22:26 ` Rafael J. Wysocki
2015-09-04 23:05 ` Doug Smythies
2015-09-05 0:22 ` Rafael J. Wysocki
2015-09-05 1:41 ` Rafael J. Wysocki
2015-09-05 2:34 ` Doug Smythies
2015-09-05 7:46 ` Doug Smythies
2015-09-05 8:14 ` Viresh Kumar
2015-09-07 13:32 ` Rafael J. Wysocki
2015-09-08 2:40 ` Viresh Kumar
2015-09-11 20:43 ` Saravana Kannan
2015-09-11 21:30 ` Rafael J. Wysocki
2015-09-11 22:07 ` Saravana Kannan
2015-10-11 9:47 ` Viresh Kumar
2015-10-12 19:43 ` Saravana Kannan
2015-10-13 3:47 ` Viresh Kumar
2015-10-13 19:23 ` Saravana Kannan
2015-09-07 13:07 ` Rafael J. Wysocki
2015-09-07 14:03 ` Doug Smythies
2015-09-07 20:35 ` Rafael J. Wysocki
2015-09-04 15:26 ` Doug Smythies