System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED]

linux-pm.vger.kernel.org archive mirror

* System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED]
@ 2015-09-03 21:40 Doug Smythies
  2015-09-04 14:59 ` Rafael J. Wysocki
  0 siblings, 1 reply; 24+ messages in thread
From: Doug Smythies @ 2015-09-03 21:40 UTC (permalink / raw)
  To: 'Viresh Kumar', 'Rafael J. Wysocki',
	'Saravana Kannan'
  Cc: Doug Smythies, linux-pm

As of, or about, Kernel 4.2RC1 if I take my highest numbered
CPU offline (7 in my case), the system will not suspend.
The issue persists through Kernel 4.2.
This is on my test computer with an i7-2600K.
I do not normally use suspend on this computer,
but was doing so while working on a bug report.

The kernel was bisected, and the result was:

$ git bisect bad
87549141d516aee71d511138e27117c41e8aef68 is the first bad commit
commit 87549141d516aee71d511138e27117c41e8aef68
Author: Viresh Kumar <viresh.kumar@linaro.org>
Date:   Wed Jun 10 02:13:21 2015 +0200

cpufreq: Stop migrating sysfs files on hotplug

See also several e-mails with the above subject line
between June 8th and 10th.

With any other combination of taking CPUs offline,
not including CPU 7, suspend seems to work properly.

Since I sometimes mess up using git bisect, and end
up at some random result, the conclusion was double
checked manually:

87549141d516aee71d511138e27117c41e8aef68 has the issue.
11e584cfb8a9d2226151fd39bfa74d09e575f72d (the previous commit) does not have the issue.


* Re: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED]
  2015-09-03 21:40 System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED] Doug Smythies
@ 2015-09-04 14:59 ` Rafael J. Wysocki
  2015-09-04 14:42   ` Viresh Kumar
  2015-09-04 15:26   ` Doug Smythies
  0 siblings, 2 replies; 24+ messages in thread
From: Rafael J. Wysocki @ 2015-09-04 14:59 UTC (permalink / raw)
  To: Doug Smythies
  Cc: 'Viresh Kumar', 'Rafael J. Wysocki',
	'Saravana Kannan', linux-pm

On Thursday, September 03, 2015 02:40:43 PM Doug Smythies wrote:
> As of, or about, Kernel 4.2RC1 if I take my highest numbered
> CPU offline (7 in my case), the system will not suspend.

Does "will not suspend" mean that the suspend will fail with an error
or will it crash or hang or something else?

> The issue persists through Kernel 4.2.
> This is on my test computer with an i7-2600K.
> I do not normally use suspend on this computer,
> but was doing so while working on a bug report.
> 
> The kernel was bisected, and the result was:
> 
> $ git bisect bad
> 87549141d516aee71d511138e27117c41e8aef68 is the first bad commit
> commit 87549141d516aee71d511138e27117c41e8aef68
> Author: Viresh Kumar <viresh.kumar@linaro.org>
> Date:   Wed Jun 10 02:13:21 2015 +0200
> 
> cpufreq: Stop migrating sysfs files on hotplug
> 
> See also several e-mails with the above subject line
> between June 8th and 10th.
> 
> With any other combination of taking CPUs offline,
> not including CPU 7, suspend seems to work properly.

Well, we'll need to debug it some.

Can you please check what's in the

/sys/devices/system/cpu/cpuX/cpufreq/related_cpus

files for all CPUs on your system?

Thanks,
Rafael
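
[Editor's sketch] A bare `cat cpu?/cpufreq/related_cpus` prints the values without saying which CPU each line belongs to, and silently skips any CPU whose file is missing. The helper below is a minimal sketch, not something posted in the thread. The name `dump_cpufreq_attr` and its optional base-directory argument are hypothetical, added only so it can be tried against a scratch directory.

```shell
#!/bin/sh
# Print one labelled line per cpuN directory for a given cpufreq attribute.
# Usage: dump_cpufreq_attr ATTR [BASEDIR]
# BASEDIR defaults to the real sysfs location; it is a hypothetical
# parameter that exists only to make the function testable.
dump_cpufreq_attr() {
    attr=$1
    base=${2:-/sys/devices/system/cpu}
    for dir in "$base"/cpu[0-9]*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -d "$dir" ] || continue
        file=$dir/cpufreq/$attr
        if [ -r "$file" ]; then
            printf '%s: %s\n' "${dir##*/}" "$(cat "$file")"
        else
            printf '%s: (no %s file)\n' "${dir##*/}" "$attr"
        fi
    done
}
```

Calling `dump_cpufreq_attr related_cpus` and then `dump_cpufreq_attr affected_cpus` gives both requested listings. Note that the shell glob sorts lexically, so on machines with ten or more CPUs `cpu10` is listed before `cpu2`.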



* Re: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED]
  2015-09-04 14:59 ` Rafael J. Wysocki
@ 2015-09-04 14:42   ` Viresh Kumar
  2015-09-04 18:41     ` Doug Smythies
  2015-09-04 15:26   ` Doug Smythies
  1 sibling, 1 reply; 24+ messages in thread
From: Viresh Kumar @ 2015-09-04 14:42 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Doug Smythies, 'Rafael J. Wysocki',
	'Saravana Kannan', linux-pm

On 04-09-15, 16:59, Rafael J. Wysocki wrote:
> On Thursday, September 03, 2015 02:40:43 PM Doug Smythies wrote:
> > As of, or about, Kernel 4.2RC1 if I take my highest numbered
> > CPU offline (7 in my case), the system will not suspend.
> 
> Does "will not suspend" mean that the suspend will fail with an error
> or will it crash or hang or something else?
> 
> > The issue persists through Kernel 4.2.
> > This is on my test computer with an i7-2600K.
> > I do not normally use suspend on this computer,
> > but was doing so while working on a bug report.
> > 
> > The kernel was bisected, and the result was:
> > 
> > $ git bisect bad
> > 87549141d516aee71d511138e27117c41e8aef68 is the first bad commit
> > commit 87549141d516aee71d511138e27117c41e8aef68
> > Author: Viresh Kumar <viresh.kumar@linaro.org>
> > Date:   Wed Jun 10 02:13:21 2015 +0200
> > 
> > cpufreq: Stop migrating sysfs files on hotplug
> > 
> > See also several e-mails with the above subject line
> > between June 8th and 10th.
> > 
> > With any other combination of taking CPUs offline,
> > not including CPU 7, suspend seems to work properly.
> 
> Well, we'll need to debug it some.
> 
> Can you please check what's in the
> 
> /sys/devices/system/cpu/cpuX/cpufreq/related_cpus
> 
> files for all CPUs on your system?

I wanted to give him some patch to debug it a bit more, but couldn't
do that whole day.

@Doug: Can you please enable DEBUG for cpufreq with this:

diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
index 9fde14544ead..c09945aa7f17 100644
--- a/drivers/cpufreq/Makefile
+++ b/drivers/cpufreq/Makefile
@@ -1,3 +1,4 @@
+subdir-ccflags-y  := -DDEBUG
 # CPUfreq core
 obj-$(CONFIG_CPU_FREQ)                 += cpufreq.o freq_table.o


And give us the outputs of both successful and unsuccessful logs?

+ the values of both affected_cpus and related_cpus fields for all
CPUs.

-- 
viresh


* RE: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED]
  2015-09-04 14:42   ` Viresh Kumar
@ 2015-09-04 18:41     ` Doug Smythies
  2015-09-04 22:26       ` Rafael J. Wysocki
  0 siblings, 1 reply; 24+ messages in thread
From: Doug Smythies @ 2015-09-04 18:41 UTC (permalink / raw)
  To: 'Viresh Kumar', 'Rafael J. Wysocki'
  Cc: 'Rafael J. Wysocki', 'Saravana Kannan', linux-pm

[-- Attachment #1: Type: text/plain, Size: 1643 bytes --]

On 2015.09.04 07:43 Viresh Kumar wrote:
> On 04-09-15, 16:59, Rafael J. Wysocki wrote:
>> On Thursday, September 03, 2015 02:40:43 PM Doug Smythies wrote:
>>> As of, or about, Kernel 4.2RC1 if I take my highest numbered
>>> CPU offline (7 in my case), the system will not suspend.

> I wanted to give him some patch to debug it a bit more, but couldn't
> do that whole day.

> @Doug: Can you please enable DEBUG for cpufreq with this:
>
> diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
> index 9fde14544ead..c09945aa7f17 100644
> --- a/drivers/cpufreq/Makefile
> +++ b/drivers/cpufreq/Makefile
> @@ -1,3 +1,4 @@
> +subdir-ccflags-y  := -DDEBUG
> # CPUfreq core
> obj-$(CONFIG_CPU_FREQ)                 += cpufreq.o freq_table.o
>
>
> And give us the outputs of both successful and unsuccessful logs?

Edited /var/log/kern.log attached (might get stripped for
on-list e-mail deliveries)

> + the values of both affected_cpus and related_cpus fields for all
> CPUs.

Step 1: CPU 6 offline (sudo pm-suspend works):

root@s15:/home/doug# echo -n 0 > /sys/devices/system/cpu/cpu6/online
root@s15:/home/doug# cat /sys/devices/system/cpu/cpu*/online
1
1
1
1
1
0
1

root@s15:/sys/devices/system/cpu# cat cpu?/cpufreq/affected_cpus
0
1
2
3
4
5

7

root@s15:/sys/devices/system/cpu# cat cpu?/cpufreq/related_cpus
0
1
2
3
4
5
6
7

Step 2: CPU 7 offline (sudo pm-suspend does not work):

root@s15:/sys/devices/system/cpu# cat /sys/devices/system/cpu/cpu*/online
1
1
1
1
1
1
0
root@s15:/sys/devices/system/cpu# cat cpu?/cpufreq/affected_cpus
0
1
2
3
4
5
6

root@s15:/sys/devices/system/cpu# cat cpu?/cpufreq/related_cpus
0
1
2
3
4
5
6
7
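
[Editor's sketch] The `cat cpu*/online` listings above show seven values for an eight-CPU machine. That is presumably because cpu0 has no `online` file on this system, which is common on x86 but is an assumption here, not something stated in the thread. A labelled listing removes the guesswork. The function name `list_cpu_online` and its optional base-directory argument are hypothetical.

```shell
#!/bin/sh
# Print the hotplug state of every cpuN directory, one labelled line each.
# Usage: list_cpu_online [BASEDIR]
# BASEDIR defaults to the real sysfs location; it is a hypothetical
# parameter that exists only to make the function testable.
list_cpu_online() {
    base=${1:-/sys/devices/system/cpu}
    for dir in "$base"/cpu[0-9]*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -d "$dir" ] || continue
        if [ -r "$dir/online" ]; then
            printf '%s: %s\n' "${dir##*/}" "$(cat "$dir/online")"
        else
            # A CPU that cannot be hot-unplugged has no online file.
            printf '%s: (no online file)\n' "${dir##*/}"
        fi
    done
}
```

With this, the Step 2 state would read as cpu1 through cpu6 reporting 1 and cpu7 reporting 0, if the assumption about cpu0 holds.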


[-- Attachment #2: log.txt --]
[-- Type: text/plain, Size: 21696 bytes --]

>>>>>> Smythies 2015.09.04 Edited /var/log/kern.log file for Viresh, with Makefile modified.
>>>>>> Linux s15 4.2.0viresh #46 SMP Fri Sep 4 09:00:41 PDT 2015 x86_64 x86_64 x86_64 GNU/Linux

>>>>>> Take CPU 6 offline.
Sep  4 09:21:02 s15 kernel: [  145.256813] cpufreq: __cpufreq_remove_dev_prepare: unregistering CPU 6
Sep  4 09:21:02 s15 kernel: [  145.256819] intel_pstate: CPU 6 exiting
Sep  4 09:21:02 s15 kernel: [  145.274158] smpboot: CPU 6 is now offline

>>>>>> Do a "sudo pm-suspend" that will work properly. Subsequently turn computer on again.
Sep  4 09:26:11 s15 kernel: [  454.524941] cpufreq: setting new policy for CPU 0: 1600000 - 3800000 kHz
Sep  4 09:26:11 s15 kernel: [  454.524948] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep  4 09:26:11 s15 kernel: [  454.524949] cpufreq: setting range
Sep  4 09:26:11 s15 kernel: [  454.526917] cpufreq: setting new policy for CPU 1: 1600000 - 3800000 kHz
Sep  4 09:26:11 s15 kernel: [  454.526923] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep  4 09:26:11 s15 kernel: [  454.526924] cpufreq: setting range
Sep  4 09:26:11 s15 kernel: [  454.528952] cpufreq: setting new policy for CPU 2: 1600000 - 3800000 kHz
Sep  4 09:26:11 s15 kernel: [  454.528958] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep  4 09:26:11 s15 kernel: [  454.528959] cpufreq: setting range
Sep  4 09:26:11 s15 kernel: [  454.530887] cpufreq: setting new policy for CPU 3: 1600000 - 3800000 kHz
Sep  4 09:26:11 s15 kernel: [  454.530892] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep  4 09:26:11 s15 kernel: [  454.530893] cpufreq: setting range
Sep  4 09:26:11 s15 kernel: [  454.532526] cpufreq: setting new policy for CPU 4: 1600000 - 3800000 kHz
Sep  4 09:26:11 s15 kernel: [  454.532531] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep  4 09:26:11 s15 kernel: [  454.532533] cpufreq: setting range
Sep  4 09:26:11 s15 kernel: [  454.534520] cpufreq: setting new policy for CPU 5: 1600000 - 3800000 kHz
Sep  4 09:26:11 s15 kernel: [  454.534525] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep  4 09:26:11 s15 kernel: [  454.534527] cpufreq: setting range
Sep  4 09:26:11 s15 kernel: [  454.537764] cpufreq: setting new policy for CPU 7: 1600000 - 3800000 kHz
Sep  4 09:26:11 s15 kernel: [  454.537766] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep  4 09:26:11 s15 kernel: [  454.537767] cpufreq: setting range
Sep  4 09:26:28 s15 kernel: [  454.675889] PM: Syncing filesystems ... done.
Sep  4 09:26:28 s15 kernel: [  454.782119] PM: Preparing system for sleep (mem)
Sep  4 09:26:28 s15 kernel: [  454.782272] Freezing user space processes ... (elapsed 0.001 seconds) done.
Sep  4 09:26:28 s15 kernel: [  454.783462] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
Sep  4 09:26:28 s15 kernel: [  454.784612] PM: Suspending system (mem)
Sep  4 09:26:28 s15 kernel: [  454.784628] Suspending console(s) (use no_console_suspend to debug)
... deleted some lines ...
Sep  4 09:26:28 s15 kernel: [  456.490899] PM: suspend of devices complete after 1707.250 msecs
Sep  4 09:26:28 s15 kernel: [  456.506867] PM: late suspend of devices complete after 15.976 msecs
Sep  4 09:26:28 s15 kernel: [  456.507316] pcieport 0000:00:01.0: System wakeup enabled by ACPI
Sep  4 09:26:28 s15 kernel: [  456.507328] xhci_hcd 0000:07:00.0: System wakeup enabled by ACPI
Sep  4 09:26:28 s15 kernel: [  456.507374] ehci-pci 0000:00:1d.0: System wakeup enabled by ACPI
Sep  4 09:26:28 s15 kernel: [  456.507375] r8169 0000:03:00.0: System wakeup enabled by ACPI
Sep  4 09:26:28 s15 kernel: [  456.507527] ehci-pci 0000:00:1a.0: System wakeup enabled by ACPI
Sep  4 09:26:28 s15 kernel: [  456.522894] PM: noirq suspend of devices complete after 16.036 msecs
Sep  4 09:26:28 s15 kernel: [  456.523192] ACPI: Preparing to enter system sleep state S3
Sep  4 09:26:28 s15 kernel: [  456.523380] PM: Saving platform NVS memory
Sep  4 09:26:28 s15 kernel: [  456.523389] Disabling non-boot CPUs ...
Sep  4 09:26:28 s15 kernel: [  456.523415] cpufreq: __cpufreq_remove_dev_prepare: unregistering CPU 1
Sep  4 09:26:28 s15 kernel: [  456.523416] intel_pstate: CPU 1 exiting
Sep  4 09:26:28 s15 kernel: [  456.524616] smpboot: CPU 1 is now offline
Sep  4 09:26:28 s15 kernel: [  456.547041] cpufreq: __cpufreq_remove_dev_prepare: unregistering CPU 2
Sep  4 09:26:28 s15 kernel: [  456.547043] intel_pstate: CPU 2 exiting
Sep  4 09:26:28 s15 kernel: [  456.547173] Broke affinity for irq 19
Sep  4 09:26:28 s15 kernel: [  456.548215] smpboot: CPU 2 is now offline
Sep  4 09:26:28 s15 kernel: [  456.567001] cpufreq: __cpufreq_remove_dev_prepare: unregistering CPU 3
Sep  4 09:26:28 s15 kernel: [  456.567003] intel_pstate: CPU 3 exiting
Sep  4 09:26:28 s15 kernel: [  456.567129] Broke affinity for irq 19
Sep  4 09:26:28 s15 kernel: [  456.567163] Broke affinity for irq 33
Sep  4 09:26:28 s15 kernel: [  456.568172] smpboot: CPU 3 is now offline
Sep  4 09:26:28 s15 kernel: [  456.582960] cpufreq: __cpufreq_remove_dev_prepare: unregistering CPU 4
Sep  4 09:26:28 s15 kernel: [  456.582961] intel_pstate: CPU 4 exiting
Sep  4 09:26:28 s15 kernel: [  456.583069] Broke affinity for irq 19
Sep  4 09:26:28 s15 kernel: [  456.583104] Broke affinity for irq 33
Sep  4 09:26:28 s15 kernel: [  456.584113] smpboot: CPU 4 is now offline
Sep  4 09:26:28 s15 kernel: [  456.598918] cpufreq: __cpufreq_remove_dev_prepare: unregistering CPU 5
Sep  4 09:26:28 s15 kernel: [  456.598920] intel_pstate: CPU 5 exiting
Sep  4 09:26:28 s15 kernel: [  456.599016] Broke affinity for irq 19
Sep  4 09:26:28 s15 kernel: [  456.599023] Broke affinity for irq 25
Sep  4 09:26:28 s15 kernel: [  456.599049] Broke affinity for irq 33
Sep  4 09:26:28 s15 kernel: [  456.600058] smpboot: CPU 5 is now offline
Sep  4 09:26:28 s15 kernel: [  456.614896] cpufreq: __cpufreq_remove_dev_prepare: unregistering CPU 7
Sep  4 09:26:28 s15 kernel: [  456.614898] intel_pstate: CPU 7 exiting
Sep  4 09:26:28 s15 kernel: [  456.614988] Broke affinity for irq 19
Sep  4 09:26:28 s15 kernel: [  456.614991] Broke affinity for irq 23
Sep  4 09:26:28 s15 kernel: [  456.614995] Broke affinity for irq 25
Sep  4 09:26:28 s15 kernel: [  456.615021] Broke affinity for irq 33
Sep  4 09:26:28 s15 kernel: [  456.616030] smpboot: CPU 7 is now offline
Sep  4 09:26:28 s15 kernel: [  456.631985] ACPI: Low-level resume complete
Sep  4 09:26:28 s15 kernel: [  456.632020] PM: Restoring platform NVS memory
Sep  4 09:26:28 s15 kernel: [  456.632338] Enabling non-boot CPUs ...
Sep  4 09:26:28 s15 kernel: [  456.632374] x86: Booting SMP configuration:
Sep  4 09:26:28 s15 kernel: [  456.632374] smpboot: Booting Node 0 Processor 1 APIC 0x2
Sep  4 09:26:28 s15 kernel: [  456.644260]  cache: parent cpu1 should not be sleeping
Sep  4 09:26:28 s15 kernel: [  456.644318] cpufreq: adding CPU 1
Sep  4 09:26:28 s15 kernel: [  456.644323] intel_pstate: controlling: cpu 1
Sep  4 09:26:28 s15 kernel: [  456.644325] cpufreq: setting new policy for CPU 1: 1600000 - 3800000 kHz
Sep  4 09:26:28 s15 kernel: [  456.644327] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep  4 09:26:28 s15 kernel: [  456.644327] cpufreq: setting range
Sep  4 09:26:28 s15 kernel: [  456.644328] cpufreq: initialization complete
Sep  4 09:26:28 s15 kernel: [  456.644364] CPU1 is up
Sep  4 09:26:28 s15 kernel: [  456.644381] smpboot: Booting Node 0 Processor 2 APIC 0x4
Sep  4 09:26:28 s15 kernel: [  456.652292]  cache: parent cpu2 should not be sleeping
Sep  4 09:26:28 s15 kernel: [  456.652348] cpufreq: adding CPU 2
Sep  4 09:26:28 s15 kernel: [  456.652352] intel_pstate: controlling: cpu 2
Sep  4 09:26:28 s15 kernel: [  456.652355] cpufreq: setting new policy for CPU 2: 1600000 - 3800000 kHz
Sep  4 09:26:28 s15 kernel: [  456.652356] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep  4 09:26:28 s15 kernel: [  456.652357] cpufreq: setting range
Sep  4 09:26:28 s15 kernel: [  456.652357] cpufreq: initialization complete
Sep  4 09:26:28 s15 kernel: [  456.652390] CPU2 is up
Sep  4 09:26:28 s15 kernel: [  456.652406] smpboot: Booting Node 0 Processor 3 APIC 0x6
Sep  4 09:26:28 s15 kernel: [  456.660308]  cache: parent cpu3 should not be sleeping
Sep  4 09:26:28 s15 kernel: [  456.660366] cpufreq: adding CPU 3
Sep  4 09:26:28 s15 kernel: [  456.660370] intel_pstate: controlling: cpu 3
Sep  4 09:26:28 s15 kernel: [  456.660373] cpufreq: setting new policy for CPU 3: 1600000 - 3800000 kHz
Sep  4 09:26:28 s15 kernel: [  456.660374] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep  4 09:26:28 s15 kernel: [  456.660375] cpufreq: setting range
Sep  4 09:26:28 s15 kernel: [  456.660375] cpufreq: initialization complete
Sep  4 09:26:28 s15 kernel: [  456.660409] CPU3 is up
Sep  4 09:26:28 s15 kernel: [  456.660426] smpboot: Booting Node 0 Processor 4 APIC 0x1
Sep  4 09:26:28 s15 kernel: [  456.668263]  cache: parent cpu4 should not be sleeping
Sep  4 09:26:28 s15 kernel: [  456.668300] cpufreq: adding CPU 4
Sep  4 09:26:28 s15 kernel: [  456.668304] intel_pstate: controlling: cpu 4
Sep  4 09:26:28 s15 kernel: [  456.668307] cpufreq: setting new policy for CPU 4: 1600000 - 3800000 kHz
Sep  4 09:26:28 s15 kernel: [  456.668308] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep  4 09:26:28 s15 kernel: [  456.668308] cpufreq: setting range
Sep  4 09:26:28 s15 kernel: [  456.668309] cpufreq: initialization complete
Sep  4 09:26:28 s15 kernel: [  456.668336] CPU4 is up
Sep  4 09:26:28 s15 kernel: [  456.668349] smpboot: Booting Node 0 Processor 5 APIC 0x3
Sep  4 09:26:28 s15 kernel: [  456.680270]  cache: parent cpu5 should not be sleeping
Sep  4 09:26:28 s15 kernel: [  456.680307] cpufreq: adding CPU 5
Sep  4 09:26:28 s15 kernel: [  456.680311] intel_pstate: controlling: cpu 5
Sep  4 09:26:28 s15 kernel: [  456.680313] cpufreq: setting new policy for CPU 5: 1600000 - 3800000 kHz
Sep  4 09:26:28 s15 kernel: [  456.680314] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep  4 09:26:28 s15 kernel: [  456.680314] cpufreq: setting range
Sep  4 09:26:28 s15 kernel: [  456.680315] cpufreq: initialization complete
Sep  4 09:26:28 s15 kernel: [  456.680341] CPU5 is up
Sep  4 09:26:28 s15 kernel: [  456.680355] smpboot: Booting Node 0 Processor 7 APIC 0x7
Sep  4 09:26:28 s15 kernel: [  456.688288]  cache: parent cpu7 should not be sleeping
Sep  4 09:26:28 s15 kernel: [  456.688326] cpufreq: adding CPU 7
Sep  4 09:26:28 s15 kernel: [  456.688330] intel_pstate: controlling: cpu 7
Sep  4 09:26:28 s15 kernel: [  456.688333] cpufreq: setting new policy for CPU 7: 1600000 - 3800000 kHz
Sep  4 09:26:28 s15 kernel: [  456.688333] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep  4 09:26:28 s15 kernel: [  456.688334] cpufreq: setting range
Sep  4 09:26:28 s15 kernel: [  456.688334] cpufreq: initialization complete
Sep  4 09:26:28 s15 kernel: [  456.688360] CPU7 is up
Sep  4 09:26:28 s15 kernel: [  456.693743] ACPI: Waking up from system sleep state S3
Sep  4 09:26:28 s15 kernel: [  456.708292] ehci-pci 0000:00:1a.0: System wakeup disabled by ACPI
Sep  4 09:26:28 s15 kernel: [  456.708418] xhci_hcd 0000:07:00.0: System wakeup disabled by ACPI
Sep  4 09:26:28 s15 kernel: [  456.708583] ehci-pci 0000:00:1d.0: System wakeup disabled by ACPI
Sep  4 09:26:28 s15 kernel: [  456.708607] PM: noirq resume of devices complete after 14.593 msecs
Sep  4 09:26:28 s15 kernel: [  456.708929] PM: early resume of devices complete after 0.295 msecs
Sep  4 09:26:28 s15 kernel: [  456.709055] pcieport 0000:00:01.0: System wakeup disabled by ACPI
Sep  4 09:26:28 s15 kernel: [  456.709059] r8169 0000:03:00.0: System wakeup disabled by ACPI
Sep  4 09:26:28 s15 kernel: [  456.709106] rtc_cmos 00:02: System wakeup disabled by ACPI
Sep  4 09:26:28 s15 kernel: [  456.709123] usb usb1: root hub lost power or was reset
Sep  4 09:26:28 s15 kernel: [  456.709124] usb usb2: root hub lost power or was reset
Sep  4 09:26:28 s15 kernel: [  456.709537] parport_pc 00:05: activated
Sep  4 09:26:28 s15 kernel: [  456.709572] i8042 kbd 00:07: System wakeup disabled by ACPI
Sep  4 09:26:28 s15 kernel: [  456.710359] serial 00:08: activated
Sep  4 09:26:28 s15 kernel: [  456.713382] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20150619/psargs-359)
Sep  4 09:26:28 s15 kernel: [  456.713386] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.CHN0.DRV0._GTF] (Node ffff88040ecc8438), AE_NOT_FOUND (20150619/psparse-536)
Sep  4 09:26:28 s15 kernel: [  456.713407] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20150619/psargs-359)
Sep  4 09:26:28 s15 kernel: [  456.713410] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.CHN0.DRV1._GTF] (Node ffff88040ecc84b0), AE_NOT_FOUND (20150619/psparse-536)
Sep  4 09:26:28 s15 kernel: [  456.716059] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20150619/psargs-359)
Sep  4 09:26:28 s15 kernel: [  456.716068] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT1.CHN0.DRV0._GTF] (Node ffff88040ecc87d0), AE_NOT_FOUND (20150619/psparse-536)
Sep  4 09:26:28 s15 kernel: [  456.718140] sd 0:0:0:0: [sda] Starting disk
Sep  4 09:26:28 s15 kernel: [  456.718142] sd 0:0:1:0: [sdb] Starting disk
Sep  4 09:26:28 s15 kernel: [  456.830264] r8169 0000:03:00.0 eth0: link down
Sep  4 09:26:28 s15 kernel: [  456.830272] br0: port 1(eth0) entered disabled state
Sep  4 09:26:28 s15 kernel: [  457.035873] ata5: SATA link down (SStatus 0 SControl 300)
Sep  4 09:26:28 s15 kernel: [  457.035921] ata6: SATA link down (SStatus 0 SControl 300)
Sep  4 09:26:28 s15 kernel: [  457.191764] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 330)
Sep  4 09:26:28 s15 kernel: [  457.195886] usb 1-2: reset low-speed USB device number 2 using xhci_hcd
Sep  4 09:26:28 s15 kernel: [  457.199927] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20150619/psargs-359)
Sep  4 09:26:28 s15 kernel: [  457.199930] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT1.CHN0.DRV0._GTF] (Node ffff88040ecc87d0), AE_NOT_FOUND (20150619/psparse-536)
Sep  4 09:26:28 s15 kernel: [  457.215848] ata3.00: configured for UDMA/100
Sep  4 09:26:28 s15 kernel: [  457.473072] usb 1-2: ep 0x81 - rounding interval to 64 microframes, ep desc says 80 microframes
Sep  4 09:26:28 s15 kernel: [  457.476193] PM: resume of devices complete after 767.759 msecs
Sep  4 09:26:28 s15 kernel: [  457.476362] PM: Finishing wakeup.
Sep  4 09:26:28 s15 kernel: [  457.476363] Restarting tasks ... done.
... deleted some lines...
Sep  4 09:26:36 s15 kernel: [  464.735231] cpufreq: setting new policy for CPU 0: 1600000 - 3800000 kHz
Sep  4 09:26:36 s15 kernel: [  464.735235] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep  4 09:26:36 s15 kernel: [  464.735236] cpufreq: setting range
Sep  4 09:26:36 s15 kernel: [  464.735678] cpufreq: setting new policy for CPU 1: 1600000 - 3800000 kHz
Sep  4 09:26:36 s15 kernel: [  464.735680] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep  4 09:26:36 s15 kernel: [  464.735681] cpufreq: setting range
Sep  4 09:26:36 s15 kernel: [  464.736112] cpufreq: setting new policy for CPU 2: 1600000 - 3800000 kHz
Sep  4 09:26:36 s15 kernel: [  464.736114] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep  4 09:26:36 s15 kernel: [  464.736115] cpufreq: setting range
Sep  4 09:26:36 s15 kernel: [  464.736547] cpufreq: setting new policy for CPU 3: 1600000 - 3800000 kHz
Sep  4 09:26:36 s15 kernel: [  464.736549] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep  4 09:26:36 s15 kernel: [  464.736550] cpufreq: setting range
Sep  4 09:26:36 s15 kernel: [  464.736968] cpufreq: setting new policy for CPU 4: 1600000 - 3800000 kHz
Sep  4 09:26:36 s15 kernel: [  464.736971] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep  4 09:26:36 s15 kernel: [  464.736972] cpufreq: setting range
Sep  4 09:26:36 s15 kernel: [  464.737388] cpufreq: setting new policy for CPU 5: 1600000 - 3800000 kHz
Sep  4 09:26:36 s15 kernel: [  464.737390] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep  4 09:26:36 s15 kernel: [  464.737391] cpufreq: setting range
Sep  4 09:26:36 s15 kernel: [  464.738325] cpufreq: setting new policy for CPU 7: 1600000 - 3800000 kHz
Sep  4 09:26:36 s15 kernel: [  464.738328] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep  4 09:26:36 s15 kernel: [  464.738329] cpufreq: setting range
Sep  4 09:26:39 s15 kernel: [  468.184616] br0: port 1(eth0) entered forwarding state

>>>>>> Now put CPU 6 back online and take CPU 7 offline.
Sep  4 09:30:52 s15 kernel: [  720.581828] smpboot: Booting Node 0 Processor 6 APIC 0x5
Sep  4 09:30:52 s15 kernel: [  720.602104] cpufreq: adding CPU 6
Sep  4 09:30:52 s15 kernel: [  720.602142] intel_pstate: controlling: cpu 6
Sep  4 09:30:52 s15 kernel: [  720.602147] cpufreq: setting new policy for CPU 6: 1600000 - 3800000 kHz
Sep  4 09:30:52 s15 kernel: [  720.602150] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep  4 09:30:52 s15 kernel: [  720.602151] cpufreq: setting range
Sep  4 09:30:52 s15 kernel: [  720.602153] cpufreq: initialization complete
Sep  4 09:31:03 s15 kernel: [  732.221929] cpufreq: __cpufreq_remove_dev_prepare: unregistering CPU 7
Sep  4 09:31:03 s15 kernel: [  732.221932] intel_pstate: CPU 7 exiting
Sep  4 09:31:03 s15 kernel: [  732.235076] smpboot: CPU 7 is now offline

>>>>>> About to do a "sudo pm-suspend" that will fail.
Sep  4 09:32:43 s15 kernel: [  831.613558] cpufreq: setting new policy for CPU 0: 1600000 - 3800000 kHz
Sep  4 09:32:43 s15 kernel: [  831.613562] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep  4 09:32:43 s15 kernel: [  831.613563] cpufreq: setting range
Sep  4 09:32:43 s15 kernel: [  831.614373] cpufreq: setting new policy for CPU 1: 1600000 - 3800000 kHz
Sep  4 09:32:43 s15 kernel: [  831.614375] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep  4 09:32:43 s15 kernel: [  831.614376] cpufreq: setting range
Sep  4 09:32:43 s15 kernel: [  831.615165] cpufreq: setting new policy for CPU 2: 1600000 - 3800000 kHz
Sep  4 09:32:43 s15 kernel: [  831.615166] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep  4 09:32:43 s15 kernel: [  831.615167] cpufreq: setting range
Sep  4 09:32:43 s15 kernel: [  831.615956] cpufreq: setting new policy for CPU 3: 1600000 - 3800000 kHz
Sep  4 09:32:43 s15 kernel: [  831.615957] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep  4 09:32:43 s15 kernel: [  831.615958] cpufreq: setting range
Sep  4 09:32:43 s15 kernel: [  831.616740] cpufreq: setting new policy for CPU 4: 1600000 - 3800000 kHz
Sep  4 09:32:43 s15 kernel: [  831.616741] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep  4 09:32:43 s15 kernel: [  831.616742] cpufreq: setting range
Sep  4 09:32:43 s15 kernel: [  831.617594] cpufreq: setting new policy for CPU 5: 1600000 - 3800000 kHz
Sep  4 09:32:43 s15 kernel: [  831.617596] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep  4 09:32:43 s15 kernel: [  831.617596] cpufreq: setting range
Sep  4 09:32:43 s15 kernel: [  831.618386] cpufreq: setting new policy for CPU 6: 1600000 - 3800000 kHz
Sep  4 09:32:43 s15 kernel: [  831.618387] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep  4 09:32:43 s15 kernel: [  831.618388] cpufreq: setting range

>>>>>> During write-up, realized that the last entries above must have been logged during the "sudo pm-suspend" that fails.
>>>>>> Verify that these log entries are all that occur for the failed "sudo pm-suspend" by doing another:
Sep  4 11:15:38 s15 kernel: [ 7002.787685] cpufreq: setting new policy for CPU 0: 1600000 - 3800000 kHz
Sep  4 11:15:38 s15 kernel: [ 7002.787689] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep  4 11:15:38 s15 kernel: [ 7002.787690] cpufreq: setting range
Sep  4 11:15:38 s15 kernel: [ 7002.788544] cpufreq: setting new policy for CPU 1: 1600000 - 3800000 kHz
Sep  4 11:15:38 s15 kernel: [ 7002.788546] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep  4 11:15:38 s15 kernel: [ 7002.788547] cpufreq: setting range
Sep  4 11:15:38 s15 kernel: [ 7002.789389] cpufreq: setting new policy for CPU 2: 1600000 - 3800000 kHz
Sep  4 11:15:38 s15 kernel: [ 7002.789391] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep  4 11:15:38 s15 kernel: [ 7002.789391] cpufreq: setting range
Sep  4 11:15:38 s15 kernel: [ 7002.790210] cpufreq: setting new policy for CPU 3: 1600000 - 3800000 kHz
Sep  4 11:15:38 s15 kernel: [ 7002.790212] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep  4 11:15:38 s15 kernel: [ 7002.790213] cpufreq: setting range
Sep  4 11:15:38 s15 kernel: [ 7002.790992] cpufreq: setting new policy for CPU 4: 1600000 - 3800000 kHz
Sep  4 11:15:38 s15 kernel: [ 7002.790994] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep  4 11:15:38 s15 kernel: [ 7002.790994] cpufreq: setting range
Sep  4 11:15:38 s15 kernel: [ 7002.791891] cpufreq: setting new policy for CPU 5: 1600000 - 3800000 kHz
Sep  4 11:15:38 s15 kernel: [ 7002.791893] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep  4 11:15:38 s15 kernel: [ 7002.791894] cpufreq: setting range
Sep  4 11:15:38 s15 kernel: [ 7002.792739] cpufreq: setting new policy for CPU 6: 1600000 - 3800000 kHz
Sep  4 11:15:38 s15 kernel: [ 7002.792741] cpufreq: new min and max freqs are 1600000 - 3800000 kHz
Sep  4 11:15:38 s15 kernel: [ 7002.792741] cpufreq: setting range
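
[Editor's sketch] In the working run the log contains "PM: Syncing filesystems", "PM: Preparing system for sleep" and "PM: Suspending system" lines, while the two failing runs show only cpufreq policy messages. A quick way to tell the two cases apart is to grep a log excerpt for those markers. This is a minimal sketch that assumes the message wording seen in this 4.2 log; other kernel versions may phrase them differently. The function name `kernel_suspend_started` is hypothetical.

```shell
#!/bin/sh
# Succeed if the log (files given as arguments, or stdin when none are
# given) contains any of the kernel's early suspend messages.
# The patterns are copied from the 4.2 log in this thread.
kernel_suspend_started() {
    grep -Eq 'PM: (Syncing filesystems|Preparing system for sleep|Suspending system)' "$@"
}
```

For example, `kernel_suspend_started /var/log/kern.log && echo "kernel suspend path was entered"`. An excerpt that never matches suggests, but does not prove, that the attempt stopped before the kernel was asked to suspend.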


* Re: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED]
  2015-09-04 18:41     ` Doug Smythies
@ 2015-09-04 22:26       ` Rafael J. Wysocki
  2015-09-04 23:05         ` Doug Smythies
  0 siblings, 1 reply; 24+ messages in thread
From: Rafael J. Wysocki @ 2015-09-04 22:26 UTC (permalink / raw)
  To: Doug Smythies
  Cc: Viresh Kumar, Rafael J. Wysocki, Saravana Kannan,
	linux-pm@vger.kernel.org

On Fri, Sep 4, 2015 at 8:41 PM, Doug Smythies <dsmythies@telus.net> wrote:
> On 2015.09.04 07:43 Viresh Kumar wrote:
>> On 04-09-15, 16:59, Rafael J. Wysocki wrote:
>>> On Thursday, September 03, 2015 02:40:43 PM Doug Smythies wrote:
>>>> As of, or about, Kernel 4.2RC1 if I take my highest numbered
>>>> CPU offline (7 in my case), the system will not suspend.
>
>> I wanted to give him some patch to debug it a bit more, but couldn't
>> do that whole day.
>
>> @Doug: Can you please enable DEBUG for cpufreq with this:
>>
>> diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
>> index 9fde14544ead..c09945aa7f17 100644
>> --- a/drivers/cpufreq/Makefile
>> +++ b/drivers/cpufreq/Makefile
>> @@ -1,3 +1,4 @@
>> +subdir-ccflags-y  := -DDEBUG
>> # CPUfreq core
>> obj-$(CONFIG_CPU_FREQ)                 += cpufreq.o freq_table.o
>>
>>
>> And give us the outputs of both successful and unsuccessful logs?
>
> Edited /var/log/kern.log attached (might get stripped for
> on-list e-mail deliveries)

Hmm.

I suspect that your user space does something that fails during the pm-suspend.

Instead of invoking the pm-suspend command, can you simply do (as root)

# echo mem > /sys/power/state

and see if that behaves in the same way?

Thanks,
Rafael
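
[Editor's sketch] Writing `mem` to `/sys/power/state` bypasses the pm-suspend scripts and asks the kernel to suspend directly, so a failure there would point at the kernel, not at user space. The wrapper below is a minimal sketch that first checks that the requested state is listed and then reports the exit status of the write. The function names and the optional file argument are hypothetical, the latter only so the logic can be exercised on a scratch file instead of a real machine.

```shell
#!/bin/sh
# Succeed if STATE is one of the words listed in the state file.
# Usage: supports_sleep_state STATE [FILE]
supports_sleep_state() {
    state=$1
    file=${2:-/sys/power/state}
    [ -r "$file" ] || return 1
    for s in $(cat "$file"); do
        [ "$s" = "$state" ] && return 0
    done
    return 1
}

# Write STATE to the state file and report whether the write succeeded.
# Usage: try_suspend STATE [FILE]   (run as root against the real file)
try_suspend() {
    state=$1
    file=${2:-/sys/power/state}
    if ! supports_sleep_state "$state" "$file"; then
        echo "state '$state' is not listed in $file" >&2
        return 2
    fi
    if echo "$state" > "$file"; then
        echo "write of '$state' succeeded"
    else
        rc=$?
        echo "write of '$state' failed (exit status $rc)" >&2
        return "$rc"
    fi
}
```

On a real system, `try_suspend mem` returns only after the machine has resumed or the suspend attempt has been rejected. In the latter case the kernel log usually shows why.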


* RE: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED]
  2015-09-04 22:26       ` Rafael J. Wysocki
@ 2015-09-04 23:05         ` Doug Smythies
  2015-09-05  0:22           ` Rafael J. Wysocki
  0 siblings, 1 reply; 24+ messages in thread
From: Doug Smythies @ 2015-09-04 23:05 UTC (permalink / raw)
  To: 'Rafael J. Wysocki'
  Cc: 'Viresh Kumar', 'Rafael J. Wysocki',
	'Saravana Kannan', linux-pm

On 2015.09.04 15:26 Rafael J. Wysocki wrote:
> On Fri, Sep 4, 2015 at 8:41 PM, Doug Smythies <dsmythies@telus.net> wrote:
>> On 2015.09.04 07:43 Viresh Kumar wrote:
>>> On 04-09-15, 16:59, Rafael J. Wysocki wrote:
>>>> On Thursday, September 03, 2015 02:40:43 PM Doug Smythies wrote:
>>>>> As of, or about, Kernel 4.2RC1 if I take my highest numbered
>>>>> CPU offline (7 in my case), the system will not suspend.
>
>>> @Doug: Can you please enable DEBUG for cpufreq with this:
>>>
>>> diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
>>> index 9fde14544ead..c09945aa7f17 100644
>>> --- a/drivers/cpufreq/Makefile
>>> +++ b/drivers/cpufreq/Makefile
>>> @@ -1,3 +1,4 @@
>>> +subdir-ccflags-y  := -DDEBUG
>>> # CPUfreq core
>>> obj-$(CONFIG_CPU_FREQ)                 += cpufreq.o freq_table.o
>>>
>>>
>>> And give us the outputs of both successful and unsuccessful logs?
>>
>> Edited /var/log/kern.log attached (might get stripped for
>> on-list e-mail deliveries)

> Hmm.
> I suspect that your user space does something that fails during the pm-suspend.

Are you saying that the patch might be O.K., but reveals
an issue with pm-suspend that was always there?

> Instead of invoking the pm-suspend command, can you simply do (as root)
> # echo mem > /sys/power/state
> and see if that behaves in the same way?

With CPU 7 offline, that method seems to suspend just fine.
I did not check any other scenarios.




* Re: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED]
  2015-09-04 23:05         ` Doug Smythies
@ 2015-09-05  0:22           ` Rafael J. Wysocki
  2015-09-05  1:41             ` Rafael J. Wysocki
  2015-09-05  2:34             ` Doug Smythies
  0 siblings, 2 replies; 24+ messages in thread
From: Rafael J. Wysocki @ 2015-09-05  0:22 UTC (permalink / raw)
  To: Doug Smythies
  Cc: Rafael J. Wysocki, Viresh Kumar, Rafael J. Wysocki,
	Saravana Kannan, linux-pm@vger.kernel.org

On Sat, Sep 5, 2015 at 1:05 AM, Doug Smythies <dsmythies@telus.net> wrote:
> On 2015.09.04 15:26 Rafael J. Wysocki wrote:
>> On Fri, Sep 4, 2015 at 8:41 PM, Doug Smythies <dsmythies@telus.net> wrote:
>>> On 2015.09.04 07:43 Viresh Kumar wrote:
>>>> On 04-09-15, 16:59, Rafael J. Wysocki wrote:
>>>>> On Thursday, September 03, 2015 02:40:43 PM Doug Smythies wrote:
>>>>>> As of, or about, Kernel 4.2RC1 if I take my highest numbered
>>>>>> CPU offline (7 in my case), the system will not suspend.
>>
>>>> @Doug: Can you please enable DEBUG for cpufreq with this:
>>>>
>>>> diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
>>>> index 9fde14544ead..c09945aa7f17 100644
>>>> --- a/drivers/cpufreq/Makefile
>>>> +++ b/drivers/cpufreq/Makefile
>>>> @@ -1,3 +1,4 @@
>>>> +subdir-ccflags-y  := -DDEBUG
>>>> # CPUfreq core
>>>> obj-$(CONFIG_CPU_FREQ)                 += cpufreq.o freq_table.o
>>>>
>>>>
>>>> And give us the outputs of both successful and unsuccessful logs?
>>>
>>> Edited /var/log/kern.log attached (might get stripped for
>>> on-list e-mail deliveries)
>
>> Hmm.
>> I suspect that your user space does something that fails during the pm-suspend.
>
> Are you saying that the patch might be O.K., but reveals
> an issue with pm-suspend that was always there?

Or it breaks something that pm-suspend does before suspending.

It would be good to know what it is. :-)

The "setting new policy for" messages in your log are from
cpufreq_set_policy() and the last thing printed before pm-suspend
exits in the failing case is before calling
cpufreq_driver->setpolicy().

I guess we need to focus on that one, will send you a debug patch shortly.

>> Instead of invoking the pm-suspend command, can you simply do (as root)
>> # echo mem > /sys/power/state
>> and see if that behaves in the same way?
>
> With CPU 7 offline, that method seems to suspend just fine.
> I did not check any other scenarios.

OK

I'm now suspecting that the change in question might break something
in intel_pstate which causes ->setpolicy() to fail for the last online
CPU.

Can you try to offline CPU7 and try to play with min and max for CPU6?

Thanks,
Rafael


* Re: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED]
  2015-09-05  0:22           ` Rafael J. Wysocki
@ 2015-09-05  1:41             ` Rafael J. Wysocki
  2015-09-05  2:34             ` Doug Smythies
  1 sibling, 0 replies; 24+ messages in thread
From: Rafael J. Wysocki @ 2015-09-05  1:41 UTC (permalink / raw)
  To: Doug Smythies
  Cc: Rafael J. Wysocki, Viresh Kumar, Saravana Kannan,
	linux-pm@vger.kernel.org

On Saturday, September 05, 2015 02:22:39 AM Rafael J. Wysocki wrote:
> On Sat, Sep 5, 2015 at 1:05 AM, Doug Smythies <dsmythies@telus.net> wrote:
> > On 2015.09.04 15:26 Rafael J. Wysocki wrote:
> >> On Fri, Sep 4, 2015 at 8:41 PM, Doug Smythies <dsmythies@telus.net> wrote:
> >>> On 2015.09.04 07:43 Viresh Kumar wrote:
> >>>> On 04-09-15, 16:59, Rafael J. Wysocki wrote:
> >>>>> On Thursday, September 03, 2015 02:40:43 PM Doug Smythies wrote:
> >>>>>> As of, or about, Kernel 4.2RC1 if I take my highest numbered
> >>>>>> CPU offline (7 in my case), the system will not suspend.
> >>
> >>>> @Doug: Can you please enable DEBUG for cpufreq with this:
> >>>>
> >>>> diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
> >>>> index 9fde14544ead..c09945aa7f17 100644
> >>>> --- a/drivers/cpufreq/Makefile
> >>>> +++ b/drivers/cpufreq/Makefile
> >>>> @@ -1,3 +1,4 @@
> >>>> +subdir-ccflags-y  := -DDEBUG
> >>>> # CPUfreq core
> >>>> obj-$(CONFIG_CPU_FREQ)                 += cpufreq.o freq_table.o
> >>>>
> >>>>
> >>>> And give us the outputs of both successful and unsuccessful logs?
> >>>
> >>> Edited /var/log/kern.log attached (might get stripped for
> >>> on-list e-mail deliveries)
> >
> >> Hmm.
> >> I suspect that your user space does something that fails during the pm-suspend.
> >
> > Are you saying that the patch might be O.K., but reveals
> > an issue with pm-suspend that was always there?
> 
> Or it breaks something that pm-suspend does before suspending.
> 
> It would be good to know what it is. :-)
> 
> The "setting new policy for" messages in your log are from
> cpufreq_set_policy() and the last thing printed before pm-suspend
> exits in the failing case is before calling
> cpufreq_driver->setpolicy().
> 
> I guess we need to focus on that one, will send you a debug patch shortly.

Please apply this one and see if there are any messages from intel_pstate_set_policy()
in the failing case.

---
 drivers/cpufreq/intel_pstate.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Index: linux-pm/drivers/cpufreq/intel_pstate.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -972,8 +972,10 @@ static unsigned int intel_pstate_get(uns
 
 static int intel_pstate_set_policy(struct cpufreq_policy *policy)
 {
-	if (!policy->cpuinfo.max_freq)
+	if (!policy->cpuinfo.max_freq) {
+		pr_err("%s: failed for CPU%d\n", __func__, policy->cpu);
 		return -ENODEV;
+	}
 
 	if (policy->policy == CPUFREQ_POLICY_PERFORMANCE &&
 	    policy->max >= policy->cpuinfo.max_freq) {



* RE: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED]
  2015-09-05  0:22           ` Rafael J. Wysocki
  2015-09-05  1:41             ` Rafael J. Wysocki
@ 2015-09-05  2:34             ` Doug Smythies
  2015-09-05  7:46               ` Doug Smythies
  1 sibling, 1 reply; 24+ messages in thread
From: Doug Smythies @ 2015-09-05  2:34 UTC (permalink / raw)
  To: 'Rafael J. Wysocki'
  Cc: 'Viresh Kumar', 'Rafael J. Wysocki',
	'Saravana Kannan', linux-pm

[-- Attachment #1: Type: text/plain, Size: 2866 bytes --]

On 2015.09.04 17:23 Rafael J. Wysocki wrote:
> On Sat, Sep 5, 2015 at 1:05 AM, Doug Smythies <dsmythies@telus.net> wrote:
>> On 2015.09.04 15:26 Rafael J. Wysocki wrote:
>>> On Fri, Sep 4, 2015 at 8:41 PM, Doug Smythies <dsmythies@telus.net> wrote:
>>>> On 2015.09.04 07:43 Viresh Kumar wrote:
>>>>> On 04-09-15, 16:59, Rafael J. Wysocki wrote:
>>>>>> On Thursday, September 03, 2015 02:40:43 PM Doug Smythies wrote:
>>>>>>> As of, or about, Kernel 4.2RC1 if I take my highest numbered
>>>>>>> CPU offline (7 in my case), the system will not suspend.
>>>
>>>>> @Doug: Can you please enable DEBUG for cpufreq with this:
>>>>>
>>>>> diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
>>>>> index 9fde14544ead..c09945aa7f17 100644
>>>>> --- a/drivers/cpufreq/Makefile
>>>>> +++ b/drivers/cpufreq/Makefile
>>>>> @@ -1,3 +1,4 @@
>>>>> +subdir-ccflags-y  := -DDEBUG
>>>>> # CPUfreq core
>>>>> obj-$(CONFIG_CPU_FREQ)                 += cpufreq.o freq_table.o
>>>>>
>>>>>
>>>>> And give us the outputs of both successful and unsuccessful logs?
>>>>
>>>> Edited /var/log/kern.log attached (might get stripped for
>>>> on-list e-mail deliveries)
>>
>>> Hmm.
>>> I suspect that your user space does something that fails during the pm-suspend.
>>
>> Are you saying that the patch might be O.K., but reveals
>> an issue with pm-suspend that was always there?
>
> Or it breaks something that pm-suspend does before suspending.
>
> It would be good to know what it is. :-)

While researching pm-utils bugs, I found reference to
/var/log/pm-suspend.log, which I had not noticed before.
Relevant extract attached.

It is not clear to me why that echo line (there is only one)
would fail.

> The "setting new policy for" messages in your log are from
> cpufreq_set_policy() and the last thing printed before pm-suspend
> exits in the failing case is before calling
> cpufreq_driver->setpolicy().
>
> I guess we need to focus on that one, will send you a debug patch shortly.
>
>>> Instead of invoking the pm-suspend command, can you simply do (as root)
>>> # echo mem > /sys/power/state
>>> and see if that behaves in the same way?
>>
>> With CPU 7 offline, that method seems to suspend just fine.
>> I did not check any other scenarios.
>
> OK
>
> I'm now suspecting that the change in question might break something
> in intel_pstate which causes ->setpolicy() to fail for the last online
> CPU.

While, by far, most of my work on this has been done using the intel_pstate
scaling driver, I have also tested using the acpi-cpufreq scaling driver,
with the same results.

> Can you try to offline CPU7 and try to play with min and max for CPU6?

Yes. I did, on a kernel with your intel_pstate.c patch from your
subsequent e-mail. I didn't notice any problem, but maybe I didn't
do it in a way that would manifest the issue.
A small /var/log/kern.log segment is attached.

... Doug

[-- Attachment #2: pm_log.txt --]
[-- Type: text/plain, Size: 1091 bytes --]

Running hook /usr/lib/pm-utils/sleep.d/60_wpa_supplicant suspend suspend:
Failed to connect to non-global ctrl_ifname: (null)  error: No such file or directory
/usr/lib/pm-utils/sleep.d/60_wpa_supplicant suspend suspend: success.

Running hook /usr/lib/pm-utils/sleep.d/75modules suspend suspend:
/usr/lib/pm-utils/sleep.d/75modules suspend suspend: not applicable.

Running hook /usr/lib/pm-utils/sleep.d/90clock suspend suspend:
/usr/lib/pm-utils/sleep.d/90clock suspend suspend: not applicable.

Running hook /usr/lib/pm-utils/sleep.d/94cpufreq suspend suspend:
sh: echo: I/O error
/usr/lib/pm-utils/sleep.d/94cpufreq suspend suspend: Returned exit code 1.

Fri Sep  4 18:25:00 PDT 2015: Inhibit found, will not perform suspend
Fri Sep  4 18:25:00 PDT 2015: Running hooks for resume
Running hook /usr/lib/pm-utils/sleep.d/90clock resume suspend:
/usr/lib/pm-utils/sleep.d/90clock resume suspend: not applicable.

Running hook /usr/lib/pm-utils/sleep.d/75modules resume suspend:
Reloaded unloaded modules.
/usr/lib/pm-utils/sleep.d/75modules resume suspend: success.

[-- Attachment #3: log_2.txt --]
[-- Type: text/plain, Size: 795 bytes --]

Sep  4 19:14:31 s15 kernel: [  115.529407] cpufreq: __cpufreq_remove_dev_prepare: unregistering CPU 7
Sep  4 19:14:31 s15 kernel: [  115.529413] intel_pstate: CPU 7 exiting
Sep  4 19:14:31 s15 kernel: [  115.542754] smpboot: CPU 7 is now offline
Sep  4 19:20:19 s15 kernel: [  463.426743] cpufreq: setting new policy for CPU 6: 1600000 - 2400000 kHz
Sep  4 19:20:19 s15 kernel: [  463.426750] cpufreq: new min and max freqs are 1600000 - 2400000 kHz
Sep  4 19:20:19 s15 kernel: [  463.426752] cpufreq: setting range
Sep  4 19:23:05 s15 kernel: [  628.598506] cpufreq: setting new policy for CPU 6: 1600000 - 2200000 kHz
Sep  4 19:23:05 s15 kernel: [  628.598511] cpufreq: new min and max freqs are 1600000 - 2200000 kHz
Sep  4 19:23:05 s15 kernel: [  628.598513] cpufreq: setting range


* RE: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED]
  2015-09-05  2:34             ` Doug Smythies
@ 2015-09-05  7:46               ` Doug Smythies
  2015-09-05  8:14                 ` Viresh Kumar
  2015-09-07 13:07                 ` Rafael J. Wysocki
  0 siblings, 2 replies; 24+ messages in thread
From: Doug Smythies @ 2015-09-05  7:46 UTC (permalink / raw)
  To: 'Rafael J. Wysocki'
  Cc: 'Viresh Kumar', 'Rafael J. Wysocki',
	'Saravana Kannan', linux-pm

On 2015.09.04 19:35 Doug Smythies wrote:
> On 2015.09.04 17:23 Rafael J. Wysocki wrote:
>> On Sat, Sep 5, 2015 at 1:05 AM, Doug Smythies <dsmythies@telus.net> wrote:
>>> On 2015.09.04 15:26 Rafael J. Wysocki wrote:
>>>> On Fri, Sep 4, 2015 at 8:41 PM, Doug Smythies <dsmythies@telus.net> wrote:
>>>>> On 2015.09.04 07:43 Viresh Kumar wrote:
>>>>>> On 04-09-15, 16:59, Rafael J. Wysocki wrote:
>>>>>>> On Thursday, September 03, 2015 02:40:43 PM Doug Smythies wrote:
>>>>>>>> As of, or about, Kernel 4.2RC1 if I take my highest numbered
>>>>>>>> CPU offline (7 in my case), the system will not suspend.

>>>> Hmm.
>>>> I suspect that your user space does something that fails during the pm-suspend.
>>>
>>> Are you saying that the patch might be O.K., but reveals
>>> an issue with pm-suspend that was always there?
>>
>> Or it breaks something that pm-suspend does before suspending.
>>
>> It would be good to know what it is. :-)

> While researching pm-utils bugs, I found reference to
> /var/log/pm-suspend.log, which I had not noticed before.
> Relevant extract attached.

> It is not clear to me why that echo line (there is only one)
> would fail.

The echo line fails because the related CPU is offline.
If the failed echo is the last pass through the loop,
then the script interprets the overall execution of
94cpufreq as a failure and aborts the suspend. If the
failed echo is not the last pass through the loop, then
the bad exit code gets overwritten with a good one before
the loop exits.
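
To illustrate the exit-status behaviour described above, here is a
minimal sketch. It assumes a POSIX shell, where the exit status of a
for loop is that of the last command executed in it. try_write and
run_loop are hypothetical helpers made up for this illustration, not
part of pm-utils; try_write only stands in for the governor write.

```shell
#!/bin/sh
# Hypothetical stand-in for the governor write: "fails" for one chosen item.
try_write() {
    # $1 = item being processed, $2 = item whose write should fail
    [ "$1" != "$2" ]
}

# Run the loop in a subshell, as the hook does, so the subshell's exit
# status is the loop's exit status.
run_loop() {
    ( for x in cpu0 cpu1 cpu2; do
          try_write "$x" "$1"
      done )
}

run_loop cpu1; middle_status=$?   # failure in a middle iteration
run_loop cpu2; last_status=$?     # failure in the last iteration

echo "failure in middle iteration: exit status $middle_status"
echo "failure in last iteration:   exit status $last_status"
```

The first call reports 0 because the later, successful iteration
overwrites the bad status; the second reports non-zero because nothing
runs after the failing iteration.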

Since the loop is merely setting a temporary governor,
to test I just used performance mode anyway, and commented
out the echo. pm-suspend with CPU 7 offline then worked fine.

I have not yet gone back to a pre-patch kernel
to determine why it used to work (it is late in my time zone).
However, I would have to assume that before the commit in
question, the echo worked even if the CPU was offline.

Could someone please confirm or deny the above conclusion?

The relevant code segment, with some added debug
echo lines, from /usr/lib/pm-utils/sleep.d/94cpufreq:

hibernate_cpufreq()
{
  ( cd /sys/devices/system/cpu/
  for x in cpu[0-9]*; do
     # if cpufreq is a symlink, it is handled by another cpu. Skip.
     [ -L "$x/cpufreq" ] && continue
     gov="$x/cpufreq/scaling_governor"
     # if we do not have a scaling_governor file, skip.
     [ -f "$gov" ] || continue
     # if our temporary governor is not available, skip.
     grep -q "$TEMPORARY_CPUFREQ_GOVERNOR" \
     "$x/cpufreq/scaling_available_governors" || continue
     savestate "${x}_governor" < "$gov"
# I added the next 3 lines
     echo "$x"
     echo "$TEMPORARY_CPUFREQ_GOVERNOR"
     echo "$gov"
# For a test, do not do the echo, as I already set performance mode
#    echo "$TEMPORARY_CPUFREQ_GOVERNOR" > "$gov"
  done )
}




* Re: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED]
  2015-09-05  7:46               ` Doug Smythies
@ 2015-09-05  8:14                 ` Viresh Kumar
  2015-09-07 13:32                   ` Rafael J. Wysocki
  2015-09-07 13:07                 ` Rafael J. Wysocki
  1 sibling, 1 reply; 24+ messages in thread
From: Viresh Kumar @ 2015-09-05  8:14 UTC (permalink / raw)
  To: Doug Smythies
  Cc: 'Rafael J. Wysocki', 'Rafael J. Wysocki',
	'Saravana Kannan', linux-pm

On 05-09-15, 00:46, Doug Smythies wrote:
> > It is not clear to me why that echo line (there is only one)
> > would fail.

To me it is clear now :)

> The echo line fails because the related CPU is offline.
> If the failed echo is the last pass through the loop,
> then the script interprets the overall execution of
> 94cpufreq as a failure and aborts the suspend. If the
> failed echo is not the last pass through the loop, then
> the bad exit code gets overwritten with a good one before
> the loop exits.
> 
> Since the loop is merely setting a temporary governor,
> to test I just used performance mode anyway, and commented
> out the echo. pm-suspend with CPU 7 offline then worked fine.
> 
> I have not yet gone back to a pre-patch kernel
> to determine why it used to work (it is late in my time zone).
> However, I would have to assume that before the commit in
> question, the echo worked even if the CPU was offline.

So here is the story behind it.
- In your system all CPUs are independent, that is there are no links
  to cpufreq directory, so that check in the script is useless for
  you.
- The $COMMIT in question did a significant change. Earlier, while
  offlining the CPU, we used to remove the cpufreq directory from
  sysfs, which is not the case any more.

- So to be precise, the following lines came to your rescue earlier:

# if we do not have a scaling_governor file, skip.
# [ -f "$gov" ] || continue

- But they don't after the patch, as the file and directory are
  present even if the CPU is offline.
- But because the CPU is offline, writing to those files isn't allowed
  and so the echo failed.

The solution is to check for an offline CPU as well at the beginning of
the loop in the script, and skip the CPU if it is offline.
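
A minimal sketch of such a check, under stated assumptions:
governor_files_for_online_cpus is a hypothetical helper, not something
pm-utils ships, and a CPU directory without an "online" attribute is
assumed to be online (CPUs that cannot be hot-unplugged may not expose
it). The demo runs against a scratch directory tree instead of the real
sysfs.

```shell
#!/bin/sh
# Sketch only: print the scaling_governor files that belong to online CPUs.
# $1 is the CPU sysfs root; it defaults to the real location.
governor_files_for_online_cpus() {
    base="${1:-/sys/devices/system/cpu}"
    for dir in "$base"/cpu[0-9]*; do
        # Assumption: no "online" file means the CPU is always online.
        if [ -r "$dir/online" ]; then
            read -r state < "$dir/online"
            [ "$state" = "1" ] || continue
        fi
        gov="$dir/cpufreq/scaling_governor"
        # if we do not have a scaling_governor file, skip.
        [ -f "$gov" ] || continue
        echo "$gov"
    done
    return 0
}

# Demo tree: cpu7 is "offline" but its cpufreq files are still present,
# which is the situation after the commit in question.
demo=$(mktemp -d)
mkdir -p "$demo/cpu0/cpufreq" "$demo/cpu6/cpufreq" "$demo/cpu7/cpufreq"
for n in 0 6 7; do
    echo powersave > "$demo/cpu$n/cpufreq/scaling_governor"
done
echo 1 > "$demo/cpu6/online"
echo 0 > "$demo/cpu7/online"
result=$(governor_files_for_online_cpus "$demo")
echo "$result"
rm -rf "$demo"
```

Only the cpu0 and cpu6 governor files are listed; cpu7 is skipped
before any write would be attempted.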

-- 
viresh


* Re: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED]
  2015-09-05  8:14                 ` Viresh Kumar
@ 2015-09-07 13:32                   ` Rafael J. Wysocki
  2015-09-08  2:40                     ` Viresh Kumar
  0 siblings, 1 reply; 24+ messages in thread
From: Rafael J. Wysocki @ 2015-09-07 13:32 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Doug Smythies, 'Rafael J. Wysocki',
	'Saravana Kannan', linux-pm

On Saturday, September 05, 2015 01:44:07 PM Viresh Kumar wrote:
> On 05-09-15, 00:46, Doug Smythies wrote:
> > > It is not clear to me why that echo line (there is only one)
> > > would fail.
> 
> To me it is clear now :)
> 
> > The echo line fails because the related CPU is offline.
> > If the failed echo is the last pass through the loop,
> > then the script interprets the overall execution of
> > 94cpufreq as a failure and aborts the suspend. If the
> > failed echo is not the last pass through the loop, then
> > the bad exit code gets overwritten with a good one before
> > the loop exits.
> > 
> > Since the loop is merely setting a temporary governor,
> > to test I just used performance mode anyway, and commented
> > out the echo. pm-suspend with CPU 7 offline then worked fine.
> > 
> > I have not yet gone back to a pre-patch kernel
> > to determine why it used to work (it is late in my time zone).
> > However, I would have to assume that before the commit in
> > question, the echo worked even if the CPU was offline.
> 
> So here is the story behind it.
> - In your system all CPUs are independent, that is there are no links
>   to cpufreq directory, so that check in the script is useless for
>   you.
> - The $COMMIT in question did a significant change. Earlier, while
>   offlining the CPU, we used to remove the cpufreq directory from
>   sysfs, which is not the case any more.
> 
> - So to be precise, the following lines came to your rescue earlier:
> 
> # if we do not have a scaling_governor file, skip.
> # [ -f "$gov" ] || continue
> 
> - But they don't after the patch, as the file and directory are
>   present even if the CPU is offline.
> - But because the CPU is offline, writing to those files isn't allowed
>   and so the echo failed.
> 
> The solution is to check for an offline CPU as well at the beginning of
> the loop in the script, and skip the CPU if it is offline.

That's a bug in the script.  It should discard all errors from the entire inner
loop, but it doesn't discard errors from the last iteration of it.
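
One way a hook could discard those errors, sketched under assumptions:
set_governor_best_effort is a hypothetical helper, not something
pm-utils provides, and the demo writes to scratch files rather than to
sysfs.

```shell
#!/bin/sh
# Hypothetical helper: write governor name $1 to every file listed after
# it, ignoring any individual write failure.
set_governor_best_effort() {
    gov_name="$1"
    shift
    for f in "$@"; do
        # The braces let 2>/dev/null also swallow the shell's own
        # redirection error message.
        { echo "$gov_name" > "$f"; } 2>/dev/null || true
    done
    # Explicit success, so the last iteration cannot leak its status.
    return 0
}

# Demo: the last target cannot be written, mimicking the offline CPU
# being the last one visited by the loop.
scratch=$(mktemp -d)
set_governor_best_effort performance "$scratch/ok" "$scratch/missing/dir"
status=$?
written=$(cat "$scratch/ok")
rm -rf "$scratch"
echo "status=$status written=$written"
```

With this shape the overall status no longer depends on which iteration
happens to fail.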

That said, what store() in cpufreq.c does is questionable too.

First, if policy->cpu is offline, the policy will be inactive to my eyes, so
we don't need the second check.

But if the policy is active (and policy->cpu is online), it will not generally
fail for an offline CPU.  So, if the policy applies to more than 1 CPU, you
can use any of them to manipulate it, even if one of them is offline as long
as there are any online CPUs in the set.

This isn't entirely consistent.  We should either fail store() for any offline
CPU or make the changes for offline CPUs too.  And in the particular case of
the governor, I'm wondering what will be the problem with changing last_governor
for an inactive policy?

Thanks,
Rafael



* Re: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED]
  2015-09-07 13:32                   ` Rafael J. Wysocki
@ 2015-09-08  2:40                     ` Viresh Kumar
  2015-09-11 20:43                       ` Saravana Kannan
  0 siblings, 1 reply; 24+ messages in thread
From: Viresh Kumar @ 2015-09-08  2:40 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Doug Smythies, 'Rafael J. Wysocki',
	'Saravana Kannan', linux-pm

On 07-09-15, 15:32, Rafael J. Wysocki wrote:
> First, if policy->cpu is offline, the policy will be inactive to my eyes, so
> we don't need the second check.

Hmm, or maybe just drop the first check.

> But if the policy is active (and policy->cpu is online), it will not generally
> fail for an offline CPU.

Right.

> So, if the policy applies to more than 1 CPU, you
> can use any of them to manipulate it, even if one of them is offline as long
> as there are any online CPUs in the set.

Right.

> This isn't entirely consistent.  We should either fail store() for any offline
> CPU

At that point we have no idea of the CPU for which the sysfs operation
is called. And so we have to go ahead without failing, if policy is
active.

> or make the changes for offline CPUs too.

What does that mean? Most of the stuff we do is for the policy, rather
than per-cpu. And if there is per-cpu stuff, then we *only* should be
doing that for the online ones.

Not sure if I understood what you meant here. :(

> And in the particular case of
> the governor, I'm wondering what will be the problem with changing last_governor
> for an inactive policy?

I don't think we should be adding special cases for updating sysfs
attributes of an inactive policy. It's not just about the last_governor
thing, but other sysfs attributes as well.

-- 
viresh


* Re: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED]
  2015-09-08  2:40                     ` Viresh Kumar
@ 2015-09-11 20:43                       ` Saravana Kannan
  2015-09-11 21:30                         ` Rafael J. Wysocki
  0 siblings, 1 reply; 24+ messages in thread
From: Saravana Kannan @ 2015-09-11 20:43 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, Doug Smythies, 'Rafael J. Wysocki',
	linux-pm

Sorry about the late reply and not helping out earlier. Didn't check 
this email account for some time.

On 09/07/2015 07:40 PM, Viresh Kumar wrote:
> On 07-09-15, 15:32, Rafael J. Wysocki wrote:
>> First, if policy->cpu is offline, the policy will be inactive to my eyes, so
>> we don't need the second check.
>
> Hmm, or maybe just drop the first check.
>
>> But if the policy is active (and policy->cpu is online), it will not generally
>> fail for an offline CPU.
>
> Right.
>
>> So, if the policy applies to more than 1 CPU, you
>> can use any of them to manipulate it, even if one of them is offline as long
>> as there are any online CPUs in the set.
>
> Right.
>
>> This isn't entirely consistent.  We should either fail store() for any offline
>> CPU
>
> At that point we have no idea of the CPU for which the sysfs operation
> is called. And so we have to go ahead without failing, if policy is
> active.
>
>> or make the changes for offline CPUs too.
>
> What does that mean? Most of the stuff we do is for the policy, rather
> than per-cpu. And if there is per-cpu stuff, then we *only* should be
> doing that for the online ones.
>
> Not sure if I understood what you meant here. :(
>
>> And in the particular case of
>> the governor, I'm wondering what will be the problem with changing last_governor
>> for an inactive policy?
>
> I don't think we should be adding special cases for updating sysfs
> attributes of an inactive policy. It's not just about the last_governor
> thing, but other sysfs attributes as well.
>

The way I see it, having the cpufreq policy control sysfs "bits" under 
every CPU directory is what's causing some semantic confusion/inconsistency.

Every single node under a cpufreq folder is for policy control and not 
CPU control. But by putting the policy control bits under the cpuX 
directory, we give the wrong semantic impression that it's a per CPU 
attribute when it's really per-policy.

Ideally (in terms of semantics) we would have put all the policy control 
bits in a per policy directory under 
/sys/devices/system/cpu/cpufreq/policyX/ where X would/could be tied
to the first CPU in related CPUs -- so that it's easy to correlate and 
also to avoid having the policy numbering being different depending on 
the order in which CPUs get hotplugged.

But we can't go about breaking userspace ABI by removing the cpufreq 
directories out of the cpu directories just because of the semantic 
confusion.

Well, we COULD still put the policy directories under cpu/cpufreq/ and 
then make every cpuX/cpufreq directory a symlink to the actual policy 
directory. But that is not going to help with this specific 
issue/discussion.

Having said all that, I still think that stores to all these sysfs files 
should work. I'm not saying it's a trivial change (like setting a 
governor's polling time, etc would need some checks to cache the value 
and not start a timer immediately, etc), but I think it's a more 
consistent and user friendly API.

If the user wants to set a min CPU freq, why should they care if the CPU 
is online at that very instant? It gets especially painful if you have a 
thermal daemon that's plugging in/out CPUs while the user or a script is 
trying to set the parameters.

Thanks,
Saravana

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


* Re: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED]
  2015-09-11 20:43                       ` Saravana Kannan
@ 2015-09-11 21:30                         ` Rafael J. Wysocki
  2015-09-11 22:07                           ` Saravana Kannan
  0 siblings, 1 reply; 24+ messages in thread
From: Rafael J. Wysocki @ 2015-09-11 21:30 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Viresh Kumar, Doug Smythies, 'Rafael J. Wysocki',
	linux-pm

On Friday, September 11, 2015 01:43:51 PM Saravana Kannan wrote:
> Sorry about the late reply and not helping out earlier. Didn't check 
> this email account for some time.
> 
> On 09/07/2015 07:40 PM, Viresh Kumar wrote:
> > On 07-09-15, 15:32, Rafael J. Wysocki wrote:
> >> First, if policy->cpu is offline, the policy will be inactive to my eyes, so
> >> we don't need the second check.
> >
> > Hmm, or maybe just drop the first check.
> >
> >> But if the policy is active (and policy->cpu is online), it will not generally
> >> fail for an offline CPU.
> >
> > Right.
> >
> >> So, if the policy applies to more than 1 CPU, you
> >> can use any of them to manipulate it, even if one of them is offline as long
> >> as there are any online CPUs in the set.
> >
> > Right.
> >
> >> This isn't entirely consistent.  We should either fail store() for any offline
> >> CPU
> >
> > At that point we have no idea of the CPU for which the sysfs operation
> > is called. And so we have to go ahead without failing, if policy is
> > active.
> >
> >> or make the changes for offline CPUs too.
> >
> > What does that mean? Most of the stuff we do is for the policy, rather
> > than per-cpu. And if there is per-cpu stuff, then we *only* should be
> > doing that for the online ones.
> >
> > Not sure if I understood what you meant here. :(
> >
> >> And in the particular case of
> >> the governor, I'm wondering what will be the problem with changing last_governor
> >> for an inactive policy?
> >
> > I don't think we should be adding special cases for updating sysfs
> > attributes of an inactive policy. It's not just about the last_governor
> > thing, but other sysfs attributes as well.
> >
> 
> The way I see it, having the cpufreq policy control sysfs "bits" under 
> every CPU directory is what's causing some semantic confusion/inconsistency.
> 
> Every single node under a cpufreq folder is for policy control and not 
> CPU control. But by putting the policy control bits under the cpuX 
> directory, we give the wrong semantic impression that it's a per CPU 
> attribute when it's really per-policy.
> 
> Ideally (in terms of semantics) we would have put all the policy control 
> bits in a per policy directory under 
> /sys/devices/system/cpu/cpufreq/policyX/ where X would/could be tied
> to the first CPU in related CPUs -- so that it's easy to correlate and 
> also to avoid having the policy numbering being different depending on 
> the order in which CPUs get hotplugged.
> 
> But we can't go about breaking userspace ABI by removing the cpufreq 
> directories out of the cpu directories just because of the semantic 
> confusion.
> 
> Well, we COULD still put the policy directories under cpu/cpufreq/ and 
> then make every cpuX/cpufreq directory a symlink to the actual policy 
> directory. But that is not going to help with this specific 
> issue/discussion.

It isn't, but it'd be a good change in my view.

> Having said all that, I still think that stores to all these sysfs files 
> should work. I'm not saying it's a trivial change (like setting a 
> governor's polling time, etc would need some checks to cache the value 
> and not start a timer immediately, etc), but I think it's a more 
> consistent and user friendly API.

I agree.

It also is backwards compatible with scripts that walk the cpufreq directories
for all CPUs without checking the online attribute and expect things to work.

> If the user wants to set a min CPU freq, why should they care if the CPU 
> is online at that very instant? It gets especially painful if you have a 
> thermal daemon that's plugging in/out CPUs while the user or a script is 
> trying to set the parameters.

You mean putting them offline/online I suppose?

Thanks,
Rafael



* Re: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED]
  2015-09-11 21:30                         ` Rafael J. Wysocki
@ 2015-09-11 22:07                           ` Saravana Kannan
  2015-10-11  9:47                             ` Viresh Kumar
  0 siblings, 1 reply; 24+ messages in thread
From: Saravana Kannan @ 2015-09-11 22:07 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Viresh Kumar, Doug Smythies, 'Rafael J. Wysocki',
	linux-pm

On 09/11/2015 02:30 PM, Rafael J. Wysocki wrote:
> On Friday, September 11, 2015 01:43:51 PM Saravana Kannan wrote:
>> Sorry about the late reply and not helping out earlier. Didn't check
>> this email account for some time.
>>
>> On 09/07/2015 07:40 PM, Viresh Kumar wrote:
>>> On 07-09-15, 15:32, Rafael J. Wysocki wrote:
>>>> First, if policy->cpu is offline, the policy will be inactive to my eyes, so
>>>> we don't need the second check.
>>>
>>> Hmm, or maybe just drop the first check.
>>>
>>>> But if the policy is active (and policy->cpu is online), it will not generally
>>>> fail for an offline CPU.
>>>
>>> Right.
>>>
>>>> So, if the policy applies to more than 1 CPU, you
>>>> can use any of them to manipulate it, even if one of them is offline as long
>>>> as there are any online CPUs in the set.
>>>
>>> Right.
>>>
>>>> This isn't entirely consistent.  We should either fail store() for any offline
>>>> CPU
>>>
>>> At that point we have no idea of the CPU for which the sysfs operation
>>> is called. And so we have to go ahead without failing, if policy is
>>> active.
>>>
>>>> or make the changes for offline CPUs too.
>>>
>>> What does that mean? Most of the stuff we do is for the policy, rather
>>> than per-cpu. And if there is per-cpu stuff, then we *only* should be
>>> doing that for the online ones.
>>>
>>> Not sure if I understood what you meant here. :(
>>>
>>>> And in the particular case of
>>>> the governor, I'm wondering what will be the problem with changing last_governor
>>>> for an inactive policy?
>>>
>>> I don't think we should be adding special cases for updating sysfs
>>> attributes of an inactive policy. It's not just about the last_governor
>>> thing, but other sysfs attributes as well.
>>>
>>
>> The way I see it, having the cpufreq policy control sysfs "bits" under
>> every CPU directory is what's causing some semantic confusion/inconsistency.
>>
>> Every single node under a cpufreq folder is for policy control and not
>> CPU control. But by putting the policy control bits under the cpuX
>> directory, we give the wrong semantic impression that it's a per CPU
>> attribute when it's really per-policy.
>>
>> Ideally (in terms of semantics) we would have put all the policy control
>> bits in a per policy directory under
>> /sys/devices/system/cpu/cpufreq/policyX/ where X would/could be tied
>> to the first CPU in related CPUs -- so that it's easy to correlate and
>> also to avoid having the policy numbering being different depending on
>> the order in which CPUs get hotplugged.
>>
>> But we can't go about breaking userspace ABI by removing the cpufreq
>> directories out of the cpu directories just because of the semantic
>> confusion.
>>
>> Well, we COULD still put the policy directories under cpu/cpufreq/ and
>> then make every cpuX/cpufreq directory a symlink to the actual policy
>> directory. But that is not going to help with this specific
>> issue/discussion.
>
> It isn't, but it'd be a good change in my view.



>> Having said all that, I still think that stores to all these sysfs files
>> should work. I'm not saying it's a trivial change (like setting a
>> governor's polling time, etc would need some checks to cache the value
>> and not start a timer immediately, etc), but I think it's a more
>> consistent and user friendly API.
>
> I agree.
>
> It also is backwards compatible with scripts that walk the cpufreq directories
> for all CPUs without checking the online attribute and expect things to work.

Good to see some support. I do know Viresh doesn't like this :)

Thinking more about it, it'll also make the code simpler since we don't 
have to decide which CPU has the real files vs which ones have symlinks. 
We probably won't need policy->cpu or kobj_cpu anymore.

I'd love to do all these changes, but I doubt I'll find the time with 
the official job responsibilities I have. We'll see.
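
As a rough illustration of the layout proposed above (a sketch only, assuming
every cpuX/cpufreq entry is either a real directory or a symlink to a shared
policy directory), a script could visit each policy exactly once by resolving
the links. The helper name list_policies is hypothetical, and readlink -f is
assumed to be available, as it is with GNU coreutils and BusyBox.

```shell
# Hypothetical helper: print each distinct policy directory once.
# $1 is the CPU sysfs root; it defaults to the real sysfs path.
list_policies()
{
        root="${1:-/sys/devices/system/cpu}"
        for c in "$root"/cpu[0-9]*; do
                # Skip CPUs that have no cpufreq entry at all.
                [ -e "$c/cpufreq" ] || continue
                # Resolve the (assumed) symlink to the real policy directory.
                readlink -f "$c/cpufreq"
        done | sort -u
}
```

With such a layout a script would write to each policy once, instead of once
per CPU.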

>> If the user wants to set a min CPU freq, why should they care if the CPU
>> is online at that very instant? It gets especially painful if you have a
>> thermal daemon that's plugging in/out CPUs while the user or a script is
>> trying to set the parameters.
>
> You mean putting them offline/online I suppose?

Yup, I meant that a thermal daemon is putting them online/offline. I'm so
used to using the terms plugging in/out internally, since we don't have
to deal with physical removals on mobile devices (YET?!).

It'll be quite an achievement to see a daemon actually plugging a CPU 
in/out :)

Thanks,
Saravana
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


* Re: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED]
  2015-09-11 22:07                           ` Saravana Kannan
@ 2015-10-11  9:47                             ` Viresh Kumar
  2015-10-12 19:43                               ` Saravana Kannan
  0 siblings, 1 reply; 24+ messages in thread
From: Viresh Kumar @ 2015-10-11  9:47 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rafael J. Wysocki, Doug Smythies, 'Rafael J. Wysocki',
	linux-pm

On 11-09-15, 15:07, Saravana Kannan wrote:
> Good to see some support. I do know Viresh doesn't like this :)

Sorry for catching up late, but looks like you guys did convince me on
this.

Let me get some patches out :)

-- 
viresh


* Re: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED]
  2015-10-11  9:47                             ` Viresh Kumar
@ 2015-10-12 19:43                               ` Saravana Kannan
  2015-10-13  3:47                                 ` Viresh Kumar
  0 siblings, 1 reply; 24+ messages in thread
From: Saravana Kannan @ 2015-10-12 19:43 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, Doug Smythies, 'Rafael J. Wysocki',
	linux-pm

On 10/11/2015 02:47 AM, Viresh Kumar wrote:
> On 11-09-15, 15:07, Saravana Kannan wrote:
>> Good to see some support. I do know Viresh doesn't like this :)
>
> Sorry for catching up late, but looks like you guys did convince me on
> this.

Did I also convince you about allowing changing of parameters for 
offline CPUs? Which would also include inactive policies?

>
> Let me get some patches out :)
>

Thanks!

-Saravana

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


* Re: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED]
  2015-10-12 19:43                               ` Saravana Kannan
@ 2015-10-13  3:47                                 ` Viresh Kumar
  2015-10-13 19:23                                   ` Saravana Kannan
  0 siblings, 1 reply; 24+ messages in thread
From: Viresh Kumar @ 2015-10-13  3:47 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rafael J. Wysocki, Doug Smythies, 'Rafael J. Wysocki',
	linux-pm

On 12-10-15, 12:43, Saravana Kannan wrote:
> Did I also convince you about allowing changing of parameters for
> offline CPUs? Which would also include inactive policies?

Yes, but that requires some careful modification of the code, as there
are various code paths in the sysfs write path. And we need to make sure
that we don't do anything more than just updating the files. So, I left
it for now.

-- 
viresh


* Re: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED]
  2015-10-13  3:47                                 ` Viresh Kumar
@ 2015-10-13 19:23                                   ` Saravana Kannan
  0 siblings, 0 replies; 24+ messages in thread
From: Saravana Kannan @ 2015-10-13 19:23 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, Doug Smythies, 'Rafael J. Wysocki',
	linux-pm

On 10/12/2015 08:47 PM, Viresh Kumar wrote:
> On 12-10-15, 12:43, Saravana Kannan wrote:
>> Did I also convince you about allowing changing of parameters for
>> offline CPUs? Which would also include inactive policies?
>
> Yes, but that requires some careful modification of the code, as there
> are various code paths in the sysfs write path. And we need to make sure
> that we don't do anything more than just updating the files. So, I left
> it for now.
>

Agreed. It's non-trivial and we can do that separately.

-Saravana

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


* Re: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED]
  2015-09-05  7:46               ` Doug Smythies
  2015-09-05  8:14                 ` Viresh Kumar
@ 2015-09-07 13:07                 ` Rafael J. Wysocki
  2015-09-07 14:03                   ` Doug Smythies
  1 sibling, 1 reply; 24+ messages in thread
From: Rafael J. Wysocki @ 2015-09-07 13:07 UTC (permalink / raw)
  To: Doug Smythies
  Cc: 'Rafael J. Wysocki', 'Viresh Kumar',
	'Saravana Kannan', linux-pm

On Saturday, September 05, 2015 12:46:40 AM Doug Smythies wrote:
> On 2015.09.05 19:35 Doug Smythies wrote:
> > On 2015.09.04 17:23 Rafael J. Wysocki wrote:
> >> On Sat, Sep 5, 2015 at 1:05 AM, Doug Smythies <dsmythies@telus.net> wrote:
> >>> On 2015.09.04 15:26 Rafael J. Wysocki wrote:
> >>>> On Fri, Sep 4, 2015 at 8:41 PM, Doug Smythies <dsmythies@telus.net> wrote:
> >>>>> On 2015.09.04 07:43 Viresh Kumar wrote:
> >>>>>> On 04-09-15, 16:59, Rafael J. Wysocki wrote:
> >>>>>>> On Thursday, September 03, 2015 02:40:43 PM Doug Smythies wrote:
> >>>>>>>> As of, or about, Kernel 4.2RC1 if I take my highest numbered
> >>>>>>>> CPU offline (7 in my case), the system will not suspend.
> 
> >>>> Hmm.
> >>>> I suspect that your user space does something that fails during the pm-suspend.
> >>>
> >>> Are you saying that the patch might be O.K., but reveals
> >>> an issue with pm-suspend that was always there?
> >>
> >> Or it breaks something that pm-suspend does before suspending.
> >>
> >> It would be good to know what it is. :-)
> 
> > While researching pm-utils bugs, I found reference to
> > /var/log/pm-suspend.log, which I had not noticed before.
> > Relevant extract attached.
> 
> > It is not clear to me why that echo line (there is only one)
> > would fail.
> 
> The echo line fails because the related CPU is offline.
> If the failed echo is the last pass through the loop,
> then the script interprets the overall execution of
> 94cpufreq as a failure and aborts the suspend.

That's correct AFAICS.

> If the failed echo is not the last pass through the loop, then
> the bad exit code gets overwritten with a good one before
> the loop exits.

Right.

> Since the loop is merely setting a temporary governor,
> to test I just used performance mode anyway, and commented
> out the echo. pm-suspend with CPU 7 offline then worked fine.
> 
> I have not yet gone back to any before the patch kernel
> to determine why it used to work (it is late in my time zone).
> However, I would have to assume that before the commit in
> question, the echo worked even if the CPU was offline.

It didn't have to, because the cpufreq directory was not present,
so the

[ -L "$x/cpufreq" ] && continue

line would trigger.

> Could someone please confirm or deny the above conclusion.
> 
> The relevant code segment, with some added debug
> echo stuff, from /usr/lib/pm-utils/sleep.d/94cpufreq 
> 
> hibernate_cpufreq()
> {
>   ( cd /sys/devices/system/cpu/
>   for x in cpu[0-9]*; do
>      # if cpufreq is a symlink, it is handled by another cpu. Skip.
>      [ -L "$x/cpufreq" ] && continue
>      gov="$x/cpufreq/scaling_governor"
>      # if we do not have a scaling_governor file, skip.
>      [ -f "$gov" ] || continue
>      # if our temporary governor is not available, skip.
>      grep -q "$TEMPORARY_CPUFREQ_GOVERNOR" \
>      "$x/cpufreq/scaling_available_governors" || continue
>      savestate "${x}_governor" < "$gov"
> # I added the next 3 lines
>      echo "$x"
>      echo "$TEMPORARY_CPUFREQ_GOVERNOR"
>      echo "$gov"
> # For a test, do not do the echo, as I already set performance mode
> #    echo "$TEMPORARY_CPUFREQ_GOVERNOR" > "$gov"
>   done )
> }

Thanks,
Rafael



* RE: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED]
  2015-09-07 13:07                 ` Rafael J. Wysocki
@ 2015-09-07 14:03                   ` Doug Smythies
  2015-09-07 20:35                     ` Rafael J. Wysocki
  0 siblings, 1 reply; 24+ messages in thread
From: Doug Smythies @ 2015-09-07 14:03 UTC (permalink / raw)
  To: 'Rafael J. Wysocki', 'Viresh Kumar'
  Cc: 'Rafael J. Wysocki', 'Saravana Kannan', linux-pm

To wrap this up, I was thinking to file a bug report
on the pm-utils bug system and then to file a bug against
the distribution that I use (Ubuntu Server), linking to the
upstream bug report.

I don't have it working correctly yet, but I was hoping
to suggest a fix with the bug reports.

Something like (still has all my debug stuff also):

hibernate_cpufreq()
{
        ( cd /sys/devices/system/cpu/
        for x in cpu[0-9]*; do
                # if cpufreq is a symlink, it is handled by another cpu. Skip.
                [ -L "$x/cpufreq" ] && continue
                gov="$x/cpufreq/scaling_governor"
                # if we do not have a scaling_governor file, skip.
                [ -f "$gov" ] || continue
                echo "before $x online check"

+               # if the CPU is offline, skip, unless no file, i.e. CPU0.
+               [ $(cat "$x/online") = "1" -o ! -f "$x/online" ] || continue
Or
+               if [ $(cat "$x/online") = "1" ] || [ ! -f "$x/online" ]; then
+                       continue;
+                fi
Or something similar that actually works.

                echo "after $x online check"
                # if our temporary governor is not available, skip.
                grep -q "$TEMPORARY_CPUFREQ_GOVERNOR" \
                        "$x/cpufreq/scaling_available_governors" || continue
                savestate "${x}_governor" < "$gov"
                echo "$x"
                echo "$TEMPORARY_CPUFREQ_GOVERNOR"
                echo "$gov"
                echo "$TEMPORARY_CPUFREQ_GOVERNOR" > "$gov"
        done )
}

With the proposed fix not dependent on CPU0 at all, just the condition that if
the file exists, that the CPU be online, and if it doesn't exist then assume the
CPU is online. As you both pointed out, there is a previous check and skip for
the no governor or no CPU condition or older kernel conditions.
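
A minimal sketch of the online check described above, not a tested pm-utils
patch. The helper name cpu_is_online is hypothetical. It tests for the
"online" file before reading it, because an unquoted $(cat ...) on a missing
file expands to nothing and can leave [ with a malformed expression.

```shell
# Hypothetical helper: succeed if the CPU directory $1 should be
# treated as online.
cpu_is_online()
{
        # No "online" file (e.g. cpu0 on many systems): assume online.
        [ -f "$1/online" ] || return 0
        # Otherwise the file must contain "1".
        [ "$(cat "$1/online")" = "1" ]
}

# Intended use inside the loop in hibernate_cpufreq():
#     cpu_is_online "$x" || continue
```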




* Re: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED]
  2015-09-07 14:03                   ` Doug Smythies
@ 2015-09-07 20:35                     ` Rafael J. Wysocki
  0 siblings, 0 replies; 24+ messages in thread
From: Rafael J. Wysocki @ 2015-09-07 20:35 UTC (permalink / raw)
  To: Doug Smythies
  Cc: 'Viresh Kumar', 'Rafael J. Wysocki',
	'Saravana Kannan', linux-pm

On Monday, September 07, 2015 07:03:16 AM Doug Smythies wrote:
> To wrap this up, I was thinking to file a bug report
> on the pm-utils bug system and then to file a bug against
> the distribution that I use (Ubuntu Server), linking to the
> upstream bug report.
> 
> I don't have it working correctly yet, but I was hoping
> to suggest a fix with the bug reports.
> 
> Something like (still has all my debug stuff also):
> 
> hibernate_cpufreq()
> {
>         ( cd /sys/devices/system/cpu/
>         for x in cpu[0-9]*; do
>                 # if cpufreq is a symlink, it is handled by another cpu. Skip.
>                 [ -L "$x/cpufreq" ] && continue
>                 gov="$x/cpufreq/scaling_governor"
>                 # if we do not have a scaling_governor file, skip.
>                 [ -f "$gov" ] || continue
>                 echo "before $x online check"
> 
> +               # if the CPU is offline, skip, unless no file, i.e. CPU0.
> +               [ $(cat "$x/online") = "1" -o ! -f "$x/online" ] || continue
> Or
> +               if [ $(cat "$x/online") = "1" ] || [ ! -f "$x/online" ]; then
> +                       continue;
> +                fi
> Or something similar that actually works.
> 
>                 echo "after $x online check"
>                 # if our temporary governor is not available, skip.
>                 grep -q "$TEMPORARY_CPUFREQ_GOVERNOR" \
>                         "$x/cpufreq/scaling_available_governors" || continue
>                 savestate "${x}_governor" < "$gov"
>                 echo "$x"
>                 echo "$TEMPORARY_CPUFREQ_GOVERNOR"
>                 echo "$gov"
>                 echo "$TEMPORARY_CPUFREQ_GOVERNOR" > "$gov"
>         done )
> }
> 
> With the proposed fix not dependent on CPU0 at all, just the condition that if
> the file exists, that the CPU be online, and if it doesn't exist then assume the
> CPU is online. As you both pointed out, there is a previous check and skip for
> the no governor or no CPU condition or older kernel conditions.

I guess it also would work if you added

	return 0

at the end of hibernate_cpufreq().
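
To illustrate why the failure depends on which loop iteration fails, and what
the return 0 changes, here is a small self-contained sketch. The function
names and the true/false stand-ins are hypothetical; they take the place of
the echo into scaling_governor.

```shell
# The last iteration fails, so the subshell and hence the function
# return a non-zero status (the case with the highest CPU offline).
last_iteration_fails()
{
        ( for cmd in true true false; do
                "$cmd"
        done )
}

# An earlier iteration fails, but the final one succeeds and its
# status is what the function returns.
earlier_iteration_fails()
{
        ( for cmd in true false true; do
                "$cmd"
        done )
}

# Same failing loop, followed by an explicit "return 0".
with_return_zero()
{
        ( for cmd in true true false; do
                "$cmd"
        done )
        return 0
}
```

Note that an unconditional return 0 also hides any other failure inside the
function, so it trades diagnostics for robustness.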

Thanks,
Rafael



* RE: System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED]
  2015-09-04 14:59 ` Rafael J. Wysocki
  2015-09-04 14:42   ` Viresh Kumar
@ 2015-09-04 15:26   ` Doug Smythies
  1 sibling, 0 replies; 24+ messages in thread
From: Doug Smythies @ 2015-09-04 15:26 UTC (permalink / raw)
  To: 'Rafael J. Wysocki'
  Cc: 'Viresh Kumar', 'Rafael J. Wysocki',
	'Saravana Kannan', linux-pm

On 2015.09.04 08:00 Rafael J. Wysocki wrote:
> On Thursday, September 03, 2015 02:40:43 PM Doug Smythies wrote:
>> As of, or about, Kernel 4.2RC1 if I take my highest numbered
>> CPU offline (7 in my case), the system will not suspend.

> Does "will not suspend" mean that the suspend will fail with an error
> or will it crash or hang or something else?

Something else: Nothing at all happens, no error, no crash, no hang,
no log entries (at least that I have been able to find).
I use the "sudo pm-suspend" command, and it is as if it is a no-op.

doug@s15:/sys/devices/system/cpu$ cat /sys/devices/system/cpu/cpu*/online
1
1
1
1
1
1
0

Someone suggested to me that I have to take the core offline, not just
the 1 of 2 CPUs on the same core. However it makes no difference, and
CPU 3 offline by itself works fine.

>> The issue persists through Kernel 4.2.
>> This is on my test computer with an i7-2600K.
>> I do not normally use suspend on this computer,
>> but was doing so while working on a bug report.
>> 
>> The kernel was bisected, and the result was:
>> 
>> $ git bisect bad
>> 87549141d516aee71d511138e27117c41e8aef68 is the first bad commit
>> commit 87549141d516aee71d511138e27117c41e8aef68
>> Author: Viresh Kumar <viresh.kumar@linaro.org>
>> Date:   Wed Jun 10 02:13:21 2015 +0200
>> 
>> cpufreq: Stop migrating sysfs files on hotplug
>> 
>> See also several e-mails with the above subject line
>> between June 8th and 10th.
>> 
>> With any other combination of taking CPUs offline,
>> not including CPU 7, suspend seems to work properly.

> Well, we'll need to debug it some.
> Can you please check what's in the
> /sys/devices/system/cpu/cpuX/cpufreq/related_cpus
> files for all CPUs on your system?

doug@s15:/sys/devices/system/cpu$ cat cpu?/cpufreq/related_cpus
0
1
2
3
4
5
6
7




end of thread, other threads:[~2015-10-13 19:23 UTC | newest]

Thread overview: 24+ messages
2015-09-03 21:40 System will not suspend with highest numbered CPU offline [REGRESSION][BISECTED] Doug Smythies
2015-09-04 14:59 ` Rafael J. Wysocki
2015-09-04 14:42   ` Viresh Kumar
2015-09-04 18:41     ` Doug Smythies
2015-09-04 22:26       ` Rafael J. Wysocki
2015-09-04 23:05         ` Doug Smythies
2015-09-05  0:22           ` Rafael J. Wysocki
2015-09-05  1:41             ` Rafael J. Wysocki
2015-09-05  2:34             ` Doug Smythies
2015-09-05  7:46               ` Doug Smythies
2015-09-05  8:14                 ` Viresh Kumar
2015-09-07 13:32                   ` Rafael J. Wysocki
2015-09-08  2:40                     ` Viresh Kumar
2015-09-11 20:43                       ` Saravana Kannan
2015-09-11 21:30                         ` Rafael J. Wysocki
2015-09-11 22:07                           ` Saravana Kannan
2015-10-11  9:47                             ` Viresh Kumar
2015-10-12 19:43                               ` Saravana Kannan
2015-10-13  3:47                                 ` Viresh Kumar
2015-10-13 19:23                                   ` Saravana Kannan
2015-09-07 13:07                 ` Rafael J. Wysocki
2015-09-07 14:03                   ` Doug Smythies
2015-09-07 20:35                     ` Rafael J. Wysocki
2015-09-04 15:26   ` Doug Smythies
