linux-kernel.vger.kernel.org archive mirror
* Regression: turbostat stops working after suspend/resume cycle
@ 2015-05-17 15:48 Gabriele Mazzotta
  2015-05-18  6:45 ` Ingo Molnar
  2015-05-18  6:48 ` Regression: turbostat stops working after suspend/resume cycle Peter Zijlstra
  0 siblings, 2 replies; 5+ messages in thread
From: Gabriele Mazzotta @ 2015-05-17 15:48 UTC
  To: juri.lelli, mingo, peterz; +Cc: linux-kernel, len.brown, andrey.semin

Hi,

I've recently noticed that if I suspend and resume my laptop, I can no
longer execute turbostat. This is what I get when I try to start it:
# turbostat
Could not migrate to CPU 1
turbostat: re-initialized with num_cpus 4
Could not migrate to CPU 1

Since everything works as expected with v4.0, I ran a bisection and
found that commit 3c18d447b3b36a8d ("sched/core: Check for available
DL bandwidth in cpuset_cpu_inactive()") is the cause of the regression.

I don't know if there's something else affected by that change, but
I can consistently reproduce the bug with turbostat.

I can provide more info if needed.

Regards,
Gabriele

* Re: Regression: turbostat stops working after suspend/resume cycle
  2015-05-17 15:48 Regression: turbostat stops working after suspend/resume cycle Gabriele Mazzotta
@ 2015-05-18  6:45 ` Ingo Molnar
  2015-05-18  7:12   ` Gabriele Mazzotta
  2015-05-18  6:48 ` Regression: turbostat stops working after suspend/resume cycle Peter Zijlstra
  1 sibling, 1 reply; 5+ messages in thread
From: Ingo Molnar @ 2015-05-18  6:45 UTC
  To: Gabriele Mazzotta
  Cc: juri.lelli, mingo, peterz, linux-kernel, len.brown, andrey.semin


* Gabriele Mazzotta <gabriele.mzt@gmail.com> wrote:

> Hi,
> 
> I've recently noticed that if I suspend and resume my laptop, I can no
> longer execute turbostat. This is what I get when I try to start it:
> # turbostat
> Could not migrate to CPU 1
> turbostat: re-initialized with num_cpus 4
> Could not migrate to CPU 1
> 
> Since everything works as expected with v4.0, I ran a bisection and
> found that commit 3c18d447b3b36a8d ("sched/core: Check for available
> DL bandwidth in cpuset_cpu_inactive()") is the cause of the regression.
> 
> I don't know if there's something else affected by that change, but
> I can consistently reproduce the bug with turbostat.
> 
> I can provide more info if needed.

Does this commit:

  533445c6e533 sched/core: Fix regression in cpuset_cpu_inactive() for suspend

which is already in Linus's tree, and which should be part of -rc4, 
fix it? Also attached below.

Thanks,

	Ingo

====================>
From 533445c6e53368569e50ab3fb712230c03d523f3 Mon Sep 17 00:00:00 2001
From: Omar Sandoval <osandov@osandov.com>
Date: Mon, 4 May 2015 03:09:36 -0700
Subject: [PATCH] sched/core: Fix regression in cpuset_cpu_inactive() for suspend

Commit 3c18d447b3b3 ("sched/core: Check for available DL bandwidth in
cpuset_cpu_inactive()"), a SCHED_DEADLINE bugfix, had a logic error that
caused a regression in setting a CPU inactive during suspend. I ran into
this when a program was failing pthread_setaffinity_np() with EINVAL after
a suspend+wake up.

A simple reproducer:

	$ ./a.out
	sched_setaffinity: Success
	$ systemctl suspend
	$ ./a.out
	sched_setaffinity: Invalid argument

... where ./a.out is:

	#define _GNU_SOURCE
	#include <errno.h>
	#include <sched.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>
	#include <unistd.h>

	int main(void)
	{
		long num_cores;
		cpu_set_t cpu_set;
		int ret;

		num_cores = sysconf(_SC_NPROCESSORS_ONLN);
		CPU_ZERO(&cpu_set);
		CPU_SET(num_cores - 1, &cpu_set);
		errno = 0;
		ret = sched_setaffinity(getpid(), sizeof(cpu_set), &cpu_set);
		perror("sched_setaffinity");
		return ret ? EXIT_FAILURE : EXIT_SUCCESS;
	}

The mistake is that suspend is handled in the action ==
CPU_DOWN_PREPARE_FROZEN case of the switch statement in
cpuset_cpu_inactive().

However, the commit in question masked out CPU_TASKS_FROZEN
from the action, making this case dead.

The fix is straightforward.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 3c18d447b3b3 ("sched/core: Check for available DL bandwidth in cpuset_cpu_inactive()")
Link: http://lkml.kernel.org/r/1cb5ecb3d6543c38cce5790387f336f54ec8e2bc.1430733960.git.osandov@osandov.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c | 28 ++++++++++++----------------
 1 file changed, 12 insertions(+), 16 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 34db9bf892a3..57bd333bc4ab 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6999,27 +6999,23 @@ static int cpuset_cpu_inactive(struct notifier_block *nfb, unsigned long action,
 	unsigned long flags;
 	long cpu = (long)hcpu;
 	struct dl_bw *dl_b;
+	bool overflow;
+	int cpus;
 
-	switch (action & ~CPU_TASKS_FROZEN) {
+	switch (action) {
 	case CPU_DOWN_PREPARE:
-		/* explicitly allow suspend */
-		if (!(action & CPU_TASKS_FROZEN)) {
-			bool overflow;
-			int cpus;
-
-			rcu_read_lock_sched();
-			dl_b = dl_bw_of(cpu);
+		rcu_read_lock_sched();
+		dl_b = dl_bw_of(cpu);
 
-			raw_spin_lock_irqsave(&dl_b->lock, flags);
-			cpus = dl_bw_cpus(cpu);
-			overflow = __dl_overflow(dl_b, cpus, 0, 0);
-			raw_spin_unlock_irqrestore(&dl_b->lock, flags);
+		raw_spin_lock_irqsave(&dl_b->lock, flags);
+		cpus = dl_bw_cpus(cpu);
+		overflow = __dl_overflow(dl_b, cpus, 0, 0);
+		raw_spin_unlock_irqrestore(&dl_b->lock, flags);
 
-			rcu_read_unlock_sched();
+		rcu_read_unlock_sched();
 
-			if (overflow)
-				return notifier_from_errno(-EBUSY);
-		}
+		if (overflow)
+			return notifier_from_errno(-EBUSY);
 		cpuset_update_active_cpus(false);
 		break;
 	case CPU_DOWN_PREPARE_FROZEN:


* Re: Regression: turbostat stops working after suspend/resume cycle
  2015-05-17 15:48 Regression: turbostat stops working after suspend/resume cycle Gabriele Mazzotta
  2015-05-18  6:45 ` Ingo Molnar
@ 2015-05-18  6:48 ` Peter Zijlstra
  2015-05-18  7:19   ` Gabriele Mazzotta
  1 sibling, 1 reply; 5+ messages in thread
From: Peter Zijlstra @ 2015-05-18  6:48 UTC
  To: Gabriele Mazzotta
  Cc: juri.lelli, mingo, linux-kernel, len.brown, andrey.semin

On Sun, May 17, 2015 at 05:48:44PM +0200, Gabriele Mazzotta wrote:
> Hi,
> 
> I've recently noticed that if I suspend and resume my laptop, I can no
> longer execute turbostat. This is what I get when I try to start it:
> # turbostat
> Could not migrate to CPU 1
> turbostat: re-initialized with num_cpus 4
> Could not migrate to CPU 1
> 
> Since everything works as expected with v4.0, I ran a bisection and
> found that commit 3c18d447b3b36a8d ("sched/core: Check for available
> DL bandwidth in cpuset_cpu_inactive()") is the cause of the regression.
> 
> I don't know if there's something else affected by that change, but
> I can consistently reproduce the bug with turbostat.


This should be fixed by the below commit which is already in Linus'
tree.

---
commit 533445c6e53368569e50ab3fb712230c03d523f3
Author: Omar Sandoval <osandov@osandov.com>
Date:   Mon May 4 03:09:36 2015 -0700

    sched/core: Fix regression in cpuset_cpu_inactive() for suspend
    
    Commit 3c18d447b3b3 ("sched/core: Check for available DL bandwidth in
    cpuset_cpu_inactive()"), a SCHED_DEADLINE bugfix, had a logic error that
    caused a regression in setting a CPU inactive during suspend. I ran into
    this when a program was failing pthread_setaffinity_np() with EINVAL after
    a suspend+wake up.
    
    A simple reproducer:
    
    	$ ./a.out
    	sched_setaffinity: Success
    	$ systemctl suspend
    	$ ./a.out
    	sched_setaffinity: Invalid argument
    
    ... where ./a.out is:
    
    	#define _GNU_SOURCE
    	#include <errno.h>
    	#include <sched.h>
    	#include <stdio.h>
    	#include <stdlib.h>
    	#include <string.h>
    	#include <unistd.h>
    
    	int main(void)
    	{
    		long num_cores;
    		cpu_set_t cpu_set;
    		int ret;
    
    		num_cores = sysconf(_SC_NPROCESSORS_ONLN);
    		CPU_ZERO(&cpu_set);
    		CPU_SET(num_cores - 1, &cpu_set);
    		errno = 0;
    		ret = sched_setaffinity(getpid(), sizeof(cpu_set), &cpu_set);
    		perror("sched_setaffinity");
    		return ret ? EXIT_FAILURE : EXIT_SUCCESS;
    	}
    
    The mistake is that suspend is handled in the action ==
    CPU_DOWN_PREPARE_FROZEN case of the switch statement in
    cpuset_cpu_inactive().
    
    However, the commit in question masked out CPU_TASKS_FROZEN
    from the action, making this case dead.
    
    The fix is straightforward.
    
    Signed-off-by: Omar Sandoval <osandov@osandov.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: H. Peter Anvin <hpa@zytor.com>
    Cc: Juri Lelli <juri.lelli@arm.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Fixes: 3c18d447b3b3 ("sched/core: Check for available DL bandwidth in cpuset_cpu_inactive()")
    Link: http://lkml.kernel.org/r/1cb5ecb3d6543c38cce5790387f336f54ec8e2bc.1430733960.git.osandov@osandov.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 34db9bf892a3..57bd333bc4ab 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6999,27 +6999,23 @@ static int cpuset_cpu_inactive(struct notifier_block *nfb, unsigned long action,
 	unsigned long flags;
 	long cpu = (long)hcpu;
 	struct dl_bw *dl_b;
+	bool overflow;
+	int cpus;
 
-	switch (action & ~CPU_TASKS_FROZEN) {
+	switch (action) {
 	case CPU_DOWN_PREPARE:
-		/* explicitly allow suspend */
-		if (!(action & CPU_TASKS_FROZEN)) {
-			bool overflow;
-			int cpus;
-
-			rcu_read_lock_sched();
-			dl_b = dl_bw_of(cpu);
+		rcu_read_lock_sched();
+		dl_b = dl_bw_of(cpu);
 
-			raw_spin_lock_irqsave(&dl_b->lock, flags);
-			cpus = dl_bw_cpus(cpu);
-			overflow = __dl_overflow(dl_b, cpus, 0, 0);
-			raw_spin_unlock_irqrestore(&dl_b->lock, flags);
+		raw_spin_lock_irqsave(&dl_b->lock, flags);
+		cpus = dl_bw_cpus(cpu);
+		overflow = __dl_overflow(dl_b, cpus, 0, 0);
+		raw_spin_unlock_irqrestore(&dl_b->lock, flags);
 
-			rcu_read_unlock_sched();
+		rcu_read_unlock_sched();
 
-			if (overflow)
-				return notifier_from_errno(-EBUSY);
-		}
+		if (overflow)
+			return notifier_from_errno(-EBUSY);
 		cpuset_update_active_cpus(false);
 		break;
 	case CPU_DOWN_PREPARE_FROZEN:

* Re: Regression: turbostat stops working after suspend/resume cycle
  2015-05-18  6:45 ` Ingo Molnar
@ 2015-05-18  7:12   ` Gabriele Mazzotta
  0 siblings, 0 replies; 5+ messages in thread
From: Gabriele Mazzotta @ 2015-05-18  7:12 UTC
  To: Ingo Molnar
  Cc: juri.lelli, mingo, peterz, linux-kernel, len.brown, andrey.semin

On Monday 18 May 2015 08:45:52 Ingo Molnar wrote:
> 
> * Gabriele Mazzotta <gabriele.mzt@gmail.com> wrote:
> 
> > Hi,
> > 
> > I've recently noticed that if I suspend and resume my laptop, I can no
> > longer execute turbostat. This is what I get when I try to start it:
> > # turbostat
> > Could not migrate to CPU 1
> > turbostat: re-initialized with num_cpus 4
> > Could not migrate to CPU 1
> > 
> > Since everything works as expected with v4.0, I ran a bisection and
> > found that commit 3c18d447b3b36a8d ("sched/core: Check for available
> > DL bandwidth in cpuset_cpu_inactive()") is the cause of the regression.
> > 
> > I don't know if there's something else affected by that change, but
> > I can consistently reproduce the bug with turbostat.
> > 
> > I can provide more info if needed.
> 
> Does this commit:
> 
>   533445c6e533 sched/core: Fix regression in cpuset_cpu_inactive() for suspend
> 
> which is already in Linus's tree, and which should be part of -rc4, 
> fix it? Also attached below.

Yes, this fixes the problem, thanks. Sorry for not noticing it.

Gabriele

* Re: Regression: turbostat stops working after suspend/resume cycle
  2015-05-18  6:48 ` Regression: turbostat stops working after suspend/resume cycle Peter Zijlstra
@ 2015-05-18  7:19   ` Gabriele Mazzotta
  0 siblings, 0 replies; 5+ messages in thread
From: Gabriele Mazzotta @ 2015-05-18  7:19 UTC
  To: Peter Zijlstra; +Cc: juri.lelli, mingo, linux-kernel, len.brown, andrey.semin

On Monday 18 May 2015 08:48:04 Peter Zijlstra wrote:
> On Sun, May 17, 2015 at 05:48:44PM +0200, Gabriele Mazzotta wrote:
> > Hi,
> > 
> > I've recently noticed that if I suspend and resume my laptop, I can no
> > longer execute turbostat. This is what I get when I try to start it:
> > # turbostat
> > Could not migrate to CPU 1
> > turbostat: re-initialized with num_cpus 4
> > Could not migrate to CPU 1
> > 
> > Since everything works as expected with v4.0, I ran a bisection and
> > found that commit 3c18d447b3b36a8d ("sched/core: Check for available
> > DL bandwidth in cpuset_cpu_inactive()") is the cause of the regression.
> > 
> > I don't know if there's something else affected by that change, but
> > I can consistently reproduce the bug with turbostat.
> 
> 
> This should be fixed by the below commit which is already in Linus'
> tree.

Thank you for the quick reply. As I replied to Ingo's mail, which
arrived just a bit earlier than yours, yes, the commit here below fixes
the problem.

Thanks,
Gabriele

> ---
> commit 533445c6e53368569e50ab3fb712230c03d523f3
> Author: Omar Sandoval <osandov@osandov.com>
> Date:   Mon May 4 03:09:36 2015 -0700
> 
>     sched/core: Fix regression in cpuset_cpu_inactive() for suspend
>     
>     Commit 3c18d447b3b3 ("sched/core: Check for available DL bandwidth in
>     cpuset_cpu_inactive()"), a SCHED_DEADLINE bugfix, had a logic error that
>     caused a regression in setting a CPU inactive during suspend. I ran into
>     this when a program was failing pthread_setaffinity_np() with EINVAL after
>     a suspend+wake up.
>     
>     A simple reproducer:
>     
>     	$ ./a.out
>     	sched_setaffinity: Success
>     	$ systemctl suspend
>     	$ ./a.out
>     	sched_setaffinity: Invalid argument
>     
>     ... where ./a.out is:
>     
>     	#define _GNU_SOURCE
>     	#include <errno.h>
>     	#include <sched.h>
>     	#include <stdio.h>
>     	#include <stdlib.h>
>     	#include <string.h>
>     	#include <unistd.h>
>     
>     	int main(void)
>     	{
>     		long num_cores;
>     		cpu_set_t cpu_set;
>     		int ret;
>     
>     		num_cores = sysconf(_SC_NPROCESSORS_ONLN);
>     		CPU_ZERO(&cpu_set);
>     		CPU_SET(num_cores - 1, &cpu_set);
>     		errno = 0;
>     		ret = sched_setaffinity(getpid(), sizeof(cpu_set), &cpu_set);
>     		perror("sched_setaffinity");
>     		return ret ? EXIT_FAILURE : EXIT_SUCCESS;
>     	}
>     
>     The mistake is that suspend is handled in the action ==
>     CPU_DOWN_PREPARE_FROZEN case of the switch statement in
>     cpuset_cpu_inactive().
>     
>     However, the commit in question masked out CPU_TASKS_FROZEN
>     from the action, making this case dead.
>     
>     The fix is straightforward.
>     
>     Signed-off-by: Omar Sandoval <osandov@osandov.com>
>     Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>     Cc: Borislav Petkov <bp@alien8.de>
>     Cc: H. Peter Anvin <hpa@zytor.com>
>     Cc: Juri Lelli <juri.lelli@arm.com>
>     Cc: Thomas Gleixner <tglx@linutronix.de>
>     Fixes: 3c18d447b3b3 ("sched/core: Check for available DL bandwidth in cpuset_cpu_inactive()")
>     Link: http://lkml.kernel.org/r/1cb5ecb3d6543c38cce5790387f336f54ec8e2bc.1430733960.git.osandov@osandov.com
>     Signed-off-by: Ingo Molnar <mingo@kernel.org>
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 34db9bf892a3..57bd333bc4ab 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6999,27 +6999,23 @@ static int cpuset_cpu_inactive(struct notifier_block *nfb, unsigned long action,
>  	unsigned long flags;
>  	long cpu = (long)hcpu;
>  	struct dl_bw *dl_b;
> +	bool overflow;
> +	int cpus;
>  
> -	switch (action & ~CPU_TASKS_FROZEN) {
> +	switch (action) {
>  	case CPU_DOWN_PREPARE:
> -		/* explicitly allow suspend */
> -		if (!(action & CPU_TASKS_FROZEN)) {
> -			bool overflow;
> -			int cpus;
> -
> -			rcu_read_lock_sched();
> -			dl_b = dl_bw_of(cpu);
> +		rcu_read_lock_sched();
> +		dl_b = dl_bw_of(cpu);
>  
> -			raw_spin_lock_irqsave(&dl_b->lock, flags);
> -			cpus = dl_bw_cpus(cpu);
> -			overflow = __dl_overflow(dl_b, cpus, 0, 0);
> -			raw_spin_unlock_irqrestore(&dl_b->lock, flags);
> +		raw_spin_lock_irqsave(&dl_b->lock, flags);
> +		cpus = dl_bw_cpus(cpu);
> +		overflow = __dl_overflow(dl_b, cpus, 0, 0);
> +		raw_spin_unlock_irqrestore(&dl_b->lock, flags);
>  
> -			rcu_read_unlock_sched();
> +		rcu_read_unlock_sched();
>  
> -			if (overflow)
> -				return notifier_from_errno(-EBUSY);
> -		}
> +		if (overflow)
> +			return notifier_from_errno(-EBUSY);
>  		cpuset_update_active_cpus(false);
>  		break;
>  	case CPU_DOWN_PREPARE_FROZEN:

