public inbox for linux-kernel@vger.kernel.org
* [patch] sched/cputime: Fix NO_HZ_FULL getrusage() monotonicity regression
@ 2016-08-10 11:14 Mike Galbraith
  2016-08-10 12:30 ` Peter Zijlstra
  0 siblings, 1 reply; 7+ messages in thread
From: Mike Galbraith @ 2016-08-10 11:14 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML

Hi Peter,

While running ltp, the fates decided it was time for me to encounter
the roughly 1 out of every 10 call failure below.  As much as I run
ltp, I'm a bit surprised that I (or anyone else) haven't met this
before, but then the fates are known to be a tad fickle.

getrusage04    0  TINFO  :  Expected timers granularity is 4000 us
getrusage04    0  TINFO  :  Using 1 as multiply factor for max [us]time increment (1000+4000us)!
getrusage04    0  TINFO  :  utime:           0us; stime:         179us
getrusage04    0  TINFO  :  utime:        3751us; stime:           0us
getrusage04    1  TFAIL  :  getrusage04.c:133: stime increased > 5000us:
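
The "stime increased" wording is what a backwards step looks like when the delta is computed in unsigned arithmetic. Below is a minimal sketch of such a check. The helper names and the unsigned type are assumptions made for illustration and are not copied from getrusage04.c; the 5000us limit is the 1000+4000us bound printed in the log.

```c
#include <stdbool.h>
#include <sys/time.h>

/* Convert a struct timeval to microseconds. */
static unsigned long tv_to_us(struct timeval tv)
{
	return (unsigned long)tv.tv_sec * 1000000UL + (unsigned long)tv.tv_usec;
}

/*
 * Hypothetical helper: report whether the step from 'last' to 'now'
 * exceeds 'max_us'.  The subtraction is unsigned, so a value that
 * went backwards wraps around to a huge delta and is reported as an
 * oversized increase.
 */
static bool step_too_large(unsigned long last, unsigned long now,
			   unsigned long max_us)
{
	unsigned long delta = now - last;

	return delta > max_us;
}
```

With the values from the log, step_too_large(179, 0, 5000) is true although stime shrank from 179us to 0us, while the utime step step_too_large(0, 3751, 5000) is false.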

When applying the full rtime to either stime or utime, do not overwrite
the previously tallied value.

Fixes: 9d7fb0427648 ("sched/cputime: Guarantee stime + utime == rtime")
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: stable@vger.kernel.org # 4.3+
---
 kernel/sched/cputime.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -608,11 +608,13 @@ static void cputime_adjust(struct task_c
 
 	if (utime == 0) {
 		stime = rtime;
+		utime = prev->utime;
 		goto update;
 	}
 
 	if (stime == 0) {
 		utime = rtime;
+		stime = prev->stime;
 		goto update;
 	}
 


* Re: [patch] sched/cputime: Fix NO_HZ_FULL getrusage() monotonicity regression
  2016-08-10 11:14 [patch] sched/cputime: Fix NO_HZ_FULL getrusage() monotonicity regression Mike Galbraith
@ 2016-08-10 12:30 ` Peter Zijlstra
  2016-08-10 12:47   ` Peter Zijlstra
  2016-08-10 18:57   ` Mike Galbraith
  0 siblings, 2 replies; 7+ messages in thread
From: Peter Zijlstra @ 2016-08-10 12:30 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: LKML

On Wed, Aug 10, 2016 at 01:14:29PM +0200, Mike Galbraith wrote:
> Hi Peter,
> 
> While running ltp, the fates decided it was time for me to encounter
> the roughly 1 out of every 10 call failure below.  As much as I run
> ltp, I'm a bit surprised that I (or anyone else) haven't met this
> before, but then the fates are known to be a tad fickle.
> 
> getrusage04    0  TINFO  :  Expected timers granularity is 4000 us
> getrusage04    0  TINFO  :  Using 1 as multiply factor for max [us]time increment (1000+4000us)!
> getrusage04    0  TINFO  :  utime:           0us; stime:         179us
> getrusage04    0  TINFO  :  utime:        3751us; stime:           0us
> getrusage04    1  TFAIL  :  getrusage04.c:133: stime increased > 5000us:
> 
> When applying the full rtime to either stime or utime, do not overwrite
> the previously tallied value.
> 
> Fixes: 9d7fb0427648 ("sched/cputime: Guarantee stime + utime == rtime")
> Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
> Cc: stable@vger.kernel.org # 4.3+
> ---
>  kernel/sched/cputime.c |    2 ++
>  1 file changed, 2 insertions(+)
> 
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c
> @@ -608,11 +608,13 @@ static void cputime_adjust(struct task_c
>  
>  	if (utime == 0) {
>  		stime = rtime;
> +		utime = prev->utime;
>  		goto update;
>  	}
>  
>  	if (stime == 0) {
>  		utime = rtime;
> +		stime = prev->stime;
>  		goto update;
>  	}

This cannot be right; it violates that utime+stime==rtime. Let me try
and figure out what actually happens.
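
The objection can be checked with the numbers from the log. This is a user-space sketch of only the "stime == 0" branch as modified by the patch above; the struct and function names are hypothetical, and values are plain microseconds rather than cputime_t.

```c
#include <stdint.h>

struct cputime_sample {
	uint64_t utime;
	uint64_t stime;
};

/*
 * Model of the patched "stime == 0" branch: all of rtime is given to
 * utime, and the previously reported stime is carried over unchanged.
 */
static struct cputime_sample first_patch_stime_zero(uint64_t rtime,
						    struct cputime_sample prev)
{
	struct cputime_sample out;

	out.utime = rtime;
	out.stime = prev.stime;
	return out;
}
```

Starting from a previous report of utime 0us / stime 179us and an rtime of 3751us, the result is utime 3751us / stime 179us. The sum is 3930us, which is 179us more than rtime.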


* Re: [patch] sched/cputime: Fix NO_HZ_FULL getrusage() monotonicity regression
  2016-08-10 12:30 ` Peter Zijlstra
@ 2016-08-10 12:47   ` Peter Zijlstra
  2016-08-10 14:21     ` Mike Galbraith
  2016-08-10 18:57   ` Mike Galbraith
  1 sibling, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2016-08-10 12:47 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: LKML

On Wed, Aug 10, 2016 at 02:30:33PM +0200, Peter Zijlstra wrote:
> This cannot be right; it violates that utime+stime==rtime. Let me try
> and figure out what actually happens.

Any idea where your [us]time are coming from? Do you end up in the
vtime_accounting_enabled() path or not?


* Re: [patch] sched/cputime: Fix NO_HZ_FULL getrusage() monotonicity regression
  2016-08-10 12:47   ` Peter Zijlstra
@ 2016-08-10 14:21     ` Mike Galbraith
  0 siblings, 0 replies; 7+ messages in thread
From: Mike Galbraith @ 2016-08-10 14:21 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML

On Wed, 2016-08-10 at 14:47 +0200, Peter Zijlstra wrote:

> Any idea where your [us]time are coming from? Do you end up in the
> vtime_accounting_enabled() path or not?

No, I'm not booting with nohz_full=.

	-Mike


* Re: [patch] sched/cputime: Fix NO_HZ_FULL getrusage() monotonicity regression
  2016-08-10 12:30 ` Peter Zijlstra
  2016-08-10 12:47   ` Peter Zijlstra
@ 2016-08-10 18:57   ` Mike Galbraith
  2016-08-15  8:51     ` Peter Zijlstra
  1 sibling, 1 reply; 7+ messages in thread
From: Mike Galbraith @ 2016-08-10 18:57 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML

On Wed, 2016-08-10 at 14:30 +0200, Peter Zijlstra wrote:
> On Wed, Aug 10, 2016 at 01:14:29PM +0200, Mike Galbraith wrote:

> > --- a/kernel/sched/cputime.c
> > +++ b/kernel/sched/cputime.c
> > @@ -608,11 +608,13 @@ static void cputime_adjust(struct task_c
> >  
> >  	if (utime == 0) {
> >  		stime = rtime;
> > +		utime = prev->utime;
> >  		goto update;
> >  	}
> >  
> >  	if (stime == 0) {
> >  		utime = rtime;
> > +		stime = prev->stime;
> >  		goto update;
> >  	}
> 
> This cannot be right; it violates that utime+stime==rtime. Let me try
> and figure out what actually happens.

How about this instead.

sched/cputime: Fix NO_HZ_FULL getrusage() monotonicity regression

Roughly 10% of the time, ltp testcase getrusage04 fails:
getrusage04    0  TINFO  :  Expected timers granularity is 4000 us
getrusage04    0  TINFO  :  Using 1 as multiply factor for max [us]time increment (1000+4000us)!
getrusage04    0  TINFO  :  utime:           0us; stime:         179us
getrusage04    0  TINFO  :  utime:        3751us; stime:           0us
getrusage04    1  TFAIL  :  getrusage04.c:133: stime increased > 5000us:

If ->sum_exec_runtime has moved beyond the rtime of ->prev_cputime, but
no time has as yet been accounted to the task, bail.

Fixes: 9d7fb0427648 ("sched/cputime: Guarantee stime + utime == rtime")
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: stable@vger.kernel.org # 4.3+
---
 kernel/sched/cputime.c |    7 +++++++
 1 file changed, 7 insertions(+)

--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -606,6 +606,13 @@ static void cputime_adjust(struct task_c
 	stime = curr->stime;
 	utime = curr->utime;
 
+	/*
+	 * sum_exec_runtime has moved, but nothing has yet been
+	 * accounted to the task, there's nothing to update.
+	 */
+	if (utime + stime == 0)
+		goto out;
+
 	if (utime == 0) {
 		stime = rtime;
 		goto update;


* Re: [patch] sched/cputime: Fix NO_HZ_FULL getrusage() monotonicity regression
  2016-08-10 18:57   ` Mike Galbraith
@ 2016-08-15  8:51     ` Peter Zijlstra
  2016-08-15 12:29       ` Mike Galbraith
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2016-08-15  8:51 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: LKML

On Wed, Aug 10, 2016 at 08:57:28PM +0200, Mike Galbraith wrote:
> sched/cputime: Fix NO_HZ_FULL getrusage() monotonicity regression
> 
> Roughly 10% of the time, ltp testcase getrusage04 fails:
> getrusage04    0  TINFO  :  Expected timers granularity is 4000 us
> getrusage04    0  TINFO  :  Using 1 as multiply factor for max [us]time increment (1000+4000us)!
> getrusage04    0  TINFO  :  utime:           0us; stime:         179us
> getrusage04    0  TINFO  :  utime:        3751us; stime:           0us
> getrusage04    1  TFAIL  :  getrusage04.c:133: stime increased > 5000us:
> 
> If ->sum_exec_runtime has moved beyond the rtime of ->prev_cputime, but
> no time has as yet been accounted to the task, bail.
> 
> Fixes: 9d7fb0427648 ("sched/cputime: Guarantee stime + utime == rtime")
> Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
> Cc: stable@vger.kernel.org # 4.3+
> ---
>  kernel/sched/cputime.c |    7 +++++++
>  1 file changed, 7 insertions(+)
> 
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c
> @@ -606,6 +606,13 @@ static void cputime_adjust(struct task_c
>  	stime = curr->stime;
>  	utime = curr->utime;
>  
> +	/*
> +	 * sum_exec_runtime has moved, but nothing has yet been
> +	 * accounted to the task, there's nothing to update.
> +	 */
> +	if (utime + stime == 0)
> +		goto out;

urgh...

Valid scenario.. not sure about the solution though. This would mean the
task has _no_ running time if it forever dodges the tick, which would be
bad.
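
To make the concern concrete, here is a small user-space sketch of the bail-out. The names are hypothetical, and everything after the early return is a simplified stand-in (a plain proportional split with no clamping), not the kernel code.

```c
#include <stdint.h>

struct reported {
	uint64_t utime;
	uint64_t stime;
};

/*
 * Model of the bail-out: if no tick has been charged to the task,
 * leave the previously reported values untouched, whatever rtime says.
 */
static void adjust_with_bailout(uint64_t tick_utime, uint64_t tick_stime,
				uint64_t rtime, struct reported *prev)
{
	if (tick_utime + tick_stime == 0)
		return;		/* the proposed 'goto out' */

	/* Simplified remainder: proportional split, no clamping. */
	prev->stime = tick_stime * rtime / (tick_stime + tick_utime);
	prev->utime = rtime - prev->stime;
}
```

A task whose tick counters stay at zero keeps reporting 0us / 0us in this model, whether rtime is 179us or a full second.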

Does something like so cure things too?

---
 kernel/sched/cputime.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 9858266fb0b3..2ee83b200504 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -614,19 +614,25 @@ static void cputime_adjust(struct task_cputime *curr,
 	stime = curr->stime;
 	utime = curr->utime;
 
-	if (utime == 0) {
-		stime = rtime;
+	/*
+	 * If either stime or both stime and utime are 0, assume all runtime is
+	 * userspace. Once a task gets some ticks, the monotonicy code at
+	 * 'update' will ensure things converge to the observed ratio.
+	 */
+	if (stime == 0) {
+		utime = rtime;
 		goto update;
 	}
 
-	if (stime == 0) {
-		utime = rtime;
+	if (utime == 0) {
+		stime = rtime;
 		goto update;
 	}
 
 	stime = scale_stime((__force u64)stime, (__force u64)rtime,
 			    (__force u64)(stime + utime));
 
+update:
 	/*
 	 * Make sure stime doesn't go backwards; this preserves monotonicity
 	 * for utime because rtime is monotonic.
@@ -649,7 +655,6 @@ static void cputime_adjust(struct task_cputime *curr,
 		stime = rtime - utime;
 	}
 
-update:
 	prev->stime = stime;
 	prev->utime = utime;
 out:
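
The reordered flow can be modelled in user space as follows. This is a sketch under stated assumptions, not the kernel function: the names are hypothetical, values are plain microseconds, the proportional split stands in for scale_stime() and ignores overflow, and both the early return and the clamping lines hidden between the two hunks are assumed from their surrounding comments.

```c
#include <stdint.h>

/* Last values reported to user space, in microseconds. */
struct prev_sample {
	uint64_t utime;
	uint64_t stime;
};

/*
 * 'tick_utime' and 'tick_stime' stand in for the tick-sampled
 * curr->utime and curr->stime; 'rtime' is the precise runtime.
 */
static void adjust_model(uint64_t tick_utime, uint64_t tick_stime,
			 uint64_t rtime, struct prev_sample *prev)
{
	uint64_t utime = tick_utime;
	uint64_t stime = tick_stime;

	/* Assumed early exit: rtime has not moved past the last report. */
	if (prev->stime + prev->utime >= rtime)
		return;

	if (stime == 0) {
		/* No system ticks seen: treat all runtime as user time. */
		utime = rtime;
	} else if (utime == 0) {
		/* Only system ticks seen. */
		stime = rtime;
	} else {
		/* Stand-in for scale_stime(); may overflow for huge values. */
		stime = stime * rtime / (stime + utime);
	}

	/* 'update:' now sits here, so every path above gets clamped. */
	if (stime < prev->stime)
		stime = prev->stime;
	utime = rtime - stime;

	if (utime < prev->utime) {
		utime = prev->utime;
		stime = rtime - utime;
	}

	prev->stime = stime;
	prev->utime = utime;
}
```

Replaying the failing sequence from a zeroed prev_sample: with no ticks and rtime 179us the model reports utime 179us / stime 0us, and with one user tick and rtime 3751us it reports utime 3751us / stime 0us, so neither value steps backwards. If 179us of stime had already been reported, the clamp keeps it and the second call yields utime 3572us / stime 179us, which still sums to rtime.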


* Re: [patch] sched/cputime: Fix NO_HZ_FULL getrusage() monotonicity regression
  2016-08-15  8:51     ` Peter Zijlstra
@ 2016-08-15 12:29       ` Mike Galbraith
  0 siblings, 0 replies; 7+ messages in thread
From: Mike Galbraith @ 2016-08-15 12:29 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML

On Mon, 2016-08-15 at 10:51 +0200, Peter Zijlstra wrote:
> On Wed, Aug 10, 2016 at 08:57:28PM +0200, Mike Galbraith wrote:
> >  
> > +	/*
> > +	 * sum_exec_runtime has moved, but nothing has yet been
> > +	 * accounted to the task, there's nothing to update.
> > +	 */
> > +	if (utime + stime == 0)
> > +		goto out;
> 
> urgh...
> 
> Valid scenario.. not sure about the solution though. This would mean the
> task has _no_ running time if it forever dodges the tick, which would be
> bad.
> 
> Does something like so cure things too?

Yeah, it's a happy camper.

> ---
>  kernel/sched/cputime.c | 15 ++++++++++-----
>  1 file changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
> index 9858266fb0b3..2ee83b200504 100644
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c
> @@ -614,19 +614,25 @@ static void cputime_adjust(struct task_cputime *curr,
>  	stime = curr->stime;
>  	utime = curr->utime;
>  
> -	if (utime == 0) {
> -		stime = rtime;
> +	/*
> +	 * If either stime or both stime and utime are 0, assume all runtime is
> +	 * userspace. Once a task gets some ticks, the monotonicy code at
> +	 * 'update' will ensure things converge to the observed ratio.
> +	 */
> +	if (stime == 0) {
> +		utime = rtime;
>  		goto update;
>  	}
>  
> -	if (stime == 0) {
> -		utime = rtime;
> +	if (utime == 0) {
> +		stime = rtime;
>  		goto update;
>  	}
>  
>  	stime = scale_stime((__force u64)stime, (__force u64)rtime,
>  			    (__force u64)(stime + utime));
>  
> +update:
>  	/*
>  	 * Make sure stime doesn't go backwards; this preserves monotonicity
>  	 * for utime because rtime is monotonic.
> @@ -649,7 +655,6 @@ static void cputime_adjust(struct task_cputime *curr,
>  		stime = rtime - utime;
>  	}
>  
> -update:
>  	prev->stime = stime;
>  	prev->utime = utime;
>  out:

