Re: [tip:sched/core] sched: Lower chances of cputime scaling overflow

From: Stanislaw Gruszka <sgruszka@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Peter Zijlstra" <peterz@infradead.org>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	"Ingo Molnar" <mingo@kernel.org>,
	"H. Peter Anvin" <hpa@zytor.com>,
	"Frédéric Weisbecker" <fweisbec@gmail.com>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	linux-tip-commits@vger.kernel.org
Subject: Re: [tip:sched/core] sched: Lower chances of cputime scaling overflow
Date: Tue, 16 Apr 2013 12:40:07 +0200
Message-ID: <20130416104007.GA620@redhat.com>
In-Reply-To: <CA+55aFwMeAe6Mb9EPeT4Ke0W36z-XUK3BZvbOt+dMzdbJ3F1Ew@mail.gmail.com>

On Sat, Apr 13, 2013 at 11:44:54AM -0700, Linus Torvalds wrote:
> PS. This is the "Make sure 'total' fits in 32 bits first" version. Not
> really tested, but it's just changing the order of operations a bit.
> 
>     /* We know one of the values has a bit set in the high 32 bits */
>     for (;;) {
>         /* Make sure "rtime" is the bigger of stime/rtime */
>         if (stime > rtime) {
>             u64 tmp = rtime; rtime = stime; stime = tmp;
>         }
> 
>         /* Make sure 'total' fits in 32 bits */
>         if (total >> 32)
>                 goto drop_precision;
> 
>         /* Does rtime (and thus stime) fit in 32 bits? */
>         if (!(rtime >> 32))
>                 break;
> 
>         /* Can we just balance rtime/stime rather than dropping bits? */
>         if (stime >> 31)
>             goto drop_precision;
> 
>         /* We can grow stime and shrink rtime and try to make them both fit */
>         stime <<= 1;
>         rtime >>= 1;
>         continue;
> 
> drop_precision:
>         /* We drop from rtime, it has more bits than stime */
>         rtime >>= 1;
>         total >>= 1;
>     }

It also passes the 0.1% relative error threshold in my tests. Decreasing
the error threshold to 0.02% makes it fail. I didn't check other values or
measure how frequently each version fails at 0.02%; I assume this one is
better :-)
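
For anyone who wants to repeat this kind of check in user space, here is a
minimal sketch. It copies the quoted loop as-is; the final 32x32->64 multiply
followed by a 64/32 divide is my assumption about how the result is then
computed, and scale_stime_sketch(), rel_error() and the total == 0 guard are
hypothetical names/additions for this harness only, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * User-space copy of the quoted loop. The multiply/divide at the end is an
 * assumption about the step that follows the loop.
 */
static uint64_t scale_stime_sketch(uint64_t stime, uint64_t rtime, uint64_t total)
{
	for (;;) {
		/* Make sure "rtime" is the bigger of stime/rtime */
		if (stime > rtime) {
			uint64_t tmp = rtime; rtime = stime; stime = tmp;
		}
		/* Make sure 'total' fits in 32 bits */
		if (total >> 32)
			goto drop_precision;
		/* Does rtime (and thus stime) fit in 32 bits? */
		if (!(rtime >> 32))
			break;
		/* Can we just balance rtime/stime rather than dropping bits? */
		if (stime >> 31)
			goto drop_precision;
		stime <<= 1;
		rtime >>= 1;
		continue;
drop_precision:
		rtime >>= 1;
		total >>= 1;
	}
	/* Harness-only guard: extreme inputs can shift total down to zero. */
	if ((uint32_t)total == 0)
		return 0;
	return ((uint64_t)(uint32_t)stime * (uint64_t)(uint32_t)rtime) /
	       (uint32_t)total;
}

/* Relative error against a long double reference of stime * rtime / total. */
static double rel_error(uint64_t stime, uint64_t rtime, uint64_t total)
{
	long double exact, got, diff;

	assert(stime != 0 && rtime != 0 && total != 0);
	exact = (long double)stime * (long double)rtime / (long double)total;
	got = (long double)scale_stime_sketch(stime, rtime, total);
	diff = got > exact ? got - exact : exact - got;
	return (double)(diff / exact);
}
```

Calling rel_error() over a range of stime/rtime/total triples and comparing
the result against 0.001 or 0.0002 gives the kind of threshold test described
above; which input ranges to feed it is left open here.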

So regarding relative error this algorithm is fine: there is no
multiplication overflow that makes the scaled numbers bogus. Then
I looked at this algorithm in the context of how it is used ...

Raw stime/rtime/total values increase at jiffies resolution, and so
does scaled_stime if we do not drop precision. For bigger numbers,
since we drop precision, scaled_stime will grow in chunks. How big
a chunk is depends on how big the numbers are overall and on the
stime/total ratio. For example: with stime = 0.5*total and 128 threads,
after 1 year of CPU execution the chunk will be 1024 jiffies.
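
To get a feeling for the magnitudes in that example, here is a rough
arithmetic sketch. It assumes HZ = 1000, a 365-day year, and that each of
the 128 threads ran for the whole year; none of these are stated above, and
the helper names are hypothetical. It only shows how far past 32 bits the
inputs are, it does not derive the chunk size itself.

```c
#include <assert.h>
#include <stdint.h>

#define HZ_ASSUMED		1000ULL			/* assumption, not from the mail */
#define SECONDS_PER_YEAR	(365ULL * 24 * 60 * 60)	/* 31536000 */

/* Number of bits needed to represent v (0 for v == 0). */
static unsigned int bits_needed(uint64_t v)
{
	unsigned int n = 0;

	while (v) {
		n++;
		v >>= 1;
	}
	return n;
}

/* Jiffies accumulated by 'threads' threads, each running for one year. */
static uint64_t total_jiffies_one_year(uint64_t threads)
{
	uint64_t per_thread = HZ_ASSUMED * SECONDS_PER_YEAR;

	/* Refuse inputs that would overflow 64 bits. */
	assert(threads <= UINT64_MAX / per_thread);
	return per_thread * threads;
}
```

Under these assumptions total_jiffies_one_year(128) is 4036608000000, which
needs 42 bits, and stime = 0.5*total needs 41 bits, so both are well beyond
what a 32x32 multiply can take without dropping low bits.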

We use the scaled stime value this way:

	if (total)
		stime = scale_stime(stime, rtime, total);
	else
		stime = rtime;

	/*
	 * If the tick based count grows faster than the scheduler one,
	 * the result of the scaling may go backward.
	 * Let's enforce monotonicity.
	 */
	prev->stime = max(prev->stime, stime);
	prev->utime = max(prev->utime, rtime - prev->stime);

	*ut = prev->utime;
	*st = prev->stime;

Since rtime increases but scaled stime does not, the stime will be
accounted as prev->utime. Then, after a chunk of jiffies, scaled stime
will grow and we will get it accounted in prev->stime as well. As a
result we will account more CPU time than the process actually used.
This error will accumulate depending on how frequently
cputime_adjust(process) is called.
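
The effect can be replayed with a small toy model. This is only a sketch:
struct cputime_sketch, adjust_sketch(), chunked() and overaccounted_after()
are hypothetical names, and rounding down to a multiple of a chunk is a
stand-in for the precision loss, not what the scaling loop really does. The
toy task runs only in kernel mode, so its true stime equals rtime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct cputime_sketch {
	uint64_t utime;
	uint64_t stime;
};

static uint64_t max_u64(uint64_t a, uint64_t b)
{
	return a > b ? a : b;
}

/* The monotonicity step from the snippet above, fed an already scaled stime. */
static void adjust_sketch(struct cputime_sketch *prev, uint64_t scaled_stime,
			  uint64_t rtime)
{
	prev->stime = max_u64(prev->stime, scaled_stime);
	assert(rtime >= prev->stime);	/* keep the toy free of unsigned wrap */
	prev->utime = max_u64(prev->utime, rtime - prev->stime);
}

/* Hypothetical stand-in for precision loss: round down to a chunk multiple. */
static uint64_t chunked(uint64_t stime, uint64_t chunk)
{
	return stime - stime % chunk;
}

/*
 * Replay samples of a kernel-only task (true stime == rtime) and return how
 * much more than the last rtime ends up exported as utime + stime.
 */
static uint64_t overaccounted_after(const uint64_t *rtimes, size_t n,
				    uint64_t chunk)
{
	struct cputime_sketch prev = { 0, 0 };
	size_t i;

	assert(n > 0 && chunk > 0);
	for (i = 0; i < n; i++)
		adjust_sketch(&prev, chunked(rtimes[i], chunk), rtimes[i]);
	return prev.utime + prev.stime - rtimes[n - 1];
}
```

With samples taken at rtime = 4, 7, 8 and a chunk of 4, the sample at 7 puts
3 jiffies into utime, and the sample at 8 then moves stime to 8 while utime
stays at 3, so 11 jiffies are exported for 8 jiffies of runtime.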

As a solution for this we could just stop accounting if prev->stime +
prev->utime is already at least as big as rtime. For example:

 	rtime = nsecs_to_cputime(curr->sum_exec_runtime);

+	/*
+	 * Update userspace visible utime/stime values only if actual execution
+	 * time is bigger than already exported. Note that it can happen that we
+	 * provided bigger values due to scaling inaccuracy on big numbers.
+	 */
+	if (prev->stime + prev->utime >= rtime)
+		goto out;
+
 	if (total)
 		stime = scale_stime(stime, rtime, total);
 	else
@@ -573,6 +581,7 @@ static void cputime_adjust(struct task_cputime *curr,
 	prev->stime = max(prev->stime, stime);
 	prev->utime = max(prev->utime, rtime - prev->stime);

+out:
 	*ut = prev->utime;
 	*st = prev->stime;
 }

This should help with erroneously accounting more CPU time than the
process actually uses. As a disadvantage, userspace will see CPU time
increase in chunks, but I think this is better than seeing values much
bigger than the correct ones (and for 99.9% of use cases it does not
matter).
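
A user-space sketch of the guarded update, for illustration only. The names
struct cputime_sketch and adjust_guarded_sketch() are hypothetical, and the
already scaled stime is passed in directly instead of calling scale_stime().
It exercises just the early-exit path from the patch above and makes no claim
that the guard removes every over-accounting case.

```c
#include <assert.h>
#include <stdint.h>

struct cputime_sketch {
	uint64_t utime;
	uint64_t stime;
};

static uint64_t max_u64(uint64_t a, uint64_t b)
{
	return a > b ? a : b;
}

/*
 * Same update as in the patch: skip it when the values already exported
 * cover rtime, otherwise enforce monotonicity as before.
 */
static void adjust_guarded_sketch(struct cputime_sketch *prev,
				  uint64_t scaled_stime, uint64_t rtime,
				  uint64_t *ut, uint64_t *st)
{
	assert(scaled_stime <= rtime);	/* toy precondition, avoids wrap below */

	if (prev->stime + prev->utime >= rtime)
		goto out;

	prev->stime = max_u64(prev->stime, scaled_stime);
	prev->utime = max_u64(prev->utime, rtime - prev->stime);

out:
	*ut = prev->utime;
	*st = prev->stime;
}
```

For example, a first call with scaled stime 4 and rtime 11 exports utime = 7,
stime = 4. If the scaled stime then jumps to 8 while rtime is still 11, the
guard leaves both values untouched, whereas the unguarded code would raise
stime to 8 and export 15 jiffies in total.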

Stanislaw
