From: Peter Zijlstra <peterz@infradead.org>
To: Olivier Langlois <olivier@trillion01.com>
Cc: mingo@redhat.com, tglx@linutronix.de, fweisbec@gmail.com,
schwidefsky@de.ibm.com, rostedt@goodmis.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] process cputimer is moving faster than its corresponding clock
Date: Fri, 12 Apr 2013 17:55:15 +0200 [thread overview]
Message-ID: <1365782115.17140.68.camel@laptop> (raw)
In-Reply-To: <1365763837.17140.52.camel@laptop>
On Fri, 2013-04-12 at 12:50 +0200, Peter Zijlstra wrote:
> I'll try and dig through the rest of your email later.. sorry for
> being
> a tad slow etc.
So at thread_group_cputimer() we initialize the cputimer->cputime state
by using thread_group_cputime() which iterates all tasks of the process
and calls task_sched_runtime() upon them (which includes the current
delta).
Upon subsequent account_group_exec_runtime() calls (from all schedule
events and timer ticks) we add the current delta to cputimer->cputime.
However since we already added the first (or part thereof) delta to the
initial state, we account this double. Thus we can be up to
NR_CPUS*TICK_NSEC ahead.
On every timer tick we evaluate the cputimer state using
cpu_timer_sample_group() which adds the current tasks delta. This can
thus be up to (NR_CPUS-1)*TICK_NSEC behind.
The combination of the timeline behind ahead and the sample being
behind make it a virtual guarantee we'll hit early by almost
2*NR_CPUS*TICK_NSEC.
This is what you've been saying right?
So how about we do not include the deltas into the initial sum, so that
we're up to NR_CPUS*TICK_NSEC behind. That way, with the sample up to
(NR_CPUS-1)*TICK_NSEC behind, we're in the order of TICK_NSEC late with
firing.
Hmm?
---
include/linux/sched.h | 5 +++--
kernel/posix-cpu-timers.c | 15 ++++++++++-----
kernel/sched/core.c | 6 ++++--
kernel/sched/cputime.c | 8 ++++----
4 files changed, 21 insertions(+), 13 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 88ec7f4..abe5870 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1832,7 +1832,7 @@ static inline void disable_sched_clock_irqtime(void) {}
#endif
extern unsigned long long
-task_sched_runtime(struct task_struct *task);
+task_sched_runtime(struct task_struct *task, bool add_delta);
/* sched_exec is called by processes performing an exec */
#ifdef CONFIG_SMP
@@ -2496,7 +2496,8 @@ static inline void current_clr_polling(void) { }
/*
* Thread group CPU time accounting.
*/
-void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times);
+void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times,
+ bool add_delta);
void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times);
static inline void thread_group_cputime_init(struct signal_struct *sig)
diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index 8fd709c..d8133ad 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -220,7 +220,7 @@ static int cpu_clock_sample(const clockid_t which_clock, struct task_struct *p,
cpu->cpu = virt_ticks(p);
break;
case CPUCLOCK_SCHED:
- cpu->sched = task_sched_runtime(p);
+ cpu->sched = task_sched_runtime(p, true);
break;
}
return 0;
@@ -250,8 +250,13 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times)
* values through the TIMER_ABSTIME flag, therefore we have
* to synchronize the timer to the clock every time we start
* it.
+ *
+ * Do no add the current delta, because
+ * account_group_exec_runtime() will also add this delta and we
+ * wouldn't want to double account time and get ahead of
+ * ourselves.
*/
- thread_group_cputime(tsk, &sum);
+ thread_group_cputime(tsk, &sum, false);
raw_spin_lock_irqsave(&cputimer->lock, flags);
cputimer->running = 1;
update_gt_cputime(&cputimer->cputime, &sum);
@@ -275,15 +280,15 @@ static int cpu_clock_sample_group(const clockid_t which_clock,
default:
return -EINVAL;
case CPUCLOCK_PROF:
- thread_group_cputime(p, &cputime);
+ thread_group_cputime(p, &cputime, true);
cpu->cpu = cputime.utime + cputime.stime;
break;
case CPUCLOCK_VIRT:
- thread_group_cputime(p, &cputime);
+ thread_group_cputime(p, &cputime, true);
cpu->cpu = cputime.utime;
break;
case CPUCLOCK_SCHED:
- thread_group_cputime(p, &cputime);
+ thread_group_cputime(p, &cputime, true);
cpu->sched = cputime.sum_exec_runtime;
break;
}
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e8167e3..704fa44 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2677,14 +2677,16 @@ unsigned long long task_delta_exec(struct task_struct *p)
* In case the task is currently running, return the runtime plus current's
* pending runtime that have not been accounted yet.
*/
-unsigned long long task_sched_runtime(struct task_struct *p)
+unsigned long long task_sched_runtime(struct task_struct *p, bool add_delta)
{
unsigned long flags;
struct rq *rq;
u64 ns = 0;
rq = task_rq_lock(p, &flags);
- ns = p->se.sum_exec_runtime + do_task_delta_exec(p, rq);
+ ns = p->se.sum_exec_runtime;
+ if (add_delta)
+ ns += do_task_delta_exec(p, rq);
task_rq_unlock(rq, p, &flags);
return ns;
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index ea32f02..c3495e1 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -277,7 +277,7 @@ static __always_inline bool steal_account_process_tick(void)
* Accumulate raw cputime values of dead tasks (sig->[us]time) and live
* tasks (sum on group iteration) belonging to @tsk's group.
*/
-void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
+void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times, bool add_delta)
{
struct signal_struct *sig = tsk->signal;
cputime_t utime, stime;
@@ -297,7 +297,7 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
task_cputime(t, &utime, &stime);
times->utime += utime;
times->stime += stime;
- times->sum_exec_runtime += task_sched_runtime(t);
+ times->sum_exec_runtime += task_sched_runtime(t, add_delta);
} while_each_thread(tsk, t);
out:
rcu_read_unlock();
@@ -444,7 +444,7 @@ void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime
{
struct task_cputime cputime;
- thread_group_cputime(p, &cputime);
+ thread_group_cputime(p, &cputime, true);
*ut = cputime.utime;
*st = cputime.stime;
@@ -606,7 +606,7 @@ void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime
{
struct task_cputime cputime;
- thread_group_cputime(p, &cputime);
+ thread_group_cputime(p, &cputime, true);
cputime_adjust(&cputime, &p->signal->prev_cputime, ut, st);
}
#endif /* !CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
next prev parent reply other threads:[~2013-04-12 15:55 UTC|newest]
Thread overview: 39+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-04-05 17:59 [PATCH] process cputimer is moving faster than its corresponding clock Olivier Langlois
2013-04-10 11:35 ` Peter Zijlstra
2013-04-10 15:48 ` Olivier Langlois
2013-04-12 9:16 ` Peter Zijlstra
2013-04-15 1:55 ` Olivier Langlois
2013-04-12 9:18 ` Peter Zijlstra
2013-04-12 10:50 ` Peter Zijlstra
2013-04-12 15:55 ` Peter Zijlstra [this message]
2013-04-15 6:11 ` Olivier Langlois
2013-04-19 13:03 ` Frederic Weisbecker
2013-04-19 17:38 ` KOSAKI Motohiro
2013-04-19 18:08 ` KOSAKI Motohiro
2013-04-26 4:40 ` Olivier Langlois
2013-04-26 6:27 ` Olivier Langlois
2013-04-26 19:08 ` KOSAKI Motohiro
2013-04-27 1:51 ` Olivier Langlois
2013-04-27 2:15 ` KOSAKI Motohiro
2013-04-27 5:02 ` Olivier Langlois
2013-04-27 5:17 ` Olivier Langlois
2013-04-27 5:31 ` Olivier Langlois
2013-04-27 5:06 ` Olivier Langlois
2013-04-27 4:40 ` [PATCH v2 1/3] " Olivier Langlois
2013-04-29 0:45 ` Frederic Weisbecker
2013-04-29 17:29 ` Olivier Langlois
2013-04-29 5:06 ` KOSAKI Motohiro
2013-04-29 17:10 ` Olivier Langlois
2013-04-29 17:41 ` Olivier Langlois
2013-04-29 17:56 ` KOSAKI Motohiro
2013-04-29 18:20 ` Olivier Langlois
2013-04-29 18:31 ` KOSAKI Motohiro
2013-04-29 18:54 ` Olivier Langlois
2013-04-29 19:09 ` KOSAKI Motohiro
2013-04-29 21:20 ` Olivier Langlois
2013-04-29 22:42 ` KOSAKI Motohiro
[not found] ` <1367036552.7911.63.camel@Wailaba2>
2013-04-27 4:40 ` [PATCH v2 2/3] " Olivier Langlois
2013-04-27 4:41 ` [PATCH v2 3/3] " Olivier Langlois
2013-04-29 6:25 ` KOSAKI Motohiro
2013-04-29 17:16 ` Olivier Langlois
2013-04-11 3:29 ` [PATCH] " Olivier Langlois
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1365782115.17140.68.camel@laptop \
--to=peterz@infradead.org \
--cc=fweisbec@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=olivier@trillion01.com \
--cc=rostedt@goodmis.org \
--cc=schwidefsky@de.ibm.com \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.