public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Olivier Langlois <olivier@trillion01.com>
Cc: mingo@redhat.com, tglx@linutronix.de, fweisbec@gmail.com,
	schwidefsky@de.ibm.com, rostedt@goodmis.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] process cputimer is moving faster than its corresponding clock
Date: Fri, 12 Apr 2013 17:55:15 +0200	[thread overview]
Message-ID: <1365782115.17140.68.camel@laptop> (raw)
In-Reply-To: <1365763837.17140.52.camel@laptop>

On Fri, 2013-04-12 at 12:50 +0200, Peter Zijlstra wrote:

> I'll try and dig through the rest of your email later.. sorry for
> being
> a tad slow etc.


So at thread_group_cputimer() we initialize the cputimer->cputime state
by using thread_group_cputime() which iterates all tasks of the process
and calls task_sched_runtime() upon them (which includes the current
delta).

Upon subsequent account_group_exec_runtime() calls (from all schedule
events and timer ticks) we add the current delta to cputimer->cputime.

However since we already added the first (or part thereof) delta to the
initial state, we account this double. Thus we can be up to
NR_CPUS*TICK_NSEC ahead.

On every timer tick we evaluate the cputimer state using
cpu_timer_sample_group() which adds the current tasks delta. This can
thus be up to (NR_CPUS-1)*TICK_NSEC behind.

The combination of the timeline behind ahead and the sample being
behind make it a virtual guarantee we'll hit early by almost
2*NR_CPUS*TICK_NSEC.

This is what you've been saying right?


So how about we do not include the deltas into the initial sum, so that
we're up to NR_CPUS*TICK_NSEC behind. That way, with the sample up to
(NR_CPUS-1)*TICK_NSEC behind, we're in the order of TICK_NSEC late with
firing.

Hmm?

---
 include/linux/sched.h     |  5 +++--
 kernel/posix-cpu-timers.c | 15 ++++++++++-----
 kernel/sched/core.c       |  6 ++++--
 kernel/sched/cputime.c    |  8 ++++----
 4 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 88ec7f4..abe5870 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1832,7 +1832,7 @@ static inline void disable_sched_clock_irqtime(void) {}
 #endif
 
 extern unsigned long long
-task_sched_runtime(struct task_struct *task);
+task_sched_runtime(struct task_struct *task, bool add_delta);
 
 /* sched_exec is called by processes performing an exec */
 #ifdef CONFIG_SMP
@@ -2496,7 +2496,8 @@ static inline void current_clr_polling(void) { }
 /*
  * Thread group CPU time accounting.
  */
-void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times);
+void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times,
+			  bool add_delta);
 void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times);
 
 static inline void thread_group_cputime_init(struct signal_struct *sig)
diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index 8fd709c..d8133ad 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -220,7 +220,7 @@ static int cpu_clock_sample(const clockid_t which_clock, struct task_struct *p,
 		cpu->cpu = virt_ticks(p);
 		break;
 	case CPUCLOCK_SCHED:
-		cpu->sched = task_sched_runtime(p);
+		cpu->sched = task_sched_runtime(p, true);
 		break;
 	}
 	return 0;
@@ -250,8 +250,13 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times)
 		 * values through the TIMER_ABSTIME flag, therefore we have
 		 * to synchronize the timer to the clock every time we start
 		 * it.
+		 *
+		 * Do no add the current delta, because
+		 * account_group_exec_runtime() will also add this delta and we
+		 * wouldn't want to double account time and get ahead of
+		 * ourselves.
 		 */
-		thread_group_cputime(tsk, &sum);
+		thread_group_cputime(tsk, &sum, false);
 		raw_spin_lock_irqsave(&cputimer->lock, flags);
 		cputimer->running = 1;
 		update_gt_cputime(&cputimer->cputime, &sum);
@@ -275,15 +280,15 @@ static int cpu_clock_sample_group(const clockid_t which_clock,
 	default:
 		return -EINVAL;
 	case CPUCLOCK_PROF:
-		thread_group_cputime(p, &cputime);
+		thread_group_cputime(p, &cputime, true);
 		cpu->cpu = cputime.utime + cputime.stime;
 		break;
 	case CPUCLOCK_VIRT:
-		thread_group_cputime(p, &cputime);
+		thread_group_cputime(p, &cputime, true);
 		cpu->cpu = cputime.utime;
 		break;
 	case CPUCLOCK_SCHED:
-		thread_group_cputime(p, &cputime);
+		thread_group_cputime(p, &cputime, true);
 		cpu->sched = cputime.sum_exec_runtime;
 		break;
 	}
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e8167e3..704fa44 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2677,14 +2677,16 @@ unsigned long long task_delta_exec(struct task_struct *p)
  * In case the task is currently running, return the runtime plus current's
  * pending runtime that have not been accounted yet.
  */
-unsigned long long task_sched_runtime(struct task_struct *p)
+unsigned long long task_sched_runtime(struct task_struct *p, bool add_delta)
 {
 	unsigned long flags;
 	struct rq *rq;
 	u64 ns = 0;
 
 	rq = task_rq_lock(p, &flags);
-	ns = p->se.sum_exec_runtime + do_task_delta_exec(p, rq);
+	ns = p->se.sum_exec_runtime;
+        if (add_delta)
+		ns += do_task_delta_exec(p, rq);
 	task_rq_unlock(rq, p, &flags);
 
 	return ns;
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index ea32f02..c3495e1 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -277,7 +277,7 @@ static __always_inline bool steal_account_process_tick(void)
  * Accumulate raw cputime values of dead tasks (sig->[us]time) and live
  * tasks (sum on group iteration) belonging to @tsk's group.
  */
-void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
+void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times, bool add_delta)
 {
 	struct signal_struct *sig = tsk->signal;
 	cputime_t utime, stime;
@@ -297,7 +297,7 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
 		task_cputime(t, &utime, &stime);
 		times->utime += utime;
 		times->stime += stime;
-		times->sum_exec_runtime += task_sched_runtime(t);
+		times->sum_exec_runtime += task_sched_runtime(t, add_delta);
 	} while_each_thread(tsk, t);
 out:
 	rcu_read_unlock();
@@ -444,7 +444,7 @@ void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime
 {
 	struct task_cputime cputime;
 
-	thread_group_cputime(p, &cputime);
+	thread_group_cputime(p, &cputime, true);
 
 	*ut = cputime.utime;
 	*st = cputime.stime;
@@ -606,7 +606,7 @@ void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime
 {
 	struct task_cputime cputime;
 
-	thread_group_cputime(p, &cputime);
+	thread_group_cputime(p, &cputime, true);
 	cputime_adjust(&cputime, &p->signal->prev_cputime, ut, st);
 }
 #endif /* !CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */



  reply	other threads:[~2013-04-12 15:55 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-04-05 17:59 [PATCH] process cputimer is moving faster than its corresponding clock Olivier Langlois
2013-04-10 11:35 ` Peter Zijlstra
2013-04-10 15:48   ` Olivier Langlois
2013-04-12  9:16     ` Peter Zijlstra
2013-04-15  1:55       ` Olivier Langlois
2013-04-12  9:18     ` Peter Zijlstra
2013-04-12 10:50     ` Peter Zijlstra
2013-04-12 15:55       ` Peter Zijlstra [this message]
2013-04-15  6:11         ` Olivier Langlois
2013-04-19 13:03         ` Frederic Weisbecker
2013-04-19 17:38           ` KOSAKI Motohiro
2013-04-19 18:08             ` KOSAKI Motohiro
2013-04-26  4:40               ` Olivier Langlois
2013-04-26  6:27                 ` Olivier Langlois
2013-04-26 19:08                   ` KOSAKI Motohiro
2013-04-27  1:51                     ` Olivier Langlois
2013-04-27  2:15                       ` KOSAKI Motohiro
2013-04-27  5:02                         ` Olivier Langlois
2013-04-27  5:17                           ` Olivier Langlois
2013-04-27  5:31                           ` Olivier Langlois
2013-04-27  5:06                         ` Olivier Langlois
2013-04-27  4:40                     ` [PATCH v2 1/3] " Olivier Langlois
2013-04-29  0:45                       ` Frederic Weisbecker
2013-04-29 17:29                         ` Olivier Langlois
2013-04-29  5:06                       ` KOSAKI Motohiro
2013-04-29 17:10                         ` Olivier Langlois
2013-04-29 17:41                           ` Olivier Langlois
2013-04-29 17:56                           ` KOSAKI Motohiro
2013-04-29 18:20                             ` Olivier Langlois
2013-04-29 18:31                               ` KOSAKI Motohiro
2013-04-29 18:54                                 ` Olivier Langlois
2013-04-29 19:09                                   ` KOSAKI Motohiro
2013-04-29 21:20                                     ` Olivier Langlois
2013-04-29 22:42                                       ` KOSAKI Motohiro
     [not found]                     ` <1367036552.7911.63.camel@Wailaba2>
2013-04-27  4:40                       ` [PATCH v2 2/3] " Olivier Langlois
2013-04-27  4:41                       ` [PATCH v2 3/3] " Olivier Langlois
2013-04-29  6:25                         ` KOSAKI Motohiro
2013-04-29 17:16                           ` Olivier Langlois
2013-04-11  3:29   ` [PATCH] " Olivier Langlois

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1365782115.17140.68.camel@laptop \
    --to=peterz@infradead.org \
    --cc=fweisbec@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=olivier@trillion01.com \
    --cc=rostedt@goodmis.org \
    --cc=schwidefsky@de.ibm.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox