public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/7] posix timers fixlet
@ 2013-05-03  4:47 kosaki.motohiro
  2013-05-03  4:47 ` [PATCH 1/7] posix-cpu-timers: don't account cpu timer after stopped thread runtime accounting kosaki.motohiro
                   ` (6 more replies)
  0 siblings, 7 replies; 17+ messages in thread
From: kosaki.motohiro @ 2013-05-03  4:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: Olivier Langlois, Thomas Gleixner, Frederic Weisbecker,
	Ingo Molnar, Peter Zijlstra, KOSAKI Motohiro

From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

Glibc's posix timer testcase found a lot of bugs in posix timer code.
This series, hopefully, fixes all of them. All patches are independent 
each other logically.

Changes from v3
 - task_sched_runtime() micro optimization add to care tsk->on_cpu.
   suggested Paul Turner.
 - fixed several typo in changelogs.


KOSAKI Motohiro (7):
  posix-cpu-timers: don't account cpu timer after stopped thread
    runtime accounting
  posix-cpu-timers: fix acounting delta_exec twice
  posix-cpu-timers: fix wrong timer initialization
  posix-cpu-timers: timer functions should use timer time instead of
    clock time
  posix-cpu-timers: check_thread_timers() uses task_sched_runtime()
  sched: task_sched_runtime introduce micro optimization
  posix-cpu-timers: cleanup cpu_{clock,timer}_sample{,_group}

 fs/binfmt_elf.c             |    2 +-
 fs/binfmt_elf_fdpic.c       |    2 +-
 include/linux/kernel_stat.h |    5 --
 include/linux/sched.h       |    5 +-
 kernel/posix-cpu-timers.c   |  143 ++++++++++++++++++++++++-------------------
 kernel/sched/core.c         |   34 ++++++-----
 kernel/sched/cputime.c      |   19 +++++-
 kernel/sched/stats.h        |    7 ++
 8 files changed, 125 insertions(+), 92 deletions(-)


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/7] posix-cpu-timers: don't account cpu timer after stopped thread runtime accounting
  2013-05-03  4:47 [PATCH v4 0/7] posix timers fixlet kosaki.motohiro
@ 2013-05-03  4:47 ` kosaki.motohiro
  2013-05-06 23:47   ` Frederic Weisbecker
  2013-05-03  4:47 ` [PATCH 2/7] posix-cpu-timers: fix acounting delta_exec twice kosaki.motohiro
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: kosaki.motohiro @ 2013-05-03  4:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: Olivier Langlois, Thomas Gleixner, Frederic Weisbecker,
	Ingo Molnar, Peter Zijlstra, KOSAKI Motohiro

From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

When tsk->signal->cputimer->running is 1, signal->cputimer and
tsk->sum_sched_runtime increase at the same pace because update_curr()
increases both accounting.

However, there is one exception. When thread exiting, __exit_signal() turns
over task's sum_shced_runtime to sig->sum_sched_runtime, but it doesn't stop
signal->cputimer accounting.

This inconsistency makes POSIX timer wake up too early. This patch fixes it.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Olivier Langlois <olivier@trillion01.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 kernel/sched/stats.h |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/kernel/sched/stats.h b/kernel/sched/stats.h
index 2ef90a5..5a0cfc4 100644
--- a/kernel/sched/stats.h
+++ b/kernel/sched/stats.h
@@ -225,6 +225,13 @@ static inline void account_group_exec_runtime(struct task_struct *tsk,
 	if (!cputimer->running)
 		return;
 
+	/*
+	 * After turning over se.sum_exec_runtime to sig->sum_sched_runtime
+	 * in __exit_signal(), we must not account exec_runtime for consistency.
+	 */
+	if (unlikely(!tsk->sighand))
+		return;
+
 	raw_spin_lock(&cputimer->lock);
 	cputimer->cputime.sum_exec_runtime += ns;
 	raw_spin_unlock(&cputimer->lock);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/7] posix-cpu-timers: fix acounting delta_exec twice
  2013-05-03  4:47 [PATCH v4 0/7] posix timers fixlet kosaki.motohiro
  2013-05-03  4:47 ` [PATCH 1/7] posix-cpu-timers: don't account cpu timer after stopped thread runtime accounting kosaki.motohiro
@ 2013-05-03  4:47 ` kosaki.motohiro
  2013-05-11  0:56   ` Frederic Weisbecker
  2013-05-03  4:47 ` [PATCH 3/7] posix-cpu-timers: fix wrong timer initialization kosaki.motohiro
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: kosaki.motohiro @ 2013-05-03  4:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: Olivier Langlois, Thomas Gleixner, Frederic Weisbecker,
	Ingo Molnar, Peter Zijlstra, KOSAKI Motohiro

From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

Currently glibc rt/tst-cpuclock2 test(*) sporadically fails because
scheduler delta can be accounted twice from thread_group_cputimer()
and account_group_exec_runtime().

Finally, clock_nanosleep() wakes up before an argument. This is posix
violation. This issue was introduced by commit d670ec1317 (posix-cpu-timers:
Cure SMP wobbles).

(*) http://sourceware.org/git/?p=glibc.git;a=blob;f=rt/tst-cpuclock2.c;h=6752721717f959e89c0d692b3f1ee082d507eec2;hb=HEAD

Cc: Olivier Langlois <olivier@trillion01.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 fs/binfmt_elf.c           |    2 +-
 fs/binfmt_elf_fdpic.c     |    2 +-
 include/linux/sched.h     |    4 ++--
 kernel/posix-cpu-timers.c |   15 ++++++++++-----
 kernel/sched/core.c       |    6 ++++--
 kernel/sched/cputime.c    |    8 ++++----
 6 files changed, 22 insertions(+), 15 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 86af964..fea51e7 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1322,7 +1322,7 @@ static void fill_prstatus(struct elf_prstatus *prstatus,
 		 * This is the record for the group leader.  It shows the
 		 * group-wide total, not its individual thread total.
 		 */
-		thread_group_cputime(p, &cputime);
+		thread_group_cputime(p, true, &cputime);
 		cputime_to_timeval(cputime.utime, &prstatus->pr_utime);
 		cputime_to_timeval(cputime.stime, &prstatus->pr_stime);
 	} else {
diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
index 9c13e02..ab5b508 100644
--- a/fs/binfmt_elf_fdpic.c
+++ b/fs/binfmt_elf_fdpic.c
@@ -1371,7 +1371,7 @@ static void fill_prstatus(struct elf_prstatus *prstatus,
 		 * This is the record for the group leader.  It shows the
 		 * group-wide total, not its individual thread total.
 		 */
-		thread_group_cputime(p, &cputime);
+		thread_group_cputime(p, true, &cputime);
 		cputime_to_timeval(cputime.utime, &prstatus->pr_utime);
 		cputime_to_timeval(cputime.stime, &prstatus->pr_stime);
 	} else {
diff --git a/include/linux/sched.h b/include/linux/sched.h
index e692a02..7863d4b 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2002,7 +2002,7 @@ static inline void disable_sched_clock_irqtime(void) {}
 #endif
 
 extern unsigned long long
-task_sched_runtime(struct task_struct *task);
+task_sched_runtime(struct task_struct *task, bool add_delta);
 
 /* sched_exec is called by processes performing an exec */
 #ifdef CONFIG_SMP
@@ -2625,7 +2625,7 @@ static inline int spin_needbreak(spinlock_t *lock)
 /*
  * Thread group CPU time accounting.
  */
-void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times);
+void thread_group_cputime(struct task_struct *tsk, bool add_delta, struct task_cputime *times);
 void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times);
 
 static inline void thread_group_cputime_init(struct signal_struct *sig)
diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index 8fd709c..e56be4c 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -220,7 +220,7 @@ static int cpu_clock_sample(const clockid_t which_clock, struct task_struct *p,
 		cpu->cpu = virt_ticks(p);
 		break;
 	case CPUCLOCK_SCHED:
-		cpu->sched = task_sched_runtime(p);
+		cpu->sched = task_sched_runtime(p, true);
 		break;
 	}
 	return 0;
@@ -250,8 +250,13 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times)
 		 * values through the TIMER_ABSTIME flag, therefore we have
 		 * to synchronize the timer to the clock every time we start
 		 * it.
+		 *
+		 * Do not add the current delta, because
+		 * account_group_exec_runtime() will also this delta and we
+		 * wouldn't want to double account time and get ahead of
+		 * ourselves.
 		 */
-		thread_group_cputime(tsk, &sum);
+		thread_group_cputime(tsk, false, &sum);
 		raw_spin_lock_irqsave(&cputimer->lock, flags);
 		cputimer->running = 1;
 		update_gt_cputime(&cputimer->cputime, &sum);
@@ -275,15 +280,15 @@ static int cpu_clock_sample_group(const clockid_t which_clock,
 	default:
 		return -EINVAL;
 	case CPUCLOCK_PROF:
-		thread_group_cputime(p, &cputime);
+		thread_group_cputime(p, true, &cputime);
 		cpu->cpu = cputime.utime + cputime.stime;
 		break;
 	case CPUCLOCK_VIRT:
-		thread_group_cputime(p, &cputime);
+		thread_group_cputime(p, true, &cputime);
 		cpu->cpu = cputime.utime;
 		break;
 	case CPUCLOCK_SCHED:
-		thread_group_cputime(p, &cputime);
+		thread_group_cputime(p, true, &cputime);
 		cpu->sched = cputime.sum_exec_runtime;
 		break;
 	}
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 67d0465..ad3339f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2664,14 +2664,16 @@ unsigned long long task_delta_exec(struct task_struct *p)
  * In case the task is currently running, return the runtime plus current's
  * pending runtime that have not been accounted yet.
  */
-unsigned long long task_sched_runtime(struct task_struct *p)
+unsigned long long task_sched_runtime(struct task_struct *p, bool add_delta)
 {
 	unsigned long flags;
 	struct rq *rq;
 	u64 ns = 0;
 
 	rq = task_rq_lock(p, &flags);
-	ns = p->se.sum_exec_runtime + do_task_delta_exec(p, rq);
+	ns = p->se.sum_exec_runtime;
+	if (add_delta)
+		ns += do_task_delta_exec(p, rq);
 	task_rq_unlock(rq, p, &flags);
 
 	return ns;
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index e93cca9..69d3f6c 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -293,7 +293,7 @@ static __always_inline bool steal_account_process_tick(void)
  * Accumulate raw cputime values of dead tasks (sig->[us]time) and live
  * tasks (sum on group iteration) belonging to @tsk's group.
  */
-void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
+void thread_group_cputime(struct task_struct *tsk, bool add_delta, struct task_cputime *times)
 {
 	struct signal_struct *sig = tsk->signal;
 	cputime_t utime, stime;
@@ -313,7 +313,7 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
 		task_cputime(t, &utime, &stime);
 		times->utime += utime;
 		times->stime += stime;
-		times->sum_exec_runtime += task_sched_runtime(t);
+		times->sum_exec_runtime += task_sched_runtime(t, add_delta);
 	} while_each_thread(tsk, t);
 out:
 	rcu_read_unlock();
@@ -459,7 +459,7 @@ void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime
 {
 	struct task_cputime cputime;
 
-	thread_group_cputime(p, &cputime);
+	thread_group_cputime(p, true, &cputime);
 
 	*ut = cputime.utime;
 	*st = cputime.stime;
@@ -594,7 +594,7 @@ void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime
 {
 	struct task_cputime cputime;
 
-	thread_group_cputime(p, &cputime);
+	thread_group_cputime(p, true, &cputime);
 	cputime_adjust(&cputime, &p->signal->prev_cputime, ut, st);
 }
 #endif /* !CONFIG_VIRT_CPU_ACCOUNTING */
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/7] posix-cpu-timers: fix wrong timer initialization
  2013-05-03  4:47 [PATCH v4 0/7] posix timers fixlet kosaki.motohiro
  2013-05-03  4:47 ` [PATCH 1/7] posix-cpu-timers: don't account cpu timer after stopped thread runtime accounting kosaki.motohiro
  2013-05-03  4:47 ` [PATCH 2/7] posix-cpu-timers: fix acounting delta_exec twice kosaki.motohiro
@ 2013-05-03  4:47 ` kosaki.motohiro
  2013-05-03  4:47 ` [PATCH 4/7] posix-cpu-timers: timer functions should use timer time instead of clock time kosaki.motohiro
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: kosaki.motohiro @ 2013-05-03  4:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: Olivier Langlois, Thomas Gleixner, Frederic Weisbecker,
	Ingo Molnar, Peter Zijlstra, KOSAKI Motohiro

From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

Currently glibc's rt/tst-cputimer1 testcase sporadically fails because
a timer created by timer_create() may fire earlier than specified.

posix_cpu_timer_set() uses "val" as current time for three purpose. 1)
initialize sig->cputimer. 2) calculation "old" val. 3) calculations an
expires.

(1) and (2) should only use committed time (i.e. without delta_exec)
because run_posix_cpu_timers() don't care of delta_exec and we need
consistency, but (3) need exact current time (aka cpu clock time) because
an expires should be "now + timeout" by definition.

This patch distinguishes between two kinds of "now".

Cc: Olivier Langlois <olivier@trillion01.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 include/linux/kernel_stat.h |    5 -----
 kernel/posix-cpu-timers.c   |   14 ++++++++++++--
 kernel/sched/core.c         |   13 -------------
 3 files changed, 12 insertions(+), 20 deletions(-)

diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index ed5f6ed..f5d4fdf 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -117,11 +117,6 @@ static inline unsigned int kstat_cpu_irqs_sum(unsigned int cpu)
 	return kstat_cpu(cpu).irqs_sum;
 }
 
-/*
- * Lock/unlock the current runqueue - to extract task statistics:
- */
-extern unsigned long long task_delta_exec(struct task_struct *);
-
 extern void account_user_time(struct task_struct *, cputime_t, cputime_t);
 extern void account_system_time(struct task_struct *, int, cputime_t, cputime_t);
 extern void account_steal_time(cputime_t);
diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index e56be4c..98f354e 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -635,7 +635,7 @@ static int cpu_timer_sample_group(const clockid_t which_clock,
 		cpu->cpu = cputime.utime;
 		break;
 	case CPUCLOCK_SCHED:
-		cpu->sched = cputime.sum_exec_runtime + task_delta_exec(p);
+		cpu->sched = cputime.sum_exec_runtime;
 		break;
 	}
 	return 0;
@@ -749,7 +749,17 @@ static int posix_cpu_timer_set(struct k_itimer *timer, int flags,
 	}
 
 	if (new_expires.sched != 0 && !(flags & TIMER_ABSTIME)) {
-		cpu_time_add(timer->it_clock, &new_expires, val);
+		union cpu_time_count now;
+
+		/*
+		 * The expires is "now + timeout" by definition. So,
+		 * we need exact current time.
+		 */
+		if (CPUCLOCK_PERTHREAD(timer->it_clock))
+			now = val;
+		else
+			cpu_clock_sample_group(timer->it_clock, p, &now);
+		cpu_time_add(timer->it_clock, &new_expires, now);
 	}
 
 	/*
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ad3339f..b817e6d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2646,19 +2646,6 @@ static u64 do_task_delta_exec(struct task_struct *p, struct rq *rq)
 	return ns;
 }
 
-unsigned long long task_delta_exec(struct task_struct *p)
-{
-	unsigned long flags;
-	struct rq *rq;
-	u64 ns = 0;
-
-	rq = task_rq_lock(p, &flags);
-	ns = do_task_delta_exec(p, rq);
-	task_rq_unlock(rq, p, &flags);
-
-	return ns;
-}
-
 /*
  * Return accounted runtime for the task.
  * In case the task is currently running, return the runtime plus current's
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/7] posix-cpu-timers: timer functions should use timer time instead of clock time
  2013-05-03  4:47 [PATCH v4 0/7] posix timers fixlet kosaki.motohiro
                   ` (2 preceding siblings ...)
  2013-05-03  4:47 ` [PATCH 3/7] posix-cpu-timers: fix wrong timer initialization kosaki.motohiro
@ 2013-05-03  4:47 ` kosaki.motohiro
  2013-05-03  4:47 ` [PATCH 5/7] posix-cpu-timers: check_thread_timers() uses task_sched_runtime() kosaki.motohiro
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: kosaki.motohiro @ 2013-05-03  4:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: Olivier Langlois, Thomas Gleixner, Frederic Weisbecker,
	Ingo Molnar, Peter Zijlstra, KOSAKI Motohiro

From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

For process timers, we use cpu_clock_sample_group() and cpu_timer_sample_group()
correctly. However, for thread timers, we always use cpu_clock_sample(). This is
wrong because a cpu_clock_sample() accounts uncommitted delta_exec too. And this
is inconsistent against run_posix_cpu_tiemrs().

This is not big matter because the following workaround code in
posix_cpu_timer_get() hides the timer miscalculation issue. However, it makes
rq lock held wrongly and it would be better to fix it.

  if (cpu_time_before(timer->it_clock, now, timer->it.cpu.expires)) {
    ...
  } else {
    /*
     * The timer should have expired already, but the firing
     * hasn't taken place yet.  Say it's just about to expire.
     */
     itp->it_value.tv_nsec = 1;
     itp->it_value.tv_sec = 0;

Cc: Olivier Langlois <olivier@trillion01.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 kernel/posix-cpu-timers.c |   37 +++++++++++++++++++++++++++----------
 1 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index 98f354e..c69b8d8 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -204,11 +204,10 @@ posix_cpu_clock_set(const clockid_t which_clock, const struct timespec *tp)
 }
 
 
-/*
- * Sample a per-thread clock for the given task.
- */
-static int cpu_clock_sample(const clockid_t which_clock, struct task_struct *p,
-			    union cpu_time_count *cpu)
+static int do_cpu_clock_timer_sample(const clockid_t which_clock,
+				     struct task_struct *p,
+				     bool add_delta,
+				     union cpu_time_count *cpu)
 {
 	switch (CPUCLOCK_WHICH(which_clock)) {
 	default:
@@ -220,12 +219,30 @@ static int cpu_clock_sample(const clockid_t which_clock, struct task_struct *p,
 		cpu->cpu = virt_ticks(p);
 		break;
 	case CPUCLOCK_SCHED:
-		cpu->sched = task_sched_runtime(p, true);
+		cpu->sched = task_sched_runtime(p, add_delta);
 		break;
 	}
 	return 0;
 }
 
+/*
+ * Sample a per-thread clock for the given task.
+ */
+static int cpu_clock_sample(const clockid_t which_clock, struct task_struct *p,
+			    union cpu_time_count *cpu)
+{
+	return do_cpu_clock_timer_sample(which_clock, p, true, cpu);
+}
+
+/*
+ * Sample a per-thread timer clock for the given task.
+ */
+static int cpu_timer_sample(const clockid_t which_clock, struct task_struct *p,
+			    union cpu_time_count *cpu)
+{
+	return do_cpu_clock_timer_sample(which_clock, p, false, cpu);
+}
+
 static void update_gt_cputime(struct task_cputime *a, struct task_cputime *b)
 {
 	if (b->utime > a->utime)
@@ -700,7 +717,7 @@ static int posix_cpu_timer_set(struct k_itimer *timer, int flags,
 	 * check if it's already passed.  In short, we need a sample.
 	 */
 	if (CPUCLOCK_PERTHREAD(timer->it_clock)) {
-		cpu_clock_sample(timer->it_clock, p, &val);
+		cpu_timer_sample(timer->it_clock, p, &val);
 	} else {
 		cpu_timer_sample_group(timer->it_clock, p, &val);
 	}
@@ -756,7 +773,7 @@ static int posix_cpu_timer_set(struct k_itimer *timer, int flags,
 		 * we need exact current time.
 		 */
 		if (CPUCLOCK_PERTHREAD(timer->it_clock))
-			now = val;
+			cpu_clock_sample(timer->it_clock, p, &now);
 		else
 			cpu_clock_sample_group(timer->it_clock, p, &now);
 		cpu_time_add(timer->it_clock, &new_expires, now);
@@ -844,7 +861,7 @@ static void posix_cpu_timer_get(struct k_itimer *timer, struct itimerspec *itp)
 	 * Sample the clock to take the difference with the expiry time.
 	 */
 	if (CPUCLOCK_PERTHREAD(timer->it_clock)) {
-		cpu_clock_sample(timer->it_clock, p, &now);
+		cpu_timer_sample(timer->it_clock, p, &now);
 		clear_dead = p->exit_state;
 	} else {
 		read_lock(&tasklist_lock);
@@ -1168,7 +1185,7 @@ void posix_cpu_timer_schedule(struct k_itimer *timer)
 	 * Fetch the current sample and update the timer's expiry time.
 	 */
 	if (CPUCLOCK_PERTHREAD(timer->it_clock)) {
-		cpu_clock_sample(timer->it_clock, p, &now);
+		cpu_timer_sample(timer->it_clock, p, &now);
 		bump_cpu_timer(timer, now);
 		if (unlikely(p->exit_state)) {
 			clear_dead_task(timer, now);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 5/7] posix-cpu-timers: check_thread_timers() uses task_sched_runtime()
  2013-05-03  4:47 [PATCH v4 0/7] posix timers fixlet kosaki.motohiro
                   ` (3 preceding siblings ...)
  2013-05-03  4:47 ` [PATCH 4/7] posix-cpu-timers: timer functions should use timer time instead of clock time kosaki.motohiro
@ 2013-05-03  4:47 ` kosaki.motohiro
  2013-05-03  4:47 ` [PATCH 6/7] sched: task_sched_runtime introduce micro optimization kosaki.motohiro
  2013-05-03  4:47 ` [PATCH 7/7] posix-cpu-timers: cleanup cpu_{clock,timer}_sample{,_group} kosaki.motohiro
  6 siblings, 0 replies; 17+ messages in thread
From: kosaki.motohiro @ 2013-05-03  4:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: Olivier Langlois, Thomas Gleixner, Frederic Weisbecker,
	Ingo Molnar, Peter Zijlstra, KOSAKI Motohiro

From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

A type of tsk->se.sum_exec_runtime is u64. Thus, reading it is racy when
running 32bit. We should use task_sched_runtime().

Cc: Olivier Langlois <olivier@trillion01.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 kernel/posix-cpu-timers.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index c69b8d8..4043282 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -958,7 +958,8 @@ static void check_thread_timers(struct task_struct *tsk,
 		struct cpu_timer_list *t = list_first_entry(timers,
 						      struct cpu_timer_list,
 						      entry);
-		if (!--maxfire || tsk->se.sum_exec_runtime < t->expires.sched) {
+		unsigned long long runtime = task_sched_runtime(tsk, false);
+		if (!--maxfire || runtime < t->expires.sched) {
 			tsk->cputime_expires.sched_exp = t->expires.sched;
 			break;
 		}
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 6/7] sched: task_sched_runtime introduce micro optimization
  2013-05-03  4:47 [PATCH v4 0/7] posix timers fixlet kosaki.motohiro
                   ` (4 preceding siblings ...)
  2013-05-03  4:47 ` [PATCH 5/7] posix-cpu-timers: check_thread_timers() uses task_sched_runtime() kosaki.motohiro
@ 2013-05-03  4:47 ` kosaki.motohiro
  2013-05-03  4:47 ` [PATCH 7/7] posix-cpu-timers: cleanup cpu_{clock,timer}_sample{,_group} kosaki.motohiro
  6 siblings, 0 replies; 17+ messages in thread
From: kosaki.motohiro @ 2013-05-03  4:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: Olivier Langlois, Thomas Gleixner, Frederic Weisbecker,
	Ingo Molnar, Peter Zijlstra, KOSAKI Motohiro

From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

rq lock in task_sched_runtime() is necessary for two reasons. 1)
accessing se.sum_exec_runtime is not atomic on 32bit and 2)
do_task_delta_exec() require it.

So, 64bit can avoid holding rq lock when add_delta is false and
delta_exec is 0.

Cc: Olivier Langlois <olivier@trillion01.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Suggested-by: Paul Turner <pjt@google.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 kernel/sched/core.c |   15 +++++++++++++++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b817e6d..75872c3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2657,6 +2657,21 @@ unsigned long long task_sched_runtime(struct task_struct *p, bool add_delta)
 	struct rq *rq;
 	u64 ns = 0;
 
+#ifdef CONFIG_64BIT
+	/*
+	 * 64-bit doesn't need locks to atomically read a 64bit value. So we
+	 * have two optimization chances, 1) when caller doesn't need
+	 * delta_exec and 2) when the task's delta_exec is 0. The former is
+	 * obvious. The latter is complicated. reading ->on_cpu is racy, but
+	 * this is ok. If we race with it leaving cpu, we'll take a lock. So
+	 * we're correct. If we race with it entering cpu, unaccounted time
+	 * is 0. This is indistinguishable from the read occurring a few
+	 * cycles earlier.
+	 */
+	if (!add_delta || !p->on_cpu)
+		return p->se.sum_exec_runtime;
+#endif
+
 	rq = task_rq_lock(p, &flags);
 	ns = p->se.sum_exec_runtime;
 	if (add_delta)
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 7/7] posix-cpu-timers: cleanup cpu_{clock,timer}_sample{,_group}
  2013-05-03  4:47 [PATCH v4 0/7] posix timers fixlet kosaki.motohiro
                   ` (5 preceding siblings ...)
  2013-05-03  4:47 ` [PATCH 6/7] sched: task_sched_runtime introduce micro optimization kosaki.motohiro
@ 2013-05-03  4:47 ` kosaki.motohiro
  6 siblings, 0 replies; 17+ messages in thread
From: kosaki.motohiro @ 2013-05-03  4:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: Olivier Langlois, Thomas Gleixner, Frederic Weisbecker,
	Ingo Molnar, Peter Zijlstra, KOSAKI Motohiro

From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

Now we have four similar timer related functions, cpu_clock_sample(),
cpu_clock_sample_group(), cpu_timer_sample() and cpu_timer_sample_group().

For readability, make do_cpu_clock_timer_sample() and thread_cputime()
helper functions and all *_sample functions use these.

Cc: Olivier Langlois <olivier@trillion01.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 include/linux/sched.h     |    1 +
 kernel/posix-cpu-timers.c |  132 +++++++++++++++++++-------------------------
 kernel/sched/cputime.c    |   11 ++++
 3 files changed, 69 insertions(+), 75 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 7863d4b..73ac8fa 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2622,6 +2622,7 @@ static inline int spin_needbreak(spinlock_t *lock)
 #endif
 }
 
+void thread_cputime(struct task_struct *tsk, bool add_delta, struct task_cputime *times);
 /*
  * Thread group CPU time accounting.
  */
diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index 4043282..b33290d 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -203,46 +203,6 @@ posix_cpu_clock_set(const clockid_t which_clock, const struct timespec *tp)
 	return error;
 }
 
-
-static int do_cpu_clock_timer_sample(const clockid_t which_clock,
-				     struct task_struct *p,
-				     bool add_delta,
-				     union cpu_time_count *cpu)
-{
-	switch (CPUCLOCK_WHICH(which_clock)) {
-	default:
-		return -EINVAL;
-	case CPUCLOCK_PROF:
-		cpu->cpu = prof_ticks(p);
-		break;
-	case CPUCLOCK_VIRT:
-		cpu->cpu = virt_ticks(p);
-		break;
-	case CPUCLOCK_SCHED:
-		cpu->sched = task_sched_runtime(p, add_delta);
-		break;
-	}
-	return 0;
-}
-
-/*
- * Sample a per-thread clock for the given task.
- */
-static int cpu_clock_sample(const clockid_t which_clock, struct task_struct *p,
-			    union cpu_time_count *cpu)
-{
-	return do_cpu_clock_timer_sample(which_clock, p, true, cpu);
-}
-
-/*
- * Sample a per-thread timer clock for the given task.
- */
-static int cpu_timer_sample(const clockid_t which_clock, struct task_struct *p,
-			    union cpu_time_count *cpu)
-{
-	return do_cpu_clock_timer_sample(which_clock, p, false, cpu);
-}
-
 static void update_gt_cputime(struct task_cputime *a, struct task_cputime *b)
 {
 	if (b->utime > a->utime)
@@ -284,34 +244,83 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times)
 }
 
 /*
- * Sample a process (thread group) clock for the given group_leader task.
- * Must be called with tasklist_lock held for reading.
+ * Sample time for the given task.
+ * @which_clock:	Clock id.
+ * @p:			Target task. Must be thread group leader when
+ *			thread_group is true.
+ * @thread_group:	True if want to get process time.
+ * @add_delta:		True if want to get clock time,
+ *			otherwise, get timer time.
  */
-static int cpu_clock_sample_group(const clockid_t which_clock,
-				  struct task_struct *p,
-				  union cpu_time_count *cpu)
+static int do_cpu_clock_timer_sample(const clockid_t which_clock,
+				     struct task_struct *p,
+				     bool thread_group,
+				     bool clock_time,
+				     union cpu_time_count *cpu)
 {
 	struct task_cputime cputime;
 
+	if (thread_group) {
+		if (clock_time)
+			thread_group_cputime(p, true, &cputime);
+		else
+			thread_group_cputimer(p, &cputime);
+	} else
+		thread_cputime(p, clock_time, &cputime);
+
 	switch (CPUCLOCK_WHICH(which_clock)) {
 	default:
 		return -EINVAL;
 	case CPUCLOCK_PROF:
-		thread_group_cputime(p, true, &cputime);
 		cpu->cpu = cputime.utime + cputime.stime;
 		break;
 	case CPUCLOCK_VIRT:
-		thread_group_cputime(p, true, &cputime);
 		cpu->cpu = cputime.utime;
 		break;
 	case CPUCLOCK_SCHED:
-		thread_group_cputime(p, true, &cputime);
 		cpu->sched = cputime.sum_exec_runtime;
 		break;
 	}
 	return 0;
 }
 
+/*
+ * Sample a per-thread clock time for the given task.
+ */
+static int cpu_clock_sample(const clockid_t which_clock, struct task_struct *p,
+			    union cpu_time_count *cpu)
+{
+	return do_cpu_clock_timer_sample(which_clock, p, false, true, cpu);
+}
+
+/*
+ * Sample a per-process clock time for the given task.
+ */
+static int cpu_clock_sample_group(const clockid_t which_clock,
+				  struct task_struct *p,
+				  union cpu_time_count *cpu)
+{
+	return do_cpu_clock_timer_sample(which_clock, p, true, true, cpu);
+}
+
+/*
+ * Sample a per-thread timer time for the given task.
+ */
+static int cpu_timer_sample(const clockid_t which_clock, struct task_struct *p,
+			    union cpu_time_count *cpu)
+{
+	return do_cpu_clock_timer_sample(which_clock, p, false, false, cpu);
+}
+
+/*
+ * Sample a per-process timer time for the given task.
+ */
+static int cpu_timer_sample_group(const clockid_t which_clock,
+				  struct task_struct *p,
+				  union cpu_time_count *cpu)
+{
+	return do_cpu_clock_timer_sample(which_clock, p, true, false, cpu);
+}
 
 static int posix_cpu_clock_get(const clockid_t which_clock, struct timespec *tp)
 {
@@ -632,33 +641,6 @@ static void cpu_timer_fire(struct k_itimer *timer)
 }
 
 /*
- * Sample a process (thread group) timer for the given group_leader task.
- * Must be called with tasklist_lock held for reading.
- */
-static int cpu_timer_sample_group(const clockid_t which_clock,
-				  struct task_struct *p,
-				  union cpu_time_count *cpu)
-{
-	struct task_cputime cputime;
-
-	thread_group_cputimer(p, &cputime);
-	switch (CPUCLOCK_WHICH(which_clock)) {
-	default:
-		return -EINVAL;
-	case CPUCLOCK_PROF:
-		cpu->cpu = cputime.utime + cputime.stime;
-		break;
-	case CPUCLOCK_VIRT:
-		cpu->cpu = cputime.utime;
-		break;
-	case CPUCLOCK_SCHED:
-		cpu->sched = cputime.sum_exec_runtime;
-		break;
-	}
-	return 0;
-}
-
-/*
  * Guts of sys_timer_settime for CPU timers.
  * This is called with the timer locked and interrupts disabled.
  * If we return TIMER_RETRY, it's necessary to release the timer's lock
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 69d3f6c..29ef7d7 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -289,6 +289,17 @@ static __always_inline bool steal_account_process_tick(void)
 	return false;
 }
 
+void thread_cputime(struct task_struct *tsk, bool add_delta, struct task_cputime *times)
+{
+	struct signal_struct *sig = tsk->signal;
+	cputime_t utime, stime;
+
+	task_cputime(tsk, &utime, &stime);
+	times->utime = utime;
+	times->stime = stime;
+	times->sum_exec_runtime = task_sched_runtime(tsk, add_delta);
+}
+
 /*
  * Accumulate raw cputime values of dead tasks (sig->[us]time) and live
  * tasks (sum on group iteration) belonging to @tsk's group.
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/7] posix-cpu-timers: don't account cpu timer after stopped thread runtime accounting
  2013-05-03  4:47 ` [PATCH 1/7] posix-cpu-timers: don't account cpu timer after stopped thread runtime accounting kosaki.motohiro
@ 2013-05-06 23:47   ` Frederic Weisbecker
  2013-05-07  2:57     ` KOSAKI Motohiro
  2013-05-07  3:16     ` Olivier Langlois
  0 siblings, 2 replies; 17+ messages in thread
From: Frederic Weisbecker @ 2013-05-06 23:47 UTC (permalink / raw)
  To: kosaki.motohiro
  Cc: linux-kernel, Olivier Langlois, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, KOSAKI Motohiro

On Fri, May 03, 2013 at 12:47:42AM -0400, kosaki.motohiro@gmail.com wrote:
> From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> 
> When tsk->signal->cputimer->running is 1, signal->cputimer and
> tsk->sum_sched_runtime increase at the same pace because update_curr()
> increases both accounting.
> 
> However, there is one exception. When thread exiting, __exit_signal() turns
> over task's sum_shced_runtime to sig->sum_sched_runtime, but it doesn't stop
> signal->cputimer accounting.
> 
> This inconsistency makes POSIX timer wake up too early. This patch fixes it.
> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Acked-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Olivier Langlois <olivier@trillion01.com>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> ---
>  kernel/sched/stats.h |    7 +++++++
>  1 files changed, 7 insertions(+), 0 deletions(-)
> 
> diff --git a/kernel/sched/stats.h b/kernel/sched/stats.h
> index 2ef90a5..5a0cfc4 100644
> --- a/kernel/sched/stats.h
> +++ b/kernel/sched/stats.h
> @@ -225,6 +225,13 @@ static inline void account_group_exec_runtime(struct task_struct *tsk,
>  	if (!cputimer->running)
>  		return;
>  
> +	/*
> +	 * After turning over se.sum_exec_runtime to sig->sum_sched_runtime
> +	 * in __exit_signal(), we must not account exec_runtime for consistency.
> +	 */
> +	if (unlikely(!tsk->sighand))
> +		return;

Ok, if we want the clock and timer to be consistent, do we also want the same check in
account_group_user_time() and account_group_system_time()? The task can still account
a tick after autoreaping itself between release_task() and the final schedule().

> +
>  	raw_spin_lock(&cputimer->lock);
>  	cputimer->cputime.sum_exec_runtime += ns;
>  	raw_spin_unlock(&cputimer->lock);
> -- 
> 1.7.1
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/7] posix-cpu-timers: don't account cpu timer after stopped thread runtime accounting
  2013-05-06 23:47   ` Frederic Weisbecker
@ 2013-05-07  2:57     ` KOSAKI Motohiro
  2013-05-07 15:24       ` Frederic Weisbecker
  2013-05-07  3:16     ` Olivier Langlois
  1 sibling, 1 reply; 17+ messages in thread
From: KOSAKI Motohiro @ 2013-05-07  2:57 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Olivier Langlois, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra

>> +     /*
>> +      * After turning over se.sum_exec_runtime to sig->sum_sched_runtime
>> +      * in __exit_signal(), we must not account exec_runtime for consistency.
>> +      */
>> +     if (unlikely(!tsk->sighand))
>> +             return;
>
> Ok, if we want the clock and timer to be consistent, do we also want the same check in
> account_group_user_time() and account_group_system_time()? The task can still account
> a tick after autoreaping itself between release_task() and the final schedule().

You are right.

That said, current the man pages don't describe this linux specific
extensions. So, nobody
(glibc, ltp, and me) tested them. Please give me a couple of days.
I'll test and fix this features
too.

timer_create(2): http://man7.org/linux/man-pages/man2/timer_create.2.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/7] posix-cpu-timers: don't account cpu timer after stopped thread runtime accounting
  2013-05-06 23:47   ` Frederic Weisbecker
  2013-05-07  2:57     ` KOSAKI Motohiro
@ 2013-05-07  3:16     ` Olivier Langlois
  2013-05-11  0:17       ` Frederic Weisbecker
  1 sibling, 1 reply; 17+ messages in thread
From: Olivier Langlois @ 2013-05-07  3:16 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: kosaki.motohiro, linux-kernel, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, KOSAKI Motohiro

On Tue, 2013-05-07 at 01:47 +0200, Frederic Weisbecker wrote:
> On Fri, May 03, 2013 at 12:47:42AM -0400, kosaki.motohiro@gmail.com wrote:
> > From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > 
> > When tsk->signal->cputimer->running is 1, signal->cputimer and
> > tsk->sum_sched_runtime increase at the same pace because update_curr()
> > increases both accounting.
> > 
> > However, there is one exception. When thread exiting, __exit_signal() turns
> > over task's sum_shced_runtime to sig->sum_sched_runtime, but it doesn't stop
> > signal->cputimer accounting.
> > 
> > This inconsistency makes POSIX timer wake up too early. This patch fixes it.
> > 
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Frederic Weisbecker <fweisbec@gmail.com>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Acked-by: Peter Zijlstra <peterz@infradead.org>
> > Signed-off-by: Olivier Langlois <olivier@trillion01.com>
> > Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > ---
> >  kernel/sched/stats.h |    7 +++++++
> >  1 files changed, 7 insertions(+), 0 deletions(-)
> > 
> > diff --git a/kernel/sched/stats.h b/kernel/sched/stats.h
> > index 2ef90a5..5a0cfc4 100644
> > --- a/kernel/sched/stats.h
> > +++ b/kernel/sched/stats.h
> > @@ -225,6 +225,13 @@ static inline void account_group_exec_runtime(struct task_struct *tsk,
> >       if (!cputimer->running)
> >               return;
> >  
> > +     /*
> > +      * After turning over se.sum_exec_runtime to sig->sum_sched_runtime
> > +      * in __exit_signal(), we must not account exec_runtime for consistency.
> > +      */
> > +     if (unlikely(!tsk->sighand))
> > +             return;
> 
> Ok, if we want the clock and timer to be consistent, do we also want the same check in
> account_group_user_time() and account_group_system_time()? The task can still account
> a tick after autoreaping itself between release_task() and the final schedule().
> 
If I can reply, I believe that Kosaki refactoring is superior to my
initial proposal because:

1. Condition will only be evaluated when cputimer is on (and not slow
down scheduler when it is off)
2. It handles all the cases when this function is called. There are few
other places outside update_curr() that this function is called.




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/7] posix-cpu-timers: don't account cpu timer after stopped thread runtime accounting
  2013-05-07  2:57     ` KOSAKI Motohiro
@ 2013-05-07 15:24       ` Frederic Weisbecker
  2013-05-26 21:38         ` KOSAKI Motohiro
  0 siblings, 1 reply; 17+ messages in thread
From: Frederic Weisbecker @ 2013-05-07 15:24 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, Olivier Langlois, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra

2013/5/7 KOSAKI Motohiro <kosaki.motohiro@gmail.com>:
>>> +     /*
>>> +      * After turning over se.sum_exec_runtime to sig->sum_sched_runtime
>>> +      * in __exit_signal(), we must not account exec_runtime for consistency.
>>> +      */
>>> +     if (unlikely(!tsk->sighand))
>>> +             return;
>>
>> Ok, if we want the clock and timer to be consistent, do we also want the same check in
>> account_group_user_time() and account_group_system_time()? The task can still account
>> a tick after autoreaping itself between release_task() and the final schedule().
>
> You are right.
>
> That said, current the man pages don't describe this linux specific
> extensions. So, nobody
> (glibc, ltp, and me) tested them. Please give me a couple of days.
> I'll test and fix this features
> too.
>
> timer_create(2): http://man7.org/linux/man-pages/man2/timer_create.2.html

Ah, indeed timer_create() seem to only create CPUCLOCK_SCHED timers. So that
issue with timer_gettime becoming asynchonous with clock_gettime can't happen
with PROF and VIRT clocks

I see itimers can use those clocks. But there don't seem to be a
similar issue with
getitimer/setitimer as they don't have matching clock reads.

Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/7] posix-cpu-timers: don't account cpu timer after stopped thread runtime accounting
  2013-05-07  3:16     ` Olivier Langlois
@ 2013-05-11  0:17       ` Frederic Weisbecker
  2013-05-11  2:40         ` KOSAKI Motohiro
  0 siblings, 1 reply; 17+ messages in thread
From: Frederic Weisbecker @ 2013-05-11  0:17 UTC (permalink / raw)
  To: Olivier Langlois
  Cc: kosaki.motohiro, linux-kernel, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, KOSAKI Motohiro

On Mon, May 06, 2013 at 11:16:40PM -0400, Olivier Langlois wrote:
> On Tue, 2013-05-07 at 01:47 +0200, Frederic Weisbecker wrote:
> > On Fri, May 03, 2013 at 12:47:42AM -0400, kosaki.motohiro@gmail.com wrote:
> > > From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > > 
> > > When tsk->signal->cputimer->running is 1, signal->cputimer and
> > > tsk->sum_sched_runtime increase at the same pace because update_curr()
> > > increases both accounting.
> > > 
> > > However, there is one exception. When thread exiting, __exit_signal() turns
> > > over task's sum_shced_runtime to sig->sum_sched_runtime, but it doesn't stop
> > > signal->cputimer accounting.
> > > 
> > > This inconsistency makes POSIX timer wake up too early. This patch fixes it.
> > > 
> > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > Cc: Frederic Weisbecker <fweisbec@gmail.com>
> > > Cc: Ingo Molnar <mingo@kernel.org>
> > > Acked-by: Peter Zijlstra <peterz@infradead.org>
> > > Signed-off-by: Olivier Langlois <olivier@trillion01.com>
> > > Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > > ---
> > >  kernel/sched/stats.h |    7 +++++++
> > >  1 files changed, 7 insertions(+), 0 deletions(-)
> > > 
> > > diff --git a/kernel/sched/stats.h b/kernel/sched/stats.h
> > > index 2ef90a5..5a0cfc4 100644
> > > --- a/kernel/sched/stats.h
> > > +++ b/kernel/sched/stats.h
> > > @@ -225,6 +225,13 @@ static inline void account_group_exec_runtime(struct task_struct *tsk,
> > >       if (!cputimer->running)
> > >               return;
> > >  
> > > +     /*
> > > +      * After turning over se.sum_exec_runtime to sig->sum_sched_runtime
> > > +      * in __exit_signal(), we must not account exec_runtime for consistency.
> > > +      */

Please just precise the nature of that consistency: the fact we want CLOCK_PROCESS_CPUTIME_ID
clock and timer to be consistent.

> > > +     if (unlikely(!tsk->sighand))
> > > +             return;
> > 
> > Ok, if we want the clock and timer to be consistent, do we also want the same check in
> > account_group_user_time() and account_group_system_time()? The task can still account
> > a tick after autoreaping itself between release_task() and the final schedule().
> > 
> If I can reply, I believe that Kosaki refactoring is superior to my
> initial proposal because:
> 
> 1. Condition will only be evaluated when cputimer is on (and not slow
> down scheduler when it is off)
> 2. It handles all the cases when this function is called. There are few
> other places outside update_curr() that this function is called.

Agreed. Thanks!

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/7] posix-cpu-timers: fix acounting delta_exec twice
  2013-05-03  4:47 ` [PATCH 2/7] posix-cpu-timers: fix acounting delta_exec twice kosaki.motohiro
@ 2013-05-11  0:56   ` Frederic Weisbecker
  2013-05-11  2:17     ` KOSAKI Motohiro
  0 siblings, 1 reply; 17+ messages in thread
From: Frederic Weisbecker @ 2013-05-11  0:56 UTC (permalink / raw)
  To: kosaki.motohiro
  Cc: linux-kernel, Olivier Langlois, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, KOSAKI Motohiro

On Fri, May 03, 2013 at 12:47:43AM -0400, kosaki.motohiro@gmail.com wrote:
> From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> 
> Currently glibc rt/tst-cpuclock2 test(*) sporadically fails because
> scheduler delta can be accounted twice from thread_group_cputimer()
> and account_group_exec_runtime().
> 
> Finally, clock_nanosleep() wakes up before an argument. This is posix
> violation. This issue was introduced by commit d670ec1317 (posix-cpu-timers:
> Cure SMP wobbles).
> 
> (*) http://sourceware.org/git/?p=glibc.git;a=blob;f=rt/tst-cpuclock2.c;h=6752721717f959e89c0d692b3f1ee082d507eec2;hb=HEAD
> 
> Cc: Olivier Langlois <olivier@trillion01.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
[...]
> diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
> index 8fd709c..e56be4c 100644
> --- a/kernel/posix-cpu-timers.c
> +++ b/kernel/posix-cpu-timers.c
> @@ -220,7 +220,7 @@ static int cpu_clock_sample(const clockid_t which_clock, struct task_struct *p,
>  		cpu->cpu = virt_ticks(p);
>  		break;
>  	case CPUCLOCK_SCHED:
> -		cpu->sched = task_sched_runtime(p);
> +		cpu->sched = task_sched_runtime(p, true);
>  		break;
>  	}
>  	return 0;
> @@ -250,8 +250,13 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times)
>  		 * values through the TIMER_ABSTIME flag, therefore we have
>  		 * to synchronize the timer to the clock every time we start
>  		 * it.
> +		 *
> +		 * Do not add the current delta, because
> +		 * account_group_exec_runtime() will also this delta and we
> +		 * wouldn't want to double account time and get ahead of
> +		 * ourselves.
>  		 */
> -		thread_group_cputime(tsk, &sum);
> +		thread_group_cputime(tsk, false, &sum);
>  		raw_spin_lock_irqsave(&cputimer->lock, flags);

I wonder if we should move thread_group_cputime() inside this lock.
Otherwise we can miss some updates in-between.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/7] posix-cpu-timers: fix acounting delta_exec twice
  2013-05-11  0:56   ` Frederic Weisbecker
@ 2013-05-11  2:17     ` KOSAKI Motohiro
  0 siblings, 0 replies; 17+ messages in thread
From: KOSAKI Motohiro @ 2013-05-11  2:17 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Olivier Langlois, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra

>> @@ -250,8 +250,13 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times)
>>                * values through the TIMER_ABSTIME flag, therefore we have
>>                * to synchronize the timer to the clock every time we start
>>                * it.
>> +              *
>> +              * Do not add the current delta, because
>> +              * account_group_exec_runtime() will also this delta and we
>> +              * wouldn't want to double account time and get ahead of
>> +              * ourselves.
>>                */
>> -             thread_group_cputime(tsk, &sum);
>> +             thread_group_cputime(tsk, false, &sum);
>>               raw_spin_lock_irqsave(&cputimer->lock, flags);
>
> I wonder if we should move thread_group_cputime() inside this lock.
> Otherwise we can miss some updates in-between.

Hmm..

I don't agree with this. Right, we can miss some updates. But 1)
cputimer->lock doesn't
prevent any update update_curr() only take rq_lock, and 2) POSIX timer
and sleeping
semantics allow longer sleep than an argument. Then, the missing is
safe, nobody
can observe which of the timer_setime() syscall and update_curr()
happened earlier.

Ah, I'm now finding when update_gt_cputime() effectively work. It
helps to avoid timer_settime() vs timer_settime() mess.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/7] posix-cpu-timers: don't account cpu timer after stopped thread runtime accounting
  2013-05-11  0:17       ` Frederic Weisbecker
@ 2013-05-11  2:40         ` KOSAKI Motohiro
  0 siblings, 0 replies; 17+ messages in thread
From: KOSAKI Motohiro @ 2013-05-11  2:40 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Olivier Langlois, LKML, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra

On Fri, May 10, 2013 at 8:17 PM, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> On Mon, May 06, 2013 at 11:16:40PM -0400, Olivier Langlois wrote:
>> On Tue, 2013-05-07 at 01:47 +0200, Frederic Weisbecker wrote:
>> > On Fri, May 03, 2013 at 12:47:42AM -0400, kosaki.motohiro@gmail.com wrote:
>> > > From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>> > >
>> > > When tsk->signal->cputimer->running is 1, signal->cputimer and
>> > > tsk->sum_sched_runtime increase at the same pace because update_curr()
>> > > increases both accounting.
>> > >
>> > > However, there is one exception. When thread exiting, __exit_signal() turns
>> > > over task's sum_shced_runtime to sig->sum_sched_runtime, but it doesn't stop
>> > > signal->cputimer accounting.
>> > >
>> > > This inconsistency makes POSIX timer wake up too early. This patch fixes it.
>> > >
>> > > Cc: Thomas Gleixner <tglx@linutronix.de>
>> > > Cc: Frederic Weisbecker <fweisbec@gmail.com>
>> > > Cc: Ingo Molnar <mingo@kernel.org>
>> > > Acked-by: Peter Zijlstra <peterz@infradead.org>
>> > > Signed-off-by: Olivier Langlois <olivier@trillion01.com>
>> > > Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>> > > ---
>> > >  kernel/sched/stats.h |    7 +++++++
>> > >  1 files changed, 7 insertions(+), 0 deletions(-)
>> > >
>> > > diff --git a/kernel/sched/stats.h b/kernel/sched/stats.h
>> > > index 2ef90a5..5a0cfc4 100644
>> > > --- a/kernel/sched/stats.h
>> > > +++ b/kernel/sched/stats.h
>> > > @@ -225,6 +225,13 @@ static inline void account_group_exec_runtime(struct task_struct *tsk,
>> > >       if (!cputimer->running)
>> > >               return;
>> > >
>> > > +     /*
>> > > +      * After turning over se.sum_exec_runtime to sig->sum_sched_runtime
>> > > +      * in __exit_signal(), we must not account exec_runtime for consistency.
>> > > +      */
>
> Please just precise the nature of that consistency: the fact we want CLOCK_PROCESS_CPUTIME_ID
> clock and timer to be consistent.

typo? This patch fixes an inconsistency between a thread cputime and a
process cputime.

How is this?

/*
* After turning over se.sum_exec_runtime to sig->sum_sched_runtime
* in __exit_signal(), a per-thread cputime of the thread will be lost. We
* must not account exec_runtime here too because we need to keep
* consistent cputime between per-thread and per-process. Otherwise,
* the inconsistency is observable when single thread program run.
*/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/7] posix-cpu-timers: don't account cpu timer after stopped thread runtime accounting
  2013-05-07 15:24       ` Frederic Weisbecker
@ 2013-05-26 21:38         ` KOSAKI Motohiro
  0 siblings, 0 replies; 17+ messages in thread
From: KOSAKI Motohiro @ 2013-05-26 21:38 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: KOSAKI Motohiro, LKML, Olivier Langlois, Thomas Gleixner,
	Ingo Molnar, Peter Zijlstra

(5/7/13 11:24 AM), Frederic Weisbecker wrote:
> 2013/5/7 KOSAKI Motohiro <kosaki.motohiro@gmail.com>:
>>>> +     /*
>>>> +      * After turning over se.sum_exec_runtime to sig->sum_sched_runtime
>>>> +      * in __exit_signal(), we must not account exec_runtime for consistency.
>>>> +      */
>>>> +     if (unlikely(!tsk->sighand))
>>>> +             return;
>>>
>>> Ok, if we want the clock and timer to be consistent, do we also want the same check in
>>> account_group_user_time() and account_group_system_time()? The task can still account
>>> a tick after autoreaping itself between release_task() and the final schedule().
>>
>> You are right.
>>
>> That said, current the man pages don't describe this linux specific
>> extensions. So, nobody
>> (glibc, ltp, and me) tested them. Please give me a couple of days.
>> I'll test and fix this features
>> too.
>>
>> timer_create(2): http://man7.org/linux/man-pages/man2/timer_create.2.html
> 
> Ah, indeed timer_create() seem to only create CPUCLOCK_SCHED timers. So that
> issue with timer_gettime becoming asynchonous with clock_gettime can't happen
> with PROF and VIRT clocks
> 
> I see itimers can use those clocks. But there don't seem to be a
> similar issue with
> getitimer/setitimer as they don't have matching clock reads.

OK. I've found PROF and VIRT clock of posix timer have a bug and I could narrow down
and fixed it. Please see my next iteration.

Thanks.



^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2013-05-26 21:38 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-05-03  4:47 [PATCH v4 0/7] posix timers fixlet kosaki.motohiro
2013-05-03  4:47 ` [PATCH 1/7] posix-cpu-timers: don't account cpu timer after stopped thread runtime accounting kosaki.motohiro
2013-05-06 23:47   ` Frederic Weisbecker
2013-05-07  2:57     ` KOSAKI Motohiro
2013-05-07 15:24       ` Frederic Weisbecker
2013-05-26 21:38         ` KOSAKI Motohiro
2013-05-07  3:16     ` Olivier Langlois
2013-05-11  0:17       ` Frederic Weisbecker
2013-05-11  2:40         ` KOSAKI Motohiro
2013-05-03  4:47 ` [PATCH 2/7] posix-cpu-timers: fix acounting delta_exec twice kosaki.motohiro
2013-05-11  0:56   ` Frederic Weisbecker
2013-05-11  2:17     ` KOSAKI Motohiro
2013-05-03  4:47 ` [PATCH 3/7] posix-cpu-timers: fix wrong timer initialization kosaki.motohiro
2013-05-03  4:47 ` [PATCH 4/7] posix-cpu-timers: timer functions should use timer time instead of clock time kosaki.motohiro
2013-05-03  4:47 ` [PATCH 5/7] posix-cpu-timers: check_thread_timers() uses task_sched_runtime() kosaki.motohiro
2013-05-03  4:47 ` [PATCH 6/7] sched: task_sched_runtime introduce micro optimization kosaki.motohiro
2013-05-03  4:47 ` [PATCH 7/7] posix-cpu-timers: cleanup cpu_{clock,timer}_sample{,_group} kosaki.motohiro

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox