* [PATCH 1/3] posix-timers: Correctly get dying task time sample in posix_cpu_timer_schedule()
2013-03-30 13:15 [PATCH 0/3] posix_timers: A few expiry caching fixes Frederic Weisbecker
@ 2013-03-30 13:15 ` Frederic Weisbecker
2013-04-01 0:07 ` Chen Gang
2013-03-30 13:15 ` [PATCH 2/3] posix_timers: Fix racy timer delta caching on task exit Frederic Weisbecker
` (2 subsequent siblings)
3 siblings, 1 reply; 9+ messages in thread
From: Frederic Weisbecker @ 2013-03-30 13:15 UTC (permalink / raw)
To: Andrew Morton
Cc: LKML, Frederic Weisbecker, Stanislaw Gruszka, Thomas Gleixner,
Peter Zijlstra, Ingo Molnar, Oleg Nesterov, Chen Gang
In order to arm the next timer to schedule, we take a sample of the
current process or thread cputime.
If the task is dying though, we don't arm anything but we
cache the remaining timer expiration delta for further reads.
Something similar is performed in posix_cpu_timer_get() but
here we forget to take the process wide cputime sample
before caching it.
As a result we are storing random stack content, leading
every further reads of that timer to return junk values.
Fix this by taking the appropriate sample in the case of
process wide timers.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Reported-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Chen Gang <gang.chen@asianux.com>
---
kernel/posix-cpu-timers.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index edf94b6..afd79a9 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -1062,6 +1062,7 @@ void posix_cpu_timer_schedule(struct k_itimer *timer)
* not yet reaped. Take this opportunity to
* drop our task ref.
*/
+ cpu_timer_sample_group(timer->it_clock, p, &now);
clear_dead_task(timer, now);
goto out_unlock;
}
--
1.7.5.4
^ permalink raw reply related [flat|nested] 9+ messages in thread* [PATCH 2/3] posix_timers: Fix racy timer delta caching on task exit
2013-03-30 13:15 [PATCH 0/3] posix_timers: A few expiry caching fixes Frederic Weisbecker
2013-03-30 13:15 ` [PATCH 1/3] posix-timers: Correctly get dying task time sample in posix_cpu_timer_schedule() Frederic Weisbecker
@ 2013-03-30 13:15 ` Frederic Weisbecker
2013-03-30 13:15 ` [PATCH 3/3] posix_timers: Remove dead task timer expiry caching Frederic Weisbecker
2013-04-08 23:55 ` [PATCH 0/3] posix_timers: A few expiry caching fixes Andrew Morton
3 siblings, 0 replies; 9+ messages in thread
From: Frederic Weisbecker @ 2013-03-30 13:15 UTC (permalink / raw)
To: Andrew Morton
Cc: LKML, Frederic Weisbecker, Stanislaw Gruszka, Thomas Gleixner,
Peter Zijlstra, Ingo Molnar, Oleg Nesterov
When a task exits, we perform a caching of the remaining
cputime delta before expiring of its timers.
This is done from the following places:
* When the task is reaped. We iterate through its list of
posix cpu timers and store the remaining timer delta to
the timer struct instead of the absolute value.
(See posix_cpu_timers_exit() / posix_cpu_timers_exit_group() )
* When we call posix_cpu_timer_get() or posix_cpu_timer_schedule().
If the timer's task is considered dying when watched from these
places, the same conversion from absolute to relative expiry time
is performed. Then the given task's reference is released.
(See clear_dead_task() ).
The relevance of this caching is questionable but this is another
and deeper debate.
The big issue here is that these two sources of caching don't mix
up very well together.
More specifically, the caching can easily be done twice, resulting
in a wrong delta as it gets spuriously substracted a second time by
the elapsed clock. This can happen in the following scenario:
The task exits and gets reaped: we call posix_cpu_timers_exit()
and the absolute timer expiry values are converted to a relative
delta.
timer_gettime() -> posix_cpu_timer_get() is called and relies on
clear_dead_task() because tsk->exit_state == EXIT_DEAD.
The delta gets substracted again by the elapsed clock and we return
a wrong result.
To fix this, just remove the caching done on task reaping time.
It doesn't bring much value on its own. The caching done from
posix_cpu_timer_get/schedule is enough.
And it would also be hard to get it really right: we could make it
put and clear the target task in the timer struct so that readers
know if they are dealing with a relative cached of absolute value.
But it would be racy. The only safe way to do it would be to lock
the itimer->it_lock so that we know nobody reads the cputime expiry
value while we modify it and its target task reference. Doing so
would involve some funny workarounds to avoid circular lock against
the sighand lock. There is just no reason to maintain this.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Oleg Nesterov <oleg@redhat.com>
---
kernel/posix-cpu-timers.c | 22 +++++++++++-----------
1 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index afd79a9..877439b 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -387,14 +387,8 @@ static void cleanup_timers_list(struct list_head *head,
{
struct cpu_timer_list *timer, *next;
- list_for_each_entry_safe(timer, next, head, entry) {
+ list_for_each_entry_safe(timer, next, head, entry)
list_del_init(&timer->entry);
- if (timer->expires < curr) {
- timer->expires = 0;
- } else {
- timer->expires -= curr;
- }
- }
}
/*
@@ -442,15 +436,21 @@ void posix_cpu_timers_exit_group(struct task_struct *tsk)
tsk->se.sum_exec_runtime + sig->sum_sched_runtime);
}
-static void clear_dead_task(struct k_itimer *timer, unsigned long long now)
+static void clear_dead_task(struct k_itimer *itimer, unsigned long long now)
{
+ struct cpu_timer_list *timer = &itimer->it.cpu;
+
/*
* That's all for this thread or process.
* We leave our residual in expires to be reported.
*/
- put_task_struct(timer->it.cpu.task);
- timer->it.cpu.task = NULL;
- timer->it.cpu.expires -= now;
+ put_task_struct(timer->task);
+ timer->task = NULL;
+ if (timer->expires < now) {
+ timer->expires = 0;
+ } else {
+ timer->expires -= now;
+ }
}
static inline int expires_gt(cputime_t expires, cputime_t new_exp)
--
1.7.5.4
^ permalink raw reply related [flat|nested] 9+ messages in thread* [PATCH 3/3] posix_timers: Remove dead task timer expiry caching
2013-03-30 13:15 [PATCH 0/3] posix_timers: A few expiry caching fixes Frederic Weisbecker
2013-03-30 13:15 ` [PATCH 1/3] posix-timers: Correctly get dying task time sample in posix_cpu_timer_schedule() Frederic Weisbecker
2013-03-30 13:15 ` [PATCH 2/3] posix_timers: Fix racy timer delta caching on task exit Frederic Weisbecker
@ 2013-03-30 13:15 ` Frederic Weisbecker
2013-04-08 23:55 ` [PATCH 0/3] posix_timers: A few expiry caching fixes Andrew Morton
3 siblings, 0 replies; 9+ messages in thread
From: Frederic Weisbecker @ 2013-03-30 13:15 UTC (permalink / raw)
To: Andrew Morton
Cc: LKML, Frederic Weisbecker, Stanislaw Gruszka, Thomas Gleixner,
Peter Zijlstra, Ingo Molnar, Oleg Nesterov
When reading a timer sample, posix_cpu_timer_get() and
posix_cpu_timer_schedule() both perform a caching of the timer
expiry time by converting its value from absolute to relative
if the task has exited.
The reason for this caching is not clear though, it could be:
1) For performance reasons: no need to calculate the delta after
the task has died, its cputime won't change anymore. We can
thus avoid some locking (sighand, tasklist_lock, rq->lock for
task_delta_exec(), ...), and various operations to calculate
the sample...
2) To keep the remaining delta for the timer available after the
task has died. When it gets reaped, its sighand disappears, so
accessing the process wide cputime through tsk->signal is probably
not safe.
Now, is the first reason really worth it? I have no idea if it is
a case we really want to optimize.
Considering the second reason, we return a disarmed zero'ed timer
when tsk->sighand == NULL. So if this is an assumed reason, it's
broken. And this case only concern process wide timers that
have their group leader reaped. The posix cpu timer shouldn't even
be available anymore at that time. Unless the group leader changed
since we called posix_cpu_timer_create() after an exec?
Anyway for now I'm sending this as an RFC because there may well
be subtle things I left behind.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Oleg Nesterov <oleg@redhat.com>
---
kernel/posix-cpu-timers.c | 67 +-------------------------------------------
1 files changed, 2 insertions(+), 65 deletions(-)
diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index 877439b..2388062 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -436,23 +436,6 @@ void posix_cpu_timers_exit_group(struct task_struct *tsk)
tsk->se.sum_exec_runtime + sig->sum_sched_runtime);
}
-static void clear_dead_task(struct k_itimer *itimer, unsigned long long now)
-{
- struct cpu_timer_list *timer = &itimer->it.cpu;
-
- /*
- * That's all for this thread or process.
- * We leave our residual in expires to be reported.
- */
- put_task_struct(timer->task);
- timer->task = NULL;
- if (timer->expires < now) {
- timer->expires = 0;
- } else {
- timer->expires -= now;
- }
-}
-
static inline int expires_gt(cputime_t expires, cputime_t new_exp)
{
return expires == 0 || expires > new_exp;
@@ -737,7 +720,6 @@ static void posix_cpu_timer_get(struct k_itimer *timer, struct itimerspec *itp)
{
unsigned long long now;
struct task_struct *p = timer->it.cpu.task;
- int clear_dead;
/*
* Easy part: convert the reload time.
@@ -750,23 +732,11 @@ static void posix_cpu_timer_get(struct k_itimer *timer, struct itimerspec *itp)
return;
}
- if (unlikely(p == NULL)) {
- /*
- * This task already died and the timer will never fire.
- * In this case, expires is actually the dead value.
- */
- dead:
- sample_to_timespec(timer->it_clock, timer->it.cpu.expires,
- &itp->it_value);
- return;
- }
-
/*
* Sample the clock to take the difference with the expiry time.
*/
if (CPUCLOCK_PERTHREAD(timer->it_clock)) {
cpu_clock_sample(timer->it_clock, p, &now);
- clear_dead = p->exit_state;
} else {
read_lock(&tasklist_lock);
if (unlikely(p->sighand == NULL)) {
@@ -775,29 +745,16 @@ static void posix_cpu_timer_get(struct k_itimer *timer, struct itimerspec *itp)
* We can't even collect a sample any more.
* Call the timer disarmed, nothing else to do.
*/
- put_task_struct(p);
- timer->it.cpu.task = NULL;
timer->it.cpu.expires = 0;
+ itp->it_value.tv_sec = itp->it_value.tv_nsec = 0;
read_unlock(&tasklist_lock);
- goto dead;
+ return;
} else {
cpu_timer_sample_group(timer->it_clock, p, &now);
- clear_dead = (unlikely(p->exit_state) &&
- thread_group_empty(p));
}
read_unlock(&tasklist_lock);
}
- if (unlikely(clear_dead)) {
- /*
- * We've noticed that the thread is dead, but
- * not yet reaped. Take this opportunity to
- * drop our task ref.
- */
- clear_dead_task(timer, now);
- goto dead;
- }
-
if (now < timer->it.cpu.expires) {
sample_to_timespec(timer->it_clock,
timer->it.cpu.expires - now,
@@ -1027,22 +984,12 @@ void posix_cpu_timer_schedule(struct k_itimer *timer)
struct task_struct *p = timer->it.cpu.task;
unsigned long long now;
- if (unlikely(p == NULL))
- /*
- * The task was cleaned up already, no future firings.
- */
- goto out;
-
/*
* Fetch the current sample and update the timer's expiry time.
*/
if (CPUCLOCK_PERTHREAD(timer->it_clock)) {
cpu_clock_sample(timer->it_clock, p, &now);
bump_cpu_timer(timer, now);
- if (unlikely(p->exit_state)) {
- clear_dead_task(timer, now);
- goto out;
- }
read_lock(&tasklist_lock); /* arm_timer needs it. */
spin_lock(&p->sighand->siglock);
} else {
@@ -1056,15 +1003,6 @@ void posix_cpu_timer_schedule(struct k_itimer *timer)
timer->it.cpu.task = p = NULL;
timer->it.cpu.expires = 0;
goto out_unlock;
- } else if (unlikely(p->exit_state) && thread_group_empty(p)) {
- /*
- * We've noticed that the thread is dead, but
- * not yet reaped. Take this opportunity to
- * drop our task ref.
- */
- cpu_timer_sample_group(timer->it_clock, p, &now);
- clear_dead_task(timer, now);
- goto out_unlock;
}
spin_lock(&p->sighand->siglock);
cpu_timer_sample_group(timer->it_clock, p, &now);
@@ -1082,7 +1020,6 @@ void posix_cpu_timer_schedule(struct k_itimer *timer)
out_unlock:
read_unlock(&tasklist_lock);
-out:
timer->it_overrun_last = timer->it_overrun;
timer->it_overrun = -1;
++timer->it_requeue_pending;
--
1.7.5.4
^ permalink raw reply related [flat|nested] 9+ messages in thread