* [RFC PATCH] timer: Improve itimers scalability
@ 2015-08-05 0:29 Jason Low
2015-08-05 9:24 ` Peter Zijlstra
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Jason Low @ 2015-08-05 0:29 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, linux-kernel
Cc: Frederic Weisbecker, Oleg Nesterov, Paul E. McKenney,
Davidlohr Bueso, Mike Galbraith, terry.rudd, Rik van Riel,
Waiman Long, Scott J Norton, Jason Low
When running a database workload on a 16 socket machine, there were
scalability issues related to itimers.
Commit 1018016c706f addressed the issue with the thread_group_cputimer
spinlock taking up a significant portion of total run time.
This patch addresses the other issue, where a lot of time is spent
trying to acquire the sighand lock. In some cases, 200+ threads were
found simultaneously contending for the same sighand lock.
The issue was that whenever an itimer expired, many threads ended up
simultaneously trying to send the signal. Most of the time, nothing
happened after acquiring the sighand lock, because another thread
had already sent the signal and updated the "next expire" time.
fastpath_timer_check() didn't help much, since the "next expire" time
is only updated later, under the lock.
The contention for the sighand lock reduced throughput by more than 30%.
This patch addresses this by adding a boolean to the
thread_group_cputimer structure that signifies when a thread in the
group is already checking for process wide timers, and adding extra
logic in the fastpath to check the boolean.
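The idea can be sketched in userspace as follows. This is an illustrative model, not the kernel patch itself: READ_ONCE()/WRITE_ONCE() are approximated with relaxed C11 atomics, and the names are chosen to mirror the patch.

```c
/*
 * Userspace sketch of the contention-avoidance idea: the thread that
 * reaches the slow path sets a flag; concurrent threads see the flag
 * in their fast path and skip the expensive signal-delivery attempt
 * (and thus the sighand lock). Illustrative only, not kernel code.
 */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct group_cputimer {
	atomic_int running;		/* group timers are armed */
	atomic_int is_checking_timer;	/* a thread is in the slow path */
};

/* Fast path: decide whether this thread should bother at all. */
static bool fastpath_should_check(struct group_cputimer *t)
{
	/* No group timers armed: nothing to do. */
	if (!atomic_load_explicit(&t->running, memory_order_relaxed))
		return false;
	/* Another thread is already checking: skip, avoid contention. */
	if (atomic_load_explicit(&t->is_checking_timer, memory_order_relaxed))
		return false;
	return true;
}

/* Slow path: mark busy, do the expensive check, unmark. */
static void slowpath_check_process_timers(struct group_cputimer *t)
{
	atomic_store_explicit(&t->is_checking_timer, 1, memory_order_relaxed);
	/* ... sample group cputime and fire expired timers here ... */
	atomic_store_explicit(&t->is_checking_timer, 0, memory_order_relaxed);
}
```

Losing a race here is harmless: a thread that skips because the flag is set relies on the flag-holder to deliver the signal and update the "next expire" time, which is exactly the behavior the patch wants.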
Signed-off-by: Jason Low <jason.low2@hp.com>
---
include/linux/init_task.h | 5 +++--
include/linux/sched.h | 3 +++
kernel/time/posix-cpu-timers.c | 33 ++++++++++++++++++++++++++-------
3 files changed, 32 insertions(+), 9 deletions(-)
diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index d0b380e..c5d216c 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -51,8 +51,9 @@ extern struct fs_struct init_fs;
.cpu_timers = INIT_CPU_TIMERS(sig.cpu_timers), \
.rlim = INIT_RLIMITS, \
.cputimer = { \
- .cputime_atomic = INIT_CPUTIME_ATOMIC, \
- .running = 0, \
+ .cputime_atomic = INIT_CPUTIME_ATOMIC, \
+ .running = 0, \
+ .is_checking_timer = 0, \
}, \
INIT_PREV_CPUTIME(sig) \
.cred_guard_mutex = \
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 5b50082..37e952c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -619,6 +619,8 @@ struct task_cputime_atomic {
* @cputime_atomic: atomic thread group interval timers.
* @running: non-zero when there are timers running and
* @cputime receives updates.
+ * @is_checking_timer: non-zero when a thread is in the process of
+ * checking for thread group timers.
*
* This structure contains the version of task_cputime, above, that is
* used for thread group CPU timer calculations.
@@ -626,6 +628,7 @@ struct task_cputime_atomic {
struct thread_group_cputimer {
struct task_cputime_atomic cputime_atomic;
int running;
+ int is_checking_timer;
};
#include <linux/rwsem.h>
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 892e3da..fba1fc0 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -962,6 +962,14 @@ static void check_process_timers(struct task_struct *tsk,
unsigned long soft;
/*
+ * Signify that a thread is checking for process timers.
+ * The is_checking_timer field is only modified in this function,
+ * which is called with the sighand lock held. Thus, we can
+ * just use WRITE_ONCE() without any further locking.
+ */
+ WRITE_ONCE(sig->cputimer.is_checking_timer, 1);
+
+ /*
* Collect the current process totals.
*/
thread_group_cputimer(tsk, &cputime);
@@ -973,13 +981,6 @@ static void check_process_timers(struct task_struct *tsk,
virt_expires = check_timers_list(++timers, firing, utime);
sched_expires = check_timers_list(++timers, firing, sum_sched_runtime);
- /*
- * Check for the special case process timers.
- */
- check_cpu_itimer(tsk, &sig->it[CPUCLOCK_PROF], &prof_expires, ptime,
- SIGPROF);
- check_cpu_itimer(tsk, &sig->it[CPUCLOCK_VIRT], &virt_expires, utime,
- SIGVTALRM);
soft = READ_ONCE(sig->rlim[RLIMIT_CPU].rlim_cur);
if (soft != RLIM_INFINITY) {
unsigned long psecs = cputime_to_secs(ptime);
@@ -1010,11 +1011,21 @@ static void check_process_timers(struct task_struct *tsk,
}
}
+ /*
+ * Check for the special case process timers.
+ */
+ check_cpu_itimer(tsk, &sig->it[CPUCLOCK_PROF], &prof_expires, ptime,
+ SIGPROF);
+ check_cpu_itimer(tsk, &sig->it[CPUCLOCK_VIRT], &virt_expires, utime,
+ SIGVTALRM);
+
sig->cputime_expires.prof_exp = expires_to_cputime(prof_expires);
sig->cputime_expires.virt_exp = expires_to_cputime(virt_expires);
sig->cputime_expires.sched_exp = sched_expires;
if (task_cputime_zero(&sig->cputime_expires))
stop_process_timers(sig);
+
+ WRITE_ONCE(sig->cputimer.is_checking_timer, 0);
}
/*
@@ -1137,6 +1148,13 @@ static inline int fastpath_timer_check(struct task_struct *tsk)
if (READ_ONCE(sig->cputimer.running)) {
struct task_cputime group_sample;
+ /*
+ * If another thread in the group is already checking
+ * for the thread group cputimer, then we will skip that.
+ */
+ if (READ_ONCE(sig->cputimer.is_checking_timer))
+ return 0;
+
sample_cputime_atomic(&group_sample, &sig->cputimer.cputime_atomic);
if (task_cputime_expired(&group_sample, &sig->cputime_expires))
@@ -1174,6 +1192,7 @@ void run_posix_cpu_timers(struct task_struct *tsk)
* put them on the firing list.
*/
check_thread_timers(tsk, &firing);
+
/*
* If there are any active process wide timers (POSIX 1.b, itimers,
* RLIMIT_CPU) cputimer must be running.
--
1.7.2.5
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [RFC PATCH] timer: Improve itimers scalability
2015-08-05 0:29 [RFC PATCH] timer: Improve itimers scalability Jason Low
@ 2015-08-05 9:24 ` Peter Zijlstra
2015-08-05 9:37 ` Peter Zijlstra
2015-08-06 14:18 ` Oleg Nesterov
2 siblings, 0 replies; 7+ messages in thread
From: Peter Zijlstra @ 2015-08-05 9:24 UTC (permalink / raw)
To: Jason Low
Cc: Ingo Molnar, linux-kernel, Frederic Weisbecker, Oleg Nesterov,
Paul E. McKenney, Davidlohr Bueso, Mike Galbraith, terry.rudd,
Rik van Riel, Waiman Long, Scott J Norton
On Tue, Aug 04, 2015 at 05:29:44PM -0700, Jason Low wrote:
> When running a database workload on a 16 socket machine, there were
> scalability issues related to itimers.
I very much hope you're also trying to convince the relevant database
people that using process wide timers on something they expect to scale
is a horrendously stupid idea?
Nothing obviously broken stood out, but *sheesh* :-)
* Re: [RFC PATCH] timer: Improve itimers scalability
2015-08-05 0:29 [RFC PATCH] timer: Improve itimers scalability Jason Low
2015-08-05 9:24 ` Peter Zijlstra
@ 2015-08-05 9:37 ` Peter Zijlstra
2015-08-05 19:44 ` Jason Low
2015-08-06 14:18 ` Oleg Nesterov
2 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2015-08-05 9:37 UTC (permalink / raw)
To: Jason Low
Cc: Ingo Molnar, linux-kernel, Frederic Weisbecker, Oleg Nesterov,
Paul E. McKenney, Davidlohr Bueso, Mike Galbraith, terry.rudd,
Rik van Riel, Waiman Long, Scott J Norton
On Tue, Aug 04, 2015 at 05:29:44PM -0700, Jason Low wrote:
> @@ -1137,6 +1148,13 @@ static inline int fastpath_timer_check(struct task_struct *tsk)
> if (READ_ONCE(sig->cputimer.running)) {
Maybe make that:
if (READ_ONCE(sig->cputimer.running) &&
!READ_ONCE(sig->cputimer.is_checking_timer)) {
> struct task_cputime group_sample;
>
> + /*
> + * If another thread in the group is already checking
> + * for the thread group cputimer, then we will skip that.
> + */
> + if (READ_ONCE(sig->cputimer.is_checking_timer))
> + return 0;
> +
> sample_cputime_atomic(&group_sample, &sig->cputimer.cputime_atomic);
>
> if (task_cputime_expired(&group_sample, &sig->cputime_expires))
* Re: [RFC PATCH] timer: Improve itimers scalability
2015-08-05 9:37 ` Peter Zijlstra
@ 2015-08-05 19:44 ` Jason Low
0 siblings, 0 replies; 7+ messages in thread
From: Jason Low @ 2015-08-05 19:44 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, linux-kernel, Frederic Weisbecker, Oleg Nesterov,
Paul E. McKenney, Davidlohr Bueso, Mike Galbraith, terry.rudd,
Rik van Riel, Waiman Long, Scott J Norton, jason.low2
On Wed, 2015-08-05 at 11:37 +0200, Peter Zijlstra wrote:
> On Tue, Aug 04, 2015 at 05:29:44PM -0700, Jason Low wrote:
>
> > @@ -1137,6 +1148,13 @@ static inline int fastpath_timer_check(struct task_struct *tsk)
> > if (READ_ONCE(sig->cputimer.running)) {
>
> Maybe make that:
>
> if (READ_ONCE(sig->cputimer.running) &&
> !READ_ONCE(sig->cputimer.is_checking_timer)) {
Yes, I think it would be better if the check is done here.
And perhaps the comment can be modified to:
/*
* Check if thread group timers expired. This is skipped if the cputimer
* is not running or if another thread in the group is already checking
* for thread group cputimers.
*/
* Re: [RFC PATCH] timer: Improve itimers scalability
2015-08-05 0:29 [RFC PATCH] timer: Improve itimers scalability Jason Low
2015-08-05 9:24 ` Peter Zijlstra
2015-08-05 9:37 ` Peter Zijlstra
@ 2015-08-06 14:18 ` Oleg Nesterov
2015-08-06 18:21 ` Jason Low
2 siblings, 1 reply; 7+ messages in thread
From: Oleg Nesterov @ 2015-08-06 14:18 UTC (permalink / raw)
To: Jason Low
Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Frederic Weisbecker,
Paul E. McKenney, Davidlohr Bueso, Mike Galbraith, terry.rudd,
Rik van Riel, Waiman Long, Scott J Norton
On 08/04, Jason Low wrote:
>
> @@ -973,13 +981,6 @@ static void check_process_timers(struct task_struct *tsk,
> virt_expires = check_timers_list(++timers, firing, utime);
> sched_expires = check_timers_list(++timers, firing, sum_sched_runtime);
>
> - /*
> - * Check for the special case process timers.
> - */
> - check_cpu_itimer(tsk, &sig->it[CPUCLOCK_PROF], &prof_expires, ptime,
> - SIGPROF);
> - check_cpu_itimer(tsk, &sig->it[CPUCLOCK_VIRT], &virt_expires, utime,
> - SIGVTALRM);
> soft = READ_ONCE(sig->rlim[RLIMIT_CPU].rlim_cur);
> if (soft != RLIM_INFINITY) {
> unsigned long psecs = cputime_to_secs(ptime);
> @@ -1010,11 +1011,21 @@ static void check_process_timers(struct task_struct *tsk,
> }
> }
>
> + /*
> + * Check for the special case process timers.
> + */
> + check_cpu_itimer(tsk, &sig->it[CPUCLOCK_PROF], &prof_expires, ptime,
> + SIGPROF);
> + check_cpu_itimer(tsk, &sig->it[CPUCLOCK_VIRT], &virt_expires, utime,
> + SIGVTALRM);
> +
Not sure I understand this part... looks wrong actually, please note
that RLIMIT_CPU block above may need to update prof_expires _after_
check_cpu_itimer(), or I am totally confused.
> if (READ_ONCE(sig->cputimer.running)) {
> struct task_cputime group_sample;
>
> + /*
> + * If another thread in the group is already checking
> + * for the thread group cputimer, then we will skip that.
> + */
> + if (READ_ONCE(sig->cputimer.is_checking_timer))
> + return 0;
> +
Cosmetic, I won't insist, but this is not symmetrical to ->running check,
if (READ_ONCE(sig->cputimer.running) &&
!READ_ONCE(sig->cputimer.is_checking_timer))
looks a little bit better to me.
Oleg.
* Re: [RFC PATCH] timer: Improve itimers scalability
2015-08-06 14:18 ` Oleg Nesterov
@ 2015-08-06 18:21 ` Jason Low
2015-08-07 12:01 ` Oleg Nesterov
0 siblings, 1 reply; 7+ messages in thread
From: Jason Low @ 2015-08-06 18:21 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Frederic Weisbecker,
Paul E. McKenney, Davidlohr Bueso, Mike Galbraith, terry.rudd,
Rik van Riel, Waiman Long, Scott J Norton, jason.low2
On Thu, 2015-08-06 at 16:18 +0200, Oleg Nesterov wrote:
> On 08/04, Jason Low wrote:
> >
> > @@ -973,13 +981,6 @@ static void check_process_timers(struct task_struct *tsk,
> > virt_expires = check_timers_list(++timers, firing, utime);
> > sched_expires = check_timers_list(++timers, firing, sum_sched_runtime);
> >
> > - /*
> > - * Check for the special case process timers.
> > - */
> > - check_cpu_itimer(tsk, &sig->it[CPUCLOCK_PROF], &prof_expires, ptime,
> > - SIGPROF);
> > - check_cpu_itimer(tsk, &sig->it[CPUCLOCK_VIRT], &virt_expires, utime,
> > - SIGVTALRM);
> > soft = READ_ONCE(sig->rlim[RLIMIT_CPU].rlim_cur);
> > if (soft != RLIM_INFINITY) {
> > unsigned long psecs = cputime_to_secs(ptime);
> > @@ -1010,11 +1011,21 @@ static void check_process_timers(struct task_struct *tsk,
> > }
> > }
> >
> > + /*
> > + * Check for the special case process timers.
> > + */
> > + check_cpu_itimer(tsk, &sig->it[CPUCLOCK_PROF], &prof_expires, ptime,
> > + SIGPROF);
> > + check_cpu_itimer(tsk, &sig->it[CPUCLOCK_VIRT], &virt_expires, utime,
> > + SIGVTALRM);
> > +
>
> Not sure I understand this part... looks wrong actually, please note
> that RLIMIT_CPU block above may need to update prof_expires _after_
> check_cpu_itimer(), or I am totally confused.
This change isn't critical to the patch, so we can delete this from the
patch. Though from my understanding, the purpose of prof_expires is to
collect the earliest prof expire time for when we update
"sig->cputime_expires.prof_exp". So I think it wouldn't matter in which
order prof_expires gets updated (as long as check_timers_list() is called
first, since prof_expires gets directly assigned there).
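A tiny userspace sketch of that reasoning (names and values are illustrative, not taken from the kernel):

```c
/*
 * Sketch of the "earliest expiry" accumulation being discussed: after
 * the initial direct assignment, every subsequent checker only ever
 * lowers *expires (0 meaning "not set"), so the order in which the
 * min-updates run does not change the final value.
 */
#include <assert.h>

typedef unsigned long long cputime_t;

/* Mirrors the it->expires < *expires guard in check_cpu_itimer(). */
static void min_update(cputime_t candidate, cputime_t *expires)
{
	if (candidate && (*expires == 0 || candidate < *expires))
		*expires = candidate;
}
```

Running the itimer update and the RLIMIT_CPU update in either order leaves prof_expires at the same minimum, which is why moving the check_cpu_itimer() calls below the RLIMIT_CPU block doesn't change the result.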
> > if (READ_ONCE(sig->cputimer.running)) {
> > struct task_cputime group_sample;
> >
> > + /*
> > + * If another thread in the group is already checking
> > + * for the thread group cputimer, then we will skip that.
> > + */
> > + if (READ_ONCE(sig->cputimer.is_checking_timer))
> > + return 0;
> > +
>
> Cosmetic, I won't insist, but this is not symmetrical to ->running check,
>
> if (READ_ONCE(sig->cputimer.running) &&
> !READ_ONCE(sig->cputimer.is_checking_timer))
>
> looks a littke bit better to me.
I agree, I will be making that change.
* Re: [RFC PATCH] timer: Improve itimers scalability
2015-08-06 18:21 ` Jason Low
@ 2015-08-07 12:01 ` Oleg Nesterov
0 siblings, 0 replies; 7+ messages in thread
From: Oleg Nesterov @ 2015-08-07 12:01 UTC (permalink / raw)
To: Jason Low
Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Frederic Weisbecker,
Paul E. McKenney, Davidlohr Bueso, Mike Galbraith, terry.rudd,
Rik van Riel, Waiman Long, Scott J Norton
On 08/06, Jason Low wrote:
>
> On Thu, 2015-08-06 at 16:18 +0200, Oleg Nesterov wrote:
> > > + /*
> > > + * Check for the special case process timers.
> > > + */
> > > + check_cpu_itimer(tsk, &sig->it[CPUCLOCK_PROF], &prof_expires, ptime,
> > > + SIGPROF);
> > > + check_cpu_itimer(tsk, &sig->it[CPUCLOCK_VIRT], &virt_expires, utime,
> > > + SIGVTALRM);
> > > +
> >
> > Not sure I understand this part... looks wrong actually, please note
> > that RLIMIT_CPU block above may need to update prof_expires _after_
> > check_cpu_itimer(), or I am totally confused.
>
> This change isn't critical to the patch, so we can delete this from the
> patch. Though from my understanding, the purpose of prof_expires is to
> collect the earliest prof expire time for when we update
> "sig->cputime_expires.prof_exp". So I think it wouldn't matter which
> order prof_expire gets updated (as long as check_timers_list() is called
> first, since prof_expires gets directly assigned there).
Yes, I missed that check_cpu_itimer() checks "it->expires < *expires"
before it updates *expires.
Thanks for correcting me!
Oleg.