From: tip-bot for Rik van Riel <tipbot@zytor.com>
To: linux-tip-commits@vger.kernel.org
Cc: riel@redhat.com, hpa@zytor.com, tglx@linutronix.de,
efault@gmx.de, torvalds@linux-foundation.org,
peterz@infradead.org, linux-kernel@vger.kernel.org,
mingo@kernel.org
Subject: [tip:sched/core] sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity
Date: Mon, 29 Feb 2016 03:18:55 -0800 [thread overview]
Message-ID: <tip-ff9a9b4c4334b53b52ee9279f30bd5dd92ea9bdd@git.kernel.org> (raw)
In-Reply-To: <1455152907-18495-5-git-send-email-riel@redhat.com>
Commit-ID: ff9a9b4c4334b53b52ee9279f30bd5dd92ea9bdd
Gitweb: http://git.kernel.org/tip/ff9a9b4c4334b53b52ee9279f30bd5dd92ea9bdd
Author: Rik van Riel <riel@redhat.com>
AuthorDate: Wed, 10 Feb 2016 20:08:27 -0500
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 29 Feb 2016 09:53:10 +0100
sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity
When profiling syscall overhead on nohz-full kernels,
after removing __acct_update_integrals() from the profile,
native_sched_clock() remains as the top CPU user. This can be
reduced by moving VIRT_CPU_ACCOUNTING_GEN to jiffy granularity.
This will reduce timing accuracy on nohz_full CPUs to jiffy
based sampling, just like on normal CPUs. It results in
totally removing native_sched_clock from the profile, and
significantly speeding up the syscall entry and exit path,
as well as irq entry and exit, and KVM guest entry & exit.
Additionally, only call the more expensive functions (and
advance the seqlock) when jiffies actually changed.
This code relies on another CPU advancing jiffies when the
system is busy. On a nohz_full system, this is done by a
housekeeping CPU.
A microbenchmark calling an invalid syscall number 10 million
times in a row speeds up an additional 30% over the numbers
with just the previous patches, for a total speedup of about
40% over 4.4 and 4.5-rc1.
Run times for the microbenchmark:
4.4 3.8 seconds
4.5-rc1 3.7 seconds
4.5-rc1 + first patch 3.3 seconds
4.5-rc1 + first 3 patches 3.1 seconds
4.5-rc1 + all patches 2.3 seconds
A non-NOHZ_FULL cpu (not the housekeeping CPU):
all kernels 1.86 seconds
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: clark@redhat.com
Cc: eric.dumazet@gmail.com
Cc: fweisbec@gmail.com
Cc: luto@amacapital.net
Link: http://lkml.kernel.org/r/1455152907-18495-5-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
kernel/sched/cputime.c | 39 +++++++++++++++++++++++----------------
1 file changed, 23 insertions(+), 16 deletions(-)
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index b2ab2ff..01d9898 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -668,26 +668,25 @@ void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut, cputime
#endif /* !CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
-static unsigned long long vtime_delta(struct task_struct *tsk)
+static cputime_t vtime_delta(struct task_struct *tsk)
{
- unsigned long long clock;
+ unsigned long now = READ_ONCE(jiffies);
- clock = local_clock();
- if (clock < tsk->vtime_snap)
+ if (time_before(now, (unsigned long)tsk->vtime_snap))
return 0;
- return clock - tsk->vtime_snap;
+ return jiffies_to_cputime(now - tsk->vtime_snap);
}
static cputime_t get_vtime_delta(struct task_struct *tsk)
{
- unsigned long long delta = vtime_delta(tsk);
+ unsigned long now = READ_ONCE(jiffies);
+ unsigned long delta = now - tsk->vtime_snap;
WARN_ON_ONCE(tsk->vtime_snap_whence == VTIME_INACTIVE);
- tsk->vtime_snap += delta;
+ tsk->vtime_snap = now;
- /* CHECKME: always safe to convert nsecs to cputime? */
- return nsecs_to_cputime(delta);
+ return jiffies_to_cputime(delta);
}
static void __vtime_account_system(struct task_struct *tsk)
@@ -699,6 +698,9 @@ static void __vtime_account_system(struct task_struct *tsk)
void vtime_account_system(struct task_struct *tsk)
{
+ if (!vtime_delta(tsk))
+ return;
+
write_seqcount_begin(&tsk->vtime_seqcount);
__vtime_account_system(tsk);
write_seqcount_end(&tsk->vtime_seqcount);
@@ -707,7 +709,8 @@ void vtime_account_system(struct task_struct *tsk)
void vtime_gen_account_irq_exit(struct task_struct *tsk)
{
write_seqcount_begin(&tsk->vtime_seqcount);
- __vtime_account_system(tsk);
+ if (vtime_delta(tsk))
+ __vtime_account_system(tsk);
if (context_tracking_in_user())
tsk->vtime_snap_whence = VTIME_USER;
write_seqcount_end(&tsk->vtime_seqcount);
@@ -718,16 +721,19 @@ void vtime_account_user(struct task_struct *tsk)
cputime_t delta_cpu;
write_seqcount_begin(&tsk->vtime_seqcount);
- delta_cpu = get_vtime_delta(tsk);
tsk->vtime_snap_whence = VTIME_SYS;
- account_user_time(tsk, delta_cpu, cputime_to_scaled(delta_cpu));
+ if (vtime_delta(tsk)) {
+ delta_cpu = get_vtime_delta(tsk);
+ account_user_time(tsk, delta_cpu, cputime_to_scaled(delta_cpu));
+ }
write_seqcount_end(&tsk->vtime_seqcount);
}
void vtime_user_enter(struct task_struct *tsk)
{
write_seqcount_begin(&tsk->vtime_seqcount);
- __vtime_account_system(tsk);
+ if (vtime_delta(tsk))
+ __vtime_account_system(tsk);
tsk->vtime_snap_whence = VTIME_USER;
write_seqcount_end(&tsk->vtime_seqcount);
}
@@ -742,7 +748,8 @@ void vtime_guest_enter(struct task_struct *tsk)
* that can thus safely catch up with a tickless delta.
*/
write_seqcount_begin(&tsk->vtime_seqcount);
- __vtime_account_system(tsk);
+ if (vtime_delta(tsk))
+ __vtime_account_system(tsk);
current->flags |= PF_VCPU;
write_seqcount_end(&tsk->vtime_seqcount);
}
@@ -772,7 +779,7 @@ void arch_vtime_task_switch(struct task_struct *prev)
write_seqcount_begin(¤t->vtime_seqcount);
current->vtime_snap_whence = VTIME_SYS;
- current->vtime_snap = sched_clock_cpu(smp_processor_id());
+ current->vtime_snap = jiffies;
write_seqcount_end(¤t->vtime_seqcount);
}
@@ -783,7 +790,7 @@ void vtime_init_idle(struct task_struct *t, int cpu)
local_irq_save(flags);
write_seqcount_begin(&t->vtime_seqcount);
t->vtime_snap_whence = VTIME_SYS;
- t->vtime_snap = sched_clock_cpu(cpu);
+ t->vtime_snap = jiffies;
write_seqcount_end(&t->vtime_seqcount);
local_irq_restore(flags);
}
next prev parent reply other threads:[~2016-02-29 11:19 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-02-11 1:08 [PATCH 0/4 v6] sched,time: reduce nohz_full syscall overhead 40% riel
2016-02-11 1:08 ` [PATCH 1/4] sched,time: remove non-power-of-two divides from __acct_update_integrals riel
2016-02-29 11:17 ` [tip:sched/core] sched, time: Remove non-power-of-two divides from __acct_update_integrals() tip-bot for Rik van Riel
2016-02-11 1:08 ` [PATCH 2/4] acct,time: change indentation in __acct_update_integrals riel
2016-02-11 1:23 ` Joe Perches
2016-02-29 11:18 ` [tip:sched/core] acct, time: Change indentation in __acct_update_integrals() tip-bot for Rik van Riel
2016-02-11 1:08 ` [PATCH 3/4] time,acct: drop irq save & restore from __acct_update_integrals riel
2016-02-29 11:18 ` [tip:sched/core] time, acct: Drop irq save & restore from __acct_update_integrals() tip-bot for Rik van Riel
2016-02-11 1:08 ` [PATCH 4/4] sched,time: switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity riel
2016-02-29 11:18 ` tip-bot for Rik van Riel [this message]
2016-02-29 15:31 ` [tip:sched/core] sched, time: Switch " Frederic Weisbecker
2016-03-01 15:35 ` Frederic Weisbecker
2016-02-15 9:01 ` [PATCH 0/4 v6] sched,time: reduce nohz_full syscall overhead 40% Mike Galbraith
2016-02-24 11:16 ` Thomas Gleixner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=tip-ff9a9b4c4334b53b52ee9279f30bd5dd92ea9bdd@git.kernel.org \
--to=tipbot@zytor.com \
--cc=efault@gmx.de \
--cc=hpa@zytor.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-tip-commits@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=peterz@infradead.org \
--cc=riel@redhat.com \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).