From: Frederic Weisbecker <fweisbec@gmail.com>
To: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
Tony Luck <tony.luck@intel.com>,
Wanpeng Li <wanpeng.li@hotmail.com>,
Peter Zijlstra <peterz@infradead.org>,
Michael Ellerman <mpe@ellerman.id.au>,
Heiko Carstens <heiko.carstens@de.ibm.com>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Thomas Gleixner <tglx@linutronix.de>,
Paul Mackerras <paulus@samba.org>, Ingo Molnar <mingo@kernel.org>,
Fenghua Yu <fenghua.yu@intel.com>, Rik van Riel <riel@redhat.com>,
Stanislaw Gruszka <sgruszka@redhat.com>
Subject: Re: [PATCH 00/36] cputime: Convert core use of cputime_t to nsecs
Date: Tue, 22 Nov 2016 14:45:56 +0100 [thread overview]
Message-ID: <20161122134550.GA21436@lerouge> (raw)
In-Reply-To: <20161121111728.13a0a3db@mschwide>
On Mon, Nov 21, 2016 at 11:17:28AM +0100, Martin Schwidefsky wrote:
> On Mon, 21 Nov 2016 07:59:56 +0100
> Martin Schwidefsky <schwidefsky@de.ibm.com> wrote:
>
> > On Fri, 18 Nov 2016 15:47:02 +0100
> > Frederic Weisbecker <fweisbec@gmail.com> wrote:
> >
> > > On Fri, Nov 18, 2016 at 01:08:46PM +0100, Martin Schwidefsky wrote:
> > > > On Thu, 17 Nov 2016 19:08:07 +0100
> > > > Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > > >
> > > > Now it has been proposed to implement lazy accounting to accumulate deltas
> > > > and do the expensive conversions only infrequently. This is pretty straight-
> > > > forward for account_user_time but to do this for the account_system_time
> > > > function is more complicated. The function has to differentiate between
> > > > guest/hardirq/softirq and pure system time. We would need to keep sums for
> > > > each bucket and provide a separate function to add to each bucket. Like
> > > > account_guest_time(), account_hardirq_time(), account_softirq_time() and
> > > > account_system_time(). Then it is up to the arch code to sort out the details
> > > > and call the accounting code once per jiffy for each of the buckets.
> > >
> > > That wouldn't be too hard really. The s390 code in vtime.c already does that.
> >
> > Yes, I agree that the accumulating change would not be too hard. Can I make the
> > request that we try to get that done first before doing the cleanup ?
>
> Played with the idea a bit, here is a prototype patch to do the delay system time
> accounting for s390. It applies against the latest s390 features tree which you'll
> find here
>
> git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git features
>
> The details probably needs some more work but it works.
>
> --
> From 1b5ef9ddf899da81a48de826f783b15e6fc45d25 Mon Sep 17 00:00:00 2001
> From: Martin Schwidefsky <schwidefsky@de.ibm.com>
> Date: Mon, 21 Nov 2016 10:44:10 +0100
> Subject: [PATCH] s390/cputime: delayed accounting of system time
>
> The account_system_time() function is called with a cputime that
> occurred while running in the kernel. The function detects which
> context the CPU is currently running in and accounts the time to
> the correct bucket. This forces the arch code to account the
> cputime for hardirq and softirq immediately.
>
> Make account_guest_time non-static and add account_sys_time,
> account_hardirq_time and account_softirq_time. With these functions
> the arch code can delay the accounting for system time. For s390
> the accounting is done once per timer tick and for each task switch.
>
> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Thanks a lot for taking care of that! I'll give a try to do the same
on powerpc.
A few comments below:
> ---
> arch/s390/include/asm/lowcore.h | 65 ++++++++++++-----------
> arch/s390/include/asm/processor.h | 3 ++
> arch/s390/kernel/vtime.c | 106 ++++++++++++++++++++++----------------
> include/linux/kernel_stat.h | 13 +++--
> kernel/sched/cputime.c | 22 +++++++-
> 5 files changed, 129 insertions(+), 80 deletions(-)
>
> diff --git a/arch/s390/include/asm/lowcore.h b/arch/s390/include/asm/lowcore.h
> index 62a5cf1..8a5b082 100644
> --- a/arch/s390/include/asm/lowcore.h
> +++ b/arch/s390/include/asm/lowcore.h
[...]
> @@ -110,34 +119,48 @@ static int do_account_vtime(struct task_struct *tsk, int hardirq_offset)
> #endif
> : "=m" (S390_lowcore.last_update_timer),
> "=m" (S390_lowcore.last_update_clock));
> - S390_lowcore.system_timer += timer - S390_lowcore.last_update_timer;
> - S390_lowcore.steal_timer += S390_lowcore.last_update_clock - clock;
> + clock = S390_lowcore.last_update_clock - clock;
> + timer -= S390_lowcore.last_update_timer;
> +
> + if ((tsk->flags & PF_VCPU) && (irq_count() - hardirq_offset == 0))
> + S390_lowcore.guest_timer += timer;
> + else if (hardirq_count() - hardirq_offset)
> + S390_lowcore.hardirq_timer += timer;
> + else if (in_serving_softirq())
> + S390_lowcore.softirq_timer += timer;
> + else
> + S390_lowcore.system_timer += timer;
I initially thought that some code could be shared for that whole accumulation. Now I
don't know if it would be a good idea. An example would be to deal with the contexts above
in order to store the accumulation to the appropriate place.
>
> /* Update MT utilization calculation */
> if (smp_cpu_mtid &&
> time_after64(jiffies_64, this_cpu_read(mt_scaling_jiffies)))
> update_mt_scaling();
>
> + /* Calculate cputime delta */
> user = S390_lowcore.user_timer - tsk->thread.user_timer;
> - S390_lowcore.steal_timer -= user;
> tsk->thread.user_timer = S390_lowcore.user_timer;
> -
> + guest = S390_lowcore.guest_timer - tsk->thread.guest_timer;
> + tsk->thread.guest_timer = S390_lowcore.guest_timer;
> system = S390_lowcore.system_timer - tsk->thread.system_timer;
> - S390_lowcore.steal_timer -= system;
> tsk->thread.system_timer = S390_lowcore.system_timer;
> -
> - user_scaled = user;
> - system_scaled = system;
> - /* Do MT utilization scaling */
> - if (smp_cpu_mtid) {
> - u64 mult = __this_cpu_read(mt_scaling_mult);
> - u64 div = __this_cpu_read(mt_scaling_div);
> -
> - user_scaled = (user_scaled * mult) / div;
> - system_scaled = (system_scaled * mult) / div;
> - }
> - account_user_time(tsk, user, user_scaled);
> - account_system_time(tsk, hardirq_offset, system, system_scaled);
> + hardirq = S390_lowcore.hardirq_timer - tsk->thread.hardirq_timer;
> + tsk->thread.hardirq_timer = S390_lowcore.hardirq_timer;
> + softirq = S390_lowcore.softirq_timer - tsk->thread.softirq_timer;
> + tsk->thread.softirq_timer = S390_lowcore.softirq_timer;
> + S390_lowcore.steal_timer +=
> + clock - user - guest - system - hardirq - softirq;
> +
> + /* Push account value */
> + if (user)
> + account_user_time(tsk, user, scale_vtime(user));
> + if (guest)
> + account_guest_time(tsk, guest, scale_vtime(guest));
> + if (system)
> + account_sys_time(tsk, system, scale_vtime(system));
> + if (hardirq)
> + account_hardirq_time(tsk, hardirq, scale_vtime(hardirq));
> + if (softirq)
> + account_softirq_time(tsk, softirq, scale_vtime(softirq));
And doing that would be another part of the shared code.
>
> steal = S390_lowcore.steal_timer;
> if ((s64) steal > 0) {
> @@ -145,16 +168,22 @@ static int do_account_vtime(struct task_struct *tsk, int hardirq_offset)
> account_steal_time(steal);
> }
>
> - return virt_timer_forward(user + system);
> + return virt_timer_forward(user + guest + system + hardirq + softirq);
> }
>
> void vtime_task_switch(struct task_struct *prev)
> {
> do_account_vtime(prev, 0);
> prev->thread.user_timer = S390_lowcore.user_timer;
> + prev->thread.guest_timer = S390_lowcore.guest_timer;
> prev->thread.system_timer = S390_lowcore.system_timer;
> + prev->thread.hardirq_timer = S390_lowcore.hardirq_timer;
> + prev->thread.softirq_timer = S390_lowcore.softirq_timer;
> S390_lowcore.user_timer = current->thread.user_timer;
> + S390_lowcore.guest_timer = current->thread.guest_timer;
> S390_lowcore.system_timer = current->thread.system_timer;
> + S390_lowcore.hardirq_timer = current->thread.hardirq_timer;
> + S390_lowcore.softirq_timer = current->thread.softirq_timer;
> }
Ditto.
>
> /*
> @@ -174,31 +203,22 @@ void vtime_account_user(struct task_struct *tsk)
> */
> void vtime_account_irq_enter(struct task_struct *tsk)
> {
> - u64 timer, system, system_scaled;
> + u64 timer;
>
> timer = S390_lowcore.last_update_timer;
> S390_lowcore.last_update_timer = get_vtimer();
> - S390_lowcore.system_timer += timer - S390_lowcore.last_update_timer;
> -
> - /* Update MT utilization calculation */
> - if (smp_cpu_mtid &&
> - time_after64(jiffies_64, this_cpu_read(mt_scaling_jiffies)))
> - update_mt_scaling();
> -
> - system = S390_lowcore.system_timer - tsk->thread.system_timer;
> - S390_lowcore.steal_timer -= system;
> - tsk->thread.system_timer = S390_lowcore.system_timer;
> - system_scaled = system;
> - /* Do MT utilization scaling */
> - if (smp_cpu_mtid) {
> - u64 mult = __this_cpu_read(mt_scaling_mult);
> - u64 div = __this_cpu_read(mt_scaling_div);
> -
> - system_scaled = (system_scaled * mult) / div;
> - }
> - account_system_time(tsk, 0, system, system_scaled);
> -
> - virt_timer_forward(system);
> + timer -= S390_lowcore.last_update_timer;
> +
> + if ((tsk->flags & PF_VCPU) && (irq_count() == 0))
> + S390_lowcore.guest_timer += timer;
> + else if (hardirq_count())
> + S390_lowcore.hardirq_timer += timer;
> + else if (in_serving_softirq())
> + S390_lowcore.softirq_timer += timer;
> + else
> + S390_lowcore.system_timer += timer;
And Ditto.
We could put together the accumulation in a common struct in s390_lowcore,
and its mirror in thread struct then have helpers take care of the contexts.
How does that sound to you, would it help or hurt?
Thanks.
next prev parent reply other threads:[~2016-11-22 13:46 UTC|newest]
Thread overview: 47+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-11-17 18:08 [PATCH 00/36] cputime: Convert core use of cputime_t to nsecs Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 01/36] jiffies: Reuse TICK_NSEC instead of NSEC_PER_JIFFY Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 02/36] time: Introduce jiffies64_to_nsecs() Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 03/36] sched: Remove unused INIT_CPUTIME macro Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 04/36] cputime: Convert kcpustat to nsecs Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 05/36] macintosh/rack-meter: Remove cputime_t internal use Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 06/36] cputime: Convert guest time accounting to nsecs Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 07/36] cputime: Special API to return old-typed cputime Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 08/36] cputime: Convert task/group cputime to nsecs Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 09/36] alpha: Convert obsolete cputime_t " Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 10/36] x86: Convert obsolete cputime type " Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 11/36] isdn: " Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 12/36] binfmt: " Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 13/36] acct: " Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 14/36] delaycct: " Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 15/36] tsacct: " Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 16/36] signal: " Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 17/36] cputime: Increment kcpustat directly on irqtime account Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 18/36] posix-timers: Use TICK_NSEC instead of a dynamically ad-hoc calculated version Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 19/36] posix-timers: Convert internals to use nsecs Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 20/36] itimer: Convert internal cputime_t units to nsec Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 21/36] sched: Remove temporary cputime_t accessors Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 22/36] cputime: Push time to account_user_time() in nsecs Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 23/36] cputime: Push time to account_steal_time() " Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 24/36] cputime: Push time to account_idle_time() " Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 25/36] cputime: Push time to account_system_time() " Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 26/36] cputime: Complete nsec conversion of tick based accounting Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 27/36] vtime: Return nsecs instead of cputime_t to account Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 28/36] cputime: Remove jiffies based cputime Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 29/36] ia64: Move nsecs based cputime headers to the last arch using it Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 30/36] ia64: Convert vtime to use nsec units directly Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 31/36] ia64: Remove unused cputime definitions Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 32/36] s390: Make arch_cpu_idle_time() to return nsecs Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 33/36] powerpc: Remove unused cputime definitions Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 34/36] s390: " Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 35/36] cputime: Remove unused nsec_to_cputime Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 36/36] cputime: Remove asm generic headers Frederic Weisbecker
2016-11-18 12:08 ` [PATCH 00/36] cputime: Convert core use of cputime_t to nsecs Martin Schwidefsky
2016-11-18 14:47 ` Frederic Weisbecker
2016-11-21 6:59 ` Martin Schwidefsky
2016-11-21 10:17 ` Martin Schwidefsky
2016-11-22 13:45 ` Frederic Weisbecker [this message]
2016-11-22 14:28 ` Martin Schwidefsky
2016-11-21 16:20 ` Frederic Weisbecker
2016-11-22 6:11 ` Martin Schwidefsky
2016-11-21 9:49 ` Ingo Molnar
2016-11-21 16:22 ` Frederic Weisbecker
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20161122134550.GA21436@lerouge \
--to=fweisbec@gmail.com \
--cc=benh@kernel.crashing.org \
--cc=fenghua.yu@intel.com \
--cc=heiko.carstens@de.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=mpe@ellerman.id.au \
--cc=paulus@samba.org \
--cc=peterz@infradead.org \
--cc=riel@redhat.com \
--cc=schwidefsky@de.ibm.com \
--cc=sgruszka@redhat.com \
--cc=tglx@linutronix.de \
--cc=tony.luck@intel.com \
--cc=wanpeng.li@hotmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).