linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Frederic Weisbecker <fweisbec@gmail.com>
To: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Tony Luck <tony.luck@intel.com>,
	Wanpeng Li <wanpeng.li@hotmail.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Paul Mackerras <paulus@samba.org>, Ingo Molnar <mingo@kernel.org>,
	Fenghua Yu <fenghua.yu@intel.com>, Rik van Riel <riel@redhat.com>,
	Stanislaw Gruszka <sgruszka@redhat.com>
Subject: Re: [PATCH 00/36] cputime: Convert core use of cputime_t to nsecs
Date: Tue, 22 Nov 2016 14:45:56 +0100	[thread overview]
Message-ID: <20161122134550.GA21436@lerouge> (raw)
In-Reply-To: <20161121111728.13a0a3db@mschwide>

On Mon, Nov 21, 2016 at 11:17:28AM +0100, Martin Schwidefsky wrote:
> On Mon, 21 Nov 2016 07:59:56 +0100
> Martin Schwidefsky <schwidefsky@de.ibm.com> wrote:
> 
> > On Fri, 18 Nov 2016 15:47:02 +0100
> > Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > 
> > > On Fri, Nov 18, 2016 at 01:08:46PM +0100, Martin Schwidefsky wrote:
> > > > On Thu, 17 Nov 2016 19:08:07 +0100
> > > > Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > > > 
> > > > Now it has been proposed to implement lazy accounting to accumulate deltas
> > > > and do the expensive conversions only infrequently. This is pretty straight-
> > > > forward for account_user_time but to do this for the account_system_time
> > > > function is more complicated. The function has to differentiate between
> > > > guest/hardirq/softirq and pure system time. We would need to keep sums for
> > > > each bucket and provide a separate function to add to each bucket. Like
> > > > account_guest_time(), account_hardirq_time(), account_softirq_time() and
> > > > account_system_time(). Then it is up to the arch code to sort out the details
> > > > and call the accounting code once per jiffy for each of the buckets.
> > > 
> > > That wouldn't be too hard really. The s390 code in vtime.c already does that.
> > 
> > Yes, I agree that the accumulating change would not be too hard. Can I make the
> > request that we try to get that done first before doing the cleanup ?
> 
> Played with the idea a bit, here is a prototype patch to do the delay system time
> accounting for s390. It applies against the latest s390 features tree which you'll
> find here
> 
> 	git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git features
> 
> The details probably needs some more work but it works.
> 
> --
> From 1b5ef9ddf899da81a48de826f783b15e6fc45d25 Mon Sep 17 00:00:00 2001
> From: Martin Schwidefsky <schwidefsky@de.ibm.com>
> Date: Mon, 21 Nov 2016 10:44:10 +0100
> Subject: [PATCH] s390/cputime: delayed accounting of system time
> 
> The account_system_time() function is called with a cputime that
> occurred while running in the kernel. The function detects which
> context the CPU is currently running in and accounts the time to
> the correct bucket. This forces the arch code to account the
> cputime for hardirq and softirq immediately.
> 
> Make account_guest_time non-static and add account_sys_time,
> account_hardirq_time and account_softirq_time. With these functions
> the arch code can delay the accounting for system time. For s390
> the accounting is done once per timer tick and for each task switch.
> 
> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

Thanks a lot for taking care of that! I'll give a try to do the same
on powerpc.

A few comments below:

> ---
>  arch/s390/include/asm/lowcore.h   |  65 ++++++++++++-----------
>  arch/s390/include/asm/processor.h |   3 ++
>  arch/s390/kernel/vtime.c          | 106 ++++++++++++++++++++++----------------
>  include/linux/kernel_stat.h       |  13 +++--
>  kernel/sched/cputime.c            |  22 +++++++-
>  5 files changed, 129 insertions(+), 80 deletions(-)
> 
> diff --git a/arch/s390/include/asm/lowcore.h b/arch/s390/include/asm/lowcore.h
> index 62a5cf1..8a5b082 100644
> --- a/arch/s390/include/asm/lowcore.h
> +++ b/arch/s390/include/asm/lowcore.h
[...]
> @@ -110,34 +119,48 @@ static int do_account_vtime(struct task_struct *tsk, int hardirq_offset)
>  #endif
>  		: "=m" (S390_lowcore.last_update_timer),
>  		  "=m" (S390_lowcore.last_update_clock));
> -	S390_lowcore.system_timer += timer - S390_lowcore.last_update_timer;
> -	S390_lowcore.steal_timer += S390_lowcore.last_update_clock - clock;
> +	clock = S390_lowcore.last_update_clock - clock;
> +	timer -= S390_lowcore.last_update_timer;
> +
> +	if ((tsk->flags & PF_VCPU) && (irq_count() - hardirq_offset == 0))
> +		S390_lowcore.guest_timer += timer;
> +	else if (hardirq_count() - hardirq_offset)
> +		S390_lowcore.hardirq_timer += timer;
> +	else if (in_serving_softirq())
> +		S390_lowcore.softirq_timer += timer;
> +	else
> +		S390_lowcore.system_timer += timer;

I initially thought that some code could be shared for that whole accumulation. Now I
don't know if it would be a good idea. An example would be to deal with the contexts above
in order to store the accumulation to the appropriate place.

>  
>  	/* Update MT utilization calculation */
>  	if (smp_cpu_mtid &&
>  	    time_after64(jiffies_64, this_cpu_read(mt_scaling_jiffies)))
>  		update_mt_scaling();
>  
> +	/* Calculate cputime delta */
>  	user = S390_lowcore.user_timer - tsk->thread.user_timer;
> -	S390_lowcore.steal_timer -= user;
>  	tsk->thread.user_timer = S390_lowcore.user_timer;
> -
> +	guest = S390_lowcore.guest_timer - tsk->thread.guest_timer;
> +	tsk->thread.guest_timer = S390_lowcore.guest_timer;
>  	system = S390_lowcore.system_timer - tsk->thread.system_timer;
> -	S390_lowcore.steal_timer -= system;
>  	tsk->thread.system_timer = S390_lowcore.system_timer;
> -
> -	user_scaled = user;
> -	system_scaled = system;
> -	/* Do MT utilization scaling */
> -	if (smp_cpu_mtid) {
> -		u64 mult = __this_cpu_read(mt_scaling_mult);
> -		u64 div = __this_cpu_read(mt_scaling_div);
> -
> -		user_scaled = (user_scaled * mult) / div;
> -		system_scaled = (system_scaled * mult) / div;
> -	}
> -	account_user_time(tsk, user, user_scaled);
> -	account_system_time(tsk, hardirq_offset, system, system_scaled);
> +	hardirq = S390_lowcore.hardirq_timer - tsk->thread.hardirq_timer;
> +	tsk->thread.hardirq_timer = S390_lowcore.hardirq_timer;
> +	softirq = S390_lowcore.softirq_timer - tsk->thread.softirq_timer;
> +	tsk->thread.softirq_timer = S390_lowcore.softirq_timer;
> +	S390_lowcore.steal_timer +=
> +		clock - user - guest - system - hardirq - softirq;
> +
> +	/* Push account value */
> +	if (user)
> +		account_user_time(tsk, user, scale_vtime(user));
> +	if (guest)
> +		account_guest_time(tsk, guest, scale_vtime(guest));
> +	if (system)
> +		account_sys_time(tsk, system, scale_vtime(system));
> +	if (hardirq)
> +		account_hardirq_time(tsk, hardirq, scale_vtime(hardirq));
> +	if (softirq)
> +		account_softirq_time(tsk, softirq, scale_vtime(softirq));

And doing that would be another part of the shared code.

>  
>  	steal = S390_lowcore.steal_timer;
>  	if ((s64) steal > 0) {
> @@ -145,16 +168,22 @@ static int do_account_vtime(struct task_struct *tsk, int hardirq_offset)
>  		account_steal_time(steal);
>  	}
>  
> -	return virt_timer_forward(user + system);
> +	return virt_timer_forward(user + guest + system + hardirq + softirq);
>  }
>  
>  void vtime_task_switch(struct task_struct *prev)
>  {
>  	do_account_vtime(prev, 0);
>  	prev->thread.user_timer = S390_lowcore.user_timer;
> +	prev->thread.guest_timer = S390_lowcore.guest_timer;
>  	prev->thread.system_timer = S390_lowcore.system_timer;
> +	prev->thread.hardirq_timer = S390_lowcore.hardirq_timer;
> +	prev->thread.softirq_timer = S390_lowcore.softirq_timer;
>  	S390_lowcore.user_timer = current->thread.user_timer;
> +	S390_lowcore.guest_timer = current->thread.guest_timer;
>  	S390_lowcore.system_timer = current->thread.system_timer;
> +	S390_lowcore.hardirq_timer = current->thread.hardirq_timer;
> +	S390_lowcore.softirq_timer = current->thread.softirq_timer;
>  }

Ditto.

>  
>  /*
> @@ -174,31 +203,22 @@ void vtime_account_user(struct task_struct *tsk)
>   */
>  void vtime_account_irq_enter(struct task_struct *tsk)
>  {
> -	u64 timer, system, system_scaled;
> +	u64 timer;
>  
>  	timer = S390_lowcore.last_update_timer;
>  	S390_lowcore.last_update_timer = get_vtimer();
> -	S390_lowcore.system_timer += timer - S390_lowcore.last_update_timer;
> -
> -	/* Update MT utilization calculation */
> -	if (smp_cpu_mtid &&
> -	    time_after64(jiffies_64, this_cpu_read(mt_scaling_jiffies)))
> -		update_mt_scaling();
> -
> -	system = S390_lowcore.system_timer - tsk->thread.system_timer;
> -	S390_lowcore.steal_timer -= system;
> -	tsk->thread.system_timer = S390_lowcore.system_timer;
> -	system_scaled = system;
> -	/* Do MT utilization scaling */
> -	if (smp_cpu_mtid) {
> -		u64 mult = __this_cpu_read(mt_scaling_mult);
> -		u64 div = __this_cpu_read(mt_scaling_div);
> -
> -		system_scaled = (system_scaled * mult) / div;
> -	}
> -	account_system_time(tsk, 0, system, system_scaled);
> -
> -	virt_timer_forward(system);
> +	timer -= S390_lowcore.last_update_timer;
> +
> +	if ((tsk->flags & PF_VCPU) && (irq_count() == 0))
> +		S390_lowcore.guest_timer += timer;
> +	else if (hardirq_count())
> +		S390_lowcore.hardirq_timer += timer;
> +	else if (in_serving_softirq())
> +		S390_lowcore.softirq_timer += timer;
> +	else
> +		S390_lowcore.system_timer += timer;

And Ditto.

We could put together the accumulation in a common struct in s390_lowcore,
and its mirror in thread struct then have helpers take care of the contexts.

How does that sound to you, would it help or hurt?

Thanks.

  reply	other threads:[~2016-11-22 13:46 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-17 18:08 [PATCH 00/36] cputime: Convert core use of cputime_t to nsecs Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 01/36] jiffies: Reuse TICK_NSEC instead of NSEC_PER_JIFFY Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 02/36] time: Introduce jiffies64_to_nsecs() Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 03/36] sched: Remove unused INIT_CPUTIME macro Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 04/36] cputime: Convert kcpustat to nsecs Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 05/36] macintosh/rack-meter: Remove cputime_t internal use Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 06/36] cputime: Convert guest time accounting to nsecs Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 07/36] cputime: Special API to return old-typed cputime Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 08/36] cputime: Convert task/group cputime to nsecs Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 09/36] alpha: Convert obsolete cputime_t " Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 10/36] x86: Convert obsolete cputime type " Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 11/36] isdn: " Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 12/36] binfmt: " Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 13/36] acct: " Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 14/36] delaycct: " Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 15/36] tsacct: " Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 16/36] signal: " Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 17/36] cputime: Increment kcpustat directly on irqtime account Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 18/36] posix-timers: Use TICK_NSEC instead of a dynamically ad-hoc calculated version Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 19/36] posix-timers: Convert internals to use nsecs Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 20/36] itimer: Convert internal cputime_t units to nsec Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 21/36] sched: Remove temporary cputime_t accessors Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 22/36] cputime: Push time to account_user_time() in nsecs Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 23/36] cputime: Push time to account_steal_time() " Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 24/36] cputime: Push time to account_idle_time() " Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 25/36] cputime: Push time to account_system_time() " Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 26/36] cputime: Complete nsec conversion of tick based accounting Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 27/36] vtime: Return nsecs instead of cputime_t to account Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 28/36] cputime: Remove jiffies based cputime Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 29/36] ia64: Move nsecs based cputime headers to the last arch using it Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 30/36] ia64: Convert vtime to use nsec units directly Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 31/36] ia64: Remove unused cputime definitions Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 32/36] s390: Make arch_cpu_idle_time() to return nsecs Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 33/36] powerpc: Remove unused cputime definitions Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 34/36] s390: " Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 35/36] cputime: Remove unused nsec_to_cputime Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 36/36] cputime: Remove asm generic headers Frederic Weisbecker
2016-11-18 12:08 ` [PATCH 00/36] cputime: Convert core use of cputime_t to nsecs Martin Schwidefsky
2016-11-18 14:47   ` Frederic Weisbecker
2016-11-21  6:59     ` Martin Schwidefsky
2016-11-21 10:17       ` Martin Schwidefsky
2016-11-22 13:45         ` Frederic Weisbecker [this message]
2016-11-22 14:28           ` Martin Schwidefsky
2016-11-21 16:20       ` Frederic Weisbecker
2016-11-22  6:11         ` Martin Schwidefsky
2016-11-21  9:49     ` Ingo Molnar
2016-11-21 16:22       ` Frederic Weisbecker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161122134550.GA21436@lerouge \
    --to=fweisbec@gmail.com \
    --cc=benh@kernel.crashing.org \
    --cc=fenghua.yu@intel.com \
    --cc=heiko.carstens@de.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=mpe@ellerman.id.au \
    --cc=paulus@samba.org \
    --cc=peterz@infradead.org \
    --cc=riel@redhat.com \
    --cc=schwidefsky@de.ibm.com \
    --cc=sgruszka@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=wanpeng.li@hotmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).