All of lore.kernel.org
 help / color / mirror / Atom feed
From: Martin Schwidefsky <schwidefsky@de.ibm.com>
To: Frederic Weisbecker <fweisbec@gmail.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Tony Luck <tony.luck@intel.com>,
	Wanpeng Li <wanpeng.li@hotmail.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Paul Mackerras <paulus@samba.org>, Ingo Molnar <mingo@kernel.org>,
	Fenghua Yu <fenghua.yu@intel.com>, Rik van Riel <riel@redhat.com>,
	Stanislaw Gruszka <sgruszka@redhat.com>
Subject: Re: [PATCH 09/10] s390/cputime: delayed accounting of system time
Date: Mon, 12 Dec 2016 11:27:54 +0100	[thread overview]
Message-ID: <20161212112754.5ad104cf@mschwideX1> (raw)
In-Reply-To: <20161210014804.GA3023@lerouge>

On Sat, 10 Dec 2016 02:48:06 +0100
Frederic Weisbecker <fweisbec@gmail.com> wrote:

> On Tue, Dec 06, 2016 at 03:32:22AM +0100, Frederic Weisbecker wrote:
> > From: Martin Schwidefsky <schwidefsky@de.ibm.com>
> > 
> > The account_system_time() function is called with a cputime that
> > occurred while running in the kernel. The function detects which
> > context the CPU is currently running in and accounts the time to
> > the correct bucket. This forces the arch code to account the
> > cputime for hardirq and softirq immediately.
> > 
> > Such accounting function can be costly and perform unwelcome divisions
> > and multiplications, among others.
> > 
> > The arch code can delay the accounting for system time. For s390
> > the accounting is done once per timer tick and for each task switch.
> > 
> > Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
> > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > Cc: Paul Mackerras <paulus@samba.org>
> > Cc: Michael Ellerman <mpe@ellerman.id.au>
> > Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> > Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> > Cc: Tony Luck <tony.luck@intel.com>
> > Cc: Fenghua Yu <fenghua.yu@intel.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Rik van Riel <riel@redhat.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Cc: Stanislaw Gruszka <sgruszka@redhat.com>
> > Cc: Wanpeng Li <wanpeng.li@hotmail.com>
> > [rebase against latest cputime tree, massaged changelog accordingly]
> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>  
> 
> Looking at this patch again, I think I need to do another pass on it.
> Comments below:
> 
> >  /*
> >   * Update process times based on virtual cpu times stored by entry.S
> >   * to the lowcore fields user_timer, system_timer & steal_clock.
> >   */
> >  static int do_account_vtime(struct task_struct *tsk, int hardirq_offset)
> >  {
> > -	u64 timer, clock, user, system, steal;
> > -	u64 user_scaled, system_scaled;
> > +	u64 timer, clock, user, guest, system, hardirq, softirq, steal;
> >  
> >  	timer = S390_lowcore.last_update_timer;
> >  	clock = S390_lowcore.last_update_clock;
> > @@ -110,36 +119,57 @@ static int do_account_vtime(struct task_struct *tsk, int hardirq_offset)
> >  #endif
> >  		: "=m" (S390_lowcore.last_update_timer),
> >  		  "=m" (S390_lowcore.last_update_clock));
> > -	S390_lowcore.system_timer += timer - S390_lowcore.last_update_timer;
> > -	S390_lowcore.steal_timer += S390_lowcore.last_update_clock - clock;
> > +	clock = S390_lowcore.last_update_clock - clock;
> > +	timer -= S390_lowcore.last_update_timer;
> > +
> > +	if ((tsk->flags & PF_VCPU) && (irq_count() - hardirq_offset == 0))
> > +		S390_lowcore.guest_timer += timer;
> > +	else if (hardirq_count() - hardirq_offset)
> > +		S390_lowcore.hardirq_timer += timer;  
> 
> We should get rid of the hardirq_offset argument, it doesn't really make sense
> anymore. Also it makes the accounting buggy now. It's called from the tick
> through account_user_time() with hardirq_offset=1, so the irq time is incorrectly
> accumulated as system time. Guest time may be incorrect too.
> 
> In fact it may have been buggy even before this patchset because vtime_account_user()
> isn't only called from the tick but also from task switch, and hardirq_offset remains 1
> for those two cases. Not good.

For s390 the do_account_vtime function is called from vtime_task_switch and vtime_flush.
1) vtime_task_switch is exclusively called from finish_task_switch outside of irq context.
   The call to do_account_vtime with hardirq_offset==0 from vtime_task_switch is correct.
2) The call to vtime_flush in vtime_common_task_switch is irrelevant for s390 as we
   define __ARCH_HAS_VTIME_TASK_SWITCH
3) The call to vtime_flush in account_process_tick is done in irq context from
   update_process_times. hardirq_offset==1 is also correct.

As far as s390 is concerned that looks good.

> > +	else if (in_serving_softirq())
> > +		S390_lowcore.softirq_timer += timer;
> > +	else
> > +		S390_lowcore.system_timer += timer;
> >  
> >  	/* Update MT utilization calculation */
> >  	if (smp_cpu_mtid &&
> >  	    time_after64(jiffies_64, this_cpu_read(mt_scaling_jiffies)))
> >  		update_mt_scaling();
> >  
> > +	/* Calculate cputime delta */
> >  	user = S390_lowcore.user_timer - tsk->thread.user_timer;
> > -	S390_lowcore.steal_timer -= user;
> >  	tsk->thread.user_timer = S390_lowcore.user_timer;
> > -
> > +	guest = S390_lowcore.guest_timer - tsk->thread.guest_timer;
> > +	tsk->thread.guest_timer = S390_lowcore.guest_timer;
> >  	system = S390_lowcore.system_timer - tsk->thread.system_timer;
> > -	S390_lowcore.steal_timer -= system;
> >  	tsk->thread.system_timer = S390_lowcore.system_timer;
> > +	hardirq = S390_lowcore.hardirq_timer - tsk->thread.hardirq_timer;
> > +	tsk->thread.hardirq_timer = S390_lowcore.hardirq_timer;
> > +	softirq = S390_lowcore.softirq_timer - tsk->thread.softirq_timer;
> > +	tsk->thread.softirq_timer = S390_lowcore.softirq_timer;
> > +	S390_lowcore.steal_timer +=
> > +		clock - user - guest - system - hardirq - softirq;
> >  
> > -	user_scaled = user;
> > -	system_scaled = system;
> > -	/* Do MT utilization scaling */
> > -	if (smp_cpu_mtid) {
> > -		u64 mult = __this_cpu_read(mt_scaling_mult);
> > -		u64 div = __this_cpu_read(mt_scaling_div);
> > +	/* Push account value */
> > +	if (user) {
> > +		account_user_time(tsk, user);
> > +		tsk->utimescaled += scale_vtime(user);
> > +	}
> >  
> > -		user_scaled = (user_scaled * mult) / div;
> > -		system_scaled = (system_scaled * mult) / div;
> > +	if (guest) {
> > +		account_guest_time(tsk, guest);
> > +		tsk->utimescaled += scale_vtime(guest);
> >  	}
> > -	account_user_time(tsk, user);
> > -	tsk->utimescaled += user_scaled;
> > -	account_system_time(tsk, hardirq_offset, system);
> > -	tsk->stimescaled += system_scaled;
> > +
> > +	if (system)
> > +		account_system_index_scaled(tsk, system, scale_vtime(system),
> > +					    CPUTIME_SYSTEM);
> > +	if (hardirq)
> > +		account_system_index_scaled(tsk, hardirq, scale_vtime(hardirq),
> > +					    CPUTIME_IRQ);
> > +	if (softirq)
> > +		account_system_index_scaled(tsk, softirq, scale_vtime(softirq),
> > +					    CPUTIME_SOFTIRQ);
> >  
> >  	steal = S390_lowcore.steal_timer;
> >  	if ((s64) steal > 0) {
> > @@ -147,16 +177,22 @@ static int do_account_vtime(struct task_struct *tsk, int hardirq_offset)
> >  		account_steal_time(steal);
> >  	}
> >  
> > -	return virt_timer_forward(user + system);
> > +	return virt_timer_forward(user + guest + system + hardirq + softirq);
> >  }
> >  
> >  void vtime_task_switch(struct task_struct *prev)
> >  {
> >  	do_account_vtime(prev, 0);  
> 
> This call should be removed, the task switch already calls vtime_account_user().

The vtime_account_user function is empty for s390..
 
> >  	prev->thread.user_timer = S390_lowcore.user_timer;
> > +	prev->thread.guest_timer = S390_lowcore.guest_timer;
> >  	prev->thread.system_timer = S390_lowcore.system_timer;
> > +	prev->thread.hardirq_timer = S390_lowcore.hardirq_timer;
> > +	prev->thread.softirq_timer = S390_lowcore.softirq_timer;
> >  	S390_lowcore.user_timer = current->thread.user_timer;
> > +	S390_lowcore.guest_timer = current->thread.guest_timer;
> >  	S390_lowcore.system_timer = current->thread.system_timer;
> > +	S390_lowcore.hardirq_timer = current->thread.hardirq_timer;
> > +	S390_lowcore.softirq_timer = current->thread.softirq_timer;
> >  }  
> 


-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

  reply	other threads:[~2016-12-12 10:28 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-12-06  2:32 [PATCH 00/10] vtime: Delay cputime accounting to tick Frederic Weisbecker
2016-12-06  2:32 ` [FIX][PATCH 01/10] powerpc32: Fix stale scaled stime on context switch Frederic Weisbecker
2016-12-06  2:32 ` [FIX][PATCH 02/10] ia64: Fix wrong start cputime assignment on task switch Frederic Weisbecker
2016-12-06  2:32 ` [PATCH 03/10] cputime: Allow accounting system time using cpustat index Frederic Weisbecker
2016-12-06  2:32 ` [PATCH 04/10] cputime: Export account_guest_time Frederic Weisbecker
2016-12-06  2:32 ` [PATCH 05/10] powerpc: Prepare accounting structure for cputime flush on tick Frederic Weisbecker
2016-12-06  2:32 ` [PATCH 06/10] powerpc: Migrate stolen_time field to accounting structure Frederic Weisbecker
2016-12-06  2:32 ` [PATCH 07/10] powerpc/vtime: Accumulate cputime and account only on tick/task switch Frederic Weisbecker
2016-12-06  2:32 ` [PATCH 08/10] ia64: " Frederic Weisbecker
2016-12-06  2:32 ` [PATCH 09/10] s390/cputime: delayed accounting of system time Frederic Weisbecker
2016-12-10  1:48   ` Frederic Weisbecker
2016-12-12 10:27     ` Martin Schwidefsky [this message]
2016-12-12 15:02       ` Frederic Weisbecker
2016-12-13 11:13         ` Martin Schwidefsky
2016-12-13 13:21           ` Martin Schwidefsky
2016-12-14  1:44             ` Frederic Weisbecker
2016-12-20 14:13               ` Martin Schwidefsky
2016-12-20 14:30                 ` Frederic Weisbecker
2016-12-13 14:38           ` Frederic Weisbecker
2016-12-06  2:32 ` [PATCH 10/10] vtime: Rename vtime_account_user() to vtime_flush() Frederic Weisbecker
2016-12-06  4:20 ` [PATCH 00/10] vtime: Delay cputime accounting to tick Paul Mackerras
2016-12-06  7:04   ` Martin Schwidefsky
2016-12-06 14:34   ` Frederic Weisbecker
2016-12-06  8:40 ` Christian Borntraeger
  -- strict thread matches above, loose matches on Subject: below --
2017-01-05 17:11 [PATCH 00/10] vtime: Delay cputime accounting to tick / context switch Frederic Weisbecker
2017-01-05 17:11 ` [PATCH 09/10] s390/cputime: delayed accounting of system time Frederic Weisbecker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161212112754.5ad104cf@mschwideX1 \
    --to=schwidefsky@de.ibm.com \
    --cc=benh@kernel.crashing.org \
    --cc=fenghua.yu@intel.com \
    --cc=fweisbec@gmail.com \
    --cc=heiko.carstens@de.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=mpe@ellerman.id.au \
    --cc=paulus@samba.org \
    --cc=peterz@infradead.org \
    --cc=riel@redhat.com \
    --cc=sgruszka@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=wanpeng.li@hotmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.