Re: [PATCH] remove blocked time accounting from xen "clockchip"

xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed

From: Laszlo Ersek <lersek@redhat.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>,
	xen-devel@lists.xensource.com, Joe Jin <joe.jin@oracle.com>,
	Zhenzhong Duan <zhenzhong.duan@oracle.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Subject: Re: [PATCH] remove blocked time accounting from xen "clockchip"
Date: Wed, 21 Dec 2011 14:53:25 +0100	[thread overview]
Message-ID: <4EF1E4D5.4090301@redhat.com> (raw)
In-Reply-To: <4EF1A7A002000078000694CA@nat28.tlf.novell.com>

On 12/21/11 09:32, Jan Beulich wrote:

> Specifically, without further adjustments, on a 4:3 over-committed
> system doing a kernel build, I'm seeing an over-accounting of
> approximately 4%. I was able to reduce this to slightly above 1%
> with below (experimental and not 32-bit compatible) change:
>
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -3864,6 +3864,29 @@
>   	return ns;
>   }
>
> +#ifndef CONFIG_XEN
> +#define _cputime_to_cputime64 cputime_to_cputime64
> +#else
> +#define NS_PER_TICK (1000000000 / HZ)
> +static DEFINE_PER_CPU(u64, stolen_snapshot);
> +static DEFINE_PER_CPU(unsigned int, stolen_residual);
> +
> +static cputime64_t _cputime_to_cputime64(cputime_t t)
> +{
> +	//todo not entirely suitable for 32-bit
> +	u64 s = this_cpu_read(runstate.time[RUNSTATE_runnable]);
> +	unsigned long adj = div_u64_rem(s - this_cpu_read(stolen_snapshot)
> +					  + this_cpu_read(stolen_residual),
> +					NS_PER_TICK,
> +					&__get_cpu_var(stolen_residual));
> +
> +	this_cpu_write(stolen_snapshot, s);
> +	t = cputime_sub(t, jiffies_to_cputime(adj));
> +
> +	return cputime_le(t, cputime_max) ? cputime_to_cputime64(t) : 0;
> +}
> +#endif
> +

Why is this not entirely suitable for 32-bit? div_u64_rem() indeed 
returns an u64, but for the truncation to do actual damage, the retval 
(which is the number of ticks that was stolen from the VCPU) would have 
to be greater than about 4*10^9, which seems improbable with 1 msec long 
ticks. Also cputime_sub() only depends, in order to leave any underflow 
detectable, on adj <= cputime_max (cca. 2*10^9). I'm likely looking in 
the wrong place though.


>   /*
>    * Account user cpu time to a process.
>    * @p: the process that the cpu time gets accounted to
> @@ -3882,7 +3905,7 @@
>   	account_group_user_time(p, cputime);
>
>   	/* Add user time to cpustat. */
> -	tmp = cputime_to_cputime64(cputime);
> +	tmp = _cputime_to_cputime64(cputime);
>   	if (TASK_NICE(p)>  0)
>   		cpustat->nice = cputime64_add(cpustat->nice, tmp);
>   	else
> @@ -3934,7 +3957,7 @@
>   void __account_system_time(struct task_struct *p, cputime_t cputime,
>   			cputime_t cputime_scaled, cputime64_t *target_cputime64)
>   {
> -	cputime64_t tmp = cputime_to_cputime64(cputime);
> +	cputime64_t tmp = _cputime_to_cputime64(cputime);
>
>   	/* Add system time to process. */
>   	p->stime = cputime_add(p->stime, cputime);
> @@ -3996,7 +4019,7 @@
>   void account_idle_time(cputime_t cputime)
>   {
>   	struct cpu_usage_stat *cpustat =&kstat_this_cpu.cpustat;
> -	cputime64_t cputime64 = cputime_to_cputime64(cputime);
> +	cputime64_t cputime64 = _cputime_to_cputime64(cputime);
>   	struct rq *rq = this_rq();
>
>   	if (atomic_read(&rq->nr_iowait)>  0)
> @@ -4033,7 +4056,7 @@
>   						struct rq *rq)
>   {
>   	cputime_t one_jiffy_scaled = cputime_to_scaled(cputime_one_jiffy);
> -	cputime64_t tmp = cputime_to_cputime64(cputime_one_jiffy);
> +	cputime64_t tmp = _cputime_to_cputime64(cputime_one_jiffy);
>   	struct cpu_usage_stat *cpustat =&kstat_this_cpu.cpustat;
>
>   	if (irqtime_account_hi_update()) {
>
> I currently have no idea what the remain 1% could be attributed to,
> so thoughts from others would certainly be welcome.

What about irqtime_account_process_tick() calling account_idle_time() 
with "cputime_one_jiffy" -- it could more frequently trigger the 
underflow in _cputime_to_cputime64(). In that case we might want to 
decrease idle time (ie. account "negative" cputime against idle), but can't.

As always, apologies if what I wrote is painfully wrong.
Laszlo

next prev parent reply	other threads:[~2011-12-21 13:53 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-10-18 20:42 [PATCH] remove blocked time accounting from xen "clockchip" Laszlo Ersek
2011-10-19  7:51 ` Jan Beulich
2011-10-19 14:54   ` Laszlo Ersek
2011-10-20 14:35   ` Laszlo Ersek
2011-10-20 15:02     ` Laszlo Ersek
2011-10-26 20:52       ` Konrad Rzeszutek Wilk
2011-11-09 13:35 ` Jan Beulich
2011-11-09 17:47   ` Laszlo Ersek
2011-11-10  8:32     ` Jan Beulich
2011-11-10 18:05   ` Jeremy Fitzhardinge
2012-01-19 19:42     ` Konrad Rzeszutek Wilk
2012-01-20  9:57       ` Jan Beulich
2012-01-20 16:00         ` Konrad Rzeszutek Wilk
2012-01-23 22:07         ` Konrad Rzeszutek Wilk
2011-12-21  8:32 ` Jan Beulich
2011-12-21 13:53   ` Laszlo Ersek [this message]
2011-12-21 14:58     ` Jan Beulich
  -- strict thread matches above, loose matches on Subject: below --
2011-12-22  8:49 Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4EF1E4D5.4090301@redhat.com \
    --to=lersek@redhat.com \
    --cc=JBeulich@suse.com \
    --cc=jeremy@goop.org \
    --cc=joe.jin@oracle.com \
    --cc=konrad.wilk@oracle.com \
    --cc=xen-devel@lists.xensource.com \
    --cc=zhenzhong.duan@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).