All of lore.kernel.org
 help / color / mirror / Atom feed
From: Konrad Rzeszutek Wilk <konrad@darnok.org>
To: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: xen-devel@lists.xensource.com,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Joe Jin <joe.jin@oracle.com>,
	Zhenzhong Duan <zhenzhong.duan@oracle.com>,
	Jan Beulich <JBeulich@suse.com>, Laszlo Ersek <lersek@redhat.com>
Subject: Re: [PATCH] remove blocked time accounting from xen "clockchip"
Date: Thu, 19 Jan 2012 14:42:39 -0500	[thread overview]
Message-ID: <20120119194232.GA3728@konrad-lan> (raw)
In-Reply-To: <4EBC125A.70300@goop.org>

On Thu, Nov 10, 2011 at 10:05:14AM -0800, Jeremy Fitzhardinge wrote:
> On 11/09/2011 05:35 AM, Jan Beulich wrote:
> >>>> On 18.10.11 at 22:42, Laszlo Ersek <lersek@redhat.com> wrote:
> >> ... because the "clock_event_device framework" already accounts for idle
> >> time through the "event_handler" function pointer in
> >> xen_timer_interrupt().
> >>
> >> The patch is intended as the completion of [1]. It should fix the double
> >> idle times seen in PV guests' /proc/stat [2]. It should be orthogonal to
> >> stolen time accounting (the removed code seems to be isolated).
> > After some more looking around I still think it's incorrect, albeit for
> > a different reason: What tick_nohz_restart_sched_tick() accounts
> > as idle time is *all* time that passed while in cpu_idle(). What gets
> > accounted in do_stolen_accounting() (without your patch) is
> > different:
> > - time the vCPU was in RUNSTATE_blocked gets accounted as idle
> > - time the vCPU was in RUNSTATE_runnable and RUNSTATE_offline
> >   gets accounted as stolen.
> >
> > That is, on an overcommitted system (and without your patch) I
> > would expect you to not see the (full) double idle increment for a not
> > fully idle and not fully loaded vCPU.
> >
> > Now we can of course say that for most purposes it's fine to
> > ignore the difference between idle and steal time, but there must
> > be a reason these have been getting accounted separately in Linux
> > without regard to Xen.
> >
> > So I really think that what needs to be suppressed to address this
> > is tick_nohz_restart_sched_tick()'s call to account_idle_ticks(). But
> > simply forcing CONFIG_VIRT_CPU_ACCOUNTING is not an immediate
> > option (not even when considering a Xen-only kernel), as that has
> > further implications. Instead I wonder whether under Xen we should
> > e.g. update per_cpu(tick_cpu_sched, cpu).idle_jiffies prior to calling
> > tick_nohz_restart_sched_tick() (possibly implicitly by doing the
> > update in do_stolen_accounting(), though that wouldn't work when
> > the wakeup is due to an interrupt other than the timer one).
> >
> > The alternative of accounting the negative of the steal time as
> > idle time in do_stolen_accounting() doesn't look correct either,
> > since not all stolen time was necessarily accounted as idle (some
> > would have got - also incorrectly - accounted as process or system
> > time).
> >
> > So my conclusion is that only the full implementation of what
> > CONFIG_VIRT_CPU_ACCOUNTING expects would actually get
> > things sorted out properly (but that would imply Xen *and* native
> > are willing to pay the performance price, or the enabling of this
> > can be made dynamic so that at least native could remain
> > unaffected).
> >
> > Or the negative stolen time should be accounted to the current
> > process, the system, or as idle, depending on the context which
> > do_stolen_accounting() runs in (just like timer_interrupt() did in
> > the XenoLinux kernels for positive accounting).
> 
> I was always suspicious of the timer-interrupt-based stolen time
> accounting code.  It's really a hold-over from when ticks were the main
> timekeeping mechanism, but with highres timers its massive overkill and
> probably a source of performance degradation.
> 
> Overall, the stolen time accounting isn't really very important.  The
> kernel doesn't use it at all internally - it doesn't affect scheduling
> decisions, for example.  It's only exported in /proc/stat, and some
> tools like top will display it if its non-zero.  It could be useful for
> diagnosing performance problems on a heavily loaded host system, but
> other tools like "xentop" would give you as much information.
> 
> So really, all this code is nice to have, but I'm not sure its worth a
> lot of time to fix, especially if it makes idle accounting hard (which
> *is* important).
> 
> However, that said, PeterZ recently added some code to the scheduler so
> that time "stolen" by interrupt handling isn't accounted against the
> task running at the time, and the design is specifically intended to
> also be useful for virtualization stolen time as well.  There are some
> KVM patches floating around to implement this, and we should look at

I finally got some time to look at them and I think they are these ones:

git log --oneline e03b644fe68b1c6401465b02724d261538dba10f..3c404b578fab699c4708279938078d9404b255a4 
3c404b5 KVM guest: Add a pv_ops stub for steal time
c9aaa89 KVM: Steal time implementation
9ddabbe KVM: KVM Steal time guest/host interface
4b6b35f KVM: Add constant to represent KVM MSRs enabled bit in guest/host interface

What is interesting is that they end up inserting a bunch of:

 
+       if (steal_account_process_tick())
+               return;
+

in irqtime_account_process_tick and in account_process_tick.

The steal_account_process_tick just makes a pv_time_ops.steal_clock
based on the asm goto of paravirt_steal_enabled key.

I think if we were to persue this we could rip out in the
'do_stolen_accounting' the call to 'accont_idle_tick' and
'account_idle_time' completly. In essence making that function
behave as a pv_time_ops.steal_clock implementation.

Also that would mean we could remove in the 'xen_timer_interrupt'
function the call to 'do_stolen_accounting'

I think this would mean we would account _only_ for the
RUNSTATE_runnable and RUNSTATE_offline as Laszlo's earlier patch
suggested?

And the RUNSTATE_blocked would be figured out by the existing
mechanism in the sched.c code?

> doing a Xen implementation.  That would be much more practically useful,
> since (I think) it allows the scheduler to be aware of stolen time and
> not penalize tasks for time they never got to use.  Or at the very least
> it could give you per-task stolen stats (maybe?) which would be useful.
> 
>     J
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

  reply	other threads:[~2012-01-19 19:42 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-10-18 20:42 [PATCH] remove blocked time accounting from xen "clockchip" Laszlo Ersek
2011-10-19  7:51 ` Jan Beulich
2011-10-19 14:54   ` Laszlo Ersek
2011-10-20 14:35   ` Laszlo Ersek
2011-10-20 15:02     ` Laszlo Ersek
2011-10-26 20:52       ` Konrad Rzeszutek Wilk
2011-11-09 13:35 ` Jan Beulich
2011-11-09 17:47   ` Laszlo Ersek
2011-11-10  8:32     ` Jan Beulich
2011-11-10 18:05   ` Jeremy Fitzhardinge
2012-01-19 19:42     ` Konrad Rzeszutek Wilk [this message]
2012-01-20  9:57       ` Jan Beulich
2012-01-20 16:00         ` Konrad Rzeszutek Wilk
2012-01-23 22:07         ` Konrad Rzeszutek Wilk
2011-12-21  8:32 ` Jan Beulich
2011-12-21 13:53   ` Laszlo Ersek
2011-12-21 14:58     ` Jan Beulich
  -- strict thread matches above, loose matches on Subject: below --
2011-12-22  8:49 Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120119194232.GA3728@konrad-lan \
    --to=konrad@darnok.org \
    --cc=JBeulich@suse.com \
    --cc=jeremy@goop.org \
    --cc=joe.jin@oracle.com \
    --cc=konrad.wilk@oracle.com \
    --cc=lersek@redhat.com \
    --cc=xen-devel@lists.xensource.com \
    --cc=zhenzhong.duan@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.