From: Jeremy Fitzhardinge <jeremy@goop.org>
To: Jed Smith <jed@linode.com>
Cc: xen-devel@lists.xensource.com
Subject: Re: Runaway real/sys time in newer paravirt domUs?
Date: Tue, 06 Jul 2010 12:05:01 -0700
Message-ID: <4C337E5D.7040407@goop.org>
In-Reply-To: <B64CF51D-D1D4-494C-BEA8-F5C6F0A926B6@linode.com>
On 07/06/2010 09:32 AM, Jed Smith wrote:
> Good morning,
>
> We've had a few reports from domU customers[1] - confirmed by myself - that CPU
> time accounting is very inaccurate in certain circumstances. This issue seems
> to be limited to x86_64 domUs, starting around the 2.6.32 family (but I can't be
> sure of that).
>
> The symptoms of the flaw include top reporting hours and days of CPU consumed by
> a task which has been running for mere seconds of wall time, as well as the
> time(1) utility reporting hundreds of years in some cases. Contra-indicatively,
> the /proc/stat timers on all four VCPUs increment at roughly the expected rate.
> Needless to say, this is puzzling.
>
> A test case which highlights the failure has been brought to our attention by
> Ævar Arnfjörð Bjarmason, which is a simple Perl script[2] that forks and
> executes numerous dig(1) processes. At the end of his script, time(1) reports
> 268659840m0.951s of user and 38524003m13.072s of system time consumed. I am
> able to confirm this demonstration using:
>
> - Xen 3.4.1 on dom0 2.6.18.8-931-2
> - Debian Lenny on domU 2.6.32.12-x86_64-linode12 [3]
>
> Running Ævar's test case looks like this, in that domU:
>
>
>> real 0m30.741s
>> user 307399002m50.773s
>> sys 46724m44.192s
>>
> However, a quick busyloop in Python seems to report the correct time:
>
>
>> li21-66:~# cat doit.py
>> for i in xrange(10000000):
>>     a = i ** 5
>>
>> li21-66:~# time python doit.py
>>
>> real 0m16.600s
>> user 0m16.593s
>> sys 0m0.006s
>>
> I rebooted the domU, and the problem no longer exists. It seems to be transient
> in nature, and difficult to isolate. /proc/stat seems to increment normally:
>
>
>> li21-66:/proc# cat stat | grep "cpu " && sleep 1 && cat stat | grep "cpu "
>> cpu 3742 0 1560 700180 1326 0 27 1282 0
>> cpu 3742 0 1562 700983 1326 0 27 1282 0
>>
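The two samples above can also be compared field by field rather than by eye. A minimal sketch, assuming the nine-column "cpu" line layout documented in proc(5) for kernels of this vintage; the helper names are made up for illustration:

```python
# Field order of the aggregate "cpu" line in /proc/stat, per proc(5).
# Values are cumulative ticks in USER_HZ units.
FIELDS = ("user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest")


def parse_cpu_line(line):
    """Turn one 'cpu ...' line into a {field: ticks} dict."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("not an aggregate cpu line: %r" % line)
    return dict(zip(FIELDS, (int(p) for p in parts[1:])))


def cpu_delta(before, after):
    """Per-field tick difference between two samples of the cpu line."""
    b = parse_cpu_line(before)
    a = parse_cpu_line(after)
    return {name: a[name] - b[name] for name in FIELDS if name in a and name in b}


if __name__ == "__main__":
    # The two samples quoted above.
    before = "cpu 3742 0 1560 700180 1326 0 27 1282 0"
    after = "cpu 3742 0 1562 700983 1326 0 27 1282 0"
    changed = {k: v for k, v in cpu_delta(before, after).items() if v}
    print(changed)
```

Only the fields that moved between the two samples are printed, which makes a counter jumping by an implausible amount easy to spot.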
> I'm not sure where to begin with this one - any thoughts?
>
It would be helpful to identify what kernel version the change of
behaviour started in (ideally a git bisect down to a particular change,
but a pair of versions would be close enough).
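
For a bisect, the good/bad decision at each step could be automated with a check like the one below. This is only a sketch: the helper names are hypothetical, it assumes bash-style time output, and it takes the four VCPUs from the report as the CPU count.

```python
import re

# Matches bash `time` fields such as "0m16.593s" or "307399002m50.773s".
_TIME_RE = re.compile(r"^(\d+)m(\d+(?:\.\d+)?)s$")


def parse_time_field(text):
    """Convert a bash `time` field (minutes + seconds) to seconds."""
    m = _TIME_RE.match(text.strip())
    if m is None:
        raise ValueError("unrecognised time field: %r" % text)
    return int(m.group(1)) * 60 + float(m.group(2))


def is_runaway(real, user, sys_, ncpus=4, slack=1.0):
    """True if the reported CPU time exceeds what `ncpus` CPUs could
    possibly deliver in `real` seconds of wall time (plus some slack)."""
    return (user + sys_) > real * ncpus + slack


if __name__ == "__main__":
    # Figures quoted in the report: the dig test case, then the busyloop.
    samples = [
        ("0m30.741s", "307399002m50.773s", "46724m44.192s"),
        ("0m16.600s", "0m16.593s", "0m0.006s"),
    ]
    for real, user, sys_ in samples:
        fields = [parse_time_field(f) for f in (real, user, sys_)]
        print(real, "runaway" if is_runaway(*fields) else "ok")
```

With the figures from the report, the dig test case trips the check and the Python busyloop does not.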

I think this is the same problem as
https://bugzilla.kernel.org/show_bug.cgi?id=16314

Thanks,
    J