* [Qemu-devel] Steal time MSR not set properly during live migration?
@ 2015-06-03 12:12 Apollon Oikonomopoulos
2015-06-11 20:46 ` Apollon Oikonomopoulos
0 siblings, 1 reply; 4+ messages in thread
From: Apollon Oikonomopoulos @ 2015-06-03 12:12 UTC (permalink / raw)
To: qemu-devel; +Cc: pkg-qemu-devel, debian-admin
Hi,
I'm trying to debug an issue we're having with some debian.org machines
running in QEMU 2.1.2 instances (see [1] for more background). In short,
after a live migration guests running Debian Jessie (Linux 3.16) stop
accounting CPU time properly. /proc/stat in the guest shows no increase
in user and system time anymore (regardless of workload) and what stands
out are extremely large values for steal time:
% cat /proc/stat
cpu 2400 0 1842 650879168 2579640 0 25 136562317270 0 0
cpu0 1366 0 1028 161392988 1238598 0 11 383803090749 0 0
cpu1 294 0 240 162582008 639105 0 8 39686436048 0 0
cpu2 406 0 338 163331066 383867 0 4 333994238765 0 0
cpu3 332 0 235 163573105 318069 0 1 1223752959076 0 0
intr 355773871 33 10 0 0 0 0 3 0 1 0 0 36 144 0 0 1638612 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 5001741 41 0 8516993 0 3669582 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ctxt 837862829
btime 1431642967
processes 8529939
procs_running 1
procs_blocked 0
softirq 225193331 2 77532878 172 7250024 819289 0 54 33739135 176552 105675225
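For readers unfamiliar with the layout, here is a minimal Python sketch that picks the steal column out of a cpu line. It assumes the column order documented in proc(5) (user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice); the helper name parse_cpu_line is made up for illustration.

```python
# Column order of a "cpu" line in /proc/stat, assumed from proc(5).
FIELDS = ("user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice")


def parse_cpu_line(line):
    """Return (label, {field: ticks}) for one cpu line of /proc/stat."""
    label, *values = line.split()
    return label, dict(zip(FIELDS, map(int, values)))


# The cpu0 line from the output above.
label, ticks = parse_cpu_line(
    "cpu0 1366 0 1028 161392988 1238598 0 11 383803090749 0 0")
print(label, "steal:", ticks["steal"],
      "user+system:", ticks["user"] + ticks["system"])
```

The eighth value is the steal counter, which is many orders of magnitude larger than user and system time combined.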
Reading the memory pointed to by the steal time MSRs pre- and
post-migration, I can see that post-migration the high bytes are set to
0xff:
(qemu) xp /8b 0x1fc0cfc0
000000001fc0cfc0: 0x94 0x57 0x77 0xf5 0xff 0xff 0xff 0xff
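The dump above can be decoded with a short Python sketch. It assumes that the first eight bytes of the area are the little-endian 64-bit steal counter of the guest's steal time structure, in nanoseconds; that is my reading of the KVM paravirtual MSR documentation, not something stated in this thread.

```python
import struct

# The eight bytes printed by `xp /8b 0x1fc0cfc0`, lowest address first.
raw = bytes([0x94, 0x57, 0x77, 0xf5, 0xff, 0xff, 0xff, 0xff])

(as_unsigned,) = struct.unpack("<Q", raw)  # read as little-endian u64
(as_signed,) = struct.unpack("<q", raw)    # same bits read as s64

print(hex(as_unsigned))  # 0xfffffffff5775794
print(as_signed)         # -176728172
```

Read as a signed quantity the value is about -0.18 seconds, which suggests a small negative delta that wrapped around in an unsigned counter rather than a genuinely huge steal time.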
The "jump" in steal time happens when the guest is resumed on the
receiving side.
I've also been able to consistently reproduce this on a Ganeti cluster
at work, using QEMU 2.1.3 and kernels 3.16 and 4.0 in the guests. The
issue goes away if I disable the steal time MSR using `-cpu
qemu64,-kvm_steal_time`.
So it looks to me as if the steal time MSR is not set/copied properly
during live migration, although AFAICT it should be since commit
917367aa968fd4fef29d340e0c7ec8c608dffaab.
Any ideas?
Regards,
Apollon
[1] https://bugs.debian.org/785557
* Re: [Qemu-devel] Steal time MSR not set properly during live migration?
2015-06-03 12:12 [Qemu-devel] Steal time MSR not set properly during live migration? Apollon Oikonomopoulos
@ 2015-06-11 20:46 ` Apollon Oikonomopoulos
2015-06-11 21:42 ` Michael Tokarev
0 siblings, 1 reply; 4+ messages in thread
From: Apollon Oikonomopoulos @ 2015-06-11 20:46 UTC (permalink / raw)
To: qemu-devel; +Cc: debian-admin
On 15:12 Wed 03 Jun , Apollon Oikonomopoulos wrote:
> Any ideas?
As far as I understand, there is an issue when reading the MSR on the
incoming side: there is a KVM_SET_MSRS vcpu ioctl issued by the main
thread during initialization, that causes the initial vCPU steal time
value to be set using the main thread's (and not the vCPU thread's)
run_delay. Then, upon resuming execution, kvm_arch_load_vcpu uses the
vCPU thread's run_delay to determine steal time, causing an overflow.
The issue was introduced by commit
917367aa968fd4fef29d340e0c7ec8c608dffaab.
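The wraparound described above can be modelled in a few lines of Python. This is a simplified sketch of unsigned 64-bit delta accounting, not the kernel code; the function name and the run_delay numbers are hypothetical.

```python
U64_MASK = (1 << 64) - 1


def accumulate_steal(steal, last_run_delay, run_delay_now):
    """One update step: add the run_delay delta to steal, modulo 2**64."""
    delta = (run_delay_now - last_run_delay) & U64_MASK
    return (steal + delta) & U64_MASK, run_delay_now


# Hypothetical values in nanoseconds.
main_thread_run_delay = 500_000_000  # baseline recorded from the main thread
vcpu_thread_run_delay = 20_000_000   # what the vCPU thread reports later

steal, last = accumulate_steal(0, main_thread_run_delay,
                               vcpu_thread_run_delay)
print(hex(steal))
```

Because the baseline comes from a different thread and is larger than the vCPU thread's run_delay, the delta is negative and the unsigned result has all of its high bytes set to 0xff, matching the memory dump in the first message.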
For the full analysis, see https://bugs.debian.org/785557#64 and the
followup e-mail.
Regards,
Apollon
* Re: [Qemu-devel] Steal time MSR not set properly during live migration?
2015-06-11 20:46 ` Apollon Oikonomopoulos
@ 2015-06-11 21:42 ` Michael Tokarev
2015-08-28 14:57 ` Alexandre DERUMIER
0 siblings, 1 reply; 4+ messages in thread
From: Michael Tokarev @ 2015-06-11 21:42 UTC (permalink / raw)
To: qemu-devel, debian-admin; +Cc: Marcelo Tosatti, Gleb Natapov
11.06.2015 23:46, Apollon Oikonomopoulos wrote:
> On 15:12 Wed 03 Jun , Apollon Oikonomopoulos wrote:
>> Any ideas?
>
> As far as I understand, there is an issue when reading the MSR on the
> incoming side: there is a KVM_SET_MSRS vcpu ioctl issued by the main
> thread during initialization, that causes the initial vCPU steal time
> value to be set using the main thread's (and not the vCPU thread's)
> run_delay. Then, upon resuming execution, kvm_arch_load_vcpu uses the
> vCPU thread's run_delay to determine steal time, causing an overflow.
> The issue was introduced by commit
> 917367aa968fd4fef29d340e0c7ec8c608dffaab.
>
> For the full analysis, see https://bugs.debian.org/785557#64 and the
> followup e-mail.
Adding Cc's...
Thanks,
/mjt
* Re: [Qemu-devel] Steal time MSR not set properly during live migration?
2015-06-11 21:42 ` Michael Tokarev
@ 2015-08-28 14:57 ` Alexandre DERUMIER
0 siblings, 0 replies; 4+ messages in thread
From: Alexandre DERUMIER @ 2015-08-28 14:57 UTC (permalink / raw)
To: Michael Tokarev; +Cc: debian-admin, Marcelo Tosatti, qemu-devel, Gleb Natapov
Hi,
I hit this bug today on 3 Debian Jessie guests (kernel 3.16), after migration from QEMU 2.3 to QEMU 2.4.
Is it a QEMU bug or a guest kernel 3.16 bug?
Regards,
Alexandre Derumier
----- Original Message -----
From: "Michael Tokarev" <mjt@tls.msk.ru>
To: "qemu-devel" <qemu-devel@nongnu.org>, debian-admin@lists.debian.org
Cc: "Marcelo Tosatti" <mtosatti@redhat.com>, "Gleb Natapov" <gleb@redhat.com>
Sent: Thursday, 11 June 2015 23:42:02
Subject: Re: [Qemu-devel] Steal time MSR not set properly during live migration?
11.06.2015 23:46, Apollon Oikonomopoulos wrote:
> On 15:12 Wed 03 Jun , Apollon Oikonomopoulos wrote:
>> Any ideas?
>
> As far as I understand, there is an issue when reading the MSR on the
> incoming side: there is a KVM_SET_MSRS vcpu ioctl issued by the main
> thread during initialization, that causes the initial vCPU steal time
> value to be set using the main thread's (and not the vCPU thread's)
> run_delay. Then, upon resuming execution, kvm_arch_load_vcpu uses the
> vCPU thread's run_delay to determine steal time, causing an overflow.
> The issue was introduced by commit
> 917367aa968fd4fef29d340e0c7ec8c608dffaab.
>
> For the full analysis, see https://bugs.debian.org/785557#64 and the
> followup e-mail.
Adding Cc's...
Thanks,
/mjt