qemu-devel.nongnu.org archive mirror
* [Qemu-devel] Steal time MSR not set properly during live migration?
@ 2015-06-03 12:12 Apollon Oikonomopoulos
  2015-06-11 20:46 ` Apollon Oikonomopoulos
  0 siblings, 1 reply; 4+ messages in thread
From: Apollon Oikonomopoulos @ 2015-06-03 12:12 UTC (permalink / raw)
  To: qemu-devel; +Cc: pkg-qemu-devel, debian-admin

Hi,

I'm trying to debug an issue we're having with some debian.org machines 
running in QEMU 2.1.2 instances (see [1] for more background). In short, 
after a live migration guests running Debian Jessie (linux 3.16) stop 
accounting CPU time properly. /proc/stat in the guest shows no increase 
in user and system time anymore (regardless of workload) and what stands 
out are extremely large values for steal time:

 % cat /proc/stat
 cpu  2400 0 1842 650879168 2579640 0 25 136562317270 0 0
 cpu0 1366 0 1028 161392988 1238598 0 11 383803090749 0 0
 cpu1 294 0 240 162582008 639105 0 8 39686436048 0 0
 cpu2 406 0 338 163331066 383867 0 4 333994238765 0 0
 cpu3 332 0 235 163573105 318069 0 1 1223752959076 0 0
 intr 355773871 33 10 0 0 0 0 3 0 1 0 0 36 144 0 0 1638612 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 5001741 41 0 8516993 0 3669582 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 ctxt 837862829
 btime 1431642967
 processes 8529939
 procs_running 1
 procs_blocked 0
 softirq 225193331 2 77532878 172 7250024 819289 0 54 33739135 176552 105675225
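
For reference, steal time is the eighth numeric column of each cpu line. A minimal Python sketch for extracting it, assuming the standard /proc/stat column order (user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice); the helper name `steal_ticks` is made up for illustration:

```python
def steal_ticks(proc_stat_text):
    """Map each cpu line's label to its steal column (8th numeric field)."""
    result = {}
    for line in proc_stat_text.splitlines():
        fields = line.split()
        # Only the "cpu" and "cpuN" lines carry the per-state time columns.
        if not fields or not fields[0].startswith("cpu"):
            continue
        # fields[0] is the label, so the 8th numeric column is fields[8].
        if len(fields) >= 9:
            result[fields[0]] = int(fields[8])
    return result


# Abbreviated copy of the cpu lines from the output above.
SAMPLE = """\
cpu  2400 0 1842 650879168 2579640 0 25 136562317270 0 0
cpu0 1366 0 1028 161392988 1238598 0 11 383803090749 0 0
cpu1 294 0 240 162582008 639105 0 8 39686436048 0 0
cpu2 406 0 338 163331066 383867 0 4 333994238765 0 0
cpu3 332 0 235 163573105 318069 0 1 1223752959076 0 0
ctxt 837862829
"""

if __name__ == "__main__":
    for cpu, ticks in steal_ticks(SAMPLE).items():
        print(cpu, ticks)
```

On a live guest the same function can be fed the contents of /proc/stat directly, e.g. `steal_ticks(open("/proc/stat").read())`.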
 
Reading the memory pointed to by the steal time MSRs pre- and 
post-migration, I can see that post-migration the high bytes are set to 
0xff:

(qemu) xp /8b 0x1fc0cfc0
000000001fc0cfc0: 0x94 0x57 0x77 0xf5 0xff 0xff 0xff 0xff
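
To make the 0xff high bytes easier to interpret, here is a small Python sketch that decodes the eight bytes above. It assumes they are the little-endian 64-bit steal counter at the start of the guest's steal time area, counted in nanoseconds; that interpretation is an assumption, not something the monitor output itself confirms:

```python
import struct

# The eight bytes printed by the monitor command above, in memory order.
raw = bytes([0x94, 0x57, 0x77, 0xF5, 0xFF, 0xFF, 0xFF, 0xFF])

# Same bytes read as an unsigned and as a signed little-endian 64-bit value.
(unsigned_value,) = struct.unpack("<Q", raw)
(signed_value,) = struct.unpack("<q", raw)

if __name__ == "__main__":
    print(hex(unsigned_value))  # 0xfffffffff5775794
    print(signed_value)         # -176728172
```

Read as a signed number the value is a small negative quantity (roughly -0.18 s if the unit is nanoseconds), which is what an unsigned counter looks like after a subtraction has wrapped below zero.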

The "jump" in steal time happens when the guest is resumed on the 
receiving side.

I've also been able to consistently reproduce this on a Ganeti cluster 
at work, using QEMU 2.1.3 and kernels 3.16 and 4.0 in the guests. The 
issue goes away if I disable the steal time MSR using `-cpu 
qemu64,-kvm_steal_time`.

So, it looks to me as if the steal time MSR is not set/copied properly 
during live migration, although AFAICT this should be the case after 
917367aa968fd4fef29d340e0c7ec8c608dffaab.

Any ideas?

Regards,
Apollon

[1] https://bugs.debian.org/785557


* Re: [Qemu-devel] Steal time MSR not set properly during live migration?
  2015-06-03 12:12 [Qemu-devel] Steal time MSR not set properly during live migration? Apollon Oikonomopoulos
@ 2015-06-11 20:46 ` Apollon Oikonomopoulos
  2015-06-11 21:42   ` Michael Tokarev
  0 siblings, 1 reply; 4+ messages in thread
From: Apollon Oikonomopoulos @ 2015-06-11 20:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: debian-admin

On 15:12 Wed 03 Jun     , Apollon Oikonomopoulos wrote:
> Any ideas?

As far as I understand, there is an issue when reading the MSR on the 
incoming side: there is a KVM_SET_MSRS vcpu ioctl issued by the main 
thread during initialization, that causes the initial vCPU steal time 
value to be set using the main thread's (and not the vCPU thread's) 
run_delay. Then, upon resuming execution, kvm_arch_vcpu_load uses the 
vCPU thread's run_delay to determine steal time, causing an overflow.  
The issue was introduced by commit 
917367aa968fd4fef29d340e0c7ec8c608dffaab.
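
The wraparound described above can be illustrated with a toy Python model. This is a simplified sketch and not the kernel code: the class and method names are hypothetical, the run_delay figures are invented, and it only assumes that steal time is accumulated as an unsigned 64-bit difference between the current thread's run_delay and a previously stored baseline:

```python
MASK64 = (1 << 64) - 1  # emulate unsigned 64-bit arithmetic


class VcpuStealModel:
    """Toy model of per-vCPU steal time bookkeeping (hypothetical names)."""

    def __init__(self):
        self.steal = 0       # value the guest reads from the steal time area
        self.last_steal = 0  # run_delay baseline stored at the last update

    def set_msr(self, caller_run_delay):
        # The baseline comes from whichever thread issues the ioctl.
        self.last_steal = caller_run_delay

    def load(self, caller_run_delay):
        # The delta is computed against the stored baseline, modulo 2**64.
        delta = (caller_run_delay - self.last_steal) & MASK64
        self.steal = (self.steal + delta) & MASK64
        self.last_steal = caller_run_delay


def simulate(main_thread_run_delay, vcpu_thread_run_delay):
    model = VcpuStealModel()
    model.set_msr(main_thread_run_delay)  # issued by the main thread
    model.load(vcpu_thread_run_delay)     # first run on the vCPU thread
    return model.steal


if __name__ == "__main__":
    # Invented values: the main thread has waited longer than the vCPU thread.
    print(hex(simulate(500_000_000, 1_000_000)))
```

Whenever the main thread's run_delay exceeds the vCPU thread's, the difference wraps and the upper bytes of the result are all 0xff, matching the pattern seen in the memory dump. If the baseline and the update come from the same thread, the delta stays small and no wrap occurs.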

For the full analysis, see https://bugs.debian.org/785557#64 and the 
followup e-mail.

Regards,
Apollon


* Re: [Qemu-devel] Steal time MSR not set properly during live migration?
  2015-06-11 20:46 ` Apollon Oikonomopoulos
@ 2015-06-11 21:42   ` Michael Tokarev
  2015-08-28 14:57     ` Alexandre DERUMIER
  0 siblings, 1 reply; 4+ messages in thread
From: Michael Tokarev @ 2015-06-11 21:42 UTC (permalink / raw)
  To: qemu-devel, debian-admin; +Cc: Marcelo Tosatti, Gleb Natapov

11.06.2015 23:46, Apollon Oikonomopoulos wrote:
> On 15:12 Wed 03 Jun     , Apollon Oikonomopoulos wrote:
>> Any ideas?
> 
> As far as I understand, there is an issue when reading the MSR on the 
> incoming side: there is a KVM_SET_MSRS vcpu ioctl issued by the main 
> thread during initialization, that causes the initial vCPU steal time 
> value to be set using the main thread's (and not the vCPU thread's) 
> run_delay. Then, upon resuming execution, kvm_arch_vcpu_load uses the 
> vCPU thread's run_delay to determine steal time, causing an overflow.  
> The issue was introduced by commit 
> 917367aa968fd4fef29d340e0c7ec8c608dffaab.
> 
> For the full analysis, see https://bugs.debian.org/785557#64 and the 
> followup e-mail.

Adding Cc's...

Thanks,

/mjt


* Re: [Qemu-devel] Steal time MSR not set properly during live migration?
  2015-06-11 21:42   ` Michael Tokarev
@ 2015-08-28 14:57     ` Alexandre DERUMIER
  0 siblings, 0 replies; 4+ messages in thread
From: Alexandre DERUMIER @ 2015-08-28 14:57 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: debian-admin, Marcelo Tosatti, qemu-devel, Gleb Natapov

Hi,

I hit this bug today on 3 Debian Jessie guests (kernel 3.16), after migrating from QEMU 2.3 to QEMU 2.4.

Is this a QEMU bug or a bug in the 3.16 guest kernel?

Regards,

Alexandre Derumier


----- Original message -----
From: "Michael Tokarev" <mjt@tls.msk.ru>
To: "qemu-devel" <qemu-devel@nongnu.org>, debian-admin@lists.debian.org
Cc: "Marcelo Tosatti" <mtosatti@redhat.com>, "Gleb Natapov" <gleb@redhat.com>
Sent: Thursday, 11 June 2015 23:42:02
Subject: Re: [Qemu-devel] Steal time MSR not set properly during live migration?

11.06.2015 23:46, Apollon Oikonomopoulos wrote: 
> On 15:12 Wed 03 Jun , Apollon Oikonomopoulos wrote: 
>> Any ideas? 
> 
> As far as I understand, there is an issue when reading the MSR on the 
> incoming side: there is a KVM_SET_MSRS vcpu ioctl issued by the main 
> thread during initialization, that causes the initial vCPU steal time 
> value to be set using the main thread's (and not the vCPU thread's) 
> run_delay. Then, upon resuming execution, kvm_arch_vcpu_load uses the 
> vCPU thread's run_delay to determine steal time, causing an overflow. 
> The issue was introduced by commit 
> 917367aa968fd4fef29d340e0c7ec8c608dffaab. 
> 
> For the full analysis, see https://bugs.debian.org/785557#64 and the 
> followup e-mail. 

Adding Cc's... 

Thanks, 

/mjt 
