* guest suspend/resume & virtio: vring errors
From: Michael Tokarev @ 2011-07-08 19:02 UTC
To: KVM list

I tried a suspend/resume cycle for a Linux guest today, with a 100%
failure rate.

There are two possible scenarios after resume; both fail in the same
place, seen from different viewpoints. (You need a fairly recent guest
kernel for this to work at all: earlier kernels, including early
2.6.32, just stop somewhere at the start of the suspend cycle, but
2.6.32.42 and 3.0-rc6 "work".) Either the guest complains

  virtio_net virtio0: input:id 2 is not a head!

and enters an endless loop eating 100% CPU, or qemu-kvm exits (aborts)
with the message:

  kvm: Guest moved used index from 1 to 0

This is when trying to use virtio-net. The same happens when
networking is disabled entirely: it then complains about virtio-blk in
the same way, for example

  kvm: Guest moved used index from 1 to 49285

qemu-kvm is version 0.14.1 so far; I'll try a git version later.

With e1000 instead of virtio-net-pci, the suspend does not complete:
the guest kernel freezes after the message "Suspending console(s)" and
does not respond (but does not eat 100% CPU either), the same as with
the older guest kernel.

Has anyone succeeded with a suspend/resume cycle?

Thanks!

/mjt
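The two abort messages above are easier to read with the ring arithmetic spelled out. The following is a minimal sketch, not QEMU's actual code: it assumes the device remembers the last ring index it processed, reads the index the guest currently publishes as a free-running 16-bit counter, and refuses to continue if the difference exceeds the ring size. The function name `pending_heads` and the ring size of 256 are assumptions chosen for illustration.

```python
RING_SIZE = 256  # assumed queue size, for illustration only


def pending_heads(last_seen_idx: int, guest_idx: int,
                  ring_size: int = RING_SIZE) -> int:
    """Return how many new buffers the guest appears to have published.

    Both indices are treated as free-running 16-bit counters, so the
    difference is taken modulo 2**16.
    """
    num = (guest_idx - last_seen_idx) & 0xFFFF
    if num > ring_size:
        # More new entries than the ring can hold: the two sides disagree
        # about the ring state, so the (hypothetical) device gives up.
        raise RuntimeError(
            f"Guest moved used index from {last_seen_idx} to {guest_idx}"
        )
    return num


if __name__ == "__main__":
    print(pending_heads(1, 3))  # normal progress: two new buffers
    for stale, fresh in ((1, 0), (1, 49285)):
        try:
            pending_heads(stale, fresh)
        except RuntimeError as err:
            print("kvm:", err)
```

Under this model, an index that goes from 1 back to 0 looks like 65535 new buffers. That is what one would expect if the guest driver set up its rings afresh on resume while the device kept its old state, but this reading is an inference from the messages, not something confirmed in the thread.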
* Re: guest suspend/resume & virtio: vring errors
From: Gleb Natapov @ 2011-07-09 9:17 UTC
To: Michael Tokarev; +Cc: KVM list

On Fri, Jul 08, 2011 at 11:02:54PM +0400, Michael Tokarev wrote:
> I tried suspend/resume cycle for a linux guest
> today, with 100% failure result. There are 2
Good. It works as expected :) Linux virtio drivers do not support PM.

> With e1000 instead of virtio-net-pci, the suspend
> does not complete - guest kernel freezes after the
> message "Suspending console(s)" and does not respond
> (but does not eat 100% CPU either) - the same as for
> older guest kernel.
>
> Has anyone succeeded suspend/resume cycle?
>
Yes, but Linux and suspend/resume are not best friends. Sometimes it
works by mistake, but the next merged patch fixes it and suspend/resume
returns to its normal broken state.

--
	Gleb.
* Re: guest suspend/resume & virtio: vring errors
From: Michael Tokarev @ 2011-07-09 9:47 UTC
To: Gleb Natapov; +Cc: KVM list

09.07.2011 13:17, Gleb Natapov wrote:
> On Fri, Jul 08, 2011 at 11:02:54PM +0400, Michael Tokarev wrote:
>> I tried suspend/resume cycle for a linux guest
>> today, with 100% failure result. There are 2
> Good. It works as expected :) Linux virtio drivers do not support PM.

This means that neither in-guest suspend/resume nor qemu-kvm
migrate-to-file (which fails for a different reason that I'm trying to
debug now) works. Which is very unfortunate.

>> With e1000 instead of virtio-net-pci, the suspend
>> does not complete - guest kernel freezes after the
>> message "Suspending console(s)" and does not respond
>> (but does not eat 100% CPU either) - the same as for
>> older guest kernel.
>>
>> Has anyone succeeded suspend/resume cycle?
>>
> Yes, but Linux and suspend/resume are not best friends. Sometimes it
> works by mistake, but next merged patch fixes it and suspend/resume
> returns to its normal broken state.

Lovelylovely.

It does not work with any version of Windows I tried, either (winXP,
win7 32-bit and win7 64-bit). Windows enters an endless loop with a
black screen during suspend (eating 100% CPU time).

What's wrong/broken in Linux? Can it be fixed?

Thanks,

/mjt
* Re: guest suspend/resume & virtio: vring errors
From: Gleb Natapov @ 2011-07-09 9:55 UTC
To: Michael Tokarev; +Cc: KVM list

On Sat, Jul 09, 2011 at 01:47:25PM +0400, Michael Tokarev wrote:
> 09.07.2011 13:17, Gleb Natapov wrote:
> > On Fri, Jul 08, 2011 at 11:02:54PM +0400, Michael Tokarev wrote:
> >> I tried suspend/resume cycle for a linux guest
> >> today, with 100% failure result. There are 2
> > Good. It works as expected :) Linux virtio drivers do not support PM.
>
> This means that neither in-guest suspend/resume nor
> qemu-kvm migrate-to-file (which fails for a different
> reason I'm trying to debug now) works. Which is very
> unfortunate.
>
Migration to file should work, or, at least, is a different problem.
It does not require guest cooperation.

> >> With e1000 instead of virtio-net-pci, the suspend
> >> does not complete - guest kernel freezes after the
> >> message "Suspending console(s)" and does not respond
> >> (but does not eat 100% CPU either) - the same as for
> >> older guest kernel.
> >>
> >> Has anyone succeeded suspend/resume cycle?
> >>
> > Yes, but Linux and suspend/resume are not best friends. Sometimes it
> > works by mistake, but next merged patch fixes it and suspend/resume
> > returns to its normal broken state.
>
> Lovelylovely.
>
> It does not work with any version of windows I tried, too
> (which is winXP, win7 32 and win7 64bits). Windows enters
> an endless loop with a black screen during suspend (eating
> 100% CPU time).
>
Heh. Is this S4 or S3 suspend/resume? It looks like recent breakage.
The only problem known to me is win7/2008 S3 resume + net.

> What's wrong/broken in linux? Can it be fixed?
>
Complete lack of regression testing.

--
	Gleb.
* Re: guest suspend/resume & virtio: vring errors
From: Michael Tokarev @ 2011-07-09 10:09 UTC
To: Gleb Natapov; +Cc: KVM list

09.07.2011 13:55, Gleb Natapov wrote:
> On Sat, Jul 09, 2011 at 01:47:25PM +0400, Michael Tokarev wrote:
>> 09.07.2011 13:17, Gleb Natapov wrote:

>> This means that neither in-guest suspend/resume nor
>> qemu-kvm migrate-to-file (which fails for a different
>> reason I'm trying to debug now) works. Which is very
>> unfortunate.
>>
> Migration to file should work, or, at least, is a different problem.
> It does not require guest cooperation.

Sure, that's why I said "different reason". That's what prompted me to
try in-guest migration - because I wasn't able to use migrate-to-file,
but I needed to replace the qemu-kvm binary after a security fix.

>> It does not work with any version of windows I tried, too
>> (which is winXP, win7 32 and win7 64bits). Windows enters
>> an endless loop with a black screen during suspend (eating
>> 100% CPU time).
>>
> Heh. Is this S4 or S3 suspend resume? Looks like recent breakage.
> The only known problem to me is in win7/2008 S3 resume + net.

How can I know if it's S3 or S4? And I can't say it's recent: 0.14.1
works (or actually does not work) exactly the same in this respect as
current git: suspend/resume is equally broken.

>> What's wrong/broken in linux? Can it be fixed?
>>
> Complete lack of regression testing.

Hm. You wrote:

> Yes, but Linux and suspend/resume are not best friends. Sometimes it
> works by mistake, but next merged patch fixes it and suspend/resume
> returns to its normal broken state.

So regression testing should detect and raise an alarm when it works
by mistake. But can it be fixed to work by design instead, after which
regression testing gets an entirely new meaning? :)

Thanks,

/mjt
* Re: guest suspend/resume & virtio: vring errors
From: Gleb Natapov @ 2011-07-09 10:36 UTC
To: Michael Tokarev; +Cc: KVM list

On Sat, Jul 09, 2011 at 02:09:46PM +0400, Michael Tokarev wrote:
> 09.07.2011 13:55, Gleb Natapov wrote:
> > On Sat, Jul 09, 2011 at 01:47:25PM +0400, Michael Tokarev wrote:
> >> 09.07.2011 13:17, Gleb Natapov wrote:
>
> >> This means that neither in-guest suspend/resume nor
> >> qemu-kvm migrate-to-file (which fails for a different
> >> reason I'm trying to debug now) works. Which is very
> >> unfortunate.
> >>
> > Migration to file should work, or, at least, is a different problem.
> > It does not require guest cooperation.
>
> Sure, that's why I said "different reason". That's what
> prompted me to try in-guest migration - because I wasn't
> able to use migrate-to-file, but I needed to replace the
> qemu-kvm binary after a security fix.
>
> >> It does not work with any version of windows I tried, too
> >> (which is winXP, win7 32 and win7 64bits). Windows enters
> >> an endless loop with a black screen during suspend (eating
> >> 100% CPU time).
> >>
> > Heh. Is this S4 or S3 suspend resume? Looks like recent breakage.
> > The only known problem to me is in win7/2008 S3 resume + net.
>
> How can I know if it's S3 or S4? And I can't say it's recent:
S3 is suspend to memory, S4 is suspend to disk. I don't remember what
the different versions of Windows call them. Actually win7 disables S3
when running on qemu with the cirrus adapter, so you are probably
doing S4.

> 0.14.1 works (or actually does not work) exactly the same in
> this respect as current git: suspend/resume is equally broken.
Suspend to disk worked flawlessly for me with Windows. There is nothing
special qemu should do for it to work. I haven't checked it on upstream
for a long time though. Can you check 0.12?

>
> >> What's wrong/broken in linux? Can it be fixed?
> >>
> > Complete lack of regression testing.
>
> Hm. You wrote:
>
> > Yes, but Linux and suspend/resume are not best friends. Sometimes it
> > works by mistake, but next merged patch fixes it and suspend/resume
> > returns to its normal broken state.
>
> So regression testing should detect and raise an alarm
> when it works by mistake. But can it be fixed to work
> by design instead, after which regression testing gets
> entirely new meaning ? :)
>
Well, S4/S3 is a pet peeve of mine, so I exaggerated a little bit :)
Design is not enough. Testing is needed. On Windows a driver that can't
properly handle S3/S4 will not pass WHQL, will not be signed by MS and
will never be loaded by the OS.

--
	Gleb.
* Re: guest suspend/resume & virtio: vring errors
From: Michael Tokarev @ 2011-07-09 11:54 UTC
To: Gleb Natapov; +Cc: KVM list

09.07.2011 14:36, Gleb Natapov wrote:
> On Sat, Jul 09, 2011 at 02:09:46PM +0400, Michael Tokarev wrote:

>>> Heh. Is this S4 or S3 suspend resume? Looks like recent breakage.
>>> The only known problem to me is in win7/2008 S3 resume + net.
>>
>> How can I know if it's S3 or S4? And I can't say it's recent:
> S3 is suspend to memory, S4 is suspend to disk. Don't remember how
> different version of Windows call them. Actually win7 disables S3
> when running on qemu with cirrus adapter, so you are probably doing
> S4.

Ahh, yes, I remember now. Yes, I used suspend-to-disk, aka
hibernation, on Windows. And actually there's no error there - it
works. I thought it was stuck in an endless loop, but it actually was
suspending, and it completes eventually. The only problem is that it
does not show anything at all: no progress, no messages, nothing - the
screen is completely blank.

That's with -vga std, which I usually use. I just retried with cirrus
and the effect is the same -- blank guest screen and 100% CPU usage,
but it takes much, much longer to complete for some reason - several
minutes instead of ~30s. It also takes a lot of time when resuming,
and it is less reliable too -- I've seen a ~50/50 failure ratio at
resuming with -vga cirrus.

>> 0.14.1 works (or actually does not work) exactly the same in
>> this respect as current git: suspend/resume is equally broken.
> Suspend to disk worked flawlessly for me with Windows. There is nothing
> special qemu should do for it to work. I haven't checked it on upstream
> for a long time though. Can you check 0.12?

It looks like 0.12 works the same way - at least the behaviour is very
similar. I also tried current qemu-kvm git and there, things are the
same.

>>>> What's wrong/broken in linux? Can it be fixed?
>>>>
>>> Complete lack of regression testing.
>>
>> Hm. You wrote:
>>
>>> Yes, but Linux and suspend/resume are not best friends. Sometimes it
>>> works by mistake, but next merged patch fixes it and suspend/resume
>>> returns to its normal broken state.
>>
>> So regression testing should detect and raise an alarm
>> when it works by mistake. But can it be fixed to work
>> by design instead, after which regression testing gets
>> entirely new meaning ? :)
>>
> Well S4/S3 is pet peeve of mine, so I little bit exaggerated :) Design
> is not enough. Testing is needed. On Windows a driver that can't properly
> handle S3/S4 will not pass WHQL, will not be signed by MS and will never
> be loaded by the OS.

So I'm completely confused. Is virtio designed to work or to fail on a
suspend/resume cycle? If the former, regression testing will help. If
the latter, the design should be fixed first... ;)

According to your words above it's designed to fail, so to speak.

Also, as far as I can see, it should be fixed on the guest side, i.e.,
in the Linux virtio drivers, not in qemu/kvm (which appears to work
with Windows guests), right?

Thanks,

/mjt
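For a Linux guest, the S3/S4 distinction discussed above usually shows up as the keywords listed in `/sys/power/state`: `mem` requests suspend-to-RAM (S3 on ACPI systems that support it) and `disk` requests hibernation (S4). The sketch below only reads and explains that file. The helper name `describe_sleep_states` and the fallback sample string are assumptions for illustration, and the mapping of `mem` to S3 is typical rather than guaranteed.

```python
from pathlib import Path

# Conventional meaning of the keywords in /sys/power/state. The mapping of
# "mem" to ACPI S3 is typical but platform-dependent.
STATE_MEANING = {
    "freeze": "suspend-to-idle (no ACPI sleep state)",
    "standby": "power-on suspend (ACPI S1)",
    "mem": "suspend-to-RAM (typically ACPI S3)",
    "disk": "hibernation / suspend-to-disk (ACPI S4)",
}


def describe_sleep_states(state_text: str) -> dict[str, str]:
    """Map each keyword found in /sys/power/state to a description."""
    return {
        word: STATE_MEANING.get(word, "unknown")
        for word in state_text.split()
    }


if __name__ == "__main__":
    path = Path("/sys/power/state")
    # Fall back to a sample string when not running on Linux.
    text = path.read_text() if path.exists() else "mem disk"
    for word, meaning in describe_sleep_states(text).items():
        print(f"{word}: {meaning}")
```

Writing one of the listed keywords back to the same file as root is what starts the corresponding transition, so the file also answers which kind of suspend a given test exercised.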
* Re: guest suspend/resume & virtio: vring errors
From: Gleb Natapov @ 2011-07-09 12:07 UTC
To: Michael Tokarev; +Cc: KVM list

On Sat, Jul 09, 2011 at 03:54:04PM +0400, Michael Tokarev wrote:
> 09.07.2011 14:36, Gleb Natapov wrote:
> > On Sat, Jul 09, 2011 at 02:09:46PM +0400, Michael Tokarev wrote:
>
> >>> Heh. Is this S4 or S3 suspend resume? Looks like recent breakage.
> >>> The only known problem to me is in win7/2008 S3 resume + net.
> >>
> >> How can I know if it's S3 or S4? And I can't say it's recent:
> > S3 is suspend to memory, S4 is suspend to disk. Don't remember how
> > different version of Windows call them. Actually win7 disables S3
> > when running on qemu with cirrus adapter, so you are probably doing
> > S4.
>
> Ahh, yes, I remember now. Yes I used suspend-to-disk aka
> hibernation on windows. And actually there's no error in
> there - it works. I thought it is stuck in an endless loop,
> but it actually did suspending and it completes eventually.
> The only problem is that it does not show anything at all,
> no progress, no messages, no nothing - the screen is completely
> blank.
>
> That's with -vga std which I usually use. I just retried with
> cirrus and the effect is the same -- blank guest screen and 100%
> cpu usage, but it takes much much longer to complete for some
> reason - several minutes instead of ~30s. It also takes lots
> of time when resuming, and the thing is less reliable too --
> I've seen ~50/50 failure ratio at resuming with -vga cirrus.
>
So hibernate works for you, but slowly. Try checking with cache=unsafe.
Resume failures look strange to me. I don't remember seeing them even
once.

> >> 0.14.1 works (or actually does not work) exactly the same in
> >> this respect as current git: suspend/resume is equally broken.
> > Suspend to disk worked flawlessly for me with Windows. There is nothing
> > special qemu should do for it to work. I haven't checked it on upstream
> > for a long time though. Can you check 0.12?
>
> It looks like 0.12 works the same way - at least the behaviour
> is very similar. I also tried current qemu-kvm git and there,
> things are the same.
>
> >>>> What's wrong/broken in linux? Can it be fixed?
> >>>>
> >>> Complete lack of regression testing.
> >>
> >> Hm. You wrote:
> >>
> >>> Yes, but Linux and suspend/resume are not best friends. Sometimes it
> >>> works by mistake, but next merged patch fixes it and suspend/resume
> >>> returns to its normal broken state.
> >>
> >> So regression testing should detect and raise an alarm
> >> when it works by mistake. But can it be fixed to work
> >> by design instead, after which regression testing gets
> >> entirely new meaning ? :)
> >>
> > Well S4/S3 is pet peeve of mine, so I little bit exaggerated :) Design
> > is not enough. Testing is needed. On Windows a driver that can't properly
> > handle S3/S4 will not pass WHQL, will not be signed by MS and will never
> > be loaded by the OS.
>
> So I'm completely confused. Is virtio designed to work or to fail
> on suspend/resume cycle? If the former, regression testing will
> help. If the latter, the design should be fixed first... ;)
>
> According to your words above it's designed to fail, so to say.
>
Ah, you mixed up my comments about virtio drivers specifically and
about the Linux kernel as a whole :). Linux in general is designed to
support hibernation, but it requires cooperation of many subsystems,
and failure of one of them means failure to hibernate. That is where a
lot of testing is needed. The PCI subsystem has PM support, but not all
drivers implement their part properly, or, as in the case of virtio,
at all.

> Also, as far as I can see, it should be fixed on the guest side,
> ie, in linux virtio drivers, not in qemu/kvm (which appears to
> work with windows guests), right?
>
Correct. AFAIK Amit is working on this.

--
	Gleb.
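The point that hibernation needs every subsystem to cooperate can be illustrated with a toy model. This is a sketch under stated assumptions, not kernel code: the `Driver` class, its `freeze` and `restore` callbacks, and the `hibernate_cycle` function are all hypothetical names. A driver without callbacks is skipped, so its device state is never saved, which here stands in for the virtio case described in the thread.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Driver:
    name: str
    freeze: Optional[Callable[[], bool]] = None   # returns True on success
    restore: Optional[Callable[[], bool]] = None  # returns True on success


def hibernate_cycle(drivers: list[Driver]) -> list[str]:
    """Run freeze then restore for each driver; return a list of problems.

    An empty list means every driver cooperated. A single entry is enough
    to call the whole cycle a failure.
    """
    problems = []
    for drv in drivers:
        if drv.freeze is None or drv.restore is None:
            problems.append(f"{drv.name}: no PM callbacks, device state not saved")
            continue
        if not drv.freeze():
            problems.append(f"{drv.name}: freeze failed")
            continue
        if not drv.restore():
            problems.append(f"{drv.name}: restore failed")
    return problems


if __name__ == "__main__":
    def ok() -> bool:
        return True

    drivers = [
        Driver("well-behaved-nic", freeze=ok, restore=ok),
        Driver("virtio-like"),  # no callbacks at all
    ]
    for line in hibernate_cycle(drivers) or ["all drivers cooperated"]:
        print(line)
```

The model deliberately reports every problem instead of stopping at the first one, which makes it more useful as a regression check than as a picture of how a real kernel aborts a failed transition.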