* guest suspend/resume & virtio: vring errors
From: Michael Tokarev @ 2011-07-08 19:02 UTC
To: KVM list
I tried suspend/resume cycle for a linux guest
today, with 100% failure result. There are 2
possible scenarios after resume (you need a
pretty recent guest kernel for it to work at
all; earlier kernels, incl. early 2.6.32, just
stop somewhere at the start of the suspend cycle,
but 2.6.32.42 and 3.0-rc6 "work"). Both are the
same failure, seen from different viewpoints.
Either the guest complains
virtio_net virtio0: input:id 2 is not a head!
and enters an endless loop eating 100% CPU, or
qemu-kvm exits (aborts) with the message:
kvm: Guest moved used index from 1 to 0
this is when trying to use virtio-net.
The same happens when disabling network entirely:
it complains about virtio-blk in the same way,
for example
kvm: Guest moved used index from 1 to 49285
qemu-kvm is of version 0.14.1 so far, I'll try
a git version later.
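Both qemu-kvm messages are consistent with a ring-index sanity check failing on the host side. A minimal sketch of such a check, assuming free-running 16-bit indices and a queue size of 256 (the function name `pending_heads` and the ring size are illustrative assumptions, not taken from qemu-kvm): the device subtracts the index it last processed from the index the guest now publishes, modulo 2^16, and treats a result larger than the ring as corruption.

```python
RING_SIZE = 256  # assumed queue size, for illustration only


def pending_heads(last_seen_idx: int, guest_idx: int, ring_size: int = RING_SIZE) -> int:
    """Return how many new entries the guest published since last_seen_idx.

    Indices are free-running 16-bit counters, so the difference is taken
    modulo 2**16. A difference larger than the ring cannot be valid.
    """
    num_heads = (guest_idx - last_seen_idx) & 0xFFFF
    if num_heads > ring_size:
        raise ValueError(
            f"Guest moved used index from {last_seen_idx} to {guest_idx}"
        )
    return num_heads
```

Under this model, going from 1 to 0 looks like 65535 new entries and going from 1 to 49285 looks like 49284, so both are rejected, while an ordinary wrap such as 65535 to 1 is accepted as 2 entries.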
With e1000 instead of virtio-net-pci, the suspend
does not complete - guest kernel freezes after the
message "Suspending console(s)" and does not respond
(but does not eat 100% CPU either) - the same as for
older guest kernel.
Has anyone succeeded with a suspend/resume cycle?
Thanks!
/mjt
* Re: guest suspend/resume & virtio: vring errors
From: Gleb Natapov @ 2011-07-09 9:17 UTC
To: Michael Tokarev; +Cc: KVM list
On Fri, Jul 08, 2011 at 11:02:54PM +0400, Michael Tokarev wrote:
> I tried suspend/resume cycle for a linux guest
> today, with 100% failure result. There are 2
Good. It works as expected :) Linux virtio drivers do not support PM.
> With e1000 instead of virtio-net-pci, the suspend
> does not complete - guest kernel freezes after the
> message "Suspending console(s)" and does not respond
> (but does not eat 100% CPU either) - the same as for
> older guest kernel.
>
> Has anyone succeeded with a suspend/resume cycle?
>
Yes, but Linux and suspend/resume are not best friends. Sometimes it
works by mistake, but next merged patch fixes it and suspend/resume
returns to its normal broken state.
--
Gleb.
* Re: guest suspend/resume & virtio: vring errors
From: Michael Tokarev @ 2011-07-09 9:47 UTC
To: Gleb Natapov; +Cc: KVM list
09.07.2011 13:17, Gleb Natapov wrote:
> On Fri, Jul 08, 2011 at 11:02:54PM +0400, Michael Tokarev wrote:
>> I tried suspend/resume cycle for a linux guest
>> today, with 100% failure result. There are 2
> Good. It works as expected :) Linux virtio drivers do not support PM.
This means that neither in-guest suspend/resume nor
qemu-kvm migrate-to-file (which fails for a different
reason I'm trying to debug now) works. Which is very
unfortunate.
>> With e1000 instead of virtio-net-pci, the suspend
>> does not complete - guest kernel freezes after the
>> message "Suspending console(s)" and does not respond
>> (but does not eat 100% CPU either) - the same as for
>> older guest kernel.
>>
>> Has anyone succeeded with a suspend/resume cycle?
>>
> Yes, but Linux and suspend/resume are not best friends. Sometimes it
> works by mistake, but next merged patch fixes it and suspend/resume
> returns to its normal broken state.
Lovelylovely.
It does not work with any version of windows I tried, too
(which is winXP, win7 32 and win7 64bits). Windows enters
an endless loop with a black screen during suspend (eating
100% CPU time).
What's wrong/broken in linux? Can it be fixed?
Thanks,
/mjt
* Re: guest suspend/resume & virtio: vring errors
From: Gleb Natapov @ 2011-07-09 9:55 UTC
To: Michael Tokarev; +Cc: KVM list
On Sat, Jul 09, 2011 at 01:47:25PM +0400, Michael Tokarev wrote:
> 09.07.2011 13:17, Gleb Natapov wrote:
> > On Fri, Jul 08, 2011 at 11:02:54PM +0400, Michael Tokarev wrote:
> >> I tried suspend/resume cycle for a linux guest
> >> today, with 100% failure result. There are 2
> > Good. It works as expected :) Linux virtio drivers do not support PM.
>
> This means that neither in-guest suspend/resume nor
> qemu-kvm migrate-to-file (which fails for a different
> reason I'm trying to debug now) works. Which is very
> unfortunate.
>
Migration to file should work, or, at least, is a different problem.
It does not require guest cooperation.
> >> With e1000 instead of virtio-net-pci, the suspend
> >> does not complete - guest kernel freezes after the
> >> message "Suspending console(s)" and does not respond
> >> (but does not eat 100% CPU either) - the same as for
> >> older guest kernel.
> >>
> >> Has anyone succeeded with a suspend/resume cycle?
> >>
> > Yes, but Linux and suspend/resume are not best friends. Sometimes it
> > works by mistake, but next merged patch fixes it and suspend/resume
> > returns to its normal broken state.
>
> Lovelylovely.
>
> It does not work with any version of windows I tried, too
> (which is winXP, win7 32 and win7 64bits). Windows enters
> an endless loop with a black screen during suspend (eating
> 100% CPU time).
>
Heh. Is this S4 or S3 suspend resume? Looks like recent breakage.
The only problem known to me is with win7/2008 S3 resume + net.
> What's wrong/broken in linux? Can it be fixed?
>
Complete lack of regression testing.
--
Gleb.
* Re: guest suspend/resume & virtio: vring errors
From: Michael Tokarev @ 2011-07-09 10:09 UTC
To: Gleb Natapov; +Cc: KVM list
09.07.2011 13:55, Gleb Natapov wrote:
> On Sat, Jul 09, 2011 at 01:47:25PM +0400, Michael Tokarev wrote:
>> 09.07.2011 13:17, Gleb Natapov wrote:
>> This means that neither in-guest suspend/resume nor
>> qemu-kvm migrate-to-file (which fails for a different
>> reason I'm trying to debug now) works. Which is very
>> unfortunate.
>>
> Migration to file should work, or, at least, is a different problem.
> It does not require guest cooperation.
Sure, that's why I said "different reason". That's what
prompted me to try in-guest migration - because I wasn't
able to use migrate-to-file, but I needed to replace the
qemu-kvm binary after a security fix.
>> It does not work with any version of windows I tried, too
>> (which is winXP, win7 32 and win7 64bits). Windows enters
>> an endless loop with a black screen during suspend (eating
>> 100% CPU time).
>>
> Heh. Is this S4 or S3 suspend resume? Looks like recent breakage.
> The only problem known to me is with win7/2008 S3 resume + net.
How can I know if it's S3 or S4? And I can't say it's recent:
0.14.1 works (or actually does not work) exactly the same in
this respect as current git: suspend/resume is equally broken.
>> What's wrong/broken in linux? Can it be fixed?
>>
> Complete lack of regression testing.
Hm. You wrote:
> Yes, but Linux and suspend/resume are not best friends. Sometimes it
> works by mistake, but next merged patch fixes it and suspend/resume
> returns to its normal broken state.
So regression testing should detect and raise an alarm
when it works by mistake. But can it be fixed to work
by design instead, after which regression testing gets
entirely new meaning? :)
Thanks,
/mjt
* Re: guest suspend/resume & virtio: vring errors
From: Gleb Natapov @ 2011-07-09 10:36 UTC
To: Michael Tokarev; +Cc: KVM list
On Sat, Jul 09, 2011 at 02:09:46PM +0400, Michael Tokarev wrote:
> 09.07.2011 13:55, Gleb Natapov wrote:
> > On Sat, Jul 09, 2011 at 01:47:25PM +0400, Michael Tokarev wrote:
> >> 09.07.2011 13:17, Gleb Natapov wrote:
>
> >> This means that neither in-guest suspend/resume nor
> >> qemu-kvm migrate-to-file (which fails for a different
> >> reason I'm trying to debug now) works. Which is very
> >> unfortunate.
> >>
> > Migration to file should work, or, at least, is a different problem.
> > It does not require guest cooperation.
>
> Sure, that's why I said "different reason". That's what
> prompted me to try in-guest migration - because I wasn't
> able to use migrate-to-file, but I needed to replace the
> qemu-kvm binary after a security fix.
>
> >> It does not work with any version of windows I tried, too
> >> (which is winXP, win7 32 and win7 64bits). Windows enters
> >> an endless loop with a black screen during suspend (eating
> >> 100% CPU time).
> >>
> > Heh. Is this S4 or S3 suspend resume? Looks like recent breakage.
> > The only problem known to me is with win7/2008 S3 resume + net.
>
> How can I know if it's S3 or S4? And I can't say it's recent:
S3 is suspend to memory, S4 is suspend to disk. I don't remember what
different versions of Windows call them. Actually win7 disables S3
when running on qemu with cirrus adapter, so you are probably doing
S4.
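For a Linux guest, the same distinction can be read off the tokens listed in `/sys/power/state`. A small sketch, assuming the usual mapping on ACPI systems (`mem` for S3, `disk` for S4); the helper name `classify_sleep_states` is hypothetical and tokens it does not know are ignored.

```python
# Usual meaning of the tokens found in /sys/power/state on ACPI systems.
SLEEP_STATES = {
    "standby": "S1 (power-on standby)",
    "mem": "S3 (suspend to memory)",
    "disk": "S4 (suspend to disk, hibernation)",
}


def classify_sleep_states(state_file_contents: str) -> dict:
    """Map each supported token to its ACPI sleep state, skipping unknown ones."""
    tokens = state_file_contents.split()
    return {t: SLEEP_STATES[t] for t in tokens if t in SLEEP_STATES}
```

Inside the guest this could be called as `classify_sleep_states(open("/sys/power/state").read())`.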
> 0.14.1 works (or actually does not work) exactly the same in
> this respect as current git: suspend/resume is equally broken.
Suspend to disk worked flawlessly for me with Windows. There is nothing
special qemu should do for it to work. I haven't checked it on upstream
for a long time though. Can you check 0.12?
>
> >> What's wrong/broken in linux? Can it be fixed?
> >>
> > Complete lack of regression testing.
>
> Hm. You wrote:
>
> > Yes, but Linux and suspend/resume are not best friends. Sometimes it
> > works by mistake, but next merged patch fixes it and suspend/resume
> > returns to its normal broken state.
>
> So regression testing should detect and raise an alarm
> when it works by mistake. But can it be fixed to work
> by design instead, after which regression testing gets
> entirely new meaning? :)
>
Well, S4/S3 is a pet peeve of mine, so I exaggerated a little bit :) Design
is not enough. Testing is needed. On Windows a driver that can't properly
handle S3/S4 will not pass WHQL, will not be signed by MS and will never
be loaded by the OS.
--
Gleb.
* Re: guest suspend/resume & virtio: vring errors
From: Michael Tokarev @ 2011-07-09 11:54 UTC
To: Gleb Natapov; +Cc: KVM list
09.07.2011 14:36, Gleb Natapov wrote:
> On Sat, Jul 09, 2011 at 02:09:46PM +0400, Michael Tokarev wrote:
>>> Heh. Is this S4 or S3 suspend resume? Looks like recent breakage.
>>> The only problem known to me is with win7/2008 S3 resume + net.
>>
>> How can I know if it's S3 or S4? And I can't say it's recent:
> S3 is suspend to memory, S4 is suspend to disk. I don't remember what
> different versions of Windows call them. Actually win7 disables S3
> when running on qemu with cirrus adapter, so you are probably doing
> S4.
Ahh, yes, I remember now. Yes, I used suspend-to-disk, aka
hibernation, on Windows. And actually there's no error in
there - it works. I thought it was stuck in an endless loop,
but it actually was suspending, and it completes eventually.
The only problem is that it does not show anything at all,
no progress, no messages, no nothing - the screen is completely
blank.
That's with -vga std which I usually use. I just retried with
cirrus and the effect is the same -- blank guest screen and 100%
cpu usage, but it takes much much longer to complete for some
reason - several minutes instead of ~30s. It also takes lots
of time when resuming, and the thing is less reliable too --
I've seen ~50/50 failure ratio at resuming with -vga cirrus.
>> 0.14.1 works (or actually does not work) exactly the same in
>> this respect as current git: suspend/resume is equally broken.
> Suspend to disk worked flawlessly for me with Windows. There is nothing
> special qemu should do for it to work. I haven't checked it on upstream
> for a long time though. Can you check 0.12?
It looks like 0.12 works the same way - at least the behavior
is very similar. I also tried current qemu-kvm git and there,
things are the same.
>>>> What's wrong/broken in linux? Can it be fixed?
>>>>
>>> Complete lack of regression testing.
>>
>> Hm. You wrote:
>>
>>> Yes, but Linux and suspend/resume are not best friends. Sometimes it
>>> works by mistake, but next merged patch fixes it and suspend/resume
>>> returns to its normal broken state.
>>
>> So regression testing should detect and raise an alarm
>> when it works by mistake. But can it be fixed to work
>> by design instead, after which regression testing gets
>> entirely new meaning? :)
>>
> Well, S4/S3 is a pet peeve of mine, so I exaggerated a little bit :) Design
> is not enough. Testing is needed. On Windows a driver that can't properly
> handle S3/S4 will not pass WHQL, will not be signed by MS and will never
> be loaded by the OS.
So I'm completely confused. Is virtio designed to work or to fail
on suspend/resume cycle? If the former, regression testing will
help. If the latter, the design should be fixed first... ;)
According to your words above it's designed to fail, so to say.
Also, as far as I can see, it should be fixed on the guest side,
ie, in linux virtio drivers, not in qemu/kvm (which appears to
work with windows guests), right?
Thanks,
/mjt
* Re: guest suspend/resume & virtio: vring errors
From: Gleb Natapov @ 2011-07-09 12:07 UTC
To: Michael Tokarev; +Cc: KVM list
On Sat, Jul 09, 2011 at 03:54:04PM +0400, Michael Tokarev wrote:
> 09.07.2011 14:36, Gleb Natapov wrote:
> > On Sat, Jul 09, 2011 at 02:09:46PM +0400, Michael Tokarev wrote:
>
> >>> Heh. Is this S4 or S3 suspend resume? Looks like recent breakage.
> >>> The only problem known to me is with win7/2008 S3 resume + net.
> >>
> >> How can I know if it's S3 or S4? And I can't say it's recent:
> > S3 is suspend to memory, S4 is suspend to disk. I don't remember what
> > different versions of Windows call them. Actually win7 disables S3
> > when running on qemu with cirrus adapter, so you are probably doing
> > S4.
>
> Ahh, yes, I remember now. Yes, I used suspend-to-disk, aka
> hibernation, on Windows. And actually there's no error in
> there - it works. I thought it was stuck in an endless loop,
> but it actually was suspending, and it completes eventually.
> The only problem is that it does not show anything at all,
> no progress, no messages, no nothing - the screen is completely
> blank.
>
> That's with -vga std which I usually use. I just retried with
> cirrus and the effect is the same -- blank guest screen and 100%
> cpu usage, but it takes much much longer to complete for some
> reason - several minutes instead of ~30s. It also takes lots
> of time when resuming, and the thing is less reliable too --
> I've seen ~50/50 failure ratio at resuming with -vga cirrus.
>
So hibernate works for you, but slowly. Try checking with cache=unsafe.
Resume failures look strange to me. I don't remember seeing them even once.
> >> 0.14.1 works (or actually does not work) exactly the same in
> >> this respect as current git: suspend/resume is equally broken.
> > Suspend to disk worked flawlessly for me with Windows. There is nothing
> > special qemu should do for it to work. I haven't checked it on upstream
> > for a long time though. Can you check 0.12?
>
> It looks like 0.12 works the same way - at least the behavior
> is very similar. I also tried current qemu-kvm git and there,
> things are the same.
>
> >>>> What's wrong/broken in linux? Can it be fixed?
> >>>>
> >>> Complete lack of regression testing.
> >>
> >> Hm. You wrote:
> >>
> >>> Yes, but Linux and suspend/resume are not best friends. Sometimes it
> >>> works by mistake, but next merged patch fixes it and suspend/resume
> >>> returns to its normal broken state.
> >>
> >> So regression testing should detect and raise an alarm
> >> when it works by mistake. But can it be fixed to work
> >> by design instead, after which regression testing gets
> >> entirely new meaning? :)
> >>
> > Well, S4/S3 is a pet peeve of mine, so I exaggerated a little bit :) Design
> > is not enough. Testing is needed. On Windows a driver that can't properly
> > handle S3/S4 will not pass WHQL, will not be signed by MS and will never
> > be loaded by the OS.
>
> So I'm completely confused. Is virtio designed to work or to fail
> on suspend/resume cycle? If the former, regression testing will
> help. If the latter, the design should be fixed first... ;)
>
> According to your words above it's designed to fail, so to say.
>
Ah, you mixed up my comments about virtio drivers specifically and about the
Linux kernel as a whole :). Linux in general is designed to support hibernation,
but it requires the cooperation of many subsystems, and the failure of one of
them means failure to hibernate. That is where a lot of testing is needed.
The PCI subsystem has PM support, but not all drivers implement their part
properly, or, as in the case of virtio, at all.
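The point about cooperation can be illustrated with a toy model. This is not kernel code, and the `Driver` class and `try_hibernate` function are hypothetical names: hibernation asks every driver to freeze in turn, a single failure aborts the whole operation and thaws the drivers already frozen, and a driver without PM callbacks is flagged because nothing saves or restores its device state.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Driver:
    name: str
    freeze: Optional[Callable[[], bool]] = None  # returns True on success
    thaw: Optional[Callable[[], None]] = None


def try_hibernate(drivers: List[Driver]) -> Tuple[bool, List[str]]:
    """Freeze all drivers in order; abort and unwind on the first failure.

    Returns (ok, notes). Drivers with no freeze callback are skipped but
    reported, since their device state would not be saved or restored.
    """
    frozen: List[Driver] = []
    notes: List[str] = []
    for drv in drivers:
        if drv.freeze is None:
            notes.append(f"{drv.name}: no PM callbacks, state will not be restored")
            continue
        if not drv.freeze():
            notes.append(f"{drv.name}: freeze failed, aborting")
            for done in reversed(frozen):  # undo in reverse order
                if done.thaw is not None:
                    done.thaw()
            return False, notes
        frozen.append(drv)
    return True, notes
```

In this model a driver with no callbacks does not stop the suspend from completing; the problem only shows up afterwards, when its device state no longer matches what the other side expects.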
> Also, as far as I can see, it should be fixed on the guest side,
> ie, in linux virtio drivers, not in qemu/kvm (which appears to
> work with windows guests), right?
>
Correct. AFAIK Amit is working on this.
--
Gleb.