From: Peter Lieven <pl@kamp.de>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>, John Snow <jsnow@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	qemu-block@nongnu.org
Subject: Re: [Qemu-devel] [Qemu-block] Migration sometimes fails with IDE and Qemu 2.2.1
Date: Thu, 09 Apr 2015 16:54:09 +0200
Message-ID: <55269291.2000805@kamp.de>
In-Reply-To: <20150409134339.GE2292@work-vm>

On 09.04.2015 at 15:43, Dr. David Alan Gilbert wrote:
> * Peter Lieven (pl@kamp.de) wrote:
>> On 07.04.2015 at 21:01, Dr. David Alan Gilbert wrote:
>>> * Peter Lieven (pl@kamp.de) wrote:
>>>> On 07.04.2015 at 17:29, Dr. David Alan Gilbert wrote:
>>>>> * Peter Lieven (pl@kamp.de) wrote:
>>>>>> Hi David,
>>>>>>
>>>>>> On 07.04.2015 at 10:43, Dr. David Alan Gilbert wrote:
>>>>>>>>>> Any particular workload or reproducer?
>>>>>>>>> Workload is almost zero. I am trying to figure out if there is a way to trigger it.
>>>>>>>>>
>>>>>>>>> Maybe playing a role: Machine type is -M pc-1.2 and we set -kvmclock as
>>>>>>>>> CPU flag since kvmclock seemed to be quite buggy in 2.6.16...
>>>>>>>>>
>>>>>>>>> Exact cmdline is:
>>>>>>>>> /usr/bin/qemu-2.2.1  -enable-kvm  -M pc-1.2  -nodefaults -netdev type=tap,id=guest2,script=no,downscript=no,ifname=tap2  -device e1000,netdev=guest2,mac=52:54:00:ff:00:65 -drive format=raw,file=iscsi://172.21.200.53/iqn.2001-05.com.equallogic:4-52aed6-88a7e99a4-d9e00040fdc509a3-XXX-hd0/0,if=ide,cache=writeback,aio=native  -serial null  -parallel null  -m 1024 -smp 2,sockets=1,cores=2,threads=1  -monitor tcp:0:4003,server,nowait -vnc :3 -qmp tcp:0:3003,server,nowait -name 'XXX' -boot order=c,once=dc,menu=off  -drive index=2,media=cdrom,if=ide,cache=unsafe,aio=native,readonly=on  -k de  -incoming tcp:0:5003  -pidfile /var/run/qemu/vm-146.pid  -mem-path /hugepages  -mem-prealloc  -rtc base=utc -usb -usbdevice tablet -no-hpet -vga cirrus  -cpu qemu64,-kvmclock
>>>>>>>>>
>>>>>>>>> Exact kernel is:
>>>>>>>>> 2.6.16.46-0.12-smp (I think this is SLES10 or something similar)
>>>>>>>>>
>>>>>>>>> The machine does not hang. It seems only I/O is hanging, so you can type at the console or ping the system, but can no longer log in.
>>>>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>> Peter
>>>>>>>> Interesting observation: Migrating the vServer again seems to fix the problem (at least in one case I could test just now).
>>>>>>>>
>>>>>>>> 2.6.8-24-smp is also affected.
>>>>>>> How often does it fail - you say 'sometimes' - is it a 1/10 or a 1/1000 ?
>>>>>> It's more often than 1/10, I would say.
>>>>> OK, that's not too bad - it's the 1/1000 that are really nasty to find.
>>>>> In your setup, how easy would it be for you to try :
>>>>>      with either 2.1 or current head?
>>>>>      with a newer machine-type?
>>>>>      without the cdrom?
>>>> It's all possible. I can clone the system and try everything on my test systems. I hope
>>>> it reproduces there.
>>> Great.  I think the order I would go would be:
>>>      Try head - if it works we know we've already got the fix somewhere
>>>      Try 2.1  - if it works we know it's something we introduced between
>>>                 2.1 and 2.2.1
>>>      Try a newer machine type - because pc-1.2 probably isn't tested much
>>>      CDROM at the end.
>> Update:
>>   - head -> not working
>>   - 2.1.3 -> not working
>>   - without CDROM -> not working
>>   - with head and no machine type specified -> not working
>>   - with -device isa-ide -> BIOS not booting harddisk
> Well, at least it's consistent....
>
>> Will now try 1.3.1 just to be sure.
>>
>> Any ideas how to debug the IDE state after migration and/or check if the issue is similar to the ATAPI IDE
>> problem?
> It's unlikely to be quite the same - most of the ATAPI problems were related to ATAPI
> being quite separate and not saving much state.
>
> The way I found the CDROM problems was to turn on most of the debugging in the ide and bmdma code
> and on a failed migrate try and see what the state of any IO was at the point it migrated.

That's tough. I enabled DEBUG_IDE and DEBUG_AIO at first, but I have never debugged IDE before, so I first
have to understand how that works...

What debugging confirms is that the IDE interface indeed stalls completely.

One thing I found curious in pci.c:

#define BM_MIGRATION_COMPAT_STATUS_BITS \
         (IDE_RETRY_DMA | IDE_RETRY_PIO | \
         IDE_RETRY_READ | IDE_RETRY_FLUSH)

Why is there no IDE_RETRY_WRITE?
Honestly, I have not yet understood what BM_MIGRATION_COMPAT_STATUS_BITS is for.
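
To make the question concrete, here is a minimal sketch of one possible reading of that mask. I have not verified it against the tree. The bit values are assumptions, to be checked against hw/ide/internal.h, and the three helper functions are hypothetical names made up for illustration, not QEMU functions. The idea sketched here is that the retry flags share one status byte with the bus-master status on the wire, the mask says which bits are retry flags, and a pending write would be a DMA or PIO retry with the READ bit clear, so no separate WRITE bit would be needed.

```c
#include <stdint.h>

/* ASSUMED bit values for illustration; check hw/ide/internal.h. */
#define IDE_RETRY_DMA   0x08
#define IDE_RETRY_PIO   0x10
#define IDE_RETRY_READ  0x20
#define IDE_RETRY_FLUSH 0x40

#define BM_MIGRATION_COMPAT_STATUS_BITS \
        (IDE_RETRY_DMA | IDE_RETRY_PIO | \
        IDE_RETRY_READ | IDE_RETRY_FLUSH)

/* Hypothetical helper: merge the real bus-master status bits with the
 * retry flags into a single byte, as a source host might do before
 * sending state. */
static uint8_t compat_status_pack(uint8_t bm_status, uint8_t error_status)
{
    const uint8_t mask = BM_MIGRATION_COMPAT_STATUS_BITS;
    return (uint8_t)((bm_status & ~mask) | (error_status & mask));
}

/* Hypothetical helper: split the combined byte again on the
 * destination host. */
static void compat_status_unpack(uint8_t compat,
                                 uint8_t *bm_status, uint8_t *error_status)
{
    const uint8_t mask = BM_MIGRATION_COMPAT_STATUS_BITS;
    *bm_status = (uint8_t)(compat & ~mask);
    *error_status = (uint8_t)(compat & mask);
}

/* Hypothetical helper: under the assumption above, a pending DMA retry
 * counts as a write when the READ bit is not set. */
static int retry_is_dma_write(uint8_t error_status)
{
    return (error_status & IDE_RETRY_DMA) && !(error_status & IDE_RETRY_READ);
}
```

If this reading is right, a retry flag that gets lost or misplaced during pack/unpack would leave a request pending forever on the destination, which would look like the stall I am seeing. That is only a guess at this point.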

>
> One other thing to check; I found the newer kernel code recovers better after
> IDE problems; so on a newer guest kernel are there any log warnings about IDE problems,
> even if the guests are otherwise apparently happy?

I will check for that.

Thanks,
Peter
