From: "Michael S. Tsirkin" <mst@redhat.com>
To: "Daniel P. Berrange" <berrange@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
Lucas Meneghel Rodrigues <lmr@redhat.com>,
KVM mailing list <kvm@vger.kernel.org>,
Juan Jose Quintela Carreira <quintela@redhat.com>,
"libvir-list@redhat.com" <libvir-list@redhat.com>,
Marcelo Tosatti <mtosatti@redhat.com>,
QEMU devel <qemu-devel@nongnu.org>, Avi Kivity <avi@redhat.com>
Subject: Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
Date: Mon, 14 Nov 2011 13:51:40 +0200
Message-ID: <20111114115139.GA17560@redhat.com>
In-Reply-To: <20111114113727.GD32392@redhat.com>
On Mon, Nov 14, 2011 at 11:37:27AM +0000, Daniel P. Berrange wrote:
> On Mon, Nov 14, 2011 at 01:34:15PM +0200, Michael S. Tsirkin wrote:
> > On Mon, Nov 14, 2011 at 11:29:18AM +0000, Daniel P. Berrange wrote:
> > > On Mon, Nov 14, 2011 at 12:21:53PM +0100, Kevin Wolf wrote:
> > > > On 14.11.2011 12:08, Daniel P. Berrange wrote:
> > > > > On Mon, Nov 14, 2011 at 12:24:22PM +0200, Michael S. Tsirkin wrote:
> > > > >> On Mon, Nov 14, 2011 at 10:16:10AM +0000, Daniel P. Berrange wrote:
> > > > >>> On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote:
> > > > >>>> On 11/11/2011 12:15 PM, Kevin Wolf wrote:
> > > > >>>>> On 10.11.2011 22:30, Anthony Liguori wrote:
> > > > >>>>>> Live migration with qcow2 or any other image format is just not going to work
> > > > >>>>>> right now even with proper clustered storage. I think doing a block level flush
> > > > >>>>>> cache interface and letting block devices decide how to do it is the best approach.
> > > > >>>>>
> > > > >>>>> I would really prefer reusing the existing open/close code. It means
> > > > >>>>> less (duplicated) code, is existing code that is well tested and doesn't
> > > > >>>>> make migration much of a special case.
> > > > >>>>>
> > > > >>>>> If you want to avoid reopening the file on the OS level, we can reopen
> > > > >>>>> only the topmost layer (i.e. the format, but not the protocol) for now
> > > > >>>>> and in 1.1 we can use bdrv_reopen().
> > > > >>>>>
> > > > >>>>
> > > > >>>> Intuitively I dislike _reopen style interfaces. If the second open
> > > > >>>> yields different results from the first, does it invalidate any
> > > > >>>> computations in between?
> > > > >>>>
> > > > >>>> What's wrong with just delaying the open?
> > > > >>>
> > > > >>> If you delay the 'open' until the mgmt app issues 'cont', then you lose
> > > > >>> the ability to rollback to the source host upon open failure for most
> > > > >>> deployed versions of libvirt. We only fairly recently switched to a five
> > > > >>> stage migration handshake to cope with rollback when 'cont' fails.
> > > > >>>
> > > > >>> Daniel
> > > > >>
> > > > >> I guess reopen can fail as well, so this seems to me to be an important
> > > > >> fix but not a blocker.
> > > > >
> > > > > If the initial open succeeds, then it is far more likely that a later
> > > > > re-open will succeed too, because you have already eliminated the possibility
> > > > > of configuration mistakes, and will have caught most storage runtime errors
> > > > > too. So there is a very significant difference in reliability between doing
> > > > > an 'open at startup + reopen at cont' vs just 'open at cont'.
> > > > >
> > > > > Based on the bug reports I see, we want to be very good at detecting and
> > > > > gracefully handling open errors because they are pretty frequent.
> > > >
> > > > Do you have some more details on the kind of errors? Missing files,
> > > > permissions, something like this? Or rather something related to the
> > > > actual content of an image file?
> > >
> > > Missing files due to wrong/missing NFS mounts, or incorrect SAN / iSCSI
> > > setup. Access permissions due to incorrect user / group setup, or read
> > > only mounts, or SELinux denials. Actual I/O errors are less common and
> > > are not so likely to cause QEMU to fail to start anyway, since QEMU is
> > > likely to just report them to the guest OS instead.
> >
> > Do you run qemu with -S, then give a 'cont' command to start it?
>
> Yes
>
> Daniel
Probably in an attempt to improve reliability :)
So this is in fact unrelated to migration. So we can either ignore this
bug (assuming no distros ship cutting edge qemu with an old libvirt), or
special-case -S and do an open/close cycle on startup.
> --
> |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org -o- http://virt-manager.org :|
> |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
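
For illustration only, here is a minimal sketch of the "open/close cycle on startup" idea, written in Python rather than QEMU's C. The name probe_images and its parameters are hypothetical and do not correspond to any QEMU or libvirt interface. It assumes that an OS-level open is enough to catch the error classes listed above (missing files, wrong mounts, permission problems); it says nothing about the contents of an image.

```python
import os


def probe_images(paths, writable=True):
    """Open and immediately close each path (hypothetical helper).

    Returns a dict mapping each path that could not be opened to the
    OS error message. An empty dict means every open succeeded.
    """
    flags = os.O_RDWR if writable else os.O_RDONLY
    failures = {}
    for path in paths:
        try:
            fd = os.open(path, flags)
        except OSError as exc:
            # Missing file, bad mount, permission denial and similar
            # problems all surface here, before the guest ever runs.
            failures[path] = exc.strerror or str(exc)
        else:
            # Close straight away: the real open is deferred until later.
            os.close(fd)
    return failures
```

A caller started in a paused state could run such a probe once at startup and refuse to continue if the returned dict is non-empty, while still deferring the real open until the guest is resumed.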