qemu-devel.nongnu.org archive mirror
From: Dor Laor <dlaor@redhat.com>
To: Anthony Liguori <anthony@codemonkey.ws>
Cc: Jes.Sorensen@redhat.com, Marcelo Tosatti <mtosatti@redhat.com>,
	Avi Kivity <avi@redhat.com>,
	qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] Re: [patch 2/3] Add support for live block copy
Date: Wed, 02 Mar 2011 00:27:49 +0200
Message-ID: <4D6D72E5.9070106@redhat.com>
In-Reply-To: <4D6D160E.4060208@codemonkey.ws>

On 03/01/2011 05:51 PM, Anthony Liguori wrote:
> On 03/01/2011 04:39 AM, Avi Kivity wrote:
>> On 02/28/2011 08:12 PM, Anthony Liguori wrote:
>>>
>>>
>>> On Feb 28, 2011 11:47 AM, "Avi Kivity" <avi@redhat.com
>>> <mailto:avi@redhat.com>> wrote:
>>> >
>>> > On 02/28/2011 07:33 PM, Anthony Liguori wrote:
>>> >>
>>> >>
>>> >> >
>>> >> > You're just ignoring what I've written.
>>> >>
>>> >> No, you're just impervious to my subtle attempt to refocus the
>>> discussion on solving a practical problem.
>>> >>
>>> >> There are a lot of good, reasonably straightforward changes we can
>>> make that have a high return on investment.
>>> >>
>>> >
>>> > Is making qemu the authoritative source of configuration
>>> information a straightforward change? Is the return on it high? Is
>>> the investment low?
>>>
>>> I think this is where we fundamentally disagree. My position is that
>>> QEMU is already the authoritative source. Having a state file doesn't
>>> change anything.
>>>
>>> Do a hot unplug of a network device with upstream libvirt with
>>> acpiphp unloaded, consult libvirt and then consult the monitor to see
>>> who has the right view of the guests config.
>>>
>>
>> libvirt is right and the monitor is wrong.
>>
>> On real hardware, calling _EJ0 doesn't affect the configuration one
>> little bit (if I understand it correctly). It just turns off power to
>> the slot. If you power-cycle, the card will be there.
>
> It's up to the hardware vendor. Since it's ACPI, it can result in any
> number of operations. Usually, there's some logic to flip on an LED or
> something.
>
> There's nothing that prevents a vendor from ejecting the card. My point
> is that there aren't cleanly separated lines in the real world.
>
>>> To me, that's the definition of authoritative.
>>>
>>> > "No" to all three (ignoring for the moment whether it is good or
>>> not, which we were debating).
>>> >
>>> >
>>> >> The only suggestion I'm making beyond Marcelo's original patch is
>>> that we use a structured format and that we make it possible to use
>>> the same file to solve this problem in multiple places.
>>> >>
>>> >
>>> > No, you're suggesting a lot more than that.
>>>
>>> That's exactly what I'm suggesting from a technical perspective.
>>>
>>
>> Unless I'm hallucinating, you're suggesting quite a bit more. A
>> revolution in how qemu is to be managed.
>
> Let me take another route to see if I can't persuade you.
>
> First, let's clarify your proposal. You want to introduce a new block
> format

No. That was Avi's initial proposal; after we talked, we realized that it
is not needed and that we can use plain files w/o any new configuration.
That is pretty much what you're proposing below, just w/o the
configuration files.

> that references two block devices. It may also store a dirty bitmap to keep
> track of which blocks are out of sync. Hopefully, it goes without saying
> that the dirty bitmap is strictly optional (it's a performance
> optimization) so
> let's ignore it.
>
> Your format, as a text file, looks like:
>
> [raid1]
> primary=diska.img
> secondary=diskb.img
> active=primary
>
> To use it, here's the sequence:
>
> 0) qemu uses disk A for a block device
>
> 1) create a raid1 block device pointing to disk A and disk B.
>
> 2) management tool asks qemu to use the new raid1 block device.
>
> 3) qemu acks (2)
>
> 4) at some point, the mirror completes, writes are going to both disks
>
> 5) qemu sends out an event indicating that the disks are in sync
>
> 6) management tool then sends a command to fail over to disk B
>
> 7) qemu acks (6)

7) is not a must when there is no raid.
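
The quoted sequence (0)-(7) can be read as a small state machine. Below is
a minimal sketch of that reading, not qemu code: the state names and the
action strings ("use_raid1", "mirror_complete", "failover") are hypothetical
labels chosen for illustration, and the acks in (3) and (7) are folded into
the transitions.

```python
from enum import Enum, auto


class MirrorState(Enum):
    SINGLE_A = auto()   # step 0: guest runs on disk A only
    MIRRORING = auto()  # steps 2-4: raid1 device in use, copy in progress
    IN_SYNC = auto()    # step 5: qemu reported that the disks are in sync
    SINGLE_B = auto()   # steps 6-7: failed over to disk B


# Allowed transitions, keyed by (current state, hypothetical action name).
TRANSITIONS = {
    (MirrorState.SINGLE_A, "use_raid1"): MirrorState.MIRRORING,
    (MirrorState.MIRRORING, "mirror_complete"): MirrorState.IN_SYNC,
    (MirrorState.IN_SYNC, "failover"): MirrorState.SINGLE_B,
}


def step(state, action):
    """Apply one action, rejecting anything the sequence does not allow."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"{action!r} is not allowed in state {state.name}")


def run(actions, state=MirrorState.SINGLE_A):
    """Apply a list of actions in order and return the final state."""
    for action in actions:
        state = step(state, action)
    return state
```

In this toy model a failover requested before the in-sync event is rejected,
which matches the ordering of (5) and (6) in the quoted list.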

>
> We're making the management tool the "authoritative" source of how to
> launch
> QEMU. That means that the management tool ultimately determines which
> command
> line to relaunch QEMU with.

This is what we have today regardless of live copy. How else would you 
track many hot plug/unplug operations and live migration afterwards?
For enterprise usage, that's the best case. It's also true for a single 
host w/ libvirt and virt-manager.

>
> Here are the races:
>
> A) If QEMU crashes between (2) and (3), it may have issued a write to
> the new
> raid1 block device before the management tool sees (3). If this happens,
> when the management tool restarts QEMU with disk A, we're left with a
> dangling raid1 block device. Not a critical failure, but not ideal.

Once there is no raid there is no race.

>
> B) If QEMU crashes between (6) and (7), QEMU may have started writing to
> disk
> B before the management tool sees (7). This means that the management tool
> will create the guest with the raid1 block device which no longer is the
> correct disk. This could fail in subtly bad ways. Depending on how read
> is implemented (if you try to do striping for instance), bad data could be
> returned. You could try to implement a policy of always reading from B if
> the block has been copied but this gets hairy really quickly. It's
> definitely not RAID1 anymore.

Exactly! Drop the raid and always read from B post #6.
This is what I was suggesting before.
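
To make the window in (B) concrete, here is a toy model of steps (6)-(7).
It is only a sketch under stated assumptions: disks are plain dicts, a single
guest write lands on B right after the failover, and "manager_view" is a
hypothetical stand-in for whatever the management tool would relaunch qemu
with. None of these names come from the patch.

```python
def simulate_failover(crash_before_ack):
    """Toy model of steps (6)-(7); every name here is hypothetical."""
    disk_a = {0: "old"}
    disk_b = {0: "old"}      # identical to A after the in-sync event (5)
    manager_view = "raid1"   # what the tool would relaunch qemu with

    # Step (6): qemu fails over and services a guest write on B only.
    disk_b[0] = "new"

    # Step (7): the ack updates the tool's record, unless qemu crashed first.
    if not crash_before_ack:
        manager_view = "diskB"

    return manager_view, disk_a, disk_b


view, disk_a, disk_b = simulate_failover(crash_before_ack=True)
# The tool still believes in the raid1 device while A and B have diverged.
stale = view == "raid1" and disk_a != disk_b
```

With crash_before_ack=True the tool's record says raid1 although only B holds
the latest write, so any read served from A returns old data. Reading only
from B after (6) avoids that in this model.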

>
> You may observe that the problem is not the RAID1 mechanism, but
> changing from
> using a normal device to the RAID1 mechanism. It would then be wise to
> say,
> let's always use this image format. Since that eliminates the race, we
> don't
> really need the copy bitmap anymore.
>
> Now we're left with a simple format that just refers to two filenames.

Ok, looks good. A management app won't need the files below since it
manages everything on its own.

> However,
> a block device is more than just a filename. It needs a format, cache
> settings, etc. So let's put this all in the RAID1 block format. We also
> need
> a way to indicate which block device is selected.
>
> Let's make it a text file for purposes of discussion. It will look
> something
> like:
>
> [primary]
> filename=diska.img
> cache=none
> format=raw
>
> [secondary]
> filename=diskb.img
> cache=writethrough
> format=qcow2
>
> [global]
> active=primary
>
> Since we might want to mirror multiple drives at once, we should probably
> support having multiple drives configured which means we need to not
> just have
> a single active entry, but an entry associated with a particular device.
>
> [drive "diskA"]
> filename=diska.img
> cache=none
> format=raw
>
> [drive "diskB"]
> filename=diskb.img
> cache=writethrough
> format=qcow2
>
> [device "vda"]
> drive=diskB
>
> And this is exactly what I'm proposing. It's really the natural
> generalization
> of what you're proposing.
>
> So basically, the only differences are:
>
> 1) always use the new RAID1 format
> 2) drop the progress bitmap
> 3) support multiple devices per file
> 4) let drive properties be specified beyond filename
>
> All reasonable things to do.
>
> Regards,
>
> Anthony Liguori
>
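
For reference, the last text format quoted above is easy to read with
ordinary INI tooling. The following is a minimal sketch using Python's
standard configparser, not anything qemu ships: the helper names load() and
active_drive() are hypothetical, and treating the section header as a kind
plus a quoted name is an assumption about how the format would be
interpreted.

```python
import configparser
import shlex

CONFIG = """
[drive "diskA"]
filename=diska.img
cache=none
format=raw

[drive "diskB"]
filename=diskb.img
cache=writethrough
format=qcow2

[device "vda"]
drive=diskB
"""


def load(text):
    """Split sections such as [drive "diskA"] into drive and device tables."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    tables = {"drive": {}, "device": {}}
    for section in parser.sections():
        # 'drive "diskA"' -> kind 'drive', name 'diskA'
        kind, name = shlex.split(section)
        tables[kind][name] = dict(parser[section])
    return tables["drive"], tables["device"]


def active_drive(drives, devices, device):
    """Return the settings of the drive a device currently points at."""
    return drives[devices[device]["drive"]]


drives, devices = load(CONFIG)
print(active_drive(drives, devices, "vda")["filename"])
```

In this reading, a failover amounts to rewriting the single drive= line under
the device section; the per-drive settings stay untouched.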
