From: Dor Laor <dlaor@redhat.com>
To: Anthony Liguori <anthony@codemonkey.ws>
Cc: Jes.Sorensen@redhat.com, Marcelo Tosatti <mtosatti@redhat.com>,
	Avi Kivity <avi@redhat.com>,
	qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] Re: [patch 2/3] Add support for live block copy
Date: Wed, 02 Mar 2011 00:27:49 +0200
Message-ID: <4D6D72E5.9070106@redhat.com>
In-Reply-To: <4D6D160E.4060208@codemonkey.ws>

On 03/01/2011 05:51 PM, Anthony Liguori wrote:
> On 03/01/2011 04:39 AM, Avi Kivity wrote:
>> On 02/28/2011 08:12 PM, Anthony Liguori wrote:
>>>
>>>
>>> On Feb 28, 2011 11:47 AM, "Avi Kivity" <avi@redhat.com
>>> <mailto:avi@redhat.com>> wrote:
>>> >
>>> > On 02/28/2011 07:33 PM, Anthony Liguori wrote:
>>> >>
>>> >>
>>> >> >
>>> >> > You're just ignoring what I've written.
>>> >>
>>> >> No, you're just impervious to my subtle attempt to refocus the
>>> discussion on solving a practical problem.
>>> >>
>>> >> There are a lot of good, reasonably straightforward changes we can
>>> make that have a high return on investment.
>>> >>
>>> >
>>> > Is making qemu the authoritative source of configuration
>>> information a straightforward change? Is the return on it high? Is
>>> the investment low?
>>>
>>> I think this is where we fundamentally disagree. My position is that
>>> QEMU is already the authoritative source. Having a state file doesn't
>>> change anything.
>>>
>>> Do a hot unplug of a network device with upstream libvirt with
>>> acpiphp unloaded, consult libvirt and then consult the monitor to see
>>> who has the right view of the guest's config.
>>>
>>
>> libvirt is right and the monitor is wrong.
>>
>> On real hardware, calling _EJ0 doesn't affect the configuration one
>> little bit (if I understand it correctly). It just turns off power to
>> the slot. If you power-cycle, the card will be there.
>
> It's up to the hardware vendor. Since it's ACPI, it can result in any
> number of operations. Usually, there's some logic to flip on an LED or
> something.
>
> There's nothing that prevents a vendor from ejecting the card. My point
> is that there aren't cleanly separated lines in the real world.
>
>>> To me, that's the definition of authoritative.
>>>
>>> > "No" to all three (ignoring for the moment whether it is good or
>>> not, which we were debating).
>>> >
>>> >
>>> >> The only suggestion I'm making beyond Marcelo's original patch is
>>> that we use a structured format and that we make it possible to use
>>> the same file to solve this problem in multiple places.
>>> >>
>>> >
>>> > No, you're suggesting a lot more than that.
>>>
>>> That's exactly what I'm suggesting from a technical perspective.
>>>
>>
>> Unless I'm hallucinating, you're suggesting quite a bit more. A
>> revolution in how qemu is to be managed.
>
> Let me take another route to see if I can't persuade you.
>
> First, let's clarify your proposal. You want to introduce a new block
> format

No. That was Avi's initial proposal; after we talked, we realized that it
is not needed and that we can use plain files w/o any new configuration.
That is pretty much what you're proposing below, just w/o the
configuration files.

> that references two block devices. It may also store a dirty bitmap to
> keep track of which blocks are out of sync. Hopefully, it goes without
> saying that the dirty bitmap is strictly optional (it's a performance
> optimization), so let's ignore it.
>
> Your format, as a text file, looks like:
>
> [raid1]
> primary=diska.img
> secondary=diskb.img
> active=primary
>
> To use it, here's the sequence:
>
> 0) qemu uses disk A for a block device
>
> 1) create a raid1 block device pointing to disk A and disk B.
>
> 2) management tool asks qemu to use the new raid1 block device.
>
> 3) qemu acks (2)
>
> 4) at some point, the mirror completes, writes are going to both disks
>
> 5) qemu sends out an event indicating that the disks are in sync
>
> 6) management tool then sends a command to fail over to disk B
>
> 7) qemu acks (6)

7) is not a must when there is no raid.

>
> We're making the management tool the "authoritative" source of how to
> launch QEMU. That means that the management tool ultimately determines
> which command line to relaunch QEMU with.

This is what we have today regardless of live copy. How else would you 
track many hot plug/unplug operations and live migration afterwards?
For enterprise usage, that's the best case. It's also true for a single 
host w/ libvirt and virt-manager.

>
> Here are the races:
>
> A) If QEMU crashes between (2) and (3), it may have issued a write to
> the new raid1 block device before the management tool sees (3). If this
> happens, when the management tool restarts QEMU with disk A, we're left
> with a dangling raid1 block device. Not a critical failure, but not ideal.

Once there is no raid there is no race.

>
> B) If QEMU crashes between (6) and (7), QEMU may have started writing to
> disk B before the management tool sees (7). This means that the
> management tool will create the guest with the raid1 block device, which
> is no longer the correct disk. This could fail in subtly bad ways.
> Depending on how read is implemented (if you try to do striping, for
> instance), bad data could be returned. You could try to implement a
> policy of always reading from B if the block has been copied, but this
> gets hairy really quickly. It's definitely not RAID1 anymore.

Exactly! Drop the raid and always read from B post #6.
This is what I was suggesting before.
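
To illustrate race (B) and the "always read from B post #6" rule, here is a
minimal Python sketch. It is a toy model, not QEMU or libvirt code: the
names QemuState, relaunch_disk and is_safe are hypothetical, and the
switch_on_send flag encodes one reading of the rule, namely that the
management tool treats disk B as current as soon as it sends (6) instead
of waiting for the ack in (7). It also assumes (6) is only ever sent after
the in-sync event (5).

```python
from enum import Enum


class QemuState(Enum):
    """Where guest writes were going when QEMU crashed (toy model)."""
    ONLY_A = "only_a"        # mirror not yet in sync: only A has all writes
    MIRRORING = "mirroring"  # in sync (step 5): writes go to both A and B
    ONLY_B = "only_b"        # failover (step 6) processed: writes go to B


def up_to_date_disks(state):
    """Disks that hold every guest write for the given QEMU state."""
    return {
        QemuState.ONLY_A: {"A"},
        QemuState.MIRRORING: {"A", "B"},
        QemuState.ONLY_B: {"B"},
    }[state]


def relaunch_disk(failover_sent, failover_acked, switch_on_send):
    """Disk the management tool would relaunch the guest with.

    switch_on_send=False: switch to B only once the ack (7) was seen.
    switch_on_send=True:  switch to B as soon as (6) was sent.
    """
    switched = failover_sent if switch_on_send else failover_acked
    return "B" if switched else "A"


def is_safe(state, failover_sent, failover_acked, switch_on_send):
    """True if the relaunch disk holds every write QEMU made."""
    disk = relaunch_disk(failover_sent, failover_acked, switch_on_send)
    return disk in up_to_date_disks(state)


if __name__ == "__main__":
    # Crash between (6) and (7): the command was sent, no ack was seen,
    # and QEMU may or may not have processed it yet.
    for state in (QemuState.MIRRORING, QemuState.ONLY_B):
        for on_send in (False, True):
            ok = is_safe(state, True, False, switch_on_send=on_send)
            print(state.name, "switch_on_send =", on_send, "safe =", ok)
```

In this model, waiting for the ack leaves one unsafe case (QEMU already
writing only to B while the tool relaunches with A), whereas switching on
send is safe in both cases because B is in sync before (6) is sent.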

>
> You may observe that the problem is not the RAID1 mechanism, but
> changing from using a normal device to the RAID1 mechanism. It would
> then be wise to say, let's always use this image format. Since that
> eliminates the race, we don't really need the copy bitmap anymore.
>
> Now we're left with a simple format that just refers to two filenames.

Ok, looks good. A management app won't need the files below since it
manages everything on its own.

> However, block devices are more than just a filename. They need a
> format, cache settings, etc. So let's put this all in the RAID1 block
> format. We also need a way to indicate which block device is selected.
>
> Let's make it a text file for purposes of discussion. It will look
> something like:
>
> [primary]
> filename=diska.img
> cache=none
> format=raw
>
> [secondary]
> filename=diskb.img
> cache=writethrough
> format=qcow2
>
> [global]
> active=primary
>
> Since we might want to mirror multiple drives at once, we should
> probably support having multiple drives configured, which means we need
> not just a single active entry, but an entry associated with a
> particular device.
>
> [drive "diskA"]
> filename=diska.img
> cache=none
> format=raw
>
> [drive "diskB"]
> filename=diskb.img
> cache=writethrough
> format=qcow2
>
> [device "vda"]
> drive=diskB
>
> And this is exactly what I'm proposing. It's really the natural
> generalization of what you're proposing.
>
> So basically, the only differences are:
>
> 1) always use the new RAID1 format
> 2) drop the progress bitmap
> 3) support multiple devices per file
> 4) let drive properties be specified beyond filename
>
> All reasonable things to do.
>
> Regards,
>
> Anthony Liguori
>
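
For illustration, the [drive]/[device] file quoted above can be read with a
few lines of Python. This is only a sketch: Python's configparser is used
as a stand-in and is not QEMU's own parser, and the helper names
parse_sections and active_drive are hypothetical. The file contents are
copied from the quoted example.

```python
import configparser

# File contents taken from the quoted example.
CONFIG = """
[drive "diskA"]
filename=diska.img
cache=none
format=raw

[drive "diskB"]
filename=diskb.img
cache=writethrough
format=qcow2

[device "vda"]
drive=diskB
"""


def parse_sections(text):
    """Split sections of the form [kind "name"] into drives and devices."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    drives, devices = {}, {}
    for section in parser.sections():
        # A section header looks like: drive "diskA"
        kind, _, name = section.partition(" ")
        name = name.strip('"')
        if kind == "drive":
            drives[name] = dict(parser[section])
        elif kind == "device":
            devices[name] = dict(parser[section])
    return drives, devices


def active_drive(text, device):
    """Return the settings of the drive currently selected for a device."""
    drives, devices = parse_sections(text)
    return drives[devices[device]["drive"]]


if __name__ == "__main__":
    print(active_drive(CONFIG, "vda"))
```

With this layout, failing over a device is a one-line change to its drive=
entry, while each drive keeps its own format and cache settings.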
