From: Anthony Liguori <anthony@codemonkey.ws>
To: dlaor@redhat.com
Cc: Jes.Sorensen@redhat.com, Marcelo Tosatti <mtosatti@redhat.com>,
Avi Kivity <avi@redhat.com>,
qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] Re: [patch 2/3] Add support for live block copy
Date: Wed, 02 Mar 2011 07:39:01 -0500
Message-ID: <4D6E3A65.7090502@codemonkey.ws>
In-Reply-To: <4D6CB556.5060401@redhat.com>
On 03/01/2011 03:59 AM, Dor Laor wrote:
> On 02/28/2011 08:12 PM, Anthony Liguori wrote:
>>
>> On Feb 28, 2011 11:47 AM, "Avi Kivity" <avi@redhat.com> wrote:
>> >
>> > On 02/28/2011 07:33 PM, Anthony Liguori wrote:
>> >>
>> >>
>> >> >
>> >> > You're just ignoring what I've written.
>> >>
>> >> No, you're just impervious to my subtle attempt to refocus the
>> >> discussion on solving a practical problem.
>> >>
>> >> There's a lot of good, reasonably straight forward changes we can
>> >> make that have a high return on investment.
>> >>
>> >
>> > Is making qemu the authoritative source of configuration information
>> > a straightforward change? Is the return on it high? Is the
>> > investment low?
>>
>> I think this is where we fundamentally disagree. My position is that
>> QEMU is already the authoritative source. Having a state file doesn't
>> change anything.
>>
>> Do a hot unplug of a network device with upstream libvirt with acpiphp
>> unloaded, consult libvirt and then consult the monitor to see who has
>> the right view of the guests config.
>>
>> To me, that's the definition of authoritative.
>>
>> > "No" to all three (ignoring for the moment whether it is good or not,
>> > which we were debating).
>> >
>> >
>> >> The only suggestion I'm making beyond Marcelo's original patch is
>> >> that we use a structured format and that we make it possible to use
>> >> the same file to solve this problem in multiple places.
>> >>
>> >
>> > No, you're suggesting a lot more than that.
>>
>> That's exactly what I'm suggesting from a technical perspective.
>>
>> >> I don't think this creates a fundamental break in how management
>> >> tools interact with QEMU. I don't think introducing RAID support in
>> >> the block layer is a reasonable alternative.
>> >>
>> >>
>> >
>> > Why not?
>>
>> Because it's a lot of complexity and code that can go wrong while only
>> solving the race for one specific case. Not to mention that we double
>> the iop rate.
>>
>> > Something that avoids the whole state thing altogether:
>> >
>> > - instead of atomically switching when live copy is done, keep on
>> >   issuing writes to both the origin and the live copy
>> > - issue a notification to management
>> > - management receives the notification, and issues an atomic blockdev
>> >   switch command
>> >
>> > this is really the RAID-1 solution but without the state file (credit
>> > Dor). An advantage is that there is no additional latency when trying
>> > to catch up to the dirty bitmap.
>>
>> It still suffers from the two generals problem. You cannot solve this
>> without making one node reliable and that takes us back to it being
>> either QEMU (posted event and state file) or the management tool (sync
>> event).
>
> It is safe w/o a state file by changing the basic live copy algorithm:
>
> 1. Live copy in progress stage
> Once live copy command is issued, a dirty bit map is created for
> tracking. There is a single pass over the entire image where we copy
> blocks from the src to the dst.
>
> Write commands for blocks that were already copied will be done
> twice for the src and dst.
>
> Once the full copy single pass ends, we trigger a QMP event that
> this stage can end.
>
> The live copy stage keeps running until management issues a switch
> command. When that happens, the switch is immediate and there is no
> need to copy additional blocks (only to flush pending IOs).
>
> 2. Management sends a switch command.
> Qemu stops doubling the IO and switches to the destination.
> End.

Here is where your race is:

2. Management sends a switch command
3. QEMU receives the switch command
4. QEMU stops doubling IO and switches to the destination
5. QEMU sends acknowledgement of the switch command
6. Management receives acknowledgement of the switch command
7. Management changes its internal state definition to reflect the new
   destination

If QEMU or the management tool crashes after step 4 and before step 6,
when the management tool restarts QEMU with the source image, data loss
will have occurred (and potentially corruption if a flush had happened).
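
To make the window concrete, here is a minimal Python sketch of steps 2
through 7. It is a toy model, not QEMU or libvirt code: the function name
simulate_switch, the "src"/"dst" labels, and the single guest write placed
right after step 4 are all assumptions made for illustration. Stopping the
loop early models a crash, and the result shows which image management
would restart from and whether that image is missing a write that QEMU
sent only to the other one. In this model a crash after step 3 is
harmless, while a crash after step 4 or step 5 restarts from the source
image with the post-switch write missing.

```python
def simulate_switch(crash_after_step=None):
    """Walk steps 2-7 of the switch, stopping early to model a crash.

    Returns (restart_image, data_lost): the image the management tool
    would restart the guest from, and whether that image lacks writes
    that QEMU sent only to the other image.
    """
    mgmt_view = "src"    # image recorded in management's state (hypothetical label)
    qemu_active = "src"  # image QEMU is really writing to
    writes = {"src": [], "dst": []}

    for step in range(2, 8):
        if step == 4:
            # QEMU stops doubling IO; from here on writes go to dst only.
            qemu_active = "dst"
            writes[qemu_active].append("guest write after switch")
        elif step == 7:
            # Management finally records the new destination.
            mgmt_view = "dst"
        if step == crash_after_step:
            break  # QEMU or the management tool dies at this point

    # Data is lost when the restart image does not hold what QEMU wrote.
    data_lost = writes[qemu_active] != writes[mgmt_view]
    return mgmt_view, data_lost


if __name__ == "__main__":
    for step in (3, 4, 5, None):
        print(step, simulate_switch(step))
```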

This all boils down to the Two Generals Problem[1]. It's simply not
fixable without making one end reliable and that means that someone
needs to fsync() something *after* the switchover happens but before the
first write happens. That can be QEMU (Avi's RAID proposal and my state
file proposal) or it can be the management tool (if we introduce
synchronous events).
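
As a rough illustration of what "fsync() something" could look like on
whichever end is made reliable, here is a minimal Python sketch that
durably records the active image before any write is allowed to reach
it. This is not the state file format from the patch series or from any
proposal in this thread: the JSON layout, the "active_image" key, and the
function names persist_active_image and load_active_image are
hypothetical. It assumes a POSIX system where a directory can be opened
and fsync()ed. The record is written to a temporary file, fsync()ed,
renamed over the old record, and the directory is fsync()ed so the rename
itself survives a crash. After a restart, load_active_image tells the
caller which image to reopen.

```python
import json
import os
import tempfile


def persist_active_image(state_path, image_name):
    """Durably record which image is active (hypothetical state record)."""
    dir_name = os.path.dirname(os.path.abspath(state_path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, "w") as f:
            json.dump({"active_image": image_name}, f)
            f.flush()
            os.fsync(f.fileno())          # record contents reach stable storage
        os.replace(tmp_path, state_path)  # atomically swap in the new record
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # fsync the directory so the rename is durable too (POSIX assumption).
    dir_fd = os.open(dir_name, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


def load_active_image(state_path, default):
    """Return the recorded active image, or default if nothing was recorded."""
    try:
        with open(state_path) as f:
            return json.load(f)["active_image"]
    except FileNotFoundError:
        return default


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "state.json")
        # Switchover: persist first, only then let writes go to "dst".
        persist_active_image(path, "dst")
        print(load_active_image(path, "src"))
```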

[1] http://en.wikipedia.org/wiki/Two_Generals%27_Problem

Regards,

Anthony Liguori