From: Yoshiaki Tamura <tamura.yoshiaki@lab.ntt.co.jp>
To: Anthony Liguori <anthony@codemonkey.ws>
Cc: dlaor@redhat.com, ohmura.kei@lab.ntt.co.jp, kvm@vger.kernel.org,
mtosatti@redhat.com, aliguori@us.ibm.com, qemu-devel@nongnu.org,
yoshikawa.takuya@oss.ntt.co.jp, avi@redhat.com
Subject: Re: [Qemu-devel] [RFC PATCH 00/20] Kemari for KVM v0.1
Date: Fri, 23 Apr 2010 10:53:19 +0900
Message-ID: <4BD0FD8F.5060108@lab.ntt.co.jp>
In-Reply-To: <4BD0B295.3010004@codemonkey.ws>
Anthony Liguori wrote:
> On 04/22/2010 08:16 AM, Yoshiaki Tamura wrote:
>> 2010/4/22 Dor Laor<dlaor@redhat.com>:
>>> On 04/22/2010 01:35 PM, Yoshiaki Tamura wrote:
>>>> Dor Laor wrote:
>>>>> On 04/21/2010 08:57 AM, Yoshiaki Tamura wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> We have been implementing the prototype of Kemari for KVM, and we're
>>>>>> sending
>>>>>> this message to share what we have now and TODO lists. Hopefully, we
>>>>>> would like
>>>>>> to get early feedback to keep us in the right direction. Although
>>>>>> advanced
>>>>>> approaches in the TODO lists are fascinating, we would like to run
>>>>>> this project
>>>>>> step by step while absorbing comments from the community. The current
>>>>>> code is
>>>>>> based on qemu-kvm.git 2b644fd0e737407133c88054ba498e772ce01f27.
>>>>>>
>>>>>> For those who are new to Kemari for KVM, please take a look at the
>>>>>> following RFC which we posted last year.
>>>>>>
>>>>>> http://www.mail-archive.com/kvm@vger.kernel.org/msg25022.html
>>>>>>
>>>>>> The transmission/transaction protocol, and most of the control
>>>>>> logic is
>>>>>> implemented in QEMU. However, we needed a hack in KVM to prevent rip
>>>>>> from
>>>>>> proceeding before synchronizing VMs. It may also need some
>>>>>> plumbing in
>>>>>> the
>>>>>> kernel side to guarantee replayability of certain events and
>>>>>> instructions,
>>>>>> integrate the RAS capabilities of newer x86 hardware with the HA
>>>>>> stack, as well
>>>>>> as for optimization purposes, for example.
>>>>> [snip]
>>>>>
>>>>>> The rest of this message describes TODO lists grouped by each topic.
>>>>>>
>>>>>> === event tapping ===
>>>>>>
>>>>>> Event tapping is the core component of Kemari, and it decides on
>>>>>> which
>>>>>> event the
>>>>>> primary should synchronize with the secondary. The basic assumption
>>>>>> here is
>>>>>> that outgoing I/O operations are idempotent, which is usually true
>>>>>> for
>>>>>> disk I/O
>>>>>> and reliable network protocols such as TCP.
>>>>> IMO any type of network event should be stalled too. What if the VM
>>>>> runs
>>>>> non tcp protocol and the packet that the master node sent reached some
>>>>> remote client and before the sync to the slave the master failed?
>>>> In current implementation, it is actually stalling any type of network
>>>> that goes through virtio-net.
>>>>
>>>> However, if the application was using unreliable protocols, it should
>>>> have its own recovering mechanism, or it should be completely
>>>> stateless.
>>> Why do you treat tcp differently? You can damage the entire VM this
>>> way -
>>> think of dhcp request that was dropped on the moment you switched
>>> between
>>> the master and the slave?
>> I'm not trying to say that we should treat tcp differently, but just
>> it's severe.
>> In case of dhcp request, the client would have a chance to retry after
>> failover, correct?
>> BTW, in current implementation,
>
> I'm slightly confused about the current implementation vs. my
> recollection of the original paper with Xen. I had thought that all disk
> and network I/O was buffered in such a way that at each checkpoint, the
> I/O operations would be released in a burst. Otherwise, you would have
> to synchronize after every I/O operation which is what it seems the
> current implementation does.
Yes, you're almost right.
The current implementation synchronizes before QEMU starts emulating I/O in each
device model. It was designed that way to avoid both the complexity of introducing
a buffering mechanism and the additional I/O latency that buffering would add.
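
To illustrate the "synchronize before emulating I/O" ordering, here is a minimal
sketch. It is not code from the patches: the Primary struct, sync_with_secondary()
and tapped_io() are hypothetical names, and the synchronization itself is reduced
to a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* All names here are hypothetical; none are taken from the Kemari patches. */
typedef struct {
    bool ft_mode;          /* fault-tolerant mode enabled? */
    int  syncs_done;       /* times the primary synchronized with the secondary */
    int  ios_emulated;     /* I/O requests that reached the device model */
} Primary;

/* Stand-in for one synchronization round; returns 0 on success. */
int sync_with_secondary(Primary *p)
{
    p->syncs_done++;
    return 0;
}

/* Tap placed in front of I/O emulation: synchronize first, emulate second. */
int tapped_io(Primary *p)
{
    if (p->ft_mode && sync_with_secondary(p) != 0) {
        return -1;         /* sync failed: do not emulate the I/O */
    }
    p->ios_emulated++;     /* the device model would emulate the request here */
    return 0;
}
```

With no buffering, every tapped I/O request costs one synchronization, which is
the trade-off discussed above.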
> I'm not sure how that is accomplished
> atomically though since you could have a completed I/O operation
> duplicated on the slave node provided it didn't notify completion prior
> to failure.
That's exactly the point I wanted to discuss.
Currently, we're calling vm_stop(0), qemu_aio_flush() and bdrv_flush_all()
before qemu_savevm_state_all() in ft_tranx_ready(), to ensure that outstanding
I/O is complete. I mimicked what the existing live migration code does.
Is that not enough?
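
The ordering described above can be sketched as follows. This is only an
illustration under stated assumptions: the stub_* functions are hypothetical
stand-ins that record the order in which they are called, not the real QEMU
functions, and sketch_tranx_ready() is not the actual ft_tranx_ready().

```c
#include <assert.h>
#include <string.h>

#define MAX_STEPS 8

/* Log of the steps taken, in call order. */
const char *steps[MAX_STEPS];
int nsteps;

void record(const char *name)
{
    if (nsteps < MAX_STEPS) {
        steps[nsteps++] = name;
    }
}

/* Hypothetical stand-ins; each one only records that it was called. */
void stub_stop_vm(void)     { record("stop_vm"); }      /* stop the guest */
void stub_drain_aio(void)   { record("drain_aio"); }    /* wait for in-flight AIO */
void stub_flush_block(void) { record("flush_block"); }  /* flush block devices */
int  stub_save_state(void)  { record("save_state"); return 0; }

/* Quiesce the guest and its I/O first, and only then save the state. */
int sketch_tranx_ready(void)
{
    nsteps = 0;
    stub_stop_vm();
    stub_drain_aio();
    stub_flush_block();
    return stub_save_state();
}
```

The sketch shows only the call order. It says nothing about whether an I/O
operation that completed before a failure can be replayed on the secondary,
which is the open question here.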
> Is there another kemari component that somehow handles buffering I/O
> that is not obvious from these patches?
No, I'm not hiding anything, and I'm happy to share any information about Kemari
so that we can develop it in this community :-)
Thanks,
Yoshi
>
> Regards,
>
> Anthony Liguori
>
>
>