public inbox for kvm@vger.kernel.org
From: Yoshiaki Tamura <tamura.yoshiaki@lab.ntt.co.jp>
To: Anthony Liguori <aliguori@linux.vnet.ibm.com>
Cc: dlaor@redhat.com, ohmura.kei@lab.ntt.co.jp, kvm@vger.kernel.org,
	mtosatti@redhat.com, qemu-devel@nongnu.org,
	yoshikawa.takuya@oss.ntt.co.jp, avi@redhat.com
Subject: Re: [Qemu-devel] [RFC PATCH 00/20] Kemari for KVM v0.1
Date: Mon, 26 Apr 2010 19:44:11 +0900
Message-ID: <4BD56E7B.30906@lab.ntt.co.jp>
In-Reply-To: <4BD19E95.7030906@linux.vnet.ibm.com>

Anthony Liguori wrote:
> On 04/22/2010 08:53 PM, Yoshiaki Tamura wrote:
>> Anthony Liguori wrote:
>>> On 04/22/2010 08:16 AM, Yoshiaki Tamura wrote:
>>>> 2010/4/22 Dor Laor<dlaor@redhat.com>:
>>>>> On 04/22/2010 01:35 PM, Yoshiaki Tamura wrote:
>>>>>> Dor Laor wrote:
>>>>>>> On 04/21/2010 08:57 AM, Yoshiaki Tamura wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> We have been implementing the prototype of Kemari for KVM, and we're
>>>>>>>> sending this message to share what we have now and TODO lists.
>>>>>>>> Hopefully, we would like to get early feedback to keep us in the right
>>>>>>>> direction. Although advanced approaches in the TODO lists are
>>>>>>>> fascinating, we would like to run this project step by step while
>>>>>>>> absorbing comments from the community. The current code is based on
>>>>>>>> qemu-kvm.git 2b644fd0e737407133c88054ba498e772ce01f27.
>>>>>>>>
>>>>>>>> For those who are new to Kemari for KVM, please take a look at the
>>>>>>>> following RFC which we posted last year.
>>>>>>>>
>>>>>>>> http://www.mail-archive.com/kvm@vger.kernel.org/msg25022.html
>>>>>>>>
>>>>>>>> The transmission/transaction protocol, and most of the control logic is
>>>>>>>> implemented in QEMU. However, we needed a hack in KVM to prevent rip from
>>>>>>>> proceeding before synchronizing VMs. It may also need some plumbing in
>>>>>>>> the kernel side to guarantee replayability of certain events and
>>>>>>>> instructions, integrate the RAS capabilities of newer x86 hardware with
>>>>>>>> the HA stack, as well as for optimization purposes, for example.
>>>>>>> [ snap]
>>>>>>>
>>>>>>>> The rest of this message describes TODO lists grouped by each topic.
>>>>>>>>
>>>>>>>> === event tapping ===
>>>>>>>>
>>>>>>>> Event tapping is the core component of Kemari, and it decides on which
>>>>>>>> event the primary should synchronize with the secondary. The basic
>>>>>>>> assumption here is that outgoing I/O operations are idempotent, which is
>>>>>>>> usually true for disk I/O and reliable network protocols such as TCP.
>>>>>>> IMO any type of network event should be stalled too. What if the VM runs
>>>>>>> non tcp protocol and the packet that the master node sent reached some
>>>>>>> remote client and before the sync to the slave the master failed?
>>>>>> In current implementation, it is actually stalling any type of network
>>>>>> that goes through virtio-net.
>>>>>>
>>>>>> However, if the application was using unreliable protocols, it should
>>>>>> have its own recovering mechanism, or it should be completely
>>>>>> stateless.
>>>>> Why do you treat tcp differently? You can damage the entire VM this way -
>>>>> think of dhcp request that was dropped on the moment you switched between
>>>>> the master and the slave?
>>>> I'm not trying to say that we should treat tcp differently, but just
>>>> it's severe.
>>>> In case of dhcp request, the client would have a chance to retry after
>>>> failover, correct?
>>>> BTW, in current implementation,
>>>
>>> I'm slightly confused about the current implementation vs. my
>>> recollection of the original paper with Xen. I had thought that all disk
>>> and network I/O was buffered in such a way that at each checkpoint, the
>>> I/O operations would be released in a burst. Otherwise, you would have
>>> to synchronize after every I/O operation which is what it seems the
>>> current implementation does.
>>
>> Yes, you're almost right.
>> It's synchronizing before QEMU starts emulating I/O at each device model.
>
> If NodeA is the master and NodeB is the slave, if NodeA sends a network
> packet, you'll checkpoint before the packet is actually sent, and then
> if a failure occurs before the next checkpoint, won't that result in
> both NodeA and NodeB sending out a duplicate version of the packet?

Yes.  But I think it's better than taking the checkpoint afterwards.

If we checkpoint after sending the packet, say a TCP ACK to the client, and a
hardware failure hits NodeA during the transaction *but the client has already
received the TCP ACK*, NodeB will resume from the previous state, in which it
may still need to receive some data from the client. However, because the
client has already received the TCP ACK, it won't resend that data to NodeB.
It looks like this data is going to be dropped.
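
The two orderings can be compared with a toy model. This is only a sketch and
not Kemari code: the dictionaries, the simulate() function and the "payload"
string are all made up for illustration, and it assumes the primary fails
right after the ACK reaches the client.

```python
import copy


def simulate(checkpoint_first: bool) -> dict:
    """Toy failover model (hypothetical, not Kemari code).

    The guest on the primary has just received "payload" from a client
    and has an ACK queued. The secondary still holds the previous
    checkpoint, taken before the payload arrived.
    """
    primary = {"rx_data": ["payload"], "pending_ack": True}
    secondary = {"rx_data": [], "pending_ack": False}
    acks_seen_by_client = 0

    if checkpoint_first:
        # Sync first: the secondary learns about the data and the queued ACK.
        secondary = copy.deepcopy(primary)

    # The ACK leaves the primary and reaches the client.
    acks_seen_by_client += 1

    # The primary fails here. With checkpoint_first=False the checkpoint
    # that should have followed the send never happens.

    # Failover: the secondary resumes from its last checkpoint and replays
    # any output that was still queued in that state.
    if secondary["pending_ack"]:
        acks_seen_by_client += 1  # duplicate ACK on the wire

    # A client that has seen an ACK does not retransmit the data.
    client_will_resend = acks_seen_by_client == 0
    data_lost = "payload" not in secondary["rx_data"] and not client_will_resend
    return {"acks": acks_seen_by_client, "data_lost": data_lost}
```

In this model simulate(True) ends with a duplicate ACK and no lost data, while
simulate(False) ends with a single ACK and the payload missing on NodeB with
no retransmission coming.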

Anyway, I've just started planning to move the sync point to the network/block
layer, and I will post the result for discussion again.

