From: Hailiang Zhang <zhang.zhanghailiang@huawei.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: xiecl.fnst@cn.fujitsu.com, lizhijian@cn.fujitsu.com,
quintela@redhat.com, armbru@redhat.com, yunhong.jiang@intel.com,
eddie.dong@intel.com, peter.huangpeng@huawei.com,
qemu-devel@nongnu.org, arei.gonglei@huawei.com,
stefanha@redhat.com, pbonzini@redhat.com, amit.shah@redhat.com,
zhangchen.fnst@cn.fujitsu.com, hongyang.yang@easystack.cn
Subject: Re: [Qemu-devel] [PATCH COLO-Frame v15 00/38] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)
Date: Wed, 2 Mar 2016 21:01:58 +0800
Message-ID: <56D6E446.3040606@huawei.com>
In-Reply-To: <20160301122554.GA3745@work-vm>
On 2016/3/1 20:25, Dr. David Alan Gilbert wrote:
> * Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
>> On 2016/2/29 17:47, Dr. David Alan Gilbert wrote:
>>> * Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
>>>> On 2016/2/27 0:36, Dr. David Alan Gilbert wrote:
>>>>> * Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
>>>>>> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>>>>>>> From: root <root@localhost.localdomain>
>>>>>>>
>>>>>>> This is the 15th version of COLO (it still supports only periodic checkpoints).
>>>>>>>
>>>>>>> This is only the COLO frame part; you can get the complete code from GitHub:
>>>>>>> https://github.com/coloft/qemu/commits/colo-v2.6-periodic-mode
>>>>>>>
>>>>>>> There are few changes in this series except in the network-related part.
>>>>>>
>>>>>> I was looking at the time the guest is paused during COLO and
>>>>>> was surprised to find one of the larger chunks was the time to reset
>>>>>> the guest before loading each checkpoint; I've traced it part way, the
>>>>>> biggest contributors for my test VM seem to be:
>>>>>>
>>>>>> 3.8ms pcibus_reset: VGA
>>>>>> 1.8ms pcibus_reset: virtio-net-pci
>>>>>> 1.5ms pcibus_reset: virtio-blk-pci
>>>>>> 1.5ms qemu_devices_reset: piix4_reset
>>>>>> 1.1ms pcibus_reset: piix3-ide
>>>>>> 1.1ms pcibus_reset: virtio-rng-pci
>>>>>>
>>>>>> I've not looked deeper yet, but some of these are very silly;
>>>>>> I'm running with -nographic so why it's taking 3.8ms to reset VGA is
>>>>>> going to be interesting.
>>>>>> Also, my only block device is the virtio-blk, so while I understand the
>>>>>> standard PC machine has the IDE controller, it's unclear why it takes
>>>>>> over a ms to reset an unused device.
>>>>>
>>>>> OK, so I've dug a bit deeper, and it appears that it's the changes in
>>>>> PCI bars that actually take the time; every time we do a reset we
>>>>> reset all the BARs, this causes it to do a pci_update_mappings and
>>>>> end up doing a memory_region_del_subregion.
>>>>> Then we load the config space of the PCI device as we do the vmstate_load,
>>>>> and this recreates all the mappings again.
>>>>>
>>>>> I'm not sure what the fix is, but that sounds like it would
>>>>> speed up the checkpoints usefully if we can avoid the map/remap when
>>>>> they're the same.
>>>>>
>>>>
>>>> Interesting, and thanks for your report.
>>>>
>>>> We already knew that qemu_system_reset() is time-consuming and that, ideally,
>>>> we shouldn't call it here. But if we don't call it, there is a bug, which we
>>>> reported in the previous COLO series. Below is a copy of the related
>>>> patch comment:
>
> Paolo suggested one fix, see the patch below; I'm not sure if it's safe
> (in particular if the guest changed a bar and the device code tried to access the memory
> while loading the state???) - but it does seem to work and shaves ~10ms off the reset/load
> times:
>
Nice work. I also tested it, and it is a good improvement. I'm still wondering whether it is
safe here, but it should be safe to apply to qemu_system_reset() independently (I tested that
too; it shaves about 5ms off).

Hailiang
> Dave
>
> commit 7570b2984143860005ad9fe79f5394c75f294328
> Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Date: Tue Mar 1 12:08:14 2016 +0000
>
> COLO: Lock memory map around reset/load
>
> Changing the memory map appears to be expensive; we see this
> particularly when, on loading a checkpoint, we:
> a) reset the devices
> This causes the PCI bars to be reset
> b) load the device states
> This causes the PCI bars to be reloaded.
>
> Turning this all into a single memory_region_transaction saves
> ~10ms/checkpoint.
>
> TBD: What happens if the device code accesses the RAM during loading
> the checkpoint?
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
>
> diff --git a/migration/colo.c b/migration/colo.c
> index 45c3432..c44fb2a 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -22,6 +22,7 @@
> #include "net/colo-proxy.h"
> #include "net/net.h"
> #include "block/block_int.h"
> +#include "exec/memory.h"
>
> static bool vmstate_loading;
>
> @@ -934,6 +935,7 @@ void *colo_process_incoming_thread(void *opaque)
>
> stage_time_start = qemu_clock_get_us(QEMU_CLOCK_HOST);
> qemu_mutex_lock_iothread();
> + memory_region_transaction_begin();
> qemu_system_reset(VMRESET_SILENT);
> stage_time_end = qemu_clock_get_us(QEMU_CLOCK_HOST);
> timed_average_account(&mis->colo_state.time_reset,
> @@ -947,6 +949,7 @@ void *colo_process_incoming_thread(void *opaque)
> stage_time_end - stage_time_start);
> stage_time_start = stage_time_end;
> ret = qemu_load_device_state(fb);
> + memory_region_transaction_commit();
> if (ret < 0) {
> error_report("COLO: load device state failed\n");
> vmstate_loading = false;
>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>
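
As a rough illustration of why wrapping the reset and the state load in one
transaction helps, here is a toy model in C. It is not QEMU code: every name
in it (toy_memory_map, toy_change_mapping, toy_reset_and_load, and so on) is
hypothetical, and it only assumes what the patch description says, namely
that each mapping change is expensive unless it happens inside an enclosing
transaction, in which case the expensive work is deferred to the outermost
commit.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; none of these names exist in QEMU. */
struct toy_memory_map {
    unsigned depth;     /* nesting level of open transactions */
    bool pending;       /* a mapping changed while a transaction was open */
    unsigned rebuilds;  /* how many times the expensive rebuild ran */
};

/* Stand-in for the expensive step of recomputing the memory map. */
static void toy_rebuild(struct toy_memory_map *m)
{
    m->rebuilds++;
    m->pending = false;
}

static void toy_transaction_begin(struct toy_memory_map *m)
{
    m->depth++;
}

/* Only the outermost commit pays for the rebuild, and only if needed. */
static void toy_transaction_commit(struct toy_memory_map *m)
{
    assert(m->depth > 0);
    if (--m->depth == 0 && m->pending) {
        toy_rebuild(m);
    }
}

/* One mapping change, e.g. unmapping or remapping a single PCI BAR. */
static void toy_change_mapping(struct toy_memory_map *m)
{
    toy_transaction_begin(m);
    m->pending = true;
    toy_transaction_commit(m);
}

/*
 * Model one checkpoint load: the reset unmaps every BAR, then the device
 * state load maps every BAR again. Returns the number of rebuilds it cost.
 */
static unsigned toy_reset_and_load(struct toy_memory_map *m,
                                   unsigned nbars, bool batched)
{
    unsigned before = m->rebuilds;
    unsigned i;

    if (batched) {
        toy_transaction_begin(m);
    }
    for (i = 0; i < nbars; i++) {
        toy_change_mapping(m);  /* reset: BAR unmapped */
    }
    for (i = 0; i < nbars; i++) {
        toy_change_mapping(m);  /* load: BAR mapped again */
    }
    if (batched) {
        toy_transaction_commit(m);
    }
    return m->rebuilds - before;
}
```

In this model, calling toy_reset_and_load() with 6 BARs costs 12 rebuilds
when unbatched and 1 rebuild when batched. The model says nothing about the
open question in the patch (device code touching RAM while the transaction
is still open); it only shows where the saving would come from.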