* Re: [PATCH 09/10] Exit loop if we have been there too long
2010-11-30 14:27 ` Avi Kivity
@ 2010-11-30 14:50 ` Anthony Liguori
2010-12-01 12:40 ` Avi Kivity
2010-11-30 17:43 ` Juan Quintela
2010-12-01 1:20 ` Takuya Yoshikawa
2 siblings, 1 reply; 17+ messages in thread
From: Anthony Liguori @ 2010-11-30 14:50 UTC (permalink / raw)
To: Avi Kivity
Cc: Paolo Bonzini, Juan Quintela, qemu-devel, Juan Quintela,
kvm-devel
On 11/30/2010 08:27 AM, Avi Kivity wrote:
> On 11/30/2010 04:17 PM, Anthony Liguori wrote:
>>> What's the problem with burning that cpu? per guest page,
>>> compressing takes less than sending. Is it just an issue of qemu
>>> mutex hold time?
>>
>>
>> If you have a 512GB guest, then you have a 16MB dirty bitmap which
>> ends up being a 128MB dirty bitmap in QEMU because we represent
>> dirty bits with 8 bits.
>
> Was there not a patchset to split each bit into its own bitmap? And
> then copy the kvm or qemu master bitmap into each client bitmap as it
> became needed?
>
>> Walking 16MB (or 128MB) of memory just to find a few pages to send
>> over the wire is a big waste of CPU time. If kvm.ko used a
>> multi-level table to represent dirty info, we could walk the memory
>> mapping at 2MB chunks allowing us to skip a large amount of the
>> comparisons.
>
> There's no reason to assume dirty pages would be clustered. If 0.2%
> of memory were dirty, but scattered uniformly, there would be no win
> from the two-level bitmap. A loss, in fact: 2MB can be represented as
> 512 bits or 64 bytes, just one cache line. Any two-level thing will
> need more.
>
> We might have a more compact encoding for sparse bitmaps, like
> run-length encoding.
>
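The run-length idea above can be sketched roughly as follows; everything here (names, layout) is illustrative, not an actual kvm or qemu interface:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rough sketch of a run-length encoding for a sparse dirty bitmap:
 * store alternating run lengths (clean run, dirty run, clean run, ...),
 * always starting with a clean run.  For mostly-clean memory this is
 * far smaller than the flat bitmap.  Illustrative only. */
static size_t rle_encode(const uint8_t *bitmap, size_t nbits,
                         uint32_t *runs, size_t max_runs)
{
    size_t nruns = 0, i = 0;
    int cur = 0;                    /* first run counts clean pages */

    while (i < nbits && nruns < max_runs) {
        uint32_t len = 0;
        while (i < nbits && (((bitmap[i / 8] >> (i % 8)) & 1) == cur)) {
            len++;
            i++;
        }
        runs[nruns++] = len;
        cur ^= 1;
    }
    return nruns;
}
```

For a bitmap with only a handful of dirty runs, the encoded form fits in a few cache lines regardless of guest size, which is where the sparse case wins.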
>>
>>>> In the short term, fixing (2) by accounting zero pages as full
>>>> sized pages should "fix" the problem.
>>>>
>>>> In the long term, we need a new dirty bit interface from kvm.ko
>>>> that uses a multi-level table. That should dramatically improve
>>>> scan performance.
>>>
>>> Why would a multi-level table help? (or rather, please explain what
>>> you mean by a multi-level table).
>>>
>>> Something we could do is divide memory into more slots, and polling
>>> each slot when we start to scan its page range. That reduces the
>>> time between sampling a page's dirtiness and sending it off, and
>>> reduces the latency incurred by the sampling. There are also
>>> non-interface-changing ways to reduce this latency, like O(1) write
>>> protection, or using dirty bits instead of write protection when
>>> available.
>>
>> BTW, we should also refactor qemu to use the kvm dirty bitmap
>> directly instead of mapping it to the main dirty bitmap.
>
> That's what the patch set I was alluding to did. Or maybe I imagined
> the whole thing.
No, it just split the main bitmap into three bitmaps. I'm suggesting
that the dirty interface have two implementations: one that refers to
the 8-bit bitmap when TCG is in use, and another that uses the KVM
representation.
TCG really needs multiple dirty bits but KVM doesn't. A shared
implementation really can't be optimal.
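The two-backend split being suggested could look roughly like this; all names here are illustrative sketches, not actual QEMU APIs:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: one dirty-log interface, two backends.  The TCG
 * backend keeps a byte of dirty flags per page (code/vga/migration);
 * a KVM backend would instead consult a one-bit-per-page bitmap like
 * the kernel's.  Illustrative only. */
typedef struct DirtyOps {
    int  (*get_dirty)(void *state, unsigned long pfn, uint8_t flag);
    void (*set_dirty)(void *state, unsigned long pfn, uint8_t flag);
} DirtyOps;

/* TCG backend: one byte of flags per page. */
static int tcg_get_dirty(void *state, unsigned long pfn, uint8_t flag)
{
    return (((uint8_t *)state)[pfn] & flag) != 0;
}

static void tcg_set_dirty(void *state, unsigned long pfn, uint8_t flag)
{
    ((uint8_t *)state)[pfn] |= flag;
}

static const DirtyOps tcg_dirty_ops = { tcg_get_dirty, tcg_set_dirty };

/* KVM backend: a single bit per page; the flag argument is ignored
 * because KVM tracks only one kind of dirtiness. */
static int kvm_get_dirty(void *state, unsigned long pfn, uint8_t flag)
{
    (void)flag;
    return (((uint64_t *)state)[pfn / 64] >> (pfn % 64)) & 1;
}

static void kvm_set_dirty(void *state, unsigned long pfn, uint8_t flag)
{
    (void)flag;
    ((uint64_t *)state)[pfn / 64] |= 1ULL << (pfn % 64);
}

static const DirtyOps kvm_dirty_ops = { kvm_get_dirty, kvm_set_dirty };
```

Callers would go through the ops table, so neither backend pays for the other's representation.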
>
>>>> We also need to implement live migration in a separate thread that
>>>> doesn't carry qemu_mutex while it runs.
>>>
>>> IMO that's the biggest hit currently.
>>
>> Yup. That's the Correct solution to the problem.
>
> Then let's just Do it.
>
Yup.
Regards,
Anthony Liguori
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 09/10] Exit loop if we have been there too long
2010-11-30 14:50 ` Anthony Liguori
@ 2010-12-01 12:40 ` Avi Kivity
0 siblings, 0 replies; 17+ messages in thread
From: Avi Kivity @ 2010-12-01 12:40 UTC (permalink / raw)
To: Anthony Liguori
Cc: Paolo Bonzini, Juan Quintela, qemu-devel, Juan Quintela,
kvm-devel
On 11/30/2010 04:50 PM, Anthony Liguori wrote:
>> That's what the patch set I was alluding to did. Or maybe I imagined
>> the whole thing.
>
>
> No, it just split the main bitmap into three bitmaps. I'm suggesting
> that the dirty interface have two implementations: one that refers to
> the 8-bit bitmap when TCG is in use, and another that uses the KVM
> representation.
>
> TCG really needs multiple dirty bits but KVM doesn't. A shared
> implementation really can't be optimal.
Live migration and the framebuffer can certainly share code with kvm and
tcg:
- tcg or kvm maintain an internal bitmap (kvm in the kernel, tcg updates
a private bitmap)
- a dirty log client wants to see an updated bitmap; migration on a new
pass, vga on screen refresh
- ask the producer (kvm or tcg) to fetch-and-clear a dirty bitmap
- broadcast it ( |= ) into any active clients (migration or framebuffer)
- everyone's happy
The code-dirty thing might need special treatment; we can have a special
tcg-only bitmap for it.
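The fetch-and-clear plus broadcast scheme above can be sketched like this (a rough illustration; the names and fixed sizes are assumptions, not QEMU code):

```c
#include <assert.h>
#include <stdint.h>

#define DIRTY_WORDS 4   /* sketch-sized bitmap: 4 * 64 pages */

/* Sketch of the scheme: the producer-side (kvm or tcg) bitmap is
 * fetched and cleared, and the fetched bits are OR-ed into every
 * active client bitmap (migration, framebuffer).  Illustrative only. */
static void dirty_log_broadcast(uint64_t *producer,
                                uint64_t *clients[], int nclients)
{
    for (int w = 0; w < DIRTY_WORDS; w++) {
        uint64_t bits = producer[w];
        producer[w] = 0;                  /* fetch-and-clear */
        for (int c = 0; c < nclients; c++) {
            clients[c][w] |= bits;        /* broadcast ( |= ) */
        }
    }
}
```

Each client then consumes and clears its own copy at its own pace (migration per pass, vga per refresh) without disturbing the others.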
--
error compiling committee.c: too many arguments to function
* Re: [PATCH 09/10] Exit loop if we have been there too long
2010-11-30 14:27 ` Avi Kivity
2010-11-30 14:50 ` Anthony Liguori
@ 2010-11-30 17:43 ` Juan Quintela
2010-12-01 1:20 ` Takuya Yoshikawa
2 siblings, 0 replies; 17+ messages in thread
From: Juan Quintela @ 2010-11-30 17:43 UTC (permalink / raw)
To: Avi Kivity; +Cc: Anthony Liguori, Paolo Bonzini, qemu-devel, kvm-devel
Avi Kivity <avi@redhat.com> wrote:
> On 11/30/2010 04:17 PM, Anthony Liguori wrote:
>>> What's the problem with burning that cpu? per guest page,
>>> compressing takes less than sending. Is it just an issue of qemu
>>> mutex hold time?
>>
>>
>> If you have a 512GB guest, then you have a 16MB dirty bitmap which
>> ends up being a 128MB dirty bitmap in QEMU because we represent
>> dirty bits with 8 bits.
>
> Was there not a patchset to split each bit into its own bitmap? And
> then copy the kvm or qemu master bitmap into each client bitmap as it
> became needed?
>
>> Walking 16MB (or 128MB) of memory just to find a few pages to send
>> over the wire is a big waste of CPU time. If kvm.ko used a
>> multi-level table to represent dirty info, we could walk the memory
>> mapping at 2MB chunks allowing us to skip a large amount of the
>> comparisons.
>
> There's no reason to assume dirty pages would be clustered. If 0.2%
> of memory were dirty, but scattered uniformly, there would be no win
> from the two-level bitmap. A loss, in fact: 2MB can be represented as
> 512 bits or 64 bytes, just one cache line. Any two-level thing will
> need more.
>
> We might have a more compact encoding for sparse bitmaps, like
> run-length encoding.
I haven't measured it, but I think that it would be much better that
way. When we start, it doesn't matter too much (everything is dirty);
what we should optimize for is the last rounds, and in the last rounds
it would be much better to ask kvm:
fill this array of dirty page offsets, and be done with it.
Not sure if adding a size field would improve things; both tests need to
be measured.
What would be a winner independently of that is a way to ask qemu for
the number of dirty pages. Just now we need to calculate it by walking
the bitmap (one of my patches just simplifies this).
Adding the feature to qemu means that we could always give recent
information to "info migrate" without incurring a big cost.
>> BTW, we should also refactor qemu to use the kvm dirty bitmap
>> directly instead of mapping it to the main dirty bitmap.
>
> That's what the patch set I was alluding to did. Or maybe I imagined
> the whole thing.
It existed. And today it would be easier because KQEMU and VGA are not
needed anymore.
>>>> We also need to implement live migration in a separate thread that
>>>> doesn't carry qemu_mutex while it runs.
>>>
>>> IMO that's the biggest hit currently.
>>
>> Yup. That's the Correct solution to the problem.
>
> Then let's just Do it.
Will take a look at splitting the qemu_mutex bit.
Later, Juan.
* Re: [PATCH 09/10] Exit loop if we have been there too long
2010-11-30 14:27 ` Avi Kivity
2010-11-30 14:50 ` Anthony Liguori
2010-11-30 17:43 ` Juan Quintela
@ 2010-12-01 1:20 ` Takuya Yoshikawa
2010-12-01 1:52 ` Juan Quintela
2 siblings, 1 reply; 17+ messages in thread
From: Takuya Yoshikawa @ 2010-12-01 1:20 UTC (permalink / raw)
To: Avi Kivity
Cc: Anthony Liguori, Paolo Bonzini, Juan Quintela, qemu-devel,
Juan Quintela, kvm-devel
On Tue, 30 Nov 2010 16:27:13 +0200
Avi Kivity <avi@redhat.com> wrote:
> On 11/30/2010 04:17 PM, Anthony Liguori wrote:
> >> What's the problem with burning that cpu? per guest page,
> >> compressing takes less than sending. Is it just an issue of qemu
> >> mutex hold time?
> >
> >
> > If you have a 512GB guest, then you have a 16MB dirty bitmap which
> > ends up being a 128MB dirty bitmap in QEMU because we represent dirty
> > bits with 8 bits.
>
> Was there not a patchset to split each bit into its own bitmap? And
> then copy the kvm or qemu master bitmap into each client bitmap as it
> became needed?
>
> > Walking 16MB (or 128MB) of memory just to find a few pages to send
> > over the wire is a big waste of CPU time. If kvm.ko used a
> > multi-level table to represent dirty info, we could walk the memory
> > mapping at 2MB chunks allowing us to skip a large amount of the
> > comparisons.
>
> There's no reason to assume dirty pages would be clustered. If 0.2% of
> memory were dirty, but scattered uniformly, there would be no win from
> the two-level bitmap. A loss, in fact: 2MB can be represented as 512
> bits or 64 bytes, just one cache line. Any two-level thing will need more.
>
> We might have a more compact encoding for sparse bitmaps, like
> run-length encoding.
>
Is anyone profiling these dirty bitmap things?
- 512GB guest is really the target?
- how much cpu time can we use for these things?
- how many dirty pages do we have to care?
Since we are planning to do some profiling for these, taking into account
Kemari, can you please share this information?
> >
> >>> In the short term, fixing (2) by accounting zero pages as full sized
> >>> pages should "fix" the problem.
> >>>
> >>> In the long term, we need a new dirty bit interface from kvm.ko that
> >>> uses a multi-level table. That should dramatically improve scan
> >>> performance.
> >>
> >> Why would a multi-level table help? (or rather, please explain what
> >> you mean by a multi-level table).
> >>
> >> Something we could do is divide memory into more slots, and polling
> >> each slot when we start to scan its page range. That reduces the
> >> time between sampling a page's dirtiness and sending it off, and
> >> reduces the latency incurred by the sampling. There are also
If we use the rmap approach with one more interface, we can specify which
range of the dirty bitmap to get. This has the same effect as splitting
into more slots.
> >> non-interface-changing ways to reduce this latency, like O(1) write
> >> protection, or using dirty bits instead of write protection when
> >> available.
IIUC, O(1) will lazily write protect pages beginning from the top level?
Does this have any impact other than the timing of get_dirty_log()?
Thanks,
Takuya
> >
> > BTW, we should also refactor qemu to use the kvm dirty bitmap directly
> > instead of mapping it to the main dirty bitmap.
>
> That's what the patch set I was alluding to did. Or maybe I imagined
> the whole thing.
>
> >>> We also need to implement live migration in a separate thread that
> >>> doesn't carry qemu_mutex while it runs.
> >>
> >> IMO that's the biggest hit currently.
> >
> > Yup. That's the Correct solution to the problem.
>
> Then let's just Do it.
>
> --
> error compiling committee.c: too many arguments to function
>
--
Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
* Re: [PATCH 09/10] Exit loop if we have been there too long
2010-12-01 1:20 ` Takuya Yoshikawa
@ 2010-12-01 1:52 ` Juan Quintela
2010-12-01 2:22 ` Takuya Yoshikawa
2010-12-01 12:35 ` Avi Kivity
0 siblings, 2 replies; 17+ messages in thread
From: Juan Quintela @ 2010-12-01 1:52 UTC (permalink / raw)
To: Takuya Yoshikawa
Cc: Avi Kivity, Anthony Liguori, Paolo Bonzini, qemu-devel, kvm-devel
Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> wrote:
> On Tue, 30 Nov 2010 16:27:13 +0200
> Avi Kivity <avi@redhat.com> wrote:
> Is anyone profiling these dirty bitmap things?
I am.
> - 512GB guest is really the target?
no, problems exist with smaller amounts of RAM. With a 16GB guest it is
trivial to get 1s stalls; with a 64GB guest, 3-4s; with more memory,
migration is flaky to say the least.
> - how much cpu time can we use for these things?
the problem here is that we are forced to walk the bitmap too many
times; we want to do it fewer times.
> - how many dirty pages do we have to care?
default values and assuming 1Gigabit ethernet for ourselves ~9.5MB of
dirty pages to have only 30ms of downtime.
But notice that this is what we are advertising, we aren't near there at all.
> Since we are planning to do some profiling for these, taking into account
> Kemari, can you please share this information?
If you see the 0/10 email with this setup, you can see how much time we
are spending on stuff. Just now (for migration, kemari is a bit
different) we have to fix other things first.
Next item for me is to improve that bitmap handling (we can at least
trivially divide the space used by 8, and use ffs to find dirty pages).
I am thinking about changing the kvm interface when it becomes the
bottleneck (it is not at this point).
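The bitmap handling Juan describes (pack one bit per page, then use ffs to skip clean pages) can be sketched like this; the function name and layout are illustrative, not actual qemu code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch: dirty state packed one bit per page (8x smaller than qemu's
 * byte-per-page scheme), scanned with a find-first-set primitive so we
 * jump straight to dirty pages instead of testing every page. */
static size_t collect_dirty_pfns(const uint64_t *bitmap, size_t nwords,
                                 size_t *pfns, size_t max)
{
    size_t n = 0;
    for (size_t w = 0; w < nwords && n < max; w++) {
        uint64_t word = bitmap[w];
        while (word && n < max) {
            int bit = __builtin_ctzll(word);  /* ffs-style lookup */
            pfns[n++] = w * 64 + bit;
            word &= word - 1;                 /* clear lowest set bit */
        }
    }
    return n;
}
```

Mostly-zero words cost one load and one compare each, which is where the win comes from in the late migration rounds when few pages are dirty.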
>> >>> In the short term, fixing (2) by accounting zero pages as full sized
>> >>> pages should "fix" the problem.
>> >>>
>> >>> In the long term, we need a new dirty bit interface from kvm.ko that
>> >>> uses a multi-level table. That should dramatically improve scan
>> >>> performance.
>> >>
>> >> Why would a multi-level table help? (or rather, please explain what
>> >> you mean by a multi-level table).
>> >>
>> >> Something we could do is divide memory into more slots, and polling
>> >> each slot when we start to scan its page range. That reduces the
>> >> time between sampling a page's dirtiness and sending it off, and
>> >> reduces the latency incurred by the sampling. There are also
>
> If we use the rmap approach with one more interface, we can specify which
> range of the dirty bitmap to get. This has the same effect as splitting
> into more slots.
kvm allows us to do that today. It is qemu that doesn't use this
information; qemu always asks for the whole memory. kvm is happy to
give only a range.
>> >> non-interface-changing ways to reduce this latency, like O(1) write
>> >> protection, or using dirty bits instead of write protection when
>> >> available.
>
> IIUC, O(1) will lazily write protect pages beginning from the top level?
> Does this have any impact other than the timing of get_dirty_log()?
dunno.
At this point I am trying to:
- get migration with 16-64GB to not having stalls.
- get infrastructure to be able to know what is going on.
So far, the bigger stalls are gone, and we are discussing what to do
next. As Anthony suggested I ran the ram_save_live() loop without
qemu_mutex, and now guests get much better interaction, but my current
patch (for this) just puts
qemu_mutex_unlock_iothread()/qemu_mutex_lock_iothread()
around it. I think that we are racy with the access to the bitmap, but
it was just a test.
With respect to Kemari, we can discuss what do you need and how you are
going to test, just to not do overlapping work.
Thanks, Juan.
* Re: [PATCH 09/10] Exit loop if we have been there too long
2010-12-01 1:52 ` Juan Quintela
@ 2010-12-01 2:22 ` Takuya Yoshikawa
2010-12-01 12:35 ` Avi Kivity
1 sibling, 0 replies; 17+ messages in thread
From: Takuya Yoshikawa @ 2010-12-01 2:22 UTC (permalink / raw)
To: Juan Quintela
Cc: Avi Kivity, Anthony Liguori, Paolo Bonzini, qemu-devel, kvm-devel
On Wed, 01 Dec 2010 02:52:08 +0100
Juan Quintela <quintela@redhat.com> wrote:
> > Since we are planning to do some profiling for these, taking into account
> > Kemari, can you please share this information?
>
> If you see the 0/10 email with this setup, you can see how much time
> we are spending on stuff. Just now (for migration, kemari is a bit
> different) we have to fix other things first.
>
Thank you for the information.
Sorry, I only had the [9/10] in my kvm mailbox and did not notice this [0/10].
> >> >> non-interface-changing ways to reduce this latency, like O(1) write
> >> >> protection, or using dirty bits instead of write protection when
> >> >> available.
> >
> > IIUC, O(1) will lazily write protect pages beginning from the top level?
> > Does this have any impact other than the timing of get_dirty_log()?
>
> dunno.
>
> At this point I am trying to:
> - get migration with 16-64GB to not having stalls.
> - get infrastructure to be able to know what is going on.
>
> So far, the bigger stalls are gone, and we are discussing what to do
> next. As Anthony suggested I ran the ram_save_live() loop without
> qemu_mutex, and now guests get much better interaction, but my current
> patch (for this) just puts
> qemu_mutex_unlock_iothread()/qemu_mutex_lock_iothread()
> around it. I think that we are racy with the access to the bitmap, but
> it was just a test.
>
> With respect to Kemari, we can discuss what do you need and how you are
> going to test, just to not do overlapping work.
I see, we've just started to talk about what we have to achieve and what we
can expect. I need to talk with Tamura-san about this a bit more before
showing some example targets.
Thanks,
Takuya
* Re: [PATCH 09/10] Exit loop if we have been there too long
2010-12-01 1:52 ` Juan Quintela
2010-12-01 2:22 ` Takuya Yoshikawa
@ 2010-12-01 12:35 ` Avi Kivity
2010-12-01 13:45 ` Juan Quintela
2010-12-02 1:31 ` Takuya Yoshikawa
1 sibling, 2 replies; 17+ messages in thread
From: Avi Kivity @ 2010-12-01 12:35 UTC (permalink / raw)
To: Juan Quintela
Cc: Takuya Yoshikawa, Anthony Liguori, Paolo Bonzini, qemu-devel,
kvm-devel
On 12/01/2010 03:52 AM, Juan Quintela wrote:
> > - 512GB guest is really the target?
>
> no, problems exist with smaller amounts of RAM. With a 16GB guest it is
> trivial to get 1s stalls; with a 64GB guest, 3-4s; with more memory,
> migration is flaky to say the least.
>
> > - how much cpu time can we use for these things?
>
> the problem here is that we are forced to walk the bitmap too many
> times; we want to do it fewer times.
How much time is spent walking bitmaps? Are you sure this is the problem?
> > - how many dirty pages do we have to care?
>
> default values and assuming 1Gigabit ethernet for ourselves ~9.5MB of
> dirty pages to have only 30ms of downtime.
1Gb/s * 30ms = 100 MB/s * 30 ms = 3 MB.
--
error compiling committee.c: too many arguments to function
* Re: [PATCH 09/10] Exit loop if we have been there too long
2010-12-01 12:35 ` Avi Kivity
@ 2010-12-01 13:45 ` Juan Quintela
2010-12-02 1:31 ` Takuya Yoshikawa
1 sibling, 0 replies; 17+ messages in thread
From: Juan Quintela @ 2010-12-01 13:45 UTC (permalink / raw)
To: Avi Kivity
Cc: Takuya Yoshikawa, Anthony Liguori, Paolo Bonzini, qemu-devel,
kvm-devel
Avi Kivity <avi@redhat.com> wrote:
> On 12/01/2010 03:52 AM, Juan Quintela wrote:
>> > - 512GB guest is really the target?
>>
>> no, problems exist with smaller amounts of RAM. With a 16GB guest it is
>> trivial to get 1s stalls; with a 64GB guest, 3-4s; with more memory,
>> migration is flaky to say the least.
>>
>> > - how much cpu time can we use for these things?
>>
>> the problem here is that we are forced to walk the bitmap too many
>> times; we want to do it fewer times.
>
> How much time is spent walking bitmaps? Are you sure this is the problem?
See my 10/10 patch; that one makes ram_save_live() move from 80% of the
time in the profile to second place with ~5% or so.
The important bit is:
static uint64_t ram_save_remaining(void)
{
-    RAMBlock *block;
-    uint64_t count = 0;
-
-    QLIST_FOREACH(block, &ram_list.blocks, next) {
-        ram_addr_t addr;
-        for (addr = block->offset; addr < block->offset + block->length;
-             addr += TARGET_PAGE_SIZE) {
-            if (cpu_physical_memory_get_dirty(addr, MIGRATION_DIRTY_FLAG)) {
-                count++;
-            }
-        }
-    }
-
-    return count;
+    return ram_list.dirty_pages;
}
We don't need to walk the whole bitmap to see how much memory is remaining.
Syncing the bitmap is also expensive, but just now we have other issues
that hide it. That is the reason I didn't start trying to get a better
interface with kvm; first I will try to remove the bigger bottlenecks.
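The ram_list.dirty_pages counter the hunk relies on could be kept in step with the bitmap roughly like this; names and layout here are an illustrative sketch, not the actual QEMU code:

```c
#include <assert.h>
#include <stdint.h>

#define MIGRATION_DIRTY_FLAG 0x08

typedef struct {
    uint8_t flags[64];       /* one byte of dirty flags per page */
    uint64_t dirty_pages;    /* pages with MIGRATION_DIRTY_FLAG set */
} RamState;

/* Count only 0->1 transitions, so the counter never double-counts a
 * page that is dirtied twice between syncs. */
static void mark_dirty(RamState *s, unsigned long pfn)
{
    if (!(s->flags[pfn] & MIGRATION_DIRTY_FLAG)) {
        s->dirty_pages++;
    }
    s->flags[pfn] |= MIGRATION_DIRTY_FLAG;
}

static void mark_clean(RamState *s, unsigned long pfn)
{
    if (s->flags[pfn] & MIGRATION_DIRTY_FLAG) {
        s->dirty_pages--;
    }
    s->flags[pfn] &= ~MIGRATION_DIRTY_FLAG;
}

/* The query then becomes O(1) instead of a full bitmap walk. */
static uint64_t ram_save_remaining_sketch(const RamState *s)
{
    return s->dirty_pages;
}
```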
>> > - how many dirty pages do we have to care?
>>
>> default values and assuming 1Gigabit ethernet for ourselves ~9.5MB of
>> dirty pages to have only 30ms of downtime.
>
> 1Gb/s * 30ms = 100 MB/s * 30 ms = 3 MB.
I will learn to make math with time and bytes at some point O:-)
Later, Juan.
* Re: [PATCH 09/10] Exit loop if we have been there too long
2010-12-01 12:35 ` Avi Kivity
2010-12-01 13:45 ` Juan Quintela
@ 2010-12-02 1:31 ` Takuya Yoshikawa
2010-12-02 8:37 ` Avi Kivity
1 sibling, 1 reply; 17+ messages in thread
From: Takuya Yoshikawa @ 2010-12-02 1:31 UTC (permalink / raw)
To: Avi Kivity
Cc: Juan Quintela, Anthony Liguori, Paolo Bonzini, qemu-devel,
kvm-devel
Thanks for the answers Avi, Juan,
Some FYI, (not about the bottleneck)
On Wed, 01 Dec 2010 14:35:57 +0200
Avi Kivity <avi@redhat.com> wrote:
> > > - how many dirty pages do we have to care?
> >
> > default values and assuming 1Gigabit ethernet for ourselves ~9.5MB of
> > dirty pages to have only 30ms of downtime.
>
> 1Gb/s * 30ms = 100 MB/s * 30 ms = 3 MB.
>
3MB / 4KB/page = 750 pages.
Then, KVM side processing is near the theoretical goal!
In my framebuffer test, I tested the
nr_dirty_pages/npages = 576/4096
case with a rate of 20 updates/s (1 update/50ms).
Using rmap optimization, write protection only took 46,718 TSC cycles.
Bitmap copy was not a problem, of course.
The display was working anyway at this rate!
My guess is that, within 1,000 dirty pages, kvm_vm_ioctl_get_dirty_log()
can be processed within 200us or so even for a large RAM slot.
- rmap optimization depends mainly on nr_dirty_pages, not npages.
Avi, can you guess the property of O(1) write protection?
I want to test rmap optimization taking these issues into account.
Of course, Kemari has to continue synchronization, and maybe see
more dirty pages. This will be a future task!
Thanks,
Takuya
* Re: [PATCH 09/10] Exit loop if we have been there too long
2010-12-02 1:31 ` Takuya Yoshikawa
@ 2010-12-02 8:37 ` Avi Kivity
0 siblings, 0 replies; 17+ messages in thread
From: Avi Kivity @ 2010-12-02 8:37 UTC (permalink / raw)
To: Takuya Yoshikawa
Cc: Juan Quintela, Anthony Liguori, Paolo Bonzini, qemu-devel,
kvm-devel
On 12/02/2010 03:31 AM, Takuya Yoshikawa wrote:
> Thanks for the answers Avi, Juan,
>
> Some FYI, (not about the bottleneck)
>
> On Wed, 01 Dec 2010 14:35:57 +0200
> Avi Kivity<avi@redhat.com> wrote:
>
> > > > - how many dirty pages do we have to care?
> > >
> > > default values and assuming 1Gigabit ethernet for ourselves ~9.5MB of
> > > dirty pages to have only 30ms of downtime.
> >
> > 1Gb/s * 30ms = 100 MB/s * 30 ms = 3 MB.
> >
>
> 3MB / 4KB/page = 750 pages.
>
> Then, KVM side processing is near the theoretical goal!
>
> In my framebuffer test, I tested
>
> nr_dirty_pages/npages = 576/4096
>
> case with the rate of 20 updates/s (1updates/50ms).
>
> Using rmap optimization, write protection only took 46,718 TSC cycles.
Yes, using rmap to drive write protection with sparse dirty bitmaps
really helps.
> Bitmap copy was not a problem of course.
>
> The display was working anyway at this rate!
>
>
> My guess is that, within 1,000 dirty pages, kvm_vm_ioctl_get_dirty_log()
> can be processed within 200us or so even for a large RAM slot.
> - rmap optimization depends mainly on nr_dirty_pages, not npages.
>
> Avi, can you guess the property of O(1) write protection?
> I want to test rmap optimization taking these issues into account.
I think we should use O(1) write protection only if there is a large
number of dirty pages. With a small number, using rmap guided by the
previous dirty bitmap is faster.
So, under normal operation where only the framebuffer is logged, we'd
use rmap write protection; when enabling logging for live migration we'd
use O(1) write protection. After a few iterations, when the number of
dirty pages drops, we switch back to rmap write protection.
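The switching policy above can be sketched as a one-liner; the threshold and names are assumptions for illustration, and tuning the crossover point would need measurement:

```c
#include <assert.h>
#include <stddef.h>

enum wp_strategy { WP_RMAP, WP_O1 };

/* Pick the write-protection strategy from the previous pass's dirty
 * count: many dirty pages favor bulk O(1) protection, while a sparse
 * dirty set favors rmap walks guided by the old bitmap, which touch
 * far less state. */
static enum wp_strategy choose_wp(size_t prev_dirty_pages,
                                  size_t threshold)
{
    return prev_dirty_pages > threshold ? WP_O1 : WP_RMAP;
}
```

Framebuffer logging (hundreds of dirty pages) would land on the rmap side; the first migration passes (nearly all of RAM dirty) on the O(1) side.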
> Of course, Kemari has to continue synchronization, and maybe see
> more dirty pages. This will be a future task!
>
There's yet another option, of using dirty bits instead of write
protection. Or maybe using write protection in the upper page tables
and dirty bits in the lowest level.
--
error compiling committee.c: too many arguments to function