From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Huang, Kai" <kai.huang@linux.intel.com>
Subject: Re: [PATCH 3/6] KVM: Dirty memory tracking for performant
 checkpointing and improved live migration
Date: Wed, 4 May 2016 19:45:08 +1200
Message-ID: <32d8060e-648c-cf99-970a-3ddadc6a501a@linux.intel.com>
References: <201604261855.u3QItn85024244@dev1.sn.stratus.com>
 <BL2PR08MB4812F929A2760BC40EA757CF0630@BL2PR08MB481.namprd08.prod.outlook.com>
 <33d8668e-2bba-af91-069e-6452609a6ff0@linux.intel.com>
 <BL2PR08MB4818EC8F767DEB112204FE4F0650@BL2PR08MB481.namprd08.prod.outlook.com>
 <20160429181911.GA2687@potion>
 <BL2PR08MB4811CE322D58EBBDCFA6EFBF0790@BL2PR08MB481.namprd08.prod.outlook.com>
 <b467c2c5-b680-692a-b278-578a911dd674@linux.intel.com>
 <20160503141118.GA27975@potion>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8;
	format=flowed
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: "Cao, Lei" <Lei.Cao@stratus.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>
To: =?UTF-8?B?UmFkaW0gS3LEjW3DocWZ?= <rkrcmar@redhat.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mga14.intel.com ([192.55.52.115]:8883 "EHLO mga14.intel.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751256AbcEDHpP (ORCPT <rfc822;kvm@vger.kernel.org>);
	Wed, 4 May 2016 03:45:15 -0400
In-Reply-To: <20160503141118.GA27975@potion>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>


On 5/4/2016 2:11 AM, Radim Krčmář wrote:
> 2016-05-03 18:06+1200, Huang, Kai:
>> Actually my concern is, with your new mechanism to track guest dirty pages,
>> there will be two logdirty mechanisms (using bitmap and your per-vcpu list),
>> which I think is not good as it's a little bit redundant, given both
>> mechanisms are used for dirty page logging.
>>
>> I think your main concern of current bitmap mechanism is scanning bitmap
>> takes lots of time, especially when only few pages get dirty, you still have
>> to scan the entire bitmap, which results in bad performance if you runs
>> checkpoint very frequently. My suggestion is, instead of introducing two
>> logdirty data structures, maybe you can try to use another more efficient
>> data structure instead of bitmap for both current logdirty mechanism and
>> your new interfaces. Maybe Xen's log-dirty tree is a good reference.
>
> A sparse structure (buffer, tree, ...) also needs a mechanism to grow
> (store new entries), so concurrent accesses become a problem, because
> there has to be synchronization.  I think that per-vcpu structure
> becomes mandatory when thousands VCPUs dirty memory at the same time.

Yes, synchronization will be needed. But even with a per-vcpu structure, we
still need a per-vcpu lock to access, say, gfn_list, right? For example,
a userspace thread trying to get and clear dirty pages would need to
loop over all vcpus and acquire each vcpu's lock for gfn_list (see
function mt_reset_all_gfns in patch 3/6). It looks like this is not
scalable either?
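
To make the concern concrete, here is a rough userspace sketch of the
pattern I mean. It is not the code from patch 3/6: the names
(vcpu_dirty_list, vcpu_log_dirty, collect_and_reset_all), the sizes, and
the use of pthread mutexes in place of the per-vcpu lock are all my own
assumptions for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define NR_VCPUS   4	/* hypothetical, for illustration only */
#define LIST_SLOTS 64	/* hypothetical per-vcpu list capacity */

/* Hypothetical per-vcpu dirty list, loosely modelled on the gfn_list idea. */
struct vcpu_dirty_list {
	pthread_mutex_t lock;
	size_t count;
	uint64_t gfns[LIST_SLOTS];
};

static struct vcpu_dirty_list lists[NR_VCPUS];

static void dirty_lists_init(void)
{
	for (int i = 0; i < NR_VCPUS; i++) {
		pthread_mutex_init(&lists[i].lock, NULL);
		lists[i].count = 0;
	}
}

/* Writer side: a vcpu logs a gfn and contends only on its own lock. */
static int vcpu_log_dirty(int vcpu, uint64_t gfn)
{
	int ret = -1;	/* -1: list full, the caller needs a fallback */

	assert(vcpu >= 0 && vcpu < NR_VCPUS);
	pthread_mutex_lock(&lists[vcpu].lock);
	if (lists[vcpu].count < LIST_SLOTS) {
		lists[vcpu].gfns[lists[vcpu].count++] = gfn;
		ret = 0;
	}
	pthread_mutex_unlock(&lists[vcpu].lock);
	return ret;
}

/*
 * Reader side: the get-and-clear path has to take every vcpu's lock in
 * turn, so its cost grows with the number of vcpus even when most lists
 * are empty. 'out' must have room for NR_VCPUS * LIST_SLOTS entries.
 */
static size_t collect_and_reset_all(uint64_t *out, size_t max)
{
	size_t n = 0;

	assert(max >= (size_t)NR_VCPUS * LIST_SLOTS);
	for (int i = 0; i < NR_VCPUS; i++) {
		pthread_mutex_lock(&lists[i].lock);
		for (size_t j = 0; j < lists[i].count; j++)
			out[n++] = lists[i].gfns[j];
		lists[i].count = 0;
		pthread_mutex_unlock(&lists[i].lock);
	}
	return n;
}
```

The writers do not contend with each other here, but the collecting
thread still serializes against each vcpu once per pass, which is the
part I am worried about.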

>
>>                      Maybe Xen's log-dirty tree is a good reference.
>
> Is there some top-level overview?
>
> From a glance at the code, it looked like GPA bitmap sparsified with
> radix tree in a manner similar to the page table hierarchy.

Yes, it is just a radix tree. The point is that the tree will be pretty
small if there are few dirty pages, so scanning it will be much quicker
than scanning the bitmap.
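
As a rough illustration of why the scan gets cheaper, below is a
simplified two-level sketch: a summary bitmap on top of the leaf bitmap.
This is not Xen's actual log-dirty tree and not KVM code; the structure,
the names (two_level_log, scan_flat, scan_two_level) and the 65536-page
slot size are assumptions I made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_PAGES   (1u << 16)			/* hypothetical slot size */
#define WORD_BITS  64u
#define NR_WORDS   (NR_PAGES / WORD_BITS)	/* 1024 leaf words */
#define NR_SUMMARY (NR_WORDS / WORD_BITS)	/* 16 summary words */

struct two_level_log {
	uint64_t summary[NR_SUMMARY];	/* bit w set => leaf[w] is non-zero */
	uint64_t leaf[NR_WORDS];	/* one bit per page */
};

static void log_mark_dirty(struct two_level_log *log, uint32_t gfn)
{
	uint32_t w = gfn / WORD_BITS;

	assert(gfn < NR_PAGES);
	log->leaf[w] |= UINT64_C(1) << (gfn % WORD_BITS);
	log->summary[w / WORD_BITS] |= UINT64_C(1) << (w % WORD_BITS);
}

static size_t count_bits(uint64_t v)
{
	size_t n = 0;

	for (; v; v &= v - 1)	/* clear the lowest set bit each round */
		n++;
	return n;
}

/* Flat scan: reads every leaf word. Returns the number of words read. */
static size_t scan_flat(const struct two_level_log *log, size_t *dirty)
{
	*dirty = 0;
	for (size_t w = 0; w < NR_WORDS; w++)
		*dirty += count_bits(log->leaf[w]);
	return NR_WORDS;
}

/* Two-level scan: reads the summary, then only the flagged leaf words. */
static size_t scan_two_level(const struct two_level_log *log, size_t *dirty)
{
	size_t reads = 0;

	*dirty = 0;
	for (size_t s = 0; s < NR_SUMMARY; s++) {
		uint64_t sv = log->summary[s];

		reads++;			/* one summary word */
		for (unsigned b = 0; sv; b++, sv >>= 1) {
			if (!(sv & 1))
				continue;
			reads++;		/* one flagged leaf word */
			*dirty += count_bits(log->leaf[s * WORD_BITS + b]);
		}
	}
	return reads;
}
```

With three dirty pages that fall into three different leaf words, the
flat scan reads all 1024 words, while the two-level scan reads 19 (16
summary words plus 3 leaf words). A real radix tree adds more levels and
allocates nodes on demand, but the scan cost follows the same idea.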

>
>> Of course this is just my concern and I'll leave it to maintainers.
>
> I too would prefer if both userspace interfaces used a common backend.
> A possible backend for that is
>
>   vcpu -> memslot -> sparse dirty log

I think this is the most reasonable proposal, at least as a first
step to improve performance.

>
> We should have dynamic sparse dirty log, to avoid wasting memory when
> there are many small memslots, but a linear structure is probably still
> fine.

The sparse dirty log structure can be allocated when necessary, so I
don't think it will waste memory. Take a radix tree as an example: if
there's no dirty page in the slot, the pointer to the radix tree can be
NULL, or just a root entry.
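
A minimal sketch of the allocate-on-first-dirty idea, in userspace C.
Everything here is hypothetical: the memslot and sparse_log structures,
the helper names, and the 4096-page leaf size are only there to show
that a clean slot costs one NULL pointer. It is not how KVM memslots are
laid out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define LEAF_PAGES 4096u		/* hypothetical pages per leaf */
#define LEAF_WORDS (LEAF_PAGES / 64u)

struct sparse_log {
	size_t nr_leaves;
	uint64_t **leaves;	/* leaves[i] is NULL until a page in it is dirtied */
};

struct memslot {
	uint64_t npages;
	struct sparse_log *log;	/* NULL while the slot has no dirty page */
};

/* Mark a page dirty, allocating the log and the leaf only when needed. */
static int slot_mark_dirty(struct memslot *slot, uint64_t gfn_off)
{
	size_t leaf = gfn_off / LEAF_PAGES;
	uint64_t bit = gfn_off % LEAF_PAGES;

	assert(gfn_off < slot->npages);
	if (!slot->log) {
		struct sparse_log *log = calloc(1, sizeof(*log));

		if (!log)
			return -1;
		log->nr_leaves = (slot->npages + LEAF_PAGES - 1) / LEAF_PAGES;
		log->leaves = calloc(log->nr_leaves, sizeof(*log->leaves));
		if (!log->leaves) {
			free(log);
			return -1;
		}
		slot->log = log;
	}
	if (!slot->log->leaves[leaf]) {
		slot->log->leaves[leaf] = calloc(LEAF_WORDS, sizeof(uint64_t));
		if (!slot->log->leaves[leaf])
			return -1;
	}
	slot->log->leaves[leaf][bit / 64] |= UINT64_C(1) << (bit % 64);
	return 0;
}

/* Bytes currently allocated for the slot's dirty log (0 for a clean slot). */
static size_t slot_log_bytes(const struct memslot *slot)
{
	size_t bytes;

	if (!slot->log)
		return 0;
	bytes = sizeof(*slot->log) +
		slot->log->nr_leaves * sizeof(*slot->log->leaves);
	for (size_t i = 0; i < slot->log->nr_leaves; i++)
		if (slot->log->leaves[i])
			bytes += LEAF_WORDS * sizeof(uint64_t);
	return bytes;
}

/* Drop the whole log, returning the slot to the "no dirty page" state. */
static void slot_log_free(struct memslot *slot)
{
	if (!slot->log)
		return;
	for (size_t i = 0; i < slot->log->nr_leaves; i++)
		free(slot->log->leaves[i]);
	free(slot->log->leaves);
	free(slot->log);
	slot->log = NULL;
}
```

So many small, clean memslots would each carry only the NULL pointer,
and a slot with a few dirty pages pays for the leaves it actually
touches.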

>
> We don't care which vcpu dirtied the page, so it seems like a waste to
> have them in the hierarchy, but I can't think of designs where the
> sparse dirty log is rooted in memslot and its updates scale well.
>
> 'memslot -> sparse dirty log' usually evolve into buffering on the VCPU
> side before writing to the memslot or aren't efficient for sparse
> dataset.
>
> Where do you think is the balance between 'memslot -> bitmap' and
> 'vcpu -> memslot -> dirty buffer'?

In my opinion, we can first try 'memslot -> sparse dirty log'. Cao, Lei
mentioned there were two bottlenecks: the bitmap, and bad multithreaded
performance due to mmu_lock. I think 'memslot -> sparse dirty log' might
help to improve or solve the bitmap one.

>
> Thanks.