From mboxrd@z Thu Jan  1 00:00:00 1970
From: Radim Krčmář <rkrcmar@redhat.com>
Subject: Re: [PATCH 3/6] KVM: Dirty memory tracking for performant
 checkpointing and improved live migration
Date: Wed, 4 May 2016 21:27:10 +0200
Message-ID: <20160504192709.GH30059@potion>
References: <BL2PR08MB4812F929A2760BC40EA757CF0630@BL2PR08MB481.namprd08.prod.outlook.com>
 <33d8668e-2bba-af91-069e-6452609a6ff0@linux.intel.com>
 <BL2PR08MB4818EC8F767DEB112204FE4F0650@BL2PR08MB481.namprd08.prod.outlook.com>
 <20160429181911.GA2687@potion>
 <BL2PR08MB4811CE322D58EBBDCFA6EFBF0790@BL2PR08MB481.namprd08.prod.outlook.com>
 <b467c2c5-b680-692a-b278-578a911dd674@linux.intel.com>
 <20160503141118.GA27975@potion>
 <32d8060e-648c-cf99-970a-3ddadc6a501a@linux.intel.com>
 <20160504131314.GA27590@potion>
 <BL2PR08MB481F421C9AE82E4B4EDFC35F07B0@BL2PR08MB481.namprd08.prod.outlook.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: "Huang, Kai" <kai.huang@linux.intel.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>
To: "Cao, Lei" <Lei.Cao@stratus.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mx1.redhat.com ([209.132.183.28]:41571 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751333AbcEDT1N (ORCPT <rfc822;kvm@vger.kernel.org>);
	Wed, 4 May 2016 15:27:13 -0400
Content-Disposition: inline
In-Reply-To: <BL2PR08MB481F421C9AE82E4B4EDFC35F07B0@BL2PR08MB481.namprd08.prod.outlook.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

2016-05-04 17:15+0000, Cao, Lei:
> On 5/4/2016 9:13 AM, Radim Krčmář wrote:
>> Good designs so far seem to be:
>>  memslot -> lockless radix tree
>> and
>>  vcpu -> memslot -> list  (memslot -> vcpu -> list)
>>
>
> There is no need for lookup, the dirty log is fetched in sequence, so why use
> radix tree with added complexity but no benefit?
>
> List can be designed to be lockless, so memslot -> lockless fixed list?

It can, but a lockless list for concurrent writers is harder than a
lockless list for one writer and one concurrent reader.
The difference is in starvation -- it's possible that VCPU would never
get to write an entry unless you implemented a queueing mechanism.
A queueing mechanism means that you basically have a spinlock, so I
wouldn't bother with a lockless list and just try spinlock directly.

A spinlock with a very short critical section might actually work well for
< 256 VCPUs and is definitely the easiest option.  Worth experimenting
with, IMO.
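
A minimal userspace sketch of the spinlock variant, to show how short the
critical section can be.  Everything named here (struct dirty_list,
dirty_list_push, DIRTY_LIST_CAP) is hypothetical and not taken from the
patch series, and the lock is a plain C11 test-and-set flag rather than
the kernel's spinlock_t, so it gives no fairness guarantee on its own.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DIRTY_LIST_CAP 8	/* hypothetical capacity, for illustration */

/* One fixed-size list per memslot, shared by all writers (VCPUs). */
struct dirty_list {
	atomic_flag lock;	/* stand-in for a kernel spinlock */
	size_t count;
	uint64_t gfn[DIRTY_LIST_CAP];
};

static void dirty_list_init(struct dirty_list *l)
{
	atomic_flag_clear(&l->lock);
	l->count = 0;
}

/* Append one gfn; returns false when the list is full (overflow). */
static bool dirty_list_push(struct dirty_list *l, uint64_t gfn)
{
	bool ok;

	/* Spin until we own the flag. */
	while (atomic_flag_test_and_set_explicit(&l->lock,
						 memory_order_acquire))
		;

	/* Critical section: one bounds check and one store. */
	assert(l->count <= DIRTY_LIST_CAP);
	ok = l->count < DIRTY_LIST_CAP;
	if (ok)
		l->gfn[l->count++] = gfn;

	atomic_flag_clear_explicit(&l->lock, memory_order_release);
	return ok;
}
```

The false return is the overflow case: the caller still has to decide
whether to grow the list or stall the writer.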

Lockless radix tree doesn't starve.  Every entry has a well defined
place in the tree.  The entry just might not be fully allocated yet.
If another VCPU is faster and expands the tree, then other VCPUs use
that extended tree until they all get to their leaf nodes; VCPUs
basically cooperate on growing the tree.
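
The cooperative growth can be sketched in userspace C11 as a two-level
tree where a missing leaf is installed with compare-and-swap.  This is
only an illustration under my own assumptions: the names (struct root,
struct leaf, get_leaf, mark_dirty) and the sizes are hypothetical, and a
real kernel version would need its own allocation and freeing rules.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

#define LEAF_BITS  6
#define LEAF_SLOTS (1u << LEAF_BITS)	/* pages per leaf */
#define ROOT_SLOTS 64			/* leaves per tree */
#define MAX_GFN    ((uint64_t)ROOT_SLOTS * LEAF_SLOTS)

struct leaf {
	_Atomic unsigned char dirty[LEAF_SLOTS];
};

struct root {
	_Atomic(struct leaf *) child[ROOT_SLOTS];
};

/* Return the leaf covering gfn, allocating it if nobody has yet. */
static struct leaf *get_leaf(struct root *r, uint64_t gfn)
{
	_Atomic(struct leaf *) *slot;
	struct leaf *cur, *fresh, *expected = NULL;

	assert(gfn < MAX_GFN);
	slot = &r->child[gfn >> LEAF_BITS];

	cur = atomic_load(slot);
	if (cur)
		return cur;	/* someone already grew the tree here */

	fresh = calloc(1, sizeof(*fresh));
	if (!fresh)
		return NULL;

	/* Try to publish our leaf; exactly one racer wins. */
	if (atomic_compare_exchange_strong(slot, &expected, fresh))
		return fresh;

	/* Lost the race: drop ours and use the winner's leaf. */
	free(fresh);
	return expected;
}

/* Mark one page dirty; returns 0 on success, -1 on failure. */
static int mark_dirty(struct root *r, uint64_t gfn)
{
	struct leaf *l;

	if (gfn >= MAX_GFN)
		return -1;
	l = get_leaf(r, gfn);
	if (!l)
		return -1;
	atomic_store(&l->dirty[gfn & (LEAF_SLOTS - 1)], 1);
	return 0;
}
```

A writer that loses the compare-and-swap never retries; it just
continues with the node the winner installed, which is why no writer can
be starved by the others.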

And I completely forgot that we can preallocate the whole tree and use an
efficient packed storage thanks to that.  My first guess is that it
would make sense with double the memory of our bitmap.  Scans and
insertion would be slower than for a per-vcpu list, but much faster than
with a dynamically allocated structure.  I'll think a bit about that.
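
One way to picture a preallocated, packed structure is a two-level
bitmap: a leaf bitmap with one bit per page plus a summary bitmap with
one bit per leaf word.  This layout is my own assumption for
illustration, not necessarily the structure meant above, and its memory
overhead differs from the "double the bitmap" guess.  The names
(struct dirty_tree, tree_mark, tree_harvest) and NPAGES are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NPAGES    4096			/* hypothetical memslot size */
#define WORD_BITS 64
#define NLEAF     (NPAGES / WORD_BITS)
#define NSUMMARY  ((NLEAF + WORD_BITS - 1) / WORD_BITS)

/* Fully preallocated: every page has a fixed bit, so no overflow. */
struct dirty_tree {
	_Atomic uint64_t summary[NSUMMARY];	/* bit set => leaf word dirty */
	_Atomic uint64_t leaf[NLEAF];		/* bit set => page dirty */
};

/* Writer side: lockless, two atomic ORs, leaf first, then summary. */
static void tree_mark(struct dirty_tree *t, uint64_t gfn)
{
	size_t w;

	assert(gfn < NPAGES);
	w = gfn / WORD_BITS;
	atomic_fetch_or(&t->leaf[w], UINT64_C(1) << (gfn % WORD_BITS));
	atomic_fetch_or(&t->summary[w / WORD_BITS],
			UINT64_C(1) << (w % WORD_BITS));
}

/*
 * Reader side: collect and clear all dirty gfns in ascending order.
 * 'out' must have room for NPAGES entries.  Returns the count.
 */
static size_t tree_harvest(struct dirty_tree *t, uint64_t *out)
{
	size_t n = 0;

	for (size_t s = 0; s < NSUMMARY; s++) {
		uint64_t sum = atomic_exchange(&t->summary[s], 0);

		for (unsigned int i = 0; i < WORD_BITS; i++) {
			size_t w = s * WORD_BITS + i;
			uint64_t bits;

			if (!((sum >> i) & 1))
				continue;	/* skip clean leaf words */
			bits = atomic_exchange(&t->leaf[w], 0);
			for (unsigned int b = 0; b < WORD_BITS; b++)
				if ((bits >> b) & 1)
					out[n++] = (uint64_t)w * WORD_BITS + b;
		}
	}
	return n;
}
```

Because the writer sets the leaf bit before the summary bit, a page
marked during a harvest is either picked up by that harvest or left
visible for the next one; the worst case is a summary bit that points at
an already-emptied leaf word.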

The main reason why I'd like something that can contain all dirty pages
is overflow -- the userspace has to treat *all* pages as dirty if we
lose a dirty page, so overflow must never happen -- we have to either
grow the dirty log or suspend the writer until userspace frees space ...