kvm.vger.kernel.org archive mirror
From: "Radim Krčmář" <rkrcmar@redhat.com>
To: "Cao, Lei" <Lei.Cao@stratus.com>
Cc: "Huang, Kai" <kai.huang@linux.intel.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>
Subject: Re: [PATCH 3/6] KVM: Dirty memory tracking for performant checkpointing and improved live migration
Date: Wed, 4 May 2016 21:27:10 +0200
Message-ID: <20160504192709.GH30059@potion>
In-Reply-To: <BL2PR08MB481F421C9AE82E4B4EDFC35F07B0@BL2PR08MB481.namprd08.prod.outlook.com>

2016-05-04 17:15+0000, Cao, Lei:
> On 5/4/2016 9:13 AM, Radim Krčmář wrote:
>> Good designs so far seem to be:
>>  memslot -> lockless radix tree
>> and
>>  vcpu -> memslot -> list  (memslot -> vcpu -> list)
>>
> 
> There is no need for lookup, the dirty log is fetched in sequence, so why use
> radix tree with added complexity but no benefit?
> 
> List can be designed to be lockless, so memslot -> lockless fixed list?

It can, but a lockless list for concurrent writers is harder than a
lockless list for a concurrent writer and reader.
The difference is in starvation -- it's possible that a VCPU would never
get to write an entry unless you implemented a queueing mechanism.
A queueing mechanism means that you basically have a spinlock, so I
wouldn't bother with a lockless list and would just try a spinlock directly.

A spinlock with very short critical section might actually work well for
< 256 VCPU and is definitely the easiest option.  Worth experimenting
with, IMO.
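
To illustrate the spinlock option, here is a minimal userspace sketch, not
code from the patch series. It assumes a fixed-capacity list of guest frame
numbers guarded by a ticket lock, which is the "queueing mechanism" form of a
spinlock: writers are served in arrival order, so none of them starves. All
names (dirty_list, dirty_list_push, DIRTY_LIST_CAP) and the capacity are
hypothetical, and C11 atomics stand in for the kernel's spinlock_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DIRTY_LIST_CAP 1024	/* hypothetical capacity, chosen for the example */

struct dirty_list {
	atomic_uint next;	/* next ticket to hand out */
	atomic_uint owner;	/* ticket currently allowed in */
	size_t count;		/* protected by the ticket lock */
	uint64_t gfn[DIRTY_LIST_CAP];
};

static void dirty_list_init(struct dirty_list *l)
{
	atomic_init(&l->next, 0);
	atomic_init(&l->owner, 0);
	l->count = 0;
}

/*
 * Append one dirty gfn.  The critical section is a bounds check and a
 * store, so it stays very short.  Returns false when the list is full;
 * the caller then has to grow the log or wait for the reader.
 */
static bool dirty_list_push(struct dirty_list *l, uint64_t gfn)
{
	bool ok = false;
	unsigned int me;

	assert(l != NULL);

	/* Take a ticket and spin until it is our turn (FIFO order). */
	me = atomic_fetch_add_explicit(&l->next, 1, memory_order_relaxed);
	while (atomic_load_explicit(&l->owner, memory_order_acquire) != me)
		;

	if (l->count < DIRTY_LIST_CAP) {
		l->gfn[l->count++] = gfn;
		ok = true;
	}

	/* Pass the lock to the next ticket holder. */
	atomic_fetch_add_explicit(&l->owner, 1, memory_order_release);
	return ok;
}
```

Whether the contention on the two counters is acceptable for many VCPUs is
exactly what would have to be measured.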

A lockless radix tree doesn't starve.  Every entry has a well defined
place in the tree.  The entry just might not be fully allocated yet.
If another VCPU is faster and expands the tree, then other VCPUs use
that extended tree until they all get to their leaf nodes; VCPUs
basically cooperate on growing the tree.
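
The cooperative growth can be sketched as follows. This is an illustrative
userspace example under assumptions of my own, not the proposed KVM code: a
fixed two-level tree, leaves that are small bitmaps, and missing leaves
installed with a single compare-and-swap. A writer that loses the race frees
its own allocation and continues with the winner's leaf, so each writer does
a bounded amount of work. The names (dirty_tree, dirty_tree_mark,
dirty_tree_test) and the sizes are hypothetical, and calloc() stands in for
whatever allocator the kernel would use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define LEAF_BITS	12			/* 4096 pages per leaf (assumed) */
#define LEAF_WORDS	((1u << LEAF_BITS) / 64)
#define ROOT_SLOTS	512			/* assumed root fan-out */

struct leaf {
	_Atomic uint64_t bits[LEAF_WORDS];	/* one bit per page */
};

struct dirty_tree {
	_Atomic(struct leaf *) slot[ROOT_SLOTS];
};

/* Find the leaf for gfn, installing it if nobody has done so yet. */
static struct leaf *get_leaf(struct dirty_tree *t, uint64_t gfn)
{
	size_t idx = gfn >> LEAF_BITS;
	struct leaf *l, *fresh, *expected = NULL;

	assert(idx < ROOT_SLOTS);

	l = atomic_load_explicit(&t->slot[idx], memory_order_acquire);
	if (l)
		return l;

	/* Zeroed memory is assumed to be a valid "all clean" leaf. */
	fresh = calloc(1, sizeof(*fresh));
	if (!fresh)
		return NULL;

	if (atomic_compare_exchange_strong_explicit(&t->slot[idx], &expected,
			fresh, memory_order_acq_rel, memory_order_acquire))
		return fresh;

	/* Another writer grew the tree first: use its leaf instead. */
	free(fresh);
	return expected;
}

/* Mark one page dirty.  Returns false only if allocation failed. */
static bool dirty_tree_mark(struct dirty_tree *t, uint64_t gfn)
{
	struct leaf *l = get_leaf(t, gfn);
	uint64_t off = gfn & ((1u << LEAF_BITS) - 1);

	if (!l)
		return false;
	atomic_fetch_or_explicit(&l->bits[off / 64],
				 UINT64_C(1) << (off % 64),
				 memory_order_relaxed);
	return true;
}

/* Report whether a page is marked dirty; absent leaves count as clean. */
static bool dirty_tree_test(struct dirty_tree *t, uint64_t gfn)
{
	size_t idx = gfn >> LEAF_BITS;
	uint64_t off = gfn & ((1u << LEAF_BITS) - 1);
	struct leaf *l;

	assert(idx < ROOT_SLOTS);
	l = atomic_load_explicit(&t->slot[idx], memory_order_acquire);
	if (!l)
		return false;
	return atomic_load_explicit(&l->bits[off / 64], memory_order_relaxed)
		& (UINT64_C(1) << (off % 64));
}
```

A zero-initialized struct dirty_tree (for example one with static storage) is
an empty tree. The only dynamic step is the allocation, which is why
preallocating the leaves removes the last source of unpredictability.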

And I completely forgot that we can preallocate the whole tree and use an
efficient packed storage thanks to that.  My first guess is that it
would make sense with double the memory of our bitmap.  Scans and
insertion would be slower than for a per-vcpu list, but much faster than
with a dynamically allocated structure.  I'll think a bit about that.

The main reason why I'd like something that can contain all dirty pages
is overflow -- the userspace has to treat *all* pages as dirty if we
lose a dirty page, so overflow must never happen -- we have to either
grow the dirty log or suspend the writer until userspace frees space ...
