From: Avi Kivity <avi@redhat.com>
To: Paul Mackerras <paulus@samba.org>
Cc: Alexander Graf <agraf@suse.de>,
	kvm@vger.kernel.org, kvm-ppc@vger.kernel.org,
	David Gibson <david@gibson.dropbear.id.au>,
	Juan Quintela <quintela@redhat.com>,
	Orit Wasserman <owasserm@redhat.com>,
	Anthony Liguori <anthony@codemonkey.ws>
Subject: Re: [PATCH 5/5] KVM: PPC: Book3S HV: Provide a method for userspace to read and write the HPT
Date: Wed, 17 Oct 2012 10:31:54 +0000
Message-ID: <507E891A.7050800@redhat.com>
In-Reply-To: <20121016215253.GB8331@bloggs.ozlabs.ibm.com>

On 10/16/2012 11:52 PM, Paul Mackerras wrote:
> On Tue, Oct 16, 2012 at 03:06:33PM +0200, Avi Kivity wrote:
>> On 10/16/2012 01:58 PM, Paul Mackerras wrote:
>> > On Tue, Oct 16, 2012 at 12:06:58PM +0200, Avi Kivity wrote:
>> >> Does/should the fd support O_NONBLOCK and poll? (=waiting for an entry
>> >> to change).
>> > 
>> > No.
>> 
>> This forces userspace to dedicate a thread for the HPT.
> 
> Why? Reads never block in any case.

Ok.  This parallels KVM_GET_DIRTY_LOG.

>> 
>> I meant the internal data structure that holds HPT entries.
> 
> Oh, that's just an array, and userspace already knows how big it is.
> 
>> I guess I don't understand the index.  Do we expect changes to be in
>> contiguous ranges?  And invalid entries to be contiguous as well?  That
>> doesn't fit with how hash tables work.  Does the index represent the
>> position of the entry within the table, or something else?
> 
> The index is just the position in the array.  Typically, in each group
> of 8 it will tend to be the low-numbered ones that are valid, since
> creating an entry usually uses the first empty slot.  So I expect that
> on the first pass, most of the records will represent 8 HPTEs.  On
> subsequent passes, probably most records will represent a single HPTE.

So it's a form of RLE compression.  Ok.
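
To make the run-length idea concrete, here is a minimal sketch in C of how
one record could be produced from the HPT array. It is an illustration
only, not the code from the patch: the struct names, the field widths
(chosen so the header is 8 bytes and an HPTE is 16 bytes, as discussed in
this thread), the valid-bit position, and the helper encode_record() are
all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layouts: 16-byte HPTE, 8-byte record header. */
struct hpte { uint64_t v, r; };
struct rec_hdr { uint32_t index; uint16_t n_valid; uint16_t n_invalid; };

#define SKETCH_HPTE_VALID 1ULL  /* assumed position of the valid bit */

static int hpte_is_valid(const struct hpte *e)
{
    return (e->v & SKETCH_HPTE_VALID) != 0;
}

/*
 * Encode one record starting at table[start]: a run of valid entries
 * followed by a run of invalid ones.  Only the valid entries are copied
 * after the header; the invalid run is represented by its count alone.
 * Returns the number of bytes written to out (the caller must provide
 * enough room) and stores the index of the next unencoded entry in *next.
 */
static size_t encode_record(const struct hpte *table, uint32_t n,
                            uint32_t start, uint8_t *out, uint32_t *next)
{
    struct rec_hdr hdr;
    uint32_t i = start;
    size_t payload;

    assert(start < n);
    hdr.index = start;
    hdr.n_valid = 0;
    hdr.n_invalid = 0;

    while (i < n && hdr.n_valid < UINT16_MAX && hpte_is_valid(&table[i])) {
        hdr.n_valid++;
        i++;
    }
    while (i < n && hdr.n_invalid < UINT16_MAX && !hpte_is_valid(&table[i])) {
        hdr.n_invalid++;
        i++;
    }

    payload = (size_t)hdr.n_valid * sizeof(struct hpte);
    memcpy(out, &hdr, sizeof hdr);
    memcpy(out + sizeof hdr, &table[start], payload);
    *next = i;
    return sizeof hdr + payload;
}
```

With a group of 8 where the first 3 slots are valid, this sketch emits a
single record covering all 8 entries: an 8-byte header plus 3 * 16 bytes
of data.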

>> 
>> 16MiB is transferred in ~0.15 sec on GbE, much faster with 10GbE.  Does
>> it warrant a live migration protocol?
> 
> The qemu people I talked to seemed to think so.
> 
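
For reference, the transfer-time figure quoted above can be checked with a
few lines of C. This is only the raw link-rate arithmetic; the helper name
transfer_seconds() is made up for the illustration, and protocol overhead
is ignored, which is why the result comes out slightly under the ~0.15 sec
estimate.

```c
/* Time to push `bytes` over a link of `link_bits_per_sec`, ignoring
 * framing and protocol overhead (an idealised lower bound). */
static double transfer_seconds(double bytes, double link_bits_per_sec)
{
    return bytes * 8.0 / link_bits_per_sec;
}
```

For a 16MiB table, transfer_seconds(16.0 * 1024 * 1024, 1e9) is about
0.134 sec on GbE, and a tenth of that at 10GbE.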
>> > Because it is a hash table, updates tend to be scattered throughout
>> > the whole table, which is another reason why per-page dirty tracking
>> > and updates would be pretty inefficient.
>> 
>> This suggests a stream format that includes the index in every entry.
> 
> That would amount to dropping the n_valid and n_invalid fields from
> the current header format.  That would be less efficient for the
> initial pass (assuming we achieve an average n_valid of at least 2 on
> the initial pass), and probably less efficient for the incremental
> updates, since a newly-invalidated entry would have to be represented
> as 16 zero bytes rather than just an 8-byte header with n_valid=0 and
> n_invalid=1.  I'm assuming here that the initial pass would omit
> invalid entries.

I agree.  But let's have some measurements to make sure.
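
Until there are measurements, the size argument can at least be sketched
on paper. The C below compares the two encodings under one loudly stated
assumption: that the index-per-entry alternative would carry a 4-byte
index with each 16-byte HPTE. The constant and function names are made up
for this illustration.

```c
#include <stddef.h>

enum {
    HPTE_BYTES  = 16, /* size of one HPTE, from the thread */
    HDR_BYTES   = 8,  /* size of the current record header, from the thread */
    INDEX_BYTES = 4   /* ASSUMED per-entry index size in the alternative */
};

/* Current format: one header per run, data only for the valid entries. */
static size_t run_format_bytes(size_t n_valid)
{
    return HDR_BYTES + n_valid * HPTE_BYTES;
}

/* Alternative format: every entry sent, valid or newly invalidated,
 * carries its own index plus a full 16-byte HPTE image. */
static size_t indexed_format_bytes(size_t n_entries)
{
    return n_entries * (INDEX_BYTES + HPTE_BYTES);
}
```

Under that assumption the two formats tie at n_valid = 2 (40 bytes each)
and the run format wins from 3 upwards, which is consistent with the
"average n_valid of at least 2" condition quoted above. A single
invalidation costs run_format_bytes(0) = 8 bytes versus
indexed_format_bytes(1) = 20 bytes.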

> 
>> > 
>> > As for the change rate, it depends on the application of course, but
>> > basically every time the guest changes a PTE in its Linux page tables
>> > we do the corresponding change to the corresponding HPT entry, so the
>> > rate can be quite high.  Workloads that do a lot of fork, exit, mmap,
>> > exec, etc. have a high rate of HPT updates.
>> 
>> If the rate is high enough, then there's no point in a live update.
> 
> True, but doesn't that argument apply to memory pages as well?

In some cases it does.  The question is what happens in practice.  If
you migrate a kernel build, how many entries are sent in the guest
stopped phase?


-- 
error compiling committee.c: too many arguments to function
