Re: [PATCH 1/6] KVM: Cache the least recently used slot index per vCPU

All of lore.kernel.org
 help / color / mirror / Atom feed

From: David Matlack <dmatlack@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org, Ben Gardon <bgardon@google.com>,
	Joerg Roedel <joro@8bytes.org>, Jim Mattson <jmattson@google.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Sean Christopherson <seanjc@google.com>,
	Junaid Shahid <junaids@google.com>,
	Andrew Jones <drjones@redhat.com>
Subject: Re: [PATCH 1/6] KVM: Cache the least recently used slot index per vCPU
Date: Mon, 2 Aug 2021 16:27:47 +0000	[thread overview]
Message-ID: <YQgdA6Blu4vYToLM@google.com> (raw)
In-Reply-To: <b87b9f52-b763-856f-16f0-ecb668ba22c1@redhat.com>

On Mon, Aug 02, 2021 at 04:36:24PM +0200, Paolo Bonzini wrote:
> On 31/07/21 00:37, David Matlack wrote:
> > The memslot for a given gfn is looked up multiple times during page
> > fault handling. Avoid binary searching for it multiple times by caching
> > the least recently used slot. There is an existing VM-wide LRU slot but
> > that does not work well for cases where vCPUs are accessing memory in
> > different slots (see performance data below).
> > 
> > Another benefit of caching the least recently use slot (versus looking
> > up the slot once and passing around a pointer) is speeding up memslot
> > lookups *across* faults and during spte prefetching.
> > 
> > To measure the performance of this change I ran dirty_log_perf_test with
> > 64 vCPUs and 64 memslots and measured "Populate memory time" and
> > "Iteration 2 dirty memory time".  Tests were ran with eptad=N to force
> > dirty logging to use fast_page_fault so its performance could be
> > measured.
> > 
> > Config     | Metric                        | Before | After
> > ---------- | ----------------------------- | ------ | ------
> > tdp_mmu=Y  | Populate memory time          | 6.76s  | 5.47s
> > tdp_mmu=Y  | Iteration 2 dirty memory time | 2.83s  | 0.31s
> > tdp_mmu=N  | Populate memory time          | 20.4s  | 18.7s
> > tdp_mmu=N  | Iteration 2 dirty memory time | 2.65s  | 0.30s
> > 
> > The "Iteration 2 dirty memory time" results are especially compelling
> > because they are equivalent to running the same test with a single
> > memslot. In other words, fast_page_fault performance no longer scales
> > with the number of memslots.
> > 
> > Signed-off-by: David Matlack <dmatlack@google.com>
> 
> It's the *most* recently used slot index, of course. :)  That's true of
> lru_slot as well.

*facepalm*

I'll include a patch in v2 to fix the name of the existing lru_slot to
something like mru_slot or last_used_slot.

> 
> > +static inline struct kvm_memory_slot *get_slot(struct kvm_memslots *slots, int slot_index)
> > +{
> > +	if (slot_index < 0 || slot_index >= slots->used_slots)
> > +		return NULL;
> > +
> > +	return &slots->memslots[slot_index];
> > +}
> > +
> 
> Since there are plenty of arrays inside struct kvm_memory_slot*, do we want
> to protect this against speculative out-of-bounds accesses with
> array_index_nospec?

I'm not sure when it makes sense to use array_index_nospec. Perhaps this
is a good candidate since vcpu->lru_slot_index and the length of
memslots[] can both be controlled by userspace.

I'll play around with adding it to see if there are any performance
trade-offs.

> 
> > +static inline struct kvm_memory_slot *
> > +search_memslots(struct kvm_memslots *slots, gfn_t gfn)
> > +{
> > +	int slot_index = __search_memslots(slots, gfn);
> > +
> > +	return get_slot(slots, slot_index);
> >  }
> 
> Let's use this occasion to remove the duplication between __gfn_to_memslot
> and search_memslots; you can make search_memslots do the search and palce
> the LRU (ehm, MRU) code to __gfn_to_memslot only.  So:
> 
> - the new patch 1 (something like "make search_memslots search without LRU
> caching") is basically this series's patch 2, plus a tree-wide replacement
> of search_memslots with __gfn_to_memslot.
> 
> - the new patch 2 is this series's patch 1, except kvm_vcpu_gfn_to_memslot
> uses search_memslots instead of __search_memslots.  The comments in patch
> 2's commit message about the double misses move to this commit message.

Will do.

> 
> > +	if (slot)
> > +		vcpu->lru_slot_index = slot_index;
> 
> Let's call it lru_slot for consistency with the field of struct
> kvm_memory_slots.

I'll make sure the two have consistent names when I rename them to get
rid of "lru".

> 
> Paolo
>

next prev parent reply	other threads:[~2021-08-02 16:27 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-30 22:37 [PATCH 0/6] Improve gfn-to-memslot performance during page faults David Matlack
2021-07-30 22:37 ` [PATCH 1/6] KVM: Cache the least recently used slot index per vCPU David Matlack
2021-08-02 14:36   ` Paolo Bonzini
2021-08-02 16:27     ` David Matlack [this message]
2021-08-02 16:38       ` Sean Christopherson
2021-07-30 22:37 ` [PATCH 2/6] KVM: Avoid VM-wide lru_slot lookup in kvm_vcpu_gfn_to_memslot David Matlack
2021-07-30 22:37 ` [PATCH 3/6] KVM: x86/mmu: Speed up dirty logging in tdp_mmu_map_handle_target_level David Matlack
2021-08-02 14:58   ` Paolo Bonzini
2021-08-02 16:31     ` David Matlack
2021-07-30 22:37 ` [PATCH 4/6] KVM: x86/mmu: Leverage vcpu->lru_slot_index for rmap_add and rmap_recycle David Matlack
2021-08-02 14:58   ` Paolo Bonzini
2021-07-30 22:37 ` [PATCH 5/6] KVM: x86/mmu: Rename __gfn_to_rmap to gfn_to_rmap David Matlack
2021-07-31  9:41   ` kernel test robot
2021-07-31  9:41     ` kernel test robot
2021-07-31 12:22   ` kernel test robot
2021-07-31 12:22     ` kernel test robot
2021-08-02 14:59   ` Paolo Bonzini
2021-07-30 22:37 ` [PATCH 6/6] KVM: selftests: Support multiple slots in dirty_log_perf_test David Matlack

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YQgdA6Blu4vYToLM@google.com \
    --to=dmatlack@google.com \
    --cc=bgardon@google.com \
    --cc=drjones@redhat.com \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=junaids@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.