Re: [Patch v3 1/9] KVM: x86/mmu: Repurpose KVM MMU shrinker to purge shadow page caches

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Sean Christopherson <seanjc@google.com>
To: Vipin Sharma <vipinsh@google.com>
Cc: David Matlack <dmatlack@google.com>,
	pbonzini@redhat.com, bgardon@google.com, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [Patch v3 1/9] KVM: x86/mmu: Repurpose KVM MMU shrinker to purge shadow page caches
Date: Wed, 18 Jan 2023 17:43:46 +0000	[thread overview]
Message-ID: <Y8gv0srYi+6PvJml@google.com> (raw)
In-Reply-To: <CAHVum0cZtDYZN2bD3TZgUNpcWiy2-Qkw1mb40syut_2kkR=Agg@mail.gmail.com>

@all, trim your replies!

On Tue, Jan 03, 2023, Vipin Sharma wrote:
> On Tue, Jan 3, 2023 at 10:01 AM Vipin Sharma <vipinsh@google.com> wrote:
> >
> > On Thu, Dec 29, 2022 at 1:55 PM David Matlack <dmatlack@google.com> wrote:
> > > > @@ -6646,66 +6690,49 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen)
> > > >  static unsigned long
> > > >  mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> > > >  {
> > > > -     struct kvm *kvm;
> > > > -     int nr_to_scan = sc->nr_to_scan;
> > > > +     struct kvm_mmu_memory_cache *cache;
> > > > +     struct kvm *kvm, *first_kvm = NULL;
> > > >       unsigned long freed = 0;
> > > > +     /* spinlock for memory cache */
> > > > +     spinlock_t *cache_lock;
> > > > +     struct kvm_vcpu *vcpu;
> > > > +     unsigned long i;
> > > >
> > > >       mutex_lock(&kvm_lock);
> > > >
> > > >       list_for_each_entry(kvm, &vm_list, vm_list) {
> > > > -             int idx;
> > > > -             LIST_HEAD(invalid_list);
> > > > -
> > > > -             /*
> > > > -              * Never scan more than sc->nr_to_scan VM instances.
> > > > -              * Will not hit this condition practically since we do not try
> > > > -              * to shrink more than one VM and it is very unlikely to see
> > > > -              * !n_used_mmu_pages so many times.
> > > > -              */
> > > > -             if (!nr_to_scan--)
> > > > +             if (first_kvm == kvm)
> > > >                       break;
> > > > -             /*
> > > > -              * n_used_mmu_pages is accessed without holding kvm->mmu_lock
> > > > -              * here. We may skip a VM instance errorneosly, but we do not
> > > > -              * want to shrink a VM that only started to populate its MMU
> > > > -              * anyway.
> > > > -              */
> > > > -             if (!kvm->arch.n_used_mmu_pages &&
> > > > -                 !kvm_has_zapped_obsolete_pages(kvm))
> > > > -                     continue;
> > > > +             if (!first_kvm)
> > > > +                     first_kvm = kvm;
> > > > +             list_move_tail(&kvm->vm_list, &vm_list);
> > > >
> > > > -             idx = srcu_read_lock(&kvm->srcu);
> > > > -             write_lock(&kvm->mmu_lock);
> > > > +             kvm_for_each_vcpu(i, vcpu, kvm) {
> > >
> > > What protects this from racing with vCPU creation/deletion?
> > >
> 
> vCPU deletion:
> We take kvm_lock in mmu_shrink_scan(), the same lock is taken in
> kvm_destroy_vm() to remove a vm from vm_list. So, once we are
> iterating vm_list we will not see any VM removal which will means no
> vcpu removal.
> 
> I didn't find any other code for vCPU deletion except failures during
> VM and VCPU set up. A VM is only added to vm_list after successful
> creation.

Yep, KVM doesn't support destroying/freeing a vCPU after it's been added.

> vCPU creation:
> I think it will work.
> 
> kvm_vm_ioctl_create_vcpus() initializes the vcpu, adds it to
> kvm->vcpu_array which is of the type xarray and is managed by RCU.
> After this online_vcpus is incremented. So, kvm_for_each_vcpu() which
> uses RCU to read entries, if it sees incremented online_vcpus value
> then it will also sees all of the vcpu initialization.

Yep.  The shrinker may race with a vCPU creation, e.g. not process a just-created
vCPU, but that's totally ok in this case since the shrinker path is best effort
(and purging the caches of a newly created vCPU is likely pointless).

> @Sean, Paolo
> 
> Is the above explanation correct, kvm_for_each_vcpu() is safe without any lock?

Well, in this case, you do need to hold kvm_lock ;-)

But yes, iterating over vCPUs without holding the per-VM kvm->lock is safe, the
caller just needs to ensure the VM can't be destroyed, i.e. either needs to hold
a reference to the VM or needs to hold kvm_lock.

next prev parent reply	other threads:[~2023-01-18 17:45 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-22  2:34 [Patch v3 0/9] NUMA aware page table's pages allocation Vipin Sharma
2022-12-22  2:34 ` [Patch v3 1/9] KVM: x86/mmu: Repurpose KVM MMU shrinker to purge shadow page caches Vipin Sharma
2022-12-27 18:37   ` Ben Gardon
2022-12-28 22:07     ` Vipin Sharma
2022-12-29 21:15       ` David Matlack
2023-01-03 17:38         ` Vipin Sharma
2022-12-29 21:54   ` David Matlack
2023-01-03 18:01     ` Vipin Sharma
2023-01-04  0:25       ` Vipin Sharma
2023-01-18 17:43         ` Sean Christopherson [this message]
2023-01-03 19:32   ` Mingwei Zhang
2023-01-04  1:00     ` Vipin Sharma
2023-01-04  6:29       ` Mingwei Zhang
2023-01-04  6:57         ` Mingwei Zhang
2023-01-18 17:36         ` Sean Christopherson
2023-01-16  4:14   ` kernel test robot
2022-12-22  2:34 ` [Patch v3 2/9] KVM: x86/mmu: Remove zapped_obsolete_pages from struct kvm_arch{} Vipin Sharma
2022-12-29 21:59   ` David Matlack
2022-12-22  2:34 ` [Patch v3 3/9] KVM: x86/mmu: Shrink split_shadow_page_cache via KVM MMU shrinker Vipin Sharma
2022-12-22  2:34 ` [Patch v3 4/9] KVM: Add module param to make page tables NUMA aware Vipin Sharma
2022-12-29 22:05   ` David Matlack
2022-12-22  2:34 ` [Patch v3 5/9] KVM: x86/mmu: Allocate TDP page table's page on correct NUMA node on split Vipin Sharma
2022-12-27 19:02   ` Ben Gardon
2022-12-28 22:07     ` Vipin Sharma
2022-12-29 22:30   ` David Matlack
2023-01-03 18:26     ` Vipin Sharma
2022-12-22  2:34 ` [Patch v3 6/9] KVM: Provide NUMA node support to kvm_mmu_memory_cache{} Vipin Sharma
2022-12-27 19:09   ` Ben Gardon
2022-12-28 22:07     ` Vipin Sharma
2022-12-29 18:22       ` Ben Gardon
2023-01-03 17:36         ` Vipin Sharma
2022-12-29 23:08   ` David Matlack
2022-12-29 23:11     ` David Matlack
2023-01-03 18:45       ` Vipin Sharma
2023-01-03 18:55         ` David Matlack
2022-12-22  2:34 ` [Patch v3 7/9] KVM: x86/mmu: Allocate page table's pages on NUMA node of the underlying pages Vipin Sharma
2022-12-27 19:34   ` Ben Gardon
2022-12-28 22:08     ` Vipin Sharma
2022-12-29 18:20       ` Ben Gardon
2022-12-22  2:34 ` [Patch v3 8/9] KVM: x86/mmu: Make split_shadow_page_cache NUMA aware Vipin Sharma
2022-12-27 19:42   ` Ben Gardon
2022-12-28 22:08     ` Vipin Sharma
2022-12-29 23:18   ` David Matlack
2023-01-03 18:49     ` Vipin Sharma
2022-12-22  2:34 ` [Patch v3 9/9] KVM: x86/mmu: Reduce default cache size in KVM from 40 to PT64_ROOT_MAX_LEVEL Vipin Sharma
2022-12-27 19:52   ` Ben Gardon
2022-12-28 22:08     ` Vipin Sharma

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y8gv0srYi+6PvJml@google.com \
    --to=seanjc@google.com \
    --cc=bgardon@google.com \
    --cc=dmatlack@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=vipinsh@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.