From: Oliver Upton <oliver.upton@linux.dev>
To: Ricardo Koller <ricarkol@google.com>
Cc: ricarkol@gmail.com, kvm@vger.kernel.org, catalin.marinas@arm.com,
	kvmarm@lists.linux.dev, andrew.jones@linux.dev,
	bgardon@google.com, maz@kernel.org, dmatlack@google.com,
	pbonzini@redhat.com, kvmarm@lists.cs.columbia.edu
Subject: Re: [RFC PATCH 00/12] KVM: arm64: Eager huge-page splitting for dirty-logging
Date: Mon, 14 Nov 2022 18:42:36 +0000
Message-ID: <Y3KMHGvIEuwhU1wS@google.com>
In-Reply-To: <20221112081714.2169495-1-ricarkol@google.com>

Hi Ricardo,

On Sat, Nov 12, 2022 at 08:17:02AM +0000, Ricardo Koller wrote:
> Hi,
> 
> I'm sending this RFC mainly to get some early feedback on the approach used
> for implementing "Eager Page Splitting" on ARM.  "Eager Page Splitting"
> improves the performance of dirty-logging (used in live migrations) when
> guest memory is backed by huge-pages.  It's an optimization used in Google
> Cloud since 2016 on x86, and for the last couple of months on ARM.
> 
> I tried multiple ways of implementing this optimization on ARM: from
> completely reusing the stage2 mapper, to implementing a new walker from
> scratch, and some versions in between. This RFC is one of those in
> between. They all have similar performance benefits, based on some light
> performance testing (mainly dirty_log_perf_test).
> 
> Background and motivation
> =========================
> Dirty logging is typically used for live-migration iterative copying.  KVM
> implements dirty-logging at the PAGE_SIZE granularity (will refer to 4K
> pages from now on).  It does it by faulting on write-protected 4K pages.
> Therefore, enabling dirty-logging on a huge-page requires breaking it into
> 4K pages in the first place.  KVM does this breaking on fault, and because
> it's in the critical path it only maps the 4K page that faulted; every
> other 4K page is left unmapped.  This is not great for performance on ARM
> for a couple of reasons:
> 
> - Splitting on fault can halt vcpus for milliseconds in some
>   implementations. Splitting a block PTE requires using a broadcasted TLB
>   invalidation (TLBI) for every huge-page (due to the break-before-make
>   requirement). Note that x86 doesn't need this. We observed some
>   implementations that take milliseconds to complete broadcasted TLBIs
>   when done in parallel from multiple vcpus.  And that's exactly what
>   happens when doing it on fault: multiple vcpus fault at the same time
>   triggering TLBIs in parallel.
> 
> - Read-intensive guest workloads end up paying for dirty-logging.  Only
>   mapping the faulting 4K page means that all the other pages that were
>   part of the huge-page will now be unmapped. The effect is that any
>   access, including reads, now has to fault.
> 
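To put a rough number on the second point, here is a minimal sketch of the
arithmetic. It assumes a 4K granule with 2M block mappings; the helper names
are hypothetical and not taken from the series.

```c
#include <assert.h>
#include <stddef.h>

#define SZ_4K (4096UL)
#define SZ_2M (2UL * 1024 * 1024)	/* assumed block size for a 4K granule */

/* Number of 4K pages covered by one block mapping of block_size bytes. */
static size_t pages_per_block(size_t block_size)
{
	assert(block_size % SZ_4K == 0);
	return block_size / SZ_4K;
}

/*
 * Pages left unmapped after a fault-time split that maps only the
 * faulting 4K page. Each of these takes its own fault on first access,
 * reads included.
 */
static size_t unmapped_after_lazy_split(size_t block_size)
{
	return pages_per_block(block_size) - 1;
}
```

Under these assumptions, one write fault on a 2M block leaves 511 of its 512
pages unmapped, which is the cost an eager split is meant to avoid.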
> Eager Page Splitting (on ARM)
> =============================
> Eager Page Splitting fixes the above two issues by eagerly splitting
> huge-pages when enabling dirty logging. The goal is to avoid doing it while
> faulting on write-protected pages. This is what the TDP MMU does for x86
> [0], except that x86 does it for different reasons: to avoid grabbing the
> MMU lock on fault. Note that taking care of write-protection faults still
> requires grabbing the MMU lock on ARM, but not on x86 (with the
> fast_page_fault path).
> 
> An additional benefit of eagerly splitting huge-pages is that it can be
> done in a controlled way (e.g., via an IOCTL). This series provides two
> knobs for doing it, just like its x86 counterpart: when enabling dirty
> logging, and when using the KVM_CLEAR_DIRTY_LOG ioctl. The benefit of doing
> it on KVM_CLEAR_DIRTY_LOG is that this ioctl takes ranges, and not complete
> memslots like when enabling dirty logging. This means that the cost of
> splitting (mainly broadcasted TLBIs) can be throttled: split a range, wait
> for a bit, split another range, etc. The benefits of this approach were
> presented by Oliver Upton at KVM Forum 2022 [1].
> 
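The "split a range, wait for a bit, split another range" loop could look
roughly like the sketch below from the VMM side. The callbacks are
hypothetical stand-ins: clear_range_fn represents one KVM_CLEAR_DIRTY_LOG
call on a page range and pause_fn represents whatever delay the VMM picks.
No real ioctl is issued here, and the chunk size is an arbitrary assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one KVM_CLEAR_DIRTY_LOG call; 0 or negative error. */
typedef int (*clear_range_fn)(uint64_t first_page, uint64_t num_pages, void *ctx);
/* Hypothetical stand-in for the delay between two ranges. */
typedef void (*pause_fn)(void *ctx);

/*
 * Walk a memslot of total_pages in chunks of chunk_pages, pausing between
 * chunks. Returns the number of calls made, or the first negative error.
 * Assumption to keep in mind: the real ioctl wants first_page aligned to 64,
 * so chunk_pages should be a multiple of 64.
 */
static int clear_in_chunks(uint64_t total_pages, uint64_t chunk_pages,
			   clear_range_fn clear, pause_fn pause, void *ctx)
{
	int calls = 0;

	assert(chunk_pages > 0 && clear);
	for (uint64_t first = 0; first < total_pages; first += chunk_pages) {
		uint64_t n = total_pages - first;
		int ret;

		if (n > chunk_pages)
			n = chunk_pages;
		ret = clear(first, n, ctx);
		if (ret < 0)
			return ret;
		calls++;
		/* No pause after the last chunk. */
		if (first + n < total_pages && pause)
			pause(ctx);
	}
	return calls;
}

/* Example callbacks that only count what they were asked to do. */
struct split_stats {
	uint64_t pages;
	int pauses;
};

static int count_pages(uint64_t first_page, uint64_t num_pages, void *ctx)
{
	(void)first_page;
	((struct split_stats *)ctx)->pages += num_pages;
	return 0;
}

static void count_pause(void *ctx)
{
	((struct split_stats *)ctx)->pauses++;
}
```

With the counting callbacks, clear_in_chunks(1000, 256, count_pages,
count_pause, &stats) makes four calls with three pauses in between. A real
VMM would replace the callbacks with the ioctl and a sleep.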
> Implementation
> ==============
> Patches 1-4 add a pgtable utility function for splitting huge block PTEs:
> kvm_pgtable_stage2_split(). Patches 5-6 add support for not doing
> break-before-make on huge-page breaking when FEAT_BBM level 2 is supported.

I would suggest you split up FEAT_BBM=2 and eager page splitting into
two separate series, if possible. IMO, the eager page split is easier to
reason about if it follows the existing pattern of break-before-make.
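
For reference, the break-before-make ordering can be sketched with a toy
model like the one below. This is only an illustration of the sequence
(invalidate the entry, invalidate the TLB, install the new entry); the types
and function are hypothetical and are not the kernel's page-table code.

```c
#include <assert.h>

enum toy_pte_kind { TOY_PTE_INVALID, TOY_PTE_BLOCK, TOY_PTE_TABLE };

struct toy_pte {
	enum toy_pte_kind kind;
};

struct toy_mmu {
	unsigned int tlbi_broadcasts;	/* broadcast TLBIs issued so far */
};

/* Replace a block entry with a table entry using break-before-make. */
static int toy_split_block_bbm(struct toy_mmu *mmu, struct toy_pte *pte)
{
	assert(mmu && pte);
	if (pte->kind != TOY_PTE_BLOCK)
		return -1;

	/* Break: make the old block entry invalid. */
	pte->kind = TOY_PTE_INVALID;
	/* Invalidate stale translations; one broadcast per block split. */
	mmu->tlbi_broadcasts++;
	/* Make: install the table entry pointing at the 4K mappings. */
	pte->kind = TOY_PTE_TABLE;
	return 0;
}
```

In this model every split block costs exactly one broadcast TLBI, which is
the per-huge-page cost described in the cover letter.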

--
Thanks,
Oliver

