kvmarm.lists.cs.columbia.edu archive mirror
From: Marc Zyngier <maz@kernel.org>
To: Jing Zhang <jingzhangos@google.com>
Cc: KVM <kvm@vger.kernel.org>, David Matlack <dmatlack@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Will Deacon <will@kernel.org>,
	KVMARM <kvmarm@lists.cs.columbia.edu>
Subject: Re: [RFC PATCH 0/3] ARM64: Guest performance improvement during dirty
Date: Tue, 11 Jan 2022 11:54:55 +0000
Message-ID: <877db6trlc.wl-maz@kernel.org>
In-Reply-To: <20220110210441.2074798-1-jingzhangos@google.com>

On Mon, 10 Jan 2022 21:04:38 +0000,
Jing Zhang <jingzhangos@google.com> wrote:
> 
> This patch reduces the performance degradation of guest workloads during
> dirty logging on ARM64. A fast path is added to handle permission relaxation
> during dirty logging. The MMU lock is replaced with an rwlock, so that all
> permission relaxations on leaf PTEs can be performed under the read lock. This
> greatly reduces the MMU lock contention during dirty logging. With this
> solution, the source guest workload performance degradation can be reduced
> by more than 60%.
> 
> Problem:
>   * A Google internal live migration test shows that the source guest workload
>   performance has >99% degradation for about 105 seconds, >50% degradation
>   for about 112 seconds, and >10% degradation for about 112 seconds on ARM64.
>   In other words, the guest workload degradation is above 99% most of the
>   time, which obviously needs improvement compared to the test result
>   on x86 (>99% for 6s, >50% for 9s, >10% for 27s).
>   * Tested H/W: Ampere Altra 3GHz, #CPU: 64, #Mem: 256GB
>   * VM spec: #vCPU: 48, #Mem/vCPU: 4GB

What are the host and guest page sizes?

> 
> Analysis:
>   * We enabled CONFIG_LOCK_STAT in the kernel and used dirty_log_perf_test to
>     get the number of MMU lock contentions and the "dirty memory time" for
>     various VM specs, using the test command:
>     ./dirty_log_perf_test -b 2G -m 2 -i 2 -s anonymous_hugetlb_2mb -v [#vCPU]

How is this test representative of the internal live migration test
you mention above? '-m 2' indicates a mode that varies depending on
the HW and revision of the test (I just added a bunch of supported
modes). Which one is it?

>     Below are the results:
>     +-------+------------------------+-----------------------+
>     | #vCPU | dirty memory time (ms) | number of contentions |
>     +-------+------------------------+-----------------------+
>     | 1     | 926                    | 0                     |
>     +-------+------------------------+-----------------------+
>     | 2     | 1189                   | 4732558               |
>     +-------+------------------------+-----------------------+
>     | 4     | 2503                   | 11527185              |
>     +-------+------------------------+-----------------------+
>     | 8     | 5069                   | 24881677              |
>     +-------+------------------------+-----------------------+
>     | 16    | 10340                  | 50347956              |
>     +-------+------------------------+-----------------------+
>     | 32    | 20351                  | 100605720             |
>     +-------+------------------------+-----------------------+
>     | 64    | 40994                  | 201442478             |
>     +-------+------------------------+-----------------------+
> 
>   * From the test results above, the "dirty memory time" and the number of
>     MMU lock contentions scale with the number of vCPUs. That means all the
>     dirty memory operations from all vCPU threads have been serialized by
>     the MMU lock. Further analysis also shows that the permission relaxation
>     during dirty logging is where vCPU threads get serialized.
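
As a rough sanity check on the scaling claim quoted above, the table can be fed through a few lines of Python. This is only a sketch: the figures are copied from the quoted table, and the variable names are made up for illustration.

```python
# Figures copied from the quoted table; all names are illustrative only.
vcpus = [1, 2, 4, 8, 16, 32, 64]
dirty_ms = [926, 1189, 2503, 5069, 10340, 20351, 40994]
contentions = [0, 4732558, 11527185, 24881677, 50347956, 100605720, 201442478]

# Growth factor of "dirty memory time" each time the vCPU count doubles.
time_growth = [b / a for a, b in zip(dirty_ms, dirty_ms[1:])]

# Same for contentions, skipping the 1-vCPU row (0 contentions, no ratio).
contention_growth = [b / a for a, b in zip(contentions[1:], contentions[2:])]

for n, g in zip(vcpus[1:], time_growth):
    print(f"{n:>2} vCPUs: dirty memory time x{g:.2f} vs. previous row")
for n, g in zip(vcpus[2:], contention_growth):
    print(f"{n:>2} vCPUs: contentions x{g:.2f} vs. previous row")
```

A growth factor close to 2 per doubling of vCPUs is what fully serialized work would look like, which is consistent with the conclusion quoted above.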
> 
> Solution:
>   * On ARM64, there is no mechanism like PML (Page Modification Logging), and
>     the dirty-bit solution for dirty logging is much more complicated than
>     the write-protection solution. The straightforward way to reduce the guest
>     performance degradation is to enhance the concurrency for the permission
>     fault path during dirty logging.
>   * In this patch, we only put leaf PTE permission relaxation for dirty
>     logging under the read lock; everything else goes under the write lock.
>     Below are the results based on the solution:
>     +-------+------------------------+
>     | #vCPU | dirty memory time (ms) |
>     +-------+------------------------+
>     | 1     | 803                    |
>     +-------+------------------------+
>     | 2     | 843                    |
>     +-------+------------------------+
>     | 4     | 942                    |
>     +-------+------------------------+
>     | 8     | 1458                   |
>     +-------+------------------------+
>     | 16    | 2853                   |
>     +-------+------------------------+
>     | 32    | 5886                   |
>     +-------+------------------------+
>     | 64    | 12190                  |
>     +-------+------------------------+
>     The "dirty memory time" is reduced by more than 60% as the number of
>     vCPUs grows.
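
For reference, the per-row reduction between the two quoted tables can be worked out with a short Python sketch. The figures are copied from the tables above; the variable names are illustrative and not part of the patch or the selftest.

```python
# "dirty memory time (ms)" copied from the two quoted tables.
vcpus = [1, 2, 4, 8, 16, 32, 64]
before_ms = [926, 1189, 2503, 5069, 10340, 20351, 40994]
after_ms = [803, 843, 942, 1458, 2853, 5886, 12190]

# Percentage reduction per vCPU count: 100 * (before - after) / before.
reduction_pct = {
    n: 100.0 * (b - a) / b
    for n, b, a in zip(vcpus, before_ms, after_ms)
}

for n, pct in reduction_pct.items():
    print(f"{n:>2} vCPUs: {pct:.1f}% lower dirty memory time")
```

With these figures, the reduction stays below 60% for 1 and 2 vCPUs and exceeds 60% from 4 vCPUs upward.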

How does that translate to the original problem statement with your
live migration test?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
