public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Marc Zyngier <maz@kernel.org>
To: Keqian Zhu <zhukeqian1@huawei.com>
Cc: linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org,
	Catalin Marinas <catalin.marinas@arm.com>,
	James Morse <james.morse@arm.com>, Will Deacon <will@kernel.org>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Sean Christopherson <sean.j.christopherson@intel.com>,
	Julien Thierry <julien.thierry.kdev@gmail.com>,
	Mark Brown <broonie@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Alexios Zavras <alexios.zavras@intel.com>,
	wanghaibin.wang@huawei.com, zhengxiang9@huawei.com
Subject: Re: [RFC PATCH 0/7] kvm: arm64: Support stage2 hardware DBM
Date: Mon, 25 May 2020 16:44:53 +0100	[thread overview]
Message-ID: <4b8a939172395bf38e581634abecf925@kernel.org> (raw)
In-Reply-To: <20200525112406.28224-1-zhukeqian1@huawei.com>

On 2020-05-25 12:23, Keqian Zhu wrote:
> This patch series add support for stage2 hardware DBM, and it is only
> used for dirty log for now.
> 
> It works well under some migration test cases, including VM with 4K
> pages or 2M THP. I checked the SHA256 hash digest of all memory and
> they keep same for source VM and destination VM, which means no dirty
> pages is missed under hardware DBM.
> 
> However, there are some known issues not solved.
> 
> 1. Some mechanisms that rely on "write permission fault" become 
> invalid,
>    such as kvm_set_pfn_dirty and "mmap page sharing".
> 
>    kvm_set_pfn_dirty is called in user_mem_abort when guest issues 
> write
>    fault. This guarantees physical page will not be dropped directly 
> when
>    host kernel recycle memory. After using hardware dirty management, 
> we
>    have no chance to call kvm_set_pfn_dirty.

Then you will end-up with memory corruption under memory pressure.
This also breaks things like CoW, which we depend on.

> 
>    For "mmap page sharing" mechanism, host kernel will allocate a new
>    physical page when guest writes a page that is shared with other 
> page
>    table entries. After using hardware dirty management, we have no 
> chance
>    to do this too.
> 
>    I need to do some survey on how stage1 hardware DBM solve these 
> problems.
>    It helps if anyone can figure it out.
> 
> 2. Page Table Modification Races: Though I have found and solved some 
> data
>    races when kernel changes page table entries, I still doubt that 
> there
>    are data races I am not aware of. It's great if anyone can figure 
> them out.
> 
> 3. Performance: Under Kunpeng 920 platform, for every 64GB memory, KVM
>    consumes about 40ms to traverse all PTEs to collect dirty log. It 
> will
>    cause unbearable downtime for migration if memory size is too big. I 
> will
>    try to solve this problem in Patch v1.

This, in my opinion, is why Stage-2 DBM is fairly useless.
 From a performance perspective, this is the worse possible
situation. You end up continuously scanning page tables, at
an arbitrary rate, without a way to evaluate the fault rate.

One thing S2-DBM would be useful for is SVA, where a device
write would mark the S2 PTs dirty as they are shared between
CPU and SMMU. Another thing is SPE, which is essentially a DMA
agent using the CPU's PTs.

But on its own, and just to log the dirty pages, S2-DBM is
pretty rubbish. I wish arm64 had something like Intel's PML,
which looks far more interesting for the purpose of tracking
accesses.

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...

  parent reply	other threads:[~2020-05-25 15:44 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-25 11:23 [RFC PATCH 0/7] kvm: arm64: Support stage2 hardware DBM Keqian Zhu
2020-05-25 11:24 ` [RFC PATCH 1/7] KVM: arm64: Add some basic functions for hw DBM Keqian Zhu
2020-05-25 11:24 ` [RFC PATCH 2/7] KVM: arm64: Set DBM bit of PTEs if hw DBM enabled Keqian Zhu
2020-05-26 11:49   ` Catalin Marinas
2020-05-27  9:28     ` zhukeqian
2020-05-25 11:24 ` [RFC PATCH 3/7] KVM: arm64: Traverse page table entries when sync dirty log Keqian Zhu
2020-05-25 11:24 ` [RFC PATCH 4/7] KVM: arm64: Steply write protect page table by mask bit Keqian Zhu
2020-05-25 11:24 ` [RFC PATCH 5/7] kvm: arm64: Modify stage2 young mechanism to support hw DBM Keqian Zhu
2020-05-25 11:24 ` [RFC PATCH 6/7] kvm: arm64: Save stage2 PTE dirty info if it is coverred Keqian Zhu
2020-05-25 11:24 ` [RFC PATCH 7/7] KVM: arm64: Enable stage2 hardware DBM Keqian Zhu
2020-05-25 15:44 ` Marc Zyngier [this message]
2020-05-26  2:08   ` [RFC PATCH 0/7] kvm: arm64: Support " zhukeqian

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4b8a939172395bf38e581634abecf925@kernel.org \
    --to=maz@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=alexios.zavras@intel.com \
    --cc=broonie@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=james.morse@arm.com \
    --cc=julien.thierry.kdev@gmail.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sean.j.christopherson@intel.com \
    --cc=suzuki.poulose@arm.com \
    --cc=tglx@linutronix.de \
    --cc=wanghaibin.wang@huawei.com \
    --cc=will@kernel.org \
    --cc=zhengxiang9@huawei.com \
    --cc=zhukeqian1@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox