From: Marc Zyngier <maz@kernel.org>
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: Gavin Shan <gshan@redhat.com>,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, catalin.marinas@arm.com,
will@kernel.org, akpm@linux-foundation.org,
oliver.upton@linux.dev, apopple@nvidia.com, rananta@google.com,
mark.rutland@arm.com, v-songbaohua@oppo.com,
yangyicong@hisilicon.com, shahuang@redhat.com, yihyu@redhat.com,
shan.gavin@gmail.com
Subject: Re: [PATCH v3 1/3] arm64: tlb: Fix TLBI RANGE operand
Date: Wed, 10 Apr 2024 09:45:37 +0100 [thread overview]
Message-ID: <86wmp5sj0u.wl-maz@kernel.org> (raw)
In-Reply-To: <7a929104-5f09-4ff6-8792-4a9e93bc0894@arm.com>
On Mon, 08 Apr 2024 09:29:31 +0100,
Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 05/04/2024 04:58, Gavin Shan wrote:
> > KVM/arm64 relies on TLBI RANGE feature to flush TLBs when the dirty
> > pages are collected by VMM and the page table entries become write
> > protected during live migration. Unfortunately, the operand passed
> > to the TLBI RANGE instruction isn't correctly sorted out due to the
> > commit 117940aa6e5f ("KVM: arm64: Define kvm_tlb_flush_vmid_range()").
> > It leads to crash on the destination VM after live migration because
> > TLBs aren't flushed completely and some of the dirty pages are missed.
> >
> > For example, I have a VM where 8GB memory is assigned, starting from
> > 0x40000000 (1GB). Note that the host has 4KB as the base page size.
> > In the middile of migration, kvm_tlb_flush_vmid_range() is executed
> > to flush TLBs. It passes MAX_TLBI_RANGE_PAGES as the argument to
> > __kvm_tlb_flush_vmid_range() and __flush_s2_tlb_range_op(). SCALE#3
> > and NUM#31, corresponding to MAX_TLBI_RANGE_PAGES, isn't supported
> > by __TLBI_RANGE_NUM(). In this specific case, -1 has been returned
> > from __TLBI_RANGE_NUM() for SCALE#3/2/1/0 and rejected by the loop
> > in the __flush_tlb_range_op() until the variable @scale underflows
> > and becomes -9, 0xffff708000040000 is set as the operand. The operand
> > is wrong since it's sorted out by __TLBI_VADDR_RANGE() according to
> > invalid @scale and @num.
> >
> > Fix it by extending __TLBI_RANGE_NUM() to support the combination of
> > SCALE#3 and NUM#31. With the changes, [-1 31] instead of [-1 30] can
> > be returned from the macro, meaning the TLBs for 0x200000 pages in the
> > above example can be flushed in one shoot with SCALE#3 and NUM#31. The
> > macro TLBI_RANGE_MASK is dropped since no one uses it any more. The
> > comments are also adjusted accordingly.
>
> Perhaps I'm being overly pedantic, but I don't think the bug is
> __TLBI_RANGE_NUM() not being able to return 31; It is clearly documented that it
> can only return in the range [-1, 30] and a maximum of (MAX_TLBI_RANGE_PAGES -
> 1) pages are supported.
I guess "clearly" is pretty relative. I find it misleading that we
don't support the full range of what the architecture offers and have
these odd limitations.
> The bug is in the kvm caller, which tries to call __flush_tlb_range_op() with
> MAX_TLBI_RANGE_PAGES; clearly out-of-bounds.
Nobody disputes that point, hence the Fixes: tag pointing to the KVM
patch. But there are two ways to fix it: either reduce the amount KVM
can use for range invalidation, or fix the helper to allow the full
range offered by the architecture.
> So personally, I would prefer to fix the bug first. Then separately
> enhance the infrastructure to support NUM=31.
I don't think this buys us much, apart from making it harder for
people to know what they need/want/randomly-elect to backport.
> > Fixes: 117940aa6e5f ("KVM: arm64: Define kvm_tlb_flush_vmid_range()")
>
> I would argue that the bug was actually introduced by commit 360839027a6e
> ("arm64: tlb: Refactor the core flush algorithm of __flush_tlb_range"), which
> separated the tlbi loop from the range size validation in __flush_tlb_range().
> Before this, all calls would have to go through __flush_tlb_range() and
> therefore anything bigger than (MAX_TLBI_RANGE_PAGES - 1) pages would cause the
> whole mm to be flushed. Although I get that bisect will lead to this one, so
> that's probably the right one to highlight.
I haven't tried to bisect it, only picked this as the obviously
culprit.
To your point, using __flush_tlb_range() made little sense for KVM --
what would be the vma here? Splitting the helper out was, I think the
correct decision. But we of course lost sight of the __TLBI_RANGE_NUM
limitation in the process.
> I get why it was split, but perhaps it should have been split at a higher level;
> If tlbi range is not supported, then KVM will flush the whole vmid. Would it be
> better for KVM to follow the same pattern as __flush_tlb_range_nosync() and
> issue per-block tlbis upto a max of MAX_DVM_OPS before falling back to the whole
> vmid? And if tlbi range is supported, KVM uses it regardless of the size of the
> range, whereas __flush_tlb_range_nosync() falls back to flush_tlb_mm() at a
> certain size. It's not clear why this divergence is useful?
That'd be a definitive improvement indeed, and would bring back some
much needed consistency.
> > Cc: stable@kernel.org # v6.6+
> > Reported-by: Yihuang Yu <yihyu@redhat.com>
> > Suggested-by: Marc Zyngier <maz@kernel.org>
> > Signed-off-by: Gavin Shan <gshan@redhat.com>
>
> Anyway, the implementation looks correct, so:
>
> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Thanks for that!
M.
--
Without deviation from the norm, progress is not possible.
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next prev parent reply other threads:[~2024-04-10 8:45 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-05 3:58 [PATCH v3 0/3] arm64: tlb: Fix TLBI RANGE operand Gavin Shan
2024-04-05 3:58 ` [PATCH v3 1/3] " Gavin Shan
2024-04-05 17:10 ` Catalin Marinas
2024-04-08 8:29 ` Ryan Roberts
2024-04-10 8:45 ` Marc Zyngier [this message]
2024-04-11 9:59 ` Ryan Roberts
2024-04-10 7:55 ` Anshuman Khandual
2024-04-05 3:58 ` [PATCH v3 2/3] arm64: tlb: Improve __TLBI_VADDR_RANGE() Gavin Shan
2024-04-05 17:10 ` Catalin Marinas
2024-04-08 8:31 ` Ryan Roberts
2024-04-10 7:57 ` Anshuman Khandual
2024-04-05 3:58 ` [PATCH v3 3/3] arm64: tlb: Allow range operation for MAX_TLBI_RANGE_PAGES Gavin Shan
2024-04-05 17:12 ` Catalin Marinas
2024-04-08 8:43 ` Ryan Roberts
2024-04-10 8:50 ` Marc Zyngier
2024-04-11 10:44 ` Will Deacon
2024-04-10 7:58 ` Anshuman Khandual
2024-04-10 8:43 ` [PATCH v3 0/3] arm64: tlb: Fix TLBI RANGE operand Shaoqin Huang
2024-04-10 17:52 ` (subset) " Catalin Marinas
2024-04-12 16:06 ` Will Deacon
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=86wmp5sj0u.wl-maz@kernel.org \
--to=maz@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=apopple@nvidia.com \
--cc=catalin.marinas@arm.com \
--cc=gshan@redhat.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=oliver.upton@linux.dev \
--cc=rananta@google.com \
--cc=ryan.roberts@arm.com \
--cc=shahuang@redhat.com \
--cc=shan.gavin@gmail.com \
--cc=v-songbaohua@oppo.com \
--cc=will@kernel.org \
--cc=yangyicong@hisilicon.com \
--cc=yihyu@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).