From: Marc Zyngier <maz@kernel.org>
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: Gavin Shan <gshan@redhat.com>,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, catalin.marinas@arm.com,
will@kernel.org, akpm@linux-foundation.org,
oliver.upton@linux.dev, apopple@nvidia.com, rananta@google.com,
mark.rutland@arm.com, v-songbaohua@oppo.com,
yangyicong@hisilicon.com, shahuang@redhat.com, yihyu@redhat.com,
shan.gavin@gmail.com
Subject: Re: [PATCH v3 1/3] arm64: tlb: Fix TLBI RANGE operand
Date: Wed, 10 Apr 2024 09:45:37 +0100 [thread overview]
Message-ID: <86wmp5sj0u.wl-maz@kernel.org> (raw)
In-Reply-To: <7a929104-5f09-4ff6-8792-4a9e93bc0894@arm.com>
On Mon, 08 Apr 2024 09:29:31 +0100,
Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 05/04/2024 04:58, Gavin Shan wrote:
> > KVM/arm64 relies on TLBI RANGE feature to flush TLBs when the dirty
> > pages are collected by VMM and the page table entries become write
> > protected during live migration. Unfortunately, the operand passed
> > to the TLBI RANGE instruction isn't correctly sorted out due to the
> > commit 117940aa6e5f ("KVM: arm64: Define kvm_tlb_flush_vmid_range()").
> > It leads to crash on the destination VM after live migration because
> > TLBs aren't flushed completely and some of the dirty pages are missed.
> >
> > For example, I have a VM where 8GB memory is assigned, starting from
> > 0x40000000 (1GB). Note that the host has 4KB as the base page size.
> > In the middile of migration, kvm_tlb_flush_vmid_range() is executed
> > to flush TLBs. It passes MAX_TLBI_RANGE_PAGES as the argument to
> > __kvm_tlb_flush_vmid_range() and __flush_s2_tlb_range_op(). SCALE#3
> > and NUM#31, corresponding to MAX_TLBI_RANGE_PAGES, isn't supported
> > by __TLBI_RANGE_NUM(). In this specific case, -1 has been returned
> > from __TLBI_RANGE_NUM() for SCALE#3/2/1/0 and rejected by the loop
> > in the __flush_tlb_range_op() until the variable @scale underflows
> > and becomes -9, 0xffff708000040000 is set as the operand. The operand
> > is wrong since it's sorted out by __TLBI_VADDR_RANGE() according to
> > invalid @scale and @num.
> >
> > Fix it by extending __TLBI_RANGE_NUM() to support the combination of
> > SCALE#3 and NUM#31. With the changes, [-1 31] instead of [-1 30] can
> > be returned from the macro, meaning the TLBs for 0x200000 pages in the
> > above example can be flushed in one shoot with SCALE#3 and NUM#31. The
> > macro TLBI_RANGE_MASK is dropped since no one uses it any more. The
> > comments are also adjusted accordingly.
>
> Perhaps I'm being overly pedantic, but I don't think the bug is
> __TLBI_RANGE_NUM() not being able to return 31; It is clearly documented that it
> can only return in the range [-1, 30] and a maximum of (MAX_TLBI_RANGE_PAGES -
> 1) pages are supported.
I guess "clearly" is pretty relative. I find it misleading that we
don't support the full range of what the architecture offers and have
these odd limitations.
> The bug is in the kvm caller, which tries to call __flush_tlb_range_op() with
> MAX_TLBI_RANGE_PAGES; clearly out-of-bounds.
Nobody disputes that point, hence the Fixes: tag pointing to the KVM
patch. But there are two ways to fix it: either reduce the amount KVM
can use for range invalidation, or fix the helper to allow the full
range offered by the architecture.
> So personally, I would prefer to fix the bug first. Then separately
> enhance the infrastructure to support NUM=31.
I don't think this buys us much, apart from making it harder for
people to know what they need/want/randomly-elect to backport.
> > Fixes: 117940aa6e5f ("KVM: arm64: Define kvm_tlb_flush_vmid_range()")
>
> I would argue that the bug was actually introduced by commit 360839027a6e
> ("arm64: tlb: Refactor the core flush algorithm of __flush_tlb_range"), which
> separated the tlbi loop from the range size validation in __flush_tlb_range().
> Before this, all calls would have to go through __flush_tlb_range() and
> therefore anything bigger than (MAX_TLBI_RANGE_PAGES - 1) pages would cause the
> whole mm to be flushed. Although I get that bisect will lead to this one, so
> that's probably the right one to highlight.
I haven't tried to bisect it, only picked this as the obviously
culprit.
To your point, using __flush_tlb_range() made little sense for KVM --
what would be the vma here? Splitting the helper out was, I think the
correct decision. But we of course lost sight of the __TLBI_RANGE_NUM
limitation in the process.
> I get why it was split, but perhaps it should have been split at a higher level;
> If tlbi range is not supported, then KVM will flush the whole vmid. Would it be
> better for KVM to follow the same pattern as __flush_tlb_range_nosync() and
> issue per-block tlbis upto a max of MAX_DVM_OPS before falling back to the whole
> vmid? And if tlbi range is supported, KVM uses it regardless of the size of the
> range, whereas __flush_tlb_range_nosync() falls back to flush_tlb_mm() at a
> certain size. It's not clear why this divergence is useful?
That'd be a definitive improvement indeed, and would bring back some
much needed consistency.
> > Cc: stable@kernel.org # v6.6+
> > Reported-by: Yihuang Yu <yihyu@redhat.com>
> > Suggested-by: Marc Zyngier <maz@kernel.org>
> > Signed-off-by: Gavin Shan <gshan@redhat.com>
>
> Anyway, the implementation looks correct, so:
>
> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Thanks for that!
M.
--
Without deviation from the norm, progress is not possible.
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
WARNING: multiple messages have this Message-ID (diff)
From: Marc Zyngier <maz@kernel.org>
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: Gavin Shan <gshan@redhat.com>,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, catalin.marinas@arm.com,
will@kernel.org, akpm@linux-foundation.org,
oliver.upton@linux.dev, apopple@nvidia.com, rananta@google.com,
mark.rutland@arm.com, v-songbaohua@oppo.com,
yangyicong@hisilicon.com, shahuang@redhat.com, yihyu@redhat.com,
shan.gavin@gmail.com
Subject: Re: [PATCH v3 1/3] arm64: tlb: Fix TLBI RANGE operand
Date: Wed, 10 Apr 2024 09:45:37 +0100 [thread overview]
Message-ID: <86wmp5sj0u.wl-maz@kernel.org> (raw)
In-Reply-To: <7a929104-5f09-4ff6-8792-4a9e93bc0894@arm.com>
On Mon, 08 Apr 2024 09:29:31 +0100,
Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 05/04/2024 04:58, Gavin Shan wrote:
> > KVM/arm64 relies on TLBI RANGE feature to flush TLBs when the dirty
> > pages are collected by VMM and the page table entries become write
> > protected during live migration. Unfortunately, the operand passed
> > to the TLBI RANGE instruction isn't correctly sorted out due to the
> > commit 117940aa6e5f ("KVM: arm64: Define kvm_tlb_flush_vmid_range()").
> > It leads to crash on the destination VM after live migration because
> > TLBs aren't flushed completely and some of the dirty pages are missed.
> >
> > For example, I have a VM where 8GB memory is assigned, starting from
> > 0x40000000 (1GB). Note that the host has 4KB as the base page size.
> > In the middile of migration, kvm_tlb_flush_vmid_range() is executed
> > to flush TLBs. It passes MAX_TLBI_RANGE_PAGES as the argument to
> > __kvm_tlb_flush_vmid_range() and __flush_s2_tlb_range_op(). SCALE#3
> > and NUM#31, corresponding to MAX_TLBI_RANGE_PAGES, isn't supported
> > by __TLBI_RANGE_NUM(). In this specific case, -1 has been returned
> > from __TLBI_RANGE_NUM() for SCALE#3/2/1/0 and rejected by the loop
> > in the __flush_tlb_range_op() until the variable @scale underflows
> > and becomes -9, 0xffff708000040000 is set as the operand. The operand
> > is wrong since it's sorted out by __TLBI_VADDR_RANGE() according to
> > invalid @scale and @num.
> >
> > Fix it by extending __TLBI_RANGE_NUM() to support the combination of
> > SCALE#3 and NUM#31. With the changes, [-1 31] instead of [-1 30] can
> > be returned from the macro, meaning the TLBs for 0x200000 pages in the
> > above example can be flushed in one shoot with SCALE#3 and NUM#31. The
> > macro TLBI_RANGE_MASK is dropped since no one uses it any more. The
> > comments are also adjusted accordingly.
>
> Perhaps I'm being overly pedantic, but I don't think the bug is
> __TLBI_RANGE_NUM() not being able to return 31; It is clearly documented that it
> can only return in the range [-1, 30] and a maximum of (MAX_TLBI_RANGE_PAGES -
> 1) pages are supported.
I guess "clearly" is pretty relative. I find it misleading that we
don't support the full range of what the architecture offers and have
these odd limitations.
> The bug is in the kvm caller, which tries to call __flush_tlb_range_op() with
> MAX_TLBI_RANGE_PAGES; clearly out-of-bounds.
Nobody disputes that point, hence the Fixes: tag pointing to the KVM
patch. But there are two ways to fix it: either reduce the amount KVM
can use for range invalidation, or fix the helper to allow the full
range offered by the architecture.
> So personally, I would prefer to fix the bug first. Then separately
> enhance the infrastructure to support NUM=31.
I don't think this buys us much, apart from making it harder for
people to know what they need/want/randomly-elect to backport.
> > Fixes: 117940aa6e5f ("KVM: arm64: Define kvm_tlb_flush_vmid_range()")
>
> I would argue that the bug was actually introduced by commit 360839027a6e
> ("arm64: tlb: Refactor the core flush algorithm of __flush_tlb_range"), which
> separated the tlbi loop from the range size validation in __flush_tlb_range().
> Before this, all calls would have to go through __flush_tlb_range() and
> therefore anything bigger than (MAX_TLBI_RANGE_PAGES - 1) pages would cause the
> whole mm to be flushed. Although I get that bisect will lead to this one, so
> that's probably the right one to highlight.
I haven't tried to bisect it, only picked this as the obviously
culprit.
To your point, using __flush_tlb_range() made little sense for KVM --
what would be the vma here? Splitting the helper out was, I think the
correct decision. But we of course lost sight of the __TLBI_RANGE_NUM
limitation in the process.
> I get why it was split, but perhaps it should have been split at a higher level;
> If tlbi range is not supported, then KVM will flush the whole vmid. Would it be
> better for KVM to follow the same pattern as __flush_tlb_range_nosync() and
> issue per-block tlbis upto a max of MAX_DVM_OPS before falling back to the whole
> vmid? And if tlbi range is supported, KVM uses it regardless of the size of the
> range, whereas __flush_tlb_range_nosync() falls back to flush_tlb_mm() at a
> certain size. It's not clear why this divergence is useful?
That'd be a definitive improvement indeed, and would bring back some
much needed consistency.
> > Cc: stable@kernel.org # v6.6+
> > Reported-by: Yihuang Yu <yihyu@redhat.com>
> > Suggested-by: Marc Zyngier <maz@kernel.org>
> > Signed-off-by: Gavin Shan <gshan@redhat.com>
>
> Anyway, the implementation looks correct, so:
>
> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Thanks for that!
M.
--
Without deviation from the norm, progress is not possible.
next prev parent reply other threads:[~2024-04-10 8:45 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-05 3:58 [PATCH v3 0/3] arm64: tlb: Fix TLBI RANGE operand Gavin Shan
2024-04-05 3:58 ` Gavin Shan
2024-04-05 3:58 ` [PATCH v3 1/3] " Gavin Shan
2024-04-05 3:58 ` Gavin Shan
2024-04-05 17:10 ` Catalin Marinas
2024-04-05 17:10 ` Catalin Marinas
2024-04-08 8:29 ` Ryan Roberts
2024-04-08 8:29 ` Ryan Roberts
2024-04-10 8:45 ` Marc Zyngier [this message]
2024-04-10 8:45 ` Marc Zyngier
2024-04-11 9:59 ` Ryan Roberts
2024-04-11 9:59 ` Ryan Roberts
2024-04-10 7:55 ` Anshuman Khandual
2024-04-10 7:55 ` Anshuman Khandual
2024-04-05 3:58 ` [PATCH v3 2/3] arm64: tlb: Improve __TLBI_VADDR_RANGE() Gavin Shan
2024-04-05 3:58 ` Gavin Shan
2024-04-05 17:10 ` Catalin Marinas
2024-04-05 17:10 ` Catalin Marinas
2024-04-08 8:31 ` Ryan Roberts
2024-04-08 8:31 ` Ryan Roberts
2024-04-10 7:57 ` Anshuman Khandual
2024-04-10 7:57 ` Anshuman Khandual
2024-04-05 3:58 ` [PATCH v3 3/3] arm64: tlb: Allow range operation for MAX_TLBI_RANGE_PAGES Gavin Shan
2024-04-05 3:58 ` Gavin Shan
2024-04-05 17:12 ` Catalin Marinas
2024-04-05 17:12 ` Catalin Marinas
2024-04-08 8:43 ` Ryan Roberts
2024-04-08 8:43 ` Ryan Roberts
2024-04-10 8:50 ` Marc Zyngier
2024-04-10 8:50 ` Marc Zyngier
2024-04-11 10:44 ` Will Deacon
2024-04-11 10:44 ` Will Deacon
2024-04-10 7:58 ` Anshuman Khandual
2024-04-10 7:58 ` Anshuman Khandual
2024-04-10 8:43 ` [PATCH v3 0/3] arm64: tlb: Fix TLBI RANGE operand Shaoqin Huang
2024-04-10 8:43 ` Shaoqin Huang
2024-04-10 17:52 ` (subset) " Catalin Marinas
2024-04-10 17:52 ` Catalin Marinas
2024-04-12 16:06 ` Will Deacon
2024-04-12 16:06 ` Will Deacon
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=86wmp5sj0u.wl-maz@kernel.org \
--to=maz@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=apopple@nvidia.com \
--cc=catalin.marinas@arm.com \
--cc=gshan@redhat.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=oliver.upton@linux.dev \
--cc=rananta@google.com \
--cc=ryan.roberts@arm.com \
--cc=shahuang@redhat.com \
--cc=shan.gavin@gmail.com \
--cc=v-songbaohua@oppo.com \
--cc=will@kernel.org \
--cc=yangyicong@hisilicon.com \
--cc=yihyu@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.