From: Marc Zyngier <maz@kernel.org>
To: Gavin Shan <gshan@redhat.com>
Cc: linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, catalin.marinas@arm.com,
will@kernel.org, akpm@linux-foundation.org, apopple@nvidia.com,
mark.rutland@arm.com, ryan.roberts@arm.com, rananta@google.com,
yangyicong@hisilicon.com, v-songbaohua@oppo.com,
yezhenyu2@huawei.com, yihyu@redhat.com, shan.gavin@gmail.com
Subject: Re: [PATCH] arm64: tlb: Fix TLBI RANGE operand
Date: Wed, 03 Apr 2024 14:44:21 +0100 [thread overview]
Message-ID: <87jzlesgqy.wl-maz@kernel.org> (raw)
In-Reply-To: <20134458-508b-4853-a962-c1b44009e8ed@redhat.com>
On Wed, 03 Apr 2024 12:37:30 +0100,
Gavin Shan <gshan@redhat.com> wrote:
>
> On 4/3/24 18:58, Marc Zyngier wrote:
> > On Wed, 03 Apr 2024 07:49:29 +0100,
> > Gavin Shan <gshan@redhat.com> wrote:
> >>
> >> KVM/arm64 relies on TLBI RANGE feature to flush TLBs when the dirty
> >> bitmap is collected by VMM and the corresponding PTEs need to be
> >> write-protected again. Unfortunately, the operand passed to the TLBI
> >> RANGE instruction isn't correctly sorted out by commit d1d3aa98b1d4
> >> ("arm64: tlb: Use the TLBI RANGE feature in arm64"). It leads to
> >> crash on the destination VM after live migration because some of the
> >> dirty pages are missed.
> >>
> >> For example, I have a VM where 8GB memory is assigned, starting from
> >> 0x40000000 (1GB). Note that the host has 4KB as the base page size.
> >> All TLBs for VM can be covered by one TLBI RANGE operation. However,
> >> I receives 0xffff708000040000 as the operand, which is wrong and the
> >> correct one should be 0x00007f8000040000. From the wrong operand, we
> >> have 3 and 1 for SCALE (bits[45:44) and NUM (bits943:39], only 1GB
> >> instead of 8GB memory is covered.
> >>
> >> Fix the macro __TLBI_RANGE_NUM() so that the correct NUM and TLBI
> >> RANGE operand are provided.
> >>
> >> Fixes: d1d3aa98b1d4 ("arm64: tlb: Use the TLBI RANGE feature in arm64")
> >> Cc: stable@kernel.org # v5.10+
> >> Reported-by: Yihuang Yu <yihyu@redhat.com>
> >> Signed-off-by: Gavin Shan <gshan@redhat.com>
> >> ---
> >> arch/arm64/include/asm/tlbflush.h | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
> >> index 3b0e8248e1a4..07c4fb4b82b4 100644
> >> --- a/arch/arm64/include/asm/tlbflush.h
> >> +++ b/arch/arm64/include/asm/tlbflush.h
> >> @@ -166,7 +166,7 @@ static inline unsigned long get_trans_granule(void)
> >> */
> >> #define TLBI_RANGE_MASK GENMASK_ULL(4, 0)
> >> #define __TLBI_RANGE_NUM(pages, scale) \
> >> - ((((pages) >> (5 * (scale) + 1)) & TLBI_RANGE_MASK) - 1)
> >> + ((((pages) >> (5 * (scale) + 1)) - 1) & TLBI_RANGE_MASK)
> >> /*
> >> * TLB Invalidation
> >
> > This looks pretty wrong, by the very definition of the comment that's
> > just above:
> >
> > <quote>
> > /*
> > * Generate 'num' values from -1 to 30 with -1 rejected by the
> > * __flush_tlb_range() loop below.
> > */
> > </quote>
> >
> > With your change, num can't ever be negative, and that breaks
> > __flush_tlb_range_op():
> >
> > <quote>
> > num = __TLBI_RANGE_NUM(pages, scale); \
> > if (num >= 0) { \
> > addr = __TLBI_VADDR_RANGE(start >> shift, asid, \
> > scale, num, tlb_level); \
> > __tlbi(r##op, addr); \
> > if (tlbi_user) \
> > __tlbi_user(r##op, addr); \
> > start += __TLBI_RANGE_PAGES(num, scale) << PAGE_SHIFT; \
> > pages -= __TLBI_RANGE_PAGES(num, scale); \
> > } \
> > scale--; \
> > </quote>
> >
> > We'll then shove whatever value we've found in the TLBI operation,
> > leading to unknown results instead of properly adjusting the scale to
> > issue a smaller invalidation.
> >
>
> Marc, thanks for your review and comments.
>
> Indeed, this patch is incomplete at least. I think we need __TLBI_RANGE_NUM()
> to return [-1 31] instead of [-1 30], to be consistent with MAX_TLBI_RANGE_PAGES.
> -1 will be rejected in the following loop. I'm not 100% sure if I did the correct
> calculation though.
>
> /*
> * Generate 'num' values in range [-1 31], but -1 will be rejected
> * by the __flush_tlb_range() loop below.
> */
> #define __TLBI_RANGE_NUM(pages, scale) \
> ({ \
> int __next = (pages) & (1ULL << (5 * (scale) + 6)); \
> int __mask = ((pages) >> (5 * (scale) + 1)) & TLBI_RANGE_MASK; \
> int __num = (((pages) >> (5 * (scale) + 1)) - 1) & \
> TLBI_RANGE_MASK; \
> (__next || __mask) ? __num : -1; \
> })
I'm afraid I don't follow the logic here, and it looks awfully
complex. I came up with something simpler with this:
diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index 3b0e8248e1a4..b3f1a9c61189 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -161,12 +161,18 @@ static inline unsigned long get_trans_granule(void)
#define MAX_TLBI_RANGE_PAGES __TLBI_RANGE_PAGES(31, 3)
/*
- * Generate 'num' values from -1 to 30 with -1 rejected by the
+ * Generate 'num' values from -1 to 31 with -1 rejected by the
* __flush_tlb_range() loop below.
*/
#define TLBI_RANGE_MASK GENMASK_ULL(4, 0)
-#define __TLBI_RANGE_NUM(pages, scale) \
- ((((pages) >> (5 * (scale) + 1)) & TLBI_RANGE_MASK) - 1)
+#define __TLBI_RANGE_NUM(pages, scale) \
+ ({ \
+ int __pages = min((pages), \
+ __TLBI_RANGE_PAGES(31, (scale))); \
+ int __numplus1 = __pages >> (5 * (scale) + 1); \
+ \
+ (__numplus1 - 1); \
+ })
/*
* TLB Invalidation
@@ -379,10 +385,6 @@ static inline void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
* 3. If there is 1 page remaining, flush it through non-range operations. Range
* operations can only span an even number of pages. We save this for last to
* ensure 64KB start alignment is maintained for the LPA2 case.
- *
- * Note that certain ranges can be represented by either num = 31 and
- * scale or num = 0 and scale + 1. The loop below favours the latter
- * since num is limited to 30 by the __TLBI_RANGE_NUM() macro.
*/
#define __flush_tlb_range_op(op, start, pages, stride, \
asid, tlb_level, tlbi_user, lpa2) \
>
> Alternatively, we can also limit the number of pages to be invalidated from
> arch/arm64/kvm/hyp/pgtable.c::kvm_tlb_flush_vmid_range() because the maximal
> capacity is (MAX_TLBI_RANGE_PAGES - 1) instead of MAX_TLBI_RANGE_PAGES, as
> the comments for __flush_tlb_range_nosync() say.
>
> - inval_pages = min(pages, MAX_TLBI_RANGE_PAGES);
> + inval_pages = min(pages, MAX_TLBI_RANGE_PAGES - 1);
>
>
> static inline void __flush_tlb_range_nosync(...)
> {
> :
> /*
> * When not uses TLB range ops, we can handle up to
> * (MAX_DVM_OPS - 1) pages;
> * When uses TLB range ops, we can handle up to
> * (MAX_TLBI_RANGE_PAGES - 1) pages.
> */
> if ((!system_supports_tlb_range() &&
> (end - start) >= (MAX_DVM_OPS * stride)) ||
> pages >= MAX_TLBI_RANGE_PAGES) {
> flush_tlb_mm(vma->vm_mm);
> return;
> }
> }
>
> Please let me know which way is better.
I would really prefer to fix the range stuff itself instead of
papering over the problem by reducing the reach of the range
invalidation.
>
> > I think the problem is that you are triggering NUM=31 and SCALE=3,
> > which the current code cannot handle as per the comment above
> > __flush_tlb_range_op() (we can't do NUM=30 and SCALE=4, obviously).
> >
>
> Yes, exactly.
>
> > Can you try the untested patch below?
> >
> >
> > diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
> > index 3b0e8248e1a4..b71a1cece802 100644
> > --- a/arch/arm64/include/asm/tlbflush.h
> > +++ b/arch/arm64/include/asm/tlbflush.h
> > @@ -379,10 +379,6 @@ static inline void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
> > * 3. If there is 1 page remaining, flush it through non-range operations. Range
> > * operations can only span an even number of pages. We save this for last to
> > * ensure 64KB start alignment is maintained for the LPA2 case.
> > - *
> > - * Note that certain ranges can be represented by either num = 31 and
> > - * scale or num = 0 and scale + 1. The loop below favours the latter
> > - * since num is limited to 30 by the __TLBI_RANGE_NUM() macro.
> > */
> > #define __flush_tlb_range_op(op, start, pages, stride, \
> > asid, tlb_level, tlbi_user, lpa2) \
> > @@ -407,6 +403,7 @@ do { \
> > \
> > num = __TLBI_RANGE_NUM(pages, scale); \
> > if (num >= 0) { \
> > + num += 1; \
> > addr = __TLBI_VADDR_RANGE(start >> shift, asid, \
> > scale, num, tlb_level); \
> > __tlbi(r##op, addr); \
> >
>
> Thanks, but I don't think it's going to work. The loop will be running infinitely
> because the condition 'if (num >= 0)' can't be met when @pages is 0x200000 when
> @scale is 3/2/1/0 until @scale becomes negative and positive again, but @scale
> isn't in range [0 3]. I ported the chunk of code to user-space and I can see this
> with added printf() messages.
Yeah, we lose num==0, which is silly. Hopefully the hack above helps a
bit.
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
WARNING: multiple messages have this Message-ID (diff)
From: Marc Zyngier <maz@kernel.org>
To: Gavin Shan <gshan@redhat.com>
Cc: linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, catalin.marinas@arm.com,
will@kernel.org, akpm@linux-foundation.org, apopple@nvidia.com,
mark.rutland@arm.com, ryan.roberts@arm.com, rananta@google.com,
yangyicong@hisilicon.com, v-songbaohua@oppo.com,
yezhenyu2@huawei.com, yihyu@redhat.com, shan.gavin@gmail.com
Subject: Re: [PATCH] arm64: tlb: Fix TLBI RANGE operand
Date: Wed, 03 Apr 2024 14:44:21 +0100 [thread overview]
Message-ID: <87jzlesgqy.wl-maz@kernel.org> (raw)
In-Reply-To: <20134458-508b-4853-a962-c1b44009e8ed@redhat.com>
On Wed, 03 Apr 2024 12:37:30 +0100,
Gavin Shan <gshan@redhat.com> wrote:
>
> On 4/3/24 18:58, Marc Zyngier wrote:
> > On Wed, 03 Apr 2024 07:49:29 +0100,
> > Gavin Shan <gshan@redhat.com> wrote:
> >>
> >> KVM/arm64 relies on TLBI RANGE feature to flush TLBs when the dirty
> >> bitmap is collected by VMM and the corresponding PTEs need to be
> >> write-protected again. Unfortunately, the operand passed to the TLBI
> >> RANGE instruction isn't correctly sorted out by commit d1d3aa98b1d4
> >> ("arm64: tlb: Use the TLBI RANGE feature in arm64"). It leads to
> >> crash on the destination VM after live migration because some of the
> >> dirty pages are missed.
> >>
> >> For example, I have a VM where 8GB memory is assigned, starting from
> >> 0x40000000 (1GB). Note that the host has 4KB as the base page size.
> >> All TLBs for VM can be covered by one TLBI RANGE operation. However,
> >> I receives 0xffff708000040000 as the operand, which is wrong and the
> >> correct one should be 0x00007f8000040000. From the wrong operand, we
> >> have 3 and 1 for SCALE (bits[45:44) and NUM (bits943:39], only 1GB
> >> instead of 8GB memory is covered.
> >>
> >> Fix the macro __TLBI_RANGE_NUM() so that the correct NUM and TLBI
> >> RANGE operand are provided.
> >>
> >> Fixes: d1d3aa98b1d4 ("arm64: tlb: Use the TLBI RANGE feature in arm64")
> >> Cc: stable@kernel.org # v5.10+
> >> Reported-by: Yihuang Yu <yihyu@redhat.com>
> >> Signed-off-by: Gavin Shan <gshan@redhat.com>
> >> ---
> >> arch/arm64/include/asm/tlbflush.h | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
> >> index 3b0e8248e1a4..07c4fb4b82b4 100644
> >> --- a/arch/arm64/include/asm/tlbflush.h
> >> +++ b/arch/arm64/include/asm/tlbflush.h
> >> @@ -166,7 +166,7 @@ static inline unsigned long get_trans_granule(void)
> >> */
> >> #define TLBI_RANGE_MASK GENMASK_ULL(4, 0)
> >> #define __TLBI_RANGE_NUM(pages, scale) \
> >> - ((((pages) >> (5 * (scale) + 1)) & TLBI_RANGE_MASK) - 1)
> >> + ((((pages) >> (5 * (scale) + 1)) - 1) & TLBI_RANGE_MASK)
> >> /*
> >> * TLB Invalidation
> >
> > This looks pretty wrong, by the very definition of the comment that's
> > just above:
> >
> > <quote>
> > /*
> > * Generate 'num' values from -1 to 30 with -1 rejected by the
> > * __flush_tlb_range() loop below.
> > */
> > </quote>
> >
> > With your change, num can't ever be negative, and that breaks
> > __flush_tlb_range_op():
> >
> > <quote>
> > num = __TLBI_RANGE_NUM(pages, scale); \
> > if (num >= 0) { \
> > addr = __TLBI_VADDR_RANGE(start >> shift, asid, \
> > scale, num, tlb_level); \
> > __tlbi(r##op, addr); \
> > if (tlbi_user) \
> > __tlbi_user(r##op, addr); \
> > start += __TLBI_RANGE_PAGES(num, scale) << PAGE_SHIFT; \
> > pages -= __TLBI_RANGE_PAGES(num, scale); \
> > } \
> > scale--; \
> > </quote>
> >
> > We'll then shove whatever value we've found in the TLBI operation,
> > leading to unknown results instead of properly adjusting the scale to
> > issue a smaller invalidation.
> >
>
> Marc, thanks for your review and comments.
>
> Indeed, this patch is incomplete at least. I think we need __TLBI_RANGE_NUM()
> to return [-1 31] instead of [-1 30], to be consistent with MAX_TLBI_RANGE_PAGES.
> -1 will be rejected in the following loop. I'm not 100% sure if I did the correct
> calculation though.
>
> /*
> * Generate 'num' values in range [-1 31], but -1 will be rejected
> * by the __flush_tlb_range() loop below.
> */
> #define __TLBI_RANGE_NUM(pages, scale) \
> ({ \
> int __next = (pages) & (1ULL << (5 * (scale) + 6)); \
> int __mask = ((pages) >> (5 * (scale) + 1)) & TLBI_RANGE_MASK; \
> int __num = (((pages) >> (5 * (scale) + 1)) - 1) & \
> TLBI_RANGE_MASK; \
> (__next || __mask) ? __num : -1; \
> })
I'm afraid I don't follow the logic here, and it looks awfully
complex. I came up with something simpler with this:
diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index 3b0e8248e1a4..b3f1a9c61189 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -161,12 +161,18 @@ static inline unsigned long get_trans_granule(void)
#define MAX_TLBI_RANGE_PAGES __TLBI_RANGE_PAGES(31, 3)
/*
- * Generate 'num' values from -1 to 30 with -1 rejected by the
+ * Generate 'num' values from -1 to 31 with -1 rejected by the
* __flush_tlb_range() loop below.
*/
#define TLBI_RANGE_MASK GENMASK_ULL(4, 0)
-#define __TLBI_RANGE_NUM(pages, scale) \
- ((((pages) >> (5 * (scale) + 1)) & TLBI_RANGE_MASK) - 1)
+#define __TLBI_RANGE_NUM(pages, scale) \
+ ({ \
+ int __pages = min((pages), \
+ __TLBI_RANGE_PAGES(31, (scale))); \
+ int __numplus1 = __pages >> (5 * (scale) + 1); \
+ \
+ (__numplus1 - 1); \
+ })
/*
* TLB Invalidation
@@ -379,10 +385,6 @@ static inline void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
* 3. If there is 1 page remaining, flush it through non-range operations. Range
* operations can only span an even number of pages. We save this for last to
* ensure 64KB start alignment is maintained for the LPA2 case.
- *
- * Note that certain ranges can be represented by either num = 31 and
- * scale or num = 0 and scale + 1. The loop below favours the latter
- * since num is limited to 30 by the __TLBI_RANGE_NUM() macro.
*/
#define __flush_tlb_range_op(op, start, pages, stride, \
asid, tlb_level, tlbi_user, lpa2) \
>
> Alternatively, we can also limit the number of pages to be invalidated from
> arch/arm64/kvm/hyp/pgtable.c::kvm_tlb_flush_vmid_range() because the maximal
> capacity is (MAX_TLBI_RANGE_PAGES - 1) instead of MAX_TLBI_RANGE_PAGES, as
> the comments for __flush_tlb_range_nosync() say.
>
> - inval_pages = min(pages, MAX_TLBI_RANGE_PAGES);
> + inval_pages = min(pages, MAX_TLBI_RANGE_PAGES - 1);
>
>
> static inline void __flush_tlb_range_nosync(...)
> {
> :
> /*
> * When not uses TLB range ops, we can handle up to
> * (MAX_DVM_OPS - 1) pages;
> * When uses TLB range ops, we can handle up to
> * (MAX_TLBI_RANGE_PAGES - 1) pages.
> */
> if ((!system_supports_tlb_range() &&
> (end - start) >= (MAX_DVM_OPS * stride)) ||
> pages >= MAX_TLBI_RANGE_PAGES) {
> flush_tlb_mm(vma->vm_mm);
> return;
> }
> }
>
> Please let me know which way is better.
I would really prefer to fix the range stuff itself instead of
papering over the problem by reducing the reach of the range
invalidation.
>
> > I think the problem is that you are triggering NUM=31 and SCALE=3,
> > which the current code cannot handle as per the comment above
> > __flush_tlb_range_op() (we can't do NUM=30 and SCALE=4, obviously).
> >
>
> Yes, exactly.
>
> > Can you try the untested patch below?
> >
> >
> > diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
> > index 3b0e8248e1a4..b71a1cece802 100644
> > --- a/arch/arm64/include/asm/tlbflush.h
> > +++ b/arch/arm64/include/asm/tlbflush.h
> > @@ -379,10 +379,6 @@ static inline void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
> > * 3. If there is 1 page remaining, flush it through non-range operations. Range
> > * operations can only span an even number of pages. We save this for last to
> > * ensure 64KB start alignment is maintained for the LPA2 case.
> > - *
> > - * Note that certain ranges can be represented by either num = 31 and
> > - * scale or num = 0 and scale + 1. The loop below favours the latter
> > - * since num is limited to 30 by the __TLBI_RANGE_NUM() macro.
> > */
> > #define __flush_tlb_range_op(op, start, pages, stride, \
> > asid, tlb_level, tlbi_user, lpa2) \
> > @@ -407,6 +403,7 @@ do { \
> > \
> > num = __TLBI_RANGE_NUM(pages, scale); \
> > if (num >= 0) { \
> > + num += 1; \
> > addr = __TLBI_VADDR_RANGE(start >> shift, asid, \
> > scale, num, tlb_level); \
> > __tlbi(r##op, addr); \
> >
>
> Thanks, but I don't think it's going to work. The loop will be running infinitely
> because the condition 'if (num >= 0)' can't be met when @pages is 0x200000 when
> @scale is 3/2/1/0 until @scale becomes negative and positive again, but @scale
> isn't in range [0 3]. I ported the chunk of code to user-space and I can see this
> with added printf() messages.
Yeah, we lose num==0, which is silly. Hopefully the hack above helps a
bit.
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
next prev parent reply other threads:[~2024-04-03 13:47 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-03 6:49 [PATCH] arm64: tlb: Fix TLBI RANGE operand Gavin Shan
2024-04-03 6:49 ` Gavin Shan
2024-04-03 8:58 ` Marc Zyngier
2024-04-03 8:58 ` Marc Zyngier
2024-04-03 11:37 ` Gavin Shan
2024-04-03 11:37 ` Gavin Shan
2024-04-03 13:44 ` Marc Zyngier [this message]
2024-04-03 13:44 ` Marc Zyngier
2024-04-04 4:35 ` Gavin Shan
2024-04-04 4:35 ` Gavin Shan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87jzlesgqy.wl-maz@kernel.org \
--to=maz@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=apopple@nvidia.com \
--cc=catalin.marinas@arm.com \
--cc=gshan@redhat.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=rananta@google.com \
--cc=ryan.roberts@arm.com \
--cc=shan.gavin@gmail.com \
--cc=v-songbaohua@oppo.com \
--cc=will@kernel.org \
--cc=yangyicong@hisilicon.com \
--cc=yezhenyu2@huawei.com \
--cc=yihyu@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.