From: Gavin Shan <gshan@redhat.com>
To: Marc Zyngier <maz@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, catalin.marinas@arm.com,
will@kernel.org, akpm@linux-foundation.org, apopple@nvidia.com,
mark.rutland@arm.com, ryan.roberts@arm.com, rananta@google.com,
yangyicong@hisilicon.com, v-songbaohua@oppo.com,
yezhenyu2@huawei.com, yihyu@redhat.com, shan.gavin@gmail.com
Subject: Re: [PATCH] arm64: tlb: Fix TLBI RANGE operand
Date: Thu, 4 Apr 2024 14:35:58 +1000
Message-ID: <bd502c7a-e2bf-4cb3-b432-7dd472f1c82e@redhat.com>
In-Reply-To: <87jzlesgqy.wl-maz@kernel.org>
On 4/3/24 23:44, Marc Zyngier wrote:
> On Wed, 03 Apr 2024 12:37:30 +0100,
> Gavin Shan <gshan@redhat.com> wrote:
>>
>> On 4/3/24 18:58, Marc Zyngier wrote:
>>> On Wed, 03 Apr 2024 07:49:29 +0100,
>>> Gavin Shan <gshan@redhat.com> wrote:
>>>>
>>>> KVM/arm64 relies on the TLBI RANGE feature to flush TLBs when the dirty
>>>> bitmap is collected by the VMM and the corresponding PTEs need to be
>>>> write-protected again. Unfortunately, the operand passed to the TLBI
>>>> RANGE instruction isn't correctly sorted out by commit d1d3aa98b1d4
>>>> ("arm64: tlb: Use the TLBI RANGE feature in arm64"). This leads to a
>>>> crash on the destination VM after live migration because some of the
>>>> dirty pages are missed.
>>>>
>>>> For example, I have a VM with 8GB of memory assigned, starting from
>>>> 0x40000000 (1GB). Note that the host has 4KB as the base page size.
>>>> All TLBs for the VM can be covered by one TLBI RANGE operation. However,
>>>> I received 0xffff708000040000 as the operand, which is wrong; the
>>>> correct one should be 0x00007f8000040000. From the wrong operand, we
>>>> have 3 for SCALE (bits[45:44]) and 1 for NUM (bits[43:39]), so only
>>>> 1GB instead of 8GB of memory is covered.
>>>>
>>>> Fix the macro __TLBI_RANGE_NUM() so that the correct NUM and TLBI
>>>> RANGE operand are provided.
>>>>
>>>> Fixes: d1d3aa98b1d4 ("arm64: tlb: Use the TLBI RANGE feature in arm64")
>>>> Cc: stable@kernel.org # v5.10+
>>>> Reported-by: Yihuang Yu <yihyu@redhat.com>
>>>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>>>> ---
>>>> arch/arm64/include/asm/tlbflush.h | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
>>>> index 3b0e8248e1a4..07c4fb4b82b4 100644
>>>> --- a/arch/arm64/include/asm/tlbflush.h
>>>> +++ b/arch/arm64/include/asm/tlbflush.h
>>>> @@ -166,7 +166,7 @@ static inline unsigned long get_trans_granule(void)
>>>> */
>>>> #define TLBI_RANGE_MASK GENMASK_ULL(4, 0)
>>>> #define __TLBI_RANGE_NUM(pages, scale) \
>>>> - ((((pages) >> (5 * (scale) + 1)) & TLBI_RANGE_MASK) - 1)
>>>> + ((((pages) >> (5 * (scale) + 1)) - 1) & TLBI_RANGE_MASK)
>>>> /*
>>>> * TLB Invalidation
>>>
>>> This looks pretty wrong, by the very definition of the comment that's
>>> just above:
>>>
>>> <quote>
>>> /*
>>> * Generate 'num' values from -1 to 30 with -1 rejected by the
>>> * __flush_tlb_range() loop below.
>>> */
>>> </quote>
>>>
>>> With your change, num can't ever be negative, and that breaks
>>> __flush_tlb_range_op():
>>>
>>> <quote>
>>> num = __TLBI_RANGE_NUM(pages, scale); \
>>> if (num >= 0) { \
>>> addr = __TLBI_VADDR_RANGE(start >> shift, asid, \
>>> scale, num, tlb_level); \
>>> __tlbi(r##op, addr); \
>>> if (tlbi_user) \
>>> __tlbi_user(r##op, addr); \
>>> start += __TLBI_RANGE_PAGES(num, scale) << PAGE_SHIFT; \
>>> pages -= __TLBI_RANGE_PAGES(num, scale); \
>>> } \
>>> scale--; \
>>> </quote>
>>>
>>> We'll then shove whatever value we've found in the TLBI operation,
>>> leading to unknown results instead of properly adjusting the scale to
>>> issue a smaller invalidation.
>>>
>>
>> Marc, thanks for your review and comments.
>>
>> Indeed, this patch is incomplete, at the very least. I think we need
>> __TLBI_RANGE_NUM() to return values in [-1, 31] instead of [-1, 30], to be
>> consistent with MAX_TLBI_RANGE_PAGES; -1 will still be rejected by the
>> following loop. I'm not 100% sure I got the calculation right, though.
>>
>> /*
>> * Generate 'num' values in range [-1 31], but -1 will be rejected
>> * by the __flush_tlb_range() loop below.
>> */
>> #define __TLBI_RANGE_NUM(pages, scale) \
>> ({ \
>> int __next = (pages) & (1ULL << (5 * (scale) + 6)); \
>> int __mask = ((pages) >> (5 * (scale) + 1)) & TLBI_RANGE_MASK; \
>> int __num = (((pages) >> (5 * (scale) + 1)) - 1) & \
>> TLBI_RANGE_MASK; \
>> (__next || __mask) ? __num : -1; \
>> })
>
> I'm afraid I don't follow the logic here, and it looks awfully
> complex. I came up with something simpler with this:
>
> diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
> index 3b0e8248e1a4..b3f1a9c61189 100644
> --- a/arch/arm64/include/asm/tlbflush.h
> +++ b/arch/arm64/include/asm/tlbflush.h
> @@ -161,12 +161,18 @@ static inline unsigned long get_trans_granule(void)
> #define MAX_TLBI_RANGE_PAGES __TLBI_RANGE_PAGES(31, 3)
>
> /*
> - * Generate 'num' values from -1 to 30 with -1 rejected by the
> + * Generate 'num' values from -1 to 31 with -1 rejected by the
> * __flush_tlb_range() loop below.
> */
> #define TLBI_RANGE_MASK GENMASK_ULL(4, 0)
> -#define __TLBI_RANGE_NUM(pages, scale) \
> - ((((pages) >> (5 * (scale) + 1)) & TLBI_RANGE_MASK) - 1)
> +#define __TLBI_RANGE_NUM(pages, scale) \
> + ({ \
> + int __pages = min((pages), \
> + __TLBI_RANGE_PAGES(31, (scale))); \
> + int __numplus1 = __pages >> (5 * (scale) + 1); \
> + \
> + (__numplus1 - 1); \
> + })
>
Thanks, Marc. Both your change and mine work, so my issue can be fixed either way.
Your version is certainly simpler and clearer. I will integrate your change into
v2, with TLBI_RANGE_MASK dropped since nothing uses it any more.
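
For reference, below is a rough standalone sketch along the lines of what I
tested in user space (not the exact program; TLBI_RANGE_MASK and
__TLBI_RANGE_PAGES are copied by hand, your min()-based macro is open-coded,
and the OLD_RANGE_NUM/NEW_RANGE_NUM names are just for this sketch):

#include <stdio.h>

#define TLBI_RANGE_MASK			0x1fUL
#define __TLBI_RANGE_PAGES(num, scale)	((unsigned long)((num) + 1) << (5 * (scale) + 1))

/* current (broken) calculation of 'num' */
#define OLD_RANGE_NUM(pages, scale)	\
	((int)(((pages) >> (5 * (scale) + 1)) & TLBI_RANGE_MASK) - 1)

/* your proposal: clamp 'pages' to what the scale can cover, then derive 'num' */
#define NEW_RANGE_NUM(pages, scale)					\
	({								\
		unsigned long __pages = (pages);			\
		if (__pages > __TLBI_RANGE_PAGES(31, (scale)))		\
			__pages = __TLBI_RANGE_PAGES(31, (scale));	\
		(int)(__pages >> (5 * (scale) + 1)) - 1;		\
	})

int main(void)
{
	unsigned long pages = 0x200000UL;	/* 8GB guest, 4KB base pages */
	int scale;

	for (scale = 3; scale >= 0; scale--)
		printf("scale=%d: old num=%d, new num=%d\n", scale,
		       OLD_RANGE_NUM(pages, scale), NEW_RANGE_NUM(pages, scale));

	return 0;
}

For 0x200000 pages the old macro returns -1 at every scale in [0, 3], which is
how @scale ends up out of range in the first place, while your version already
returns 31 at scale 3, so the real loop covers the whole 8GB with a single
TLBI RANGE operation.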
> /*
> * TLB Invalidation
> @@ -379,10 +385,6 @@ static inline void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
> * 3. If there is 1 page remaining, flush it through non-range operations. Range
> * operations can only span an even number of pages. We save this for last to
> * ensure 64KB start alignment is maintained for the LPA2 case.
> - *
> - * Note that certain ranges can be represented by either num = 31 and
> - * scale or num = 0 and scale + 1. The loop below favours the latter
> - * since num is limited to 30 by the __TLBI_RANGE_NUM() macro.
> */
> #define __flush_tlb_range_op(op, start, pages, stride, \
> asid, tlb_level, tlbi_user, lpa2) \
>
>>
>> Alternatively, we could limit the number of pages to be invalidated in
>> arch/arm64/kvm/hyp/pgtable.c::kvm_tlb_flush_vmid_range(), because the maximum
>> capacity is (MAX_TLBI_RANGE_PAGES - 1) pages instead of MAX_TLBI_RANGE_PAGES,
>> as the comment in __flush_tlb_range_nosync() says.
>>
>> - inval_pages = min(pages, MAX_TLBI_RANGE_PAGES);
>> + inval_pages = min(pages, MAX_TLBI_RANGE_PAGES - 1);
>>
>>
>> static inline void __flush_tlb_range_nosync(...)
>> {
>> :
>> /*
>> * When not uses TLB range ops, we can handle up to
>> * (MAX_DVM_OPS - 1) pages;
>> * When uses TLB range ops, we can handle up to
>> * (MAX_TLBI_RANGE_PAGES - 1) pages.
>> */
>> if ((!system_supports_tlb_range() &&
>> (end - start) >= (MAX_DVM_OPS * stride)) ||
>> pages >= MAX_TLBI_RANGE_PAGES) {
>> flush_tlb_mm(vma->vm_mm);
>> return;
>> }
>> }
>>
>> Please let me know which way is better.
>
> I would really prefer to fix the range stuff itself instead of
> papering over the problem by reducing the reach of the range
> invalidation.
>
Yes, agreed.
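
Just to note down the arithmetic for v2 (my own back-of-envelope numbers): with
4KB base pages, the guest's 8GB is 8GB / 4KB = 0x200000 pages, and
MAX_TLBI_RANGE_PAGES = __TLBI_RANGE_PAGES(31, 3) = (31 + 1) << (5 * 3 + 1) =
0x200000 pages as well. So the range is exactly representable by one operation
with SCALE=3 and NUM=31, which is precisely the encoding the old [-1, 30] limit
on 'num' could never generate.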
>>
>>> I think the problem is that you are triggering NUM=31 and SCALE=3,
>>> which the current code cannot handle as per the comment above
>>> __flush_tlb_range_op() (we can't do NUM=30 and SCALE=4, obviously).
>>>
>>
>> Yes, exactly.
>>
>>> Can you try the untested patch below?
>>>
>>>
>>> diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
>>> index 3b0e8248e1a4..b71a1cece802 100644
>>> --- a/arch/arm64/include/asm/tlbflush.h
>>> +++ b/arch/arm64/include/asm/tlbflush.h
>>> @@ -379,10 +379,6 @@ static inline void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
>>> * 3. If there is 1 page remaining, flush it through non-range operations. Range
>>> * operations can only span an even number of pages. We save this for last to
>>> * ensure 64KB start alignment is maintained for the LPA2 case.
>>> - *
>>> - * Note that certain ranges can be represented by either num = 31 and
>>> - * scale or num = 0 and scale + 1. The loop below favours the latter
>>> - * since num is limited to 30 by the __TLBI_RANGE_NUM() macro.
>>> */
>>> #define __flush_tlb_range_op(op, start, pages, stride, \
>>> asid, tlb_level, tlbi_user, lpa2) \
>>> @@ -407,6 +403,7 @@ do { \
>>> \
>>> num = __TLBI_RANGE_NUM(pages, scale); \
>>> if (num >= 0) { \
>>> + num += 1; \
>>> addr = __TLBI_VADDR_RANGE(start >> shift, asid, \
>>> scale, num, tlb_level); \
>>> __tlbi(r##op, addr); \
>>>
>>
>> Thanks, but I don't think it's going to work. The loop runs infinitely because
>> the condition 'if (num >= 0)' can never be met when @pages is 0x200000: it fails
>> for @scale values 3/2/1/0, so @scale keeps decrementing below zero, out of the
>> valid [0, 3] range. I ported the chunk of code to user space and can see this
>> with added printf() messages.
>
> Yeah, we lose num==0, which is silly. Hopefully the hack above helps a
> bit.
>
Yes, the hack works. Thank you again.
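
For completeness, decoding the expected operand from the commit message by hand
(my own read of the range-TLBI operand layout, so please double-check):
0x00007f8000040000 gives TG=0b01 (4KB granule, bits[47:46]), SCALE=3
(bits[45:44]), NUM=31 (bits[43:39]), TTL=0 (bits[38:37]) and BADDR=0x40000
(bits[36:0]), i.e. base IPA 0x40000000 and (31 + 1) << (5 * 3 + 1) = 0x200000
pages, the full 8GB in one operation, consistent with the correct operand I
quoted in the commit message.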
Thanks,
Gavin