From: Alexandru Elisei <alexandru.elisei@arm.com>
To: Marc Zyngier <maz@kernel.org>
Cc: kvmarm@lists.cs.columbia.edu, shan.gavin@gmail.com
Subject: Re: [PATCH] arm64/kvm: Fix zapping stage2 page table wrongly
Date: Wed, 2 Sep 2020 14:58:55 +0100
Message-ID: <058dda6c-61d7-c16a-60ac-82b67496c8dd@arm.com>
In-Reply-To: <4fb3f8e03e1fb3bb93113037fa62fceb@kernel.org>
Hi Marc,
On 9/2/20 1:04 PM, Marc Zyngier wrote:
> On 2020-09-02 12:53, Alexandru Elisei wrote:
>> Hi Marc,
>>
>> On 9/2/20 12:10 PM, Marc Zyngier wrote:
>>> On 2020-09-02 11:59, Alexandru Elisei wrote:
>>>> Hi,
>>>>
>>>> On 8/22/20 3:44 AM, Gavin Shan wrote:
>>>>> Depending on the kernel configuration, PUD_SIZE could be equal to
>>>>> PMD_SIZE. For example, both of them are 512MB with the following
>>>>> kernel configuration. In this case, both PUD and PMD are folded
>>>>> to PGD.
>>>>>
>>>>> CONFIG_ARM64_64K_PAGES y
>>>>> CONFIG_ARM64_VA_BITS 42
>>>>> CONFIG_PGTABLE_LEVELS 2
>>>>>
>>>>> With the above configuration, the stage2 PUD is used to backup the
>>>>> 512MB huge page when the stage2 mapping is built. During the mapping,
>>>>> the PUD and its subordinate levels of page table entries are unmapped
>>>>> if the PUD is present and not huge page sensitive in stage2_set_pud_huge().
>>>>> Unfortunately, the @addr isn't aligned to S2_PUD_SIZE and wrong page table
>>>>> entries are zapped. It eventually leads to PUD's present bit can't be
>>>>> cleared successfully and infinite loop in stage2_set_pud_huge().
>>>>>
>>>>> This fixes the issue by checking with S2_{PUD, PMD}_SIZE instead of
>>>>> {PUD, PMD}_SIZE to determine if stage2 PUD or PMD is used to back the
>>>>> huge page. For this particular case, the stage2 PMD entry should be
>>>>> used to backup the 512MB huge page with stage2_set_pmd_huge().
>>>>
>>>> I can reproduce this on my rockpro64 using kvmtool.
>>>>
>>>> I see two issues here: first, PUD_SIZE = 512MB, but S2_PUD_SIZE = 4TB (checked
>>>> using printk), and second, stage2_set_pud_huge() hangs. I'm working on
>>>> debugging them.
>>>
>>> I have this as an immediate fix for the set_pud_huge hang, tested
>>> on Seattle with 64k/42bits.
>>>
>>> I can't wait to see the back of this code...
>>
>> The problem is in stage2_set_pud_huge(), because kvm_stage2_has_pmd() returns
>> false (CONFIG_PGTABLE_LEVELS = 2):
>>
>> pudp = stage2_get_pud(mmu, cache, addr);
>> VM_BUG_ON(!pudp);
>>
>> old_pud = *pudp;
>>
>> [..]
>>
>> // Returns 1 because !kvm_stage2_has_pmd()
>> if (stage2_pud_present(kvm, old_pud)) {
>> /*
>> * If we already have table level mapping for this block, unmap
>> * the range for this block and retry.
>> */
>> if (!stage2_pud_huge(kvm, old_pud)) { // Always true because !kvm_stage2_has_pmd()
>> unmap_stage2_range(mmu, addr & S2_PUD_MASK, S2_PUD_SIZE);
>> goto retry;
>> }
>>
>> And we end up jumping back to retry forever. IMO, in user_mem_abort(), if
>> PUD_SIZE == PMD_SIZE, we should try to map PMD_SIZE instead of PUD_SIZE. Maybe
>> something like this?
>
> Err... If PUD_SIZE == PMD_SIZE, what difference does it make to map
> one or the other?
>
>>
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index ba00bcc0c884..178267dec511 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -1886,8 +1886,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu,
>> phys_addr_t fault_ipa,
>> * As for PUD huge maps, we must make sure that we have at least
>> * 3 levels, i.e, PMD is not folded.
>> */
>> - if (vma_pagesize == PMD_SIZE ||
>> - (vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm)))
>> + if (vma_pagesize == PUD_SIZE && !kvm_stage2_has_pmd(kvm))
>> + vma_pagesize = PMD_SIZE;
>> +
>> + if (vma_pagesize == PUD_SIZE || vma_pagesize == PUD_SIZE)
>> gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >>
>> PAGE_SHIFT;
>> mmap_read_unlock(current->mm);
>
> I don't think this solves anything in the 2 level case. The gist of
> the issue is that if we go on the PUD path, we end-up computing the
> wrong offset for the entry and either loop or do something silly.
> In either case, we read/write something we have no business touching.
In my testing, I seem to be getting the right offset. For fault_ipa 0x8000 0000,
in stage2_set_pud_huge I get pudp = mmu->pgd + 0x20, which looks like the correct
address for the table entry (IPA[41:29] = 4). That address is valid because in
kvm_init_stage2_mmu we allocate stage2_pgd_size = 16K (40 bits IPA so we only need
to map bits [39:29] in the level 2 table).
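To make the arithmetic above easy to check, here is a minimal standalone sketch of it. It is not kernel code: the macro and function names are hypothetical, and it assumes 64K pages, 8-byte descriptors and a 40-bit IPA space, with the 0x20 above read as a byte offset from the start of the table.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry (hypothetical names): 64K pages, 8-byte descriptors. */
#define TOY_PAGE_SHIFT    16
#define TOY_DESC_SIZE     8u
#define TOY_ENTRIES_SHIFT (TOY_PAGE_SHIFT - 3)                  /* 13 bits per level */
#define TOY_BLOCK_SHIFT   (TOY_PAGE_SHIFT + TOY_ENTRIES_SHIFT)  /* 29: 512MB blocks */
#define TOY_IPA_BITS      40

/* Entries in the first stage 2 level: one per 512MB block of IPA space. */
static inline uint64_t toy_s2_top_entries(void)
{
	return UINT64_C(1) << (TOY_IPA_BITS - TOY_BLOCK_SHIFT);
}

/* Index of the entry that covers @ipa. */
static inline uint64_t toy_s2_top_index(uint64_t ipa)
{
	assert(ipa < (UINT64_C(1) << TOY_IPA_BITS));
	return (ipa >> TOY_BLOCK_SHIFT) & (toy_s2_top_entries() - 1);
}

/* Byte offset of that entry from the start of the table. */
static inline uint64_t toy_s2_top_byte_offset(uint64_t ipa)
{
	return toy_s2_top_index(ipa) * TOY_DESC_SIZE;
}

/* Size in bytes of the whole first-level table. */
static inline uint64_t toy_s2_top_table_bytes(void)
{
	return toy_s2_top_entries() * TOY_DESC_SIZE;
}
```

For fault_ipa 0x80000000 this gives index 4 and byte offset 0x20, and the table size comes out at 16K, which matches what I see.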
The reason why I am seeing the hang is what I tried to explain above:
kvm_stage2_has_pmd() is always false, which means that in stage2_set_pud_huge,
stage2_pud_present is always true and stage2_pud_huge is always false, which leads
to jumping to the retry label in an infinite loop.
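The loop can be modelled with a small toy program. This is only a sketch of the control flow described above, not the kernel's implementation: the struct, the toy_* helpers and the max_retries cap are all hypothetical, and the cap exists only so that the model terminates.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy entry, not the kernel's pud_t. */
struct toy_pud {
	bool valid;	/* entry points at a table or a block */
	bool block;	/* entry is a block (huge) mapping */
};

/* With the PMD folded, the predicate ignores the entry and reports present. */
static bool toy_pud_present(bool has_pmd, const struct toy_pud *pud)
{
	return has_pmd ? pud->valid : true;
}

/* With the PMD folded, the predicate ignores the entry and reports not huge. */
static bool toy_pud_huge(bool has_pmd, const struct toy_pud *pud)
{
	return has_pmd ? (pud->valid && pud->block) : false;
}

static void toy_unmap(struct toy_pud *pud)
{
	pud->valid = false;
	pud->block = false;
}

/*
 * Returns the number of retries needed before the block entry could be
 * installed, or -1 if max_retries was reached (the real code has no cap).
 */
static int toy_set_pud_huge(bool has_pmd, struct toy_pud *pud, int max_retries)
{
	int retries = 0;

	assert(max_retries >= 0);
retry:
	if (toy_pud_present(has_pmd, pud) && !toy_pud_huge(has_pmd, pud)) {
		if (retries == max_retries)
			return -1;
		/* Table mapping in the way: unmap it and try again. */
		toy_unmap(pud);
		retries++;
		goto retry;
	}

	pud->valid = true;
	pud->block = true;
	return retries;
}
```

With has_pmd true, a table entry is unmapped once and the second pass installs the block. With has_pmd false, unmapping changes nothing that the two predicates look at, so every pass takes the retry branch.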
>
> Gavin sidesteps the problem by using a fantasy PUD size, while I
> catch it by explicitly checking for the PMD == PUD case.
My mistake, I shouldn't have posted a diff without testing it. You're right, my
change fixes nothing because vma_pagesize still equals PUD_SIZE == PMD_SIZE.
I tested your patch and the hang goes away. If you decide to keep your patch as-is:
Tested-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
As an aside, a comment explaining why the check for PUD_SIZE != PMD_SIZE is needed
could make it easier for other people to understand what is going on.
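As an illustration of the kind of comment I have in mind, here is a standalone sketch. It is not your patch and not the code in user_mem_abort(): the enum, the function name and its parameters are hypothetical, and the sizes are passed in as arguments so that the example compiles on its own.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names, for illustration only. */
enum toy_s2_level { TOY_S2_MAP_PAGE, TOY_S2_MAP_PMD, TOY_S2_MAP_PUD };

static enum toy_s2_level toy_pick_level(uint64_t vma_pagesize,
					uint64_t pmd_size, uint64_t pud_size)
{
	assert(pmd_size <= pud_size);

	/*
	 * When the PMD level is folded (e.g. 64K pages with 2 levels),
	 * PUD_SIZE == PMD_SIZE. A huge page of that size must be handled
	 * on the PMD path, because the stage 2 PUD helpers treat every
	 * entry as present and never as huge in that configuration.
	 */
	if (vma_pagesize == pud_size && pud_size != pmd_size)
		return TOY_S2_MAP_PUD;

	if (vma_pagesize == pmd_size)
		return TOY_S2_MAP_PMD;

	return TOY_S2_MAP_PAGE;
}
```

With pmd_size == pud_size == 512MB, a 512MB huge page is routed to the PMD path, which is the case this thread is about.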
Thanks,
Alex
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm