* Re: [RFC PATCH v2 2/2] kvm/arm64: Try stage2 block mapping for host device MMIO
@ 2021-04-07 13:18 ` Marc Zyngier
0 siblings, 0 replies; 24+ messages in thread
From: Marc Zyngier @ 2021-04-07 13:18 UTC (permalink / raw)
To: Keqian Zhu
Cc: linux-kernel, linux-arm-kernel, kvm, kvmarm, Will Deacon,
Catalin Marinas, Mark Rutland, James Morse, Suzuki K Poulose,
Julien Thierry, wanghaibin.wang, jiangkunkun, yuzenghui,
lushenming
On Tue, 16 Mar 2021 13:43:38 +0000,
Keqian Zhu <zhukeqian1@huawei.com> wrote:
>
> The MMIO region of a device maybe huge (GB level), try to use
> block mapping in stage2 to speedup both map and unmap.
>
> Compared to normal memory mapping, we should consider two more
> points when try block mapping for MMIO region:
>
> 1. For normal memory mapping, the PA(host physical address) and
> HVA have same alignment within PUD_SIZE or PMD_SIZE when we use
> the HVA to request hugepage, so we don't need to consider PA
> alignment when verifing block mapping. But for device memory
> mapping, the PA and HVA may have different alignment.
>
> 2. For normal memory mapping, we are sure hugepage size properly
> fit into vma, so we don't check whether the mapping size exceeds
> the boundary of vma. But for device memory mapping, we should pay
> attention to this.
>
> This adds device_rough_page_shift() to check these two points when
> selecting block mapping size.
>
> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
> ---
>
> Mainly for RFC, not fully tested. I will fully test it when the
> code logic is well accepted.
>
> ---
> arch/arm64/kvm/mmu.c | 42 ++++++++++++++++++++++++++++++++++++++----
> 1 file changed, 38 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index c59af5ca01b0..224aa15eb4d9 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -624,6 +624,36 @@ static void kvm_send_hwpoison_signal(unsigned long address, short lsb)
> send_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb, current);
> }
>
> +/*
> + * Find a mapping size that properly insides the intersection of vma and
> + * memslot. And hva and pa have the same alignment to this mapping size.
> + * It's rough because there are still other restrictions, which will be
> + * checked by the following fault_supports_stage2_huge_mapping().
I don't think these restrictions make complete sense to me. If this is
a PFNMAP VMA, we should use the biggest mapping size that covers the
VMA, and not more than the VMA.
> + */
> +static short device_rough_page_shift(struct kvm_memory_slot *memslot,
> + struct vm_area_struct *vma,
> + unsigned long hva)
> +{
> + size_t size = memslot->npages * PAGE_SIZE;
> + hva_t sec_start = max(memslot->userspace_addr, vma->vm_start);
> + hva_t sec_end = min(memslot->userspace_addr + size, vma->vm_end);
> + phys_addr_t pa = (vma->vm_pgoff << PAGE_SHIFT) + (hva - vma->vm_start);
> +
> +#ifndef __PAGETABLE_PMD_FOLDED
> + if ((hva & (PUD_SIZE - 1)) == (pa & (PUD_SIZE - 1)) &&
> + ALIGN_DOWN(hva, PUD_SIZE) >= sec_start &&
> + ALIGN(hva, PUD_SIZE) <= sec_end)
> + return PUD_SHIFT;
> +#endif
> +
> + if ((hva & (PMD_SIZE - 1)) == (pa & (PMD_SIZE - 1)) &&
> + ALIGN_DOWN(hva, PMD_SIZE) >= sec_start &&
> + ALIGN(hva, PMD_SIZE) <= sec_end)
> + return PMD_SHIFT;
> +
> + return PAGE_SHIFT;
> +}
> +
> static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
> unsigned long hva,
> unsigned long map_size)
> @@ -769,7 +799,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> return -EFAULT;
> }
>
> - /* Let's check if we will get back a huge page backed by hugetlbfs */
> + /*
> + * Let's check if we will get back a huge page backed by hugetlbfs, or
> + * get block mapping for device MMIO region.
> + */
> mmap_read_lock(current->mm);
> vma = find_vma_intersection(current->mm, hva, hva + 1);
> if (unlikely(!vma)) {
> @@ -780,11 +813,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>
> if (is_vm_hugetlb_page(vma))
> vma_shift = huge_page_shift(hstate_vma(vma));
> + else if (vma->vm_flags & VM_PFNMAP)
> + vma_shift = device_rough_page_shift(memslot, vma, hva);
> else
> vma_shift = PAGE_SHIFT;
>
> - if (logging_active ||
> - (vma->vm_flags & VM_PFNMAP)) {
> + if (logging_active) {
> force_pte = true;
> vma_shift = PAGE_SHIFT;
But why should we downgrade to page-size mappings if logging? This is
a device, and you aren't moving the device around, are you? Or is your
device actually memory with a device mapping that you are trying to
migrate?
> }
> @@ -855,7 +889,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>
> if (kvm_is_device_pfn(pfn)) {
> device = true;
> - force_pte = true;
> + force_pte = (vma_pagesize == PAGE_SIZE);
> } else if (logging_active && !write_fault) {
> /*
> * Only actually map the page as writable if this was a write
> --
> 2.19.1
>
>
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: [RFC PATCH v2 2/2] kvm/arm64: Try stage2 block mapping for host device MMIO
@ 2021-04-07 13:18 ` Marc Zyngier
0 siblings, 0 replies; 24+ messages in thread
From: Marc Zyngier @ 2021-04-07 13:18 UTC (permalink / raw)
To: Keqian Zhu
Cc: linux-kernel, linux-arm-kernel, kvm, kvmarm, Will Deacon,
Catalin Marinas, Mark Rutland, James Morse, Suzuki K Poulose,
Julien Thierry, wanghaibin.wang, jiangkunkun, yuzenghui,
lushenming
On Tue, 16 Mar 2021 13:43:38 +0000,
Keqian Zhu <zhukeqian1@huawei.com> wrote:
>
> The MMIO region of a device maybe huge (GB level), try to use
> block mapping in stage2 to speedup both map and unmap.
>
> Compared to normal memory mapping, we should consider two more
> points when try block mapping for MMIO region:
>
> 1. For normal memory mapping, the PA(host physical address) and
> HVA have same alignment within PUD_SIZE or PMD_SIZE when we use
> the HVA to request hugepage, so we don't need to consider PA
> alignment when verifing block mapping. But for device memory
> mapping, the PA and HVA may have different alignment.
>
> 2. For normal memory mapping, we are sure hugepage size properly
> fit into vma, so we don't check whether the mapping size exceeds
> the boundary of vma. But for device memory mapping, we should pay
> attention to this.
>
> This adds device_rough_page_shift() to check these two points when
> selecting block mapping size.
>
> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
> ---
>
> Mainly for RFC, not fully tested. I will fully test it when the
> code logic is well accepted.
>
> ---
> arch/arm64/kvm/mmu.c | 42 ++++++++++++++++++++++++++++++++++++++----
> 1 file changed, 38 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index c59af5ca01b0..224aa15eb4d9 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -624,6 +624,36 @@ static void kvm_send_hwpoison_signal(unsigned long address, short lsb)
> send_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb, current);
> }
>
> +/*
> + * Find a mapping size that properly insides the intersection of vma and
> + * memslot. And hva and pa have the same alignment to this mapping size.
> + * It's rough because there are still other restrictions, which will be
> + * checked by the following fault_supports_stage2_huge_mapping().
I don't think these restrictions make complete sense to me. If this is
a PFNMAP VMA, we should use the biggest mapping size that covers the
VMA, and not more than the VMA.
> + */
> +static short device_rough_page_shift(struct kvm_memory_slot *memslot,
> + struct vm_area_struct *vma,
> + unsigned long hva)
> +{
> + size_t size = memslot->npages * PAGE_SIZE;
> + hva_t sec_start = max(memslot->userspace_addr, vma->vm_start);
> + hva_t sec_end = min(memslot->userspace_addr + size, vma->vm_end);
> + phys_addr_t pa = (vma->vm_pgoff << PAGE_SHIFT) + (hva - vma->vm_start);
> +
> +#ifndef __PAGETABLE_PMD_FOLDED
> + if ((hva & (PUD_SIZE - 1)) == (pa & (PUD_SIZE - 1)) &&
> + ALIGN_DOWN(hva, PUD_SIZE) >= sec_start &&
> + ALIGN(hva, PUD_SIZE) <= sec_end)
> + return PUD_SHIFT;
> +#endif
> +
> + if ((hva & (PMD_SIZE - 1)) == (pa & (PMD_SIZE - 1)) &&
> + ALIGN_DOWN(hva, PMD_SIZE) >= sec_start &&
> + ALIGN(hva, PMD_SIZE) <= sec_end)
> + return PMD_SHIFT;
> +
> + return PAGE_SHIFT;
> +}
> +
> static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
> unsigned long hva,
> unsigned long map_size)
> @@ -769,7 +799,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> return -EFAULT;
> }
>
> - /* Let's check if we will get back a huge page backed by hugetlbfs */
> + /*
> + * Let's check if we will get back a huge page backed by hugetlbfs, or
> + * get block mapping for device MMIO region.
> + */
> mmap_read_lock(current->mm);
> vma = find_vma_intersection(current->mm, hva, hva + 1);
> if (unlikely(!vma)) {
> @@ -780,11 +813,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>
> if (is_vm_hugetlb_page(vma))
> vma_shift = huge_page_shift(hstate_vma(vma));
> + else if (vma->vm_flags & VM_PFNMAP)
> + vma_shift = device_rough_page_shift(memslot, vma, hva);
> else
> vma_shift = PAGE_SHIFT;
>
> - if (logging_active ||
> - (vma->vm_flags & VM_PFNMAP)) {
> + if (logging_active) {
> force_pte = true;
> vma_shift = PAGE_SHIFT;
But why should we downgrade to page-size mappings if logging? This is
a device, and you aren't moving the device around, are you? Or is your
device actually memory with a device mapping that you are trying to
migrate?
> }
> @@ -855,7 +889,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>
> if (kvm_is_device_pfn(pfn)) {
> device = true;
> - force_pte = true;
> + force_pte = (vma_pagesize == PAGE_SIZE);
> } else if (logging_active && !write_fault) {
> /*
> * Only actually map the page as writable if this was a write
> --
> 2.19.1
>
>
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: [RFC PATCH v2 2/2] kvm/arm64: Try stage2 block mapping for host device MMIO
2021-04-07 13:18 ` Marc Zyngier
(?)
@ 2021-04-08 7:28 ` Keqian Zhu
-1 siblings, 0 replies; 24+ messages in thread
From: Keqian Zhu @ 2021-04-08 7:28 UTC (permalink / raw)
To: Marc Zyngier
Cc: kvm, Catalin Marinas, linux-kernel, lushenming, Will Deacon,
kvmarm, linux-arm-kernel
Hi Marc,
On 2021/4/7 21:18, Marc Zyngier wrote:
> On Tue, 16 Mar 2021 13:43:38 +0000,
> Keqian Zhu <zhukeqian1@huawei.com> wrote:
>>
>> The MMIO region of a device maybe huge (GB level), try to use
>> block mapping in stage2 to speedup both map and unmap.
>>
>> Compared to normal memory mapping, we should consider two more
>> points when try block mapping for MMIO region:
>>
>> 1. For normal memory mapping, the PA(host physical address) and
>> HVA have same alignment within PUD_SIZE or PMD_SIZE when we use
>> the HVA to request hugepage, so we don't need to consider PA
>> alignment when verifing block mapping. But for device memory
>> mapping, the PA and HVA may have different alignment.
>>
>> 2. For normal memory mapping, we are sure hugepage size properly
>> fit into vma, so we don't check whether the mapping size exceeds
>> the boundary of vma. But for device memory mapping, we should pay
>> attention to this.
>>
>> This adds device_rough_page_shift() to check these two points when
>> selecting block mapping size.
>>
>> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
>> ---
>>
>> Mainly for RFC, not fully tested. I will fully test it when the
>> code logic is well accepted.
>>
>> ---
>> arch/arm64/kvm/mmu.c | 42 ++++++++++++++++++++++++++++++++++++++----
>> 1 file changed, 38 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index c59af5ca01b0..224aa15eb4d9 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -624,6 +624,36 @@ static void kvm_send_hwpoison_signal(unsigned long address, short lsb)
>> send_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb, current);
>> }
>>
>> +/*
>> + * Find a mapping size that properly insides the intersection of vma and
>> + * memslot. And hva and pa have the same alignment to this mapping size.
>> + * It's rough because there are still other restrictions, which will be
>> + * checked by the following fault_supports_stage2_huge_mapping().
>
> I don't think these restrictions make complete sense to me. If this is
> a PFNMAP VMA, we should use the biggest mapping size that covers the
> VMA, and not more than the VMA.
But as described by kvm_arch_prepare_memory_region(), the memslot may not fully
cover the VMA. If that's true and we just consider the boundary of the VMA, our
block mapping may beyond the boundary of memslot. Is this a problem?
>
>> + */
>> +static short device_rough_page_shift(struct kvm_memory_slot *memslot,
>> + struct vm_area_struct *vma,
>> + unsigned long hva)
>> +{
>> + size_t size = memslot->npages * PAGE_SIZE;
>> + hva_t sec_start = max(memslot->userspace_addr, vma->vm_start);
>> + hva_t sec_end = min(memslot->userspace_addr + size, vma->vm_end);
>> + phys_addr_t pa = (vma->vm_pgoff << PAGE_SHIFT) + (hva - vma->vm_start);
>> +
>> +#ifndef __PAGETABLE_PMD_FOLDED
>> + if ((hva & (PUD_SIZE - 1)) == (pa & (PUD_SIZE - 1)) &&
>> + ALIGN_DOWN(hva, PUD_SIZE) >= sec_start &&
>> + ALIGN(hva, PUD_SIZE) <= sec_end)
>> + return PUD_SHIFT;
>> +#endif
>> +
>> + if ((hva & (PMD_SIZE - 1)) == (pa & (PMD_SIZE - 1)) &&
>> + ALIGN_DOWN(hva, PMD_SIZE) >= sec_start &&
>> + ALIGN(hva, PMD_SIZE) <= sec_end)
>> + return PMD_SHIFT;
>> +
>> + return PAGE_SHIFT;
>> +}
>> +
>> static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
>> unsigned long hva,
>> unsigned long map_size)
>> @@ -769,7 +799,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>> return -EFAULT;
>> }
>>
>> - /* Let's check if we will get back a huge page backed by hugetlbfs */
>> + /*
>> + * Let's check if we will get back a huge page backed by hugetlbfs, or
>> + * get block mapping for device MMIO region.
>> + */
>> mmap_read_lock(current->mm);
>> vma = find_vma_intersection(current->mm, hva, hva + 1);
>> if (unlikely(!vma)) {
>> @@ -780,11 +813,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>
>> if (is_vm_hugetlb_page(vma))
>> vma_shift = huge_page_shift(hstate_vma(vma));
>> + else if (vma->vm_flags & VM_PFNMAP)
>> + vma_shift = device_rough_page_shift(memslot, vma, hva);
>> else
>> vma_shift = PAGE_SHIFT;
>>
>> - if (logging_active ||
>> - (vma->vm_flags & VM_PFNMAP)) {
>> + if (logging_active) {
>> force_pte = true;
>> vma_shift = PAGE_SHIFT;
>
> But why should we downgrade to page-size mappings if logging? This is
> a device, and you aren't moving the device around, are you? Or is your
> device actually memory with a device mapping that you are trying to
> migrate?
Thanks for the point. We should not move the device around, so we do not
need to consider logging when we build mapping for device.
I found that logging_active is per memslot and we're sure it's always false
for memslot with PFNMAP VMA, because the kvm_arch_prepare_memory_region()
forbids that. Then I think we're OK here.
Thanks,
Keqian
>
>> }
>> @@ -855,7 +889,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>
>> if (kvm_is_device_pfn(pfn)) {
>> device = true;
>> - force_pte = true;
>> + force_pte = (vma_pagesize == PAGE_SIZE);
>> } else if (logging_active && !write_fault) {
>> /*
>> * Only actually map the page as writable if this was a write
>> --
>> 2.19.1
>>
>>
>
> Thanks,
>
> M.
>
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: [RFC PATCH v2 2/2] kvm/arm64: Try stage2 block mapping for host device MMIO
@ 2021-04-08 7:28 ` Keqian Zhu
0 siblings, 0 replies; 24+ messages in thread
From: Keqian Zhu @ 2021-04-08 7:28 UTC (permalink / raw)
To: Marc Zyngier
Cc: linux-kernel, linux-arm-kernel, kvm, kvmarm, Will Deacon,
Catalin Marinas, Mark Rutland, James Morse, Suzuki K Poulose,
Julien Thierry, wanghaibin.wang, jiangkunkun, yuzenghui,
lushenming
Hi Marc,
On 2021/4/7 21:18, Marc Zyngier wrote:
> On Tue, 16 Mar 2021 13:43:38 +0000,
> Keqian Zhu <zhukeqian1@huawei.com> wrote:
>>
>> The MMIO region of a device maybe huge (GB level), try to use
>> block mapping in stage2 to speedup both map and unmap.
>>
>> Compared to normal memory mapping, we should consider two more
>> points when try block mapping for MMIO region:
>>
>> 1. For normal memory mapping, the PA(host physical address) and
>> HVA have same alignment within PUD_SIZE or PMD_SIZE when we use
>> the HVA to request hugepage, so we don't need to consider PA
>> alignment when verifing block mapping. But for device memory
>> mapping, the PA and HVA may have different alignment.
>>
>> 2. For normal memory mapping, we are sure hugepage size properly
>> fit into vma, so we don't check whether the mapping size exceeds
>> the boundary of vma. But for device memory mapping, we should pay
>> attention to this.
>>
>> This adds device_rough_page_shift() to check these two points when
>> selecting block mapping size.
>>
>> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
>> ---
>>
>> Mainly for RFC, not fully tested. I will fully test it when the
>> code logic is well accepted.
>>
>> ---
>> arch/arm64/kvm/mmu.c | 42 ++++++++++++++++++++++++++++++++++++++----
>> 1 file changed, 38 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index c59af5ca01b0..224aa15eb4d9 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -624,6 +624,36 @@ static void kvm_send_hwpoison_signal(unsigned long address, short lsb)
>> send_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb, current);
>> }
>>
>> +/*
>> + * Find a mapping size that properly insides the intersection of vma and
>> + * memslot. And hva and pa have the same alignment to this mapping size.
>> + * It's rough because there are still other restrictions, which will be
>> + * checked by the following fault_supports_stage2_huge_mapping().
>
> I don't think these restrictions make complete sense to me. If this is
> a PFNMAP VMA, we should use the biggest mapping size that covers the
> VMA, and not more than the VMA.
But as described by kvm_arch_prepare_memory_region(), the memslot may not fully
cover the VMA. If that's true and we just consider the boundary of the VMA, our
block mapping may beyond the boundary of memslot. Is this a problem?
>
>> + */
>> +static short device_rough_page_shift(struct kvm_memory_slot *memslot,
>> + struct vm_area_struct *vma,
>> + unsigned long hva)
>> +{
>> + size_t size = memslot->npages * PAGE_SIZE;
>> + hva_t sec_start = max(memslot->userspace_addr, vma->vm_start);
>> + hva_t sec_end = min(memslot->userspace_addr + size, vma->vm_end);
>> + phys_addr_t pa = (vma->vm_pgoff << PAGE_SHIFT) + (hva - vma->vm_start);
>> +
>> +#ifndef __PAGETABLE_PMD_FOLDED
>> + if ((hva & (PUD_SIZE - 1)) == (pa & (PUD_SIZE - 1)) &&
>> + ALIGN_DOWN(hva, PUD_SIZE) >= sec_start &&
>> + ALIGN(hva, PUD_SIZE) <= sec_end)
>> + return PUD_SHIFT;
>> +#endif
>> +
>> + if ((hva & (PMD_SIZE - 1)) == (pa & (PMD_SIZE - 1)) &&
>> + ALIGN_DOWN(hva, PMD_SIZE) >= sec_start &&
>> + ALIGN(hva, PMD_SIZE) <= sec_end)
>> + return PMD_SHIFT;
>> +
>> + return PAGE_SHIFT;
>> +}
>> +
>> static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
>> unsigned long hva,
>> unsigned long map_size)
>> @@ -769,7 +799,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>> return -EFAULT;
>> }
>>
>> - /* Let's check if we will get back a huge page backed by hugetlbfs */
>> + /*
>> + * Let's check if we will get back a huge page backed by hugetlbfs, or
>> + * get block mapping for device MMIO region.
>> + */
>> mmap_read_lock(current->mm);
>> vma = find_vma_intersection(current->mm, hva, hva + 1);
>> if (unlikely(!vma)) {
>> @@ -780,11 +813,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>
>> if (is_vm_hugetlb_page(vma))
>> vma_shift = huge_page_shift(hstate_vma(vma));
>> + else if (vma->vm_flags & VM_PFNMAP)
>> + vma_shift = device_rough_page_shift(memslot, vma, hva);
>> else
>> vma_shift = PAGE_SHIFT;
>>
>> - if (logging_active ||
>> - (vma->vm_flags & VM_PFNMAP)) {
>> + if (logging_active) {
>> force_pte = true;
>> vma_shift = PAGE_SHIFT;
>
> But why should we downgrade to page-size mappings if logging? This is
> a device, and you aren't moving the device around, are you? Or is your
> device actually memory with a device mapping that you are trying to
> migrate?
Thanks for the point. We should not move the device around, so we do not
need to consider logging when we build mapping for device.
I found that logging_active is per memslot and we're sure it's always false
for memslot with PFNMAP VMA, because the kvm_arch_prepare_memory_region()
forbids that. Then I think we're OK here.
Thanks,
Keqian
>
>> }
>> @@ -855,7 +889,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>
>> if (kvm_is_device_pfn(pfn)) {
>> device = true;
>> - force_pte = true;
>> + force_pte = (vma_pagesize == PAGE_SIZE);
>> } else if (logging_active && !write_fault) {
>> /*
>> * Only actually map the page as writable if this was a write
>> --
>> 2.19.1
>>
>>
>
> Thanks,
>
> M.
>
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: [RFC PATCH v2 2/2] kvm/arm64: Try stage2 block mapping for host device MMIO
@ 2021-04-08 7:28 ` Keqian Zhu
0 siblings, 0 replies; 24+ messages in thread
From: Keqian Zhu @ 2021-04-08 7:28 UTC (permalink / raw)
To: Marc Zyngier
Cc: linux-kernel, linux-arm-kernel, kvm, kvmarm, Will Deacon,
Catalin Marinas, Mark Rutland, James Morse, Suzuki K Poulose,
Julien Thierry, wanghaibin.wang, jiangkunkun, yuzenghui,
lushenming
Hi Marc,
On 2021/4/7 21:18, Marc Zyngier wrote:
> On Tue, 16 Mar 2021 13:43:38 +0000,
> Keqian Zhu <zhukeqian1@huawei.com> wrote:
>>
>> The MMIO region of a device maybe huge (GB level), try to use
>> block mapping in stage2 to speedup both map and unmap.
>>
>> Compared to normal memory mapping, we should consider two more
>> points when try block mapping for MMIO region:
>>
>> 1. For normal memory mapping, the PA(host physical address) and
>> HVA have same alignment within PUD_SIZE or PMD_SIZE when we use
>> the HVA to request hugepage, so we don't need to consider PA
>> alignment when verifing block mapping. But for device memory
>> mapping, the PA and HVA may have different alignment.
>>
>> 2. For normal memory mapping, we are sure hugepage size properly
>> fit into vma, so we don't check whether the mapping size exceeds
>> the boundary of vma. But for device memory mapping, we should pay
>> attention to this.
>>
>> This adds device_rough_page_shift() to check these two points when
>> selecting block mapping size.
>>
>> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
>> ---
>>
>> Mainly for RFC, not fully tested. I will fully test it when the
>> code logic is well accepted.
>>
>> ---
>> arch/arm64/kvm/mmu.c | 42 ++++++++++++++++++++++++++++++++++++++----
>> 1 file changed, 38 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index c59af5ca01b0..224aa15eb4d9 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -624,6 +624,36 @@ static void kvm_send_hwpoison_signal(unsigned long address, short lsb)
>> send_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb, current);
>> }
>>
>> +/*
>> + * Find a mapping size that properly insides the intersection of vma and
>> + * memslot. And hva and pa have the same alignment to this mapping size.
>> + * It's rough because there are still other restrictions, which will be
>> + * checked by the following fault_supports_stage2_huge_mapping().
>
> I don't think these restrictions make complete sense to me. If this is
> a PFNMAP VMA, we should use the biggest mapping size that covers the
> VMA, and not more than the VMA.
But as described by kvm_arch_prepare_memory_region(), the memslot may not fully
cover the VMA. If that's true and we just consider the boundary of the VMA, our
block mapping may beyond the boundary of memslot. Is this a problem?
>
>> + */
>> +static short device_rough_page_shift(struct kvm_memory_slot *memslot,
>> + struct vm_area_struct *vma,
>> + unsigned long hva)
>> +{
>> + size_t size = memslot->npages * PAGE_SIZE;
>> + hva_t sec_start = max(memslot->userspace_addr, vma->vm_start);
>> + hva_t sec_end = min(memslot->userspace_addr + size, vma->vm_end);
>> + phys_addr_t pa = (vma->vm_pgoff << PAGE_SHIFT) + (hva - vma->vm_start);
>> +
>> +#ifndef __PAGETABLE_PMD_FOLDED
>> + if ((hva & (PUD_SIZE - 1)) == (pa & (PUD_SIZE - 1)) &&
>> + ALIGN_DOWN(hva, PUD_SIZE) >= sec_start &&
>> + ALIGN(hva, PUD_SIZE) <= sec_end)
>> + return PUD_SHIFT;
>> +#endif
>> +
>> + if ((hva & (PMD_SIZE - 1)) == (pa & (PMD_SIZE - 1)) &&
>> + ALIGN_DOWN(hva, PMD_SIZE) >= sec_start &&
>> + ALIGN(hva, PMD_SIZE) <= sec_end)
>> + return PMD_SHIFT;
>> +
>> + return PAGE_SHIFT;
>> +}
>> +
>> static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
>> unsigned long hva,
>> unsigned long map_size)
>> @@ -769,7 +799,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>> return -EFAULT;
>> }
>>
>> - /* Let's check if we will get back a huge page backed by hugetlbfs */
>> + /*
>> + * Let's check if we will get back a huge page backed by hugetlbfs, or
>> + * get block mapping for device MMIO region.
>> + */
>> mmap_read_lock(current->mm);
>> vma = find_vma_intersection(current->mm, hva, hva + 1);
>> if (unlikely(!vma)) {
>> @@ -780,11 +813,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>
>> if (is_vm_hugetlb_page(vma))
>> vma_shift = huge_page_shift(hstate_vma(vma));
>> + else if (vma->vm_flags & VM_PFNMAP)
>> + vma_shift = device_rough_page_shift(memslot, vma, hva);
>> else
>> vma_shift = PAGE_SHIFT;
>>
>> - if (logging_active ||
>> - (vma->vm_flags & VM_PFNMAP)) {
>> + if (logging_active) {
>> force_pte = true;
>> vma_shift = PAGE_SHIFT;
>
> But why should we downgrade to page-size mappings if logging? This is
> a device, and you aren't moving the device around, are you? Or is your
> device actually memory with a device mapping that you are trying to
> migrate?
Thanks for the point. We should not move the device around, so we do not
need to consider logging when we build mapping for device.
I found that logging_active is per memslot and we're sure it's always false
for memslot with PFNMAP VMA, because the kvm_arch_prepare_memory_region()
forbids that. Then I think we're OK here.
Thanks,
Keqian
>
>> }
>> @@ -855,7 +889,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>
>> if (kvm_is_device_pfn(pfn)) {
>> device = true;
>> - force_pte = true;
>> + force_pte = (vma_pagesize == PAGE_SIZE);
>> } else if (logging_active && !write_fault) {
>> /*
>> * Only actually map the page as writable if this was a write
>> --
>> 2.19.1
>>
>>
>
> Thanks,
>
> M.
>
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: [RFC PATCH v2 2/2] kvm/arm64: Try stage2 block mapping for host device MMIO
2021-04-08 7:28 ` Keqian Zhu
(?)
@ 2021-04-14 6:58 ` Keqian Zhu
-1 siblings, 0 replies; 24+ messages in thread
From: Keqian Zhu @ 2021-04-14 6:58 UTC (permalink / raw)
To: Marc Zyngier; +Cc: kvm, linux-kernel, kvmarm, linux-arm-kernel
Hi Marc,
On 2021/4/8 15:28, Keqian Zhu wrote:
> Hi Marc,
>
> On 2021/4/7 21:18, Marc Zyngier wrote:
>> On Tue, 16 Mar 2021 13:43:38 +0000,
>> Keqian Zhu <zhukeqian1@huawei.com> wrote:
>>>
[...]
>>>
>>> +/*
>>> + * Find a mapping size that properly insides the intersection of vma and
>>> + * memslot. And hva and pa have the same alignment to this mapping size.
>>> + * It's rough because there are still other restrictions, which will be
>>> + * checked by the following fault_supports_stage2_huge_mapping().
>>
>> I don't think these restrictions make complete sense to me. If this is
>> a PFNMAP VMA, we should use the biggest mapping size that covers the
>> VMA, and not more than the VMA.
> But as described by kvm_arch_prepare_memory_region(), the memslot may not fully
> cover the VMA. If that's true and we just consider the boundary of the VMA, our
> block mapping may beyond the boundary of memslot. Is this a problem?
emm... Sorry I missed something. The fault_supports_stage2_huge_mapping() will check
the boundary of memslot, so we don't need to check it here. I have send v3, please
check that.
BRs,
Keqian
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC PATCH v2 2/2] kvm/arm64: Try stage2 block mapping for host device MMIO
@ 2021-04-14 6:58 ` Keqian Zhu
0 siblings, 0 replies; 24+ messages in thread
From: Keqian Zhu @ 2021-04-14 6:58 UTC (permalink / raw)
To: Marc Zyngier
Cc: linux-kernel, linux-arm-kernel, kvm, kvmarm, Mark Rutland,
James Morse, Suzuki K Poulose, Julien Thierry, wanghaibin.wang
Hi Marc,
On 2021/4/8 15:28, Keqian Zhu wrote:
> Hi Marc,
>
> On 2021/4/7 21:18, Marc Zyngier wrote:
>> On Tue, 16 Mar 2021 13:43:38 +0000,
>> Keqian Zhu <zhukeqian1@huawei.com> wrote:
>>>
[...]
>>>
>>> +/*
>>> + * Find a mapping size that properly insides the intersection of vma and
>>> + * memslot. And hva and pa have the same alignment to this mapping size.
>>> + * It's rough because there are still other restrictions, which will be
>>> + * checked by the following fault_supports_stage2_huge_mapping().
>>
>> I don't think these restrictions make complete sense to me. If this is
>> a PFNMAP VMA, we should use the biggest mapping size that covers the
>> VMA, and not more than the VMA.
> But as described by kvm_arch_prepare_memory_region(), the memslot may not fully
> cover the VMA. If that's true and we just consider the boundary of the VMA, our
> block mapping may beyond the boundary of memslot. Is this a problem?
emm... Sorry I missed something. The fault_supports_stage2_huge_mapping() will check
the boundary of memslot, so we don't need to check it here. I have send v3, please
check that.
BRs,
Keqian
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC PATCH v2 2/2] kvm/arm64: Try stage2 block mapping for host device MMIO
@ 2021-04-14 6:58 ` Keqian Zhu
0 siblings, 0 replies; 24+ messages in thread
From: Keqian Zhu @ 2021-04-14 6:58 UTC (permalink / raw)
To: Marc Zyngier
Cc: linux-kernel, linux-arm-kernel, kvm, kvmarm, Mark Rutland,
James Morse, Suzuki K Poulose, Julien Thierry, wanghaibin.wang
Hi Marc,
On 2021/4/8 15:28, Keqian Zhu wrote:
> Hi Marc,
>
> On 2021/4/7 21:18, Marc Zyngier wrote:
>> On Tue, 16 Mar 2021 13:43:38 +0000,
>> Keqian Zhu <zhukeqian1@huawei.com> wrote:
>>>
[...]
>>>
>>> +/*
>>> + * Find a mapping size that properly insides the intersection of vma and
>>> + * memslot. And hva and pa have the same alignment to this mapping size.
>>> + * It's rough because there are still other restrictions, which will be
>>> + * checked by the following fault_supports_stage2_huge_mapping().
>>
>> I don't think these restrictions make complete sense to me. If this is
>> a PFNMAP VMA, we should use the biggest mapping size that covers the
>> VMA, and not more than the VMA.
> But as described by kvm_arch_prepare_memory_region(), the memslot may not fully
> cover the VMA. If that's true and we just consider the boundary of the VMA, our
> block mapping may beyond the boundary of memslot. Is this a problem?
emm... Sorry I missed something. The fault_supports_stage2_huge_mapping() will check
the boundary of memslot, so we don't need to check it here. I have send v3, please
check that.
BRs,
Keqian
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC PATCH v2 2/2] kvm/arm64: Try stage2 block mapping for host device MMIO
2021-04-07 13:18 ` Marc Zyngier
(?)
@ 2021-04-14 2:48 ` Keqian Zhu
-1 siblings, 0 replies; 24+ messages in thread
From: Keqian Zhu @ 2021-04-14 2:48 UTC (permalink / raw)
To: Marc Zyngier
Cc: kvm, Catalin Marinas, linux-kernel, lushenming, Will Deacon,
kvmarm, linux-arm-kernel
Hi Marc,
I think I have fully tested this patch. The next step is to do some restriction on
HVA in vfio module, so we can build block mapping for it with a higher probability.
Is there anything to improve? If not, could you apply it? ^_^
Thanks,
Keqian
On 2021/4/7 21:18, Marc Zyngier wrote:
> On Tue, 16 Mar 2021 13:43:38 +0000,
> Keqian Zhu <zhukeqian1@huawei.com> wrote:
>>
>> The MMIO region of a device maybe huge (GB level), try to use
>> block mapping in stage2 to speedup both map and unmap.
>>
>> Compared to normal memory mapping, we should consider two more
>> points when try block mapping for MMIO region:
>>
>> 1. For normal memory mapping, the PA(host physical address) and
>> HVA have same alignment within PUD_SIZE or PMD_SIZE when we use
>> the HVA to request hugepage, so we don't need to consider PA
>> alignment when verifing block mapping. But for device memory
>> mapping, the PA and HVA may have different alignment.
>>
>> 2. For normal memory mapping, we are sure hugepage size properly
>> fit into vma, so we don't check whether the mapping size exceeds
>> the boundary of vma. But for device memory mapping, we should pay
>> attention to this.
>>
>> This adds device_rough_page_shift() to check these two points when
>> selecting block mapping size.
>>
>> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
>> ---
>>
>> Mainly for RFC, not fully tested. I will fully test it when the
>> code logic is well accepted.
>>
>> ---
>> arch/arm64/kvm/mmu.c | 42 ++++++++++++++++++++++++++++++++++++++----
>> 1 file changed, 38 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index c59af5ca01b0..224aa15eb4d9 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -624,6 +624,36 @@ static void kvm_send_hwpoison_signal(unsigned long address, short lsb)
>> send_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb, current);
>> }
>>
>> +/*
>> + * Find a mapping size that properly insides the intersection of vma and
>> + * memslot. And hva and pa have the same alignment to this mapping size.
>> + * It's rough because there are still other restrictions, which will be
>> + * checked by the following fault_supports_stage2_huge_mapping().
>
> I don't think these restrictions make complete sense to me. If this is
> a PFNMAP VMA, we should use the biggest mapping size that covers the
> VMA, and not more than the VMA.
>
>> + */
>> +static short device_rough_page_shift(struct kvm_memory_slot *memslot,
>> + struct vm_area_struct *vma,
>> + unsigned long hva)
>> +{
>> + size_t size = memslot->npages * PAGE_SIZE;
>> + hva_t sec_start = max(memslot->userspace_addr, vma->vm_start);
>> + hva_t sec_end = min(memslot->userspace_addr + size, vma->vm_end);
>> + phys_addr_t pa = (vma->vm_pgoff << PAGE_SHIFT) + (hva - vma->vm_start);
>> +
>> +#ifndef __PAGETABLE_PMD_FOLDED
>> + if ((hva & (PUD_SIZE - 1)) == (pa & (PUD_SIZE - 1)) &&
>> + ALIGN_DOWN(hva, PUD_SIZE) >= sec_start &&
>> + ALIGN(hva, PUD_SIZE) <= sec_end)
>> + return PUD_SHIFT;
>> +#endif
>> +
>> + if ((hva & (PMD_SIZE - 1)) == (pa & (PMD_SIZE - 1)) &&
>> + ALIGN_DOWN(hva, PMD_SIZE) >= sec_start &&
>> + ALIGN(hva, PMD_SIZE) <= sec_end)
>> + return PMD_SHIFT;
>> +
>> + return PAGE_SHIFT;
>> +}
>> +
>> static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
>> unsigned long hva,
>> unsigned long map_size)
>> @@ -769,7 +799,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>> return -EFAULT;
>> }
>>
>> - /* Let's check if we will get back a huge page backed by hugetlbfs */
>> + /*
>> + * Let's check if we will get back a huge page backed by hugetlbfs, or
>> + * get block mapping for device MMIO region.
>> + */
>> mmap_read_lock(current->mm);
>> vma = find_vma_intersection(current->mm, hva, hva + 1);
>> if (unlikely(!vma)) {
>> @@ -780,11 +813,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>
>> if (is_vm_hugetlb_page(vma))
>> vma_shift = huge_page_shift(hstate_vma(vma));
>> + else if (vma->vm_flags & VM_PFNMAP)
>> + vma_shift = device_rough_page_shift(memslot, vma, hva);
>> else
>> vma_shift = PAGE_SHIFT;
>>
>> - if (logging_active ||
>> - (vma->vm_flags & VM_PFNMAP)) {
>> + if (logging_active) {
>> force_pte = true;
>> vma_shift = PAGE_SHIFT;
>
> But why should we downgrade to page-size mappings if logging? This is
> a device, and you aren't moving the device around, are you? Or is your
> device actually memory with a device mapping that you are trying to
> migrate?
>
>> }
>> @@ -855,7 +889,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>
>> if (kvm_is_device_pfn(pfn)) {
>> device = true;
>> - force_pte = true;
>> + force_pte = (vma_pagesize == PAGE_SIZE);
>> } else if (logging_active && !write_fault) {
>> /*
>> * Only actually map the page as writable if this was a write
>> --
>> 2.19.1
>>
>>
>
> Thanks,
>
> M.
>
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: [RFC PATCH v2 2/2] kvm/arm64: Try stage2 block mapping for host device MMIO
@ 2021-04-14 2:48 ` Keqian Zhu
0 siblings, 0 replies; 24+ messages in thread
From: Keqian Zhu @ 2021-04-14 2:48 UTC (permalink / raw)
To: Marc Zyngier
Cc: linux-kernel, linux-arm-kernel, kvm, kvmarm, Will Deacon,
Catalin Marinas, Mark Rutland, James Morse, Suzuki K Poulose,
Julien Thierry, wanghaibin.wang, jiangkunkun, yuzenghui,
lushenming
Hi Marc,
I think I have fully tested this patch. The next step is to do some restriction on
HVA in vfio module, so we can build block mapping for it with a higher probability.
Is there anything to improve? If not, could you apply it? ^_^
Thanks,
Keqian
On 2021/4/7 21:18, Marc Zyngier wrote:
> On Tue, 16 Mar 2021 13:43:38 +0000,
> Keqian Zhu <zhukeqian1@huawei.com> wrote:
>>
>> The MMIO region of a device maybe huge (GB level), try to use
>> block mapping in stage2 to speedup both map and unmap.
>>
>> Compared to normal memory mapping, we should consider two more
>> points when try block mapping for MMIO region:
>>
>> 1. For normal memory mapping, the PA(host physical address) and
>> HVA have same alignment within PUD_SIZE or PMD_SIZE when we use
>> the HVA to request hugepage, so we don't need to consider PA
>> alignment when verifing block mapping. But for device memory
>> mapping, the PA and HVA may have different alignment.
>>
>> 2. For normal memory mapping, we are sure hugepage size properly
>> fit into vma, so we don't check whether the mapping size exceeds
>> the boundary of vma. But for device memory mapping, we should pay
>> attention to this.
>>
>> This adds device_rough_page_shift() to check these two points when
>> selecting block mapping size.
>>
>> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
>> ---
>>
>> Mainly for RFC, not fully tested. I will fully test it when the
>> code logic is well accepted.
>>
>> ---
>> arch/arm64/kvm/mmu.c | 42 ++++++++++++++++++++++++++++++++++++++----
>> 1 file changed, 38 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index c59af5ca01b0..224aa15eb4d9 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -624,6 +624,36 @@ static void kvm_send_hwpoison_signal(unsigned long address, short lsb)
>> send_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb, current);
>> }
>>
>> +/*
>> + * Find a mapping size that properly insides the intersection of vma and
>> + * memslot. And hva and pa have the same alignment to this mapping size.
>> + * It's rough because there are still other restrictions, which will be
>> + * checked by the following fault_supports_stage2_huge_mapping().
>
> I don't think these restrictions make complete sense to me. If this is
> a PFNMAP VMA, we should use the biggest mapping size that covers the
> VMA, and not more than the VMA.
>
>> + */
>> +static short device_rough_page_shift(struct kvm_memory_slot *memslot,
>> + struct vm_area_struct *vma,
>> + unsigned long hva)
>> +{
>> + size_t size = memslot->npages * PAGE_SIZE;
>> + hva_t sec_start = max(memslot->userspace_addr, vma->vm_start);
>> + hva_t sec_end = min(memslot->userspace_addr + size, vma->vm_end);
>> + phys_addr_t pa = (vma->vm_pgoff << PAGE_SHIFT) + (hva - vma->vm_start);
>> +
>> +#ifndef __PAGETABLE_PMD_FOLDED
>> + if ((hva & (PUD_SIZE - 1)) == (pa & (PUD_SIZE - 1)) &&
>> + ALIGN_DOWN(hva, PUD_SIZE) >= sec_start &&
>> + ALIGN(hva, PUD_SIZE) <= sec_end)
>> + return PUD_SHIFT;
>> +#endif
>> +
>> + if ((hva & (PMD_SIZE - 1)) == (pa & (PMD_SIZE - 1)) &&
>> + ALIGN_DOWN(hva, PMD_SIZE) >= sec_start &&
>> + ALIGN(hva, PMD_SIZE) <= sec_end)
>> + return PMD_SHIFT;
>> +
>> + return PAGE_SHIFT;
>> +}
>> +
>> static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
>> unsigned long hva,
>> unsigned long map_size)
>> @@ -769,7 +799,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>> return -EFAULT;
>> }
>>
>> - /* Let's check if we will get back a huge page backed by hugetlbfs */
>> + /*
>> + * Let's check if we will get back a huge page backed by hugetlbfs, or
>> + * get block mapping for device MMIO region.
>> + */
>> mmap_read_lock(current->mm);
>> vma = find_vma_intersection(current->mm, hva, hva + 1);
>> if (unlikely(!vma)) {
>> @@ -780,11 +813,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>
>> if (is_vm_hugetlb_page(vma))
>> vma_shift = huge_page_shift(hstate_vma(vma));
>> + else if (vma->vm_flags & VM_PFNMAP)
>> + vma_shift = device_rough_page_shift(memslot, vma, hva);
>> else
>> vma_shift = PAGE_SHIFT;
>>
>> - if (logging_active ||
>> - (vma->vm_flags & VM_PFNMAP)) {
>> + if (logging_active) {
>> force_pte = true;
>> vma_shift = PAGE_SHIFT;
>
> But why should we downgrade to page-size mappings if logging? This is
> a device, and you aren't moving the device around, are you? Or is your
> device actually memory with a device mapping that you are trying to
> migrate?
>
>> }
>> @@ -855,7 +889,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>
>> if (kvm_is_device_pfn(pfn)) {
>> device = true;
>> - force_pte = true;
>> + force_pte = (vma_pagesize == PAGE_SIZE);
>> } else if (logging_active && !write_fault) {
>> /*
>> * Only actually map the page as writable if this was a write
>> --
>> 2.19.1
>>
>>
>
> Thanks,
>
> M.
>
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: [RFC PATCH v2 2/2] kvm/arm64: Try stage2 block mapping for host device MMIO
@ 2021-04-14 2:48 ` Keqian Zhu
0 siblings, 0 replies; 24+ messages in thread
From: Keqian Zhu @ 2021-04-14 2:48 UTC (permalink / raw)
To: Marc Zyngier
Cc: linux-kernel, linux-arm-kernel, kvm, kvmarm, Will Deacon,
Catalin Marinas, Mark Rutland, James Morse, Suzuki K Poulose,
Julien Thierry, wanghaibin.wang, jiangkunkun, yuzenghui,
lushenming
Hi Marc,
I think I have fully tested this patch. The next step is to do some restriction on
HVA in vfio module, so we can build block mapping for it with a higher probability.
Is there anything to improve? If not, could you apply it? ^_^
Thanks,
Keqian
On 2021/4/7 21:18, Marc Zyngier wrote:
> On Tue, 16 Mar 2021 13:43:38 +0000,
> Keqian Zhu <zhukeqian1@huawei.com> wrote:
>>
>> The MMIO region of a device maybe huge (GB level), try to use
>> block mapping in stage2 to speedup both map and unmap.
>>
>> Compared to normal memory mapping, we should consider two more
>> points when try block mapping for MMIO region:
>>
>> 1. For normal memory mapping, the PA(host physical address) and
>> HVA have same alignment within PUD_SIZE or PMD_SIZE when we use
>> the HVA to request hugepage, so we don't need to consider PA
>> alignment when verifing block mapping. But for device memory
>> mapping, the PA and HVA may have different alignment.
>>
>> 2. For normal memory mapping, we are sure hugepage size properly
>> fit into vma, so we don't check whether the mapping size exceeds
>> the boundary of vma. But for device memory mapping, we should pay
>> attention to this.
>>
>> This adds device_rough_page_shift() to check these two points when
>> selecting block mapping size.
>>
>> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
>> ---
>>
>> Mainly for RFC, not fully tested. I will fully test it when the
>> code logic is well accepted.
>>
>> ---
>> arch/arm64/kvm/mmu.c | 42 ++++++++++++++++++++++++++++++++++++++----
>> 1 file changed, 38 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index c59af5ca01b0..224aa15eb4d9 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -624,6 +624,36 @@ static void kvm_send_hwpoison_signal(unsigned long address, short lsb)
>> send_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb, current);
>> }
>>
>> +/*
>> + * Find a mapping size that properly insides the intersection of vma and
>> + * memslot. And hva and pa have the same alignment to this mapping size.
>> + * It's rough because there are still other restrictions, which will be
>> + * checked by the following fault_supports_stage2_huge_mapping().
>
> I don't think these restrictions make complete sense to me. If this is
> a PFNMAP VMA, we should use the biggest mapping size that covers the
> VMA, and not more than the VMA.
>
>> + */
>> +static short device_rough_page_shift(struct kvm_memory_slot *memslot,
>> + struct vm_area_struct *vma,
>> + unsigned long hva)
>> +{
>> + size_t size = memslot->npages * PAGE_SIZE;
>> + hva_t sec_start = max(memslot->userspace_addr, vma->vm_start);
>> + hva_t sec_end = min(memslot->userspace_addr + size, vma->vm_end);
>> + phys_addr_t pa = (vma->vm_pgoff << PAGE_SHIFT) + (hva - vma->vm_start);
>> +
>> +#ifndef __PAGETABLE_PMD_FOLDED
>> + if ((hva & (PUD_SIZE - 1)) == (pa & (PUD_SIZE - 1)) &&
>> + ALIGN_DOWN(hva, PUD_SIZE) >= sec_start &&
>> + ALIGN(hva, PUD_SIZE) <= sec_end)
>> + return PUD_SHIFT;
>> +#endif
>> +
>> + if ((hva & (PMD_SIZE - 1)) == (pa & (PMD_SIZE - 1)) &&
>> + ALIGN_DOWN(hva, PMD_SIZE) >= sec_start &&
>> + ALIGN(hva, PMD_SIZE) <= sec_end)
>> + return PMD_SHIFT;
>> +
>> + return PAGE_SHIFT;
>> +}
>> +
>> static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
>> unsigned long hva,
>> unsigned long map_size)
>> @@ -769,7 +799,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>> return -EFAULT;
>> }
>>
>> - /* Let's check if we will get back a huge page backed by hugetlbfs */
>> + /*
>> + * Let's check if we will get back a huge page backed by hugetlbfs, or
>> + * get block mapping for device MMIO region.
>> + */
>> mmap_read_lock(current->mm);
>> vma = find_vma_intersection(current->mm, hva, hva + 1);
>> if (unlikely(!vma)) {
>> @@ -780,11 +813,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>
>> if (is_vm_hugetlb_page(vma))
>> vma_shift = huge_page_shift(hstate_vma(vma));
>> + else if (vma->vm_flags & VM_PFNMAP)
>> + vma_shift = device_rough_page_shift(memslot, vma, hva);
>> else
>> vma_shift = PAGE_SHIFT;
>>
>> - if (logging_active ||
>> - (vma->vm_flags & VM_PFNMAP)) {
>> + if (logging_active) {
>> force_pte = true;
>> vma_shift = PAGE_SHIFT;
>
> But why should we downgrade to page-size mappings if logging? This is
> a device, and you aren't moving the device around, are you? Or is your
> device actually memory with a device mapping that you are trying to
> migrate?
>
>> }
>> @@ -855,7 +889,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>
>> if (kvm_is_device_pfn(pfn)) {
>> device = true;
>> - force_pte = true;
>> + force_pte = (vma_pagesize == PAGE_SIZE);
>> } else if (logging_active && !write_fault) {
>> /*
>> * Only actually map the page as writable if this was a write
>> --
>> 2.19.1
>>
>>
>
> Thanks,
>
> M.
>
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 24+ messages in thread