From: Catalin Marinas <catalin.marinas@arm.com>
To: ankita@nvidia.com
Cc: jgg@nvidia.com, maz@kernel.org, oliver.upton@linux.dev,
suzuki.poulose@arm.com, yuzenghui@huawei.com, will@kernel.org,
alex.williamson@redhat.com, kevin.tian@intel.com,
yi.l.liu@intel.com, ardb@kernel.org, akpm@linux-foundation.org,
gshan@redhat.com, linux-mm@kvack.org, lpieralisi@kernel.org,
aniketa@nvidia.com, cjia@nvidia.com, kwankhede@nvidia.com,
targupta@nvidia.com, vsethi@nvidia.com, acurrid@nvidia.com,
apopple@nvidia.com, jhubbard@nvidia.com, danw@nvidia.com,
mochs@nvidia.com, kvmarm@lists.linux.dev, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH v3 2/2] kvm: arm64: set io memory s2 pte as normalnc for vfio pci devices
Date: Tue, 12 Dec 2023 17:46:34 +0000 [thread overview]
Message-ID: <ZXicemDzXm8NShs1@arm.com> (raw)
In-Reply-To: <20231208164709.23101-3-ankita@nvidia.com>
On Fri, Dec 08, 2023 at 10:17:09PM +0530, ankita@nvidia.com wrote:
> arch/arm64/kvm/hyp/pgtable.c | 3 +++
> arch/arm64/kvm/mmu.c | 16 +++++++++++++---
> drivers/vfio/pci/vfio_pci_core.c | 3 ++-
> include/linux/mm.h | 7 +++++++
> 4 files changed, 25 insertions(+), 4 deletions(-)
It might be worth factoring out the vfio bits into a separate patch
together with a bit of documentation around this new vma flag (up to
Alex really).
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index d4835d553c61..c8696c9e7a60 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -722,6 +722,9 @@ static int stage2_set_prot_attr(struct kvm_pgtable *pgt, enum kvm_pgtable_prot p
> kvm_pte_t attr;
> u32 sh = KVM_PTE_LEAF_ATTR_LO_S2_SH_IS;
>
> + if (device && normal_nc)
> + return -EINVAL;
Ah, the comment Will and I made on patch 1 is handled here. Add a
WARN_ON_ONCE() and please move this hunk to the first patch, it makes
more sense there.
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index d14504821b79..1ce1b6d89bf9 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1381,7 +1381,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> int ret = 0;
> bool write_fault, writable, force_pte = false;
> bool exec_fault, mte_allowed;
> - bool device = false;
> + bool device = false, vfio_pci_device = false;
I don't think the variable here should be named vfio_pci_device, the
VM_* flag doesn't mention PCI. So just something like "vfio_allow_wc".
> unsigned long mmu_seq;
> struct kvm *kvm = vcpu->kvm;
> struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
> @@ -1472,6 +1472,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> gfn = fault_ipa >> PAGE_SHIFT;
> mte_allowed = kvm_vma_mte_allowed(vma);
>
> + vfio_pci_device = !!(vma->vm_flags & VM_VFIO_ALLOW_WC);
Nitpick: no need for !!, you are assigning to a bool variable already.
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 1cbc990d42e0..c3f95ec7fc3a 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -1863,7 +1863,8 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma
> * See remap_pfn_range(), called from vfio_pci_fault() but we can't
> * change vm_flags within the fault handler. Set them now.
> */
> - vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
> + vm_flags_set(vma, VM_VFIO_ALLOW_WC | VM_IO | VM_PFNMAP |
> + VM_DONTEXPAND | VM_DONTDUMP);
Please add a comment here that write-combining is allowed to be enabled
by the arch (KVM) code but the default user mmap() will still use
pgprot_noncached().
> vma->vm_ops = &vfio_pci_mmap_ops;
>
> return 0;
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index a422cc123a2d..8d3c4820c492 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -391,6 +391,13 @@ extern unsigned int kobjsize(const void *objp);
> # define VM_UFFD_MINOR VM_NONE
> #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
>
> +#ifdef CONFIG_64BIT
> +#define VM_VFIO_ALLOW_WC_BIT 39 /* Convey KVM to map S2 NORMAL_NC */
This comment shouldn't be in the core header file. It knows nothing
about S2 and Normal-NC, that's arm64 terminology. You can mention
something like VFIO can use this flag hint that write-combining is
allowed.
> +#define VM_VFIO_ALLOW_WC BIT(VM_VFIO_ALLOW_WC_BIT)
> +#else
> +#define VM_VFIO_ALLOW_WC VM_NONE
> +#endif
And I think we need to add some documentation (is there any
VFIO-specific doc) that describes what this flag actually means, what is
permitted. For example, arm64 doesn't have write-combining without
speculative fetches. So if one adds this flag to a new driver, they
should know the implications. There's also an expectation that the
actual driver (KVM guests) or maybe later DPDK can choose the safe
non-cacheable or write-combine (Linux terminology) attributes for the
BAR.
--
Catalin
WARNING: multiple messages have this Message-ID (diff)
From: Catalin Marinas <catalin.marinas@arm.com>
To: ankita@nvidia.com
Cc: jgg@nvidia.com, maz@kernel.org, oliver.upton@linux.dev,
suzuki.poulose@arm.com, yuzenghui@huawei.com, will@kernel.org,
alex.williamson@redhat.com, kevin.tian@intel.com,
yi.l.liu@intel.com, ardb@kernel.org, akpm@linux-foundation.org,
gshan@redhat.com, linux-mm@kvack.org, lpieralisi@kernel.org,
aniketa@nvidia.com, cjia@nvidia.com, kwankhede@nvidia.com,
targupta@nvidia.com, vsethi@nvidia.com, acurrid@nvidia.com,
apopple@nvidia.com, jhubbard@nvidia.com, danw@nvidia.com,
mochs@nvidia.com, kvmarm@lists.linux.dev, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH v3 2/2] kvm: arm64: set io memory s2 pte as normalnc for vfio pci devices
Date: Tue, 12 Dec 2023 17:46:34 +0000 [thread overview]
Message-ID: <ZXicemDzXm8NShs1@arm.com> (raw)
In-Reply-To: <20231208164709.23101-3-ankita@nvidia.com>
On Fri, Dec 08, 2023 at 10:17:09PM +0530, ankita@nvidia.com wrote:
> arch/arm64/kvm/hyp/pgtable.c | 3 +++
> arch/arm64/kvm/mmu.c | 16 +++++++++++++---
> drivers/vfio/pci/vfio_pci_core.c | 3 ++-
> include/linux/mm.h | 7 +++++++
> 4 files changed, 25 insertions(+), 4 deletions(-)
It might be worth factoring out the vfio bits into a separate patch
together with a bit of documentation around this new vma flag (up to
Alex really).
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index d4835d553c61..c8696c9e7a60 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -722,6 +722,9 @@ static int stage2_set_prot_attr(struct kvm_pgtable *pgt, enum kvm_pgtable_prot p
> kvm_pte_t attr;
> u32 sh = KVM_PTE_LEAF_ATTR_LO_S2_SH_IS;
>
> + if (device && normal_nc)
> + return -EINVAL;
Ah, the comment Will and I made on patch 1 is handled here. Add a
WARN_ON_ONCE() and please move this hunk to the first patch, it makes
more sense there.
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index d14504821b79..1ce1b6d89bf9 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1381,7 +1381,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> int ret = 0;
> bool write_fault, writable, force_pte = false;
> bool exec_fault, mte_allowed;
> - bool device = false;
> + bool device = false, vfio_pci_device = false;
I don't think the variable here should be named vfio_pci_device, the
VM_* flag doesn't mention PCI. So just something like "vfio_allow_wc".
> unsigned long mmu_seq;
> struct kvm *kvm = vcpu->kvm;
> struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
> @@ -1472,6 +1472,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> gfn = fault_ipa >> PAGE_SHIFT;
> mte_allowed = kvm_vma_mte_allowed(vma);
>
> + vfio_pci_device = !!(vma->vm_flags & VM_VFIO_ALLOW_WC);
Nitpick: no need for !!, you are assigning to a bool variable already.
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 1cbc990d42e0..c3f95ec7fc3a 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -1863,7 +1863,8 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma
> * See remap_pfn_range(), called from vfio_pci_fault() but we can't
> * change vm_flags within the fault handler. Set them now.
> */
> - vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
> + vm_flags_set(vma, VM_VFIO_ALLOW_WC | VM_IO | VM_PFNMAP |
> + VM_DONTEXPAND | VM_DONTDUMP);
Please add a comment here that write-combining is allowed to be enabled
by the arch (KVM) code but the default user mmap() will still use
pgprot_noncached().
> vma->vm_ops = &vfio_pci_mmap_ops;
>
> return 0;
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index a422cc123a2d..8d3c4820c492 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -391,6 +391,13 @@ extern unsigned int kobjsize(const void *objp);
> # define VM_UFFD_MINOR VM_NONE
> #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
>
> +#ifdef CONFIG_64BIT
> +#define VM_VFIO_ALLOW_WC_BIT 39 /* Convey KVM to map S2 NORMAL_NC */
This comment shouldn't be in the core header file. It knows nothing
about S2 and Normal-NC, that's arm64 terminology. You can mention
something like VFIO can use this flag hint that write-combining is
allowed.
> +#define VM_VFIO_ALLOW_WC BIT(VM_VFIO_ALLOW_WC_BIT)
> +#else
> +#define VM_VFIO_ALLOW_WC VM_NONE
> +#endif
And I think we need to add some documentation (is there any
VFIO-specific doc) that describes what this flag actually means, what is
permitted. For example, arm64 doesn't have write-combining without
speculative fetches. So if one adds this flag to a new driver, they
should know the implications. There's also an expectation that the
actual driver (KVM guests) or maybe later DPDK can choose the safe
non-cacheable or write-combine (Linux terminology) attributes for the
BAR.
--
Catalin
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next prev parent reply other threads:[~2023-12-12 17:46 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-12-08 16:47 [PATCH v3 0/2] kvm: arm64: allow vm to select DEVICE_* and ankita
2023-12-08 16:47 ` ankita
2023-12-08 16:47 ` [PATCH v3 1/2] kvm: arm64: introduce new flag for non-cacheable IO memory ankita
2023-12-08 16:47 ` ankita
2023-12-12 12:17 ` Will Deacon
2023-12-12 12:17 ` Will Deacon
2023-12-12 17:31 ` Catalin Marinas
2023-12-12 17:31 ` Catalin Marinas
2024-01-03 11:43 ` Suzuki K Poulose
2024-01-03 11:43 ` Suzuki K Poulose
2024-01-03 13:25 ` Ankit Agrawal
2024-01-03 13:25 ` Ankit Agrawal
2023-12-08 16:47 ` [PATCH v3 2/2] kvm: arm64: set io memory s2 pte as normalnc for vfio pci devices ankita
2023-12-08 16:47 ` ankita
2023-12-12 17:46 ` Catalin Marinas [this message]
2023-12-12 17:46 ` Catalin Marinas
2023-12-12 18:11 ` Jason Gunthorpe
2023-12-12 18:11 ` Jason Gunthorpe
2023-12-13 20:05 ` Oliver Upton
2023-12-13 20:05 ` Oliver Upton
2023-12-14 15:48 ` Lorenzo Pieralisi
2023-12-14 15:48 ` Lorenzo Pieralisi
2023-12-14 16:56 ` Oliver Upton
2023-12-14 16:56 ` Oliver Upton
2023-12-21 13:19 ` Catalin Marinas
2023-12-21 13:19 ` Catalin Marinas
2024-01-02 17:09 ` Jason Gunthorpe
2024-01-02 17:09 ` Jason Gunthorpe
2024-01-03 13:33 ` Catalin Marinas
2024-01-03 13:33 ` Catalin Marinas
2024-01-05 20:42 ` Oliver Upton
2024-01-05 20:42 ` Oliver Upton
2024-01-08 11:04 ` Catalin Marinas
2024-01-08 11:04 ` Catalin Marinas
2024-01-08 13:18 ` Jason Gunthorpe
2024-01-08 13:18 ` Jason Gunthorpe
2024-01-02 17:26 ` Jason Gunthorpe
2024-01-02 17:26 ` Jason Gunthorpe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZXicemDzXm8NShs1@arm.com \
--to=catalin.marinas@arm.com \
--cc=acurrid@nvidia.com \
--cc=akpm@linux-foundation.org \
--cc=alex.williamson@redhat.com \
--cc=aniketa@nvidia.com \
--cc=ankita@nvidia.com \
--cc=apopple@nvidia.com \
--cc=ardb@kernel.org \
--cc=cjia@nvidia.com \
--cc=danw@nvidia.com \
--cc=gshan@redhat.com \
--cc=jgg@nvidia.com \
--cc=jhubbard@nvidia.com \
--cc=kevin.tian@intel.com \
--cc=kvm@vger.kernel.org \
--cc=kvmarm@lists.linux.dev \
--cc=kwankhede@nvidia.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lpieralisi@kernel.org \
--cc=maz@kernel.org \
--cc=mochs@nvidia.com \
--cc=oliver.upton@linux.dev \
--cc=suzuki.poulose@arm.com \
--cc=targupta@nvidia.com \
--cc=vsethi@nvidia.com \
--cc=will@kernel.org \
--cc=yi.l.liu@intel.com \
--cc=yuzenghui@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.