* [PATCH 00/13] mmu_notifier kill invalidate_page callback v2
@ 2017-08-31 21:17 jglisse
2017-08-31 21:17 ` [PATCH 12/13] KVM: update to new mmu_notifier semantic v2 jglisse
2017-09-02 13:28 ` [PATCH 00/13] mmu_notifier kill invalidate_page callback v2 Andrea Arcangeli
0 siblings, 2 replies; 3+ messages in thread
From: jglisse @ 2017-08-31 21:17 UTC (permalink / raw)
To: linux-mm
Cc: Andrea Arcangeli, Joerg Roedel, kvm, Radim Krčmář,
linux-rdma, linuxppc-dev, Jack Steiner, linux-kernel, dri-devel,
Sudeep Dutt, Ashutosh Dixit, iommu, Jérôme Glisse,
Dimitri Sivanich, amd-gfx, xen-devel, Paolo Bonzini,
Andrew Morton, Linus Torvalds, Dan Williams, Kirill A . Shutemov
From: Jérôme Glisse <jglisse@redhat.com>
(Sorry for cross-posting to so many lists and for the big Cc.)
Changes since v1:
- remove more dead code in kvm (no testing impact)
- more accurate end address computation (patch 2)
in page_mkclean_one and try_to_unmap_one
- added tested-by/reviewed-by gotten so far
Tested as both host and guest kernel with KVM; nothing is burning yet.
Previous cover letter:
Please help with testing!
The invalidate_page callback suffered from 2 pitfalls. First, it used to
happen after the page table lock was released, and thus a new page might
have been set up for the virtual address before the call to
invalidate_page(). This was fixed, in a somewhat awkward way, by
c7ab0d2fdc840266b39db94538f74207ec2afbf6, which moved the callback under
the page table lock. That in turn broke several existing users of the
mmu_notifier API that assumed they could sleep inside this callback.
The second pitfall was that invalidate_page was the only callback that did
not take an address range for the invalidation; it was given a single
address and a page instead. Many of the callback implementers assumed the
page could never be a THP and thus failed to invalidate the appropriate
range for THP pages. By killing this callback we unify the mmu_notifier
callback API to always take a virtual address range as input.
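To make the THP pitfall concrete, here is a minimal userspace sketch (not
kernel code; the constants mirror common x86-64 values and the helper names
are made up for illustration) of the difference between invalidating a
single base page and covering the whole huge page that backs the address:

```c
/* Userspace illustration of the THP pitfall described above: given only
 * a single address, a naive implementer invalidates one base page, while
 * the correct range (which a range-based API makes explicit) covers the
 * whole huge page. Constants and names are illustrative, not kernel code. */

#define PAGE_SHIFT      12
#define PAGE_SIZE       (1UL << PAGE_SHIFT)        /* 4 KiB */
#define HPAGE_PMD_SHIFT 21
#define HPAGE_PMD_SIZE  (1UL << HPAGE_PMD_SHIFT)   /* 2 MiB */
#define HPAGE_PMD_MASK  (~(HPAGE_PMD_SIZE - 1))

struct range { unsigned long start, end; };

/* Buggy pattern: treats the address as mapping a single base page. */
struct range naive_range(unsigned long address)
{
    struct range r;
    r.start = address & ~(PAGE_SIZE - 1);
    r.end = r.start + PAGE_SIZE;
    return r;
}

/* Correct pattern when the address is backed by a PMD-sized THP:
 * align to the huge-page boundary and cover the full huge page. */
struct range thp_range(unsigned long address)
{
    struct range r;
    r.start = address & HPAGE_PMD_MASK;
    r.end = r.start + HPAGE_PMD_SIZE;
    return r;
}
```

The naive range is a strict subset of the THP range, so device mappings for
the rest of the huge page would be left stale.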
There are now 2 clear APIs (I am not mentioning the page-young APIs, which
are seldom used):
- the invalidate_range_start()/end() callbacks (which allow you to sleep)
- invalidate_range(), where you cannot sleep but which happens right after
the page table update, under the page table lock
Note that a lot of existing users are broken with respect to range_start/
range_end. Many users only have a range_start() callback, but there is
nothing preventing what was invalidated in their range_start() callback
from being undone after it returns, yet before any CPU page table update
takes place. The code pattern used in KVM and in umem ODP is an example of
how to properly avoid such a race: in a nutshell, use some kind of sequence
number together with an active-range-invalidation counter to block anything
that might undo what the range_start() callback did.
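That sequence-number pattern can be modeled in a few lines of
single-threaded userspace C. This is only a sketch: the real KVM code uses
mmu_notifier_seq and mmu_notifier_count under kvm->mmu_lock with careful
memory ordering, all of which is omitted here, and the names below are
illustrative:

```c
/* Single-threaded model of the sequence-number + active-invalidation
 * counter pattern. A fault path snapshots the sequence number, does its
 * work, and only commits if no invalidation ran or is still in flight. */
#include <stdbool.h>

struct notifier_state {
    unsigned long seq;   /* bumped each time an invalidation completes */
    int active_ranges;   /* invalidations currently in flight */
};

/* Called from the range_start() callback. */
void range_start(struct notifier_state *s)
{
    s->active_ranges++;
}

/* Called from the range_end() callback. Bump seq before dropping the
 * counter so a faulter can never see both "no active range" and a stale
 * sequence number. */
void range_end(struct notifier_state *s)
{
    s->seq++;
    s->active_ranges--;
}

/* Snapshot taken before doing the work that might race. */
unsigned long fault_begin(struct notifier_state *s)
{
    return s->seq;
}

/* True if it is safe to install the mapping: no invalidation is in
 * progress and none completed since the snapshot was taken. */
bool fault_commit(struct notifier_state *s, unsigned long snap)
{
    return s->active_ranges == 0 && s->seq == snap;
}
```

If fault_commit() fails, the faulter simply retries with a fresh snapshot,
so nothing it did can undo a concurrent range_start() invalidation.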
If you do not care about staying fully in sync with the CPU page table
(i.e. you can live with the CPU page table pointing to a new, different
page for a given virtual address), then you can take a reference on the
pages inside the range_start() callback and drop it in range_end(), or
whenever your driver is done with those pages.
The last alternative is to use invalidate_range(), if you can perform the
invalidation without sleeping, as the invalidate_range() callback happens
under the CPU page table spinlock, right after the page table is updated.
Note this is barely tested. I intend to do more testing over the next few
days, but I do not have access to all the hardware that makes use of the
mmu_notifier API.
The first 2 patches convert existing calls to mmu_notifier_invalidate_page()
into calls to mmu_notifier_invalidate_range() and bracket those calls with
calls to mmu_notifier_invalidate_range_start()/end(). The next 10 patches
remove the existing invalidate_page() callbacks, as they can no longer be
invoked. Finally, the last patch removes invalidate_page() completely so it
can RIP.
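The conversion done by the first 2 patches can be sketched in userspace C.
The mmu_notifier_* names mirror the real kernel API, but the bodies here
are stand-ins that only record call order, and the unmap_one() call site is
hypothetical:

```c
/* Model of the conversion: a single invalidate_page() notification is
 * replaced by an invalidate_range() call (made while the page table
 * lock is notionally held), bracketed by range_start()/range_end().
 * The stubs just record the call order for inspection. */

enum { MAX_EVENTS = 8 };
static const char *events[MAX_EVENTS];
static int nr_events;

static void record(const char *ev)
{
    if (nr_events < MAX_EVENTS)
        events[nr_events++] = ev;
}

/* Stand-ins for the real kernel notifier calls. */
void mmu_notifier_invalidate_range_start(unsigned long s, unsigned long e)
{
    (void)s; (void)e;
    record("start");
}

void mmu_notifier_invalidate_range(unsigned long s, unsigned long e)
{
    (void)s; (void)e;
    record("range");
}

void mmu_notifier_invalidate_range_end(unsigned long s, unsigned long e)
{
    (void)s; (void)e;
    record("end");
}

/* A converted call site: bracket the whole operation, and notify the
 * range while the page table "lock" is held and the PTE is cleared. */
void unmap_one(unsigned long addr, unsigned long size)
{
    unsigned long start = addr, end = addr + size;

    mmu_notifier_invalidate_range_start(start, end);
    /* ... page table lock taken, PTE cleared ... */
    mmu_notifier_invalidate_range(start, end);
    /* ... page table lock released ... */
    mmu_notifier_invalidate_range_end(start, end);
}
```

The ordering is the whole point: start before any PTE change, range while
the change is visible under the lock, end once everything is done.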
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Sudeep Dutt <sudeep.dutt@intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: dri-devel@lists.freedesktop.org
Cc: amd-gfx@lists.freedesktop.org
Cc: linux-rdma@vger.kernel.org
Cc: iommu@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Cc: kvm@vger.kernel.org
Jérôme Glisse (13):
dax: update to new mmu_notifier semantic
mm/rmap: update to new mmu_notifier semantic v2
powerpc/powernv: update to new mmu_notifier semantic
drm/amdgpu: update to new mmu_notifier semantic
IB/umem: update to new mmu_notifier semantic
IB/hfi1: update to new mmu_notifier semantic
iommu/amd: update to new mmu_notifier semantic
iommu/intel: update to new mmu_notifier semantic
misc/mic/scif: update to new mmu_notifier semantic
sgi-gru: update to new mmu_notifier semantic
xen/gntdev: update to new mmu_notifier semantic
KVM: update to new mmu_notifier semantic v2
mm/mmu_notifier: kill invalidate_page
arch/arm/include/asm/kvm_host.h | 6 -----
arch/arm64/include/asm/kvm_host.h | 6 -----
arch/mips/include/asm/kvm_host.h | 5 ----
arch/powerpc/include/asm/kvm_host.h | 5 ----
arch/powerpc/platforms/powernv/npu-dma.c | 10 --------
arch/x86/include/asm/kvm_host.h | 2 --
arch/x86/kvm/x86.c | 11 ---------
drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 31 -----------------------
drivers/infiniband/core/umem_odp.c | 19 ---------------
drivers/infiniband/hw/hfi1/mmu_rb.c | 9 -------
drivers/iommu/amd_iommu_v2.c | 8 ------
drivers/iommu/intel-svm.c | 9 -------
drivers/misc/mic/scif/scif_dma.c | 11 ---------
drivers/misc/sgi-gru/grutlbpurge.c | 12 ---------
drivers/xen/gntdev.c | 8 ------
fs/dax.c | 19 +++++++++------
include/linux/mm.h | 1 +
include/linux/mmu_notifier.h | 25 -------------------
mm/memory.c | 26 ++++++++++++++++----
mm/mmu_notifier.c | 14 -----------
mm/rmap.c | 35 +++++++++++++++++++++++---
virt/kvm/kvm_main.c | 42 --------------------------------
22 files changed, 65 insertions(+), 249 deletions(-)
--
2.13.5
* [PATCH 12/13] KVM: update to new mmu_notifier semantic v2
2017-08-31 21:17 [PATCH 00/13] mmu_notifier kill invalidate_page callback v2 jglisse
@ 2017-08-31 21:17 ` jglisse
2017-09-02 13:28 ` [PATCH 00/13] mmu_notifier kill invalidate_page callback v2 Andrea Arcangeli
1 sibling, 0 replies; 3+ messages in thread
From: jglisse @ 2017-08-31 21:17 UTC (permalink / raw)
To: linux-mm
Cc: linux-kernel, Jérôme Glisse, Paolo Bonzini,
Radim Krčmář, kvm, Kirill A . Shutemov,
Andrew Morton, Linus Torvalds, Andrea Arcangeli
From: Jérôme Glisse <jglisse@redhat.com>
Calls to mmu_notifier_invalidate_page() are replaced by calls to
mmu_notifier_invalidate_range(), and those calls are bracketed by calls to
mmu_notifier_invalidate_range_start()/end().
Remove the now-useless invalidate_page callback.
Changes since v1 (suggested by Linus Torvalds):
- remove now useless kvm_arch_mmu_notifier_invalidate_page()
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Tested-by: Mike Galbraith <efault@gmx.de>
Tested-by: Adam Borowski <kilobyte@angband.pl>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: kvm@vger.kernel.org
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
---
arch/arm/include/asm/kvm_host.h | 6 ------
arch/arm64/include/asm/kvm_host.h | 6 ------
arch/mips/include/asm/kvm_host.h | 5 -----
arch/powerpc/include/asm/kvm_host.h | 5 -----
arch/x86/include/asm/kvm_host.h | 2 --
arch/x86/kvm/x86.c | 11 ----------
virt/kvm/kvm_main.c | 42 -------------------------------------
7 files changed, 77 deletions(-)
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 127e2dd2e21c..4a879f6ff13b 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -225,12 +225,6 @@ int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);
int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
-/* We do not have shadow page tables, hence the empty hooks */
-static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
- unsigned long address)
-{
-}
-
struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
struct kvm_vcpu __percpu **kvm_get_running_vcpus(void);
void kvm_arm_halt_guest(struct kvm *kvm);
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index d68630007b14..e923b58606e2 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -326,12 +326,6 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);
int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
-/* We do not have shadow page tables, hence the empty hooks */
-static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
- unsigned long address)
-{
-}
-
struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
struct kvm_vcpu * __percpu *kvm_get_running_vcpus(void);
void kvm_arm_halt_guest(struct kvm *kvm);
diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index 2998479fd4e8..a9af1d2dcd69 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -938,11 +938,6 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);
int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
-static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
- unsigned long address)
-{
-}
-
/* Emulation */
int kvm_get_inst(u32 *opc, struct kvm_vcpu *vcpu, u32 *out);
enum emulation_result update_pc(struct kvm_vcpu *vcpu, u32 cause);
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 8b3f1238d07f..e372ed871c51 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -67,11 +67,6 @@ extern int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);
extern int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
extern void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
-static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
- unsigned long address)
-{
-}
-
#define HPTEG_CACHE_NUM (1 << 15)
#define HPTEG_HASH_BITS_PTE 13
#define HPTEG_HASH_BITS_PTE_LONG 12
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f4d120a3e22e..92c9032502d8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1375,8 +1375,6 @@ int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event);
void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu);
-void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
- unsigned long address);
void kvm_define_shared_msr(unsigned index, u32 msr);
int kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 05a5e57c6f39..272320eb328c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6734,17 +6734,6 @@ void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
}
EXPORT_SYMBOL_GPL(kvm_vcpu_reload_apic_access_page);
-void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
- unsigned long address)
-{
- /*
- * The physical address of apic access page is stored in the VMCS.
- * Update it when it becomes invalid.
- */
- if (address == gfn_to_hva(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT))
- kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD);
-}
-
/*
* Returns 1 to let vcpu_run() continue the guest execution loop without
* exiting to the userspace. Otherwise, the value will be returned to the
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 15252d723b54..4d81f6ded88e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -322,47 +322,6 @@ static inline struct kvm *mmu_notifier_to_kvm(struct mmu_notifier *mn)
return container_of(mn, struct kvm, mmu_notifier);
}
-static void kvm_mmu_notifier_invalidate_page(struct mmu_notifier *mn,
- struct mm_struct *mm,
- unsigned long address)
-{
- struct kvm *kvm = mmu_notifier_to_kvm(mn);
- int need_tlb_flush, idx;
-
- /*
- * When ->invalidate_page runs, the linux pte has been zapped
- * already but the page is still allocated until
- * ->invalidate_page returns. So if we increase the sequence
- * here the kvm page fault will notice if the spte can't be
- * established because the page is going to be freed. If
- * instead the kvm page fault establishes the spte before
- * ->invalidate_page runs, kvm_unmap_hva will release it
- * before returning.
- *
- * The sequence increase only need to be seen at spin_unlock
- * time, and not at spin_lock time.
- *
- * Increasing the sequence after the spin_unlock would be
- * unsafe because the kvm page fault could then establish the
- * pte after kvm_unmap_hva returned, without noticing the page
- * is going to be freed.
- */
- idx = srcu_read_lock(&kvm->srcu);
- spin_lock(&kvm->mmu_lock);
-
- kvm->mmu_notifier_seq++;
- need_tlb_flush = kvm_unmap_hva(kvm, address) | kvm->tlbs_dirty;
- /* we've to flush the tlb before the pages can be freed */
- if (need_tlb_flush)
- kvm_flush_remote_tlbs(kvm);
-
- spin_unlock(&kvm->mmu_lock);
-
- kvm_arch_mmu_notifier_invalidate_page(kvm, address);
-
- srcu_read_unlock(&kvm->srcu, idx);
-}
-
static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
struct mm_struct *mm,
unsigned long address,
@@ -510,7 +469,6 @@ static void kvm_mmu_notifier_release(struct mmu_notifier *mn,
}
static const struct mmu_notifier_ops kvm_mmu_notifier_ops = {
- .invalidate_page = kvm_mmu_notifier_invalidate_page,
.invalidate_range_start = kvm_mmu_notifier_invalidate_range_start,
.invalidate_range_end = kvm_mmu_notifier_invalidate_range_end,
.clear_flush_young = kvm_mmu_notifier_clear_flush_young,
--
2.13.5
* Re: [PATCH 00/13] mmu_notifier kill invalidate_page callback v2
2017-08-31 21:17 [PATCH 00/13] mmu_notifier kill invalidate_page callback v2 jglisse
2017-08-31 21:17 ` [PATCH 12/13] KVM: update to new mmu_notifier semantic v2 jglisse
@ 2017-09-02 13:28 ` Andrea Arcangeli
1 sibling, 0 replies; 3+ messages in thread
From: Andrea Arcangeli @ 2017-09-02 13:28 UTC (permalink / raw)
To: jglisse
Cc: linux-mm, linux-kernel, Kirill A . Shutemov, Linus Torvalds,
Andrew Morton, Joerg Roedel, Dan Williams, Sudeep Dutt,
Ashutosh Dixit, Dimitri Sivanich, Jack Steiner, Paolo Bonzini,
Radim Krčmář, linuxppc-dev, dri-devel, amd-gfx,
linux-rdma, iommu, xen-devel, kvm
On Thu, Aug 31, 2017 at 05:17:25PM -0400, Jerome Glisse wrote:
> Jérôme Glisse (13):
> dax: update to new mmu_notifier semantic
> mm/rmap: update to new mmu_notifier semantic
> powerpc/powernv: update to new mmu_notifier semantic
> drm/amdgpu: update to new mmu_notifier semantic
> IB/umem: update to new mmu_notifier semantic
> IB/hfi1: update to new mmu_notifier semantic
> iommu/amd: update to new mmu_notifier semantic
> iommu/intel: update to new mmu_notifier semantic
> misc/mic/scif: update to new mmu_notifier semantic
> sgi-gru: update to new mmu_notifier semantic
> xen/gntdev: update to new mmu_notifier semantic
> KVM: update to new mmu_notifier semantic
> mm/mmu_notifier: kill invalidate_page
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>