* [PATCH 00/13] mmu_notifier kill invalidate_page callback
@ 2017-08-29 23:54 Jérôme Glisse
[not found] ` <20170829235447.10050-1-jglisse-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
` (3 more replies)
0 siblings, 4 replies; 17+ messages in thread
From: Jérôme Glisse @ 2017-08-29 23:54 UTC (permalink / raw)
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Jérôme Glisse, Kirill A. Shutemov, Linus Torvalds,
	Andrew Morton, Andrea Arcangeli, Joerg Roedel, Dan Williams,
	Sudeep Dutt, Ashutosh Dixit, Dimitri Sivanich, Jack Steiner,
	Paolo Bonzini, Radim Krčmář, linuxppc-dev@lists.ozlabs.org,
	dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org,
	linux-rdma@vger.kernel.org, iommu@lists.linux-foundation.org,
	xen-devel@lists.xenproject.org, kvm@vger.kernel.org
(Sorry for cross-posting to so many lists and for the big Cc.)
Please help with testing!
The invalidate_page callback suffered from two pitfalls. First, it used to
run after the page table lock was released, so a new page might have been
set up for the virtual address before the call to invalidate_page(). This
was fixed, in a rather roundabout way, by commit
c7ab0d2fdc840266b39db94538f74207ec2afbf6, which moved the callback under
the page table lock. That in turn broke several existing users of the
mmu_notifier API that assumed they could sleep inside this callback.
The second pitfall was that invalidate_page was the only callback that did
not take an address range to invalidate; instead it was given a single
address and a page. Many of the callback implementers assumed the page
could never be a THP and thus failed to invalidate the appropriate range
for THP pages.
By killing this callback we unify the mmu_notifier callback API so that it
always takes a virtual address range as input.

There are now two clear APIs (leaving aside the page-aging callbacks, which
are seldom used):
- the invalidate_range_start()/end() callbacks (which allow you to sleep)
- invalidate_range(), where you cannot sleep, but which runs right after
  the page table update, under the page table lock (both sketched below)
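To make the distinction concrete, here is a minimal sketch of an ops table
wiring up both styles. All my_* names are invented for illustration; only
the mmu_notifier_ops fields and the callback signatures are real (as of the
4.13-era API):

  #include <linux/mmu_notifier.h>

  /* May sleep: brackets the CPU page table update for [start, end). */
  static void my_range_start(struct mmu_notifier *mn, struct mm_struct *mm,
			     unsigned long start, unsigned long end)
  {
	/* e.g. take a mutex and tear down device mappings for the range */
  }

  static void my_range_end(struct mmu_notifier *mn, struct mm_struct *mm,
			   unsigned long start, unsigned long end)
  {
	/* the CPU page table update for [start, end) is now complete */
  }

  /* Must not sleep: runs under the page table spinlock, right after the
   * CPU page table has been updated. */
  static void my_invalidate_range(struct mmu_notifier *mn,
				  struct mm_struct *mm,
				  unsigned long start, unsigned long end)
  {
	/* e.g. a non-blocking IOMMU/device TLB shootdown for the range */
  }

  static const struct mmu_notifier_ops my_mn_ops = {
	.invalidate_range_start	= my_range_start,
	.invalidate_range_end	= my_range_end,
	.invalidate_range	= my_invalidate_range,
  };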
Note that a lot of existing users look broken with respect to range_start()/
range_end(). Many users only have a range_start() callback, but there is
nothing preventing them from undoing what was invalidated in their
range_start() callback after it returns, but before the CPU page table
update takes place.
The code pattern used in KVM or umem ODP is an example of how to properly
avoid such a race. In a nutshell, use some kind of sequence number and an
active-range-invalidation counter to block anything that might undo what
the range_start() callback did; see the sketch below.
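For illustration only, a minimal sketch of that pattern with invented names
(KVM's kvm_mmu_notifier_* code is the canonical implementation):

  #include <linux/mmu_notifier.h>
  #include <linux/spinlock.h>

  struct my_mirror {
	struct mmu_notifier mn;
	spinlock_t lock;
	unsigned long invalidate_seq;	/* bumped by every range_start() */
	int invalidate_count;		/* invalidations still in flight */
  };

  static void my_range_start(struct mmu_notifier *mn, struct mm_struct *mm,
			     unsigned long start, unsigned long end)
  {
	struct my_mirror *m = container_of(mn, struct my_mirror, mn);

	spin_lock(&m->lock);
	m->invalidate_count++;
	m->invalidate_seq++;
	spin_unlock(&m->lock);
	/* ... tear down secondary mappings covering [start, end) ... */
  }

  static void my_range_end(struct mmu_notifier *mn, struct mm_struct *mm,
			   unsigned long start, unsigned long end)
  {
	struct my_mirror *m = container_of(mn, struct my_mirror, mn);

	spin_lock(&m->lock);
	m->invalidate_count--;
	spin_unlock(&m->lock);
  }

  /*
   * Secondary fault path: only commit a mapping if no invalidation ran
   * (or is still running) since we translated the address.
   */
  static int my_map_page(struct my_mirror *m, unsigned long addr)
  {
	unsigned long seq;

  again:
	seq = READ_ONCE(m->invalidate_seq);

	/* ... translate addr to a page, e.g. with get_user_pages() ... */

	spin_lock(&m->lock);
	if (m->invalidate_count || m->invalidate_seq != seq) {
		spin_unlock(&m->lock);
		goto again;	/* raced with an invalidation: retry */
	}
	/* install the secondary mapping while still holding the lock */
	spin_unlock(&m->lock);
	return 0;
  }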
If you do not care about staying fully in sync with the CPU page table (i.e.
you can live with the CPU page table pointing to a new, different page for
a given virtual address), then you can take a reference on the pages inside
the range_start() callback and drop it in range_end(), or whenever your
driver is done with those pages; a sketch of this follows.
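Roughly like this, reusing the struct from the previous sketch; the lookup
and bookkeeping helpers (my_mirror_lookup(), my_mirror_remember_pin(),
my_mirror_forget_pin()) are made-up driver-private functions:

  static void my_range_start(struct mmu_notifier *mn, struct mm_struct *mm,
			     unsigned long start, unsigned long end)
  {
	struct my_mirror *m = container_of(mn, struct my_mirror, mn);
	unsigned long addr;

	for (addr = start; addr < end; addr += PAGE_SIZE) {
		/* hypothetical lookup of the page mirrored at addr */
		struct page *page = my_mirror_lookup(m, addr);

		if (!page)
			continue;
		get_page(page);			/* old page stays valid for us */
		my_mirror_remember_pin(m, page);	/* hypothetical bookkeeping */
	}
  }

  static void my_range_end(struct mmu_notifier *mn, struct mm_struct *mm,
			   unsigned long start, unsigned long end)
  {
	struct my_mirror *m = container_of(mn, struct my_mirror, mn);
	struct page *page;

	/* drop the pins taken above (or defer until the device is idle) */
	while ((page = my_mirror_forget_pin(m, start, end)))
		put_page(page);
  }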
The last alternative is to use invalidate_range(), if you can do the
invalidation without sleeping, as the invalidate_range() callback happens
under the CPU page table spinlock, right after the page table is updated;
again, a sketch follows.
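For instance (my_device_flush_tlb() is a made-up stand-in for whatever
non-blocking TLB shootdown your hardware provides; compare the real flush
paths in amd_iommu_v2 and intel-svm):

  /* Runs under the page table spinlock: no sleeping, no allocation. */
  static void my_invalidate_range(struct mmu_notifier *mn,
				  struct mm_struct *mm,
				  unsigned long start, unsigned long end)
  {
	struct my_mirror *m = container_of(mn, struct my_mirror, mn);

	/* e.g. an MMIO doorbell write plus a bounded completion poll */
	my_device_flush_tlb(m, start, end);
  }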
Note this is barely tested. I intend to do more testing over the next few
days, but I do not have access to all the hardware that makes use of the
mmu_notifier API.
The first 2 patches convert existing calls of mmu_notifier_invalidate_page()
to mmu_notifier_invalidate_range() and bracket those calls with calls to
mmu_notifier_invalidate_range_start()/end().

The next 10 patches remove the existing invalidate_page() callbacks, as
they can no longer be invoked.

Finally, the last patch removes invalidate_page() completely so it can RIP.

Jérôme Glisse (13):
dax: update to new mmu_notifier semantic
mm/rmap: update to new mmu_notifier semantic
powerpc/powernv: update to new mmu_notifier semantic
drm/amdgpu: update to new mmu_notifier semantic
IB/umem: update to new mmu_notifier semantic
IB/hfi1: update to new mmu_notifier semantic
iommu/amd: update to new mmu_notifier semantic
iommu/intel: update to new mmu_notifier semantic
misc/mic/scif: update to new mmu_notifier semantic
sgi-gru: update to new mmu_notifier semantic
xen/gntdev: update to new mmu_notifier semantic
KVM: update to new mmu_notifier semantic
mm/mmu_notifier: kill invalidate_page
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Sudeep Dutt <sudeep.dutt@intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: dri-devel@lists.freedesktop.org
Cc: amd-gfx@lists.freedesktop.org
Cc: linux-rdma@vger.kernel.org
Cc: iommu@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Cc: kvm@vger.kernel.org
arch/powerpc/platforms/powernv/npu-dma.c | 10 --------
drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 31 ----------------------
drivers/infiniband/core/umem_odp.c | 19 --------------
drivers/infiniband/hw/hfi1/mmu_rb.c | 9 -------
drivers/iommu/amd_iommu_v2.c | 8 ------
drivers/iommu/intel-svm.c | 9 -------
drivers/misc/mic/scif/scif_dma.c | 11 --------
drivers/misc/sgi-gru/grutlbpurge.c | 12 ---------
drivers/xen/gntdev.c | 8 ------
fs/dax.c | 19 ++++++++------
include/linux/mm.h | 1 +
include/linux/mmu_notifier.h | 25 ------------------
mm/memory.c | 26 +++++++++++++++----
mm/mmu_notifier.c | 14 ----------
mm/rmap.c | 44 +++++++++++++++++++++++++++++---
virt/kvm/kvm_main.c | 42 ------------------------------
16 files changed, 74 insertions(+), 214 deletions(-)
--
2.13.5
^ permalink raw reply	[flat|nested] 17+ messages in thread

[parent not found: <20170829235447.10050-1-jglisse-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* [PATCH 07/13] iommu/amd: update to new mmu_notifier semantic
  [not found] ` <20170829235447.10050-1-jglisse-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2017-08-29 23:54   ` Jérôme Glisse
  2017-08-30  0:11   ` [PATCH 00/13] mmu_notifier kill invalidate_page callback Linus Torvalds
  1 sibling, 0 replies; 17+ messages in thread
From: Jérôme Glisse @ 2017-08-29 23:54 UTC (permalink / raw)
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Andrea Arcangeli, Joerg Roedel, iommu@lists.linux-foundation.org,
	Jérôme Glisse, Andrew Morton, Linus Torvalds, Kirill A. Shutemov

Calls to mmu_notifier_invalidate_page() are replaced by calls to
mmu_notifier_invalidate_range(), and those calls are bracketed by calls
to mmu_notifier_invalidate_range_start()/end().

Remove the now useless invalidate_page callback.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
---
 drivers/iommu/amd_iommu_v2.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/drivers/iommu/amd_iommu_v2.c b/drivers/iommu/amd_iommu_v2.c
index 6629c472eafd..dccf5b76eff2 100644
--- a/drivers/iommu/amd_iommu_v2.c
+++ b/drivers/iommu/amd_iommu_v2.c
@@ -391,13 +391,6 @@ static int mn_clear_flush_young(struct mmu_notifier *mn,
 	return 0;
 }
 
-static void mn_invalidate_page(struct mmu_notifier *mn,
-			       struct mm_struct *mm,
-			       unsigned long address)
-{
-	__mn_flush_page(mn, address);
-}
-
 static void mn_invalidate_range(struct mmu_notifier *mn,
 				struct mm_struct *mm,
 				unsigned long start, unsigned long end)
@@ -436,7 +429,6 @@ static void mn_release(struct mmu_notifier *mn, struct mm_struct *mm)
 static const struct mmu_notifier_ops iommu_mn = {
 	.release		= mn_release,
 	.clear_flush_young	= mn_clear_flush_young,
-	.invalidate_page	= mn_invalidate_page,
 	.invalidate_range	= mn_invalidate_range,
 };
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 17+ messages in thread
* Re: [PATCH 00/13] mmu_notifier kill invalidate_page callback
  [not found] ` <20170829235447.10050-1-jglisse-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2017-08-29 23:54   ` [PATCH 07/13] iommu/amd: update to new mmu_notifier semantic Jérôme Glisse
@ 2017-08-30  0:11   ` Linus Torvalds
  2017-08-30  0:56     ` Jerome Glisse
  1 sibling, 1 reply; 17+ messages in thread
From: Linus Torvalds @ 2017-08-30 0:11 UTC (permalink / raw)
To: Jérôme Glisse, Bernhard Held, Adam Borowski
Cc: Andrea Arcangeli, Joerg Roedel, KVM list, Radim Krčmář,
	linux-rdma@vger.kernel.org, Jack Steiner, Linux Kernel Mailing List,
	DRI, Sudeep Dutt, Ashutosh Dixit, linux-mm,
	open list:AMD IOMMU (AMD-VI), Dimitri Sivanich,
	amd-gfx@lists.freedesktop.org, xen-devel, Paolo Bonzini,
	Andrew Morton, ppc-dev, Dan Williams, Kirill A. Shutemov

On Tue, Aug 29, 2017 at 4:54 PM, Jérôme Glisse <jglisse@redhat.com> wrote:
>
> Note this is barely tested. I intend to do more testing over the next few
> days, but I do not have access to all the hardware that makes use of the
> mmu_notifier API.

Thanks for doing this.

> The first 2 patches convert existing calls of mmu_notifier_invalidate_page()
> to mmu_notifier_invalidate_range() and bracket those calls with calls to
> mmu_notifier_invalidate_range_start()/end().

Ok, those two patches are a bit more complex than I was hoping for,
but not *too* bad.

And the final end result certainly looks nice:

> 16 files changed, 74 insertions(+), 214 deletions(-)

Yeah, removing all those invalidate_page() notifiers certainly makes
for a nice patch.

And I actually think you missed some more lines that can now be
removed: kvm_arch_mmu_notifier_invalidate_page() should no longer be
needed either, so you can remove all of those too (most of them are
empty inline functions, but x86 has one that actually does something).

So there's an added 30 or so dead lines that should be removed in the
kvm patch, I think.

But from a _very_ quick read-through this looks fine. But it obviously
needs testing.

People - *especially* the people who saw issues under KVM - can you
try out Jérôme's patch series? I added some people to the cc, the full
series is on lkml. Jérôme - do you have a git branch for people to
test that they could easily pull and try out?

               Linus

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: [PATCH 00/13] mmu_notifier kill invalidate_page callback
  2017-08-30  0:11 ` [PATCH 00/13] mmu_notifier kill invalidate_page callback Linus Torvalds
@ 2017-08-30  0:56   ` Jerome Glisse
  2017-08-30  8:40     ` Mike Galbraith
  2017-08-30 14:57     ` Adam Borowski
  0 siblings, 2 replies; 17+ messages in thread
From: Jerome Glisse @ 2017-08-30 0:56 UTC (permalink / raw)
To: Linus Torvalds
Cc: Bernhard Held, KVM list, Radim Krčmář, Sudeep Dutt, DRI, linux-mm,
	Andrea Arcangeli, Dimitri Sivanich, linux-rdma@vger.kernel.org,
	amd-gfx, xen-devel, Adam Borowski, Joerg Roedel, Jack Steiner,
	Dan Williams, Linux Kernel Mailing List, Ashutosh Dixit,
	open list:AMD IOMMU (AMD-VI), Paolo Bonzini, Andrew Morton, ppc-dev

On Tue, Aug 29, 2017 at 05:11:24PM -0700, Linus Torvalds wrote:
> On Tue, Aug 29, 2017 at 4:54 PM, Jérôme Glisse <jglisse@redhat.com> wrote:
> >
> > Note this is barely tested. I intend to do more testing over the next few
> > days, but I do not have access to all the hardware that makes use of the
> > mmu_notifier API.
>
> Thanks for doing this.
>
> > The first 2 patches convert existing calls of mmu_notifier_invalidate_page()
> > to mmu_notifier_invalidate_range() and bracket those calls with calls to
> > mmu_notifier_invalidate_range_start()/end().
>
> Ok, those two patches are a bit more complex than I was hoping for,
> but not *too* bad.
>
> And the final end result certainly looks nice:
>
> > 16 files changed, 74 insertions(+), 214 deletions(-)
>
> Yeah, removing all those invalidate_page() notifiers certainly makes
> for a nice patch.
>
> And I actually think you missed some more lines that can now be
> removed: kvm_arch_mmu_notifier_invalidate_page() should no longer be
> needed either, so you can remove all of those too (most of them are
> empty inline functions, but x86 has one that actually does something).
>
> So there's an added 30 or so dead lines that should be removed in the
> kvm patch, I think.

Yes, I missed that. I will wait for people to test and for the results of
my own tests before reposting if need be; otherwise I will post that as a
separate patch.

> But from a _very_ quick read-through this looks fine. But it obviously
> needs testing.
>
> People - *especially* the people who saw issues under KVM - can you
> try out Jérôme's patch series? I added some people to the cc, the full
> series is on lkml. Jérôme - do you have a git branch for people to
> test that they could easily pull and try out?

https://cgit.freedesktop.org/~glisse/linux mmu-notifier branch
git://people.freedesktop.org/~glisse/linux

(Sorry if that tree is a bit big; it has a lot of dead things. I need to
push a clean and slim one.)

Jérôme

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: [PATCH 00/13] mmu_notifier kill invalidate_page callback
  2017-08-30  0:56   ` Jerome Glisse
@ 2017-08-30  8:40     ` Mike Galbraith
  2017-08-30 14:57     ` Adam Borowski
  1 sibling, 0 replies; 17+ messages in thread
From: Mike Galbraith @ 2017-08-30 8:40 UTC (permalink / raw)
To: Jerome Glisse, Linus Torvalds
Cc: Bernhard Held, Adam Borowski, Linux Kernel Mailing List, linux-mm,
	Kirill A. Shutemov, Andrew Morton, Andrea Arcangeli, Joerg Roedel,
	Dan Williams, Sudeep Dutt, Ashutosh Dixit, Dimitri Sivanich,
	Jack Steiner, Paolo Bonzini, Radim Krčmář, ppc-dev, DRI, amd-gfx,
	linux-rdma@vger.kernel.org, open list:AMD IOMMU (AMD-VI), xen-devel

On Tue, 2017-08-29 at 20:56 -0400, Jerome Glisse wrote:
> On Tue, Aug 29, 2017 at 05:11:24PM -0700, Linus Torvalds wrote:
> >
> > People - *especially* the people who saw issues under KVM - can you
> > try out Jérôme's patch series? I added some people to the cc, the full
> > series is on lkml. Jérôme - do you have a git branch for people to
> > test that they could easily pull and try out?
>
> https://cgit.freedesktop.org/~glisse/linux mmu-notifier branch
> git://people.freedesktop.org/~glisse/linux

Looks good here. I reproduced fairly quickly with an RT host and 1 RT
guest by just having the guest do a parallel kbuild over NFS (the guest
had to be restored afterward; it was corrupted). I'm currently flogging
2 guests as well as the host, whimper free. I'll let the lot broil for a
while longer, but at this point, smoke/flame appearance seems
comfortingly unlikely.

	-Mike

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: [PATCH 00/13] mmu_notifier kill invalidate_page callback
  2017-08-30  0:56   ` Jerome Glisse
  2017-08-30  8:40     ` Mike Galbraith
@ 2017-08-30 14:57     ` Adam Borowski
  2017-09-01 14:47       ` Jeff Cook
  1 sibling, 1 reply; 17+ messages in thread
From: Adam Borowski @ 2017-08-30 14:57 UTC (permalink / raw)
To: Jerome Glisse
Cc: Linus Torvalds, Bernhard Held, Linux Kernel Mailing List, linux-mm,
	Kirill A. Shutemov, Andrew Morton, Andrea Arcangeli, Joerg Roedel,
	Dan Williams, Sudeep Dutt, Ashutosh Dixit, Dimitri Sivanich,
	Jack Steiner, Paolo Bonzini, Radim Krčmář, ppc-dev, DRI, amd-gfx,
	linux-rdma@vger.kernel.org, open list:AMD IOMMU (AMD-VI)

On Tue, Aug 29, 2017 at 08:56:15PM -0400, Jerome Glisse wrote:
> I will wait for people to test and for the results of my own tests before
> reposting if need be; otherwise I will post that as a separate patch.
>
> > But from a _very_ quick read-through this looks fine. But it obviously
> > needs testing.
> >
> > People - *especially* the people who saw issues under KVM - can you
> > try out Jérôme's patch series? I added some people to the cc, the full
> > series is on lkml. Jérôme - do you have a git branch for people to
> > test that they could easily pull and try out?
>
> https://cgit.freedesktop.org/~glisse/linux mmu-notifier branch
> git://people.freedesktop.org/~glisse/linux

Tested your branch as of 10f07641, on a long list of guest VMs.
No earth-shattering kaboom.


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢰⠒⠀⣿⡁ Vat kind uf sufficiently advanced technology iz dis!?
⢿⡄⠘⠷⠚⠋⠀                              -- Genghis Ht'rok'din
⠈⠳⣄⠀⠀⠀⠀

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: [PATCH 00/13] mmu_notifier kill invalidate_page callback
  2017-08-30 14:57     ` Adam Borowski
@ 2017-09-01 14:47       ` Jeff Cook
  2017-09-01 14:50         ` taskboxtester
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff Cook @ 2017-09-01 14:47 UTC (permalink / raw)
To: Adam Borowski, Jerome Glisse
Cc: Linus Torvalds, Bernhard Held, Linux Kernel Mailing List, linux-mm,
	Kirill A. Shutemov, Andrew Morton, Andrea Arcangeli, Joerg Roedel,
	Dan Williams, Sudeep Dutt, Ashutosh Dixit, Dimitri Sivanich,
	Jack Steiner, Paolo Bonzini, Radim Krčmář, ppc-dev, DRI, amd-gfx,
	linux-rdma, open list:AMD IOMMU (AMD-VI), xen-devel

On Wed, Aug 30, 2017, at 10:57 AM, Adam Borowski wrote:
> On Tue, Aug 29, 2017 at 08:56:15PM -0400, Jerome Glisse wrote:
> > I will wait for people to test and for the results of my own tests before
> > reposting if need be; otherwise I will post that as a separate patch.
> >
> > > But from a _very_ quick read-through this looks fine. But it obviously
> > > needs testing.
> > >
> > > People - *especially* the people who saw issues under KVM - can you
> > > try out Jérôme's patch series? I added some people to the cc, the full
> > > series is on lkml. Jérôme - do you have a git branch for people to
> > > test that they could easily pull and try out?
> >
> > https://cgit.freedesktop.org/~glisse/linux mmu-notifier branch
> > git://people.freedesktop.org/~glisse/linux
>
> Tested your branch as of 10f07641, on a long list of guest VMs.
> No earth-shattering kaboom.

I've been using the mmu_notifier branch @ a3d944233bcf8c for the last 36
hours or so, also without incident. Unlike most other reporters, I
experienced a similar splat on 4.12:

Aug 03 15:02:47 kvm_master kernel: ------------[ cut here ]------------
Aug 03 15:02:47 kvm_master kernel: WARNING: CPU: 13 PID: 1653 at arch/x86/kvm/mmu.c:682 mmu_spte_clear_track_bits+0xfb/0x100 [kvm]
Aug 03 15:02:47 kvm_master kernel: Modules linked in: vhost_net vhost tap xt_conntrack xt_CHECKSUM iptable_mangle ipt_REJECT nf_reject_ipv4 xt_tcpudp tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter msr nls_iso8859_1 nls_cp437 intel_rapl ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel input_leds pcbc aesni_intel led_class aes_x86_64 mxm_wmi crypto_simd glue_helper uvcvideo cryptd videobuf2_vmalloc videobuf2_memops igb videobuf2_v4l2 videobuf2_core snd_usb_audio videodev media joydev ptp evdev mousedev intel_cstate pps_core mac_hid intel_rapl_perf snd_hda_intel snd_virtuoso snd_usbmidi_lib snd_hda_codec snd_oxygen_lib snd_hda_core
Aug 03 15:02:47 kvm_master kernel: snd_mpu401_uart snd_rawmidi snd_hwdep snd_seq_device snd_pcm snd_timer snd soundcore i2c_algo_bit pcspkr i2c_i801 lpc_ich ioatdma shpchp dca wmi acpi_power_meter tpm_tis tpm_tis_core tpm button bridge stp llc sch_fq_codel virtio_pci virtio_blk virtio_balloon virtio_net virtio_ring virtio kvm_intel kvm sg ip_tables x_tables hid_logitech_hidpp hid_logitech_dj hid_generic hid_microsoft usbhid hid sr_mod cdrom sd_mod xhci_pci ahci libahci xhci_hcd libata usbcore scsi_mod usb_common zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio vfat fat ext4 crc16 jbd2 fscrypto mbcache dm_thin_pool dm_cache dm_persistent_data dm_bio_prison dm_bufio dm_raid raid456 libcrc32c
Aug 03 15:02:47 kvm_master kernel: crc32c_generic crc32c_intel async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq dm_mod dax raid1 md_mod
Aug 03 15:02:47 kvm_master kernel: CPU: 13 PID: 1653 Comm: kworker/13:2 Tainted: P B D W O 4.12.3-1-ARCH #1
Aug 03 15:02:47 kvm_master kernel: Hardware name: Supermicro SYS-7038A-I/X10DAI, BIOS 2.0a 11/09/2016
Aug 03 15:02:47 kvm_master kernel: Workqueue: events mmput_async_fn
Aug 03 15:02:47 kvm_master kernel: task: ffff9fa89751b900 task.stack: ffffc179880d8000
Aug 03 15:02:47 kvm_master kernel: RIP: 0010:mmu_spte_clear_track_bits+0xfb/0x100 [kvm]
Aug 03 15:02:47 kvm_master kernel: RSP: 0018:ffffc179880dbc20 EFLAGS: 00010246
Aug 03 15:02:47 kvm_master kernel: RAX: 0000000000000000 RBX: 00000009c07cce77 RCX: dead0000000000ff
Aug 03 15:02:47 kvm_master kernel: RDX: 0000000000000000 RSI: ffff9fa82d6d6f08 RDI: fffff6e76701f300
Aug 03 15:02:47 kvm_master kernel: RBP: ffffc179880dbc38 R08: 0000000000100000 R09: 000000000000000d
Aug 03 15:02:47 kvm_master kernel: R10: ffff9fa0a56b0008 R11: ffff9fa0a56b0000 R12: 00000000009c07cc
Aug 03 15:02:47 kvm_master kernel: R13: ffff9fa88b990000 R14: ffff9f9e19dbb1b8 R15: 0000000000000000
Aug 03 15:02:47 kvm_master kernel: FS:  0000000000000000(0000) GS:ffff9fac5f340000(0000) knlGS:0000000000000000
Aug 03 15:02:47 kvm_master kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 03 15:02:47 kvm_master kernel: CR2: ffffd1b542d71000 CR3: 0000000570a09000 CR4: 00000000003426e0
Aug 03 15:02:47 kvm_master kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 03 15:02:47 kvm_master kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Aug 03 15:02:47 kvm_master kernel: Call Trace:
Aug 03 15:02:47 kvm_master kernel:  drop_spte+0x1a/0xb0 [kvm]
Aug 03 15:02:47 kvm_master kernel:  mmu_page_zap_pte+0x9c/0xe0 [kvm]
Aug 03 15:02:47 kvm_master kernel:  kvm_mmu_prepare_zap_page+0x65/0x310 [kvm]
Aug 03 15:02:47 kvm_master kernel:  kvm_mmu_invalidate_zap_all_pages+0x10d/0x160 [kvm]
Aug 03 15:02:47 kvm_master kernel:  kvm_arch_flush_shadow_all+0xe/0x10 [kvm]
Aug 03 15:02:47 kvm_master kernel:  kvm_mmu_notifier_release+0x2c/0x40 [kvm]
Aug 03 15:02:47 kvm_master kernel:  __mmu_notifier_release+0x44/0xc0
Aug 03 15:02:47 kvm_master kernel:  exit_mmap+0x142/0x150
Aug 03 15:02:47 kvm_master kernel:  ? kfree+0x175/0x190
Aug 03 15:02:47 kvm_master kernel:  ? kfree+0x175/0x190
Aug 03 15:02:47 kvm_master kernel:  ? exit_aio+0xc6/0x100
Aug 03 15:02:47 kvm_master kernel:  mmput_async_fn+0x4c/0x130
Aug 03 15:02:47 kvm_master kernel:  process_one_work+0x1de/0x430
Aug 03 15:02:47 kvm_master kernel:  worker_thread+0x47/0x3f0
Aug 03 15:02:47 kvm_master kernel:  kthread+0x125/0x140
Aug 03 15:02:47 kvm_master kernel:  ? process_one_work+0x430/0x430
Aug 03 15:02:47 kvm_master kernel:  ? kthread_create_on_node+0x70/0x70
Aug 03 15:02:47 kvm_master kernel:  ret_from_fork+0x25/0x30
Aug 03 15:02:47 kvm_master kernel: Code: ec 75 04 00 48 b8 00 00 00 00 00 00 00 40 48 21 da 48 39 c2 0f 95 c0 eb b2 48 d1 eb 83 e3 01 eb c0 4c 89 e7 e8 f7 3d fe ff eb a4 <0f> ff eb 8a 90 0f 1f 44 00 00 55 48 89 e5 53 89 d3 e8 ff 4a fe
Aug 03 15:02:47 kvm_master kernel: ---[ end trace 8710f4d700a7d36e ]---

This would typically take 36-48 hours to surface, so we're good so far,
but not completely out of the woods yet. I'm optimistic that since this
patchset changes the mmu_notifier behavior to something safer in general,
this issue will also be resolved by it.

Jeff

> Meow!
> -- 
> ⢀⣴⠾⠻⢶⣦⠀
> ⣾⠁⢰⠒⠀⣿⡁ Vat kind uf sufficiently advanced technology iz dis!?
> ⢿⡄⠘⠷⠚⠋⠀                              -- Genghis Ht'rok'din
> ⠈⠳⣄⠀⠀⠀⠀

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: [PATCH 00/13] mmu_notifier kill invalidate_page callback
  2017-09-01 14:47       ` Jeff Cook
@ 2017-09-01 14:50         ` taskboxtester
  0 siblings, 0 replies; 17+ messages in thread
From: taskboxtester @ 2017-09-01 14:50 UTC (permalink / raw)
To: Jeff Cook
Cc: open list:AMD IOMMU (AMD-VI), Paolo Bonzini, Dan Williams, linux-rdma,
	Jack Steiner, Dimitri Sivanich, Bernhard Held, Andrew Morton,
	Radim Krčmář, amd-gfx, DRI, xen-devel, Joerg Roedel, Jerome Glisse,
	ppc-dev, Linus Torvalds, Kirill A. Shutemov, Andrea Arcangeli,
	Sudeep Dutt, KVM list, Adam Borowski

taskboxtester@gmail.com liked your message with Boxer for Android.

On Sep 1, 2017 10:48 AM, Jeff Cook <jeff@jeffcook.io> wrote:
> [...]

^ permalink raw reply	[flat|nested] 17+ messages in thread
* [PATCH 08/13] iommu/intel: update to new mmu_notifier semantic
  2017-08-29 23:54 [PATCH 00/13] mmu_notifier kill invalidate_page callback Jérôme Glisse
  [not found] ` <20170829235447.10050-1-jglisse-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2017-08-29 23:54 ` Jérôme Glisse
  [not found] ` <20170829235447.10050-3-jglisse@redhat.com>
  2017-11-30  9:33 ` BSOD with [PATCH 00/13] mmu_notifier kill invalidate_page callback Fabian Grünbichler
  3 siblings, 0 replies; 17+ messages in thread
From: Jérôme Glisse @ 2017-08-29 23:54 UTC (permalink / raw)
To: linux-kernel, linux-mm
Cc: Jérôme Glisse, David Woodhouse, iommu, Joerg Roedel,
	Kirill A. Shutemov, Andrew Morton, Linus Torvalds, Andrea Arcangeli

Calls to mmu_notifier_invalidate_page() are replaced by calls to
mmu_notifier_invalidate_range(), and those calls are bracketed by calls
to mmu_notifier_invalidate_range_start()/end().

Remove the now useless invalidate_page callback.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: iommu@lists.linux-foundation.org
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
---
 drivers/iommu/intel-svm.c | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index f167c0d84ebf..f620dccec8ee 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -223,14 +223,6 @@ static void intel_change_pte(struct mmu_notifier *mn, struct mm_struct *mm,
 	intel_flush_svm_range(svm, address, 1, 1, 0);
 }
 
-static void intel_invalidate_page(struct mmu_notifier *mn, struct mm_struct *mm,
-				  unsigned long address)
-{
-	struct intel_svm *svm = container_of(mn, struct intel_svm, notifier);
-
-	intel_flush_svm_range(svm, address, 1, 1, 0);
-}
-
 /* Pages have been freed at this point */
 static void intel_invalidate_range(struct mmu_notifier *mn,
 				   struct mm_struct *mm,
@@ -285,7 +277,6 @@ static void intel_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
 static const struct mmu_notifier_ops intel_mmuops = {
 	.release = intel_mm_release,
 	.change_pte = intel_change_pte,
-	.invalidate_page = intel_invalidate_page,
 	.invalidate_range = intel_invalidate_range,
 };
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 17+ messages in thread
[parent not found: <20170829235447.10050-3-jglisse@redhat.com>]
[parent not found: <6D58FBE4-5D03-49CC-AAFF-3C1279A5A849@gmail.com>]
[parent not found: <20170830172747.GE13559@redhat.com>]
[parent not found: <003685D9-4DA9-42DC-AF46-7A9F8A43E61F@gmail.com>]
[parent not found: <20170830212514.GI13559@redhat.com>]
* Re: [PATCH 02/13] mm/rmap: update to new mmu_notifier semantic
  [not found] ` <20170830212514.GI13559@redhat.com>
@ 2017-08-30 23:25   ` Nadav Amit
  2017-08-31  0:47     ` Jerome Glisse
  0 siblings, 1 reply; 17+ messages in thread
From: Nadav Amit @ 2017-08-30 23:25 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Jérôme Glisse, Linux Kernel Mailing List, open list:MEMORY MANAGEMENT,
	Dan Williams, Ross Zwisler, Linus Torvalds, Bernhard Held,
	Adam Borowski, Radim Krčmář, Wanpeng Li, Paolo Bonzini,
	Takashi Iwai, Mike Galbraith, Kirill A. Shutemov, axie,
	Andrew Morton, Joerg Roedel, iommu

[cc’ing IOMMU people, who for some reason were not cc’d]

Andrea Arcangeli <aarcange@redhat.com> wrote:

> On Wed, Aug 30, 2017 at 11:00:32AM -0700, Nadav Amit wrote:
>> It is not trivial to flush TLBs (primary or secondary) without holding the
>> page-table lock, and as we recently encountered this resulted in several
>> bugs [1]. The main problem is that even if you perform the TLB flush
>> immediately after the PT-lock is released, you cause a situation in which
>> other threads may make decisions based on the updated PTE value, without
>> being aware that a TLB flush is needed.
>>
>> For example, we recently encountered a Linux bug when two threads run
>> MADV_DONTNEED concurrently on the same address range [2]. One of the
>> threads may update a PTE, setting it as non-present, and then defer the
>> TLB flush (for batching). This would cause the second thread, which also
>> changes the PTEs, to assume that the PTE is already non-present and that
>> a TLB flush is not necessary. As a result the second core may still hold
>> stale PTEs in its TLB following MADV_DONTNEED.
>
> The source of those complex races that require taking nested TLB gather
> into account to solve them originates from skipping primary MMU TLB
> flushes depending on the value of the pagetables (as an optimization).
>
> For mmu_notifier_invalidate_range_end we always ignore the value of
> the pagetables, and mmu_notifier_invalidate_range_end always runs
> unconditionally, invalidating the secondary MMU for the whole range
> under consideration. There are no optimizations that attempt to skip
> mmu_notifier_invalidate_range_end depending on the pagetable value, and
> there's no TLB gather for secondary MMUs either. That is to keep it
> simple of course.
>
> As long as those mmu_notifier_invalidate_range_end calls stay
> unconditional, I don't see how those races you linked can possibly be
> relevant in evaluating whether ->invalidate_range (again, only for
> iommuv2 and intel-svm) has to run inside the PT lock or not.

Thanks for the clarifications. It now makes much more sense.

>> There is a swarm of such problems, and some are not too trivial. Deferring
>> TLB flushes needs to be done in a very careful manner.
>
> I agree it's not trivial, but I don't think any complexity comes from
> the above.
>
> The only complexity is about: what if the page is copied to some other
> page and replaced, because the copy is the only case where coherency
> could be retained by the primary MMU. What if the primary MMU starts
> working on the new page in between PT lock release and
> mmu_notifier_invalidate_range_end, while the secondary MMU is stuck on
> the old page? That is the only problem we deal with here, the copy to
> another page and replace. Any other case that doesn't involve the copy
> seems non-coherent by definition, and you couldn't measure it.
>
> I can't think of a scenario that requires the explicit
> mmu_notifier_invalidate_range call before releasing the PT lock, at
> least for try_to_unmap_one.
>
> Could you think of a scenario where calling ->invalidate_range inside
> mmu_notifier_invalidate_range_end post PT lock breaks iommuv2 or
> intel-svm? Those two are the only ones requiring
> ->invalidate_range calls; all other mmu notifier users are safe
> without running mmu_notifier_invalidate_range_end under PT lock, thanks
> to mmu_notifier_invalidate_range_start blocking the secondary MMU.
>
> Could you post a tangible scenario that invalidates my theory that
> those mmu_notifier_invalidate_range calls inside PT lock would be
> superfluous?
>
> Some of the scenarios under consideration:
>
> 1) migration entry -> migration_entry_wait -> page lock, plus
>    migrate_pages taking the lock so it can't race with try_to_unmap
>    from other places
> 2) swap entry -> lookup_swap_cache -> page lock (page not really replaced)
> 3) CoW -> do_wp_page -> page lock on old page
> 4) KSM -> replace_page -> page lock on old page
> 5) if the pte is zapped as in MADV_DONTNEED, no coherency possible so
>    it's not measurable that we let the guest run a bit longer on the
>    old page past PT lock release

For both CoW and KSM, correctness is maintained by calling
ptep_clear_flush_notify(). If you defer the secondary MMU invalidation
(i.e., replace ptep_clear_flush_notify() with ptep_clear_flush()), you
will cause memory corruption, and the page lock would not be enough.

BTW: I see some calls to ptep_clear_flush_notify() which are followed
immediately by set_pte_at_notify(). I do not understand why this makes
sense, as both notifications end up doing the same thing - invalidating
the secondary MMU. The set_pte_at_notify() in these cases can be changed
to set_pte(). No?

> If you could post a multi-CPU trace that shows how iommuv2 or
> intel-svm are broken if ->invalidate_range is run inside
> mmu_notifier_invalidate_range_end post PT lock in try_to_unmap_one, it
> would help.
>
> Of course if we run mmu_notifier_invalidate_range inside PT lock and
> we remove ->invalidate_range from mmu_notifier_invalidate_range_end,
> all will be obviously safe, so we could just do it to avoid thinking
> about the above, but the resulting code will be uglier and less
> optimal (even when disarmed there will be dummy branches I wouldn't
> love), and I currently can't see why it wouldn't be safe.
>
> Replacing a page completely without any relation to the old page
> content allows no coherency anyway, so even if it breaks you cannot
> measure it because it's undefined.
>
> If the page has a relation with the old contents and coherency has to
> be provided for both primary MMU and secondary MMUs, this relation
> between old and new page during the replacement is enforced by some
> other means besides the PT lock: migration entry on the locked old page
> with migration_entry_wait and page lock in migrate_pages, etc.

I think that basically you are correct, and assuming that you always
notify/invalidate unconditionally any PTE range you read/write, you are
safe. Yet, I want to have another look at the code. Anyhow, just deferring
all the TLB flushes, including those of set_pte_at_notify(), is likely to
result in errors.

Regards,
Nadav

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: [PATCH 02/13] mm/rmap: update to new mmu_notifier semantic
  2017-08-30 23:25   ` [PATCH 02/13] mm/rmap: " Nadav Amit
@ 2017-08-31  0:47     ` Jerome Glisse
  [not found]       ` <20170831004719.GF9445-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Jerome Glisse @ 2017-08-31 0:47 UTC (permalink / raw)
To: Nadav Amit
Cc: Andrea Arcangeli, Linux Kernel Mailing List, open list:MEMORY MANAGEMENT,
	Dan Williams, Ross Zwisler, Linus Torvalds, Bernhard Held,
	Adam Borowski, Radim Krčmář, Wanpeng Li, Paolo Bonzini,
	Takashi Iwai, Mike Galbraith, Kirill A. Shutemov, axie,
	Andrew Morton, Joerg Roedel, iommu

On Wed, Aug 30, 2017 at 04:25:54PM -0700, Nadav Amit wrote:
> [cc’ing IOMMU people, who for some reason were not cc’d]
>
> Andrea Arcangeli <aarcange@redhat.com> wrote:
> > [...]
> > Some of the scenarios under consideration:
> >
> > 1) migration entry -> migration_entry_wait -> page lock, plus
> >    migrate_pages taking the lock so it can't race with try_to_unmap
> >    from other places
> > 2) swap entry -> lookup_swap_cache -> page lock (page not really replaced)
> > 3) CoW -> do_wp_page -> page lock on old page
> > 4) KSM -> replace_page -> page lock on old page
> > 5) if the pte is zapped as in MADV_DONTNEED, no coherency possible so
> >    it's not measurable that we let the guest run a bit longer on the
> >    old page past PT lock release
>
> For both CoW and KSM, correctness is maintained by calling
> ptep_clear_flush_notify(). If you defer the secondary MMU invalidation
> (i.e., replace ptep_clear_flush_notify() with ptep_clear_flush()), you
> will cause memory corruption, and the page lock would not be enough.

Just to add up: the IOMMU has its own CPU page table walker, and it can
walk the page table at any time (not the page table current to the
current CPU; the IOMMU has an array that matches a PASID with a page
table, and the device requests translation of a given virtual address
against a PASID).

So this means the following can happen with ptep_clear_flush() only:

    CPU                                  | IOMMU
                                         | - walk page table, populate
                                         |   tlb at addr A
    - clear pte at addr A                |
    - set new pte                        |
    - mmu_notifier_invalidate_range_end  |
                                         | - flush IOMMU/device tlb

Device is using the old page and the CPU the new page :(

But with ptep_clear_flush_notify():

    CPU                                  | IOMMU
                                         | - walk page table, populate
                                         |   tlb at addr A
    - clear pte at addr A                |
    - notify -> invalidate_range         |
                                         | - flush IOMMU/device tlb
    - set new pte                        |

So now either the IOMMU sees the empty pte and triggers a page fault
(this is if there is a racing IOMMU ATS right after the IOMMU/device tlb
flush but before setting the new pte), or it sees the new pte. Either
way both IOMMU and CPU have a coherent view of what the virtual address
points to.

> BTW: I see some calls to ptep_clear_flush_notify() which are followed
> immediately by set_pte_at_notify(). I do not understand why this makes
> sense, as both notifications end up doing the same thing - invalidating
> the secondary MMU. The set_pte_at_notify() in these cases can be changed
> to set_pte(). No?

Andrea explained to me the historical reasons why set_pte_at_notify()
calls the change_pte callback: it was intended so that KVM could update
the secondary page table directly without having to fault. It is now a
pointless optimization, as the call to range_start() that happens in all
the relevant places before any set_pte_at_notify() invalidates the
secondary page table and thus will lead to a page fault for the VM. I
have talked with Andrea about ways to bring back this optimization.
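To make the safe ordering above concrete, here is a hedged sketch of a
CoW-style page replacement. This is illustrative only, not the actual
mm/memory.c code: locking context and error handling are omitted, and
addr/pte/ptl/new_page stand in for the real fault-handler state:

	/*
	 * Sketch: replace the page mapped at addr with new_page while
	 * keeping primary and secondary MMUs coherent.
	 */
	mmu_notifier_invalidate_range_start(mm, start, end);

	spin_lock(ptl);			/* the page table lock */
	entry = mk_pte(new_page, vma->vm_page_prot);

	/*
	 * Clear the pte and flush the CPU TLB *and* the IOMMU/device
	 * TLB (via ->invalidate_range) before the new pte is visible.
	 */
	ptep_clear_flush_notify(vma, addr, pte);

	/* only now publish the new page */
	set_pte_at_notify(mm, addr, pte, entry);
	spin_unlock(ptl);

	mmu_notifier_invalidate_range_end(mm, start, end);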
> > If you could post a multi-CPU trace that shows how iommuv2 or
> > intel-svm are broken if ->invalidate_range is run inside
> > mmu_notifier_invalidate_range_end post PT lock in try_to_unmap_one, it
> > would help.
> > [...]
>
> I think that basically you are correct, and assuming that you always
> notify/invalidate unconditionally any PTE range you read/write, you are
> safe. Yet, I want to have another look at the code. Anyhow, just deferring
> all the TLB flushes, including those of set_pte_at_notify(), is likely to
> result in errors.

Yes, we need the following sequence for the IOMMU:
 - clear pte
 - invalidate IOMMU/device TLB
 - set new pte

Otherwise the IOMMU page table walker can populate the IOMMU/device tlb
with a stale entry.

Note that this is not necessary in all cases. For try_to_unmap it is
fine, for instance, to move the IOMMU tlb shootdown after changing the
CPU page table, as we are not pointing the pte to a different page.
Either we clear the pte or we set a swap entry, and as long as the page
that used to be pointed to by the pte is not freed before the IOMMU tlb
flush, then we are fine.

In fact I think the only case where we need the above sequence (clear,
flush secondary tlb, set new pte) is for CoW. I think in all other cases
we can get rid of invalidate_range() from inside the page table lock and
rely on invalidate_range_end() being called unconditionally.

I might ponder all this for more cleanup of mmu_notifier, as I have some
optimizations lined up for it, but that is next-cycle material. For 4.13
I believe the current patchset is the safest way to go.

Jérôme

^ permalink raw reply	[flat|nested] 17+ messages in thread
[parent not found: <20170831004719.GF9445-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH 02/13] mm/rmap: update to new mmu_notifier semantic
  [not found] ` <20170831004719.GF9445-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2017-08-31 17:12   ` Andrea Arcangeli
  2017-08-31 19:15     ` Nadav Amit
  0 siblings, 1 reply; 17+ messages in thread
From: Andrea Arcangeli @ 2017-08-31 17:12 UTC (permalink / raw)
To: Jerome Glisse
Cc: Bernhard Held, Adam Borowski, Andrew Morton, Radim Krčmář,
    Takashi Iwai, Mike Galbraith, Linux Kernel Mailing List,
    open list:MEMORY MANAGEMENT, iommu, Wanpeng Li, Paolo Bonzini,
    Dan Williams, Linus Torvalds, Ross Zwisler, Kirill A . Shutemov,
    axie

On Wed, Aug 30, 2017 at 08:47:19PM -0400, Jerome Glisse wrote:
> On Wed, Aug 30, 2017 at 04:25:54PM -0700, Nadav Amit wrote:
> > For both CoW and KSM, the correctness is maintained by calling
> > ptep_clear_flush_notify(). If you defer the secondary MMU invalidation
> > (i.e., replacing ptep_clear_flush_notify() with ptep_clear_flush() ), you
> > will cause memory corruption, and page-lock would not be enough.
> 
> Just to add to that, the IOMMU has its own CPU page table walker and it
> can walk the page table at any time (not just the page table currently
> loaded on a CPU: the IOMMU has an array that matches a PASID with a
> page table, and devices request translation of a given virtual address
> against a PASID).
> 
> So this means the following can happen with ptep_clear_flush() only:
> 
>     CPU                                 | IOMMU
>                                         | - walk page table, populate
>                                         |   tlb at addr A
>     - clear pte at addr A               |
>     - set new pte                       |
>     - mmu_notifier_invalidate_range_end | - flush IOMMU/device tlb
> 
> Device is using the old page and the CPU the new page :(

That condition won't be persistent.

> But with ptep_clear_flush_notify():
> 
>     CPU                                 | IOMMU
>                                         | - walk page table, populate
>                                         |   tlb at addr A
>     - clear pte at addr A               |
>     - notify -> invalidate_range        | - flush IOMMU/device tlb
>     - set new pte                       |
> 
> So now either the IOMMU sees the empty pte and triggers a page fault
> (this is if there is a racing IOMMU ATS request right after the
> IOMMU/device tlb flush but before setting the new pte) or it sees the
> new pte. Either way both IOMMU and CPU have a coherent view of what a
> virtual address points to.

Sure, the _notify version is obviously safe.

> Andrea explained to me the historical reason set_pte_at_notify() calls
> the change_pte callback: it was intended so that KVM could update the
> secondary page table directly without having to fault. It is now a
> pointless optimization, as the call to range_start() that happens in
> all the places before any set_pte_at_notify() invalidates the secondary
> page table and thus will lead to a page fault for the VM. I have talked
> with Andrea about ways to bring back this optimization.

Yes, we have known that for a long time; the optimization got basically
disabled when range_start/end expanded. It'd be nice to optimize
change_pte again but this is for later.

> Yes we need the following sequence for the IOMMU:
>   - clear pte
>   - invalidate IOMMU/device TLB
>   - set new pte
> 
> Otherwise the IOMMU page table walker can populate the IOMMU/device tlb
> with a stale entry.
> 
> Note that this is not necessary in all cases. For try_to_unmap, for
> instance, it is fine to move the IOMMU tlb shootdown after changing the
> CPU page table, as we are not pointing the pte to a different page.
> Either we clear the pte or we set a swap entry, and as long as the page
> that used to be pointed to by the pte is not freed before the IOMMU tlb
> flush, we are fine.
> 
> In fact I think the only case where we need the above sequence (clear,
> flush secondary tlb, set new pte) is for COW. I think in all other
> cases we can get rid of invalidate_range() from inside the page table
> lock and rely on invalidate_range_end() being called unconditionally.

Even with CoW, it's not a big issue if the IOMMU keeps reading from the
old page for a little while longer (in between PT lock release and
mmu_notifier_invalidate_range_end).

How can you tell you read the old data a bit longer because the device
noticed the new page only when mmu_notifier_invalidate_range_end ran,
and not because the CPU was faster at writing than the IOMMU was at
loading the new pagetable?

I figure it would be detectable only in that the CPU could see pageA
changing before pageB, while the iommu-v2 could see pageB changing
before pageA. If that's a concern, it is the only good reason I can
think of right now for requiring ->invalidate_page inside the CoW PT
lock to enforce the same ordering. However if the modifications happen
simultaneously and it's a correct runtime, the order must not matter;
still, it would be detectable, which may not be desirable.

Currently the _notify is absolutely needed to run ->invalidate_range
before put_page is run in the CoW code below. Because that put_page is
done in a not scalable place, it would be better moved down regardless
of ->invalidate_range, to reduce the size of the PT lock protected
critical section:

-	if (new_page)
-		put_page(new_page);

 	pte_unmap_unlock(vmf->pte, vmf->ptl);
 	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
+	if (new_page)
+		put_page(new_page);

Of course the iommu will not immediately start reading from the new
page, but even if it triggers a write protect fault and calls
handle_mm_fault, it will find a writeable pte already there, and it'll
get an ->invalidate_range as soon as mmu_notifier_invalidate_range_end
runs, so it is certain to notice the new page.

Now write_protect_page in KSM...

What bad can happen if the iommu keeps writing to the write protected
page for a little while longer? As long as nothing writes to the page
anymore by the time write_protect_page() returns, pages_identical will
work. How do you know the IOMMU was just a bit faster and wrote a few
more bytes, and it wasn't mmu_notifier_invalidate_range_end that ran a
bit later after dropping the PT lock?

Now replace_page in KSM...

 	ptep_clear_flush_notify(vma, addr, ptep);
 	set_pte_at_notify(mm, addr, ptep, newpte);

 	page_remove_rmap(page, false);
 	if (!page_mapped(page))
 		try_to_free_swap(page);
-	put_page(page);

 	pte_unmap_unlock(ptep, ptl);
 	err = 0;
 out_mn:
 	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
 out:
+	put_page(page); /* TODO: broken of course, fix error cases */
 	return err;
 }

If we free the old page after mmu_notifier_invalidate_range_end (fixing
up the error case; the above change ignores the error paths), the
contents of the old and new page are identical for replace_page.

Even if the new page immediately takes a COW, how do you know the IOMMU
wasn't just a bit slower and read the old page content pre-COW? They're
guaranteed identical and both readonly already at that point.

All the above considered, if this is getting way too complex, it may be
preferable to keep things obviously safe: always run
mmu_notifier_invalidate_range inside the PT lock and be done with it,
and remove the ->invalidate_range from
mmu_notifier_invalidate_range_end too, to avoid the double invalidates
for the secondary MMUs with hardware pagetable walkers and shared
pagetables with the primary MMU.

In principle the primary reason for doing _notify or an explicit
mmu_notifier_invalidate_range() is to keep things simpler and to avoid
having to care where pages exactly get freed (i.e. before or after
mmu_notifier_invalidate_range_end).

For example the zap_page_range tlb gather freeing strictly requires an
explicit mmu_notifier_invalidate_range before the page is actually
freed (because the tlb gather will free the pages well before
mmu_notifier_invalidate_range_end can run).

The concern that an ->invalidate_range is always needed before the PT
lock is released if the primary TLB was flushed inside the PT lock is a
more recent one, and it looks to me like it's not always justified but
perhaps only in some cases.

An example where the ->invalidate_range inside
mmu_notifier_invalidate_range_end pays off is madvise_free_pte_range.
That doesn't flush the TLB before setting the pagetable clean, so the
primary MMU can still write through the dirty primary TLB without
setting the dirty/accessed bit after madvise_free_pte_range returns.

 	ptent = ptep_get_and_clear_full(mm, addr, pte,
 					tlb->fullmm);

 	ptent = pte_mkold(ptent);
 	ptent = pte_mkclean(ptent);
 	set_pte_at(mm, addr, pte, ptent);
 	tlb_remove_tlb_entry(tlb, pte, addr);

Not even the primary TLB is flushed here. All concurrent writes of the
primary MMU can still go lost while MADV_FREE runs. All that is
guaranteed is that after the madvise MADV_FREE syscall returns to
userland the new writes will stick, so it's also enough to call
->invalidate_range inside the single mmu_notifier_invalidate_range_end:

 	madvise_free_page_range(&tlb, vma, start, end);
 	mmu_notifier_invalidate_range_end(mm, start, end);
 	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is why we have both _notify and
mmu_notifier_invalidate_range_end. If we remove ->invalidate_range from
mmu_notifier_invalidate_range_end, we'll have to add a
mmu_notifier_invalidate_range in places like the above (just before or
just after mmu_notifier_invalidate_range_end).

So with ptep_clear_flush_notify, which avoids any issue with page
freeing in places like CoW, and the explicit
mmu_notifier_invalidate_range in the tlb gather, the rest gets covered
automatically by mmu_notifier_invalidate_range_end. And again, this
only started to be needed when we added support for hardware pagetable
walkers that cannot stop the pagetable walking (unless they break the
sharing of the pagetable with the primary MMU, which of course is not
desirable and would cause unnecessary overhead).

The current ->invalidate_range handling does result in double calls
here and there when armed, but it reduces the number of explicit hooks
required in the common code and it keeps the mmu_notifier code less
intrusive and more optimal when disarmed (though less optimal when
armed). So the current state is a reasonable tradeoff, but there's room
for optimization.

> I might ponder all this for more mmu_notifier cleanups, as I have some
> optimizations lined up for it, but that is next cycle material. For
> 4.13 I believe the current patchset is the safest way to go.

Yes, lots of material above for pondering, but the current status is
fine.

Did you look into reducing the range flushed in try_to_unmap_one to
PAGE_SIZE << compound_order(page)? That looked like a straightforward
optimization that won't add any complexity, and it would avoid
worsening the granularity of the invalidates during try_to_unmap
compared to the previous code.

Thanks,
Andrea

^ permalink raw reply	[flat|nested] 17+ messages in thread
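The clear / flush-secondary-TLB / set ordering the two diagrams above
describe is compact enough to sketch in code. The following is a minimal
illustration only, assuming a single PAGE_SIZE pte: cow_replace_pte() is
a made-up name and this is not the actual mm/memory.c CoW path, but
ptep_clear_flush(), mmu_notifier_invalidate_range() and set_pte_at() are
the real primitives, and steps 1 and 2 together are what
ptep_clear_flush_notify() performs in a single call:

	#include <linux/mm.h>
	#include <linux/mmu_notifier.h>

	/* Hypothetical sketch of the safe replacement order when a
	 * hardware walker (e.g. an IOMMU doing ATS against a PASID)
	 * shares the page table with the primary MMU. */
	static void cow_replace_pte(struct vm_area_struct *vma,
				    unsigned long addr, pte_t *ptep,
				    pte_t newpte)
	{
		struct mm_struct *mm = vma->vm_mm;

		/* 1. Clear the pte and flush the primary TLB. */
		ptep_clear_flush(vma, addr, ptep);

		/* 2. Flush the secondary (IOMMU/device) TLB while the
		 * pte is still empty: a device access racing with us
		 * faults instead of caching a stale translation for
		 * the old page. */
		mmu_notifier_invalidate_range(mm, addr, addr + PAGE_SIZE);

		/* 3. Only now publish the new page. */
		set_pte_at(mm, addr, ptep, newpte);
	}

With this order the device either faults on the empty pte or sees the
new page; what it can never do is keep using the old page after the CPU
has switched to the new one, which is exactly the incoherency the first
diagram shows.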
* Re: [PATCH 02/13] mm/rmap: update to new mmu_notifier semantic
  2017-08-31 17:12   ` Andrea Arcangeli
@ 2017-08-31 19:15     ` Nadav Amit
  0 siblings, 0 replies; 17+ messages in thread
From: Nadav Amit @ 2017-08-31 19:15 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Jerome Glisse, Linux Kernel Mailing List,
    open list:MEMORY MANAGEMENT, Dan Williams, Ross Zwisler,
    Linus Torvalds, Bernhard Held, Adam Borowski, Radim Krčmář,
    Wanpeng Li, Paolo Bonzini, Takashi Iwai, Mike Galbraith,
    Kirill A . Shutemov, axie, Andrew Morton, Joerg Roedel, iommu

Andrea Arcangeli <aarcange@redhat.com> wrote:

> On Wed, Aug 30, 2017 at 08:47:19PM -0400, Jerome Glisse wrote:
>> On Wed, Aug 30, 2017 at 04:25:54PM -0700, Nadav Amit wrote:
>>> For both CoW and KSM, the correctness is maintained by calling
>>> ptep_clear_flush_notify(). If you defer the secondary MMU invalidation
>>> (i.e., replacing ptep_clear_flush_notify() with ptep_clear_flush() ), you
>>> will cause memory corruption, and page-lock would not be enough.
>> 
>> Just to add to that, the IOMMU has its own CPU page table walker and it
>> can walk the page table at any time (not just the page table currently
>> loaded on a CPU: the IOMMU has an array that matches a PASID with a
>> page table, and devices request translation of a given virtual address
>> against a PASID).
>> 
>> So this means the following can happen with ptep_clear_flush() only:
>> 
>>     CPU                                 | IOMMU
>>                                         | - walk page table, populate
>>                                         |   tlb at addr A
>>     - clear pte at addr A               |
>>     - set new pte                       |
>>     - mmu_notifier_invalidate_range_end | - flush IOMMU/device tlb
>> 
>> Device is using the old page and the CPU the new page :(
> 
> That condition won't be persistent.
> 
>> But with ptep_clear_flush_notify():
>> 
>>     CPU                                 | IOMMU
>>                                         | - walk page table, populate
>>                                         |   tlb at addr A
>>     - clear pte at addr A               |
>>     - notify -> invalidate_range        | - flush IOMMU/device tlb
>>     - set new pte                       |
>> 
>> So now either the IOMMU sees the empty pte and triggers a page fault
>> (this is if there is a racing IOMMU ATS request right after the
>> IOMMU/device tlb flush but before setting the new pte) or it sees the
>> new pte. Either way both IOMMU and CPU have a coherent view of what a
>> virtual address points to.
> 
> Sure, the _notify version is obviously safe.
> 
>> Andrea explained to me the historical reason set_pte_at_notify() calls
>> the change_pte callback: it was intended so that KVM could update the
>> secondary page table directly without having to fault. It is now a
>> pointless optimization, as the call to range_start() that happens in
>> all the places before any set_pte_at_notify() invalidates the secondary
>> page table and thus will lead to a page fault for the VM. I have talked
>> with Andrea about ways to bring back this optimization.
> 
> Yes, we have known that for a long time; the optimization got basically
> disabled when range_start/end expanded. It'd be nice to optimize
> change_pte again but this is for later.
> 
>> Yes we need the following sequence for the IOMMU:
>>   - clear pte
>>   - invalidate IOMMU/device TLB
>>   - set new pte
>> 
>> Otherwise the IOMMU page table walker can populate the IOMMU/device tlb
>> with a stale entry.
>> 
>> Note that this is not necessary in all cases. For try_to_unmap, for
>> instance, it is fine to move the IOMMU tlb shootdown after changing the
>> CPU page table, as we are not pointing the pte to a different page.
>> Either we clear the pte or we set a swap entry, and as long as the page
>> that used to be pointed to by the pte is not freed before the IOMMU tlb
>> flush, we are fine.
>> 
>> In fact I think the only case where we need the above sequence (clear,
>> flush secondary tlb, set new pte) is for COW. I think in all other
>> cases we can get rid of invalidate_range() from inside the page table
>> lock and rely on invalidate_range_end() being called unconditionally.
> 
> Even with CoW, it's not a big issue if the IOMMU keeps reading from the
> old page for a little while longer (in between PT lock release and
> mmu_notifier_invalidate_range_end).
> 
> How can you tell you read the old data a bit longer because the device
> noticed the new page only when mmu_notifier_invalidate_range_end ran,
> and not because the CPU was faster at writing than the IOMMU was at
> loading the new pagetable?
> 
> I figure it would be detectable only in that the CPU could see pageA
> changing before pageB, while the iommu-v2 could see pageB changing
> before pageA. If that's a concern, it is the only good reason I can
> think of right now for requiring ->invalidate_page inside the CoW PT
> lock to enforce the same ordering. However if the modifications happen
> simultaneously and it's a correct runtime, the order must not matter;
> still, it would be detectable, which may not be desirable.

I don't know what memory model SVM provides, but what you describe here
potentially violates it. I don't think user software should be expected
to deal with it.

> Currently the _notify is absolutely needed to run ->invalidate_range
> before put_page is run in the CoW code below. Because that put_page is
> done in a not scalable place, it would be better moved down regardless
> of ->invalidate_range, to reduce the size of the PT lock protected
> critical section:
> 
> -	if (new_page)
> -		put_page(new_page);
> 
>  	pte_unmap_unlock(vmf->pte, vmf->ptl);
>  	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
> +	if (new_page)
> +		put_page(new_page);
> 
> Of course the iommu will not immediately start reading from the new
> page, but even if it triggers a write protect fault and calls
> handle_mm_fault, it will find a writeable pte already there, and it'll
> get an ->invalidate_range as soon as mmu_notifier_invalidate_range_end
> runs, so it is certain to notice the new page.
> 
> Now write_protect_page in KSM...
> 
> What bad can happen if the iommu keeps writing to the write protected
> page for a little while longer? As long as nothing writes to the page
> anymore by the time write_protect_page() returns, pages_identical will
> work. How do you know the IOMMU was just a bit faster and wrote a few
> more bytes, and it wasn't mmu_notifier_invalidate_range_end that ran a
> bit later after dropping the PT lock?

I guess it should be ok in this case.

> Now replace_page in KSM...
> 
>  	ptep_clear_flush_notify(vma, addr, ptep);
>  	set_pte_at_notify(mm, addr, ptep, newpte);
> 
>  	page_remove_rmap(page, false);
>  	if (!page_mapped(page))
>  		try_to_free_swap(page);
> -	put_page(page);
> 
>  	pte_unmap_unlock(ptep, ptl);
>  	err = 0;
>  out_mn:
>  	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
>  out:
> +	put_page(page); /* TODO: broken of course, fix error cases */
>  	return err;
>  }
> 
> If we free the old page after mmu_notifier_invalidate_range_end (fixing
> up the error case; the above change ignores the error paths), the
> contents of the old and new page are identical for replace_page.
> 
> Even if the new page immediately takes a COW, how do you know the IOMMU
> wasn't just a bit slower and read the old page content pre-COW? They're
> guaranteed identical and both readonly already at that point.

A potential problem might be that this requires that there be no
promotion of PTE access-rights (e.g., RO->RW), since in those cases
there would be no invalidation. I did not find such a code-path, but I
might have missed one, or a new one could be introduced in the future.
For example, someone may optimize KSM not to COW the last reference to a
KSM'd page upon page-fault. Such optimizations may therefore require
some sort of checker, or at the very least clear documentation of when
an invalidation is required, especially since they would need an
invalidation on access-rights promotion, which is currently not needed.

> All the above considered, if this is getting way too complex, it may be
> preferable to keep things obviously safe: always run
> mmu_notifier_invalidate_range inside the PT lock and be done with it,
> and remove the ->invalidate_range from
> mmu_notifier_invalidate_range_end too, to avoid the double invalidates
> for the secondary MMUs with hardware pagetable walkers and shared
> pagetables with the primary MMU.
> 
> In principle the primary reason for doing _notify or an explicit
> mmu_notifier_invalidate_range() is to keep things simpler and to avoid
> having to care where pages exactly get freed (i.e. before or after
> mmu_notifier_invalidate_range_end).
> 
> For example the zap_page_range tlb gather freeing strictly requires an
> explicit mmu_notifier_invalidate_range before the page is actually
> freed (because the tlb gather will free the pages well before
> mmu_notifier_invalidate_range_end can run).
> 
> The concern that an ->invalidate_range is always needed before the PT
> lock is released if the primary TLB was flushed inside the PT lock is a
> more recent one, and it looks to me like it's not always justified but
> perhaps only in some cases.
> 
> An example where the ->invalidate_range inside
> mmu_notifier_invalidate_range_end pays off is madvise_free_pte_range.
> That doesn't flush the TLB before setting the pagetable clean, so the
> primary MMU can still write through the dirty primary TLB without
> setting the dirty/accessed bit after madvise_free_pte_range returns.
> 
>  	ptent = ptep_get_and_clear_full(mm, addr, pte,
>  					tlb->fullmm);
> 
>  	ptent = pte_mkold(ptent);
>  	ptent = pte_mkclean(ptent);
>  	set_pte_at(mm, addr, pte, ptent);
>  	tlb_remove_tlb_entry(tlb, pte, addr);
> 
> Not even the primary TLB is flushed here. All concurrent writes of the
> primary MMU can still go lost while MADV_FREE runs. All that is
> guaranteed is that after the madvise MADV_FREE syscall returns to
> userland the new writes will stick, so it's also enough to call
> ->invalidate_range inside the single
> mmu_notifier_invalidate_range_end:
> 
>  	madvise_free_page_range(&tlb, vma, start, end);
>  	mmu_notifier_invalidate_range_end(mm, start, end);
>  	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> This is why we have both _notify and
> mmu_notifier_invalidate_range_end. If we remove ->invalidate_range from
> mmu_notifier_invalidate_range_end, we'll have to add a
> mmu_notifier_invalidate_range in places like the above (just before or
> just after mmu_notifier_invalidate_range_end).
> 
> So with ptep_clear_flush_notify, which avoids any issue with page
> freeing in places like CoW, and the explicit
> mmu_notifier_invalidate_range in the tlb gather, the rest gets covered
> automatically by mmu_notifier_invalidate_range_end. And again, this
> only started to be needed when we added support for hardware pagetable
> walkers that cannot stop the pagetable walking (unless they break the
> sharing of the pagetable with the primary MMU, which of course is not
> desirable and would cause unnecessary overhead).
> 
> The current ->invalidate_range handling does result in double calls
> here and there when armed, but it reduces the number of explicit hooks
> required in the common code and it keeps the mmu_notifier code less
> intrusive and more optimal when disarmed (though less optimal when
> armed). So the current state is a reasonable tradeoff, but there's room
> for optimization.

Everything you say makes sense, but at the end of the day you end up
with two different schemes for flushing the primary and secondary TLBs.
For example, the primary TLB flush mechanism is much more selective in
its flushing than the secondary one. The discrepancy between the two
surely does not make things simpler. Indeed, there might be different
costs and trade-offs between the primary and secondary TLBs (e.g., TLB
flush/shootdown time, TLB miss latency, whether an address-space is
always loaded), but I have some doubts whether this decision was data
driven.

Regards,
Nadav

^ permalink raw reply	[flat|nested] 17+ messages in thread
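The access-rights promotion hazard Nadav describes has, as he notes, no
known instance in mainline at this point, so the following is purely
hypothetical code (pte_promote_writable() does not exist anywhere),
meant only to make the concern concrete:

	#include <linux/mm.h>

	/* Hypothetical: a future optimization promoting a read-only
	 * pte to read-write in place, with no notifier call. */
	static void pte_promote_writable(struct vm_area_struct *vma,
					 unsigned long addr, pte_t *ptep)
	{
		pte_t pte = pte_mkwrite(pte_mkdirty(*ptep));

		set_pte_at(vma->vm_mm, addr, ptep, pte);
		/*
		 * No mmu_notifier_invalidate_range() here: the CPU now
		 * has a writable mapping, but a secondary MMU that
		 * cached the read-only translation is never told to
		 * refetch it, so device writes keep faulting (or taking
		 * page-request round-trips) until the stale entry
		 * happens to be evicted.
		 */
	}

This is the motivation for Nadav's suggestion of a checker, or at least
documentation spelling out that promotions would also need an
invalidation even though nothing requires one today.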
* BSOD with [PATCH 00/13] mmu_notifier kill invalidate_page callback
  2017-08-29 23:54 [PATCH 00/13] mmu_notifier kill invalidate_page callback Jérôme Glisse
    ` (2 preceding siblings ...)
  [not found] ` <20170829235447.10050-3-jglisse@redhat.com>
@ 2017-11-30  9:33 ` Fabian Grünbichler
  [not found] ` <20171130093320.66cxaoj45g2ttzoh-aVfaTQcAavps8ZkLLAvlZtBPR1lH4CV8@public.gmane.org>
  3 siblings, 1 reply; 17+ messages in thread
From: Fabian Grünbichler @ 2017-11-30 9:33 UTC (permalink / raw)
To: Jérôme Glisse
Cc: linux-kernel, linux-mm, Kirill A . Shutemov, Linus Torvalds,
    Andrew Morton, Andrea Arcangeli, Joerg Roedel, Dan Williams,
    Sudeep Dutt, Ashutosh Dixit, Dimitri Sivanich, Jack Steiner,
    Paolo Bonzini, Radim Krčmář, linuxppc-dev, iommu, xen-devel, kvm

On Tue, Aug 29, 2017 at 07:54:34PM -0400, Jérôme Glisse wrote:
> (Sorry for so many list cross-posting and big cc)

Ditto (trimmed it a bit already, feel free to limit replies as you see
fit).

> 
> Please help testing !
> 

Kernels 4.13 and 4.14 (which both contain this patch series in its
final form) are affected by a bug triggering BSODs
(CRITICAL_STRUCTURE_CORRUPTION) in Windows 10/2016 VMs in Qemu under
certain conditions on certain hardware/microcode versions (see below
for details).

Testing this proved to be quite cumbersome, as only some systems are
affected and it took a while to find a semi-reliable test setup. Some
users reported that microcode updates made the problem disappear on
some affected systems[1].

Bisecting the 4.13 release cycle first pointed to

aac2fea94f7a3df8ad1eeb477eb2643f81fd5393 rmap: do not call mmu_notifier_invalidate_page() under ptl

as the likely culprit (although it was not possible to bisect exactly
down to this commit).

It was reverted in 785373b4c38719f4af6775845df6be1dfaea120f after which
the symptoms disappeared until this series was merged, which contains

369ea8242c0fb5239b4ddf0dc568f694bd244de4 mm/rmap: update to new mmu_notifier semantic v2

We haven't bisected the individual commits of the series yet, but the
commit immediately preceding its merge exhibits no problems, while
everything after does. It is not known whether the bug is actually in
the series itself, or whether increasing the likelihood of triggering it
is just a side-effect. There is a similar report[2] concerning an
upgrade from 4.12.12 to 4.12.13, which does not contain this series in
any form AFAICT but might be worth another look as well.

Our test setup consists of the following:

CPU: Intel(R) Xeon(R) CPU D-1528 @ 1.90GHz (single socket)
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl
xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor
ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1
sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand
lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 intel_ppin
intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1
hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap xsaveopt
cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm arat pln pts
microcode: 0x700000e
Mainboard: Supermicro X10SDV-6C+-TLN4F
RAM: 64G DDR4 [3]
Swap: 8G on an LV
Qemu: 2.9.1 with some patches on top ([4])
OS: Debian Stretch based (PVE 5.1)

KSM is enabled, but turning it off just increases the number of test
iterations needed to trigger the bug.

Kernel config: [5]

VMs:
A: Windows 2016 with virtio-blk disks, 6G RAM
B: Windows 10 with (virtual) IDE disks, 6G RAM
C: Debian Stretch, ~55G RAM

fio config:

[global]
thread=2
runtime=1800

[write1]
ioengine=windowsaio
sync=0
direct=0
bs=4k
size=30G
rw=randwrite
iodepth=16

[read1]
ioengine=windowsaio
sync=0
direct=0
bs=4k
size=30G
rw=randread
iodepth=16

Test run:
- Start all three VMs
- run 'stress-ng --vm-bytes 1G --vm 52 -t 6000' in VM C
- wait until swap is (almost) full and KSM starts to merge pages
- start fio in VM A and B
- stop stress-ng in VM C, power off VM C
- run 'swapoff -av' on host
- wait until swap content has been swapped in again (this takes a while)
- observe BSOD in at least one of A / B around 30% of the time

While this test case is pretty artificial, the BSOD issue does affect
users in the wild running regular workloads (where it can take from
multiple hours up to several days to trigger).

We have reverted this patch series in our 4.13 based kernel for now,
with positive feedback from users and our own testing.

If more detailed traces or data from a test run on an affected system
is needed, we will of course provide it. Any further input / pointers
are highly appreciated!

1: https://forum.proxmox.com/threads/blue-screen-with-5-1.37664/
2: http://www.spinics.net/lists/kvm/msg159179.html
   https://bugs.launchpad.net/qemu/+bug/1728256
   https://bugzilla.kernel.org/show_bug.cgi?id=197951
3: http://www.samsung.com/semiconductor/products/dram/server-dram/ddr4-registered-dimm/M393A2K40BB1?ia=2503
4: https://git.proxmox.com/?p=pve-qemu.git;a=tree;f=debian/patches;h=2c516be8e69a033d14809b17e8a661b3808257f7;hb=8d4a2d3f5569817221c19a91f763964c40e00292
5: https://gist.github.com/Fabian-Gruenbichler/5c3af22ac7e6faae46840bdcebd7df14

^ permalink raw reply	[flat|nested] 17+ messages in thread
[parent not found: <20171130093320.66cxaoj45g2ttzoh-aVfaTQcAavps8ZkLLAvlZtBPR1lH4CV8@public.gmane.org>]
* Re: BSOD with [PATCH 00/13] mmu_notifier kill invalidate_page callback
  [not found] ` <20171130093320.66cxaoj45g2ttzoh-aVfaTQcAavps8ZkLLAvlZtBPR1lH4CV8@public.gmane.org>
@ 2017-11-30 11:20 ` Paolo Bonzini
  2017-11-30 16:19   ` Radim Krčmář
  0 siblings, 1 reply; 17+ messages in thread
From: Paolo Bonzini @ 2017-11-30 11:20 UTC (permalink / raw)
To: Jérôme Glisse, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
    linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Kirill A . Shutemov,
    Linus Torvalds, Andrew Morton, Andrea Arcangeli, Joerg Roedel,
    Dan Williams, Sudeep Dutt, Ashutosh Dixit, Dimitri Sivanich,
    Jack Steiner, Radim Krčmář, linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ,
    iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
    xen-devel-GuqFBffKawtpuQazS67q72D2FQJk+8+b, kvm-u79uwXL29TY76Z2rM5mHXA

On 30/11/2017 10:33, Fabian Grünbichler wrote:
> 
> It was reverted in 785373b4c38719f4af6775845df6be1dfaea120f after which
> the symptoms disappeared until this series was merged, which contains
> 
> 369ea8242c0fb5239b4ddf0dc568f694bd244de4 mm/rmap: update to new mmu_notifier semantic v2
> 
> We haven't bisected the individual commits of the series yet, but the
> commit immediately preceding its merge exhibits no problems, while
> everything after does. It is not known whether the bug is actually in
> the series itself, or whether increasing the likelihood of triggering it
> is just a side-effect. There is a similar report[2] concerning an
> upgrade from 4.12.12 to 4.12.13, which does not contain this series in
> any form AFAICT but might be worth another look as well.

I know of one issue in this series (invalidate_page was removed from KVM
without reimplementing it as invalidate_range). I'll try to prioritize
the fix, but I don't think I can do it before Monday.

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: BSOD with [PATCH 00/13] mmu_notifier kill invalidate_page callback 2017-11-30 11:20 ` Paolo Bonzini @ 2017-11-30 16:19 ` Radim Krčmář 0 siblings, 0 replies; 17+ messages in thread From: Radim Krčmář @ 2017-11-30 16:19 UTC (permalink / raw) To: Paolo Bonzini Cc: Jérôme Glisse, linux-kernel, linux-mm, Kirill A . Shutemov, Linus Torvalds, Andrew Morton, Andrea Arcangeli, Joerg Roedel, Dan Williams, Sudeep Dutt, Ashutosh Dixit, Dimitri Sivanich, Jack Steiner, linuxppc-dev, iommu, xen-devel, kvm 2017-11-30 12:20+0100, Paolo Bonzini: > On 30/11/2017 10:33, Fabian Grünbichler wrote: > > > > It was reverted in 785373b4c38719f4af6775845df6be1dfaea120f after which > > the symptoms disappeared until this series was merged, which contains > > > > 369ea8242c0fb5239b4ddf0dc568f694bd244de4 mm/rmap: update to new mmu_notifier semantic v2 > > > > We haven't bisected the individual commits of the series yet, but the > > commit immediately preceding its merge exhibits no problems, while > > everything after does. It is not known whether the bug is actually in > > the series itself, or whether increasing the likelihood of triggering it > > is just a side-effect. There is a similar report[2] concerning an > > upgrade from 4.12.12 to 4.12.13, which does not contain this series in > > any form AFAICT but might be worth another look as well. > > I know of one issue in this series (invalidate_page was removed from KVM > without reimplementing it as invalidate_range). I'll try to prioritize > the fix, but I don't think I can do it before Monday. The series also dropped the reloading of the APIC access page and we never had it in invalidate_range_start ... I'll look into it today. ^ permalink raw reply [flat|nested] 17+ messages in thread
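The shape of the fix Paolo and Radim are discussing can be sketched as
below. The function names follow what was eventually merged for this
regression, but treat the snippet as an outline reconstructed from the
discussion rather than the final patch:

	#include <linux/kvm_host.h>

	/* Give KVM an ->invalidate_range callback again, so
	 * invalidations done under the PT lock reach the arch code. */
	static void kvm_mmu_notifier_invalidate_range(struct mmu_notifier *mn,
						      struct mm_struct *mm,
						      unsigned long start,
						      unsigned long end)
	{
		struct kvm *kvm = container_of(mn, struct kvm, mmu_notifier);

		kvm_arch_mmu_notifier_invalidate_range(kvm, start, end);
	}

	/* x86 side: if the invalidated range covers the APIC access
	 * page, make every vcpu recompute its physical address before
	 * the next guest entry -- the reload Radim mentions. */
	void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
						    unsigned long start,
						    unsigned long end)
	{
		unsigned long apic_address;

		apic_address = gfn_to_hva(kvm,
					  APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
		if (start <= apic_address && apic_address < end)
			kvm_make_all_cpus_request(kvm,
						  KVM_REQ_APIC_PAGE_RELOAD);
	}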
* [PATCH 00/13] mmu_notifier kill invalidate_page callback v2
@ 2017-08-31 21:17 jglisse
[not found] ` <20170831211738.17922-1-jglisse-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 17+ messages in thread
From: jglisse @ 2017-08-31 21:17 UTC (permalink / raw)
To: linux-mm
Cc: Andrea Arcangeli, Joerg Roedel, kvm, Radim Krčmář,
linux-rdma, linuxppc-dev, Jack Steiner, linux-kernel, dri-devel,
Sudeep Dutt, Ashutosh Dixit, iommu, Jérôme Glisse,
Dimitri Sivanich, amd-gfx, xen-devel, Paolo Bonzini,
Andrew Morton, Linus Torvalds, Dan Williams, Kirill A . Shutemov
From: Jérôme Glisse <jglisse@redhat.com>
(Sorry for so many list cross-posting and big cc)
Changes since v1:
- remove more dead code in kvm (no testing impact)
- more accurate end address computation (patch 2)
in page_mkclean_one and try_to_unmap_one
- added Tested-by/Reviewed-by tags received so far
Tested as both host and guest kernel with KVM; nothing is burning yet.
Previous cover letter:
Please help testing !
The invalidate_page callback suffered from 2 pitfalls. First it used to
happen after page table lock was release and thus a new page might have
been setup for the virtual address before the call to invalidate_page().
This is in a weird way fixed by c7ab0d2fdc840266b39db94538f74207ec2afbf6
which moved the callback under the page table lock. Which also broke
several existing user of the mmu_notifier API that assumed they could
sleep inside this callback.
The second pitfall was invalidate_page being the only callback not taking
a range of address in respect to invalidation but was giving an address
and a page. Lot of the callback implementer assumed this could never be
THP and thus failed to invalidate the appropriate range for THP pages.
By killing this callback we unify the mmu_notifier callback API to always
take a virtual address range as input.
There is now 2 clear API (I am not mentioning the youngess API which is
seldomly used):
- invalidate_range_start()/end() callback (which allow you to sleep)
- invalidate_range() where you can not sleep but happen right after
page table update under page table lock
Note that a lot of existing user feels broken in respect to range_start/
range_end. Many user only have range_start() callback but there is nothing
preventing them to undo what was invalidated in their range_start() callback
after it returns but before any CPU page table update take place.
The code pattern use in kvm or umem odp is an example on how to properly
avoid such race. In a nutshell use some kind of sequence number and active
range invalidation counter to block anything that might undo what the
range_start() callback did.
If you do not care about keeping fully in sync with CPU page table (ie
you can live with CPU page table pointing to new different page for a
given virtual address) then you can take a reference on the pages inside
the range_start callback and drop it in range_end or when your driver
is done with those pages.
Last alternative is to use invalidate_range() if you can do invalidation
without sleeping as invalidate_range() callback happens under the CPU
page table spinlock right after the page table is updated.
Note this is barely tested. I intend to do more testing of next few days
but i do not have access to all hardware that make use of the mmu_notifier
API.
First 2 patches convert existing call of mmu_notifier_invalidate_page()
to mmu_notifier_invalidate_range() and bracket those call with call to
mmu_notifier_invalidate_range_start()/end().
The next 10 patches remove existing invalidate_page() callback as it can
no longer happen.
Finaly the last page remove it completely so it can RIP.
Jérôme Glisse (13):
dax: update to new mmu_notifier semantic
mm/rmap: update to new mmu_notifier semantic
powerpc/powernv: update to new mmu_notifier semantic
drm/amdgpu: update to new mmu_notifier semantic
IB/umem: update to new mmu_notifier semantic
IB/hfi1: update to new mmu_notifier semantic
iommu/amd: update to new mmu_notifier semantic
iommu/intel: update to new mmu_notifier semantic
misc/mic/scif: update to new mmu_notifier semantic
sgi-gru: update to new mmu_notifier semantic
xen/gntdev: update to new mmu_notifier semantic
KVM: update to new mmu_notifier semantic
mm/mmu_notifier: kill invalidate_page
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Sudeep Dutt <sudeep.dutt@intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: dri-devel@lists.freedesktop.org
Cc: amd-gfx@lists.freedesktop.org
Cc: linux-rdma@vger.kernel.org
Cc: iommu@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Cc: kvm@vger.kernel.org
Jérôme Glisse (13):
dax: update to new mmu_notifier semantic
mm/rmap: update to new mmu_notifier semantic v2
powerpc/powernv: update to new mmu_notifier semantic
drm/amdgpu: update to new mmu_notifier semantic
IB/umem: update to new mmu_notifier semantic
IB/hfi1: update to new mmu_notifier semantic
iommu/amd: update to new mmu_notifier semantic
iommu/intel: update to new mmu_notifier semantic
misc/mic/scif: update to new mmu_notifier semantic
sgi-gru: update to new mmu_notifier semantic
xen/gntdev: update to new mmu_notifier semantic
KVM: update to new mmu_notifier semantic v2
mm/mmu_notifier: kill invalidate_page
arch/arm/include/asm/kvm_host.h | 6 -----
arch/arm64/include/asm/kvm_host.h | 6 -----
arch/mips/include/asm/kvm_host.h | 5 ----
arch/powerpc/include/asm/kvm_host.h | 5 ----
arch/powerpc/platforms/powernv/npu-dma.c | 10 --------
arch/x86/include/asm/kvm_host.h | 2 --
arch/x86/kvm/x86.c | 11 ---------
drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 31 -----------------------
drivers/infiniband/core/umem_odp.c | 19 ---------------
drivers/infiniband/hw/hfi1/mmu_rb.c | 9 -------
drivers/iommu/amd_iommu_v2.c | 8 ------
drivers/iommu/intel-svm.c | 9 -------
drivers/misc/mic/scif/scif_dma.c | 11 ---------
drivers/misc/sgi-gru/grutlbpurge.c | 12 ---------
drivers/xen/gntdev.c | 8 ------
fs/dax.c | 19 +++++++++------
include/linux/mm.h | 1 +
include/linux/mmu_notifier.h | 25 -------------------
mm/memory.c | 26 ++++++++++++++++----
mm/mmu_notifier.c | 14 -----------
mm/rmap.c | 35 +++++++++++++++++++++++---
virt/kvm/kvm_main.c | 42 --------------------------------
22 files changed, 65 insertions(+), 249 deletions(-)
--
2.13.5
^ permalink raw reply	[flat|nested] 17+ messages in thread

[parent not found: <20170831211738.17922-1-jglisse-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* [PATCH 07/13] iommu/amd: update to new mmu_notifier semantic
  [not found] ` <20170831211738.17922-1-jglisse-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2017-08-31 21:17   ` jglisse-H+wXaHxf7aLQT0dZR+AlfA
  0 siblings, 0 replies; 17+ messages in thread
From: jglisse-H+wXaHxf7aLQT0dZR+AlfA @ 2017-08-31 21:17 UTC (permalink / raw)
To: linux-mm-Bw31MaZKKs3YtjvyW6yDsg
Cc: Andrea Arcangeli, Joerg Roedel,
    iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
    linux-kernel-u79uwXL29TY76Z2rM5mHXA, Jérôme Glisse, Andrew Morton,
    Linus Torvalds, Kirill A . Shutemov

From: Jérôme Glisse <jglisse@redhat.com>

Calls to mmu_notifier_invalidate_page() have been replaced by calls to
mmu_notifier_invalidate_range(), and those calls are now bracketed by
calls to mmu_notifier_invalidate_range_start()/end().

Remove the now-unused invalidate_page callback.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
---
 drivers/iommu/amd_iommu_v2.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/drivers/iommu/amd_iommu_v2.c b/drivers/iommu/amd_iommu_v2.c
index 6629c472eafd..dccf5b76eff2 100644
--- a/drivers/iommu/amd_iommu_v2.c
+++ b/drivers/iommu/amd_iommu_v2.c
@@ -391,13 +391,6 @@ static int mn_clear_flush_young(struct mmu_notifier *mn,
 	return 0;
 }
 
-static void mn_invalidate_page(struct mmu_notifier *mn,
-			       struct mm_struct *mm,
-			       unsigned long address)
-{
-	__mn_flush_page(mn, address);
-}
-
 static void mn_invalidate_range(struct mmu_notifier *mn,
 				struct mm_struct *mm,
 				unsigned long start, unsigned long end)
@@ -436,7 +429,6 @@ static void mn_release(struct mmu_notifier *mn, struct mm_struct *mm)
 static const struct mmu_notifier_ops iommu_mn = {
 	.release		= mn_release,
 	.clear_flush_young	= mn_clear_flush_young,
-	.invalidate_page	= mn_invalidate_page,
 	.invalidate_range	= mn_invalidate_range,
 };
 
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 17+ messages in thread
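With invalidate_page gone, a single-page invalidation is simply a
one-page range, so the mn_invalidate_range() callback visible in the
diff context above already covers everything the removed
mn_invalidate_page() did. From memory -- treat this as a sketch, not a
verbatim quote of amd_iommu_v2.c -- that callback is essentially:

	static void mn_invalidate_range(struct mmu_notifier *mn,
					struct mm_struct *mm,
					unsigned long start, unsigned long end)
	{
		struct pasid_state *pasid_state = mn_to_state(mn);
		struct device_state *dev_state  = pasid_state->device_state;

		/* A single page gets a targeted IOTLB flush; anything
		 * larger flushes the whole TLB for this PASID instead
		 * of iterating page by page. */
		if ((start ^ (end - 1)) < PAGE_SIZE)
			amd_iommu_flush_page(dev_state->domain,
					     pasid_state->pasid, start);
		else
			amd_iommu_flush_tlb(dev_state->domain,
					    pasid_state->pasid);
	}

So the old mn_invalidate_page(mn, mm, addr), which boiled down to
__mn_flush_page(mn, addr), is subsumed by
mn_invalidate_range(mn, mm, addr, addr + PAGE_SIZE).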
end of thread, other threads:[~2017-11-30 16:19 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-29 23:54 [PATCH 00/13] mmu_notifier kill invalidate_page callback Jérôme Glisse
[not found] ` <20170829235447.10050-1-jglisse-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-08-29 23:54 ` [PATCH 07/13] iommu/amd: update to new mmu_notifier semantic Jérôme Glisse
2017-08-30 0:11 ` [PATCH 00/13] mmu_notifier kill invalidate_page callback Linus Torvalds
2017-08-30 0:56 ` Jerome Glisse
2017-08-30 8:40 ` Mike Galbraith
2017-08-30 14:57 ` Adam Borowski
2017-09-01 14:47 ` Jeff Cook
2017-09-01 14:50 ` taskboxtester
2017-08-29 23:54 ` [PATCH 08/13] iommu/intel: update to new mmu_notifier semantic Jérôme Glisse
[not found] ` <20170829235447.10050-3-jglisse@redhat.com>
[not found] ` <6D58FBE4-5D03-49CC-AAFF-3C1279A5A849@gmail.com>
[not found] ` <20170830172747.GE13559@redhat.com>
[not found] ` <003685D9-4DA9-42DC-AF46-7A9F8A43E61F@gmail.com>
[not found] ` <20170830212514.GI13559@redhat.com>
2017-08-30 23:25 ` [PATCH 02/13] mm/rmap: " Nadav Amit
2017-08-31 0:47 ` Jerome Glisse
[not found] ` <20170831004719.GF9445-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-08-31 17:12 ` Andrea Arcangeli
2017-08-31 19:15 ` Nadav Amit
2017-11-30 9:33 ` BSOD with [PATCH 00/13] mmu_notifier kill invalidate_page callback Fabian Grünbichler
[not found] ` <20171130093320.66cxaoj45g2ttzoh-aVfaTQcAavps8ZkLLAvlZtBPR1lH4CV8@public.gmane.org>
2017-11-30 11:20 ` Paolo Bonzini
2017-11-30 16:19 ` Radim Krčmář
-- strict thread matches above, loose matches on Subject: below --
2017-08-31 21:17 [PATCH 00/13] mmu_notifier kill invalidate_page callback v2 jglisse
[not found] ` <20170831211738.17922-1-jglisse-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-08-31 21:17 ` [PATCH 07/13] iommu/amd: update to new mmu_notifier semantic jglisse-H+wXaHxf7aLQT0dZR+AlfA
This is a public inbox; see mirroring instructions for how to clone and mirror all data and code used for this inbox, as well as URLs for NNTP newsgroup(s).