Re: [PATCH 02/13] mm/rmap: update to new mmu_notifier semantic

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Jerome Glisse <jglisse@redhat.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Nadav Amit" <nadav.amit@gmail.com>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	"open list:MEMORY MANAGEMENT" <linux-mm@kvack.org>,
	"Dan Williams" <dan.j.williams@intel.com>,
	"Ross Zwisler" <ross.zwisler@linux.intel.com>,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"Bernhard Held" <berny156@gmx.de>,
	"Adam Borowski" <kilobyte@angband.pl>,
	"Radim Krčmář" <rkrcmar@redhat.com>,
	"Wanpeng Li" <kernellwp@gmail.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Takashi Iwai" <tiwai@suse.de>, "Mike Galbraith" <efault@gmx.de>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	axie <axie@amd.com>, "Andrew Morton" <akpm@linux-foundation.org>
Subject: Re: [PATCH 02/13] mm/rmap: update to new mmu_notifier semantic
Date: Wed, 30 Aug 2017 14:20:14 -0400	[thread overview]
Message-ID: <20170830182013.GD2386@redhat.com> (raw)
In-Reply-To: <20170830172747.GE13559@redhat.com>

On Wed, Aug 30, 2017 at 07:27:47PM +0200, Andrea Arcangeli wrote:
> On Tue, Aug 29, 2017 at 07:46:07PM -0700, Nadav Amit wrote:
> > Therefore, IIUC, try_to_umap_one() should only call
> > mmu_notifier_invalidate_range() after ptep_get_and_clear() and
> 
> That would trigger an unnecessarily double call to
> ->invalidate_range() both from mmu_notifier_invalidate_range() after
> ptep_get_and_clear() and later at the end in
> mmu_notifier_invalidate_range_end().
> 
> The only advantage of adding a mmu_notifier_invalidate_range() after
> ptep_get_and_clear() is to flush the secondary MMU TLBs (keep in mind
> the pagetables have to be shared with the primary MMU in order to use
> the ->invalidate_range()) inside the PT lock.
> 
> So if mmu_notifier_invalidate_range() after ptep_get_and_clear() is
> needed or not, again boils down to the issue if the old code calling
> ->invalidate_page outside the PT lock was always broken before or
> not. I don't see why exactly it was broken, we even hold the page lock
> there so I don't see a migration race possible either. Of course the
> constraint to be safe is that the put_page in try_to_unmap_one cannot
> be the last one, and that had to be enforced by the caller taking an
> additional reference on it.
> 
> One can wonder if the primary MMU TLB flush in ptep_clear_flush
> (should_defer_flush returning false) could be put after releasing the
> PT lock too (because that's not different from deferring the secondary
> MMU notifier TLB flush in ->invalidate_range down to
> mmu_notifier_invalidate_range_end) even if TTU_BATCH_FLUSH isn't set,
> which may be safe too for the same reasons.
> 
> When should_defer_flush returns true we already defer the primary MMU
> TLB flush to much later to even mmu_notifier_invalidate_range_end, not
> just after the PT lock release so at least when should_defer_flush is
> true, it looks obviously safe to defer the secondary MMU TLB flush to
> mmu_notifier_invalidate_range_end for the drivers implementing
> ->invalidate_range.
> 
> If I'm wrong and all TLB flushes must happen inside the PT lock, then
> we should at least reconsider the implicit call to ->invalidate_range
> method from mmu_notifier_invalidate_range_end or we would call it
> twice unnecessarily which doesn't look optimal. Either ways this
> doesn't look optimal. We would need to introduce a
> mmu_notifier_invalidate_range_end_full that calls also
> ->invalidate_range in such case so we skip the ->invalidate_range call
> in mmu_notifier_invalidate_range_end if we put an explicit
> mmu_notifier_invalidate_range() after ptep_get_and_clear inside the PT
> lock like you suggested above.

So i went over call to try_to_unmap() (try_to_munlock() is fine as it
does not clear the CPU page table entry). I believe they are 2 cases
where you can get a new pte entry after we release spinlock and before
we call mmu_notifier_invalidate_range_end()

First case is :
if (unlikely(PageSwapBacked(page) != PageSwapCache(page))) {
  ...
  break;
}

The pte is clear, there was an error condition and this should never
happen but a racing thread might install a new pte in the meantime.
Maybe we should restore the pte value here. Anyway when this happens
bad things are going on.

The second case is non dirty anonymous page and MADV_FREE. But here
the application is telling us that no one should care about that
virtual address any more. So i am not sure how much we should care.


If we ignore this 2 cases than the CPU pte can never be replace by
something else between the time we release the spinlock and the time
we call mmu_notifier_invalidate_range_end() so not invalidating the
devices tlb is ok here. Do we want this kind of optimization ?

Jerome

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2017-08-30 18:20 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-29 23:54 [PATCH 00/13] mmu_notifier kill invalidate_page callback Jérôme Glisse
2017-08-29 23:54 ` [PATCH 01/13] dax: update to new mmu_notifier semantic Jérôme Glisse
2017-08-29 23:54 ` [PATCH 02/13] mm/rmap: " Jérôme Glisse
2017-08-30  2:46   ` Nadav Amit
2017-08-30  2:59     ` Jerome Glisse
2017-08-30  3:16       ` Nadav Amit
2017-08-30  3:18         ` Nadav Amit
2017-08-30 17:27     ` Andrea Arcangeli
2017-08-30 18:00       ` Nadav Amit
2017-08-30 21:25         ` Andrea Arcangeli
2017-08-30 23:25           ` Nadav Amit
2017-08-31  0:47             ` Jerome Glisse
2017-08-31 17:12               ` Andrea Arcangeli
2017-08-31 19:15                 ` Nadav Amit
2017-08-30 18:20       ` Jerome Glisse [this message]
2017-08-30 18:40         ` Nadav Amit
2017-08-30 20:45           ` Jerome Glisse
2017-08-30 22:17             ` Andrea Arcangeli
2017-08-30 20:55           ` Andrea Arcangeli
2017-08-30 16:52   ` Andrea Arcangeli
2017-08-30 17:48     ` Jerome Glisse
2017-08-30 21:53     ` Linus Torvalds
2017-08-30 23:01       ` Andrea Arcangeli
2017-08-31 18:25         ` Jerome Glisse
2017-08-31 19:40           ` Linus Torvalds
2017-08-29 23:54 ` [PATCH 03/13] powerpc/powernv: " Jérôme Glisse
2017-08-29 23:54 ` [PATCH 04/13] drm/amdgpu: " Jérôme Glisse
2017-08-30  6:18   ` Christian König
2017-08-29 23:54 ` [PATCH 05/13] IB/umem: " Jérôme Glisse
2017-08-30  6:13   ` Leon Romanovsky
2017-08-29 23:54 ` [PATCH 06/13] IB/hfi1: " Jérôme Glisse
2017-09-06 14:08   ` Arumugam, Kamenee
2017-08-29 23:54 ` [PATCH 07/13] iommu/amd: " Jérôme Glisse
2017-08-29 23:54 ` [PATCH 08/13] iommu/intel: " Jérôme Glisse
2017-08-29 23:54 ` [PATCH 09/13] misc/mic/scif: " Jérôme Glisse
2017-08-29 23:54 ` [PATCH 10/13] sgi-gru: " Jérôme Glisse
2017-08-29 23:54 ` [PATCH 11/13] xen/gntdev: " Jérôme Glisse
2017-08-30 19:32   ` Boris Ostrovsky
2017-08-29 23:54 ` [PATCH 12/13] KVM: " Jérôme Glisse
2017-08-29 23:54 ` [PATCH 13/13] mm/mmu_notifier: kill invalidate_page Jérôme Glisse
2017-08-30  0:11 ` [PATCH 00/13] mmu_notifier kill invalidate_page callback Linus Torvalds
2017-08-30  0:56   ` Jerome Glisse
2017-08-30  8:40     ` Mike Galbraith
2017-08-30 14:57     ` Adam Borowski
2017-09-01 14:47       ` Jeff Cook
2017-09-01 14:50         ` taskboxtester
2017-11-30  9:33 ` BSOD with " Fabian Grünbichler
2017-11-30 11:20   ` Paolo Bonzini
2017-11-30 16:19     ` Radim Krčmář
2017-11-30 18:05       ` [PATCH 1/2] KVM: x86: fix APIC page invalidation Radim Krčmář
2017-11-30 18:05         ` [PATCH 2/2] TESTING! KVM: x86: add invalidate_range mmu notifier Radim Krčmář
2017-12-01 15:15           ` Paolo Bonzini
2017-12-03 17:24             ` Andrea Arcangeli
2017-12-01 12:21         ` [PATCH 1/2] KVM: x86: fix APIC page invalidation Fabian Grünbichler
2017-12-01 15:27         ` Paolo Bonzini
2017-12-03 17:28         ` Andrea Arcangeli
2017-12-06  2:32         ` Wanpeng Li
2017-12-06  9:50           ` 王金浦
2017-12-06 10:00             ` Paolo Bonzini
2017-12-06  8:15         ` Fabian Grünbichler
2017-12-13 12:54         ` Richard Purdie

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170830182013.GD2386@redhat.com \
    --to=jglisse@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=axie@amd.com \
    --cc=berny156@gmx.de \
    --cc=dan.j.williams@intel.com \
    --cc=efault@gmx.de \
    --cc=kernellwp@gmail.com \
    --cc=kilobyte@angband.pl \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=nadav.amit@gmail.com \
    --cc=pbonzini@redhat.com \
    --cc=rkrcmar@redhat.com \
    --cc=ross.zwisler@linux.intel.com \
    --cc=tiwai@suse.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).