linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Jerome Glisse <jglisse@redhat.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Nadav Amit <nadav.amit@gmail.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Joerg Roedel <jroedel@suse.de>,
	Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>,
	David Woodhouse <dwmw2@infradead.org>,
	Alistair Popple <alistair@popple.id.au>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Stephen Rothwell <sfr@canb.auug.org.au>,
	iommu@lists.linux-foundation.org, linuxppc-dev@lists.ozlabs.org,
	linux-next@vger.kernel.org
Subject: Re: [PATCH] mm/mmu_notifier: avoid double notification when it is useless
Date: Tue, 3 Oct 2017 20:15:59 -0400	[thread overview]
Message-ID: <20171004001559.GD20644@redhat.com> (raw)
In-Reply-To: <20171003234215.GA5231@redhat.com>

On Wed, Oct 04, 2017 at 01:42:15AM +0200, Andrea Arcangeli wrote:
> Hello Jerome,
> 
> On Fri, Sep 01, 2017 at 01:30:11PM -0400, Jerome Glisse wrote:
> > +Case A is obvious you do not want to take the risk for the device to write to
> > +a page that might now be use by some completely different task.
> 
> used
> 
> > +is true ven if the thread doing the page table update is preempted right after
> 
> even
> 
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 90731e3b7e58..5706252b828a 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -1167,8 +1167,15 @@ static int do_huge_pmd_wp_page_fallback(struct vm_fault *vmf, pmd_t orig_pmd,
> >  		goto out_free_pages;
> >  	VM_BUG_ON_PAGE(!PageHead(page), page);
> >  
> > +	/*
> > +	 * Leave pmd empty until pte is filled note we must notify here as
> > +	 * concurrent CPU thread might write to new page before the call to
> > +	 * mmu_notifier_invalidate_range_end() happen which can lead to a
> 
> happens
> 
> > +	 * device seeing memory write in different order than CPU.
> > +	 *
> > +	 * See Documentation/vm/mmu_notifier.txt
> > +	 */
> >  	pmdp_huge_clear_flush_notify(vma, haddr, vmf->pmd);
> > -	/* leave pmd empty until pte is filled */
> >  
> 
> Here we can change the following mmu_notifier_invalidate_range_end to
> skip calling ->invalidate_range. It could be called
> mmu_notifier_invalidate_range_only_end, or other suggestions
> welcome. Otherwise we'll repeat the call for nothing.
> 
> We need it inside the PT lock for the ordering issue, but we don't
> need to run it twice.
> 
> Same in do_huge_pmd_wp_page, wp_page_copy and
> migrate_vma_insert_page. Every time *clear_flush_notify is used
> mmu_notifier_invalidate_range_only_end should be called after it,
> instead of mmu_notifier_invalidate_range_end.
> 
> I think optimizing that part too, fits in the context of this patchset
> (if not in the same patch), because the objective is still the same:
> to remove unnecessary ->invalidate_range calls.

Yes you are right, good idea, i will respin with that too (and with the
various typo you noted thank you for that). I can do 2 patch or 1, i don't
mind either way. I will probably do 2 as first and they can be folded into
1 if people prefer just one.


> 
> > +				 * No need to notify as we downgrading page
> 
> are
> 
> > +				 * table protection not changing it to point
> > +				 * to a new page.
> > +	 			 *
> 
> > +		 * No need to notify as we downgrading page table to read only
> 
> are
> 
> > +	 * No need to notify as we replacing a read only page with another
> 
> are
> 
> > @@ -1510,13 +1515,43 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
> >  			if (pte_soft_dirty(pteval))
> >  				swp_pte = pte_swp_mksoft_dirty(swp_pte);
> >  			set_pte_at(mm, address, pvmw.pte, swp_pte);
> > -		} else
> > +		} else {
> > +			/*
> > +			 * We should not need to notify here as we reach this
> > +			 * case only from freeze_page() itself only call from
> > +			 * split_huge_page_to_list() so everything below must
> > +			 * be true:
> > +			 *   - page is not anonymous
> > +			 *   - page is locked
> > +			 *
> > +			 * So as it is a shared page and it is locked, it can
> > +			 * not be remove from the page cache and replace by
> > +			 * a new page before mmu_notifier_invalidate_range_end
> > +			 * so no concurrent thread might update its page table
> > +			 * to point at new page while a device still is using
> > +			 * this page.
> > +			 *
> > +			 * But we can not assume that new user of try_to_unmap
> > +			 * will have that in mind so just to be safe here call
> > +			 * mmu_notifier_invalidate_range()
> > +			 *
> > +			 * See Documentation/vm/mmu_notifier.txt
> > +			 */
> >  			dec_mm_counter(mm, mm_counter_file(page));
> > +			mmu_notifier_invalidate_range(mm, address,
> > +						      address + PAGE_SIZE);
> > +		}
> >  discard:
> > +		/*
> > +		 * No need to call mmu_notifier_invalidate_range() as we are
> > +		 * either replacing a present pte with non present one (either
> > +		 * a swap or special one). We handling the clearing pte case
> > +		 * above.
> > +		 *
> > +		 * See Documentation/vm/mmu_notifier.txt
> > +		 */
> >  		page_remove_rmap(subpage, PageHuge(page));
> >  		put_page(page);
> > -		mmu_notifier_invalidate_range(mm, address,
> > -					      address + PAGE_SIZE);
> >  	}
> >  
> >  	mmu_notifier_invalidate_range_end(vma->vm_mm, start, end);
> 
> That is the path that unmaps filebacked pages (btw, not necessarily
> shared unlike comment says, they can be private but still filebacked).

I was more refering to the fact that they are in page cache and thus
given current condition they can not be migrated to a new page in our
back. But yes it can be a private mapping, i will fix the comment.


> I'd like some more explanation about the inner working of "that new
> user" as per comment above.
> 
> It would be enough to drop mmu_notifier_invalidate_range from above
> without adding it to the filebacked case. The above gives higher prio
> to the hypothetical and uncertain future case, than to the current
> real filebacked case that doesn't need ->invalidate_range inside the
> PT lock, or do you see something that might already need such
> ->invalidate_range?

No i don't see any new user today that might need such invalidate but
i was trying to be extra cautious as i have a tendency to assume that
someone might do a patch that use try_to_unmap() without going through
all the comments in the function and thus possibly using it in a an
unexpected way from mmu_notifier callback point of view. I am fine
with putting the burden on new user to get it right and adding an
extra warning in the function description to try to warn people in a
sensible way.


> I certainly like the patch. I expect ->invalidate_range users will
> like the slight higher complexity in order to eliminate unnecessary
> invalidates that are slowing them down unnecessarily. At the same time
> this is zero risk (because a noop) for all other MMU notifier users
> (those that don't share the primary MMU pagetables, like KVM).

I have another patchset to restore the change_pte optimization for kvm
i want to post too. I will need to do some benchmarking first to make
sure it actualy helps even in a small way.

Thank you for looking into this patch, i will repost with your suggestions
soon.

Jérôme

  reply	other threads:[~2017-10-04  0:16 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-01 17:30 [PATCH] mm/mmu_notifier: avoid double notification when it is useless jglisse
2017-10-03 23:42 ` Andrea Arcangeli
2017-10-04  0:15   ` Jerome Glisse [this message]
     [not found]     ` <0D64494B-AB3D-4091-B75A-883EA37BE098@gmail.com>
2017-10-04  1:20       ` Jerome Glisse

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171004001559.GD20644@redhat.com \
    --to=jglisse@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=alistair@popple.id.au \
    --cc=benh@kernel.crashing.org \
    --cc=dwmw2@infradead.org \
    --cc=iommu@lists.linux-foundation.org \
    --cc=jroedel@suse.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-next@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mpe@ellerman.id.au \
    --cc=nadav.amit@gmail.com \
    --cc=sfr@canb.auug.org.au \
    --cc=suravee.suthikulpanit@amd.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).