From: Jan Stancek <jstancek@redhat.com>
To: Will Deacon <will.deacon@arm.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>,
	peterz@infradead.org,  namit@vmware.com, minchan@kernel.org,
	mgorman@suse.de,  stable@vger.kernel.org, linux-mm@kvack.org,
	 linux-kernel@vger.kernel.org, Jan Stancek <jstancek@redhat.com>
Subject: Re: [v2 PATCH] mm: mmu_gather: remove __tlb_reset_range() for force flush
Date: Thu, 16 May 2019 11:29:35 -0400 (EDT)	[thread overview]
Message-ID: <1158926942.23199905.1558020575293.JavaMail.zimbra@redhat.com> (raw)
In-Reply-To: <20190514145445.GB2825@fuggles.cambridge.arm.com>



----- Original Message -----
> On Mon, May 13, 2019 at 04:01:09PM -0700, Yang Shi wrote:
> > 
> > 
> > On 5/13/19 9:38 AM, Will Deacon wrote:
> > > On Fri, May 10, 2019 at 07:26:54AM +0800, Yang Shi wrote:
> > > > diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
> > > > index 99740e1..469492d 100644
> > > > --- a/mm/mmu_gather.c
> > > > +++ b/mm/mmu_gather.c
> > > > @@ -245,14 +245,39 @@ void tlb_finish_mmu(struct mmu_gather *tlb,
> > > >   {
> > > >   	/*
> > > >   	 * If there are parallel threads are doing PTE changes on same range
> > > > -	 * under non-exclusive lock(e.g., mmap_sem read-side) but defer TLB
> > > > -	 * flush by batching, a thread has stable TLB entry can fail to flush
> > > > -	 * the TLB by observing pte_none|!pte_dirty, for example so flush TLB
> > > > -	 * forcefully if we detect parallel PTE batching threads.
> > > > +	 * under non-exclusive lock (e.g., mmap_sem read-side) but defer TLB
> > > > +	 * flush by batching, one thread may end up seeing inconsistent PTEs
> > > > +	 * and be left with stale TLB entries.  So flush the TLB forcefully
> > > > +	 * if we detect parallel PTE batching threads.
> > > > +	 *
> > > > +	 * However, some syscalls, e.g. munmap(), may free page tables;
> > > > +	 * this needs a forced flush of everything in the given range.
> > > > +	 * Otherwise it may result in stale TLB entries on architectures,
> > > > +	 * e.g. aarch64, that can specify which level of TLB to flush.
> > > >   	 */
> > > > -	if (mm_tlb_flush_nested(tlb->mm)) {
> > > > -		__tlb_reset_range(tlb);
> > > > -		__tlb_adjust_range(tlb, start, end - start);
> > > > +	if (mm_tlb_flush_nested(tlb->mm) && !tlb->fullmm) {
> > > > +		/*
> > > > +		 * Since we can't tell what we actually should have
> > > > +		 * flushed, flush everything in the given range.
> > > > +		 */
> > > > +		tlb->freed_tables = 1;
> > > > +		tlb->cleared_ptes = 1;
> > > > +		tlb->cleared_pmds = 1;
> > > > +		tlb->cleared_puds = 1;
> > > > +		tlb->cleared_p4ds = 1;
> > > > +
> > > > +		/*
> > > > +		 * Some architectures, e.g. ARM, that have range invalidation
> > > > +		 * and care about VM_EXEC for I-Cache invalidation, need
> > > > +		 * vma_exec to be force-set.
> > > > +		 */
> > > > +		tlb->vma_exec = 1;
> > > > +
> > > > +		/* Force vma_huge clear to guarantee a safer flush */
> > > > +		tlb->vma_huge = 0;
> > > > +
> > > > +		tlb->start = start;
> > > > +		tlb->end = end;
> > > >   	}
> > > Whilst I think this is correct, it would be interesting to see whether
> > > or not it's actually faster than just nuking the whole mm, as I mentioned
> > > before.
> > > 
> > > At least in terms of getting a short-term fix, I'd prefer the diff below
> > > if it's not measurably worse.
> > 
> > I did a quick test with ebizzy (96 threads with 5 iterations) on my x86 VM.
> > It shows a slight slowdown in records/s but much more sys time spent with
> > the fullmm flush; the data is below.
> > 
> >                      nofullmm    fullmm
> > ops (records/s)        225606    225119
> > sys (s)                   0.69      1.14
> > 
> > It looks like the slight reduction in records/s is caused by the increase
> > in sys time.
> 
> That's not what I expected, and I'm unable to explain why moving to fullmm
> would /increase/ the system time. I would've thought the time spent doing
> the invalidation would decrease, with the downside that the TLB is cold
> when returning to userspace.
> 

I tried ebizzy with various parameters (malloc vs. mmap, ran it for an hour),
but performance was very similar for both patches.

So, I was looking for a workload that would demonstrate the largest
difference. Inspired by python xml-rpc, which can handle each request in a
new thread, I tried the following [1]:

16 threads, each looping 100k times over:
  mmap(16M)
  touch 1 page
  madvise(DONTNEED)
  munmap(16M)
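
In rough C, each thread does something like the sketch below (a simplified,
hypothetical version; names and structure are illustrative, the exact
reproducer is at [1]):

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#define THREADS  16
#define LOOPS    100000
#define MAP_SIZE (16UL * 1024 * 1024)

static void *worker(void *arg)
{
	long i;

	(void)arg;
	for (i = 0; i < LOOPS; i++) {
		/* mmap(16M) of anonymous memory */
		char *p = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED) {
			perror("mmap");
			exit(1);
		}
		/* touch 1 page so there is a PTE to tear down */
		p[0] = 1;
		/* zap the PTEs under mmap_sem read-side */
		madvise(p, MAP_SIZE, MADV_DONTNEED);
		/* munmap(16M) also frees the page tables */
		munmap(p, MAP_SIZE);
	}
	return NULL;
}

int main(void)
{
	pthread_t t[THREADS];
	int i;

	for (i = 0; i < THREADS; i++)
		pthread_create(&t[i], NULL, worker, NULL);
	for (i = 0; i < THREADS; i++)
		pthread_join(t[i], NULL);
	return 0;
}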

This yields a quite significant difference between the two patches when
running on my 46-CPU arm host. I checked it twice (applied the patch,
recompiled, rebooted), but the numbers stayed the same, give or take a
couple of seconds.

Does it somewhat match your expectation?

v2 patch
---------
real    2m33.460s
user    0m3.359s
sys     15m32.307s

real    2m33.895s
user    0m2.749s
sys     16m34.500s

real    2m35.666s
user    0m3.528s
sys     15m23.377s

real    2m32.898s
user    0m2.789s
sys     16m18.801s

real    2m33.087s
user    0m3.565s
sys     16m23.815s


fullmm version
---------------
real    0m46.811s
user    0m1.596s
sys     1m47.500s

real    0m47.322s
user    0m1.803s
sys     1m48.449s

real    0m46.668s
user    0m1.508s
sys     1m47.352s

real    0m46.742s
user    0m2.007s
sys     1m47.217s

real    0m46.948s
user    0m1.785s
sys     1m47.906s
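
Averaged over the five runs above, that is roughly a 3x difference in
wall-clock time (~47s vs. ~2m33s) and roughly 9x in sys time (~1m48s vs.
~16m) in favor of the fullmm version.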

[1] https://github.com/jstancek/reproducers/blob/master/kernel/page_fault_stall/mmap8.c



Thread overview (16+ messages):
2019-05-09 23:26 [v2 PATCH] mm: mmu_gather: remove __tlb_reset_range() for force flush Yang Shi
2019-05-13 16:38 ` Will Deacon
2019-05-13 23:01   ` Yang Shi
2019-05-14 14:54     ` Will Deacon
2019-05-14 17:25       ` Yang Shi
2019-05-16 15:29       ` Jan Stancek [this message]
2019-05-20  2:59         ` Yang Shi
2019-05-14 11:52   ` Peter Zijlstra
2019-05-14 12:02     ` Will Deacon
  -- strict thread matches above, loose matches on Subject: below --
2019-05-14  2:01 Nadav Amit
2019-05-14  4:20 ` Yang Shi
2019-05-14  4:30   ` Yang Shi
2019-05-14  7:15 ` Jan Stancek
2019-05-14  7:21   ` Nadav Amit
2019-05-14 11:49     ` Peter Zijlstra
2019-05-14 11:43 ` Peter Zijlstra
