Re: [PATCH 0/2] change_protection(): Count the number of pages affected

From: Ingo Molnar <mingo@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>, Paul Turner <pjt@google.com>,
	Lee Schermerhorn <Lee.Schermerhorn@hp.com>,
	Christoph Lameter <cl@linux.com>, Rik van Riel <riel@redhat.com>,
	Mel Gorman <mgorman@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Thomas Gleixner <tglx@linutronix.de>,
	Hugh Dickins <hughd@google.com>
Subject: Re: [PATCH 0/2] change_protection(): Count the number of pages affected
Date: Fri, 16 Nov 2012 19:40:02 +0100
Message-ID: <20121116184002.GB4763@gmail.com>
In-Reply-To: <CA+55aFz_JnoR73O46YWhZn2A4t_CSUkGzMMprCUpvR79TVMCEQ@mail.gmail.com>


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Wed, Nov 14, 2012 at 12:50 AM, Ingo Molnar <mingo@kernel.org> wrote:
> > What do you guys think about this mprotect() optimization?
> 
> Hmm..
> 
> If this is mainly about just avoiding the TLB flushing, I do 
> wonder if it might not be more interesting to try to be much 
> more aggressive.
> 
> As noted elsewhere, we should just notice when vm_page_prot 
> doesn't change at all - even if 'flags' change, it is possible 
> that the actual low-level page protection bits do not (due to 
> the X=R issue).
> 
> But even *more* aggressively, how about looking at
> 
>  - not flushing the TLB at all if the bits become more permissive
>    (taking the TLB micro-fault and letting the CPU just update it on
>    its own)
> 
>  - even *more* aggressive: if the bits become strictly more 
> restrictive, how about not flushing the TLB at all, *and* not 
> even changing the page tables, and just teaching the page 
> fault code to do it lazily at fault time?
> 
> Now, the "change protections lazily" might actually be a huge 
> performance problem with the page fault overhead dwarfing any 
> TLB flush costs, but we don't really know, do we? It might be 
> worth trying out.

It might be a good idea when ptes get weaker protections - and 
maybe some CPU models see the pte modification in memory and are 
able to hash that to the TLB entry already and flush it? Even if 
they don't guarantee it architecturally they might have it as an 
optimization that works most of the time.

But I'd prefer to keep any such patch separate from these 
patches and maybe even keep them per arch and per CPU model?

I have instrumented and made sure that *these* patches do help 
visibly - but determining whether skipping the TLB flush when 
ptes are made more permissive also helps is a lot harder ... 
there could be per arch differences, even per CPU model 
differences, depending on TLB size, CPU features, etc.

For unthreaded process environments mprotect() is pretty neat 
already.
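
To put a rough number on the unthreaded case, a userspace sketch 
like the following could be used. It times repeated mprotect() 
calls on a populated anonymous mapping in a single thread. The 
helper name avg_mprotect_ns() and its parameters are made up for 
this illustration, and it measures only the whole syscall, not the 
TLB flush in isolation:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>

/*
 * Hypothetical helper (not from the patches): average cost in
 * nanoseconds of one mprotect() call on a populated anonymous
 * mapping of 'len' bytes, or -1.0 on error.
 */
double avg_mprotect_ns(size_t len, int iterations)
{
	struct timespec start, end;
	char *buf;
	int i;

	if (len == 0 || iterations <= 0)
		return -1.0;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1.0;

	/* Touch every byte so the ptes are present before timing. */
	memset(buf, 1, len);

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (i = 0; i < iterations; i++) {
		/* Alternate between read-only and read-write. */
		int prot = (i & 1) ? (PROT_READ | PROT_WRITE) : PROT_READ;

		if (mprotect(buf, len, prot) != 0) {
			munmap(buf, len);
			return -1.0;
		}
	}
	clock_gettime(CLOCK_MONOTONIC, &end);

	munmap(buf, len);

	return ((end.tv_sec - start.tv_sec) * 1e9 +
		(end.tv_nsec - start.tv_nsec)) / iterations;
}
```

Calling avg_mprotect_ns(64 * 4096, 100000) from main() and printing 
the result gives a single-threaded baseline; the threaded costs 
would need other threads of the same process running on other CPUs 
while the loop executes.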

For small/midsize mprotect()s in threaded environments there 
are two big costs:

  - the down_write(&mm->mmap_sem)/up_write(&mm->mmap_sem) pair 
    serializes between threads.

    Technically this could be improved, as the most expensive 
    parts of mprotect() are really safe via down_read() - the 
    only exception appears to be:

        vma->vm_flags = newflags;
        vma->vm_page_prot = pgprot_modify(vma->vm_page_prot,
                                          vm_get_page_prot(newflags));

    and that could be serialized using a spinlock, say the 
    pagetable lock. But it's a lot of footwork factoring out 
    vma->vm_page_prot users, and we'd have to consider for each 
    such place whether slowing it down is outweighed by the 
    benefit of speeding up mprotect().

    So I wouldn't personally go there, dragons and all that.

  - the TLB flush: on a highly threaded workload like a JVM, 
    with threads live on many other CPUs, it is a global TLB 
    flush, with IPIs sent everywhere, and the result has to be 
    waited for.

    This could be improved even if we don't do your
    very aggressive optimization, unless I'm missing something: 
    we could still flush locally and send the IPIs, but we don't
    have to *wait* for them when we weaken protections, right? 
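
The decision discussed above can be sketched in plain C. This is 
only an illustration under stated assumptions, not kernel code: the 
enum flush_mode and pick_flush_mode() names are hypothetical, and 
the userspace PROT_* bits stand in for the real low-level pte bits 
(so the X=R case is not modelled):

```c
#include <assert.h>
#include <sys/mman.h>

/* Hypothetical names, used only for this illustration. */
enum flush_mode {
	FLUSH_NONE,	/* protection bits unchanged: nothing to flush */
	FLUSH_NOWAIT,	/* strictly more permissive: send IPIs, don't wait */
	FLUSH_WAIT,	/* something got revoked: flush and wait */
};

enum flush_mode pick_flush_mode(int old_prot, int new_prot)
{
	const int mask = PROT_READ | PROT_WRITE | PROT_EXEC;

	old_prot &= mask;
	new_prot &= mask;

	if (old_prot == new_prot)
		return FLUSH_NONE;

	/* No bit of old_prot is missing from new_prot: weakened only. */
	if ((old_prot & ~new_prot) == 0)
		return FLUSH_NOWAIT;

	return FLUSH_WAIT;
}
```

For example, PROT_READ -> PROT_READ|PROT_WRITE yields FLUSH_NOWAIT, 
while PROT_READ|PROT_WRITE -> PROT_READ yields FLUSH_WAIT, because 
a stale writable TLB entry on another CPU must not survive the 
call.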

Thanks,

	Ingo


