Re: [PATCH] thp, mm: remove comments on serializion of THP split vs. gup_fast

From: Peter Zijlstra <peterz@infradead.org>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Gerald Schaefer <gerald.schaefer@de.ibm.com>,
	Steve Capper <steve.capper@linaro.org>,
	Dann Frazier <dann.frazier@canonical.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, Martin Schwidefsky <schwidefsky@de.ibm.com>
Subject: Re: [PATCH] thp, mm: remove comments on serializion of THP split vs. gup_fast
Date: Thu, 10 Mar 2016 17:34:39 +0100
Message-ID: <20160310163439.GS6356@twins.programming.kicks-ass.net>
In-Reply-To: <20160310161035.GD30716@redhat.com>

On Thu, Mar 10, 2016 at 05:10:35PM +0100, Andrea Arcangeli wrote:
> On Thu, Feb 25, 2016 at 10:50:14PM -0800, Hugh Dickins wrote:
> > It's a useful suggestion from Gerald, and your THP rework may have
> > brought us closer to being able to rely on RCU locking rather than
> > IRQ disablement there; but you were right just to delete the comment,
> > there are other reasons why fast GUP still depends on IRQs disabled.
> > 
> > For example, see the fallback tlb_remove_table_one() in mm/memory.c:
> > that one uses smp_call_function() sending IPI to all CPUs concerned,
> > without waiting an RCU grace period at all.
> 
> I fully agree, the refcounting change just drops the THP splitting from
> the equation, but everything else remains. It's not like x86 is using
> RCU for gup_fast when CONFIG_TRANSPARENT_HUGEPAGE=n.
> 
> The main issue Peter also pointed out is how it can be faster to wait
> an RCU grace period than sending an IPI to only the CPUs that have an
> active_mm matching the one the page belongs to

Typically RCU (sched) grace periods take a relative 'forever' compared
to sending IPIs. That is, synchronize_sched() is typically slower.

But, on the upside, not sending IPIs will not perturb those other
CPUs, which is something HPC/RT people like.

> and I'm not exactly
> sure the cost of disabling irqs in gup_fast is going to pay off.

Entirely depends on the workload of course, but you can do a lot of
gup_fast compared to munmap()s. So making gup_fast faster seems like a
potential win. Also, is anybody really interested in munmap()
performance?

> It's
> not just swap, large munmap should be able to free up pagetables or
> pagetables would get a footprint out of proportion with the Rss of the
> process, and in turn it'll have to either block synchronously for long
> before returning to userland, or return to userland when the pagetable
> memory is still not free, and userland may mmap again and munmap again
> in a loop and being legit doing so too, with unclear side effects with
> regard to false positive OOM.

I'm not seeing that. The only point where this matters at all is if the
batch alloc fails; otherwise the RCU_TABLE_FREE stuff uses
call_rcu_sched() and what you write above is true already.

Now, RCU already has an oom_notifier to push work harder if we approach
that.

> Then there's another issue with synchronize_sched(),
> __get_user_pages_fast has to be safe to run from irq (note the
> local_irq_save instead of local_irq_disable) and KVM leverages it.

This is unchanged. synchronize_sched() serializes against anything that
disables preemption; having IRQs disabled is very much included in that.

So there should be no problem running this from IRQ context.

> KVM
> just requires it to be atomic so it can run from inside a preempt
> disabled section (i.e. inside a spinlock), I'm fairly certain the
> irq-safe guarantee could be dropped without pain and
> rcu_read_lock_sched() would be enough, but the documentation of the
> IRQ-safe guarantees provided by __get_user_pages_fast should be also
> altered if we were to use synchronize_sched() and that's a symbol
> exported to GPL modules too.

No changes needed.

> Overall my main concern in switching x86 to RCU gup-fast is the
> performance of synchronize_sched in large munmap pagetable teardown.

Normally, as already established by Martin, you should not actually ever
encounter the sync_sched() call. Only under severe memory pressure, when
the batch alloc in tlb_remove_table() fails, is this ever an issue.

And at the point where such allocations fail, performance typically
isn't a concern anymore.
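
To illustrate the batching argument above, here is a minimal userspace
sketch of the control flow, not the kernel code itself. All names
(struct gather, remove_table(), remove_table_one(), batch_flush(), the
alloc_fails knob and the batch size of 4) are hypothetical stand-ins
chosen for this example. The deferred path, which in the kernel would
hand the batch to call_rcu_sched(), is modelled as a plain counter, and
the synchronous fallback is modelled as a second counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical batch size; chosen small so the flush path is easy to hit. */
#define MAX_TABLE_BATCH 4

struct table_batch {
	size_t nr;
	void *tables[MAX_TABLE_BATCH];
};

struct gather {
	struct table_batch *batch;
	bool alloc_fails;		/* test knob: simulate memory pressure */
	unsigned int sync_frees;	/* tables released via the slow path */
	unsigned int deferred_frees;	/* tables released via a flushed batch */
};

/* Stand-in for the deferred (RCU callback) release of a whole batch. */
static void batch_flush(struct gather *g)
{
	if (!g->batch)
		return;
	g->deferred_frees += g->batch->nr;
	free(g->batch);
	g->batch = NULL;
}

/* Stand-in for the synchronous fallback used when no batch is available. */
static void remove_table_one(struct gather *g, void *table)
{
	(void)table;	/* the sketch does not own the tables */
	g->sync_frees++;
}

static void remove_table(struct gather *g, void *table)
{
	assert(table != NULL);

	if (!g->batch) {
		g->batch = g->alloc_fails ? NULL : calloc(1, sizeof(*g->batch));
		if (!g->batch) {
			/* Only reached when the batch allocation fails. */
			remove_table_one(g, table);
			return;
		}
	}

	g->batch->tables[g->batch->nr++] = table;
	if (g->batch->nr == MAX_TABLE_BATCH)
		batch_flush(g);
}
```

In this model the slow path is taken only while alloc_fails is set; as
long as the batch allocation succeeds, every table goes through the
deferred path, which is the point made above about when the synchronous
call is actually encountered.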
