linux-mm.kvack.org archive mirror
From: Andrea Arcangeli <aarcange@redhat.com>
To: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>,
	Peter Zijlstra <peterz@infradead.org>,
	Gerald Schaefer <gerald.schaefer@de.ibm.com>,
	Steve Capper <steve.capper@linaro.org>,
	Dann Frazier <dann.frazier@canonical.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, Martin Schwidefsky <schwidefsky@de.ibm.com>
Subject: Re: [PATCH] thp, mm: remove comments on serializion of THP split vs. gup_fast
Date: Thu, 10 Mar 2016 17:10:35 +0100
Message-ID: <20160310161035.GD30716@redhat.com>
In-Reply-To: <alpine.LSU.2.11.1602252233280.9793@eggly.anvils>

On Thu, Feb 25, 2016 at 10:50:14PM -0800, Hugh Dickins wrote:
> It's a useful suggestion from Gerald, and your THP rework may have
> brought us closer to being able to rely on RCU locking rather than
> IRQ disablement there; but you were right just to delete the comment,
> there are other reasons why fast GUP still depends on IRQs disabled.
> 
> For example, see the fallback tlb_remove_table_one() in mm/memory.c:
> that one uses smp_call_function() sending IPI to all CPUs concerned,
> without waiting an RCU grace period at all.

I fully agree: the refcounting change just drops THP splitting from
the equation, but everything else remains. It's not like x86 is using
RCU for gup_fast when CONFIG_TRANSPARENT_HUGEPAGE=n.

The main issue, which Peter also pointed out, is how waiting for an RCU
grace period can be faster than sending an IPI to only the CPUs that
have an active_mm matching the one the page belongs to, and I'm not
exactly sure the cost of disabling irqs in gup_fast is going to pay
off. It's not just swap: a large munmap should be able to free
pagetables, or pagetables would get a footprint out of proportion with
the Rss of the process. In turn it'll have to either block
synchronously for a long time before returning to userland, or return
to userland while the pagetable memory is still not free; userland may
then legitimately mmap and munmap again in a loop, with unclear side
effects with regard to false positive OOM.

Then there's another issue with synchronize_sched():
__get_user_pages_fast has to be safe to run from irq (note the
local_irq_save instead of local_irq_disable) and KVM leverages it. KVM
just requires it to be atomic so it can run from inside a preempt
disabled section (i.e. inside a spinlock). I'm fairly certain the
irq-safe guarantee could be dropped without pain and
rcu_read_lock_sched() would be enough, but the documentation of the
IRQ-safe guarantees provided by __get_user_pages_fast would also have
to be altered if we were to use synchronize_sched(), and that's a
symbol exported to GPL modules too.
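
To illustrate the local_irq_save vs local_irq_disable remark above, here
is a minimal userspace sketch. It is not kernel code: a plain boolean
stands in for the CPU interrupt flag, and every model_* name is
hypothetical. In the kernel, local_irq_save() is a macro that fills an
unsigned long flags argument, whereas this model returns the saved state
to keep the example short. The only point shown is that a save/restore
pairing leaves the caller's IRQ state untouched, while a disable/enable
pairing would re-enable interrupts for a caller that entered with them
off.

```c
/* Userspace model only, not kernel code. All model_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

/* Stands in for the CPU interrupt-enable flag. */
static bool model_irqs_enabled = true;

/* Model of local_irq_disable()/local_irq_enable(): unconditional. */
void model_irq_disable(void) { model_irqs_enabled = false; }
void model_irq_enable(void)  { model_irqs_enabled = true; }

/* Model of local_irq_save(): remember the prior state, then disable. */
bool model_irq_save(void)
{
	bool flags = model_irqs_enabled;

	model_irqs_enabled = false;
	return flags;
}

/* Model of local_irq_restore(): put back whatever the caller had. */
void model_irq_restore(bool flags) { model_irqs_enabled = flags; }

/* Walker using save/restore, the pairing the mail points out. */
void model_walk_save_restore(void)
{
	bool flags = model_irq_save();

	/* ... a lockless page table walk would go here ... */
	model_irq_restore(flags);
}

/* Walker using disable/enable: only correct if IRQs were on at entry. */
void model_walk_disable_enable(void)
{
	model_irq_disable();
	/* ... a lockless page table walk would go here ... */
	model_irq_enable();
}

/*
 * Call walk() from a context that already has IRQs off (as an irq
 * handler would) and report the IRQ state the caller sees afterwards.
 */
bool model_state_after_call_with_irqs_off(void (*walk)(void))
{
	bool seen;

	model_irqs_enabled = false;
	walk();
	seen = model_irqs_enabled;
	model_irqs_enabled = true;	/* reset the model for the next use */
	return seen;
}
```

Calling model_state_after_call_with_irqs_off() with
model_walk_save_restore reports IRQs still off, while passing
model_walk_disable_enable reports them switched back on behind the
caller's back, which is why an interface callable from irq context needs
the save/restore form.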

Overall my main concern in switching x86 to RCU gup-fast is the
performance of synchronize_sched in large munmap pagetable teardown.

Thanks,
Andrea


