From: Steve Capper <steve.capper@linaro.org>
To: Will Deacon <will.deacon@arm.com>
Cc: "linux-arm-kernel@lists.infradead.org" <linux-arm-kernel@lists.infradead.org>,
	Catalin Marinas <Catalin.Marinas@arm.com>,
	"linux@arm.linux.org.uk" <linux@arm.linux.org.uk>,
	"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"gary.robertson@linaro.org" <gary.robertson@linaro.org>,
	"christoffer.dall@linaro.org" <christoffer.dall@linaro.org>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"anders.roxell@linaro.org" <anders.roxell@linaro.org>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"dann.frazier@canonical.com" <dann.frazier@canonical.com>,
	Mark Rutland <Mark.Rutland@arm.com>,
	"mgorman@suse.de" <mgorman@suse.de>
Subject: Re: [PATH V2 1/6] mm: Introduce a general RCU get_user_pages_fast.
Date: Wed, 27 Aug 2014 13:50:28 +0100
Message-ID: <20140827125027.GA7765@linaro.org>
In-Reply-To: <20140827085442.GD16376@arm.com>

On Wed, Aug 27, 2014 at 09:54:42AM +0100, Will Deacon wrote:
> Hi Steve,
>

Hey Will,

> A few minor comments (took me a while to understand how this works, so I
> thought I'd make some noise :)

A big thank you for reading through it :-).

>
> On Thu, Aug 21, 2014 at 04:43:27PM +0100, Steve Capper wrote:
> > get_user_pages_fast attempts to pin user pages by walking the page
> > tables directly and avoids taking locks. Thus the walker needs to be
> > protected from page table pages being freed from under it, and needs
> > to block any THP splits.
> >
> > One way to achieve this is to have the walker disable interrupts, and
> > rely on IPIs from the TLB flushing code blocking before the page table
> > pages are freed.
> >
> > On some platforms we have hardware broadcast of TLB invalidations, thus
> > the TLB flushing code doesn't necessarily need to broadcast IPIs; and
> > spuriously broadcasting IPIs can hurt system performance if done too
> > often.
> >
> > This problem has been solved on PowerPC and Sparc by batching up page
> > table pages belonging to more than one mm_user, then scheduling an
> > rcu_sched callback to free the pages. This RCU page table free logic
> > has been promoted to core code and is activated when one enables
> > HAVE_RCU_TABLE_FREE. Unfortunately, these architectures implement
> > their own get_user_pages_fast routines.
> >
> > The RCU page table free logic coupled with an IPI broadcast on THP
> > split (which is a rare event), allows one to protect a page table
> > walker by merely disabling the interrupts during the walk.
>
> Disabling interrupts isn't completely free (it's a self-synchronising
> operation on ARM). It would be interesting to see if your futex workload
> performance is improved by my simple irq_save optimisation for ARM:
>
>   https://git.kernel.org/cgit/linux/kernel/git/will/linux.git/commit/?h=misc-patches&id=312a70adfa6f22e9d62803dd21400f481253e58b
>
> (I've been struggling to show anything other than tiny improvements from
> that patch).
>

This looks like a useful optimisation; I'll have a think about workloads
that fire many futexes on THP tails. (The test I used only fired off one
futex).

> > This patch provides a general RCU implementation of get_user_pages_fast
> > that can be used by architectures that perform hardware broadcast of
> > TLB invalidations.
> >
> > It is based heavily on the PowerPC implementation by Nick Piggin.
>
> [...]
>
> > diff --git a/mm/gup.c b/mm/gup.c
> > index 91d044b..2f684fa 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -10,6 +10,10 @@
> >  #include <linux/swap.h>
> >  #include <linux/swapops.h>
> >
> > +#include <linux/sched.h>
> > +#include <linux/rwsem.h>
> > +#include <asm/pgtable.h>
> > +
> >  #include "internal.h"
> >
> >  static struct page *no_page_table(struct vm_area_struct *vma,
> > @@ -672,3 +676,277 @@ struct page *get_dump_page(unsigned long addr)
> >  	return page;
> >  }
> >  #endif /* CONFIG_ELF_CORE */
> > +
> > +#ifdef CONFIG_HAVE_RCU_GUP
> > +
> > +#ifdef __HAVE_ARCH_PTE_SPECIAL
>
> Do we actually require this (pte special) if hugepages are disabled or
> not supported?

We need this logic if we want to use fast_gup on normal pages safely. The
special bit indicates that we should not attempt to take a reference to
the underlying page.

Huge pages are guaranteed not to be special.

Cheers,
-- 
Steve
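
For readers following the thread, the point about the special bit can be
illustrated with a small userspace model. This is only a sketch and not the
code from the patch: model_page, model_pte, the MODEL_PTE_* flags and
model_gup_pte() are all hypothetical names invented for illustration, and the
real walker reads hardware page table entries with interrupts disabled. The
sketch assumes only what the reply states: a special entry must never have a
reference taken on its underlying page.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for the model; not the kernel's pte encoding. */
#define MODEL_PTE_PRESENT 0x1u
#define MODEL_PTE_SPECIAL 0x2u

/* Hypothetical stand-in for a page with a reference count. */
struct model_page {
	int refcount;
};

/* Hypothetical stand-in for a page table entry. */
struct model_pte {
	unsigned int flags;
	struct model_page *page;
};

/*
 * Try to pin the page behind one entry on the lockless path.
 * Returns true and takes a reference on success. Returns false,
 * leaving the reference count untouched, when the entry is not
 * present or is marked special; a caller would then be expected
 * to give up on the fast path for this address.
 */
static bool model_gup_pte(const struct model_pte *pte)
{
	if (!(pte->flags & MODEL_PTE_PRESENT))
		return false;

	/* Special entries must not have a page reference taken. */
	if (pte->flags & MODEL_PTE_SPECIAL)
		return false;

	pte->page->refcount++;
	return true;
}
```

In this model a present, non-special entry is pinned and its count goes up by
one, while a special entry is rejected before the page is ever touched, which
is the property the reply describes as necessary for using fast_gup on normal
pages safely.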