From mboxrd@z Thu Jan 1 00:00:00 1970 From: Steve Capper Subject: [RFC PATCH V5 0/6] get_user_pages_fast for ARM and ARM64 Date: Tue, 6 May 2014 16:30:03 +0100 Message-ID: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> Return-path: Sender: owner-linux-mm@kvack.org To: linux-arm-kernel@lists.infradead.org, catalin.marinas@arm.com, linux@arm.linux.org.uk, linux-arch@vger.kernel.org, linux-mm@kvack.org Cc: will.deacon@arm.com, gary.robertson@linaro.org, christoffer.dall@linaro.org, peterz@infradead.org, anders.roxell@linaro.org, akpm@linux-foundation.org, Steve Capper List-Id: linux-arch.vger.kernel.org Hello, This RFC series implements get_user_pages_fast and __get_user_pages_fast. These are required for Transparent HugePages to function correctly, as a futex on a THP tail will otherwise result in an infinite loop (due to the core implementation of __get_user_pages_fast always returning 0). This series may also be beneficial for direct-IO heavy workloads and certain KVM workloads. The main changes since RFC V4 are: * corrected the arm64 logic so it now correctly rcu-frees page table backing pages. * rcu free logic relaxed for pre-ARMv7 ARM as we need an IPI to invalidate TLBs anyway. * rebased to 3.15-rc3 (some minor changes were needed to allow it to merge). * dropped Catalin's mmu_gather patch as that's been merged already. I would really appreciate any comments (especially on the validity or otherwise of the core fast_gup implementation) and/or testers. Cheers, -- Steve Steve Capper (6): mm: Introduce a general RCU get_user_pages_fast. arm: mm: Introduce special ptes for LPAE arm: mm: Enable HAVE_RCU_TABLE_FREE logic arm: mm: Enable RCU fast_gup arm64: mm: Enable HAVE_RCU_TABLE_FREE logic arm64: mm: Enable RCU fast_gup arch/arm/Kconfig | 4 + arch/arm/include/asm/pgtable-2level.h | 2 + arch/arm/include/asm/pgtable-3level.h | 14 ++ arch/arm/include/asm/pgtable.h | 6 +- arch/arm/include/asm/tlb.h | 38 ++++- arch/arm/mm/flush.c | 19 +++ arch/arm64/Kconfig | 4 + arch/arm64/include/asm/pgtable.h | 8 +- arch/arm64/include/asm/tlb.h | 18 ++- arch/arm64/mm/flush.c | 19 +++ mm/Kconfig | 3 + mm/Makefile | 1 + mm/gup.c | 297 ++++++++++++++++++++++++++++++++++ 13 files changed, 424 insertions(+), 9 deletions(-) create mode 100644 mm/gup.c -- 1.8.1.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Steve Capper Subject: [RFC PATCH V5 1/6] mm: Introduce a general RCU get_user_pages_fast. Date: Tue, 6 May 2014 16:30:04 +0100 Message-ID: <1399390209-1756-2-git-send-email-steve.capper@linaro.org> References: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> Return-path: In-Reply-To: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> Sender: owner-linux-mm@kvack.org To: linux-arm-kernel@lists.infradead.org, catalin.marinas@arm.com, linux@arm.linux.org.uk, linux-arch@vger.kernel.org, linux-mm@kvack.org Cc: will.deacon@arm.com, gary.robertson@linaro.org, christoffer.dall@linaro.org, peterz@infradead.org, anders.roxell@linaro.org, akpm@linux-foundation.org, Steve Capper List-Id: linux-arch.vger.kernel.org A general RCU implementation of get_user_pages_fast. It is based on the PowerPC implementation. The lockless page cache protocols are used as this implementation assumes that TLB invalidations do not necessarily need to be broadcast via IPI. This implementation does however assume that THP splits will broadcast an IPI, and this is why interrupts are disabled in the fast_gup walker (otherwise calls to rcu_read_(un)lock would suffice). Signed-off-by: Steve Capper --- mm/Kconfig | 3 + mm/Makefile | 1 + mm/gup.c | 297 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 301 insertions(+) create mode 100644 mm/gup.c diff --git a/mm/Kconfig b/mm/Kconfig index ebe5880..8848a16 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -134,6 +134,9 @@ config HAVE_MEMBLOCK config HAVE_MEMBLOCK_NODE_MAP boolean +config HAVE_RCU_GUP + boolean + config ARCH_DISCARD_MEMBLOCK boolean diff --git a/mm/Makefile b/mm/Makefile index b484452..83e6ac2 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -29,6 +29,7 @@ else endif obj-$(CONFIG_HAVE_MEMBLOCK) += memblock.o +obj-$(CONFIG_HAVE_RCU_GUP) += gup.o obj-$(CONFIG_BOUNCE) += bounce.o obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o diff --git a/mm/gup.c b/mm/gup.c new file mode 100644 index 0000000..b35296f --- /dev/null +++ b/mm/gup.c @@ -0,0 +1,297 @@ +/* + * mm/gup.c + * + * Copyright (C) 2014 Linaro Ltd. + * + * Based on arch/powerpc/mm/gup.c which is: + * Copyright (C) 2008 Nick Piggin + * Copyright (C) 2008 Novell Inc. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#include +#include +#include +#include +#include +#include + +#ifdef __HAVE_ARCH_PTE_SPECIAL +static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end, + int write, struct page **pages, int *nr) +{ + pte_t *ptep, *ptem; + int ret = 0; + + ptem = ptep = pte_offset_map(&pmd, addr); + do { + pte_t pte = ACCESS_ONCE(*ptep); + struct page *page; + + if (!pte_present(pte) || pte_special(pte) + || (write && !pte_write(pte))) + goto pte_unmap; + + VM_BUG_ON(!pfn_valid(pte_pfn(pte))); + page = pte_page(pte); + + if (!page_cache_get_speculative(page)) + goto pte_unmap; + + if (unlikely(pte_val(pte) != pte_val(*ptep))) { + put_page(page); + goto pte_unmap; + } + + pages[*nr] = page; + (*nr)++; + + } while (ptep++, addr += PAGE_SIZE, addr != end); + + ret = 1; + +pte_unmap: + pte_unmap(ptem); + return ret; +} +#else + +/* + * If we can't determine whether or not a pte is special, then fail immediately + * for ptes. Note, we can still pin HugeTLB and THP as these are guaranteed not + * to be special. + */ +static inline int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end, + int write, struct page **pages, int *nr) +{ + return 0; +} +#endif /* __HAVE_ARCH_PTE_SPECIAL */ + +static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr, + unsigned long end, int write, struct page **pages, int *nr) +{ + struct page *head, *page, *tail; + int refs; + + if (!pmd_present(orig) || (write && !pmd_write(orig))) + return 0; + + refs = 0; + head = pmd_page(orig); + page = head + ((addr & ~PMD_MASK) >> PAGE_SHIFT); + tail = page; + do { + VM_BUG_ON(compound_head(page) != head); + pages[*nr] = page; + (*nr)++; + page++; + refs++; + } while (addr += PAGE_SIZE, addr != end); + + if (!page_cache_add_speculative(head, refs)) { + *nr -= refs; + return 0; + } + + if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) { + *nr -= refs; + while (refs--) + put_page(head); + return 0; + } + + /* + * Any tail pages need their mapcount reference taken before we + * return. (This allows the THP code to bump their ref count when + * they are split into base pages). + */ + while (refs--) { + if (PageTail(tail)) + get_huge_page_tail(tail); + tail++; + } + + return 1; +} + +static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr, + unsigned long end, int write, struct page **pages, int *nr) +{ + struct page *head, *page, *tail; + pmd_t origpmd = __pmd(pud_val(orig)); + int refs; + + if (!pmd_present(origpmd) || (write && !pmd_write(origpmd))) + return 0; + + refs = 0; + head = pmd_page(origpmd); + page = head + ((addr & ~PUD_MASK) >> PAGE_SHIFT); + tail = page; + do { + VM_BUG_ON(compound_head(page) != head); + pages[*nr] = page; + (*nr)++; + page++; + refs++; + } while (addr += PAGE_SIZE, addr != end); + + if (!page_cache_add_speculative(head, refs)) { + *nr -= refs; + return 0; + } + + if (unlikely(pud_val(orig) != pud_val(*pudp))) { + *nr -= refs; + while (refs--) + put_page(head); + return 0; + } + + while (refs--) { + if (PageTail(tail)) + get_huge_page_tail(tail); + tail++; + } + + return 1; +} + +static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end, + int write, struct page **pages, int *nr) +{ + unsigned long next; + pmd_t *pmdp; + + pmdp = pmd_offset(&pud, addr); + do { + pmd_t pmd = ACCESS_ONCE(*pmdp); + next = pmd_addr_end(addr, end); + if (pmd_none(pmd) || pmd_trans_splitting(pmd)) + return 0; + + if (unlikely(pmd_trans_huge(pmd) || pmd_huge(pmd))) { + if (!gup_huge_pmd(pmd, pmdp, addr, next, write, + pages, nr)) + return 0; + } else { + if (!gup_pte_range(pmd, addr, next, write, pages, nr)) + return 0; + } + } while (pmdp++, addr = next, addr != end); + + return 1; +} + +static int gup_pud_range(pgd_t *pgdp, unsigned long addr, unsigned long end, + int write, struct page **pages, int *nr) +{ + unsigned long next; + pud_t *pudp; + + pudp = pud_offset(pgdp, addr); + do { + pud_t pud = ACCESS_ONCE(*pudp); + next = pud_addr_end(addr, end); + if (pud_none(pud)) + return 0; + if (pud_huge(pud)) { + if (!gup_huge_pud(pud, pudp, addr, next, write, + pages, nr)) + return 0; + } else if (!gup_pmd_range(pud, addr, next, write, pages, nr)) + return 0; + } while (pudp++, addr = next, addr != end); + + return 1; +} + +/* + * Like get_user_pages_fast() except its IRQ-safe in that it won't fall + * back to the regular GUP. + */ +int __get_user_pages_fast(unsigned long start, int nr_pages, int write, + struct page **pages) +{ + struct mm_struct *mm = current->mm; + unsigned long addr, len, end; + unsigned long next, flags; + pgd_t *pgdp; + int nr = 0; + + start &= PAGE_MASK; + addr = start; + len = (unsigned long) nr_pages << PAGE_SHIFT; + end = start + len; + + if (unlikely(!access_ok(write ? VERIFY_WRITE : VERIFY_READ, + start, len))) + return 0; + + /* + * Disable interrupts, we use the nested form as we can already + * have interrupts disabled by get_futex_key. + * + * With interrupts disabled, we block page table pages from being + * freed from under us. See mmu_gather_tlb in asm-generic/tlb.h + * for more details. + * + * We do not adopt an rcu_read_lock(.) here as we also want to + * block IPIs that come from THPs splitting. + */ + + local_irq_save(flags); + pgdp = pgd_offset(mm, addr); + do { + next = pgd_addr_end(addr, end); + if (pgd_none(*pgdp)) + break; + else if (!gup_pud_range(pgdp, addr, next, write, pages, &nr)) + break; + } while (pgdp++, addr = next, addr != end); + local_irq_restore(flags); + + return nr; +} + +int get_user_pages_fast(unsigned long start, int nr_pages, int write, + struct page **pages) +{ + struct mm_struct *mm = current->mm; + int nr, ret; + + start &= PAGE_MASK; + nr = __get_user_pages_fast(start, nr_pages, write, pages); + ret = nr; + + if (nr < nr_pages) { + /* Try to get the remaining pages with get_user_pages */ + start += nr << PAGE_SHIFT; + pages += nr; + + down_read(&mm->mmap_sem); + ret = get_user_pages(current, mm, start, + nr_pages - nr, write, 0, pages, NULL); + up_read(&mm->mmap_sem); + + /* Have to be a bit careful with return values */ + if (nr > 0) { + if (ret < 0) + ret = nr; + else + ret += nr; + } + } + + return ret; +} -- 1.8.1.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Steve Capper Subject: [RFC PATCH V5 4/6] arm: mm: Enable RCU fast_gup Date: Tue, 6 May 2014 16:30:07 +0100 Message-ID: <1399390209-1756-5-git-send-email-steve.capper@linaro.org> References: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> Return-path: In-Reply-To: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> Sender: owner-linux-mm@kvack.org To: linux-arm-kernel@lists.infradead.org, catalin.marinas@arm.com, linux@arm.linux.org.uk, linux-arch@vger.kernel.org, linux-mm@kvack.org Cc: will.deacon@arm.com, gary.robertson@linaro.org, christoffer.dall@linaro.org, peterz@infradead.org, anders.roxell@linaro.org, akpm@linux-foundation.org, Steve Capper List-Id: linux-arch.vger.kernel.org Activate the RCU fast_gup for ARM. We also need to force THP splits to broadcast an IPI s.t. we block in the fast_gup page walker. As THP splits are comparatively rare, this should not lead to a noticeable performance degradation. Signed-off-by: Steve Capper --- arch/arm/Kconfig | 3 +++ arch/arm/include/asm/pgtable-3level.h | 6 ++++++ arch/arm/mm/flush.c | 19 +++++++++++++++++++ 3 files changed, 28 insertions(+) diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 6cfdb3b..d0572f1 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -1803,6 +1803,9 @@ config ARCH_SELECT_MEMORY_MODEL config HAVE_ARCH_PFN_VALID def_bool ARCH_HAS_HOLES_MEMORYMODEL || !SPARSEMEM +config HAVE_RCU_GUP + def_bool y + config HIGHMEM bool "High Memory Support" depends on MMU diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h index b286ba9..fdc4a4f 100644 --- a/arch/arm/include/asm/pgtable-3level.h +++ b/arch/arm/include/asm/pgtable-3level.h @@ -226,6 +226,12 @@ static inline pte_t pte_mkspecial(pte_t pte) #ifdef CONFIG_TRANSPARENT_HUGEPAGE #define pmd_trans_huge(pmd) (pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT)) #define pmd_trans_splitting(pmd) (pmd_val(pmd) & PMD_SECT_SPLITTING) + +#ifdef CONFIG_HAVE_RCU_TABLE_FREE +#define __HAVE_ARCH_PMDP_SPLITTING_FLUSH +void pmdp_splitting_flush(struct vm_area_struct *vma, unsigned long address, + pmd_t *pmdp); +#endif #endif #define PMD_BIT_FUNC(fn,op) \ diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c index 3387e60..91a2b59 100644 --- a/arch/arm/mm/flush.c +++ b/arch/arm/mm/flush.c @@ -377,3 +377,22 @@ void __flush_anon_page(struct vm_area_struct *vma, struct page *page, unsigned l */ __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE); } + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +#ifdef CONFIG_HAVE_RCU_TABLE_FREE +static void thp_splitting_flush_sync(void *arg) +{ +} + +void pmdp_splitting_flush(struct vm_area_struct *vma, unsigned long address, + pmd_t *pmdp) +{ + pmd_t pmd = pmd_mksplitting(*pmdp); + VM_BUG_ON(address & ~PMD_MASK); + set_pmd_at(vma->vm_mm, address, pmdp, pmd); + + /* dummy IPI to serialise against fast_gup */ + smp_call_function(thp_splitting_flush_sync, NULL, 1); +} +#endif /* CONFIG_HAVE_RCU_TABLE_FREE */ +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ -- 1.8.1.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Steve Capper Subject: [RFC PATCH V5 5/6] arm64: mm: Enable HAVE_RCU_TABLE_FREE logic Date: Tue, 6 May 2014 16:30:08 +0100 Message-ID: <1399390209-1756-6-git-send-email-steve.capper@linaro.org> References: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> Return-path: In-Reply-To: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> Sender: owner-linux-mm@kvack.org To: linux-arm-kernel@lists.infradead.org, catalin.marinas@arm.com, linux@arm.linux.org.uk, linux-arch@vger.kernel.org, linux-mm@kvack.org Cc: will.deacon@arm.com, gary.robertson@linaro.org, christoffer.dall@linaro.org, peterz@infradead.org, anders.roxell@linaro.org, akpm@linux-foundation.org, Steve Capper List-Id: linux-arch.vger.kernel.org In order to implement fast_get_user_pages we need to ensure that the page table walker is protected from page table pages being freed from under it. This patch enables HAVE_RCU_TABLE_FREE, any page table pages belonging to address spaces with multiple users will be call_rcu_sched freed. Meaning that disabling interrupts will block the free and protect the fast gup page walker. Signed-off-by: Steve Capper --- arch/arm64/Kconfig | 1 + arch/arm64/include/asm/tlb.h | 18 ++++++++++++++++-- 2 files changed, 17 insertions(+), 2 deletions(-) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index e759af5..2420390 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -43,6 +43,7 @@ config ARM64 select HAVE_PERF_EVENTS select HAVE_PERF_REGS select HAVE_PERF_USER_STACK_DUMP + select HAVE_RCU_TABLE_FREE select IRQ_DOMAIN select MODULES_USE_ELF_RELA select NO_BOOTMEM diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h index 80e2c08..8e4dde5 100644 --- a/arch/arm64/include/asm/tlb.h +++ b/arch/arm64/include/asm/tlb.h @@ -23,6 +23,20 @@ #include +#include +#include + +#ifdef CONFIG_HAVE_RCU_TABLE_FREE + +#define tlb_remove_entry(tlb, entry) tlb_remove_table(tlb, entry) +static inline void __tlb_remove_table(void *_table) +{ + free_page_and_swap_cache((struct page *)_table); +} +#else +#define tlb_remove_entry(tlb, entry) tlb_remove_page(tlb, entry) +#endif /* CONFIG_HAVE_RCU_TABLE_FREE */ + /* * There's three ways the TLB shootdown code is used: * 1. Unmapping a range of vmas. See zap_page_range(), unmap_region(). @@ -88,7 +102,7 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte, { pgtable_page_dtor(pte); tlb_add_flush(tlb, addr); - tlb_remove_page(tlb, pte); + tlb_remove_entry(tlb, pte); } #ifndef CONFIG_ARM64_64K_PAGES @@ -96,7 +110,7 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp, unsigned long addr) { tlb_add_flush(tlb, addr); - tlb_remove_page(tlb, virt_to_page(pmdp)); + tlb_remove_entry(tlb, virt_to_page(pmdp)); } #endif -- 1.8.1.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Steve Capper Subject: [RFC PATCH V5 3/6] arm: mm: Enable HAVE_RCU_TABLE_FREE logic Date: Tue, 6 May 2014 16:30:06 +0100 Message-ID: <1399390209-1756-4-git-send-email-steve.capper@linaro.org> References: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> Return-path: Received: from mail-we0-f172.google.com ([74.125.82.172]:61787 "EHLO mail-we0-f172.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752046AbaEFPab (ORCPT ); Tue, 6 May 2014 11:30:31 -0400 Received: by mail-we0-f172.google.com with SMTP id k48so3516626wev.17 for ; Tue, 06 May 2014 08:30:30 -0700 (PDT) In-Reply-To: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> Sender: linux-arch-owner@vger.kernel.org List-ID: To: linux-arm-kernel@lists.infradead.org, catalin.marinas@arm.com, linux@arm.linux.org.uk, linux-arch@vger.kernel.org, linux-mm@kvack.org Cc: will.deacon@arm.com, gary.robertson@linaro.org, christoffer.dall@linaro.org, peterz@infradead.org, anders.roxell@linaro.org, akpm@linux-foundation.org, Steve Capper In order to implement fast_get_user_pages we need to ensure that the page table walker is protected from page table pages being freed from under it. One way to achieve this is to have the walker disable interrupts, and rely on IPIs from the TLB flushing code blocking before the page table pages are freed. On some ARM platforms we have hardware TLB invalidation, thus the TLB flushing code won't necessarily broadcast IPIs. Also spuriously broadcasting IPIs can hurt system performance if done too often. This problem has already been solved on PowerPC and Sparc by batching up page table pages belonging to more than one mm_user, then scheduling an rcu_sched callback to free the pages. If one were to disable interrupts, that would delay the scheduling interrupts thus block the page table pages being freed. This logic has also been promoted to core code and is activated when one enables HAVE_RCU_TABLE_FREE. This patch enables HAVE_RCU_TABLE_FREE and incorporates it into the existing ARM TLB logic. Signed-off-by: Steve Capper --- arch/arm/Kconfig | 1 + arch/arm/include/asm/tlb.h | 38 ++++++++++++++++++++++++++++++++++++-- 2 files changed, 37 insertions(+), 2 deletions(-) diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index db3c541..6cfdb3b 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -59,6 +59,7 @@ config ARM select HAVE_PERF_EVENTS select HAVE_PERF_REGS select HAVE_PERF_USER_STACK_DUMP + select HAVE_RCU_TABLE_FREE if (SMP && CPU_V7) select HAVE_REGS_AND_STACK_ACCESS_API select HAVE_SYSCALL_TRACEPOINTS select HAVE_UID16 diff --git a/arch/arm/include/asm/tlb.h b/arch/arm/include/asm/tlb.h index f1a0dac..3cadb72 100644 --- a/arch/arm/include/asm/tlb.h +++ b/arch/arm/include/asm/tlb.h @@ -35,12 +35,39 @@ #define MMU_GATHER_BUNDLE 8 +#ifdef CONFIG_HAVE_RCU_TABLE_FREE +static inline void __tlb_remove_table(void *_table) +{ + free_page_and_swap_cache((struct page *)_table); +} + +struct mmu_table_batch { + struct rcu_head rcu; + unsigned int nr; + void *tables[0]; +}; + +#define MAX_TABLE_BATCH \ + ((PAGE_SIZE - sizeof(struct mmu_table_batch)) / sizeof(void *)) + +extern void tlb_table_flush(struct mmu_gather *tlb); +extern void tlb_remove_table(struct mmu_gather *tlb, void *table); + +#define tlb_remove_entry(tlb, entry) tlb_remove_table(tlb, entry) +#else +#define tlb_remove_entry(tlb, entry) tlb_remove_page(tlb, entry) +#endif /* CONFIG_HAVE_RCU_TABLE_FREE */ + /* * TLB handling. This allows us to remove pages from the page * tables, and efficiently handle the TLB issues. */ struct mmu_gather { struct mm_struct *mm; +#ifdef CONFIG_HAVE_RCU_TABLE_FREE + struct mmu_table_batch *batch; + unsigned int need_flush; +#endif unsigned int fullmm; struct vm_area_struct *vma; unsigned long start, end; @@ -101,6 +128,9 @@ static inline void __tlb_alloc_page(struct mmu_gather *tlb) static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb) { tlb_flush(tlb); +#ifdef CONFIG_HAVE_RCU_TABLE_FREE + tlb_table_flush(tlb); +#endif } static inline void tlb_flush_mmu_free(struct mmu_gather *tlb) @@ -129,6 +159,10 @@ tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned long start tlb->pages = tlb->local; tlb->nr = 0; __tlb_alloc_page(tlb); + +#ifdef CONFIG_HAVE_RCU_TABLE_FREE + tlb->batch = NULL; +#endif } static inline void @@ -205,7 +239,7 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte, tlb_add_flush(tlb, addr + SZ_1M); #endif - tlb_remove_page(tlb, pte); + tlb_remove_entry(tlb, pte); } static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp, @@ -213,7 +247,7 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp, { #ifdef CONFIG_ARM_LPAE tlb_add_flush(tlb, addr); - tlb_remove_page(tlb, virt_to_page(pmdp)); + tlb_remove_entry(tlb, virt_to_page(pmdp)); #endif } -- 1.8.1.4 From mboxrd@z Thu Jan 1 00:00:00 1970 From: Steve Capper Subject: [RFC PATCH V5 2/6] arm: mm: Introduce special ptes for LPAE Date: Tue, 6 May 2014 16:30:05 +0100 Message-ID: <1399390209-1756-3-git-send-email-steve.capper@linaro.org> References: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> Return-path: In-Reply-To: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> Sender: owner-linux-mm@kvack.org To: linux-arm-kernel@lists.infradead.org, catalin.marinas@arm.com, linux@arm.linux.org.uk, linux-arch@vger.kernel.org, linux-mm@kvack.org Cc: will.deacon@arm.com, gary.robertson@linaro.org, christoffer.dall@linaro.org, peterz@infradead.org, anders.roxell@linaro.org, akpm@linux-foundation.org, Steve Capper List-Id: linux-arch.vger.kernel.org We need a mechanism to tag ptes as being special, this indicates that no attempt should be made to access the underlying struct page * associated with the pte. This is used by the fast_gup when operating on ptes as it has no means to access VMAs (that also contain this information) locklessly. The L_PTE_SPECIAL bit is already allocated for LPAE, this patch modifies pte_special and pte_mkspecial to make use of it, and defines __HAVE_ARCH_PTE_SPECIAL. This patch also excludes special ptes from the icache/dcache sync logic. Signed-off-by: Steve Capper --- arch/arm/include/asm/pgtable-2level.h | 2 ++ arch/arm/include/asm/pgtable-3level.h | 8 ++++++++ arch/arm/include/asm/pgtable.h | 6 ++---- 3 files changed, 12 insertions(+), 4 deletions(-) diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h index 219ac88..f027941 100644 --- a/arch/arm/include/asm/pgtable-2level.h +++ b/arch/arm/include/asm/pgtable-2level.h @@ -182,6 +182,8 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr) #define pmd_addr_end(addr,end) (end) #define set_pte_ext(ptep,pte,ext) cpu_set_pte_ext(ptep,pte,ext) +#define pte_special(pte) (0) +static inline pte_t pte_mkspecial(pte_t pte) { return pte; } /* * We don't have huge page support for short descriptors, for the moment diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h index 85c60ad..b286ba9 100644 --- a/arch/arm/include/asm/pgtable-3level.h +++ b/arch/arm/include/asm/pgtable-3level.h @@ -207,6 +207,14 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr) #define pte_huge(pte) (pte_val(pte) && !(pte_val(pte) & PTE_TABLE_BIT)) #define pte_mkhuge(pte) (__pte(pte_val(pte) & ~PTE_TABLE_BIT)) +#define pte_special(pte) (!!(pte_val(pte) & L_PTE_SPECIAL)) +static inline pte_t pte_mkspecial(pte_t pte) +{ + pte_val(pte) |= L_PTE_SPECIAL; + return pte; +} +#define __HAVE_ARCH_PTE_SPECIAL + #define pmd_young(pmd) (pmd_val(pmd) & PMD_SECT_AF) #define __HAVE_ARCH_PMD_WRITE diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h index 5478e5d..63b1db2 100644 --- a/arch/arm/include/asm/pgtable.h +++ b/arch/arm/include/asm/pgtable.h @@ -222,7 +222,6 @@ static inline pte_t *pmd_page_vaddr(pmd_t pmd) #define pte_dirty(pte) (pte_val(pte) & L_PTE_DIRTY) #define pte_young(pte) (pte_val(pte) & L_PTE_YOUNG) #define pte_exec(pte) (!(pte_val(pte) & L_PTE_XN)) -#define pte_special(pte) (0) #define pte_valid_user(pte) \ (pte_valid(pte) && (pte_val(pte) & L_PTE_USER) && pte_young(pte)) @@ -241,7 +240,8 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr, unsigned long ext = 0; if (addr < TASK_SIZE && pte_valid_user(pteval)) { - __sync_icache_dcache(pteval); + if (!pte_special(pteval)) + __sync_icache_dcache(pteval); ext |= PTE_EXT_NG; } @@ -260,8 +260,6 @@ PTE_BIT_FUNC(mkyoung, |= L_PTE_YOUNG); PTE_BIT_FUNC(mkexec, &= ~L_PTE_XN); PTE_BIT_FUNC(mknexec, |= L_PTE_XN); -static inline pte_t pte_mkspecial(pte_t pte) { return pte; } - static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) { const pteval_t mask = L_PTE_XN | L_PTE_RDONLY | L_PTE_USER | -- 1.8.1.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Steve Capper Subject: [RFC PATCH V5 6/6] arm64: mm: Enable RCU fast_gup Date: Tue, 6 May 2014 16:30:09 +0100 Message-ID: <1399390209-1756-7-git-send-email-steve.capper@linaro.org> References: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=m.gmane.org@lists.infradead.org To: linux-arm-kernel@lists.infradead.org, catalin.marinas@arm.com, linux@arm.linux.org.uk, linux-arch@vger.kernel.org, linux-mm@kvack.org Cc: anders.roxell@linaro.org, peterz@infradead.org, gary.robertson@linaro.org, will.deacon@arm.com, Steve Capper , akpm@linux-foundation.org, christoffer.dall@linaro.org List-Id: linux-arch.vger.kernel.org Activate the RCU fast_gup for ARM64. We also need to force THP splits to broadcast an IPI s.t. we block in the fast_gup page walker. As THP splits are comparatively rare, this should not lead to a noticeable performance degradation. Signed-off-by: Steve Capper --- arch/arm64/Kconfig | 3 +++ arch/arm64/include/asm/pgtable.h | 8 +++++++- arch/arm64/mm/flush.c | 19 +++++++++++++++++++ 3 files changed, 29 insertions(+), 1 deletion(-) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 2420390..5168949 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -95,6 +95,9 @@ config GENERIC_CALIBRATE_DELAY config ZONE_DMA def_bool y +config HAVE_RCU_GUP + def_bool y + config ARCH_DMA_ADDR_T_64BIT def_bool y diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index 90c811f..126ed77e 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -244,7 +244,13 @@ static inline pmd_t pte_pmd(pte_t pte) #ifdef CONFIG_TRANSPARENT_HUGEPAGE #define pmd_trans_huge(pmd) (pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT)) #define pmd_trans_splitting(pmd) pte_special(pmd_pte(pmd)) -#endif +#ifdef CONFIG_HAVE_RCU_TABLE_FREE +#define __HAVE_ARCH_PMDP_SPLITTING_FLUSH +struct vm_area_struct; +void pmdp_splitting_flush(struct vm_area_struct *vma, unsigned long address, + pmd_t *pmdp); +#endif /* CONFIG_HAVE_RCU_TABLE_FREE */ +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ #define pmd_young(pmd) pte_young(pmd_pte(pmd)) #define pmd_wrprotect(pmd) pte_pmd(pte_wrprotect(pmd_pte(pmd))) diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c index e4193e3..ddf96c1 100644 --- a/arch/arm64/mm/flush.c +++ b/arch/arm64/mm/flush.c @@ -103,3 +103,22 @@ EXPORT_SYMBOL(flush_dcache_page); */ EXPORT_SYMBOL(flush_cache_all); EXPORT_SYMBOL(flush_icache_range); + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +#ifdef CONFIG_HAVE_RCU_TABLE_FREE +static void thp_splitting_flush_sync(void *arg) +{ +} + +void pmdp_splitting_flush(struct vm_area_struct *vma, unsigned long address, + pmd_t *pmdp) +{ + pmd_t pmd = pmd_mksplitting(*pmdp); + VM_BUG_ON(address & ~PMD_MASK); + set_pmd_at(vma->vm_mm, address, pmdp, pmd); + + /* dummy IPI to serialise against fast_gup */ + smp_call_function(thp_splitting_flush_sync, NULL, 1); +} +#endif /* CONFIG_HAVE_RCU_TABLE_FREE */ +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ -- 1.8.1.4 From mboxrd@z Thu Jan 1 00:00:00 1970 From: Christopher Covington Subject: Re: [RFC PATCH V5 4/6] arm: mm: Enable RCU fast_gup Date: Tue, 13 May 2014 11:31:24 -0400 Message-ID: <53723ACC.7020500@codeaurora.org> References: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> <1399390209-1756-5-git-send-email-steve.capper@linaro.org> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <1399390209-1756-5-git-send-email-steve.capper@linaro.org> Sender: owner-linux-mm@kvack.org To: Steve Capper Cc: linux-arm-kernel@lists.infradead.org, catalin.marinas@arm.com, linux@arm.linux.org.uk, linux-arch@vger.kernel.org, linux-mm@kvack.org, anders.roxell@linaro.org, peterz@infradead.org, gary.robertson@linaro.org, will.deacon@arm.com, akpm@linux-foundation.org, christoffer.dall@linaro.org List-Id: linux-arch.vger.kernel.org Hi Steve, On 05/06/2014 11:30 AM, Steve Capper wrote: > Activate the RCU fast_gup for ARM. We also need to force THP splits to > broadcast an IPI s.t. we block in the fast_gup page walker. As THP > splits are comparatively rare, this should not lead to a noticeable > performance degradation. > diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c > index 3387e60..91a2b59 100644 > --- a/arch/arm/mm/flush.c > +++ b/arch/arm/mm/flush.c > @@ -377,3 +377,22 @@ void __flush_anon_page(struct vm_area_struct *vma, struct page *page, unsigned l > */ > __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE); > } > + > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE > +#ifdef CONFIG_HAVE_RCU_TABLE_FREE This is trivia, but I for one find the form #if defined(a) && defined(b) easier to read. (Applies to the A64 version as well). Christopher -- Employee of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by the Linux Foundation. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Steve Capper Subject: Re: [RFC PATCH V5 4/6] arm: mm: Enable RCU fast_gup Date: Wed, 14 May 2014 09:34:16 +0100 Message-ID: References: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> <1399390209-1756-5-git-send-email-steve.capper@linaro.org> <53723ACC.7020500@codeaurora.org> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Return-path: In-Reply-To: <53723ACC.7020500@codeaurora.org> Sender: owner-linux-mm@kvack.org To: Christopher Covington Cc: "linux-arm-kernel@lists.infradead.org" , Catalin Marinas , "linux@arm.linux.org.uk" , "linux-arch@vger.kernel.org" , "linux-mm@kvack.org" , Anders Roxell , Peter Zijlstra , Gary Robertson , Will Deacon , "akpm@linux-foundation.org" , Christoffer Dall List-Id: linux-arch.vger.kernel.org On 13 May 2014 16:31, Christopher Covington wrote: > Hi Steve, > > On 05/06/2014 11:30 AM, Steve Capper wrote: >> Activate the RCU fast_gup for ARM. We also need to force THP splits to >> broadcast an IPI s.t. we block in the fast_gup page walker. As THP >> splits are comparatively rare, this should not lead to a noticeable >> performance degradation. > >> diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c >> index 3387e60..91a2b59 100644 >> --- a/arch/arm/mm/flush.c >> +++ b/arch/arm/mm/flush.c >> @@ -377,3 +377,22 @@ void __flush_anon_page(struct vm_area_struct *vma, struct page *page, unsigned l >> */ >> __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE); >> } >> + >> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE >> +#ifdef CONFIG_HAVE_RCU_TABLE_FREE > > This is trivia, but I for one find the form #if defined(a) && defined(b) > easier to read. (Applies to the A64 version as well). > Thank you Christopher, I agree that looks nicer. Cheers, -- Steve > Christopher > > -- > Employee of Qualcomm Innovation Center, Inc. > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, > hosted by the Linux Foundation. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wg0-f42.google.com ([74.125.82.42]:41674 "EHLO mail-wg0-f42.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752018AbaEFPaZ (ORCPT ); Tue, 6 May 2014 11:30:25 -0400 Received: by mail-wg0-f42.google.com with SMTP id y10so2979607wgg.1 for ; Tue, 06 May 2014 08:30:24 -0700 (PDT) From: Steve Capper Subject: [RFC PATCH V5 0/6] get_user_pages_fast for ARM and ARM64 Date: Tue, 6 May 2014 16:30:03 +0100 Message-ID: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> Sender: linux-arch-owner@vger.kernel.org List-ID: To: linux-arm-kernel@lists.infradead.org, catalin.marinas@arm.com, linux@arm.linux.org.uk, linux-arch@vger.kernel.org, linux-mm@kvack.org Cc: will.deacon@arm.com, gary.robertson@linaro.org, christoffer.dall@linaro.org, peterz@infradead.org, anders.roxell@linaro.org, akpm@linux-foundation.org, Steve Capper Message-ID: <20140506153003.2GCE-6KaxdF_P53s_LCD3mLWRAQAbCJbZSV-LJ5YZJs@z> Hello, This RFC series implements get_user_pages_fast and __get_user_pages_fast. These are required for Transparent HugePages to function correctly, as a futex on a THP tail will otherwise result in an infinite loop (due to the core implementation of __get_user_pages_fast always returning 0). This series may also be beneficial for direct-IO heavy workloads and certain KVM workloads. The main changes since RFC V4 are: * corrected the arm64 logic so it now correctly rcu-frees page table backing pages. * rcu free logic relaxed for pre-ARMv7 ARM as we need an IPI to invalidate TLBs anyway. * rebased to 3.15-rc3 (some minor changes were needed to allow it to merge). * dropped Catalin's mmu_gather patch as that's been merged already. I would really appreciate any comments (especially on the validity or otherwise of the core fast_gup implementation) and/or testers. Cheers, -- Steve Steve Capper (6): mm: Introduce a general RCU get_user_pages_fast. arm: mm: Introduce special ptes for LPAE arm: mm: Enable HAVE_RCU_TABLE_FREE logic arm: mm: Enable RCU fast_gup arm64: mm: Enable HAVE_RCU_TABLE_FREE logic arm64: mm: Enable RCU fast_gup arch/arm/Kconfig | 4 + arch/arm/include/asm/pgtable-2level.h | 2 + arch/arm/include/asm/pgtable-3level.h | 14 ++ arch/arm/include/asm/pgtable.h | 6 +- arch/arm/include/asm/tlb.h | 38 ++++- arch/arm/mm/flush.c | 19 +++ arch/arm64/Kconfig | 4 + arch/arm64/include/asm/pgtable.h | 8 +- arch/arm64/include/asm/tlb.h | 18 ++- arch/arm64/mm/flush.c | 19 +++ mm/Kconfig | 3 + mm/Makefile | 1 + mm/gup.c | 297 ++++++++++++++++++++++++++++++++++ 13 files changed, 424 insertions(+), 9 deletions(-) create mode 100644 mm/gup.c -- 1.8.1.4 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-we0-f174.google.com ([74.125.82.174]:55625 "EHLO mail-we0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752018AbaEFPa3 (ORCPT ); Tue, 6 May 2014 11:30:29 -0400 Received: by mail-we0-f174.google.com with SMTP id k48so9062740wev.5 for ; Tue, 06 May 2014 08:30:28 -0700 (PDT) From: Steve Capper Subject: [RFC PATCH V5 2/6] arm: mm: Introduce special ptes for LPAE Date: Tue, 6 May 2014 16:30:05 +0100 Message-ID: <1399390209-1756-3-git-send-email-steve.capper@linaro.org> In-Reply-To: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> References: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> Sender: linux-arch-owner@vger.kernel.org List-ID: To: linux-arm-kernel@lists.infradead.org, catalin.marinas@arm.com, linux@arm.linux.org.uk, linux-arch@vger.kernel.org, linux-mm@kvack.org Cc: will.deacon@arm.com, gary.robertson@linaro.org, christoffer.dall@linaro.org, peterz@infradead.org, anders.roxell@linaro.org, akpm@linux-foundation.org, Steve Capper Message-ID: <20140506153005.iNrlN9WgJH_TYbOvnhsPnQJem7o-7aan_f0422D1uqA@z> We need a mechanism to tag ptes as being special, this indicates that no attempt should be made to access the underlying struct page * associated with the pte. This is used by the fast_gup when operating on ptes as it has no means to access VMAs (that also contain this information) locklessly. The L_PTE_SPECIAL bit is already allocated for LPAE, this patch modifies pte_special and pte_mkspecial to make use of it, and defines __HAVE_ARCH_PTE_SPECIAL. This patch also excludes special ptes from the icache/dcache sync logic. Signed-off-by: Steve Capper --- arch/arm/include/asm/pgtable-2level.h | 2 ++ arch/arm/include/asm/pgtable-3level.h | 8 ++++++++ arch/arm/include/asm/pgtable.h | 6 ++---- 3 files changed, 12 insertions(+), 4 deletions(-) diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h index 219ac88..f027941 100644 --- a/arch/arm/include/asm/pgtable-2level.h +++ b/arch/arm/include/asm/pgtable-2level.h @@ -182,6 +182,8 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr) #define pmd_addr_end(addr,end) (end) #define set_pte_ext(ptep,pte,ext) cpu_set_pte_ext(ptep,pte,ext) +#define pte_special(pte) (0) +static inline pte_t pte_mkspecial(pte_t pte) { return pte; } /* * We don't have huge page support for short descriptors, for the moment diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h index 85c60ad..b286ba9 100644 --- a/arch/arm/include/asm/pgtable-3level.h +++ b/arch/arm/include/asm/pgtable-3level.h @@ -207,6 +207,14 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr) #define pte_huge(pte) (pte_val(pte) && !(pte_val(pte) & PTE_TABLE_BIT)) #define pte_mkhuge(pte) (__pte(pte_val(pte) & ~PTE_TABLE_BIT)) +#define pte_special(pte) (!!(pte_val(pte) & L_PTE_SPECIAL)) +static inline pte_t pte_mkspecial(pte_t pte) +{ + pte_val(pte) |= L_PTE_SPECIAL; + return pte; +} +#define __HAVE_ARCH_PTE_SPECIAL + #define pmd_young(pmd) (pmd_val(pmd) & PMD_SECT_AF) #define __HAVE_ARCH_PMD_WRITE diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h index 5478e5d..63b1db2 100644 --- a/arch/arm/include/asm/pgtable.h +++ b/arch/arm/include/asm/pgtable.h @@ -222,7 +222,6 @@ static inline pte_t *pmd_page_vaddr(pmd_t pmd) #define pte_dirty(pte) (pte_val(pte) & L_PTE_DIRTY) #define pte_young(pte) (pte_val(pte) & L_PTE_YOUNG) #define pte_exec(pte) (!(pte_val(pte) & L_PTE_XN)) -#define pte_special(pte) (0) #define pte_valid_user(pte) \ (pte_valid(pte) && (pte_val(pte) & L_PTE_USER) && pte_young(pte)) @@ -241,7 +240,8 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr, unsigned long ext = 0; if (addr < TASK_SIZE && pte_valid_user(pteval)) { - __sync_icache_dcache(pteval); + if (!pte_special(pteval)) + __sync_icache_dcache(pteval); ext |= PTE_EXT_NG; } @@ -260,8 +260,6 @@ PTE_BIT_FUNC(mkyoung, |= L_PTE_YOUNG); PTE_BIT_FUNC(mkexec, &= ~L_PTE_XN); PTE_BIT_FUNC(mknexec, |= L_PTE_XN); -static inline pte_t pte_mkspecial(pte_t pte) { return pte; } - static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) { const pteval_t mask = L_PTE_XN | L_PTE_RDONLY | L_PTE_USER | -- 1.8.1.4 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-we0-f179.google.com ([74.125.82.179]:40623 "EHLO mail-we0-f179.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752018AbaEFPac (ORCPT ); Tue, 6 May 2014 11:30:32 -0400 Received: by mail-we0-f179.google.com with SMTP id q59so6919133wes.10 for ; Tue, 06 May 2014 08:30:31 -0700 (PDT) From: Steve Capper Subject: [RFC PATCH V5 4/6] arm: mm: Enable RCU fast_gup Date: Tue, 6 May 2014 16:30:07 +0100 Message-ID: <1399390209-1756-5-git-send-email-steve.capper@linaro.org> In-Reply-To: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> References: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> Sender: linux-arch-owner@vger.kernel.org List-ID: To: linux-arm-kernel@lists.infradead.org, catalin.marinas@arm.com, linux@arm.linux.org.uk, linux-arch@vger.kernel.org, linux-mm@kvack.org Cc: will.deacon@arm.com, gary.robertson@linaro.org, christoffer.dall@linaro.org, peterz@infradead.org, anders.roxell@linaro.org, akpm@linux-foundation.org, Steve Capper Message-ID: <20140506153007.Zdi8suMHUDnNmQWpzriq22Ri5Pam7A3mUFkvrnuEsyo@z> Activate the RCU fast_gup for ARM. We also need to force THP splits to broadcast an IPI s.t. we block in the fast_gup page walker. As THP splits are comparatively rare, this should not lead to a noticeable performance degradation. Signed-off-by: Steve Capper --- arch/arm/Kconfig | 3 +++ arch/arm/include/asm/pgtable-3level.h | 6 ++++++ arch/arm/mm/flush.c | 19 +++++++++++++++++++ 3 files changed, 28 insertions(+) diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 6cfdb3b..d0572f1 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -1803,6 +1803,9 @@ config ARCH_SELECT_MEMORY_MODEL config HAVE_ARCH_PFN_VALID def_bool ARCH_HAS_HOLES_MEMORYMODEL || !SPARSEMEM +config HAVE_RCU_GUP + def_bool y + config HIGHMEM bool "High Memory Support" depends on MMU diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h index b286ba9..fdc4a4f 100644 --- a/arch/arm/include/asm/pgtable-3level.h +++ b/arch/arm/include/asm/pgtable-3level.h @@ -226,6 +226,12 @@ static inline pte_t pte_mkspecial(pte_t pte) #ifdef CONFIG_TRANSPARENT_HUGEPAGE #define pmd_trans_huge(pmd) (pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT)) #define pmd_trans_splitting(pmd) (pmd_val(pmd) & PMD_SECT_SPLITTING) + +#ifdef CONFIG_HAVE_RCU_TABLE_FREE +#define __HAVE_ARCH_PMDP_SPLITTING_FLUSH +void pmdp_splitting_flush(struct vm_area_struct *vma, unsigned long address, + pmd_t *pmdp); +#endif #endif #define PMD_BIT_FUNC(fn,op) \ diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c index 3387e60..91a2b59 100644 --- a/arch/arm/mm/flush.c +++ b/arch/arm/mm/flush.c @@ -377,3 +377,22 @@ void __flush_anon_page(struct vm_area_struct *vma, struct page *page, unsigned l */ __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE); } + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +#ifdef CONFIG_HAVE_RCU_TABLE_FREE +static void thp_splitting_flush_sync(void *arg) +{ +} + +void pmdp_splitting_flush(struct vm_area_struct *vma, unsigned long address, + pmd_t *pmdp) +{ + pmd_t pmd = pmd_mksplitting(*pmdp); + VM_BUG_ON(address & ~PMD_MASK); + set_pmd_at(vma->vm_mm, address, pmdp, pmd); + + /* dummy IPI to serialise against fast_gup */ + smp_call_function(thp_splitting_flush_sync, NULL, 1); +} +#endif /* CONFIG_HAVE_RCU_TABLE_FREE */ +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ -- 1.8.1.4 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-we0-f180.google.com ([74.125.82.180]:55408 "EHLO mail-we0-f180.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752018AbaEFPa1 (ORCPT ); Tue, 6 May 2014 11:30:27 -0400 Received: by mail-we0-f180.google.com with SMTP id t61so3813122wes.25 for ; Tue, 06 May 2014 08:30:26 -0700 (PDT) From: Steve Capper Subject: [RFC PATCH V5 1/6] mm: Introduce a general RCU get_user_pages_fast. Date: Tue, 6 May 2014 16:30:04 +0100 Message-ID: <1399390209-1756-2-git-send-email-steve.capper@linaro.org> In-Reply-To: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> References: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> Sender: linux-arch-owner@vger.kernel.org List-ID: To: linux-arm-kernel@lists.infradead.org, catalin.marinas@arm.com, linux@arm.linux.org.uk, linux-arch@vger.kernel.org, linux-mm@kvack.org Cc: will.deacon@arm.com, gary.robertson@linaro.org, christoffer.dall@linaro.org, peterz@infradead.org, anders.roxell@linaro.org, akpm@linux-foundation.org, Steve Capper Message-ID: <20140506153004.D6lcPfoe0hNl0FxkleV6t8o3MWv5qnuQl1EazN9CDZI@z> A general RCU implementation of get_user_pages_fast. It is based on the PowerPC implementation. The lockless page cache protocols are used as this implementation assumes that TLB invalidations do not necessarily need to be broadcast via IPI. This implementation does however assume that THP splits will broadcast an IPI, and this is why interrupts are disabled in the fast_gup walker (otherwise calls to rcu_read_(un)lock would suffice). Signed-off-by: Steve Capper --- mm/Kconfig | 3 + mm/Makefile | 1 + mm/gup.c | 297 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 301 insertions(+) create mode 100644 mm/gup.c diff --git a/mm/Kconfig b/mm/Kconfig index ebe5880..8848a16 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -134,6 +134,9 @@ config HAVE_MEMBLOCK config HAVE_MEMBLOCK_NODE_MAP boolean +config HAVE_RCU_GUP + boolean + config ARCH_DISCARD_MEMBLOCK boolean diff --git a/mm/Makefile b/mm/Makefile index b484452..83e6ac2 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -29,6 +29,7 @@ else endif obj-$(CONFIG_HAVE_MEMBLOCK) += memblock.o +obj-$(CONFIG_HAVE_RCU_GUP) += gup.o obj-$(CONFIG_BOUNCE) += bounce.o obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o diff --git a/mm/gup.c b/mm/gup.c new file mode 100644 index 0000000..b35296f --- /dev/null +++ b/mm/gup.c @@ -0,0 +1,297 @@ +/* + * mm/gup.c + * + * Copyright (C) 2014 Linaro Ltd. + * + * Based on arch/powerpc/mm/gup.c which is: + * Copyright (C) 2008 Nick Piggin + * Copyright (C) 2008 Novell Inc. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#include +#include +#include +#include +#include +#include + +#ifdef __HAVE_ARCH_PTE_SPECIAL +static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end, + int write, struct page **pages, int *nr) +{ + pte_t *ptep, *ptem; + int ret = 0; + + ptem = ptep = pte_offset_map(&pmd, addr); + do { + pte_t pte = ACCESS_ONCE(*ptep); + struct page *page; + + if (!pte_present(pte) || pte_special(pte) + || (write && !pte_write(pte))) + goto pte_unmap; + + VM_BUG_ON(!pfn_valid(pte_pfn(pte))); + page = pte_page(pte); + + if (!page_cache_get_speculative(page)) + goto pte_unmap; + + if (unlikely(pte_val(pte) != pte_val(*ptep))) { + put_page(page); + goto pte_unmap; + } + + pages[*nr] = page; + (*nr)++; + + } while (ptep++, addr += PAGE_SIZE, addr != end); + + ret = 1; + +pte_unmap: + pte_unmap(ptem); + return ret; +} +#else + +/* + * If we can't determine whether or not a pte is special, then fail immediately + * for ptes. Note, we can still pin HugeTLB and THP as these are guaranteed not + * to be special. + */ +static inline int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end, + int write, struct page **pages, int *nr) +{ + return 0; +} +#endif /* __HAVE_ARCH_PTE_SPECIAL */ + +static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr, + unsigned long end, int write, struct page **pages, int *nr) +{ + struct page *head, *page, *tail; + int refs; + + if (!pmd_present(orig) || (write && !pmd_write(orig))) + return 0; + + refs = 0; + head = pmd_page(orig); + page = head + ((addr & ~PMD_MASK) >> PAGE_SHIFT); + tail = page; + do { + VM_BUG_ON(compound_head(page) != head); + pages[*nr] = page; + (*nr)++; + page++; + refs++; + } while (addr += PAGE_SIZE, addr != end); + + if (!page_cache_add_speculative(head, refs)) { + *nr -= refs; + return 0; + } + + if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) { + *nr -= refs; + while (refs--) + put_page(head); + return 0; + } + + /* + * Any tail pages need their mapcount reference taken before we + * return. (This allows the THP code to bump their ref count when + * they are split into base pages). + */ + while (refs--) { + if (PageTail(tail)) + get_huge_page_tail(tail); + tail++; + } + + return 1; +} + +static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr, + unsigned long end, int write, struct page **pages, int *nr) +{ + struct page *head, *page, *tail; + pmd_t origpmd = __pmd(pud_val(orig)); + int refs; + + if (!pmd_present(origpmd) || (write && !pmd_write(origpmd))) + return 0; + + refs = 0; + head = pmd_page(origpmd); + page = head + ((addr & ~PUD_MASK) >> PAGE_SHIFT); + tail = page; + do { + VM_BUG_ON(compound_head(page) != head); + pages[*nr] = page; + (*nr)++; + page++; + refs++; + } while (addr += PAGE_SIZE, addr != end); + + if (!page_cache_add_speculative(head, refs)) { + *nr -= refs; + return 0; + } + + if (unlikely(pud_val(orig) != pud_val(*pudp))) { + *nr -= refs; + while (refs--) + put_page(head); + return 0; + } + + while (refs--) { + if (PageTail(tail)) + get_huge_page_tail(tail); + tail++; + } + + return 1; +} + +static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end, + int write, struct page **pages, int *nr) +{ + unsigned long next; + pmd_t *pmdp; + + pmdp = pmd_offset(&pud, addr); + do { + pmd_t pmd = ACCESS_ONCE(*pmdp); + next = pmd_addr_end(addr, end); + if (pmd_none(pmd) || pmd_trans_splitting(pmd)) + return 0; + + if (unlikely(pmd_trans_huge(pmd) || pmd_huge(pmd))) { + if (!gup_huge_pmd(pmd, pmdp, addr, next, write, + pages, nr)) + return 0; + } else { + if (!gup_pte_range(pmd, addr, next, write, pages, nr)) + return 0; + } + } while (pmdp++, addr = next, addr != end); + + return 1; +} + +static int gup_pud_range(pgd_t *pgdp, unsigned long addr, unsigned long end, + int write, struct page **pages, int *nr) +{ + unsigned long next; + pud_t *pudp; + + pudp = pud_offset(pgdp, addr); + do { + pud_t pud = ACCESS_ONCE(*pudp); + next = pud_addr_end(addr, end); + if (pud_none(pud)) + return 0; + if (pud_huge(pud)) { + if (!gup_huge_pud(pud, pudp, addr, next, write, + pages, nr)) + return 0; + } else if (!gup_pmd_range(pud, addr, next, write, pages, nr)) + return 0; + } while (pudp++, addr = next, addr != end); + + return 1; +} + +/* + * Like get_user_pages_fast() except its IRQ-safe in that it won't fall + * back to the regular GUP. + */ +int __get_user_pages_fast(unsigned long start, int nr_pages, int write, + struct page **pages) +{ + struct mm_struct *mm = current->mm; + unsigned long addr, len, end; + unsigned long next, flags; + pgd_t *pgdp; + int nr = 0; + + start &= PAGE_MASK; + addr = start; + len = (unsigned long) nr_pages << PAGE_SHIFT; + end = start + len; + + if (unlikely(!access_ok(write ? VERIFY_WRITE : VERIFY_READ, + start, len))) + return 0; + + /* + * Disable interrupts, we use the nested form as we can already + * have interrupts disabled by get_futex_key. + * + * With interrupts disabled, we block page table pages from being + * freed from under us. See mmu_gather_tlb in asm-generic/tlb.h + * for more details. + * + * We do not adopt an rcu_read_lock(.) here as we also want to + * block IPIs that come from THPs splitting. + */ + + local_irq_save(flags); + pgdp = pgd_offset(mm, addr); + do { + next = pgd_addr_end(addr, end); + if (pgd_none(*pgdp)) + break; + else if (!gup_pud_range(pgdp, addr, next, write, pages, &nr)) + break; + } while (pgdp++, addr = next, addr != end); + local_irq_restore(flags); + + return nr; +} + +int get_user_pages_fast(unsigned long start, int nr_pages, int write, + struct page **pages) +{ + struct mm_struct *mm = current->mm; + int nr, ret; + + start &= PAGE_MASK; + nr = __get_user_pages_fast(start, nr_pages, write, pages); + ret = nr; + + if (nr < nr_pages) { + /* Try to get the remaining pages with get_user_pages */ + start += nr << PAGE_SHIFT; + pages += nr; + + down_read(&mm->mmap_sem); + ret = get_user_pages(current, mm, start, + nr_pages - nr, write, 0, pages, NULL); + up_read(&mm->mmap_sem); + + /* Have to be a bit careful with return values */ + if (nr > 0) { + if (ret < 0) + ret = nr; + else + ret += nr; + } + } + + return ret; +} -- 1.8.1.4 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-f171.google.com ([209.85.212.171]:45624 "EHLO mail-wi0-f171.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755330AbaEFPag (ORCPT ); Tue, 6 May 2014 11:30:36 -0400 Received: by mail-wi0-f171.google.com with SMTP id hm4so7524237wib.10 for ; Tue, 06 May 2014 08:30:35 -0700 (PDT) From: Steve Capper Subject: [RFC PATCH V5 6/6] arm64: mm: Enable RCU fast_gup Date: Tue, 6 May 2014 16:30:09 +0100 Message-ID: <1399390209-1756-7-git-send-email-steve.capper@linaro.org> In-Reply-To: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> References: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> Sender: linux-arch-owner@vger.kernel.org List-ID: To: linux-arm-kernel@lists.infradead.org, catalin.marinas@arm.com, linux@arm.linux.org.uk, linux-arch@vger.kernel.org, linux-mm@kvack.org Cc: will.deacon@arm.com, gary.robertson@linaro.org, christoffer.dall@linaro.org, peterz@infradead.org, anders.roxell@linaro.org, akpm@linux-foundation.org, Steve Capper Message-ID: <20140506153009.O0SxNPEYEMj9BKaEeo9XduP90vHZGsz9KvCEmejWKI4@z> Activate the RCU fast_gup for ARM64. We also need to force THP splits to broadcast an IPI s.t. we block in the fast_gup page walker. As THP splits are comparatively rare, this should not lead to a noticeable performance degradation. Signed-off-by: Steve Capper --- arch/arm64/Kconfig | 3 +++ arch/arm64/include/asm/pgtable.h | 8 +++++++- arch/arm64/mm/flush.c | 19 +++++++++++++++++++ 3 files changed, 29 insertions(+), 1 deletion(-) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 2420390..5168949 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -95,6 +95,9 @@ config GENERIC_CALIBRATE_DELAY config ZONE_DMA def_bool y +config HAVE_RCU_GUP + def_bool y + config ARCH_DMA_ADDR_T_64BIT def_bool y diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index 90c811f..126ed77e 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -244,7 +244,13 @@ static inline pmd_t pte_pmd(pte_t pte) #ifdef CONFIG_TRANSPARENT_HUGEPAGE #define pmd_trans_huge(pmd) (pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT)) #define pmd_trans_splitting(pmd) pte_special(pmd_pte(pmd)) -#endif +#ifdef CONFIG_HAVE_RCU_TABLE_FREE +#define __HAVE_ARCH_PMDP_SPLITTING_FLUSH +struct vm_area_struct; +void pmdp_splitting_flush(struct vm_area_struct *vma, unsigned long address, + pmd_t *pmdp); +#endif /* CONFIG_HAVE_RCU_TABLE_FREE */ +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ #define pmd_young(pmd) pte_young(pmd_pte(pmd)) #define pmd_wrprotect(pmd) pte_pmd(pte_wrprotect(pmd_pte(pmd))) diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c index e4193e3..ddf96c1 100644 --- a/arch/arm64/mm/flush.c +++ b/arch/arm64/mm/flush.c @@ -103,3 +103,22 @@ EXPORT_SYMBOL(flush_dcache_page); */ EXPORT_SYMBOL(flush_cache_all); EXPORT_SYMBOL(flush_icache_range); + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +#ifdef CONFIG_HAVE_RCU_TABLE_FREE +static void thp_splitting_flush_sync(void *arg) +{ +} + +void pmdp_splitting_flush(struct vm_area_struct *vma, unsigned long address, + pmd_t *pmdp) +{ + pmd_t pmd = pmd_mksplitting(*pmdp); + VM_BUG_ON(address & ~PMD_MASK); + set_pmd_at(vma->vm_mm, address, pmdp, pmd); + + /* dummy IPI to serialise against fast_gup */ + smp_call_function(thp_splitting_flush_sync, NULL, 1); +} +#endif /* CONFIG_HAVE_RCU_TABLE_FREE */ +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ -- 1.8.1.4 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-f179.google.com ([209.85.212.179]:50179 "EHLO mail-wi0-f179.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752018AbaEFPae (ORCPT ); Tue, 6 May 2014 11:30:34 -0400 Received: by mail-wi0-f179.google.com with SMTP id bs8so1168029wib.0 for ; Tue, 06 May 2014 08:30:33 -0700 (PDT) From: Steve Capper Subject: [RFC PATCH V5 5/6] arm64: mm: Enable HAVE_RCU_TABLE_FREE logic Date: Tue, 6 May 2014 16:30:08 +0100 Message-ID: <1399390209-1756-6-git-send-email-steve.capper@linaro.org> In-Reply-To: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> References: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> Sender: linux-arch-owner@vger.kernel.org List-ID: To: linux-arm-kernel@lists.infradead.org, catalin.marinas@arm.com, linux@arm.linux.org.uk, linux-arch@vger.kernel.org, linux-mm@kvack.org Cc: will.deacon@arm.com, gary.robertson@linaro.org, christoffer.dall@linaro.org, peterz@infradead.org, anders.roxell@linaro.org, akpm@linux-foundation.org, Steve Capper Message-ID: <20140506153008.5RaLJmFIm6BqRpIcoweRk1l5WqElbvq4OKF-2K5DAsU@z> In order to implement fast_get_user_pages we need to ensure that the page table walker is protected from page table pages being freed from under it. This patch enables HAVE_RCU_TABLE_FREE, any page table pages belonging to address spaces with multiple users will be call_rcu_sched freed. Meaning that disabling interrupts will block the free and protect the fast gup page walker. Signed-off-by: Steve Capper --- arch/arm64/Kconfig | 1 + arch/arm64/include/asm/tlb.h | 18 ++++++++++++++++-- 2 files changed, 17 insertions(+), 2 deletions(-) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index e759af5..2420390 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -43,6 +43,7 @@ config ARM64 select HAVE_PERF_EVENTS select HAVE_PERF_REGS select HAVE_PERF_USER_STACK_DUMP + select HAVE_RCU_TABLE_FREE select IRQ_DOMAIN select MODULES_USE_ELF_RELA select NO_BOOTMEM diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h index 80e2c08..8e4dde5 100644 --- a/arch/arm64/include/asm/tlb.h +++ b/arch/arm64/include/asm/tlb.h @@ -23,6 +23,20 @@ #include +#include +#include + +#ifdef CONFIG_HAVE_RCU_TABLE_FREE + +#define tlb_remove_entry(tlb, entry) tlb_remove_table(tlb, entry) +static inline void __tlb_remove_table(void *_table) +{ + free_page_and_swap_cache((struct page *)_table); +} +#else +#define tlb_remove_entry(tlb, entry) tlb_remove_page(tlb, entry) +#endif /* CONFIG_HAVE_RCU_TABLE_FREE */ + /* * There's three ways the TLB shootdown code is used: * 1. Unmapping a range of vmas. See zap_page_range(), unmap_region(). @@ -88,7 +102,7 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte, { pgtable_page_dtor(pte); tlb_add_flush(tlb, addr); - tlb_remove_page(tlb, pte); + tlb_remove_entry(tlb, pte); } #ifndef CONFIG_ARM64_64K_PAGES @@ -96,7 +110,7 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp, unsigned long addr) { tlb_add_flush(tlb, addr); - tlb_remove_page(tlb, virt_to_page(pmdp)); + tlb_remove_entry(tlb, virt_to_page(pmdp)); } #endif -- 1.8.1.4 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp.codeaurora.org ([198.145.11.231]:54584 "EHLO smtp.codeaurora.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754440AbaEMPb2 (ORCPT ); Tue, 13 May 2014 11:31:28 -0400 Message-ID: <53723ACC.7020500@codeaurora.org> Date: Tue, 13 May 2014 11:31:24 -0400 From: Christopher Covington MIME-Version: 1.0 Subject: Re: [RFC PATCH V5 4/6] arm: mm: Enable RCU fast_gup References: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> <1399390209-1756-5-git-send-email-steve.capper@linaro.org> In-Reply-To: <1399390209-1756-5-git-send-email-steve.capper@linaro.org> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-arch-owner@vger.kernel.org List-ID: To: Steve Capper Cc: linux-arm-kernel@lists.infradead.org, catalin.marinas@arm.com, linux@arm.linux.org.uk, linux-arch@vger.kernel.org, linux-mm@kvack.org, anders.roxell@linaro.org, peterz@infradead.org, gary.robertson@linaro.org, will.deacon@arm.com, akpm@linux-foundation.org, christoffer.dall@linaro.org Message-ID: <20140513153124.mL3GQ_cTAWHc2_X88OHtv3cXNOqw7pqa4Eq3d0vTo8g@z> Hi Steve, On 05/06/2014 11:30 AM, Steve Capper wrote: > Activate the RCU fast_gup for ARM. We also need to force THP splits to > broadcast an IPI s.t. we block in the fast_gup page walker. As THP > splits are comparatively rare, this should not lead to a noticeable > performance degradation. > diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c > index 3387e60..91a2b59 100644 > --- a/arch/arm/mm/flush.c > +++ b/arch/arm/mm/flush.c > @@ -377,3 +377,22 @@ void __flush_anon_page(struct vm_area_struct *vma, struct page *page, unsigned l > */ > __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE); > } > + > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE > +#ifdef CONFIG_HAVE_RCU_TABLE_FREE This is trivia, but I for one find the form #if defined(a) && defined(b) easier to read. (Applies to the A64 version as well). Christopher -- Employee of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by the Linux Foundation. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-yh0-f42.google.com ([209.85.213.42]:48300 "EHLO mail-yh0-f42.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752065AbaENIeR (ORCPT ); Wed, 14 May 2014 04:34:17 -0400 Received: by mail-yh0-f42.google.com with SMTP id t59so1392459yho.29 for ; Wed, 14 May 2014 01:34:16 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <53723ACC.7020500@codeaurora.org> References: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> <1399390209-1756-5-git-send-email-steve.capper@linaro.org> <53723ACC.7020500@codeaurora.org> Date: Wed, 14 May 2014 09:34:16 +0100 Message-ID: Subject: Re: [RFC PATCH V5 4/6] arm: mm: Enable RCU fast_gup From: Steve Capper Content-Type: text/plain; charset=UTF-8 Sender: linux-arch-owner@vger.kernel.org List-ID: To: Christopher Covington Cc: "linux-arm-kernel@lists.infradead.org" , Catalin Marinas , "linux@arm.linux.org.uk" , "linux-arch@vger.kernel.org" , "linux-mm@kvack.org" , Anders Roxell , Peter Zijlstra , Gary Robertson , Will Deacon , "akpm@linux-foundation.org" , Christoffer Dall Message-ID: <20140514083416.WSizXrh07tJBgROaaevcS0AvTzj3B2kLuW5CThrZ4lw@z> On 13 May 2014 16:31, Christopher Covington wrote: > Hi Steve, > > On 05/06/2014 11:30 AM, Steve Capper wrote: >> Activate the RCU fast_gup for ARM. We also need to force THP splits to >> broadcast an IPI s.t. we block in the fast_gup page walker. As THP >> splits are comparatively rare, this should not lead to a noticeable >> performance degradation. > >> diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c >> index 3387e60..91a2b59 100644 >> --- a/arch/arm/mm/flush.c >> +++ b/arch/arm/mm/flush.c >> @@ -377,3 +377,22 @@ void __flush_anon_page(struct vm_area_struct *vma, struct page *page, unsigned l >> */ >> __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE); >> } >> + >> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE >> +#ifdef CONFIG_HAVE_RCU_TABLE_FREE > > This is trivia, but I for one find the form #if defined(a) && defined(b) > easier to read. (Applies to the A64 version as well). > Thank you Christopher, I agree that looks nicer. Cheers, -- Steve > Christopher > > -- > Employee of Qualcomm Innovation Center, Inc. > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, > hosted by the Linux Foundation. From mboxrd@z Thu Jan 1 00:00:00 1970 From: steve.capper@linaro.org (Steve Capper) Date: Tue, 6 May 2014 16:30:06 +0100 Subject: [RFC PATCH V5 3/6] arm: mm: Enable HAVE_RCU_TABLE_FREE logic In-Reply-To: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> References: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> Message-ID: <1399390209-1756-4-git-send-email-steve.capper@linaro.org> To: linux-arm-kernel@lists.infradead.org List-Id: linux-arm-kernel.lists.infradead.org In order to implement fast_get_user_pages we need to ensure that the page table walker is protected from page table pages being freed from under it. One way to achieve this is to have the walker disable interrupts, and rely on IPIs from the TLB flushing code blocking before the page table pages are freed. On some ARM platforms we have hardware TLB invalidation, thus the TLB flushing code won't necessarily broadcast IPIs. Also spuriously broadcasting IPIs can hurt system performance if done too often. This problem has already been solved on PowerPC and Sparc by batching up page table pages belonging to more than one mm_user, then scheduling an rcu_sched callback to free the pages. If one were to disable interrupts, that would delay the scheduling interrupts thus block the page table pages being freed. This logic has also been promoted to core code and is activated when one enables HAVE_RCU_TABLE_FREE. This patch enables HAVE_RCU_TABLE_FREE and incorporates it into the existing ARM TLB logic. Signed-off-by: Steve Capper --- arch/arm/Kconfig | 1 + arch/arm/include/asm/tlb.h | 38 ++++++++++++++++++++++++++++++++++++-- 2 files changed, 37 insertions(+), 2 deletions(-) diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index db3c541..6cfdb3b 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -59,6 +59,7 @@ config ARM select HAVE_PERF_EVENTS select HAVE_PERF_REGS select HAVE_PERF_USER_STACK_DUMP + select HAVE_RCU_TABLE_FREE if (SMP && CPU_V7) select HAVE_REGS_AND_STACK_ACCESS_API select HAVE_SYSCALL_TRACEPOINTS select HAVE_UID16 diff --git a/arch/arm/include/asm/tlb.h b/arch/arm/include/asm/tlb.h index f1a0dac..3cadb72 100644 --- a/arch/arm/include/asm/tlb.h +++ b/arch/arm/include/asm/tlb.h @@ -35,12 +35,39 @@ #define MMU_GATHER_BUNDLE 8 +#ifdef CONFIG_HAVE_RCU_TABLE_FREE +static inline void __tlb_remove_table(void *_table) +{ + free_page_and_swap_cache((struct page *)_table); +} + +struct mmu_table_batch { + struct rcu_head rcu; + unsigned int nr; + void *tables[0]; +}; + +#define MAX_TABLE_BATCH \ + ((PAGE_SIZE - sizeof(struct mmu_table_batch)) / sizeof(void *)) + +extern void tlb_table_flush(struct mmu_gather *tlb); +extern void tlb_remove_table(struct mmu_gather *tlb, void *table); + +#define tlb_remove_entry(tlb, entry) tlb_remove_table(tlb, entry) +#else +#define tlb_remove_entry(tlb, entry) tlb_remove_page(tlb, entry) +#endif /* CONFIG_HAVE_RCU_TABLE_FREE */ + /* * TLB handling. This allows us to remove pages from the page * tables, and efficiently handle the TLB issues. */ struct mmu_gather { struct mm_struct *mm; +#ifdef CONFIG_HAVE_RCU_TABLE_FREE + struct mmu_table_batch *batch; + unsigned int need_flush; +#endif unsigned int fullmm; struct vm_area_struct *vma; unsigned long start, end; @@ -101,6 +128,9 @@ static inline void __tlb_alloc_page(struct mmu_gather *tlb) static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb) { tlb_flush(tlb); +#ifdef CONFIG_HAVE_RCU_TABLE_FREE + tlb_table_flush(tlb); +#endif } static inline void tlb_flush_mmu_free(struct mmu_gather *tlb) @@ -129,6 +159,10 @@ tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned long start tlb->pages = tlb->local; tlb->nr = 0; __tlb_alloc_page(tlb); + +#ifdef CONFIG_HAVE_RCU_TABLE_FREE + tlb->batch = NULL; +#endif } static inline void @@ -205,7 +239,7 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte, tlb_add_flush(tlb, addr + SZ_1M); #endif - tlb_remove_page(tlb, pte); + tlb_remove_entry(tlb, pte); } static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp, @@ -213,7 +247,7 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp, { #ifdef CONFIG_ARM_LPAE tlb_add_flush(tlb, addr); - tlb_remove_page(tlb, virt_to_page(pmdp)); + tlb_remove_entry(tlb, virt_to_page(pmdp)); #endif } -- 1.8.1.4 From mboxrd@z Thu Jan 1 00:00:00 1970 From: steve.capper@linaro.org (Steve Capper) Date: Tue, 6 May 2014 16:30:03 +0100 Subject: [RFC PATCH V5 0/6] get_user_pages_fast for ARM and ARM64 Message-ID: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> To: linux-arm-kernel@lists.infradead.org List-Id: linux-arm-kernel.lists.infradead.org Hello, This RFC series implements get_user_pages_fast and __get_user_pages_fast. These are required for Transparent HugePages to function correctly, as a futex on a THP tail will otherwise result in an infinite loop (due to the core implementation of __get_user_pages_fast always returning 0). This series may also be beneficial for direct-IO heavy workloads and certain KVM workloads. The main changes since RFC V4 are: * corrected the arm64 logic so it now correctly rcu-frees page table backing pages. * rcu free logic relaxed for pre-ARMv7 ARM as we need an IPI to invalidate TLBs anyway. * rebased to 3.15-rc3 (some minor changes were needed to allow it to merge). * dropped Catalin's mmu_gather patch as that's been merged already. I would really appreciate any comments (especially on the validity or otherwise of the core fast_gup implementation) and/or testers. Cheers, -- Steve Steve Capper (6): mm: Introduce a general RCU get_user_pages_fast. arm: mm: Introduce special ptes for LPAE arm: mm: Enable HAVE_RCU_TABLE_FREE logic arm: mm: Enable RCU fast_gup arm64: mm: Enable HAVE_RCU_TABLE_FREE logic arm64: mm: Enable RCU fast_gup arch/arm/Kconfig | 4 + arch/arm/include/asm/pgtable-2level.h | 2 + arch/arm/include/asm/pgtable-3level.h | 14 ++ arch/arm/include/asm/pgtable.h | 6 +- arch/arm/include/asm/tlb.h | 38 ++++- arch/arm/mm/flush.c | 19 +++ arch/arm64/Kconfig | 4 + arch/arm64/include/asm/pgtable.h | 8 +- arch/arm64/include/asm/tlb.h | 18 ++- arch/arm64/mm/flush.c | 19 +++ mm/Kconfig | 3 + mm/Makefile | 1 + mm/gup.c | 297 ++++++++++++++++++++++++++++++++++ 13 files changed, 424 insertions(+), 9 deletions(-) create mode 100644 mm/gup.c -- 1.8.1.4 From mboxrd@z Thu Jan 1 00:00:00 1970 From: steve.capper@linaro.org (Steve Capper) Date: Tue, 6 May 2014 16:30:04 +0100 Subject: [RFC PATCH V5 1/6] mm: Introduce a general RCU get_user_pages_fast. In-Reply-To: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> References: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> Message-ID: <1399390209-1756-2-git-send-email-steve.capper@linaro.org> To: linux-arm-kernel@lists.infradead.org List-Id: linux-arm-kernel.lists.infradead.org A general RCU implementation of get_user_pages_fast. It is based on the PowerPC implementation. The lockless page cache protocols are used as this implementation assumes that TLB invalidations do not necessarily need to be broadcast via IPI. This implementation does however assume that THP splits will broadcast an IPI, and this is why interrupts are disabled in the fast_gup walker (otherwise calls to rcu_read_(un)lock would suffice). Signed-off-by: Steve Capper --- mm/Kconfig | 3 + mm/Makefile | 1 + mm/gup.c | 297 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 301 insertions(+) create mode 100644 mm/gup.c diff --git a/mm/Kconfig b/mm/Kconfig index ebe5880..8848a16 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -134,6 +134,9 @@ config HAVE_MEMBLOCK config HAVE_MEMBLOCK_NODE_MAP boolean +config HAVE_RCU_GUP + boolean + config ARCH_DISCARD_MEMBLOCK boolean diff --git a/mm/Makefile b/mm/Makefile index b484452..83e6ac2 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -29,6 +29,7 @@ else endif obj-$(CONFIG_HAVE_MEMBLOCK) += memblock.o +obj-$(CONFIG_HAVE_RCU_GUP) += gup.o obj-$(CONFIG_BOUNCE) += bounce.o obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o diff --git a/mm/gup.c b/mm/gup.c new file mode 100644 index 0000000..b35296f --- /dev/null +++ b/mm/gup.c @@ -0,0 +1,297 @@ +/* + * mm/gup.c + * + * Copyright (C) 2014 Linaro Ltd. + * + * Based on arch/powerpc/mm/gup.c which is: + * Copyright (C) 2008 Nick Piggin + * Copyright (C) 2008 Novell Inc. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + */ + +#include +#include +#include +#include +#include +#include + +#ifdef __HAVE_ARCH_PTE_SPECIAL +static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end, + int write, struct page **pages, int *nr) +{ + pte_t *ptep, *ptem; + int ret = 0; + + ptem = ptep = pte_offset_map(&pmd, addr); + do { + pte_t pte = ACCESS_ONCE(*ptep); + struct page *page; + + if (!pte_present(pte) || pte_special(pte) + || (write && !pte_write(pte))) + goto pte_unmap; + + VM_BUG_ON(!pfn_valid(pte_pfn(pte))); + page = pte_page(pte); + + if (!page_cache_get_speculative(page)) + goto pte_unmap; + + if (unlikely(pte_val(pte) != pte_val(*ptep))) { + put_page(page); + goto pte_unmap; + } + + pages[*nr] = page; + (*nr)++; + + } while (ptep++, addr += PAGE_SIZE, addr != end); + + ret = 1; + +pte_unmap: + pte_unmap(ptem); + return ret; +} +#else + +/* + * If we can't determine whether or not a pte is special, then fail immediately + * for ptes. Note, we can still pin HugeTLB and THP as these are guaranteed not + * to be special. + */ +static inline int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end, + int write, struct page **pages, int *nr) +{ + return 0; +} +#endif /* __HAVE_ARCH_PTE_SPECIAL */ + +static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr, + unsigned long end, int write, struct page **pages, int *nr) +{ + struct page *head, *page, *tail; + int refs; + + if (!pmd_present(orig) || (write && !pmd_write(orig))) + return 0; + + refs = 0; + head = pmd_page(orig); + page = head + ((addr & ~PMD_MASK) >> PAGE_SHIFT); + tail = page; + do { + VM_BUG_ON(compound_head(page) != head); + pages[*nr] = page; + (*nr)++; + page++; + refs++; + } while (addr += PAGE_SIZE, addr != end); + + if (!page_cache_add_speculative(head, refs)) { + *nr -= refs; + return 0; + } + + if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) { + *nr -= refs; + while (refs--) + put_page(head); + return 0; + } + + /* + * Any tail pages need their mapcount reference taken before we + * return. (This allows the THP code to bump their ref count when + * they are split into base pages). + */ + while (refs--) { + if (PageTail(tail)) + get_huge_page_tail(tail); + tail++; + } + + return 1; +} + +static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr, + unsigned long end, int write, struct page **pages, int *nr) +{ + struct page *head, *page, *tail; + pmd_t origpmd = __pmd(pud_val(orig)); + int refs; + + if (!pmd_present(origpmd) || (write && !pmd_write(origpmd))) + return 0; + + refs = 0; + head = pmd_page(origpmd); + page = head + ((addr & ~PUD_MASK) >> PAGE_SHIFT); + tail = page; + do { + VM_BUG_ON(compound_head(page) != head); + pages[*nr] = page; + (*nr)++; + page++; + refs++; + } while (addr += PAGE_SIZE, addr != end); + + if (!page_cache_add_speculative(head, refs)) { + *nr -= refs; + return 0; + } + + if (unlikely(pud_val(orig) != pud_val(*pudp))) { + *nr -= refs; + while (refs--) + put_page(head); + return 0; + } + + while (refs--) { + if (PageTail(tail)) + get_huge_page_tail(tail); + tail++; + } + + return 1; +} + +static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end, + int write, struct page **pages, int *nr) +{ + unsigned long next; + pmd_t *pmdp; + + pmdp = pmd_offset(&pud, addr); + do { + pmd_t pmd = ACCESS_ONCE(*pmdp); + next = pmd_addr_end(addr, end); + if (pmd_none(pmd) || pmd_trans_splitting(pmd)) + return 0; + + if (unlikely(pmd_trans_huge(pmd) || pmd_huge(pmd))) { + if (!gup_huge_pmd(pmd, pmdp, addr, next, write, + pages, nr)) + return 0; + } else { + if (!gup_pte_range(pmd, addr, next, write, pages, nr)) + return 0; + } + } while (pmdp++, addr = next, addr != end); + + return 1; +} + +static int gup_pud_range(pgd_t *pgdp, unsigned long addr, unsigned long end, + int write, struct page **pages, int *nr) +{ + unsigned long next; + pud_t *pudp; + + pudp = pud_offset(pgdp, addr); + do { + pud_t pud = ACCESS_ONCE(*pudp); + next = pud_addr_end(addr, end); + if (pud_none(pud)) + return 0; + if (pud_huge(pud)) { + if (!gup_huge_pud(pud, pudp, addr, next, write, + pages, nr)) + return 0; + } else if (!gup_pmd_range(pud, addr, next, write, pages, nr)) + return 0; + } while (pudp++, addr = next, addr != end); + + return 1; +} + +/* + * Like get_user_pages_fast() except its IRQ-safe in that it won't fall + * back to the regular GUP. + */ +int __get_user_pages_fast(unsigned long start, int nr_pages, int write, + struct page **pages) +{ + struct mm_struct *mm = current->mm; + unsigned long addr, len, end; + unsigned long next, flags; + pgd_t *pgdp; + int nr = 0; + + start &= PAGE_MASK; + addr = start; + len = (unsigned long) nr_pages << PAGE_SHIFT; + end = start + len; + + if (unlikely(!access_ok(write ? VERIFY_WRITE : VERIFY_READ, + start, len))) + return 0; + + /* + * Disable interrupts, we use the nested form as we can already + * have interrupts disabled by get_futex_key. + * + * With interrupts disabled, we block page table pages from being + * freed from under us. See mmu_gather_tlb in asm-generic/tlb.h + * for more details. + * + * We do not adopt an rcu_read_lock(.) here as we also want to + * block IPIs that come from THPs splitting. + */ + + local_irq_save(flags); + pgdp = pgd_offset(mm, addr); + do { + next = pgd_addr_end(addr, end); + if (pgd_none(*pgdp)) + break; + else if (!gup_pud_range(pgdp, addr, next, write, pages, &nr)) + break; + } while (pgdp++, addr = next, addr != end); + local_irq_restore(flags); + + return nr; +} + +int get_user_pages_fast(unsigned long start, int nr_pages, int write, + struct page **pages) +{ + struct mm_struct *mm = current->mm; + int nr, ret; + + start &= PAGE_MASK; + nr = __get_user_pages_fast(start, nr_pages, write, pages); + ret = nr; + + if (nr < nr_pages) { + /* Try to get the remaining pages with get_user_pages */ + start += nr << PAGE_SHIFT; + pages += nr; + + down_read(&mm->mmap_sem); + ret = get_user_pages(current, mm, start, + nr_pages - nr, write, 0, pages, NULL); + up_read(&mm->mmap_sem); + + /* Have to be a bit careful with return values */ + if (nr > 0) { + if (ret < 0) + ret = nr; + else + ret += nr; + } + } + + return ret; +} -- 1.8.1.4 From mboxrd@z Thu Jan 1 00:00:00 1970 From: steve.capper@linaro.org (Steve Capper) Date: Tue, 6 May 2014 16:30:05 +0100 Subject: [RFC PATCH V5 2/6] arm: mm: Introduce special ptes for LPAE In-Reply-To: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> References: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> Message-ID: <1399390209-1756-3-git-send-email-steve.capper@linaro.org> To: linux-arm-kernel@lists.infradead.org List-Id: linux-arm-kernel.lists.infradead.org We need a mechanism to tag ptes as being special, this indicates that no attempt should be made to access the underlying struct page * associated with the pte. This is used by the fast_gup when operating on ptes as it has no means to access VMAs (that also contain this information) locklessly. The L_PTE_SPECIAL bit is already allocated for LPAE, this patch modifies pte_special and pte_mkspecial to make use of it, and defines __HAVE_ARCH_PTE_SPECIAL. This patch also excludes special ptes from the icache/dcache sync logic. Signed-off-by: Steve Capper --- arch/arm/include/asm/pgtable-2level.h | 2 ++ arch/arm/include/asm/pgtable-3level.h | 8 ++++++++ arch/arm/include/asm/pgtable.h | 6 ++---- 3 files changed, 12 insertions(+), 4 deletions(-) diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h index 219ac88..f027941 100644 --- a/arch/arm/include/asm/pgtable-2level.h +++ b/arch/arm/include/asm/pgtable-2level.h @@ -182,6 +182,8 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr) #define pmd_addr_end(addr,end) (end) #define set_pte_ext(ptep,pte,ext) cpu_set_pte_ext(ptep,pte,ext) +#define pte_special(pte) (0) +static inline pte_t pte_mkspecial(pte_t pte) { return pte; } /* * We don't have huge page support for short descriptors, for the moment diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h index 85c60ad..b286ba9 100644 --- a/arch/arm/include/asm/pgtable-3level.h +++ b/arch/arm/include/asm/pgtable-3level.h @@ -207,6 +207,14 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr) #define pte_huge(pte) (pte_val(pte) && !(pte_val(pte) & PTE_TABLE_BIT)) #define pte_mkhuge(pte) (__pte(pte_val(pte) & ~PTE_TABLE_BIT)) +#define pte_special(pte) (!!(pte_val(pte) & L_PTE_SPECIAL)) +static inline pte_t pte_mkspecial(pte_t pte) +{ + pte_val(pte) |= L_PTE_SPECIAL; + return pte; +} +#define __HAVE_ARCH_PTE_SPECIAL + #define pmd_young(pmd) (pmd_val(pmd) & PMD_SECT_AF) #define __HAVE_ARCH_PMD_WRITE diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h index 5478e5d..63b1db2 100644 --- a/arch/arm/include/asm/pgtable.h +++ b/arch/arm/include/asm/pgtable.h @@ -222,7 +222,6 @@ static inline pte_t *pmd_page_vaddr(pmd_t pmd) #define pte_dirty(pte) (pte_val(pte) & L_PTE_DIRTY) #define pte_young(pte) (pte_val(pte) & L_PTE_YOUNG) #define pte_exec(pte) (!(pte_val(pte) & L_PTE_XN)) -#define pte_special(pte) (0) #define pte_valid_user(pte) \ (pte_valid(pte) && (pte_val(pte) & L_PTE_USER) && pte_young(pte)) @@ -241,7 +240,8 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr, unsigned long ext = 0; if (addr < TASK_SIZE && pte_valid_user(pteval)) { - __sync_icache_dcache(pteval); + if (!pte_special(pteval)) + __sync_icache_dcache(pteval); ext |= PTE_EXT_NG; } @@ -260,8 +260,6 @@ PTE_BIT_FUNC(mkyoung, |= L_PTE_YOUNG); PTE_BIT_FUNC(mkexec, &= ~L_PTE_XN); PTE_BIT_FUNC(mknexec, |= L_PTE_XN); -static inline pte_t pte_mkspecial(pte_t pte) { return pte; } - static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) { const pteval_t mask = L_PTE_XN | L_PTE_RDONLY | L_PTE_USER | -- 1.8.1.4 From mboxrd@z Thu Jan 1 00:00:00 1970 From: steve.capper@linaro.org (Steve Capper) Date: Tue, 6 May 2014 16:30:07 +0100 Subject: [RFC PATCH V5 4/6] arm: mm: Enable RCU fast_gup In-Reply-To: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> References: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> Message-ID: <1399390209-1756-5-git-send-email-steve.capper@linaro.org> To: linux-arm-kernel@lists.infradead.org List-Id: linux-arm-kernel.lists.infradead.org Activate the RCU fast_gup for ARM. We also need to force THP splits to broadcast an IPI s.t. we block in the fast_gup page walker. As THP splits are comparatively rare, this should not lead to a noticeable performance degradation. Signed-off-by: Steve Capper --- arch/arm/Kconfig | 3 +++ arch/arm/include/asm/pgtable-3level.h | 6 ++++++ arch/arm/mm/flush.c | 19 +++++++++++++++++++ 3 files changed, 28 insertions(+) diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 6cfdb3b..d0572f1 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -1803,6 +1803,9 @@ config ARCH_SELECT_MEMORY_MODEL config HAVE_ARCH_PFN_VALID def_bool ARCH_HAS_HOLES_MEMORYMODEL || !SPARSEMEM +config HAVE_RCU_GUP + def_bool y + config HIGHMEM bool "High Memory Support" depends on MMU diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h index b286ba9..fdc4a4f 100644 --- a/arch/arm/include/asm/pgtable-3level.h +++ b/arch/arm/include/asm/pgtable-3level.h @@ -226,6 +226,12 @@ static inline pte_t pte_mkspecial(pte_t pte) #ifdef CONFIG_TRANSPARENT_HUGEPAGE #define pmd_trans_huge(pmd) (pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT)) #define pmd_trans_splitting(pmd) (pmd_val(pmd) & PMD_SECT_SPLITTING) + +#ifdef CONFIG_HAVE_RCU_TABLE_FREE +#define __HAVE_ARCH_PMDP_SPLITTING_FLUSH +void pmdp_splitting_flush(struct vm_area_struct *vma, unsigned long address, + pmd_t *pmdp); +#endif #endif #define PMD_BIT_FUNC(fn,op) \ diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c index 3387e60..91a2b59 100644 --- a/arch/arm/mm/flush.c +++ b/arch/arm/mm/flush.c @@ -377,3 +377,22 @@ void __flush_anon_page(struct vm_area_struct *vma, struct page *page, unsigned l */ __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE); } + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +#ifdef CONFIG_HAVE_RCU_TABLE_FREE +static void thp_splitting_flush_sync(void *arg) +{ +} + +void pmdp_splitting_flush(struct vm_area_struct *vma, unsigned long address, + pmd_t *pmdp) +{ + pmd_t pmd = pmd_mksplitting(*pmdp); + VM_BUG_ON(address & ~PMD_MASK); + set_pmd_at(vma->vm_mm, address, pmdp, pmd); + + /* dummy IPI to serialise against fast_gup */ + smp_call_function(thp_splitting_flush_sync, NULL, 1); +} +#endif /* CONFIG_HAVE_RCU_TABLE_FREE */ +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ -- 1.8.1.4 From mboxrd@z Thu Jan 1 00:00:00 1970 From: steve.capper@linaro.org (Steve Capper) Date: Tue, 6 May 2014 16:30:08 +0100 Subject: [RFC PATCH V5 5/6] arm64: mm: Enable HAVE_RCU_TABLE_FREE logic In-Reply-To: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> References: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> Message-ID: <1399390209-1756-6-git-send-email-steve.capper@linaro.org> To: linux-arm-kernel@lists.infradead.org List-Id: linux-arm-kernel.lists.infradead.org In order to implement fast_get_user_pages we need to ensure that the page table walker is protected from page table pages being freed from under it. This patch enables HAVE_RCU_TABLE_FREE, any page table pages belonging to address spaces with multiple users will be call_rcu_sched freed. Meaning that disabling interrupts will block the free and protect the fast gup page walker. Signed-off-by: Steve Capper --- arch/arm64/Kconfig | 1 + arch/arm64/include/asm/tlb.h | 18 ++++++++++++++++-- 2 files changed, 17 insertions(+), 2 deletions(-) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index e759af5..2420390 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -43,6 +43,7 @@ config ARM64 select HAVE_PERF_EVENTS select HAVE_PERF_REGS select HAVE_PERF_USER_STACK_DUMP + select HAVE_RCU_TABLE_FREE select IRQ_DOMAIN select MODULES_USE_ELF_RELA select NO_BOOTMEM diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h index 80e2c08..8e4dde5 100644 --- a/arch/arm64/include/asm/tlb.h +++ b/arch/arm64/include/asm/tlb.h @@ -23,6 +23,20 @@ #include +#include +#include + +#ifdef CONFIG_HAVE_RCU_TABLE_FREE + +#define tlb_remove_entry(tlb, entry) tlb_remove_table(tlb, entry) +static inline void __tlb_remove_table(void *_table) +{ + free_page_and_swap_cache((struct page *)_table); +} +#else +#define tlb_remove_entry(tlb, entry) tlb_remove_page(tlb, entry) +#endif /* CONFIG_HAVE_RCU_TABLE_FREE */ + /* * There's three ways the TLB shootdown code is used: * 1. Unmapping a range of vmas. See zap_page_range(), unmap_region(). @@ -88,7 +102,7 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte, { pgtable_page_dtor(pte); tlb_add_flush(tlb, addr); - tlb_remove_page(tlb, pte); + tlb_remove_entry(tlb, pte); } #ifndef CONFIG_ARM64_64K_PAGES @@ -96,7 +110,7 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp, unsigned long addr) { tlb_add_flush(tlb, addr); - tlb_remove_page(tlb, virt_to_page(pmdp)); + tlb_remove_entry(tlb, virt_to_page(pmdp)); } #endif -- 1.8.1.4 From mboxrd@z Thu Jan 1 00:00:00 1970 From: steve.capper@linaro.org (Steve Capper) Date: Tue, 6 May 2014 16:30:09 +0100 Subject: [RFC PATCH V5 6/6] arm64: mm: Enable RCU fast_gup In-Reply-To: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> References: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> Message-ID: <1399390209-1756-7-git-send-email-steve.capper@linaro.org> To: linux-arm-kernel@lists.infradead.org List-Id: linux-arm-kernel.lists.infradead.org Activate the RCU fast_gup for ARM64. We also need to force THP splits to broadcast an IPI s.t. we block in the fast_gup page walker. As THP splits are comparatively rare, this should not lead to a noticeable performance degradation. Signed-off-by: Steve Capper --- arch/arm64/Kconfig | 3 +++ arch/arm64/include/asm/pgtable.h | 8 +++++++- arch/arm64/mm/flush.c | 19 +++++++++++++++++++ 3 files changed, 29 insertions(+), 1 deletion(-) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 2420390..5168949 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -95,6 +95,9 @@ config GENERIC_CALIBRATE_DELAY config ZONE_DMA def_bool y +config HAVE_RCU_GUP + def_bool y + config ARCH_DMA_ADDR_T_64BIT def_bool y diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index 90c811f..126ed77e 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -244,7 +244,13 @@ static inline pmd_t pte_pmd(pte_t pte) #ifdef CONFIG_TRANSPARENT_HUGEPAGE #define pmd_trans_huge(pmd) (pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT)) #define pmd_trans_splitting(pmd) pte_special(pmd_pte(pmd)) -#endif +#ifdef CONFIG_HAVE_RCU_TABLE_FREE +#define __HAVE_ARCH_PMDP_SPLITTING_FLUSH +struct vm_area_struct; +void pmdp_splitting_flush(struct vm_area_struct *vma, unsigned long address, + pmd_t *pmdp); +#endif /* CONFIG_HAVE_RCU_TABLE_FREE */ +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ #define pmd_young(pmd) pte_young(pmd_pte(pmd)) #define pmd_wrprotect(pmd) pte_pmd(pte_wrprotect(pmd_pte(pmd))) diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c index e4193e3..ddf96c1 100644 --- a/arch/arm64/mm/flush.c +++ b/arch/arm64/mm/flush.c @@ -103,3 +103,22 @@ EXPORT_SYMBOL(flush_dcache_page); */ EXPORT_SYMBOL(flush_cache_all); EXPORT_SYMBOL(flush_icache_range); + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +#ifdef CONFIG_HAVE_RCU_TABLE_FREE +static void thp_splitting_flush_sync(void *arg) +{ +} + +void pmdp_splitting_flush(struct vm_area_struct *vma, unsigned long address, + pmd_t *pmdp) +{ + pmd_t pmd = pmd_mksplitting(*pmdp); + VM_BUG_ON(address & ~PMD_MASK); + set_pmd_at(vma->vm_mm, address, pmdp, pmd); + + /* dummy IPI to serialise against fast_gup */ + smp_call_function(thp_splitting_flush_sync, NULL, 1); +} +#endif /* CONFIG_HAVE_RCU_TABLE_FREE */ +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ -- 1.8.1.4 From mboxrd@z Thu Jan 1 00:00:00 1970 From: cov@codeaurora.org (Christopher Covington) Date: Tue, 13 May 2014 11:31:24 -0400 Subject: [RFC PATCH V5 4/6] arm: mm: Enable RCU fast_gup In-Reply-To: <1399390209-1756-5-git-send-email-steve.capper@linaro.org> References: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> <1399390209-1756-5-git-send-email-steve.capper@linaro.org> Message-ID: <53723ACC.7020500@codeaurora.org> To: linux-arm-kernel@lists.infradead.org List-Id: linux-arm-kernel.lists.infradead.org Hi Steve, On 05/06/2014 11:30 AM, Steve Capper wrote: > Activate the RCU fast_gup for ARM. We also need to force THP splits to > broadcast an IPI s.t. we block in the fast_gup page walker. As THP > splits are comparatively rare, this should not lead to a noticeable > performance degradation. > diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c > index 3387e60..91a2b59 100644 > --- a/arch/arm/mm/flush.c > +++ b/arch/arm/mm/flush.c > @@ -377,3 +377,22 @@ void __flush_anon_page(struct vm_area_struct *vma, struct page *page, unsigned l > */ > __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE); > } > + > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE > +#ifdef CONFIG_HAVE_RCU_TABLE_FREE This is trivia, but I for one find the form #if defined(a) && defined(b) easier to read. (Applies to the A64 version as well). Christopher -- Employee of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by the Linux Foundation. From mboxrd@z Thu Jan 1 00:00:00 1970 From: steve.capper@linaro.org (Steve Capper) Date: Wed, 14 May 2014 09:34:16 +0100 Subject: [RFC PATCH V5 4/6] arm: mm: Enable RCU fast_gup In-Reply-To: <53723ACC.7020500@codeaurora.org> References: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> <1399390209-1756-5-git-send-email-steve.capper@linaro.org> <53723ACC.7020500@codeaurora.org> Message-ID: To: linux-arm-kernel@lists.infradead.org List-Id: linux-arm-kernel.lists.infradead.org On 13 May 2014 16:31, Christopher Covington wrote: > Hi Steve, > > On 05/06/2014 11:30 AM, Steve Capper wrote: >> Activate the RCU fast_gup for ARM. We also need to force THP splits to >> broadcast an IPI s.t. we block in the fast_gup page walker. As THP >> splits are comparatively rare, this should not lead to a noticeable >> performance degradation. > >> diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c >> index 3387e60..91a2b59 100644 >> --- a/arch/arm/mm/flush.c >> +++ b/arch/arm/mm/flush.c >> @@ -377,3 +377,22 @@ void __flush_anon_page(struct vm_area_struct *vma, struct page *page, unsigned l >> */ >> __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE); >> } >> + >> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE >> +#ifdef CONFIG_HAVE_RCU_TABLE_FREE > > This is trivia, but I for one find the form #if defined(a) && defined(b) > easier to read. (Applies to the A64 version as well). > Thank you Christopher, I agree that looks nicer. Cheers, -- Steve > Christopher > > -- > Employee of Qualcomm Innovation Center, Inc. > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, > hosted by the Linux Foundation. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-we0-f181.google.com (mail-we0-f181.google.com [74.125.82.181]) by kanga.kvack.org (Postfix) with ESMTP id 62B99829A8 for ; Tue, 6 May 2014 11:30:32 -0400 (EDT) Received: by mail-we0-f181.google.com with SMTP id w61so2194634wes.26 for ; Tue, 06 May 2014 08:30:31 -0700 (PDT) Received: from mail-wg0-f52.google.com (mail-wg0-f52.google.com [74.125.82.52]) by mx.google.com with ESMTPS id a2si4652014wix.9.2014.05.06.08.30.30 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 06 May 2014 08:30:30 -0700 (PDT) Received: by mail-wg0-f52.google.com with SMTP id l18so8102101wgh.23 for ; Tue, 06 May 2014 08:30:30 -0700 (PDT) From: Steve Capper Subject: [RFC PATCH V5 3/6] arm: mm: Enable HAVE_RCU_TABLE_FREE logic Date: Tue, 6 May 2014 16:30:06 +0100 Message-Id: <1399390209-1756-4-git-send-email-steve.capper@linaro.org> In-Reply-To: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> References: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> Sender: owner-linux-mm@kvack.org List-ID: To: linux-arm-kernel@lists.infradead.org, catalin.marinas@arm.com, linux@arm.linux.org.uk, linux-arch@vger.kernel.org, linux-mm@kvack.org Cc: will.deacon@arm.com, gary.robertson@linaro.org, christoffer.dall@linaro.org, peterz@infradead.org, anders.roxell@linaro.org, akpm@linux-foundation.org, Steve Capper In order to implement fast_get_user_pages we need to ensure that the page table walker is protected from page table pages being freed from under it. One way to achieve this is to have the walker disable interrupts, and rely on IPIs from the TLB flushing code blocking before the page table pages are freed. On some ARM platforms we have hardware TLB invalidation, thus the TLB flushing code won't necessarily broadcast IPIs. Also spuriously broadcasting IPIs can hurt system performance if done too often. This problem has already been solved on PowerPC and Sparc by batching up page table pages belonging to more than one mm_user, then scheduling an rcu_sched callback to free the pages. If one were to disable interrupts, that would delay the scheduling interrupts thus block the page table pages being freed. This logic has also been promoted to core code and is activated when one enables HAVE_RCU_TABLE_FREE. This patch enables HAVE_RCU_TABLE_FREE and incorporates it into the existing ARM TLB logic. Signed-off-by: Steve Capper --- arch/arm/Kconfig | 1 + arch/arm/include/asm/tlb.h | 38 ++++++++++++++++++++++++++++++++++++-- 2 files changed, 37 insertions(+), 2 deletions(-) diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index db3c541..6cfdb3b 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -59,6 +59,7 @@ config ARM select HAVE_PERF_EVENTS select HAVE_PERF_REGS select HAVE_PERF_USER_STACK_DUMP + select HAVE_RCU_TABLE_FREE if (SMP && CPU_V7) select HAVE_REGS_AND_STACK_ACCESS_API select HAVE_SYSCALL_TRACEPOINTS select HAVE_UID16 diff --git a/arch/arm/include/asm/tlb.h b/arch/arm/include/asm/tlb.h index f1a0dac..3cadb72 100644 --- a/arch/arm/include/asm/tlb.h +++ b/arch/arm/include/asm/tlb.h @@ -35,12 +35,39 @@ #define MMU_GATHER_BUNDLE 8 +#ifdef CONFIG_HAVE_RCU_TABLE_FREE +static inline void __tlb_remove_table(void *_table) +{ + free_page_and_swap_cache((struct page *)_table); +} + +struct mmu_table_batch { + struct rcu_head rcu; + unsigned int nr; + void *tables[0]; +}; + +#define MAX_TABLE_BATCH \ + ((PAGE_SIZE - sizeof(struct mmu_table_batch)) / sizeof(void *)) + +extern void tlb_table_flush(struct mmu_gather *tlb); +extern void tlb_remove_table(struct mmu_gather *tlb, void *table); + +#define tlb_remove_entry(tlb, entry) tlb_remove_table(tlb, entry) +#else +#define tlb_remove_entry(tlb, entry) tlb_remove_page(tlb, entry) +#endif /* CONFIG_HAVE_RCU_TABLE_FREE */ + /* * TLB handling. This allows us to remove pages from the page * tables, and efficiently handle the TLB issues. */ struct mmu_gather { struct mm_struct *mm; +#ifdef CONFIG_HAVE_RCU_TABLE_FREE + struct mmu_table_batch *batch; + unsigned int need_flush; +#endif unsigned int fullmm; struct vm_area_struct *vma; unsigned long start, end; @@ -101,6 +128,9 @@ static inline void __tlb_alloc_page(struct mmu_gather *tlb) static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb) { tlb_flush(tlb); +#ifdef CONFIG_HAVE_RCU_TABLE_FREE + tlb_table_flush(tlb); +#endif } static inline void tlb_flush_mmu_free(struct mmu_gather *tlb) @@ -129,6 +159,10 @@ tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned long start tlb->pages = tlb->local; tlb->nr = 0; __tlb_alloc_page(tlb); + +#ifdef CONFIG_HAVE_RCU_TABLE_FREE + tlb->batch = NULL; +#endif } static inline void @@ -205,7 +239,7 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte, tlb_add_flush(tlb, addr + SZ_1M); #endif - tlb_remove_page(tlb, pte); + tlb_remove_entry(tlb, pte); } static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp, @@ -213,7 +247,7 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp, { #ifdef CONFIG_ARM_LPAE tlb_add_flush(tlb, addr); - tlb_remove_page(tlb, virt_to_page(pmdp)); + tlb_remove_entry(tlb, virt_to_page(pmdp)); #endif } -- 1.8.1.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-f173.google.com (mail-wi0-f173.google.com [209.85.212.173]) by kanga.kvack.org (Postfix) with ESMTP id E8C06829AA for ; Tue, 6 May 2014 11:30:36 -0400 (EDT) Received: by mail-wi0-f173.google.com with SMTP id bs8so7515592wib.0 for ; Tue, 06 May 2014 08:30:36 -0700 (PDT) Received: from mail-we0-f180.google.com (mail-we0-f180.google.com [74.125.82.180]) by mx.google.com with ESMTPS id hj16si4643786wib.59.2014.05.06.08.30.35 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 06 May 2014 08:30:35 -0700 (PDT) Received: by mail-we0-f180.google.com with SMTP id t61so3855218wes.39 for ; Tue, 06 May 2014 08:30:35 -0700 (PDT) From: Steve Capper Subject: [RFC PATCH V5 6/6] arm64: mm: Enable RCU fast_gup Date: Tue, 6 May 2014 16:30:09 +0100 Message-Id: <1399390209-1756-7-git-send-email-steve.capper@linaro.org> In-Reply-To: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> References: <1399390209-1756-1-git-send-email-steve.capper@linaro.org> Sender: owner-linux-mm@kvack.org List-ID: To: linux-arm-kernel@lists.infradead.org, catalin.marinas@arm.com, linux@arm.linux.org.uk, linux-arch@vger.kernel.org, linux-mm@kvack.org Cc: will.deacon@arm.com, gary.robertson@linaro.org, christoffer.dall@linaro.org, peterz@infradead.org, anders.roxell@linaro.org, akpm@linux-foundation.org, Steve Capper Activate the RCU fast_gup for ARM64. We also need to force THP splits to broadcast an IPI s.t. we block in the fast_gup page walker. As THP splits are comparatively rare, this should not lead to a noticeable performance degradation. Signed-off-by: Steve Capper --- arch/arm64/Kconfig | 3 +++ arch/arm64/include/asm/pgtable.h | 8 +++++++- arch/arm64/mm/flush.c | 19 +++++++++++++++++++ 3 files changed, 29 insertions(+), 1 deletion(-) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 2420390..5168949 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -95,6 +95,9 @@ config GENERIC_CALIBRATE_DELAY config ZONE_DMA def_bool y +config HAVE_RCU_GUP + def_bool y + config ARCH_DMA_ADDR_T_64BIT def_bool y diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index 90c811f..126ed77e 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -244,7 +244,13 @@ static inline pmd_t pte_pmd(pte_t pte) #ifdef CONFIG_TRANSPARENT_HUGEPAGE #define pmd_trans_huge(pmd) (pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT)) #define pmd_trans_splitting(pmd) pte_special(pmd_pte(pmd)) -#endif +#ifdef CONFIG_HAVE_RCU_TABLE_FREE +#define __HAVE_ARCH_PMDP_SPLITTING_FLUSH +struct vm_area_struct; +void pmdp_splitting_flush(struct vm_area_struct *vma, unsigned long address, + pmd_t *pmdp); +#endif /* CONFIG_HAVE_RCU_TABLE_FREE */ +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ #define pmd_young(pmd) pte_young(pmd_pte(pmd)) #define pmd_wrprotect(pmd) pte_pmd(pte_wrprotect(pmd_pte(pmd))) diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c index e4193e3..ddf96c1 100644 --- a/arch/arm64/mm/flush.c +++ b/arch/arm64/mm/flush.c @@ -103,3 +103,22 @@ EXPORT_SYMBOL(flush_dcache_page); */ EXPORT_SYMBOL(flush_cache_all); EXPORT_SYMBOL(flush_icache_range); + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +#ifdef CONFIG_HAVE_RCU_TABLE_FREE +static void thp_splitting_flush_sync(void *arg) +{ +} + +void pmdp_splitting_flush(struct vm_area_struct *vma, unsigned long address, + pmd_t *pmdp) +{ + pmd_t pmd = pmd_mksplitting(*pmdp); + VM_BUG_ON(address & ~PMD_MASK); + set_pmd_at(vma->vm_mm, address, pmdp, pmd); + + /* dummy IPI to serialise against fast_gup */ + smp_call_function(thp_splitting_flush_sync, NULL, 1); +} +#endif /* CONFIG_HAVE_RCU_TABLE_FREE */ +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ -- 1.8.1.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org