From: Ryan Roberts <ryan.roberts@arm.com>
To: Andrew Morton, "Matthew Wilcox (Oracle)", Yu Zhao, "Yin, Fengwei"
Cc: Ryan Roberts <ryan.roberts@arm.com>, linux-mm@kvack.org,
	linux-arm-kernel@lists.infradead.org
Subject: [RFC v2 PATCH 06/17] mm: Allocate large folios for anonymous memory
Date: Fri, 14 Apr 2023 14:02:52 +0100
Message-Id: <20230414130303.2345383-7-ryan.roberts@arm.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20230414130303.2345383-1-ryan.roberts@arm.com>
References: <20230414130303.2345383-1-ryan.roberts@arm.com>

Add the machinery to determine what order of folio to allocate within
do_anonymous_page() and to deal with racing faults to the same region.

For now, the maximum order is set to 4. This should probably be set
per-vma based on various factors, and adjusted dynamically.

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 mm/memory.c | 154 ++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 138 insertions(+), 16 deletions(-)
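To make the geometry concrete before reading the diff:
calc_anon_folio_order_alloc() below walks candidate orders downwards,
aligning the fault address to the folio's natural boundary and rejecting
any order whose folio would straddle the vma. The following minimal
userspace sketch (mock vma type and made-up addresses, not kernel code)
shows just that alignment and bounds math:

#include <stdio.h>

#define PAGE_SHIFT	12
#define ALIGN_DOWN(x, a)	((x) & ~((unsigned long)(a) - 1))

struct mock_vma {
	unsigned long vm_start;
	unsigned long vm_end;
};

/*
 * Return the naturally aligned folio base address for this order, or 0
 * if the folio would fall outside [vm_start, vm_end).
 */
static unsigned long folio_base_for_order(const struct mock_vma *vma,
					  unsigned long fault_addr, int order)
{
	unsigned long nr = 1UL << order;
	unsigned long addr = ALIGN_DOWN(fault_addr, nr << PAGE_SHIFT);

	if (addr < vma->vm_start || addr + (nr << PAGE_SHIFT) > vma->vm_end)
		return 0;
	return addr;
}

int main(void)
{
	struct mock_vma vma = { 0x7f0000001000UL, 0x7f0000200000UL };
	unsigned long fault = 0x7f0000042000UL;
	int order;

	for (order = 4; order >= 0; order--) {
		unsigned long base = folio_base_for_order(&vma, fault, order);

		printf("order %d: %s (base 0x%lx)\n",
		       order, base ? "fits" : "does not fit", base);
	}
	return 0;
}

Because the aligned base moves as the order changes, a large order can
fail the bounds check while a smaller one succeeds, which is why the
loop walks orders from largest to smallest.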
diff --git a/mm/memory.c b/mm/memory.c
index d7e34a8c46aa..f92a28064596 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3030,6 +3030,90 @@ static inline int max_anon_folio_order(struct vm_area_struct *vma)
 	return ANON_FOLIO_ORDER_MAX;
 }
 
+/*
+ * Returns index of first pte that is not none, or nr if all are none.
+ */
+static inline int check_ptes_none(pte_t *pte, int nr)
+{
+	int i;
+
+	for (i = 0; i < nr; i++) {
+		if (!pte_none(*pte++))
+			return i;
+	}
+
+	return nr;
+}
+
+static int calc_anon_folio_order_alloc(struct vm_fault *vmf, int order)
+{
+	/*
+	 * The aim here is to determine what size of folio we should allocate
+	 * for this fault. Factors include:
+	 * - Order must not be higher than `order` upon entry
+	 * - Folio must be naturally aligned within VA space
+	 * - Folio must not breach boundaries of vma
+	 * - Folio must be fully contained inside one pmd entry
+	 * - Folio must not overlap any non-none ptes
+	 *
+	 * Additionally, we do not allow order-1 since this breaks assumptions
+	 * elsewhere in the mm; THP pages must be at least order-2 (since they
+	 * store state up to the 3rd struct page subpage), and these pages must
+	 * be THP in order to correctly use pre-existing THP infrastructure such
+	 * as folio_split().
+	 *
+	 * As a consequence of relying on the THP infrastructure, if the system
+	 * does not support THP, we always fall back to order-0.
+	 *
+	 * Note that the caller may or may not choose to lock the pte. If
+	 * unlocked, the calculation should be considered an estimate that will
+	 * need to be validated under the lock.
+	 */
+
+	struct vm_area_struct *vma = vmf->vma;
+	int nr;
+	unsigned long addr;
+	pte_t *pte;
+	pte_t *first_set = NULL;
+	int ret;
+
+	if (has_transparent_hugepage()) {
+		order = min(order, PMD_SHIFT - PAGE_SHIFT);
+
+		for (; order > 1; order--) {
+			nr = 1 << order;
+			addr = ALIGN_DOWN(vmf->address, nr << PAGE_SHIFT);
+			pte = vmf->pte - ((vmf->address - addr) >> PAGE_SHIFT);
+
+			/* Check vma bounds. */
+			if (addr < vma->vm_start ||
+			    addr + (nr << PAGE_SHIFT) > vma->vm_end)
+				continue;
+
+			/* Ptes covered by order already known to be none. */
+			if (pte + nr <= first_set)
+				break;
+
+			/* Already found set pte in range covered by order. */
+			if (pte <= first_set)
+				continue;
+
+			/* Need to check if all the ptes are none. */
+			ret = check_ptes_none(pte, nr);
+			if (ret == nr)
+				break;
+
+			first_set = pte + ret;
+		}
+
+		if (order == 1)
+			order = 0;
+	} else
+		order = 0;
+
+	return order;
+}
+
 /*
  * Handle write page faults for pages that can be reused in the current vma
  *
@@ -4058,6 +4142,9 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 	struct folio *folio;
 	vm_fault_t ret = 0;
 	pte_t entry;
+	unsigned long addr;
+	int order = max_anon_folio_order(vma);
+	int pgcount = BIT(order);
 
 	/* File mapping without ->vm_ops ? */
 	if (vma->vm_flags & VM_SHARED)
@@ -4099,24 +4186,42 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 			pte_unmap_unlock(vmf->pte, vmf->ptl);
 			return handle_userfault(vmf, VM_UFFD_MISSING);
 		}
-		goto setpte;
+		set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry);
+
+		/* No need to invalidate - it was non-present before */
+		update_mmu_cache(vma, vmf->address, vmf->pte);
+		goto unlock;
 	}
 
-	/* Allocate our own private page. */
+retry:
+	/*
+	 * Estimate the folio order to allocate. We are not under the ptl here
+	 * so this estimate needs to be re-checked later once we have the lock.
+	 */
+	vmf->pte = pte_offset_map(vmf->pmd, vmf->address);
+	order = calc_anon_folio_order_alloc(vmf, order);
+	pte_unmap(vmf->pte);
+
+	/* Allocate our own private folio. */
 	if (unlikely(anon_vma_prepare(vma)))
 		goto oom;
-	folio = vma_alloc_zeroed_movable_folio(vma, vmf->address, 0, 0);
+	folio = try_vma_alloc_movable_folio(vma, vmf->address, order, true);
 	if (!folio)
 		goto oom;
 
+	/* We may have been granted less than we asked for. */
+	order = folio_order(folio);
+	pgcount = BIT(order);
+	addr = ALIGN_DOWN(vmf->address, pgcount << PAGE_SHIFT);
+
 	if (mem_cgroup_charge(folio, vma->vm_mm, GFP_KERNEL))
 		goto oom_free_page;
-	cgroup_throttle_swaprate(&folio->page, GFP_KERNEL);
+	folio_throttle_swaprate(folio, GFP_KERNEL);
 
 	/*
 	 * The memory barrier inside __folio_mark_uptodate makes sure that
-	 * preceding stores to the page contents become visible before
-	 * the set_pte_at() write.
+	 * preceding stores to the folio contents become visible before
+	 * the set_ptes() write.
 	 */
 	__folio_mark_uptodate(folio);
@@ -4125,11 +4230,26 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 	if (vma->vm_flags & VM_WRITE)
 		entry = pte_mkwrite(pte_mkdirty(entry));
 
-	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
-			&vmf->ptl);
-	if (!pte_none(*vmf->pte)) {
-		update_mmu_tlb(vma, vmf->address, vmf->pte);
-		goto release;
+	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, addr, &vmf->ptl);
+
+	/*
+	 * Ensure our estimate above is still correct; we could have raced with
+	 * another thread to service a fault in the region.
+	 */
+	if (unlikely(check_ptes_none(vmf->pte, pgcount) != pgcount)) {
+		pte_t *pte = vmf->pte + ((vmf->address - addr) >> PAGE_SHIFT);
+
+		/* If faulting pte was allocated by another, exit early. */
+		if (order == 0 || !pte_none(*pte)) {
+			update_mmu_tlb(vma, vmf->address, pte);
+			goto release;
+		}
+
+		/* Else try again, with a lower order. */
+		pte_unmap_unlock(vmf->pte, vmf->ptl);
+		folio_put(folio);
+		order--;
+		goto retry;
 	}
 
 	ret = check_stable_address_space(vma->vm_mm);
@@ -4143,14 +4263,16 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 		return handle_userfault(vmf, VM_UFFD_MISSING);
 	}
 
-	inc_mm_counter(vma->vm_mm, MM_ANONPAGES);
-	folio_add_new_anon_rmap(folio, vma, vmf->address);
+	folio_ref_add(folio, pgcount - 1);
+
+	add_mm_counter(vma->vm_mm, MM_ANONPAGES, pgcount);
+	folio_add_new_anon_rmap_range(folio, &folio->page, pgcount, vma, addr);
 	folio_add_lru_vma(folio, vma);
-setpte:
-	set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry);
+
+	set_ptes(vma->vm_mm, addr, vmf->pte, entry, pgcount);
 
 	/* No need to invalidate - it was non-present before */
-	update_mmu_cache(vma, vmf->address, vmf->pte);
+	update_mmu_cache_range(vma, addr, vmf->pte, pgcount);
 unlock:
 	pte_unmap_unlock(vmf->pte, vmf->ptl);
 	return ret;
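The unlocked-estimate/locked-recheck flow above can be sketched in
isolation. The mock below (pte_none() modelled as a zero check on a
plain int array, base address held fixed, folio allocation elided, not
kernel code) collapses the goto-retry into a loop, but keeps the two
rules that matter: shrink the order until no populated pte overlaps the
candidate range, and never return order-1:

#include <stdio.h>

#define MAX_ANON_ORDER	4

/*
 * Mock of check_ptes_none(): index of the first non-zero entry, or nr
 * if all entries are zero (zero stands in for pte_none()).
 */
static int check_ptes_none(const int *pte, int nr)
{
	int i;

	for (i = 0; i < nr; i++) {
		if (pte[i])
			return i;
	}
	return nr;
}

int main(void)
{
	int ptes[1 << MAX_ANON_ORDER] = { 0 };
	int order = MAX_ANON_ORDER;

	ptes[9] = 1;	/* pretend a racing fault populated pte 9 */

	/* Retry loop: shrink the order until the whole range is empty. */
	while (order > 0 && check_ptes_none(ptes, 1 << order) != (1 << order))
		order--;

	/* Mirror the kernel's rule that order-1 folios are not allowed. */
	if (order == 1)
		order = 0;

	printf("selected order %d (%d pages)\n", order, 1 << order);
	return 0;
}

With pte 9 populated, order 4 (16 ptes) fails the recheck but order 3
(8 ptes) passes, so the fault is serviced with an 8-page folio rather
than falling all the way back to order-0.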
-- 
2.25.1