Message-ID: <97489e94-ea4e-40a3-9e56-d5f7d1219e81@arm.com>
Date: Tue, 12 Dec 2023 11:57:44 +0000
Subject: Re: [PATCH v3 02/15] mm: Batch-clear PTE ranges during zap_pte_range()
From: Ryan Roberts
To: Alistair Popple
Cc: Catalin Marinas, Will Deacon, Ard Biesheuvel, Marc Zyngier, Oliver Upton,
 James Morse, Suzuki K Poulose, Zenghui Yu, Andrey Ryabinin,
 Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino,
 Andrew Morton, Anshuman Khandual, Matthew Wilcox, Yu Zhao, Mark Rutland,
 David Hildenbrand, Kefeng Wang, John Hubbard, Zi Yan,
 Barry Song <21cnbao@gmail.com>, Yang Shi,
 linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
References: <20231204105440.61448-1-ryan.roberts@arm.com>
 <20231204105440.61448-3-ryan.roberts@arm.com>
 <87h6kta3ap.fsf@nvdebian.thelocal>
In-Reply-To: <87h6kta3ap.fsf@nvdebian.thelocal>

On 08/12/2023 01:30, Alistair Popple wrote:
>
> Ryan Roberts writes:
>
>> Convert zap_pte_range() to clear a set of ptes in a batch. A given batch
>> maps a physically contiguous block of memory, all belonging to the same
>> folio. This will likely improve performance by a tiny amount due to
>> removing duplicate calls to mark the folio dirty and accessed. And also
>> provides us with a future opportunity to batch the rmap removal.
>>
>> However, the primary motivation for this change is to reduce the number
>> of tlb maintenance operations that the arm64 backend has to perform
>> during exit and other syscalls that cause zap_pte_range() (e.g. munmap,
>> madvise(DONTNEED), etc.), as it is about to add transparent support for
>> the "contiguous bit" in its ptes. By clearing ptes using the new
>> clear_ptes() API, the backend doesn't have to perform an expensive
>> unfold operation when a PTE being cleared is part of a contpte block.
>> Instead it can just clear the whole block immediately.
>>
>> This change addresses the core-mm refactoring only, and introduces
>> clear_ptes() with a default implementation that calls
>> ptep_get_and_clear_full() for each pte in the range. Note that this API
>> returns the pte at the beginning of the batch, but with the dirty and
>> young bits set if ANY of the ptes in the cleared batch had those bits
>> set; this information is applied to the folio by the core-mm. Given the
>> batch is garranteed to cover only a single folio, collapsing this state
>
> Nit: s/garranteed/guaranteed/
>
>> does not lose any useful information.
>>
>> A separate change will implement clear_ptes() in the arm64 backend to
>> realize the performance improvement as part of the work to enable
>> contpte mappings.
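[Aside for readers following the thread: the batching semantics described
above -- return the first pte of the batch, with dirty/young set if ANY
cleared entry had them -- can be sketched in user space roughly as below.
The pte_t typedef, bit positions, and helper here are simplified stand-ins,
not the kernel's real definitions.]

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins for the kernel's pte representation. */
typedef uint64_t pte_t;
#define PTE_DIRTY (1ull << 0)
#define PTE_YOUNG (1ull << 1)

/* Stand-in for ptep_get_and_clear_full(): read the entry and zero it. */
static pte_t ptep_get_and_clear_full(pte_t *ptep)
{
	pte_t pte = *ptep;
	*ptep = 0;
	return pte;
}

/*
 * Sketch of the generic clear_ptes() fallback described above: clear nr
 * consecutive entries, returning the first pte but with dirty/young set
 * if any cleared entry in the batch had them set.
 */
static pte_t clear_ptes(pte_t *ptep, unsigned int nr)
{
	pte_t pte = ptep_get_and_clear_full(ptep);
	unsigned int i;

	for (i = 1; i < nr; i++) {
		pte_t tmp = ptep_get_and_clear_full(ptep + i);

		pte |= tmp & (PTE_DIRTY | PTE_YOUNG);
	}
	return pte;
}
```

Because the batch covers a single folio, collapsing the accessed/dirty
state this way is exactly the information the core-mm needs, no more.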
>>
>> Signed-off-by: Ryan Roberts
>> ---
>>  include/asm-generic/tlb.h |  9 ++++++
>>  include/linux/pgtable.h   | 26 ++++++++++++++++
>>  mm/memory.c               | 63 ++++++++++++++++++++++++++-------------
>>  mm/mmu_gather.c           | 14 +++++++++
>>  4 files changed, 92 insertions(+), 20 deletions(-)
>
>
>
>> diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
>> index 4f559f4ddd21..57b4d5f0dfa4 100644
>> --- a/mm/mmu_gather.c
>> +++ b/mm/mmu_gather.c
>> @@ -47,6 +47,20 @@ static bool tlb_next_batch(struct mmu_gather *tlb)
>>  	return true;
>>  }
>>
>> +unsigned int tlb_get_guaranteed_space(struct mmu_gather *tlb)
>> +{
>> +	struct mmu_gather_batch *batch = tlb->active;
>> +	unsigned int nr_next = 0;
>> +
>> +	/* Allocate next batch so we can guarrantee at least one batch. */
>> +	if (tlb_next_batch(tlb)) {
>> +		tlb->active = batch;
>
> Rather than calling tlb_next_batch(tlb) and then undoing some of what it
> does I think it would be clearer to factor out the allocation part of
> tlb_next_batch(tlb) into a separate function (eg. tlb_alloc_batch) that
> you can call from both here and tlb_next_batch().

As per my email against patch 1, I have some perf regressions to iron out for
microbenchmarks; one issue is that this code forces the allocation of a page
for a batch even when we are only modifying a single pte (which would
previously fit in the embedded batch). So I've renamed this function to
tlb_reserve_space(int nr). If it already has enough room, it will just return
immediately. Otherwise it will keep calling tlb_next_batch() in a loop until
space has been allocated. Then after the loop we set tlb->active back to the
original batch. Given the new potential need to loop a couple of times, and
the need to build up that linked list, I think it works nicely without
refactoring tlb_next_batch().

> Otherwise I think this overall direction looks better than trying to
> play funny games in the arch layer as it's much clearer what's going on
> to core-mm code.
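[To make the reworked behaviour concrete, here is a rough user-space sketch
of the tlb_reserve_space() loop described above. The struct layout, batch
capacity, return type, and allocation stand-in are all illustrative
assumptions for the sketch, not the real mmu_gather code.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Simplified model of the mmu_gather batch list. */
struct mmu_gather_batch {
	struct mmu_gather_batch *next;
	unsigned int nr;	/* entries currently stored */
	unsigned int max;	/* capacity of this batch */
};

struct mmu_gather {
	struct mmu_gather_batch *active;
};

/*
 * Stand-in for the kernel's tlb_next_batch(): allocate a new batch, link
 * it after the active one and make it active. Returns false on failure.
 */
static bool tlb_next_batch(struct mmu_gather *tlb)
{
	struct mmu_gather_batch *batch = malloc(sizeof(*batch));

	if (!batch)
		return false;
	batch->next = NULL;
	batch->nr = 0;
	batch->max = 8;		/* arbitrary capacity for the sketch */
	tlb->active->next = batch;
	tlb->active = batch;
	return true;
}

/*
 * Sketch of the reworked helper: return immediately if there is already
 * room for nr entries; otherwise keep allocating batches until there is,
 * then restore the original active batch so its free slots are used first.
 */
static bool tlb_reserve_space(struct mmu_gather *tlb, unsigned int nr)
{
	struct mmu_gather_batch *batch = tlb->active;
	unsigned int avail = batch->max - batch->nr;

	while (avail < nr) {
		if (!tlb_next_batch(tlb))
			return false;
		avail += tlb->active->max;
	}
	tlb->active = batch;
	return true;
}
```

The key point is the final `tlb->active = batch;`: the linked list of extra
batches is built up, but filling resumes from the batch that was active on
entry, so no embedded-batch space is wasted.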
>
> - Alistair
>
>> +		nr_next = batch->next->max;
>> +	}
>> +
>> +	return batch->max - batch->nr + nr_next;
>> +}
>> +
>>  #ifdef CONFIG_SMP
>>  static void tlb_flush_rmap_batch(struct mmu_gather_batch *batch, struct vm_area_struct *vma)
>>  {