From: Ritesh Harjani (IBM)
To: YoungJun Park
Cc: linux-mm@kvack.org, Madhavan Srinivasan, Michael Ellerman,
    Nicholas Piggin, Christophe Leroy, Andrew Morton, Chris Li,
    Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, Barry Song,
    David Hildenbrand, linuxppc-dev@lists.ozlabs.org,
    linux-kernel@vger.kernel.org, Sayali Patil
Subject: Re: [RFC 0/4] mm, swap: Enable THP SWAP for PowerPC Book3S64
Date: Wed, 10 Jun 2026 11:00:43 +0530
X-Mailing-List: linux-kernel@vger.kernel.org

YoungJun Park writes:

> On Tue, Jun 09, 2026 at 06:49:30PM +0530, Ritesh Harjani (IBM) wrote:
>> On PowerPC Book3S64, MMU is selected at runtime, so macros like PMD_SHIFT are
>> effectively runtime variables in the Book3S64 code. THP swap code uses these
>> macros for e.g. to size some of its array data structures based on PMD_ORDER.
>> This patch series makes that usage dependent on the runtime variable.
>>
>> Sayali did some performance runs of this on Book3S64 with Radix and it gives
>> 40-50% performance improvement. We also plan to run it with Hash, will soon
>> update the results.
>>
>> Note that this patch series is based out of linux-next (next-20260608).
>>
>> Ritesh Harjani (IBM) (4):
>>   include/linux/swap.h: Remove unused leftovers
>>   mm, swap: make SWAPFILE_CLUSTER runtime
>>   mm, swap: make SWAP_NR_ORDERS runtime
>>   powerpc: Kconfig: Enable THP_SWAP on Book3S64
>>
>>  arch/powerpc/platforms/Kconfig.cputype |   1 +
>>  include/linux/swap.h                   |  17 +---
>>  mm/swap.h                              |   5 +-
>>  mm/swap_table.h                        |   6 +-
>>  mm/swapfile.c                          | 132 ++++++++++++++++++-------
>>  5 files changed, 106 insertions(+), 55 deletions(-)
>>
>> --
>> 2.39.5
>>
> Hello!
> Thanks for taking a look at this.
> Instead of making SWAP_NR_ORDERS fully runtime, could we set it to the max
> PMD_ORDER possible on PowerPC Book3S64 as a compile-time constant in the
> swap.h ifdef block? (My assumtion is PMD_ORDER max not too big.)
>
> I think the general runtime version adds cost. It impacts all other archs.
> percpu_swap_cluster needs a runtime alloc,
> the si/offset and nonfull/frag arrays become separate pointers, and some
> accesses get one more indirection. And for nr_orders=1, the allocation
> itself is just waste.
>
> With a compile-time possible max constant, the only downside is some acceptable amount of
> wasted bytes per CPU / per device on Book3S64 (the unused entries in the swap
> offset cache and the nonfull/frag lists), with no perf impact. the perf
> improvement comes from THP swap itself, right? Other arches see no
> impact at all.
>

I looked into the memory waste comparison between static and runtime
allocation. The wastage for the per-CPU data structures (with Radix MMU)
will be 0, because we use kcalloc_node(), which will use the kmalloc-64
slab, so slab padding would add some memory waste anyway. So it is as
good as using static arrays with some max PMD_ORDER for the
percpu_swap_cluster. For the other lists you mentioned, the static
version only adds a one-time, negligible cost, which does not justify
making SWAP_NR_ORDERS runtime.

> patch 2 looks fine as is. SWAPFILE_CLUSTER backs much bigger per-cluster
> arrays, so runtime sizing makes sense there, and it looks like no impact to
> other arches or the current code.
>

Yup, that makes sense. So, unless someone else raises an objection, I
will give this a try instead of patch 3 in this series and will get back
with v2.

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index e67e64ac6e8c..57abd8b2c9a1 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -204,6 +204,9 @@ extern unsigned long __pmd_frag_size_shift;
 #define MAX_PTRS_PER_PGD (1 << (H_PGD_INDEX_SIZE > RADIX_PGD_INDEX_SIZE ? \
                           H_PGD_INDEX_SIZE : RADIX_PGD_INDEX_SIZE))
 
+#define ARCH_MAX_PMD_ORDER ((H_PTE_INDEX_SIZE > RADIX_PTE_INDEX_SIZE) ? \
+                            H_PTE_INDEX_SIZE : RADIX_PTE_INDEX_SIZE)
+
 /* PMD_SHIFT determines what a second-level page table entry can map */
 #define PMD_SHIFT (PAGE_SHIFT + PTE_INDEX_SIZE)
 #define PMD_SIZE (1UL << PMD_SHIFT)
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 46c25523d7b8..5f1451f8f266 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -224,10 +224,14 @@ enum {
 #define SWAP_ENTRY_INVALID 0
 
 #ifdef CONFIG_THP_SWAP
+#ifdef ARCH_MAX_PMD_ORDER
+#define SWAP_NR_ORDERS (ARCH_MAX_PMD_ORDER + 1)
+#else
 #define SWAP_NR_ORDERS (PMD_ORDER + 1)
+#endif /* ARCH_MAX_PMD_ORDER */
 #else
 #define SWAP_NR_ORDERS 1
-#endif
+#endif /* CONFIG_THP_SWAP */

-ritesh