From: Ritesh Harjani (IBM)
To: Kairui Song
Cc: linux-mm@kvack.org, Madhavan Srinivasan, Michael Ellerman,
    Nicholas Piggin, Christophe Leroy, Andrew Morton, Chris Li,
    Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Youngjun Park,
    David Hildenbrand, linuxppc-dev@lists.ozlabs.org,
    linux-kernel@vger.kernel.org, Sayali Patil
Subject: Re: [PATCH v4 2/3] mm, swap: allow archs to override SWAP_NR_ORDERS via ARCH_MAX_PMD_ORDER
Date: Wed, 24 Jun 2026 16:45:21 +0530

Kairui Song writes:

> On Fri, Jun 19, 2026 at 12:42 PM Ritesh Harjani (IBM)
> wrote:
>>
>> SWAP_NR_ORDERS sizes a few small bounded arrays inside THP swap
>> allocator code (nofull/frag cluster lists, percpu_swap_cluster's
>> si/offset arrays, next array for rotational device). This currently
>> expands to PMD_ORDER+1, which only works when PMD_ORDER is a compile
>> time constant.
>>
>> However on architectures like PowerPC Book3S64, PMD_ORDER is a runtime
>> variable which depends upon which MMU is selected (Radix / Hash), so in
>> that case, PMD_ORDER cannot be used to size the static arrays.
>>
>> This patch provides an optional ARCH_MAX_PMD_ORDER (upper-bound)
>> override for such architectures. The memory overhead on enabling this
>> override is negligible. Even if we make SWAP_NR_ORDERS runtime alloc,
>> default slab padding could cause some memory waste.
>> Also we lose the per-cpu cacheline benefits (for percpu_swap_cluster)
>> because it might cost an extra cacheline indirection overhead in
>> swap_alloc_fast() for fetching si[order]/offset[order]. Note that a
>> fully runtime SWAP_NR_ORDERS was considered in the previous version but
>> was dropped for this reason [1]
>>
>> [1]: https://lore.kernel.org/linuxppc-dev/pl1zdksc.ritesh.list@gmail.com/
>>
>> Suggested-by: YoungJun Park
>> Signed-off-by: Ritesh Harjani (IBM)
>> ---
>>  arch/powerpc/include/asm/book3s/64/pgtable.h |  7 +++++++
>>  include/linux/swap.h                         | 12 +++++++++++-
>>  2 files changed, 18 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> index e67e64ac6e8c..7f22d5d5fbdf 100644
>> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
>> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> @@ -204,6 +204,13 @@ extern unsigned long __pmd_frag_size_shift;
>>  #define MAX_PTRS_PER_PGD        (1 << (H_PGD_INDEX_SIZE > RADIX_PGD_INDEX_SIZE ? \
>>                                         H_PGD_INDEX_SIZE : RADIX_PGD_INDEX_SIZE))
>>
>> +/*
>> + * Compile-time upper bound on PMD_ORDER across hash and radix MMUs.
>> + * Used by THP SWAP code. Check include/linux/swap.h
>> + */
>> +#define ARCH_MAX_PMD_ORDER      ((H_PTE_INDEX_SIZE > RADIX_PTE_INDEX_SIZE) ? \
>> +                                 H_PTE_INDEX_SIZE : RADIX_PTE_INDEX_SIZE)
>
> Hi Ritesh
>
> So swap is the only user of this macro? Will there be any other users?
>

No users other than swap so far.

> I see that due to the percpu cluster design, it's hard to use a
> flexible array here. We will probably get rid of the fixed percpu
> cluster design in the future. By then should we be able to get rid of
> this macro?
>

The earlier RFC version [1] did size these arrays at runtime, but as
stated in the commit message, that adds unnecessary complexity. And yes,
the per-cpu usage there is what made me rethink the whole approach (as
Youngjun also suggested).
Allocating the si/offset arrays of percpu_swap_cluster at runtime means
the fast path also loses the cacheline benefit it otherwise has.

[1]: https://lore.kernel.org/linux-mm/19688ab5ab8017467749e003cf630c76a4b2b198.1781000840.git.ritesh.list@gmail.com/

Sure. I am not familiar with the plans for moving away from the fixed
per-cpu cluster design; if you could share some details, that would be
helpful. But essentially yes, the per-cpu swap cluster was the main
reason we looked at adding ARCH_MAX_PMD_ORDER for PowerPC.

Also note that this costs no additional memory compared to the runtime
solution, since kmalloc allocations of these structures added some
padding anyway.

> I'm OK with this approach though. This current design has no negative
> effect on other archs so no reason to block it,

Sure. Thanks!

> just wondering if this can be made simpler in the future :)

Well, it's relative. I felt this is a simpler design than the earlier
RFC version [1]. Still, could you please share some more details on
your concerns?

Having said that, if we get rid of the fixed per-cpu design in the
future, I am happy to revisit this and see whether the macro can be
dropped, perhaps by switching to runtime allocations.

Thanks for looking into this!
-ritesh