From mboxrd@z Thu Jan 1 00:00:00 1970
From: Rik van Riel
To: linux-kernel@vger.kernel.org
Cc: kernel-team@meta.com, linux-mm@kvack.org, david@kernel.org,
 willy@infradead.org, surenb@google.com, hannes@cmpxchg.org,
 ljs@kernel.org, ziy@nvidia.com, usama.arif@linux.dev
Subject: [00/45 RFC PATCH] 1GB superpageblock memory allocation
Date: Thu, 30 Apr 2026 16:20:29 -0400
Message-ID: <20260430202233.111010-1-riel@surriel.com>
X-Mailer: git-send-email 2.52.0
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Neither of those is a great solution, given that modern servers tend
to be large, often run multiple workloads simultaneously, and each
workload wants something else.

To address that issue, this patch series divides memory not just into
2MB page blocks, but also into PUD sized superpageblocks, and
aggressively tries to steer unmovable, reclaimable, and highatomic
allocations into those superpageblocks that have already been
"tainted" by such allocations.

The goal is to leave as many 1GB superpageblocks as possible used by
only movable allocations, so they can be easily defragmented for
either regular PMD sized huge pages, or for PUD sized huge pages.
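
To make the steering idea concrete, here is a minimal userspace sketch
in C. It is not code from this series: every name in it (struct spb,
spb_pick, spb_alloc_block, the MT_* values) is hypothetical, the
linear scan stands in for whatever free lists the real code uses, and
the 2MB/1GB sizes assume x86-64 with 4kB base pages. The preference of
movable allocations for clean superpageblocks is also an assumption of
the sketch, not something stated above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constants; sizes assume x86-64 with 4kB base pages. */
#define PAGEBLOCK_SHIFT       21	/* 2MB pageblock */
#define SUPERPAGEBLOCK_SHIFT  30	/* 1GB, PUD sized */
#define BLOCKS_PER_SPB \
	(1UL << (SUPERPAGEBLOCK_SHIFT - PAGEBLOCK_SHIFT))	/* 512 */

/* Hypothetical allocation types, loosely modeled on migratetypes. */
enum mt { MT_UNMOVABLE, MT_RECLAIMABLE, MT_HIGHATOMIC, MT_MOVABLE };

struct spb {
	unsigned int free_blocks;	/* free 2MB pageblocks remaining */
	int tainted;			/* 1 once any non-movable allocation lands here */
};

static int is_movable(enum mt type)
{
	return type == MT_MOVABLE;
}

/*
 * Pick a superpageblock for one pageblock sized allocation.
 * Non-movable types prefer an already tainted superpageblock and only
 * fall back to a clean one when no tainted one has space. Movable
 * allocations prefer clean ones (an assumption of this sketch).
 * Returns an index into spbs, or -1 if nothing has free space.
 */
static int spb_pick(const struct spb *spbs, size_t n, enum mt type)
{
	int want_tainted = !is_movable(type);
	int fallback = -1;

	for (size_t i = 0; i < n; i++) {
		if (!spbs[i].free_blocks)
			continue;
		if (spbs[i].tainted == want_tainted)
			return (int)i;
		if (fallback < 0)
			fallback = (int)i;
	}
	return fallback;
}

/* Allocate one pageblock and taint the superpageblock if needed. */
static int spb_alloc_block(struct spb *spbs, size_t n, enum mt type)
{
	int i = spb_pick(spbs, n, type);

	if (i < 0)
		return -1;
	assert(spbs[i].free_blocks > 0);
	spbs[i].free_blocks--;
	if (!is_movable(type))
		spbs[i].tainted = 1;
	return i;
}
```

Starting from three clean superpageblocks, an unmovable allocation in
this sketch taints the first one, a following reclaimable allocation
is steered into that same superpageblock, and a movable allocation
goes to the second, which stays clean.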

This series is still very much a work in progress, with lots of work
left to do, but I am posting it now (ahead of LSF/MM) in the hope of
getting some feedback on whether this looks like the right direction
to go in.

This code has been largely written by AI, then nitpicked over by me
(with some early feedback from Johannes and Usama), and has gone
through that cycle of nitpicks several times.

I am sure there are places left where the code could, and should, be
better.

However, it does seem to work. On my 256GB system, I can run syzkaller
with AI automatically analyzing crashes, examining git history for
potential causes and fixes, etc, using up all of memory.

Out of the 238 superpageblocks in the normal zone, normally fewer than
20 get used for unmovable and reclaimable allocations.

I can allocate 50 1GB huge pages without driving the workload out of
memory. Presumably I could allocate a lot more if I shut down the
workload.

A number of the patches, especially later in the series, are fixes
that should be folded into earlier patches. I hope to do that
soon-ish.

In the meantime, these are probably the patches to focus on when
reviewing the ideas behind this series:
- mm: page_alloc: track actual page contents in pageblock flags
- mm: page_alloc: add per-superpageblock free lists
- mm: page_alloc: add background superpageblock defragmentation worker
- mm: page_alloc: add within-superpageblock compaction for clean superpageblocks
- mm: page_alloc: superpageblock-aware contiguous and higher order allocation

Based on 1c9982b49613