From: Yafang Shao
To: akpm@linux-foundation.org, david@redhat.com, ziy@nvidia.com,
    baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com,
    Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com,
    dev.jain@arm.com, hannes@cmpxchg.org, usamaarif642@gmail.com,
    gutierrez.asier@huawei-partners.com, willy@infradead.org,
    ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org
Cc: bpf@vger.kernel.org, linux-mm@kvack.org, Yafang Shao
Subject: [RFC PATCH v2 1/5] mm: thp: Add a new mode "bpf"
Date: Tue, 20 May 2025 14:04:59 +0800
Message-Id: <20250520060504.20251-2-laoar.shao@gmail.com>
In-Reply-To: <20250520060504.20251-1-laoar.shao@gmail.com>
References: <20250520060504.20251-1-laoar.shao@gmail.com>

Background
----------

Historically, our production environment has always configured THP to never
due to past incidents. This has made system administrators hesitant to
switch to madvise.

New Motivation
--------------

We’ve now identified that AI workloads can achieve significant performance
gains with THP enabled. To balance safety and performance, we aim to allow
THP only for AI services while keeping the global system setting at never.

Proposed Solution
-----------------

Johannes suggested introducing a dedicated mode for this use case [0]. This
approach elegantly solves our problem while avoiding the complexity of
managing BPF alongside other THP modes.

Link: https://lore.kernel.org/linux-mm/20250509164654.GA608090@cmpxchg.org/ [0]
Suggested-by: Johannes Weiner
Signed-off-by: Yafang Shao
---
 include/linux/huge_mm.h |  2 ++
 mm/huge_memory.c        | 65 ++++++++++++++++++++++++++++++++++++-----
 2 files changed, 59 insertions(+), 8 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index e893d546a49f..3b5429f73e6e 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -54,6 +54,7 @@ enum transparent_hugepage_flag {
 	TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG,
 	TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG,
 	TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG,
+	TRANSPARENT_HUGEPAGE_REQ_BPF_FLAG,	/* "bpf" mode */
 };
 
 struct kobject;
@@ -174,6 +175,7 @@ static inline void count_mthp_stat(int order, enum mthp_stat_item item)
 
 extern unsigned long transparent_hugepage_flags;
 extern unsigned long huge_anon_orders_always;
+extern unsigned long huge_anon_orders_bpf;
 extern unsigned long huge_anon_orders_madvise;
 extern unsigned long huge_anon_orders_inherit;
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 47d76d03ce30..8af56ee8d979 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -79,6 +79,7 @@ static atomic_t huge_zero_refcount;
 struct folio *huge_zero_folio __read_mostly;
 unsigned long huge_zero_pfn __read_mostly = ~0UL;
 unsigned long huge_anon_orders_always __read_mostly;
+unsigned long huge_anon_orders_bpf __read_mostly;
 unsigned long huge_anon_orders_madvise __read_mostly;
 unsigned long huge_anon_orders_inherit __read_mostly;
 static bool anon_orders_configured __initdata;
@@ -297,12 +298,15 @@ static ssize_t enabled_show(struct kobject *kobj,
 	const char *output;
 
 	if (test_bit(TRANSPARENT_HUGEPAGE_FLAG, &transparent_hugepage_flags))
-		output = "[always] madvise never";
+		output = "[always] bpf madvise never";
+	else if (test_bit(TRANSPARENT_HUGEPAGE_REQ_BPF_FLAG,
+			  &transparent_hugepage_flags))
+		output = "always [bpf] madvise never";
 	else if (test_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG,
 			  &transparent_hugepage_flags))
-		output = "always [madvise] never";
+		output = "always bpf [madvise] never";
 	else
-		output = "always madvise [never]";
+		output = "always bpf madvise [never]";
 
 	return sysfs_emit(buf, "%s\n", output);
 }
@@ -315,13 +319,20 @@ static ssize_t enabled_store(struct kobject *kobj,
 
 	if (sysfs_streq(buf, "always")) {
 		clear_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG, &transparent_hugepage_flags);
+		clear_bit(TRANSPARENT_HUGEPAGE_REQ_BPF_FLAG, &transparent_hugepage_flags);
 		set_bit(TRANSPARENT_HUGEPAGE_FLAG, &transparent_hugepage_flags);
+	} else if (sysfs_streq(buf, "bpf")) {
+		clear_bit(TRANSPARENT_HUGEPAGE_FLAG, &transparent_hugepage_flags);
+		clear_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG, &transparent_hugepage_flags);
+		set_bit(TRANSPARENT_HUGEPAGE_REQ_BPF_FLAG, &transparent_hugepage_flags);
 	} else if (sysfs_streq(buf, "madvise")) {
 		clear_bit(TRANSPARENT_HUGEPAGE_FLAG, &transparent_hugepage_flags);
+		clear_bit(TRANSPARENT_HUGEPAGE_REQ_BPF_FLAG, &transparent_hugepage_flags);
 		set_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG, &transparent_hugepage_flags);
 	} else if (sysfs_streq(buf, "never")) {
 		clear_bit(TRANSPARENT_HUGEPAGE_FLAG, &transparent_hugepage_flags);
 		clear_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG, &transparent_hugepage_flags);
+		clear_bit(TRANSPARENT_HUGEPAGE_REQ_BPF_FLAG, &transparent_hugepage_flags);
 	} else
 		ret = -EINVAL;
 
@@ -495,13 +506,15 @@ static ssize_t anon_enabled_show(struct kobject *kobj,
 	const char *output;
 
 	if (test_bit(order, &huge_anon_orders_always))
-		output = "[always] inherit madvise never";
+		output = "[always] bpf inherit madvise never";
+	else if (test_bit(order, &huge_anon_orders_bpf))
+		output = "always [bpf] inherit madvise never";
 	else if (test_bit(order, &huge_anon_orders_inherit))
-		output = "always [inherit] madvise never";
+		output = "always bpf [inherit] madvise never";
 	else if (test_bit(order, &huge_anon_orders_madvise))
-		output = "always inherit [madvise] never";
+		output = "always bpf inherit [madvise] never";
 	else
-		output = "always inherit madvise [never]";
+		output = "always bpf inherit madvise [never]";
 
 	return sysfs_emit(buf, "%s\n", output);
 }
@@ -515,25 +528,36 @@ static ssize_t anon_enabled_store(struct kobject *kobj,
 
 	if (sysfs_streq(buf, "always")) {
 		spin_lock(&huge_anon_orders_lock);
+		clear_bit(order, &huge_anon_orders_bpf);
 		clear_bit(order, &huge_anon_orders_inherit);
 		clear_bit(order, &huge_anon_orders_madvise);
 		set_bit(order, &huge_anon_orders_always);
 		spin_unlock(&huge_anon_orders_lock);
+	} else if (sysfs_streq(buf, "bpf")) {
+		spin_lock(&huge_anon_orders_lock);
+		clear_bit(order, &huge_anon_orders_always);
+		clear_bit(order, &huge_anon_orders_inherit);
+		clear_bit(order, &huge_anon_orders_madvise);
+		set_bit(order, &huge_anon_orders_bpf);
+		spin_unlock(&huge_anon_orders_lock);
 	} else if (sysfs_streq(buf, "inherit")) {
 		spin_lock(&huge_anon_orders_lock);
 		clear_bit(order, &huge_anon_orders_always);
+		clear_bit(order, &huge_anon_orders_bpf);
 		clear_bit(order, &huge_anon_orders_madvise);
 		set_bit(order, &huge_anon_orders_inherit);
 		spin_unlock(&huge_anon_orders_lock);
 	} else if (sysfs_streq(buf, "madvise")) {
 		spin_lock(&huge_anon_orders_lock);
 		clear_bit(order, &huge_anon_orders_always);
+		clear_bit(order, &huge_anon_orders_bpf);
 		clear_bit(order, &huge_anon_orders_inherit);
 		set_bit(order, &huge_anon_orders_madvise);
 		spin_unlock(&huge_anon_orders_lock);
 	} else if (sysfs_streq(buf, "never")) {
 		spin_lock(&huge_anon_orders_lock);
 		clear_bit(order, &huge_anon_orders_always);
+		clear_bit(order, &huge_anon_orders_bpf);
 		clear_bit(order, &huge_anon_orders_inherit);
 		clear_bit(order, &huge_anon_orders_madvise);
 		spin_unlock(&huge_anon_orders_lock);
@@ -943,10 +967,22 @@ static int __init setup_transparent_hugepage(char *str)
 			&transparent_hugepage_flags);
 		clear_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG,
 			  &transparent_hugepage_flags);
+		clear_bit(TRANSPARENT_HUGEPAGE_REQ_BPF_FLAG,
+			  &transparent_hugepage_flags);
+		ret = 1;
+	} else if (!strcmp(str, "bpf")) {
+		clear_bit(TRANSPARENT_HUGEPAGE_FLAG,
+			  &transparent_hugepage_flags);
+		clear_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG,
+			  &transparent_hugepage_flags);
+		set_bit(TRANSPARENT_HUGEPAGE_REQ_BPF_FLAG,
+			&transparent_hugepage_flags);
 		ret = 1;
 	} else if (!strcmp(str, "madvise")) {
 		clear_bit(TRANSPARENT_HUGEPAGE_FLAG,
 			  &transparent_hugepage_flags);
+		clear_bit(TRANSPARENT_HUGEPAGE_REQ_BPF_FLAG,
+			  &transparent_hugepage_flags);
 		set_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG,
 			&transparent_hugepage_flags);
 		ret = 1;
@@ -955,6 +991,8 @@ static int __init setup_transparent_hugepage(char *str)
 			  &transparent_hugepage_flags);
 		clear_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG,
 			  &transparent_hugepage_flags);
+		clear_bit(TRANSPARENT_HUGEPAGE_REQ_BPF_FLAG,
+			  &transparent_hugepage_flags);
 		ret = 1;
 	}
 out:
@@ -967,8 +1005,8 @@ __setup("transparent_hugepage=", setup_transparent_hugepage);
 static char str_dup[PAGE_SIZE] __initdata;
 static int __init setup_thp_anon(char *str)
 {
+	unsigned long always, bpf, inherit, madvise;
 	char *token, *range, *policy, *subtoken;
-	unsigned long always, inherit, madvise;
 	char *start_size, *end_size;
 	int start, end, nr;
 	char *p;
@@ -978,6 +1016,7 @@ static int __init setup_thp_anon(char *str)
 	strscpy(str_dup, str);
 
 	always = huge_anon_orders_always;
+	bpf = huge_anon_orders_bpf;
 	madvise = huge_anon_orders_madvise;
 	inherit = huge_anon_orders_inherit;
 	p = str_dup;
@@ -1019,18 +1058,27 @@ static int __init setup_thp_anon(char *str)
 				bitmap_set(&always, start, nr);
 				bitmap_clear(&inherit, start, nr);
 				bitmap_clear(&madvise, start, nr);
+				bitmap_clear(&bpf, start, nr);
+			} else if (!strcmp(policy, "bpf")) {
+				bitmap_set(&bpf, start, nr);
+				bitmap_clear(&inherit, start, nr);
+				bitmap_clear(&always, start, nr);
+				bitmap_clear(&madvise, start, nr);
 			} else if (!strcmp(policy, "madvise")) {
 				bitmap_set(&madvise, start, nr);
 				bitmap_clear(&inherit, start, nr);
 				bitmap_clear(&always, start, nr);
+				bitmap_clear(&bpf, start, nr);
 			} else if (!strcmp(policy, "inherit")) {
 				bitmap_set(&inherit, start, nr);
 				bitmap_clear(&madvise, start, nr);
 				bitmap_clear(&always, start, nr);
+				bitmap_clear(&bpf, start, nr);
 			} else if (!strcmp(policy, "never")) {
 				bitmap_clear(&inherit, start, nr);
 				bitmap_clear(&madvise, start, nr);
 				bitmap_clear(&always, start, nr);
+				bitmap_clear(&bpf, start, nr);
 			} else {
 				pr_err("invalid policy %s in thp_anon boot parameter\n", policy);
 				goto err;
@@ -1041,6 +1089,7 @@ static int __init setup_thp_anon(char *str)
 	huge_anon_orders_always = always;
 	huge_anon_orders_madvise = madvise;
 	huge_anon_orders_inherit = inherit;
+	huge_anon_orders_bpf = bpf;
 	anon_orders_configured = true;
 	return 1;
 
-- 
2.43.5
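
Note for userspace tooling: with this patch the strings emitted by
enabled_show() and anon_enabled_show() gain a "bpf" token, for example
"always [bpf] madvise never". The active mode is still the bracketed token,
so a reader that extracts the bracketed word rather than matching whole
strings should keep working. The following is only a sketch of such a
reader, not part of the patch: the helper name thp_active_mode() is
hypothetical, and the input is assumed to be the text already read from the
sysfs file.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): copy the bracketed, i.e.
 * currently active, mode out of a THP "enabled" string such as
 * "always [bpf] madvise never".
 *
 * Returns 0 on success, -1 if no complete "[mode]" token is present or
 * if the mode does not fit in the output buffer.
 */
static int thp_active_mode(const char *s, char *out, size_t out_len)
{
	const char *lb = strchr(s, '[');
	const char *rb = lb ? strchr(lb, ']') : NULL;
	size_t len;

	if (!lb || !rb)
		return -1;

	/* Number of characters strictly between '[' and ']'. */
	len = (size_t)(rb - lb - 1);
	if (len == 0 || len >= out_len)
		return -1;

	memcpy(out, lb + 1, len);
	out[len] = '\0';
	return 0;
}
```

Called on "always [bpf] inherit madvise never" this yields "bpf"; called on a
string with no bracketed token it returns -1, which a caller can treat as an
unrecognised format.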