Date: Wed, 14 May 2025 16:42:03 -0700
X-Mailer: git-send-email 2.49.0.1045.g170613ef41-goog
Message-ID: <3f2ac9240cd39295e7341d408548719818d5ea91.1747264138.git.ackerleytng@google.com>
Subject: [RFC PATCH v2 24/51] mm: hugetlb: Add option to create new subpool without using surplus
From: Ackerley Tng
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    x86@kernel.org, linux-fsdevel@vger.kernel.org
Cc: ackerleytng@google.com, aik@amd.com, ajones@ventanamicro.com,
    akpm@linux-foundation.org, amoorthy@google.com, anthony.yznaga@oracle.com,
    anup@brainfault.org, aou@eecs.berkeley.edu, bfoster@redhat.com,
    binbin.wu@linux.intel.com, brauner@kernel.org, catalin.marinas@arm.com,
    chao.p.peng@intel.com, chenhuacai@kernel.org, dave.hansen@intel.com,
    david@redhat.com, dmatlack@google.com, dwmw@amazon.co.uk,
    erdemaktas@google.com, fan.du@intel.com, fvdl@google.com, graf@amazon.com,
    haibo1.xu@intel.com, hch@infradead.org, hughd@google.com,
    ira.weiny@intel.com, isaku.yamahata@intel.com, jack@suse.cz,
    james.morse@arm.com, jarkko@kernel.org, jgg@ziepe.ca, jgowans@amazon.com,
    jhubbard@nvidia.com, jroedel@suse.de, jthoughton@google.com,
    jun.miao@intel.com, kai.huang@intel.com, keirf@google.com,
    kent.overstreet@linux.dev, kirill.shutemov@intel.com,
    liam.merwick@oracle.com, maciej.wieczor-retman@intel.com,
    mail@maciej.szmigiero.name, maz@kernel.org, mic@digikod.net,
    michael.roth@amd.com, mpe@ellerman.id.au, muchun.song@linux.dev,
    nikunj@amd.com, nsaenz@amazon.es, oliver.upton@linux.dev,
    palmer@dabbelt.com, pankaj.gupta@amd.com, paul.walmsley@sifive.com,
    pbonzini@redhat.com, pdurrant@amazon.co.uk, peterx@redhat.com,
    pgonda@google.com, pvorel@suse.cz, qperret@google.com,
    quic_cvanscha@quicinc.com, quic_eberman@quicinc.com,
    quic_mnalajal@quicinc.com, quic_pderrin@quicinc.com,
    quic_pheragu@quicinc.com, quic_svaddagi@quicinc.com,
    quic_tsoni@quicinc.com, richard.weiyang@gmail.com,
    rick.p.edgecombe@intel.com, rientjes@google.com, roypat@amazon.co.uk,
    rppt@kernel.org, seanjc@google.com, shuah@kernel.org,
    steven.price@arm.com, steven.sistare@oracle.com, suzuki.poulose@arm.com,
    tabba@google.com, thomas.lendacky@amd.com, usama.arif@bytedance.com,
    vannapurve@google.com, vbabka@suse.cz, viro@zeniv.linux.org.uk,
    vkuznets@redhat.com, wei.w.wang@intel.com, will@kernel.org,
    willy@infradead.org, xiaoyao.li@intel.com, yan.y.zhao@intel.com,
    yilun.xu@intel.com, yuzenghui@huawei.com, zhiquan1.li@intel.com
Content-Type: text/plain; charset="UTF-8"
__hugetlb_acct_memory() today does more than just memory accounting.
When there are insufficient HugeTLB pages, __hugetlb_acct_memory() will
attempt to get surplus pages. This change adds a flag to disable
getting surplus pages if there are insufficient HugeTLB pages.

Signed-off-by: Ackerley Tng
Change-Id: Id79fdeaa236b4fed38fc3c20482b03fff729198f
---
 fs/hugetlbfs/inode.c    |  2 +-
 include/linux/hugetlb.h |  2 +-
 mm/hugetlb.c            | 77 +++++++++++++++++++++++++++++++----------
 3 files changed, 61 insertions(+), 20 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index e4de5425838d..609a88950354 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -1424,7 +1424,7 @@ hugetlbfs_fill_super(struct super_block *sb, struct fs_context *fc)
 	if (ctx->max_hpages != -1 || ctx->min_hpages != -1) {
 		sbinfo->spool = hugepage_new_subpool(ctx->hstate, ctx->max_hpages,
-						     ctx->min_hpages);
+						     ctx->min_hpages, true);
 		if (!sbinfo->spool)
 			goto out_free;
 	}
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 8ba941d88956..c59264391c33 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -116,7 +116,7 @@ extern int hugetlb_max_hstate __read_mostly;
 	for ((h) = hstates; (h) < &hstates[hugetlb_max_hstate]; (h)++)
 
 struct hugepage_subpool *hugepage_new_subpool(struct hstate *h, long max_hpages,
-						long min_hpages);
+						long min_hpages, bool use_surplus);
 void hugepage_put_subpool(struct hugepage_subpool *spool);
 
 void hugetlb_dup_vma_private(struct vm_area_struct *vma);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 5b088fe002a2..d22c5a8fd441 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -115,6 +115,7 @@ static int num_fault_mutexes __ro_after_init;
 struct mutex *hugetlb_fault_mutex_table __ro_after_init;
 
 /* Forward declaration */
+static int __hugetlb_acct_memory(struct hstate *h, long delta, bool use_surplus);
 static int hugetlb_acct_memory(struct hstate *h, long delta);
 static void hugetlb_vma_lock_free(struct vm_area_struct *vma);
 static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma);
@@ -162,7 +163,7 @@ static inline void unlock_or_release_subpool(struct hugepage_subpool *spool,
 }
 
 struct hugepage_subpool *hugepage_new_subpool(struct hstate *h, long max_hpages,
-						long min_hpages)
+						long min_hpages, bool use_surplus)
 {
 	struct hugepage_subpool *spool;
 
@@ -176,7 +177,8 @@ struct hugepage_subpool *hugepage_new_subpool(struct hstate *h, long max_hpages,
 	spool->hstate = h;
 	spool->min_hpages = min_hpages;
 
-	if (min_hpages != -1 && hugetlb_acct_memory(h, min_hpages)) {
+	if (min_hpages != -1 &&
+	    __hugetlb_acct_memory(h, min_hpages, use_surplus)) {
 		kfree(spool);
 		return NULL;
 	}
@@ -2382,35 +2384,64 @@ static nodemask_t *policy_mbind_nodemask(gfp_t gfp)
 	return NULL;
 }
 
-/*
- * Increase the hugetlb pool such that it can accommodate a reservation
- * of size 'delta'.
+/**
+ * hugetlb_hstate_reserve_pages() - Reserve @requested number of hugetlb pages
+ * from hstate @h.
+ *
+ * @h: the hstate to reserve from.
+ * @requested: number of hugetlb pages to reserve.
+ *
+ * If there are insufficient available hugetlb pages, no reservations are made.
+ *
+ * Return: the number of surplus pages required to meet the @requested number of
+ * hugetlb pages.
  */
-static int gather_surplus_pages(struct hstate *h, long delta)
+static int hugetlb_hstate_reserve_pages(struct hstate *h, long requested)
+	__must_hold(&hugetlb_lock)
+{
+	long needed;
+
+	needed = (h->resv_huge_pages + requested) - h->free_huge_pages;
+	if (needed <= 0) {
+		h->resv_huge_pages += requested;
+		return 0;
+	}
+
+	return needed;
+}
+
+/**
+ * gather_surplus_pages() - Increase the hugetlb pool such that it can
+ * accommodate a reservation of size @requested.
+ *
+ * @h: the hstate in concern.
+ * @requested: The requested number of hugetlb pages.
+ * @needed: The number of hugetlb pages the pool needs to be increased by, based
+ * on current number of reservations and free hugetlb pages.
+ *
+ * Return: 0 if successful or negative error otherwise.
+ */
+static int gather_surplus_pages(struct hstate *h, long requested, long needed)
 	__must_hold(&hugetlb_lock)
 {
 	LIST_HEAD(surplus_list);
 	struct folio *folio, *tmp;
 	int ret;
 	long i;
-	long needed, allocated;
+	long allocated;
 	bool alloc_ok = true;
 	int node;
 	nodemask_t *mbind_nodemask, alloc_nodemask;
 
+	if (needed == 0)
+		return 0;
+
 	mbind_nodemask = policy_mbind_nodemask(htlb_alloc_mask(h));
 	if (mbind_nodemask)
 		nodes_and(alloc_nodemask, *mbind_nodemask, cpuset_current_mems_allowed);
 	else
 		alloc_nodemask = cpuset_current_mems_allowed;
 
-	lockdep_assert_held(&hugetlb_lock);
-	needed = (h->resv_huge_pages + delta) - h->free_huge_pages;
-	if (needed <= 0) {
-		h->resv_huge_pages += delta;
-		return 0;
-	}
-
 	allocated = 0;
 
 	ret = -ENOMEM;
@@ -2448,7 +2479,7 @@ static int gather_surplus_pages(struct hstate *h, long delta)
 	 * because either resv_huge_pages or free_huge_pages may have changed.
 	 */
 	spin_lock_irq(&hugetlb_lock);
-	needed = (h->resv_huge_pages + delta) -
+	needed = (h->resv_huge_pages + requested) -
 		(h->free_huge_pages + allocated);
 	if (needed > 0) {
 		if (alloc_ok)
@@ -2469,7 +2500,7 @@ static int gather_surplus_pages(struct hstate *h, long delta)
 	 * before they are reserved.
 	 */
 	needed += allocated;
-	h->resv_huge_pages += delta;
+	h->resv_huge_pages += requested;
 	ret = 0;
 
 	/* Free the needed pages to the hugetlb pool */
@@ -5284,7 +5315,7 @@ unsigned long hugetlb_total_pages(void)
 	return nr_total_pages;
 }
 
-static int hugetlb_acct_memory(struct hstate *h, long delta)
+static int __hugetlb_acct_memory(struct hstate *h, long delta, bool use_surplus)
 {
 	int ret = -ENOMEM;
 
@@ -5316,7 +5347,12 @@ static int hugetlb_acct_memory(struct hstate *h, long delta)
 	 * above.
 	 */
 	if (delta > 0) {
-		if (gather_surplus_pages(h, delta) < 0)
+		long needed = hugetlb_hstate_reserve_pages(h, delta);
+
+		if (!use_surplus && needed > 0)
+			goto out;
+
+		if (gather_surplus_pages(h, delta, needed) < 0)
 			goto out;
 
 		if (delta > allowed_mems_nr(h)) {
@@ -5334,6 +5370,11 @@ static int hugetlb_acct_memory(struct hstate *h, long delta)
 	return ret;
 }
 
+static int hugetlb_acct_memory(struct hstate *h, long delta)
+{
+	return __hugetlb_acct_memory(h, delta, true);
+}
+
 static void hugetlb_vm_op_open(struct vm_area_struct *vma)
 {
 	struct resv_map *resv = vma_resv_map(vma);
-- 
2.49.0.1045.g170613ef41-goog