Date: Tue, 15 Jul 2025 16:53:23 +0200
Subject: Re: [PATCH v2 3/5] mm: add static PMD zero page
From: David Hildenbrand
To: "Pankaj Raghav (Samsung)", Suren Baghdasaryan, Ryan Roberts, Baolin Wang,
 Borislav Petkov, Ingo Molnar, "H. Peter Anvin", Vlastimil Babka, Zi Yan,
 Mike Rapoport, Dave Hansen, Michal Hocko, Lorenzo Stoakes, Andrew Morton,
 Thomas Gleixner, Nico Pache, Dev Jain, "Liam R. Howlett", Jens Axboe
Cc: linux-kernel@vger.kernel.org, willy@infradead.org, linux-mm@kvack.org,
 x86@kernel.org, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 "Darrick J. Wong", mcgrof@kernel.org, gost.dev@samsung.com, hch@lst.de,
 Pankaj Raghav
References: <20250707142319.319642-1-kernel@pankajraghav.com>
 <20250707142319.319642-4-kernel@pankajraghav.com>
 <26fded53-b79d-4538-bc56-3d2055eb5d62@redhat.com>
In-Reply-To: <26fded53-b79d-4538-bc56-3d2055eb5d62@redhat.com>
Organization: Red Hat

On 15.07.25 16:21, David Hildenbrand wrote:
> On 07.07.25 16:23, Pankaj Raghav (Samsung) wrote:
>> From: Pankaj Raghav
>>
>> There are many places in the kernel where we need to zero out larger
>> chunks, but the maximum segment we can zero out at a time with
>> ZERO_PAGE is limited by PAGE_SIZE.
>>
>> This is especially annoying in block devices and filesystems where we
>> attach multiple ZERO_PAGEs to the bio in different bvecs. With
>> multipage bvec support in the block layer, it is much more efficient
>> to send out larger zero pages as part of a single bvec.
>>
>> This concern was raised during the review of adding LBS support to
>> XFS[1][2].
>>
>> Usually huge_zero_folio is allocated on demand, and it will be
>> deallocated by the shrinker if there are no users of it left. At the
>> moment, the huge_zero_folio infrastructure's refcount is tied to the
>> lifetime of the process that created it. This might not work for the
>> bio layer, as the completions can be async and the process that
>> created the huge_zero_folio might no longer be alive.
>
> Of course, what we could do is indicate that there is an untracked
> reference to the huge zero folio, and then simply refuse to free it
> for all eternity.
>
> Essentially, any non-mm reference -> un-shrinkable.
>
> We'd still be allocating the huge zero folio dynamically. We could try
> allocating it on first use, either from memblock or from the buddy
> allocator if it is already up.
>
> Then, we'd only need a config option to allow that to happen.

Something incomplete and very hacky, just to give an idea: it would try
allocating the huge zero folio once there is actual code running that needs
it, and then have it stick around forever.

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index e0a27f80f390d..357e29e98d8d2 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -481,6 +481,7 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf);
 
 extern struct folio *huge_zero_folio;
 extern unsigned long huge_zero_pfn;
+extern atomic_t huge_zero_folio_is_static;
 
 static inline bool is_huge_zero_folio(const struct folio *folio)
 {
@@ -499,6 +500,16 @@ static inline bool is_huge_zero_pmd(pmd_t pmd)
 
 struct folio *mm_get_huge_zero_folio(struct mm_struct *mm);
 void mm_put_huge_zero_folio(struct mm_struct *mm);
+struct folio *__get_static_huge_zero_folio(void);
+
+static inline struct folio *get_static_huge_zero_folio(void)
+{
+	if (!IS_ENABLED(CONFIG_STATIC_HUGE_ZERO_FOLIO))
+		return NULL;
+	if (likely(atomic_read(&huge_zero_folio_is_static)))
+		return huge_zero_folio;
+	return __get_static_huge_zero_folio();
+}
 
 static inline bool thp_migration_supported(void)
 {
@@ -509,7 +520,6 @@ void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address,
 			   pmd_t *pmd, bool freeze);
 bool unmap_huge_pmd_locked(struct vm_area_struct *vma, unsigned long addr,
 			   pmd_t *pmdp, struct folio *folio);
-
 #else /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 static inline bool folio_test_pmd_mappable(struct folio *folio)
@@ -690,6 +700,11 @@ static inline int change_huge_pud(struct mmu_gather *tlb,
 {
 	return 0;
 }
+
+static inline struct folio *get_static_huge_zero_folio(void)
+{
+	return NULL;
+}
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 static inline int split_folio_to_list_to_order(struct folio *folio,
@@ -703,4 +718,14 @@ static inline int split_folio_to_order(struct folio *folio, int new_order)
 	return split_folio_to_list_to_order(folio, NULL, new_order);
 }
 
+static inline struct folio *largest_zero_folio(void)
+{
+	struct folio *folio;
+
+	folio = get_static_huge_zero_folio();
+	if (folio)
+		return folio;
+	return page_folio(ZERO_PAGE(0));
+}
+
 #endif /* _LINUX_HUGE_MM_H */
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 31b5c4e61a574..eb49c69f9c8e2 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -77,6 +77,7 @@ static bool split_underused_thp = true;
 static atomic_t huge_zero_refcount;
 struct folio *huge_zero_folio __read_mostly;
 unsigned long huge_zero_pfn __read_mostly = ~0UL;
+atomic_t huge_zero_folio_is_static __read_mostly;
 unsigned long huge_anon_orders_always __read_mostly;
 unsigned long huge_anon_orders_madvise __read_mostly;
 unsigned long huge_anon_orders_inherit __read_mostly;
@@ -266,6 +267,25 @@ void mm_put_huge_zero_folio(struct mm_struct *mm)
 	put_huge_zero_page();
 }
 
+#ifdef CONFIG_STATIC_HUGE_ZERO_FOLIO
+struct folio *__get_static_huge_zero_folio(void)
+{
+	/*
+	 * Our raised reference will prevent the shrinker from ever having
+	 * success -> static.
+	 */
+	if (atomic_read(&huge_zero_folio_is_static))
+		return huge_zero_folio;
+	/* TODO: memblock allocation if buddy is not up yet? Or reject that earlier. */
+	if (!get_huge_zero_page())
+		return NULL;
+	if (atomic_cmpxchg(&huge_zero_folio_is_static, 0, 1) != 0)
+		put_huge_zero_page();
+	return huge_zero_folio;
+
+}
+#endif /* CONFIG_STATIC_HUGE_ZERO_FOLIO */
+
 static unsigned long shrink_huge_zero_page_count(struct shrinker *shrink,
 						 struct shrink_control *sc)
 {

-- 
Cheers,

David / dhildenb
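
[Editorial illustration, not part of the thread above.] A rough sketch of how
a block-layer caller might consume the proposed largest_zero_folio() to zero a
range of a bio with as few bvecs as possible, which is the use case the quoted
commit message describes. bio_add_zero_folios() is a hypothetical helper name;
only largest_zero_folio() comes from the diff above, while bio_add_folio() and
folio_size() are existing kernel APIs.

#include <linux/bio.h>
#include <linux/huge_mm.h>

/*
 * Hypothetical sketch: fill up to nr_bytes of an already-allocated bio with
 * zero data. With a static PMD-sized zero folio this needs one bvec per
 * PMD-sized chunk instead of one bvec per PAGE_SIZE chunk.
 */
static void bio_add_zero_folios(struct bio *bio, size_t nr_bytes)
{
	struct folio *zero_folio = largest_zero_folio();

	while (nr_bytes) {
		size_t len = min(nr_bytes, folio_size(zero_folio));

		/* The bio ran out of bvecs; the caller would submit and retry. */
		if (!bio_add_folio(bio, zero_folio, len, 0))
			break;
		nr_bytes -= len;
	}
}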