From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, xen-devel@lists.xenproject.org,
	linux-fsdevel@vger.kernel.org, nvdimm@lists.linux.dev,
	David Hildenbrand, Andrew Morton, Juergen Gross,
	Stefano Stabellini, Oleksandr Tyshchenko, Dan Williams,
	Matthew Wilcox, Jan Kara, Alexander Viro, Christian Brauner,
	Lorenzo Stoakes, "Liam R. Howlett", Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Zi Yan,
	Baolin Wang, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Jann Horn, Pedro Falcato, Hugh Dickins, Oscar Salvador,
	Lance Yang
Subject: [PATCH v2 5/9] mm/huge_memory: mark PMD mappings of the huge zero folio special
Date: Thu, 17 Jul 2025 13:52:08 +0200
Message-ID: <20250717115212.1825089-6-david@redhat.com>
X-Mailer: git-send-email 2.50.1
In-Reply-To: <20250717115212.1825089-1-david@redhat.com>
References: <20250717115212.1825089-1-david@redhat.com>
MIME-Version: 1.0

The huge zero folio is refcounted (and mapcounted) differently than
"normal" folios, similar to, but distinct from, the ordinary shared
zeropage.
For this reason, we special-case these pages in
vm_normal_page*/vm_normal_folio*, and only allow selected callers to
still use them (e.g., GUP can still take a reference on them).

vm_normal_page_pmd() already filters out the huge zero folio. However,
so far we are not marking it as special like we do with the ordinary
shared zeropage.

Let's mark it as special, so we can further refactor
vm_normal_page_pmd() and vm_normal_page().

While at it, update the doc regarding the shared zero folios.

Reviewed-by: Oscar Salvador
Signed-off-by: David Hildenbrand
---
 mm/huge_memory.c |  5 ++++-
 mm/memory.c      | 14 +++++++++-----
 2 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index db08c37b87077..3f9a27812a590 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1320,6 +1320,7 @@ static void set_huge_zero_folio(pgtable_t pgtable, struct mm_struct *mm,
 {
 	pmd_t entry;
 	entry = folio_mk_pmd(zero_folio, vma->vm_page_prot);
+	entry = pmd_mkspecial(entry);
 	pgtable_trans_huge_deposit(mm, pmd, pgtable);
 	set_pmd_at(mm, haddr, pmd, entry);
 	mm_inc_nr_ptes(mm);
@@ -1429,7 +1430,9 @@ static vm_fault_t insert_pmd(struct vm_area_struct *vma, unsigned long addr,
 	if (fop.is_folio) {
 		entry = folio_mk_pmd(fop.folio, vma->vm_page_prot);
 
-		if (!is_huge_zero_folio(fop.folio)) {
+		if (is_huge_zero_folio(fop.folio)) {
+			entry = pmd_mkspecial(entry);
+		} else {
 			folio_get(fop.folio);
 			folio_add_file_rmap_pmd(fop.folio, &fop.folio->page, vma);
 			add_mm_counter(mm, mm_counter_file(fop.folio), HPAGE_PMD_NR);
diff --git a/mm/memory.c b/mm/memory.c
index 92fd18a5d8d1f..173eb6267e0ac 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -537,7 +537,13 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
  *
  * "Special" mappings do not wish to be associated with a "struct page" (either
  * it doesn't exist, or it exists but they don't want to touch it). In this
- * case, NULL is returned here. "Normal" mappings do have a struct page.
+ * case, NULL is returned here. "Normal" mappings do have a struct page and
+ * are ordinarily refcounted.
+ *
+ * Page mappings of the shared zero folios are always considered "special", as
+ * they are not ordinarily refcounted. However, selected page table walkers
+ * (such as GUP) can still identify these mappings and work with the
+ * underlying "struct page".
  *
  * There are 2 broad cases. Firstly, an architecture may define a pte_special()
  * pte bit, in which case this function is trivial. Secondly, an architecture
@@ -567,9 +573,8 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
  *
  * VM_MIXEDMAP mappings can likewise contain memory with or without "struct
  * page" backing, however the difference is that _all_ pages with a struct
- * page (that is, those where pfn_valid is true) are refcounted and considered
- * normal pages by the VM. The only exception are zeropages, which are
- * *never* refcounted.
+ * page (that is, those where pfn_valid is true, except the shared zero
+ * folios) are refcounted and considered normal pages by the VM.
  *
  * The disadvantage is that pages are refcounted (which can be slower and
  * simply not an option for some PFNMAP users). The advantage is that we
@@ -649,7 +654,6 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
 {
 	unsigned long pfn = pmd_pfn(pmd);
 
-	/* Currently it's only used for huge pfnmaps */
 	if (unlikely(pmd_special(pmd)))
 		return NULL;
-- 
2.50.1