From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 16E83C5AE5A for ; Wed, 28 Aug 2024 14:24:17 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 67EE86B0082; Wed, 28 Aug 2024 10:24:17 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 607CE6B0083; Wed, 28 Aug 2024 10:24:17 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4817A6B0085; Wed, 28 Aug 2024 10:24:17 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 268AE6B0082 for ; Wed, 28 Aug 2024 10:24:17 -0400 (EDT) Received: from smtpin11.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id 8F7221A207F for ; Wed, 28 Aug 2024 14:24:16 +0000 (UTC) X-FDA: 82501874112.11.0F3BBC8 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by imf26.hostedemail.com (Postfix) with ESMTP id 6B652140008 for ; Wed, 28 Aug 2024 14:24:14 +0000 (UTC) Authentication-Results: imf26.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b="Cu5tW/Hn"; spf=pass (imf26.hostedemail.com: domain of peterx@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=peterx@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1724854966; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=355PW4sSqsC1A4aL38kel0SH9CqCQG/5oU1R6gXP9T0=; b=3J9ZuPrffobveTzJOFEMkDguGnpyAcOdtAcHtwEF4exrhrfrqy3jcxNkld8nbvv1nMTOCD s7UpSgFXKXNvmzJVOSG+1AK/vTQRdlZQWCmos+GAqnURm+e7TvTmicJZ9GRzcXhTk1465j KypA0DO5OSU1+3FuPtLx7h4i44AoFYE= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1724854966; a=rsa-sha256; cv=none; b=UtkXq21tGek/8kJmpsDZS/yCCiqAF5Tqzjc5FiRAey3oLZyS8EvbnQFw5nOtREAr+Y+iyh TCGUAcFor+rZ8IReVBdE/FgDPEcBQiQIDBkCXAA5JFM7bc3ba0I2R9tI1j2ME2ANFEelA1 a/jgmb6scjqXe5NFCC6GnJ06ybKgK10= ARC-Authentication-Results: i=1; imf26.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b="Cu5tW/Hn"; spf=pass (imf26.hostedemail.com: domain of peterx@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=peterx@redhat.com; dmarc=pass (policy=none) header.from=redhat.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1724855053; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=355PW4sSqsC1A4aL38kel0SH9CqCQG/5oU1R6gXP9T0=; b=Cu5tW/HndwrFYh7LFMBb+eOH9v+vM+iBCT6SmwKIQ+K4UlP8B++z8ynuo7O7sTBUZ2hDO/ 9yo67zgiADFZCqSr551KaAI99flnh3jDIdURY8zGiWvcEQgDCVPX5Wg2dJuXIm82Lnpv0u oS05/BpuG3TA9Rr9lmN8jlV+jDFz/7M= Received: from mail-oo1-f69.google.com (mail-oo1-f69.google.com [209.85.161.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-17-E3Vihrq5Ot6Ykdr6lCy3hg-1; Wed, 28 Aug 2024 10:24:12 -0400 X-MC-Unique: E3Vihrq5Ot6Ykdr6lCy3hg-1 Received: by mail-oo1-f69.google.com with SMTP id 006d021491bc7-5d5c7bfd8aaso8767964eaf.2 for ; Wed, 28 Aug 2024 07:24:12 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1724855052; x=1725459852; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=355PW4sSqsC1A4aL38kel0SH9CqCQG/5oU1R6gXP9T0=; b=e6t6M+n3Wv78/JQhArcfrqmQU65cuty645eKvecimQouL77kDSzIeyczLd76rpiQZM Bo+p5ZrEvlLBldpUktHCg2NTPo2TTy27PJmn4p9L0S04vqrgeTddNZHNEYRjGMB3qULW rGQRFOKgTXzK4m8D99pIVAZfCKpBZdYLj7vr5Rf7/OhRHcmA62TxY1zBkfNW5yFw2C3d b1Wv8ZLm/QiJsB3ypVwEDJPClUjnEgzxShs7wOMS007kXRwIIMmjkD/5lrHvR3HVztFz hXJP7sTBntDxGzNdzICr2UoTRPUe+XyEWy6UlQoz1d9c/rQdrGb3i2lh3aOjnJebIwKY QEQQ== X-Forwarded-Encrypted: i=1; AJvYcCWh7Kq9dmq7c845M0gpJLAHm60xLrLk/wz1hVuif+lgy1DH6vW0fjT9bjBIqG0n4Lh2fVVxcUGz5Q==@kvack.org X-Gm-Message-State: AOJu0YyjjKjmAzO85QeV4aHub8g9ClKgYKernJdwyKKmOO+8z8gf5Rcg J0VK0I+OgeUV1SWRvZ3gruqR2247iLA1WgmBeZsIDcqq7Kvj7BJKVxfrTe1gGlewWqGnVJy1q0f NbsPNH+aBs4HrWGwLLiRUiIE6kyYXd5uYSQtNLmpEEVEXkYW+ X-Received: by 2002:a05:6820:22a9:b0:5d8:6769:9d85 with SMTP id 006d021491bc7-5dcc6259314mr17483966eaf.6.1724855051617; Wed, 28 Aug 2024 07:24:11 -0700 (PDT) X-Google-Smtp-Source: AGHT+IHrf66e2cWOHb1ey05OprbItKe0gmFxS5nGBvoISWxV5VBk1weYo4yvuE0vLG+8egFr0ZhZGA== X-Received: by 2002:a05:6820:22a9:b0:5d8:6769:9d85 with SMTP id 006d021491bc7-5dcc6259314mr17483914eaf.6.1724855051201; Wed, 28 Aug 2024 07:24:11 -0700 (PDT) Received: from x1n (pool-99-254-121-117.cpe.net.cable.rogers.com. [99.254.121.117]) by smtp.gmail.com with ESMTPSA id 006d021491bc7-5dcb5e9451bsm3044256eaf.42.2024.08.28.07.24.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 28 Aug 2024 07:24:10 -0700 (PDT) Date: Wed, 28 Aug 2024 10:24:05 -0400 From: Peter Xu To: David Hildenbrand Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Gavin Shan , Catalin Marinas , x86@kernel.org, Ingo Molnar , Andrew Morton , Paolo Bonzini , Dave Hansen , Thomas Gleixner , Alistair Popple , kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, Sean Christopherson , Oscar Salvador , Jason Gunthorpe , Borislav Petkov , Zi Yan , Axel Rasmussen , Yan Zhao , Will Deacon , Kefeng Wang , Alex Williamson Subject: Re: [PATCH v2 06/19] mm/pagewalk: Check pfnmap for folio_walk_start() Message-ID: References: <20240826204353.2228736-1-peterx@redhat.com> <20240826204353.2228736-7-peterx@redhat.com> <9f9d7e96-b135-4830-b528-37418ae7bbfd@redhat.com> MIME-Version: 1.0 In-Reply-To: <9f9d7e96-b135-4830-b528-37418ae7bbfd@redhat.com> X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8 Content-Disposition: inline X-Stat-Signature: jz71681ekt6n189gjx5dntmhmr7kbzsg X-Rspamd-Queue-Id: 6B652140008 X-Rspam-User: X-Rspamd-Server: rspam08 X-HE-Tag: 1724855054-695121 X-HE-Meta: U2FsdGVkX1+8M7SSejnYHuLHBUfjqoD5nRleXrFaGvIkSLNt3OWp65MsRh+9BHXlvN1Re3x/IE0Oq5v8E7vYAkl3WnUxWWR4CV/OYsIHbFoaH9cz2cvQeT+LdFrfubJ8L/f1KJw9rRi+MB8gg6T/9ITeFBVTSYKDAe3C2gUZXcnDnyeuPVjzCYHsEcVlRUPsAf1z1t8lQT5f2jxwzUsuTOwf2j1U7M6CqE/Mr4WmgQ3gIYvqsJ8QCqDbhq/2im5BI9SUxyxJqVnLpEOXGhuhrZCtlpqYWkTvotKRHylmzJDiUK8yyhO7ovV6pEl7tMjuCptoIRXFbEOb6O/AiMYe0tbVlhL46nJ2Vh8vDxBvm++BJI2+Cap7nPAI6GGDQzH1rahBoa4ro/dL4NdDSwHTS0Pcs2v4alxNQectjK/C+rjkABvMxmbwqE7Wc473uFQAQqRw/278IbW77iTK9aRZWYAJs2DkUTjSQ0E9isiVo1JiEwsjFDqM/Ot6zMwP5BGyhoXYRCigwsE5yws4+1ugXnVZqpgY/QZbDrY3MgdZVjaMZbJLhO3PA8zrJ27KGtGDeXFW3gidG5Bssa7/1lSCM4V+5kwkTcz0vKYkO2WPbOJtOr18n99tl2Lbv56w929E/bFX5/KQO5af5BLVNY6T4BsbUqLARH3yzO1Q2Y5rFqr7TP/UjtY7/HUSk5pSzyc2NyOD/ua/2rRi+/HMIRM31DtTyB02HTk+6byWIoTfdduKndLtNgwlgoZFnvEiuO7zGIEHpLKriEfXA9IMfWzURYUItMkrnzgUQh8u2gcDhqUcXtzQlB627mTrrTCJH2MphUmkBUPuMg5hiT/RPivgjm3slSbwpWqfIV8itzaKTzKcDhuMBDnvmmA8Q5O9TbTuQLglo1d0TBdhHUyjau7tgx2EqFJxRHNhxKMYFb6njPhAKJVHXQ0/BAVXlg15G0517O84ys6m5bdNKiKSVnH FOVLLPHL 7CrK7R/SEo3izJtP9R0XeRQE3gpnqdgM87rt3XYRXxMpeONj0BAEVAQBrNNuWCv8t0NEAXf6DcsxrcCwUAQsB4uqj2E3qvXVlkVdvXSz7E6eyzI1H62yCQG2uNk3VHF+20iEOxq0HAeEqE4NQZouoW7FmFP6k8vJr5UqTyrNWP3BE652vu6TuySffiomYJ1xKLJprLjtE0279z9l4H7T17+xKyHN7N/FYKto6It7yZ1NiUHgEP5ZxAUWjwalcn2j+czVKKaoRYxocIdqdeFmRbaflkPgK0ZpOSH6LWLimsOZh16ndi5H6i1OxiTu+l3Q02rToDWUyypyBTKuru7Q2DYMbF/2MsILRRgqqkjl3WkReVYBBYwYlJHKklkqwfRL7RXgCcNPceIxfwLsQ8W6X88IvD1TV3crd/qKyPHzv3po1EgQkgKsppJ7D01nG1lp7o4E0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, Aug 28, 2024 at 09:44:04AM +0200, David Hildenbrand wrote: > On 26.08.24 22:43, Peter Xu wrote: > > Teach folio_walk_start() to recognize special pmd/pud mappings, and fail > > them properly as it means there's no folio backing them. > > > > Cc: David Hildenbrand > > Signed-off-by: Peter Xu > > --- > > mm/pagewalk.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/mm/pagewalk.c b/mm/pagewalk.c > > index cd79fb3b89e5..12be5222d70e 100644 > > --- a/mm/pagewalk.c > > +++ b/mm/pagewalk.c > > @@ -753,7 +753,7 @@ struct folio *folio_walk_start(struct folio_walk *fw, > > fw->pudp = pudp; > > fw->pud = pud; > > - if (!pud_present(pud) || pud_devmap(pud)) { > > + if (!pud_present(pud) || pud_devmap(pud) || pud_special(pud)) { > > spin_unlock(ptl); > > goto not_found; > > } else if (!pud_leaf(pud)) { > > @@ -783,7 +783,7 @@ struct folio *folio_walk_start(struct folio_walk *fw, > > fw->pmdp = pmdp; > > fw->pmd = pmd; > > - if (pmd_none(pmd)) { > > + if (pmd_none(pmd) || pmd_special(pmd)) { > > spin_unlock(ptl); > > goto not_found; > > } else if (!pmd_leaf(pmd)) { > > As raised, this is not the right way to to it. You should follow what > CONFIG_ARCH_HAS_PTE_SPECIAL and vm_normal_page() does. > > It's even spelled out in vm_normal_page_pmd() that at the time it was > introduced there was no pmd_special(), so there was no way to handle that. I can try to do something like that, but even so it'll be mostly cosmetic changes, and AFAICT there's no real functional difference. Meanwhile, see below comment. > > > > diff --git a/mm/memory.c b/mm/memory.c > index f0cf5d02b4740..272445e9db147 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -672,15 +672,29 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr, > { > unsigned long pfn = pmd_pfn(pmd); > - /* > - * There is no pmd_special() but there may be special pmds, e.g. > - * in a direct-access (dax) mapping, so let's just replicate the > - * !CONFIG_ARCH_HAS_PTE_SPECIAL case from vm_normal_page() here. > - */ This one is correct; I overlooked this comment which can be obsolete. I can either refine this patch or add one patch on top to refine the comment at least. > + if (IS_ENABLED(CONFIG_ARCH_HAS_PMD_SPECIAL)) { We don't yet have CONFIG_ARCH_HAS_PMD_SPECIAL, but I get your point. > + if (likely(!pmd_special(pmd))) > + goto check_pfn; > + if (vma->vm_ops && vma->vm_ops->find_special_page) > + return vma->vm_ops->find_special_page(vma, addr); Why do we ever need this? This is so far destined to be totally a waste of cycles. I think it's better we leave that until either xen/gntdev.c or any new driver start to use it, rather than keeping dead code around. > + if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP)) > + return NULL; > + if (is_huge_zero_pmd(pmd)) > + return NULL; This is meaningless too until we make huge zero pmd apply special bit first, which does sound like to be outside the scope of this series. > + if (pmd_devmap(pmd)) > + /* See vm_normal_page() */ > + return NULL; When will it be pmd_devmap() if it's already pmd_special()? > + return NULL; And see this one.. it's after: if (xxx) return NULL; if (yyy) return NULL; if (zzz) return NULL; return NULL; Hmm?? If so, what's the difference if we simply check pmd_special and return NULL.. > + } > + > + /* !CONFIG_ARCH_HAS_PMD_SPECIAL case follows: */ > + > if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) { > if (vma->vm_flags & VM_MIXEDMAP) { > if (!pfn_valid(pfn)) > return NULL; > + if (is_huge_zero_pmd(pmd)) > + return NULL; I'd rather not touch here as this series doesn't change anything for MIXEDMAP yet.. > goto out; > } else { > unsigned long off; > @@ -692,6 +706,11 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr, > } > } > + /* > + * For historical reasons, these might not have pmd_special() set, > + * so we'll check them manually, in contrast to vm_normal_page(). > + */ > +check_pfn: > if (pmd_devmap(pmd)) > return NULL; > if (is_huge_zero_pmd(pmd)) > > > > We should then look into mapping huge zeropages also with pmd_special. > pmd_devmap we'll leave alone until removed. But that's indeoendent of your series. This does look reasonable to match what we do with pte zeropage. Could you remind me what might be the benefit when we switch to using special bit for pmd zero pages? > > I wonder if CONFIG_ARCH_HAS_PTE_SPECIAL is sufficient and we don't need additional > CONFIG_ARCH_HAS_PMD_SPECIAL. The hope is we can always reuse the bit in the pte to work the same for pmd/pud. Now we require arch to select ARCH_SUPPORTS_HUGE_PFNMAP to say "pmd/pud has the same special bit defined". > > As I said, if you need someone to add vm_normal_page_pud(), I can handle that. I'm pretty confused why we need that for this series alone. If you prefer vm_normal_page_pud() to be defined and check pud_special() there, I can do that. But again, I don't yet see how that can make a functional difference considering the so far very limited usage of the special bit, and wonder whether we can do that on top when it became necessary (and when we start to have functional requirement of such). Thanks, -- Peter Xu