From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 28 Aug 2024 10:24:05 -0400
From: Peter Xu
To: David Hildenbrand
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Gavin Shan,
 Catalin Marinas, x86@kernel.org, Ingo Molnar, Andrew Morton,
 Paolo Bonzini, Dave Hansen, Thomas Gleixner, Alistair Popple,
 kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
 Sean Christopherson, Oscar Salvador, Jason Gunthorpe,
 Borislav Petkov, Zi Yan, Axel Rasmussen, Yan Zhao, Will Deacon,
 Kefeng Wang, Alex Williamson
Subject: Re: [PATCH v2 06/19] mm/pagewalk: Check pfnmap for folio_walk_start()
References: <20240826204353.2228736-1-peterx@redhat.com>
 <20240826204353.2228736-7-peterx@redhat.com>
 <9f9d7e96-b135-4830-b528-37418ae7bbfd@redhat.com>
MIME-Version: 1.0
In-Reply-To: <9f9d7e96-b135-4830-b528-37418ae7bbfd@redhat.com>
Content-Type: text/plain; charset=utf-8

On Wed, Aug 28, 2024 at 09:44:04AM +0200, David Hildenbrand wrote:
> On 26.08.24 22:43, Peter Xu wrote:
> > Teach folio_walk_start() to recognize special pmd/pud mappings, and fail
> > them properly as it means there's no folio backing them.
> >
> > Cc: David Hildenbrand
> > Signed-off-by: Peter Xu
> > ---
> >  mm/pagewalk.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/pagewalk.c b/mm/pagewalk.c
> > index cd79fb3b89e5..12be5222d70e 100644
> > --- a/mm/pagewalk.c
> > +++ b/mm/pagewalk.c
> > @@ -753,7 +753,7 @@ struct folio *folio_walk_start(struct folio_walk *fw,
> >  		fw->pudp = pudp;
> >  		fw->pud = pud;
> > -		if (!pud_present(pud) || pud_devmap(pud)) {
> > +		if (!pud_present(pud) || pud_devmap(pud) || pud_special(pud)) {
> >  			spin_unlock(ptl);
> >  			goto not_found;
> >  		} else if (!pud_leaf(pud)) {
> > @@ -783,7 +783,7 @@ struct folio *folio_walk_start(struct folio_walk *fw,
> >  		fw->pmdp = pmdp;
> >  		fw->pmd = pmd;
> > -		if (pmd_none(pmd)) {
> > +		if (pmd_none(pmd) || pmd_special(pmd)) {
> >  			spin_unlock(ptl);
> >  			goto not_found;
> >  		} else if (!pmd_leaf(pmd)) {
>
> As raised, this is not the right way to do it. You should follow what
> CONFIG_ARCH_HAS_PTE_SPECIAL and vm_normal_page() does.
>
> It's even spelled out in vm_normal_page_pmd() that at the time it was
> introduced there was no pmd_special(), so there was no way to handle that.

I can try to do something like that, but even so it'll be mostly cosmetic
changes, and AFAICT there's no real functional difference.  Meanwhile, see
the comment below.

>
> diff --git a/mm/memory.c b/mm/memory.c
> index f0cf5d02b4740..272445e9db147 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -672,15 +672,29 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
>  {
>  	unsigned long pfn = pmd_pfn(pmd);
>
> -	/*
> -	 * There is no pmd_special() but there may be special pmds, e.g.
> -	 * in a direct-access (dax) mapping, so let's just replicate the
> -	 * !CONFIG_ARCH_HAS_PTE_SPECIAL case from vm_normal_page() here.
> -	 */

This one is correct; I overlooked this comment, which is now obsolete.
I can either refine this patch or add one patch on top to refine the
comment at least.

> +	if (IS_ENABLED(CONFIG_ARCH_HAS_PMD_SPECIAL)) {

We don't yet have CONFIG_ARCH_HAS_PMD_SPECIAL, but I get your point.

> +		if (likely(!pmd_special(pmd)))
> +			goto check_pfn;
> +		if (vma->vm_ops && vma->vm_ops->find_special_page)
> +			return vma->vm_ops->find_special_page(vma, addr);

Why do we ever need this?  This is so far destined to be totally a waste
of cycles.  I think it's better we leave that until either xen/gntdev.c
or some new driver starts to use it, rather than keeping dead code around.

> +		if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
> +			return NULL;
> +		if (is_huge_zero_pmd(pmd))
> +			return NULL;

This is meaningless too until we first make the huge zero pmd apply the
special bit, which does sound like it is outside the scope of this series.

> +		if (pmd_devmap(pmd))
> +			/* See vm_normal_page() */
> +			return NULL;

When will it be pmd_devmap() if it's already pmd_special()?

> +		return NULL;

And see this one.. it's after:

  if (xxx)
      return NULL;
  if (yyy)
      return NULL;
  if (zzz)
      return NULL;
  return NULL;

Hmm??  If so, what's the difference if we simply check pmd_special() and
return NULL..

> +	}
> +
> +	/* !CONFIG_ARCH_HAS_PMD_SPECIAL case follows: */
> +
>  	if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) {
>  		if (vma->vm_flags & VM_MIXEDMAP) {
>  			if (!pfn_valid(pfn))
>  				return NULL;
> +			if (is_huge_zero_pmd(pmd))
> +				return NULL;

I'd rather not touch here as this series doesn't change anything for
MIXEDMAP yet..

>  			goto out;
>  		} else {
>  			unsigned long off;
> @@ -692,6 +706,11 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
>  		}
>  	}
>
> +	/*
> +	 * For historical reasons, these might not have pmd_special() set,
> +	 * so we'll check them manually, in contrast to vm_normal_page().
> +	 */
> +check_pfn:
>  	if (pmd_devmap(pmd))
>  		return NULL;
>  	if (is_huge_zero_pmd(pmd))
>
>
> We should then look into mapping huge zeropages also with pmd_special.
> pmd_devmap we'll leave alone until removed. But that's independent of
> your series.

This does look reasonable to match what we do with the pte zeropage.
Could you remind me what the benefit would be when we switch to using the
special bit for pmd zero pages?

>
> I wonder if CONFIG_ARCH_HAS_PTE_SPECIAL is sufficient and we don't need
> an additional CONFIG_ARCH_HAS_PMD_SPECIAL.

The hope is that we can always reuse the bit in the pte to work the same
for pmd/pud.  Right now we require the arch to select
ARCH_SUPPORTS_HUGE_PFNMAP to say "pmd/pud has the same special bit
defined".

>
> As I said, if you need someone to add vm_normal_page_pud(), I can handle
> that.

I'm pretty confused why we need that for this series alone.  If you
prefer vm_normal_page_pud() to be defined and to check pud_special()
there, I can do that.  But again, I don't yet see how that can make a
functional difference considering the so far very limited usage of the
special bit, and I wonder whether we can do that on top when it becomes
necessary (and when we start to have a functional requirement for it).

Thanks,

-- 
Peter Xu