linux-fsdevel.vger.kernel.org archive mirror
From: Hugh Dickins <hughd@google.com>
To: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>,
	Lance Yang <lance.yang@linux.dev>,
	 Oscar Salvador <osalvador@suse.de>,
	linux-kernel@vger.kernel.org,  linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, nvdimm@lists.linux.dev,
	 Andrew Morton <akpm@linux-foundation.org>,
	Juergen Gross <jgross@suse.com>,
	 Stefano Stabellini <sstabellini@kernel.org>,
	 Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>,
	 Dan Williams <dan.j.williams@intel.com>,
	 Alistair Popple <apopple@nvidia.com>,
	Matthew Wilcox <willy@infradead.org>,  Jan Kara <jack@suse.cz>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	 Christian Brauner <brauner@kernel.org>, Zi Yan <ziy@nvidia.com>,
	 Baolin Wang <baolin.wang@linux.alibaba.com>,
	 Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	 "Liam R. Howlett" <Liam.Howlett@oracle.com>,
	 Nico Pache <npache@redhat.com>,
	Ryan Roberts <ryan.roberts@arm.com>,  Dev Jain <dev.jain@arm.com>,
	Barry Song <baohua@kernel.org>,  Vlastimil Babka <vbabka@suse.cz>,
	Mike Rapoport <rppt@kernel.org>,
	 Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>,  Jann Horn <jannh@google.com>,
	Pedro Falcato <pfalcato@suse.de>,
	 Lance Yang <ioworker0@gmail.com>
Subject: Re: [PATCH RFC 01/14] mm/memory: drop highest_memmap_pfn sanity check in vm_normal_page()
Date: Mon, 7 Jul 2025 19:52:51 -0700 (PDT)
Message-ID: <0b1cb496-4e50-252e-5bcf-74a89a78a8c0@google.com>
In-Reply-To: <36dd6b12-f683-48a2-8b9c-c8cd0949dfdc@redhat.com>


On Mon, 7 Jul 2025, David Hildenbrand wrote:
> On 07.07.25 08:31, Hugh Dickins wrote:
> > On Fri, 4 Jul 2025, David Hildenbrand wrote:
> >> On 03.07.25 16:44, Lance Yang wrote:
> >>> On 2025/7/3 20:39, David Hildenbrand wrote:
> >>>> On 03.07.25 14:34, Lance Yang wrote:
> >>>>> On Mon, Jun 23, 2025 at 10:04 PM David Hildenbrand <david@redhat.com>
> >>>>> wrote:
> >>>>>>
> >>>>>> On 20.06.25 14:50, Oscar Salvador wrote:
> >>>>>>> On Tue, Jun 17, 2025 at 05:43:32PM +0200, David Hildenbrand wrote:
> >>>>>>>> In 2009, we converted a VM_BUG_ON(!pfn_valid(pfn)) to the current
> >>>>>>>> highest_memmap_pfn sanity check in commit 22b31eec63e5 ("badpage:
> >>>>>>>> vm_normal_page use print_bad_pte"), because highest_memmap_pfn was
> >>>>>>>> readily available.

highest_memmap_pfn was introduced by that commit for this purpose.

> >>>>>>>>
> >>>>>>>> Nowadays, this is the last remaining highest_memmap_pfn user, and
> >>>>>>>> this
> >>>>>>>> sanity check is not really triggering ... frequently.
> >>>>>>>>
> >>>>>>>> Let's convert it to VM_WARN_ON_ONCE(!pfn_valid(pfn)), so we can
> >>>>>>>> simplify and get rid of highest_memmap_pfn. Checking for
> >>>>>>>> pfn_to_online_page() might be even better, but it would not handle
> >>>>>>>> ZONE_DEVICE properly.
> >>>>>>>>
> >>>>>>>> Do the same in vm_normal_page_pmd(), where we don't even report a
> >>>>>>>> problem at all ...
> >>>>>>>>
> >>>>>>>> What might be better in the future is having a runtime option like
> >>>>>>>> page-table-check to enable such checks dynamically on-demand.
> >>>>>>>> Something
> >>>>>>>> for the future.
> >>>>>>>>
> >>>>>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
> > 
> > The author of 22b31eec63e5 thinks this is not at all an improvement.
> > Of course the condition is not triggering frequently, of course it
> > should not happen: but it does happen, and it still seems worthwhile
> > to catch it in production with a "Bad page map" than to let it run on
> > to whatever kind of crash it hits instead.
> 
> Well, obviously I don't agree and was waiting for having this discussion :)
> 
> We catch corruption in a handful of PTE bits, and that's about it. You neither
> detect corruption of flags nor of PFN bits that result in another valid PFN.

Of course it's limited in what it can catch (and won't even get called
if the present bit was not set - a more complete patch might unify with
those various "Bad swap" messages). Of course. But it's still useful for
stopping pfn_to_page() veering off the end of the memmap[] (in some configs).
And it's still useful for printing out a series of "Bad page map" messages
when the page table is corrupted: from which a picture can sometimes be
built up (isolated instance may just be a bitflip; series of them can
sometimes show e.g. ascii text, occasionally helpful for debugging).
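
As an illustration of the "ascii text" case above, here is a minimal userspace
sketch of how raw entry values copied from a series of "Bad page map" reports
might be rendered as text. The helper name pte_value_to_ascii() is hypothetical
and not a kernel function; the sketch assumes 64-bit entries stored
little-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical debugging helper (userspace, not kernel code).
 * Render the 8 bytes of a raw 64-bit page table entry value, lowest byte
 * first (the in-memory order on a little-endian machine). Bytes outside
 * the printable ASCII range are shown as '.'.
 * out must have room for 8 characters plus the terminating NUL.
 */
static void pte_value_to_ascii(uint64_t raw, char out[9])
{
    assert(out != NULL);

    for (size_t i = 0; i < 8; i++) {
        unsigned char byte = (unsigned char)(raw >> (8 * i));

        out[i] = (byte >= 0x20 && byte <= 0x7e) ? (char)byte : '.';
    }
    out[8] = '\0';
}
```

A value such as 0x8000000012345067 renders mostly as dots, whereas a run of
entries that render as readable words suggests that string data was written
over the page table.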

> 
> Corruption of the "special" bit might be fun.
> 
> When I was able to trigger this during development once, the whole machine
> went down shortly after -- mostly because of use-after-free of something that
> is now a page table, which is just bad for both users of such a page!
> 
> E.g., quit that process and we will happily clear the PTE, corrupting data of
> the other user. Fun.
> 
> I'm sure I could find a way to unify the code while printing some comparable
> message, but this check as it stands is just not worth it IMHO: trying to
> handle something gracefully that shouldn't happen, when really we cannot
> handle it gracefully.

So, you have experience of a time when it didn't help you. Okay. And we
have had experience of other times when it has helped, if only a little.
Like with other "Bad page"s: sometimes helpful, often not; but tending to
build up a big picture from repeated occurrences.

We continue to disagree. I can't argue more than append the 2.6.29
commit message, which seems to me as valid now as it was then.

From 22b31eec63e5f2e219a3ee15f456897272bc73e8 Mon Sep 17 00:00:00 2001
From: Hugh Dickins <hugh@veritas.com>
Date: Tue, 6 Jan 2009 14:40:09 -0800
Subject: [PATCH] badpage: vm_normal_page use print_bad_pte

print_bad_pte() is so far being called only when zap_pte_range() finds
negative page_mapcount, or there's a fault on a pte_file where it does not
belong.  That's weak coverage when we suspect pagetable corruption.

Originally, it was called when vm_normal_page() found an invalid pfn: but
pfn_valid is expensive on some architectures and configurations, so 2.6.24
put that under CONFIG_DEBUG_VM (which doesn't help in the field), then
2.6.26 replaced it by a VM_BUG_ON (likewise).

Reinstate the print_bad_pte() in vm_normal_page(), but use a cheaper test
than pfn_valid(): memmap_init_zone() (used in bootup and hotplug) keep a
__read_mostly note of the highest_memmap_pfn, vm_normal_page() then check
pfn against that.  We could call this pfn_plausible() or pfn_sane(), but I
doubt we'll need it elsewhere: of course it's not reliable, but gives much
stronger pagetable validation on many boxes.

Also use print_bad_pte() when the pte_special bit is found outside a
VM_PFNMAP or VM_MIXEDMAP area, instead of VM_BUG_ON.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
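
The cheaper test described in the commit message above can be sketched as a
small userspace model. This is not the kernel code: struct memmap_model,
model_note_memmap_range() and model_pfn_plausible() are names made up for
illustration, and the model only assumes what the message states, namely that
memmap initialisation keeps a note of the highest pfn and that
vm_normal_page() compares the pfn against it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; all names here are hypothetical. */
struct memmap_model {
    /* Highest pfn that any initialised memmap range has covered so far. */
    unsigned long highest_memmap_pfn;
};

/*
 * Models the bookkeeping done when a memmap range is initialised
 * (bootup or hotplug): remember the last pfn of the highest range.
 */
static void model_note_memmap_range(struct memmap_model *m,
                                    unsigned long start_pfn,
                                    unsigned long nr_pages)
{
    assert(m != NULL);

    if (nr_pages == 0)
        return;

    unsigned long last_pfn = start_pfn + nr_pages - 1;

    if (last_pfn > m->highest_memmap_pfn)
        m->highest_memmap_pfn = last_pfn;
}

/*
 * The cheap plausibility test: a single compare. It rejects pfns beyond
 * the end of every memmap, but accepts pfns that fall in holes between
 * ranges, which is why the commit message calls it "not reliable".
 */
static bool model_pfn_plausible(const struct memmap_model *m,
                                unsigned long pfn)
{
    return pfn <= m->highest_memmap_pfn;
}
```

With ranges [0, 100) and [1000, 1100) noted, pfn 1100 is rejected, while pfn
500 still passes even though it sits in a hole: the trade-off between this
test and pfn_valid() that the thread is arguing about.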

