All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Oscar Salvador <osalvador@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Miaohe Lin <linmiaohe@huawei.com>,
	David Hildenbrand <david@redhat.com>,
	stable@vger.kernel.org, Tony Luck <tony.luck@intel.com>,
	Naoya Horiguchi <naoya.horiguchi@nec.com>
Subject: Re: [PATCH] mm,swapops: Update check in is_pfn_swap_entry for hwpoison entries
Date: Sun, 7 Apr 2024 15:19:57 -0400	[thread overview]
Message-ID: <ZhLx3fwzQNPDbBei@x1n> (raw)
In-Reply-To: <ZhKmAecilbb2oSD9@localhost.localdomain>

On Sun, Apr 07, 2024 at 03:56:17PM +0200, Oscar Salvador wrote:
> On Sun, Apr 07, 2024 at 03:05:37PM +0200, Oscar Salvador wrote:
> > Tony reported that the Machine check recovery was broken in v6.9-rc1,
> > as he was hitting a VM_BUG_ON when injecting uncorrectable memory errors
> > to DRAM.
> > After some more digging and debugging on his side, he realized that this
> > went back to v6.1, with the introduction of 'commit 0d206b5d2e0d ("mm/swap: add
> > swp_offset_pfn() to fetch PFN from swap entry")'.
> > That commit, among other things, introduced swp_offset_pfn(), replacing
> > hwpoison_entry_to_pfn() in its favour.
> > 
> > The patch also introduced a VM_BUG_ON() check for is_pfn_swap_entry(),
> > but is_pfn_swap_entry() never got updated to cover hwpoison entries, which
> > means that we would hit the VM_BUG_ON whenever we would call
> > swp_offset_pfn() for such entries on environments with CONFIG_DEBUG_VM set.
> > Fix this by updating the check to cover hwpoison entries as well, and update
> > the comment while we are it.
> > 
> > Reported-by: Tony Luck <tony.luck@intel.com>
> > Closes: https://lore.kernel.org/all/Zg8kLSl2yAlA3o5D@agluck-desk3/
> > Tested-by: Tony Luck <tony.luck@intel.com>
> > Fixes: 0d206b5d2e0d ("mm/swap: add swp_offset_pfn() to fetch PFN from swap entry")

Totally unexpected, as this commit even removed hwpoison_entry_to_pfn().
Obviously even until now I assumed hwpoison is accounted as pfn swap entry
but it's just missing..

Since this commit didn't really change is_pfn_swap_entry() itself, I was
thinking maybe an older fix tag would apply, but then I noticed the old
code indeed should work well even if hwpoison entry is missing.  For
example, it's a grey area on whether a hwpoisoned page should be accounted
in smaps.  So I think the Fixes tag is correct, and thanks for fixing this.

Reviewed-by: Peter Xu <peterx@redhat.com>

> > Cc: <stable@vger.kernel.org> # 6.1.x
> 
> I think I need to clarify why the stable.
> 
> It is my understanding that some distros ship their kernel with
> CONFIG_DEBUG_VM set by default (I think Fedora comes to my mind?).
> I am fine with backing down if people think that this is an
> overreaction.

Fedora stopped having DEBUG_VM for some time, but not sure about when it's
still in the 6.1 trees.  It looks like cc stable is still reasonable from
that regard.

A side note is that when I'm looking at this, I went back and see why in
some cases we need the pfn maintained for the poisoned, then I saw the only
user is check_hwpoisoned_entry() who wants to do fast kills in some
contexts and that includes a double check on the pfns in a poisoned entry.
Then afaict this path is just too rarely used and buggy.

A few things we may need fixing, maybe someone in the loop would have time
to have a look:

- check_hwpoisoned_entry()
  - pte_none check is missing
  - all the rest swap types are missing (e.g., we want to kill the proc too
    if the page is during migration)
- check_hwpoisoned_pmd_entry()
  - need similar care like above (pmd_none is covered not others)

I copied Naoya too.

Thanks,

-- 
Peter Xu


  reply	other threads:[~2024-04-07 19:20 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-07 13:05 [PATCH] mm,swapops: Update check in is_pfn_swap_entry for hwpoison entries Oscar Salvador
2024-04-07 13:56 ` Oscar Salvador
2024-04-07 19:19   ` Peter Xu [this message]
2024-04-07 20:31     ` Oscar Salvador
2024-04-10  8:16       ` Miaohe Lin
2024-04-08  7:44 ` David Hildenbrand
2024-04-08  8:31 ` Miaohe Lin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZhLx3fwzQNPDbBei@x1n \
    --to=peterx@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=linmiaohe@huawei.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=naoya.horiguchi@nec.com \
    --cc=osalvador@suse.de \
    --cc=stable@vger.kernel.org \
    --cc=tony.luck@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.