[PATCH v3] mm/vmscan: fix hwpoisoned large folio handling in shrink_folio

linux-mm.kvack.org archive mirror

* [PATCH v3] mm/vmscan: fix hwpoisoned large folio handling in shrink_folio_list
@ 2025-07-11  2:17 Jinjiang Tu
  2025-07-11  3:04 ` Zi Yan
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Jinjiang Tu @ 2025-07-11  2:17 UTC (permalink / raw)
  To: akpm, linmiaohe, david, osalvador, mhocko, ziy
  Cc: linux-mm, wangkefeng.wang, tujinjiang

In shrink_folio_list(), the hwpoisoned folio may be a large folio, which
can't be handled by unmap_poisoned_folio(). For THP, try_to_unmap_one()
must be passed TTU_SPLIT_HUGE_PMD to split the huge PMD first and then
retry. Without TTU_SPLIT_HUGE_PMD, we trigger a NULL pointer dereference
of pvmw.pte. Even if we pass TTU_SPLIT_HUGE_PMD, we trigger a WARN_ON_ONCE
because the page isn't in the swapcache.

Since UCEs are rare in the real world, and races with reclamation are even
rarer, just skipping the hwpoisoned large folio is enough. memory_failure()
will handle it if the UCE is triggered again.

Fixes: 1b0449544c64 ("mm/vmscan: don't try to reclaim hwpoison folio")
Reported-by: syzbot+3b220254df55d8ca8a61@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/68412d57.050a0220.2461cf.000e.GAE@google.com/
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
---
v3:
 * collect Acked-by and Reviewed-by
 * update commit message and comments, suggested by Oscar Salvador.

 mm/memory-failure.c | 4 ++++
 mm/vmscan.c         | 8 ++++++++
 2 files changed, 12 insertions(+)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index b91a33fb6c69..9ee176fcc949 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1561,6 +1561,10 @@ static int get_hwpoison_page(struct page *p, unsigned long flags)
 	return ret;
 }
 
+/*
+ * The caller must guarantee the folio isn't large folio. try_to_unmap()
+ * can't handle it.
+ */
 int unmap_poisoned_folio(struct folio *folio, unsigned long pfn, bool must_kill)
 {
 	enum ttu_flags ttu = TTU_IGNORE_MLOCK | TTU_SYNC | TTU_HWPOISON;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index f8dfd2864bbf..424412680cfc 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1138,6 +1138,14 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 			goto keep;
 
 		if (folio_contain_hwpoisoned_page(folio)) {
+			/*
+			 * unmap_poisoned_folio() can't handle large
+			 * folio, just skip it. memory_failure() will
+			 * handle it if the UCE is triggered again.
+			 */
+			if (folio_test_large(folio))
+				goto keep_locked;
+
 			unmap_poisoned_folio(folio, folio_pfn(folio), false);
 			folio_unlock(folio);
 			folio_put(folio);
-- 
2.43.0




* Re: [PATCH v3] mm/vmscan: fix hwpoisoned large folio handling in shrink_folio_list
  2025-07-11  2:17 [PATCH v3] mm/vmscan: fix hwpoisoned large folio handling in shrink_folio_list Jinjiang Tu
@ 2025-07-11  3:04 ` Zi Yan
  2025-07-11  5:37 ` Oscar Salvador
  2025-07-11  8:05 ` David Hildenbrand
  2 siblings, 0 replies; 7+ messages in thread
From: Zi Yan @ 2025-07-11  3:04 UTC (permalink / raw)
  To: Jinjiang Tu
  Cc: akpm, linmiaohe, david, osalvador, mhocko, linux-mm,
	wangkefeng.wang

On 10 Jul 2025, at 22:17, Jinjiang Tu wrote:

> In shrink_folio_list(), the hwpoisoned folio may be a large folio, which
> can't be handled by unmap_poisoned_folio(). For THP, try_to_unmap_one()
> must be passed TTU_SPLIT_HUGE_PMD to split the huge PMD first and then
> retry. Without TTU_SPLIT_HUGE_PMD, we trigger a NULL pointer dereference
> of pvmw.pte. Even if we pass TTU_SPLIT_HUGE_PMD, we trigger a WARN_ON_ONCE
> because the page isn't in the swapcache.
>
> Since UCEs are rare in the real world, and races with reclamation are even
> rarer, just skipping the hwpoisoned large folio is enough. memory_failure()
> will handle it if the UCE is triggered again.
>
> Fixes: 1b0449544c64 ("mm/vmscan: don't try to reclaim hwpoison folio")
> Reported-by: syzbot+3b220254df55d8ca8a61@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/68412d57.050a0220.2461cf.000e.GAE@google.com/
> Acked-by: David Hildenbrand <david@redhat.com>
> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
> Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
> ---
> v3:
>  * collect Acked-by and Reviewed-by
>  * update commit message and comments, suggested by Oscar Salvador.
>
>  mm/memory-failure.c | 4 ++++
>  mm/vmscan.c         | 8 ++++++++
>  2 files changed, 12 insertions(+)
>
Acked-by: Zi Yan <ziy@nvidia.com>

Best Regards,
Yan, Zi



* Re: [PATCH v3] mm/vmscan: fix hwpoisoned large folio handling in shrink_folio_list
  2025-07-11  2:17 [PATCH v3] mm/vmscan: fix hwpoisoned large folio handling in shrink_folio_list Jinjiang Tu
  2025-07-11  3:04 ` Zi Yan
@ 2025-07-11  5:37 ` Oscar Salvador
  2025-07-11  8:05 ` David Hildenbrand
  2 siblings, 0 replies; 7+ messages in thread
From: Oscar Salvador @ 2025-07-11  5:37 UTC (permalink / raw)
  To: Jinjiang Tu
  Cc: akpm, linmiaohe, david, mhocko, ziy, linux-mm, wangkefeng.wang

On Fri, Jul 11, 2025 at 10:17:34AM +0800, Jinjiang Tu wrote:
> In shrink_folio_list(), the hwpoisoned folio may be a large folio, which
> can't be handled by unmap_poisoned_folio(). For THP, try_to_unmap_one()
> must be passed TTU_SPLIT_HUGE_PMD to split the huge PMD first and then
> retry. Without TTU_SPLIT_HUGE_PMD, we trigger a NULL pointer dereference
> of pvmw.pte. Even if we pass TTU_SPLIT_HUGE_PMD, we trigger a WARN_ON_ONCE
> because the page isn't in the swapcache.
> 
> Since UCEs are rare in the real world, and races with reclamation are even
> rarer, just skipping the hwpoisoned large folio is enough. memory_failure()
> will handle it if the UCE is triggered again.
> 
> Fixes: 1b0449544c64 ("mm/vmscan: don't try to reclaim hwpoison folio")
> Reported-by: syzbot+3b220254df55d8ca8a61@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/68412d57.050a0220.2461cf.000e.GAE@google.com/
> Acked-by: David Hildenbrand <david@redhat.com>
> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
> Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>

Reviewed-by: Oscar Salvador <osalvador@suse.de>

 

-- 
Oscar Salvador
SUSE Labs



* Re: [PATCH v3] mm/vmscan: fix hwpoisoned large folio handling in shrink_folio_list
  2025-07-11  2:17 [PATCH v3] mm/vmscan: fix hwpoisoned large folio handling in shrink_folio_list Jinjiang Tu
  2025-07-11  3:04 ` Zi Yan
  2025-07-11  5:37 ` Oscar Salvador
@ 2025-07-11  8:05 ` David Hildenbrand
  2025-07-11  8:55   ` [PATCH v4] " Jinjiang Tu
  2025-07-11  8:56   ` [PATCH v3] " Jinjiang Tu
  2 siblings, 2 replies; 7+ messages in thread
From: David Hildenbrand @ 2025-07-11  8:05 UTC (permalink / raw)
  To: Jinjiang Tu, akpm, linmiaohe, osalvador, mhocko, ziy
  Cc: linux-mm, wangkefeng.wang

On 11.07.25 04:17, Jinjiang Tu wrote:
> In shrink_folio_list(), the hwpoisoned folio may be a large folio, which
> can't be handled by unmap_poisoned_folio(). For THP, try_to_unmap_one()
> must be passed TTU_SPLIT_HUGE_PMD to split the huge PMD first and then
> retry. Without TTU_SPLIT_HUGE_PMD, we trigger a NULL pointer dereference
> of pvmw.pte. Even if we pass TTU_SPLIT_HUGE_PMD, we trigger a WARN_ON_ONCE
> because the page isn't in the swapcache.
> 
> Since UCEs are rare in the real world, and races with reclamation are even
> rarer, just skipping the hwpoisoned large folio is enough. memory_failure()
> will handle it if the UCE is triggered again.
> 
> Fixes: 1b0449544c64 ("mm/vmscan: don't try to reclaim hwpoison folio")
> Reported-by: syzbot+3b220254df55d8ca8a61@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/68412d57.050a0220.2461cf.000e.GAE@google.com/
> Acked-by: David Hildenbrand <david@redhat.com>
> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
> Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
> ---
> v3:
>   * collect Acked-by and Reviewed-by
>   * update commit message and comments, suggested by Oscar Salvador.
> 
>   mm/memory-failure.c | 4 ++++
>   mm/vmscan.c         | 8 ++++++++
>   2 files changed, 12 insertions(+)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index b91a33fb6c69..9ee176fcc949 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1561,6 +1561,10 @@ static int get_hwpoison_page(struct page *p, unsigned long flags)
>   	return ret;
>   }
>   
> +/*
> + * The caller must guarantee the folio isn't large folio. try_to_unmap()
> + * can't handle it.

Not completely accurate: it may be a hugetlb folio, which is also large
but supported.

"isn't a large folio, except hugetlb."

-- 
Cheers,

David / dhildenb




* [PATCH v4] mm/vmscan: fix hwpoisoned large folio handling in shrink_folio_list
  2025-07-11  8:05 ` David Hildenbrand
@ 2025-07-11  8:55   ` Jinjiang Tu
  2025-07-12 23:42     ` Andrew Morton
  2025-07-11  8:56   ` [PATCH v3] " Jinjiang Tu
  1 sibling, 1 reply; 7+ messages in thread
From: Jinjiang Tu @ 2025-07-11  8:55 UTC (permalink / raw)
  To: David Hildenbrand, akpm, linmiaohe, osalvador, mhocko, ziy
  Cc: linux-mm, wangkefeng.wang

In shrink_folio_list(), the hwpoisoned folio may be a large folio, which
can't be handled by unmap_poisoned_folio(). For THP, try_to_unmap_one()
must be passed TTU_SPLIT_HUGE_PMD to split the huge PMD first and then
retry. Without TTU_SPLIT_HUGE_PMD, we trigger a NULL pointer dereference
of pvmw.pte. Even if we pass TTU_SPLIT_HUGE_PMD, we trigger a WARN_ON_ONCE
because the page isn't in the swapcache.

Since UCEs are rare in the real world, and races with reclamation are even
rarer, just skipping the hwpoisoned large folio is enough. memory_failure()
will handle it if the UCE is triggered again.

Fixes: 1b0449544c64 ("mm/vmscan: don't try to reclaim hwpoison folio")
Reported-by: syzbot+3b220254df55d8ca8a61@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/68412d57.050a0220.2461cf.000e.GAE@google.com/
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
---
  mm/memory-failure.c | 4 ++++
  mm/vmscan.c         | 8 ++++++++
  2 files changed, 12 insertions(+)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index b91a33fb6c69..225dddff091d 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1561,6 +1561,10 @@ static int get_hwpoison_page(struct page *p, unsigned long flags)
 	return ret;
 }
 
+/*
+ * The caller must guarantee the folio isn't large folio, except hugetlb.
+ * try_to_unmap() can't handle it.
+ */
 int unmap_poisoned_folio(struct folio *folio, unsigned long pfn, bool must_kill)
 {
 	enum ttu_flags ttu = TTU_IGNORE_MLOCK | TTU_SYNC | TTU_HWPOISON;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index f8dfd2864bbf..424412680cfc 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1138,6 +1138,14 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 			goto keep;
 
 		if (folio_contain_hwpoisoned_page(folio)) {
+			/*
+			 * unmap_poisoned_folio() can't handle large
+			 * folio, just skip it. memory_failure() will
+			 * handle it if the UCE is triggered again.
+			 */
+			if (folio_test_large(folio))
+				goto keep_locked;
+
 			unmap_poisoned_folio(folio, folio_pfn(folio), false);
 			folio_unlock(folio);
 			folio_put(folio);
-- 
2.43.0





* Re: [PATCH v3] mm/vmscan: fix hwpoisoned large folio handling in shrink_folio_list
  2025-07-11  8:05 ` David Hildenbrand
  2025-07-11  8:55   ` [PATCH v4] " Jinjiang Tu
@ 2025-07-11  8:56   ` Jinjiang Tu
  1 sibling, 0 replies; 7+ messages in thread
From: Jinjiang Tu @ 2025-07-11  8:56 UTC (permalink / raw)
  To: David Hildenbrand, akpm, linmiaohe, osalvador, mhocko, ziy
  Cc: linux-mm, wangkefeng.wang


On 2025/7/11 16:05, David Hildenbrand wrote:
> On 11.07.25 04:17, Jinjiang Tu wrote:
>> In shrink_folio_list(), the hwpoisoned folio may be a large folio, which
>> can't be handled by unmap_poisoned_folio(). For THP, try_to_unmap_one()
>> must be passed TTU_SPLIT_HUGE_PMD to split the huge PMD first and then
>> retry. Without TTU_SPLIT_HUGE_PMD, we trigger a NULL pointer dereference
>> of pvmw.pte. Even if we pass TTU_SPLIT_HUGE_PMD, we trigger a WARN_ON_ONCE
>> because the page isn't in the swapcache.
>>
>> Since UCEs are rare in the real world, and races with reclamation are even
>> rarer, just skipping the hwpoisoned large folio is enough. memory_failure()
>> will handle it if the UCE is triggered again.
>>
>> Fixes: 1b0449544c64 ("mm/vmscan: don't try to reclaim hwpoison folio")
>> Reported-by: syzbot+3b220254df55d8ca8a61@syzkaller.appspotmail.com
>> Closes: https://lore.kernel.org/all/68412d57.050a0220.2461cf.000e.GAE@google.com/
>> Acked-by: David Hildenbrand <david@redhat.com>
>> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
>> Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
>> ---
>> v3:
>>   * collect Acked-by and Reviewed-by
>>   * update commit message and comments, suggested by Oscar Salvador.
>>
>>   mm/memory-failure.c | 4 ++++
>>   mm/vmscan.c         | 8 ++++++++
>>   2 files changed, 12 insertions(+)
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index b91a33fb6c69..9ee176fcc949 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -1561,6 +1561,10 @@ static int get_hwpoison_page(struct page *p, unsigned long flags)
>>  	return ret;
>>  }
>>  
>> +/*
>> + * The caller must guarantee the folio isn't large folio. try_to_unmap()
>> + * can't handle it.
>
> Not completely accurate: it may be a hugetlb folio, which is also
> large but supported.
>
> "isn't a large folio, except hugetlb."
>
Thanks, updated it.



* Re: [PATCH v4] mm/vmscan: fix hwpoisoned large folio handling in shrink_folio_list
  2025-07-11  8:55   ` [PATCH v4] " Jinjiang Tu
@ 2025-07-12 23:42     ` Andrew Morton
  0 siblings, 0 replies; 7+ messages in thread
From: Andrew Morton @ 2025-07-12 23:42 UTC (permalink / raw)
  To: Jinjiang Tu
  Cc: David Hildenbrand, linmiaohe, osalvador, mhocko, ziy, linux-mm,
	wangkefeng.wang

On Fri, 11 Jul 2025 16:55:45 +0800 Jinjiang Tu <tujinjiang@huawei.com> wrote:

> In shrink_folio_list(), the hwpoisoned folio may be a large folio, which
> can't be handled by unmap_poisoned_folio(). For THP, try_to_unmap_one()
> must be passed TTU_SPLIT_HUGE_PMD to split the huge PMD first and then
> retry. Without TTU_SPLIT_HUGE_PMD, we trigger a NULL pointer dereference
> of pvmw.pte. Even if we pass TTU_SPLIT_HUGE_PMD, we trigger a WARN_ON_ONCE
> because the page isn't in the swapcache.
> 
> Since UCEs are rare in the real world, and races with reclamation are even
> rarer, just skipping the hwpoisoned large folio is enough. memory_failure()
> will handle it if the UCE is triggered again.

Your email client made a mess of the whitespace.  I fixed that up and
turned this into a v2->v4 delta so I/we can see what happened:


--- a/mm/memory-failure.c~mm-vmscan-fix-hwpoisoned-large-folio-handling-in-shrink_folio_list-v4
+++ a/mm/memory-failure.c
@@ -1561,6 +1561,10 @@ static int get_hwpoison_page(struct page
 	return ret;
 }
 
+/*
+ * The caller must guarantee the folio isn't large folio, except hugetlb.
+ * try_to_unmap() can't handle it.
+ */
 int unmap_poisoned_folio(struct folio *folio, unsigned long pfn, bool must_kill)
 {
 	enum ttu_flags ttu = TTU_IGNORE_MLOCK | TTU_SYNC | TTU_HWPOISON;
_


Also, the v2 patch's changelog (probably as amended by me) had a nice
description of the race, which is lost in this v4 patch.  I restored
it, so the final changelog is as below.  Please check.


From: Jinjiang Tu <tujinjiang@huawei.com>
Subject: mm/vmscan: fix hwpoisoned large folio handling in shrink_folio_list
Date: Fri, 27 Jun 2025 20:57:46 +0800

In shrink_folio_list(), the hwpoisoned folio may be a large folio, which
can't be handled by unmap_poisoned_folio().  For THP, try_to_unmap_one()
must be passed TTU_SPLIT_HUGE_PMD to split the huge PMD first and then
retry.  Without TTU_SPLIT_HUGE_PMD, we trigger a NULL pointer dereference
of pvmw.pte.  Even if we pass TTU_SPLIT_HUGE_PMD, we trigger a
WARN_ON_ONCE because the page isn't in the swapcache.

Since UCEs are rare in the real world, and races with reclamation are even
rarer, just skipping the hwpoisoned large folio is enough.  memory_failure()
will handle it if the UCE is triggered again.

This happens when memory reclaim of a large folio races with
memory_failure(), and leads to a kernel panic.  The race is as
follows:

cpu0                         cpu1
 shrink_folio_list            memory_failure
                               TestSetPageHWPoison
  unmap_poisoned_folio
  --> trigger BUG_ON due to
  unmap_poisoned_folio couldn't
   handle large folio

Link: https://lkml.kernel.org/r/20250627125747.3094074-2-tujinjiang@huawei.com
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
Fixes: 1b0449544c64 ("mm/vmscan: don't try to reclaim hwpoison folio")
Reported-by: syzbot+3b220254df55d8ca8a61@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/68412d57.050a0220.2461cf.000e.GAE@google.com/
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/vmscan.c |    8 ++++++++
 1 file changed, 8 insertions(+)

--- a/mm/vmscan.c~mm-vmscan-fix-hwpoisoned-large-folio-handling-in-shrink_folio_list
+++ a/mm/vmscan.c
@@ -1138,6 +1138,14 @@ retry:
 			goto keep;
 
 		if (folio_contain_hwpoisoned_page(folio)) {
+			/*
+			 * unmap_poisoned_folio() can't handle large
+			 * folio, just skip it. memory_failure() will
+			 * handle it if the UCE is triggered again.
+			 */
+			if (folio_test_large(folio))
+				goto keep_locked;
+
 			unmap_poisoned_folio(folio, folio_pfn(folio), false);
 			folio_unlock(folio);
 			folio_put(folio);
_




Thread overview: 7+ messages
2025-07-11  2:17 [PATCH v3] mm/vmscan: fix hwpoisoned large folio handling in shrink_folio_list Jinjiang Tu
2025-07-11  3:04 ` Zi Yan
2025-07-11  5:37 ` Oscar Salvador
2025-07-11  8:05 ` David Hildenbrand
2025-07-11  8:55   ` [PATCH v4] " Jinjiang Tu
2025-07-12 23:42     ` Andrew Morton
2025-07-11  8:56   ` [PATCH v3] " Jinjiang Tu
