From: David Hildenbrand <david@redhat.com>
To: Jinjiang Tu <tujinjiang@huawei.com>,
akpm@linux-foundation.org, linmiaohe@huawei.com
Cc: linux-mm@kvack.org, wangkefeng.wang@huawei.com, Zi Yan <ziy@nvidia.com>
Subject: Re: [PATCH] mm/vmscan: fix hwpoisoned large folio handling in shrink_folio_list
Date: Wed, 11 Jun 2025 11:20:01 +0200
Message-ID: <a5b77c94-bc8f-4a79-9e45-95dffbaaf280@redhat.com>
In-Reply-To: <b388bcc1-bdf5-fc6f-bccd-6541835f1c80@huawei.com>
On 11.06.25 11:00, Jinjiang Tu wrote:
>
> On 2025/6/11 16:35, David Hildenbrand wrote:
>> On 11.06.25 10:29, Jinjiang Tu wrote:
>>>
>>> On 2025/6/11 15:59, David Hildenbrand wrote:
>>>> On 11.06.25 09:46, Jinjiang Tu wrote:
>>>>> In shrink_folio_list(), the hwpoisoned folio may be a large folio, which
>>>>> can't be handled by unmap_poisoned_folio().
>>>>>
>>>>> Since UCE is rare in the real world, and a race with reclamation is even
>>>>> rarer, just skipping the hwpoisoned large folio is enough.
>>>>> memory_failure() will handle it if the UCE is triggered again.
>>>>>
>>>>> Fixes: 1b0449544c64 ("mm/vmscan: don't try to reclaim hwpoison folio")
>>>>
>>>> Please also add
>>>>
>>>> Closes:
>>>>
>>>> with a link to the report
>>> Thanks, I will add it.
>>>>
>>>>> Reported-by: syzbot+3b220254df55d8ca8a61@syzkaller.appspotmail.com
>>>>> Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
>>>>> ---
>>>>> mm/vmscan.c | 8 ++++++++
>>>>> 1 file changed, 8 insertions(+)
>>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>>> index b6f4db6c240f..3a4e8d7419ae 100644
>>>>> --- a/mm/vmscan.c
>>>>> +++ b/mm/vmscan.c
>>>>> @@ -1131,6 +1131,14 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>>>>> goto keep;
>>>>> if (folio_contain_hwpoisoned_page(folio)) {
>>>>> + /*
>>>>> + * unmap_poisoned_folio() can't handle large
>>>>> + * folio, just skip it. memory_failure() will
>>>>> + * handle it if the UCE is triggered again.
>>>>> + */
>>>>> + if (folio_test_large(folio))
>>>>> + goto keep_locked;
>>>>> +
>>>>> unmap_poisoned_folio(folio, folio_pfn(folio), false);
>>>>> folio_unlock(folio);
>>>>> folio_put(folio);
>>>>
>>>> Why not handle that in unmap_poisoned_folio() to make that limitation
>>>> clear and avoid?
>>> I tried to put the check in unmap_poisoned_folio(), but there are still
>>> other issues.
>>
>>
>>
>>> The calltrace in v6.6 kernel:
>>>
>>> Unable to handle kernel paging request at virtual address
>>> fbd5200000000024
>>> KASAN: maybe wild-memory-access in range
>>> [0xdead000000000120-0xdead000000000127]
>>> pc : __list_add_valid_or_report+0x50/0x158 lib/list_debug.c:32
>>> lr : __list_add_valid include/linux/list.h:88 [inline]
>>> lr : __list_add include/linux/list.h:150 [inline]
>>> lr : list_add_tail include/linux/list.h:183 [inline]
>>> lr : lru_add_page_tail.constprop.0+0x4ac/0x640 mm/huge_memory.c:3187
>>> Call trace:
>>> __list_add_valid_or_report+0x50/0x158 lib/list_debug.c:32
>>> __list_add_valid include/linux/list.h:88 [inline]
>>> __list_add include/linux/list.h:150 [inline]
>>> list_add_tail include/linux/list.h:183 [inline]
>>> lru_add_page_tail.constprop.0+0x4ac/0x640 mm/huge_memory.c:3187
>>> __split_huge_page_tail.isra.0+0x344/0x508 mm/huge_memory.c:3286
>>> __split_huge_page+0x244/0x1270 mm/huge_memory.c:3317
>>> split_huge_page_to_list_to_order+0x1038/0x1620 mm/huge_memory.c:3625
>>> split_folio_to_list_to_order include/linux/huge_mm.h:638 [inline]
>>> split_folio_to_order include/linux/huge_mm.h:643 [inline]
>>> deferred_split_scan+0x5f8/0xb70 mm/huge_memory.c:3778
>>> do_shrink_slab+0x2a0/0x828 mm/vmscan.c:927
>>> shrink_slab_memcg+0x2c0/0x558 mm/vmscan.c:996
>>> shrink_slab+0x228/0x250 mm/vmscan.c:1075
>>> shrink_node_memcgs+0x34c/0x6a0 mm/vmscan.c:6630
>>> shrink_node+0x21c/0x1378 mm/vmscan.c:6664
>>> shrink_zones.constprop.0+0x24c/0xab0 mm/vmscan.c:6906
>>> do_try_to_free_pages+0x150/0x880 mm/vmscan.c:6968
>>>
>>>
>>> The folio is deleted from the lru, so folio->lru can't be accessed. If
>>> the folio is split later, lru_add_split_folio() assumes the folio is
>>> on the lru.
>>
>> Not sure if something like the following would be appropriate:
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index b91a33fb6c694..fdd58c8ba5254 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -1566,6 +1566,9 @@ int unmap_poisoned_folio(struct folio *folio, unsigned long pfn, bool must_kill)
>> enum ttu_flags ttu = TTU_IGNORE_MLOCK | TTU_SYNC | TTU_HWPOISON;
>> struct address_space *mapping;
>>
>> + if (folio_test_large(folio) && !folio_test_hugetlb(folio))
>> + return -EBUSY;
>> +
>> if (folio_test_swapcache(folio)) {
>> pr_err("%#lx: keeping poisoned page in swap cache\n",
>> pfn);
>> ttu &= ~TTU_HWPOISON;
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index f8dfd2864bbf4..6a3426bc9e9d7 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -1138,7 +1138,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>> goto keep;
>>
>> if (folio_contain_hwpoisoned_page(folio)) {
>> - unmap_poisoned_folio(folio, folio_pfn(folio), false);
>> + if (unmap_poisoned_folio(folio, folio_pfn(folio), false)) {
>> + list_add(&folio->lru, &ret_folios);
>> folio_unlock(folio);
>> folio_put(folio);
>> continue;
>
> Is the expected behaviour to keep the folio on the lru if
> unmap_poisoned_folio() fails?
Good question, it's a mess.

If we keep the LRU bit cleared (kept isolated), we wouldn't have to add
it to the list.

But now I wonder where deferred_split_scan() would check for the LRU flag?

It seems to trylock, to then call split_folio().

In __folio_split(), I don't find any checks for the lru flag ... :(

We call lru_add_split_folio() where we

	VM_BUG_ON_FOLIO(folio_test_lru(new_folio), folio);

So now I am confused.
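
FWIW, the KASAN range in the trace above looks consistent with list
poisoning. Below is a rough user-space sketch, not kernel code: it
re-implements just enough of list.h to show that once an entry has been
list_del()'ed (as shrink_folio_list() does when it takes a folio off
folio_list), a later list_add_tail() against that entry dereferences
LIST_POISON2. The poison constants assume a 64-bit build with the arm64
default CONFIG_ILLEGAL_POINTER_VALUE; first_deref_on_add_tail() and
isolate_then_probe() are hypothetical helper names made up for this
illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumptions: 64-bit build; POISON_POINTER_DELTA is the arm64 default
 * CONFIG_ILLEGAL_POINTER_VALUE; offsets follow include/linux/poison.h.
 */
#define POISON_POINTER_DELTA	0xdead000000000000ULL
#define LIST_POISON1		(0x100ULL + POISON_POINTER_DELTA)
#define LIST_POISON2		(0x122ULL + POISON_POINTER_DELTA)

struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Simplified list_add_tail(): the first access through a pointer is prev->next. */
static void list_add_tail(struct list_head *new, struct list_head *head)
{
	struct list_head *prev = head->prev;

	new->next = head;
	new->prev = prev;
	prev->next = new;
	head->prev = new;
}

/* Simplified list_del(): unlink the entry, then poison both pointers. */
static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = (struct list_head *)(uintptr_t)LIST_POISON1;
	entry->prev = (struct list_head *)(uintptr_t)LIST_POISON2;
}

/*
 * Hypothetical helper: the address that list_add_tail(&tail->lru, &head->lru)
 * would dereference first, i.e. head->lru.prev.
 */
static uint64_t first_deref_on_add_tail(const struct list_head *head_lru)
{
	return (uint64_t)(uintptr_t)head_lru->prev;
}

/* Hypothetical helper: isolate one entry, then report that address. */
static uint64_t isolate_then_probe(void)
{
	struct list_head lru_list, folio_lru;

	INIT_LIST_HEAD(&lru_list);
	list_add_tail(&folio_lru, &lru_list);
	assert(lru_list.next == &folio_lru);

	list_del(&folio_lru);	/* models taking the folio off the list */
	return first_deref_on_add_tail(&folio_lru);
}
```

In this model isolate_then_probe() yields 0xdead000000000122, which sits
inside the reported [0xdead000000000120-0xdead000000000127] window. That
only shows the trace is compatible with splitting a folio whose ->lru was
poisoned; it does not tell us where the LRU flag should have been checked.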
>
> If so, we should:
>
> + if (unmap_poisoned_folio(folio, folio_pfn(folio), false)) {
> + goto keep_locked;
>
> otherwise, folio_put() is called twice to drop the ref grabbed at isolation.
Good point.

Maybe Zi Yan can help.

Re-adding them to the LRU if anything goes wrong should work, but I am
not sure if that's the right approach.
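
To spell out the refcount concern, here is a toy user-space model, again
only a sketch under assumptions: struct fake_folio, fail_branch(),
putback() and refs_after_reclaim() are all hypothetical names, the folio
starts with two references (one assumed long-term owner plus the one
grabbed at isolation), and the caller is assumed to drop the isolation
reference when it puts returned folios back on the LRU.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct folio: only what the sketch needs. */
struct fake_folio {
	int refcount;
	bool on_ret_list;
};

static void fake_folio_put(struct fake_folio *f)
{
	assert(f->refcount > 0);
	f->refcount--;
}

enum fail_path {
	FAIL_LIST_ADD_AND_PUT,	/* list_add() to ret_folios, then folio_put() */
	FAIL_KEEP_LOCKED,	/* goto keep_locked: hand back, no folio_put() */
};

/* Models the failure branch inside the shrink_folio_list() loop. */
static void fail_branch(struct fake_folio *f, enum fail_path path)
{
	f->on_ret_list = true;		/* both variants return the folio */
	if (path == FAIL_LIST_ADD_AND_PUT)
		fake_folio_put(f);	/* isolation reference dropped here */
}

/* Models the caller putting returned folios back on the LRU. */
static void putback(struct fake_folio *f)
{
	if (f->on_ret_list) {
		f->on_ret_list = false;
		fake_folio_put(f);	/* caller drops the isolation reference */
	}
}

/* References left once the failure branch and the putback have both run. */
static int refs_after_reclaim(enum fail_path path)
{
	struct fake_folio f = { .refcount = 2, .on_ret_list = false };

	fail_branch(&f, path);
	putback(&f);
	return f.refcount;
}
```

With these assumptions, refs_after_reclaim(FAIL_KEEP_LOCKED) leaves the
owner's reference in place, while refs_after_reclaim(FAIL_LIST_ADD_AND_PUT)
drops to zero because the isolation reference is put twice.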
--
Cheers,
David / dhildenb