From: Oscar Salvador <osalvador@suse.de>
To: Jinjiang Tu <tujinjiang@huawei.com>
Cc: akpm@linux-foundation.org, linmiaohe@huawei.com,
david@redhat.com, mhocko@kernel.org, linux-mm@kvack.org,
wangkefeng.wang@huawei.com
Subject: Re: [PATCH v2 2/2] mm/memory_hotplug: fix hwpoisoned large folio handling in do_migrate_range
Date: Tue, 1 Jul 2025 16:21:00 +0200
Message-ID: <aGPuzE3QMeszkOQj@localhost.localdomain>
In-Reply-To: <20250627125747.3094074-3-tujinjiang@huawei.com>

On Fri, Jun 27, 2025 at 08:57:47PM +0800, Jinjiang Tu wrote:
> In do_migrate_range(), the hwpoisoned folio may be a large folio, which
> can't be handled by unmap_poisoned_folio().
>
> I can reproduce this issue in qemu after adding a delay in memory_failure():
>
> BUG: kernel NULL pointer dereference, address: 0000000000000000
> Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> RIP: 0010:try_to_unmap_one+0x16a/0xfc0
> <TASK>
> rmap_walk_anon+0xda/0x1f0
> try_to_unmap+0x78/0x80
> ? __pfx_try_to_unmap_one+0x10/0x10
> ? __pfx_folio_not_mapped+0x10/0x10
> ? __pfx_folio_lock_anon_vma_read+0x10/0x10
> unmap_poisoned_folio+0x60/0x140
> do_migrate_range+0x4d1/0x600
> ? slab_memory_callback+0x6a/0x190
> ? notifier_call_chain+0x56/0xb0
> offline_pages+0x3e6/0x460
> memory_subsys_offline+0x130/0x1f0
> device_offline+0xba/0x110
> acpi_bus_offline+0xb7/0x130
> acpi_scan_hot_remove+0x77/0x290
> acpi_device_hotplug+0x1e0/0x240
> acpi_hotplug_work_fn+0x1a/0x30
> process_one_work+0x186/0x340
>
> In this case, just make offline_pages() fail.
>
> Besides, do_migrate_range() may be called between memory_failure() setting
> the hwpoison flag and isolating the folio from the LRU, so remove the
> WARN_ON(). In other places, unmap_poisoned_folio() is called when the folio
> is isolated; do the same in do_migrate_range() too.
>
> Fixes: b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined")
> Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
...
> @@ -2041,11 +2048,9 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
>
>  			ret = scan_movable_pages(pfn, end_pfn, &pfn);
>  			if (!ret) {
> -				/*
> -				 * TODO: fatal migration failures should bail
> -				 * out
> -				 */
> -				do_migrate_range(pfn, end_pfn);
> +				ret = do_migrate_range(pfn, end_pfn);
> +				if (ret)
> +					break;
I am not really sure about this one.
I get why you're adding it, but note that migrate_pages() can also return
"fatal" errors, and we don't propagate those.
The motto has always been to migrate as much as possible, and this changes
that behaviour.
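
To make the difference concrete, here is a toy userspace model of the two
policies. It is only a sketch, not kernel code: struct batch, struct result,
migrate_best_effort() and migrate_bail_early() are hypothetical names made up
for illustration, and each "batch" merely stands in for one migration attempt
that either succeeds or reports an errno-style failure.

```c
#include <stddef.h>

/* Hypothetical stand-in for one migration attempt over a batch of pages. */
struct batch {
	int nr_pages;	/* pages covered by this attempt */
	int err;	/* 0 on success, negative errno-style value on failure */
};

struct result {
	int migrated;	/* pages migrated successfully */
	int ret;	/* first error seen, or 0 if none */
};

/* "Migrate as much as possible": note the error, but keep going. */
static struct result migrate_best_effort(const struct batch *b, size_t n)
{
	struct result r = { 0, 0 };

	for (size_t i = 0; i < n; i++) {
		if (b[i].err) {
			if (!r.ret)
				r.ret = b[i].err;
			continue;
		}
		r.migrated += b[i].nr_pages;
	}
	return r;
}

/* Policy proposed by the patch: stop at the first failure. */
static struct result migrate_bail_early(const struct batch *b, size_t n)
{
	struct result r = { 0, 0 };

	for (size_t i = 0; i < n; i++) {
		if (b[i].err) {
			r.ret = b[i].err;
			break;
		}
		r.migrated += b[i].nr_pages;
	}
	return r;
}
```

With a failing batch in the middle of the range, the best-effort variant still
migrates the batches after it, while the bail-early variant leaves them
untouched. That is the behavioural change I am worried about.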
--
Oscar Salvador
SUSE Labs