public inbox for linux-mm@kvack.org
From: Alistair Popple <apopple@nvidia.com>
To: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
Cc: Zenghui Yu <zenghui.yu@linux.dev>,
	linux-mm@kvack.org,  linux-kernel@vger.kernel.org, jgg@ziepe.ca,
	leon@kernel.org, akpm@linux-foundation.org,  david@kernel.org,
	Liam.Howlett@oracle.com, vbabka@kernel.org, rppt@kernel.org,
	 surenb@google.com, mhocko@suse.com, balbirs@nvidia.com
Subject: Re: running mm/ksft_hmm.sh on arm64 results in a kernel panic
Date: Thu, 19 Mar 2026 12:49:23 +1100
Message-ID: <abtUZpLOSaSMAkCK@nvdebian.thelocal>
In-Reply-To: <3f58a6f6-bf26-4c6c-8bc4-c05264ad0cc3@lucifer.local>

On 2026-03-19 at 02:05 +1100, "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote...
> On Wed, Mar 18, 2026 at 01:26:39PM +0800, Zenghui Yu wrote:
> > Hi all,
> >
> > When running mm/ksft_hmm.sh in my arm64 virtual machine, I ran into the
> > following kernel panic:
> >
> > [root@localhost mm]# ./ksft_hmm.sh
> > TAP version 13
> > # --------------------------------
> > # running bash ./test_hmm.sh smoke
> > # --------------------------------
> > # Running smoke test. Note, this test provides basic coverage.
> > # TAP version 13
> > # 1..74
> > # # Starting 74 tests from 4 test cases.
> > # #  RUN           hmm.hmm_device_private.benchmark_thp_migration ...
> > #
> > # HMM THP Migration Benchmark
> > # ---------------------------
> > # System page size: 16384 bytes
> > #
> > # === Small Buffer (512KB) (0.5 MB) ===
> > #                      | With THP        | Without THP     | Improvement
> > # ---------------------------------------------------------------------
> > # Sys->Dev Migration   | 0.423 ms        | 0.182 ms        | -133.0%
> > # Dev->Sys Migration   | 0.027 ms        | 0.025 ms        | -7.0%
> > # S->D Throughput      | 1.15 GB/s      | 2.69 GB/s      | -57.1%
> > # D->S Throughput      | 18.12 GB/s      | 19.38 GB/s      | -6.5%
> > #
> > # === Half THP Size (1MB) (1.0 MB) ===
> > #                      | With THP        | Without THP     | Improvement
> > # ---------------------------------------------------------------------
> > # Sys->Dev Migration   | 0.367 ms        | 1.187 ms        | 69.0%
> > # Dev->Sys Migration   | 0.048 ms        | 0.049 ms        | 2.2%
> > # S->D Throughput      | 2.66 GB/s      | 0.82 GB/s      | 222.9%
> > # D->S Throughput      | 20.53 GB/s      | 20.08 GB/s      | 2.3%
> > #
> > # === Single THP Size (2MB) (2.0 MB) ===
> > #                      | With THP        | Without THP     | Improvement
> > # ---------------------------------------------------------------------
> > # Sys->Dev Migration   | 0.817 ms        | 0.782 ms        | -4.4%
> > # Dev->Sys Migration   | 0.089 ms        | 0.096 ms        | 7.1%
> > # S->D Throughput      | 2.39 GB/s      | 2.50 GB/s      | -4.2%
> > # D->S Throughput      | 22.00 GB/s      | 20.44 GB/s      | 7.6%
> > #
> > # === Two THP Size (4MB) (4.0 MB) ===
> > #                      | With THP        | Without THP     | Improvement
> > # ---------------------------------------------------------------------
> > # Sys->Dev Migration   | 3.419 ms        | 2.337 ms        | -46.3%
> > # Dev->Sys Migration   | 0.321 ms        | 0.225 ms        | -42.6%
> > # S->D Throughput      | 1.14 GB/s      | 1.67 GB/s      | -31.6%
> > # D->S Throughput      | 12.17 GB/s      | 17.36 GB/s      | -29.9%
> > #
> > # === Four THP Size (8MB) (8.0 MB) ===
> > #                      | With THP        | Without THP     | Improvement
> > # ---------------------------------------------------------------------
> > # Sys->Dev Migration   | 4.535 ms        | 4.563 ms        | 0.6%
> > # Dev->Sys Migration   | 0.583 ms        | 0.582 ms        | -0.2%
> > # S->D Throughput      | 1.72 GB/s      | 1.71 GB/s      | 0.6%
> > # D->S Throughput      | 13.39 GB/s      | 13.43 GB/s      | -0.2%
> > #
> > # === Eight THP Size (16MB) (16.0 MB) ===
> > #                      | With THP        | Without THP     | Improvement
> > # ---------------------------------------------------------------------
> > # Sys->Dev Migration   | 10.190 ms        | 9.805 ms        | -3.9%
> > # Dev->Sys Migration   | 1.130 ms        | 1.195 ms        | 5.5%
> > # S->D Throughput      | 1.53 GB/s      | 1.59 GB/s      | -3.8%
> > # D->S Throughput      | 13.83 GB/s      | 13.07 GB/s      | 5.8%
> > #
> > # === One twenty eight THP Size (256MB) (256.0 MB) ===
> > #                      | With THP        | Without THP     | Improvement
> > # ---------------------------------------------------------------------
> > # Sys->Dev Migration   | 80.464 ms        | 92.764 ms        | 13.3%
> > # Dev->Sys Migration   | 9.528 ms        | 18.166 ms        | 47.6%
> > # S->D Throughput      | 3.11 GB/s      | 2.70 GB/s      | 15.3%
> > # D->S Throughput      | 26.24 GB/s      | 13.76 GB/s      | 90.7%
> > # #            OK  hmm.hmm_device_private.benchmark_thp_migration
> > # ok 1 hmm.hmm_device_private.benchmark_thp_migration
> > # #  RUN           hmm.hmm_device_private.migrate_anon_huge_zero_err ...
> > # # hmm-tests.c:2622:migrate_anon_huge_zero_err:Expected ret (-2) == 0 (0)
> >
> > [  154.077143] Unable to handle kernel paging request at virtual address
> > 0000000000005268
> > [  154.077179] Mem abort info:
> > [  154.077203]   ESR = 0x0000000096000007
> > [  154.077219]   EC = 0x25: DABT (current EL), IL = 32 bits
> > [  154.078433]   SET = 0, FnV = 0
> > [  154.078434]   EA = 0, S1PTW = 0
> > [  154.078435]   FSC = 0x07: level 3 translation fault
> > [  154.078435] Data abort info:
> > [  154.078436]   ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
> > [  154.078459]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> > [  154.078479]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> > [  154.078484] user pgtable: 16k pages, 47-bit VAs, pgdp=000000010b920000
> > [  154.078487] [0000000000005268] pgd=0800000101b4c403,
> > p4d=0800000101b4c403, pud=0800000101b4c403, pmd=0800000108cd8403,
> > pte=0000000000000000
> > [  154.078520] Internal error: Oops: 0000000096000007 [#1]  SMP
> > [  154.098664] Modules linked in: test_hmm rfkill drm fuse backlight ipv6
> > [  154.100468] CPU: 7 UID: 0 PID: 1357 Comm: hmm-tests Kdump: loaded Not
> > tainted 7.0.0-rc4-00029-ga989fde763f4-dirty #260 PREEMPT
> > [  154.103855] Hardware name: QEMU QEMU Virtual Machine, BIOS
> > edk2-stable202408-prebuilt.qemu.org 08/13/2024
> > [  154.104409] pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS
> > BTYPE=--)
> > [  154.104847] pc : dmirror_devmem_fault+0xe4/0x1c0 [test_hmm]
> > [  154.105758] lr : dmirror_devmem_fault+0xcc/0x1c0 [test_hmm]
> > [  154.109465] sp : ffffc000855ab430
> > [  154.109677] x29: ffffc000855ab430 x28: ffff8000c9f73e40 x27:
> > ffff8000c9f73e40
> > [  154.110091] x26: ffff8000cb920000 x25: ffffc000812e0000 x24:
> > 0000000000000000
> > [  154.110540] x23: ffff8000c9f73e40 x22: 0000000000000000 x21:
> > 0000000000000008
> > [  154.110888] x20: ffff8000c07e1980 x19: ffffc000855ab618 x18:
> > ffffc000855abc40
> > [  154.111223] x17: 0000000000000000 x16: 0000000000000000 x15:
> > 0000000000000000
> > [  154.111563] x14: 0000000000000000 x13: 0000000000000000 x12:
> > ffffc00080fedd68
> > [  154.111903] x11: 00007fffa3bf7fff x10: 0000000000000000 x9 :
> > 1ffff00019166a41
> > [  154.112244] x8 : ffff8000c132df20 x7 : 0000000000000000 x6 :
> > ffff8000c53bfe88
> > [  154.112581] x5 : 0000000000000009 x4 : ffffc000855ab3d0 x3 :
> > 0000000000000004
> > [  154.112921] x2 : 0000000000000004 x1 : ffff8000c132df18 x0 :
> > 0000000000005200
> > [  154.113254] Call trace:
> > [  154.113370]  dmirror_devmem_fault+0xe4/0x1c0 [test_hmm] (P)
> > [  154.113679]  do_swap_page+0x132c/0x17b0
> > [  154.113912]  __handle_mm_fault+0x7e4/0x1af4
> > [  154.114124]  handle_mm_fault+0xb4/0x294
> > [  154.114398]  __get_user_pages+0x210/0xbfc
> > [  154.114607]  get_dump_page+0xd8/0x144
> > [  154.114795]  dump_user_range+0x70/0x2e8
> > [  154.115020]  elf_core_dump+0xb64/0xe40
> > [  154.115212]  vfs_coredump+0xfb4/0x1ce8
> > [  154.115397]  get_signal+0x6cc/0x844
> > [  154.115582]  arch_do_signal_or_restart+0x7c/0x33c
> > [  154.115805]  exit_to_user_mode_loop+0x104/0x16c
> > [  154.116030]  el0_svc+0x174/0x178
> > [  154.116216]  el0t_64_sync_handler+0xa0/0xe4
> > [  154.116414]  el0t_64_sync+0x198/0x19c
> > [  154.116594] Code: d2800083 f9400280 f9003be0 2a0303e2 (b9406800)
> > [  154.116891] ---[ end trace 0000000000000000 ]---
> > [  158.741771] Kernel panic - not syncing: Oops: Fatal exception
> > [  158.742164] SMP: stopping secondary CPUs
> > [  158.742970] Kernel Offset: disabled
> > [  158.743162] CPU features: 0x0000000,00060005,11210501,94067723
> > [  158.743440] Memory Limit: none
> > [  164.002089] Starting crashdump kernel...
> > [  164.002867] Bye!
> 
> That 'Bye!' is delightful :)
> 
> >
> > [root@localhost linux]# ./scripts/faddr2line lib/test_hmm.ko
> > dmirror_devmem_fault+0xe4/0x1c0
> > dmirror_devmem_fault+0xe4/0x1c0:
> > dmirror_select_device at /root/code/linux/lib/test_hmm.c:153
> > (inlined by) dmirror_devmem_fault at /root/code/linux/lib/test_hmm.c:1659
> >
> > The kernel is built with arm64's virt.config plus
> >
> > +CONFIG_ARM64_16K_PAGES=y
> > +CONFIG_ZONE_DEVICE=y
> > +CONFIG_DEVICE_PRIVATE=y
> > +CONFIG_TEST_HMM=m
> >
> > I *guess* the problem is that migrate_anon_huge_zero_err() has chosen an
> > incorrect THP size (which should be 32M in a system with 16k page size),
> 
> Yeah, it hardcodes to 2mb:
> 
> TEST_F(hmm, migrate_anon_huge_zero_err)
> {
> 	...
> 
> 	size = TWOMEG;
> }
> 
> Which isn't correct obviously and needs to be fixed.
> 
> We should read /sys/kernel/mm/transparent_hugepage/hpage_pmd_size instead.
> 
> vm_util.h has read_pmd_pagesize(), so this can be fixed with:
> 
> 	size = read_pmd_pagesize();
> 
> We then madvise(..., MADV_HUGEPAGE) a region of that size, which is too small:
> 
> TEST_F(hmm, migrate_anon_huge_zero_err)
> {
> 	...
> 
> 	size = TWOMEG;
> 
> 	...
> 
> 	ret = madvise(map, size, MADV_HUGEPAGE);
> 	ASSERT_EQ(ret, 0); <-- but should succeed anyway, just won't do anything
> 
> 	...
> 
> 	ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_FLAGS, buffer,
> 			      HMM_DMIRROR_FLAG_FAIL_ALLOC);
> }
> 
> Then we switch into lib/test_hmm.c:
> 
> static void dmirror_migrate_alloc_and_copy(struct migrate_vma *args,
> 					   struct dmirror *dmirror)
> {
> 	...
> 
> 	for (addr = args->start; addr < args->end; ) {
> 		...
> 
> 		if (dmirror->flags & HMM_DMIRROR_FLAG_FAIL_ALLOC) {
> 			dmirror->flags &= ~HMM_DMIRROR_FLAG_FAIL_ALLOC;
> 			dpage = NULL;  <-- force failure for 1st page
> 
> 	...
> 
> 		if (!dpage) {
> 			...
> 
> 			if (!is_large) <-- isn't large, as MADV_HUGEPAGE failed
> 				goto next;
> 
> 	...
> next:
> 		src++;
> 		dst++;
> 		addr += PAGE_SIZE;
> 	}
> }
> 
> Back to the hmm-tests.c selftest:
> 
> TEST_F(hmm, migrate_anon_huge_zero_err)
> {
> 	...
> 
> 	ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_FLAGS, buffer,
> 			      HMM_DMIRROR_FLAG_FAIL_ALLOC);
> 	ASSERT_EQ(ret, 0);                  <-- succeeds but...
> 	ASSERT_EQ(buffer->cpages, npages);  <-- cpages = npages - 1.
> }
> 
> So then we try to tear down, which invokes:
> 
> FIXTURE_TEARDOWN(hmm)
> {
> 	int ret = close(self->fd);  <-- triggers kernel dmirror_fops_release()
> 	...
> }
> 
> In the kernel:
> 
> static int dmirror_fops_release(struct inode *inode, struct file *filp)
> {
> 	struct dmirror *dmirror = filp->private_data;
> 	...
> 
> 	kfree(dmirror);  <-- frees dmirror...
> 	return 0;
> }
> 
> So dmirror is freed, but in dmirror_migrate_alloc_and_copy(), for all those pages
> we DID migrate:
> 
> static void dmirror_migrate_alloc_and_copy(struct migrate_vma *args,
> 					   struct dmirror *dmirror)
> {
> 	...
> 
> 	for (addr = args->start; addr < args->end; ) {
> 		...
> 
> 		if (!dpage) { <-- we will succeed allocation so don't branch.
> 			...
> 		}
> 
> 		rpage = BACKING_PAGE(dpage);
> 
> 		/*
> 		 * Normally, a device would use the page->zone_device_data to
> 		 * point to the mirror but here we use it to hold the page for
> 		 * the simulated device memory and that page holds the pointer
> 		 * to the mirror.
> 		 */
> 		rpage->zone_device_data = dmirror;
> 
> 		...
> 	}
> 
> 	...
> }
> 
> So now a bunch of device private pages have zone_device_data set to a dangling
> dmirror pointer.
> 
> Then on coredump, we walk the VMAs, meaning we fault in device private pages and
> end up invoking do_swap_page() which in turn calls dmirror_devmem_fault() (via
> the struct dev_pagemap_ops
> dmirror_devmem_ops->migrate_to_ram=dmirror_devmem_fault callback)
> 
> This is via get_dump_page() -> __get_user_pages_locked(..., FOLL_FORCE |
> FOLL_DUMP | FOLL_GET) -> __get_user_pages() -> handle_mm_fault() ->
> __handle_mm_fault() -> do_swap_page() and:
> 
> vm_fault_t do_swap_page(struct vm_fault *vmf)
> {
> 	...
> 	entry = softleaf_from_pte(vmf->orig_pte);
> 	if (unlikely(!softleaf_is_swap(entry))) {
> 		if (softleaf_is_migration(entry)) {
> 			...
> 		} else if (softleaf_is_device_private(entry)) {
> 			...
> 
> 			if (trylock_page(vmf->page)) {
> 				...
> 
> 				ret = pgmap->ops->migrate_to_ram(vmf);
> 
> 				...
> 			}
> 
> 			...
> 		}
> 
> 		...
> 	}
> 
> 	...
> }
> 
> (BTW, we seriously need to clean this up).

What did you have in mind here?

> And in the dmirror_devmem_fault() callback:
> 
> static vm_fault_t dmirror_devmem_fault(struct vm_fault *vmf)
> {
> 	...
> 
> 	/*
> 	 * Normally, a device would use the page->zone_device_data to point to
> 	 * the mirror but here we use it to hold the page for the simulated
> 	 * device memory and that page holds the pointer to the mirror.
> 	 */
> 	rpage = folio_zone_device_data(page_folio(vmf->page));
> 	dmirror = rpage->zone_device_data;
> 
> 	...
> 
> 	args.pgmap_owner = dmirror->mdevice; <-- oops
> 
> 	...
> }
> 
> So in terms of fixing:
> 
> 1. Fix the test (trivial)
> 
> Use
> 
> 	size = read_pmd_pagesize();
> 
> Instead of:
> 
> 	size = TWOMEG;

Adding Balbir as this would have come in with his hugepage changes.
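
For reference, a rough userspace sketch of what picking up the real PMD size
amounts to. This is not the selftest's read_pmd_pagesize(); the names
parse_size_text(), read_size_file() and expected_pmd_size() are made up for
illustration, and the arithmetic assumes 8-byte page table entries with one
table filling exactly one page:

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a decimal byte count such as "33554432\n"; returns 0 if malformed. */
static unsigned long parse_size_text(const char *text)
{
	char *end;
	unsigned long val;

	errno = 0;
	val = strtoul(text, &end, 10);
	if (errno || end == text)
		return 0;
	return val;
}

/* Read a single numeric value from a sysfs-style file; returns 0 on failure. */
static unsigned long read_size_file(const char *path)
{
	char buf[32];
	unsigned long val = 0;
	FILE *f = fopen(path, "r");

	if (!f)
		return 0;
	if (fgets(buf, sizeof(buf), f))
		val = parse_size_text(buf);
	fclose(f);
	return val;
}

/*
 * Assumption: a last-level table is one page of 8-byte entries, so one PMD
 * entry maps page_size * (page_size / 8) bytes.
 */
static unsigned long expected_pmd_size(unsigned long page_size)
{
	return page_size * (page_size / sizeof(uint64_t));
}
```

Under those assumptions expected_pmd_size(4096) is 2M and
expected_pmd_size(16384) is 32M, which is what I'd expect
/sys/kernel/mm/transparent_hugepage/hpage_pmd_size to report on this config.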

> 2. Have dmirror_fops_release() migrate all the device private pages back to ram
>    before freeing dmirror or something like this

Oh yeah that's bad. We definitely need to do that migration once the file is
closed.

> You'd want to abstract code from dmirror_migrate_to_system() to be shared
> between the two functions I think.
> 
> But I leave that as an exercise for the reader :)

Good thing I can't read :) I can try and put something together but that won't
happen before next week, so I won't complain if someone beats me to it. Thanks
for the detailed analysis and report though!
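
To make the ordering concrete, here is a toy userspace model of the lifetime
rule, not kernel code and not a proposed patch. Every toy_* name is
hypothetical; toy_page.owner merely stands in for rpage->zone_device_data, and
the "migration" is just clearing a flag:

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins only; none of these are kernel types. */
struct toy_mirror {
	int id;
};

struct toy_page {
	struct toy_mirror *owner;	/* plays the role of zone_device_data */
	int in_device;			/* 1 while "migrated" to the fake device */
};

static struct toy_mirror *toy_open(void)
{
	return calloc(1, sizeof(struct toy_mirror));
}

/* Mark every page as device-resident and point it back at its mirror. */
static void toy_migrate_to_device(struct toy_mirror *m,
				  struct toy_page *pages, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		pages[i].owner = m;
		pages[i].in_device = 1;
	}
}

/* Bring back every page still owned by m; returns how many were moved. */
static size_t toy_migrate_back(struct toy_mirror *m,
			       struct toy_page *pages, size_t n)
{
	size_t moved = 0;

	for (size_t i = 0; i < n; i++) {
		if (pages[i].in_device && pages[i].owner == m) {
			pages[i].owner = NULL;
			pages[i].in_device = 0;
			moved++;
		}
	}
	return moved;
}

/* Release: drop all back-pointers first, and only then free the mirror. */
static size_t toy_release(struct toy_mirror *m,
			  struct toy_page *pages, size_t n)
{
	size_t moved = toy_migrate_back(m, pages, n);

	free(m);
	return moved;
}

/* Count pages that are still device-resident. */
static size_t toy_count_device_pages(const struct toy_page *pages, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		count += pages[i].in_device ? 1 : 0;
	return count;
}
```

The point is only the order inside toy_release(): once it returns, no page
refers to the freed mirror, so a later fault has nothing stale to dereference.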

> > leading to the failure of the first hmm_migrate_sys_to_dev(). The test
> > program received a SIGABRT signal and initiated vfs_coredump(). And
> > something in the test_hmm module doesn't play well with the coredump
> > process, which ends up with a panic. I'm not familiar with that.
> >
> > Note that I can also reproduce the panic by aborting the test manually
> > with the following diff (and skipping migrate_anon_huge{,_zero}_err()):
> >
> > diff --git a/tools/testing/selftests/mm/hmm-tests.c
> > b/tools/testing/selftests/mm/hmm-tests.c
> > index e8328c89d855..8d8ea8063a73 100644
> > --- a/tools/testing/selftests/mm/hmm-tests.c
> > +++ b/tools/testing/selftests/mm/hmm-tests.c
> > @@ -1027,6 +1027,8 @@ TEST_F(hmm, migrate)
> >  	ASSERT_EQ(ret, 0);
> >  	ASSERT_EQ(buffer->cpages, npages);
> >
> > +	ASSERT_TRUE(0);
> 
> This makes sense as the same dangling dmirror pointer issue arises.
> 
> > +
> >  	/* Check what the device read. */
> >  	for (i = 0, ptr = buffer->mirror; i < size / sizeof(*ptr); ++i)
> >  		ASSERT_EQ(ptr[i], i);
> >
> > Please have a look!
> 
> Hopefully did so usefully here :)
> 
> >
> > Thanks,
> > Zenghui
> 
> Cheers, Lorenzo
> 

