Linux-mm Archive on lore.kernel.org
From: Alistair Popple <apopple@nvidia.com>
To: Zenghui Yu <zenghui.yu@linux.dev>
Cc: "David Hildenbrand (Arm)" <david@kernel.org>,
	 Zenghui Yu <yuzenghui@huawei.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org, jgg@ziepe.ca,
	 leon@kernel.org, Andrew Morton <akpm@linux-foundation.org>,
	ljs@kernel.org,  liam@infradead.org, vbabka@kernel.org,
	rppt@kernel.org, surenb@google.com,  mhocko@suse.com
Subject: Re: "alloc_tag was not set" when running mm/ksft_hmm.sh
Date: Tue, 12 May 2026 11:28:27 +1000
Message-ID: <agJ9762r0HwKSsb7@nvdebian.thelocal>
In-Reply-To: <dff84fe1-ae17-410a-93a2-c5fb921e7f8f@linux.dev>

On 2026-05-12 at 02:38 +1000, Zenghui Yu <zenghui.yu@linux.dev> wrote...
> Hi David,
> 
> On 5/11/26 8:47 PM, David Hildenbrand (Arm) wrote:
> > On 5/11/26 14:19, Zenghui Yu wrote:
> > > On 2026/5/8 19:53, David Hildenbrand (Arm) wrote:
> > > > On 5/6/26 17:42, Zenghui Yu wrote:
> > > > > Hi all,
> > > > >
> > > > > Running mm/ksft_hmm.sh triggers the following splat:
> > > > >
> > > > >  ------------[ cut here ]------------
> > > > >  alloc_tag was not set
> > > > >  WARNING: ./include/linux/alloc_tag.h:164 at ___free_pages+0x2a0/0x2d0, CPU#5: hmm-tests/2020
> > > > >  Modules linked in: test_hmm rfkill drm backlight fuse
> > > > >  CPU: 5 UID: 0 PID: 2020 Comm: hmm-tests Kdump: loaded Not tainted 7.1.0-rc2-00099-gadc1e5c6203c-dirty #285 PREEMPT
> > > > >  Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-stable202408-prebuilt.qemu.org 08/13/2024
> > > > >  pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> > > > >  pc : ___free_pages+0x2a0/0x2d0
> > > > >  lr : ___free_pages+0x2a0/0x2d0
> > > > >  sp : ffff80008345b530
> > > > >  x29: ffff80008345b530 x28: ffff80008345b700 x27: ffffffffbfff8040
> > > > >  x26: ffff0000c41cb360 x25: ffff0000c0c64008 x24: ffff800081aae400
> > > > >  x23: 05ffff0000000200 x22: 0000000000000000 x21: 0000000000000000
> > > > >  x20: fffffdffc5f20040 x19: 0000000000000000 x18: fffffffffffe7c78
> > > > >  x17: 0000000000000000 x16: 0000000000000000 x15: fffffffffffe7c98
> > > > >  x14: 00000000000001d1 x13: ffff8000818f3d58 x12: 0000000000000573
> > > > >  x11: fffffffffffe7c98 x10: ffff80008194bd58 x9 : 3ffffffffffff000
> > > > >  x8 : ffff8000818f3d58 x7 : ffff80008194bd58 x6 : 0000000000000000
> > > > >  x5 : ffff0001fedb1088 x4 : 0000000000000001 x3 : 0000000000000000
> > > > >  x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000c7f58000
> > > > >  Call trace:
> > > > >   ___free_pages+0x2a0/0x2d0 (P)
> > > > >   __free_pages+0x14/0x20
> > > > >   dmirror_devmem_free+0x13c/0x158 [test_hmm]
> > > > >   free_zone_device_folio+0x144/0x1e4
> > > > >   __folio_put+0x124/0x130
> > > > >   free_folio_and_swap_cache+0xa8/0xcc
> > > > >   __folio_split+0x664/0x7fc
> > > > >   split_folio_to_list+0x50/0x5c
> > > > >   migrate_vma_split_folio+0x13c/0x25c
> > > > >   migrate_vma_collect_pmd+0xed4/0xf68
> > > > >   walk_pgd_range+0x598/0x9a0
> > > > >   __walk_page_range+0x90/0x1a0
> > > > >   walk_page_range_mm_unsafe+0x194/0x20c
> > > > >   walk_page_range+0x20/0x2c
> > > > >   migrate_vma_setup+0x18c/0x224
> > > > >   dmirror_devmem_fault+0x188/0x2b8 [test_hmm]
> > > > >   do_swap_page+0x1458/0x185c
> > > > >   __handle_mm_fault+0x85c/0x1ba0
> > > > >   handle_mm_fault+0xb0/0x290
> > > > >   do_page_fault+0x1f8/0x6f8
> > > > >   do_translation_fault+0x60/0x6c
> > > > >   do_mem_abort+0x44/0x94
> > > > >   el0_da+0x30/0xdc
> > > > >   el0t_64_sync_handler+0xd0/0xe4
> > > > >   el0t_64_sync+0x198/0x19c
> > > > >  ---[ end trace 0000000000000000 ]---
> > > > >  lib/test_hmm.c:705 module test_hmm func:dmirror_devmem_alloc_page has 16744448 allocated at module unload
> > > > >
> > > > >
> > > > > It was tested on kernel built with arm64's virt.config and
> > > > >
> > > > > +CONFIG_ZONE_DEVICE=y
> > > > > +CONFIG_DEVICE_PRIVATE=y
> > > > > +CONFIG_TEST_HMM=m
> > > > > +CONFIG_MEM_ALLOC_PROFILING=y
> > > > > +CONFIG_MEM_ALLOC_PROFILING_DEBUG=y
> > > >
> > > > I assume there is a weird interaction between alloc tags and simulated
> > > > ZONE_DEVICE memory in test_hmm.c
> > >
> > > FYI this can be reproduced by running the migrate_partial_unmap_fault
> > > test case.

Thanks. I have reproduced it now that my fingers are skinnier.

> > > TEST_F(hmm, migrate_partial_unmap_fault)
> > > {
> > > 	buffer->mirror = malloc(TWOMEG);
> > > 	buffer->ptr = map;	// points to a THP
> > >
> > > 	/* Initialize buffer in system memory. */
> > > 	for (i = 0, ptr = buffer->ptr; i < TWOMEG / sizeof(*ptr); ++i)
> > > 		ptr[i] = i;
> > >
> > > 	ret = hmm_migrate_sys_to_dev(self->fd, buffer, npages);
> > >
> > > 	munmap(buffer->ptr, ONEMEG);
> > >
> > > 	/* Fault pages back to system memory and check them. */
> > > 	for (i = 0, ptr = buffer->ptr; i < TWOMEG / sizeof(*ptr); ++i)
> > > 		if (i * sizeof(int) < 0 ||
> > > 		    i * sizeof(int) >= ONEMEG)
> > > 			ASSERT_EQ(ptr[i], i);	// triggers a fault ->
> > >
> > >
> > > dmirror_devmem_fault()
> > > 	migrate_vma_setup()
> > > 		migrate_vma_collect_pmd()
> > > 			// !pte_present(pte) && folio_test_large(folio)
> > > 			migrate_vma_split_folio()
> > > 				split_folio()
> > > 					[...]
> > >
> > > __folio_split() {
> > > 	unmap_folio();
> > >
> > > 	__folio_freeze_and_split_unmapped() {
> > > 		__split_unmapped_folio();
> > >
> > > 		for (...) {
> > > 			zone_device_private_split_cb(.., new_folio);
> > > 			// -> dmirror_devmem_folio_split() which doesn't
> > > 			// set alloc tag for the backing system memory
> > > 			// page being split, i.e., rpage_tail
> > > 		}
> > >
> > > 		zone_device_private_split_cb(.., NULL);
> > > 	}
> > >
> > > 	remap_page();
> > >
> > > 	for (...)
> > > 		free_folio_and_swap_cache(new_folio);
> > > 		// -> dmirror_devmem_free()/__free_page() which warns if
> > > 		// the page being freed doesn't have alloc tag set, in
> > > 		// alloc_tag_sub_check().
> > > }
> > >
> > > The WARN disappears with the following diff. But I'm not sure if I've
> > > missed more important points (which is likely to happen ;-) ).
> > >
> > > diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> > > index ed1bdcf1f8ab..eefa2a739917 100644
> > > --- a/lib/alloc_tag.c
> > > +++ b/lib/alloc_tag.c
> > > @@ -191,6 +191,7 @@ void pgalloc_tag_split(struct folio *folio, int old_order, int new_order)
> > >  		}
> > >  	}
> > >  }
> > > +EXPORT_SYMBOL(pgalloc_tag_split);
> > >
> > >  void pgalloc_tag_swap(struct folio *new, struct folio *old)
> > >  {
> > > diff --git a/lib/test_hmm.c b/lib/test_hmm.c
> > > index 213504915737..3bec51828916 100644
> > > --- a/lib/test_hmm.c
> > > +++ b/lib/test_hmm.c
> > > @@ -1713,6 +1713,7 @@ static void dmirror_devmem_folio_split(struct folio *head, struct folio *tail)
> > >  	rfolio = page_folio(rpage);
> > >
> > >  	if (tail == NULL) {
> > > +		pgalloc_tag_split(rfolio, folio_order(rfolio), 0);
> > >  		folio_reset_order(rfolio);
> > >  		rfolio->mapping = NULL;
> > >  		folio_set_count(rfolio, 1);
> > 
> > 
> > zone_device_private_split_cb(), that ends up calling ->folio_split().
> > 
> > We do have a call to pgalloc_tag_split() in __split_unmapped_folio(), invoked in
> > __folio_freeze_and_split_unmapped() before calling
> > zone_device_private_split_cb() when iterating the folios.
> 
> If I read the code correctly, pgalloc_tag_split() in
> __split_unmapped_folio() deals with device private pages' alloc tag. But
> what alloc_tag_sub_check() warns on are real system memory pages (device
> page's backing page), which are allocated by
> dmirror_devmem_alloc_page()/folio_page().
> 
> static void dmirror_devmem_folio_split(struct folio *head, struct folio *tail)
> {
> 	struct page *rpage = BACKING_PAGE(folio_page(head, 0));
> 
> Thanks,
> Zenghui
> 
> > The zone_device_private_split_cb(folio, NULL); is then called on the first folio
> > after looping over the other (new) folios.
> >
> > I would assume that __folio_freeze_and_split_unmapped() would already do the
> > right thing?

Well, you know what they say about assumptions :) In this case, though,
__folio_freeze_and_split_unmapped() isn't called on the backing page at all
(it's called to split the ZONE_DEVICE page, not the page simulating device
memory). The problem is that we're not splitting the tag associated with the
backing page for the simulated memory.
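
To make that concrete, here is a toy userspace model, not kernel code. It
assumes, as a simplification, that a high-order allocation records its tag on
the first page only and that a tag split copies that tag to every new head
page. All of the toy_* names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ORDER 9			/* 512 pages, i.e. a 2M THP with 4K pages */
#define TOY_MAX_PAGES (1u << TOY_MAX_ORDER)

/* Hypothetical stand-in for a per-page allocation tag reference. */
struct toy_page {
	const char *tag;		/* NULL models "alloc_tag was not set" */
};

static struct toy_page toy_pages[TOY_MAX_PAGES];

/* Model a high-order allocation: only the first page records the tag. */
static void toy_alloc(const char *tag)
{
	for (size_t i = 0; i < TOY_MAX_PAGES; i++)
		toy_pages[i].tag = NULL;
	toy_pages[0].tag = tag;
}

/* Model a tag split: copy the head's tag to each new head page. */
static void toy_tag_split(unsigned int old_order, unsigned int new_order)
{
	size_t step = (size_t)1 << new_order;
	size_t total = (size_t)1 << old_order;

	for (size_t i = step; i < total; i += step)
		toy_pages[i].tag = toy_pages[0].tag;
}

/* Free every page as order-0 and count the frees that found no tag. */
static size_t toy_free_order0(unsigned int order)
{
	size_t untagged = 0;

	for (size_t i = 0; i < ((size_t)1 << order); i++) {
		if (!toy_pages[i].tag)
			untagged++;
		toy_pages[i].tag = NULL;
	}
	return untagged;
}

/* Allocate, optionally split the tag, then free page by page. */
static size_t toy_untagged_frees(unsigned int order, int split_tag)
{
	assert(order <= TOY_MAX_ORDER);
	toy_alloc("dmirror_devmem_alloc_page");
	if (split_tag)
		toy_tag_split(order, 0);
	return toy_free_order0(order);
}
```

In this model toy_untagged_frees(9, 0) counts 511 untagged frees (every page
but the first), while toy_untagged_frees(9, 1) counts none. The real kernel
warning is only a symptom of the same missing step on the backing folio.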

I came up with the fix below last night, but I suspect it will quite reasonably
get NACKed on the basis of the symbol export, so I have been looking at other
solutions.

The simulated memory should just be used like a bare physical address range, so
there really is no reason for the backing page simulating device memory to be
allocated as a higher-order folio. Using the struct page to store some metadata
for the simulated device is convenient though, as it avoids creating a
test-specific data structure. So I am looking at going back to always
allocating the simulated backing memory as order-0 pages in the test, which is
what it did prior to the introduction of large device pages. My first attempt
at that caused a crash I have yet to debug.
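
As a rough illustration of why order-0 backing sidesteps the problem, here is
another toy userspace sketch. It is not test_hmm code: struct toy_backing and
both helpers are hypothetical, and malloc() merely stands in for a page
allocation. Each backing page is allocated on its own, so each carries its own
tag from the start and nothing ever needs splitting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_PAGE_SIZE 4096u

/* Hypothetical descriptor for one order-0 backing page. */
struct toy_backing {
	void *mem;		/* the page-sized buffer itself */
	const char *tag;	/* recorded at allocation time, per page */
};

/* Allocate nr independent backing pages; returns 0 on success, -1 on failure. */
static int toy_alloc_backing(struct toy_backing *b, size_t nr, const char *tag)
{
	for (size_t i = 0; i < nr; i++) {
		b[i].mem = malloc(TOY_PAGE_SIZE);
		if (!b[i].mem) {
			/* Unwind the pages allocated so far. */
			while (i--) {
				free(b[i].mem);
				b[i].mem = NULL;
				b[i].tag = NULL;
			}
			return -1;
		}
		b[i].tag = tag;
	}
	return 0;
}

/* Free every backing page and count the ones that had no tag recorded. */
static size_t toy_free_backing(struct toy_backing *b, size_t nr)
{
	size_t untagged = 0;

	for (size_t i = 0; i < nr; i++) {
		if (!b[i].tag)
			untagged++;
		free(b[i].mem);
		b[i].mem = NULL;
		b[i].tag = NULL;
	}
	return untagged;
}
```

Allocating 512 pages with toy_alloc_backing() and releasing them with
toy_free_backing() reports no untagged frees, with no split step in between.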

 - Alistair

---

diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index ed1bdcf1f8ab..8828cfcbab43 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -191,6 +191,7 @@ void pgalloc_tag_split(struct folio *folio, int old_order, int new_order)
                }
        }
 }
+EXPORT_SYMBOL_GPL(pgalloc_tag_split);
 
 void pgalloc_tag_swap(struct folio *new, struct folio *old)
 {
diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index 213504915737..977f080de6f3 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -29,6 +29,7 @@
 #include <linux/rmap.h>
 #include <linux/mmu_notifier.h>
 #include <linux/migrate.h>
+#include <linux/pgalloc_tag.h>
 
 #include "test_hmm_uapi.h"
 
@@ -1713,6 +1714,7 @@ static void dmirror_devmem_folio_split(struct folio *head, struct folio *tail)
        rfolio = page_folio(rpage);
 
        if (tail == NULL) {
+               pgalloc_tag_split(rfolio, folio_order(rfolio), 0);
                folio_reset_order(rfolio);
                rfolio->mapping = NULL;
                folio_set_count(rfolio, 1);

> > Maybe the issue is the hard-coded folio_reset_order() in
> > dmirror_devmem_folio_split(), where we seem to assume that we split to an
> > order-0 folio?




Thread overview: 12+ messages
2026-05-06 15:42 "alloc_tag was not set" when running mm/ksft_hmm.sh Zenghui Yu
2026-05-08 11:53 ` David Hildenbrand (Arm)
2026-05-08 16:35   ` Alistair Popple
2026-05-11 12:19   ` Zenghui Yu
2026-05-11 12:47     ` David Hildenbrand (Arm)
2026-05-11 16:38       ` Zenghui Yu
2026-05-12  1:05         ` Zenghui Yu
2026-05-12  6:40           ` Alistair Popple
2026-05-12  1:28         ` Alistair Popple [this message]
2026-05-12  6:47           ` David Hildenbrand (Arm)
2026-05-12  7:46             ` Alistair Popple
2026-05-12  7:51               ` David Hildenbrand (Arm)
