From: Alistair Popple <apopple@nvidia.com>
To: Zenghui Yu <zenghui.yu@linux.dev>
Cc: "David Hildenbrand (Arm)" <david@kernel.org>,
	 Zenghui Yu <yuzenghui@huawei.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org, jgg@ziepe.ca,
	 leon@kernel.org, Andrew Morton <akpm@linux-foundation.org>,
	ljs@kernel.org,  liam@infradead.org, vbabka@kernel.org,
	rppt@kernel.org, surenb@google.com,  mhocko@suse.com
Subject: Re: "alloc_tag was not set" when running mm/ksft_hmm.sh
Date: Tue, 12 May 2026 11:28:27 +1000
Message-ID: <agJ9762r0HwKSsb7@nvdebian.thelocal>
In-Reply-To: <dff84fe1-ae17-410a-93a2-c5fb921e7f8f@linux.dev>

On 2026-05-12 at 02:38 +1000, Zenghui Yu <zenghui.yu@linux.dev> wrote...
> Hi David,
> 
> On 5/11/26 8:47 PM, David Hildenbrand (Arm) wrote:
> > On 5/11/26 14:19, Zenghui Yu wrote:
> > > On 2026/5/8 19:53, David Hildenbrand (Arm) wrote:
> > > > On 5/6/26 17:42, Zenghui Yu wrote:
> > > > > Hi all,
> > > > >
> > > > > Running mm/ksft_hmm.sh triggers the following splat:
> > > > >
> > > > >  ------------[ cut here ]------------
> > > > >  alloc_tag was not set
> > > > >  WARNING: ./include/linux/alloc_tag.h:164 at ___free_pages+0x2a0/0x2d0, CPU#5: hmm-tests/2020
> > > > >  Modules linked in: test_hmm rfkill drm backlight fuse
> > > > >  CPU: 5 UID: 0 PID: 2020 Comm: hmm-tests Kdump: loaded Not tainted 7.1.0-rc2-00099-gadc1e5c6203c-dirty #285 PREEMPT
> > > > >  Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-stable202408-prebuilt.qemu.org 08/13/2024
> > > > >  pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> > > > >  pc : ___free_pages+0x2a0/0x2d0
> > > > >  lr : ___free_pages+0x2a0/0x2d0
> > > > >  sp : ffff80008345b530
> > > > >  x29: ffff80008345b530 x28: ffff80008345b700 x27: ffffffffbfff8040
> > > > >  x26: ffff0000c41cb360 x25: ffff0000c0c64008 x24: ffff800081aae400
> > > > >  x23: 05ffff0000000200 x22: 0000000000000000 x21: 0000000000000000
> > > > >  x20: fffffdffc5f20040 x19: 0000000000000000 x18: fffffffffffe7c78
> > > > >  x17: 0000000000000000 x16: 0000000000000000 x15: fffffffffffe7c98
> > > > >  x14: 00000000000001d1 x13: ffff8000818f3d58 x12: 0000000000000573
> > > > >  x11: fffffffffffe7c98 x10: ffff80008194bd58 x9 : 3ffffffffffff000
> > > > >  x8 : ffff8000818f3d58 x7 : ffff80008194bd58 x6 : 0000000000000000
> > > > >  x5 : ffff0001fedb1088 x4 : 0000000000000001 x3 : 0000000000000000
> > > > >  x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000c7f58000
> > > > >  Call trace:
> > > > >   ___free_pages+0x2a0/0x2d0 (P)
> > > > >   __free_pages+0x14/0x20
> > > > >   dmirror_devmem_free+0x13c/0x158 [test_hmm]
> > > > >   free_zone_device_folio+0x144/0x1e4
> > > > >   __folio_put+0x124/0x130
> > > > >   free_folio_and_swap_cache+0xa8/0xcc
> > > > >   __folio_split+0x664/0x7fc
> > > > >   split_folio_to_list+0x50/0x5c
> > > > >   migrate_vma_split_folio+0x13c/0x25c
> > > > >   migrate_vma_collect_pmd+0xed4/0xf68
> > > > >   walk_pgd_range+0x598/0x9a0
> > > > >   __walk_page_range+0x90/0x1a0
> > > > >   walk_page_range_mm_unsafe+0x194/0x20c
> > > > >   walk_page_range+0x20/0x2c
> > > > >   migrate_vma_setup+0x18c/0x224
> > > > >   dmirror_devmem_fault+0x188/0x2b8 [test_hmm]
> > > > >   do_swap_page+0x1458/0x185c
> > > > >   __handle_mm_fault+0x85c/0x1ba0
> > > > >   handle_mm_fault+0xb0/0x290
> > > > >   do_page_fault+0x1f8/0x6f8
> > > > >   do_translation_fault+0x60/0x6c
> > > > >   do_mem_abort+0x44/0x94
> > > > >   el0_da+0x30/0xdc
> > > > >   el0t_64_sync_handler+0xd0/0xe4
> > > > >   el0t_64_sync+0x198/0x19c
> > > > >  ---[ end trace 0000000000000000 ]---
> > > > >  lib/test_hmm.c:705 module test_hmm func:dmirror_devmem_alloc_page has 16744448 allocated at module unload
> > > > >
> > > > >
> > > > > It was tested on kernel built with arm64's virt.config and
> > > > >
> > > > > +CONFIG_ZONE_DEVICE=y
> > > > > +CONFIG_DEVICE_PRIVATE=y
> > > > > +CONFIG_TEST_HMM=m
> > > > > +CONFIG_MEM_ALLOC_PROFILING=y
> > > > > +CONFIG_MEM_ALLOC_PROFILING_DEBUG=y
> > > >
> > > > I assume there is a weird interaction between alloc tags and simulated
> > > > ZONE_DEVICE memory in test_hmm.c
> > >
> > > FYI this can be reproduced by running the migrate_partial_unmap_fault
> > > test case.

Thanks. I have reproduced it now that my fingers are skinnier.

> > > TEST_F(hmm, migrate_partial_unmap_fault)
> > > {
> > > 	buffer->mirror = malloc(TWOMEG);
> > > 	buffer->ptr = map;	// points to a THP
> > >
> > > 	/* Initialize buffer in system memory. */
> > > 	for (i = 0, ptr = buffer->ptr; i < TWOMEG / sizeof(*ptr); ++i)
> > > 		ptr[i] = i;
> > >
> > > 	ret = hmm_migrate_sys_to_dev(self->fd, buffer, npages);
> > >
> > > 	munmap(buffer->ptr, ONEMEG);
> > >
> > > 	/* Fault pages back to system memory and check them. */
> > > 	for (i = 0, ptr = buffer->ptr; i < TWOMEG / sizeof(*ptr); ++i)
> > > 		if (i * sizeof(int) < 0 ||
> > > 		    i * sizeof(int) >= ONEMEG)
> > > 			ASSERT_EQ(ptr[i], i);	// triggers a fault ->
> > >
> > >
> > > dmirror_devmem_fault()
> > > 	migrate_vma_setup()
> > > 		migrate_vma_collect_pmd()
> > > 			// !pte_present(pte) && folio_test_large(folio)
> > > 			migrate_vma_split_folio()
> > > 				split_folio()
> > > 					[...]
> > >
> > > __folio_split() {
> > > 	unmap_folio();
> > >
> > > 	__folio_freeze_and_split_unmapped() {
> > > 		__split_unmapped_folio();
> > >
> > > 		for (...) {
> > > 			zone_device_private_split_cb(.., new_folio);
> > > 			// -> dmirror_devmem_folio_split() which doesn't
> > > 			// set alloc tag for the backing system memory
> > > 			// page being split, i.e., rpage_tail
> > > 		}
> > >
> > > 		zone_device_private_split_cb(.., NULL);
> > > 	}
> > >
> > > 	remap_page();
> > >
> > > 	for (...)
> > > 		free_folio_and_swap_cache(new_folio);
> > > 		// -> dmirror_devmem_free()/__free_page() which warns if
> > > 		// the page being freed doesn't have alloc tag set, in
> > > 		// alloc_tag_sub_check().
> > > }
> > >
> > > The WARN disappears with the following diff. But I'm not sure if I've
> > > missed more important points (which is likely to happen ;-) ).
> > >
> > > diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> > > index ed1bdcf1f8ab..eefa2a739917 100644
> > > --- a/lib/alloc_tag.c
> > > +++ b/lib/alloc_tag.c
> > > @@ -191,6 +191,7 @@ void pgalloc_tag_split(struct folio *folio, int old_order, int new_order)
> > >  		}
> > >  	}
> > >  }
> > > +EXPORT_SYMBOL(pgalloc_tag_split);
> > >
> > >  void pgalloc_tag_swap(struct folio *new, struct folio *old)
> > >  {
> > > diff --git a/lib/test_hmm.c b/lib/test_hmm.c
> > > index 213504915737..3bec51828916 100644
> > > --- a/lib/test_hmm.c
> > > +++ b/lib/test_hmm.c
> > > @@ -1713,6 +1713,7 @@ static void dmirror_devmem_folio_split(struct folio *head, struct folio *tail)
> > >  	rfolio = page_folio(rpage);
> > >
> > >  	if (tail == NULL) {
> > > +		pgalloc_tag_split(rfolio, folio_order(rfolio), 0);
> > >  		folio_reset_order(rfolio);
> > >  		rfolio->mapping = NULL;
> > >  		folio_set_count(rfolio, 1);
> > 
> > 
> > zone_device_private_split_cb(), that ends up calling ->folio_split().
> > 
> > We do have a call to pgalloc_tag_split() in __split_unmapped_folio(), invoked in
> > __folio_freeze_and_split_unmapped() before calling
> > zone_device_private_split_cb() when iterating the folios.
> 
> If I read the code correctly, pgalloc_tag_split() in
> __split_unmapped_folio() deals with device private pages' alloc tag. But
> what alloc_tag_sub_check() warns on are real system memory pages (device
> page's backing page), which are allocated by
> dmirror_devmem_alloc_page()/folio_page().
> 
> static void dmirror_devmem_folio_split(struct folio *head, struct folio
> *tail)
> {
> 	struct page *rpage = BACKING_PAGE(folio_page(head, 0));
> 
> Thanks,
> Zenghui
> 
> > The zone_device_private_split_cb(folio, NULL); is then called on the first folio
> > after looping over the other (new) folios.
> >
> > I would assume that __folio_freeze_and_split_unmapped() would already do the
> > right thing?

Well you know what they say about assumptions :) Although in this case
__folio_freeze_and_split_unmapped() isn't called on the backing page anyway
(it's called to split the ZONE_DEVICE page, not the page simulating device
memory). The problem is we're not splitting the tag associated with the backing
page for the simulated memory.
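
To make the failure mode concrete, here is a toy userspace model of it. This is
only a sketch, not kernel code: toy_tag, toy_page and all of the toy_*()
helpers are made-up names, and toy_tag_split() only approximates what I
understand pgalloc_tag_split() to do (pointing the head page of each new,
smaller chunk at the original tag).

```c
/*
 * Toy userspace model of per-page allocation tags. This is NOT kernel
 * code: toy_tag, toy_page and every toy_*() function are hypothetical
 * names made up for illustration.
 */
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE	4096L
#define TOY_ORDER	3			/* one order-3 allocation: 8 pages */
#define TOY_NPAGES	(1UL << TOY_ORDER)

struct toy_tag {
	const char *site;	/* allocation site the bytes are charged to */
	long bytes;		/* bytes currently charged */
};

struct toy_page {
	struct toy_tag *tag;	/* NULL models "alloc_tag was not set" */
};

static struct toy_page toy_pages[TOY_NPAGES];

/* Charge a 2^order allocation to @tag; only the head page records the tag. */
static void toy_alloc(size_t head, unsigned int order, struct toy_tag *tag)
{
	toy_pages[head].tag = tag;
	tag->bytes += (long)(1UL << order) * TOY_PAGE_SIZE;
}

/* Point the head page of every new, smaller chunk at the original tag. */
static void toy_tag_split(size_t head, unsigned int old_order,
			  unsigned int new_order)
{
	size_t step = 1UL << new_order;

	for (size_t i = step; i < (1UL << old_order); i += step)
		toy_pages[head + i].tag = toy_pages[head].tag;
}

/* Free one order-0 page; returns false where the kernel would WARN. */
static bool toy_free(size_t idx)
{
	struct toy_tag *tag = toy_pages[idx].tag;

	if (!tag)
		return false;
	tag->bytes -= TOY_PAGE_SIZE;
	toy_pages[idx].tag = NULL;
	return true;
}

/*
 * Allocate one order-3 chunk, optionally split its tag, then free the
 * pages one at a time. Returns the number of frees that found no tag and
 * stores the bytes still charged to the tag in *leaked.
 */
static int toy_demo(bool split, long *leaked)
{
	struct toy_tag tag = { "toy_alloc_site", 0 };
	int warnings = 0;

	memset(toy_pages, 0, sizeof(toy_pages));
	toy_alloc(0, TOY_ORDER, &tag);
	if (split)
		toy_tag_split(0, TOY_ORDER, 0);

	for (size_t i = 0; i < TOY_NPAGES; i++)
		if (!toy_free(i))
			warnings++;

	*leaked = tag.bytes;
	return warnings;
}
```

In this model toy_demo(false, &leaked) returns 7 with 28672 bytes still
charged, which mirrors both the WARN and the "allocated at module unload" line
in the report, while toy_demo(true, &leaked) returns 0 with nothing left
charged.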

I came up with the fix below last night, but I suspect it will quite reasonably
get NACKed because of the symbol export, so I have been looking at other
solutions.

The simulated memory should just be used like a bare physical address range, so
there really is no reason for the backing page simulating device memory to be
allocated as a higher-order folio. Using the struct page to store some metadata
for the simulated device is convenient, though, as it avoids creating a
test-specific data structure. So I am looking at going back to always
allocating the simulated backing memory as order-0 pages in the test, which is
what it did prior to the introduction of large device pages. That change
currently causes a crash I have yet to debug.

 - Alistair

---

diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index ed1bdcf1f8ab..8828cfcbab43 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -191,6 +191,7 @@ void pgalloc_tag_split(struct folio *folio, int old_order, int new_order)
                }
        }
 }
+EXPORT_SYMBOL_GPL(pgalloc_tag_split);
 
 void pgalloc_tag_swap(struct folio *new, struct folio *old)
 {
diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index 213504915737..977f080de6f3 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -29,6 +29,7 @@
 #include <linux/rmap.h>
 #include <linux/mmu_notifier.h>
 #include <linux/migrate.h>
+#include <linux/pgalloc_tag.h>
 
 #include "test_hmm_uapi.h"
 
@@ -1713,6 +1714,7 @@ static void dmirror_devmem_folio_split(struct folio *head, struct folio *tail)
        rfolio = page_folio(rpage);
 
        if (tail == NULL) {
+               pgalloc_tag_split(rfolio, folio_order(rfolio), 0);
                folio_reset_order(rfolio);
                rfolio->mapping = NULL;
                folio_set_count(rfolio, 1);

> > Maybe the issue is the hard-coded folio_reset_order() in
> > dmirror_devmem_folio_split(), where we seem to assume that we split to an
> > order-0 folio?



