From: Jason Gunthorpe <jgg@nvidia.com>
To: Matthew Brost <matthew.brost@intel.com>
Cc: "Vlastimil Babka" <vbabka@suse.cz>,
"Francois Dugast" <francois.dugast@intel.com>,
intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
"Zi Yan" <ziy@nvidia.com>, "Alistair Popple" <apopple@nvidia.com>,
"adhavan Srinivasan" <maddy@linux.ibm.com>,
"Nicholas Piggin" <npiggin@gmail.com>,
"Michael Ellerman" <mpe@ellerman.id.au>,
"Christophe Leroy (CS GROUP)" <chleroy@kernel.org>,
"Felix Kuehling" <Felix.Kuehling@amd.com>,
"Alex Deucher" <alexander.deucher@amd.com>,
"Christian König" <christian.koenig@amd.com>,
"David Airlie" <airlied@gmail.com>,
"Simona Vetter" <simona@ffwll.ch>,
"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
"Maxime Ripard" <mripard@kernel.org>,
"Thomas Zimmermann" <tzimmermann@suse.de>,
"Lyude Paul" <lyude@redhat.com>,
"Danilo Krummrich" <dakr@kernel.org>,
"David Hildenbrand" <david@kernel.org>,
"Oscar Salvador" <osalvador@suse.de>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Leon Romanovsky" <leon@kernel.org>,
"Lorenzo Stoakes" <lorenzo.stoakes@oracle.com>,
"Liam R . Howlett" <Liam.Howlett@oracle.com>,
"Mike Rapoport" <rppt@kernel.org>,
"Suren Baghdasaryan" <surenb@google.com>,
"Michal Hocko" <mhocko@suse.com>,
"Balbir Singh" <balbirs@nvidia.com>,
linuxppc-dev@lists.ozlabs.org, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, amd-gfx@lists.freedesktop.org,
nouveau@lists.freedesktop.org, linux-mm@kvack.org,
linux-cxl@vger.kernel.org
Subject: Re: [PATCH v6 1/5] mm/zone_device: Reinitialize large zone device private folios
Date: Fri, 16 Jan 2026 20:51:14 -0400 [thread overview]
Message-ID: <20260117005114.GC1134360@nvidia.com> (raw)
In-Reply-To: <aWqgHTZ5hjlRvlKU@lstrano-desk.jf.intel.com>
On Fri, Jan 16, 2026 at 12:31:25PM -0800, Matthew Brost wrote:
> > I suppose we could be getting say an order-9 folio that was previously used
> > as two order-8 folios? And each of them had their _nr_pages in their head
>
> Yes, this is a good example. At this point we have idea what previous
> allocation(s) order(s) were - we could have multiple places in the loop
> where _nr_pages is populated, thus we have to clear this everywhere.
Why? The fact you have to use such a crazy expression to even access
_nr_pages strongly says nothing will read it as _nr_pages.
Explain each thing:
new_page->flags.f &= ~0xffUL; /* Clear possible order, page head */
OK, the tail page flags need to be set right, and prep_compound_page()
called later depends on them being zero.
((struct folio *)(new_page - 1))->_nr_pages = 0;
Can't see a reason, nothing reads _nr_pages from a random tail
page. _nr_pages is the last 8 bytes of struct page so it overlaps
memcg_data, which is also not supposed to be read from a tail page?
new_folio->mapping = NULL;
Pointless, prep_compound_page() -> prep_compound_tail() -> p->mapping = TAIL_MAPPING;
new_folio->pgmap = pgmap; /* Also clear compound head */
Pointless, compound_head is set in prep_compound_tail(): set_compound_head(p, head);
new_folio->share = 0; /* fsdax only, unused for device private */
Not sure, certainly share isn't read from a tail page..
> > > Why can't this use the normal helpers, like memmap_init_compound()?
> > >
> > > struct folio *new_folio = page
> > >
> > > /* First 4 tail pages are part of struct folio */
> > > for (i = 4; i < (1UL << order); i++) {
> > > prep_compound_tail(..)
> > > }
> > >
> > > prep_comound_head(page, order)
> > > new_folio->_nr_pages = 0
> > >
> > > ??
>
> I've beat this to death with Alistair, normal helpers do not work here.
What do you mean? It already calls prep_compound_page()! The issue
seems to be that prep_compound_page() makes assumptions about what
values are in flags already?
So how about move that page flags mask logic into
prep_compound_tail()? I think that would help Vlastimil's
concern. That function is already touching most of the cache line so
an extra word shouldn't make a performance difference.
> An order zero allocation could have _nr_pages set in its page,
> new_folio->_nr_pages is page + 1 memory.
An order zero allocation does not have _nr_pages because it is in page
+1 memory that doesn't exist.
An order zero allocation might have memcg_data in the same slot, does
it need zeroing? If so why not add that to prep_compound_head() ?
Also, prep_compound_head() handles order 0 too:
if (IS_ENABLED(CONFIG_64BIT) || order > 1) {
atomic_set(&folio->_pincount, 0);
atomic_set(&folio->_entire_mapcount, -1);
}
if (order > 1)
INIT_LIST_HEAD(&folio->_deferred_list);
So some of the problem here looks to be not calling it:
if (order)
prep_compound_page(page, order);
So, remove that if ? Also shouldn't it be moved above the
set_page_count/lock_page ?
Jason
WARNING: multiple messages have this Message-ID (diff)
From: Jason Gunthorpe <jgg@nvidia.com>
To: Matthew Brost <matthew.brost@intel.com>
Cc: "Vlastimil Babka" <vbabka@suse.cz>,
"Francois Dugast" <francois.dugast@intel.com>,
intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
"Zi Yan" <ziy@nvidia.com>, "Alistair Popple" <apopple@nvidia.com>,
"adhavan Srinivasan" <maddy@linux.ibm.com>,
"Nicholas Piggin" <npiggin@gmail.com>,
"Michael Ellerman" <mpe@ellerman.id.au>,
"Christophe Leroy (CS GROUP)" <chleroy@kernel.org>,
"Felix Kuehling" <Felix.Kuehling@amd.com>,
"Alex Deucher" <alexander.deucher@amd.com>,
"Christian König" <christian.koenig@amd.com>,
"Simona Vetter" <simona@ffwll.ch>,
"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
"Maxime Ripard" <mripard@kernel.org>,
"Danilo Krummrich" <dakr@kernel.org>,
"David Hildenbrand" <david@kernel.org>,
"Oscar Salvador" <osalvador@suse.de>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Leon Romanovsky" <leon@kernel.org>,
"Lorenzo Stoakes" <lorenzo.stoakes@oracle.com>,
"Liam R . Howlett" <Liam.Howlett@oracle.com>,
"Mike Rapoport" <rppt@kernel.org>,
"Suren Baghdasaryan" <surenb@google.com>,
"Michal Hocko" <mhocko@suse.com>,
linuxppc-dev@lists.ozlabs.org, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, amd-gfx@lists.freedesktop.org,
nouveau@lists.freedesktop.org, linux-mm@kvack.org,
linux-cxl@vger.kernel.org
Subject: Re: [PATCH v6 1/5] mm/zone_device: Reinitialize large zone device private folios
Date: Fri, 16 Jan 2026 20:51:14 -0400 [thread overview]
Message-ID: <20260117005114.GC1134360@nvidia.com> (raw)
In-Reply-To: <aWqgHTZ5hjlRvlKU@lstrano-desk.jf.intel.com>
On Fri, Jan 16, 2026 at 12:31:25PM -0800, Matthew Brost wrote:
> > I suppose we could be getting say an order-9 folio that was previously used
> > as two order-8 folios? And each of them had their _nr_pages in their head
>
> Yes, this is a good example. At this point we have idea what previous
> allocation(s) order(s) were - we could have multiple places in the loop
> where _nr_pages is populated, thus we have to clear this everywhere.
Why? The fact you have to use such a crazy expression to even access
_nr_pages strongly says nothing will read it as _nr_pages.
Explain each thing:
new_page->flags.f &= ~0xffUL; /* Clear possible order, page head */
OK, the tail page flags need to be set right, and prep_compound_page()
called later depends on them being zero.
((struct folio *)(new_page - 1))->_nr_pages = 0;
Can't see a reason, nothing reads _nr_pages from a random tail
page. _nr_pages is the last 8 bytes of struct page so it overlaps
memcg_data, which is also not supposed to be read from a tail page?
new_folio->mapping = NULL;
Pointless, prep_compound_page() -> prep_compound_tail() -> p->mapping = TAIL_MAPPING;
new_folio->pgmap = pgmap; /* Also clear compound head */
Pointless, compound_head is set in prep_compound_tail(): set_compound_head(p, head);
new_folio->share = 0; /* fsdax only, unused for device private */
Not sure, certainly share isn't read from a tail page..
> > > Why can't this use the normal helpers, like memmap_init_compound()?
> > >
> > > struct folio *new_folio = page
> > >
> > > /* First 4 tail pages are part of struct folio */
> > > for (i = 4; i < (1UL << order); i++) {
> > > prep_compound_tail(..)
> > > }
> > >
> > > prep_comound_head(page, order)
> > > new_folio->_nr_pages = 0
> > >
> > > ??
>
> I've beat this to death with Alistair, normal helpers do not work here.
What do you mean? It already calls prep_compound_page()! The issue
seems to be that prep_compound_page() makes assumptions about what
values are in flags already?
So how about move that page flags mask logic into
prep_compound_tail()? I think that would help Vlastimil's
concern. That function is already touching most of the cache line so
an extra word shouldn't make a performance difference.
> An order zero allocation could have _nr_pages set in its page,
> new_folio->_nr_pages is page + 1 memory.
An order zero allocation does not have _nr_pages because it is in page
+1 memory that doesn't exist.
An order zero allocation might have memcg_data in the same slot, does
it need zeroing? If so why not add that to prep_compound_head() ?
Also, prep_compound_head() handles order 0 too:
if (IS_ENABLED(CONFIG_64BIT) || order > 1) {
atomic_set(&folio->_pincount, 0);
atomic_set(&folio->_entire_mapcount, -1);
}
if (order > 1)
INIT_LIST_HEAD(&folio->_deferred_list);
So some of the problem here looks to be not calling it:
if (order)
prep_compound_page(page, order);
So, remove that if ? Also shouldn't it be moved above the
set_page_count/lock_page ?
Jason
next prev parent reply other threads:[~2026-01-17 9:39 UTC|newest]
Thread overview: 86+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-01-16 11:10 [PATCH v6 0/5] Enable THP support in drm_pagemap Francois Dugast
2026-01-16 11:10 ` Francois Dugast
2026-01-16 11:10 ` [PATCH v6 1/5] mm/zone_device: Reinitialize large zone device private folios Francois Dugast
2026-01-16 11:10 ` Francois Dugast
2026-01-16 13:10 ` Balbir Singh
2026-01-16 13:10 ` Balbir Singh
2026-01-16 16:07 ` Vlastimil Babka
2026-01-16 16:07 ` Vlastimil Babka
2026-01-16 17:20 ` Jason Gunthorpe
2026-01-16 17:20 ` Jason Gunthorpe
2026-01-16 17:27 ` Vlastimil Babka
2026-01-16 17:27 ` Vlastimil Babka
2026-01-22 8:02 ` Vlastimil Babka
2026-01-22 8:02 ` Vlastimil Babka
2026-01-16 17:49 ` Jason Gunthorpe
2026-01-16 17:49 ` Jason Gunthorpe
2026-01-16 19:17 ` Vlastimil Babka
2026-01-16 19:17 ` Vlastimil Babka
2026-01-16 20:31 ` Matthew Brost
2026-01-16 20:31 ` Matthew Brost
2026-01-17 0:51 ` Jason Gunthorpe [this message]
2026-01-17 0:51 ` Jason Gunthorpe
2026-01-17 3:55 ` Matthew Brost
2026-01-17 3:55 ` Matthew Brost
2026-01-17 4:42 ` Balbir Singh
2026-01-17 4:42 ` Balbir Singh
2026-01-17 5:27 ` Matthew Brost
2026-01-17 5:27 ` Matthew Brost
2026-01-19 5:59 ` Alistair Popple
2026-01-19 5:59 ` Alistair Popple
2026-01-19 14:20 ` Jason Gunthorpe
2026-01-19 14:20 ` Jason Gunthorpe
2026-01-19 20:09 ` Zi Yan
2026-01-19 20:09 ` Zi Yan
2026-01-19 20:35 ` Jason Gunthorpe
2026-01-19 20:35 ` Jason Gunthorpe
2026-01-19 22:15 ` Balbir Singh
2026-01-19 22:15 ` Balbir Singh
2026-01-20 2:50 ` Zi Yan
2026-01-20 2:50 ` Zi Yan
2026-01-20 13:53 ` Jason Gunthorpe
2026-01-20 13:53 ` Jason Gunthorpe
2026-01-21 3:01 ` Zi Yan
2026-01-21 3:01 ` Zi Yan
2026-01-22 7:19 ` Matthew Brost
2026-01-22 7:19 ` Matthew Brost
2026-01-22 8:00 ` Vlastimil Babka
2026-01-22 8:00 ` Vlastimil Babka
2026-01-22 9:10 ` Balbir Singh
2026-01-22 9:10 ` Balbir Singh
2026-01-22 21:41 ` Andrew Morton
2026-01-22 21:41 ` Andrew Morton
2026-01-22 22:53 ` Alistair Popple
2026-01-22 22:53 ` Alistair Popple
2026-01-23 6:45 ` Vlastimil Babka
2026-01-23 6:45 ` Vlastimil Babka
2026-01-22 14:29 ` Jason Gunthorpe
2026-01-22 14:29 ` Jason Gunthorpe
2026-01-22 15:46 ` Jason Gunthorpe
2026-01-22 15:46 ` Jason Gunthorpe
2026-01-23 2:41 ` Zi Yan
2026-01-23 2:41 ` Zi Yan
2026-01-23 14:19 ` Jason Gunthorpe
2026-01-23 14:19 ` Jason Gunthorpe
2026-01-21 3:51 ` Balbir Singh
2026-01-21 3:51 ` Balbir Singh
2026-01-17 0:19 ` Jason Gunthorpe
2026-01-17 0:19 ` Jason Gunthorpe
2026-01-19 5:41 ` Alistair Popple
2026-01-19 5:41 ` Alistair Popple
2026-01-19 14:24 ` Jason Gunthorpe
2026-01-19 14:24 ` Jason Gunthorpe
2026-01-16 22:34 ` Andrew Morton
2026-01-16 22:34 ` Andrew Morton
2026-01-16 22:36 ` Matthew Brost
2026-01-16 22:36 ` Matthew Brost
2026-01-16 11:10 ` [PATCH v6 2/5] drm/pagemap: Unlock and put folios when possible Francois Dugast
2026-01-16 11:10 ` [PATCH v6 3/5] drm/pagemap: Add helper to access zone_device_data Francois Dugast
2026-01-16 11:10 ` [PATCH v6 4/5] drm/pagemap: Correct cpages calculation for migrate_vma_setup Francois Dugast
2026-01-16 11:37 ` Balbir Singh
2026-01-16 12:02 ` Francois Dugast
2026-01-16 11:10 ` [PATCH v6 5/5] drm/pagemap: Enable THP support for GPU memory migration Francois Dugast
2026-01-16 12:46 ` ✓ CI.KUnit: success for Enable THP support in drm_pagemap (rev7) Patchwork
2026-01-16 13:02 ` ✗ CI.checksparse: warning " Patchwork
2026-01-16 13:24 ` ✓ Xe.CI.BAT: success " Patchwork
2026-01-16 17:13 ` ✗ Xe.CI.Full: failure " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260117005114.GC1134360@nvidia.com \
--to=jgg@nvidia.com \
--cc=Felix.Kuehling@amd.com \
--cc=Liam.Howlett@oracle.com \
--cc=airlied@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=alexander.deucher@amd.com \
--cc=amd-gfx@lists.freedesktop.org \
--cc=apopple@nvidia.com \
--cc=balbirs@nvidia.com \
--cc=chleroy@kernel.org \
--cc=christian.koenig@amd.com \
--cc=dakr@kernel.org \
--cc=david@kernel.org \
--cc=dri-devel@lists.freedesktop.org \
--cc=francois.dugast@intel.com \
--cc=intel-xe@lists.freedesktop.org \
--cc=kvm@vger.kernel.org \
--cc=leon@kernel.org \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=lyude@redhat.com \
--cc=maarten.lankhorst@linux.intel.com \
--cc=maddy@linux.ibm.com \
--cc=matthew.brost@intel.com \
--cc=mhocko@suse.com \
--cc=mpe@ellerman.id.au \
--cc=mripard@kernel.org \
--cc=nouveau@lists.freedesktop.org \
--cc=npiggin@gmail.com \
--cc=osalvador@suse.de \
--cc=rppt@kernel.org \
--cc=simona@ffwll.ch \
--cc=surenb@google.com \
--cc=tzimmermann@suse.de \
--cc=vbabka@suse.cz \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.