From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: "Christian König" <christian.koenig@amd.com>,
"David Hildenbrand (Arm)" <david@kernel.org>,
intel-xe@lists.freedesktop.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
Lorenzo Stoakes <ljs@kernel.org>,
"Liam R. Howlett" <liam@infradead.org>,
Vlastimil Babka <vbabka@kernel.org>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>, Hugh Dickins <hughd@google.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
Brendan Jackman <jackmanb@google.com>,
Johannes Weiner <hannes@cmpxchg.org>, Zi Yan <ziy@nvidia.com>,
Huang Rui <ray.huang@amd.com>,
Matthew Auld <matthew.auld@intel.com>,
Matthew Brost <matthew.brost@intel.com>,
Maarten Lankhorst <maarten.lankhorst@linux.intel.com>,
Maxime Ripard <mripard@kernel.org>,
Thomas Zimmermann <tzimmermann@suse.de>,
David Airlie <airlied@gmail.com>,
Simona Vetter <simona@ffwll.ch>,
dri-devel@lists.freedesktop.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] mm/shmem: add shmem_insert_folio()
Date: Wed, 13 May 2026 10:31:47 +0200 [thread overview]
Message-ID: <68705cec43ab43088327b21c9d9318036fb2a255.camel@linux.intel.com> (raw)
In-Reply-To: <c6b4031a-5625-4b1b-8a51-6e47241d32c9@amd.com>
Hi, David & Christian,
Thanks for the feedback. I'll respin this to see if I can come up with
something that is more acceptable given the comments.
A couple of questions and comments below.
On Wed, 2026-05-13 at 09:47 +0200, Christian König wrote:
> Hi David & Thomas,
>
> On 5/12/26 22:03, David Hildenbrand (Arm) wrote:
> > On 5/12/26 13:31, Thomas Hellström wrote:
> ...
> > > > > +int shmem_insert_folio(struct file *file, struct folio
> > > > > *folio,
> > > > > unsigned int order,
> > > > > + pgoff_t index, bool writeback, gfp_t
> > > > > folio_gfp)
> > > > > +{
> > > > > + struct address_space *mapping = file->f_mapping;
> > > > > + struct inode *inode = mapping->host;
> > > > > + bool promoted;
> > > > > + long nr_pages;
> > > > > + int ret;
> > > > > +
> > > > > + promoted = order > 0 && !folio_test_large(folio);
> > > > > + if (promoted)
> > > > > + prep_compound_page(&folio->page, order);
> > > > > + nr_pages = folio_nr_pages(folio);
> > > > > +
> > > > > + VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
> > > > > + VM_BUG_ON_FOLIO(folio_mapped(folio), folio);
> > > > > + VM_BUG_ON_FOLIO(folio_test_swapcache(folio), folio);
> > > > > + VM_BUG_ON_FOLIO(folio->mapping, folio);
> > > > > + VM_BUG_ON(index != round_down(index, nr_pages));
> > > >
> > > > No new VM_BUG_ON_FOLIO etc.
> > >
> > > OK, can eliminate those. Is VM_WARN_ON_FOLIO() preferred,
> > > or any other type of assert?
> >
> > VM_WARN_ON_FOLIO() is usually what you want, or VM_WARN_ON_ONCE().
> >
> > >
> > > >
> > > > But in general, pushing in random allocated pages into shmem,
> > > > converting them to
> > > > folios is not something I particularly enjoy seeing.
> > > >
> > >
> > > OK, let me understand the concern. The pages are allocated as
> > > multi-
> > > page folios using alloc_pages(gfp, order), but typically not
> > > promoted
> > > to compound pages, until inserted here. Is it that promotion that
> > > is of
> > > concern or inserting pages of unknown origin into shmem? Anything
> > > we
> > > can do to alleviate that concern?
> >
> > It's all rather questionable.
> >
> > A couple of points:
> >
> > a) The pages are allocated to be unmovable, but adding them to
> > shmem effectively
> > turns them movable. Now you interfere with the page allocator
> > logic of
> > placing movable and unmovable pages a reasonable way into
> > pageblocks that group allocations of similar types.
> >
> > b) A driver is not supposed to decide which folio size will be
> > allocated for
> > shmem.
>
> Exactly that is one of the major reasons why we aren't using a shmem
> as backing store for TTM buffers in the first place.
>
> While HW today can usually work with everything down to 4k it needs
> higher order pages for optimal performance.
>
> So for example for AMD GPUs you need 2M pages or otherwise the
> performance goes down by ~30% in quite a number of use cases.
>
> Everything between 4k and 2M and above 2M is still preferred because
> it results in better L0/L1 reach, but if you can't get 2M the L2
> reach goes down so rapidly that people start to complain immediately.
>
> And that stuff is very specific for each vendor and HW generation.
> Some have the sweet spot at 64k, some at 256k, most at 2M.
>
> > I am not even sure if there is a fencing on
> > CONFIG_TRANSPARENT_HUGEPAGE somewhere when ending up with large
> > folios. order
> > > PMD_ORDER is currently essentially unsupported, and I suspect
> > your code
> > would even allow for that (looking at
> > ttm_pool_alloc_find_order).
> >
> > We also have some problems with the pagecache not actually
> > supporting all
> > MAX_PAGE_ORDER orders (see MAX_PAGECACHE_ORDER).
> >
> > You are bypassing shmem logic to decide on that completely.
> >
> > While these things might not actually cause harm for you today
> > (although I
> > suspect some of them might in shmem swapout code), we don't want
> > drivers to
> > make our life harder by doing completely unexpected things.
>
> Yeah but that is the requirement the HW has.
>
> I mean we can keep torturing the buddy allocator to give us 2M pages,
> but essentially we want to get away from those specialized solutions
> and has more of the functionality necessary to driver the HW in the
> common Linux memory management code because that prevents vendors
> from re-implementing that stuff in their specific driver over and
> over again.
For the code at hand, if we insert an order 10 folio shmem will split
it at writeout time but spit out a warning (if enabled) at the same
time. For this particular use-case, I think it might make sense for the
drivers that use direct insertion to cap the page-allocator orders to
THP size (2M).
>
> Regards,
> Christian.
>
> > c) You pass folio + order, which is just the red flag that you are
> > doing
> > something extremely dodgy.
> >
> > You just cast something that is not a folio, and was not
> > allocated to be a
> > folio to a folio through page_folio(page). That will stop
> > working completely
> > in the future once we decouple struct page from struct folio.
> >
> > If it's not a folio with a proper set order, you should be
> > passing page +
> > order.
> >
> > d) We are once more open-coding creation of a folio, by hand-
> > crafting it
> > ourselves.
> >
> > We have folio_alloc() and friends for a reason. Where we, for
> > example, do a
> > page_rmappable_folio().
> >
> > I am pretty sure that you are missing a call to
> > page_rmappable_folio(),
> > resulting in the large folios not getting
> > folio_set_large_rmappable() set.
> >
> > e) undo_compound_page(). No words.
> >
> >
> >
> > *maybe* it would be a little less bad if you would just allocate a
> > compound page
> > in your driver and use page_rmappable_folio() in there.
OK, yes it sounds like a prereq for this is that the driver actually
allocates compound pages. It might be that the TTM comment about *not*
doing that is stale, but need to check.
Would it be acceptable to export a function from core mm to split an
isolated folio?
> >
> > That wouldn't change a) or b), though.
> >
> >
> > >
> > > Given the problem statement in the cover-letter, would there be a
> > > better direction to take here? We could, for example, bypass
> > > shmem and
> > > insert the folios directly into the swap-cache, (although there
> > > is an
> > > issue with the swap-cache when the number of swap_entries are
> > > close to
> > > being depleted).
> >
> > Good question.
> > We'd have to keep swapoff and all of that working. For example, in
> > try_to_unuse(), we special-case shmem_unuse() to handle non-
> > anonymous pages.
> >
> > But then, the whole swapcache operates on folios ... so I am not
> > sure if there
> > is a lot to be won by re-implementing what shmem already does?
> >
Still that would alleviate a) and b), right? At least as long as we
keep folio sizes within the swap cache limits?
Thanks,
Thomas
next prev parent reply other threads:[~2026-05-13 8:31 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-12 11:03 [PATCH 0/2] Insert instead of copy pages into shmem when shrinking Thomas Hellström
2026-05-12 11:03 ` [PATCH 1/2] mm/shmem: add shmem_insert_folio() Thomas Hellström
2026-05-12 11:07 ` David Hildenbrand (Arm)
2026-05-12 11:31 ` Thomas Hellström
2026-05-12 20:03 ` David Hildenbrand (Arm)
2026-05-13 7:47 ` Christian König
2026-05-13 8:31 ` Thomas Hellström [this message]
2026-05-13 9:30 ` David Hildenbrand (Arm)
2026-05-13 8:37 ` David Hildenbrand (Arm)
2026-05-13 8:51 ` Thomas Hellström
2026-05-13 10:03 ` David Hildenbrand (Arm)
2026-05-13 10:37 ` Thomas Hellström
2026-05-13 11:36 ` David Hildenbrand (Arm)
2026-05-13 14:53 ` Thomas Hellström
2026-05-13 19:35 ` David Hildenbrand (Arm)
2026-05-14 10:40 ` Thomas Hellström
2026-05-13 11:54 ` Christian König
2026-05-13 19:43 ` David Hildenbrand (Arm)
2026-05-12 11:03 ` [PATCH 2/2] drm/ttm: Use ttm_backup_insert_folio() for zero-copy swapout Thomas Hellström
2026-05-12 16:46 ` ✗ CI.checkpatch: warning for Insert instead of copy pages into shmem when shrinking Patchwork
2026-05-12 16:47 ` ✓ CI.KUnit: success " Patchwork
2026-05-12 18:11 ` ✓ Xe.CI.BAT: " Patchwork
2026-05-13 7:30 ` ✗ Xe.CI.FULL: failure " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=68705cec43ab43088327b21c9d9318036fb2a255.camel@linux.intel.com \
--to=thomas.hellstrom@linux.intel.com \
--cc=airlied@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=baolin.wang@linux.alibaba.com \
--cc=christian.koenig@amd.com \
--cc=david@kernel.org \
--cc=dri-devel@lists.freedesktop.org \
--cc=hannes@cmpxchg.org \
--cc=hughd@google.com \
--cc=intel-xe@lists.freedesktop.org \
--cc=jackmanb@google.com \
--cc=liam@infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=ljs@kernel.org \
--cc=maarten.lankhorst@linux.intel.com \
--cc=matthew.auld@intel.com \
--cc=matthew.brost@intel.com \
--cc=mhocko@suse.com \
--cc=mripard@kernel.org \
--cc=ray.huang@amd.com \
--cc=rppt@kernel.org \
--cc=simona@ffwll.ch \
--cc=surenb@google.com \
--cc=tzimmermann@suse.de \
--cc=vbabka@kernel.org \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.