From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: "Christian König" <christian.koenig@amd.com>,
	"David Hildenbrand (Arm)" <david@kernel.org>,
	intel-xe@lists.freedesktop.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Lorenzo Stoakes <ljs@kernel.org>,
	"Liam R. Howlett" <liam@infradead.org>,
	Vlastimil Babka <vbabka@kernel.org>,
	Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>, Hugh Dickins <hughd@google.com>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	Brendan Jackman <jackmanb@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>, Zi Yan <ziy@nvidia.com>,
	Huang Rui <ray.huang@amd.com>,
	Matthew Auld <matthew.auld@intel.com>,
	Matthew Brost <matthew.brost@intel.com>,
	Maarten Lankhorst <maarten.lankhorst@linux.intel.com>,
	Maxime Ripard <mripard@kernel.org>,
	Thomas Zimmermann <tzimmermann@suse.de>,
	David Airlie <airlied@gmail.com>,
	Simona Vetter <simona@ffwll.ch>,
	dri-devel@lists.freedesktop.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] mm/shmem: add shmem_insert_folio()
Date: Wed, 13 May 2026 10:31:47 +0200
Message-ID: <68705cec43ab43088327b21c9d9318036fb2a255.camel@linux.intel.com>
In-Reply-To: <c6b4031a-5625-4b1b-8a51-6e47241d32c9@amd.com>

Hi, David & Christian,

Thanks for the feedback. I'll respin this to see if I can come up with
something that is more acceptable given the comments.

A couple of questions and comments below.

On Wed, 2026-05-13 at 09:47 +0200, Christian König wrote:
> Hi David & Thomas,
> 
> On 5/12/26 22:03, David Hildenbrand (Arm) wrote:
> > On 5/12/26 13:31, Thomas Hellström wrote:
> ...
> > > > > +int shmem_insert_folio(struct file *file, struct folio *folio, unsigned int order,
> > > > > +		       pgoff_t index, bool writeback, gfp_t folio_gfp)
> > > > > +{
> > > > > +	struct address_space *mapping = file->f_mapping;
> > > > > +	struct inode *inode = mapping->host;
> > > > > +	bool promoted;
> > > > > +	long nr_pages;
> > > > > +	int ret;
> > > > > +
> > > > > +	promoted = order > 0 && !folio_test_large(folio);
> > > > > +	if (promoted)
> > > > > +		prep_compound_page(&folio->page, order);
> > > > > +	nr_pages = folio_nr_pages(folio);
> > > > > +
> > > > > +	VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
> > > > > +	VM_BUG_ON_FOLIO(folio_mapped(folio), folio);
> > > > > +	VM_BUG_ON_FOLIO(folio_test_swapcache(folio), folio);
> > > > > +	VM_BUG_ON_FOLIO(folio->mapping, folio);
> > > > > +	VM_BUG_ON(index != round_down(index, nr_pages));
> > > > 
> > > > No new VM_BUG_ON_FOLIO etc.
> > > 
> > > OK, I can eliminate those. Is VM_WARN_ON_FOLIO() preferred, or any
> > > other type of assert?
> > 
> > VM_WARN_ON_FOLIO() is usually what you want, or VM_WARN_ON_ONCE().
> > 
> > > 
> > > > 
> > > > But in general, pushing randomly allocated pages into shmem and
> > > > converting them to folios is not something I particularly enjoy
> > > > seeing.
> > > > 
> > > 
> > > OK, let me understand the concern. The pages are allocated as
> > > multi-page folios using alloc_pages(gfp, order), but typically not
> > > promoted to compound pages until inserted here. Is it that
> > > promotion that is of concern, or inserting pages of unknown origin
> > > into shmem? Anything we can do to alleviate that concern?
> > 
> > It's all rather questionable.
> > 
> > A couple of points:
> > 
> > a) The pages are allocated to be unmovable, but adding them to shmem
> >    effectively turns them movable. Now you interfere with the page
> >    allocator logic of placing movable and unmovable pages in a
> >    reasonable way into pageblocks that group allocations of similar
> >    types.
> > 
> > b) A driver is not supposed to decide which folio size will be
> >    allocated for shmem.
> 
> Exactly that is one of the major reasons why we aren't using shmem as
> the backing store for TTM buffers in the first place.
> 
> While HW today can usually work with everything down to 4k, it needs
> higher-order pages for optimal performance.
> 
> For AMD GPUs, for example, you need 2M pages, or performance goes down
> by ~30% in quite a number of use cases.
> 
> Everything between 4k and 2M, and above 2M, is still preferred because
> it results in better L0/L1 reach, but if you can't get 2M the L2 reach
> goes down so rapidly that people start to complain immediately.
> 
> And that stuff is very specific to each vendor and HW generation. Some
> have the sweet spot at 64k, some at 256k, most at 2M.
> 
> >    I am not even sure if there is fencing on
> >    CONFIG_TRANSPARENT_HUGEPAGE somewhere when ending up with large
> >    folios. order > PMD_ORDER is currently essentially unsupported,
> >    and I suspect your code would even allow for that (looking at
> >    ttm_pool_alloc_find_order).
> > 
> >    We also have some problems with the pagecache not actually
> >    supporting all MAX_PAGE_ORDER orders (see MAX_PAGECACHE_ORDER).
> > 
> >    You are bypassing shmem logic to decide on that completely.
> > 
> >    While these things might not actually cause harm for you today
> >    (although I suspect some of them might in the shmem swapout
> >    code), we don't want drivers to make our life harder by doing
> >    completely unexpected things.
> 
> Yeah, but that is the requirement the HW has.
> 
> I mean, we can keep torturing the buddy allocator to give us 2M pages,
> but essentially we want to get away from those specialized solutions
> and have more of the functionality necessary to drive the HW in the
> common Linux memory management code, because that prevents vendors
> from re-implementing that stuff in their specific drivers over and
> over again.

For the code at hand, if we insert an order-10 folio, shmem will split
it at writeout time but spit out a warning (if enabled) at the same
time. For this particular use case, I think it might make sense for the
drivers that use direct insertion to cap the page-allocator orders at
THP size (2M).
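
Roughly what I have in mind for the cap (a minimal sketch only; the
helper name and its placement are made up):

	/*
	 * Hypothetical helper, not part of this series: clamp the order
	 * a driver requests from the page allocator so that a later
	 * shmem insertion never exceeds what THP and the pagecache
	 * support.
	 */
	static unsigned int ttm_backup_cap_order(unsigned int order)
	{
		return min3(order, (unsigned int)PMD_ORDER,
			    (unsigned int)MAX_PAGECACHE_ORDER);
	}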

> 
> Regards,
> Christian.
> 
> > c) You pass folio + order, which is just the red flag that you are
> >    doing something extremely dodgy.
> > 
> >    You just cast something that is not a folio, and was not
> >    allocated to be a folio, to a folio through page_folio(page).
> >    That will stop working completely in the future once we decouple
> >    struct page from struct folio.
> > 
> >    If it's not a folio with a properly set order, you should be
> >    passing page + order.
> > 
> > d) We are once more open-coding the creation of a folio by
> >    hand-crafting it ourselves.
> > 
> >    We have folio_alloc() and friends for a reason, where we, for
> >    example, do a page_rmappable_folio().
> > 
> >    I am pretty sure that you are missing a call to
> >    page_rmappable_folio(), resulting in the large folios not getting
> >    folio_set_large_rmappable() set.
> > 
> > e) undo_compound_page(). No words.
> > 
> > *maybe* it would be a little less bad if you would just allocate a
> > compound page in your driver and use page_rmappable_folio() in
> > there.

OK, yes, it sounds like a prerequisite for this is that the driver
actually allocates compound pages. It might be that the TTM comment
about *not* doing that is stale, but I need to check.
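
Roughly the direction that would take (a minimal sketch; gfp and order
come from the driver, error handling omitted):

	struct folio *folio;

	/*
	 * folio_alloc() hands back a properly initialized large folio,
	 * including folio_set_large_rmappable(), so there is no
	 * prep_compound_page() / page_folio() promotion needed at
	 * insertion time.
	 */
	folio = folio_alloc(gfp, order);
	if (!folio)
		return -ENOMEM;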

Would it be acceptable to export a function from core mm to split an
isolated folio?
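
Something like the below prototype is what I am thinking of (purely
hypothetical; no such export exists today, and both the name and the
signature are made up):

	/*
	 * Split a large folio that the caller owns exclusively
	 * (unmapped, not on the LRU, not in the swap cache) into
	 * order-0 folios.
	 */
	int folio_split_isolated(struct folio *folio);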

> > 
> > That wouldn't change a) or b), though.
> > 
> > 
> > > 
> > > Given the problem statement in the cover letter, would there be a
> > > better direction to take here? We could, for example, bypass shmem
> > > and insert the folios directly into the swap cache (although there
> > > is an issue with the swap cache when the number of swap entries is
> > > close to being depleted).
> > 
> > Good question.
> > We'd have to keep swapoff and all of that working. For example, in
> > try_to_unuse(), we special-case shmem_unuse() to handle
> > non-anonymous pages.
> > 
> > But then, the whole swapcache operates on folios ... so I am not
> > sure if there is a lot to be won by re-implementing what shmem
> > already does?
> > 

Still, that would alleviate a) and b), right? At least as long as we
keep folio sizes within the swap-cache limits?

Thanks,
Thomas



