Re: [PATCH 1/2] mm/shmem: add shmem_insert_folio()

Linux-mm Archive on lore.kernel.org
 help / color / mirror / Atom feed

From: "Christian König" <christian.koenig@amd.com>
To: "David Hildenbrand (Arm)" <david@kernel.org>,
	"Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
	intel-xe@lists.freedesktop.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Lorenzo Stoakes <ljs@kernel.org>,
	"Liam R. Howlett" <liam@infradead.org>,
	Vlastimil Babka <vbabka@kernel.org>,
	Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>, Hugh Dickins <hughd@google.com>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	Brendan Jackman <jackmanb@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>, Zi Yan <ziy@nvidia.com>,
	Huang Rui <ray.huang@amd.com>,
	Matthew Auld <matthew.auld@intel.com>,
	Matthew Brost <matthew.brost@intel.com>,
	Maarten Lankhorst <maarten.lankhorst@linux.intel.com>,
	Maxime Ripard <mripard@kernel.org>,
	Thomas Zimmermann <tzimmermann@suse.de>,
	David Airlie <airlied@gmail.com>, Simona Vetter <simona@ffwll.ch>,
	dri-devel@lists.freedesktop.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] mm/shmem: add shmem_insert_folio()
Date: Wed, 13 May 2026 09:47:31 +0200	[thread overview]
Message-ID: <c6b4031a-5625-4b1b-8a51-6e47241d32c9@amd.com> (raw)
In-Reply-To: <26479389-459b-4cc4-914d-e7d29d5e5cc9@kernel.org>

Hi David & Thomas,

On 5/12/26 22:03, David Hildenbrand (Arm) wrote:
> On 5/12/26 13:31, Thomas Hellström wrote:
...
>>>> +int shmem_insert_folio(struct file *file, struct folio *folio,
>>>> unsigned int order,
>>>> +		       pgoff_t index, bool writeback, gfp_t
>>>> folio_gfp)
>>>> +{
>>>> +	struct address_space *mapping = file->f_mapping;
>>>> +	struct inode *inode = mapping->host;
>>>> +	bool promoted;
>>>> +	long nr_pages;
>>>> +	int ret;
>>>> +
>>>> +	promoted = order > 0 && !folio_test_large(folio);
>>>> +	if (promoted)
>>>> +		prep_compound_page(&folio->page, order);
>>>> +	nr_pages = folio_nr_pages(folio);
>>>> +
>>>> +	VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
>>>> +	VM_BUG_ON_FOLIO(folio_mapped(folio), folio);
>>>> +	VM_BUG_ON_FOLIO(folio_test_swapcache(folio), folio);
>>>> +	VM_BUG_ON_FOLIO(folio->mapping, folio);
>>>> +	VM_BUG_ON(index != round_down(index, nr_pages));
>>>
>>> No new VM_BUG_ON_FOLIO etc.
>>
>> OK, can eliminate those. Is VM_WARN_ON_FOLIO() preferred,
>> or any other type of assert?
> 
> VM_WARN_ON_FOLIO() is usually what you want, or VM_WARN_ON_ONCE().
> 
>>
>>>
>>> But in general, pushing in random allocated pages into shmem,
>>> converting them to
>>> folios is not something I particularly enjoy seeing.
>>>
>>
>> OK, let me understand the concern. The pages are allocated as multi-
>> page folios using alloc_pages(gfp, order), but typically not promoted
>> to compound pages, until inserted here. Is it that promotion that is of
>> concern or inserting pages of unknown origin into shmem? Anything we
>> can do to alleviate that concern?
> 
> It's all rather questionable.
> 
> A couple of points:
> 
> a) The pages are allocated to be unmovable, but adding them to shmem effectively
>    turns them movable. Now you interfere with the page allocator logic of
>    placing movable and unmovable pages a reasonable way into
>    pageblocks that group allocations of similar types.
> 
> b) A driver is not supposed to decide which folio size will be allocated for
>    shmem.

Exactly that is one of the major reasons why we aren't using a shmem as backing store for TTM buffers in the first place.

While HW today can usually work with everything down to 4k it needs higher order pages for optimal performance.

So for example for AMD GPUs you need 2M pages or otherwise the performance goes down by ~30% in quite a number of use cases.

Everything between 4k and 2M and above 2M is still preferred because it results in better L0/L1 reach, but if you can't get 2M the L2 reach goes down so rapidly that people start to complain immediately. 

And that stuff is very specific for each vendor and HW generation. Some have the sweet spot at 64k, some at 256k, most at 2M.

>    I am not even sure if there is a fencing on
>    CONFIG_TRANSPARENT_HUGEPAGE somewhere when ending up with large folios. order
>    > PMD_ORDER is currently essentially unsupported, and I suspect your code
>    would  even allow for that (looking at ttm_pool_alloc_find_order).
> 
>    We also have some problems with the pagecache not actually supporting all
>    MAX_PAGE_ORDER orders (see MAX_PAGECACHE_ORDER).
> 
>    You are bypassing shmem logic to decide on that completely.
> 
>    While these things might not actually cause harm for you today (although I
>    suspect some of them might in shmem swapout code), we don't want drivers to
>    make our life harder by doing completely unexpected things.

Yeah but that is the requirement the HW has.

I mean we can keep torturing the buddy allocator to give us 2M pages, but essentially we want to get away from those specialized solutions and has more of the functionality necessary to driver the HW in the common Linux memory management code because that prevents vendors from re-implementing that stuff in their specific driver over and over again.

Regards,
Christian.

> c) You pass folio + order, which is just the red flag that you are doing
>    something extremely dodgy.
> 
>    You just cast something that is not a folio, and was not allocated to be a
>    folio to a folio through page_folio(page). That will stop working completely
>    in the future once we decouple struct page from struct folio.
> 
>    If it's not a folio with a proper set order, you should be passing page +
>    order.
> 
> d) We are once more open-coding creation of a folio, by hand-crafting it
>    ourselves.
> 
>    We have folio_alloc() and friends for a reason. Where we, for example, do a
>    page_rmappable_folio().
> 
>    I am pretty sure that you are missing a call to page_rmappable_folio(),
>    resulting in the large folios not getting folio_set_large_rmappable() set.
> 
> e) undo_compound_page(). No words.
> 
> 
> 
> *maybe* it would be a little less bad if you would just allocate a compound page
> in your driver and use page_rmappable_folio() in there.
> 
> That wouldn't change a) or b), though.
> 
> 
>>
>> Given the problem statement in the cover-letter, would there be a
>> better direction to take here? We could, for example, bypass shmem and
>> insert the folios directly into the swap-cache, (although there is an
>> issue with the swap-cache when the number of swap_entries are close to
>> being depleted).
> 
> Good question.
> We'd have to keep swapoff and all of that working. For example, in
> try_to_unuse(), we special-case shmem_unuse() to handle non-anonymous pages.
> 
> But then, the whole swapcache operates on folios ... so I am not sure if there
> is a lot to be won by re-implementing what shmem already does?
>

next prev parent reply	other threads:[~2026-05-13  7:47 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-12 11:03 [PATCH 0/2] Insert instead of copy pages into shmem when shrinking Thomas Hellström
2026-05-12 11:03 ` [PATCH 1/2] mm/shmem: add shmem_insert_folio() Thomas Hellström
2026-05-12 11:07   ` David Hildenbrand (Arm)
2026-05-12 11:31     ` Thomas Hellström
2026-05-12 20:03       ` David Hildenbrand (Arm)
2026-05-13  7:47         ` Christian König [this message]
2026-05-13  8:31           ` Thomas Hellström
2026-05-13  9:30             ` David Hildenbrand (Arm)
2026-05-13  8:37           ` David Hildenbrand (Arm)
2026-05-13  8:51             ` Thomas Hellström
2026-05-13 10:03               ` David Hildenbrand (Arm)
2026-05-12 11:03 ` [PATCH 2/2] drm/ttm: Use ttm_backup_insert_folio() for zero-copy swapout Thomas Hellström

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c6b4031a-5625-4b1b-8a51-6e47241d32c9@amd.com \
    --to=christian.koenig@amd.com \
    --cc=airlied@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=david@kernel.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=jackmanb@google.com \
    --cc=liam@infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ljs@kernel.org \
    --cc=maarten.lankhorst@linux.intel.com \
    --cc=matthew.auld@intel.com \
    --cc=matthew.brost@intel.com \
    --cc=mhocko@suse.com \
    --cc=mripard@kernel.org \
    --cc=ray.huang@amd.com \
    --cc=rppt@kernel.org \
    --cc=simona@ffwll.ch \
    --cc=surenb@google.com \
    --cc=thomas.hellstrom@linux.intel.com \
    --cc=tzimmermann@suse.de \
    --cc=vbabka@kernel.org \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox