From: "David Hildenbrand (Arm)" <david@kernel.org>
To: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
	intel-xe@lists.freedesktop.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Lorenzo Stoakes <ljs@kernel.org>,
	"Liam R. Howlett" <liam@infradead.org>,
	Vlastimil Babka <vbabka@kernel.org>,
	Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>, Hugh Dickins <hughd@google.com>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	Brendan Jackman <jackmanb@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>, Zi Yan <ziy@nvidia.com>,
	Christian Koenig <christian.koenig@amd.com>,
	Huang Rui <ray.huang@amd.com>,
	Matthew Auld <matthew.auld@intel.com>,
	Matthew Brost <matthew.brost@intel.com>,
	Maarten Lankhorst <maarten.lankhorst@linux.intel.com>,
	Maxime Ripard <mripard@kernel.org>,
	Thomas Zimmermann <tzimmermann@suse.de>,
	David Airlie <airlied@gmail.com>, Simona Vetter <simona@ffwll.ch>,
	dri-devel@lists.freedesktop.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] mm/shmem: add shmem_insert_folio()
Date: Tue, 12 May 2026 22:03:47 +0200	[thread overview]
Message-ID: <26479389-459b-4cc4-914d-e7d29d5e5cc9@kernel.org> (raw)
In-Reply-To: <b4800086347aa3b27edc42b6536cb3a2b76a1aa8.camel@linux.intel.com>

On 5/12/26 13:31, Thomas Hellström wrote:
> Hi David,
> 
> Thanks for having a look.
> 
> On Tue, 2026-05-12 at 13:07 +0200, David Hildenbrand (Arm) wrote:
>>
>>>  
>>> +/**
>>> + * undo_compound_page() - Reverse the effect of prep_compound_page().
>>> + * @page: The head page of a compound page to demote.
>>> + *
>>> + * Returns the pages to non-compound state as if prep_compound_page()
>>> + * had never been called.  split_page() must NOT have been called on
>>> + * the compound page; tail refcounts must be 0.  The caller must ensure
>>> + * no other users hold references to the compound page.
>>> + */
>>> +void undo_compound_page(struct page *page)
>>> +{
>>> +	unsigned int i, nr = 1U << compound_order(page);
>>> +
>>> +	page[1].flags.f &= ~PAGE_FLAGS_SECOND;
>>> +	for (i = 1; i < nr; i++) {
>>> +		page[i].mapping = NULL;
>>> +		clear_compound_head(&page[i]);
>>> +	}
>>> +	ClearPageHead(page);
>>> +}
>>> +
>>>  static inline void set_buddy_order(struct page *page, unsigned int order)
>>>  {
>>>  	set_page_private(page, order);
>>> diff --git a/mm/shmem.c b/mm/shmem.c
>>> index 3b5dc21b323c..45e80a74f77c 100644
>>> --- a/mm/shmem.c
>>> +++ b/mm/shmem.c
>>> @@ -937,6 +937,111 @@ int shmem_add_to_page_cache(struct folio *folio,
>>>  	return 0;
>>>  }
>>>  
>>> +/**
>>> + * shmem_insert_folio() - Insert an isolated folio into a shmem file.
>>> + * @file: The shmem file created with shmem_file_setup().
>>> + * @folio: The folio to insert. Must be isolated (not on LRU), unlocked,
>>> + *         have exactly one reference (the caller's), have no page-table
>>> + *         mappings, and have folio->mapping == NULL.
>>> + * @order: The allocation order of @folio.  If @order > 0 and @folio is
>>> + *         not already a large (compound) folio, it will be promoted to a
>>> + *         compound folio of this order inside this function.  This
>>> + *         requires the standard post-alloc state: head refcount == 1, tail
>>> + *         refcounts == 0 (i.e. split_page() must NOT have been called).
>>> + *         On failure the promotion is reversed and the folio is returned
>>> + *         to its original non-compound state.
>>> + * @index: Page-cache index at which to insert. Must be aligned to
>>> + *         (1 << @order) and within the file's size.
>>> + * @writeback: If true, attempt immediate writeback to swap after insertion.
>>> + *             Best-effort; failure is silently ignored.
>>> + * @folio_gfp: The GFP flags to use for memory-cgroup charging.
>>> + *
>>> + * The folio is inserted zero-copy into the shmem page cache and placed on
>>> + * the anon LRU, where it participates in normal kernel reclaim (written to
>>> + * swap under memory pressure).  Any previous content at @index is discarded.
>>> + * On success the caller should release their reference with folio_put() and
>>> + * track the (@file, @index) pair for later recovery via shmem_read_folio()
>>> + * and release via shmem_truncate_range().
>>> + *
>>> + * Return: 0 on success.  On failure the folio is returned to its original
>>> + * state and the caller retains ownership.
>>> + */
>>> +int shmem_insert_folio(struct file *file, struct folio *folio, unsigned int order,
>>> +		       pgoff_t index, bool writeback, gfp_t folio_gfp)
>>> +{
>>> +	struct address_space *mapping = file->f_mapping;
>>> +	struct inode *inode = mapping->host;
>>> +	bool promoted;
>>> +	long nr_pages;
>>> +	int ret;
>>> +
>>> +	promoted = order > 0 && !folio_test_large(folio);
>>> +	if (promoted)
>>> +		prep_compound_page(&folio->page, order);
>>> +	nr_pages = folio_nr_pages(folio);
>>> +
>>> +	VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
>>> +	VM_BUG_ON_FOLIO(folio_mapped(folio), folio);
>>> +	VM_BUG_ON_FOLIO(folio_test_swapcache(folio), folio);
>>> +	VM_BUG_ON_FOLIO(folio->mapping, folio);
>>> +	VM_BUG_ON(index != round_down(index, nr_pages));
>>
>> No new VM_BUG_ON_FOLIO etc.
> 
> OK, can eliminate those. Is VM_WARN_ON_FOLIO() preferred,
> or any other type of assert?

VM_WARN_ON_FOLIO() is usually what you want, or VM_WARN_ON_ONCE().
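
For example, the asserts from your hunk would then become something like
(untested, just the straight substitution):

	VM_WARN_ON_FOLIO(folio_test_lru(folio), folio);
	VM_WARN_ON_FOLIO(folio_mapped(folio), folio);
	VM_WARN_ON_FOLIO(folio_test_swapcache(folio), folio);
	VM_WARN_ON_FOLIO(folio->mapping, folio);
	VM_WARN_ON_ONCE(index != round_down(index, nr_pages));

Like the VM_BUG_ON variants, these compile away without CONFIG_DEBUG_VM.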

> 
>>
>> But in general, pushing random allocated pages into shmem and converting
>> them to folios is not something I particularly enjoy seeing.
>>
> 
> OK, let me understand the concern. The pages are allocated as multi-
> page folios using alloc_pages(gfp, order), but typically not promoted
> to compound pages, until inserted here. Is it that promotion that is of
> concern or inserting pages of unknown origin into shmem? Anything we
> can do to alleviate that concern?

It's all rather questionable.

A few points:

a) The pages are allocated to be unmovable, but adding them to shmem effectively
   turns them movable. Now you interfere with the page allocator logic of
   placing movable and unmovable pages in a reasonable way into pageblocks,
   which group allocations of similar types.

b) A driver is not supposed to decide which folio size will be allocated for
   shmem. I am not even sure whether there is any fencing on
   CONFIG_TRANSPARENT_HUGEPAGE somewhere when we end up with large folios.
   order > PMD_ORDER is currently essentially unsupported, and I suspect your
   code would even allow for that (looking at ttm_pool_alloc_find_order); see
   the rough clamp sketch after this list.

   We also have some problems with the pagecache not actually supporting all
   MAX_PAGE_ORDER orders (see MAX_PAGECACHE_ORDER).

   You are completely bypassing the shmem logic that decides on that.

   While these things might not actually cause harm for you today (although I
   suspect some of them might in shmem swapout code), we don't want drivers to
   make our life harder by doing completely unexpected things.

c) You pass folio + order, which is a red flag that you are doing something
   extremely dodgy.

   You cast something that is not a folio, and was never allocated to be a
   folio, to a folio through page_folio(page). That will stop working
   completely in the future once we decouple struct page from struct folio.

   If it's not a folio with a properly set order, you should be passing page +
   order.

d) We are once more open-coding the creation of a folio by hand-crafting it
   ourselves.

   We have folio_alloc() and friends for a reason; that is where we, for
   example, do a page_rmappable_folio().

   I am pretty sure that you are missing a call to page_rmappable_folio(),
   so the large folios never see folio_set_large_rmappable().

e) undo_compound_page(). No words.
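
To make b) concrete, the rough clamp sketch mentioned above (untested;
ttm_backup_clamp_order() is a made-up name, not an existing TTM helper):

#include <linux/pagemap.h>	/* MAX_PAGECACHE_ORDER */
#include <linux/huge_mm.h>	/* HPAGE_PMD_ORDER */
#include <linux/minmax.h>

/* Clamp a driver-chosen order to something the pagecache side can digest. */
static unsigned int ttm_backup_clamp_order(unsigned int order)
{
	/* The pagecache does not support all MAX_PAGE_ORDER orders. */
	order = min_t(unsigned int, order, MAX_PAGECACHE_ORDER);

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
	/* order > PMD_ORDER is essentially unsupported today. */
	order = min_t(unsigned int, order, HPAGE_PMD_ORDER);
#else
	/* Without THP, do not create large folios at all. */
	order = 0;
#endif
	return order;
}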



*maybe* it would be a little less bad if you just allocated a compound page in
your driver and used page_rmappable_folio() there.

That wouldn't change a) or b), though.
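
Something along these lines, i.e. more or less what folio_alloc() does
internally (untested sketch; ttm_backup_alloc_folio() is a made-up name, and
page_rmappable_folio() currently lives in mm/internal.h, so it is not
reachable from driver code as-is):

#include <linux/gfp.h>

/* Allocate a compound page up front and turn it into a proper folio,
 * instead of promoting plain order-0 pages at insert time. */
static struct folio *ttm_backup_alloc_folio(gfp_t gfp, unsigned int order)
{
	struct page *page;

	/* __GFP_COMP gives us a real compound page from the start. */
	page = alloc_pages(gfp | __GFP_COMP, order);
	if (!page)
		return NULL;

	/* Marks large folios rmappable, as folio_alloc() would. */
	return page_rmappable_folio(page);
}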


> 
> Given the problem statement in the cover letter, would there be a
> better direction to take here? We could, for example, bypass shmem and
> insert the folios directly into the swap-cache (although there is an
> issue with the swap-cache when the number of swap entries is close to
> being depleted).

Good question.
We'd have to keep swapoff and all of that working. For example, in
try_to_unuse(), we special-case shmem_unuse() to handle non-anonymous pages.
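
For reference, the rough shape of that special case (paraphrased from
mm/swapfile.c, not the exact code):

/* Swapoff hands shmem-owned swap entries to shmem_unuse() before
 * walking anonymous memory. */
static int try_to_unuse(unsigned int type)
{
	int retval;

	/* Non-anonymous (shmem) pages are special-cased up front. */
	retval = shmem_unuse(type);
	if (retval)
		return retval;

	/* ... then walk every mm and unuse anonymous swap entries,
	 * finally draining whatever is left in the swapcache ... */
	return 0;
}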

But then, the whole swapcache operates on folios ... so I am not sure there is
much to be gained by re-implementing what shmem already does?

-- 
Cheers,

David


Thread overview: 23+ messages
2026-05-12 11:03 [PATCH 0/2] Insert instead of copy pages into shmem when shrinking Thomas Hellström
2026-05-12 11:03 ` [PATCH 1/2] mm/shmem: add shmem_insert_folio() Thomas Hellström
2026-05-12 11:07   ` David Hildenbrand (Arm)
2026-05-12 11:31     ` Thomas Hellström
2026-05-12 20:03       ` David Hildenbrand (Arm) [this message]
2026-05-13  7:47         ` Christian König
2026-05-13  8:31           ` Thomas Hellström
2026-05-13  9:30             ` David Hildenbrand (Arm)
2026-05-13  8:37           ` David Hildenbrand (Arm)
2026-05-13  8:51             ` Thomas Hellström
2026-05-13 10:03               ` David Hildenbrand (Arm)
2026-05-13 10:37                 ` Thomas Hellström
2026-05-13 11:36                   ` David Hildenbrand (Arm)
2026-05-13 14:53                     ` Thomas Hellström
2026-05-13 19:35                       ` David Hildenbrand (Arm)
2026-05-14 10:40                         ` Thomas Hellström
2026-05-13 11:54             ` Christian König
2026-05-13 19:43               ` David Hildenbrand (Arm)
2026-05-12 11:03 ` [PATCH 2/2] drm/ttm: Use ttm_backup_insert_folio() for zero-copy swapout Thomas Hellström
2026-05-12 16:46 ` ✗ CI.checkpatch: warning for Insert instead of copy pages into shmem when shrinking Patchwork
2026-05-12 16:47 ` ✓ CI.KUnit: success " Patchwork
2026-05-12 18:11 ` ✓ Xe.CI.BAT: " Patchwork
2026-05-13  7:30 ` ✗ Xe.CI.FULL: failure " Patchwork
