From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: Zi Yan <ziy@nvidia.com>, Matthew Wilcox <willy@infradead.org>
Cc: Song Liu <songliubraving@fb.com>, Chris Mason <clm@fb.com>,
David Sterba <dsterba@suse.com>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
Lorenzo Stoakes <ljs@kernel.org>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Nico Pache <npache@redhat.com>,
Ryan Roberts <ryan.roberts@arm.com>, Dev Jain <dev.jain@arm.com>,
Barry Song <baohua@kernel.org>, Lance Yang <lance.yang@linux.dev>,
Vlastimil Babka <vbabka@kernel.org>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>, Shuah Khan <shuah@kernel.org>,
linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
linux-kselftest@vger.kernel.org
Subject: Re: [PATCH 7.2 v2 01/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check
Date: Wed, 15 Apr 2026 14:09:18 +0800
Message-ID: <1024290c-a00a-45db-990e-50bcf7c817ff@linux.alibaba.com>
In-Reply-To: <CD565023-4FA6-44C4-8E40-8B06CB09B59F@nvidia.com>

On 4/14/26 4:34 AM, Zi Yan wrote:
> On 13 Apr 2026, at 16:20, Matthew Wilcox wrote:
>
>> On Mon, Apr 13, 2026 at 03:20:19PM -0400, Zi Yan wrote:
>>> collapse_file() requires FSes supporting large folio with at least
>>> PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that.
>>> MADV_COLLAPSE ignores shmem huge config, so exclude the check for shmem.
>>>
>>> While at it, replace VM_BUG_ON with VM_WARN_ON_ONCE.
>>
>> Why? These are bugs. I don't think we gain anything from continuing.
>
> The goal is to catch these issues during development. VM_BUG_ON crashes
> the system and that is too much for such issues in collapse_file().
>
>>
>>> +	/*
>>> +	 * skip files without PMD-order folio support
>>> +	 * do not check shmem, since MADV_COLLAPSE ignores shmem huge config
>>> +	 */
>>> +	if (!shmem_file(file) && mapping_max_folio_order(mapping) < PMD_ORDER)
>>> +		return SCAN_FAIL;
>>
>> I wonder if it should. If the commit message to 5a90c155defa is
>> to be believed,
>>
>> Since 'deny' is for emergencies and 'force' is for testing, performance
>> issues should not be a problem in real production environments, so don't
>> call mapping_set_large_folios() in __shmem_get_inode() when large folio is
>> disabled with mount huge=never option (default policy).
>>
>> so maybe MADV_COLLAPSE should honour huge=never?
>> Documentation/filesystems/tmpfs.rst implies that we do!
>>
>> huge=never        Do not allocate huge pages.  This is the default.
>> huge=always       Attempt to allocate huge page every time a new page is needed.
>> huge=within_size  Only allocate huge page if it will be fully within i_size.
>>                   Also respect madvise(2) hints.
>> huge=advise       Only allocate huge page if requested with madvise(2).
>>
>> so what's the difference between huge=never and huge=madvise?
>
> I think madvise means MADV_HUGEPAGE for the region, not MADV_COLLAPSE.

Right.

> In v1, I did the check for shmem, but that regressed MADV_COLLAPSE, which
> always can collapse THPs on shmem. I know it sounds unreasonable, but
> that ship has sailed.

Previously, I tried to make MADV_COLLAPSE also honour the THP
configuration of shmem/tmpfs[1], but Hugh strongly objected and
explained the original intent of MADV_COLLAPSE[2]. I’ll quote Hugh’s
comments:

"
Seldom has a feature been so thoroughly documented as MADV_COLLAPSE,
in its 6.1 commits and in the "man 2 madvise" page: which are
explicit about MADV_COLLAPSE providing a way to get THPs where the
sysfs setting governing automatic behaviour does not insert them.

We would all prefer a less messy world of THP tunables. I certainly
find plenty to dislike there too; and wish that a less assertive name
than "never" had been chosen originally for the default off position.

But please don't break the accepted and documented behaviour of
MADV_COLLAPSE now.

If you want to exclude all possibility of THPs, then please use the
prctl(PR_SET_THP_DISABLE); or shmem_enabled=deny (I think it was me
who insisted that be respected by MADV_COLLAPSE back then).
"

Afterwards, we reached an agreement to keep the current logic, and
Lorenzo helped update the docs, see commit a27848a03504 (“docs: update
THP documentation to clarify sysfs ‘never’ setting”).
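
For anyone who does want to rule out THPs for a process entirely, the
prctl() route Hugh mentions could look roughly like the sketch below. The
helper names thp_disable_self() and thp_is_disabled() are hypothetical,
and the fallback constants are the values from the UAPI headers.

```c
#include <sys/prctl.h>

/* Fallbacks for older userspace headers; values from the Linux UAPI. */
#ifndef PR_SET_THP_DISABLE
#define PR_SET_THP_DISABLE 41
#endif
#ifndef PR_GET_THP_DISABLE
#define PR_GET_THP_DISABLE 42
#endif

/*
 * Hypothetical helper: set the "THP disable" flag for the calling
 * process. Returns 0 on success, -1 with errno set on failure.
 */
int thp_disable_self(void)
{
	return prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0);
}

/*
 * Hypothetical helper: query the flag. Returns 1 if THP is disabled
 * for the calling process, 0 if not, -1 with errno set on failure.
 */
int thp_is_disabled(void)
{
	return prctl(PR_GET_THP_DISABLE, 0, 0, 0, 0);
}
```

A process would call thp_disable_self() early, before creating the
mappings it cares about; the setting is inherited across fork() and
preserved across execve().
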

[1] https://lore.kernel.org/all/cover.1750815384.git.baolin.wang@linux.alibaba.com/
[2] https://lore.kernel.org/all/75c02dbf-4189-958d-515e-fa80bb2187fc@google.com/