The Linux Kernel Mailing List
From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Zi Yan <ziy@nvidia.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Song Liu <songliubraving@fb.com>
Cc: Chris Mason <clm@fb.com>, David Sterba <dsterba@suse.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
	Lorenzo Stoakes <ljs@kernel.org>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Nico Pache <npache@redhat.com>,
	Ryan Roberts <ryan.roberts@arm.com>, Dev Jain <dev.jain@arm.com>,
	Barry Song <baohua@kernel.org>, Lance Yang <lance.yang@linux.dev>,
	Vlastimil Babka <vbabka@kernel.org>,
	Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>, Shuah Khan <shuah@kernel.org>,
	linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kselftest@vger.kernel.org
Subject: Re: [PATCH v5 13/14] mm/khugepaged: enable clean pagecache folio collapse for writable files
Date: Fri, 8 May 2026 22:13:50 +0200
Message-ID: <b8a3c3eb-f241-40fe-9121-4ae5a1097807@kernel.org>
In-Reply-To: <20260429153538.727855-9-ziy@nvidia.com>

On 4/29/26 17:35, Zi Yan wrote:
> collapse_file() is capable of collapsing pagecache folios from writable
> files into PMD folios. Now enable collapse of clean pagecache folios from
> writable files, in addition to read-only pagecache folio collapse, by
> removing the inode_is_open_for_write() check from file_thp_enabled() and
> only performing filemap_flush() if the file is read-only.
> 
> This means userspace needs to explicitly flush dirty pagecache folios
> before khugepaged can collapse them, or use madvise(MADV_COLLAPSE),
> which performs the flush on retry. The reason is that blindly enabling
> collapse of dirty pagecache folios from writable files would make
> khugepaged flush these folios all the time, and system-wide pagecache
> flushes triggered by khugepaged are undesirable.
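
Here is a minimal userspace sketch of the two options that the paragraph above leaves to userspace. It assumes Linux with THP enabled for the file. The helper names `flush_for_collapse()` and `request_collapse()` are hypothetical. fsync() is just one way to write back the dirty folios, and the MADV_COLLAPSE fallback value is taken from the asm-generic UAPI header for older libc headers that lack it.

```c
#define _GNU_SOURCE /* exposes madvise() and fsync() under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25 /* asm-generic UAPI value; older libc headers lack it */
#endif

/*
 * Option 1 (hypothetical helper): write back the file's dirty pagecache
 * so the folios are clean when khugepaged next scans them.
 * Returns 0 on success or -errno.
 */
static int flush_for_collapse(int fd)
{
	return fsync(fd) == 0 ? 0 : -errno;
}

/*
 * Option 2 (hypothetical helper): request a synchronous collapse of
 * [addr, addr + len). Per the commit message, MADV_COLLAPSE flushes
 * on retry by itself. Returns 0 or -errno (e.g. -EINVAL when the
 * kernel or the mapping does not support it).
 */
static int request_collapse(void *addr, size_t len)
{
	return madvise(addr, len, MADV_COLLAPSE) == 0 ? 0 : -errno;
}
```

An application would call either helper after it finishes writing. With option 1, khugepaged may pick up the now-clean folios on a later scan. Option 2 asks for the collapse synchronously instead.
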
> 
> To properly support collapsing dirty pagecache folios, filemap_flush()
> needs to be avoided. Merging the associated buffers, instead of dropping
> them with filemap_release_folio(), might also be needed.
> 
> NOTE: this breaks the khugepaged selftests for writable-file pagecache
> collapse, which expect that collapse to always fail. The next commit
> fixes them.
> 
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
>  mm/huge_memory.c | 2 +-
>  mm/khugepaged.c  | 9 ++++++++-
>  2 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 9b3abb98a7e51..e1e9d59db6e70 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -97,7 +97,7 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
>  	if (!mapping_pmd_folio_support(vma->vm_file->f_mapping))
>  		return false;
>  
> -	return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
> +	return S_ISREG(inode->i_mode);
>  }
>  
>  /* If returns true, we are unable to access the VMA's folios. */
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 1ee15b48962a3..fb7ff643973cc 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2345,7 +2345,14 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>  				 * forcing writeback in loop.
>  				 */
>  				xas_unlock_irq(&xas);
> -				filemap_flush(mapping);
> +				/*
> +				 * Only flush for read-only files. Writable
> +				 * files can have their folios dirty at any
> +				 * time; blindly flushing them would cause
> +				 * undesirable system-wide writeback.
> +				 */

That comment should really be merged into the comment above.

Also, there we say "khugepaged only works on read-only fd" ... which is now just
wrong?

Please revise that whole comment as you incorporate yours.

Apart from that, I guess this is fine ... or we'll learn rather quickly, haha.

-- 
Cheers,

David
