From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: Kairui Song <kasong@tencent.com>, linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Hugh Dickins <hughd@google.com>,
	Matthew Wilcox <willy@infradead.org>,
	Kemeng Shi <shikemeng@huaweicloud.com>,
	Chris Li <chrisl@kernel.org>, Nhat Pham <nphamcs@gmail.com>,
	Baoquan He <bhe@redhat.com>, Barry Song <baohua@kernel.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 4/9] mm/shmem, swap: tidy up swap entry splitting
Date: Sun, 6 Jul 2025 11:35:30 +0800
Message-ID: <452cad4b-e0c7-4792-9272-69199fa52a55@linux.alibaba.com>
In-Reply-To: <20250704181748.63181-5-ryncsn@gmail.com>



On 2025/7/5 02:17, Kairui Song wrote:
> From: Kairui Song <kasong@tencent.com>
> 
> Instead of keeping separate paths that split the entry before swapin
> starts, move the entry splitting to after swapin has put the folio in
> the swap cache (or set the SWAP_HAS_CACHE bit). This way we only need
> one place and one unified way to split the large entry. Whenever swapin
> brings in a folio smaller than the shmem swap entry, split the entry
> and recalculate the entry and index for verification.
> 
> This removes duplicated code and function calls, reduces LOC,
> and makes the split less racy, as it is now guarded by the swap cache.
> So there is a lower chance of repeated faults due to a raced split.
> The compiler is also able to optimize the code further:
> 
> bloat-o-meter results with GCC 14:
> 
> With DEBUG_SECTION_MISMATCH (-fno-inline-functions-called-once):
> ./scripts/bloat-o-meter mm/shmem.o.old mm/shmem.o
> add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-82 (-82)
> Function                                     old     new   delta
> shmem_swapin_folio                          2361    2279     -82
> Total: Before=33151, After=33069, chg -0.25%
> 
> With !DEBUG_SECTION_MISMATCH:
> ./scripts/bloat-o-meter mm/shmem.o.old mm/shmem.o
> add/remove: 0/1 grow/shrink: 1/0 up/down: 949/-750 (199)
> Function                                     old     new   delta
> shmem_swapin_folio                          2878    3827    +949
> shmem_split_large_entry.isra                 750       -    -750
> Total: Before=33086, After=33285, chg +0.60%
> 
> Since shmem_split_large_entry is now only called in one place, the
> compiler will either generate more compact code or inline it for
> better performance.
> 
> Signed-off-by: Kairui Song <kasong@tencent.com>
> ---
>   mm/shmem.c | 53 +++++++++++++++++++++--------------------------------
>   1 file changed, 21 insertions(+), 32 deletions(-)
> 
> diff --git a/mm/shmem.c b/mm/shmem.c
> index e43becfa04b3..217264315842 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -2266,14 +2266,15 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
>   	struct address_space *mapping = inode->i_mapping;
>   	struct mm_struct *fault_mm = vma ? vma->vm_mm : NULL;
>   	struct shmem_inode_info *info = SHMEM_I(inode);
> +	swp_entry_t swap, index_entry;
>   	struct swap_info_struct *si;
>   	struct folio *folio = NULL;
>   	bool skip_swapcache = false;
> -	swp_entry_t swap;
>   	int error, nr_pages, order, split_order;
> +	pgoff_t offset;
>   
>   	VM_BUG_ON(!*foliop || !xa_is_value(*foliop));
> -	swap = radix_to_swp_entry(*foliop);
> +	swap = index_entry = radix_to_swp_entry(*foliop);
>   	*foliop = NULL;
>   
>   	if (is_poisoned_swp_entry(swap))
> @@ -2321,46 +2322,35 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
>   		}
>   
>   		/*
> -		 * Now swap device can only swap in order 0 folio, then we
> -		 * should split the large swap entry stored in the pagecache
> -		 * if necessary.
> -		 */
> -		split_order = shmem_split_large_entry(inode, index, swap, gfp);
> -		if (split_order < 0) {
> -			error = split_order;
> -			goto failed;
> -		}
> -
> -		/*
> -		 * If the large swap entry has already been split, it is
> +		 * Now swap device can only swap in order 0 folio, it is
>   		 * necessary to recalculate the new swap entry based on
> -		 * the old order alignment.
> +		 * the offset, as the swapin index might be unalgined.
>   		 */
> -		if (split_order > 0) {
> -			pgoff_t offset = index - round_down(index, 1 << split_order);
> -
> +		if (order) {
> +			offset = index - round_down(index, 1 << order);
>   			swap = swp_entry(swp_type(swap), swp_offset(swap) + offset);
>   		}
>   
> -		/* Here we actually start the io */
>   		folio = shmem_swapin_cluster(swap, gfp, info, index);
>   		if (!folio) {
>   			error = -ENOMEM;
>   			goto failed;
>   		}
> -	} else if (order > folio_order(folio)) {
> +	}
> +alloced:
> +	if (order > folio_order(folio)) {
>   		/*
> -		 * Swap readahead may swap in order 0 folios into swapcache
> +		 * Swapin may get smaller folios due to various reasons:
> +		 * It may fallback to order 0 due to memory pressure or race,
> +		 * swap readahead may swap in order 0 folios into swapcache
>   		 * asynchronously, while the shmem mapping can still stores
>   		 * large swap entries. In such cases, we should split the
>   		 * large swap entry to prevent possible data corruption.
>   		 */
> -		split_order = shmem_split_large_entry(inode, index, swap, gfp);
> +		split_order = shmem_split_large_entry(inode, index, index_entry, gfp);
>   		if (split_order < 0) {
> -			folio_put(folio);
> -			folio = NULL;
>   			error = split_order;
> -			goto failed;
> +			goto failed_nolock;
>   		}
>   
>   		/*
> @@ -2369,15 +2359,13 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
>   		 * the old order alignment.
>   		 */
>   		if (split_order > 0) {
> -			pgoff_t offset = index - round_down(index, 1 << split_order);
> -
> +			offset = index - round_down(index, 1 << split_order);
>   			swap = swp_entry(swp_type(swap), swp_offset(swap) + offset);

You should use the original swap value 'index_entry' to calculate the
new swap value here, since 'swap' may already have been advanced by
the offset in the !folio path above.

With the following fix, you can add:
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>

diff --git a/mm/shmem.c b/mm/shmem.c
index d530df550f7f..1e8422ac863e 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2361,7 +2361,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
                  */
                 if (split_order > 0) {
                         offset = index - round_down(index, 1 << split_order);
-                       swap = swp_entry(swp_type(swap), swp_offset(swap) + offset);
+                       swap = swp_entry(swp_type(swap), swp_offset(index_entry) + offset);
                 }
         } else if (order < folio_order(folio)) {
                 swap.val = round_down(swap.val, 1 << folio_order(folio));
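
To illustrate why the recalculation should start from the original
entry, here is a minimal userspace sketch of the arithmetic. It is not
kernel code: round_down_pow2(), offset_in_entry() and
recalc_swap_offset() are hypothetical stand-ins, swap entries are
modelled as plain offsets, and it assumes that 'swap' was already
advanced once in the !folio path and that the split reports the
original order.

```c
#include <assert.h>

/* Hypothetical stand-in for round_down() with a power-of-two alignment. */
static unsigned long round_down_pow2(unsigned long x, unsigned long align)
{
	return x & ~(align - 1);
}

/* Offset of 'index' inside its naturally aligned 2^order block. */
static unsigned long offset_in_entry(unsigned long index, unsigned int order)
{
	return index - round_down_pow2(index, 1UL << order);
}

/* Model of "swp_offset(from) + offset" for a given starting swap offset. */
static unsigned long recalc_swap_offset(unsigned long from_off,
					unsigned long index,
					unsigned int order)
{
	return from_off + offset_in_entry(index, order);
}

/* Example values are made up for illustration only. */
static void demo(void)
{
	unsigned long orig_off = 1024;	/* offset of the large (order 4) entry */
	unsigned long index = 21;	/* faulting index, 5 pages into the block */
	unsigned int order = 4;

	/* First adjustment, done before swapin in the !folio path. */
	unsigned long swap_off = recalc_swap_offset(orig_off, index, order);
	assert(swap_off == orig_off + 5);

	/* After the split: starting from 'swap' adds the offset twice. */
	assert(recalc_swap_offset(swap_off, index, order) == orig_off + 10);

	/* Starting from the original entry gives the intended offset. */
	assert(recalc_swap_offset(orig_off, index, order) == orig_off + 5);
}
```

With these example values, recalculating from the already adjusted
'swap' lands 5 slots past the intended one, while recalculating from
the original entry gives the same slot as the first adjustment.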


