linux-mm.kvack.org archive mirror
* [PATCH v2 0/1] remove SWAP_MAP_SHMEM
@ 2024-10-02  1:20 Nhat Pham
  2024-10-02  1:20 ` [PATCH v2 1/1] swap: shmem: " Nhat Pham
  2024-10-02  1:22 ` [PATCH v2 0/1] " Nhat Pham
  0 siblings, 2 replies; 16+ messages in thread
From: Nhat Pham @ 2024-10-02  1:20 UTC (permalink / raw)
  To: akpm
  Cc: hannes, yosryahmed, hughd, shakeel.butt, ryan.roberts, ying.huang,
	chrisl, david, kasong, willy, viro, baohua, chengming.zhou,
	v-songbaohua, linux-mm, kernel-team, linux-kernel

Changelog:
v2:
	* Fix a WARN in the shmem THP swap path. Thanks Baolin, Yosry, and Barry
	for the report and the discussion on how to solve it.
	* Squash the two patches into one.
RFC v1: https://lore.kernel.org/all/20240923231142.4155415-1-nphamcs@gmail.com/

The SWAP_MAP_SHMEM state was originally introduced in the commit
aaa468653b4a ("swap_info: note SWAP_MAP_SHMEM"), to quickly determine if a
swap entry belongs to shmem during swapoff.

However, swapoff has since been rewritten drastically in the commit
b56a2d8af914 ("mm: rid swapoff of quadratic complexity"). Now
having swap count == SWAP_MAP_SHMEM value is basically the same as having
swap count == 1, and swap_shmem_alloc() behaves analogously to
swap_duplicate().
This series proposes the removal of this state and the associated helper to
simplify the state machine (both mentally and code-wise). We will also
have an extra state/special value that can be repurposed (for swap entries
that never get re-duplicated).
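
To make the equivalence above concrete, here is a simplified sketch of the
relevant freeing logic (not the exact code in __swap_entry_free_locked();
see the diff in the patch for that). The shmem-owned marker was freed by
resetting the count directly, while a plain count of 1 reaches zero through
the normal decrement, so the outcome is the same either way:

	/* simplified sketch, not the exact kernel code */
	if (count == SWAP_MAP_SHMEM)		/* old shmem-owned marker */
		count = 0;			/* entry freed in one step */
	else if ((count & ~COUNT_CONTINUED) <= SWAP_MAP_MAX)
		count--;			/* a count of 1 also drops to 0 */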

Another motivation is the new swap abstraction I am currently working on,
which would allow for swap/zswap decoupling, swapoff optimization, etc. The
fewer states and swap API functions there are, the simpler the conversion
will be.

Nhat Pham (1):
  swap: shmem: remove SWAP_MAP_SHMEM

 include/linux/swap.h | 16 ++++++++--------
 mm/shmem.c           |  2 +-
 mm/swapfile.c        | 41 +++++++++++++++++++++--------------------
 3 files changed, 30 insertions(+), 29 deletions(-)


base-commit: 391cad4424af8bb563e1504c5adaef0155b4abb6
-- 
2.43.5


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2 1/1] swap: shmem: remove SWAP_MAP_SHMEM
  2024-10-02  1:20 [PATCH v2 0/1] remove SWAP_MAP_SHMEM Nhat Pham
@ 2024-10-02  1:20 ` Nhat Pham
  2024-10-02  1:33   ` Yosry Ahmed
  2024-10-11  6:35   ` Huang, Ying
  2024-10-02  1:22 ` [PATCH v2 0/1] " Nhat Pham
  1 sibling, 2 replies; 16+ messages in thread
From: Nhat Pham @ 2024-10-02  1:20 UTC (permalink / raw)
  To: akpm
  Cc: hannes, yosryahmed, hughd, shakeel.butt, ryan.roberts, ying.huang,
	chrisl, david, kasong, willy, viro, baohua, chengming.zhou,
	v-songbaohua, linux-mm, kernel-team, linux-kernel

The SWAP_MAP_SHMEM state was introduced in the commit aaa468653b4a
("swap_info: note SWAP_MAP_SHMEM"), to quickly determine if a swap entry
belongs to shmem during swapoff.

However, swapoff has since been rewritten in the commit b56a2d8af914
("mm: rid swapoff of quadratic complexity"). Now having swap count ==
SWAP_MAP_SHMEM value is basically the same as having swap count == 1,
and swap_shmem_alloc() behaves analogously to swap_duplicate(). The only
difference of note is that swap_shmem_alloc() does not check for
-ENOMEM returned from __swap_duplicate(), but it is OK because shmem
never re-duplicates any swap entry it owns. This will still be safe if we
use (batched) swap_duplicate() instead.

This commit adds swap_duplicate_nr(), the batched variant of
swap_duplicate(), and removes the SWAP_MAP_SHMEM state and the
associated swap_shmem_alloc() helper to simplify the state machine (both
mentally and in terms of actual code). We will also have an extra
state/special value that can be repurposed (for swap entries that never
get re-duplicated).

Signed-off-by: Nhat Pham <nphamcs@gmail.com>
---
 include/linux/swap.h | 16 ++++++++--------
 mm/shmem.c           |  2 +-
 mm/swapfile.c        | 41 +++++++++++++++++++++--------------------
 3 files changed, 30 insertions(+), 29 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index ca533b478c21..017f3c03ff7a 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -232,7 +232,6 @@ enum {
 /* Special value in first swap_map */
 #define SWAP_MAP_MAX	0x3e	/* Max count */
 #define SWAP_MAP_BAD	0x3f	/* Note page is bad */
-#define SWAP_MAP_SHMEM	0xbf	/* Owned by shmem/tmpfs */
 
 /* Special value in each swap_map continuation */
 #define SWAP_CONT_MAX	0x7f	/* Max count */
@@ -482,8 +481,7 @@ void put_swap_folio(struct folio *folio, swp_entry_t entry);
 extern swp_entry_t get_swap_page_of_type(int);
 extern int get_swap_pages(int n, swp_entry_t swp_entries[], int order);
 extern int add_swap_count_continuation(swp_entry_t, gfp_t);
-extern void swap_shmem_alloc(swp_entry_t, int);
-extern int swap_duplicate(swp_entry_t);
+extern int swap_duplicate_nr(swp_entry_t, int);
 extern int swapcache_prepare(swp_entry_t entry, int nr);
 extern void swap_free_nr(swp_entry_t entry, int nr_pages);
 extern void swapcache_free_entries(swp_entry_t *entries, int n);
@@ -549,11 +547,7 @@ static inline int add_swap_count_continuation(swp_entry_t swp, gfp_t gfp_mask)
 	return 0;
 }
 
-static inline void swap_shmem_alloc(swp_entry_t swp, int nr)
-{
-}
-
-static inline int swap_duplicate(swp_entry_t swp)
+static inline int swap_duplicate_nr(swp_entry_t swp, int nr)
 {
 	return 0;
 }
@@ -606,6 +600,12 @@ static inline int add_swap_extent(struct swap_info_struct *sis,
 }
 #endif /* CONFIG_SWAP */
 
+static inline int swap_duplicate(swp_entry_t entry)
+{
+	return swap_duplicate_nr(entry, 1);
+}
+
+
 static inline void free_swap_and_cache(swp_entry_t entry)
 {
 	free_swap_and_cache_nr(entry, 1);
diff --git a/mm/shmem.c b/mm/shmem.c
index 0613421e09e7..e3f72f99be32 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1561,7 +1561,7 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 			__GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN,
 			NULL) == 0) {
 		shmem_recalc_inode(inode, 0, nr_pages);
-		swap_shmem_alloc(swap, nr_pages);
+		swap_duplicate_nr(swap, nr_pages);
 		shmem_delete_from_page_cache(folio, swp_to_radix_entry(swap));
 
 		mutex_unlock(&shmem_swaplist_mutex);
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 0cded32414a1..9bb94e618914 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1381,12 +1381,6 @@ static unsigned char __swap_entry_free_locked(struct swap_info_struct *si,
 	if (usage == SWAP_HAS_CACHE) {
 		VM_BUG_ON(!has_cache);
 		has_cache = 0;
-	} else if (count == SWAP_MAP_SHMEM) {
-		/*
-		 * Or we could insist on shmem.c using a special
-		 * swap_shmem_free() and free_shmem_swap_and_cache()...
-		 */
-		count = 0;
 	} else if ((count & ~COUNT_CONTINUED) <= SWAP_MAP_MAX) {
 		if (count == COUNT_CONTINUED) {
 			if (swap_count_continued(si, offset, count))
@@ -3626,7 +3620,6 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr)
 
 	offset = swp_offset(entry);
 	VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER);
-	VM_WARN_ON(usage == 1 && nr > 1);
 	ci = lock_cluster_or_swap_info(si, offset);
 
 	err = 0;
@@ -3652,6 +3645,13 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr)
 				err = -EEXIST;
 		} else if ((count & ~COUNT_CONTINUED) > SWAP_MAP_MAX) {
 			err = -EINVAL;
+		} else {
+			/*
+			 * The only swap_duplicate_nr() caller that passes nr > 1 is shmem,
+			 * who never re-duplicates any swap entry it owns. So this should
+			 * not happen.
+			 */
+			VM_WARN_ON(nr > 1 && (count & ~COUNT_CONTINUED) == SWAP_MAP_MAX);
 		}
 
 		if (err)
@@ -3686,27 +3686,28 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr)
 	return err;
 }
 
-/*
- * Help swapoff by noting that swap entry belongs to shmem/tmpfs
- * (in which case its reference count is never incremented).
- */
-void swap_shmem_alloc(swp_entry_t entry, int nr)
-{
-	__swap_duplicate(entry, SWAP_MAP_SHMEM, nr);
-}
-
-/*
- * Increase reference count of swap entry by 1.
+/**
+ * swap_duplicate_nr() - Increase reference count of nr contiguous swap entries
+ *                       by 1.
+ *
+ * @entry: first swap entry from which we want to increase the refcount.
+ * @nr: Number of entries in range.
+ *
  * Returns 0 for success, or -ENOMEM if a swap_count_continuation is required
  * but could not be atomically allocated.  Returns 0, just as if it succeeded,
  * if __swap_duplicate() fails for another reason (-EINVAL or -ENOENT), which
  * might occur if a page table entry has got corrupted.
+ *
+ * Note that we are currently not handling the case where nr > 1 and we need to
+ * add swap count continuation. This is OK, because no such user exists - shmem
+ * is the only user that can pass nr > 1, and it never re-duplicates any swap
+ * entry it owns.
  */
-int swap_duplicate(swp_entry_t entry)
+int swap_duplicate_nr(swp_entry_t entry, int nr)
 {
 	int err = 0;
 
-	while (!err && __swap_duplicate(entry, 1, 1) == -ENOMEM)
+	while (!err && __swap_duplicate(entry, 1, nr) == -ENOMEM)
 		err = add_swap_count_continuation(entry, GFP_ATOMIC);
 	return err;
 }
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 0/1] remove SWAP_MAP_SHMEM
  2024-10-02  1:20 [PATCH v2 0/1] remove SWAP_MAP_SHMEM Nhat Pham
  2024-10-02  1:20 ` [PATCH v2 1/1] swap: shmem: " Nhat Pham
@ 2024-10-02  1:22 ` Nhat Pham
  2024-10-02  1:25   ` Nhat Pham
                     ` (2 more replies)
  1 sibling, 3 replies; 16+ messages in thread
From: Nhat Pham @ 2024-10-02  1:22 UTC (permalink / raw)
  To: akpm
  Cc: hannes, yosryahmed, hughd, shakeel.butt, ryan.roberts, ying.huang,
	chrisl, david, kasong, willy, viro, baohua, chengming.zhou,
	v-songbaohua, linux-mm, kernel-team, linux-kernel

On Tue, Oct 1, 2024 at 6:20 PM Nhat Pham <nphamcs@gmail.com> wrote:
>
> Changelog:
> v2:
>         * Fix a WARN in the shmem THP swap path. Thanks Baolin, Yosry, and Barry
>         for the report and the discussion on how to solve it.

Hi Baolin, could you run your stress program again on this version?
FWIW, I was able to trigger 64kb mTHP swap in shmem + no-fallback
swapout, so I think the issue is fixed.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 0/1] remove SWAP_MAP_SHMEM
  2024-10-02  1:22 ` [PATCH v2 0/1] " Nhat Pham
@ 2024-10-02  1:25   ` Nhat Pham
  2024-10-08  9:27   ` Baolin Wang
  2024-10-10  8:53   ` Baolin Wang
  2 siblings, 0 replies; 16+ messages in thread
From: Nhat Pham @ 2024-10-02  1:25 UTC (permalink / raw)
  To: akpm
  Cc: hannes, yosryahmed, hughd, shakeel.butt, ryan.roberts, ying.huang,
	chrisl, david, kasong, willy, viro, baohua, chengming.zhou,
	v-songbaohua, linux-mm, kernel-team, linux-kernel, Baolin Wang

On Tue, Oct 1, 2024 at 6:22 PM Nhat Pham <nphamcs@gmail.com> wrote:
>
> Hi Baolin, could you run your stress program again on this version?
> FWIW, I was able to trigger 64kb mTHP swap in shmem + no-fallback
> swapout, so I think the issue is fixed.

Ooops, I think I missed Baolin in the cc. My apologies.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/1] swap: shmem: remove SWAP_MAP_SHMEM
  2024-10-02  1:20 ` [PATCH v2 1/1] swap: shmem: " Nhat Pham
@ 2024-10-02  1:33   ` Yosry Ahmed
  2024-10-02  1:58     ` Nhat Pham
  2024-10-11  6:35   ` Huang, Ying
  1 sibling, 1 reply; 16+ messages in thread
From: Yosry Ahmed @ 2024-10-02  1:33 UTC (permalink / raw)
  To: Nhat Pham
  Cc: akpm, hannes, hughd, shakeel.butt, ryan.roberts, ying.huang,
	chrisl, david, kasong, willy, viro, baohua, chengming.zhou,
	v-songbaohua, linux-mm, kernel-team, linux-kernel

On Tue, Oct 1, 2024 at 6:20 PM Nhat Pham <nphamcs@gmail.com> wrote:
>
> The SWAP_MAP_SHMEM state was introduced in the commit aaa468653b4a
> ("swap_info: note SWAP_MAP_SHMEM"), to quickly determine if a swap entry
> belongs to shmem during swapoff.
>
> However, swapoff has since been rewritten in the commit b56a2d8af914
> ("mm: rid swapoff of quadratic complexity"). Now having swap count ==
> SWAP_MAP_SHMEM value is basically the same as having swap count == 1,
> and swap_shmem_alloc() behaves analogously to swap_duplicate(). The only
> difference of note is that swap_shmem_alloc() does not check for
> -ENOMEM returned from __swap_duplicate(), but it is OK because shmem
> never re-duplicates any swap entry it owns. This will still be safe if we
> use (batched) swap_duplicate() instead.
>
> This commit adds swap_duplicate_nr(), the batched variant of
> swap_duplicate(), and removes the SWAP_MAP_SHMEM state and the
> associated swap_shmem_alloc() helper to simplify the state machine (both
> mentally and in terms of actual code). We will also have an extra
> state/special value that can be repurposed (for swap entries that never
> get re-duplicated).
>
> Signed-off-by: Nhat Pham <nphamcs@gmail.com>
> ---
>  include/linux/swap.h | 16 ++++++++--------
>  mm/shmem.c           |  2 +-
>  mm/swapfile.c        | 41 +++++++++++++++++++++--------------------
>  3 files changed, 30 insertions(+), 29 deletions(-)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index ca533b478c21..017f3c03ff7a 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -232,7 +232,6 @@ enum {
>  /* Special value in first swap_map */
>  #define SWAP_MAP_MAX   0x3e    /* Max count */
>  #define SWAP_MAP_BAD   0x3f    /* Note page is bad */
> -#define SWAP_MAP_SHMEM 0xbf    /* Owned by shmem/tmpfs */
>
>  /* Special value in each swap_map continuation */
>  #define SWAP_CONT_MAX  0x7f    /* Max count */
> @@ -482,8 +481,7 @@ void put_swap_folio(struct folio *folio, swp_entry_t entry);
>  extern swp_entry_t get_swap_page_of_type(int);
>  extern int get_swap_pages(int n, swp_entry_t swp_entries[], int order);
>  extern int add_swap_count_continuation(swp_entry_t, gfp_t);
> -extern void swap_shmem_alloc(swp_entry_t, int);
> -extern int swap_duplicate(swp_entry_t);
> +extern int swap_duplicate_nr(swp_entry_t, int);
>  extern int swapcache_prepare(swp_entry_t entry, int nr);
>  extern void swap_free_nr(swp_entry_t entry, int nr_pages);
>  extern void swapcache_free_entries(swp_entry_t *entries, int n);
> @@ -549,11 +547,7 @@ static inline int add_swap_count_continuation(swp_entry_t swp, gfp_t gfp_mask)
>         return 0;
>  }
>
> -static inline void swap_shmem_alloc(swp_entry_t swp, int nr)
> -{
> -}
> -
> -static inline int swap_duplicate(swp_entry_t swp)
> +static inline int swap_duplicate_nr(swp_entry_t swp, int nr)
>  {
>         return 0;
>  }
> @@ -606,6 +600,12 @@ static inline int add_swap_extent(struct swap_info_struct *sis,
>  }
>  #endif /* CONFIG_SWAP */
>
> +static inline int swap_duplicate(swp_entry_t entry)
> +{
> +       return swap_duplicate_nr(entry, 1);
> +}
> +
> +

Nit: extra blank line.

>  static inline void free_swap_and_cache(swp_entry_t entry)
>  {
>         free_swap_and_cache_nr(entry, 1);
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 0613421e09e7..e3f72f99be32 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1561,7 +1561,7 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
>                         __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN,
>                         NULL) == 0) {
>                 shmem_recalc_inode(inode, 0, nr_pages);
> -               swap_shmem_alloc(swap, nr_pages);
> +               swap_duplicate_nr(swap, nr_pages);
>                 shmem_delete_from_page_cache(folio, swp_to_radix_entry(swap));
>
>                 mutex_unlock(&shmem_swaplist_mutex);
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 0cded32414a1..9bb94e618914 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1381,12 +1381,6 @@ static unsigned char __swap_entry_free_locked(struct swap_info_struct *si,
>         if (usage == SWAP_HAS_CACHE) {
>                 VM_BUG_ON(!has_cache);
>                 has_cache = 0;
> -       } else if (count == SWAP_MAP_SHMEM) {
> -               /*
> -                * Or we could insist on shmem.c using a special
> -                * swap_shmem_free() and free_shmem_swap_and_cache()...
> -                */
> -               count = 0;
>         } else if ((count & ~COUNT_CONTINUED) <= SWAP_MAP_MAX) {
>                 if (count == COUNT_CONTINUED) {
>                         if (swap_count_continued(si, offset, count))
> @@ -3626,7 +3620,6 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr)
>
>         offset = swp_offset(entry);
>         VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER);
> -       VM_WARN_ON(usage == 1 && nr > 1);
>         ci = lock_cluster_or_swap_info(si, offset);
>
>         err = 0;
> @@ -3652,6 +3645,13 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr)
>                                 err = -EEXIST;
>                 } else if ((count & ~COUNT_CONTINUED) > SWAP_MAP_MAX) {
>                         err = -EINVAL;
> +               } else {
> +                       /*
> +                        * The only swap_duplicate_nr() caller that passes nr > 1 is shmem,
> +                        * who never re-duplicates any swap entry it owns. So this should

nit: I think "which" is the right word here, but I am not a native speaker :)

> +                        * not happen.
> +                        */
> +                       VM_WARN_ON(nr > 1 && (count & ~COUNT_CONTINUED) == SWAP_MAP_MAX);

Why not return an error in this case? I think we should add recovery
for bugs when it's possible and simple, which I believe is the case
here.

In shmem_writepage() we can add a WARN if swap_duplicate_nr() fails,
or propagate an error to the caller as well (perhaps this belongs in a
separate patch that does this for swap_shmem_alloc() first).

Sorry if I am being paranoid here, please let me know if this is the case.

>                 }
>
>                 if (err)
> @@ -3686,27 +3686,28 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr)
>         return err;
>  }
>
> -/*
> - * Help swapoff by noting that swap entry belongs to shmem/tmpfs
> - * (in which case its reference count is never incremented).
> - */
> -void swap_shmem_alloc(swp_entry_t entry, int nr)
> -{
> -       __swap_duplicate(entry, SWAP_MAP_SHMEM, nr);
> -}
> -
> -/*
> - * Increase reference count of swap entry by 1.
> +/**
> + * swap_duplicate_nr() - Increase reference count of nr contiguous swap entries
> + *                       by 1.

Can we avoid the line break by using "refcount" instead of "reference count"?

> + *
> + * @entry: first swap entry from which we want to increase the refcount.
> + * @nr: Number of entries in range.
> + *
>   * Returns 0 for success, or -ENOMEM if a swap_count_continuation is required
>   * but could not be atomically allocated.  Returns 0, just as if it succeeded,
>   * if __swap_duplicate() fails for another reason (-EINVAL or -ENOENT), which
>   * might occur if a page table entry has got corrupted.
> + *
> + * Note that we are currently not handling the case where nr > 1 and we need to
> + * add swap count continuation. This is OK, because no such user exists - shmem
> + * is the only user that can pass nr > 1, and it never re-duplicates any swap
> + * entry it owns.

Do we need this comment when we have the WARN + comment in __swap_duplicate()?

>   */
> -int swap_duplicate(swp_entry_t entry)
> +int swap_duplicate_nr(swp_entry_t entry, int nr)
>  {
>         int err = 0;
>
> -       while (!err && __swap_duplicate(entry, 1, 1) == -ENOMEM)
> +       while (!err && __swap_duplicate(entry, 1, nr) == -ENOMEM)
>                 err = add_swap_count_continuation(entry, GFP_ATOMIC);
>         return err;
>  }
> --
> 2.43.5


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/1] swap: shmem: remove SWAP_MAP_SHMEM
  2024-10-02  1:33   ` Yosry Ahmed
@ 2024-10-02  1:58     ` Nhat Pham
  2024-10-02  2:04       ` Nhat Pham
  2024-10-02  2:11       ` Yosry Ahmed
  0 siblings, 2 replies; 16+ messages in thread
From: Nhat Pham @ 2024-10-02  1:58 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: akpm, hannes, hughd, shakeel.butt, ryan.roberts, ying.huang,
	chrisl, david, kasong, willy, viro, baohua, chengming.zhou,
	v-songbaohua, linux-mm, kernel-team, linux-kernel

On Tue, Oct 1, 2024 at 6:33 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
>
> Nit: extra blank line.
>
> >  static inline void free_swap_and_cache(swp_entry_t entry)
> >  {
> >         free_swap_and_cache_nr(entry, 1);
> > diff --git a/mm/shmem.c b/mm/shmem.c
> > index 0613421e09e7..e3f72f99be32 100644
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -1561,7 +1561,7 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
> >                         __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN,
> >                         NULL) == 0) {
> >                 shmem_recalc_inode(inode, 0, nr_pages);
> > -               swap_shmem_alloc(swap, nr_pages);
> > +               swap_duplicate_nr(swap, nr_pages);
> >                 shmem_delete_from_page_cache(folio, swp_to_radix_entry(swap));
> >
> >                 mutex_unlock(&shmem_swaplist_mutex);
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index 0cded32414a1..9bb94e618914 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -1381,12 +1381,6 @@ static unsigned char __swap_entry_free_locked(struct swap_info_struct *si,
> >         if (usage == SWAP_HAS_CACHE) {
> >                 VM_BUG_ON(!has_cache);
> >                 has_cache = 0;
> > -       } else if (count == SWAP_MAP_SHMEM) {
> > -               /*
> > -                * Or we could insist on shmem.c using a special
> > -                * swap_shmem_free() and free_shmem_swap_and_cache()...
> > -                */
> > -               count = 0;
> >         } else if ((count & ~COUNT_CONTINUED) <= SWAP_MAP_MAX) {
> >                 if (count == COUNT_CONTINUED) {
> >                         if (swap_count_continued(si, offset, count))
> > @@ -3626,7 +3620,6 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr)
> >
> >         offset = swp_offset(entry);
> >         VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER);
> > -       VM_WARN_ON(usage == 1 && nr > 1);
> >         ci = lock_cluster_or_swap_info(si, offset);
> >
> >         err = 0;
> > @@ -3652,6 +3645,13 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr)
> >                                 err = -EEXIST;
> >                 } else if ((count & ~COUNT_CONTINUED) > SWAP_MAP_MAX) {
> >                         err = -EINVAL;
> > +               } else {
> > +                       /*
> > +                        * The only swap_duplicate_nr() caller that passes nr > 1 is shmem,
> > +                        * who never re-duplicates any swap entry it owns. So this should
>
> nit: I think "which" is the right word here, but I am not a native speaker :)

Yeah I think it should be which. Fix(let) incoming.

>
> > +                        * not happen.
> > +                        */
> > +                       VM_WARN_ON(nr > 1 && (count & ~COUNT_CONTINUED) == SWAP_MAP_MAX);
>
> Why not return an error in this case? I think we should add recovery
> for bugs when it's possible and simple, which I believe is the case
> here.
>
> In shmem_writepage() we can add a WARN if swap_duplicate_nr() fails,
> or propagate an error to the caller as well (perhaps this belongs in a
> separate patch that does this for swap_shmem_alloc() first).
>
> Sorry if I am being paranoid here, please let me know if this is the case.

I was debating between WARN-ing here, and returning -ENOMEM and
WARN-ing at shmem's callsite.

My thinking is that if we return -ENOMEM here, it will work in the
current setup, for both shmem and other callsites. However, in the
future, if we add another user of swap_duplicate_nr(), this time
without guaranteeing that we won't need continuation, I think it won't
work unless we have the fallback logic in place as well:

while (!err && __swap_duplicate(entry, 1, nr) == -ENOMEM)
err = add_swap_count_continuation(entry, GFP_ATOMIC);

>
> >                 }
> >
> >                 if (err)
> > @@ -3686,27 +3686,28 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr)
> >         return err;
> >  }
> >
> > -/*
> > - * Help swapoff by noting that swap entry belongs to shmem/tmpfs
> > - * (in which case its reference count is never incremented).
> > - */
> > -void swap_shmem_alloc(swp_entry_t entry, int nr)
> > -{
> > -       __swap_duplicate(entry, SWAP_MAP_SHMEM, nr);
> > -}
> > -
> > -/*
> > - * Increase reference count of swap entry by 1.
> > +/**
> > + * swap_duplicate_nr() - Increase reference count of nr contiguous swap entries
> > + *                       by 1.
>
> Can we avoid the line break by using "refcount" instead of "reference count"?
>
> > + *
> > + * @entry: first swap entry from which we want to increase the refcount.
> > + * @nr: Number of entries in range.
> > + *
> >   * Returns 0 for success, or -ENOMEM if a swap_count_continuation is required
> >   * but could not be atomically allocated.  Returns 0, just as if it succeeded,
> >   * if __swap_duplicate() fails for another reason (-EINVAL or -ENOENT), which
> >   * might occur if a page table entry has got corrupted.
> > + *
> > + * Note that we are currently not handling the case where nr > 1 and we need to
> > + * add swap count continuation. This is OK, because no such user exists - shmem
> > + * is the only user that can pass nr > 1, and it never re-duplicates any swap
> > + * entry it owns.
>
> Do we need this comment when we have the WARN + comment in __swap_duplicate()?

Here I'm just being cautious and including the limitation of the
function in the API documentation itself.

No strong opinions though.

>
> >   */
> > -int swap_duplicate(swp_entry_t entry)
> > +int swap_duplicate_nr(swp_entry_t entry, int nr)
> >  {
> >         int err = 0;
> >
> > -       while (!err && __swap_duplicate(entry, 1, 1) == -ENOMEM)
> > +       while (!err && __swap_duplicate(entry, 1, nr) == -ENOMEM)
> >                 err = add_swap_count_continuation(entry, GFP_ATOMIC);
> >         return err;
> >  }
> > --
> > 2.43.5


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/1] swap: shmem: remove SWAP_MAP_SHMEM
  2024-10-02  1:58     ` Nhat Pham
@ 2024-10-02  2:04       ` Nhat Pham
  2024-10-02  2:06         ` Yosry Ahmed
  2024-10-02  2:11       ` Yosry Ahmed
  1 sibling, 1 reply; 16+ messages in thread
From: Nhat Pham @ 2024-10-02  2:04 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: akpm, hannes, hughd, shakeel.butt, ryan.roberts, ying.huang,
	chrisl, david, kasong, willy, viro, baohua, chengming.zhou,
	v-songbaohua, linux-mm, kernel-team, linux-kernel

On Tue, Oct 1, 2024 at 6:58 PM Nhat Pham <nphamcs@gmail.com> wrote:
>
> On Tue, Oct 1, 2024 at 6:33 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> I was debating between WARN-ing here, and returning -ENOMEM and
> WARN-ing at shmem's callsite.
>
> My thinking is that if we return -ENOMEM here, it will work in the
> current setup, for both shmem and other callsites. However, in the
> future, if we add another user of swap_duplicate_nr(), this time
> without guaranteeing that we won't need continuation, I think it won't
> work unless we have the fallback logic in place as well:
>
> while (!err && __swap_duplicate(entry, 1, nr) == -ENOMEM)
> err = add_swap_count_continuation(entry, GFP_ATOMIC);

Sorry, I accidentally sent out the email without completing my explanation :)

Anyway, the point being, with the current implementation, any new user
would immediately hit a WARN and the implementer will know to check.

Whereas if we return -ENOMEM in __swap_duplicate(), then I think we
would just hang, no? We only try to add swap count continuation to the
first entry, which is not sufficient to fix the problem.

I can probably whip up the fallback logic here, but it would be dead,
untestable code (as it has no users, and I cannot even conceive one to
test it). And the swap abstraction might render all of this moot
anyway.
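
To illustrate, this is the fallback from the patch above, annotated (the
annotations are mine, not part of the patch):

	while (!err && __swap_duplicate(entry, 1, nr) == -ENOMEM)
		/*
		 * This covers the first entry only. If a later entry in the
		 * range is the one that needs a continuation, the next
		 * __swap_duplicate() call still returns -ENOMEM and the loop
		 * never makes progress.
		 */
		err = add_swap_count_continuation(entry, GFP_ATOMIC);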


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/1] swap: shmem: remove SWAP_MAP_SHMEM
  2024-10-02  2:04       ` Nhat Pham
@ 2024-10-02  2:06         ` Yosry Ahmed
  2024-10-02  2:13           ` Yosry Ahmed
  0 siblings, 1 reply; 16+ messages in thread
From: Yosry Ahmed @ 2024-10-02  2:06 UTC (permalink / raw)
  To: Nhat Pham
  Cc: akpm, hannes, hughd, shakeel.butt, ryan.roberts, ying.huang,
	chrisl, david, kasong, willy, viro, baohua, chengming.zhou,
	v-songbaohua, linux-mm, kernel-team, linux-kernel

On Tue, Oct 1, 2024 at 7:04 PM Nhat Pham <nphamcs@gmail.com> wrote:
>
> On Tue, Oct 1, 2024 at 6:58 PM Nhat Pham <nphamcs@gmail.com> wrote:
> >
> > On Tue, Oct 1, 2024 at 6:33 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> >
> > I was debating between WARN-ing here, and returning -ENOMEM and
> > WARN-ing at shmem's callsite.
> >
> > My thinking is that if we return -ENOMEM here, it will work in the
> > current setup, for both shmem and other callsites. However, in the
> > future, if we add another user of swap_duplicate_nr(), this time
> > without guaranteeing that we won't need continuation, I think it won't
> > work unless we have the fallback logic in place as well:
> >
> > while (!err && __swap_duplicate(entry, 1, nr) == -ENOMEM)
> > err = add_swap_count_continuation(entry, GFP_ATOMIC);
>
> Sorry, I accidentally sent out the email without completing my explanation :)
>
> Anyway, the point being, with the current implementation, any new user
> would immediately hit a WARN and the implementer will know to check.
>
> Whereas if we return -ENOMEM in __swap_duplicate(), then I think we
> would just hang, no? We only try to add swap count continuation to the
> first entry only, which is not sufficient to fix the problem.
>
> I can probably whip up the fallback logic here, but it would be dead,
> untestable code (as it has no users, and I cannot even conceive one to
> test it). And the swap abstraction might render all of this moot
> anyway.

What I had in mind is not returning -ENOMEM at all, but something like
-EOPNOTSUPP. The swap_duplicate_nr() will just return the error to the
caller. All callers of swap_duplicate() and swap_duplicate_nr()
currently check the error except shmem.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/1] swap: shmem: remove SWAP_MAP_SHMEM
  2024-10-02  1:58     ` Nhat Pham
  2024-10-02  2:04       ` Nhat Pham
@ 2024-10-02  2:11       ` Yosry Ahmed
  1 sibling, 0 replies; 16+ messages in thread
From: Yosry Ahmed @ 2024-10-02  2:11 UTC (permalink / raw)
  To: Nhat Pham
  Cc: akpm, hannes, hughd, shakeel.butt, ryan.roberts, ying.huang,
	chrisl, david, kasong, willy, viro, baohua, chengming.zhou,
	v-songbaohua, linux-mm, kernel-team, linux-kernel

[..]
> > > + *
> > > + * @entry: first swap entry from which we want to increase the refcount.
> > > + * @nr: Number of entries in range.
> > > + *
> > >   * Returns 0 for success, or -ENOMEM if a swap_count_continuation is required
> > >   * but could not be atomically allocated.  Returns 0, just as if it succeeded,
> > >   * if __swap_duplicate() fails for another reason (-EINVAL or -ENOENT), which
> > >   * might occur if a page table entry has got corrupted.
> > > + *
> > > + * Note that we are currently not handling the case where nr > 1 and we need to
> > > + * add swap count continuation. This is OK, because no such user exists - shmem
> > > + * is the only user that can pass nr > 1, and it never re-duplicates any swap
> > > + * entry it owns.
> >
> > Do we need this comment when we have the WARN + comment in __swap_duplicate()?
>
> Here I'm just being cautious and include the limitation of the
> function in the API documentation itself.
>
> No strong opinions though.

Maybe it would be more useful to add a warning in the loop if nr > 1,
with a comment that explains that the current -ENOMEM handling does
not properly handle nr > 1?

> >
> > >   */
> > > -int swap_duplicate(swp_entry_t entry)
> > > +int swap_duplicate_nr(swp_entry_t entry, int nr)
> > >  {
> > >         int err = 0;
> > >
> > > -       while (!err && __swap_duplicate(entry, 1, 1) == -ENOMEM)
> > > +       while (!err && __swap_duplicate(entry, 1, nr) == -ENOMEM)
> > >                 err = add_swap_count_continuation(entry, GFP_ATOMIC);
> > >         return err;
> > >  }
> > > --
> > > 2.43.5


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/1] swap: shmem: remove SWAP_MAP_SHMEM
  2024-10-02  2:06         ` Yosry Ahmed
@ 2024-10-02  2:13           ` Yosry Ahmed
  2024-10-02 18:01             ` Nhat Pham
  0 siblings, 1 reply; 16+ messages in thread
From: Yosry Ahmed @ 2024-10-02  2:13 UTC (permalink / raw)
  To: Nhat Pham
  Cc: akpm, hannes, hughd, shakeel.butt, ryan.roberts, ying.huang,
	chrisl, david, kasong, willy, viro, baohua, chengming.zhou,
	v-songbaohua, linux-mm, kernel-team, linux-kernel

On Tue, Oct 1, 2024 at 7:06 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> On Tue, Oct 1, 2024 at 7:04 PM Nhat Pham <nphamcs@gmail.com> wrote:
> >
> > On Tue, Oct 1, 2024 at 6:58 PM Nhat Pham <nphamcs@gmail.com> wrote:
> > >
> > > On Tue, Oct 1, 2024 at 6:33 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> > >
> > > I was debating between WARN-ing here, and returning -ENOMEM and
> > > WARN-ing at shmem's callsite.
> > >
> > > My thinking is that if we return -ENOMEM here, it will work in the
> > > current setup, for both shmem and other callsites. However, in the
> > > future, if we add another user of swap_duplicate_nr(), this time
> > > without guaranteeing that we won't need continuation, I think it won't
> > > work unless we have the fallback logic in place as well:
> > >
> > > while (!err && __swap_duplicate(entry, 1, nr) == -ENOMEM)
> > > err = add_swap_count_continuation(entry, GFP_ATOMIC);
> >
> > Sorry, I accidentally sent out the email without completing my explanation :)
> >
> > Anyway, the point being, with the current implementation, any new user
> > would immediately hit a WARN and the implementer will know to check.
> >
> > Whereas if we return -ENOMEM in __swap_duplicate(), then I think we
> > would just hang, no? We only try to add swap count continuation to the
> > first entry only, which is not sufficient to fix the problem.
> >
> > I can probably whip up the fallback logic here, but it would be dead,
> > untestable code (as it has no users, and I cannot even conceive one to
> > test it). And the swap abstraction might render all of this moot
> > anyway.
>
> What I had in mind is not returning -ENOMEM at all, but something like
> -EOPNOTSUPP. The swap_duplicate_nr() will just return the error to the
> caller. All callers of swap_duplicate() and swap_duplicate_nr()
> currently check the error except shmem.

..and just to be extra clear, I meant WARN _and_ return -EOPNOTSUPP.
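
Something along these lines, as a rough sketch of the idea (not a tested
change) in the nr > 1 branch of __swap_duplicate():

		} else if ((count & ~COUNT_CONTINUED) > SWAP_MAP_MAX) {
			err = -EINVAL;
		} else if (nr > 1 && (count & ~COUNT_CONTINUED) == SWAP_MAP_MAX) {
			/* batched callers must never need a continuation */
			WARN_ON_ONCE(1);
			err = -EOPNOTSUPP;
		}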


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/1] swap: shmem: remove SWAP_MAP_SHMEM
  2024-10-02  2:13           ` Yosry Ahmed
@ 2024-10-02 18:01             ` Nhat Pham
  2024-10-02 18:06               ` Yosry Ahmed
  0 siblings, 1 reply; 16+ messages in thread
From: Nhat Pham @ 2024-10-02 18:01 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: akpm, hannes, hughd, shakeel.butt, ryan.roberts, ying.huang,
	chrisl, david, kasong, willy, viro, baohua, chengming.zhou,
	v-songbaohua, linux-mm, kernel-team, linux-kernel

On Tue, Oct 1, 2024 at 7:14 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> On Tue, Oct 1, 2024 at 7:06 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> >
> > On Tue, Oct 1, 2024 at 7:04 PM Nhat Pham <nphamcs@gmail.com> wrote:
> > >
> > > On Tue, Oct 1, 2024 at 6:58 PM Nhat Pham <nphamcs@gmail.com> wrote:
> > > >
> > > > On Tue, Oct 1, 2024 at 6:33 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> > > >
> > > > I was debating between WARN-ing here, and returning -ENOMEM and
> > > > WARN-ing at shmem's callsite.
> > > >
> > > > My thinking is that if we return -ENOMEM here, it will work in the
> > > > current setup, for both shmem and other callsites. However, in the
> > > > future, if we add another user of swap_duplicate_nr(), this time
> > > > without guaranteeing that we won't need continuation, I think it won't
> > > > work unless we have the fallback logic in place as well:
> > > >
> > > > while (!err && __swap_duplicate(entry, 1, nr) == -ENOMEM)
> > > > err = add_swap_count_continuation(entry, GFP_ATOMIC);
> > >
> > > Sorry, I accidentally sent out the email without completing my explanation :)
> > >
> > > Anyway, the point being, with the current implementation, any new user
> > > would immediately hit a WARN and the implementer will know to check.
> > >
> > > Whereas if we return -ENOMEM in __swap_duplicate(), then I think we
> > > would just hang, no? We only try to add swap count continuation to the
> > > first entry only, which is not sufficient to fix the problem.
> > >
> > > I can probably whip up the fallback logic here, but it would be dead,
> > > untestable code (as it has no users, and I cannot even conceive one to
> > > test it). And the swap abstraction might render all of this moot
> > > anyway.
> >
> > What I had in mind is not returning -ENOMEM at all, but something like
> > -EOPNOTSUPP. The swap_duplicate_nr() will just return the error to the
> > caller. All callers of swap_duplicate() and swap_duplicate_nr()
> > currently check the error except shmem.
>
> ..and just to be extra clear, I meant WARN _and_ return -EOPNOTSUPP.

Ah ok this makes a lot of sense actually.

I'll return -EOPNOTSUPP here. Do you think warning within
__swap_duplicate() makes more sense, or warning at shmem's callsite?

I feel like we should warn within __swap_duplicate() itself. That way
if we accidentally screw up for other swap_duplicaters in the future,
the feedback will be immediate :)


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/1] swap: shmem: remove SWAP_MAP_SHMEM
  2024-10-02 18:01             ` Nhat Pham
@ 2024-10-02 18:06               ` Yosry Ahmed
  0 siblings, 0 replies; 16+ messages in thread
From: Yosry Ahmed @ 2024-10-02 18:06 UTC (permalink / raw)
  To: Nhat Pham
  Cc: akpm, hannes, hughd, shakeel.butt, ryan.roberts, ying.huang,
	chrisl, david, kasong, willy, viro, baohua, chengming.zhou,
	v-songbaohua, linux-mm, kernel-team, linux-kernel

On Wed, Oct 2, 2024 at 11:01 AM Nhat Pham <nphamcs@gmail.com> wrote:
>
> On Tue, Oct 1, 2024 at 7:14 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> >
> > On Tue, Oct 1, 2024 at 7:06 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> > >
> > > On Tue, Oct 1, 2024 at 7:04 PM Nhat Pham <nphamcs@gmail.com> wrote:
> > > >
> > > > On Tue, Oct 1, 2024 at 6:58 PM Nhat Pham <nphamcs@gmail.com> wrote:
> > > > >
> > > > > On Tue, Oct 1, 2024 at 6:33 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> > > > >
> > > > > I was debating between WARN-ing here, and returning -ENOMEM and
> > > > > WARN-ing at shmem's callsite.
> > > > >
> > > > > My thinking is that if we return -ENOMEM here, it will work in the
> > > > > current setup, for both shmem and other callsites. However, in the
> > > > > future, if we add another user of swap_duplicate_nr(), this time
> > > > > without guaranteeing that we won't need continuation, I think it won't
> > > > > work unless we have the fallback logic in place as well:
> > > > >
> > > > > while (!err && __swap_duplicate(entry, 1, nr) == -ENOMEM)
> > > > > err = add_swap_count_continuation(entry, GFP_ATOMIC);
> > > >
> > > > Sorry, I accidentally sent out the email without completing my explanation :)
> > > >
> > > > Anyway, the point being, with the current implementation, any new user
> > > > would immediately hit a WARN and the implementer will know to check.
> > > >
> > > > Whereas if we return -ENOMEM in __swap_duplicate(), then I think we
> > > > would just hang, no? We only try to add swap count continuation to the
> > > > first entry only, which is not sufficient to fix the problem.
> > > >
> > > > I can probably whip up the fallback logic here, but it would be dead,
> > > > untestable code (as it has no users, and I cannot even conceive one to
> > > > test it). And the swap abstraction might render all of this moot
> > > > anyway.
> > >
> > > What I had in mind is not returning -ENOMEM at all, but something like
> > > -EOPNOTSUPP. The swap_duplicate_nr() will just return the error to the
> > > caller. All callers of swap_duplicate() and swap_duplicate_nr()
> > > currently check the error except shmem.
> >
> > ..and just to be extra clear, I meant WARN _and_ return -EOPNOTSUPP.
>
> Ah ok this makes a lot of sense actually.
>
> I'll return -EOPNOTSUPP here. Do you think warn within
> __swap_duplicate() makes more sense, or at shmem's callsite make more
> sense?
>
> I feel like we should warn within __swap_duplicate callsite. That way
> if we accidentally screw up for other swap_duplicaters in the future,
> the feedback will be immediate :)

I think we should warn in __swap_duplicate(). We can also propagate
the error from shmem_writepage() to the caller, but I think this may
need extra cleanup to be properly handled, didn't look too closely.

We can also warn in swap_duplicate_nr() if we ever reach the -ENOMEM
fallback code with nr > 1, and document there that the current
fallback logic does not handle this case (instead of documenting it
above the function). This will make sure we never return -ENOMEM from
__swap_duplicate() incorrectly.
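
Roughly, as a sketch of that last bit on top of the swap_duplicate_nr()
from this patch (untested, just to show what I mean):

	int swap_duplicate_nr(swp_entry_t entry, int nr)
	{
		int err = 0;

		while (!err && __swap_duplicate(entry, 1, nr) == -ENOMEM) {
			/*
			 * The continuation fallback only covers the first
			 * entry, so it cannot handle batched (nr > 1) callers.
			 */
			VM_WARN_ON_ONCE(nr > 1);
			err = add_swap_count_continuation(entry, GFP_ATOMIC);
		}
		return err;
	}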


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 0/1] remove SWAP_MAP_SHMEM
  2024-10-02  1:22 ` [PATCH v2 0/1] " Nhat Pham
  2024-10-02  1:25   ` Nhat Pham
@ 2024-10-08  9:27   ` Baolin Wang
  2024-10-10  8:53   ` Baolin Wang
  2 siblings, 0 replies; 16+ messages in thread
From: Baolin Wang @ 2024-10-08  9:27 UTC (permalink / raw)
  To: Nhat Pham, akpm
  Cc: hannes, yosryahmed, hughd, shakeel.butt, ryan.roberts, ying.huang,
	chrisl, david, kasong, willy, viro, baohua, chengming.zhou,
	v-songbaohua, linux-mm, kernel-team, linux-kernel



On 2024/10/2 09:22, Nhat Pham wrote:
> On Tue, Oct 1, 2024 at 6:20 PM Nhat Pham <nphamcs@gmail.com> wrote:
>>
>> Changelog:
>> v2:
>>          * Fix a WARN in the shmem THP swap path. Thanks Baolin, Yosry, and Barry
>>          for the report and the discussion on how to solve it.
> 
> Hi Baolin, could you run your stress program again on this version?
> FWIW, I was able to trigger 64kb mTHP swap in shmem + no-fallback
> swapout, so I think the issue is fixed.

Sure. I'm just back from vacation, and will find some time this week to 
test it.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 0/1] remove SWAP_MAP_SHMEM
  2024-10-02  1:22 ` [PATCH v2 0/1] " Nhat Pham
  2024-10-02  1:25   ` Nhat Pham
  2024-10-08  9:27   ` Baolin Wang
@ 2024-10-10  8:53   ` Baolin Wang
  2 siblings, 0 replies; 16+ messages in thread
From: Baolin Wang @ 2024-10-10  8:53 UTC (permalink / raw)
  To: Nhat Pham, akpm
  Cc: hannes, yosryahmed, hughd, shakeel.butt, ryan.roberts, ying.huang,
	chrisl, david, kasong, willy, viro, baohua, chengming.zhou,
	v-songbaohua, linux-mm, kernel-team, linux-kernel



On 2024/10/2 09:22, Nhat Pham wrote:
> On Tue, Oct 1, 2024 at 6:20 PM Nhat Pham <nphamcs@gmail.com> wrote:
>>
>> Changelog:
>> v2:
>>          * Fix a WARN in the shmem THP swap path. Thanks Baolin, Yosry, and Barry
>>          for the report and the discussion on how to solve it.
> 
> Hi Baolin, could you run your stress program again on this version?
> FWIW, I was able to trigger 64kb mTHP swap in shmem + no-fallback
> swapout, so I think the issue is fixed.

I tested it on my platform, and no issues were found. So

Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/1] swap: shmem: remove SWAP_MAP_SHMEM
  2024-10-02  1:20 ` [PATCH v2 1/1] swap: shmem: " Nhat Pham
  2024-10-02  1:33   ` Yosry Ahmed
@ 2024-10-11  6:35   ` Huang, Ying
  2024-10-11 15:56     ` Nhat Pham
  1 sibling, 1 reply; 16+ messages in thread
From: Huang, Ying @ 2024-10-11  6:35 UTC (permalink / raw)
  To: Nhat Pham
  Cc: akpm, hannes, yosryahmed, hughd, shakeel.butt, ryan.roberts,
	chrisl, david, kasong, willy, viro, baohua, chengming.zhou,
	v-songbaohua, linux-mm, kernel-team, linux-kernel

Nhat Pham <nphamcs@gmail.com> writes:

[snip]

> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 0cded32414a1..9bb94e618914 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1381,12 +1381,6 @@ static unsigned char __swap_entry_free_locked(struct swap_info_struct *si,
>  	if (usage == SWAP_HAS_CACHE) {
>  		VM_BUG_ON(!has_cache);
>  		has_cache = 0;
> -	} else if (count == SWAP_MAP_SHMEM) {
> -		/*
> -		 * Or we could insist on shmem.c using a special
> -		 * swap_shmem_free() and free_shmem_swap_and_cache()...
> -		 */
> -		count = 0;
>  	} else if ((count & ~COUNT_CONTINUED) <= SWAP_MAP_MAX) {
>  		if (count == COUNT_CONTINUED) {
>  			if (swap_count_continued(si, offset, count))
> @@ -3626,7 +3620,6 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr)
>  
>  	offset = swp_offset(entry);
>  	VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER);
> -	VM_WARN_ON(usage == 1 && nr > 1);
>  	ci = lock_cluster_or_swap_info(si, offset);
>  
>  	err = 0;
> @@ -3652,6 +3645,13 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr)
>  				err = -EEXIST;
>  		} else if ((count & ~COUNT_CONTINUED) > SWAP_MAP_MAX) {
>  			err = -EINVAL;
> +		} else {
> +			/*
> +			 * The only swap_duplicate_nr() caller that passes nr > 1 is shmem,
> +			 * who never re-duplicates any swap entry it owns. So this should
> +			 * not happen.
> +			 */
> +			VM_WARN_ON(nr > 1 && (count & ~COUNT_CONTINUED) == SWAP_MAP_MAX);

Why not

	VM_WARN_ON_ONCE(nr > 1 && count);

?

IIUC, count == 0 is always true for shmem swap entry allocation.  Then
developers who use __swap_duplicate() with nr > 1 without noticing the
unsupported feature can get a warning during development immediately.
"(count & ~COUNT_CONTINUED) == SWAP_MAP_MAX" is hard to trigger
during common swap tests.

>  		}
>  
>  		if (err)
> @@ -3686,27 +3686,28 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr)
>  	return err;
>  }
  
[snip]

--
Best Regards,
Huang, Ying


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/1] swap: shmem: remove SWAP_MAP_SHMEM
  2024-10-11  6:35   ` Huang, Ying
@ 2024-10-11 15:56     ` Nhat Pham
  0 siblings, 0 replies; 16+ messages in thread
From: Nhat Pham @ 2024-10-11 15:56 UTC (permalink / raw)
  To: Huang, Ying
  Cc: akpm, hannes, yosryahmed, hughd, shakeel.butt, ryan.roberts,
	chrisl, david, kasong, willy, viro, baohua, chengming.zhou,
	v-songbaohua, linux-mm, kernel-team, linux-kernel

On Fri, Oct 11, 2024 at 2:39 AM Huang, Ying <ying.huang@intel.com> wrote:
>
> Nhat Pham <nphamcs@gmail.com> writes:
>
> [snip]
>
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index 0cded32414a1..9bb94e618914 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -1381,12 +1381,6 @@ static unsigned char __swap_entry_free_locked(struct swap_info_struct *si,
> >       if (usage == SWAP_HAS_CACHE) {
> >               VM_BUG_ON(!has_cache);
> >               has_cache = 0;
> > -     } else if (count == SWAP_MAP_SHMEM) {
> > -             /*
> > -              * Or we could insist on shmem.c using a special
> > -              * swap_shmem_free() and free_shmem_swap_and_cache()...
> > -              */
> > -             count = 0;
> >       } else if ((count & ~COUNT_CONTINUED) <= SWAP_MAP_MAX) {
> >               if (count == COUNT_CONTINUED) {
> >                       if (swap_count_continued(si, offset, count))
> > @@ -3626,7 +3620,6 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr)
> >
> >       offset = swp_offset(entry);
> >       VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER);
> > -     VM_WARN_ON(usage == 1 && nr > 1);
> >       ci = lock_cluster_or_swap_info(si, offset);
> >
> >       err = 0;
> > @@ -3652,6 +3645,13 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr)
> >                               err = -EEXIST;
> >               } else if ((count & ~COUNT_CONTINUED) > SWAP_MAP_MAX) {
> >                       err = -EINVAL;
> > +             } else {
> > +                     /*
> > +                      * The only swap_duplicate_nr() caller that passes nr > 1 is shmem,
> > +                      * who never re-duplicates any swap entry it owns. So this should
> > +                      * not happen.
> > +                      */
> > +                     VM_WARN_ON(nr > 1 && (count & ~COUNT_CONTINUED) == SWAP_MAP_MAX);
>
> Why not
>
>         VM_WARN_ON_ONCE(nr > 1 && count);
>
> ?
>
> IIUC, count == 0 is always true for shmem swap entry allocation.  Then
> developers who use __swap_duplicate() with nr > 1 without noticing the
> unsupported feature can get warning during development immediately.
> "(count & ~COUNT_CONTINUED) == SWAP_MAP_MAX" is hard to be triggered
> during common swap test.

Yeah honestly, I agree with this. Let's not try to be smart and make
provision for things that can't happen (and make it harder to catch
issues in the future).

Thanks for the suggestion, Ying. I'll just do this in next version.

>
> >               }
> >
> >               if (err)
> > @@ -3686,27 +3686,28 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr)
> >       return err;
> >  }
>
> [snip]
>
> --
> Best Regards,
> Huang, Ying


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread

Thread overview: 16+ messages
2024-10-02  1:20 [PATCH v2 0/1] remove SWAP_MAP_SHMEM Nhat Pham
2024-10-02  1:20 ` [PATCH v2 1/1] swap: shmem: " Nhat Pham
2024-10-02  1:33   ` Yosry Ahmed
2024-10-02  1:58     ` Nhat Pham
2024-10-02  2:04       ` Nhat Pham
2024-10-02  2:06         ` Yosry Ahmed
2024-10-02  2:13           ` Yosry Ahmed
2024-10-02 18:01             ` Nhat Pham
2024-10-02 18:06               ` Yosry Ahmed
2024-10-02  2:11       ` Yosry Ahmed
2024-10-11  6:35   ` Huang, Ying
2024-10-11 15:56     ` Nhat Pham
2024-10-02  1:22 ` [PATCH v2 0/1] " Nhat Pham
2024-10-02  1:25   ` Nhat Pham
2024-10-08  9:27   ` Baolin Wang
2024-10-10  8:53   ` Baolin Wang
