* [PATCH 0/4] Some randome fixes and cleanups to swapfile
@ 2025-05-22 12:25 Kemeng Shi
2025-05-22 12:25 ` [PATCH 1/4] mm: swap: move nr_swap_pages counter decrement from folio_alloc_swap() to swap_range_alloc() Kemeng Shi
` (4 more replies)
0 siblings, 5 replies; 20+ messages in thread
From: Kemeng Shi @ 2025-05-22 12:25 UTC (permalink / raw)
To: akpm; +Cc: kasong, bhe, hannes, linux-mm, linux-kernel
Patch 0-3 are some random fixes. Patch 4 is a cleanup. More details can
be found in respective patches. Thanks.
Kemeng Shi (4):
mm: swap: move nr_swap_pages counter decrement from folio_alloc_swap()
to swap_range_alloc()
mm: swap: correctly use maxpages in swapon syscall to avoid potensial
deadloop
mm: swap: fix potensial buffer overflow in setup_clusters()
mm: swap: remove stale comment stale comment in
cluster_alloc_swap_entry()
mm/swapfile.c | 64 ++++++++++++++++++++++++---------------------------
1 file changed, 30 insertions(+), 34 deletions(-)
--
2.30.0
^ permalink raw reply [flat|nested] 20+ messages in thread* [PATCH 1/4] mm: swap: move nr_swap_pages counter decrement from folio_alloc_swap() to swap_range_alloc() 2025-05-22 12:25 [PATCH 0/4] Some randome fixes and cleanups to swapfile Kemeng Shi @ 2025-05-22 12:25 ` Kemeng Shi 2025-05-22 3:55 ` Kairui Song 2025-05-30 1:31 ` Baoquan He 2025-05-22 12:25 ` [PATCH 2/4] mm: swap: correctly use maxpages in swapon syscall to avoid potensial deadloop Kemeng Shi ` (3 subsequent siblings) 4 siblings, 2 replies; 20+ messages in thread From: Kemeng Shi @ 2025-05-22 12:25 UTC (permalink / raw) To: akpm; +Cc: kasong, bhe, hannes, linux-mm, linux-kernel When folio_alloc_swap() encounters a failure in either mem_cgroup_try_charge_swap() or add_to_swap_cache(), nr_swap_pages counter is not decremented for allocated entry. However, the following put_swap_folio() will increase nr_swap_pages counter unpairly and lead to an imbalance. Move nr_swap_pages decrement from folio_alloc_swap() to swap_range_alloc() to pair the nr_swap_pages counting. Fixes: 0ff67f990bd45 ("mm, swap: remove swap slot cache") Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> --- mm/swapfile.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/swapfile.c b/mm/swapfile.c index 026090bf3efe..75b69213c2e7 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1115,6 +1115,7 @@ static void swap_range_alloc(struct swap_info_struct *si, if (vm_swap_full()) schedule_work(&si->reclaim_work); } + atomic_long_sub(nr_entries, &nr_swap_pages); } static void swap_range_free(struct swap_info_struct *si, unsigned long offset, @@ -1313,7 +1314,6 @@ int folio_alloc_swap(struct folio *folio, gfp_t gfp) if (add_to_swap_cache(folio, entry, gfp | __GFP_NOMEMALLOC, NULL)) goto out_free; - atomic_long_sub(size, &nr_swap_pages); return 0; out_free: -- 2.30.0 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH 1/4] mm: swap: move nr_swap_pages counter decrement from folio_alloc_swap() to swap_range_alloc() 2025-05-22 12:25 ` [PATCH 1/4] mm: swap: move nr_swap_pages counter decrement from folio_alloc_swap() to swap_range_alloc() Kemeng Shi @ 2025-05-22 3:55 ` Kairui Song 2025-05-30 1:31 ` Baoquan He 1 sibling, 0 replies; 20+ messages in thread From: Kairui Song @ 2025-05-22 3:55 UTC (permalink / raw) To: Kemeng Shi; +Cc: akpm, bhe, hannes, linux-mm, linux-kernel On Thu, May 22, 2025 at 11:32 AM Kemeng Shi <shikemeng@huaweicloud.com> wrote: > > When folio_alloc_swap() encounters a failure in either > mem_cgroup_try_charge_swap() or add_to_swap_cache(), nr_swap_pages counter > is not decremented for allocated entry. However, the following > put_swap_folio() will increase nr_swap_pages counter unpairly and lead to > an imbalance. > > Move nr_swap_pages decrement from folio_alloc_swap() to swap_range_alloc() > to pair the nr_swap_pages counting. > > Fixes: 0ff67f990bd45 ("mm, swap: remove swap slot cache") > Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> > --- > mm/swapfile.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/mm/swapfile.c b/mm/swapfile.c > index 026090bf3efe..75b69213c2e7 100644 > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -1115,6 +1115,7 @@ static void swap_range_alloc(struct swap_info_struct *si, > if (vm_swap_full()) > schedule_work(&si->reclaim_work); > } > + atomic_long_sub(nr_entries, &nr_swap_pages); > } > > static void swap_range_free(struct swap_info_struct *si, unsigned long offset, > @@ -1313,7 +1314,6 @@ int folio_alloc_swap(struct folio *folio, gfp_t gfp) > if (add_to_swap_cache(folio, entry, gfp | __GFP_NOMEMALLOC, NULL)) > goto out_free; > > - atomic_long_sub(size, &nr_swap_pages); > return 0; > > out_free: > -- > 2.30.0 Good catch! Moving the counter update to swap_range_alloc makes the logic much easier to follow. Reviewed-by: Kairui Song <kasong@tencent.com> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/4] mm: swap: move nr_swap_pages counter decrement from folio_alloc_swap() to swap_range_alloc() 2025-05-22 12:25 ` [PATCH 1/4] mm: swap: move nr_swap_pages counter decrement from folio_alloc_swap() to swap_range_alloc() Kemeng Shi 2025-05-22 3:55 ` Kairui Song @ 2025-05-30 1:31 ` Baoquan He 1 sibling, 0 replies; 20+ messages in thread From: Baoquan He @ 2025-05-30 1:31 UTC (permalink / raw) To: Kemeng Shi; +Cc: akpm, kasong, hannes, linux-mm, linux-kernel On 05/22/25 at 08:25pm, Kemeng Shi wrote: > When folio_alloc_swap() encounters a failure in either > mem_cgroup_try_charge_swap() or add_to_swap_cache(), nr_swap_pages counter > is not decremented for allocated entry. However, the following > put_swap_folio() will increase nr_swap_pages counter unpairly and lead to > an imbalance. > > Move nr_swap_pages decrement from folio_alloc_swap() to swap_range_alloc() > to pair the nr_swap_pages counting. > > Fixes: 0ff67f990bd45 ("mm, swap: remove swap slot cache") > Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> > --- > mm/swapfile.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) LGTM, Reviewed-by: Baoquan He <bhe@redhat.com> > > diff --git a/mm/swapfile.c b/mm/swapfile.c > index 026090bf3efe..75b69213c2e7 100644 > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -1115,6 +1115,7 @@ static void swap_range_alloc(struct swap_info_struct *si, > if (vm_swap_full()) > schedule_work(&si->reclaim_work); > } > + atomic_long_sub(nr_entries, &nr_swap_pages); > } > > static void swap_range_free(struct swap_info_struct *si, unsigned long offset, > @@ -1313,7 +1314,6 @@ int folio_alloc_swap(struct folio *folio, gfp_t gfp) > if (add_to_swap_cache(folio, entry, gfp | __GFP_NOMEMALLOC, NULL)) > goto out_free; > > - atomic_long_sub(size, &nr_swap_pages); > return 0; > > out_free: > -- > 2.30.0 > ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 2/4] mm: swap: correctly use maxpages in swapon syscall to avoid potensial deadloop 2025-05-22 12:25 [PATCH 0/4] Some randome fixes and cleanups to swapfile Kemeng Shi 2025-05-22 12:25 ` [PATCH 1/4] mm: swap: move nr_swap_pages counter decrement from folio_alloc_swap() to swap_range_alloc() Kemeng Shi @ 2025-05-22 12:25 ` Kemeng Shi 2025-05-25 17:08 ` Kairui Song 2025-05-30 2:50 ` Baoquan He 2025-05-22 12:25 ` [PATCH 3/4] mm: swap: fix potensial buffer overflow in setup_clusters() Kemeng Shi ` (2 subsequent siblings) 4 siblings, 2 replies; 20+ messages in thread From: Kemeng Shi @ 2025-05-22 12:25 UTC (permalink / raw) To: akpm; +Cc: kasong, bhe, hannes, linux-mm, linux-kernel We use maxpages from read_swap_header() to initialize swap_info_struct, however the maxpages might be reduced in setup_swap_extents() and the si->max is assigned with the reduced maxpages from the setup_swap_extents(). Obviously, this could lead to memory waste as we allocated memory based on larger maxpages, besides, this could lead to a potensial deadloop as following: 1) When calling setup_clusters() with larger maxpages, unavailable pages within range [si->max, larger maxpages) are not accounted with inc_cluster_info_page(). As a result, these pages are assumed available but can not be allocated. The cluster contains these pages can be moved to frag_clusters list after it's all available pages were allocated. 2) When the cluster mentioned in 1) is the only cluster in frag_clusters list, cluster_alloc_swap_entry() assume order 0 allocation will never failed and will enter a deadloop by keep trying to allocate page from the only cluster in frag_clusters which contains no actually available page. Call setup_swap_extents() to get the final maxpages before swap_info_struct initialization to fix the issue. Fixes: 661383c6111a3 ("mm: swap: relaim the cached parts that got scanned") Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> --- mm/swapfile.c | 47 ++++++++++++++++++++--------------------------- 1 file changed, 20 insertions(+), 27 deletions(-) diff --git a/mm/swapfile.c b/mm/swapfile.c index 75b69213c2e7..a82f4ebefca3 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -3141,43 +3141,30 @@ static unsigned long read_swap_header(struct swap_info_struct *si, return maxpages; } -static int setup_swap_map_and_extents(struct swap_info_struct *si, - union swap_header *swap_header, - unsigned char *swap_map, - unsigned long maxpages, - sector_t *span) +static int setup_swap_map(struct swap_info_struct *si, + union swap_header *swap_header, + unsigned char *swap_map, + unsigned long maxpages) { - unsigned int nr_good_pages; unsigned long i; - int nr_extents; - - nr_good_pages = maxpages - 1; /* omit header page */ + swap_map[0] = SWAP_MAP_BAD; /* omit header page */ for (i = 0; i < swap_header->info.nr_badpages; i++) { unsigned int page_nr = swap_header->info.badpages[i]; if (page_nr == 0 || page_nr > swap_header->info.last_page) return -EINVAL; if (page_nr < maxpages) { swap_map[page_nr] = SWAP_MAP_BAD; - nr_good_pages--; + si->pages--; } } - if (nr_good_pages) { - swap_map[0] = SWAP_MAP_BAD; - si->max = maxpages; - si->pages = nr_good_pages; - nr_extents = setup_swap_extents(si, span); - if (nr_extents < 0) - return nr_extents; - nr_good_pages = si->pages; - } - if (!nr_good_pages) { + if (!si->pages) { pr_warn("Empty swap-file\n"); return -EINVAL; } - return nr_extents; + return 0; } #define SWAP_CLUSTER_INFO_COLS \ @@ -3217,7 +3204,7 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si, * Mark unusable pages as unavailable. The clusters aren't * marked free yet, so no list operations are involved yet. * - * See setup_swap_map_and_extents(): header page, bad pages, + * See setup_swap_map(): header page, bad pages, * and the EOF part of the last cluster. */ inc_cluster_info_page(si, cluster_info, 0); @@ -3354,6 +3341,15 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags) goto bad_swap_unlock_inode; } + si->max = maxpages; + si->pages = maxpages - 1; + nr_extents = setup_swap_extents(si, &span); + if (nr_extents < 0) { + error = nr_extents; + goto bad_swap_unlock_inode; + } + maxpages = si->max; + /* OK, set up the swap map and apply the bad block list */ swap_map = vzalloc(maxpages); if (!swap_map) { @@ -3365,12 +3361,9 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags) if (error) goto bad_swap_unlock_inode; - nr_extents = setup_swap_map_and_extents(si, swap_header, swap_map, - maxpages, &span); - if (unlikely(nr_extents < 0)) { - error = nr_extents; + error = setup_swap_map(si, swap_header, swap_map, maxpages); + if (error) goto bad_swap_unlock_inode; - } /* * Use kvmalloc_array instead of bitmap_zalloc as the allocation order might -- 2.30.0 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH 2/4] mm: swap: correctly use maxpages in swapon syscall to avoid potensial deadloop 2025-05-22 12:25 ` [PATCH 2/4] mm: swap: correctly use maxpages in swapon syscall to avoid potensial deadloop Kemeng Shi @ 2025-05-25 17:08 ` Kairui Song 2025-06-11 7:54 ` Kemeng Shi 2025-05-30 2:50 ` Baoquan He 1 sibling, 1 reply; 20+ messages in thread From: Kairui Song @ 2025-05-25 17:08 UTC (permalink / raw) To: Kemeng Shi; +Cc: akpm, bhe, hannes, linux-mm, linux-kernel On Thu, May 22, 2025 at 11:32 AM Kemeng Shi <shikemeng@huaweicloud.com> wrote: > > We use maxpages from read_swap_header() to initialize swap_info_struct, > however the maxpages might be reduced in setup_swap_extents() and the > si->max is assigned with the reduced maxpages from the > setup_swap_extents(). > > Obviously, this could lead to memory waste as we allocated memory based on > larger maxpages, besides, this could lead to a potensial deadloop as > following: > 1) When calling setup_clusters() with larger maxpages, unavailable pages > within range [si->max, larger maxpages) are not accounted with > inc_cluster_info_page(). As a result, these pages are assumed available > but can not be allocated. The cluster contains these pages can be moved > to frag_clusters list after it's all available pages were allocated. > 2) When the cluster mentioned in 1) is the only cluster in frag_clusters > list, cluster_alloc_swap_entry() assume order 0 allocation will never > failed and will enter a deadloop by keep trying to allocate page from the > only cluster in frag_clusters which contains no actually available page. > > Call setup_swap_extents() to get the final maxpages before swap_info_struct > initialization to fix the issue. > > Fixes: 661383c6111a3 ("mm: swap: relaim the cached parts that got scanned") > > Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> > --- > mm/swapfile.c | 47 ++++++++++++++++++++--------------------------- > 1 file changed, 20 insertions(+), 27 deletions(-) > > diff --git a/mm/swapfile.c b/mm/swapfile.c > index 75b69213c2e7..a82f4ebefca3 100644 > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -3141,43 +3141,30 @@ static unsigned long read_swap_header(struct swap_info_struct *si, > return maxpages; > } > > -static int setup_swap_map_and_extents(struct swap_info_struct *si, > - union swap_header *swap_header, > - unsigned char *swap_map, > - unsigned long maxpages, > - sector_t *span) > +static int setup_swap_map(struct swap_info_struct *si, > + union swap_header *swap_header, > + unsigned char *swap_map, > + unsigned long maxpages) > { > - unsigned int nr_good_pages; > unsigned long i; > - int nr_extents; > - > - nr_good_pages = maxpages - 1; /* omit header page */ > > + swap_map[0] = SWAP_MAP_BAD; /* omit header page */ > for (i = 0; i < swap_header->info.nr_badpages; i++) { > unsigned int page_nr = swap_header->info.badpages[i]; > if (page_nr == 0 || page_nr > swap_header->info.last_page) > return -EINVAL; > if (page_nr < maxpages) { > swap_map[page_nr] = SWAP_MAP_BAD; > - nr_good_pages--; > + si->pages--; > } > } > > - if (nr_good_pages) { > - swap_map[0] = SWAP_MAP_BAD; > - si->max = maxpages; > - si->pages = nr_good_pages; > - nr_extents = setup_swap_extents(si, span); > - if (nr_extents < 0) > - return nr_extents; > - nr_good_pages = si->pages; > - } > - if (!nr_good_pages) { > + if (!si->pages) { > pr_warn("Empty swap-file\n"); > return -EINVAL; > } > > > - return nr_extents; > + return 0; > } > > #define SWAP_CLUSTER_INFO_COLS \ > @@ -3217,7 +3204,7 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si, > * Mark unusable pages as unavailable. The clusters aren't > * marked free yet, so no list operations are involved yet. > * > - * See setup_swap_map_and_extents(): header page, bad pages, > + * See setup_swap_map(): header page, bad pages, > * and the EOF part of the last cluster. > */ > inc_cluster_info_page(si, cluster_info, 0); > @@ -3354,6 +3341,15 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags) > goto bad_swap_unlock_inode; > } > > + si->max = maxpages; > + si->pages = maxpages - 1; > + nr_extents = setup_swap_extents(si, &span); > + if (nr_extents < 0) { > + error = nr_extents; > + goto bad_swap_unlock_inode; > + } > + maxpages = si->max; There seems to be a trivial problem here, previously the si->pages will be seen by swap_activate after bad blocks have been counted and si->pages means the actual available slots. But now si->pages will be seen by swap_active as `maxpages - 1`. One current side effect now is the span value will not be updated properly so the pr_info in swap on may print a larger value, if the swap header contains badblocks and swapfile is on nfs/cifs. This should not be a problem but it's better to mention or add comments about it. And I think it's better to add a sanity check here to check if si->pages still equal to si->max - 1, setup_swap_map_and_extents / setup_swap_map assumes the header section was already counted. This also helps indicate the setup_swap_extents may shrink and modify these two values. BTW, I was thinking that we should get rid of the whole extents design after the swap table series is ready, so mTHP allocation will be usable for swap over fs too. > /* OK, set up the swap map and apply the bad block list */ > swap_map = vzalloc(maxpages); > if (!swap_map) { > @@ -3365,12 +3361,9 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags) > if (error) > goto bad_swap_unlock_inode; > > - nr_extents = setup_swap_map_and_extents(si, swap_header, swap_map, > - maxpages, &span); > - if (unlikely(nr_extents < 0)) { > - error = nr_extents; > + error = setup_swap_map(si, swap_header, swap_map, maxpages); > + if (error) > goto bad_swap_unlock_inode; > - } > > /* > * Use kvmalloc_array instead of bitmap_zalloc as the allocation order might > -- > 2.30.0 > Other than that: Reviewed-by: Kairui Song <kasong@tencent.com> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/4] mm: swap: correctly use maxpages in swapon syscall to avoid potensial deadloop 2025-05-25 17:08 ` Kairui Song @ 2025-06-11 7:54 ` Kemeng Shi 2025-07-17 23:21 ` Andrew Morton 0 siblings, 1 reply; 20+ messages in thread From: Kemeng Shi @ 2025-06-11 7:54 UTC (permalink / raw) To: Kairui Song; +Cc: akpm, bhe, hannes, linux-mm, linux-kernel on 5/26/2025 1:08 AM, Kairui Song wrote: > On Thu, May 22, 2025 at 11:32 AM Kemeng Shi <shikemeng@huaweicloud.com> wrote: >> >> We use maxpages from read_swap_header() to initialize swap_info_struct, >> however the maxpages might be reduced in setup_swap_extents() and the >> si->max is assigned with the reduced maxpages from the >> setup_swap_extents(). >> >> Obviously, this could lead to memory waste as we allocated memory based on >> larger maxpages, besides, this could lead to a potensial deadloop as >> following: >> 1) When calling setup_clusters() with larger maxpages, unavailable pages >> within range [si->max, larger maxpages) are not accounted with >> inc_cluster_info_page(). As a result, these pages are assumed available >> but can not be allocated. The cluster contains these pages can be moved >> to frag_clusters list after it's all available pages were allocated. >> 2) When the cluster mentioned in 1) is the only cluster in frag_clusters >> list, cluster_alloc_swap_entry() assume order 0 allocation will never >> failed and will enter a deadloop by keep trying to allocate page from the >> only cluster in frag_clusters which contains no actually available page. >> >> Call setup_swap_extents() to get the final maxpages before swap_info_struct >> initialization to fix the issue. >> >> Fixes: 661383c6111a3 ("mm: swap: relaim the cached parts that got scanned") >> >> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> >> --- >> mm/swapfile.c | 47 ++++++++++++++++++++--------------------------- >> 1 file changed, 20 insertions(+), 27 deletions(-) >> >> diff --git a/mm/swapfile.c b/mm/swapfile.c >> index 75b69213c2e7..a82f4ebefca3 100644 >> --- a/mm/swapfile.c >> +++ b/mm/swapfile.c >> @@ -3141,43 +3141,30 @@ static unsigned long read_swap_header(struct swap_info_struct *si, >> return maxpages; >> } >> >> -static int setup_swap_map_and_extents(struct swap_info_struct *si, >> - union swap_header *swap_header, >> - unsigned char *swap_map, >> - unsigned long maxpages, >> - sector_t *span) >> +static int setup_swap_map(struct swap_info_struct *si, >> + union swap_header *swap_header, >> + unsigned char *swap_map, >> + unsigned long maxpages) >> { >> - unsigned int nr_good_pages; >> unsigned long i; >> - int nr_extents; >> - >> - nr_good_pages = maxpages - 1; /* omit header page */ >> >> + swap_map[0] = SWAP_MAP_BAD; /* omit header page */ >> for (i = 0; i < swap_header->info.nr_badpages; i++) { >> unsigned int page_nr = swap_header->info.badpages[i]; >> if (page_nr == 0 || page_nr > swap_header->info.last_page) >> return -EINVAL; >> if (page_nr < maxpages) { >> swap_map[page_nr] = SWAP_MAP_BAD; >> - nr_good_pages--; >> + si->pages--; >> } >> } >> >> - if (nr_good_pages) { >> - swap_map[0] = SWAP_MAP_BAD; >> - si->max = maxpages; >> - si->pages = nr_good_pages; >> - nr_extents = setup_swap_extents(si, span); >> - if (nr_extents < 0) >> - return nr_extents; >> - nr_good_pages = si->pages; >> - } >> - if (!nr_good_pages) { >> + if (!si->pages) { >> pr_warn("Empty swap-file\n"); >> return -EINVAL; >> } >> >> >> - return nr_extents; >> + return 0; >> } >> >> #define SWAP_CLUSTER_INFO_COLS \ >> @@ -3217,7 +3204,7 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si, >> * Mark unusable pages as unavailable. The clusters aren't >> * marked free yet, so no list operations are involved yet. >> * >> - * See setup_swap_map_and_extents(): header page, bad pages, >> + * See setup_swap_map(): header page, bad pages, >> * and the EOF part of the last cluster. >> */ >> inc_cluster_info_page(si, cluster_info, 0); >> @@ -3354,6 +3341,15 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags) >> goto bad_swap_unlock_inode; >> } >> >> + si->max = maxpages; >> + si->pages = maxpages - 1; >> + nr_extents = setup_swap_extents(si, &span); >> + if (nr_extents < 0) { >> + error = nr_extents; >> + goto bad_swap_unlock_inode; >> + } >> + maxpages = si->max; > Hello, > There seems to be a trivial problem here, previously the si->pages > will be seen by swap_activate after bad blocks have been counted and > si->pages means the actual available slots. But now si->pages will be > seen by swap_active as `maxpages - 1`. > > One current side effect now is the span value will not be updated > properly so the pr_info in swap on may print a larger value, if the > swap header contains badblocks and swapfile is on nfs/cifs. Thanks for point this out. But I think the larger value is actually correct result. In summary, there are two kinds of swapfile_activate operations. 1. Filesystem style: Treat all blocks logical continuity and find useable physical extents in logical range. In this way, si->pages will be actual useable physical blocks and span will be "1 + highest_block - lowest_block". 2. Block device style: Treat all blocks physically continue and only one single extent is added. In this way, si->pages will be si->max and span will be "si->pages - 1". Actually, si->pages and si->max is only used in block device style and span value is set with si->pages. As a result, span value in block device style will become a larger value as you mentioned. I think larger value is correct based on: 1. Span value in filesystem style is "1 + highest_block - lowest_block" which is the range cover all possible phisical blocks including the badblocks. 2. For block device style, si->pages is the actual useable block number and is already in pr_info. The orignal span value before this patch is also refer to useable block number which is redundant in pr_info. > > This should not be a problem but it's better to mention or add > comments about it I'd like to mention this change as a fix in changelog in next version. > > And I think it's better to add a sanity check here to check if > si->pages still equal to si->max - 1, setup_swap_map_and_extents / > setup_swap_map assumes the header section was already counted. This > also helps indicate the setup_swap_extents may shrink and modify these > two values. Sure, will add this in next version. > > BTW, I was thinking that we should get rid of the whole extents design > after the swap table series is ready, so mTHP allocation will be > usable for swap over fs too. I also noticed this limitation but have not taken a deep look. Look forward to your solution in future. > >> /* OK, set up the swap map and apply the bad block list */ >> swap_map = vzalloc(maxpages); >> if (!swap_map) { >> @@ -3365,12 +3361,9 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags) >> if (error) >> goto bad_swap_unlock_inode; >> >> - nr_extents = setup_swap_map_and_extents(si, swap_header, swap_map, >> - maxpages, &span); >> - if (unlikely(nr_extents < 0)) { >> - error = nr_extents; >> + error = setup_swap_map(si, swap_header, swap_map, maxpages); >> + if (error) >> goto bad_swap_unlock_inode; >> - } >> >> /* >> * Use kvmalloc_array instead of bitmap_zalloc as the allocation order might >> -- >> 2.30.0 >> > > Other than that: > > Reviewed-by: Kairui Song <kasong@tencent.com> > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/4] mm: swap: correctly use maxpages in swapon syscall to avoid potensial deadloop 2025-06-11 7:54 ` Kemeng Shi @ 2025-07-17 23:21 ` Andrew Morton 2025-07-18 6:12 ` Kemeng Shi 0 siblings, 1 reply; 20+ messages in thread From: Andrew Morton @ 2025-07-17 23:21 UTC (permalink / raw) To: Kemeng Shi; +Cc: Kairui Song, bhe, hannes, linux-mm, linux-kernel On Wed, 11 Jun 2025 15:54:21 +0800 Kemeng Shi <shikemeng@huaweicloud.com> wrote: > > > on 5/26/2025 1:08 AM, Kairui Song wrote: Nearly two months! > Sure, will add this in next version. Do we actually need a new version? Having rescanned the v1 review I'm inclined to merge this series as-is? ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/4] mm: swap: correctly use maxpages in swapon syscall to avoid potensial deadloop 2025-07-17 23:21 ` Andrew Morton @ 2025-07-18 6:12 ` Kemeng Shi 0 siblings, 0 replies; 20+ messages in thread From: Kemeng Shi @ 2025-07-18 6:12 UTC (permalink / raw) To: Andrew Morton; +Cc: Kairui Song, bhe, hannes, linux-mm, linux-kernel on 7/18/2025 7:21 AM, Andrew Morton wrote: > On Wed, 11 Jun 2025 15:54:21 +0800 Kemeng Shi <shikemeng@huaweicloud.com> wrote: > >> >> >> on 5/26/2025 1:08 AM, Kairui Song wrote: > > Nearly two months! > >> Sure, will add this in next version. > > Do we actually need a new version? Having rescanned the v1 review I'm > inclined to merge this series as-is? > > So sorry for the late. I will write and send a v2 version soon. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/4] mm: swap: correctly use maxpages in swapon syscall to avoid potensial deadloop 2025-05-22 12:25 ` [PATCH 2/4] mm: swap: correctly use maxpages in swapon syscall to avoid potensial deadloop Kemeng Shi 2025-05-25 17:08 ` Kairui Song @ 2025-05-30 2:50 ` Baoquan He 2025-06-11 8:27 ` Kemeng Shi 1 sibling, 1 reply; 20+ messages in thread From: Baoquan He @ 2025-05-30 2:50 UTC (permalink / raw) To: Kemeng Shi; +Cc: akpm, kasong, hannes, linux-mm, linux-kernel On 05/22/25 at 08:25pm, Kemeng Shi wrote: > We use maxpages from read_swap_header() to initialize swap_info_struct, > however the maxpages might be reduced in setup_swap_extents() and the > si->max is assigned with the reduced maxpages from the > setup_swap_extents(). > Obviously, this could lead to memory waste as we allocated memory based on > larger maxpages, besides, this could lead to a potensial deadloop as ^ typo, potential > following: > 1) When calling setup_clusters() with larger maxpages, unavailable pages > within range [si->max, larger maxpages) are not accounted with > inc_cluster_info_page(). As a result, these pages are assumed available > but can not be allocated. The cluster contains these pages can be moved > to frag_clusters list after it's all available pages were allocated. > 2) When the cluster mentioned in 1) is the only cluster in frag_clusters > list, cluster_alloc_swap_entry() assume order 0 allocation will never > failed and will enter a deadloop by keep trying to allocate page from the > only cluster in frag_clusters which contains no actually available page. > > Call setup_swap_extents() to get the final maxpages before swap_info_struct > initialization to fix the issue. > > Fixes: 661383c6111a3 ("mm: swap: relaim the cached parts that got scanned") > Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> > --- > mm/swapfile.c | 47 ++++++++++++++++++++--------------------------- > 1 file changed, 20 insertions(+), 27 deletions(-) Reviedwed-by: Baoquan He <bhe@redhat.com> > > diff --git a/mm/swapfile.c b/mm/swapfile.c > index 75b69213c2e7..a82f4ebefca3 100644 > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -3141,43 +3141,30 @@ static unsigned long read_swap_header(struct swap_info_struct *si, > return maxpages; > } > > -static int setup_swap_map_and_extents(struct swap_info_struct *si, > - union swap_header *swap_header, > - unsigned char *swap_map, > - unsigned long maxpages, > - sector_t *span) > +static int setup_swap_map(struct swap_info_struct *si, > + union swap_header *swap_header, > + unsigned char *swap_map, > + unsigned long maxpages) > { > - unsigned int nr_good_pages; > unsigned long i; > - int nr_extents; > - > - nr_good_pages = maxpages - 1; /* omit header page */ > > + swap_map[0] = SWAP_MAP_BAD; /* omit header page */ > for (i = 0; i < swap_header->info.nr_badpages; i++) { > unsigned int page_nr = swap_header->info.badpages[i]; > if (page_nr == 0 || page_nr > swap_header->info.last_page) > return -EINVAL; > if (page_nr < maxpages) { > swap_map[page_nr] = SWAP_MAP_BAD; > - nr_good_pages--; > + si->pages--; > } > } > > - if (nr_good_pages) { > - swap_map[0] = SWAP_MAP_BAD; > - si->max = maxpages; > - si->pages = nr_good_pages; > - nr_extents = setup_swap_extents(si, span); > - if (nr_extents < 0) > - return nr_extents; > - nr_good_pages = si->pages; > - } > - if (!nr_good_pages) { > + if (!si->pages) { > pr_warn("Empty swap-file\n"); > return -EINVAL; > } > > - return nr_extents; > + return 0; > } > > #define SWAP_CLUSTER_INFO_COLS \ > @@ -3217,7 +3204,7 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si, > * Mark unusable pages as unavailable. The clusters aren't > * marked free yet, so no list operations are involved yet. > * > - * See setup_swap_map_and_extents(): header page, bad pages, > + * See setup_swap_map(): header page, bad pages, > * and the EOF part of the last cluster. > */ > inc_cluster_info_page(si, cluster_info, 0); > @@ -3354,6 +3341,15 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags) > goto bad_swap_unlock_inode; > } > > + si->max = maxpages; > + si->pages = maxpages - 1; > + nr_extents = setup_swap_extents(si, &span); > + if (nr_extents < 0) { > + error = nr_extents; > + goto bad_swap_unlock_inode; > + } > + maxpages = si->max; > + > /* OK, set up the swap map and apply the bad block list */ > swap_map = vzalloc(maxpages); > if (!swap_map) { > @@ -3365,12 +3361,9 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags) > if (error) > goto bad_swap_unlock_inode; > > - nr_extents = setup_swap_map_and_extents(si, swap_header, swap_map, > - maxpages, &span); > - if (unlikely(nr_extents < 0)) { > - error = nr_extents; > + error = setup_swap_map(si, swap_header, swap_map, maxpages); > + if (error) > goto bad_swap_unlock_inode; > - } > > /* > * Use kvmalloc_array instead of bitmap_zalloc as the allocation order might > -- > 2.30.0 > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/4] mm: swap: correctly use maxpages in swapon syscall to avoid potensial deadloop 2025-05-30 2:50 ` Baoquan He @ 2025-06-11 8:27 ` Kemeng Shi 0 siblings, 0 replies; 20+ messages in thread From: Kemeng Shi @ 2025-06-11 8:27 UTC (permalink / raw) To: Baoquan He; +Cc: akpm, kasong, hannes, linux-mm, linux-kernel on 5/30/2025 10:50 AM, Baoquan He wrote: > On 05/22/25 at 08:25pm, Kemeng Shi wrote: >> We use maxpages from read_swap_header() to initialize swap_info_struct, >> however the maxpages might be reduced in setup_swap_extents() and the >> si->max is assigned with the reduced maxpages from the >> setup_swap_extents(). >> Obviously, this could lead to memory waste as we allocated memory based on >> larger maxpages, besides, this could lead to a potensial deadloop as > ^ typo, potential Thanks, will fix this in next version. >> following: >> 1) When calling setup_clusters() with larger maxpages, unavailable pages >> within range [si->max, larger maxpages) are not accounted with >> inc_cluster_info_page(). As a result, these pages are assumed available >> but can not be allocated. The cluster contains these pages can be moved >> to frag_clusters list after it's all available pages were allocated. >> 2) When the cluster mentioned in 1) is the only cluster in frag_clusters >> list, cluster_alloc_swap_entry() assume order 0 allocation will never >> failed and will enter a deadloop by keep trying to allocate page from the >> only cluster in frag_clusters which contains no actually available page. >> >> Call setup_swap_extents() to get the final maxpages before swap_info_struct >> initialization to fix the issue. >> >> Fixes: 661383c6111a3 ("mm: swap: relaim the cached parts that got scanned") >> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> >> --- >> mm/swapfile.c | 47 ++++++++++++++++++++--------------------------- >> 1 file changed, 20 insertions(+), 27 deletions(-) > > Reviedwed-by: Baoquan He <bhe@redhat.com> > >> >> diff --git a/mm/swapfile.c b/mm/swapfile.c >> index 75b69213c2e7..a82f4ebefca3 100644 >> --- a/mm/swapfile.c >> +++ b/mm/swapfile.c >> @@ -3141,43 +3141,30 @@ static unsigned long read_swap_header(struct swap_info_struct *si, >> return maxpages; >> } >> >> -static int setup_swap_map_and_extents(struct swap_info_struct *si, >> - union swap_header *swap_header, >> - unsigned char *swap_map, >> - unsigned long maxpages, >> - sector_t *span) >> +static int setup_swap_map(struct swap_info_struct *si, >> + union swap_header *swap_header, >> + unsigned char *swap_map, >> + unsigned long maxpages) >> { >> - unsigned int nr_good_pages; >> unsigned long i; >> - int nr_extents; >> - >> - nr_good_pages = maxpages - 1; /* omit header page */ >> >> + swap_map[0] = SWAP_MAP_BAD; /* omit header page */ >> for (i = 0; i < swap_header->info.nr_badpages; i++) { >> unsigned int page_nr = swap_header->info.badpages[i]; >> if (page_nr == 0 || page_nr > swap_header->info.last_page) >> return -EINVAL; >> if (page_nr < maxpages) { >> swap_map[page_nr] = SWAP_MAP_BAD; >> - nr_good_pages--; >> + si->pages--; >> } >> } >> >> - if (nr_good_pages) { >> - swap_map[0] = SWAP_MAP_BAD; >> - si->max = maxpages; >> - si->pages = nr_good_pages; >> - nr_extents = setup_swap_extents(si, span); >> - if (nr_extents < 0) >> - return nr_extents; >> - nr_good_pages = si->pages; >> - } >> - if (!nr_good_pages) { >> + if (!si->pages) { >> pr_warn("Empty swap-file\n"); >> return -EINVAL; >> } >> >> - return nr_extents; >> + return 0; >> } >> >> #define SWAP_CLUSTER_INFO_COLS \ >> @@ -3217,7 +3204,7 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si, >> * Mark unusable pages as unavailable. The clusters aren't >> * marked free yet, so no list operations are involved yet. >> * >> - * See setup_swap_map_and_extents(): header page, bad pages, >> + * See setup_swap_map(): header page, bad pages, >> * and the EOF part of the last cluster. >> */ >> inc_cluster_info_page(si, cluster_info, 0); >> @@ -3354,6 +3341,15 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags) >> goto bad_swap_unlock_inode; >> } >> >> + si->max = maxpages; >> + si->pages = maxpages - 1; >> + nr_extents = setup_swap_extents(si, &span); >> + if (nr_extents < 0) { >> + error = nr_extents; >> + goto bad_swap_unlock_inode; >> + } >> + maxpages = si->max; >> + >> /* OK, set up the swap map and apply the bad block list */ >> swap_map = vzalloc(maxpages); >> if (!swap_map) { >> @@ -3365,12 +3361,9 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags) >> if (error) >> goto bad_swap_unlock_inode; >> >> - nr_extents = setup_swap_map_and_extents(si, swap_header, swap_map, >> - maxpages, &span); >> - if (unlikely(nr_extents < 0)) { >> - error = nr_extents; >> + error = setup_swap_map(si, swap_header, swap_map, maxpages); >> + if (error) >> goto bad_swap_unlock_inode; >> - } >> >> /* >> * Use kvmalloc_array instead of bitmap_zalloc as the allocation order might >> -- >> 2.30.0 >> > ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 3/4] mm: swap: fix potensial buffer overflow in setup_clusters() 2025-05-22 12:25 [PATCH 0/4] Some randome fixes and cleanups to swapfile Kemeng Shi 2025-05-22 12:25 ` [PATCH 1/4] mm: swap: move nr_swap_pages counter decrement from folio_alloc_swap() to swap_range_alloc() Kemeng Shi 2025-05-22 12:25 ` [PATCH 2/4] mm: swap: correctly use maxpages in swapon syscall to avoid potensial deadloop Kemeng Shi @ 2025-05-22 12:25 ` Kemeng Shi 2025-05-25 18:44 ` Kairui Song 2025-05-30 2:56 ` Baoquan He 2025-05-22 12:25 ` [PATCH 4/4] mm: swap: remove stale comment stale comment in cluster_alloc_swap_entry() Kemeng Shi 2025-05-22 21:41 ` [PATCH 0/4] Some randome fixes and cleanups to swapfile Andrew Morton 4 siblings, 2 replies; 20+ messages in thread From: Kemeng Shi @ 2025-05-22 12:25 UTC (permalink / raw) To: akpm; +Cc: kasong, bhe, hannes, linux-mm, linux-kernel In setup_swap_map(), we only ensure badpages are in range (0, last_page]. As maxpages might be < last_page, setup_clusters() will encounter a buffer overflow when a badpage is >= maxpages. Only call inc_cluster_info_page() for badpage which is < maxpages to fix the issue. Fixes: b843786b0bd01 ("mm: swapfile: fix SSD detection with swapfile on btrfs") Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> --- mm/swapfile.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/mm/swapfile.c b/mm/swapfile.c index a82f4ebefca3..63ab9f14b2c6 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -3208,9 +3208,13 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si, * and the EOF part of the last cluster. */ inc_cluster_info_page(si, cluster_info, 0); - for (i = 0; i < swap_header->info.nr_badpages; i++) - inc_cluster_info_page(si, cluster_info, - swap_header->info.badpages[i]); + for (i = 0; i < swap_header->info.nr_badpages; i++) { + unsigned int page_nr = swap_header->info.badpages[i]; + + if (page_nr >= maxpages) + continue; + inc_cluster_info_page(si, cluster_info, page_nr); + } for (i = maxpages; i < round_up(maxpages, SWAPFILE_CLUSTER); i++) inc_cluster_info_page(si, cluster_info, i); -- 2.30.0 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH 3/4] mm: swap: fix potensial buffer overflow in setup_clusters() 2025-05-22 12:25 ` [PATCH 3/4] mm: swap: fix potensial buffer overflow in setup_clusters() Kemeng Shi @ 2025-05-25 18:44 ` Kairui Song 2025-05-30 2:55 ` Baoquan He 2025-06-11 8:27 ` Kemeng Shi 2025-05-30 2:56 ` Baoquan He 1 sibling, 2 replies; 20+ messages in thread From: Kairui Song @ 2025-05-25 18:44 UTC (permalink / raw) To: Kemeng Shi; +Cc: akpm, bhe, hannes, linux-mm, linux-kernel On Thu, May 22, 2025 at 11:32 AM Kemeng Shi <shikemeng@huaweicloud.com> wrote: > > In setup_swap_map(), we only ensure badpages are in range (0, last_page]. > As maxpages might be < last_page, setup_clusters() will encounter a > buffer overflow when a badpage is >= maxpages. > Only call inc_cluster_info_page() for badpage which is < maxpages to > fix the issue. > > Fixes: b843786b0bd01 ("mm: swapfile: fix SSD detection with swapfile on btrfs") > Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> > --- > mm/swapfile.c | 10 +++++++--- > 1 file changed, 7 insertions(+), 3 deletions(-) > > diff --git a/mm/swapfile.c b/mm/swapfile.c > index a82f4ebefca3..63ab9f14b2c6 100644 > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -3208,9 +3208,13 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si, > * and the EOF part of the last cluster. > */ > inc_cluster_info_page(si, cluster_info, 0); > - for (i = 0; i < swap_header->info.nr_badpages; i++) > - inc_cluster_info_page(si, cluster_info, > - swap_header->info.badpages[i]); > + for (i = 0; i < swap_header->info.nr_badpages; i++) { > + unsigned int page_nr = swap_header->info.badpages[i]; > + > + if (page_nr >= maxpages) > + continue; > + inc_cluster_info_page(si, cluster_info, page_nr); I think we might need a pr_err or pr_warn here, this means mkswap marked the wrong region as a bad block? Or some fs side things went wrong. > + } > for (i = maxpages; i < round_up(maxpages, SWAPFILE_CLUSTER); i++) > inc_cluster_info_page(si, cluster_info, i); > > -- > 2.30.0 > > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 3/4] mm: swap: fix potensial buffer overflow in setup_clusters() 2025-05-25 18:44 ` Kairui Song @ 2025-05-30 2:55 ` Baoquan He 2025-06-11 8:27 ` Kemeng Shi 1 sibling, 0 replies; 20+ messages in thread From: Baoquan He @ 2025-05-30 2:55 UTC (permalink / raw) To: Kairui Song; +Cc: Kemeng Shi, akpm, hannes, linux-mm, linux-kernel On 05/26/25 at 02:44am, Kairui Song wrote: > On Thu, May 22, 2025 at 11:32 AM Kemeng Shi <shikemeng@huaweicloud.com> wrote: > > > > In setup_swap_map(), we only ensure badpages are in range (0, last_page]. > > As maxpages might be < last_page, setup_clusters() will encounter a > > buffer overflow when a badpage is >= maxpages. > > Only call inc_cluster_info_page() for badpage which is < maxpages to > > fix the issue. > > > > Fixes: b843786b0bd01 ("mm: swapfile: fix SSD detection with swapfile on btrfs") > > Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> > > --- > > mm/swapfile.c | 10 +++++++--- > > 1 file changed, 7 insertions(+), 3 deletions(-) > > > > diff --git a/mm/swapfile.c b/mm/swapfile.c > > index a82f4ebefca3..63ab9f14b2c6 100644 > > --- a/mm/swapfile.c > > +++ b/mm/swapfile.c > > @@ -3208,9 +3208,13 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si, > > * and the EOF part of the last cluster. > > */ > > inc_cluster_info_page(si, cluster_info, 0); > > - for (i = 0; i < swap_header->info.nr_badpages; i++) > > - inc_cluster_info_page(si, cluster_info, > > - swap_header->info.badpages[i]); > > + for (i = 0; i < swap_header->info.nr_badpages; i++) { > > + unsigned int page_nr = swap_header->info.badpages[i]; > > + > > + if (page_nr >= maxpages) > > + continue; > > + inc_cluster_info_page(si, cluster_info, page_nr); > > I think we might need a pr_err or pr_warn here, this means mkswap > marked the wrong region as a bad block? Or some fs side things went > wrong. There's aready warning in read_swap_header(): static unsigned long read_swap_header(struct swap_info_struct *si, union swap_header *swap_header, struct inode *inode) { ...... if (last_page > maxpages) { pr_warn("Truncating oversized swap area, only using %luk out of %luk\n", K(maxpages), K(last_page)); } ... } And if we add pr_err|warn here, we also need add it in setup_swap_map() when filling swap_map. > > > > + } > > for (i = maxpages; i < round_up(maxpages, SWAPFILE_CLUSTER); i++) > > inc_cluster_info_page(si, cluster_info, i); > > > > -- > > 2.30.0 > > > > > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 3/4] mm: swap: fix potensial buffer overflow in setup_clusters() 2025-05-25 18:44 ` Kairui Song 2025-05-30 2:55 ` Baoquan He @ 2025-06-11 8:27 ` Kemeng Shi 1 sibling, 0 replies; 20+ messages in thread From: Kemeng Shi @ 2025-06-11 8:27 UTC (permalink / raw) To: Kairui Song; +Cc: akpm, bhe, hannes, linux-mm, linux-kernel on 5/26/2025 2:44 AM, Kairui Song wrote: > On Thu, May 22, 2025 at 11:32 AM Kemeng Shi <shikemeng@huaweicloud.com> wrote: >> >> In setup_swap_map(), we only ensure badpages are in range (0, last_page]. >> As maxpages might be < last_page, setup_clusters() will encounter a >> buffer overflow when a badpage is >= maxpages. >> Only call inc_cluster_info_page() for badpage which is < maxpages to >> fix the issue. >> >> Fixes: b843786b0bd01 ("mm: swapfile: fix SSD detection with swapfile on btrfs") >> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> >> --- >> mm/swapfile.c | 10 +++++++--- >> 1 file changed, 7 insertions(+), 3 deletions(-) >> >> diff --git a/mm/swapfile.c b/mm/swapfile.c >> index a82f4ebefca3..63ab9f14b2c6 100644 >> --- a/mm/swapfile.c >> +++ b/mm/swapfile.c >> @@ -3208,9 +3208,13 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si, >> * and the EOF part of the last cluster. >> */ >> inc_cluster_info_page(si, cluster_info, 0); >> - for (i = 0; i < swap_header->info.nr_badpages; i++) >> - inc_cluster_info_page(si, cluster_info, >> - swap_header->info.badpages[i]); >> + for (i = 0; i < swap_header->info.nr_badpages; i++) { >> + unsigned int page_nr = swap_header->info.badpages[i]; >> + >> + if (page_nr >= maxpages) >> + continue; >> + inc_cluster_info_page(si, cluster_info, page_nr); > > I think we might need a pr_err or pr_warn here, this means mkswap > marked the wrong region as a bad block? Or some fs side things went > wrong. As Baoquan metioned that there is already warning in read_swap_header(). Besides, I think last_page in swap_header already indicates the range which is acceptable to fs and we only need to explicitly handle page_nr which is > last_page. > > >> + } >> for (i = maxpages; i < round_up(maxpages, SWAPFILE_CLUSTER); i++) >> inc_cluster_info_page(si, cluster_info, i); >> >> -- >> 2.30.0 >> >> > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 3/4] mm: swap: fix potensial buffer overflow in setup_clusters() 2025-05-22 12:25 ` [PATCH 3/4] mm: swap: fix potensial buffer overflow in setup_clusters() Kemeng Shi 2025-05-25 18:44 ` Kairui Song @ 2025-05-30 2:56 ` Baoquan He 1 sibling, 0 replies; 20+ messages in thread From: Baoquan He @ 2025-05-30 2:56 UTC (permalink / raw) To: Kemeng Shi; +Cc: akpm, kasong, hannes, linux-mm, linux-kernel On 05/22/25 at 08:25pm, Kemeng Shi wrote: > In setup_swap_map(), we only ensure badpages are in range (0, last_page]. > As maxpages might be < last_page, setup_clusters() will encounter a > buffer overflow when a badpage is >= maxpages. > Only call inc_cluster_info_page() for badpage which is < maxpages to > fix the issue. > > Fixes: b843786b0bd01 ("mm: swapfile: fix SSD detection with swapfile on btrfs") > Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> > --- > mm/swapfile.c | 10 +++++++--- > 1 file changed, 7 insertions(+), 3 deletions(-) Reviewed-by: Baoquan He <bhe@redhat.com> > > diff --git a/mm/swapfile.c b/mm/swapfile.c > index a82f4ebefca3..63ab9f14b2c6 100644 > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -3208,9 +3208,13 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si, > * and the EOF part of the last cluster. > */ > inc_cluster_info_page(si, cluster_info, 0); > - for (i = 0; i < swap_header->info.nr_badpages; i++) > - inc_cluster_info_page(si, cluster_info, > - swap_header->info.badpages[i]); > + for (i = 0; i < swap_header->info.nr_badpages; i++) { > + unsigned int page_nr = swap_header->info.badpages[i]; > + > + if (page_nr >= maxpages) > + continue; > + inc_cluster_info_page(si, cluster_info, page_nr); > + } > for (i = maxpages; i < round_up(maxpages, SWAPFILE_CLUSTER); i++) > inc_cluster_info_page(si, cluster_info, i); > > -- > 2.30.0 > ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 4/4] mm: swap: remove stale comment stale comment in cluster_alloc_swap_entry() 2025-05-22 12:25 [PATCH 0/4] Some randome fixes and cleanups to swapfile Kemeng Shi ` (2 preceding siblings ...) 2025-05-22 12:25 ` [PATCH 3/4] mm: swap: fix potensial buffer overflow in setup_clusters() Kemeng Shi @ 2025-05-22 12:25 ` Kemeng Shi 2025-05-25 17:05 ` Kairui Song 2025-05-30 5:24 ` Baoquan He 2025-05-22 21:41 ` [PATCH 0/4] Some randome fixes and cleanups to swapfile Andrew Morton 4 siblings, 2 replies; 20+ messages in thread From: Kemeng Shi @ 2025-05-22 12:25 UTC (permalink / raw) To: akpm; +Cc: kasong, bhe, hannes, linux-mm, linux-kernel As cluster_next_cpu was already dropped, the associated comment is stale now. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> --- mm/swapfile.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/mm/swapfile.c b/mm/swapfile.c index 63ab9f14b2c6..8525515fb06c 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -956,9 +956,8 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o } /* - * We don't have free cluster but have some clusters in - * discarding, do discard now and reclaim them, then - * reread cluster_next_cpu since we dropped si->lock + * We don't have free cluster but have some clusters in discarding, + * do discard now and reclaim them. */ if ((si->flags & SWP_PAGE_DISCARD) && swap_do_scheduled_discard(si)) goto new_cluster; -- 2.30.0 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH 4/4] mm: swap: remove stale comment stale comment in cluster_alloc_swap_entry() 2025-05-22 12:25 ` [PATCH 4/4] mm: swap: remove stale comment stale comment in cluster_alloc_swap_entry() Kemeng Shi @ 2025-05-25 17:05 ` Kairui Song 2025-05-30 5:24 ` Baoquan He 1 sibling, 0 replies; 20+ messages in thread From: Kairui Song @ 2025-05-25 17:05 UTC (permalink / raw) To: Kemeng Shi; +Cc: akpm, bhe, hannes, linux-mm, linux-kernel On Thu, May 22, 2025 at 11:32 AM Kemeng Shi <shikemeng@huaweicloud.com> wrote: > > As cluster_next_cpu was already dropped, the associated comment is stale > now. > > Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> > --- > mm/swapfile.c | 5 ++--- > 1 file changed, 2 insertions(+), 3 deletions(-) > > diff --git a/mm/swapfile.c b/mm/swapfile.c > index 63ab9f14b2c6..8525515fb06c 100644 > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -956,9 +956,8 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o > } > > /* > - * We don't have free cluster but have some clusters in > - * discarding, do discard now and reclaim them, then > - * reread cluster_next_cpu since we dropped si->lock > + * We don't have free cluster but have some clusters in discarding, > + * do discard now and reclaim them. > */ > if ((si->flags & SWP_PAGE_DISCARD) && swap_do_scheduled_discard(si)) > goto new_cluster; > -- > 2.30.0 > Nice. Reviewed-by: Kairui Song <kasong@tencent.com> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 4/4] mm: swap: remove stale comment stale comment in cluster_alloc_swap_entry() 2025-05-22 12:25 ` [PATCH 4/4] mm: swap: remove stale comment stale comment in cluster_alloc_swap_entry() Kemeng Shi 2025-05-25 17:05 ` Kairui Song @ 2025-05-30 5:24 ` Baoquan He 1 sibling, 0 replies; 20+ messages in thread From: Baoquan He @ 2025-05-30 5:24 UTC (permalink / raw) To: Kemeng Shi; +Cc: akpm, kasong, hannes, linux-mm, linux-kernel On 05/22/25 at 08:25pm, Kemeng Shi wrote: > As cluster_next_cpu was already dropped, the associated comment is stale > now. > > Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> > --- > mm/swapfile.c | 5 ++--- > 1 file changed, 2 insertions(+), 3 deletions(-) Reviewed-by: Baoquan He <bhe@redhat.com> > > diff --git a/mm/swapfile.c b/mm/swapfile.c > index 63ab9f14b2c6..8525515fb06c 100644 > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -956,9 +956,8 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o > } > > /* > - * We don't have free cluster but have some clusters in > - * discarding, do discard now and reclaim them, then > - * reread cluster_next_cpu since we dropped si->lock > + * We don't have free cluster but have some clusters in discarding, > + * do discard now and reclaim them. > */ > if ((si->flags & SWP_PAGE_DISCARD) && swap_do_scheduled_discard(si)) > goto new_cluster; > -- > 2.30.0 > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 0/4] Some randome fixes and cleanups to swapfile 2025-05-22 12:25 [PATCH 0/4] Some randome fixes and cleanups to swapfile Kemeng Shi ` (3 preceding siblings ...) 2025-05-22 12:25 ` [PATCH 4/4] mm: swap: remove stale comment stale comment in cluster_alloc_swap_entry() Kemeng Shi @ 2025-05-22 21:41 ` Andrew Morton 4 siblings, 0 replies; 20+ messages in thread From: Andrew Morton @ 2025-05-22 21:41 UTC (permalink / raw) To: Kemeng Shi; +Cc: kasong, bhe, hannes, linux-mm, linux-kernel On Thu, 22 May 2025 20:25:50 +0800 Kemeng Shi <shikemeng@huaweicloud.com> wrote: > Patch 0-3 are some random fixes. Patch 4 is a cleanup. More details can > be found in respective patches. Thanks. Cool. I'll add these to mm-new but I won't advance them to mm-unstable until after the merge window, so we have 2+ weeks for review and test. I added cc:stable to patches 1-3, so they'll eventually find their way into earlier kernels. ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2025-07-18 6:13 UTC | newest] Thread overview: 20+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-05-22 12:25 [PATCH 0/4] Some randome fixes and cleanups to swapfile Kemeng Shi 2025-05-22 12:25 ` [PATCH 1/4] mm: swap: move nr_swap_pages counter decrement from folio_alloc_swap() to swap_range_alloc() Kemeng Shi 2025-05-22 3:55 ` Kairui Song 2025-05-30 1:31 ` Baoquan He 2025-05-22 12:25 ` [PATCH 2/4] mm: swap: correctly use maxpages in swapon syscall to avoid potensial deadloop Kemeng Shi 2025-05-25 17:08 ` Kairui Song 2025-06-11 7:54 ` Kemeng Shi 2025-07-17 23:21 ` Andrew Morton 2025-07-18 6:12 ` Kemeng Shi 2025-05-30 2:50 ` Baoquan He 2025-06-11 8:27 ` Kemeng Shi 2025-05-22 12:25 ` [PATCH 3/4] mm: swap: fix potensial buffer overflow in setup_clusters() Kemeng Shi 2025-05-25 18:44 ` Kairui Song 2025-05-30 2:55 ` Baoquan He 2025-06-11 8:27 ` Kemeng Shi 2025-05-30 2:56 ` Baoquan He 2025-05-22 12:25 ` [PATCH 4/4] mm: swap: remove stale comment stale comment in cluster_alloc_swap_entry() Kemeng Shi 2025-05-25 17:05 ` Kairui Song 2025-05-30 5:24 ` Baoquan He 2025-05-22 21:41 ` [PATCH 0/4] Some randome fixes and cleanups to swapfile Andrew Morton
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.