* [PATCH 0/4] Some randome fixes and cleanups to swapfile @ 2025-05-22 12:25 Kemeng Shi 2025-05-22 12:25 ` [PATCH 1/4] mm: swap: move nr_swap_pages counter decrement from folio_alloc_swap() to swap_range_alloc() Kemeng Shi ` (4 more replies) 0 siblings, 5 replies; 20+ messages in thread From: Kemeng Shi @ 2025-05-22 12:25 UTC (permalink / raw) To: akpm; +Cc: kasong, bhe, hannes, linux-mm, linux-kernel Patch 0-3 are some random fixes. Patch 4 is a cleanup. More details can be found in respective patches. Thanks. Kemeng Shi (4): mm: swap: move nr_swap_pages counter decrement from folio_alloc_swap() to swap_range_alloc() mm: swap: correctly use maxpages in swapon syscall to avoid potensial deadloop mm: swap: fix potensial buffer overflow in setup_clusters() mm: swap: remove stale comment stale comment in cluster_alloc_swap_entry() mm/swapfile.c | 64 ++++++++++++++++++++++++--------------------------- 1 file changed, 30 insertions(+), 34 deletions(-) -- 2.30.0 ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 1/4] mm: swap: move nr_swap_pages counter decrement from folio_alloc_swap() to swap_range_alloc() 2025-05-22 12:25 [PATCH 0/4] Some randome fixes and cleanups to swapfile Kemeng Shi @ 2025-05-22 12:25 ` Kemeng Shi 2025-05-22 3:55 ` Kairui Song 2025-05-30 1:31 ` Baoquan He 2025-05-22 12:25 ` [PATCH 2/4] mm: swap: correctly use maxpages in swapon syscall to avoid potensial deadloop Kemeng Shi ` (3 subsequent siblings) 4 siblings, 2 replies; 20+ messages in thread From: Kemeng Shi @ 2025-05-22 12:25 UTC (permalink / raw) To: akpm; +Cc: kasong, bhe, hannes, linux-mm, linux-kernel When folio_alloc_swap() encounters a failure in either mem_cgroup_try_charge_swap() or add_to_swap_cache(), nr_swap_pages counter is not decremented for allocated entry. However, the following put_swap_folio() will increase nr_swap_pages counter unpairly and lead to an imbalance. Move nr_swap_pages decrement from folio_alloc_swap() to swap_range_alloc() to pair the nr_swap_pages counting. Fixes: 0ff67f990bd45 ("mm, swap: remove swap slot cache") Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> --- mm/swapfile.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/swapfile.c b/mm/swapfile.c index 026090bf3efe..75b69213c2e7 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1115,6 +1115,7 @@ static void swap_range_alloc(struct swap_info_struct *si, if (vm_swap_full()) schedule_work(&si->reclaim_work); } + atomic_long_sub(nr_entries, &nr_swap_pages); } static void swap_range_free(struct swap_info_struct *si, unsigned long offset, @@ -1313,7 +1314,6 @@ int folio_alloc_swap(struct folio *folio, gfp_t gfp) if (add_to_swap_cache(folio, entry, gfp | __GFP_NOMEMALLOC, NULL)) goto out_free; - atomic_long_sub(size, &nr_swap_pages); return 0; out_free: -- 2.30.0 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH 1/4] mm: swap: move nr_swap_pages counter decrement from folio_alloc_swap() to swap_range_alloc() 2025-05-22 12:25 ` [PATCH 1/4] mm: swap: move nr_swap_pages counter decrement from folio_alloc_swap() to swap_range_alloc() Kemeng Shi @ 2025-05-22 3:55 ` Kairui Song 2025-05-30 1:31 ` Baoquan He 1 sibling, 0 replies; 20+ messages in thread From: Kairui Song @ 2025-05-22 3:55 UTC (permalink / raw) To: Kemeng Shi; +Cc: akpm, bhe, hannes, linux-mm, linux-kernel On Thu, May 22, 2025 at 11:32 AM Kemeng Shi <shikemeng@huaweicloud.com> wrote: > > When folio_alloc_swap() encounters a failure in either > mem_cgroup_try_charge_swap() or add_to_swap_cache(), nr_swap_pages counter > is not decremented for allocated entry. However, the following > put_swap_folio() will increase nr_swap_pages counter unpairly and lead to > an imbalance. > > Move nr_swap_pages decrement from folio_alloc_swap() to swap_range_alloc() > to pair the nr_swap_pages counting. > > Fixes: 0ff67f990bd45 ("mm, swap: remove swap slot cache") > Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> > --- > mm/swapfile.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/mm/swapfile.c b/mm/swapfile.c > index 026090bf3efe..75b69213c2e7 100644 > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -1115,6 +1115,7 @@ static void swap_range_alloc(struct swap_info_struct *si, > if (vm_swap_full()) > schedule_work(&si->reclaim_work); > } > + atomic_long_sub(nr_entries, &nr_swap_pages); > } > > static void swap_range_free(struct swap_info_struct *si, unsigned long offset, > @@ -1313,7 +1314,6 @@ int folio_alloc_swap(struct folio *folio, gfp_t gfp) > if (add_to_swap_cache(folio, entry, gfp | __GFP_NOMEMALLOC, NULL)) > goto out_free; > > - atomic_long_sub(size, &nr_swap_pages); > return 0; > > out_free: > -- > 2.30.0 Good catch! Moving the counter update to swap_range_alloc makes the logic much easier to follow. Reviewed-by: Kairui Song <kasong@tencent.com> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/4] mm: swap: move nr_swap_pages counter decrement from folio_alloc_swap() to swap_range_alloc() 2025-05-22 12:25 ` [PATCH 1/4] mm: swap: move nr_swap_pages counter decrement from folio_alloc_swap() to swap_range_alloc() Kemeng Shi 2025-05-22 3:55 ` Kairui Song @ 2025-05-30 1:31 ` Baoquan He 1 sibling, 0 replies; 20+ messages in thread From: Baoquan He @ 2025-05-30 1:31 UTC (permalink / raw) To: Kemeng Shi; +Cc: akpm, kasong, hannes, linux-mm, linux-kernel On 05/22/25 at 08:25pm, Kemeng Shi wrote: > When folio_alloc_swap() encounters a failure in either > mem_cgroup_try_charge_swap() or add_to_swap_cache(), nr_swap_pages counter > is not decremented for allocated entry. However, the following > put_swap_folio() will increase nr_swap_pages counter unpairly and lead to > an imbalance. > > Move nr_swap_pages decrement from folio_alloc_swap() to swap_range_alloc() > to pair the nr_swap_pages counting. > > Fixes: 0ff67f990bd45 ("mm, swap: remove swap slot cache") > Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> > --- > mm/swapfile.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) LGTM, Reviewed-by: Baoquan He <bhe@redhat.com> > > diff --git a/mm/swapfile.c b/mm/swapfile.c > index 026090bf3efe..75b69213c2e7 100644 > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -1115,6 +1115,7 @@ static void swap_range_alloc(struct swap_info_struct *si, > if (vm_swap_full()) > schedule_work(&si->reclaim_work); > } > + atomic_long_sub(nr_entries, &nr_swap_pages); > } > > static void swap_range_free(struct swap_info_struct *si, unsigned long offset, > @@ -1313,7 +1314,6 @@ int folio_alloc_swap(struct folio *folio, gfp_t gfp) > if (add_to_swap_cache(folio, entry, gfp | __GFP_NOMEMALLOC, NULL)) > goto out_free; > > - atomic_long_sub(size, &nr_swap_pages); > return 0; > > out_free: > -- > 2.30.0 > ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 2/4] mm: swap: correctly use maxpages in swapon syscall to avoid potensial deadloop 2025-05-22 12:25 [PATCH 0/4] Some randome fixes and cleanups to swapfile Kemeng Shi 2025-05-22 12:25 ` [PATCH 1/4] mm: swap: move nr_swap_pages counter decrement from folio_alloc_swap() to swap_range_alloc() Kemeng Shi @ 2025-05-22 12:25 ` Kemeng Shi 2025-05-25 17:08 ` Kairui Song 2025-05-30 2:50 ` Baoquan He 2025-05-22 12:25 ` [PATCH 3/4] mm: swap: fix potensial buffer overflow in setup_clusters() Kemeng Shi ` (2 subsequent siblings) 4 siblings, 2 replies; 20+ messages in thread From: Kemeng Shi @ 2025-05-22 12:25 UTC (permalink / raw) To: akpm; +Cc: kasong, bhe, hannes, linux-mm, linux-kernel We use maxpages from read_swap_header() to initialize swap_info_struct, however the maxpages might be reduced in setup_swap_extents() and the si->max is assigned with the reduced maxpages from the setup_swap_extents(). Obviously, this could lead to memory waste as we allocated memory based on larger maxpages, besides, this could lead to a potensial deadloop as following: 1) When calling setup_clusters() with larger maxpages, unavailable pages within range [si->max, larger maxpages) are not accounted with inc_cluster_info_page(). As a result, these pages are assumed available but can not be allocated. The cluster contains these pages can be moved to frag_clusters list after it's all available pages were allocated. 2) When the cluster mentioned in 1) is the only cluster in frag_clusters list, cluster_alloc_swap_entry() assume order 0 allocation will never failed and will enter a deadloop by keep trying to allocate page from the only cluster in frag_clusters which contains no actually available page. Call setup_swap_extents() to get the final maxpages before swap_info_struct initialization to fix the issue. Fixes: 661383c6111a3 ("mm: swap: relaim the cached parts that got scanned") Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> --- mm/swapfile.c | 47 ++++++++++++++++++++--------------------------- 1 file changed, 20 insertions(+), 27 deletions(-) diff --git a/mm/swapfile.c b/mm/swapfile.c index 75b69213c2e7..a82f4ebefca3 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -3141,43 +3141,30 @@ static unsigned long read_swap_header(struct swap_info_struct *si, return maxpages; } -static int setup_swap_map_and_extents(struct swap_info_struct *si, - union swap_header *swap_header, - unsigned char *swap_map, - unsigned long maxpages, - sector_t *span) +static int setup_swap_map(struct swap_info_struct *si, + union swap_header *swap_header, + unsigned char *swap_map, + unsigned long maxpages) { - unsigned int nr_good_pages; unsigned long i; - int nr_extents; - - nr_good_pages = maxpages - 1; /* omit header page */ + swap_map[0] = SWAP_MAP_BAD; /* omit header page */ for (i = 0; i < swap_header->info.nr_badpages; i++) { unsigned int page_nr = swap_header->info.badpages[i]; if (page_nr == 0 || page_nr > swap_header->info.last_page) return -EINVAL; if (page_nr < maxpages) { swap_map[page_nr] = SWAP_MAP_BAD; - nr_good_pages--; + si->pages--; } } - if (nr_good_pages) { - swap_map[0] = SWAP_MAP_BAD; - si->max = maxpages; - si->pages = nr_good_pages; - nr_extents = setup_swap_extents(si, span); - if (nr_extents < 0) - return nr_extents; - nr_good_pages = si->pages; - } - if (!nr_good_pages) { + if (!si->pages) { pr_warn("Empty swap-file\n"); return -EINVAL; } - return nr_extents; + return 0; } #define SWAP_CLUSTER_INFO_COLS \ @@ -3217,7 +3204,7 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si, * Mark unusable pages as unavailable. The clusters aren't * marked free yet, so no list operations are involved yet. * - * See setup_swap_map_and_extents(): header page, bad pages, + * See setup_swap_map(): header page, bad pages, * and the EOF part of the last cluster. */ inc_cluster_info_page(si, cluster_info, 0); @@ -3354,6 +3341,15 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags) goto bad_swap_unlock_inode; } + si->max = maxpages; + si->pages = maxpages - 1; + nr_extents = setup_swap_extents(si, &span); + if (nr_extents < 0) { + error = nr_extents; + goto bad_swap_unlock_inode; + } + maxpages = si->max; + /* OK, set up the swap map and apply the bad block list */ swap_map = vzalloc(maxpages); if (!swap_map) { @@ -3365,12 +3361,9 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags) if (error) goto bad_swap_unlock_inode; - nr_extents = setup_swap_map_and_extents(si, swap_header, swap_map, - maxpages, &span); - if (unlikely(nr_extents < 0)) { - error = nr_extents; + error = setup_swap_map(si, swap_header, swap_map, maxpages); + if (error) goto bad_swap_unlock_inode; - } /* * Use kvmalloc_array instead of bitmap_zalloc as the allocation order might -- 2.30.0 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH 2/4] mm: swap: correctly use maxpages in swapon syscall to avoid potensial deadloop 2025-05-22 12:25 ` [PATCH 2/4] mm: swap: correctly use maxpages in swapon syscall to avoid potensial deadloop Kemeng Shi @ 2025-05-25 17:08 ` Kairui Song 2025-06-11 7:54 ` Kemeng Shi 2025-05-30 2:50 ` Baoquan He 1 sibling, 1 reply; 20+ messages in thread From: Kairui Song @ 2025-05-25 17:08 UTC (permalink / raw) To: Kemeng Shi; +Cc: akpm, bhe, hannes, linux-mm, linux-kernel On Thu, May 22, 2025 at 11:32 AM Kemeng Shi <shikemeng@huaweicloud.com> wrote: > > We use maxpages from read_swap_header() to initialize swap_info_struct, > however the maxpages might be reduced in setup_swap_extents() and the > si->max is assigned with the reduced maxpages from the > setup_swap_extents(). > > Obviously, this could lead to memory waste as we allocated memory based on > larger maxpages, besides, this could lead to a potensial deadloop as > following: > 1) When calling setup_clusters() with larger maxpages, unavailable pages > within range [si->max, larger maxpages) are not accounted with > inc_cluster_info_page(). As a result, these pages are assumed available > but can not be allocated. The cluster contains these pages can be moved > to frag_clusters list after it's all available pages were allocated. > 2) When the cluster mentioned in 1) is the only cluster in frag_clusters > list, cluster_alloc_swap_entry() assume order 0 allocation will never > failed and will enter a deadloop by keep trying to allocate page from the > only cluster in frag_clusters which contains no actually available page. > > Call setup_swap_extents() to get the final maxpages before swap_info_struct > initialization to fix the issue. > > Fixes: 661383c6111a3 ("mm: swap: relaim the cached parts that got scanned") > > Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> > --- > mm/swapfile.c | 47 ++++++++++++++++++++--------------------------- > 1 file changed, 20 insertions(+), 27 deletions(-) > > diff --git a/mm/swapfile.c b/mm/swapfile.c > index 75b69213c2e7..a82f4ebefca3 100644 > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -3141,43 +3141,30 @@ static unsigned long read_swap_header(struct swap_info_struct *si, > return maxpages; > } > > -static int setup_swap_map_and_extents(struct swap_info_struct *si, > - union swap_header *swap_header, > - unsigned char *swap_map, > - unsigned long maxpages, > - sector_t *span) > +static int setup_swap_map(struct swap_info_struct *si, > + union swap_header *swap_header, > + unsigned char *swap_map, > + unsigned long maxpages) > { > - unsigned int nr_good_pages; > unsigned long i; > - int nr_extents; > - > - nr_good_pages = maxpages - 1; /* omit header page */ > > + swap_map[0] = SWAP_MAP_BAD; /* omit header page */ > for (i = 0; i < swap_header->info.nr_badpages; i++) { > unsigned int page_nr = swap_header->info.badpages[i]; > if (page_nr == 0 || page_nr > swap_header->info.last_page) > return -EINVAL; > if (page_nr < maxpages) { > swap_map[page_nr] = SWAP_MAP_BAD; > - nr_good_pages--; > + si->pages--; > } > } > > - if (nr_good_pages) { > - swap_map[0] = SWAP_MAP_BAD; > - si->max = maxpages; > - si->pages = nr_good_pages; > - nr_extents = setup_swap_extents(si, span); > - if (nr_extents < 0) > - return nr_extents; > - nr_good_pages = si->pages; > - } > - if (!nr_good_pages) { > + if (!si->pages) { > pr_warn("Empty swap-file\n"); > return -EINVAL; > } > > > - return nr_extents; > + return 0; > } > > #define SWAP_CLUSTER_INFO_COLS \ > @@ -3217,7 +3204,7 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si, > * Mark unusable pages as unavailable. The clusters aren't > * marked free yet, so no list operations are involved yet. > * > - * See setup_swap_map_and_extents(): header page, bad pages, > + * See setup_swap_map(): header page, bad pages, > * and the EOF part of the last cluster. > */ > inc_cluster_info_page(si, cluster_info, 0); > @@ -3354,6 +3341,15 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags) > goto bad_swap_unlock_inode; > } > > + si->max = maxpages; > + si->pages = maxpages - 1; > + nr_extents = setup_swap_extents(si, &span); > + if (nr_extents < 0) { > + error = nr_extents; > + goto bad_swap_unlock_inode; > + } > + maxpages = si->max; There seems to be a trivial problem here, previously the si->pages will be seen by swap_activate after bad blocks have been counted and si->pages means the actual available slots. But now si->pages will be seen by swap_active as `maxpages - 1`. One current side effect now is the span value will not be updated properly so the pr_info in swap on may print a larger value, if the swap header contains badblocks and swapfile is on nfs/cifs. This should not be a problem but it's better to mention or add comments about it. And I think it's better to add a sanity check here to check if si->pages still equal to si->max - 1, setup_swap_map_and_extents / setup_swap_map assumes the header section was already counted. This also helps indicate the setup_swap_extents may shrink and modify these two values. BTW, I was thinking that we should get rid of the whole extents design after the swap table series is ready, so mTHP allocation will be usable for swap over fs too. > /* OK, set up the swap map and apply the bad block list */ > swap_map = vzalloc(maxpages); > if (!swap_map) { > @@ -3365,12 +3361,9 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags) > if (error) > goto bad_swap_unlock_inode; > > - nr_extents = setup_swap_map_and_extents(si, swap_header, swap_map, > - maxpages, &span); > - if (unlikely(nr_extents < 0)) { > - error = nr_extents; > + error = setup_swap_map(si, swap_header, swap_map, maxpages); > + if (error) > goto bad_swap_unlock_inode; > - } > > /* > * Use kvmalloc_array instead of bitmap_zalloc as the allocation order might > -- > 2.30.0 > Other than that: Reviewed-by: Kairui Song <kasong@tencent.com> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/4] mm: swap: correctly use maxpages in swapon syscall to avoid potensial deadloop 2025-05-25 17:08 ` Kairui Song @ 2025-06-11 7:54 ` Kemeng Shi 2025-07-17 23:21 ` Andrew Morton 0 siblings, 1 reply; 20+ messages in thread From: Kemeng Shi @ 2025-06-11 7:54 UTC (permalink / raw) To: Kairui Song; +Cc: akpm, bhe, hannes, linux-mm, linux-kernel on 5/26/2025 1:08 AM, Kairui Song wrote: > On Thu, May 22, 2025 at 11:32 AM Kemeng Shi <shikemeng@huaweicloud.com> wrote: >> >> We use maxpages from read_swap_header() to initialize swap_info_struct, >> however the maxpages might be reduced in setup_swap_extents() and the >> si->max is assigned with the reduced maxpages from the >> setup_swap_extents(). >> >> Obviously, this could lead to memory waste as we allocated memory based on >> larger maxpages, besides, this could lead to a potensial deadloop as >> following: >> 1) When calling setup_clusters() with larger maxpages, unavailable pages >> within range [si->max, larger maxpages) are not accounted with >> inc_cluster_info_page(). As a result, these pages are assumed available >> but can not be allocated. The cluster contains these pages can be moved >> to frag_clusters list after it's all available pages were allocated. >> 2) When the cluster mentioned in 1) is the only cluster in frag_clusters >> list, cluster_alloc_swap_entry() assume order 0 allocation will never >> failed and will enter a deadloop by keep trying to allocate page from the >> only cluster in frag_clusters which contains no actually available page. >> >> Call setup_swap_extents() to get the final maxpages before swap_info_struct >> initialization to fix the issue. >> >> Fixes: 661383c6111a3 ("mm: swap: relaim the cached parts that got scanned") >> >> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> >> --- >> mm/swapfile.c | 47 ++++++++++++++++++++--------------------------- >> 1 file changed, 20 insertions(+), 27 deletions(-) >> >> diff --git a/mm/swapfile.c b/mm/swapfile.c >> index 75b69213c2e7..a82f4ebefca3 100644 >> --- a/mm/swapfile.c >> +++ b/mm/swapfile.c >> @@ -3141,43 +3141,30 @@ static unsigned long read_swap_header(struct swap_info_struct *si, >> return maxpages; >> } >> >> -static int setup_swap_map_and_extents(struct swap_info_struct *si, >> - union swap_header *swap_header, >> - unsigned char *swap_map, >> - unsigned long maxpages, >> - sector_t *span) >> +static int setup_swap_map(struct swap_info_struct *si, >> + union swap_header *swap_header, >> + unsigned char *swap_map, >> + unsigned long maxpages) >> { >> - unsigned int nr_good_pages; >> unsigned long i; >> - int nr_extents; >> - >> - nr_good_pages = maxpages - 1; /* omit header page */ >> >> + swap_map[0] = SWAP_MAP_BAD; /* omit header page */ >> for (i = 0; i < swap_header->info.nr_badpages; i++) { >> unsigned int page_nr = swap_header->info.badpages[i]; >> if (page_nr == 0 || page_nr > swap_header->info.last_page) >> return -EINVAL; >> if (page_nr < maxpages) { >> swap_map[page_nr] = SWAP_MAP_BAD; >> - nr_good_pages--; >> + si->pages--; >> } >> } >> >> - if (nr_good_pages) { >> - swap_map[0] = SWAP_MAP_BAD; >> - si->max = maxpages; >> - si->pages = nr_good_pages; >> - nr_extents = setup_swap_extents(si, span); >> - if (nr_extents < 0) >> - return nr_extents; >> - nr_good_pages = si->pages; >> - } >> - if (!nr_good_pages) { >> + if (!si->pages) { >> pr_warn("Empty swap-file\n"); >> return -EINVAL; >> } >> >> >> - return nr_extents; >> + return 0; >> } >> >> #define SWAP_CLUSTER_INFO_COLS \ >> @@ -3217,7 +3204,7 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si, >> * Mark unusable pages as unavailable. The clusters aren't >> * marked free yet, so no list operations are involved yet. >> * >> - * See setup_swap_map_and_extents(): header page, bad pages, >> + * See setup_swap_map(): header page, bad pages, >> * and the EOF part of the last cluster. >> */ >> inc_cluster_info_page(si, cluster_info, 0); >> @@ -3354,6 +3341,15 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags) >> goto bad_swap_unlock_inode; >> } >> >> + si->max = maxpages; >> + si->pages = maxpages - 1; >> + nr_extents = setup_swap_extents(si, &span); >> + if (nr_extents < 0) { >> + error = nr_extents; >> + goto bad_swap_unlock_inode; >> + } >> + maxpages = si->max; > Hello, > There seems to be a trivial problem here, previously the si->pages > will be seen by swap_activate after bad blocks have been counted and > si->pages means the actual available slots. But now si->pages will be > seen by swap_active as `maxpages - 1`. > > One current side effect now is the span value will not be updated > properly so the pr_info in swap on may print a larger value, if the > swap header contains badblocks and swapfile is on nfs/cifs. Thanks for point this out. But I think the larger value is actually correct result. In summary, there are two kinds of swapfile_activate operations. 1. Filesystem style: Treat all blocks logical continuity and find useable physical extents in logical range. In this way, si->pages will be actual useable physical blocks and span will be "1 + highest_block - lowest_block". 2. Block device style: Treat all blocks physically continue and only one single extent is added. In this way, si->pages will be si->max and span will be "si->pages - 1". Actually, si->pages and si->max is only used in block device style and span value is set with si->pages. As a result, span value in block device style will become a larger value as you mentioned. I think larger value is correct based on: 1. Span value in filesystem style is "1 + highest_block - lowest_block" which is the range cover all possible phisical blocks including the badblocks. 2. For block device style, si->pages is the actual useable block number and is already in pr_info. The orignal span value before this patch is also refer to useable block number which is redundant in pr_info. > > This should not be a problem but it's better to mention or add > comments about it I'd like to mention this change as a fix in changelog in next version. > > And I think it's better to add a sanity check here to check if > si->pages still equal to si->max - 1, setup_swap_map_and_extents / > setup_swap_map assumes the header section was already counted. This > also helps indicate the setup_swap_extents may shrink and modify these > two values. Sure, will add this in next version. > > BTW, I was thinking that we should get rid of the whole extents design > after the swap table series is ready, so mTHP allocation will be > usable for swap over fs too. I also noticed this limitation but have not taken a deep look. Look forward to your solution in future. > >> /* OK, set up the swap map and apply the bad block list */ >> swap_map = vzalloc(maxpages); >> if (!swap_map) { >> @@ -3365,12 +3361,9 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags) >> if (error) >> goto bad_swap_unlock_inode; >> >> - nr_extents = setup_swap_map_and_extents(si, swap_header, swap_map, >> - maxpages, &span); >> - if (unlikely(nr_extents < 0)) { >> - error = nr_extents; >> + error = setup_swap_map(si, swap_header, swap_map, maxpages); >> + if (error) >> goto bad_swap_unlock_inode; >> - } >> >> /* >> * Use kvmalloc_array instead of bitmap_zalloc as the allocation order might >> -- >> 2.30.0 >> > > Other than that: > > Reviewed-by: Kairui Song <kasong@tencent.com> > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/4] mm: swap: correctly use maxpages in swapon syscall to avoid potensial deadloop 2025-06-11 7:54 ` Kemeng Shi @ 2025-07-17 23:21 ` Andrew Morton 2025-07-18 6:12 ` Kemeng Shi 0 siblings, 1 reply; 20+ messages in thread From: Andrew Morton @ 2025-07-17 23:21 UTC (permalink / raw) To: Kemeng Shi; +Cc: Kairui Song, bhe, hannes, linux-mm, linux-kernel On Wed, 11 Jun 2025 15:54:21 +0800 Kemeng Shi <shikemeng@huaweicloud.com> wrote: > > > on 5/26/2025 1:08 AM, Kairui Song wrote: Nearly two months! > Sure, will add this in next version. Do we actually need a new version? Having rescanned the v1 review I'm inclined to merge this series as-is? ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/4] mm: swap: correctly use maxpages in swapon syscall to avoid potensial deadloop 2025-07-17 23:21 ` Andrew Morton @ 2025-07-18 6:12 ` Kemeng Shi 0 siblings, 0 replies; 20+ messages in thread From: Kemeng Shi @ 2025-07-18 6:12 UTC (permalink / raw) To: Andrew Morton; +Cc: Kairui Song, bhe, hannes, linux-mm, linux-kernel on 7/18/2025 7:21 AM, Andrew Morton wrote: > On Wed, 11 Jun 2025 15:54:21 +0800 Kemeng Shi <shikemeng@huaweicloud.com> wrote: > >> >> >> on 5/26/2025 1:08 AM, Kairui Song wrote: > > Nearly two months! > >> Sure, will add this in next version. > > Do we actually need a new version? Having rescanned the v1 review I'm > inclined to merge this series as-is? > > So sorry for the late. I will write and send a v2 version soon. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/4] mm: swap: correctly use maxpages in swapon syscall to avoid potensial deadloop 2025-05-22 12:25 ` [PATCH 2/4] mm: swap: correctly use maxpages in swapon syscall to avoid potensial deadloop Kemeng Shi 2025-05-25 17:08 ` Kairui Song @ 2025-05-30 2:50 ` Baoquan He 2025-06-11 8:27 ` Kemeng Shi 1 sibling, 1 reply; 20+ messages in thread From: Baoquan He @ 2025-05-30 2:50 UTC (permalink / raw) To: Kemeng Shi; +Cc: akpm, kasong, hannes, linux-mm, linux-kernel On 05/22/25 at 08:25pm, Kemeng Shi wrote: > We use maxpages from read_swap_header() to initialize swap_info_struct, > however the maxpages might be reduced in setup_swap_extents() and the > si->max is assigned with the reduced maxpages from the > setup_swap_extents(). > Obviously, this could lead to memory waste as we allocated memory based on > larger maxpages, besides, this could lead to a potensial deadloop as ^ typo, potential > following: > 1) When calling setup_clusters() with larger maxpages, unavailable pages > within range [si->max, larger maxpages) are not accounted with > inc_cluster_info_page(). As a result, these pages are assumed available > but can not be allocated. The cluster contains these pages can be moved > to frag_clusters list after it's all available pages were allocated. > 2) When the cluster mentioned in 1) is the only cluster in frag_clusters > list, cluster_alloc_swap_entry() assume order 0 allocation will never > failed and will enter a deadloop by keep trying to allocate page from the > only cluster in frag_clusters which contains no actually available page. > > Call setup_swap_extents() to get the final maxpages before swap_info_struct > initialization to fix the issue. > > Fixes: 661383c6111a3 ("mm: swap: relaim the cached parts that got scanned") > Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> > --- > mm/swapfile.c | 47 ++++++++++++++++++++--------------------------- > 1 file changed, 20 insertions(+), 27 deletions(-) Reviedwed-by: Baoquan He <bhe@redhat.com> > > diff --git a/mm/swapfile.c b/mm/swapfile.c > index 75b69213c2e7..a82f4ebefca3 100644 > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -3141,43 +3141,30 @@ static unsigned long read_swap_header(struct swap_info_struct *si, > return maxpages; > } > > -static int setup_swap_map_and_extents(struct swap_info_struct *si, > - union swap_header *swap_header, > - unsigned char *swap_map, > - unsigned long maxpages, > - sector_t *span) > +static int setup_swap_map(struct swap_info_struct *si, > + union swap_header *swap_header, > + unsigned char *swap_map, > + unsigned long maxpages) > { > - unsigned int nr_good_pages; > unsigned long i; > - int nr_extents; > - > - nr_good_pages = maxpages - 1; /* omit header page */ > > + swap_map[0] = SWAP_MAP_BAD; /* omit header page */ > for (i = 0; i < swap_header->info.nr_badpages; i++) { > unsigned int page_nr = swap_header->info.badpages[i]; > if (page_nr == 0 || page_nr > swap_header->info.last_page) > return -EINVAL; > if (page_nr < maxpages) { > swap_map[page_nr] = SWAP_MAP_BAD; > - nr_good_pages--; > + si->pages--; > } > } > > - if (nr_good_pages) { > - swap_map[0] = SWAP_MAP_BAD; > - si->max = maxpages; > - si->pages = nr_good_pages; > - nr_extents = setup_swap_extents(si, span); > - if (nr_extents < 0) > - return nr_extents; > - nr_good_pages = si->pages; > - } > - if (!nr_good_pages) { > + if (!si->pages) { > pr_warn("Empty swap-file\n"); > return -EINVAL; > } > > - return nr_extents; > + return 0; > } > > #define SWAP_CLUSTER_INFO_COLS \ > @@ -3217,7 +3204,7 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si, > * Mark unusable pages as unavailable. The clusters aren't > * marked free yet, so no list operations are involved yet. > * > - * See setup_swap_map_and_extents(): header page, bad pages, > + * See setup_swap_map(): header page, bad pages, > * and the EOF part of the last cluster. > */ > inc_cluster_info_page(si, cluster_info, 0); > @@ -3354,6 +3341,15 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags) > goto bad_swap_unlock_inode; > } > > + si->max = maxpages; > + si->pages = maxpages - 1; > + nr_extents = setup_swap_extents(si, &span); > + if (nr_extents < 0) { > + error = nr_extents; > + goto bad_swap_unlock_inode; > + } > + maxpages = si->max; > + > /* OK, set up the swap map and apply the bad block list */ > swap_map = vzalloc(maxpages); > if (!swap_map) { > @@ -3365,12 +3361,9 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags) > if (error) > goto bad_swap_unlock_inode; > > - nr_extents = setup_swap_map_and_extents(si, swap_header, swap_map, > - maxpages, &span); > - if (unlikely(nr_extents < 0)) { > - error = nr_extents; > + error = setup_swap_map(si, swap_header, swap_map, maxpages); > + if (error) > goto bad_swap_unlock_inode; > - } > > /* > * Use kvmalloc_array instead of bitmap_zalloc as the allocation order might > -- > 2.30.0 > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/4] mm: swap: correctly use maxpages in swapon syscall to avoid potensial deadloop 2025-05-30 2:50 ` Baoquan He @ 2025-06-11 8:27 ` Kemeng Shi 0 siblings, 0 replies; 20+ messages in thread From: Kemeng Shi @ 2025-06-11 8:27 UTC (permalink / raw) To: Baoquan He; +Cc: akpm, kasong, hannes, linux-mm, linux-kernel on 5/30/2025 10:50 AM, Baoquan He wrote: > On 05/22/25 at 08:25pm, Kemeng Shi wrote: >> We use maxpages from read_swap_header() to initialize swap_info_struct, >> however the maxpages might be reduced in setup_swap_extents() and the >> si->max is assigned with the reduced maxpages from the >> setup_swap_extents(). >> Obviously, this could lead to memory waste as we allocated memory based on >> larger maxpages, besides, this could lead to a potensial deadloop as > ^ typo, potential Thanks, will fix this in next version. >> following: >> 1) When calling setup_clusters() with larger maxpages, unavailable pages >> within range [si->max, larger maxpages) are not accounted with >> inc_cluster_info_page(). As a result, these pages are assumed available >> but can not be allocated. The cluster contains these pages can be moved >> to frag_clusters list after it's all available pages were allocated. >> 2) When the cluster mentioned in 1) is the only cluster in frag_clusters >> list, cluster_alloc_swap_entry() assume order 0 allocation will never >> failed and will enter a deadloop by keep trying to allocate page from the >> only cluster in frag_clusters which contains no actually available page. >> >> Call setup_swap_extents() to get the final maxpages before swap_info_struct >> initialization to fix the issue. >> >> Fixes: 661383c6111a3 ("mm: swap: relaim the cached parts that got scanned") >> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> >> --- >> mm/swapfile.c | 47 ++++++++++++++++++++--------------------------- >> 1 file changed, 20 insertions(+), 27 deletions(-) > > Reviedwed-by: Baoquan He <bhe@redhat.com> > >> >> diff --git a/mm/swapfile.c b/mm/swapfile.c >> index 75b69213c2e7..a82f4ebefca3 100644 >> --- a/mm/swapfile.c >> +++ b/mm/swapfile.c >> @@ -3141,43 +3141,30 @@ static unsigned long read_swap_header(struct swap_info_struct *si, >> return maxpages; >> } >> >> -static int setup_swap_map_and_extents(struct swap_info_struct *si, >> - union swap_header *swap_header, >> - unsigned char *swap_map, >> - unsigned long maxpages, >> - sector_t *span) >> +static int setup_swap_map(struct swap_info_struct *si, >> + union swap_header *swap_header, >> + unsigned char *swap_map, >> + unsigned long maxpages) >> { >> - unsigned int nr_good_pages; >> unsigned long i; >> - int nr_extents; >> - >> - nr_good_pages = maxpages - 1; /* omit header page */ >> >> + swap_map[0] = SWAP_MAP_BAD; /* omit header page */ >> for (i = 0; i < swap_header->info.nr_badpages; i++) { >> unsigned int page_nr = swap_header->info.badpages[i]; >> if (page_nr == 0 || page_nr > swap_header->info.last_page) >> return -EINVAL; >> if (page_nr < maxpages) { >> swap_map[page_nr] = SWAP_MAP_BAD; >> - nr_good_pages--; >> + si->pages--; >> } >> } >> >> - if (nr_good_pages) { >> - swap_map[0] = SWAP_MAP_BAD; >> - si->max = maxpages; >> - si->pages = nr_good_pages; >> - nr_extents = setup_swap_extents(si, span); >> - if (nr_extents < 0) >> - return nr_extents; >> - nr_good_pages = si->pages; >> - } >> - if (!nr_good_pages) { >> + if (!si->pages) { >> pr_warn("Empty swap-file\n"); >> return -EINVAL; >> } >> >> - return nr_extents; >> + return 0; >> } >> >> #define SWAP_CLUSTER_INFO_COLS \ >> @@ -3217,7 +3204,7 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si, >> * Mark unusable pages as unavailable. The clusters aren't >> * marked free yet, so no list operations are involved yet. >> * >> - * See setup_swap_map_and_extents(): header page, bad pages, >> + * See setup_swap_map(): header page, bad pages, >> * and the EOF part of the last cluster. >> */ >> inc_cluster_info_page(si, cluster_info, 0); >> @@ -3354,6 +3341,15 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags) >> goto bad_swap_unlock_inode; >> } >> >> + si->max = maxpages; >> + si->pages = maxpages - 1; >> + nr_extents = setup_swap_extents(si, &span); >> + if (nr_extents < 0) { >> + error = nr_extents; >> + goto bad_swap_unlock_inode; >> + } >> + maxpages = si->max; >> + >> /* OK, set up the swap map and apply the bad block list */ >> swap_map = vzalloc(maxpages); >> if (!swap_map) { >> @@ -3365,12 +3361,9 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags) >> if (error) >> goto bad_swap_unlock_inode; >> >> - nr_extents = setup_swap_map_and_extents(si, swap_header, swap_map, >> - maxpages, &span); >> - if (unlikely(nr_extents < 0)) { >> - error = nr_extents; >> + error = setup_swap_map(si, swap_header, swap_map, maxpages); >> + if (error) >> goto bad_swap_unlock_inode; >> - } >> >> /* >> * Use kvmalloc_array instead of bitmap_zalloc as the allocation order might >> -- >> 2.30.0 >> > ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 3/4] mm: swap: fix potensial buffer overflow in setup_clusters() 2025-05-22 12:25 [PATCH 0/4] Some randome fixes and cleanups to swapfile Kemeng Shi 2025-05-22 12:25 ` [PATCH 1/4] mm: swap: move nr_swap_pages counter decrement from folio_alloc_swap() to swap_range_alloc() Kemeng Shi 2025-05-22 12:25 ` [PATCH 2/4] mm: swap: correctly use maxpages in swapon syscall to avoid potensial deadloop Kemeng Shi @ 2025-05-22 12:25 ` Kemeng Shi 2025-05-25 18:44 ` Kairui Song 2025-05-30 2:56 ` Baoquan He 2025-05-22 12:25 ` [PATCH 4/4] mm: swap: remove stale comment stale comment in cluster_alloc_swap_entry() Kemeng Shi 2025-05-22 21:41 ` [PATCH 0/4] Some randome fixes and cleanups to swapfile Andrew Morton 4 siblings, 2 replies; 20+ messages in thread From: Kemeng Shi @ 2025-05-22 12:25 UTC (permalink / raw) To: akpm; +Cc: kasong, bhe, hannes, linux-mm, linux-kernel In setup_swap_map(), we only ensure badpages are in range (0, last_page]. As maxpages might be < last_page, setup_clusters() will encounter a buffer overflow when a badpage is >= maxpages. Only call inc_cluster_info_page() for badpage which is < maxpages to fix the issue. Fixes: b843786b0bd01 ("mm: swapfile: fix SSD detection with swapfile on btrfs") Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> --- mm/swapfile.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/mm/swapfile.c b/mm/swapfile.c index a82f4ebefca3..63ab9f14b2c6 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -3208,9 +3208,13 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si, * and the EOF part of the last cluster. */ inc_cluster_info_page(si, cluster_info, 0); - for (i = 0; i < swap_header->info.nr_badpages; i++) - inc_cluster_info_page(si, cluster_info, - swap_header->info.badpages[i]); + for (i = 0; i < swap_header->info.nr_badpages; i++) { + unsigned int page_nr = swap_header->info.badpages[i]; + + if (page_nr >= maxpages) + continue; + inc_cluster_info_page(si, cluster_info, page_nr); + } for (i = maxpages; i < round_up(maxpages, SWAPFILE_CLUSTER); i++) inc_cluster_info_page(si, cluster_info, i); -- 2.30.0 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH 3/4] mm: swap: fix potensial buffer overflow in setup_clusters() 2025-05-22 12:25 ` [PATCH 3/4] mm: swap: fix potensial buffer overflow in setup_clusters() Kemeng Shi @ 2025-05-25 18:44 ` Kairui Song 2025-05-30 2:55 ` Baoquan He 2025-06-11 8:27 ` Kemeng Shi 2025-05-30 2:56 ` Baoquan He 1 sibling, 2 replies; 20+ messages in thread From: Kairui Song @ 2025-05-25 18:44 UTC (permalink / raw) To: Kemeng Shi; +Cc: akpm, bhe, hannes, linux-mm, linux-kernel On Thu, May 22, 2025 at 11:32 AM Kemeng Shi <shikemeng@huaweicloud.com> wrote: > > In setup_swap_map(), we only ensure badpages are in range (0, last_page]. > As maxpages might be < last_page, setup_clusters() will encounter a > buffer overflow when a badpage is >= maxpages. > Only call inc_cluster_info_page() for badpage which is < maxpages to > fix the issue. > > Fixes: b843786b0bd01 ("mm: swapfile: fix SSD detection with swapfile on btrfs") > Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> > --- > mm/swapfile.c | 10 +++++++--- > 1 file changed, 7 insertions(+), 3 deletions(-) > > diff --git a/mm/swapfile.c b/mm/swapfile.c > index a82f4ebefca3..63ab9f14b2c6 100644 > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -3208,9 +3208,13 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si, > * and the EOF part of the last cluster. > */ > inc_cluster_info_page(si, cluster_info, 0); > - for (i = 0; i < swap_header->info.nr_badpages; i++) > - inc_cluster_info_page(si, cluster_info, > - swap_header->info.badpages[i]); > + for (i = 0; i < swap_header->info.nr_badpages; i++) { > + unsigned int page_nr = swap_header->info.badpages[i]; > + > + if (page_nr >= maxpages) > + continue; > + inc_cluster_info_page(si, cluster_info, page_nr); I think we might need a pr_err or pr_warn here, this means mkswap marked the wrong region as a bad block? Or some fs side things went wrong. > + } > for (i = maxpages; i < round_up(maxpages, SWAPFILE_CLUSTER); i++) > inc_cluster_info_page(si, cluster_info, i); > > -- > 2.30.0 > > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 3/4] mm: swap: fix potensial buffer overflow in setup_clusters() 2025-05-25 18:44 ` Kairui Song @ 2025-05-30 2:55 ` Baoquan He 2025-06-11 8:27 ` Kemeng Shi 1 sibling, 0 replies; 20+ messages in thread From: Baoquan He @ 2025-05-30 2:55 UTC (permalink / raw) To: Kairui Song; +Cc: Kemeng Shi, akpm, hannes, linux-mm, linux-kernel On 05/26/25 at 02:44am, Kairui Song wrote: > On Thu, May 22, 2025 at 11:32 AM Kemeng Shi <shikemeng@huaweicloud.com> wrote: > > > > In setup_swap_map(), we only ensure badpages are in range (0, last_page]. > > As maxpages might be < last_page, setup_clusters() will encounter a > > buffer overflow when a badpage is >= maxpages. > > Only call inc_cluster_info_page() for badpage which is < maxpages to > > fix the issue. > > > > Fixes: b843786b0bd01 ("mm: swapfile: fix SSD detection with swapfile on btrfs") > > Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> > > --- > > mm/swapfile.c | 10 +++++++--- > > 1 file changed, 7 insertions(+), 3 deletions(-) > > > > diff --git a/mm/swapfile.c b/mm/swapfile.c > > index a82f4ebefca3..63ab9f14b2c6 100644 > > --- a/mm/swapfile.c > > +++ b/mm/swapfile.c > > @@ -3208,9 +3208,13 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si, > > * and the EOF part of the last cluster. > > */ > > inc_cluster_info_page(si, cluster_info, 0); > > - for (i = 0; i < swap_header->info.nr_badpages; i++) > > - inc_cluster_info_page(si, cluster_info, > > - swap_header->info.badpages[i]); > > + for (i = 0; i < swap_header->info.nr_badpages; i++) { > > + unsigned int page_nr = swap_header->info.badpages[i]; > > + > > + if (page_nr >= maxpages) > > + continue; > > + inc_cluster_info_page(si, cluster_info, page_nr); > > I think we might need a pr_err or pr_warn here, this means mkswap > marked the wrong region as a bad block? Or some fs side things went > wrong. There's aready warning in read_swap_header(): static unsigned long read_swap_header(struct swap_info_struct *si, union swap_header *swap_header, struct inode *inode) { ...... if (last_page > maxpages) { pr_warn("Truncating oversized swap area, only using %luk out of %luk\n", K(maxpages), K(last_page)); } ... } And if we add pr_err|warn here, we also need add it in setup_swap_map() when filling swap_map. > > > > + } > > for (i = maxpages; i < round_up(maxpages, SWAPFILE_CLUSTER); i++) > > inc_cluster_info_page(si, cluster_info, i); > > > > -- > > 2.30.0 > > > > > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 3/4] mm: swap: fix potensial buffer overflow in setup_clusters() 2025-05-25 18:44 ` Kairui Song 2025-05-30 2:55 ` Baoquan He @ 2025-06-11 8:27 ` Kemeng Shi 1 sibling, 0 replies; 20+ messages in thread From: Kemeng Shi @ 2025-06-11 8:27 UTC (permalink / raw) To: Kairui Song; +Cc: akpm, bhe, hannes, linux-mm, linux-kernel on 5/26/2025 2:44 AM, Kairui Song wrote: > On Thu, May 22, 2025 at 11:32 AM Kemeng Shi <shikemeng@huaweicloud.com> wrote: >> >> In setup_swap_map(), we only ensure badpages are in range (0, last_page]. >> As maxpages might be < last_page, setup_clusters() will encounter a >> buffer overflow when a badpage is >= maxpages. >> Only call inc_cluster_info_page() for badpage which is < maxpages to >> fix the issue. >> >> Fixes: b843786b0bd01 ("mm: swapfile: fix SSD detection with swapfile on btrfs") >> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> >> --- >> mm/swapfile.c | 10 +++++++--- >> 1 file changed, 7 insertions(+), 3 deletions(-) >> >> diff --git a/mm/swapfile.c b/mm/swapfile.c >> index a82f4ebefca3..63ab9f14b2c6 100644 >> --- a/mm/swapfile.c >> +++ b/mm/swapfile.c >> @@ -3208,9 +3208,13 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si, >> * and the EOF part of the last cluster. >> */ >> inc_cluster_info_page(si, cluster_info, 0); >> - for (i = 0; i < swap_header->info.nr_badpages; i++) >> - inc_cluster_info_page(si, cluster_info, >> - swap_header->info.badpages[i]); >> + for (i = 0; i < swap_header->info.nr_badpages; i++) { >> + unsigned int page_nr = swap_header->info.badpages[i]; >> + >> + if (page_nr >= maxpages) >> + continue; >> + inc_cluster_info_page(si, cluster_info, page_nr); > > I think we might need a pr_err or pr_warn here, this means mkswap > marked the wrong region as a bad block? Or some fs side things went > wrong. As Baoquan metioned that there is already warning in read_swap_header(). Besides, I think last_page in swap_header already indicates the range which is acceptable to fs and we only need to explicitly handle page_nr which is > last_page. > > >> + } >> for (i = maxpages; i < round_up(maxpages, SWAPFILE_CLUSTER); i++) >> inc_cluster_info_page(si, cluster_info, i); >> >> -- >> 2.30.0 >> >> > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 3/4] mm: swap: fix potensial buffer overflow in setup_clusters() 2025-05-22 12:25 ` [PATCH 3/4] mm: swap: fix potensial buffer overflow in setup_clusters() Kemeng Shi 2025-05-25 18:44 ` Kairui Song @ 2025-05-30 2:56 ` Baoquan He 1 sibling, 0 replies; 20+ messages in thread From: Baoquan He @ 2025-05-30 2:56 UTC (permalink / raw) To: Kemeng Shi; +Cc: akpm, kasong, hannes, linux-mm, linux-kernel On 05/22/25 at 08:25pm, Kemeng Shi wrote: > In setup_swap_map(), we only ensure badpages are in range (0, last_page]. > As maxpages might be < last_page, setup_clusters() will encounter a > buffer overflow when a badpage is >= maxpages. > Only call inc_cluster_info_page() for badpage which is < maxpages to > fix the issue. > > Fixes: b843786b0bd01 ("mm: swapfile: fix SSD detection with swapfile on btrfs") > Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> > --- > mm/swapfile.c | 10 +++++++--- > 1 file changed, 7 insertions(+), 3 deletions(-) Reviewed-by: Baoquan He <bhe@redhat.com> > > diff --git a/mm/swapfile.c b/mm/swapfile.c > index a82f4ebefca3..63ab9f14b2c6 100644 > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -3208,9 +3208,13 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si, > * and the EOF part of the last cluster. > */ > inc_cluster_info_page(si, cluster_info, 0); > - for (i = 0; i < swap_header->info.nr_badpages; i++) > - inc_cluster_info_page(si, cluster_info, > - swap_header->info.badpages[i]); > + for (i = 0; i < swap_header->info.nr_badpages; i++) { > + unsigned int page_nr = swap_header->info.badpages[i]; > + > + if (page_nr >= maxpages) > + continue; > + inc_cluster_info_page(si, cluster_info, page_nr); > + } > for (i = maxpages; i < round_up(maxpages, SWAPFILE_CLUSTER); i++) > inc_cluster_info_page(si, cluster_info, i); > > -- > 2.30.0 > ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 4/4] mm: swap: remove stale comment stale comment in cluster_alloc_swap_entry() 2025-05-22 12:25 [PATCH 0/4] Some randome fixes and cleanups to swapfile Kemeng Shi ` (2 preceding siblings ...) 2025-05-22 12:25 ` [PATCH 3/4] mm: swap: fix potensial buffer overflow in setup_clusters() Kemeng Shi @ 2025-05-22 12:25 ` Kemeng Shi 2025-05-25 17:05 ` Kairui Song 2025-05-30 5:24 ` Baoquan He 2025-05-22 21:41 ` [PATCH 0/4] Some randome fixes and cleanups to swapfile Andrew Morton 4 siblings, 2 replies; 20+ messages in thread From: Kemeng Shi @ 2025-05-22 12:25 UTC (permalink / raw) To: akpm; +Cc: kasong, bhe, hannes, linux-mm, linux-kernel As cluster_next_cpu was already dropped, the associated comment is stale now. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> --- mm/swapfile.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/mm/swapfile.c b/mm/swapfile.c index 63ab9f14b2c6..8525515fb06c 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -956,9 +956,8 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o } /* - * We don't have free cluster but have some clusters in - * discarding, do discard now and reclaim them, then - * reread cluster_next_cpu since we dropped si->lock + * We don't have free cluster but have some clusters in discarding, + * do discard now and reclaim them. */ if ((si->flags & SWP_PAGE_DISCARD) && swap_do_scheduled_discard(si)) goto new_cluster; -- 2.30.0 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH 4/4] mm: swap: remove stale comment stale comment in cluster_alloc_swap_entry() 2025-05-22 12:25 ` [PATCH 4/4] mm: swap: remove stale comment stale comment in cluster_alloc_swap_entry() Kemeng Shi @ 2025-05-25 17:05 ` Kairui Song 2025-05-30 5:24 ` Baoquan He 1 sibling, 0 replies; 20+ messages in thread From: Kairui Song @ 2025-05-25 17:05 UTC (permalink / raw) To: Kemeng Shi; +Cc: akpm, bhe, hannes, linux-mm, linux-kernel On Thu, May 22, 2025 at 11:32 AM Kemeng Shi <shikemeng@huaweicloud.com> wrote: > > As cluster_next_cpu was already dropped, the associated comment is stale > now. > > Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> > --- > mm/swapfile.c | 5 ++--- > 1 file changed, 2 insertions(+), 3 deletions(-) > > diff --git a/mm/swapfile.c b/mm/swapfile.c > index 63ab9f14b2c6..8525515fb06c 100644 > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -956,9 +956,8 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o > } > > /* > - * We don't have free cluster but have some clusters in > - * discarding, do discard now and reclaim them, then > - * reread cluster_next_cpu since we dropped si->lock > + * We don't have free cluster but have some clusters in discarding, > + * do discard now and reclaim them. > */ > if ((si->flags & SWP_PAGE_DISCARD) && swap_do_scheduled_discard(si)) > goto new_cluster; > -- > 2.30.0 > Nice. Reviewed-by: Kairui Song <kasong@tencent.com> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 4/4] mm: swap: remove stale comment stale comment in cluster_alloc_swap_entry() 2025-05-22 12:25 ` [PATCH 4/4] mm: swap: remove stale comment stale comment in cluster_alloc_swap_entry() Kemeng Shi 2025-05-25 17:05 ` Kairui Song @ 2025-05-30 5:24 ` Baoquan He 1 sibling, 0 replies; 20+ messages in thread From: Baoquan He @ 2025-05-30 5:24 UTC (permalink / raw) To: Kemeng Shi; +Cc: akpm, kasong, hannes, linux-mm, linux-kernel On 05/22/25 at 08:25pm, Kemeng Shi wrote: > As cluster_next_cpu was already dropped, the associated comment is stale > now. > > Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> > --- > mm/swapfile.c | 5 ++--- > 1 file changed, 2 insertions(+), 3 deletions(-) Reviewed-by: Baoquan He <bhe@redhat.com> > > diff --git a/mm/swapfile.c b/mm/swapfile.c > index 63ab9f14b2c6..8525515fb06c 100644 > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -956,9 +956,8 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o > } > > /* > - * We don't have free cluster but have some clusters in > - * discarding, do discard now and reclaim them, then > - * reread cluster_next_cpu since we dropped si->lock > + * We don't have free cluster but have some clusters in discarding, > + * do discard now and reclaim them. > */ > if ((si->flags & SWP_PAGE_DISCARD) && swap_do_scheduled_discard(si)) > goto new_cluster; > -- > 2.30.0 > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 0/4] Some randome fixes and cleanups to swapfile 2025-05-22 12:25 [PATCH 0/4] Some randome fixes and cleanups to swapfile Kemeng Shi ` (3 preceding siblings ...) 2025-05-22 12:25 ` [PATCH 4/4] mm: swap: remove stale comment stale comment in cluster_alloc_swap_entry() Kemeng Shi @ 2025-05-22 21:41 ` Andrew Morton 4 siblings, 0 replies; 20+ messages in thread From: Andrew Morton @ 2025-05-22 21:41 UTC (permalink / raw) To: Kemeng Shi; +Cc: kasong, bhe, hannes, linux-mm, linux-kernel On Thu, 22 May 2025 20:25:50 +0800 Kemeng Shi <shikemeng@huaweicloud.com> wrote: > Patch 0-3 are some random fixes. Patch 4 is a cleanup. More details can > be found in respective patches. Thanks. Cool. I'll add these to mm-new but I won't advance them to mm-unstable until after the merge window, so we have 2+ weeks for review and test. I added cc:stable to patches 1-3, so they'll eventually find their way into earlier kernels. ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2025-07-18 6:13 UTC | newest] Thread overview: 20+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-05-22 12:25 [PATCH 0/4] Some randome fixes and cleanups to swapfile Kemeng Shi 2025-05-22 12:25 ` [PATCH 1/4] mm: swap: move nr_swap_pages counter decrement from folio_alloc_swap() to swap_range_alloc() Kemeng Shi 2025-05-22 3:55 ` Kairui Song 2025-05-30 1:31 ` Baoquan He 2025-05-22 12:25 ` [PATCH 2/4] mm: swap: correctly use maxpages in swapon syscall to avoid potensial deadloop Kemeng Shi 2025-05-25 17:08 ` Kairui Song 2025-06-11 7:54 ` Kemeng Shi 2025-07-17 23:21 ` Andrew Morton 2025-07-18 6:12 ` Kemeng Shi 2025-05-30 2:50 ` Baoquan He 2025-06-11 8:27 ` Kemeng Shi 2025-05-22 12:25 ` [PATCH 3/4] mm: swap: fix potensial buffer overflow in setup_clusters() Kemeng Shi 2025-05-25 18:44 ` Kairui Song 2025-05-30 2:55 ` Baoquan He 2025-06-11 8:27 ` Kemeng Shi 2025-05-30 2:56 ` Baoquan He 2025-05-22 12:25 ` [PATCH 4/4] mm: swap: remove stale comment stale comment in cluster_alloc_swap_entry() Kemeng Shi 2025-05-25 17:05 ` Kairui Song 2025-05-30 5:24 ` Baoquan He 2025-05-22 21:41 ` [PATCH 0/4] Some randome fixes and cleanups to swapfile Andrew Morton
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).