* [RFC PATCH] mm: filemap: fix nr_pages calculation overflow in filemap_map_pages()
@ 2026-03-13 3:45 Baolin Wang
2026-03-13 5:11 ` Dev Jain
2026-03-16 14:06 ` David Hildenbrand (Arm)
0 siblings, 2 replies; 9+ messages in thread
From: Baolin Wang @ 2026-03-13 3:45 UTC (permalink / raw)
To: akpm, willy
Cc: david, lorenzo.stoakes, kas, p.raghav, mcgrof, dhowells, djwong,
hare, da.gomez, dchinner, brauner, baolin.wang, xiangzao,
linux-fsdevel, linux-mm, linux-kernel
When running stress-ng on my Arm64 machine with a v7.0-rc3 kernel, I encountered
some very strange crashes showing up as "Bad page state":
"
[ 734.496287] BUG: Bad page state in process stress-ng-env pfn:415735fb
[ 734.496427] page: refcount:0 mapcount:1 mapping:0000000000000000 index:0x4cf316 pfn:0x415735fb
[ 734.496434] flags: 0x57fffe000000800(owner_2|node=1|zone=2|lastcpupid=0x3ffff)
[ 734.496439] raw: 057fffe000000800 0000000000000000 dead000000000122 0000000000000000
[ 734.496440] raw: 00000000004cf316 0000000000000000 0000000000000000 0000000000000000
[ 734.496442] page dumped because: nonzero mapcount
"
After analyzing this page's state, it is hard to understand why the mapcount
is not 0 while the refcount is 0, since this page is not where the issue first
occurred. By enabling CONFIG_DEBUG_VM, I could reproduce the crash as well and
capture the first warning at the point where the issue appears:
"
[ 734.469226] page: refcount:33 mapcount:0 mapping:00000000bef2d187 index:0x81a0 pfn:0x415735c0
[ 734.469304] head: order:5 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[ 734.469315] memcg:ffff000807a8ec00
[ 734.469320] aops:ext4_da_aops ino:100b6f dentry name(?):"stress-ng-mmaptorture-9397-0-2736200540"
[ 734.469335] flags: 0x57fffe400000069(locked|uptodate|lru|head|node=1|zone=2|lastcpupid=0x3ffff)
......
[ 734.469364] page dumped because: VM_WARN_ON_FOLIO((_Generic((page + nr_pages - 1),
const struct page *: (const struct folio *)_compound_head(page + nr_pages - 1), struct page *:
(struct folio *)_compound_head(page + nr_pages - 1))) != folio)
[ 734.469390] ------------[ cut here ]------------
[ 734.469393] WARNING: ./include/linux/rmap.h:351 at folio_add_file_rmap_ptes+0x3b8/0x468,
CPU#90: stress-ng-mlock/9430
[ 734.469551] folio_add_file_rmap_ptes+0x3b8/0x468 (P)
[ 734.469555] set_pte_range+0xd8/0x2f8
[ 734.469566] filemap_map_folio_range+0x190/0x400
[ 734.469579] filemap_map_pages+0x348/0x638
[ 734.469583] do_fault_around+0x140/0x198
......
[ 734.469640] el0t_64_sync+0x184/0x188
"
The code that triggers the warning is: "VM_WARN_ON_FOLIO(page_folio(page + nr_pages - 1) != folio, folio)",
which indicates that set_pte_range() tried to map beyond the large folio’s
size.
By adding more debug information, I found that 'nr_pages' had overflowed in
filemap_map_pages(), causing set_pte_range() to establish mappings for a range
exceeding the folio size, potentially corrupting fields of pages that do not
belong to this folio (e.g., page->_mapcount).
After the above analysis, I think the possible race is as follows:
CPU 0                                            CPU 1
filemap_map_pages()                              ext4_setattr()
 //get and lock folio with old inode->i_size
 next_uptodate_folio()
 .......
                                                  //shrink the inode->i_size
                                                  i_size_write(inode, attr->ia_size);

 //calculate end_pgoff with the new inode->i_size
 file_end = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE) - 1;
 end_pgoff = min(end_pgoff, file_end);
 ......
 //nr_pages can overflow, because xas.xa_index > end_pgoff
 end = folio_next_index(folio) - 1;
 nr_pages = min(end, end_pgoff) - xas.xa_index + 1;
 ......
 //map the large folio
 filemap_map_folio_range()
                                                  ......
                                                  //truncate folios
                                                  truncate_pagecache(inode, inode->i_size);
To fix this issue, move the 'end_pgoff' calculation before next_uptodate_folio(),
so that the retrieved folio stays consistent with the file end and the
'nr_pages' calculation can no longer overflow. After this patch, the crash is gone.
Fixes: 743a2753a02e ("filemap: cap PTE range to be created to allowed zero fill in folio_map_range()")
Reported-by: Yuanhe Shu <xiangzao@linux.alibaba.com>
Tested-by: Yuanhe Shu <xiangzao@linux.alibaba.com>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
mm/filemap.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/mm/filemap.c b/mm/filemap.c
index bc6775084744..923d28e59642 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3879,14 +3879,14 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 	unsigned int nr_pages = 0, folio_type;
 	unsigned short mmap_miss = 0, mmap_miss_saved;
 
+	file_end = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE) - 1;
+	end_pgoff = min(end_pgoff, file_end);
+
 	rcu_read_lock();
 	folio = next_uptodate_folio(&xas, mapping, end_pgoff);
 	if (!folio)
 		goto out;
 
-	file_end = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE) - 1;
-	end_pgoff = min(end_pgoff, file_end);
-
 	/*
 	 * Do not allow to map with PMD across i_size to preserve
 	 * SIGBUS semantics.
--
2.47.3
^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH] mm: filemap: fix nr_pages calculation overflow in filemap_map_pages()
  2026-03-13 3:45 [RFC PATCH] mm: filemap: fix nr_pages calculation overflow in filemap_map_pages() Baolin Wang
@ 2026-03-13 5:11 ` Dev Jain
  2026-03-13 5:14   ` Dev Jain
  2026-03-16 14:06 ` David Hildenbrand (Arm)
  1 sibling, 1 reply; 9+ messages in thread
From: Dev Jain @ 2026-03-13 5:11 UTC (permalink / raw)
To: Baolin Wang, akpm, willy
Cc: david, lorenzo.stoakes, kas, p.raghav, mcgrof, dhowells, djwong,
    hare, da.gomez, dchinner, brauner, xiangzao, linux-fsdevel,
    linux-mm, linux-kernel

On 13/03/26 9:15 am, Baolin Wang wrote:
> When running stress-ng on my Arm64 machine with v7.0-rc3 kernel, I encountered
> some very strange crash issues showing up as "Bad page state":
[...]
>  	rcu_read_lock();
>  	folio = next_uptodate_folio(&xas, mapping, end_pgoff);
>  	if (!folio)
>  		goto out;
>  
> -	file_end = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE) - 1;
> -	end_pgoff = min(end_pgoff, file_end);
> -
>  	/*
>  	 * Do not allow to map with PMD across i_size to preserve
>  	 * SIGBUS semantics.

I am wondering whether something similar can happen in the do-while loop
below this code. We can retrieve a folio from next_uptodate_folio(), and
then a massive truncate happens and we end up mapping a large folio into
the page tables beyond i_size, violating SIGBUS semantics. (Truncation
may back off seeing the locked folio/increased refcount in
filemap_map_pages().)

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [RFC PATCH] mm: filemap: fix nr_pages calculation overflow in filemap_map_pages()
  2026-03-13 5:11 ` Dev Jain
@ 2026-03-13 5:14   ` Dev Jain
  2026-03-13 5:54     ` Baolin Wang
  0 siblings, 1 reply; 9+ messages in thread
From: Dev Jain @ 2026-03-13 5:14 UTC (permalink / raw)
To: Baolin Wang, akpm, willy
Cc: david, lorenzo.stoakes, kas, p.raghav, mcgrof, dhowells, djwong,
    hare, da.gomez, dchinner, brauner, xiangzao, linux-fsdevel,
    linux-mm, linux-kernel

On 13/03/26 10:41 am, Dev Jain wrote:
[...]
> I am wondering whether something similar can happen in the do-while loop
> below this code. We can retrieve a folio from next_uptodate_folio, and
> then a massive truncate happens and we end up mapping a large folio
> into the pagetables beyond i_size, violating SIGBUS semantics. (truncation
> may back-off seeing the locked folio/increased refcount in filemap_map_pages)

Read the bracket text as - (truncation may fail to unmap this folio seeing
it locked or with elevated refcount, therefore the illegal mapping stays
permanent)

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [RFC PATCH] mm: filemap: fix nr_pages calculation overflow in filemap_map_pages()
  2026-03-13 5:14   ` Dev Jain
@ 2026-03-13 5:54     ` Baolin Wang
  2026-03-16 12:00       ` Kiryl Shutsemau
  0 siblings, 1 reply; 9+ messages in thread
From: Baolin Wang @ 2026-03-13 5:54 UTC (permalink / raw)
To: Dev Jain, akpm, willy
Cc: david, lorenzo.stoakes, kas, p.raghav, mcgrof, dhowells, djwong,
    hare, da.gomez, dchinner, brauner, xiangzao, linux-fsdevel,
    linux-mm, linux-kernel

On 3/13/26 1:14 PM, Dev Jain wrote:
[...]
>> I am wondering whether something similar can happen in the do-while loop
>> below this code. We can retrieve a folio from next_uptodate_folio, and
>> then a massive truncate happens and we end up mapping a large folio
>> into the pagetables beyond i_size, violating SIGBUS semantics. (truncation
>> may back-off seeing the locked folio/increased refcount in filemap_map_pages)
>
> Read the bracket text as - (truncation may fail to unmap this folio seeing
> it locked or with elevated refcount, therefore the illegal mapping stays
> permanent)

IMHO, truncate_pagecache() will call unmap_mapping_range() twice, and the
folio lock and refcount will not prevent unmap_mapping_range() from
unmapping the folio (it only takes the PTE lock).

So truncate_pagecache() can still truncate large folios beyond i_size.

^ permalink raw reply	[flat|nested] 9+ messages in thread
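For context on the point above, truncate_pagecache() in mm/truncate.c looks roughly like the sketch below (a paraphrase from memory, not a verbatim copy; details may differ by kernel version). Neither unmap_mapping_range() call waits on the folio lock, which is why the locked folio in filemap_map_pages() does not stop the truncation from unmapping it:

```
void truncate_pagecache(struct inode *inode, loff_t newsize)
{
	struct address_space *mapping = inode->i_mapping;
	loff_t holebegin = round_up(newsize, PAGE_SIZE);

	/*
	 * The first unmap_mapping_range() is an efficiency measure so
	 * that truncate_inode_pages() does fewer single-page unmaps;
	 * the second is needed for correctness, to catch pages mapped
	 * (or COWed) in the window between the two calls.
	 */
	unmap_mapping_range(mapping, holebegin, 0, 1);
	truncate_inode_pages(mapping, newsize);
	unmap_mapping_range(mapping, holebegin, 0, 1);
}
```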
* Re: [RFC PATCH] mm: filemap: fix nr_pages calculation overflow in filemap_map_pages()
  2026-03-13 5:54     ` Baolin Wang
@ 2026-03-16 12:00       ` Kiryl Shutsemau
  2026-03-17 1:04         ` Baolin Wang
  0 siblings, 1 reply; 9+ messages in thread
From: Kiryl Shutsemau @ 2026-03-16 12:00 UTC (permalink / raw)
To: Baolin Wang
Cc: Dev Jain, akpm, willy, david, lorenzo.stoakes, p.raghav, mcgrof,
    dhowells, djwong, hare, da.gomez, dchinner, brauner, xiangzao,
    linux-fsdevel, linux-mm, linux-kernel

On Fri, Mar 13, 2026 at 01:54:31PM +0800, Baolin Wang wrote:
[...]
> IMHO, the truncate_pagecache() will call unmap_mapping_range() twice, and
> the folio lock and refcount will not block unmap_mapping_range() to unmap
> the folio's mapping (only hold ptl lock).
>
> So the truncate_pagecache() can still truncate large folios beyond i_size.

Yeah, we serialize here on the folio lock. It should be safe.

The fix looks sane to me:

Acked-by: Kiryl Shutsemau (Meta) <kas@kernel.org>

-- 
  Kiryl Shutsemau / Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [RFC PATCH] mm: filemap: fix nr_pages calculation overflow in filemap_map_pages() 2026-03-16 12:00 ` Kiryl Shutsemau @ 2026-03-17 1:04 ` Baolin Wang 0 siblings, 0 replies; 9+ messages in thread From: Baolin Wang @ 2026-03-17 1:04 UTC (permalink / raw) To: Kiryl Shutsemau Cc: Dev Jain, akpm, willy, david, lorenzo.stoakes, p.raghav, mcgrof, dhowells, djwong, hare, da.gomez, dchinner, brauner, xiangzao, linux-fsdevel, linux-mm, linux-kernel On 3/16/26 8:00 PM, Kiryl Shutsemau wrote: > On Fri, Mar 13, 2026 at 01:54:31PM +0800, Baolin Wang wrote: >> >> >> On 3/13/26 1:14 PM, Dev Jain wrote: >>> >>> >>> On 13/03/26 10:41 am, Dev Jain wrote: >>>> >>>> >>>> On 13/03/26 9:15 am, Baolin Wang wrote: >>>>> When running stress-ng on my Arm64 machine with v7.0-rc3 kernel, I encountered >>>>> some very strange crash issues showing up as "Bad page state": >>>>> >>>>> " >>>>> [ 734.496287] BUG: Bad page state in process stress-ng-env pfn:415735fb >>>>> [ 734.496427] page: refcount:0 mapcount:1 mapping:0000000000000000 index:0x4cf316 pfn:0x415735fb >>>>> [ 734.496434] flags: 0x57fffe000000800(owner_2|node=1|zone=2|lastcpupid=0x3ffff) >>>>> [ 734.496439] raw: 057fffe000000800 0000000000000000 dead000000000122 0000000000000000 >>>>> [ 734.496440] raw: 00000000004cf316 0000000000000000 0000000000000000 0000000000000000 >>>>> [ 734.496442] page dumped because: nonzero mapcount >>>>> " >>>>> >>>>> After analyzing this page’s state, it is hard to understand why the mapcount >>>>> is not 0 while the refcount is 0, since this page is not where the issue first >>>>> occurred. 
By enabling the CONFIG_DEBUG_VM config, I can reproduce the crash as >>>>> well and captured the first warning where the issue appears: >>>>> >>>>> " >>>>> [ 734.469226] page: refcount:33 mapcount:0 mapping:00000000bef2d187 index:0x81a0 pfn:0x415735c0 >>>>> [ 734.469304] head: order:5 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 >>>>> [ 734.469315] memcg:ffff000807a8ec00 >>>>> [ 734.469320] aops:ext4_da_aops ino:100b6f dentry name(?):"stress-ng-mmaptorture-9397-0-2736200540" >>>>> [ 734.469335] flags: 0x57fffe400000069(locked|uptodate|lru|head|node=1|zone=2|lastcpupid=0x3ffff) >>>>> ...... >>>>> [ 734.469364] page dumped because: VM_WARN_ON_FOLIO((_Generic((page + nr_pages - 1), >>>>> const struct page *: (const struct folio *)_compound_head(page + nr_pages - 1), struct page *: >>>>> (struct folio *)_compound_head(page + nr_pages - 1))) != folio) >>>>> [ 734.469390] ------------[ cut here ]------------ >>>>> [ 734.469393] WARNING: ./include/linux/rmap.h:351 at folio_add_file_rmap_ptes+0x3b8/0x468, >>>>> CPU#90: stress-ng-mlock/9430 >>>>> [ 734.469551] folio_add_file_rmap_ptes+0x3b8/0x468 (P) >>>>> [ 734.469555] set_pte_range+0xd8/0x2f8 >>>>> [ 734.469566] filemap_map_folio_range+0x190/0x400 >>>>> [ 734.469579] filemap_map_pages+0x348/0x638 >>>>> [ 734.469583] do_fault_around+0x140/0x198 >>>>> ...... >>>>> [ 734.469640] el0t_64_sync+0x184/0x188 >>>>> " >>>>> >>>>> The code that triggers the warning is: "VM_WARN_ON_FOLIO(page_folio(page + nr_pages - 1) != folio, folio)", >>>>> which indicates that set_pte_range() tried to map beyond the large folio’s >>>>> size. >>>>> >>>>> By adding more debug information, I found that 'nr_pages' had overflowed in >>>>> filemap_map_pages(), causing set_pte_range() to establish mappings for a range >>>>> exceeding the folio size, potentially corrupting fields of pages that do not >>>>> belong to this folio (e.g., page->_mapcount). 
>>>>>
>>>>> After the above analysis, I think the possible race is as follows:
>>>>>
>>>>> CPU 0                                          CPU 1
>>>>> filemap_map_pages()                            ext4_setattr()
>>>>> //get and lock folio with old inode->i_size
>>>>> next_uptodate_folio()
>>>>>
>>>>>                                                .......
>>>>>                                                //shrink the inode->i_size
>>>>>                                                i_size_write(inode, attr->ia_size);
>>>>>
>>>>> //calculate the end_pgoff with the new inode->i_size
>>>>> file_end = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE) - 1;
>>>>> end_pgoff = min(end_pgoff, file_end);
>>>>>
>>>>> ......
>>>>> //nr_pages can overflow, because xas.xa_index > end_pgoff
>>>>> end = folio_next_index(folio) - 1;
>>>>> nr_pages = min(end, end_pgoff) - xas.xa_index + 1;
>>>>>
>>>>> ......
>>>>> //map large folio
>>>>> filemap_map_folio_range()
>>>>>                                                ......
>>>>>                                                //truncate folios
>>>>>                                                truncate_pagecache(inode, inode->i_size);
>>>>>
>>>>> To fix this issue, move the 'end_pgoff' calculation before next_uptodate_folio(),
>>>>> so the retrieved folio stays consistent with the file end to avoid 'nr_pages'
>>>>> calculation overflow. After this patch, the crash issue is gone.
>>>>>
>>>>> Fixes: 743a2753a02e ("filemap: cap PTE range to be created to allowed zero fill in folio_map_range()")
>>>>> Reported-by: Yuanhe Shu <xiangzao@linux.alibaba.com>
>>>>> Tested-by: Yuanhe Shu <xiangzao@linux.alibaba.com>
>>>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>>> ---
>>>>>  mm/filemap.c | 6 +++---
>>>>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/mm/filemap.c b/mm/filemap.c
>>>>> index bc6775084744..923d28e59642 100644
>>>>> --- a/mm/filemap.c
>>>>> +++ b/mm/filemap.c
>>>>> @@ -3879,14 +3879,14 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
>>>>>  	unsigned int nr_pages = 0, folio_type;
>>>>>  	unsigned short mmap_miss = 0, mmap_miss_saved;
>>>>>
>>>>> +	file_end = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE) - 1;
>>>>> +	end_pgoff = min(end_pgoff, file_end);
>>>>> +
>>>>>  	rcu_read_lock();
>>>>>  	folio = next_uptodate_folio(&xas, mapping, end_pgoff);
>>>>>  	if (!folio)
>>>>>  		goto out;
>>>>>
>>>>> -	file_end = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE) - 1;
>>>>> -	end_pgoff = min(end_pgoff, file_end);
>>>>> -
>>>>>  	/*
>>>>>  	 * Do not allow to map with PMD across i_size to preserve
>>>>>  	 * SIGBUS semantics.
>>>>
>>>> I am wondering whether something similar can happen in the do-while loop
>>>> below this code. We can retrieve a folio from next_uptodate_folio, and
>>>> then a massive truncate happens and we end up mapping a large folio
>>>> into the pagetables beyond i_size, violating SIGBUS semantics. (truncation
>>>> may back off seeing the locked folio/increased refcount in filemap_map_pages)
>>>
>>> Read the bracket text as - (truncation may fail to unmap this folio seeing
>>> it locked or with elevated refcount, therefore the illegal mapping stays
>>> permanent)
>>
>> IMHO, the truncate_pagecache() will call unmap_mapping_range() twice, and
>> the folio lock and refcount will not prevent unmap_mapping_range() from
>> unmapping the folio (it only holds the ptl lock).
>>
>> So the truncate_pagecache() can still truncate large folios beyond i_size.
>
> Yeah, we serialize here on the folio lock. It should be safe.
>
> The fix looks sane to me:
>
> Acked-by: Kiryl Shutsemau (Meta) <kas@kernel.org>

Thanks for reviewing.

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [RFC PATCH] mm: filemap: fix nr_pages calculation overflow in filemap_map_pages()
  2026-03-13  3:45 [RFC PATCH] mm: filemap: fix nr_pages calculation overflow in filemap_map_pages() Baolin Wang
  2026-03-13  5:11 ` Dev Jain
@ 2026-03-16 14:06 ` David Hildenbrand (Arm)
  2026-03-17  1:16 ` Baolin Wang
  1 sibling, 1 reply; 9+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-16 14:06 UTC (permalink / raw)
To: Baolin Wang, akpm, willy
Cc: lorenzo.stoakes, kas, p.raghav, mcgrof, dhowells, djwong, hare,
  da.gomez, dchinner, brauner, xiangzao, linux-fsdevel, linux-mm,
  linux-kernel

On 3/13/26 04:45, Baolin Wang wrote:
> When running stress-ng on my Arm64 machine with v7.0-rc3 kernel, I encountered
> some very strange crash issues showing up as "Bad page state":
>
> "
> [ 734.496287] BUG: Bad page state in process stress-ng-env pfn:415735fb
> [ 734.496427] page: refcount:0 mapcount:1 mapping:0000000000000000 index:0x4cf316 pfn:0x415735fb
> [ 734.496434] flags: 0x57fffe000000800(owner_2|node=1|zone=2|lastcpupid=0x3ffff)
> [ 734.496439] raw: 057fffe000000800 0000000000000000 dead000000000122 0000000000000000
> [ 734.496440] raw: 00000000004cf316 0000000000000000 0000000000000000 0000000000000000
> [ 734.496442] page dumped because: nonzero mapcount
> "
>
> After analyzing this page’s state, it is hard to understand why the mapcount
> is not 0 while the refcount is 0, since this page is not where the issue first
> occurred.
> By enabling the CONFIG_DEBUG_VM config, I can reproduce the crash as
> well and captured the first warning where the issue appears:
>
> "
> [ 734.469226] page: refcount:33 mapcount:0 mapping:00000000bef2d187 index:0x81a0 pfn:0x415735c0
> [ 734.469304] head: order:5 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
> [ 734.469315] memcg:ffff000807a8ec00
> [ 734.469320] aops:ext4_da_aops ino:100b6f dentry name(?):"stress-ng-mmaptorture-9397-0-2736200540"
> [ 734.469335] flags: 0x57fffe400000069(locked|uptodate|lru|head|node=1|zone=2|lastcpupid=0x3ffff)
> ......
> [ 734.469364] page dumped because: VM_WARN_ON_FOLIO((_Generic((page + nr_pages - 1),
> const struct page *: (const struct folio *)_compound_head(page + nr_pages - 1), struct page *:
> (struct folio *)_compound_head(page + nr_pages - 1))) != folio)
> [ 734.469390] ------------[ cut here ]------------
> [ 734.469393] WARNING: ./include/linux/rmap.h:351 at folio_add_file_rmap_ptes+0x3b8/0x468,
> CPU#90: stress-ng-mlock/9430
> [ 734.469551] folio_add_file_rmap_ptes+0x3b8/0x468 (P)
> [ 734.469555] set_pte_range+0xd8/0x2f8
> [ 734.469566] filemap_map_folio_range+0x190/0x400
> [ 734.469579] filemap_map_pages+0x348/0x638
> [ 734.469583] do_fault_around+0x140/0x198
> ......
> [ 734.469640] el0t_64_sync+0x184/0x188
> "
>
> The code that triggers the warning is: "VM_WARN_ON_FOLIO(page_folio(page + nr_pages - 1) != folio, folio)",
> which indicates that set_pte_range() tried to map beyond the large folio’s
> size.

We had a bunch of similar reports throughout the last year from syzbot,
but always without a reproducer.

In particular this one:

https://syzkaller.appspot.com/bug?extid=c0673e1f1f054fac28c2

via

https://lore.kernel.org/all/6758f0cc.050a0220.17f54a.0001.GAE@google.com/

Could that be the same root cause?
>
> By adding more debug information, I found that 'nr_pages' had overflowed in
> filemap_map_pages(), causing set_pte_range() to establish mappings for a range
> exceeding the folio size, potentially corrupting fields of pages that do not
> belong to this folio (e.g., page->_mapcount).

Sounds quite bad.

>
> After the above analysis, I think the possible race is as follows:
>
> CPU 0                                          CPU 1
> filemap_map_pages()                            ext4_setattr()
> //get and lock folio with old inode->i_size
> next_uptodate_folio()
>
>                                                .......
>                                                //shrink the inode->i_size
>                                                i_size_write(inode, attr->ia_size);
>
> //calculate the end_pgoff with the new inode->i_size
> file_end = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE) - 1;
> end_pgoff = min(end_pgoff, file_end);
>
> ......
> //nr_pages can overflow, because xas.xa_index > end_pgoff
> end = folio_next_index(folio) - 1;
> nr_pages = min(end, end_pgoff) - xas.xa_index + 1;
>
> ......
> //map large folio
> filemap_map_folio_range()
>                                                ......
>                                                //truncate folios
>                                                truncate_pagecache(inode, inode->i_size);
>
> To fix this issue, move the 'end_pgoff' calculation before next_uptodate_folio(),
> so the retrieved folio stays consistent with the file end to avoid 'nr_pages'
> calculation overflow. After this patch, the crash issue is gone.
>
> Fixes: 743a2753a02e ("filemap: cap PTE range to be created to allowed zero fill in folio_map_range()")

This certainly sounds like stable material :)

> Reported-by: Yuanhe Shu <xiangzao@linux.alibaba.com>
> Tested-by: Yuanhe Shu <xiangzao@linux.alibaba.com>
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
>  mm/filemap.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index bc6775084744..923d28e59642 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -3879,14 +3879,14 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
>  	unsigned int nr_pages = 0, folio_type;
>  	unsigned short mmap_miss = 0, mmap_miss_saved;
>
> +	file_end = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE) - 1;
> +	end_pgoff = min(end_pgoff, file_end);
> +
>  	rcu_read_lock();
>  	folio = next_uptodate_folio(&xas, mapping, end_pgoff);
>  	if (!folio)
>  		goto out;
>
> -	file_end = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE) - 1;
> -	end_pgoff = min(end_pgoff, file_end);
> -
>  	/*
>  	 * Do not allow to map with PMD across i_size to preserve
>  	 * SIGBUS semantics.

LGTM. I do wonder if we want to add a comment above the file_end
calculation, stating that this really must happen before the
next_uptodate_folio() call to handle concurrent truncation.

-- 
Cheers,

David

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [RFC PATCH] mm: filemap: fix nr_pages calculation overflow in filemap_map_pages()
  2026-03-16 14:06 ` David Hildenbrand (Arm)
@ 2026-03-17  1:16 ` Baolin Wang
  2026-03-17  8:27 ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 9+ messages in thread
From: Baolin Wang @ 2026-03-17 1:16 UTC (permalink / raw)
To: David Hildenbrand (Arm), akpm, willy
Cc: lorenzo.stoakes, kas, p.raghav, mcgrof, dhowells, djwong, hare,
  da.gomez, dchinner, brauner, xiangzao, linux-fsdevel, linux-mm,
  linux-kernel

On 3/16/26 10:06 PM, David Hildenbrand (Arm) wrote:
> On 3/13/26 04:45, Baolin Wang wrote:
>> When running stress-ng on my Arm64 machine with v7.0-rc3 kernel, I encountered
>> some very strange crash issues showing up as "Bad page state":
>>
>> "
>> [ 734.496287] BUG: Bad page state in process stress-ng-env pfn:415735fb
>> [ 734.496427] page: refcount:0 mapcount:1 mapping:0000000000000000 index:0x4cf316 pfn:0x415735fb
>> [ 734.496434] flags: 0x57fffe000000800(owner_2|node=1|zone=2|lastcpupid=0x3ffff)
>> [ 734.496439] raw: 057fffe000000800 0000000000000000 dead000000000122 0000000000000000
>> [ 734.496440] raw: 00000000004cf316 0000000000000000 0000000000000000 0000000000000000
>> [ 734.496442] page dumped because: nonzero mapcount
>> "
>>
>> After analyzing this page’s state, it is hard to understand why the mapcount
>> is not 0 while the refcount is 0, since this page is not where the issue first
>> occurred. By enabling the CONFIG_DEBUG_VM config, I can reproduce the crash as
>> well and captured the first warning where the issue appears:
>>
>> "
>> [ 734.469226] page: refcount:33 mapcount:0 mapping:00000000bef2d187 index:0x81a0 pfn:0x415735c0
>> [ 734.469304] head: order:5 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
>> [ 734.469315] memcg:ffff000807a8ec00
>> [ 734.469320] aops:ext4_da_aops ino:100b6f dentry name(?):"stress-ng-mmaptorture-9397-0-2736200540"
>> [ 734.469335] flags: 0x57fffe400000069(locked|uptodate|lru|head|node=1|zone=2|lastcpupid=0x3ffff)
>> ......
>> [ 734.469364] page dumped because: VM_WARN_ON_FOLIO((_Generic((page + nr_pages - 1),
>> const struct page *: (const struct folio *)_compound_head(page + nr_pages - 1), struct page *:
>> (struct folio *)_compound_head(page + nr_pages - 1))) != folio)
>> [ 734.469390] ------------[ cut here ]------------
>> [ 734.469393] WARNING: ./include/linux/rmap.h:351 at folio_add_file_rmap_ptes+0x3b8/0x468,
>> CPU#90: stress-ng-mlock/9430
>> [ 734.469551] folio_add_file_rmap_ptes+0x3b8/0x468 (P)
>> [ 734.469555] set_pte_range+0xd8/0x2f8
>> [ 734.469566] filemap_map_folio_range+0x190/0x400
>> [ 734.469579] filemap_map_pages+0x348/0x638
>> [ 734.469583] do_fault_around+0x140/0x198
>> ......
>> [ 734.469640] el0t_64_sync+0x184/0x188
>> "
>>
>> The code that triggers the warning is: "VM_WARN_ON_FOLIO(page_folio(page + nr_pages - 1) != folio, folio)",
>> which indicates that set_pte_range() tried to map beyond the large folio’s
>> size.
>
> We had a bunch of similar reports throughout the last year from syzbot,
> but always without a reproducer.

I can reproduce it by running stress-ng for a few hours on my 128-core
machine, but I can’t reproduce it on smaller machines (e.g., 32-core
systems). It seems we need sufficient concurrency to trigger this race.

> In particular this one:
>
> https://syzkaller.appspot.com/bug?extid=c0673e1f1f054fac28c2
>
> via
>
> https://lore.kernel.org/all/6758f0cc.050a0220.17f54a.0001.GAE@google.com/
>
> Could that be the same root cause?

It appears so. Its warning stack trace matches what I reproduced.

>> By adding more debug information, I found that 'nr_pages' had overflowed in
>> filemap_map_pages(), causing set_pte_range() to establish mappings for a range
>> exceeding the folio size, potentially corrupting fields of pages that do not
>> belong to this folio (e.g., page->_mapcount).
>
> Sounds quite bad.
>
>>
>> After the above analysis, I think the possible race is as follows:
>>
>> CPU 0                                          CPU 1
>> filemap_map_pages()                            ext4_setattr()
>> //get and lock folio with old inode->i_size
>> next_uptodate_folio()
>>
>>                                                .......
>>                                                //shrink the inode->i_size
>>                                                i_size_write(inode, attr->ia_size);
>>
>> //calculate the end_pgoff with the new inode->i_size
>> file_end = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE) - 1;
>> end_pgoff = min(end_pgoff, file_end);
>>
>> ......
>> //nr_pages can overflow, because xas.xa_index > end_pgoff
>> end = folio_next_index(folio) - 1;
>> nr_pages = min(end, end_pgoff) - xas.xa_index + 1;
>>
>> ......
>> //map large folio
>> filemap_map_folio_range()
>>                                                ......
>>                                                //truncate folios
>>                                                truncate_pagecache(inode, inode->i_size);
>>
>> To fix this issue, move the 'end_pgoff' calculation before next_uptodate_folio(),
>> so the retrieved folio stays consistent with the file end to avoid 'nr_pages'
>> calculation overflow. After this patch, the crash issue is gone.
>>
>> Fixes: 743a2753a02e ("filemap: cap PTE range to be created to allowed zero fill in folio_map_range()")
>
> This certainly sounds like stable material :)

Yes, will cc stable.
>> Reported-by: Yuanhe Shu <xiangzao@linux.alibaba.com>
>> Tested-by: Yuanhe Shu <xiangzao@linux.alibaba.com>
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>>  mm/filemap.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index bc6775084744..923d28e59642 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -3879,14 +3879,14 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
>>  	unsigned int nr_pages = 0, folio_type;
>>  	unsigned short mmap_miss = 0, mmap_miss_saved;
>>
>> +	file_end = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE) - 1;
>> +	end_pgoff = min(end_pgoff, file_end);
>> +
>>  	rcu_read_lock();
>>  	folio = next_uptodate_folio(&xas, mapping, end_pgoff);
>>  	if (!folio)
>>  		goto out;
>>
>> -	file_end = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE) - 1;
>> -	end_pgoff = min(end_pgoff, file_end);
>> -
>>  	/*
>>  	 * Do not allow to map with PMD across i_size to preserve
>>  	 * SIGBUS semantics.
>
> LGTM. I do wonder if we want to add a comment above the file_end
> calculation, stating that this really must happen before the
> next_uptodate_folio() call to handle concurrent truncation.

Ack. How about adding the following comment?

"
Recalculate end_pgoff based on file_end before calling
next_uptodate_folio() to avoid races with concurrent truncation.
"

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [RFC PATCH] mm: filemap: fix nr_pages calculation overflow in filemap_map_pages()
  2026-03-17  1:16 ` Baolin Wang
@ 2026-03-17  8:27 ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 9+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-17 8:27 UTC (permalink / raw)
To: Baolin Wang, akpm, willy
Cc: lorenzo.stoakes, kas, p.raghav, mcgrof, dhowells, djwong, hare,
  da.gomez, dchinner, brauner, xiangzao, linux-fsdevel, linux-mm,
  linux-kernel

>> LGTM. I do wonder if we want to add a comment above the file_end
>> calculation, stating that this really must happen before the
>> next_uptodate_folio() call to handle concurrent truncation.
>
> Ack. How about adding the following comment?
>
> "
> Recalculate end_pgoff based on file_end before calling
> next_uptodate_folio() to avoid races with concurrent truncation.
> "

Works for me, thanks!

-- 
Cheers,

David

^ permalink raw reply	[flat|nested] 9+ messages in thread
end of thread, other threads:[~2026-03-17  8:28 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-03-13  3:45 [RFC PATCH] mm: filemap: fix nr_pages calculation overflow in filemap_map_pages() Baolin Wang
2026-03-13  5:11 ` Dev Jain
2026-03-13  5:14 ` Dev Jain
2026-03-13  5:54 ` Baolin Wang
2026-03-16 12:00 ` Kiryl Shutsemau
2026-03-17  1:04 ` Baolin Wang
2026-03-16 14:06 ` David Hildenbrand (Arm)
2026-03-17  1:16 ` Baolin Wang
2026-03-17  8:27 ` David Hildenbrand (Arm)