* [PATCH v2] ext4: check folio uptodate state in ext4_page_mkwrite()
@ 2025-11-22 1:57 Deepanshu Kartikey
2025-11-30 2:06 ` Deepanshu Kartikey
0 siblings, 1 reply; 19+ messages in thread
From: Deepanshu Kartikey @ 2025-11-22 1:57 UTC (permalink / raw)
To: tytso, adilger.kernel, djwong
Cc: linux-ext4, linux-kernel, Deepanshu Kartikey,
syzbot+b0a0670332b6b3230a0a
When delayed block allocation fails due to filesystem corruption,
ext4's writeback error handling invalidates affected folios by calling
mpage_release_unused_pages() with invalidate=true, which explicitly
clears the uptodate flag:
static void mpage_release_unused_pages(..., bool invalidate)
{
...
if (invalidate) {
block_invalidate_folio(folio, 0, folio_size(folio));
folio_clear_uptodate(folio);
}
}
If ext4_page_mkwrite() is subsequently called on such a non-uptodate
folio, it can proceed to mark the folio dirty without checking its
state. This triggers a warning in __folio_mark_dirty():
WARNING: CPU: 0 PID: 5 at mm/page-writeback.c:2960
__folio_mark_dirty+0x578/0x880
Call Trace:
fault_dirty_shared_page+0x16e/0x2d0
do_wp_page+0x38b/0xd20
handle_pte_fault+0x1da/0x450
__handle_mm_fault+0x652/0x13b0
handle_mm_fault+0x22a/0x6f0
do_user_addr_fault+0x200/0x8a0
exc_page_fault+0x81/0x1b0
This scenario occurs when:
1. A write with delayed allocation marks a folio dirty (uptodate=1)
2. Writeback attempts block allocation but detects filesystem corruption
3. Error handling calls mpage_release_unused_pages(invalidate=true),
which clears the uptodate flag via folio_clear_uptodate()
4. A subsequent ftruncate() triggers ext4_truncate()
5. ext4_block_truncate_page() attempts to zero the page tail
6. This triggers a write fault on the mmap'd page
7. ext4_page_mkwrite() is called with the non-uptodate folio
8. Without checking uptodate, it proceeds to mark the folio dirty
9. __folio_mark_dirty() triggers: WARN_ON_ONCE(!folio_test_uptodate())
Fix this by checking folio_test_uptodate() early in ext4_page_mkwrite()
and returning VM_FAULT_SIGBUS if the folio is not uptodate. This prevents
attempting to write to invalidated folios and properly signals the error
to userspace.
The check is placed early, before the delalloc/journal/normal code paths,
as none of these paths should proceed with a non-uptodate folio.
Reported-by: syzbot+b0a0670332b6b3230a0a@syzkaller.appspotmail.com
Tested-by: syzbot+b0a0670332b6b3230a0a@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=b0a0670332b6b3230a0a
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
---
fs/ext4/inode.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index e99306a8f47c..18a029362c1f 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -6688,6 +6688,14 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf)
if (err)
goto out_ret;
+ folio_lock(folio);
+ if (!folio_test_uptodate(folio)) {
+ folio_unlock(folio);
+ ret = VM_FAULT_SIGBUS;
+ goto out;
+ }
+ folio_unlock(folio);
+
/*
* On data journalling we skip straight to the transaction handle:
* there's no delalloc; page truncated will be checked later; the
--
2.43.0
^ permalink raw reply related [flat|nested] 19+ messages in thread* Re: [PATCH v2] ext4: check folio uptodate state in ext4_page_mkwrite() 2025-11-22 1:57 [PATCH v2] ext4: check folio uptodate state in ext4_page_mkwrite() Deepanshu Kartikey @ 2025-11-30 2:06 ` Deepanshu Kartikey 2025-12-02 12:24 ` Zhang Yi 0 siblings, 1 reply; 19+ messages in thread From: Deepanshu Kartikey @ 2025-11-30 2:06 UTC (permalink / raw) To: tytso, adilger.kernel, djwong Cc: linux-ext4, linux-kernel, syzbot+b0a0670332b6b3230a0a On Sat, Nov 22, 2025 at 7:27 AM Deepanshu Kartikey <kartikey406@gmail.com> wrote: > > When delayed block allocation fails due to filesystem corruption, > ext4's writeback error handling invalidates affected folios by calling > mpage_release_unused_pages() with invalidate=true, which explicitly > clears the uptodate flag: > > static void mpage_release_unused_pages(..., bool invalidate) > { > ... > if (invalidate) { > block_invalidate_folio(folio, 0, folio_size(folio)); > folio_clear_uptodate(folio); > } > } > > If ext4_page_mkwrite() is subsequently called on such a non-uptodate > folio, it can proceed to mark the folio dirty without checking its > state. This triggers a warning in __folio_mark_dirty(): > > WARNING: CPU: 0 PID: 5 at mm/page-writeback.c:2960 > __folio_mark_dirty+0x578/0x880 > > Call Trace: > fault_dirty_shared_page+0x16e/0x2d0 > do_wp_page+0x38b/0xd20 > handle_pte_fault+0x1da/0x450 > __handle_mm_fault+0x652/0x13b0 > handle_mm_fault+0x22a/0x6f0 > do_user_addr_fault+0x200/0x8a0 > exc_page_fault+0x81/0x1b0 > > This scenario occurs when: > 1. A write with delayed allocation marks a folio dirty (uptodate=1) > 2. Writeback attempts block allocation but detects filesystem corruption > 3. Error handling calls mpage_release_unused_pages(invalidate=true), > which clears the uptodate flag via folio_clear_uptodate() > 4. A subsequent ftruncate() triggers ext4_truncate() > 5. ext4_block_truncate_page() attempts to zero the page tail > 6. This triggers a write fault on the mmap'd page > 7. ext4_page_mkwrite() is called with the non-uptodate folio > 8. Without checking uptodate, it proceeds to mark the folio dirty > 9. __folio_mark_dirty() triggers: WARN_ON_ONCE(!folio_test_uptodate()) > > Fix this by checking folio_test_uptodate() early in ext4_page_mkwrite() > and returning VM_FAULT_SIGBUS if the folio is not uptodate. This prevents > attempting to write to invalidated folios and properly signals the error > to userspace. > > The check is placed early, before the delalloc/journal/normal code paths, > as none of these paths should proceed with a non-uptodate folio. > > Reported-by: syzbot+b0a0670332b6b3230a0a@syzkaller.appspotmail.com > Tested-by: syzbot+b0a0670332b6b3230a0a@syzkaller.appspotmail.com > Closes: https://syzkaller.appspot.com/bug?extid=b0a0670332b6b3230a0a > Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com> > --- > fs/ext4/inode.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index e99306a8f47c..18a029362c1f 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -6688,6 +6688,14 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf) > if (err) > goto out_ret; > > + folio_lock(folio); > + if (!folio_test_uptodate(folio)) { > + folio_unlock(folio); > + ret = VM_FAULT_SIGBUS; > + goto out; > + } > + folio_unlock(folio); > + > /* > * On data journalling we skip straight to the transaction handle: > * there's no delalloc; page truncated will be checked later; the > -- > 2.43.0 > Hi Ted and ext4 maintainers, I wanted to follow up on this patch submitted a week ago. This fixes a syzbot-reported WARNING in __folio_mark_dirty() that occurs when ext4_page_mkwrite() is called with a non-uptodate folio after delayed allocation writeback failure. Please let me know if there's any feedback or if I should make any changes. Thanks, Deepanshu ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2] ext4: check folio uptodate state in ext4_page_mkwrite() 2025-11-30 2:06 ` Deepanshu Kartikey @ 2025-12-02 12:24 ` Zhang Yi 2025-12-03 1:37 ` Deepanshu Kartikey 0 siblings, 1 reply; 19+ messages in thread From: Zhang Yi @ 2025-12-02 12:24 UTC (permalink / raw) To: Deepanshu Kartikey Cc: linux-ext4, linux-kernel, syzbot+b0a0670332b6b3230a0a, tytso, adilger.kernel, djwong Hi Deepanshu! On 11/30/2025 10:06 AM, Deepanshu Kartikey wrote: > On Sat, Nov 22, 2025 at 7:27 AM Deepanshu Kartikey > <kartikey406@gmail.com> wrote: >> >> When delayed block allocation fails due to filesystem corruption, >> ext4's writeback error handling invalidates affected folios by calling >> mpage_release_unused_pages() with invalidate=true, which explicitly >> clears the uptodate flag: >> >> static void mpage_release_unused_pages(..., bool invalidate) >> { >> ... >> if (invalidate) { >> block_invalidate_folio(folio, 0, folio_size(folio)); >> folio_clear_uptodate(folio); >> } >> } >> >> If ext4_page_mkwrite() is subsequently called on such a non-uptodate >> folio, it can proceed to mark the folio dirty without checking its >> state. This triggers a warning in __folio_mark_dirty(): >> >> WARNING: CPU: 0 PID: 5 at mm/page-writeback.c:2960 >> __folio_mark_dirty+0x578/0x880 >> >> Call Trace: >> fault_dirty_shared_page+0x16e/0x2d0 >> do_wp_page+0x38b/0xd20 >> handle_pte_fault+0x1da/0x450 >> __handle_mm_fault+0x652/0x13b0 >> handle_mm_fault+0x22a/0x6f0 >> do_user_addr_fault+0x200/0x8a0 >> exc_page_fault+0x81/0x1b0 >> >> This scenario occurs when: >> 1. A write with delayed allocation marks a folio dirty (uptodate=1) >> 2. Writeback attempts block allocation but detects filesystem corruption >> 3. Error handling calls mpage_release_unused_pages(invalidate=true), >> which clears the uptodate flag via folio_clear_uptodate() >> 4. A subsequent ftruncate() triggers ext4_truncate() >> 5. ext4_block_truncate_page() attempts to zero the page tail >> 6. This triggers a write fault on the mmap'd page >> 7. ext4_page_mkwrite() is called with the non-uptodate folio >> 8. Without checking uptodate, it proceeds to mark the folio dirty >> 9. __folio_mark_dirty() triggers: WARN_ON_ONCE(!folio_test_uptodate()) Thank you a lot for analyzing this issue and the fix patch. As I was going through the process of understanding this issue, I had one question. Is the code flow that triggers the warning as follows? wp_page_shared() do_page_mkwrite() ext4_page_mkwrite() block_page_mkwrite() //The default delalloc path block_commit_write() mark_buffer_dirty() __folio_mark_dirty(0) //'warn' is false, doesn't trigger warning folio_mark_dirty() ext4_dirty_folio() block_dirty_folio //newly_dirty is false, doesn't call __folio_mark_dirty() fault_dirty_shared_page() folio_mark_dirty() //Trigger warning ? This folio has been marked as dirty. How was this warning triggered? Am I missing something? Thanks, Yi. >> >> Fix this by checking folio_test_uptodate() early in ext4_page_mkwrite() >> and returning VM_FAULT_SIGBUS if the folio is not uptodate. This prevents >> attempting to write to invalidated folios and properly signals the error >> to userspace. >> >> The check is placed early, before the delalloc/journal/normal code paths, >> as none of these paths should proceed with a non-uptodate folio. >> >> Reported-by: syzbot+b0a0670332b6b3230a0a@syzkaller.appspotmail.com >> Tested-by: syzbot+b0a0670332b6b3230a0a@syzkaller.appspotmail.com >> Closes: https://syzkaller.appspot.com/bug?extid=b0a0670332b6b3230a0a >> Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com> >> --- >> fs/ext4/inode.c | 8 ++++++++ >> 1 file changed, 8 insertions(+) >> >> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c >> index e99306a8f47c..18a029362c1f 100644 >> --- a/fs/ext4/inode.c >> +++ b/fs/ext4/inode.c >> @@ -6688,6 +6688,14 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf) >> if (err) >> goto out_ret; >> >> + folio_lock(folio); >> + if (!folio_test_uptodate(folio)) { >> + folio_unlock(folio); >> + ret = VM_FAULT_SIGBUS; >> + goto out; >> + } >> + folio_unlock(folio); >> + >> /* >> * On data journalling we skip straight to the transaction handle: >> * there's no delalloc; page truncated will be checked later; the >> -- >> 2.43.0 >> > > Hi Ted and ext4 maintainers, > > I wanted to follow up on this patch submitted a week ago. This fixes > a syzbot-reported WARNING in __folio_mark_dirty() that occurs when > ext4_page_mkwrite() is called with a non-uptodate folio after delayed > allocation writeback failure. > > Please let me know if there's any feedback or if I should make any > changes. > > Thanks, > Deepanshu > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2] ext4: check folio uptodate state in ext4_page_mkwrite() 2025-12-02 12:24 ` Zhang Yi @ 2025-12-03 1:37 ` Deepanshu Kartikey 2025-12-03 6:52 ` Zhang Yi 0 siblings, 1 reply; 19+ messages in thread From: Deepanshu Kartikey @ 2025-12-03 1:37 UTC (permalink / raw) To: Zhang Yi Cc: linux-ext4, linux-kernel, syzbot+b0a0670332b6b3230a0a, tytso, adilger.kernel, djwong On Tue, Dec 2, 2025 at 5:54 PM Zhang Yi <yi.zhang@huaweicloud.com> wrote: > > Hi Deepanshu! > > On 11/30/2025 10:06 AM, Deepanshu Kartikey wrote: > > On Sat, Nov 22, 2025 at 7:27 AM Deepanshu Kartikey > > <kartikey406@gmail.com> wrote: > >> > >> When delayed block allocation fails due to filesystem corruption, > >> ext4's writeback error handling invalidates affected folios by calling > >> mpage_release_unused_pages() with invalidate=true, which explicitly > >> clears the uptodate flag: > >> > >> static void mpage_release_unused_pages(..., bool invalidate) > >> { > >> ... > >> if (invalidate) { > >> block_invalidate_folio(folio, 0, folio_size(folio)); > >> folio_clear_uptodate(folio); > >> } > >> } > >> > >> If ext4_page_mkwrite() is subsequently called on such a non-uptodate > >> folio, it can proceed to mark the folio dirty without checking its > >> state. This triggers a warning in __folio_mark_dirty(): > >> > >> WARNING: CPU: 0 PID: 5 at mm/page-writeback.c:2960 > >> __folio_mark_dirty+0x578/0x880 > >> > >> Call Trace: > >> fault_dirty_shared_page+0x16e/0x2d0 > >> do_wp_page+0x38b/0xd20 > >> handle_pte_fault+0x1da/0x450 > >> __handle_mm_fault+0x652/0x13b0 > >> handle_mm_fault+0x22a/0x6f0 > >> do_user_addr_fault+0x200/0x8a0 > >> exc_page_fault+0x81/0x1b0 > >> > >> This scenario occurs when: > >> 1. A write with delayed allocation marks a folio dirty (uptodate=1) > >> 2. Writeback attempts block allocation but detects filesystem corruption > >> 3. Error handling calls mpage_release_unused_pages(invalidate=true), > >> which clears the uptodate flag via folio_clear_uptodate() > >> 4. A subsequent ftruncate() triggers ext4_truncate() > >> 5. ext4_block_truncate_page() attempts to zero the page tail > >> 6. This triggers a write fault on the mmap'd page > >> 7. ext4_page_mkwrite() is called with the non-uptodate folio > >> 8. Without checking uptodate, it proceeds to mark the folio dirty > >> 9. __folio_mark_dirty() triggers: WARN_ON_ONCE(!folio_test_uptodate()) > > Thank you a lot for analyzing this issue and the fix patch. As I was > going through the process of understanding this issue, I had one > question. Is the code flow that triggers the warning as follows? > > wp_page_shared() > do_page_mkwrite() > ext4_page_mkwrite() > block_page_mkwrite() //The default delalloc path > block_commit_write() > mark_buffer_dirty() > __folio_mark_dirty(0) //'warn' is false, doesn't trigger warning > folio_mark_dirty() > ext4_dirty_folio() > block_dirty_folio //newly_dirty is false, doesn't call __folio_mark_dirty() > fault_dirty_shared_page() > folio_mark_dirty() //Trigger warning ? > > This folio has been marked as dirty. How was this warning triggered? > Am I missing something? > > Thanks, > Yi. > Hi Yi, Thank you for your question about the exact flow that triggers the warning. You're correct that the code paths within ext4_page_mkwrite() and block_page_mkwrite() call __folio_mark_dirty() with warn=0, so no warning occurs there. The warning actually triggers later, in fault_dirty_shared_page() after page_mkwrite returns. Here's the complete flow: wp_page_shared() ↓ do_page_mkwrite() ↓ ext4_page_mkwrite() ↓ block_page_mkwrite() ↓ mark_buffer_dirty() → __folio_mark_dirty(warn=0) // No warning ↓ Returns success ↓ fault_dirty_shared_page(vma, folio) ← Warning triggers here ↓ folio_mark_dirty(folio) ↓ ext4_dirty_folio() ↓ block_dirty_folio() ↓ if (!folio_test_set_dirty(folio)) // Folio not already dirty __folio_mark_dirty(folio, mapping, 1) ← warn=1, triggers WARNING The key is that the folio can become non-uptodate between when it's initially read and when wp_page_shared() is called. This happens when: 1. Delayed block allocation fails due to filesystem corruption 2. Error handling in mpage_release_unused_pages() explicitly clears uptodate: if (invalidate) { block_invalidate_folio(folio, 0, folio_size(folio)); folio_clear_uptodate(folio); } 3. A subsequent operation (like ftruncate) triggers ext4_block_truncate_page() 4. This causes a write fault on the mmap'd page 5. wp_page_shared() is called with the now-non-uptodate folio From my debug logs with a test kernel: [22.387777] EXT4-fs error: lblock 0 mapped to illegal pblock 0 [22.389798] EXT4-fs: Delayed block allocation failed... error 117 [22.390401] EXT4-fs: This should not happen!! Data will be lost [22.399463] EXT4-fs error: Corrupt filesystem [22.400513] WP_PAGE_SHARED: ENTER folio=... uptodate=0 dirty=0 [22.401953] WP_PAGE_SHARED: page_mkwrite failed, returning 2 With my fix, ext4_page_mkwrite() detects the non-uptodate state and returns VM_FAULT_SIGBUS before block_page_mkwrite() is called, preventing wp_page_shared() from reaching fault_dirty_shared_page(). Without the fix, the sequence would be: - ext4_page_mkwrite() succeeds (doesn't check uptodate) - block_page_mkwrite() marks buffers dirty (warn=0, no warning) - Returns to wp_page_shared() - fault_dirty_shared_page() calls folio_mark_dirty() - block_dirty_folio() finds folio not dirty (uptodate=0, dirty=0) - Calls __folio_mark_dirty() with warn=1 - WARNING triggers: WARN_ON_ONCE(warn && !folio_test_uptodate(folio) && !folio_test_dirty(folio)) The syzbot call trace confirms this: Call Trace: fault_dirty_shared_page+0x16e/0x2d0 do_wp_page+0x38b/0xd20 handle_pte_fault+0x1da/0x450 Does this clarify the flow? Best regards, Deepanshu ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2] ext4: check folio uptodate state in ext4_page_mkwrite() 2025-12-03 1:37 ` Deepanshu Kartikey @ 2025-12-03 6:52 ` Zhang Yi 2025-12-03 7:43 ` Deepanshu Kartikey 0 siblings, 1 reply; 19+ messages in thread From: Zhang Yi @ 2025-12-03 6:52 UTC (permalink / raw) To: Deepanshu Kartikey Cc: linux-ext4, linux-kernel, syzbot+b0a0670332b6b3230a0a, tytso, adilger.kernel, djwong On 12/3/2025 9:37 AM, Deepanshu Kartikey wrote: > On Tue, Dec 2, 2025 at 5:54 PM Zhang Yi <yi.zhang@huaweicloud.com> wrote: >> >> Hi Deepanshu! >> >> On 11/30/2025 10:06 AM, Deepanshu Kartikey wrote: >>> On Sat, Nov 22, 2025 at 7:27 AM Deepanshu Kartikey >>> <kartikey406@gmail.com> wrote: >>>> >>>> When delayed block allocation fails due to filesystem corruption, >>>> ext4's writeback error handling invalidates affected folios by calling >>>> mpage_release_unused_pages() with invalidate=true, which explicitly >>>> clears the uptodate flag: >>>> >>>> static void mpage_release_unused_pages(..., bool invalidate) >>>> { >>>> ... >>>> if (invalidate) { >>>> block_invalidate_folio(folio, 0, folio_size(folio)); >>>> folio_clear_uptodate(folio); >>>> } >>>> } >>>> >>>> If ext4_page_mkwrite() is subsequently called on such a non-uptodate >>>> folio, it can proceed to mark the folio dirty without checking its >>>> state. This triggers a warning in __folio_mark_dirty(): >>>> >>>> WARNING: CPU: 0 PID: 5 at mm/page-writeback.c:2960 >>>> __folio_mark_dirty+0x578/0x880 >>>> >>>> Call Trace: >>>> fault_dirty_shared_page+0x16e/0x2d0 >>>> do_wp_page+0x38b/0xd20 >>>> handle_pte_fault+0x1da/0x450 >>>> __handle_mm_fault+0x652/0x13b0 >>>> handle_mm_fault+0x22a/0x6f0 >>>> do_user_addr_fault+0x200/0x8a0 >>>> exc_page_fault+0x81/0x1b0 >>>> >>>> This scenario occurs when: >>>> 1. A write with delayed allocation marks a folio dirty (uptodate=1) >>>> 2. Writeback attempts block allocation but detects filesystem corruption >>>> 3. Error handling calls mpage_release_unused_pages(invalidate=true), >>>> which clears the uptodate flag via folio_clear_uptodate() >>>> 4. A subsequent ftruncate() triggers ext4_truncate() >>>> 5. ext4_block_truncate_page() attempts to zero the page tail >>>> 6. This triggers a write fault on the mmap'd page >>>> 7. ext4_page_mkwrite() is called with the non-uptodate folio >>>> 8. Without checking uptodate, it proceeds to mark the folio dirty >>>> 9. __folio_mark_dirty() triggers: WARN_ON_ONCE(!folio_test_uptodate()) >> >> Thank you a lot for analyzing this issue and the fix patch. As I was >> going through the process of understanding this issue, I had one >> question. Is the code flow that triggers the warning as follows? >> >> wp_page_shared() >> do_page_mkwrite() >> ext4_page_mkwrite() >> block_page_mkwrite() //The default delalloc path >> block_commit_write() >> mark_buffer_dirty() >> __folio_mark_dirty(0) //'warn' is false, doesn't trigger warning >> folio_mark_dirty() >> ext4_dirty_folio() >> block_dirty_folio //newly_dirty is false, doesn't call __folio_mark_dirty() >> fault_dirty_shared_page() >> folio_mark_dirty() //Trigger warning ? >> >> This folio has been marked as dirty. How was this warning triggered? >> Am I missing something? >> >> Thanks, >> Yi. >> > > Hi Yi, > > Thank you for your question about the exact flow that triggers the warning. > Thank you for the clarification, but there are still some details that are unclear. > You're correct that the code paths within ext4_page_mkwrite() and > block_page_mkwrite() call __folio_mark_dirty() with warn=0, so no > warning occurs there. The warning actually triggers later, in > fault_dirty_shared_page() after page_mkwrite returns. > > Here's the complete flow: > > wp_page_shared() > ↓ > do_page_mkwrite() > ↓ > ext4_page_mkwrite() > ↓ > block_page_mkwrite() > ↓ > mark_buffer_dirty() → __folio_mark_dirty(warn=0) // No warning ↓ if (!folio_test_set_dirty(folio)) //The folio will be mark as dirty --- 1 > ↓ > Returns success > ↓ > fault_dirty_shared_page(vma, folio) ← Warning triggers here > ↓ > folio_mark_dirty(folio) > ↓ > ext4_dirty_folio() > ↓ > block_dirty_folio() > ↓ > if (!folio_test_set_dirty(folio)) // Folio not already dirty This makes me confused, this folio should be set to dirty at position 1. If it is not dirty here, who cleared the dirty flag for this folio? > __folio_mark_dirty(folio, mapping, 1) ← warn=1, triggers WARNING > > The key is that the folio can become non-uptodate between when it's > initially read and when wp_page_shared() is called. This happens when: > > 1. Delayed block allocation fails due to filesystem corruption > 2. Error handling in mpage_release_unused_pages() explicitly clears uptodate: > > if (invalidate) { > block_invalidate_folio(folio, 0, folio_size(folio)); > folio_clear_uptodate(folio); > } > > 3. A subsequent operation (like ftruncate) triggers ext4_block_truncate_page() > 4. This causes a write fault on the mmap'd page > 5. wp_page_shared() is called with the now-non-uptodate folio > > From my debug logs with a test kernel: > > [22.387777] EXT4-fs error: lblock 0 mapped to illegal pblock 0 > [22.389798] EXT4-fs: Delayed block allocation failed... error 117 > [22.390401] EXT4-fs: This should not happen!! Data will be lost > > [22.399463] EXT4-fs error: Corrupt filesystem > > [22.400513] WP_PAGE_SHARED: ENTER folio=... uptodate=0 dirty=0 > [22.401953] WP_PAGE_SHARED: page_mkwrite failed, returning 2 > > With my fix, ext4_page_mkwrite() detects the non-uptodate state and > returns VM_FAULT_SIGBUS before block_page_mkwrite() is called, > preventing wp_page_shared() from reaching fault_dirty_shared_page(). > > Without the fix, the sequence would be: > - ext4_page_mkwrite() succeeds (doesn't check uptodate) > - block_page_mkwrite() marks buffers dirty (warn=0, no warning) > - Returns to wp_page_shared() > - fault_dirty_shared_page() calls folio_mark_dirty() > - block_dirty_folio() finds folio not dirty (uptodate=0, dirty=0) > - Calls __folio_mark_dirty() with warn=1 > - WARNING triggers: WARN_ON_ONCE(warn && !folio_test_uptodate(folio) > && !folio_test_dirty(folio)) > > The syzbot call trace confirms this: > > Call Trace: > fault_dirty_shared_page+0x16e/0x2d0 > do_wp_page+0x38b/0xd20 > handle_pte_fault+0x1da/0x450 > > Does this clarify the flow? > > Best regards, > Deepanshu > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2] ext4: check folio uptodate state in ext4_page_mkwrite() 2025-12-03 6:52 ` Zhang Yi @ 2025-12-03 7:43 ` Deepanshu Kartikey 2025-12-03 15:46 ` Theodore Tso 0 siblings, 1 reply; 19+ messages in thread From: Deepanshu Kartikey @ 2025-12-03 7:43 UTC (permalink / raw) To: Zhang Yi Cc: linux-ext4, linux-kernel, syzbot+b0a0670332b6b3230a0a, tytso, adilger.kernel, djwong On Wed, Dec 3, 2025 at 12:22 PM Zhang Yi <yi.zhang@huaweicloud.com> wrote: > > On 12/3/2025 9:37 AM, Deepanshu Kartikey wrote: > > On Tue, Dec 2, 2025 at 5:54 PM Zhang Yi <yi.zhang@huaweicloud.com> wrote: > >> > >> Hi Deepanshu! > >> > >> On 11/30/2025 10:06 AM, Deepanshu Kartikey wrote: > >>> On Sat, Nov 22, 2025 at 7:27 AM Deepanshu Kartikey > >>> <kartikey406@gmail.com> wrote: > >>>> > >>>> When delayed block allocation fails due to filesystem corruption, > >>>> ext4's writeback error handling invalidates affected folios by calling > >>>> mpage_release_unused_pages() with invalidate=true, which explicitly > >>>> clears the uptodate flag: > >>>> > >>>> static void mpage_release_unused_pages(..., bool invalidate) > >>>> { > >>>> ... > >>>> if (invalidate) { > >>>> block_invalidate_folio(folio, 0, folio_size(folio)); > >>>> folio_clear_uptodate(folio); > >>>> } > >>>> } > >>>> > >>>> If ext4_page_mkwrite() is subsequently called on such a non-uptodate > >>>> folio, it can proceed to mark the folio dirty without checking its > >>>> state. This triggers a warning in __folio_mark_dirty(): > >>>> > >>>> WARNING: CPU: 0 PID: 5 at mm/page-writeback.c:2960 > >>>> __folio_mark_dirty+0x578/0x880 > >>>> > >>>> Call Trace: > >>>> fault_dirty_shared_page+0x16e/0x2d0 > >>>> do_wp_page+0x38b/0xd20 > >>>> handle_pte_fault+0x1da/0x450 > >>>> __handle_mm_fault+0x652/0x13b0 > >>>> handle_mm_fault+0x22a/0x6f0 > >>>> do_user_addr_fault+0x200/0x8a0 > >>>> exc_page_fault+0x81/0x1b0 > >>>> > >>>> This scenario occurs when: > >>>> 1. A write with delayed allocation marks a folio dirty (uptodate=1) > >>>> 2. Writeback attempts block allocation but detects filesystem corruption > >>>> 3. Error handling calls mpage_release_unused_pages(invalidate=true), > >>>> which clears the uptodate flag via folio_clear_uptodate() > >>>> 4. A subsequent ftruncate() triggers ext4_truncate() > >>>> 5. ext4_block_truncate_page() attempts to zero the page tail > >>>> 6. This triggers a write fault on the mmap'd page > >>>> 7. ext4_page_mkwrite() is called with the non-uptodate folio > >>>> 8. Without checking uptodate, it proceeds to mark the folio dirty > >>>> 9. __folio_mark_dirty() triggers: WARN_ON_ONCE(!folio_test_uptodate()) > >> > >> Thank you a lot for analyzing this issue and the fix patch. As I was > >> going through the process of understanding this issue, I had one > >> question. Is the code flow that triggers the warning as follows? > >> > >> wp_page_shared() > >> do_page_mkwrite() > >> ext4_page_mkwrite() > >> block_page_mkwrite() //The default delalloc path > >> block_commit_write() > >> mark_buffer_dirty() > >> __folio_mark_dirty(0) //'warn' is false, doesn't trigger warning > >> folio_mark_dirty() > >> ext4_dirty_folio() > >> block_dirty_folio //newly_dirty is false, doesn't call __folio_mark_dirty() > >> fault_dirty_shared_page() > >> folio_mark_dirty() //Trigger warning ? > >> > >> This folio has been marked as dirty. How was this warning triggered? > >> Am I missing something? > >> > >> Thanks, > >> Yi. > >> > > > > Hi Yi, > > > > Thank you for your question about the exact flow that triggers the warning. > > > > Thank you for the clarification, but there are still some details that > are unclear. > > > You're correct that the code paths within ext4_page_mkwrite() and > > block_page_mkwrite() call __folio_mark_dirty() with warn=0, so no > > warning occurs there. The warning actually triggers later, in > > fault_dirty_shared_page() after page_mkwrite returns. > > > > Here's the complete flow: > > > > wp_page_shared() > > ↓ > > do_page_mkwrite() > > ↓ > > ext4_page_mkwrite() > > ↓ > > block_page_mkwrite() > > ↓ > > mark_buffer_dirty() → __folio_mark_dirty(warn=0) // No warning > > ↓ > if (!folio_test_set_dirty(folio)) > //The folio will be mark as dirty --- 1 > > > ↓ > > Returns success > > ↓ > > fault_dirty_shared_page(vma, folio) ← Warning triggers here > > ↓ > > folio_mark_dirty(folio) > > ↓ > > ext4_dirty_folio() > > ↓ > > block_dirty_folio() > > ↓ > > if (!folio_test_set_dirty(folio)) // Folio not already dirty > > This makes me confused, this folio should be set to dirty > at position 1. If it is not dirty here, who cleared the dirty > flag for this folio? > Hi Yi, Excellent question! You're absolutely right that mark_buffer_dirty() marks the folio dirty at position 1. The key is that the folio dirty flag is cleared later by the error handling code. When delayed allocation fails and mpage_release_unused_pages() is called with invalidate=true, it calls: block_invalidate_folio(folio, 0, folio_size(folio)); This function not only invalidates the folio but also clears the dirty flag. Looking at the code path: block_invalidate_folio() → do_invalidate_folio() → Clears the dirty flag → Detaches buffer heads So the sequence is: 1. First wp_page_shared(): folio becomes dirty=1, uptodate=1 (via mark_buffer_dirty) 2. Writeback fails due to corruption 3. mpage_release_unused_pages(invalidate=true): - block_invalidate_folio() clears dirty flag - folio_clear_uptodate() clears uptodate flag - Result: folio is now dirty=0, uptodate=0 4. Second wp_page_shared(): called with folio dirty=0, uptodate=0 This is confirmed by my debug logs: [22.400513] WP_PAGE_SHARED: ENTER folio=... uptodate=0 dirty=0 The folio is BOTH non-uptodate AND non-dirty when the second wp_page_shared() is called. Without my fix: - ext4_page_mkwrite() succeeds (doesn't check uptodate) - block_page_mkwrite() tries to mark the folio dirty again - fault_dirty_shared_page() is reached - block_dirty_folio() finds folio not dirty (dirty=0) - Calls __folio_mark_dirty(warn=1) with uptodate=0 - WARNING triggers With my fix: - ext4_page_mkwrite() checks uptodate and returns VM_FAULT_SIGBUS - Never reaches fault_dirty_shared_page() - No warning Does this answer your question? Best regards, Deepanshu ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2] ext4: check folio uptodate state in ext4_page_mkwrite() 2025-12-03 7:43 ` Deepanshu Kartikey @ 2025-12-03 15:46 ` Theodore Tso 2025-12-03 17:04 ` Deepanshu Kartikey ` (3 more replies) 0 siblings, 4 replies; 19+ messages in thread From: Theodore Tso @ 2025-12-03 15:46 UTC (permalink / raw) To: Deepanshu Kartikey Cc: Zhang Yi, linux-ext4, linux-kernel, syzbot+b0a0670332b6b3230a0a, adilger.kernel, djwong, Matthew Wilcox My main concern with your patch is folio_lock() is *incredibly* heavyweight and is going to be a real scalability concern if we need to take it every single time we need to make a page writeable. So could we perhaps do something like this? So the first question is do we need to take the lock at all? I'm not sure we need to worry about the case where the page is not uptodate because we're racing with the page being brought into memory; if we that could happen under normal circumstances we would be triggering the warning even without these situations such as a delayed allocaiton write failing due to a corrupted file system image. So can we just do this? if (!folio_test_uptodate(folio)) { ret = VM_FAULT_SIGBUS; goto out; } If it is legitmate that ext4_page_mkwrite() could be called while the page is still being read in (and again, I don't think it is), then we could do something like this: if (!folio_test_uptodate(folio)) { folio_lock(folio); if (!folio_test_uptodate(folio)) { folio_unlock(folio); ret = VM_FAULT_SIGBUS; goto out; } folio_unlock(folio); } Matthew, as the page cache maintainer, do we actually need this extra rigamarole. Or can we just skip taking the lock before checking to see if the folio is uptodate in ext4_page_mkwrite()? - Ted ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2] ext4: check folio uptodate state in ext4_page_mkwrite() 2025-12-03 15:46 ` Theodore Tso @ 2025-12-03 17:04 ` Deepanshu Kartikey 2025-12-03 18:40 ` Theodore Tso ` (2 subsequent siblings) 3 siblings, 0 replies; 19+ messages in thread From: Deepanshu Kartikey @ 2025-12-03 17:04 UTC (permalink / raw) To: Theodore Tso Cc: Zhang Yi, linux-ext4, linux-kernel, syzbot+b0a0670332b6b3230a0a, adilger.kernel, djwong, Matthew Wilcox On Wed, Dec 3, 2025 at 9:18 PM Theodore Tso <tytso@mit.edu> wrote: > > My main concern with your patch is folio_lock() is *incredibly* > heavyweight and is going to be a real scalability concern if we need > to take it every single time we need to make a page writeable. > > So could we perhaps do something like this? So the first question is > do we need to take the lock at all? I'm not sure we need to worry > about the case where the page is not uptodate because we're racing > with the page being brought into memory; if we that could happen under > normal circumstances we would be triggering the warning even without > these situations such as a delayed allocaiton write failing due to a > corrupted file system image. So can we just do this? > > if (!folio_test_uptodate(folio)) { > ret = VM_FAULT_SIGBUS; > goto out; > } > > If it is legitmate that ext4_page_mkwrite() could be called while the > page is still being read in (and again, I don't think it is), then we > could do something like this: > > if (!folio_test_uptodate(folio)) { > folio_lock(folio); > if (!folio_test_uptodate(folio)) { > folio_unlock(folio); > ret = VM_FAULT_SIGBUS; > goto out; > } > folio_unlock(folio); > } > > Matthew, as the page cache maintainer, do we actually need this extra > rigamarole. Or can we just skip taking the lock before checking to > see if the folio is uptodate in ext4_page_mkwrite()? > > - Ted Hi Ted, Thank you for the feedback and the performance concern! You're absolutely right that folio_lock() is heavyweight. I included it because I was being overly cautious about potential races, but I agree with your analysis that under normal circumstances, ext4_page_mkwrite() should never be called with a non-uptodate folio. The non-uptodate state only occurs in this specific error case where: 1. Delayed allocation fails due to corruption 2. mpage_release_unused_pages() invalidates the folio 3. A subsequent operation triggers the fault In this error path, the folio is already in an inconsistent state, so checking folio_test_uptodate() without the lock should be sufficient to catch it. I'll wait for Matthew's input on the locking question, and then send v3 with the appropriate changes. Thank you for the guidance! Best regards, Deepanshu ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2] ext4: check folio uptodate state in ext4_page_mkwrite() 2025-12-03 15:46 ` Theodore Tso 2025-12-03 17:04 ` Deepanshu Kartikey @ 2025-12-03 18:40 ` Theodore Tso 2025-12-03 21:35 ` Matthew Wilcox 2025-12-03 21:54 ` Matthew Wilcox 3 siblings, 0 replies; 19+ messages in thread From: Theodore Tso @ 2025-12-03 18:40 UTC (permalink / raw) To: Deepanshu Kartikey Cc: Zhang Yi, linux-ext4, linux-kernel, syzbot+b0a0670332b6b3230a0a, adilger.kernel, djwong, Matthew Wilcox On Wed, Dec 03, 2025 at 10:46:57AM -0500, Theodore Tso wrote: > > if (!folio_test_uptodate(folio)) { > ret = VM_FAULT_SIGBUS; > goto out; > } And actually, thinking about this some more, is this check something that we should be doing in ext4? Or in the mm layer? Matthew, what do you think? - Ted ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2] ext4: check folio uptodate state in ext4_page_mkwrite() 2025-12-03 15:46 ` Theodore Tso 2025-12-03 17:04 ` Deepanshu Kartikey 2025-12-03 18:40 ` Theodore Tso @ 2025-12-03 21:35 ` Matthew Wilcox 2025-12-03 22:33 ` Theodore Tso 2025-12-03 21:54 ` Matthew Wilcox 3 siblings, 1 reply; 19+ messages in thread From: Matthew Wilcox @ 2025-12-03 21:35 UTC (permalink / raw) To: Theodore Tso Cc: Deepanshu Kartikey, Zhang Yi, linux-ext4, linux-kernel, syzbot+b0a0670332b6b3230a0a, adilger.kernel, djwong You snipped out all the context when adding me to the cc, and I'm on holiday until after Plumbers, so I'm disinclined to go looking for context. On Wed, Dec 03, 2025 at 10:46:57AM -0500, Theodore Tso wrote: > My main concern with your patch is folio_lock() is *incredibly* > heavyweight and is going to be a real scalability concern if we need > to take it every single time we need to make a page writeable. > > So could we perhaps do something like this? So the first question is > do we need to take the lock at all? I'm not sure we need to worry > about the case where the page is not uptodate because we're racing > with the page being brought into memory; if we that could happen under > normal circumstances we would be triggering the warning even without > these situations such as a delayed allocaiton write failing due to a > corrupted file system image. So can we just do this? > > if (!folio_test_uptodate(folio)) { > ret = VM_FAULT_SIGBUS; > goto out; > } > > If it is legitmate that ext4_page_mkwrite() could be called while the > page is still being read in (and again, I don't think it is), then we > could do something like this: > > if (!folio_test_uptodate(folio)) { > folio_lock(folio); > if (!folio_test_uptodate(folio)) { > folio_unlock(folio); > ret = VM_FAULT_SIGBUS; > goto out; > } > folio_unlock(folio); > } > > Matthew, as the page cache maintainer, do we actually need this extra > rigamarole. Or can we just skip taking the lock before checking to > see if the folio is uptodate in ext4_page_mkwrite()? > > - Ted ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2] ext4: check folio uptodate state in ext4_page_mkwrite() 2025-12-03 21:35 ` Matthew Wilcox @ 2025-12-03 22:33 ` Theodore Tso 2025-12-04 9:54 ` Deepanshu Kartikey 0 siblings, 1 reply; 19+ messages in thread From: Theodore Tso @ 2025-12-03 22:33 UTC (permalink / raw) To: Matthew Wilcox Cc: Deepanshu Kartikey, Zhang Yi, linux-ext4, linux-kernel, syzbot+b0a0670332b6b3230a0a, adilger.kernel, djwong On Wed, Dec 03, 2025 at 09:35:29PM +0000, Matthew Wilcox wrote: > You snipped out all the context when adding me to the cc, and I'm on > holiday until after Plumbers, so I'm disinclined to go looking for > context. Sorry about that. A quick summary. Deepanshu was attempting to address a Syzbot complaint[1]. [1] https://syzkaller.appspot.com/bug?extid=b0a0670332b6b3230a0a The TL;DR summary is that the syzbot complaint involved a maliciously corrupted file system which resulted file system getting detected when delayed allocation writeback attempts to do a block allocation. Error handling calls mpage_release_unused_pages(invalidate=true), which clears the uptodate flag via folio_clear_uptodate(). Because syzbot mounts the file system using errors=continue (which is the worst case; we're not panic'ing the kernel or forcing the file system to be read-only), we now have a situation where we have a folio which can be mapped, but !uptodate, but the file system can still be subject to changes. In the syzkaller reproducer, the potential malware might call ftruncate on the file, and this results ext4_truncate() calling ext4_block_truncate_page() which thentries to zero the page tail, which triggers a write fault, resulting in ext4_page_mkwrite() on a page/folio which is not uptodate. It then tries to mark the folio dirty, mapped but !uptdate, and then __folio_mark_dirty() triggers: WARN_ON_ONCE(!folio_test_uptodate()). Since in Syzkaller assumes users are stupid enough to panic on warn, this is an urgent security issue because it's a denial of service attack which is CVE worthy --- where the system admiinstrator is stupid enough to allow an untrusted user to mount an untrusted, maliciously crafted file system, instead of using fuse2fs. The security people thinkt his is super-duper important. Personally, I don't think it's all that urgent, so by all means, don't feel obliged to think about this while on vacation. :-) Anyway, that's the context. Deepanshu has a proposed fix here[2] which puts a folio_lock() into every single write page fault for ext4: + folio_lock(folio); + if (!folio_test_uptodate(folio)) { + folio_unlock(folio); + ret = VM_FAULT_SIGBUS; + goto out; + } + folio_unlock(folio); [2] https://lore.kernel.org/r/20251122015742.362444-1-kartikey406@gmail.com This seems.... unfortunate to me, so the first question is, "is locking the folio really necessary"? (I suspect the answer is no), and two, should this check be done in the mm layer calling page_mkwrite(), or in ext4_page_mkwrite()? Presumably, this might happen for other file systems, with either syzkaller coming up with this rather implausible scenario of really stupid, unfortunately system adminsitrator choices --- or in real life, if we do have some system adminisrtators making really stupid, unfortunate life choices. So maybe we should this check should be done above the file system layer? - Ted > > On Wed, Dec 03, 2025 at 10:46:57AM -0500, Theodore Tso wrote: > > My main concern with your patch is folio_lock() is *incredibly* > > heavyweight and is going to be a real scalability concern if we need > > to take it every single time we need to make a page writeable. > > > > So could we perhaps do something like this? So the first question is > > do we need to take the lock at all? I'm not sure we need to worry > > about the case where the page is not uptodate because we're racing > > with the page being brought into memory; if we that could happen under > > normal circumstances we would be triggering the warning even without > > these situations such as a delayed allocaiton write failing due to a > > corrupted file system image. So can we just do this? > > > > if (!folio_test_uptodate(folio)) { > > ret = VM_FAULT_SIGBUS; > > goto out; > > } > > > > If it is legitmate that ext4_page_mkwrite() could be called while the > > page is still being read in (and again, I don't think it is), then we > > could do something like this: > > > > if (!folio_test_uptodate(folio)) { > > folio_lock(folio); > > if (!folio_test_uptodate(folio)) { > > folio_unlock(folio); > > ret = VM_FAULT_SIGBUS; > > goto out; > > } > > folio_unlock(folio); > > } > > > > Matthew, as the page cache maintainer, do we actually need this extra > > rigamarole. Or can we just skip taking the lock before checking to > > see if the folio is uptodate in ext4_page_mkwrite()? > > > > - Ted ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2] ext4: check folio uptodate state in ext4_page_mkwrite() 2025-12-03 22:33 ` Theodore Tso @ 2025-12-04 9:54 ` Deepanshu Kartikey 2025-12-05 2:18 ` Theodore Tso 0 siblings, 1 reply; 19+ messages in thread From: Deepanshu Kartikey @ 2025-12-04 9:54 UTC (permalink / raw) To: Theodore Tso Cc: Matthew Wilcox, Zhang Yi, linux-ext4, linux-kernel, syzbot+b0a0670332b6b3230a0a, adilger.kernel, djwong On Thu, Dec 4, 2025 at 4:04 AM Theodore Tso <tytso@mit.edu> wrote: > > On Wed, Dec 03, 2025 at 09:35:29PM +0000, Matthew Wilcox wrote: > > You snipped out all the context when adding me to the cc, and I'm on > > holiday until after Plumbers, so I'm disinclined to go looking for > > context. > > Sorry about that. A quick summary. Deepanshu was attempting to > address a Syzbot complaint[1]. > > [1] https://syzkaller.appspot.com/bug?extid=b0a0670332b6b3230a0a > > The TL;DR summary is that the syzbot complaint involved a maliciously > corrupted file system which resulted file system getting detected when > delayed allocation writeback attempts to do a block allocation. Error > handling calls mpage_release_unused_pages(invalidate=true), which > clears the uptodate flag via folio_clear_uptodate(). > > Because syzbot mounts the file system using errors=continue (which is > the worst case; we're not panic'ing the kernel or forcing the file > system to be read-only), we now have a situation where we have a folio > which can be mapped, but !uptodate, but the file system can still be > subject to changes. > > In the syzkaller reproducer, the potential malware might call > ftruncate on the file, and this results ext4_truncate() calling > ext4_block_truncate_page() which thentries to zero the page tail, > which triggers a write fault, resulting in ext4_page_mkwrite() on a > page/folio which is not uptodate. It then tries to mark the folio > dirty, mapped but !uptdate, and then __folio_mark_dirty() triggers: > WARN_ON_ONCE(!folio_test_uptodate()). > > Since in Syzkaller assumes users are stupid enough to panic on warn, > this is an urgent security issue because it's a denial of service > attack which is CVE worthy --- where the system admiinstrator is > stupid enough to allow an untrusted user to mount an untrusted, > maliciously crafted file system, instead of using fuse2fs. The > security people thinkt his is super-duper important. Personally, I > don't think it's all that urgent, so by all means, don't feel obliged > to think about this while on vacation. :-) > > Anyway, that's the context. Deepanshu has a proposed fix here[2] > which puts a folio_lock() into every single write page fault for ext4: > > + folio_lock(folio); > + if (!folio_test_uptodate(folio)) { > + folio_unlock(folio); > + ret = VM_FAULT_SIGBUS; > + goto out; > + } > + folio_unlock(folio); > > [2] https://lore.kernel.org/r/20251122015742.362444-1-kartikey406@gmail.com > > This seems.... unfortunate to me, so the first question is, "is > locking the folio really necessary"? (I suspect the answer is no), > and two, should this check be done in the mm layer calling > page_mkwrite(), or in ext4_page_mkwrite()? > > Presumably, this might happen for other file systems, with either > syzkaller coming up with this rather implausible scenario of really > stupid, unfortunately system adminsitrator choices --- or in real > life, if we do have some system adminisrtators making really stupid, > unfortunate life choices. So maybe we should this check should be > done above the file system layer? > > - Ted > > > > > > > > > > > On Wed, Dec 03, 2025 at 10:46:57AM -0500, Theodore Tso wrote: > > > My main concern with your patch is folio_lock() is *incredibly* > > > heavyweight and is going to be a real scalability concern if we need > > > to take it every single time we need to make a page writeable. > > > > > > So could we perhaps do something like this? So the first question is > > > do we need to take the lock at all? I'm not sure we need to worry > > > about the case where the page is not uptodate because we're racing > > > with the page being brought into memory; if we that could happen under > > > normal circumstances we would be triggering the warning even without > > > these situations such as a delayed allocaiton write failing due to a > > > corrupted file system image. So can we just do this? > > > > > > if (!folio_test_uptodate(folio)) { > > > ret = VM_FAULT_SIGBUS; > > > goto out; > > > } > > > > > > If it is legitmate that ext4_page_mkwrite() could be called while the > > > page is still being read in (and again, I don't think it is), then we > > > could do something like this: > > > > > > if (!folio_test_uptodate(folio)) { > > > folio_lock(folio); > > > if (!folio_test_uptodate(folio)) { > > > folio_unlock(folio); > > > ret = VM_FAULT_SIGBUS; > > > goto out; > > > } > > > folio_unlock(folio); > > > } > > > > > > Matthew, as the page cache maintainer, do we actually need this extra > > > rigamarole. Or can we just skip taking the lock before checking to > > > see if the folio is uptodate in ext4_page_mkwrite()? > > > > > > - Ted Hi Ted, Thank you for the detailed summary and context. Based on Matthew's earlier feedback that we need to "prevent !uptodate folios from being referenced by the page tables," I believe the correct fix is not in ext4_page_mkwrite() at all, but rather in mpage_release_unused_pages(). When we invalidate folios due to writeback failure, we should also unmap them from page tables: static void mpage_release_unused_pages(..., bool invalidate) { // ... existing code ... if (invalidate) { if (folio_mapped(folio)) folio_clear_dirty_for_io(folio); // Unmap from page tables before invalidating if (folio_mapped(folio)) unmap_mapping_folio(folio); block_invalidate_folio(folio, 0, folio_size(folio)); folio_clear_uptodate(folio); } } This way: 1. No performance impact on normal operations (only in error path) 2. No folio_lock() needed in ext4_page_mkwrite() 3. Prevents stale PTEs from referencing invalidated folios 4. Any subsequent access triggers a new page fault instead To address your question about whether this should be in MM layer: other filesystems could hit this same issue if they use delayed allocation and invalidate folios on writeback failure. However, since the invalidation is happening in ext4-specific code (mpage_release_unused_pages), fixing it there seems appropriate. If this pattern appears in other filesystems, perhaps the fix could be moved to a common helper. I'll send v3 with this approach. Does this look correct? Best regards, Deepanshu ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2] ext4: check folio uptodate state in ext4_page_mkwrite() 2025-12-04 9:54 ` Deepanshu Kartikey @ 2025-12-05 2:18 ` Theodore Tso 2025-12-05 3:31 ` Deepanshu Kartikey 2025-12-05 3:33 ` Matthew Wilcox 0 siblings, 2 replies; 19+ messages in thread From: Theodore Tso @ 2025-12-05 2:18 UTC (permalink / raw) To: Deepanshu Kartikey Cc: Matthew Wilcox, Zhang Yi, linux-ext4, linux-kernel, syzbot+b0a0670332b6b3230a0a, adilger.kernel, djwong On Thu, Dec 04, 2025 at 03:24:50PM +0530, Deepanshu Kartikey wrote: > Based on Matthew's earlier feedback that we need to "prevent !uptodate > folios from being referenced by the page tables," I believe the > correct fix is not in ext4_page_mkwrite() at all, but rather in > mpage_release_unused_pages(). > > When we invalidate folios due to writeback failure, we should also > unmap them from page tables.... Hmm.... if the page is mmap'ed into the user process, on a writeback failure, the page contents will suddenly and without any warning, *disappear*. So the other option is we could simply *not* invalidate the folio, but instead leave the folio dirty. In some cases, where a particular block group is corrupted, if we retry the block allocation, the corrupted block group will be busied out, and so when the write back is retried, it's possible that the data will be actually be persisted. We do need to make sure the right thing we unmount the filesystem, since at that point, we have no choice but the invalidate the page and the data will get lost when the file system is unmounted. So it's a more complicated approach. But if this is happening when the file system is corrupted, especially if it was maliciously corrupted, all bets are off anyway, so maybe it's not worth the complexity. - Ted ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2] ext4: check folio uptodate state in ext4_page_mkwrite() 2025-12-05 2:18 ` Theodore Tso @ 2025-12-05 3:31 ` Deepanshu Kartikey 2025-12-05 3:33 ` Matthew Wilcox 1 sibling, 0 replies; 19+ messages in thread From: Deepanshu Kartikey @ 2025-12-05 3:31 UTC (permalink / raw) To: Theodore Tso Cc: Matthew Wilcox, Zhang Yi, linux-ext4, linux-kernel, syzbot+b0a0670332b6b3230a0a, adilger.kernel, djwong On Fri, Dec 5, 2025 at 7:49 AM Theodore Tso <tytso@mit.edu> wrote: > > Hmm.... if the page is mmap'ed into the user process, on a writeback > failure, the page contents will suddenly and without any warning, > *disappear*. > > So the other option is we could simply *not* invalidate the folio, but > instead leave the folio dirty. In some cases, where a particular > block group is corrupted, if we retry the block allocation, the > corrupted block group will be busied out, and so when the write back > is retried, it's possible that the data will be actually be persisted. > > We do need to make sure the right thing we unmount the filesystem, > since at that point, we have no choice but the invalidate the page and > the data will get lost when the file system is unmounted. So it's a > more complicated approach. But if this is happening when the file > system is corrupted, especially if it was maliciously corrupted, all > bets are off anyway, so maybe it's not worth the complexity. > > - Ted Hi Ted, I understand your concern about data loss. You're right that unmapping causes user data to disappear from memory. However, as you noted, when the filesystem is corrupted: 1. The error message already says "Data will be lost" 2. All bets are off anyway 3. The simpler fix prevents the WARNING/crash The more complex approach (keep dirty + retry) would be nice, but given that corruption is already detected, I agree it's not worth the complexity. I'll proceed with v3 using the unmap approach. Best regards, Deepanshu ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2] ext4: check folio uptodate state in ext4_page_mkwrite() 2025-12-05 2:18 ` Theodore Tso 2025-12-05 3:31 ` Deepanshu Kartikey @ 2025-12-05 3:33 ` Matthew Wilcox 2025-12-05 13:37 ` Theodore Tso 1 sibling, 1 reply; 19+ messages in thread From: Matthew Wilcox @ 2025-12-05 3:33 UTC (permalink / raw) To: Theodore Tso Cc: Deepanshu Kartikey, Zhang Yi, linux-ext4, linux-kernel, syzbot+b0a0670332b6b3230a0a, adilger.kernel, djwong On Thu, Dec 04, 2025 at 09:18:18PM -0500, Theodore Tso wrote: > On Thu, Dec 04, 2025 at 03:24:50PM +0530, Deepanshu Kartikey wrote: > > Based on Matthew's earlier feedback that we need to "prevent !uptodate > > folios from being referenced by the page tables," I believe the > > correct fix is not in ext4_page_mkwrite() at all, but rather in > > mpage_release_unused_pages(). > > > > When we invalidate folios due to writeback failure, we should also > > unmap them from page tables.... > > Hmm.... if the page is mmap'ed into the user process, on a writeback > failure, the page contents will suddenly and without any warning, > *disappear*. It sounds like I was confused -- I thought the folios being invalidated in mpage_release_unused_pages() belonged to the block device, but from what you're saying, they belong to a user-visible file? Once we hit a writeback error (whether we're in a "device gave EIO" or "filesystem is corrupted" situation), we're firmly outside what POSIX speaks to, and so all that matters is quality of implementation. Now, is the folio necessarily dirty at this point? I guess so if we're in the writeback path. Darrick got rid of similar code in iomap a few years ago; see commit e9c3a8e820ed. So it'd probably be good to have ext4 behave the same way. > So the other option is we could simply *not* invalidate the folio, but > instead leave the folio dirty. In some cases, where a particular > block group is corrupted, if we retry the block allocation, the > corrupted block group will be busied out, and so when the write back > is retried, it's possible that the data will be actually be persisted. > > We do need to make sure the right thing we unmount the filesystem, > since at that point, we have no choice but the invalidate the page and > the data will get lost when the file system is unmounted. So it's a > more complicated approach. But if this is happening when the file > system is corrupted, especially if it was maliciously corrupted, all > bets are off anyway, so maybe it's not worth the complexity. I'm generally in favour of doing anything we can to write dirty user data back to storage ;-) Of course if the storage is throwing a wobbly, that's beyond our abilities. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2] ext4: check folio uptodate state in ext4_page_mkwrite() 2025-12-05 3:33 ` Matthew Wilcox @ 2025-12-05 13:37 ` Theodore Tso 2025-12-05 14:28 ` Deepanshu Kartikey 0 siblings, 1 reply; 19+ messages in thread From: Theodore Tso @ 2025-12-05 13:37 UTC (permalink / raw) To: Matthew Wilcox Cc: Deepanshu Kartikey, Zhang Yi, linux-ext4, linux-kernel, syzbot+b0a0670332b6b3230a0a, adilger.kernel, djwong On Fri, Dec 05, 2025 at 03:33:22AM +0000, Matthew Wilcox wrote: > It sounds like I was confused -- I thought the folios being > invalidated in mpage_release_unused_pages() belonged to the block > device, but from what you're saying, they belong to a user-visible > file? Yes, correct. I'm guessing that we were marking the page !uptodate back when that was the only way to indicate that there had been any kind of I/O error (either on the read or write side). Obviously we have much better ways of doing it in the 21st century. :-) > Now, is the folio necessarily dirty at this point? I guess so if > we're in the writeback path. Darrick got rid of similar code in > iomap a few years ago; see commit e9c3a8e820ed. So it'd probably be > good to have ext4 behave the same way. Hmm, yes. Agreed. commit e9c3a8e820ed0eeb2be05072f29f80d1b79f053b Author: Darrick J. Wong <djwong@kernel.org> Date: Mon May 16 15:27:38 2022 -0700 iomap: don't invalidate folios after writeback errors XFS has the unique behavior (as compared to the other Linux filesystems) that on writeback errors it will completely invalidate the affected folio and force the page cache to reread the contents from disk. All other filesystems leave the page mapped and up to date. This is a rude awakening for user programs, since (in the case where write fails but reread doesn't) file contents will appear to revert old disk contents with no notification other than an EIO on fsync. This might have been annoying back in the days when iomap dealt with one page at a time, but with multipage folios, we can now throw away *megabytes* worth of data for a single write error... As Darrick pointed out we could potentially append a *single* byte to a file, and if there was some kind of writeback error, we could potentially throw away *vast* amounts of data for no good reason. - Ted ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2] ext4: check folio uptodate state in ext4_page_mkwrite() 2025-12-05 13:37 ` Theodore Tso @ 2025-12-05 14:28 ` Deepanshu Kartikey 2026-01-05 2:08 ` Deepanshu Kartikey 0 siblings, 1 reply; 19+ messages in thread From: Deepanshu Kartikey @ 2025-12-05 14:28 UTC (permalink / raw) To: Theodore Tso Cc: Matthew Wilcox, Zhang Yi, linux-ext4, linux-kernel, syzbot+b0a0670332b6b3230a0a, adilger.kernel, djwong On Fri, Dec 5, 2025 at 7:08 PM Theodore Tso <tytso@mit.edu> wrote: > > On Fri, Dec 05, 2025 at 03:33:22AM +0000, Matthew Wilcox wrote: > > It sounds like I was confused -- I thought the folios being > > invalidated in mpage_release_unused_pages() belonged to the block > > device, but from what you're saying, they belong to a user-visible > > file? > > Yes, correct. I'm guessing that we were marking the page !uptodate > back when that was the only way to indicate that there had been any > kind of I/O error (either on the read or write side). Obviously we > have much better ways of doing it in the 21st century. :-) > > > Now, is the folio necessarily dirty at this point? I guess so if > > we're in the writeback path. Darrick got rid of similar code in > > iomap a few years ago; see commit e9c3a8e820ed. So it'd probably be > > good to have ext4 behave the same way. > > Hmm, yes. Agreed. > > commit e9c3a8e820ed0eeb2be05072f29f80d1b79f053b > Author: Darrick J. Wong <djwong@kernel.org> > Date: Mon May 16 15:27:38 2022 -0700 > > iomap: don't invalidate folios after writeback errors > > XFS has the unique behavior (as compared to the other Linux > filesystems) that on writeback errors it will completely > invalidate the affected folio and force the page cache to reread > the contents from disk. All other filesystems leave the page > mapped and up to date. > > This is a rude awakening for user programs, since (in the case > where write fails but reread doesn't) file contents will appear to > revert old disk contents with no notification other than an EIO on > fsync. This might have been annoying back in the days when iomap > dealt with one page at a time, but with multipage folios, we can > now throw away *megabytes* worth of data for a single write error... > > As Darrick pointed out we could potentially append a *single* byte to > a file, and if there was some kind of writeback error, we could > potentially throw away *vast* amounts of data for no good reason. > > - Ted Hi Ted and Matthew, Thank you for pointing out the iomap commit. I now understand that invalidating folios on writeback errors is the wrong approach. Looking at Darrick's commit e9c3a8e820ed, iomap removed both folio_clear_uptodate() and the invalidation call, keeping folios in memory with their data intact even after writeback errors. For ext4, should I apply the same approach to mpage_release_unused_pages()? Specifically, remove the invalidation entirely: if (invalidate) { /* * On writeback errors, do not invalidate the folio or * clear the uptodate flag. This follows the behavior * established by iomap (commit e9c3a8e820ed "iomap: * don't invalidate folios after writeback errors"). */ if (folio_mapped(folio)) folio_clear_dirty_for_io(folio); - block_invalidate_folio(folio, 0, folio_size(folio)); - folio_clear_uptodate(folio); } This would: - Keep user data in memory instead of discarding it - Prevent the WARNING since folio remains uptodate - Match the behavior of modern filesystems - Prevent data loss from discarding potentially megabytes of data Is this the correct approach? If so, I'll send v4 with this fix. Best regards, Deepanshu ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2] ext4: check folio uptodate state in ext4_page_mkwrite() 2025-12-05 14:28 ` Deepanshu Kartikey @ 2026-01-05 2:08 ` Deepanshu Kartikey 0 siblings, 0 replies; 19+ messages in thread From: Deepanshu Kartikey @ 2026-01-05 2:08 UTC (permalink / raw) To: Theodore Tso Cc: Matthew Wilcox, Zhang Yi, linux-ext4, linux-kernel, syzbot+b0a0670332b6b3230a0a, adilger.kernel, djwong On Fri, Dec 5, 2025 at 7:58 PM Deepanshu Kartikey <kartikey406@gmail.com> wrote: > > On Fri, Dec 5, 2025 at 7:08 PM Theodore Tso <tytso@mit.edu> wrote: > > > > On Fri, Dec 05, 2025 at 03:33:22AM +0000, Matthew Wilcox wrote: > > > It sounds like I was confused -- I thought the folios being > > > invalidated in mpage_release_unused_pages() belonged to the block > > > device, but from what you're saying, they belong to a user-visible > > > file? > > > > Yes, correct. I'm guessing that we were marking the page !uptodate > > back when that was the only way to indicate that there had been any > > kind of I/O error (either on the read or write side). Obviously we > > have much better ways of doing it in the 21st century. :-) > > > > > Now, is the folio necessarily dirty at this point? I guess so if > > > we're in the writeback path. Darrick got rid of similar code in > > > iomap a few years ago; see commit e9c3a8e820ed. So it'd probably be > > > good to have ext4 behave the same way. > > > > Hmm, yes. Agreed. > > > > commit e9c3a8e820ed0eeb2be05072f29f80d1b79f053b > > Author: Darrick J. Wong <djwong@kernel.org> > > Date: Mon May 16 15:27:38 2022 -0700 > > > > iomap: don't invalidate folios after writeback errors > > > > XFS has the unique behavior (as compared to the other Linux > > filesystems) that on writeback errors it will completely > > invalidate the affected folio and force the page cache to reread > > the contents from disk. All other filesystems leave the page > > mapped and up to date. > > > > This is a rude awakening for user programs, since (in the case > > where write fails but reread doesn't) file contents will appear to > > revert old disk contents with no notification other than an EIO on > > fsync. This might have been annoying back in the days when iomap > > dealt with one page at a time, but with multipage folios, we can > > now throw away *megabytes* worth of data for a single write error... > > > > As Darrick pointed out we could potentially append a *single* byte to > > a file, and if there was some kind of writeback error, we could > > potentially throw away *vast* amounts of data for no good reason. > > > > - Ted > > > Hi Ted and Matthew, > > Thank you for pointing out the iomap commit. I now understand that > invalidating folios on writeback errors is the wrong approach. > > Looking at Darrick's commit e9c3a8e820ed, iomap removed both > folio_clear_uptodate() and the invalidation call, keeping folios in > memory with their data intact even after writeback errors. > > For ext4, should I apply the same approach to mpage_release_unused_pages()? > Specifically, remove the invalidation entirely: > > if (invalidate) { > /* > * On writeback errors, do not invalidate the folio or > * clear the uptodate flag. This follows the behavior > * established by iomap (commit e9c3a8e820ed "iomap: > * don't invalidate folios after writeback errors"). > */ > if (folio_mapped(folio)) > folio_clear_dirty_for_io(folio); > - block_invalidate_folio(folio, 0, folio_size(folio)); > - folio_clear_uptodate(folio); > } > > This would: > - Keep user data in memory instead of discarding it > - Prevent the WARNING since folio remains uptodate > - Match the behavior of modern filesystems > - Prevent data loss from discarding potentially megabytes of data > > Is this the correct approach? If so, I'll send v4 with this fix. > > Best regards, > Deepanshu Hi Ted and Matthew, I wanted to follow up on my previous email about the fix approach. Just to confirm: should I remove the folio invalidation from mpage_release_unused_pages() entirely, following Darrick's commit e9c3a8e820ed for iomap? The change would be: if (invalidate) { if (folio_mapped(folio)) folio_clear_dirty_for_io(folio); - block_invalidate_folio(folio, 0, folio_size(folio)); - folio_clear_uptodate(folio); } This keeps the folio in memory with data intact, prevents the WARNING, and matches modern filesystem behavior. If this approach is correct, I'll test and send v4. Otherwise, please let me know what adjustments are needed. Thank you, Deepanshu ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2] ext4: check folio uptodate state in ext4_page_mkwrite() 2025-12-03 15:46 ` Theodore Tso ` (2 preceding siblings ...) 2025-12-03 21:35 ` Matthew Wilcox @ 2025-12-03 21:54 ` Matthew Wilcox 3 siblings, 0 replies; 19+ messages in thread From: Matthew Wilcox @ 2025-12-03 21:54 UTC (permalink / raw) To: Theodore Tso Cc: Deepanshu Kartikey, Zhang Yi, linux-ext4, linux-kernel, syzbot+b0a0670332b6b3230a0a, adilger.kernel, djwong On Wed, Dec 03, 2025 at 10:46:57AM -0500, Theodore Tso wrote: > Matthew, as the page cache maintainer, do we actually need this extra > rigamarole. Or can we just skip taking the lock before checking to > see if the folio is uptodate in ext4_page_mkwrite()? Oh, this triggered a memory. Looks like you missed my reply in that thread. This patch is bogus. You need to prevent !uptodate folios from being referenced by the page tables. https://groups.google.com/g/syzkaller-bugs/c/kjaCOwdrWVg/m/6Li1dROPBQAJ ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2026-01-05 2:08 UTC | newest] Thread overview: 19+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-11-22 1:57 [PATCH v2] ext4: check folio uptodate state in ext4_page_mkwrite() Deepanshu Kartikey 2025-11-30 2:06 ` Deepanshu Kartikey 2025-12-02 12:24 ` Zhang Yi 2025-12-03 1:37 ` Deepanshu Kartikey 2025-12-03 6:52 ` Zhang Yi 2025-12-03 7:43 ` Deepanshu Kartikey 2025-12-03 15:46 ` Theodore Tso 2025-12-03 17:04 ` Deepanshu Kartikey 2025-12-03 18:40 ` Theodore Tso 2025-12-03 21:35 ` Matthew Wilcox 2025-12-03 22:33 ` Theodore Tso 2025-12-04 9:54 ` Deepanshu Kartikey 2025-12-05 2:18 ` Theodore Tso 2025-12-05 3:31 ` Deepanshu Kartikey 2025-12-05 3:33 ` Matthew Wilcox 2025-12-05 13:37 ` Theodore Tso 2025-12-05 14:28 ` Deepanshu Kartikey 2026-01-05 2:08 ` Deepanshu Kartikey 2025-12-03 21:54 ` Matthew Wilcox
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox