* [PATCH 0/2] ext4: fix race conditions and clean up locking of inline data writes
@ 2026-06-30 15:28 Aditya Srivastava
2026-06-30 15:28 ` [PATCH 1/2] ext4: use fsdata to track inline data write state Aditya Srivastava
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Aditya Srivastava @ 2026-06-30 15:28 UTC (permalink / raw)
To: Theodore Ts'o
Cc: Andreas Dilger, Jan Kara, Baokun Li, Ojaswin Mujoo,
Ritesh Harjani, Zhang Yi, Tao Ma, syzbot+0c89d865531d053abb2d,
linux-ext4, linux-fsdevel, linux-kernel,
Aditya Prakash Srivastava
From: Aditya Prakash Srivastava <aditya.ansh182@gmail.com>

This patch series addresses the remaining race conditions and locking
issues involved with inline data writes, implementing the clean
state-communication design suggested by Jan Kara.

Previously, `ext4_write_end()`, `ext4_journalled_write_end()`, and
`ext4_da_write_end()` checked the inode state and the inline data flag
directly to decide whether to finish writing inline data or to fall
back to block writes. This is highly susceptible to TOCTOU race
conditions where concurrent memory-mapped page faults
(`ext4_page_mkwrite()`) can convert the inline data to an extent
between `write_begin` and `write_end`. Since block buffers were not
allocated in the inline path during `write_begin`, such fallbacks
resulted in kernel crashes and NULL pointer dereferences because
`folio_buffers(folio)` was NULL.

The series cleans up and resolves these issues in two distinct steps:

1) Patch 1 introduces state tracking via the standard `fsdata`
   parameter. By marking whether a write was prepared as inline
   (`EXT4_WRITE_DATA_INLINE`) directly in the private per-write
   `fsdata` during `write_begin`, the corresponding `write_end`
   handlers can reliably decide whether to call
   `ext4_write_inline_data_end()` or complete a normal extent write.
   This eliminates the race-prone checks on the live inode state and
   gets rid of crude fallback/retry hacks.

2) Patch 2 replaces a potential kernel panic
   (`BUG_ON(!ext4_has_inline_data(inode))`) inside
   `ext4_write_inline_data_end()` with a graceful retry error path.
   If a concurrent conversion clears the inline flag right after the
   `write_end` checks pass but before the xattr semaphore is acquired,
   we gracefully release all held resources and return 0 (VFS retry) to
   let the VFS safely retry the write from scratch.

The series compiles clean against the latest linux-next/ext4 tree.

Thanks,
Aditya
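
To make the idea in step 1 concrete, here is a minimal userspace sketch, not kernel code. Only the three token values are taken from the series; `struct toy_inode` and every `toy_*` function are hypothetical stand-ins for the ext4 handlers. It shows why a decision recorded in `fsdata` at `write_begin` time stays stable even if the inode is converted before `write_end` runs, whereas a check on the live flag does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Token values as defined in fs/ext4/ext4.h by this series. */
#define FALL_BACK_TO_NONDELALLOC 1
#define CONVERT_INLINE_DATA      2
#define EXT4_WRITE_DATA_INLINE   3

/* Hypothetical stand-in for the inode's inline-data flag. */
struct toy_inode {
	bool has_inline_data;
};

enum toy_end_path { TOY_END_INLINE, TOY_END_BLOCK };

/* Models write_begin: record how the write was prepared in *fsdata. */
static int toy_write_begin(const struct toy_inode *inode, void **fsdata)
{
	assert(fsdata != NULL);	/* callers always pass a valid slot */
	*fsdata = NULL;
	if (inode->has_inline_data)
		*fsdata = (void *)(uintptr_t)EXT4_WRITE_DATA_INLINE;
	return 0;
}

/* Models a concurrent page fault converting inline data to an extent. */
static void toy_page_mkwrite(struct toy_inode *inode)
{
	inode->has_inline_data = false;
}

/* Old approach: write_end re-reads the live inode state. */
static enum toy_end_path toy_write_end_live(const struct toy_inode *inode)
{
	return inode->has_inline_data ? TOY_END_INLINE : TOY_END_BLOCK;
}

/* New approach: write_end trusts only the per-write token. */
static enum toy_end_path toy_write_end_fsdata(void *fsdata)
{
	if (fsdata == (void *)(uintptr_t)EXT4_WRITE_DATA_INLINE)
		return TOY_END_INLINE;
	return TOY_END_BLOCK;
}
```

Calling `toy_write_begin()`, then `toy_page_mkwrite()`, then both `toy_write_end_*()` variants shows the two approaches disagreeing: the live check picks the block path for a write that was prepared without buffers, while the token still selects the inline path.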

Aditya Prakash Srivastava (2):
  ext4: use fsdata to track inline data write state
  ext4: replace BUG_ON with graceful retry in ext4_write_inline_data_end

 fs/ext4/ext4.h   |  1 +
 fs/ext4/inline.c | 14 +++++++++++++-
 fs/ext4/inode.c  | 22 +++++++++++++---------
 3 files changed, 27 insertions(+), 10 deletions(-)

--
2.47.3
^ permalink raw reply [flat|nested] 7+ messages in thread

* [PATCH 1/2] ext4: use fsdata to track inline data write state
2026-06-30 15:28 [PATCH 0/2] ext4: fix race conditions and clean up locking of inline data writes Aditya Srivastava
@ 2026-06-30 15:28 ` Aditya Srivastava
2026-07-01 9:26 ` Jan Kara
2026-06-30 15:28 ` [PATCH 2/2] ext4: replace BUG_ON with graceful retry in ext4_write_inline_data_end Aditya Srivastava
2026-07-01 9:03 ` [PATCH 0/2] ext4: fix race conditions and clean up locking of inline data writes Jan Kara
2 siblings, 1 reply; 7+ messages in thread
From: Aditya Srivastava @ 2026-06-30 15:28 UTC (permalink / raw)
To: Theodore Ts'o
Cc: Andreas Dilger, Jan Kara, Baokun Li, Ojaswin Mujoo,
Ritesh Harjani, Zhang Yi, Tao Ma, syzbot+0c89d865531d053abb2d,
linux-ext4, linux-fsdevel, linux-kernel,
Aditya Prakash Srivastava
From: Aditya Prakash Srivastava <aditya.ansh182@gmail.com>

Instead of checking the live inode state (`ext4_has_inline_data(inode)`
and `ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA)`) in the
write_end handlers, use the `fsdata` parameter of the address space
operations to explicitly pass down the state in which `write_begin`
prepared the write.

A concurrent thread (such as `ext4_page_mkwrite()`) can convert the
inline data to an extent between `write_begin` and `write_end`. If this
happens, the write_end handlers would previously miss the inline
write_end path and fall through to extent-based write_end logic. However,
since block buffers were never allocated in `write_begin`, this resulted
in NULL pointer dereferences or data loss because `folio_buffers(folio)`
was NULL.

By defining `EXT4_WRITE_DATA_INLINE` (3) and communicating this state via
`fsdata`:
1) `ext4_write_begin()` and `ext4_da_write_begin()` explicitly set
   `*fsdata` to `EXT4_WRITE_DATA_INLINE` when an inline write is
   successfully prepared.
2) `ext4_write_end()`, `ext4_journalled_write_end()`, and
   `ext4_da_write_end()` rely solely on `fsdata` / `write_mode` to
   invoke `ext4_write_inline_data_end()`.

This removes the crude race fallbacks and makes the write_end
determination unambiguous, simple, and clean.

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Aditya Prakash Srivastava <aditya.ansh182@gmail.com>
---
 fs/ext4/ext4.h  |  1 +
 fs/ext4/inode.c | 22 +++++++++++++---------
 2 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index b37c136ea3ab..521bd5d6321c 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3138,6 +3138,7 @@ int do_journal_get_write_access(handle_t *handle, struct inode *inode,
 void ext4_set_inode_mapping_order(struct inode *inode);
 #define FALL_BACK_TO_NONDELALLOC 1
 #define CONVERT_INLINE_DATA	 2
+#define EXT4_WRITE_DATA_INLINE	 3
 
 typedef enum {
 	EXT4_IGET_NORMAL = 0,
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index ce99807c5f5b..e2e8ac5fb8d8 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1302,6 +1302,9 @@ static int ext4_write_begin(const struct kiocb *iocb,
 	if (unlikely(ret))
 		return ret;
 
+	if (fsdata && *fsdata != (void *)FALL_BACK_TO_NONDELALLOC)
+		*fsdata = NULL;
+
 	trace_ext4_write_begin(inode, pos, len);
 	/*
 	 * Reserve one block more for addition to orphan list in case
@@ -1316,8 +1319,11 @@ static int ext4_write_begin(const struct kiocb *iocb,
 						    foliop);
 		if (ret < 0)
 			return ret;
-		if (ret == 1)
+		if (ret == 1) {
+			if (fsdata)
+				*fsdata = (void *)EXT4_WRITE_DATA_INLINE;
 			return 0;
+		}
 	}
 
 	/*
@@ -1450,8 +1456,7 @@ static int ext4_write_end(const struct kiocb *iocb,
 
 	trace_ext4_write_end(inode, pos, len, copied);
 
-	if (ext4_has_inline_data(inode) &&
-	    ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA))
+	if (fsdata == (void *)EXT4_WRITE_DATA_INLINE)
 		return ext4_write_inline_data_end(inode, pos, len, copied,
 						  folio);
 
@@ -1560,8 +1565,7 @@ static int ext4_journalled_write_end(const struct kiocb *iocb,
 
 	BUG_ON(!ext4_handle_valid(handle));
 
-	if (ext4_has_inline_data(inode) &&
-	    ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA))
+	if (fsdata == (void *)EXT4_WRITE_DATA_INLINE)
 		return ext4_write_inline_data_end(inode, pos, len, copied,
 						  folio);
 
@@ -3161,8 +3165,10 @@ static int ext4_da_write_begin(const struct kiocb *iocb,
 						      foliop, fsdata, true);
 		if (ret < 0)
 			return ret;
-		if (ret == 1)
+		if (ret == 1) {
+			*fsdata = (void *)EXT4_WRITE_DATA_INLINE;
 			return 0;
+		}
 	}
 
 retry:
@@ -3299,9 +3305,7 @@ static int ext4_da_write_end(const struct kiocb *iocb,
 
 	trace_ext4_da_write_end(inode, pos, len, copied);
 
-	if (write_mode != CONVERT_INLINE_DATA &&
-	    ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA) &&
-	    ext4_has_inline_data(inode))
+	if (write_mode == EXT4_WRITE_DATA_INLINE)
 		return ext4_write_inline_data_end(inode, pos, len, copied,
 						  folio);
 
-- 
2.47.3

^ permalink raw reply related [flat|nested] 7+ messages in thread
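
As a rough aid to reasoning about this patch on its own, the userspace sketch below enumerates where a concurrent conversion can land and what each dispatch rule then does. It is a simplified model under stated assumptions, not kernel code: all `toy_*` and `TOY_*` names are hypothetical, and the final assertion step stands in for the `BUG_ON(!ext4_has_inline_data(inode))` that the cover letter says still sits inside `ext4_write_inline_data_end()` until patch 2 is applied.

```c
#include <assert.h>
#include <stdbool.h>

/* How write_end decides which completion path to take. */
enum toy_dispatch { TOY_DISPATCH_LIVE_STATE, TOY_DISPATCH_FSDATA };

/* Where the concurrent inline-to-extent conversion lands. */
enum toy_race_point {
	TOY_RACE_NONE,
	TOY_RACE_BEFORE_END_CHECK,	/* between write_begin and write_end */
	TOY_RACE_BEFORE_XATTR_LOCK,	/* after the check, before the lock  */
};

enum toy_outcome {
	TOY_OUT_INLINE_OK,		/* inline write completed normally   */
	TOY_OUT_BLOCK_PATH_NO_BUFFERS,	/* fell through to block write_end   */
	TOY_OUT_BUG_ON,			/* assertion in the inline end path  */
};

/* Models one write that write_begin prepared as an inline write. */
static enum toy_outcome toy_write(enum toy_dispatch how,
				  enum toy_race_point race)
{
	bool inline_flag = true;	/* live inode state */
	const bool token_inline = true;	/* what write_begin recorded */
	bool take_inline;

	assert(how == TOY_DISPATCH_LIVE_STATE || how == TOY_DISPATCH_FSDATA);

	if (race == TOY_RACE_BEFORE_END_CHECK)
		inline_flag = false;

	take_inline = (how == TOY_DISPATCH_FSDATA) ? token_inline : inline_flag;
	if (!take_inline)
		return TOY_OUT_BLOCK_PATH_NO_BUFFERS;

	if (race == TOY_RACE_BEFORE_XATTR_LOCK)
		inline_flag = false;

	/* Inline end path: take the xattr lock, then assert the flag. */
	if (!inline_flag)
		return TOY_OUT_BUG_ON;
	return TOY_OUT_INLINE_OK;
}

/* Counts how many of the modelled race points reach the assertion. */
static int toy_count_bug_on(enum toy_dispatch how)
{
	int hits = 0;

	if (toy_write(how, TOY_RACE_BEFORE_END_CHECK) == TOY_OUT_BUG_ON)
		hits++;
	if (toy_write(how, TOY_RACE_BEFORE_XATTR_LOCK) == TOY_OUT_BUG_ON)
		hits++;
	return hits;
}
```

In this model the token-based dispatch removes the fall-through to the block path, but both race points now end at the assertion, which is why the assertion itself also needs a non-fatal exit.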

* Re: [PATCH 1/2] ext4: use fsdata to track inline data write state
2026-06-30 15:28 ` [PATCH 1/2] ext4: use fsdata to track inline data write state Aditya Srivastava
@ 2026-07-01 9:26 ` Jan Kara
0 siblings, 0 replies; 7+ messages in thread
From: Jan Kara @ 2026-07-01 9:26 UTC (permalink / raw)
To: Aditya Srivastava
Cc: Theodore Ts'o, Andreas Dilger, Jan Kara, Baokun Li,
Ojaswin Mujoo, Ritesh Harjani, Zhang Yi, Tao Ma,
syzbot+0c89d865531d053abb2d, linux-ext4, linux-fsdevel,
linux-kernel

On Tue 30-06-26 15:28:11, Aditya Srivastava wrote:
> From: Aditya Prakash Srivastava <aditya.ansh182@gmail.com>
>
> Instead of checking the live inode state (`ext4_has_inline_data(inode)`
> and `ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA)`) in the
> write_end handlers, use the `fsdata` parameter of the address space
> operations to explicitly pass down the state in which `write_begin`
> prepared the write.
>
> A concurrent thread (such as `ext4_page_mkwrite()`) can convert the
> inline data to an extent between `write_begin` and `write_end`. If this
> happens, the write_end handlers would previously miss the inline
> write_end path and fall through to extent-based write_end logic. However,
> since block buffers were never allocated in `write_begin`, this resulted
> in NULL pointer dereferences or data loss because `folio_buffers(folio)`
> was NULL.
>
> By defining `EXT4_WRITE_DATA_INLINE` (3) and communicating this state via
> `fsdata`:
> 1) `ext4_write_begin()` and `ext4_da_write_begin()` explicitly set
>    `*fsdata` to `EXT4_WRITE_DATA_INLINE` when an inline write is
>    successfully prepared.
> 2) `ext4_write_end()`, `ext4_journalled_write_end()`, and
>    `ext4_da_write_end()` rely solely on `fsdata` / `write_mode` to
>    invoke `ext4_write_inline_data_end()`.
>
> This removes the crude race fallbacks and makes the write_end
> determination unambiguous, simple, and clean.
>
> Suggested-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Aditya Prakash Srivastava <aditya.ansh182@gmail.com>
> ---
>  fs/ext4/ext4.h  |  1 +
>  fs/ext4/inode.c | 22 +++++++++++++---------
>  2 files changed, 14 insertions(+), 9 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index b37c136ea3ab..521bd5d6321c 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -3138,6 +3138,7 @@ int do_journal_get_write_access(handle_t *handle, struct inode *inode,
>  void ext4_set_inode_mapping_order(struct inode *inode);
>  #define FALL_BACK_TO_NONDELALLOC 1
>  #define CONVERT_INLINE_DATA	 2
> +#define EXT4_WRITE_DATA_INLINE	 3
>
>  typedef enum {
>  	EXT4_IGET_NORMAL = 0,
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index ce99807c5f5b..e2e8ac5fb8d8 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1302,6 +1302,9 @@ static int ext4_write_begin(const struct kiocb *iocb,
>  	if (unlikely(ret))
>  		return ret;
>
> +	if (fsdata && *fsdata != (void *)FALL_BACK_TO_NONDELALLOC)
> +		*fsdata = NULL;

This is pointless. *fsdata is guaranteed to be NULL on entry to
->write_begin from generic code, ext4 calls ext4_write_begin() from one
place and there it has FALL_BACK_TO_NONDELALLOC value.

> +
>  	trace_ext4_write_begin(inode, pos, len);
>  	/*
>  	 * Reserve one block more for addition to orphan list in case
> @@ -1316,8 +1319,11 @@ static int ext4_write_begin(const struct kiocb *iocb,
>  						    foliop);
>  		if (ret < 0)
>  			return ret;
> -		if (ret == 1)
> +		if (ret == 1) {
> +			if (fsdata)

fsdata is guaranteed to be non-NULL so no need to check for it here.

> +				*fsdata = (void *)EXT4_WRITE_DATA_INLINE;
>  			return 0;
> +		}
>  	}
>
>  	/*
> @@ -1450,8 +1456,7 @@ static int ext4_write_end(const struct kiocb *iocb,
>
>  	trace_ext4_write_end(inode, pos, len, copied);
>
> -	if (ext4_has_inline_data(inode) &&
> -	    ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA))
> +	if (fsdata == (void *)EXT4_WRITE_DATA_INLINE)
>  		return ext4_write_inline_data_end(inode, pos, len, copied,
>  						  folio);

OK, but if we race like:

CPU1					CPU2
ext4_try_to_write_inline_data()
					ext4_page_mkwrite()
ext4_write_inline_data_end()

Then ext4_write_inline_data_end() will now hit:

	BUG_ON(!ext4_has_inline_data(inode));

with much higher probability. You fix that in patch 2 but generally we try
hard to avoid such breaking points in git history. So please fold patch 2
into this one to keep things working all the time.

								Honza

>
> @@ -1560,8 +1565,7 @@ static int ext4_journalled_write_end(const struct kiocb *iocb,
>
>  	BUG_ON(!ext4_handle_valid(handle));
>
> -	if (ext4_has_inline_data(inode) &&
> -	    ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA))
> +	if (fsdata == (void *)EXT4_WRITE_DATA_INLINE)
>  		return ext4_write_inline_data_end(inode, pos, len, copied,
>  						  folio);
>
> @@ -3161,8 +3165,10 @@ static int ext4_da_write_begin(const struct kiocb *iocb,
>  						      foliop, fsdata, true);
>  		if (ret < 0)
>  			return ret;
> -		if (ret == 1)
> +		if (ret == 1) {
> +			*fsdata = (void *)EXT4_WRITE_DATA_INLINE;
>  			return 0;
> +		}
>  	}
>
>  retry:
> @@ -3299,9 +3305,7 @@ static int ext4_da_write_end(const struct kiocb *iocb,
>
>  	trace_ext4_da_write_end(inode, pos, len, copied);
>
> -	if (write_mode != CONVERT_INLINE_DATA &&
> -	    ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA) &&
> -	    ext4_has_inline_data(inode))
> +	if (write_mode == EXT4_WRITE_DATA_INLINE)
>  		return ext4_write_inline_data_end(inode, pos, len, copied,
>  						  folio);
>
> --
> 2.47.3
>
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply [flat|nested] 7+ messages in thread

* [PATCH 2/2] ext4: replace BUG_ON with graceful retry in ext4_write_inline_data_end
2026-06-30 15:28 [PATCH 0/2] ext4: fix race conditions and clean up locking of inline data writes Aditya Srivastava
2026-06-30 15:28 ` [PATCH 1/2] ext4: use fsdata to track inline data write state Aditya Srivastava
@ 2026-06-30 15:28 ` Aditya Srivastava
2026-07-01 9:03 ` [PATCH 0/2] ext4: fix race conditions and clean up locking of inline data writes Jan Kara
2 siblings, 0 replies; 7+ messages in thread
From: Aditya Srivastava @ 2026-06-30 15:28 UTC (permalink / raw)
To: Theodore Ts'o
Cc: Andreas Dilger, Jan Kara, Baokun Li, Ojaswin Mujoo,
Ritesh Harjani, Zhang Yi, Tao Ma, syzbot+0c89d865531d053abb2d,
linux-ext4, linux-fsdevel, linux-kernel,
Aditya Prakash Srivastava
From: Aditya Prakash Srivastava <aditya.ansh182@gmail.com>

During a buffered write, `ext4_write_inline_data_end()` acquires the
xattr lock after preparing the write. If a concurrent page fault
(`ext4_page_mkwrite()`) converts the inline data to an extent after the
write_end handlers check the state but before
`ext4_write_inline_data_end()` acquires the xattr write lock, the
subsequent check will trigger a kernel panic via
`BUG_ON(!ext4_has_inline_data(inode))`.

Replace the `BUG_ON` check with a graceful error-handling retry path. If
the inline data is cleared after locking the xattr, we safely release all
resources (releasing `iloc.bh`, unlocking/putting the folio, stopping the
active journal transaction handle) and return 0 (VFS retry) to let the
generic write path retry the operation safely.

Reported-by: syzbot+0c89d865531d053abb2d@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=0c89d865531d053abb2d
Fixes: 3fdcfb668fd7 ("ext4: add journalled write support for inline data")
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Aditya Prakash Srivastava <aditya.ansh182@gmail.com>
---
 fs/ext4/inline.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index 8045e4ff270c..cfd591dc1d9c 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -812,7 +812,19 @@ int ext4_write_inline_data_end(struct inode *inode, loff_t pos, unsigned len,
 			goto out;
 		}
 		ext4_write_lock_xattr(inode, &no_expand);
-		BUG_ON(!ext4_has_inline_data(inode));
+		/*
+		 * We could have raced with ext4_page_mkwrite() converting
+		 * the inode and clearing the inline data flag, so we just
+		 * release resources and retry the whole write.
+		 */
+		if (unlikely(!ext4_has_inline_data(inode))) {
+			ext4_write_unlock_xattr(inode, &no_expand);
+			brelse(iloc.bh);
+			folio_unlock(folio);
+			folio_put(folio);
+			ext4_journal_stop(handle);
+			return 0;
+		}
 
 		/*
 		 * ei->i_inline_off may have changed since
-- 
2.47.3

^ permalink raw reply related [flat|nested] 7+ messages in thread
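
The release-and-retry shape of this change can be sketched in userspace as follows. This is a toy model under stated assumptions, not the kernel implementation: every `toy_*` name is hypothetical, the counters merely stand in for `iloc.bh`, the folio, and the journal handle, and the retry loop only illustrates the behaviour the commit message attributes to the generic write path when the end handler returns 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_inode {
	bool has_inline_data;
	bool xattr_locked;
};

/* Stand-ins for the resources held while the end handler runs. */
struct toy_write_ctx {
	int  bh_refs;		/* models the iloc.bh reference */
	int  folio_refs;	/* models the folio reference   */
	bool folio_locked;
	bool handle_open;	/* models the journal handle    */
};

static void toy_take_resources(struct toy_write_ctx *ctx)
{
	ctx->bh_refs = 1;
	ctx->folio_refs = 1;
	ctx->folio_locked = true;
	ctx->handle_open = true;
}

static void toy_release_resources(struct toy_write_ctx *ctx)
{
	ctx->bh_refs--;			/* brelse()            */
	ctx->folio_locked = false;	/* folio_unlock()      */
	ctx->folio_refs--;		/* folio_put()         */
	ctx->handle_open = false;	/* ext4_journal_stop() */
}

/* Inline end path: re-check the flag under the lock, never assert. */
static size_t toy_write_inline_data_end(struct toy_inode *inode,
					struct toy_write_ctx *ctx,
					size_t copied)
{
	inode->xattr_locked = true;
	if (!inode->has_inline_data) {
		/* Lost the race: drop everything, report no progress. */
		inode->xattr_locked = false;
		toy_release_resources(ctx);
		return 0;
	}
	inode->xattr_locked = false;
	toy_release_resources(ctx);
	return copied;
}

static size_t toy_block_write_end(struct toy_write_ctx *ctx, size_t copied)
{
	toy_release_resources(ctx);
	return copied;
}

/* Caller loop: a return of 0 means "start over from write_begin". */
static size_t toy_perform_write(struct toy_inode *inode, size_t len,
				bool race_on_first_attempt, int *attempts)
{
	size_t written = 0;

	*attempts = 0;
	while (written < len && *attempts < 8) {
		struct toy_write_ctx ctx;
		bool inline_token = inode->has_inline_data; /* write_begin */

		toy_take_resources(&ctx);
		(*attempts)++;
		if (race_on_first_attempt && *attempts == 1)
			inode->has_inline_data = false; /* racing conversion */

		written += inline_token
			? toy_write_inline_data_end(inode, &ctx, len - written)
			: toy_block_write_end(&ctx, len - written);

		/* Nothing may leak, whichever path was taken. */
		assert(ctx.bh_refs == 0 && ctx.folio_refs == 0);
		assert(!ctx.folio_locked && !ctx.handle_open);
	}
	return written;
}
```

With a conversion injected on the first attempt, the loop makes no progress once, then completes through the block path on the second pass; the point of the model is that the early return leaves no lock, reference, or handle behind.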

* Re: [PATCH 0/2] ext4: fix race conditions and clean up locking of inline data writes
2026-06-30 15:28 [PATCH 0/2] ext4: fix race conditions and clean up locking of inline data writes Aditya Srivastava
2026-06-30 15:28 ` [PATCH 1/2] ext4: use fsdata to track inline data write state Aditya Srivastava
2026-06-30 15:28 ` [PATCH 2/2] ext4: replace BUG_ON with graceful retry in ext4_write_inline_data_end Aditya Srivastava
@ 2026-07-01 9:03 ` Jan Kara
2026-07-01 9:29 ` Aditya Prakash Srivastava
2 siblings, 1 reply; 7+ messages in thread
From: Jan Kara @ 2026-07-01 9:03 UTC (permalink / raw)
To: Aditya Srivastava
Cc: Theodore Ts'o, Andreas Dilger, Jan Kara, Baokun Li,
Ojaswin Mujoo, Ritesh Harjani, Zhang Yi, Tao Ma,
syzbot+0c89d865531d053abb2d, linux-ext4, linux-fsdevel,
linux-kernel

On Tue 30-06-26 15:28:10, Aditya Srivastava wrote:
> From: Aditya Prakash Srivastava <aditya.ansh182@gmail.com>
>
> This patch series addresses the remaining race conditions and locking
> issues involved with inline data writes, implementing the clean
> state-communication design suggested by Jan Kara.
>
> Previously, `ext4_write_end()`, `ext4_journalled_write_end()`, and
> `ext4_da_write_end()` checked the inode state and the inline data flag
> directly to decide whether to finish writing inline data or to fall
> back to block writes. This is highly susceptible to TOCTOU race
> conditions where concurrent memory-mapped page faults
> (`ext4_page_mkwrite()`) can convert the inline data to an extent
> between `write_begin` and `write_end`. Since block buffers were not
> allocated in the inline path during `write_begin`, such fallbacks
> resulted in kernel crashes and NULL pointer dereferences because
> `folio_buffers(folio)` was NULL.
>
> The series cleans up and resolves these issues in two distinct steps:
>
> 1) Patch 1 introduces state tracking via the standard `fsdata`
>    parameter. By marking whether a write was prepared as inline
>    (`EXT4_WRITE_DATA_INLINE`) directly in the private per-write
>    `fsdata` during `write_begin`, the corresponding `write_end`
>    handlers can reliably decide whether to call
>    `ext4_write_inline_data_end()` or complete a normal extent write.
>    This eliminates the race-prone checks on the live inode state and
>    gets rid of crude fallback/retry hacks.
>
> 2) Patch 2 replaces a potential kernel panic
>    (`BUG_ON(!ext4_has_inline_data(inode))`) inside
>    `ext4_write_inline_data_end()` with a graceful retry error path.
>    If a concurrent conversion clears the inline flag right after the
>    `write_end` checks pass but before the xattr semaphore is acquired,
>    we gracefully release all held resources and return 0 (VFS retry) to
>    let the VFS safely retry the write from scratch.
>
> The series compiles clean against the latest linux-next/ext4 tree.

This caught my eye: Do you actually do real testing of your patches? Like
using fstests / kvm-xfstests? Every patch author for ext4 is supposed to do
that before submitting his changes...

								Honza

> Aditya Prakash Srivastava (2):
>   ext4: use fsdata to track inline data write state
>   ext4: replace BUG_ON with graceful retry in ext4_write_inline_data_end
>
>  fs/ext4/ext4.h   |  1 +
>  fs/ext4/inline.c | 14 +++++++++++++-
>  fs/ext4/inode.c  | 22 +++++++++++++---------
>  3 files changed, 27 insertions(+), 10 deletions(-)
>
> --
> 2.47.3
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply [flat|nested] 7+ messages in thread

* Re: [PATCH 0/2] ext4: fix race conditions and clean up locking of inline data writes
2026-07-01 9:03 ` [PATCH 0/2] ext4: fix race conditions and clean up locking of inline data writes Jan Kara
@ 2026-07-01 9:29 ` Aditya Prakash Srivastava
2026-07-01 9:36 ` Jan Kara
0 siblings, 1 reply; 7+ messages in thread
From: Aditya Prakash Srivastava @ 2026-07-01 9:29 UTC (permalink / raw)
To: Jan Kara
Cc: Theodore Ts'o, Andreas Dilger, Baokun Li, Ojaswin Mujoo,
Ritesh Harjani, Zhang Yi, Tao Ma, syzbot+0c89d865531d053abb2d,
linux-ext4, linux-fsdevel, linux-kernel

Hi Jan,

While the cover letter mentions only a clean compile, I can categorically
confirm that I have run fstests, LTP, and a layered filesystem I/O
testing matrix before submitting. Sorry about the confusion in the write-up.

Thanks,
Aditya

On Wed, Jul 1, 2026 at 2:33 PM Jan Kara <jack@suse.cz> wrote:
>
> On Tue 30-06-26 15:28:10, Aditya Srivastava wrote:
> > From: Aditya Prakash Srivastava <aditya.ansh182@gmail.com>
> >
> > This patch series addresses the remaining race conditions and locking
> > issues involved with inline data writes, implementing the clean
> > state-communication design suggested by Jan Kara.
> >
> > Previously, `ext4_write_end()`, `ext4_journalled_write_end()`, and
> > `ext4_da_write_end()` checked the inode state and the inline data flag
> > directly to decide whether to finish writing inline data or to fall
> > back to block writes. This is highly susceptible to TOCTOU race
> > conditions where concurrent memory-mapped page faults
> > (`ext4_page_mkwrite()`) can convert the inline data to an extent
> > between `write_begin` and `write_end`. Since block buffers were not
> > allocated in the inline path during `write_begin`, such fallbacks
> > resulted in kernel crashes and NULL pointer dereferences because
> > `folio_buffers(folio)` was NULL.
> >
> > The series cleans up and resolves these issues in two distinct steps:
> >
> > 1) Patch 1 introduces state tracking via the standard `fsdata`
> >    parameter. By marking whether a write was prepared as inline
> >    (`EXT4_WRITE_DATA_INLINE`) directly in the private per-write
> >    `fsdata` during `write_begin`, the corresponding `write_end`
> >    handlers can reliably decide whether to call
> >    `ext4_write_inline_data_end()` or complete a normal extent write.
> >    This eliminates the race-prone checks on the live inode state and
> >    gets rid of crude fallback/retry hacks.
> >
> > 2) Patch 2 replaces a potential kernel panic
> >    (`BUG_ON(!ext4_has_inline_data(inode))`) inside
> >    `ext4_write_inline_data_end()` with a graceful retry error path.
> >    If a concurrent conversion clears the inline flag right after the
> >    `write_end` checks pass but before the xattr semaphore is acquired,
> >    we gracefully release all held resources and return 0 (VFS retry) to
> >    let the VFS safely retry the write from scratch.
> >
> > The series compiles clean against the latest linux-next/ext4 tree.
>
> This caught my eye: Do you actually do real testing of your patches? Like
> using fstests / kvm-xfstests? Every patch author for ext4 is supposed to do
> that before submitting his changes...
>
>                                                                 Honza
>
> > Aditya Prakash Srivastava (2):
> >   ext4: use fsdata to track inline data write state
> >   ext4: replace BUG_ON with graceful retry in ext4_write_inline_data_end
> >
> >  fs/ext4/ext4.h   |  1 +
> >  fs/ext4/inline.c | 14 +++++++++++++-
> >  fs/ext4/inode.c  | 22 +++++++++++++---------
> >  3 files changed, 27 insertions(+), 10 deletions(-)
> >
> > --
> > 2.47.3
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

^ permalink raw reply [flat|nested] 7+ messages in thread

* Re: [PATCH 0/2] ext4: fix race conditions and clean up locking of inline data writes
2026-07-01 9:29 ` Aditya Prakash Srivastava
@ 2026-07-01 9:36 ` Jan Kara
0 siblings, 0 replies; 7+ messages in thread
From: Jan Kara @ 2026-07-01 9:36 UTC (permalink / raw)
To: Aditya Prakash Srivastava
Cc: Jan Kara, Theodore Ts'o, Andreas Dilger, Baokun Li,
Ojaswin Mujoo, Ritesh Harjani, Zhang Yi, Tao Ma,
syzbot+0c89d865531d053abb2d, linux-ext4, linux-fsdevel,
linux-kernel

Hi!

On Wed 01-07-26 14:59:41, Aditya Prakash Srivastava wrote:
> While the cover letter mentions only a clean compile, I can categorically
> confirm that I have run fstests, LTP, and a layered filesystem I/O
> testing matrix before submitting. Sorry about the confusion in the write-up.

Cool, thanks for confirmation! I was suspecting this was just unlucky
formulation but wanted to make sure... I'm sorry for the noise.

								Honza

> On Wed, Jul 1, 2026 at 2:33 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Tue 30-06-26 15:28:10, Aditya Srivastava wrote:
> > > From: Aditya Prakash Srivastava <aditya.ansh182@gmail.com>
> > >
> > > This patch series addresses the remaining race conditions and locking
> > > issues involved with inline data writes, implementing the clean
> > > state-communication design suggested by Jan Kara.
> > >
> > > Previously, `ext4_write_end()`, `ext4_journalled_write_end()`, and
> > > `ext4_da_write_end()` checked the inode state and the inline data flag
> > > directly to decide whether to finish writing inline data or to fall
> > > back to block writes. This is highly susceptible to TOCTOU race
> > > conditions where concurrent memory-mapped page faults
> > > (`ext4_page_mkwrite()`) can convert the inline data to an extent
> > > between `write_begin` and `write_end`. Since block buffers were not
> > > allocated in the inline path during `write_begin`, such fallbacks
> > > resulted in kernel crashes and NULL pointer dereferences because
> > > `folio_buffers(folio)` was NULL.
> > >
> > > The series cleans up and resolves these issues in two distinct steps:
> > >
> > > 1) Patch 1 introduces state tracking via the standard `fsdata`
> > >    parameter. By marking whether a write was prepared as inline
> > >    (`EXT4_WRITE_DATA_INLINE`) directly in the private per-write
> > >    `fsdata` during `write_begin`, the corresponding `write_end`
> > >    handlers can reliably decide whether to call
> > >    `ext4_write_inline_data_end()` or complete a normal extent write.
> > >    This eliminates the race-prone checks on the live inode state and
> > >    gets rid of crude fallback/retry hacks.
> > >
> > > 2) Patch 2 replaces a potential kernel panic
> > >    (`BUG_ON(!ext4_has_inline_data(inode))`) inside
> > >    `ext4_write_inline_data_end()` with a graceful retry error path.
> > >    If a concurrent conversion clears the inline flag right after the
> > >    `write_end` checks pass but before the xattr semaphore is acquired,
> > >    we gracefully release all held resources and return 0 (VFS retry) to
> > >    let the VFS safely retry the write from scratch.
> > >
> > > The series compiles clean against the latest linux-next/ext4 tree.
> >
> > This caught my eye: Do you actually do real testing of your patches? Like
> > using fstests / kvm-xfstests? Every patch author for ext4 is supposed to do
> > that before submitting his changes...
> >
> >                                                                 Honza
> >
> > > Aditya Prakash Srivastava (2):
> > >   ext4: use fsdata to track inline data write state
> > >   ext4: replace BUG_ON with graceful retry in ext4_write_inline_data_end
> > >
> > >  fs/ext4/ext4.h   |  1 +
> > >  fs/ext4/inline.c | 14 +++++++++++++-
> > >  fs/ext4/inode.c  | 22 +++++++++++++---------
> > >  3 files changed, 27 insertions(+), 10 deletions(-)
> > >
> > > --
> > > 2.47.3
> > --
> > Jan Kara <jack@suse.com>
> > SUSE Labs, CR
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply [flat|nested] 7+ messages in thread