* [PATCH v7 0/9] Ext4 Fast Commit Performance Patchset
2024-08-18 4:03 [PATCH] ext4: convert i_fc_lock to spinlock Harshad Shirwadkar
@ 2024-08-18 4:03 ` Harshad Shirwadkar
2024-08-18 4:49 ` harshad shirwadkar
2024-08-18 4:03 ` [PATCH v7 1/9] ext4: convert i_fc_lock to spinlock Harshad Shirwadkar
` (8 subsequent siblings)
9 siblings, 1 reply; 22+ messages in thread
From: Harshad Shirwadkar @ 2024-08-18 4:03 UTC (permalink / raw)
To: linux-ext4; +Cc: tytso, jack, harshads, Harshad Shirwadkar
V7 Cover Letter
---------------
This is the V7 of the patch series. This patch series contains fixes to
review comments by Jan Kara (<jack@suse.cz>). The main changes are as
follows:
- As discussed in V6 review, I have dropped "ext4: add nolock mode to
ext4_map_blocks()" patch given that we now ensure that i_data_sem is
always grabbed before ext4_mark_inode_dirty() is called.
- I have also dropped "ext4: introduce selective flushing in fast commit"
given that correctly implementing that would require more changes, and I
think they would be best done outside of this series.
- I added "ext4: introduce selective flushing in fast commit" as the last
patch in the series. While testing log group tests I found a few failures
which get fixed due to this patch.
V6 Cover Letter
---------------
This is the V6 of the patch series. This patch series contains fixes to a
bunch of kernel build warnings reported by Kernel Test Robot
(lkp@intel.com) and Dan Carpenter (dan.carpenter@linaro.org). Thank you!
V5 Cover Letter
---------------
This patch series supersedes the patch series "ext4: remove journal barrier
during fast commit" sent in July 2022[1]. Except following three patches
all the other patches are newly introduced in this series.
* ext4: convert i_fc_lock to spinlock
* ext4: for committing inode, make ext4_fc_track_inode wait
* ext4: drop i_fc_updates from inode fc info
This patchset does not introduce any on-disk format and / or user visible
API change. This patchset reworks fast-commit's commit path improve overall
performance of the commit path. Following optimizations have been added in
this series:
* Avoid having to lock the journal throughout the fast commit.
* Remove tracking of open handles per inode.
* Avoid issuing cache flushes when fast commits are contained within a
single FUA writes and there is no data that needs flushing.
* Temporarily lift committing thread's priority to match that of the
journal thread to reduce scheduling delays.
With the changes introduced in this patch series, now the commit path for
fast commits is as follows:
1. Lock the journal by calling jbd2_journal_lock_updates_no_rsv(). This
ensures that all the exsiting handles finish and no new handles can
start.
2. Mark all the fast commit eligible inodes as undergoing fast commit by
setting "EXT4_STATE_FC_COMMITTING" state.
3. Unlock the journal by calling jbd2_journal_unlock_updates. This allows
starting of new handles. If new handles try to start an update on any of
the inodes that are being committed, ext4_fc_track_inode() will block
until those inodes have finished the fast commit.
4. Submit data buffers of all the committing inodes.
5. Wait for [4] to complete.
6. Commit all the directory entry updates in the fast commit space.
7. Commit all the changed inodes in the fast commit space and clear
"EXT4_STATE_FC_COMMITTING" for all the inodes.
8. Write tail tag to ensure atomicity of commits.
(The above flow has been documented in the code as well)
I verified that the patch series introduces no regressions in "log" groups
when "fast_commit" feature is enabled.
Also, we have a paper on fast commits in USENIX ATC 2024 this year which
should become available on the website[2] in a few months.
[1] https://lwn.net/Articles/902022/
[2] https://www.usenix.org/conference/atc24
Harshad Shirwadkar (9):
ext4: convert i_fc_lock to spinlock
ext4: for committing inode, make ext4_fc_track_inode wait
ext4: mark inode dirty before grabbing i_data_sem in ext4_setattr
ext4: rework fast commit commit path
ext4: drop i_fc_updates from inode fc info
ext4: update code documentation
ext4: temporarily elevate commit thread priority
ext4: make fast commit ineligible on ext4_reserve_inode_write failure
ext4: hold s_fc_lock while during fast commit
fs/ext4/ext4.h | 18 +-
fs/ext4/fast_commit.c | 339 ++++++++++++++++++------------------
fs/ext4/fast_commit.h | 1 +
fs/ext4/inline.c | 3 +
fs/ext4/inode.c | 38 ++--
fs/ext4/super.c | 9 +-
fs/jbd2/journal.c | 2 -
include/trace/events/ext4.h | 7 +-
8 files changed, 214 insertions(+), 203 deletions(-)
--
2.46.0.184.g6999bdac58-goog
^ permalink raw reply [flat|nested] 22+ messages in thread* Re: [PATCH v7 0/9] Ext4 Fast Commit Performance Patchset
2024-08-18 4:03 ` [PATCH v7 0/9] Ext4 Fast Commit Performance Patchset Harshad Shirwadkar
@ 2024-08-18 4:49 ` harshad shirwadkar
0 siblings, 0 replies; 22+ messages in thread
From: harshad shirwadkar @ 2024-08-18 4:49 UTC (permalink / raw)
To: linux-ext4; +Cc: tytso, jack, harshads
On Sat, Aug 17, 2024 at 9:04 PM Harshad Shirwadkar
<harshadshirwadkar@gmail.com> wrote:
>
> V7 Cover Letter
> ---------------
>
> This is the V7 of the patch series. This patch series contains fixes to
> review comments by Jan Kara (<jack@suse.cz>). The main changes are as
> follows:
>
> - As discussed in V6 review, I have dropped "ext4: add nolock mode to
> ext4_map_blocks()" patch given that we now ensure that i_data_sem is
> always grabbed before ext4_mark_inode_dirty() is called.
>
> - I have also dropped "ext4: introduce selective flushing in fast commit"
> given that correctly implementing that would require more changes, and I
> think they would be best done outside of this series.
>
> - I added "ext4: introduce selective flushing in fast commit" as the last
> patch in the series. While testing log group tests I found a few failures
> which get fixed due to this patch.
Sorry that was a copy paste error. I meant to say I have added a new
patch - "ext4: hold s_fc_lock while during fast commit" (which is the
last patch in the series).
- Harshad
>
> V6 Cover Letter
> ---------------
>
> This is the V6 of the patch series. This patch series contains fixes to a
> bunch of kernel build warnings reported by Kernel Test Robot
> (lkp@intel.com) and Dan Carpenter (dan.carpenter@linaro.org). Thank you!
>
> V5 Cover Letter
> ---------------
>
> This patch series supersedes the patch series "ext4: remove journal barrier
> during fast commit" sent in July 2022[1]. Except following three patches
> all the other patches are newly introduced in this series.
>
> * ext4: convert i_fc_lock to spinlock
> * ext4: for committing inode, make ext4_fc_track_inode wait
> * ext4: drop i_fc_updates from inode fc info
>
> This patchset does not introduce any on-disk format and / or user visible
> API change. This patchset reworks fast-commit's commit path improve overall
> performance of the commit path. Following optimizations have been added in
> this series:
>
> * Avoid having to lock the journal throughout the fast commit.
> * Remove tracking of open handles per inode.
> * Avoid issuing cache flushes when fast commits are contained within a
> single FUA writes and there is no data that needs flushing.
> * Temporarily lift committing thread's priority to match that of the
> journal thread to reduce scheduling delays.
>
> With the changes introduced in this patch series, now the commit path for
> fast commits is as follows:
>
> 1. Lock the journal by calling jbd2_journal_lock_updates_no_rsv(). This
> ensures that all the exsiting handles finish and no new handles can
> start.
> 2. Mark all the fast commit eligible inodes as undergoing fast commit by
> setting "EXT4_STATE_FC_COMMITTING" state.
> 3. Unlock the journal by calling jbd2_journal_unlock_updates. This allows
> starting of new handles. If new handles try to start an update on any of
> the inodes that are being committed, ext4_fc_track_inode() will block
> until those inodes have finished the fast commit.
> 4. Submit data buffers of all the committing inodes.
> 5. Wait for [4] to complete.
> 6. Commit all the directory entry updates in the fast commit space.
> 7. Commit all the changed inodes in the fast commit space and clear
> "EXT4_STATE_FC_COMMITTING" for all the inodes.
> 8. Write tail tag to ensure atomicity of commits.
>
> (The above flow has been documented in the code as well)
>
> I verified that the patch series introduces no regressions in "log" groups
> when "fast_commit" feature is enabled.
>
> Also, we have a paper on fast commits in USENIX ATC 2024 this year which
> should become available on the website[2] in a few months.
>
> [1] https://lwn.net/Articles/902022/
> [2] https://www.usenix.org/conference/atc24
>
> Harshad Shirwadkar (9):
> ext4: convert i_fc_lock to spinlock
> ext4: for committing inode, make ext4_fc_track_inode wait
> ext4: mark inode dirty before grabbing i_data_sem in ext4_setattr
> ext4: rework fast commit commit path
> ext4: drop i_fc_updates from inode fc info
> ext4: update code documentation
> ext4: temporarily elevate commit thread priority
> ext4: make fast commit ineligible on ext4_reserve_inode_write failure
> ext4: hold s_fc_lock while during fast commit
>
> fs/ext4/ext4.h | 18 +-
> fs/ext4/fast_commit.c | 339 ++++++++++++++++++------------------
> fs/ext4/fast_commit.h | 1 +
> fs/ext4/inline.c | 3 +
> fs/ext4/inode.c | 38 ++--
> fs/ext4/super.c | 9 +-
> fs/jbd2/journal.c | 2 -
> include/trace/events/ext4.h | 7 +-
> 8 files changed, 214 insertions(+), 203 deletions(-)
>
> --
> 2.46.0.184.g6999bdac58-goog
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH v7 1/9] ext4: convert i_fc_lock to spinlock
2024-08-18 4:03 [PATCH] ext4: convert i_fc_lock to spinlock Harshad Shirwadkar
2024-08-18 4:03 ` [PATCH v7 0/9] Ext4 Fast Commit Performance Patchset Harshad Shirwadkar
@ 2024-08-18 4:03 ` Harshad Shirwadkar
2024-08-18 4:03 ` [PATCH v7 2/9] ext4: for committing inode, make ext4_fc_track_inode wait Harshad Shirwadkar
` (7 subsequent siblings)
9 siblings, 0 replies; 22+ messages in thread
From: Harshad Shirwadkar @ 2024-08-18 4:03 UTC (permalink / raw)
To: linux-ext4; +Cc: tytso, jack, harshads, Harshad Shirwadkar
Convert ext4_inode_info->i_fc_lock to spinlock to avoid sleeping
in invalid contexts.
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
---
fs/ext4/ext4.h | 7 +++++--
fs/ext4/fast_commit.c | 24 +++++++++++-------------
fs/ext4/super.c | 2 +-
3 files changed, 17 insertions(+), 16 deletions(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 08acd1522..51ae1785a 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1062,8 +1062,11 @@ struct ext4_inode_info {
/* Fast commit wait queue for this inode */
wait_queue_head_t i_fc_wait;
- /* Protect concurrent accesses on i_fc_lblk_start, i_fc_lblk_len */
- struct mutex i_fc_lock;
+ /*
+ * Protect concurrent accesses on i_fc_lblk_start, i_fc_lblk_len
+ * and inode's EXT4_FC_STATE_COMMITTING state bit.
+ */
+ spinlock_t i_fc_lock;
/*
* i_disksize keeps track of what the inode size is ON DISK, not
diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 3926a05ec..8731728cc 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -382,7 +382,7 @@ static int ext4_fc_track_template(
int ret;
tid = handle->h_transaction->t_tid;
- mutex_lock(&ei->i_fc_lock);
+ spin_lock(&ei->i_fc_lock);
if (tid == ei->i_sync_tid) {
update = true;
} else {
@@ -390,7 +390,7 @@ static int ext4_fc_track_template(
ei->i_sync_tid = tid;
}
ret = __fc_track_fn(inode, args, update);
- mutex_unlock(&ei->i_fc_lock);
+ spin_unlock(&ei->i_fc_lock);
if (!enqueue)
return ret;
@@ -424,19 +424,19 @@ static int __track_dentry_update(struct inode *inode, void *arg, bool update)
struct super_block *sb = inode->i_sb;
struct ext4_sb_info *sbi = EXT4_SB(sb);
- mutex_unlock(&ei->i_fc_lock);
+ spin_unlock(&ei->i_fc_lock);
if (IS_ENCRYPTED(dir)) {
ext4_fc_mark_ineligible(sb, EXT4_FC_REASON_ENCRYPTED_FILENAME,
NULL);
- mutex_lock(&ei->i_fc_lock);
+ spin_lock(&ei->i_fc_lock);
return -EOPNOTSUPP;
}
node = kmem_cache_alloc(ext4_fc_dentry_cachep, GFP_NOFS);
if (!node) {
ext4_fc_mark_ineligible(sb, EXT4_FC_REASON_NOMEM, NULL);
- mutex_lock(&ei->i_fc_lock);
+ spin_lock(&ei->i_fc_lock);
return -ENOMEM;
}
@@ -448,7 +448,7 @@ static int __track_dentry_update(struct inode *inode, void *arg, bool update)
if (!node->fcd_name.name) {
kmem_cache_free(ext4_fc_dentry_cachep, node);
ext4_fc_mark_ineligible(sb, EXT4_FC_REASON_NOMEM, NULL);
- mutex_lock(&ei->i_fc_lock);
+ spin_lock(&ei->i_fc_lock);
return -ENOMEM;
}
memcpy((u8 *)node->fcd_name.name, dentry->d_name.name,
@@ -482,7 +482,7 @@ static int __track_dentry_update(struct inode *inode, void *arg, bool update)
list_add_tail(&node->fcd_dilist, &ei->i_fc_dilist);
}
spin_unlock(&sbi->s_fc_lock);
- mutex_lock(&ei->i_fc_lock);
+ spin_lock(&ei->i_fc_lock);
return 0;
}
@@ -614,10 +614,8 @@ static int __track_range(struct inode *inode, void *arg, bool update)
struct __track_range_args *__arg =
(struct __track_range_args *)arg;
- if (inode->i_ino < EXT4_FIRST_INO(inode->i_sb)) {
- ext4_debug("Special inode %ld being modified\n", inode->i_ino);
+ if (inode->i_ino < EXT4_FIRST_INO(inode->i_sb))
return -ECANCELED;
- }
oldstart = ei->i_fc_lblk_start;
@@ -902,15 +900,15 @@ static int ext4_fc_write_inode_data(struct inode *inode, u32 *crc)
struct ext4_extent *ex;
int ret;
- mutex_lock(&ei->i_fc_lock);
+ spin_lock(&ei->i_fc_lock);
if (ei->i_fc_lblk_len == 0) {
- mutex_unlock(&ei->i_fc_lock);
+ spin_unlock(&ei->i_fc_lock);
return 0;
}
old_blk_size = ei->i_fc_lblk_start;
new_blk_size = ei->i_fc_lblk_start + ei->i_fc_lblk_len - 1;
ei->i_fc_lblk_len = 0;
- mutex_unlock(&ei->i_fc_lock);
+ spin_unlock(&ei->i_fc_lock);
cur_lblk_off = old_blk_size;
ext4_debug("will try writing %d to %d for inode %ld\n",
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index e72145c4a..d37944839 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1423,7 +1423,7 @@ static struct inode *ext4_alloc_inode(struct super_block *sb)
atomic_set(&ei->i_unwritten, 0);
INIT_WORK(&ei->i_rsv_conversion_work, ext4_end_io_rsv_work);
ext4_fc_init_inode(&ei->vfs_inode);
- mutex_init(&ei->i_fc_lock);
+ spin_lock_init(&ei->i_fc_lock);
return &ei->vfs_inode;
}
--
2.46.0.184.g6999bdac58-goog
^ permalink raw reply related [flat|nested] 22+ messages in thread* [PATCH v7 2/9] ext4: for committing inode, make ext4_fc_track_inode wait
2024-08-18 4:03 [PATCH] ext4: convert i_fc_lock to spinlock Harshad Shirwadkar
2024-08-18 4:03 ` [PATCH v7 0/9] Ext4 Fast Commit Performance Patchset Harshad Shirwadkar
2024-08-18 4:03 ` [PATCH v7 1/9] ext4: convert i_fc_lock to spinlock Harshad Shirwadkar
@ 2024-08-18 4:03 ` Harshad Shirwadkar
2024-12-12 22:00 ` Jan Kara
2024-08-18 4:03 ` [PATCH v7 3/9] ext4: mark inode dirty before grabbing i_data_sem in ext4_setattr Harshad Shirwadkar
` (6 subsequent siblings)
9 siblings, 1 reply; 22+ messages in thread
From: Harshad Shirwadkar @ 2024-08-18 4:03 UTC (permalink / raw)
To: linux-ext4; +Cc: tytso, jack, harshads, Harshad Shirwadkar
If the inode that's being requested to track using ext4_fc_track_inode
is being committed, then wait until the inode finishes the
commit. Also, add calls to ext4_fc_track_inode at the right places.
With this patch, now calling ext4_reserve_inode_write() results in
inode being tracked for next fast commit. A subtle lock ordering
requirement with i_data_sem (which is documented in the code) requires
that ext4_fc_track_inode() be called before grabbing i_data_sem. So,
this patch also adds explicit ext4_fc_track_inode() calls in places
where i_data_sem grabbed.
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
---
fs/ext4/fast_commit.c | 33 +++++++++++++++++++++++++++++++++
fs/ext4/inline.c | 3 +++
fs/ext4/inode.c | 4 ++++
3 files changed, 40 insertions(+)
diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 8731728cc..dfa999913 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -12,6 +12,7 @@
#include "ext4_extents.h"
#include "mballoc.h"
+#include <linux/lockdep.h>
/*
* Ext4 Fast Commits
* -----------------
@@ -581,6 +582,8 @@ static int __track_inode(struct inode *inode, void *arg, bool update)
void ext4_fc_track_inode(handle_t *handle, struct inode *inode)
{
+ struct ext4_inode_info *ei = EXT4_I(inode);
+ wait_queue_head_t *wq;
int ret;
if (S_ISDIR(inode->i_mode))
@@ -598,6 +601,36 @@ void ext4_fc_track_inode(handle_t *handle, struct inode *inode)
if (ext4_test_mount_flag(inode->i_sb, EXT4_MF_FC_INELIGIBLE))
return;
+ if (!list_empty(&ei->i_fc_list))
+ return;
+
+#ifdef CONFIG_LOCKDEP
+ /*
+ * If we come here, we may sleep while waiting for the inode to
+ * commit. We shouldn't be holding i_data_sem when we go to sleep since
+ * the commit path needs to grab the lock while committing the inode.
+ */
+ WARN_ON(lockdep_is_held(&ei->i_data_sem));
+#endif
+
+ while (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING)) {
+#if (BITS_PER_LONG < 64)
+ DEFINE_WAIT_BIT(wait, &ei->i_state_flags,
+ EXT4_STATE_FC_COMMITTING);
+ wq = bit_waitqueue(&ei->i_state_flags,
+ EXT4_STATE_FC_COMMITTING);
+#else
+ DEFINE_WAIT_BIT(wait, &ei->i_flags,
+ EXT4_STATE_FC_COMMITTING);
+ wq = bit_waitqueue(&ei->i_flags,
+ EXT4_STATE_FC_COMMITTING);
+#endif
+ prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE);
+ if (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING))
+ schedule();
+ finish_wait(wq, &wait.wq_entry);
+ }
+
ret = ext4_fc_track_template(handle, inode, __track_inode, NULL, 1);
trace_ext4_fc_track_inode(handle, inode, ret);
}
diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index e7a09a998..534a5c1bf 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -596,6 +596,7 @@ static int ext4_convert_inline_data_to_extent(struct address_space *mapping,
goto out;
}
+ ext4_fc_track_inode(handle, inode);
ret = ext4_destroy_inline_data_nolock(handle, inode);
if (ret)
goto out;
@@ -696,6 +697,7 @@ int ext4_try_to_write_inline_data(struct address_space *mapping,
goto convert;
}
+ ext4_fc_track_inode(handle, inode);
ret = ext4_journal_get_write_access(handle, inode->i_sb, iloc.bh,
EXT4_JTR_NONE);
if (ret)
@@ -948,6 +950,7 @@ int ext4_da_write_inline_data_begin(struct address_space *mapping,
if (ret < 0)
goto out_release_page;
}
+ ext4_fc_track_inode(handle, inode);
ret = ext4_journal_get_write_access(handle, inode->i_sb, iloc.bh,
EXT4_JTR_NONE);
if (ret)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 941c1c0d5..e11f00ff8 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -636,6 +636,7 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
*/
map->m_flags &= ~EXT4_MAP_FLAGS;
+ ext4_fc_track_inode(handle, inode);
/*
* New blocks allocate and/or writing to unwritten extent
* will possibly result in updating i_data, so we take
@@ -4057,6 +4058,7 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
if (stop_block > first_block) {
ext4_lblk_t hole_len = stop_block - first_block;
+ ext4_fc_track_inode(handle, inode);
down_write(&EXT4_I(inode)->i_data_sem);
ext4_discard_preallocations(inode);
@@ -4210,6 +4212,7 @@ int ext4_truncate(struct inode *inode)
if (err)
goto out_stop;
+ ext4_fc_track_inode(handle, inode);
down_write(&EXT4_I(inode)->i_data_sem);
ext4_discard_preallocations(inode);
@@ -5806,6 +5809,7 @@ ext4_reserve_inode_write(handle_t *handle, struct inode *inode,
brelse(iloc->bh);
iloc->bh = NULL;
}
+ ext4_fc_track_inode(handle, inode);
}
ext4_std_error(inode->i_sb, err);
return err;
--
2.46.0.184.g6999bdac58-goog
^ permalink raw reply related [flat|nested] 22+ messages in thread* Re: [PATCH v7 2/9] ext4: for committing inode, make ext4_fc_track_inode wait
2024-08-18 4:03 ` [PATCH v7 2/9] ext4: for committing inode, make ext4_fc_track_inode wait Harshad Shirwadkar
@ 2024-12-12 22:00 ` Jan Kara
2024-12-13 15:10 ` Jan Kara
2025-04-14 16:57 ` harshad shirwadkar
0 siblings, 2 replies; 22+ messages in thread
From: Jan Kara @ 2024-12-12 22:00 UTC (permalink / raw)
To: Harshad Shirwadkar; +Cc: linux-ext4, tytso, jack, harshads
On Sun 18-08-24 04:03:49, Harshad Shirwadkar wrote:
> If the inode that's being requested to track using ext4_fc_track_inode
> is being committed, then wait until the inode finishes the
> commit. Also, add calls to ext4_fc_track_inode at the right places.
>
> With this patch, now calling ext4_reserve_inode_write() results in
> inode being tracked for next fast commit. A subtle lock ordering
> requirement with i_data_sem (which is documented in the code) requires
> that ext4_fc_track_inode() be called before grabbing i_data_sem. So,
> this patch also adds explicit ext4_fc_track_inode() calls in places
> where i_data_sem grabbed.
>
> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Sorry for the huge delay! Some comments are below:
> @@ -598,6 +601,36 @@ void ext4_fc_track_inode(handle_t *handle, struct inode *inode)
> if (ext4_test_mount_flag(inode->i_sb, EXT4_MF_FC_INELIGIBLE))
> return;
>
> + if (!list_empty(&ei->i_fc_list))
> + return;
> +
> +#ifdef CONFIG_LOCKDEP
> + /*
> + * If we come here, we may sleep while waiting for the inode to
> + * commit. We shouldn't be holding i_data_sem when we go to sleep since
> + * the commit path needs to grab the lock while committing the inode.
> + */
> + WARN_ON(lockdep_is_held(&ei->i_data_sem));
> +#endif
We have lockdep_assert_not_held() for this so you can avoid the ifdef.
> +
> + while (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING)) {
> +#if (BITS_PER_LONG < 64)
> + DEFINE_WAIT_BIT(wait, &ei->i_state_flags,
> + EXT4_STATE_FC_COMMITTING);
> + wq = bit_waitqueue(&ei->i_state_flags,
> + EXT4_STATE_FC_COMMITTING);
> +#else
> + DEFINE_WAIT_BIT(wait, &ei->i_flags,
> + EXT4_STATE_FC_COMMITTING);
> + wq = bit_waitqueue(&ei->i_flags,
> + EXT4_STATE_FC_COMMITTING);
> +#endif
> + prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE);
> + if (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING))
> + schedule();
> + finish_wait(wq, &wait.wq_entry);
> + }
But what protects us from fastcommit setting EXT4_STATE_FC_COMMITTING at
this moment before we call ext4_fc_track_template(). Don't you need
to grab sbi->s_fc_lock and hold it until the inode is attached to the
fastcommit?
I might be missing something so some documentation (like a comment here)
would be nice to explain what are you actually trying to achieve with the
waiting...
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 22+ messages in thread* Re: [PATCH v7 2/9] ext4: for committing inode, make ext4_fc_track_inode wait
2024-12-12 22:00 ` Jan Kara
@ 2024-12-13 15:10 ` Jan Kara
2025-04-14 16:57 ` harshad shirwadkar
1 sibling, 0 replies; 22+ messages in thread
From: Jan Kara @ 2024-12-13 15:10 UTC (permalink / raw)
To: Harshad Shirwadkar; +Cc: linux-ext4, tytso, jack, harshads
On Thu 12-12-24 23:00:43, Jan Kara wrote:
> On Sun 18-08-24 04:03:49, Harshad Shirwadkar wrote:
> > +
> > + while (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING)) {
> > +#if (BITS_PER_LONG < 64)
> > + DEFINE_WAIT_BIT(wait, &ei->i_state_flags,
> > + EXT4_STATE_FC_COMMITTING);
> > + wq = bit_waitqueue(&ei->i_state_flags,
> > + EXT4_STATE_FC_COMMITTING);
> > +#else
> > + DEFINE_WAIT_BIT(wait, &ei->i_flags,
> > + EXT4_STATE_FC_COMMITTING);
> > + wq = bit_waitqueue(&ei->i_flags,
> > + EXT4_STATE_FC_COMMITTING);
> > +#endif
> > + prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE);
> > + if (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING))
> > + schedule();
> > + finish_wait(wq, &wait.wq_entry);
> > + }
>
> But what protects us from fastcommit setting EXT4_STATE_FC_COMMITTING at
> this moment before we call ext4_fc_track_template(). Don't you need
> to grab sbi->s_fc_lock and hold it until the inode is attached to the
> fastcommit?
>
> I might be missing something so some documentation (like a comment here)
> would be nice to explain what are you actually trying to achieve with the
> waiting...
Ah, I see now. In patch 4 you make setting of EXT4_STATE_FC_COMMITTING
protected by journal locking so holding a handle open protects us. Good.
But please comment about it in the changelog.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 22+ messages in thread* Re: [PATCH v7 2/9] ext4: for committing inode, make ext4_fc_track_inode wait
2024-12-12 22:00 ` Jan Kara
2024-12-13 15:10 ` Jan Kara
@ 2025-04-14 16:57 ` harshad shirwadkar
1 sibling, 0 replies; 22+ messages in thread
From: harshad shirwadkar @ 2025-04-14 16:57 UTC (permalink / raw)
To: Jan Kara; +Cc: linux-ext4, tytso, harshads
On Thu, Dec 12, 2024 at 2:00 PM Jan Kara <jack@suse.cz> wrote:
>
> On Sun 18-08-24 04:03:49, Harshad Shirwadkar wrote:
> > If the inode that's being requested to track using ext4_fc_track_inode
> > is being committed, then wait until the inode finishes the
> > commit. Also, add calls to ext4_fc_track_inode at the right places.
> >
> > With this patch, now calling ext4_reserve_inode_write() results in
> > inode being tracked for next fast commit. A subtle lock ordering
> > requirement with i_data_sem (which is documented in the code) requires
> > that ext4_fc_track_inode() be called before grabbing i_data_sem. So,
> > this patch also adds explicit ext4_fc_track_inode() calls in places
> > where i_data_sem grabbed.
> >
> > Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
>
> Sorry for the huge delay! Some comments are below:
Hi Jan,
Sorry for a huge delay from my end as well. Thank you for your
comments on V7 of this patchset. I just sent out a V8 of this patchset
where I have handled all of your comments on this and subsequent
patches in the series.
Thank you,
Harshad
>
> > @@ -598,6 +601,36 @@ void ext4_fc_track_inode(handle_t *handle, struct inode *inode)
> > if (ext4_test_mount_flag(inode->i_sb, EXT4_MF_FC_INELIGIBLE))
> > return;
> >
> > + if (!list_empty(&ei->i_fc_list))
> > + return;
> > +
> > +#ifdef CONFIG_LOCKDEP
> > + /*
> > + * If we come here, we may sleep while waiting for the inode to
> > + * commit. We shouldn't be holding i_data_sem when we go to sleep since
> > + * the commit path needs to grab the lock while committing the inode.
> > + */
> > + WARN_ON(lockdep_is_held(&ei->i_data_sem));
> > +#endif
>
> We have lockdep_assert_not_held() for this so you can avoid the ifdef.
>
> > +
> > + while (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING)) {
> > +#if (BITS_PER_LONG < 64)
> > + DEFINE_WAIT_BIT(wait, &ei->i_state_flags,
> > + EXT4_STATE_FC_COMMITTING);
> > + wq = bit_waitqueue(&ei->i_state_flags,
> > + EXT4_STATE_FC_COMMITTING);
> > +#else
> > + DEFINE_WAIT_BIT(wait, &ei->i_flags,
> > + EXT4_STATE_FC_COMMITTING);
> > + wq = bit_waitqueue(&ei->i_flags,
> > + EXT4_STATE_FC_COMMITTING);
> > +#endif
> > + prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE);
> > + if (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING))
> > + schedule();
> > + finish_wait(wq, &wait.wq_entry);
> > + }
>
> But what protects us from fastcommit setting EXT4_STATE_FC_COMMITTING at
> this moment before we call ext4_fc_track_template(). Don't you need
> to grab sbi->s_fc_lock and hold it until the inode is attached to the
> fastcommit?
>
> I might be missing something so some documentation (like a comment here)
> would be nice to explain what are you actually trying to achieve with the
> waiting...
>
> Honza
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH v7 3/9] ext4: mark inode dirty before grabbing i_data_sem in ext4_setattr
2024-08-18 4:03 [PATCH] ext4: convert i_fc_lock to spinlock Harshad Shirwadkar
` (2 preceding siblings ...)
2024-08-18 4:03 ` [PATCH v7 2/9] ext4: for committing inode, make ext4_fc_track_inode wait Harshad Shirwadkar
@ 2024-08-18 4:03 ` Harshad Shirwadkar
2024-12-12 21:57 ` Jan Kara
2024-08-18 4:03 ` [PATCH v7 4/9] ext4: rework fast commit commit path Harshad Shirwadkar
` (5 subsequent siblings)
9 siblings, 1 reply; 22+ messages in thread
From: Harshad Shirwadkar @ 2024-08-18 4:03 UTC (permalink / raw)
To: linux-ext4; +Cc: tytso, jack, harshads, Harshad Shirwadkar
Mark inode dirty first and then grab i_data_sem in ext4_setattr().
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
---
fs/ext4/inode.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index e11f00ff8..c82eba178 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5489,12 +5489,13 @@ int ext4_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
(attr->ia_size > 0 ? attr->ia_size - 1 : 0) >>
inode->i_sb->s_blocksize_bits);
- down_write(&EXT4_I(inode)->i_data_sem);
- old_disksize = EXT4_I(inode)->i_disksize;
- EXT4_I(inode)->i_disksize = attr->ia_size;
rc = ext4_mark_inode_dirty(handle, inode);
if (!error)
error = rc;
+ down_write(&EXT4_I(inode)->i_data_sem);
+ old_disksize = EXT4_I(inode)->i_disksize;
+ EXT4_I(inode)->i_disksize = attr->ia_size;
+
/*
* We have to update i_size under i_data_sem together
* with i_disksize to avoid races with writeback code
--
2.46.0.184.g6999bdac58-goog
^ permalink raw reply related [flat|nested] 22+ messages in thread* Re: [PATCH v7 3/9] ext4: mark inode dirty before grabbing i_data_sem in ext4_setattr
2024-08-18 4:03 ` [PATCH v7 3/9] ext4: mark inode dirty before grabbing i_data_sem in ext4_setattr Harshad Shirwadkar
@ 2024-12-12 21:57 ` Jan Kara
0 siblings, 0 replies; 22+ messages in thread
From: Jan Kara @ 2024-12-12 21:57 UTC (permalink / raw)
To: Harshad Shirwadkar; +Cc: linux-ext4, tytso, jack, harshads
On Sun 18-08-24 04:03:50, Harshad Shirwadkar wrote:
> Mark inode dirty first and then grab i_data_sem in ext4_setattr().
>
> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
...
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index e11f00ff8..c82eba178 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -5489,12 +5489,13 @@ int ext4_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
> (attr->ia_size > 0 ? attr->ia_size - 1 : 0) >>
> inode->i_sb->s_blocksize_bits);
>
> - down_write(&EXT4_I(inode)->i_data_sem);
> - old_disksize = EXT4_I(inode)->i_disksize;
> - EXT4_I(inode)->i_disksize = attr->ia_size;
> rc = ext4_mark_inode_dirty(handle, inode);
> if (!error)
> error = rc;
> + down_write(&EXT4_I(inode)->i_data_sem);
> + old_disksize = EXT4_I(inode)->i_disksize;
> + EXT4_I(inode)->i_disksize = attr->ia_size;
Well, but this doesn't really work, does it? If you modify inode metadata
*after* calling ext4_mark_inode_dirty() the new metadata will not be copied
into the inode buffer and thus will not be written out... You can move the
ext4_mark_inode_dirty() call to the point where i_data_sem is already
released though. That is perfectly fine.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH v7 4/9] ext4: rework fast commit commit path
2024-08-18 4:03 [PATCH] ext4: convert i_fc_lock to spinlock Harshad Shirwadkar
` (3 preceding siblings ...)
2024-08-18 4:03 ` [PATCH v7 3/9] ext4: mark inode dirty before grabbing i_data_sem in ext4_setattr Harshad Shirwadkar
@ 2024-08-18 4:03 ` Harshad Shirwadkar
2024-12-13 15:12 ` Jan Kara
2024-08-18 4:03 ` [PATCH v7 5/9] ext4: drop i_fc_updates from inode fc info Harshad Shirwadkar
` (4 subsequent siblings)
9 siblings, 1 reply; 22+ messages in thread
From: Harshad Shirwadkar @ 2024-08-18 4:03 UTC (permalink / raw)
To: linux-ext4; +Cc: tytso, jack, harshads, Harshad Shirwadkar
This patch reworks fast commit's commit path to remove locking the
journal for the entire duration of a fast commit. Instead, we only lock
the journal while marking all the eligible inodes as "committing". This
allows handles to make progress in parallel with the fast commit.
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
---
fs/ext4/fast_commit.c | 74 ++++++++++++++++++++++++++++---------------
fs/jbd2/journal.c | 2 --
2 files changed, 49 insertions(+), 27 deletions(-)
diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index dfa999913..7a35234ce 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -291,20 +291,30 @@ void ext4_fc_del(struct inode *inode)
if (ext4_fc_disabled(inode->i_sb))
return;
-restart:
spin_lock(&EXT4_SB(inode->i_sb)->s_fc_lock);
if (list_empty(&ei->i_fc_list) && list_empty(&ei->i_fc_dilist)) {
spin_unlock(&EXT4_SB(inode->i_sb)->s_fc_lock);
return;
}
- if (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING)) {
- ext4_fc_wait_committing_inode(inode);
- goto restart;
- }
-
- if (!list_empty(&ei->i_fc_list))
- list_del_init(&ei->i_fc_list);
+ /*
+ * Since ext4_fc_del is called from ext4_evict_inode while having a
+ * handle open, there is no need for us to wait here even if a fast
+ * commit is going on. That is because, if this inode is being
+ * committed, ext4_mark_inode_dirty would have waited for inode commit
+ * operation to finish before we come here. So, by the time we come
+ * here, inode's EXT4_STATE_FC_COMMITTING would have been cleared. So,
+ * we shouldn't see EXT4_STATE_FC_COMMITTING to be set on this inode
+ * here.
+ *
+ * We may come here without any handles open in the "no_delete" case of
+ * ext4_evict_inode as well. However, if that happens, we first mark the
+ * file system as fast commit ineligible anyway. So, even in that case,
+ * it is okay to remove the inode from the fc list.
+ */
+ WARN_ON(ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING)
+ && !ext4_test_mount_flag(inode->i_sb, EXT4_MF_FC_INELIGIBLE));
+ list_del_init(&ei->i_fc_list);
/*
* Since this inode is getting removed, let's also remove all FC
@@ -327,8 +337,6 @@ void ext4_fc_del(struct inode *inode)
fc_dentry->fcd_name.len > DNAME_INLINE_LEN)
kfree(fc_dentry->fcd_name.name);
kmem_cache_free(ext4_fc_dentry_cachep, fc_dentry);
-
- return;
}
/*
@@ -1004,19 +1012,6 @@ static int ext4_fc_submit_inode_data_all(journal_t *journal)
spin_lock(&sbi->s_fc_lock);
list_for_each_entry(ei, &sbi->s_fc_q[FC_Q_MAIN], i_fc_list) {
- ext4_set_inode_state(&ei->vfs_inode, EXT4_STATE_FC_COMMITTING);
- while (atomic_read(&ei->i_fc_updates)) {
- DEFINE_WAIT(wait);
-
- prepare_to_wait(&ei->i_fc_wait, &wait,
- TASK_UNINTERRUPTIBLE);
- if (atomic_read(&ei->i_fc_updates)) {
- spin_unlock(&sbi->s_fc_lock);
- schedule();
- spin_lock(&sbi->s_fc_lock);
- }
- finish_wait(&ei->i_fc_wait, &wait);
- }
spin_unlock(&sbi->s_fc_lock);
ret = jbd2_submit_inode_data(journal, ei->jinode);
if (ret)
@@ -1129,6 +1124,19 @@ static int ext4_fc_perform_commit(journal_t *journal)
int ret = 0;
u32 crc = 0;
+ /*
+ * Wait for all the handles of the current transaction to complete
+ * and then lock the journal.
+ */
+ jbd2_journal_lock_updates(journal);
+ spin_lock(&sbi->s_fc_lock);
+ list_for_each_entry(iter, &sbi->s_fc_q[FC_Q_MAIN], i_fc_list) {
+ ext4_set_inode_state(&iter->vfs_inode,
+ EXT4_STATE_FC_COMMITTING);
+ }
+ spin_unlock(&sbi->s_fc_lock);
+ jbd2_journal_unlock_updates(journal);
+
ret = ext4_fc_submit_inode_data_all(journal);
if (ret)
return ret;
@@ -1179,6 +1187,18 @@ static int ext4_fc_perform_commit(journal_t *journal)
ret = ext4_fc_write_inode(inode, &crc);
if (ret)
goto out;
+ ext4_clear_inode_state(inode, EXT4_STATE_FC_COMMITTING);
+ /*
+ * Make sure clearing of EXT4_STATE_FC_COMMITTING is
+ * visible before we send the wakeup. Pairs with implicit
+ * barrier in prepare_to_wait() in ext4_fc_track_inode().
+ */
+ smp_mb();
+#if (BITS_PER_LONG < 64)
+ wake_up_bit(&iter->i_state_flags, EXT4_STATE_FC_COMMITTING);
+#else
+ wake_up_bit(&iter->i_flags, EXT4_STATE_FC_COMMITTING);
+#endif
spin_lock(&sbi->s_fc_lock);
}
spin_unlock(&sbi->s_fc_lock);
@@ -1316,13 +1336,17 @@ static void ext4_fc_cleanup(journal_t *journal, int full, tid_t tid)
spin_lock(&sbi->s_fc_lock);
list_for_each_entry_safe(iter, iter_n, &sbi->s_fc_q[FC_Q_MAIN],
i_fc_list) {
- list_del_init(&iter->i_fc_list);
ext4_clear_inode_state(&iter->vfs_inode,
EXT4_STATE_FC_COMMITTING);
if (tid_geq(tid, iter->i_sync_tid))
ext4_fc_reset_inode(&iter->vfs_inode);
- /* Make sure EXT4_STATE_FC_COMMITTING bit is clear */
+ /*
+ * Make sure clearing of EXT4_STATE_FC_COMMITTING is
+ * visible before we send the wakeup. Pairs with implicit
+ * barrier in prepare_to_wait() in ext4_fc_track_inode().
+ */
smp_mb();
+ list_del_init(&iter->i_fc_list);
#if (BITS_PER_LONG < 64)
wake_up_bit(&iter->i_state_flags, EXT4_STATE_FC_COMMITTING);
#else
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 1ebf2393b..ecd70b506 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -728,7 +728,6 @@ int jbd2_fc_begin_commit(journal_t *journal, tid_t tid)
}
journal->j_flags |= JBD2_FAST_COMMIT_ONGOING;
write_unlock(&journal->j_state_lock);
- jbd2_journal_lock_updates(journal);
return 0;
}
@@ -740,7 +739,6 @@ EXPORT_SYMBOL(jbd2_fc_begin_commit);
*/
static int __jbd2_fc_end_commit(journal_t *journal, tid_t tid, bool fallback)
{
- jbd2_journal_unlock_updates(journal);
if (journal->j_fc_cleanup_callback)
journal->j_fc_cleanup_callback(journal, 0, tid);
write_lock(&journal->j_state_lock);
--
2.46.0.184.g6999bdac58-goog
^ permalink raw reply related [flat|nested] 22+ messages in thread* Re: [PATCH v7 4/9] ext4: rework fast commit commit path
2024-08-18 4:03 ` [PATCH v7 4/9] ext4: rework fast commit commit path Harshad Shirwadkar
@ 2024-12-13 15:12 ` Jan Kara
0 siblings, 0 replies; 22+ messages in thread
From: Jan Kara @ 2024-12-13 15:12 UTC (permalink / raw)
To: Harshad Shirwadkar; +Cc: linux-ext4, tytso, jack, harshads
On Sun 18-08-24 04:03:51, Harshad Shirwadkar wrote:
> This patch reworks fast commit's commit path to remove locking the
> journal for the entire duration of a fast commit. Instead, we only lock
> the journal while marking all the eligible inodes as "committing". This
> allows handles to make progress in parallel with the fast commit.
>
> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> fs/ext4/fast_commit.c | 74 ++++++++++++++++++++++++++++---------------
> fs/jbd2/journal.c | 2 --
> 2 files changed, 49 insertions(+), 27 deletions(-)
>
> diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
> index dfa999913..7a35234ce 100644
> --- a/fs/ext4/fast_commit.c
> +++ b/fs/ext4/fast_commit.c
> @@ -291,20 +291,30 @@ void ext4_fc_del(struct inode *inode)
> if (ext4_fc_disabled(inode->i_sb))
> return;
>
> -restart:
> spin_lock(&EXT4_SB(inode->i_sb)->s_fc_lock);
> if (list_empty(&ei->i_fc_list) && list_empty(&ei->i_fc_dilist)) {
> spin_unlock(&EXT4_SB(inode->i_sb)->s_fc_lock);
> return;
> }
>
> - if (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING)) {
> - ext4_fc_wait_committing_inode(inode);
> - goto restart;
> - }
> -
> - if (!list_empty(&ei->i_fc_list))
> - list_del_init(&ei->i_fc_list);
> + /*
> + * Since ext4_fc_del is called from ext4_evict_inode while having a
> + * handle open, there is no need for us to wait here even if a fast
> + * commit is going on. That is because, if this inode is being
> + * committed, ext4_mark_inode_dirty would have waited for inode commit
> + * operation to finish before we come here. So, by the time we come
> + * here, inode's EXT4_STATE_FC_COMMITTING would have been cleared. So,
> + * we shouldn't see EXT4_STATE_FC_COMMITTING to be set on this inode
> + * here.
> + *
> + * We may come here without any handles open in the "no_delete" case of
> + * ext4_evict_inode as well. However, if that happens, we first mark the
> + * file system as fast commit ineligible anyway. So, even in that case,
> + * it is okay to remove the inode from the fc list.
> + */
> + WARN_ON(ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING)
> + && !ext4_test_mount_flag(inode->i_sb, EXT4_MF_FC_INELIGIBLE));
> + list_del_init(&ei->i_fc_list);
>
> /*
> * Since this inode is getting removed, let's also remove all FC
> @@ -327,8 +337,6 @@ void ext4_fc_del(struct inode *inode)
> fc_dentry->fcd_name.len > DNAME_INLINE_LEN)
> kfree(fc_dentry->fcd_name.name);
> kmem_cache_free(ext4_fc_dentry_cachep, fc_dentry);
> -
> - return;
> }
>
> /*
> @@ -1004,19 +1012,6 @@ static int ext4_fc_submit_inode_data_all(journal_t *journal)
>
> spin_lock(&sbi->s_fc_lock);
> list_for_each_entry(ei, &sbi->s_fc_q[FC_Q_MAIN], i_fc_list) {
> - ext4_set_inode_state(&ei->vfs_inode, EXT4_STATE_FC_COMMITTING);
> - while (atomic_read(&ei->i_fc_updates)) {
> - DEFINE_WAIT(wait);
> -
> - prepare_to_wait(&ei->i_fc_wait, &wait,
> - TASK_UNINTERRUPTIBLE);
> - if (atomic_read(&ei->i_fc_updates)) {
> - spin_unlock(&sbi->s_fc_lock);
> - schedule();
> - spin_lock(&sbi->s_fc_lock);
> - }
> - finish_wait(&ei->i_fc_wait, &wait);
> - }
> spin_unlock(&sbi->s_fc_lock);
> ret = jbd2_submit_inode_data(journal, ei->jinode);
> if (ret)
> @@ -1129,6 +1124,19 @@ static int ext4_fc_perform_commit(journal_t *journal)
> int ret = 0;
> u32 crc = 0;
>
> + /*
> + * Wait for all the handles of the current transaction to complete
> + * and then lock the journal.
> + */
> + jbd2_journal_lock_updates(journal);
> + spin_lock(&sbi->s_fc_lock);
> + list_for_each_entry(iter, &sbi->s_fc_q[FC_Q_MAIN], i_fc_list) {
> + ext4_set_inode_state(&iter->vfs_inode,
> + EXT4_STATE_FC_COMMITTING);
> + }
> + spin_unlock(&sbi->s_fc_lock);
> + jbd2_journal_unlock_updates(journal);
> +
> ret = ext4_fc_submit_inode_data_all(journal);
> if (ret)
> return ret;
> @@ -1179,6 +1187,18 @@ static int ext4_fc_perform_commit(journal_t *journal)
> ret = ext4_fc_write_inode(inode, &crc);
> if (ret)
> goto out;
> + ext4_clear_inode_state(inode, EXT4_STATE_FC_COMMITTING);
> + /*
> + * Make sure clearing of EXT4_STATE_FC_COMMITTING is
> + * visible before we send the wakeup. Pairs with implicit
> + * barrier in prepare_to_wait() in ext4_fc_track_inode().
> + */
> + smp_mb();
> +#if (BITS_PER_LONG < 64)
> + wake_up_bit(&iter->i_state_flags, EXT4_STATE_FC_COMMITTING);
> +#else
> + wake_up_bit(&iter->i_flags, EXT4_STATE_FC_COMMITTING);
> +#endif
> spin_lock(&sbi->s_fc_lock);
> }
> spin_unlock(&sbi->s_fc_lock);
> @@ -1316,13 +1336,17 @@ static void ext4_fc_cleanup(journal_t *journal, int full, tid_t tid)
> spin_lock(&sbi->s_fc_lock);
> list_for_each_entry_safe(iter, iter_n, &sbi->s_fc_q[FC_Q_MAIN],
> i_fc_list) {
> - list_del_init(&iter->i_fc_list);
> ext4_clear_inode_state(&iter->vfs_inode,
> EXT4_STATE_FC_COMMITTING);
> if (tid_geq(tid, iter->i_sync_tid))
> ext4_fc_reset_inode(&iter->vfs_inode);
> - /* Make sure EXT4_STATE_FC_COMMITTING bit is clear */
> + /*
> + * Make sure clearing of EXT4_STATE_FC_COMMITTING is
> + * visible before we send the wakeup. Pairs with implicit
> + * barrier in prepare_to_wait() in ext4_fc_track_inode().
> + */
> smp_mb();
> + list_del_init(&iter->i_fc_list);
> #if (BITS_PER_LONG < 64)
> wake_up_bit(&iter->i_state_flags, EXT4_STATE_FC_COMMITTING);
> #else
> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> index 1ebf2393b..ecd70b506 100644
> --- a/fs/jbd2/journal.c
> +++ b/fs/jbd2/journal.c
> @@ -728,7 +728,6 @@ int jbd2_fc_begin_commit(journal_t *journal, tid_t tid)
> }
> journal->j_flags |= JBD2_FAST_COMMIT_ONGOING;
> write_unlock(&journal->j_state_lock);
> - jbd2_journal_lock_updates(journal);
>
> return 0;
> }
> @@ -740,7 +739,6 @@ EXPORT_SYMBOL(jbd2_fc_begin_commit);
> */
> static int __jbd2_fc_end_commit(journal_t *journal, tid_t tid, bool fallback)
> {
> - jbd2_journal_unlock_updates(journal);
> if (journal->j_fc_cleanup_callback)
> journal->j_fc_cleanup_callback(journal, 0, tid);
> write_lock(&journal->j_state_lock);
> --
> 2.46.0.184.g6999bdac58-goog
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH v7 5/9] ext4: drop i_fc_updates from inode fc info
2024-08-18 4:03 [PATCH] ext4: convert i_fc_lock to spinlock Harshad Shirwadkar
` (4 preceding siblings ...)
2024-08-18 4:03 ` [PATCH v7 4/9] ext4: rework fast commit commit path Harshad Shirwadkar
@ 2024-08-18 4:03 ` Harshad Shirwadkar
2024-08-18 4:03 ` [PATCH v7 6/9] ext4: update code documentation Harshad Shirwadkar
` (3 subsequent siblings)
9 siblings, 0 replies; 22+ messages in thread
From: Harshad Shirwadkar @ 2024-08-18 4:03 UTC (permalink / raw)
To: linux-ext4; +Cc: tytso, jack, harshads, Harshad Shirwadkar
The new logic introduced in this series does not require tracking number
of active handles open on an inode. So, drop it.
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
fs/ext4/ext4.h | 5 ----
fs/ext4/fast_commit.c | 68 -------------------------------------------
2 files changed, 73 deletions(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 51ae1785a..03734c523 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1056,9 +1056,6 @@ struct ext4_inode_info {
/* End of lblk range that needs to be committed in this fast commit */
ext4_lblk_t i_fc_lblk_len;
- /* Number of ongoing updates on this inode */
- atomic_t i_fc_updates;
-
/* Fast commit wait queue for this inode */
wait_queue_head_t i_fc_wait;
@@ -2912,8 +2909,6 @@ void __ext4_fc_track_create(handle_t *handle, struct inode *inode,
void ext4_fc_track_create(handle_t *handle, struct dentry *dentry);
void ext4_fc_track_inode(handle_t *handle, struct inode *inode);
void ext4_fc_mark_ineligible(struct super_block *sb, int reason, handle_t *handle);
-void ext4_fc_start_update(struct inode *inode);
-void ext4_fc_stop_update(struct inode *inode);
void ext4_fc_del(struct inode *inode);
bool ext4_fc_replay_check_excluded(struct super_block *sb, ext4_fsblk_t block);
void ext4_fc_replay_cleanup(struct super_block *sb);
diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 7a35234ce..1b0540f13 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -202,32 +202,6 @@ void ext4_fc_init_inode(struct inode *inode)
INIT_LIST_HEAD(&ei->i_fc_list);
INIT_LIST_HEAD(&ei->i_fc_dilist);
init_waitqueue_head(&ei->i_fc_wait);
- atomic_set(&ei->i_fc_updates, 0);
-}
-
-/* This function must be called with sbi->s_fc_lock held. */
-static void ext4_fc_wait_committing_inode(struct inode *inode)
-__releases(&EXT4_SB(inode->i_sb)->s_fc_lock)
-{
- wait_queue_head_t *wq;
- struct ext4_inode_info *ei = EXT4_I(inode);
-
-#if (BITS_PER_LONG < 64)
- DEFINE_WAIT_BIT(wait, &ei->i_state_flags,
- EXT4_STATE_FC_COMMITTING);
- wq = bit_waitqueue(&ei->i_state_flags,
- EXT4_STATE_FC_COMMITTING);
-#else
- DEFINE_WAIT_BIT(wait, &ei->i_flags,
- EXT4_STATE_FC_COMMITTING);
- wq = bit_waitqueue(&ei->i_flags,
- EXT4_STATE_FC_COMMITTING);
-#endif
- lockdep_assert_held(&EXT4_SB(inode->i_sb)->s_fc_lock);
- prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE);
- spin_unlock(&EXT4_SB(inode->i_sb)->s_fc_lock);
- schedule();
- finish_wait(wq, &wait.wq_entry);
}
static bool ext4_fc_disabled(struct super_block *sb)
@@ -236,48 +210,6 @@ static bool ext4_fc_disabled(struct super_block *sb)
(EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY));
}
-/*
- * Inform Ext4's fast about start of an inode update
- *
- * This function is called by the high level call VFS callbacks before
- * performing any inode update. This function blocks if there's an ongoing
- * fast commit on the inode in question.
- */
-void ext4_fc_start_update(struct inode *inode)
-{
- struct ext4_inode_info *ei = EXT4_I(inode);
-
- if (ext4_fc_disabled(inode->i_sb))
- return;
-
-restart:
- spin_lock(&EXT4_SB(inode->i_sb)->s_fc_lock);
- if (list_empty(&ei->i_fc_list))
- goto out;
-
- if (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING)) {
- ext4_fc_wait_committing_inode(inode);
- goto restart;
- }
-out:
- atomic_inc(&ei->i_fc_updates);
- spin_unlock(&EXT4_SB(inode->i_sb)->s_fc_lock);
-}
-
-/*
- * Stop inode update and wake up waiting fast commits if any.
- */
-void ext4_fc_stop_update(struct inode *inode)
-{
- struct ext4_inode_info *ei = EXT4_I(inode);
-
- if (ext4_fc_disabled(inode->i_sb))
- return;
-
- if (atomic_dec_and_test(&ei->i_fc_updates))
- wake_up_all(&ei->i_fc_wait);
-}
-
/*
* Remove inode from fast commit list. If the inode is being committed
* we wait until inode commit is done.
--
2.46.0.184.g6999bdac58-goog
^ permalink raw reply related [flat|nested] 22+ messages in thread* [PATCH v7 6/9] ext4: update code documentation
2024-08-18 4:03 [PATCH] ext4: convert i_fc_lock to spinlock Harshad Shirwadkar
` (5 preceding siblings ...)
2024-08-18 4:03 ` [PATCH v7 5/9] ext4: drop i_fc_updates from inode fc info Harshad Shirwadkar
@ 2024-08-18 4:03 ` Harshad Shirwadkar
2024-12-13 15:15 ` Jan Kara
2024-08-18 4:03 ` [PATCH v7 7/9] ext4: temporarily elevate commit thread priority Harshad Shirwadkar
` (2 subsequent siblings)
9 siblings, 1 reply; 22+ messages in thread
From: Harshad Shirwadkar @ 2024-08-18 4:03 UTC (permalink / raw)
To: linux-ext4; +Cc: tytso, jack, harshads, Harshad Shirwadkar
This patch updates code documentation to reflect the commit path changes
made in this series.
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
fs/ext4/fast_commit.c | 39 ++++++++++++++++++++++++++-------------
1 file changed, 26 insertions(+), 13 deletions(-)
diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 1b0540f13..71cf8dc8a 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -50,14 +50,21 @@
* that need to be committed during a fast commit in another in memory queue of
* inodes. During the commit operation, we commit in the following order:
*
- * [1] Lock inodes for any further data updates by setting COMMITTING state
- * [2] Submit data buffers of all the inodes
- * [3] Wait for [2] to complete
- * [4] Commit all the directory entry updates in the fast commit space
- * [5] Commit all the changed inode structures
- * [6] Write tail tag (this tag ensures the atomicity, please read the following
+ * [1] Lock the journal by calling jbd2_journal_lock_updates. This ensures that
+ * all the exsiting handles finish and no new handles can start.
+ * [2] Mark all the fast commit eligible inodes as undergoing fast commit
+ * by setting "EXT4_STATE_FC_COMMITTING" state.
+ * [3] Unlock the journal by calling jbd2_journal_unlock_updates. This allows
+ * starting of new handles. If new handles try to start an update on
+ * any of the inodes that are being committed, ext4_fc_track_inode()
+ * will block until those inodes have finished the fast commit.
+ * [4] Submit data buffers of all the committing inodes.
+ * [5] Wait for [4] to complete.
+ * [6] Commit all the directory entry updates in the fast commit space.
+ * [7] Commit all the changed inodes in the fast commit space and clear
+ * "EXT4_STATE_FC_COMMITTING" for these inodes.
+ * [8] Write tail tag (this tag ensures the atomicity, please read the following
* section for more details).
- * [7] Wait for [4], [5] and [6] to complete.
*
* All the inode updates must call ext4_fc_start_update() before starting an
* update. If such an ongoing update is present, fast commit waits for it to
@@ -143,6 +150,13 @@
* similarly. Thus, by converting a non-idempotent procedure into a series of
* idempotent outcomes, fast commits ensured idempotence during the replay.
*
+ * Locking
+ * -------
+ * sbi->s_fc_lock protects the fast commit inodes queue and the fast commit
+ * dentry queue. ei->i_fc_lock protects the fast commit related info in a given
+ * inode. Most of the code avoids acquiring both the locks, but if one must do
+ * that then sbi->s_fc_lock must be acquired before ei->i_fc_lock.
+ *
* TODOs
* -----
*
@@ -157,13 +171,12 @@
* fast commit recovery even if that area is invalidated by later full
* commits.
*
- * 1) Fast commit's commit path locks the entire file system during fast
- * commit. This has significant performance penalty. Instead of that, we
- * should use ext4_fc_start/stop_update functions to start inode level
- * updates from ext4_journal_start/stop. Once we do that we can drop file
- * system locking during commit path.
+ * 1) Handle more ineligible cases.
*
- * 2) Handle more ineligible cases.
+ * 2) Change ext4_fc_commit() to lookup logical to physical mapping using extent
+ * status tree. This would get rid of the need to call ext4_fc_track_inode()
+ * before acquiring i_data_sem. To do that we would need to ensure that
+ * modified extents from the extent status tree are not evicted from memory.
*/
#include <trace/events/ext4.h>
--
2.46.0.184.g6999bdac58-goog
^ permalink raw reply related [flat|nested] 22+ messages in thread* Re: [PATCH v7 6/9] ext4: update code documentation
2024-08-18 4:03 ` [PATCH v7 6/9] ext4: update code documentation Harshad Shirwadkar
@ 2024-12-13 15:15 ` Jan Kara
0 siblings, 0 replies; 22+ messages in thread
From: Jan Kara @ 2024-12-13 15:15 UTC (permalink / raw)
To: Harshad Shirwadkar; +Cc: linux-ext4, tytso, jack, harshads
On Sun 18-08-24 04:03:53, Harshad Shirwadkar wrote:
> This patch updates code documentation to reflect the commit path changes
> made in this series.
>
> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
> Reviewed-by: Jan Kara <jack@suse.cz>
Just noticed one thing:
> * All the inode updates must call ext4_fc_start_update() before starting an
> * update. If such an ongoing update is present, fast commit waits for it to
This paragraph is outdated as well. Can you please fix it? Thanks!
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH v7 7/9] ext4: temporarily elevate commit thread priority
2024-08-18 4:03 [PATCH] ext4: convert i_fc_lock to spinlock Harshad Shirwadkar
` (6 preceding siblings ...)
2024-08-18 4:03 ` [PATCH v7 6/9] ext4: update code documentation Harshad Shirwadkar
@ 2024-08-18 4:03 ` Harshad Shirwadkar
2024-08-18 4:03 ` [PATCH v7 8/9] ext4: make fast commit ineligible on ext4_reserve_inode_write failure Harshad Shirwadkar
2024-08-18 4:03 ` [PATCH v7 9/9] ext4: hold s_fc_lock while during fast commit Harshad Shirwadkar
9 siblings, 0 replies; 22+ messages in thread
From: Harshad Shirwadkar @ 2024-08-18 4:03 UTC (permalink / raw)
To: linux-ext4; +Cc: tytso, jack, harshads, Harshad Shirwadkar
Unlike JBD2 based full commits, there is no dedicated journal thread
for fast commits. Thus to reduce scheduling delays between IO
submission and completion, temporarily elevate the committer thread's
priority to match the configured priority of the JBD2 journal
thread.
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
fs/ext4/ext4.h | 4 +++-
fs/ext4/fast_commit.c | 13 +++++++++++++
fs/ext4/super.c | 5 ++---
3 files changed, 18 insertions(+), 4 deletions(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 03734c523..4ecb63f95 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2279,10 +2279,12 @@ static inline int ext4_forced_shutdown(struct super_block *sb)
#define EXT4_DEFM_NODELALLOC 0x0800
/*
- * Default journal batch times
+ * Default journal batch times and ioprio.
*/
#define EXT4_DEF_MIN_BATCH_TIME 0
#define EXT4_DEF_MAX_BATCH_TIME 15000 /* 15ms */
+#define EXT4_DEF_JOURNAL_IOPRIO (IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, 3))
+
/*
* Minimum number of groups in a flexgroup before we separate out
diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 71cf8dc8a..2fc43b1e2 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -1196,6 +1196,7 @@ int ext4_fc_commit(journal_t *journal, tid_t commit_tid)
int subtid = atomic_read(&sbi->s_fc_subtid);
int status = EXT4_FC_STATUS_OK, fc_bufs_before = 0;
ktime_t start_time, commit_time;
+ int old_ioprio, journal_ioprio;
if (!test_opt2(sb, JOURNAL_FAST_COMMIT))
return jbd2_complete_transaction(journal, commit_tid);
@@ -1203,6 +1204,7 @@ int ext4_fc_commit(journal_t *journal, tid_t commit_tid)
trace_ext4_fc_commit_start(sb, commit_tid);
start_time = ktime_get();
+ old_ioprio = get_current_ioprio();
restart_fc:
ret = jbd2_fc_begin_commit(journal, commit_tid);
@@ -1233,6 +1235,15 @@ int ext4_fc_commit(journal_t *journal, tid_t commit_tid)
goto fallback;
}
+ /*
+ * Now that we know that this thread is going to do a fast commit,
+ * elevate the priority to match that of the journal thread.
+ */
+ if (journal->j_task->io_context)
+ journal_ioprio = sbi->s_journal->j_task->io_context->ioprio;
+ else
+ journal_ioprio = EXT4_DEF_JOURNAL_IOPRIO;
+ set_task_ioprio(current, journal_ioprio);
fc_bufs_before = (sbi->s_fc_bytes + bsize - 1) / bsize;
ret = ext4_fc_perform_commit(journal);
if (ret < 0) {
@@ -1247,6 +1258,7 @@ int ext4_fc_commit(journal_t *journal, tid_t commit_tid)
}
atomic_inc(&sbi->s_fc_subtid);
ret = jbd2_fc_end_commit(journal);
+ set_task_ioprio(current, old_ioprio);
/*
* weight the commit time higher than the average time so we
* don't react too strongly to vast changes in the commit time
@@ -1256,6 +1268,7 @@ int ext4_fc_commit(journal_t *journal, tid_t commit_tid)
return ret;
fallback:
+ set_task_ioprio(current, old_ioprio);
ret = jbd2_fc_end_commit_fallback(journal);
ext4_fc_update_stats(sb, status, 0, 0, commit_tid);
return ret;
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index d37944839..4f38a34b0 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1817,7 +1817,6 @@ static const struct fs_parameter_spec ext4_param_specs[] = {
{}
};
-#define DEFAULT_JOURNAL_IOPRIO (IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, 3))
#define MOPT_SET 0x0001
#define MOPT_CLEAR 0x0002
@@ -5179,7 +5178,7 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
/* Set defaults for the variables that will be set during parsing */
if (!(ctx->spec & EXT4_SPEC_JOURNAL_IOPRIO))
- ctx->journal_ioprio = DEFAULT_JOURNAL_IOPRIO;
+ ctx->journal_ioprio = EXT4_DEF_JOURNAL_IOPRIO;
sbi->s_inode_readahead_blks = EXT4_DEF_INODE_READAHEAD_BLKS;
sbi->s_sectors_written_start =
@@ -6437,7 +6436,7 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb)
ctx->journal_ioprio =
sbi->s_journal->j_task->io_context->ioprio;
else
- ctx->journal_ioprio = DEFAULT_JOURNAL_IOPRIO;
+ ctx->journal_ioprio = EXT4_DEF_JOURNAL_IOPRIO;
}
--
2.46.0.184.g6999bdac58-goog
^ permalink raw reply related [flat|nested] 22+ messages in thread* [PATCH v7 8/9] ext4: make fast commit ineligible on ext4_reserve_inode_write failure
2024-08-18 4:03 [PATCH] ext4: convert i_fc_lock to spinlock Harshad Shirwadkar
` (7 preceding siblings ...)
2024-08-18 4:03 ` [PATCH v7 7/9] ext4: temporarily elevate commit thread priority Harshad Shirwadkar
@ 2024-08-18 4:03 ` Harshad Shirwadkar
2024-12-16 10:40 ` Jan Kara
2024-08-18 4:03 ` [PATCH v7 9/9] ext4: hold s_fc_lock while during fast commit Harshad Shirwadkar
9 siblings, 1 reply; 22+ messages in thread
From: Harshad Shirwadkar @ 2024-08-18 4:03 UTC (permalink / raw)
To: linux-ext4; +Cc: tytso, jack, harshads, Harshad Shirwadkar
Fast commit by default makes every inode on which
ext4_reserve_inode_write() is called. Thus, if that function
fails for some reason, make the next fast commit ineligible.
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
---
fs/ext4/fast_commit.c | 1 +
fs/ext4/fast_commit.h | 1 +
fs/ext4/inode.c | 29 ++++++++++++++++++-----------
include/trace/events/ext4.h | 7 +++++--
4 files changed, 25 insertions(+), 13 deletions(-)
diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 2fc43b1e2..7525450f1 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -2282,6 +2282,7 @@ static const char * const fc_ineligible_reasons[] = {
[EXT4_FC_REASON_FALLOC_RANGE] = "Falloc range op",
[EXT4_FC_REASON_INODE_JOURNAL_DATA] = "Data journalling",
[EXT4_FC_REASON_ENCRYPTED_FILENAME] = "Encrypted filename",
+ [EXT4_FC_REASON_INODE_RSV_WRITE_FAIL] = "Inode reserve write failure"
};
int ext4_fc_info_show(struct seq_file *seq, void *v)
diff --git a/fs/ext4/fast_commit.h b/fs/ext4/fast_commit.h
index 2fadb2c47..f7f85c3dd 100644
--- a/fs/ext4/fast_commit.h
+++ b/fs/ext4/fast_commit.h
@@ -97,6 +97,7 @@ enum {
EXT4_FC_REASON_FALLOC_RANGE,
EXT4_FC_REASON_INODE_JOURNAL_DATA,
EXT4_FC_REASON_ENCRYPTED_FILENAME,
+ EXT4_FC_REASON_INODE_RSV_WRITE_FAIL,
EXT4_FC_REASON_MAX
};
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c82eba178..5a187902b 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5798,20 +5798,27 @@ ext4_reserve_inode_write(handle_t *handle, struct inode *inode,
{
int err;
- if (unlikely(ext4_forced_shutdown(inode->i_sb)))
- return -EIO;
+ if (unlikely(ext4_forced_shutdown(inode->i_sb))) {
+ err = -EIO;
+ goto out;
+ }
err = ext4_get_inode_loc(inode, iloc);
- if (!err) {
- BUFFER_TRACE(iloc->bh, "get_write_access");
- err = ext4_journal_get_write_access(handle, inode->i_sb,
- iloc->bh, EXT4_JTR_NONE);
- if (err) {
- brelse(iloc->bh);
- iloc->bh = NULL;
- }
- ext4_fc_track_inode(handle, inode);
+ if (err)
+ goto out;
+
+ BUFFER_TRACE(iloc->bh, "get_write_access");
+ err = ext4_journal_get_write_access(handle, inode->i_sb,
+ iloc->bh, EXT4_JTR_NONE);
+ if (err) {
+ brelse(iloc->bh);
+ iloc->bh = NULL;
}
+ ext4_fc_track_inode(handle, inode);
+out:
+ if (err)
+ ext4_fc_mark_ineligible(inode->i_sb,
+ EXT4_FC_REASON_INODE_RSV_WRITE_FAIL, handle);
ext4_std_error(inode->i_sb, err);
return err;
}
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index cc5e9b7b2..8bab4febd 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -105,6 +105,7 @@ TRACE_DEFINE_ENUM(EXT4_FC_REASON_RENAME_DIR);
TRACE_DEFINE_ENUM(EXT4_FC_REASON_FALLOC_RANGE);
TRACE_DEFINE_ENUM(EXT4_FC_REASON_INODE_JOURNAL_DATA);
TRACE_DEFINE_ENUM(EXT4_FC_REASON_ENCRYPTED_FILENAME);
+TRACE_DEFINE_ENUM(EXT4_FC_REASON_INODE_RSV_WRITE_FAIL);
TRACE_DEFINE_ENUM(EXT4_FC_REASON_MAX);
#define show_fc_reason(reason) \
@@ -118,7 +119,8 @@ TRACE_DEFINE_ENUM(EXT4_FC_REASON_MAX);
{ EXT4_FC_REASON_RENAME_DIR, "RENAME_DIR"}, \
{ EXT4_FC_REASON_FALLOC_RANGE, "FALLOC_RANGE"}, \
{ EXT4_FC_REASON_INODE_JOURNAL_DATA, "INODE_JOURNAL_DATA"}, \
- { EXT4_FC_REASON_ENCRYPTED_FILENAME, "ENCRYPTED_FILENAME"})
+ { EXT4_FC_REASON_ENCRYPTED_FILENAME, "ENCRYPTED_FILENAME"}, \
+ { EXT4_FC_REASON_INODE_RSV_WRITE_FAIL, "INODE_RSV_WRITE_FAIL"})
TRACE_DEFINE_ENUM(CR_POWER2_ALIGNED);
TRACE_DEFINE_ENUM(CR_GOAL_LEN_FAST);
@@ -2809,7 +2811,7 @@ TRACE_EVENT(ext4_fc_stats,
),
TP_printk("dev %d,%d fc ineligible reasons:\n"
- "%s:%u, %s:%u, %s:%u, %s:%u, %s:%u, %s:%u, %s:%u, %s:%u, %s:%u, %s:%u"
+ "%s:%u, %s:%u, %s:%u, %s:%u, %s:%u, %s:%u, %s:%u, %s:%u, %s:%u, %s:%u, %s:%u"
"num_commits:%lu, ineligible: %lu, numblks: %lu",
MAJOR(__entry->dev), MINOR(__entry->dev),
FC_REASON_NAME_STAT(EXT4_FC_REASON_XATTR),
@@ -2822,6 +2824,7 @@ TRACE_EVENT(ext4_fc_stats,
FC_REASON_NAME_STAT(EXT4_FC_REASON_FALLOC_RANGE),
FC_REASON_NAME_STAT(EXT4_FC_REASON_INODE_JOURNAL_DATA),
FC_REASON_NAME_STAT(EXT4_FC_REASON_ENCRYPTED_FILENAME),
+ FC_REASON_NAME_STAT(EXT4_FC_REASON_INODE_RSV_WRITE_FAIL),
__entry->fc_commits, __entry->fc_ineligible_commits,
__entry->fc_numblks)
);
--
2.46.0.184.g6999bdac58-goog
^ permalink raw reply related [flat|nested] 22+ messages in thread* Re: [PATCH v7 8/9] ext4: make fast commit ineligible on ext4_reserve_inode_write failure
2024-08-18 4:03 ` [PATCH v7 8/9] ext4: make fast commit ineligible on ext4_reserve_inode_write failure Harshad Shirwadkar
@ 2024-12-16 10:40 ` Jan Kara
0 siblings, 0 replies; 22+ messages in thread
From: Jan Kara @ 2024-12-16 10:40 UTC (permalink / raw)
To: Harshad Shirwadkar; +Cc: linux-ext4, tytso, jack, harshads
On Sun 18-08-24 04:03:55, Harshad Shirwadkar wrote:
> Fast commit by default makes every inode on which
> ext4_reserve_inode_write() is called. Thus, if that function
> fails for some reason, make the next fast commit ineligible.
>
> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
I think this is a bit pointless. If ext4_reserve_inode_write() fails, we
have a data corruption problems, journal has aborted or similar. Thus I
think data consistency of fsync is kind of the least problem we are having
:). That being said just completely turning of fastcommit as soon as we hit
some filesystem error (in __ext4_std_error() when we don't decide to panic
system / shutdown the filesystem) makes sense to me as a kind of "let's
limit possible damage" measure.
Honza
> ---
> fs/ext4/fast_commit.c | 1 +
> fs/ext4/fast_commit.h | 1 +
> fs/ext4/inode.c | 29 ++++++++++++++++++-----------
> include/trace/events/ext4.h | 7 +++++--
> 4 files changed, 25 insertions(+), 13 deletions(-)
>
> diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
> index 2fc43b1e2..7525450f1 100644
> --- a/fs/ext4/fast_commit.c
> +++ b/fs/ext4/fast_commit.c
> @@ -2282,6 +2282,7 @@ static const char * const fc_ineligible_reasons[] = {
> [EXT4_FC_REASON_FALLOC_RANGE] = "Falloc range op",
> [EXT4_FC_REASON_INODE_JOURNAL_DATA] = "Data journalling",
> [EXT4_FC_REASON_ENCRYPTED_FILENAME] = "Encrypted filename",
> + [EXT4_FC_REASON_INODE_RSV_WRITE_FAIL] = "Inode reserve write failure"
> };
>
> int ext4_fc_info_show(struct seq_file *seq, void *v)
> diff --git a/fs/ext4/fast_commit.h b/fs/ext4/fast_commit.h
> index 2fadb2c47..f7f85c3dd 100644
> --- a/fs/ext4/fast_commit.h
> +++ b/fs/ext4/fast_commit.h
> @@ -97,6 +97,7 @@ enum {
> EXT4_FC_REASON_FALLOC_RANGE,
> EXT4_FC_REASON_INODE_JOURNAL_DATA,
> EXT4_FC_REASON_ENCRYPTED_FILENAME,
> + EXT4_FC_REASON_INODE_RSV_WRITE_FAIL,
> EXT4_FC_REASON_MAX
> };
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index c82eba178..5a187902b 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -5798,20 +5798,27 @@ ext4_reserve_inode_write(handle_t *handle, struct inode *inode,
> {
> int err;
>
> - if (unlikely(ext4_forced_shutdown(inode->i_sb)))
> - return -EIO;
> + if (unlikely(ext4_forced_shutdown(inode->i_sb))) {
> + err = -EIO;
> + goto out;
> + }
>
> err = ext4_get_inode_loc(inode, iloc);
> - if (!err) {
> - BUFFER_TRACE(iloc->bh, "get_write_access");
> - err = ext4_journal_get_write_access(handle, inode->i_sb,
> - iloc->bh, EXT4_JTR_NONE);
> - if (err) {
> - brelse(iloc->bh);
> - iloc->bh = NULL;
> - }
> - ext4_fc_track_inode(handle, inode);
> + if (err)
> + goto out;
> +
> + BUFFER_TRACE(iloc->bh, "get_write_access");
> + err = ext4_journal_get_write_access(handle, inode->i_sb,
> + iloc->bh, EXT4_JTR_NONE);
> + if (err) {
> + brelse(iloc->bh);
> + iloc->bh = NULL;
> }
> + ext4_fc_track_inode(handle, inode);
> +out:
> + if (err)
> + ext4_fc_mark_ineligible(inode->i_sb,
> + EXT4_FC_REASON_INODE_RSV_WRITE_FAIL, handle);
> ext4_std_error(inode->i_sb, err);
> return err;
> }
> diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
> index cc5e9b7b2..8bab4febd 100644
> --- a/include/trace/events/ext4.h
> +++ b/include/trace/events/ext4.h
> @@ -105,6 +105,7 @@ TRACE_DEFINE_ENUM(EXT4_FC_REASON_RENAME_DIR);
> TRACE_DEFINE_ENUM(EXT4_FC_REASON_FALLOC_RANGE);
> TRACE_DEFINE_ENUM(EXT4_FC_REASON_INODE_JOURNAL_DATA);
> TRACE_DEFINE_ENUM(EXT4_FC_REASON_ENCRYPTED_FILENAME);
> +TRACE_DEFINE_ENUM(EXT4_FC_REASON_INODE_RSV_WRITE_FAIL);
> TRACE_DEFINE_ENUM(EXT4_FC_REASON_MAX);
>
> #define show_fc_reason(reason) \
> @@ -118,7 +119,8 @@ TRACE_DEFINE_ENUM(EXT4_FC_REASON_MAX);
> { EXT4_FC_REASON_RENAME_DIR, "RENAME_DIR"}, \
> { EXT4_FC_REASON_FALLOC_RANGE, "FALLOC_RANGE"}, \
> { EXT4_FC_REASON_INODE_JOURNAL_DATA, "INODE_JOURNAL_DATA"}, \
> - { EXT4_FC_REASON_ENCRYPTED_FILENAME, "ENCRYPTED_FILENAME"})
> + { EXT4_FC_REASON_ENCRYPTED_FILENAME, "ENCRYPTED_FILENAME"}, \
> + { EXT4_FC_REASON_INODE_RSV_WRITE_FAIL, "INODE_RSV_WRITE_FAIL"})
>
> TRACE_DEFINE_ENUM(CR_POWER2_ALIGNED);
> TRACE_DEFINE_ENUM(CR_GOAL_LEN_FAST);
> @@ -2809,7 +2811,7 @@ TRACE_EVENT(ext4_fc_stats,
> ),
>
> TP_printk("dev %d,%d fc ineligible reasons:\n"
> - "%s:%u, %s:%u, %s:%u, %s:%u, %s:%u, %s:%u, %s:%u, %s:%u, %s:%u, %s:%u"
> + "%s:%u, %s:%u, %s:%u, %s:%u, %s:%u, %s:%u, %s:%u, %s:%u, %s:%u, %s:%u, %s:%u"
> "num_commits:%lu, ineligible: %lu, numblks: %lu",
> MAJOR(__entry->dev), MINOR(__entry->dev),
> FC_REASON_NAME_STAT(EXT4_FC_REASON_XATTR),
> @@ -2822,6 +2824,7 @@ TRACE_EVENT(ext4_fc_stats,
> FC_REASON_NAME_STAT(EXT4_FC_REASON_FALLOC_RANGE),
> FC_REASON_NAME_STAT(EXT4_FC_REASON_INODE_JOURNAL_DATA),
> FC_REASON_NAME_STAT(EXT4_FC_REASON_ENCRYPTED_FILENAME),
> + FC_REASON_NAME_STAT(EXT4_FC_REASON_INODE_RSV_WRITE_FAIL),
> __entry->fc_commits, __entry->fc_ineligible_commits,
> __entry->fc_numblks)
> );
> --
> 2.46.0.184.g6999bdac58-goog
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH v7 9/9] ext4: hold s_fc_lock while during fast commit
2024-08-18 4:03 [PATCH] ext4: convert i_fc_lock to spinlock Harshad Shirwadkar
` (8 preceding siblings ...)
2024-08-18 4:03 ` [PATCH v7 8/9] ext4: make fast commit ineligible on ext4_reserve_inode_write failure Harshad Shirwadkar
@ 2024-08-18 4:03 ` Harshad Shirwadkar
2024-12-16 10:50 ` Jan Kara
2025-01-13 14:16 ` Baokun Li
9 siblings, 2 replies; 22+ messages in thread
From: Harshad Shirwadkar @ 2024-08-18 4:03 UTC (permalink / raw)
To: linux-ext4; +Cc: tytso, jack, harshads, Harshad Shirwadkar
Leaving s_fc_lock in between during commit in ext4_fc_perform_commit()
function leaves room for subtle concurrency bugs where ext4_fc_del() may
delete an inode from the fast commit list, leaving list in an inconsistent
state. Also, this patch converts s_fc_lock to mutex type so that it can be
held when kmem_cache_* functions are called.
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
---
fs/ext4/ext4.h | 2 +-
fs/ext4/fast_commit.c | 91 +++++++++++++++++--------------------------
fs/ext4/super.c | 2 +-
3 files changed, 38 insertions(+), 57 deletions(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 4ecb63f95..a1acd34ff 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1748,7 +1748,7 @@ struct ext4_sb_info {
* following fields:
* ei->i_fc_list, s_fc_dentry_q, s_fc_q, s_fc_bytes, s_fc_bh.
*/
- spinlock_t s_fc_lock;
+ struct mutex s_fc_lock;
struct buffer_head *s_fc_bh;
struct ext4_fc_stats s_fc_stats;
tid_t s_fc_ineligible_tid;
diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 7525450f1..c3627efd9 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -236,9 +236,9 @@ void ext4_fc_del(struct inode *inode)
if (ext4_fc_disabled(inode->i_sb))
return;
- spin_lock(&EXT4_SB(inode->i_sb)->s_fc_lock);
+ mutex_lock(&EXT4_SB(inode->i_sb)->s_fc_lock);
if (list_empty(&ei->i_fc_list) && list_empty(&ei->i_fc_dilist)) {
- spin_unlock(&EXT4_SB(inode->i_sb)->s_fc_lock);
+ mutex_unlock(&EXT4_SB(inode->i_sb)->s_fc_lock);
return;
}
@@ -266,7 +266,7 @@ void ext4_fc_del(struct inode *inode)
* dentry create references, since it is not needed to log it anyways.
*/
if (list_empty(&ei->i_fc_dilist)) {
- spin_unlock(&sbi->s_fc_lock);
+ mutex_unlock(&sbi->s_fc_lock);
return;
}
@@ -276,7 +276,7 @@ void ext4_fc_del(struct inode *inode)
list_del_init(&fc_dentry->fcd_dilist);
WARN_ON(!list_empty(&ei->i_fc_dilist));
- spin_unlock(&sbi->s_fc_lock);
+ mutex_unlock(&sbi->s_fc_lock);
if (fc_dentry->fcd_name.name &&
fc_dentry->fcd_name.len > DNAME_INLINE_LEN)
@@ -306,10 +306,10 @@ void ext4_fc_mark_ineligible(struct super_block *sb, int reason, handle_t *handl
sbi->s_journal->j_running_transaction->t_tid : 0;
read_unlock(&sbi->s_journal->j_state_lock);
}
- spin_lock(&sbi->s_fc_lock);
+ mutex_lock(&sbi->s_fc_lock);
if (tid_gt(tid, sbi->s_fc_ineligible_tid))
sbi->s_fc_ineligible_tid = tid;
- spin_unlock(&sbi->s_fc_lock);
+ mutex_unlock(&sbi->s_fc_lock);
WARN_ON(reason >= EXT4_FC_REASON_MAX);
sbi->s_fc_stats.fc_ineligible_reason_count[reason]++;
}
@@ -349,14 +349,14 @@ static int ext4_fc_track_template(
if (!enqueue)
return ret;
- spin_lock(&sbi->s_fc_lock);
+ mutex_lock(&sbi->s_fc_lock);
if (list_empty(&EXT4_I(inode)->i_fc_list))
list_add_tail(&EXT4_I(inode)->i_fc_list,
(sbi->s_journal->j_flags & JBD2_FULL_COMMIT_ONGOING ||
sbi->s_journal->j_flags & JBD2_FAST_COMMIT_ONGOING) ?
&sbi->s_fc_q[FC_Q_STAGING] :
&sbi->s_fc_q[FC_Q_MAIN]);
- spin_unlock(&sbi->s_fc_lock);
+ mutex_unlock(&sbi->s_fc_lock);
return ret;
}
@@ -414,7 +414,8 @@ static int __track_dentry_update(struct inode *inode, void *arg, bool update)
}
node->fcd_name.len = dentry->d_name.len;
INIT_LIST_HEAD(&node->fcd_dilist);
- spin_lock(&sbi->s_fc_lock);
+ INIT_LIST_HEAD(&node->fcd_list);
+ mutex_lock(&sbi->s_fc_lock);
if (sbi->s_journal->j_flags & JBD2_FULL_COMMIT_ONGOING ||
sbi->s_journal->j_flags & JBD2_FAST_COMMIT_ONGOING)
list_add_tail(&node->fcd_list,
@@ -435,7 +436,7 @@ static int __track_dentry_update(struct inode *inode, void *arg, bool update)
WARN_ON(!list_empty(&ei->i_fc_dilist));
list_add_tail(&node->fcd_dilist, &ei->i_fc_dilist);
}
- spin_unlock(&sbi->s_fc_lock);
+ mutex_unlock(&sbi->s_fc_lock);
spin_lock(&ei->i_fc_lock);
return 0;
@@ -955,15 +956,15 @@ static int ext4_fc_submit_inode_data_all(journal_t *journal)
struct ext4_inode_info *ei;
int ret = 0;
- spin_lock(&sbi->s_fc_lock);
+ mutex_lock(&sbi->s_fc_lock);
list_for_each_entry(ei, &sbi->s_fc_q[FC_Q_MAIN], i_fc_list) {
- spin_unlock(&sbi->s_fc_lock);
+ mutex_unlock(&sbi->s_fc_lock);
ret = jbd2_submit_inode_data(journal, ei->jinode);
if (ret)
return ret;
- spin_lock(&sbi->s_fc_lock);
+ mutex_lock(&sbi->s_fc_lock);
}
- spin_unlock(&sbi->s_fc_lock);
+ mutex_unlock(&sbi->s_fc_lock);
return ret;
}
@@ -976,19 +977,19 @@ static int ext4_fc_wait_inode_data_all(journal_t *journal)
struct ext4_inode_info *pos, *n;
int ret = 0;
- spin_lock(&sbi->s_fc_lock);
+ mutex_lock(&sbi->s_fc_lock);
list_for_each_entry_safe(pos, n, &sbi->s_fc_q[FC_Q_MAIN], i_fc_list) {
if (!ext4_test_inode_state(&pos->vfs_inode,
EXT4_STATE_FC_COMMITTING))
continue;
- spin_unlock(&sbi->s_fc_lock);
+ mutex_unlock(&sbi->s_fc_lock);
ret = jbd2_wait_inode_data(journal, pos->jinode);
if (ret)
return ret;
- spin_lock(&sbi->s_fc_lock);
+ mutex_lock(&sbi->s_fc_lock);
}
- spin_unlock(&sbi->s_fc_lock);
+ mutex_unlock(&sbi->s_fc_lock);
return 0;
}
@@ -1010,26 +1011,22 @@ __releases(&sbi->s_fc_lock)
list_for_each_entry_safe(fc_dentry, fc_dentry_n,
&sbi->s_fc_dentry_q[FC_Q_MAIN], fcd_list) {
if (fc_dentry->fcd_op != EXT4_FC_TAG_CREAT) {
- spin_unlock(&sbi->s_fc_lock);
- if (!ext4_fc_add_dentry_tlv(sb, crc, fc_dentry)) {
- ret = -ENOSPC;
- goto lock_and_exit;
- }
- spin_lock(&sbi->s_fc_lock);
+ if (!ext4_fc_add_dentry_tlv(sb, crc, fc_dentry))
+ return -ENOSPC;
continue;
}
/*
* With fcd_dilist we need not loop in sbi->s_fc_q to get the
- * corresponding inode pointer
+ * corresponding inode. Also, the corresponding inode could have been
+ * deleted, in which case, we don't need to do anything.
*/
- WARN_ON(list_empty(&fc_dentry->fcd_dilist));
+ if (list_empty(&fc_dentry->fcd_dilist))
+ continue;
ei = list_first_entry(&fc_dentry->fcd_dilist,
struct ext4_inode_info, i_fc_dilist);
inode = &ei->vfs_inode;
WARN_ON(inode->i_ino != fc_dentry->fcd_ino);
- spin_unlock(&sbi->s_fc_lock);
-
/*
* We first write the inode and then the create dirent. This
* allows the recovery code to create an unnamed inode first
@@ -1039,23 +1036,14 @@ __releases(&sbi->s_fc_lock)
*/
ret = ext4_fc_write_inode(inode, crc);
if (ret)
- goto lock_and_exit;
-
+ return ret;
ret = ext4_fc_write_inode_data(inode, crc);
if (ret)
- goto lock_and_exit;
-
- if (!ext4_fc_add_dentry_tlv(sb, crc, fc_dentry)) {
- ret = -ENOSPC;
- goto lock_and_exit;
- }
-
- spin_lock(&sbi->s_fc_lock);
+ return ret;
+ if (!ext4_fc_add_dentry_tlv(sb, crc, fc_dentry))
+ return -ENOSPC;
}
return 0;
-lock_and_exit:
- spin_lock(&sbi->s_fc_lock);
- return ret;
}
static int ext4_fc_perform_commit(journal_t *journal)
@@ -1074,12 +1062,12 @@ static int ext4_fc_perform_commit(journal_t *journal)
* and then lock the journal.
*/
jbd2_journal_lock_updates(journal);
- spin_lock(&sbi->s_fc_lock);
+ mutex_lock(&sbi->s_fc_lock);
list_for_each_entry(iter, &sbi->s_fc_q[FC_Q_MAIN], i_fc_list) {
ext4_set_inode_state(&iter->vfs_inode,
EXT4_STATE_FC_COMMITTING);
}
- spin_unlock(&sbi->s_fc_lock);
+ mutex_unlock(&sbi->s_fc_lock);
jbd2_journal_unlock_updates(journal);
ret = ext4_fc_submit_inode_data_all(journal);
@@ -1113,19 +1101,16 @@ static int ext4_fc_perform_commit(journal_t *journal)
}
}
- spin_lock(&sbi->s_fc_lock);
+ mutex_lock(&sbi->s_fc_lock);
ret = ext4_fc_commit_dentry_updates(journal, &crc);
- if (ret) {
- spin_unlock(&sbi->s_fc_lock);
+ if (ret)
goto out;
- }
list_for_each_entry(iter, &sbi->s_fc_q[FC_Q_MAIN], i_fc_list) {
inode = &iter->vfs_inode;
if (!ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING))
continue;
- spin_unlock(&sbi->s_fc_lock);
ret = ext4_fc_write_inode_data(inode, &crc);
if (ret)
goto out;
@@ -1144,13 +1129,11 @@ static int ext4_fc_perform_commit(journal_t *journal)
#else
wake_up_bit(&iter->i_flags, EXT4_STATE_FC_COMMITTING);
#endif
- spin_lock(&sbi->s_fc_lock);
}
- spin_unlock(&sbi->s_fc_lock);
-
ret = ext4_fc_write_tail(sb, crc);
out:
+ mutex_unlock(&sbi->s_fc_lock);
blk_finish_plug(&plug);
return ret;
}
@@ -1291,7 +1274,7 @@ static void ext4_fc_cleanup(journal_t *journal, int full, tid_t tid)
trace_ext4_fc_cleanup(journal, full, tid);
jbd2_fc_release_bufs(journal);
- spin_lock(&sbi->s_fc_lock);
+ mutex_lock(&sbi->s_fc_lock);
list_for_each_entry_safe(iter, iter_n, &sbi->s_fc_q[FC_Q_MAIN],
i_fc_list) {
ext4_clear_inode_state(&iter->vfs_inode,
@@ -1318,13 +1301,11 @@ static void ext4_fc_cleanup(journal_t *journal, int full, tid_t tid)
fcd_list);
list_del_init(&fc_dentry->fcd_list);
list_del_init(&fc_dentry->fcd_dilist);
- spin_unlock(&sbi->s_fc_lock);
if (fc_dentry->fcd_name.name &&
fc_dentry->fcd_name.len > DNAME_INLINE_LEN)
kfree(fc_dentry->fcd_name.name);
kmem_cache_free(ext4_fc_dentry_cachep, fc_dentry);
- spin_lock(&sbi->s_fc_lock);
}
list_splice_init(&sbi->s_fc_dentry_q[FC_Q_STAGING],
@@ -1339,7 +1320,7 @@ static void ext4_fc_cleanup(journal_t *journal, int full, tid_t tid)
if (full)
sbi->s_fc_bytes = 0;
- spin_unlock(&sbi->s_fc_lock);
+ mutex_unlock(&sbi->s_fc_lock);
trace_ext4_fc_stats(sb);
}
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 4f38a34b0..bef9fd128 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -4436,7 +4436,7 @@ static void ext4_fast_commit_init(struct super_block *sb)
sbi->s_fc_bytes = 0;
ext4_clear_mount_flag(sb, EXT4_MF_FC_INELIGIBLE);
sbi->s_fc_ineligible_tid = 0;
- spin_lock_init(&sbi->s_fc_lock);
+ mutex_init(&sbi->s_fc_lock);
memset(&sbi->s_fc_stats, 0, sizeof(sbi->s_fc_stats));
sbi->s_fc_replay_state.fc_regions = NULL;
sbi->s_fc_replay_state.fc_regions_size = 0;
--
2.46.0.184.g6999bdac58-goog
^ permalink raw reply related [flat|nested] 22+ messages in thread* Re: [PATCH v7 9/9] ext4: hold s_fc_lock while during fast commit
2024-08-18 4:03 ` [PATCH v7 9/9] ext4: hold s_fc_lock while during fast commit Harshad Shirwadkar
@ 2024-12-16 10:50 ` Jan Kara
2025-01-13 14:16 ` Baokun Li
1 sibling, 0 replies; 22+ messages in thread
From: Jan Kara @ 2024-12-16 10:50 UTC (permalink / raw)
To: Harshad Shirwadkar; +Cc: linux-ext4, tytso, jack, harshads
On Sun 18-08-24 04:03:56, Harshad Shirwadkar wrote:
> Leaving s_fc_lock in between during commit in ext4_fc_perform_commit()
> function leaves room for subtle concurrency bugs where ext4_fc_del() may
> delete an inode from the fast commit list, leaving list in an inconsistent
> state. Also, this patch converts s_fc_lock to mutex type so that it can be
> held when kmem_cache_* functions are called.
>
> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
This would be easier to review if you split the patch into two - one for
mindless conversion of a spinlock into a mutex and another one for the
change when the lock is held which can have non-trivial effects.
Otherwise the patch looks good, just one nit below. With that fixed feel
free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
> @@ -1010,26 +1011,22 @@ __releases(&sbi->s_fc_lock)
^^ the sparse annotation with __acquires, __releases needs updating as
well
> list_for_each_entry_safe(fc_dentry, fc_dentry_n,
> &sbi->s_fc_dentry_q[FC_Q_MAIN], fcd_list) {
> if (fc_dentry->fcd_op != EXT4_FC_TAG_CREAT) {
> - spin_unlock(&sbi->s_fc_lock);
> - if (!ext4_fc_add_dentry_tlv(sb, crc, fc_dentry)) {
> - ret = -ENOSPC;
> - goto lock_and_exit;
> - }
> - spin_lock(&sbi->s_fc_lock);
> + if (!ext4_fc_add_dentry_tlv(sb, crc, fc_dentry))
> + return -ENOSPC;
> continue;
> }
> /*
> * With fcd_dilist we need not loop in sbi->s_fc_q to get the
> - * corresponding inode pointer
> + * corresponding inode. Also, the corresponding inode could have been
> + * deleted, in which case, we don't need to do anything.
> */
> - WARN_ON(list_empty(&fc_dentry->fcd_dilist));
> + if (list_empty(&fc_dentry->fcd_dilist))
> + continue;
> ei = list_first_entry(&fc_dentry->fcd_dilist,
> struct ext4_inode_info, i_fc_dilist);
> inode = &ei->vfs_inode;
> WARN_ON(inode->i_ino != fc_dentry->fcd_ino);
>
> - spin_unlock(&sbi->s_fc_lock);
> -
> /*
> * We first write the inode and then the create dirent. This
> * allows the recovery code to create an unnamed inode first
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 22+ messages in thread* Re: [PATCH v7 9/9] ext4: hold s_fc_lock while during fast commit
2024-08-18 4:03 ` [PATCH v7 9/9] ext4: hold s_fc_lock while during fast commit Harshad Shirwadkar
2024-12-16 10:50 ` Jan Kara
@ 2025-01-13 14:16 ` Baokun Li
2025-04-14 16:59 ` harshad shirwadkar
1 sibling, 1 reply; 22+ messages in thread
From: Baokun Li @ 2025-01-13 14:16 UTC (permalink / raw)
To: Harshad Shirwadkar
Cc: linux-ext4, tytso, jack, harshads, Baokun Li, Yang Erkun
Hi Harshad,
On 2024/8/18 12:03, Harshad Shirwadkar wrote:
> Leaving s_fc_lock in between during commit in ext4_fc_perform_commit()
> function leaves room for subtle concurrency bugs where ext4_fc_del() may
> delete an inode from the fast commit list, leaving list in an inconsistent
> state. Also, this patch converts s_fc_lock to mutex type so that it can be
> held when kmem_cache_* functions are called.
>
> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
> ---
> fs/ext4/ext4.h | 2 +-
> fs/ext4/fast_commit.c | 91 +++++++++++++++++--------------------------
> fs/ext4/super.c | 2 +-
> 3 files changed, 38 insertions(+), 57 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 4ecb63f95..a1acd34ff 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1748,7 +1748,7 @@ struct ext4_sb_info {
> * following fields:
> * ei->i_fc_list, s_fc_dentry_q, s_fc_q, s_fc_bytes, s_fc_bh.
> */
> - spinlock_t s_fc_lock;
> + struct mutex s_fc_lock;
> struct buffer_head *s_fc_bh;
> struct ext4_fc_stats s_fc_stats;
> tid_t s_fc_ineligible_tid;
> diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
> index 7525450f1..c3627efd9 100644
> --- a/fs/ext4/fast_commit.c
> +++ b/fs/ext4/fast_commit.c
> @@ -236,9 +236,9 @@ void ext4_fc_del(struct inode *inode)
> if (ext4_fc_disabled(inode->i_sb))
> return;
>
> - spin_lock(&EXT4_SB(inode->i_sb)->s_fc_lock);
> + mutex_lock(&EXT4_SB(inode->i_sb)->s_fc_lock);
> if (list_empty(&ei->i_fc_list) && list_empty(&ei->i_fc_dilist)) {
> - spin_unlock(&EXT4_SB(inode->i_sb)->s_fc_lock);
> + mutex_unlock(&EXT4_SB(inode->i_sb)->s_fc_lock);
> return;
> }
>
> @@ -266,7 +266,7 @@ void ext4_fc_del(struct inode *inode)
> * dentry create references, since it is not needed to log it anyways.
> */
> if (list_empty(&ei->i_fc_dilist)) {
> - spin_unlock(&sbi->s_fc_lock);
> + mutex_unlock(&sbi->s_fc_lock);
> return;
> }
>
> @@ -276,7 +276,7 @@ void ext4_fc_del(struct inode *inode)
> list_del_init(&fc_dentry->fcd_dilist);
>
> WARN_ON(!list_empty(&ei->i_fc_dilist));
> - spin_unlock(&sbi->s_fc_lock);
> + mutex_unlock(&sbi->s_fc_lock);
>
> if (fc_dentry->fcd_name.name &&
> fc_dentry->fcd_name.len > DNAME_INLINE_LEN)
> @@ -306,10 +306,10 @@ void ext4_fc_mark_ineligible(struct super_block *sb, int reason, handle_t *handl
> sbi->s_journal->j_running_transaction->t_tid : 0;
> read_unlock(&sbi->s_journal->j_state_lock);
> }
> - spin_lock(&sbi->s_fc_lock);
> + mutex_lock(&sbi->s_fc_lock);
> if (tid_gt(tid, sbi->s_fc_ineligible_tid))
> sbi->s_fc_ineligible_tid = tid;
> - spin_unlock(&sbi->s_fc_lock);
> + mutex_unlock(&sbi->s_fc_lock);
> WARN_ON(reason >= EXT4_FC_REASON_MAX);
> sbi->s_fc_stats.fc_ineligible_reason_count[reason]++;
> }
> @@ -349,14 +349,14 @@ static int ext4_fc_track_template(
> if (!enqueue)
> return ret;
>
> - spin_lock(&sbi->s_fc_lock);
> + mutex_lock(&sbi->s_fc_lock);
> if (list_empty(&EXT4_I(inode)->i_fc_list))
> list_add_tail(&EXT4_I(inode)->i_fc_list,
> (sbi->s_journal->j_flags & JBD2_FULL_COMMIT_ONGOING ||
> sbi->s_journal->j_flags & JBD2_FAST_COMMIT_ONGOING) ?
> &sbi->s_fc_q[FC_Q_STAGING] :
> &sbi->s_fc_q[FC_Q_MAIN]);
> - spin_unlock(&sbi->s_fc_lock);
> + mutex_unlock(&sbi->s_fc_lock);
>
> return ret;
> }
> @@ -414,7 +414,8 @@ static int __track_dentry_update(struct inode *inode, void *arg, bool update)
> }
> node->fcd_name.len = dentry->d_name.len;
> INIT_LIST_HEAD(&node->fcd_dilist);
> - spin_lock(&sbi->s_fc_lock);
> + INIT_LIST_HEAD(&node->fcd_list);
> + mutex_lock(&sbi->s_fc_lock);
> if (sbi->s_journal->j_flags & JBD2_FULL_COMMIT_ONGOING ||
> sbi->s_journal->j_flags & JBD2_FAST_COMMIT_ONGOING)
> list_add_tail(&node->fcd_list,
> @@ -435,7 +436,7 @@ static int __track_dentry_update(struct inode *inode, void *arg, bool update)
> WARN_ON(!list_empty(&ei->i_fc_dilist));
> list_add_tail(&node->fcd_dilist, &ei->i_fc_dilist);
> }
> - spin_unlock(&sbi->s_fc_lock);
> + mutex_unlock(&sbi->s_fc_lock);
> spin_lock(&ei->i_fc_lock);
>
> return 0;
> @@ -955,15 +956,15 @@ static int ext4_fc_submit_inode_data_all(journal_t *journal)
> struct ext4_inode_info *ei;
> int ret = 0;
>
> - spin_lock(&sbi->s_fc_lock);
> + mutex_lock(&sbi->s_fc_lock);
> list_for_each_entry(ei, &sbi->s_fc_q[FC_Q_MAIN], i_fc_list) {
> - spin_unlock(&sbi->s_fc_lock);
> + mutex_unlock(&sbi->s_fc_lock);
> ret = jbd2_submit_inode_data(journal, ei->jinode);
> if (ret)
> return ret;
> - spin_lock(&sbi->s_fc_lock);
> + mutex_lock(&sbi->s_fc_lock);
> }
> - spin_unlock(&sbi->s_fc_lock);
> + mutex_unlock(&sbi->s_fc_lock);
>
We're also seeing a similar race condition here. This issue was encountered
while running `kvm-xfstests -c ext4/adv -C 500 generic/241`:
P1 | P2
----------------------------------------------------
evict
ext4_evict_inode
ext4_free_inode
ext4_clear_inode
ext4_fc_del(inode)
ext4_sync_file
ext4_fsync_journal
ext4_fc_commit
ext4_fc_perform_commit
ext4_fc_submit_inode_data_all
-- spin_lock(&sbi->s_fc_lock);
list_for_each_entry(i_fc_list)
-- spin_unlock(&sbi->s_fc_lock);
-- spin_lock(&sbi->s_fc_lock)
if (!list_empty(&ei->i_fc_list))
list_del_init(&ei->i_fc_list);
-- spin_unlock(&sbi->s_fc_lock);
jbd2_free_inode(EXT4_I(inode)->jinode)
EXT4_I(inode)->jinode = NULL
jbd2_submit_inode_data
journal->j_submit_inode_data_buffers
ext4_journal_submit_inode_data_buffers
ext4_should_journal_data(jinode->i_vfs_inode)
// a. jinode may use-after-free !!!
ext4_inode_journal_mode(inode)
EXT4_JOURNAL(inode)
(inode)->i_sb
// b. inode may null-ptr-deref !!!
-- spin_lock(&sbi->s_fc_lock);
-- spin_unlock(&sbi->s_fc_lock);
By the way, the WARN_ON added in patch 5 can detect this issue without
enabling KASAN, but patch 5 also introduced softlocks and other UAFs.
Regards,
Baokun
^ permalink raw reply [flat|nested] 22+ messages in thread* Re: [PATCH v7 9/9] ext4: hold s_fc_lock while during fast commit
2025-01-13 14:16 ` Baokun Li
@ 2025-04-14 16:59 ` harshad shirwadkar
0 siblings, 0 replies; 22+ messages in thread
From: harshad shirwadkar @ 2025-04-14 16:59 UTC (permalink / raw)
To: Baokun Li; +Cc: linux-ext4, tytso, jack, harshads, Yang Erkun
On Mon, Jan 13, 2025 at 6:16 AM Baokun Li <libaokun1@huawei.com> wrote:
>
> Hi Harshad,
>
> On 2024/8/18 12:03, Harshad Shirwadkar wrote:
> > Leaving s_fc_lock in between during commit in ext4_fc_perform_commit()
> > function leaves room for subtle concurrency bugs where ext4_fc_del() may
> > delete an inode from the fast commit list, leaving list in an inconsistent
> > state. Also, this patch converts s_fc_lock to mutex type so that it can be
> > held when kmem_cache_* functions are called.
> >
> > Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
> > ---
> > fs/ext4/ext4.h | 2 +-
> > fs/ext4/fast_commit.c | 91 +++++++++++++++++--------------------------
> > fs/ext4/super.c | 2 +-
> > 3 files changed, 38 insertions(+), 57 deletions(-)
> >
> > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> > index 4ecb63f95..a1acd34ff 100644
> > --- a/fs/ext4/ext4.h
> > +++ b/fs/ext4/ext4.h
> > @@ -1748,7 +1748,7 @@ struct ext4_sb_info {
> > * following fields:
> > * ei->i_fc_list, s_fc_dentry_q, s_fc_q, s_fc_bytes, s_fc_bh.
> > */
> > - spinlock_t s_fc_lock;
> > + struct mutex s_fc_lock;
> > struct buffer_head *s_fc_bh;
> > struct ext4_fc_stats s_fc_stats;
> > tid_t s_fc_ineligible_tid;
> > diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
> > index 7525450f1..c3627efd9 100644
> > --- a/fs/ext4/fast_commit.c
> > +++ b/fs/ext4/fast_commit.c
> > @@ -236,9 +236,9 @@ void ext4_fc_del(struct inode *inode)
> > if (ext4_fc_disabled(inode->i_sb))
> > return;
> >
> > - spin_lock(&EXT4_SB(inode->i_sb)->s_fc_lock);
> > + mutex_lock(&EXT4_SB(inode->i_sb)->s_fc_lock);
> > if (list_empty(&ei->i_fc_list) && list_empty(&ei->i_fc_dilist)) {
> > - spin_unlock(&EXT4_SB(inode->i_sb)->s_fc_lock);
> > + mutex_unlock(&EXT4_SB(inode->i_sb)->s_fc_lock);
> > return;
> > }
> >
> > @@ -266,7 +266,7 @@ void ext4_fc_del(struct inode *inode)
> > * dentry create references, since it is not needed to log it anyways.
> > */
> > if (list_empty(&ei->i_fc_dilist)) {
> > - spin_unlock(&sbi->s_fc_lock);
> > + mutex_unlock(&sbi->s_fc_lock);
> > return;
> > }
> >
> > @@ -276,7 +276,7 @@ void ext4_fc_del(struct inode *inode)
> > list_del_init(&fc_dentry->fcd_dilist);
> >
> > WARN_ON(!list_empty(&ei->i_fc_dilist));
> > - spin_unlock(&sbi->s_fc_lock);
> > + mutex_unlock(&sbi->s_fc_lock);
> >
> > if (fc_dentry->fcd_name.name &&
> > fc_dentry->fcd_name.len > DNAME_INLINE_LEN)
> > @@ -306,10 +306,10 @@ void ext4_fc_mark_ineligible(struct super_block *sb, int reason, handle_t *handl
> > sbi->s_journal->j_running_transaction->t_tid : 0;
> > read_unlock(&sbi->s_journal->j_state_lock);
> > }
> > - spin_lock(&sbi->s_fc_lock);
> > + mutex_lock(&sbi->s_fc_lock);
> > if (tid_gt(tid, sbi->s_fc_ineligible_tid))
> > sbi->s_fc_ineligible_tid = tid;
> > - spin_unlock(&sbi->s_fc_lock);
> > + mutex_unlock(&sbi->s_fc_lock);
> > WARN_ON(reason >= EXT4_FC_REASON_MAX);
> > sbi->s_fc_stats.fc_ineligible_reason_count[reason]++;
> > }
> > @@ -349,14 +349,14 @@ static int ext4_fc_track_template(
> > if (!enqueue)
> > return ret;
> >
> > - spin_lock(&sbi->s_fc_lock);
> > + mutex_lock(&sbi->s_fc_lock);
> > if (list_empty(&EXT4_I(inode)->i_fc_list))
> > list_add_tail(&EXT4_I(inode)->i_fc_list,
> > (sbi->s_journal->j_flags & JBD2_FULL_COMMIT_ONGOING ||
> > sbi->s_journal->j_flags & JBD2_FAST_COMMIT_ONGOING) ?
> > &sbi->s_fc_q[FC_Q_STAGING] :
> > &sbi->s_fc_q[FC_Q_MAIN]);
> > - spin_unlock(&sbi->s_fc_lock);
> > + mutex_unlock(&sbi->s_fc_lock);
> >
> > return ret;
> > }
> > @@ -414,7 +414,8 @@ static int __track_dentry_update(struct inode *inode, void *arg, bool update)
> > }
> > node->fcd_name.len = dentry->d_name.len;
> > INIT_LIST_HEAD(&node->fcd_dilist);
> > - spin_lock(&sbi->s_fc_lock);
> > + INIT_LIST_HEAD(&node->fcd_list);
> > + mutex_lock(&sbi->s_fc_lock);
> > if (sbi->s_journal->j_flags & JBD2_FULL_COMMIT_ONGOING ||
> > sbi->s_journal->j_flags & JBD2_FAST_COMMIT_ONGOING)
> > list_add_tail(&node->fcd_list,
> > @@ -435,7 +436,7 @@ static int __track_dentry_update(struct inode *inode, void *arg, bool update)
> > WARN_ON(!list_empty(&ei->i_fc_dilist));
> > list_add_tail(&node->fcd_dilist, &ei->i_fc_dilist);
> > }
> > - spin_unlock(&sbi->s_fc_lock);
> > + mutex_unlock(&sbi->s_fc_lock);
> > spin_lock(&ei->i_fc_lock);
> >
> > return 0;
> > @@ -955,15 +956,15 @@ static int ext4_fc_submit_inode_data_all(journal_t *journal)
> > struct ext4_inode_info *ei;
> > int ret = 0;
> >
> > - spin_lock(&sbi->s_fc_lock);
> > + mutex_lock(&sbi->s_fc_lock);
> > list_for_each_entry(ei, &sbi->s_fc_q[FC_Q_MAIN], i_fc_list) {
> > - spin_unlock(&sbi->s_fc_lock);
> > + mutex_unlock(&sbi->s_fc_lock);
> > ret = jbd2_submit_inode_data(journal, ei->jinode);
> > if (ret)
> > return ret;
> > - spin_lock(&sbi->s_fc_lock);
> > + mutex_lock(&sbi->s_fc_lock);
> > }
> > - spin_unlock(&sbi->s_fc_lock);
> > + mutex_unlock(&sbi->s_fc_lock);
> >
> We're also seeing a similar race condition here. This issue was encountered
> while running `kvm-xfstests -c ext4/adv -C 500 generic/241`:
>
> P1 | P2
> ----------------------------------------------------
> evict
> ext4_evict_inode
> ext4_free_inode
> ext4_clear_inode
> ext4_fc_del(inode)
> ext4_sync_file
> ext4_fsync_journal
> ext4_fc_commit
> ext4_fc_perform_commit
> ext4_fc_submit_inode_data_all
> -- spin_lock(&sbi->s_fc_lock);
> list_for_each_entry(i_fc_list)
> -- spin_unlock(&sbi->s_fc_lock);
> -- spin_lock(&sbi->s_fc_lock)
> if (!list_empty(&ei->i_fc_list))
> list_del_init(&ei->i_fc_list);
> -- spin_unlock(&sbi->s_fc_lock);
> jbd2_free_inode(EXT4_I(inode)->jinode)
> EXT4_I(inode)->jinode = NULL
> jbd2_submit_inode_data
> journal->j_submit_inode_data_buffers
> ext4_journal_submit_inode_data_buffers
> ext4_should_journal_data(jinode->i_vfs_inode)
> // a. jinode may use-after-free !!!
> ext4_inode_journal_mode(inode)
> EXT4_JOURNAL(inode)
> (inode)->i_sb
> // b. inode may null-ptr-deref !!!
> -- spin_lock(&sbi->s_fc_lock);
> -- spin_unlock(&sbi->s_fc_lock);
>
> By the way, the WARN_ON added in patch 5 can detect this issue without
> enabling KASAN, but patch 5 also introduced softlocks and other UAFs.
Thanks for mentioning this. V8 of the patchset handles this race by
not releasing s_fc_lock in ext4_fc_submit_inode_data_all().
- Harshad
>
>
> Regards,
> Baokun
>
^ permalink raw reply [flat|nested] 22+ messages in thread