* [PATCH 1/2] ext4: Improve credit estimate for EXT4_SINGLEDATA_TRANS_BLOCKS
@ 2013-04-09 8:34 Jan Kara
2013-04-09 8:34 ` [PATCH 2/2] ext4: Move quota initialization out of inode allocation transaction Jan Kara
2013-04-09 16:42 ` [PATCH 1/2] ext4: Improve credit estimate for EXT4_SINGLEDATA_TRANS_BLOCKS Theodore Ts'o
0 siblings, 2 replies; 10+ messages in thread
From: Jan Kara @ 2013-04-09 8:34 UTC (permalink / raw)
To: Ted Tso; +Cc: linux-ext4, Jan Kara
Estimate of 27 credits for allocation of a block in extent based inode
is unnecessarily high. We can easily argue 20 is enough.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/ext4/ext4_jbd2.h | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
index 4c216b1..042e463 100644
--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -29,11 +29,13 @@
* block to complete the transaction.
*
* For extents-enabled fs we may have to allocate and modify up to
- * 5 levels of tree + root which are stored in the inode. */
+ * 5 levels of tree, data block (for each of these we need bitmap + group
+ * summaries), root which is stored in the inode, sb
+ */
#define EXT4_SINGLEDATA_TRANS_BLOCKS(sb) \
(EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_EXTENTS) \
- ? 27U : 8U)
+ ? 20U : 8U)
/* Extended attribute operations touch at most two data buffers,
* two bitmap buffers, and two group summaries, in addition to the inode
--
1.7.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 2/2] ext4: Move quota initialization out of inode allocation transaction
2013-04-09 8:34 [PATCH 1/2] ext4: Improve credit estimate for EXT4_SINGLEDATA_TRANS_BLOCKS Jan Kara
@ 2013-04-09 8:34 ` Jan Kara
2013-04-09 16:42 ` Theodore Ts'o
2013-04-09 16:42 ` [PATCH 1/2] ext4: Improve credit estimate for EXT4_SINGLEDATA_TRANS_BLOCKS Theodore Ts'o
1 sibling, 1 reply; 10+ messages in thread
From: Jan Kara @ 2013-04-09 8:34 UTC (permalink / raw)
To: Ted Tso; +Cc: linux-ext4, Jan Kara
Inode allocation transaction is pretty heavy (246 credits with quotas
and extents before previous patch, still around 200 after it). This is
mostly due to credits required for allocation of quota structures
(credits there are heavily overestimated but it's difficult to make
better estimates if we don't want to wire non-trivial assumptions about
quota format into filesystem).
So move quota initialization out of allocation transaction. That way
transaction for quota structure allocation will be started only if we
need to look up quota structure on disk (rare) and furthermore it will
be started for each quota type separately, not for all of them at once.
This reduces maximum transaction size to 34 is most cases and to 73 in
the worst case.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/ext4/ialloc.c | 62 +++++++++++++++++++++++++++---------------------------
fs/ext4/namei.c | 12 +++-------
2 files changed, 35 insertions(+), 39 deletions(-)
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 6c5bb8d..3f37297 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -652,6 +652,7 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
struct inode *ret;
ext4_group_t i;
ext4_group_t flex_group;
+ bool quota_allocated = false;
/* Cannot create files in a deleted directory */
if (!dir || !dir->i_nlink)
@@ -666,6 +667,23 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
ei = EXT4_I(inode);
sbi = EXT4_SB(sb);
+ /*
+ * Initalize owners and quota early so that we don't have to account
+ * for quota initialization worst case in standard inode creating
+ * transaction
+ */
+ if (owner) {
+ inode->i_mode = mode;
+ i_uid_write(inode, owner[0]);
+ i_gid_write(inode, owner[1]);
+ } else if (test_opt(sb, GRPID)) {
+ inode->i_mode = mode;
+ inode->i_uid = current_fsuid();
+ inode->i_gid = dir->i_gid;
+ } else
+ inode_init_owner(inode, dir, mode);
+ dquot_initialize(inode);
+
if (!goal)
goal = sbi->s_inode_goal;
@@ -851,17 +869,6 @@ got:
flex_group = ext4_flex_group(sbi, group);
atomic_dec(&sbi->s_flex_groups[flex_group].free_inodes);
}
- if (owner) {
- inode->i_mode = mode;
- i_uid_write(inode, owner[0]);
- i_gid_write(inode, owner[1]);
- } else if (test_opt(sb, GRPID)) {
- inode->i_mode = mode;
- inode->i_uid = current_fsuid();
- inode->i_gid = dir->i_gid;
- } else
- inode_init_owner(inode, dir, mode);
-
inode->i_ino = ino + group * EXT4_INODES_PER_GROUP(sb);
/* This is the optimal IO size (for stat), not the fs block size */
inode->i_blocks = 0;
@@ -918,18 +925,18 @@ got:
ext4_set_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA);
ret = inode;
- dquot_initialize(inode);
err = dquot_alloc_inode(inode);
if (err)
- goto fail_drop;
+ goto out;
+ quota_allocated = true;
err = ext4_init_acl(handle, inode, dir);
if (err)
- goto fail_free_drop;
+ goto out;
err = ext4_init_security(handle, inode, dir, qstr);
if (err)
- goto fail_free_drop;
+ goto out;
if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_EXTENTS)) {
/* set extent flag only for directory, file and normal symlink*/
@@ -945,10 +952,8 @@ got:
}
err = ext4_mark_inode_dirty(handle, inode);
- if (err) {
- ext4_std_error(sb, err);
- goto fail_free_drop;
- }
+ if (err)
+ goto fail;
ext4_debug("allocating inode %lu\n", inode->i_ino);
trace_ext4_allocate_inode(inode, dir, mode);
@@ -956,23 +961,18 @@ got:
fail:
ext4_std_error(sb, err);
out:
- iput(inode);
- ret = ERR_PTR(err);
-really_out:
- brelse(inode_bitmap_bh);
- return ret;
-
-fail_free_drop:
- dquot_free_inode(inode);
-
-fail_drop:
+ if (quota_allocated)
+ dquot_free_inode(inode);
dquot_drop(inode);
inode->i_flags |= S_NOQUOTA;
clear_nlink(inode);
- unlock_new_inode(inode);
+ if (inode->i_state & I_NEW)
+ unlock_new_inode(inode);
iput(inode);
+ ret = ERR_PTR(err);
+really_out:
brelse(inode_bitmap_bh);
- return ERR_PTR(err);
+ return ret;
}
/* Verify that we are loading a valid orphan from disk */
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 3825d6a..ede1337 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2251,8 +2251,7 @@ static int ext4_create(struct inode *dir, struct dentry *dentry, umode_t mode,
dquot_initialize(dir);
credits = (EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
- EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
- EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb));
+ EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3);
retry:
inode = ext4_new_inode_start_handle(dir, mode, &dentry->d_name, 0,
NULL, EXT4_HT_DIR, credits);
@@ -2286,8 +2285,7 @@ static int ext4_mknod(struct inode *dir, struct dentry *dentry,
dquot_initialize(dir);
credits = (EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
- EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
- EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb));
+ EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3);
retry:
inode = ext4_new_inode_start_handle(dir, mode, &dentry->d_name, 0,
NULL, EXT4_HT_DIR, credits);
@@ -2396,8 +2394,7 @@ static int ext4_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
dquot_initialize(dir);
credits = (EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
- EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
- EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb));
+ EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3);
retry:
inode = ext4_new_inode_start_handle(dir, S_IFDIR | mode,
&dentry->d_name,
@@ -2826,8 +2823,7 @@ static int ext4_symlink(struct inode *dir,
* quota blocks, sb is already counted in previous macros).
*/
credits = EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
- EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
- EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb);
+ EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3;
}
retry:
inode = ext4_new_inode_start_handle(dir, S_IFLNK|S_IRWXUGO,
--
1.7.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] ext4: Improve credit estimate for EXT4_SINGLEDATA_TRANS_BLOCKS
2013-04-09 8:34 [PATCH 1/2] ext4: Improve credit estimate for EXT4_SINGLEDATA_TRANS_BLOCKS Jan Kara
2013-04-09 8:34 ` [PATCH 2/2] ext4: Move quota initialization out of inode allocation transaction Jan Kara
@ 2013-04-09 16:42 ` Theodore Ts'o
1 sibling, 0 replies; 10+ messages in thread
From: Theodore Ts'o @ 2013-04-09 16:42 UTC (permalink / raw)
To: Jan Kara; +Cc: linux-ext4
On Tue, Apr 09, 2013 at 10:34:07AM +0200, Jan Kara wrote:
> Estimate of 27 credits for allocation of a block in extent based inode
> is unnecessarily high. We can easily argue 20 is enough.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
Thanks, added to the dev branch for testing.
- Ted
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] ext4: Move quota initialization out of inode allocation transaction
2013-04-09 8:34 ` [PATCH 2/2] ext4: Move quota initialization out of inode allocation transaction Jan Kara
@ 2013-04-09 16:42 ` Theodore Ts'o
2013-04-18 19:07 ` Theodore Ts'o
0 siblings, 1 reply; 10+ messages in thread
From: Theodore Ts'o @ 2013-04-09 16:42 UTC (permalink / raw)
To: Jan Kara; +Cc: linux-ext4
On Tue, Apr 09, 2013 at 10:34:08AM +0200, Jan Kara wrote:
> Inode allocation transaction is pretty heavy (246 credits with quotas
> and extents before previous patch, still around 200 after it). This is
> mostly due to credits required for allocation of quota structures
> (credits there are heavily overestimated but it's difficult to make
> better estimates if we don't want to wire non-trivial assumptions about
> quota format into filesystem).
>
> So move quota initialization out of allocation transaction. That way
> transaction for quota structure allocation will be started only if we
> need to look up quota structure on disk (rare) and furthermore it will
> be started for each quota type separately, not for all of them at once.
> This reduces maximum transaction size to 34 is most cases and to 73 in
> the worst case.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
Thanks, added to the dev branch for testing.
- Ted
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] ext4: Move quota initialization out of inode allocation transaction
2013-04-09 16:42 ` Theodore Ts'o
@ 2013-04-18 19:07 ` Theodore Ts'o
2013-04-18 21:46 ` Jan Kara
0 siblings, 1 reply; 10+ messages in thread
From: Theodore Ts'o @ 2013-04-18 19:07 UTC (permalink / raw)
To: Jan Kara; +Cc: linux-ext4
On Tue, Apr 09, 2013 at 12:42:37PM -0400, Theodore Ts'o wrote:
>
> Thanks, added to the dev branch for testing.
>
This patch was causing a reliable failure for xfstests generic/083
using bigalloc. I bisected down this patch, and then localized the
failure down to the insert_inode_locked() call in __ext4_new_inode()
failing (which means that there is an old inode in the inode hash
lists with the same inode number)
The error seems to be caused by the error handling code paths. I
believe the problem was caused by clear_nlink() and unlock_new_inode()
getting called in some error cleanup paths when the inode that hadn't
yet been inserted into inode hash lists.
Here is a revamped patch which has a cleaned up set of error paths.
The only change is in the cleanup paths, and this causes the
generic/083 test for bigalloc to pass where it was previously failing.
- Ted
commit 623df013725903824ac341a32e9c11c72752d8eb
Author: Jan Kara <jack@suse.cz>
Date: Thu Apr 18 15:03:28 2013 -0400
ext4: move quota initialization out of inode allocation transaction
Inode allocation transaction is pretty heavy (246 credits with quotas
and extents before previous patch, still around 200 after it). This is
mostly due to credits required for allocation of quota structures
(credits there are heavily overestimated but it's difficult to make
better estimates if we don't want to wire non-trivial assumptions about
quota format into filesystem).
So move quota initialization out of allocation transaction. That way
transaction for quota structure allocation will be started only if we
need to look up quota structure on disk (rare) and furthermore it will
be started for each quota type separately, not for all of them at once.
This reduces maximum transaction size to 34 is most cases and to 73 in
the worst case.
[ Modified by tytso to clean up the cleanup paths for error handling.
Also use a separate call to ext4_std_error() for each failure so it
is easier for someone who is debugging a problem in this function to
determine which function call failed. ]
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 4c358f7..9081fef 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -666,6 +666,23 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
ei = EXT4_I(inode);
sbi = EXT4_SB(sb);
+ /*
+ * Initalize owners and quota early so that we don't have to account
+ * for quota initialization worst case in standard inode creating
+ * transaction
+ */
+ if (owner) {
+ inode->i_mode = mode;
+ i_uid_write(inode, owner[0]);
+ i_gid_write(inode, owner[1]);
+ } else if (test_opt(sb, GRPID)) {
+ inode->i_mode = mode;
+ inode->i_uid = current_fsuid();
+ inode->i_gid = dir->i_gid;
+ } else
+ inode_init_owner(inode, dir, mode);
+ dquot_initialize(inode);
+
if (!goal)
goal = sbi->s_inode_goal;
@@ -697,7 +714,7 @@ got_group:
gdp = ext4_get_group_desc(sb, group, &group_desc_bh);
if (!gdp)
- goto fail;
+ goto out;
/*
* Check free inodes count before loading bitmap.
@@ -711,7 +728,7 @@ got_group:
brelse(inode_bitmap_bh);
inode_bitmap_bh = ext4_read_inode_bitmap(sb, group);
if (!inode_bitmap_bh)
- goto fail;
+ goto out;
repeat_in_this_group:
ino = ext4_find_next_zero_bit((unsigned long *)
@@ -733,13 +750,17 @@ repeat_in_this_group:
handle_type, nblocks);
if (IS_ERR(handle)) {
err = PTR_ERR(handle);
- goto fail;
+ ext4_std_error(sb, err);
+ goto out;
}
+
}
BUFFER_TRACE(inode_bitmap_bh, "get_write_access");
err = ext4_journal_get_write_access(handle, inode_bitmap_bh);
- if (err)
- goto fail;
+ if (err) {
+ ext4_std_error(sb, err);
+ goto out;
+ }
ext4_lock_group(sb, group);
ret2 = ext4_test_and_set_bit(ino, inode_bitmap_bh->b_data);
ext4_unlock_group(sb, group);
@@ -755,8 +776,10 @@ repeat_in_this_group:
got:
BUFFER_TRACE(inode_bitmap_bh, "call ext4_handle_dirty_metadata");
err = ext4_handle_dirty_metadata(handle, NULL, inode_bitmap_bh);
- if (err)
- goto fail;
+ if (err) {
+ ext4_std_error(sb, err);
+ goto out;
+ }
/* We may have to initialize the block bitmap if it isn't already */
if (ext4_has_group_desc_csum(sb) &&
@@ -768,7 +791,8 @@ got:
err = ext4_journal_get_write_access(handle, block_bitmap_bh);
if (err) {
brelse(block_bitmap_bh);
- goto fail;
+ ext4_std_error(sb, err);
+ goto out;
}
BUFFER_TRACE(block_bitmap_bh, "dirty block bitmap");
@@ -787,14 +811,18 @@ got:
ext4_unlock_group(sb, group);
brelse(block_bitmap_bh);
- if (err)
- goto fail;
+ if (err) {
+ ext4_std_error(sb, err);
+ goto out;
+ }
}
BUFFER_TRACE(group_desc_bh, "get_write_access");
err = ext4_journal_get_write_access(handle, group_desc_bh);
- if (err)
- goto fail;
+ if (err) {
+ ext4_std_error(sb, err);
+ goto out;
+ }
/* Update the relevant bg descriptor fields */
if (ext4_has_group_desc_csum(sb)) {
@@ -840,8 +868,10 @@ got:
BUFFER_TRACE(group_desc_bh, "call ext4_handle_dirty_metadata");
err = ext4_handle_dirty_metadata(handle, NULL, group_desc_bh);
- if (err)
- goto fail;
+ if (err) {
+ ext4_std_error(sb, err);
+ goto out;
+ }
percpu_counter_dec(&sbi->s_freeinodes_counter);
if (S_ISDIR(mode))
@@ -851,16 +881,6 @@ got:
flex_group = ext4_flex_group(sbi, group);
atomic_dec(&sbi->s_flex_groups[flex_group].free_inodes);
}
- if (owner) {
- inode->i_mode = mode;
- i_uid_write(inode, owner[0]);
- i_gid_write(inode, owner[1]);
- } else if (test_opt(sb, GRPID)) {
- inode->i_mode = mode;
- inode->i_uid = current_fsuid();
- inode->i_gid = dir->i_gid;
- } else
- inode_init_owner(inode, dir, mode);
inode->i_ino = ino + group * EXT4_INODES_PER_GROUP(sb);
/* This is the optimal IO size (for stat), not the fs block size */
@@ -889,7 +909,9 @@ got:
* twice.
*/
err = -EIO;
- goto fail;
+ ext4_error(sb, "failed to insert inode %lu: doubly allocated?",
+ inode->i_ino);
+ goto out;
}
spin_lock(&sbi->s_next_gen_lock);
inode->i_generation = sbi->s_next_generation++;
@@ -917,7 +939,6 @@ got:
ext4_set_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA);
ret = inode;
- dquot_initialize(inode);
err = dquot_alloc_inode(inode);
if (err)
goto fail_drop;
@@ -951,24 +972,17 @@ got:
ext4_debug("allocating inode %lu\n", inode->i_ino);
trace_ext4_allocate_inode(inode, dir, mode);
- goto really_out;
-fail:
- ext4_std_error(sb, err);
-out:
- iput(inode);
- ret = ERR_PTR(err);
-really_out:
brelse(inode_bitmap_bh);
return ret;
fail_free_drop:
dquot_free_inode(inode);
-
fail_drop:
- dquot_drop(inode);
inode->i_flags |= S_NOQUOTA;
clear_nlink(inode);
unlock_new_inode(inode);
+out:
+ dquot_drop(inode);
iput(inode);
brelse(inode_bitmap_bh);
return ERR_PTR(err);
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 45a5ca8..955c907 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2252,8 +2252,7 @@ static int ext4_create(struct inode *dir, struct dentry *dentry, umode_t mode,
dquot_initialize(dir);
credits = (EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
- EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
- EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb));
+ EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3);
retry:
inode = ext4_new_inode_start_handle(dir, mode, &dentry->d_name, 0,
NULL, EXT4_HT_DIR, credits);
@@ -2287,8 +2286,7 @@ static int ext4_mknod(struct inode *dir, struct dentry *dentry,
dquot_initialize(dir);
credits = (EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
- EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
- EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb));
+ EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3);
retry:
inode = ext4_new_inode_start_handle(dir, mode, &dentry->d_name, 0,
NULL, EXT4_HT_DIR, credits);
@@ -2397,8 +2395,7 @@ static int ext4_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
dquot_initialize(dir);
credits = (EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
- EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
- EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb));
+ EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3);
retry:
inode = ext4_new_inode_start_handle(dir, S_IFDIR | mode,
&dentry->d_name,
@@ -2827,8 +2824,7 @@ static int ext4_symlink(struct inode *dir,
* quota blocks, sb is already counted in previous macros).
*/
credits = EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
- EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
- EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb);
+ EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3;
}
retry:
inode = ext4_new_inode_start_handle(dir, S_IFLNK|S_IRWXUGO,
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] ext4: Move quota initialization out of inode allocation transaction
2013-04-18 19:07 ` Theodore Ts'o
@ 2013-04-18 21:46 ` Jan Kara
2013-04-18 23:24 ` [PATCH -v3] ext4: move " Theodore Ts'o
0 siblings, 1 reply; 10+ messages in thread
From: Jan Kara @ 2013-04-18 21:46 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: Jan Kara, linux-ext4
On Thu 18-04-13 15:07:38, Ted Tso wrote:
> On Tue, Apr 09, 2013 at 12:42:37PM -0400, Theodore Ts'o wrote:
> >
> > Thanks, added to the dev branch for testing.
> >
>
> This patch was causing a reliable failure for xfstests generic/083
> using bigalloc. I bisected down this patch, and then localized the
> failure down to the insert_inode_locked() call in __ext4_new_inode()
> failing (which means that there is an old inode in the inode hash
> lists with the same inode number)
>
> The error seems to be caused by the error handling code paths. I
> believe the problem was caused by clear_nlink() and unlock_new_inode()
> getting called in some error cleanup paths when the inode that hadn't
> yet been inserted into inode hash lists.
>
> Here is a revamped patch which has a cleaned up set of error paths.
> The only change is in the cleanup paths, and this causes the
> generic/083 test for bigalloc to pass where it was previously failing.
Thanks for catching this. But there's a bug in the new error handling as
well:
> @@ -733,13 +750,17 @@ repeat_in_this_group:
> handle_type, nblocks);
> if (IS_ERR(handle)) {
> err = PTR_ERR(handle);
> - goto fail;
> + ext4_std_error(sb, err);
> + goto out;
> }
> +
Extra empty line...
> @@ -951,24 +972,17 @@ got:
>
> ext4_debug("allocating inode %lu\n", inode->i_ino);
> trace_ext4_allocate_inode(inode, dir, mode);
> - goto really_out;
> -fail:
> - ext4_std_error(sb, err);
> -out:
> - iput(inode);
> - ret = ERR_PTR(err);
> -really_out:
> brelse(inode_bitmap_bh);
> return ret;
>
> fail_free_drop:
> dquot_free_inode(inode);
> -
> fail_drop:
> - dquot_drop(inode);
> inode->i_flags |= S_NOQUOTA;
> clear_nlink(inode);
> unlock_new_inode(inode);
> +out:
> + dquot_drop(inode);
> iput(inode);
> brelse(inode_bitmap_bh);
> return ERR_PTR(err);
You need to move
inode->i_flags |= S_NOQUOTA;
along with dquot_drop() call as dquot_drop() will do nothing on an inode
marked with S_NOQUOTA so we leak dquot references...
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH -v3] ext4: move quota initialization out of inode allocation transaction
2013-04-18 21:46 ` Jan Kara
@ 2013-04-18 23:24 ` Theodore Ts'o
2013-04-19 14:30 ` Jan Kara
0 siblings, 1 reply; 10+ messages in thread
From: Theodore Ts'o @ 2013-04-18 23:24 UTC (permalink / raw)
To: Ext4 Developers List; +Cc: Jan Kara, Theodore Ts'o
From: Jan Kara <jack@suse.cz>
Inode allocation transaction is pretty heavy (246 credits with quotas
and extents before previous patch, still around 200 after it). This is
mostly due to credits required for allocation of quota structures
(credits there are heavily overestimated but it's difficult to make
better estimates if we don't want to wire non-trivial assumptions about
quota format into filesystem).
So move quota initialization out of allocation transaction. That way
transaction for quota structure allocation will be started only if we
need to look up quota structure on disk (rare) and furthermore it will
be started for each quota type separately, not for all of them at once.
This reduces maximum transaction size to 34 is most cases and to 73 in
the worst case.
[ Modified by tytso to clean up the cleanup paths for error handling.
Also use a separate call to ext4_std_error() for each failure so it
is easier for someone who is debugging a problem in this function to
determine which function call failed. ]
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
fs/ext4/ialloc.c | 85 ++++++++++++++++++++++++++++++++------------------------
fs/ext4/namei.c | 12 +++-----
2 files changed, 53 insertions(+), 44 deletions(-)
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 4c358f7..d92e4ff 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -666,6 +666,23 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
ei = EXT4_I(inode);
sbi = EXT4_SB(sb);
+ /*
+ * Initalize owners and quota early so that we don't have to account
+ * for quota initialization worst case in standard inode creating
+ * transaction
+ */
+ if (owner) {
+ inode->i_mode = mode;
+ i_uid_write(inode, owner[0]);
+ i_gid_write(inode, owner[1]);
+ } else if (test_opt(sb, GRPID)) {
+ inode->i_mode = mode;
+ inode->i_uid = current_fsuid();
+ inode->i_gid = dir->i_gid;
+ } else
+ inode_init_owner(inode, dir, mode);
+ dquot_initialize(inode);
+
if (!goal)
goal = sbi->s_inode_goal;
@@ -697,7 +714,7 @@ got_group:
gdp = ext4_get_group_desc(sb, group, &group_desc_bh);
if (!gdp)
- goto fail;
+ goto out;
/*
* Check free inodes count before loading bitmap.
@@ -711,7 +728,7 @@ got_group:
brelse(inode_bitmap_bh);
inode_bitmap_bh = ext4_read_inode_bitmap(sb, group);
if (!inode_bitmap_bh)
- goto fail;
+ goto out;
repeat_in_this_group:
ino = ext4_find_next_zero_bit((unsigned long *)
@@ -733,13 +750,16 @@ repeat_in_this_group:
handle_type, nblocks);
if (IS_ERR(handle)) {
err = PTR_ERR(handle);
- goto fail;
+ ext4_std_error(sb, err);
+ goto out;
}
}
BUFFER_TRACE(inode_bitmap_bh, "get_write_access");
err = ext4_journal_get_write_access(handle, inode_bitmap_bh);
- if (err)
- goto fail;
+ if (err) {
+ ext4_std_error(sb, err);
+ goto out;
+ }
ext4_lock_group(sb, group);
ret2 = ext4_test_and_set_bit(ino, inode_bitmap_bh->b_data);
ext4_unlock_group(sb, group);
@@ -755,8 +775,10 @@ repeat_in_this_group:
got:
BUFFER_TRACE(inode_bitmap_bh, "call ext4_handle_dirty_metadata");
err = ext4_handle_dirty_metadata(handle, NULL, inode_bitmap_bh);
- if (err)
- goto fail;
+ if (err) {
+ ext4_std_error(sb, err);
+ goto out;
+ }
/* We may have to initialize the block bitmap if it isn't already */
if (ext4_has_group_desc_csum(sb) &&
@@ -768,7 +790,8 @@ got:
err = ext4_journal_get_write_access(handle, block_bitmap_bh);
if (err) {
brelse(block_bitmap_bh);
- goto fail;
+ ext4_std_error(sb, err);
+ goto out;
}
BUFFER_TRACE(block_bitmap_bh, "dirty block bitmap");
@@ -787,14 +810,18 @@ got:
ext4_unlock_group(sb, group);
brelse(block_bitmap_bh);
- if (err)
- goto fail;
+ if (err) {
+ ext4_std_error(sb, err);
+ goto out;
+ }
}
BUFFER_TRACE(group_desc_bh, "get_write_access");
err = ext4_journal_get_write_access(handle, group_desc_bh);
- if (err)
- goto fail;
+ if (err) {
+ ext4_std_error(sb, err);
+ goto out;
+ }
/* Update the relevant bg descriptor fields */
if (ext4_has_group_desc_csum(sb)) {
@@ -840,8 +867,10 @@ got:
BUFFER_TRACE(group_desc_bh, "call ext4_handle_dirty_metadata");
err = ext4_handle_dirty_metadata(handle, NULL, group_desc_bh);
- if (err)
- goto fail;
+ if (err) {
+ ext4_std_error(sb, err);
+ goto out;
+ }
percpu_counter_dec(&sbi->s_freeinodes_counter);
if (S_ISDIR(mode))
@@ -851,16 +880,6 @@ got:
flex_group = ext4_flex_group(sbi, group);
atomic_dec(&sbi->s_flex_groups[flex_group].free_inodes);
}
- if (owner) {
- inode->i_mode = mode;
- i_uid_write(inode, owner[0]);
- i_gid_write(inode, owner[1]);
- } else if (test_opt(sb, GRPID)) {
- inode->i_mode = mode;
- inode->i_uid = current_fsuid();
- inode->i_gid = dir->i_gid;
- } else
- inode_init_owner(inode, dir, mode);
inode->i_ino = ino + group * EXT4_INODES_PER_GROUP(sb);
/* This is the optimal IO size (for stat), not the fs block size */
@@ -889,7 +908,9 @@ got:
* twice.
*/
err = -EIO;
- goto fail;
+ ext4_error(sb, "failed to insert inode %lu: doubly allocated?",
+ inode->i_ino);
+ goto out;
}
spin_lock(&sbi->s_next_gen_lock);
inode->i_generation = sbi->s_next_generation++;
@@ -917,7 +938,6 @@ got:
ext4_set_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA);
ret = inode;
- dquot_initialize(inode);
err = dquot_alloc_inode(inode);
if (err)
goto fail_drop;
@@ -951,24 +971,17 @@ got:
ext4_debug("allocating inode %lu\n", inode->i_ino);
trace_ext4_allocate_inode(inode, dir, mode);
- goto really_out;
-fail:
- ext4_std_error(sb, err);
-out:
- iput(inode);
- ret = ERR_PTR(err);
-really_out:
brelse(inode_bitmap_bh);
return ret;
fail_free_drop:
dquot_free_inode(inode);
-
fail_drop:
- dquot_drop(inode);
- inode->i_flags |= S_NOQUOTA;
clear_nlink(inode);
unlock_new_inode(inode);
+out:
+ inode->i_flags |= S_NOQUOTA;
+ dquot_drop(inode);
iput(inode);
brelse(inode_bitmap_bh);
return ERR_PTR(err);
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 45a5ca8..955c907 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2252,8 +2252,7 @@ static int ext4_create(struct inode *dir, struct dentry *dentry, umode_t mode,
dquot_initialize(dir);
credits = (EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
- EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
- EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb));
+ EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3);
retry:
inode = ext4_new_inode_start_handle(dir, mode, &dentry->d_name, 0,
NULL, EXT4_HT_DIR, credits);
@@ -2287,8 +2286,7 @@ static int ext4_mknod(struct inode *dir, struct dentry *dentry,
dquot_initialize(dir);
credits = (EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
- EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
- EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb));
+ EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3);
retry:
inode = ext4_new_inode_start_handle(dir, mode, &dentry->d_name, 0,
NULL, EXT4_HT_DIR, credits);
@@ -2397,8 +2395,7 @@ static int ext4_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
dquot_initialize(dir);
credits = (EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
- EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
- EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb));
+ EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3);
retry:
inode = ext4_new_inode_start_handle(dir, S_IFDIR | mode,
&dentry->d_name,
@@ -2827,8 +2824,7 @@ static int ext4_symlink(struct inode *dir,
* quota blocks, sb is already counted in previous macros).
*/
credits = EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
- EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
- EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb);
+ EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3;
}
retry:
inode = ext4_new_inode_start_handle(dir, S_IFLNK|S_IRWXUGO,
--
1.7.12.rc0.22.gcdd159b
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH -v3] ext4: move quota initialization out of inode allocation transaction
2013-04-18 23:24 ` [PATCH -v3] ext4: move " Theodore Ts'o
@ 2013-04-19 14:30 ` Jan Kara
2013-04-19 16:24 ` Theodore Ts'o
0 siblings, 1 reply; 10+ messages in thread
From: Jan Kara @ 2013-04-19 14:30 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: Ext4 Developers List, Jan Kara
Still not quite...
On Thu 18-04-13 19:24:04, Ted Tso wrote:
> @@ -951,24 +971,17 @@ got:
>
> ext4_debug("allocating inode %lu\n", inode->i_ino);
> trace_ext4_allocate_inode(inode, dir, mode);
> - goto really_out;
> -fail:
> - ext4_std_error(sb, err);
> -out:
> - iput(inode);
> - ret = ERR_PTR(err);
> -really_out:
> brelse(inode_bitmap_bh);
> return ret;
>
> fail_free_drop:
> dquot_free_inode(inode);
> -
> fail_drop:
> - dquot_drop(inode);
> - inode->i_flags |= S_NOQUOTA;
> clear_nlink(inode);
> unlock_new_inode(inode);
> +out:
> + inode->i_flags |= S_NOQUOTA;
> + dquot_drop(inode);
The above two lines have to be swapped to preserve original ordering
which is actually important. Thanks.
Honza
> iput(inode);
> brelse(inode_bitmap_bh);
> return ERR_PTR(err);
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH -v3] ext4: move quota initialization out of inode allocation transaction
2013-04-19 14:30 ` Jan Kara
@ 2013-04-19 16:24 ` Theodore Ts'o
2013-04-19 19:21 ` Jan Kara
0 siblings, 1 reply; 10+ messages in thread
From: Theodore Ts'o @ 2013-04-19 16:24 UTC (permalink / raw)
To: Jan Kara; +Cc: Ext4 Developers List
On Fri, Apr 19, 2013 at 04:30:28PM +0200, Jan Kara wrote:
> > + inode->i_flags |= S_NOQUOTA;
> > + dquot_drop(inode);
> The above two lines have to be swapped to preserve original ordering
> which is actually important. Thanks.
>
> > iput(inode);
OK. I'm confused --- what is the function of the S_NOQUOTA flag here?
I thought it was dquot_drop() which needed this magic flag set. I can
swap these two lines, but maybe we should add a comment somewhere
about why this flag is needed and what the heck it's doing? At this
point, the use of this flag seems like total magic to me. :-(
- Ted
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH -v3] ext4: move quota initialization out of inode allocation transaction
2013-04-19 16:24 ` Theodore Ts'o
@ 2013-04-19 19:21 ` Jan Kara
0 siblings, 0 replies; 10+ messages in thread
From: Jan Kara @ 2013-04-19 19:21 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: Jan Kara, Ext4 Developers List
On Fri 19-04-13 12:24:53, Ted Tso wrote:
> On Fri, Apr 19, 2013 at 04:30:28PM +0200, Jan Kara wrote:
> > > + inode->i_flags |= S_NOQUOTA;
> > > + dquot_drop(inode);
> > The above two lines have to be swapped to preserve original ordering
> > which is actually important. Thanks.
> >
> > > iput(inode);
>
> OK. I'm confused --- what is the function of the S_NOQUOTA flag here?
> I thought it was dquot_drop() which needed this magic flag set. I can
> swap these two lines, but maybe we should add a comment somewhere
> about why this flag is needed and what the heck it's doing? At this
> point, the use of this flag seems like total magic to me. :-(
S_NOQUOTA flag means for quota code: Don't touch this inode, don't
account this inode or it's blocks. So in the file creation path we are
setting it when we bail out before inode is actually accounted in quota
(so that we don't decrease used inode count when freeing the inode
without incrementing it). To make things simple, we also set the flag when
the inode was already accounted for, but we handle the quota cleanup
ourselves (see fail_free_drop label).
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2013-04-19 19:21 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-04-09 8:34 [PATCH 1/2] ext4: Improve credit estimate for EXT4_SINGLEDATA_TRANS_BLOCKS Jan Kara
2013-04-09 8:34 ` [PATCH 2/2] ext4: Move quota initialization out of inode allocation transaction Jan Kara
2013-04-09 16:42 ` Theodore Ts'o
2013-04-18 19:07 ` Theodore Ts'o
2013-04-18 21:46 ` Jan Kara
2013-04-18 23:24 ` [PATCH -v3] ext4: move " Theodore Ts'o
2013-04-19 14:30 ` Jan Kara
2013-04-19 16:24 ` Theodore Ts'o
2013-04-19 19:21 ` Jan Kara
2013-04-09 16:42 ` [PATCH 1/2] ext4: Improve credit estimate for EXT4_SINGLEDATA_TRANS_BLOCKS Theodore Ts'o
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).