From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
To: Mingming Cao <cmm@us.ibm.com>
Cc: tytso <tytso@mit.edu>, Shehjar Tikoo <shehjart@cse.unsw.edu.au>,
linux-ext4@vger.kernel.org
Subject: Re: [RFC]Ext4: journal credits reservation fixes for DIO, fallocate and delalloc writepages
Date: Mon, 28 Jul 2008 21:41:56 +0530 [thread overview]
Message-ID: <20080728161156.GB4545@skywalker> (raw)
In-Reply-To: <1217014002.6322.29.camel@mingming-laptop>
On Fri, Jul 25, 2008 at 12:26:42PM -0700, Mingming Cao wrote:
>
......
> > > */
> > > int ext4_ext_calc_credits_for_insert(struct inode *inode,
> > > struct ext4_ext_path *path)
> > > @@ -1930,9 +1931,6 @@ int ext4_ext_calc_credits_for_insert(str
> > > */
> > > needed += (depth * 2) + (depth * 2);
> > >
> > > - /* any allocation modifies superblock */
> > > - needed += 1;
> > > -
> >
> >
> > Why are we dropping the super block modification credit ? An insert of
> > an extent can result in super block modification also. If the goal is to
> > use ext4_writepages_trans_blocks everywhere then this change is correct.
> > But i see many place not using ext4_writepages_trans_blocks.
> >
> ext4_writepages_trans_blocks() will take care of the credits need for
> updating superblock, for just once.ext4_ext_calc_credits_for_insert()
> calculate the credits needed for inserting a single extent, You could
> see in ext4_writepages_trans_blocks(), it will multiple it will the
> total numberof blocks. If we account for superblock credit in
> ext4_ext_calc_credits_for_insert(), then super block updates for
> multiple extents allocation will be overaccounted, and have to remove
> that later in ext4_writepages_trans_blocks()
ext4_ext_calc_credit for insert was not doing it currently with
path != NULL. I attaching a patch which reflect some of the changes
I suggested.
>
> Other places calling ext4_ext_calc_credits_for_insert() (mostly
> migrate.c and defrag,c) are updated to add extra 4 credits, including
> superblock, inode block and quota blocks.
I guess it should not be +4 but should be +2 + 2 * EXT4_QUOTA_TRANS_BLOCKS(inode->i_sb);
>
> > You also need to update the function comment saying that super block
> > update is not accounted.Also it doesn't account for block bitmap,
> > group descriptor and inode block update.
> >
>
> Credits for block bitmap and group descriptors are accounted in
> ext4_ext_calc_credits_for_insert()
>
> inode block update is only accounted once per writepages, so it's
> accounted in
> ext4_writepages_trans_blocks()/ext4_ext_writepages_trans_blocks()
>
....
> > > *
> > > @@ -1142,13 +1133,13 @@ int ext4_get_block(struct inode *inode,
> > > handle_t *handle = ext4_journal_current_handle();
> > > int ret = 0, started = 0;
> > > unsigned max_blocks = bh_result->b_size >> inode->i_blkbits;
> > > + int dio_credits = EXT4_DATA_TRANS_BLOCKS(inode->i_sb);
> >
> > Again this should be
> > int dio_credits = ext4_writeblocks_trans(inode, DIO_MAX_BLOCKS);
> >
> No. DIO case is different than writepages(). When get_block() is called
> from DIO path, the get_block() is called in a loop, and the credit is
> reserved inside the loop. Each time get_block(),will return a single
> extent, so we should use EXT4_DATA_TRANS_BLOCKS(inode->i_sb) which is
> calculate a single chunk of allocation credits.
That is true only for extent format. With noextents we need something
like below
if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL))
dio_credits = ext4_writeblocks_trans_credits_old(inode, max_blocks);
else {
/*
* For extent format get_block return only one extent
*/
dio_credits = EXT4_DATA_TRANS_BLOCKS(inode->i_sb);
}
>
> ext4_da_writepages() is different, we have to reserve the credits
> outside the loop of calling get_block(), since the underlying
> get_block() could be called multiple times, we need to worse case, so
> ext4_writepages_trans_blocks() is called to handling the worse case.
>
> > >
> > > if (create && !handle) {
> > > /* Direct IO write... */
> > > if (max_blocks > DIO_MAX_BLOCKS)
> > > max_blocks = DIO_MAX_BLOCKS;
> > > - handle = ext4_journal_start(inode, DIO_CREDITS +
> > > - 2 * EXT4_QUOTA_TRANS_BLOCKS(inode->i_sb));
> > > + handle = ext4_journal_start(inode, dio_credits);
> > > if (IS_ERR(handle)) {
> > > ret = PTR_ERR(handle);
> > > goto out;
> > > @@ -1327,7 +1318,7 @@ static int ext4_write_begin(struct file
> > > struct page **pagep, void **fsdata)
> > > {
> > > struct inode *inode = mapping->host;
....
> >
> > > - int bpp = ext4_journal_blocks_per_page(inode);
> > > - int indirects = (EXT4_NDIR_BLOCKS % bpp) ? 5 : 3;
> > > + int indirects = (EXT4_NDIR_BLOCKS % num) ? 5 : 3;
> > > int ret;
> > >
> > > - if (EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL)
> > > - return ext4_ext_writepage_trans_blocks(inode, bpp);
> > > -
> > > if (ext4_should_journal_data(inode))
> > > - ret = 3 * (bpp + indirects) + 2;
> > > + ret = 3 * (num + indirects) + 2;
> > > else
> > > - ret = 2 * (bpp + indirects) + 2;
> > > + ret = 2 * (num + indirects) + 2;
> >
> >
> > With non journalled moded we should just decrease num. I guess the above
> > should be
> > if (ext4_should_journal_data(inode))
> > ret = 3 * (num + indirects) + 2;
> > else
> > ret = 3 * (num + indirects) + 2 - num;
> >
> >
> >
>
> Well, I think in the journalled mode we need to journal the content of
> the indirect/double indirect blocks also, no?
>
With non journalled mode we still need to journal the indirect, dindirect
and tindirect block so this should be
if (ext4_should_journal_data(inode))
ret = 3 * (num + indirects) + 2;
else
ret = 3 * (num + indirects) + 2 - num;
Attaching the patch for easy review
diff --git a/fs/ext4/defrag.c b/fs/ext4/defrag.c
index 7c819fd..46e9600 100644
--- a/fs/ext4/defrag.c
+++ b/fs/ext4/defrag.c
@@ -1385,8 +1385,9 @@ ext4_defrag_alloc_blocks(handle_t *handle, struct inode *org_inode,
struct buffer_head *bh = NULL;
int err, i, credits = 0;
- credits = ext4_ext_calc_credits_for_single_extent(dest_inode, dest_path)
- + 4;
+ credits = ext4_ext_calc_credits_for_single_extent(dest_inode,
+ dest_path) + 2 +
+ 2 * EXT4_QUOTA_TRANS_BLOCKS(dest_inode->i_sb);
err = ext4_ext_journal_restart(handle,
credits + EXT4_TRANS_META_BLOCKS);
if (err)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 77f4f94..969b1e6 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -1898,6 +1898,9 @@ static int ext4_ext_rm_idx(handle_t *handle, struct inode *inode,
* for old tree and new tree, for every level we need to reserve
* credits to log the bitmap and block group descriptors
*
+ * This doesn't take into account credit needed for the update of
+ * super block + inode block + quota files
+ *
*/
int ext4_ext_calc_credits_for_single_extent(struct inode *inode,
struct ext4_ext_path *path)
@@ -1909,7 +1912,8 @@ int ext4_ext_calc_credits_for_single_extent(struct inode *inode,
depth = ext_depth(inode);
if (le16_to_cpu(path[depth].p_hdr->eh_entries)
< le16_to_cpu(path[depth].p_hdr->eh_max))
- return 1;
+ /* 1 group desc + 1 block bitmap for allocated blocks */
+ return 2;
}
/*
@@ -1919,7 +1923,7 @@ int ext4_ext_calc_credits_for_single_extent(struct inode *inode,
*/
depth = 5;
- /* allocation of new data block(s) */
+ /* 1 group desc + 1 block bitmap for allocated blocks */
needed = 2;
/*
@@ -2059,9 +2063,7 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
correct_index = 1;
credits += (ext_depth(inode)) + 1;
}
-#ifdef CONFIG_QUOTA
credits += 2 * EXT4_QUOTA_TRANS_BLOCKS(inode->i_sb);
-#endif
err = ext4_ext_journal_restart(handle, credits);
if (err)
@@ -3023,9 +3025,7 @@ int ext4_ext_writeblocks_trans_credits(struct inode *inode, int nrblocks)
else
needed = nrblocks * needed + 2;
-#ifdef CONFIG_QUOTA
needed += 2 * EXT4_QUOTA_TRANS_BLOCKS(inode->i_sb);
-#endif
return needed;
}
@@ -3092,7 +3092,7 @@ long ext4_fallocate(struct inode *inode, int mode, loff_t offset, loff_t len)
/*
* credits to insert 1 extent into extent tree
*/
- credits = EXT4_DATA_TRANS_BLOCKS(inode->i_sb);
+ credits = EXT4_DATA_TRANS_BLOCKS(inode->i_sb);
mutex_lock(&inode->i_mutex);
retry:
while (ret >= 0 && ret < max_blocks) {
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 5a394c8..d2832a1 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1126,18 +1126,29 @@ int ext4_get_blocks_wrap(handle_t *handle, struct inode *inode, sector_t block,
up_write((&EXT4_I(inode)->i_data_sem));
return retval;
}
+static int ext4_writeblocks_trans_credits_old(struct inode *, int);
int ext4_get_block(struct inode *inode, sector_t iblock,
struct buffer_head *bh_result, int create)
{
handle_t *handle = ext4_journal_current_handle();
int ret = 0, started = 0;
unsigned max_blocks = bh_result->b_size >> inode->i_blkbits;
- int dio_credits = EXT4_DATA_TRANS_BLOCKS(inode->i_sb);
+ int dio_credits;
if (create && !handle) {
/* Direct IO write... */
if (max_blocks > DIO_MAX_BLOCKS)
max_blocks = DIO_MAX_BLOCKS;
+ if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL))
+ dio_credits = ext4_writeblocks_trans_credits_old(inode,
+ max_blocks);
+ else {
+ /*
+ * For extent format get_block return only one extent
+ */
+ dio_credits = EXT4_DATA_TRANS_BLOCKS(inode->i_sb);
+ }
+
handle = ext4_journal_start(inode, dio_credits);
if (IS_ERR(handle)) {
ret = PTR_ERR(handle);
@@ -4310,14 +4321,18 @@ static int ext4_writeblocks_trans_credits_old(struct inode *inode, int nrblocks)
if (ext4_should_journal_data(inode))
ret = 3 * (nrblocks + indirects) + 2;
- else
- ret = 2 * (nrblocks + indirects) + 2;
+ else {
+ /*
+ * We still need to journal update for the
+ * indirect, dindirect, and tindirect blocks
+ * only data blocks are not journalled
+ */
+ ret = 3 * (nrblocks + indirects) + 2 - nrblocks;
+ }
-#ifdef CONFIG_QUOTA
/* We know that structure was already allocated during DQUOT_INIT so
* we will be updating only the data blocks + inodes */
- ret += 2*EXT4_QUOTA_TRANS_BLOCKS(inode->i_sb);
-#endif
+ ret += 2 * EXT4_QUOTA_TRANS_BLOCKS(inode->i_sb);
return ret;
}
diff --git a/fs/ext4/migrate.c b/fs/ext4/migrate.c
index 3456094..72488ab 100644
--- a/fs/ext4/migrate.c
+++ b/fs/ext4/migrate.c
@@ -55,7 +55,8 @@ static int finish_range(handle_t *handle, struct inode *inode,
*
* extra 4 credits for: 1 superblock, 1 inode block, 2 quotas
*/
- needed = ext4_ext_calc_credits_for_single_extent(inode, path) + 4;
+ needed = ext4_ext_calc_credits_for_single_extent(inode, path) + 2 +
+ 2 * EXT4_QUOTA_TRANS_BLOCKS(inode->i_sb);;
/*
* Make sure the credit we accumalated is not really high
next prev parent reply other threads:[~2008-07-28 16:12 UTC|newest]
Thread overview: 61+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-07-21 4:28 Crash and stack trace for jbd2 under ext4 Shehjar Tikoo
2008-07-21 8:20 ` Aneesh Kumar K.V
2008-07-23 0:49 ` Mingming Cao
2008-07-23 0:51 ` [RFC]Ext4: journal credits reservation fixes for DIO, fallocate and delalloc writepages Mingming Cao
2008-07-23 1:18 ` Andreas Dilger
2008-07-23 18:19 ` Theodore Tso
2008-07-25 19:38 ` Mingming Cao
2008-07-23 7:42 ` Aneesh Kumar K.V
2008-07-24 2:07 ` Andreas Dilger
2008-07-25 19:26 ` Mingming Cao
2008-07-28 16:11 ` Aneesh Kumar K.V [this message]
2008-07-28 19:07 ` Mingming Cao
2008-07-29 6:24 ` Aneesh Kumar K.V
2008-07-26 0:42 ` [PATCH v2]Ext4: " Mingming Cao
2008-07-30 1:58 ` [PATCH v3]Ext4: " Mingming Cao
2008-07-30 11:29 ` Frédéric Bohé
2008-07-31 18:07 ` Mingming Cao
2008-08-01 5:49 ` Theodore Tso
2008-08-01 10:51 ` Frédéric Bohé
2008-08-01 18:08 ` Mingming Cao
2008-08-01 18:03 ` Mingming Cao
2008-08-01 19:10 ` Theodore Tso
2008-08-02 0:03 ` Theodore Tso
2008-08-04 11:23 ` Frédéric Bohé
2008-08-04 13:20 ` Theodore Tso
2008-07-30 11:36 ` Andreas Dilger
2008-07-30 12:16 ` Aneesh Kumar K.V
2008-08-01 19:29 ` Theodore Tso
2008-08-02 0:22 ` Theodore Tso
2008-08-12 16:23 ` [PATCH 0/6 ]Ext4 journal credits reservation fixes Mingming Cao
2008-08-12 16:25 ` [PATCH 1/6 ]Ext4 credits caclulation cleanup and fix that for nonextent writepage Mingming Cao
2008-08-13 8:31 ` Aneesh Kumar K.V
2008-08-14 0:30 ` Mingming Cao
2008-08-13 10:19 ` Aneesh Kumar K.V
2008-08-14 1:02 ` Mingming Cao
2008-08-16 0:37 ` [PATCH 1/6 V2 " Mingming Cao
2008-08-12 16:27 ` [PATCH 2/6 ]Ext4: journal credits reservation fixes for extent file writepage Mingming Cao
2008-08-13 8:37 ` Aneesh Kumar K.V
2008-08-14 0:26 ` Mingming Cao
2008-08-14 8:28 ` Aneesh Kumar K.V
2008-08-16 0:38 ` [PATCH 2/6 V2]Ext4: " Mingming Cao
2008-08-16 4:25 ` Aneesh Kumar K.V
2008-08-12 16:29 ` [PATCH 3/6 ]Ext4: journal credits reservation fixes for DIO, fallocate Mingming Cao
2008-08-13 8:53 ` Aneesh Kumar K.V
2008-08-13 10:14 ` Aneesh Kumar K.V
2008-08-14 0:50 ` Mingming Cao
2008-08-16 0:39 ` [PATCH 3/6 V2 " Mingming Cao
2008-08-12 16:32 ` [PATCH 4/6 ]ext4: Rework the ext4_da_writepages Mingming Cao
2008-08-16 0:43 ` [PATCH 4/6 V2]ext4: " Mingming Cao
2008-08-12 16:35 ` [PATCH 5/6 ]Ext4 journal credits reservation fixes Mingming Cao
2008-08-13 9:46 ` Aneesh Kumar K.V
2008-08-14 1:01 ` Mingming Cao
2008-08-14 8:40 ` Aneesh Kumar K.V
2008-08-16 0:40 ` [PATCH 5/6 V2]Ext4 journal credits fixes for delalloc writepages Mingming Cao
2008-08-16 4:23 ` Aneesh Kumar K.V
2008-08-12 16:37 ` [PATCH 6/6 ]Ext4 journal credits reservation fixes for defrag Mingming Cao
2008-08-16 0:45 ` [PATCH 6/6 V2]Ext4 " Mingming Cao
2008-08-16 15:55 ` Theodore Tso
2008-08-15 17:33 ` [PATCH 0/6 ]Ext4 journal credits reservation fixes Aneesh Kumar K.V
2008-08-15 19:02 ` Mingming Cao
2008-08-16 0:34 ` Mingming Cao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080728161156.GB4545@skywalker \
--to=aneesh.kumar@linux.vnet.ibm.com \
--cc=cmm@us.ibm.com \
--cc=linux-ext4@vger.kernel.org \
--cc=shehjart@cse.unsw.edu.au \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox