From: Mingming Cao <cmm@us.ibm.com>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: tytso <tytso@mit.edu>, Shehjar Tikoo <shehjart@cse.unsw.edu.au>,
linux-ext4@vger.kernel.org
Subject: Re: [RFC]Ext4: journal credits reservation fixes for DIO, fallocate and delalloc writepages
Date: Mon, 28 Jul 2008 12:07:33 -0700 [thread overview]
Message-ID: <1217272053.6379.7.camel@mingming-laptop> (raw)
In-Reply-To: <20080728161156.GB4545@skywalker>
在 2008-07-28一的 21:41 +0530,Aneesh Kumar K.V写道:
> On Fri, Jul 25, 2008 at 12:26:42PM -0700, Mingming Cao wrote:
> >
>
> ......
>
> > > > */
> > > > int ext4_ext_calc_credits_for_insert(struct inode *inode,
> > > > struct ext4_ext_path *path)
> > > > @@ -1930,9 +1931,6 @@ int ext4_ext_calc_credits_for_insert(str
> > > > */
> > > > needed += (depth * 2) + (depth * 2);
> > > >
> > > > - /* any allocation modifies superblock */
> > > > - needed += 1;
> > > > -
> > >
> > >
> > > Why are we dropping the super block modification credit ? An insert of
> > > an extent can result in super block modification also. If the goal is to
> > > use ext4_writepages_trans_blocks everywhere then this change is correct.
> > > But i see many place not using ext4_writepages_trans_blocks.
> > >
> > ext4_writepages_trans_blocks() will take care of the credits need for
> > updating superblock, for just once.ext4_ext_calc_credits_for_insert()
> > calculate the credits needed for inserting a single extent, You could
> > see in ext4_writepages_trans_blocks(), it will multiple it will the
> > total numberof blocks. If we account for superblock credit in
> > ext4_ext_calc_credits_for_insert(), then super block updates for
> > multiple extents allocation will be overaccounted, and have to remove
> > that later in ext4_writepages_trans_blocks()
>
>
>
> ext4_ext_calc_credit for insert was not doing it currently with
> path != NULL. I attaching a patch which reflect some of the changes
> I suggested.
>
>
Acked. this fixed another bug in existing code.
> >
> > Other places calling ext4_ext_calc_credits_for_insert() (mostly
> > migrate.c and defrag,c) are updated to add extra 4 credits, including
> > superblock, inode block and quota blocks.
>
>
> I guess it should not be +4 but should be +2 + 2 * EXT4_QUOTA_TRANS_BLOCKS(inode->i_sb);
>
Yes.
> >
> > > You also need to update the function comment saying that super block
> > > update is not accounted.Also it doesn't account for block bitmap,
> > > group descriptor and inode block update.
> > >
> >
> > Credits for block bitmap and group descriptors are accounted in
> > ext4_ext_calc_credits_for_insert()
> >
> > inode block update is only accounted once per writepages, so it's
> > accounted in
> > ext4_writepages_trans_blocks()/ext4_ext_writepages_trans_blocks()
> >
>
> ....
>
> > > > *
> > > > @@ -1142,13 +1133,13 @@ int ext4_get_block(struct inode *inode,
> > > > handle_t *handle = ext4_journal_current_handle();
> > > > int ret = 0, started = 0;
> > > > unsigned max_blocks = bh_result->b_size >> inode->i_blkbits;
> > > > + int dio_credits = EXT4_DATA_TRANS_BLOCKS(inode->i_sb);
> > >
> > > Again this should be
> > > int dio_credits = ext4_writeblocks_trans(inode, DIO_MAX_BLOCKS);
> > >
> > No. DIO case is different than writepages(). When get_block() is called
> > from DIO path, the get_block() is called in a loop, and the credit is
> > reserved inside the loop. Each time get_block(),will return a single
> > extent, so we should use EXT4_DATA_TRANS_BLOCKS(inode->i_sb) which is
> > calculate a single chunk of allocation credits.
>
>
> That is true only for extent format. With noextents we need something
> like below
>
Not really, even with non extent format block allocation, ext4_get_block() will only allocate/map a chunk of contiguous blocks (i.e. an extent), so it will not hit the worse case.
> if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL))
> dio_credits = ext4_writeblocks_trans_credits_old(inode, max_blocks);
> else {
> /*
> * For extent format get_block return only one extent
> */
> dio_credits = EXT4_DATA_TRANS_BLOCKS(inode->i_sb);
> }
>
>
...
>
> > >
> > > > - int bpp = ext4_journal_blocks_per_page(inode);
> > > > - int indirects = (EXT4_NDIR_BLOCKS % bpp) ? 5 : 3;
> > > > + int indirects = (EXT4_NDIR_BLOCKS % num) ? 5 : 3;
> > > > int ret;
> > > >
> > > > - if (EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL)
> > > > - return ext4_ext_writepage_trans_blocks(inode, bpp);
> > > > -
> > > > if (ext4_should_journal_data(inode))
> > > > - ret = 3 * (bpp + indirects) + 2;
> > > > + ret = 3 * (num + indirects) + 2;
> > > > else
> > > > - ret = 2 * (bpp + indirects) + 2;
> > > > + ret = 2 * (num + indirects) + 2;
> > >
> > >
> > > With non journalled moded we should just decrease num. I guess the above
> > > should be
> > > if (ext4_should_journal_data(inode))
> > > ret = 3 * (num + indirects) + 2;
> > > else
> > > ret = 3 * (num + indirects) + 2 - num;
> > >
> > >
> > >
> >
> > Well, I think in the journalled mode we need to journal the content of
> > the indirect/double indirect blocks also, no?
> >
>
> With non journalled mode we still need to journal the indirect, dindirect
> and tindirect block so this should be
>
yes, changes of indirect blocks are also logged in all journalling
mode.
Mingming
> Attaching the patch for easy review
>
>
>
> diff --git a/fs/ext4/defrag.c b/fs/ext4/defrag.c
> index 7c819fd..46e9600 100644
> --- a/fs/ext4/defrag.c
> +++ b/fs/ext4/defrag.c
> @@ -1385,8 +1385,9 @@ ext4_defrag_alloc_blocks(handle_t *handle, struct inode *org_inode,
> struct buffer_head *bh = NULL;
> int err, i, credits = 0;
>
> - credits = ext4_ext_calc_credits_for_single_extent(dest_inode, dest_path)
> - + 4;
> + credits = ext4_ext_calc_credits_for_single_extent(dest_inode,
> + dest_path) + 2 +
> + 2 * EXT4_QUOTA_TRANS_BLOCKS(dest_inode->i_sb);
> err = ext4_ext_journal_restart(handle,
> credits + EXT4_TRANS_META_BLOCKS);
> if (err)
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 77f4f94..969b1e6 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -1898,6 +1898,9 @@ static int ext4_ext_rm_idx(handle_t *handle, struct inode *inode,
> * for old tree and new tree, for every level we need to reserve
> * credits to log the bitmap and block group descriptors
> *
> + * This doesn't take into account credit needed for the update of
> + * super block + inode block + quota files
> + *
> */
> int ext4_ext_calc_credits_for_single_extent(struct inode *inode,
> struct ext4_ext_path *path)
> @@ -1909,7 +1912,8 @@ int ext4_ext_calc_credits_for_single_extent(struct inode *inode,
> depth = ext_depth(inode);
> if (le16_to_cpu(path[depth].p_hdr->eh_entries)
> < le16_to_cpu(path[depth].p_hdr->eh_max))
> - return 1;
> + /* 1 group desc + 1 block bitmap for allocated blocks */
> + return 2;
> }
>
> /*
> @@ -1919,7 +1923,7 @@ int ext4_ext_calc_credits_for_single_extent(struct inode *inode,
> */
> depth = 5;
>
> - /* allocation of new data block(s) */
> + /* 1 group desc + 1 block bitmap for allocated blocks */
> needed = 2;
>
> /*
> @@ -2059,9 +2063,7 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
> correct_index = 1;
> credits += (ext_depth(inode)) + 1;
> }
> -#ifdef CONFIG_QUOTA
> credits += 2 * EXT4_QUOTA_TRANS_BLOCKS(inode->i_sb);
> -#endif
>
> err = ext4_ext_journal_restart(handle, credits);
> if (err)
> @@ -3023,9 +3025,7 @@ int ext4_ext_writeblocks_trans_credits(struct inode *inode, int nrblocks)
> else
> needed = nrblocks * needed + 2;
>
> -#ifdef CONFIG_QUOTA
> needed += 2 * EXT4_QUOTA_TRANS_BLOCKS(inode->i_sb);
> -#endif
>
> return needed;
> }
> @@ -3092,7 +3092,7 @@ long ext4_fallocate(struct inode *inode, int mode, loff_t offset, loff_t len)
> /*
> * credits to insert 1 extent into extent tree
> */
> - credits = EXT4_DATA_TRANS_BLOCKS(inode->i_sb);
> + credits = EXT4_DATA_TRANS_BLOCKS(inode->i_sb);
> mutex_lock(&inode->i_mutex);
> retry:
> while (ret >= 0 && ret < max_blocks) {
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 5a394c8..d2832a1 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1126,18 +1126,29 @@ int ext4_get_blocks_wrap(handle_t *handle, struct inode *inode, sector_t block,
> up_write((&EXT4_I(inode)->i_data_sem));
> return retval;
> }
> +static int ext4_writeblocks_trans_credits_old(struct inode *, int);
> int ext4_get_block(struct inode *inode, sector_t iblock,
> struct buffer_head *bh_result, int create)
> {
> handle_t *handle = ext4_journal_current_handle();
> int ret = 0, started = 0;
> unsigned max_blocks = bh_result->b_size >> inode->i_blkbits;
> - int dio_credits = EXT4_DATA_TRANS_BLOCKS(inode->i_sb);
> + int dio_credits;
>
> if (create && !handle) {
> /* Direct IO write... */
> if (max_blocks > DIO_MAX_BLOCKS)
> max_blocks = DIO_MAX_BLOCKS;
> + if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL))
> + dio_credits = ext4_writeblocks_trans_credits_old(inode,
> + max_blocks);
> + else {
> + /*
> + * For extent format get_block return only one extent
> + */
> + dio_credits = EXT4_DATA_TRANS_BLOCKS(inode->i_sb);
> + }
> +
> handle = ext4_journal_start(inode, dio_credits);
> if (IS_ERR(handle)) {
> ret = PTR_ERR(handle);
> @@ -4310,14 +4321,18 @@ static int ext4_writeblocks_trans_credits_old(struct inode *inode, int nrblocks)
>
> if (ext4_should_journal_data(inode))
> ret = 3 * (nrblocks + indirects) + 2;
> - else
> - ret = 2 * (nrblocks + indirects) + 2;
> + else {
> + /*
> + * We still need to journal update for the
> + * indirect, dindirect, and tindirect blocks
> + * only data blocks are not journalled
> + */
> + ret = 3 * (nrblocks + indirects) + 2 - nrblocks;
> + }
>
> -#ifdef CONFIG_QUOTA
> /* We know that structure was already allocated during DQUOT_INIT so
> * we will be updating only the data blocks + inodes */
> - ret += 2*EXT4_QUOTA_TRANS_BLOCKS(inode->i_sb);
> -#endif
> + ret += 2 * EXT4_QUOTA_TRANS_BLOCKS(inode->i_sb);
>
> return ret;
> }
> diff --git a/fs/ext4/migrate.c b/fs/ext4/migrate.c
> index 3456094..72488ab 100644
> --- a/fs/ext4/migrate.c
> +++ b/fs/ext4/migrate.c
> @@ -55,7 +55,8 @@ static int finish_range(handle_t *handle, struct inode *inode,
> *
> * extra 4 credits for: 1 superblock, 1 inode block, 2 quotas
> */
> - needed = ext4_ext_calc_credits_for_single_extent(inode, path) + 4;
> + needed = ext4_ext_calc_credits_for_single_extent(inode, path) + 2 +
> + 2 * EXT4_QUOTA_TRANS_BLOCKS(inode->i_sb);;
>
> /*
> * Make sure the credit we accumalated is not really high
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2008-07-28 19:07 UTC|newest]
Thread overview: 61+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-07-21 4:28 Crash and stack trace for jbd2 under ext4 Shehjar Tikoo
2008-07-21 8:20 ` Aneesh Kumar K.V
2008-07-23 0:49 ` Mingming Cao
2008-07-23 0:51 ` [RFC]Ext4: journal credits reservation fixes for DIO, fallocate and delalloc writepages Mingming Cao
2008-07-23 1:18 ` Andreas Dilger
2008-07-23 18:19 ` Theodore Tso
2008-07-25 19:38 ` Mingming Cao
2008-07-23 7:42 ` Aneesh Kumar K.V
2008-07-24 2:07 ` Andreas Dilger
2008-07-25 19:26 ` Mingming Cao
2008-07-28 16:11 ` Aneesh Kumar K.V
2008-07-28 19:07 ` Mingming Cao [this message]
2008-07-29 6:24 ` Aneesh Kumar K.V
2008-07-26 0:42 ` [PATCH v2]Ext4: " Mingming Cao
2008-07-30 1:58 ` [PATCH v3]Ext4: " Mingming Cao
2008-07-30 11:29 ` Frédéric Bohé
2008-07-31 18:07 ` Mingming Cao
2008-08-01 5:49 ` Theodore Tso
2008-08-01 10:51 ` Frédéric Bohé
2008-08-01 18:08 ` Mingming Cao
2008-08-01 18:03 ` Mingming Cao
2008-08-01 19:10 ` Theodore Tso
2008-08-02 0:03 ` Theodore Tso
2008-08-04 11:23 ` Frédéric Bohé
2008-08-04 13:20 ` Theodore Tso
2008-07-30 11:36 ` Andreas Dilger
2008-07-30 12:16 ` Aneesh Kumar K.V
2008-08-01 19:29 ` Theodore Tso
2008-08-02 0:22 ` Theodore Tso
2008-08-12 16:23 ` [PATCH 0/6 ]Ext4 journal credits reservation fixes Mingming Cao
2008-08-12 16:25 ` [PATCH 1/6 ]Ext4 credits caclulation cleanup and fix that for nonextent writepage Mingming Cao
2008-08-13 8:31 ` Aneesh Kumar K.V
2008-08-14 0:30 ` Mingming Cao
2008-08-13 10:19 ` Aneesh Kumar K.V
2008-08-14 1:02 ` Mingming Cao
2008-08-16 0:37 ` [PATCH 1/6 V2 " Mingming Cao
2008-08-12 16:27 ` [PATCH 2/6 ]Ext4: journal credits reservation fixes for extent file writepage Mingming Cao
2008-08-13 8:37 ` Aneesh Kumar K.V
2008-08-14 0:26 ` Mingming Cao
2008-08-14 8:28 ` Aneesh Kumar K.V
2008-08-16 0:38 ` [PATCH 2/6 V2]Ext4: " Mingming Cao
2008-08-16 4:25 ` Aneesh Kumar K.V
2008-08-12 16:29 ` [PATCH 3/6 ]Ext4: journal credits reservation fixes for DIO, fallocate Mingming Cao
2008-08-13 8:53 ` Aneesh Kumar K.V
2008-08-13 10:14 ` Aneesh Kumar K.V
2008-08-14 0:50 ` Mingming Cao
2008-08-16 0:39 ` [PATCH 3/6 V2 " Mingming Cao
2008-08-12 16:32 ` [PATCH 4/6 ]ext4: Rework the ext4_da_writepages Mingming Cao
2008-08-16 0:43 ` [PATCH 4/6 V2]ext4: " Mingming Cao
2008-08-12 16:35 ` [PATCH 5/6 ]Ext4 journal credits reservation fixes Mingming Cao
2008-08-13 9:46 ` Aneesh Kumar K.V
2008-08-14 1:01 ` Mingming Cao
2008-08-14 8:40 ` Aneesh Kumar K.V
2008-08-16 0:40 ` [PATCH 5/6 V2]Ext4 journal credits fixes for delalloc writepages Mingming Cao
2008-08-16 4:23 ` Aneesh Kumar K.V
2008-08-12 16:37 ` [PATCH 6/6 ]Ext4 journal credits reservation fixes for defrag Mingming Cao
2008-08-16 0:45 ` [PATCH 6/6 V2]Ext4 " Mingming Cao
2008-08-16 15:55 ` Theodore Tso
2008-08-15 17:33 ` [PATCH 0/6 ]Ext4 journal credits reservation fixes Aneesh Kumar K.V
2008-08-15 19:02 ` Mingming Cao
2008-08-16 0:34 ` Mingming Cao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1217272053.6379.7.camel@mingming-laptop \
--to=cmm@us.ibm.com \
--cc=aneesh.kumar@linux.vnet.ibm.com \
--cc=linux-ext4@vger.kernel.org \
--cc=shehjart@cse.unsw.edu.au \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox