From: Mingming Cao <cmm@us.ibm.com>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: tytso@mit.edu, sandeen@redhat.com, linux-ext4@vger.kernel.org
Subject: Re: [RFC PATCH] ext4: Make sure all the block allocation patch reserve blocks
Date: Fri, 22 Aug 2008 11:22:52 -0700 [thread overview]
Message-ID: <1219429372.6306.29.camel@mingming-laptop> (raw)
In-Reply-To: <1219412074-30584-3-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
在 2008-08-22五的 19:04 +0530,Aneesh Kumar K.V写道:
> With delayed allocation we need to make sure block are reserved
> before we attempt to allocate them. Otherwise we get block
> allocation failure (ENOSPC) during writepages which cannot
> be handled. This would mean silent data loss (We do a printk
> stating data will be lost). m
The patch description is a little unaccurate. We already have the blocks
reservation for delayed allocation. But it doesn't handle the multiple
threads racing issue, in case of the filesystem is close to full, which
will result in overbooking free blocks, and ENOSPC later at the
wirtepages time.
> This patch update the DIO
> and fallocate code path to do block reservation before block
> allocation.
>
Do you mean the patch changes the mballoc and old block allocator to
claim free blocks before doing real block allocation, and after
allocation is done, update the free blocks counter properly (if
allocated < requested)?
Yes that's the code path that is common to DIO and fallocate, but I
don't see we need block reservation for DIO and fallocate. Unlike
delayed allocation, they could handle ENOSPC error, and return to user
right after the block allocation request.
> When free blocks count go below a threshold we switch to
> a slow patch which looks at other CPU's accumulated percpu
> counter values.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
> fs/ext4/balloc.c | 72 ++++++++++++++++++++++++++++++++++++++++++++---------
> fs/ext4/ext4.h | 2 +
> fs/ext4/inode.c | 5 +---
> fs/ext4/mballoc.c | 23 +++++++++-------
> 4 files changed, 76 insertions(+), 26 deletions(-)
>
> diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
> index cfed283..4a53541 100644
> --- a/fs/ext4/balloc.c
> +++ b/fs/ext4/balloc.c
> @@ -1602,6 +1602,47 @@ ext4_try_to_allocate_with_rsv(struct super_block *sb, handle_t *handle,
> return ret;
> }
>
> +int ext4_claim_free_blocks(struct ext4_sb_info *sbi,
> + ext4_fsblk_t nblocks)
> +{
> + s64 free_blocks;
> + ext4_fsblk_t root_blocks = 0;
> + struct percpu_counter *fbc = &sbi->s_freeblocks_counter;
> +
> + free_blocks = percpu_counter_read(fbc);
> +
> + if (!capable(CAP_SYS_RESOURCE) &&
> + sbi->s_resuid != current->fsuid &&
> + (sbi->s_resgid == 0 || !in_group_p(sbi->s_resgid)))
> + root_blocks = ext4_r_blocks_count(sbi->s_es);
> +#ifdef CONFIG_SMP
> + /* Each CPU can accumulate FBC_BATCH blocks in their local
> + * counters. So we need to make sure we have free blocks more
> + * than FBC_BATCH * nr_cpu_ids. Also add a window of 4 times.
> + */
> + if (free_blocks - (nblocks + root_blocks) <
> + (4 * (FBC_BATCH * nr_cpu_ids))) {
> + /*
> + * We need to sum and claim under lock
> + * This is the slow patch which will be
> + * taken when we are very low on free blocks
> + */
> + if (percpu_counter_sum_and_sub(fbc, nblocks + root_blocks))
> + return -ENOSPC;
> + /* add root_blocks back */
> + percpu_counter_add(fbc, root_blocks);
> + return 0;
> + }
> +#endif
> + if (free_blocks < (root_blocks + nblocks))
> + /* we don't have free space */
> + return -ENOSPC;
> +
> + /* reduce fs free blocks counter */
> + percpu_counter_sub(fbc, nblocks);
> + return 0;
> +}
> +
The only reason we need #ifdef CONFIG_SMP is FBC_BATCH is undefined in
UP. So instead of using FBC_BATCH, we could use nr_cpu_ids *4 directly,
so avoid the need for CONFIG_SMP nest in ext4 code.
Better to have a a define of
#define EXT$_FREEBLOCKS_LOW 4 * nr_cpu_ids * nr_cpu_ids
So later if this watermark is too low, we could change it easily.
> /**
> * ext4_has_free_blocks()
> * @sbi: in-core super block structure.
> @@ -1624,9 +1665,15 @@ ext4_fsblk_t ext4_has_free_blocks(struct ext4_sb_info *sbi,
> (sbi->s_resgid == 0 || !in_group_p(sbi->s_resgid)))
> root_blocks = ext4_r_blocks_count(sbi->s_es);
> #ifdef CONFIG_SMP
> - if (free_blocks - root_blocks < FBC_BATCH)
> + /* Each CPU can accumulate FBC_BATCH blocks in their local
> + * counters. So we need to make sure we have free blocks more
> + * than FBC_BATCH * nr_cpu_ids. Also add a window of 4 times.
> + */
> + if (free_blocks - (nblocks + root_blocks) <
> + (4 * (FBC_BATCH * nr_cpu_ids))) {
> free_blocks =
> percpu_counter_sum(&sbi->s_freeblocks_counter);
> + }
> #endif
> if (free_blocks <= root_blocks)
> /* we don't have free space */
> @@ -1634,7 +1681,7 @@ ext4_fsblk_t ext4_has_free_blocks(struct ext4_sb_info *sbi,
> if (free_blocks - root_blocks < nblocks)
> return free_blocks - root_blocks;
> return nblocks;
> - }
> +}
>
>
> /**
> @@ -1713,14 +1760,11 @@ ext4_fsblk_t ext4_old_new_blocks(handle_t *handle, struct inode *inode,
> /*
> * With delalloc we already reserved the blocks
> */
> - *count = ext4_has_free_blocks(sbi, *count);
> - }
> - if (*count == 0) {
> - *errp = -ENOSPC;
> - return 0; /*return with ENOSPC error */
> + if (ext4_claim_free_blocks(sbi, *count)) {
> + *errp = -ENOSPC;
> + return 0; /*return with ENOSPC error */
> + }
Oh, The changes hurt DIO and fallocate, I think.
In fact with this changes, DIO and fallocate will_STOP_ allocating even
if the there are some free blocks around, but less than requested, and
returns ENOSPC too early. With current code it will try to allocate as
much as possible.
> }
> - num = *count;
> -
> /*
> * Check quota for allocation of this block.
> */
> @@ -1915,9 +1959,13 @@ ext4_fsblk_t ext4_old_new_blocks(handle_t *handle, struct inode *inode,
> le16_add_cpu(&gdp->bg_free_blocks_count, -num);
> gdp->bg_checksum = ext4_group_desc_csum(sbi, group_no, gdp);
> spin_unlock(sb_bgl_lock(sbi, group_no));
> - if (!EXT4_I(inode)->i_delalloc_reserved_flag)
> - percpu_counter_sub(&sbi->s_freeblocks_counter, num);
> -
> + if (!EXT4_I(inode)->i_delalloc_reserved_flag && (*count != num)) {
> + /*
> + * we allocated less blocks than we
> + * claimed. Add the difference back.
> + */
> + percpu_counter_add(&sbi->s_freeblocks_counter, *count - num);
> + }
> if (sbi->s_log_groups_per_flex) {
> ext4_group_t flex_group = ext4_flex_group(sbi, group_no);
> spin_lock(sb_bgl_lock(sbi, flex_group));
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 7f11b25..3738039 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1047,6 +1047,8 @@ extern ext4_fsblk_t ext4_new_blocks(handle_t *handle, struct inode *inode,
> unsigned long *count, int *errp);
> extern ext4_fsblk_t ext4_old_new_blocks(handle_t *handle, struct inode *inode,
> ext4_fsblk_t goal, unsigned long *count, int *errp);
> +extern int ext4_claim_free_blocks(struct ext4_sb_info *sbi,
> + ext4_fsblk_t nblocks);
> extern ext4_fsblk_t ext4_has_free_blocks(struct ext4_sb_info *sbi,
> ext4_fsblk_t nblocks);
> extern void ext4_free_blocks (handle_t *handle, struct inode *inode,
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 1c289c1..d965a05 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1537,13 +1537,10 @@ static int ext4_da_reserve_space(struct inode *inode, int nrblocks)
> md_needed = mdblocks - EXT4_I(inode)->i_reserved_meta_blocks;
> total = md_needed + nrblocks;
>
> - if (ext4_has_free_blocks(sbi, total) < total) {
> + if (ext4_claim_free_blocks(sbi, total)) {
> spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
> return -ENOSPC;
> }
> - /* reduce fs free blocks counter */
> - percpu_counter_sub(&sbi->s_freeblocks_counter, total);
> -
> EXT4_I(inode)->i_reserved_data_blocks += nrblocks;
> EXT4_I(inode)->i_reserved_meta_blocks = mdblocks;
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 82dd0e4..4404b46 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -2977,9 +2977,15 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
> * at write_begin() time for delayed allocation
> * do not double accounting
> */
> - if (!(ac->ac_flags & EXT4_MB_DELALLOC_RESERVED))
> - percpu_counter_sub(&sbi->s_freeblocks_counter,
> - ac->ac_b_ex.fe_len);
> + if (!(ac->ac_flags & EXT4_MB_DELALLOC_RESERVED) &&
> + ac->ac_o_ex.fe_len != ac->ac_b_ex.fe_len) {
> + /*
> + * we allocated less blocks than we calimed
> + * Add the difference back
> + */
> + percpu_counter_add(&sbi->s_freeblocks_counter,
> + ac->ac_o_ex.fe_len - ac->ac_b_ex.fe_len);
> + }
>
> if (sbi->s_log_groups_per_flex) {
> ext4_group_t flex_group = ext4_flex_group(sbi,
> @@ -4391,14 +4397,11 @@ ext4_fsblk_t ext4_mb_new_blocks(handle_t *handle,
> /*
> * With delalloc we already reserved the blocks
> */
> - ar->len = ext4_has_free_blocks(sbi, ar->len);
> - }
> -
> - if (ar->len == 0) {
> - *errp = -ENOSPC;
> - return 0;
> + if (ext4_claim_free_blocks(sbi, ar->len)) {
> + *errp = -ENOSPC;
> + return 0;
> + }
> }
> -
> while (ar->len && DQUOT_ALLOC_BLOCK(ar->inode, ar->len)) {
> ar->flags |= EXT4_MB_HINT_NOPREALLOC;
> ar->len--;
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2008-08-22 18:22 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-08-22 13:34 [RFC PATCH] percpu_counters: make fbc->count read atomic on 32 bit architecture Aneesh Kumar K.V
2008-08-22 13:34 ` [RFC PATCH] percpu_counters: Add new function percpu_counter_sum_and_sub Aneesh Kumar K.V
2008-08-22 13:34 ` [RFC PATCH] ext4: Make sure all the block allocation patch reserve blocks Aneesh Kumar K.V
2008-08-22 18:22 ` Mingming Cao [this message]
2008-08-22 18:01 ` [RFC PATCH] percpu_counters: Add new function percpu_counter_sum_and_sub Mingming Cao
2008-08-22 18:29 ` [RFC PATCH] percpu_counters: make fbc->count read atomic on 32 bit architecture Mingming Cao
2008-08-22 18:33 ` Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1219429372.6306.29.camel@mingming-laptop \
--to=cmm@us.ibm.com \
--cc=aneesh.kumar@linux.vnet.ibm.com \
--cc=linux-ext4@vger.kernel.org \
--cc=sandeen@redhat.com \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox