From: Dmitry Monakhov <dmonakhov@openvz.org>
To: linux-ext4@vger.kernel.org
Subject: Re: [PATCH 2/4] ext4: fix race chown vs truncate
Date: Mon, 23 Nov 2009 21:42:42 +0300 [thread overview]
Message-ID: <87pr79xeu5.fsf@openvz.org> (raw)
In-Reply-To: <873a45ytwa.fsf@openvz.org>
Dmitry Monakhov <dmonakhov@openvz.org> writes:
> To prevent ext4_reservation vs dquot_reservation inconsistency, we have
> to reorganize locking ordering like follows:
> i_block_reservation_lock > dqptr_sem
> This means what all functions which changes ext4 or quota reservation have
> to hold i_block_reservation_lock.
While investigating this issue i've considered other solution
* Introduce i_block analog for generic reserved space management:
We may introduce i_rsv_block field in generic inode, and protected
it by i_lock(similar to i_block).
Introduce inc/dec/set/get methods similar to inode_get_bytes,
inode_sub_bytes.. .
This value is managed internally by quota code. Perform reservation
management inside dquot_reserve_space, dquot_release_reservation
without interfering with fs internals, as we do for i_blocks.
IMHO this is best way because:
1)We don't have to hold i_block_reservation_lock while quota-op
which may lead to virtual performance penalty.
2)This brings to well defined VFS interface for reserved space management.
But I expect some problems from AlViro because only ext4 would use it by now.
And off course this may lead to ext4_rsv quot_rsv inconsistency
due to some bugs.
>
> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
> ---
> fs/ext4/inode.c | 23 ++++++++++++++++++-----
> 1 files changed, 18 insertions(+), 5 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 84863e6..c521c93 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1047,16 +1047,23 @@ cleanup:
> out:
> return err;
> }
> +/*
> + * Quota_transfer callback.
> + * During quota transfer we have to transfer rsv-blocks from one dquot to
> + * another. inode must be protected from concurrent reservation/reclamation.
> + * Locking ordering for all space reservation code:
> + * i_block_reservation_lock > dqptr_sem
> + * This is differ from i_block,i_lock locking ordering, but this is the
> + * only possible way.
> + * NOTE: Caller must hold i_block_reservation_lock.
> + */
>
> qsize_t ext4_get_reserved_space(struct inode *inode)
> {
> unsigned long long total;
>
> - spin_lock(&EXT4_I(inode)->i_block_reservation_lock);
> total = EXT4_I(inode)->i_reserved_data_blocks +
> EXT4_I(inode)->i_reserved_meta_blocks;
> - spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
> -
> return (total << inode->i_blkbits);
> }
> /*
> @@ -1131,6 +1138,8 @@ static void ext4_da_update_reserve_space(struct inode *inode, int used)
> if (mdb_free)
> vfs_dq_release_reservation_block(inode, mdb_free);
>
> + spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
> +
> /*
> * If we have done all the pending block allocations and if
> * there aren't any writers on the inode, we can discard the
> @@ -1866,8 +1875,8 @@ repeat:
> }
>
> if (ext4_claim_free_blocks(sbi, total)) {
> - spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
> vfs_dq_release_reservation_block(inode, total);
> + spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
> if (ext4_should_retry_alloc(inode->i_sb, &retries)) {
> yield();
> goto repeat;
> @@ -1924,9 +1933,9 @@ static void ext4_da_release_space(struct inode *inode, int to_free)
>
> BUG_ON(mdb > EXT4_I(inode)->i_reserved_meta_blocks);
> EXT4_I(inode)->i_reserved_meta_blocks = mdb;
> - spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
>
> vfs_dq_release_reservation_block(inode, release);
> + spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
> }
>
> static void ext4_da_page_release_reservation(struct page *page,
> @@ -5436,7 +5445,11 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
> error = PTR_ERR(handle);
> goto err_out;
> }
> + /* i_block_reservation must being held in order to avoid races
> + * with concurent block reservation. */
> + spin_lock(&EXT4_I(inode)->i_block_reservation_lock);
> error = vfs_dq_transfer(inode, attr) ? -EDQUOT : 0;
> + spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
> if (error) {
> ext4_journal_stop(handle);
> return error;
next prev parent reply other threads:[~2009-11-23 18:42 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-11-23 18:13 ext4+quota patch series Dmitry Monakhov
2009-11-23 18:15 ` Eric Sandeen
2009-11-23 19:18 ` Dmitry Monakhov
2009-11-23 19:35 ` Eric Sandeen
2009-11-23 18:30 ` [PATCH 1/4] ext4: delalloc quota fixes Dmitry Monakhov
2009-11-23 22:43 ` Dmitry Monakhov
2009-11-23 22:58 ` Dmitry Monakhov
2009-11-23 22:58 ` [PATCH 2/4] ext4: fix race chown vs truncate Dmitry Monakhov
2009-11-23 22:58 ` [PATCH 3/4] ext4: quota macros cleanup Dmitry Monakhov
2009-11-23 22:58 ` [PATCH 4/4] ext4: fix incorrect block reservation on quota transfer Dmitry Monakhov
2009-11-24 15:24 ` [PATCH 1/4] ext4: delalloc quota fixes Eric Sandeen
2009-11-24 19:38 ` Dmitry Monakhov
2009-12-08 0:00 ` Mingming
2009-12-08 6:34 ` Dmitry Monakhov
2009-11-23 18:32 ` [PATCH 2/4] ext4: fix race chown vs truncate Dmitry Monakhov
2009-11-23 18:42 ` Dmitry Monakhov [this message]
2009-11-23 18:33 ` [PATCH 3/4] ext4: quota macros cleanup Dmitry Monakhov
2009-11-23 18:34 ` [PATCH 4/4] ext4: fix incorrect block reservation on quota transfer Dmitry Monakhov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87pr79xeu5.fsf@openvz.org \
--to=dmonakhov@openvz.org \
--cc=linux-ext4@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).