From: Dmitry Monakhov <dmonakhov@openvz.org>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: linux-ext4@vger.kernel.org
Subject: Re: [PATCH 2/4] ext4: fix reserved space transferring on chown() [V2]
Date: Mon, 07 Dec 2009 22:41:15 +0300 [thread overview]
Message-ID: <87ocmalgh0.fsf@openvz.org> (raw)
In-Reply-To: <20091207171835.GB16078@skywalker.linux.vnet.ibm.com>
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> writes:
> On Wed, Nov 25, 2009 at 09:57:39AM +0300, Dmitry Monakhov wrote:
>> Currently all quota's functions except vfs_dq_reserve_block()
>> called without i_block_reservation_lock. This result in
>> ext4_reservation vs quota_reservation inconsistency which provoke
>> incorrect reservation transfer ==> incorrect quota after transfer.
>>
>> Race (1)
>> | Task 1 (chown) | Task 2 (truncate) |
>> | dquot_transfer | |
>> | ->down_write(dqptr_sem) | ext4_da_release_spac |
>> | -->dquot_get_reserved_space | ->lock(i_block_reservation_lock) |
>> | --->get_reserved_space | /* decrement reservation */ |
>> | ---->ext4_get_reserved_space | ->unlock(i_block_reservation_lock) |
>> | ----->lock(i_block_rsv_lock) | /* During this time window |
>> | /* Read ext4_rsv from inode */ | * fs's reservation not equals |
>> | /* transfer it to new quota */ | * to quota's */ |
>> | ->up_write(dqptr_sem) | ->vfs_dq_release_reservation_block() |
>> | | /* quota_rsv goes negative here */ |
>> | | |
>>
>> Race (2)
>> | Task 1 (chown) | Task 2 (flush-8:16) |
>> | dquot_transfer() | ext4_mb_mark_diskspace_used() |
>> | ->down_write(dqptr_sem) | ->vfs_dq_claim_block() |
>> | --->get_reserved_space() | /* After this moment */ |
>> | --->ext4_get_reserved_space() | /* ext4_rsv != quota_ino_rsv */ |
>> | /* Read rsv from inode which | |
>> | ->dquot_free_reserved_space() | |
>> | /* quota_rsv goes negative */ | |
>> | | |
>> | | dquot_free_reserved_space() |
>> | | /* finally dec ext4_ino_rsv */ |
>>
>> So, in order to protect us from this type of races we always have to
>> provides ext4_ino_rsv == quot_ino_rsv guarantee. And this is only
>> possible then i_block_reservation_lock is taken before entering any
>> quota operations.
>>
>> In fact i_block_reservation_lock is held by ext4_da_reserve_space()
>> while calling vfs_dq_reserve_block(). Lock are held in following order
>> i_block_reservation_lock > dqptr_sem
>>
>> This may result in deadlock because of different lock ordering:
>> ext4_da_reserve_space() dquot_transfer()
>> lock(i_block_reservation_lock) down_write(dqptr_sem)
>> down_write(dqptr_sem) lock(i_block_reservation_lock)
>>
>> But this not happen only because both callers must have i_mutex so
>> serialization happens on i_mutex.
>
>
> But that down_write can sleep right ?
Absolutely right. I've fixed an issue, but overlooked the BIGGEST one.
So off course my patch is wrong, even if we will acquire lock in
different order " dqptr_sem > i_block_reservation_lock"
we sill getting in to sleeping spin lock problems by following scenario:
ext4_da_update_reserve_space()
->dquot_claim_space()
ASSUMES that we hold i_block_reservation_lock here.
-->mark_dquot_dirty()
--->ext4_write_dquot()
if (journalled quota) ext4_write_dquot();
---->dquot_commit()
----->mutex_lock(&dqopt->dqio_mutt's); <<< sleep here.
This means that we have fully redesign quota reservation locking.
As i already suggested previously here:
http://thread.gmane.org/gmane.comp.file-systems.ext4/16576/focus=16587
* Introduce i_block analog for generic reserved space management:
We may introduce i_rsv_block field in generic intone, it protected
by i_lock(similar to i_block).
Introduce inc/dec/set/get methods similar to inode_get_bytes,
inode_sub_bytes.. .
This value is managed internally by quota code. Perform reservation
management inside devout_reserve_space, dquot_release_reservation
without interfering with fs internals, as we do for i_blocks.
So locking sequence will looks like follows
ext4_XXX_space()
->spin_lock(&i_block_reservation_lock)
->// update ext4_rsv fields
->spin_unlock(&i_block_reservation_lock)
->vfs_XXX_space()
-->down_XXX(&dqptr_sem)
--->inode_XXX_reserved_bytes << analog for inode_XXX_bytes()
---->spin_lock(&inode->i_lock)
---->// update inode->i_rsv_block (and inode->i_block if necessary)
---->spin_unlock(&inode->i_lock)
--->mark_dirty()
IMHO this is best way because:
1)This is the only sane way to fix #14739
2)This brings to well defined VFS interface for reserved space management.
I'll prepare the patch soon.
>
> For example:
> http://bugzilla.kernel.org/show_bug.cgi?id=14739
>
> -aneesh
> LocalWords: rsv inode dquot
next prev parent reply other threads:[~2009-12-07 19:41 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-11-25 6:57 [PATCH 1/4] ext4: ext4_get_reserved_space() must return bytes instead of blocks Dmitry Monakhov
2009-11-25 6:57 ` [PATCH 2/4] ext4: fix reserved space transferring on chown() [V2] Dmitry Monakhov
2009-11-25 6:57 ` [PATCH 3/4] ext4: quota macros cleanup Dmitry Monakhov
2009-11-25 6:57 ` [PATCH 4/4] ext4: fix incorrect block reservation on quota transfer Dmitry Monakhov
2009-12-08 1:02 ` Mingming
2009-12-08 6:48 ` Dmitry Monakhov
2009-12-08 0:59 ` [PATCH 3/4] ext4: quota macros cleanup Mingming
2009-12-07 17:18 ` [PATCH 2/4] ext4: fix reserved space transferring on chown() [V2] Aneesh Kumar K.V
2009-12-07 19:41 ` Dmitry Monakhov [this message]
2009-12-09 1:42 ` tytso
2009-12-09 2:03 ` Dmitry Monakhov
2009-12-08 23:06 ` Mingming
2009-11-25 16:32 ` [PATCH 1/4] ext4: ext4_get_reserved_space() must return bytes instead of blocks Eric Sandeen
2009-12-08 1:04 ` Mingming
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87ocmalgh0.fsf@openvz.org \
--to=dmonakhov@openvz.org \
--cc=aneesh.kumar@linux.vnet.ibm.com \
--cc=linux-ext4@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.