linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dmitry Monakhov <dmonakhov@openvz.org>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: linux-ext4@vger.kernel.org
Subject: Re: [PATCH 2/4] ext4: fix reserved space transferring on chown() [V2]
Date: Mon, 07 Dec 2009 22:41:15 +0300	[thread overview]
Message-ID: <87ocmalgh0.fsf@openvz.org> (raw)
In-Reply-To: <20091207171835.GB16078@skywalker.linux.vnet.ibm.com>

"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> writes:

> On Wed, Nov 25, 2009 at 09:57:39AM +0300, Dmitry Monakhov wrote:
>> Currently all quota's functions except vfs_dq_reserve_block()
>> called without i_block_reservation_lock. This result in
>> ext4_reservation vs quota_reservation inconsistency which provoke
>> incorrect reservation transfer ==> incorrect quota after transfer.
>> 
>> Race (1)
>> | Task 1 (chown)                 | Task 2 (truncate)                    |
>> | dquot_transfer                 |                                      |
>> | ->down_write(dqptr_sem)        | ext4_da_release_spac                 |
>> | -->dquot_get_reserved_space    | ->lock(i_block_reservation_lock)     |
>> | --->get_reserved_space         | /* decrement reservation */          |
>> | ---->ext4_get_reserved_space   | ->unlock(i_block_reservation_lock)   |
>> | ----->lock(i_block_rsv_lock)   | /* During this time window           |
>> | /* Read ext4_rsv from inode */ | * fs's reservation not equals        |
>> | /* transfer it to new quota */ | * to quota's */                      |
>> | ->up_write(dqptr_sem)          | ->vfs_dq_release_reservation_block() |
>> |                                | /* quota_rsv goes negative here */   |
>> |                                |                                      |
>> 
>> Race (2)
>> | Task 1 (chown)                | Task 2 (flush-8:16)             |
>> | dquot_transfer()              | ext4_mb_mark_diskspace_used()   |
>> | ->down_write(dqptr_sem)       | ->vfs_dq_claim_block()          |
>> | --->get_reserved_space()      | /* After this moment         */ |
>> | --->ext4_get_reserved_space() | /* ext4_rsv != quota_ino_rsv */ |
>> | /* Read rsv from inode which  |                                 |
>> | ->dquot_free_reserved_space() |                                 |
>> | /* quota_rsv goes negative */ |                                 |
>> |                               |                                 |
>> |                               | dquot_free_reserved_space()     |
>> |                               | /* finally dec ext4_ino_rsv */  |
>> 
>> So, in order to protect us from this type of races we always have to
>> provides ext4_ino_rsv == quot_ino_rsv guarantee. And this is only
>> possible then i_block_reservation_lock is taken before entering any
>> quota operations.
>> 
>> In fact i_block_reservation_lock is held by ext4_da_reserve_space()
>> while calling vfs_dq_reserve_block(). Lock are held in following order
>> i_block_reservation_lock > dqptr_sem
>> 
>> This may result in deadlock because of different lock ordering:
>> ext4_da_reserve_space()             dquot_transfer()
>> lock(i_block_reservation_lock)	     down_write(dqptr_sem)
>>   down_write(dqptr_sem)		     lock(i_block_reservation_lock)
>> 
>> But this not happen only because both callers must have i_mutex so
>> serialization happens on i_mutex.
>
>
> But that down_write can sleep right ? 
Absolutely right. I've fixed an issue, but overlooked the BIGGEST one.
So off course my patch is wrong, even if we will acquire lock in
different order " dqptr_sem > i_block_reservation_lock"
we sill getting in to sleeping  spin lock problems by following scenario:
ext4_da_update_reserve_space()
->dquot_claim_space()
ASSUMES that we hold i_block_reservation_lock here.
-->mark_dquot_dirty()
--->ext4_write_dquot()
    if (journalled quota) ext4_write_dquot();
---->dquot_commit()
----->mutex_lock(&dqopt->dqio_mutt's); <<< sleep here.

This means that we have fully redesign quota reservation locking. 
As i already suggested previously here:
http://thread.gmane.org/gmane.comp.file-systems.ext4/16576/focus=16587 

* Introduce i_block analog for generic reserved space management:
We may introduce i_rsv_block field in generic intone, it protected
by i_lock(similar to i_block).
Introduce inc/dec/set/get methods similar to inode_get_bytes,
inode_sub_bytes.. . 
This value is managed internally by quota code. Perform reservation
management inside devout_reserve_space, dquot_release_reservation
without interfering with fs internals, as we do for i_blocks.
So locking sequence will looks like follows
ext4_XXX_space()
->spin_lock(&i_block_reservation_lock)
->// update ext4_rsv fields 
->spin_unlock(&i_block_reservation_lock)
->vfs_XXX_space()
-->down_XXX(&dqptr_sem)
--->inode_XXX_reserved_bytes << analog for inode_XXX_bytes()
---->spin_lock(&inode->i_lock)
---->// update inode->i_rsv_block (and inode->i_block if necessary)
---->spin_unlock(&inode->i_lock)
--->mark_dirty()
IMHO this is best way because:
1)This is the only sane way to fix #14739
2)This brings to well defined VFS interface for reserved space management.

I'll prepare the patch soon.
>
> For example:
> http://bugzilla.kernel.org/show_bug.cgi?id=14739
>
> -aneesh

>  LocalWords:  rsv inode dquot

  reply	other threads:[~2009-12-07 19:41 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-11-25  6:57 [PATCH 1/4] ext4: ext4_get_reserved_space() must return bytes instead of blocks Dmitry Monakhov
2009-11-25  6:57 ` [PATCH 2/4] ext4: fix reserved space transferring on chown() [V2] Dmitry Monakhov
2009-11-25  6:57   ` [PATCH 3/4] ext4: quota macros cleanup Dmitry Monakhov
2009-11-25  6:57     ` [PATCH 4/4] ext4: fix incorrect block reservation on quota transfer Dmitry Monakhov
2009-12-08  1:02       ` Mingming
2009-12-08  6:48         ` Dmitry Monakhov
2009-12-08  0:59     ` [PATCH 3/4] ext4: quota macros cleanup Mingming
2009-12-07 17:18   ` [PATCH 2/4] ext4: fix reserved space transferring on chown() [V2] Aneesh Kumar K.V
2009-12-07 19:41     ` Dmitry Monakhov [this message]
2009-12-09  1:42       ` tytso
2009-12-09  2:03         ` Dmitry Monakhov
2009-12-08 23:06   ` Mingming
2009-11-25 16:32 ` [PATCH 1/4] ext4: ext4_get_reserved_space() must return bytes instead of blocks Eric Sandeen
2009-12-08  1:04 ` Mingming

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87ocmalgh0.fsf@openvz.org \
    --to=dmonakhov@openvz.org \
    --cc=aneesh.kumar@linux.vnet.ibm.com \
    --cc=linux-ext4@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).