* [PATCH 0/5] Ext4 quota fixes for 2.6.31 kernel @ 2009-12-23 17:27 Jan Kara 2009-12-23 17:27 ` [PATCH 1/5] Add unlocked version of inode_add_bytes() function Jan Kara ` (4 more replies) 0 siblings, 5 replies; 9+ messages in thread From: Jan Kara @ 2009-12-23 17:27 UTC (permalink / raw) To: stable; +Cc: linux-ext4, tytso The following five patches fix deadlocks, quota miscomputations, and such in ext4 implementation of quota for 2.6.31 kernel. Honza ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 1/5] Add unlocked version of inode_add_bytes() function 2009-12-23 17:27 [PATCH 0/5] Ext4 quota fixes for 2.6.31 kernel Jan Kara @ 2009-12-23 17:27 ` Jan Kara 2009-12-23 17:27 ` [PATCH 2/5] quota: decouple fs reserved space from quota reservation Jan Kara ` (3 subsequent siblings) 4 siblings, 0 replies; 9+ messages in thread From: Jan Kara @ 2009-12-23 17:27 UTC (permalink / raw) To: stable; +Cc: linux-ext4, tytso, Dmitry Monakhov, Jan Kara From: Dmitry Monakhov <dmonakhov@openvz.org> Quota code requires unlocked version of this function. Off course we can just copy-paste the code, but copy-pasting is always an evil. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jan Kara <jack@suse.cz> --- fs/stat.c | 10 ++++++++-- include/linux/fs.h | 1 + 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/fs/stat.c b/fs/stat.c index 075694e..c4ecd52 100644 --- a/fs/stat.c +++ b/fs/stat.c @@ -401,9 +401,9 @@ SYSCALL_DEFINE4(fstatat64, int, dfd, char __user *, filename, } #endif /* __ARCH_WANT_STAT64 */ -void inode_add_bytes(struct inode *inode, loff_t bytes) +/* Caller is here responsible for sufficient locking (ie. inode->i_lock) */ +void __inode_add_bytes(struct inode *inode, loff_t bytes) { - spin_lock(&inode->i_lock); inode->i_blocks += bytes >> 9; bytes &= 511; inode->i_bytes += bytes; @@ -411,6 +411,12 @@ void inode_add_bytes(struct inode *inode, loff_t bytes) inode->i_blocks++; inode->i_bytes -= 512; } +} + +void inode_add_bytes(struct inode *inode, loff_t bytes) +{ + spin_lock(&inode->i_lock); + __inode_add_bytes(inode, bytes); spin_unlock(&inode->i_lock); } diff --git a/include/linux/fs.h b/include/linux/fs.h index 73e9b64..e2eeaa5 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2296,6 +2296,7 @@ extern const struct inode_operations page_symlink_inode_operations; extern int generic_readlink(struct dentry *, char __user *, int); extern void generic_fillattr(struct inode *, struct kstat *); extern int vfs_getattr(struct vfsmount *, struct dentry *, struct kstat *); +void __inode_add_bytes(struct inode *inode, loff_t bytes); void inode_add_bytes(struct inode *inode, loff_t bytes); void inode_sub_bytes(struct inode *inode, loff_t bytes); loff_t inode_get_bytes(struct inode *inode); -- 1.6.4.2 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 2/5] quota: decouple fs reserved space from quota reservation 2009-12-23 17:27 [PATCH 0/5] Ext4 quota fixes for 2.6.31 kernel Jan Kara 2009-12-23 17:27 ` [PATCH 1/5] Add unlocked version of inode_add_bytes() function Jan Kara @ 2009-12-23 17:27 ` Jan Kara 2010-01-11 17:53 ` Eric Sandeen 2009-12-23 17:27 ` [PATCH 3/5] ext4: Convert to generic reserved quota's space management Jan Kara ` (2 subsequent siblings) 4 siblings, 1 reply; 9+ messages in thread From: Jan Kara @ 2009-12-23 17:27 UTC (permalink / raw) To: stable; +Cc: linux-ext4, tytso, Dmitry Monakhov, Jan Kara From: Dmitry Monakhov <dmonakhov@openvz.org> Currently inode_reservation is managed by fs itself and this reservation is transfered on dquot_transfer(). This means what inode_reservation must always be in sync with dquot->dq_dqb.dqb_rsvspace. Otherwise dquot_transfer() will result in incorrect quota(WARN_ON in dquot_claim_reserved_space() will be triggered) This is not easy because of complex locking order issues for example http://bugzilla.kernel.org/show_bug.cgi?id=14739 The patch introduce quota reservation field for each fs-inode (fs specific inode is used in order to prevent bloating generic vfs inode). This reservation is managed by quota code internally similar to i_blocks/i_bytes and may not be always in sync with internal fs reservation. Also perform some code rearrangement: - Unify dquot_reserve_space() and dquot_reserve_space() - Unify dquot_release_reserved_space() and dquot_free_space() - Also this patch add missing warning update to release_rsv() dquot_release_reserved_space() must call flush_warnings() as dquot_free_space() does. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jan Kara <jack@suse.cz> --- fs/quota/dquot.c | 215 ++++++++++++++++++++++++++-------------------- include/linux/quota.h | 5 +- include/linux/quotaops.h | 11 ++- 3 files changed, 133 insertions(+), 98 deletions(-) diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c index 38f7bd5..6b12046 100644 --- a/fs/quota/dquot.c +++ b/fs/quota/dquot.c @@ -1388,6 +1388,69 @@ void vfs_dq_drop(struct inode *inode) EXPORT_SYMBOL(vfs_dq_drop); /* + * inode_reserved_space is managed internally by quota, and protected by + * i_lock similar to i_blocks+i_bytes. + */ +static qsize_t *inode_reserved_space(struct inode * inode) +{ + /* Filesystem must explicitly define it's own method in order to use + * quota reservation interface */ + BUG_ON(!inode->i_sb->dq_op->get_reserved_space); + return inode->i_sb->dq_op->get_reserved_space(inode); +} + +void inode_add_rsv_space(struct inode *inode, qsize_t number) +{ + spin_lock(&inode->i_lock); + *inode_reserved_space(inode) += number; + spin_unlock(&inode->i_lock); +} +EXPORT_SYMBOL(inode_add_rsv_space); + +void inode_claim_rsv_space(struct inode *inode, qsize_t number) +{ + spin_lock(&inode->i_lock); + *inode_reserved_space(inode) -= number; + __inode_add_bytes(inode, number); + spin_unlock(&inode->i_lock); +} +EXPORT_SYMBOL(inode_claim_rsv_space); + +void inode_sub_rsv_space(struct inode *inode, qsize_t number) +{ + spin_lock(&inode->i_lock); + *inode_reserved_space(inode) -= number; + spin_unlock(&inode->i_lock); +} +EXPORT_SYMBOL(inode_sub_rsv_space); + +static qsize_t inode_get_rsv_space(struct inode *inode) +{ + qsize_t ret; + spin_lock(&inode->i_lock); + ret = *inode_reserved_space(inode); + spin_unlock(&inode->i_lock); + return ret; +} + +static void inode_incr_space(struct inode *inode, qsize_t number, + int reserve) +{ + if (reserve) + inode_add_rsv_space(inode, number); + else + inode_add_bytes(inode, number); +} + +static void inode_decr_space(struct inode *inode, qsize_t number, int reserve) +{ + if (reserve) + inode_sub_rsv_space(inode, number); + else + inode_sub_bytes(inode, number); +} + +/* * Following four functions update i_blocks+i_bytes fields and * quota information (together with appropriate checks) * NOTE: We absolutely rely on the fact that caller dirties @@ -1405,6 +1468,21 @@ int __dquot_alloc_space(struct inode *inode, qsize_t number, int cnt, ret = QUOTA_OK; char warntype[MAXQUOTAS]; + /* + * First test before acquiring mutex - solves deadlocks when we + * re-enter the quota code and are already holding the mutex + */ + if (IS_NOQUOTA(inode)) { + inode_incr_space(inode, number, reserve); + goto out; + } + + down_read(&sb_dqopt(inode->i_sb)->dqptr_sem); + if (IS_NOQUOTA(inode)) { + inode_incr_space(inode, number, reserve); + goto out_unlock; + } + for (cnt = 0; cnt < MAXQUOTAS; cnt++) warntype[cnt] = QUOTA_NL_NOWARN; @@ -1415,7 +1493,8 @@ int __dquot_alloc_space(struct inode *inode, qsize_t number, if (check_bdq(inode->i_dquot[cnt], number, warn, warntype+cnt) == NO_QUOTA) { ret = NO_QUOTA; - goto out_unlock; + spin_unlock(&dq_data_lock); + goto out_flush_warn; } } for (cnt = 0; cnt < MAXQUOTAS; cnt++) { @@ -1426,64 +1505,32 @@ int __dquot_alloc_space(struct inode *inode, qsize_t number, else dquot_incr_space(inode->i_dquot[cnt], number); } - if (!reserve) - inode_add_bytes(inode, number); -out_unlock: + inode_incr_space(inode, number, reserve); spin_unlock(&dq_data_lock); - flush_warnings(inode->i_dquot, warntype); - return ret; -} - -int dquot_alloc_space(struct inode *inode, qsize_t number, int warn) -{ - int cnt, ret = QUOTA_OK; - - /* - * First test before acquiring mutex - solves deadlocks when we - * re-enter the quota code and are already holding the mutex - */ - if (IS_NOQUOTA(inode)) { - inode_add_bytes(inode, number); - goto out; - } - - down_read(&sb_dqopt(inode->i_sb)->dqptr_sem); - if (IS_NOQUOTA(inode)) { - inode_add_bytes(inode, number); - goto out_unlock; - } - - ret = __dquot_alloc_space(inode, number, warn, 0); - if (ret == NO_QUOTA) - goto out_unlock; + if (reserve) + goto out_flush_warn; /* Dirtify all the dquots - this can block when journalling */ for (cnt = 0; cnt < MAXQUOTAS; cnt++) if (inode->i_dquot[cnt]) mark_dquot_dirty(inode->i_dquot[cnt]); +out_flush_warn: + flush_warnings(inode->i_dquot, warntype); out_unlock: up_read(&sb_dqopt(inode->i_sb)->dqptr_sem); out: return ret; } + +int dquot_alloc_space(struct inode *inode, qsize_t number, int warn) +{ + return __dquot_alloc_space(inode, number, warn, 0); +} EXPORT_SYMBOL(dquot_alloc_space); int dquot_reserve_space(struct inode *inode, qsize_t number, int warn) { - int ret = QUOTA_OK; - - if (IS_NOQUOTA(inode)) - goto out; - - down_read(&sb_dqopt(inode->i_sb)->dqptr_sem); - if (IS_NOQUOTA(inode)) - goto out_unlock; - - ret = __dquot_alloc_space(inode, number, warn, 1); -out_unlock: - up_read(&sb_dqopt(inode->i_sb)->dqptr_sem); -out: - return ret; + return __dquot_alloc_space(inode, number, warn, 1); } EXPORT_SYMBOL(dquot_reserve_space); @@ -1540,14 +1587,14 @@ int dquot_claim_space(struct inode *inode, qsize_t number) int ret = QUOTA_OK; if (IS_NOQUOTA(inode)) { - inode_add_bytes(inode, number); + inode_claim_rsv_space(inode, number); goto out; } down_read(&sb_dqopt(inode->i_sb)->dqptr_sem); if (IS_NOQUOTA(inode)) { up_read(&sb_dqopt(inode->i_sb)->dqptr_sem); - inode_add_bytes(inode, number); + inode_claim_rsv_space(inode, number); goto out; } @@ -1559,7 +1606,7 @@ int dquot_claim_space(struct inode *inode, qsize_t number) number); } /* Update inode bytes */ - inode_add_bytes(inode, number); + inode_claim_rsv_space(inode, number); spin_unlock(&dq_data_lock); /* Dirtify all the dquots - this can block when journalling */ for (cnt = 0; cnt < MAXQUOTAS; cnt++) @@ -1572,38 +1619,9 @@ out: EXPORT_SYMBOL(dquot_claim_space); /* - * Release reserved quota space - */ -void dquot_release_reserved_space(struct inode *inode, qsize_t number) -{ - int cnt; - - if (IS_NOQUOTA(inode)) - goto out; - - down_read(&sb_dqopt(inode->i_sb)->dqptr_sem); - if (IS_NOQUOTA(inode)) - goto out_unlock; - - spin_lock(&dq_data_lock); - /* Release reserved dquots */ - for (cnt = 0; cnt < MAXQUOTAS; cnt++) { - if (inode->i_dquot[cnt]) - dquot_free_reserved_space(inode->i_dquot[cnt], number); - } - spin_unlock(&dq_data_lock); - -out_unlock: - up_read(&sb_dqopt(inode->i_sb)->dqptr_sem); -out: - return; -} -EXPORT_SYMBOL(dquot_release_reserved_space); - -/* * This operation can block, but only after everything is updated */ -int dquot_free_space(struct inode *inode, qsize_t number) +int __dquot_free_space(struct inode *inode, qsize_t number, int reserve) { unsigned int cnt; char warntype[MAXQUOTAS]; @@ -1612,7 +1630,7 @@ int dquot_free_space(struct inode *inode, qsize_t number) * re-enter the quota code and are already holding the mutex */ if (IS_NOQUOTA(inode)) { out_sub: - inode_sub_bytes(inode, number); + inode_decr_space(inode, number, reserve); return QUOTA_OK; } @@ -1627,21 +1645,43 @@ out_sub: if (!inode->i_dquot[cnt]) continue; warntype[cnt] = info_bdq_free(inode->i_dquot[cnt], number); - dquot_decr_space(inode->i_dquot[cnt], number); + if (reserve) + dquot_free_reserved_space(inode->i_dquot[cnt], number); + else + dquot_decr_space(inode->i_dquot[cnt], number); } - inode_sub_bytes(inode, number); + inode_decr_space(inode, number, reserve); spin_unlock(&dq_data_lock); + + if (reserve) + goto out_unlock; /* Dirtify all the dquots - this can block when journalling */ for (cnt = 0; cnt < MAXQUOTAS; cnt++) if (inode->i_dquot[cnt]) mark_dquot_dirty(inode->i_dquot[cnt]); +out_unlock: flush_warnings(inode->i_dquot, warntype); up_read(&sb_dqopt(inode->i_sb)->dqptr_sem); return QUOTA_OK; } + +int dquot_free_space(struct inode *inode, qsize_t number) +{ + return __dquot_free_space(inode, number, 0); +} EXPORT_SYMBOL(dquot_free_space); /* + * Release reserved quota space + */ +void dquot_release_reserved_space(struct inode *inode, qsize_t number) +{ + __dquot_free_space(inode, number, 1); + +} +EXPORT_SYMBOL(dquot_release_reserved_space); + +/* * This operation can block, but only after everything is updated */ int dquot_free_inode(const struct inode *inode, qsize_t number) @@ -1679,19 +1719,6 @@ int dquot_free_inode(const struct inode *inode, qsize_t number) EXPORT_SYMBOL(dquot_free_inode); /* - * call back function, get reserved quota space from underlying fs - */ -qsize_t dquot_get_reserved_space(struct inode *inode) -{ - qsize_t reserved_space = 0; - - if (sb_any_quota_active(inode->i_sb) && - inode->i_sb->dq_op->get_reserved_space) - reserved_space = inode->i_sb->dq_op->get_reserved_space(inode); - return reserved_space; -} - -/* * Transfer the number of inode and blocks from one diskquota to an other. * * This operation can block, but only after everything is updated @@ -1734,7 +1761,7 @@ int dquot_transfer(struct inode *inode, struct iattr *iattr) } spin_lock(&dq_data_lock); cur_space = inode_get_bytes(inode); - rsv_space = dquot_get_reserved_space(inode); + rsv_space = inode_get_rsv_space(inode); space = cur_space + rsv_space; /* Build the transfer_from list and check the limits */ for (cnt = 0; cnt < MAXQUOTAS; cnt++) { diff --git a/include/linux/quota.h b/include/linux/quota.h index 78c4889..8fd8efc 100644 --- a/include/linux/quota.h +++ b/include/linux/quota.h @@ -313,8 +313,9 @@ struct dquot_operations { int (*claim_space) (struct inode *, qsize_t); /* release rsved quota for delayed alloc */ void (*release_rsv) (struct inode *, qsize_t); - /* get reserved quota for delayed alloc */ - qsize_t (*get_reserved_space) (struct inode *); + /* get reserved quota for delayed alloc, value returned is managed by + * quota code only */ + qsize_t *(*get_reserved_space) (struct inode *); }; /* Operations handling requests from userspace */ diff --git a/include/linux/quotaops.h b/include/linux/quotaops.h index 26361c4..34b1f13 100644 --- a/include/linux/quotaops.h +++ b/include/linux/quotaops.h @@ -26,6 +26,10 @@ static inline void writeout_quota_sb(struct super_block *sb, int type) sb->s_qcop->quota_sync(sb, type); } +void inode_add_rsv_space(struct inode *inode, qsize_t number); +void inode_claim_rsv_space(struct inode *inode, qsize_t number); +void inode_sub_rsv_space(struct inode *inode, qsize_t number); + int dquot_initialize(struct inode *inode, int type); int dquot_drop(struct inode *inode); struct dquot *dqget(struct super_block *sb, unsigned int id, int type); @@ -42,7 +46,6 @@ int dquot_alloc_inode(const struct inode *inode, qsize_t number); int dquot_reserve_space(struct inode *inode, qsize_t number, int prealloc); int dquot_claim_space(struct inode *inode, qsize_t number); void dquot_release_reserved_space(struct inode *inode, qsize_t number); -qsize_t dquot_get_reserved_space(struct inode *inode); int dquot_free_space(struct inode *inode, qsize_t number); int dquot_free_inode(const struct inode *inode, qsize_t number); @@ -199,6 +202,8 @@ static inline int vfs_dq_reserve_space(struct inode *inode, qsize_t nr) if (inode->i_sb->dq_op->reserve_space(inode, nr, 0) == NO_QUOTA) return 1; } + else + inode_add_rsv_space(inode, nr); return 0; } @@ -221,7 +226,7 @@ static inline int vfs_dq_claim_space(struct inode *inode, qsize_t nr) if (inode->i_sb->dq_op->claim_space(inode, nr) == NO_QUOTA) return 1; } else - inode_add_bytes(inode, nr); + inode_claim_rsv_space(inode, nr); mark_inode_dirty(inode); return 0; @@ -235,6 +240,8 @@ void vfs_dq_release_reservation_space(struct inode *inode, qsize_t nr) { if (sb_any_quota_active(inode->i_sb)) inode->i_sb->dq_op->release_rsv(inode, nr); + else + inode_sub_rsv_space(inode, nr); } static inline void vfs_dq_free_space_nodirty(struct inode *inode, qsize_t nr) -- 1.6.4.2 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 2/5] quota: decouple fs reserved space from quota reservation 2009-12-23 17:27 ` [PATCH 2/5] quota: decouple fs reserved space from quota reservation Jan Kara @ 2010-01-11 17:53 ` Eric Sandeen 2010-01-11 18:00 ` Jan Kara 0 siblings, 1 reply; 9+ messages in thread From: Eric Sandeen @ 2010-01-11 17:53 UTC (permalink / raw) To: Jan Kara; +Cc: stable, linux-ext4, tytso, Dmitry Monakhov Jan Kara wrote: > From: Dmitry Monakhov <dmonakhov@openvz.org> > > Currently inode_reservation is managed by fs itself and this > reservation is transfered on dquot_transfer(). This means what > inode_reservation must always be in sync with > dquot->dq_dqb.dqb_rsvspace. Otherwise dquot_transfer() will result > in incorrect quota(WARN_ON in dquot_claim_reserved_space() will be > triggered) > This is not easy because of complex locking order issues > for example http://bugzilla.kernel.org/show_bug.cgi?id=14739 > > The patch introduce quota reservation field for each fs-inode > (fs specific inode is used in order to prevent bloating generic > vfs inode). This reservation is managed by quota code internally > similar to i_blocks/i_bytes and may not be always in sync with > internal fs reservation. > > Also perform some code rearrangement: > - Unify dquot_reserve_space() and dquot_reserve_space() > - Unify dquot_release_reserved_space() and dquot_free_space() > - Also this patch add missing warning update to release_rsv() > dquot_release_reserved_space() must call flush_warnings() as > dquot_free_space() does. > > Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> > Signed-off-by: Jan Kara <jack@suse.cz> > --- ... > @@ -1734,7 +1761,7 @@ int dquot_transfer(struct inode *inode, struct iattr *iattr) > } > spin_lock(&dq_data_lock); > cur_space = inode_get_bytes(inode); > - rsv_space = dquot_get_reserved_space(inode); > + rsv_space = inode_get_rsv_space(inode); > space = cur_space + rsv_space; > /* Build the transfer_from list and check the limits */ > for (cnt = 0; cnt < MAXQUOTAS; cnt++) { ... > /* > + * inode_reserved_space is managed internally by quota, and protected by > + * i_lock similar to i_blocks+i_bytes. > + */ > +static qsize_t *inode_reserved_space(struct inode * inode) > +{ > + /* Filesystem must explicitly define it's own method in order to use > + * quota reservation interface */ > + BUG_ON(!inode->i_sb->dq_op->get_reserved_space); Unless I'm missing something, this just broke quota for everyone except ext4 ... sys_chown ... dquot_transfer inode_get_rsv_space inode_reserved_space will BUG_ON ext3, we get there with (rightly) no ->get_reserved_space. Or am I missing something? -Eric ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/5] quota: decouple fs reserved space from quota reservation 2010-01-11 17:53 ` Eric Sandeen @ 2010-01-11 18:00 ` Jan Kara 0 siblings, 0 replies; 9+ messages in thread From: Jan Kara @ 2010-01-11 18:00 UTC (permalink / raw) To: Eric Sandeen; +Cc: Jan Kara, stable, linux-ext4, tytso, Dmitry Monakhov [-- Attachment #1: Type: text/plain, Size: 2654 bytes --] On Mon 11-01-10 11:53:19, Eric Sandeen wrote: > Jan Kara wrote: > > From: Dmitry Monakhov <dmonakhov@openvz.org> > > > > Currently inode_reservation is managed by fs itself and this > > reservation is transfered on dquot_transfer(). This means what > > inode_reservation must always be in sync with > > dquot->dq_dqb.dqb_rsvspace. Otherwise dquot_transfer() will result > > in incorrect quota(WARN_ON in dquot_claim_reserved_space() will be > > triggered) > > This is not easy because of complex locking order issues > > for example http://bugzilla.kernel.org/show_bug.cgi?id=14739 > > > > The patch introduce quota reservation field for each fs-inode > > (fs specific inode is used in order to prevent bloating generic > > vfs inode). This reservation is managed by quota code internally > > similar to i_blocks/i_bytes and may not be always in sync with > > internal fs reservation. > > > > Also perform some code rearrangement: > > - Unify dquot_reserve_space() and dquot_reserve_space() > > - Unify dquot_release_reserved_space() and dquot_free_space() > > - Also this patch add missing warning update to release_rsv() > > dquot_release_reserved_space() must call flush_warnings() as > > dquot_free_space() does. > > > > Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> > > Signed-off-by: Jan Kara <jack@suse.cz> > > --- > > ... > > > @@ -1734,7 +1761,7 @@ int dquot_transfer(struct inode *inode, struct > iattr *iattr) > > } > > spin_lock(&dq_data_lock); > > cur_space = inode_get_bytes(inode); > > - rsv_space = dquot_get_reserved_space(inode); > > + rsv_space = inode_get_rsv_space(inode); > > space = cur_space + rsv_space; > > /* Build the transfer_from list and check the limits */ > > for (cnt = 0; cnt < MAXQUOTAS; cnt++) { > > ... > > > /* > > + * inode_reserved_space is managed internally by quota, and protected by > > + * i_lock similar to i_blocks+i_bytes. > > + */ > > +static qsize_t *inode_reserved_space(struct inode * inode) > > +{ > > + /* Filesystem must explicitly define it's own method in order to use > > + * quota reservation interface */ > > + BUG_ON(!inode->i_sb->dq_op->get_reserved_space); > > Unless I'm missing something, this just broke quota for everyone > except ext4 ... > > sys_chown > ... > dquot_transfer > inode_get_rsv_space > inode_reserved_space > > will BUG_ON ext3, we get there with (rightly) no ->get_reserved_space. > > Or am I missing something? No, you are exactly right (sadly). Linus already has a pull request with the fix in his mailbox. The fix is attached if you need it for something. Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR [-- Attachment #2: 0001-quota-Fix-dquot_transfer-for-filesystems-different-f.patch --] [-- Type: text/x-patch, Size: 1242 bytes --] >From 05b5d898235401c489c68e1f3bc5706a29ad5713 Mon Sep 17 00:00:00 2001 From: Jan Kara <jack@suse.cz> Date: Wed, 6 Jan 2010 18:03:36 +0100 Subject: [PATCH] quota: Fix dquot_transfer for filesystems different from ext4 Commit fd8fbfc1 modified the way we find amount of reserved space belonging to an inode. The amount of reserved space is checked from dquot_transfer and thus inode_reserved_space gets called even for filesystems that don't provide get_reserved_space callback which results in a BUG. Fix the problem by checking get_reserved_space callback and return 0 if the filesystem does not provide it. CC: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jan Kara <jack@suse.cz> --- fs/quota/dquot.c | 3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c index dea86ab..3fc62b0 100644 --- a/fs/quota/dquot.c +++ b/fs/quota/dquot.c @@ -1377,6 +1377,9 @@ static void inode_sub_rsv_space(struct inode *inode, qsize_t number) static qsize_t inode_get_rsv_space(struct inode *inode) { qsize_t ret; + + if (!inode->i_sb->dq_op->get_reserved_space) + return 0; spin_lock(&inode->i_lock); ret = *inode_reserved_space(inode); spin_unlock(&inode->i_lock); -- 1.6.4.2 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 3/5] ext4: Convert to generic reserved quota's space management. 2009-12-23 17:27 [PATCH 0/5] Ext4 quota fixes for 2.6.31 kernel Jan Kara 2009-12-23 17:27 ` [PATCH 1/5] Add unlocked version of inode_add_bytes() function Jan Kara 2009-12-23 17:27 ` [PATCH 2/5] quota: decouple fs reserved space from quota reservation Jan Kara @ 2009-12-23 17:27 ` Jan Kara 2009-12-23 17:27 ` [PATCH 4/5] ext4: Fix potential quota deadlock Jan Kara 2009-12-23 17:27 ` [PATCH 5/5] ext4: fix sleep inside spinlock issue with quota and dealloc (#14739) Jan Kara 4 siblings, 0 replies; 9+ messages in thread From: Jan Kara @ 2009-12-23 17:27 UTC (permalink / raw) To: stable; +Cc: linux-ext4, tytso, Dmitry Monakhov, Jan Kara From: Dmitry Monakhov <dmonakhov@openvz.org> This patch also fixes write vs chown race condition. Acked-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jan Kara <jack@suse.cz> --- fs/ext4/ext4.h | 6 +++++- fs/ext4/inode.c | 17 ++++++++--------- fs/ext4/super.c | 5 +++++ 3 files changed, 18 insertions(+), 10 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 9714db3..daff7d1 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -653,6 +653,10 @@ struct ext4_inode_info { __u16 i_extra_isize; spinlock_t i_block_reservation_lock; +#ifdef CONFIG_QUOTA + /* quota space reservation, managed internally by quota code */ + qsize_t i_reserved_quota; +#endif }; /* @@ -1377,7 +1381,7 @@ extern int ext4_chunk_trans_blocks(struct inode *, int nrblocks); extern int ext4_block_truncate_page(handle_t *handle, struct address_space *mapping, loff_t from); extern int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf); -extern qsize_t ext4_get_reserved_space(struct inode *inode); +extern qsize_t *ext4_get_reserved_space(struct inode *inode); /* ioctl.c */ extern long ext4_ioctl(struct file *, unsigned int, unsigned long); diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index f9c642b..620ccd5 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -1020,17 +1020,13 @@ out: return err; } -qsize_t ext4_get_reserved_space(struct inode *inode) +#ifdef CONFIG_QUOTA +qsize_t *ext4_get_reserved_space(struct inode *inode) { - unsigned long long total; - - spin_lock(&EXT4_I(inode)->i_block_reservation_lock); - total = EXT4_I(inode)->i_reserved_data_blocks + - EXT4_I(inode)->i_reserved_meta_blocks; - spin_unlock(&EXT4_I(inode)->i_block_reservation_lock); - - return total; + return &EXT4_I(inode)->i_reserved_quota; } +#endif + /* * Calculate the number of metadata blocks need to reserve * to allocate @blocks for non extent file based file @@ -4369,6 +4365,9 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino) ((__u64)le16_to_cpu(raw_inode->i_file_acl_high)) << 32; inode->i_size = ext4_isize(raw_inode); ei->i_disksize = inode->i_size; +#ifdef CONFIG_QUOTA + ei->i_reserved_quota = 0; +#endif inode->i_generation = le32_to_cpu(raw_inode->i_generation); ei->i_block_group = iloc.block_group; ei->i_last_alloc_group = ~0; diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 8f4f079..f281c5d 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -682,6 +682,9 @@ static struct inode *ext4_alloc_inode(struct super_block *sb) ei->i_allocated_meta_blocks = 0; ei->i_delalloc_reserved_flag = 0; spin_lock_init(&(ei->i_block_reservation_lock)); +#ifdef CONFIG_QUOTA + ei->i_reserved_quota = 0; +#endif return &ei->vfs_inode; } @@ -969,7 +972,9 @@ static struct dquot_operations ext4_quota_operations = { .reserve_space = dquot_reserve_space, .claim_space = dquot_claim_space, .release_rsv = dquot_release_reserved_space, +#ifdef CONFIG_QUOTA .get_reserved_space = ext4_get_reserved_space, +#endif .alloc_inode = dquot_alloc_inode, .free_space = dquot_free_space, .free_inode = dquot_free_inode, -- 1.6.4.2 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 4/5] ext4: Fix potential quota deadlock 2009-12-23 17:27 [PATCH 0/5] Ext4 quota fixes for 2.6.31 kernel Jan Kara ` (2 preceding siblings ...) 2009-12-23 17:27 ` [PATCH 3/5] ext4: Convert to generic reserved quota's space management Jan Kara @ 2009-12-23 17:27 ` Jan Kara 2009-12-23 17:27 ` [PATCH 5/5] ext4: fix sleep inside spinlock issue with quota and dealloc (#14739) Jan Kara 4 siblings, 0 replies; 9+ messages in thread From: Jan Kara @ 2009-12-23 17:27 UTC (permalink / raw) To: stable; +Cc: linux-ext4, tytso, Dmitry Monakhov, Jan Kara From: Dmitry Monakhov <dmonakhov@openvz.org> We have to delay vfs_dq_claim_space() until allocation context destruction. Currently we have following call-trace: ext4_mb_new_blocks() /* task is already holding ac->alloc_semp */ ->ext4_mb_mark_diskspace_used ->vfs_dq_claim_space() /* acquire dqptr_sem here. Possible deadlock */ ->ext4_mb_release_context() /* drop ac->alloc_semp here */ Let's move quota claiming to ext4_da_update_reserve_space() ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.32-rc7 #18 ------------------------------------------------------- write-truncate-/3465 is trying to acquire lock: (&s->s_dquot.dqptr_sem){++++..}, at: [<c025e73b>] dquot_claim_space+0x3b/0x1b0 but task is already holding lock: (&meta_group_info[i]->alloc_sem){++++..}, at: [<c02ce962>] ext4_mb_load_buddy+0xb2/0x370 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (&meta_group_info[i]->alloc_sem){++++..}: [<c017d04b>] __lock_acquire+0xd7b/0x1260 [<c017d5ea>] lock_acquire+0xba/0xd0 [<c0527191>] down_read+0x51/0x90 [<c02ce962>] ext4_mb_load_buddy+0xb2/0x370 [<c02d0c1c>] ext4_mb_free_blocks+0x46c/0x870 [<c029c9d3>] ext4_free_blocks+0x73/0x130 [<c02c8cfc>] ext4_ext_truncate+0x76c/0x8d0 [<c02a8087>] ext4_truncate+0x187/0x5e0 [<c01e0f7b>] vmtruncate+0x6b/0x70 [<c022ec02>] inode_setattr+0x62/0x190 [<c02a2d7a>] ext4_setattr+0x25a/0x370 [<c022ee81>] notify_change+0x151/0x340 [<c021349d>] do_truncate+0x6d/0xa0 [<c0221034>] may_open+0x1d4/0x200 [<c022412b>] do_filp_open+0x1eb/0x910 [<c021244d>] do_sys_open+0x6d/0x140 [<c021258e>] sys_open+0x2e/0x40 [<c0103100>] sysenter_do_call+0x12/0x32 -> #2 (&ei->i_data_sem){++++..}: [<c017d04b>] __lock_acquire+0xd7b/0x1260 [<c017d5ea>] lock_acquire+0xba/0xd0 [<c0527191>] down_read+0x51/0x90 [<c02a5787>] ext4_get_blocks+0x47/0x450 [<c02a74c1>] ext4_getblk+0x61/0x1d0 [<c02a7a7f>] ext4_bread+0x1f/0xa0 [<c02bcddc>] ext4_quota_write+0x12c/0x310 [<c0262d23>] qtree_write_dquot+0x93/0x120 [<c0261708>] v2_write_dquot+0x28/0x30 [<c025d3fb>] dquot_commit+0xab/0xf0 [<c02be977>] ext4_write_dquot+0x77/0x90 [<c02be9bf>] ext4_mark_dquot_dirty+0x2f/0x50 [<c025e321>] dquot_alloc_inode+0x101/0x180 [<c029fec2>] ext4_new_inode+0x602/0xf00 [<c02ad789>] ext4_create+0x89/0x150 [<c0221ff2>] vfs_create+0xa2/0xc0 [<c02246e7>] do_filp_open+0x7a7/0x910 [<c021244d>] do_sys_open+0x6d/0x140 [<c021258e>] sys_open+0x2e/0x40 [<c0103100>] sysenter_do_call+0x12/0x32 -> #1 (&sb->s_type->i_mutex_key#7/4){+.+...}: [<c017d04b>] __lock_acquire+0xd7b/0x1260 [<c017d5ea>] lock_acquire+0xba/0xd0 [<c0526505>] mutex_lock_nested+0x65/0x2d0 [<c0260c9d>] vfs_load_quota_inode+0x4bd/0x5a0 [<c02610af>] vfs_quota_on_path+0x5f/0x70 [<c02bc812>] ext4_quota_on+0x112/0x190 [<c026345a>] sys_quotactl+0x44a/0x8a0 [<c0103100>] sysenter_do_call+0x12/0x32 -> #0 (&s->s_dquot.dqptr_sem){++++..}: [<c017d361>] __lock_acquire+0x1091/0x1260 [<c017d5ea>] lock_acquire+0xba/0xd0 [<c0527191>] down_read+0x51/0x90 [<c025e73b>] dquot_claim_space+0x3b/0x1b0 [<c02cb95f>] ext4_mb_mark_diskspace_used+0x36f/0x380 [<c02d210a>] ext4_mb_new_blocks+0x34a/0x530 [<c02c83fb>] ext4_ext_get_blocks+0x122b/0x13c0 [<c02a5966>] ext4_get_blocks+0x226/0x450 [<c02a5ff3>] mpage_da_map_blocks+0xc3/0xaa0 [<c02a6ed6>] ext4_da_writepages+0x506/0x790 [<c01de272>] do_writepages+0x22/0x50 [<c01d766d>] __filemap_fdatawrite_range+0x6d/0x80 [<c01d7b9b>] filemap_flush+0x2b/0x30 [<c02a40ac>] ext4_alloc_da_blocks+0x5c/0x60 [<c029e595>] ext4_release_file+0x75/0xb0 [<c0216b59>] __fput+0xf9/0x210 [<c0216c97>] fput+0x27/0x30 [<c02122dc>] filp_close+0x4c/0x80 [<c014510e>] put_files_struct+0x6e/0xd0 [<c01451b7>] exit_files+0x47/0x60 [<c0146a24>] do_exit+0x144/0x710 [<c0147028>] do_group_exit+0x38/0xa0 [<c0159abc>] get_signal_to_deliver+0x2ac/0x410 [<c0102849>] do_notify_resume+0xb9/0x890 [<c01032d2>] work_notifysig+0x13/0x21 other info that might help us debug this: 3 locks held by write-truncate-/3465: #0: (jbd2_handle){+.+...}, at: [<c02e1f8f>] start_this_handle+0x38f/0x5c0 #1: (&ei->i_data_sem){++++..}, at: [<c02a57f6>] ext4_get_blocks+0xb6/0x450 #2: (&meta_group_info[i]->alloc_sem){++++..}, at: [<c02ce962>] ext4_mb_load_buddy+0xb2/0x370 stack backtrace: Pid: 3465, comm: write-truncate- Not tainted 2.6.32-rc7 #18 Call Trace: [<c0524cb3>] ? printk+0x1d/0x22 [<c017ac9a>] print_circular_bug+0xca/0xd0 [<c017d361>] __lock_acquire+0x1091/0x1260 [<c016bca2>] ? sched_clock_local+0xd2/0x170 [<c0178fd0>] ? trace_hardirqs_off_caller+0x20/0xd0 [<c017d5ea>] lock_acquire+0xba/0xd0 [<c025e73b>] ? dquot_claim_space+0x3b/0x1b0 [<c0527191>] down_read+0x51/0x90 [<c025e73b>] ? dquot_claim_space+0x3b/0x1b0 [<c025e73b>] dquot_claim_space+0x3b/0x1b0 [<c02cb95f>] ext4_mb_mark_diskspace_used+0x36f/0x380 [<c02d210a>] ext4_mb_new_blocks+0x34a/0x530 [<c02c601d>] ? ext4_ext_find_extent+0x25d/0x280 [<c02c83fb>] ext4_ext_get_blocks+0x122b/0x13c0 [<c016bca2>] ? sched_clock_local+0xd2/0x170 [<c016be60>] ? sched_clock_cpu+0x120/0x160 [<c016beef>] ? cpu_clock+0x4f/0x60 [<c0178fd0>] ? trace_hardirqs_off_caller+0x20/0xd0 [<c052712c>] ? down_write+0x8c/0xa0 [<c02a5966>] ext4_get_blocks+0x226/0x450 [<c016be60>] ? sched_clock_cpu+0x120/0x160 [<c016beef>] ? cpu_clock+0x4f/0x60 [<c017908b>] ? trace_hardirqs_off+0xb/0x10 [<c02a5ff3>] mpage_da_map_blocks+0xc3/0xaa0 [<c01d69cc>] ? find_get_pages_tag+0x16c/0x180 [<c01d6860>] ? find_get_pages_tag+0x0/0x180 [<c02a73bd>] ? __mpage_da_writepage+0x16d/0x1a0 [<c01dfc4e>] ? pagevec_lookup_tag+0x2e/0x40 [<c01ddf1b>] ? write_cache_pages+0xdb/0x3d0 [<c02a7250>] ? __mpage_da_writepage+0x0/0x1a0 [<c02a6ed6>] ext4_da_writepages+0x506/0x790 [<c016beef>] ? cpu_clock+0x4f/0x60 [<c016bca2>] ? sched_clock_local+0xd2/0x170 [<c016be60>] ? sched_clock_cpu+0x120/0x160 [<c016be60>] ? sched_clock_cpu+0x120/0x160 [<c02a69d0>] ? ext4_da_writepages+0x0/0x790 [<c01de272>] do_writepages+0x22/0x50 [<c01d766d>] __filemap_fdatawrite_range+0x6d/0x80 [<c01d7b9b>] filemap_flush+0x2b/0x30 [<c02a40ac>] ext4_alloc_da_blocks+0x5c/0x60 [<c029e595>] ext4_release_file+0x75/0xb0 [<c0216b59>] __fput+0xf9/0x210 [<c0216c97>] fput+0x27/0x30 [<c02122dc>] filp_close+0x4c/0x80 [<c014510e>] put_files_struct+0x6e/0xd0 [<c01451b7>] exit_files+0x47/0x60 [<c0146a24>] do_exit+0x144/0x710 [<c017b163>] ? lock_release_holdtime+0x33/0x210 [<c0528137>] ? _spin_unlock_irq+0x27/0x30 [<c0147028>] do_group_exit+0x38/0xa0 [<c017babb>] ? trace_hardirqs_on+0xb/0x10 [<c0159abc>] get_signal_to_deliver+0x2ac/0x410 [<c0102849>] do_notify_resume+0xb9/0x890 [<c0178fd0>] ? trace_hardirqs_off_caller+0x20/0xd0 [<c017b163>] ? lock_release_holdtime+0x33/0x210 [<c0165b50>] ? autoremove_wake_function+0x0/0x50 [<c017ba54>] ? trace_hardirqs_on_caller+0x134/0x190 [<c017babb>] ? trace_hardirqs_on+0xb/0x10 [<c0300ba4>] ? security_file_permission+0x14/0x20 [<c0215761>] ? vfs_write+0x131/0x190 [<c0214f50>] ? do_sync_write+0x0/0x120 [<c0103115>] ? sysenter_do_call+0x27/0x32 [<c01032d2>] work_notifysig+0x13/0x21 CC: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jan Kara <jack@suse.cz> --- fs/ext4/inode.c | 9 +++++++-- fs/ext4/mballoc.c | 6 ------ 2 files changed, 7 insertions(+), 8 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 620ccd5..9307f4b 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -1064,7 +1064,7 @@ static int ext4_calc_metadata_amount(struct inode *inode, int blocks) static void ext4_da_update_reserve_space(struct inode *inode, int used) { struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); - int total, mdb, mdb_free; + int total, mdb, mdb_free, mdb_claim = 0; spin_lock(&EXT4_I(inode)->i_block_reservation_lock); /* recalculate the number of metablocks still need to be reserved */ @@ -1077,7 +1077,9 @@ static void ext4_da_update_reserve_space(struct inode *inode, int used) if (mdb_free) { /* Account for allocated meta_blocks */ - mdb_free -= EXT4_I(inode)->i_allocated_meta_blocks; + mdb_claim = EXT4_I(inode)->i_allocated_meta_blocks; + BUG_ON(mdb_free < mdb_claim); + mdb_free -= mdb_claim; /* update fs dirty blocks counter */ percpu_counter_sub(&sbi->s_dirtyblocks_counter, mdb_free); @@ -1088,8 +1090,11 @@ static void ext4_da_update_reserve_space(struct inode *inode, int used) /* update per-inode reservations */ BUG_ON(used > EXT4_I(inode)->i_reserved_data_blocks); EXT4_I(inode)->i_reserved_data_blocks -= used; + percpu_counter_sub(&sbi->s_dirtyblocks_counter, used + mdb_claim); spin_unlock(&EXT4_I(inode)->i_block_reservation_lock); + vfs_dq_claim_block(inode, used + mdb_claim); + /* * free those over-booking quota for metadata blocks */ diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index cd25846..bb14b05 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -3018,12 +3018,6 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac, if (!(ac->ac_flags & EXT4_MB_DELALLOC_RESERVED)) /* release all the reserved blocks if non delalloc */ percpu_counter_sub(&sbi->s_dirtyblocks_counter, reserv_blks); - else { - percpu_counter_sub(&sbi->s_dirtyblocks_counter, - ac->ac_b_ex.fe_len); - /* convert reserved quota blocks to real quota blocks */ - vfs_dq_claim_block(ac->ac_inode, ac->ac_b_ex.fe_len); - } if (sbi->s_log_groups_per_flex) { ext4_group_t flex_group = ext4_flex_group(sbi, -- 1.6.4.2 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 5/5] ext4: fix sleep inside spinlock issue with quota and dealloc (#14739) 2009-12-23 17:27 [PATCH 0/5] Ext4 quota fixes for 2.6.31 kernel Jan Kara ` (3 preceding siblings ...) 2009-12-23 17:27 ` [PATCH 4/5] ext4: Fix potential quota deadlock Jan Kara @ 2009-12-23 17:27 ` Jan Kara 4 siblings, 0 replies; 9+ messages in thread From: Jan Kara @ 2009-12-23 17:27 UTC (permalink / raw) To: stable; +Cc: linux-ext4, tytso, Dmitry Monakhov, Jan Kara From: Dmitry Monakhov <dmonakhov@openvz.org> Unlock i_block_reservation_lock before vfs_dq_reserve_block(). This patch fixes http://bugzilla.kernel.org/show_bug.cgi?id=14739 CC: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jan Kara <jack@suse.cz> --- fs/ext4/inode.c | 11 +++++------ 1 files changed, 5 insertions(+), 6 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 9307f4b..6b7735f 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -1764,19 +1764,17 @@ repeat: md_needed = mdblocks - EXT4_I(inode)->i_reserved_meta_blocks; total = md_needed + nrblocks; + spin_unlock(&EXT4_I(inode)->i_block_reservation_lock); /* * Make quota reservation here to prevent quota overflow * later. Real quota accounting is done at pages writeout * time. */ - if (vfs_dq_reserve_block(inode, total)) { - spin_unlock(&EXT4_I(inode)->i_block_reservation_lock); + if (vfs_dq_reserve_block(inode, total)) return -EDQUOT; - } if (ext4_claim_free_blocks(sbi, total)) { - spin_unlock(&EXT4_I(inode)->i_block_reservation_lock); if (ext4_should_retry_alloc(inode->i_sb, &retries)) { yield(); goto repeat; @@ -1784,10 +1782,11 @@ repeat: vfs_dq_release_reservation_block(inode, total); return -ENOSPC; } + spin_lock(&EXT4_I(inode)->i_block_reservation_lock); EXT4_I(inode)->i_reserved_data_blocks += nrblocks; - EXT4_I(inode)->i_reserved_meta_blocks = mdblocks; ^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 0/5] Ext4 quota fixes for 2.6.32 -stable kernel @ 2009-12-23 17:25 Jan Kara 2009-12-23 17:25 ` [PATCH 2/5] quota: decouple fs reserved space from quota reservation Jan Kara 0 siblings, 1 reply; 9+ messages in thread From: Jan Kara @ 2009-12-23 17:25 UTC (permalink / raw) To: stable; +Cc: linux-ext4, tytso The following five patches fix deadlocks, miscomputations, and similar problems in ext4 implementation of quota. Honza ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 2/5] quota: decouple fs reserved space from quota reservation 2009-12-23 17:25 [PATCH 0/5] Ext4 quota fixes for 2.6.32 -stable kernel Jan Kara @ 2009-12-23 17:25 ` Jan Kara 0 siblings, 0 replies; 9+ messages in thread From: Jan Kara @ 2009-12-23 17:25 UTC (permalink / raw) To: stable; +Cc: linux-ext4, tytso, Dmitry Monakhov, Jan Kara From: Dmitry Monakhov <dmonakhov@openvz.org> Currently inode_reservation is managed by fs itself and this reservation is transfered on dquot_transfer(). This means what inode_reservation must always be in sync with dquot->dq_dqb.dqb_rsvspace. Otherwise dquot_transfer() will result in incorrect quota(WARN_ON in dquot_claim_reserved_space() will be triggered) This is not easy because of complex locking order issues for example http://bugzilla.kernel.org/show_bug.cgi?id=14739 The patch introduce quota reservation field for each fs-inode (fs specific inode is used in order to prevent bloating generic vfs inode). This reservation is managed by quota code internally similar to i_blocks/i_bytes and may not be always in sync with internal fs reservation. Also perform some code rearrangement: - Unify dquot_reserve_space() and dquot_reserve_space() - Unify dquot_release_reserved_space() and dquot_free_space() - Also this patch add missing warning update to release_rsv() dquot_release_reserved_space() must call flush_warnings() as dquot_free_space() does. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jan Kara <jack@suse.cz> --- fs/quota/dquot.c | 215 ++++++++++++++++++++++++++-------------------- include/linux/quota.h | 5 +- include/linux/quotaops.h | 11 ++- 3 files changed, 133 insertions(+), 98 deletions(-) diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c index 39b49c4..658552d 100644 --- a/fs/quota/dquot.c +++ b/fs/quota/dquot.c @@ -1388,6 +1388,69 @@ void vfs_dq_drop(struct inode *inode) EXPORT_SYMBOL(vfs_dq_drop); /* + * inode_reserved_space is managed internally by quota, and protected by + * i_lock similar to i_blocks+i_bytes. + */ +static qsize_t *inode_reserved_space(struct inode * inode) +{ + /* Filesystem must explicitly define it's own method in order to use + * quota reservation interface */ + BUG_ON(!inode->i_sb->dq_op->get_reserved_space); + return inode->i_sb->dq_op->get_reserved_space(inode); +} + +void inode_add_rsv_space(struct inode *inode, qsize_t number) +{ + spin_lock(&inode->i_lock); + *inode_reserved_space(inode) += number; + spin_unlock(&inode->i_lock); +} +EXPORT_SYMBOL(inode_add_rsv_space); + +void inode_claim_rsv_space(struct inode *inode, qsize_t number) +{ + spin_lock(&inode->i_lock); + *inode_reserved_space(inode) -= number; + __inode_add_bytes(inode, number); + spin_unlock(&inode->i_lock); +} +EXPORT_SYMBOL(inode_claim_rsv_space); + +void inode_sub_rsv_space(struct inode *inode, qsize_t number) +{ + spin_lock(&inode->i_lock); + *inode_reserved_space(inode) -= number; + spin_unlock(&inode->i_lock); +} +EXPORT_SYMBOL(inode_sub_rsv_space); + +static qsize_t inode_get_rsv_space(struct inode *inode) +{ + qsize_t ret; + spin_lock(&inode->i_lock); + ret = *inode_reserved_space(inode); + spin_unlock(&inode->i_lock); + return ret; +} + +static void inode_incr_space(struct inode *inode, qsize_t number, + int reserve) +{ + if (reserve) + inode_add_rsv_space(inode, number); + else + inode_add_bytes(inode, number); +} + +static void inode_decr_space(struct inode *inode, qsize_t number, int reserve) +{ + if (reserve) + inode_sub_rsv_space(inode, number); + else + inode_sub_bytes(inode, number); +} + +/* * Following four functions update i_blocks+i_bytes fields and * quota information (together with appropriate checks) * NOTE: We absolutely rely on the fact that caller dirties @@ -1405,6 +1468,21 @@ int __dquot_alloc_space(struct inode *inode, qsize_t number, int cnt, ret = QUOTA_OK; char warntype[MAXQUOTAS]; + /* + * First test before acquiring mutex - solves deadlocks when we + * re-enter the quota code and are already holding the mutex + */ + if (IS_NOQUOTA(inode)) { + inode_incr_space(inode, number, reserve); + goto out; + } + + down_read(&sb_dqopt(inode->i_sb)->dqptr_sem); + if (IS_NOQUOTA(inode)) { + inode_incr_space(inode, number, reserve); + goto out_unlock; + } + for (cnt = 0; cnt < MAXQUOTAS; cnt++) warntype[cnt] = QUOTA_NL_NOWARN; @@ -1415,7 +1493,8 @@ int __dquot_alloc_space(struct inode *inode, qsize_t number, if (check_bdq(inode->i_dquot[cnt], number, warn, warntype+cnt) == NO_QUOTA) { ret = NO_QUOTA; - goto out_unlock; + spin_unlock(&dq_data_lock); + goto out_flush_warn; } } for (cnt = 0; cnt < MAXQUOTAS; cnt++) { @@ -1426,64 +1505,32 @@ int __dquot_alloc_space(struct inode *inode, qsize_t number, else dquot_incr_space(inode->i_dquot[cnt], number); } - if (!reserve) - inode_add_bytes(inode, number); -out_unlock: + inode_incr_space(inode, number, reserve); spin_unlock(&dq_data_lock); - flush_warnings(inode->i_dquot, warntype); - return ret; -} - -int dquot_alloc_space(struct inode *inode, qsize_t number, int warn) -{ - int cnt, ret = QUOTA_OK; - - /* - * First test before acquiring mutex - solves deadlocks when we - * re-enter the quota code and are already holding the mutex - */ - if (IS_NOQUOTA(inode)) { - inode_add_bytes(inode, number); - goto out; - } - - down_read(&sb_dqopt(inode->i_sb)->dqptr_sem); - if (IS_NOQUOTA(inode)) { - inode_add_bytes(inode, number); - goto out_unlock; - } - - ret = __dquot_alloc_space(inode, number, warn, 0); - if (ret == NO_QUOTA) - goto out_unlock; + if (reserve) + goto out_flush_warn; /* Dirtify all the dquots - this can block when journalling */ for (cnt = 0; cnt < MAXQUOTAS; cnt++) if (inode->i_dquot[cnt]) mark_dquot_dirty(inode->i_dquot[cnt]); +out_flush_warn: + flush_warnings(inode->i_dquot, warntype); out_unlock: up_read(&sb_dqopt(inode->i_sb)->dqptr_sem); out: return ret; } + +int dquot_alloc_space(struct inode *inode, qsize_t number, int warn) +{ + return __dquot_alloc_space(inode, number, warn, 0); +} EXPORT_SYMBOL(dquot_alloc_space); int dquot_reserve_space(struct inode *inode, qsize_t number, int warn) { - int ret = QUOTA_OK; - - if (IS_NOQUOTA(inode)) - goto out; - - down_read(&sb_dqopt(inode->i_sb)->dqptr_sem); - if (IS_NOQUOTA(inode)) - goto out_unlock; - - ret = __dquot_alloc_space(inode, number, warn, 1); -out_unlock: - up_read(&sb_dqopt(inode->i_sb)->dqptr_sem); -out: - return ret; + return __dquot_alloc_space(inode, number, warn, 1); } EXPORT_SYMBOL(dquot_reserve_space); @@ -1540,14 +1587,14 @@ int dquot_claim_space(struct inode *inode, qsize_t number) int ret = QUOTA_OK; if (IS_NOQUOTA(inode)) { - inode_add_bytes(inode, number); + inode_claim_rsv_space(inode, number); goto out; } down_read(&sb_dqopt(inode->i_sb)->dqptr_sem); if (IS_NOQUOTA(inode)) { up_read(&sb_dqopt(inode->i_sb)->dqptr_sem); - inode_add_bytes(inode, number); + inode_claim_rsv_space(inode, number); goto out; } @@ -1559,7 +1606,7 @@ int dquot_claim_space(struct inode *inode, qsize_t number) number); } /* Update inode bytes */ - inode_add_bytes(inode, number); + inode_claim_rsv_space(inode, number); spin_unlock(&dq_data_lock); /* Dirtify all the dquots - this can block when journalling */ for (cnt = 0; cnt < MAXQUOTAS; cnt++) @@ -1572,38 +1619,9 @@ out: EXPORT_SYMBOL(dquot_claim_space); /* - * Release reserved quota space - */ -void dquot_release_reserved_space(struct inode *inode, qsize_t number) -{ - int cnt; - - if (IS_NOQUOTA(inode)) - goto out; - - down_read(&sb_dqopt(inode->i_sb)->dqptr_sem); - if (IS_NOQUOTA(inode)) - goto out_unlock; - - spin_lock(&dq_data_lock); - /* Release reserved dquots */ - for (cnt = 0; cnt < MAXQUOTAS; cnt++) { - if (inode->i_dquot[cnt]) - dquot_free_reserved_space(inode->i_dquot[cnt], number); - } - spin_unlock(&dq_data_lock); - -out_unlock: - up_read(&sb_dqopt(inode->i_sb)->dqptr_sem); -out: - return; -} -EXPORT_SYMBOL(dquot_release_reserved_space); - -/* * This operation can block, but only after everything is updated */ -int dquot_free_space(struct inode *inode, qsize_t number) +int __dquot_free_space(struct inode *inode, qsize_t number, int reserve) { unsigned int cnt; char warntype[MAXQUOTAS]; @@ -1612,7 +1630,7 @@ int dquot_free_space(struct inode *inode, qsize_t number) * re-enter the quota code and are already holding the mutex */ if (IS_NOQUOTA(inode)) { out_sub: - inode_sub_bytes(inode, number); + inode_decr_space(inode, number, reserve); return QUOTA_OK; } @@ -1627,21 +1645,43 @@ out_sub: if (!inode->i_dquot[cnt]) continue; warntype[cnt] = info_bdq_free(inode->i_dquot[cnt], number); - dquot_decr_space(inode->i_dquot[cnt], number); + if (reserve) + dquot_free_reserved_space(inode->i_dquot[cnt], number); + else + dquot_decr_space(inode->i_dquot[cnt], number); } - inode_sub_bytes(inode, number); + inode_decr_space(inode, number, reserve); spin_unlock(&dq_data_lock); + + if (reserve) + goto out_unlock; /* Dirtify all the dquots - this can block when journalling */ for (cnt = 0; cnt < MAXQUOTAS; cnt++) if (inode->i_dquot[cnt]) mark_dquot_dirty(inode->i_dquot[cnt]); +out_unlock: flush_warnings(inode->i_dquot, warntype); up_read(&sb_dqopt(inode->i_sb)->dqptr_sem); return QUOTA_OK; } + +int dquot_free_space(struct inode *inode, qsize_t number) +{ + return __dquot_free_space(inode, number, 0); +} EXPORT_SYMBOL(dquot_free_space); /* + * Release reserved quota space + */ +void dquot_release_reserved_space(struct inode *inode, qsize_t number) +{ + __dquot_free_space(inode, number, 1); + +} +EXPORT_SYMBOL(dquot_release_reserved_space); + +/* * This operation can block, but only after everything is updated */ int dquot_free_inode(const struct inode *inode, qsize_t number) @@ -1679,19 +1719,6 @@ int dquot_free_inode(const struct inode *inode, qsize_t number) EXPORT_SYMBOL(dquot_free_inode); /* - * call back function, get reserved quota space from underlying fs - */ -qsize_t dquot_get_reserved_space(struct inode *inode) -{ - qsize_t reserved_space = 0; - - if (sb_any_quota_active(inode->i_sb) && - inode->i_sb->dq_op->get_reserved_space) - reserved_space = inode->i_sb->dq_op->get_reserved_space(inode); - return reserved_space; -} - -/* * Transfer the number of inode and blocks from one diskquota to an other. * * This operation can block, but only after everything is updated @@ -1734,7 +1761,7 @@ int dquot_transfer(struct inode *inode, struct iattr *iattr) } spin_lock(&dq_data_lock); cur_space = inode_get_bytes(inode); - rsv_space = dquot_get_reserved_space(inode); + rsv_space = inode_get_rsv_space(inode); space = cur_space + rsv_space; /* Build the transfer_from list and check the limits */ for (cnt = 0; cnt < MAXQUOTAS; cnt++) { diff --git a/include/linux/quota.h b/include/linux/quota.h index 78c4889..8fd8efc 100644 --- a/include/linux/quota.h +++ b/include/linux/quota.h @@ -313,8 +313,9 @@ struct dquot_operations { int (*claim_space) (struct inode *, qsize_t); /* release rsved quota for delayed alloc */ void (*release_rsv) (struct inode *, qsize_t); - /* get reserved quota for delayed alloc */ - qsize_t (*get_reserved_space) (struct inode *); + /* get reserved quota for delayed alloc, value returned is managed by + * quota code only */ + qsize_t *(*get_reserved_space) (struct inode *); }; /* Operations handling requests from userspace */ diff --git a/include/linux/quotaops.h b/include/linux/quotaops.h index 3ebb231..a529d86 100644 --- a/include/linux/quotaops.h +++ b/include/linux/quotaops.h @@ -26,6 +26,10 @@ static inline void writeout_quota_sb(struct super_block *sb, int type) sb->s_qcop->quota_sync(sb, type); } +void inode_add_rsv_space(struct inode *inode, qsize_t number); +void inode_claim_rsv_space(struct inode *inode, qsize_t number); +void inode_sub_rsv_space(struct inode *inode, qsize_t number); + int dquot_initialize(struct inode *inode, int type); int dquot_drop(struct inode *inode); struct dquot *dqget(struct super_block *sb, unsigned int id, int type); @@ -42,7 +46,6 @@ int dquot_alloc_inode(const struct inode *inode, qsize_t number); int dquot_reserve_space(struct inode *inode, qsize_t number, int prealloc); int dquot_claim_space(struct inode *inode, qsize_t number); void dquot_release_reserved_space(struct inode *inode, qsize_t number); -qsize_t dquot_get_reserved_space(struct inode *inode); int dquot_free_space(struct inode *inode, qsize_t number); int dquot_free_inode(const struct inode *inode, qsize_t number); @@ -199,6 +202,8 @@ static inline int vfs_dq_reserve_space(struct inode *inode, qsize_t nr) if (inode->i_sb->dq_op->reserve_space(inode, nr, 0) == NO_QUOTA) return 1; } + else + inode_add_rsv_space(inode, nr); return 0; } @@ -221,7 +226,7 @@ static inline int vfs_dq_claim_space(struct inode *inode, qsize_t nr) if (inode->i_sb->dq_op->claim_space(inode, nr) == NO_QUOTA) return 1; } else - inode_add_bytes(inode, nr); + inode_claim_rsv_space(inode, nr); mark_inode_dirty(inode); return 0; @@ -235,6 +240,8 @@ void vfs_dq_release_reservation_space(struct inode *inode, qsize_t nr) { if (sb_any_quota_active(inode->i_sb)) inode->i_sb->dq_op->release_rsv(inode, nr); + else + inode_sub_rsv_space(inode, nr); } static inline void vfs_dq_free_space_nodirty(struct inode *inode, qsize_t nr) -- 1.6.4.2 ^ permalink raw reply related [flat|nested] 9+ messages in thread
end of thread, other threads:[~2010-01-11 17:59 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-12-23 17:27 [PATCH 0/5] Ext4 quota fixes for 2.6.31 kernel Jan Kara 2009-12-23 17:27 ` [PATCH 1/5] Add unlocked version of inode_add_bytes() function Jan Kara 2009-12-23 17:27 ` [PATCH 2/5] quota: decouple fs reserved space from quota reservation Jan Kara 2010-01-11 17:53 ` Eric Sandeen 2010-01-11 18:00 ` Jan Kara 2009-12-23 17:27 ` [PATCH 3/5] ext4: Convert to generic reserved quota's space management Jan Kara 2009-12-23 17:27 ` [PATCH 4/5] ext4: Fix potential quota deadlock Jan Kara 2009-12-23 17:27 ` [PATCH 5/5] ext4: fix sleep inside spinlock issue with quota and dealloc (#14739) Jan Kara -- strict thread matches above, loose matches on Subject: below -- 2009-12-23 17:25 [PATCH 0/5] Ext4 quota fixes for 2.6.32 -stable kernel Jan Kara 2009-12-23 17:25 ` [PATCH 2/5] quota: decouple fs reserved space from quota reservation Jan Kara
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).