From: Theodore Ts'o <tytso@mit.edu>
To: stable@kernel.org
Cc: Ext4 Developers List <linux-ext4@vger.kernel.org>,
Dmitry Monakhov <dmonakhov@openvz.org>,
Theodore Ts'o <tytso@mit.edu>, Jan Kara <jack@suse.cz>
Subject: [PATCH v2.6.32.y 01/53] ext4: Fix potential quota deadlock
Date: Sun, 30 May 2010 22:49:14 -0400
Message-ID: <1275274206-3900-1-git-send-email-tytso@mit.edu>
From: Dmitry Monakhov <dmonakhov@openvz.org>
commit d21cd8f163ac44b15c465aab7306db931c606908 upstream (as of v2.6.33-rc2)
We have to delay vfs_dq_claim_space() until allocation context destruction.
Currently we have the following call trace:
ext4_mb_new_blocks()
  /* task is already holding ac->alloc_semp */
  ->ext4_mb_mark_diskspace_used
      ->vfs_dq_claim_space()  /* acquire dqptr_sem here. Possible deadlock */
  ->ext4_mb_release_context() /* drop ac->alloc_semp here */
Let's move quota claiming to ext4_da_update_reserve_space()
=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.32-rc7 #18
-------------------------------------------------------
write-truncate-/3465 is trying to acquire lock:
(&s->s_dquot.dqptr_sem){++++..}, at: [<c025e73b>] dquot_claim_space+0x3b/0x1b0
but task is already holding lock:
(&meta_group_info[i]->alloc_sem){++++..}, at: [<c02ce962>] ext4_mb_load_buddy+0xb2/0x370
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #3 (&meta_group_info[i]->alloc_sem){++++..}:
[<c017d04b>] __lock_acquire+0xd7b/0x1260
[<c017d5ea>] lock_acquire+0xba/0xd0
[<c0527191>] down_read+0x51/0x90
[<c02ce962>] ext4_mb_load_buddy+0xb2/0x370
[<c02d0c1c>] ext4_mb_free_blocks+0x46c/0x870
[<c029c9d3>] ext4_free_blocks+0x73/0x130
[<c02c8cfc>] ext4_ext_truncate+0x76c/0x8d0
[<c02a8087>] ext4_truncate+0x187/0x5e0
[<c01e0f7b>] vmtruncate+0x6b/0x70
[<c022ec02>] inode_setattr+0x62/0x190
[<c02a2d7a>] ext4_setattr+0x25a/0x370
[<c022ee81>] notify_change+0x151/0x340
[<c021349d>] do_truncate+0x6d/0xa0
[<c0221034>] may_open+0x1d4/0x200
[<c022412b>] do_filp_open+0x1eb/0x910
[<c021244d>] do_sys_open+0x6d/0x140
[<c021258e>] sys_open+0x2e/0x40
[<c0103100>] sysenter_do_call+0x12/0x32
-> #2 (&ei->i_data_sem){++++..}:
[<c017d04b>] __lock_acquire+0xd7b/0x1260
[<c017d5ea>] lock_acquire+0xba/0xd0
[<c0527191>] down_read+0x51/0x90
[<c02a5787>] ext4_get_blocks+0x47/0x450
[<c02a74c1>] ext4_getblk+0x61/0x1d0
[<c02a7a7f>] ext4_bread+0x1f/0xa0
[<c02bcddc>] ext4_quota_write+0x12c/0x310
[<c0262d23>] qtree_write_dquot+0x93/0x120
[<c0261708>] v2_write_dquot+0x28/0x30
[<c025d3fb>] dquot_commit+0xab/0xf0
[<c02be977>] ext4_write_dquot+0x77/0x90
[<c02be9bf>] ext4_mark_dquot_dirty+0x2f/0x50
[<c025e321>] dquot_alloc_inode+0x101/0x180
[<c029fec2>] ext4_new_inode+0x602/0xf00
[<c02ad789>] ext4_create+0x89/0x150
[<c0221ff2>] vfs_create+0xa2/0xc0
[<c02246e7>] do_filp_open+0x7a7/0x910
[<c021244d>] do_sys_open+0x6d/0x140
[<c021258e>] sys_open+0x2e/0x40
[<c0103100>] sysenter_do_call+0x12/0x32
-> #1 (&sb->s_type->i_mutex_key#7/4){+.+...}:
[<c017d04b>] __lock_acquire+0xd7b/0x1260
[<c017d5ea>] lock_acquire+0xba/0xd0
[<c0526505>] mutex_lock_nested+0x65/0x2d0
[<c0260c9d>] vfs_load_quota_inode+0x4bd/0x5a0
[<c02610af>] vfs_quota_on_path+0x5f/0x70
[<c02bc812>] ext4_quota_on+0x112/0x190
[<c026345a>] sys_quotactl+0x44a/0x8a0
[<c0103100>] sysenter_do_call+0x12/0x32
-> #0 (&s->s_dquot.dqptr_sem){++++..}:
[<c017d361>] __lock_acquire+0x1091/0x1260
[<c017d5ea>] lock_acquire+0xba/0xd0
[<c0527191>] down_read+0x51/0x90
[<c025e73b>] dquot_claim_space+0x3b/0x1b0
[<c02cb95f>] ext4_mb_mark_diskspace_used+0x36f/0x380
[<c02d210a>] ext4_mb_new_blocks+0x34a/0x530
[<c02c83fb>] ext4_ext_get_blocks+0x122b/0x13c0
[<c02a5966>] ext4_get_blocks+0x226/0x450
[<c02a5ff3>] mpage_da_map_blocks+0xc3/0xaa0
[<c02a6ed6>] ext4_da_writepages+0x506/0x790
[<c01de272>] do_writepages+0x22/0x50
[<c01d766d>] __filemap_fdatawrite_range+0x6d/0x80
[<c01d7b9b>] filemap_flush+0x2b/0x30
[<c02a40ac>] ext4_alloc_da_blocks+0x5c/0x60
[<c029e595>] ext4_release_file+0x75/0xb0
[<c0216b59>] __fput+0xf9/0x210
[<c0216c97>] fput+0x27/0x30
[<c02122dc>] filp_close+0x4c/0x80
[<c014510e>] put_files_struct+0x6e/0xd0
[<c01451b7>] exit_files+0x47/0x60
[<c0146a24>] do_exit+0x144/0x710
[<c0147028>] do_group_exit+0x38/0xa0
[<c0159abc>] get_signal_to_deliver+0x2ac/0x410
[<c0102849>] do_notify_resume+0xb9/0x890
[<c01032d2>] work_notifysig+0x13/0x21
other info that might help us debug this:
3 locks held by write-truncate-/3465:
#0: (jbd2_handle){+.+...}, at: [<c02e1f8f>] start_this_handle+0x38f/0x5c0
#1: (&ei->i_data_sem){++++..}, at: [<c02a57f6>] ext4_get_blocks+0xb6/0x450
#2: (&meta_group_info[i]->alloc_sem){++++..}, at: [<c02ce962>] ext4_mb_load_buddy+0xb2/0x370
stack backtrace:
Pid: 3465, comm: write-truncate- Not tainted 2.6.32-rc7 #18
Call Trace:
[<c0524cb3>] ? printk+0x1d/0x22
[<c017ac9a>] print_circular_bug+0xca/0xd0
[<c017d361>] __lock_acquire+0x1091/0x1260
[<c016bca2>] ? sched_clock_local+0xd2/0x170
[<c0178fd0>] ? trace_hardirqs_off_caller+0x20/0xd0
[<c017d5ea>] lock_acquire+0xba/0xd0
[<c025e73b>] ? dquot_claim_space+0x3b/0x1b0
[<c0527191>] down_read+0x51/0x90
[<c025e73b>] ? dquot_claim_space+0x3b/0x1b0
[<c025e73b>] dquot_claim_space+0x3b/0x1b0
[<c02cb95f>] ext4_mb_mark_diskspace_used+0x36f/0x380
[<c02d210a>] ext4_mb_new_blocks+0x34a/0x530
[<c02c601d>] ? ext4_ext_find_extent+0x25d/0x280
[<c02c83fb>] ext4_ext_get_blocks+0x122b/0x13c0
[<c016bca2>] ? sched_clock_local+0xd2/0x170
[<c016be60>] ? sched_clock_cpu+0x120/0x160
[<c016beef>] ? cpu_clock+0x4f/0x60
[<c0178fd0>] ? trace_hardirqs_off_caller+0x20/0xd0
[<c052712c>] ? down_write+0x8c/0xa0
[<c02a5966>] ext4_get_blocks+0x226/0x450
[<c016be60>] ? sched_clock_cpu+0x120/0x160
[<c016beef>] ? cpu_clock+0x4f/0x60
[<c017908b>] ? trace_hardirqs_off+0xb/0x10
[<c02a5ff3>] mpage_da_map_blocks+0xc3/0xaa0
[<c01d69cc>] ? find_get_pages_tag+0x16c/0x180
[<c01d6860>] ? find_get_pages_tag+0x0/0x180
[<c02a73bd>] ? __mpage_da_writepage+0x16d/0x1a0
[<c01dfc4e>] ? pagevec_lookup_tag+0x2e/0x40
[<c01ddf1b>] ? write_cache_pages+0xdb/0x3d0
[<c02a7250>] ? __mpage_da_writepage+0x0/0x1a0
[<c02a6ed6>] ext4_da_writepages+0x506/0x790
[<c016beef>] ? cpu_clock+0x4f/0x60
[<c016bca2>] ? sched_clock_local+0xd2/0x170
[<c016be60>] ? sched_clock_cpu+0x120/0x160
[<c016be60>] ? sched_clock_cpu+0x120/0x160
[<c02a69d0>] ? ext4_da_writepages+0x0/0x790
[<c01de272>] do_writepages+0x22/0x50
[<c01d766d>] __filemap_fdatawrite_range+0x6d/0x80
[<c01d7b9b>] filemap_flush+0x2b/0x30
[<c02a40ac>] ext4_alloc_da_blocks+0x5c/0x60
[<c029e595>] ext4_release_file+0x75/0xb0
[<c0216b59>] __fput+0xf9/0x210
[<c0216c97>] fput+0x27/0x30
[<c02122dc>] filp_close+0x4c/0x80
[<c014510e>] put_files_struct+0x6e/0xd0
[<c01451b7>] exit_files+0x47/0x60
[<c0146a24>] do_exit+0x144/0x710
[<c017b163>] ? lock_release_holdtime+0x33/0x210
[<c0528137>] ? _spin_unlock_irq+0x27/0x30
[<c0147028>] do_group_exit+0x38/0xa0
[<c017babb>] ? trace_hardirqs_on+0xb/0x10
[<c0159abc>] get_signal_to_deliver+0x2ac/0x410
[<c0102849>] do_notify_resume+0xb9/0x890
[<c0178fd0>] ? trace_hardirqs_off_caller+0x20/0xd0
[<c017b163>] ? lock_release_holdtime+0x33/0x210
[<c0165b50>] ? autoremove_wake_function+0x0/0x50
[<c017ba54>] ? trace_hardirqs_on_caller+0x134/0x190
[<c017babb>] ? trace_hardirqs_on+0xb/0x10
[<c0300ba4>] ? security_file_permission+0x14/0x20
[<c0215761>] ? vfs_write+0x131/0x190
[<c0214f50>] ? do_sync_write+0x0/0x120
[<c0103115>] ? sysenter_do_call+0x27/0x32
[<c01032d2>] work_notifysig+0x13/0x21
CC: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/ext4/inode.c | 9 +++++++--
fs/ext4/mballoc.c | 6 ------
2 files changed, 7 insertions(+), 8 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 16efcee..9b81b76 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1088,7 +1088,7 @@ static int ext4_calc_metadata_amount(struct inode *inode, int blocks)
static void ext4_da_update_reserve_space(struct inode *inode, int used)
{
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
- int total, mdb, mdb_free;
+ int total, mdb, mdb_free, mdb_claim = 0;
spin_lock(&EXT4_I(inode)->i_block_reservation_lock);
/* recalculate the number of metablocks still need to be reserved */
@@ -1101,7 +1101,9 @@ static void ext4_da_update_reserve_space(struct inode *inode, int used)
if (mdb_free) {
/* Account for allocated meta_blocks */
- mdb_free -= EXT4_I(inode)->i_allocated_meta_blocks;
+ mdb_claim = EXT4_I(inode)->i_allocated_meta_blocks;
+ BUG_ON(mdb_free < mdb_claim);
+ mdb_free -= mdb_claim;
/* update fs dirty blocks counter */
percpu_counter_sub(&sbi->s_dirtyblocks_counter, mdb_free);
@@ -1112,8 +1114,11 @@ static void ext4_da_update_reserve_space(struct inode *inode, int used)
/* update per-inode reservations */
BUG_ON(used > EXT4_I(inode)->i_reserved_data_blocks);
EXT4_I(inode)->i_reserved_data_blocks -= used;
+ percpu_counter_sub(&sbi->s_dirtyblocks_counter, used + mdb_claim);
spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
+ vfs_dq_claim_block(inode, used + mdb_claim);
+
/*
* free those over-booking quota for metadata blocks
*/
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 7d71148..82b9778 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2755,12 +2755,6 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
if (!(ac->ac_flags & EXT4_MB_DELALLOC_RESERVED))
/* release all the reserved blocks if non delalloc */
percpu_counter_sub(&sbi->s_dirtyblocks_counter, reserv_blks);
- else {
- percpu_counter_sub(&sbi->s_dirtyblocks_counter,
- ac->ac_b_ex.fe_len);
- /* convert reserved quota blocks to real quota blocks */
- vfs_dq_claim_block(ac->ac_inode, ac->ac_b_ex.fe_len);
- }
if (sbi->s_log_groups_per_flex) {
ext4_group_t flex_group = ext4_flex_group(sbi,
--
1.6.6.1.1.g974db.dirty