From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org, stable@kernel.org
Cc: stable-review@kernel.org, torvalds@linux-foundation.org,
akpm@linux-foundation.org, alan@lxorguk.ukuu.org.uk,
Jan Kara <jack@suse.cz>, "Theodore Tso" <tytso@mit.edu>,
Greg Kroah-Hartman <gregkh@suse.de>
Subject: [09/90] ext4: Fix possible deadlock between ext4_truncate() and ext4_get_blocks()
Date: Thu, 10 Dec 2009 20:24:47 -0800 [thread overview]
Message-ID: <20091211042729.262525249@linux.site> (raw)
In-Reply-To: <20091211043502.GA17916@kroah.com>
[-- Attachment #1: 0009-ext4-Fix-possible-deadlock-between-ext4_truncate-and.patch --]
[-- Type: text/plain, Size: 4892 bytes --]
2.6.31-stable review patch. If anyone has any objections, please let us know.
------------------
During truncate we are sometimes forced to start a new transaction as
the amount of blocks to be journaled is both quite large and hard to
predict. So far we restarted a transaction while holding i_data_sem
and that violates lock ordering because i_data_sem ranks below a
transaction start (and it can lead to a real deadlock with
ext4_get_blocks() mapping blocks in some page while having a
transaction open).
(cherry picked from commit 487caeef9fc08c0565e082c40a8aaf58dad92bbb)
We fix the problem by dropping the i_data_sem before restarting the
transaction and acquire it afterwards. It's slightly subtle that this
works:
1) By the time ext4_truncate() is called, all the page cache for the
truncated part of the file is dropped so get_block() should not be
called on it (we only have to invalidate extent cache after we
reacquire i_data_sem because some extent from not-truncated part could
extend also into the part we are going to truncate).
2) Writes, migrate or defrag hold i_mutex so they are stopped for all
the time of the truncate.
This bug has been found and analyzed by Theodore Tso <tytso@mit.edu>.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
fs/ext4/ext4.h | 1 +
fs/ext4/extents.c | 15 ++++++++++++---
fs/ext4/inode.c | 23 +++++++++++++++++++----
3 files changed, 32 insertions(+), 7 deletions(-)
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1370,6 +1370,7 @@ extern int ext4_change_inode_journal_fla
extern int ext4_get_inode_loc(struct inode *, struct ext4_iloc *);
extern int ext4_can_truncate(struct inode *inode);
extern void ext4_truncate(struct inode *);
+extern int ext4_truncate_restart_trans(handle_t *, struct inode *, int nblocks);
extern void ext4_set_inode_flags(struct inode *);
extern void ext4_get_inode_flags(struct ext4_inode_info *);
extern int ext4_alloc_da_blocks(struct inode *inode);
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -93,7 +93,9 @@ static void ext4_idx_store_pblock(struct
ix->ei_leaf_hi = cpu_to_le16((unsigned long) ((pb >> 31) >> 1) & 0xffff);
}
-static int ext4_ext_journal_restart(handle_t *handle, int needed)
+static int ext4_ext_truncate_extend_restart(handle_t *handle,
+ struct inode *inode,
+ int needed)
{
int err;
@@ -104,7 +106,14 @@ static int ext4_ext_journal_restart(hand
err = ext4_journal_extend(handle, needed);
if (err <= 0)
return err;
- return ext4_journal_restart(handle, needed);
+ err = ext4_truncate_restart_trans(handle, inode, needed);
+ /*
+ * We have dropped i_data_sem so someone might have cached again
+ * an extent we are going to truncate.
+ */
+ ext4_ext_invalidate_cache(inode);
+
+ return err;
}
/*
@@ -2138,7 +2147,7 @@ ext4_ext_rm_leaf(handle_t *handle, struc
}
credits += 2 * EXT4_QUOTA_TRANS_BLOCKS(inode->i_sb);
- err = ext4_ext_journal_restart(handle, credits);
+ err = ext4_ext_truncate_extend_restart(handle, inode, credits);
if (err)
goto out;
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -192,11 +192,24 @@ static int try_to_extend_transaction(han
* so before we call here everything must be consistently dirtied against
* this transaction.
*/
-static int ext4_journal_test_restart(handle_t *handle, struct inode *inode)
+ int ext4_truncate_restart_trans(handle_t *handle, struct inode *inode,
+ int nblocks)
{
+ int ret;
+
+ /*
+ * Drop i_data_sem to avoid deadlock with ext4_get_blocks At this
+ * moment, get_block can be called only for blocks inside i_size since
+ * page cache has been already dropped and writes are blocked by
+ * i_mutex. So we can safely drop the i_data_sem here.
+ */
BUG_ON(EXT4_JOURNAL(inode) == NULL);
jbd_debug(2, "restarting handle %p\n", handle);
- return ext4_journal_restart(handle, blocks_for_truncate(inode));
+ up_write(&EXT4_I(inode)->i_data_sem);
+ ret = ext4_journal_restart(handle, blocks_for_truncate(inode));
+ down_write(&EXT4_I(inode)->i_data_sem);
+
+ return ret;
}
/*
@@ -3659,7 +3672,8 @@ static void ext4_clear_blocks(handle_t *
ext4_handle_dirty_metadata(handle, inode, bh);
}
ext4_mark_inode_dirty(handle, inode);
- ext4_journal_test_restart(handle, inode);
+ ext4_truncate_restart_trans(handle, inode,
+ blocks_for_truncate(inode));
if (bh) {
BUFFER_TRACE(bh, "retaking write access");
ext4_journal_get_write_access(handle, bh);
@@ -3870,7 +3884,8 @@ static void ext4_free_branches(handle_t
return;
if (try_to_extend_transaction(handle, inode)) {
ext4_mark_inode_dirty(handle, inode);
- ext4_journal_test_restart(handle, inode);
+ ext4_truncate_restart_trans(handle, inode,
+ blocks_for_truncate(inode));
}
ext4_free_blocks(handle, inode, nr, 1, 1);
next prev parent reply other threads:[~2009-12-11 4:36 UTC|newest]
Thread overview: 91+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20091211042438.970725457@linux.site>
2009-12-11 4:35 ` [00/90] 2.6.31.8-stable review Greg KH
2009-12-11 4:24 ` [01/90] ext4: Fix memory leak fix when mounting an ext4 filesystem Greg KH
2009-12-11 4:24 ` [02/90] ext4: Avoid null pointer dereference when decoding EROFS w/o a journal Greg KH
2009-12-11 4:24 ` [03/90] jbd2: Fail to load a journal if it is too short Greg KH
2009-12-11 4:24 ` [04/90] jbd2: round commit timer up to avoid uncommitted transaction Greg KH
2009-12-11 4:24 ` [05/90] ext4: fix journal ref count in move_extent_par_page Greg KH
2009-12-11 4:24 ` [06/90] ext4: Fix bugs in mballocs stream allocation mode Greg KH
2009-12-11 4:24 ` [07/90] ext4: Avoid group preallocation for closed files Greg KH
2009-12-11 4:24 ` [08/90] jbd2: Annotate transaction start also for jbd2_journal_restart() Greg KH
2009-12-11 4:24 ` Greg KH [this message]
2009-12-11 4:24 ` [10/90] ext4: reject too-large filesystems on 32-bit kernels Greg KH
2009-12-11 4:24 ` [11/90] ext4: Add feature set check helper for mount & remount paths Greg KH
2009-12-11 4:24 ` [12/90] ext4: Add missing unlock_new_inode() call in extent migration code Greg KH
2009-12-11 4:24 ` [13/90] ext4: Allow rename to create more than EXT4_LINK_MAX subdirectories Greg KH
2009-12-11 4:24 ` [14/90] ext4: Limit number of links that can be created by ext4_link() Greg KH
2009-12-11 4:24 ` [15/90] ext4: Restore wbc->range_start in ext4_da_writepages() Greg KH
2009-12-11 4:24 ` [16/90] ext4: fix cache flush in ext4_sync_file Greg KH
2009-12-11 4:24 ` [17/90] ext4: Fix wrong comparisons in mext_check_arguments() Greg KH
2009-12-11 4:24 ` [18/90] ext4: Remove unneeded BUG_ON() in ext4_move_extents() Greg KH
2009-12-11 4:24 ` [19/90] ext4: Return exchanged blocks count to user space in failure Greg KH
2009-12-11 4:24 ` [20/90] ext4: Take page lock before looking at attached buffer_heads flags Greg KH
2009-12-11 4:24 ` [21/90] ext4: print more sysadmin-friendly message in check_block_validity() Greg KH
2009-12-11 4:25 ` [22/90] ext4: Use bforget() in no journal mode for ext4_journal_{forget,revoke}() Greg KH
2009-12-11 4:25 ` [23/90] ext4: Assure that metadata blocks are written during fsync in no journal mode Greg KH
2009-12-11 4:25 ` [24/90] ext4: Make non-journal fsync work properly Greg KH
2009-12-11 4:25 ` [25/90] ext4: move ext4_mb_init_group() function earlier in the mballoc.c Greg KH
2009-12-11 4:25 ` [26/90] ext4: check for need init flag in ext4_mb_load_buddy Greg KH
2009-12-11 4:25 ` [27/90] ext4: Dont update superblock write time when filesystem is read-only Greg KH
2009-12-11 4:25 ` [28/90] ext4: Always set dx_nodes fake_dirent explicitly Greg KH
2009-12-11 4:25 ` [29/90] ext4: Fix initalization of s_flex_groups Greg KH
2009-12-11 4:25 ` [30/90] ext4: Fix include/trace/events/ext4.h to work with Systemtap Greg KH
2009-12-11 4:25 ` [31/90] ext4: Fix small typo for move_extent_per_page() Greg KH
2009-12-11 4:25 ` [32/90] ext4: Replace get_ext_path macro with an inline funciton Greg KH
2009-12-11 4:25 ` [33/90] ext4: Replace BUG_ON() with ext4_error() in move_extents.c Greg KH
2009-12-11 4:25 ` [34/90] ext4: Add null extent check to ext_get_path Greg KH
2009-12-11 4:25 ` [35/90] ext4: Fix different block exchange issue in EXT4_IOC_MOVE_EXT Greg KH
2009-12-11 4:25 ` [36/90] ext4: limit block allocations for indirect-block files to < 2^32 Greg KH
2009-12-11 4:25 ` [37/90] ext4: store EXT4_EXT_MIGRATE in i_state instead of i_flags Greg KH
2009-12-11 4:25 ` [38/90] ext4: Fix the alloc on close after a truncate hueristic Greg KH
2009-12-11 4:25 ` [39/90] ext4: Fix hueristic which avoids group preallocation for closed files Greg KH
2009-12-11 4:25 ` [40/90] ext4: Adjust ext4_da_writepages() to write out larger contiguous chunks Greg KH
2009-12-11 4:25 ` [41/90] ext4: release reserved quota when block reservation for delalloc retry Greg KH
2009-12-11 4:25 ` [42/90] ext4: Split uninitialized extents for direct I/O Greg KH
2009-12-11 4:25 ` [43/90] ext4: Use end_io callback to avoid direct I/O fallback to buffered I/O Greg KH
2009-12-11 4:25 ` [44/90] ext4: async direct IO for holes and fallocate support Greg KH
2009-12-11 4:25 ` [45/90] ext4: EXT4_IOC_MOVE_EXT: Check for different original and donor inodes first Greg KH
2009-12-11 4:25 ` [46/90] ext4: Avoid updating the inode table bh twice in no journal mode Greg KH
2009-12-11 4:25 ` [47/90] ext4: Make sure ext4_dirty_inode() updates the inode " Greg KH
2009-12-11 4:25 ` [48/90] ext4: Handle nested ext4_journal_start/stop calls without a journal Greg KH
2009-12-11 4:25 ` [49/90] ext4: Fix time encoding with extra epoch bits Greg KH
2009-12-11 4:25 ` [50/90] ext4: fix a BUG_ON crash by checking that page has buffers attached to it Greg KH
2009-12-11 4:25 ` [51/90] ext4: retry failed direct IO allocations Greg KH
2009-12-11 4:25 ` [52/90] ext4: discard preallocation when restarting a transaction during truncate Greg KH
2009-12-11 4:25 ` [53/90] ext4: fix ext4_ext_direct_IO()s return value after converting uninit extents Greg KH
2009-12-11 4:25 ` [54/90] ext4: skip conversion of uninit extents after direct IO if there isnt any Greg KH
2009-12-11 4:25 ` [55/90] ext4: code clean up for dio fallocate handling Greg KH
2009-12-11 4:25 ` [56/90] ext4: Fix return value of ext4_split_unwritten_extents() to fix direct I/O Greg KH
2009-12-11 4:25 ` [57/90] ext4: fix potential buffer head leak when add_dirent_to_buf() returns ENOSPC Greg KH
2009-12-11 4:25 ` [58/90] ext4: avoid divide by zero when trying to mount a corrupted file system Greg KH
2009-12-11 4:25 ` [59/90] ext4: fix the returned block count if EXT4_IOC_MOVE_EXT fails Greg KH
2009-12-11 4:25 ` [60/90] ext4: fix lock order problem in ext4_move_extents() Greg KH
2009-12-11 4:25 ` [61/90] ext4: fix possible recursive locking warning in EXT4_IOC_MOVE_EXT Greg KH
2009-12-11 4:25 ` [62/90] ext4: plug a buffer_head leak in an error path of ext4_iget() Greg KH
2009-12-11 4:25 ` [63/90] ext4: make sure directory and symlink blocks are revoked Greg KH
2009-12-11 4:25 ` [64/90] ext4: fix i_flags access in ext4_da_writepages_trans_blocks() Greg KH
2009-12-11 4:25 ` [65/90] ext4: journal all modifications in ext4_xattr_set_handle Greg KH
2009-12-11 4:25 ` [66/90] ext4: dont update the superblock in ext4_statfs() Greg KH
2009-12-11 4:25 ` [67/90] ext4: fix uninit block bitmap initialization when s_meta_first_bg is non-zero Greg KH
2009-12-11 4:25 ` [68/90] ext4: fix block validity checks so they work correctly with meta_bg Greg KH
2009-12-11 4:25 ` [69/90] ext4: avoid issuing unnecessary barriers Greg KH
2009-12-11 4:25 ` [70/90] ext4: fix error handling in ext4_ind_get_blocks() Greg KH
2009-12-11 4:25 ` [71/90] ext4: make trim/discard optional (and off by default) Greg KH
2009-12-11 4:25 ` [72/90] ext4: make "norecovery" an alias for "noload" Greg KH
2009-12-11 4:25 ` [73/90] ext4: Fix double-free of blocks with EXT4_IOC_MOVE_EXT Greg KH
2009-12-11 4:25 ` [74/90] ext4: initialize moved_len before calling ext4_move_extents() Greg KH
2009-12-11 4:25 ` [75/90] ext4: move_extent_per_page() cleanup Greg KH
2009-12-11 4:25 ` [76/90] jbd2: Add ENOMEM checking in and for jbd2_journal_write_metadata_buffer() Greg KH
2009-12-11 4:25 ` [77/90] ext4: Return the PTR_ERR of the correct pointer in setup_new_group_blocks() Greg KH
2009-12-11 4:25 ` [78/90] ext4: Avoid data / filesystem corruption when write fails to copy data Greg KH
2009-12-11 4:25 ` [79/90] ext4: wait for log to commit when umounting Greg KH
2009-12-11 4:25 ` [80/90] ext4: remove blocks from inode prealloc list on failure Greg KH
2009-12-11 4:25 ` [81/90] ext4: ext4_get_reserved_space() must return bytes instead of blocks Greg KH
2009-12-11 4:26 ` [82/90] ext4: quota macros cleanup Greg KH
2009-12-11 4:26 ` [83/90] ext4: fix incorrect block reservation on quota transfer Greg KH
2009-12-11 4:26 ` [84/90] ext4: Wait for proper transaction commit on fsync Greg KH
2009-12-11 4:26 ` [85/90] ext4: Fix insufficient checks in EXT4_IOC_MOVE_EXT Greg KH
2009-12-11 4:26 ` [86/90] SCSI: megaraid_sas: fix 64 bit sense pointer truncation Greg KH
2009-12-11 4:26 ` [87/90] SCSI: osd_protocol.h: Add missing #include Greg KH
2009-12-11 4:26 ` [88/90] SCSI: scsi_lib_dma: fix bug with dma maps on nested scsi objects Greg KH
2009-12-11 4:26 ` [89/90] signal: Fix alternate signal stack check Greg KH
2009-12-11 4:26 ` [90/90] ext4: Fix potential fiemap deadlock (mmap_sem vs. i_data_sem) Greg KH
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20091211042729.262525249@linux.site \
--to=gregkh@suse.de \
--cc=akpm@linux-foundation.org \
--cc=alan@lxorguk.ukuu.org.uk \
--cc=jack@suse.cz \
--cc=linux-kernel@vger.kernel.org \
--cc=stable-review@kernel.org \
--cc=stable@kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox