From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org, stable@kernel.org
Cc: stable-review@kernel.org, torvalds@linux-foundation.org,
	akpm@linux-foundation.org, alan@lxorguk.ukuu.org.uk,
	Jan Kara <jack@suse.cz>, "Theodore Tso" <tytso@mit.edu>,
	Greg Kroah-Hartman <gregkh@suse.de>
Subject: [09/90] ext4: Fix possible deadlock between ext4_truncate() and ext4_get_blocks()
Date: Thu, 10 Dec 2009 20:24:47 -0800
Message-ID: <20091211042729.262525249@linux.site>
In-Reply-To: <20091211043502.GA17916@kroah.com>

[-- Attachment #1: 0009-ext4-Fix-possible-deadlock-between-ext4_truncate-and.patch --]
[-- Type: text/plain, Size: 4892 bytes --]

2.6.31-stable review patch.  If anyone has any objections, please let us know.

------------------
During truncate we are sometimes forced to start a new transaction
because the number of blocks to be journaled is both quite large and
hard to predict. So far we restarted the transaction while holding
i_data_sem, which violates lock ordering because i_data_sem ranks
below a transaction start (and it can lead to a real deadlock with
ext4_get_blocks() mapping blocks in some page while having a
transaction open).

(cherry picked from commit 487caeef9fc08c0565e082c40a8aaf58dad92bbb)

We fix the problem by dropping i_data_sem before restarting the
transaction and reacquiring it afterwards. It is slightly subtle why
this works:

1) By the time ext4_truncate() is called, all the page cache for the
truncated part of the file has been dropped, so get_block() should not
be called on it (we only have to invalidate the extent cache after we
reacquire i_data_sem, because an extent from the not-truncated part
could also extend into the part we are going to truncate).

2) Writes, migrate and defrag hold i_mutex, so they are blocked for
the whole duration of the truncate.

This bug was found and analyzed by Theodore Ts'o <tytso@mit.edu>.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
 fs/ext4/ext4.h    |    1 +
 fs/ext4/extents.c |   15 ++++++++++++---
 fs/ext4/inode.c   |   23 +++++++++++++++++++----
 3 files changed, 32 insertions(+), 7 deletions(-)

--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1370,6 +1370,7 @@ extern int ext4_change_inode_journal_fla
 extern int ext4_get_inode_loc(struct inode *, struct ext4_iloc *);
 extern int ext4_can_truncate(struct inode *inode);
 extern void ext4_truncate(struct inode *);
+extern int ext4_truncate_restart_trans(handle_t *, struct inode *, int nblocks);
 extern void ext4_set_inode_flags(struct inode *);
 extern void ext4_get_inode_flags(struct ext4_inode_info *);
 extern int ext4_alloc_da_blocks(struct inode *inode);
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -93,7 +93,9 @@ static void ext4_idx_store_pblock(struct
 	ix->ei_leaf_hi = cpu_to_le16((unsigned long) ((pb >> 31) >> 1) & 0xffff);
 }
 
-static int ext4_ext_journal_restart(handle_t *handle, int needed)
+static int ext4_ext_truncate_extend_restart(handle_t *handle,
+					    struct inode *inode,
+					    int needed)
 {
 	int err;
 
@@ -104,7 +106,14 @@ static int ext4_ext_journal_restart(hand
 	err = ext4_journal_extend(handle, needed);
 	if (err <= 0)
 		return err;
-	return ext4_journal_restart(handle, needed);
+	err = ext4_truncate_restart_trans(handle, inode, needed);
+	/*
+	 * We have dropped i_data_sem so someone might have cached again
+	 * an extent we are going to truncate.
+	 */
+	ext4_ext_invalidate_cache(inode);
+
+	return err;
 }
 
 /*
@@ -2138,7 +2147,7 @@ ext4_ext_rm_leaf(handle_t *handle, struc
 		}
 		credits += 2 * EXT4_QUOTA_TRANS_BLOCKS(inode->i_sb);
 
-		err = ext4_ext_journal_restart(handle, credits);
+		err = ext4_ext_truncate_extend_restart(handle, inode, credits);
 		if (err)
 			goto out;
 
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -192,11 +192,24 @@ static int try_to_extend_transaction(han
  * so before we call here everything must be consistently dirtied against
  * this transaction.
  */
-static int ext4_journal_test_restart(handle_t *handle, struct inode *inode)
+ int ext4_truncate_restart_trans(handle_t *handle, struct inode *inode,
+				 int nblocks)
 {
+	int ret;
+
+	/*
+	 * Drop i_data_sem to avoid deadlock with ext4_get_blocks At this
+	 * moment, get_block can be called only for blocks inside i_size since
+	 * page cache has been already dropped and writes are blocked by
+	 * i_mutex. So we can safely drop the i_data_sem here.
+	 */
 	BUG_ON(EXT4_JOURNAL(inode) == NULL);
 	jbd_debug(2, "restarting handle %p\n", handle);
-	return ext4_journal_restart(handle, blocks_for_truncate(inode));
+	up_write(&EXT4_I(inode)->i_data_sem);
+	ret = ext4_journal_restart(handle, blocks_for_truncate(inode));
+	down_write(&EXT4_I(inode)->i_data_sem);
+
+	return ret;
 }
 
 /*
@@ -3659,7 +3672,8 @@ static void ext4_clear_blocks(handle_t *
 			ext4_handle_dirty_metadata(handle, inode, bh);
 		}
 		ext4_mark_inode_dirty(handle, inode);
-		ext4_journal_test_restart(handle, inode);
+		ext4_truncate_restart_trans(handle, inode,
+					    blocks_for_truncate(inode));
 		if (bh) {
 			BUFFER_TRACE(bh, "retaking write access");
 			ext4_journal_get_write_access(handle, bh);
@@ -3870,7 +3884,8 @@ static void ext4_free_branches(handle_t
 				return;
 			if (try_to_extend_transaction(handle, inode)) {
 				ext4_mark_inode_dirty(handle, inode);
-				ext4_journal_test_restart(handle, inode);
+				ext4_truncate_restart_trans(handle, inode,
+					    blocks_for_truncate(inode));
 			}
 
 			ext4_free_blocks(handle, inode, nr, 1, 1);


