From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eric Whitney
Subject: Re: [PATCH] ext4: fix race between truncate and __ext4_journalled_writepage()
Date: Mon, 15 Jun 2015 16:39:07 -0400
Message-ID: <20150615203907.GA2306@localhost.localdomain>
References: <20150615011433.GA15793@thunk.org> <1434331430-23125-1-git-send-email-tytso@mit.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Ext4 Developers List, enwlinux@gmail.com, jack@suse.cz, stable@vger.kernel.org
To: Theodore Ts'o
Return-path:
Content-Disposition: inline
In-Reply-To: <1434331430-23125-1-git-send-email-tytso@mit.edu>
Sender: stable-owner@vger.kernel.org
List-Id: linux-ext4.vger.kernel.org

* Theodore Ts'o:
> The commit cf108bca465d: "ext4: Invert the locking order of page_lock
> and transaction start" caused __ext4_journalled_writepage() to drop
> the page lock before the page was written back, as part of changing
> the locking order to jbd2_journal_start -> page_lock. However, this
> introduced a potential race if there was a truncate racing with the
> data=journalled writeback mode.
>
> Fix this by grabbing the page lock after starting the journal handle,
> and then checking to see if page had gotten truncated out from under
> us.
>
> This fixes a number of different crashes or BUG_ON's when running
> xfstests generic/086 in data=journalled mode, including:
>
> jbd2_journal_dirty_metadata: vdc-8: bad jh for block 84434: transaction (ec90434
> ransaction ( (null), 0), jh->b_next_transaction ( (null), 0), jlist 0
>
> - and -
>
> kernel BUG at /usr/projects/linux/ext4/fs/jbd2/transaction.c:2200!
> ...
> Call Trace:
> [] ? __ext4_journalled_invalidatepage+0x117/0x117
> [] __ext4_journalled_invalidatepage+0x10f/0x117
> [] ? __ext4_journalled_invalidatepage+0x117/0x117
> [] ? lock_buffer+0x36/0x36
> [] ext4_journalled_invalidatepage+0xd/0x22
> [] do_invalidatepage+0x22/0x26
> [] truncate_inode_page+0x5b/0x85
> [] truncate_inode_pages_range+0x156/0x38c
> [] truncate_inode_pages+0x11/0x15
> [] truncate_pagecache+0x55/0x71
> [] ext4_setattr+0x4a9/0x560
> [] ? current_kernel_time+0x10/0x44
> [] notify_change+0x1c7/0x2be
> [] do_truncate+0x65/0x85
> [] ? file_ra_state_init+0x12/0x29
>
> - and -
>
> WARNING: CPU: 1 PID: 1331 at /usr/projects/linux/ext4/fs/jbd2/transaction.c:1396
> irty_metadata+0x14a/0x1ae()
> ...
> Call Trace:
> [] ? console_unlock+0x3a1/0x3ce
> [] dump_stack+0x48/0x60
> [] warn_slowpath_common+0x89/0xa0
> [] ? jbd2_journal_dirty_metadata+0x14a/0x1ae
> [] warn_slowpath_null+0x14/0x18
> [] jbd2_journal_dirty_metadata+0x14a/0x1ae
> [] __ext4_handle_dirty_metadata+0xd4/0x19d
> [] write_end_fn+0x40/0x53
> [] ext4_walk_page_buffers+0x4e/0x6a
> [] ext4_writepage+0x354/0x3b8
> [] ? mpage_release_unused_pages+0xd4/0xd4
> [] ? wait_on_buffer+0x2c/0x2c
> [] ? ext4_writepage+0x3b8/0x3b8
> [] __writepage+0x10/0x2e
> [] write_cache_pages+0x22d/0x32c
> [] ? ext4_writepage+0x3b8/0x3b8
> [] ext4_writepages+0x102/0x607
> [] ? sched_clock_local+0x10/0x10e
> [] ? __lock_is_held+0x2e/0x44
> [] ? lock_is_held+0x43/0x51
> [] do_writepages+0x1c/0x29
> [] __writeback_single_inode+0xc3/0x545
> [] writeback_sb_inodes+0x21f/0x36d
> ...
>
> Signed-off-by: Theodore Ts'o
> Cc: stable@vger.kernel.org
> ---
>  fs/ext4/inode.c | 23 +++++++++++++++++++----
>  1 file changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 0554b0b..263a46c 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1701,19 +1701,32 @@ static int __ext4_journalled_writepage(struct page *page,
>  		ext4_walk_page_buffers(handle, page_bufs, 0, len,
>  				       NULL, bget_one);
>  	}
> -	/* As soon as we unlock the page, it can go away, but we have
> -	 * references to buffers so we are safe */
> +	/*
> +	 * We need to release the page lock before we start the
> +	 * journal, so grab a reference so the page won't disappear
> +	 * out from under us.
> +	 */
> +	get_page(page);
>  	unlock_page(page);
>
>  	handle = ext4_journal_start(inode, EXT4_HT_WRITE_PAGE,
>  				    ext4_writepage_trans_blocks(inode));
>  	if (IS_ERR(handle)) {
>  		ret = PTR_ERR(handle);
> -		goto out;
> +		put_page(page);
> +		goto out_no_pagelock;
>  	}
> -
>  	BUG_ON(!ext4_handle_valid(handle));
>
> +	lock_page(page);
> +	put_page(page);
> +	if (page->mapping != mapping) {
> +		/* The page got truncated from under us */
> +		ext4_journal_stop(handle);
> +		ret = 0;
> +		goto out;
> +	}
> +
>  	if (inline_data) {
>  		BUFFER_TRACE(inode_bh, "get write access");
>  		ret = ext4_journal_get_write_access(handle, inode_bh);
> @@ -1739,6 +1752,8 @@ static int __ext4_journalled_writepage(struct page *page,
>  				       NULL, bput_one);
>  	ext4_set_inode_state(inode, EXT4_STATE_JDATA);
>  out:
> +	unlock_page(page);
> +out_no_pagelock:
>  	brelse(inode_bh);
>  	return ret;
>  }
> --
> 2.3.0
>

This patch looks promising. I'm running a 1000-trial stress test on a
Pandaboard where I've generally been able to force a couple of
manifestations of this bug to appear within 5 to 10 runs. Applied to
4.1-rc7, it's passed 135 trials cleanly. The full series will complete
sometime tomorrow.

I was also able to reproduce the problem on x86_64 pretty consistently in
four runs or fewer on 4.1-rc7; I'm planning a stress test there as well once
-rc8 regression is complete.

Thanks!
Eric
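
For readers following the locking change in the quoted patch, the sequence it introduces (take a page reference, drop the page lock, start the journal handle, retake the lock, then recheck page->mapping) can be sketched in userspace C. This is only a single-threaded illustration under stated assumptions, not kernel code: every sk_-prefixed name is hypothetical, the journal handle is reduced to a callback that runs while the page is unlocked, and a racing truncate is simulated by having that callback clear the mapping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
struct sk_mapping { int id; };

struct sk_page {
	struct sk_mapping *mapping; /* NULL once the page is "truncated" */
	int locked;                 /* 1 while the page lock is held */
	int refcount;               /* keeps the page alive while unlocked */
	int written;                /* set if the writeback body ran */
};

static void sk_lock_page(struct sk_page *p)   { assert(!p->locked); p->locked = 1; }
static void sk_unlock_page(struct sk_page *p) { assert(p->locked); p->locked = 0; }
static void sk_get_page(struct sk_page *p)    { p->refcount++; }
static void sk_put_page(struct sk_page *p)    { assert(p->refcount > 0); p->refcount--; }

/* Stand-in for starting a journal handle: 0 on success, negative on error.
 * It is called with the page unlocked, so other work may touch the page. */
typedef int (*sk_start_fn)(struct sk_page *p);

static int sk_start_ok(struct sk_page *p)   { (void)p; return 0; }
static int sk_start_fail(struct sk_page *p) { (void)p; return -1; }

/* Simulates a truncate that wins the race while the page is unlocked. */
static int sk_start_with_truncate(struct sk_page *p)
{
	sk_lock_page(p);
	p->mapping = NULL;
	sk_unlock_page(p);
	return 0;
}

/* Entered with the page locked; always returns with it unlocked. */
static int sk_journalled_writepage(struct sk_page *page, sk_start_fn start)
{
	struct sk_mapping *mapping = page->mapping;
	int ret;

	sk_get_page(page);      /* pin the page before dropping the lock */
	sk_unlock_page(page);

	ret = start(page);
	if (ret < 0) {
		sk_put_page(page);
		return ret;     /* lock not held: mirrors out_no_pagelock */
	}

	sk_lock_page(page);     /* handle first, then page lock */
	sk_put_page(page);
	if (page->mapping != mapping) {
		ret = 0;        /* truncated under us: nothing to write */
		goto out;
	}

	page->written = 1;      /* stand-in for the journalled write */
	ret = 0;
out:
	sk_unlock_page(page);
	return ret;
}
```

Calling sk_journalled_writepage() with sk_start_with_truncate exercises the recheck path: the function returns 0 without marking the page written, and in every path the page ends up unlocked with its reference count back at the starting value.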