linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Eric Whitney <enwlinux@gmail.com>
To: Eric Whitney <enwlinux@gmail.com>
Cc: Theodore Ts'o <tytso@mit.edu>,
	Ext4 Developers List <linux-ext4@vger.kernel.org>,
	jack@suse.cz, stable@vger.kernel.org
Subject: Re: [PATCH] ext4: fix race between truncate and __ext4_journalled_writepage()
Date: Tue, 16 Jun 2015 15:29:20 -0400	[thread overview]
Message-ID: <20150616192920.GA23675@localhost.localdomain> (raw)
In-Reply-To: <20150615203907.GA2306@localhost.localdomain>

* Eric Whitney <enwlinux@gmail.com>:
> * Theodore Ts'o <tytso@mit.edu>:
> > The commit cf108bca465d: "ext4: Invert the locking order of page_lock
> > and transaction start" caused __ext4_journalled_writepage() to drop
> > the page lock before the page was written back, as part of changing
> > the locking order to jbd2_journal_start -> page_lock.  However, this
> > introduced a potential race if there was a truncate racing with the
> > data=journalled writeback mode.
> > 
> > Fix this by grabbing the page lock after starting the journal handle,
> > and then checking to see if page had gotten truncated out from under
> > us.
> > 
> > This fixes a number of different crashes or BUG_ON's when running
> > xfstests generic/086 in data=journalled mode, including:
> > 
> > jbd2_journal_dirty_metadata: vdc-8: bad jh for block 84434: transaction (ec90434
> > ransaction (  (null), 0), jh->b_next_transaction (  (null), 0), jlist 0
> > 
> > 	      	      	  - and -
> > 
> > kernel BUG at /usr/projects/linux/ext4/fs/jbd2/transaction.c:2200!
> >     ...
> > Call Trace:
> >  [<c02b2ded>] ? __ext4_journalled_invalidatepage+0x117/0x117
> >  [<c02b2de5>] __ext4_journalled_invalidatepage+0x10f/0x117
> >  [<c02b2ded>] ? __ext4_journalled_invalidatepage+0x117/0x117
> >  [<c027d883>] ? lock_buffer+0x36/0x36
> >  [<c02b2dfa>] ext4_journalled_invalidatepage+0xd/0x22
> >  [<c0229139>] do_invalidatepage+0x22/0x26
> >  [<c0229198>] truncate_inode_page+0x5b/0x85
> >  [<c022934b>] truncate_inode_pages_range+0x156/0x38c
> >  [<c0229592>] truncate_inode_pages+0x11/0x15
> >  [<c022962d>] truncate_pagecache+0x55/0x71
> >  [<c02b913b>] ext4_setattr+0x4a9/0x560
> >  [<c01ca542>] ? current_kernel_time+0x10/0x44
> >  [<c026c4d8>] notify_change+0x1c7/0x2be
> >  [<c0256a00>] do_truncate+0x65/0x85
> >  [<c0226f31>] ? file_ra_state_init+0x12/0x29
> > 
> > 	      	      	  - and -
> > 
> > WARNING: CPU: 1 PID: 1331 at /usr/projects/linux/ext4/fs/jbd2/transaction.c:1396
> > irty_metadata+0x14a/0x1ae()
> >     ...
> > Call Trace:
> >  [<c01b879f>] ? console_unlock+0x3a1/0x3ce
> >  [<c082cbb4>] dump_stack+0x48/0x60
> >  [<c0178b65>] warn_slowpath_common+0x89/0xa0
> >  [<c02ef2cf>] ? jbd2_journal_dirty_metadata+0x14a/0x1ae
> >  [<c0178bef>] warn_slowpath_null+0x14/0x18
> >  [<c02ef2cf>] jbd2_journal_dirty_metadata+0x14a/0x1ae
> >  [<c02d8615>] __ext4_handle_dirty_metadata+0xd4/0x19d
> >  [<c02b2f44>] write_end_fn+0x40/0x53
> >  [<c02b4a16>] ext4_walk_page_buffers+0x4e/0x6a
> >  [<c02b59e7>] ext4_writepage+0x354/0x3b8
> >  [<c02b2f04>] ? mpage_release_unused_pages+0xd4/0xd4
> >  [<c02b1b21>] ? wait_on_buffer+0x2c/0x2c
> >  [<c02b5a4b>] ? ext4_writepage+0x3b8/0x3b8
> >  [<c02b5a5b>] __writepage+0x10/0x2e
> >  [<c0225956>] write_cache_pages+0x22d/0x32c
> >  [<c02b5a4b>] ? ext4_writepage+0x3b8/0x3b8
> >  [<c02b6ee8>] ext4_writepages+0x102/0x607
> >  [<c019adfe>] ? sched_clock_local+0x10/0x10e
> >  [<c01a8a7c>] ? __lock_is_held+0x2e/0x44
> >  [<c01a8ad5>] ? lock_is_held+0x43/0x51
> >  [<c0226dff>] do_writepages+0x1c/0x29
> >  [<c0276bed>] __writeback_single_inode+0xc3/0x545
> >  [<c0277c07>] writeback_sb_inodes+0x21f/0x36d
> >     ...
> > 
> > Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> > Cc: stable@vger.kernel.org
> > ---
> >  fs/ext4/inode.c | 23 +++++++++++++++++++----
> >  1 file changed, 19 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > index 0554b0b..263a46c 100644
> > --- a/fs/ext4/inode.c
> > +++ b/fs/ext4/inode.c
> > @@ -1701,19 +1701,32 @@ static int __ext4_journalled_writepage(struct page *page,
> >  		ext4_walk_page_buffers(handle, page_bufs, 0, len,
> >  				       NULL, bget_one);
> >  	}
> > -	/* As soon as we unlock the page, it can go away, but we have
> > -	 * references to buffers so we are safe */
> > +	/*
> > +	 * We need to release the page lock before we start the
> > +	 * journal, so grab a reference so the page won't disappear
> > +	 * out from under us.
> > +	 */
> > +	get_page(page);
> >  	unlock_page(page);
> >  
> >  	handle = ext4_journal_start(inode, EXT4_HT_WRITE_PAGE,
> >  				    ext4_writepage_trans_blocks(inode));
> >  	if (IS_ERR(handle)) {
> >  		ret = PTR_ERR(handle);
> > -		goto out;
> > +		put_page(page);
> > +		goto out_no_pagelock;
> >  	}
> > -
> >  	BUG_ON(!ext4_handle_valid(handle));
> >  
> > +	lock_page(page);
> > +	put_page(page);
> > +	if (page->mapping != mapping) {
> > +		/* The page got truncated from under us */
> > +		ext4_journal_stop(handle);
> > +		ret = 0;
> > +		goto out;
> > +	}
> > +
> >  	if (inline_data) {
> >  		BUFFER_TRACE(inode_bh, "get write access");
> >  		ret = ext4_journal_get_write_access(handle, inode_bh);
> > @@ -1739,6 +1752,8 @@ static int __ext4_journalled_writepage(struct page *page,
> >  				       NULL, bput_one);
> >  	ext4_set_inode_state(inode, EXT4_STATE_JDATA);
> >  out:
> > +	unlock_page(page);
> > +out_no_pagelock:
> >  	brelse(inode_bh);
> >  	return ret;
> >  }
> > -- 
> > 2.3.0
> >
> 
> This patch looks promising.  I'm running a 1000 trial stress test on a
> Pandaboard where I've generally been able to force a couple of manifestations
> of this bug to appear within 5 to 10 runs.  Applied to 4.1-rc7, it's passed
> 135 trials cleanly.  The full series will complete sometime tomorrow.
> 
> I was also able to reproduce the problem on x86_64 pretty consistently in
> four runs or less on 4.1-rc7;  I'm planning a stress test there as well once
> -rc8 regression is complete.
> 

1000 consecutive runs of generic/068 on the data_journal test case have
completed successfully on an x86_64 guest and a Pandaboard running a
4.1-rc7 kernel with this patch.  That's a couple of orders of magnitude more
runs than have previously completed without hangs on these test system
configurations, so

Tested-by: Eric Whitney <enwlinux@gmail.com>

if useful at this point.

Eric

      reply	other threads:[~2015-06-16 19:29 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-06-10 18:13 [PATCH] xfstests-bld: exclude ext4 defrag tests from unsupported configurations Eric Whitney
2015-06-10 19:28 ` Theodore Ts'o
2015-06-11 20:00   ` Eric Whitney
2015-06-12  0:24     ` Eric Whitney
2015-06-15  1:14       ` Theodore Ts'o
2015-06-15  1:23         ` [PATCH] ext4: fix race between truncate and __ext4_journalled_writepage() Theodore Ts'o
2015-06-15 12:33           ` Jan Kara
2015-06-15 13:06             ` Theodore Ts'o
2015-06-15 17:03               ` Jan Kara
2015-06-15 19:37                 ` Theodore Ts'o
2015-06-15 17:59               ` Andreas Dilger
2015-06-16 12:57                 ` Jan Kara
2015-06-15 20:39           ` Eric Whitney
2015-06-16 19:29             ` Eric Whitney [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150616192920.GA23675@localhost.localdomain \
    --to=enwlinux@gmail.com \
    --cc=jack@suse.cz \
    --cc=linux-ext4@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).