From: Jan Kara <jack@suse.cz>
To: Theodore Ts'o <tytso@mit.edu>
Cc: Ext4 Developers List <linux-ext4@vger.kernel.org>,
enwlinux@gmail.com, jack@suse.cz, stable@vger.kernel.org
Subject: Re: [PATCH] ext4: fix race between truncate and __ext4_journalled_writepage()
Date: Mon, 15 Jun 2015 14:33:52 +0200 [thread overview]
Message-ID: <20150615123352.GD4368@quack.suse.cz> (raw)
In-Reply-To: <1434331430-23125-1-git-send-email-tytso@mit.edu>
On Sun 14-06-15 21:23:50, Ted Tso wrote:
> The commit cf108bca465d: "ext4: Invert the locking order of page_lock
> and transaction start" caused __ext4_journalled_writepage() to drop
> the page lock before the page was written back, as part of changing
> the locking order to jbd2_journal_start -> page_lock. However, this
> introduced a potential race if there was a truncate racing with the
> data=journalled writeback mode.
>
> Fix this by grabbing the page lock after starting the journal handle,
> and then checking to see if page had gotten truncated out from under
> us.
>
> This fixes a number of different crashes or BUG_ON's when running
> xfstests generic/086 in data=journalled mode, including:
>
> jbd2_journal_dirty_metadata: vdc-8: bad jh for block 84434: transaction (ec90434
> ransaction ( (null), 0), jh->b_next_transaction ( (null), 0), jlist 0
>
> - and -
>
> kernel BUG at /usr/projects/linux/ext4/fs/jbd2/transaction.c:2200!
Yeah, that's nasty. Thanks for debugging this! However I think your fix
reintroduces the original deadlock issues. do_journal_get_write_access()
can end up blocking waiting for jbd2 thread to finish a commit while jbd2
thread may be blocked waiting for the page to be unlocked.
After some thought I don't think the deadlock is real since
do_journal_get_write_access() will currently only block if a buffer is
under writeout to the journal and at that point we don't wait for page
locks anymore. Also ext4_write_begin() does the same in data=journal mode
and we haven't observed deadlocks so far. But still things look really
fragile here.
A clean fix for these problems would be to implement
ext4_journalled_writepages() which will start a transaction and then
writeback a bunch of pages. Similarly for write_begin() case we could start
the transaction in ext4_write() (and loop there since a single write may
need to be split among several transactions). However this is relatively
extensive work given how rarely the code is used...
So for now, feel free to add:
Acked-by: Jan Kara <jack@suse.cz>
to the patch.
Honza
> ---
> fs/ext4/inode.c | 23 +++++++++++++++++++----
> 1 file changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 0554b0b..263a46c 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1701,19 +1701,32 @@ static int __ext4_journalled_writepage(struct page *page,
> ext4_walk_page_buffers(handle, page_bufs, 0, len,
> NULL, bget_one);
> }
> - /* As soon as we unlock the page, it can go away, but we have
> - * references to buffers so we are safe */
> + /*
> + * We need to release the page lock before we start the
> + * journal, so grab a reference so the page won't disappear
> + * out from under us.
> + */
> + get_page(page);
> unlock_page(page);
>
> handle = ext4_journal_start(inode, EXT4_HT_WRITE_PAGE,
> ext4_writepage_trans_blocks(inode));
> if (IS_ERR(handle)) {
> ret = PTR_ERR(handle);
> - goto out;
> + put_page(page);
> + goto out_no_pagelock;
> }
> -
> BUG_ON(!ext4_handle_valid(handle));
>
> + lock_page(page);
> + put_page(page);
> + if (page->mapping != mapping) {
> + /* The page got truncated from under us */
> + ext4_journal_stop(handle);
> + ret = 0;
> + goto out;
> + }
> +
> if (inline_data) {
> BUFFER_TRACE(inode_bh, "get write access");
> ret = ext4_journal_get_write_access(handle, inode_bh);
> @@ -1739,6 +1752,8 @@ static int __ext4_journalled_writepage(struct page *page,
> NULL, bput_one);
> ext4_set_inode_state(inode, EXT4_STATE_JDATA);
> out:
> + unlock_page(page);
> +out_no_pagelock:
> brelse(inode_bh);
> return ret;
> }
> --
> 2.3.0
>
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
next prev parent reply other threads:[~2015-06-15 12:33 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-06-10 18:13 [PATCH] xfstests-bld: exclude ext4 defrag tests from unsupported configurations Eric Whitney
2015-06-10 19:28 ` Theodore Ts'o
2015-06-11 20:00 ` Eric Whitney
2015-06-12 0:24 ` Eric Whitney
2015-06-15 1:14 ` Theodore Ts'o
2015-06-15 1:23 ` [PATCH] ext4: fix race between truncate and __ext4_journalled_writepage() Theodore Ts'o
2015-06-15 12:33 ` Jan Kara [this message]
2015-06-15 13:06 ` Theodore Ts'o
2015-06-15 17:03 ` Jan Kara
2015-06-15 19:37 ` Theodore Ts'o
2015-06-15 17:59 ` Andreas Dilger
2015-06-16 12:57 ` Jan Kara
2015-06-15 20:39 ` Eric Whitney
2015-06-16 19:29 ` Eric Whitney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150615123352.GD4368@quack.suse.cz \
--to=jack@suse.cz \
--cc=enwlinux@gmail.com \
--cc=linux-ext4@vger.kernel.org \
--cc=stable@vger.kernel.org \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.