From: Jan Kara <jack@suse.cz>
To: Theodore Ts'o <tytso@mit.edu>
Cc: Ext4 Developers List <linux-ext4@vger.kernel.org>,
Jan Kara <jack@suse.cz>
Subject: Re: [PATCH] ext4: fix ext4_evict_inode() racing against workqueue processing code
Date: Wed, 20 Mar 2013 14:22:23 +0100 [thread overview]
Message-ID: <20130320132223.GD13294@quack.suse.cz> (raw)
In-Reply-To: <1363742959-12815-1-git-send-email-tytso@mit.edu>
On Tue 19-03-13 21:29:19, Ted Tso wrote:
> Commit 84c17543ab56 (ext4: move work from io_end to inode) triggered a
> regression when running xfstest #270 when the file system is mounted
> with dioread_nolock.
>
> The problem is that after ext4_evict_inode() calls ext4_ioend_wait(),
> this guarantees that last io_end structure has been freed, but it does
> not guarantee that the workqueue structure, which was moved into the
> inode by commit 84c17543ab56, is actually finished. Once
> ext4_flush_completed_IO() calls ext4_free_io_end() on CPU #1, this
> will allow ext4_ioend_wait() to return on CPU #2, at which point the
> evict_inode() codepath can race against the workqueue code on CPU #1
> accessing EXT4_I(inode)->i_unwritten_work to find the next item of
> work to do.
>
> Fix this by calling flush_work() if the work structure is still
> pending in ext$_ioend_wait(). Also, move the call to
> ext4_ioend_wait() until after truncate_inode_pages() and
> filemap_write_and_wait() are called, to make sure all dirty pages have
> been written back and flushed from the page cache first.
>
> BUG: unable to handle kernel NULL pointer dereference at (null)
> IP: [<c01dda6a>] cwq_activate_delayed_work+0x3b/0x7e
> *pdpt = 0000000030bc3001 *pde = 0000000000000000
> Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
> Modules linked in:
> Pid: 6, comm: kworker/u:0 Not tainted 3.8.0-rc3-00013-g84c1754-dirty #91 Bochs Bochs
> EIP: 0060:[<c01dda6a>] EFLAGS: 00010046 CPU: 0
> EIP is at cwq_activate_delayed_work+0x3b/0x7e
> EAX: 00000000 EBX: 00000000 ECX: f505fe54 EDX: 00000000
> ESI: ed5b697c EDI: 00000006 EBP: f64b7e8c ESP: f64b7e84
> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> CR0: 8005003b CR2: 00000000 CR3: 30bc2000 CR4: 000006f0
> DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> DR6: ffff0ff0 DR7: 00000400
> Process kworker/u:0 (pid: 6, ti=f64b6000 task=f64b4160 task.ti=f64b6000)
> Stack:
> f505fe00 00000006 f64b7e9c c01de3d7 f6435540 00000003 f64b7efc c01def1d
> f6435540 00000002 00000000 0000008a c16d0808 c040a10b c16d07d8 c16d08b0
> f505fe00 c16d0780 00000000 00000000 ee153df4 c1ce4a30 c17d0e30 00000000
> Call Trace:
> [<c01de3d7>] cwq_dec_nr_in_flight+0x71/0xfb
> [<c01def1d>] process_one_work+0x5d8/0x637
> [<c040a10b>] ? ext4_end_bio+0x300/0x300
> [<c01e3105>] worker_thread+0x249/0x3ef
> [<c01ea317>] kthread+0xd8/0xeb
> [<c01e2ebc>] ? manage_workers+0x4bb/0x4bb
> [<c023a370>] ? trace_hardirqs_on+0x27/0x37
> [<c0f1b4b7>] ret_from_kernel_thread+0x1b/0x28
> [<c01ea23f>] ? __init_kthread_worker+0x71/0x71
> Code: 01 83 15 ac ff 6c c1 00 31 db 89 c6 8b 00 a8 04 74 12 89 c3 30 db 83 05 b0 ff 6c c1 01 83 15 b4 ff 6c c1 00 89 f0 e8 42 ff ff ff <8b> 13 89 f0 83 05 b8 ff 6c c1
> 6c c1 00 31 c9 83
> EIP: [<c01dda6a>] cwq_activate_delayed_work+0x3b/0x7e SS:ESP 0068:f64b7e84
> CR2: 0000000000000000
> ---[ end trace a1923229da53d8a4 ]---
>
> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> Cc: Jan Kara <jack@suse.cz>
Good catch. Thanks for fixing this. Just one question below:
> diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
> index 809b310..8a004a4 100644
> --- a/fs/ext4/page-io.c
> +++ b/fs/ext4/page-io.c
> @@ -50,11 +50,21 @@ void ext4_exit_pageio(void)
> kmem_cache_destroy(io_page_cachep);
> }
>
> +/*
> + * Called by ext4_evict_inode() to make sure there are no pending I/O
> + * completion work left to do.
> + */
> void ext4_ioend_wait(struct inode *inode)
> {
> wait_queue_head_t *wq = ext4_ioend_wq(inode);
>
> wait_event(*wq, (atomic_read(&EXT4_I(inode)->i_ioend_count) == 0));
> + /*
> + * We need to make sure the work structure is finished before
> + * we let the inode get destroyed by ext4_evict_inode()
> + */
> + if (work_pending(&EXT4_I(inode)->i_unwritten_work))
> + flush_work(&EXT4_I(inode)->i_unwritten_work);
> }
Won't it be more logical to use cancel_work_sync() here?
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
next prev parent reply other threads:[~2013-03-20 13:22 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-03-20 1:29 [PATCH] ext4: fix ext4_evict_inode() racing against workqueue processing code Theodore Ts'o
2013-03-20 1:38 ` Theodore Ts'o
2013-03-20 13:22 ` Jan Kara [this message]
2013-03-20 13:37 ` Theodore Ts'o
2013-03-20 13:42 ` Jan Kara
2013-03-20 13:51 ` Theodore Ts'o
2013-03-20 14:14 ` Eric Sandeen
2013-03-20 14:45 ` Theodore Ts'o
2013-03-20 20:13 ` Jan Kara
2013-03-26 5:52 ` Zheng Liu
2013-03-26 5:55 ` Zheng Liu
2013-03-26 20:34 ` Jan Kara
2013-03-27 3:13 ` Zheng Liu
2013-03-29 7:32 ` Zheng Liu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130320132223.GD13294@quack.suse.cz \
--to=jack@suse.cz \
--cc=linux-ext4@vger.kernel.org \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.