linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Theodore Ts'o <tytso@mit.edu>
Cc: Ext4 Developers List <linux-ext4@vger.kernel.org>,
	Jan Kara <jack@suse.cz>
Subject: Re: [PATCH] ext4: fix ext4_evict_inode() racing against workqueue processing code
Date: Wed, 20 Mar 2013 14:22:23 +0100	[thread overview]
Message-ID: <20130320132223.GD13294@quack.suse.cz> (raw)
In-Reply-To: <1363742959-12815-1-git-send-email-tytso@mit.edu>

On Tue 19-03-13 21:29:19, Ted Tso wrote:
> Commit 84c17543ab56 (ext4: move work from io_end to inode) triggered a
> regression when running xfstest #270 when the file system is mounted
> with dioread_nolock.
> 
> The problem is that after ext4_evict_inode() calls ext4_ioend_wait(),
> this guarantees that last io_end structure has been freed, but it does
> not guarantee that the workqueue structure, which was moved into the
> inode by commit 84c17543ab56, is actually finished.  Once
> ext4_flush_completed_IO() calls ext4_free_io_end() on CPU #1, this
> will allow ext4_ioend_wait() to return on CPU #2, at which point the
> evict_inode() codepath can race against the workqueue code on CPU #1
> accessing EXT4_I(inode)->i_unwritten_work to find the next item of
> work to do.
> 
> Fix this by calling flush_work() if the work structure is still
> pending in ext$_ioend_wait().  Also, move the call to
> ext4_ioend_wait() until after truncate_inode_pages() and
> filemap_write_and_wait() are called, to make sure all dirty pages have
> been written back and flushed from the page cache first.
> 
> BUG: unable to handle kernel NULL pointer dereference at   (null)
> IP: [<c01dda6a>] cwq_activate_delayed_work+0x3b/0x7e
> *pdpt = 0000000030bc3001 *pde = 0000000000000000
> Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
> Modules linked in:
> Pid: 6, comm: kworker/u:0 Not tainted 3.8.0-rc3-00013-g84c1754-dirty #91 Bochs Bochs
> EIP: 0060:[<c01dda6a>] EFLAGS: 00010046 CPU: 0
> EIP is at cwq_activate_delayed_work+0x3b/0x7e
> EAX: 00000000 EBX: 00000000 ECX: f505fe54 EDX: 00000000
> ESI: ed5b697c EDI: 00000006 EBP: f64b7e8c ESP: f64b7e84
>  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> CR0: 8005003b CR2: 00000000 CR3: 30bc2000 CR4: 000006f0
> DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> DR6: ffff0ff0 DR7: 00000400
> Process kworker/u:0 (pid: 6, ti=f64b6000 task=f64b4160 task.ti=f64b6000)
> Stack:
>  f505fe00 00000006 f64b7e9c c01de3d7 f6435540 00000003 f64b7efc c01def1d
>  f6435540 00000002 00000000 0000008a c16d0808 c040a10b c16d07d8 c16d08b0
>  f505fe00 c16d0780 00000000 00000000 ee153df4 c1ce4a30 c17d0e30 00000000
> Call Trace:
>  [<c01de3d7>] cwq_dec_nr_in_flight+0x71/0xfb
>  [<c01def1d>] process_one_work+0x5d8/0x637
>  [<c040a10b>] ? ext4_end_bio+0x300/0x300
>  [<c01e3105>] worker_thread+0x249/0x3ef
>  [<c01ea317>] kthread+0xd8/0xeb
>  [<c01e2ebc>] ? manage_workers+0x4bb/0x4bb
>  [<c023a370>] ? trace_hardirqs_on+0x27/0x37
>  [<c0f1b4b7>] ret_from_kernel_thread+0x1b/0x28
>  [<c01ea23f>] ? __init_kthread_worker+0x71/0x71
> Code: 01 83 15 ac ff 6c c1 00 31 db 89 c6 8b 00 a8 04 74 12 89 c3 30 db 83 05 b0 ff 6c c1 01 83 15 b4 ff 6c c1 00 89 f0 e8 42 ff ff ff <8b> 13 89 f0 83 05 b8 ff 6c c1
>  6c c1 00 31 c9 83
> EIP: [<c01dda6a>] cwq_activate_delayed_work+0x3b/0x7e SS:ESP 0068:f64b7e84
> CR2: 0000000000000000
> ---[ end trace a1923229da53d8a4 ]---
> 
> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> Cc: Jan Kara <jack@suse.cz>
  Good catch. Thanks for fixing this. Just one question below:

> diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
> index 809b310..8a004a4 100644
> --- a/fs/ext4/page-io.c
> +++ b/fs/ext4/page-io.c
> @@ -50,11 +50,21 @@ void ext4_exit_pageio(void)
>  	kmem_cache_destroy(io_page_cachep);
>  }
>  
> +/*
> + * Called by ext4_evict_inode() to make sure there are no pending I/O
> + * completion work left to do.
> + */
>  void ext4_ioend_wait(struct inode *inode)
>  {
>  	wait_queue_head_t *wq = ext4_ioend_wq(inode);
>  
>  	wait_event(*wq, (atomic_read(&EXT4_I(inode)->i_ioend_count) == 0));
> +	/*
> +	 * We need to make sure the work structure is finished before
> +	 * we let the inode get destroyed by ext4_evict_inode()
> +	 */
> +	if (work_pending(&EXT4_I(inode)->i_unwritten_work))
> +		flush_work(&EXT4_I(inode)->i_unwritten_work);
>  }
  Won't it be more logical to use cancel_work_sync() here?

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

  parent reply	other threads:[~2013-03-20 13:22 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-03-20  1:29 [PATCH] ext4: fix ext4_evict_inode() racing against workqueue processing code Theodore Ts'o
2013-03-20  1:38 ` Theodore Ts'o
2013-03-20 13:22 ` Jan Kara [this message]
2013-03-20 13:37   ` Theodore Ts'o
2013-03-20 13:42     ` Jan Kara
2013-03-20 13:51       ` Theodore Ts'o
2013-03-20 14:14 ` Eric Sandeen
2013-03-20 14:45   ` Theodore Ts'o
2013-03-20 20:13     ` Jan Kara
2013-03-26  5:52     ` Zheng Liu
2013-03-26  5:55       ` Zheng Liu
2013-03-26 20:34       ` Jan Kara
2013-03-27  3:13         ` Zheng Liu
2013-03-29  7:32         ` Zheng Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130320132223.GD13294@quack.suse.cz \
    --to=jack@suse.cz \
    --cc=linux-ext4@vger.kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).