From: Martin Steigerwald <Martin@lichtvoll.de>
To: Liu Bo <bo.li.liu@oracle.com>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>,
"Chris Mason" <clm@fb.com>,
miaox@cn.fujitsu.com, "Marc MERLIN" <marc@merlins.org>,
Torbjørn <lists@skagestad.org>
Subject: Re: [PATCH] Btrfs: fix task hang under heavy compressed write
Date: Wed, 13 Aug 2014 13:54:40 +0200
Message-ID: <2364156.aMAqnATvIX@merkaba>
In-Reply-To: <1407829499-21902-1-git-send-email-bo.li.liu@oracle.com>
On Tuesday, 12 August 2014, 15:44:59, Liu Bo wrote:
> This has been reported and discussed for a long time, and this hang occurs
> in both 3.15 and 3.16.
Liu, is this safe for testing yet?
Thanks,
Martin
> Btrfs has migrated to the kernel workqueue, but this introduced the hang
> problem.
>
> Btrfs has a kind of work that is queued in order, which means that its
> ordered_func() must be processed in FIFO order, so it usually looks
> like --
>
> normal_work_helper(arg)
>     work = container_of(arg, struct btrfs_work, normal_work);
>
>     work->func() <---- (we name it work X)
>     for ordered_work in wq->ordered_list
>             ordered_work->ordered_func()
>             ordered_work->ordered_free()
>
> The hang is a rare case. When we look for free space and get an uncached
> block group, we go to read its free space cache inode for free space
> information, so it will
>
> file a readahead request
>     btrfs_readpages()
>         for page that is not in page cache
>             __do_readpage()
>                 submit_extent_page()
>                     btrfs_submit_bio_hook()
>                         btrfs_bio_wq_end_io()
>                         submit_bio()
>                         end_workqueue_bio() <--(ret by the 1st endio)
>                             queue a work(named work Y) for the 2nd
>                             also the real endio()
>
> So the hang occurs when work Y's work_struct and work X's work_struct
> happen to share the same address.
>
> A bit more explanation,
>
> A,B,C -- struct btrfs_work
> arg -- struct work_struct
>
> kthread:
> worker_thread()
>     pick up a work_struct from @worklist
>     process_one_work(arg)
>         worker->current_work = arg; <-- arg is A->normal_work
>         worker->current_func(arg)
>             normal_work_helper(arg)
>                 A = container_of(arg, struct btrfs_work, normal_work);
>
>                 A->func()
>                 A->ordered_func()
>                 A->ordered_free() <-- A gets freed
>
>                 B->ordered_func()
>                     submit_compressed_extents()
>                         find_free_extent()
>                             load_free_space_inode()
>                                 ... <-- (the above readahead stack)
>                                 end_workqueue_bio()
>                                     btrfs_queue_work(work C)
>                 B->ordered_free()
>
> If work A has a high priority in wq->ordered_list and there are more
> ordered works queued after it, such as B->ordered_func(), its memory could
> have been freed before normal_work_helper() returns, which means that the
> kernel workqueue code worker_thread() still has worker->current_work
> pointing to work A->normal_work's address, i.e. arg's address.
>
> Meanwhile, work C is allocated after work A is freed, so work C->normal_work
> and work A->normal_work are likely to share the same address (I confirmed
> this with ftrace output, so I'm not just guessing, though it's rare).
>
> When another kthread picks up work C->normal_work to process and finds that
> our kthread is processing it (see find_worker_executing_work()), it will
> treat work C as a collision and skip it, so nobody ends up processing work C.
>
> So the situation is that our kthread is waiting forever on work C.
>
> The key point is that they shouldn't have the same address, so this defers
> ->ordered_free() and does a batched free to avoid that.
>
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> ---
> fs/btrfs/async-thread.c | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
> index 5a201d8..2ac01b3 100644
> --- a/fs/btrfs/async-thread.c
> +++ b/fs/btrfs/async-thread.c
> @@ -195,6 +195,7 @@ static void run_ordered_work(struct __btrfs_workqueue *wq)
> struct btrfs_work *work;
> spinlock_t *lock = &wq->list_lock;
> unsigned long flags;
> + LIST_HEAD(free_list);
>
> while (1) {
> spin_lock_irqsave(lock, flags);
> @@ -219,17 +220,24 @@ static void run_ordered_work(struct __btrfs_workqueue *wq)
>
> /* now take the lock again and drop our item from the list */
> spin_lock_irqsave(lock, flags);
> - list_del(&work->ordered_list);
> + list_move_tail(&work->ordered_list, &free_list);
> spin_unlock_irqrestore(lock, flags);
>
> /*
> * we don't want to call the ordered free functions
> * with the lock held though
> */
> + }
> + spin_unlock_irqrestore(lock, flags);
> +
> + while (!list_empty(&free_list)) {
> + work = list_entry(free_list.next, struct btrfs_work,
> + ordered_list);
> +
> + list_del(&work->ordered_list);
> work->ordered_free(work);
> trace_btrfs_all_work_done(work);
> }
> - spin_unlock_irqrestore(lock, flags);
> }
>
> static void normal_work_helper(struct work_struct *arg)
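As I read the patch, the idea is to run every ordered_func() first, collect the finished works on a local list, and free them in one batch afterwards. A rough userspace sketch of that ordering follows. It is an assumption-laden model, not the btrfs code: `struct model_work`, `model_run_ordered_work()` and the event log are hypothetical, a singly linked list replaces the kernel list helpers, and locking is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_EVENTS 16

/* Hypothetical stand-in for struct btrfs_work. */
struct model_work {
    int id;
    struct model_work *next;    /* link in the ordered list or the free list */
};

/* Event log: 'F' for an ordered_func() call, 'R' for an ordered_free() call. */
static char event_log[MAX_EVENTS];
static size_t n_events;

static void log_event(char tag)
{
    if (n_events < MAX_EVENTS)
        event_log[n_events++] = tag;
}

static void ordered_func(struct model_work *w) { (void)w; log_event('F'); }
static void ordered_free(struct model_work *w) { (void)w; log_event('R'); }

/* Run all ordered functions in FIFO order, then release the works in a batch. */
void model_run_ordered_work(struct model_work *ordered_head)
{
    struct model_work *free_head = NULL;
    struct model_work **free_tail = &free_head;
    struct model_work *w;

    while ((w = ordered_head) != NULL) {
        ordered_func(w);
        ordered_head = w->next;     /* drop the item from the ordered list */
        w->next = NULL;
        *free_tail = w;             /* append to the tail of the free list */
        free_tail = &w->next;
    }

    /* Nothing was freed above, so no address could have been reused yet. */
    while ((w = free_head) != NULL) {
        free_head = w->next;
        ordered_free(w);
    }
}
```

With three works queued, the log in this model reads F, F, F, R, R, R, so no work is released while a later ordered_func() may still be allocating new works.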
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7