From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: Josef Bacik <jbacik@fb.com>, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v4 00/18] Replace btrfs_workers with kernel workqueue based btrfs_workqueue
Date: Mon, 23 Dec 2013 10:35:04 +0800 [thread overview]
Message-ID: <52B7A158.8000503@cn.fujitsu.com> (raw)
In-Reply-To: <52B44688.5070603@fb.com>
On fri, 20 Dec 2013 05:30:48 -0800, Josef Bacik wrote:
>
> On 12/19/2013 07:08 PM, Qu Wenruo wrote:
>> I'm sorry but I failed to reproduce the problem.
>> Btrfs/012 in xfstests has been run for serveral hours but nothing
>> happened.
>>
>> Would you please give me some more details about the environment or
>> the panic backtrace?
>>
>
> Ok so it wasn't that test, it was just ./check -g auto. It would
> sometimes die at btrfs/012, other times it would make it as far as
> generic/083 before it keeled over. I bisected it down to
>
> btrfs: Replace fs_info->workers with btrfs_workqueue
>
> which of course is just the first patch that you start using the new
> code which isn't helpful. My dmesg from one of my runs is here
>
> http://ur1.ca/g867b
>
> All of the initial panics looked exactly the same. I'm just going to
> kick the series out for now while you track down the problem. Thanks,
>
> Josef
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
Thanks for your dmesg a lot.
It's quite sad that I still failed to reproduce the same bug on all my
test environment during the weekend.
But according to the dmesg, which is expired now though, I made the
following assumption:
There is a possibility that out of order execution made the "work->wq =
wq" sentence executed behind the "queue_work()" call,
and the "normal_work_helper" is executed in another CPU instantly, which
caused the problem.
------
static inline void __btrfs_queue_work(struct __btrfs_workqueue *wq,
struct btrfs_work *work)
{
unsigned long flags;
work->wq = wq;
/* There is no memory barrier ensure the work->wq = wq is executed
before queue_work */
thresh_queue_hook(wq);
if (work->ordered_func) {
spin_lock_irqsave(&wq->list_lock, flags);
list_add_tail(&work->ordered_list, &wq->ordered_list);
spin_unlock_irqrestore(&wq->list_lock, flags);
}
queue_work(wq->normal_wq, &work->normal_work);
}
------
The memory order problem involves compiler or even CPU hardware (AMD
CPUS seems to have a weaker memory order than Intel),
so to simulate the bug, I used the following codes tried to reproduce
the bug, and the result is pretty much the same as your dmesg.
------
static inline void __btrfs_queue_work(struct __btrfs_workqueue *wq,
struct btrfs_work *work)
{
unsigned long flags;
int in_order = get_random_int() % 1000
if (in_order)
work->wq = wq;
thresh_queue_hook(wq);
if (work->ordered_func) {
spin_lock_irqsave(&wq->list_lock, flags);
list_add_tail(&work->ordered_list, &wq->ordered_list);
spin_unlock_irqrestore(&wq->list_lock, flags);
}
queue_work(wq->normal_wq, &work->normal_work);
/* Try make the codes out of order since the Intel CPU
seems obey the memory order quite well */
if (!in_order)
work->wq = wq;
}
------
The fix is as easy as just adding a smp_wmb() behind the "work->wq = wq"
sentence, but since I failed to reproduce the bug,
I can't confirm whether the fix is good or not.
I know it's impolite but would you please first try the smp_wmb() first
to confirm the fix?
If the fix is good I'll send the formal V5 patchset.
Thanks
Qu
next prev parent reply other threads:[~2013-12-23 2:34 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-12-17 9:31 [PATCH v4 00/18] Replace btrfs_workers with kernel workqueue based btrfs_workqueue Qu Wenruo
2013-12-17 9:31 ` [PATCH v4 01/18] btrfs: Cleanup the unused struct async_sched Qu Wenruo
2013-12-17 9:31 ` [PATCH v4 02/18] btrfs: Added btrfs_workqueue_struct implemented ordered execution based on kernel workqueue Qu Wenruo
2013-12-17 9:31 ` [PATCH v4 03/18] btrfs: Add high priority workqueue support for btrfs_workqueue_struct Qu Wenruo
2013-12-17 9:31 ` [PATCH v4 04/18] btrfs: Add threshold workqueue based on kernel workqueue Qu Wenruo
2013-12-17 9:31 ` [PATCH v4 05/18] btrfs: Replace fs_info->workers with btrfs_workqueue Qu Wenruo
2013-12-17 9:31 ` [PATCH v4 06/18] btrfs: Replace fs_info->delalloc_workers " Qu Wenruo
2013-12-17 9:31 ` [PATCH v4 07/18] btrfs: Replace fs_info->submit_workers " Qu Wenruo
2013-12-17 9:31 ` [PATCH v4 08/18] btrfs: Replace fs_info->flush_workers " Qu Wenruo
2013-12-17 9:31 ` [PATCH v4 09/18] btrfs: Replace fs_info->endio_* workqueue " Qu Wenruo
2013-12-17 9:31 ` [PATCH v4 10/18] btrfs: Replace fs_info->rmw_workers " Qu Wenruo
2013-12-17 9:31 ` [PATCH v4 11/18] btrfs: Replace fs_info->cache_workers " Qu Wenruo
2013-12-17 9:31 ` [PATCH v4 12/18] btrfs: Replace fs_info->readahead_workers " Qu Wenruo
2013-12-17 9:31 ` [PATCH v4 13/18] btrfs: Replace fs_info->fixup_workers " Qu Wenruo
2013-12-17 9:31 ` [PATCH v4 14/18] btrfs: Replace fs_info->delayed_workers " Qu Wenruo
2013-12-17 9:31 ` [PATCH v4 15/18] btrfs: Replace fs_info->qgroup_rescan_worker " Qu Wenruo
2013-12-17 9:31 ` [PATCH v4 16/18] btrfs: Replace fs_info->scrub_* " Qu Wenruo
2013-12-17 9:31 ` [PATCH v4 17/18] btrfs: Cleanup the old btrfs_worker Qu Wenruo
2013-12-17 9:31 ` [PATCH v4 18/18] btrfs: Cleanup the "_struct" suffix in btrfs_workequeue Qu Wenruo
2013-12-19 15:27 ` [PATCH v4 00/18] Replace btrfs_workers with kernel workqueue based btrfs_workqueue Josef Bacik
2013-12-20 0:37 ` Qu Wenruo
2013-12-20 3:08 ` Qu Wenruo
2013-12-20 13:30 ` Josef Bacik
2013-12-23 2:35 ` Qu Wenruo [this message]
2014-01-08 6:25 ` Qu Wenruo
2014-01-08 18:51 ` David Sterba
2014-01-09 1:35 ` Qu Wenruo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=52B7A158.8000503@cn.fujitsu.com \
--to=quwenruo@cn.fujitsu.com \
--cc=jbacik@fb.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).