From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: Josef Bacik <jbacik@fb.com>, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v4 00/18] Replace btrfs_workers with kernel workqueue based btrfs_workqueue
Date: Wed, 08 Jan 2014 14:25:02 +0800 [thread overview]
Message-ID: <52CCEF3E.5010807@cn.fujitsu.com> (raw)
In-Reply-To: <52B7A158.8000503@cn.fujitsu.com>
On Mon, 23 Dec 2013 10:35:04 +0800
, Qu Wenruo wrote:
> On fri, 20 Dec 2013 05:30:48 -0800, Josef Bacik wrote:
>>
>> On 12/19/2013 07:08 PM, Qu Wenruo wrote:
>>> I'm sorry but I failed to reproduce the problem.
>>> Btrfs/012 in xfstests has been run for serveral hours but nothing
>>> happened.
>>>
>>> Would you please give me some more details about the environment or
>>> the panic backtrace?
>>>
>>
>> Ok so it wasn't that test, it was just ./check -g auto. It would
>> sometimes die at btrfs/012, other times it would make it as far as
>> generic/083 before it keeled over. I bisected it down to
>>
>> btrfs: Replace fs_info->workers with btrfs_workqueue
>>
>> which of course is just the first patch that you start using the new
>> code which isn't helpful. My dmesg from one of my runs is here
>>
>> http://ur1.ca/g867b
>>
>> All of the initial panics looked exactly the same. I'm just going to
>> kick the series out for now while you track down the problem. Thanks,
>>
>> Josef
>> --
>> To unsubscribe from this list: send the line "unsubscribe
>> linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
> Thanks for your dmesg a lot.
>
> It's quite sad that I still failed to reproduce the same bug on all my
> test environment during the weekend.
>
> But according to the dmesg, which is expired now though, I made the
> following assumption:
>
> There is a possibility that out of order execution made the "work->wq
> = wq" sentence executed behind the "queue_work()" call,
> and the "normal_work_helper" is executed in another CPU instantly,
> which caused the problem.
>
> ------
> static inline void __btrfs_queue_work(struct __btrfs_workqueue *wq,
> struct btrfs_work *work)
> {
> unsigned long flags;
>
> work->wq = wq;
> /* There is no memory barrier ensure the work->wq = wq is executed
> before queue_work */
> thresh_queue_hook(wq);
> if (work->ordered_func) {
> spin_lock_irqsave(&wq->list_lock, flags);
> list_add_tail(&work->ordered_list, &wq->ordered_list);
> spin_unlock_irqrestore(&wq->list_lock, flags);
> }
> queue_work(wq->normal_wq, &work->normal_work);
> }
> ------
>
> The memory order problem involves compiler or even CPU hardware (AMD
> CPUS seems to have a weaker memory order than Intel),
> so to simulate the bug, I used the following codes tried to reproduce
> the bug, and the result is pretty much the same as your dmesg.
> ------
> static inline void __btrfs_queue_work(struct __btrfs_workqueue *wq,
> struct btrfs_work *work)
> {
> unsigned long flags;
> int in_order = get_random_int() % 1000
>
> if (in_order)
> work->wq = wq;
> thresh_queue_hook(wq);
> if (work->ordered_func) {
> spin_lock_irqsave(&wq->list_lock, flags);
> list_add_tail(&work->ordered_list, &wq->ordered_list);
> spin_unlock_irqrestore(&wq->list_lock, flags);
> }
> queue_work(wq->normal_wq, &work->normal_work);
>
> /* Try make the codes out of order since the Intel CPU
> seems obey the memory order quite well */
> if (!in_order)
> work->wq = wq;
> }
> ------
>
> The fix is as easy as just adding a smp_wmb() behind the "work->wq =
> wq" sentence, but since I failed to reproduce the bug,
> I can't confirm whether the fix is good or not.
>
> I know it's impolite but would you please first try the smp_wmb()
> first to confirm the fix?
>
> If the fix is good I'll send the formal V5 patchset.
>
> Thanks
>
> Qu
Happy new year!
Very sorry for disturbing but ithas been some time before my last replay
and no feedback since then (mainly because of the new year holiday).
If itis convenient for you,would you please give some feedback about the
smp_wmb() fix ?
Also incase that smp_wmb() doesn't work, it would be very nice of you to
also try smp_mb() fix.
Thanks
Qu
next prev parent reply other threads:[~2014-01-08 6:24 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-12-17 9:31 [PATCH v4 00/18] Replace btrfs_workers with kernel workqueue based btrfs_workqueue Qu Wenruo
2013-12-17 9:31 ` [PATCH v4 01/18] btrfs: Cleanup the unused struct async_sched Qu Wenruo
2013-12-17 9:31 ` [PATCH v4 02/18] btrfs: Added btrfs_workqueue_struct implemented ordered execution based on kernel workqueue Qu Wenruo
2013-12-17 9:31 ` [PATCH v4 03/18] btrfs: Add high priority workqueue support for btrfs_workqueue_struct Qu Wenruo
2013-12-17 9:31 ` [PATCH v4 04/18] btrfs: Add threshold workqueue based on kernel workqueue Qu Wenruo
2013-12-17 9:31 ` [PATCH v4 05/18] btrfs: Replace fs_info->workers with btrfs_workqueue Qu Wenruo
2013-12-17 9:31 ` [PATCH v4 06/18] btrfs: Replace fs_info->delalloc_workers " Qu Wenruo
2013-12-17 9:31 ` [PATCH v4 07/18] btrfs: Replace fs_info->submit_workers " Qu Wenruo
2013-12-17 9:31 ` [PATCH v4 08/18] btrfs: Replace fs_info->flush_workers " Qu Wenruo
2013-12-17 9:31 ` [PATCH v4 09/18] btrfs: Replace fs_info->endio_* workqueue " Qu Wenruo
2013-12-17 9:31 ` [PATCH v4 10/18] btrfs: Replace fs_info->rmw_workers " Qu Wenruo
2013-12-17 9:31 ` [PATCH v4 11/18] btrfs: Replace fs_info->cache_workers " Qu Wenruo
2013-12-17 9:31 ` [PATCH v4 12/18] btrfs: Replace fs_info->readahead_workers " Qu Wenruo
2013-12-17 9:31 ` [PATCH v4 13/18] btrfs: Replace fs_info->fixup_workers " Qu Wenruo
2013-12-17 9:31 ` [PATCH v4 14/18] btrfs: Replace fs_info->delayed_workers " Qu Wenruo
2013-12-17 9:31 ` [PATCH v4 15/18] btrfs: Replace fs_info->qgroup_rescan_worker " Qu Wenruo
2013-12-17 9:31 ` [PATCH v4 16/18] btrfs: Replace fs_info->scrub_* " Qu Wenruo
2013-12-17 9:31 ` [PATCH v4 17/18] btrfs: Cleanup the old btrfs_worker Qu Wenruo
2013-12-17 9:31 ` [PATCH v4 18/18] btrfs: Cleanup the "_struct" suffix in btrfs_workequeue Qu Wenruo
2013-12-19 15:27 ` [PATCH v4 00/18] Replace btrfs_workers with kernel workqueue based btrfs_workqueue Josef Bacik
2013-12-20 0:37 ` Qu Wenruo
2013-12-20 3:08 ` Qu Wenruo
2013-12-20 13:30 ` Josef Bacik
2013-12-23 2:35 ` Qu Wenruo
2014-01-08 6:25 ` Qu Wenruo [this message]
2014-01-08 18:51 ` David Sterba
2014-01-09 1:35 ` Qu Wenruo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=52CCEF3E.5010807@cn.fujitsu.com \
--to=quwenruo@cn.fujitsu.com \
--cc=jbacik@fb.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).