linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: Josef Bacik <jbacik@fb.com>, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v4 00/18] Replace btrfs_workers with kernel workqueue based btrfs_workqueue
Date: Mon, 23 Dec 2013 10:35:04 +0800	[thread overview]
Message-ID: <52B7A158.8000503@cn.fujitsu.com> (raw)
In-Reply-To: <52B44688.5070603@fb.com>

On fri, 20 Dec 2013 05:30:48 -0800, Josef Bacik wrote:
>
> On 12/19/2013 07:08 PM, Qu Wenruo wrote:
>> I'm sorry but I failed to reproduce the problem.
>> Btrfs/012 in xfstests has been run for serveral hours but nothing 
>> happened.
>>
>> Would you please give me some more details about the environment or 
>> the panic backtrace?
>>
>
> Ok so it wasn't that test, it was just ./check -g auto. It would 
> sometimes die at btrfs/012, other times it would make it as far as 
> generic/083 before it keeled over. I bisected it down to
>
> btrfs: Replace fs_info->workers with btrfs_workqueue
>
> which of course is just the first patch that you start using the new 
> code which isn't helpful. My dmesg from one of my runs is here
>
> http://ur1.ca/g867b
>
> All of the initial panics looked exactly the same. I'm just going to 
> kick the series out for now while you track down the problem. Thanks,
>
> Josef
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
Thanks for your dmesg a lot.

It's quite sad that I still failed to reproduce the same bug on all my 
test environment during the weekend.

But according to the dmesg, which is expired now though, I made the 
following assumption:

There is a possibility that out of order execution made the "work->wq = 
wq" sentence executed behind the "queue_work()" call,
and the "normal_work_helper" is executed in another CPU instantly, which 
caused the problem.

------
static inline void __btrfs_queue_work(struct __btrfs_workqueue *wq,
				      struct btrfs_work *work)
{
	unsigned long flags;

	work->wq = wq;
/* There is no memory barrier ensure the work->wq = wq is executed
before queue_work */
	thresh_queue_hook(wq);
	if (work->ordered_func) {
		spin_lock_irqsave(&wq->list_lock, flags);
		list_add_tail(&work->ordered_list, &wq->ordered_list);
		spin_unlock_irqrestore(&wq->list_lock, flags);
	}
	queue_work(wq->normal_wq, &work->normal_work);
}
------

The memory order problem involves compiler or even CPU hardware (AMD 
CPUS seems to have a weaker memory order than Intel),
so to simulate the bug, I used the following codes tried to reproduce 
the bug, and the result is pretty much the same as your dmesg.
------
static inline void __btrfs_queue_work(struct __btrfs_workqueue *wq,
				      struct btrfs_work *work)
{
	unsigned long flags;
	int in_order = get_random_int() % 1000

	if (in_order)
		work->wq = wq;
	thresh_queue_hook(wq);
	if (work->ordered_func) {
		spin_lock_irqsave(&wq->list_lock, flags);
		list_add_tail(&work->ordered_list, &wq->ordered_list);
		spin_unlock_irqrestore(&wq->list_lock, flags);
	}
	queue_work(wq->normal_wq, &work->normal_work);

/* Try make the codes out of order since the Intel CPU
seems obey the memory order quite well */

	if (!in_order)
		work->wq = wq;
}
------

The fix is as easy as just adding a smp_wmb() behind the "work->wq = wq" 
sentence, but since I failed to reproduce the bug,
I can't confirm whether the fix is good or not.

I know it's impolite but would you please first try the smp_wmb() first 
to confirm the fix?

If the fix is good I'll send the formal V5 patchset.

Thanks

Qu

  reply	other threads:[~2013-12-23  2:34 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-17  9:31 [PATCH v4 00/18] Replace btrfs_workers with kernel workqueue based btrfs_workqueue Qu Wenruo
2013-12-17  9:31 ` [PATCH v4 01/18] btrfs: Cleanup the unused struct async_sched Qu Wenruo
2013-12-17  9:31 ` [PATCH v4 02/18] btrfs: Added btrfs_workqueue_struct implemented ordered execution based on kernel workqueue Qu Wenruo
2013-12-17  9:31 ` [PATCH v4 03/18] btrfs: Add high priority workqueue support for btrfs_workqueue_struct Qu Wenruo
2013-12-17  9:31 ` [PATCH v4 04/18] btrfs: Add threshold workqueue based on kernel workqueue Qu Wenruo
2013-12-17  9:31 ` [PATCH v4 05/18] btrfs: Replace fs_info->workers with btrfs_workqueue Qu Wenruo
2013-12-17  9:31 ` [PATCH v4 06/18] btrfs: Replace fs_info->delalloc_workers " Qu Wenruo
2013-12-17  9:31 ` [PATCH v4 07/18] btrfs: Replace fs_info->submit_workers " Qu Wenruo
2013-12-17  9:31 ` [PATCH v4 08/18] btrfs: Replace fs_info->flush_workers " Qu Wenruo
2013-12-17  9:31 ` [PATCH v4 09/18] btrfs: Replace fs_info->endio_* workqueue " Qu Wenruo
2013-12-17  9:31 ` [PATCH v4 10/18] btrfs: Replace fs_info->rmw_workers " Qu Wenruo
2013-12-17  9:31 ` [PATCH v4 11/18] btrfs: Replace fs_info->cache_workers " Qu Wenruo
2013-12-17  9:31 ` [PATCH v4 12/18] btrfs: Replace fs_info->readahead_workers " Qu Wenruo
2013-12-17  9:31 ` [PATCH v4 13/18] btrfs: Replace fs_info->fixup_workers " Qu Wenruo
2013-12-17  9:31 ` [PATCH v4 14/18] btrfs: Replace fs_info->delayed_workers " Qu Wenruo
2013-12-17  9:31 ` [PATCH v4 15/18] btrfs: Replace fs_info->qgroup_rescan_worker " Qu Wenruo
2013-12-17  9:31 ` [PATCH v4 16/18] btrfs: Replace fs_info->scrub_* " Qu Wenruo
2013-12-17  9:31 ` [PATCH v4 17/18] btrfs: Cleanup the old btrfs_worker Qu Wenruo
2013-12-17  9:31 ` [PATCH v4 18/18] btrfs: Cleanup the "_struct" suffix in btrfs_workequeue Qu Wenruo
2013-12-19 15:27 ` [PATCH v4 00/18] Replace btrfs_workers with kernel workqueue based btrfs_workqueue Josef Bacik
2013-12-20  0:37   ` Qu Wenruo
2013-12-20  3:08   ` Qu Wenruo
2013-12-20 13:30     ` Josef Bacik
2013-12-23  2:35       ` Qu Wenruo [this message]
2014-01-08  6:25         ` Qu Wenruo
2014-01-08 18:51           ` David Sterba
2014-01-09  1:35             ` Qu Wenruo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=52B7A158.8000503@cn.fujitsu.com \
    --to=quwenruo@cn.fujitsu.com \
    --cc=jbacik@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).