All of lore.kernel.org
 help / color / mirror / Atom feed
From: Liu Bo <bo.li.liu@oracle.com>
To: Chris Mason <clm@fb.com>
Cc: Martin Steigerwald <Martin@lichtvoll.de>, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] Btrfs: fix compressed write corruption on enospc
Date: Thu, 7 Aug 2014 15:50:30 +0800	[thread overview]
Message-ID: <20140807075029.GA29710@localhost.localdomain> (raw)
In-Reply-To: <53E2CDD9.5080009@fb.com>

Hi,

On Wed, Aug 06, 2014 at 08:52:41PM -0400, Chris Mason wrote:
> On 08/06/2014 11:18 AM, Chris Mason wrote:
> > On 08/06/2014 10:43 AM, Martin Steigerwald wrote:
> >> Am Mittwoch, 6. August 2014, 09:35:51 schrieb Chris Mason:
> >>> On 08/06/2014 06:21 AM, Martin Steigerwald wrote:
> >>>>> I think this should go to stable. Thanks, Liu.
> >>>
> >>> I'm definitely tagging this for stable.
> >>>
> >>>> Unfortunately this fix does not seem to fix all lockups.
> >>>
> >>> The traces below are a little different, could you please send the whole
> >>> file?
> >>
> >> Will paste it at the end.
> > 
> > [90496.156016] kworker/u8:14   D ffff880044e38540     0 21050      2 0x00000000
> > [90496.157683] Workqueue: btrfs-delalloc normal_work_helper [btrfs]
> > [90496.159320]  ffff88022880f990 0000000000000002 ffff880407f649b0 ffff88022880ffd8
> > [90496.160997]  ffff880044e38000 0000000000013040 ffff880044e38000 7fffffffffffffff
> > [90496.162686]  ffff880301383aa0 0000000000000002 ffffffff814705d0 ffff880301383a98
> > [90496.164360] Call Trace:
> > [90496.166028]  [<ffffffff814705d0>] ? michael_mic.part.6+0x21/0x21
> > [90496.167854]  [<ffffffff81470fd0>] schedule+0x64/0x66
> > [90496.169574]  [<ffffffff814705ff>] schedule_timeout+0x2f/0x114
> > [90496.171221]  [<ffffffff8106479a>] ? wake_up_process+0x2f/0x32
> > [90496.172867]  [<ffffffff81062c3b>] ? get_parent_ip+0xd/0x3c
> > [90496.174472]  [<ffffffff81062ce5>] ? preempt_count_add+0x7b/0x8e
> > [90496.176053]  [<ffffffff814717f3>] __wait_for_common+0x11e/0x163
> > [90496.177619]  [<ffffffff814717f3>] ? __wait_for_common+0x11e/0x163
> > [90496.179173]  [<ffffffff810647aa>] ? wake_up_state+0xd/0xd
> > [90496.180728]  [<ffffffff81471857>] wait_for_completion+0x1f/0x21
> > [90496.182285]  [<ffffffffc044e3b3>] btrfs_async_run_delayed_refs+0xbf/0xd9 [btrfs]
> > [90496.183833]  [<ffffffffc04624e1>] __btrfs_end_transaction+0x2b6/0x2ec [btrfs]
> > [90496.185380]  [<ffffffffc0462522>] btrfs_end_transaction+0xb/0xd [btrfs]
> > [90496.186940]  [<ffffffffc0451742>] find_free_extent+0x8a9/0x976 [btrfs]
> > [90496.189464]  [<ffffffffc0451990>] btrfs_reserve_extent+0x6f/0x119 [btrfs]
> > [90496.191326]  [<ffffffffc0466b45>] cow_file_range+0x1a6/0x377 [btrfs]
> > [90496.193080]  [<ffffffffc047adc4>] ? extent_write_locked_range+0x10c/0x11e [btrfs]
> > [90496.194659]  [<ffffffffc04677e4>] submit_compressed_extents+0x100/0x412 [btrfs]
> > [90496.196225]  [<ffffffff8120e344>] ? debug_smp_processor_id+0x17/0x19
> > [90496.197776]  [<ffffffffc0467b78>] async_cow_submit+0x82/0x87 [btrfs]
> > [90496.199383]  [<ffffffffc048644b>] normal_work_helper+0x153/0x224 [btrfs]
> > [90496.200944]  [<ffffffff81052d8c>] process_one_work+0x16f/0x2b8
> > [90496.202483]  [<ffffffff81053636>] worker_thread+0x27b/0x32e
> > [90496.204000]  [<ffffffff810533bb>] ? cancel_delayed_work_sync+0x10/0x10
> > [90496.205514]  [<ffffffff81058012>] kthread+0xb2/0xba
> > [90496.207040]  [<ffffffff81470000>] ? ap_handle_dropped_data+0xf/0xc8
> > [90496.208565]  [<ffffffff81057f60>] ? __kthread_parkme+0x62/0x62
> > [90496.210096]  [<ffffffff81473f6c>] ret_from_fork+0x7c/0xb0
> > [90496.211618]  [<ffffffff81057f60>] ? __kthread_parkme+0x62/0x62
> > 
> > 
> > Ok, this should explain the hang.  submit_compressed_extents is calling
> > cow_file_range with a locked page.
> > 
> > cow_file_range is trying to find a free extent and in the process is
> > calling btrfs_end_transaction, which is running the async delayed refs,
> > which is trying to write dirty pages, which is waiting for your locked
> > page.
> > 
> > I should be able to reproduce this ;)
> 
> This part of the trace is relatively new because Liu Bo's patch made us
> redirty the pages, making it more likely that we'd try to write them
> during commit.
> 
> But, at the end of the day we have a fundamental deadlock with
> committing a transaction while holding a locked page from an ordered file.
> 
> For now, I'm ripping out the strict ordered file and going back to a
> best-effort filemap_flush like ext4 is using.

I think I've figured the deadlock out, this is obviously a race case, really
hard to reproduce :-(

So it turns out to be related to workqueues -- now a kthread can process
work_struct queued in different workqueues, so we can explain the deadlock as
such,

(1)
"btrfs-delalloc" workqueue gets a compressed extent to process with all its pages
locked during this, and it runs into read free space cache inode, and then wait
on lock_page().

(2)
Reading that free space cache inode comes to submit part, and we have a
indirect twice endio way for it, with the first endio we come to end_workqueue_bio()
and queue a work in "btrfs-endio-meta" workqueue, and it will run the real
endio() for us, but ONLY when it's processed.

So the problem is a kthread can serve several workqueues, which means
works in "btrfs-endio-meta" workqueues and works in "btrfs-flush_delalloc"
workqueues can be in the same processing list of a kthread.  When
"btrfs-flush_delalloc" waits for the compressed page and "btrfs-endio-meta"
comes after it, it hangs.

For now, making a "btrfs-endio-meta-high" can make it pass, but still testing it.

PS:
However, I still have a question on this, kernel workqueue kthread can wake up
another kthread in the same worker pool to run when it's going to sleep(in
__schedule()), so in our hang case, does it happen to be there is no such
'another kthread'?

PPS:
Looking at the above new trace, I think we'd better come up with something new,
making a high priority workqueue seems to be not a good solution.

thanks,
-liubo

> 
> -chris
> 

  reply	other threads:[~2014-08-07  7:50 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-07-24 14:48 [PATCH] Btrfs: fix compressed write corruption on enospc Liu Bo
2014-07-24 14:55 ` Chris Mason
2014-07-25  1:00   ` Liu Bo
2014-07-25  9:58     ` Martin Steigerwald
2014-07-25  1:53 ` Wang Shilong
2014-07-25  2:08   ` Liu Bo
2014-07-25  2:11     ` Wang Shilong
2014-07-25  9:54 ` Martin Steigerwald
2014-08-04 12:50   ` Martin Steigerwald
2014-08-04 12:52     ` Martin Steigerwald
2014-08-06 10:21     ` Martin Steigerwald
2014-08-06 10:29       ` Hugo Mills
2014-08-06 12:28         ` Martin Steigerwald
2014-08-06 13:35       ` Chris Mason
2014-08-06 14:43         ` Martin Steigerwald
2014-08-06 15:18           ` Chris Mason
2014-08-07  0:52             ` Chris Mason
2014-08-07  7:50               ` Liu Bo [this message]
2014-08-07  8:20                 ` Miao Xie
2014-08-07 14:02                   ` Chris Mason
2014-08-10 14:55                     ` Liu Bo
2014-08-11 20:35                       ` Chris Mason
2014-08-12  2:55                       ` Miao Xie
2014-08-12  7:51                         ` Liu Bo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140807075029.GA29710@localhost.localdomain \
    --to=bo.li.liu@oracle.com \
    --cc=Martin@lichtvoll.de \
    --cc=clm@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.