Re: [PATCH] Btrfs: fix compressed write corruption on enospc

linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Liu Bo <bo.li.liu@oracle.com>
To: Chris Mason <clm@fb.com>
Cc: Martin Steigerwald <Martin@lichtvoll.de>, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] Btrfs: fix compressed write corruption on enospc
Date: Thu, 7 Aug 2014 15:50:30 +0800	[thread overview]
Message-ID: <20140807075029.GA29710@localhost.localdomain> (raw)
In-Reply-To: <53E2CDD9.5080009@fb.com>

Hi,

On Wed, Aug 06, 2014 at 08:52:41PM -0400, Chris Mason wrote:
> On 08/06/2014 11:18 AM, Chris Mason wrote:
> > On 08/06/2014 10:43 AM, Martin Steigerwald wrote:
> >> Am Mittwoch, 6. August 2014, 09:35:51 schrieb Chris Mason:
> >>> On 08/06/2014 06:21 AM, Martin Steigerwald wrote:
> >>>>> I think this should go to stable. Thanks, Liu.
> >>>
> >>> I'm definitely tagging this for stable.
> >>>
> >>>> Unfortunately this fix does not seem to fix all lockups.
> >>>
> >>> The traces below are a little different, could you please send the whole
> >>> file?
> >>
> >> Will paste it at the end.
> > 
> > [90496.156016] kworker/u8:14   D ffff880044e38540     0 21050      2 0x00000000
> > [90496.157683] Workqueue: btrfs-delalloc normal_work_helper [btrfs]
> > [90496.159320]  ffff88022880f990 0000000000000002 ffff880407f649b0 ffff88022880ffd8
> > [90496.160997]  ffff880044e38000 0000000000013040 ffff880044e38000 7fffffffffffffff
> > [90496.162686]  ffff880301383aa0 0000000000000002 ffffffff814705d0 ffff880301383a98
> > [90496.164360] Call Trace:
> > [90496.166028]  [<ffffffff814705d0>] ? michael_mic.part.6+0x21/0x21
> > [90496.167854]  [<ffffffff81470fd0>] schedule+0x64/0x66
> > [90496.169574]  [<ffffffff814705ff>] schedule_timeout+0x2f/0x114
> > [90496.171221]  [<ffffffff8106479a>] ? wake_up_process+0x2f/0x32
> > [90496.172867]  [<ffffffff81062c3b>] ? get_parent_ip+0xd/0x3c
> > [90496.174472]  [<ffffffff81062ce5>] ? preempt_count_add+0x7b/0x8e
> > [90496.176053]  [<ffffffff814717f3>] __wait_for_common+0x11e/0x163
> > [90496.177619]  [<ffffffff814717f3>] ? __wait_for_common+0x11e/0x163
> > [90496.179173]  [<ffffffff810647aa>] ? wake_up_state+0xd/0xd
> > [90496.180728]  [<ffffffff81471857>] wait_for_completion+0x1f/0x21
> > [90496.182285]  [<ffffffffc044e3b3>] btrfs_async_run_delayed_refs+0xbf/0xd9 [btrfs]
> > [90496.183833]  [<ffffffffc04624e1>] __btrfs_end_transaction+0x2b6/0x2ec [btrfs]
> > [90496.185380]  [<ffffffffc0462522>] btrfs_end_transaction+0xb/0xd [btrfs]
> > [90496.186940]  [<ffffffffc0451742>] find_free_extent+0x8a9/0x976 [btrfs]
> > [90496.189464]  [<ffffffffc0451990>] btrfs_reserve_extent+0x6f/0x119 [btrfs]
> > [90496.191326]  [<ffffffffc0466b45>] cow_file_range+0x1a6/0x377 [btrfs]
> > [90496.193080]  [<ffffffffc047adc4>] ? extent_write_locked_range+0x10c/0x11e [btrfs]
> > [90496.194659]  [<ffffffffc04677e4>] submit_compressed_extents+0x100/0x412 [btrfs]
> > [90496.196225]  [<ffffffff8120e344>] ? debug_smp_processor_id+0x17/0x19
> > [90496.197776]  [<ffffffffc0467b78>] async_cow_submit+0x82/0x87 [btrfs]
> > [90496.199383]  [<ffffffffc048644b>] normal_work_helper+0x153/0x224 [btrfs]
> > [90496.200944]  [<ffffffff81052d8c>] process_one_work+0x16f/0x2b8
> > [90496.202483]  [<ffffffff81053636>] worker_thread+0x27b/0x32e
> > [90496.204000]  [<ffffffff810533bb>] ? cancel_delayed_work_sync+0x10/0x10
> > [90496.205514]  [<ffffffff81058012>] kthread+0xb2/0xba
> > [90496.207040]  [<ffffffff81470000>] ? ap_handle_dropped_data+0xf/0xc8
> > [90496.208565]  [<ffffffff81057f60>] ? __kthread_parkme+0x62/0x62
> > [90496.210096]  [<ffffffff81473f6c>] ret_from_fork+0x7c/0xb0
> > [90496.211618]  [<ffffffff81057f60>] ? __kthread_parkme+0x62/0x62
> > 
> > 
> > Ok, this should explain the hang.  submit_compressed_extents is calling
> > cow_file_range with a locked page.
> > 
> > cow_file_range is trying to find a free extent and in the process is
> > calling btrfs_end_transaction, which is running the async delayed refs,
> > which is trying to write dirty pages, which is waiting for your locked
> > page.
> > 
> > I should be able to reproduce this ;)
> 
> This part of the trace is relatively new because Liu Bo's patch made us
> redirty the pages, making it more likely that we'd try to write them
> during commit.
> 
> But, at the end of the day we have a fundamental deadlock with
> committing a transaction while holding a locked page from an ordered file.
> 
> For now, I'm ripping out the strict ordered file and going back to a
> best-effort filemap_flush like ext4 is using.

I think I've figured the deadlock out, this is obviously a race case, really
hard to reproduce :-(

So it turns out to be related to workqueues -- now a kthread can process
work_struct queued in different workqueues, so we can explain the deadlock as
such,

(1)
"btrfs-delalloc" workqueue gets a compressed extent to process with all its pages
locked during this, and it runs into read free space cache inode, and then wait
on lock_page().

(2)
Reading that free space cache inode comes to submit part, and we have a
indirect twice endio way for it, with the first endio we come to end_workqueue_bio()
and queue a work in "btrfs-endio-meta" workqueue, and it will run the real
endio() for us, but ONLY when it's processed.

So the problem is a kthread can serve several workqueues, which means
works in "btrfs-endio-meta" workqueues and works in "btrfs-flush_delalloc"
workqueues can be in the same processing list of a kthread.  When
"btrfs-flush_delalloc" waits for the compressed page and "btrfs-endio-meta"
comes after it, it hangs.

For now, making a "btrfs-endio-meta-high" can make it pass, but still testing it.

PS:
However, I still have a question on this, kernel workqueue kthread can wake up
another kthread in the same worker pool to run when it's going to sleep(in
__schedule()), so in our hang case, does it happen to be there is no such
'another kthread'?

PPS:
Looking at the above new trace, I think we'd better come up with something new,
making a high priority workqueue seems to be not a good solution.

thanks,
-liubo

> 
> -chris
>

next prev parent reply	other threads:[~2014-08-07  7:50 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-07-24 14:48 [PATCH] Btrfs: fix compressed write corruption on enospc Liu Bo
2014-07-24 14:55 ` Chris Mason
2014-07-25  1:00   ` Liu Bo
2014-07-25  9:58     ` Martin Steigerwald
2014-07-25  1:53 ` Wang Shilong
2014-07-25  2:08   ` Liu Bo
2014-07-25  2:11     ` Wang Shilong
2014-07-25  9:54 ` Martin Steigerwald
2014-08-04 12:50   ` Martin Steigerwald
2014-08-04 12:52     ` Martin Steigerwald
2014-08-06 10:21     ` Martin Steigerwald
2014-08-06 10:29       ` Hugo Mills
2014-08-06 12:28         ` Martin Steigerwald
2014-08-06 13:35       ` Chris Mason
2014-08-06 14:43         ` Martin Steigerwald
2014-08-06 15:18           ` Chris Mason
2014-08-07  0:52             ` Chris Mason
2014-08-07  7:50               ` Liu Bo [this message]
2014-08-07  8:20                 ` Miao Xie
2014-08-07 14:02                   ` Chris Mason
2014-08-10 14:55                     ` Liu Bo
2014-08-11 20:35                       ` Chris Mason
2014-08-12  2:55                       ` Miao Xie
2014-08-12  7:51                         ` Liu Bo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140807075029.GA29710@localhost.localdomain \
    --to=bo.li.liu@oracle.com \
    --cc=Martin@lichtvoll.de \
    --cc=clm@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).