Date: Thu, 7 Aug 2014 15:50:30 +0800
From: Liu Bo
To: Chris Mason
Cc: Martin Steigerwald, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] Btrfs: fix compressed write corruption on enospc

Hi,

On Wed, Aug 06, 2014 at 08:52:41PM -0400, Chris Mason wrote:
> On 08/06/2014 11:18 AM, Chris Mason wrote:
> > On 08/06/2014 10:43 AM, Martin Steigerwald wrote:
> >> On Wednesday, 6 August 2014, 09:35:51, Chris Mason wrote:
> >>> On 08/06/2014 06:21 AM, Martin Steigerwald wrote:
> >>>>> I think this should go to stable. Thanks, Liu.
> >>>
> >>> I'm definitely tagging this for stable.
> >>>
> >>>> Unfortunately this fix does not seem to fix all lockups.
> >>>
> >>> The traces below are a little different, could you please send the
> >>> whole file?
> >>
> >> Will paste it at the end.
> >
> > [90496.156016] kworker/u8:14 D ffff880044e38540 0 21050 2 0x00000000
> > [90496.157683] Workqueue: btrfs-delalloc normal_work_helper [btrfs]
> > [90496.159320] ffff88022880f990 0000000000000002 ffff880407f649b0 ffff88022880ffd8
> > [90496.160997] ffff880044e38000 0000000000013040 ffff880044e38000 7fffffffffffffff
> > [90496.162686] ffff880301383aa0 0000000000000002 ffffffff814705d0 ffff880301383a98
> > [90496.164360] Call Trace:
> > [90496.166028] [] ? michael_mic.part.6+0x21/0x21
> > [90496.167854] [] schedule+0x64/0x66
> > [90496.169574] [] schedule_timeout+0x2f/0x114
> > [90496.171221] [] ? wake_up_process+0x2f/0x32
> > [90496.172867] [] ? get_parent_ip+0xd/0x3c
> > [90496.174472] [] ? preempt_count_add+0x7b/0x8e
> > [90496.176053] [] __wait_for_common+0x11e/0x163
> > [90496.177619] [] ? __wait_for_common+0x11e/0x163
> > [90496.179173] [] ? wake_up_state+0xd/0xd
> > [90496.180728] [] wait_for_completion+0x1f/0x21
> > [90496.182285] [] btrfs_async_run_delayed_refs+0xbf/0xd9 [btrfs]
> > [90496.183833] [] __btrfs_end_transaction+0x2b6/0x2ec [btrfs]
> > [90496.185380] [] btrfs_end_transaction+0xb/0xd [btrfs]
> > [90496.186940] [] find_free_extent+0x8a9/0x976 [btrfs]
> > [90496.189464] [] btrfs_reserve_extent+0x6f/0x119 [btrfs]
> > [90496.191326] [] cow_file_range+0x1a6/0x377 [btrfs]
> > [90496.193080] [] ? extent_write_locked_range+0x10c/0x11e [btrfs]
> > [90496.194659] [] submit_compressed_extents+0x100/0x412 [btrfs]
> > [90496.196225] [] ? debug_smp_processor_id+0x17/0x19
> > [90496.197776] [] async_cow_submit+0x82/0x87 [btrfs]
> > [90496.199383] [] normal_work_helper+0x153/0x224 [btrfs]
> > [90496.200944] [] process_one_work+0x16f/0x2b8
> > [90496.202483] [] worker_thread+0x27b/0x32e
> > [90496.204000] [] ? cancel_delayed_work_sync+0x10/0x10
> > [90496.205514] [] kthread+0xb2/0xba
> > [90496.207040] [] ? ap_handle_dropped_data+0xf/0xc8
> > [90496.208565] [] ? __kthread_parkme+0x62/0x62
> > [90496.210096] [] ret_from_fork+0x7c/0xb0
> > [90496.211618] [] ? __kthread_parkme+0x62/0x62
> >
> > Ok, this should explain the hang.
> > submit_compressed_extents is calling cow_file_range with a locked
> > page.
> >
> > cow_file_range is trying to find a free extent and in the process is
> > calling btrfs_end_transaction, which is running the async delayed
> > refs, which is trying to write dirty pages, which is waiting for your
> > locked page.
> >
> > I should be able to reproduce this ;)
>
> This part of the trace is relatively new because Liu Bo's patch made us
> redirty the pages, making it more likely that we'd try to write them
> during commit.
>
> But, at the end of the day we have a fundamental deadlock with
> committing a transaction while holding a locked page from an ordered
> file.
>
> For now, I'm ripping out the strict ordered file and going back to a
> best-effort filemap_flush like ext4 is using.

I think I've figured out the deadlock; it's obviously a race, and really
hard to reproduce :-(

It turns out to be related to workqueues: a kthread can now process
work_structs queued on different workqueues, so the deadlock can be
explained as follows.

(1) The "btrfs-delalloc" workqueue picks up a compressed extent to
process and keeps all of its pages locked while doing so. In the middle
of that it needs to read the free space cache inode, and it ends up
waiting in lock_page().

(2) Reading that free space cache inode reaches the bio submit part,
where the endio is indirect and runs in two stages: the first endio,
end_workqueue_bio(), only queues a work item on the "btrfs-endio-meta"
workqueue, and the real endio() runs for us ONLY once that work item is
processed.

So the problem is that one kthread can serve several workqueues, which
means work items from the "btrfs-endio-meta" workqueue and work items
from the "btrfs-flush_delalloc" workqueue can sit on the same kthread's
processing list. When the "btrfs-flush_delalloc" work waits for the
compressed page and the "btrfs-endio-meta" work is queued behind it, we
hang.

For now, adding a high-priority "btrfs-endio-meta-high" workqueue makes
the test pass (a rough sketch of the idea is at the end of this mail),
but I'm still testing it.

PS: I still have a question on this: a kernel workqueue kthread that is
about to sleep (in __schedule()) can wake up another kthread in the same
worker pool to keep running work, so in our hang, does it happen that no
such 'another kthread' is available?

PPS: Looking at the new trace above, I think we had better come up with
something else; making a high-priority workqueue does not seem to be a
good solution.

thanks,
-liubo

>
> -chris
>
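
Here is the rough sketch of the "btrfs-endio-meta-high" idea mentioned
above. It uses the plain kernel workqueue API for illustration (btrfs
actually goes through its own btrfs_alloc_workqueue() wrappers), and the
function names here are made up. The point is that WQ_HIGHPRI puts the
queue's work items on a dedicated high-priority worker pool, so an endio
item can no longer sit behind a flush-delalloc item in the same
kthread's processing list:

#include <linux/workqueue.h>

static struct workqueue_struct *endio_meta_high_wq;

/* Allocate the high-priority queue once, e.g. at mount time. */
static int endio_meta_high_init(void)
{
	/*
	 * WQ_HIGHPRI: work items run on the per-CPU high-priority
	 * worker pool instead of the normal one.
	 * WQ_MEM_RECLAIM: guarantee forward progress under memory
	 * pressure, which endio work on the writeback path needs.
	 */
	endio_meta_high_wq = alloc_workqueue("btrfs-endio-meta-high",
					     WQ_HIGHPRI | WQ_MEM_RECLAIM, 0);
	return endio_meta_high_wq ? 0 : -ENOMEM;
}

/*
 * Called from the bio-level endio (end_workqueue_bio() in the
 * description above): hand the real endio work to the high-priority
 * pool rather than to "btrfs-endio-meta".
 */
static void queue_endio_meta_work(struct work_struct *work)
{
	queue_work(endio_meta_high_wq, work);
}

Whether a separate pool is actually enough, or just makes the window
smaller, is exactly the open question in the PPS above.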