From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mx0a-00082601.pphosted.com ([67.231.145.42]:12773
	"EHLO mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1751125AbaJGVXG (ORCPT );
	Tue, 7 Oct 2014 17:23:06 -0400
Date: Tue, 7 Oct 2014 17:22:52 -0400
From: Chris Mason
Subject: Re: 3.16.2 btrfs deadlock
To: Marc MERLIN
CC:
Message-ID: <1412716972.2374.1@mail.thefacebook.com>
In-Reply-To: <20141005202937.GK10696@merlins.org>
References: <20141005202937.GK10696@merlins.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"; format=flowed
Sender: linux-btrfs-owner@vger.kernel.org
List-ID:

On Sun, Oct 5, 2014 at 4:29 PM, Marc MERLIN wrote:
> Deadlocks have been less frequent (good), but here is one.
>
> An rsync from 5 days ago got stuck on btrfs it seems, and things just
> started piling up on top until the system deadlocked
>
> I see btrfs-transaction saying wait on page, but if it's RAM, I had
> plenty left:
>              total       used       free     shared    buffers     cached
> Mem:       7894580    6950108     944472          0         40    4816148
> -/+ buffers/cache:    2133920    5760660
> Swap:     15616764     767004   14849760
>
>
> Here's the trace:
> SysRq : Show Blocked State
>   task                        PC stack   pid father
> md8_raid5       D ffff88017028cb80     0   675      2 0x00000000
>  ffff88020fd67aa8 0000000000000046 ffffffff812f1799 ffff88020fd67fd8
>  ffff880037228410 00000000000140c0 ffff88021e3940c0 ffff880037228410
>  ffff8801f5579bf0 0000000000000004 ffff880211ad07c8 ffff88020fd67ab8
> Call Trace:
>  [] ? blk_flush_plug_list+0x1bc/0x1cb
>  [] schedule+0x6e/0x70
>  [] io_schedule+0x60/0x7a
>  [] get_request+0x4b8/0x56a
>  [] ? cfq_merge+0x49/0x9e
>  [] ? finish_wait+0x65/0x65
>  [] blk_queue_bio+0x179/0x262
>  [] generic_make_request+0x9c/0xdb
>  [] handle_stripe+0x1e41/0x2166 [raid456]
>  [] ? ___preempt_schedule+0x56/0xa8
>  [] ? _raw_spin_unlock_irqrestore+0x1f/0x32
>  [] handle_active_stripes.isra.22+0x2e3/0x359 [raid456]
>  [] ? md_wakeup_thread+0x55/0x58
>  [] raid5d+0x330/0x428 [raid456]
>  [] ? get_parent_ip+0xd/0x3c
>  [] md_thread+0x11c/0x13a
>  [] ? finish_wait+0x65/0x65
>  [] ? bb_store+0x55/0x55
>  [] kthread+0xae/0xb6
>  [] ? __kthread_parkme+0x61/0x61
>  [] ret_from_fork+0x7c/0xb0
>  [] ? __kthread_parkme+0x61/0x61

This trace shows we're stuck somewhere different from the 3.15 stalls.
md is waiting for a request, and unfortunately those are outside of
btrfs completely. It's likely that if you had let it sit, the box would
have eventually dug its way out.

-chris