To: linux-btrfs@vger.kernel.org
From: Duncan <1i5t5.duncan@cox.net>
Subject: Re: Unable to rebuild a 3 drive raid1 - blocked for more than 120 seconds.
Date: Thu, 1 May 2014 06:01:34 +0000 (UTC)

Saran Neti posted on Thu, 01 May 2014 00:48:22 -0400 as excerpted:

> I had 3 x 3 TB drives [...] Then one of the drives got busted.
> Mounting the fs in degraded mode and adding a new fresh drive to
> rebuild raid1 generated several "...blocked for more than 120
> seconds." messages.
>
> Described in
> https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg30017.html
> are two possible causes, fragmentation due to COW and hardlinks, both
> of which I think are unlikely in this case. I can mount in degraded
> mode and read files, but that's about it. Is there something I'm
> missing? Any debugging tips would be appreciated.

Just a btrfs user and list regular here, not a dev, but...

You're to be commended for all that useful information you posted.  Way
more helpful than most manage in their first round. =:^)  But it's
enough to see that I can't be of much help beyond what's below, so the
rest is snipped here as unnecessary for this reply...

I've several times seen the devs request a magic-sysrq-w dump for cases
like this.  That should be alt-sysrq-w on x86 hardware, or
echo w > /proc/sysrq-trigger (should work in a VM also).  That dumps
the IO-blocked tasks, letting the devs see where things are screwing
up.  (A rough command sketch follows at the end of this message.)

(If magic-sysrq is new to you, there's more about it in
$KERNDIR/Documentation/sysrq.txt.  Last I looked a google returned some
pretty good hits discussing it, too.)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
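
For reference, here's a rough sketch of how capturing that dump might
look from a root shell on the affected box -- a sketch only, and the
output filename is just a placeholder:

  # The sysctl below gates only the keyboard combo; writing to
  # /proc/sysrq-trigger as root should work regardless of its value.
  cat /proc/sys/kernel/sysrq

  # Reproduce the hang (degraded mount plus the device add), then dump
  # the blocked (uninterruptible) tasks:
  echo w > /proc/sysrq-trigger

  # The backtraces land in the kernel log; capture them to attach to
  # your reply (sysrq-w.txt is just a placeholder name):
  dmesg > sysrq-w.txt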