From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from cn.fujitsu.com ([59.151.112.132]:31272 "EHLO heian.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-FAIL)
        by vger.kernel.org with ESMTP id S1751200AbbHSFWz (ORCPT );
        Wed, 19 Aug 2015 01:22:55 -0400
Subject: Re: btrfs-image gets stuck, using 100%, looping on bad file descriptor
To: Timothy Normand Miller
References: <55D3DCA5.9070607@cn.fujitsu.com> <55D3EE99.2000702@cn.fujitsu.com>
CC: Btrfs BTRFS
From: Qu Wenruo
Message-ID: <55D412A9.3030903@cn.fujitsu.com>
Date: Wed, 19 Aug 2015 13:22:49 +0800
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset="utf-8"; format=flowed
Sender: linux-btrfs-owner@vger.kernel.org
List-ID:

Timothy Normand Miller wrote on 2015/08/18 22:55 -0400:
> On Tue, Aug 18, 2015 at 10:48 PM, Qu Wenruo wrote:
>>
>>
>> Timothy Normand Miller wrote on 2015/08/18 22:46 -0400:
>>>
>>> On Tue, Aug 18, 2015 at 9:32 PM, Qu Wenruo
>>> wrote:
>>>>
>>>> Hi Timothy,
>>>>
>>>> Although I have replied to the bugzilla, IMHO it's more appropriate to
>>>> discuss it in mail list, as it's not a kernel bug.
>>>>
>>>
>>> All four devices were online. The "missing" one was a drive that
>>> died, which was replaced by a new one, but btrfs wouldn't finish the
>>> deletion of the missing device.
>>>
>> By replaced, did you mean "btrfs replace"? Or just change the physical disk
>> without using "btrfs replace"?
>
> Here's what happened:
>
> - A drive started throwing bad sectors. Somehow this caused metadata
> on other drives to get messed up.

Did that cause any huge damage?

> - I took that drive offline and mounted degraded (it's a 4-drive RAID1)
> - I did a "btrfs add" on a new drive and then a "btrfs delete missing"
> - The replacement drive failed during the replacement operation, and
> everything went to crap.
> - With some help, I got a kernel patch that allowed me to mount the
> original three drives with TWO missing devices.

So the original 3 drives are still OK, the original bad one is missing,
and the newly added one is also missing?
That sounds quite repairable.

> - I added a brand new drive and then did "delete missing" again. This
> time, the first "delete missing" was successful, but it didn't fully
> balance the drives, and there was another missing device, so I had to
> do a "delete missing" again, and that failed.
>
> I wanted to get this back online and restored from a backup, but I was
> willing to keep it this way if people wanted to probe at, in case we
> can uncover any btrfs bugs. So it was suggested to get a metadata
> image, but that ran into some kind of bug in btrfs-image.

If btrfs-image doesn't work, you can also try btrfs-debug-tree.
IIRC, debug-tree should be more robust than btrfs-image.

BTW, have you tried btrfsck on it? Does it also cause the infinite loop?

I'll also try to reproduce it and investigate the code directly.

Thanks,
Qu

>
> Currently, I'm restoring from backup, but I have at least a partial
> metadata dump.
>
>
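
The fallback suggested above (btrfs-debug-tree, then btrfsck) could be tried roughly as in the sketch below. Since btrfs-image spun at 100% CPU, each tool is run under a time limit so a similar loop cannot hang the shell. The helper name run_bounded, the 600-second limit, the output file names, and the device path /dev/sdX are all assumptions for illustration, not something from this thread. The btrfs commands are left commented out because they need the real device.

```shell
#!/bin/sh
# run_bounded LIMIT_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND under coreutils timeout(1) and
# reports on stderr if the time limit was hit (timeout exits with 124).
run_bounded() {
    limit="$1"
    shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "timed out after ${limit}s: $*" >&2
    fi
    return "$status"
}

# Example use. /dev/sdX is a placeholder for one of the surviving devices.
# run_bounded 600 btrfs-debug-tree /dev/sdX > debug-tree.txt
# run_bounded 600 btrfsck /dev/sdX > btrfsck.txt 2>&1
```

An exit status of 124 from either command would suggest that the tool also loops (or is simply slow) on this filesystem, which is worth reporting along with whatever partial output was written.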