On 2015-08-18 22:55, Timothy Normand Miller wrote:
> On Tue, Aug 18, 2015 at 10:48 PM, Qu Wenruo wrote:
>>
>> Timothy Normand Miller wrote on 2015/08/18 22:46 -0400:
>>>
>>> On Tue, Aug 18, 2015 at 9:32 PM, Qu Wenruo wrote:
>>>>
>>>> Hi Timothy,
>>>>
>>>> Although I have replied to the bugzilla, IMHO it's more appropriate to
>>>> discuss it in mail list, as it's not a kernel bug.
>>>>
>>>
>>> All four devices were online. The "missing" one was a drive that
>>> died, which was replaced by a new one, but btrfs wouldn't finish the
>>> deletion of the missing device.
>>>
>> By replaced, did you mean "btrfs replace"? Or just change the physical disk
>> without using "btrfs replace"?
>
> Here's what happened:
>
> - A drive started throwing bad sectors. Somehow this caused metadata
>   on other drives to get messed up.
> - I took that drive offline and mounted degraded (it's a 4-drive RAID1)
> - I did a "btrfs add" on a new drive and then a "btrfs delete missing"
> - The replacement drive failed during the replacement operation, and
>   everything went to crap.
> - With some help, I got a kernel patch that allowed me to mount the
>   original three drives with TWO missing devices.
> - I added a brand new drive and then did "delete missing" again. This
>   time, the first "delete missing" was successful, but it didn't fully
>   balance the drives, and there was another missing device, so I had to
>   do a "delete missing" again, and that failed.
>
Just for reference, I've found that it is usually safer to delete the
missing device first if possible, then add the new one and re-balance.
There seem to be some edge-cases in the code for deleting missing devices.
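
As a rough sketch of that ordering (delete the missing device, then add,
then balance), something like the script below. It is not a tested
recovery procedure. The mount point and device names are placeholders I
made up, not values from this thread, and the script only prints the
commands unless RUN=1 is set. Deleting first only works if the remaining
drives have enough free space to take the relocated chunks, which is
presumably the "if possible" part.

```shell
#!/bin/sh
# Sketch only: prints the commands by default; set RUN=1 to execute them.
# MNT, SURVIVOR and NEWDEV are placeholders, not values from this thread.
MNT=${MNT:-/mnt/pool}
SURVIVOR=${SURVIVOR:-/dev/sdb}   # any remaining member of the filesystem
NEWDEV=${NEWDEV:-/dev/sde}       # the replacement drive

# Print a command, and run it only when RUN=1.
step() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

replace_missing() {
    # 1. Mount the surviving devices in degraded mode.
    step mount -o degraded "$SURVIVOR" "$MNT" || return 1
    # 2. Delete the missing device first, while no new device is involved.
    step btrfs device delete missing "$MNT" || return 1
    # 3. Only then add the replacement drive.
    step btrfs device add "$NEWDEV" "$MNT" || return 1
    # 4. Re-balance so data is spread across all devices again.
    step btrfs balance start "$MNT" || return 1
}

replace_missing
```

Each step stops the sequence if the previous one fails, so a failed
"delete missing" never gets as far as adding the new drive.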