* btrfs-image gets stuck, using 100%, looping on bad file descriptor
From: Timothy Normand Miller @ 2015-08-18 15:40 UTC
To: Btrfs BTRFS

I've filed a bug report on this:

https://bugzilla.kernel.org/show_bug.cgi?id=103081

--
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project

* Re: btrfs-image gets stuck, using 100%, looping on bad file descriptor
From: Qu Wenruo @ 2015-08-19 1:32 UTC
To: Timothy Normand Miller, Btrfs BTRFS

Hi Timothy,

Although I have replied to the bugzilla, IMHO it's more appropriate to
discuss it on the mailing list, as it's not a kernel bug.

Thanks,
Qu

Timothy Normand Miller wrote on 2015/08/18 11:40 -0400:
> I've filed a bug report on this:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=103081
>

* Re: btrfs-image gets stuck, using 100%, looping on bad file descriptor
From: Timothy Normand Miller @ 2015-08-19 2:46 UTC
To: Qu Wenruo
Cc: Btrfs BTRFS

On Tue, Aug 18, 2015 at 9:32 PM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
> Hi Timothy,
>
> Although I have replied to the bugzilla, IMHO it's more appropriate to
> discuss it on the mailing list, as it's not a kernel bug.
>

All four devices were online.  The "missing" one was a drive that
died, which was replaced by a new one, but btrfs wouldn't finish the
deletion of the missing device.

--
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project

* Re: btrfs-image gets stuck, using 100%, looping on bad file descriptor
From: Qu Wenruo @ 2015-08-19 2:48 UTC
To: Timothy Normand Miller
Cc: Btrfs BTRFS

Timothy Normand Miller wrote on 2015/08/18 22:46 -0400:
> On Tue, Aug 18, 2015 at 9:32 PM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>> Hi Timothy,
>>
>> Although I have replied to the bugzilla, IMHO it's more appropriate to
>> discuss it on the mailing list, as it's not a kernel bug.
>>
>
> All four devices were online.  The "missing" one was a drive that
> died, which was replaced by a new one, but btrfs wouldn't finish the
> deletion of the missing device.
>
By replaced, did you mean "btrfs replace"? Or just change the physical disk
without using "btrfs replace"?

Thanks,
Qu

* Re: btrfs-image gets stuck, using 100%, looping on bad file descriptor
From: Timothy Normand Miller @ 2015-08-19 2:55 UTC
To: Qu Wenruo
Cc: Btrfs BTRFS

On Tue, Aug 18, 2015 at 10:48 PM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>
> Timothy Normand Miller wrote on 2015/08/18 22:46 -0400:
>>
>> On Tue, Aug 18, 2015 at 9:32 PM, Qu Wenruo <quwenruo@cn.fujitsu.com>
>> wrote:
>>>
>>> Hi Timothy,
>>>
>>> Although I have replied to the bugzilla, IMHO it's more appropriate to
>>> discuss it on the mailing list, as it's not a kernel bug.
>>>
>>
>> All four devices were online.  The "missing" one was a drive that
>> died, which was replaced by a new one, but btrfs wouldn't finish the
>> deletion of the missing device.
>>
> By replaced, did you mean "btrfs replace"? Or just change the physical disk
> without using "btrfs replace"?

Here's what happened:

- A drive started throwing bad sectors.  Somehow this caused metadata
  on other drives to get messed up.
- I took that drive offline and mounted degraded (it's a 4-drive RAID1)
- I did a "btrfs add" on a new drive and then a "btrfs delete missing"
- The replacement drive failed during the replacement operation, and
  everything went to crap.
- With some help, I got a kernel patch that allowed me to mount the
  original three drives with TWO missing devices.
- I added a brand new drive and then did "delete missing" again.  This
  time, the first "delete missing" was successful, but it didn't fully
  balance the drives, and there was another missing device, so I had to
  do a "delete missing" again, and that failed.

I wanted to get this back online and restored from a backup, but I was
willing to keep it this way if people wanted to probe at it, in case we
can uncover any btrfs bugs.  So it was suggested to get a metadata
image, but that ran into some kind of bug in btrfs-image.

Currently, I'm restoring from backup, but I have at least a partial
metadata dump.

--
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project

* Re: btrfs-image gets stuck, using 100%, looping on bad file descriptor
From: Qu Wenruo @ 2015-08-19 5:22 UTC
To: Timothy Normand Miller
Cc: Btrfs BTRFS

Timothy Normand Miller wrote on 2015/08/18 22:55 -0400:
> On Tue, Aug 18, 2015 at 10:48 PM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>>
>> Timothy Normand Miller wrote on 2015/08/18 22:46 -0400:
>>>
>>> On Tue, Aug 18, 2015 at 9:32 PM, Qu Wenruo <quwenruo@cn.fujitsu.com>
>>> wrote:
>>>>
>>>> Hi Timothy,
>>>>
>>>> Although I have replied to the bugzilla, IMHO it's more appropriate to
>>>> discuss it on the mailing list, as it's not a kernel bug.
>>>>
>>>
>>> All four devices were online.  The "missing" one was a drive that
>>> died, which was replaced by a new one, but btrfs wouldn't finish the
>>> deletion of the missing device.
>>>
>> By replaced, did you mean "btrfs replace"? Or just change the physical disk
>> without using "btrfs replace"?
>
> Here's what happened:
>
> - A drive started throwing bad sectors.  Somehow this caused metadata
> on other drives to get messed up.

Did that cause any huge damage?

> - I took that drive offline and mounted degraded (it's a 4-drive RAID1)
> - I did a "btrfs add" on a new drive and then a "btrfs delete missing"
> - The replacement drive failed during the replacement operation, and
> everything went to crap.
> - With some help, I got a kernel patch that allowed me to mount the
> original three drives with TWO missing devices.

So the original 3 drives are still OK, the original bad one is missing,
and the newly added one is also missing?

That sounds quite repairable.

> - I added a brand new drive and then did "delete missing" again.  This
> time, the first "delete missing" was successful, but it didn't fully
> balance the drives, and there was another missing device, so I had to
> do a "delete missing" again, and that failed.
>
> I wanted to get this back online and restored from a backup, but I was
> willing to keep it this way if people wanted to probe at it, in case we
> can uncover any btrfs bugs.  So it was suggested to get a metadata
> image, but that ran into some kind of bug in btrfs-image.

If btrfs-image doesn't work, you can also try btrfs-debug-tree.
IIRC, debug-tree should be more robust than btrfs-image.

BTW, have you tried btrfsck on it? Does it also cause the infinite loop?

I'll also try to reproduce it and investigate the code directly.

Thanks,
Qu

>
> Currently, I'm restoring from backup, but I have at least a partial
> metadata dump.
>

* Re: btrfs-image gets stuck, using 100%, looping on bad file descriptor
From: Timothy Normand Miller @ 2015-08-19 16:18 UTC
To: Qu Wenruo
Cc: Btrfs BTRFS

On Wed, Aug 19, 2015 at 1:22 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>
> Timothy Normand Miller wrote on 2015/08/18 22:55 -0400:
>>
>> On Tue, Aug 18, 2015 at 10:48 PM, Qu Wenruo <quwenruo@cn.fujitsu.com>
>> wrote:
>>>
>>> Timothy Normand Miller wrote on 2015/08/18 22:46 -0400:
>>>>
>>>> On Tue, Aug 18, 2015 at 9:32 PM, Qu Wenruo <quwenruo@cn.fujitsu.com>
>>>> wrote:
>>>>>
>>>>> Hi Timothy,
>>>>>
>>>>> Although I have replied to the bugzilla, IMHO it's more appropriate to
>>>>> discuss it on the mailing list, as it's not a kernel bug.
>>>>>
>>>>
>>>> All four devices were online.  The "missing" one was a drive that
>>>> died, which was replaced by a new one, but btrfs wouldn't finish the
>>>> deletion of the missing device.
>>>>
>>> By replaced, did you mean "btrfs replace"? Or just change the physical
>>> disk
>>> without using "btrfs replace"?
>>
>> Here's what happened:
>>
>> - A drive started throwing bad sectors.  Somehow this caused metadata
>> on other drives to get messed up.
>
> Did that cause any huge damage?

It seems that metadata was damaged on all drives.

>
>> - I took that drive offline and mounted degraded (it's a 4-drive RAID1)
>> - I did a "btrfs add" on a new drive and then a "btrfs delete missing"
>> - The replacement drive failed during the replacement operation, and
>> everything went to crap.
>> - With some help, I got a kernel patch that allowed me to mount the
>> original three drives with TWO missing devices.
>
> So the original 3 drives are still OK, the original bad one is missing,
> and the newly added one is also missing?
>
> That sounds quite repairable.

Nothing I tried would run to completion.  There were always errors.

>
>> - I added a brand new drive and then did "delete missing" again.  This
>> time, the first "delete missing" was successful, but it didn't fully
>> balance the drives, and there was another missing device, so I had to
>> do a "delete missing" again, and that failed.
>>
>> I wanted to get this back online and restored from a backup, but I was
>> willing to keep it this way if people wanted to probe at it, in case we
>> can uncover any btrfs bugs.  So it was suggested to get a metadata
>> image, but that ran into some kind of bug in btrfs-image.
>
> If btrfs-image doesn't work, you can also try btrfs-debug-tree.
> IIRC, debug-tree should be more robust than btrfs-image.
>
> BTW, have you tried btrfsck on it? Does it also cause the infinite loop?
>
> I'll also try to reproduce it and investigate the code directly.

Well, I had to get things back online, so I've restored from backup.
I do have what limited metadata image I could get from btrfs-image.

>
> Thanks,
> Qu
>
>>
>> Currently, I'm restoring from backup, but I have at least a partial
>> metadata dump.
>>
>

--
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project

* Re: btrfs-image gets stuck, using 100%, looping on bad file descriptor
From: Austin S Hemmelgarn @ 2015-08-20 11:38 UTC
To: Timothy Normand Miller, Qu Wenruo
Cc: Btrfs BTRFS

On 2015-08-18 22:55, Timothy Normand Miller wrote:
> On Tue, Aug 18, 2015 at 10:48 PM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>>
>> Timothy Normand Miller wrote on 2015/08/18 22:46 -0400:
>>>
>>> On Tue, Aug 18, 2015 at 9:32 PM, Qu Wenruo <quwenruo@cn.fujitsu.com>
>>> wrote:
>>>>
>>>> Hi Timothy,
>>>>
>>>> Although I have replied to the bugzilla, IMHO it's more appropriate to
>>>> discuss it on the mailing list, as it's not a kernel bug.
>>>>
>>>
>>> All four devices were online.  The "missing" one was a drive that
>>> died, which was replaced by a new one, but btrfs wouldn't finish the
>>> deletion of the missing device.
>>>
>> By replaced, did you mean "btrfs replace"? Or just change the physical disk
>> without using "btrfs replace"?
>
> Here's what happened:
>
> - A drive started throwing bad sectors.  Somehow this caused metadata
> on other drives to get messed up.
> - I took that drive offline and mounted degraded (it's a 4-drive RAID1)
> - I did a "btrfs add" on a new drive and then a "btrfs delete missing"
> - The replacement drive failed during the replacement operation, and
> everything went to crap.
> - With some help, I got a kernel patch that allowed me to mount the
> original three drives with TWO missing devices.
> - I added a brand new drive and then did "delete missing" again.  This
> time, the first "delete missing" was successful, but it didn't fully
> balance the drives, and there was another missing device, so I had to
> do a "delete missing" again, and that failed.
>
Just for reference, I've found that it is usually safer to delete the
missing device first if possible, then add the new one and re-balance.
There seem to be some edge-cases in the code for deleting missing devices.

* Re: btrfs-image gets stuck, using 100%, looping on bad file descriptor
From: Timothy Normand Miller @ 2015-08-20 13:08 UTC
To: Austin S Hemmelgarn
Cc: Qu Wenruo, Btrfs BTRFS

On Thu, Aug 20, 2015 at 7:38 AM, Austin S Hemmelgarn
<ahferroin7@gmail.com> wrote:

> Just for reference, I've found that it is usually safer to delete the
> missing device first if possible, then add the new one and re-balance. There
> seem to be some edge-cases in the code for deleting missing devices.
>

The problem is that you can't do that if there's not enough space on
the remaining devices to hold all the data.

--
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project

* Re: btrfs-image gets stuck, using 100%, looping on bad file descriptor
From: Austin S Hemmelgarn @ 2015-08-20 13:12 UTC
To: Timothy Normand Miller
Cc: Qu Wenruo, Btrfs BTRFS

On 2015-08-20 09:08, Timothy Normand Miller wrote:
> On Thu, Aug 20, 2015 at 7:38 AM, Austin S Hemmelgarn
> <ahferroin7@gmail.com> wrote:
>
>> Just for reference, I've found that it is usually safer to delete the
>> missing device first if possible, then add the new one and re-balance. There
>> seem to be some edge-cases in the code for deleting missing devices.
>>
>
> The problem is that you can't do that if there's not enough space on
> the remaining devices to hold all the data.
>
>
Good point, I often forget that not everyone over-provisions their
storage like I do.

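[Editorial sketch] The two orderings discussed in the messages above can be
written out as command sequences. This is a minimal sketch, not a tested
recovery procedure. It assumes the thread's shorthand "btrfs add" and
"btrfs delete missing" refers to `btrfs device add` and
`btrfs device delete missing`; the function name, the device paths, and the
mount point are hypothetical placeholders. The function only builds the
command lists and runs nothing.

```python
def replacement_plan(mount_point, surviving_device, new_device, delete_first):
    """Build (but do not run) the commands for replacing a missing device.

    All paths are placeholders supplied by the caller.

    delete_first=True is the ordering Austin describes (delete the missing
    device, then add the new one and re-balance). As Timothy points out, it
    only works if the remaining devices can hold all the data.
    delete_first=False is the add-then-delete ordering from Timothy's report.
    """
    mount = ["mount", "-o", "degraded", surviving_device, mount_point]
    delete = ["btrfs", "device", "delete", "missing", mount_point]
    add = ["btrfs", "device", "add", new_device, mount_point]
    balance = ["btrfs", "balance", "start", mount_point]

    if delete_first:
        return [mount, delete, add, balance]
    return [mount, add, delete]


# Print the delete-first ordering for inspection; nothing is executed.
for command in replacement_plan("/mnt/pool", "/dev/sdb", "/dev/sde",
                                delete_first=True):
    print(" ".join(command))
```

Returning the commands as lists rather than running them keeps the sketch
safe to read and compare; anyone adapting it should check each command
against the btrfs-progs version they actually have installed.
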
* Re: btrfs-image gets stuck, using 100%, looping on bad file descriptor
From: Qu Wenruo @ 2015-08-21 1:32 UTC
To: Timothy Normand Miller, Btrfs BTRFS

Succeeded in reproducing the bug.

Any missing device will cause btrfs-image to loop infinitely.

It should be easy to fix.
I'll CC you when the patch is out.

Thanks,
Qu

Timothy Normand Miller wrote on 2015/08/18 11:40 -0400:
> I've filed a bug report on this:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=103081
>

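[Editorial sketch] The symptom in the subject line (100% CPU, looping on a
bad file descriptor) is what a retry loop produces when it treats a permanent
error as a transient one. The following is an illustration only and is not
taken from the btrfs-image source. The function names, the use of -1 to stand
in for the descriptor of a missing device, and the attempt cap (added so the
demonstration terminates) are all assumptions.

```python
import errno
import os


def read_with_blind_retry(fd, length, max_attempts):
    """Retry every failed read; return (data, attempts).

    data is None if max_attempts was reached before the read completed.
    The cap exists only so this demonstration terminates.
    """
    data = b""
    attempts = 0
    while len(data) < length:
        if attempts == max_attempts:
            return None, attempts
        attempts += 1
        try:
            chunk = os.read(fd, length - len(data))
        except OSError:
            # Bug: EBADF never clears, so without the cap this spins forever.
            continue
        if not chunk:
            break  # end of file
        data += chunk
    return data, attempts


def read_failing_fast(fd, length):
    """Same loop, but a read error is propagated to the caller."""
    data = b""
    while len(data) < length:
        chunk = os.read(fd, length - len(data))  # raises OSError on EBADF
        if not chunk:
            break  # end of file
        data += chunk
    return data


MISSING_FD = -1  # assumption: stands in for a device that was never opened

data, attempts = read_with_blind_retry(MISSING_FD, 16, max_attempts=1000)
print("blind retry gave up after", attempts, "attempts; data =", data)

try:
    read_failing_fast(MISSING_FD, 16)
except OSError as exc:
    print("fail-fast read stopped with EBADF:", exc.errno == errno.EBADF)
```

The second function shows one plausible shape for a fix: let the error reach
the caller so the tool can report the missing device and stop, instead of
retrying a read that can never succeed.
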