* bad extent [5993525264384, 5993525280768), type mismatch with chunk
@ 2015-11-12 21:51 Christoph Anton Mitterer
  2015-11-12 22:23 ` Christoph Anton Mitterer
                   ` (3 more replies)
  0 siblings, 4 replies; 55+ messages in thread
From: Christoph Anton Mitterer @ 2015-11-12 21:51 UTC (permalink / raw)
  To: linux-btrfs@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 1167 bytes --]

Hey.

I get these errors on fsck'ing a btrfs:
bad extent [5993525264384, 5993525280768), type mismatch with chunk
bad extent [5993525280768, 5993525297152), type mismatch with chunk
bad extent [5993525297152, 5993525313536), type mismatch with chunk
bad extent [5993529442304, 5993529458688), type mismatch with chunk
bad extent [5993529458688, 5993529475072), type mismatch with chunk
bad extent [5993530015744, 5993530032128), type mismatch with chunk
bad extent [5993530359808, 5993530376192), type mismatch with chunk
bad extent [5993530376192, 5993530392576), type mismatch with chunk
bad extent [5993530392576, 5993530408960), type mismatch with chunk
bad extent [5993530408960, 5993530425344), type mismatch with chunk
bad extent [5993531260928, 5993531277312), type mismatch with chunk
bad extent [5993531310080, 5993531326464), type mismatch with chunk

What do they mean? And how to correct it without data loss (cause this
would be critical/precious data)?

Oddly, I've fsck'ed the very same fs last time I've unmounted it (with
no errors)... and now this.
The only difference would be newer kernel and btrfsprogs.


Thanks,
Chris.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5313 bytes --]

^ permalink raw reply	[flat|nested] 55+ messages in thread
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk
  2015-11-12 21:51 bad extent [5993525264384, 5993525280768), type mismatch with chunk Christoph Anton Mitterer
@ 2015-11-12 22:23 ` Christoph Anton Mitterer
  2015-11-13  2:13 ` Qu Wenruo
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 55+ messages in thread
From: Christoph Anton Mitterer @ 2015-11-12 22:23 UTC (permalink / raw)
  To: linux-btrfs@vger.kernel.org

I've uploaded the full output of btrfs check on that device to:
http://christoph.anton.mitterer.name/tmp/public/cbec6446-898b-11e5-90a4-502690aa641f.xz

There are nearly 600k of these error lines... WTF?!
Also, the filesystem still mounts (without any errors to dmesg).

Any help would be appreciated,
thx,
Chris.
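With roughly 600k lines of output, a summary is easier to reason about than the raw list. The following is a minimal sketch in Python, assuming the output of btrfs check has been saved as text. The function name and the three sample lines are illustrative only, and the regular expression is modelled on the messages quoted in this thread rather than taken from btrfs-progs.

```python
import re
from collections import Counter

# Matches lines such as:
#   bad extent [5993525264384, 5993525280768), type mismatch with chunk
BAD_EXTENT = re.compile(
    r"bad extent \[(\d+), (\d+)\), type mismatch with chunk"
)


def summarize_bad_extents(lines):
    """Return (count, Counter of extent lengths, lowest start, highest end)."""
    sizes = Counter()
    lowest = highest = None
    count = 0
    for line in lines:
        match = BAD_EXTENT.search(line)
        if not match:
            continue  # ignore every other line of the check output
        start, end = int(match.group(1)), int(match.group(2))
        count += 1
        sizes[end - start] += 1
        lowest = start if lowest is None else min(lowest, start)
        highest = end if highest is None else max(highest, end)
    return count, sizes, lowest, highest


# Three of the lines quoted in the first message of the thread.
sample = [
    "bad extent [5993525264384, 5993525280768), type mismatch with chunk",
    "bad extent [5993525280768, 5993525297152), type mismatch with chunk",
    "bad extent [5993531310080, 5993531326464), type mismatch with chunk",
]
count, sizes, lowest, highest = summarize_bad_extents(sample)
print(count, dict(sizes), lowest, highest)
```

For a saved log, passing an open file object instead of the sample list works the same way, since the function only iterates over lines. Every extent quoted in the first message is 16384 bytes long; whether that holds for the full output is something such a summary would show.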
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk
  2015-11-12 21:51 bad extent [5993525264384, 5993525280768), type mismatch with chunk Christoph Anton Mitterer
  2015-11-12 22:23 ` Christoph Anton Mitterer
@ 2015-11-13  2:13 ` Qu Wenruo
  2015-11-13  2:26   ` Christoph Anton Mitterer
  2015-11-14  1:22 ` Qu Wenruo
  2016-02-16  0:14 ` Ángel González
  3 siblings, 1 reply; 55+ messages in thread
From: Qu Wenruo @ 2015-11-13  2:13 UTC (permalink / raw)
  To: Christoph Anton Mitterer, linux-btrfs@vger.kernel.org

The latest fsck added a lot of stricter checks.

This is one of them: if an extent's type doesn't match the type of its
chunk, for example a metadata extent in a data chunk, btrfsck reports it
like that.

The filesystem seems to be one converted from ext*.
If so, there is no 100% reliable method to recover it, as btrfs-convert
has been broken for some time and causes the problem above.

But some users, like Roman Mamedov on the mailing list, said a balance
operation can solve it.
It's worth trying, but it may also trigger unknown bugs.

Thanks,
Qu

Christoph Anton Mitterer wrote on 2015/11/12 22:51 +0100:
> Hey.
>
> I get these errors on fsck'ing a btrfs:
> bad extent [5993525264384, 5993525280768), type mismatch with chunk
> bad extent [5993525280768, 5993525297152), type mismatch with chunk
> bad extent [5993525297152, 5993525313536), type mismatch with chunk
> bad extent [5993529442304, 5993529458688), type mismatch with chunk
> bad extent [5993529458688, 5993529475072), type mismatch with chunk
> bad extent [5993530015744, 5993530032128), type mismatch with chunk
> bad extent [5993530359808, 5993530376192), type mismatch with chunk
> bad extent [5993530376192, 5993530392576), type mismatch with chunk
> bad extent [5993530392576, 5993530408960), type mismatch with chunk
> bad extent [5993530408960, 5993530425344), type mismatch with chunk
> bad extent [5993531260928, 5993531277312), type mismatch with chunk
> bad extent [5993531310080, 5993531326464), type mismatch with chunk
>
> What do they mean? And how to correct it without data loss (cause this
> would be critical/precious data)?
>
> Oddly, I've fsck'ed the very same fs last time I've unmounted it (with
> no errors)... and now this.
> The only difference would be newer kernel and btrfsprogs.
>
>
> Thanks,
> Chris.
>
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2015-11-13 2:13 ` Qu Wenruo @ 2015-11-13 2:26 ` Christoph Anton Mitterer 2015-11-13 2:52 ` Qu Wenruo 0 siblings, 1 reply; 55+ messages in thread From: Christoph Anton Mitterer @ 2015-11-13 2:26 UTC (permalink / raw) To: Qu Wenruo, linux-btrfs@vger.kernel.org [-- Attachment #1: Type: text/plain, Size: 1664 bytes --] Hey. On Fri, 2015-11-13 at 10:13 +0800, Qu Wenruo wrote: > Like this one, if any extent type doesn't match with its chunk, like > metadata extent in a data chunk, btrfsck will report like that. So these errors... are they anything serious? I.e. like data loss/corruption? Or is it more a "cosmetic" issue? And would there be a way for btrfs check to repair that thing? And is there a way to find out to which file these extents belong? > The filesystem seems to be a converted one from ext*. No,it was not... any other reasons that could cause this? It's actually a very plain btrfs... no RAID, no qgroups,... the only thing I really did was creating snapshots and incrementally send'ing them to other btrfs (i.e. the backups). I'd have expected that btrfs is more or less table in these cases. > But some user, like Roman Mamedov in the maillist, said a balance > operation can solve it. > It's worthy trying but it may also cause unknown bugs. So what's the safest way? Copying off all data and creating the fs from scratch? If so, is there a (safe) way to copy a fully fs with the snapshots? But as I've said, this wasn't an ext converted fs, so in case I do this or the balance we probably loose any chances to further debug. And is there any way to tell more certain, whether the balance would help or whether I'd just get more possibly even hidden corruptions? I mean right... well it would be painful to recover from the most recent backup, but not extremely painful. Right now I'm doing a full read-only scrub, which will take a while as it's a nearly full 8TB HDD, so far no errors. Thanks, Chris. 
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk
  2015-11-13  2:26 ` Christoph Anton Mitterer
@ 2015-11-13  2:52 ` Qu Wenruo
  2015-11-13  3:03 ` Christoph Anton Mitterer
                   ` (2 more replies)
  0 siblings, 3 replies; 55+ messages in thread
From: Qu Wenruo @ 2015-11-13  2:52 UTC (permalink / raw)
  To: Christoph Anton Mitterer, linux-btrfs@vger.kernel.org

Christoph Anton Mitterer wrote on 2015/11/13 03:26 +0100:
> Hey.
>
> On Fri, 2015-11-13 at 10:13 +0800, Qu Wenruo wrote:
>> Like this one, if any extent type doesn't match with its chunk, like
>> metadata extent in a data chunk, btrfsck will report like that.
> So these errors... are they anything serious? I.e. like data
> loss/corruption? Or is it more a "cosmetic" issue?
>
> And would there be a way for btrfs check to repair that thing?
>
> And is there a way to find out to which file these extents belong?

It can be done with btrfs-debug-tree, but that is never user-friendly...
I'm not sure there is a good tool for it.

>
>
>> The filesystem seems to be a converted one from ext*.
> No,it was not... any other reasons that could cause this?
>
> It's actually a very plain btrfs... no RAID, no qgroups,... the only
> thing I really did was creating snapshots and incrementally send'ing
> them to other btrfs (i.e. the backups).
> I'd have expected that btrfs is more or less table in these cases.
>
So there is some hidden bug...

>
>> But some user, like Roman Mamedov in the maillist, said a balance
>> operation can solve it.
>> It's worthy trying but it may also cause unknown bugs.
> So what's the safest way? Copying off all data and creating the fs from
> scratch?
> If so, is there a (safe) way to copy a fully fs with the snapshots?

I'm not an experienced btrfs user, so I have no good advice here.

>
> But as I've said, this wasn't an ext converted fs, so in case I do this
> or the balance we probably loose any chances to further debug.

You can provide the output of "btrfs-debug-tree -t 2 <dev>" to help
debug this further.
It will be quite big, so it's better to compress it.

It may not help a lot, but at least I can tell you which file extents
are affected. (The hard way: I can only tell you which inode in which
subvolume is affected, all in numeric form.)

And I would get enough info to determine what the wrong type is
(a data extent in a metadata chunk or vice versa, or whether even a
system chunk is involved).

>
> And is there any way to tell more certain, whether the balance would
> help or whether I'd just get more possibly even hidden corruptions? I
> mean right... well it would be painful to recover from the most recent
> backup, but not extremely painful.

When extent and chunk types get mixed up, only god knows what will
happen...
So I have no useful help here.

>
>
> Right now I'm doing a full read-only scrub, which will take a while as
> it's a nearly full 8TB HDD, so far no errors.

If the type mismatch errors are the only errors fsck reports, then scrub
should not help or report anything useful.

And lastly, what are the kernel and btrfs-progs versions?

Thanks,
Qu

>
>
> Thanks,
> Chris.
>
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2015-11-13 2:52 ` Qu Wenruo @ 2015-11-13 3:03 ` Christoph Anton Mitterer 2015-11-13 3:23 ` Qu Wenruo 2015-11-13 3:57 ` Christoph Anton Mitterer [not found] ` <564F48FE.4000400@laposte.net> 2 siblings, 1 reply; 55+ messages in thread From: Christoph Anton Mitterer @ 2015-11-13 3:03 UTC (permalink / raw) To: Qu Wenruo, linux-btrfs@vger.kernel.org [-- Attachment #1: Type: text/plain, Size: 1549 bytes --] On Fri, 2015-11-13 at 10:52 +0800, Qu Wenruo wrote: > You can provide the output of "btrfs-debug-tree -t 2 <dev>" to help > further debug. > It would be quite big, so it's better to zip it. That would contain all the filenames, right? Hmm that could be problematic because of privacy issues... > Although it may not help a lot, but at least I can tell you which > file > extents are affected. (By the hard way, I can only tell you which > inode > in which subvolume is affected, all in numeric form) > > And I could get enough info to determine what's the wrong type. > (data extent in metadata chunk or vice verse, or even system chunk is > involved) sigh... I mean... how can that happen, if nothing of the more recent things is used... I'd have guess others would have noted such a bug before. > > And is there any way to tell more certain, whether the balance > > would > > help or whether I'd just get more possibly even hidden corruptions? > > I > > mean right... well it would be painful to recover from the most > > recent > > backup, but not extremely painful. > > When extent and chunk type get wrong, only god knows what will > happen... > So no useful help here. If btrfs check already notices the mismatch, shouldn't it then be possible to set the correct type? > If the type mismatch errors are the only error output from fsck, then > scrub should not help or report anything useful. I see... > And at last, what's the kernel and btrfs-progs version? 
kernel 4.2.6 btrfsprogs 4.3 [-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 5313 bytes --] ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2015-11-13 3:03 ` Christoph Anton Mitterer @ 2015-11-13 3:23 ` Qu Wenruo 2015-11-13 3:31 ` Christoph Anton Mitterer 2015-11-13 3:44 ` Christoph Anton Mitterer 0 siblings, 2 replies; 55+ messages in thread From: Qu Wenruo @ 2015-11-13 3:23 UTC (permalink / raw) To: Christoph Anton Mitterer, linux-btrfs@vger.kernel.org Christoph Anton Mitterer wrote on 2015/11/13 04:03 +0100: > On Fri, 2015-11-13 at 10:52 +0800, Qu Wenruo wrote: >> You can provide the output of "btrfs-debug-tree -t 2 <dev>" to help >> further debug. >> It would be quite big, so it's better to zip it. > That would contain all the filenames, right? Hmm that could be > problematic because of privacy issues... No, "-t 2" means only dump extent tree, no privacy issues at all. Since only numeric inode/snapshot number and offset inside file. Or I'll give you a warning on privacy. No file name at all, just try it yourself. > > >> Although it may not help a lot, but at least I can tell you which >> file >> extents are affected. (By the hard way, I can only tell you which >> inode >> in which subvolume is affected, all in numeric form) >> >> And I could get enough info to determine what's the wrong type. >> (data extent in metadata chunk or vice verse, or even system chunk is >> involved) > sigh... I mean... how can that happen, if nothing of the more recent > things is used... I'd have guess others would have noted such a bug > before. It may happened in older kernels, just recent btrfsck can detect the problem. > > >>> And is there any way to tell more certain, whether the balance >>> would >>> help or whether I'd just get more possibly even hidden corruptions? >>> I >>> mean right... well it would be painful to recover from the most >>> recent >>> backup, but not extremely painful. >> >> When extent and chunk type get wrong, only god knows what will >> happen... >> So no useful help here. 
> If btrfs check already notices the mismatch, shouldn't it then be > possible to set the correct type? Not possible yet. Since there is not relocation facility in btrfs-progs to do the fix. > > >> If the type mismatch errors are the only error output from fsck, then >> scrub should not help or report anything useful. > I see... > > >> And at last, what's the kernel and btrfs-progs version? > kernel 4.2.6 > btrfsprogs 4.3 > New enough, that's the only good news yet... Thanks, Qu ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2015-11-13 3:23 ` Qu Wenruo @ 2015-11-13 3:31 ` Christoph Anton Mitterer 2015-11-13 3:44 ` Christoph Anton Mitterer 1 sibling, 0 replies; 55+ messages in thread From: Christoph Anton Mitterer @ 2015-11-13 3:31 UTC (permalink / raw) To: Qu Wenruo, linux-btrfs@vger.kernel.org [-- Attachment #1: Type: text/plain, Size: 842 bytes --] On Fri, 2015-11-13 at 11:23 +0800, Qu Wenruo wrote: > No, "-t 2" means only dump extent tree, no privacy issues at all. > Since only numeric inode/snapshot number and offset inside file. > Or I'll give you a warning on privacy. > > No file name at all, just try it yourself. I'm preparing it... > It may happened in older kernels, just recent btrfsck can detect the > problem. Okay, and how do "we" find the cause? > > kernel 4.2.6 > > btrfsprogs 4.3 > > > New enough, that's the only good news yet... Well but I did use the same fs with older kernels... so it may have happened much earlier... :-( I do have a SHA512 checksum on every file... stored in the XATTRS... So as long as the file hierarchy would still be complete I could at least check whether all files are still really valid. Cheers, Chris. [-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 5313 bytes --] ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2015-11-13 3:23 ` Qu Wenruo 2015-11-13 3:31 ` Christoph Anton Mitterer @ 2015-11-13 3:44 ` Christoph Anton Mitterer 1 sibling, 0 replies; 55+ messages in thread From: Christoph Anton Mitterer @ 2015-11-13 3:44 UTC (permalink / raw) To: Qu Wenruo, linux-btrfs@vger.kernel.org [-- Attachment #1: Type: text/plain, Size: 595 bytes --] On Fri, 2015-11-13 at 11:23 +0800, Qu Wenruo wrote: > No, "-t 2" means only dump extent tree, no privacy issues at all. > Since only numeric inode/snapshot number and offset inside file. > Or I'll give you a warning on privacy. Done... http://christoph.anton.mitterer.name/tmp/public/856fc21a-89b8-11e5-abaf-502690aa641f.xz I tried to figure out which kernel I've started with when I created the fs,... that was around the 1th of March 2015... but I've only had the Debian unstable kernel form that time (not the current vanilla)... and I guess that was 3.16. Thanks, Chris. [-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 5313 bytes --] ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2015-11-13 2:52 ` Qu Wenruo 2015-11-13 3:03 ` Christoph Anton Mitterer @ 2015-11-13 3:57 ` Christoph Anton Mitterer 2015-11-13 7:05 ` Duncan [not found] ` <564F48FE.4000400@laposte.net> 2 siblings, 1 reply; 55+ messages in thread From: Christoph Anton Mitterer @ 2015-11-13 3:57 UTC (permalink / raw) To: Qu Wenruo, linux-btrfs@vger.kernel.org [-- Attachment #1: Type: text/plain, Size: 686 bytes --] And I should perhaps mention one more thing: As I've said I have these two 8TiB disks... one which is basically the master with loads of precious data, the other being a backup from the master, regularly created with incremental btrfs send/receive. Everytime I did this (which is every two months or so), I also do a complete manual diff of all new/changed files (between the two devices). And when I first filled the master in March, where I copied the data from some other ext4 disks, I did so as well. And there were never differences or missing files. That's why I kinda wonder if the whole thing is in anyway critical or an "issue" at all. Cheers, Chris. [-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 5313 bytes --] ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2015-11-13 3:57 ` Christoph Anton Mitterer @ 2015-11-13 7:05 ` Duncan 2015-11-13 9:55 ` Christoph Anton Mitterer 0 siblings, 1 reply; 55+ messages in thread From: Duncan @ 2015-11-13 7:05 UTC (permalink / raw) To: linux-btrfs Christoph Anton Mitterer posted on Fri, 13 Nov 2015 04:57:29 +0100 as excerpted: > As I've said I have these two 8TiB disks... one which is basically the > master with loads of precious data, the other being a backup from the > master, regularly created with incremental btrfs send/receive. 8 TiB disks -- are those the disk-managed SMR "archive" disks I've read about on a number of threads? If so, that hardware is almost certainly the cause, as they're known to be problematic on current kernels. While most filesystems (all?) will apparently go corrupt on them, it can remain invisible corruption for quite some time on many of them, but btrfs with its checksums and etc will tend to show up the problems far sooner, and there have been at least 2-3 threads on the problem already, on this list. As I don't have any of those disks here I've been following the threads from a bit of a distance and haven't kept up with the full details, but I do know there's an active bugzilla.kernel.org bug open on the problem, and that the problem was first exposed by a commit that was supposed to help /support/ this sort of drive. Rolling it back does seem to help. I don't recall what kernel that commit landed in, but it was definitely after 3.16, so if that's what you were originally running, the problem wouldn't have shown up right away, until you upgraded to a kernel with the offending commit. For more than that, you'll either need to wait until someone following a bit closer (possibly affected by the problem) posts a better followup, or find the other threads in the list archive, that discuss the issue. 
You shouldn't have to go back more than a couple weeks or so, and it's very likely in the last week, so almost certainly in this month's posts, if you're checking a list archive on the web. -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk
  2015-11-13  7:05 ` Duncan
@ 2015-11-13  9:55 ` Christoph Anton Mitterer
  2015-11-13 11:37 ` Christoph Anton Mitterer
  0 siblings, 1 reply; 55+ messages in thread
From: Christoph Anton Mitterer @ 2015-11-13  9:55 UTC (permalink / raw)
  To: Duncan, linux-btrfs; +Cc: Qu Wenruo

On Fri, 2015-11-13 at 07:05 +0000, Duncan wrote:
> 8 TiB disks -- are those the disk-managed SMR "archive" disks I've
> read
> about on a number of threads?
Yes... but...

> If so, that hardware is almost certainly the cause, as they're known
> to
> be problematic on current kernels. While most filesystems (all?)
> will
> apparently go corrupt on them, it can remain invisible corruption for
> quite some time on many of them, but btrfs with its checksums and etc
> will tend to show up the problems far sooner, and there have been at
> least 2-3 threads on the problem already, on this list.
I think it's pretty unlikely that this is the reason.

- I never saw any errors popping up from the lower driver levels (i.e.
  the ATA errors all those people saw), and I've checked regularly.
- I always did the checksum verification based on my own hashes stored
  in each file's XATTRs, without any error so far.
- I wrote far more data (the device is nearly full) without any error
  (XATTRs/hashes) than most of these people had written when they
  noticed severe corruption, which seemed to happen already after some
  GB.
- I think it's pretty unlikely that all data (in terms of hashsums)
  would be okay, and that these corruptions would have appeared only in
  some of btrfs' meta-data.

I'm not sure why I don't suffer from these issues, probably because I
run the disks only via USB/SATA bridges, which, while they're USB3.0,
are probably too slow for these errors to pop up.
See my comment:
https://bugzilla.kernel.org/show_bug.cgi?id=93581#c70


Further, a small status update:
As mentioned, last night I kicked off a full run of verifying all of my
XATTR-stored hashsums... (and these hashsums are basically all computed
when the data is known to be valid, e.g. for DSLR pictures straight off
the SD, etc.).
In terms of number of files, that run is half through,... so far with
only a handful of complaints, all about files for which I've apparently
forgotten to set the sums.
No verification errors (so far).

So unless btrfs completely lost file entries (and I guess that wouldn't
just affect the extent tree?), in which case I wouldn't verify these
files at all or notice them missing,... everything seems fine, as far as
I can tell (so far).


I'm just about to head off to get my backup disk, which I don't keep at
home... to see whether it also shows these fsck errors.

So stay tuned.

Cheers,
Chris.
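The kind of verification run described above can be sketched in Python. This is not the script used in the thread, which is never shown: the attribute name user.sha512, the hex encoding of the stored value, and all function names are assumptions made for illustration. The demo at the end replaces the real extended-attribute lookup with an in-memory table so that it runs on any filesystem.

```python
import hashlib
import os
import tempfile

# Hypothetical attribute name; the thread does not say which one is used.
XATTR_NAME = "user.sha512"


def sha512_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha512()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, read_xattr=None):
    """Return 'ok', 'mismatch', or 'missing' (no stored checksum)."""
    if read_xattr is None:
        read_xattr = os.getxattr  # available on Linux only
    try:
        stored = read_xattr(path, XATTR_NAME)
    except OSError:
        return "missing"
    # Assumption: the checksum is stored as an ASCII hex string.
    stored = stored.decode("ascii", "replace").strip().lower()
    return "ok" if stored == sha512_of(path) else "mismatch"


def verify_tree(root, read_xattr=None):
    """Walk a directory tree and sort regular files by verification result."""
    results = {"ok": [], "mismatch": [], "missing": []}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            results[verify_file(path, read_xattr)].append(path)
    return results


# Demo with an in-memory stand-in for the extended attributes.
with tempfile.TemporaryDirectory() as root:
    good = os.path.join(root, "good.raw")
    bad = os.path.join(root, "bad.raw")
    untagged = os.path.join(root, "untagged.raw")
    for path in (good, bad, untagged):
        with open(path, "wb") as fh:
            fh.write(b"picture data")
    fake_xattrs = {
        good: hashlib.sha512(b"picture data").hexdigest().encode(),
        bad: hashlib.sha512(b"something else").hexdigest().encode(),
    }

    def fake_read_xattr(path, name):
        try:
            return fake_xattrs[path]
        except KeyError:
            raise OSError("no such attribute") from None

    demo = verify_tree(root, fake_read_xattr)
    summary = {
        state: [os.path.basename(p) for p in paths]
        for state, paths in demo.items()
    }
    print(summary)
```

Keeping "missing" separate from "mismatch" mirrors the distinction made in the message: files without a stored sum are a bookkeeping gap, not evidence of corruption. As noted there, such a run cannot detect files whose directory entries have vanished entirely.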
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2015-11-13 9:55 ` Christoph Anton Mitterer @ 2015-11-13 11:37 ` Christoph Anton Mitterer 0 siblings, 0 replies; 55+ messages in thread From: Christoph Anton Mitterer @ 2015-11-13 11:37 UTC (permalink / raw) To: Duncan, linux-btrfs; +Cc: Qu Wenruo [-- Attachment #1: Type: text/plain, Size: 583 bytes --] I just got the backup disk back, also btrfs, which was made via send/receive... It has the same errors during fsck. The main disk still hasn't found any file (apart from a few, others for which none of my hash sums were stored at all) that doesn't verify. So I guess there's definitely some bug in btrfs, that even propagated via send/receive or was so common that it happens on another fs as well; even though it's unclear whether this was just in older versions. Perhaps Qu can shed some light on this, when he had a look at my extent tree dump. Cheers, Chris. [-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 5313 bytes --] ^ permalink raw reply [flat|nested] 55+ messages in thread
[parent not found: <564F48FE.4000400@laposte.net>]
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk [not found] ` <564F48FE.4000400@laposte.net> @ 2015-11-20 19:24 ` Christoph Anton Mitterer 2015-11-21 0:47 ` Qu Wenruo 0 siblings, 1 reply; 55+ messages in thread From: Christoph Anton Mitterer @ 2015-11-20 19:24 UTC (permalink / raw) To: Laurent Bonnaud, quwenruo, linux-btrfs@vger.kernel.org [-- Attachment #1: Type: text/plain, Size: 960 bytes --] On Fri, 2015-11-20 at 17:23 +0100, Laurent Bonnaud wrote: > So here is the output of "btrfs-debug-tree -t 2 <dev>" in case it may Gosh... 15M via mail?! o.O Anyway an update from my side... I've copied all data from the fs in question to a new btrfs,... done under Linux 4.2.6 and btrfs-progs v4.3. No data was lost or anyhow corrupted (also shown by extensive tests using my XATTR hashsums and other backups I've had). On the new fs, btrfs doesn't report that error anymore. No snapshots/etc made so far on the new fs. Qu, have you had a look at btrfs check already? And you've explained that the fs was okay and only the check was wrong... but since the false positive errors don't appear on my copied fs, does that really mean that the skinny metadata changed so much, or was anything changed in the kernel that the same didn't appear on the new fs anymore (or was it perhaps because I was using snapshots?) Cheers, Chris. [-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 5313 bytes --] ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2015-11-20 19:24 ` Christoph Anton Mitterer @ 2015-11-21 0:47 ` Qu Wenruo 2015-11-21 1:08 ` Lukas Pirl 0 siblings, 1 reply; 55+ messages in thread From: Qu Wenruo @ 2015-11-21 0:47 UTC (permalink / raw) To: Christoph Anton Mitterer, Laurent Bonnaud, quwenruo, linux-btrfs@vger.kernel.org On 11/21/2015 03:24 AM, Christoph Anton Mitterer wrote: > On Fri, 2015-11-20 at 17:23 +0100, Laurent Bonnaud wrote: >> So here is the output of "btrfs-debug-tree -t 2 <dev>" in case it may > Gosh... 15M via mail?! o.O > > Anyway an update from my side... > I've copied all data from the fs in question to a new btrfs,... done > under Linux 4.2.6 and btrfs-progs v4.3. > No data was lost or anyhow corrupted (also shown by extensive tests > using my XATTR hashsums and other backups I've had). > > On the new fs, btrfs doesn't report that error anymore. > No snapshots/etc made so far on the new fs. > > > > Qu, have you had a look at btrfs check already? Nothing interesting found... And if the codes go crazy, it should also infect your new filesystem, but it doesn't... > > And you've explained that the fs was okay and only the check was > wrong... but since the false positive errors don't appear on my copied > fs, does that really mean that the skinny metadata changed so much, or > was anything changed in the kernel that the same didn't appear on the > new fs anymore (or was it perhaps because I was using snapshots?) > Hard to say, but we'd better keep an eye on this issue. At least, if it happens again, we should know if it's related to something like newer kernel or snapshots. Thanks, Qu > > Cheers, > Chris. > ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk
  2015-11-21  0:47 ` Qu Wenruo
@ 2015-11-21  1:08 ` Lukas Pirl
  2015-11-22  2:04 ` Qu Wenruo
  0 siblings, 1 reply; 55+ messages in thread
From: Lukas Pirl @ 2015-11-21  1:08 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On 11/21/2015 01:47 PM, Qu Wenruo wrote as excerpted:
> Hard to say, but we'd better keep an eye on this issue.
> At least, if it happens again, we should know if it's related to
> something like newer kernel or snapshots.

I can confirm the initially described behavior of "btrfs check";
reading the data also works fine.

Versions etc.:

$ uname -a
Linux 4.2.0-0.bpo.1-amd64 #1 SMP Debian 4.2.6-1~bpo8+1 …
$ btrfs filesystem show /mnt/data
Label: none  uuid: 5be372f5-5492-4f4b-b641-c14f4ad8ae23
	Total devices 6 FS bytes used 2.87TiB
	devid    1 size 931.51GiB used 636.00GiB path /dev/mapper/…SZ
	devid    2 size 931.51GiB used 634.03GiB path /dev/mapper/…03
	devid    3 size 1.82TiB used 1.53TiB path /dev/mapper/…76
	devid    4 size 1.82TiB used 1.53TiB path /dev/mapper/…78
	devid    6 size 1.82TiB used 1.05TiB path /dev/mapper/…UK
	*** Some devices missing

btrfs-progs v4.3

$ btrfs subvolume list /mnt/data | wc -l
62

Best,

Lukas
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk
  2015-11-21  1:08 ` Lukas Pirl
@ 2015-11-22  2:04 ` Qu Wenruo
  2015-11-22  6:56 ` Christoph Anton Mitterer
  2015-11-22 10:17 ` Laurent Bonnaud
  0 siblings, 2 replies; 55+ messages in thread
From: Qu Wenruo @ 2015-11-22  2:04 UTC (permalink / raw)
  To: Lukas Pirl; +Cc: linux-btrfs, calestyo, L.Bonnaud

Hi Lukas, Laurent and Christoph,

If any of you can recompile btrfs-progs and debug it with gdb, could you
please investigate where wrong_chunk_type gets set?

It is set in the function check_extent_type():
------
/* Check if the type of extent matches with its chunk */
static void check_extent_type(struct extent_record *rec)
{
	struct btrfs_block_group_cache *bg_cache;

	bg_cache = btrfs_lookup_first_block_group(global_info, rec->start);
	if (!bg_cache)
		return;

	/* data extent, check chunk directly*/
	if (!rec->metadata) {
		if (!(bg_cache->flags & BTRFS_BLOCK_GROUP_DATA))
			rec->wrong_chunk_type = 1; <<< HERE
		return;
	}

	/* metadata extent, check the obvious case first */
	if (!(bg_cache->flags & (BTRFS_BLOCK_GROUP_SYSTEM |
				 BTRFS_BLOCK_GROUP_METADATA))) {
		rec->wrong_chunk_type = 1; <<< HERE
		return;
	}

	/*
	 * Check SYSTEM extent, as it's also marked as metadata, we can only
	 * make sure it's a SYSTEM extent by its backref
	 */
	if (!list_empty(&rec->backrefs)) {
		struct extent_backref *node;
		struct tree_backref *tback;
		u64 bg_type;

		node = list_entry(rec->backrefs.next, struct extent_backref,
				  list);
		if (node->is_data) {
			/* tree block shouldn't have data backref */
			rec->wrong_chunk_type = 1; <<< HERE
			return;
		}
		tback = container_of(node, struct tree_backref, node);

		if (tback->root == BTRFS_CHUNK_TREE_OBJECTID)
			bg_type = BTRFS_BLOCK_GROUP_SYSTEM;
		else
			bg_type = BTRFS_BLOCK_GROUP_METADATA;
		if (!(bg_cache->flags & bg_type))
			rec->wrong_chunk_type = 1; <<< HERE
	}
}
------

If you can add a breakpoint on each "rec->wrong_chunk_type = 1;" line and
report which one is hit, it would be quite helpful for further debugging.
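To make the four marked branches easier to follow, here is a simplified Python model of the decision logic quoted above. It is a sketch, not the btrfs-progs implementation: the function name, its parameters, and the representation of the first backref as an (is_data, root) tuple are made up for illustration, and the block group lookup is replaced by passing the flags in directly. The flag bits and the chunk tree object ID follow the btrfs on-disk format.

```python
# Block group type bits and the chunk tree object ID (btrfs on-disk format).
BTRFS_BLOCK_GROUP_DATA = 1 << 0
BTRFS_BLOCK_GROUP_SYSTEM = 1 << 1
BTRFS_BLOCK_GROUP_METADATA = 1 << 2
BTRFS_CHUNK_TREE_OBJECTID = 3


def wrong_chunk_type(bg_flags, is_metadata, first_backref=None):
    """Model of check_extent_type(); True means 'type mismatch with chunk'.

    first_backref is None (no backrefs recorded) or a hypothetical
    (is_data, root) tuple describing the first backref of the extent.
    """
    # Data extent: its block group must carry the DATA bit.
    if not is_metadata:
        return not (bg_flags & BTRFS_BLOCK_GROUP_DATA)

    # Metadata extent: the block group must be SYSTEM or METADATA.
    if not (bg_flags & (BTRFS_BLOCK_GROUP_SYSTEM | BTRFS_BLOCK_GROUP_METADATA)):
        return True

    # SYSTEM and METADATA extents can only be told apart by the backref.
    if first_backref is None:
        return False
    is_data, root = first_backref
    if is_data:
        return True  # a tree block should not have a data backref
    if root == BTRFS_CHUNK_TREE_OBJECTID:
        expected = BTRFS_BLOCK_GROUP_SYSTEM
    else:
        expected = BTRFS_BLOCK_GROUP_METADATA
    return not (bg_flags & expected)


# A data extent that lives in a metadata block group is flagged.
print(wrong_chunk_type(BTRFS_BLOCK_GROUP_METADATA, is_metadata=False))
# A tree block of the extent tree (root 2) in a metadata block group is fine.
print(wrong_chunk_type(BTRFS_BLOCK_GROUP_METADATA, True, (False, 2)))
# A chunk tree block in a plain metadata block group is flagged.
print(wrong_chunk_type(BTRFS_BLOCK_GROUP_METADATA, True,
                       (False, BTRFS_CHUNK_TREE_OBJECTID)))
```

Because every test is a bitwise AND, a mixed block group that carries both the DATA and METADATA bits passes for data extents and for ordinary tree blocks alike in this model.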
Thanks, Qu On 11/21/2015 09:08 AM, Lukas Pirl wrote: > On 11/21/2015 01:47 PM, Qu Wenruo wrote as excerpted: >> Hard to say, but we'd better keep an eye on this issue. >> At least, if it happens again, we should know if it's related to >> something like newer kernel or snapshots. > > I can confirm the initially describe behavior of "btrfs check" and > reading the data works fine also. > > Versions etc.: > > $ uname -a > Linux 4.2.0-0.bpo.1-amd64 #1 SMP Debian 4.2.6-1~bpo8+1 … > $ btrfs filesystem show /mnt/data > Label: none uuid: 5be372f5-5492-4f4b-b641-c14f4ad8ae23 > Total devices 6 FS bytes used 2.87TiB > devid 1 size 931.51GiB used 636.00GiB path /dev/mapper/…SZ > devid 2 size 931.51GiB used 634.03GiB path /dev/mapper/…03 > devid 3 size 1.82TiB used 1.53TiB path /dev/mapper/…76 > devid 4 size 1.82TiB used 1.53TiB path /dev/mapper/…78 > devid 6 size 1.82TiB used 1.05TiB path /dev/mapper/…UK > *** Some devices missing > > btrfs-progs v4.3 > > $ btrfs subvolume list /mnt/data | wc -l > 62 > > Best, > > Lukas > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2015-11-22 2:04 ` Qu Wenruo @ 2015-11-22 6:56 ` Christoph Anton Mitterer 2015-11-23 1:10 ` Qu Wenruo 2015-11-22 10:17 ` Laurent Bonnaud 1 sibling, 1 reply; 55+ messages in thread From: Christoph Anton Mitterer @ 2015-11-22 6:56 UTC (permalink / raw) To: Qu Wenruo; +Cc: linux-btrfs [-- Attachment #1: Type: text/plain, Size: 2006 bytes --]

Hey Qu.

On Sun, 2015-11-22 at 10:04 +0800, Qu Wenruo wrote:
> If any of you can recompile btrfs-progs and use gdb to debug it,
> would
> anyone please to investigate where did the wrong_chunk_type is set?
> It is in the function check_extent_type():

Not 100% sure what you want... AFAIU, you just want to see whether that line is reached?

I didn't re-compile but used the btrfs-tools-dbg package; I guess that should do.

In the Debian version that line seems to be at:
4374
4375 /* Check if the type of extent matches with its chunk */
4376 static void check_extent_type(struct extent_record *rec)
4377 {
...
4419 bg_type = BTRFS_BLOCK_GROUP_METADATA;
4420 if (!(bg_cache->flags & bg_type))
4421 rec->wrong_chunk_type = 1;
4422 }
4423 }

Running:
# gdb btrfs
(gdb) dir /root/btrfs-tools-4.3
Source directories searched: /root/btrfs-tools-4.3:$cdir:$cwd
(gdb) break cmds-check.c:4421
Breakpoint 1 at 0x41d859: file cmds-check.c, line 4421.
(gdb) run check /dev/mapper/data-b
Starting program: /bin/btrfs check /dev/mapper/data-b
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

... in fact reaches that breakpoint:
Breakpoint 1, check_extent_type (rec=rec@entry=0x884290) at cmds-check.c:4423
4423 }

... but the error message ("bad extent [5993525264384, 5993525280768), type mismatch with chunk") doesn't seem to be printed at that stage...
If I continue, it goes for a while: Breakpoint 1, check_extent_type (rec=rec@entry=0x884290) at cmds-check.c:4423 4423 } (gdb) cont 100000 Will ignore next 99999 crossings of breakpoint 1. Continuing. and so on for at least several million crossings... I then removed the breakpoint and after a longer while the usual error messages came up. Does that help? Chris. [-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 5313 bytes --] ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2015-11-22 6:56 ` Christoph Anton Mitterer @ 2015-11-23 1:10 ` Qu Wenruo 2015-11-23 18:12 ` Christoph Anton Mitterer 0 siblings, 1 reply; 55+ messages in thread From: Qu Wenruo @ 2015-11-23 1:10 UTC (permalink / raw) To: Christoph Anton Mitterer, Qu Wenruo; +Cc: linux-btrfs

Christoph Anton Mitterer wrote on 2015/11/22 07:56 +0100:
> Hey Qu.
>
>
> On Sun, 2015-11-22 at 10:04 +0800, Qu Wenruo wrote:
>> If any of you can recompile btrfs-progs and use gdb to debug it,
>> would
>> anyone please to investigate where did the wrong_chunk_type is set?
>> It is in the function check_extent_type():
>
> Not 100% what you want... AFAIU, you just want to see whether that line
> is reached?
>
> If didn't re-compile but used the btrfs-tools-dbg package, but I guess
> that should do.
>
> In the debian version that line seems to be at:
> 4374
> 4375 /* Check if the type of extent matches with its chunk */
> 4376 static void check_extent_type(struct extent_record *rec)
> 4377 {
> ...
> 4419 bg_type = BTRFS_BLOCK_GROUP_METADATA;
> 4420 if (!(bg_cache->flags & bg_type))
> 4421 rec->wrong_chunk_type = 1;
> 4422 }
> 4423 }
>
> Running:
> # gdb btrfs
> (gdb) dir /root/btrfs-tools-4.3
> Source directories searched: /root/btrfs-tools-4.3:$cdir:$cwd
> (gdb) break cmds-check.c:4421
> Breakpoint 1 at 0x41d859: file cmds-check.c, line 4421.
> (gdb) run check /dev/mapper/data-b
> Starting program: /bin/btrfs check /dev/mapper/data-b
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
>
> ... in fact reaches that breakpoint:
> Breakpoint 1, check_extent_type (rec=rec@entry=0x884290) at cmds-check.c:4423
> 4423 }

Normally you can use the -dbg package, but as you experienced, many things, such as compiler optimization, can cause a situation like this.

That's why I recommended compiling the package from scratch.

And if you want to compile, it's also recommended to disable unrelated components: documentation (needs xmlto and asciidoc) and convert (needs a lot of ext2-related headers).

"./configure --disable-convert --disable-documentation" should do the trick.

Also, you don't want the compiler to do extra optimization, so after configuring, you'd better edit the Makefile and set the optimization level to 0, by changing "-O1" (letter O and digit one) to "-O0" (letter O and digit zero) in CFLAGS.

Then run make. The compiler will complain about the optimization level because some headers need it, but that's OK and won't cause anything wrong.

After make, you don't need to install btrfs-progs; you can just use gdb to debug the local ./btrfsck and add new breakpoints.

Thanks,
Qu

>
> ... but the error message ("bad extent [5993525264384, 5993525280768),
> type mismatch with chunk") doesn't seem to be printed at that stage...
>
>
> If I continue, it goes for a while:
>
> Breakpoint 1, check_extent_type (rec=rec@entry=0x884290) at cmds-check.c:4423
> 4423 }
> (gdb) cont 100000
> Will ignore next 99999 crossings of breakpoint 1. Continuing.
>
> and so on for at least several million crossings... I then removed the
> breakpoint and after a longer while the usual error messages came up.
>
>
> Does that help?
>
> Chris.
>
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2015-11-23 1:10 ` Qu Wenruo @ 2015-11-23 18:12 ` Christoph Anton Mitterer 2015-11-24 0:46 ` Qu Wenruo 0 siblings, 1 reply; 55+ messages in thread From: Christoph Anton Mitterer @ 2015-11-23 18:12 UTC (permalink / raw) To: Qu Wenruo; +Cc: linux-btrfs [-- Attachment #1: Type: text/plain, Size: 2086 bytes --] On Mon, 2015-11-23 at 09:10 +0800, Qu Wenruo wrote: > Also, you won't want compiler to do extra optimization I did the following: $ export CFLAGS="-g -O0 -Wall -D_FORTIFY_SOURCE=2" $ ./configure --disable-convert --disable-documentation So if you want me to get rid of _FORTIFY_SOURCE, please tell. > > After make, you won't need to install the btrfs-progs, you can just > use > gdb to debug local ./btrfsck and add new breakpoints to do the trick. # gdb ./btrfs GNU gdb (Debian 7.10-1) 7.10 Copyright (C) 2015 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from ./btrfs...done. (gdb) break cmds-check.c:4421 Breakpoint 1 at 0x42d000: file cmds-check.c, line 4421. (gdb) run check /dev/mapper/data-b Starting program: /home/calestyo/bfsck/btrfs-tools-4.3/btrfs check /dev/mapper/data-b [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". ... 
bad extent [6619620278272, 6619620294656), type mismatch with chunk
bad extent [6619620294656, 6619620311040), type mismatch with chunk
bad extent [6619620311040, 6619620327424), type mismatch with chunk
checking free space cache
checking fs roots

with no breakpoint reached. And I actually did that with both btrfs filesystems where the problem occurred (the master, and the one that send/received snapshots incrementally from it).

Hope that helps... anything further to do?

Chris.
[-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 5313 bytes --]
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2015-11-23 18:12 ` Christoph Anton Mitterer @ 2015-11-24 0:46 ` Qu Wenruo 2015-11-24 1:53 ` Christoph Anton Mitterer 2015-11-24 17:39 ` David Sterba 0 siblings, 2 replies; 55+ messages in thread From: Qu Wenruo @ 2015-11-24 0:46 UTC (permalink / raw) To: Christoph Anton Mitterer; +Cc: linux-btrfs

Christoph Anton Mitterer wrote on 2015/11/23 19:12 +0100:
> On Mon, 2015-11-23 at 09:10 +0800, Qu Wenruo wrote:
>> Also, you won't want compiler to do extra optimization
> I did the following:
> $ export CFLAGS="-g -O0 -Wall -D_FORTIFY_SOURCE=2"

Wow, I never knew it was possible to override FORTIFY_SOURCE to suppress the warning. Great tip!

> $ ./configure --disable-convert --disable-documentation
>
> So if you want me to get rid of _FORTIFY_SOURCE, please tell.
>
>
>>
>> After make, you won't need to install the btrfs-progs, you can just
>> use
>> gdb to debug local ./btrfsck and add new breakpoints to do the trick.
> # gdb ./btrfs
> GNU gdb (Debian 7.10-1) 7.10
> Copyright (C) 2015 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-linux-gnu".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>.
> Find the GDB manual and other documentation resources online at:
> <http://www.gnu.org/software/gdb/documentation/>.
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from ./btrfs...done.
> (gdb) break cmds-check.c:4421
> Breakpoint 1 at 0x42d000: file cmds-check.c, line 4421.

Yes, that's one of the places where the wrong_chunk_type flag is set.

But there are also some other places, namely lines 4411, 4394 and 4387. So there are still 3 breakpoints to add.

At least we have ruled out one possible case.

Thanks for all the results you provided; we are steadily closing in on the cause now.

Thanks,
Qu

> (gdb) run check /dev/mapper/data-b
> Starting program: /home/calestyo/bfsck/btrfs-tools-4.3/btrfs check /dev/mapper/data-b
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> ...
> bad extent [6619620278272, 6619620294656), type mismatch with chunk
> bad extent [6619620294656, 6619620311040), type mismatch with chunk
> bad extent [6619620311040, 6619620327424), type mismatch with chunk
> checking free space cache
> checking fs roots
>
> with not breakpoint reached. And I've actually did that with both btrfs
> where the problem occurred (the master, and the one that send/received
> snapshots incrementally from it).
>
>
> Hope that helps... anything further to do?
>
> Chris.
>
-- This message has been scanned for viruses and dangerous content by FCNIC, and is believed to be clean.
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2015-11-24 0:46 ` Qu Wenruo @ 2015-11-24 1:53 ` Christoph Anton Mitterer 2015-11-24 2:09 ` Qu Wenruo 2015-11-24 17:39 ` David Sterba 1 sibling, 1 reply; 55+ messages in thread From: Christoph Anton Mitterer @ 2015-11-24 1:53 UTC (permalink / raw) To: Qu Wenruo; +Cc: linux-btrfs [-- Attachment #1: Type: text/plain, Size: 3156 bytes --] On Tue, 2015-11-24 at 08:46 +0800, Qu Wenruo wrote: > But there are also some other places like line 4411, 4394 and 4387. Ah of course, I didn't have a look for further places.... $ grep -n "rec->wrong_chunk_type = 1" cmds-check.c 4387: rec->wrong_chunk_type = 1; 4394: rec->wrong_chunk_type = 1; 4411: rec->wrong_chunk_type = 1; 4421: rec->wrong_chunk_type = 1; > So there are still 3 breakpoint needs to add. GNU gdb (Debian 7.10-1) 7.10 Copyright (C) 2015 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from ./btrfs...done. (gdb) break cmds-check.c:4387 Breakpoint 1 at 0x42cf2b: file cmds-check.c, line 4387. (gdb) break cmds-check.c:4394 Breakpoint 2 at 0x42cf57: file cmds-check.c, line 4394. (gdb) break cmds-check.c:4411 Breakpoint 3 at 0x42cfa6: file cmds-check.c, line 4411. (gdb) break cmds-check.c:4421 Breakpoint 4 at 0x42d000: file cmds-check.c, line 4421. 
(gdb) run check /dev/mapper/data-b
Starting program: /home/calestyo/bfsck/btrfs-tools-4.3/btrfs check /dev/mapper/data-b
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Checking filesystem on /dev/mapper/data-b
UUID: 250ddae1-7b37-4b22-89e9-4dc5886c810f
checking extents

Breakpoint 1, check_extent_type (rec=0x20a6740) at cmds-check.c:4387
4387 rec->wrong_chunk_type = 1;
(gdb) continue
Continuing.

Breakpoint 1, check_extent_type (rec=0x20a6740) at cmds-check.c:4387
4387 rec->wrong_chunk_type = 1;
(gdb) cont 100
Will ignore next 99 crossings of breakpoint 1. Continuing.

Breakpoint 1, check_extent_type (rec=0x20a9880) at cmds-check.c:4387
4387 rec->wrong_chunk_type = 1;
(gdb) cont 1000
Will ignore next 999 crossings of breakpoint 1. Continuing.

That goes on for a few million crossings... until we get the:
bad extent [6619620016128, 6619620032512), type mismatch with chunk
bad extent [6619620032512, 6619620048896), type mismatch with chunk
again... and the check exits normally with:
checking free space cache
checking fs roots
checking csums
checking root refs
found 5862373889375 bytes used err is 0
total csum bytes: 5715302800
total tree bytes: 9903816704
total fs tree bytes: 2475769856
total extent tree bytes: 938393600
btree space waste bytes: 1072581913
file data blocks allocated: 9170230497280
 referenced 9281014861824
btrfs-progs v4.3
[Inferior 1 (process 18130) exited normally]

So it's the one at line 4387.

Anything further to do? :)

Chris.
[-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 5313 bytes --]
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2015-11-24 1:53 ` Christoph Anton Mitterer @ 2015-11-24 2:09 ` Qu Wenruo 2015-11-24 2:48 ` Christoph Anton Mitterer 0 siblings, 1 reply; 55+ messages in thread From: Qu Wenruo @ 2015-11-24 2:09 UTC (permalink / raw) To: Christoph Anton Mitterer; +Cc: linux-btrfs Christoph Anton Mitterer wrote on 2015/11/24 02:53 +0100: > On Tue, 2015-11-24 at 08:46 +0800, Qu Wenruo wrote: >> But there are also some other places like line 4411, 4394 and 4387. > Ah of course, I didn't have a look for further places.... > > $ grep -n "rec->wrong_chunk_type = 1" cmds-check.c > 4387: rec->wrong_chunk_type = 1; > 4394: rec->wrong_chunk_type = 1; > 4411: rec->wrong_chunk_type = 1; > 4421: rec->wrong_chunk_type = 1; > > > >> So there are still 3 breakpoint needs to add. > GNU gdb (Debian 7.10-1) 7.10 > Copyright (C) 2015 Free Software Foundation, Inc. > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> > This is free software: you are free to change and redistribute it. > There is NO WARRANTY, to the extent permitted by law. Type "show copying" > and "show warranty" for details. > This GDB was configured as "x86_64-linux-gnu". > Type "show configuration" for configuration details. > For bug reporting instructions, please see: > <http://www.gnu.org/software/gdb/bugs/>. > Find the GDB manual and other documentation resources online at: > <http://www.gnu.org/software/gdb/documentation/>. > For help, type "help". > Type "apropos word" to search for commands related to "word"... > Reading symbols from ./btrfs...done. > (gdb) break cmds-check.c:4387 > Breakpoint 1 at 0x42cf2b: file cmds-check.c, line 4387. > (gdb) break cmds-check.c:4394 > Breakpoint 2 at 0x42cf57: file cmds-check.c, line 4394. > (gdb) break cmds-check.c:4411 > Breakpoint 3 at 0x42cfa6: file cmds-check.c, line 4411. > (gdb) break cmds-check.c:4421 > Breakpoint 4 at 0x42d000: file cmds-check.c, line 4421. 
> (gdb) run check /dev/mapper/data-b
> Starting program: /home/calestyo/bfsck/btrfs-tools-4.3/btrfs check /dev/mapper/data-b
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> Checking filesystem on /dev/mapper/data-b
> UUID: 250ddae1-7b37-4b22-89e9-4dc5886c810f
> checking extents
>
> Breakpoint 1, check_extent_type (rec=0x20a6740) at cmds-check.c:4387
> 4387 rec->wrong_chunk_type = 1;
> (gdb) continue
> Continuing.
>
> Breakpoint 1, check_extent_type (rec=0x20a6740) at cmds-check.c:4387
> 4387 rec->wrong_chunk_type = 1;
> (gdb) cont 100
> Will ignore next 99 crossings of breakpoint 1. Continuing.
>
> Breakpoint 1, check_extent_type (rec=0x20a9880) at cmds-check.c:4387
> 4387 rec->wrong_chunk_type = 1;
> (gdb) cont 1000
> Will ignore next 999 crossings of breakpoint 1. Continuing.

Great, that's the direct cause.

After a quick glance, it seems that extent_record->metadata does not always reflect the real type of the extent.

I'll dig further to see what's causing the problem.

Thanks for all the debug info, it really helps a lot!

Qu

>
> That goes on for a few millions... until we get the:
> bad extent [6619620016128, 6619620032512), type mismatch with chunk
> bad extent [6619620032512, 6619620048896), type mismatch with chunk
> again.. and the check exits normally with:
> checking free space cache
> checking fs roots
> checking csums
> checking root refs
> found 5862373889375 bytes used err is 0
> total csum bytes: 5715302800
> total tree bytes: 9903816704
> total fs tree bytes: 2475769856
> total extent tree bytes: 938393600
> btree space waste bytes: 1072581913
> file data blocks allocated: 9170230497280
> referenced 9281014861824
> btrfs-progs v4.3
> [Inferior 1 (process 18130) exited normally]
>
>
> So it's the one in 4387.
>
> Anything further to do? :)
>
> Chris.
>
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2015-11-24 2:09 ` Qu Wenruo @ 2015-11-24 2:48 ` Christoph Anton Mitterer 2015-11-24 2:54 ` Qu Wenruo 0 siblings, 1 reply; 55+ messages in thread From: Christoph Anton Mitterer @ 2015-11-24 2:48 UTC (permalink / raw) To: Qu Wenruo; +Cc: linux-btrfs [-- Attachment #1: Type: text/plain, Size: 281 bytes --] On Tue, 2015-11-24 at 10:09 +0800, Qu Wenruo wrote: > I'll dig further to see what's causing the problem. I guess you'd prefer if I keep the fs for later verification? > Thanks for all the debug info, it really helps a lot! Well thanks for your efforts as well :) Chris. [-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 5313 bytes --] ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2015-11-24 2:48 ` Christoph Anton Mitterer @ 2015-11-24 2:54 ` Qu Wenruo 2015-11-24 3:02 ` Christoph Anton Mitterer 0 siblings, 1 reply; 55+ messages in thread From: Qu Wenruo @ 2015-11-24 2:54 UTC (permalink / raw) To: Christoph Anton Mitterer; +Cc: linux-btrfs

Christoph Anton Mitterer wrote on 2015/11/24 03:48 +0100:
> On Tue, 2015-11-24 at 10:09 +0800, Qu Wenruo wrote:
>> I'll dig further to see what's causing the problem.
> I guess you'd prefer if I keep the fs for later verification?

That would be the best.

And it would be even better if you are willing to be a lab mouse for the upcoming fix patches. (They won't hurt or destroy your data.)

Thanks,
Qu

>
>
>> Thanks for all the debug info, it really helps a lot!
> Well thanks for your efforts as well :)
>
> Chris.
>
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2015-11-24 2:54 ` Qu Wenruo @ 2015-11-24 3:02 ` Christoph Anton Mitterer 2015-11-24 5:35 ` Qu Wenruo 0 siblings, 1 reply; 55+ messages in thread From: Christoph Anton Mitterer @ 2015-11-24 3:02 UTC (permalink / raw) To: Qu Wenruo; +Cc: linux-btrfs [-- Attachment #1: Type: text/plain, Size: 492 bytes --]

On Tue, 2015-11-24 at 10:54 +0800, Qu Wenruo wrote:
> And it would be even better if you want to be a lab mouse for
> incoming fixing patches.
Sure... if I get some cheese... and it would be great if you could give me patches that apply to 4.3.

> (It won't hurt nor destroy your data)
Wouldn't matter... it's already backed up again and that fs is for playing now ;-)

Just tell me in case you need to give up, so that I can start re-using the device then.

Cheers,
Chris.
[-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 5313 bytes --]
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2015-11-24 3:02 ` Christoph Anton Mitterer @ 2015-11-24 5:35 ` Qu Wenruo 2015-11-24 18:25 ` Christoph Anton Mitterer 0 siblings, 1 reply; 55+ messages in thread From: Qu Wenruo @ 2015-11-24 5:35 UTC (permalink / raw) To: Christoph Anton Mitterer; +Cc: linux-btrfs

Christoph Anton Mitterer wrote on 2015/11/24 04:02 +0100:
> On Tue, 2015-11-24 at 10:54 +0800, Qu Wenruo wrote:
>> And it would be even better if you want to be a lab mouse for
>> incoming fixing patches.
> Sure,.. if I get some cheese... and it would be great if you could give
> me patches that apply to 4.3.

Hope you didn't wait too long.

The fix patch has been CCed to you, or you can get it from patchwork:
https://patchwork.kernel.org/patch/7687611/

>
>
>> (It won't hurt nor destroy your data)
> wouldn't matter... it's already backuped again and that fs is for
> playing now ;-)
>
>
> Just tell me in case you need to give up so that I can start re-using
> the device then.

If that patch fixes the false alert, you are free to re-use it.

Thanks,
Qu

>
> Cheers,
> Chris.
>
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2015-11-24 5:35 ` Qu Wenruo @ 2015-11-24 18:25 ` Christoph Anton Mitterer 2015-11-25 0:02 ` Qu Wenruo 2015-11-25 0:59 ` Qu Wenruo 0 siblings, 2 replies; 55+ messages in thread From: Christoph Anton Mitterer @ 2015-11-24 18:25 UTC (permalink / raw) To: Qu Wenruo; +Cc: linux-btrfs [-- Attachment #1: Type: text/plain, Size: 363 bytes --] On Tue, 2015-11-24 at 13:35 +0800, Qu Wenruo wrote: > Hopes you didn't wait too long. No worries, didn't hold my breath ;) > The fixing patch is CCed to you, or you can get it from patchwork: > https://patchwork.kernel.org/patch/7687611/ Unfortunately that doesn't make the error messages go away. :( Shall I start debugging again? Cheers, Chris. [-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 5313 bytes --] ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2015-11-24 18:25 ` Christoph Anton Mitterer @ 2015-11-25 0:02 ` Qu Wenruo 2015-11-25 0:59 ` Qu Wenruo 1 sibling, 0 replies; 55+ messages in thread From: Qu Wenruo @ 2015-11-25 0:02 UTC (permalink / raw) To: Christoph Anton Mitterer, Qu Wenruo; +Cc: linux-btrfs

On 11/25/2015 02:25 AM, Christoph Anton Mitterer wrote:
> On Tue, 2015-11-24 at 13:35 +0800, Qu Wenruo wrote:
>> Hopes you didn't wait too long.
> No worries, didn't hold my breath ;)
>
>
>> The fixing patch is CCed to you, or you can get it from patchwork:
>> https://patchwork.kernel.org/patch/7687611/
> Unfortunately that doesn't make the error messages go away.
> :(
>
> Shall I start debugging again?

That's too bad...

You can try debugging again, but the result may not change at all. :(

There may be a more involved debugging method, like adding a breakpoint at add_extent_rec() for a specific bytenr (a bytenr from the btrfsck error output, e.g. 5993525264384) to check whether its metadata flag is set correctly. If the metadata flag is not set correctly, the backtrace will provide a lot of useful info...

But that's what a dev should do, not an end user. I'm totally OK if you can't provide that output.

I'll continue searching the code for any possible false alert. But without a local test image, I'm afraid you may have to try several patches until a final fix is found....

Thanks,
Qu

>
> Cheers,
> Chris.
>
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2015-11-24 18:25 ` Christoph Anton Mitterer 2015-11-25 0:02 ` Qu Wenruo @ 2015-11-25 0:59 ` Qu Wenruo 2015-11-25 3:35 ` Christoph Anton Mitterer 1 sibling, 1 reply; 55+ messages in thread From: Qu Wenruo @ 2015-11-25 0:59 UTC (permalink / raw) To: Christoph Anton Mitterer; +Cc: linux-btrfs

Christoph Anton Mitterer wrote on 2015/11/24 19:25 +0100:
> On Tue, 2015-11-24 at 13:35 +0800, Qu Wenruo wrote:
>> Hopes you didn't wait too long.
> No worries, didn't hold my breath ;)
>
>
>> The fixing patch is CCed to you, or you can get it from patchwork:
>> https://patchwork.kernel.org/patch/7687611/
> Unfortunately that doesn't make the error messages go away.
> :(
>
> Shall I start debugging again?
>
> Cheers,
> Chris.
>

Quite strange...

I succeeded in reproducing the bug: just disable skinny metadata and fill a btrfs with fsstress. Btrfsck will then report a lot of these false alerts.

But with my patch applied, all the warnings just disappeared...

Did you use the compiled btrfsck? Or did you use the system btrfsck by mistake?

Thanks,
Qu
^ permalink raw reply [flat|nested] 55+ messages in thread
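[Editor's sketch] Why disabling skinny metadata would matter can be illustrated with the two ways a tree block may appear in the extent tree: as a METADATA_ITEM (skinny) or as a plain EXTENT_ITEM whose flags carry the TREE_BLOCK bit (non-skinny). The block below is an assumption-laden illustration only; it is neither the pre-patch code nor the patch from patchwork. The two demo_* functions are hypothetical, and only the key types and flag bits are taken from the btrfs headers.

```c
#include <assert.h>
#include <stdint.h>

/* Extent tree key types and extent item flags, as in the btrfs headers. */
#define BTRFS_EXTENT_ITEM_KEY        168
#define BTRFS_METADATA_ITEM_KEY      169
#define BTRFS_EXTENT_FLAG_DATA       (1ULL << 0)
#define BTRFS_EXTENT_FLAG_TREE_BLOCK (1ULL << 1)

/*
 * Hypothetical naive classifier: looks at the key type only, so a tree
 * block stored as a plain EXTENT_ITEM (non-skinny layout) is taken for
 * a data extent.
 */
int demo_is_tree_block_naive(uint8_t key_type)
{
	return key_type == BTRFS_METADATA_ITEM_KEY;
}

/*
 * Hypothetical classifier that also consults the extent item flags, so
 * both on-disk layouts are recognised as tree blocks.
 */
int demo_is_tree_block(uint8_t key_type, uint64_t extent_flags)
{
	if (key_type == BTRFS_METADATA_ITEM_KEY)
		return 1;
	return (extent_flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) != 0;
}
```

If a checker derived its metadata value the naive way, every non-skinny tree block would look like a data extent inside a METADATA block group, which is the kind of mismatch reported at line 4387. Whether that is what actually happened here is for the patch itself to say.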
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2015-11-25 0:59 ` Qu Wenruo @ 2015-11-25 3:35 ` Christoph Anton Mitterer 2015-11-25 4:16 ` Christoph Anton Mitterer 0 siblings, 1 reply; 55+ messages in thread From: Christoph Anton Mitterer @ 2015-11-25 3:35 UTC (permalink / raw) To: Qu Wenruo; +Cc: linux-btrfs [-- Attachment #1: Type: text/plain, Size: 1250 bytes --]

On Wed, 2015-11-25 at 08:59 +0800, Qu Wenruo wrote:
> Did you use the complied btrfsck? Or use the system btrfsck by
> mistake?
I'm pretty sure I did, because I already went through the whole procedure twice, but let me repeat and record it here just to be 100% sure:

$ make clean
Cleaning
$ md5sum cmds-check.c
a7e7d871c3b666df6b56c724dbfd1d86 cmds-check.c
$ export CFLAGS="-g -O0 -Wall -D_FORTIFY_SOURCE=2"
$ ./configure --disable-convert --disable-documentation
[snip]
$ make
# ./btrfs check /dev/mapper/data-b
Checking filesystem on /dev/mapper/data-b
UUID: 250ddae1-7b37-4b22-89e9-4dc5886c810f
checking extents
[getting a cup of tea]

... and voila, it works... which is kinda weird... I still have the previous run in the bash history... and I *did* invoke ./btrfs and not btrfs. Also, I haven't done any further patching... so it cannot be that the patch wasn't applied before.

WTF?! Apparently I suffer from Gremlins :-/

*a little while later*

And back they are... (the errors)... o.O

This time I checked both of my devices that showed the symptoms, concurrently... data-b, as above, showed no errors. data-old-a came back with the same errors as before.

Is there anything non-deterministic involved?

Cheers,
Chris.
[-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 5313 bytes --]
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2015-11-25 3:35 ` Christoph Anton Mitterer @ 2015-11-25 4:16 ` Christoph Anton Mitterer 0 siblings, 0 replies; 55+ messages in thread From: Christoph Anton Mitterer @ 2015-11-25 4:16 UTC (permalink / raw) To: Qu Wenruo; +Cc: linux-btrfs [-- Attachment #1: Type: text/plain, Size: 7219 bytes --] Hey again. So it seems that data-b is always fine (well at least three times in a row) and data-old-a always gives errors. including e.g: bad extent [3067663679488, 3067663695872), type mismatch with chunk bad extent [3067663876096, 3067663892480), type mismatch with chunk bad extent [3067663892480, 3067663908864), type mismatch with chunk bad extent [3067663908864, 3067663925248), type mismatch with chunk bad extent [3067669348352, 3067669364736), type mismatch with chunk bad extent [3067669430272, 3067669446656), type mismatch with chunk bad extent [3067669659648, 3067669676032), type mismatch with chunk bad extent [3067669790720, 3067669807104), type mismatch with chunk bad extent [3067669807104, 3067669823488), type mismatch with chunk bad extent [3067669823488, 3067669839872), type mismatch with chunk bad extent [3067669872640, 3067669889024), type mismatch with chunk bad extent [3067669921792, 3067669938176), type mismatch with chunk bad extent [3067671805952, 3067671822336), type mismatch with chunk I've started debugging (everything as before) with: (gdb) break cmds-check.c:4387 Breakpoint 1 at 0x42cf2b: file cmds-check.c, line 4387. (gdb) break cmds-check.c:4394 Breakpoint 2 at 0x42cf57: file cmds-check.c, line 4394. (gdb) break cmds-check.c:4411 Breakpoint 3 at 0x42cfa6: file cmds-check.c, line 4411. (gdb) break cmds-check.c:4421 Breakpoint 4 at 0x42d000: file cmds-check.c, line 4421. 
Hit a: Breakpoint 1, check_extent_type (rec=0x1a44130) at cmds-check.c:4387 4387 rec->wrong_chunk_type = 1; (gdb) bt #0 check_extent_type (rec=0x1a44130) at cmds-check.c:4387 #1 0x000000000042d6a5 in add_extent_rec (extent_cache=0x7fffffffdf30, parent_key=0x0, parent_gen=0, start=1097665216512, nr=16384, extent_item_refs=1, is_root=0, inc_ref=0, set_checked=0, metadata=0, extent_rec=1, max_size=16384) at cmds-check.c:4576 #2 0x000000000042ecc9 in process_extent_item (root=0x919d20, extent_cache=0x7fffffffdf30, eb=0x1a0edb0, slot=95) at cmds-check.c:5142 #3 0x0000000000430aea in run_next_block (root=0x919d20, bits=0x91e220, bits_nr=1024, last=0x7fffffffdb78, pending=0x7fffffffdf10, seen=0x7fffffffdf20, reada=0x7fffffffdf00, nodes=0x7fffffffdef0, extent_cache=0x7fffffffdf30, chunk_cache=0x7fffffffdf90, dev_cache=0x7fffffffdfa0, block_group_cache=0x7fffffffdf70, dev_extent_cache=0x7fffffffdf40, ri=0x6cef30) at cmds-check.c:5960 #4 0x00000000004356c4 in deal_root_from_list (list=0x7fffffffdc00, root=0x919d20, bits=0x91e220, bits_nr=1024, pending=0x7fffffffdf10, seen=0x7fffffffdf20, reada=0x7fffffffdf00, nodes=0x7fffffffdef0, extent_cache=0x7fffffffdf30, chunk_cache=0x7fffffffdf90, dev_cache=0x7fffffffdfa0, block_group_cache=0x7fffffffdf70, dev_extent_cache=0x7fffffffdf40) at cmds-check.c:8014 #5 0x0000000000435d91 in check_chunks_and_extents (root=0x919d20) at cmds-check.c:8181 #6 0x0000000000438e2b in cmd_check (argc=1, argv=0x7fffffffe220) at cmds-check.c:9627 #7 0x0000000000409d49 in main (argc=2, argv=0x7fffffffe220) at btrfs.c:252 (gdb) continue Continuing. 
Breakpoint 1, check_extent_type (rec=0x1a44130) at cmds-check.c:4387 4387 rec->wrong_chunk_type = 1; (gdb) bt #0 check_extent_type (rec=0x1a44130) at cmds-check.c:4387 #1 0x000000000042d856 in add_tree_backref (extent_cache=0x7fffffffdf30, bytenr=1097665216512, parent=1314162819072, root=0, found_ref=0) at cmds-check.c:4624 #2 0x000000000042ede2 in process_extent_item (root=0x919d20, extent_cache=0x7fffffffdf30, eb=0x1a0edb0, slot=95) at cmds-check.c:5161 #3 0x0000000000430aea in run_next_block (root=0x919d20, bits=0x91e220, bits_nr=1024, last=0x7fffffffdb78, pending=0x7fffffffdf10, seen=0x7fffffffdf20, reada=0x7fffffffdf00, nodes=0x7fffffffdef0, extent_cache=0x7fffffffdf30, chunk_cache=0x7fffffffdf90, dev_cache=0x7fffffffdfa0, block_group_cache=0x7fffffffdf70, dev_extent_cache=0x7fffffffdf40, ri=0x6cef30) at cmds-check.c:5960 #4 0x00000000004356c4 in deal_root_from_list (list=0x7fffffffdc00, root=0x919d20, bits=0x91e220, bits_nr=1024, pending=0x7fffffffdf10, seen=0x7fffffffdf20, reada=0x7fffffffdf00, nodes=0x7fffffffdef0, extent_cache=0x7fffffffdf30, chunk_cache=0x7fffffffdf90, dev_cache=0x7fffffffdfa0, block_group_cache=0x7fffffffdf70, dev_extent_cache=0x7fffffffdf40) at cmds-check.c:8014 #5 0x0000000000435d91 in check_chunks_and_extents (root=0x919d20) at cmds-check.c:8181 #6 0x0000000000438e2b in cmd_check (argc=1, argv=0x7fffffffe220) at cmds-check.c:9627 #7 0x0000000000409d49 in main (argc=2, argv=0x7fffffffe220) at btrfs.c:252 You've mentioned add_extent_rec() before, but that doesn't seem to contain bytenr so I cannot break on it. I tried it with add_tree_backref instead, maybe that's already helpful for you until you give me further instructions on what to debug: Breakpoint 5 at 0x42d84a: file cmds-check.c, line 4624. (gdb) continue Continuing. 
Breakpoint 5, add_tree_backref (extent_cache=0x7fffffffdf30, bytenr=3067669348352, parent=0, root=2, found_ref=0) at cmds-check.c:4624 4624 check_extent_type(rec); (gdb) bt #0 add_tree_backref (extent_cache=0x7fffffffdf30, bytenr=3067669348352, parent=0, root=2, found_ref=0) at cmds-check.c:4624 #1 0x000000000042edb8 in process_extent_item (root=0x919d20, extent_cache=0x7fffffffdf30, eb=0x1a0edb0, slot=96) at cmds-check.c:5157 #2 0x0000000000430aea in run_next_block (root=0x919d20, bits=0x91e220, bits_nr=1024, last=0x7fffffffdb78, pending=0x7fffffffdf10, seen=0x7fffffffdf20, reada=0x7fffffffdf00, nodes=0x7fffffffdef0, extent_cache=0x7fffffffdf30, chunk_cache=0x7fffffffdf90, dev_cache=0x7fffffffdfa0, block_group_cache=0x7fffffffdf70, dev_extent_cache=0x7fffffffdf40, ri=0x6cef30) at cmds-check.c:5960 #3 0x00000000004356c4 in deal_root_from_list (list=0x7fffffffdc00, root=0x919d20, bits=0x91e220, bits_nr=1024, pending=0x7fffffffdf10, seen=0x7fffffffdf20, reada=0x7fffffffdf00, nodes=0x7fffffffdef0, extent_cache=0x7fffffffdf30, chunk_cache=0x7fffffffdf90, dev_cache=0x7fffffffdfa0, block_group_cache=0x7fffffffdf70, dev_extent_cache=0x7fffffffdf40) at cmds-check.c:8014 #4 0x0000000000435d91 in check_chunks_and_extents (root=0x919d20) at cmds-check.c:8181 #5 0x0000000000438e2b in cmd_check (argc=1, argv=0x7fffffffe220) at cmds-check.c:9627 #6 0x0000000000409d49 in main (argc=2, argv=0x7fffffffe220) at btrfs.c:252 (btw: all lines are 4.3 including your patch) breakpoint 5 would be reached many times: Breakpoint 5, add_tree_backref (extent_cache=0x7fffffffdf30, bytenr=3067669348352, parent=0, root=2, found_ref=0) at cmds-check.c:4624 4624 check_extent_type(rec); (gdb) continue Continuing. Breakpoint 5, add_tree_backref (extent_cache=0x7fffffffdf30, bytenr=3067669348352, parent=0, root=7, found_ref=0) at cmds-check.c:4624 4624 check_extent_type(rec); (gdb) continue Continuing. 
Breakpoint 5, add_tree_backref (extent_cache=0x7fffffffdf30, bytenr=3067669348352, parent=0, root=7, found_ref=0) at cmds-check.c:4624 4624 check_extent_type(rec); (gdb) continue Continuing. Cheers, Chris. ^ permalink raw reply [flat|nested] 55+ messages in thread
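An aside on the numbers in the subject line: each "bad extent [start, end)" range reported by btrfs check can be turned into a length, and for the ranges quoted in the first message that length is always 16384 bytes, the same value as nr=16384 and max_size=16384 in the add_extent_rec() frame above. A minimal Python sketch, assuming the error lines have exactly the format quoted in this thread (the helper name extent_lengths is made up for illustration and is not part of btrfs-progs):

```python
import re

# Matches lines such as:
#   bad extent [5993525264384, 5993525280768), type mismatch with chunk
BAD_EXTENT = re.compile(r"bad extent \[(\d+), (\d+)\), type mismatch with chunk")

def extent_lengths(text):
    """Return (start, length) pairs for every 'bad extent' line in text."""
    return [(int(s), int(e) - int(s)) for s, e in BAD_EXTENT.findall(text)]

# Three lines copied from the first message of the thread.
sample = """\
bad extent [5993525264384, 5993525280768), type mismatch with chunk
bad extent [5993525280768, 5993525297152), type mismatch with chunk
bad extent [5993529442304, 5993529458688), type mismatch with chunk
"""

for start, length in extent_lengths(sample):
    print(start, length)
```

Run against the full btrfs check output, the same helper would give a quick way to see whether all of the roughly 600k reported extents share one size or not.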
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2015-11-24 0:46 ` Qu Wenruo 2015-11-24 1:53 ` Christoph Anton Mitterer @ 2015-11-24 17:39 ` David Sterba 1 sibling, 0 replies; 55+ messages in thread From: David Sterba @ 2015-11-24 17:39 UTC (permalink / raw) To: Qu Wenruo; +Cc: Christoph Anton Mitterer, linux-btrfs On Tue, Nov 24, 2015 at 08:46:03AM +0800, Qu Wenruo wrote: > > > Christoph Anton Mitterer wrote on 2015/11/23 19:12 +0100: > > On Mon, 2015-11-23 at 09:10 +0800, Qu Wenruo wrote: > >> Also, you won't want compiler to do extra optimization > > I did the following: > > $ export CFLAGS="-g -O0 -Wall -D_FORTIFY_SOURCE=2" > > Wow, I didn't ever know it's possible to override FORTIFY_SOURCE to > suppress the warning. FWIW, my tip: make EXTRA_CFLAGS='-g -O0 -U_FORTIFY_SOURCE' ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2015-11-22 2:04 ` Qu Wenruo 2015-11-22 6:56 ` Christoph Anton Mitterer @ 2015-11-22 10:17 ` Laurent Bonnaud 2015-11-23 1:00 ` Qu Wenruo 1 sibling, 1 reply; 55+ messages in thread From: Laurent Bonnaud @ 2015-11-22 10:17 UTC (permalink / raw) To: Qu Wenruo; +Cc: Lukas Pirl, linux-btrfs, calestyo On 22/11/2015 03:04, Qu Wenruo wrote: > If any of you can recompile btrfs-progs and use gdb to debug it, > would anyone please to investigate where did the wrong_chunk_type is > set? In the mean time my btrfs filesystem degraded Nov 20 18:10:53 irancy kernel: BTRFS: device label sauvegarde-IUT2 devid 1 transid 9056 /dev/sdb1 Nov 20 18:10:53 irancy kernel: BTRFS info (device sdb1): disk space caching is enabled Nov 20 18:10:56 irancy kernel: BTRFS: checking UUID tree Nov 20 18:12:06 irancy kernel: BTRFS info (device sdb1): disk space caching is enabled Nov 20 18:12:49 irancy kernel: BTRFS warning (device sdb1): block group 314635714560 has wrong amount of free space Nov 20 18:12:49 irancy kernel: BTRFS warning (device sdb1): failed to load free space cache for block group 314635714560, rebuild it now Nov 20 18:12:51 irancy kernel: BTRFS: Transaction aborted (error -17) Nov 20 18:12:51 irancy kernel: BTRFS: error (device sdb1) in btrfs_run_delayed_refs:2781: errno=-17 Object already exists Nov 20 18:12:51 irancy kernel: BTRFS info (device sdb1): forced readonly Nov 20 18:12:51 irancy kernel: BTRFS warning (device sdb1): Skipping commit of aborted transaction. 
Nov 20 18:12:51 irancy kernel: BTRFS: error (device sdb1) in cleanup_transaction:1710: errno=-17 Object already exists Nov 20 19:25:37 irancy kernel: BTRFS (device sdb1): parent transid verify failed on 353291255808 wanted 9058 found 9056 Nov 20 19:25:37 irancy kernel: BTRFS (device sdb1): parent transid verify failed on 353291255808 wanted 9058 found 9056 Nov 20 19:25:37 irancy kernel: BTRFS (device sdb1): parent transid verify failed on 353291255808 wanted 9058 found 9056 Nov 20 19:25:37 irancy kernel: BTRFS (device sdb1): parent transid verify failed on 353291255808 wanted 9058 found 9056 [...] Are you interested in the output of btrfs-image ? No idea of the size (I am running it currently) but here some info about its size: # btrfs fi df /mnt/sauvegarde/ Data, single: total=1.81TiB, used=1.79TiB System, DUP: total=32.00MiB, used=240.00KiB Metadata, DUP: total=7.00GiB, used=5.38GiB GlobalReserve, single: total=512.00MiB, used=0.00B -- Laurent. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2015-11-22 10:17 ` Laurent Bonnaud @ 2015-11-23 1:00 ` Qu Wenruo 2015-11-24 13:15 ` Laurent Bonnaud 0 siblings, 1 reply; 55+ messages in thread From: Qu Wenruo @ 2015-11-23 1:00 UTC (permalink / raw) To: Laurent Bonnaud, Qu Wenruo; +Cc: Lukas Pirl, linux-btrfs, calestyo Laurent Bonnaud wrote on 2015/11/22 11:17 +0100: > On 22/11/2015 03:04, Qu Wenruo wrote: > >> If any of you can recompile btrfs-progs and use gdb to debug it, >> would anyone please to investigate where did the wrong_chunk_type is >> set? > > In the mean time my btrfs filesystem degraded > > Nov 20 18:10:53 irancy kernel: BTRFS: device label sauvegarde-IUT2 devid 1 transid 9056 /dev/sdb1 > Nov 20 18:10:53 irancy kernel: BTRFS info (device sdb1): disk space caching is enabled > Nov 20 18:10:56 irancy kernel: BTRFS: checking UUID tree > Nov 20 18:12:06 irancy kernel: BTRFS info (device sdb1): disk space caching is enabled > Nov 20 18:12:49 irancy kernel: BTRFS warning (device sdb1): block group 314635714560 has wrong amount of free space > Nov 20 18:12:49 irancy kernel: BTRFS warning (device sdb1): failed to load free space cache for block group 314635714560, rebuild it now > Nov 20 18:12:51 irancy kernel: BTRFS: Transaction aborted (error -17) > Nov 20 18:12:51 irancy kernel: BTRFS: error (device sdb1) in btrfs_run_delayed_refs:2781: errno=-17 Object already exists I'm not sure about the free space error for the block group, but the btrfs_run_delayed_refs problem seems to be a regression from 4.2-rc1. It should be fixed in 4.3-rc1; alternatively, just revert to 4.1, which predates my qgroup rework (the change that came with a delayed_ref regression). > Nov 20 18:12:51 irancy kernel: BTRFS info (device sdb1): forced readonly > Nov 20 18:12:51 irancy kernel: BTRFS warning (device sdb1): Skipping commit of aborted transaction. 
> Nov 20 18:12:51 irancy kernel: BTRFS: error (device sdb1) in cleanup_transaction:1710: errno=-17 Object already exists > Nov 20 19:25:37 irancy kernel: BTRFS (device sdb1): parent transid verify failed on 353291255808 wanted 9058 found 9056 > Nov 20 19:25:37 irancy kernel: BTRFS (device sdb1): parent transid verify failed on 353291255808 wanted 9058 found 9056 > Nov 20 19:25:37 irancy kernel: BTRFS (device sdb1): parent transid verify failed on 353291255808 wanted 9058 found 9056 > Nov 20 19:25:37 irancy kernel: BTRFS (device sdb1): parent transid verify failed on 353291255808 wanted 9058 found 9056 > [...] > > Are you interested in the output of btrfs-image ? No idea of the size (I am running it currently) but here some info about its size: > > # btrfs fi df /mnt/sauvegarde/ > Data, single: total=1.81TiB, used=1.79TiB > System, DUP: total=32.00MiB, used=240.00KiB > Metadata, DUP: total=7.00GiB, used=5.38GiB > GlobalReserve, single: total=512.00MiB, used=0.00B > Considering the size, I'd like not to touch the dump, metadata is over 5G, and I think it's not related to on-disk data, but runtime problem like I mentioned above. Thanks, Qu ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2015-11-23 1:00 ` Qu Wenruo @ 2015-11-24 13:15 ` Laurent Bonnaud 2015-11-24 23:46 ` Qu Wenruo 2015-11-24 23:53 ` Qu Wenruo 0 siblings, 2 replies; 55+ messages in thread From: Laurent Bonnaud @ 2015-11-24 13:15 UTC (permalink / raw) To: Qu Wenruo, Qu Wenruo; +Cc: Lukas Pirl, linux-btrfs, calestyo On 23/11/2015 02:00, Qu Wenruo wrote: > Considering the size, I'd like not to touch the dump, metadata is over 5G, It is only 2GB once compressed :>. > and I think it's not related to on-disk data, but runtime problem like I mentioned above. To test this hypothesis I did the following: - reboot the machine with a 4.3.0 kernel from Debian experimental - run "du" on the btrfs FS as a quick sanity check The kernel went read-only again with the following kernel errors: [ 5759.890934] BTRFS info (device sdb1): disk space caching is enabled [ 5773.278244] BTRFS warning (device sdb1): block group 314635714560 has wrong amount of free space [ 5773.278247] BTRFS warning (device sdb1): failed to load free space cache for block group 314635714560, rebuild it now [ 5773.947885] ------------[ cut here ]------------ [ 5773.947908] WARNING: CPU: 0 PID: 2546 at /build/linux-7sjCdl/linux-4.3/fs/btrfs/extent-tree.c:2851 btrfs_run_delayed_refs+0x26b/0x2a0 [btrfs]() [ 5773.947909] BTRFS: Transaction aborted (error -17) [ 5773.947910] Modules linked in: xt_multiport cpufreq_conservative cpufreq_powersave cpufreq_userspace cpufreq_stats ip6table_filter ip6_tables iptable_filter ip_tables x_tables binfmt_misc snd_hda_codec_analog snd_hda_codec_generic dell_wmi iTCO_wdt iTCO_vendor_support sparse_keymap evdev coretemp kvm_intel dcdbas snd_hda_intel snd_hda_codec snd_hda_core kvm snd_hwdep i915 snd_pcm_oss snd_mixer_oss pcspkr sg snd_pcm psmouse lpc_ich mfd_core serio_raw i2c_i801 snd_timer snd shpchp tpm_tis video drm_kms_helper drm soundcore mei_me mei i2c_algo_bit wmi tpm 8250_fintek button acpi_cpufreq processor ib_iser 
rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi drbd lru_cache libcrc32c parport_pc ppdev lp parport loop dm_crypt dm_mod autofs4 ext4 crc16 mbcache [ 5773.947951] jbd2 crc32c_generic btrfs xor raid6_pq md_mod ses enclosure hid_generic usbhid hid sd_mod uas usb_storage ahci libahci ata_generic libata scsi_mod e1000e ptp pps_core ehci_pci uhci_hcd ehci_hcd usbcore usb_common [ 5773.947967] CPU: 0 PID: 2546 Comm: kworker/u16:2 Not tainted 4.3.0-trunk-amd64 #1 Debian 4.3-1~exp1 [ 5773.947968] Hardware name: Dell Inc. OptiPlex 780 /0C27VV, BIOS A08 01/21/2011 [ 5773.947981] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs] [ 5773.947983] ffffffffa02a8250 ffffffff812c53a9 ffff8800af283d30 ffffffff8106ebad [ 5773.947985] ffff8800ace5eae0 ffff8800af283d80 ffff8800ac6ade70 ffff8800ac6add10 [ 5773.947987] 0000000000000020 ffffffff8106ec2c ffffffffa02a8420 0000000000000020 [ 5773.947989] Call Trace: [ 5773.947994] [<ffffffff812c53a9>] ? dump_stack+0x40/0x57 [ 5773.947997] [<ffffffff8106ebad>] ? warn_slowpath_common+0x7d/0xb0 [ 5773.947999] [<ffffffff8106ec2c>] ? warn_slowpath_fmt+0x4c/0x50 [ 5773.948019] [<ffffffffa021908b>] ? btrfs_run_delayed_refs+0x26b/0x2a0 [btrfs] [ 5773.948027] [<ffffffffa02190f2>] ? delayed_ref_async_start+0x32/0x80 [btrfs] [ 5773.948039] [<ffffffffa025bd98>] ? btrfs_scrubparity_helper+0xc8/0x260 [btrfs] [ 5773.948041] [<ffffffff810851df>] ? process_one_work+0x19f/0x3d0 [ 5773.948043] [<ffffffff8108545d>] ? worker_thread+0x4d/0x450 [ 5773.948044] [<ffffffff81085410>] ? process_one_work+0x3d0/0x3d0 [ 5773.948046] [<ffffffff8108af5d>] ? kthread+0xbd/0xe0 [ 5773.948048] [<ffffffff8108aea0>] ? kthread_create_on_node+0x170/0x170 [ 5773.948051] [<ffffffff81553c5f>] ? ret_from_fork+0x3f/0x70 [ 5773.948053] [<ffffffff8108aea0>] ? 
kthread_create_on_node+0x170/0x170 [ 5773.948054] ---[ end trace 654b175f2543b4e4 ]--- [ 5773.948057] BTRFS: error (device sdb1) in btrfs_run_delayed_refs:2851: errno=-17 Object already exists [ 5773.948092] BTRFS info (device sdb1): forced readonly [ 5936.235238] perf interrupt took too long (2502 > 2500), lowering kernel.perf_event_max_sample_rate to 50000 [ 6427.280125] BTRFS (device sdb1): parent transid verify failed on 353291255808 wanted 9058 found 9056 [ 6427.288873] BTRFS (device sdb1): parent transid verify failed on 353291255808 wanted 9058 found 9056 [ 6427.381126] BTRFS (device sdb1): parent transid verify failed on 353291255808 wanted 9058 found 9056 [ 6427.381747] BTRFS (device sdb1): parent transid verify failed on 353291255808 wanted 9058 found 9056 [...] Are you interested in the btrfs-image output now ? -- Laurent. ^ permalink raw reply [flat|nested] 55+ messages in thread
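For readers decoding the log above: the -17 in "Transaction aborted (error -17)" is a negated errno value, and on Linux errno 17 is EEXIST. The "Object already exists" wording appears to be btrfs's own text for that code; the C library's generic description of the same errno is worded differently. A small Python sketch, assuming a Linux system (errno numbering is platform-specific, and the helper name describe_kernel_error is hypothetical):

```python
import errno
import os

def describe_kernel_error(code):
    """Map a negative kernel-style return code to (symbolic name, libc text)."""
    number = abs(code)
    # errno.errorcode maps numeric values to names such as "EEXIST".
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)

print(describe_kernel_error(-17))
```

This only translates the number; it says nothing about why the delayed-ref code found an object that already existed.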
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2015-11-24 13:15 ` Laurent Bonnaud @ 2015-11-24 23:46 ` Qu Wenruo 2015-11-25 9:05 ` Laurent Bonnaud 2015-11-24 23:53 ` Qu Wenruo 1 sibling, 1 reply; 55+ messages in thread From: Qu Wenruo @ 2015-11-24 23:46 UTC (permalink / raw) To: Laurent Bonnaud, Qu Wenruo; +Cc: Lukas Pirl, linux-btrfs, calestyo On 11/24/2015 09:15 PM, Laurent Bonnaud wrote: > On 23/11/2015 02:00, Qu Wenruo wrote: > >> Considering the size, I'd like not to touch the dump, metadata is over 5G, > > It is only 2GB once compressed :>. That seems small enough; I'll try to download it, as it would be very useful for debugging. > >> and I think it's not related to on-disk data, but runtime problem like I mentioned above. > > To test this hypothesis I did the following: > > - reboot the machine with a 4.3.0 kernel from Debian experimental > - run "du" on the btrfs FS as a quick sanity check Nice reproducer. Is it 100% reproducible, or does it only trigger some of the time? 
Thanks, Qu > > The kernel went read-only again with the following kernel errors: > > [ 5759.890934] BTRFS info (device sdb1): disk space caching is enabled > [ 5773.278244] BTRFS warning (device sdb1): block group 314635714560 has wrong amount of free space > [ 5773.278247] BTRFS warning (device sdb1): failed to load free space cache for block group 314635714560, rebuild it now > [ 5773.947885] ------------[ cut here ]------------ > [ 5773.947908] WARNING: CPU: 0 PID: 2546 at /build/linux-7sjCdl/linux-4.3/fs/btrfs/extent-tree.c:2851 btrfs_run_delayed_refs+0x26b/0x2a0 [btrfs]() > [ 5773.947909] BTRFS: Transaction aborted (error -17) > [ 5773.947910] Modules linked in: xt_multiport cpufreq_conservative cpufreq_powersave cpufreq_userspace cpufreq_stats ip6table_filter ip6_tables iptable_filter ip_tables x_tables binfmt_misc snd_hda_codec_analog snd_hda_codec_generic dell_wmi iTCO_wdt iTCO_vendor_support sparse_keymap evdev coretemp kvm_intel dcdbas snd_hda_intel snd_hda_codec snd_hda_core kvm snd_hwdep i915 snd_pcm_oss snd_mixer_oss pcspkr sg snd_pcm psmouse lpc_ich mfd_core serio_raw i2c_i801 snd_timer snd shpchp tpm_tis video drm_kms_helper drm soundcore mei_me mei i2c_algo_bit wmi tpm 8250_fintek button acpi_cpufreq processor ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi drbd lru_cache libcrc32c parport_pc ppdev lp parport loop dm_crypt dm_mod autofs4 ext4 crc16 mbcache > [ 5773.947951] jbd2 crc32c_generic btrfs xor raid6_pq md_mod ses enclosure hid_generic usbhid hid sd_mod uas usb_storage ahci libahci ata_generic libata scsi_mod e1000e ptp pps_core ehci_pci uhci_hcd ehci_hcd usbcore usb_common > [ 5773.947967] CPU: 0 PID: 2546 Comm: kworker/u16:2 Not tainted 4.3.0-trunk-amd64 #1 Debian 4.3-1~exp1 > [ 5773.947968] Hardware name: Dell Inc. 
OptiPlex 780 /0C27VV, BIOS A08 01/21/2011 > [ 5773.947981] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs] > [ 5773.947983] ffffffffa02a8250 ffffffff812c53a9 ffff8800af283d30 ffffffff8106ebad > [ 5773.947985] ffff8800ace5eae0 ffff8800af283d80 ffff8800ac6ade70 ffff8800ac6add10 > [ 5773.947987] 0000000000000020 ffffffff8106ec2c ffffffffa02a8420 0000000000000020 > [ 5773.947989] Call Trace: > [ 5773.947994] [<ffffffff812c53a9>] ? dump_stack+0x40/0x57 > [ 5773.947997] [<ffffffff8106ebad>] ? warn_slowpath_common+0x7d/0xb0 > [ 5773.947999] [<ffffffff8106ec2c>] ? warn_slowpath_fmt+0x4c/0x50 > [ 5773.948019] [<ffffffffa021908b>] ? btrfs_run_delayed_refs+0x26b/0x2a0 [btrfs] > [ 5773.948027] [<ffffffffa02190f2>] ? delayed_ref_async_start+0x32/0x80 [btrfs] > [ 5773.948039] [<ffffffffa025bd98>] ? btrfs_scrubparity_helper+0xc8/0x260 [btrfs] > [ 5773.948041] [<ffffffff810851df>] ? process_one_work+0x19f/0x3d0 > [ 5773.948043] [<ffffffff8108545d>] ? worker_thread+0x4d/0x450 > [ 5773.948044] [<ffffffff81085410>] ? process_one_work+0x3d0/0x3d0 > [ 5773.948046] [<ffffffff8108af5d>] ? kthread+0xbd/0xe0 > [ 5773.948048] [<ffffffff8108aea0>] ? kthread_create_on_node+0x170/0x170 > [ 5773.948051] [<ffffffff81553c5f>] ? ret_from_fork+0x3f/0x70 > [ 5773.948053] [<ffffffff8108aea0>] ? 
kthread_create_on_node+0x170/0x170 > [ 5773.948054] ---[ end trace 654b175f2543b4e4 ]--- > [ 5773.948057] BTRFS: error (device sdb1) in btrfs_run_delayed_refs:2851: errno=-17 Object already exists > [ 5773.948092] BTRFS info (device sdb1): forced readonly > [ 5936.235238] perf interrupt took too long (2502 > 2500), lowering kernel.perf_event_max_sample_rate to 50000 > [ 6427.280125] BTRFS (device sdb1): parent transid verify failed on 353291255808 wanted 9058 found 9056 > [ 6427.288873] BTRFS (device sdb1): parent transid verify failed on 353291255808 wanted 9058 found 9056 > [ 6427.381126] BTRFS (device sdb1): parent transid verify failed on 353291255808 wanted 9058 found 9056 > [ 6427.381747] BTRFS (device sdb1): parent transid verify failed on 353291255808 wanted 9058 found 9056 > [...] > > Are you interested in the btrfs-image output now ? > ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2015-11-24 23:46 ` Qu Wenruo @ 2015-11-25 9:05 ` Laurent Bonnaud 2015-12-03 17:13 ` Laurent Bonnaud 0 siblings, 1 reply; 55+ messages in thread From: Laurent Bonnaud @ 2015-11-25 9:05 UTC (permalink / raw) To: Qu Wenruo, Qu Wenruo; +Cc: Lukas Pirl, linux-btrfs, calestyo On 25/11/2015 00:46, Qu Wenruo wrote: > That seems small enough; I'll try to download it, as it would be very useful for debugging. Thanks ! > Nice reproducer. > Is it 100% reproducible, or does it only trigger some of the time? I tried a second time and got a similar kernel backtrace. > BTW, did you encounter the same btrfsck error ("type mismatch with chunk") as Christoph? Yes, that's what drew me to this discussion :>. I also tried the --repair option and that is perhaps what corrupted my FS. -- Laurent. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2015-11-25 9:05 ` Laurent Bonnaud @ 2015-12-03 17:13 ` Laurent Bonnaud 2015-12-04 0:47 ` Qu Wenruo 0 siblings, 1 reply; 55+ messages in thread From: Laurent Bonnaud @ 2015-12-03 17:13 UTC (permalink / raw) To: Qu Wenruo, Qu Wenruo; +Cc: Lukas Pirl, linux-btrfs, calestyo On 25/11/2015 10:05, Laurent Bonnaud wrote: >> > Nice reproducer. >> > Is it 100% reproducible, or does it only trigger some of the time? > I tried a second time and got a similar kernel backtrace. Hi, any news since you downloaded my FS image ? I kept my corrupted FS in case you wanted more info, but since I use this disk as a backup, it deprives me of one backup. So would you mind if I replace my corrupted FS with a new one ? Thanks for investigating this bug, -- Laurent. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2015-12-03 17:13 ` Laurent Bonnaud @ 2015-12-04 0:47 ` Qu Wenruo 2015-12-11 13:22 ` Laurent Bonnaud 2015-12-11 14:21 ` Laurent Bonnaud 0 siblings, 2 replies; 55+ messages in thread From: Qu Wenruo @ 2015-12-04 0:47 UTC (permalink / raw) To: Laurent Bonnaud, Qu Wenruo; +Cc: Lukas Pirl, linux-btrfs, calestyo Laurent Bonnaud wrote on 2015/12/03 18:13 +0100: > On 25/11/2015 10:05, Laurent Bonnaud wrote: > >>>> Nice reproducer. >>>> Is it 100% reproducible, or does it only trigger some of the time? >> I tried a second time and got a similar kernel backtrace. > > Hi, > > any news since you downloaded my FS image ? > > I kept my corrupted FS in case you wanted more info, but since I use this disk as a backup, it deprives me of one backup. So would you mind if I replace my corrupted FS with a new one ? > > Thanks for investigating this bug, > The chunk mismatch problem should be resolved already, as the patch is merged in David's devel branch. But for the kernel transaction abort, I'm sorry, there is no good clue yet. I can only offer generic advice, like updating the kernel to a recent rc, as it has some new fixes which may help in your case. You could also run btrfsck --init-extent-tree with the latest btrfs-progs and hope some miracle will happen... Thanks, Qu ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2015-12-04 0:47 ` Qu Wenruo @ 2015-12-11 13:22 ` Laurent Bonnaud 2015-12-11 14:21 ` Laurent Bonnaud 1 sibling, 0 replies; 55+ messages in thread From: Laurent Bonnaud @ 2015-12-11 13:22 UTC (permalink / raw) To: Qu Wenruo, Qu Wenruo; +Cc: Lukas Pirl, linux-btrfs, calestyo On 04/12/2015 01:47, Qu Wenruo wrote: > The chunk mismatch problem should be resolved already, as the patch is merged in david's devel branch. Great ! I am looking forward to a new release with this bug fix... > But for the kernel abort transaction, I'm sorry there is no good clue yet. > Only generic advice like update kernel to recent rc, as it has some new fixes which may help your case. I tested again my "du -s" test with kernel 4.4-rc4 and got the following backtrace. Is this a normal error message or an anomaly in the kernel ? [ 194.509475] BTRFS: device label sauvegarde-IUT2 devid 1 transid 9057 /dev/sdb1 [ 194.599397] BTRFS info (device sdb1): disk space caching is enabled [ 227.764561] BTRFS warning (device sdb1): block group 314635714560 has wrong amount of free space [ 227.764564] BTRFS warning (device sdb1): failed to load free space cache for block group 314635714560, rebuild it now [ 228.108072] ------------[ cut here ]------------ [ 228.108089] WARNING: CPU: 0 PID: 3303 at /home/kernel/COD/linux/fs/btrfs/extent-tree.c:2927 btrfs_run_delayed_refs+0x26b/0x2a0 [btrfs]() [ 228.108090] BTRFS: Transaction aborted (error -17) [ 228.108091] Modules linked in: ses enclosure uas usb_storage xt_CHECKSUM iptable_mangle xt_tcpudp rfcomm xt_addrtype xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter ip_tables x_tables nf_nat nf_conntrack bridge stp llc dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c binfmt_misc bnep drbg ansi_cprng dm_crypt dell_wmi sparse_keymap intel_rapl dell_rbtn snd_hda_codec_hdmi iosf_mbi dell_laptop 
x86_pkg_temp_thermal intel_powerclamp dcdbas snd_hda_codec_idt coretemp dell_smm_hwmon snd_hda_codec_generic crct10dif_pclmul crc32_pclmul snd_hda_intel snd_hda_codec aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd snd_hda_core snd_hwdep arc4 btusb joydev input_leds btrtl btbcm btintel bluetooth iwlmvm [ 228.108120] serio_raw mac80211 snd_pcm iwlwifi snd_seq_midi snd_seq_midi_event snd_rawmidi cfg80211 snd_seq snd_seq_device snd_timer lpc_ich mei_me snd mei soundcore kvm_intel shpchp kvm irqbypass dell_smo8800 8250_fintek mac_hid tpm_rng parport_pc ppdev lp parport autofs4 btrfs xor raid6_pq hid_generic usbhid hid i915 psmouse firewire_ohci ahci libahci i2c_algo_bit firewire_core drm_kms_helper sdhci_pci e1000e sdhci crc_itu_t syscopyarea sysfillrect sysimgblt ptp fb_sys_fops pps_core drm wmi video fjes [ 228.108144] CPU: 0 PID: 3303 Comm: btrfs-transacti Not tainted 4.4.0-040400rc4-generic #201512061930 [ 228.108145] Hardware name: Dell Inc. Latitude E6520/0NVF5K, BIOS A19 11/14/2013 [ 228.108146] 0000000000000000 000000006496ca82 ffff88007c257d08 ffffffff813c8ab4 [ 228.108148] ffff88007c257d50 ffff88007c257d40 ffffffff8107d772 ffff8801b3b9b000 [ 228.108149] ffff880221f92000 ffff8801c4034170 ffffffffffffffff ffff880203a2d280 [ 228.108151] Call Trace: [ 228.108155] [<ffffffff813c8ab4>] dump_stack+0x44/0x60 [ 228.108158] [<ffffffff8107d772>] warn_slowpath_common+0x82/0xc0 [ 228.108159] [<ffffffff8107d80c>] warn_slowpath_fmt+0x5c/0x80 [ 228.108166] [<ffffffffc0345410>] ? __btrfs_run_delayed_refs+0xc40/0x1150 [btrfs] [ 228.108173] [<ffffffffc034893b>] btrfs_run_delayed_refs+0x26b/0x2a0 [btrfs] [ 228.108180] [<ffffffffc035b632>] ? btrfs_wait_pending_ordered+0x22/0x90 [btrfs] [ 228.108188] [<ffffffffc035dc42>] btrfs_commit_transaction+0x4d2/0xa70 [btrfs] [ 228.108195] [<ffffffffc0358e79>] transaction_kthread+0x229/0x240 [btrfs] [ 228.108201] [<ffffffffc0358c50>] ? 
btrfs_cleanup_transaction+0x550/0x550 [btrfs] [ 228.108204] [<ffffffff8109c8b8>] kthread+0xd8/0xf0 [ 228.108206] [<ffffffff8109c7e0>] ? kthread_create_on_node+0x1a0/0x1a0 [ 228.108210] [<ffffffff817fc58f>] ret_from_fork+0x3f/0x70 [ 228.108211] [<ffffffff8109c7e0>] ? kthread_create_on_node+0x1a0/0x1a0 [ 228.108213] ---[ end trace 39e2e9062b36c29e ]--- [ 228.108215] BTRFS: error (device sdb1) in btrfs_run_delayed_refs:2927: errno=-17 Object already exists [ 228.108217] BTRFS info (device sdb1): forced readonly [ 228.108219] BTRFS warning (device sdb1): Skipping commit of aborted transaction. [ 228.108220] BTRFS: error (device sdb1) in cleanup_transaction:1747: errno=-17 Object already exists -- Laurent. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2015-12-04 0:47 ` Qu Wenruo 2015-12-11 13:22 ` Laurent Bonnaud @ 2015-12-11 14:21 ` Laurent Bonnaud 2015-12-14 0:53 ` Qu Wenruo 2015-12-14 12:47 ` Laurent Bonnaud 1 sibling, 2 replies; 55+ messages in thread From: Laurent Bonnaud @ 2015-12-11 14:21 UTC (permalink / raw) To: Qu Wenruo, Qu Wenruo; +Cc: Lukas Pirl, linux-btrfs, calestyo On 04/12/2015 01:47, Qu Wenruo wrote: > [run btrfsck] I did that too, with an old btrfsck version (4.0), and it found the following errors. Then I ran btrfsck --repair, and I was able to complete my "du -s" test. The next step will be to run a "btrfs scrub" to check if data loss did happen... # btrfsck /dev/sdb1 Checking filesystem on /dev/sdb1 UUID: f6d4db2e-962b-42db-87b1-35064a4d38e0 checking extents checking free space cache block group 314635714560 has wrong amount of free spacefailed to load free space cache for block group 314635714560 There is no free space entry for 353290420224-353290764288 There is no free space entry for 353290420224-353827291136 cache appears valid but isnt 353290420224 There is no free space entry for 541732175872-541732208640 There is no free space entry for 541732175872-542268981248 cache appears valid but isnt 541732110336 Wanted bytes 32768, found 262144 for off 1008273178624 Wanted bytes 536625152, found 262144 for off 1008273178624 cache appears valid but isnt 1008272932864 block group 1475887497216 has wrong amount of free spacefailed to load free space cache for block group 1475887497216 block group 1823242977280 has wrong amount of free spacefailed to load free space cache for block group 1823242977280 There is no free space entry for 1827001073664-1827002810368 There is no free space entry for 1827001073664-1827537944576 cache appears valid but isnt 1827001073664 There is no free space entry for 1969305501696-1969305518080 There is no free space entry for 1969305501696-1969842290688 cache appears valid but isnt 
1969305419776 There is no free space entry for 2021381947392-2021381963776 There is no free space entry for 2021381947392-2021918769152 cache appears valid but isnt 2021381898240 There is no free space entry for 2027287478272-2027287724032 There is no free space entry for 2027287478272-2027824349184 cache appears valid but isnt 2027287478272 There is no free space entry for 2143889227776-2143889244160 There is no free space entry for 2143889227776-2144426000384 cache appears valid but isnt 2143889129472 found 1977224107644 bytes used err is -22 total csum bytes: 1925245108 total tree bytes: 5773115392 total fs tree bytes: 3504685056 total extent tree bytes: 156975104 btree space waste bytes: 780048699 file data blocks allocated: 1971884707840 referenced 1971875930112 btrfs-progs v4.0 -- Laurent. ^ permalink raw reply [flat|nested] 55+ messages in thread
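The raw byte counts in this btrfsck summary are easier to compare with the earlier "btrfs fi df" figures once converted to binary units. A rough Python sketch (the helper name to_binary_units is made up for illustration); it only does arithmetic on numbers copied from the output above:

```python
def to_binary_units(n_bytes):
    """Format a byte count using 1024-based units (KiB, MiB, GiB, TiB)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    value = float(n_bytes)
    for unit in units:
        # Stop at the first unit where the value fits, or at the largest unit.
        if value < 1024 or unit == units[-1]:
            return f"{value:.2f}{unit}"
        value /= 1024

print(to_binary_units(1977224107644))  # "found ... bytes used"
print(to_binary_units(5773115392))     # "total tree bytes"
```

With these inputs, "total tree bytes" comes out at about 5.38GiB, in line with the "Metadata, DUP: ... used=5.38GiB" figure reported by btrfs fi df earlier in the thread.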
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2015-12-11 14:21 ` Laurent Bonnaud @ 2015-12-14 0:53 ` Qu Wenruo 2015-12-14 12:47 ` Laurent Bonnaud 1 sibling, 0 replies; 55+ messages in thread From: Qu Wenruo @ 2015-12-14 0:53 UTC (permalink / raw) To: Laurent Bonnaud, Qu Wenruo; +Cc: Lukas Pirl, linux-btrfs, calestyo Laurent Bonnaud wrote on 2015/12/11 15:21 +0100: > On 04/12/2015 01:47, Qu Wenruo wrote: > >> [run btrfsck] > > I did that too, with an old btrfsck version (4.0), and it found the following errors. > Then I ran btrfsck --repair, and I was able to complete my "du -s" test. > The next step will be to run a "btrfs scrub" to check if data loss did happen... Glad to hear that btrfsck --repair can fix it. It seems to be a space cache problem; normally, mounting with -o clear_cache should handle it, but btrfsck --repair should also handle it well. Thanks, Qu > > # btrfsck /dev/sdb1 > Checking filesystem on /dev/sdb1 > UUID: f6d4db2e-962b-42db-87b1-35064a4d38e0 > checking extents > checking free space cache > block group 314635714560 has wrong amount of free spacefailed to load free space cache for block group 314635714560 > There is no free space entry for 353290420224-353290764288 > There is no free space entry for 353290420224-353827291136 > cache appears valid but isnt 353290420224 > There is no free space entry for 541732175872-541732208640 > There is no free space entry for 541732175872-542268981248 > cache appears valid but isnt 541732110336 > Wanted bytes 32768, found 262144 for off 1008273178624 > Wanted bytes 536625152, found 262144 for off 1008273178624 > cache appears valid but isnt 1008272932864 > block group 1475887497216 has wrong amount of free spacefailed to load free space cache for block group 1475887497216 > block group 1823242977280 has wrong amount of free spacefailed to load free space cache for block group 1823242977280 > There is no free space entry for 1827001073664-1827002810368 > There is no free space 
entry for 1827001073664-1827537944576 > cache appears valid but isnt 1827001073664 > There is no free space entry for 1969305501696-1969305518080 > There is no free space entry for 1969305501696-1969842290688 > cache appears valid but isnt 1969305419776 > There is no free space entry for 2021381947392-2021381963776 > There is no free space entry for 2021381947392-2021918769152 > cache appears valid but isnt 2021381898240 > There is no free space entry for 2027287478272-2027287724032 > There is no free space entry for 2027287478272-2027824349184 > cache appears valid but isnt 2027287478272 > There is no free space entry for 2143889227776-2143889244160 > There is no free space entry for 2143889227776-2144426000384 > cache appears valid but isnt 2143889129472 > found 1977224107644 bytes used err is -22 > total csum bytes: 1925245108 > total tree bytes: 5773115392 > total fs tree bytes: 3504685056 > total extent tree bytes: 156975104 > btree space waste bytes: 780048699 > file data blocks allocated: 1971884707840 > referenced 1971875930112 > btrfs-progs v4.0 > ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2015-12-11 14:21 ` Laurent Bonnaud 2015-12-14 0:53 ` Qu Wenruo @ 2015-12-14 12:47 ` Laurent Bonnaud 2015-12-15 1:16 ` Qu Wenruo 1 sibling, 1 reply; 55+ messages in thread From: Laurent Bonnaud @ 2015-12-14 12:47 UTC (permalink / raw) To: Qu Wenruo, Qu Wenruo; +Cc: Lukas Pirl, linux-btrfs, calestyo On 11/12/2015 15:21, Laurent Bonnaud wrote: > The next step will be to run a "btrfs scrub" to check if data loss did happen... Scrubbing is now finished and it detected no errors. -- Laurent. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2015-12-14 12:47 ` Laurent Bonnaud @ 2015-12-15 1:16 ` Qu Wenruo 0 siblings, 0 replies; 55+ messages in thread From: Qu Wenruo @ 2015-12-15 1:16 UTC (permalink / raw) To: Laurent Bonnaud, Qu Wenruo; +Cc: Lukas Pirl, linux-btrfs, calestyo Laurent Bonnaud wrote on 2015/12/14 13:47 +0100: > On 11/12/2015 15:21, Laurent Bonnaud wrote: > >> The next step will be to run a "btrfs scrub" to check if data loss did happen... > > Scrubbing is now finished and it detected no errors. > Glad to hear that. Your fs should be OK now. Thanks, Qu ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2015-11-24 13:15 ` Laurent Bonnaud 2015-11-24 23:46 ` Qu Wenruo @ 2015-11-24 23:53 ` Qu Wenruo 1 sibling, 0 replies; 55+ messages in thread From: Qu Wenruo @ 2015-11-24 23:53 UTC (permalink / raw) To: Laurent Bonnaud, Qu Wenruo; +Cc: Lukas Pirl, linux-btrfs, calestyo On 11/24/2015 09:15 PM, Laurent Bonnaud wrote: > On 23/11/2015 02:00, Qu Wenruo wrote: > >> Considering the size, I'd like not to touch the dump, metadata is over 5G, > > It is only 2GB once compressed :>. > >> and I think it's not related to on-disk data, but runtime problem like I mentioned above. > > To test this hypothesis I did the following: > > - reboot the machine with a 4.3.0 kernel from Debian experimental > - run "du" on the btrfs FS as a quick sanity check > > The kernel went read-only again with the following kernel errors: > > [ 5759.890934] BTRFS info (device sdb1): disk space caching is enabled > [ 5773.278244] BTRFS warning (device sdb1): block group 314635714560 has wrong amount of free space > [ 5773.278247] BTRFS warning (device sdb1): failed to load free space cache for block group 314635714560, rebuild it now > [ 5773.947885] ------------[ cut here ]------------ > [ 5773.947908] WARNING: CPU: 0 PID: 2546 at /build/linux-7sjCdl/linux-4.3/fs/btrfs/extent-tree.c:2851 btrfs_run_delayed_refs+0x26b/0x2a0 [btrfs]() > [ 5773.947909] BTRFS: Transaction aborted (error -17) > [ 5773.947910] Modules linked in: xt_multiport cpufreq_conservative cpufreq_powersave cpufreq_userspace cpufreq_stats ip6table_filter ip6_tables iptable_filter ip_tables x_tables binfmt_misc snd_hda_codec_analog snd_hda_codec_generic dell_wmi iTCO_wdt iTCO_vendor_support sparse_keymap evdev coretemp kvm_intel dcdbas snd_hda_intel snd_hda_codec snd_hda_core kvm snd_hwdep i915 snd_pcm_oss snd_mixer_oss pcspkr sg snd_pcm psmouse lpc_ich mfd_core serio_raw i2c_i801 snd_timer snd shpchp tpm_tis video drm_kms_helper drm soundcore mei_me mei i2c_algo_bit 
wmi tpm 8250_fintek button acpi_cpufreq processor ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi drbd lru_cache libcrc32c parport_pc ppdev lp parport loop dm_crypt dm_mod autofs4 ext4 crc16 mbcache > [ 5773.947951] jbd2 crc32c_generic btrfs xor raid6_pq md_mod ses enclosure hid_generic usbhid hid sd_mod uas usb_storage ahci libahci ata_generic libata scsi_mod e1000e ptp pps_core ehci_pci uhci_hcd ehci_hcd usbcore usb_common > [ 5773.947967] CPU: 0 PID: 2546 Comm: kworker/u16:2 Not tainted 4.3.0-trunk-amd64 #1 Debian 4.3-1~exp1 > [ 5773.947968] Hardware name: Dell Inc. OptiPlex 780 /0C27VV, BIOS A08 01/21/2011 > [ 5773.947981] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs] > [ 5773.947983] ffffffffa02a8250 ffffffff812c53a9 ffff8800af283d30 ffffffff8106ebad > [ 5773.947985] ffff8800ace5eae0 ffff8800af283d80 ffff8800ac6ade70 ffff8800ac6add10 > [ 5773.947987] 0000000000000020 ffffffff8106ec2c ffffffffa02a8420 0000000000000020 > [ 5773.947989] Call Trace: > [ 5773.947994] [<ffffffff812c53a9>] ? dump_stack+0x40/0x57 > [ 5773.947997] [<ffffffff8106ebad>] ? warn_slowpath_common+0x7d/0xb0 > [ 5773.947999] [<ffffffff8106ec2c>] ? warn_slowpath_fmt+0x4c/0x50 > [ 5773.948019] [<ffffffffa021908b>] ? btrfs_run_delayed_refs+0x26b/0x2a0 [btrfs] > [ 5773.948027] [<ffffffffa02190f2>] ? delayed_ref_async_start+0x32/0x80 [btrfs] > [ 5773.948039] [<ffffffffa025bd98>] ? btrfs_scrubparity_helper+0xc8/0x260 [btrfs] > [ 5773.948041] [<ffffffff810851df>] ? process_one_work+0x19f/0x3d0 > [ 5773.948043] [<ffffffff8108545d>] ? worker_thread+0x4d/0x450 > [ 5773.948044] [<ffffffff81085410>] ? process_one_work+0x3d0/0x3d0 > [ 5773.948046] [<ffffffff8108af5d>] ? kthread+0xbd/0xe0 > [ 5773.948048] [<ffffffff8108aea0>] ? kthread_create_on_node+0x170/0x170 > [ 5773.948051] [<ffffffff81553c5f>] ? ret_from_fork+0x3f/0x70 > [ 5773.948053] [<ffffffff8108aea0>] ? 
kthread_create_on_node+0x170/0x170 > [ 5773.948054] ---[ end trace 654b175f2543b4e4 ]--- > [ 5773.948057] BTRFS: error (device sdb1) in btrfs_run_delayed_refs:2851: errno=-17 Object already exists > [ 5773.948092] BTRFS info (device sdb1): forced readonly > [ 5936.235238] perf interrupt took too long (2502 > 2500), lowering kernel.perf_event_max_sample_rate to 50000 > [ 6427.280125] BTRFS (device sdb1): parent transid verify failed on 353291255808 wanted 9058 found 9056 > [ 6427.288873] BTRFS (device sdb1): parent transid verify failed on 353291255808 wanted 9058 found 9056 > [ 6427.381126] BTRFS (device sdb1): parent transid verify failed on 353291255808 wanted 9058 found 9056 > [ 6427.381747] BTRFS (device sdb1): parent transid verify failed on 353291255808 wanted 9058 found 9056 > [...] > > Are you interested in the btrfs-image output now ? > BTW, did you encounter the same btrfsck error, "type mismatch with chunk", that Christoph reported? If so, that would be a great help in debugging the btrfsck false alert. Thanks, Qu ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2015-11-12 21:51 bad extent [5993525264384, 5993525280768), type mismatch with chunk Christoph Anton Mitterer 2015-11-12 22:23 ` Christoph Anton Mitterer 2015-11-13 2:13 ` Qu Wenruo @ 2015-11-14 1:22 ` Qu Wenruo 2015-11-14 2:29 ` Christoph Anton Mitterer 2016-02-16 0:14 ` Ángel González 3 siblings, 1 reply; 55+ messages in thread From: Qu Wenruo @ 2015-11-14 1:22 UTC (permalink / raw) To: Christoph Anton Mitterer, linux-btrfs@vger.kernel.org On 2015-11-13 05:51, Christoph Anton Mitterer wrote: > Hey. > > I get these errors on fsck'ing a btrfs: > bad extent [5993525264384, 5993525280768), type mismatch with chunk > bad extent [5993525280768, 5993525297152), type mismatch with chunk > bad extent [5993525297152, 5993525313536), type mismatch with chunk > bad extent [5993529442304, 5993529458688), type mismatch with chunk > bad extent [5993529458688, 5993529475072), type mismatch with chunk > bad extent [5993530015744, 5993530032128), type mismatch with chunk > bad extent [5993530359808, 5993530376192), type mismatch with chunk > bad extent [5993530376192, 5993530392576), type mismatch with chunk > bad extent [5993530392576, 5993530408960), type mismatch with chunk > bad extent [5993530408960, 5993530425344), type mismatch with chunk > bad extent [5993531260928, 5993531277312), type mismatch with chunk > bad extent [5993531310080, 5993531326464), type mismatch with chunk I manually checked them all. Strangely, they are all OK... which is good news for you. They are all tree blocks and are all in a metadata block group. It seems to be a btrfsck false alert, but the result is very strange. If the type were wrong, all the extents inside the chunk should be reported as a type mismatch with the chunk. And according to the dump result, the reported extents are not contiguous: they have adjacent extents, but the adjacent ones are not reported. So there must be some other bug in btrfsck, probably specific to the non-skinny-metadata case, that triggers the false alert. Do you have any smaller btrfs with the same false alert? I'll check the code to find what's wrong, but if you have a small enough image, debugging will be much, much faster. Thanks, Qu > > What do they mean? And how to correct it without data loss (cause this > would be critical/precious data)? > > Oddly, I've fsck'ed the very same fs last time I've unmounted it (with > no errors)... and now this. > The only difference would be newer kernel and btrfsprogs. > > > Thanks, > Chris. > ^ permalink raw reply [flat|nested] 55+ messages in thread
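To make the observation about reported extents easier to check, here is a minimal sketch that groups the "bad extent" lines quoted in this message into runs of touching ranges. The helper names `parse_bad_extents` and `merge_contiguous` are made up for illustration and are not part of btrfs-progs; the sample lines are the twelve from the original report.

```python
import re

# Matches btrfsck lines of the form seen in this thread:
#   bad extent [5993525264384, 5993525280768), type mismatch with chunk
BAD_EXTENT = re.compile(r"bad extent \[(\d+), (\d+)\), type mismatch with chunk")

# The twelve lines quoted in the original report.
SAMPLE = """\
bad extent [5993525264384, 5993525280768), type mismatch with chunk
bad extent [5993525280768, 5993525297152), type mismatch with chunk
bad extent [5993525297152, 5993525313536), type mismatch with chunk
bad extent [5993529442304, 5993529458688), type mismatch with chunk
bad extent [5993529458688, 5993529475072), type mismatch with chunk
bad extent [5993530015744, 5993530032128), type mismatch with chunk
bad extent [5993530359808, 5993530376192), type mismatch with chunk
bad extent [5993530376192, 5993530392576), type mismatch with chunk
bad extent [5993530392576, 5993530408960), type mismatch with chunk
bad extent [5993530408960, 5993530425344), type mismatch with chunk
bad extent [5993531260928, 5993531277312), type mismatch with chunk
bad extent [5993531310080, 5993531326464), type mismatch with chunk
"""


def parse_bad_extents(text):
    """Return the sorted (start, end) byte ranges found in the given output."""
    return sorted((int(a), int(b)) for a, b in BAD_EXTENT.findall(text))


def merge_contiguous(extents):
    """Merge half-open ranges that touch, e.g. [a, b) followed by [b, c)."""
    runs = []
    for start, end in extents:
        if runs and runs[-1][1] == start:
            runs[-1] = (runs[-1][0], end)  # extend the current run
        else:
            runs.append((start, end))      # gap found: start a new run
    return runs


extents = parse_bad_extents(SAMPLE)
runs = merge_contiguous(extents)
for start, end in runs:
    print(f"[{start}, {end}) covers {(end - start) // 16384} tree block(s)")
```

On this sample the reported extents fall into several short, separate runs rather than one continuous range, which fits the description of scattered reports inside an otherwise healthy chunk.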
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2015-11-14 1:22 ` Qu Wenruo @ 2015-11-14 2:29 ` Christoph Anton Mitterer 2015-11-15 1:29 ` Qu Wenruo 0 siblings, 1 reply; 55+ messages in thread From: Christoph Anton Mitterer @ 2015-11-14 2:29 UTC (permalink / raw) To: Qu Wenruo, linux-btrfs@vger.kernel.org [-- Attachment #1: Type: text/plain, Size: 1891 bytes --] On Sat, 2015-11-14 at 09:22 +0800, Qu Wenruo wrote: > I manually checked them all. thanks a lot :-) > Strangely, they are all OK... which is good news for you. Oh man... you're soooo mean ;-D > They are all tree blocks and are all in a metadata block group. and I guess that's... expected/intended? > It seems to be a btrfsck false alert that's a relief (for me) Well I've already started to copy all files from the device to a new one... unfortunately I'll lose all older snapshots (at least on the new fs) but instead I get skinny-metadata, which wasn't the default back then. (being able to copy a full fs, with all subvols/snapshots is IMHO really something that should be worked on) > If the type were wrong, all the extents inside the chunk should be reported > as a type mismatch with the chunk. Isn't that the case? At least there are so many reported extents... > And according to the dump result, the reported extents are not > contiguous: they have adjacent extents, but the adjacent ones are not reported. I'm not so deep into btrfs... is this kinda expected and if not how could all this happen? Or is it really just a check issue and filesystem-wise fully as it should be? > Do you have any smaller btrfs with the same false alert? Uhm... I can check, but I don't think so, especially as all other btrfs I have are newer and already have skinny-metadata. The only ones I had without are those two big 8TB HDDs... Unfortunately they contain sensitive data from work, which I don't think I can copy, otherwise I could have sent you the device or so... > I'll check the code to find what's wrong, but if you have > a small enough image, debugging will be much, much faster. In any case, I'll keep the fs in question for a while, so that I can do verifications in case you have patches. thanks a lot, Chris. [-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 5313 bytes --] ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2015-11-14 2:29 ` Christoph Anton Mitterer @ 2015-11-15 1:29 ` Qu Wenruo 2015-11-15 3:24 ` Christoph Anton Mitterer 0 siblings, 1 reply; 55+ messages in thread From: Qu Wenruo @ 2015-11-15 1:29 UTC (permalink / raw) To: Christoph Anton Mitterer, linux-btrfs@vger.kernel.org On 2015-11-14 10:29, Christoph Anton Mitterer wrote: > On Sat, 2015-11-14 at 09:22 +0800, Qu Wenruo wrote: >> I manually checked them all. > thanks a lot :-) > > >> Strangely, they are all OK... which is good news for you. > Oh man... you're soooo mean ;-D > > >> They are all tree blocks and are all in a metadata block group. > and I guess that's... expected/intended? Yes, that's the expected behavior, but it does not match the btrfsck error report. > > >> It seems to be a btrfsck false alert > that's a relief (for me) > > Well I've already started to copy all files from the device to a new > one... unfortunately I'll lose all older snapshots (at least on the > new fs) but instead I get skinny-metadata, which wasn't the default > back then. Skinny metadata is quite a nice feature; it hugely reduces the size of metadata extent items. > (being able to copy a full fs, with all subvols/snapshots is IMHO > really something that should be worked on) > > >> If the type were wrong, all the extents inside the chunk should be reported >> as a type mismatch with the chunk. > Isn't that the case? At least there are so many reported extents... If you posted all the output, that's little more than nothing: just tens of errors reported, compared to millions of extents. And in your case, if a chunk were really bad, it would report about 65K errors. > >> And according to the dump result, the reported extents are not >> contiguous: they have adjacent extents, but the adjacent ones are not reported. > I'm not so deep into btrfs... is this kinda expected and if not how > could all this happen? Or is it really just a check issue and > filesystem-wise fully as it should be? I think it's a btrfsck issue; at least from the dump info, your extent tree is OK. And if there is no other error reported from btrfsck, your filesystem should be OK. > > >> Do you have any smaller btrfs with the same false alert? > Uhm... I can check, but I don't think so, especially as all other btrfs > I have are newer and already have skinny-metadata. > The only ones I had without are those two big 8TB HDDs... > Unfortunately they contain sensitive data from work, which I don't > think I can copy, otherwise I could have sent you the device or so... > >> I'll check the code to find what's wrong, but if you have >> a small enough image, debugging will be much, much faster. > In any case, I'll keep the fs in question for a while, so that I can do > verifications in case you have patches. Nice. Thanks, Qu > > thanks a lot, > Chris. > ^ permalink raw reply [flat|nested] 55+ messages in thread
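The "about 65K errors" figure above can be reproduced with a little arithmetic. This is only a sketch: the 16 KiB tree block size is read off the extent ranges reported in this thread, while the 1 GiB metadata chunk size is an assumption that the thread itself does not state.

```python
GIB = 1024 ** 3

# Size of one reported extent, taken from the first fsck line in the thread:
#   bad extent [5993525264384, 5993525280768), type mismatch with chunk
tree_block_size = 5993525280768 - 5993525264384

# Assumption (not stated in the thread): the metadata chunk is 1 GiB.
assumed_chunk_size = 1 * GIB

# If every tree block in such a chunk were flagged, this many lines would appear.
blocks_per_chunk = assumed_chunk_size // tree_block_size

print(f"tree block size: {tree_block_size} bytes")
print(f"tree blocks per assumed 1 GiB chunk: {blocks_per_chunk}")
```

Under that assumption a chunk that was bad throughout would yield 65536 reports, which is how a handful of scattered lines points towards a checker problem rather than a mistyped chunk.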
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2015-11-15 1:29 ` Qu Wenruo @ 2015-11-15 3:24 ` Christoph Anton Mitterer 0 siblings, 0 replies; 55+ messages in thread From: Christoph Anton Mitterer @ 2015-11-15 3:24 UTC (permalink / raw) To: Qu Wenruo, linux-btrfs@vger.kernel.org [-- Attachment #1: Type: text/plain, Size: 7658 bytes --] On Sun, 2015-11-15 at 09:29 +0800, Qu Wenruo wrote: > > > If type is wrong, all the extents inside the chunk should be > > > reported > > > as > > > mismatch type with chunk. > > Isn't that the case? At least there are so many reported extents... > > If you posted all the output Sure, I posted everything that the dump gave :) > , that's just a little more than nothing. > Just tens of error reported, compared to millions of extents. > And in your case, if a chunk is really bad, it will report about 65K > errors. I see.. > I think it's a btrfsck issue, at least from the dump info, your > extent > tree is OK. > And if there is no other error reported from btrfsck, your filesystem > should be OK. Nope.. there were no further errors. > > In any case, I'll keep the fs in question for a while, so that I > > can do > > verifications in case you have patches. > > Nice. Just tell me if you have something. btw: I saw these: Nov 15 02:01:42 heisenberg kernel: INFO: task btrfs-transacti:28379 blocked for more than 120 seconds. Nov 15 02:01:42 heisenberg kernel: Not tainted 4.2.0-1-amd64 #1 Nov 15 02:01:42 heisenberg kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
Nov 15 02:01:42 heisenberg kernel: btrfs-transacti D ffffffff8109a1b0 0 28379 2 0x00000000 Nov 15 02:01:42 heisenberg kernel: ffff88016e3e6500 0000000000000046 000000000000007a ffff88040be88f00 Nov 15 02:01:42 heisenberg kernel: 0000000000002659 ffff880138070000 ffff88041e355840 7fffffffffffffff Nov 15 02:01:42 heisenberg kernel: ffffffff815508e0 ffff88013806fbb8 0007ffffffffffff ffffffff815500ff Nov 15 02:01:42 heisenberg kernel: Call Trace: Nov 15 02:01:42 heisenberg kernel: [<ffffffff815508e0>] ? bit_wait_timeout+0x70/0x70 Nov 15 02:01:42 heisenberg kernel: [<ffffffff815500ff>] ? schedule+0x2f/0x70 Nov 15 02:01:42 heisenberg kernel: [<ffffffff81552bc7>] ? schedule_timeout+0x1f7/0x290 Nov 15 02:01:42 heisenberg kernel: [<ffffffffa02be6f2>] ? extent_write_cache_pages.isra.28.constprop.43+0x222/0x330 [btrfs] Nov 15 02:01:42 heisenberg kernel: [<ffffffff8101c325>] ? read_tsc+0x5/0x10 Nov 15 02:01:42 heisenberg kernel: [<ffffffff815508e0>] ? bit_wait_timeout+0x70/0x70 Nov 15 02:01:42 heisenberg kernel: [<ffffffff8154f7ad>] ? io_schedule_timeout+0x9d/0x110 Nov 15 02:01:42 heisenberg kernel: [<ffffffff81550915>] ? bit_wait_io+0x35/0x60 Nov 15 02:01:42 heisenberg kernel: [<ffffffff815504ca>] ? __wait_on_bit+0x5a/0x90 Nov 15 02:01:42 heisenberg kernel: [<ffffffff81152496>] ? find_get_pages_tag+0x116/0x150 Nov 15 02:01:42 heisenberg kernel: [<ffffffff811513a6>] ? wait_on_page_bit+0xb6/0xc0 Nov 15 02:01:42 heisenberg kernel: [<ffffffff810a9b20>] ? autoremove_wake_function+0x40/0x40 Nov 15 02:01:42 heisenberg kernel: [<ffffffff81151477>] ? filemap_fdatawait_range+0xc7/0x140 Nov 15 02:01:42 heisenberg kernel: [<ffffffffa02b9a43>] ? btrfs_wait_ordered_range+0x73/0x110 [btrfs] Nov 15 02:01:42 heisenberg kernel: [<ffffffffa02e3c1d>] ? btrfs_wait_cache_io+0x5d/0x1e0 [btrfs] Nov 15 02:01:42 heisenberg kernel: [<ffffffffa028e7bc>] ? btrfs_start_dirty_block_groups+0x17c/0x3f0 [btrfs] Nov 15 02:01:42 heisenberg kernel: [<ffffffffa029ee84>] ? 
btrfs_commit_transaction+0x1b4/0xa90 [btrfs] Nov 15 02:01:42 heisenberg kernel: [<ffffffffa029f7f0>] ? start_transaction+0x90/0x580 [btrfs] Nov 15 02:01:42 heisenberg kernel: [<ffffffffa029a654>] ? transaction_kthread+0x224/0x240 [btrfs] Nov 15 02:01:42 heisenberg kernel: [<ffffffffa029a430>] ? btrfs_cleanup_transaction+0x510/0x510 [btrfs] Nov 15 02:01:42 heisenberg kernel: [<ffffffff8108aa41>] ? kthread+0xc1/0xe0 Nov 15 02:01:42 heisenberg kernel: [<ffffffff8108a980>] ? kthread_create_on_node+0x170/0x170 Nov 15 02:01:42 heisenberg kernel: [<ffffffff81553e5f>] ? ret_from_fork+0x3f/0x70 Nov 15 02:01:42 heisenberg kernel: [<ffffffff8108a980>] ? kthread_create_on_node+0x170/0x170 Nov 15 02:03:42 heisenberg kernel: INFO: task btrfs-transacti:28379 blocked for more than 120 seconds. Nov 15 02:03:42 heisenberg kernel: Not tainted 4.2.0-1-amd64 #1 Nov 15 02:03:42 heisenberg kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Nov 15 02:03:42 heisenberg kernel: btrfs-transacti D ffffffff8109a1b0 0 28379 2 0x00000000 Nov 15 02:03:42 heisenberg kernel: ffff88016e3e6500 0000000000000046 000000000000007a ffff88040be88f00 Nov 15 02:03:42 heisenberg kernel: 0000000000002659 ffff880138070000 ffff88041e355840 7fffffffffffffff Nov 15 02:03:42 heisenberg kernel: ffffffff815508e0 ffff88013806fbb8 0007ffffffffffff ffffffff815500ff Nov 15 02:03:42 heisenberg kernel: Call Trace: Nov 15 02:03:42 heisenberg kernel: [<ffffffff815508e0>] ? bit_wait_timeout+0x70/0x70 Nov 15 02:03:42 heisenberg kernel: [<ffffffff815500ff>] ? schedule+0x2f/0x70 Nov 15 02:03:42 heisenberg kernel: [<ffffffff81552bc7>] ? schedule_timeout+0x1f7/0x290 Nov 15 02:03:42 heisenberg kernel: [<ffffffffa02be6f2>] ? extent_write_cache_pages.isra.28.constprop.43+0x222/0x330 [btrfs] Nov 15 02:03:42 heisenberg kernel: [<ffffffff8101c325>] ? read_tsc+0x5/0x10 Nov 15 02:03:42 heisenberg kernel: [<ffffffff815508e0>] ? 
bit_wait_timeout+0x70/0x70 Nov 15 02:03:42 heisenberg kernel: [<ffffffff8154f7ad>] ? io_schedule_timeout+0x9d/0x110 Nov 15 02:03:42 heisenberg kernel: [<ffffffff81550915>] ? bit_wait_io+0x35/0x60 Nov 15 02:03:42 heisenberg kernel: [<ffffffff815504ca>] ? __wait_on_bit+0x5a/0x90 Nov 15 02:03:42 heisenberg kernel: [<ffffffff81152496>] ? find_get_pages_tag+0x116/0x150 Nov 15 02:03:42 heisenberg kernel: [<ffffffff811513a6>] ? wait_on_page_bit+0xb6/0xc0 Nov 15 02:03:42 heisenberg kernel: [<ffffffff810a9b20>] ? autoremove_wake_function+0x40/0x40 Nov 15 02:03:42 heisenberg kernel: [<ffffffff81151477>] ? filemap_fdatawait_range+0xc7/0x140 Nov 15 02:03:42 heisenberg kernel: [<ffffffffa02b9a43>] ? btrfs_wait_ordered_range+0x73/0x110 [btrfs] Nov 15 02:03:42 heisenberg kernel: [<ffffffffa02e3c1d>] ? btrfs_wait_cache_io+0x5d/0x1e0 [btrfs] Nov 15 02:03:42 heisenberg kernel: [<ffffffffa028e7bc>] ? btrfs_start_dirty_block_groups+0x17c/0x3f0 [btrfs] Nov 15 02:03:42 heisenberg kernel: [<ffffffffa029ee84>] ? btrfs_commit_transaction+0x1b4/0xa90 [btrfs] Nov 15 02:03:42 heisenberg kernel: [<ffffffffa029f7f0>] ? start_transaction+0x90/0x580 [btrfs] Nov 15 02:03:42 heisenberg kernel: [<ffffffffa029a654>] ? transaction_kthread+0x224/0x240 [btrfs] Nov 15 02:03:42 heisenberg kernel: [<ffffffffa029a430>] ? btrfs_cleanup_transaction+0x510/0x510 [btrfs] Nov 15 02:03:42 heisenberg kernel: [<ffffffff8108aa41>] ? kthread+0xc1/0xe0 Nov 15 02:03:42 heisenberg kernel: [<ffffffff8108a980>] ? kthread_create_on_node+0x170/0x170 Nov 15 02:03:42 heisenberg kernel: [<ffffffff81553e5f>] ? ret_from_fork+0x3f/0x70 Nov 15 02:03:42 heisenberg kernel: [<ffffffff8108a980>] ? kthread_create_on_node+0x170/0x170 Nov 15 04:18:37 heisenberg kernel: virbr1: port 2(vnet0) entered disabled state Nov 15 04:18:37 heisenberg kernel: device vnet0 left promiscuous mode Nov 15 04:18:37 heisenberg kernel: virbr1: port 2(vnet0) entered disabled state Is that anything to worry about? 
And is there any way to find out which filesystem that process corresponds to? At least I've never seen btrfs-transacti hang before... and since nothing other than the cp -a from one disk to the other is doing anything on the two HDDs, I was a bit surprised to see that happen. Cheers, Chris. [-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 5313 bytes --] ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2015-11-12 21:51 bad extent [5993525264384, 5993525280768), type mismatch with chunk Christoph Anton Mitterer ` (2 preceding siblings ...) 2015-11-14 1:22 ` Qu Wenruo @ 2016-02-16 0:14 ` Ángel González 2016-02-16 1:38 ` Qu Wenruo 3 siblings, 1 reply; 55+ messages in thread From: Ángel González @ 2016-02-16 0:14 UTC (permalink / raw) To: linux-btrfs@vger.kernel.org Hello everybody I have a btrfs filesystem [probably] created with btrfs-progs 4.3.1 that is also spewing hundreds of thousands of «bad extent [x, y), type mismatch with chunk» messages on btrfsck. The data seems to be fine, so I expect it to be some kind of false positive. Still, there seems to be disagreement on the list on whether to run btrfs check --repair or not, plus I don't know if you may be interested in getting more information about this. What should my next steps be? Kind regards ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2016-02-16 0:14 ` Ángel González @ 2016-02-16 1:38 ` Qu Wenruo 2016-02-16 22:21 ` Ángel González 0 siblings, 1 reply; 55+ messages in thread From: Qu Wenruo @ 2016-02-16 1:38 UTC (permalink / raw) To: Ángel González, linux-btrfs@vger.kernel.org Ángel González wrote on 2016/02/16 01:14 +0100: > Hello everybody > > I have a btrfs filesystem [probably] created with btrfs-progs 4.3.1 > that is also spewing hundreds of thousands of «bad extent [x, y), type > mismatch with chunk» messages on btrfsck. In fact, there is more than one false alert. One, related to 64K sector size, was fixed before 4.3.1. But I assume that's not the case for you, as only PPC64 and AArch64 may use 64K page size. The latest false alert was fixed just after 4.3.1: commit b08a740d7b797d870cbc3691b1291290d0815998 Author: Qu Wenruo <quwenruo@cn.fujitsu.com> Date: Wed Nov 25 14:19:06 2015 +0800 btrfs-progs: fsck: Fix a false alert where extent record has wrong metadata flag So you may need to try v4.4 if your filesystem was created with non-skinny metadata. > > The data seems to be fine, so I expect it to be some kind of false > positive. Still, there seems to be disagreement on the list on whether > to run btrfs check --repair or not, plus I don't know if you may be > interested in getting more information about this. btrfs check --repair won't help in this case, regardless of whether it is a false alert. > > What should my next steps be? > Try btrfs-progs 4.4 to see if all these false alerts go away. Thanks, Qu > > Kind regards > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > > ^ permalink raw reply [flat|nested] 55+ messages in thread
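The advice above boils down to a version check: releases before v4.4 may still print this false alert on filesystems without skinny metadata. A minimal sketch of that comparison follows, assuming a version banner shaped like the "btrfs-progs v4.0" line quoted earlier in this thread; `parse_progs_version` and `has_false_alert_fix` are hypothetical helper names, not btrfs-progs interfaces.

```python
import re

# Per this message, the fix landed just after 4.3.1, so v4.4 is the first
# release suggested for re-checking.
FIXED_IN = (4, 4)


def parse_progs_version(banner):
    """Extract a numeric version tuple from a banner like 'btrfs-progs v4.3.1'."""
    match = re.search(r"v(\d+(?:\.\d+)*)", banner)
    if match is None:
        raise ValueError(f"no version found in {banner!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def has_false_alert_fix(banner):
    """True if the given btrfs-progs version is v4.4 or newer."""
    return parse_progs_version(banner) >= FIXED_IN


for banner in ("btrfs-progs v4.0", "btrfs-progs v4.3.1", "btrfs-progs v4.4"):
    print(banner, "->", "has fix" if has_false_alert_fix(banner) else "upgrade first")
```

Tuples compare element by element, so (4, 3, 1) sorts before (4, 4) without any special handling of the patch level.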
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2016-02-16 1:38 ` Qu Wenruo @ 2016-02-16 22:21 ` Ángel González 2016-02-17 7:26 ` Qu Wenruo 0 siblings, 1 reply; 55+ messages in thread From: Ángel González @ 2016-02-16 22:21 UTC (permalink / raw) To: inux-btrfs@vger.kernel.org > > What should my next steps be? > > > > Try btrfs-progs 4.4 to see if all these false alerts go away. > > Thanks, > Qu Thanks! Those "errors" are indeed gone after updating btrfs-progs from 4.3.1 to 4.4. Sorry for the fuss. It's strange, though, if it was supposed to happen only with non-skinny metadata, since I didn't manually specify any flags, and supposedly skinny has been the default since 3.18 (the btrfs partition was created with a newer version). Thanks for your support ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2016-02-16 22:21 ` Ángel González @ 2016-02-17 7:26 ` Qu Wenruo 2016-02-17 23:56 ` Ángel González 0 siblings, 1 reply; 55+ messages in thread From: Qu Wenruo @ 2016-02-17 7:26 UTC (permalink / raw) To: Ángel González, inux-btrfs@vger.kernel.org Ángel González wrote on 2016/02/16 23:21 +0100: > > >>> What should my next steps be? >>> >> >> Try btrfs-progs 4.4 to see if all these false alerts go away. >> >> Thanks, >> Qu > > Thanks! > Those "errors" are indeed gone after updating btrfs-progs from 4.3.1 to > 4.4. Sorry for the fuss. > > > It's strange, though, if it was supposed to happen only with non-skinny > metadata, since I didn't manually specify any flags, and supposedly > skinny has been the default since 3.18 (the btrfs partition was created with > a newer version). If you're really interested in whether your fs has skinny metadata enabled, you can check the btrfs-show-super output. For example, the following output indicates skinny metadata: ------ incompat_flags 0x161 ( MIXED_BACKREF | BIG_METADATA | EXTENDED_IREF | SKINNY_METADATA ) <<<Here ------ Even if it has skinny metadata, it's still possible that some metadata is still in the old format if you used btrfstune to convert an old fs to skinny metadata. But anyway, it's always good to see the problem solved. Thanks, Qu > > > Thanks for your support ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk 2016-02-17 7:26 ` Qu Wenruo @ 2016-02-17 23:56 ` Ángel González 0 siblings, 0 replies; 55+ messages in thread From: Ángel González @ 2016-02-17 23:56 UTC (permalink / raw) To: linux-btrfs@vger.kernel.org Qu Wenruo wrote: > If you're really interested in whether your fs has skinny metadata > enabled, you can check the btrfs-show-super output. > For example, the following output indicates skinny metadata: > ------ > incompat_flags 0x161 > ( MIXED_BACKREF | > BIG_METADATA | > EXTENDED_IREF | > SKINNY_METADATA ) <<<Here > ------ > > Even if it has skinny metadata, it's still possible that some metadata > is still in the old format if you used btrfstune to convert an old fs to > skinny metadata. It was a freshly created filesystem. However, btrfs-show-super shows it does *not* have skinny metadata: > incompat_flags 0x61 > ( MIXED_BACKREF | > BIG_METADATA | > EXTENDED_IREF ) Maybe gparted explicitly requested it to be created without skinny metadata. That won't make me lose my sleep, though. > But anyway, it's always good to see the problem solved. Indeed :-) Thanks again ^ permalink raw reply [flat|nested] 55+ messages in thread
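The two incompat_flags values quoted in this exchange, 0x161 and 0x61, differ only in the SKINNY_METADATA bit. The sketch below decodes them; the bit positions are my understanding of the btrfs on-disk flag layout, cross-checked against those two values, and the table deliberately lists only the four flags named in the thread. `decode_incompat_flags` and `has_skinny_metadata` are illustrative names, not btrfs-progs functions.

```python
# Only the flags that appear in this thread; other bits are reported as hex.
INCOMPAT_FLAGS = {
    0x001: "MIXED_BACKREF",
    0x020: "BIG_METADATA",
    0x040: "EXTENDED_IREF",
    0x100: "SKINNY_METADATA",
}


def decode_incompat_flags(value):
    """Return the names of the known flags set in value, lowest bit first."""
    names = []
    leftover = value
    for bit, name in sorted(INCOMPAT_FLAGS.items()):
        if value & bit:
            names.append(name)
            leftover &= ~bit
    if leftover:
        # Bits outside the small table above are shown rather than guessed at.
        names.append(hex(leftover))
    return names


def has_skinny_metadata(value):
    """True if the SKINNY_METADATA bit is set in an incompat_flags value."""
    return bool(value & 0x100)


for value in (0x161, 0x61):
    print(hex(value), decode_incompat_flags(value), has_skinny_metadata(value))
```

This matches what both posters read off by eye: 0x161 includes SKINNY_METADATA, while 0x61 is the same set without it.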