* "WARNING: device 0 not present" during scrub? @ 2016-01-30 11:59 Christian Pernegger 2016-01-30 20:10 ` Henk Slager ` (2 more replies) 0 siblings, 3 replies; 11+ messages in thread From: Christian Pernegger @ 2016-01-30 11:59 UTC (permalink / raw) To: linux-btrfs [-- Attachment #1: Type: text/plain, Size: 4854 bytes --] Hello, tonight's scrub was cancelled after a "WARNING: device 0 not present". No other visible errors or abnormalities. Google dragged up a linux-btrfs discussion from May 2015, but some of it seems to have happend off list and I couldn't find a resolution. As running btrfs-debug-tree was suggested there and it seemed non-invasive, I did: [...] fs tree key (FS_TREE ROOT_ITEM 0) leaf 3903828393984 items 10 free space 15539 generation 9938 owner 5 fs uuid 84a044be-b396-48cf-91dc-c610c0ae11e2 chunk uuid 7e3f121b-c77f-4d60-a560-897f1aa39d07 item 0 key (256 INODE_ITEM 0) itemoff 16123 itemsize 160 inode generation 3 transid 9938 size 82 block group 0 mode 40755 links 1 uid 0 gid 0 rdev 0 flags 0x0 item 1 key (256 INODE_REF 256) itemoff 16111 itemsize 12 inode ref index 0 namelen 2 name: .. 
item 2 key (256 DIR_ITEM 243075479) itemoff 16066 itemsize 45 location key (262 ROOT_ITEM -1) type DIR namelen 15 datalen 0 name: @mohammed-crypt item 3 key (256 DIR_ITEM 606771344) itemoff 16031 itemsize 35 location key (257 ROOT_ITEM -1) type DIR namelen 5 datalen 0 name: @root item 4 key (256 DIR_ITEM 1793720662) itemoff 15987 itemsize 44 location key (3901 ROOT_ITEM -1) type DIR namelen 14 datalen 0 name: @backup-legacy item 5 key (256 DIR_ITEM 1811406303) itemoff 15950 itemsize 37 location key (258 ROOT_ITEM -1) type DIR namelen 7 datalen 0 name: @backup item 6 key (256 DIR_INDEX 5) itemoff 15915 itemsize 35 location key (257 ROOT_ITEM -1) type DIR namelen 5 datalen 0 name: @root item 7 key (256 DIR_INDEX 6) itemoff 15878 itemsize 37 location key (258 ROOT_ITEM -1) type DIR namelen 7 datalen 0 name: @backup item 8 key (256 DIR_INDEX 7) itemoff 15833 itemsize 45 location key (262 ROOT_ITEM -1) type DIR namelen 15 datalen 0 name: @mohammed-crypt item 9 key (256 DIR_INDEX 8) itemoff 15789 itemsize 44 location key (3901 ROOT_ITEM -1) type DIR namelen 14 datalen 0 name: @backup-legacy checksum tree key (CSUM_TREE ROOT_ITEM 0) node 4693945303040 level 3 items 5 free 488 generation 14495 owner 7 fs uuid 84a044be-b396-48cf-91dc-c610c0ae11e2 chunk uuid 7e3f121b-c77f-4d60-a560-897f1aa39d07 key (EXTENT_CSUM EXTENT_CSUM 12582912) block 4693971959808 (286497312) gen 14495 key (EXTENT_CSUM EXTENT_CSUM 1027063414784) block 4693997813760 (286498890) gen 14490 key (EXTENT_CSUM EXTENT_CSUM 2054823305216) block 4693998977024 (286498961) gen 14490 key (EXTENT_CSUM EXTENT_CSUM 3077363499008) block 4693945729024 (286495711) gen 14495 key (EXTENT_CSUM EXTENT_CSUM 4094043148288) block 4693992472576 (286498564) gen 14490 parent transid verify failed on 4693971959808 wanted 14495 found 14497 parent transid verify failed on 4693971959808 wanted 14495 found 14497 parent transid verify failed on 4693971959808 wanted 14495 found 14497 parent transid verify failed on 4693971959808 wanted 
14495 found 14497 Ignoring transid failure print-tree.c:1074: btrfs_print_tree: Assertion failed. btrfs-debug-tree[0x410489] btrfs-debug-tree[0x411dbf] btrfs-debug-tree[0x402adb] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f925b1ccb45] btrfs-debug-tree[0x402d85] Ouch. This is on a 1-month-old Debian stable (jessie) install and yes, I know that means the kernel and btrfs-progs are ancient but I'd still very much appreciate some help. It's a backup box, so the data isn't critical, but of course I need it stable in the long run. Is it possible to fix this and prevent it from happening again? (How) can I verify if the data is still good? If the verdict is that I have to re-roll the box I wouldn't go with btrfs again at this time, but still be willing to help with debugging first, if anyone is interested. Regards & TIA Christian Pernegger P.S.: Please CC me, as I'm not on the list. Mandatory info: chris@mrmackey:~$ uname -a Linux mrmackey 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt20-1+deb8u3 (2016-01-17) x86_64 GNU/Linux chris@mrmackey:~$ /sbin/btrfs --version Btrfs v3.17 chris@mrmackey:~$ sudo btrfs fi show Label: 'root' uuid: 84a044be-b396-48cf-91dc-c610c0ae11e2 Total devices 1 FS bytes used 4.46TiB devid 1 size 5.46TiB used 4.68TiB path /dev/mapper/sda3_crypt Btrfs v3.17 chris@mrmackey:~$ sudo btrfs fi df /mnt/btrfsroot/ Data, single: total=4.67TiB, used=4.45TiB System, DUP: total=8.00MiB, used=528.00KiB System, single: total=4.00MiB, used=0.00B Metadata, DUP: total=6.50GiB, used=5.07GiB Metadata, single: total=8.00MiB, used=0.00B GlobalReserve, single: total=512.00MiB, used=0.00B [-- Attachment #2: dmesg.log.gz --] [-- Type: application/x-gzip, Size: 17601 bytes --] ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: "WARNING: device 0 not present" during scrub? 2016-01-30 11:59 "WARNING: device 0 not present" during scrub? Christian Pernegger @ 2016-01-30 20:10 ` Henk Slager 2016-01-30 21:19 ` Christian Pernegger 2016-01-31 1:09 ` Chris Murphy 2016-02-01 10:23 ` Patrik Lundquist 2 siblings, 1 reply; 11+ messages in thread From: Henk Slager @ 2016-01-30 20:10 UTC (permalink / raw) To: linux-btrfs; +Cc: Christian Pernegger On Sat, Jan 30, 2016 at 12:59 PM, Christian Pernegger <pernegger@gmail.com> wrote: > Hello, > > tonight's scrub was cancelled after a "WARNING: device 0 not present". > No other visible errors or abnormalities. > > Google dragged up a linux-btrfs discussion from May 2015, but some of > it seems to have happend off list and I couldn't find a resolution. As It i probably this discussion: http://www.spinics.net/lists/linux-btrfs/msg43755.html It is same tools version as you use I see, but newer kernel. > running btrfs-debug-tree was suggested there and it seemed > non-invasive, I did: > > [...] > fs tree key (FS_TREE ROOT_ITEM 0) > leaf 3903828393984 items 10 free space 15539 generation 9938 owner 5 > fs uuid 84a044be-b396-48cf-91dc-c610c0ae11e2 > chunk uuid 7e3f121b-c77f-4d60-a560-897f1aa39d07 > item 0 key (256 INODE_ITEM 0) itemoff 16123 itemsize 160 > inode generation 3 transid 9938 size 82 block group 0 > mode 40755 links 1 uid 0 gid 0 rdev 0 flags 0x0 > item 1 key (256 INODE_REF 256) itemoff 16111 itemsize 12 > inode ref index 0 namelen 2 name: .. 
> item 2 key (256 DIR_ITEM 243075479) itemoff 16066 itemsize 45 > location key (262 ROOT_ITEM -1) type DIR > namelen 15 datalen 0 name: @mohammed-crypt > item 3 key (256 DIR_ITEM 606771344) itemoff 16031 itemsize 35 > location key (257 ROOT_ITEM -1) type DIR > namelen 5 datalen 0 name: @root > item 4 key (256 DIR_ITEM 1793720662) itemoff 15987 itemsize 44 > location key (3901 ROOT_ITEM -1) type DIR > namelen 14 datalen 0 name: @backup-legacy > item 5 key (256 DIR_ITEM 1811406303) itemoff 15950 itemsize 37 > location key (258 ROOT_ITEM -1) type DIR > namelen 7 datalen 0 name: @backup > item 6 key (256 DIR_INDEX 5) itemoff 15915 itemsize 35 > location key (257 ROOT_ITEM -1) type DIR > namelen 5 datalen 0 name: @root > item 7 key (256 DIR_INDEX 6) itemoff 15878 itemsize 37 > location key (258 ROOT_ITEM -1) type DIR > namelen 7 datalen 0 name: @backup > item 8 key (256 DIR_INDEX 7) itemoff 15833 itemsize 45 > location key (262 ROOT_ITEM -1) type DIR > namelen 15 datalen 0 name: @mohammed-crypt > item 9 key (256 DIR_INDEX 8) itemoff 15789 itemsize 44 > location key (3901 ROOT_ITEM -1) type DIR > namelen 14 datalen 0 name: @backup-legacy > checksum tree key (CSUM_TREE ROOT_ITEM 0) > node 4693945303040 level 3 items 5 free 488 generation 14495 owner 7 > fs uuid 84a044be-b396-48cf-91dc-c610c0ae11e2 > chunk uuid 7e3f121b-c77f-4d60-a560-897f1aa39d07 > key (EXTENT_CSUM EXTENT_CSUM 12582912) block 4693971959808 > (286497312) gen 14495 > key (EXTENT_CSUM EXTENT_CSUM 1027063414784) block > 4693997813760 (286498890) gen 14490 > key (EXTENT_CSUM EXTENT_CSUM 2054823305216) block > 4693998977024 (286498961) gen 14490 > key (EXTENT_CSUM EXTENT_CSUM 3077363499008) block > 4693945729024 (286495711) gen 14495 > key (EXTENT_CSUM EXTENT_CSUM 4094043148288) block > 4693992472576 (286498564) gen 14490 > parent transid verify failed on 4693971959808 wanted 14495 found 14497 > parent transid verify failed on 4693971959808 wanted 14495 found 14497 > parent transid verify failed on 
4693971959808 wanted 14495 found 14497 > parent transid verify failed on 4693971959808 wanted 14495 found 14497 > Ignoring transid failure > print-tree.c:1074: btrfs_print_tree: Assertion failed. > btrfs-debug-tree[0x410489] > btrfs-debug-tree[0x411dbf] > btrfs-debug-tree[0x402adb] > /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f925b1ccb45] > btrfs-debug-tree[0x402d85] I haven't actively used btrfs-debug-tree myself, but it happens that these kind of tools crash, sometimes many gigabytes of memory is used/claimed, maybe that is a reason for the crash. Can you mount the fs (readonly)? But you could do a standard check first: unmount and run a btrfs check -p /dev/mapper/sda3_crypt Its readonly by default, it could give some idea whether the fs is damaged too much or not. > This is on a 1-month-old Debian stable (jessie) install and yes, I > know that means the kernel and btrfs-progs are ancient but I'd still > very much appreciate some help. It's a backup box, so the data isn't > critical, but of course I need it stable in the long run. Is it > possible to fix this and prevent it from happening again? (How) can I > verify if the data is still good? If the verdict is that I have to > re-roll the box I wouldn't go with btrfs again at this time, but still > be willing to help with debugging first, if anyone is interested. I think there is a relation between the many ata2 messages and this scrub failure. It looks like that in this case, scrub want to do its work, but the drive or some part of the stack is still not out of its sleep mode. So for some moments, btrfs kernel code state and drive (devid1) are not in sync. This might have happened also on other occasions in the last month so the fs might be more damaged than currently known. Hence the suggestion to do a normal check. You can use brute-force rsync -c (and more, see manpage) to validate your data, assuming your sourcedata isn't on btrfs. 
A workaround might be to disable power management (PM) for the system, or to keep the block device mounted only while you back up or write to it.

The obvious advice is to use a 4.4 kernel and tools. Debian 'stable' doesn't mean that every piece of the kernel and tooling fits that 'stamp'. One way to keep a btrfs-based backup box stable in the long run is to use a reasonably new kernel. There have been a lot of btrfs improvements between 3.16 and 4.4, and 3.16 is no longer supported by kernel.org or by this list. Maybe you could switch to a rolling-release Linux distro or just update the Debian kernel.

But the more fundamental question is: why do you use btrfs? What features do you need that ext4, xfs or reiserfs don't have?

> Regards & TIA
> Christian Pernegger
>
> P.S.: Please CC me, as I'm not on the list.
>
> Mandatory info:
> chris@mrmackey:~$ uname -a
> Linux mrmackey 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt20-1+deb8u3 (2016-01-17) x86_64 GNU/Linux
>
> chris@mrmackey:~$ /sbin/btrfs --version
> Btrfs v3.17
>
> chris@mrmackey:~$ sudo btrfs fi show
> Label: 'root' uuid: 84a044be-b396-48cf-91dc-c610c0ae11e2
> Total devices 1 FS bytes used 4.46TiB
> devid 1 size 5.46TiB used 4.68TiB path /dev/mapper/sda3_crypt
>
> Btrfs v3.17
>
> chris@mrmackey:~$ sudo btrfs fi df /mnt/btrfsroot/
> Data, single: total=4.67TiB, used=4.45TiB
> System, DUP: total=8.00MiB, used=528.00KiB
> System, single: total=4.00MiB, used=0.00B
> Metadata, DUP: total=6.50GiB, used=5.07GiB
> Metadata, single: total=8.00MiB, used=0.00B
> GlobalReserve, single: total=512.00MiB, used=0.00B
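The "brute-force rsync -c" validation suggested in this message could look roughly like the sketch below. It is only an illustration: the helper name `verify_backup` and the example paths are made up here, and it assumes the source machine still holds the same file versions as the backup.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): list files whose content
# differs between a source tree and its backup copy, changing nothing.
#   -r  recurse into directories
#   -c  compare by checksum instead of size and mtime
#   -n  dry run (nothing is copied)
#   -i  itemize the differences that were found
verify_backup() {
    src=$1
    dst=$2
    rsync -rcni "$src"/ "$dst"/
}

# Example call (both paths are placeholders):
#   verify_backup /path/to/source /path/to/backup/copy
```

Files whose checksums differ are listed and nothing is written, so it should be safe to run against a mounted backup volume. Note that it reads every file on both sides in full, which will take a while on several TiB.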
* Re: "WARNING: device 0 not present" during scrub? 2016-01-30 20:10 ` Henk Slager @ 2016-01-30 21:19 ` Christian Pernegger 2016-01-31 1:42 ` Chris Murphy 0 siblings, 1 reply; 11+ messages in thread From: Christian Pernegger @ 2016-01-30 21:19 UTC (permalink / raw) To: linux-btrfs On 30 January 2016 at 21:10, Henk Slager <eye1tm@gmail.com> wrote: > Can you mount the fs (readonly)? No idea, it's still mounted (rw even), aside from the scrub failing and debug-tree crashing I wouldn't know anything was amiss. I was kind of reluctant to shut the machine down lest it then wouldn't come up at all. > unmount and run a btrfs check -p /dev/mapper/sda3_crypt That would mean shutting it down and booting from a rescue image on USB (any suggestions for something with a recent kernel and progs?). That's fine of course, if there's nothing more to be gleaned from the running system. > I think there is a relation between the many ata2 messages and this > scrub failure. There's exactly one of these errors on every resume from suspend, I'd assumed it's just the disk being slow to wake up. Even if they aren't benign, I made sure beforehand that the box did not sleep during the scrub and according to the logs it didn't. Suspend-resume and/or systemd are still likely culprits of course. > You can use brute-force rsync -c (and more, see manpage) to validate your > data, assuming your sourcedata isn't on btrfs. The data that I can verify, i.e. where the source machines still have the version from the current backup, checks out. > A workaround might be to disable PM for the system, The system's supposed to wake up once daily (nightly), pull in rdiff-backups from a few others and go back to sleep 20 min later. Keeping it awake 24/7 is a no-go noise and cost-wise. (For testing / debugging, sure, just not in the long run.) > An an obvious advice is to use a 4.4 kernel and tools. Debian 'stable' doesn't mean > that every piece of the kernel and tooling fits that 'stamp'. [...] 
> Maybe you could switch to a rolling release linux distro or just update the debian kernel.

Using Debian stable usually means that once something is set up and works, it keeps working until the hardware dies, with little to no user interaction. For something that sits in a corner and pulls in backups, that suits me just fine. If there's a specific reason to update the kernel and btrfs-progs, it's easily done of course, but "let's hope it has gone away with the newer version" doesn't inspire me with confidence on its own.

> But the more fundamental question is why you use btrfs? What features
> do you need that ext4 or xfs or reiserfs don't have?

Data checksumming. I don't mind a bit flipping here or there in old backups / archives, but I'd have liked to know if something went bad and which files were affected. Compression. Dedup that works on mortal hardware. To a lesser degree, subvolumes. Also I wanted to get familiar with the next big thing in Linux file systems. :-)

My bigger boxes use md + dm-crypt + lvm + manual checksumming, and the moment I can replace that (or part of it) with something integrated, I will. Once the resilience and fault tolerance are there. (The other day md-raid10 was so unfazed by what must have been a disk with a half-dead controller that it took me half a day to find out which one it was ...)

I was fully aware that I might run into trouble, I just didn't expect it to take less than a month and/or happen without provocation. The current install is expendable, even though it irks me to have to redo it (I didn't back that up, wanted to get it just right first), but I'd really like to find and fix the problem before I do, otherwise I might be back to square one in a month or so ...

Cheers,
C.
* Re: "WARNING: device 0 not present" during scrub? 2016-01-30 21:19 ` Christian Pernegger @ 2016-01-31 1:42 ` Chris Murphy 2016-01-31 12:35 ` Christian Pernegger 0 siblings, 1 reply; 11+ messages in thread From: Chris Murphy @ 2016-01-31 1:42 UTC (permalink / raw) To: Christian Pernegger; +Cc: linux-btrfs On Sat, Jan 30, 2016 at 2:19 PM, Christian Pernegger <pernegger@gmail.com> wrote: > >> An an obvious advice is to use a 4.4 kernel and tools. Debian 'stable' doesn't mean >> that every piece of the kernel and tooling fits that 'stamp'. [...] Maybe you could switch >> to a rolling release linux distro or just update the debian kernel. > > Using Debian stable usally means that once something is set up and > works it keeps working until the hardware dies with little to no user > interaction. For someting that sits in a corner and pulls in backups > that suits me just fine. If there's a specific reason to update the > kernel and btrfs-progs, it's easily done of course, but "let's hope it > has gone away with the newer version" doesn't inspire me with > confidence on its own. It maybe be stable for Debian but is Debian explicitly supporting Btrfs with this release? I don't think they are. In which case, it's at the least the wrong kernel version. The only distro explicitly supporting Btrfs is openSUSE. So if you need Btrfs in particular to be stable, and you don't want to have to think quite as much about kernels, you could consider that. But absolutely, of course we hope the problem is gone with the newer version, *that's how file system development works.* If it hasn't, and you reproduce the problem with kernel 4.4, then that means you've found a new bug that needs to be fixed. And first, it'd only possibly get fixed in 4.5 or newer before being backported to older kernels. That's how it goes. I can see how it might seem like it's a reasonable question to just ask first, but it really isn't. 
There's just so much development happening right now, a developer is not in a great position to think that far back for specific problems and whether yours might be one of them, and in what kernel version it was fixed. *shrug* just doesn't work that way, that's why there are changelogs for every sub kernel version. >> But the more fundamental question is why you use btrfs? What features >> do you need that ext4 or xfs or reiserfs don't have? > > Data checksumming. I don't mind a bit flipping here or there in old > backups / archives but I'd have liked to know if something went bad > and which files were affected. Compression. Dedup that works on mortal > hardware. Have you checked out ZFS on Linux? That might fit your use case better because it has the features you're asking for, but at least the ZFS portion is older and considered more stable. -- Chris Murphy ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: "WARNING: device 0 not present" during scrub? 2016-01-31 1:42 ` Chris Murphy @ 2016-01-31 12:35 ` Christian Pernegger 2016-01-31 18:06 ` Henk Slager ` (2 more replies) 0 siblings, 3 replies; 11+ messages in thread From: Christian Pernegger @ 2016-01-31 12:35 UTC (permalink / raw) To: linux-btrfs On 31 January 2016 at 02:42, Chris Murphy <lists@colorremedies.com> wrote: > On Sat, Jan 30, 2016 at 2:19 PM, Christian Pernegger > It maybe be stable for Debian but is Debian explicitly supporting > Btrfs with this release? I don't think they are. The modules are in the kernel, the progs are in the main archive, it's an option in the installer. It's not the default fs but I couldn't find any indication that it's more or less supported than, say, xfs. Why they've chosen 3.16 (and not 3.18, which would be a long term release) I don't know, but the fact remains that that's the default kernel of a tier 1 distro, so people using it are going to be around for a while. > But absolutely, of course we hope the problem is gone with the newer > version, *that's how file system development works.* Be that as it may, as I said, that approach doesn't inspire confidence. If I had the vaguest idea about how to reproduce it, sure, but all I have is an apparently lightly corrupted or at the very least glitchy fs (it mounts and unmounts just fine). How would I know if a new kernel helped things? > I can see how it might seem like it's a reasonable question to just > ask first, but it really isn't. There's just so much development > happening right now, a developer is not in a great position to think > that far back for specific problems and whether yours might be one of > them, and in what kernel version it was fixed. *shrug* just doesn't > work that way, that's why there are changelogs for every sub kernel > version. 
I do understand your point of view, but: If a possible fs corruption bug on a widespread (if older) kernel after one month of use and without any discernible cause gets nothing more than *shrug* from this list then btrfs isn't production ready nor ready for any kind of day-to-day use, not because of code maturity but because of that mindset. IMHO the btrfs-genie is too far out of the bottle for that, the wording of the stability status on the wiki much too inviting. Anyway, I knew what I was getting into, so I'll just chalk it up to experience and move on. Keep up the good work! > Have you checked out ZFS on Linux? That might fit your use case better > because it has the features you're asking for, but at least the ZFS > portion is older and considered more stable. It seemed a bit over the top on a single disk and 4GB of (not even ECC) RAM. Between btrfs' heavy development and zfsonlinux being stable but needing potentially less stable Solaris-glue and having no distro-side support I thought I'd try btrfs first. Regards, Christian Pernegger ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: "WARNING: device 0 not present" during scrub? 2016-01-31 12:35 ` Christian Pernegger @ 2016-01-31 18:06 ` Henk Slager 2016-02-01 1:59 ` Duncan 2016-02-01 3:23 ` Chris Murphy 2 siblings, 0 replies; 11+ messages in thread
From: Henk Slager @ 2016-01-31 18:06 UTC (permalink / raw)
To: Christian Pernegger; +Cc: linux-btrfs

> The modules are in the kernel, the progs are in the main archive, it's an option in the installer. It's not the default fs but I couldn't find any indication that it's more or less supported than, say, xfs. Why they've chosen 3.16 (and not 3.18, which would be a long term release) I don't know, but the fact remains that that's the default kernel of a tier 1 distro, so people using it are going to be around for a while.
>
>> But absolutely, of course we hope the problem is gone with the newer
>> version, *that's how file system development works.*
>
> Be that as it may, as I said, that approach doesn't inspire confidence. If I had the vaguest idea about how to reproduce it, sure, but all I have is an apparently lightly corrupted or at the very least glitchy fs (it mounts and unmounts just fine). How would I know if a new kernel helped things?

Boot the board with one of these images (a live one, I would say):
http://download.opensuse.org/tumbleweed/iso/
As of this weekend they carry kernel 4.4.0-2-default and tools 4.3.1. Then report back the result of a btrfs check of the fs.

You might get some (or millions of) false positives from the check with tools 4.3.1 (fixed in v4.4), due to the tools version your fs was created with. In my experience, at least, this is not a problem. But you can also compile the v4.4 tools from
https://github.com/kdave/btrfs-progs
* Re: "WARNING: device 0 not present" during scrub? 2016-01-31 12:35 ` Christian Pernegger 2016-01-31 18:06 ` Henk Slager @ 2016-02-01 1:59 ` Duncan 2016-02-01 3:23 ` Chris Murphy 2 siblings, 0 replies; 11+ messages in thread From: Duncan @ 2016-02-01 1:59 UTC (permalink / raw) To: linux-btrfs Christian Pernegger posted on Sun, 31 Jan 2016 13:35:58 +0100 as excerpted: > On 31 January 2016 at 02:42, Chris Murphy <lists@colorremedies.com> > wrote: >> On Sat, Jan 30, 2016 at 2:19 PM, Christian Pernegger It maybe be stable >> for Debian but is Debian explicitly supporting Btrfs with this release? >> I don't think they are. > > The modules are in the kernel, the progs are in the main archive, it's > an option in the installer. It's not the default fs but I couldn't find > any indication that it's more or less supported than, say, xfs. > Why they've chosen 3.16 (and not 3.18, which would be a long term > release) I don't know, but the fact remains that that's the default > kernel of a tier 1 distro, so people using it are going to be around for > a while. [To pernegger@ and list both, as requested.] What the distro wishes to support is of course up to the distro. See below. >> But absolutely, of course we hope the problem is gone with the newer >> version, *that's how file system development works.* > > Be that as it may, as I said, that approach doesn't inspire confidence. > If I had the vaguest idea about how to reproduce it, sure, but all I > have is an apparently lightly corrupted or at the very least glitchy fs > (it mounts and unmounts just fine). How would I know if a new kernel > helped things? Umm... Because you _try_ it? And if you're not willing to _try_ it, why on earth are you running a still stabilizing, not fully stable and mature, filesystem, where the recommendation is to stay at least reasonably current as there's still bugs being actively fixed? >> I can see how it might seem like it's a reasonable question to just ask >> first, but it really isn't. 
>> There's just so much development happening right now, a developer is not in a great position to think that far back for specific problems and whether yours might be one of them, and in what kernel version it was fixed. *shrug* just doesn't work that way, that's why there are changelogs for every sub kernel version.
>
> I do understand your point of view, but: If a possible fs corruption bug on a widespread (if older) kernel after one month of use and without any discernible cause gets nothing more than *shrug* from this list then btrfs isn't production ready nor ready for any kind of day-to-day use, not because of code maturity but because of that mindset. IMHO the btrfs-genie is too far out of the bottle for that, the wording of the stability status on the wiki much too inviting.

I know of no list regular claiming btrfs is production ready or fully stable. In fact, the general position here is that btrfs is _not_ production ready, and that while btrfs is "stabilizING", it is "not yet fully stable and mature."

Yes, depending on the use-case, btrfs is or can be ready for routine daily use, provided people are aware of the situation and are following the sysadmin's first rule of backups, which in simplest form says that if you don't have at least one backup, then by definition of your (in)action you are defining that data as worth less than the time/hassle/resources necessary to do that backup.

Of course that's the first rule of backups even if you're running on a fully stable and mature filesystem. Because btrfs isn't at that point yet, having at least one backup, and preferably more, matters even more: with btrfs not fully stable and mature, it can't be considered reliable as the primary working copy either. It is more of a test deployment, which effectively makes the first backup the primary working copy, and if that copy isn't itself backed up (thus a second backup), you're still defining the data as of little more than trivial value.
Additionally, given the stability situation, here on this list we generally rather strongly recommend that people run either the latest release or, at the oldest, the one before it, of either the current kernel series or the LTS kernel series. With the just-released 4.4 being an LTS kernel and 4.1 the previous LTS, and with 4.4 also the current series and 4.3 the previous current one, that means for best support here we're now recommending nothing older than the 4.4 or 4.1 LTS kernel series, or the 4.4 or 4.3 current kernel series. With 4.4 so new, though, it's understandable if people are still on the second-back LTS, 3.18, provided they're already working on upgrading to LTS 4.1.

Of course we still do our best if people are running older than that, but because btrfs is still moving fast and older kernels have known bugs that are fixed in newer versions, anything previous to that is ancient history for us, and we're simply not able to support it to the same level we do the recommended kernels. As such, people should expect that as soon as they have a problem, the first thing they're going to be asked to do is upgrade to something newer than the btrfs Paleolithic era (OK, I'm exaggerating a bit, Neolithic, then) and see if the problem is already fixed.

Of course what distros choose to support is up to them, and some are indeed supporting older btrfs, backporting fixes, etc. But in that case people really should be getting their btrfs support from them as well, because they're best positioned to know what fixes they've backported to whatever arbitrary kernel version number they're using, while all we know is what mainline code of a comparable version was like.

Then of course there are the userspace tools, btrfs-progs.
While on a normal runtime kernel the kernel code is what counts as userspace primarily simply makes calls to the kernel and the kernel does all the work, as soon as you're using userspace to try to work with an offline btrfs, btrfs check, btrfs restore, etc, it's userspace code doing the work, and then running current userspace becomes critical, as again, it has all the bugfixes that older versions lack. While both kernels and userspace are designed to work with both older and newer versions of the other, a good rule of thumb for userspace is to keep its version at least in sync with your kernel version. That way, provided you're following the kernel recommendations of no more than one LTS kernel series back from the current LTS kernel series, userspace won't get too outdated, either. As for old and stable, yes, there's legitimate reasons to want to run old and stable. However, they tend not to be very compatible with wanting to run a new and still stabilizing filesystem that's not yet mature, since the filesystem code is still moving fast and there are /real/ bugs being fixed every release. Thus, the general recommendation, on-list at least, is to pick one or the other, and if you pick old and stale^h^hble, forget about btrfs for the time being. Again, what your distro may support and whether you choose to use that support is between you and the distro, but then, you really are probably better off actually using that distro support, since they're the ones that know what they've backported, etc, and not the list, where our focus is on further stabilization in reasonably current mainline current or LTS series kernels. -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman ^ permalink raw reply [flat|nested] 11+ messages in thread
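The rule of thumb in this message (keep the btrfs-progs version at least in sync with the kernel version) can be checked mechanically. The sketch below is a hypothetical helper, not part of btrfs-progs. It assumes the version strings look like the ones quoted in this thread (`3.16.0-4-amd64`, `Btrfs v3.17`) or like the `btrfs-progs v4.x` form printed by newer tools, and it compares only the major and minor numbers.

```shell
#!/bin/sh
# Extract "MAJOR.MINOR" from strings such as "3.16.0-4-amd64",
# "Btrfs v3.17" or "btrfs-progs v4.3.1".
major_minor() {
    printf '%s\n' "$1" |
        sed -n 's/^[^0-9]*\([0-9][0-9]*\)\.\([0-9][0-9]*\).*/\1.\2/p'
}

# Succeed if the progs version ($2) is at least the kernel version ($1).
progs_at_least_kernel() {
    k=$(major_minor "$1")
    p=$(major_minor "$2")
    k_major=${k%%.*}; k_minor=${k#*.}
    p_major=${p%%.*}; p_minor=${p#*.}
    [ "$p_major" -gt "$k_major" ] ||
        { [ "$p_major" -eq "$k_major" ] && [ "$p_minor" -ge "$k_minor" ]; }
}

# Example call on a live system:
#   progs_at_least_kernel "$(uname -r)" "$(btrfs --version)" ||
#       echo "btrfs-progs is older than the running kernel"
```

With the values reported in this thread, the original system (kernel 3.16, progs v3.17) passes this particular check, while the Tumbleweed image mentioned earlier (kernel 4.4.0-2, tools 4.3.1) would not. The check says nothing about whether either version is recent enough for support on the list.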
* Re: "WARNING: device 0 not present" during scrub? 2016-01-31 12:35 ` Christian Pernegger 2016-01-31 18:06 ` Henk Slager 2016-02-01 1:59 ` Duncan @ 2016-02-01 3:23 ` Chris Murphy 2 siblings, 0 replies; 11+ messages in thread
From: Chris Murphy @ 2016-02-01 3:23 UTC (permalink / raw)
To: Christian Pernegger; +Cc: linux-btrfs

On Sun, Jan 31, 2016 at 5:35 AM, Christian Pernegger <pernegger@gmail.com> wrote:
> On 31 January 2016 at 02:42, Chris Murphy <lists@colorremedies.com> wrote:
>> On Sat, Jan 30, 2016 at 2:19 PM, Christian Pernegger
>> It maybe be stable for Debian but is Debian explicitly supporting
>> Btrfs with this release? I don't think they are.
>
> The modules are in the kernel, the progs are in the main archive, it's an option in the installer. It's not the default fs but I couldn't find any indication that it's more or less supported than, say, xfs. Why they've chosen 3.16 (and not 3.18, which would be a long term release) I don't know, but the fact remains that that's the default kernel of a tier 1 distro, so people using it are going to be around for a while.

The Debian wiki on Btrfs basically defers to upstream. And upstream Btrfs recommends using newer kernels than this. Part of it is that there have been literally thousands of changes, and hundreds of bugs have been discovered and fixed since that kernel version. Another part is that there is so much change that no one likely has any idea how to cross-reference the changes with your particular problem. So the request is to use something newer because it's a practical compromise. Dollars to donuts, only a developer would know such details, and yet surely such a detail is lost among thousands of others because by now 3.16 is ancient history.

At the very least, you should find a way to run 'btrfs check' from btrfs-progs 4.4 (without --repair) against this volume, and report the results. That's safe. The easiest way I can think of to do it is a Fedora nightly.
I just tested this one: https://kojipkgs.fedoraproject.org/mash/rawhide-20160130/rawhide/x86_64/os/images/boot.iso It has kernel 4.4rc1+ and btrfs-progs 4.4. You can boot from the troubleshooting menu, rescue option, and choose option 3 "Skip to shell" and then run btrfs check, again without --repair. This ISO boots BIOS and UEFI systems, just dd it to a stick. If that comes up clean you can even mount the volume and scrub it (the scrub code is kernel code even though it's activated by user space tools; whereas the fsck is in the user space tools). -- Chris Murphy ^ permalink raw reply [flat|nested] 11+ messages in thread
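
The sequence suggested in this message (check without --repair first, then mount and scrub only if the check comes up clean) could be scripted roughly as follows. This is a sketch under assumptions: check_then_scrub and the RUN switch are hypothetical names, the device and mount point are placeholders, and it assumes the exit status of btrfs check reflects what it found, which is worth confirming by reading its output.

```shell
#!/bin/sh
# Sketch only. RUN is a hypothetical dry-run switch: left at its default
# ("echo") the commands are only printed; set RUN="" to really run them
# (as root, from the rescue shell).
RUN=${RUN-echo}

# check_then_scrub DEVICE MOUNTPOINT (illustrative helper name)
check_then_scrub() {
    dev=$1
    mnt=$2
    # Plain "btrfs check" does not modify the filesystem; no --repair.
    $RUN btrfs check "$dev" || return 1
    # Only reached if the check succeeded.
    $RUN mount "$dev" "$mnt" || return 1
    # -B keeps the scrub in the foreground so its result is visible.
    $RUN btrfs scrub start -B "$mnt"
}

# Example (placeholders): check_then_scrub /dev/sdX /mnt
```

Stopping at the first failing step keeps the script from mounting a volume that the check has already complained about.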
* Re: "WARNING: device 0 not present" during scrub?
From: Chris Murphy @ 2016-01-31 1:09 UTC
To: Christian Pernegger; +Cc: Btrfs BTRFS

On Sat, Jan 30, 2016 at 4:59 AM, Christian Pernegger <pernegger@gmail.com> wrote:
> parent transid verify failed on 4693971959808 wanted 14495 found 14497
> parent transid verify failed on 4693971959808 wanted 14495 found 14497
> parent transid verify failed on 4693971959808 wanted 14495 found 14497
> parent transid verify failed on 4693971959808 wanted 14495 found 14497

Well, it's not that far off, so mounting with -o recovery should work.

> Ignoring transid failure
> print-tree.c:1074: btrfs_print_tree: Assertion failed.
> btrfs-debug-tree[0x410489]
> btrfs-debug-tree[0x411dbf]
> btrfs-debug-tree[0x402adb]
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f925b1ccb45]
> btrfs-debug-tree[0x402d85]

It could be a bug, and if so there's a good chance it's fixed in newer versions of btrfs-progs.

>
> Ouch.
>
> This is on a 1-month-old Debian stable (jessie) install and yes, I
> know that means the kernel and btrfs-progs are ancient but I'd still
> very much appreciate some help. It's a backup box, so the data isn't
> critical, but of course I need it stable in the long run.

Sorry, you need it to be stable but you're using an EOL unsupported kernel? That just doesn't square. It's either a hardware problem (there are many softreset messages, possibly more ata instances than you have attached devices, and no Btrfs errors), or it's a software bug. Either way you kinda need to try something newer to see if the problem has since been fixed, because it's in the realm of 10,000 changes (probably more) since the kernel version you're using. There might be 4 people on the list who'd maybe recognize this and say, yes, in fact that was fixed in a newer kernel. So really, no matter what, you just have to upgrade.

> Is it
> possible to fix this and prevent it from happening again? (How) can I
> verify if the data is still good?

The user space program crashed either due to a bug or because it ran out of memory. Maybe increase swap size; sometimes that helps btrfs check and btrfs-debug-tree go farther without problems.

Try mounting with -o recovery. If that doesn't work try -o recovery,ro, and if that doesn't work then try btrfs check (without repair), using kernel and progs no older than 4.1.15. It's middle aged in Btrfs terms, but at least that's a longterm, currently maintained kernel.

> If the verdict is that I have to
> re-roll the box I wouldn't go with btrfs again at this time, but still
> be willing to help with debugging first, if anyone is interested.

I can almost guarantee that if -o recovery does not work, no one will want to suggest anything more aggressive if you also aren't willing to upgrade kernel and tools to something much newer. Really, if you need Btrfs to be stable, you need to use a distro that makes it easy for you to get the latest bug fixes, not just features.

-- 
Chris Murphy
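
The escalation order described in this message (-o recovery, then -o recovery,ro, then a read-only btrfs check) can be written down as a small fallback chain. Treat it as a sketch only: try_recovery_mount and the RUN switch are hypothetical names, the device and mount point are placeholders, and nothing here goes beyond the three steps the message lists.

```shell
#!/bin/sh
# Sketch only. RUN is a hypothetical dry-run switch: left at its default
# ("echo") the commands are only printed; set RUN="" to really run them.
RUN=${RUN-echo}

# try_recovery_mount DEVICE MOUNTPOINT (illustrative helper name)
try_recovery_mount() {
    dev=$1
    mnt=$2
    # First read-write recovery, then read-only recovery.
    for opts in recovery recovery,ro; do
        if $RUN mount -o "$opts" "$dev" "$mnt"; then
            echo "mounted with -o $opts"
            return 0
        fi
    done
    # Last step from the message: btrfs check without --repair, run with
    # kernel and progs no older than 4.1.15.
    $RUN btrfs check "$dev"
}

# Example (placeholders): try_recovery_mount /dev/sdX /mnt
```

In dry-run mode the first mount "succeeds" because echo succeeds, so only the first step is printed; the later steps are reached only when a real mount attempt fails.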
* Re: "WARNING: device 0 not present" during scrub?
From: Patrik Lundquist @ 2016-02-01 10:23 UTC
To: Christian Pernegger; +Cc: linux-btrfs@vger.kernel.org

On 30 January 2016 at 12:59, Christian Pernegger <pernegger@gmail.com> wrote:
>
> This is on a 1-month-old Debian stable (jessie) install and yes, I
> know that means the kernel and btrfs-progs are ancient

apt-get install -t jessie-backports linux-image-4.3.0-0.bpo.1-amd64

Or something like that for the image name. Unfortunately there's no stable backport of btrfs-tools (as they call btrfs-progs).

https://tracker.debian.org/pkg/linux
* Re: "WARNING: device 0 not present" during scrub?
From: Nils Steinger @ 2016-03-02 21:50 UTC
To: Patrik Lundquist, Christian Pernegger, linux-btrfs@vger.kernel.org

On 01.02.2016 11:23, Patrik Lundquist wrote:
> apt-get install -t jessie-backports linux-image-4.3.0-0.bpo.1-amd64
>
> Or something like that for the image name. Unfortunately there's no
> stable backport of btrfs-tools (as they call btrfs-progs).

There is now: 4.4-1~bpo8+1

Upgrading from btrfs-tools 3.17 to 4.4 fixed the scrub aborts for me.

Oddly, `btrfs scrub start -B /dev/mapper/foo` terminates with exit code 0 when it aborts due to the "device 0" problem. That's not supposed to happen, is it?

> EXIT STATUS
> btrfs scrub returns a zero exit status if it succeeds. Non zero
> is returned in case of failure.

Regards,
Nils Steinger
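
Since the affected versions reportedly exit with status 0 even when the scrub aborts, a cron job that relies on the exit status alone would miss the problem. One possible workaround, sketched here under assumptions, is to also look for the warning text in the output. scrub_looks_ok and run_scrub_checked are hypothetical helper names, and the sketch assumes the warning appears on stdout or stderr in the form quoted in this thread.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative, not part of btrfs-progs.

# scrub_looks_ok EXIT_STATUS OUTPUT
# Succeed only if the exit status is 0 AND the output does not contain
# the "WARNING: device N not present" message reported in this thread.
scrub_looks_ok() {
    rc=$1
    output=$2
    [ "$rc" -eq 0 ] || return 1
    case $output in
        *"WARNING: device "*" not present"*) return 1 ;;
    esac
    return 0
}

# run_scrub_checked MOUNTPOINT
# Run a foreground scrub, show its output, and apply the check above.
run_scrub_checked() {
    scrub_output=$(btrfs scrub start -B "$1" 2>&1)
    scrub_rc=$?
    printf '%s\n' "$scrub_output"
    scrub_looks_ok "$scrub_rc" "$scrub_output"
}

# Example (placeholder path):
#   run_scrub_checked /mnt || echo "scrub did not complete" >&2
```

Matching on message text is fragile across versions, so this is at best a stopgap for tools that cannot be upgraded.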
end of thread

Thread overview: 11+ messages
2016-01-30 11:59 "WARNING: device 0 not present" during scrub? Christian Pernegger
2016-01-30 20:10 ` Henk Slager
2016-01-30 21:19   ` Christian Pernegger
2016-01-31  1:42     ` Chris Murphy
2016-01-31 12:35       ` Christian Pernegger
2016-01-31 18:06         ` Henk Slager
2016-02-01  1:59         ` Duncan
2016-02-01  3:23         ` Chris Murphy
2016-01-31  1:09 ` Chris Murphy
2016-02-01 10:23 ` Patrik Lundquist
2016-03-02 21:50   ` Nils Steinger