"WARNING: device 0 not present" during scrub?

linux-btrfs.vger.kernel.org archive mirror

* "WARNING: device 0 not present" during scrub?
@ 2016-01-30 11:59 Christian Pernegger
  2016-01-30 20:10 ` Henk Slager
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Christian Pernegger @ 2016-01-30 11:59 UTC (permalink / raw)
  To: linux-btrfs

Hello,

tonight's scrub was cancelled after a "WARNING: device 0 not present".
No other visible errors or abnormalities.

Google dragged up a linux-btrfs discussion from May 2015, but some of
it seems to have happened off list and I couldn't find a resolution. As
running btrfs-debug-tree was suggested there and it seemed
non-invasive, I did:

[...]
fs tree key (FS_TREE ROOT_ITEM 0)
leaf 3903828393984 items 10 free space 15539 generation 9938 owner 5
fs uuid 84a044be-b396-48cf-91dc-c610c0ae11e2
chunk uuid 7e3f121b-c77f-4d60-a560-897f1aa39d07
        item 0 key (256 INODE_ITEM 0) itemoff 16123 itemsize 160
                inode generation 3 transid 9938 size 82 block group 0
mode 40755 links 1 uid 0 gid 0 rdev 0 flags 0x0
        item 1 key (256 INODE_REF 256) itemoff 16111 itemsize 12
                inode ref index 0 namelen 2 name: ..
        item 2 key (256 DIR_ITEM 243075479) itemoff 16066 itemsize 45
                location key (262 ROOT_ITEM -1) type DIR
                namelen 15 datalen 0 name: @mohammed-crypt
        item 3 key (256 DIR_ITEM 606771344) itemoff 16031 itemsize 35
                location key (257 ROOT_ITEM -1) type DIR
                namelen 5 datalen 0 name: @root
        item 4 key (256 DIR_ITEM 1793720662) itemoff 15987 itemsize 44
                location key (3901 ROOT_ITEM -1) type DIR
                namelen 14 datalen 0 name: @backup-legacy
        item 5 key (256 DIR_ITEM 1811406303) itemoff 15950 itemsize 37
                location key (258 ROOT_ITEM -1) type DIR
                namelen 7 datalen 0 name: @backup
        item 6 key (256 DIR_INDEX 5) itemoff 15915 itemsize 35
                location key (257 ROOT_ITEM -1) type DIR
                namelen 5 datalen 0 name: @root
        item 7 key (256 DIR_INDEX 6) itemoff 15878 itemsize 37
                location key (258 ROOT_ITEM -1) type DIR
                namelen 7 datalen 0 name: @backup
        item 8 key (256 DIR_INDEX 7) itemoff 15833 itemsize 45
                location key (262 ROOT_ITEM -1) type DIR
                namelen 15 datalen 0 name: @mohammed-crypt
        item 9 key (256 DIR_INDEX 8) itemoff 15789 itemsize 44
                location key (3901 ROOT_ITEM -1) type DIR
                namelen 14 datalen 0 name: @backup-legacy
checksum tree key (CSUM_TREE ROOT_ITEM 0)
node 4693945303040 level 3 items 5 free 488 generation 14495 owner 7
fs uuid 84a044be-b396-48cf-91dc-c610c0ae11e2
chunk uuid 7e3f121b-c77f-4d60-a560-897f1aa39d07
        key (EXTENT_CSUM EXTENT_CSUM 12582912) block 4693971959808
(286497312) gen 14495
        key (EXTENT_CSUM EXTENT_CSUM 1027063414784) block
4693997813760 (286498890) gen 14490
        key (EXTENT_CSUM EXTENT_CSUM 2054823305216) block
4693998977024 (286498961) gen 14490
        key (EXTENT_CSUM EXTENT_CSUM 3077363499008) block
4693945729024 (286495711) gen 14495
        key (EXTENT_CSUM EXTENT_CSUM 4094043148288) block
4693992472576 (286498564) gen 14490
parent transid verify failed on 4693971959808 wanted 14495 found 14497
parent transid verify failed on 4693971959808 wanted 14495 found 14497
parent transid verify failed on 4693971959808 wanted 14495 found 14497
parent transid verify failed on 4693971959808 wanted 14495 found 14497
Ignoring transid failure
print-tree.c:1074: btrfs_print_tree: Assertion failed.
btrfs-debug-tree[0x410489]
btrfs-debug-tree[0x411dbf]
btrfs-debug-tree[0x402adb]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f925b1ccb45]
btrfs-debug-tree[0x402d85]

Ouch.

This is on a 1-month-old Debian stable (jessie) install and yes, I
know that means the kernel and btrfs-progs are ancient but I'd still
very much appreciate some help. It's a backup box, so the data isn't
critical, but of course I need it stable in the long run. Is it
possible to fix this and prevent it from happening again? (How) can I
verify if the data is still good?  If the verdict is that I have to
re-roll the box I wouldn't go with btrfs again at this time, but still
be willing to help with debugging first, if anyone is interested.

Regards & TIA
Christian Pernegger

P.S.: Please CC me, as I'm not on the list.



Mandatory info:
chris@mrmackey:~$ uname -a
Linux mrmackey 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt20-1+deb8u3
(2016-01-17) x86_64 GNU/Linux

chris@mrmackey:~$ /sbin/btrfs --version
Btrfs v3.17

chris@mrmackey:~$ sudo btrfs fi show
Label: 'root'  uuid: 84a044be-b396-48cf-91dc-c610c0ae11e2
        Total devices 1 FS bytes used 4.46TiB
        devid    1 size 5.46TiB used 4.68TiB path /dev/mapper/sda3_crypt

Btrfs v3.17

chris@mrmackey:~$ sudo btrfs fi df /mnt/btrfsroot/
Data, single: total=4.67TiB, used=4.45TiB
System, DUP: total=8.00MiB, used=528.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=6.50GiB, used=5.07GiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=0.00B

[-- Attachment #2: dmesg.log.gz --]
[-- Type: application/x-gzip, Size: 17601 bytes --]


* Re: "WARNING: device 0 not present" during scrub?
  2016-01-30 11:59 "WARNING: device 0 not present" during scrub? Christian Pernegger
@ 2016-01-30 20:10 ` Henk Slager
  2016-01-30 21:19   ` Christian Pernegger
  2016-01-31  1:09 ` Chris Murphy
  2016-02-01 10:23 ` Patrik Lundquist
  2 siblings, 1 reply; 11+ messages in thread
From: Henk Slager @ 2016-01-30 20:10 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Christian Pernegger

On Sat, Jan 30, 2016 at 12:59 PM, Christian Pernegger
<pernegger@gmail.com> wrote:
> Hello,
>
> tonight's scrub was cancelled after a "WARNING: device 0 not present".
> No other visible errors or abnormalities.
>
> Google dragged up a linux-btrfs discussion from May 2015, but some of
> it seems to have happened off list and I couldn't find a resolution. As

It is probably this discussion:
 http://www.spinics.net/lists/linux-btrfs/msg43755.html
I see it involves the same tools version as yours, but a newer kernel.

> running btrfs-debug-tree was suggested there and it seemed
> non-invasive, I did:
>
> [...]
> fs tree key (FS_TREE ROOT_ITEM 0)
> leaf 3903828393984 items 10 free space 15539 generation 9938 owner 5
> fs uuid 84a044be-b396-48cf-91dc-c610c0ae11e2
> chunk uuid 7e3f121b-c77f-4d60-a560-897f1aa39d07
>         item 0 key (256 INODE_ITEM 0) itemoff 16123 itemsize 160
>                 inode generation 3 transid 9938 size 82 block group 0
> mode 40755 links 1 uid 0 gid 0 rdev 0 flags 0x0
>         item 1 key (256 INODE_REF 256) itemoff 16111 itemsize 12
>                 inode ref index 0 namelen 2 name: ..
>         item 2 key (256 DIR_ITEM 243075479) itemoff 16066 itemsize 45
>                 location key (262 ROOT_ITEM -1) type DIR
>                 namelen 15 datalen 0 name: @mohammed-crypt
>         item 3 key (256 DIR_ITEM 606771344) itemoff 16031 itemsize 35
>                 location key (257 ROOT_ITEM -1) type DIR
>                 namelen 5 datalen 0 name: @root
>         item 4 key (256 DIR_ITEM 1793720662) itemoff 15987 itemsize 44
>                 location key (3901 ROOT_ITEM -1) type DIR
>                 namelen 14 datalen 0 name: @backup-legacy
>         item 5 key (256 DIR_ITEM 1811406303) itemoff 15950 itemsize 37
>                 location key (258 ROOT_ITEM -1) type DIR
>                 namelen 7 datalen 0 name: @backup
>         item 6 key (256 DIR_INDEX 5) itemoff 15915 itemsize 35
>                 location key (257 ROOT_ITEM -1) type DIR
>                 namelen 5 datalen 0 name: @root
>         item 7 key (256 DIR_INDEX 6) itemoff 15878 itemsize 37
>                 location key (258 ROOT_ITEM -1) type DIR
>                 namelen 7 datalen 0 name: @backup
>         item 8 key (256 DIR_INDEX 7) itemoff 15833 itemsize 45
>                 location key (262 ROOT_ITEM -1) type DIR
>                 namelen 15 datalen 0 name: @mohammed-crypt
>         item 9 key (256 DIR_INDEX 8) itemoff 15789 itemsize 44
>                 location key (3901 ROOT_ITEM -1) type DIR
>                 namelen 14 datalen 0 name: @backup-legacy
> checksum tree key (CSUM_TREE ROOT_ITEM 0)
> node 4693945303040 level 3 items 5 free 488 generation 14495 owner 7
> fs uuid 84a044be-b396-48cf-91dc-c610c0ae11e2
> chunk uuid 7e3f121b-c77f-4d60-a560-897f1aa39d07
>         key (EXTENT_CSUM EXTENT_CSUM 12582912) block 4693971959808
> (286497312) gen 14495
>         key (EXTENT_CSUM EXTENT_CSUM 1027063414784) block
> 4693997813760 (286498890) gen 14490
>         key (EXTENT_CSUM EXTENT_CSUM 2054823305216) block
> 4693998977024 (286498961) gen 14490
>         key (EXTENT_CSUM EXTENT_CSUM 3077363499008) block
> 4693945729024 (286495711) gen 14495
>         key (EXTENT_CSUM EXTENT_CSUM 4094043148288) block
> 4693992472576 (286498564) gen 14490
> parent transid verify failed on 4693971959808 wanted 14495 found 14497
> parent transid verify failed on 4693971959808 wanted 14495 found 14497
> parent transid verify failed on 4693971959808 wanted 14495 found 14497
> parent transid verify failed on 4693971959808 wanted 14495 found 14497
> Ignoring transid failure
> print-tree.c:1074: btrfs_print_tree: Assertion failed.
> btrfs-debug-tree[0x410489]
> btrfs-debug-tree[0x411dbf]
> btrfs-debug-tree[0x402adb]
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f925b1ccb45]
> btrfs-debug-tree[0x402d85]

I haven't actively used btrfs-debug-tree myself, but tools of this
kind do crash at times. They sometimes use or claim many gigabytes of
memory, which may be the reason for the crash here.
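
For anyone wanting to condense output like the above, here is a minimal sketch that counts the distinct "parent transid verify failed" messages. As far as I understand it, "wanted" is the generation the parent node recorded for the block and "found" is the generation stored in the block itself. The function name is made up for illustration.

```shell
#!/bin/sh
# Sketch: summarize "parent transid verify failed" lines read from stdin.
# Prints one row per distinct failure: <block> <wanted> <found> <count>
# Hypothetical usage: summarize_transid_failures < saved-debug-tree-output.txt

summarize_transid_failures() {
    awk '/parent transid verify failed on/ {
        # Scan the fields so that quoted lines ("> parent transid ...") also work.
        for (i = 1; i <= NF; i++) {
            if ($i == "on")     block  = $(i + 1)
            if ($i == "wanted") wanted = $(i + 1)
            if ($i == "found")  found  = $(i + 1)
        }
        key = block " " wanted " " found
        count[key]++
    }
    END {
        for (key in count) print key, count[key]
    }'
}
```

Fed the four lines quoted above, this should print a single row for block 4693971959808 with wanted 14495, found 14497 and a count of 4, i.e. the same block failing repeatedly rather than several different blocks.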

Can you mount the fs (readonly)?

But you could do a standard check first:
 unmount and run   btrfs check -p /dev/mapper/sda3_crypt
It's read-only by default, and it could give some idea of whether the
fs is too badly damaged or not.
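
As a rough sketch of that sequence: the wrapper below only prints the two steps unless RUN=1 is set, so it can be looked over before anything touches the disk. The function name and the RUN switch are my own invention; the device and mount point are the ones from the report. It would have to be run from a system where this filesystem is not in use.

```shell
#!/bin/sh
# Sketch: offline, read-only check of a btrfs device.
# Default is a dry run that prints the steps; set RUN=1 to execute them.

offline_check() {
    dev=$1
    mnt=$2
    for step in "umount $mnt" "btrfs check -p $dev"; do
        if [ "${RUN:-0}" = "1" ]; then
            $step || return 1      # stop if unmounting or checking fails
        else
            echo "would run: $step"
        fi
    done
}

offline_check /dev/mapper/sda3_crypt /mnt/btrfsroot
```

Without RUN=1 this prints two "would run:" lines and changes nothing.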

> This is on a 1-month-old Debian stable (jessie) install and yes, I
> know that means the kernel and btrfs-progs are ancient but I'd still
> very much appreciate some help. It's a backup box, so the data isn't
> critical, but of course I need it stable in the long run. Is it
> possible to fix this and prevent it from happening again? (How) can I
> verify if the data is still good?  If the verdict is that I have to
> re-roll the box I wouldn't go with btrfs again at this time, but still
> be willing to help with debugging first, if anyone is interested.

I think there is a relation between the many ata2 messages and this
scrub failure. It looks like in this case scrub wants to do its
work, but the drive or some part of the stack has not yet come out of
its sleep mode. So for some moments, the btrfs kernel code state and
the drive (devid 1) are not in sync. This might also have happened on
other occasions in the last month, so the fs might be more damaged
than currently known. Hence the suggestion to do a normal check. You
can use brute-force rsync -c (and more, see the manpage) to validate
your data, assuming your source data isn't on btrfs.
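
A minimal sketch of such a brute-force comparison, assuming the backup holds a plain copy of the source tree. The function name is hypothetical and so are any paths you pass to it. -c makes rsync compare by checksum rather than by size and mtime, and -n keeps it a dry run so nothing is copied.

```shell
#!/bin/sh
# Sketch: list files whose content differs between a source tree and its backup.
#   -r  recurse into directories
#   -c  decide by full-file checksum, not size + modification time
#   -n  dry run: report only, transfer nothing
# --out-format='%n' prints just the name of each file that would be transferred.
# Hypothetical usage: list_checksum_mismatches /path/to/source /path/to/backup

list_checksum_mismatches() {
    src=$1
    dst=$2
    rsync -rcn --out-format='%n' "$src"/ "$dst"/
}
```

Files listed are those missing from the backup or differing in content. Every file gets read in full on both sides, which is why this is slow on a multi-TiB backup.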

A workaround might be to disable PM for the system, or to have the
block device mounted only when you back up/write to it. An obvious
piece of advice is to use a 4.4 kernel and tools. Debian 'stable'
doesn't mean that every piece of the kernel and tooling fits that
'stamp'.

One way to keep a btrfs-based backup box stable in the long run is to
use a reasonably new kernel. There has been a lot of improvement in
btrfs from 3.16 to 4.4, and 3.16 is no longer supported by kernel.org
or this list. Maybe you could switch to a rolling-release Linux
distro or just update the Debian kernel.

But the more fundamental question is: why do you use btrfs? What features
do you need that ext4 or xfs or reiserfs don't have?

> Regards & TIA
> Christian Pernegger
>
> P.S.: Please CC me, as I'm not on the list.
>
>
>
> Mandatory info:
> chris@mrmackey:~$ uname -a
> Linux mrmackey 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt20-1+deb8u3
> (2016-01-17) x86_64 GNU/Linux
>
> chris@mrmackey:~$ /sbin/btrfs --version
> Btrfs v3.17
>
> chris@mrmackey:~$ sudo btrfs fi show
> Label: 'root'  uuid: 84a044be-b396-48cf-91dc-c610c0ae11e2
>         Total devices 1 FS bytes used 4.46TiB
>         devid    1 size 5.46TiB used 4.68TiB path /dev/mapper/sda3_crypt
>
> Btrfs v3.17
>
> chris@mrmackey:~$ sudo btrfs fi df /mnt/btrfsroot/
> Data, single: total=4.67TiB, used=4.45TiB
> System, DUP: total=8.00MiB, used=528.00KiB
> System, single: total=4.00MiB, used=0.00B
> Metadata, DUP: total=6.50GiB, used=5.07GiB
> Metadata, single: total=8.00MiB, used=0.00B
> GlobalReserve, single: total=512.00MiB, used=0.00B


* Re: "WARNING: device 0 not present" during scrub?
  2016-01-30 20:10 ` Henk Slager
@ 2016-01-30 21:19   ` Christian Pernegger
  2016-01-31  1:42     ` Chris Murphy
  0 siblings, 1 reply; 11+ messages in thread
From: Christian Pernegger @ 2016-01-30 21:19 UTC (permalink / raw)
  To: linux-btrfs

On 30 January 2016 at 21:10, Henk Slager <eye1tm@gmail.com> wrote:
> Can you mount the fs (readonly)?

No idea, it's still mounted (rw even), aside from the scrub failing
and debug-tree crashing I wouldn't know anything was amiss. I was kind
of reluctant to shut the machine down lest it then wouldn't come up at
all.

>  unmount and run a   btrfs check -p /dev/mapper/sda3_crypt

That would mean shutting it down and booting from a rescue image on
USB (any suggestions for something with a recent kernel and progs?).
That's fine of course, if there's nothing more to be gleaned from the
running system.

> I think there is a relation between the many ata2 messages and this
> scrub failure.

There's exactly one of these errors on every resume from suspend, I'd
assumed it's just the disk being slow to wake up. Even if they aren't
benign, I made sure beforehand that the box did not sleep during the
scrub and according to the logs it didn't.
Suspend-resume and/or systemd are still likely culprits of course.

> You can use brute-force rsync -c (and more, see manpage) to validate your
> data, assuming your sourcedata isn't on btrfs.

The data that I can verify, i.e. where the source machines still have
the version from the current backup, checks out.

> A workaround might be to disable PM for the system,

The system's supposed to wake up once daily (nightly), pull in
rdiff-backups from a few others and go back to sleep 20 min later.
Keeping it awake 24/7 is a no-go noise and cost-wise. (For testing /
debugging, sure, just not in the long run.)

> An obvious piece of advice is to use a 4.4 kernel and tools. Debian 'stable' doesn't mean
> that every piece of the kernel and tooling fits that 'stamp'. [...] Maybe you could switch
> to a rolling release linux distro or just update the debian kernel.

Using Debian stable usually means that once something is set up and
works it keeps working until the hardware dies with little to no user
interaction. For something that sits in a corner and pulls in backups
that suits me just fine. If there's a specific reason to update the
kernel and btrfs-progs, it's easily done of course, but "let's hope it
has gone away with the newer version" doesn't inspire me with
confidence on its own.

> But the more fundamental question is why you use btrfs? What features
> do you need that ext4 or xfs or reiserfs don't have?

Data checksumming. I don't mind a bit flipping here or there in old
backups / archives but I'd have liked to know if something went bad
and which files were affected. Compression. Dedup that works on mortal
hardware. To a lesser degree, subvolumes.
Also I wanted to get familiar with the next big thing in Linux file
systems. :-) My bigger boxes use md + dm-crypt + lvm + manual
checksumming and the moment I can replace that (or part of it) with
something integrated, I will. Once the resilience and fault tolerance
is there. (The other day md-raid10 was so unfazed by what must have
been a disk with a half-dead controller that it took me half a day to
find out which one it was ...)
I was fully aware that I might run into trouble, I just didn't expect
it to take less than a month and/or happen without provocation.

The current install is expendable, even though it irks me to have to
redo it (I didn't back that up, wanted to get it just right first), but
I'd really like to find and fix the problem before I do, otherwise I
might be back to square one in a month or so ...

Cheers,
C.


* Re: "WARNING: device 0 not present" during scrub?
  2016-01-30 21:19   ` Christian Pernegger
@ 2016-01-31  1:42     ` Chris Murphy
  2016-01-31 12:35       ` Christian Pernegger
  0 siblings, 1 reply; 11+ messages in thread
From: Chris Murphy @ 2016-01-31  1:42 UTC (permalink / raw)
  To: Christian Pernegger; +Cc: linux-btrfs

On Sat, Jan 30, 2016 at 2:19 PM, Christian Pernegger
<pernegger@gmail.com> wrote:

>
>> An obvious piece of advice is to use a 4.4 kernel and tools. Debian 'stable' doesn't mean
>> that every piece of the kernel and tooling fits that 'stamp'. [...] Maybe you could switch
>> to a rolling release linux distro or just update the debian kernel.
>
> Using Debian stable usually means that once something is set up and
> works it keeps working until the hardware dies with little to no user
> interaction. For something that sits in a corner and pulls in backups
> that suits me just fine. If there's a specific reason to update the
> kernel and btrfs-progs, it's easily done of course, but "let's hope it
> has gone away with the newer version" doesn't inspire me with
> confidence on its own.

It may be stable for Debian, but is Debian explicitly supporting
Btrfs with this release? I don't think they are. In which case, it's
at the very least the wrong kernel version. The only distro explicitly
supporting Btrfs is openSUSE. So if you need Btrfs in particular to be
stable, and you don't want to have to think quite as much about
kernels, you could consider that.

But absolutely, of course we hope the problem is gone with the newer
version, *that's how file system development works.* If it hasn't, and
you reproduce the problem with kernel 4.4, then that means you've
found a new bug that needs to be fixed. Even then, it would first
get fixed in 4.5 or newer, if at all, before being backported to older kernels.
That's how it goes.

I can see how it might seem like it's a reasonable question to just
ask first, but it really isn't. There's just so much development
happening right now, a developer is not in a great position to think
that far back for specific problems and whether yours might be one of
them, and in what kernel version it was fixed. *shrug* just doesn't
work that way, that's why there are changelogs for every sub kernel
version.

>> But the more fundamental question is why you use btrfs? What features
>> do you need that ext4 or xfs or reiserfs don't have?
>
> Data checksumming. I don't mind a bit flipping here or there in old
> backups / archives but I'd have liked to know if something went bad
> and which files were affected. Compression. Dedup that works on mortal
> hardware.

Have you checked out ZFS on Linux? That might fit your use case better
because it has the features you're asking for, but at least the ZFS
portion is older and considered more stable.

-- 
Chris Murphy


* Re: "WARNING: device 0 not present" during scrub?
  2016-01-31  1:42     ` Chris Murphy
@ 2016-01-31 12:35       ` Christian Pernegger
  2016-01-31 18:06         ` Henk Slager
                           ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Christian Pernegger @ 2016-01-31 12:35 UTC (permalink / raw)
  To: linux-btrfs

On 31 January 2016 at 02:42, Chris Murphy <lists@colorremedies.com> wrote:
> On Sat, Jan 30, 2016 at 2:19 PM, Christian Pernegger
> It may be stable for Debian but is Debian explicitly supporting
> Btrfs with this release? I don't think they are.

The modules are in the kernel, the progs are in the main archive, it's
an option in the installer. It's not the default fs but I couldn't
find any indication that it's more or less supported than, say, xfs.
Why they've chosen 3.16 (and not 3.18, which would be a long term
release) I don't know, but the fact remains that that's the default
kernel of a tier 1 distro, so people using it are going to be around
for a while.

> But absolutely, of course we hope the problem is gone with the newer
> version, *that's how file system development works.*

Be that as it may, as I said, that approach doesn't inspire
confidence. If I had the vaguest idea about how to reproduce it, sure,
but all I have is an apparently lightly corrupted or at the very least
glitchy fs (it mounts and unmounts just fine). How would I know if a
new kernel helped things?

> I can see how it might seem like it's a reasonable question to just
> ask first, but it really isn't. There's just so much development
> happening right now, a developer is not in a great position to think
> that far back for specific problems and whether yours might be one of
> them, and in what kernel version it was fixed. *shrug* just doesn't
> work that way, that's why there are changelogs for every sub kernel
> version.

I do understand your point of view, but: If a possible fs corruption
bug on a widespread (if older) kernel after one month of use and
without any discernible cause gets nothing more than *shrug* from this
list then btrfs isn't production ready nor ready for any kind of
day-to-day use, not because of code maturity but because of that
mindset. IMHO the btrfs-genie is too far out of the bottle for that,
the wording of the stability status on the wiki much too inviting.

Anyway, I knew what I was getting into, so I'll just chalk it up to
experience and move on. Keep up the good work!

> Have you checked out ZFS on Linux? That might fit your use case better
> because it has the features you're asking for, but at least the ZFS
> portion is older and considered more stable.

It seemed a bit over the top on a single disk and 4GB of (not even
ECC) RAM. Between btrfs' heavy development and zfsonlinux being stable
but needing potentially less stable Solaris-glue and having no
distro-side support I thought I'd try btrfs first.

Regards,
Christian Pernegger


* Re: "WARNING: device 0 not present" during scrub?
  2016-01-31 12:35       ` Christian Pernegger
@ 2016-01-31 18:06         ` Henk Slager
  2016-02-01  1:59         ` Duncan
  2016-02-01  3:23         ` Chris Murphy
  2 siblings, 0 replies; 11+ messages in thread
From: Henk Slager @ 2016-01-31 18:06 UTC (permalink / raw)
  To: Christian Pernegger; +Cc: linux-btrfs

> The modules are in the kernel, the progs are in the main archive, it's
> an option in the installer. It's not the default fs but I couldn't
> find any indication that it's more or less supported than, say, xfs.
> Why they've chosen 3.16 (and not 3.18, which would be a long term
> release) I don't know, but the fact remains that that's the default
> kernel of a tier 1 distro, so people using it are going to be around
> for a while.
>
>> But absolutely, of course we hope the problem is gone with the newer
>> version, *that's how file system development works.*
>
> Be that as it may, as I said, that approach doesn't inspire
> confidence. If I had the vaguest idea about how to reproduce it, sure,
> but all I have is an apparently lightly corrupted or at the very least
> glitchy fs (it mounts and unmounts just fine). How would I know if a
> new kernel helped things?

Boot the board with one of these images (a live one I would say):
http://download.opensuse.org/tumbleweed/iso/
As of this weekend, that means kernel 4.4.0-2-default and tools 4.3.1.

Then report back the result of a btrfs check of the fs.

You might get some (or millions of) false positives from the check with
tools 4.3.1 (fixed in v4.4), due to the tools version your fs was
created with. This is not a problem, at least in my experience.
But you can compile the v4.4 tools from
https://github.com/kdave/btrfs-progs


* Re: "WARNING: device 0 not present" during scrub?
  2016-01-31 12:35       ` Christian Pernegger
  2016-01-31 18:06         ` Henk Slager
@ 2016-02-01  1:59         ` Duncan
  2016-02-01  3:23         ` Chris Murphy
  2 siblings, 0 replies; 11+ messages in thread
From: Duncan @ 2016-02-01  1:59 UTC (permalink / raw)
  To: linux-btrfs

Christian Pernegger posted on Sun, 31 Jan 2016 13:35:58 +0100 as
excerpted:

> On 31 January 2016 at 02:42, Chris Murphy <lists@colorremedies.com>
> wrote:
>> On Sat, Jan 30, 2016 at 2:19 PM, Christian Pernegger It may be stable
>> for Debian but is Debian explicitly supporting Btrfs with this release?
>> I don't think they are.
> 
> The modules are in the kernel, the progs are in the main archive, it's
> an option in the installer. It's not the default fs but I couldn't find
> any indication that it's more or less supported than, say, xfs.
> Why they've chosen 3.16 (and not 3.18, which would be a long term
> release) I don't know, but the fact remains that that's the default
> kernel of a tier 1 distro, so people using it are going to be around for
> a while.

[To pernegger@ and list both, as requested.]

What the distro wishes to support is of course up to the distro.  See 
below.

>> But absolutely, of course we hope the problem is gone with the newer
>> version, *that's how file system development works.*
> 
> Be that as it may, as I said, that approach doesn't inspire confidence.
> If I had the vaguest idea about how to reproduce it, sure, but all I
> have is an apparently lightly corrupted or at the very least glitchy fs
> (it mounts and unmounts just fine). How would I know if a new kernel
> helped things?

Umm... Because you _try_ it?

And if you're not willing to _try_ it, why on earth are you running a 
still stabilizing, not fully stable and mature, filesystem, where the 
recommendation is to stay at least reasonably current as there are still
bugs being actively fixed?

>> I can see how it might seem like it's a reasonable question to just ask
>> first, but it really isn't. There's just so much development happening
>> right now, a developer is not in a great position to think that far
>> back for specific problems and whether yours might be one of them, and
>> in what kernel version it was fixed. *shrug* just doesn't work that
>> way, that's why there are changelogs for every sub kernel version.
> 
> I do understand your point of view, but: If a possible fs corruption bug
> on a widespread (if older) kernel after one month of use and without any
> discernible cause gets nothing more than *shrug* from this list then
> btrfs isn't production ready nor ready for any kind of day-to-day use,
> not because of code maturity but because of that mindset. IMHO the
> btrfs-genie is too far out of the bottle for that,
> the wording of the stability status on the wiki much too inviting.

I know of no list regular claiming btrfs is production ready or fully 
stable.  In fact, the general position here is that btrfs is _not_ 
production ready, and that while btrfs is "stabilizING", it is "not yet 
fully stable and mature."

Yes, depending on the use-case, btrfs is or can be ready for routine 
daily use, provided people are aware of the situation, and are following 
the sysadmin's first rule of backups, which in simplest form says that if 
you don't have at least one backup, by definition of your (in)action, you 
are defining that data as worth less than the time/hassle/resources 
necessary to do that backup.

Of course that's the first rule of backups even if you're running on a
fully stable and mature filesystem. Because btrfs isn't at that point
yet, you should have at least one backup, and preferably more: with btrfs
not fully stable and mature, it can't be considered reliable as the
primary working copy either, more a test deployment. That effectively
makes the first backup the primary working copy, which means that if it
isn't itself backed up (thus a second backup), you're still defining the
data as of little more than trivial value.

Additionally, given the stability situation, here on this list we
generally rather strongly recommend that people run either the latest or,
at the oldest, the first back of either the current kernel series or the
LTS kernel series.  With the just-released 4.4 an LTS kernel and 4.1 the
previous LTS, and of course 4.4 current and 4.3 the previous current,
that means for best support here we're now recommending no older than
the 4.4 or 4.1 LTS kernel series, or the 4.4 or 4.3 current kernel
series, tho with 4.4 so new, it's understandable if people are still on
the second-back LTS, 3.18, provided they're already working on upgrading
to LTS-4.1.

Of course we still do our best if people are running older than that, but 
because btrfs is still moving fast and older kernels have known bugs that 
are fixed in newer versions, previous to that is ancient history for us, 
and we're simply not able to support it to the same level we do the 
recommended kernels.  As such, people should expect that as soon as they 
have a problem, the first thing they're going to be asked to do is 
upgrade to something newer than the btrfs Paleolithic era (OK, I'm 
exaggerating a bit, Neolithic, then) and see if the problem is already 
fixed.

Of course what distros choose to support is up to them, and some are 
indeed supporting older btrfs, backporting fixes, etc.  But in that case 
people really should be getting their btrfs support from them as well, 
because they're best positioned to know what fixes they've backported to 
whatever arbitrary kernel version number they're using, while all we know 
is what mainline code of a comparable version was like.

Then of course there's the userspace tools, btrfs-progs.  On a
normally running system the kernel code is what counts, as userspace
primarily just makes calls to the kernel and the kernel does all the
work.  But as soon as you're using userspace to work with an offline
btrfs (btrfs check, btrfs restore, etc.), it's userspace code doing the
work, and then running current userspace becomes critical, as again, it
has all the bugfixes that older versions lack.  While both kernels and
userspace are designed to work with both older and newer versions of the 
other, a good rule of thumb for userspace is to keep its version at least 
in sync with your kernel version.  That way, provided you're following 
the kernel recommendations of no more than one LTS kernel series back 
from the current LTS kernel series, userspace won't get too outdated, 
either.
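
That rule of thumb can be sketched as a small check, assuming version strings of the kind shown earlier in this thread ("3.16.0-4-amd64" from uname, "Btrfs v3.17" from btrfs --version). Both function names are made up for illustration, and only major.minor is compared.

```shell
#!/bin/sh
# Sketch: is btrfs-progs at least as new as the running kernel?
# Hypothetical usage: progs_in_sync "$(uname -r)" "$(btrfs --version)"

# Extract "major.minor" from strings such as "3.16.0-4-amd64" or "Btrfs v3.17".
major_minor() {
    printf '%s\n' "$1" | sed -n 's/^[^0-9]*\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Succeed (exit 0) when the progs version is not older than the kernel version.
progs_in_sync() {
    kernel=$(major_minor "$1")
    progs=$(major_minor "$2")
    # sort -V orders version numbers; the first line is the older of the two.
    oldest=$(printf '%s\n%s\n' "$kernel" "$progs" | sort -V | head -n 1)
    [ "$oldest" = "$kernel" ]
}
```

By this measure the reported combination (kernel 3.16, progs v3.17) is in sync with itself, so the issue there is the age of both rather than a mismatch. Booting a 4.4 kernel while keeping v3.17 progs would fail the check.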

As for old and stable, yes, there are legitimate reasons to want to run old
and stable.  However, they tend not to be very compatible with wanting to 
run a new and still stabilizing filesystem that's not yet mature, since 
the filesystem code is still moving fast and there are /real/ bugs being 
fixed every release.  Thus, the general recommendation, on-list at least, 
is to pick one or the other, and if you pick old and stale^h^hble, forget 
about btrfs for the time being.  Again, what your distro may support and 
whether you choose to use that support is between you and the distro, but 
then, you really are probably better off actually using that distro 
support, since they're the ones that know what they've backported, etc, 
and not the list, where our focus is on further stabilization in 
reasonably current mainline current or LTS series kernels.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


* Re: "WARNING: device 0 not present" during scrub?
  2016-01-31 12:35       ` Christian Pernegger
  2016-01-31 18:06         ` Henk Slager
  2016-02-01  1:59         ` Duncan
@ 2016-02-01  3:23         ` Chris Murphy
  2 siblings, 0 replies; 11+ messages in thread
From: Chris Murphy @ 2016-02-01  3:23 UTC (permalink / raw)
  To: Christian Pernegger; +Cc: linux-btrfs

On Sun, Jan 31, 2016 at 5:35 AM, Christian Pernegger
<pernegger@gmail.com> wrote:
> On 31 January 2016 at 02:42, Chris Murphy <lists@colorremedies.com> wrote:
>> On Sat, Jan 30, 2016 at 2:19 PM, Christian Pernegger
>> It may be stable for Debian but is Debian explicitly supporting
>> Btrfs with this release? I don't think they are.
>
> The modules are in the kernel, the progs are in the main archive, it's
> an option in the installer. It's not the default fs but I couldn't
> find any indication that it's more or less supported than, say, xfs.
> Why they've chosen 3.16 (and not 3.18, which would be a long term
> release) I don't know, but the fact remains that that's the default
> kernel of a tier 1 distro, so people using it are going to be around
> for a while.

The Debian wiki on Btrfs basically defers to upstream. And upstream
Btrfs recommends using newer kernels than this. Part of it is that
there have been literally thousands of changes, there are hundreds of
bugs discovered and fixed since that kernel version. Another part is
there's so much change that no one likely has any idea how to cross
reference the changes with your particular problem. So the request is to use
something newer because it's a practical compromise. Dollars to donuts
only a developer would know such details and yet surely such a detail
is lost among thousands of others because by now 3.16 is ancient
history.

At the very least, you should find a way to use btrfs-progs 4.4,
'btrfs check' (without --repair) against this volume, and report the
results. That's safe.

The easiest way I can think to do it is a Fedora nightly. I just
tested this one:
https://kojipkgs.fedoraproject.org/mash/rawhide-20160130/rawhide/x86_64/os/images/boot.iso

It has kernel 4.4rc1+ and btrfs-progs 4.4. You can boot from the
troubleshooting menu, rescue option, and choose option 3 "Skip to
shell" and then run btrfs check, again without --repair. This ISO
boots BIOS and UEFI systems, just dd it to a stick.

If that comes up clean you can even mount the volume and scrub it (the
scrub code is kernel code even though it's activated by user space
tools; whereas the fsck is in the user space tools).
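
A minimal sketch of that sequence, to be run from the rescue shell. The
function name check_then_scrub is made up for this example, and the
device and mount point in the usage line are placeholders, not values
from this thread.

```sh
#!/bin/sh
# check_then_scrub DEVICE MOUNTPOINT
# Runs a read-only 'btrfs check' (no --repair) on the unmounted device.
# Only if that comes up clean does it mount the volume and run a
# foreground scrub (-B waits until the scrub has finished).
check_then_scrub() {
    dev=$1; mnt=$2
    btrfs check "$dev" || {
        echo "check reported problems; not mounting" >&2
        return 1
    }
    mount "$dev" "$mnt" || return 2
    btrfs scrub start -B "$mnt"
}

# Usage (placeholders):
#   check_then_scrub /dev/sdX /mnt
```

Stopping before the mount when the check fails keeps the whole run
read-only in the case where the volume looks damaged.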

-- 
Chris Murphy


* Re: "WARNING: device 0 not present" during scrub?
  2016-01-30 11:59 "WARNING: device 0 not present" during scrub? Christian Pernegger
  2016-01-30 20:10 ` Henk Slager
@ 2016-01-31  1:09 ` Chris Murphy
  2016-02-01 10:23 ` Patrik Lundquist
  2 siblings, 0 replies; 11+ messages in thread
From: Chris Murphy @ 2016-01-31  1:09 UTC (permalink / raw)
  To: Christian Pernegger; +Cc: Btrfs BTRFS

On Sat, Jan 30, 2016 at 4:59 AM, Christian Pernegger
<pernegger@gmail.com> wrote:

> parent transid verify failed on 4693971959808 wanted 14495 found 14497
> parent transid verify failed on 4693971959808 wanted 14495 found 14497
> parent transid verify failed on 4693971959808 wanted 14495 found 14497
> parent transid verify failed on 4693971959808 wanted 14495 found 14497

Well it's not that far off so mounting with -o recovery should work.

> Ignoring transid failure
> print-tree.c:1074: btrfs_print_tree: Assertion failed.
> btrfs-debug-tree[0x410489]
> btrfs-debug-tree[0x411dbf]
> btrfs-debug-tree[0x402adb]
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f925b1ccb45]
> btrfs-debug-tree[0x402d85]

It could be a bug, and if so there's a good chance it's fixed in newer
versions of btrfs-progs.

>
> Ouch.
>
> This is on a 1-month-old Debian stable (jessie) install and yes, I
> know that means the kernel and btrfs-progs are ancient but I'd still
> very much appreciate some help. It's a backup box, so the data isn't
> critical, but of course I need it stable in the long run.

Sorry, you need it to be stable but you're using an EOL unsupported
kernel? That just doesn't square.

It's either a hardware problem (there are many softreset messages, and
possibly more ata instances than you have attached devices, and no
Btrfs errors), or it's a software bug. Either way you
kinda need to try something newer to see if the problem has since
been fixed, because it's in the realm of 10,000 changes (probably
more) since that kernel version you're using. There might be 4 people
on the list who'd maybe recognize this, and say, yes in fact that was
fixed in a newer kernel. So really no matter what you just have to
upgrade.

> Is it
> possible to fix this and prevent it from happening again? (How) can I
> verify if the data is still good?

The user space program crashed either due to a bug or it ran out of
memory. Maybe increase swap size, sometimes that helps btrfs check and
btrfs-debug-tree go farther without problems.

Try mounting with -o recovery. If that doesn't work try -o
recovery,ro, and if that doesn't work then try btrfs check (without
repair), using kernel and progs no older than 4.1.15. It's middle aged
in Btrfs terms, but at least that's a longterm currently maintained
kernel.
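
The fallback order above could be scripted roughly as follows. This is
only a sketch: try_recovery_mounts is a made-up name, the device and
mount point in the usage line are placeholders, and it assumes a kernel
that still accepts the "recovery" mount option.

```sh
#!/bin/sh
# try_recovery_mounts DEVICE MOUNTPOINT
# Tries '-o recovery' first, then '-o recovery,ro', and prints which
# option set worked. If neither mounts, it points at the next step
# (a read-only 'btrfs check') and returns non-zero.
try_recovery_mounts() {
    dev=$1; mnt=$2
    for opts in recovery recovery,ro; do
        if mount -o "$opts" "$dev" "$mnt"; then
            echo "mounted with -o $opts"
            return 0
        fi
    done
    echo "both mounts failed; next: btrfs check $dev (without --repair)" >&2
    return 1
}

# Usage (placeholders):
#   try_recovery_mounts /dev/sdX /mnt
```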

> If the verdict is that I have to
> re-roll the box I wouldn't go with btrfs again at this time, but still
> be willing to help with debugging first, if anyone is interested.

I can almost guarantee that if -o recovery does not work, no one will
want to suggest anything more aggressive if you also aren't willing to
upgrade kernel and tools to something much newer. Really, if you need
Btrfs to be stable, you need to use a distro that makes it easy for
you to get the latest bug fixes, not just features.

-- 
Chris Murphy


* Re: "WARNING: device 0 not present" during scrub?
  2016-01-30 11:59 "WARNING: device 0 not present" during scrub? Christian Pernegger
  2016-01-30 20:10 ` Henk Slager
  2016-01-31  1:09 ` Chris Murphy
@ 2016-02-01 10:23 ` Patrik Lundquist
  2016-03-02 21:50   ` Nils Steinger
  2 siblings, 1 reply; 11+ messages in thread
From: Patrik Lundquist @ 2016-02-01 10:23 UTC (permalink / raw)
  To: Christian Pernegger; +Cc: linux-btrfs@vger.kernel.org

On 30 January 2016 at 12:59, Christian Pernegger <pernegger@gmail.com> wrote:
>
> This is on a 1-month-old Debian stable (jessie) install and yes, I
> know that means the kernel and btrfs-progs are ancient

apt-get install -t jessie-backports linux-image-4.3.0-0.bpo.1-amd64

Or something like that for the image name. Unfortunately there's no
stable backport of btrfs-tools (as they call btrfs-progs).

https://tracker.debian.org/pkg/linux


* Re: "WARNING: device 0 not present" during scrub?
  2016-02-01 10:23 ` Patrik Lundquist
@ 2016-03-02 21:50   ` Nils Steinger
  0 siblings, 0 replies; 11+ messages in thread
From: Nils Steinger @ 2016-03-02 21:50 UTC (permalink / raw)
  To: Patrik Lundquist, Christian Pernegger,
	linux-btrfs@vger.kernel.org


On 01.02.2016 11:23, Patrik Lundquist wrote:
> apt-get install -t jessie-backports linux-image-4.3.0-0.bpo.1-amd64
> 
> Or something like that for the image name. Unfortunately there's no
> stable backport of btrfs-tools (as they call btrfs-progs).

There is now: 4.4-1~bpo8+1

Upgrading from btrfs-tools 3.17 to 4.4 fixed the scrub aborts for me.

Oddly, `btrfs scrub start -B /dev/mapper/foo` terminates with exit code
0 when it aborts due to the "device 0" problem.
That's not supposed to happen, is it?
> EXIT STATUS
>       btrfs scrub returns a zero exit status if it succeeds. Non zero
>       is returned in case of failure.
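
Until that is sorted out, a cron job could check the output as well as
the exit code. A minimal sketch, assuming the warning text shows up on
the command's stdout or stderr; scrub_really_ok is a made-up helper
name:

```sh
#!/bin/sh
# scrub_really_ok STATUS OUTPUT
# Succeeds only if the exit status is 0 AND the captured output does
# not contain the "device 0 not present" warning.
scrub_really_ok() {
    [ "$1" -eq 0 ] || return 1
    case $2 in
        *"device 0 not present"*) return 1 ;;
    esac
    return 0
}

# Usage:
#   out=$(btrfs scrub start -B /dev/mapper/foo 2>&1); rc=$?
#   scrub_really_ok "$rc" "$out" || echo "scrub failed or aborted" >&2
```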

Regards,
Nils Steinger



