* how to repair a damaged filesystem with btrfs raid5
@ 2015-01-27 9:12 Alexander Fieroch
2015-01-27 12:01 ` Duncan
2015-02-03 0:24 ` Tobias Holst
0 siblings, 2 replies; 4+ messages in thread
From: Alexander Fieroch @ 2015-01-27 9:12 UTC (permalink / raw)
To: linux-btrfs
Hello,
I'm testing btrfs RAID5 on three encrypted hdds (dm-crypt) and I'm
simulating a harddisk failure by unplugging one device while writing
some files.
Now the filesystem is damaged. Is there any chance yet to repair the
filesystem?
My operating system is Ubuntu Server (vivid) with kernel 3.18 and
btrfs-progs 3.18.1 (from an external PPA).
I've unplugged device sdb with UUID 65f62f63-6526-4d5e-82d4-adf6d7508092
and crypt device name /dev/mapper/crypt-1. This one should be repaired.
Attached is the dmesg log file with corresponding errors.
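[Editor's note: for reference, a minimal sketch of how a setup like the one described might be created. The raw device paths /dev/sdb-/dev/sdd and the mount point are assumptions; the mapper names, label and raid5 profiles match the output shown below in this message.]

```shell
# Open the three dm-crypt containers (LUKS assumed; sdb..sdd are placeholders)
cryptsetup open /dev/sdb crypt-1
cryptsetup open /dev/sdc crypt-2
cryptsetup open /dev/sdd crypt-3

# Create a btrfs filesystem with raid5 for both data and metadata
mkfs.btrfs -L antares-data -d raid5 -m raid5 \
    /dev/mapper/crypt-1 /dev/mapper/crypt-2 /dev/mapper/crypt-3

# Mount via any member device
mount /dev/mapper/crypt-1 /mnt/data
```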
btrfs check does not seem to work:
# btrfs check --repair /dev/mapper/crypt-1
enabling repair mode
Checking filesystem on /dev/mapper/crypt-1
UUID: 504c2850-3977-4340-8849-18dd3ac2e5e4
checking extents
Check tree block failed, want=165396480, have=5385177728513973313
Check tree block failed, want=165396480, have=5385177728513973313
Check tree block failed, want=165396480, have=65536
Check tree block failed, want=165396480, have=5385177728513973313
Check tree block failed, want=165396480, have=5385177728513973313
read block failed check_tree_block
Check tree block failed, want=165740544, have=6895225932619678086
Check tree block failed, want=165740544, have=6895225932619678086
Check tree block failed, want=165740544, have=65536
Check tree block failed, want=165740544, have=6895225932619678086
Check tree block failed, want=165740544, have=6895225932619678086
read block failed check_tree_block
Check tree block failed, want=165756928, have=13399486021073017810
Check tree block failed, want=165756928, have=13399486021073017810
Check tree block failed, want=165756928, have=65536
Check tree block failed, want=165756928, have=13399486021073017810
Check tree block failed, want=165756928, have=13399486021073017810
read block failed check_tree_block
Check tree block failed, want=165773312, have=12571697019259051064
Check tree block failed, want=165773312, have=12571697019259051064
Check tree block failed, want=165773312, have=65536
Check tree block failed, want=165773312, have=12571697019259051064
Check tree block failed, want=165773312, have=12571697019259051064
read block failed check_tree_block
Check tree block failed, want=165789696, have=4069002570438424782
Check tree block failed, want=165789696, have=4069002570438424782
Check tree block failed, want=165789696, have=65536
Check tree block failed, want=165789696, have=4069002570438424782
Check tree block failed, want=165789696, have=4069002570438424782
read block failed check_tree_block
Check tree block failed, want=165838848, have=9612508092910615774
Check tree block failed, want=165838848, have=9612508092910615774
Check tree block failed, want=165838848, have=65536
Check tree block failed, want=165838848, have=9612508092910615774
Check tree block failed, want=165838848, have=9612508092910615774
read block failed check_tree_block
ref mismatch on [99516416 16384] extent item 1, found 0
failed to repair damaged filesystem, aborting
A btrfs scrub finishes with uncorrectable errors:
# btrfs scrub start -d /dev/mapper/crypt-1
scrub started on /dev/mapper/crypt-1, fsid 504c2850-3977-4340-8849-18dd3ac2e5e4 (pid=2014)
# btrfs scrub status -d /mnt/data/
scrub status for 504c2850-3977-4340-8849-18dd3ac2e5e4
scrub device /dev/mapper/crypt-1 (id 1) history
scrub started at Mon Jan 26 14:36:57 2015 and finished after 617 seconds
total bytes scrubbed: 29.78GiB with 10906 errors
error details: csum=10906
corrected errors: 0, uncorrectable errors: 10906, unverified errors: 0
scrub device /dev/mapper/crypt-2 (id 2) no stats available
scrub device /dev/mapper/crypt-3 (id 3) no stats available
Any chance to fix the errors or do I have to wait for the next btrfs
version?
Thank you very much,
Alexander
# uname -a
Linux antares 3.18.0-9-generic #10-Ubuntu SMP Mon Jan 12 21:41:54 UTC
2015 x86_64 x86_64 x86_64 GNU/Linux
# btrfs --version
Btrfs v3.18.1
# btrfs fi show
Label: 'antares-data' uuid: 504c2850-3977-4340-8849-18dd3ac2e5e4
Total devices 3 FS bytes used 89.35GiB
devid 1 size 698.63GiB used 47.03GiB path /dev/mapper/crypt-1
devid 2 size 698.63GiB used 47.01GiB path /dev/mapper/crypt-2
devid 3 size 698.63GiB used 47.01GiB path /dev/mapper/crypt-3
# btrfs fi df /mnt/data/
Data, single: total=8.00MiB, used=0.00B
Data, RAID5: total=92.00GiB, used=89.25GiB
System, single: total=4.00MiB, used=0.00B
System, RAID5: total=16.00MiB, used=16.00KiB
Metadata, single: total=8.00MiB, used=0.00B
Metadata, RAID5: total=2.00GiB, used=100.44MiB
GlobalReserve, single: total=48.00MiB, used=0.00B
[-- Attachment #1.2: dmesg.log.gz --]
[-- Type: application/gzip, Size: 20290 bytes --]
* Re: how to repair a damaged filesystem with btrfs raid5
From: Duncan @ 2015-01-27 12:01 UTC (permalink / raw)
To: linux-btrfs
Alexander Fieroch posted on Tue, 27 Jan 2015 10:12:35 +0100 as excerpted:
> I'm testing btrfs RAID5 on three encrypted hdds (dm-crypt) and I'm
> simulating a harddisk failure by unplugging one device while writing
> some files.
> Now the filesystem is damaged. By now is there any chance to repair the
> filesystem?
>
> My operating system is ubuntu server (vivid) with kernel 3.18 and btrfs
> 3.18.1 (external PPA).
Your kernel is too old for proper btrfs raid56 support. Yes, it's the
newest stable kernel cycle, but...
When raid56 mode (that's either/or, so 5 or 6; the same raid56 code
supports both) was introduced several kernel cycles ago, it was
incomplete -- normal runtime support was there, but the recovery code was
not, so in effect those running it were running a slower raid0:
loss of a device could mean loss of the entire filesystem (tho recovery
was possible in limited cases, the point is you couldn't count on
it, so from an admin perspective it was a slow raid0, sudden-death at
loss of a device).
The raid56 mode status is, however, *JUST* changing. The
still-in-development kernel 3.19 series should be very close to
code-complete, altho there are still known bugs ATM that may well not be
fixed until the 3.20 series.
So a fresh 3.19-rc or git development kernel, preferably combined with
the 3.19 series btrfs-progs userspace (which won't actually be released
until after the 3.19 kernel, so you'll be grabbing a development branch
from git to get that code now), should be very close to working with
raid56 mode, while anything older won't, at least not for reliable recovery.
And even 3.19 is likely to still have bugs to work out in that code. I'd
thus recommend waiting at /least/ until 3.20 for raid56, if not 3.21 or
later, depending on how bug-tolerant you are.
For present usage, meanwhile, I'd strongly recommend sticking with raid1
or raid10, which are in general as stable as btrfs itself is ATM -- which
is to say not /entirely/ stable yet, so if you don't have tested-working
backups, by definition you don't care if you lose the data on the
filesystem. But it's working /reasonably/ well, without too much trouble,
for most users at this time.
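[Editor's note: an existing filesystem can be converted between profiles in place with balance convert filters; a hedged sketch, assuming the filesystem is still mountable at /mnt/data.]

```shell
# Convert both data and metadata block groups to raid1 in place
btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/data

# Verify the new profiles afterwards; the single/RAID5 lines should
# have been replaced by RAID1 ones
btrfs fi df /mnt/data
```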
FWIW, raid56 mode's incomplete status was in the original commit message
for it, and the status hasn't changed from that since. It's also
documented on the btrfs wiki[1], and is well known on this list. Still,
you're not the first to have missed it. <shrug>
---
[1] https://btrfs.wiki.kernel.org
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: how to repair a damaged filesystem with btrfs raid5
From: Tobias Holst @ 2015-02-03 0:24 UTC (permalink / raw)
To: Alexander Fieroch; +Cc: linux-btrfs@vger.kernel.org
Hi.
There is a known bug when you re-plug a missing hdd of a btrfs raid
without wiping the device first. In the worst case this results in a
totally corrupted filesystem, as it sometimes did during my tests of
the raid6 implementation. With raid1 it may just "go back in time" to
the point when you unplugged the device -- which is also bad, but still
not complete data loss; with raid6 it was sometimes worse.
It sounds like you did that (re-plugged the missing device without wiping it)?
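[Editor's note: one way to avoid that bug is to wipe the stale device's filesystem signature before letting btrfs see it again, then re-add it degraded. A rough sketch only -- the device names are assumptions from this thread, and whether the degraded rebuild path actually completes on raid5 at this kernel vintage is exactly what is in question.]

```shell
# The re-plugged device holds a stale copy of the filesystem; wipe its
# signatures so it is not auto-assembled back into the array
wipefs -a /dev/mapper/crypt-1

# Mount the surviving members degraded, then re-add the wiped device
mount -o degraded /dev/mapper/crypt-2 /mnt/data
btrfs device add /dev/mapper/crypt-1 /mnt/data

# Remove the "missing" member, which triggers the rebuild
btrfs device delete missing /mnt/data
```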
Next, scrub and filesystem check for raid5/6 are not fully implemented
yet, as Duncan said. They will be (mostly) included in 3.19, but maybe
with bugs.
You may try a balance instead of a scrub, as this should read and
check your data and then write it back. This worked for me most of the
time during my personal raid6 stability and stress tests. But maybe
your filesystem has already been corrupted...
Give it a try :)
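[Editor's note: the suggested balance might look like this; the mount point is assumed, and `device stats` needs a reasonably recent btrfs-progs.]

```shell
# Rewrite every block group; data is read, verified against checksums
# and written back, which can repair bad stripes where a good copy or
# parity still exists
btrfs balance start /mnt/data

# From another shell: watch progress and check per-device error counters
btrfs balance status /mnt/data
btrfs device stats /mnt/data
```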
Regards
Tobias
2015-01-27 10:12 GMT+01:00 Alexander Fieroch
<alexander.fieroch@mpi-dortmund.mpg.de>:
> Hello,
>
> I'm testing btrfs RAID5 on three encrypted hdds (dm-crypt) and I'm
> simulating a harddisk failure by unplugging one device while writing some
> files.
> Now the filesystem is damaged. By now is there any chance to repair the
> filesystem?
>
> My operating system is ubuntu server (vivid) with kernel 3.18 and btrfs
> 3.18.1 (external PPA).
> I've unplugged device sdb with UUID 65f62f63-6526-4d5e-82d4-adf6d7508092 and
> crypt device name /dev/mapper/crypt-1. This one should be repaired.
> Attached is the dmesg log file with corresponding errors.
>
> btrfs check does not seem to work.
>
> [full btrfs check, scrub, and system-info output snipped; see the
> original message above]
* Re: how to repair a damaged filesystem with btrfs raid5
From: Alexander Fieroch @ 2015-02-03 7:59 UTC (permalink / raw)
To: Tobias Holst; +Cc: linux-btrfs@vger.kernel.org, Duncan
On 03.02.2015 at 01:24, Tobias Holst wrote:
> Hi.
Hi,
> There is a known bug when you re-plug in a missing hdd of a btrfs raid
> without wiping the device before. In worst case this results in a
> totally corrupted filesystem as it did sometimes during my tests of
> the raid6 implementation. With raid1 it may just "go back in time" to
> the point when you unplugged the device. Which is also bad but still
> no complete data loss - but in raid6 sometimes it was worse.
>
> Sounds like you did that (plug in the missing device without wiping)?
Yes, I just unplugged the device for a few seconds and then re-plugged it.
>
> Next thing is, that scrub and filesystem-check of raid5/6 is not
> implemented/completed (yet) as Duncan said. It will be (mostly)
> included in 3.19, but maybe with bugs.
Unfortunately a filesystem check is still not working with kernel 3.19-rc6 and btrfs-progs 3.18.1:
# btrfs check --repair /dev/mapper/crypt-3
enabling repair mode
Checking filesystem on /dev/mapper/crypt-3
UUID: 504c2850-3977-4340-8849-18dd3ac2e5e4
checking extents
Check tree block failed, want=165396480, have=5385177728513973313
Check tree block failed, want=165396480, have=5385177728513973313
Check tree block failed, want=165396480, have=65536
Check tree block failed, want=165396480, have=5385177728513973313
Check tree block failed, want=165396480, have=5385177728513973313
read block failed check_tree_block
Check tree block failed, want=165740544, have=6895225932619678086
Check tree block failed, want=165740544, have=6895225932619678086
Check tree block failed, want=165740544, have=65536
Check tree block failed, want=165740544, have=6895225932619678086
Check tree block failed, want=165740544, have=6895225932619678086
read block failed check_tree_block
Check tree block failed, want=165756928, have=13399486021073017810
Check tree block failed, want=165756928, have=13399486021073017810
Check tree block failed, want=165756928, have=65536
Check tree block failed, want=165756928, have=13399486021073017810
Check tree block failed, want=165756928, have=13399486021073017810
read block failed check_tree_block
Check tree block failed, want=165773312, have=12571697019259051064
Check tree block failed, want=165773312, have=12571697019259051064
Check tree block failed, want=165773312, have=65536
Check tree block failed, want=165773312, have=12571697019259051064
Check tree block failed, want=165773312, have=12571697019259051064
read block failed check_tree_block
Check tree block failed, want=165789696, have=4069002570438424782
Check tree block failed, want=165789696, have=4069002570438424782
Check tree block failed, want=165789696, have=65536
Check tree block failed, want=165789696, have=4069002570438424782
Check tree block failed, want=165789696, have=4069002570438424782
read block failed check_tree_block
Check tree block failed, want=165838848, have=9612508092910615774
Check tree block failed, want=165838848, have=9612508092910615774
Check tree block failed, want=165838848, have=65536
Check tree block failed, want=165838848, have=9612508092910615774
Check tree block failed, want=165838848, have=9612508092910615774
read block failed check_tree_block
ref mismatch on [99516416 16384] extent item 1, found 0
failed to repair damaged filesystem, aborting
> You may try to do a balance instead of a scrub as this should read and
> check your data and then write it back. This worked for me most of the
> time during my personal raid6 stability and stress tests.
I tried a balance and got a btrfs error trace in syslog as well as several SATA errors (the complete syslog is attached):
[ 1835.914430] ------------[ cut here ]------------
[ 1835.914468] WARNING: CPU: 3 PID: 15995 at /home/kernel/COD/linux/fs/btrfs/inode.c:923 cow_file_range+0x44c/0x460 [btrfs]()
[ 1835.914470] Modules linked in: ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs libcrc32c snd_hda_codec_analog snd_hda_codec_generic snd_hda_intel coretemp ppdev snd_hda_controller snd_hda_codec snd_hwdep snd_pcm snd_timer kvm_intel kvm snd soundcore dcdbas mei_me mei serio_raw mac_hid parport_pc lpc_ich 8250_fintek shpchp parport btrfs xor raid6_pq xts gf128mul dm_crypt amdkfd amd_iommu_v2 radeon hid_generic i2c_algo_bit e1000e ttm uas drm_kms_helper usbhid ahci sata_sil usb_storage psmouse ptp hid libahci drm pps_core pata_acpi
[ 1835.914511] CPU: 3 PID: 15995 Comm: btrfs Not tainted 3.19.0-031900rc6-generic #201501261152
[ 1835.914513] Hardware name: Dell Inc. OptiPlex 755 /0GM819, BIOS A22 06/11/2012
[ 1835.914514] 000000000000039b ffff8800a784b2f8 ffffffff817c4584 0000000000000007
[ 1835.914517] 0000000000000000 ffff8800a784b338 ffffffff81076df7 0000000000000004
[ 1835.914520] 00000000ffffffea ffff8800b056c6d0 0000000000000000 0000000000080000
[ 1835.914523] Call Trace:
[ 1835.914530] [<ffffffff817c4584>] dump_stack+0x45/0x57
[ 1835.914535] [<ffffffff81076df7>] warn_slowpath_common+0x97/0xe0
[ 1835.914538] [<ffffffff81076e5a>] warn_slowpath_null+0x1a/0x20
[ 1835.914554] [<ffffffffc073e5ec>] cow_file_range+0x44c/0x460 [btrfs]
[ 1835.914572] [<ffffffffc0758d35>] ? free_extent_buffer+0x35/0x40 [btrfs]
[ 1835.914588] [<ffffffffc073ea9b>] run_delalloc_nocow+0x49b/0xae0 [btrfs]
[ 1835.914605] [<ffffffffc073f26e>] run_delalloc_range+0x18e/0x1b0 [btrfs]
[ 1835.914622] [<ffffffffc07550a4>] writepage_delalloc.isra.33+0xf4/0x170 [btrfs]
[ 1835.914639] [<ffffffffc07576ff>] __extent_writepage+0xcf/0x280 [btrfs]
[ 1835.914643] [<ffffffff811b6f70>] ? SyS_msync+0x230/0x230
[ 1835.914661] [<ffffffffc0757b6a>] extent_write_cache_pages.isra.26.constprop.39+0x2ba/0x420 [btrfs]
[ 1835.914679] [<ffffffffc07581ce>] extent_writepages+0x4e/0x70 [btrfs]
[ 1835.914694] [<ffffffffc073c400>] ? btrfs_submit_direct+0x1b0/0x1b0 [btrfs]
[ 1835.914710] [<ffffffffc0738758>] btrfs_writepages+0x28/0x30 [btrfs]
[ 1835.914714] [<ffffffff81188a00>] do_writepages+0x20/0x40
[ 1835.914717] [<ffffffff8117dd99>] __filemap_fdatawrite_range+0x59/0x60
[ 1835.914719] [<ffffffff8117e5c3>] filemap_fdatawrite_range+0x13/0x20
[ 1835.914735] [<ffffffffc074bedb>] btrfs_fdatawrite_range+0x2b/0x70 [btrfs]
[ 1835.914753] [<ffffffffc07517ac>] btrfs_wait_ordered_range+0x4c/0x150 [btrfs]
[ 1835.914770] [<ffffffffc077c70c>] __btrfs_write_out_cache+0x39c/0x4b0 [btrfs]
[ 1835.914788] [<ffffffffc077cb5f>] btrfs_write_out_cache+0x9f/0x100 [btrfs]
[ 1835.914802] [<ffffffffc0725492>] btrfs_write_dirty_block_groups+0x252/0x290 [btrfs]
[ 1835.914817] [<ffffffffc07af4ad>] update_cowonly_root+0x42/0xa8 [btrfs]
[ 1835.914832] [<ffffffffc07af666>] commit_cowonly_roots+0x153/0x1b5 [btrfs]
[ 1835.914847] [<ffffffffc073625d>] btrfs_commit_transaction+0x4bd/0xa70 [btrfs]
[ 1835.914865] [<ffffffffc078a6d5>] prepare_to_merge+0x1f5/0x230 [btrfs]
[ 1835.914882] [<ffffffffc078a9fa>] relocate_block_group+0x2ea/0x510 [btrfs]
[ 1835.914899] [<ffffffffc078add0>] btrfs_relocate_block_group+0x1b0/0x2d0 [btrfs]
[ 1835.914917] [<ffffffffc0760b1b>] btrfs_relocate_chunk.isra.75+0x4b/0xd0 [btrfs]
[ 1835.914934] [<ffffffffc0761f58>] __btrfs_balance+0x348/0x460 [btrfs]
[ 1835.914951] [<ffffffffc0762425>] btrfs_balance+0x3b5/0x5d0 [btrfs]
[ 1835.914969] [<ffffffffc076e36c>] btrfs_ioctl_balance+0x1cc/0x530 [btrfs]
[ 1835.914986] [<ffffffffc077033e>] btrfs_ioctl+0x69e/0xb20 [btrfs]
[ 1835.914989] [<ffffffff811b3848>] ? do_brk+0x258/0x330
[ 1835.914993] [<ffffffff81207475>] do_vfs_ioctl+0x75/0x320
[ 1835.914996] [<ffffffff812077b1>] SyS_ioctl+0x91/0xb0
[ 1835.915000] [<ffffffff817d18ad>] system_call_fastpath+0x16/0x1b
[ 1835.915002] ---[ end trace 42ee65a1e517ce93 ]---
After this I cannot unmount the filesystem anymore; I get new errors in syslog:
[ 3000.196039] INFO: task kworker/u16:0:15873 blocked for more than 120 seconds.
[ 3000.198122] Tainted: G W 3.19.0-031900rc6-generic #201501261152
[ 3000.200199] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3000.202298] kworker/u16:0 D ffff880213cbbcc8 0 15873 2 0x00000000
[ 3000.202341] Workqueue: btrfs-flush_delalloc btrfs_flush_delalloc_helper [btrfs]
[ 3000.202344] ffff880213cbbcc8 ffff880213cbbc78 ffff880213cbbfd8 00000000000141c0
[ 3000.202349] ffff8800b98e3100 ffff88022992e220 ffff880214f744b0 ffff880213cbbcd8
[ 3000.202353] ffff8801f0c6b850 ffff8801f0c6b8d8 0000000000000000 ffff8800365f9df0
[ 3000.202357] Call Trace:
[ 3000.202366] [<ffffffff817cd6b9>] schedule+0x29/0x70
[ 3000.202392] [<ffffffffc0751305>] btrfs_start_ordered_extent+0x115/0x140 [btrfs]
[ 3000.202399] [<ffffffff810a068c>] ? ttwu_do_wakeup+0x2c/0x100
[ 3000.202403] [<ffffffff810b7440>] ? prepare_to_wait_event+0x100/0x100
[ 3000.202429] [<ffffffffc0751359>] btrfs_run_ordered_extent_work+0x29/0x40 [btrfs]
[ 3000.202455] [<ffffffffc076720e>] normal_work_helper+0x7e/0x1b0 [btrfs]
[ 3000.202481] [<ffffffffc0767392>] btrfs_flush_delalloc_helper+0x12/0x20 [btrfs]
[ 3000.202486] [<ffffffff8108f6dd>] process_one_work+0x14d/0x460
[ 3000.202490] [<ffffffff810900bb>] worker_thread+0x11b/0x3f0
[ 3000.202495] [<ffffffff8108ffa0>] ? create_worker+0x1e0/0x1e0
[ 3000.202499] [<ffffffff81095cc9>] kthread+0xc9/0xe0
[ 3000.202503] [<ffffffff81095c00>] ? flush_kthread_worker+0x90/0x90
[ 3000.202508] [<ffffffff817d17fc>] ret_from_fork+0x7c/0xb0
[ 3000.202512] [<ffffffff81095c00>] ? flush_kthread_worker+0x90/0x90
[similar hung-task traces for kworker/u16:2, kworker/u16:6, kworker/u16:7,
kworker/u16:1 and kworker/u16:3 snipped -- all blocked in
btrfs_start_ordered_extent like the trace above]
[ 3000.235459] INFO: task umount:16078 blocked for more than 120 seconds.
[ 3000.237630] Tainted: G W 3.19.0-031900rc6-generic #201501261152
[ 3000.239788] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3000.241955] umount D ffff8801beac3b58 0 16078 1623 0x00000000
[ 3000.241962] ffff8801beac3b58 ffff8801beac3b28 ffff8801beac3fd8 00000000000141c0
[ 3000.241966] ffff8800b98e3100 ffff88022e620000 ffff88022d2513a0 ffff8801beac3b48
[ 3000.241970] ffff88009b6d6590 7fffffffffffffff 7fffffffffffffff ffff88022d2513a0
[ 3000.241974] Call Trace:
[ 3000.241979] [<ffffffff817cd6b9>] schedule+0x29/0x70
[ 3000.241983] [<ffffffff817d0445>] schedule_timeout+0x1b5/0x210
[ 3000.241987] [<ffffffff810a4007>] ? wake_up_process+0x27/0x50
[ 3000.241991] [<ffffffff817cebc7>] wait_for_completion+0xa7/0x160
[ 3000.241996] [<ffffffff810a3fa0>] ? try_to_wake_up+0x2a0/0x2a0
[ 3000.242021] [<ffffffffc0750fca>] btrfs_wait_ordered_extents+0x20a/0x250 [btrfs]
[ 3000.242055] [<ffffffffc0751138>] btrfs_wait_ordered_roots+0x128/0x1e0 [btrfs]
[ 3000.242076] [<ffffffffc0703e91>] btrfs_sync_fs+0x51/0x150 [btrfs]
[ 3000.242082] [<ffffffff81224e70>] __sync_filesystem+0x30/0x60
[ 3000.242086] [<ffffffff81224f1b>] sync_filesystem+0x4b/0x70
[ 3000.242091] [<ffffffff811f6dcb>] generic_shutdown_super+0x3b/0x110
[ 3000.242096] [<ffffffff8118d88d>] ? unregister_shrinker+0x5d/0x70
[ 3000.242100] [<ffffffff811f6f36>] kill_anon_super+0x16/0x30
[ 3000.242118] [<ffffffffc070710e>] btrfs_kill_super+0x1e/0x130 [btrfs]
[ 3000.242123] [<ffffffff811f7129>] deactivate_locked_super+0x59/0x80
[ 3000.242127] [<ffffffff811f7dee>] deactivate_super+0x4e/0x70
[ 3000.242132] [<ffffffff812138d3>] cleanup_mnt+0x43/0x90
[ 3000.242136] [<ffffffff81213972>] __cleanup_mnt+0x12/0x20
[ 3000.242139] [<ffffffff81093dff>] task_work_run+0xaf/0xf0
[ 3000.242145] [<ffffffff81015077>] do_notify_resume+0xc7/0xd0
[ 3000.242149] [<ffffffff817d1b4f>] int_signal+0x12/0x17
After a reboot the filesystem is still damaged and not repairable with a filesystem check, but it is accessible.
> But maybe
> your filesystem has already been corrupted...
> Give it a try :)
Yeah, no problem. This is just to test the current stability of raid5 and maybe find bugs for you developers.
Regards,
Alexander
> Regards
> Tobias
[-- Attachment #1.2: dmesg.log.gz --]
[-- Type: application/gzip, Size: 23072 bytes --]