From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Kai Krakow <hurikhan77@gmail.com>, linux-btrfs@vger.kernel.org
Subject: Re: csum errors in VirtualBox VDI files
Date: Sun, 27 Mar 2016 21:55:40 +0800 [thread overview]
Message-ID: <56F7E65C.7020300@gmx.com> (raw)
In-Reply-To: <20160327035038.5ab6e64b@jupiter.sol.kaishome.de>
On 03/27/2016 09:50 AM, Kai Krakow wrote:
> Am Sat, 26 Mar 2016 20:30:35 +0100
> schrieb Kai Krakow <hurikhan77@gmail.com>:
>
>> Am Wed, 23 Mar 2016 12:16:24 +0800
>> schrieb Qu Wenruo <quwenruo@cn.fujitsu.com>:
>>
>>> Kai Krakow wrote on 2016/03/22 19:48 +0100:
>>>> Am Tue, 22 Mar 2016 16:47:10 +0800
>>>> schrieb Qu Wenruo <quwenruo@cn.fujitsu.com>:
>>>>
>> [...]
>>> [...]
>> [...]
>>>>
>>>> Apparently, that system does not boot now due to errors in bcache
>>>> b-tree. That being that, it may well be some bcache error and not
>>>> btrfs' fault. Apparently I couldn't catch the output, I've been
>>>> in a hurry. It said "write error" and had some backtrace. I will
>>>> come to this back later.
>>>>
>>>> Let's go to the system I currently care about (that one with the
>>>> always breaking VDI file):
>>>>
>>> [...]
>> [...]
>>>>
>>>> After the error occured?
>>>>
>>>> Yes, some text about the extent being compressed and btrfs repair
>>>> doesn't currently handle that case (I tried --repair as I'm
>>>> having a backup). I simply decided not to investigate that
>>>> further at that point but delete and restore the affected file
>>>> from backup. However, this is the message from dmesg (tho, I
>>>> didn't catch the backtrace):
>>>>
>>>> btrfs_run_delayed_refs:2927: errno=-17 Object already exists
>>>
>>> That's nice, at least we have some clue.
>>>
>>> It's almost sure, it's a bug either in btrfs kernel which doesn't
>>> handle delayed refs well(low possibility), or, corrupted fs which
>>> create something kernel can't handle(I bet that's the case).
>>
>> [kernel 4.5.0 gentoo, btrfs-progs 4.4.1]
>>
>> Well, this time it hit me on the USB backup drive which uses no bcache
>> and no other fancy options except compress-force=zlib. Apparently,
>> I've only got a (real) screenshot which I'm going to link here:
>>
>> https://www.dropbox.com/s/9qbc7np23y8lrii/IMG_20160326_200033.jpg?dl=0
>>
>> The same drive has no problems except "bad metadata crossing stripe
>> boundary" - but a lot of them. This drive was never converted, it was
>> freshly generated several months ago.
>> [...]
>
> I finally got copy&paste data:
>
> # before mounting let's check the FS:
>
> $ sudo btrfsck /dev/disk/by-label/usb-backup
> Checking filesystem on /dev/disk/by-label/usb-backup
> UUID: 1318ec21-c421-4e36-a44a-7be3d41f9c3f
> checking extents
> bad metadata [156041216, 156057600) crossing stripe boundary
> bad metadata [181403648, 181420032) crossing stripe boundary
> bad metadata [392167424, 392183808) crossing stripe boundary
> bad metadata [783482880, 783499264) crossing stripe boundary
> bad metadata [784924672, 784941056) crossing stripe boundary
> bad metadata [130151612416, 130151628800) crossing stripe boundary
> bad metadata [162826813440, 162826829824) crossing stripe boundary
> bad metadata [162927083520, 162927099904) crossing stripe boundary
> bad metadata [619740659712, 619740676096) crossing stripe boundary
> bad metadata [619781947392, 619781963776) crossing stripe boundary
> bad metadata [619795644416, 619795660800) crossing stripe boundary
> bad metadata [619816091648, 619816108032) crossing stripe boundary
> bad metadata [620011388928, 620011405312) crossing stripe boundary
> bad metadata [890992459776, 890992476160) crossing stripe boundary
> bad metadata [891022737408, 891022753792) crossing stripe boundary
> bad metadata [891101773824, 891101790208) crossing stripe boundary
> bad metadata [891301199872, 891301216256) crossing stripe boundary
> bad metadata [1012219314176, 1012219330560) crossing stripe boundary
> bad metadata [1017202409472, 1017202425856) crossing stripe boundary
> bad metadata [1017365397504, 1017365413888) crossing stripe boundary
> bad metadata [1020764422144, 1020764438528) crossing stripe boundary
> bad metadata [1251103342592, 1251103358976) crossing stripe boundary
> bad metadata [1251144695808, 1251144712192) crossing stripe boundary
> bad metadata [1251147055104, 1251147071488) crossing stripe boundary
> bad metadata [1259271225344, 1259271241728) crossing stripe boundary
> bad metadata [1266223611904, 1266223628288) crossing stripe boundary
> bad metadata [1304750063616, 1304750080000) crossing stripe boundary
> bad metadata [1304790106112, 1304790122496) crossing stripe boundary
> bad metadata [1304850792448, 1304850808832) crossing stripe boundary
> bad metadata [1304869928960, 1304869945344) crossing stripe boundary
> bad metadata [1305089540096, 1305089556480) crossing stripe boundary
> bad metadata [1309561651200, 1309561667584) crossing stripe boundary
> bad metadata [1309581443072, 1309581459456) crossing stripe boundary
> bad metadata [1309583671296, 1309583687680) crossing stripe boundary
> bad metadata [1309942808576, 1309942824960) crossing stripe boundary
> bad metadata [1310050549760, 1310050566144) crossing stripe boundary
> bad metadata [1313031585792, 1313031602176) crossing stripe boundary
> bad metadata [1313232912384, 1313232928768) crossing stripe boundary
> bad metadata [1555210764288, 1555210780672) crossing stripe boundary
> bad metadata [1555395182592, 1555395198976) crossing stripe boundary
> bad metadata [2050576744448, 2050576760832) crossing stripe boundary
> bad metadata [2050803957760, 2050803974144) crossing stripe boundary
> bad metadata [2050969108480, 2050969124864) crossing stripe boundary
Already mentioned in another reply, this *seems* to be false alert.
Latest btrfs-progs would help.
> checking free space tree cache and super generation don't match, space
> cache will be invalidated checking fs roots
Err, I found a missing '\n' before "checking fs roots".
And it seems that fs roots and extent tree are all OK.
Quite surprising.
The only possible problem seems to be outdated space cache.
Maybe mount with "-o clear_cache" will help, but I don't think that's
the cause.
> checking csums
> checking root refs
> found 1860217443214 bytes used err is 0
> total csum bytes: 1805105116
> total tree bytes: 11793776640
> total fs tree bytes: 8220835840
> total extent tree bytes: 1443315712
> btree space waste bytes: 2307850845
> file data blocks allocated: 2137151094784
> referenced 2706830905344
>
> # now let's wait for the backup to mount the FS and look at dmesg:
>
> [21375.606479] BTRFS info (device sde1): force zlib compression
> [21375.606483] BTRFS info (device sde1): using free space tree
Thanks to Chris Murphy, I almost ignored such line.
Not familiar with the new free space cache tree, sorry.
If "clear_cache" doesn't help, then try "nospace_cache" to disable the
entire space cache.
> [21375.606485] BTRFS: has skinny extents
> [21383.710725] BTRFS: checking UUID tree
> [21388.531267] ------------[ cut here ]------------
> [21388.531279] WARNING: CPU: 0 PID: 27085 at
> fs/btrfs/extent-tree.c:2946 btrfs_run_delayed_refs+0x279/0x2b0()
> [21388.531281] BTRFS: Transaction aborted (error -17) [21388.531282]
> Modules linked in: nvidia_drm(PO) uas usb_storage vboxnetadp(O)
> vboxnetflt(O) vboxdrv(O) nvidia_modeset(PO) nvidia(PO) [21388.531293]
> CPU: 0 PID: 27085 Comm: kworker/u8:3 Tainted: P O
> 4.5.0-gentoo #1 [21388.531295] Hardware name: To Be Filled By O.E.M. To
> Be Filled By O.E.M./Z68 Pro3, BIOS L2.16A 02/22/2013 [21388.531300]
> Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [21388.531302]
> 0000000000000000 ffffffff8159eaa9 ffff88029d723d68 ffffffff81ea1cf6
> [21388.531306] ffffffff810c6e37 ffff880407be5960 ffff88029d723db8
> 0000000000000020 [21388.531308] ffff8802bd886510 ffff8803cc846d00
> ffffffff810c6eb7 ffffffff81e8a088 [21388.531311] Call Trace:
> [21388.531317] [<ffffffff8159eaa9>] ? dump_stack+0x46/0x5d
> [21388.531322] [<ffffffff810c6e37>] ? warn_slowpath_common+0x77/0xb0
> [21388.531326] [<ffffffff810c6eb7>] ? warn_slowpath_fmt+0x47/0x50
> [21388.531329] [<ffffffff814923d9>] ?
> btrfs_run_delayed_refs+0x279/0x2b0 [21388.531332]
> [<ffffffff8149243d>] ? delayed_ref_async_start+0x2d/0x70
> [21388.531337] [<ffffffff810da8c3>] ? process_one_work+0x133/0x350
> [21388.531340] [<ffffffff810dae25>] ? worker_thread+0x45/0x450
> [21388.531343] [<ffffffff810dade0>] ? rescuer_thread+0x300/0x300
> [21388.531347] [<ffffffff810df3f8>] ? kthread+0xb8/0xd0
> [21388.531350] [<ffffffff810e301f>] ? finish_task_switch+0x6f/0x1f0
> [21388.531352] [<ffffffff810df340>] ? kthread_park+0x50/0x50
> [21388.531356] [<ffffffff81bdb31f>] ? ret_from_fork+0x3f/0x70
> [21388.531359] [<ffffffff810df340>] ? kthread_park+0x50/0x50
> [21388.531361] ---[ end trace 69a4c78997ef63b2 ]--- [21388.531364]
> BTRFS: error (device sde1) in btrfs_run_delayed_refs:2946: errno=-17
> Object already exists [21388.531367] BTRFS info (device sde1): forced
> readonly
>
> This FS has been okay just a few reboots earlier (except the stripe
> boundary thing).
>
> History of kernel versions used on the system (back to the date when I
> think the FS was still clean/not corrupted):
>
> Wed Aug 5 08:12:01 2015 >>> sys-kernel/gentoo-sources-4.1.4
> Sat Aug 15 19:04:34 2015 >>> sys-kernel/gentoo-sources-4.1.5
> Sat Aug 22 11:10:53 2015 >>> sys-kernel/gentoo-sources-4.1.6
> Tue Sep 1 00:03:38 2015 >>> sys-kernel/gentoo-sources-4.2.0
> Sat Sep 5 17:45:11 2015 >>> sys-kernel/gentoo-sources-4.2.0-r1
> Wed Sep 30 04:55:55 2015 >>> sys-kernel/gentoo-sources-4.2.2
> Tue Oct 6 22:55:19 2015 >>> sys-kernel/gentoo-sources-4.2.3
> Sun Oct 25 03:45:03 2015 >>> sys-kernel/gentoo-sources-4.2.4
> Wed Nov 4 04:21:38 2015 >>> sys-kernel/gentoo-sources-4.2.5
> Mon Nov 9 21:36:31 2015 >>> sys-kernel/gentoo-sources-4.3.0
> Sat Nov 14 10:48:03 2015 >>> sys-kernel/gentoo-sources-4.2.6
> Sun Dec 13 13:03:12 2015 >>> sys-kernel/gentoo-sources-4.2.7
> Tue Dec 15 22:37:53 2015 >>> sys-kernel/gentoo-sources-4.2.8
> Tue Dec 22 14:37:01 2015 >>> sys-kernel/gentoo-sources-4.3.3
> Fri Jan 22 20:38:32 2016 >>> sys-kernel/gentoo-sources-4.3.3-r1
> Wed Jan 27 09:27:57 2016 >>> sys-kernel/gentoo-sources-4.3.4
> Sun Jan 31 10:25:00 2016 >>> sys-kernel/gentoo-sources-4.4.0-r1
> Mon Feb 1 20:40:17 2016 >>> sys-kernel/gentoo-sources-4.4.1
> Fri Feb 19 22:11:45 2016 >>> sys-kernel/gentoo-sources-4.4.2
> Sat Feb 27 15:10:45 2016 >>> sys-kernel/gentoo-sources-4.4.3
> Sun Mar 6 14:58:05 2016 >>> sys-kernel/gentoo-sources-4.4.4
> Sat Mar 12 19:21:15 2016 >>> sys-kernel/gentoo-sources-4.4.5
> Sat Mar 19 06:13:09 2016 >>> sys-kernel/gentoo-sources-4.4.6
> Sun Mar 20 20:45:21 2016 >>> sys-kernel/gentoo-sources-4.5.0
>
> I only saw unreliable behavior with 4.4.5, 4.4.6, and 4.5.0 tho the
> problem may exist longer in my FS.
>
> $ sudo btrfs-show-super /dev/sde1
> superblock: bytenr=65536, device=/dev/sde1
> ---------------------------------------------------------
> csum 0xcc976d97 [match]
> bytenr 65536
> flags 0x1
> ( WRITTEN )
> magic _BHRfS_M [match]
> fsid 1318ec21-c421-4e36-a44a-7be3d41f9c3f
> label usb-backup
> generation 50814
> root 1251250159616
> sys_array_size 129
> chunk_root_generation 50784
> root_level 1
> chunk_root 2516518567936
> chunk_root_level 1
> log_root 0
> log_root_transid 0
> log_root_level 0
> total_bytes 2000397864960
> bytes_used 1860398493696
> sectorsize 4096
> nodesize 16384
> leafsize 16384
> stripesize 4096
> root_dir 6
> num_devices 1
> compat_flags 0x0
> compat_ro_flags 0x1
OK, you're using the new free space cache tree option, update
btrfs-progs to latest version will help further debugging.
> incompat_flags 0x169
> ( MIXED_BACKREF |
> COMPRESS_LZO |
> BIG_METADATA |
> EXTENDED_IREF |
> SKINNY_METADATA )
> csum_type 0
> csum_size 4
> cache_generation 50208
> uuid_tree_generation 50742
> dev_item.uuid 9008d5a0-ac7b-4505-8193-27428429f953
> dev_item.fsid 1318ec21-c421-4e36-a44a-7be3d41f9c3f [match]
> dev_item.type 0
> dev_item.total_bytes 2000397864960
> dev_item.bytes_used 1912308039680
> dev_item.io_align 4096
> dev_item.io_width 4096
> dev_item.sector_size 4096
> dev_item.devid 1
> dev_item.dev_group 0
> dev_item.seek_speed 0
> dev_item.bandwidth 0
> dev_item.generation 0
>
>
>
> BTW: btrfsck thinks that the space tree is invalid every time it is
> run, no matter if cleanly unmounted, uncleanly unmounted, or "btrfsck
> --repair" and then ran a second time.
>
Because your btrfs-progs is too old to understand the new free space
cache tree.
Thanks,
Qu
next prev parent reply other threads:[~2016-03-27 13:55 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-03-22 8:03 csum errors in VirtualBox VDI files Kai Krakow
2016-03-22 8:06 ` Kai Krakow
2016-03-22 8:07 ` Kai Krakow
2016-03-22 8:47 ` Qu Wenruo
2016-03-22 18:48 ` Kai Krakow
2016-03-22 19:42 ` Chris Murphy
2016-03-22 20:35 ` Kai Krakow
2016-03-23 4:16 ` Qu Wenruo
2016-03-26 19:30 ` Kai Krakow
2016-03-26 20:28 ` Chris Murphy
2016-03-26 21:04 ` Chris Murphy
2016-03-27 1:30 ` Kai Krakow
2016-03-27 4:57 ` Chris Murphy
2016-03-27 17:31 ` Kai Krakow
2016-03-27 19:04 ` Chris Murphy
2016-03-28 10:30 ` Kai Krakow
2016-03-27 1:01 ` Kai Krakow
2016-03-27 1:50 ` Kai Krakow
2016-03-27 4:43 ` Chris Murphy
2016-03-27 13:55 ` Qu Wenruo [this message]
2016-03-28 10:02 ` bad metadata crossing stripe boundary (was: csum errors in VirtualBox VDI files) Kai Krakow
2016-03-31 1:33 ` bad metadata crossing stripe boundary Qu Wenruo
2016-03-31 2:31 ` Qu Wenruo
2016-03-31 20:27 ` Kai Krakow
2016-03-31 20:37 ` Henk Slager
2016-03-31 21:00 ` Marc Haber
2016-03-31 21:16 ` Kai Krakow
2016-03-31 21:35 ` Kai Krakow
2016-04-01 5:57 ` Marc Haber
2016-04-02 9:03 ` Kai Krakow
2016-04-02 9:44 ` Marc Haber
2016-04-02 18:31 ` Kai Krakow
2016-04-02 19:39 ` Patrik Lundquist
2016-04-03 8:39 ` Marc Haber
2016-04-02 19:41 ` Chris Murphy
2016-04-03 8:51 ` Marc Haber
2016-04-03 18:29 ` Chris Murphy
2016-03-27 13:46 ` csum errors in VirtualBox VDI files Qu Wenruo
2016-03-22 20:07 ` Henk Slager
2016-03-22 21:23 ` Kai Krakow
2016-03-27 12:18 ` Martin Steigerwald
2016-03-27 16:53 ` Kai Krakow
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=56F7E65C.7020300@gmx.com \
--to=quwenruo.btrfs@gmx.com \
--cc=hurikhan77@gmail.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.