From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: Kai Krakow <hurikhan77@gmail.com>, <linux-btrfs@vger.kernel.org>
Subject: Re: csum errors in VirtualBox VDI files
Date: Wed, 23 Mar 2016 12:16:24 +0800
Message-ID: <56F21898.3020101@cn.fujitsu.com>
In-Reply-To: <20160322194854.161e9c4c@jupiter.sol.kaishome.de>



Kai Krakow wrote on 2016/03/22 19:48 +0100:
> Am Tue, 22 Mar 2016 16:47:10 +0800
> schrieb Qu Wenruo <quwenruo@cn.fujitsu.com>:
>
>> Hi,
>>
>> Kai Krakow wrote on 2016/03/22 09:03 +0100:
>>> Hello!
>>>
>>> Since one of the last kernel updates (I don't know which exactly),
>>> I'm experiencing csum errors within VDI files when running
>>> VirtualBox. A side effect of this is, as soon as dmesg shows these
>>> errors, commands like "du" and "df" hang until reboot.
>>>
>>> I've now restored the file from backup but it happens over and over
>>> again.
>>>
>>> On another machine I'm also seeing errors with big files in the
>>> following scenario (apparently an older kernel, 4.1.x afair):
>>>
>>> # ntfsclone --save /dev/md126p2 -o rescue.ntfs.img
>>>                      ^ big NTFS partition   ^ file on btrfs
>>>
>>> results in a write error and the file system goes read-only.
>>
>> When it goes RO, it must have some warning in kernel log.
>> Would you please paste the kernel log?
>
> Apparently, that system does not boot now due to errors in the bcache
> b-tree. That being said, it may well be a bcache error and not
> btrfs' fault. Unfortunately I couldn't capture the output; I was in a
> hurry. It said "write error" and had some backtrace. I will come
> back to this later.
>
> Let's go to the system I currently care about (that one with the
> always breaking VDI file):
>
>>> Both systems have in common they are using btrfs on bcache with
>>> compress=lzo,autodefrag,nossd,discard (mraid=1,draid=0 and
>>> mraid=1,draid=single).
>>>
>>> The system mentioned first is running Kernel 4.5.0 with Gentoo
>>> patch-set. I upgraded from the last 4.4.x kernel when I first
>>> experienced this problem. The first time the problem resulted in a
>>> duplicate extent which btrfsck wasn't able to fix, that's when I
>>> first restored from backup. But now I'm getting csum errors in this
>>> file over and over again, plus when rsync has run for backup, the
>>> system no longer responds to "du" and "df" commands - it just hangs.
>>>
>>> Known problem? Does it help if I send debug info? If so, please
>>> instruct.
>>>
>> Does btrfs check report anything wrong?
>
> After the error occurred?
>
> Yes, some text about the extent being compressed and btrfs repair
> doesn't currently handle that case (I tried --repair as I'm having a
> backup). I simply decided not to investigate that further at that point
> but delete and restore the affected file from backup. However, this is
> the message from dmesg (tho, I didn't catch the backtrace):
>
> btrfs_run_delayed_refs:2927: errno=-17 Object already exists

That's nice, at least we have some clue.

It's almost certainly one of two things: a bug in the btrfs kernel code,
which doesn't handle delayed refs well (low possibility), or a corrupted
fs containing something the kernel can't handle (I bet that's the case).
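
For reference, the errno in that dmesg line can be decoded mechanically.
Below is a minimal sketch, assuming the "function:line: errno=-N message"
layout of the quoted line; the helper name decode_abort and the regular
expression are illustrative only and are not part of btrfs or btrfs-progs.

```python
import errno
import re

# Matches the "function:line: errno=-N message" layout of the quoted dmesg line.
# The layout is an assumption based on that single line, not a kernel guarantee.
ABORT_RE = re.compile(
    r"(?P<func>\w+):(?P<line>\d+): errno=-(?P<num>\d+) (?P<msg>.*)"
)


def decode_abort(text):
    """Return (function, source line, errno name, message), or None if no match."""
    m = ABORT_RE.search(text)
    if m is None:
        return None
    num = int(m.group("num"))
    # errno.errorcode maps numeric codes to symbolic names on the running platform.
    name = errno.errorcode.get(num, "unknown")
    return m.group("func"), int(m.group("line")), name, m.group("msg")


if __name__ == "__main__":
    print(decode_abort(
        "btrfs_run_delayed_refs:2927: errno=-17 Object already exists"
    ))
```

On Linux, errno 17 is EEXIST, which matches the "Object already exists"
text: something tried to insert an item that was already present.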

>
> After this, the system went RO and I had to reboot. I ran btrfs check
> and it told about a duplicate extent.

If you can post the output of btrfsck, it would help a lot to locate the
problem and enhance btrfsck.

> I identified the file (using
> btrfs inspect and the inode number) being the VDI file, and restored it.
> Afterwards, I upgraded from latest 4.4 to 4.5. Currently, I'm now
> watching closer since this incident, and the file becomes damaged
> without any message in the kernel log when doing some more than usual
> IO in VirtualBox. When my backup script then runs over the file, I get
> errors about missing csums - the block is not readable.

If btrfsck reports no other problem after your fix, --init-csum-tree
would handle such a case.

> I now ran
> ddrescue, and replaced the file to get a current and slightly damaged
> VDI image back (my backup uses time rotation, so no problem). But
> running chkdsk in VirtualBox damages the VDI again.
>
> Regarding the other error on the other machine, I'm not completely
> convinced bcache ain't involved in this problem.
>
> As soon as I have "produced" csum errors again, I'll run btrfs check. Or
> should I do it now without forcing the csum error to occur?
>
>
If possible, I recommend running btrfsck now and posting all of its output.

Thanks,
Qu


