* strange i/o errors with btrfs on raid/lvm
@ 2015-09-24 14:34 Jogi Hofmüller
  2015-09-24 17:41 ` Chris Murphy

From: Jogi Hofmüller @ 2015-09-24 14:34 UTC
To: linux-btrfs

Hi all,

We are experiencing strange Input/output errors on our mail server (dovecot
pop/imap), which uses btrfs for its mailspool. The server uses software
RAID10, the RAID is split into LVM logical volumes, and the mailspool
logical volume uses btrfs.

For several days now we have seen Input/output errors on different files. We
could pinpoint the first occurrence of the errors to the day one of the RAID
disks failed; more precisely, it all started while the RAID was rebuilding.
All affected files are files that are read/written frequently, like dovecot
index files, maildirsize and the like. Most files return to a usable state
after some time (sometimes days, sometimes minutes), which is why we didn't
notice the errors right away.

We do snapshot and send/receive backups of the mailspool via cron. A file
that is unreadable in the mailspool is totally OK on the backup volume, even
the latest 'copy', which must have been taken from an otherwise unreadable
file. It is also possible to restore an unreadable file from backup without
an Input/output error; the restored file is then fine and usable.

All disks taking part in the RAID show no SMART errors, btw, and a btrfs
scrub on the mailspool did not indicate any errors (I somehow expected
that).

All this runs on a virtual machine that uses kernel 4.1.3 (Debian build) and
btrfs-progs v4.0.

So finally I would ask: what can we do to solve this problem? I would also
appreciate comments on the situation and of course hints about what is going
on. This is over my head.

Thanks and cheers,
--
J.Hofmüller

Nisiti
  - Abie Nathan, 1927-2008
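[The snapshot-and-send/receive backup described above, reduced to a minimal
cron script, might look roughly like the sketch below. The mount points,
snapshot directory and naming are illustrative assumptions, not the poster's
actual setup.]

  #!/bin/sh
  # Minimal sketch of a cron-driven btrfs snapshot + send/receive backup.
  # All paths below are assumptions for illustration.
  SRC=/srv/mailspool        # btrfs subvolume holding the mailspool
  SNAPDIR=/srv/snapshots    # directory on the same filesystem for read-only snapshots
  DST=/srv/backup           # separate btrfs filesystem that receives the stream
  NOW=$(date +%Y%m%d-%H%M)
  PREV=$(ls -1d "$SNAPDIR"/* 2>/dev/null | tail -n 1)

  # Take a read-only snapshot of the mailspool subvolume.
  btrfs subvolume snapshot -r "$SRC" "$SNAPDIR/$NOW"
  sync

  if [ -n "$PREV" ]; then
      # Incremental send relative to the previous snapshot.
      btrfs send -p "$PREV" "$SNAPDIR/$NOW" | btrfs receive "$DST"
  else
      # First run: full send.
      btrfs send "$SNAPDIR/$NOW" | btrfs receive "$DST"
  fi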
* Re: strange i/o errors with btrfs on raid/lvm
  2015-09-24 14:34 strange i/o errors with btrfs on raid/lvm Jogi Hofmüller
@ 2015-09-24 17:41 ` Chris Murphy
  2015-09-25 17:17   ` Chris Murphy

From: Chris Murphy @ 2015-09-24 17:41 UTC
To: Jogi Hofmüller; +Cc: Btrfs BTRFS

On Thu, Sep 24, 2015 at 8:34 AM, Jogi Hofmüller <jogi@mur.at> wrote:

> All this runs on a virtual machine that uses kernel 4.1.3 (Debian build)
> and btrfs-progs v4.0.

Unrelated to the problem, but I'd upgrade progs in case it's ever needed:
4.1 and 4.2 already have tons of bug fixes over 4.0.

> So finally I would ask: what can we do to solve this problem? I would also
> appreciate comments on the situation and of course hints about what is
> going on. This is over my head.

The problem description is incomplete without a dmesg. If this is a systemd
system, then I suggest

  journalctl -b -l -o short-monotonic > journal.txt

Otherwise you can put in a trimmed /var/log/messages for the current boot. I
think it's best to have kernel messages for the entire boot, because the
cause could be buried well before the Btrfs error messages.

Attach that to the mailing list. If it's too big, the best option is to file
a bug, attach the file to the bug, and then post the URL for the bug in this
thread. The fallback is to put it up somewhere else; the problem there is
that it ends up being only temporarily available, so if someone runs into
this thread in a year your attachment is MIA.

--
Chris Murphy
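[For the non-systemd case Chris mentions, a rough way to capture kernel
messages for the current boot might look like the following. The log file
name and boot marker are distro-dependent assumptions.]

  # Kernel ring buffer with human-readable timestamps (may not reach all the
  # way back to boot if the buffer has wrapped):
  dmesg -T > dmesg.txt

  # Or trim the syslog kernel log to the current boot, taking the last
  # "Linux version" banner as the boot marker (file name is an assumption):
  tac /var/log/kern.log | sed '/Linux version/q' | tac > kernel-current-boot.txt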
* Re: strange i/o errors with btrfs on raid/lvm
  2015-09-24 17:41 ` Chris Murphy
@ 2015-09-25 17:17   ` Chris Murphy
  [not found]     ` <5605ADE4.5090800@mur.at>

From: Chris Murphy @ 2015-09-25 17:17 UTC
To: Chris Murphy; +Cc: Jogi Hofmüller, Btrfs BTRFS

From offlist host logs (the Btrfs errors happen in a VM guest), I think this
is a hardware problem.

Aug 28 07:04:22 host kernel: [41367948.153031] sas: sas_scsi_find_task: task 0xffff880e85c09c00 is done
Aug 28 07:04:22 host kernel: [41367948.153033] sas: sas_eh_handle_sas_errors: task 0xffff880e85c09c00 is done
Aug 28 07:04:22 host kernel: [41367948.153036] sas: ata7: end_device-0:0: cmd error handler
Aug 28 07:04:22 host kernel: [41367948.153085] sas: ata7: end_device-0:0: dev error handler
Aug 28 07:04:22 host kernel: [41367948.153094] ata7.00: exception Emask 0x0 SAct 0x7 SErr 0x0 action 0x6 frozen
Aug 28 07:04:22 host kernel: [41367948.153119] sas: ata8: end_device-0:1: dev error handler
Aug 28 07:04:22 host kernel: [41367948.153130] sas: ata9: end_device-0:2: dev error handler
Aug 28 07:04:22 host kernel: [41367948.153134] sas: ata10: end_device-0:3: dev error handler
Aug 28 07:04:22 host kernel: [41367948.153181] ata7.00: failed command: WRITE FPDMA QUEUED
Aug 28 07:04:22 host kernel: [41367948.153234] ata7.00: cmd 61/01:00:08:08:00/00:00:00:00:00/40 tag 0 ncq 512 out
Aug 28 07:04:22 host kernel: [41367948.153234]          res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 28 07:04:22 host kernel: [41367948.153393] ata7.00: status: { DRDY }
Aug 28 07:04:22 host kernel: [41367948.153437] ata7.00: failed command: READ FPDMA QUEUED
Aug 28 07:04:22 host kernel: [41367948.153490] ata7.00: cmd 60/08:00:80:ac:15/00:00:00:00:00/40 tag 1 ncq 4096 in
Aug 28 07:04:22 host kernel: [41367948.153490]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 28 07:04:22 host kernel: [41367948.153648] ata7.00: status: { DRDY }
Aug 28 07:04:22 host kernel: [41367948.153693] ata7.00: failed command: READ FPDMA QUEUED
Aug 28 07:04:22 host kernel: [41367948.153744] ata7.00: cmd 60/30:00:90:ac:15/00:00:00:00:00/40 tag 2 ncq 24576 in
Aug 28 07:04:22 host kernel: [41367948.153744]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 28 07:04:22 host kernel: [41367948.153903] ata7.00: status: { DRDY }
Aug 28 07:04:22 host kernel: [41367948.153950] ata7: hard resetting link
Aug 28 07:04:22 host kernel: [41367948.320311] ata7.00: configured for UDMA/133 (device error ignored)
Aug 28 07:04:22 host kernel: [41367948.320371] ata7.00: device reported invalid CHS sector 0
Aug 28 07:04:22 host kernel: [41367948.320421] ata7.00: device reported invalid CHS sector 0
Aug 28 07:04:22 host kernel: [41367948.320478] ata7: EH complete
Aug 28 07:04:22 host kernel: [41367948.320558] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1

The repeated NCQ command timeouts and the hard link reset on ata7 point at
the drive or its link, not at the filesystem. So at the moment I don't think
this is a Btrfs problem.

--
Chris Murphy
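[If the hardware suspicion is right, checking the drive behind ata7 would be
a natural next step. A minimal sketch, assuming the suspect disk is /dev/sda
(the device name is a guess; behind some SAS HBAs a -d option may be needed):]

  # smartmontools: full SMART attributes plus the device error and self-test logs
  smartctl -x /dev/sda

  # kick off an extended offline self-test; check the result later with -x
  smartctl -t long /dev/sda

  # count link resets / ATA errors the kernel has logged since boot
  dmesg | grep -E 'ata7|hard resetting link'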
[parent not found: <5605ADE4.5090800@mur.at>]
* Re: strange i/o errors with btrfs on raid/lvm
  [not found] ` <5605ADE4.5090800@mur.at>
@ 2015-09-25 20:47   ` Chris Murphy
  2015-09-28  8:39     ` Jogi Hofmüller
  2015-10-02  3:54     ` Russell Coker

From: Chris Murphy @ 2015-09-25 20:47 UTC
To: Jogi Hofmüller; +Cc: Chris Murphy, Btrfs BTRFS

On Fri, Sep 25, 2015 at 2:26 PM, Jogi Hofmüller <jogi@mur.at> wrote:

> That was right while the RAID was in degraded state and rebuilding.

On the guest:

Aug 28 05:17:01 vm kernel: [140683.741688] BTRFS info (device vdc): disk space caching is enabled
Aug 28 05:17:13 vm kernel: [140695.575896] BTRFS warning (device vdc): block group 13988003840 has wrong amount of free space
Aug 28 05:17:13 vm kernel: [140695.575901] BTRFS warning (device vdc): failed to load free space cache for block group 13988003840, rebuild it now
Aug 28 05:17:13 vm kernel: [140695.626035] BTRFS warning (device vdc): block group 17209229312 has wrong amount of free space
Aug 28 05:17:13 vm kernel: [140695.626039] BTRFS warning (device vdc): failed to load free space cache for block group 17209229312, rebuild it now
Aug 28 05:17:13 vm kernel: [140695.683517] BTRFS warning (device vdc): block group 20430454784 has wrong amount of free space
Aug 28 05:17:13 vm kernel: [140695.683521] BTRFS warning (device vdc): failed to load free space cache for block group 20430454784, rebuild it now
Aug 28 05:17:13 vm kernel: [140695.822818] BTRFS warning (device vdc): block group 68211965952 has wrong amount of free space
Aug 28 05:17:13 vm kernel: [140695.822822] BTRFS warning (device vdc): failed to load free space cache for block group 68211965952, rebuild it now

On the host, there are no messages that correspond to this time index, but a
bit over an hour and a half later there are sas error messages and the first
reported write error.

I see the rebuild event starting:

Aug 28 07:04:23 host mdadm[2751]: RebuildStarted event detected on md device /dev/md/0

But there are subsequent sas errors still, including hard resetting of the
link, and additional read errors. This continues more than once... And then

Aug 28 17:06:49 host mdadm[2751]: RebuildFinished event detected on md device /dev/md/0, component device mismatches found: 2048 (on raid level 10)
Aug 28 17:06:49 host mdadm[2751]: SpareActive event detected on md device /dev/md/0, component device /dev/sdd1

and also a number of SMART warnings about seek errors on another device:

Aug 28 17:35:55 host smartd[3146]: Device: /dev/sda [SAT], SMART Usage Attribute: 7 Seek_Error_Rate changed from 180 to 179

So it sounds like more than one problem, either two drives or maybe even a
controller problem; I can't really tell, as there are lots of messages.

But 2048 mismatches found after a rebuild is a problem. So there's already
some discrepancy in the mdadm raid10. And mdadm raid1 (or 10) cannot resolve
mismatches, because which block is correct is ambiguous. So that means
something is definitely going to get corrupt. Btrfs, if the metadata profile
is DUP, can recover from that, but data can't. Normally this results in an
explicit Btrfs message about a checksum mismatch and no ability to fix it,
but it will still report the path to the affected file. I'm not finding
that, though.

Anyway, if the hardware errors are resolved, try doing a scrub on the file
system, maybe when there's a period of reduced usage, to make sure data and
metadata are OK. And then you could also manually reset the free space cache
by unmounting and then remounting with -o clear_cache. This is a one time
thing; you don't need to keep mounting with clear_cache afterwards.

--
Chris Murphy
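[A minimal sketch of the two suggestions above, assuming the mailspool is
mounted at /srv/mailspool from /dev/vdb (both the mount point and the device
are placeholders, not taken from the thread):]

  # Scrub: verify checksums on data and metadata, repairing where a good
  # redundant copy exists (e.g. DUP metadata); otherwise errors are only reported.
  btrfs scrub start /srv/mailspool
  btrfs scrub status /srv/mailspool     # check progress / results

  # One-time rebuild of the free space cache; the option does not need to be
  # used on subsequent mounts.
  umount /srv/mailspool
  mount -o clear_cache /dev/vdb /srv/mailspool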
* Re: strange i/o errors with btrfs on raid/lvm
  2015-09-25 20:47 ` Chris Murphy
@ 2015-09-28  8:39   ` Jogi Hofmüller

From: Jogi Hofmüller @ 2015-09-28 8:39 UTC
To: Btrfs BTRFS

Hi Chris, all,

Am 2015-09-25 um 22:47 schrieb Chris Murphy:
> On Fri, Sep 25, 2015 at 2:26 PM, Jogi Hofmüller <jogi@mur.at> wrote:
>
>> That was right while the RAID was in degraded state and rebuilding.
>
> On the guest:
>
> Aug 28 05:17:01 vm kernel: [140683.741688] BTRFS info (device vdc):
> disk space caching is enabled
> Aug 28 05:17:13 vm kernel: [140695.575896] BTRFS warning (device vdc):
> block group 13988003840 has wrong amount of free space

The device vdc is the backup device; that's where we collect snapshots of
our mail spool. I could fix that by remounting with -o clear_cache, as you
suggested. However, it is not directly related to the I/O error problem: the
backup device sits on a different RAID than the mail spool.

> On the host, there are no messages that correspond to this time index,
> but a bit over an hour and a half later there are sas error messages
> and the first reported write error.
>
> I see the rebuild event starting:
>
> Aug 28 07:04:23 host mdadm[2751]: RebuildStarted event detected on md
> device /dev/md/0
>
> But there are subsequent sas errors still, including hard resetting of
> the link, and additional read errors. This continues more than once...

That is smartd still trying to read the failed device (sda).

> And then
>
> Aug 28 17:06:49 host mdadm[2751]: RebuildFinished event detected on md
> device /dev/md/0, component device mismatches found: 2048 (on raid
> level 10)

Right. I totally missed the 'mismatches found' part :(

> and also a number of SMART warnings about seek errors on another device:
>
> Aug 28 17:35:55 host smartd[3146]: Device: /dev/sda [SAT], SMART Usage
> Attribute: 7 Seek_Error_Rate changed from 180 to 179

Still the failed device. These messages continued until we replaced the disk
with a new one.

> But 2048 mismatches found after a rebuild is a problem. So there's
> already some discrepancy in the mdadm raid10. And mdadm raid1 (or 10)
> cannot resolve mismatches, because which block is correct is ambiguous.
> So that means something is definitely going to get corrupt. Btrfs, if
> the metadata profile is DUP, can recover from that, but data can't.
> Normally this results in an explicit Btrfs message about a checksum
> mismatch and no ability to fix it, but it will still report the path to
> the affected file. I'm not finding that, though.

I ran checkarray on md0 and that reduced the mismatch_cnt to 384.

What I still don't understand is why it is possible to make a backup of a
file that is not accessible in the file system. All files that produce an
I/O error upon access are fine on the backup drive. It's even possible to
restore a file from the backup drive, and then it is read/writable again.

Another thing I cannot explain is why the only files affected are those that
get read/written lots of times.

And finally, the question is why none of the other logical volumes that
reside on the same RAID experience any problems. There are several other
logical volumes containing btrfs and ext4 file systems.

Anyhow, thanks for all the suggestions so far.

Cheers,

PS: my messages with attached log files got forwarded to /dev/null because
they exceeded the 100000 char limit :(

--
j.hofmüller

We are all idiots with deadlines.
  - Mike West
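[For reference, the checkarray run mentioned above boils down to the md
"check" sync action; a minimal sketch, assuming the array is md0:]

  # Start a read-only consistency check of the array
  # (roughly what Debian's checkarray script does under the hood)
  echo check > /sys/block/md0/md/sync_action

  # Watch progress, then see how many mismatched sectors were counted
  cat /proc/mdstat
  cat /sys/block/md0/md/mismatch_cnt

  # Optionally rewrite mismatched stripes from one copy
  # (on raid1/10 the chosen copy is arbitrary, as Chris noted)
  echo repair > /sys/block/md0/md/sync_action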
* Re: strange i/o errors with btrfs on raid/lvm
  2015-09-25 20:47 ` Chris Murphy
  2015-09-28  8:39   ` Jogi Hofmüller
@ 2015-10-02  3:54   ` Russell Coker

From: Russell Coker @ 2015-10-02 3:54 UTC
To: Chris Murphy; +Cc: Jogi Hofmüller, Btrfs BTRFS

On Sat, 26 Sep 2015 06:47:26 AM Chris Murphy wrote:
> And then
>
> Aug 28 17:06:49 host mdadm[2751]: RebuildFinished event detected on md
> device /dev/md/0, component device mismatches found: 2048 (on raid
> level 10)
> Aug 28 17:06:49 host mdadm[2751]: SpareActive event detected on md
> device /dev/md/0, component device /dev/sdd1

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=405919#41

For Linux Software RAID-1 it's expected that you get a multiple of 128
mismatches found every time you do a scan. I don't know why that is, but
systems running like that don't appear to have any problems. This even
appears to happen on arrays that aren't in any noticeable use (e.g. swap on
a server that has heaps of RAM, so swap doesn't get used).

The above URL describes part of the problem, but if that were the entire
explanation then I'd expect multiples of 8, not 128. Also the explanation
that Neil offers doesn't seem to cover the case of 10,000+ mismatches
occurring in the weekly scrubs. Linux Software RAID-10 works the same way in
this regard.

So I'm not sure that the log entries in question mean anything other than a
known bug in Linux Software RAID. I have servers that have been running well
for years while repeatedly reporting such errors on Linux Software RAID-1
and not getting any fsck problems. Note that the servers in question don't
run BTRFS, because years ago it really wasn't suitable for server use.

It is possible that there are errors that aren't detected by e2fsck, don't
result in corruption of gzip-compressed data, don't give MySQL data
consistency errors, and don't corrupt any user data in a noticeable way. But
it seems to be evidence in support of the theory that such mismatches don't
matter.

--
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/
Thread overview: 6+ messages
2015-09-24 14:34 strange i/o errors with btrfs on raid/lvm Jogi Hofmüller
2015-09-24 17:41 ` Chris Murphy
2015-09-25 17:17 ` Chris Murphy
[not found] ` <5605ADE4.5090800@mur.at>
2015-09-25 20:47 ` Chris Murphy
2015-09-28 8:39 ` Jogi Hofmüller
2015-10-02 3:54 ` Russell Coker