linux-btrfs.vger.kernel.org archive mirror
* corruption in USB harddrive - backup via send/receive - question
@ 2015-04-16 18:48 Miguel Negrão
  2015-04-16 20:06 ` Marc MERLIN
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Miguel Negrão @ 2015-04-16 18:48 UTC (permalink / raw)
  To: linux-btrfs

Hello,

I'm running a laptop, macbook pro 8,2, with ubuntu, on kernel
3.13.0-49-lowlatency. I have a USB enclosure containing two harddrives
(Icydock JBOD). Each harddrive runs its own btrfs file system, on top of
a luks partition. I back up one harddrive to the other using btrfs
send/receive with incremental sends (tests that I did indicated this setup
was too fragile for running btrfs RAID).

I've noticed that files on one of the harddrives sometimes get corrupted.
It's not many files, but it does happen from time to time. On IRC I was
told it could be the USB enclosure, it could be memory, etc. The SMART data
of the harddrives says they are fine, and the quick SMART tests also pass
without problems.


 - Given that I'm running a laptop and communicating with the harddrives via
USB, is it expected that I will get some corruption from time to time, or is
this abnormal and a sign that something is very wrong with some of my
equipment ? If so, how can I track down what is responsible ?
 - Is it possible to extract a file that has csum errors ? I work with audio
files, and if I don't have a backup of a file I would still like to get the
full corrupted version, since most of the audio might still be perfectly fine.
Can I tell btrfs to compute a new csum of the file as it is now, and just live
with the corruption ?
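
For the second question, one rough way to get at the readable parts of such
a file is sketched below in Python. It assumes that reading a block whose
checksum fails returns an I/O error (EIO) to the application, and that 4096
bytes is a suitable block size for this filesystem. The names `salvage` and
`read_block` are illustrative only, not btrfs tools. Unreadable blocks are
filled with zeros, much like `dd conv=noerror,sync`, so the rest of the audio
keeps its position in the file.

```python
import errno
import os

BLOCK_SIZE = 4096  # assumption: matches the filesystem's data block size


def salvage(src_path, dst_path, block_size=BLOCK_SIZE, read_block=os.pread):
    """Copy src_path to dst_path one block at a time.

    Blocks that fail with EIO are replaced by zero bytes.
    Returns the list of byte offsets that could not be read.
    """
    bad_offsets = []
    fd = os.open(src_path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        with open(dst_path, "wb") as dst:
            for offset in range(0, size, block_size):
                want = min(block_size, size - offset)
                try:
                    data = read_block(fd, want, offset)
                except OSError as exc:
                    if exc.errno != errno.EIO:
                        raise  # anything other than an I/O error is unexpected
                    data = b""
                    bad_offsets.append(offset)
                # Pad failed (or short) reads so later data stays aligned.
                dst.write(data.ljust(want, b"\0"))
    finally:
        os.close(fd)
    return bad_offsets
```

Calling `salvage("/mnt/main/take1.wav", "/tmp/take1-salvaged.wav")` (both
paths hypothetical) would return the offsets that had to be zero-filled. The
salvaged copy should be written to a different, trusted disk.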

I copied a file to the main USB harddrive on 2015-02-21, and the file was
backed up to the other harddrive via send/receive on 2015-02-23. Now (as of
yesterday), when I try to access the file on the main harddrive, it is corrupted:

Apr 16 19:20:35 miguel-MacBookPro kernel: [  835.944606] BTRFS info (device
dm-1): csum failed ino 136726 off 1067679744 csum 4135207512 expected csum
1128560616
Apr 16 19:20:35 miguel-MacBookPro kernel: [  835.948431] BTRFS info (device
dm-1): csum failed ino 136726 off 1067761664 csum 730461863 expected csum
1924299628
Apr 16 19:20:36 miguel-MacBookPro kernel: [  836.395372] BTRFS info (device
dm-1): csum failed ino 136726 off 1067679744 csum 4135207512 expected csum
1128560616
Apr 16 19:20:36 miguel-MacBookPro kernel: [  836.396682] BTRFS info (device
dm-1): csum failed ino 136726 off 1067679744 csum 4135207512 expected csum
1128560616
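
To see how many distinct blocks are affected, log lines like the ones above
can be reduced to a set of (inode, offset) pairs. The Python sketch below
assumes the message wording shown here ("csum failed ino ... off ...");
other kernel versions may word the message differently, and `failed_blocks`
is just an illustrative name.

```python
import re
from collections import defaultdict

# Matches the message format quoted in this thread (kernel 3.13).
CSUM_RE = re.compile(
    r"csum failed ino (\d+) off (\d+) csum (\d+) expected csum (\d+)"
)


def failed_blocks(log_text):
    """Return {inode: sorted list of distinct failing byte offsets}."""
    # Mail clients wrap long log lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    found = defaultdict(set)
    for ino, off, _actual, _expected in CSUM_RE.findall(flat):
        found[int(ino)].add(int(off))
    return {ino: sorted(offsets) for ino, offsets in found.items()}
```

For the four lines above this yields two distinct offsets for inode 136726,
since the same block is reported more than once. The inode number can then
be mapped back to a path with `btrfs inspect-internal inode-resolve`.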

I can access it fine on the backup harddrive. 

Questions:

- Can I assume that the corruption happened after the file was sent to
the backup harddrive ?
- Will btrfs send ever send a file with corrupted blocks ?
- I kept running more backups, but that particular file has not changed
since. Am I correct in assuming that, since the file was not changed, it was
not sent again to the backup disk and that therefore the version I have in
the backup should be a good copy ?

Best regards,
Miguel

Label: 'huge-new'  uuid: 21d841c9-7c30-4d1b-b4c2-8c0e59e8959a
	Total devices 1 FS bytes used 1.04TiB
	devid    1 size 2.73TiB used 1.06TiB path /dev/mapper/huge-new

[/dev/mapper/huge-new].write_io_errs   0
[/dev/mapper/huge-new].read_io_errs    0
[/dev/mapper/huge-new].flush_io_errs   0
[/dev/mapper/huge-new].corruption_errs 1970
[/dev/mapper/huge-new].generation_errs 0

Btrfs v0.20-rc1-335-gf00dd83

Label: 'huge-new-backup'  uuid: 9af299bc-48b0-4e52-8078-82749627d9f4
	Total devices 1 FS bytes used 1.04TiB
	devid    1 size 2.73TiB used 1.05TiB path /dev/mapper/huge-new-backup

[/dev/mapper/huge-new-backup].write_io_errs   0
[/dev/mapper/huge-new-backup].read_io_errs    0
[/dev/mapper/huge-new-backup].flush_io_errs   0
[/dev/mapper/huge-new-backup].corruption_errs 0
[/dev/mapper/huge-new-backup].generation_errs 0

Btrfs v0.20-rc1-335-gf00dd83
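
The per-device counters above appear to be in the format printed by
`btrfs device stats`. As a small sketch, assuming that format, a Python
helper (the name `nonzero_counters` is made up for illustration) can pick
out the counters that are not zero, which is handy when checking several
disks after each backup run.

```python
import re

# One counter per line, e.g. "[/dev/mapper/huge-new].corruption_errs 1970"
STAT_RE = re.compile(
    r"^\[(?P<dev>[^\]]+)\]\.(?P<name>\w+)\s+(?P<value>\d+)\s*$", re.M
)


def nonzero_counters(stats_text):
    """Return {device: {counter_name: value}} for every non-zero counter."""
    found = {}
    for match in STAT_RE.finditer(stats_text):
        value = int(match["value"])
        if value:
            found.setdefault(match["dev"], {})[match["name"]] = value
    return found
```

Fed the two listings above, this reports only `corruption_errs` (1970) on
/dev/mapper/huge-new and nothing for the backup device.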





^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: corruption in USB harddrive - backup via send/receive - question
  2015-04-16 18:48 corruption in USB harddrive - backup via send/receive - question Miguel Negrão
@ 2015-04-16 20:06 ` Marc MERLIN
  2015-04-16 20:58   ` Miguel Negrão
  2015-04-17 11:31 ` Austin S Hemmelgarn
  2015-04-20 14:07 ` Sander
  2 siblings, 1 reply; 7+ messages in thread
From: Marc MERLIN @ 2015-04-16 20:06 UTC (permalink / raw)
  To: Miguel Negrão; +Cc: linux-btrfs

On Thu, Apr 16, 2015 at 06:48:43PM +0000, Miguel Negrão wrote:
> Hello,
> 
> I'm running a laptop, macbook pro 8,2, with ubuntu, on kernel
> 3.13.0-49-lowlatency. I have a USB enclosure containing two harddrives

Btrfs send/receive is not known to work well enough until 3.14.x, and
several corruption bugs have been fixed between 3.13 and 3.19.

You should definitely upgrade to a much more recent kernel.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


* Re: corruption in USB harddrive - backup via send/receive - question
  2015-04-16 20:06 ` Marc MERLIN
@ 2015-04-16 20:58   ` Miguel Negrão
  0 siblings, 0 replies; 7+ messages in thread
From: Miguel Negrão @ 2015-04-16 20:58 UTC (permalink / raw)
  To: linux-btrfs


Marc MERLIN <marc <at> merlins.org> writes:

> 
> On Thu, Apr 16, 2015 at 06:48:43PM +0000, Miguel Negrão wrote:
> > Hello,
> > 
> > I'm running a laptop, macbook pro 8,2, with ubuntu, on kernel
> > 3.13.0-49-lowlatency. I have a USB enclosure containing two harddrives
> 
> Btrfs send/receive is not known to work well enough until 3.14.x, and
> several corruption bugs have been fixed between 3.13 and 3.19.
> 
> You should definitely upgrade to a much more recent kernel.
> 
> Marc

Hi Marc,

But in my case the corruption is happening on the disk from which data is
being sent. Do you mean that btrfs send can cause corruption on the sending
disk prior to 3.19 ?

Thank you,
Miguel


* Re: corruption in USB harddrive - backup via send/receive - question
  2015-04-16 18:48 corruption in USB harddrive - backup via send/receive - question Miguel Negrão
  2015-04-16 20:06 ` Marc MERLIN
@ 2015-04-17 11:31 ` Austin S Hemmelgarn
  2015-04-17 19:45   ` Miguel Negrão
  2015-04-20 14:07 ` Sander
  2 siblings, 1 reply; 7+ messages in thread
From: Austin S Hemmelgarn @ 2015-04-17 11:31 UTC (permalink / raw)
  To: Miguel Negrão, linux-btrfs


First, as mentioned in another reply to this, you should update your 
kernel.  I don't think that the kernel is what is causing the issue, but 
it is an old kernel by BTRFS standards, and keeping up to date is 
important with a filesystem under such heavy development.  The same 
actually goes for the userspace components as well, although that is 
less critical than the kernel side.

As to the corruption, this sounds like some kind of hardware issue to 
me.  Assuming that you can afford to wipe the filesystems, I would 
suggest running some tests on the disks with the program 'badblocks' 
(found in e2fsprogs).  The fact that it is only the first disk that 
is having issues would seem to indicate that either that port on the 
enclosure is intermittently bad, or the disk itself is having issues. 
The SMART tests passing just indicate that the disk doesn't think it is 
failing, not that it is perfectly reliable (I've had disks that pass all 
the SMART tests, and then just randomly reset themselves from time to 
time).  I would also look into what manufacturer and firmware version 
the drives are, as I do know that some of the early Seagate and WD 
multi-terabyte drives had some serious firmware bugs that could cause 
data corruption similar to this.





* Re: corruption in USB harddrive - backup via send/receive - question
  2015-04-17 11:31 ` Austin S Hemmelgarn
@ 2015-04-17 19:45   ` Miguel Negrão
  0 siblings, 0 replies; 7+ messages in thread
From: Miguel Negrão @ 2015-04-17 19:45 UTC (permalink / raw)
  To: Austin S Hemmelgarn, linux-btrfs

Hi Austin,

On 17-04-2015 12:31, Austin S Hemmelgarn wrote:
> 
> First, as mentioned in another reply to this, you should update your
> kernel.  I don't think that the kernel is what is causing the issue, but
> it is an old kernel by BTRFS standards, and keeping up to date is
> important with a filesystem under such heavy development.  The same
> actually goes for the userspace components as well, although that is
> less critical than the kernel side.

Ubuntu 15.04 is coming out next week with 3.19, I will try to update to
that soon. Unfortunately, using a linux machine as a music instrument
requires a stable setup, which means not always getting the latest
updates. I guess using btrfs for my setup might have been a bad decision.

> As to the corruption, this sounds like some kind of hardware issue to
> me.  Assuming that you can afford to wipe the filesystems, I would
> suggest running some tests on the disks with the program 'badblocks'
> (found in e2fsprogs).  The fact that it is only the first disk that
> is having issues would seem to indicate that either that port on the
> enclosure is intermittently bad, or the disk itself is having issues. The
> SMART tests passing just indicate that the disk doesn't think it is
> failing, not that it is perfectly reliable (I've had disks that pass all
> the SMART tests, and then just randomly reset themselves from time to
> time).  I would also look into what manufacturer and firmware version
> the drives are, as I do know that some of the early Seagate and WD
> multi-terabyte drives had some serious firmware bugs that could cause
> data corruption similar to this.

Thank you for the advice. At the moment it would be highly inconvenient
to erase one of the disks because I have almost 2 years of snapshots
sent with send/receive on the backup disk to which I could no longer
send just the diff, since erasing the original disk and copying from
backup would create different ids. I guess after a new backup I could
make the old snapshots again share the same blocks by using bedup or
something like that, although I'm not familiar with that tool and I
think two copies of the same data would not fit on that drive.
The drives are Western Digital Green drives; the website doesn't list
any firmware update for those.

I have compared the results of the latest scrub with the results from a
previous scrub two months ago. None of the files that are damaged now were
damaged two months ago, and all of them were already in the backup by then
(which has no errors, now or two months ago), so I guess it's safe to recover
those files from the backup. I guess I will start running scrubs every few
weeks to check if the corruption continues to happen.
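
The comparison described above can be written down as a small Python
sketch. It assumes the damaged paths from each scrub, and the paths already
present in the backup at the time of the earlier scrub, have been collected
by hand into plain lists; `classify_damaged` and its arguments are
hypothetical names. It also relies on the assumption that unchanged files
are not re-sent by later incremental backups.

```python
def classify_damaged(previous_damaged, current_damaged, backed_up):
    """Split currently damaged paths into (restorable, suspect).

    A path counts as restorable only if it was clean at the earlier scrub
    and was already in the backup at that time.
    """
    previous = set(previous_damaged)
    backup = set(backed_up)
    restorable, suspect = [], []
    for path in sorted(set(current_damaged)):
        if path not in previous and path in backup:
            restorable.append(path)
        else:
            # Damaged before the backup was known good, or never backed up.
            suspect.append(path)
    return restorable, suspect
```

Anything that lands in the second list would need a closer look before
trusting the backup copy.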

Thank you !
Miguel


* Re: corruption in USB harddrive - backup via send/receive - question
  2015-04-16 18:48 corruption in USB harddrive - backup via send/receive - question Miguel Negrão
  2015-04-16 20:06 ` Marc MERLIN
  2015-04-17 11:31 ` Austin S Hemmelgarn
@ 2015-04-20 14:07 ` Sander
  2015-04-20 14:29   ` Miguel Negrão
  2 siblings, 1 reply; 7+ messages in thread
From: Sander @ 2015-04-20 14:07 UTC (permalink / raw)
  To: Miguel Negrão; +Cc: linux-btrfs

Miguel Negrão wrote (ao):
>  - Given that I'm running a laptop and communicating with the harddrives via
> USB, is it expected that I will get some corruption from time to time or is
> this abnormal

Abnormal. I have three Intel ssds connected via usb to an Arndale. Two of
them have luks and btrfs raid0 on top, and are used as a home server. The
third ssd is plain btrfs, and is used for backup archives.

	Sander


* Re: corruption in USB harddrive - backup via send/receive - question
  2015-04-20 14:07 ` Sander
@ 2015-04-20 14:29   ` Miguel Negrão
  0 siblings, 0 replies; 7+ messages in thread
From: Miguel Negrão @ 2015-04-20 14:29 UTC (permalink / raw)
  To: sander; +Cc: linux-btrfs

On 20-04-2015 15:07, Sander wrote:
> Miguel Negrão wrote (ao):
>>  - Given that I'm running a laptop and communicating with the harddrives via
>> USB, is it expected that I will get some corruption from time to time or is
>> this abnormal
> 
> Abnormal. I have three Intel ssds connected via usb to an Arndale. Two of
> them have luks and btrfs raid0 on top, and are used as a home server. The
> third ssd is plain btrfs, and is used for backup archives.

Hi Sander.

Good to know. Indeed I think something's wrong with the enclosure, I
keep getting some minimal corruption from time to time. I also use btrfs
over luks.

I'm considering buying a thunderbolt to esata converter (I have a macbook
pro), as bypassing the usb interface might be more stable. I imagine esata
is better for btrfs than usb, correct ? Does anyone here have experience
with thunderbolt in linux ? My understanding is that if connected at boot
time, devices should just appear as normal pci devices.

Best,
Miguel

