Btrfs on a failing drive

Linux Btrfs filesystem development

* Btrfs on a failing drive
@ 2014-11-17 22:55 Fennec Fox
  2014-11-18  1:17 ` Phillip Susi
  2014-11-18  6:10 ` Chris Murphy
  0 siblings, 2 replies; 6+ messages in thread
From: Fennec Fox @ 2014-11-17 22:55 UTC (permalink / raw)
  To: linux-btrfs

Well, I am an Arch Linux user and machine owner using a failing drive.
It's still reliable enough for me, but btrfs seems not to mark bad
blocks as unusable and continues to try to write to them.
/bbs.archlinux.org/viewtopic.php?pid=1476540#p1476540  This forum post
has a few more details regarding the problem. I really need a bit of
help. Thank you.

Btrfs v3.17.1
Linux archos 3.17.3-1-ARCH #1 SMP PREEMPT Fri Nov 14 22:56:01 CET 2014 i686 GNU/Linux
Label: none  uuid: 9d7de4cb-d0a6-4b63-af65-9e055197808d
Total devices 1 FS bytes used 75.62GiB
devid    1 size 288.09GiB used 288.09GiB path /dev/sda1

Btrfs v3.17.1

[fennectech@archos ~]$ sudo   btrfs fi df /
Data, single: total=286.06GiB, used=75.02GiB
System, DUP: total=8.00MiB, used=48.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=1.00GiB, used=615.66MiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=208.00MiB, used=0.00B

10485756k FS
[   15.003094] mousedev: PS/2 mouse device common for all mice
[   17.894442] atl1c 0000:05:00.0: irq 30 for MSI/MSI-X
[   17.907192] IPv6: ADDRCONF(NETDEV_UP): enp5s0: link is not ready
[   17.927793] IPv6: ADDRCONF(NETDEV_UP): wlp4s0: link is not ready
[   19.298877] wlp4s0: authenticate with 20:e5:2a:5f:38:9c
[   19.313224] wlp4s0: send auth to 20:e5:2a:5f:38:9c (try 1/3)
[   19.314876] wlp4s0: authenticated
[   19.316730] wlp4s0: associate with 20:e5:2a:5f:38:9c (try 1/3)
[   19.319151] wlp4s0: RX AssocResp from 20:e5:2a:5f:38:9c (capab=0x421 status=0 aid=1)
[   19.319274] wlp4s0: associated
[   19.319295] IPv6: ADDRCONF(NETDEV_CHANGE): wlp4s0: link becomes ready
[   70.470562] fuse init (API version 7.23)
[   83.050733] BTRFS info (device sda1): csum failed ino 3048916 off 33030144 csum 1217419445 expected csum 510562246
[   83.052317] BTRFS info (device sda1): csum failed ino 3048916 off 33030144 csum 1217419445 expected csum 510562246
[   84.305761] BTRFS info (device sda1): csum failed ino 3048916 off 21884928 csum 1036763273 expected csum 3160828041
[   84.306348] BTRFS info (device sda1): csum failed ino 3048916 off 21884928 csum 1036763273 expected csum 3160828041
[   84.318911] BTRFS info (device sda1): csum failed ino 3048916 off 9314304 csum 786831086 expected csum 1264200780
[   84.319115] BTRFS info (device sda1): csum failed ino 3048916 off 9314304 csum 786831086 expected csum 1264200780
[   84.319317] BTRFS info (device sda1): csum failed ino 3048916 off 9314304 csum 786831086 expected csum 1264200780
[   86.306536] BTRFS info (device sda1): csum failed ino 3048916 off 33030144 csum 1217419445 expected csum 510562246
[   91.050707] BTRFS info (device sda1): csum failed ino 3048916 off 21884928 csum 1036763273 expected csum 3160828041
[  101.185462] BTRFS info (device sda1): csum failed ino 3048916 off 33030144 csum 1217419445 expected csum 510562246
[  102.998155] BTRFS info (device sda1): csum failed ino 3048916 off 9314304 csum 786831086 expected csum 1264200780
[  103.140555] BTRFS info (device sda1): csum failed ino 3048916 off 21884928 csum 1036763273 expected csum 3160828041
[  103.464885] BTRFS info (device sda1): csum failed ino 3048916 off 9314304 csum 786831086 expected csum 1264200780
[  103.474580] BTRFS info (device sda1): csum failed ino 3048916 off 33030144 csum 1217419445 expected csum 510562246
[  103.496146] BTRFS info (device sda1): csum failed ino 3048916 off 21884928 csum 1036763273 expected csum 3160828041
[  103.498569] BTRFS info (device sda1): csum failed ino 3048916 off 33030144 csum 1217419445 expected csum 510562246
[  103.499630] BTRFS info (device sda1): csum failed ino 3048916 off 21884928 csum 1036763273 expected csum 3160828041
[  103.499783] BTRFS info (device sda1): csum failed ino 3048916 off 9314304 csum 786831086 expected csum 1264200780

* Re: Btrfs on a failing drive
  2014-11-17 22:55 Btrfs on a failing drive Fennec Fox
@ 2014-11-18  1:17 ` Phillip Susi
       [not found]   ` <CAD1x5BDJhZ6a=91G8+UzLTY+Oik7MVpr-XGKOQrOnXpkRLjwug@mail.gmail.com>
  2014-11-18  6:10 ` Chris Murphy
  1 sibling, 1 reply; 6+ messages in thread
From: Phillip Susi @ 2014-11-18  1:17 UTC (permalink / raw)
  To: Fennec Fox, linux-btrfs

On 11/17/2014 05:55 PM, Fennec Fox wrote:
> Well, I am an Arch Linux user and machine owner using a failing
> drive. It's still reliable enough for me, but btrfs seems not to mark
> bad blocks as unusable and continues to try to write to them.
> /bbs.archlinux.org/viewtopic.php?pid=1476540#p1476540  This forum
> post has a few more details regarding the problem. I really need a
> bit of help. Thank you.

If indeed writes are failing then the drive is only suitable for a
door stop.  Drives remap bad sectors to a spare pool on write so if it
is now failing writes, it has already exhausted its spare pool and you
should have replaced it long ago.  Have a look at its SMART stats and
it will probably confirm the drive is fubar.


> [   83.050733] BTRFS info (device sda1): csum failed ino 3048916 off 33030144 csum 1217419445 expected csum 510562246
> [   83.052317] BTRFS info (device sda1): csum failed ino 3048916 off 33030144 csum 1217419445 expected csum 510562246

That's not saying writes are failing; it is saying that your data has
been silently corrupted, which means the drive is the worst kind of
broken and should be thrown in a fire at once.
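
To see how widespread the corruption is, the csum lines can be grouped by inode and offset. The following is a minimal Python sketch, not part of the original message; the function name `summarize_csum_failures` is made up for illustration, and the regular expression assumes the exact message wording shown in the dmesg excerpt of this thread (other kernel versions may word it differently).

```python
import re
from collections import Counter

# Matches the kernel message format quoted in this thread.
CSUM_RE = re.compile(
    r"BTRFS info \(device (?P<dev>\S+)\): csum failed ino (?P<ino>\d+)\s+"
    r"off (?P<off>\d+) csum (?P<got>\d+) expected csum (?P<want>\d+)"
)

def summarize_csum_failures(log_text):
    """Count csum failures per (device, inode, file offset)."""
    # Collapse all whitespace so messages wrapped by a mail client still match.
    flat = " ".join(log_text.split())
    counts = Counter()
    for m in CSUM_RE.finditer(flat):
        counts[(m["dev"], int(m["ino"]), int(m["off"]))] += 1
    return counts

# Three lines copied from the log posted in this thread.
sample = """\
[   83.050733] BTRFS info (device sda1): csum failed ino 3048916 off
33030144 csum 1217419445 expected csum 510562246
[   84.305761] BTRFS info (device sda1): csum failed ino 3048916 off
21884928 csum 1036763273 expected csum 3160828041
[   86.306536] BTRFS info (device sda1): csum failed ino 3048916 off
33030144 csum 1217419445 expected csum 510562246
"""

for (dev, ino, off), n in sorted(summarize_csum_failures(sample).items()):
    print(f"{dev} ino={ino} off={off}: {n} failure(s)")
```

In the log posted here every failure belongs to the same inode (3048916) at three distinct offsets, which suggests a single damaged file being re-read rather than damage spread across the filesystem. `btrfs inspect-internal inode-resolve 3048916 /` should map that inode back to a path.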


* Re: Btrfs on a failing drive
  2014-11-17 22:55 Btrfs on a failing drive Fennec Fox
  2014-11-18  1:17 ` Phillip Susi
@ 2014-11-18  6:10 ` Chris Murphy
  1 sibling, 0 replies; 6+ messages in thread
From: Chris Murphy @ 2014-11-18  6:10 UTC (permalink / raw)
  To: Fennec Fox; +Cc: Btrfs BTRFS

On Nov 17, 2014, at 3:55 PM, Fennec Fox <fennectech@gmail.com> wrote:

> Well, I am an Arch Linux user and machine owner using a failing drive.
> It's still reliable enough for me, but btrfs seems not to mark bad
> blocks as unusable and continues to try to write to them.

It’s supposed to try to write to them. If there is actual persistent write failure, it’s the job of the firmware to reassign the affected LBA to a reserve physical sector. If it can’t do this, the drive is no longer operating normally; it should return a write error, and ideally Btrfs would refuse to use the drive at all. I don’t know if that device rejection code exists yet. It hasn’t been the job of the filesystem to keep track of bad physical sectors since ancient times.

Chris Murphy

* Re: Btrfs on a failing drive
       [not found]   ` <CAD1x5BDJhZ6a=91G8+UzLTY+Oik7MVpr-XGKOQrOnXpkRLjwug@mail.gmail.com>
@ 2014-11-18 15:36     ` Phillip Susi
  2014-11-19  3:06       ` Duncan
       [not found]       ` <CAD1x5BDDrKoJ2Zf6Tf5MK4VBc3Q57jPaF43KOdhgcmw7uCK=Zg@mail.gmail.com>
  0 siblings, 2 replies; 6+ messages in thread
From: Phillip Susi @ 2014-11-18 15:36 UTC (permalink / raw)
  To: Fennec Fox; +Cc: linux-btrfs

Please get in the habit of using your mail client's reply-to-all
button instead of reply; there is no need for us to take this
conversation private.

On 11/17/2014 10:15 PM, Fennec Fox wrote:
<snip big smartctl output>
> I know the drive is dying and needs replacing, but I need to keep
> this drive around for some time longer, as I can't run from a 32
> GB USB; far too slow.

If it were just a few bad sectors, then you could deal with that by
writing to them, which would force the drive to reallocate them from
the spare pool.  I'd suggest you dd /dev/zero all over the drive so
everything is written to, then check the smart stats again.  If there
were no write errors, and the smart stats show zero pending sectors,
then everything has been reallocated and you should be ok to reformat
the drive and use it.
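
Checking the SMART stats after the zeroing pass can be scripted. The sketch below is an illustration only: it assumes the text was saved from `smartctl -A /dev/sda`, that the drive reports the attribute names smartmontools commonly shows for ATA disks (vendors vary), and the two sample rows are made up for the example rather than taken from the poster's drive. The helper names are hypothetical.

```python
def parse_smart_attributes(smartctl_output):
    """Return {attribute_name: raw_value} from `smartctl -A` style text."""
    attrs = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the 10th column is RAW_VALUE.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[fields[1]] = int(fields[9])
    return attrs

def pending_sectors(attrs):
    """Raw Current_Pending_Sector count, or None if the drive does not report it."""
    return attrs.get("Current_Pending_Sector")

# Made-up sample rows in the usual smartctl column layout (illustration only).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       12
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""

attrs = parse_smart_attributes(SAMPLE)
print("reallocated:", attrs.get("Reallocated_Sector_Ct"))
print("pending:", pending_sectors(attrs))
```

A pending count of zero after a full write pass with no write errors is the condition described above; a missing attribute is returned as `None` so it is not mistaken for zero.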

As I said before though, the errors you posted from dmesg don't
indicate that the drive failed to read sectors, but rather that it
returned incorrect data, and this is *NEVER* supposed to happen.

I'd suggest running a few passes of badblocks over the drive, testing
writing different patterns and verifying that they read back
correctly.  If it can't do that, then there's nothing for it but to
junk the drive.

badblocks -b 4096 -c 256 -s -t 00 /dev/sda

That will read the drive and verify that it is full of zeros.  If that
passes, write a different pattern to the disk and verify that reads
back correctly:

badblocks -b 4096 -c 256 -s -w /dev/sda
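
What the write test does can be shown on a small scale. This is a toy Python sketch of the write-then-verify idea, run against a scratch file rather than a disk; `write_and_verify` is a hypothetical helper, not how badblocks is implemented. The four patterns are the ones the badblocks manual lists for `-w` (0xaa, 0x55, 0xff, 0x00). Reads here may be served from the page cache, so it illustrates the logic only and is no substitute for running badblocks on the device.

```python
import os
import tempfile

PATTERNS = (0xAA, 0x55, 0xFF, 0x00)  # patterns documented for badblocks -w

def write_and_verify(path, size, block_size=4096, patterns=PATTERNS):
    """Fill `path` with each pattern, read it back, return mismatching block numbers."""
    bad = set()
    blocks = size // block_size
    for pattern in patterns:
        block = bytes([pattern]) * block_size
        # Write pass: cover the whole file with the current pattern.
        with open(path, "r+b") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())
        # Verify pass: every block must read back exactly as written.
        with open(path, "rb") as f:
            for blockno in range(blocks):
                if f.read(block_size) != block:
                    bad.add(blockno)
    return sorted(bad)

# Demonstration on a scratch file only; this overwrites whatever `path` holds.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.truncate(16 * 4096)
    scratch = tmp.name
print(write_and_verify(scratch, 16 * 4096))
os.remove(scratch)
```

Like `badblocks -w`, this destroys the existing contents of its target, so it belongs only on a drive (or file) whose data is already backed up or written off.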


* Re: Btrfs on a failing drive
  2014-11-18 15:36     ` Phillip Susi
@ 2014-11-19  3:06       ` Duncan
       [not found]       ` <CAD1x5BDDrKoJ2Zf6Tf5MK4VBc3Q57jPaF43KOdhgcmw7uCK=Zg@mail.gmail.com>
  1 sibling, 0 replies; 6+ messages in thread
From: Duncan @ 2014-11-19  3:06 UTC (permalink / raw)
  To: linux-btrfs

Phillip Susi posted on Tue, 18 Nov 2014 10:36:14 -0500 as excerpted:

> As I said before though, the errors you posted from dmesg don't indicate
> that the drive failed to read sectors, but rather that it returned
> incorrect data, and this is *NEVER* supposed to happen.
> 
> I'd suggest running a few passes of badblocks over the drive, testing
> writing different patterns and verifying that they read back correctly.

+1 for badblocks! =:^)

Tho a hint if you decide to test multiple drives as I did some years
ago: doing multiple passes (I'd suggest at least two) on a full drive
can take QUITE some time (days), due to the sheer volume of data to be
written to the drive, then read back to verify, then written as a new
pattern and read back again.

But unlike IDE, the bottleneck on at least spinning rust SATA (well,
unless you go heavy port-multiplier) tends to be the platters themselves,
not the buses, as they're point-to-point nowadays, or the controllers.
Generally you can process four or more drives in parallel without slowing
down the individual results significantly at all.  Thus, while it takes
days to test a single drive, it normally takes the same time to process
four drives in parallel!  So if you have 4+ devices to badblocks-test,
definitely set up four (perhaps more, depending on hardware layout)
instances of badblocks running at once, one to each of the devices.  Cut
your time from, say, 8 days for all four done serially to only two days
when done in parallel! =:^)
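
Running one badblocks instance per device at the same time could be scripted roughly as follows. This is a sketch under assumptions: `badblocks_command` and `run_in_parallel` are hypothetical helper names, the badblocks options are the ones posted earlier in this thread, and the demonstration launches harmless stand-in commands instead of touching any disk.

```python
import subprocess
import sys

def badblocks_command(device):
    """Argument list for the destructive write test posted earlier in the thread."""
    # -w overwrites every block: only for drives whose contents are expendable.
    return ["badblocks", "-b", "4096", "-c", "256", "-s", "-w", device]

def run_in_parallel(commands):
    """Start every command at once, then wait for all; exit codes in input order."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]

# Stand-in commands so the example can run anywhere without root or spare disks.
demo = [[sys.executable, "-c", "pass"] for _ in range(4)]
print(run_in_parallel(demo))
```

For a real run the list would be built from the devices under test, for example `[badblocks_command(d) for d in devices]` with `devices` being whatever spare drives are attached. With `-s`, the progress output of the instances will interleave on one terminal, so separate terminals or dropping `-s` may be more readable.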

Of course good SSDs tend to be both many times faster and several times
smaller in capacity, so a badblocks run on them should be MUCH faster,
perhaps a couple hours vs a couple days, and much less parallelizable
without slowing all of them down, since they tend to saturate the bus or
come close to it.  (That's the reason fast SATA-based SSDs all tend to
rate similarly speed-wise: the SATA bus is the bottleneck, and the PCI-E
bus isn't /that/ far behind, tho the PCIE bus can be /enough/ faster to
give direct PCIE-interface SSDs a definite speed boost over
top-of-the-line SATA interface devices... for those that can afford their
accordingly higher prices.)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

* Re: Btrfs on a failing drive
       [not found]       ` <CAD1x5BDDrKoJ2Zf6Tf5MK4VBc3Q57jPaF43KOdhgcmw7uCK=Zg@mail.gmail.com>
@ 2014-11-19 18:19         ` Phillip Susi
  0 siblings, 0 replies; 6+ messages in thread
From: Phillip Susi @ 2014-11-19 18:19 UTC (permalink / raw)
  To: Fennec Fox; +Cc: linux-btrfs

Again, please stop taking this conversation private; keep the mailing
list on the Cc.

On 11/19/2014 11:37 AM, Fennec Fox wrote:
> Well, I've used SpinRite and it's found a few sectors, and they
> never move, so obviously the drive's firmware isn't dealing with bad
> blocks on the drive. Anyway, I've got a new drive on order, but
> what can I do to prevent the drive from killing any more data?

The drive will only remap bad blocks when you try to write to them, so
if you haven't written to them then it is no surprise that they aren't
going anywhere.

If the drive is actually returning bad data rather than failing the
read outright, then the only thing you can do is to have btrfs
duplicate all data so if the checksum on one copy is bad it can try
the other.
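
One way to get a second copy of the data is to convert the data profile to DUP with a balance. The sketch below only builds the command and does a rough space check; it does not run anything. It rests on assumptions: `-dconvert=dup` is a standard balance filter, but whether DUP data is accepted on a single device depends on the kernel and btrfs-progs version, and the 3.17 releases in this thread may well be too old. The helper names are made up, and the sizes are the ones from the `btrfs fi df` output at the top of the thread.

```python
def dup_convert_command(mountpoint, include_metadata=False):
    """Argument list for converting data (and optionally metadata) to DUP."""
    cmd = ["btrfs", "balance", "start", "-dconvert=dup"]
    if include_metadata:
        # Not needed here: the fi df output already shows Metadata as DUP.
        cmd.append("-mconvert=dup")
    cmd.append(mountpoint)
    return cmd

def fits_as_dup(used_gib, device_gib):
    """Rough check: two copies of the data must fit on the device."""
    return 2 * used_gib <= device_gib

# Figures from the thread: 75.02 GiB of data on a 288.09 GiB device.
print(" ".join(dup_convert_command("/")))
print(fits_as_dup(75.02, 288.09))
```

The space check is deliberately crude. The filesystem summary in the first message shows the device as fully allocated (288.09GiB used of 288.09GiB), so a balance may first have to reclaim empty chunks before it can write the duplicate copies. Once data is DUP, `btrfs scrub start /` can rewrite a block whose checksum fails from the good copy.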
