From: Corey Coughlin <corey.coughlin.cc3@gmail.com>
To: Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: raid1 has failing disks, but smart is clear
Date: Wed, 6 Jul 2016 15:14:06 -0700
Message-ID: <577D82AE.3040005@gmail.com>
Hi all,
I'm hoping you can help with a strange problem. I think I know
what's going on, but could use some verification. I set up a raid1
btrfs filesystem on an Ubuntu 16.04 system. Here's what it looks like:
btrfs fi show
Label: none  uuid: 597ee185-36ac-4b68-8961-d4adc13f95d4
    Total devices 10 FS bytes used 3.42TiB
    devid    1 size 1.82TiB used 1.18TiB path /dev/sdd
    devid    2 size 698.64GiB used 47.00GiB path /dev/sdk
    devid    3 size 931.51GiB used 280.03GiB path /dev/sdm
    devid    4 size 931.51GiB used 280.00GiB path /dev/sdl
    devid    5 size 1.82TiB used 1.17TiB path /dev/sdi
    devid    6 size 1.82TiB used 823.03GiB path /dev/sdj
    devid    7 size 698.64GiB used 47.00GiB path /dev/sdg
    devid    8 size 1.82TiB used 1.18TiB path /dev/sda
    devid    9 size 1.82TiB used 1.18TiB path /dev/sdb
    devid   10 size 1.36TiB used 745.03GiB path /dev/sdh
I added a couple of disks and then ran a balance operation, which took
about 3 days to finish. When it did finish, I tried a scrub and got this
message:
scrub status for 597ee185-36ac-4b68-8961-d4adc13f95d4
    scrub started at Sun Jun 26 18:19:28 2016 and was aborted after 01:16:35
    total bytes scrubbed: 926.45GiB with 18849935 errors
    error details: read=18849935
    corrected errors: 5860, uncorrectable errors: 18844075, unverified errors: 0
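As a quick sanity check on the scrub numbers above, here is a minimal Python sketch that pulls the counters out of the status text and confirms they add up. The function name scrub_counts and the STATUS string are made up for illustration; the figures are copied from the scrub output in this message.

```python
import re

# Counter lines copied from the scrub status above, with wrapped lines rejoined.
STATUS = (
    "total bytes scrubbed: 926.45GiB with 18849935 errors\n"
    "error details: read=18849935\n"
    "corrected errors: 5860, uncorrectable errors: 18844075, "
    "unverified errors: 0\n"
)


def scrub_counts(text):
    """Extract the error counters from scrub status text as integers."""
    patterns = {
        "total": r"with (\d+) errors",
        "corrected": r"corrected errors: (\d+)",
        "uncorrectable": r"uncorrectable errors: (\d+)",
        "unverified": r"unverified errors: (\d+)",
    }
    return {key: int(re.search(pat, text).group(1))
            for key, pat in patterns.items()}


counts = scrub_counts(STATUS)
# The three categories should account for every reported error.
accounted = (counts["corrected"] + counts["uncorrectable"]
             + counts["unverified"])
print("all errors accounted for:", accounted == counts["total"])
print("uncorrectable share: {:.4%}".format(
    counts["uncorrectable"] / counts["total"]))
```

Nearly every read error in this run was uncorrectable, which is why the per-device counters are the next thing to look at.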
So that seems bad. I took a look at the devices, and a few of them have
errors:
...
[/dev/sdi].generation_errs 0
[/dev/sdj].write_io_errs 289436740
[/dev/sdj].read_io_errs 289492820
[/dev/sdj].flush_io_errs 12411
[/dev/sdj].corruption_errs 0
[/dev/sdj].generation_errs 0
[/dev/sdg].write_io_errs 0
...
[/dev/sda].generation_errs 0
[/dev/sdb].write_io_errs 3490143
[/dev/sdb].read_io_errs 111
[/dev/sdb].flush_io_errs 268
[/dev/sdb].corruption_errs 0
[/dev/sdb].generation_errs 0
[/dev/sdh].write_io_errs 5839
[/dev/sdh].read_io_errs 2188
[/dev/sdh].flush_io_errs 11
[/dev/sdh].corruption_errs 1
[/dev/sdh].generation_errs 16373
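To pick the affected devices out of a long listing like this one, a small Python sketch along the following lines might help. It assumes the "[device].counter value" line format shown above; the function name failing_devices and the SAMPLE string are hypothetical, and the sample holds only the lines quoted in this message (elisions included).

```python
import re
from collections import defaultdict

# Lines copied from the listing above; "..." marks the parts left out there.
SAMPLE = """\
...
[/dev/sdi].generation_errs 0
[/dev/sdj].write_io_errs 289436740
[/dev/sdj].read_io_errs 289492820
[/dev/sdj].flush_io_errs 12411
[/dev/sdj].corruption_errs 0
[/dev/sdj].generation_errs 0
[/dev/sdg].write_io_errs 0
...
[/dev/sda].generation_errs 0
[/dev/sdb].write_io_errs 3490143
[/dev/sdb].read_io_errs 111
[/dev/sdb].flush_io_errs 268
[/dev/sdb].corruption_errs 0
[/dev/sdb].generation_errs 0
[/dev/sdh].write_io_errs 5839
[/dev/sdh].read_io_errs 2188
[/dev/sdh].flush_io_errs 11
[/dev/sdh].corruption_errs 1
[/dev/sdh].generation_errs 16373
"""

# One counter per line: "[<device>].<counter> <value>"
LINE_RE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")


def failing_devices(text):
    """Return {device: {counter: value}} for every nonzero counter."""
    result = defaultdict(dict)
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip blank lines and elisions such as "..."
        value = int(match.group("value"))
        if value:
            result[match.group("dev")][match.group("counter")] = value
    return dict(result)


for dev, counters in sorted(failing_devices(SAMPLE).items()):
    print(dev, counters)
```

Run against the lines quoted here, it reports only /dev/sdb, /dev/sdh and /dev/sdj; devices whose counters are all zero are dropped.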
So I checked the SMART data for those disks, and they seem perfect: no
reallocated sectors, no problems. But one thing I did notice is that
they are all WD Green drives. So I'm guessing that if they power down
and get reassigned to a new /dev/sd* letter, that could lead to data
corruption. I used idle3ctl to turn off the power-down mode on all the
Green drives in the system, but I'm having trouble getting the
filesystem working without the errors. I tried 'btrfs check --repair'
on it, and it seems to find a lot of verification errors, but it
doesn't look like anything is getting fixed. I have all the data on
it backed up on another system, so I can recreate it if I need to.
Here's what I want to know:
1. Am I correct about the issue with the WD Green drives? If they
change device names during disk operations, will that corrupt data?
2. If that is the case:
a.) Is there any way I can stop the /dev/sd* device names from
changing? Or can I set up the filesystem using UUIDs or something more
solid? I googled it, but found conflicting info.
b.) Or, is there something else changing my drive devices? I have
most of the drives on an LSI SAS 9201-16i card; is there something I
need to do to make them fixed?
c.) Or, is there a script or something I can use to figure out if
the disks will change device names?
d.) Or, if I wipe everything and rebuild, will the disks with the
idle3ctl fix work now?
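On the question of spotting /dev/sd* letter changes, one possible approach is to record which kernel device each persistent udev alias points to, then compare two recordings taken at different times. This is only a sketch: it assumes the system populates /dev/disk/by-id (as udev normally does), and the helper names snapshot and renamed, as well as the alias in the usage note, are invented for illustration.

```python
import os


def snapshot(by_id_dir="/dev/disk/by-id"):
    """Map each persistent alias name to the kernel device it resolves to."""
    return {
        name: os.path.realpath(os.path.join(by_id_dir, name))
        for name in sorted(os.listdir(by_id_dir))
    }


def renamed(before, after):
    """Return {alias: (old_device, new_device)} for aliases that moved."""
    return {
        alias: (before[alias], after[alias])
        for alias in before
        if alias in after and before[alias] != after[alias]
    }


if __name__ == "__main__" and os.path.isdir("/dev/disk/by-id"):
    # Print the current alias -> device table; save it and diff it later.
    for alias, device in snapshot().items():
        print(alias, "->", device)
```

Taking one snapshot before a long operation and another afterwards, then calling renamed(before, after), would list any drive that came back under a different letter. For example, renamed({"ata-EXAMPLE": "/dev/sdj"}, {"ata-EXAMPLE": "/dev/sdn"}) returns {"ata-EXAMPLE": ("/dev/sdj", "/dev/sdn")}.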
Regardless of whether or not it's a WD Green drive issue, should I just
wipefs all the disks and rebuild it? Is there any way to recover this?
Thanks for any help!
------- Corey