Linux RAID subsystem development
From: "Vanhorn, Mike" <michael.vanhorn@wright.edu>
To: Phil Turmel <philip@turmel.org>, linux-raid <linux-raid@vger.kernel.org>
Subject: Re: Odd --examine output
Date: Fri, 12 Apr 2013 16:47:53 +0000
Message-ID: <CD8DAF4D.412EF%michael.vanhorn@wright.edu>
In-Reply-To: <51681FB2.8060803@turmel.org>

On 4/12/13 10:52 AM, "Phil Turmel" <philip@turmel.org> wrote:

[snip]

>As noted above, the partition tables aren't wiped.  Just the device
>nodes are missing.  You could try a "blockdev --rereadpt /dev/sdX" on
>affected drives to see if it is a transient issue.

That did it! I ran blockdev on every drive whose partition device nodes
were missing, and then was able to

 mdadm --assemble --force /dev/md0 /dev/sd[cdefghi]1

and it assembled using all of the disks except, for some reason, sde1 and
sdf1. I think sde1 got left out because it had been dropped before the
raid actually stopped, and I think I could have added it back in with

 mdadm /dev/md0 --re-add /dev/sde1

(since /dev/sde actually seems to be fine). However, once I got the
filesystem mounted, my first priority was to get the data off, so I didn't
try to re-add that disk.

I don't know why sdf1 got left out.
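
For anyone following along, the re-read step could be looped over the
affected drives roughly like this. It is only a sketch: the function name
is made up for illustration, and the drive list in the example comment is
assumed from the assemble command above.

```shell
# Sketch: ask the kernel to re-read the partition table on each drive given.
# reread_partitions is a hypothetical helper name, not part of any tool.
reread_partitions() {
    for dev in "$@"; do
        blockdev --rereadpt "$dev"
    done
}

# Example (as root): reread_partitions /dev/sd[cdefghi]
```

Once the partition device nodes are back, the forced assemble can be tried.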
 
[snip]

>If the partition is *not* aligned, each large chunk written will have at
>least two R-M-W cycles.

I snipped most of that explanation, but thank you for it; it really helps
me understand what was going on with my partitions.
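
A quick way to test a partition against the "multiples of eight sectors"
rule might look like the sketch below. It assumes the start sector is
counted in 512-byte units, so eight of them span 4096 bytes; the function
name and the sdc1 example are illustrative only.

```shell
# Sketch: succeed when a start sector (512-byte units) is a multiple of 8.
# is_aligned is a hypothetical helper name.
is_aligned() {
    [ $(( $1 % 8 )) -eq 0 ]
}

# Example, reading the start sector the kernel reports for sdc1:
#   is_aligned "$(cat /sys/block/sdc/sdc1/start)" && echo aligned
```

A start sector of 2048 passes this test; a start sector of 63 does not.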

>I guess "lsdrv" didn't work for you.  I'm naturally curious how it
>failed....

I don't have an lsdrv command, so I did the 'ls -l' that you suggested.

>Anyways, your detailed smartctl reports show big problems:
>
>1)  You have multiple drives with many dozens of pending relocations.
>This suggests that your regular scrubs are not happening on schedule.  A
>"check" scrub turns pending relocations into either real relocations, or
>no error at all (successful rewrite).  Typically the latter.

I've got a raid-check script that runs from cron.weekly. I really did
think it was working, because every week I would check and the array was
re-building.
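
To rule the cron script out, a "check" scrub can also be started by hand
through the md sysfs interface. The sketch below assumes the array is md0;
the function name and the optional second argument (an alternate sysfs
root, handy for trying it out safely) are made up for illustration.

```shell
# Sketch: start a "check" scrub by writing to the array's sync_action file.
# $1 = array name (e.g. md0); $2 = sysfs root, /sys/block unless overridden.
start_check() {
    action_file="${2:-/sys/block}/$1/md/sync_action"
    if [ ! -w "$action_file" ]; then
        echo "cannot write $action_file" >&2
        return 1
    fi
    echo check > "$action_file"
}

# Example (as root): start_check md0
# Progress shows up in /proc/mdstat; mismatch_cnt sits next to sync_action.
```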

>2) All of your self-test log entries show "short offline".  That isn't
>rigorous enough.  You need "long offline" self-tests occasionally, too.
> Or just use the long self-test every time.

I will take this into account and begin using the long test.
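
Starting the long test on several drives could be wrapped up roughly as
follows. This is a sketch with a made-up function name; the drive names in
the example comment are placeholders.

```shell
# Sketch: start a long (extended) SMART self-test on each drive given.
# long_selftest is a hypothetical helper name.
long_selftest() {
    for dev in "$@"; do
        smartctl -t long "$dev"
    done
}

# Example (as root): long_selftest /dev/sdc /dev/sdd
# Results appear later in the self-test log: smartctl -l selftest /dev/sdc
```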

>3) You have a drive that entirely failed its SMART assessment
>{WD-WMAUR0381532 ==> /dev/sdj} due to excessive actual relocations.
>Replace this drive immediately.

I will. I have a spare disk on the shelf ready to go, once I feel safe
that the data is copied.

[snip]

>NOT a guess.  Back up what you can, while you can, and start over.  Use
>"fdisk -u" so you can ensure partitions start on multiples of eight (8)
>sectors.  (Modern fdisk uses 1MB alignment by default.  Highly
>recommended.)

That is exactly what I'm going to do. I feel like an idiot that so many
things seem to have been wrong without my realizing it. Now, thanks to
your help, I am much more enlightened.

Thanks!

---
Mike VanHorn
Senior Computer Systems Administrator
College of Engineering and Computer Science
Wright State University
265 Russ Engineering Center
937-775-5157
michael.vanhorn@wright.edu
http://www.cecs.wright.edu/~mvanhorn/




