Re: Stacked array data recovery

From: Ramon Hofer <ramonhofer@bluewin.ch>
To: stan@hardwarefreak.com
Cc: linux-raid@vger.kernel.org
Subject: Re: Stacked array data recovery
Date: Tue, 26 Jun 2012 10:37:19 +0200
Message-ID: <1340699839.3241.29.camel@hoferr-desktop.hofer.rummelring>
In-Reply-To: <4FE91619.4020709@hardwarefreak.com>

On Mon, 2012-06-25 at 20:53 -0500, Stan Hoeppner wrote:
> On 6/25/2012 5:31 AM, Ramon Hofer wrote:
> > On Sun, 24 Jun 2012 22:51:32 -0500, Stan Hoeppner wrote:
> > 
> >> On 6/24/2012 9:12 AM, Stan Hoeppner wrote:
> >>
> >>> That's premature.  If you don't have any irreplaceable data on md9 yet,
> >>> I'd recommend erasing all 4 EARS drives with the dd command so you have
> >>> a "fresh start".
> >>
> >> Sorry Ramon, I meant the Samsungs here, not EARS.  You probably
> >> understood.
> > 
> > No, sorry I'm a bit confused.
> 
> I'm confused as well.  The error you pasted was on md9, which I thought
> was the old Samsung array.

Sorry, I should have been more precise.

After I was able to recover md1 (WD Blacks), I created md2 with the
Samsungs.

Then I wanted to test the WD Greens by creating md9 and copying the
mythtv recordings onto it. (I did that because I also wanted to switch
the recordings drive to xfs.)

> [61142.466334] md/raid:md9: read error not correctable (sector 3758190680
> on sdk).
> [61142.466338] md/raid:md9: Disk failure on sdk, disabling device.
> 
> Which disk is /dev/sdk?  WD20EARS or Samsung?

All the disks in md9 are WD20EARS.

Sorry again for the confusion!

> > The Samsung drives worked fine so far. I already have used the linear 
> > array and don't know what is written to md2 through md0.
> > But I could remove one Samsung disk from md2, dd it, re add it and do 
> > this procedure for the other three Samsungs.
> 
> Ok, so md1 are the Blacks, md2 are the Samsungs.  You tried to create
> another array, md9, using the WD20EARS, and one, /dev/sdk, generated the
> error above.  Is this correct?

Exactly.

> > What about the WD green?
> 
> Ok, so currently the WD20EARS drives are not part of an array, correct?
>  And you're following the procedure I posted to dd the four drives, correct?

Correct, they're currently not part of any array.
And yes, I followed your procedure. But the server behaved very
strangely. Sometimes I couldn't ssh into it anymore. Sometimes I could,
and then the connection froze.

> > I tried to dd them yesterday 
> 
> There is no "try" here.  Once you start the dd commands they run until
> complete.  You didn't kill the processes did you?

I wanted to watch a movie that evening. It streamed fine until about 15
minutes before the end, but I really had to see the end before going to
bed.

> > but when I wanted to stream a movie from the 
> > server it stopped. 
> 
> What do you mean "it stopped"?  What stopped?  The playback in the
> client app?

Yes.
At first I thought the client app was to blame. But after I couldn't
ssh into the server, or the ssh connection froze when I could, I decided
to reboot it.

I thought it couldn't be very hard to write a lot of zeros...

> > Sometimes I couldn't even ssh into the server and when 
> > I could the remote shell froze after a very short time.
> 
> You had 4 dd processes writing zeros to 4 drives at full bandwidth,
> consuming something like 480MB/s at the beginning and around 200MB/s at
> the end as the platter diameter gets smaller.  The controller chip on
> the LSI HBA is seeing tens of thousands of write IOPS.  Not to mention
> the four dd processes are generating a good deal of CPU load.  And if
> you're not running irqbalance, which you're surely not, interrupts from
> the controller are only going to 1 CPU core.
> 
> My point is, running these 4 dd's in parallel is going to be very taxing
> on your system.  I guess I should have added a caveat/warning in my 'dd'
> email that you should not do any other work on the system while it's
> dd'ing the 4 drives.  Sorry for failing to mention this.

I ran top to see whether the system was busy, and the CPU wasn't.
But the system load was higher than I had ever seen it (around 10).
Now I see that the movie couldn't be streamed because the LSI controller
didn't have any bandwidth left for it.
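
That pattern (idle CPU, high load) fits how Linux computes the load
average: it counts tasks in uninterruptible sleep (state D, usually
waiting on disk I/O) as well as runnable ones, so a few dd writers plus
blocked readers can push it up without using much CPU. A minimal sketch
for counting such tasks, assuming the usual Linux /proc/<pid>/stat
layout; the helper names are made up for illustration:

```python
from collections import Counter
from pathlib import Path


def task_state(stat_line: str) -> str:
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split after the LAST closing parenthesis.
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]


def count_states(stat_lines):
    """Count how many tasks are in each state (R, S, D, ...)."""
    return Counter(task_state(line) for line in stat_lines)


def read_proc_stats(proc=Path("/proc")):
    """Collect the stat line of every process currently listed in /proc."""
    lines = []
    for entry in proc.iterdir():
        if entry.name.isdigit():
            try:
                lines.append((entry / "stat").read_text())
            except OSError:
                pass  # the process exited while we were scanning
    return lines


if __name__ == "__main__":
    states = count_states(read_proc_stats())
    print("runnable (R):", states["R"], " blocked on I/O (D):", states["D"])
```

If most of the load comes from D-state tasks while top shows the CPU
idle, the bottleneck is likely I/O rather than computation.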

So maybe I can just rerun the four dd commands when the server isn't
busy? Or even take out the drives and run the command on another
machine?
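
Running the drives one after the other could be sketched roughly like
this. It is only an illustration under assumptions: /dev/sdX and
/dev/sdY are placeholders rather than the real device names, the dd
options are the plain zero-fill form (not necessarily the exact command
from the earlier mail), and the script defaults to a dry run so that
nothing is overwritten by accident.

```python
import shlex
import subprocess


def zero_fill_command(device, block_size="1M"):
    """Build the dd argument list that overwrites `device` with zeros."""
    return ["dd", "if=/dev/zero", f"of={device}", f"bs={block_size}"]


def run_sequentially(devices, dry_run=True):
    """Zero one device at a time; return the commands in the order used."""
    planned = []
    for dev in devices:
        cmd = zero_fill_command(dev)
        planned.append(cmd)
        print("next:", shlex.join(cmd))
        if dry_run:
            continue  # only show what would be run
        # dd typically exits non-zero when it reaches the end of the
        # device ("No space left on device"), so the exit status alone
        # does not identify a bad drive; check the kernel log as well.
        result = subprocess.run(cmd)
        print(dev, "dd exit status:", result.returncode)
    return planned


if __name__ == "__main__":
    # Hypothetical device names; replace them only after double-checking.
    run_sequentially(["/dev/sdX", "/dev/sdY"], dry_run=True)
```

With only one dd active at a time, any read or write errors that show
up in the kernel log during a run can be attributed to the drive being
written at that moment.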

> > Should I try to dd them again but one after the other so that I know 
> > which one makes problems?
> 
> You first need to explain what you mean by "try again".  Unless you
> killed the processes, or rebooted or power cycled the machine, the dd
> processes would have run to completion.  I get the feeling you've
> omitted some important details.

Sorry, I didn't explain properly what I did.

After the dd commands had been running for some time, I wanted to watch
that movie in the evening. Unfortunately it stopped about 15 minutes
before the end, and it was very thrilling ;-)

So I rebooted the frontend machine, because I thought the cause was the
xbmc version I use, which has mythtv pvr support and is alpha or beta.

But the movie stopped again after a few seconds. It's really strange,
because it ran fine for about 1 hour 50 minutes. Only the last 15 or 20
minutes caused problems.

When I first ssh-ed into the server, the connection froze as if the
network connection had dropped. But I could still ping it. I tried
several times. Sometimes I couldn't log in, sometimes I could.

Btw, I ran the four dd commands within a screen session, in case that
matters.

> Oh, please reply-to-all Ramon so these hit my inbox.  List mail goes to
> separate folders, and I don't check them in a timely manner.

Sorry, last time I used pan to reply. It can't reply to the list and to
you at the same time.
But evolution can :-)

Best regards
Ramon
