From: Ramon Hofer <ramonhofer@bluewin.ch>
To: stan@hardwarefreak.com
Cc: linux-raid@vger.kernel.org
Subject: Re: Stacked array data recovery
Date: Tue, 26 Jun 2012 10:37:19 +0200
Message-ID: <1340699839.3241.29.camel@hoferr-desktop.hofer.rummelring>
In-Reply-To: <4FE91619.4020709@hardwarefreak.com>
On Mon, 2012-06-25 at 20:53 -0500, Stan Hoeppner wrote:
> On 6/25/2012 5:31 AM, Ramon Hofer wrote:
> > On Sun, 24 Jun 2012 22:51:32 -0500, Stan Hoeppner wrote:
> >
> >> On 6/24/2012 9:12 AM, Stan Hoeppner wrote:
> >>
> >>> That's premature. If you don't have any irreplaceable data on md9 yet,
> >>> I'd recommend erasing all 4 EARS drives with the dd command so you have
> >>> a "fresh start".
> >>
> >> Sorry Ramon, I meant the Samsungs here, not EARS. You probably
> >> understood.
> >
> > No, sorry, I'm a bit confused.
>
> I'm confused as well. The error you pasted was on md9, which I thought
> was the old Samsung array.
Sorry, I should have been more precise.
After I was able to recover md1 (the WD Blacks), I created md2 with the
Samsungs.
Then I wanted to test the WD Greens by creating md9 and copying the
mythtv recordings onto it. (I also wanted to switch the recordings
drive to xfs.)
> [61142.466334] md/raid:md9: read error not correctable (sector 3758190680
> on sdk).
> [61142.466338] md/raid:md9: Disk failure on sdk, disabling device.
>
> Which disk is /dev/sdk? WD20EARS or Samsung?
All the disks in md9 are WD20EARS.
Sorry again for the confusion!
> > The Samsung drives have worked fine so far. I have already used the linear
> > array and don't know what has been written to md2 through md0.
> > But I could remove one Samsung disk from md2, dd it, re-add it and repeat
> > this procedure for the other three Samsungs.
>
> Ok, so md1 are the Blacks, md2 are the Samsungs. You tried to create
> another array, md9, using the WD20EARS, and one, /dev/sdk, generated the
> error above. Is this correct?
Exactly.
> > What about the WD green?
>
> Ok, so currently the WD20EARS drives are not part of an array, correct?
> And you're following the procedure I posted to dd the four drives, correct?
No, they're not.
And yes, I did. But the server behaved very strangely. Sometimes I
couldn't ssh into it at all; sometimes I could, but then the connection
froze.
> > I tried to dd them yesterday
>
> There is no "try" here. Once you start the dd commands they run until
> complete. You didn't kill the processes did you?
I wanted to watch a movie that evening. It streamed fine until about 15
minutes before the end, but I really had to see the end before going to bed.
> > but when I wanted to stream a movie from the
> > server it stopped.
>
> What do you mean "it stopped"? What stopped? The playback in the
> client app?
Yes.
I first thought it was the client app. But when I couldn't ssh into the
server, and the ssh connection kept freezing when I could, I decided to
reboot it.
I didn't think writing a lot of zeros could be that hard on the system...
> > Sometimes I couldn't even ssh into the server and when
> > I could the remote shell froze after a very short time.
>
> You had 4 dd processes writing zeros to 4 drives at full bandwidth,
> consuming something like 480MB/s at the beginning and around 200MB/s at
> the end as the platter diameter gets smaller. The controller chip on
> the LSI HBA is seeing tens of thousands of write IOPS. Not to mention
> the four dd processes are generating a good deal of CPU load. And if
> you're not running irqbalance, which you're surely not, interrupts from
> the controller are only going to 1 CPU core.
>
> My point is, running these 4 dd's in parallel is going to be very taxing
> on your system. I guess I should have added a caveat/warning in my 'dd'
> email that you should not do any other work on the system while it's
> dd'ing the 4 drives. Sorry for failing to mention this.
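To put those throughput figures in perspective, here is a rough back-of-the-envelope estimate of how long zeroing takes. It is a sketch under stated assumptions, not a measurement: it assumes nominal 2 TB drives, splits the quoted aggregate 480 MB/s and 200 MB/s evenly across the four drives, assumes each drive's write rate falls linearly from its first byte to its last, and assumes a single drive zeroed on its own runs no faster than in the parallel case (i.e. the drives, not the HBA, are the bottleneck). The constants and `zero_time_seconds` are illustrative names only.

```python
import math

# Hypothetical inputs, taken from the figures quoted above, not measured.
DRIVE_BYTES = 2 * 10**12    # nominal capacity of a 2 TB WD20EARS
DRIVES = 4
AGG_START_MB_S = 480        # aggregate MB/s on the outer tracks
AGG_END_MB_S = 200          # aggregate MB/s on the inner tracks


def zero_time_seconds(capacity, start_rate, end_rate):
    """Seconds to write `capacity` bytes when the write rate (bytes/s)
    falls linearly with position from start_rate to end_rate.

    With r(x) = start - (start - end) * x / capacity, the elapsed time is
    the integral of dx / r(x) from 0 to capacity, which evaluates to
    capacity * ln(start / end) / (start - end).
    """
    if start_rate == end_rate:
        return capacity / start_rate
    return capacity * math.log(start_rate / end_rate) / (start_rate - end_rate)


if __name__ == "__main__":
    per_drive_start = AGG_START_MB_S / DRIVES * 1e6  # bytes/s per drive
    per_drive_end = AGG_END_MB_S / DRIVES * 1e6
    t = zero_time_seconds(DRIVE_BYTES, per_drive_start, per_drive_end)
    # In parallel, all four drives finish at roughly the same time.
    print(f"all four in parallel: about {t / 3600:.1f} h")
    # Sequentially, assuming one drive alone is no faster (drive-bound).
    print(f"one after another:    about {DRIVES * t / 3600:.1f} h")
```

Under these assumptions it prints roughly 7 hours for the parallel run and roughly 28 hours for one drive at a time: zeroing the drives sequentially eases the load on the HBA and the rest of the system, at the cost of a much longer total run.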
I ran top to see whether the system was busy, and the CPU wasn't.
But the system load was higher than I had ever seen it (around 10).
Now I see that the movie couldn't be streamed because the LSI controller
had no bandwidth left for it.
So maybe I can just rerun the four dd commands when the server isn't
busy? Or even take out the drives and run the command on another
machine?
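A high load average with an idle CPU is consistent with heavy I/O: on Linux the load average counts not only runnable tasks but also tasks in uninterruptible sleep (state "D"), which is typically where processes blocked on disk writes end up. The following Python sketch, a hypothetical helper rather than a standard tool, reads /proc/loadavg and lists processes currently in D state by parsing /proc/<pid>/stat. It assumes a Linux /proc filesystem; `parse_stat` and `uninterruptible_tasks` are illustrative names.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat.

    comm is enclosed in parentheses and may itself contain spaces or ')',
    so we split on the *last* ')' to find the single-letter state field.
    """
    lpar = text.index("(")
    rpar = text.rindex(")")
    pid = int(text[:lpar].strip())
    comm = text[lpar + 1:rpar]
    state = text[rpar + 2]  # the state letter follows ") "
    return pid, comm, state


def uninterruptible_tasks(proc="/proc"):
    """List (pid, comm) for processes in state 'D' (uninterruptible sleep)."""
    tasks = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan or stat was unreadable
        if state == "D":
            tasks.append((pid, comm))
    return tasks


if __name__ == "__main__":
    with open("/proc/loadavg") as f:
        print("load average (1/5/15 min):", *f.read().split()[:3])
    for pid, comm in uninterruptible_tasks():
        print(f"D state: {pid} {comm}")
```

Running it a few times while the dd commands are active would show whether the load is made up of tasks waiting on I/O rather than tasks competing for the CPU.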
> > Should I try to dd them again but one after the other so that I know
> > which one makes problems?
>
> You first need to explain what you mean by "try again". Unless you
> killed the processes, or rebooted or power cycled the machine, the dd
> processes would have run to completion. I get the feeling you've
> omitted some important details.
Sorry, I didn't explain properly what I did.
The dd commands had been running for some time when I wanted to watch
that movie in the evening. Unfortunately it stopped about 15 minutes
before the end, and it was very thrilling ;-)
So I rebooted the frontend machine, because I thought the problem was
the xbmc version with mythtv pvr support I use, which is alpha or beta.
But the movie stopped again after a few seconds. It's really strange
because it ran fine for about 1 hour 50 mins. Only the last 15 or 20
minutes caused problems.
When I first ssh-ed into the server, the connection froze as if the
network connection had dropped. But I could still ping it. I tried
several times. Sometimes I couldn't log in, sometimes I could.
Btw, I ran the four dd commands within a screen session, in case that is
of any importance.
> Oh, please reply-to-all Ramon so these hit my inbox. List mail goes to
> separate folders, and I don't check them in a timely manner.
Sorry, last time I used pan to reply, and it can't reply to the list
and to you at the same time.
But evolution can :-)
Best regards
Ramon