From: Ramin <ramin.t@gmail.com>
To: linux-raid@vger.kernel.org
Subject: Re: Cannot sync RAID5 anymore - MCE problem?
Date: Tue, 14 Aug 2007 11:16:52 +0200
Message-ID: <46C17304.5000803@gmail.com>
In-Reply-To: <46C07F41.2080500@gmail.com>

Ramin wrote:
> Hello everybody.
>
> I have a strange and severe problem with my Raid-Array and I have to
> contact "experts" before I can continue with a clear conscience.
> I wanted to replace 2 disks in my Raid5 array with two newer ones. So I
> connected the new ones to my machine, partitioned them with the correct
> layout, added one of them as a spare and marked the partition of the disk
> I wanted to remove first as failed ("--fail"). Rebuilding of the array
> started immediately and finished fine.
> Now I took the two old disks out and put the new ones in. By removing the
> other disk from my array I degraded it. After booting I added the correct
> partition of my new drive to the Raid5 and waited for the syncing to
> finish ...
> but it didn't. It crashed my whole machine with an MCE:
> CPU 0: Machine check exception: 4 bank 4 b20000000000070f0f
> ...
> Kernel Panic - not syncing: machine check
>
> The reason why I am reporting an MCE problem to the Software-Raid list is
> that this problem is very reproducible and always happens when the resync
> of my array has reached 24.9%. I tried it about ten times, so I am really
> sure that there is some connection to resyncing, since this problem does
> not seem to appear under different conditions anymore.
> I tried to do an rsync backup of my raid-array, which led to the same
> crash. After that I observed that this crash occurred when copying a
> not so important backup of something else. I deleted that old backup and
> since then my problem seems to occur ONLY if I try to resync my array.
>
> I am running Gentoo on an AMD64 3200+ and K8N Neo4 Platinum and my problem
> seems to be similar to the problems of these guys:
> http://kerneltrap.org/node/4993
> but somehow related to resyncing. I have reiserfs on my array and
> successfully completed a "reiserfsck --rebuild-tree".
> I think it is not important but it might be good to mention that I use
> LVM, too.
>
> I have also tried to resync the array to my old disk (with the second new
> one removed), but that leads to the same problem.
>
> I have tried several things, like removing one RAM module or using
> different RAM banks. I checked for leaking caps, tried without DMA, tried
> different kernels and played with some kernel options.
>
> Is there a way to figure out which hardware is the problem?
> My hardware worked flawlessly for over 1.5 years. Unless I broke
> something while physically mounting the disks or cleaning dust out of the
> case, it can only be a problem with the first new hard drive (which is
> unfortunately part of my degraded raid-array already). Is it possible that
> a SATA1 cable on a SATA2-capable controller connected to a SATA2-capable
> disk leads to such errors?
>
> Since I was able to copy my data I think it is in perfect condition, but
> there seems to be a problem on the array in the "empty" part. Does anybody
> know a way to overwrite or rewrite the empty blocks of a
> reiserfs partition? Or some tool to find/correct disk problems? (I tried
> reiserfsck but that does not find anything.)
>
> What is the smartest way for me to proceed to get my degraded array
> redundant again?
> I could delete the whole array, try to set it up identically again and recopy
> the data, but if it is really a hardware problem that would be a waste of
> time.
>
> Thanks in advance ...
> Ramin
Figured out my problem myself ... I did a
dd if=/dev/zero of=/home/file
and waited until the disk was full.
/home is the main LVM volume on my raid. After that I deleted the file
again and re-added the new partition to the array.
Now everything worked/synced fine.
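
For the archives, here is a minimal sketch of that zero-fill step as a small
shell function. The function name, the temporary file name and the optional
size cap are my own illustration and not part of what I actually ran; the
cap is only there so the procedure can be tried on a small scale first.

```shell
#!/bin/sh
# Sketch only: overwrite the free space of a mounted filesystem with zeros,
# then release it again. Names and the size cap are illustrative assumptions.
zero_fill_free_space() {
    dir=$1              # mount point to fill, e.g. /home
    limit_mib=${2:-}    # optional cap in MiB; empty means "until the disk is full"
    file="$dir/zero-fill.tmp"

    if [ -n "$limit_mib" ]; then
        # Capped run: write exactly limit_mib blocks of 1 MiB each.
        dd if=/dev/zero of="$file" bs=1048576 count="$limit_mib" 2>/dev/null
    else
        # Uncapped run: dd ends with "No space left on device", which is
        # the expected way for this step to finish, so ignore its status.
        dd if=/dev/zero of="$file" bs=1048576 2>/dev/null || true
    fi

    sync            # flush the zeros to the underlying devices before deleting
    rm -f "$file"   # give the space back to the filesystem
}
```

Calling `zero_fill_free_space /home` corresponds to what I did by hand;
`zero_fill_free_space /home 100` would write only 100 MiB. Keep in mind that
the filesystem is completely full for a moment, so other programs writing to
it may fail during the run. Why this helped is not established; one possible
reading is that rewriting every free block made the drives rewrite sectors
that could not be read during the resync.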
Maybe one should improve the error messages?
It might be philosophical, but I would say it was more of a software than
a hardware problem.
Regards
Ramin