From: Bill Davidsen <davidsen@tmr.com>
To: linux-raid.vger.kernel.org@atu.cjb.net
Cc: NeilBrown <neilb@suse.de>, linux-raid@vger.kernel.org
Subject: Re: RAID-6 mdadm disks out of sync issue (more questions)
Date: Mon, 15 Jun 2009 11:48:33 -0400
Message-ID: <4A366D51.6020003@tmr.com>
In-Reply-To: <200906142101.n5EL1Hpj087478@cjb.net>

linux-raid.vger.kernel.org@atu.cjb.net wrote:
>> This doesn't make a lot of sense.  It should not have been marked
>> as a spare unless someone explicitly tried to "Add" it to the
>> array.
>>
>> However, your description of events suggests that this was
>> automatic, which is strange.
>>     
>
> Yes, it was entirely automatic.  The only commands I had running on the computer when it happened were:
>
> # watch -n 0.1 'uptime; echo; cat /proc/mdstat|grep md13 -A 2; echo; dmesg|tac'
>
> This gave me a nice, simple display of what was going on with the
> rebuild, and a monitor of dmesg in case there were any new kernel
> messages.
>
>   
>> Can I get the complete kernel logs from when the rebuild started
>> to when you finally gave up?  It might help me understand.
>>     
>
> Sure.
>
> Just to confirm, /dev/sd{a,b,c,d,e,f}1 are the partitions which
> contain my up-to-date data.  /dev/sd{i,j}1 contain many days old data.
>
> Here is the entire dmesg output during the rebuild:
>   
> I left it running for about an hour, and none of the disks had any errors.
> I really hope it is not a permanent fault 75% of the way through the disk.
> Though if it was just bad sectors, why would the disk be disconnecting
> from the system?
>
> Thanks again for all your help.
>
>   

I really don't see any indication that this is a kernel issue. My VM
host machine has multiple VMs, including this "desktop" system, and runs
raid5 and raid10, and has had no "ata" messages in 15 days of uptime,
obviously with lots of disk use. My only thought is that you may have
something marginal in your hardware, possibly memory or a controller.
Two things that might be useful to check are the memory (memtest) and
the temperatures, using 'sensors' to monitor heat. I have seen drives
which worked fine until you ran them hard for 20-30 minutes and then
started getting errors (usually seek errors). Just a few things to
consider, since you have put this much effort into characterizing the
problem.
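
For the heat check, something like the following might do as a starting
point. It is only a sketch: max_temp is a made-up helper name, and it
assumes 'sensors' (from lm-sensors) prints readings such as "+45.0°C"
with the current value first on each line, which varies by chip and
driver.

```shell
#!/bin/sh
# Hypothetical helper: read `sensors`-style text on stdin and print the
# highest current temperature found, in degrees C.
# Assumption: temperature lines contain "°C" and the first signed
# decimal number on such a line is the current reading (later ones are
# the high/crit limits).
max_temp() {
    awk '
        /°C/ {
            # Pick out the first signed decimal number on the line.
            if (match($0, /[+-][0-9]+\.[0-9]+/)) {
                t = substr($0, RSTART, RLENGTH) + 0
                if (!seen || t > max) { max = t; seen = 1 }
            }
        }
        END { if (seen) printf "%.1f\n", max }
    '
}
```

With that function defined in the current shell, a loop such as
`while sleep 5; do sensors | max_temp; done` run alongside the rebuild
should show whether the hottest reading climbs as the disks are worked
hard.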

-- 
Bill Davidsen <davidsen@tmr.com>
  Obscure bug of 2004: BASH BUFFER OVERFLOW - if bash is being run by a
normal user and is setuid root, with the "vi" line edit mode selected,
and the character set is "big5," an off-by-one error occurs during
wildcard (glob) expansion.

