From mboxrd@z Thu Jan  1 00:00:00 1970
From: Tudor Holton <tudor@smartguide.com.au>
Subject: Re: Spare disk not becoming active
Date: Thu, 20 Dec 2012 10:19:57 +1100
Message-ID: <50D24B9D.8000801@smartguide.com.au>
References: <50BBEC7E.7080200@smartguide.com.au>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Return-path: <linux-raid-owner@vger.kernel.org>
In-Reply-To: <50BBEC7E.7080200@smartguide.com.au>
Sender: linux-raid-owner@vger.kernel.org
To: linux-raid@vger.kernel.org
List-Id: linux-raid.ids

I don't mean to be rude, but it's been two weeks and my system is still 
in this state.  Bump, anyone?

A thorough search of the web (before I originally posted this to the
list) revealed nothing useful.  I found no explanation of why this
occurs, only reports that it has happened a number of times.  Most
reports indicate that completely stopping and reassembling the array
fixes it, but I tried that and the disk still returned to spare.  Some
reports describe the same situation as mine, but none has a response
that seems complete.
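The moment the disk falls back to spare can be spotted from /proc/mdstat alone. A "(S)" suffix marks a spare, "[2/1]" means two configured devices with one working, and "[U_]" shows the second slot down. Below is a minimal sketch of a helper that pulls those fields out. The function name mdstat_summary is made up for illustration, and it assumes the usual two-line mdstat layout for each array.

```shell
#!/bin/sh
# Hypothetical helper (name made up for illustration): summarise one array
# entry in mdstat-format text. Reads /proc/mdstat unless a file is given.
# Assumes the usual layout: "mdN : active raidX dev[slot](flags) ..." followed
# by a line containing "[raid_disks/working] [U_...]".
mdstat_summary() {
    array=$1
    file=${2:-/proc/mdstat}
    awk -v md="$array" '
        # Line after the device line: print the "[n/m] [U_]" fields, then stop.
        want {
            for (i = 1; i < NF; i++)
                if ($i ~ /^\[[0-9]+\/[0-9]+\]$/) print "counts", $i, $(i + 1)
            exit
        }
        # Device line for the requested array: one output line per member.
        $1 == md && $2 == ":" {
            for (i = 3; i <= NF; i++) {
                if ($i !~ /\[[0-9]+\]/) continue   # skip "active", "raid1", ...
                dev = $i
                role = "member"                    # no flag: in the array (maybe rebuilding)
                if (dev ~ /\(S\)/) role = "spare"
                if (dev ~ /\(F\)/) role = "faulty"
                sub(/\[.*$/, "", dev)              # drop "[slot](flags)"
                print dev, role
            }
            want = 1
            next
        }
    ' "$file"
}
```

Running `mdstat_summary md101` once the recovery reaches 100% should show whether sdb1 has come back as "spare" again.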

Eventually the discussions turn to wiping the disks and starting again.
That seems a bit drastic, and I'm concerned that *one* of the disks is
faulty but not being reported as such; I don't want to pick the wrong
one and wipe its superblock.  mdadm reports no errors, but SMART
indicates there may be a problem with the *active* disk, which is even
more worrying because, until the spare becomes active, I can't remove
the active disk to test it properly.
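Besides the dev-*/errors counters checked in the original message, the md sysfs directory exposes each member's state and the array's sync_action. These show whether a recovery is still running and which member the kernel currently considers in_sync or spare. A minimal sketch that dumps them follows. md_sysfs_report is a hypothetical name, and the sysfs root is a parameter only so the function can be exercised against a copy.

```shell
#!/bin/sh
# Hypothetical helper: print the md sysfs view of an array and its members.
# The sysfs root is a parameter (default /sys/block) so it can be pointed at
# a copy for testing. Attribute names follow the kernel md documentation:
#   md/sync_action    - idle, resync, recover, check, repair, ...
#   md/degraded       - number of devices by which the array is degraded
#   md/dev-*/state    - e.g. in_sync, spare, faulty
#   md/dev-*/errors   - read errors seen on that member that did not evict it
md_sysfs_report() {
    array=$1
    root=${2:-/sys/block}
    md="$root/$array/md"
    [ -d "$md" ] || { echo "no md directory for $array" >&2; return 1; }
    for attr in sync_action degraded; do
        [ -r "$md/$attr" ] && printf '%s %s\n' "$attr" "$(cat "$md/$attr")"
    done
    for dev in "$md"/dev-*; do
        [ -d "$dev" ] || continue          # glob matched nothing
        name=${dev##*/dev-}                # e.g. ".../dev-sdb1" -> "sdb1"
        printf '%s state=%s errors=%s\n' "$name" \
            "$(cat "$dev/state" 2>/dev/null)" "$(cat "$dev/errors" 2>/dev/null)"
    done
}
```

For example, `md_sysfs_report md101` right after the recovery finishes should show whether sync_action has gone back to idle with sdb1 still listed as spare.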

Any ideas?

Cheers,
Tudor.

On 03/12/12 11:04, Tudor Holton wrote:
> Hallo,
>
> I'm having some trouble with an array that has become degraded.
>
> The array is in this state:
>
> md101 : active raid1 sdf1[0] sdb1[2](S)
>       1953511936 blocks [2/1] [U_]
>
>
> mdadm --detail says:
>
> /dev/md101:
>         Version : 0.90
>   Creation Time : Thu Jan 13 14:34:27 2011
>      Raid Level : raid1
>      Array Size : 1953511936 (1863.01 GiB 2000.40 GB)
>   Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
>    Raid Devices : 2
>   Total Devices : 2
> Preferred Minor : 101
>     Persistence : Superblock is persistent
>
>     Update Time : Fri Nov 23 03:23:04 2012
>           State : clean, degraded
>  Active Devices : 1
> Working Devices : 2
>  Failed Devices : 0
>   Spare Devices : 1
>
>            UUID : 43e92a79:90295495:0a76e71e:56c99031 (local to host barney)
>          Events : 0.2127
>
>     Number   Major   Minor   RaidDevice State
>        0       8       81        0      active sync /dev/sdf1
>        1       0        0        1      removed
>
>        2       8       17        -      spare   /dev/sdb1
>
>
> If I attempt to force the spare to become active, it begins to recover:
> $ sudo mdadm -S /dev/md101
> mdadm: stopped /dev/md101
> $ sudo mdadm --assemble --force --no-degraded /dev/md101 /dev/sdf1 /dev/sdb1
> mdadm: /dev/md101 has been started with 1 drive (out of 2) and 1 spare.
> $ cat /proc/mdstat
> md101 : active raid1 sdf1[0] sdb1[2]
>       1953511936 blocks [2/1] [U_]
>       [>....................]  recovery =  0.0% (541440/1953511936) finish=420.8min speed=77348K/sec
>
> The recovery runs for the expected time, but the disk then returns to the spare state.
>
> Neither disk partition reports errors:
> $ cat /sys/block/md101/md/dev-sdf1/errors
> 0
> $ cat /sys/block/md101/md/dev-sdb1/errors
> 0
>
> Are there mdadm logs that would show why this is not recovering properly?
> Otherwise, how do I debug this?
>
> Cheers,
> Tudor.
>