* all of my drives are spares
@ 2023-09-08  2:50 David T-G
  2023-09-09 11:26 ` David T-G
  0 siblings, 1 reply; 7+ messages in thread
From: David T-G @ 2023-09-08  2:50 UTC (permalink / raw)
  To: Linux RAID list

Hi, all --

After a surprise reboot the other day, I came home to find diskfarm's
RAID5 arrays all offline with all disks marked as spares.  wtf?!?

After some googling around I found

  https://ronhks.hu/2021/01/07/mdadm-raid-5-all-disk-became-spare/

(for a recent example) that it has happened to others, and at least the
pieces are all there rather than completely destroyed, but before I try
stopping and reassembling each array I thought I should double check :-)

Below is the output of a big ol' debugging run.  I tried to dump only what
is interesting :-)  [The smartctl-disks-timeout.sh script is based on the
linux-raid wiki script that checks the drive timeouts and sets them as
necessary.]  I'm not sure why sd[dbc] show a missing device while sd[lkf]
are happy on each array, and I wonder what happened to md53 with its
widely differing event counts, which may make assembly interesting (and
why does md52 have such a low event count when the six of these are
linear striped into a big fat array?).
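
[For the curious, a bare-bones sketch of roughly what such a
check-and-set pass looks like -- not the exact script, and the member
list and the 7-second ERC value here are just placeholders:]

  # sketch: try to set a short SCT ERC (error recovery) timeout on each
  # array member; if a drive doesn't support it, raise the kernel SCSI
  # command timeout instead so a long internal retry can't get it kicked
  for d in sdb sdc sdd sdf sdk sdl ; do
    if smartctl -l scterc,70,70 /dev/$d | grep -q seconds ; then
      echo "$d: SCT ERC set to 7.0s"
    else
      echo 180 > /sys/block/$d/device/timeout
      echo "$d: no SCT ERC; kernel timeout raised to 180s"
    fi
  done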

Soooooo ...  What do you guys suggest to get us back up and happy?  TIA


  diskfarm:~ # uname -a ; mdadm --version ; for D in sd{d,c,b,l,k,f} ; do mdadm -E /dev/$D ; smartctl -H -i /dev/$D | egrep 'Model|SMART' | sed -e 's/^/    /' ; done ; echo '' ; for A in 51 52 53 54 55 56 ; do egrep md$A /proc/mdstat ; mdadm -D /dev/md$A | egrep 'Version|State|Events|/dev' ; for D in sd{d,b,c,l,k,f} ; do echo $D$A ; mdadm -E /dev/$D$A | egrep 'Raid|State|Events' ; done ; echo '' ; done ; /usr/local/bin/smartctl-disks-timeout.sh

  Linux diskfarm 5.3.18-lp152.106-default #1 SMP Mon Nov 22 08:38:17 UTC 2021 (52078fe) x86_64 x86_64 x86_64 GNU/Linux
  mdadm - v4.1 - 2018-10-01
  /dev/sdd:
     MBR Magic : aa55
  Partition[0] :   4294967295 sectors at            1 (type ee)
  	Device Model:     TOSHIBA HDWR11A
  	SMART support is: Available - device has SMART capability.
  	SMART support is: Enabled
  	=== START OF READ SMART DATA SECTION ===
  	SMART overall-health self-assessment test result: PASSED
  /dev/sdc:
     MBR Magic : aa55
  Partition[0] :   4294967295 sectors at            1 (type ee)
  	Device Model:     TOSHIBA HDWR11A
  	SMART support is: Available - device has SMART capability.
  	SMART support is: Enabled
  	=== START OF READ SMART DATA SECTION ===
  	SMART overall-health self-assessment test result: PASSED
  /dev/sdb:
     MBR Magic : aa55
  Partition[0] :   4294967295 sectors at            1 (type ee)
  	Device Model:     TOSHIBA HDWR11A
  	SMART support is: Available - device has SMART capability.
  	SMART support is: Enabled
  	=== START OF READ SMART DATA SECTION ===
  	SMART overall-health self-assessment test result: PASSED
  /dev/sdl:
     MBR Magic : aa55
  Partition[0] :   4294967295 sectors at            1 (type ee)
  	Device Model:     TOSHIBA HDWR11A
  	SMART support is: Available - device has SMART capability.
  	SMART support is: Enabled
  	=== START OF READ SMART DATA SECTION ===
  	SMART overall-health self-assessment test result: PASSED
  /dev/sdk:
     MBR Magic : aa55
  Partition[0] :   4294967295 sectors at            1 (type ee)
  	Device Model:     ST20000NM007D-3DJ103
  	SMART support is: Available - device has SMART capability.
  	SMART support is: Enabled
  	=== START OF READ SMART DATA SECTION ===
  	SMART overall-health self-assessment test result: PASSED
  /dev/sdf:
     MBR Magic : aa55
  Partition[0] :   4294967295 sectors at            1 (type ee)
  	Device Model:     ST20000NM007D-3DJ103
  	SMART support is: Available - device has SMART capability.
  	SMART support is: Enabled
  	=== START OF READ SMART DATA SECTION ===
  	SMART overall-health self-assessment test result: PASSED
  
  md51 : inactive sdd51[3](S) sdb51[0](S) sdc51[1](S) sdl51[4](S) sdk51[6](S) sdf51[5](S)
  /dev/md51:
             Version : 1.2
               State : inactive
              Events : 46655
         -     259       39        -        /dev/sdl51
         -     259        9        -        /dev/sdb51
         -     259       31        -        /dev/sdk51
         -     259       16        -        /dev/sdd51
         -     259        2        -        /dev/sdc51
         -     259       23        -        /dev/sdf51
  sdd51
       Raid Level : raid5
     Raid Devices : 6
            State : clean
           Events : 46670
     Array State : AAAA.A ('A' == active, '.' == missing, 'R' == replacing)
  sdb51
       Raid Level : raid5
     Raid Devices : 6
            State : clean
           Events : 46670
     Array State : AAAA.A ('A' == active, '.' == missing, 'R' == replacing)
  sdc51
       Raid Level : raid5
     Raid Devices : 6
            State : clean
           Events : 46670
     Array State : AAAA.A ('A' == active, '.' == missing, 'R' == replacing)
  sdl51
       Raid Level : raid5
     Raid Devices : 6
            State : clean
           Events : 46655
     Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
  sdk51
       Raid Level : raid5
     Raid Devices : 6
            State : clean
           Events : 46655
     Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
  sdf51
       Raid Level : raid5
     Raid Devices : 6
            State : clean
           Events : 46655
     Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
  
  md52 : inactive sdd52[3](S) sdb52[0](S) sdc52[1](S) sdl52[4](S) sdk52[6](S) sdf52[5](S)
  /dev/md52:
             Version : 1.2
               State : inactive
              Events : 16482
         -     259        3        -        /dev/sdc52
         -     259       24        -        /dev/sdf52
         -     259       40        -        /dev/sdl52
         -     259       10        -        /dev/sdb52
         -     259       32        -        /dev/sdk52
         -     259       17        -        /dev/sdd52
  sdd52
       Raid Level : raid5
     Raid Devices : 6
            State : clean
           Events : 16482
     Array State : AAAA.A ('A' == active, '.' == missing, 'R' == replacing)
  sdb52
       Raid Level : raid5
     Raid Devices : 6
            State : clean
           Events : 16482
     Array State : AAAA.A ('A' == active, '.' == missing, 'R' == replacing)
  sdc52
       Raid Level : raid5
     Raid Devices : 6
            State : clean
           Events : 16482
     Array State : AAAA.A ('A' == active, '.' == missing, 'R' == replacing)
  sdl52
       Raid Level : raid5
     Raid Devices : 6
            State : clean
           Events : 16478
     Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
  sdk52
       Raid Level : raid5
     Raid Devices : 6
            State : clean
           Events : 16478
     Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
  sdf52
       Raid Level : raid5
     Raid Devices : 6
            State : clean
           Events : 16478
     Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
  
  md53 : inactive sdd53[3](S) sdc53[1](S) sdb53[0](S) sdl53[4](S) sdk53[6](S) sdf53[5](S)
  /dev/md53:
             Version : 1.2
               State : inactive
              Events : 41470
         -     259       33        -        /dev/sdk53
         -     259       18        -        /dev/sdd53
         -     259        4        -        /dev/sdc53
         -     259       25        -        /dev/sdf53
         -     259       41        -        /dev/sdl53
         -     259       11        -        /dev/sdb53
  sdd53
       Raid Level : raid5
     Raid Devices : 6
            State : clean
           Events : 53337
     Array State : AAAA.A ('A' == active, '.' == missing, 'R' == replacing)
  sdb53
       Raid Level : raid5
     Raid Devices : 6
            State : clean
           Events : 53337
     Array State : AAAA.A ('A' == active, '.' == missing, 'R' == replacing)
  sdc53
       Raid Level : raid5
     Raid Devices : 6
            State : clean
           Events : 53337
     Array State : AAAA.A ('A' == active, '.' == missing, 'R' == replacing)
  sdl53
       Raid Level : raid5
     Raid Devices : 6
            State : clean
           Events : 41470
     Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
  sdk53
       Raid Level : raid5
     Raid Devices : 6
            State : clean
           Events : 41470
     Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
  sdf53
       Raid Level : raid5
     Raid Devices : 6
            State : clean
           Events : 41470
     Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
  
  md54 : inactive sdd54[3](S) sdc54[1](S) sdb54[0](S) sdl54[4](S) sdk54[6](S) sdf54[5](S)
  /dev/md54:
             Version : 1.2
               State : inactive
              Events : 37400
         -     259        5        -        /dev/sdc54
         -     259       26        -        /dev/sdf54
         -     259       42        -        /dev/sdl54
         -     259       12        -        /dev/sdb54
         -     259       34        -        /dev/sdk54
         -     259       19        -        /dev/sdd54
  sdd54
       Raid Level : raid5
     Raid Devices : 6
            State : clean
           Events : 37400
     Array State : AAAA.A ('A' == active, '.' == missing, 'R' == replacing)
  sdb54
       Raid Level : raid5
     Raid Devices : 6
            State : clean
           Events : 37400
     Array State : AAAA.A ('A' == active, '.' == missing, 'R' == replacing)
  sdc54
       Raid Level : raid5
     Raid Devices : 6
            State : clean
           Events : 37400
     Array State : AAAA.A ('A' == active, '.' == missing, 'R' == replacing)
  sdl54
       Raid Level : raid5
     Raid Devices : 6
            State : clean
           Events : 37377
     Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
  sdk54
       Raid Level : raid5
     Raid Devices : 6
            State : clean
           Events : 37377
     Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
  sdf54
       Raid Level : raid5
     Raid Devices : 6
            State : clean
           Events : 37377
     Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
  
  md55 : inactive sdd55[3](S) sdc55[1](S) sdb55[0](S) sdl55[4](S) sdk55[6](S) sdf55[5](S)
  /dev/md55:
             Version : 1.2
               State : inactive
              Events : 42328
         -     259       35        -        /dev/sdk55
         -     259       20        -        /dev/sdd55
         -     259        6        -        /dev/sdc55
         -     259       27        -        /dev/sdf55
         -     259       43        -        /dev/sdl55
         -     259       13        -        /dev/sdb55
  sdd55
       Raid Level : raid5
     Raid Devices : 6
            State : clean
           Events : 42332
     Array State : AAAA.A ('A' == active, '.' == missing, 'R' == replacing)
  sdb55
       Raid Level : raid5
     Raid Devices : 6
            State : clean
           Events : 42332
     Array State : AAAA.A ('A' == active, '.' == missing, 'R' == replacing)
  sdc55
       Raid Level : raid5
     Raid Devices : 6
            State : clean
           Events : 42332
     Array State : AAAA.A ('A' == active, '.' == missing, 'R' == replacing)
  sdl55
       Raid Level : raid5
     Raid Devices : 6
            State : clean
           Events : 42328
     Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
  sdk55
       Raid Level : raid5
     Raid Devices : 6
            State : clean
           Events : 42328
     Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
  sdf55
       Raid Level : raid5
     Raid Devices : 6
            State : clean
           Events : 42328
     Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
  
  md56 : inactive sdd56[3](S) sdb56[0](S) sdc56[1](S) sdl56[4](S) sdk56[6](S) sdf56[5](S)
  /dev/md56:
             Version : 1.2
               State : inactive
              Events : 43091
         -     259        7        -        /dev/sdc56
         -     259       28        -        /dev/sdf56
         -     259       44        -        /dev/sdl56
         -     259       14        -        /dev/sdb56
         -     259       36        -        /dev/sdk56
         -     259       21        -        /dev/sdd56
  sdd56
       Raid Level : raid5
     Raid Devices : 6
            State : clean
           Events : 43091
     Array State : AAAA.A ('A' == active, '.' == missing, 'R' == replacing)
  sdb56
       Raid Level : raid5
     Raid Devices : 6
            State : clean
           Events : 43091
     Array State : AAAA.A ('A' == active, '.' == missing, 'R' == replacing)
  sdc56
       Raid Level : raid5
     Raid Devices : 6
            State : clean
           Events : 43091
     Array State : AAAA.A ('A' == active, '.' == missing, 'R' == replacing)
  sdl56
       Raid Level : raid5
     Raid Devices : 6
            State : clean
           Events : 43087
     Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
  sdk56
       Raid Level : raid5
     Raid Devices : 6
            State : clean
           Events : 43087
     Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
  sdf56
       Raid Level : raid5
     Raid Devices : 6
            State : clean
           Events : 43087
     Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
  
  Drive timeouts: sda Y ; sdb Y ; sdc Y ; sdd Y ; sde Y ; sdf Y ; sdg 180 ; sdh Y ; sdi Y ; sdj Y ; sdk Y ; sdl Y ; sdm Y ;


:-D
-- 
David T-G
See http://justpickone.org/davidtg/email/
See http://justpickone.org/davidtg/tofu.txt



* Re: all of my drives are spares
  2023-09-08  2:50 all of my drives are spares David T-G
@ 2023-09-09 11:26 ` David T-G
  2023-09-09 18:28   ` Wol
  0 siblings, 1 reply; 7+ messages in thread
From: David T-G @ 2023-09-09 11:26 UTC (permalink / raw)
  To: Linux RAID list

Hi, all --

...and then David T-G home said...
% Hi, all --
% 
% After a surprise reboot the other day, I came home to find diskfarm's
% RAID5 arrays all offline with all disks marked as spares.  wtf?!?
[snip]

Wow ...  I'm used to responses pointing out either what I've left
out or how stupid my setup is, but total silence ...  How did I
offend and how can I fix it?

I sure could use advice on the current hangup before perhaps just
destroying my entire array with the wrong commands ...


With fingers crossed,
:-D
-- 
David T-G
See http://justpickone.org/davidtg/email/
See http://justpickone.org/davidtg/tofu.txt



* Re: all of my drives are spares
  2023-09-09 11:26 ` David T-G
@ 2023-09-09 18:28   ` Wol
  2023-09-10  2:55     ` David T-G
  0 siblings, 1 reply; 7+ messages in thread
From: Wol @ 2023-09-09 18:28 UTC (permalink / raw)
  To: David T-G, Linux RAID list

On 09/09/2023 12:26, David T-G wrote:
> Hi, all --
> 
> ...and then David T-G home said...
> % Hi, all --
> %
> % After a surprise reboot the other day, I came home to find diskfarm's
> % RAID5 arrays all offline with all disks marked as spares.  wtf?!?
> [snip]
> 
> Wow ...  I'm used to responses pointing out either what I've left
> out or how stupid my setup is, but total silence ...  How did I
> offend and how can I fix it?

Sorry, it's usually me that's the quick response, everyone else takes 
ages, and I'm feeling a bit burnt out with life in general at the moment.
> 
> I sure could use advice on the current hangup before perhaps just
> destroying my entire array with the wrong commands ...
> 
I wonder if a controlled reboot would fix it. Or just do a --stop 
followed by an assemble. The big worry is the wildly varying event 
counts. Do your arrays have journals?
> 
> With fingers crossed,
> :-D

If the worst comes to the worst, try a forced assemble with the minimum 
possible drives (no redundancy). Pick the drives with the highest event 
counts. You can then re-add the remaining ones if that works.
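
Something along these lines, roughly -- the array number and which five
members you keep are placeholders, go by your --examine output:

  # sketch only -- which five members to keep depends on the --examine
  # event counts; these device names are placeholders
  mdadm --stop /dev/md51
  mdadm --assemble --force --run /dev/md51 /dev/sd{b,c,d,f,l}51
  # if it starts (degraded), re-add the last member and let it rebuild
  mdadm --manage /dev/md51 --re-add /dev/sdk51
  cat /proc/mdstat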

Iirc this is actually not uncommon and it shouldn't be hard to recover 
from. I really ought to go through the archives, find a bunch of 
occasions, and write it up.

The only real worry is that the varying event counts mean that some data 
corruption is likely. Recent files, hopefully nothing important. One 
thing that's just struck me, this is often caused by a drive failing 
some while back, and then a glitch on a second drive brings the whole 
thing down. When did you last check your array was fully functional?

Cheers,
Wol


* Re: all of my drives are spares
  2023-09-09 18:28   ` Wol
@ 2023-09-10  2:55     ` David T-G
  2023-09-10  3:11       ` assemble didn't quite (was "Re: all of my drives are spares") David T-G
  2023-09-10  3:44       ` timing (was "Re: all of my drives are spares") David T-G
  0 siblings, 2 replies; 7+ messages in thread
From: David T-G @ 2023-09-10  2:55 UTC (permalink / raw)
  To: Linux RAID list

Wol, et al --

...and then Wol said...
% On 09/09/2023 12:26, David T-G wrote:
% > 
% > Wow ...  I'm used to responses pointing out either what I've left
% > out or how stupid my setup is, but total silence ...  How did I
% > offend and how can I fix it?
% 
% Sorry, it's usually me that's the quick response, everyone else takes ages,

True!


% and I'm feeling a bit burnt out with life in general at the moment.

Oh!  Sorry to hear that.  Not much I can do from here, but I can think
you a hug :-)  I hope things look up soon.


% > 
% > I sure could use advice on the current hangup before perhaps just
% > destroying my entire array with the wrong commands ...
% 
% I wonder if a controlled reboot would fix it. Or just do a --stop followed

I've tried a couple of reboots; they're stuck that way.  I'll try the
stop and assemble.


% by an assemble. The big worry is the wildly varying event counts. Do your
% arrays have journals.

No, I don't think so, unless they're created automagically.  Alas, I
don't recall the exact creation command :-/

How can I check for sure?  The -D flag output doesn't mention a journal,
whether enabled or missing.
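
[For the record, the two checks I know of -- illustrative only, I haven't
verified the exact field names across mdadm versions, and sdX51 below is
just a placeholder for any member partition:]

  # an assembled array reports its policy ("resync", "bitmap",
  # "journal", or "ppl") directly
  mdadm --detail /dev/md51 | grep -i 'consistency policy'
  # member superblocks can be examined even while the array is inactive;
  # a journal device shows up with a Journal device role
  mdadm --examine /dev/sdX51 | grep -i 'device role'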


% > 
% > With fingers crossed,
% > :-D
% 
% If the worst comes to the worst, try a forced assemble with the minimum
% possible drives (no redundancy). Pick the drives with the highest event
% counts. You can then re-add the remaining ones if that works.

Hmmmmm ...  For a RAID5 array, the minimum would be one left out, right?
So five instead of all six.  And the event counts seem to be three and
three, which is interesting but also doesn't point to any one favorite to
drop :-/


% 
% Iirc this is actually not uncommon and it shouldn't be hard to recover from.
% I really ought to go through the archives, find a bunch of occasions, and
% write it up.

In your copious free time :-)  That would, indeed, be awesome.


% 
% The only real worry is that the varying event counts mean that some data
% corruption is likely. Recent files, hopefully nothing important. One thing

Gaaaaah!

Fortunately, everything is manually mirrored out to external drives, so
if everything did go tits-up I could reload.  I'll come up with a diff
using the externals as the sources and check ... eventually.


% that's just struck me, this is often caused by a drive failing some while
% back, and then a glitch on a second drive brings the whole thing down. When
% did you last check your array was fully functional?

Let me get back to you on that.  It's actually been a couple of weeks in
this state just waiting to get to it; life has been interesting here,
too.  But I have a heartbeat script that might have captured happy data
...  These are all pretty new drives after the 4T drive disaster a year
(or two?) ago, so they *should* be OK.


% 
% Cheers,
% Wol


Thanks again & stay tuned

:-D
-- 
David T-G
See http://justpickone.org/davidtg/email/
See http://justpickone.org/davidtg/tofu.txt



* assemble didn't quite (was "Re: all of my drives are spares")
  2023-09-10  2:55     ` David T-G
@ 2023-09-10  3:11       ` David T-G
  2023-09-14 15:59         ` assemble didn't quite David T-G
  2023-09-10  3:44       ` timing (was "Re: all of my drives are spares") David T-G
  1 sibling, 1 reply; 7+ messages in thread
From: David T-G @ 2023-09-10  3:11 UTC (permalink / raw)
  To: Linux RAID list

...and then David T-G home said...
% 
% ...and then Wol said...
% 
...
% % I wonder if a controlled reboot would fix it. Or just do a --stop followed
% 
% I've tried a couple of reboots; they're stuck that way.  I'll try the
% stop and assemble.
[snip]

Stopping was easy:

  diskfarm:~ # for A in 51 52 53 54 55 56 ; do mdadm --stop /dev/md$A ;
  done
  mdadm: stopped /dev/md51
  mdadm: stopped /dev/md52
  mdadm: stopped /dev/md53
  mdadm: stopped /dev/md54
  mdadm: stopped /dev/md55
  mdadm: stopped /dev/md56

Restarting wasn't as impressive:

  diskfarm:~ # for A in 51 52 53 54 55 56 ; do mdadm --assemble /dev/md$A /dev/sd[dcblfk]$A ; done
  mdadm: /dev/md51 assembled from 3 drives - not enough to start the array.
  mdadm: /dev/md52 assembled from 3 drives - not enough to start the array.
  mdadm: /dev/md53 assembled from 3 drives - not enough to start the array.
  mdadm: /dev/md54 assembled from 3 drives - not enough to start the array.
  mdadm: /dev/md55 assembled from 3 drives - not enough to start the array.
  mdadm: /dev/md56 assembled from 3 drives - not enough to start the array.

  diskfarm:~ # mdadm --detail /dev/md51
  /dev/md51:
	     Version : 1.2
	  Raid Level : raid5
       Total Devices : 6
	 Persistence : Superblock is persistent
  
	       State : inactive
     Working Devices : 6
  
		Name : diskfarm:51  (local to host diskfarm)
		UUID : 9330e44f:35baf039:7e971a8e:da983e31
	      Events : 46655
  
      Number   Major   Minor   RaidDevice
  
	 -     259       39        -        /dev/sdl51
	 -     259        9        -        /dev/sdb51
	 -     259       31        -        /dev/sdk51
	 -     259       16        -        /dev/sdd51
	 -     259        2        -        /dev/sdc51
	 -     259       23        -        /dev/sdf51
  diskfarm:~ # grep md51 /proc/mdstat
  md51 : inactive sdk51[6](S) sdl51[4](S) sdf51[5](S) sdc51[1](S) sdd51[3](S) sdb51[0](S)

Still all spares.  And here I was hoping this would be easy ...


Thanks again

:-D
-- 
David T-G
See http://justpickone.org/davidtg/email/
See http://justpickone.org/davidtg/tofu.txt



* timing (was "Re: all of my drives are spares")
  2023-09-10  2:55     ` David T-G
  2023-09-10  3:11       ` assemble didn't quite (was "Re: all of my drives are spares") David T-G
@ 2023-09-10  3:44       ` David T-G
  1 sibling, 0 replies; 7+ messages in thread
From: David T-G @ 2023-09-10  3:44 UTC (permalink / raw)
  To: Linux RAID list

One more time this evening ...

...and then David T-G home said...
% 
% ...and then Wol said...
...
% % that's just struck me, this is often caused by a drive failing some while
% % back, and then a glitch on a second drive brings the whole thing down. When
% % did you last check your array was fully functional?
% 
% Let me get back to you on that.  It's actually been a couple of weeks in
% this state just waiting to get to it; life has been interesting here,
[snip]

Apparently less than a couple of weeks after all.  That's what I get for
not knowing where I'll sleep each night and losing track of the days as a
result...

Anyway, here are a couple of clips from 08/29:

  ######################################################################
   02:55:01  up  22:42,  0 users,  load average: 6.41, 6.73, 6.49
  Personalities : [raid1] [raid6] [raid5] [raid4] [linear]
  md50 : active linear md52[1] md54[3] md56[5] md51[0] md53[2] md55[4]
	29289848832 blocks super 1.2 0k rounding

  md4 : active raid1 sde4[0] sda4[2]
	142972224 blocks super 1.2 [2/2] [UU]
	bitmap: 1/2 pages [4KB], 65536KB chunk

  md3 : active raid1 sde3[0] sda3[2]
	35617792 blocks super 1.2 [2/2] [UU]

  md56 : active raid5 sdd56[3] sdc56[1] sdb56[0] sdf56[5] sdl56[4] sdk56[6]
	8136309760 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/6] [UUUUUU]
	  resync=DELAYED
	bitmap: 0/13 pages [0KB], 65536KB chunk

  md55 : active raid5 sdd55[3] sdc55[1] sdb55[0] sdf55[5] sdl55[4] sdk55[6]
	8136309760 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/6] [UUUUUU]
	  resync=DELAYED
	bitmap: 0/13 pages [0KB], 65536KB chunk

  md53 : active raid5 sdd53[3] sdc53[1] sdb53[0] sdf53[5] sdl53[4] sdk53[6]
	8136309760 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/6] [UUUUUU]
	[>....................]  check =  4.2% (69399936/1627261952) finish=7901.9min speed=3285K/sec
	bitmap: 0/13 pages [0KB], 65536KB chunk

  md51 : active raid5 sdb51[0] sdd51[3] sdc51[1] sdf51[5] sdl51[4] sdk51[6]
	8136309760 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/6] [UUUUUU]
	bitmap: 0/13 pages [0KB], 65536KB chunk

  md54 : active raid5 sdd54[3] sdc54[1] sdb54[0] sdf54[5] sdl54[4] sdk54[6]
	8136309760 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/6] [UUUUUU]
	  resync=DELAYED
	bitmap: 0/13 pages [0KB], 65536KB chunk

  md52 : active raid5 sdd52[3] sdc52[1] sdb52[0] sdf52[5] sdl52[4] sdk52[6]
	8136309760 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/6] [UUUUUU]
	  resync=DELAYED
	bitmap: 0/13 pages [0KB], 65536KB chunk

  md2 : active raid1 sde2[0] sda2[2]
	35617792 blocks super 1.2 [2/2] [UU]

  md1 : active raid1 sde1[0] sda1[2]
	35609600 blocks super 1.2 [2/2] [UU]

  unused devices: <none>
  ...
  ######################################################################
   03:00:01  up  22:47,  0 users,  load average: 3.75, 5.86, 6.28
  Personalities : [raid1] [raid6] [raid5] [raid4] [linear]
  md50 : active linear md52[1] md54[3] md56[5] md51[0] md53[2] md55[4]
	29289848832 blocks super 1.2 0k rounding

  md4 : active raid1 sde4[0] sda4[2]
	142972224 blocks super 1.2 [2/2] [UU]
	bitmap: 1/2 pages [4KB], 65536KB chunk

  md3 : active raid1 sde3[0] sda3[2]
	35617792 blocks super 1.2 [2/2] [UU]

  md56 : active raid5 sdd56[3] sdc56[1] sdb56[0] sdf56[5] sdl56[4]
	8136309760 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/5] [UUUU_U]
	bitmap: 0/13 pages [0KB], 65536KB chunk

  md55 : active raid5 sdd55[3] sdc55[1] sdb55[0] sdf55[5] sdl55[4]
	8136309760 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/5] [UUUU_U]
	bitmap: 0/13 pages [0KB], 65536KB chunk

  md53 : active raid5 sdd53[3] sdc53[1] sdb53[0] sdf53[5] sdl53[4] sdk53[6](F)
	8136309760 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/5] [UUUU_U]
	[>....................]  check =  4.2% (69789932/1627261952) finish=17276.4min speed=1502K/sec
	bitmap: 0/13 pages [0KB], 65536KB chunk

  md51 : active raid5 sdb51[0] sdd51[3] sdc51[1] sdf51[5] sdl51[4]
	8136309760 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/5] [UUUU_U]
	bitmap: 0/13 pages [0KB], 65536KB chunk

  md54 : active raid5 sdd54[3] sdc54[1] sdb54[0] sdf54[5] sdl54[4]
	8136309760 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/5] [UUUU_U]
	bitmap: 0/13 pages [0KB], 65536KB chunk

  md52 : active raid5 sdd52[3] sdc52[1] sdb52[0] sdf52[5] sdl52[4]
	8136309760 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/5] [UUUU_U]
	bitmap: 0/13 pages [0KB], 65536KB chunk

  md2 : active raid1 sde2[0] sda2[2]
	35617792 blocks super 1.2 [2/2] [UU]

  md1 : active raid1 sde1[0] sda1[2]
	35609600 blocks super 1.2 [2/2] [UU]

  unused devices: <none>
  /dev/sdn: Avolusion PRO-5X:  drive supported, but it doesn't have a temperature sensor.
  /dev/sdo: Seagate BUP BL:  drive supported, but it doesn't have a temperature sensor.
  /dev/sda: SATA SSD: 33302260C
  /dev/sdb: TOSHIBA HDWR11A: 43302260C
  /dev/sdc: TOSHIBA HDWR11A: 41302260C
  /dev/sdd: TOSHIBA HDWR11A: 42302260C
  /dev/sde: SATA SSD: 33302260C
  /dev/sdf: : S.M.A.R.T. not available
  /dev/sdg: : S.M.A.R.T. not available
  /dev/sdh: : S.M.A.R.T. not available
  /dev/sdp: WDC WD2500BEKT-75A25T0: S.M.A.R.T. not available
  /dev/sdq: WDC WD3200BEVT-60ZCT0: S.M.A.R.T. not available
  /dev/sdr: WD easystore 25FB: S.M.A.R.T. not available
  /dev/sds: WD easystore 264D: S.M.A.R.T. not available
  /dev/sdt: ST9120822A: S.M.A.R.T. not available
  /dev/sdu: WD Elements 25A3: S.M.A.R.T. not available

That's where sdk, a brand new EXOS 20T drive, apparently keeled over.
Hmmmmm.  Notice the temps check display; half of the SATA expansion card
(sdf - sdm) is missing.  Ouch.

Things ran fine like that for a day, until early on 08/30 we seem to
have keeled over.

  ######################################################################
   01:10:01  up 1 day 20:57,  0 users,  load average: 2.31, 2.11, 1.12
  Personalities : [raid1] [raid6] [raid5] [raid4] [linear]
  md50 : active linear md52[1] md54[3] md56[5] md51[0] md53[2] md55[4]
	29289848832 blocks super 1.2 0k rounding

  md4 : active raid1 sde4[0] sda4[2]
	142972224 blocks super 1.2 [2/2] [UU]
	bitmap: 0/2 pages [0KB], 65536KB chunk

  md3 : active raid1 sde3[0] sda3[2]
	35617792 blocks super 1.2 [2/2] [UU]

  md56 : active raid5 sdd56[3] sdc56[1] sdb56[0] sdf56[5] sdl56[4]
	8136309760 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/5] [UUUU_U]
	bitmap: 0/13 pages [0KB], 65536KB chunk

  md55 : active raid5 sdd55[3] sdc55[1] sdb55[0] sdf55[5] sdl55[4]
	8136309760 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/5] [UUUU_U]
	bitmap: 0/13 pages [0KB], 65536KB chunk

  md53 : active raid5 sdd53[3] sdc53[1] sdb53[0] sdf53[5] sdl53[4] sdk53[6](F)
	8136309760 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/5] [UUUU_U]
	bitmap: 0/13 pages [0KB], 65536KB chunk

  md51 : active raid5 sdb51[0] sdd51[3] sdc51[1] sdf51[5] sdl51[4]
	8136309760 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/5] [UUUU_U]
	bitmap: 0/13 pages [0KB], 65536KB chunk

  md54 : active raid5 sdd54[3] sdc54[1] sdb54[0] sdf54[5] sdl54[4]
	8136309760 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/5] [UUUU_U]
	bitmap: 1/13 pages [4KB], 65536KB chunk

  md52 : active raid5 sdd52[3] sdc52[1] sdb52[0] sdf52[5] sdl52[4]
	8136309760 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/5] [UUUU_U]
	bitmap: 0/13 pages [0KB], 65536KB chunk

  md2 : active raid1 sde2[0] sda2[2]
	35617792 blocks super 1.2 [2/2] [UU]

  md1 : active raid1 sde1[0] sda1[2]
	35609600 blocks super 1.2 [2/2] [UU]

  unused devices: <none>
  ...
  ######################################################################
   01:15:02  up 1 day 21:02,  0 users,  load average: 0.16, 0.84, 0.84
  Personalities : [raid1] [raid6] [raid5] [raid4] [linear]
  md50 : active linear md52[1] md54[3] md56[5] md51[0] md53[2] md55[4]
	29289848832 blocks super 1.2 0k rounding

  md4 : active raid1 sde4[0] sda4[2]
	142972224 blocks super 1.2 [2/2] [UU]
	bitmap: 0/2 pages [0KB], 65536KB chunk

  md3 : active raid1 sde3[0] sda3[2]
	35617792 blocks super 1.2 [2/2] [UU]

  md56 : active raid5 sdd56[3] sdc56[1] sdb56[0] sdf56[5] sdl56[4]
	8136309760 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/5] [

We came up again at 0240 the next day (08/31), and everything was a
spare.

  ######################################################################
   02:40:01  up   0:18,  27 users,  load average: 0.00, 0.03, 0.10
  Personalities : [raid1]
  md4 : active raid1 sde4[0] sda4[2]
	142972224 blocks super 1.2 [2/2] [UU]
	bitmap: 0/2 pages [0KB], 65536KB chunk

  md3 : active raid1 sde3[0] sda3[2]
	35617792 blocks super 1.2 [2/2] [UU]

  md55 : inactive sdd55[3](S) sdc55[1](S) sdb55[0](S) sdl55[4](S) sdk55[6](S) sdf55[5](S)
	9763571712 blocks super 1.2

  md56 : inactive sdd56[3](S) sdb56[0](S) sdc56[1](S) sdl56[4](S) sdk56[6](S) sdf56[5](S)
	9763571712 blocks super 1.2

  md53 : inactive sdd53[3](S) sdc53[1](S) sdb53[0](S) sdl53[4](S) sdk53[6](S) sdf53[5](S)
	9763571712 blocks super 1.2

  md54 : inactive sdd54[3](S) sdc54[1](S) sdb54[0](S) sdl54[4](S) sdk54[6](S) sdf54[5](S)
	9763571712 blocks super 1.2

  md52 : inactive sdd52[3](S) sdb52[0](S) sdc52[1](S) sdl52[4](S) sdk52[6](S) sdf52[5](S)
	9763571712 blocks super 1.2

  md51 : inactive sdd51[3](S) sdb51[0](S) sdc51[1](S) sdl51[4](S) sdk51[6](S) sdf51[5](S)
	9763571712 blocks super 1.2

  md1 : active raid1 sde1[0] sda1[2]
	35609600 blocks super 1.2 [2/2] [UU]

  md2 : active raid1 sde2[0] sda2[2]
	35617792 blocks super 1.2 [2/2] [UU]

  unused devices: <none>
  /dev/sdb: TOSHIBA HDWR11A: drive is sleeping
  /dev/sdc: TOSHIBA HDWR11A: drive is sleeping
  /dev/sdd: TOSHIBA HDWR11A: drive is sleeping
  /dev/sdf: ST20000NM007D-3DJ103: drive is sleeping
  /dev/sdk: ST20000NM007D-3DJ103: drive is sleeping
  /dev/sdl: TOSHIBA HDWR11A: drive is sleeping
  /dev/sdn: Avolusion PRO-5X:  drive supported, but it doesn't have a temperature sensor.
  /dev/sdo: Seagate BUP BL:  drive supported, but it doesn't have a temperature sensor.
  /dev/sda: SATA SSD: 33302260C
  /dev/sde: SATA SSD: 33302260C
  /dev/sdg: WDC WD7500BPKX-75HPJT0: 31302260C
  /dev/sdh: TOSHIBA MQ01ABD064: 33302260C
  /dev/sdi: ST3500413AS: 37302260C
  /dev/sdj: TOSHIBA MQ01ABD100: 33302260C
  /dev/sdm: Hitachi HDE721010SLA330: 43302260C
  /dev/sdp: WDC WD2500BEKT-75A25T0: S.M.A.R.T. not available
  /dev/sdq: WDC WD3200BEVT-60ZCT0: S.M.A.R.T. not available
  /dev/sdr: ST9120822A: S.M.A.R.T. not available
  /dev/sds: WD Elements 25A3: S.M.A.R.T. not available
  /dev/sdt: WD easystore 264D: S.M.A.R.T. not available
  /dev/sdu: WD easystore 25FB: S.M.A.R.T. not available

The whole SATA card is present, too; yay.  So rebooting helps.  But ...
Now I'm not sure how to get back to reassembly.


Thanks again and good night to all

:-D
-- 
David T-G
See http://justpickone.org/davidtg/email/
See http://justpickone.org/davidtg/tofu.txt



* Re: assemble didn't quite
  2023-09-10  3:11       ` assemble didn't quite (was "Re: all of my drives are spares") David T-G
@ 2023-09-14 15:59         ` David T-G
  0 siblings, 0 replies; 7+ messages in thread
From: David T-G @ 2023-09-14 15:59 UTC (permalink / raw)
  To: Linux RAID list

Hi again, all --

...and then David T-G home said...
% ...and then David T-G home said...
% % 
% % ...and then Wol said...
% % 
% ...
% % % I wonder if a controlled reboot would fix it. Or just do a --stop followed
% % 
% % I've tried a couple of reboots; they're stuck that way.  I'll try the
% % stop and assemble.
% [snip]
% 
% Stopping was easy:
...
% 
% Restarting wasn't as impressive:
% 
%   diskfarm:~ # for A in 51 52 53 54 55 56 ; do mdadm --assemble /dev/md$A /dev/sd[dcblfk]$A ; done
%   mdadm: /dev/md51 assembled from 3 drives - not enough to start the array.
...
%   diskfarm:~ # grep md51 /proc/mdstat
%   md51 : inactive sdk51[6](S) sdl51[4](S) sdf51[5](S) sdc51[1](S) sdd51[3](S) sdb51[0](S)
% 
% Still all spares.  And here I was hoping this would be easy ...

In the end, after a few more tries, 

  mdadm --stop /dev/mdNN
  mdadm --assemble --force /dev/mdNN /dev/sd{d,c,b,l,f,k}NN

worked, at which point md50 the linear array happily lit itself up.  So
far, all seems good.  Yay!
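
[Spelled out, for anyone who lands here from the archives, the pass that
finally worked was essentially this -- same device letters as above:]

  for A in 51 52 53 54 55 56 ; do
    mdadm --stop /dev/md$A
    mdadm --assemble --force /dev/md$A /dev/sd{d,c,b,l,f,k}$A
  done
  cat /proc/mdstat            # arrays active again (resync may follow)
  mdadm --detail /dev/md50    # the linear array on top came back on its own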

Now about the size ...


Thanks again & HAND

:-D
-- 
David T-G
See http://justpickone.org/davidtg/email/
See http://justpickone.org/davidtg/tofu.txt


