linux-raid.vger.kernel.org archive mirror
* reshape failure
@ 2011-02-16 15:46 Tobias McNulty
  2011-02-16 20:32 ` NeilBrown
  0 siblings, 1 reply; 21+ messages in thread
From: Tobias McNulty @ 2011-02-16 15:46 UTC (permalink / raw)
  To: linux-raid

Hi,

I tried to start a reshape over the weekend (RAID6 -> RAID5) and was
dismayed to see that it was going to take roughly 2 weeks to complete:

md0 : active raid6 sdc[0] sdh[5](S) sdg[4] sdf[3] sde[2] sdd[1]
5860543488 blocks super 0.91 level 6, 64k chunk, algorithm 2 [5/5]
[UUUUU] [>....................] reshape = 0.0% (245760/1953514496)
finish=21189.7min speed=1536K/sec

The disk that contained the backup file began experiencing SATA errors
several days into the reshape, due to what turned out to be a faulty
SATA card.  The card has since been replaced and the RAID1 device that
contains the backup file successfully resync'ed.

However, when I try to re-start the reshape now, I get the following error:

nas:~# mdadm --assemble /dev/md0 --backup-file=md0.backup /dev/sdc
/dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh
mdadm: Failed to restore critical section for reshape, sorry.

Is my data lost for good?  Is there anything else I can do?

Thanks,
Tobias
--
Tobias McNulty, Managing Partner
Caktus Consulting Group, LLC
http://www.caktusgroup.com


* Re: reshape failure
  2011-02-16 15:46 reshape failure Tobias McNulty
@ 2011-02-16 20:32 ` NeilBrown
  2011-02-16 20:41   ` Tobias McNulty
  0 siblings, 1 reply; 21+ messages in thread
From: NeilBrown @ 2011-02-16 20:32 UTC (permalink / raw)
  To: Tobias McNulty; +Cc: linux-raid

On Wed, 16 Feb 2011 10:46:32 -0500 Tobias McNulty <tobias@caktusgroup.com>
wrote:

> Hi,
> 
> I tried to start a reshape over the weekend (RAID6 -> RAID5) and was
> dismayed to see that it was going to take roughly 2 weeks to complete:
> 
> md0 : active raid6 sdc[0] sdh[5](S) sdg[4] sdf[3] sde[2] sdd[1]
> 5860543488 blocks super 0.91 level 6, 64k chunk, algorithm 2 [5/5]
> [UUUUU] [>....................] reshape = 0.0% (245760/1953514496)
> finish=21189.7min speed=1536K/sec
> 
> The disk that contained the backup file began experiencing SATA errors
> several days into the reshape, due to what turned out to be a faulty
> SATA card.  The card has since been replaced and the RAID1 device that
> contains the backup file successfully resync'ed.
> 
> However, when I try to re-start the reshape now, I get the following error:
> 
> nas:~# mdadm --assemble /dev/md0 --backup-file=md0.backup /dev/sdc
> /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh
> mdadm: Failed to restore critical section for reshape, sorry.
> 
> Is my data lost for good?  Is there anything else I can do?

Try above command with --verbose.
If a message about "too-old timestamp" appears, run

 export MDADM_GROW_ALLOW_OLD=1

and run the command again.
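
(If it is easier, the variable can also be set for just the one invocation,
something like:

 MDADM_GROW_ALLOW_OLD=1 mdadm --assemble --verbose /dev/md0 \
     --backup-file=md0.backup /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh

but exporting it is easier if you end up retrying.)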

In either case, post the output.

NeilBrown


* Re: reshape failure
  2011-02-16 20:32 ` NeilBrown
@ 2011-02-16 20:41   ` Tobias McNulty
  2011-02-16 21:06     ` NeilBrown
  0 siblings, 1 reply; 21+ messages in thread
From: Tobias McNulty @ 2011-02-16 20:41 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On Wed, Feb 16, 2011 at 3:32 PM, NeilBrown <neilb@suse.de> wrote:
> On Wed, 16 Feb 2011 10:46:32 -0500 Tobias McNulty <tobias@caktusgroup.com>
>> nas:~# mdadm --assemble /dev/md0 --backup-file=md0.backup /dev/sdc
>> /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh
>> mdadm: Failed to restore critical section for reshape, sorry.
>>
>> Is my data lost for good?  Is there anything else I can do?
>
> Try above command with --verbose.
> If a message about "too-old timestamp" appears, run
>
>  export MDADM_GROW_ALLOW_OLD=1
>
> and run the command again.
>
> In either case, post the output.

Wow - it looks like that might have done the trick:

nas:~# mdadm --verbose --assemble /dev/md0 --backup-file=md0.backup
/dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh
mdadm: looking for devices for /dev/md0
mdadm: /dev/sdc is identified as a member of /dev/md0, slot 3.
mdadm: /dev/sdd is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sde is identified as a member of /dev/md0, slot 1.
mdadm: /dev/sdf is identified as a member of /dev/md0, slot 0.
mdadm: /dev/sdg is identified as a member of /dev/md0, slot 5.
mdadm: /dev/sdh is identified as a member of /dev/md0, slot 4.
mdadm:/dev/md0 has an active reshape - checking if critical section
needs to be restored
mdadm: too-old timestamp on backup-metadata on md0.backup
mdadm: Failed to find backup of critical section
mdadm: Failed to restore critical section for reshape, sorry.
nas:~# export MDADM_GROW_ALLOW_OLD=1
nas:~# mdadm --verbose --assemble /dev/md0 --backup-file=md0.backup
/dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh
mdadm: looking for devices for /dev/md0
mdadm: /dev/sdc is identified as a member of /dev/md0, slot 3.
mdadm: /dev/sdd is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sde is identified as a member of /dev/md0, slot 1.
mdadm: /dev/sdf is identified as a member of /dev/md0, slot 0.
mdadm: /dev/sdg is identified as a member of /dev/md0, slot 5.
mdadm: /dev/sdh is identified as a member of /dev/md0, slot 4.
mdadm:/dev/md0 has an active reshape - checking if critical section
needs to be restored
mdadm: accepting backup with timestamp 1297624561 for array with
timestamp 1297692473
mdadm: restoring critical section
mdadm: added /dev/sde to /dev/md0 as 1
mdadm: added /dev/sdd to /dev/md0 as 2
mdadm: added /dev/sdc to /dev/md0 as 3
mdadm: added /dev/sdh to /dev/md0 as 4
mdadm: added /dev/sdg to /dev/md0 as 5
mdadm: added /dev/sdf to /dev/md0 as 0
mdadm: /dev/md0 has been started with 5 drives and 1 spare.

Now I see this in /proc/mdstat:

md0 : active raid6 sdf[0] sdg[5](S) sdh[4] sdc[3] sdd[2] sde[1]
      5860543488 blocks super 0.91 level 6, 64k chunk, algorithm 2 [5/5] [UUUUU]
      [=>...................]  reshape =  9.9% (193691648/1953514496)
finish=97156886.4min speed=0K/sec

Is the 0K/sec something I need to worry about?

Thanks!
Tobias
-- 
Tobias McNulty, Managing Partner
Caktus Consulting Group, LLC
http://www.caktusgroup.com

* Re: reshape failure
  2011-02-16 20:41   ` Tobias McNulty
@ 2011-02-16 21:06     ` NeilBrown
  2011-02-17 21:39       ` Tobias McNulty
  0 siblings, 1 reply; 21+ messages in thread
From: NeilBrown @ 2011-02-16 21:06 UTC (permalink / raw)
  To: Tobias McNulty; +Cc: linux-raid

On Wed, 16 Feb 2011 15:41:46 -0500 Tobias McNulty <tobias@caktusgroup.com>
wrote:

> On Wed, Feb 16, 2011 at 3:32 PM, NeilBrown <neilb@suse.de> wrote:
> > On Wed, 16 Feb 2011 10:46:32 -0500 Tobias McNulty <tobias@caktusgroup.com>
> >> nas:~# mdadm --assemble /dev/md0 --backup-file=md0.backup /dev/sdc
> >> /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh
> >> mdadm: Failed to restore critical section for reshape, sorry.
> >>
> >> Is my data lost for good?  Is there anything else I can do?
> >
> > Try above command with --verbose.
> > If a message about "too-old timestamp" appears, run
> >
> >  export MDADM_GROW_ALLOW_OLD=1
> >
> > and run the command again.
> >
> > In either case, post the output.
> 
> Wow - it looks like that might have done the trick:
> 
> nas:~# mdadm --verbose --assemble /dev/md0 --backup-file=md0.backup
> /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh
> mdadm: looking for devices for /dev/md0
> mdadm: /dev/sdc is identified as a member of /dev/md0, slot 3.
> mdadm: /dev/sdd is identified as a member of /dev/md0, slot 2.
> mdadm: /dev/sde is identified as a member of /dev/md0, slot 1.
> mdadm: /dev/sdf is identified as a member of /dev/md0, slot 0.
> mdadm: /dev/sdg is identified as a member of /dev/md0, slot 5.
> mdadm: /dev/sdh is identified as a member of /dev/md0, slot 4.
> mdadm:/dev/md0 has an active reshape - checking if critical section
> needs to be restored
> mdadm: too-old timestamp on backup-metadata on md0.backup
> mdadm: Failed to find backup of critical section
> mdadm: Failed to restore critical section for reshape, sorry.
> nas:~# export MDADM_GROW_ALLOW_OLD=1
> nas:~# mdadm --verbose --assemble /dev/md0 --backup-file=md0.backup
> /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh
> mdadm: looking for devices for /dev/md0
> mdadm: /dev/sdc is identified as a member of /dev/md0, slot 3.
> mdadm: /dev/sdd is identified as a member of /dev/md0, slot 2.
> mdadm: /dev/sde is identified as a member of /dev/md0, slot 1.
> mdadm: /dev/sdf is identified as a member of /dev/md0, slot 0.
> mdadm: /dev/sdg is identified as a member of /dev/md0, slot 5.
> mdadm: /dev/sdh is identified as a member of /dev/md0, slot 4.
> mdadm:/dev/md0 has an active reshape - checking if critical section
> needs to be restored
> mdadm: accepting backup with timestamp 1297624561 for array with
> timestamp 1297692473
> mdadm: restoring critical section
> mdadm: added /dev/sde to /dev/md0 as 1
> mdadm: added /dev/sdd to /dev/md0 as 2
> mdadm: added /dev/sdc to /dev/md0 as 3
> mdadm: added /dev/sdh to /dev/md0 as 4
> mdadm: added /dev/sdg to /dev/md0 as 5
> mdadm: added /dev/sdf to /dev/md0 as 0
> mdadm: /dev/md0 has been started with 5 drives and 1 spare.

That is what I expected.

> 
> Now I see this in /etc/mdstat:
> 
> md0 : active raid6 sdf[0] sdg[5](S) sdh[4] sdc[3] sdd[2] sde[1]
>       5860543488 blocks super 0.91 level 6, 64k chunk, algorithm 2 [5/5] [UUUUU]
>       [=>...................]  reshape =  9.9% (193691648/1953514496)
> finish=97156886.4min speed=0K/sec
> 
> Is the 0K/sec something I need to worry about?

Maybe.  If the speed stays at 0K/sec and the 9.9% stays at 9.9%, then yes, it
is something to worry about.

Is there an 'mdadm' running in the background?  Can you 'strace' it for a few
seconds?

What does
   grep . /sys/block/md0/md/*
show?   Maybe do it twice, 1 minute apart.
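
For example, something like this (substitute the pid that pgrep reports,
and interrupt strace with ctrl-C after a few seconds):

   pgrep -a mdadm
   strace -p <pid>
   grep . /sys/block/md0/md/*; sleep 60; grep . /sys/block/md0/md/*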

NeilBrown

* Re: reshape failure
  2011-02-16 21:06     ` NeilBrown
@ 2011-02-17 21:39       ` Tobias McNulty
  2011-05-11 18:06         ` Tobias McNulty
  0 siblings, 1 reply; 21+ messages in thread
From: Tobias McNulty @ 2011-02-17 21:39 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On Wed, Feb 16, 2011 at 4:06 PM, NeilBrown <neilb@suse.de> wrote:
>
> On Wed, 16 Feb 2011 15:41:46 -0500 Tobias McNulty <tobias@caktusgroup.com>
> wrote:
> >
> > Now I see this in /etc/mdstat:
> >
> > md0 : active raid6 sdf[0] sdg[5](S) sdh[4] sdc[3] sdd[2] sde[1]
> >       5860543488 blocks super 0.91 level 6, 64k chunk, algorithm 2 [5/5] [UUUUU]
> >       [=>...................]  reshape =  9.9% (193691648/1953514496)
> > finish=97156886.4min speed=0K/sec
> >
> > Is the 0K/sec something I need to worry about?
>
> Maybe.  If the stays at 0K/sec and the 9.9% stays at 9.9%, then yes.  It is
> something to worry about.

It seems like it was another buggy SATA HBA?? I moved everything back
to the on-board SATA ports (1 of the 2 drives in the OS RAID1 device
and the 5 non-spare devices in the RAID6 -> RAID5 device) and it's
happily reshaping again (even without the MDADM_GROW_ALLOW_OLD magic
this time):

md0 : active raid6 sda[0] sde[4] sdd[3] sdc[2] sdb[1]
      5860543488 blocks super 0.91 level 6, 64k chunk, algorithm 2 [5/5] [UUUUU]
      [==>..................]  reshape = 10.0% (196960192/1953514496)
finish=11376.9min speed=2572K/sec

Is it really possible that I had two buggy SATA cards, from different
manufacturers?  Perhaps the motherboard is at fault?  Or am I missing
something very basic about connecting SATA drives to something other
than the on-board ports?

Currently I'm using a SuperMicro X7SPA-HF [1] motherboard with a
AOC-SASLP-MV8 [2] HBA, and the machine is running Debian squeeze
(2.6.32-5-amd64).

Tobias

[1] http://www.supermicro.com/products/motherboard/ATOM/ICH9/X7SPA.cfm?typ=H&IPMI=Y
[2] http://www.supermicro.com/products/accessories/addon/AOC-SASLP-MV8.cfm
--
Tobias McNulty, Managing Partner
Caktus Consulting Group, LLC
http://www.caktusgroup.com

* Re: reshape failure
  2011-02-17 21:39       ` Tobias McNulty
@ 2011-05-11 18:06         ` Tobias McNulty
  2011-05-11 21:12           ` NeilBrown
  0 siblings, 1 reply; 21+ messages in thread
From: Tobias McNulty @ 2011-05-11 18:06 UTC (permalink / raw)
  To: linux-raid

On Thu, Feb 17, 2011 at 4:39 PM, Tobias McNulty <tobias@caktusgroup.com> wrote:
>
> On Wed, Feb 16, 2011 at 4:06 PM, NeilBrown <neilb@suse.de> wrote:
> >
> > On Wed, 16 Feb 2011 15:41:46 -0500 Tobias McNulty <tobias@caktusgroup.com>
> > wrote:
> > >
> > > Now I see this in /etc/mdstat:
> > >
> > > md0 : active raid6 sdf[0] sdg[5](S) sdh[4] sdc[3] sdd[2] sde[1]
> > >       5860543488 blocks super 0.91 level 6, 64k chunk, algorithm 2 [5/5] [UUUUU]
> > >       [=>...................]  reshape =  9.9% (193691648/1953514496)
> > > finish=97156886.4min speed=0K/sec
> > >
> > > Is the 0K/sec something I need to worry about?
> >
> > Maybe.  If the stays at 0K/sec and the 9.9% stays at 9.9%, then yes.  It is
> > something to worry about.
>
> It seems like it was another buggy SATA HBA?? I moved everything back
> to the on-board SATA ports (1 of the 2 drives in the OS RAID1 device
> and the 5 non-spare devices in the RAID6 -> RAID5 device) and it's
> happily reshaping again (even without the MDADM_GROW_ALLOW_OLD magic
> this time):
>
> md0 : active raid6 sda[0] sde[4] sdd[3] sdc[2] sdb[1]
>       5860543488 blocks super 0.91 level 6, 64k chunk, algorithm 2 [5/5] [UUUUU]
>       [==>..................]  reshape = 10.0% (196960192/1953514496)
> finish=11376.9min speed=2572K/sec
>
> Is it really possible that I had two buggy SATA cards, from different
> manufacturers?  Perhaps the motherboard is at fault?  Or am I missing
> something very basic about connecting SATA drives to something other
> than the on-board ports?
>
> Currently I'm using a SuperMicro X7SPA-HF [1] motherboard with a
> AOC-SASLP-MV8 [2] HBA, and the machine is running Debian squeeze
> (2.6.32-5-amd64).
>
> Tobias
>
> [1] http://www.supermicro.com/products/motherboard/ATOM/ICH9/X7SPA.cfm?typ=H&IPMI=Y
> [2] http://www.supermicro.com/products/accessories/addon/AOC-SASLP-MV8.cfm

So, after figuring out the hardware issues, the reshape appears to
have completed successfully (hurray!), but /proc/mdstat still says
that the array is level 6.  Is there another command I have to run to
put the finishing touches on the conversion?

md0 : active raid6 sda[0] sde[4] sdd[3] sdc[2] sdb[1]
      5860543488 blocks level 6, 64k chunk, algorithm 18 [5/5] [UUUUU]

Thank you!
Tobias
--
Tobias McNulty, Managing Partner
Caktus Consulting Group, LLC
http://www.caktusgroup.com

* Re: reshape failure
  2011-05-11 18:06         ` Tobias McNulty
@ 2011-05-11 21:12           ` NeilBrown
  2011-05-11 21:19             ` Tobias McNulty
       [not found]             ` <BANLkTi=3-PgTqeGqyu5fPZMporA1vk6-Tw@mail.gmail.com>
  0 siblings, 2 replies; 21+ messages in thread
From: NeilBrown @ 2011-05-11 21:12 UTC (permalink / raw)
  To: Tobias McNulty; +Cc: linux-raid

On Wed, 11 May 2011 14:06:23 -0400 Tobias McNulty <tobias@caktusgroup.com>
wrote:

> On Thu, Feb 17, 2011 at 4:39 PM, Tobias McNulty <tobias@caktusgroup.com> wrote:
> >
> > On Wed, Feb 16, 2011 at 4:06 PM, NeilBrown <neilb@suse.de> wrote:
> > >
> > > On Wed, 16 Feb 2011 15:41:46 -0500 Tobias McNulty <tobias@caktusgroup.com>
> > > wrote:
> > > >
> > > > Now I see this in /etc/mdstat:
> > > >
> > > > md0 : active raid6 sdf[0] sdg[5](S) sdh[4] sdc[3] sdd[2] sde[1]
> > > >       5860543488 blocks super 0.91 level 6, 64k chunk, algorithm 2 [5/5] [UUUUU]
> > > >       [=>...................]  reshape =  9.9% (193691648/1953514496)
> > > > finish=97156886.4min speed=0K/sec
> > > >
> > > > Is the 0K/sec something I need to worry about?
> > >
> > > Maybe.  If the stays at 0K/sec and the 9.9% stays at 9.9%, then yes.  It is
> > > something to worry about.
> >
> > It seems like it was another buggy SATA HBA?? I moved everything back
> > to the on-board SATA ports (1 of the 2 drives in the OS RAID1 device
> > and the 5 non-spare devices in the RAID6 -> RAID5 device) and it's
> > happily reshaping again (even without the MDADM_GROW_ALLOW_OLD magic
> > this time):
> >
> > md0 : active raid6 sda[0] sde[4] sdd[3] sdc[2] sdb[1]
> >       5860543488 blocks super 0.91 level 6, 64k chunk, algorithm 2 [5/5] [UUUUU]
> >       [==>..................]  reshape = 10.0% (196960192/1953514496)
> > finish=11376.9min speed=2572K/sec
> >
> > Is it really possible that I had two buggy SATA cards, from different
> > manufacturers?  Perhaps the motherboard is at fault?  Or am I missing
> > something very basic about connecting SATA drives to something other
> > than the on-board ports?
> >
> > Currently I'm using a SuperMicro X7SPA-HF [1] motherboard with a
> > AOC-SASLP-MV8 [2] HBA, and the machine is running Debian squeeze
> > (2.6.32-5-amd64).
> >
> > Tobias
> >
> > [1] http://www.supermicro.com/products/motherboard/ATOM/ICH9/X7SPA.cfm?typ=H&IPMI=Y
> > [2] http://www.supermicro.com/products/accessories/addon/AOC-SASLP-MV8.cfm
> 
> So, after figuring out the hardware issues, the reshape appears to
> have completed successfully (hurray!), but /proc/mdstat still says
> that the array is level 6.  Is there another command I have to run to
> put the finishing touches on the conversion?
> 
> md0 : active raid6 sda[0] sde[4] sdd[3] sdc[2] sdb[1]
>       5860543488 blocks level 6, 64k chunk, algorithm 18 [5/5] [UUUUU]
> 

Just
   mdadm --grow /dev/md0 --level=5

should complete instantly. (assuming I'm correct in thinking that you want
this to be a raid5 array - I don't really remember the details anymore :-)
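
You can check that it took effect with e.g.

   cat /proc/mdstat
   mdadm --detail /dev/md0 | grep 'Raid Level'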


NeilBrown

* Re: reshape failure
  2011-05-11 21:12           ` NeilBrown
@ 2011-05-11 21:19             ` Tobias McNulty
       [not found]             ` <BANLkTi=3-PgTqeGqyu5fPZMporA1vk6-Tw@mail.gmail.com>
  1 sibling, 0 replies; 21+ messages in thread
From: Tobias McNulty @ 2011-05-11 21:19 UTC (permalink / raw)
  To: linux-raid

On Wed, May 11, 2011 at 5:12 PM, NeilBrown <neilb@suse.de> wrote:
>
> On Wed, 11 May 2011 14:06:23 -0400 Tobias McNulty <tobias@caktusgroup.com>
> wrote:
>
> > On Thu, Feb 17, 2011 at 4:39 PM, Tobias McNulty <tobias@caktusgroup.com> wrote:
> > >
> > > On Wed, Feb 16, 2011 at 4:06 PM, NeilBrown <neilb@suse.de> wrote:
> > > >
> > > > On Wed, 16 Feb 2011 15:41:46 -0500 Tobias McNulty <tobias@caktusgroup.com>
> > > > wrote:
> > > > >
> > > > > Now I see this in /etc/mdstat:
> > > > >
> > > > > md0 : active raid6 sdf[0] sdg[5](S) sdh[4] sdc[3] sdd[2] sde[1]
> > > > >       5860543488 blocks super 0.91 level 6, 64k chunk, algorithm 2 [5/5] [UUUUU]
> > > > >       [=>...................]  reshape =  9.9% (193691648/1953514496)
> > > > > finish=97156886.4min speed=0K/sec
> > > > >
> > > > > Is the 0K/sec something I need to worry about?
> > > >
> > > > Maybe.  If the stays at 0K/sec and the 9.9% stays at 9.9%, then yes.  It is
> > > > something to worry about.
> > >
> > > It seems like it was another buggy SATA HBA?? I moved everything back
> > > to the on-board SATA ports (1 of the 2 drives in the OS RAID1 device
> > > and the 5 non-spare devices in the RAID6 -> RAID5 device) and it's
> > > happily reshaping again (even without the MDADM_GROW_ALLOW_OLD magic
> > > this time):
> > >
> > > md0 : active raid6 sda[0] sde[4] sdd[3] sdc[2] sdb[1]
> > >       5860543488 blocks super 0.91 level 6, 64k chunk, algorithm 2 [5/5] [UUUUU]
> > >       [==>..................]  reshape = 10.0% (196960192/1953514496)
> > > finish=11376.9min speed=2572K/sec
> > >
> > > Is it really possible that I had two buggy SATA cards, from different
> > > manufacturers?  Perhaps the motherboard is at fault?  Or am I missing
> > > something very basic about connecting SATA drives to something other
> > > than the on-board ports?
> > >
> > > Currently I'm using a SuperMicro X7SPA-HF [1] motherboard with a
> > > AOC-SASLP-MV8 [2] HBA, and the machine is running Debian squeeze
> > > (2.6.32-5-amd64).
> > >
> > > Tobias
> > >
> > > [1] http://www.supermicro.com/products/motherboard/ATOM/ICH9/X7SPA.cfm?typ=H&IPMI=Y
> > > [2] http://www.supermicro.com/products/accessories/addon/AOC-SASLP-MV8.cfm
> >
> > So, after figuring out the hardware issues, the reshape appears to
> > have completed successfully (hurray!), but /proc/mdstat still says
> > that the array is level 6.  Is there another command I have to run to
> > put the finishing touches on the conversion?
> >
> > md0 : active raid6 sda[0] sde[4] sdd[3] sdc[2] sdb[1]
> >       5860543488 blocks level 6, 64k chunk, algorithm 18 [5/5] [UUUUU]
> >
>
> Just
>   mdadm --grow /dev/md0 --level=5
>
> should complete instantly. (assuming I'm correct in thinking that you want
> this to be a raid5 array - I don't really remember the details anymore :-)


Bingo!  Thanks.

md0 : active raid5 sda[0] sde[4](S) sdd[3] sdc[2] sdb[1]
      5860543488 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

And I even ended up with a spare disk (wasn't sure how that part was
going to work).

Do you always have to run that command twice, or only if the reshape
is interrupted?  At least, I thought that was the same command I ran
originally to kick it off.

Thanks again.

Tobias
--
Tobias McNulty, Managing Partner
Caktus Consulting Group, LLC
http://www.caktusgroup.com

* Re: reshape failure
       [not found]             ` <BANLkTi=3-PgTqeGqyu5fPZMporA1vk6-Tw@mail.gmail.com>
@ 2011-05-11 21:34               ` NeilBrown
  2011-05-12  0:46                 ` Tobias McNulty
  0 siblings, 1 reply; 21+ messages in thread
From: NeilBrown @ 2011-05-11 21:34 UTC (permalink / raw)
  To: Tobias McNulty; +Cc: linux-raid

On Wed, 11 May 2011 17:18:14 -0400 Tobias McNulty <tobias@caktusgroup.com>
wrote:

> On Wed, May 11, 2011 at 5:12 PM, NeilBrown <neilb@suse.de> wrote:
> 
> > On Wed, 11 May 2011 14:06:23 -0400 Tobias McNulty <tobias@caktusgroup.com>
> > wrote:
> >
> > > On Thu, Feb 17, 2011 at 4:39 PM, Tobias McNulty <tobias@caktusgroup.com>
> > wrote:
> > > >
> > > > On Wed, Feb 16, 2011 at 4:06 PM, NeilBrown <neilb@suse.de> wrote:
> > > > >
> > > > > On Wed, 16 Feb 2011 15:41:46 -0500 Tobias McNulty <
> > tobias@caktusgroup.com>
> > > > > wrote:
> > > > > >
> > > > > > Now I see this in /etc/mdstat:
> > > > > >
> > > > > > md0 : active raid6 sdf[0] sdg[5](S) sdh[4] sdc[3] sdd[2] sde[1]
> > > > > >       5860543488 blocks super 0.91 level 6, 64k chunk, algorithm 2
> > [5/5] [UUUUU]
> > > > > >       [=>...................]  reshape =  9.9%
> > (193691648/1953514496)
> > > > > > finish=97156886.4min speed=0K/sec
> > > > > >
> > > > > > Is the 0K/sec something I need to worry about?
> > > > >
> > > > > Maybe.  If the stays at 0K/sec and the 9.9% stays at 9.9%, then yes.
> >  It is
> > > > > something to worry about.
> > > >
> > > > It seems like it was another buggy SATA HBA?? I moved everything back
> > > > to the on-board SATA ports (1 of the 2 drives in the OS RAID1 device
> > > > and the 5 non-spare devices in the RAID6 -> RAID5 device) and it's
> > > > happily reshaping again (even without the MDADM_GROW_ALLOW_OLD magic
> > > > this time):
> > > >
> > > > md0 : active raid6 sda[0] sde[4] sdd[3] sdc[2] sdb[1]
> > > >       5860543488 blocks super 0.91 level 6, 64k chunk, algorithm 2
> > [5/5] [UUUUU]
> > > >       [==>..................]  reshape = 10.0% (196960192/1953514496)
> > > > finish=11376.9min speed=2572K/sec
> > > >
> > > > Is it really possible that I had two buggy SATA cards, from different
> > > > manufacturers?  Perhaps the motherboard is at fault?  Or am I missing
> > > > something very basic about connecting SATA drives to something other
> > > > than the on-board ports?
> > > >
> > > > Currently I'm using a SuperMicro X7SPA-HF [1] motherboard with a
> > > > AOC-SASLP-MV8 [2] HBA, and the machine is running Debian squeeze
> > > > (2.6.32-5-amd64).
> > > >
> > > > Tobias
> > > >
> > > > [1]
> > http://www.supermicro.com/products/motherboard/ATOM/ICH9/X7SPA.cfm?typ=H&IPMI=Y
> > > > [2]
> > http://www.supermicro.com/products/accessories/addon/AOC-SASLP-MV8.cfm
> > >
> > > So, after figuring out the hardware issues, the reshape appears to
> > > have completed successfully (hurray!), but /proc/mdstat still says
> > > that the array is level 6.  Is there another command I have to run to
> > > put the finishing touches on the conversion?
> > >
> > > md0 : active raid6 sda[0] sde[4] sdd[3] sdc[2] sdb[1]
> > >       5860543488 blocks level 6, 64k chunk, algorithm 18 [5/5] [UUUUU]
> > >
> >
> > Just
> >   mdadm --grow /dev/md0 --level=5
> >
> > should complete instantly. (assuming I'm correct in thinking that you want
> > this to be a raid5 array - I don't really remember the details anymore :-)
> 
> 
> Bingo!  Thanks.
> 
> md0 : active raid5 sda[0] sde[4](S) sdd[3] sdc[2] sdb[1]
>       5860543488 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
> 
> And I even ended up with a spare disk (wasn't sure how that part was going
> to work).
> 
> Do you always have to run that command twice, or only if the reshape is
> interrupted?  At least, I thought that was the same command I ran originally
> to kick it off.

Only if it is interrupted.  The array doesn't know that a level change is
needed after the layout change is completed, only the mdadm process knows
that.  And it has died.

I could probably get the array itself to 'know' this... one day.

NeilBrown


> 
> Thanks again.
> 
> Tobias



* Re: reshape failure
  2011-05-11 21:34               ` NeilBrown
@ 2011-05-12  0:46                 ` Tobias McNulty
  0 siblings, 0 replies; 21+ messages in thread
From: Tobias McNulty @ 2011-05-12  0:46 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On Wed, May 11, 2011 at 5:34 PM, NeilBrown <neilb@suse.de> wrote:
> On Wed, 11 May 2011 17:18:14 -0400 Tobias McNulty <tobias@caktusgroup.com>
> wrote:
>
>> On Wed, May 11, 2011 at 5:12 PM, NeilBrown <neilb@suse.de> wrote:
>>
>> > On Wed, 11 May 2011 14:06:23 -0400 Tobias McNulty <tobias@caktusgroup.com>
>> > wrote:
>> >
>> > > On Thu, Feb 17, 2011 at 4:39 PM, Tobias McNulty <tobias@caktusgroup.com>
>> > wrote:
>> > > >
>> > > > On Wed, Feb 16, 2011 at 4:06 PM, NeilBrown <neilb@suse.de> wrote:
>> > > > >
>> > > > > On Wed, 16 Feb 2011 15:41:46 -0500 Tobias McNulty <
>> > tobias@caktusgroup.com>
>> > > > > wrote:
>> > > > > >
>> > > > > > Now I see this in /etc/mdstat:
>> > > > > >
>> > > > > > md0 : active raid6 sdf[0] sdg[5](S) sdh[4] sdc[3] sdd[2] sde[1]
>> > > > > >       5860543488 blocks super 0.91 level 6, 64k chunk, algorithm 2
>> > [5/5] [UUUUU]
>> > > > > >       [=>...................]  reshape =  9.9%
>> > (193691648/1953514496)
>> > > > > > finish=97156886.4min speed=0K/sec
>> > > > > >
>> > > > > > Is the 0K/sec something I need to worry about?
>> > > > >
>> > > > > Maybe.  If the stays at 0K/sec and the 9.9% stays at 9.9%, then yes.
>> >  It is
>> > > > > something to worry about.
>> > > >
>> > > > It seems like it was another buggy SATA HBA?? I moved everything back
>> > > > to the on-board SATA ports (1 of the 2 drives in the OS RAID1 device
>> > > > and the 5 non-spare devices in the RAID6 -> RAID5 device) and it's
>> > > > happily reshaping again (even without the MDADM_GROW_ALLOW_OLD magic
>> > > > this time):
>> > > >
>> > > > md0 : active raid6 sda[0] sde[4] sdd[3] sdc[2] sdb[1]
>> > > >       5860543488 blocks super 0.91 level 6, 64k chunk, algorithm 2
>> > [5/5] [UUUUU]
>> > > >       [==>..................]  reshape = 10.0% (196960192/1953514496)
>> > > > finish=11376.9min speed=2572K/sec
>> > > >
>> > > > Is it really possible that I had two buggy SATA cards, from different
>> > > > manufacturers?  Perhaps the motherboard is at fault?  Or am I missing
>> > > > something very basic about connecting SATA drives to something other
>> > > > than the on-board ports?
>> > > >
>> > > > Currently I'm using a SuperMicro X7SPA-HF [1] motherboard with a
>> > > > AOC-SASLP-MV8 [2] HBA, and the machine is running Debian squeeze
>> > > > (2.6.32-5-amd64).
>> > > >
>> > > > Tobias
>> > > >
>> > > > [1]
>> > http://www.supermicro.com/products/motherboard/ATOM/ICH9/X7SPA.cfm?typ=H&IPMI=Y
>> > > > [2]
>> > http://www.supermicro.com/products/accessories/addon/AOC-SASLP-MV8.cfm
>> > >
>> > > So, after figuring out the hardware issues, the reshape appears to
>> > > have completed successfully (hurray!), but /proc/mdstat still says
>> > > that the array is level 6.  Is there another command I have to run to
>> > > put the finishing touches on the conversion?
>> > >
>> > > md0 : active raid6 sda[0] sde[4] sdd[3] sdc[2] sdb[1]
>> > >       5860543488 blocks level 6, 64k chunk, algorithm 18 [5/5] [UUUUU]
>> > >
>> >
>> > Just
>> >   mdadm --grow /dev/md0 --level=5
>> >
>> > should complete instantly. (assuming I'm correct in thinking that you want
>> > this to be a raid5 array - I don't really remember the details anymore :-)
>>
>>
>> Bingo!  Thanks.
>>
>> md0 : active raid5 sda[0] sde[4](S) sdd[3] sdc[2] sdb[1]
>>       5860543488 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
>>
>> And I even ended up with a spare disk (wasn't sure how that part was going
>> to work).
>>
>> Do you always have to run that command twice, or only if the reshape is
>> interrupted?  At least, I thought that was the same command I ran originally
>> to kick it off.
>
> Only if it is interrupted.  The array doesn't know that a level change is
> needed after the layout change is completed, only the mdadm process knows
> that.  And it has died.
>
> I could probably get the array itself to 'know' this... one day.
>
> NeilBrown

Hey, it makes perfect sense to me now that I know it's the expected
behavior.  I might have even tried it myself if I hadn't been worried about
screwing up the array again. :-)

Thanks
Tobias
-- 
Tobias McNulty, Managing Partner
Caktus Consulting Group, LLC
http://www.caktusgroup.com

* Reshape Failure
@ 2023-09-03 21:39 Jason Moss
  2023-09-04  1:41 ` Yu Kuai
  0 siblings, 1 reply; 21+ messages in thread
From: Jason Moss @ 2023-09-03 21:39 UTC (permalink / raw)
  To: linux-raid

Hello,

I recently attempted to add a new drive to my 8-drive RAID 6 array,
growing it to 9 drives. I've done similar before with the same array,
having previously grown it from 6 drives to 7 and then from 7 to 8
with no issues. Drives are WD Reds, most older than 2019, some
(including the newest) newer, but all confirmed CMR and not SMR.

Process used to expand the array:
mdadm --add /dev/md0 /dev/sdb1
mdadm --grow --raid-devices=9 --backup-file=/root/grow_md0.bak /dev/md0

The reshape started off fine, the process was underway, and the volume
was still usable as expected. However, 15-30 minutes into the reshape,
I lost access to the contents of the drive. Checking /proc/mdstat, the
reshape was stopped at 0.6% with the counter not incrementing at all.
Any process accessing the array would just hang until killed. I waited
a half hour and there was still no further change to the counter. At
this point, I restarted the server and found that when it came back up
it would begin reshaping again, but only very briefly, under 30
seconds, but the counter would be increasing during that time.

I searched furiously for ideas and tried stopping and reassembling the
array, assembling with an invalid-backup flag, echoing "frozen" then
"reshape" to the sync_action file, and echoing "max" to the sync_max
file. Nothing ever seemed to make a difference.
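
(Roughly, the commands tried were of this form - the exact device
arguments here are illustrative:

   mdadm --stop /dev/md0
   mdadm --assemble /dev/md0 --backup-file=/root/grow_md0.bak \
       --invalid-backup /dev/sd[a-i]1
   echo frozen > /sys/block/md0/md/sync_action
   echo reshape > /sys/block/md0/md/sync_action
   echo max > /sys/block/md0/md/sync_max)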

Here is where I slightly panicked, worried that I'd borked my array. I
powered off the server again and disconnected the newly added drive,
assuming that since it was the one change, it might be the problem
despite having been burn-in tested. I figured I could rush order a
replacement and, as long as the reshape continued, rebuild onto the new
drive once it finished. However, this made no difference and the
reshape still made no progress.

Much searching later, I'd found nothing substantially different from
what I'd already tried, but one common thread in other people's issues
was bad drives, so I ran a self-test against each of the existing
drives and found one that failed the read test. Thinking I had the
culprit, I dropped that drive out of the array and assembled the array
again, but the same behavior persisted: the array reshapes very
briefly, then completely stops.

Down to 0 drives of redundancy (in the reshaped section at least), not
finding any new ideas on any of the forums, mailing list, wiki, etc,
and very frustrated, I took a break, bought all new drives to build a
new array in another server and restored from a backup. However, there
is still some data not captured by the most recent backup that I would
like to recover, and I'd also like to solve the problem purely to
understand what happened and how to recover in the future.

Is there anything else I should try to recover this array, or is this
a lost cause?

Details requested by the wiki to follow and I'm happy to collect any
further data that would assist. /dev/sdb is the new drive that was
added, then disconnected. /dev/sdh is the drive that failed a
self-test and was removed from the array.

Thank you in advance for any help provided!


$ uname -a
Linux Blyth 5.15.0-76-generic #83-Ubuntu SMP Thu Jun 15 19:16:32 UTC
2023 x86_64 x86_64 x86_64 GNU/Linux

$ mdadm --version
mdadm - v4.2 - 2021-12-30


$ sudo smartctl -H -i -l scterc /dev/sda
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD30EFRX-68EUZN0
Serial Number:    WD-WCC4N7AT7R7X
LU WWN Device Id: 5 0014ee 268545f93
Firmware Version: 82.00A82
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Sep  3 13:27:55 2023 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)

$ sudo smartctl -H -i -l scterc /dev/sdb
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD30EFRX-68EUZN0
Serial Number:    WD-WXG1A8UGLS42
LU WWN Device Id: 5 0014ee 2b75ef53b
Firmware Version: 80.00A80
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Sep  3 13:28:19 2023 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)

$ sudo smartctl -H -i -l scterc /dev/sdc
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD30EFRX-68EUZN0
Serial Number:    WD-WCC4N4HYL32Y
LU WWN Device Id: 5 0014ee 2630752f8
Firmware Version: 82.00A82
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Sep  3 13:28:20 2023 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)

$ sudo smartctl -H -i -l scterc /dev/sdd
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD30EFRX-68N32N0
Serial Number:    WD-WCC7K1FF6DYK
LU WWN Device Id: 5 0014ee 2ba952a30
Firmware Version: 82.00A82
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Sep  3 13:28:21 2023 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)

$ sudo smartctl -H -i -l scterc /dev/sde
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD30EFRX-68EUZN0
Serial Number:    WD-WCC4N5ZHTRJF
LU WWN Device Id: 5 0014ee 2b88b83bb
Firmware Version: 82.00A82
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Sep  3 13:28:22 2023 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)

$ sudo smartctl -H -i -l scterc /dev/sdf
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD30EFRX-68AX9N0
Serial Number:    WD-WMC1T3804790
LU WWN Device Id: 5 0014ee 6036b6826
Firmware Version: 80.00A80
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Sep  3 13:28:23 2023 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)

$ sudo smartctl -H -i -l scterc /dev/sdg
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD30EFRX-68EUZN0
Serial Number:    WD-WMC4N0H692Z9
LU WWN Device Id: 5 0014ee 65af39740
Firmware Version: 82.00A82
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Sep  3 13:28:24 2023 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)

$ sudo smartctl -H -i -l scterc /dev/sdh
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD30EFRX-68EUZN0
Serial Number:    WD-WMC4N0K5S750
LU WWN Device Id: 5 0014ee 6b048d9ca
Firmware Version: 82.00A82
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Sep  3 13:28:24 2023 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)

$ sudo smartctl -H -i -l scterc /dev/sdi
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD30EFRX-68AX9N0
Serial Number:    WD-WMC1T1502475
LU WWN Device Id: 5 0014ee 058d2e5cb
Firmware Version: 80.00A80
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Sep  3 13:28:27 2023 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)


$ sudo mdadm --examine /dev/sda
/dev/sda:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
$ sudo mdadm --examine /dev/sda1
/dev/sda1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0xd
     Array UUID : 440dc11e:079308b1:131eda79:9a74c670
           Name : Blyth:0  (local to host Blyth)
  Creation Time : Tue Aug  4 23:47:57 2015
     Raid Level : raid6
   Raid Devices : 9

 Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
     Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
  Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
    Data Offset : 247808 sectors
   Super Offset : 8 sectors
   Unused Space : before=247728 sectors, after=14336 sectors
          State : clean
    Device UUID : 8ca60ad5:60d19333:11b24820:91453532

Internal Bitmap : 8 sectors from superblock
  Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
  Delta Devices : 1 (8->9)

    Update Time : Tue Jul 11 23:12:08 2023
  Bad Block Log : 512 entries available at offset 24 sectors - bad
blocks present.
       Checksum : b6d8f4d1 - correct
         Events : 181105

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 7
   Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)

$ sudo mdadm --examine /dev/sdb
/dev/sdb:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
$ sudo mdadm --examine /dev/sdb1
/dev/sdb1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x5
     Array UUID : 440dc11e:079308b1:131eda79:9a74c670
           Name : Blyth:0  (local to host Blyth)
  Creation Time : Tue Aug  4 23:47:57 2015
     Raid Level : raid6
   Raid Devices : 9

 Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
     Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
  Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
    Data Offset : 247808 sectors
   Super Offset : 8 sectors
   Unused Space : before=247728 sectors, after=14336 sectors
          State : clean
    Device UUID : 386d3001:16447e43:4d2a5459:85618d11

Internal Bitmap : 8 sectors from superblock
  Reshape pos'n : 124207104 (118.45 GiB 127.19 GB)
  Delta Devices : 1 (8->9)

    Update Time : Tue Jul 11 00:02:59 2023
  Bad Block Log : 512 entries available at offset 24 sectors
       Checksum : b544a39 - correct
         Events : 181077

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 8
   Array State : AAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)

$ sudo mdadm --examine /dev/sdc
/dev/sdc:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
$ sudo mdadm --examine /dev/sdc1
/dev/sdc1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0xd
     Array UUID : 440dc11e:079308b1:131eda79:9a74c670
           Name : Blyth:0  (local to host Blyth)
  Creation Time : Tue Aug  4 23:47:57 2015
     Raid Level : raid6
   Raid Devices : 9

 Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
     Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
  Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
    Data Offset : 247808 sectors
   Super Offset : 8 sectors
   Unused Space : before=247720 sectors, after=14336 sectors
          State : clean
    Device UUID : 1798ec4f:72c56905:4e74ea61:2468db75

Internal Bitmap : 8 sectors from superblock
  Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
  Delta Devices : 1 (8->9)

    Update Time : Tue Jul 11 23:12:08 2023
  Bad Block Log : 512 entries available at offset 72 sectors - bad
blocks present.
       Checksum : 88d8b8fc - correct
         Events : 181105

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 4
   Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)

$ sudo mdadm --examine /dev/sdd
/dev/sdd:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
$ sudo mdadm --examine /dev/sdd1
/dev/sdd1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x5
     Array UUID : 440dc11e:079308b1:131eda79:9a74c670
           Name : Blyth:0  (local to host Blyth)
  Creation Time : Tue Aug  4 23:47:57 2015
     Raid Level : raid6
   Raid Devices : 9

 Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
     Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
  Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
    Data Offset : 247808 sectors
   Super Offset : 8 sectors
   Unused Space : before=247728 sectors, after=14336 sectors
          State : clean
    Device UUID : a198095b:f54d26a9:deb3be8f:d6de9be1

Internal Bitmap : 8 sectors from superblock
  Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
  Delta Devices : 1 (8->9)

    Update Time : Tue Jul 11 23:12:08 2023
  Bad Block Log : 512 entries available at offset 24 sectors
       Checksum : d1471d9d - correct
         Events : 181105

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 6
   Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)

$ sudo mdadm --examine /dev/sde
/dev/sde:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
$ sudo mdadm --examine /dev/sde1
/dev/sde1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x5
     Array UUID : 440dc11e:079308b1:131eda79:9a74c670
           Name : Blyth:0  (local to host Blyth)
  Creation Time : Tue Aug  4 23:47:57 2015
     Raid Level : raid6
   Raid Devices : 9

 Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
     Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
  Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
    Data Offset : 247808 sectors
   Super Offset : 8 sectors
   Unused Space : before=247720 sectors, after=14336 sectors
          State : clean
    Device UUID : acf7ba2e:35d2fa91:6b12b0ce:33a73af5

Internal Bitmap : 8 sectors from superblock
  Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
  Delta Devices : 1 (8->9)

    Update Time : Tue Jul 11 23:12:08 2023
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : e05d0278 - correct
         Events : 181105

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 5
   Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)

$ sudo mdadm --examine /dev/sdf
/dev/sdf:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
$ sudo mdadm --examine /dev/sdf1
/dev/sdf1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x5
     Array UUID : 440dc11e:079308b1:131eda79:9a74c670
           Name : Blyth:0  (local to host Blyth)
  Creation Time : Tue Aug  4 23:47:57 2015
     Raid Level : raid6
   Raid Devices : 9

 Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
     Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
  Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
    Data Offset : 247808 sectors
   Super Offset : 8 sectors
   Unused Space : before=247720 sectors, after=14336 sectors
          State : clean
    Device UUID : 31e7b86d:c274ff45:aa6dab50:2ff058c6

Internal Bitmap : 8 sectors from superblock
  Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
  Delta Devices : 1 (8->9)

    Update Time : Tue Jul 11 23:12:08 2023
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 26792cc0 - correct
         Events : 181105

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)

$ sudo mdadm --examine /dev/sdg
/dev/sdg:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
$ sudo mdadm --examine /dev/sdg1
/dev/sdg1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x5
     Array UUID : 440dc11e:079308b1:131eda79:9a74c670
           Name : Blyth:0  (local to host Blyth)
  Creation Time : Tue Aug  4 23:47:57 2015
     Raid Level : raid6
   Raid Devices : 9

 Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
     Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
  Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
    Data Offset : 247808 sectors
   Super Offset : 8 sectors
   Unused Space : before=247720 sectors, after=14336 sectors
          State : clean
    Device UUID : 74476ce7:4edc23f6:08120711:ba281425

Internal Bitmap : 8 sectors from superblock
  Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
  Delta Devices : 1 (8->9)

    Update Time : Tue Jul 11 23:12:08 2023
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 6f67d179 - correct
         Events : 181105

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)

$ sudo mdadm --examine /dev/sdh
/dev/sdh:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
$ sudo mdadm --examine /dev/sdh1
/dev/sdh1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0xd
     Array UUID : 440dc11e:079308b1:131eda79:9a74c670
           Name : Blyth:0  (local to host Blyth)
  Creation Time : Tue Aug  4 23:47:57 2015
     Raid Level : raid6
   Raid Devices : 9

 Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
     Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
  Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
    Data Offset : 247808 sectors
   Super Offset : 8 sectors
   Unused Space : before=247720 sectors, after=14336 sectors
          State : clean
    Device UUID : 31c08263:b135f0f5:763bc86b:f81d7296

Internal Bitmap : 8 sectors from superblock
  Reshape pos'n : 124207104 (118.45 GiB 127.19 GB)
  Delta Devices : 1 (8->9)

    Update Time : Tue Jul 11 20:09:14 2023
  Bad Block Log : 512 entries available at offset 72 sectors - bad
blocks present.
       Checksum : b7696b68 - correct
         Events : 181089

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : AAAAAAAA. ('A' == active, '.' == missing, 'R' == replacing)

$ sudo mdadm --examine /dev/sdi
/dev/sdi:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
$ sudo mdadm --examine /dev/sdi1
/dev/sdi1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x5
     Array UUID : 440dc11e:079308b1:131eda79:9a74c670
           Name : Blyth:0  (local to host Blyth)
  Creation Time : Tue Aug  4 23:47:57 2015
     Raid Level : raid6
   Raid Devices : 9

 Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
     Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
  Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
    Data Offset : 247808 sectors
   Super Offset : 8 sectors
   Unused Space : before=247720 sectors, after=14336 sectors
          State : clean
    Device UUID : ac1063fc:d9d66e6d:f3de33da:b396f483

Internal Bitmap : 8 sectors from superblock
  Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
  Delta Devices : 1 (8->9)

    Update Time : Tue Jul 11 23:12:08 2023
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 23b6d024 - correct
         Events : 181105

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)

$ sudo mdadm --detail /dev/md0
/dev/md0:
           Version : 1.2
        Raid Level : raid6
     Total Devices : 9
       Persistence : Superblock is persistent

             State : inactive
   Working Devices : 9

     Delta Devices : 1, (-1->0)
         New Level : raid6
        New Layout : left-symmetric
     New Chunksize : 512K

              Name : Blyth:0  (local to host Blyth)
              UUID : 440dc11e:079308b1:131eda79:9a74c670
            Events : 181105

    Number   Major   Minor   RaidDevice

       -       8        1        -        /dev/sda1
       -       8      129        -        /dev/sdi1
       -       8      113        -        /dev/sdh1
       -       8       97        -        /dev/sdg1
       -       8       81        -        /dev/sdf1
       -       8       65        -        /dev/sde1
       -       8       49        -        /dev/sdd1
       -       8       33        -        /dev/sdc1
       -       8       17        -        /dev/sdb1

$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
[raid4] [raid10]
md0 : inactive sdb1[9](S) sdi1[4](S) sdf1[0](S) sdg1[1](S) sdh1[3](S)
sda1[8](S) sdd1[7](S) sdc1[6](S) sde1[5](S)
      26353689600 blocks super 1.2

unused devices: <none>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Reshape Failure
  2023-09-03 21:39 Reshape Failure Jason Moss
@ 2023-09-04  1:41 ` Yu Kuai
  2023-09-04 16:38   ` Jason Moss
  0 siblings, 1 reply; 21+ messages in thread
From: Yu Kuai @ 2023-09-04  1:41 UTC (permalink / raw)
  To: Jason Moss, linux-raid; +Cc: yangerkun@huawei.com, yukuai (C)

Hi,

On 2023/09/04 5:39, Jason Moss wrote:
> Hello,
> 
> I recently attempted to add a new drive to my 8-drive RAID 6 array,
> growing it to 9 drives. I've done similar before with the same array,
> having previously grown it from 6 drives to 7 and then from 7 to 8
> with no issues. Drives are WD Reds, most older than 2019, some
> (including the newest) newer, but all confirmed CMR and not SMR.
> 
> Process used to expand the array:
> mdadm --add /dev/md0 /dev/sdb1
> mdadm --grow --raid-devices=9 --backup-file=/root/grow_md0.bak /dev/md0
> 
> The reshape started off fine, the process was underway, and the volume
> was still usable as expected. However, 15-30 minutes into the reshape,
> I lost access to the contents of the drive. Checking /proc/mdstat, the
> reshape was stopped at 0.6% with the counter not incrementing at all.
> Any process accessing the array would just hang until killed. I waited

What kernel version are you using? It would also be very helpful if you
could collect the stacks of all stuck threads. There is a known deadlock
in raid5 related to reshape, and it is fixed in v6.5:

https://lore.kernel.org/r/20230512015610.821290-6-yukuai1@huaweicloud.com
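
For example, something like the sketch below is usually enough (the exact
PIDs and thread names will differ on your system; md0_raid6/md0_reshape
are only examples):

  # list the md kernel threads and note any stuck in D state
  ps -eo pid,stat,comm | grep -E 'md0|raid5'
  # dump the kernel stack of each stuck thread
  cat /proc/<pid>/stack
  # or, if sysrq is enabled, dump every blocked task to the kernel log at once
  echo w > /proc/sysrq-trigger
  dmesg | tail -n 200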

> a half hour and there was still no further change to the counter. At
> this point, I restarted the server and found that when it came back up
> it would begin reshaping again, but only very briefly, under 30
> seconds, but the counter would be increasing during that time.
> 
> I searched furiously for ideas and tried stopping and reassembling the
> array, assembling with an invalid-backup flag, echoing "frozen" then
> "reshape" to the sync_action file, and echoing "max" to the sync_max
> file. Nothing ever seemed to make a difference.
> 

Don't do this before v6.5; echoing "reshape" while a reshape is still in
progress will corrupt your data:

https://lore.kernel.org/r/20230512015610.821290-3-yukuai1@huaweicloud.com
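
(For context, the writes being discussed are the md sysfs controls; a
rough sketch, assuming the array is /dev/md0:

  echo frozen  > /sys/block/md0/md/sync_action   # pause the running sync/reshape
  echo reshape > /sys/block/md0/md/sync_action   # restart it - unsafe before v6.5, see above
  echo max     > /sys/block/md0/md/sync_max      # remove any limit on how far the reshape may run
)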

Thanks,
Kuai

> Here is where I slightly panicked, worried that I'd borked my array,
> and powered off the server again and disconnected the new drive that
> was just added, assuming that since it was the change, it may be the
> problem despite having burn-in tested it, and figuring that I'll rush
> order a new drive, so long as the reshape continues and I can just
> rebuild onto a new drive once the reshape finishes. However, this made
> no difference and the array continued to not rebuild.
> 
> Much searching later, I'd found nothing substantially different than
> I'd already tried and one of the common threads in other people's
> issues was bad drives, so I ran a self-test against each of the
> existing drives and found one drive that failed the read test.
> Thinking I had the culprit now, I dropped that drive out of the array
> and assembled the array again, but the same behavior persists. The
> array reshapes very briefly, then completely stops.
> 
> Down to 0 drives of redundancy (in the reshaped section at least), not
> finding any new ideas on any of the forums, mailing list, wiki, etc,
> and very frustrated, I took a break, bought all new drives to build a
> new array in another server and restored from a backup. However, there
> is still some data not captured by the most recent backup that I would
> like to recover, and I'd also like to solve the problem purely to
> understand what happened and how to recover in the future.
> 
> Is there anything else I should try to recover this array, or is this
> a lost cause?
> 
> Details requested by the wiki to follow and I'm happy to collect any
> further data that would assist. /dev/sdb is the new drive that was
> added, then disconnected. /dev/sdh is the drive that failed a
> self-test and was removed from the array.
> 
> Thank you in advance for any help provided!
> 
> 
> $ uname -a
> Linux Blyth 5.15.0-76-generic #83-Ubuntu SMP Thu Jun 15 19:16:32 UTC
> 2023 x86_64 x86_64 x86_64 GNU/Linux
> 
> $ mdadm --version
> mdadm - v4.2 - 2021-12-30
> 
> 
> $ sudo smartctl -H -i -l scterc /dev/sda
> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> 
> === START OF INFORMATION SECTION ===
> Model Family:     Western Digital Red
> Device Model:     WDC WD30EFRX-68EUZN0
> Serial Number:    WD-WCC4N7AT7R7X
> LU WWN Device Id: 5 0014ee 268545f93
> Firmware Version: 82.00A82
> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> Sector Sizes:     512 bytes logical, 4096 bytes physical
> Rotation Rate:    5400 rpm
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   ACS-2 (minor revision not indicated)
> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> Local Time is:    Sun Sep  3 13:27:55 2023 PDT
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
> 
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> 
> SCT Error Recovery Control:
>             Read:     70 (7.0 seconds)
>            Write:     70 (7.0 seconds)
> 
> $ sudo smartctl -H -i -l scterc /dev/sda
> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> 
> === START OF INFORMATION SECTION ===
> Model Family:     Western Digital Red
> Device Model:     WDC WD30EFRX-68EUZN0
> Serial Number:    WD-WCC4N7AT7R7X
> LU WWN Device Id: 5 0014ee 268545f93
> Firmware Version: 82.00A82
> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> Sector Sizes:     512 bytes logical, 4096 bytes physical
> Rotation Rate:    5400 rpm
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   ACS-2 (minor revision not indicated)
> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> Local Time is:    Sun Sep  3 13:28:16 2023 PDT
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
> 
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> 
> SCT Error Recovery Control:
>             Read:     70 (7.0 seconds)
>            Write:     70 (7.0 seconds)
> 
> $ sudo smartctl -H -i -l scterc /dev/sdb
> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> 
> === START OF INFORMATION SECTION ===
> Model Family:     Western Digital Red
> Device Model:     WDC WD30EFRX-68EUZN0
> Serial Number:    WD-WXG1A8UGLS42
> LU WWN Device Id: 5 0014ee 2b75ef53b
> Firmware Version: 80.00A80
> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> Sector Sizes:     512 bytes logical, 4096 bytes physical
> Rotation Rate:    5400 rpm
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   ACS-2 (minor revision not indicated)
> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> Local Time is:    Sun Sep  3 13:28:19 2023 PDT
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
> 
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> 
> SCT Error Recovery Control:
>             Read:     70 (7.0 seconds)
>            Write:     70 (7.0 seconds)
> 
> $ sudo smartctl -H -i -l scterc /dev/sdc
> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> 
> === START OF INFORMATION SECTION ===
> Model Family:     Western Digital Red
> Device Model:     WDC WD30EFRX-68EUZN0
> Serial Number:    WD-WCC4N4HYL32Y
> LU WWN Device Id: 5 0014ee 2630752f8
> Firmware Version: 82.00A82
> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> Sector Sizes:     512 bytes logical, 4096 bytes physical
> Rotation Rate:    5400 rpm
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   ACS-2 (minor revision not indicated)
> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> Local Time is:    Sun Sep  3 13:28:20 2023 PDT
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
> 
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> 
> SCT Error Recovery Control:
>             Read:     70 (7.0 seconds)
>            Write:     70 (7.0 seconds)
> 
> $ sudo smartctl -H -i -l scterc /dev/sdd
> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> 
> === START OF INFORMATION SECTION ===
> Model Family:     Western Digital Red
> Device Model:     WDC WD30EFRX-68N32N0
> Serial Number:    WD-WCC7K1FF6DYK
> LU WWN Device Id: 5 0014ee 2ba952a30
> Firmware Version: 82.00A82
> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> Sector Sizes:     512 bytes logical, 4096 bytes physical
> Rotation Rate:    5400 rpm
> Form Factor:      3.5 inches
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   ACS-3 T13/2161-D revision 5
> SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
> Local Time is:    Sun Sep  3 13:28:21 2023 PDT
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
> 
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> 
> SCT Error Recovery Control:
>             Read:     70 (7.0 seconds)
>            Write:     70 (7.0 seconds)
> 
> $ sudo smartctl -H -i -l scterc /dev/sde
> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> 
> === START OF INFORMATION SECTION ===
> Model Family:     Western Digital Red
> Device Model:     WDC WD30EFRX-68EUZN0
> Serial Number:    WD-WCC4N5ZHTRJF
> LU WWN Device Id: 5 0014ee 2b88b83bb
> Firmware Version: 82.00A82
> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> Sector Sizes:     512 bytes logical, 4096 bytes physical
> Rotation Rate:    5400 rpm
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   ACS-2 (minor revision not indicated)
> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> Local Time is:    Sun Sep  3 13:28:22 2023 PDT
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
> 
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> 
> SCT Error Recovery Control:
>             Read:     70 (7.0 seconds)
>            Write:     70 (7.0 seconds)
> 
> $ sudo smartctl -H -i -l scterc /dev/sdf
> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> 
> === START OF INFORMATION SECTION ===
> Model Family:     Western Digital Red
> Device Model:     WDC WD30EFRX-68AX9N0
> Serial Number:    WD-WMC1T3804790
> LU WWN Device Id: 5 0014ee 6036b6826
> Firmware Version: 80.00A80
> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> Sector Sizes:     512 bytes logical, 4096 bytes physical
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   ACS-2 (minor revision not indicated)
> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> Local Time is:    Sun Sep  3 13:28:23 2023 PDT
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
> 
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> 
> SCT Error Recovery Control:
>             Read:     70 (7.0 seconds)
>            Write:     70 (7.0 seconds)
> 
> $ sudo smartctl -H -i -l scterc /dev/sdg
> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> 
> === START OF INFORMATION SECTION ===
> Model Family:     Western Digital Red
> Device Model:     WDC WD30EFRX-68EUZN0
> Serial Number:    WD-WMC4N0H692Z9
> LU WWN Device Id: 5 0014ee 65af39740
> Firmware Version: 82.00A82
> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> Sector Sizes:     512 bytes logical, 4096 bytes physical
> Rotation Rate:    5400 rpm
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   ACS-2 (minor revision not indicated)
> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> Local Time is:    Sun Sep  3 13:28:24 2023 PDT
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
> 
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> 
> SCT Error Recovery Control:
>             Read:     70 (7.0 seconds)
>            Write:     70 (7.0 seconds)
> 
> $ sudo smartctl -H -i -l scterc /dev/sdh
> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> 
> === START OF INFORMATION SECTION ===
> Model Family:     Western Digital Red
> Device Model:     WDC WD30EFRX-68EUZN0
> Serial Number:    WD-WMC4N0K5S750
> LU WWN Device Id: 5 0014ee 6b048d9ca
> Firmware Version: 82.00A82
> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> Sector Sizes:     512 bytes logical, 4096 bytes physical
> Rotation Rate:    5400 rpm
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   ACS-2 (minor revision not indicated)
> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> Local Time is:    Sun Sep  3 13:28:24 2023 PDT
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
> 
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> 
> SCT Error Recovery Control:
>             Read:     70 (7.0 seconds)
>            Write:     70 (7.0 seconds)
> 
> $ sudo smartctl -H -i -l scterc /dev/sdi
> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> 
> === START OF INFORMATION SECTION ===
> Model Family:     Western Digital Red
> Device Model:     WDC WD30EFRX-68AX9N0
> Serial Number:    WD-WMC1T1502475
> LU WWN Device Id: 5 0014ee 058d2e5cb
> Firmware Version: 80.00A80
> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> Sector Sizes:     512 bytes logical, 4096 bytes physical
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   ACS-2 (minor revision not indicated)
> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> Local Time is:    Sun Sep  3 13:28:27 2023 PDT
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
> 
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> 
> SCT Error Recovery Control:
>             Read:     70 (7.0 seconds)
>            Write:     70 (7.0 seconds)
> 
> 
> $ sudo mdadm --examine /dev/sda
> /dev/sda:
>     MBR Magic : aa55
> Partition[0] :   4294967295 sectors at            1 (type ee)
> $ sudo mdadm --examine /dev/sda1
> /dev/sda1:
>            Magic : a92b4efc
>          Version : 1.2
>      Feature Map : 0xd
>       Array UUID : 440dc11e:079308b1:131eda79:9a74c670
>             Name : Blyth:0  (local to host Blyth)
>    Creation Time : Tue Aug  4 23:47:57 2015
>       Raid Level : raid6
>     Raid Devices : 9
> 
>   Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
>       Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
>    Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
>      Data Offset : 247808 sectors
>     Super Offset : 8 sectors
>     Unused Space : before=247728 sectors, after=14336 sectors
>            State : clean
>      Device UUID : 8ca60ad5:60d19333:11b24820:91453532
> 
> Internal Bitmap : 8 sectors from superblock
>    Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
>    Delta Devices : 1 (8->9)
> 
>      Update Time : Tue Jul 11 23:12:08 2023
>    Bad Block Log : 512 entries available at offset 24 sectors - bad
> blocks present.
>         Checksum : b6d8f4d1 - correct
>           Events : 181105
> 
>           Layout : left-symmetric
>       Chunk Size : 512K
> 
>     Device Role : Active device 7
>     Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> 
> $ sudo mdadm --examine /dev/sdb
> /dev/sdb:
>     MBR Magic : aa55
> Partition[0] :   4294967295 sectors at            1 (type ee)
> $ sudo mdadm --examine /dev/sdb1
> /dev/sdb1:
>            Magic : a92b4efc
>          Version : 1.2
>      Feature Map : 0x5
>       Array UUID : 440dc11e:079308b1:131eda79:9a74c670
>             Name : Blyth:0  (local to host Blyth)
>    Creation Time : Tue Aug  4 23:47:57 2015
>       Raid Level : raid6
>     Raid Devices : 9
> 
>   Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
>       Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
>    Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
>      Data Offset : 247808 sectors
>     Super Offset : 8 sectors
>     Unused Space : before=247728 sectors, after=14336 sectors
>            State : clean
>      Device UUID : 386d3001:16447e43:4d2a5459:85618d11
> 
> Internal Bitmap : 8 sectors from superblock
>    Reshape pos'n : 124207104 (118.45 GiB 127.19 GB)
>    Delta Devices : 1 (8->9)
> 
>      Update Time : Tue Jul 11 00:02:59 2023
>    Bad Block Log : 512 entries available at offset 24 sectors
>         Checksum : b544a39 - correct
>           Events : 181077
> 
>           Layout : left-symmetric
>       Chunk Size : 512K
> 
>     Device Role : Active device 8
>     Array State : AAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
> 
> $ sudo mdadm --examine /dev/sdc
> /dev/sdc:
>     MBR Magic : aa55
> Partition[0] :   4294967295 sectors at            1 (type ee)
> $ sudo mdadm --examine /dev/sdc1
> /dev/sdc1:
>            Magic : a92b4efc
>          Version : 1.2
>      Feature Map : 0xd
>       Array UUID : 440dc11e:079308b1:131eda79:9a74c670
>             Name : Blyth:0  (local to host Blyth)
>    Creation Time : Tue Aug  4 23:47:57 2015
>       Raid Level : raid6
>     Raid Devices : 9
> 
>   Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
>       Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
>    Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
>      Data Offset : 247808 sectors
>     Super Offset : 8 sectors
>     Unused Space : before=247720 sectors, after=14336 sectors
>            State : clean
>      Device UUID : 1798ec4f:72c56905:4e74ea61:2468db75
> 
> Internal Bitmap : 8 sectors from superblock
>    Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
>    Delta Devices : 1 (8->9)
> 
>      Update Time : Tue Jul 11 23:12:08 2023
>    Bad Block Log : 512 entries available at offset 72 sectors - bad
> blocks present.
>         Checksum : 88d8b8fc - correct
>           Events : 181105
> 
>           Layout : left-symmetric
>       Chunk Size : 512K
> 
>     Device Role : Active device 4
>     Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> 
> $ sudo mdadm --examine /dev/sdd
> /dev/sdd:
>     MBR Magic : aa55
> Partition[0] :   4294967295 sectors at            1 (type ee)
> $ sudo mdadm --examine /dev/sdd1
> /dev/sdd1:
>            Magic : a92b4efc
>          Version : 1.2
>      Feature Map : 0x5
>       Array UUID : 440dc11e:079308b1:131eda79:9a74c670
>             Name : Blyth:0  (local to host Blyth)
>    Creation Time : Tue Aug  4 23:47:57 2015
>       Raid Level : raid6
>     Raid Devices : 9
> 
>   Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
>       Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
>    Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
>      Data Offset : 247808 sectors
>     Super Offset : 8 sectors
>     Unused Space : before=247728 sectors, after=14336 sectors
>            State : clean
>      Device UUID : a198095b:f54d26a9:deb3be8f:d6de9be1
> 
> Internal Bitmap : 8 sectors from superblock
>    Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
>    Delta Devices : 1 (8->9)
> 
>      Update Time : Tue Jul 11 23:12:08 2023
>    Bad Block Log : 512 entries available at offset 24 sectors
>         Checksum : d1471d9d - correct
>           Events : 181105
> 
>           Layout : left-symmetric
>       Chunk Size : 512K
> 
>     Device Role : Active device 6
>     Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> 
> $ sudo mdadm --examine /dev/sde
> /dev/sde:
>     MBR Magic : aa55
> Partition[0] :   4294967295 sectors at            1 (type ee)
> $ sudo mdadm --examine /dev/sde1
> /dev/sde1:
>            Magic : a92b4efc
>          Version : 1.2
>      Feature Map : 0x5
>       Array UUID : 440dc11e:079308b1:131eda79:9a74c670
>             Name : Blyth:0  (local to host Blyth)
>    Creation Time : Tue Aug  4 23:47:57 2015
>       Raid Level : raid6
>     Raid Devices : 9
> 
>   Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
>       Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
>    Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
>      Data Offset : 247808 sectors
>     Super Offset : 8 sectors
>     Unused Space : before=247720 sectors, after=14336 sectors
>            State : clean
>      Device UUID : acf7ba2e:35d2fa91:6b12b0ce:33a73af5
> 
> Internal Bitmap : 8 sectors from superblock
>    Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
>    Delta Devices : 1 (8->9)
> 
>      Update Time : Tue Jul 11 23:12:08 2023
>    Bad Block Log : 512 entries available at offset 72 sectors
>         Checksum : e05d0278 - correct
>           Events : 181105
> 
>           Layout : left-symmetric
>       Chunk Size : 512K
> 
>     Device Role : Active device 5
>     Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> 
> $ sudo mdadm --examine /dev/sdf
> /dev/sdf:
>     MBR Magic : aa55
> Partition[0] :   4294967295 sectors at            1 (type ee)
> $ sudo mdadm --examine /dev/sdf1
> /dev/sdf1:
>            Magic : a92b4efc
>          Version : 1.2
>      Feature Map : 0x5
>       Array UUID : 440dc11e:079308b1:131eda79:9a74c670
>             Name : Blyth:0  (local to host Blyth)
>    Creation Time : Tue Aug  4 23:47:57 2015
>       Raid Level : raid6
>     Raid Devices : 9
> 
>   Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
>       Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
>    Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
>      Data Offset : 247808 sectors
>     Super Offset : 8 sectors
>     Unused Space : before=247720 sectors, after=14336 sectors
>            State : clean
>      Device UUID : 31e7b86d:c274ff45:aa6dab50:2ff058c6
> 
> Internal Bitmap : 8 sectors from superblock
>    Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
>    Delta Devices : 1 (8->9)
> 
>      Update Time : Tue Jul 11 23:12:08 2023
>    Bad Block Log : 512 entries available at offset 72 sectors
>         Checksum : 26792cc0 - correct
>           Events : 181105
> 
>           Layout : left-symmetric
>       Chunk Size : 512K
> 
>     Device Role : Active device 0
>     Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> 
> $ sudo mdadm --examine /dev/sdg
> /dev/sdg:
>     MBR Magic : aa55
> Partition[0] :   4294967295 sectors at            1 (type ee)
> $ sudo mdadm --examine /dev/sdg1
> /dev/sdg1:
>            Magic : a92b4efc
>          Version : 1.2
>      Feature Map : 0x5
>       Array UUID : 440dc11e:079308b1:131eda79:9a74c670
>             Name : Blyth:0  (local to host Blyth)
>    Creation Time : Tue Aug  4 23:47:57 2015
>       Raid Level : raid6
>     Raid Devices : 9
> 
>   Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
>       Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
>    Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
>      Data Offset : 247808 sectors
>     Super Offset : 8 sectors
>     Unused Space : before=247720 sectors, after=14336 sectors
>            State : clean
>      Device UUID : 74476ce7:4edc23f6:08120711:ba281425
> 
> Internal Bitmap : 8 sectors from superblock
>    Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
>    Delta Devices : 1 (8->9)
> 
>      Update Time : Tue Jul 11 23:12:08 2023
>    Bad Block Log : 512 entries available at offset 72 sectors
>         Checksum : 6f67d179 - correct
>           Events : 181105
> 
>           Layout : left-symmetric
>       Chunk Size : 512K
> 
>     Device Role : Active device 1
>     Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> 
> $ sudo mdadm --examine /dev/sdh
> /dev/sdh:
>     MBR Magic : aa55
> Partition[0] :   4294967295 sectors at            1 (type ee)
> $ sudo mdadm --examine /dev/sdh1
> /dev/sdh1:
>            Magic : a92b4efc
>          Version : 1.2
>      Feature Map : 0xd
>       Array UUID : 440dc11e:079308b1:131eda79:9a74c670
>             Name : Blyth:0  (local to host Blyth)
>    Creation Time : Tue Aug  4 23:47:57 2015
>       Raid Level : raid6
>     Raid Devices : 9
> 
>   Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
>       Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
>    Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
>      Data Offset : 247808 sectors
>     Super Offset : 8 sectors
>     Unused Space : before=247720 sectors, after=14336 sectors
>            State : clean
>      Device UUID : 31c08263:b135f0f5:763bc86b:f81d7296
> 
> Internal Bitmap : 8 sectors from superblock
>    Reshape pos'n : 124207104 (118.45 GiB 127.19 GB)
>    Delta Devices : 1 (8->9)
> 
>      Update Time : Tue Jul 11 20:09:14 2023
>    Bad Block Log : 512 entries available at offset 72 sectors - bad
> blocks present.
>         Checksum : b7696b68 - correct
>           Events : 181089
> 
>           Layout : left-symmetric
>       Chunk Size : 512K
> 
>     Device Role : Active device 2
>     Array State : AAAAAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> 
> $ sudo mdadm --examine /dev/sdi
> /dev/sdi:
>     MBR Magic : aa55
> Partition[0] :   4294967295 sectors at            1 (type ee)
> $ sudo mdadm --examine /dev/sdi1
> /dev/sdi1:
>            Magic : a92b4efc
>          Version : 1.2
>      Feature Map : 0x5
>       Array UUID : 440dc11e:079308b1:131eda79:9a74c670
>             Name : Blyth:0  (local to host Blyth)
>    Creation Time : Tue Aug  4 23:47:57 2015
>       Raid Level : raid6
>     Raid Devices : 9
> 
>   Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
>       Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
>    Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
>      Data Offset : 247808 sectors
>     Super Offset : 8 sectors
>     Unused Space : before=247720 sectors, after=14336 sectors
>            State : clean
>      Device UUID : ac1063fc:d9d66e6d:f3de33da:b396f483
> 
> Internal Bitmap : 8 sectors from superblock
>    Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
>    Delta Devices : 1 (8->9)
> 
>      Update Time : Tue Jul 11 23:12:08 2023
>    Bad Block Log : 512 entries available at offset 72 sectors
>         Checksum : 23b6d024 - correct
>           Events : 181105
> 
>           Layout : left-symmetric
>       Chunk Size : 512K
> 
>     Device Role : Active device 3
>     Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> 
> $ sudo mdadm --detail /dev/md0
> /dev/md0:
>             Version : 1.2
>          Raid Level : raid6
>       Total Devices : 9
>         Persistence : Superblock is persistent
> 
>               State : inactive
>     Working Devices : 9
> 
>       Delta Devices : 1, (-1->0)
>           New Level : raid6
>          New Layout : left-symmetric
>       New Chunksize : 512K
> 
>                Name : Blyth:0  (local to host Blyth)
>                UUID : 440dc11e:079308b1:131eda79:9a74c670
>              Events : 181105
> 
>      Number   Major   Minor   RaidDevice
> 
>         -       8        1        -        /dev/sda1
>         -       8      129        -        /dev/sdi1
>         -       8      113        -        /dev/sdh1
>         -       8       97        -        /dev/sdg1
>         -       8       81        -        /dev/sdf1
>         -       8       65        -        /dev/sde1
>         -       8       49        -        /dev/sdd1
>         -       8       33        -        /dev/sdc1
>         -       8       17        -        /dev/sdb1
> 
> $ cat /proc/mdstat
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
> [raid4] [raid10]
> md0 : inactive sdb1[9](S) sdi1[4](S) sdf1[0](S) sdg1[1](S) sdh1[3](S)
> sda1[8](S) sdd1[7](S) sdc1[6](S) sde1[5](S)
>        26353689600 blocks super 1.2
> 
> unused devices: <none>
> 
> .
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Reshape Failure
  2023-09-04  1:41 ` Yu Kuai
@ 2023-09-04 16:38   ` Jason Moss
  2023-09-05  1:07     ` Yu Kuai
  0 siblings, 1 reply; 21+ messages in thread
From: Jason Moss @ 2023-09-04 16:38 UTC (permalink / raw)
  To: Yu Kuai; +Cc: linux-raid, yangerkun@huawei.com, yukuai (C)

Hi Kuai,

Thank you for the suggestion; I was previously on 5.15.0. I've built
an environment with 6.5.0.1 now and assembled the array there, but the
same problem happens. It reshaped for 20-30 seconds, then completely
stopped.

Processes and /proc/<PID>/stack output:
root       24593  0.0  0.0      0     0 ?        I<   09:22   0:00 [raid5wq]
root       24594 96.5  0.0      0     0 ?        R    09:22   2:29 [md0_raid6]
root       24595  0.3  0.0      0     0 ?        D    09:22   0:00 [md0_reshape]

[root@arch ~]# cat /proc/24593/stack
[<0>] rescuer_thread+0x2b0/0x3b0
[<0>] kthread+0xe8/0x120
[<0>] ret_from_fork+0x34/0x50
[<0>] ret_from_fork_asm+0x1b/0x30

[root@arch ~]# cat /proc/24594/stack

[root@arch ~]# cat /proc/24595/stack
[<0>] reshape_request+0x416/0x9f0 [raid456]
[<0>] raid5_sync_request+0x2fc/0x3d0 [raid456]
[<0>] md_do_sync+0x7d6/0x11d0 [md_mod]
[<0>] md_thread+0xae/0x190 [md_mod]
[<0>] kthread+0xe8/0x120
[<0>] ret_from_fork+0x34/0x50
[<0>] ret_from_fork_asm+0x1b/0x30

Please let me know if there's a better way to provide the stack info.

Thank you

On Sun, Sep 3, 2023 at 6:41 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2023/09/04 5:39, Jason Moss wrote:
> > Hello,
> >
> > I recently attempted to add a new drive to my 8-drive RAID 6 array,
> > growing it to 9 drives. I've done similar before with the same array,
> > having previously grown it from 6 drives to 7 and then from 7 to 8
> > with no issues. Drives are WD Reds, most older than 2019, some
> > (including the newest) newer, but all confirmed CMR and not SMR.
> >
> > Process used to expand the array:
> > mdadm --add /dev/md0 /dev/sdb1
> > mdadm --grow --raid-devices=9 --backup-file=/root/grow_md0.bak /dev/md0
> >
> > The reshape started off fine, the process was underway, and the volume
> > was still usable as expected. However, 15-30 minutes into the reshape,
> > I lost access to the contents of the drive. Checking /proc/mdstat, the
> > reshape was stopped at 0.6% with the counter not incrementing at all.
> > Any process accessing the array would just hang until killed. I waited
>
> What kernel version are you using? It would also be very helpful if you
> could collect the stacks of all stuck threads. There is a known deadlock
> in raid5 related to reshape, and it is fixed in v6.5:
>
> https://lore.kernel.org/r/20230512015610.821290-6-yukuai1@huaweicloud.com
>
> > a half hour and there was still no further change to the counter. At
> > this point, I restarted the server and found that when it came back up
> > it would begin reshaping again, but only very briefly, under 30
> > seconds, but the counter would be increasing during that time.
> >
> > I searched furiously for ideas and tried stopping and reassembling the
> > array, assembling with an invalid-backup flag, echoing "frozen" then
> > "reshape" to the sync_action file, and echoing "max" to the sync_max
> > file. Nothing ever seemed to make a difference.
> >
>
> Don't do this before v6.5; echoing "reshape" while a reshape is still in
> progress will corrupt your data:
>
> https://lore.kernel.org/r/20230512015610.821290-3-yukuai1@huaweicloud.com
>
> Thanks,
> Kuai
>
> > Here is where I slightly panicked, worried that I'd borked my array,
> > and powered off the server again and disconnected the new drive that
> > was just added, assuming that since it was the change, it may be the
> > problem despite having burn-in tested it, and figuring that I'll rush
> > order a new drive, so long as the reshape continues and I can just
> > rebuild onto a new drive once the reshape finishes. However, this made
> > no difference and the array continued to not rebuild.
> >
> > Much searching later, I'd found nothing substantially different than
> > I'd already tried and one of the common threads in other people's
> > issues was bad drives, so I ran a self-test against each of the
> > existing drives and found one drive that failed the read test.
> > Thinking I had the culprit now, I dropped that drive out of the array
> > and assembled the array again, but the same behavior persists. The
> > array reshapes very briefly, then completely stops.
> >
> > Down to 0 drives of redundancy (in the reshaped section at least), not
> > finding any new ideas on any of the forums, mailing list, wiki, etc,
> > and very frustrated, I took a break, bought all new drives to build a
> > new array in another server and restored from a backup. However, there
> > is still some data not captured by the most recent backup that I would
> > like to recover, and I'd also like to solve the problem purely to
> > understand what happened and how to recover in the future.
> >
> > Is there anything else I should try to recover this array, or is this
> > a lost cause?
> >
> > Details requested by the wiki to follow and I'm happy to collect any
> > further data that would assist. /dev/sdb is the new drive that was
> > added, then disconnected. /dev/sdh is the drive that failed a
> > self-test and was removed from the array.
> >
> > Thank you in advance for any help provided!
> >
> >
> > $ uname -a
> > Linux Blyth 5.15.0-76-generic #83-Ubuntu SMP Thu Jun 15 19:16:32 UTC
> > 2023 x86_64 x86_64 x86_64 GNU/Linux
> >
> > $ mdadm --version
> > mdadm - v4.2 - 2021-12-30
> >
> >
> > $ sudo smartctl -H -i -l scterc /dev/sda
> > smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> > Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >
> > === START OF INFORMATION SECTION ===
> > Model Family:     Western Digital Red
> > Device Model:     WDC WD30EFRX-68EUZN0
> > Serial Number:    WD-WCC4N7AT7R7X
> > LU WWN Device Id: 5 0014ee 268545f93
> > Firmware Version: 82.00A82
> > User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> > Sector Sizes:     512 bytes logical, 4096 bytes physical
> > Rotation Rate:    5400 rpm
> > Device is:        In smartctl database [for details use: -P show]
> > ATA Version is:   ACS-2 (minor revision not indicated)
> > SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> > Local Time is:    Sun Sep  3 13:27:55 2023 PDT
> > SMART support is: Available - device has SMART capability.
> > SMART support is: Enabled
> >
> > === START OF READ SMART DATA SECTION ===
> > SMART overall-health self-assessment test result: PASSED
> >
> > SCT Error Recovery Control:
> >             Read:     70 (7.0 seconds)
> >            Write:     70 (7.0 seconds)
> >
> > $ sudo smartctl -H -i -l scterc /dev/sda
> > smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> > Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >
> > === START OF INFORMATION SECTION ===
> > Model Family:     Western Digital Red
> > Device Model:     WDC WD30EFRX-68EUZN0
> > Serial Number:    WD-WCC4N7AT7R7X
> > LU WWN Device Id: 5 0014ee 268545f93
> > Firmware Version: 82.00A82
> > User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> > Sector Sizes:     512 bytes logical, 4096 bytes physical
> > Rotation Rate:    5400 rpm
> > Device is:        In smartctl database [for details use: -P show]
> > ATA Version is:   ACS-2 (minor revision not indicated)
> > SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> > Local Time is:    Sun Sep  3 13:28:16 2023 PDT
> > SMART support is: Available - device has SMART capability.
> > SMART support is: Enabled
> >
> > === START OF READ SMART DATA SECTION ===
> > SMART overall-health self-assessment test result: PASSED
> >
> > SCT Error Recovery Control:
> >             Read:     70 (7.0 seconds)
> >            Write:     70 (7.0 seconds)
> >
> > $ sudo smartctl -H -i -l scterc /dev/sdb
> > smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> > Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >
> > === START OF INFORMATION SECTION ===
> > Model Family:     Western Digital Red
> > Device Model:     WDC WD30EFRX-68EUZN0
> > Serial Number:    WD-WXG1A8UGLS42
> > LU WWN Device Id: 5 0014ee 2b75ef53b
> > Firmware Version: 80.00A80
> > User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> > Sector Sizes:     512 bytes logical, 4096 bytes physical
> > Rotation Rate:    5400 rpm
> > Device is:        In smartctl database [for details use: -P show]
> > ATA Version is:   ACS-2 (minor revision not indicated)
> > SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> > Local Time is:    Sun Sep  3 13:28:19 2023 PDT
> > SMART support is: Available - device has SMART capability.
> > SMART support is: Enabled
> >
> > === START OF READ SMART DATA SECTION ===
> > SMART overall-health self-assessment test result: PASSED
> >
> > SCT Error Recovery Control:
> >             Read:     70 (7.0 seconds)
> >            Write:     70 (7.0 seconds)
> >
> > $ sudo smartctl -H -i -l scterc /dev/sdc
> > smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> > Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >
> > === START OF INFORMATION SECTION ===
> > Model Family:     Western Digital Red
> > Device Model:     WDC WD30EFRX-68EUZN0
> > Serial Number:    WD-WCC4N4HYL32Y
> > LU WWN Device Id: 5 0014ee 2630752f8
> > Firmware Version: 82.00A82
> > User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> > Sector Sizes:     512 bytes logical, 4096 bytes physical
> > Rotation Rate:    5400 rpm
> > Device is:        In smartctl database [for details use: -P show]
> > ATA Version is:   ACS-2 (minor revision not indicated)
> > SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> > Local Time is:    Sun Sep  3 13:28:20 2023 PDT
> > SMART support is: Available - device has SMART capability.
> > SMART support is: Enabled
> >
> > === START OF READ SMART DATA SECTION ===
> > SMART overall-health self-assessment test result: PASSED
> >
> > SCT Error Recovery Control:
> >             Read:     70 (7.0 seconds)
> >            Write:     70 (7.0 seconds)
> >
> > $ sudo smartctl -H -i -l scterc /dev/sdd
> > smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> > Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >
> > === START OF INFORMATION SECTION ===
> > Model Family:     Western Digital Red
> > Device Model:     WDC WD30EFRX-68N32N0
> > Serial Number:    WD-WCC7K1FF6DYK
> > LU WWN Device Id: 5 0014ee 2ba952a30
> > Firmware Version: 82.00A82
> > User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> > Sector Sizes:     512 bytes logical, 4096 bytes physical
> > Rotation Rate:    5400 rpm
> > Form Factor:      3.5 inches
> > Device is:        In smartctl database [for details use: -P show]
> > ATA Version is:   ACS-3 T13/2161-D revision 5
> > SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
> > Local Time is:    Sun Sep  3 13:28:21 2023 PDT
> > SMART support is: Available - device has SMART capability.
> > SMART support is: Enabled
> >
> > === START OF READ SMART DATA SECTION ===
> > SMART overall-health self-assessment test result: PASSED
> >
> > SCT Error Recovery Control:
> >             Read:     70 (7.0 seconds)
> >            Write:     70 (7.0 seconds)
> >
> > $ sudo smartctl -H -i -l scterc /dev/sde
> > smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> > Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >
> > === START OF INFORMATION SECTION ===
> > Model Family:     Western Digital Red
> > Device Model:     WDC WD30EFRX-68EUZN0
> > Serial Number:    WD-WCC4N5ZHTRJF
> > LU WWN Device Id: 5 0014ee 2b88b83bb
> > Firmware Version: 82.00A82
> > User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> > Sector Sizes:     512 bytes logical, 4096 bytes physical
> > Rotation Rate:    5400 rpm
> > Device is:        In smartctl database [for details use: -P show]
> > ATA Version is:   ACS-2 (minor revision not indicated)
> > SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> > Local Time is:    Sun Sep  3 13:28:22 2023 PDT
> > SMART support is: Available - device has SMART capability.
> > SMART support is: Enabled
> >
> > === START OF READ SMART DATA SECTION ===
> > SMART overall-health self-assessment test result: PASSED
> >
> > SCT Error Recovery Control:
> >             Read:     70 (7.0 seconds)
> >            Write:     70 (7.0 seconds)
> >
> > $ sudo smartctl -H -i -l scterc /dev/sdf
> > smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> > Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >
> > === START OF INFORMATION SECTION ===
> > Model Family:     Western Digital Red
> > Device Model:     WDC WD30EFRX-68AX9N0
> > Serial Number:    WD-WMC1T3804790
> > LU WWN Device Id: 5 0014ee 6036b6826
> > Firmware Version: 80.00A80
> > User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> > Sector Sizes:     512 bytes logical, 4096 bytes physical
> > Device is:        In smartctl database [for details use: -P show]
> > ATA Version is:   ACS-2 (minor revision not indicated)
> > SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> > Local Time is:    Sun Sep  3 13:28:23 2023 PDT
> > SMART support is: Available - device has SMART capability.
> > SMART support is: Enabled
> >
> > === START OF READ SMART DATA SECTION ===
> > SMART overall-health self-assessment test result: PASSED
> >
> > SCT Error Recovery Control:
> >             Read:     70 (7.0 seconds)
> >            Write:     70 (7.0 seconds)
> >
> > $ sudo smartctl -H -i -l scterc /dev/sdg
> > smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> > Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >
> > === START OF INFORMATION SECTION ===
> > Model Family:     Western Digital Red
> > Device Model:     WDC WD30EFRX-68EUZN0
> > Serial Number:    WD-WMC4N0H692Z9
> > LU WWN Device Id: 5 0014ee 65af39740
> > Firmware Version: 82.00A82
> > User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> > Sector Sizes:     512 bytes logical, 4096 bytes physical
> > Rotation Rate:    5400 rpm
> > Device is:        In smartctl database [for details use: -P show]
> > ATA Version is:   ACS-2 (minor revision not indicated)
> > SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> > Local Time is:    Sun Sep  3 13:28:24 2023 PDT
> > SMART support is: Available - device has SMART capability.
> > SMART support is: Enabled
> >
> > === START OF READ SMART DATA SECTION ===
> > SMART overall-health self-assessment test result: PASSED
> >
> > SCT Error Recovery Control:
> >             Read:     70 (7.0 seconds)
> >            Write:     70 (7.0 seconds)
> >
> > $ sudo smartctl -H -i -l scterc /dev/sdh
> > smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> > Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >
> > === START OF INFORMATION SECTION ===
> > Model Family:     Western Digital Red
> > Device Model:     WDC WD30EFRX-68EUZN0
> > Serial Number:    WD-WMC4N0K5S750
> > LU WWN Device Id: 5 0014ee 6b048d9ca
> > Firmware Version: 82.00A82
> > User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> > Sector Sizes:     512 bytes logical, 4096 bytes physical
> > Rotation Rate:    5400 rpm
> > Device is:        In smartctl database [for details use: -P show]
> > ATA Version is:   ACS-2 (minor revision not indicated)
> > SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> > Local Time is:    Sun Sep  3 13:28:24 2023 PDT
> > SMART support is: Available - device has SMART capability.
> > SMART support is: Enabled
> >
> > === START OF READ SMART DATA SECTION ===
> > SMART overall-health self-assessment test result: PASSED
> >
> > SCT Error Recovery Control:
> >             Read:     70 (7.0 seconds)
> >            Write:     70 (7.0 seconds)
> >
> > $ sudo smartctl -H -i -l scterc /dev/sdi
> > smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> > Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >
> > === START OF INFORMATION SECTION ===
> > Model Family:     Western Digital Red
> > Device Model:     WDC WD30EFRX-68AX9N0
> > Serial Number:    WD-WMC1T1502475
> > LU WWN Device Id: 5 0014ee 058d2e5cb
> > Firmware Version: 80.00A80
> > User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> > Sector Sizes:     512 bytes logical, 4096 bytes physical
> > Device is:        In smartctl database [for details use: -P show]
> > ATA Version is:   ACS-2 (minor revision not indicated)
> > SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> > Local Time is:    Sun Sep  3 13:28:27 2023 PDT
> > SMART support is: Available - device has SMART capability.
> > SMART support is: Enabled
> >
> > === START OF READ SMART DATA SECTION ===
> > SMART overall-health self-assessment test result: PASSED
> >
> > SCT Error Recovery Control:
> >             Read:     70 (7.0 seconds)
> >            Write:     70 (7.0 seconds)
> >
> >
> > $ sudo mdadm --examine /dev/sda
> > /dev/sda:
> >     MBR Magic : aa55
> > Partition[0] :   4294967295 sectors at            1 (type ee)
> > $ sudo mdadm --examine /dev/sda1
> > /dev/sda1:
> >            Magic : a92b4efc
> >          Version : 1.2
> >      Feature Map : 0xd
> >       Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >             Name : Blyth:0  (local to host Blyth)
> >    Creation Time : Tue Aug  4 23:47:57 2015
> >       Raid Level : raid6
> >     Raid Devices : 9
> >
> >   Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
> >       Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >    Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >      Data Offset : 247808 sectors
> >     Super Offset : 8 sectors
> >     Unused Space : before=247728 sectors, after=14336 sectors
> >            State : clean
> >      Device UUID : 8ca60ad5:60d19333:11b24820:91453532
> >
> > Internal Bitmap : 8 sectors from superblock
> >    Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
> >    Delta Devices : 1 (8->9)
> >
> >      Update Time : Tue Jul 11 23:12:08 2023
> >    Bad Block Log : 512 entries available at offset 24 sectors - bad
> > blocks present.
> >         Checksum : b6d8f4d1 - correct
> >           Events : 181105
> >
> >           Layout : left-symmetric
> >       Chunk Size : 512K
> >
> >     Device Role : Active device 7
> >     Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >
> > $ sudo mdadm --examine /dev/sdb
> > /dev/sdb:
> >     MBR Magic : aa55
> > Partition[0] :   4294967295 sectors at            1 (type ee)
> > $ sudo mdadm --examine /dev/sdb1
> > /dev/sdb1:
> >            Magic : a92b4efc
> >          Version : 1.2
> >      Feature Map : 0x5
> >       Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >             Name : Blyth:0  (local to host Blyth)
> >    Creation Time : Tue Aug  4 23:47:57 2015
> >       Raid Level : raid6
> >     Raid Devices : 9
> >
> >   Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
> >       Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >    Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >      Data Offset : 247808 sectors
> >     Super Offset : 8 sectors
> >     Unused Space : before=247728 sectors, after=14336 sectors
> >            State : clean
> >      Device UUID : 386d3001:16447e43:4d2a5459:85618d11
> >
> > Internal Bitmap : 8 sectors from superblock
> >    Reshape pos'n : 124207104 (118.45 GiB 127.19 GB)
> >    Delta Devices : 1 (8->9)
> >
> >      Update Time : Tue Jul 11 00:02:59 2023
> >    Bad Block Log : 512 entries available at offset 24 sectors
> >         Checksum : b544a39 - correct
> >           Events : 181077
> >
> >           Layout : left-symmetric
> >       Chunk Size : 512K
> >
> >     Device Role : Active device 8
> >     Array State : AAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
> >
> > $ sudo mdadm --examine /dev/sdc
> > /dev/sdc:
> >     MBR Magic : aa55
> > Partition[0] :   4294967295 sectors at            1 (type ee)
> > $ sudo mdadm --examine /dev/sdc1
> > /dev/sdc1:
> >            Magic : a92b4efc
> >          Version : 1.2
> >      Feature Map : 0xd
> >       Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >             Name : Blyth:0  (local to host Blyth)
> >    Creation Time : Tue Aug  4 23:47:57 2015
> >       Raid Level : raid6
> >     Raid Devices : 9
> >
> >   Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
> >       Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >    Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >      Data Offset : 247808 sectors
> >     Super Offset : 8 sectors
> >     Unused Space : before=247720 sectors, after=14336 sectors
> >            State : clean
> >      Device UUID : 1798ec4f:72c56905:4e74ea61:2468db75
> >
> > Internal Bitmap : 8 sectors from superblock
> >    Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
> >    Delta Devices : 1 (8->9)
> >
> >      Update Time : Tue Jul 11 23:12:08 2023
> >    Bad Block Log : 512 entries available at offset 72 sectors - bad
> > blocks present.
> >         Checksum : 88d8b8fc - correct
> >           Events : 181105
> >
> >           Layout : left-symmetric
> >       Chunk Size : 512K
> >
> >     Device Role : Active device 4
> >     Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >
> > $ sudo mdadm --examine /dev/sdd
> > /dev/sdd:
> >     MBR Magic : aa55
> > Partition[0] :   4294967295 sectors at            1 (type ee)
> > $ sudo mdadm --examine /dev/sdd1
> > /dev/sdd1:
> >            Magic : a92b4efc
> >          Version : 1.2
> >      Feature Map : 0x5
> >       Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >             Name : Blyth:0  (local to host Blyth)
> >    Creation Time : Tue Aug  4 23:47:57 2015
> >       Raid Level : raid6
> >     Raid Devices : 9
> >
> >   Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
> >       Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >    Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >      Data Offset : 247808 sectors
> >     Super Offset : 8 sectors
> >     Unused Space : before=247728 sectors, after=14336 sectors
> >            State : clean
> >      Device UUID : a198095b:f54d26a9:deb3be8f:d6de9be1
> >
> > Internal Bitmap : 8 sectors from superblock
> >    Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
> >    Delta Devices : 1 (8->9)
> >
> >      Update Time : Tue Jul 11 23:12:08 2023
> >    Bad Block Log : 512 entries available at offset 24 sectors
> >         Checksum : d1471d9d - correct
> >           Events : 181105
> >
> >           Layout : left-symmetric
> >       Chunk Size : 512K
> >
> >     Device Role : Active device 6
> >     Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >
> > $ sudo mdadm --examine /dev/sde
> > /dev/sde:
> >     MBR Magic : aa55
> > Partition[0] :   4294967295 sectors at            1 (type ee)
> > $ sudo mdadm --examine /dev/sde1
> > /dev/sde1:
> >            Magic : a92b4efc
> >          Version : 1.2
> >      Feature Map : 0x5
> >       Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >             Name : Blyth:0  (local to host Blyth)
> >    Creation Time : Tue Aug  4 23:47:57 2015
> >       Raid Level : raid6
> >     Raid Devices : 9
> >
> >   Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
> >       Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >    Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >      Data Offset : 247808 sectors
> >     Super Offset : 8 sectors
> >     Unused Space : before=247720 sectors, after=14336 sectors
> >            State : clean
> >      Device UUID : acf7ba2e:35d2fa91:6b12b0ce:33a73af5
> >
> > Internal Bitmap : 8 sectors from superblock
> >    Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
> >    Delta Devices : 1 (8->9)
> >
> >      Update Time : Tue Jul 11 23:12:08 2023
> >    Bad Block Log : 512 entries available at offset 72 sectors
> >         Checksum : e05d0278 - correct
> >           Events : 181105
> >
> >           Layout : left-symmetric
> >       Chunk Size : 512K
> >
> >     Device Role : Active device 5
> >     Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >
> > $ sudo mdadm --examine /dev/sdf
> > /dev/sdf:
> >     MBR Magic : aa55
> > Partition[0] :   4294967295 sectors at            1 (type ee)
> > $ sudo mdadm --examine /dev/sdf1
> > /dev/sdf1:
> >            Magic : a92b4efc
> >          Version : 1.2
> >      Feature Map : 0x5
> >       Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >             Name : Blyth:0  (local to host Blyth)
> >    Creation Time : Tue Aug  4 23:47:57 2015
> >       Raid Level : raid6
> >     Raid Devices : 9
> >
> >   Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
> >       Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >    Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >      Data Offset : 247808 sectors
> >     Super Offset : 8 sectors
> >     Unused Space : before=247720 sectors, after=14336 sectors
> >            State : clean
> >      Device UUID : 31e7b86d:c274ff45:aa6dab50:2ff058c6
> >
> > Internal Bitmap : 8 sectors from superblock
> >    Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
> >    Delta Devices : 1 (8->9)
> >
> >      Update Time : Tue Jul 11 23:12:08 2023
> >    Bad Block Log : 512 entries available at offset 72 sectors
> >         Checksum : 26792cc0 - correct
> >           Events : 181105
> >
> >           Layout : left-symmetric
> >       Chunk Size : 512K
> >
> >     Device Role : Active device 0
> >     Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >
> > $ sudo mdadm --examine /dev/sdg
> > /dev/sdg:
> >     MBR Magic : aa55
> > Partition[0] :   4294967295 sectors at            1 (type ee)
> > $ sudo mdadm --examine /dev/sdg1
> > /dev/sdg1:
> >            Magic : a92b4efc
> >          Version : 1.2
> >      Feature Map : 0x5
> >       Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >             Name : Blyth:0  (local to host Blyth)
> >    Creation Time : Tue Aug  4 23:47:57 2015
> >       Raid Level : raid6
> >     Raid Devices : 9
> >
> >   Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
> >       Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >    Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >      Data Offset : 247808 sectors
> >     Super Offset : 8 sectors
> >     Unused Space : before=247720 sectors, after=14336 sectors
> >            State : clean
> >      Device UUID : 74476ce7:4edc23f6:08120711:ba281425
> >
> > Internal Bitmap : 8 sectors from superblock
> >    Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
> >    Delta Devices : 1 (8->9)
> >
> >      Update Time : Tue Jul 11 23:12:08 2023
> >    Bad Block Log : 512 entries available at offset 72 sectors
> >         Checksum : 6f67d179 - correct
> >           Events : 181105
> >
> >           Layout : left-symmetric
> >       Chunk Size : 512K
> >
> >     Device Role : Active device 1
> >     Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >
> > $ sudo mdadm --examine /dev/sdh
> > /dev/sdh:
> >     MBR Magic : aa55
> > Partition[0] :   4294967295 sectors at            1 (type ee)
> > $ sudo mdadm --examine /dev/sdh1
> > /dev/sdh1:
> >            Magic : a92b4efc
> >          Version : 1.2
> >      Feature Map : 0xd
> >       Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >             Name : Blyth:0  (local to host Blyth)
> >    Creation Time : Tue Aug  4 23:47:57 2015
> >       Raid Level : raid6
> >     Raid Devices : 9
> >
> >   Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
> >       Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >    Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >      Data Offset : 247808 sectors
> >     Super Offset : 8 sectors
> >     Unused Space : before=247720 sectors, after=14336 sectors
> >            State : clean
> >      Device UUID : 31c08263:b135f0f5:763bc86b:f81d7296
> >
> > Internal Bitmap : 8 sectors from superblock
> >    Reshape pos'n : 124207104 (118.45 GiB 127.19 GB)
> >    Delta Devices : 1 (8->9)
> >
> >      Update Time : Tue Jul 11 20:09:14 2023
> >    Bad Block Log : 512 entries available at offset 72 sectors - bad
> > blocks present.
> >         Checksum : b7696b68 - correct
> >           Events : 181089
> >
> >           Layout : left-symmetric
> >       Chunk Size : 512K
> >
> >     Device Role : Active device 2
> >     Array State : AAAAAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >
> > $ sudo mdadm --examine /dev/sdi
> > /dev/sdi:
> >     MBR Magic : aa55
> > Partition[0] :   4294967295 sectors at            1 (type ee)
> > $ sudo mdadm --examine /dev/sdi1
> > /dev/sdi1:
> >            Magic : a92b4efc
> >          Version : 1.2
> >      Feature Map : 0x5
> >       Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >             Name : Blyth:0  (local to host Blyth)
> >    Creation Time : Tue Aug  4 23:47:57 2015
> >       Raid Level : raid6
> >     Raid Devices : 9
> >
> >   Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
> >       Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >    Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >      Data Offset : 247808 sectors
> >     Super Offset : 8 sectors
> >     Unused Space : before=247720 sectors, after=14336 sectors
> >            State : clean
> >      Device UUID : ac1063fc:d9d66e6d:f3de33da:b396f483
> >
> > Internal Bitmap : 8 sectors from superblock
> >    Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
> >    Delta Devices : 1 (8->9)
> >
> >      Update Time : Tue Jul 11 23:12:08 2023
> >    Bad Block Log : 512 entries available at offset 72 sectors
> >         Checksum : 23b6d024 - correct
> >           Events : 181105
> >
> >           Layout : left-symmetric
> >       Chunk Size : 512K
> >
> >     Device Role : Active device 3
> >     Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >
> > $ sudo mdadm --detail /dev/md0
> > /dev/md0:
> >             Version : 1.2
> >          Raid Level : raid6
> >       Total Devices : 9
> >         Persistence : Superblock is persistent
> >
> >               State : inactive
> >     Working Devices : 9
> >
> >       Delta Devices : 1, (-1->0)
> >           New Level : raid6
> >          New Layout : left-symmetric
> >       New Chunksize : 512K
> >
> >                Name : Blyth:0  (local to host Blyth)
> >                UUID : 440dc11e:079308b1:131eda79:9a74c670
> >              Events : 181105
> >
> >      Number   Major   Minor   RaidDevice
> >
> >         -       8        1        -        /dev/sda1
> >         -       8      129        -        /dev/sdi1
> >         -       8      113        -        /dev/sdh1
> >         -       8       97        -        /dev/sdg1
> >         -       8       81        -        /dev/sdf1
> >         -       8       65        -        /dev/sde1
> >         -       8       49        -        /dev/sdd1
> >         -       8       33        -        /dev/sdc1
> >         -       8       17        -        /dev/sdb1
> >
> > $ cat /proc/mdstat
> > Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
> > [raid4] [raid10]
> > md0 : inactive sdb1[9](S) sdi1[4](S) sdf1[0](S) sdg1[1](S) sdh1[3](S)
> > sda1[8](S) sdd1[7](S) sdc1[6](S) sde1[5](S)
> >        26353689600 blocks super 1.2
> >
> > unused devices: <none>
> >
> > .
> >
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Reshape Failure
  2023-09-04 16:38   ` Jason Moss
@ 2023-09-05  1:07     ` Yu Kuai
  2023-09-06 14:05       ` Jason Moss
  0 siblings, 1 reply; 21+ messages in thread
From: Yu Kuai @ 2023-09-05  1:07 UTC (permalink / raw)
  To: Jason Moss, Yu Kuai; +Cc: linux-raid, yangerkun@huawei.com, yukuai (C)

Hi,

On 2023/09/05 0:38, Jason Moss wrote:
> Hi Kuai,
> 
> Thank you for the suggestion; I was previously on 5.15.0. I've built
> an environment with 6.5.0.1 now and assembled the array there, but the
> same problem happens. It reshaped for 20-30 seconds, then completely
> stopped.
> 
> Processes and /proc/<PID>/stack output:
> root       24593  0.0  0.0      0     0 ?        I<   09:22   0:00 [raid5wq]
> root       24594 96.5  0.0      0     0 ?        R    09:22   2:29 [md0_raid6]
> root       24595  0.3  0.0      0     0 ?        D    09:22   0:00 [md0_reshape]
> 
> [root@arch ~]# cat /proc/24593/stack
> [<0>] rescuer_thread+0x2b0/0x3b0
> [<0>] kthread+0xe8/0x120
> [<0>] ret_from_fork+0x34/0x50
> [<0>] ret_from_fork_asm+0x1b/0x30
> 
> [root@arch ~]# cat /proc/24594/stack
> 
> [root@arch ~]# cat /proc/24595/stack
> [<0>] reshape_request+0x416/0x9f0 [raid456]
Can you provide the addr2line result? Let's see where reshape_request()
is stuck first.
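
Something along these lines should produce it (a rough sketch only: the
module path is illustrative, and it assumes a raid456.ko with debug info,
which on a distro kernel usually means installing the kernel debug symbol
package):

$ nm raid456.ko | grep ' reshape_request$'                # section-relative start of the function
$ addr2line -f -i --section=.text -e raid456.ko ADDR      # ADDR = that start address + 0x416 (placeholder)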

Thanks,
Kuai

> [<0>] raid5_sync_request+0x2fc/0x3d0 [raid456]
> [<0>] md_do_sync+0x7d6/0x11d0 [md_mod]
> [<0>] md_thread+0xae/0x190 [md_mod]
> [<0>] kthread+0xe8/0x120
> [<0>] ret_from_fork+0x34/0x50
> [<0>] ret_from_fork_asm+0x1b/0x30
> 
> Please let me know if there's a better way to provide the stack info.
> 
> Thank you
> 
> On Sun, Sep 3, 2023 at 6:41 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> Hi,
>>
>> On 2023/09/04 5:39, Jason Moss wrote:
>>> Hello,
>>>
>>> I recently attempted to add a new drive to my 8-drive RAID 6 array,
>>> growing it to 9 drives. I've done similar before with the same array,
>>> having previously grown it from 6 drives to 7 and then from 7 to 8
>>> with no issues. Drives are WD Reds, most older than 2019, some
>>> (including the newest) newer, but all confirmed CMR and not SMR.
>>>
>>> Process used to expand the array:
>>> mdadm --add /dev/md0 /dev/sdb1
>>> mdadm --grow --raid-devices=9 --backup-file=/root/grow_md0.bak /dev/md0
>>>
>>> The reshape started off fine, the process was underway, and the volume
>>> was still usable as expected. However, 15-30 minutes into the reshape,
>>> I lost access to the contents of the drive. Checking /proc/mdstat, the
>>> reshape was stopped at 0.6% with the counter not incrementing at all.
>>> Any process accessing the array would just hang until killed. I waited
>>
>> What kernel version are you using? And it'll be very helpful if you can
>> collect the stack of all stuck thread. There is a known deadlock for
>> raid5 related to reshape, and it's fixed in v6.5:
>>
>> https://lore.kernel.org/r/20230512015610.821290-6-yukuai1@huaweicloud.com
>>
>>> a half hour and there was still no further change to the counter. At
>>> this point, I restarted the server and found that when it came back up
>>> it would begin reshaping again, but only very briefly, under 30
>>> seconds, though the counter would be increasing during that time.
>>>
>>> I searched furiously for ideas and tried stopping and reassembling the
>>> array, assembling with an invalid-backup flag, echoing "frozen" then
>>> "reshape" to the sync_action file, and echoing "max" to the sync_max
>>> file. Nothing ever seemed to make a difference.
>>>
>>
>> Don't do this before v6.5; echoing "reshape" while a reshape is still in
>> progress will corrupt your data:
>>
>> https://lore.kernel.org/r/20230512015610.821290-3-yukuai1@huaweicloud.com
>>
>> Thanks,
>> Kuai
>>
>>> Here is where I slightly panicked, worried that I'd borked my array,
>>> and powered off the server again and disconnected the new drive that
>>> was just added, assuming that since it was the change, it may be the
>>> problem despite having burn-in tested it, and figuring that I'll rush
>>> order a new drive, so long as the reshape continues and I can just
>>> rebuild onto a new drive once the reshape finishes. However, this made
>>> no difference and the array continued to not rebuild.
>>>
>>> Much searching later, I'd found nothing substantially different than
>>> I'd already tried and one of the common threads in other people's
>>> issues was bad drives, so I ran a self-test against each of the
>>> existing drives and found one drive that failed the read test.
>>> Thinking I had the culprit now, I dropped that drive out of the array
>>> and assembled the array again, but the same behavior persists. The
>>> array reshapes very briefly, then completely stops.
>>>
>>> Down to 0 drives of redundancy (in the reshaped section at least), not
>>> finding any new ideas on any of the forums, mailing list, wiki, etc,
>>> and very frustrated, I took a break, bought all new drives to build a
>>> new array in another server and restored from a backup. However, there
>>> is still some data not captured by the most recent backup that I would
>>> like to recover, and I'd also like to solve the problem purely to
>>> understand what happened and how to recover in the future.
>>>
>>> Is there anything else I should try to recover this array, or is this
>>> a lost cause?
>>>
>>> Details requested by the wiki to follow and I'm happy to collect any
>>> further data that would assist. /dev/sdb is the new drive that was
>>> added, then disconnected. /dev/sdh is the drive that failed a
>>> self-test and was removed from the array.
>>>
>>> Thank you in advance for any help provided!
>>>
>>>
>>> $ uname -a
>>> Linux Blyth 5.15.0-76-generic #83-Ubuntu SMP Thu Jun 15 19:16:32 UTC
>>> 2023 x86_64 x86_64 x86_64 GNU/Linux
>>>
>>> $ mdadm --version
>>> mdadm - v4.2 - 2021-12-30
>>>
>>>
>>> $ sudo smartctl -H -i -l scterc /dev/sda
>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
>>>
>>> === START OF INFORMATION SECTION ===
>>> Model Family:     Western Digital Red
>>> Device Model:     WDC WD30EFRX-68EUZN0
>>> Serial Number:    WD-WCC4N7AT7R7X
>>> LU WWN Device Id: 5 0014ee 268545f93
>>> Firmware Version: 82.00A82
>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>>> Rotation Rate:    5400 rpm
>>> Device is:        In smartctl database [for details use: -P show]
>>> ATA Version is:   ACS-2 (minor revision not indicated)
>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
>>> Local Time is:    Sun Sep  3 13:27:55 2023 PDT
>>> SMART support is: Available - device has SMART capability.
>>> SMART support is: Enabled
>>>
>>> === START OF READ SMART DATA SECTION ===
>>> SMART overall-health self-assessment test result: PASSED
>>>
>>> SCT Error Recovery Control:
>>>              Read:     70 (7.0 seconds)
>>>             Write:     70 (7.0 seconds)
>>>
>>> $ sudo smartctl -H -i -l scterc /dev/sda
>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
>>>
>>> === START OF INFORMATION SECTION ===
>>> Model Family:     Western Digital Red
>>> Device Model:     WDC WD30EFRX-68EUZN0
>>> Serial Number:    WD-WCC4N7AT7R7X
>>> LU WWN Device Id: 5 0014ee 268545f93
>>> Firmware Version: 82.00A82
>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>>> Rotation Rate:    5400 rpm
>>> Device is:        In smartctl database [for details use: -P show]
>>> ATA Version is:   ACS-2 (minor revision not indicated)
>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
>>> Local Time is:    Sun Sep  3 13:28:16 2023 PDT
>>> SMART support is: Available - device has SMART capability.
>>> SMART support is: Enabled
>>>
>>> === START OF READ SMART DATA SECTION ===
>>> SMART overall-health self-assessment test result: PASSED
>>>
>>> SCT Error Recovery Control:
>>>              Read:     70 (7.0 seconds)
>>>             Write:     70 (7.0 seconds)
>>>
>>> $ sudo smartctl -H -i -l scterc /dev/sdb
>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
>>>
>>> === START OF INFORMATION SECTION ===
>>> Model Family:     Western Digital Red
>>> Device Model:     WDC WD30EFRX-68EUZN0
>>> Serial Number:    WD-WXG1A8UGLS42
>>> LU WWN Device Id: 5 0014ee 2b75ef53b
>>> Firmware Version: 80.00A80
>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>>> Rotation Rate:    5400 rpm
>>> Device is:        In smartctl database [for details use: -P show]
>>> ATA Version is:   ACS-2 (minor revision not indicated)
>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
>>> Local Time is:    Sun Sep  3 13:28:19 2023 PDT
>>> SMART support is: Available - device has SMART capability.
>>> SMART support is: Enabled
>>>
>>> === START OF READ SMART DATA SECTION ===
>>> SMART overall-health self-assessment test result: PASSED
>>>
>>> SCT Error Recovery Control:
>>>              Read:     70 (7.0 seconds)
>>>             Write:     70 (7.0 seconds)
>>>
>>> $ sudo smartctl -H -i -l scterc /dev/sdc
>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
>>>
>>> === START OF INFORMATION SECTION ===
>>> Model Family:     Western Digital Red
>>> Device Model:     WDC WD30EFRX-68EUZN0
>>> Serial Number:    WD-WCC4N4HYL32Y
>>> LU WWN Device Id: 5 0014ee 2630752f8
>>> Firmware Version: 82.00A82
>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>>> Rotation Rate:    5400 rpm
>>> Device is:        In smartctl database [for details use: -P show]
>>> ATA Version is:   ACS-2 (minor revision not indicated)
>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
>>> Local Time is:    Sun Sep  3 13:28:20 2023 PDT
>>> SMART support is: Available - device has SMART capability.
>>> SMART support is: Enabled
>>>
>>> === START OF READ SMART DATA SECTION ===
>>> SMART overall-health self-assessment test result: PASSED
>>>
>>> SCT Error Recovery Control:
>>>              Read:     70 (7.0 seconds)
>>>             Write:     70 (7.0 seconds)
>>>
>>> $ sudo smartctl -H -i -l scterc /dev/sdd
>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
>>>
>>> === START OF INFORMATION SECTION ===
>>> Model Family:     Western Digital Red
>>> Device Model:     WDC WD30EFRX-68N32N0
>>> Serial Number:    WD-WCC7K1FF6DYK
>>> LU WWN Device Id: 5 0014ee 2ba952a30
>>> Firmware Version: 82.00A82
>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>>> Rotation Rate:    5400 rpm
>>> Form Factor:      3.5 inches
>>> Device is:        In smartctl database [for details use: -P show]
>>> ATA Version is:   ACS-3 T13/2161-D revision 5
>>> SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
>>> Local Time is:    Sun Sep  3 13:28:21 2023 PDT
>>> SMART support is: Available - device has SMART capability.
>>> SMART support is: Enabled
>>>
>>> === START OF READ SMART DATA SECTION ===
>>> SMART overall-health self-assessment test result: PASSED
>>>
>>> SCT Error Recovery Control:
>>>              Read:     70 (7.0 seconds)
>>>             Write:     70 (7.0 seconds)
>>>
>>> $ sudo smartctl -H -i -l scterc /dev/sde
>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
>>>
>>> === START OF INFORMATION SECTION ===
>>> Model Family:     Western Digital Red
>>> Device Model:     WDC WD30EFRX-68EUZN0
>>> Serial Number:    WD-WCC4N5ZHTRJF
>>> LU WWN Device Id: 5 0014ee 2b88b83bb
>>> Firmware Version: 82.00A82
>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>>> Rotation Rate:    5400 rpm
>>> Device is:        In smartctl database [for details use: -P show]
>>> ATA Version is:   ACS-2 (minor revision not indicated)
>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
>>> Local Time is:    Sun Sep  3 13:28:22 2023 PDT
>>> SMART support is: Available - device has SMART capability.
>>> SMART support is: Enabled
>>>
>>> === START OF READ SMART DATA SECTION ===
>>> SMART overall-health self-assessment test result: PASSED
>>>
>>> SCT Error Recovery Control:
>>>              Read:     70 (7.0 seconds)
>>>             Write:     70 (7.0 seconds)
>>>
>>> $ sudo smartctl -H -i -l scterc /dev/sdf
>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
>>>
>>> === START OF INFORMATION SECTION ===
>>> Model Family:     Western Digital Red
>>> Device Model:     WDC WD30EFRX-68AX9N0
>>> Serial Number:    WD-WMC1T3804790
>>> LU WWN Device Id: 5 0014ee 6036b6826
>>> Firmware Version: 80.00A80
>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>>> Device is:        In smartctl database [for details use: -P show]
>>> ATA Version is:   ACS-2 (minor revision not indicated)
>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
>>> Local Time is:    Sun Sep  3 13:28:23 2023 PDT
>>> SMART support is: Available - device has SMART capability.
>>> SMART support is: Enabled
>>>
>>> === START OF READ SMART DATA SECTION ===
>>> SMART overall-health self-assessment test result: PASSED
>>>
>>> SCT Error Recovery Control:
>>>              Read:     70 (7.0 seconds)
>>>             Write:     70 (7.0 seconds)
>>>
>>> $ sudo smartctl -H -i -l scterc /dev/sdg
>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
>>>
>>> === START OF INFORMATION SECTION ===
>>> Model Family:     Western Digital Red
>>> Device Model:     WDC WD30EFRX-68EUZN0
>>> Serial Number:    WD-WMC4N0H692Z9
>>> LU WWN Device Id: 5 0014ee 65af39740
>>> Firmware Version: 82.00A82
>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>>> Rotation Rate:    5400 rpm
>>> Device is:        In smartctl database [for details use: -P show]
>>> ATA Version is:   ACS-2 (minor revision not indicated)
>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
>>> Local Time is:    Sun Sep  3 13:28:24 2023 PDT
>>> SMART support is: Available - device has SMART capability.
>>> SMART support is: Enabled
>>>
>>> === START OF READ SMART DATA SECTION ===
>>> SMART overall-health self-assessment test result: PASSED
>>>
>>> SCT Error Recovery Control:
>>>              Read:     70 (7.0 seconds)
>>>             Write:     70 (7.0 seconds)
>>>
>>> $ sudo smartctl -H -i -l scterc /dev/sdh
>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
>>>
>>> === START OF INFORMATION SECTION ===
>>> Model Family:     Western Digital Red
>>> Device Model:     WDC WD30EFRX-68EUZN0
>>> Serial Number:    WD-WMC4N0K5S750
>>> LU WWN Device Id: 5 0014ee 6b048d9ca
>>> Firmware Version: 82.00A82
>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>>> Rotation Rate:    5400 rpm
>>> Device is:        In smartctl database [for details use: -P show]
>>> ATA Version is:   ACS-2 (minor revision not indicated)
>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
>>> Local Time is:    Sun Sep  3 13:28:24 2023 PDT
>>> SMART support is: Available - device has SMART capability.
>>> SMART support is: Enabled
>>>
>>> === START OF READ SMART DATA SECTION ===
>>> SMART overall-health self-assessment test result: PASSED
>>>
>>> SCT Error Recovery Control:
>>>              Read:     70 (7.0 seconds)
>>>             Write:     70 (7.0 seconds)
>>>
>>> $ sudo smartctl -H -i -l scterc /dev/sdi
>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
>>>
>>> === START OF INFORMATION SECTION ===
>>> Model Family:     Western Digital Red
>>> Device Model:     WDC WD30EFRX-68AX9N0
>>> Serial Number:    WD-WMC1T1502475
>>> LU WWN Device Id: 5 0014ee 058d2e5cb
>>> Firmware Version: 80.00A80
>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>>> Device is:        In smartctl database [for details use: -P show]
>>> ATA Version is:   ACS-2 (minor revision not indicated)
>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
>>> Local Time is:    Sun Sep  3 13:28:27 2023 PDT
>>> SMART support is: Available - device has SMART capability.
>>> SMART support is: Enabled
>>>
>>> === START OF READ SMART DATA SECTION ===
>>> SMART overall-health self-assessment test result: PASSED
>>>
>>> SCT Error Recovery Control:
>>>              Read:     70 (7.0 seconds)
>>>             Write:     70 (7.0 seconds)
>>>
>>>
>>> $ sudo mdadm --examine /dev/sda
>>> /dev/sda:
>>>      MBR Magic : aa55
>>> Partition[0] :   4294967295 sectors at            1 (type ee)
>>> $ sudo mdadm --examine /dev/sda1
>>> /dev/sda1:
>>>             Magic : a92b4efc
>>>           Version : 1.2
>>>       Feature Map : 0xd
>>>        Array UUID : 440dc11e:079308b1:131eda79:9a74c670
>>>              Name : Blyth:0  (local to host Blyth)
>>>     Creation Time : Tue Aug  4 23:47:57 2015
>>>        Raid Level : raid6
>>>      Raid Devices : 9
>>>
>>>    Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
>>>        Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
>>>     Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
>>>       Data Offset : 247808 sectors
>>>      Super Offset : 8 sectors
>>>      Unused Space : before=247728 sectors, after=14336 sectors
>>>             State : clean
>>>       Device UUID : 8ca60ad5:60d19333:11b24820:91453532
>>>
>>> Internal Bitmap : 8 sectors from superblock
>>>     Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
>>>     Delta Devices : 1 (8->9)
>>>
>>>       Update Time : Tue Jul 11 23:12:08 2023
>>>     Bad Block Log : 512 entries available at offset 24 sectors - bad
>>> blocks present.
>>>          Checksum : b6d8f4d1 - correct
>>>            Events : 181105
>>>
>>>            Layout : left-symmetric
>>>        Chunk Size : 512K
>>>
>>>      Device Role : Active device 7
>>>      Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
>>>
>>> $ sudo mdadm --examine /dev/sdb
>>> /dev/sdb:
>>>      MBR Magic : aa55
>>> Partition[0] :   4294967295 sectors at            1 (type ee)
>>> $ sudo mdadm --examine /dev/sdb1
>>> /dev/sdb1:
>>>             Magic : a92b4efc
>>>           Version : 1.2
>>>       Feature Map : 0x5
>>>        Array UUID : 440dc11e:079308b1:131eda79:9a74c670
>>>              Name : Blyth:0  (local to host Blyth)
>>>     Creation Time : Tue Aug  4 23:47:57 2015
>>>        Raid Level : raid6
>>>      Raid Devices : 9
>>>
>>>    Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
>>>        Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
>>>     Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
>>>       Data Offset : 247808 sectors
>>>      Super Offset : 8 sectors
>>>      Unused Space : before=247728 sectors, after=14336 sectors
>>>             State : clean
>>>       Device UUID : 386d3001:16447e43:4d2a5459:85618d11
>>>
>>> Internal Bitmap : 8 sectors from superblock
>>>     Reshape pos'n : 124207104 (118.45 GiB 127.19 GB)
>>>     Delta Devices : 1 (8->9)
>>>
>>>       Update Time : Tue Jul 11 00:02:59 2023
>>>     Bad Block Log : 512 entries available at offset 24 sectors
>>>          Checksum : b544a39 - correct
>>>            Events : 181077
>>>
>>>            Layout : left-symmetric
>>>        Chunk Size : 512K
>>>
>>>      Device Role : Active device 8
>>>      Array State : AAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
>>>
>>> $ sudo mdadm --examine /dev/sdc
>>> /dev/sdc:
>>>      MBR Magic : aa55
>>> Partition[0] :   4294967295 sectors at            1 (type ee)
>>> $ sudo mdadm --examine /dev/sdc1
>>> /dev/sdc1:
>>>             Magic : a92b4efc
>>>           Version : 1.2
>>>       Feature Map : 0xd
>>>        Array UUID : 440dc11e:079308b1:131eda79:9a74c670
>>>              Name : Blyth:0  (local to host Blyth)
>>>     Creation Time : Tue Aug  4 23:47:57 2015
>>>        Raid Level : raid6
>>>      Raid Devices : 9
>>>
>>>    Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
>>>        Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
>>>     Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
>>>       Data Offset : 247808 sectors
>>>      Super Offset : 8 sectors
>>>      Unused Space : before=247720 sectors, after=14336 sectors
>>>             State : clean
>>>       Device UUID : 1798ec4f:72c56905:4e74ea61:2468db75
>>>
>>> Internal Bitmap : 8 sectors from superblock
>>>     Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
>>>     Delta Devices : 1 (8->9)
>>>
>>>       Update Time : Tue Jul 11 23:12:08 2023
>>>     Bad Block Log : 512 entries available at offset 72 sectors - bad
>>> blocks present.
>>>          Checksum : 88d8b8fc - correct
>>>            Events : 181105
>>>
>>>            Layout : left-symmetric
>>>        Chunk Size : 512K
>>>
>>>      Device Role : Active device 4
>>>      Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
>>>
>>> $ sudo mdadm --examine /dev/sdd
>>> /dev/sdd:
>>>      MBR Magic : aa55
>>> Partition[0] :   4294967295 sectors at            1 (type ee)
>>> $ sudo mdadm --examine /dev/sdd1
>>> /dev/sdd1:
>>>             Magic : a92b4efc
>>>           Version : 1.2
>>>       Feature Map : 0x5
>>>        Array UUID : 440dc11e:079308b1:131eda79:9a74c670
>>>              Name : Blyth:0  (local to host Blyth)
>>>     Creation Time : Tue Aug  4 23:47:57 2015
>>>        Raid Level : raid6
>>>      Raid Devices : 9
>>>
>>>    Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
>>>        Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
>>>     Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
>>>       Data Offset : 247808 sectors
>>>      Super Offset : 8 sectors
>>>      Unused Space : before=247728 sectors, after=14336 sectors
>>>             State : clean
>>>       Device UUID : a198095b:f54d26a9:deb3be8f:d6de9be1
>>>
>>> Internal Bitmap : 8 sectors from superblock
>>>     Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
>>>     Delta Devices : 1 (8->9)
>>>
>>>       Update Time : Tue Jul 11 23:12:08 2023
>>>     Bad Block Log : 512 entries available at offset 24 sectors
>>>          Checksum : d1471d9d - correct
>>>            Events : 181105
>>>
>>>            Layout : left-symmetric
>>>        Chunk Size : 512K
>>>
>>>      Device Role : Active device 6
>>>      Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
>>>
>>> $ sudo mdadm --examine /dev/sde
>>> /dev/sde:
>>>      MBR Magic : aa55
>>> Partition[0] :   4294967295 sectors at            1 (type ee)
>>> $ sudo mdadm --examine /dev/sde1
>>> /dev/sde1:
>>>             Magic : a92b4efc
>>>           Version : 1.2
>>>       Feature Map : 0x5
>>>        Array UUID : 440dc11e:079308b1:131eda79:9a74c670
>>>              Name : Blyth:0  (local to host Blyth)
>>>     Creation Time : Tue Aug  4 23:47:57 2015
>>>        Raid Level : raid6
>>>      Raid Devices : 9
>>>
>>>    Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
>>>        Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
>>>     Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
>>>       Data Offset : 247808 sectors
>>>      Super Offset : 8 sectors
>>>      Unused Space : before=247720 sectors, after=14336 sectors
>>>             State : clean
>>>       Device UUID : acf7ba2e:35d2fa91:6b12b0ce:33a73af5
>>>
>>> Internal Bitmap : 8 sectors from superblock
>>>     Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
>>>     Delta Devices : 1 (8->9)
>>>
>>>       Update Time : Tue Jul 11 23:12:08 2023
>>>     Bad Block Log : 512 entries available at offset 72 sectors
>>>          Checksum : e05d0278 - correct
>>>            Events : 181105
>>>
>>>            Layout : left-symmetric
>>>        Chunk Size : 512K
>>>
>>>      Device Role : Active device 5
>>>      Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
>>>
>>> $ sudo mdadm --examine /dev/sdf
>>> /dev/sdf:
>>>      MBR Magic : aa55
>>> Partition[0] :   4294967295 sectors at            1 (type ee)
>>> $ sudo mdadm --examine /dev/sdf1
>>> /dev/sdf1:
>>>             Magic : a92b4efc
>>>           Version : 1.2
>>>       Feature Map : 0x5
>>>        Array UUID : 440dc11e:079308b1:131eda79:9a74c670
>>>              Name : Blyth:0  (local to host Blyth)
>>>     Creation Time : Tue Aug  4 23:47:57 2015
>>>        Raid Level : raid6
>>>      Raid Devices : 9
>>>
>>>    Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
>>>        Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
>>>     Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
>>>       Data Offset : 247808 sectors
>>>      Super Offset : 8 sectors
>>>      Unused Space : before=247720 sectors, after=14336 sectors
>>>             State : clean
>>>       Device UUID : 31e7b86d:c274ff45:aa6dab50:2ff058c6
>>>
>>> Internal Bitmap : 8 sectors from superblock
>>>     Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
>>>     Delta Devices : 1 (8->9)
>>>
>>>       Update Time : Tue Jul 11 23:12:08 2023
>>>     Bad Block Log : 512 entries available at offset 72 sectors
>>>          Checksum : 26792cc0 - correct
>>>            Events : 181105
>>>
>>>            Layout : left-symmetric
>>>        Chunk Size : 512K
>>>
>>>      Device Role : Active device 0
>>>      Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
>>>
>>> $ sudo mdadm --examine /dev/sdg
>>> /dev/sdg:
>>>      MBR Magic : aa55
>>> Partition[0] :   4294967295 sectors at            1 (type ee)
>>> $ sudo mdadm --examine /dev/sdg1
>>> /dev/sdg1:
>>>             Magic : a92b4efc
>>>           Version : 1.2
>>>       Feature Map : 0x5
>>>        Array UUID : 440dc11e:079308b1:131eda79:9a74c670
>>>              Name : Blyth:0  (local to host Blyth)
>>>     Creation Time : Tue Aug  4 23:47:57 2015
>>>        Raid Level : raid6
>>>      Raid Devices : 9
>>>
>>>    Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
>>>        Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
>>>     Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
>>>       Data Offset : 247808 sectors
>>>      Super Offset : 8 sectors
>>>      Unused Space : before=247720 sectors, after=14336 sectors
>>>             State : clean
>>>       Device UUID : 74476ce7:4edc23f6:08120711:ba281425
>>>
>>> Internal Bitmap : 8 sectors from superblock
>>>     Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
>>>     Delta Devices : 1 (8->9)
>>>
>>>       Update Time : Tue Jul 11 23:12:08 2023
>>>     Bad Block Log : 512 entries available at offset 72 sectors
>>>          Checksum : 6f67d179 - correct
>>>            Events : 181105
>>>
>>>            Layout : left-symmetric
>>>        Chunk Size : 512K
>>>
>>>      Device Role : Active device 1
>>>      Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
>>>
>>> $ sudo mdadm --examine /dev/sdh
>>> /dev/sdh:
>>>      MBR Magic : aa55
>>> Partition[0] :   4294967295 sectors at            1 (type ee)
>>> $ sudo mdadm --examine /dev/sdh1
>>> /dev/sdh1:
>>>             Magic : a92b4efc
>>>           Version : 1.2
>>>       Feature Map : 0xd
>>>        Array UUID : 440dc11e:079308b1:131eda79:9a74c670
>>>              Name : Blyth:0  (local to host Blyth)
>>>     Creation Time : Tue Aug  4 23:47:57 2015
>>>        Raid Level : raid6
>>>      Raid Devices : 9
>>>
>>>    Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
>>>        Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
>>>     Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
>>>       Data Offset : 247808 sectors
>>>      Super Offset : 8 sectors
>>>      Unused Space : before=247720 sectors, after=14336 sectors
>>>             State : clean
>>>       Device UUID : 31c08263:b135f0f5:763bc86b:f81d7296
>>>
>>> Internal Bitmap : 8 sectors from superblock
>>>     Reshape pos'n : 124207104 (118.45 GiB 127.19 GB)
>>>     Delta Devices : 1 (8->9)
>>>
>>>       Update Time : Tue Jul 11 20:09:14 2023
>>>     Bad Block Log : 512 entries available at offset 72 sectors - bad
>>> blocks present.
>>>          Checksum : b7696b68 - correct
>>>            Events : 181089
>>>
>>>            Layout : left-symmetric
>>>        Chunk Size : 512K
>>>
>>>      Device Role : Active device 2
>>>      Array State : AAAAAAAA. ('A' == active, '.' == missing, 'R' == replacing)
>>>
>>> $ sudo mdadm --examine /dev/sdi
>>> /dev/sdi:
>>>      MBR Magic : aa55
>>> Partition[0] :   4294967295 sectors at            1 (type ee)
>>> $ sudo mdadm --examine /dev/sdi1
>>> /dev/sdi1:
>>>             Magic : a92b4efc
>>>           Version : 1.2
>>>       Feature Map : 0x5
>>>        Array UUID : 440dc11e:079308b1:131eda79:9a74c670
>>>              Name : Blyth:0  (local to host Blyth)
>>>     Creation Time : Tue Aug  4 23:47:57 2015
>>>        Raid Level : raid6
>>>      Raid Devices : 9
>>>
>>>    Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
>>>        Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
>>>     Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
>>>       Data Offset : 247808 sectors
>>>      Super Offset : 8 sectors
>>>      Unused Space : before=247720 sectors, after=14336 sectors
>>>             State : clean
>>>       Device UUID : ac1063fc:d9d66e6d:f3de33da:b396f483
>>>
>>> Internal Bitmap : 8 sectors from superblock
>>>     Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
>>>     Delta Devices : 1 (8->9)
>>>
>>>       Update Time : Tue Jul 11 23:12:08 2023
>>>     Bad Block Log : 512 entries available at offset 72 sectors
>>>          Checksum : 23b6d024 - correct
>>>            Events : 181105
>>>
>>>            Layout : left-symmetric
>>>        Chunk Size : 512K
>>>
>>>      Device Role : Active device 3
>>>      Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
>>>
>>> $ sudo mdadm --detail /dev/md0
>>> /dev/md0:
>>>              Version : 1.2
>>>           Raid Level : raid6
>>>        Total Devices : 9
>>>          Persistence : Superblock is persistent
>>>
>>>                State : inactive
>>>      Working Devices : 9
>>>
>>>        Delta Devices : 1, (-1->0)
>>>            New Level : raid6
>>>           New Layout : left-symmetric
>>>        New Chunksize : 512K
>>>
>>>                 Name : Blyth:0  (local to host Blyth)
>>>                 UUID : 440dc11e:079308b1:131eda79:9a74c670
>>>               Events : 181105
>>>
>>>       Number   Major   Minor   RaidDevice
>>>
>>>          -       8        1        -        /dev/sda1
>>>          -       8      129        -        /dev/sdi1
>>>          -       8      113        -        /dev/sdh1
>>>          -       8       97        -        /dev/sdg1
>>>          -       8       81        -        /dev/sdf1
>>>          -       8       65        -        /dev/sde1
>>>          -       8       49        -        /dev/sdd1
>>>          -       8       33        -        /dev/sdc1
>>>          -       8       17        -        /dev/sdb1
>>>
>>> $ cat /proc/mdstat
>>> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
>>> [raid4] [raid10]
>>> md0 : inactive sdb1[9](S) sdi1[4](S) sdf1[0](S) sdg1[1](S) sdh1[3](S)
>>> sda1[8](S) sdd1[7](S) sdc1[6](S) sde1[5](S)
>>>         26353689600 blocks super 1.2
>>>
>>> unused devices: <none>
>>>
>>> .
>>>
>>
> 
> .
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Reshape Failure
  2023-09-05  1:07     ` Yu Kuai
@ 2023-09-06 14:05       ` Jason Moss
  2023-09-07  1:38         ` Yu Kuai
  0 siblings, 1 reply; 21+ messages in thread
From: Jason Moss @ 2023-09-06 14:05 UTC (permalink / raw)
  To: Yu Kuai; +Cc: linux-raid, yangerkun@huawei.com, yukuai (C)

Hi Kuai,

I ended up using gdb rather than addr2line, as that output didn't give
me the global offset. Maybe there's a better way, but this seems to be
similar to what I expected.

(gdb) list *(reshape_request+0x416)
0x11566 is in reshape_request (drivers/md/raid5.c:6396).
6391            if ((mddev->reshape_backwards
6392                 ? (safepos > writepos && readpos < writepos)
6393                 : (safepos < writepos && readpos > writepos)) ||
6394                time_after(jiffies, conf->reshape_checkpoint + 10*HZ)) {
6395                    /* Cannot proceed until we've updated the superblock... */
6396                    wait_event(conf->wait_for_overlap,
6397                               atomic_read(&conf->reshape_stripes)==0
6398                               || test_bit(MD_RECOVERY_INTR, &mddev->recovery));
6399                    if (atomic_read(&conf->reshape_stripes) != 0)
6400                            return 0;

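(For reference, gdb is just doing the symbol lookup and arithmetic here:
in this build reshape_request starts at 0x11150, and 0x11150 + 0x416 =
0x11566, which is why "list *(reshape_request+0x416)" lands on
drivers/md/raid5.c:6396. Something like

$ addr2line -f -i --section=.text -e raid456.ko 0x11566

would presumably report the same location, assuming the module was built
with debug info.)
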
Thanks

On Mon, Sep 4, 2023 at 6:08 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2023/09/05 0:38, Jason Moss wrote:
> > Hi Kuai,
> >
> > Thank you for the suggestion; I was previously on 5.15.0. I've built
> > an environment with 6.5.0.1 now and assembled the array there, but the
> > same problem happens. It reshaped for 20-30 seconds, then completely
> > stopped.
> >
> > Processes and /proc/<PID>/stack output:
> > root       24593  0.0  0.0      0     0 ?        I<   09:22   0:00 [raid5wq]
> > root       24594 96.5  0.0      0     0 ?        R    09:22   2:29 [md0_raid6]
> > root       24595  0.3  0.0      0     0 ?        D    09:22   0:00 [md0_reshape]
> >
> > [root@arch ~]# cat /proc/24593/stack
> > [<0>] rescuer_thread+0x2b0/0x3b0
> > [<0>] kthread+0xe8/0x120
> > [<0>] ret_from_fork+0x34/0x50
> > [<0>] ret_from_fork_asm+0x1b/0x30
> >
> > [root@arch ~]# cat /proc/24594/stack
> >
> > [root@arch ~]# cat /proc/24595/stack
> > [<0>] reshape_request+0x416/0x9f0 [raid456]
> Can you provide the addr2line result? Let's see where reshape_request()
> is stuck first.
>
> Thanks,
> Kuai
>
> > [<0>] raid5_sync_request+0x2fc/0x3d0 [raid456]
> > [<0>] md_do_sync+0x7d6/0x11d0 [md_mod]
> > [<0>] md_thread+0xae/0x190 [md_mod]
> > [<0>] kthread+0xe8/0x120
> > [<0>] ret_from_fork+0x34/0x50
> > [<0>] ret_from_fork_asm+0x1b/0x30
> >
> > Please let me know if there's a better way to provide the stack info.
> >
> > Thank you
> >
> > On Sun, Sep 3, 2023 at 6:41 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>
> >> Hi,
> >>
> >> On 2023/09/04 5:39, Jason Moss wrote:
> >>> Hello,
> >>>
> >>> I recently attempted to add a new drive to my 8-drive RAID 6 array,
> >>> growing it to 9 drives. I've done similar before with the same array,
> >>> having previously grown it from 6 drives to 7 and then from 7 to 8
> >>> with no issues. Drives are WD Reds, most older than 2019, some
> >>> (including the newest) newer, but all confirmed CMR and not SMR.
> >>>
> >>> Process used to expand the array:
> >>> mdadm --add /dev/md0 /dev/sdb1
> >>> mdadm --grow --raid-devices=9 --backup-file=/root/grow_md0.bak /dev/md0
> >>>
> >>> The reshape started off fine, the process was underway, and the volume
> >>> was still usable as expected. However, 15-30 minutes into the reshape,
> >>> I lost access to the contents of the drive. Checking /proc/mdstat, the
> >>> reshape was stopped at 0.6% with the counter not incrementing at all.
> >>> Any process accessing the array would just hang until killed. I waited
> >>
> >> What kernel version are you using? And it'll be very helpful if you can
> >> collect the stack of all stuck thread. There is a known deadlock for
> >> raid5 related to reshape, and it's fixed in v6.5:
> >>
> >> https://lore.kernel.org/r/20230512015610.821290-6-yukuai1@huaweicloud.com
> >>
> >>> a half hour and there was still no further change to the counter. At
> >>> this point, I restarted the server and found that when it came back up
> >>> it would begin reshaping again, but only very briefly, under 30
> >>> seconds, though the counter would be increasing during that time.
> >>>
> >>> I searched furiously for ideas and tried stopping and reassembling the
> >>> array, assembling with an invalid-backup flag, echoing "frozen" then
> >>> "reshape" to the sync_action file, and echoing "max" to the sync_max
> >>> file. Nothing ever seemed to make a difference.
> >>>
> >>
> >> Don't do this before v6.5; echoing "reshape" while a reshape is still in
> >> progress will corrupt your data:
> >>
> >> https://lore.kernel.org/r/20230512015610.821290-3-yukuai1@huaweicloud.com
> >>
> >> Thanks,
> >> Kuai
> >>
> >>> Here is where I slightly panicked, worried that I'd borked my array,
> >>> and powered off the server again and disconnected the new drive that
> >>> was just added, assuming that since it was the change, it may be the
> >>> problem despite having burn-in tested it, and figuring that I'll rush
> >>> order a new drive, so long as the reshape continues and I can just
> >>> rebuild onto a new drive once the reshape finishes. However, this made
> >>> no difference and the array continued to not rebuild.
> >>>
> >>> Much searching later, I'd found nothing substantially different than
> >>> I'd already tried and one of the common threads in other people's
> >>> issues was bad drives, so I ran a self-test against each of the
> >>> existing drives and found one drive that failed the read test.
> >>> Thinking I had the culprit now, I dropped that drive out of the array
> >>> and assembled the array again, but the same behavior persists. The
> >>> array reshapes very briefly, then completely stops.
> >>>
> >>> Down to 0 drives of redundancy (in the reshaped section at least), not
> >>> finding any new ideas on any of the forums, mailing list, wiki, etc,
> >>> and very frustrated, I took a break, bought all new drives to build a
> >>> new array in another server and restored from a backup. However, there
> >>> is still some data not captured by the most recent backup that I would
> >>> like to recover, and I'd also like to solve the problem purely to
> >>> understand what happened and how to recover in the future.
> >>>
> >>> Is there anything else I should try to recover this array, or is this
> >>> a lost cause?
> >>>
> >>> Details requested by the wiki to follow and I'm happy to collect any
> >>> further data that would assist. /dev/sdb is the new drive that was
> >>> added, then disconnected. /dev/sdh is the drive that failed a
> >>> self-test and was removed from the array.
> >>>
> >>> Thank you in advance for any help provided!
> >>>
> >>>
> >>> $ uname -a
> >>> Linux Blyth 5.15.0-76-generic #83-Ubuntu SMP Thu Jun 15 19:16:32 UTC
> >>> 2023 x86_64 x86_64 x86_64 GNU/Linux
> >>>
> >>> $ mdadm --version
> >>> mdadm - v4.2 - 2021-12-30
> >>>
> >>>
> >>> $ sudo smartctl -H -i -l scterc /dev/sda
> >>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> >>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >>>
> >>> === START OF INFORMATION SECTION ===
> >>> Model Family:     Western Digital Red
> >>> Device Model:     WDC WD30EFRX-68EUZN0
> >>> Serial Number:    WD-WCC4N7AT7R7X
> >>> LU WWN Device Id: 5 0014ee 268545f93
> >>> Firmware Version: 82.00A82
> >>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> >>> Sector Sizes:     512 bytes logical, 4096 bytes physical
> >>> Rotation Rate:    5400 rpm
> >>> Device is:        In smartctl database [for details use: -P show]
> >>> ATA Version is:   ACS-2 (minor revision not indicated)
> >>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> >>> Local Time is:    Sun Sep  3 13:27:55 2023 PDT
> >>> SMART support is: Available - device has SMART capability.
> >>> SMART support is: Enabled
> >>>
> >>> === START OF READ SMART DATA SECTION ===
> >>> SMART overall-health self-assessment test result: PASSED
> >>>
> >>> SCT Error Recovery Control:
> >>>              Read:     70 (7.0 seconds)
> >>>             Write:     70 (7.0 seconds)
> >>>
> >>> $ sudo smartctl -H -i -l scterc /dev/sda
> >>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> >>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >>>
> >>> === START OF INFORMATION SECTION ===
> >>> Model Family:     Western Digital Red
> >>> Device Model:     WDC WD30EFRX-68EUZN0
> >>> Serial Number:    WD-WCC4N7AT7R7X
> >>> LU WWN Device Id: 5 0014ee 268545f93
> >>> Firmware Version: 82.00A82
> >>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> >>> Sector Sizes:     512 bytes logical, 4096 bytes physical
> >>> Rotation Rate:    5400 rpm
> >>> Device is:        In smartctl database [for details use: -P show]
> >>> ATA Version is:   ACS-2 (minor revision not indicated)
> >>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> >>> Local Time is:    Sun Sep  3 13:28:16 2023 PDT
> >>> SMART support is: Available - device has SMART capability.
> >>> SMART support is: Enabled
> >>>
> >>> === START OF READ SMART DATA SECTION ===
> >>> SMART overall-health self-assessment test result: PASSED
> >>>
> >>> SCT Error Recovery Control:
> >>>              Read:     70 (7.0 seconds)
> >>>             Write:     70 (7.0 seconds)
> >>>
> >>> $ sudo smartctl -H -i -l scterc /dev/sdb
> >>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> >>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >>>
> >>> === START OF INFORMATION SECTION ===
> >>> Model Family:     Western Digital Red
> >>> Device Model:     WDC WD30EFRX-68EUZN0
> >>> Serial Number:    WD-WXG1A8UGLS42
> >>> LU WWN Device Id: 5 0014ee 2b75ef53b
> >>> Firmware Version: 80.00A80
> >>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> >>> Sector Sizes:     512 bytes logical, 4096 bytes physical
> >>> Rotation Rate:    5400 rpm
> >>> Device is:        In smartctl database [for details use: -P show]
> >>> ATA Version is:   ACS-2 (minor revision not indicated)
> >>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> >>> Local Time is:    Sun Sep  3 13:28:19 2023 PDT
> >>> SMART support is: Available - device has SMART capability.
> >>> SMART support is: Enabled
> >>>
> >>> === START OF READ SMART DATA SECTION ===
> >>> SMART overall-health self-assessment test result: PASSED
> >>>
> >>> SCT Error Recovery Control:
> >>>              Read:     70 (7.0 seconds)
> >>>             Write:     70 (7.0 seconds)
> >>>
> >>> $ sudo smartctl -H -i -l scterc /dev/sdc
> >>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> >>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >>>
> >>> === START OF INFORMATION SECTION ===
> >>> Model Family:     Western Digital Red
> >>> Device Model:     WDC WD30EFRX-68EUZN0
> >>> Serial Number:    WD-WCC4N4HYL32Y
> >>> LU WWN Device Id: 5 0014ee 2630752f8
> >>> Firmware Version: 82.00A82
> >>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> >>> Sector Sizes:     512 bytes logical, 4096 bytes physical
> >>> Rotation Rate:    5400 rpm
> >>> Device is:        In smartctl database [for details use: -P show]
> >>> ATA Version is:   ACS-2 (minor revision not indicated)
> >>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> >>> Local Time is:    Sun Sep  3 13:28:20 2023 PDT
> >>> SMART support is: Available - device has SMART capability.
> >>> SMART support is: Enabled
> >>>
> >>> === START OF READ SMART DATA SECTION ===
> >>> SMART overall-health self-assessment test result: PASSED
> >>>
> >>> SCT Error Recovery Control:
> >>>              Read:     70 (7.0 seconds)
> >>>             Write:     70 (7.0 seconds)
> >>>
> >>> $ sudo smartctl -H -i -l scterc /dev/sdd
> >>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> >>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >>>
> >>> === START OF INFORMATION SECTION ===
> >>> Model Family:     Western Digital Red
> >>> Device Model:     WDC WD30EFRX-68N32N0
> >>> Serial Number:    WD-WCC7K1FF6DYK
> >>> LU WWN Device Id: 5 0014ee 2ba952a30
> >>> Firmware Version: 82.00A82
> >>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> >>> Sector Sizes:     512 bytes logical, 4096 bytes physical
> >>> Rotation Rate:    5400 rpm
> >>> Form Factor:      3.5 inches
> >>> Device is:        In smartctl database [for details use: -P show]
> >>> ATA Version is:   ACS-3 T13/2161-D revision 5
> >>> SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
> >>> Local Time is:    Sun Sep  3 13:28:21 2023 PDT
> >>> SMART support is: Available - device has SMART capability.
> >>> SMART support is: Enabled
> >>>
> >>> === START OF READ SMART DATA SECTION ===
> >>> SMART overall-health self-assessment test result: PASSED
> >>>
> >>> SCT Error Recovery Control:
> >>>              Read:     70 (7.0 seconds)
> >>>             Write:     70 (7.0 seconds)
> >>>
> >>> $ sudo smartctl -H -i -l scterc /dev/sde
> >>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> >>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >>>
> >>> === START OF INFORMATION SECTION ===
> >>> Model Family:     Western Digital Red
> >>> Device Model:     WDC WD30EFRX-68EUZN0
> >>> Serial Number:    WD-WCC4N5ZHTRJF
> >>> LU WWN Device Id: 5 0014ee 2b88b83bb
> >>> Firmware Version: 82.00A82
> >>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> >>> Sector Sizes:     512 bytes logical, 4096 bytes physical
> >>> Rotation Rate:    5400 rpm
> >>> Device is:        In smartctl database [for details use: -P show]
> >>> ATA Version is:   ACS-2 (minor revision not indicated)
> >>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> >>> Local Time is:    Sun Sep  3 13:28:22 2023 PDT
> >>> SMART support is: Available - device has SMART capability.
> >>> SMART support is: Enabled
> >>>
> >>> === START OF READ SMART DATA SECTION ===
> >>> SMART overall-health self-assessment test result: PASSED
> >>>
> >>> SCT Error Recovery Control:
> >>>              Read:     70 (7.0 seconds)
> >>>             Write:     70 (7.0 seconds)
> >>>
> >>> $ sudo smartctl -H -i -l scterc /dev/sdf
> >>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> >>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >>>
> >>> === START OF INFORMATION SECTION ===
> >>> Model Family:     Western Digital Red
> >>> Device Model:     WDC WD30EFRX-68AX9N0
> >>> Serial Number:    WD-WMC1T3804790
> >>> LU WWN Device Id: 5 0014ee 6036b6826
> >>> Firmware Version: 80.00A80
> >>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> >>> Sector Sizes:     512 bytes logical, 4096 bytes physical
> >>> Device is:        In smartctl database [for details use: -P show]
> >>> ATA Version is:   ACS-2 (minor revision not indicated)
> >>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> >>> Local Time is:    Sun Sep  3 13:28:23 2023 PDT
> >>> SMART support is: Available - device has SMART capability.
> >>> SMART support is: Enabled
> >>>
> >>> === START OF READ SMART DATA SECTION ===
> >>> SMART overall-health self-assessment test result: PASSED
> >>>
> >>> SCT Error Recovery Control:
> >>>              Read:     70 (7.0 seconds)
> >>>             Write:     70 (7.0 seconds)
> >>>
> >>> $ sudo smartctl -H -i -l scterc /dev/sdg
> >>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> >>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >>>
> >>> === START OF INFORMATION SECTION ===
> >>> Model Family:     Western Digital Red
> >>> Device Model:     WDC WD30EFRX-68EUZN0
> >>> Serial Number:    WD-WMC4N0H692Z9
> >>> LU WWN Device Id: 5 0014ee 65af39740
> >>> Firmware Version: 82.00A82
> >>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> >>> Sector Sizes:     512 bytes logical, 4096 bytes physical
> >>> Rotation Rate:    5400 rpm
> >>> Device is:        In smartctl database [for details use: -P show]
> >>> ATA Version is:   ACS-2 (minor revision not indicated)
> >>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> >>> Local Time is:    Sun Sep  3 13:28:24 2023 PDT
> >>> SMART support is: Available - device has SMART capability.
> >>> SMART support is: Enabled
> >>>
> >>> === START OF READ SMART DATA SECTION ===
> >>> SMART overall-health self-assessment test result: PASSED
> >>>
> >>> SCT Error Recovery Control:
> >>>              Read:     70 (7.0 seconds)
> >>>             Write:     70 (7.0 seconds)
> >>>
> >>> $ sudo smartctl -H -i -l scterc /dev/sdh
> >>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> >>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >>>
> >>> === START OF INFORMATION SECTION ===
> >>> Model Family:     Western Digital Red
> >>> Device Model:     WDC WD30EFRX-68EUZN0
> >>> Serial Number:    WD-WMC4N0K5S750
> >>> LU WWN Device Id: 5 0014ee 6b048d9ca
> >>> Firmware Version: 82.00A82
> >>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> >>> Sector Sizes:     512 bytes logical, 4096 bytes physical
> >>> Rotation Rate:    5400 rpm
> >>> Device is:        In smartctl database [for details use: -P show]
> >>> ATA Version is:   ACS-2 (minor revision not indicated)
> >>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> >>> Local Time is:    Sun Sep  3 13:28:24 2023 PDT
> >>> SMART support is: Available - device has SMART capability.
> >>> SMART support is: Enabled
> >>>
> >>> === START OF READ SMART DATA SECTION ===
> >>> SMART overall-health self-assessment test result: PASSED
> >>>
> >>> SCT Error Recovery Control:
> >>>              Read:     70 (7.0 seconds)
> >>>             Write:     70 (7.0 seconds)
> >>>
> >>> $ sudo smartctl -H -i -l scterc /dev/sdi
> >>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> >>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >>>
> >>> === START OF INFORMATION SECTION ===
> >>> Model Family:     Western Digital Red
> >>> Device Model:     WDC WD30EFRX-68AX9N0
> >>> Serial Number:    WD-WMC1T1502475
> >>> LU WWN Device Id: 5 0014ee 058d2e5cb
> >>> Firmware Version: 80.00A80
> >>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> >>> Sector Sizes:     512 bytes logical, 4096 bytes physical
> >>> Device is:        In smartctl database [for details use: -P show]
> >>> ATA Version is:   ACS-2 (minor revision not indicated)
> >>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> >>> Local Time is:    Sun Sep  3 13:28:27 2023 PDT
> >>> SMART support is: Available - device has SMART capability.
> >>> SMART support is: Enabled
> >>>
> >>> === START OF READ SMART DATA SECTION ===
> >>> SMART overall-health self-assessment test result: PASSED
> >>>
> >>> SCT Error Recovery Control:
> >>>              Read:     70 (7.0 seconds)
> >>>             Write:     70 (7.0 seconds)
> >>>
> >>>
> >>> $ sudo mdadm --examine /dev/sda
> >>> /dev/sda:
> >>>      MBR Magic : aa55
> >>> Partition[0] :   4294967295 sectors at            1 (type ee)
> >>> $ sudo mdadm --examine /dev/sda1
> >>> /dev/sda1:
> >>>             Magic : a92b4efc
> >>>           Version : 1.2
> >>>       Feature Map : 0xd
> >>>        Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>              Name : Blyth:0  (local to host Blyth)
> >>>     Creation Time : Tue Aug  4 23:47:57 2015
> >>>        Raid Level : raid6
> >>>      Raid Devices : 9
> >>>
> >>>    Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
> >>>        Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>     Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>       Data Offset : 247808 sectors
> >>>      Super Offset : 8 sectors
> >>>      Unused Space : before=247728 sectors, after=14336 sectors
> >>>             State : clean
> >>>       Device UUID : 8ca60ad5:60d19333:11b24820:91453532
> >>>
> >>> Internal Bitmap : 8 sectors from superblock
> >>>     Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
> >>>     Delta Devices : 1 (8->9)
> >>>
> >>>       Update Time : Tue Jul 11 23:12:08 2023
> >>>     Bad Block Log : 512 entries available at offset 24 sectors - bad
> >>> blocks present.
> >>>          Checksum : b6d8f4d1 - correct
> >>>            Events : 181105
> >>>
> >>>            Layout : left-symmetric
> >>>        Chunk Size : 512K
> >>>
> >>>      Device Role : Active device 7
> >>>      Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >>>
> >>> $ sudo mdadm --examine /dev/sdb
> >>> /dev/sdb:
> >>>      MBR Magic : aa55
> >>> Partition[0] :   4294967295 sectors at            1 (type ee)
> >>> $ sudo mdadm --examine /dev/sdb1
> >>> /dev/sdb1:
> >>>             Magic : a92b4efc
> >>>           Version : 1.2
> >>>       Feature Map : 0x5
> >>>        Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>              Name : Blyth:0  (local to host Blyth)
> >>>     Creation Time : Tue Aug  4 23:47:57 2015
> >>>        Raid Level : raid6
> >>>      Raid Devices : 9
> >>>
> >>>    Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
> >>>        Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>     Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>       Data Offset : 247808 sectors
> >>>      Super Offset : 8 sectors
> >>>      Unused Space : before=247728 sectors, after=14336 sectors
> >>>             State : clean
> >>>       Device UUID : 386d3001:16447e43:4d2a5459:85618d11
> >>>
> >>> Internal Bitmap : 8 sectors from superblock
> >>>     Reshape pos'n : 124207104 (118.45 GiB 127.19 GB)
> >>>     Delta Devices : 1 (8->9)
> >>>
> >>>       Update Time : Tue Jul 11 00:02:59 2023
> >>>     Bad Block Log : 512 entries available at offset 24 sectors
> >>>          Checksum : b544a39 - correct
> >>>            Events : 181077
> >>>
> >>>            Layout : left-symmetric
> >>>        Chunk Size : 512K
> >>>
> >>>      Device Role : Active device 8
> >>>      Array State : AAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
> >>>
> >>> $ sudo mdadm --examine /dev/sdc
> >>> /dev/sdc:
> >>>      MBR Magic : aa55
> >>> Partition[0] :   4294967295 sectors at            1 (type ee)
> >>> $ sudo mdadm --examine /dev/sdc1
> >>> /dev/sdc1:
> >>>             Magic : a92b4efc
> >>>           Version : 1.2
> >>>       Feature Map : 0xd
> >>>        Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>              Name : Blyth:0  (local to host Blyth)
> >>>     Creation Time : Tue Aug  4 23:47:57 2015
> >>>        Raid Level : raid6
> >>>      Raid Devices : 9
> >>>
> >>>    Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
> >>>        Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>     Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>       Data Offset : 247808 sectors
> >>>      Super Offset : 8 sectors
> >>>      Unused Space : before=247720 sectors, after=14336 sectors
> >>>             State : clean
> >>>       Device UUID : 1798ec4f:72c56905:4e74ea61:2468db75
> >>>
> >>> Internal Bitmap : 8 sectors from superblock
> >>>     Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
> >>>     Delta Devices : 1 (8->9)
> >>>
> >>>       Update Time : Tue Jul 11 23:12:08 2023
> >>>     Bad Block Log : 512 entries available at offset 72 sectors - bad
> >>> blocks present.
> >>>          Checksum : 88d8b8fc - correct
> >>>            Events : 181105
> >>>
> >>>            Layout : left-symmetric
> >>>        Chunk Size : 512K
> >>>
> >>>      Device Role : Active device 4
> >>>      Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >>>
> >>> $ sudo mdadm --examine /dev/sdd
> >>> /dev/sdd:
> >>>      MBR Magic : aa55
> >>> Partition[0] :   4294967295 sectors at            1 (type ee)
> >>> $ sudo mdadm --examine /dev/sdd1
> >>> /dev/sdd1:
> >>>             Magic : a92b4efc
> >>>           Version : 1.2
> >>>       Feature Map : 0x5
> >>>        Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>              Name : Blyth:0  (local to host Blyth)
> >>>     Creation Time : Tue Aug  4 23:47:57 2015
> >>>        Raid Level : raid6
> >>>      Raid Devices : 9
> >>>
> >>>    Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
> >>>        Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>     Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>       Data Offset : 247808 sectors
> >>>      Super Offset : 8 sectors
> >>>      Unused Space : before=247728 sectors, after=14336 sectors
> >>>             State : clean
> >>>       Device UUID : a198095b:f54d26a9:deb3be8f:d6de9be1
> >>>
> >>> Internal Bitmap : 8 sectors from superblock
> >>>     Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
> >>>     Delta Devices : 1 (8->9)
> >>>
> >>>       Update Time : Tue Jul 11 23:12:08 2023
> >>>     Bad Block Log : 512 entries available at offset 24 sectors
> >>>          Checksum : d1471d9d - correct
> >>>            Events : 181105
> >>>
> >>>            Layout : left-symmetric
> >>>        Chunk Size : 512K
> >>>
> >>>      Device Role : Active device 6
> >>>      Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >>>
> >>> $ sudo mdadm --examine /dev/sde
> >>> /dev/sde:
> >>>      MBR Magic : aa55
> >>> Partition[0] :   4294967295 sectors at            1 (type ee)
> >>> $ sudo mdadm --examine /dev/sde1
> >>> /dev/sde1:
> >>>             Magic : a92b4efc
> >>>           Version : 1.2
> >>>       Feature Map : 0x5
> >>>        Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>              Name : Blyth:0  (local to host Blyth)
> >>>     Creation Time : Tue Aug  4 23:47:57 2015
> >>>        Raid Level : raid6
> >>>      Raid Devices : 9
> >>>
> >>>    Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
> >>>        Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>     Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>       Data Offset : 247808 sectors
> >>>      Super Offset : 8 sectors
> >>>      Unused Space : before=247720 sectors, after=14336 sectors
> >>>             State : clean
> >>>       Device UUID : acf7ba2e:35d2fa91:6b12b0ce:33a73af5
> >>>
> >>> Internal Bitmap : 8 sectors from superblock
> >>>     Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
> >>>     Delta Devices : 1 (8->9)
> >>>
> >>>       Update Time : Tue Jul 11 23:12:08 2023
> >>>     Bad Block Log : 512 entries available at offset 72 sectors
> >>>          Checksum : e05d0278 - correct
> >>>            Events : 181105
> >>>
> >>>            Layout : left-symmetric
> >>>        Chunk Size : 512K
> >>>
> >>>      Device Role : Active device 5
> >>>      Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >>>
> >>> $ sudo mdadm --examine /dev/sdf
> >>> /dev/sdf:
> >>>      MBR Magic : aa55
> >>> Partition[0] :   4294967295 sectors at            1 (type ee)
> >>> $ sudo mdadm --examine /dev/sdf1
> >>> /dev/sdf1:
> >>>             Magic : a92b4efc
> >>>           Version : 1.2
> >>>       Feature Map : 0x5
> >>>        Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>              Name : Blyth:0  (local to host Blyth)
> >>>     Creation Time : Tue Aug  4 23:47:57 2015
> >>>        Raid Level : raid6
> >>>      Raid Devices : 9
> >>>
> >>>    Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
> >>>        Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>     Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>       Data Offset : 247808 sectors
> >>>      Super Offset : 8 sectors
> >>>      Unused Space : before=247720 sectors, after=14336 sectors
> >>>             State : clean
> >>>       Device UUID : 31e7b86d:c274ff45:aa6dab50:2ff058c6
> >>>
> >>> Internal Bitmap : 8 sectors from superblock
> >>>     Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
> >>>     Delta Devices : 1 (8->9)
> >>>
> >>>       Update Time : Tue Jul 11 23:12:08 2023
> >>>     Bad Block Log : 512 entries available at offset 72 sectors
> >>>          Checksum : 26792cc0 - correct
> >>>            Events : 181105
> >>>
> >>>            Layout : left-symmetric
> >>>        Chunk Size : 512K
> >>>
> >>>      Device Role : Active device 0
> >>>      Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >>>
> >>> $ sudo mdadm --examine /dev/sdg
> >>> /dev/sdg:
> >>>      MBR Magic : aa55
> >>> Partition[0] :   4294967295 sectors at            1 (type ee)
> >>> $ sudo mdadm --examine /dev/sdg1
> >>> /dev/sdg1:
> >>>             Magic : a92b4efc
> >>>           Version : 1.2
> >>>       Feature Map : 0x5
> >>>        Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>              Name : Blyth:0  (local to host Blyth)
> >>>     Creation Time : Tue Aug  4 23:47:57 2015
> >>>        Raid Level : raid6
> >>>      Raid Devices : 9
> >>>
> >>>    Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
> >>>        Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>     Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>       Data Offset : 247808 sectors
> >>>      Super Offset : 8 sectors
> >>>      Unused Space : before=247720 sectors, after=14336 sectors
> >>>             State : clean
> >>>       Device UUID : 74476ce7:4edc23f6:08120711:ba281425
> >>>
> >>> Internal Bitmap : 8 sectors from superblock
> >>>     Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
> >>>     Delta Devices : 1 (8->9)
> >>>
> >>>       Update Time : Tue Jul 11 23:12:08 2023
> >>>     Bad Block Log : 512 entries available at offset 72 sectors
> >>>          Checksum : 6f67d179 - correct
> >>>            Events : 181105
> >>>
> >>>            Layout : left-symmetric
> >>>        Chunk Size : 512K
> >>>
> >>>      Device Role : Active device 1
> >>>      Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >>>
> >>> $ sudo mdadm --examine /dev/sdh
> >>> /dev/sdh:
> >>>      MBR Magic : aa55
> >>> Partition[0] :   4294967295 sectors at            1 (type ee)
> >>> $ sudo mdadm --examine /dev/sdh1
> >>> /dev/sdh1:
> >>>             Magic : a92b4efc
> >>>           Version : 1.2
> >>>       Feature Map : 0xd
> >>>        Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>              Name : Blyth:0  (local to host Blyth)
> >>>     Creation Time : Tue Aug  4 23:47:57 2015
> >>>        Raid Level : raid6
> >>>      Raid Devices : 9
> >>>
> >>>    Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
> >>>        Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>     Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>       Data Offset : 247808 sectors
> >>>      Super Offset : 8 sectors
> >>>      Unused Space : before=247720 sectors, after=14336 sectors
> >>>             State : clean
> >>>       Device UUID : 31c08263:b135f0f5:763bc86b:f81d7296
> >>>
> >>> Internal Bitmap : 8 sectors from superblock
> >>>     Reshape pos'n : 124207104 (118.45 GiB 127.19 GB)
> >>>     Delta Devices : 1 (8->9)
> >>>
> >>>       Update Time : Tue Jul 11 20:09:14 2023
> >>>     Bad Block Log : 512 entries available at offset 72 sectors - bad
> >>> blocks present.
> >>>          Checksum : b7696b68 - correct
> >>>            Events : 181089
> >>>
> >>>            Layout : left-symmetric
> >>>        Chunk Size : 512K
> >>>
> >>>      Device Role : Active device 2
> >>>      Array State : AAAAAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >>>
> >>> $ sudo mdadm --examine /dev/sdi
> >>> /dev/sdi:
> >>>      MBR Magic : aa55
> >>> Partition[0] :   4294967295 sectors at            1 (type ee)
> >>> $ sudo mdadm --examine /dev/sdi1
> >>> /dev/sdi1:
> >>>             Magic : a92b4efc
> >>>           Version : 1.2
> >>>       Feature Map : 0x5
> >>>        Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>              Name : Blyth:0  (local to host Blyth)
> >>>     Creation Time : Tue Aug  4 23:47:57 2015
> >>>        Raid Level : raid6
> >>>      Raid Devices : 9
> >>>
> >>>    Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
> >>>        Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>     Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>       Data Offset : 247808 sectors
> >>>      Super Offset : 8 sectors
> >>>      Unused Space : before=247720 sectors, after=14336 sectors
> >>>             State : clean
> >>>       Device UUID : ac1063fc:d9d66e6d:f3de33da:b396f483
> >>>
> >>> Internal Bitmap : 8 sectors from superblock
> >>>     Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
> >>>     Delta Devices : 1 (8->9)
> >>>
> >>>       Update Time : Tue Jul 11 23:12:08 2023
> >>>     Bad Block Log : 512 entries available at offset 72 sectors
> >>>          Checksum : 23b6d024 - correct
> >>>            Events : 181105
> >>>
> >>>            Layout : left-symmetric
> >>>        Chunk Size : 512K
> >>>
> >>>      Device Role : Active device 3
> >>>      Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >>>
> >>> $ sudo mdadm --detail /dev/md0
> >>> /dev/md0:
> >>>              Version : 1.2
> >>>           Raid Level : raid6
> >>>        Total Devices : 9
> >>>          Persistence : Superblock is persistent
> >>>
> >>>                State : inactive
> >>>      Working Devices : 9
> >>>
> >>>        Delta Devices : 1, (-1->0)
> >>>            New Level : raid6
> >>>           New Layout : left-symmetric
> >>>        New Chunksize : 512K
> >>>
> >>>                 Name : Blyth:0  (local to host Blyth)
> >>>                 UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>               Events : 181105
> >>>
> >>>       Number   Major   Minor   RaidDevice
> >>>
> >>>          -       8        1        -        /dev/sda1
> >>>          -       8      129        -        /dev/sdi1
> >>>          -       8      113        -        /dev/sdh1
> >>>          -       8       97        -        /dev/sdg1
> >>>          -       8       81        -        /dev/sdf1
> >>>          -       8       65        -        /dev/sde1
> >>>          -       8       49        -        /dev/sdd1
> >>>          -       8       33        -        /dev/sdc1
> >>>          -       8       17        -        /dev/sdb1
> >>>
> >>> $ cat /proc/mdstat
> >>> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
> >>> [raid4] [raid10]
> >>> md0 : inactive sdb1[9](S) sdi1[4](S) sdf1[0](S) sdg1[1](S) sdh1[3](S)
> >>> sda1[8](S) sdd1[7](S) sdc1[6](S) sde1[5](S)
> >>>         26353689600 blocks super 1.2
> >>>
> >>> unused devices: <none>
> >>>
> >>> .
> >>>
> >>
> >
> > .
> >
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Reshape Failure
  2023-09-06 14:05       ` Jason Moss
@ 2023-09-07  1:38         ` Yu Kuai
  2023-09-07  5:44           ` Jason Moss
  0 siblings, 1 reply; 21+ messages in thread
From: Yu Kuai @ 2023-09-07  1:38 UTC (permalink / raw)
  To: Jason Moss, Yu Kuai; +Cc: linux-raid, yangerkun@huawei.com, yukuai (C)

Hi,

On 2023/09/06 22:05, Jason Moss wrote:
> Hi Kuai,
> 
> I ended up using gdb rather than addr2line, as that output didn't give
> me the global offset. Maybe there's a better way, but this seems to be
> similar to what I expected.

It's ok.
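
As an aside, addr2line can usually be made to work on a module offset
like reshape_request+0x416 as well, if the symbol's base address
reported by nm is added to it first. A rough, untested sketch, assuming
raid456.ko was built with debug info:

base=$(nm raid456.ko | awk '$3 == "reshape_request" { print "0x" $1 }')
addr2line -e raid456.ko -f -i $(printf '0x%x' $((base + 0x416)))

gdb's "list *(reshape_request+0x416)" does the same arithmetic
internally, so either approach works.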
> 
> (gdb) list *(reshape_request+0x416)
> 0x11566 is in reshape_request (drivers/md/raid5.c:6396).
> 6391            if ((mddev->reshape_backwards
> 6392                 ? (safepos > writepos && readpos < writepos)
> 6393                 : (safepos < writepos && readpos > writepos)) ||
> 6394                time_after(jiffies, conf->reshape_checkpoint + 10*HZ)) {
> 6395                    /* Cannot proceed until we've updated the
> superblock... */
> 6396                    wait_event(conf->wait_for_overlap,
> 6397                               atomic_read(&conf->reshape_stripes)==0
> 6398                               || test_bit(MD_RECOVERY_INTR,

If reshape is stuck here, that means one of two things:

1) Either reshape io is stuck somewhere and never completes;
2) Or the reshape_stripes counter is broken.

Can you read the following debugfs files to verify whether io is stuck
in the underlying disks?

/sys/kernel/debug/block/[disk]/hctx*/{sched_tags,tags,busy,dispatch}
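
For example, a small loop along these lines could dump them all in one
go (run as root; the device names below are only placeholders and need
to be replaced with the actual member disks of the array):

for d in sda sdb sdd sdf sdh sdi sdj; do
        for f in /sys/kernel/debug/block/$d/hctx*/{sched_tags,tags,busy,dispatch}; do
                echo "== $f =="
                cat "$f"
        done
done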

Furthermore, echoing "frozen" should break out of the above wait_event()
because 'MD_RECOVERY_INTR' will be set; however, based on your
description, the problem still exists. Can you collect the stack and
addr2line result of the stuck thread after the echo "frozen"?
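
Something along these lines should be enough to capture that (assuming
the array is md0 and the reshape thread is named md0_reshape, as in
your earlier ps output):

# echo frozen > /sys/block/md0/md/sync_action
# cat /proc/$(pgrep md0_reshape)/stack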

Thanks,
Kuai

> &mddev->recovery));
> 6399                    if (atomic_read(&conf->reshape_stripes) != 0)
> 6400                            return 0;
> 
> Thanks
> 
> On Mon, Sep 4, 2023 at 6:08 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> Hi,
>>
>> On 2023/09/05 0:38, Jason Moss wrote:
>>> Hi Kuai,
>>>
>>> Thank you for the suggestion, I was previously on 5.15.0. I've built
>>> an environment with 6.5.0.1 now and assembled the array there, but the
>>> same problem happens. It reshaped for 20-30 seconds, then completely
>>> stopped.
>>>
>>> Processes and /proc/<PID>/stack output:
>>> root       24593  0.0  0.0      0     0 ?        I<   09:22   0:00 [raid5wq]
>>> root       24594 96.5  0.0      0     0 ?        R    09:22   2:29 [md0_raid6]
>>> root       24595  0.3  0.0      0     0 ?        D    09:22   0:00 [md0_reshape]
>>>
>>> [root@arch ~]# cat /proc/24593/stack
>>> [<0>] rescuer_thread+0x2b0/0x3b0
>>> [<0>] kthread+0xe8/0x120
>>> [<0>] ret_from_fork+0x34/0x50
>>> [<0>] ret_from_fork_asm+0x1b/0x30
>>>
>>> [root@arch ~]# cat /proc/24594/stack
>>>
>>> [root@arch ~]# cat /proc/24595/stack
>>> [<0>] reshape_request+0x416/0x9f0 [raid456]
>> Can you provide the addr2line result? Let's see where reshape_request()
>> is stuck first.
>>
>> Thanks,
>> Kuai
>>
>>> [<0>] raid5_sync_request+0x2fc/0x3d0 [raid456]
>>> [<0>] md_do_sync+0x7d6/0x11d0 [md_mod]
>>> [<0>] md_thread+0xae/0x190 [md_mod]
>>> [<0>] kthread+0xe8/0x120
>>> [<0>] ret_from_fork+0x34/0x50
>>> [<0>] ret_from_fork_asm+0x1b/0x30
>>>
>>> Please let me know if there's a better way to provide the stack info.
>>>
>>> Thank you
>>>
>>> On Sun, Sep 3, 2023 at 6:41 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On 2023/09/04 5:39, Jason Moss wrote:
>>>>> Hello,
>>>>>
>>>>> I recently attempted to add a new drive to my 8-drive RAID 6 array,
>>>>> growing it to 9 drives. I've done similar before with the same array,
>>>>> having previously grown it from 6 drives to 7 and then from 7 to 8
>>>>> with no issues. Drives are WD Reds, most older than 2019, some
>>>>> (including the newest) newer, but all confirmed CMR and not SMR.
>>>>>
>>>>> Process used to expand the array:
>>>>> mdadm --add /dev/md0 /dev/sdb1
>>>>> mdadm --grow --raid-devices=9 --backup-file=/root/grow_md0.bak /dev/md0
>>>>>
>>>>> The reshape started off fine, the process was underway, and the volume
>>>>> was still usable as expected. However, 15-30 minutes into the reshape,
>>>>> I lost access to the contents of the drive. Checking /proc/mdstat, the
>>>>> reshape was stopped at 0.6% with the counter not incrementing at all.
>>>>> Any process accessing the array would just hang until killed. I waited
>>>>
>>>> What kernel version are you using? And it'll be very helpful if you can
>>>> collect the stack of all stuck thread. There is a known deadlock for
>>>> raid5 related to reshape, and it's fixed in v6.5:
>>>>
>>>> https://lore.kernel.org/r/20230512015610.821290-6-yukuai1@huaweicloud.com
>>>>
>>>>> a half hour and there was still no further change to the counter. At
>>>>> this point, I restarted the server and found that when it came back up
>>>>> it would begin reshaping again, but only very briefly, under 30
>>>>> seconds, but the counter would be increasing during that time.
>>>>>
>>>>> I searched furiously for ideas and tried stopping and reassembling the
>>>>> array, assembling with an invalid-backup flag, echoing "frozen" then
>>>>> "reshape" to the sync_action file, and echoing "max" to the sync_max
>>>>> file. Nothing ever seemed to make a difference.
>>>>>
>>>>
>>>> Don't do this before v6.5, echo "reshape" while reshape is still in
>>>> progress will corrupt your data:
>>>>
>>>> https://lore.kernel.org/r/20230512015610.821290-3-yukuai1@huaweicloud.com
>>>>
>>>> Thanks,
>>>> Kuai
>>>>
>>>>> Here is where I slightly panicked, worried that I'd borked my array,
>>>>> and powered off the server again and disconnected the new drive that
>>>>> was just added, assuming that since it was the change, it may be the
>>>>> problem despite having burn-in tested it, and figuring that I'll rush
>>>>> order a new drive, so long as the reshape continues and I can just
>>>>> rebuild onto a new drive once the reshape finishes. However, this made
>>>>> no difference and the array continued to not rebuild.
>>>>>
>>>>> Much searching later, I'd found nothing substantially different then
>>>>> I'd already tried and one of the common threads in other people's
>>>>> issues was bad drives, so I ran a self-test against each of the
>>>>> existing drives and found one drive that failed the read test.
>>>>> Thinking I had the culprit now, I dropped that drive out of the array
>>>>> and assembled the array again, but the same behavior persists. The
>>>>> array reshapes very briefly, then completely stops.
>>>>>
>>>>> Down to 0 drives of redundancy (in the reshaped section at least), not
>>>>> finding any new ideas on any of the forums, mailing list, wiki, etc,
>>>>> and very frustrated, I took a break, bought all new drives to build a
>>>>> new array in another server and restored from a backup. However, there
>>>>> is still some data not captured by the most recent backup that I would
>>>>> like to recover, and I'd also like to solve the problem purely to
>>>>> understand what happened and how to recover in the future.
>>>>>
>>>>> Is there anything else I should try to recover this array, or is this
>>>>> a lost cause?
>>>>>
>>>>> Details requested by the wiki to follow and I'm happy to collect any
>>>>> further data that would assist. /dev/sdb is the new drive that was
>>>>> added, then disconnected. /dev/sdh is the drive that failed a
>>>>> self-test and was removed from the array.
>>>>>
>>>>> Thank you in advance for any help provided!
>>>>>
>>>>>
>>>>> $ uname -a
>>>>> Linux Blyth 5.15.0-76-generic #83-Ubuntu SMP Thu Jun 15 19:16:32 UTC
>>>>> 2023 x86_64 x86_64 x86_64 GNU/Linux
>>>>>
>>>>> $ mdadm --version
>>>>> mdadm - v4.2 - 2021-12-30
>>>>>
>>>>>
>>>>> $ sudo smartctl -H -i -l scterc /dev/sda
>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
>>>>>
>>>>> === START OF INFORMATION SECTION ===
>>>>> Model Family:     Western Digital Red
>>>>> Device Model:     WDC WD30EFRX-68EUZN0
>>>>> Serial Number:    WD-WCC4N7AT7R7X
>>>>> LU WWN Device Id: 5 0014ee 268545f93
>>>>> Firmware Version: 82.00A82
>>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
>>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>>>>> Rotation Rate:    5400 rpm
>>>>> Device is:        In smartctl database [for details use: -P show]
>>>>> ATA Version is:   ACS-2 (minor revision not indicated)
>>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
>>>>> Local Time is:    Sun Sep  3 13:27:55 2023 PDT
>>>>> SMART support is: Available - device has SMART capability.
>>>>> SMART support is: Enabled
>>>>>
>>>>> === START OF READ SMART DATA SECTION ===
>>>>> SMART overall-health self-assessment test result: PASSED
>>>>>
>>>>> SCT Error Recovery Control:
>>>>>               Read:     70 (7.0 seconds)
>>>>>              Write:     70 (7.0 seconds)
>>>>>
>>>>> $ sudo smartctl -H -i -l scterc /dev/sda
>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
>>>>>
>>>>> === START OF INFORMATION SECTION ===
>>>>> Model Family:     Western Digital Red
>>>>> Device Model:     WDC WD30EFRX-68EUZN0
>>>>> Serial Number:    WD-WCC4N7AT7R7X
>>>>> LU WWN Device Id: 5 0014ee 268545f93
>>>>> Firmware Version: 82.00A82
>>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
>>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>>>>> Rotation Rate:    5400 rpm
>>>>> Device is:        In smartctl database [for details use: -P show]
>>>>> ATA Version is:   ACS-2 (minor revision not indicated)
>>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
>>>>> Local Time is:    Sun Sep  3 13:28:16 2023 PDT
>>>>> SMART support is: Available - device has SMART capability.
>>>>> SMART support is: Enabled
>>>>>
>>>>> === START OF READ SMART DATA SECTION ===
>>>>> SMART overall-health self-assessment test result: PASSED
>>>>>
>>>>> SCT Error Recovery Control:
>>>>>               Read:     70 (7.0 seconds)
>>>>>              Write:     70 (7.0 seconds)
>>>>>
>>>>> $ sudo smartctl -H -i -l scterc /dev/sdb
>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
>>>>>
>>>>> === START OF INFORMATION SECTION ===
>>>>> Model Family:     Western Digital Red
>>>>> Device Model:     WDC WD30EFRX-68EUZN0
>>>>> Serial Number:    WD-WXG1A8UGLS42
>>>>> LU WWN Device Id: 5 0014ee 2b75ef53b
>>>>> Firmware Version: 80.00A80
>>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
>>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>>>>> Rotation Rate:    5400 rpm
>>>>> Device is:        In smartctl database [for details use: -P show]
>>>>> ATA Version is:   ACS-2 (minor revision not indicated)
>>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
>>>>> Local Time is:    Sun Sep  3 13:28:19 2023 PDT
>>>>> SMART support is: Available - device has SMART capability.
>>>>> SMART support is: Enabled
>>>>>
>>>>> === START OF READ SMART DATA SECTION ===
>>>>> SMART overall-health self-assessment test result: PASSED
>>>>>
>>>>> SCT Error Recovery Control:
>>>>>               Read:     70 (7.0 seconds)
>>>>>              Write:     70 (7.0 seconds)
>>>>>
>>>>> $ sudo smartctl -H -i -l scterc /dev/sdc
>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
>>>>>
>>>>> === START OF INFORMATION SECTION ===
>>>>> Model Family:     Western Digital Red
>>>>> Device Model:     WDC WD30EFRX-68EUZN0
>>>>> Serial Number:    WD-WCC4N4HYL32Y
>>>>> LU WWN Device Id: 5 0014ee 2630752f8
>>>>> Firmware Version: 82.00A82
>>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
>>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>>>>> Rotation Rate:    5400 rpm
>>>>> Device is:        In smartctl database [for details use: -P show]
>>>>> ATA Version is:   ACS-2 (minor revision not indicated)
>>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
>>>>> Local Time is:    Sun Sep  3 13:28:20 2023 PDT
>>>>> SMART support is: Available - device has SMART capability.
>>>>> SMART support is: Enabled
>>>>>
>>>>> === START OF READ SMART DATA SECTION ===
>>>>> SMART overall-health self-assessment test result: PASSED
>>>>>
>>>>> SCT Error Recovery Control:
>>>>>               Read:     70 (7.0 seconds)
>>>>>              Write:     70 (7.0 seconds)
>>>>>
>>>>> $ sudo smartctl -H -i -l scterc /dev/sdd
>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
>>>>>
>>>>> === START OF INFORMATION SECTION ===
>>>>> Model Family:     Western Digital Red
>>>>> Device Model:     WDC WD30EFRX-68N32N0
>>>>> Serial Number:    WD-WCC7K1FF6DYK
>>>>> LU WWN Device Id: 5 0014ee 2ba952a30
>>>>> Firmware Version: 82.00A82
>>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
>>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>>>>> Rotation Rate:    5400 rpm
>>>>> Form Factor:      3.5 inches
>>>>> Device is:        In smartctl database [for details use: -P show]
>>>>> ATA Version is:   ACS-3 T13/2161-D revision 5
>>>>> SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
>>>>> Local Time is:    Sun Sep  3 13:28:21 2023 PDT
>>>>> SMART support is: Available - device has SMART capability.
>>>>> SMART support is: Enabled
>>>>>
>>>>> === START OF READ SMART DATA SECTION ===
>>>>> SMART overall-health self-assessment test result: PASSED
>>>>>
>>>>> SCT Error Recovery Control:
>>>>>               Read:     70 (7.0 seconds)
>>>>>              Write:     70 (7.0 seconds)
>>>>>
>>>>> $ sudo smartctl -H -i -l scterc /dev/sde
>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
>>>>>
>>>>> === START OF INFORMATION SECTION ===
>>>>> Model Family:     Western Digital Red
>>>>> Device Model:     WDC WD30EFRX-68EUZN0
>>>>> Serial Number:    WD-WCC4N5ZHTRJF
>>>>> LU WWN Device Id: 5 0014ee 2b88b83bb
>>>>> Firmware Version: 82.00A82
>>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
>>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>>>>> Rotation Rate:    5400 rpm
>>>>> Device is:        In smartctl database [for details use: -P show]
>>>>> ATA Version is:   ACS-2 (minor revision not indicated)
>>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
>>>>> Local Time is:    Sun Sep  3 13:28:22 2023 PDT
>>>>> SMART support is: Available - device has SMART capability.
>>>>> SMART support is: Enabled
>>>>>
>>>>> === START OF READ SMART DATA SECTION ===
>>>>> SMART overall-health self-assessment test result: PASSED
>>>>>
>>>>> SCT Error Recovery Control:
>>>>>               Read:     70 (7.0 seconds)
>>>>>              Write:     70 (7.0 seconds)
>>>>>
>>>>> $ sudo smartctl -H -i -l scterc /dev/sdf
>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
>>>>>
>>>>> === START OF INFORMATION SECTION ===
>>>>> Model Family:     Western Digital Red
>>>>> Device Model:     WDC WD30EFRX-68AX9N0
>>>>> Serial Number:    WD-WMC1T3804790
>>>>> LU WWN Device Id: 5 0014ee 6036b6826
>>>>> Firmware Version: 80.00A80
>>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
>>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>>>>> Device is:        In smartctl database [for details use: -P show]
>>>>> ATA Version is:   ACS-2 (minor revision not indicated)
>>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
>>>>> Local Time is:    Sun Sep  3 13:28:23 2023 PDT
>>>>> SMART support is: Available - device has SMART capability.
>>>>> SMART support is: Enabled
>>>>>
>>>>> === START OF READ SMART DATA SECTION ===
>>>>> SMART overall-health self-assessment test result: PASSED
>>>>>
>>>>> SCT Error Recovery Control:
>>>>>               Read:     70 (7.0 seconds)
>>>>>              Write:     70 (7.0 seconds)
>>>>>
>>>>> $ sudo smartctl -H -i -l scterc /dev/sdg
>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
>>>>>
>>>>> === START OF INFORMATION SECTION ===
>>>>> Model Family:     Western Digital Red
>>>>> Device Model:     WDC WD30EFRX-68EUZN0
>>>>> Serial Number:    WD-WMC4N0H692Z9
>>>>> LU WWN Device Id: 5 0014ee 65af39740
>>>>> Firmware Version: 82.00A82
>>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
>>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>>>>> Rotation Rate:    5400 rpm
>>>>> Device is:        In smartctl database [for details use: -P show]
>>>>> ATA Version is:   ACS-2 (minor revision not indicated)
>>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
>>>>> Local Time is:    Sun Sep  3 13:28:24 2023 PDT
>>>>> SMART support is: Available - device has SMART capability.
>>>>> SMART support is: Enabled
>>>>>
>>>>> === START OF READ SMART DATA SECTION ===
>>>>> SMART overall-health self-assessment test result: PASSED
>>>>>
>>>>> SCT Error Recovery Control:
>>>>>               Read:     70 (7.0 seconds)
>>>>>              Write:     70 (7.0 seconds)
>>>>>
>>>>> $ sudo smartctl -H -i -l scterc /dev/sdh
>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
>>>>>
>>>>> === START OF INFORMATION SECTION ===
>>>>> Model Family:     Western Digital Red
>>>>> Device Model:     WDC WD30EFRX-68EUZN0
>>>>> Serial Number:    WD-WMC4N0K5S750
>>>>> LU WWN Device Id: 5 0014ee 6b048d9ca
>>>>> Firmware Version: 82.00A82
>>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
>>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>>>>> Rotation Rate:    5400 rpm
>>>>> Device is:        In smartctl database [for details use: -P show]
>>>>> ATA Version is:   ACS-2 (minor revision not indicated)
>>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
>>>>> Local Time is:    Sun Sep  3 13:28:24 2023 PDT
>>>>> SMART support is: Available - device has SMART capability.
>>>>> SMART support is: Enabled
>>>>>
>>>>> === START OF READ SMART DATA SECTION ===
>>>>> SMART overall-health self-assessment test result: PASSED
>>>>>
>>>>> SCT Error Recovery Control:
>>>>>               Read:     70 (7.0 seconds)
>>>>>              Write:     70 (7.0 seconds)
>>>>>
>>>>> $ sudo smartctl -H -i -l scterc /dev/sdi
>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
>>>>>
>>>>> === START OF INFORMATION SECTION ===
>>>>> Model Family:     Western Digital Red
>>>>> Device Model:     WDC WD30EFRX-68AX9N0
>>>>> Serial Number:    WD-WMC1T1502475
>>>>> LU WWN Device Id: 5 0014ee 058d2e5cb
>>>>> Firmware Version: 80.00A80
>>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
>>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>>>>> Device is:        In smartctl database [for details use: -P show]
>>>>> ATA Version is:   ACS-2 (minor revision not indicated)
>>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
>>>>> Local Time is:    Sun Sep  3 13:28:27 2023 PDT
>>>>> SMART support is: Available - device has SMART capability.
>>>>> SMART support is: Enabled
>>>>>
>>>>> === START OF READ SMART DATA SECTION ===
>>>>> SMART overall-health self-assessment test result: PASSED
>>>>>
>>>>> SCT Error Recovery Control:
>>>>>               Read:     70 (7.0 seconds)
>>>>>              Write:     70 (7.0 seconds)
>>>>>
>>>>>
>>>>> $ sudo mdadm --examine /dev/sda
>>>>> /dev/sda:
>>>>>       MBR Magic : aa55
>>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
>>>>> $ sudo mdadm --examine /dev/sda1
>>>>> /dev/sda1:
>>>>>              Magic : a92b4efc
>>>>>            Version : 1.2
>>>>>        Feature Map : 0xd
>>>>>         Array UUID : 440dc11e:079308b1:131eda79:9a74c670
>>>>>               Name : Blyth:0  (local to host Blyth)
>>>>>      Creation Time : Tue Aug  4 23:47:57 2015
>>>>>         Raid Level : raid6
>>>>>       Raid Devices : 9
>>>>>
>>>>>     Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
>>>>>         Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
>>>>>      Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
>>>>>        Data Offset : 247808 sectors
>>>>>       Super Offset : 8 sectors
>>>>>       Unused Space : before=247728 sectors, after=14336 sectors
>>>>>              State : clean
>>>>>        Device UUID : 8ca60ad5:60d19333:11b24820:91453532
>>>>>
>>>>> Internal Bitmap : 8 sectors from superblock
>>>>>      Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
>>>>>      Delta Devices : 1 (8->9)
>>>>>
>>>>>        Update Time : Tue Jul 11 23:12:08 2023
>>>>>      Bad Block Log : 512 entries available at offset 24 sectors - bad
>>>>> blocks present.
>>>>>           Checksum : b6d8f4d1 - correct
>>>>>             Events : 181105
>>>>>
>>>>>             Layout : left-symmetric
>>>>>         Chunk Size : 512K
>>>>>
>>>>>       Device Role : Active device 7
>>>>>       Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
>>>>>
>>>>> $ sudo mdadm --examine /dev/sdb
>>>>> /dev/sdb:
>>>>>       MBR Magic : aa55
>>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
>>>>> $ sudo mdadm --examine /dev/sdb1
>>>>> /dev/sdb1:
>>>>>              Magic : a92b4efc
>>>>>            Version : 1.2
>>>>>        Feature Map : 0x5
>>>>>         Array UUID : 440dc11e:079308b1:131eda79:9a74c670
>>>>>               Name : Blyth:0  (local to host Blyth)
>>>>>      Creation Time : Tue Aug  4 23:47:57 2015
>>>>>         Raid Level : raid6
>>>>>       Raid Devices : 9
>>>>>
>>>>>     Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
>>>>>         Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
>>>>>      Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
>>>>>        Data Offset : 247808 sectors
>>>>>       Super Offset : 8 sectors
>>>>>       Unused Space : before=247728 sectors, after=14336 sectors
>>>>>              State : clean
>>>>>        Device UUID : 386d3001:16447e43:4d2a5459:85618d11
>>>>>
>>>>> Internal Bitmap : 8 sectors from superblock
>>>>>      Reshape pos'n : 124207104 (118.45 GiB 127.19 GB)
>>>>>      Delta Devices : 1 (8->9)
>>>>>
>>>>>        Update Time : Tue Jul 11 00:02:59 2023
>>>>>      Bad Block Log : 512 entries available at offset 24 sectors
>>>>>           Checksum : b544a39 - correct
>>>>>             Events : 181077
>>>>>
>>>>>             Layout : left-symmetric
>>>>>         Chunk Size : 512K
>>>>>
>>>>>       Device Role : Active device 8
>>>>>       Array State : AAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
>>>>>
>>>>> $ sudo mdadm --examine /dev/sdc
>>>>> /dev/sdc:
>>>>>       MBR Magic : aa55
>>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
>>>>> $ sudo mdadm --examine /dev/sdc1
>>>>> /dev/sdc1:
>>>>>              Magic : a92b4efc
>>>>>            Version : 1.2
>>>>>        Feature Map : 0xd
>>>>>         Array UUID : 440dc11e:079308b1:131eda79:9a74c670
>>>>>               Name : Blyth:0  (local to host Blyth)
>>>>>      Creation Time : Tue Aug  4 23:47:57 2015
>>>>>         Raid Level : raid6
>>>>>       Raid Devices : 9
>>>>>
>>>>>     Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
>>>>>         Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
>>>>>      Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
>>>>>        Data Offset : 247808 sectors
>>>>>       Super Offset : 8 sectors
>>>>>       Unused Space : before=247720 sectors, after=14336 sectors
>>>>>              State : clean
>>>>>        Device UUID : 1798ec4f:72c56905:4e74ea61:2468db75
>>>>>
>>>>> Internal Bitmap : 8 sectors from superblock
>>>>>      Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
>>>>>      Delta Devices : 1 (8->9)
>>>>>
>>>>>        Update Time : Tue Jul 11 23:12:08 2023
>>>>>      Bad Block Log : 512 entries available at offset 72 sectors - bad
>>>>> blocks present.
>>>>>           Checksum : 88d8b8fc - correct
>>>>>             Events : 181105
>>>>>
>>>>>             Layout : left-symmetric
>>>>>         Chunk Size : 512K
>>>>>
>>>>>       Device Role : Active device 4
>>>>>       Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
>>>>>
>>>>> $ sudo mdadm --examine /dev/sdd
>>>>> /dev/sdd:
>>>>>       MBR Magic : aa55
>>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
>>>>> $ sudo mdadm --examine /dev/sdd1
>>>>> /dev/sdd1:
>>>>>              Magic : a92b4efc
>>>>>            Version : 1.2
>>>>>        Feature Map : 0x5
>>>>>         Array UUID : 440dc11e:079308b1:131eda79:9a74c670
>>>>>               Name : Blyth:0  (local to host Blyth)
>>>>>      Creation Time : Tue Aug  4 23:47:57 2015
>>>>>         Raid Level : raid6
>>>>>       Raid Devices : 9
>>>>>
>>>>>     Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
>>>>>         Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
>>>>>      Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
>>>>>        Data Offset : 247808 sectors
>>>>>       Super Offset : 8 sectors
>>>>>       Unused Space : before=247728 sectors, after=14336 sectors
>>>>>              State : clean
>>>>>        Device UUID : a198095b:f54d26a9:deb3be8f:d6de9be1
>>>>>
>>>>> Internal Bitmap : 8 sectors from superblock
>>>>>      Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
>>>>>      Delta Devices : 1 (8->9)
>>>>>
>>>>>        Update Time : Tue Jul 11 23:12:08 2023
>>>>>      Bad Block Log : 512 entries available at offset 24 sectors
>>>>>           Checksum : d1471d9d - correct
>>>>>             Events : 181105
>>>>>
>>>>>             Layout : left-symmetric
>>>>>         Chunk Size : 512K
>>>>>
>>>>>       Device Role : Active device 6
>>>>>       Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
>>>>>
>>>>> $ sudo mdadm --examine /dev/sde
>>>>> /dev/sde:
>>>>>       MBR Magic : aa55
>>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
>>>>> $ sudo mdadm --examine /dev/sde1
>>>>> /dev/sde1:
>>>>>              Magic : a92b4efc
>>>>>            Version : 1.2
>>>>>        Feature Map : 0x5
>>>>>         Array UUID : 440dc11e:079308b1:131eda79:9a74c670
>>>>>               Name : Blyth:0  (local to host Blyth)
>>>>>      Creation Time : Tue Aug  4 23:47:57 2015
>>>>>         Raid Level : raid6
>>>>>       Raid Devices : 9
>>>>>
>>>>>     Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
>>>>>         Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
>>>>>      Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
>>>>>        Data Offset : 247808 sectors
>>>>>       Super Offset : 8 sectors
>>>>>       Unused Space : before=247720 sectors, after=14336 sectors
>>>>>              State : clean
>>>>>        Device UUID : acf7ba2e:35d2fa91:6b12b0ce:33a73af5
>>>>>
>>>>> Internal Bitmap : 8 sectors from superblock
>>>>>      Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
>>>>>      Delta Devices : 1 (8->9)
>>>>>
>>>>>        Update Time : Tue Jul 11 23:12:08 2023
>>>>>      Bad Block Log : 512 entries available at offset 72 sectors
>>>>>           Checksum : e05d0278 - correct
>>>>>             Events : 181105
>>>>>
>>>>>             Layout : left-symmetric
>>>>>         Chunk Size : 512K
>>>>>
>>>>>       Device Role : Active device 5
>>>>>       Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
>>>>>
>>>>> $ sudo mdadm --examine /dev/sdf
>>>>> /dev/sdf:
>>>>>       MBR Magic : aa55
>>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
>>>>> $ sudo mdadm --examine /dev/sdf1
>>>>> /dev/sdf1:
>>>>>              Magic : a92b4efc
>>>>>            Version : 1.2
>>>>>        Feature Map : 0x5
>>>>>         Array UUID : 440dc11e:079308b1:131eda79:9a74c670
>>>>>               Name : Blyth:0  (local to host Blyth)
>>>>>      Creation Time : Tue Aug  4 23:47:57 2015
>>>>>         Raid Level : raid6
>>>>>       Raid Devices : 9
>>>>>
>>>>>     Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
>>>>>         Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
>>>>>      Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
>>>>>        Data Offset : 247808 sectors
>>>>>       Super Offset : 8 sectors
>>>>>       Unused Space : before=247720 sectors, after=14336 sectors
>>>>>              State : clean
>>>>>        Device UUID : 31e7b86d:c274ff45:aa6dab50:2ff058c6
>>>>>
>>>>> Internal Bitmap : 8 sectors from superblock
>>>>>      Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
>>>>>      Delta Devices : 1 (8->9)
>>>>>
>>>>>        Update Time : Tue Jul 11 23:12:08 2023
>>>>>      Bad Block Log : 512 entries available at offset 72 sectors
>>>>>           Checksum : 26792cc0 - correct
>>>>>             Events : 181105
>>>>>
>>>>>             Layout : left-symmetric
>>>>>         Chunk Size : 512K
>>>>>
>>>>>       Device Role : Active device 0
>>>>>       Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
>>>>>
>>>>> $ sudo mdadm --examine /dev/sdg
>>>>> /dev/sdg:
>>>>>       MBR Magic : aa55
>>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
>>>>> $ sudo mdadm --examine /dev/sdg1
>>>>> /dev/sdg1:
>>>>>              Magic : a92b4efc
>>>>>            Version : 1.2
>>>>>        Feature Map : 0x5
>>>>>         Array UUID : 440dc11e:079308b1:131eda79:9a74c670
>>>>>               Name : Blyth:0  (local to host Blyth)
>>>>>      Creation Time : Tue Aug  4 23:47:57 2015
>>>>>         Raid Level : raid6
>>>>>       Raid Devices : 9
>>>>>
>>>>>     Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
>>>>>         Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
>>>>>      Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
>>>>>        Data Offset : 247808 sectors
>>>>>       Super Offset : 8 sectors
>>>>>       Unused Space : before=247720 sectors, after=14336 sectors
>>>>>              State : clean
>>>>>        Device UUID : 74476ce7:4edc23f6:08120711:ba281425
>>>>>
>>>>> Internal Bitmap : 8 sectors from superblock
>>>>>      Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
>>>>>      Delta Devices : 1 (8->9)
>>>>>
>>>>>        Update Time : Tue Jul 11 23:12:08 2023
>>>>>      Bad Block Log : 512 entries available at offset 72 sectors
>>>>>           Checksum : 6f67d179 - correct
>>>>>             Events : 181105
>>>>>
>>>>>             Layout : left-symmetric
>>>>>         Chunk Size : 512K
>>>>>
>>>>>       Device Role : Active device 1
>>>>>       Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
>>>>>
>>>>> $ sudo mdadm --examine /dev/sdh
>>>>> /dev/sdh:
>>>>>       MBR Magic : aa55
>>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
>>>>> $ sudo mdadm --examine /dev/sdh1
>>>>> /dev/sdh1:
>>>>>              Magic : a92b4efc
>>>>>            Version : 1.2
>>>>>        Feature Map : 0xd
>>>>>         Array UUID : 440dc11e:079308b1:131eda79:9a74c670
>>>>>               Name : Blyth:0  (local to host Blyth)
>>>>>      Creation Time : Tue Aug  4 23:47:57 2015
>>>>>         Raid Level : raid6
>>>>>       Raid Devices : 9
>>>>>
>>>>>     Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
>>>>>         Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
>>>>>      Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
>>>>>        Data Offset : 247808 sectors
>>>>>       Super Offset : 8 sectors
>>>>>       Unused Space : before=247720 sectors, after=14336 sectors
>>>>>              State : clean
>>>>>        Device UUID : 31c08263:b135f0f5:763bc86b:f81d7296
>>>>>
>>>>> Internal Bitmap : 8 sectors from superblock
>>>>>      Reshape pos'n : 124207104 (118.45 GiB 127.19 GB)
>>>>>      Delta Devices : 1 (8->9)
>>>>>
>>>>>        Update Time : Tue Jul 11 20:09:14 2023
>>>>>      Bad Block Log : 512 entries available at offset 72 sectors - bad
>>>>> blocks present.
>>>>>           Checksum : b7696b68 - correct
>>>>>             Events : 181089
>>>>>
>>>>>             Layout : left-symmetric
>>>>>         Chunk Size : 512K
>>>>>
>>>>>       Device Role : Active device 2
>>>>>       Array State : AAAAAAAA. ('A' == active, '.' == missing, 'R' == replacing)
>>>>>
>>>>> $ sudo mdadm --examine /dev/sdi
>>>>> /dev/sdi:
>>>>>       MBR Magic : aa55
>>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
>>>>> $ sudo mdadm --examine /dev/sdi1
>>>>> /dev/sdi1:
>>>>>              Magic : a92b4efc
>>>>>            Version : 1.2
>>>>>        Feature Map : 0x5
>>>>>         Array UUID : 440dc11e:079308b1:131eda79:9a74c670
>>>>>               Name : Blyth:0  (local to host Blyth)
>>>>>      Creation Time : Tue Aug  4 23:47:57 2015
>>>>>         Raid Level : raid6
>>>>>       Raid Devices : 9
>>>>>
>>>>>     Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
>>>>>         Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
>>>>>      Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
>>>>>        Data Offset : 247808 sectors
>>>>>       Super Offset : 8 sectors
>>>>>       Unused Space : before=247720 sectors, after=14336 sectors
>>>>>              State : clean
>>>>>        Device UUID : ac1063fc:d9d66e6d:f3de33da:b396f483
>>>>>
>>>>> Internal Bitmap : 8 sectors from superblock
>>>>>      Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
>>>>>      Delta Devices : 1 (8->9)
>>>>>
>>>>>        Update Time : Tue Jul 11 23:12:08 2023
>>>>>      Bad Block Log : 512 entries available at offset 72 sectors
>>>>>           Checksum : 23b6d024 - correct
>>>>>             Events : 181105
>>>>>
>>>>>             Layout : left-symmetric
>>>>>         Chunk Size : 512K
>>>>>
>>>>>       Device Role : Active device 3
>>>>>       Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
>>>>>
>>>>> $ sudo mdadm --detail /dev/md0
>>>>> /dev/md0:
>>>>>               Version : 1.2
>>>>>            Raid Level : raid6
>>>>>         Total Devices : 9
>>>>>           Persistence : Superblock is persistent
>>>>>
>>>>>                 State : inactive
>>>>>       Working Devices : 9
>>>>>
>>>>>         Delta Devices : 1, (-1->0)
>>>>>             New Level : raid6
>>>>>            New Layout : left-symmetric
>>>>>         New Chunksize : 512K
>>>>>
>>>>>                  Name : Blyth:0  (local to host Blyth)
>>>>>                  UUID : 440dc11e:079308b1:131eda79:9a74c670
>>>>>                Events : 181105
>>>>>
>>>>>        Number   Major   Minor   RaidDevice
>>>>>
>>>>>           -       8        1        -        /dev/sda1
>>>>>           -       8      129        -        /dev/sdi1
>>>>>           -       8      113        -        /dev/sdh1
>>>>>           -       8       97        -        /dev/sdg1
>>>>>           -       8       81        -        /dev/sdf1
>>>>>           -       8       65        -        /dev/sde1
>>>>>           -       8       49        -        /dev/sdd1
>>>>>           -       8       33        -        /dev/sdc1
>>>>>           -       8       17        -        /dev/sdb1
>>>>>
>>>>> $ cat /proc/mdstat
>>>>> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
>>>>> [raid4] [raid10]
>>>>> md0 : inactive sdb1[9](S) sdi1[4](S) sdf1[0](S) sdg1[1](S) sdh1[3](S)
>>>>> sda1[8](S) sdd1[7](S) sdc1[6](S) sde1[5](S)
>>>>>          26353689600 blocks super 1.2
>>>>>
>>>>> unused devices: <none>
>>>>>
>>>>> .
>>>>>
>>>>
>>>
>>> .
>>>
>>
> 
> .
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Reshape Failure
  2023-09-07  1:38         ` Yu Kuai
@ 2023-09-07  5:44           ` Jason Moss
       [not found]             ` <79aa3cf3-78d4-cfc6-8d3b-eb8704ffaba1@huaweicloud.com>
  0 siblings, 1 reply; 21+ messages in thread
From: Jason Moss @ 2023-09-07  5:44 UTC (permalink / raw)
  To: Yu Kuai; +Cc: linux-raid, yangerkun@huawei.com, yukuai (C)

Hi,

On Wed, Sep 6, 2023 at 6:38 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2023/09/06 22:05, Jason Moss wrote:
> > Hi Kuai,
> >
> > I ended up using gdb rather than addr2line, as that output didn't give
> > me the global offset. Maybe there's a better way, but this seems to be
> > similar to what I expected.
>
> It's ok.
> >
> > (gdb) list *(reshape_request+0x416)
> > 0x11566 is in reshape_request (drivers/md/raid5.c:6396).
> > 6391            if ((mddev->reshape_backwards
> > 6392                 ? (safepos > writepos && readpos < writepos)
> > 6393                 : (safepos < writepos && readpos > writepos)) ||
> > 6394                time_after(jiffies, conf->reshape_checkpoint + 10*HZ)) {
> > 6395                    /* Cannot proceed until we've updated the
> > superblock... */
> > 6396                    wait_event(conf->wait_for_overlap,
> > 6397                               atomic_read(&conf->reshape_stripes)==0
> > 6398                               || test_bit(MD_RECOVERY_INTR,
>
> If reshape is stuck here, that means one of two things:
>
> 1) Either reshape io is stuck somewhere and never completes;
> 2) Or the reshape_stripes counter is broken.
>
> Can you read the following debugfs files to verify whether io is stuck
> in the underlying disks?
>
> /sys/kernel/debug/block/[disk]/hctx*/{sched_tags,tags,busy,dispatch}
>

I'll attach this below.

> Furthermore, echoing "frozen" should break out of the above wait_event()
> because 'MD_RECOVERY_INTR' will be set; however, based on your
> description, the problem still exists. Can you collect the stack and
> addr2line result of the stuck thread after the echo "frozen"?
>

I echoed "frozen" to /sys/block/md0/md/sync_action; however, the echo
call has been sitting for about 30 minutes, maybe longer, and has not
returned. Here's the current state:

root         454  0.0  0.0      0     0 ?        I<   Sep05   0:00 [raid5wq]
root         455  0.0  0.0  34680  5988 ?        D    Sep05   0:00 (udev-worker)
root         456 99.9  0.0      0     0 ?        R    Sep05 1543:40 [md0_raid6]
root         457  0.0  0.0      0     0 ?        D    Sep05   0:00 [md0_reshape]

[jason@arch md]$ sudo cat /proc/457/stack
[<0>] md_do_sync+0xef2/0x11d0 [md_mod]
[<0>] md_thread+0xae/0x190 [md_mod]
[<0>] kthread+0xe8/0x120
[<0>] ret_from_fork+0x34/0x50
[<0>] ret_from_fork_asm+0x1b/0x30

Reading symbols from md-mod.ko...
(gdb) list *(md_do_sync+0xef2)
0xb3a2 is in md_do_sync (drivers/md/md.c:9035).
9030                    ? "interrupted" : "done");
9031            /*
9032             * this also signals 'finished resyncing' to md_stop
9033             */
9034            blk_finish_plug(&plug);
9035            wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active));
9036
9037            if (!test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
9038                !test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
9039                mddev->curr_resync >= MD_RESYNC_ACTIVE) {


The debugfs info:

[root@arch ~]# cat
/sys/kernel/debug/block/sda/hctx*/{sched_tags,tags,busy,dispatch}
nr_tags=64
nr_reserved_tags=0
active_queues=0

bitmap_tags:
depth=64
busy=1
cleared=55
bits_per_word=16
map_nr=4
alloc_hint={40, 20, 46, 0}
wake_batch=8
wake_index=0
ws_active=0
ws={
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
}
round_robin=1
min_shallow_depth=48
nr_tags=32
nr_reserved_tags=0
active_queues=0

bitmap_tags:
depth=32
busy=0
cleared=27
bits_per_word=8
map_nr=4
alloc_hint={19, 26, 5, 21}
wake_batch=4
wake_index=0
ws_active=0
ws={
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
}
round_robin=1
min_shallow_depth=4294967295


[root@arch ~]# cat /sys/kernel/debug/block/sdb/hctx*/{sched_tags,tags,busy,dispatch}
nr_tags=64
nr_reserved_tags=0
active_queues=0

bitmap_tags:
depth=64
busy=1
cleared=56
bits_per_word=16
map_nr=4
alloc_hint={57, 43, 14, 19}
wake_batch=8
wake_index=0
ws_active=0
ws={
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
}
round_robin=1
min_shallow_depth=48
nr_tags=32
nr_reserved_tags=0
active_queues=0

bitmap_tags:
depth=32
busy=0
cleared=24
bits_per_word=8
map_nr=4
alloc_hint={17, 13, 23, 17}
wake_batch=4
wake_index=0
ws_active=0
ws={
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
}
round_robin=1
min_shallow_depth=4294967295


[root@arch ~]# cat
/sys/kernel/debug/block/sdd/hctx*/{sched_tags,tags,busy,dispatch}
nr_tags=64
nr_reserved_tags=0
active_queues=0

bitmap_tags:
depth=64
busy=1
cleared=51
bits_per_word=16
map_nr=4
alloc_hint={36, 43, 15, 7}
wake_batch=8
wake_index=0
ws_active=0
ws={
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
}
round_robin=1
min_shallow_depth=48
nr_tags=32
nr_reserved_tags=0
active_queues=0

bitmap_tags:
depth=32
busy=0
cleared=31
bits_per_word=8
map_nr=4
alloc_hint={0, 15, 1, 22}
wake_batch=4
wake_index=0
ws_active=0
ws={
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
}
round_robin=1
min_shallow_depth=4294967295


[root@arch ~]# cat /sys/kernel/debug/block/sdf/hctx*/{sched_tags,tags,busy,dispatch}
nr_tags=256
nr_reserved_tags=0
active_queues=0

bitmap_tags:
depth=256
busy=1
cleared=131
bits_per_word=64
map_nr=4
alloc_hint={125, 46, 83, 205}
wake_batch=8
wake_index=0
ws_active=0
ws={
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
}
round_robin=0
min_shallow_depth=192
nr_tags=10104
nr_reserved_tags=0
active_queues=0

bitmap_tags:
depth=10104
busy=0
cleared=235
bits_per_word=64
map_nr=158
alloc_hint={503, 2913, 9827, 9851}
wake_batch=8
wake_index=0
ws_active=0
ws={
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
}
round_robin=0
min_shallow_depth=4294967295


[root@arch ~]# cat /sys/kernel/debug/block/sdh/hctx*/{sched_tags,tags,busy,dispatch}
nr_tags=256
nr_reserved_tags=0
active_queues=0

bitmap_tags:
depth=256
busy=1
cleared=97
bits_per_word=64
map_nr=4
alloc_hint={144, 144, 127, 254}
wake_batch=8
wake_index=0
ws_active=0
ws={
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
}
round_robin=0
min_shallow_depth=192
nr_tags=10104
nr_reserved_tags=0
active_queues=0

bitmap_tags:
depth=10104
busy=0
cleared=235
bits_per_word=64
map_nr=158
alloc_hint={503, 2913, 9827, 9851}
wake_batch=8
wake_index=0
ws_active=0
ws={
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
}
round_robin=0
min_shallow_depth=4294967295


[root@arch ~]# cat /sys/kernel/debug/block/sdi/hctx*/{sched_tags,tags,busy,dispatch}
nr_tags=256
nr_reserved_tags=0
active_queues=0

bitmap_tags:
depth=256
busy=1
cleared=34
bits_per_word=64
map_nr=4
alloc_hint={197, 20, 1, 230}
wake_batch=8
wake_index=0
ws_active=0
ws={
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
}
round_robin=0
min_shallow_depth=192
nr_tags=10104
nr_reserved_tags=0
active_queues=0

bitmap_tags:
depth=10104
busy=0
cleared=235
bits_per_word=64
map_nr=158
alloc_hint={503, 2913, 9827, 9851}
wake_batch=8
wake_index=0
ws_active=0
ws={
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
}
round_robin=0
min_shallow_depth=4294967295


[root@arch ~]# cat /sys/kernel/debug/block/sdj/hctx*/{sched_tags,tags,busy,dispatch}
nr_tags=256
nr_reserved_tags=0
active_queues=0

bitmap_tags:
depth=256
busy=1
cleared=27
bits_per_word=64
map_nr=4
alloc_hint={132, 74, 129, 76}
wake_batch=8
wake_index=0
ws_active=0
ws={
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
}
round_robin=0
min_shallow_depth=192
nr_tags=10104
nr_reserved_tags=0
active_queues=0

bitmap_tags:
depth=10104
busy=0
cleared=235
bits_per_word=64
map_nr=158
alloc_hint={503, 2913, 9827, 9851}
wake_batch=8
wake_index=0
ws_active=0
ws={
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
        {.wait=inactive},
}
round_robin=0
min_shallow_depth=4294967295
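
In case it is easier to scan, here is a one-liner (a sketch assuming bash for the
brace expansion and debugfs mounted at /sys/kernel/debug) that pulls just the busy
counters out of the same files for every member disk:

# print the file name plus the busy counter for each hardware-queue tag set
grep -H '^busy=' /sys/kernel/debug/block/sd*/hctx*/{sched_tags,tags}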


Thanks for your continued assistance with this!
Jason


> Thanks,
> Kuai
>
> > &mddev->recovery));
> > 6399                    if (atomic_read(&conf->reshape_stripes) != 0)
> > 6400                            return 0;
> >
> > Thanks
> >
> > On Mon, Sep 4, 2023 at 6:08 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>
> >> Hi,
> >>
> >> On 2023/09/05 0:38, Jason Moss wrote:
> >>> Hi Kuai,
> >>>
> >>> Thank you for the suggestion, I was previously on 5.15.0. I've built
> >>> an environment with 6.5.0.1 now and assembled the array there, but the
> >>> same problem happens. It reshaped for 20-30 seconds, then completely
> >>> stopped.
> >>>
> >>> Processes and /proc/<PID>/stack output:
> >>> root       24593  0.0  0.0      0     0 ?        I<   09:22   0:00 [raid5wq]
> >>> root       24594 96.5  0.0      0     0 ?        R    09:22   2:29 [md0_raid6]
> >>> root       24595  0.3  0.0      0     0 ?        D    09:22   0:00 [md0_reshape]
> >>>
> >>> [root@arch ~]# cat /proc/24593/stack
> >>> [<0>] rescuer_thread+0x2b0/0x3b0
> >>> [<0>] kthread+0xe8/0x120
> >>> [<0>] ret_from_fork+0x34/0x50
> >>> [<0>] ret_from_fork_asm+0x1b/0x30
> >>>
> >>> [root@arch ~]# cat /proc/24594/stack
> >>>
> >>> [root@arch ~]# cat /proc/24595/stack
> >>> [<0>] reshape_request+0x416/0x9f0 [raid456]
> >> Can you provide the addr2line result? Let's see where reshape_request()
> >> is stuck first.
> >>
> >> Thanks,
> >> Kuai
> >>
> >>> [<0>] raid5_sync_request+0x2fc/0x3d0 [raid456]
> >>> [<0>] md_do_sync+0x7d6/0x11d0 [md_mod]
> >>> [<0>] md_thread+0xae/0x190 [md_mod]
> >>> [<0>] kthread+0xe8/0x120
> >>> [<0>] ret_from_fork+0x34/0x50
> >>> [<0>] ret_from_fork_asm+0x1b/0x30
> >>>
> >>> Please let me know if there's a better way to provide the stack info.
> >>>
> >>> Thank you
> >>>
> >>> On Sun, Sep 3, 2023 at 6:41 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> On 2023/09/04 5:39, Jason Moss wrote:
> >>>>> Hello,
> >>>>>
> >>>>> I recently attempted to add a new drive to my 8-drive RAID 6 array,
> >>>>> growing it to 9 drives. I've done similar before with the same array,
> >>>>> having previously grown it from 6 drives to 7 and then from 7 to 8
> >>>>> with no issues. Drives are WD Reds, most older than 2019, some
> >>>>> (including the newest) newer, but all confirmed CMR and not SMR.
> >>>>>
> >>>>> Process used to expand the array:
> >>>>> mdadm --add /dev/md0 /dev/sdb1
> >>>>> mdadm --grow --raid-devices=9 --backup-file=/root/grow_md0.bak /dev/md0
> >>>>>
> >>>>> The reshape started off fine, the process was underway, and the volume
> >>>>> was still usable as expected. However, 15-30 minutes into the reshape,
> >>>>> I lost access to the contents of the drive. Checking /proc/mdstat, the
> >>>>> reshape was stopped at 0.6% with the counter not incrementing at all.
> >>>>> Any process accessing the array would just hang until killed. I waited
> >>>>
> >>>> What kernel version are you using? And it'll be very helpful if you can
> >>>> collect the stacks of all stuck threads. There is a known deadlock for
> >>>> raid5 related to reshape, and it's fixed in v6.5:
> >>>>
> >>>> https://lore.kernel.org/r/20230512015610.821290-6-yukuai1@huaweicloud.com
> >>>>
> >>>>> a half hour and there was still no further change to the counter. At
> >>>>> this point, I restarted the server and found that when it came back up
> >>>>> it would begin reshaping again, but only very briefly, under 30
> >>>>> seconds, but the counter would be increasing during that time.
> >>>>>
> >>>>> I searched furiously for ideas and tried stopping and reassembling the
> >>>>> array, assembling with an invalid-backup flag, echoing "frozen" then
> >>>>> "reshape" to the sync_action file, and echoing "max" to the sync_max
> >>>>> file. Nothing ever seemed to make a difference.
> >>>>>
> >>>>
> >>>> Don't do this before v6.5; echoing "reshape" while a reshape is still in
> >>>> progress will corrupt your data:
> >>>>
> >>>> https://lore.kernel.org/r/20230512015610.821290-3-yukuai1@huaweicloud.com
> >>>>
> >>>> Thanks,
> >>>> Kuai
> >>>>
> >>>>> Here is where I slightly panicked, worried that I'd borked my array,
> >>>>> and powered off the server again and disconnected the new drive that
> >>>>> was just added, assuming that since it was the change, it may be the
> >>>>> problem despite having burn-in tested it, and figuring that I'll rush
> >>>>> order a new drive, so long as the reshape continues and I can just
> >>>>> rebuild onto a new drive once the reshape finishes. However, this made
> >>>>> no difference and the array continued to not rebuild.
> >>>>>
> >>>>> Much searching later, I'd found nothing substantially different than
> >>>>> I'd already tried and one of the common threads in other people's
> >>>>> issues was bad drives, so I ran a self-test against each of the
> >>>>> existing drives and found one drive that failed the read test.
> >>>>> Thinking I had the culprit now, I dropped that drive out of the array
> >>>>> and assembled the array again, but the same behavior persists. The
> >>>>> array reshapes very briefly, then completely stops.
> >>>>>
> >>>>> Down to 0 drives of redundancy (in the reshaped section at least), not
> >>>>> finding any new ideas on any of the forums, mailing list, wiki, etc,
> >>>>> and very frustrated, I took a break, bought all new drives to build a
> >>>>> new array in another server and restored from a backup. However, there
> >>>>> is still some data not captured by the most recent backup that I would
> >>>>> like to recover, and I'd also like to solve the problem purely to
> >>>>> understand what happened and how to recover in the future.
> >>>>>
> >>>>> Is there anything else I should try to recover this array, or is this
> >>>>> a lost cause?
> >>>>>
> >>>>> Details requested by the wiki to follow and I'm happy to collect any
> >>>>> further data that would assist. /dev/sdb is the new drive that was
> >>>>> added, then disconnected. /dev/sdh is the drive that failed a
> >>>>> self-test and was removed from the array.
> >>>>>
> >>>>> Thank you in advance for any help provided!
> >>>>>
> >>>>>
> >>>>> $ uname -a
> >>>>> Linux Blyth 5.15.0-76-generic #83-Ubuntu SMP Thu Jun 15 19:16:32 UTC
> >>>>> 2023 x86_64 x86_64 x86_64 GNU/Linux
> >>>>>
> >>>>> $ mdadm --version
> >>>>> mdadm - v4.2 - 2021-12-30
> >>>>>
> >>>>>
> >>>>> $ sudo smartctl -H -i -l scterc /dev/sda
> >>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> >>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >>>>>
> >>>>> === START OF INFORMATION SECTION ===
> >>>>> Model Family:     Western Digital Red
> >>>>> Device Model:     WDC WD30EFRX-68EUZN0
> >>>>> Serial Number:    WD-WCC4N7AT7R7X
> >>>>> LU WWN Device Id: 5 0014ee 268545f93
> >>>>> Firmware Version: 82.00A82
> >>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> >>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
> >>>>> Rotation Rate:    5400 rpm
> >>>>> Device is:        In smartctl database [for details use: -P show]
> >>>>> ATA Version is:   ACS-2 (minor revision not indicated)
> >>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> >>>>> Local Time is:    Sun Sep  3 13:27:55 2023 PDT
> >>>>> SMART support is: Available - device has SMART capability.
> >>>>> SMART support is: Enabled
> >>>>>
> >>>>> === START OF READ SMART DATA SECTION ===
> >>>>> SMART overall-health self-assessment test result: PASSED
> >>>>>
> >>>>> SCT Error Recovery Control:
> >>>>>               Read:     70 (7.0 seconds)
> >>>>>              Write:     70 (7.0 seconds)
> >>>>>
> >>>>> $ sudo smartctl -H -i -l scterc /dev/sda
> >>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> >>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >>>>>
> >>>>> === START OF INFORMATION SECTION ===
> >>>>> Model Family:     Western Digital Red
> >>>>> Device Model:     WDC WD30EFRX-68EUZN0
> >>>>> Serial Number:    WD-WCC4N7AT7R7X
> >>>>> LU WWN Device Id: 5 0014ee 268545f93
> >>>>> Firmware Version: 82.00A82
> >>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> >>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
> >>>>> Rotation Rate:    5400 rpm
> >>>>> Device is:        In smartctl database [for details use: -P show]
> >>>>> ATA Version is:   ACS-2 (minor revision not indicated)
> >>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> >>>>> Local Time is:    Sun Sep  3 13:28:16 2023 PDT
> >>>>> SMART support is: Available - device has SMART capability.
> >>>>> SMART support is: Enabled
> >>>>>
> >>>>> === START OF READ SMART DATA SECTION ===
> >>>>> SMART overall-health self-assessment test result: PASSED
> >>>>>
> >>>>> SCT Error Recovery Control:
> >>>>>               Read:     70 (7.0 seconds)
> >>>>>              Write:     70 (7.0 seconds)
> >>>>>
> >>>>> $ sudo smartctl -H -i -l scterc /dev/sdb
> >>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> >>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >>>>>
> >>>>> === START OF INFORMATION SECTION ===
> >>>>> Model Family:     Western Digital Red
> >>>>> Device Model:     WDC WD30EFRX-68EUZN0
> >>>>> Serial Number:    WD-WXG1A8UGLS42
> >>>>> LU WWN Device Id: 5 0014ee 2b75ef53b
> >>>>> Firmware Version: 80.00A80
> >>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> >>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
> >>>>> Rotation Rate:    5400 rpm
> >>>>> Device is:        In smartctl database [for details use: -P show]
> >>>>> ATA Version is:   ACS-2 (minor revision not indicated)
> >>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> >>>>> Local Time is:    Sun Sep  3 13:28:19 2023 PDT
> >>>>> SMART support is: Available - device has SMART capability.
> >>>>> SMART support is: Enabled
> >>>>>
> >>>>> === START OF READ SMART DATA SECTION ===
> >>>>> SMART overall-health self-assessment test result: PASSED
> >>>>>
> >>>>> SCT Error Recovery Control:
> >>>>>               Read:     70 (7.0 seconds)
> >>>>>              Write:     70 (7.0 seconds)
> >>>>>
> >>>>> $ sudo smartctl -H -i -l scterc /dev/sdc
> >>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> >>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >>>>>
> >>>>> === START OF INFORMATION SECTION ===
> >>>>> Model Family:     Western Digital Red
> >>>>> Device Model:     WDC WD30EFRX-68EUZN0
> >>>>> Serial Number:    WD-WCC4N4HYL32Y
> >>>>> LU WWN Device Id: 5 0014ee 2630752f8
> >>>>> Firmware Version: 82.00A82
> >>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> >>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
> >>>>> Rotation Rate:    5400 rpm
> >>>>> Device is:        In smartctl database [for details use: -P show]
> >>>>> ATA Version is:   ACS-2 (minor revision not indicated)
> >>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> >>>>> Local Time is:    Sun Sep  3 13:28:20 2023 PDT
> >>>>> SMART support is: Available - device has SMART capability.
> >>>>> SMART support is: Enabled
> >>>>>
> >>>>> === START OF READ SMART DATA SECTION ===
> >>>>> SMART overall-health self-assessment test result: PASSED
> >>>>>
> >>>>> SCT Error Recovery Control:
> >>>>>               Read:     70 (7.0 seconds)
> >>>>>              Write:     70 (7.0 seconds)
> >>>>>
> >>>>> $ sudo smartctl -H -i -l scterc /dev/sdd
> >>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> >>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >>>>>
> >>>>> === START OF INFORMATION SECTION ===
> >>>>> Model Family:     Western Digital Red
> >>>>> Device Model:     WDC WD30EFRX-68N32N0
> >>>>> Serial Number:    WD-WCC7K1FF6DYK
> >>>>> LU WWN Device Id: 5 0014ee 2ba952a30
> >>>>> Firmware Version: 82.00A82
> >>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> >>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
> >>>>> Rotation Rate:    5400 rpm
> >>>>> Form Factor:      3.5 inches
> >>>>> Device is:        In smartctl database [for details use: -P show]
> >>>>> ATA Version is:   ACS-3 T13/2161-D revision 5
> >>>>> SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
> >>>>> Local Time is:    Sun Sep  3 13:28:21 2023 PDT
> >>>>> SMART support is: Available - device has SMART capability.
> >>>>> SMART support is: Enabled
> >>>>>
> >>>>> === START OF READ SMART DATA SECTION ===
> >>>>> SMART overall-health self-assessment test result: PASSED
> >>>>>
> >>>>> SCT Error Recovery Control:
> >>>>>               Read:     70 (7.0 seconds)
> >>>>>              Write:     70 (7.0 seconds)
> >>>>>
> >>>>> $ sudo smartctl -H -i -l scterc /dev/sde
> >>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> >>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >>>>>
> >>>>> === START OF INFORMATION SECTION ===
> >>>>> Model Family:     Western Digital Red
> >>>>> Device Model:     WDC WD30EFRX-68EUZN0
> >>>>> Serial Number:    WD-WCC4N5ZHTRJF
> >>>>> LU WWN Device Id: 5 0014ee 2b88b83bb
> >>>>> Firmware Version: 82.00A82
> >>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> >>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
> >>>>> Rotation Rate:    5400 rpm
> >>>>> Device is:        In smartctl database [for details use: -P show]
> >>>>> ATA Version is:   ACS-2 (minor revision not indicated)
> >>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> >>>>> Local Time is:    Sun Sep  3 13:28:22 2023 PDT
> >>>>> SMART support is: Available - device has SMART capability.
> >>>>> SMART support is: Enabled
> >>>>>
> >>>>> === START OF READ SMART DATA SECTION ===
> >>>>> SMART overall-health self-assessment test result: PASSED
> >>>>>
> >>>>> SCT Error Recovery Control:
> >>>>>               Read:     70 (7.0 seconds)
> >>>>>              Write:     70 (7.0 seconds)
> >>>>>
> >>>>> $ sudo smartctl -H -i -l scterc /dev/sdf
> >>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> >>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >>>>>
> >>>>> === START OF INFORMATION SECTION ===
> >>>>> Model Family:     Western Digital Red
> >>>>> Device Model:     WDC WD30EFRX-68AX9N0
> >>>>> Serial Number:    WD-WMC1T3804790
> >>>>> LU WWN Device Id: 5 0014ee 6036b6826
> >>>>> Firmware Version: 80.00A80
> >>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> >>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
> >>>>> Device is:        In smartctl database [for details use: -P show]
> >>>>> ATA Version is:   ACS-2 (minor revision not indicated)
> >>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> >>>>> Local Time is:    Sun Sep  3 13:28:23 2023 PDT
> >>>>> SMART support is: Available - device has SMART capability.
> >>>>> SMART support is: Enabled
> >>>>>
> >>>>> === START OF READ SMART DATA SECTION ===
> >>>>> SMART overall-health self-assessment test result: PASSED
> >>>>>
> >>>>> SCT Error Recovery Control:
> >>>>>               Read:     70 (7.0 seconds)
> >>>>>              Write:     70 (7.0 seconds)
> >>>>>
> >>>>> $ sudo smartctl -H -i -l scterc /dev/sdg
> >>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> >>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >>>>>
> >>>>> === START OF INFORMATION SECTION ===
> >>>>> Model Family:     Western Digital Red
> >>>>> Device Model:     WDC WD30EFRX-68EUZN0
> >>>>> Serial Number:    WD-WMC4N0H692Z9
> >>>>> LU WWN Device Id: 5 0014ee 65af39740
> >>>>> Firmware Version: 82.00A82
> >>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> >>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
> >>>>> Rotation Rate:    5400 rpm
> >>>>> Device is:        In smartctl database [for details use: -P show]
> >>>>> ATA Version is:   ACS-2 (minor revision not indicated)
> >>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> >>>>> Local Time is:    Sun Sep  3 13:28:24 2023 PDT
> >>>>> SMART support is: Available - device has SMART capability.
> >>>>> SMART support is: Enabled
> >>>>>
> >>>>> === START OF READ SMART DATA SECTION ===
> >>>>> SMART overall-health self-assessment test result: PASSED
> >>>>>
> >>>>> SCT Error Recovery Control:
> >>>>>               Read:     70 (7.0 seconds)
> >>>>>              Write:     70 (7.0 seconds)
> >>>>>
> >>>>> $ sudo smartctl -H -i -l scterc /dev/sdh
> >>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> >>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >>>>>
> >>>>> === START OF INFORMATION SECTION ===
> >>>>> Model Family:     Western Digital Red
> >>>>> Device Model:     WDC WD30EFRX-68EUZN0
> >>>>> Serial Number:    WD-WMC4N0K5S750
> >>>>> LU WWN Device Id: 5 0014ee 6b048d9ca
> >>>>> Firmware Version: 82.00A82
> >>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> >>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
> >>>>> Rotation Rate:    5400 rpm
> >>>>> Device is:        In smartctl database [for details use: -P show]
> >>>>> ATA Version is:   ACS-2 (minor revision not indicated)
> >>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> >>>>> Local Time is:    Sun Sep  3 13:28:24 2023 PDT
> >>>>> SMART support is: Available - device has SMART capability.
> >>>>> SMART support is: Enabled
> >>>>>
> >>>>> === START OF READ SMART DATA SECTION ===
> >>>>> SMART overall-health self-assessment test result: PASSED
> >>>>>
> >>>>> SCT Error Recovery Control:
> >>>>>               Read:     70 (7.0 seconds)
> >>>>>              Write:     70 (7.0 seconds)
> >>>>>
> >>>>> $ sudo smartctl -H -i -l scterc /dev/sdi
> >>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> >>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >>>>>
> >>>>> === START OF INFORMATION SECTION ===
> >>>>> Model Family:     Western Digital Red
> >>>>> Device Model:     WDC WD30EFRX-68AX9N0
> >>>>> Serial Number:    WD-WMC1T1502475
> >>>>> LU WWN Device Id: 5 0014ee 058d2e5cb
> >>>>> Firmware Version: 80.00A80
> >>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> >>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
> >>>>> Device is:        In smartctl database [for details use: -P show]
> >>>>> ATA Version is:   ACS-2 (minor revision not indicated)
> >>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> >>>>> Local Time is:    Sun Sep  3 13:28:27 2023 PDT
> >>>>> SMART support is: Available - device has SMART capability.
> >>>>> SMART support is: Enabled
> >>>>>
> >>>>> === START OF READ SMART DATA SECTION ===
> >>>>> SMART overall-health self-assessment test result: PASSED
> >>>>>
> >>>>> SCT Error Recovery Control:
> >>>>>               Read:     70 (7.0 seconds)
> >>>>>              Write:     70 (7.0 seconds)
> >>>>>
> >>>>>
> >>>>> $ sudo mdadm --examine /dev/sda
> >>>>> /dev/sda:
> >>>>>       MBR Magic : aa55
> >>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
> >>>>> $ sudo mdadm --examine /dev/sda1
> >>>>> /dev/sda1:
> >>>>>              Magic : a92b4efc
> >>>>>            Version : 1.2
> >>>>>        Feature Map : 0xd
> >>>>>         Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>>>               Name : Blyth:0  (local to host Blyth)
> >>>>>      Creation Time : Tue Aug  4 23:47:57 2015
> >>>>>         Raid Level : raid6
> >>>>>       Raid Devices : 9
> >>>>>
> >>>>>     Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
> >>>>>         Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>>>      Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>>>        Data Offset : 247808 sectors
> >>>>>       Super Offset : 8 sectors
> >>>>>       Unused Space : before=247728 sectors, after=14336 sectors
> >>>>>              State : clean
> >>>>>        Device UUID : 8ca60ad5:60d19333:11b24820:91453532
> >>>>>
> >>>>> Internal Bitmap : 8 sectors from superblock
> >>>>>      Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
> >>>>>      Delta Devices : 1 (8->9)
> >>>>>
> >>>>>        Update Time : Tue Jul 11 23:12:08 2023
> >>>>>      Bad Block Log : 512 entries available at offset 24 sectors - bad
> >>>>> blocks present.
> >>>>>           Checksum : b6d8f4d1 - correct
> >>>>>             Events : 181105
> >>>>>
> >>>>>             Layout : left-symmetric
> >>>>>         Chunk Size : 512K
> >>>>>
> >>>>>       Device Role : Active device 7
> >>>>>       Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >>>>>
> >>>>> $ sudo mdadm --examine /dev/sdb
> >>>>> /dev/sdb:
> >>>>>       MBR Magic : aa55
> >>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
> >>>>> $ sudo mdadm --examine /dev/sdb1
> >>>>> /dev/sdb1:
> >>>>>              Magic : a92b4efc
> >>>>>            Version : 1.2
> >>>>>        Feature Map : 0x5
> >>>>>         Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>>>               Name : Blyth:0  (local to host Blyth)
> >>>>>      Creation Time : Tue Aug  4 23:47:57 2015
> >>>>>         Raid Level : raid6
> >>>>>       Raid Devices : 9
> >>>>>
> >>>>>     Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
> >>>>>         Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>>>      Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>>>        Data Offset : 247808 sectors
> >>>>>       Super Offset : 8 sectors
> >>>>>       Unused Space : before=247728 sectors, after=14336 sectors
> >>>>>              State : clean
> >>>>>        Device UUID : 386d3001:16447e43:4d2a5459:85618d11
> >>>>>
> >>>>> Internal Bitmap : 8 sectors from superblock
> >>>>>      Reshape pos'n : 124207104 (118.45 GiB 127.19 GB)
> >>>>>      Delta Devices : 1 (8->9)
> >>>>>
> >>>>>        Update Time : Tue Jul 11 00:02:59 2023
> >>>>>      Bad Block Log : 512 entries available at offset 24 sectors
> >>>>>           Checksum : b544a39 - correct
> >>>>>             Events : 181077
> >>>>>
> >>>>>             Layout : left-symmetric
> >>>>>         Chunk Size : 512K
> >>>>>
> >>>>>       Device Role : Active device 8
> >>>>>       Array State : AAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
> >>>>>
> >>>>> $ sudo mdadm --examine /dev/sdc
> >>>>> /dev/sdc:
> >>>>>       MBR Magic : aa55
> >>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
> >>>>> $ sudo mdadm --examine /dev/sdc1
> >>>>> /dev/sdc1:
> >>>>>              Magic : a92b4efc
> >>>>>            Version : 1.2
> >>>>>        Feature Map : 0xd
> >>>>>         Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>>>               Name : Blyth:0  (local to host Blyth)
> >>>>>      Creation Time : Tue Aug  4 23:47:57 2015
> >>>>>         Raid Level : raid6
> >>>>>       Raid Devices : 9
> >>>>>
> >>>>>     Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
> >>>>>         Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>>>      Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>>>        Data Offset : 247808 sectors
> >>>>>       Super Offset : 8 sectors
> >>>>>       Unused Space : before=247720 sectors, after=14336 sectors
> >>>>>              State : clean
> >>>>>        Device UUID : 1798ec4f:72c56905:4e74ea61:2468db75
> >>>>>
> >>>>> Internal Bitmap : 8 sectors from superblock
> >>>>>      Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
> >>>>>      Delta Devices : 1 (8->9)
> >>>>>
> >>>>>        Update Time : Tue Jul 11 23:12:08 2023
> >>>>>      Bad Block Log : 512 entries available at offset 72 sectors - bad
> >>>>> blocks present.
> >>>>>           Checksum : 88d8b8fc - correct
> >>>>>             Events : 181105
> >>>>>
> >>>>>             Layout : left-symmetric
> >>>>>         Chunk Size : 512K
> >>>>>
> >>>>>       Device Role : Active device 4
> >>>>>       Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >>>>>
> >>>>> $ sudo mdadm --examine /dev/sdd
> >>>>> /dev/sdd:
> >>>>>       MBR Magic : aa55
> >>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
> >>>>> $ sudo mdadm --examine /dev/sdd1
> >>>>> /dev/sdd1:
> >>>>>              Magic : a92b4efc
> >>>>>            Version : 1.2
> >>>>>        Feature Map : 0x5
> >>>>>         Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>>>               Name : Blyth:0  (local to host Blyth)
> >>>>>      Creation Time : Tue Aug  4 23:47:57 2015
> >>>>>         Raid Level : raid6
> >>>>>       Raid Devices : 9
> >>>>>
> >>>>>     Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
> >>>>>         Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>>>      Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>>>        Data Offset : 247808 sectors
> >>>>>       Super Offset : 8 sectors
> >>>>>       Unused Space : before=247728 sectors, after=14336 sectors
> >>>>>              State : clean
> >>>>>        Device UUID : a198095b:f54d26a9:deb3be8f:d6de9be1
> >>>>>
> >>>>> Internal Bitmap : 8 sectors from superblock
> >>>>>      Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
> >>>>>      Delta Devices : 1 (8->9)
> >>>>>
> >>>>>        Update Time : Tue Jul 11 23:12:08 2023
> >>>>>      Bad Block Log : 512 entries available at offset 24 sectors
> >>>>>           Checksum : d1471d9d - correct
> >>>>>             Events : 181105
> >>>>>
> >>>>>             Layout : left-symmetric
> >>>>>         Chunk Size : 512K
> >>>>>
> >>>>>       Device Role : Active device 6
> >>>>>       Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >>>>>
> >>>>> $ sudo mdadm --examine /dev/sde
> >>>>> /dev/sde:
> >>>>>       MBR Magic : aa55
> >>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
> >>>>> $ sudo mdadm --examine /dev/sde1
> >>>>> /dev/sde1:
> >>>>>              Magic : a92b4efc
> >>>>>            Version : 1.2
> >>>>>        Feature Map : 0x5
> >>>>>         Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>>>               Name : Blyth:0  (local to host Blyth)
> >>>>>      Creation Time : Tue Aug  4 23:47:57 2015
> >>>>>         Raid Level : raid6
> >>>>>       Raid Devices : 9
> >>>>>
> >>>>>     Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
> >>>>>         Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>>>      Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>>>        Data Offset : 247808 sectors
> >>>>>       Super Offset : 8 sectors
> >>>>>       Unused Space : before=247720 sectors, after=14336 sectors
> >>>>>              State : clean
> >>>>>        Device UUID : acf7ba2e:35d2fa91:6b12b0ce:33a73af5
> >>>>>
> >>>>> Internal Bitmap : 8 sectors from superblock
> >>>>>      Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
> >>>>>      Delta Devices : 1 (8->9)
> >>>>>
> >>>>>        Update Time : Tue Jul 11 23:12:08 2023
> >>>>>      Bad Block Log : 512 entries available at offset 72 sectors
> >>>>>           Checksum : e05d0278 - correct
> >>>>>             Events : 181105
> >>>>>
> >>>>>             Layout : left-symmetric
> >>>>>         Chunk Size : 512K
> >>>>>
> >>>>>       Device Role : Active device 5
> >>>>>       Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >>>>>
> >>>>> $ sudo mdadm --examine /dev/sdf
> >>>>> /dev/sdf:
> >>>>>       MBR Magic : aa55
> >>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
> >>>>> $ sudo mdadm --examine /dev/sdf1
> >>>>> /dev/sdf1:
> >>>>>              Magic : a92b4efc
> >>>>>            Version : 1.2
> >>>>>        Feature Map : 0x5
> >>>>>         Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>>>               Name : Blyth:0  (local to host Blyth)
> >>>>>      Creation Time : Tue Aug  4 23:47:57 2015
> >>>>>         Raid Level : raid6
> >>>>>       Raid Devices : 9
> >>>>>
> >>>>>     Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
> >>>>>         Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>>>      Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>>>        Data Offset : 247808 sectors
> >>>>>       Super Offset : 8 sectors
> >>>>>       Unused Space : before=247720 sectors, after=14336 sectors
> >>>>>              State : clean
> >>>>>        Device UUID : 31e7b86d:c274ff45:aa6dab50:2ff058c6
> >>>>>
> >>>>> Internal Bitmap : 8 sectors from superblock
> >>>>>      Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
> >>>>>      Delta Devices : 1 (8->9)
> >>>>>
> >>>>>        Update Time : Tue Jul 11 23:12:08 2023
> >>>>>      Bad Block Log : 512 entries available at offset 72 sectors
> >>>>>           Checksum : 26792cc0 - correct
> >>>>>             Events : 181105
> >>>>>
> >>>>>             Layout : left-symmetric
> >>>>>         Chunk Size : 512K
> >>>>>
> >>>>>       Device Role : Active device 0
> >>>>>       Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >>>>>
> >>>>> $ sudo mdadm --examine /dev/sdg
> >>>>> /dev/sdg:
> >>>>>       MBR Magic : aa55
> >>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
> >>>>> $ sudo mdadm --examine /dev/sdg1
> >>>>> /dev/sdg1:
> >>>>>              Magic : a92b4efc
> >>>>>            Version : 1.2
> >>>>>        Feature Map : 0x5
> >>>>>         Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>>>               Name : Blyth:0  (local to host Blyth)
> >>>>>      Creation Time : Tue Aug  4 23:47:57 2015
> >>>>>         Raid Level : raid6
> >>>>>       Raid Devices : 9
> >>>>>
> >>>>>     Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
> >>>>>         Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>>>      Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>>>        Data Offset : 247808 sectors
> >>>>>       Super Offset : 8 sectors
> >>>>>       Unused Space : before=247720 sectors, after=14336 sectors
> >>>>>              State : clean
> >>>>>        Device UUID : 74476ce7:4edc23f6:08120711:ba281425
> >>>>>
> >>>>> Internal Bitmap : 8 sectors from superblock
> >>>>>      Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
> >>>>>      Delta Devices : 1 (8->9)
> >>>>>
> >>>>>        Update Time : Tue Jul 11 23:12:08 2023
> >>>>>      Bad Block Log : 512 entries available at offset 72 sectors
> >>>>>           Checksum : 6f67d179 - correct
> >>>>>             Events : 181105
> >>>>>
> >>>>>             Layout : left-symmetric
> >>>>>         Chunk Size : 512K
> >>>>>
> >>>>>       Device Role : Active device 1
> >>>>>       Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >>>>>
> >>>>> $ sudo mdadm --examine /dev/sdh
> >>>>> /dev/sdh:
> >>>>>       MBR Magic : aa55
> >>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
> >>>>> $ sudo mdadm --examine /dev/sdh1
> >>>>> /dev/sdh1:
> >>>>>              Magic : a92b4efc
> >>>>>            Version : 1.2
> >>>>>        Feature Map : 0xd
> >>>>>         Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>>>               Name : Blyth:0  (local to host Blyth)
> >>>>>      Creation Time : Tue Aug  4 23:47:57 2015
> >>>>>         Raid Level : raid6
> >>>>>       Raid Devices : 9
> >>>>>
> >>>>>     Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
> >>>>>         Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>>>      Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>>>        Data Offset : 247808 sectors
> >>>>>       Super Offset : 8 sectors
> >>>>>       Unused Space : before=247720 sectors, after=14336 sectors
> >>>>>              State : clean
> >>>>>        Device UUID : 31c08263:b135f0f5:763bc86b:f81d7296
> >>>>>
> >>>>> Internal Bitmap : 8 sectors from superblock
> >>>>>      Reshape pos'n : 124207104 (118.45 GiB 127.19 GB)
> >>>>>      Delta Devices : 1 (8->9)
> >>>>>
> >>>>>        Update Time : Tue Jul 11 20:09:14 2023
> >>>>>      Bad Block Log : 512 entries available at offset 72 sectors - bad
> >>>>> blocks present.
> >>>>>           Checksum : b7696b68 - correct
> >>>>>             Events : 181089
> >>>>>
> >>>>>             Layout : left-symmetric
> >>>>>         Chunk Size : 512K
> >>>>>
> >>>>>       Device Role : Active device 2
> >>>>>       Array State : AAAAAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >>>>>
> >>>>> $ sudo mdadm --examine /dev/sdi
> >>>>> /dev/sdi:
> >>>>>       MBR Magic : aa55
> >>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
> >>>>> $ sudo mdadm --examine /dev/sdi1
> >>>>> /dev/sdi1:
> >>>>>              Magic : a92b4efc
> >>>>>            Version : 1.2
> >>>>>        Feature Map : 0x5
> >>>>>         Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>>>               Name : Blyth:0  (local to host Blyth)
> >>>>>      Creation Time : Tue Aug  4 23:47:57 2015
> >>>>>         Raid Level : raid6
> >>>>>       Raid Devices : 9
> >>>>>
> >>>>>     Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
> >>>>>         Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>>>      Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>>>        Data Offset : 247808 sectors
> >>>>>       Super Offset : 8 sectors
> >>>>>       Unused Space : before=247720 sectors, after=14336 sectors
> >>>>>              State : clean
> >>>>>        Device UUID : ac1063fc:d9d66e6d:f3de33da:b396f483
> >>>>>
> >>>>> Internal Bitmap : 8 sectors from superblock
> >>>>>      Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
> >>>>>      Delta Devices : 1 (8->9)
> >>>>>
> >>>>>        Update Time : Tue Jul 11 23:12:08 2023
> >>>>>      Bad Block Log : 512 entries available at offset 72 sectors
> >>>>>           Checksum : 23b6d024 - correct
> >>>>>             Events : 181105
> >>>>>
> >>>>>             Layout : left-symmetric
> >>>>>         Chunk Size : 512K
> >>>>>
> >>>>>       Device Role : Active device 3
> >>>>>       Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >>>>>
> >>>>> $ sudo mdadm --detail /dev/md0
> >>>>> /dev/md0:
> >>>>>               Version : 1.2
> >>>>>            Raid Level : raid6
> >>>>>         Total Devices : 9
> >>>>>           Persistence : Superblock is persistent
> >>>>>
> >>>>>                 State : inactive
> >>>>>       Working Devices : 9
> >>>>>
> >>>>>         Delta Devices : 1, (-1->0)
> >>>>>             New Level : raid6
> >>>>>            New Layout : left-symmetric
> >>>>>         New Chunksize : 512K
> >>>>>
> >>>>>                  Name : Blyth:0  (local to host Blyth)
> >>>>>                  UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>>>                Events : 181105
> >>>>>
> >>>>>        Number   Major   Minor   RaidDevice
> >>>>>
> >>>>>           -       8        1        -        /dev/sda1
> >>>>>           -       8      129        -        /dev/sdi1
> >>>>>           -       8      113        -        /dev/sdh1
> >>>>>           -       8       97        -        /dev/sdg1
> >>>>>           -       8       81        -        /dev/sdf1
> >>>>>           -       8       65        -        /dev/sde1
> >>>>>           -       8       49        -        /dev/sdd1
> >>>>>           -       8       33        -        /dev/sdc1
> >>>>>           -       8       17        -        /dev/sdb1
> >>>>>
> >>>>> $ cat /proc/mdstat
> >>>>> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
> >>>>> [raid4] [raid10]
> >>>>> md0 : inactive sdb1[9](S) sdi1[4](S) sdf1[0](S) sdg1[1](S) sdh1[3](S)
> >>>>> sda1[8](S) sdd1[7](S) sdc1[6](S) sde1[5](S)
> >>>>>          26353689600 blocks super 1.2
> >>>>>
> >>>>> unused devices: <none>
> >>>>>
> >>>>> .
> >>>>>
> >>>>
> >>>
> >>> .
> >>>
> >>
> >
> > .
> >
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Reshape Failure
       [not found]             ` <79aa3cf3-78d4-cfc6-8d3b-eb8704ffaba1@huaweicloud.com>
@ 2023-09-07  6:19               ` Jason Moss
  2023-09-10  2:45                 ` Yu Kuai
  0 siblings, 1 reply; 21+ messages in thread
From: Jason Moss @ 2023-09-07  6:19 UTC (permalink / raw)
  To: Yu Kuai; +Cc: linux-raid, yangerkun@huawei.com, yukuai (C)

Hi,

On Wed, Sep 6, 2023 at 11:13 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2023/09/07 13:44, Jason Moss wrote:
> > Hi,
> >
> > On Wed, Sep 6, 2023 at 6:38 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>
> >> Hi,
> >>
> >> On 2023/09/06 22:05, Jason Moss wrote:
> >>> Hi Kuai,
> >>>
> >>> I ended up using gdb rather than addr2line, as that output didn't give
> >>> me the global offset. Maybe there's a better way, but this seems to be
> >>> similar to what I expected.
> >>
> >> It's ok.
> >>>
> >>> (gdb) list *(reshape_request+0x416)
> >>> 0x11566 is in reshape_request (drivers/md/raid5.c:6396).
> >>> 6391            if ((mddev->reshape_backwards
> >>> 6392                 ? (safepos > writepos && readpos < writepos)
> >>> 6393                 : (safepos < writepos && readpos > writepos)) ||
> >>> 6394                time_after(jiffies, conf->reshape_checkpoint + 10*HZ)) {
> >>> 6395                    /* Cannot proceed until we've updated the superblock... */
> >>> 6396                    wait_event(conf->wait_for_overlap,
> >>> 6397                               atomic_read(&conf->reshape_stripes)==0
> >>> 6398                               || test_bit(MD_RECOVERY_INTR,
> >>
> >> If reshape is stuck here, that means either:
> >>
> >> 1) reshape io is stuck somewhere and never completes; or
> >> 2) the counter reshape_stripes is broken.
> >>
> >> Can you read the following debugfs files to verify whether io is stuck in
> >> the underlying disks?
> >>
> >> /sys/kernel/debug/block/[disk]/hctx*/{sched_tags,tags,busy,dispatch}
> >>
> >
> > I'll attach this below.
> >
> >> Furthermore, echo frozen should break the above wait_event() because
> >> 'MD_RECOVERY_INTR' will be set; however, based on your description,
> >> the problem still exists. Can you collect the stack and addr2line result
> >> of the stuck thread after echo frozen?
> >>
> >
> > I echo'd frozen to /sys/block/md0/md/sync_action, however the echo
> > call has been sitting for about 30 minutes, maybe longer, and has not
> > returned. Here's the current state:
> >
> > root         454  0.0  0.0      0     0 ?        I<   Sep05   0:00 [raid5wq]
> > root         455  0.0  0.0  34680  5988 ?        D    Sep05   0:00 (udev-worker)
>
> Can you also show the stack of udev-worker, and of any other thread in
> 'D' state? I think the above "echo frozen" is probably also stuck in D
> state.
>

As requested:

ps aux | grep D
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         455  0.0  0.0  34680  5988 ?        D    Sep05   0:00 (udev-worker)
root         457  0.0  0.0      0     0 ?        D    Sep05   0:00 [md0_reshape]
root       45507  0.0  0.0   8272  4736 pts/1    Ds+  Sep05   0:00 -bash
jason     279169  0.0  0.0   6976  2560 pts/0    S+   23:16   0:00
grep --color=auto D

[jason@arch md]$ sudo cat /proc/455/stack
[<0>] wait_woken+0x54/0x60
[<0>] raid5_make_request+0x5fe/0x12f0 [raid456]
[<0>] md_handle_request+0x135/0x220 [md_mod]
[<0>] __submit_bio+0xb3/0x170
[<0>] submit_bio_noacct_nocheck+0x159/0x370
[<0>] block_read_full_folio+0x21c/0x340
[<0>] filemap_read_folio+0x40/0xd0
[<0>] filemap_get_pages+0x475/0x630
[<0>] filemap_read+0xd9/0x350
[<0>] blkdev_read_iter+0x6b/0x1b0
[<0>] vfs_read+0x201/0x350
[<0>] ksys_read+0x6f/0xf0
[<0>] do_syscall_64+0x60/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8


[jason@arch md]$ sudo cat /proc/45507/stack
[<0>] kthread_stop+0x6a/0x180
[<0>] md_unregister_thread+0x29/0x60 [md_mod]
[<0>] action_store+0x168/0x320 [md_mod]
[<0>] md_attr_store+0x86/0xf0 [md_mod]
[<0>] kernfs_fop_write_iter+0x136/0x1d0
[<0>] vfs_write+0x23e/0x420
[<0>] ksys_write+0x6f/0xf0
[<0>] do_syscall_64+0x60/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8

Please let me know if you'd like me to identify the lines for any of those.
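
In case it saves a round trip next time, here is a rough sketch of a loop (run as
root; it only checks each task's main thread, via /proc/<pid>/status and
/proc/<pid>/stack) that dumps the kernel stack of everything sitting in D state:

# find tasks whose State: line reports uninterruptible sleep and dump their stacks
for pid in /proc/[0-9]*; do
    grep -q '^State:.*D (disk sleep)' "$pid/status" 2>/dev/null || continue
    echo "== ${pid##*/} $(cat "$pid/comm") =="
    cat "$pid/stack"
done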

Thanks,
Jason


> > root         456 99.9  0.0      0     0 ?        R    Sep05 1543:40 [md0_raid6]
> > root         457  0.0  0.0      0     0 ?        D    Sep05   0:00 [md0_reshape]
> >
> > [jason@arch md]$ sudo cat /proc/457/stack
> > [<0>] md_do_sync+0xef2/0x11d0 [md_mod]
> > [<0>] md_thread+0xae/0x190 [md_mod]
> > [<0>] kthread+0xe8/0x120
> > [<0>] ret_from_fork+0x34/0x50
> > [<0>] ret_from_fork_asm+0x1b/0x30
> >
> > Reading symbols from md-mod.ko...
> > (gdb) list *(md_do_sync+0xef2)
> > 0xb3a2 is in md_do_sync (drivers/md/md.c:9035).
> > 9030                    ? "interrupted" : "done");
> > 9031            /*
> > 9032             * this also signals 'finished resyncing' to md_stop
> > 9033             */
> > 9034            blk_finish_plug(&plug);
> > 9035            wait_event(mddev->recovery_wait,
> > !atomic_read(&mddev->recovery_active));
>
> That is also waiting for reshape io to be done from the common layer.
>
> > 9036
> > 9037            if (!test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
> > 9038                !test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
> > 9039                mddev->curr_resync >= MD_RESYNC_ACTIVE) {
> >
> >
> > The debugfs info:
> >
> > [root@arch ~]# cat
> > /sys/kernel/debug/block/sda/hctx*/{sched_tags,tags,busy,dispatch}
>
> Only sched_tags was read; sorry, I didn't mean for you to use that exact cmd.
>
> Perhaps you can use the following cmd instead:
>
> find /sys/kernel/debug/block/sda/ -type f | xargs grep .
>
> > nr_tags=64
> > nr_reserved_tags=0
> > active_queues=0
> >
> > bitmap_tags:
> > depth=64
> > busy=1
>
> This means there is one IO outstanding on sda; however, I need more information
> to determine where this IO is. Please also make sure nothing else is running
> that can read from or write to sda. You can use "iostat -dmx 1" and
> observe for a while to confirm that there is no new io.
>
> Thanks,
> Kuai
>
> > cleared=55
> > bits_per_word=16
> > map_nr=4
> > alloc_hint={40, 20, 46, 0}
> > wake_batch=8
> > wake_index=0
> > ws_active=0
> > ws={
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> > }
> > round_robin=1
> > min_shallow_depth=48
> > nr_tags=32
> > nr_reserved_tags=0
> > active_queues=0
> >
> > bitmap_tags:
> > depth=32
> > busy=0
> > cleared=27
> > bits_per_word=8
> > map_nr=4
> > alloc_hint={19, 26, 5, 21}
> > wake_batch=4
> > wake_index=0
> > ws_active=0
> > ws={
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> > }
> > round_robin=1
> > min_shallow_depth=4294967295
>
>
> >
> >
> > [root@arch ~]# cat /sys/kernel/debug/block/sdb/hctx*
> > /{sched_tags,tags,busy,dispatch}
> > nr_tags=64
> > nr_reserved_tags=0
> > active_queues=0
> >
> > bitmap_tags:
> > depth=64
> > busy=1
> > cleared=56
> > bits_per_word=16
> > map_nr=4
> > alloc_hint={57, 43, 14, 19}
> > wake_batch=8
> > wake_index=0
> > ws_active=0
> > ws={
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> > }
> > round_robin=1
> > min_shallow_depth=48
> > nr_tags=32
> > nr_reserved_tags=0
> > active_queues=0
> >
> > bitmap_tags:
> > depth=32
> > busy=0
> > cleared=24
> > bits_per_word=8
> > map_nr=4
> > alloc_hint={17, 13, 23, 17}
> > wake_batch=4
> > wake_index=0
> > ws_active=0
> > ws={
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> > }
> > round_robin=1
> > min_shallow_depth=4294967295
> >
> >
> > [root@arch ~]# cat
> > /sys/kernel/debug/block/sdd/hctx*/{sched_tags,tags,busy,dispatch}
> > nr_tags=64
> > nr_reserved_tags=0
> > active_queues=0
> >
> > bitmap_tags:
> > depth=64
> > busy=1
> > cleared=51
> > bits_per_word=16
> > map_nr=4
> > alloc_hint={36, 43, 15, 7}
> > wake_batch=8
> > wake_index=0
> > ws_active=0
> > ws={
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> > }
> > round_robin=1
> > min_shallow_depth=48
> > nr_tags=32
> > nr_reserved_tags=0
> > active_queues=0
> >
> > bitmap_tags:
> > depth=32
> > busy=0
> > cleared=31
> > bits_per_word=8
> > map_nr=4
> > alloc_hint={0, 15, 1, 22}
> > wake_batch=4
> > wake_index=0
> > ws_active=0
> > ws={
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> > }
> > round_robin=1
> > min_shallow_depth=4294967295
> >
> >
> > [root@arch ~]# cat
> > /sys/kernel/debug/block/sdf/hctx*/{sched_tags,tags,busy,dispatch}
> > nr_tags=256
> > nr_reserved_tags=0
> > active_queues=0
> >
> > bitmap_tags:
> > depth=256
> > busy=1
> > cleared=131
> > bits_per_word=64
> > map_nr=4
> > alloc_hint={125, 46, 83, 205}
> > wake_batch=8
> > wake_index=0
> > ws_active=0
> > ws={
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> > }
> > round_robin=0
> > min_shallow_depth=192
> > nr_tags=10104
> > nr_reserved_tags=0
> > active_queues=0
> >
> > bitmap_tags:
> > depth=10104
> > busy=0
> > cleared=235
> > bits_per_word=64
> > map_nr=158
> > alloc_hint={503, 2913, 9827, 9851}
> > wake_batch=8
> > wake_index=0
> > ws_active=0
> > ws={
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> > }
> > round_robin=0
> > min_shallow_depth=4294967295
> >
> >
> > [root@arch ~]# cat
> > /sys/kernel/debug/block/sdh/hctx*/{sched_tags,tags,busy,dispatch}
> > nr_tags=256
> > nr_reserved_tags=0
> > active_queues=0
> >
> > bitmap_tags:
> > depth=256
> > busy=1
> > cleared=97
> > bits_per_word=64
> > map_nr=4
> > alloc_hint={144, 144, 127, 254}
> > wake_batch=8
> > wake_index=0
> > ws_active=0
> > ws={
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> > }
> > round_robin=0
> > min_shallow_depth=192
> > nr_tags=10104
> > nr_reserved_tags=0
> > active_queues=0
> >
> > bitmap_tags:
> > depth=10104
> > busy=0
> > cleared=235
> > bits_per_word=64
> > map_nr=158
> > alloc_hint={503, 2913, 9827, 9851}
> > wake_batch=8
> > wake_index=0
> > ws_active=0
> > ws={
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> > }
> > round_robin=0
> > min_shallow_depth=4294967295
> >
> >
> > [root@arch ~]# cat
> > /sys/kernel/debug/block/sdi/hctx*/{sched_tags,tags,busy,dispatch}
> > nr_tags=256
> > nr_reserved_tags=0
> > active_queues=0
> >
> > bitmap_tags:
> > depth=256
> > busy=1
> > cleared=34
> > bits_per_word=64
> > map_nr=4
> > alloc_hint={197, 20, 1, 230}
> > wake_batch=8
> > wake_index=0
> > ws_active=0
> > ws={
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> > }
> > round_robin=0
> > min_shallow_depth=192
> > nr_tags=10104
> > nr_reserved_tags=0
> > active_queues=0
> >
> > bitmap_tags:
> > depth=10104
> > busy=0
> > cleared=235
> > bits_per_word=64
> > map_nr=158
> > alloc_hint={503, 2913, 9827, 9851}
> > wake_batch=8
> > wake_index=0
> > ws_active=0
> > ws={
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> > }
> > round_robin=0
> > min_shallow_depth=4294967295
> >
> >
> > [root@arch ~]# cat
> > /sys/kernel/debug/block/sdj/hctx*/{sched_tags,tags,busy,dispatch}
> > nr_tags=256
> > nr_reserved_tags=0
> > active_queues=0
> >
> > bitmap_tags:
> > depth=256
> > busy=1
> > cleared=27
> > bits_per_word=64
> > map_nr=4
> > alloc_hint={132, 74, 129, 76}
> > wake_batch=8
> > wake_index=0
> > ws_active=0
> > ws={
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> > }
> > round_robin=0
> > min_shallow_depth=192
> > nr_tags=10104
> > nr_reserved_tags=0
> > active_queues=0
> >
> > bitmap_tags:
> > depth=10104
> > busy=0
> > cleared=235
> > bits_per_word=64
> > map_nr=158
> > alloc_hint={503, 2913, 9827, 9851}
> > wake_batch=8
> > wake_index=0
> > ws_active=0
> > ws={
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> >          {.wait=inactive},
> > }
> > round_robin=0
> > min_shallow_depth=4294967295
> >
> >
> > Thanks for your continued assistance with this!
> > Jason
> >
> >
> >> Thanks,
> >> Kuai
> >>
> >>> &mddev->recovery));
> >>> 6399                    if (atomic_read(&conf->reshape_stripes) != 0)
> >>> 6400                            return 0;
> >>>
> >>> Thanks
> >>>
> >>> On Mon, Sep 4, 2023 at 6:08 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> On 2023/09/05 0:38, Jason Moss wrote:
> >>>>> Hi Kuai,
> >>>>>
> >>>>> Thank you for the suggestion, I was previously on 5.15.0. I've built
> >>>>> an environment with 6.5.0.1 now and assembled the array there, but the
> >>>>> same problem happens. It reshaped for 20-30 seconds, then completely
> >>>>> stopped.
> >>>>>
> >>>>> Processes and /proc/<PID>/stack output:
> >>>>> root       24593  0.0  0.0      0     0 ?        I<   09:22   0:00 [raid5wq]
> >>>>> root       24594 96.5  0.0      0     0 ?        R    09:22   2:29 [md0_raid6]
> >>>>> root       24595  0.3  0.0      0     0 ?        D    09:22   0:00 [md0_reshape]
> >>>>>
> >>>>> [root@arch ~]# cat /proc/24593/stack
> >>>>> [<0>] rescuer_thread+0x2b0/0x3b0
> >>>>> [<0>] kthread+0xe8/0x120
> >>>>> [<0>] ret_from_fork+0x34/0x50
> >>>>> [<0>] ret_from_fork_asm+0x1b/0x30
> >>>>>
> >>>>> [root@arch ~]# cat /proc/24594/stack
> >>>>>
> >>>>> [root@arch ~]# cat /proc/24595/stack
> >>>>> [<0>] reshape_request+0x416/0x9f0 [raid456]
> >>>> Can you provide the addr2line result? Let's see where reshape_request()
> >>>> is stuck first.
> >>>>
> >>>> Thanks,
> >>>> Kuai
> >>>>
> >>>>> [<0>] raid5_sync_request+0x2fc/0x3d0 [raid456]
> >>>>> [<0>] md_do_sync+0x7d6/0x11d0 [md_mod]
> >>>>> [<0>] md_thread+0xae/0x190 [md_mod]
> >>>>> [<0>] kthread+0xe8/0x120
> >>>>> [<0>] ret_from_fork+0x34/0x50
> >>>>> [<0>] ret_from_fork_asm+0x1b/0x30
> >>>>>
> >>>>> Please let me know if there's a better way to provide the stack info.
> >>>>>
> >>>>> Thank you
> >>>>>
> >>>>> On Sun, Sep 3, 2023 at 6:41 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> On 2023/09/04 5:39, Jason Moss wrote:
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> I recently attempted to add a new drive to my 8-drive RAID 6 array,
> >>>>>>> growing it to 9 drives. I've done similar before with the same array,
> >>>>>>> having previously grown it from 6 drives to 7 and then from 7 to 8
> >>>>>>> with no issues. Drives are WD Reds, most older than 2019, some
> >>>>>>> (including the newest) newer, but all confirmed CMR and not SMR.
> >>>>>>>
> >>>>>>> Process used to expand the array:
> >>>>>>> mdadm --add /dev/md0 /dev/sdb1
> >>>>>>> mdadm --grow --raid-devices=9 --backup-file=/root/grow_md0.bak /dev/md0
> >>>>>>>
> >>>>>>> The reshape started off fine, the process was underway, and the volume
> >>>>>>> was still usable as expected. However, 15-30 minutes into the reshape,
> >>>>>>> I lost access to the contents of the drive. Checking /proc/mdstat, the
> >>>>>>> reshape was stopped at 0.6% with the counter not incrementing at all.
> >>>>>>> Any process accessing the array would just hang until killed. I waited
> >>>>>>
> >>>>>> What kernel version are you using? And it'll be very helpful if you can
> >>>>>> collect the stacks of all stuck threads. There is a known deadlock for
> >>>>>> raid5 related to reshape, and it's fixed in v6.5:
> >>>>>>
> >>>>>> https://lore.kernel.org/r/20230512015610.821290-6-yukuai1@huaweicloud.com
> >>>>>>
> >>>>>>> a half hour and there was still no further change to the counter. At
> >>>>>>> this point, I restarted the server and found that when it came back up
> >>>>>>> it would begin reshaping again, but only very briefly, under 30
> >>>>>>> seconds, but the counter would be increasing during that time.
> >>>>>>>
> >>>>>>> I searched furiously for ideas and tried stopping and reassembling the
> >>>>>>> array, assembling with an invalid-backup flag, echoing "frozen" then
> >>>>>>> "reshape" to the sync_action file, and echoing "max" to the sync_max
> >>>>>>> file. Nothing ever seemed to make a difference.
> >>>>>>>
> >>>>>>
> >>>>>> Don't do this before v6.5; echoing "reshape" while a reshape is still in
> >>>>>> progress will corrupt your data:
> >>>>>>
> >>>>>> https://lore.kernel.org/r/20230512015610.821290-3-yukuai1@huaweicloud.com
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Kuai
> >>>>>>
> >>>>>>> Here is where I slightly panicked, worried that I'd borked my array,
> >>>>>>> and powered off the server again and disconnected the new drive that
> >>>>>>> was just added, assuming that since it was the change, it may be the
> >>>>>>> problem despite having burn-in tested it, and figuring that I'll rush
> >>>>>>> order a new drive, so long as the reshape continues and I can just
> >>>>>>> rebuild onto a new drive once the reshape finishes. However, this made
> >>>>>>> no difference and the array continued to not rebuild.
> >>>>>>>
> >>>>>>> Much searching later, I'd found nothing substantially different from
> >>>>>>> what I'd already tried, and one of the common threads in other people's
> >>>>>>> issues was bad drives, so I ran a self-test against each of the
> >>>>>>> existing drives and found one drive that failed the read test.
> >>>>>>> Thinking I had the culprit now, I dropped that drive out of the array
> >>>>>>> and assembled the array again, but the same behavior persists. The
> >>>>>>> array reshapes very briefly, then completely stops.
> >>>>>>>
> >>>>>>> Down to 0 drives of redundancy (in the reshaped section at least), not
> >>>>>>> finding any new ideas on any of the forums, mailing list, wiki, etc,
> >>>>>>> and very frustrated, I took a break, bought all new drives to build a
> >>>>>>> new array in another server and restored from a backup. However, there
> >>>>>>> is still some data not captured by the most recent backup that I would
> >>>>>>> like to recover, and I'd also like to solve the problem purely to
> >>>>>>> understand what happened and how to recover in the future.
> >>>>>>>
> >>>>>>> Is there anything else I should try to recover this array, or is this
> >>>>>>> a lost cause?
> >>>>>>>
> >>>>>>> Details requested by the wiki to follow and I'm happy to collect any
> >>>>>>> further data that would assist. /dev/sdb is the new drive that was
> >>>>>>> added, then disconnected. /dev/sdh is the drive that failed a
> >>>>>>> self-test and was removed from the array.
> >>>>>>>
> >>>>>>> Thank you in advance for any help provided!
> >>>>>>>
> >>>>>>>
> >>>>>>> $ uname -a
> >>>>>>> Linux Blyth 5.15.0-76-generic #83-Ubuntu SMP Thu Jun 15 19:16:32 UTC
> >>>>>>> 2023 x86_64 x86_64 x86_64 GNU/Linux
> >>>>>>>
> >>>>>>> $ mdadm --version
> >>>>>>> mdadm - v4.2 - 2021-12-30
> >>>>>>>
> >>>>>>>
> >>>>>>> $ sudo smartctl -H -i -l scterc /dev/sda
> >>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> >>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >>>>>>>
> >>>>>>> === START OF INFORMATION SECTION ===
> >>>>>>> Model Family:     Western Digital Red
> >>>>>>> Device Model:     WDC WD30EFRX-68EUZN0
> >>>>>>> Serial Number:    WD-WCC4N7AT7R7X
> >>>>>>> LU WWN Device Id: 5 0014ee 268545f93
> >>>>>>> Firmware Version: 82.00A82
> >>>>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> >>>>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
> >>>>>>> Rotation Rate:    5400 rpm
> >>>>>>> Device is:        In smartctl database [for details use: -P show]
> >>>>>>> ATA Version is:   ACS-2 (minor revision not indicated)
> >>>>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> >>>>>>> Local Time is:    Sun Sep  3 13:27:55 2023 PDT
> >>>>>>> SMART support is: Available - device has SMART capability.
> >>>>>>> SMART support is: Enabled
> >>>>>>>
> >>>>>>> === START OF READ SMART DATA SECTION ===
> >>>>>>> SMART overall-health self-assessment test result: PASSED
> >>>>>>>
> >>>>>>> SCT Error Recovery Control:
> >>>>>>>                Read:     70 (7.0 seconds)
> >>>>>>>               Write:     70 (7.0 seconds)
> >>>>>>>
> >>>>>>> $ sudo smartctl -H -i -l scterc /dev/sda
> >>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> >>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >>>>>>>
> >>>>>>> === START OF INFORMATION SECTION ===
> >>>>>>> Model Family:     Western Digital Red
> >>>>>>> Device Model:     WDC WD30EFRX-68EUZN0
> >>>>>>> Serial Number:    WD-WCC4N7AT7R7X
> >>>>>>> LU WWN Device Id: 5 0014ee 268545f93
> >>>>>>> Firmware Version: 82.00A82
> >>>>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> >>>>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
> >>>>>>> Rotation Rate:    5400 rpm
> >>>>>>> Device is:        In smartctl database [for details use: -P show]
> >>>>>>> ATA Version is:   ACS-2 (minor revision not indicated)
> >>>>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> >>>>>>> Local Time is:    Sun Sep  3 13:28:16 2023 PDT
> >>>>>>> SMART support is: Available - device has SMART capability.
> >>>>>>> SMART support is: Enabled
> >>>>>>>
> >>>>>>> === START OF READ SMART DATA SECTION ===
> >>>>>>> SMART overall-health self-assessment test result: PASSED
> >>>>>>>
> >>>>>>> SCT Error Recovery Control:
> >>>>>>>                Read:     70 (7.0 seconds)
> >>>>>>>               Write:     70 (7.0 seconds)
> >>>>>>>
> >>>>>>> $ sudo smartctl -H -i -l scterc /dev/sdb
> >>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> >>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >>>>>>>
> >>>>>>> === START OF INFORMATION SECTION ===
> >>>>>>> Model Family:     Western Digital Red
> >>>>>>> Device Model:     WDC WD30EFRX-68EUZN0
> >>>>>>> Serial Number:    WD-WXG1A8UGLS42
> >>>>>>> LU WWN Device Id: 5 0014ee 2b75ef53b
> >>>>>>> Firmware Version: 80.00A80
> >>>>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> >>>>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
> >>>>>>> Rotation Rate:    5400 rpm
> >>>>>>> Device is:        In smartctl database [for details use: -P show]
> >>>>>>> ATA Version is:   ACS-2 (minor revision not indicated)
> >>>>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> >>>>>>> Local Time is:    Sun Sep  3 13:28:19 2023 PDT
> >>>>>>> SMART support is: Available - device has SMART capability.
> >>>>>>> SMART support is: Enabled
> >>>>>>>
> >>>>>>> === START OF READ SMART DATA SECTION ===
> >>>>>>> SMART overall-health self-assessment test result: PASSED
> >>>>>>>
> >>>>>>> SCT Error Recovery Control:
> >>>>>>>                Read:     70 (7.0 seconds)
> >>>>>>>               Write:     70 (7.0 seconds)
> >>>>>>>
> >>>>>>> $ sudo smartctl -H -i -l scterc /dev/sdc
> >>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> >>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >>>>>>>
> >>>>>>> === START OF INFORMATION SECTION ===
> >>>>>>> Model Family:     Western Digital Red
> >>>>>>> Device Model:     WDC WD30EFRX-68EUZN0
> >>>>>>> Serial Number:    WD-WCC4N4HYL32Y
> >>>>>>> LU WWN Device Id: 5 0014ee 2630752f8
> >>>>>>> Firmware Version: 82.00A82
> >>>>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> >>>>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
> >>>>>>> Rotation Rate:    5400 rpm
> >>>>>>> Device is:        In smartctl database [for details use: -P show]
> >>>>>>> ATA Version is:   ACS-2 (minor revision not indicated)
> >>>>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> >>>>>>> Local Time is:    Sun Sep  3 13:28:20 2023 PDT
> >>>>>>> SMART support is: Available - device has SMART capability.
> >>>>>>> SMART support is: Enabled
> >>>>>>>
> >>>>>>> === START OF READ SMART DATA SECTION ===
> >>>>>>> SMART overall-health self-assessment test result: PASSED
> >>>>>>>
> >>>>>>> SCT Error Recovery Control:
> >>>>>>>                Read:     70 (7.0 seconds)
> >>>>>>>               Write:     70 (7.0 seconds)
> >>>>>>>
> >>>>>>> $ sudo smartctl -H -i -l scterc /dev/sdd
> >>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> >>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >>>>>>>
> >>>>>>> === START OF INFORMATION SECTION ===
> >>>>>>> Model Family:     Western Digital Red
> >>>>>>> Device Model:     WDC WD30EFRX-68N32N0
> >>>>>>> Serial Number:    WD-WCC7K1FF6DYK
> >>>>>>> LU WWN Device Id: 5 0014ee 2ba952a30
> >>>>>>> Firmware Version: 82.00A82
> >>>>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> >>>>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
> >>>>>>> Rotation Rate:    5400 rpm
> >>>>>>> Form Factor:      3.5 inches
> >>>>>>> Device is:        In smartctl database [for details use: -P show]
> >>>>>>> ATA Version is:   ACS-3 T13/2161-D revision 5
> >>>>>>> SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
> >>>>>>> Local Time is:    Sun Sep  3 13:28:21 2023 PDT
> >>>>>>> SMART support is: Available - device has SMART capability.
> >>>>>>> SMART support is: Enabled
> >>>>>>>
> >>>>>>> === START OF READ SMART DATA SECTION ===
> >>>>>>> SMART overall-health self-assessment test result: PASSED
> >>>>>>>
> >>>>>>> SCT Error Recovery Control:
> >>>>>>>                Read:     70 (7.0 seconds)
> >>>>>>>               Write:     70 (7.0 seconds)
> >>>>>>>
> >>>>>>> $ sudo smartctl -H -i -l scterc /dev/sde
> >>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> >>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >>>>>>>
> >>>>>>> === START OF INFORMATION SECTION ===
> >>>>>>> Model Family:     Western Digital Red
> >>>>>>> Device Model:     WDC WD30EFRX-68EUZN0
> >>>>>>> Serial Number:    WD-WCC4N5ZHTRJF
> >>>>>>> LU WWN Device Id: 5 0014ee 2b88b83bb
> >>>>>>> Firmware Version: 82.00A82
> >>>>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> >>>>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
> >>>>>>> Rotation Rate:    5400 rpm
> >>>>>>> Device is:        In smartctl database [for details use: -P show]
> >>>>>>> ATA Version is:   ACS-2 (minor revision not indicated)
> >>>>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> >>>>>>> Local Time is:    Sun Sep  3 13:28:22 2023 PDT
> >>>>>>> SMART support is: Available - device has SMART capability.
> >>>>>>> SMART support is: Enabled
> >>>>>>>
> >>>>>>> === START OF READ SMART DATA SECTION ===
> >>>>>>> SMART overall-health self-assessment test result: PASSED
> >>>>>>>
> >>>>>>> SCT Error Recovery Control:
> >>>>>>>                Read:     70 (7.0 seconds)
> >>>>>>>               Write:     70 (7.0 seconds)
> >>>>>>>
> >>>>>>> $ sudo smartctl -H -i -l scterc /dev/sdf
> >>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> >>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >>>>>>>
> >>>>>>> === START OF INFORMATION SECTION ===
> >>>>>>> Model Family:     Western Digital Red
> >>>>>>> Device Model:     WDC WD30EFRX-68AX9N0
> >>>>>>> Serial Number:    WD-WMC1T3804790
> >>>>>>> LU WWN Device Id: 5 0014ee 6036b6826
> >>>>>>> Firmware Version: 80.00A80
> >>>>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> >>>>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
> >>>>>>> Device is:        In smartctl database [for details use: -P show]
> >>>>>>> ATA Version is:   ACS-2 (minor revision not indicated)
> >>>>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> >>>>>>> Local Time is:    Sun Sep  3 13:28:23 2023 PDT
> >>>>>>> SMART support is: Available - device has SMART capability.
> >>>>>>> SMART support is: Enabled
> >>>>>>>
> >>>>>>> === START OF READ SMART DATA SECTION ===
> >>>>>>> SMART overall-health self-assessment test result: PASSED
> >>>>>>>
> >>>>>>> SCT Error Recovery Control:
> >>>>>>>                Read:     70 (7.0 seconds)
> >>>>>>>               Write:     70 (7.0 seconds)
> >>>>>>>
> >>>>>>> $ sudo smartctl -H -i -l scterc /dev/sdg
> >>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> >>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >>>>>>>
> >>>>>>> === START OF INFORMATION SECTION ===
> >>>>>>> Model Family:     Western Digital Red
> >>>>>>> Device Model:     WDC WD30EFRX-68EUZN0
> >>>>>>> Serial Number:    WD-WMC4N0H692Z9
> >>>>>>> LU WWN Device Id: 5 0014ee 65af39740
> >>>>>>> Firmware Version: 82.00A82
> >>>>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> >>>>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
> >>>>>>> Rotation Rate:    5400 rpm
> >>>>>>> Device is:        In smartctl database [for details use: -P show]
> >>>>>>> ATA Version is:   ACS-2 (minor revision not indicated)
> >>>>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> >>>>>>> Local Time is:    Sun Sep  3 13:28:24 2023 PDT
> >>>>>>> SMART support is: Available - device has SMART capability.
> >>>>>>> SMART support is: Enabled
> >>>>>>>
> >>>>>>> === START OF READ SMART DATA SECTION ===
> >>>>>>> SMART overall-health self-assessment test result: PASSED
> >>>>>>>
> >>>>>>> SCT Error Recovery Control:
> >>>>>>>                Read:     70 (7.0 seconds)
> >>>>>>>               Write:     70 (7.0 seconds)
> >>>>>>>
> >>>>>>> $ sudo smartctl -H -i -l scterc /dev/sdh
> >>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> >>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >>>>>>>
> >>>>>>> === START OF INFORMATION SECTION ===
> >>>>>>> Model Family:     Western Digital Red
> >>>>>>> Device Model:     WDC WD30EFRX-68EUZN0
> >>>>>>> Serial Number:    WD-WMC4N0K5S750
> >>>>>>> LU WWN Device Id: 5 0014ee 6b048d9ca
> >>>>>>> Firmware Version: 82.00A82
> >>>>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> >>>>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
> >>>>>>> Rotation Rate:    5400 rpm
> >>>>>>> Device is:        In smartctl database [for details use: -P show]
> >>>>>>> ATA Version is:   ACS-2 (minor revision not indicated)
> >>>>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> >>>>>>> Local Time is:    Sun Sep  3 13:28:24 2023 PDT
> >>>>>>> SMART support is: Available - device has SMART capability.
> >>>>>>> SMART support is: Enabled
> >>>>>>>
> >>>>>>> === START OF READ SMART DATA SECTION ===
> >>>>>>> SMART overall-health self-assessment test result: PASSED
> >>>>>>>
> >>>>>>> SCT Error Recovery Control:
> >>>>>>>                Read:     70 (7.0 seconds)
> >>>>>>>               Write:     70 (7.0 seconds)
> >>>>>>>
> >>>>>>> $ sudo smartctl -H -i -l scterc /dev/sdi
> >>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> >>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >>>>>>>
> >>>>>>> === START OF INFORMATION SECTION ===
> >>>>>>> Model Family:     Western Digital Red
> >>>>>>> Device Model:     WDC WD30EFRX-68AX9N0
> >>>>>>> Serial Number:    WD-WMC1T1502475
> >>>>>>> LU WWN Device Id: 5 0014ee 058d2e5cb
> >>>>>>> Firmware Version: 80.00A80
> >>>>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> >>>>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
> >>>>>>> Device is:        In smartctl database [for details use: -P show]
> >>>>>>> ATA Version is:   ACS-2 (minor revision not indicated)
> >>>>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> >>>>>>> Local Time is:    Sun Sep  3 13:28:27 2023 PDT
> >>>>>>> SMART support is: Available - device has SMART capability.
> >>>>>>> SMART support is: Enabled
> >>>>>>>
> >>>>>>> === START OF READ SMART DATA SECTION ===
> >>>>>>> SMART overall-health self-assessment test result: PASSED
> >>>>>>>
> >>>>>>> SCT Error Recovery Control:
> >>>>>>>                Read:     70 (7.0 seconds)
> >>>>>>>               Write:     70 (7.0 seconds)
> >>>>>>>
> >>>>>>>
> >>>>>>> $ sudo mdadm --examine /dev/sda
> >>>>>>> /dev/sda:
> >>>>>>>        MBR Magic : aa55
> >>>>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
> >>>>>>> $ sudo mdadm --examine /dev/sda1
> >>>>>>> /dev/sda1:
> >>>>>>>               Magic : a92b4efc
> >>>>>>>             Version : 1.2
> >>>>>>>         Feature Map : 0xd
> >>>>>>>          Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>>>>>                Name : Blyth:0  (local to host Blyth)
> >>>>>>>       Creation Time : Tue Aug  4 23:47:57 2015
> >>>>>>>          Raid Level : raid6
> >>>>>>>        Raid Devices : 9
> >>>>>>>
> >>>>>>>      Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
> >>>>>>>          Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>>>>>       Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>>>>>         Data Offset : 247808 sectors
> >>>>>>>        Super Offset : 8 sectors
> >>>>>>>        Unused Space : before=247728 sectors, after=14336 sectors
> >>>>>>>               State : clean
> >>>>>>>         Device UUID : 8ca60ad5:60d19333:11b24820:91453532
> >>>>>>>
> >>>>>>> Internal Bitmap : 8 sectors from superblock
> >>>>>>>       Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
> >>>>>>>       Delta Devices : 1 (8->9)
> >>>>>>>
> >>>>>>>         Update Time : Tue Jul 11 23:12:08 2023
> >>>>>>>       Bad Block Log : 512 entries available at offset 24 sectors - bad
> >>>>>>> blocks present.
> >>>>>>>            Checksum : b6d8f4d1 - correct
> >>>>>>>              Events : 181105
> >>>>>>>
> >>>>>>>              Layout : left-symmetric
> >>>>>>>          Chunk Size : 512K
> >>>>>>>
> >>>>>>>        Device Role : Active device 7
> >>>>>>>        Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >>>>>>>
> >>>>>>> $ sudo mdadm --examine /dev/sdb
> >>>>>>> /dev/sdb:
> >>>>>>>        MBR Magic : aa55
> >>>>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
> >>>>>>> $ sudo mdadm --examine /dev/sdb1
> >>>>>>> /dev/sdb1:
> >>>>>>>               Magic : a92b4efc
> >>>>>>>             Version : 1.2
> >>>>>>>         Feature Map : 0x5
> >>>>>>>          Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>>>>>                Name : Blyth:0  (local to host Blyth)
> >>>>>>>       Creation Time : Tue Aug  4 23:47:57 2015
> >>>>>>>          Raid Level : raid6
> >>>>>>>        Raid Devices : 9
> >>>>>>>
> >>>>>>>      Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
> >>>>>>>          Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>>>>>       Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>>>>>         Data Offset : 247808 sectors
> >>>>>>>        Super Offset : 8 sectors
> >>>>>>>        Unused Space : before=247728 sectors, after=14336 sectors
> >>>>>>>               State : clean
> >>>>>>>         Device UUID : 386d3001:16447e43:4d2a5459:85618d11
> >>>>>>>
> >>>>>>> Internal Bitmap : 8 sectors from superblock
> >>>>>>>       Reshape pos'n : 124207104 (118.45 GiB 127.19 GB)
> >>>>>>>       Delta Devices : 1 (8->9)
> >>>>>>>
> >>>>>>>         Update Time : Tue Jul 11 00:02:59 2023
> >>>>>>>       Bad Block Log : 512 entries available at offset 24 sectors
> >>>>>>>            Checksum : b544a39 - correct
> >>>>>>>              Events : 181077
> >>>>>>>
> >>>>>>>              Layout : left-symmetric
> >>>>>>>          Chunk Size : 512K
> >>>>>>>
> >>>>>>>        Device Role : Active device 8
> >>>>>>>        Array State : AAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
> >>>>>>>
> >>>>>>> $ sudo mdadm --examine /dev/sdc
> >>>>>>> /dev/sdc:
> >>>>>>>        MBR Magic : aa55
> >>>>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
> >>>>>>> $ sudo mdadm --examine /dev/sdc1
> >>>>>>> /dev/sdc1:
> >>>>>>>               Magic : a92b4efc
> >>>>>>>             Version : 1.2
> >>>>>>>         Feature Map : 0xd
> >>>>>>>          Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>>>>>                Name : Blyth:0  (local to host Blyth)
> >>>>>>>       Creation Time : Tue Aug  4 23:47:57 2015
> >>>>>>>          Raid Level : raid6
> >>>>>>>        Raid Devices : 9
> >>>>>>>
> >>>>>>>      Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
> >>>>>>>          Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>>>>>       Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>>>>>         Data Offset : 247808 sectors
> >>>>>>>        Super Offset : 8 sectors
> >>>>>>>        Unused Space : before=247720 sectors, after=14336 sectors
> >>>>>>>               State : clean
> >>>>>>>         Device UUID : 1798ec4f:72c56905:4e74ea61:2468db75
> >>>>>>>
> >>>>>>> Internal Bitmap : 8 sectors from superblock
> >>>>>>>       Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
> >>>>>>>       Delta Devices : 1 (8->9)
> >>>>>>>
> >>>>>>>         Update Time : Tue Jul 11 23:12:08 2023
> >>>>>>>       Bad Block Log : 512 entries available at offset 72 sectors - bad
> >>>>>>> blocks present.
> >>>>>>>            Checksum : 88d8b8fc - correct
> >>>>>>>              Events : 181105
> >>>>>>>
> >>>>>>>              Layout : left-symmetric
> >>>>>>>          Chunk Size : 512K
> >>>>>>>
> >>>>>>>        Device Role : Active device 4
> >>>>>>>        Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >>>>>>>
> >>>>>>> $ sudo mdadm --examine /dev/sdd
> >>>>>>> /dev/sdd:
> >>>>>>>        MBR Magic : aa55
> >>>>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
> >>>>>>> $ sudo mdadm --examine /dev/sdd1
> >>>>>>> /dev/sdd1:
> >>>>>>>               Magic : a92b4efc
> >>>>>>>             Version : 1.2
> >>>>>>>         Feature Map : 0x5
> >>>>>>>          Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>>>>>                Name : Blyth:0  (local to host Blyth)
> >>>>>>>       Creation Time : Tue Aug  4 23:47:57 2015
> >>>>>>>          Raid Level : raid6
> >>>>>>>        Raid Devices : 9
> >>>>>>>
> >>>>>>>      Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
> >>>>>>>          Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>>>>>       Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>>>>>         Data Offset : 247808 sectors
> >>>>>>>        Super Offset : 8 sectors
> >>>>>>>        Unused Space : before=247728 sectors, after=14336 sectors
> >>>>>>>               State : clean
> >>>>>>>         Device UUID : a198095b:f54d26a9:deb3be8f:d6de9be1
> >>>>>>>
> >>>>>>> Internal Bitmap : 8 sectors from superblock
> >>>>>>>       Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
> >>>>>>>       Delta Devices : 1 (8->9)
> >>>>>>>
> >>>>>>>         Update Time : Tue Jul 11 23:12:08 2023
> >>>>>>>       Bad Block Log : 512 entries available at offset 24 sectors
> >>>>>>>            Checksum : d1471d9d - correct
> >>>>>>>              Events : 181105
> >>>>>>>
> >>>>>>>              Layout : left-symmetric
> >>>>>>>          Chunk Size : 512K
> >>>>>>>
> >>>>>>>        Device Role : Active device 6
> >>>>>>>        Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >>>>>>>
> >>>>>>> $ sudo mdadm --examine /dev/sde
> >>>>>>> /dev/sde:
> >>>>>>>        MBR Magic : aa55
> >>>>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
> >>>>>>> $ sudo mdadm --examine /dev/sde1
> >>>>>>> /dev/sde1:
> >>>>>>>               Magic : a92b4efc
> >>>>>>>             Version : 1.2
> >>>>>>>         Feature Map : 0x5
> >>>>>>>          Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>>>>>                Name : Blyth:0  (local to host Blyth)
> >>>>>>>       Creation Time : Tue Aug  4 23:47:57 2015
> >>>>>>>          Raid Level : raid6
> >>>>>>>        Raid Devices : 9
> >>>>>>>
> >>>>>>>      Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
> >>>>>>>          Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>>>>>       Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>>>>>         Data Offset : 247808 sectors
> >>>>>>>        Super Offset : 8 sectors
> >>>>>>>        Unused Space : before=247720 sectors, after=14336 sectors
> >>>>>>>               State : clean
> >>>>>>>         Device UUID : acf7ba2e:35d2fa91:6b12b0ce:33a73af5
> >>>>>>>
> >>>>>>> Internal Bitmap : 8 sectors from superblock
> >>>>>>>       Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
> >>>>>>>       Delta Devices : 1 (8->9)
> >>>>>>>
> >>>>>>>         Update Time : Tue Jul 11 23:12:08 2023
> >>>>>>>       Bad Block Log : 512 entries available at offset 72 sectors
> >>>>>>>            Checksum : e05d0278 - correct
> >>>>>>>              Events : 181105
> >>>>>>>
> >>>>>>>              Layout : left-symmetric
> >>>>>>>          Chunk Size : 512K
> >>>>>>>
> >>>>>>>        Device Role : Active device 5
> >>>>>>>        Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >>>>>>>
> >>>>>>> $ sudo mdadm --examine /dev/sdf
> >>>>>>> /dev/sdf:
> >>>>>>>        MBR Magic : aa55
> >>>>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
> >>>>>>> $ sudo mdadm --examine /dev/sdf1
> >>>>>>> /dev/sdf1:
> >>>>>>>               Magic : a92b4efc
> >>>>>>>             Version : 1.2
> >>>>>>>         Feature Map : 0x5
> >>>>>>>          Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>>>>>                Name : Blyth:0  (local to host Blyth)
> >>>>>>>       Creation Time : Tue Aug  4 23:47:57 2015
> >>>>>>>          Raid Level : raid6
> >>>>>>>        Raid Devices : 9
> >>>>>>>
> >>>>>>>      Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
> >>>>>>>          Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>>>>>       Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>>>>>         Data Offset : 247808 sectors
> >>>>>>>        Super Offset : 8 sectors
> >>>>>>>        Unused Space : before=247720 sectors, after=14336 sectors
> >>>>>>>               State : clean
> >>>>>>>         Device UUID : 31e7b86d:c274ff45:aa6dab50:2ff058c6
> >>>>>>>
> >>>>>>> Internal Bitmap : 8 sectors from superblock
> >>>>>>>       Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
> >>>>>>>       Delta Devices : 1 (8->9)
> >>>>>>>
> >>>>>>>         Update Time : Tue Jul 11 23:12:08 2023
> >>>>>>>       Bad Block Log : 512 entries available at offset 72 sectors
> >>>>>>>            Checksum : 26792cc0 - correct
> >>>>>>>              Events : 181105
> >>>>>>>
> >>>>>>>              Layout : left-symmetric
> >>>>>>>          Chunk Size : 512K
> >>>>>>>
> >>>>>>>        Device Role : Active device 0
> >>>>>>>        Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >>>>>>>
> >>>>>>> $ sudo mdadm --examine /dev/sdg
> >>>>>>> /dev/sdg:
> >>>>>>>        MBR Magic : aa55
> >>>>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
> >>>>>>> $ sudo mdadm --examine /dev/sdg1
> >>>>>>> /dev/sdg1:
> >>>>>>>               Magic : a92b4efc
> >>>>>>>             Version : 1.2
> >>>>>>>         Feature Map : 0x5
> >>>>>>>          Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>>>>>                Name : Blyth:0  (local to host Blyth)
> >>>>>>>       Creation Time : Tue Aug  4 23:47:57 2015
> >>>>>>>          Raid Level : raid6
> >>>>>>>        Raid Devices : 9
> >>>>>>>
> >>>>>>>      Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
> >>>>>>>          Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>>>>>       Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>>>>>         Data Offset : 247808 sectors
> >>>>>>>        Super Offset : 8 sectors
> >>>>>>>        Unused Space : before=247720 sectors, after=14336 sectors
> >>>>>>>               State : clean
> >>>>>>>         Device UUID : 74476ce7:4edc23f6:08120711:ba281425
> >>>>>>>
> >>>>>>> Internal Bitmap : 8 sectors from superblock
> >>>>>>>       Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
> >>>>>>>       Delta Devices : 1 (8->9)
> >>>>>>>
> >>>>>>>         Update Time : Tue Jul 11 23:12:08 2023
> >>>>>>>       Bad Block Log : 512 entries available at offset 72 sectors
> >>>>>>>            Checksum : 6f67d179 - correct
> >>>>>>>              Events : 181105
> >>>>>>>
> >>>>>>>              Layout : left-symmetric
> >>>>>>>          Chunk Size : 512K
> >>>>>>>
> >>>>>>>        Device Role : Active device 1
> >>>>>>>        Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >>>>>>>
> >>>>>>> $ sudo mdadm --examine /dev/sdh
> >>>>>>> /dev/sdh:
> >>>>>>>        MBR Magic : aa55
> >>>>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
> >>>>>>> $ sudo mdadm --examine /dev/sdh1
> >>>>>>> /dev/sdh1:
> >>>>>>>               Magic : a92b4efc
> >>>>>>>             Version : 1.2
> >>>>>>>         Feature Map : 0xd
> >>>>>>>          Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>>>>>                Name : Blyth:0  (local to host Blyth)
> >>>>>>>       Creation Time : Tue Aug  4 23:47:57 2015
> >>>>>>>          Raid Level : raid6
> >>>>>>>        Raid Devices : 9
> >>>>>>>
> >>>>>>>      Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
> >>>>>>>          Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>>>>>       Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>>>>>         Data Offset : 247808 sectors
> >>>>>>>        Super Offset : 8 sectors
> >>>>>>>        Unused Space : before=247720 sectors, after=14336 sectors
> >>>>>>>               State : clean
> >>>>>>>         Device UUID : 31c08263:b135f0f5:763bc86b:f81d7296
> >>>>>>>
> >>>>>>> Internal Bitmap : 8 sectors from superblock
> >>>>>>>       Reshape pos'n : 124207104 (118.45 GiB 127.19 GB)
> >>>>>>>       Delta Devices : 1 (8->9)
> >>>>>>>
> >>>>>>>         Update Time : Tue Jul 11 20:09:14 2023
> >>>>>>>       Bad Block Log : 512 entries available at offset 72 sectors - bad
> >>>>>>> blocks present.
> >>>>>>>            Checksum : b7696b68 - correct
> >>>>>>>              Events : 181089
> >>>>>>>
> >>>>>>>              Layout : left-symmetric
> >>>>>>>          Chunk Size : 512K
> >>>>>>>
> >>>>>>>        Device Role : Active device 2
> >>>>>>>        Array State : AAAAAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >>>>>>>
> >>>>>>> $ sudo mdadm --examine /dev/sdi
> >>>>>>> /dev/sdi:
> >>>>>>>        MBR Magic : aa55
> >>>>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
> >>>>>>> $ sudo mdadm --examine /dev/sdi1
> >>>>>>> /dev/sdi1:
> >>>>>>>               Magic : a92b4efc
> >>>>>>>             Version : 1.2
> >>>>>>>         Feature Map : 0x5
> >>>>>>>          Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>>>>>                Name : Blyth:0  (local to host Blyth)
> >>>>>>>       Creation Time : Tue Aug  4 23:47:57 2015
> >>>>>>>          Raid Level : raid6
> >>>>>>>        Raid Devices : 9
> >>>>>>>
> >>>>>>>      Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
> >>>>>>>          Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>>>>>       Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>>>>>         Data Offset : 247808 sectors
> >>>>>>>        Super Offset : 8 sectors
> >>>>>>>        Unused Space : before=247720 sectors, after=14336 sectors
> >>>>>>>               State : clean
> >>>>>>>         Device UUID : ac1063fc:d9d66e6d:f3de33da:b396f483
> >>>>>>>
> >>>>>>> Internal Bitmap : 8 sectors from superblock
> >>>>>>>       Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
> >>>>>>>       Delta Devices : 1 (8->9)
> >>>>>>>
> >>>>>>>         Update Time : Tue Jul 11 23:12:08 2023
> >>>>>>>       Bad Block Log : 512 entries available at offset 72 sectors
> >>>>>>>            Checksum : 23b6d024 - correct
> >>>>>>>              Events : 181105
> >>>>>>>
> >>>>>>>              Layout : left-symmetric
> >>>>>>>          Chunk Size : 512K
> >>>>>>>
> >>>>>>>        Device Role : Active device 3
> >>>>>>>        Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >>>>>>>
> >>>>>>> $ sudo mdadm --detail /dev/md0
> >>>>>>> /dev/md0:
> >>>>>>>                Version : 1.2
> >>>>>>>             Raid Level : raid6
> >>>>>>>          Total Devices : 9
> >>>>>>>            Persistence : Superblock is persistent
> >>>>>>>
> >>>>>>>                  State : inactive
> >>>>>>>        Working Devices : 9
> >>>>>>>
> >>>>>>>          Delta Devices : 1, (-1->0)
> >>>>>>>              New Level : raid6
> >>>>>>>             New Layout : left-symmetric
> >>>>>>>          New Chunksize : 512K
> >>>>>>>
> >>>>>>>                   Name : Blyth:0  (local to host Blyth)
> >>>>>>>                   UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>>>>>                 Events : 181105
> >>>>>>>
> >>>>>>>         Number   Major   Minor   RaidDevice
> >>>>>>>
> >>>>>>>            -       8        1        -        /dev/sda1
> >>>>>>>            -       8      129        -        /dev/sdi1
> >>>>>>>            -       8      113        -        /dev/sdh1
> >>>>>>>            -       8       97        -        /dev/sdg1
> >>>>>>>            -       8       81        -        /dev/sdf1
> >>>>>>>            -       8       65        -        /dev/sde1
> >>>>>>>            -       8       49        -        /dev/sdd1
> >>>>>>>            -       8       33        -        /dev/sdc1
> >>>>>>>            -       8       17        -        /dev/sdb1
> >>>>>>>
> >>>>>>> $ cat /proc/mdstat
> >>>>>>> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
> >>>>>>> [raid4] [raid10]
> >>>>>>> md0 : inactive sdb1[9](S) sdi1[4](S) sdf1[0](S) sdg1[1](S) sdh1[3](S)
> >>>>>>> sda1[8](S) sdd1[7](S) sdc1[6](S) sde1[5](S)
> >>>>>>>           26353689600 blocks super 1.2
> >>>>>>>
> >>>>>>> unused devices: <none>
> >>>>>>>
> >>>>>>> .
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>> .
> >>>>>
> >>>>
> >>>
> >>> .
> >>>
> >>
> >
> > .
> >
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Reshape Failure
  2023-09-07  6:19               ` Jason Moss
@ 2023-09-10  2:45                 ` Yu Kuai
  2023-09-10  4:58                   ` Jason Moss
  0 siblings, 1 reply; 21+ messages in thread
From: Yu Kuai @ 2023-09-10  2:45 UTC (permalink / raw)
  To: Jason Moss, Yu Kuai; +Cc: linux-raid, yangerkun@huawei.com, yukuai (C)

Hi,

On 2023/09/07 14:19, Jason Moss wrote:
> Hi,
> 
> On Wed, Sep 6, 2023 at 11:13 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> Hi,
>>
>> On 2023/09/07 13:44, Jason Moss wrote:
>>> Hi,
>>>
>>> On Wed, Sep 6, 2023 at 6:38 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On 2023/09/06 22:05, Jason Moss wrote:
>>>>> Hi Kuai,
>>>>>
>>>>> I ended up using gdb rather than addr2line, as that output didn't give
>>>>> me the global offset. Maybe there's a better way, but this seems to be
>>>>> similar to what I expected.
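
(A side note on symbol+offset resolution: the kernel source tree ships
scripts/faddr2line, which accepts the func+offset form from a stack trace
directly. A minimal sketch, assuming a build tree with debug info and the
module object at drivers/md/raid456.ko:

  ./scripts/faddr2line drivers/md/raid456.ko reshape_request+0x416/0x9f0

The gdb approach shown below gives the same file:line answer.)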
>>>>
>>>> It's ok.
>>>>>
>>>>> (gdb) list *(reshape_request+0x416)
>>>>> 0x11566 is in reshape_request (drivers/md/raid5.c:6396).
>>>>> 6391            if ((mddev->reshape_backwards
>>>>> 6392                 ? (safepos > writepos && readpos < writepos)
>>>>> 6393                 : (safepos < writepos && readpos > writepos)) ||
>>>>> 6394                time_after(jiffies, conf->reshape_checkpoint + 10*HZ)) {
>>>>> 6395                    /* Cannot proceed until we've updated the
>>>>> superblock... */
>>>>> 6396                    wait_event(conf->wait_for_overlap,
>>>>> 6397                               atomic_read(&conf->reshape_stripes)==0
>>>>> 6398                               || test_bit(MD_RECOVERY_INTR,
>>>>
>>>> If reshape is stuck here, that means either:
>>>>
>>>> 1) reshape io is stuck somewhere and never completes; or
>>>> 2) the counter reshape_stripes is broken.
>>>>
>>>> Can you read the following debugfs files to verify whether io is stuck in
>>>> the underlying disk?
>>>>
>>>> /sys/kernel/debug/block/[disk]/hctx*/{sched_tags,tags,busy,dispatch}
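
A minimal sketch for collecting those files from every array member in one
pass (the device list here is illustrative -- substitute the members shown
in /proc/mdstat):

  # dump the blk-mq tag state for each member disk
  for d in sda sdb sdd sdf sdh sdi sdj; do
      echo "=== $d ==="
      cat /sys/kernel/debug/block/$d/hctx*/{sched_tags,tags,busy,dispatch} 2>/dev/null
  done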
>>>>
>>>
>>> I'll attach this below.
>>>
>>>> Furthermore, echo frozen should break the above wait_event() because
>>>> 'MD_RECOVERY_INTR' will be set; however, based on your description,
>>>> the problem still exists. Can you collect the stack and addr2line result
>>>> of the stuck thread after echo frozen?
>>>>
>>>
>>> I echoed "frozen" to /sys/block/md0/md/sync_action; however, the echo
>>> call has been sitting for about 30 minutes, maybe longer, and has not
>>> returned. Here's the current state:
>>>
>>> root         454  0.0  0.0      0     0 ?        I<   Sep05   0:00 [raid5wq]
>>> root         455  0.0  0.0  34680  5988 ?        D    Sep05   0:00 (udev-worker)
>>
>> Can you also show the stack of udev-worker, and of any other thread in
>> 'D' state? I think the above "echo frozen" is probably also stuck in D
>> state.
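
A minimal sketch for grabbing the stacks of all D-state tasks in one go
(run as root, since /proc/<pid>/stack is only readable by root):

  # list every task in uninterruptible sleep and print its kernel stack
  for p in $(ps -eo pid=,stat= | awk '$2 ~ /^D/ {print $1}'); do
      echo "== PID $p =="
      cat /proc/$p/stack
  done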
>>
> 
> As requested:
> 
> ps aux | grep D
> USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
> root         455  0.0  0.0  34680  5988 ?        D    Sep05   0:00 (udev-worker)
> root         457  0.0  0.0      0     0 ?        D    Sep05   0:00 [md0_reshape]
> root       45507  0.0  0.0   8272  4736 pts/1    Ds+  Sep05   0:00 -bash
> jason     279169  0.0  0.0   6976  2560 pts/0    S+   23:16   0:00
> grep --color=auto D
> 
> [jason@arch md]$ sudo cat /proc/455/stack
> [<0>] wait_woken+0x54/0x60
> [<0>] raid5_make_request+0x5fe/0x12f0 [raid456]
> [<0>] md_handle_request+0x135/0x220 [md_mod]
> [<0>] __submit_bio+0xb3/0x170
> [<0>] submit_bio_noacct_nocheck+0x159/0x370
> [<0>] block_read_full_folio+0x21c/0x340
> [<0>] filemap_read_folio+0x40/0xd0
> [<0>] filemap_get_pages+0x475/0x630
> [<0>] filemap_read+0xd9/0x350
> [<0>] blkdev_read_iter+0x6b/0x1b0
> [<0>] vfs_read+0x201/0x350
> [<0>] ksys_read+0x6f/0xf0
> [<0>] do_syscall_64+0x60/0x90
> [<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> 
> 
> [jason@arch md]$ sudo cat /proc/45507/stack
> [<0>] kthread_stop+0x6a/0x180
> [<0>] md_unregister_thread+0x29/0x60 [md_mod]
> [<0>] action_store+0x168/0x320 [md_mod]
> [<0>] md_attr_store+0x86/0xf0 [md_mod]
> [<0>] kernfs_fop_write_iter+0x136/0x1d0
> [<0>] vfs_write+0x23e/0x420
> [<0>] ksys_write+0x6f/0xf0
> [<0>] do_syscall_64+0x60/0x90
> [<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> 
> Please let me know if you'd like me to identify the lines for any of those.
> 

That's enough.
> Thanks,
> Jason
> 
> 
>>> root         456 99.9  0.0      0     0 ?        R    Sep05 1543:40 [md0_raid6]
>>> root         457  0.0  0.0      0     0 ?        D    Sep05   0:00 [md0_reshape]
>>>
>>> [jason@arch md]$ sudo cat /proc/457/stack
>>> [<0>] md_do_sync+0xef2/0x11d0 [md_mod]
>>> [<0>] md_thread+0xae/0x190 [md_mod]
>>> [<0>] kthread+0xe8/0x120
>>> [<0>] ret_from_fork+0x34/0x50
>>> [<0>] ret_from_fork_asm+0x1b/0x30
>>>
>>> Reading symbols from md-mod.ko...
>>> (gdb) list *(md_do_sync+0xef2)
>>> 0xb3a2 is in md_do_sync (drivers/md/md.c:9035).
>>> 9030                    ? "interrupted" : "done");
>>> 9031            /*
>>> 9032             * this also signals 'finished resyncing' to md_stop
>>> 9033             */
>>> 9034            blk_finish_plug(&plug);
>>> 9035            wait_event(mddev->recovery_wait,
>>> !atomic_read(&mddev->recovery_active));
>>
>> That's also waiting for reshape io to be done, this time from the common layer.
>>
>>> 9036
>>> 9037            if (!test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
>>> 9038                !test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
>>> 9039                mddev->curr_resync >= MD_RESYNC_ACTIVE) {
>>>
>>>
>>> The debugfs info:
>>>
>>> [root@arch ~]# cat
>>> /sys/kernel/debug/block/sda/hctx*/{sched_tags,tags,busy,dispatch}
>>
>> Only sched_tags was read; sorry, I didn't mean for that exact cmd to be used.
>>
>> Perhaps you can use the following cmd instead:
>>
>> find /sys/kernel/debug/block/sda/ -type f | xargs grep .
>>
>>> nr_tags=64
>>> nr_reserved_tags=0
>>> active_queues=0
>>>
>>> bitmap_tags:
>>> depth=64
>>> busy=1
>>
>> This means there is one IO outstanding in sda; however, I need more
>> information to work out where this IO is. Please also make sure nothing
>> else that can read/write sda is running. You can use "iostat -dmx 1" and
>> observe for a while to confirm that there is no new io.
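
A minimal sketch for watching that the member disks really are idle, in
addition to iostat (the sd[a-j] glob is illustrative; /sys/block/<dev>/inflight
reports in-flight reads and writes per disk):

  # refresh once a second; counts that never drain point at a request stuck in that disk
  watch -n1 'grep . /sys/block/sd[a-j]/inflight'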

And can you help with this? Confirm there is no new io, then collect the debugfs info.

Thanks,
Kuai

>>
>> Thanks,
>> Kuai
>>
>>> cleared=55
>>> bits_per_word=16
>>> map_nr=4
>>> alloc_hint={40, 20, 46, 0}
>>> wake_batch=8
>>> wake_index=0
>>> ws_active=0
>>> ws={
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>> }
>>> round_robin=1
>>> min_shallow_depth=48
>>> nr_tags=32
>>> nr_reserved_tags=0
>>> active_queues=0
>>>
>>> bitmap_tags:
>>> depth=32
>>> busy=0
>>> cleared=27
>>> bits_per_word=8
>>> map_nr=4
>>> alloc_hint={19, 26, 5, 21}
>>> wake_batch=4
>>> wake_index=0
>>> ws_active=0
>>> ws={
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>> }
>>> round_robin=1
>>> min_shallow_depth=4294967295
>>
>>
>>>
>>>
>>> [root@arch ~]# cat /sys/kernel/debug/block/sdb/hctx*
>>> /{sched_tags,tags,busy,dispatch}
>>> nr_tags=64
>>> nr_reserved_tags=0
>>> active_queues=0
>>>
>>> bitmap_tags:
>>> depth=64
>>> busy=1
>>> cleared=56
>>> bits_per_word=16
>>> map_nr=4
>>> alloc_hint={57, 43, 14, 19}
>>> wake_batch=8
>>> wake_index=0
>>> ws_active=0
>>> ws={
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>> }
>>> round_robin=1
>>> min_shallow_depth=48
>>> nr_tags=32
>>> nr_reserved_tags=0
>>> active_queues=0
>>>
>>> bitmap_tags:
>>> depth=32
>>> busy=0
>>> cleared=24
>>> bits_per_word=8
>>> map_nr=4
>>> alloc_hint={17, 13, 23, 17}
>>> wake_batch=4
>>> wake_index=0
>>> ws_active=0
>>> ws={
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>> }
>>> round_robin=1
>>> min_shallow_depth=4294967295
>>>
>>>
>>> [root@arch ~]# cat
>>> /sys/kernel/debug/block/sdd/hctx*/{sched_tags,tags,busy,dispatch}
>>> nr_tags=64
>>> nr_reserved_tags=0
>>> active_queues=0
>>>
>>> bitmap_tags:
>>> depth=64
>>> busy=1
>>> cleared=51
>>> bits_per_word=16
>>> map_nr=4
>>> alloc_hint={36, 43, 15, 7}
>>> wake_batch=8
>>> wake_index=0
>>> ws_active=0
>>> ws={
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>> }
>>> round_robin=1
>>> min_shallow_depth=48
>>> nr_tags=32
>>> nr_reserved_tags=0
>>> active_queues=0
>>>
>>> bitmap_tags:
>>> depth=32
>>> busy=0
>>> cleared=31
>>> bits_per_word=8
>>> map_nr=4
>>> alloc_hint={0, 15, 1, 22}
>>> wake_batch=4
>>> wake_index=0
>>> ws_active=0
>>> ws={
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>> }
>>> round_robin=1
>>> min_shallow_depth=4294967295
>>>
>>>
>>> [root@arch ~]# cat
>>> /sys/kernel/debug/block/sdf/hctx*/{sched_tags,tags,busy,dispatch}
>>> nr_tags=256
>>> nr_reserved_tags=0
>>> active_queues=0
>>>
>>> bitmap_tags:
>>> depth=256
>>> busy=1
>>> cleared=131
>>> bits_per_word=64
>>> map_nr=4
>>> alloc_hint={125, 46, 83, 205}
>>> wake_batch=8
>>> wake_index=0
>>> ws_active=0
>>> ws={
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>> }
>>> round_robin=0
>>> min_shallow_depth=192
>>> nr_tags=10104
>>> nr_reserved_tags=0
>>> active_queues=0
>>>
>>> bitmap_tags:
>>> depth=10104
>>> busy=0
>>> cleared=235
>>> bits_per_word=64
>>> map_nr=158
>>> alloc_hint={503, 2913, 9827, 9851}
>>> wake_batch=8
>>> wake_index=0
>>> ws_active=0
>>> ws={
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>> }
>>> round_robin=0
>>> min_shallow_depth=4294967295
>>>
>>>
>>> [root@arch ~]# cat
>>> /sys/kernel/debug/block/sdh/hctx*/{sched_tags,tags,busy,dispatch}
>>> nr_tags=256
>>> nr_reserved_tags=0
>>> active_queues=0
>>>
>>> bitmap_tags:
>>> depth=256
>>> busy=1
>>> cleared=97
>>> bits_per_word=64
>>> map_nr=4
>>> alloc_hint={144, 144, 127, 254}
>>> wake_batch=8
>>> wake_index=0
>>> ws_active=0
>>> ws={
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>> }
>>> round_robin=0
>>> min_shallow_depth=192
>>> nr_tags=10104
>>> nr_reserved_tags=0
>>> active_queues=0
>>>
>>> bitmap_tags:
>>> depth=10104
>>> busy=0
>>> cleared=235
>>> bits_per_word=64
>>> map_nr=158
>>> alloc_hint={503, 2913, 9827, 9851}
>>> wake_batch=8
>>> wake_index=0
>>> ws_active=0
>>> ws={
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>> }
>>> round_robin=0
>>> min_shallow_depth=4294967295
>>>
>>>
>>> [root@arch ~]# cat
>>> /sys/kernel/debug/block/sdi/hctx*/{sched_tags,tags,busy,dispatch}
>>> nr_tags=256
>>> nr_reserved_tags=0
>>> active_queues=0
>>>
>>> bitmap_tags:
>>> depth=256
>>> busy=1
>>> cleared=34
>>> bits_per_word=64
>>> map_nr=4
>>> alloc_hint={197, 20, 1, 230}
>>> wake_batch=8
>>> wake_index=0
>>> ws_active=0
>>> ws={
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>> }
>>> round_robin=0
>>> min_shallow_depth=192
>>> nr_tags=10104
>>> nr_reserved_tags=0
>>> active_queues=0
>>>
>>> bitmap_tags:
>>> depth=10104
>>> busy=0
>>> cleared=235
>>> bits_per_word=64
>>> map_nr=158
>>> alloc_hint={503, 2913, 9827, 9851}
>>> wake_batch=8
>>> wake_index=0
>>> ws_active=0
>>> ws={
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>> }
>>> round_robin=0
>>> min_shallow_depth=4294967295
>>>
>>>
>>> [root@arch ~]# cat
>>> /sys/kernel/debug/block/sdj/hctx*/{sched_tags,tags,busy,dispatch}
>>> nr_tags=256
>>> nr_reserved_tags=0
>>> active_queues=0
>>>
>>> bitmap_tags:
>>> depth=256
>>> busy=1
>>> cleared=27
>>> bits_per_word=64
>>> map_nr=4
>>> alloc_hint={132, 74, 129, 76}
>>> wake_batch=8
>>> wake_index=0
>>> ws_active=0
>>> ws={
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>> }
>>> round_robin=0
>>> min_shallow_depth=192
>>> nr_tags=10104
>>> nr_reserved_tags=0
>>> active_queues=0
>>>
>>> bitmap_tags:
>>> depth=10104
>>> busy=0
>>> cleared=235
>>> bits_per_word=64
>>> map_nr=158
>>> alloc_hint={503, 2913, 9827, 9851}
>>> wake_batch=8
>>> wake_index=0
>>> ws_active=0
>>> ws={
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>>           {.wait=inactive},
>>> }
>>> round_robin=0
>>> min_shallow_depth=4294967295
>>>
>>>
>>> Thanks for your continued assistance with this!
>>> Jason
>>>
>>>
>>>> Thanks,
>>>> Kuai
>>>>
>>>>> &mddev->recovery));
>>>>> 6399                    if (atomic_read(&conf->reshape_stripes) != 0)
>>>>> 6400                            return 0;
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Mon, Sep 4, 2023 at 6:08 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On 2023/09/05 0:38, Jason Moss wrote:
>>>>>>> Hi Kuai,
>>>>>>>
>>>>>>> Thank you for the suggestion, I was previously on 5.15.0. I've built
>>>>>>> an environment with 6.5.0.1 now and assembled the array there, but the
>>>>>>> same problem happens. It reshaped for 20-30 seconds, then completely
>>>>>>> stopped.
>>>>>>>
>>>>>>> Processes and /proc/<PID>/stack output:
>>>>>>> root       24593  0.0  0.0      0     0 ?        I<   09:22   0:00 [raid5wq]
>>>>>>> root       24594 96.5  0.0      0     0 ?        R    09:22   2:29 [md0_raid6]
>>>>>>> root       24595  0.3  0.0      0     0 ?        D    09:22   0:00 [md0_reshape]
>>>>>>>
>>>>>>> [root@arch ~]# cat /proc/24593/stack
>>>>>>> [<0>] rescuer_thread+0x2b0/0x3b0
>>>>>>> [<0>] kthread+0xe8/0x120
>>>>>>> [<0>] ret_from_fork+0x34/0x50
>>>>>>> [<0>] ret_from_fork_asm+0x1b/0x30
>>>>>>>
>>>>>>> [root@arch ~]# cat /proc/24594/stack
>>>>>>>
>>>>>>> [root@arch ~]# cat /proc/24595/stack
>>>>>>> [<0>] reshape_request+0x416/0x9f0 [raid456]
>>>>>> Can you provide the addr2line result? Let's see where reshape_request()
>>>>>> is stuck first.
>>>>>>
>>>>>> Thanks,
>>>>>> Kuai
>>>>>>
>>>>>>> [<0>] raid5_sync_request+0x2fc/0x3d0 [raid456]
>>>>>>> [<0>] md_do_sync+0x7d6/0x11d0 [md_mod]
>>>>>>> [<0>] md_thread+0xae/0x190 [md_mod]
>>>>>>> [<0>] kthread+0xe8/0x120
>>>>>>> [<0>] ret_from_fork+0x34/0x50
>>>>>>> [<0>] ret_from_fork_asm+0x1b/0x30
>>>>>>>
>>>>>>> Please let me know if there's a better way to provide the stack info.
>>>>>>>
>>>>>>> Thank you
>>>>>>>
>>>>>>> On Sun, Sep 3, 2023 at 6:41 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On 2023/09/04 5:39, Jason Moss wrote:
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I recently attempted to add a new drive to my 8-drive RAID 6 array,
>>>>>>>>> growing it to 9 drives. I've done similar before with the same array,
>>>>>>>>> having previously grown it from 6 drives to 7 and then from 7 to 8
>>>>>>>>> with no issues. Drives are WD Reds, most older than 2019, some
>>>>>>>>> (including the newest) newer, but all confirmed CMR and not SMR.
>>>>>>>>>
>>>>>>>>> Process used to expand the array:
>>>>>>>>> mdadm --add /dev/md0 /dev/sdb1
>>>>>>>>> mdadm --grow --raid-devices=9 --backup-file=/root/grow_md0.bak /dev/md0
>>>>>>>>>
>>>>>>>>> The reshape started off fine, the process was underway, and the volume
>>>>>>>>> was still usable as expected. However, 15-30 minutes into the reshape,
>>>>>>>>> I lost access to the contents of the drive. Checking /proc/mdstat, the
>>>>>>>>> reshape was stopped at 0.6% with the counter not incrementing at all.
>>>>>>>>> Any process accessing the array would just hang until killed. I waited
>>>>>>>>
>>>>>>>> What kernel version are you using? And it'll be very helpful if you can
>>>>>>>> collect the stacks of all stuck threads. There is a known deadlock for
>>>>>>>> raid5 related to reshape, and it's fixed in v6.5:
>>>>>>>>
>>>>>>>> https://lore.kernel.org/r/20230512015610.821290-6-yukuai1@huaweicloud.com
>>>>>>>>
>>>>>>>>> a half hour and there was still no further change to the counter. At
>>>>>>>>> this point, I restarted the server and found that when it came back up
>>>>>>>>> it would begin reshaping again, but only very briefly, under 30
>>>>>>>>> seconds, but the counter would be increasing during that time.
>>>>>>>>>
>>>>>>>>> I searched furiously for ideas and tried stopping and reassembling the
>>>>>>>>> array, assembling with an invalid-backup flag, echoing "frozen" then
>>>>>>>>> "reshape" to the sync_action file, and echoing "max" to the sync_max
>>>>>>>>> file. Nothing ever seemed to make a difference.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Don't do this before v6.5; echoing "reshape" while a reshape is still in
>>>>>>>> progress will corrupt your data:
>>>>>>>>
>>>>>>>> https://lore.kernel.org/r/20230512015610.821290-3-yukuai1@huaweicloud.com
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Kuai
>>>>>>>>
>>>>>>>>> Here is where I slightly panicked, worried that I'd borked my array,
>>>>>>>>> and powered off the server again and disconnected the new drive that
>>>>>>>>> was just added, assuming that since it was the change, it may be the
>>>>>>>>> problem despite having burn-in tested it, and figuring that I'll rush
>>>>>>>>> order a new drive, so long as the reshape continues and I can just
>>>>>>>>> rebuild onto a new drive once the reshape finishes. However, this made
>>>>>>>>> no difference and the array continued to not rebuild.
>>>>>>>>>
>>>>>>>>> Much searching later, I'd found nothing substantially different from
>>>>>>>>> what I'd already tried, and one of the common threads in other people's
>>>>>>>>> issues was bad drives, so I ran a self-test against each of the
>>>>>>>>> existing drives and found one drive that failed the read test.
>>>>>>>>> Thinking I had the culprit now, I dropped that drive out of the array
>>>>>>>>> and assembled the array again, but the same behavior persists. The
>>>>>>>>> array reshapes very briefly, then completely stops.
>>>>>>>>>
>>>>>>>>> Down to 0 drives of redundancy (in the reshaped section at least), not
>>>>>>>>> finding any new ideas on any of the forums, mailing list, wiki, etc,
>>>>>>>>> and very frustrated, I took a break, bought all new drives to build a
>>>>>>>>> new array in another server and restored from a backup. However, there
>>>>>>>>> is still some data not captured by the most recent backup that I would
>>>>>>>>> like to recover, and I'd also like to solve the problem purely to
>>>>>>>>> understand what happened and how to recover in the future.
>>>>>>>>>
>>>>>>>>> Is there anything else I should try to recover this array, or is this
>>>>>>>>> a lost cause?
>>>>>>>>>
>>>>>>>>> Details requested by the wiki to follow and I'm happy to collect any
>>>>>>>>> further data that would assist. /dev/sdb is the new drive that was
>>>>>>>>> added, then disconnected. /dev/sdh is the drive that failed a
>>>>>>>>> self-test and was removed from the array.
>>>>>>>>>
>>>>>>>>> Thank you in advance for any help provided!
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> $ uname -a
>>>>>>>>> Linux Blyth 5.15.0-76-generic #83-Ubuntu SMP Thu Jun 15 19:16:32 UTC
>>>>>>>>> 2023 x86_64 x86_64 x86_64 GNU/Linux
>>>>>>>>>
>>>>>>>>> $ mdadm --version
>>>>>>>>> mdadm - v4.2 - 2021-12-30
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> $ sudo smartctl -H -i -l scterc /dev/sda
>>>>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
>>>>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
>>>>>>>>>
>>>>>>>>> === START OF INFORMATION SECTION ===
>>>>>>>>> Model Family:     Western Digital Red
>>>>>>>>> Device Model:     WDC WD30EFRX-68EUZN0
>>>>>>>>> Serial Number:    WD-WCC4N7AT7R7X
>>>>>>>>> LU WWN Device Id: 5 0014ee 268545f93
>>>>>>>>> Firmware Version: 82.00A82
>>>>>>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
>>>>>>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>>>>>>>>> Rotation Rate:    5400 rpm
>>>>>>>>> Device is:        In smartctl database [for details use: -P show]
>>>>>>>>> ATA Version is:   ACS-2 (minor revision not indicated)
>>>>>>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
>>>>>>>>> Local Time is:    Sun Sep  3 13:27:55 2023 PDT
>>>>>>>>> SMART support is: Available - device has SMART capability.
>>>>>>>>> SMART support is: Enabled
>>>>>>>>>
>>>>>>>>> === START OF READ SMART DATA SECTION ===
>>>>>>>>> SMART overall-health self-assessment test result: PASSED
>>>>>>>>>
>>>>>>>>> SCT Error Recovery Control:
>>>>>>>>>                 Read:     70 (7.0 seconds)
>>>>>>>>>                Write:     70 (7.0 seconds)
>>>>>>>>>
>>>>>>>>> $ sudo smartctl -H -i -l scterc /dev/sda
>>>>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
>>>>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
>>>>>>>>>
>>>>>>>>> === START OF INFORMATION SECTION ===
>>>>>>>>> Model Family:     Western Digital Red
>>>>>>>>> Device Model:     WDC WD30EFRX-68EUZN0
>>>>>>>>> Serial Number:    WD-WCC4N7AT7R7X
>>>>>>>>> LU WWN Device Id: 5 0014ee 268545f93
>>>>>>>>> Firmware Version: 82.00A82
>>>>>>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
>>>>>>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>>>>>>>>> Rotation Rate:    5400 rpm
>>>>>>>>> Device is:        In smartctl database [for details use: -P show]
>>>>>>>>> ATA Version is:   ACS-2 (minor revision not indicated)
>>>>>>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
>>>>>>>>> Local Time is:    Sun Sep  3 13:28:16 2023 PDT
>>>>>>>>> SMART support is: Available - device has SMART capability.
>>>>>>>>> SMART support is: Enabled
>>>>>>>>>
>>>>>>>>> === START OF READ SMART DATA SECTION ===
>>>>>>>>> SMART overall-health self-assessment test result: PASSED
>>>>>>>>>
>>>>>>>>> SCT Error Recovery Control:
>>>>>>>>>                 Read:     70 (7.0 seconds)
>>>>>>>>>                Write:     70 (7.0 seconds)
>>>>>>>>>
>>>>>>>>> $ sudo smartctl -H -i -l scterc /dev/sdb
>>>>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
>>>>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
>>>>>>>>>
>>>>>>>>> === START OF INFORMATION SECTION ===
>>>>>>>>> Model Family:     Western Digital Red
>>>>>>>>> Device Model:     WDC WD30EFRX-68EUZN0
>>>>>>>>> Serial Number:    WD-WXG1A8UGLS42
>>>>>>>>> LU WWN Device Id: 5 0014ee 2b75ef53b
>>>>>>>>> Firmware Version: 80.00A80
>>>>>>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
>>>>>>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>>>>>>>>> Rotation Rate:    5400 rpm
>>>>>>>>> Device is:        In smartctl database [for details use: -P show]
>>>>>>>>> ATA Version is:   ACS-2 (minor revision not indicated)
>>>>>>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
>>>>>>>>> Local Time is:    Sun Sep  3 13:28:19 2023 PDT
>>>>>>>>> SMART support is: Available - device has SMART capability.
>>>>>>>>> SMART support is: Enabled
>>>>>>>>>
>>>>>>>>> === START OF READ SMART DATA SECTION ===
>>>>>>>>> SMART overall-health self-assessment test result: PASSED
>>>>>>>>>
>>>>>>>>> SCT Error Recovery Control:
>>>>>>>>>                 Read:     70 (7.0 seconds)
>>>>>>>>>                Write:     70 (7.0 seconds)
>>>>>>>>>
>>>>>>>>> $ sudo smartctl -H -i -l scterc /dev/sdc
>>>>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
>>>>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
>>>>>>>>>
>>>>>>>>> === START OF INFORMATION SECTION ===
>>>>>>>>> Model Family:     Western Digital Red
>>>>>>>>> Device Model:     WDC WD30EFRX-68EUZN0
>>>>>>>>> Serial Number:    WD-WCC4N4HYL32Y
>>>>>>>>> LU WWN Device Id: 5 0014ee 2630752f8
>>>>>>>>> Firmware Version: 82.00A82
>>>>>>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
>>>>>>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>>>>>>>>> Rotation Rate:    5400 rpm
>>>>>>>>> Device is:        In smartctl database [for details use: -P show]
>>>>>>>>> ATA Version is:   ACS-2 (minor revision not indicated)
>>>>>>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
>>>>>>>>> Local Time is:    Sun Sep  3 13:28:20 2023 PDT
>>>>>>>>> SMART support is: Available - device has SMART capability.
>>>>>>>>> SMART support is: Enabled
>>>>>>>>>
>>>>>>>>> === START OF READ SMART DATA SECTION ===
>>>>>>>>> SMART overall-health self-assessment test result: PASSED
>>>>>>>>>
>>>>>>>>> SCT Error Recovery Control:
>>>>>>>>>                 Read:     70 (7.0 seconds)
>>>>>>>>>                Write:     70 (7.0 seconds)
>>>>>>>>>
>>>>>>>>> $ sudo smartctl -H -i -l scterc /dev/sdd
>>>>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
>>>>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
>>>>>>>>>
>>>>>>>>> === START OF INFORMATION SECTION ===
>>>>>>>>> Model Family:     Western Digital Red
>>>>>>>>> Device Model:     WDC WD30EFRX-68N32N0
>>>>>>>>> Serial Number:    WD-WCC7K1FF6DYK
>>>>>>>>> LU WWN Device Id: 5 0014ee 2ba952a30
>>>>>>>>> Firmware Version: 82.00A82
>>>>>>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
>>>>>>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>>>>>>>>> Rotation Rate:    5400 rpm
>>>>>>>>> Form Factor:      3.5 inches
>>>>>>>>> Device is:        In smartctl database [for details use: -P show]
>>>>>>>>> ATA Version is:   ACS-3 T13/2161-D revision 5
>>>>>>>>> SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
>>>>>>>>> Local Time is:    Sun Sep  3 13:28:21 2023 PDT
>>>>>>>>> SMART support is: Available - device has SMART capability.
>>>>>>>>> SMART support is: Enabled
>>>>>>>>>
>>>>>>>>> === START OF READ SMART DATA SECTION ===
>>>>>>>>> SMART overall-health self-assessment test result: PASSED
>>>>>>>>>
>>>>>>>>> SCT Error Recovery Control:
>>>>>>>>>                 Read:     70 (7.0 seconds)
>>>>>>>>>                Write:     70 (7.0 seconds)
>>>>>>>>>
>>>>>>>>> $ sudo smartctl -H -i -l scterc /dev/sde
>>>>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
>>>>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
>>>>>>>>>
>>>>>>>>> === START OF INFORMATION SECTION ===
>>>>>>>>> Model Family:     Western Digital Red
>>>>>>>>> Device Model:     WDC WD30EFRX-68EUZN0
>>>>>>>>> Serial Number:    WD-WCC4N5ZHTRJF
>>>>>>>>> LU WWN Device Id: 5 0014ee 2b88b83bb
>>>>>>>>> Firmware Version: 82.00A82
>>>>>>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
>>>>>>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>>>>>>>>> Rotation Rate:    5400 rpm
>>>>>>>>> Device is:        In smartctl database [for details use: -P show]
>>>>>>>>> ATA Version is:   ACS-2 (minor revision not indicated)
>>>>>>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
>>>>>>>>> Local Time is:    Sun Sep  3 13:28:22 2023 PDT
>>>>>>>>> SMART support is: Available - device has SMART capability.
>>>>>>>>> SMART support is: Enabled
>>>>>>>>>
>>>>>>>>> === START OF READ SMART DATA SECTION ===
>>>>>>>>> SMART overall-health self-assessment test result: PASSED
>>>>>>>>>
>>>>>>>>> SCT Error Recovery Control:
>>>>>>>>>                 Read:     70 (7.0 seconds)
>>>>>>>>>                Write:     70 (7.0 seconds)
>>>>>>>>>
>>>>>>>>> $ sudo smartctl -H -i -l scterc /dev/sdf
>>>>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
>>>>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
>>>>>>>>>
>>>>>>>>> === START OF INFORMATION SECTION ===
>>>>>>>>> Model Family:     Western Digital Red
>>>>>>>>> Device Model:     WDC WD30EFRX-68AX9N0
>>>>>>>>> Serial Number:    WD-WMC1T3804790
>>>>>>>>> LU WWN Device Id: 5 0014ee 6036b6826
>>>>>>>>> Firmware Version: 80.00A80
>>>>>>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
>>>>>>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>>>>>>>>> Device is:        In smartctl database [for details use: -P show]
>>>>>>>>> ATA Version is:   ACS-2 (minor revision not indicated)
>>>>>>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
>>>>>>>>> Local Time is:    Sun Sep  3 13:28:23 2023 PDT
>>>>>>>>> SMART support is: Available - device has SMART capability.
>>>>>>>>> SMART support is: Enabled
>>>>>>>>>
>>>>>>>>> === START OF READ SMART DATA SECTION ===
>>>>>>>>> SMART overall-health self-assessment test result: PASSED
>>>>>>>>>
>>>>>>>>> SCT Error Recovery Control:
>>>>>>>>>                 Read:     70 (7.0 seconds)
>>>>>>>>>                Write:     70 (7.0 seconds)
>>>>>>>>>
>>>>>>>>> $ sudo smartctl -H -i -l scterc /dev/sdg
>>>>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
>>>>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
>>>>>>>>>
>>>>>>>>> === START OF INFORMATION SECTION ===
>>>>>>>>> Model Family:     Western Digital Red
>>>>>>>>> Device Model:     WDC WD30EFRX-68EUZN0
>>>>>>>>> Serial Number:    WD-WMC4N0H692Z9
>>>>>>>>> LU WWN Device Id: 5 0014ee 65af39740
>>>>>>>>> Firmware Version: 82.00A82
>>>>>>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
>>>>>>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>>>>>>>>> Rotation Rate:    5400 rpm
>>>>>>>>> Device is:        In smartctl database [for details use: -P show]
>>>>>>>>> ATA Version is:   ACS-2 (minor revision not indicated)
>>>>>>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
>>>>>>>>> Local Time is:    Sun Sep  3 13:28:24 2023 PDT
>>>>>>>>> SMART support is: Available - device has SMART capability.
>>>>>>>>> SMART support is: Enabled
>>>>>>>>>
>>>>>>>>> === START OF READ SMART DATA SECTION ===
>>>>>>>>> SMART overall-health self-assessment test result: PASSED
>>>>>>>>>
>>>>>>>>> SCT Error Recovery Control:
>>>>>>>>>                 Read:     70 (7.0 seconds)
>>>>>>>>>                Write:     70 (7.0 seconds)
>>>>>>>>>
>>>>>>>>> $ sudo smartctl -H -i -l scterc /dev/sdh
>>>>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
>>>>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
>>>>>>>>>
>>>>>>>>> === START OF INFORMATION SECTION ===
>>>>>>>>> Model Family:     Western Digital Red
>>>>>>>>> Device Model:     WDC WD30EFRX-68EUZN0
>>>>>>>>> Serial Number:    WD-WMC4N0K5S750
>>>>>>>>> LU WWN Device Id: 5 0014ee 6b048d9ca
>>>>>>>>> Firmware Version: 82.00A82
>>>>>>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
>>>>>>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>>>>>>>>> Rotation Rate:    5400 rpm
>>>>>>>>> Device is:        In smartctl database [for details use: -P show]
>>>>>>>>> ATA Version is:   ACS-2 (minor revision not indicated)
>>>>>>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
>>>>>>>>> Local Time is:    Sun Sep  3 13:28:24 2023 PDT
>>>>>>>>> SMART support is: Available - device has SMART capability.
>>>>>>>>> SMART support is: Enabled
>>>>>>>>>
>>>>>>>>> === START OF READ SMART DATA SECTION ===
>>>>>>>>> SMART overall-health self-assessment test result: PASSED
>>>>>>>>>
>>>>>>>>> SCT Error Recovery Control:
>>>>>>>>>                 Read:     70 (7.0 seconds)
>>>>>>>>>                Write:     70 (7.0 seconds)
>>>>>>>>>
>>>>>>>>> $ sudo smartctl -H -i -l scterc /dev/sdi
>>>>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
>>>>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
>>>>>>>>>
>>>>>>>>> === START OF INFORMATION SECTION ===
>>>>>>>>> Model Family:     Western Digital Red
>>>>>>>>> Device Model:     WDC WD30EFRX-68AX9N0
>>>>>>>>> Serial Number:    WD-WMC1T1502475
>>>>>>>>> LU WWN Device Id: 5 0014ee 058d2e5cb
>>>>>>>>> Firmware Version: 80.00A80
>>>>>>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
>>>>>>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>>>>>>>>> Device is:        In smartctl database [for details use: -P show]
>>>>>>>>> ATA Version is:   ACS-2 (minor revision not indicated)
>>>>>>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
>>>>>>>>> Local Time is:    Sun Sep  3 13:28:27 2023 PDT
>>>>>>>>> SMART support is: Available - device has SMART capability.
>>>>>>>>> SMART support is: Enabled
>>>>>>>>>
>>>>>>>>> === START OF READ SMART DATA SECTION ===
>>>>>>>>> SMART overall-health self-assessment test result: PASSED
>>>>>>>>>
>>>>>>>>> SCT Error Recovery Control:
>>>>>>>>>                 Read:     70 (7.0 seconds)
>>>>>>>>>                Write:     70 (7.0 seconds)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> $ sudo mdadm --examine /dev/sda
>>>>>>>>> /dev/sda:
>>>>>>>>>         MBR Magic : aa55
>>>>>>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
>>>>>>>>> $ sudo mdadm --examine /dev/sda1
>>>>>>>>> /dev/sda1:
>>>>>>>>>                Magic : a92b4efc
>>>>>>>>>              Version : 1.2
>>>>>>>>>          Feature Map : 0xd
>>>>>>>>>           Array UUID : 440dc11e:079308b1:131eda79:9a74c670
>>>>>>>>>                 Name : Blyth:0  (local to host Blyth)
>>>>>>>>>        Creation Time : Tue Aug  4 23:47:57 2015
>>>>>>>>>           Raid Level : raid6
>>>>>>>>>         Raid Devices : 9
>>>>>>>>>
>>>>>>>>>       Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
>>>>>>>>>           Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
>>>>>>>>>        Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
>>>>>>>>>          Data Offset : 247808 sectors
>>>>>>>>>         Super Offset : 8 sectors
>>>>>>>>>         Unused Space : before=247728 sectors, after=14336 sectors
>>>>>>>>>                State : clean
>>>>>>>>>          Device UUID : 8ca60ad5:60d19333:11b24820:91453532
>>>>>>>>>
>>>>>>>>> Internal Bitmap : 8 sectors from superblock
>>>>>>>>>        Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
>>>>>>>>>        Delta Devices : 1 (8->9)
>>>>>>>>>
>>>>>>>>>          Update Time : Tue Jul 11 23:12:08 2023
>>>>>>>>>        Bad Block Log : 512 entries available at offset 24 sectors - bad
>>>>>>>>> blocks present.
>>>>>>>>>             Checksum : b6d8f4d1 - correct
>>>>>>>>>               Events : 181105
>>>>>>>>>
>>>>>>>>>               Layout : left-symmetric
>>>>>>>>>           Chunk Size : 512K
>>>>>>>>>
>>>>>>>>>         Device Role : Active device 7
>>>>>>>>>         Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
>>>>>>>>>
>>>>>>>>> $ sudo mdadm --examine /dev/sdb
>>>>>>>>> /dev/sdb:
>>>>>>>>>         MBR Magic : aa55
>>>>>>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
>>>>>>>>> $ sudo mdadm --examine /dev/sdb1
>>>>>>>>> /dev/sdb1:
>>>>>>>>>                Magic : a92b4efc
>>>>>>>>>              Version : 1.2
>>>>>>>>>          Feature Map : 0x5
>>>>>>>>>           Array UUID : 440dc11e:079308b1:131eda79:9a74c670
>>>>>>>>>                 Name : Blyth:0  (local to host Blyth)
>>>>>>>>>        Creation Time : Tue Aug  4 23:47:57 2015
>>>>>>>>>           Raid Level : raid6
>>>>>>>>>         Raid Devices : 9
>>>>>>>>>
>>>>>>>>>       Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
>>>>>>>>>           Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
>>>>>>>>>        Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
>>>>>>>>>          Data Offset : 247808 sectors
>>>>>>>>>         Super Offset : 8 sectors
>>>>>>>>>         Unused Space : before=247728 sectors, after=14336 sectors
>>>>>>>>>                State : clean
>>>>>>>>>          Device UUID : 386d3001:16447e43:4d2a5459:85618d11
>>>>>>>>>
>>>>>>>>> Internal Bitmap : 8 sectors from superblock
>>>>>>>>>        Reshape pos'n : 124207104 (118.45 GiB 127.19 GB)
>>>>>>>>>        Delta Devices : 1 (8->9)
>>>>>>>>>
>>>>>>>>>          Update Time : Tue Jul 11 00:02:59 2023
>>>>>>>>>        Bad Block Log : 512 entries available at offset 24 sectors
>>>>>>>>>             Checksum : b544a39 - correct
>>>>>>>>>               Events : 181077
>>>>>>>>>
>>>>>>>>>               Layout : left-symmetric
>>>>>>>>>           Chunk Size : 512K
>>>>>>>>>
>>>>>>>>>         Device Role : Active device 8
>>>>>>>>>         Array State : AAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
>>>>>>>>>
>>>>>>>>> $ sudo mdadm --examine /dev/sdc
>>>>>>>>> /dev/sdc:
>>>>>>>>>         MBR Magic : aa55
>>>>>>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
>>>>>>>>> $ sudo mdadm --examine /dev/sdc1
>>>>>>>>> /dev/sdc1:
>>>>>>>>>                Magic : a92b4efc
>>>>>>>>>              Version : 1.2
>>>>>>>>>          Feature Map : 0xd
>>>>>>>>>           Array UUID : 440dc11e:079308b1:131eda79:9a74c670
>>>>>>>>>                 Name : Blyth:0  (local to host Blyth)
>>>>>>>>>        Creation Time : Tue Aug  4 23:47:57 2015
>>>>>>>>>           Raid Level : raid6
>>>>>>>>>         Raid Devices : 9
>>>>>>>>>
>>>>>>>>>       Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
>>>>>>>>>           Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
>>>>>>>>>        Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
>>>>>>>>>          Data Offset : 247808 sectors
>>>>>>>>>         Super Offset : 8 sectors
>>>>>>>>>         Unused Space : before=247720 sectors, after=14336 sectors
>>>>>>>>>                State : clean
>>>>>>>>>          Device UUID : 1798ec4f:72c56905:4e74ea61:2468db75
>>>>>>>>>
>>>>>>>>> Internal Bitmap : 8 sectors from superblock
>>>>>>>>>        Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
>>>>>>>>>        Delta Devices : 1 (8->9)
>>>>>>>>>
>>>>>>>>>          Update Time : Tue Jul 11 23:12:08 2023
>>>>>>>>>        Bad Block Log : 512 entries available at offset 72 sectors - bad
>>>>>>>>> blocks present.
>>>>>>>>>             Checksum : 88d8b8fc - correct
>>>>>>>>>               Events : 181105
>>>>>>>>>
>>>>>>>>>               Layout : left-symmetric
>>>>>>>>>           Chunk Size : 512K
>>>>>>>>>
>>>>>>>>>         Device Role : Active device 4
>>>>>>>>>         Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
>>>>>>>>>
>>>>>>>>> $ sudo mdadm --examine /dev/sdd
>>>>>>>>> /dev/sdd:
>>>>>>>>>         MBR Magic : aa55
>>>>>>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
>>>>>>>>> $ sudo mdadm --examine /dev/sdd1
>>>>>>>>> /dev/sdd1:
>>>>>>>>>                Magic : a92b4efc
>>>>>>>>>              Version : 1.2
>>>>>>>>>          Feature Map : 0x5
>>>>>>>>>           Array UUID : 440dc11e:079308b1:131eda79:9a74c670
>>>>>>>>>                 Name : Blyth:0  (local to host Blyth)
>>>>>>>>>        Creation Time : Tue Aug  4 23:47:57 2015
>>>>>>>>>           Raid Level : raid6
>>>>>>>>>         Raid Devices : 9
>>>>>>>>>
>>>>>>>>>       Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
>>>>>>>>>           Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
>>>>>>>>>        Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
>>>>>>>>>          Data Offset : 247808 sectors
>>>>>>>>>         Super Offset : 8 sectors
>>>>>>>>>         Unused Space : before=247728 sectors, after=14336 sectors
>>>>>>>>>                State : clean
>>>>>>>>>          Device UUID : a198095b:f54d26a9:deb3be8f:d6de9be1
>>>>>>>>>
>>>>>>>>> Internal Bitmap : 8 sectors from superblock
>>>>>>>>>        Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
>>>>>>>>>        Delta Devices : 1 (8->9)
>>>>>>>>>
>>>>>>>>>          Update Time : Tue Jul 11 23:12:08 2023
>>>>>>>>>        Bad Block Log : 512 entries available at offset 24 sectors
>>>>>>>>>             Checksum : d1471d9d - correct
>>>>>>>>>               Events : 181105
>>>>>>>>>
>>>>>>>>>               Layout : left-symmetric
>>>>>>>>>           Chunk Size : 512K
>>>>>>>>>
>>>>>>>>>         Device Role : Active device 6
>>>>>>>>>         Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
>>>>>>>>>
>>>>>>>>> $ sudo mdadm --examine /dev/sde
>>>>>>>>> /dev/sde:
>>>>>>>>>         MBR Magic : aa55
>>>>>>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
>>>>>>>>> $ sudo mdadm --examine /dev/sde1
>>>>>>>>> /dev/sde1:
>>>>>>>>>                Magic : a92b4efc
>>>>>>>>>              Version : 1.2
>>>>>>>>>          Feature Map : 0x5
>>>>>>>>>           Array UUID : 440dc11e:079308b1:131eda79:9a74c670
>>>>>>>>>                 Name : Blyth:0  (local to host Blyth)
>>>>>>>>>        Creation Time : Tue Aug  4 23:47:57 2015
>>>>>>>>>           Raid Level : raid6
>>>>>>>>>         Raid Devices : 9
>>>>>>>>>
>>>>>>>>>       Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
>>>>>>>>>           Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
>>>>>>>>>        Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
>>>>>>>>>          Data Offset : 247808 sectors
>>>>>>>>>         Super Offset : 8 sectors
>>>>>>>>>         Unused Space : before=247720 sectors, after=14336 sectors
>>>>>>>>>                State : clean
>>>>>>>>>          Device UUID : acf7ba2e:35d2fa91:6b12b0ce:33a73af5
>>>>>>>>>
>>>>>>>>> Internal Bitmap : 8 sectors from superblock
>>>>>>>>>        Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
>>>>>>>>>        Delta Devices : 1 (8->9)
>>>>>>>>>
>>>>>>>>>          Update Time : Tue Jul 11 23:12:08 2023
>>>>>>>>>        Bad Block Log : 512 entries available at offset 72 sectors
>>>>>>>>>             Checksum : e05d0278 - correct
>>>>>>>>>               Events : 181105
>>>>>>>>>
>>>>>>>>>               Layout : left-symmetric
>>>>>>>>>           Chunk Size : 512K
>>>>>>>>>
>>>>>>>>>         Device Role : Active device 5
>>>>>>>>>         Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
>>>>>>>>>
>>>>>>>>> $ sudo mdadm --examine /dev/sdf
>>>>>>>>> /dev/sdf:
>>>>>>>>>         MBR Magic : aa55
>>>>>>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
>>>>>>>>> $ sudo mdadm --examine /dev/sdf1
>>>>>>>>> /dev/sdf1:
>>>>>>>>>                Magic : a92b4efc
>>>>>>>>>              Version : 1.2
>>>>>>>>>          Feature Map : 0x5
>>>>>>>>>           Array UUID : 440dc11e:079308b1:131eda79:9a74c670
>>>>>>>>>                 Name : Blyth:0  (local to host Blyth)
>>>>>>>>>        Creation Time : Tue Aug  4 23:47:57 2015
>>>>>>>>>           Raid Level : raid6
>>>>>>>>>         Raid Devices : 9
>>>>>>>>>
>>>>>>>>>       Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
>>>>>>>>>           Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
>>>>>>>>>        Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
>>>>>>>>>          Data Offset : 247808 sectors
>>>>>>>>>         Super Offset : 8 sectors
>>>>>>>>>         Unused Space : before=247720 sectors, after=14336 sectors
>>>>>>>>>                State : clean
>>>>>>>>>          Device UUID : 31e7b86d:c274ff45:aa6dab50:2ff058c6
>>>>>>>>>
>>>>>>>>> Internal Bitmap : 8 sectors from superblock
>>>>>>>>>        Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
>>>>>>>>>        Delta Devices : 1 (8->9)
>>>>>>>>>
>>>>>>>>>          Update Time : Tue Jul 11 23:12:08 2023
>>>>>>>>>        Bad Block Log : 512 entries available at offset 72 sectors
>>>>>>>>>             Checksum : 26792cc0 - correct
>>>>>>>>>               Events : 181105
>>>>>>>>>
>>>>>>>>>               Layout : left-symmetric
>>>>>>>>>           Chunk Size : 512K
>>>>>>>>>
>>>>>>>>>         Device Role : Active device 0
>>>>>>>>>         Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
>>>>>>>>>
>>>>>>>>> $ sudo mdadm --examine /dev/sdg
>>>>>>>>> /dev/sdg:
>>>>>>>>>         MBR Magic : aa55
>>>>>>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
>>>>>>>>> $ sudo mdadm --examine /dev/sdg1
>>>>>>>>> /dev/sdg1:
>>>>>>>>>                Magic : a92b4efc
>>>>>>>>>              Version : 1.2
>>>>>>>>>          Feature Map : 0x5
>>>>>>>>>           Array UUID : 440dc11e:079308b1:131eda79:9a74c670
>>>>>>>>>                 Name : Blyth:0  (local to host Blyth)
>>>>>>>>>        Creation Time : Tue Aug  4 23:47:57 2015
>>>>>>>>>           Raid Level : raid6
>>>>>>>>>         Raid Devices : 9
>>>>>>>>>
>>>>>>>>>       Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
>>>>>>>>>           Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
>>>>>>>>>        Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
>>>>>>>>>          Data Offset : 247808 sectors
>>>>>>>>>         Super Offset : 8 sectors
>>>>>>>>>         Unused Space : before=247720 sectors, after=14336 sectors
>>>>>>>>>                State : clean
>>>>>>>>>          Device UUID : 74476ce7:4edc23f6:08120711:ba281425
>>>>>>>>>
>>>>>>>>> Internal Bitmap : 8 sectors from superblock
>>>>>>>>>        Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
>>>>>>>>>        Delta Devices : 1 (8->9)
>>>>>>>>>
>>>>>>>>>          Update Time : Tue Jul 11 23:12:08 2023
>>>>>>>>>        Bad Block Log : 512 entries available at offset 72 sectors
>>>>>>>>>             Checksum : 6f67d179 - correct
>>>>>>>>>               Events : 181105
>>>>>>>>>
>>>>>>>>>               Layout : left-symmetric
>>>>>>>>>           Chunk Size : 512K
>>>>>>>>>
>>>>>>>>>         Device Role : Active device 1
>>>>>>>>>         Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
>>>>>>>>>
>>>>>>>>> $ sudo mdadm --examine /dev/sdh
>>>>>>>>> /dev/sdh:
>>>>>>>>>         MBR Magic : aa55
>>>>>>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
>>>>>>>>> $ sudo mdadm --examine /dev/sdh1
>>>>>>>>> /dev/sdh1:
>>>>>>>>>                Magic : a92b4efc
>>>>>>>>>              Version : 1.2
>>>>>>>>>          Feature Map : 0xd
>>>>>>>>>           Array UUID : 440dc11e:079308b1:131eda79:9a74c670
>>>>>>>>>                 Name : Blyth:0  (local to host Blyth)
>>>>>>>>>        Creation Time : Tue Aug  4 23:47:57 2015
>>>>>>>>>           Raid Level : raid6
>>>>>>>>>         Raid Devices : 9
>>>>>>>>>
>>>>>>>>>       Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
>>>>>>>>>           Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
>>>>>>>>>        Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
>>>>>>>>>          Data Offset : 247808 sectors
>>>>>>>>>         Super Offset : 8 sectors
>>>>>>>>>         Unused Space : before=247720 sectors, after=14336 sectors
>>>>>>>>>                State : clean
>>>>>>>>>          Device UUID : 31c08263:b135f0f5:763bc86b:f81d7296
>>>>>>>>>
>>>>>>>>> Internal Bitmap : 8 sectors from superblock
>>>>>>>>>        Reshape pos'n : 124207104 (118.45 GiB 127.19 GB)
>>>>>>>>>        Delta Devices : 1 (8->9)
>>>>>>>>>
>>>>>>>>>          Update Time : Tue Jul 11 20:09:14 2023
>>>>>>>>>        Bad Block Log : 512 entries available at offset 72 sectors - bad
>>>>>>>>> blocks present.
>>>>>>>>>             Checksum : b7696b68 - correct
>>>>>>>>>               Events : 181089
>>>>>>>>>
>>>>>>>>>               Layout : left-symmetric
>>>>>>>>>           Chunk Size : 512K
>>>>>>>>>
>>>>>>>>>         Device Role : Active device 2
>>>>>>>>>         Array State : AAAAAAAA. ('A' == active, '.' == missing, 'R' == replacing)
>>>>>>>>>
>>>>>>>>> $ sudo mdadm --examine /dev/sdi
>>>>>>>>> /dev/sdi:
>>>>>>>>>         MBR Magic : aa55
>>>>>>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
>>>>>>>>> $ sudo mdadm --examine /dev/sdi1
>>>>>>>>> /dev/sdi1:
>>>>>>>>>                Magic : a92b4efc
>>>>>>>>>              Version : 1.2
>>>>>>>>>          Feature Map : 0x5
>>>>>>>>>           Array UUID : 440dc11e:079308b1:131eda79:9a74c670
>>>>>>>>>                 Name : Blyth:0  (local to host Blyth)
>>>>>>>>>        Creation Time : Tue Aug  4 23:47:57 2015
>>>>>>>>>           Raid Level : raid6
>>>>>>>>>         Raid Devices : 9
>>>>>>>>>
>>>>>>>>>       Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
>>>>>>>>>           Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
>>>>>>>>>        Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
>>>>>>>>>          Data Offset : 247808 sectors
>>>>>>>>>         Super Offset : 8 sectors
>>>>>>>>>         Unused Space : before=247720 sectors, after=14336 sectors
>>>>>>>>>                State : clean
>>>>>>>>>          Device UUID : ac1063fc:d9d66e6d:f3de33da:b396f483
>>>>>>>>>
>>>>>>>>> Internal Bitmap : 8 sectors from superblock
>>>>>>>>>        Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
>>>>>>>>>        Delta Devices : 1 (8->9)
>>>>>>>>>
>>>>>>>>>          Update Time : Tue Jul 11 23:12:08 2023
>>>>>>>>>        Bad Block Log : 512 entries available at offset 72 sectors
>>>>>>>>>             Checksum : 23b6d024 - correct
>>>>>>>>>               Events : 181105
>>>>>>>>>
>>>>>>>>>               Layout : left-symmetric
>>>>>>>>>           Chunk Size : 512K
>>>>>>>>>
>>>>>>>>>         Device Role : Active device 3
>>>>>>>>>         Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
>>>>>>>>>
>>>>>>>>> $ sudo mdadm --detail /dev/md0
>>>>>>>>> /dev/md0:
>>>>>>>>>                 Version : 1.2
>>>>>>>>>              Raid Level : raid6
>>>>>>>>>           Total Devices : 9
>>>>>>>>>             Persistence : Superblock is persistent
>>>>>>>>>
>>>>>>>>>                   State : inactive
>>>>>>>>>         Working Devices : 9
>>>>>>>>>
>>>>>>>>>           Delta Devices : 1, (-1->0)
>>>>>>>>>               New Level : raid6
>>>>>>>>>              New Layout : left-symmetric
>>>>>>>>>           New Chunksize : 512K
>>>>>>>>>
>>>>>>>>>                    Name : Blyth:0  (local to host Blyth)
>>>>>>>>>                    UUID : 440dc11e:079308b1:131eda79:9a74c670
>>>>>>>>>                  Events : 181105
>>>>>>>>>
>>>>>>>>>          Number   Major   Minor   RaidDevice
>>>>>>>>>
>>>>>>>>>             -       8        1        -        /dev/sda1
>>>>>>>>>             -       8      129        -        /dev/sdi1
>>>>>>>>>             -       8      113        -        /dev/sdh1
>>>>>>>>>             -       8       97        -        /dev/sdg1
>>>>>>>>>             -       8       81        -        /dev/sdf1
>>>>>>>>>             -       8       65        -        /dev/sde1
>>>>>>>>>             -       8       49        -        /dev/sdd1
>>>>>>>>>             -       8       33        -        /dev/sdc1
>>>>>>>>>             -       8       17        -        /dev/sdb1
>>>>>>>>>
>>>>>>>>> $ cat /proc/mdstat
>>>>>>>>> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
>>>>>>>>> [raid4] [raid10]
>>>>>>>>> md0 : inactive sdb1[9](S) sdi1[4](S) sdf1[0](S) sdg1[1](S) sdh1[3](S)
>>>>>>>>> sda1[8](S) sdd1[7](S) sdc1[6](S) sde1[5](S)
>>>>>>>>>            26353689600 blocks super 1.2
>>>>>>>>>
>>>>>>>>> unused devices: <none>
>>>>>>>>>
>>>>>>>>> .
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> .
>>>>>>>
>>>>>>
>>>>>
>>>>> .
>>>>>
>>>>
>>>
>>> .
>>>
>>
> 
> .
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Reshape Failure
  2023-09-10  2:45                 ` Yu Kuai
@ 2023-09-10  4:58                   ` Jason Moss
  2023-09-10  6:10                     ` Yu Kuai
  0 siblings, 1 reply; 21+ messages in thread
From: Jason Moss @ 2023-09-10  4:58 UTC (permalink / raw)
  To: Yu Kuai; +Cc: linux-raid, yangerkun@huawei.com, yukuai (C)

Hi,

On Sat, Sep 9, 2023 at 7:45 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> > On 2023/09/07 14:19, Jason Moss wrote:
> > Hi,
> >
> > On Wed, Sep 6, 2023 at 11:13 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>
> >> Hi,
> >>
> >> On 2023/09/07 13:44, Jason Moss wrote:
> >>> Hi,
> >>>
> >>> On Wed, Sep 6, 2023 at 6:38 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> On 2023/09/06 22:05, Jason Moss wrote:
> >>>>> Hi Kuai,
> >>>>>
> >>>>> I ended up using gdb rather than addr2line, as that output didn't give
> >>>>> me the global offset. Maybe there's a better way, but this seems to be
> >>>>> similar to what I expected.
> >>>>
> >>>> It's ok.
> >>>>>
> >>>>> (gdb) list *(reshape_request+0x416)
> >>>>> 0x11566 is in reshape_request (drivers/md/raid5.c:6396).
> >>>>> 6391            if ((mddev->reshape_backwards
> >>>>> 6392                 ? (safepos > writepos && readpos < writepos)
> >>>>> 6393                 : (safepos < writepos && readpos > writepos)) ||
> >>>>> 6394                time_after(jiffies, conf->reshape_checkpoint + 10*HZ)) {
> >>>>> 6395                    /* Cannot proceed until we've updated the superblock... */
> >>>>> 6396                    wait_event(conf->wait_for_overlap,
> >>>>> 6397                               atomic_read(&conf->reshape_stripes)==0
> >>>>> 6398                               || test_bit(MD_RECOVERY_INTR,
> >>>>
> >>>> If reshape is stuck here, that means either:
> >>>>
> >>>> 1) reshape IO is stuck somewhere and never completes; or
> >>>> 2) the reshape_stripes counter is broken.
> >>>>
> >>>> Can you read the following debugfs files to verify whether IO is stuck in
> >>>> the underlying disks?
> >>>>
> >>>> /sys/kernel/debug/block/[disk]/hctx*/{sched_tags,tags,busy,dispatch}
> >>>>
> >>>
> >>> I'll attach this below.
> >>>
> >>>> Furthermore, echoing frozen should break out of the above wait_event() because
> >>>> 'MD_RECOVERY_INTR' will be set; however, based on your description,
> >>>> the problem still exists. Can you collect the stack and addr2line result
> >>>> of the stuck thread after echoing frozen?
> >>>>
> >>>
> >>> I echoed frozen to /sys/block/md0/md/sync_action; however, the echo
> >>> call has been sitting for about 30 minutes, maybe longer, and has not
> >>> returned. Here's the current state:
> >>>
> >>> root         454  0.0  0.0      0     0 ?        I<   Sep05   0:00 [raid5wq]
> >>> root         455  0.0  0.0  34680  5988 ?        D    Sep05   0:00 (udev-worker)
> >>
> >> Can you also show the stack of udev-worker, and of any other thread in
> >> 'D' state? I think the "echo frozen" above is probably also stuck in D
> >> state.
> >>
> >
> > As requested:
> >
> > ps aux | grep D
> > USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
> > root         455  0.0  0.0  34680  5988 ?        D    Sep05   0:00 (udev-worker)
> > root         457  0.0  0.0      0     0 ?        D    Sep05   0:00 [md0_reshape]
> > root       45507  0.0  0.0   8272  4736 pts/1    Ds+  Sep05   0:00 -bash
> > jason     279169  0.0  0.0   6976  2560 pts/0    S+   23:16   0:00
> > grep --color=auto D
> >
> > [jason@arch md]$ sudo cat /proc/455/stack
> > [<0>] wait_woken+0x54/0x60
> > [<0>] raid5_make_request+0x5fe/0x12f0 [raid456]
> > [<0>] md_handle_request+0x135/0x220 [md_mod]
> > [<0>] __submit_bio+0xb3/0x170
> > [<0>] submit_bio_noacct_nocheck+0x159/0x370
> > [<0>] block_read_full_folio+0x21c/0x340
> > [<0>] filemap_read_folio+0x40/0xd0
> > [<0>] filemap_get_pages+0x475/0x630
> > [<0>] filemap_read+0xd9/0x350
> > [<0>] blkdev_read_iter+0x6b/0x1b0
> > [<0>] vfs_read+0x201/0x350
> > [<0>] ksys_read+0x6f/0xf0
> > [<0>] do_syscall_64+0x60/0x90
> > [<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> >
> >
> > [jason@arch md]$ sudo cat /proc/45507/stack
> > [<0>] kthread_stop+0x6a/0x180
> > [<0>] md_unregister_thread+0x29/0x60 [md_mod]
> > [<0>] action_store+0x168/0x320 [md_mod]
> > [<0>] md_attr_store+0x86/0xf0 [md_mod]
> > [<0>] kernfs_fop_write_iter+0x136/0x1d0
> > [<0>] vfs_write+0x23e/0x420
> > [<0>] ksys_write+0x6f/0xf0
> > [<0>] do_syscall_64+0x60/0x90
> > [<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> >
> > Please let me know if you'd like me to identify the lines for any of those.
> >
>
> That's enough.
> > Thanks,
> > Jason
> >
> >
> >>> root         456 99.9  0.0      0     0 ?        R    Sep05 1543:40 [md0_raid6]
> >>> root         457  0.0  0.0      0     0 ?        D    Sep05   0:00 [md0_reshape]
> >>>
> >>> [jason@arch md]$ sudo cat /proc/457/stack
> >>> [<0>] md_do_sync+0xef2/0x11d0 [md_mod]
> >>> [<0>] md_thread+0xae/0x190 [md_mod]
> >>> [<0>] kthread+0xe8/0x120
> >>> [<0>] ret_from_fork+0x34/0x50
> >>> [<0>] ret_from_fork_asm+0x1b/0x30
> >>>
> >>> Reading symbols from md-mod.ko...
> >>> (gdb) list *(md_do_sync+0xef2)
> >>> 0xb3a2 is in md_do_sync (drivers/md/md.c:9035).
> >>> 9030                    ? "interrupted" : "done");
> >>> 9031            /*
> >>> 9032             * this also signals 'finished resyncing' to md_stop
> >>> 9033             */
> >>> 9034            blk_finish_plug(&plug);
> >>> 9035            wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active));
> >>
> >> That's also waiting for reshape IO to complete from the common layer.
> >>
> >>> 9036
> >>> 9037            if (!test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
> >>> 9038                !test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
> >>> 9039                mddev->curr_resync >= MD_RESYNC_ACTIVE) {
> >>>
> >>>
> >>> The debugfs info:
> >>>
> >>> [root@arch ~]# cat
> >>> /sys/kernel/debug/block/sda/hctx*/{sched_tags,tags,busy,dispatch}
> >>
> >> Only sched_tags was read; sorry, I didn't mean for you to use that exact cmd.
> >>
> >> Perhaps you can use the following cmd:
> >>
> >> find /sys/kernel/debug/block/sda/ -type f | xargs grep .
> >>
> >>> nr_tags=64
> >>> nr_reserved_tags=0
> >>> active_queues=0
> >>>
> >>> bitmap_tags:
> >>> depth=64
> >>> busy=1
> >>
> >> This means there is one IO outstanding in sda; however, I need more
> >> information to make sure where this IO is. And please make sure not to run
> >> any other thread that can read/write from sda. You can use "iostat -dmx 1"
> >> and observe for a while to confirm that there is no new IO.
>
> And can you help with this? Confirm there is no new IO and collect the debugfs output.

As instructed, I confirmed there is no active IO to sda1 via iostat; the check
was along the lines of the sketch below (not my exact invocation).
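
    iostat -dmx 1    # watch a few refreshes; r/s and w/s should stay at 0 for sda

I then ran the provided command: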

[root@arch ~]# find /sys/kernel/debug/block/sda/ -type f | xargs grep .
/sys/kernel/debug/block/sda/rqos/wbt/wb_background:6
/sys/kernel/debug/block/sda/rqos/wbt/wb_normal:12
/sys/kernel/debug/block/sda/rqos/wbt/unknown_cnt:4
/sys/kernel/debug/block/sda/rqos/wbt/min_lat_nsec:75000000
/sys/kernel/debug/block/sda/rqos/wbt/inflight:0: inflight 1
/sys/kernel/debug/block/sda/rqos/wbt/inflight:1: inflight 0
/sys/kernel/debug/block/sda/rqos/wbt/inflight:2: inflight 0
/sys/kernel/debug/block/sda/rqos/wbt/id:0
/sys/kernel/debug/block/sda/rqos/wbt/enabled:1
/sys/kernel/debug/block/sda/rqos/wbt/curr_win_nsec:100000000
/sys/kernel/debug/block/sda/hctx0/type:default
/sys/kernel/debug/block/sda/hctx0/dispatch_busy:0
/sys/kernel/debug/block/sda/hctx0/active:0
/sys/kernel/debug/block/sda/hctx0/run:2583
/sys/kernel/debug/block/sda/hctx0/sched_tags_bitmap:00000000: 0000 0000 8000 0000
/sys/kernel/debug/block/sda/hctx0/sched_tags:nr_tags=64
/sys/kernel/debug/block/sda/hctx0/sched_tags:nr_reserved_tags=0
/sys/kernel/debug/block/sda/hctx0/sched_tags:active_queues=0
/sys/kernel/debug/block/sda/hctx0/sched_tags:bitmap_tags:
/sys/kernel/debug/block/sda/hctx0/sched_tags:depth=64
/sys/kernel/debug/block/sda/hctx0/sched_tags:busy=1
/sys/kernel/debug/block/sda/hctx0/sched_tags:cleared=57
/sys/kernel/debug/block/sda/hctx0/sched_tags:bits_per_word=16
/sys/kernel/debug/block/sda/hctx0/sched_tags:map_nr=4
/sys/kernel/debug/block/sda/hctx0/sched_tags:alloc_hint={40, 20, 48, 0}
/sys/kernel/debug/block/sda/hctx0/sched_tags:wake_batch=8
/sys/kernel/debug/block/sda/hctx0/sched_tags:wake_index=0
/sys/kernel/debug/block/sda/hctx0/sched_tags:ws_active=0
/sys/kernel/debug/block/sda/hctx0/sched_tags:ws={
/sys/kernel/debug/block/sda/hctx0/sched_tags:   {.wait=inactive},
/sys/kernel/debug/block/sda/hctx0/sched_tags:   {.wait=inactive},
/sys/kernel/debug/block/sda/hctx0/sched_tags:   {.wait=inactive},
/sys/kernel/debug/block/sda/hctx0/sched_tags:   {.wait=inactive},
/sys/kernel/debug/block/sda/hctx0/sched_tags:   {.wait=inactive},
/sys/kernel/debug/block/sda/hctx0/sched_tags:   {.wait=inactive},
/sys/kernel/debug/block/sda/hctx0/sched_tags:   {.wait=inactive},
/sys/kernel/debug/block/sda/hctx0/sched_tags:   {.wait=inactive},
/sys/kernel/debug/block/sda/hctx0/sched_tags:}
/sys/kernel/debug/block/sda/hctx0/sched_tags:round_robin=1
/sys/kernel/debug/block/sda/hctx0/sched_tags:min_shallow_depth=48
/sys/kernel/debug/block/sda/hctx0/tags_bitmap:00000000: 0000 0000
/sys/kernel/debug/block/sda/hctx0/tags:nr_tags=32
/sys/kernel/debug/block/sda/hctx0/tags:nr_reserved_tags=0
/sys/kernel/debug/block/sda/hctx0/tags:active_queues=0
/sys/kernel/debug/block/sda/hctx0/tags:bitmap_tags:
/sys/kernel/debug/block/sda/hctx0/tags:depth=32
/sys/kernel/debug/block/sda/hctx0/tags:busy=0
/sys/kernel/debug/block/sda/hctx0/tags:cleared=21
/sys/kernel/debug/block/sda/hctx0/tags:bits_per_word=8
/sys/kernel/debug/block/sda/hctx0/tags:map_nr=4
/sys/kernel/debug/block/sda/hctx0/tags:alloc_hint={19, 26, 7, 21}
/sys/kernel/debug/block/sda/hctx0/tags:wake_batch=4
/sys/kernel/debug/block/sda/hctx0/tags:wake_index=0
/sys/kernel/debug/block/sda/hctx0/tags:ws_active=0
/sys/kernel/debug/block/sda/hctx0/tags:ws={
/sys/kernel/debug/block/sda/hctx0/tags: {.wait=inactive},
/sys/kernel/debug/block/sda/hctx0/tags: {.wait=inactive},
/sys/kernel/debug/block/sda/hctx0/tags: {.wait=inactive},
/sys/kernel/debug/block/sda/hctx0/tags: {.wait=inactive},
/sys/kernel/debug/block/sda/hctx0/tags: {.wait=inactive},
/sys/kernel/debug/block/sda/hctx0/tags: {.wait=inactive},
/sys/kernel/debug/block/sda/hctx0/tags: {.wait=inactive},
/sys/kernel/debug/block/sda/hctx0/tags: {.wait=inactive},
/sys/kernel/debug/block/sda/hctx0/tags:}
/sys/kernel/debug/block/sda/hctx0/tags:round_robin=1
/sys/kernel/debug/block/sda/hctx0/tags:min_shallow_depth=4294967295
/sys/kernel/debug/block/sda/hctx0/ctx_map:00000000: 00
/sys/kernel/debug/block/sda/hctx0/flags:alloc_policy=RR SHOULD_MERGE
/sys/kernel/debug/block/sda/sched/queued:0 0 0
/sys/kernel/debug/block/sda/sched/owned_by_driver:0 0 0
/sys/kernel/debug/block/sda/sched/async_depth:48
/sys/kernel/debug/block/sda/sched/starved:0
/sys/kernel/debug/block/sda/sched/batching:2
/sys/kernel/debug/block/sda/state:SAME_COMP|IO_STAT|ADD_RANDOM|INIT_DONE|WC|STATS|REGISTERED|NOWAIT|SQ_SCHED
/sys/kernel/debug/block/sda/pm_only:0
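
In case it helps to compare the member disks side by side, the same busy
counters can be pulled from each disk's debugfs with something like this (a
sketch; it assumes the members are the disks dumped earlier in this thread):

for d in sda sdb sdd sdf sdh sdi sdj; do
    # print the "busy=" line from the scheduler tags and the driver tags of every hctx
    grep -H 'busy=' /sys/kernel/debug/block/$d/hctx*/{sched_tags,tags}
done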

Let me know if there's anything further I can provide to assist in
troubleshooting.
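
If a fuller picture of the blocked tasks would help, I can also trigger a
blocked-task dump and attach the resulting kernel log; roughly the following,
assuming sysrq is enabled on this box:

echo w > /proc/sysrq-trigger    # log every task in uninterruptible (D) state, with stacks
dmesg | tail -n 200             # the dump lands in the kernel log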

Thanks,
Jason

>
> Thanks,
> Kuai
>
> >>
> >> Thanks,
> >> Kuai
> >>
> >>> cleared=55
> >>> bits_per_word=16
> >>> map_nr=4
> >>> alloc_hint={40, 20, 46, 0}
> >>> wake_batch=8
> >>> wake_index=0
> >>> ws_active=0
> >>> ws={
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>> }
> >>> round_robin=1
> >>> min_shallow_depth=48
> >>> nr_tags=32
> >>> nr_reserved_tags=0
> >>> active_queues=0
> >>>
> >>> bitmap_tags:
> >>> depth=32
> >>> busy=0
> >>> cleared=27
> >>> bits_per_word=8
> >>> map_nr=4
> >>> alloc_hint={19, 26, 5, 21}
> >>> wake_batch=4
> >>> wake_index=0
> >>> ws_active=0
> >>> ws={
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>> }
> >>> round_robin=1
> >>> min_shallow_depth=4294967295
> >>
> >>
> >>>
> >>>
> >>> [root@arch ~]# cat /sys/kernel/debug/block/sdb/hctx*
> >>> /{sched_tags,tags,busy,dispatch}
> >>> nr_tags=64
> >>> nr_reserved_tags=0
> >>> active_queues=0
> >>>
> >>> bitmap_tags:
> >>> depth=64
> >>> busy=1
> >>> cleared=56
> >>> bits_per_word=16
> >>> map_nr=4
> >>> alloc_hint={57, 43, 14, 19}
> >>> wake_batch=8
> >>> wake_index=0
> >>> ws_active=0
> >>> ws={
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>> }
> >>> round_robin=1
> >>> min_shallow_depth=48
> >>> nr_tags=32
> >>> nr_reserved_tags=0
> >>> active_queues=0
> >>>
> >>> bitmap_tags:
> >>> depth=32
> >>> busy=0
> >>> cleared=24
> >>> bits_per_word=8
> >>> map_nr=4
> >>> alloc_hint={17, 13, 23, 17}
> >>> wake_batch=4
> >>> wake_index=0
> >>> ws_active=0
> >>> ws={
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>> }
> >>> round_robin=1
> >>> min_shallow_depth=4294967295
> >>>
> >>>
> >>> [root@arch ~]# cat
> >>> /sys/kernel/debug/block/sdd/hctx*/{sched_tags,tags,busy,dispatch}
> >>> nr_tags=64
> >>> nr_reserved_tags=0
> >>> active_queues=0
> >>>
> >>> bitmap_tags:
> >>> depth=64
> >>> busy=1
> >>> cleared=51
> >>> bits_per_word=16
> >>> map_nr=4
> >>> alloc_hint={36, 43, 15, 7}
> >>> wake_batch=8
> >>> wake_index=0
> >>> ws_active=0
> >>> ws={
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>> }
> >>> round_robin=1
> >>> min_shallow_depth=48
> >>> nr_tags=32
> >>> nr_reserved_tags=0
> >>> active_queues=0
> >>>
> >>> bitmap_tags:
> >>> depth=32
> >>> busy=0
> >>> cleared=31
> >>> bits_per_word=8
> >>> map_nr=4
> >>> alloc_hint={0, 15, 1, 22}
> >>> wake_batch=4
> >>> wake_index=0
> >>> ws_active=0
> >>> ws={
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>> }
> >>> round_robin=1
> >>> min_shallow_depth=4294967295
> >>>
> >>>
> >>> [root@arch ~]# cat
> >>> /sys/kernel/debug/block/sdf/hctx*/{sched_tags,tags,busy,dispatch}
> >>> nr_tags=256
> >>> nr_reserved_tags=0
> >>> active_queues=0
> >>>
> >>> bitmap_tags:
> >>> depth=256
> >>> busy=1
> >>> cleared=131
> >>> bits_per_word=64
> >>> map_nr=4
> >>> alloc_hint={125, 46, 83, 205}
> >>> wake_batch=8
> >>> wake_index=0
> >>> ws_active=0
> >>> ws={
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>> }
> >>> round_robin=0
> >>> min_shallow_depth=192
> >>> nr_tags=10104
> >>> nr_reserved_tags=0
> >>> active_queues=0
> >>>
> >>> bitmap_tags:
> >>> depth=10104
> >>> busy=0
> >>> cleared=235
> >>> bits_per_word=64
> >>> map_nr=158
> >>> alloc_hint={503, 2913, 9827, 9851}
> >>> wake_batch=8
> >>> wake_index=0
> >>> ws_active=0
> >>> ws={
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>> }
> >>> round_robin=0
> >>> min_shallow_depth=4294967295
> >>>
> >>>
> >>> [root@arch ~]# cat
> >>> /sys/kernel/debug/block/sdh/hctx*/{sched_tags,tags,busy,dispatch}
> >>> nr_tags=256
> >>> nr_reserved_tags=0
> >>> active_queues=0
> >>>
> >>> bitmap_tags:
> >>> depth=256
> >>> busy=1
> >>> cleared=97
> >>> bits_per_word=64
> >>> map_nr=4
> >>> alloc_hint={144, 144, 127, 254}
> >>> wake_batch=8
> >>> wake_index=0
> >>> ws_active=0
> >>> ws={
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>> }
> >>> round_robin=0
> >>> min_shallow_depth=192
> >>> nr_tags=10104
> >>> nr_reserved_tags=0
> >>> active_queues=0
> >>>
> >>> bitmap_tags:
> >>> depth=10104
> >>> busy=0
> >>> cleared=235
> >>> bits_per_word=64
> >>> map_nr=158
> >>> alloc_hint={503, 2913, 9827, 9851}
> >>> wake_batch=8
> >>> wake_index=0
> >>> ws_active=0
> >>> ws={
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>> }
> >>> round_robin=0
> >>> min_shallow_depth=4294967295
> >>>
> >>>
> >>> [root@arch ~]# cat
> >>> /sys/kernel/debug/block/sdi/hctx*/{sched_tags,tags,busy,dispatch}
> >>> nr_tags=256
> >>> nr_reserved_tags=0
> >>> active_queues=0
> >>>
> >>> bitmap_tags:
> >>> depth=256
> >>> busy=1
> >>> cleared=34
> >>> bits_per_word=64
> >>> map_nr=4
> >>> alloc_hint={197, 20, 1, 230}
> >>> wake_batch=8
> >>> wake_index=0
> >>> ws_active=0
> >>> ws={
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>> }
> >>> round_robin=0
> >>> min_shallow_depth=192
> >>> nr_tags=10104
> >>> nr_reserved_tags=0
> >>> active_queues=0
> >>>
> >>> bitmap_tags:
> >>> depth=10104
> >>> busy=0
> >>> cleared=235
> >>> bits_per_word=64
> >>> map_nr=158
> >>> alloc_hint={503, 2913, 9827, 9851}
> >>> wake_batch=8
> >>> wake_index=0
> >>> ws_active=0
> >>> ws={
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>> }
> >>> round_robin=0
> >>> min_shallow_depth=4294967295
> >>>
> >>>
> >>> [root@arch ~]# cat
> >>> /sys/kernel/debug/block/sdj/hctx*/{sched_tags,tags,busy,dispatch}
> >>> nr_tags=256
> >>> nr_reserved_tags=0
> >>> active_queues=0
> >>>
> >>> bitmap_tags:
> >>> depth=256
> >>> busy=1
> >>> cleared=27
> >>> bits_per_word=64
> >>> map_nr=4
> >>> alloc_hint={132, 74, 129, 76}
> >>> wake_batch=8
> >>> wake_index=0
> >>> ws_active=0
> >>> ws={
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>> }
> >>> round_robin=0
> >>> min_shallow_depth=192
> >>> nr_tags=10104
> >>> nr_reserved_tags=0
> >>> active_queues=0
> >>>
> >>> bitmap_tags:
> >>> depth=10104
> >>> busy=0
> >>> cleared=235
> >>> bits_per_word=64
> >>> map_nr=158
> >>> alloc_hint={503, 2913, 9827, 9851}
> >>> wake_batch=8
> >>> wake_index=0
> >>> ws_active=0
> >>> ws={
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>>           {.wait=inactive},
> >>> }
> >>> round_robin=0
> >>> min_shallow_depth=4294967295
> >>>
> >>>
> >>> Thanks for your continued assistance with this!
> >>> Jason
> >>>
> >>>
> >>>> Thanks,
> >>>> Kuai
> >>>>
> >>>>> &mddev->recovery));
> >>>>> 6399                    if (atomic_read(&conf->reshape_stripes) != 0)
> >>>>> 6400                            return 0;
> >>>>>
> >>>>> Thanks
> >>>>>
> >>>>> On Mon, Sep 4, 2023 at 6:08 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> On 2023/09/05 0:38, Jason Moss wrote:
> >>>>>>> Hi Kuai,
> >>>>>>>
> >>>>>>> Thank you for the suggestion, I was previously on 5.15.0. I've built
> >>>>>>> an environment with 6.5.0.1 now and assembled the array there, but the
> >>>>>>> same problem happens. It reshaped for 20-30 seconds, then completely
> >>>>>>> stopped.
> >>>>>>>
> >>>>>>> Processes and /proc/<PID>/stack output:
> >>>>>>> root       24593  0.0  0.0      0     0 ?        I<   09:22   0:00 [raid5wq]
> >>>>>>> root       24594 96.5  0.0      0     0 ?        R    09:22   2:29 [md0_raid6]
> >>>>>>> root       24595  0.3  0.0      0     0 ?        D    09:22   0:00 [md0_reshape]
> >>>>>>>
> >>>>>>> [root@arch ~]# cat /proc/24593/stack
> >>>>>>> [<0>] rescuer_thread+0x2b0/0x3b0
> >>>>>>> [<0>] kthread+0xe8/0x120
> >>>>>>> [<0>] ret_from_fork+0x34/0x50
> >>>>>>> [<0>] ret_from_fork_asm+0x1b/0x30
> >>>>>>>
> >>>>>>> [root@arch ~]# cat /proc/24594/stack
> >>>>>>>
> >>>>>>> [root@arch ~]# cat /proc/24595/stack
> >>>>>>> [<0>] reshape_request+0x416/0x9f0 [raid456]
> >>>>>> Can you provide the addr2line result? Let's see where reshape_request()
> >>>>>> is stuck first.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Kuai
> >>>>>>
> >>>>>>> [<0>] raid5_sync_request+0x2fc/0x3d0 [raid456]
> >>>>>>> [<0>] md_do_sync+0x7d6/0x11d0 [md_mod]
> >>>>>>> [<0>] md_thread+0xae/0x190 [md_mod]
> >>>>>>> [<0>] kthread+0xe8/0x120
> >>>>>>> [<0>] ret_from_fork+0x34/0x50
> >>>>>>> [<0>] ret_from_fork_asm+0x1b/0x30
> >>>>>>>
> >>>>>>> Please let me know if there's a better way to provide the stack info.
> >>>>>>>
> >>>>>>> Thank you
> >>>>>>>
> >>>>>>> On Sun, Sep 3, 2023 at 6:41 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> On 2023/09/04 5:39, Jason Moss wrote:
> >>>>>>>>> Hello,
> >>>>>>>>>
> >>>>>>>>> I recently attempted to add a new drive to my 8-drive RAID 6 array,
> >>>>>>>>> growing it to 9 drives. I've done similar before with the same array,
> >>>>>>>>> having previously grown it from 6 drives to 7 and then from 7 to 8
> >>>>>>>>> with no issues. Drives are WD Reds, most older than 2019, some
> >>>>>>>>> (including the newest) newer, but all confirmed CMR and not SMR.
> >>>>>>>>>
> >>>>>>>>> Process used to expand the array:
> >>>>>>>>> mdadm --add /dev/md0 /dev/sdb1
> >>>>>>>>> mdadm --grow --raid-devices=9 --backup-file=/root/grow_md0.bak /dev/md0
> >>>>>>>>>
> >>>>>>>>> The reshape started off fine, the process was underway, and the volume
> >>>>>>>>> was still usable as expected. However, 15-30 minutes into the reshape,
> >>>>>>>>> I lost access to the contents of the drive. Checking /proc/mdstat, the
> >>>>>>>>> reshape was stopped at 0.6% with the counter not incrementing at all.
> >>>>>>>>> Any process accessing the array would just hang until killed. I waited
> >>>>>>>>
> >>>>>>>> What kernel version are you using? And it'll be very helpful if you can
> >>>>>>>> collect the stacks of all stuck threads. There is a known deadlock for
> >>>>>>>> raid5 related to reshape, and it's fixed in v6.5:
> >>>>>>>>
> >>>>>>>> https://lore.kernel.org/r/20230512015610.821290-6-yukuai1@huaweicloud.com
> >>>>>>>>
> >>>>>>>>> a half hour and there was still no further change to the counter. At
> >>>>>>>>> this point, I restarted the server and found that when it came back up
> >>>>>>>>> it would begin reshaping again, but only very briefly, under 30
> >>>>>>>>> seconds, though the counter would be increasing during that time.
> >>>>>>>>>
> >>>>>>>>> I searched furiously for ideas and tried stopping and reassembling the
> >>>>>>>>> array, assembling with an invalid-backup flag, echoing "frozen" then
> >>>>>>>>> "reshape" to the sync_action file, and echoing "max" to the sync_max
> >>>>>>>>> file. Nothing ever seemed to make a difference.
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> Don't do this before v6.5; echoing "reshape" while a reshape is still in
> >>>>>>>> progress will corrupt your data:
> >>>>>>>>
> >>>>>>>> https://lore.kernel.org/r/20230512015610.821290-3-yukuai1@huaweicloud.com
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Kuai
> >>>>>>>>
> >>>>>>>>> Here is where I slightly panicked, worried that I'd borked my array,
> >>>>>>>>> and powered off the server again and disconnected the new drive that
> >>>>>>>>> was just added, assuming that since it was the change, it may be the
> >>>>>>>>> problem despite having burn-in tested it, and figuring that I'll rush
> >>>>>>>>> order a new drive, so long as the reshape continues and I can just
> >>>>>>>>> rebuild onto a new drive once the reshape finishes. However, this made
> >>>>>>>>> no difference and the array continued to not rebuild.
> >>>>>>>>>
> >>>>>>>>> Much searching later, I'd found nothing substantially different than
> >>>>>>>>> I'd already tried, and one of the common threads in other people's
> >>>>>>>>> issues was bad drives, so I ran a self-test against each of the
> >>>>>>>>> existing drives and found one drive that failed the read test.
> >>>>>>>>> Thinking I had the culprit now, I dropped that drive out of the array
> >>>>>>>>> and assembled the array again, but the same behavior persists. The
> >>>>>>>>> array reshapes very briefly, then completely stops.
> >>>>>>>>>
> >>>>>>>>> Down to 0 drives of redundancy (in the reshaped section at least), not
> >>>>>>>>> finding any new ideas on any of the forums, mailing list, wiki, etc,
> >>>>>>>>> and very frustrated, I took a break, bought all new drives to build a
> >>>>>>>>> new array in another server and restored from a backup. However, there
> >>>>>>>>> is still some data not captured by the most recent backup that I would
> >>>>>>>>> like to recover, and I'd also like to solve the problem purely to
> >>>>>>>>> understand what happened and how to recover in the future.
> >>>>>>>>>
> >>>>>>>>> Is there anything else I should try to recover this array, or is this
> >>>>>>>>> a lost cause?
> >>>>>>>>>
> >>>>>>>>> Details requested by the wiki to follow and I'm happy to collect any
> >>>>>>>>> further data that would assist. /dev/sdb is the new drive that was
> >>>>>>>>> added, then disconnected. /dev/sdh is the drive that failed a
> >>>>>>>>> self-test and was removed from the array.
> >>>>>>>>>
> >>>>>>>>> Thank you in advance for any help provided!
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> $ uname -a
> >>>>>>>>> Linux Blyth 5.15.0-76-generic #83-Ubuntu SMP Thu Jun 15 19:16:32 UTC
> >>>>>>>>> 2023 x86_64 x86_64 x86_64 GNU/Linux
> >>>>>>>>>
> >>>>>>>>> $ mdadm --version
> >>>>>>>>> mdadm - v4.2 - 2021-12-30
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> $ sudo smartctl -H -i -l scterc /dev/sda
> >>>>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> >>>>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >>>>>>>>>
> >>>>>>>>> === START OF INFORMATION SECTION ===
> >>>>>>>>> Model Family:     Western Digital Red
> >>>>>>>>> Device Model:     WDC WD30EFRX-68EUZN0
> >>>>>>>>> Serial Number:    WD-WCC4N7AT7R7X
> >>>>>>>>> LU WWN Device Id: 5 0014ee 268545f93
> >>>>>>>>> Firmware Version: 82.00A82
> >>>>>>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> >>>>>>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
> >>>>>>>>> Rotation Rate:    5400 rpm
> >>>>>>>>> Device is:        In smartctl database [for details use: -P show]
> >>>>>>>>> ATA Version is:   ACS-2 (minor revision not indicated)
> >>>>>>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> >>>>>>>>> Local Time is:    Sun Sep  3 13:27:55 2023 PDT
> >>>>>>>>> SMART support is: Available - device has SMART capability.
> >>>>>>>>> SMART support is: Enabled
> >>>>>>>>>
> >>>>>>>>> === START OF READ SMART DATA SECTION ===
> >>>>>>>>> SMART overall-health self-assessment test result: PASSED
> >>>>>>>>>
> >>>>>>>>> SCT Error Recovery Control:
> >>>>>>>>>                 Read:     70 (7.0 seconds)
> >>>>>>>>>                Write:     70 (7.0 seconds)
> >>>>>>>>>
> >>>>>>>>> $ sudo smartctl -H -i -l scterc /dev/sda
> >>>>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> >>>>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >>>>>>>>>
> >>>>>>>>> === START OF INFORMATION SECTION ===
> >>>>>>>>> Model Family:     Western Digital Red
> >>>>>>>>> Device Model:     WDC WD30EFRX-68EUZN0
> >>>>>>>>> Serial Number:    WD-WCC4N7AT7R7X
> >>>>>>>>> LU WWN Device Id: 5 0014ee 268545f93
> >>>>>>>>> Firmware Version: 82.00A82
> >>>>>>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> >>>>>>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
> >>>>>>>>> Rotation Rate:    5400 rpm
> >>>>>>>>> Device is:        In smartctl database [for details use: -P show]
> >>>>>>>>> ATA Version is:   ACS-2 (minor revision not indicated)
> >>>>>>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> >>>>>>>>> Local Time is:    Sun Sep  3 13:28:16 2023 PDT
> >>>>>>>>> SMART support is: Available - device has SMART capability.
> >>>>>>>>> SMART support is: Enabled
> >>>>>>>>>
> >>>>>>>>> === START OF READ SMART DATA SECTION ===
> >>>>>>>>> SMART overall-health self-assessment test result: PASSED
> >>>>>>>>>
> >>>>>>>>> SCT Error Recovery Control:
> >>>>>>>>>                 Read:     70 (7.0 seconds)
> >>>>>>>>>                Write:     70 (7.0 seconds)
> >>>>>>>>>
> >>>>>>>>> $ sudo smartctl -H -i -l scterc /dev/sdb
> >>>>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> >>>>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >>>>>>>>>
> >>>>>>>>> === START OF INFORMATION SECTION ===
> >>>>>>>>> Model Family:     Western Digital Red
> >>>>>>>>> Device Model:     WDC WD30EFRX-68EUZN0
> >>>>>>>>> Serial Number:    WD-WXG1A8UGLS42
> >>>>>>>>> LU WWN Device Id: 5 0014ee 2b75ef53b
> >>>>>>>>> Firmware Version: 80.00A80
> >>>>>>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> >>>>>>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
> >>>>>>>>> Rotation Rate:    5400 rpm
> >>>>>>>>> Device is:        In smartctl database [for details use: -P show]
> >>>>>>>>> ATA Version is:   ACS-2 (minor revision not indicated)
> >>>>>>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> >>>>>>>>> Local Time is:    Sun Sep  3 13:28:19 2023 PDT
> >>>>>>>>> SMART support is: Available - device has SMART capability.
> >>>>>>>>> SMART support is: Enabled
> >>>>>>>>>
> >>>>>>>>> === START OF READ SMART DATA SECTION ===
> >>>>>>>>> SMART overall-health self-assessment test result: PASSED
> >>>>>>>>>
> >>>>>>>>> SCT Error Recovery Control:
> >>>>>>>>>                 Read:     70 (7.0 seconds)
> >>>>>>>>>                Write:     70 (7.0 seconds)
> >>>>>>>>>
> >>>>>>>>> $ sudo smartctl -H -i -l scterc /dev/sdc
> >>>>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> >>>>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >>>>>>>>>
> >>>>>>>>> === START OF INFORMATION SECTION ===
> >>>>>>>>> Model Family:     Western Digital Red
> >>>>>>>>> Device Model:     WDC WD30EFRX-68EUZN0
> >>>>>>>>> Serial Number:    WD-WCC4N4HYL32Y
> >>>>>>>>> LU WWN Device Id: 5 0014ee 2630752f8
> >>>>>>>>> Firmware Version: 82.00A82
> >>>>>>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> >>>>>>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
> >>>>>>>>> Rotation Rate:    5400 rpm
> >>>>>>>>> Device is:        In smartctl database [for details use: -P show]
> >>>>>>>>> ATA Version is:   ACS-2 (minor revision not indicated)
> >>>>>>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> >>>>>>>>> Local Time is:    Sun Sep  3 13:28:20 2023 PDT
> >>>>>>>>> SMART support is: Available - device has SMART capability.
> >>>>>>>>> SMART support is: Enabled
> >>>>>>>>>
> >>>>>>>>> === START OF READ SMART DATA SECTION ===
> >>>>>>>>> SMART overall-health self-assessment test result: PASSED
> >>>>>>>>>
> >>>>>>>>> SCT Error Recovery Control:
> >>>>>>>>>                 Read:     70 (7.0 seconds)
> >>>>>>>>>                Write:     70 (7.0 seconds)
> >>>>>>>>>
> >>>>>>>>> $ sudo smartctl -H -i -l scterc /dev/sdd
> >>>>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> >>>>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >>>>>>>>>
> >>>>>>>>> === START OF INFORMATION SECTION ===
> >>>>>>>>> Model Family:     Western Digital Red
> >>>>>>>>> Device Model:     WDC WD30EFRX-68N32N0
> >>>>>>>>> Serial Number:    WD-WCC7K1FF6DYK
> >>>>>>>>> LU WWN Device Id: 5 0014ee 2ba952a30
> >>>>>>>>> Firmware Version: 82.00A82
> >>>>>>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> >>>>>>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
> >>>>>>>>> Rotation Rate:    5400 rpm
> >>>>>>>>> Form Factor:      3.5 inches
> >>>>>>>>> Device is:        In smartctl database [for details use: -P show]
> >>>>>>>>> ATA Version is:   ACS-3 T13/2161-D revision 5
> >>>>>>>>> SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
> >>>>>>>>> Local Time is:    Sun Sep  3 13:28:21 2023 PDT
> >>>>>>>>> SMART support is: Available - device has SMART capability.
> >>>>>>>>> SMART support is: Enabled
> >>>>>>>>>
> >>>>>>>>> === START OF READ SMART DATA SECTION ===
> >>>>>>>>> SMART overall-health self-assessment test result: PASSED
> >>>>>>>>>
> >>>>>>>>> SCT Error Recovery Control:
> >>>>>>>>>                 Read:     70 (7.0 seconds)
> >>>>>>>>>                Write:     70 (7.0 seconds)
> >>>>>>>>>
> >>>>>>>>> $ sudo smartctl -H -i -l scterc /dev/sde
> >>>>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> >>>>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >>>>>>>>>
> >>>>>>>>> === START OF INFORMATION SECTION ===
> >>>>>>>>> Model Family:     Western Digital Red
> >>>>>>>>> Device Model:     WDC WD30EFRX-68EUZN0
> >>>>>>>>> Serial Number:    WD-WCC4N5ZHTRJF
> >>>>>>>>> LU WWN Device Id: 5 0014ee 2b88b83bb
> >>>>>>>>> Firmware Version: 82.00A82
> >>>>>>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> >>>>>>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
> >>>>>>>>> Rotation Rate:    5400 rpm
> >>>>>>>>> Device is:        In smartctl database [for details use: -P show]
> >>>>>>>>> ATA Version is:   ACS-2 (minor revision not indicated)
> >>>>>>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> >>>>>>>>> Local Time is:    Sun Sep  3 13:28:22 2023 PDT
> >>>>>>>>> SMART support is: Available - device has SMART capability.
> >>>>>>>>> SMART support is: Enabled
> >>>>>>>>>
> >>>>>>>>> === START OF READ SMART DATA SECTION ===
> >>>>>>>>> SMART overall-health self-assessment test result: PASSED
> >>>>>>>>>
> >>>>>>>>> SCT Error Recovery Control:
> >>>>>>>>>                 Read:     70 (7.0 seconds)
> >>>>>>>>>                Write:     70 (7.0 seconds)
> >>>>>>>>>
> >>>>>>>>> $ sudo smartctl -H -i -l scterc /dev/sdf
> >>>>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> >>>>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >>>>>>>>>
> >>>>>>>>> === START OF INFORMATION SECTION ===
> >>>>>>>>> Model Family:     Western Digital Red
> >>>>>>>>> Device Model:     WDC WD30EFRX-68AX9N0
> >>>>>>>>> Serial Number:    WD-WMC1T3804790
> >>>>>>>>> LU WWN Device Id: 5 0014ee 6036b6826
> >>>>>>>>> Firmware Version: 80.00A80
> >>>>>>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> >>>>>>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
> >>>>>>>>> Device is:        In smartctl database [for details use: -P show]
> >>>>>>>>> ATA Version is:   ACS-2 (minor revision not indicated)
> >>>>>>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> >>>>>>>>> Local Time is:    Sun Sep  3 13:28:23 2023 PDT
> >>>>>>>>> SMART support is: Available - device has SMART capability.
> >>>>>>>>> SMART support is: Enabled
> >>>>>>>>>
> >>>>>>>>> === START OF READ SMART DATA SECTION ===
> >>>>>>>>> SMART overall-health self-assessment test result: PASSED
> >>>>>>>>>
> >>>>>>>>> SCT Error Recovery Control:
> >>>>>>>>>                 Read:     70 (7.0 seconds)
> >>>>>>>>>                Write:     70 (7.0 seconds)
> >>>>>>>>>
> >>>>>>>>> $ sudo smartctl -H -i -l scterc /dev/sdg
> >>>>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> >>>>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >>>>>>>>>
> >>>>>>>>> === START OF INFORMATION SECTION ===
> >>>>>>>>> Model Family:     Western Digital Red
> >>>>>>>>> Device Model:     WDC WD30EFRX-68EUZN0
> >>>>>>>>> Serial Number:    WD-WMC4N0H692Z9
> >>>>>>>>> LU WWN Device Id: 5 0014ee 65af39740
> >>>>>>>>> Firmware Version: 82.00A82
> >>>>>>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> >>>>>>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
> >>>>>>>>> Rotation Rate:    5400 rpm
> >>>>>>>>> Device is:        In smartctl database [for details use: -P show]
> >>>>>>>>> ATA Version is:   ACS-2 (minor revision not indicated)
> >>>>>>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> >>>>>>>>> Local Time is:    Sun Sep  3 13:28:24 2023 PDT
> >>>>>>>>> SMART support is: Available - device has SMART capability.
> >>>>>>>>> SMART support is: Enabled
> >>>>>>>>>
> >>>>>>>>> === START OF READ SMART DATA SECTION ===
> >>>>>>>>> SMART overall-health self-assessment test result: PASSED
> >>>>>>>>>
> >>>>>>>>> SCT Error Recovery Control:
> >>>>>>>>>                 Read:     70 (7.0 seconds)
> >>>>>>>>>                Write:     70 (7.0 seconds)
> >>>>>>>>>
> >>>>>>>>> $ sudo smartctl -H -i -l scterc /dev/sdh
> >>>>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> >>>>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >>>>>>>>>
> >>>>>>>>> === START OF INFORMATION SECTION ===
> >>>>>>>>> Model Family:     Western Digital Red
> >>>>>>>>> Device Model:     WDC WD30EFRX-68EUZN0
> >>>>>>>>> Serial Number:    WD-WMC4N0K5S750
> >>>>>>>>> LU WWN Device Id: 5 0014ee 6b048d9ca
> >>>>>>>>> Firmware Version: 82.00A82
> >>>>>>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> >>>>>>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
> >>>>>>>>> Rotation Rate:    5400 rpm
> >>>>>>>>> Device is:        In smartctl database [for details use: -P show]
> >>>>>>>>> ATA Version is:   ACS-2 (minor revision not indicated)
> >>>>>>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> >>>>>>>>> Local Time is:    Sun Sep  3 13:28:24 2023 PDT
> >>>>>>>>> SMART support is: Available - device has SMART capability.
> >>>>>>>>> SMART support is: Enabled
> >>>>>>>>>
> >>>>>>>>> === START OF READ SMART DATA SECTION ===
> >>>>>>>>> SMART overall-health self-assessment test result: PASSED
> >>>>>>>>>
> >>>>>>>>> SCT Error Recovery Control:
> >>>>>>>>>                 Read:     70 (7.0 seconds)
> >>>>>>>>>                Write:     70 (7.0 seconds)
> >>>>>>>>>
> >>>>>>>>> $ sudo smartctl -H -i -l scterc /dev/sdi
> >>>>>>>>> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-76-generic] (local build)
> >>>>>>>>> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> >>>>>>>>>
> >>>>>>>>> === START OF INFORMATION SECTION ===
> >>>>>>>>> Model Family:     Western Digital Red
> >>>>>>>>> Device Model:     WDC WD30EFRX-68AX9N0
> >>>>>>>>> Serial Number:    WD-WMC1T1502475
> >>>>>>>>> LU WWN Device Id: 5 0014ee 058d2e5cb
> >>>>>>>>> Firmware Version: 80.00A80
> >>>>>>>>> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
> >>>>>>>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
> >>>>>>>>> Device is:        In smartctl database [for details use: -P show]
> >>>>>>>>> ATA Version is:   ACS-2 (minor revision not indicated)
> >>>>>>>>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> >>>>>>>>> Local Time is:    Sun Sep  3 13:28:27 2023 PDT
> >>>>>>>>> SMART support is: Available - device has SMART capability.
> >>>>>>>>> SMART support is: Enabled
> >>>>>>>>>
> >>>>>>>>> === START OF READ SMART DATA SECTION ===
> >>>>>>>>> SMART overall-health self-assessment test result: PASSED
> >>>>>>>>>
> >>>>>>>>> SCT Error Recovery Control:
> >>>>>>>>>                 Read:     70 (7.0 seconds)
> >>>>>>>>>                Write:     70 (7.0 seconds)
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> $ sudo mdadm --examine /dev/sda
> >>>>>>>>> /dev/sda:
> >>>>>>>>>         MBR Magic : aa55
> >>>>>>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
> >>>>>>>>> $ sudo mdadm --examine /dev/sda1
> >>>>>>>>> /dev/sda1:
> >>>>>>>>>                Magic : a92b4efc
> >>>>>>>>>              Version : 1.2
> >>>>>>>>>          Feature Map : 0xd
> >>>>>>>>>           Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>>>>>>>                 Name : Blyth:0  (local to host Blyth)
> >>>>>>>>>        Creation Time : Tue Aug  4 23:47:57 2015
> >>>>>>>>>           Raid Level : raid6
> >>>>>>>>>         Raid Devices : 9
> >>>>>>>>>
> >>>>>>>>>       Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
> >>>>>>>>>           Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>>>>>>>        Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>>>>>>>          Data Offset : 247808 sectors
> >>>>>>>>>         Super Offset : 8 sectors
> >>>>>>>>>         Unused Space : before=247728 sectors, after=14336 sectors
> >>>>>>>>>                State : clean
> >>>>>>>>>          Device UUID : 8ca60ad5:60d19333:11b24820:91453532
> >>>>>>>>>
> >>>>>>>>> Internal Bitmap : 8 sectors from superblock
> >>>>>>>>>        Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
> >>>>>>>>>        Delta Devices : 1 (8->9)
> >>>>>>>>>
> >>>>>>>>>          Update Time : Tue Jul 11 23:12:08 2023
> >>>>>>>>>        Bad Block Log : 512 entries available at offset 24 sectors - bad
> >>>>>>>>> blocks present.
> >>>>>>>>>             Checksum : b6d8f4d1 - correct
> >>>>>>>>>               Events : 181105
> >>>>>>>>>
> >>>>>>>>>               Layout : left-symmetric
> >>>>>>>>>           Chunk Size : 512K
> >>>>>>>>>
> >>>>>>>>>         Device Role : Active device 7
> >>>>>>>>>         Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >>>>>>>>>
> >>>>>>>>> $ sudo mdadm --examine /dev/sdb
> >>>>>>>>> /dev/sdb:
> >>>>>>>>>         MBR Magic : aa55
> >>>>>>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
> >>>>>>>>> $ sudo mdadm --examine /dev/sdb1
> >>>>>>>>> /dev/sdb1:
> >>>>>>>>>                Magic : a92b4efc
> >>>>>>>>>              Version : 1.2
> >>>>>>>>>          Feature Map : 0x5
> >>>>>>>>>           Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>>>>>>>                 Name : Blyth:0  (local to host Blyth)
> >>>>>>>>>        Creation Time : Tue Aug  4 23:47:57 2015
> >>>>>>>>>           Raid Level : raid6
> >>>>>>>>>         Raid Devices : 9
> >>>>>>>>>
> >>>>>>>>>       Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
> >>>>>>>>>           Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>>>>>>>        Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>>>>>>>          Data Offset : 247808 sectors
> >>>>>>>>>         Super Offset : 8 sectors
> >>>>>>>>>         Unused Space : before=247728 sectors, after=14336 sectors
> >>>>>>>>>                State : clean
> >>>>>>>>>          Device UUID : 386d3001:16447e43:4d2a5459:85618d11
> >>>>>>>>>
> >>>>>>>>> Internal Bitmap : 8 sectors from superblock
> >>>>>>>>>        Reshape pos'n : 124207104 (118.45 GiB 127.19 GB)
> >>>>>>>>>        Delta Devices : 1 (8->9)
> >>>>>>>>>
> >>>>>>>>>          Update Time : Tue Jul 11 00:02:59 2023
> >>>>>>>>>        Bad Block Log : 512 entries available at offset 24 sectors
> >>>>>>>>>             Checksum : b544a39 - correct
> >>>>>>>>>               Events : 181077
> >>>>>>>>>
> >>>>>>>>>               Layout : left-symmetric
> >>>>>>>>>           Chunk Size : 512K
> >>>>>>>>>
> >>>>>>>>>         Device Role : Active device 8
> >>>>>>>>>         Array State : AAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
> >>>>>>>>>
> >>>>>>>>> $ sudo mdadm --examine /dev/sdc
> >>>>>>>>> /dev/sdc:
> >>>>>>>>>         MBR Magic : aa55
> >>>>>>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
> >>>>>>>>> $ sudo mdadm --examine /dev/sdc1
> >>>>>>>>> /dev/sdc1:
> >>>>>>>>>                Magic : a92b4efc
> >>>>>>>>>              Version : 1.2
> >>>>>>>>>          Feature Map : 0xd
> >>>>>>>>>           Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>>>>>>>                 Name : Blyth:0  (local to host Blyth)
> >>>>>>>>>        Creation Time : Tue Aug  4 23:47:57 2015
> >>>>>>>>>           Raid Level : raid6
> >>>>>>>>>         Raid Devices : 9
> >>>>>>>>>
> >>>>>>>>>       Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
> >>>>>>>>>           Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>>>>>>>        Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>>>>>>>          Data Offset : 247808 sectors
> >>>>>>>>>         Super Offset : 8 sectors
> >>>>>>>>>         Unused Space : before=247720 sectors, after=14336 sectors
> >>>>>>>>>                State : clean
> >>>>>>>>>          Device UUID : 1798ec4f:72c56905:4e74ea61:2468db75
> >>>>>>>>>
> >>>>>>>>> Internal Bitmap : 8 sectors from superblock
> >>>>>>>>>        Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
> >>>>>>>>>        Delta Devices : 1 (8->9)
> >>>>>>>>>
> >>>>>>>>>          Update Time : Tue Jul 11 23:12:08 2023
> >>>>>>>>>        Bad Block Log : 512 entries available at offset 72 sectors - bad
> >>>>>>>>> blocks present.
> >>>>>>>>>             Checksum : 88d8b8fc - correct
> >>>>>>>>>               Events : 181105
> >>>>>>>>>
> >>>>>>>>>               Layout : left-symmetric
> >>>>>>>>>           Chunk Size : 512K
> >>>>>>>>>
> >>>>>>>>>         Device Role : Active device 4
> >>>>>>>>>         Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >>>>>>>>>
> >>>>>>>>> $ sudo mdadm --examine /dev/sdd
> >>>>>>>>> /dev/sdd:
> >>>>>>>>>         MBR Magic : aa55
> >>>>>>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
> >>>>>>>>> $ sudo mdadm --examine /dev/sdd1
> >>>>>>>>> /dev/sdd1:
> >>>>>>>>>                Magic : a92b4efc
> >>>>>>>>>              Version : 1.2
> >>>>>>>>>          Feature Map : 0x5
> >>>>>>>>>           Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>>>>>>>                 Name : Blyth:0  (local to host Blyth)
> >>>>>>>>>        Creation Time : Tue Aug  4 23:47:57 2015
> >>>>>>>>>           Raid Level : raid6
> >>>>>>>>>         Raid Devices : 9
> >>>>>>>>>
> >>>>>>>>>       Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
> >>>>>>>>>           Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>>>>>>>        Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>>>>>>>          Data Offset : 247808 sectors
> >>>>>>>>>         Super Offset : 8 sectors
> >>>>>>>>>         Unused Space : before=247728 sectors, after=14336 sectors
> >>>>>>>>>                State : clean
> >>>>>>>>>          Device UUID : a198095b:f54d26a9:deb3be8f:d6de9be1
> >>>>>>>>>
> >>>>>>>>> Internal Bitmap : 8 sectors from superblock
> >>>>>>>>>        Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
> >>>>>>>>>        Delta Devices : 1 (8->9)
> >>>>>>>>>
> >>>>>>>>>          Update Time : Tue Jul 11 23:12:08 2023
> >>>>>>>>>        Bad Block Log : 512 entries available at offset 24 sectors
> >>>>>>>>>             Checksum : d1471d9d - correct
> >>>>>>>>>               Events : 181105
> >>>>>>>>>
> >>>>>>>>>               Layout : left-symmetric
> >>>>>>>>>           Chunk Size : 512K
> >>>>>>>>>
> >>>>>>>>>         Device Role : Active device 6
> >>>>>>>>>         Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >>>>>>>>>
> >>>>>>>>> $ sudo mdadm --examine /dev/sde
> >>>>>>>>> /dev/sde:
> >>>>>>>>>         MBR Magic : aa55
> >>>>>>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
> >>>>>>>>> $ sudo mdadm --examine /dev/sde1
> >>>>>>>>> /dev/sde1:
> >>>>>>>>>                Magic : a92b4efc
> >>>>>>>>>              Version : 1.2
> >>>>>>>>>          Feature Map : 0x5
> >>>>>>>>>           Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>>>>>>>                 Name : Blyth:0  (local to host Blyth)
> >>>>>>>>>        Creation Time : Tue Aug  4 23:47:57 2015
> >>>>>>>>>           Raid Level : raid6
> >>>>>>>>>         Raid Devices : 9
> >>>>>>>>>
> >>>>>>>>>       Avail Dev Size : 5856376832 sectors (2.73 TiB 3.00 TB)
> >>>>>>>>>           Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>>>>>>>        Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>>>>>>>          Data Offset : 247808 sectors
> >>>>>>>>>         Super Offset : 8 sectors
> >>>>>>>>>         Unused Space : before=247720 sectors, after=14336 sectors
> >>>>>>>>>                State : clean
> >>>>>>>>>          Device UUID : acf7ba2e:35d2fa91:6b12b0ce:33a73af5
> >>>>>>>>>
> >>>>>>>>> Internal Bitmap : 8 sectors from superblock
> >>>>>>>>>        Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
> >>>>>>>>>        Delta Devices : 1 (8->9)
> >>>>>>>>>
> >>>>>>>>>          Update Time : Tue Jul 11 23:12:08 2023
> >>>>>>>>>        Bad Block Log : 512 entries available at offset 72 sectors
> >>>>>>>>>             Checksum : e05d0278 - correct
> >>>>>>>>>               Events : 181105
> >>>>>>>>>
> >>>>>>>>>               Layout : left-symmetric
> >>>>>>>>>           Chunk Size : 512K
> >>>>>>>>>
> >>>>>>>>>         Device Role : Active device 5
> >>>>>>>>>         Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >>>>>>>>>
> >>>>>>>>> $ sudo mdadm --examine /dev/sdf
> >>>>>>>>> /dev/sdf:
> >>>>>>>>>         MBR Magic : aa55
> >>>>>>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
> >>>>>>>>> $ sudo mdadm --examine /dev/sdf1
> >>>>>>>>> /dev/sdf1:
> >>>>>>>>>                Magic : a92b4efc
> >>>>>>>>>              Version : 1.2
> >>>>>>>>>          Feature Map : 0x5
> >>>>>>>>>           Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>>>>>>>                 Name : Blyth:0  (local to host Blyth)
> >>>>>>>>>        Creation Time : Tue Aug  4 23:47:57 2015
> >>>>>>>>>           Raid Level : raid6
> >>>>>>>>>         Raid Devices : 9
> >>>>>>>>>
> >>>>>>>>>       Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
> >>>>>>>>>           Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>>>>>>>        Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>>>>>>>          Data Offset : 247808 sectors
> >>>>>>>>>         Super Offset : 8 sectors
> >>>>>>>>>         Unused Space : before=247720 sectors, after=14336 sectors
> >>>>>>>>>                State : clean
> >>>>>>>>>          Device UUID : 31e7b86d:c274ff45:aa6dab50:2ff058c6
> >>>>>>>>>
> >>>>>>>>> Internal Bitmap : 8 sectors from superblock
> >>>>>>>>>        Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
> >>>>>>>>>        Delta Devices : 1 (8->9)
> >>>>>>>>>
> >>>>>>>>>          Update Time : Tue Jul 11 23:12:08 2023
> >>>>>>>>>        Bad Block Log : 512 entries available at offset 72 sectors
> >>>>>>>>>             Checksum : 26792cc0 - correct
> >>>>>>>>>               Events : 181105
> >>>>>>>>>
> >>>>>>>>>               Layout : left-symmetric
> >>>>>>>>>           Chunk Size : 512K
> >>>>>>>>>
> >>>>>>>>>         Device Role : Active device 0
> >>>>>>>>>         Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >>>>>>>>>
> >>>>>>>>> $ sudo mdadm --examine /dev/sdg
> >>>>>>>>> /dev/sdg:
> >>>>>>>>>         MBR Magic : aa55
> >>>>>>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
> >>>>>>>>> $ sudo mdadm --examine /dev/sdg1
> >>>>>>>>> /dev/sdg1:
> >>>>>>>>>                Magic : a92b4efc
> >>>>>>>>>              Version : 1.2
> >>>>>>>>>          Feature Map : 0x5
> >>>>>>>>>           Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>>>>>>>                 Name : Blyth:0  (local to host Blyth)
> >>>>>>>>>        Creation Time : Tue Aug  4 23:47:57 2015
> >>>>>>>>>           Raid Level : raid6
> >>>>>>>>>         Raid Devices : 9
> >>>>>>>>>
> >>>>>>>>>       Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
> >>>>>>>>>           Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>>>>>>>        Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>>>>>>>          Data Offset : 247808 sectors
> >>>>>>>>>         Super Offset : 8 sectors
> >>>>>>>>>         Unused Space : before=247720 sectors, after=14336 sectors
> >>>>>>>>>                State : clean
> >>>>>>>>>          Device UUID : 74476ce7:4edc23f6:08120711:ba281425
> >>>>>>>>>
> >>>>>>>>> Internal Bitmap : 8 sectors from superblock
> >>>>>>>>>        Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
> >>>>>>>>>        Delta Devices : 1 (8->9)
> >>>>>>>>>
> >>>>>>>>>          Update Time : Tue Jul 11 23:12:08 2023
> >>>>>>>>>        Bad Block Log : 512 entries available at offset 72 sectors
> >>>>>>>>>             Checksum : 6f67d179 - correct
> >>>>>>>>>               Events : 181105
> >>>>>>>>>
> >>>>>>>>>               Layout : left-symmetric
> >>>>>>>>>           Chunk Size : 512K
> >>>>>>>>>
> >>>>>>>>>         Device Role : Active device 1
> >>>>>>>>>         Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >>>>>>>>>
> >>>>>>>>> $ sudo mdadm --examine /dev/sdh
> >>>>>>>>> /dev/sdh:
> >>>>>>>>>         MBR Magic : aa55
> >>>>>>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
> >>>>>>>>> $ sudo mdadm --examine /dev/sdh1
> >>>>>>>>> /dev/sdh1:
> >>>>>>>>>                Magic : a92b4efc
> >>>>>>>>>              Version : 1.2
> >>>>>>>>>          Feature Map : 0xd
> >>>>>>>>>           Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>>>>>>>                 Name : Blyth:0  (local to host Blyth)
> >>>>>>>>>        Creation Time : Tue Aug  4 23:47:57 2015
> >>>>>>>>>           Raid Level : raid6
> >>>>>>>>>         Raid Devices : 9
> >>>>>>>>>
> >>>>>>>>>       Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
> >>>>>>>>>           Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>>>>>>>        Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>>>>>>>          Data Offset : 247808 sectors
> >>>>>>>>>         Super Offset : 8 sectors
> >>>>>>>>>         Unused Space : before=247720 sectors, after=14336 sectors
> >>>>>>>>>                State : clean
> >>>>>>>>>          Device UUID : 31c08263:b135f0f5:763bc86b:f81d7296
> >>>>>>>>>
> >>>>>>>>> Internal Bitmap : 8 sectors from superblock
> >>>>>>>>>        Reshape pos'n : 124207104 (118.45 GiB 127.19 GB)
> >>>>>>>>>        Delta Devices : 1 (8->9)
> >>>>>>>>>
> >>>>>>>>>          Update Time : Tue Jul 11 20:09:14 2023
> >>>>>>>>>        Bad Block Log : 512 entries available at offset 72 sectors - bad
> >>>>>>>>> blocks present.
> >>>>>>>>>             Checksum : b7696b68 - correct
> >>>>>>>>>               Events : 181089
> >>>>>>>>>
> >>>>>>>>>               Layout : left-symmetric
> >>>>>>>>>           Chunk Size : 512K
> >>>>>>>>>
> >>>>>>>>>         Device Role : Active device 2
> >>>>>>>>>         Array State : AAAAAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >>>>>>>>>
> >>>>>>>>> $ sudo mdadm --examine /dev/sdi
> >>>>>>>>> /dev/sdi:
> >>>>>>>>>         MBR Magic : aa55
> >>>>>>>>> Partition[0] :   4294967295 sectors at            1 (type ee)
> >>>>>>>>> $ sudo mdadm --examine /dev/sdi1
> >>>>>>>>> /dev/sdi1:
> >>>>>>>>>                Magic : a92b4efc
> >>>>>>>>>              Version : 1.2
> >>>>>>>>>          Feature Map : 0x5
> >>>>>>>>>           Array UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>>>>>>>                 Name : Blyth:0  (local to host Blyth)
> >>>>>>>>>        Creation Time : Tue Aug  4 23:47:57 2015
> >>>>>>>>>           Raid Level : raid6
> >>>>>>>>>         Raid Devices : 9
> >>>>>>>>>
> >>>>>>>>>       Avail Dev Size : 5856373760 sectors (2.73 TiB 3.00 TB)
> >>>>>>>>>           Array Size : 20497268736 KiB (19.09 TiB 20.99 TB)
> >>>>>>>>>        Used Dev Size : 5856362496 sectors (2.73 TiB 3.00 TB)
> >>>>>>>>>          Data Offset : 247808 sectors
> >>>>>>>>>         Super Offset : 8 sectors
> >>>>>>>>>         Unused Space : before=247720 sectors, after=14336 sectors
> >>>>>>>>>                State : clean
> >>>>>>>>>          Device UUID : ac1063fc:d9d66e6d:f3de33da:b396f483
> >>>>>>>>>
> >>>>>>>>> Internal Bitmap : 8 sectors from superblock
> >>>>>>>>>        Reshape pos'n : 124311040 (118.55 GiB 127.29 GB)
> >>>>>>>>>        Delta Devices : 1 (8->9)
> >>>>>>>>>
> >>>>>>>>>          Update Time : Tue Jul 11 23:12:08 2023
> >>>>>>>>>        Bad Block Log : 512 entries available at offset 72 sectors
> >>>>>>>>>             Checksum : 23b6d024 - correct
> >>>>>>>>>               Events : 181105
> >>>>>>>>>
> >>>>>>>>>               Layout : left-symmetric
> >>>>>>>>>           Chunk Size : 512K
> >>>>>>>>>
> >>>>>>>>>         Device Role : Active device 3
> >>>>>>>>>         Array State : AA.AAAAA. ('A' == active, '.' == missing, 'R' == replacing)
> >>>>>>>>>
> >>>>>>>>> $ sudo mdadm --detail /dev/md0
> >>>>>>>>> /dev/md0:
> >>>>>>>>>                 Version : 1.2
> >>>>>>>>>              Raid Level : raid6
> >>>>>>>>>           Total Devices : 9
> >>>>>>>>>             Persistence : Superblock is persistent
> >>>>>>>>>
> >>>>>>>>>                   State : inactive
> >>>>>>>>>         Working Devices : 9
> >>>>>>>>>
> >>>>>>>>>           Delta Devices : 1, (-1->0)
> >>>>>>>>>               New Level : raid6
> >>>>>>>>>              New Layout : left-symmetric
> >>>>>>>>>           New Chunksize : 512K
> >>>>>>>>>
> >>>>>>>>>                    Name : Blyth:0  (local to host Blyth)
> >>>>>>>>>                    UUID : 440dc11e:079308b1:131eda79:9a74c670
> >>>>>>>>>                  Events : 181105
> >>>>>>>>>
> >>>>>>>>>          Number   Major   Minor   RaidDevice
> >>>>>>>>>
> >>>>>>>>>             -       8        1        -        /dev/sda1
> >>>>>>>>>             -       8      129        -        /dev/sdi1
> >>>>>>>>>             -       8      113        -        /dev/sdh1
> >>>>>>>>>             -       8       97        -        /dev/sdg1
> >>>>>>>>>             -       8       81        -        /dev/sdf1
> >>>>>>>>>             -       8       65        -        /dev/sde1
> >>>>>>>>>             -       8       49        -        /dev/sdd1
> >>>>>>>>>             -       8       33        -        /dev/sdc1
> >>>>>>>>>             -       8       17        -        /dev/sdb1
> >>>>>>>>>
> >>>>>>>>> $ cat /proc/mdstat
> >>>>>>>>> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
> >>>>>>>>> [raid4] [raid10]
> >>>>>>>>> md0 : inactive sdb1[9](S) sdi1[4](S) sdf1[0](S) sdg1[1](S) sdh1[3](S)
> >>>>>>>>> sda1[8](S) sdd1[7](S) sdc1[6](S) sde1[5](S)
> >>>>>>>>>            26353689600 blocks super 1.2
> >>>>>>>>>
> >>>>>>>>> unused devices: <none>
> >>>>>>>>>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Reshape Failure
  2023-09-10  4:58                   ` Jason Moss
@ 2023-09-10  6:10                     ` Yu Kuai
  0 siblings, 0 replies; 21+ messages in thread
From: Yu Kuai @ 2023-09-10  6:10 UTC (permalink / raw)
  To: Jason Moss, Yu Kuai
  Cc: linux-raid, yangerkun@huawei.com, linux-block, Jens Axboe,
	yukuai (C)

Hi,

[cc linux-block]
在 2023/09/10 12:58, Jason Moss 写道:
> Hi,
> 
> On Sat, Sep 9, 2023 at 7:45 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> Hi,
>>
>> 在 2023/09/07 14:19, Jason Moss 写道:
>>> Hi,
>>>
>>> On Wed, Sep 6, 2023 at 11:13 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> 在 2023/09/07 13:44, Jason Moss 写道:
>>>>> Hi,
>>>>>
>>>>> On Wed, Sep 6, 2023 at 6:38 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> 在 2023/09/06 22:05, Jason Moss 写道:
>>>>>>> Hi Kuai,
>>>>>>>
>>>>>>> I ended up using gdb rather than addr2line, as that output didn't give
>>>>>>> me the global offset. Maybe there's a better way, but this seems to be
>>>>>>> similar to what I expected.
>>>>>>
>>>>>> It's ok.
>>>>>>>
>>>>>>> (gdb) list *(reshape_request+0x416)
>>>>>>> 0x11566 is in reshape_request (drivers/md/raid5.c:6396).
>>>>>>> 6391            if ((mddev->reshape_backwards
>>>>>>> 6392                 ? (safepos > writepos && readpos < writepos)
>>>>>>> 6393                 : (safepos < writepos && readpos > writepos)) ||
>>>>>>> 6394                time_after(jiffies, conf->reshape_checkpoint + 10*HZ)) {
>>>>>>> 6395                    /* Cannot proceed until we've updated the
>>>>>>> superblock... */
>>>>>>> 6396                    wait_event(conf->wait_for_overlap,
>>>>>>> 6397                               atomic_read(&conf->reshape_stripes)==0
>>>>>>> 6398                               || test_bit(MD_RECOVERY_INTR,
>>>>>>
> >>>>>> If reshape is stuck here, it means one of two things:
> >>>>>>
> >>>>>> 1) Either the reshape IO is stuck somewhere and never completes;
> >>>>>> 2) Or the counter reshape_stripes is broken.
>>>>>>
> >>>>>> Can you read the following debugfs files to verify whether IO is stuck in
> >>>>>> the underlying disk?
>>>>>>
>>>>>> /sys/kernel/debug/block/[disk]/hctx*/{sched_tags,tags,busy,dispatch}
>>>>>>
>>>>>
>>>>> I'll attach this below.
>>>>>
> >>>>>> Furthermore, echo frozen should break the above wait_event() because
> >>>>>> 'MD_RECOVERY_INTR' will be set; however, based on your description,
> >>>>>> the problem still exists. Can you collect the stack and addr2line result
> >>>>>> of the stuck thread after echo frozen?
>>>>>>
>>>>>
> >>>>> I echoed frozen to /sys/block/md0/md/sync_action; however, the echo
>>>>> call has been sitting for about 30 minutes, maybe longer, and has not
>>>>> returned. Here's the current state:
>>>>>
>>>>> root         454  0.0  0.0      0     0 ?        I<   Sep05   0:00 [raid5wq]
>>>>> root         455  0.0  0.0  34680  5988 ?        D    Sep05   0:00 (udev-worker)
>>>>
>>>> Can you also show the stack of udev-worker? And any other thread with
>>>> 'D' state, I think above "echo frozen" is probably also stuck in D
>>>> state.
>>>>
>>>
>>> As requested:
>>>
>>> ps aux | grep D
>>> USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
>>> root         455  0.0  0.0  34680  5988 ?        D    Sep05   0:00 (udev-worker)
>>> root         457  0.0  0.0      0     0 ?        D    Sep05   0:00 [md0_reshape]
>>> root       45507  0.0  0.0   8272  4736 pts/1    Ds+  Sep05   0:00 -bash
>>> jason     279169  0.0  0.0   6976  2560 pts/0    S+   23:16   0:00
>>> grep --color=auto D
>>>
>>> [jason@arch md]$ sudo cat /proc/455/stack
>>> [<0>] wait_woken+0x54/0x60
>>> [<0>] raid5_make_request+0x5fe/0x12f0 [raid456]
>>> [<0>] md_handle_request+0x135/0x220 [md_mod]
>>> [<0>] __submit_bio+0xb3/0x170
>>> [<0>] submit_bio_noacct_nocheck+0x159/0x370
>>> [<0>] block_read_full_folio+0x21c/0x340
>>> [<0>] filemap_read_folio+0x40/0xd0
>>> [<0>] filemap_get_pages+0x475/0x630
>>> [<0>] filemap_read+0xd9/0x350
>>> [<0>] blkdev_read_iter+0x6b/0x1b0
>>> [<0>] vfs_read+0x201/0x350
>>> [<0>] ksys_read+0x6f/0xf0
>>> [<0>] do_syscall_64+0x60/0x90
>>> [<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
>>>
>>>
>>> [jason@arch md]$ sudo cat /proc/45507/stack
>>> [<0>] kthread_stop+0x6a/0x180
>>> [<0>] md_unregister_thread+0x29/0x60 [md_mod]
>>> [<0>] action_store+0x168/0x320 [md_mod]
>>> [<0>] md_attr_store+0x86/0xf0 [md_mod]
>>> [<0>] kernfs_fop_write_iter+0x136/0x1d0
>>> [<0>] vfs_write+0x23e/0x420
>>> [<0>] ksys_write+0x6f/0xf0
>>> [<0>] do_syscall_64+0x60/0x90
>>> [<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
>>>
>>> Please let me know if you'd like me to identify the lines for any of those.
>>>
>>
>> That's enough.
>>> Thanks,
>>> Jason
>>>
>>>
>>>>> root         456 99.9  0.0      0     0 ?        R    Sep05 1543:40 [md0_raid6]
>>>>> root         457  0.0  0.0      0     0 ?        D    Sep05   0:00 [md0_reshape]
>>>>>
>>>>> [jason@arch md]$ sudo cat /proc/457/stack
>>>>> [<0>] md_do_sync+0xef2/0x11d0 [md_mod]
>>>>> [<0>] md_thread+0xae/0x190 [md_mod]
>>>>> [<0>] kthread+0xe8/0x120
>>>>> [<0>] ret_from_fork+0x34/0x50
>>>>> [<0>] ret_from_fork_asm+0x1b/0x30
>>>>>
>>>>> Reading symbols from md-mod.ko...
>>>>> (gdb) list *(md_do_sync+0xef2)
>>>>> 0xb3a2 is in md_do_sync (drivers/md/md.c:9035).
>>>>> 9030                    ? "interrupted" : "done");
>>>>> 9031            /*
>>>>> 9032             * this also signals 'finished resyncing' to md_stop
>>>>> 9033             */
>>>>> 9034            blk_finish_plug(&plug);
>>>>> 9035            wait_event(mddev->recovery_wait,
>>>>> !atomic_read(&mddev->recovery_active));
>>>>
> >>>> That's also waiting for reshape IO to be done from the common layer.
>>>>
>>>>> 9036
>>>>> 9037            if (!test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
>>>>> 9038                !test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
>>>>> 9039                mddev->curr_resync >= MD_RESYNC_ACTIVE) {
>>>>>
>>>>>
>>>>> The debugfs info:
>>>>>
>>>>> [root@arch ~]# cat
>>>>> /sys/kernel/debug/block/sda/hctx*/{sched_tags,tags,busy,dispatch}
>>>>
> >>>> Only sched_tags was read; sorry, I didn't mean to use this exact cmd.
>>>>
>>>> Perhaps you can using following cmd:
>>>>
>>>> find /sys/kernel/debug/block/sda/ -type f | xargs grep .
>>>>
>>>>> nr_tags=64
>>>>> nr_reserved_tags=0
>>>>> active_queues=0
>>>>>
>>>>> bitmap_tags:
>>>>> depth=64
>>>>> busy=1
>>>>
> >>>> This means there is one IO in sda; however, I need more information to
> >>>> determine where this IO is. Please also make sure not to run any other
> >>>> thread that can read/write from sda. You can use "iostat -dmx 1" and
> >>>> observe for a while to confirm that there is no new IO.
>>
> >> And can you help with this? Confirm there is no new IO and collect the debugfs output.
> 
> As instructed, I confirmed there is no active IO to sda1 via iostat. I
> then ran the provided command
> 
> [root@arch ~]# find /sys/kernel/debug/block/sda/ -type f | xargs grep .
> /sys/kernel/debug/block/sda/rqos/wbt/wb_background:6
> /sys/kernel/debug/block/sda/rqos/wbt/wb_normal:12
> /sys/kernel/debug/block/sda/rqos/wbt/unknown_cnt:4
> /sys/kernel/debug/block/sda/rqos/wbt/min_lat_nsec:75000000
> /sys/kernel/debug/block/sda/rqos/wbt/inflight:0: inflight 1
> /sys/kernel/debug/block/sda/rqos/wbt/inflight:1: inflight 0
> /sys/kernel/debug/block/sda/rqos/wbt/inflight:2: inflight 0
> /sys/kernel/debug/block/sda/rqos/wbt/id:0
> /sys/kernel/debug/block/sda/rqos/wbt/enabled:1
> /sys/kernel/debug/block/sda/rqos/wbt/curr_win_nsec:100000000
> /sys/kernel/debug/block/sda/hctx0/type:default
> /sys/kernel/debug/block/sda/hctx0/dispatch_busy:0
> /sys/kernel/debug/block/sda/hctx0/active:0
> /sys/kernel/debug/block/sda/hctx0/run:2583
> /sys/kernel/debug/block/sda/hctx0/sched_tags_bitmap:00000000: 0000
> 0000 8000 0000
> /sys/kernel/debug/block/sda/hctx0/sched_tags:nr_tags=64
> /sys/kernel/debug/block/sda/hctx0/sched_tags:nr_reserved_tags=0
> /sys/kernel/debug/block/sda/hctx0/sched_tags:active_queues=0
> /sys/kernel/debug/block/sda/hctx0/sched_tags:bitmap_tags:
> /sys/kernel/debug/block/sda/hctx0/sched_tags:depth=64
> /sys/kernel/debug/block/sda/hctx0/sched_tags:busy=1
sched_tags:busy is 1, which indicates this IO made it to the elevator. That means
this problem is not related to raid: an IO issued to sda never returned.

> /sys/kernel/debug/block/sda/hctx0/sched_tags:cleared=57
> /sys/kernel/debug/block/sda/hctx0/sched_tags:bits_per_word=16
> /sys/kernel/debug/block/sda/hctx0/sched_tags:map_nr=4
> /sys/kernel/debug/block/sda/hctx0/sched_tags:alloc_hint={40, 20, 48, 0}
> /sys/kernel/debug/block/sda/hctx0/sched_tags:wake_batch=8
> /sys/kernel/debug/block/sda/hctx0/sched_tags:wake_index=0
> /sys/kernel/debug/block/sda/hctx0/sched_tags:ws_active=0
> /sys/kernel/debug/block/sda/hctx0/sched_tags:ws={
> /sys/kernel/debug/block/sda/hctx0/sched_tags:   {.wait=inactive},
> /sys/kernel/debug/block/sda/hctx0/sched_tags:   {.wait=inactive},
> /sys/kernel/debug/block/sda/hctx0/sched_tags:   {.wait=inactive},
> /sys/kernel/debug/block/sda/hctx0/sched_tags:   {.wait=inactive},
> /sys/kernel/debug/block/sda/hctx0/sched_tags:   {.wait=inactive},
> /sys/kernel/debug/block/sda/hctx0/sched_tags:   {.wait=inactive},
> /sys/kernel/debug/block/sda/hctx0/sched_tags:   {.wait=inactive},
> /sys/kernel/debug/block/sda/hctx0/sched_tags:   {.wait=inactive},
> /sys/kernel/debug/block/sda/hctx0/sched_tags:}
> /sys/kernel/debug/block/sda/hctx0/sched_tags:round_robin=1
> /sys/kernel/debug/block/sda/hctx0/sched_tags:min_shallow_depth=48
> /sys/kernel/debug/block/sda/hctx0/tags_bitmap:00000000: 0000 0000
> /sys/kernel/debug/block/sda/hctx0/tags:nr_tags=32
> /sys/kernel/debug/block/sda/hctx0/tags:nr_reserved_tags=0
> /sys/kernel/debug/block/sda/hctx0/tags:active_queues=0
> /sys/kernel/debug/block/sda/hctx0/tags:bitmap_tags:
> /sys/kernel/debug/block/sda/hctx0/tags:depth=32
> /sys/kernel/debug/block/sda/hctx0/tags:busy=0
tags:busy is 0, which indicates this IO didn't make it to the driver. So the IO
is still in the block layer, likely still in the elevator.
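
As a quick sketch (run as root, with debugfs mounted), the two counters being
compared here can be pulled out side by side:

  grep -H busy= /sys/kernel/debug/block/sda/hctx*/sched_tags \
                /sys/kernel/debug/block/sda/hctx*/tags

busy=1 in sched_tags together with busy=0 in tags is the signature of a
request sitting in the I/O scheduler rather than in flight to the disk.
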

Which elevator are you using? You can confirm with:

cat /sys/block/sda/queue/scheduler

It's likely mq-deadline. Anyway, can you switch to another elevator before
assembling the array and retry, to test whether you can still reproduce the
problem?
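
A minimal sketch of that switch; the device name and the target scheduler are
only examples, the schedulers actually available depend on the kernel build,
and it would need to be repeated for every member disk before assembling:

  cat /sys/block/sda/queue/scheduler           # current scheduler is shown in [brackets]
  echo none > /sys/block/sda/queue/scheduler   # as root, switch to the "none" scheduler
  cat /sys/block/sda/queue/scheduler           # verify the change took effect
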

Thanks,
Kuai

> /sys/kernel/debug/block/sda/hctx0/tags:cleared=21
> /sys/kernel/debug/block/sda/hctx0/tags:bits_per_word=8
> /sys/kernel/debug/block/sda/hctx0/tags:map_nr=4
> /sys/kernel/debug/block/sda/hctx0/tags:alloc_hint={19, 26, 7, 21}
> /sys/kernel/debug/block/sda/hctx0/tags:wake_batch=4
> /sys/kernel/debug/block/sda/hctx0/tags:wake_index=0
> /sys/kernel/debug/block/sda/hctx0/tags:ws_active=0
> /sys/kernel/debug/block/sda/hctx0/tags:ws={
> /sys/kernel/debug/block/sda/hctx0/tags: {.wait=inactive},
> /sys/kernel/debug/block/sda/hctx0/tags: {.wait=inactive},
> /sys/kernel/debug/block/sda/hctx0/tags: {.wait=inactive},
> /sys/kernel/debug/block/sda/hctx0/tags: {.wait=inactive},
> /sys/kernel/debug/block/sda/hctx0/tags: {.wait=inactive},
> /sys/kernel/debug/block/sda/hctx0/tags: {.wait=inactive},
> /sys/kernel/debug/block/sda/hctx0/tags: {.wait=inactive},
> /sys/kernel/debug/block/sda/hctx0/tags: {.wait=inactive},
> /sys/kernel/debug/block/sda/hctx0/tags:}
> /sys/kernel/debug/block/sda/hctx0/tags:round_robin=1
> /sys/kernel/debug/block/sda/hctx0/tags:min_shallow_depth=4294967295
> /sys/kernel/debug/block/sda/hctx0/ctx_map:00000000: 00
> /sys/kernel/debug/block/sda/hctx0/flags:alloc_policy=RR SHOULD_MERGE
> /sys/kernel/debug/block/sda/sched/queued:0 0 0
> /sys/kernel/debug/block/sda/sched/owned_by_driver:0 0 0
> /sys/kernel/debug/block/sda/sched/async_depth:48
> /sys/kernel/debug/block/sda/sched/starved:0
> /sys/kernel/debug/block/sda/sched/batching:2
> /sys/kernel/debug/block/sda/state:SAME_COMP|IO_STAT|ADD_RANDOM|INIT_DONE|WC|STATS|REGISTERED|NOWAIT|SQ_SCHED
> /sys/kernel/debug/block/sda/pm_only:0
> 
> Let me know if there's anything further I can provide to assist in
> troubleshooting.


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2023-09-10  6:11 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-02-16 15:46 reshape failure Tobias McNulty
2011-02-16 20:32 ` NeilBrown
2011-02-16 20:41   ` Tobias McNulty
2011-02-16 21:06     ` NeilBrown
2011-02-17 21:39       ` Tobias McNulty
2011-05-11 18:06         ` Tobias McNulty
2011-05-11 21:12           ` NeilBrown
2011-05-11 21:19             ` Tobias McNulty
     [not found]             ` <BANLkTi=3-PgTqeGqyu5fPZMporA1vk6-Tw@mail.gmail.com>
2011-05-11 21:34               ` NeilBrown
2011-05-12  0:46                 ` Tobias McNulty
  -- strict thread matches above, loose matches on Subject: below --
2023-09-03 21:39 Reshape Failure Jason Moss
2023-09-04  1:41 ` Yu Kuai
2023-09-04 16:38   ` Jason Moss
2023-09-05  1:07     ` Yu Kuai
2023-09-06 14:05       ` Jason Moss
2023-09-07  1:38         ` Yu Kuai
2023-09-07  5:44           ` Jason Moss
     [not found]             ` <79aa3cf3-78d4-cfc6-8d3b-eb8704ffaba1@huaweicloud.com>
2023-09-07  6:19               ` Jason Moss
2023-09-10  2:45                 ` Yu Kuai
2023-09-10  4:58                   ` Jason Moss
2023-09-10  6:10                     ` Yu Kuai

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).