* extreme RAID10 rebuild times reported, but rebuild's progressing ?
@ 2025-03-30 15:56 pgnd
  2025-03-31 16:11 ` Xiao Ni
  0 siblings, 1 reply; 7+ messages in thread
From: pgnd @ 2025-03-30 15:56 UTC (permalink / raw)
  To: linux-raid

on

	distro
		Name: Fedora Linux 41 (Forty One)
		Version: 41
		Codename:

	mdadm -V
		mdadm - v4.3 - 2024-02-15

	rpm -qa | grep mdadm
		mdadm-4.3-4.fc41.x86_64

i have a relatively-new (~1 month) 4x4TB RAID10 array.

after a reboot, one of the drives got kicked

	dmesg
		...
		[   15.513443] sd 15:0:7:0: [sdn] Attached SCSI disk
		[   15.784537] md: kicking non-fresh sdn1 from array!
		...

	cat /proc/mdstat
		md124 : active raid10 sdm1[1] sdl1[0] sdk1[4]
		      7813770240 blocks super 1.2 512K chunks 2 near-copies [4/3] [UU_U]
		      bitmap: 1/59 pages [4KB], 65536KB chunk

smartctl shows no issues; can't yet find a reason for the kick.
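
(fwiw, my understanding is that "non-fresh" just means that member's superblock
event count fell behind the rest of the array before the reboot. something
along these lines should at least show how far behind it was; purely
illustrative, device names as per above:)

	mdadm --examine /dev/sd{k,l,m,n}1 | grep -E '/dev/|Update Time|Events'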

re-adding the drive, rebuild starts.
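
(the re-add itself was nothing exotic; along these lines, though the exact
invocation may have differed. with the write-intent bitmap present, md should
only need to resync regions dirtied while the disk was out:)

	mdadm /dev/md124 --re-add /dev/sdn1
	watch -n 30 cat /proc/mdstat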

it's progressing with recovery; after ~ 30mins, I see

	md124 : active raid10 sdm1[1] sdn1[2] sdl1[0] sdk1[4]
	      7813770240 blocks super 1.2 512K chunks 2 near-copies [4/3] [UU_U]
	      [=========>...........]  recovery = 49.2% (1924016576/3906885120) finish=3918230862.4min speed=0K/sec
	      bitmap: 1/59 pages [4KB], 65536KB chunk

the values of

	finish=3918230862.4min speed=0K/sec

appear nonsensical.
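
(though, doing the arithmetic on that line, the two numbers do at least agree
with each other, assuming finish= is simply the remaining blocks divided by
the recently sampled rate:)

	echo $(( 3906885120 - 1924016576 ))    # KB remaining => 1982868544
	# 1982868544 KB / 3918230862.4 min  ~=  0.5 KB/min, which indeed rounds to 0K/sec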

is this a bug in my mdadm config, a glitch in the progress reporting, or an actual problem with the rebuild's _function_?


* Re: extreme RAID10 rebuild times reported, but rebuild's progressing ?
  2025-03-30 15:56 extreme RAID10 rebuild times reported, but rebuild's progressing ? pgnd
@ 2025-03-31 16:11 ` Xiao Ni
  2025-03-31 16:36   ` pgnd
  0 siblings, 1 reply; 7+ messages in thread
From: Xiao Ni @ 2025-03-31 16:11 UTC (permalink / raw)
  To: pgnd; +Cc: linux-raid

On Sun, Mar 30, 2025 at 11:56 PM pgnd <pgnd@dev-mail.net> wrote:
>
> on
>
>         distro
>                 Name: Fedora Linux 41 (Forty One)
>                 Version: 41
>                 Codename:
>
>         mdadm -V
>                 mdadm - v4.3 - 2024-02-15
>
>         rpm -qa | grep mdadm
>                 mdadm-4.3-4.fc41.x86_64
>
> i have a relatively-new (~1 month) 4x4TB RAID10 array.
>
> after a reboot, one of the drives got kicked
>
>         dmesg
>                 ...
>                 [   15.513443] sd 15:0:7:0: [sdn] Attached SCSI disk
>                 [   15.784537] md: kicking non-fresh sdn1 from array!
>                 ...
>
>         cat /proc/mdstat
>                 md124 : active raid10 sdm1[1] sdl1[0] sdk1[4]
>                       7813770240 blocks super 1.2 512K chunks 2 near-copies [4/3] [UU_U]
>                       bitmap: 1/59 pages [4KB], 65536KB chunk
>
> smartctl shows no issues; can't yet find a reason for the kick.
>
> re-adding the drive, rebuild starts.
>
> it's progressing with recovery; after ~ 30mins, I see
>
>         md124 : active raid10 sdm1[1] sdn1[2] sdl1[0] sdk1[4]
>               7813770240 blocks super 1.2 512K chunks 2 near-copies [4/3] [UU_U]
>               [=========>...........]  recovery = 49.2% (1924016576/3906885120) finish=3918230862.4min speed=0K/sec
>               bitmap: 1/59 pages [4KB], 65536KB chunk
>
> the values of
>
>         finish=3918230862.4min speed=0K/sec
>
> appear nonsensical.
>
> is this a bug in my mdadm config, a glitch in the progress reporting, or an actual problem with the rebuild's _function_?
>
>

Hi

Are there any processes stuck in D state? And how about `ps auxf | grep md`?
Is there a filesystem on it? If so, can you still read/write data from
it?
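
(For the D-state check, something like this generic one-liner is enough,
nothing md-specific:)

	ps -eo pid,stat,wchan:30,cmd | awk '$2 ~ /^D/'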

Regards
Xiao


* Re: extreme RAID10 rebuild times reported, but rebuild's progressing ?
  2025-03-31 16:11 ` Xiao Ni
@ 2025-03-31 16:36   ` pgnd
  2025-04-01  2:29     ` Xiao Ni
  0 siblings, 1 reply; 7+ messages in thread
From: pgnd @ 2025-03-31 16:36 UTC (permalink / raw)
  To: xni; +Cc: linux-raid

hi.

> Are there any processes stuck in D state?

no, there were none.

the rebuild 'completed' in ~ 1hr 15mins ...
atm, the array's up, passing all tests, and seemingly fully functional
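
(for a more conclusive test, my understanding is a full consistency scrub can
be kicked off with something like:)

	echo check > /sys/block/md124/md/sync_action
	cat /sys/block/md124/md/mismatch_cnt    # once the check finishes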

> And how about `ps auxf | grep md`?

ps auxf | grep md
	root          97  0.0  0.0      0     0 ?        SN   09:10   0:00  \_ [ksmd]
	root         107  0.0  0.0      0     0 ?        I<   09:10   0:00  \_ [kworker/R-md]
	root         108  0.0  0.0      0     0 ?        I<   09:10   0:00  \_ [kworker/R-md_bitmap]
	root        1049  0.0  0.0      0     0 ?        S    09:10   0:00  \_ [md124_raid10]
	root        1052  0.0  0.0      0     0 ?        S    09:10   0:00  \_ [md123_raid10]
	root        1677  0.0  0.0      0     0 ?        S    09:10   0:00  \_ [jbd2/md126-8]
	root           1  0.0  0.0  24820 15536 ?        Ss   09:10   0:03 /usr/lib/systemd/systemd --switched-root --system --deserialize=49 domdadm dolvm showopts noquiet
	root        1308  0.0  0.0  32924  8340 ?        Ss   09:10   0:00 /usr/lib/systemd/systemd-journald
	root        1368  0.0  0.0  36620 11596 ?        Ss   09:10   0:00 /usr/lib/systemd/systemd-udevd
	systemd+    1400  0.0  0.0  17564  9160 ?        Ss   09:10   0:00 /usr/lib/systemd/systemd-networkd
	systemd+    2010  0.0  0.0  15932  7112 ?        Ss   09:11   0:02 /usr/lib/systemd/systemd-oomd
	root        2029  0.0  0.0   4176  2128 ?        Ss   09:11   0:00 /sbin/mdadm --monitor --scan --syslog -f --pid-file=/run/mdadm/mdadm.pid
	root        2055  0.0  0.0  16648  8012 ?        Ss   09:11   0:00 /usr/lib/systemd/systemd-logind
	root        2121  0.0  0.0  21176 12288 ?        Ss   09:11   0:00 /usr/lib/systemd/systemd --user
	root        4105  0.0  0.0 230344  2244 pts/0    S+   12:21   0:00              \_ grep --color=auto md
	root        2247  0.0  0.0 113000  6236 ?        Ssl  09:11   0:00 /usr/sbin/automount --systemd-service --dont-check-daemon

> Is there a filesystem on it? If so, can you still read/write data from it?

yes, and yes.

pvs
   PV         VG         Fmt  Attr PSize  PFree
   /dev/md123 VG_D1    lvm2 a--   5.45t     0
   /dev/md124 VG_D1    lvm2 a--   7.27t     0
vgs
   VG       #PV #LV #SN Attr   VSize  VFree
   VG_D1      2   1   0 wz--n- 12.72t     0
lvs
   LV             VG         Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
   LV_D1          VG_D1      -wi-ao---- 12.72t

cat /proc/mdstat
	Personalities : [raid1] [raid10]
	md123 : active (auto-read-only) raid10 sdg1[3] sdh1[4] sdj1[2] sdi1[1]
	      5860265984 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
	      bitmap: 0/44 pages [0KB], 65536KB chunk

	md124 : active raid10 sdl1[0] sdm1[1] sdn1[2] sdk1[4]
	      7813770240 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
	      bitmap: 0/59 pages [0KB], 65536KB chunk

lsblk /dev/sdn
	NAME                  MAJ:MIN RM  SIZE RO TYPE   MOUNTPOINTS
	sdn                     8:208  0  3.6T  0 disk
	└─sdn1                  8:209  0  3.6T  0 part
	  └─md124               9:124  0  7.3T  0 raid10
	    └─VG_D1-LV_D1     253:8    0 12.7T  0 lvm    /NAS/D1

fdisk -l /dev/sdn
	Disk /dev/sdn: 3.64 TiB, 4000787030016 bytes, 7814037168 sectors
	Disk model: WDC WD40EFPX-68C
	Units: sectors of 1 * 512 = 512 bytes
	Sector size (logical/physical): 512 bytes / 4096 bytes
	I/O size (minimum/optimal): 4096 bytes / 131072 bytes
	Disklabel type: gpt
	Disk identifier: ...

	Device     Start        End    Sectors  Size Type
	/dev/sdn1   2048 7814037134 7814035087  3.6T Linux RAID

fdisk -l /dev/sdn1
	Disk /dev/sdn1: 3.64 TiB, 4000785964544 bytes, 7814035087 sectors
	Units: sectors of 1 * 512 = 512 bytes
	Sector size (logical/physical): 512 bytes / 4096 bytes
	I/O size (minimum/optimal): 4096 bytes / 131072 bytes

cat /proc/mounts  | grep D1
	/dev/mapper/VG_D1-LV_D1 /NAS/D1 ext4 rw,relatime,stripe=128 0 0


touch /NAS/D1/test.file
stat /NAS/D1/test.file
	  File: /NAS/D1/test.file
	  Size: 0               Blocks: 0          IO Block: 4096   regular empty file
	Device: 253,8   Inode: 11          Links: 1
	Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
	Access: 2025-03-31 12:33:48.110052013 -0400
	Modify: 2025-03-31 12:33:48.110052013 -0400
	Change: 2025-03-31 12:33:48.110052013 -0400
	 Birth: 2025-03-31 12:33:07.272309441 -0400

* Re: extreme RAID10 rebuild times reported, but rebuild's progressing ?
  2025-03-31 16:36   ` pgnd
@ 2025-04-01  2:29     ` Xiao Ni
  2025-04-01  2:41       ` pgnd
  0 siblings, 1 reply; 7+ messages in thread
From: Xiao Ni @ 2025-04-01  2:29 UTC (permalink / raw)
  To: pgnd; +Cc: linux-raid

On Tue, Apr 1, 2025 at 12:36 AM pgnd <pgnd@dev-mail.net> wrote:
>
> hi.
>
> > Are there any processes stuck in D state?
>
> no, there were none.
>
> the rebuild 'completed' in ~ 1hr 15mins ...
> atm, the array's up, passing all tests, and seemingly fully functional

I'm glad to hear this, so everything works well now :)

>
> > And how about `ps auxf | grep md`?
>
> ps auxf | grep md
>         root          97  0.0  0.0      0     0 ?        SN   09:10   0:00  \_ [ksmd]
>         root         107  0.0  0.0      0     0 ?        I<   09:10   0:00  \_ [kworker/R-md]
>         root         108  0.0  0.0      0     0 ?        I<   09:10   0:00  \_ [kworker/R-md_bitmap]
>         root        1049  0.0  0.0      0     0 ?        S    09:10   0:00  \_ [md124_raid10]
>         root        1052  0.0  0.0      0     0 ?        S    09:10   0:00  \_ [md123_raid10]
>         root        1677  0.0  0.0      0     0 ?        S    09:10   0:00  \_ [jbd2/md126-8]
>         root           1  0.0  0.0  24820 15536 ?        Ss   09:10   0:03 /usr/lib/systemd/systemd --switched-root --system --deserialize=49 domdadm dolvm showopts noquiet
>         root        1308  0.0  0.0  32924  8340 ?        Ss   09:10   0:00 /usr/lib/systemd/systemd-journald
>         root        1368  0.0  0.0  36620 11596 ?        Ss   09:10   0:00 /usr/lib/systemd/systemd-udevd
>         systemd+    1400  0.0  0.0  17564  9160 ?        Ss   09:10   0:00 /usr/lib/systemd/systemd-networkd
>         systemd+    2010  0.0  0.0  15932  7112 ?        Ss   09:11   0:02 /usr/lib/systemd/systemd-oomd
>         root        2029  0.0  0.0   4176  2128 ?        Ss   09:11   0:00 /sbin/mdadm --monitor --scan --syslog -f --pid-file=/run/mdadm/mdadm.pid
>         root        2055  0.0  0.0  16648  8012 ?        Ss   09:11   0:00 /usr/lib/systemd/systemd-logind
>         root        2121  0.0  0.0  21176 12288 ?        Ss   09:11   0:00 /usr/lib/systemd/systemd --user
>         root        4105  0.0  0.0 230344  2244 pts/0    S+   12:21   0:00              \_ grep --color=auto md
>         root        2247  0.0  0.0 113000  6236 ?        Ssl  09:11   0:00 /usr/sbin/automount --systemd-service --dont-check-daemon
>
> > Is there a filesystem on it? If so, can you still read/write data from it?
>
> yes, and yes.
>
> pvs
>    PV         VG         Fmt  Attr PSize  PFree
>    /dev/md123 VG_D1    lvm2 a--   5.45t     0
>    /dev/md124 VG_D1    lvm2 a--   7.27t     0
> vgs
>    VG       #PV #LV #SN Attr   VSize  VFree
>    VG_D1      2   1   0 wz--n- 12.72t     0
> lvs
>    LV             VG         Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
>    LV_D1          VG_D1      -wi-ao---- 12.72t
>
> cat /proc/mdstat
>         Personalities : [raid1] [raid10]
>         md123 : active (auto-read-only) raid10 sdg1[3] sdh1[4] sdj1[2] sdi1[1]
>               5860265984 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
>               bitmap: 0/44 pages [0KB], 65536KB chunk
>
>         md124 : active raid10 sdl1[0] sdm1[1] sdn1[2] sdk1[4]
>               7813770240 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
>               bitmap: 0/59 pages [0KB], 65536KB chunk
>
> lsblk /dev/sdn
>         NAME                  MAJ:MIN RM  SIZE RO TYPE   MOUNTPOINTS
>         sdn                     8:208  0  3.6T  0 disk
>         └─sdn1                  8:209  0  3.6T  0 part
>           └─md124               9:124  0  7.3T  0 raid10
>             └─VG_D1-LV_D1     253:8    0 12.7T  0 lvm    /NAS/D1
>
> fdisk -l /dev/sdn
>         Disk /dev/sdn: 3.64 TiB, 4000787030016 bytes, 7814037168 sectors
>         Disk model: WDC WD40EFPX-68C
>         Units: sectors of 1 * 512 = 512 bytes
>         Sector size (logical/physical): 512 bytes / 4096 bytes
>         I/O size (minimum/optimal): 4096 bytes / 131072 bytes
>         Disklabel type: gpt
>         Disk identifier: ...
>
>         Device     Start        End    Sectors  Size Type
>         /dev/sdn1   2048 7814037134 7814035087  3.6T Linux RAID
>
> fdisk -l /dev/sdn1
>         Disk /dev/sdn1: 3.64 TiB, 4000785964544 bytes, 7814035087 sectors
>         Units: sectors of 1 * 512 = 512 bytes
>         Sector size (logical/physical): 512 bytes / 4096 bytes
>         I/O size (minimum/optimal): 4096 bytes / 131072 bytes
>
> cat /proc/mounts  | grep D1
>         /dev/mapper/VG_D1-LV_D1 /NAS/D1 ext4 rw,relatime,stripe=128 0 0
>
>
> touch /NAS/D1/test.file
> stat /NAS/D1/test.file
>           File: /NAS/D1/test.file
>           Size: 0               Blocks: 0          IO Block: 4096   regular empty file
>         Device: 253,8   Inode: 11          Links: 1
>         Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
>         Access: 2025-03-31 12:33:48.110052013 -0400
>         Modify: 2025-03-31 12:33:48.110052013 -0400
>         Change: 2025-03-31 12:33:48.110052013 -0400
>          Birth: 2025-03-31 12:33:07.272309441 -0400
>

Regards
Xiao


* Re: extreme RAID10 rebuild times reported, but rebuild's progressing ?
  2025-04-01  2:29     ` Xiao Ni
@ 2025-04-01  2:41       ` pgnd
  2025-04-01  4:48         ` Xiao Ni
  0 siblings, 1 reply; 7+ messages in thread
From: pgnd @ 2025-04-01  2:41 UTC (permalink / raw)
  To: xni; +Cc: linux-raid

> so everything works well now

the OP question remains unanswered, namely what the issue is with:

	finish=3918230862.4min speed=0K/sec

and whether it indicates a functional problem with the rebuild, a bug in the mdadm utility, or an issue with my config?

* Re: extreme RAID10 rebuild times reported, but rebuild's progressing ?
  2025-04-01  2:41       ` pgnd
@ 2025-04-01  4:48         ` Xiao Ni
  2025-04-01 12:47           ` pgnd
  0 siblings, 1 reply; 7+ messages in thread
From: Xiao Ni @ 2025-04-01  4:48 UTC (permalink / raw)
  To: pgnd; +Cc: linux-raid

On Tue, Apr 1, 2025 at 10:41 AM pgnd <pgnd@dev-mail.net> wrote:
>
> > so everything works well now
>
> the OP question remains unanswered, namely what the issue is with:
>
>         finish=3918230862.4min speed=0K/sec
>
> and whether it indicates a functional problem with the rebuild, a bug in the mdadm utility, or an issue with my config?
>

I can't give you an answer now, because there is very little
information. But the `mdadm --monitor --scan` process looks suspicious. Do
you know why this command is running? I tried to start a recovery and
didn't see this command.
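
(If it was started by a service, pointing systemctl at its PID, 2029 in your
ps output above, should show which unit owns it:)

	systemctl status 2029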

Thanks
Xiao


* Re: extreme RAID10 rebuild times reported, but rebuild's progressing ?
  2025-04-01  4:48         ` Xiao Ni
@ 2025-04-01 12:47           ` pgnd
  0 siblings, 0 replies; 7+ messages in thread
From: pgnd @ 2025-04-01 12:47 UTC (permalink / raw)
  To: xni; +Cc: linux-raid

> the `mdadm --monitor --scan` process looks suspicious. Do
> you know why this command is running? I tried to start a recovery and
> didn't see this command.

it's the `mdmonitor` service ...

rpm -q --whatprovides /usr/lib/systemd/system/mdmonitor.service
	mdadm-4.3-4.fc41.x86_64

cat /usr/lib/systemd/system/mdmonitor.service
	[Unit]
	Description=Software RAID monitoring and management
	ConditionPathExists=/etc/mdadm.conf

	[Service]
	Type=forking
	PIDFile=/run/mdadm/mdadm.pid
	EnvironmentFile=-/etc/sysconfig/mdmonitor
	ExecStart=/sbin/mdadm --monitor --scan --syslog -f --pid-file=/run/mdadm/mdadm.pid

	[Install]
	WantedBy=multi-user.target

systemctl status mdmonitor
	● mdmonitor.service - Software RAID monitoring and management
	     Loaded: loaded (/usr/lib/systemd/system/mdmonitor.service; enabled; preset: enabled)
	    Drop-In: /usr/lib/systemd/system/service.d
	             └─10-timeout-abort.conf, 50-keep-warm.conf
	     Active: active (running) since Mon 2025-03-31 09:11:17 EDT; 23h ago
	 Invocation: 79247157fbfb4c369c1cc7899b4d79f2
	   Main PID: 2029 (mdadm)
	      Tasks: 1 (limit: 76108)
	     Memory: 988K (peak: 1.2M)
	        CPU: 1.626s
	     CGroup: /system.slice/mdmonitor.service
!!!	             └─2029 /sbin/mdadm --monitor --scan --syslog -f --pid-file=/run/mdadm/mdadm.pid

ps aufx | grep /mdadm
	root        2029  0.0  0.0   4176  2128 ?        Ss   Mar31   0:01 /sbin/mdadm --monitor --scan --syslog -f --pid-file=/run/mdadm/mdadm.pid
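
(its notifications go to syslog, per the --syslog flag above, so anything it
flagged around the rebuild should show up with e.g.:)

	journalctl -u mdmonitor --since 2025-03-30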
