* extreme RAID10 rebuild times reported, but rebuild's progressing ?
@ 2025-03-30 15:56 pgnd
2025-03-31 16:11 ` Xiao Ni
0 siblings, 1 reply; 7+ messages in thread
From: pgnd @ 2025-03-30 15:56 UTC (permalink / raw)
To: linux-raid
on
distro
Name: Fedora Linux 41 (Forty One)
Version: 41
Codename:
mdadm -V
mdadm - v4.3 - 2024-02-15
rpm -qa | grep mdadm
mdadm-4.3-4.fc41.x86_64
i have a relatively-new (~1 month) 4x4TB RAID10 array.
after a reboot, one of the drives got kicked
dmesg
...
[ 15.513443] sd 15:0:7:0: [sdn] Attached SCSI disk
[ 15.784537] md: kicking non-fresh sdn1 from array!
...
cat /proc/mdstat
md124 : active raid10 sdm1[1] sdl1[0] sdk1[4]
7813770240 blocks super 1.2 512K chunks 2 near-copies [4/3] [UU_U]
bitmap: 1/59 pages [4KB], 65536KB chunk
smartctl shows no issues; can't yet find a reason for the kick.
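(presumably a comparison of per-member event counts would confirm the 'non-fresh' kick; e.g., device names as above:)
mdadm --examine /dev/sd[klmn]1 | grep -E 'Events|Update Time'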
re-adding the drive, rebuild starts.
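(the re-add was along these lines; quoting from memory, so the exact invocation may have differed:)
mdadm --manage /dev/md124 --re-add /dev/sdn1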
it's progressing with recovery; after ~ 30mins, I see
md124 : active raid10 sdm1[1] sdn1[2] sdl1[0] sdk1[4]
7813770240 blocks super 1.2 512K chunks 2 near-copies [4/3] [UU_U]
[=========>...........] recovery = 49.2% (1924016576/3906885120) finish=3918230862.4min speed=0K/sec
bitmap: 1/59 pages [4KB], 65536KB chunk
the values of
finish=3918230862.4min speed=0K/sec
appear nonsensical.
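(sanity-checking the arithmetic with the figures above; this only shows the two numbers are self-consistent, i.e. the implied speed rounds to the 0K/sec shown, not why it's near zero:)
awk 'BEGIN { r = 3906885120 - 1924016576; printf "implied speed: %.4f K/sec\n", r / (3918230862.4 * 60) }'
implied speed: 0.0084 K/sec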
is this a bug in my mdadm config, a glitch in progress reporting, or an actual problem with _function_?
* Re: extreme RAID10 rebuild times reported, but rebuild's progressing ?
2025-03-30 15:56 extreme RAID10 rebuild times reported, but rebuild's progressing ? pgnd
@ 2025-03-31 16:11 ` Xiao Ni
2025-03-31 16:36 ` pgnd
0 siblings, 1 reply; 7+ messages in thread
From: Xiao Ni @ 2025-03-31 16:11 UTC (permalink / raw)
To: pgnd; +Cc: linux-raid
On Sun, Mar 30, 2025 at 11:56 PM pgnd <pgnd@dev-mail.net> wrote:
>
> on
>
> distro
> Name: Fedora Linux 41 (Forty One)
> Version: 41
> Codename:
>
> mdadm -V
> mdadm - v4.3 - 2024-02-15
>
> rpm -qa | grep mdadm
> mdadm-4.3-4.fc41.x86_64
>
> i have a relatively-new (~1 month) 4x4TB RAID10 array.
>
> after a reboot, one of the drives got kicked
>
> dmesg
> ...
> [ 15.513443] sd 15:0:7:0: [sdn] Attached SCSI disk
> [ 15.784537] md: kicking non-fresh sdn1 from array!
> ...
>
> cat /proc/mdstat
> md124 : active raid10 sdm1[1] sdl1[0] sdk1[4]
> 7813770240 blocks super 1.2 512K chunks 2 near-copies [4/3] [UU_U]
> bitmap: 1/59 pages [4KB], 65536KB chunk
>
> smartctl shows no issues; can't yet find a reason for the kick.
>
> re-adding the drive, rebuild starts.
>
> it's progressing with recovery; after ~ 30mins, I see
>
> md124 : active raid10 sdm1[1] sdn1[2] sdl1[0] sdk1[4]
> 7813770240 blocks super 1.2 512K chunks 2 near-copies [4/3] [UU_U]
> [=========>...........] recovery = 49.2% (1924016576/3906885120) finish=3918230862.4min speed=0K/sec
> bitmap: 1/59 pages [4KB], 65536KB chunk
>
> the values of
>
> finish=3918230862.4min speed=0K/sec
>
> appear nonsensical.
>
> is this a bug in my mdadm config, a glitch in progress reporting, or an actual problem with _function_?
>
>
Hi,
Are there any processes stuck in D state? And what does `ps auxf | grep md`
show? Is there a filesystem on it? If so, can you still read/write data
from it?
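(For example, something like this would list any tasks stuck in D state; just a generic check, not md-specific:)
ps -eo pid,stat,wchan:32,comm | awk 'NR==1 || $2 ~ /^D/'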
Regards
Xiao
* Re: extreme RAID10 rebuild times reported, but rebuild's progressing ?
2025-03-31 16:11 ` Xiao Ni
@ 2025-03-31 16:36 ` pgnd
2025-04-01 2:29 ` Xiao Ni
0 siblings, 1 reply; 7+ messages in thread
From: pgnd @ 2025-03-31 16:36 UTC (permalink / raw)
To: xni; +Cc: linux-raid
hi.
> Are there any processes stuck in D state?
no, there were none.
the rebuild 'completed' in ~ 1hr 15mins ...
atm, the array's up, passing all tests, and seemingly fully functional
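(by 'tests' i mean basic i/o; a fuller md consistency check could be kicked off like this, standard md sysfs paths, device name as above:)
echo check > /sys/block/md124/md/sync_action
cat /proc/mdstat                      # shows the check progress
cat /sys/block/md124/md/mismatch_cnt  # 0 if the check finds no inconsistencies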
> And what does `ps auxf | grep md` show?
ps auxf | grep md
root 97 0.0 0.0 0 0 ? SN 09:10 0:00 \_ [ksmd]
root 107 0.0 0.0 0 0 ? I< 09:10 0:00 \_ [kworker/R-md]
root 108 0.0 0.0 0 0 ? I< 09:10 0:00 \_ [kworker/R-md_bitmap]
root 1049 0.0 0.0 0 0 ? S 09:10 0:00 \_ [md124_raid10]
root 1052 0.0 0.0 0 0 ? S 09:10 0:00 \_ [md123_raid10]
root 1677 0.0 0.0 0 0 ? S 09:10 0:00 \_ [jbd2/md126-8]
root 1 0.0 0.0 24820 15536 ? Ss 09:10 0:03 /usr/lib/systemd/systemd --switched-root --system --deserialize=49 domdadm dolvm showopts noquiet
root 1308 0.0 0.0 32924 8340 ? Ss 09:10 0:00 /usr/lib/systemd/systemd-journald
root 1368 0.0 0.0 36620 11596 ? Ss 09:10 0:00 /usr/lib/systemd/systemd-udevd
systemd+ 1400 0.0 0.0 17564 9160 ? Ss 09:10 0:00 /usr/lib/systemd/systemd-networkd
systemd+ 2010 0.0 0.0 15932 7112 ? Ss 09:11 0:02 /usr/lib/systemd/systemd-oomd
root 2029 0.0 0.0 4176 2128 ? Ss 09:11 0:00 /sbin/mdadm --monitor --scan --syslog -f --pid-file=/run/mdadm/mdadm.pid
root 2055 0.0 0.0 16648 8012 ? Ss 09:11 0:00 /usr/lib/systemd/systemd-logind
root 2121 0.0 0.0 21176 12288 ? Ss 09:11 0:00 /usr/lib/systemd/systemd --user
root 4105 0.0 0.0 230344 2244 pts/0 S+ 12:21 0:00 \_ grep --color=auto md
root 2247 0.0 0.0 113000 6236 ? Ssl 09:11 0:00 /usr/sbin/automount --systemd-service --dont-check-daemon
> Is there a filesystem on it? If so, can you still read/write data from it?
yes, and yes.
pvs
PV VG Fmt Attr PSize PFree
/dev/md123 VG_D1 lvm2 a-- 5.45t 0
/dev/md124 VG_D1 lvm2 a-- 7.27t 0
vgs
VG #PV #LV #SN Attr VSize VFree
VG_D1 2 1 0 wz--n- 12.72t 0
lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
LV_D1 VG_D1 -wi-ao---- 12.72t
cat /proc/mdstat
Personalities : [raid1] [raid10]
md123 : active (auto-read-only) raid10 sdg1[3] sdh1[4] sdj1[2] sdi1[1]
5860265984 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
bitmap: 0/44 pages [0KB], 65536KB chunk
md124 : active raid10 sdl1[0] sdm1[1] sdn1[2] sdk1[4]
7813770240 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
bitmap: 0/59 pages [0KB], 65536KB chunk
lsblk /dev/sdn
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sdn 8:208 0 3.6T 0 disk
└─sdn1 8:209 0 3.6T 0 part
└─md124 9:124 0 7.3T 0 raid10
└─VG_D1-LV_D1 253:8 0 12.7T 0 lvm /NAS/D1
fdisk -l /dev/sdn
Disk /dev/sdn: 3.64 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: WDC WD40EFPX-68C
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 131072 bytes
Disklabel type: gpt
Disk identifier: ...
Device Start End Sectors Size Type
/dev/sdn1 2048 7814037134 7814035087 3.6T Linux RAID
fdisk -l /dev/sdn1
Disk /dev/sdn1: 3.64 TiB, 4000785964544 bytes, 7814035087 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 131072 bytes
cat /proc/mounts | grep D1
/dev/mapper/VG_D1-LV_D1 /NAS/D1 ext4 rw,relatime,stripe=128 0 0
touch /NAS/D1/test.file
stat /NAS/D1/test.file
File: /NAS/D1/test.file
Size: 0 Blocks: 0 IO Block: 4096 regular empty file
Device: 253,8 Inode: 11 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2025-03-31 12:33:48.110052013 -0400
Modify: 2025-03-31 12:33:48.110052013 -0400
Change: 2025-03-31 12:33:48.110052013 -0400
Birth: 2025-03-31 12:33:07.272309441 -0400
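(and a slightly stronger check than touch, a small write plus readback; illustrative filename:)
dd if=/dev/zero of=/NAS/D1/test.file bs=1M count=128 conv=fsync
dd if=/NAS/D1/test.file of=/dev/null bs=1M
rm /NAS/D1/test.file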
* Re: extreme RAID10 rebuild times reported, but rebuild's progressing ?
2025-03-31 16:36 ` pgnd
@ 2025-04-01 2:29 ` Xiao Ni
2025-04-01 2:41 ` pgnd
0 siblings, 1 reply; 7+ messages in thread
From: Xiao Ni @ 2025-04-01 2:29 UTC (permalink / raw)
To: pgnd; +Cc: linux-raid
On Tue, Apr 1, 2025 at 12:36 AM pgnd <pgnd@dev-mail.net> wrote:
>
> hi.
>
> > Are there any processes stuck in D state?
>
> no, there were none.
>
> the rebuild 'completed' in ~ 1hr 15mins ...
> atm, the array's up, passing all tests, and seemingly fully functional
I'm glad to hear this, so everything works well now :)
>
> > And what does `ps auxf | grep md` show?
>
> ps auxf | grep md
> root 97 0.0 0.0 0 0 ? SN 09:10 0:00 \_ [ksmd]
> root 107 0.0 0.0 0 0 ? I< 09:10 0:00 \_ [kworker/R-md]
> root 108 0.0 0.0 0 0 ? I< 09:10 0:00 \_ [kworker/R-md_bitmap]
> root 1049 0.0 0.0 0 0 ? S 09:10 0:00 \_ [md124_raid10]
> root 1052 0.0 0.0 0 0 ? S 09:10 0:00 \_ [md123_raid10]
> root 1677 0.0 0.0 0 0 ? S 09:10 0:00 \_ [jbd2/md126-8]
> root 1 0.0 0.0 24820 15536 ? Ss 09:10 0:03 /usr/lib/systemd/systemd --switched-root --system --deserialize=49 domdadm dolvm showopts noquiet
> root 1308 0.0 0.0 32924 8340 ? Ss 09:10 0:00 /usr/lib/systemd/systemd-journald
> root 1368 0.0 0.0 36620 11596 ? Ss 09:10 0:00 /usr/lib/systemd/systemd-udevd
> systemd+ 1400 0.0 0.0 17564 9160 ? Ss 09:10 0:00 /usr/lib/systemd/systemd-networkd
> systemd+ 2010 0.0 0.0 15932 7112 ? Ss 09:11 0:02 /usr/lib/systemd/systemd-oomd
> root 2029 0.0 0.0 4176 2128 ? Ss 09:11 0:00 /sbin/mdadm --monitor --scan --syslog -f --pid-file=/run/mdadm/mdadm.pid
> root 2055 0.0 0.0 16648 8012 ? Ss 09:11 0:00 /usr/lib/systemd/systemd-logind
> root 2121 0.0 0.0 21176 12288 ? Ss 09:11 0:00 /usr/lib/systemd/systemd --user
> root 4105 0.0 0.0 230344 2244 pts/0 S+ 12:21 0:00 \_ grep --color=auto md
> root 2247 0.0 0.0 113000 6236 ? Ssl 09:11 0:00 /usr/sbin/automount --systemd-service --dont-check-daemon
>
> > Is there a filesystem on it? If so, can you still read/write data from it?
>
> yes, and yes.
>
> pvs
> PV VG Fmt Attr PSize PFree
> /dev/md123 VG_D1 lvm2 a-- 5.45t 0
> /dev/md124 VG_D1 lvm2 a-- 7.27t 0
> vgs
> VG #PV #LV #SN Attr VSize VFree
> VG_D1 2 1 0 wz--n- 12.72t 0
> lvs
> LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
> LV_D1 VG_D1 -wi-ao---- 12.72t
>
> cat /proc/mdstat
> Personalities : [raid1] [raid10]
> md123 : active (auto-read-only) raid10 sdg1[3] sdh1[4] sdj1[2] sdi1[1]
> 5860265984 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
> bitmap: 0/44 pages [0KB], 65536KB chunk
>
> md124 : active raid10 sdl1[0] sdm1[1] sdn1[2] sdk1[4]
> 7813770240 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
> bitmap: 0/59 pages [0KB], 65536KB chunk
>
> lsblk /dev/sdn
> NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
> sdn 8:208 0 3.6T 0 disk
> └─sdn1 8:209 0 3.6T 0 part
> └─md124 9:124 0 7.3T 0 raid10
> └─VG_D1-LV_D1 253:8 0 12.7T 0 lvm /NAS/D1
>
> fdisk -l /dev/sdn
> Disk /dev/sdn: 3.64 TiB, 4000787030016 bytes, 7814037168 sectors
> Disk model: WDC WD40EFPX-68C
> Units: sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 4096 bytes
> I/O size (minimum/optimal): 4096 bytes / 131072 bytes
> Disklabel type: gpt
> Disk identifier: ...
>
> Device Start End Sectors Size Type
> /dev/sdn1 2048 7814037134 7814035087 3.6T Linux RAID
>
> fdisk -l /dev/sdn1
> Disk /dev/sdn1: 3.64 TiB, 4000785964544 bytes, 7814035087 sectors
> Units: sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 4096 bytes
> I/O size (minimum/optimal): 4096 bytes / 131072 bytes
>
> cat /proc/mounts | grep D1
> /dev/mapper/VG_D1-LV_D1 /NAS/D1 ext4 rw,relatime,stripe=128 0 0
>
>
> touch /NAS/D1/test.file
> stat /NAS/D1/test.file
> File: /NAS/D1/test.file
> Size: 0 Blocks: 0 IO Block: 4096 regular empty file
> Device: 253,8 Inode: 11 Links: 1
> Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
> Access: 2025-03-31 12:33:48.110052013 -0400
> Modify: 2025-03-31 12:33:48.110052013 -0400
> Change: 2025-03-31 12:33:48.110052013 -0400
> Birth: 2025-03-31 12:33:07.272309441 -0400
>
Regards
Xiao
* Re: extreme RAID10 rebuild times reported, but rebuild's progressing ?
2025-04-01 2:29 ` Xiao Ni
@ 2025-04-01 2:41 ` pgnd
2025-04-01 4:48 ` Xiao Ni
0 siblings, 1 reply; 7+ messages in thread
From: pgnd @ 2025-04-01 2:41 UTC (permalink / raw)
To: xni; +Cc: linux-raid
> so everything works well now
the OP question remains unanswered, namely what the issue is re:
finish=3918230862.4min speed=0K/sec
and whether it's an indication of a functional problem with the rebuild, with the mdadm util itself, or with config?
* Re: extreme RAID10 rebuild times reported, but rebuild's progressing ?
2025-04-01 2:41 ` pgnd
@ 2025-04-01 4:48 ` Xiao Ni
2025-04-01 12:47 ` pgnd
0 siblings, 1 reply; 7+ messages in thread
From: Xiao Ni @ 2025-04-01 4:48 UTC (permalink / raw)
To: pgnd; +Cc: linux-raid
On Tue, Apr 1, 2025 at 10:41 AM pgnd <pgnd@dev-mail.net> wrote:
>
> > so everything works well now
>
> the OP question remains unanswered, namely what the issue is re:
>
> finish=3918230862.4min speed=0K/sec
>
> and whether it's an indication of a functional problem with the rebuild, with the mdadm util itself, or with config?
>
I can't give you an answer now, because there is very little
information. But the `mdadm --monitor --scan` process is a suspicious place
to look. Do you know why this command is running? I tried starting a
recovery myself and didn't see this command.
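(For example, something along these lines would show which systemd unit, if any, owns that PID; just a guess at where to look:)
systemctl status mdmonitor
ps -o pid=,unit=,cmd= -p $(cat /run/mdadm/mdadm.pid)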
Thanks
Xiao
* Re: extreme RAID10 rebuild times reported, but rebuild's progressing ?
2025-04-01 4:48 ` Xiao Ni
@ 2025-04-01 12:47 ` pgnd
0 siblings, 0 replies; 7+ messages in thread
From: pgnd @ 2025-04-01 12:47 UTC (permalink / raw)
To: xni; +Cc: linux-raid
> the `mdadm --monitor --scan` process is a suspicious place to look. Do
> you know why this command is running? I tried starting a recovery myself
> and didn't see this command.
it's the `mdmonitor` service ...
rpm -q --whatprovides /usr/lib/systemd/system/mdmonitor.service
mdadm-4.3-4.fc41.x86_64
cat /usr/lib/systemd/system/mdmonitor.service
[Unit]
Description=Software RAID monitoring and management
ConditionPathExists=/etc/mdadm.conf
[Service]
Type=forking
PIDFile=/run/mdadm/mdadm.pid
EnvironmentFile=-/etc/sysconfig/mdmonitor
ExecStart=/sbin/mdadm --monitor --scan --syslog -f --pid-file=/run/mdadm/mdadm.pid
[Install]
WantedBy=multi-user.target
systemctl status mdmonitor
● mdmonitor.service - Software RAID monitoring and management
Loaded: loaded (/usr/lib/systemd/system/mdmonitor.service; enabled; preset: enabled)
Drop-In: /usr/lib/systemd/system/service.d
└─10-timeout-abort.conf, 50-keep-warm.conf
Active: active (running) since Mon 2025-03-31 09:11:17 EDT; 23h ago
Invocation: 79247157fbfb4c369c1cc7899b4d79f2
Main PID: 2029 (mdadm)
Tasks: 1 (limit: 76108)
Memory: 988K (peak: 1.2M)
CPU: 1.626s
CGroup: /system.slice/mdmonitor.service
!!! └─2029 /sbin/mdadm --monitor --scan --syslog -f --pid-file=/run/mdadm/mdadm.pid
ps aufx | grep /mdadm
root 2029 0.0 0.0 4176 2128 ? Ss Mar31 0:01 /sbin/mdadm --monitor --scan --syslog -f --pid-file=/run/mdadm/mdadm.pid
Thread overview: 7+ messages
2025-03-30 15:56 extreme RAID10 rebuild times reported, but rebuild's progressing ? pgnd
2025-03-31 16:11 ` Xiao Ni
2025-03-31 16:36 ` pgnd
2025-04-01 2:29 ` Xiao Ni
2025-04-01 2:41 ` pgnd
2025-04-01 4:48 ` Xiao Ni
2025-04-01 12:47 ` pgnd