linux-raid.vger.kernel.org archive mirror
* How to debug intermittent increasing md/inflight but no disk activity?
@ 2024-07-10 11:46 Paul Menzel
  2024-07-10 11:54 ` Roger Heflin
  2024-07-10 23:12 ` Dave Chinner
  0 siblings, 2 replies; 12+ messages in thread
From: Paul Menzel @ 2024-07-10 11:46 UTC (permalink / raw)
  To: linux-raid, linux-nfs; +Cc: linux-block, linux-xfs, it+linux-raid

Dear Linux folks,


We export directories over NFS from a Dell PowerEdge R420 running Linux
5.15.86, and users noticed intermittent hangs. For example, running

     df /project/something # on an NFS client

on a different system timed out.

     @grele:~$ more /proc/mdstat
     Personalities : [linear] [raid0] [raid1] [raid6] [raid5] [raid4] [multipath]
     md3 : active raid6 sdr[0] sdp[11] sdx[10] sdt[9] sdo[8] sdw[7] sds[6] sdm[5] sdu[4] sdq[3] sdn[2] sdv[1]
           156257474560 blocks super 1.2 level 6, 1024k chunk, algorithm 2 [12/12] [UUUUUUUUUUUU]
           bitmap: 0/117 pages [0KB], 65536KB chunk

     md2 : active raid6 sdap[0] sdan[11] sdav[10] sdar[12] sdam[8] sdau[7] sdaq[6] sdak[5] sdas[4] sdao[3] sdal[2] sdat[1]
           156257474560 blocks super 1.2 level 6, 1024k chunk, algorithm 2 [12/12] [UUUUUUUUUUUU]
           bitmap: 0/117 pages [0KB], 65536KB chunk

     md1 : active raid6 sdb[0] sdl[11] sdh[10] sdd[9] sdk[8] sdg[7] sdc[6] sdi[5] sde[4] sda[3] sdj[2] sdf[1]
           156257474560 blocks super 1.2 level 6, 1024k chunk, algorithm 2 [12/12] [UUUUUUUUUUUU]
           bitmap: 2/117 pages [8KB], 65536KB chunk

     md0 : active raid6 sdaj[0] sdz[11] sdad[10] sdah[9] sdy[8] sdac[7] sdag[6] sdaa[5] sdae[4] sdai[3] sdab[2] sdaf[1]
           156257474560 blocks super 1.2 level 6, 1024k chunk, algorithm 2 [12/12] [UUUUUUUUUUUU]
           bitmap: 7/117 pages [28KB], 65536KB chunk

     unused devices: <none>
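
To correlate *md0* with its member disks (for example when reading iostat
or /proc/diskstats output), the member list can be pulled out of
/proc/mdstat. This is a minimal sketch, assuming active arrays formatted
like the output above; the helper name `parse_mdstat` is hypothetical, and
inactive arrays or unusual member flags are only partly handled.

```python
import re
from typing import Dict, List

# "md0 : active raid6 sdaj[0] sdz[11] ..." -> array name and rest of line
_ARRAY_LINE = re.compile(r"^(md\d+)\s*:\s*(.*)$")
# RAID level token such as "raid6" (assumes the personalities listed above)
_LEVEL = re.compile(r"\b(raid\d+|linear|multipath)\b")
# Member entry "sdaj[0]": device name and its role number in the array
_MEMBER = re.compile(r"([\w-]+)\[(\d+)\]")


def parse_mdstat(text: str) -> Dict[str, dict]:
    """Hypothetical helper: map each md array to its level and member disks."""
    arrays: Dict[str, dict] = {}
    for raw in text.splitlines():
        m = _ARRAY_LINE.match(raw.strip())
        if not m:
            continue  # skip "Personalities", block counts, bitmap lines, ...
        name, rest = m.groups()
        level = _LEVEL.search(rest)
        # Sort members by role number so that index 0 is role [0].
        members: List[str] = [
            dev for dev, role in sorted(_MEMBER.findall(rest), key=lambda t: int(t[1]))
        ]
        arrays[name] = {
            "level": level.group(1) if level else None,
            "members": members,
        }
    return arrays
```

Sorting by role number rather than by position in the line gives a stable
order, because md does not print members in role order.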

During that time, we noticed that all 64 NFSD processes were in
uninterruptible sleep and that the number of I/O requests in flight for
the RAID6 device *md0* kept increasing (the two values are the read and
write requests currently in flight):

     /sys/devices/virtual/block/md0/inflight : 10 921

However, iostat showed no disk activity, and as far as we could see there
was only little NFS activity going on. This came and went for around half
an hour, and then we reduced the number of NFSD processes from 64 to 8.
After a while the problem settled, meaning the number of in-flight I/O
requests went down. So it might be related to the access pattern, but we
would like to find out exactly what is going on.
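
Since all NFSD threads were in uninterruptible sleep (state D), their
kernel stacks would show where they are blocked. The following is a minimal
sketch for collecting them, assuming it runs as root on a kernel that
provides /proc/<pid>/stack; the helper names are hypothetical. If SysRq is
enabled, `echo w > /proc/sysrq-trigger`, which dumps all blocked tasks to
the kernel log, is an alternative.

```python
import os
from typing import List, Tuple


def parse_stat_line(line: str) -> Tuple[int, str, str]:
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the last ')' instead of on whitespace.
    """
    head, _, tail = line.rpartition(")")
    pid_str, _, comm = head.partition(" (")
    return int(pid_str), comm, tail.split()[0]


def blocked_tasks(proc_root: str = "/proc", prefix: str = "nfsd") -> List[Tuple[int, str]]:
    """Hypothetical helper: (pid, comm) of tasks in state D whose comm starts with prefix."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip "self", "sys", "net", ...
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # task exited while scanning, or unexpected format
        if state == "D" and comm.startswith(prefix):
            found.append((pid, comm))
    return sorted(found)


def kernel_stack(pid: int, proc_root: str = "/proc") -> str:
    """Read /proc/<pid>/stack; this usually needs root, so failures are returned inline."""
    try:
        with open(os.path.join(proc_root, str(pid), "stack")) as f:
            return f.read()
    except OSError as exc:
        return f"<unavailable: {exc}>"
```

Printing `kernel_stack(pid)` for every entry of `blocked_tasks()` a few
times while the in-flight count is rising would show whether the threads
stay blocked in the same place.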

We captured some more data from sysfs [1].
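
Capturing the same sysfs data automatically whenever the problem shows up
would make the next occurrence easier to compare with this one. This is a
minimal sketch that reads every readable attribute file in a sysfs
directory such as /sys/block/md0/md (for RAID6 this includes attributes
like `stripe_cache_active` and `stripe_cache_size`) or /sys/block/md0
itself, which holds `inflight` and `stat`. The helper name is hypothetical,
and which attributes exist depends on the kernel version and RAID level.

```python
import os
from typing import Dict


def snapshot_attrs(directory: str) -> Dict[str, str]:
    """Hypothetical helper: read all readable regular files in a sysfs directory."""
    snap: Dict[str, str] = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories such as dev-sdX/ and links to them
        try:
            with open(path) as f:
                snap[name] = f.read().strip()
        except (OSError, UnicodeDecodeError):
            continue  # write-only or otherwise unreadable attribute
    return snap
```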

Of course it is not reproducible, but any insight into how to debug this
the next time it happens is very welcome.
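
One way to check whether requests are stuck inside md rather than at the
disks is to read /sys/block/md0/inflight and compare the completed-I/O
counters of the member disks in /proc/diskstats across an interval: a
non-zero in-flight count while no member completes any I/O matches what
was observed here. This is a minimal sketch; the helper names and the
5-second default interval are assumptions, and the member list would have
to come from /proc/mdstat.

```python
import time
from typing import Dict, Iterable, Tuple

Counters = Dict[str, Tuple[int, int]]


def parse_diskstats(text: str) -> Counters:
    """Map device name -> (reads completed, writes completed) from /proc/diskstats."""
    stats: Counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # major, minor, name, then at least 11 counters
        stats[fields[2]] = (int(fields[3]), int(fields[7]))
    return stats


def parse_inflight(text: str) -> Tuple[int, int]:
    """/sys/block/<dev>/inflight holds two numbers: reads and writes in flight."""
    reads, writes = text.split()
    return int(reads), int(writes)


def looks_stalled(inflight: Tuple[int, int], before: Counters,
                  after: Counters, members: Iterable[str]) -> bool:
    """True if md reports in-flight I/O but no member completed any I/O."""
    if sum(inflight) == 0:
        return False
    return all(after.get(d) == before.get(d) for d in members)


def sample(md: str, members: Iterable[str], interval: float = 5.0) -> bool:
    """Hypothetical helper: two diskstats samples around a sleep, then check md."""
    def read(path: str) -> str:
        with open(path) as f:
            return f.read()

    before = parse_diskstats(read("/proc/diskstats"))
    time.sleep(interval)
    after = parse_diskstats(read("/proc/diskstats"))
    inflight = parse_inflight(read(f"/sys/block/{md}/inflight"))
    return looks_stalled(inflight, before, after, members)
```

Run periodically, a True result from `sample()` would be a natural trigger
for capturing the sysfs data again while the problem is still present.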


Kind regards,

Paul


[1]: https://owww.molgen.mpg.de/~pmenzel/grele.2.txt



Thread overview: 12+ messages
2024-07-10 11:46 How to debug intermittent increasing md/inflight but no disk activity? Paul Menzel
2024-07-10 11:54 ` Roger Heflin
2024-07-23 10:33   ` Paul Menzel
2024-07-10 23:12 ` Dave Chinner
2024-07-11  8:51   ` Johannes Truschnigg
2024-07-11 11:23   ` Andre Noll
2024-07-11 22:26     ` Dave Chinner
2024-07-13 15:47       ` Andre Noll
2024-07-23 15:13     ` Paul Menzel
2024-07-12  3:54   ` Dragan Milivojević
2024-07-12 23:45     ` Dave Chinner
2024-07-13 17:44       ` Dragan Milivojević
