From: Paul Menzel <pmenzel@molgen.mpg.de>
To: Roger Heflin <rogerheflin@gmail.com>
Cc: linux-raid@vger.kernel.org, linux-nfs@vger.kernel.org,
 linux-block@vger.kernel.org, linux-xfs@vger.kernel.org,
 it+linux-raid@molgen.mpg.de
Subject: Re: How to debug intermittent increasing md/inflight but no disk activity?
Date: Tue, 23 Jul 2024 12:33:33 +0200
Message-ID: <02ceb39e-e4fb-4f62-ac40-7afafbd620c1@molgen.mpg.de>
In-Reply-To: <CAAMCDedmjyyn93V+ScRTyqd1FbW5VJmbZHGMss3iwyqxwJL3Pg@mail.gmail.com>
Dear Roger,
Thank you for your reply.
On 10.07.24 at 13:54, Roger Heflin wrote:
> How long does it freeze this way?
It froze for up to five minutes, I’d say.
> The disks getting bad blocks do show up as stopping activity for 3-60
> seconds (depending on the disks internal settings).
>
> smartctl --xall <device> | grep -iE 'sector|reall' should show the
> reallocation counters.
These are SAS disks, and none of the array members has any errors. Example:
```
@grele:~$ sudo smartctl --xall /dev/sdy
[…]
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0     655487.372           0
write:         0        0         0         0          0      38289.771           0
```
> What kind of disks does the machine have?
Seagate ST16000NM004J (16 TB, SAS)
> On my home machine a bad sector freezes it for 7 seconds (scterc
> defaults to 7). On some work large disk big raid the hang is minutes.
> The raw disk is set to 10 (that is what the vendor told us) and
> that 10 + having potentially a bunch of IOs against the bad sector
> shows as minutes.
>
> I wrote a script that work uses that both times how long smartctl
> takes for each disk (the bad disk takes >5 seconds, and up to minutes)
> and also shows the reallocated count and save a copy every hour so one
> can see what disk incremented its counter in the last hour and replace
> that disk.
A colleague also wrote a Perl program, diskcheck, that is run regularly
to check all the disks. It shows nothing suspicious either.
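For readers who want to try the kind of check Roger describes, the idea
(time each smartctl call, keep the counter lines, compare against the
previous snapshot) could be sketched roughly as below. This is neither
Roger's script nor diskcheck; the function names, the JSON snapshot file
and the 5-second threshold are assumptions made for illustration. The
filter is the same one as in the quoted grep; on SAS disks it may need
adjusting, since they typically report a grown defect list rather than
reallocated-sector attributes.

```python
import json
import re
import subprocess
import sys
import time

# Same filter as the quoted `grep -iE 'sector|reall'`.
# Assumption: adjust the pattern for SAS disks (e.g. to match "grown defect").
COUNTER_RE = re.compile(r"sector|reall", re.IGNORECASE)
SLOW_SECONDS = 5.0  # assumed threshold, from the ">5 seconds" remark above


def counter_lines(smartctl_output):
    """Return the lines that the quoted grep would have printed."""
    return [line.strip() for line in smartctl_output.splitlines()
            if COUNTER_RE.search(line)]


def timed_smartctl(device):
    """Run `smartctl --xall` on one device; return (seconds, stdout)."""
    start = time.monotonic()
    proc = subprocess.run(["smartctl", "--xall", device],
                          capture_output=True, text=True, check=False)
    return time.monotonic() - start, proc.stdout


def changed_devices(previous, current):
    """Return devices whose counter lines differ from the last snapshot."""
    return sorted(dev for dev, lines in current.items()
                  if dev in previous and previous[dev] != lines)


def main(argv):
    if len(argv) < 2:
        print("usage: SCRIPT SNAPSHOT.json DEVICE [DEVICE ...]")
        return 2
    snapshot_path, devices = argv[0], argv[1:]
    try:
        with open(snapshot_path) as fh:
            previous = json.load(fh)
    except FileNotFoundError:
        previous = {}  # first run: nothing to compare against
    current = {}
    for dev in devices:
        seconds, output = timed_smartctl(dev)
        current[dev] = counter_lines(output)
        flag = " SLOW" if seconds > SLOW_SECONDS else ""
        print(f"{dev}: smartctl took {seconds:.1f}s{flag}")
    for dev in changed_devices(previous, current):
        print(f"{dev}: counters changed since last snapshot")
    with open(snapshot_path, "w") as fh:
        json.dump(current, fh, indent=2)
    return 0


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run hourly as root (for example from cron) with a snapshot path and the
member devices as arguments, this prints how long each smartctl call
took and which disks changed their counters since the previous run.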
Kind regards,
Paul