From: Dave Chinner <david@fromorbit.com>
To: "Dragan Milivojević" <galileo@pkm-inc.com>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>,
linux-raid@vger.kernel.org, linux-nfs@vger.kernel.org,
linux-block@vger.kernel.org, linux-xfs@vger.kernel.org,
it+linux-raid@molgen.mpg.de
Subject: Re: How to debug intermittent increasing md/inflight but no disk activity?
Date: Sat, 13 Jul 2024 09:45:03 +1000 [thread overview]
Message-ID: <ZpG///ZaN9KfPPcf@dread.disaster.area> (raw)
In-Reply-To: <7c300510-bab8-4389-adba-c3219a11578d@pkm-inc.com>
On Fri, Jul 12, 2024 at 05:54:05AM +0200, Dragan Milivojević wrote:
> On 11/07/2024 01:12, Dave Chinner wrote:
> > Probably not a lot you can do short of reconfiguring your RAID6
> > storage devices to handle small IOs better. However, in general,
> > RAID6 /always sucks/ for small IOs, and the only way to fix this
> > problem is to use high performance SSDs to give you a massive excess
> > of write bandwidth to burn on write amplification....
> RAID5/6 has the same issues with NVME drives.
> Major issue is the bitmap.
That's irrelevant to the problem being discussed. The OP is
reporting stalls due to the bursty incoming workload vastly
outpacing the rate at which the storage device drains. The above
comment is not about how close to "raw performance" the MD device
gets on NVMe SSDs - it's about how much faster it is for the given
workload than HDDs.
i.e. what matters is the relative performance differential, and
according to your numbers below, it is at least two orders of
magnitude. That would turn a 100s stall into a 1s stall, which
would largely make the OP's problems go away....
> 5 disk NVMe RAID5, 64K chunk
>
> Test BW IOPS
> bitmap internal 64M 700KiB/s 174
> bitmap internal 128M 702KiB/s 175
> bitmap internal 512M 1142KiB/s 285
> bitmap internal 1024M 40.4MiB/s 10.3k
> bitmap internal 2G 66.5MiB/s 17.0k
> bitmap external 64M 67.8MiB/s 17.3k
> bitmap external 1024M 76.5MiB/s 19.6k
> bitmap none 80.6MiB/s 20.6k
> Single disk 1K 54.1MiB/s 55.4k
> Single disk 4K 269MiB/s 68.8k
>
> Tested with fio --filename=/dev/md/raid5 --direct=1 --rw=randwrite
> --bs=4k --ioengine=libaio --iodepth=1 --runtime=60 --numjobs=1
> --group_reporting --time_based --name=Raid5
Oh, you're only testing single-depth, block-aligned async direct IO
random writes to the block device. The problem case that was
reported was unaligned, synchronous buffered IO to multiple files
through the filesystem page cache (i.e. RMW at the page cache level
as well as at the MD device) at IO depths of up to 64, with
periodic fsyncs thrown into the mix.
So the OP's workload was not only doing synchronous buffered
writes, it also triggered a lot of dependent synchronous random
read IO to go with the async write IOs issued by fsyncs and page
cache writeback.
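For reference, a fio job that approximates that reported workload might look
something like the sketch below. This is a hedged illustration, not the OP's
actual configuration: the directory, file counts, block size and fsync
interval are all assumptions chosen to exercise buffered, unaligned,
synchronous writes to multiple files with periodic fsyncs.

```ini
; Sketch only - approximates the reported workload, not a reproduction of it.
[global]
directory=/mnt/raid6      ; assumed mount point of the filesystem on the md array
ioengine=psync            ; synchronous buffered IO through the page cache
direct=0
rw=randwrite
bs=1536                   ; deliberately unaligned to force page-cache RMW
fsync=128                 ; periodic fsync: one every 128 writes
size=1g
runtime=60
time_based=1

[writers]
numjobs=64                ; approximates the "IO depths of up to 64"
nrfiles=8                 ; spread the writes across multiple files per job
```

Unlike the direct-IO test above, this drives writeback and fsync-issued async
writes alongside the dependent read IO that buffered RMW generates.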
If you were to simulate all that, I would expect the difference
between HDDs and NVMe SSDs to be much greater than just two orders
of magnitude.
-Dave.
--
Dave Chinner
david@fromorbit.com