public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: "Dragan Milivojević" <galileo@pkm-inc.com>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>,
	linux-raid@vger.kernel.org, linux-nfs@vger.kernel.org,
	linux-block@vger.kernel.org, linux-xfs@vger.kernel.org,
	it+linux-raid@molgen.mpg.de
Subject: Re: How to debug intermittent increasing md/inflight but no disk activity?
Date: Sat, 13 Jul 2024 09:45:03 +1000	[thread overview]
Message-ID: <ZpG///ZaN9KfPPcf@dread.disaster.area> (raw)
In-Reply-To: <7c300510-bab8-4389-adba-c3219a11578d@pkm-inc.com>

On Fri, Jul 12, 2024 at 05:54:05AM +0200, Dragan Milivojević wrote:
> On 11/07/2024 01:12, Dave Chinner wrote:
> > Probably not a lot you can do short of reconfiguring your RAID6
> > storage devices to handle small IOs better. However, in general,
> > RAID6 /always sucks/ for small IOs, and the only way to fix this
> > problem is to use high performance SSDs to give you a massive excess
> > of write bandwidth to burn on write amplification....
> RAID5/6 has the same issues with NVME drives.
> Major issue is the bitmap.

That's irrelevant to the problem being discussed. The OP is
reporting stalls due to the bursty incoming workload vastly
outpacing the rate of draining of storage device. the above comment
is not about how close to "raw performace" the MD device gets on
NVMe SSDs - it's about how much faster it is for the given workload
than HDDs.

i.e. waht matters is the relative performance differential, and
according to you numbers below, it is at least two orders of
magnitude. That would make a 100s stall into a 1s stall, and that
would largely make the OP's problems go away....

> 5 disk NVMe RAID5, 64K chunk
> 
> Test                   BW         IOPS
> bitmap internal 64M    700KiB/s   174
> bitmap internal 128M   702KiB/s   175
> bitmap internal 512M   1142KiB/s  285
> bitmap internal 1024M  40.4MiB/s  10.3k
> bitmap internal 2G     66.5MiB/s  17.0k
> bitmap external 64M    67.8MiB/s  17.3k
> bitmap external 1024M  76.5MiB/s  19.6k
> bitmap none            80.6MiB/s  20.6k
> Single disk 1K         54.1MiB/s  55.4k
> Single disk 4K         269MiB/s   68.8k
> 
> Tested with fio --filename=/dev/md/raid5 --direct=1 --rw=randwrite
> --bs=4k --ioengine=libaio --iodepth=1 --runtime=60 --numjobs=1
> --group_reporting --time_based --name=Raid5

Oh, you're only testing a single depth block aligned async direct IO
random write to the block device. The problem case that was reported
was unaligned, synchronous buffered IO to multiple files through the
the filesystem page cache (i.e. RMW at the page cache level as well
as the MD device) at IO depths of up to 64 with periodic fsyncs
thrown into the mix. 

So the OP's workload was not only doing synchronous buffered writes,
they also triggered a lot of dependent synchronous random read IO to
go with the async write IOs issued by fsyncs and page cache
writeback.

If you were to simulate all that, I would expect that the difference
between HDDs and NVMe SSDs to be much greater than just 2 orders of
magnitude.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2024-07-12 23:45 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-10 11:46 How to debug intermittent increasing md/inflight but no disk activity? Paul Menzel
2024-07-10 11:54 ` Roger Heflin
2024-07-23 10:33   ` Paul Menzel
2024-07-10 23:12 ` Dave Chinner
2024-07-11  8:51   ` Johannes Truschnigg
2024-07-11 11:23   ` Andre Noll
2024-07-11 22:26     ` Dave Chinner
2024-07-13 15:47       ` Andre Noll
2024-07-23 15:13     ` Paul Menzel
2024-07-12  3:54   ` Dragan Milivojević
2024-07-12 23:45     ` Dave Chinner [this message]
2024-07-13 17:44       ` Dragan Milivojević

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZpG///ZaN9KfPPcf@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=galileo@pkm-inc.com \
    --cc=it+linux-raid@molgen.mpg.de \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=linux-raid@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=pmenzel@molgen.mpg.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox