From: "Martin K. Petersen" <martin.petersen@oracle.com>
To: Yu Kuai <yukuai1@huaweicloud.com>
Cc: "Damien Le Moal" <dlemoal@kernel.org>,
	"Csordás Hunor" <csordas.hunor@gmail.com>,
	"Coly Li" <colyli@kernel.org>,
	hch@lst.de, linux-block@vger.kernel.org,
	"James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	linux-scsi@vger.kernel.org, "yukuai (C)" <yukuai3@huawei.com>
Subject: Re: Improper io_opt setting for md raid5
Date: Tue, 29 Jul 2025 00:23:22 -0400
Message-ID: <yq1zfcnljtw.fsf@ca-mkp.ca.oracle.com>
In-Reply-To: <c8c4d140-4ca4-9998-dea3-62341a28c7c5@huaweicloud.com> (Yu Kuai's message of "Mon, 28 Jul 2025 17:02:47 +0800")


> Ok, looks like there are two problems now:
>
> a) io_min, size to prevent performance penalty;
>
>  1) For raid5, to avoid read-modify-write, this value should be 448k,
>     but now it's 64k;

You have two penalties for RAID5: Writes smaller than the stripe chunk
size and writes smaller than the full stripe width.
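
As a rough illustration of those two thresholds, here is a small sketch
using the numbers quoted above (64k chunk, 448k full stripe, which
implies 7 data disks). The function name and the labels it returns are
made up for this example; it is not MD code and it only looks at size,
not alignment.

```python
KIB = 1024

def raid5_write_class(length, chunk=64 * KIB, data_disks=7):
    """Classify a write by which RAID5 size threshold it falls below.

    Hypothetical helper for illustration only. The defaults assume the
    64k chunk / 448k full stripe layout from the quoted message.
    """
    full_stripe = chunk * data_disks
    if length < chunk:
        return "sub-chunk"      # smaller than one stripe chunk
    if length < full_stripe:
        return "sub-stripe"     # at least a chunk, less than a full stripe
    return "full-stripe"        # covers at least one full stripe width
```

With these assumed defaults, a 4k write lands in the first class, a 64k
write in the second, and only writes of 448k or more reach the third.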

>  2) For raid0/raid10, this value is set to 64k now, however, this value
>     should not be set. If the value in member disks is 4k, issuing 4k is
>     just fine, there won't be any performance penalty;

Correct.

>  3) For raid1, this value is not set, and will use member disks, this is
>     correct.

Correct.

> b) io_opt, size to ???
>  4) For raid0/raid10/raid5, this value is set to minimal IO size to get
>     best performance.

For RAID 0 you want to set io_opt to the stripe width. io_opt is for
sequential, throughput-optimized I/O. Presumably the MD stripe chunk
size has been chosen based on knowledge about the underlying disks and
their performance. And thus maximum throughput will be achieved when
doing full stripe writes across all drives.
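
A minimal sketch of that arithmetic, assuming "stripe width" means the
chunk size multiplied by the number of member drives. The helper name
and the 4-drive example are assumptions for illustration, not how MD
computes its limits.

```python
KIB = 1024

def raid0_stripe_width(chunk_bytes, nr_disks):
    """Bytes in one full stripe across all RAID 0 members.

    Hypothetical helper: for RAID 0 every member holds data, so the
    stripe width is simply chunk size times the number of drives.
    """
    if chunk_bytes <= 0 or nr_disks <= 0:
        raise ValueError("chunk size and disk count must be positive")
    return chunk_bytes * nr_disks

# Example: a 64k chunk over an assumed 4-drive array.
io_opt = raid0_stripe_width(64 * KIB, 4)
```

In this example io_opt comes out as 256k, i.e. one chunk on each of the
four drives per I/O.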

For software RAID I am not sure how much this really matters in a modern
context. It certainly did 25 years ago when we benchmarked things for
XFS. Full stripe writes were a big improvement with both software and
hardware RAID. But how much this matters today, I am not sure.

>  5) For raid1, this value is not set, and will use member disks.

Correct.

>
> If io_opt should be *upper bound*, problem 4) should be fixed like case
> 5), and other places like blk_apply_bdi_limits() setting ra_pages by
> io_opt should be fixed as well.

I understand Damien's "upper bound" interpretation but it does not take
alignment and granularity into account. And both are imperative for
io_opt.
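
One way to picture the alignment and granularity point: an I/O can be
below some size cap and still straddle stripe boundaries. The sketch
below splits a request into the part before the first io_opt-aligned
offset, the whole io_opt units, and the leftover tail. The helper is
hypothetical and the 448k figure is borrowed from the RAID5 example
earlier in the thread.

```python
KIB = 1024

def split_by_io_opt(offset, length, io_opt):
    """Return (head, full, tail) byte counts for a request.

    head: bytes before the first io_opt-aligned offset
    full: bytes covered by whole, aligned io_opt units
    tail: bytes left over after the last whole unit
    Hypothetical helper for illustration only.
    """
    head = min((-offset) % io_opt, length)
    remaining = length - head
    full = (remaining // io_opt) * io_opt
    tail = remaining - full
    return head, full, tail
```

With io_opt at 448k, a 500k request at offset 0 yields one whole unit
plus a 52k tail, and a 448k request at offset 64k yields no whole unit
at all (384k head, 64k tail), even though neither exceeds a larger cap.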

> If io_opt should be *minimal IO size to get best performance*,

What is "best performance"? IOPS or throughput?

io_min is about IOPS. io_opt is about throughput.

-- 
Martin K. Petersen
