public inbox for linux-scsi@vger.kernel.org
From: "Martin K. Petersen" <martin.petersen@oracle.com>
To: Damien Le Moal <dlemoal@kernel.org>
Cc: "Yu Kuai" <yukuai1@huaweicloud.com>,
	"Csordás Hunor" <csordas.hunor@gmail.com>,
	"Coly Li" <colyli@kernel.org>,
	hch@lst.de, linux-block@vger.kernel.org,
	"James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	linux-scsi@vger.kernel.org, "yukuai (C)" <yukuai3@huawei.com>
Subject: Re: Improper io_opt setting for md raid5
Date: Mon, 28 Jul 2025 23:49:58 -0400
Message-ID: <yq1h5yvn0bb.fsf@ca-mkp.ca.oracle.com>
In-Reply-To: <a1626eef-9846-4824-a899-2fbd8e369fac@kernel.org> (Damien Le Moal's message of "Mon, 28 Jul 2025 12:49:45 +0900")


Damien,

> An OPTIMAL TRANSFER LENGTH GRANULARITY field set to 0000h indicates
> that the device server does not report optimal transfer length
> granularity.
>
> For a SCSI disk, sd.c uses this value for sdkp->min_xfer_blocks. Note
> that the naming here is dubious since this is not a minimum. The
> minimum is the logical block size.

min_io is a preferred minimum, not an absolute minimum, just like the
physical block size. You can do smaller I/Os but you don't want to.

> Storage devices may report an optimal I/O size, which is the device's
> preferred unit for sustained I/O. This is rarely reported for disk
> drives. For RAID arrays it is usually the stripe width or the internal
> track size. A properly aligned multiple of optimal_io_size is the
> preferred request size for workloads where sustained throughput is
> desired. If no optimal I/O size is reported this file contains 0.
>
> Well, I find this definition not correct *at all*. This is repeating
> the definition of minimum_io_size (limits->io_min) and completely
> disregards any optimal_io_size limit of the drives in the
> array.

opt_io defines the optimal I/O size for a sequential/throughput-focused
workload. You can do larger I/Os but you don't want to.
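
To make the two hints concrete, here is a rough sketch in C of how a
submitter might size a request from them. pick_io_size() is a
hypothetical helper, not a kernel function; sizes are in bytes and a
hint of 0 is assumed to mean "not reported".

```c
#include <stdint.h>

/*
 * Hypothetical helper (not kernel code): choose a request size from the
 * size the caller would like and the two hints. A hint of 0 is taken to
 * mean the device did not report it.
 */
static uint32_t pick_io_size(uint32_t want, uint32_t io_min, uint32_t io_opt)
{
	uint32_t size = want;

	/* Larger than io_opt is allowed, but not preferred: cap it. */
	if (io_opt != 0 && size > io_opt)
		size = io_opt;

	if (io_min != 0) {
		if (size >= io_min)
			size -= size % io_min;	/* whole io_min units only */
		else
			size = io_min;		/* smaller is allowed, not preferred */
	}

	return size;
}
```

With io_min = 64 KiB and io_opt = 256 KiB, a caller asking for 1 MiB
gets 256 KiB and a caller asking for 4 KiB gets 64 KiB.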

RAID arrays at the time the spec was written had sophisticated
non-volatile caches which managed data in tracks or cache lines. When
you issued an I/O which straddled internal cache lines in the array, you
would fall back to a slow path as opposed to doing things entirely in
hardware. So the purpose of the optimal transfer length and granularity
in SCSI was to encourage applications to write naturally aligned full
tracks/cache lines/stripes.
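
The alignment condition can be sketched as below. Both helpers are
hypothetical names for illustration only; start, len and unit are in
logical blocks, and unit stands for the reported granularity (0 is
assumed to mean not reported).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if the I/O [start, start + len) crosses a
 * boundary between two internal units (track, cache line, stripe).
 */
static bool io_straddles_unit(uint64_t start, uint64_t len, uint64_t unit)
{
	if (unit == 0 || len == 0)
		return false;
	return start / unit != (start + len - 1) / unit;
}

/*
 * Hypothetical helper: true if the I/O is made of naturally aligned
 * full units, i.e. it starts on a unit boundary and covers whole units.
 */
static bool io_is_full_units(uint64_t start, uint64_t len, uint64_t unit)
{
	if (unit == 0 || len == 0)
		return false;
	return start % unit == 0 && len % unit == 0;
}
```

For a 128-block unit, a 128-block write at block 64 straddles two units
and is not a full-unit write, while a 256-block write at block 128 is
two aligned full units.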

> For a raid array, this value should obviously be a multiple of
> minimum_io_size (the array stripe size),

SCSI does not require this but we enforce it in sd.c.
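
As an illustration of that kind of check (a simplified sketch, not the
actual code in sd.c; sanitize_io_opt() is a made-up name and treating a
bad value as "not reported" is an assumption):

```c
#include <stdint.h>

/*
 * Hypothetical sketch: accept a reported optimal size only if it is a
 * multiple of the preferred minimum. Returns the value to use, or 0 to
 * treat the optimal size as not reported. Sizes are in bytes.
 */
static uint32_t sanitize_io_opt(uint32_t io_min, uint32_t io_opt)
{
	if (io_opt == 0)
		return 0;	/* nothing reported */
	if (io_min != 0 && io_opt % io_min != 0)
		return 0;	/* not a multiple of io_min: ignore it */
	return io_opt;
}
```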

-- 
Martin K. Petersen

