From: Neil Brown <neilb@suse.de>
To: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: "Mike Snitzer" <snitzer@redhat.com>,
"Linus Torvalds" <torvalds@linux-foundation.org>,
"Alasdair G Kergon" <agk@redhat.com>,
jens.axboe@oracle.com, linux-scsi@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org,
linux-ide@vger.kernel.org, linux-fsdevel@vger.kernel.org,
"device-mapper development" <dm-devel@redhat.com>
Subject: Re: [dm-devel] REQUEST for new 'topology' metrics to be moved out of the 'queue' sysfs directory.
Date: Tue, 7 Jul 2009 11:47:06 +1000
Message-ID: <19026.43290.340555.774690@notabene.brown>
In-Reply-To: message from Martin K. Petersen on Friday June 26
On Friday June 26, martin.petersen@oracle.com wrote:
> >>>>> "Neil" == Neil Brown <neilb@suse.de> writes:
>
> Neil> Providing the fields are clearly and unambiguously documented so
> Neil> that I can use the documentation to verify the implementation
> Neil> (in md at least), I will be satisfied.
>
> The current sysfs documentation says:
>
> /sys/block/<disk>/queue/minimum_io_size:
> [...] For RAID arrays it is often the stripe chunk size.
>
> /sys/block/<disk>/queue/optimal_io_size:
> [...] For RAID devices it is usually the stripe width or the internal
> block size.
>
> The latter should be "internal track size". But in the context of MD I
> think those two definitions are crystal clear.
They might be "clear" but I'm not convinced that they are "correct".
>
>
> As far as making the application of these values more obvious I propose
> the following:
>
> What: /sys/block/<disk>/queue/minimum_io_size
> Date: April 2009
> Contact: Martin K. Petersen <martin.petersen@oracle.com>
> Description:
> Storage devices may report a granularity or minimum I/O
> size which is the device's preferred unit of I/O.
> Requests smaller than this may incur a significant
> performance penalty.
>
> For disk drives this value corresponds to the physical
> block size. For RAID devices it is usually the stripe
> chunk size.
These two paragraphs are contradictory. There is no sense in which a
RAID chunk size is a preferred minimum I/O size.

To some degree it is actually a 'maximum' preferred size for random
IO. If you do random IO in blocks larger than the chunk size then you
risk causing more 'head contention' (at least with RAID0 - with RAID5
the tradeoff is more complex).

If you are talking about "alignment", then yes - the chunk size is an
appropriate size to align on. But so are the block size and the
stripe size and none is, in general, any better than any other.
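
To make the 'head contention' point concrete, here is a minimal sketch
that counts which RAID0 members a single request lands on. It assumes
the simplest possible layout (chunk k of the array lives on member
k % n_disks); the function name and the 64K chunk / 4 disk figures are
only illustrative and are not taken from md.

```python
def raid0_members_touched(offset, length, chunk_size, n_disks):
    """Return the sorted member-disk indexes that one request touches.

    Assumes a plain RAID0 layout where chunk k of the array is stored
    on member disk k % n_disks. Offsets and sizes are in bytes.
    """
    if length <= 0:
        return []
    first_chunk = offset // chunk_size
    last_chunk = (offset + length - 1) // chunk_size
    if last_chunk - first_chunk + 1 >= n_disks:
        # The request spans a full stripe or more: every member is hit.
        return list(range(n_disks))
    return sorted({c % n_disks for c in range(first_chunk, last_chunk + 1)})


chunk = 64 * 1024
# One aligned chunk-sized request keeps a single member busy.
print(raid0_members_touched(0, chunk, chunk, 4))           # [0]
# The same size, misaligned by half a chunk, seeks on two members.
print(raid0_members_touched(chunk // 2, chunk, chunk, 4))  # [0, 1]
# Three chunks in one request tie up three of the four members.
print(raid0_members_touched(0, 3 * chunk, chunk, 4))       # [0, 1, 2]
```

With random IO, every extra member a request touches is a head that
cannot serve some other request, which is why the chunk size behaves
more like an upper bound than a minimum here.
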
Also, you say "may" report. If a device does not report, what happens
to this file? Is it not present, or empty, or does it contain a special
"undefined" value?

I think the answer is that "512" is reported. It might be good to
explicitly document that.
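
Until the documentation settles this, a reader of the file has to
guard against all three possibilities. A rough sketch of what that
looks like follows; the helper name read_queue_limit and its sysfs_root
parameter are hypothetical, and returning None for "absent or empty"
is just one possible convention.

```python
import os


def read_queue_limit(disk, name, sysfs_root="/sys/block"):
    """Read /sys/block/<disk>/queue/<name> as an integer.

    Returns None when the file is absent or empty, so the caller can
    tell "not reported" apart from a real value such as 512 or 0.
    """
    path = os.path.join(sysfs_root, disk, "queue", name)
    try:
        with open(path) as f:
            text = f.read().strip()
    except FileNotFoundError:
        return None   # file not present at all
    if not text:
        return None   # file present but empty
    return int(text)


# Example call; the result depends on the machine and on the disk name.
print(read_queue_limit("sda", "minimum_io_size"))
```

If the file is guaranteed to exist and to fall back to 512, the two
None branches become dead code, which is exactly the sort of thing the
documentation should let a reader conclude.
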
>
> A properly aligned multiple of minimum_io_size is the
> preferred request size for workloads where a high number
> of I/O operations is desired.
>
>
> What: /sys/block/<disk>/queue/optimal_io_size
> Date: April 2009
> Contact: Martin K. Petersen <martin.petersen@oracle.com>
> Description:
> Storage devices may report an optimal transfer length or
> streaming I/O size which is the device's preferred unit
> of sustained I/O. This value is a multiple of the
> device's minimum_io_size.
>
> optimal_io_size is rarely reported for disk drives. For
> RAID devices it is usually the stripe width or the
> internal track size.
>
> A properly aligned multiple of optimal_io_size is the
> preferred request size for workloads where sustained
> throughput is desired.
In this case, if a device does not report an optimal size, the file
contains "0" - correct? Should that be explicit?

I'd really like to see an example of how you expect filesystems to use
this.

I can well imagine the VM or elevator using this to assemble IO
requests into properly aligned requests. But I cannot imagine how
e.g. mkfs would use it.

Or am I misunderstanding and this is for programs that use O_DIRECT on
the block device so they can optimise their request stream?
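
For the O_DIRECT or elevator case, the kind of use I can picture is
roughly the following: cut a long sequential range into pieces that
end on boundaries of the preferred unit. This is only a sketch under
assumptions: that "0" in optimal_io_size means "not reported", and that
falling back to minimum_io_size is then reasonable. The names io_unit
and split_aligned are made up for illustration.

```python
def io_unit(minimum_io_size, optimal_io_size):
    """Pick the unit to align sustained IO on.

    Assumes optimal_io_size == 0 means the device reported nothing,
    in which case minimum_io_size is used instead.
    """
    return optimal_io_size if optimal_io_size > 0 else minimum_io_size


def split_aligned(offset, length, unit):
    """Split [offset, offset + length) into (offset, length) pieces.

    Every piece ends on a multiple of `unit`, except possibly the
    last, so only the first and last pieces can be partial units.
    """
    pieces = []
    end = offset + length
    while offset < end:
        boundary = (offset // unit + 1) * unit   # next unit boundary
        nxt = min(boundary, end)
        pieces.append((offset, nxt - offset))
        offset = nxt
    return pieces


# Hypothetical array: 64K chunks, 256K stripe width.
unit = io_unit(64 * 1024, 256 * 1024)
for piece in split_aligned(100 * 1024, 600 * 1024, unit):
    print(piece)
```

Nothing in this needs a filesystem, which is part of why I am asking
where mkfs is supposed to come in.
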
Thanks,
NeilBrown