From: "Martin K. Petersen" <martin.petersen@oracle.com>
To: Neil Brown <neilb@suse.de>
Cc: Mike Snitzer <snitzer@redhat.com>,
linux-scsi@vger.kernel.org, jens.axboe@oracle.com,
linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org,
linux-ide@vger.kernel.org,
device-mapper development <dm-devel@redhat.com>,
"Martin K. Petersen" <martin.petersen@oracle.com>,
linux-fsdevel@vger.kernel.org,
Linus Torvalds <torvalds@linux-foundation.org>,
Alasdair G Kergon <agk@redhat.com>
Subject: Re: REQUEST for new 'topology' metrics to be moved out of the 'queue' sysfs directory.
Date: Tue, 07 Jul 2009 01:29:37 -0400
Message-ID: <yq163e55ada.fsf@sermon.lab.mkp.net>
In-Reply-To: <19026.43290.340555.774690@notabene.brown> (Neil Brown's message of "Tue, 7 Jul 2009 11:47:06 +1000")
>>>>> "Neil" == Neil Brown <neilb@suse.de> writes:
>> What:        /sys/block/<disk>/queue/minimum_io_size
>> Date:        April 2009
>> Contact:     Martin K. Petersen <martin.petersen@oracle.com>
>> Description:
>> Storage devices may report a granularity or minimum I/O size which is
>> the device's preferred unit of I/O. Requests smaller than this may
>> incur a significant performance penalty.
>>
>> For disk drives this value corresponds to the physical block
>> size. For RAID devices it is usually the stripe chunk size.
Neil> These two paragraphs are contradictory. There is no sense in
Neil> which a RAID chunk size is a preferred minimum I/O size.
Maybe not for MD. This is not just about MD.
This is a hint that says "Please don't send me random I/Os smaller than
this. And please align to a multiple of this value".
I agree that for MD devices the alignment portion of that is the
important one. However, putting a lower boundary on the size *is* quite
important for 4KB disk drives. There are also HW RAID devices that
choke on requests smaller than the chunk size.
I appreciate the difficulty in filling out these hints in a way that
makes sense for all the supported RAID levels in MD. However, I really
don't consider the hints particularly interesting in the isolated
context of MD. To me the hints are conduits for characteristics of the
physical storage. The question you should be asking yourself is: "What
do I put in these fields to help the filesystem so that we get the most
out of the underlying, slow hardware?".
I think it is futile to keep spending time coming up with terminology
that encompasses all current and future software and hardware storage
devices with 100% accuracy.
Neil> To some degree it is actually a 'maximum' preferred size for
Neil> random IO. If you do random IO in blocks larger than the chunk
Neil> size then you risk causing more 'head contention' (at least with
Neil> RAID0 - with RAID5 the tradeoff is more complex).
Please elaborate.
Neil> Also, you say "may" report. If a device does not report, what
Neil> happens to this file. Is it not present, or empty, or contain a
Neil> special "undefined" value? I think the answer is that "512" is
Neil> reported.
The answer is physical_block_size.
Neil> In this case, if a device does not report an optimal size, the
Neil> file contains "0" - correct? Should that be explicit?
Now documented.
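The sketch below shows how a userspace tool might read these attributes under the semantics given above:

* minimum_io_size already falls back to physical_block_size.
* An optimal_io_size of 0 means the device reported nothing.

It assumes each attribute is a file in the queue directory holding a single integer, as in the quoted documentation. read_queue_hints is a hypothetical helper. Its fallback for a zero minimum_io_size is purely defensive and is not something the kernel is expected to produce.

```python
from pathlib import Path


def read_queue_hints(queue_dir: str) -> dict:
    """Read I/O topology hints from a sysfs queue directory,
    e.g. /sys/block/sda/queue (example path)."""
    queue = Path(queue_dir)

    def read_int(name: str) -> int:
        return int((queue / name).read_text().strip())

    physical = read_int("physical_block_size")
    # The kernel reports physical_block_size here when the device gives no
    # hint; treating 0 as "use physical_block_size" is only a defensive
    # assumption on the reader's side.
    minimum = read_int("minimum_io_size") or physical
    # 0 means "no optimal I/O size reported"; map it to None.
    optimal = read_int("optimal_io_size") or None
    return {
        "logical_block_size": read_int("logical_block_size"),
        "physical_block_size": physical,
        "minimum_io_size": minimum,
        "optimal_io_size": optimal,
    }
```

Usage would be along the lines of read_queue_hints("/sys/block/sda/queue"), with the device name substituted.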
Neil> I'd really like to see an example of how you expect filesystems to
Neil> use this. I can well imagine the VM or elevator using this to
Neil> assemble IO requests in to properly aligned requests. But I
Neil> cannot imagine how e.g. mkfs would use it. Or am I
Neil> misunderstanding and this is for programs that use O_DIRECT on the
Neil> block device so they can optimise their request stream?
The way it has been working so far (with the manual ioctl pokage) is
that mkfs will align metadata as well as data on a minimum_io_size
boundary. And it will try to use the minimum_io_size as filesystem
block size. On Linux that's currently limited by the fact that we can't
have blocks bigger than a page. The filesystem can also report the
optimal I/O size in statfs. For XFS the stripe width also affects how
the realtime/GRIO allocators work.
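A minimal sketch of the mkfs behaviour described above. It assumes byte-valued hints and a 4 KB page, and works as follows:

* The block size is minimum_io_size, capped at the page size.
* The first data block after the metadata is rounded up to a minimum_io_size boundary.
* A non-zero optimal_io_size is kept as a stripe-width hint.

plan_layout and its fields are hypothetical. This is not how any particular mkfs implements it.

```python
def align_up(value: int, boundary: int) -> int:
    """Round value up to the next multiple of boundary."""
    return -(-value // boundary) * boundary


def plan_layout(minimum_io_size: int, optimal_io_size: int, metadata_bytes: int,
                logical_block_size: int = 512, page_size: int = 4096) -> dict:
    # Use minimum_io_size as the filesystem block size, but Linux cannot
    # use blocks bigger than a page, and never go below the logical block size.
    block_size = max(logical_block_size, min(minimum_io_size, page_size))
    # Metadata starts at offset 0 (already aligned); place the first data
    # block on the next minimum_io_size boundary after it.
    data_start = align_up(metadata_bytes, minimum_io_size)
    # optimal_io_size == 0 means "not reported": no stripe-width hint.
    stripe_width = optimal_io_size or None
    return {"block_size": block_size, "data_start": data_start,
            "stripe_width": stripe_width}


# Hypothetical RAID with 64 KB chunks and a 256 KB full stripe:
print(plan_layout(65536, 262144, metadata_bytes=10000))
# {'block_size': 4096, 'data_start': 65536, 'stripe_width': 262144}

# Plain 4 KB-sector drive with no optimal size reported:
print(plan_layout(4096, 0, metadata_bytes=10000))
# {'block_size': 4096, 'data_start': 12288, 'stripe_width': None}
```

The RAID case shows the page-size cap at work. The chunk size still drives alignment, even though the block size cannot follow it past 4 KB.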
--
Martin K. Petersen Oracle Linux Engineering