From: Martin Steigerwald <Martin@lichtvoll.de>
To: Jens Axboe <axboe@kernel.dk>
Cc: Jeff Moyer <jmoyer@redhat.com>, fio@vger.kernel.org
Subject: Re: Measuring IOPS
Date: Thu, 4 Aug 2011 11:34:32 +0200
Message-ID: <201108041134.33127.Martin@lichtvoll.de>
In-Reply-To: <4E3A5F36.5030608@kernel.dk>
On Thursday, 4 August 2011, Jens Axboe wrote:
> On 2011-08-04 10:51, Martin Steigerwald wrote:
> > On Wednesday, 3 August 2011, Martin Steigerwald wrote:
> >> On Wednesday, 3 August 2011, Martin Steigerwald wrote:
> >>> On Wednesday, 3 August 2011, you wrote:
> >>>> Martin Steigerwald <Martin@lichtvoll.de> writes:
> >> [...]
> >>
> >>> Does using iodepth > 1 need ioengine=libaio? Let's see the manpage:
> >>> iodepth=int
> >>>
> >>> Number of I/O units to keep in flight against the
> >>> file. Note that increasing iodepth beyond 1 will
> >>> not affect synchronous ioengines (except for small
> >>> degrees when verify_async is in use). Even async
> >>> engines may impose OS restrictions causing the
> >>> desired depth not to be achieved. This may happen
> >>> on Linux when using libaio and not setting
> >>> direct=1, since buffered IO is not async on that
> >>> OS. Keep an eye on the IO depth distribution in
> >>> the fio output to verify that the achieved depth
> >>> is as expected. Default: 1.
> >>>
> >>> Okay, yes, it does. I'm starting to get the hang of it. It's a bit
> >>> puzzling to have two concepts of synchronous I/O around:
> >>>
> >>> 1) synchronous system call interfaces aka fio I/O engine
> >>>
> >>> 2) synchronous I/O requests aka O_SYNC
> >>
> >> But isn't this a case for iodepth=1 if buffered I/O on Linux is
> >> synchronous? I bet most regular applications except some databases
> >> use buffered I/O.
> >
> > Thanks a lot for your answers, Jens, Jeff, DongJin.
> >
> > Now what about the above one?
> >
> > In what cases is iodepth > 1 relevant when Linux buffered I/O is
> > synchronous? For multiple threads or processes?
>
> iodepth controls what depth fio operates at, not the OS. You are right
> in that with iodepth=1, for buffered writes you could be seeing a much
> higher depth on the device side.
>
> So think of iodepth as how many IO units fio can have in flight,
> nothing else.
Ah okay. So when using iodepth=64 and ioengine=libaio, fio issues 64 I/O
requests at once before it waits for any of them to complete. And as the
block layer completes I/O requests, fio refills its 64-request queue.
Right?
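A minimal Python sketch of that submit-then-refill pattern, purely as a model: threads and sleep() stand in for asynchronous I/O, so this is not how fio or libaio work internally, and the function name run_with_iodepth is hypothetical. It submits up to iodepth fake I/O units before waiting, then tops the queue back up each time one completes, and reports the peak number in flight.

```python
import concurrent.futures as cf
import threading
import time


def run_with_iodepth(total_ios, iodepth, service_time=0.01):
    """Keep at most `iodepth` fake I/O units in flight, refilling on completion.

    Hypothetical model only: a thread pool plus sleep() emulates async I/O.
    Returns (completed unit numbers in order, peak units in flight).
    """
    if iodepth < 1:
        raise ValueError("iodepth must be >= 1")
    lock = threading.Lock()
    in_flight = 0
    peak = 0

    def fake_io(n):
        nonlocal in_flight, peak
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        time.sleep(service_time)  # pretend the device is busy
        with lock:
            in_flight -= 1
        return n

    done = []
    next_io = 0
    pending = set()
    with cf.ThreadPoolExecutor(max_workers=iodepth) as pool:
        # Initial burst: submit up to iodepth units before waiting on any.
        while next_io < total_ios and len(pending) < iodepth:
            pending.add(pool.submit(fake_io, next_io))
            next_io += 1
        # Refill: whenever something completes, top the queue back up.
        while pending:
            finished, pending = cf.wait(pending, return_when=cf.FIRST_COMPLETED)
            done.extend(f.result() for f in finished)
            while next_io < total_ios and len(pending) < iodepth:
                pending.add(pool.submit(fake_io, next_io))
                next_io += 1
    return sorted(done), peak
```

For example, run_with_iodepth(256, 64) completes all 256 units while never having more than 64 outstanding.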
Now when I have two jobs running at once with iodepth=64, will each
process submit 64 I/O requests before waiting, thus having at most 128 I/O
requests in flight? Or will each process use 32 I/O requests? My bet is
that iodepth is per job, i.e. per process.
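If the bet is right and iodepth is a per-job setting, the fio-side bound is simple arithmetic: the per-job depths add up, and a sync engine contributes at most 1 per job no matter what iodepth says (as Jens explains below). A small sketch under exactly those assumptions; max_outstanding is a hypothetical helper, and SYNC_ENGINES is only a subset of fio's sync-style engine names.

```python
# Subset of fio's synchronous engine names (assumed, not exhaustive).
SYNC_ENGINES = {"sync", "psync", "vsync"}


def max_outstanding(jobs):
    """Upper bound on I/O units fio itself keeps in flight across all jobs.

    `jobs` is a list of (ioengine, iodepth) pairs. Assumes iodepth applies
    per job and that a sync engine never has more than one unit in flight.
    This says nothing about the depth the device actually sees.
    """
    total = 0
    for engine, iodepth in jobs:
        total += 1 if engine in SYNC_ENGINES else iodepth
    return total


# Two libaio jobs at iodepth=64 -> up to 128; two sync jobs -> up to 2.
```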
> > One process / thread can only submit one I/O at a time with
> > synchronous system call I/O, but the function returns when the stuff
> > is in the page cache. So first why can't Linux use iodepth > 1 when
> > there is lots of stuff in the page cache to be written out? That
> > should help the single process case.
>
> Since the IO unit is done when the system call returns, you can never
> have more than the one in flight for a sync engine. So iodepth > 1
> makes no sense for a sync engine.
That makes perfect sense, now that I understand that the iodepth option
relates to what the fio processes do.
> > In the multiple process/thread case, Linux gets several I/O requests
> > from multiple processes/threads, and thus iodepth > 1 does make sense?
>
> No.
Since each fio job using synchronous system call I/O still submits one
I/O at a time...
> > Maybe it helps to get clear where in the stack iodepth is located.
> > Is it
> >
> > process / thread
> > systemcall
> > pagecache
> > blocklayer
> > iodepth
> > device driver
> > device
> >
> > ? If so, why can't Linux make use of iodepth > 1 with
> > synchronous system call I/O? Or is it further up on the system call
> > level? But then
>
> Because it is sync. The very nature of the sync system calls is that
> submission and completion are one event. For libaio, you could submit a
> bunch of requests before retrieving or waiting for completion of any
> one of them.
>
> The only example where a sync engine could drive a higher queue depth
> on the device side is buffered writes. For any other case (reads,
> direct writes), you need async submission to build up a higher queue
> depth.
Great! I think that makes it pretty clear.
Thus when I want to read consecutive blocks 1, 2, 3, 4, 5, 6, 7, 8, 9 and
10 from a file at once and then wait, I need async I/O. The blocks might be
of arbitrary size.
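The "submit a batch, then wait" shape can be sketched in Python. This is only an emulation: a thread pool issues blocking os.pread() calls so that all ten reads are submitted before any result is collected. It is not libaio. BLOCK_SIZE and read_blocks_batched are hypothetical names, and the 4 KiB block size is arbitrary.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4096  # arbitrary block size for this sketch


def read_blocks_batched(fd, block_numbers, block_size=BLOCK_SIZE):
    """Submit all block reads up front, then wait for every completion.

    A thread pool emulates asynchronous submission with blocking os.pread();
    this mimics the libaio "submit many, then reap" pattern, not libaio itself.
    """
    block_numbers = list(block_numbers)
    with ThreadPoolExecutor(max_workers=max(1, len(block_numbers))) as pool:
        # Submission phase: nothing is waited on yet.
        futures = [pool.submit(os.pread, fd, block_size, n * block_size)
                   for n in block_numbers]
        # Completion phase: collect results in submission order.
        return [f.result() for f in futures]


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        for n in range(11):  # blocks 0..10, each filled with its own number
            f.write(bytes([n]) * BLOCK_SIZE)
        f.flush()
        blocks = read_blocks_batched(f.fileno(), range(1, 11))
    print([b[0] for b in blocks])  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```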
What if I use 10 processes, each reading one of these blocks at once?
Couldn't this fill up the queue at the device level? Then again, different
processes usually read different files...
... my question hints at how I/O depths might accumulate at the device
level when several processes are issuing read and/or write requests at
once.
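A rough Python sketch of that accumulation, under loud assumptions: threads stand in for the 10 processes, each worker does one plain blocking os.pread(), and an artificial sleep stands in for device latency so the overlap becomes visible. Each worker never has more than one read outstanding, yet together they can have up to 10. The name parallel_sync_readers is hypothetical.

```python
import os
import tempfile
import threading
import time


def parallel_sync_readers(fd, block_numbers, block_size=4096, latency=0.02):
    """One worker per block, each doing a single blocking pread().

    Returns ({block: data}, peak concurrent reads). `latency` is a simulated
    device service time (a sleep), not a measurement of real I/O.
    """
    block_numbers = list(block_numbers)
    lock = threading.Lock()
    in_flight = 0
    peak = 0
    results = {}
    start = threading.Barrier(len(block_numbers))  # needs at least one block

    def worker(n):
        nonlocal in_flight, peak
        start.wait()  # line all workers up so their reads overlap
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        data = os.pread(fd, block_size, n * block_size)
        time.sleep(latency)  # simulated device latency
        with lock:
            in_flight -= 1
            results[n] = data

    threads = [threading.Thread(target=worker, args=(n,)) for n in block_numbers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, peak


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        for n in range(11):
            f.write(bytes([n]) * 4096)
        f.flush()
        results, peak = parallel_sync_readers(f.fileno(), range(1, 11))
    print("peak concurrent reads:", peak)
```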
> > what sense would it make there, when using system calls that are
> > asynchronous already?
> > (Is that ordering above correct at all?)
>
> Your ordering looks OK. Now consider where and how you end up waiting
> for issued IO, that should tell you where queue depth could build up or
> not.
So we have two levels of queue depth:
- queue depth at the system call level
- queue depth at device level
=== sync I/O engines ===
queue depth at the system call level = 1
== reads ==
queue depth at the device level = 1
since read() returns when the data is in RAM and thus is synchronous I/O
on the lower level by nature
page cache will be used unless direct=1, so one might be measuring RAM /
read ahead performance, especially when several read jobs are running
concurrently.
writes might not hit the device unless direct=1, and thus one should use
a file size larger than RAM.
== writes ==
queue depth at the device level = depending on the workload, up to what the
device supports,
unless direct=1, because then write() does synchronous I/O at the lower
level and only returns once the data is at least in the drive cache
=== libaio ===
queue depth at the system call level = iodepth option of fio
as long as direct=1, since libaio effectively falls back to synchronous
behaviour with buffered I/O
queue depth at the device level = same
fio submits as many I/Os as specified by iodepth and only then waits. As the
block layer completes I/Os, fio refills the queue.
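The system-call column of this summary can be written down as a small lookup, as a sketch only. It encodes just what the thread and the quoted man page state: sync engines stay at depth 1, and libaio reaches iodepth only with direct=1, because buffered I/O is not async on Linux. Device-level depth is deliberately not modelled. The function name and the engine subset are assumptions.

```python
# Subset of fio's synchronous engine names (assumed, not exhaustive).
SYNC_ENGINES = {"sync", "psync", "vsync"}


def syscall_level_depth(ioengine, iodepth, direct):
    """Depth fio can roughly reach at the system-call level.

    Sketch of the summary above, not a model of the device queue.
    """
    if ioengine in SYNC_ENGINES:
        return 1  # submission and completion are one event
    if ioengine == "libaio":
        # Buffered I/O is not async on Linux, so without direct=1 each
        # submission effectively completes before the next one is issued.
        return iodepth if direct else 1
    raise ValueError(f"engine not covered by this sketch: {ioengine}")
```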
conclusion:
Thus when I want to measure higher I/O depths for reads, I need libaio and
direct=1. But then I am measuring something that has no practical effect
on processes that use synchronous system call I/O.
So for regular applications, ioengine=sync + iodepth=64 gives more
realistic results - even when it's then just I/O depth 1 for reads - and
for databases that use direct I/O, ioengine=libaio makes sense and will
cause higher I/O depths on the device side if the device supports it.
Anything without direct=1 (or the slower sync=1) is potentially measuring
RAM performance. direct=1 bypasses the page cache. sync=1 basically
disables caching on the device / controller side as well.
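For the buffered case, the earlier advice to use a file larger than RAM can be turned into a quick size check. This is a Linux-only sketch that relies on os.sysconf("SC_PAGE_SIZE") and os.sysconf("SC_PHYS_PAGES"). The helper name is hypothetical, and the default factor of 2 is an arbitrary safety margin, not an fio recommendation.

```python
import os


def min_test_file_size(factor=2):
    """Suggest a test file size (bytes) exceeding physical RAM by `factor`.

    With buffered I/O (no direct=1), a file that fits in RAM may be served
    largely from the page cache. Linux-only: uses os.sysconf().
    """
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return factor * ram_bytes


if __name__ == "__main__":
    print(f"use at least {min_test_file_size() / 2**30:.1f} GiB")
```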
Thanks,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7