From: Martin Steigerwald <Martin@lichtvoll.de>
To: Jens Axboe <axboe@kernel.dk>
Cc: Jeff Moyer <jmoyer@redhat.com>, fio@vger.kernel.org
Subject: Re: Measuring IOPS
Date: Thu, 4 Aug 2011 11:34:32 +0200
Message-ID: <201108041134.33127.Martin@lichtvoll.de>
In-Reply-To: <4E3A5F36.5030608@kernel.dk>

On Thursday, 4 August 2011, Jens Axboe wrote:
> On 2011-08-04 10:51, Martin Steigerwald wrote:
> > On Wednesday, 3 August 2011, Martin Steigerwald wrote:
> >> On Wednesday, 3 August 2011, Martin Steigerwald wrote:
> >>> On Wednesday, 3 August 2011, you wrote:
> >>>> Martin Steigerwald <Martin@lichtvoll.de> writes:
> >> [...]
> >>
> >>> Does using iodepth > 1 need ioengine=libaio? Let's see the manpage:
> >>> iodepth=int
> >>>
> >>> Number of I/O units to keep in flight against the
> >>> file. Note that increasing iodepth beyond 1 will
> >>> not affect synchronous ioengines (except for small
> >>> degrees when verify_async is in use). Even async
> >>> engines may impose OS restrictions causing the
> >>> desired depth not to be achieved. This may happen
> >>> on Linux when using libaio and not setting
> >>> direct=1, since buffered IO is not async on that
> >>> OS. Keep an eye on the IO depth distribution in
> >>> the fio output to verify that the achieved depth
> >>> is as expected. Default: 1.
> >>>
> >>> Okay, yes, it does. I am starting to get the hang of it. It's a bit
> >>> puzzling to have two concepts of synchronous I/O around:
> >>>
> >>> 1) synchronous system call interfaces aka fio I/O engine
> >>>
> >>> 2) synchronous I/O requests aka O_SYNC
> >>
> >> But isn't this a case for iodepth=1 if buffered I/O on Linux is
> >> synchronous? I bet most regular applications except some databases
> >> use buffered I/O.
> >
> > Thanks a lot for your answers, Jens, Jeff, DongJin.
> >
> > Now what about the above one?
> >
> > In what cases is iodepth > 1 relevant, when Linux buffered I/O is
> > synchronous? For multiple threads or processes?
>
> iodepth controls what depth fio operates at, not the OS. You are right
> in that with iodepth=1, for buffered writes you could be seeing a much
> higher depth on the device side.
>
> So think of iodepth as how many IO units fio can have in flight,
> nothing else.

Ah, okay. So with iodepth=64 and ioengine=libaio, fio issues 64 I/O
requests at once before it bothers waiting for any of them to complete.
And as the block layer completes I/O requests, fio refills the queue so
that 64 requests stay in flight. Right?

Now when I have two jobs running at once with iodepth=64, will each
process submit 64 I/O requests before waiting, so that at most 128 I/O
requests are in flight? Or will each process use 32 I/O requests? My bet
is that iodepth is per job, per process.

> > One process / thread can only submit one I/O at a time with
> > synchronous system call I/O, but the function returns when the stuff
> > is in the page cache. So first why can't Linux use iodepth > 1 when
> > there is lots of stuff in the page cache to be written out? That
> > should help the single process case.
>
> Since the IO unit is done when the system call returns, you can never
> have more than the one in flight for a sync engine. So iodepth > 1
> makes no sense for a sync engine.

That makes perfect sense now that I understand that the iodepth option
relates to what the fio processes do.

> > In the multiple processes/threads case Linux gets several I/O requests
> > from multiple processes/threads and thus iodepth > 1 does make sense?
>
> No.

Since each fio job that uses synchronous system call I/O still submits
one I/O at a time...

> > Maybe it helps getting clear where in the stack iodepth is located
> > at, is it
> >
> > process / thread
> > systemcall
> > pagecache
> > blocklayer
> > iodepth
> > device driver
> > device
> >
> > ? If so, why can't Linux make use of iodepth > 1 with
> > synchronous system call I/O? Or is it further up on the system call
> > level? But then
>
> Because it is sync. The very nature of the sync system calls is that
> submission and completion are one event. For libaio, you could submit a
> bunch of requests before retrieving or waiting for completion of any
> one of them.
>
> The only example where a sync engine could drive a higher queue depth
> on the device side is buffered writes. For any other case (reads,
> direct writes), you need async submission to build up a higher queue
> depth.

Great! I think that makes it pretty clear.

Thus when I want to read the consecutive blocks 1, 2, 3, 4, 5, 6, 7, 8, 9
and 10 from a file at once and then wait, I need async I/O. The blocks
might be of arbitrary size.

What if I use 10 processes, each reading one of these blocks at once?
Couldn't this fill up the queue at the device level? But then different
processes usually read different files...

... my question hints at how I/O depths might accumulate at the device
level when several processes are issuing read and/or write requests at
once.

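To make the "10 readers" case concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: it uses threads instead of processes, the names read_block, read_blocks_concurrently and make_test_file are made up for this example, and the 4096-byte block size is arbitrary. Each worker issues exactly one blocking pread(), so no single thread ever has more than one read in flight, yet up to ten reads can be outstanding at the same time. Whether they actually reach the device depends on the page cache, since these are buffered reads.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4096   # assumed block size, any size would do
NUM_BLOCKS = 10


def read_block(fd, index):
    # One blocking pread(): this thread has at most one read in flight.
    return os.pread(fd, BLOCK_SIZE, index * BLOCK_SIZE)


def read_blocks_concurrently(path, num_blocks=NUM_BLOCKS):
    fd = os.open(path, os.O_RDONLY)
    try:
        # One worker per block, each sitting in its own pread(),
        # so up to num_blocks reads can be outstanding at once.
        with ThreadPoolExecutor(max_workers=num_blocks) as pool:
            return list(pool.map(lambda i: read_block(fd, i), range(num_blocks)))
    finally:
        os.close(fd)


def make_test_file(num_blocks=NUM_BLOCKS):
    # Block i is filled with the byte value i so the result is easy to check.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        for i in range(num_blocks):
            f.write(bytes([i]) * BLOCK_SIZE)
        return f.name
```

Calling read_blocks_concurrently(make_test_file()) returns the ten blocks in order. The depth comes from the number of concurrent submitters, not from any one of them.
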
> > what sense would it make there, when using system calls that are
> > asynchronous already?
> > (Is that ordering above correct at all?)
>
> Your ordering looks OK. Now consider where and how you end up waiting
> for issued IO, that should tell you where queue depth could build up or
> not.

So we have several levels of queue depth:

- queue depth at the system call level
- queue depth at the device level

=== sync I/O engines ===

queue depth at the system call level = 1

== reads ==

queue depth at the device level = 1

since read() returns only when the data is in RAM and thus is synchronous
I/O on the lower level by nature.

The page cache will be used unless direct=1, so one might be measuring RAM /
read-ahead performance, especially when several read jobs are running
concurrently.

Writes might not hit the device unless direct=1, and thus one should use
a file size larger than RAM.

== writes ==

queue depth at the device level = depends on the workload, up to what the
device supports

unless direct=1, because then write() does synchronous I/O on the lower
level and only returns when the data is at least in the drive cache.

=== libaio ===

queue depth at the system call level = iodepth option of fio

as long as direct=1, since libaio behaves synchronously with buffered
I/O on Linux.

queue depth at the device level = same

fio submits as many I/Os as specified by iodepth and only then waits. As the
block layer completes I/Os, fio refills the queue.
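
The refill behaviour described above can be sketched as a toy simulation in Python. This is not fio code, only my mental model of it: simulate_queue is a made-up name, every I/O takes the same service_time, and the device is assumed to work on all in-flight requests in parallel.

```python
import heapq


def simulate_queue(iodepth, total_ios, service_time=1.0):
    """Toy model: return (max I/Os in flight, finish time) for an async submitter."""
    in_flight = []      # min-heap of completion times of outstanding I/Os
    now = 0.0
    submitted = 0
    max_in_flight = 0
    while submitted < total_ios or in_flight:
        # Submit until iodepth units are in flight or nothing is left to submit.
        while submitted < total_ios and len(in_flight) < iodepth:
            heapq.heappush(in_flight, now + service_time)
            submitted += 1
        max_in_flight = max(max_in_flight, len(in_flight))
        # Wait for the earliest completion; that frees one slot in the queue.
        now = heapq.heappop(in_flight)
    return max_in_flight, now
```

Under these assumptions simulate_queue(64, 128) keeps 64 I/Os in flight and finishes after two service times, while simulate_queue(1, 128) behaves like a sync engine and needs 128 service times.
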

Conclusion:

Thus when I want to measure higher I/O depths for reads, I need libaio and
direct=1. But then I am measuring something that does not have any
practical effect on processes that use synchronous system call I/O.

So for regular applications ioengine=sync + iodepth=64 gives more
realistic results, even though it is then just I/O depth 1 for reads, and
for databases that use direct I/O ioengine=libaio makes sense and will
cause higher I/O depths on the device side if the device supports it.

Anything without direct=1 (or the slower sync=1) is potentially measuring
RAM performance. direct=1 bypasses the page cache. sync=1 basically disables
caching on the device / controller side as well.

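
As a compact restatement of the summary above, here is a small Python function. It only encodes my understanding as written in this mail, per job; it is not taken from fio, and the name device_queue_depth and the string "workload-dependent" are made up for this sketch.

```python
def device_queue_depth(engine, rw, direct, iodepth=1):
    """Per-job device-side queue depth according to the summary in this mail.

    Returns an int, or the string "workload-dependent" for buffered writes,
    where writeback decides how much is queued at the device.
    """
    if rw not in ("read", "write"):
        raise ValueError("rw must be 'read' or 'write'")
    if engine == "sync":
        # Sync engines: one I/O per system call. Only buffered writes
        # can build up a deeper queue below the page cache.
        if rw == "write" and not direct:
            return "workload-dependent"
        return 1
    if engine == "libaio":
        # With direct=1 the requested iodepth can reach the device;
        # buffered I/O behaves like the sync case.
        if direct:
            return iodepth
        return "workload-dependent" if rw == "write" else 1
    raise ValueError("unknown engine: %r" % engine)
```

For example, device_queue_depth("sync", "read", False, iodepth=64) gives 1, and device_queue_depth("libaio", "read", True, iodepth=64) gives 64.
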
Thanks,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7