Re: Measuring IOPS - Martin Steigerwald


From: Martin Steigerwald <Martin@lichtvoll.de>
To: Jens Axboe <axboe@kernel.dk>
Cc: Jeff Moyer <jmoyer@redhat.com>, fio@vger.kernel.org
Subject: Re: Measuring IOPS
Date: Thu, 4 Aug 2011 11:34:32 +0200
Message-ID: <201108041134.33127.Martin@lichtvoll.de>
In-Reply-To: <4E3A5F36.5030608@kernel.dk>

On Thursday, 4 August 2011, Jens Axboe wrote:
> On 2011-08-04 10:51, Martin Steigerwald wrote:
> > On Wednesday, 3 August 2011, Martin Steigerwald wrote:
> >> On Wednesday, 3 August 2011, Martin Steigerwald wrote:
> >>> On Wednesday, 3 August 2011, you wrote:
> >>>> Martin Steigerwald <Martin@lichtvoll.de> writes:
> >> [...]
> >> 
> >>> Does using iodepth > 1 need ioengine=libaio? Let's see the manpage:
> >>>        iodepth=int
> >>>        
> >>>               Number  of I/O units to keep in flight against the
> >>>               file. Note that increasing iodepth beyond  1  will
> >>>               not affect synchronous ioengines (except for small
> >>>               degrees when verify_async is in use).  Even  async
> >>>               engines  may  impose  OS  restrictions  causing the
> >>>               desired depth not to be achieved.  This may happen
> >>>               on   Linux  when  using  libaio  and  not  setting
> >>>               direct=1, since buffered IO is not async  on  that
> >>>               OS.  Keep  an  eye on the IO depth distribution in
> >>>               the fio output to verify that the  achieved  depth
> >>>               is as expected. Default: 1.
> >>> 
> >>> Okay, yes, it does. I'm starting to get the hang of it. It's a bit
> >>> puzzling to have two concepts of synchronous I/O around:
> >>> 
> >>> 1) synchronous system call interfaces aka fio I/O engine
> >>> 
> >>> 2) synchronous I/O requests aka O_SYNC
> >> 
> >> But isn't this a case for iodepth=1 if buffered I/O on Linux is
> >> synchronous? I bet most regular applications except some databases
> >> use buffered I/O.
> > 
> > Thanks a lot for your answers, Jens, Jeff, DongJin.
> > 
> > Now what about the above one?
> > 
> > In what cases is iodepth > 1 relevant when Linux buffered I/O is
> > synchronous? For multiple threads or processes?
> 
> iodepth controls what depth fio operates at, not the OS. You are right
> in that with iodepth=1, for buffered writes you could be seeing a much
> higher depth on the device side.
> 
> So think of iodepth as how many IO units fio can have in flight,
> nothing else.

Ah, okay. So when using iodepth=64 and ioengine=libaio, fio issues 64 I/O
requests at once before it waits for any of them to complete. And as the
block layer completes I/O requests, fio refills its 64-entry queue. Right?
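To make the "fill up, then refill" loop concrete, here is a toy model (not fio's code) of an async engine with a fixed iodepth. Completion times are random made-up latencies, fio's batching options are ignored, and iodepth is assumed to be at least 1; the point is only that the number in flight never exceeds iodepth and is topped up after every completion.

```python
import heapq
import random


def simulate_async_engine(total_ios, iodepth, seed=0):
    """Toy async engine: keep up to `iodepth` I/Os in flight, refilling
    the queue as completions arrive. Returns (peak_in_flight, completed)."""
    rng = random.Random(seed)
    now = 0.0
    in_flight = []  # min-heap of completion times of outstanding I/Os
    submitted = completed = peak = 0
    while completed < total_ios:
        # Submit until the queue is full (or nothing is left to submit).
        while submitted < total_ios and len(in_flight) < iodepth:
            heapq.heappush(in_flight, now + rng.uniform(0.1, 1.0))
            submitted += 1
        peak = max(peak, len(in_flight))
        # Wait for the earliest completion, then loop back and refill.
        now = heapq.heappop(in_flight)
        completed += 1
    return peak, completed


print(simulate_async_engine(1000, 64))  # (64, 1000)
print(simulate_async_engine(1000, 1))   # (1, 1000)
```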

Now, when I have two jobs running at once with iodepth=64, will each
process submit 64 I/O requests before waiting, giving at most 128 I/O
requests in flight? Or will each process use 32? My bet is that iodepth is
per job, i.e. per process.
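If iodepth is indeed per job (the bet above), the fio-side bound for an async engine with direct=1 is simply jobs times iodepth. The sketch below adds an optional device queue limit, such as the 32-command NCQ limit of SATA drives, to show that the device may accept fewer; the function and its parameters are hypothetical names for illustration.

```python
def outstanding_ios(numjobs, iodepth, device_queue_limit=None):
    """Upper bounds (fio side, device side) on I/Os in flight, assuming
    each job keeps its own queue of `iodepth` I/O units."""
    fio_side = numjobs * iodepth
    if device_queue_limit is None:
        return fio_side, fio_side
    # Requests beyond the device limit wait in the block layer instead.
    return fio_side, min(fio_side, device_queue_limit)


print(outstanding_ios(2, 64))      # (128, 128)
print(outstanding_ios(2, 64, 32))  # (128, 32)
```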

> > One process / thread can only submit one I/O at a time with
> > synchronous system call I/O, but the function returns when the stuff
> > is in the page cache. So, first, why can't Linux use iodepth > 1 when
> > there is lots of stuff in the page cache to be written out? That
> > should help the single process case.
> 
> Since the IO unit is done when the system call returns, you can never
> have more than the one in flight for a sync engine. So iodepth > 1
> makes no sense for a sync engine.

Makes perfect sense once I understand that the iodepth option relates to
what the fio processes do.

> > In the multiple processes/threads case, Linux gets several I/O requests
> > from multiple processes/threads, and thus iodepth > 1 does make sense?
> 
> No.

Since each fio job using synchronous system call I/O still submits only one
I/O at a time...
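A minimal sketch of what a sync engine does, assuming a hypothetical 4 KiB block size: read a file with one pread() per block. The in-flight counter is only there to make the point visible: it can never exceed 1, because the next pread() cannot be issued until the previous call has returned, so an iodepth setting would have nothing to act on.

```python
import os

BLOCK = 4096  # hypothetical block size for the example


def sync_read_all(path):
    """Read a whole file with pread(), one block per call, like a sync engine.
    Returns (peak_in_flight, blocks)."""
    in_flight = peak = 0
    blocks = []
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while True:
            in_flight += 1                      # submit...
            peak = max(peak, in_flight)
            data = os.pread(fd, BLOCK, offset)  # ...and block until complete
            in_flight -= 1                      # the call has returned
            if not data:                        # end of file
                break
            blocks.append(data)
            offset += len(data)
    finally:
        os.close(fd)
    return peak, blocks
```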

> > Maybe it helps to get clear where in the stack iodepth is located.
> > Is it
> > 
> > process / thread
> > systemcall
> > pagecache
> > blocklayer
> > iodepth
> > device driver
> > device
> > 
> > ? If so, why can't Linux make use of iodepth > 1 with
> > synchronous system call I/O? Or is it further up, at the system call
> > level? But then
> 
> Because it is sync. The very nature of the sync system calls is that
> submission and completion are one event. For libaio, you could submit a
> bunch of requests before retrieving or waiting for completion of any
> one of them.
> 
> The only example where a sync engine could drive a higher queue depth
> on the device side is buffered writes. For any other case (reads,
> direct writes), you need async submission to build up a higher queue
> depth.

Great! I think that makes it pretty clear.

Thus, when I want to read consecutive blocks 1 through 10 from a file at
once and then wait, I need async I/O. The blocks might be of arbitrary
size.
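The "submit blocks 1 to 10, then wait" pattern can be sketched as follows. Python's standard library has no libaio binding, so this swaps in a thread pool to emulate asynchronous submission; only the shape (a submission phase that never waits, then a completion phase) corresponds to what libaio with direct=1 would do. The block size and the function name are assumptions for illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor

BLOCK = 4096  # hypothetical block size; any size works


def read_blocks_async(path, block_numbers, block_size=BLOCK, depth=10):
    """Issue reads for all requested blocks before waiting for any of them.
    A thread pool stands in for async submission: up to `depth` preads
    can be outstanding at the same time."""
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=depth) as pool:
            # Submission phase: pool.submit() returns immediately.
            futures = {n: pool.submit(os.pread, fd, block_size, n * block_size)
                       for n in block_numbers}
            # Completion phase: only now do we wait for the results.
            return {n: fut.result() for n, fut in futures.items()}
    finally:
        os.close(fd)
```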

What if I use 10 processes, each reading one of these blocks at once?
Couldn't this fill up the queue at the device level? But then, different
processes usually read different files...

... my question is really about how I/O depths might accumulate at the
device level when several processes issue read and/or write requests at
once.
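One way to reason about that accumulation: treat every I/O from every process as a (start, end) interval and count how many overlap at once. The sweep-line sketch below does exactly that; the timestamps in the example are made-up inputs, not measurements. Ten processes that each have one read outstanding at the same time give a device-side depth of 10, while one process issuing reads back to back stays at 1.

```python
def peak_device_depth(requests):
    """requests: iterable of (start, end) times of individual I/Os from any
    number of processes. Returns the largest number outstanding at once."""
    events = []
    for start, end in requests:
        events.append((start, 1))   # I/O submitted
        events.append((end, -1))    # I/O completed
    # Sorting puts -1 before +1 at equal timestamps, so an I/O that
    # completes exactly when another starts is not counted as overlapping.
    events.sort()
    depth = peak = 0
    for _, delta in events:
        depth += delta
        peak = max(peak, depth)
    return peak


print(peak_device_depth([(0.0, 1.0)] * 10))         # 10
print(peak_device_depth([(0, 1), (1, 2), (2, 3)]))  # 1
```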

> > what sense would it make there, when using system calls that are
> > asynchronous already?
> > (Is that ordering above correct at all?)
> 
> Your ordering looks OK. Now consider where and how you end up waiting
> for issued IO, that should tell you where queue depth could build up or
> not.

So we have two levels of queue depth:

- queue depth at the system call level
- queue depth at the device level

=== sync I/O engines ===
queue depth at the system call level = 1

== reads ==
queue depth at the device level = 1
since read() returns only once the data is in RAM, so it is synchronous
I/O on the lower level by nature

page cache will be used unless direct=1, so one might be measuring RAM / 
read ahead performance, especially when several read jobs are running 
concurrently. 

I/O might not hit the device unless direct=1, so one should use a file
size larger than RAM.

== writes ==
queue depth at the device level = depending on the workload, up to what
the device supports

unless direct=1, because then write() does synchronous I/O on the lower
level and only returns once the data has at least reached the drive cache
(device-level depth = 1)

=== libaio ===
queue depth at the system call level = iodepth option of fio

as long as direct=1, since buffered I/O is not async on Linux: without
direct=1, libaio submission effectively becomes synchronous again

queue depth at the device level = same, up to what the device supports

fio submits as many I/Os as specified by iodepth and only then waits. As
the block layer completes I/Os, fio refills the queue.
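The summary above can be condensed into a small rule-of-thumb function. This is a simplified model built only from the statements in this thread and the manpage excerpt quoted earlier, not fio's actual logic; it ignores readahead, request merging and concurrent jobs, and treating buffered libaio as depth 1 is an assumption based on the manpage's "buffered IO is not async" remark. The Job class and its fields are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    ioengine: str     # "sync" or "libaio"
    rw: str           # "read" or "write"
    direct: bool      # fio's direct=1
    iodepth: int = 1


def depth_bounds(job: Job, device_queue_limit: int) -> tuple[int, int]:
    """Rough upper bounds (system call level, device level)."""
    # Only libaio with direct=1 keeps several I/Os in flight; sync calls
    # and buffered libaio complete each submission before returning.
    truly_async = job.ioengine == "libaio" and job.direct
    syscall = job.iodepth if truly_async else 1
    if job.rw == "write" and not job.direct:
        # Buffered writes: writeback, not the caller, drives the device.
        device = device_queue_limit
    else:
        device = min(syscall, device_queue_limit)
    return syscall, device


print(depth_bounds(Job("sync", "write", direct=False), 32))           # (1, 32)
print(depth_bounds(Job("libaio", "read", direct=True, iodepth=64), 32))  # (64, 32)
```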

conclusion:

thus, when I want to measure reads at higher I/O depths, I need libaio and
direct=1. but then I am measuring something that does not reflect what a
single process using synchronous system call I/O will see.

so for regular applications, ioengine=sync gives more realistic results
(iodepth > 1 has no effect with a sync engine, so reads stay at I/O depth
1), and for databases that use direct I/O, ioengine=libaio makes sense and
will cause higher I/O depths on the device side if the device supports it.

anything without direct=1 (or, for writes, the slower sync=1) is
potentially measuring RAM performance. direct=1 bypasses the page cache.
sync=1 makes each write wait until the data has reached stable storage,
which in effect takes the device / controller write cache out of the
picture as well.
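At the system call level, fio's direct=1 and sync=1 usually correspond to the O_DIRECT and O_SYNC open flags. The sketch below opens a file the same way and writes one block; it is Linux-only (os.O_DIRECT), and the 4 KiB block size is an assumption. O_DIRECT needs an aligned buffer, so a page-aligned anonymous mmap is used, and some filesystems (for example tmpfs on older kernels) reject O_DIRECT with EINVAL.

```python
import mmap
import os

BLOCK = 4096  # assumed logical block size; O_DIRECT needs aligned I/O


def open_like_fio(path, direct=False, sync=False):
    """Open for writing with flags roughly matching fio's direct=1/sync=1."""
    flags = os.O_WRONLY | os.O_CREAT
    if direct:
        flags |= os.O_DIRECT  # bypass the page cache (Linux only)
    if sync:
        flags |= os.O_SYNC    # write() returns once data is on stable storage
    return os.open(path, flags, 0o644)


def write_block(path, direct=False, sync=False):
    """Write one BLOCK-sized block at offset 0; returns bytes written."""
    # An anonymous mmap is page-aligned, satisfying O_DIRECT's buffer
    # alignment requirement on typical Linux filesystems.
    buf = mmap.mmap(-1, BLOCK)
    try:
        buf.write(b"a" * BLOCK)
        fd = open_like_fio(path, direct=direct, sync=sync)
        try:
            return os.pwrite(fd, buf, 0)
        finally:
            os.close(fd)
    finally:
        buf.close()
```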

Thanks,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7
