qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH 0/3] linux-aio: reduce completion latency
@ 2016-07-19 10:25 Roman Pen
From: Roman Pen @ 2016-07-19 10:25 UTC (permalink / raw)
  Cc: Roman Pen, Stefan Hajnoczi, Paolo Bonzini, qemu-devel

This series is intended to reduce completion latency through two changes:

1. QEMU does not use any timeout value for harvesting completed AIO
   requests from the ring buffer, thus io_getevents() can be implemented
   in userspace (first patch).

2. To reduce completion latency, it makes sense to harvest completed
   requests ASAP.  A very fast backend device can complete requests right
   after submission, so it is worth checking the ring buffer and reaping
   completed requests directly after io_submit() has been called (third
   patch).
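
To illustrate the first point, here is a minimal sketch of non-blocking
harvesting from a completion ring.  It is not the code from the patches:
struct sketch_ring, struct sketch_event and sketch_ring_harvest() are
hypothetical, simplified stand-ins, and the real kernel ring layout and the
memory barriers QEMU uses may differ.  It assumes a single consumer and a
producer that only advances the tail.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SKETCH_RING_SLOTS 8u

/* Hypothetical, simplified completion record (not the kernel's io_event). */
struct sketch_event {
    uint64_t data;   /* user cookie identifying the request */
    int64_t  res;    /* result of the request */
};

/* Hypothetical ring shared between a producer and one consumer. */
struct sketch_ring {
    unsigned nr;        /* number of usable slots */
    atomic_uint head;   /* consumer index, advanced by the harvester */
    atomic_uint tail;   /* producer index, advanced by the backend */
    struct sketch_event events[SKETCH_RING_SLOTS];
};

/*
 * Copy up to 'max' completed events into 'out' and advance the head.
 * Never blocks: returns 0 when the ring is empty, which is all that is
 * needed when the caller does not use a timeout.
 */
static unsigned sketch_ring_harvest(struct sketch_ring *ring,
                                    struct sketch_event *out, unsigned max)
{
    assert(ring->nr > 0 && ring->nr <= SKETCH_RING_SLOTS);

    unsigned head = atomic_load_explicit(&ring->head, memory_order_relaxed);
    /* Acquire pairs with the producer's release store of the tail. */
    unsigned tail = atomic_load_explicit(&ring->tail, memory_order_acquire);
    unsigned n = 0;

    while (head != tail && n < max) {
        out[n++] = ring->events[head];
        head = (head + 1) % ring->nr;
    }

    /* Publish the new head so the producer can reuse the consumed slots. */
    atomic_store_explicit(&ring->head, head, memory_order_release);
    return n;
}
```

Under the same assumptions, the second point amounts to calling such a
harvest function right after submission returns, instead of waiting for
the completion notification.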

Indeed, the series reduces completion latency and increases overall
throughput.  For example, the following are the percentiles of the number
of requests completed at once:

         1st  10th  20th  30th  40th  50th  60th  70th  80th  90th  99.99th
Before     2     4    42   112   128   128   128   128   128   128      128
 After     1     1     4    14    33    45    47    48    50    51      108

This means that before the third patch is applied, the ring buffer is
observed as full (128 requests were consumed at once) in 60% of calls.

After the third patch is applied, the distribution of the number of
completed requests is "smoother" and the queue (requests in flight) is
almost never full.

The fio read results are as follows (write results are almost the
same and are not shown here):

  Before
  ------
job: (groupid=0, jobs=8): err= 0: pid=2227: Tue Jul 19 11:29:50 2016
  Description  : [Emulation of Storage Server Access Pattern]
  read : io=54681MB, bw=1822.7MB/s, iops=179779, runt= 30001msec
    slat (usec): min=172, max=16883, avg=338.35, stdev=109.66
    clat (usec): min=1, max=21977, avg=1051.45, stdev=299.29
     lat (usec): min=317, max=22521, avg=1389.83, stdev=300.73
    clat percentiles (usec):
     |  1.00th=[  346],  5.00th=[  596], 10.00th=[  708], 20.00th=[  852],
     | 30.00th=[  932], 40.00th=[  996], 50.00th=[ 1048], 60.00th=[ 1112],
     | 70.00th=[ 1176], 80.00th=[ 1256], 90.00th=[ 1384], 95.00th=[ 1496],
     | 99.00th=[ 1800], 99.50th=[ 1928], 99.90th=[ 2320], 99.95th=[ 2672],
     | 99.99th=[ 4704]
    bw (KB  /s): min=205229, max=553181, per=12.50%, avg=233278.26, stdev=18383.51

  After
  ------
job: (groupid=0, jobs=8): err= 0: pid=2220: Tue Jul 19 11:31:51 2016
  Description  : [Emulation of Storage Server Access Pattern]
  read : io=57637MB, bw=1921.2MB/s, iops=189529, runt= 30002msec
    slat (usec): min=169, max=20636, avg=329.61, stdev=124.18
    clat (usec): min=2, max=19592, avg=988.78, stdev=251.04
     lat (usec): min=381, max=21067, avg=1318.42, stdev=243.58
    clat percentiles (usec):
     |  1.00th=[  310],  5.00th=[  580], 10.00th=[  748], 20.00th=[  876],
     | 30.00th=[  908], 40.00th=[  948], 50.00th=[ 1012], 60.00th=[ 1064],
     | 70.00th=[ 1080], 80.00th=[ 1128], 90.00th=[ 1224], 95.00th=[ 1288],
     | 99.00th=[ 1496], 99.50th=[ 1608], 99.90th=[ 1960], 99.95th=[ 2256],
     | 99.99th=[ 5408]
    bw (KB  /s): min=212149, max=390160, per=12.49%, avg=245746.04, stdev=11606.75

Throughput increased from 1822MB/s to 1921MB/s, and the average completion
latency (clat) decreased from 1051us to 988us.
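
For reference, the relative changes can be worked out from the fio averages
quoted above.  The helper below is only an illustrative sketch;
rel_change_percent() is a hypothetical name and is not part of the series.

```c
/*
 * Relative change from 'before' to 'after', in percent.
 * Positive means an increase, negative means a decrease.
 * Hypothetical helper, used only to illustrate the arithmetic.
 */
static double rel_change_percent(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

With the values from the fio output, rel_change_percent(1822.7, 1921.2)
gives a throughput gain of about 5.4%, and
rel_change_percent(1051.45, 988.78) gives an average clat change of about
-6.0%.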

Roman Pen (3):
  linux-aio: consume events in userspace instead of calling io_getevents
  linux-aio: split processing events function
  linux-aio: process completions from ioq_submit()

 block/linux-aio.c | 175 ++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 137 insertions(+), 38 deletions(-)

Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-devel@nongnu.org
-- 
2.8.2

Thread overview: 10+ messages
2016-07-19 10:25 [Qemu-devel] [PATCH 0/3] linux-aio: reduce completion latency Roman Pen
2016-07-19 10:25 ` [Qemu-devel] [PATCH 1/3] linux-aio: consume events in userspace instead of calling io_getevents Roman Pen
2016-07-19 10:25 ` [Qemu-devel] [PATCH 2/3] linux-aio: split processing events function Roman Pen
2016-07-19 10:25 ` [Qemu-devel] [PATCH 3/3] linux-aio: process completions from ioq_submit() Roman Pen
2016-07-19 10:36   ` Paolo Bonzini
2016-07-19 11:18     ` Roman Penyaev
2016-07-19 11:30       ` Paolo Bonzini
2016-07-19 11:44       ` Roman Penyaev
2016-07-19 11:47         ` Paolo Bonzini
2016-07-19 11:52           ` Roman Penyaev
