Re: Higher than expected disk write(2) latency

From: Andrew Morton <akpm@linux-foundation.org>
To: Martin Sustrik <sustrik@fastmq.com>
Cc: Martin Lucina <mato@kotelna.sk>,
	linux-kernel@vger.kernel.org, linux-aio@kvack.org
Subject: Re: Higher than expected disk write(2) latency
Date: Thu, 10 Jul 2008 01:14:17 -0700
Message-ID: <20080710011417.95532d51.akpm@linux-foundation.org>
In-Reply-To: <4875C45C.2010901@fastmq.com>

On Thu, 10 Jul 2008 10:12:12 +0200 Martin Sustrik <sustrik@fastmq.com> wrote:

> Hi Andrew,
> 
> >> we're getting some rather high figures for write(2) latency when testing
> >> synchronous writing to disk.  The test I'm running writes 2000 blocks of
> >> contiguous data to a raw device, using O_DIRECT and various block sizes
> >> down to a minimum of 512 bytes.  
> >>
> >> The disk is a Seagate ST380817AS SATA connected to an Intel ICH7
> >> using ata_piix.  Write caching has been explicitly disabled on the
> >> drive, and there is no other activity that should affect the test
> >> results (all system filesystems are on a separate drive).  The system is
> >> running Debian etch, with a 2.6.24 kernel.
> >>
> >> Observed results:
> >>
> >> size=1024, N=2000, took=4.450788 s, thput=3 mb/s seekc=1
> >> write: avg=8.388851 max=24.998846 min=8.335624 ms
> >> 8 ms: 1992 cases
> >> 9 ms: 2 cases
> >> 10 ms: 1 cases
> >> 14 ms: 1 cases
> >> 16 ms: 3 cases
> >> 24 ms: 1 cases
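A minimal C sketch of the kind of test described above: one blocking write(2) per block on a descriptor opened with O_DIRECT, timed with clock_gettime(CLOCK_MONOTONIC), then bucketed by whole milliseconds as in the "8 ms: 1992 cases" output. The helper names (open_raw_device, measure_sync_writes, latency_histogram), the 4096-byte buffer alignment and the floor-to-millisecond bucketing are assumptions for illustration, not the original test program.

```c
#define _GNU_SOURCE          /* exposes O_DIRECT in <fcntl.h> */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: open a raw device (e.g. "/dev/sdX") for
 * unbuffered writes that bypass the page cache. */
int open_raw_device(const char *path)
{
    return open(path, O_WRONLY | O_DIRECT);
}

/* Current monotonic time in milliseconds. */
static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e3 + ts.tv_nsec / 1e6;
}

/* Write n contiguous blocks of bs bytes, one blocking write(2) at a time,
 * storing each call's latency in lat_ms[i]. Returns 0 on success, -1 on error. */
int measure_sync_writes(int fd, size_t bs, int n, double *lat_ms)
{
    void *buf;

    assert(bs % 512 == 0);   /* O_DIRECT needs sector-multiple sizes */
    if (posix_memalign(&buf, 4096, bs) != 0)   /* aligned buffer for O_DIRECT */
        return -1;
    memset(buf, 0xab, bs);

    for (int i = 0; i < n; i++) {
        double t0 = now_ms();
        ssize_t w = write(fd, buf, bs);
        lat_ms[i] = now_ms() - t0;
        if (w != (ssize_t)bs) {
            free(buf);
            return -1;
        }
    }
    free(buf);
    return 0;
}

/* Bucket latencies by whole milliseconds ("8 ms: N cases");
 * anything at or above max_ms - 1 lands in the last bucket. */
void latency_histogram(const double *lat_ms, int n, int *counts, int max_ms)
{
    memset(counts, 0, (size_t)max_ms * sizeof *counts);
    for (int i = 0; i < n; i++) {
        int b = (int)lat_ms[i];   /* truncation == floor for non-negative values */
        if (b >= max_ms)
            b = max_ms - 1;
        counts[b]++;
    }
}
```

Calling measure_sync_writes(open_raw_device("/dev/sdX"), 1024, 2000, lat) and then latency_histogram(lat, 2000, counts, 32) would reproduce the shape of the measurement; /dev/sdX is a placeholder, and writing to a raw device destroys its contents.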
> > 
> > stoopid question 1: are you writing to a regular file, or to /dev/sda?  If
> > the former then metadata fetches will introduce glitches.
> 
> Not a file, just a raw device.
> 
> > stoopid question 2: does the same effect happen with reads?
> 
> Dunno. Reads are not critical for us. However, I would expect the same
> behaviour (see below).
> 
> We've got a satisfying explanation of the behaviour from Roger Heflin:
> 
> "You write sectors n and n+1. It takes some amount of time for that first
> set of sectors to come under the head; when it does, you write it and
> immediately return. Immediately after that you attempt to write sectors
> n+2 and n+3, which passed under the head just a moment ago, so you have
> to wait an *ENTIRE* revolution for those sectors to come under the head
> again to be written, another ~8.3ms, and you continue to repeat this with
> each block being written. If the sector were randomly placed in the
> rotation (i.e. a 50% chance of the disk being off by 1/2 a rotation or
> less), you would have a 4.15 ms average seek time for your test, but in
> the case of sequential sync writes the sector is left about as far as
> possible from the head (it just passed under the head)."
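Roger's figures follow from the spindle speed. Assuming the ST380817AS spins at 7200 RPM (an assumption here; the thread does not state it), one revolution takes 60000 / 7200, about 8.33 ms, in line with the ~8.3 ms average and the 8.335 ms minimum reported earlier, while a uniformly random starting position costs half a revolution on average (about 4.17 ms). A small C sketch of that arithmetic, with hypothetical function names:

```c
#include <assert.h>

/* Duration of one platter revolution in milliseconds. */
double revolution_ms(double rpm)
{
    assert(rpm > 0.0);
    return 60000.0 / rpm;
}

/* Mean rotational wait when the target sector's angular position is
 * uniformly random relative to the head: half a revolution. */
double mean_random_wait_ms(double rpm)
{
    return revolution_ms(rpm) / 2.0;
}

/* Approximate throughput when each synchronous sequential write has just
 * missed its sector and must wait about one full revolution per block. */
double missed_rev_bytes_per_s(double rpm, double block_bytes)
{
    return block_bytes * 1000.0 / revolution_ms(rpm);
}
```

For 1024-byte blocks at 7200 RPM, missed_rev_bytes_per_s() gives 122,880 bytes/s, i.e. one block per revolution.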
> 
> Now, the obvious solution was to use AIO to be able to enqueue write
> requests even before the head reaches the end of the sector, so that
> there would be no need for superfluous disk revolutions.
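The submission pattern behind the AIO idea is to hand several contiguous writes over before the first one completes, so a request is already waiting when its sector arrives. The thread used kernel AIO through libaio; the sketch below swaps in POSIX AIO (aio_write, aio_error, aio_suspend, aio_return from <aio.h>) only because it needs no extra library. glibc implements POSIX AIO with user-space threads rather than the kernel AIO interface, so this shows the queue-then-wait structure, not the measurement in the graph. QDEPTH and queued_writes are hypothetical names; older glibc versions need -lrt.

```c
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>    /* O_* flags for the caller's open(2) */
#include <string.h>
#include <unistd.h>

#define QDEPTH 8      /* hypothetical maximum number of writes in flight */

/* Queue n contiguous writes of bs bytes (offsets 0, bs, 2*bs, ...) without
 * waiting in between, then wait for every submitted one to finish.
 * Returns how many writes completed with the full byte count. */
int queued_writes(int fd, const void *buf, size_t bs, int n)
{
    struct aiocb cbs[QDEPTH];
    int submitted = 0, ok = 0;

    assert(n > 0 && n <= QDEPTH);
    memset(cbs, 0, sizeof cbs);

    /* Submission phase: every request is handed over before we wait. */
    for (int i = 0; i < n; i++) {
        cbs[i].aio_fildes = fd;
        cbs[i].aio_buf = (void *)buf;
        cbs[i].aio_nbytes = bs;
        cbs[i].aio_offset = (off_t)i * (off_t)bs;
        if (aio_write(&cbs[i]) != 0)
            break;    /* stop submitting, but still reap what is in flight */
        submitted++;
    }

    /* Completion phase: block on each submitted request in turn. */
    for (int i = 0; i < submitted; i++) {
        const struct aiocb *one[1] = { &cbs[i] };
        while (aio_error(&cbs[i]) == EINPROGRESS)
            aio_suspend(one, 1, NULL);
        if (aio_return(&cbs[i]) == (ssize_t)bs)
            ok++;
    }
    return ok;
}
```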
> 
> We've actually measured this scenario with kernel AIO (libaio1), and this
> is what we've got (see attached graph).
> 
> The x axis represents individual write operations; the y axis represents
> time. Crosses are operation enqueue times (when the write requests were
> issued); circles are notification times (when the app was notified
> that the write request had been processed).
> 
> What we see is that AIO performs rather badly while we are still
> enqueueing more writes (it misses the right position on the disk and has
> to do superfluous disk revolutions); however, once we stop enqueueing new
> write requests, those already in the queue are processed swiftly.
> 
> My guess (I am not a kernel hacker) would be that synchronisation
> operations (locking) on the AIO queue are slowing down retrieval from the
> queue, and thus we miss the right place on the disk almost all the time.
> Once the app stops enqueueing new write requests there's no contention on
> the queue and we are able to keep up with the speed of disk rotation.
> 
> If this is the case, the solution would be straightforward: when
> dequeueing from the AIO queue, dequeue *all* the requests in the queue
> and place them into another, non-synchronised queue. Getting an element
> from a non-synchronised queue is a matter of a few nanoseconds, so we
> should be able to process it before the head misses the right point on
> the disk. Once the non-synchronised queue is empty, we get *all* the
> requests from the AIO queue again, and so on.
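The proposed "take everything at once" scheme is a common user-space pattern: the consumer holds the lock only long enough to detach the whole pending list, then works through its private copy without synchronisation. A minimal C sketch with pthreads, assuming a hypothetical singly linked struct req and hypothetical shared_push/shared_take_all helpers; it illustrates the idea in the message and is not kernel code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical request node. */
struct req {
    struct req *next;
    long id;
};

/* Shared stack of pending requests, protected by a single mutex. */
struct shared_queue {
    pthread_mutex_t lock;
    struct req *head;
};

/* Producer side: one short critical section per request. */
void shared_push(struct shared_queue *q, struct req *r)
{
    pthread_mutex_lock(&q->lock);
    r->next = q->head;
    q->head = r;
    pthread_mutex_unlock(&q->lock);
}

/* Consumer side: detach *all* pending requests in one locked operation and
 * return them as a private list in submission (FIFO) order. */
struct req *shared_take_all(struct shared_queue *q)
{
    struct req *list, *fifo = NULL;

    pthread_mutex_lock(&q->lock);
    list = q->head;
    q->head = NULL;
    pthread_mutex_unlock(&q->lock);

    /* Reverse the LIFO stack outside the lock; no contention here. */
    while (list != NULL) {
        struct req *next = list->next;
        list->next = fifo;
        fifo = list;
        list = next;
    }
    return fifo;
}
```

A consumer loop would call shared_take_all(), process every node in the returned list, and only then call it again, so the lock is taken once per batch rather than once per request.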
> 
> Does anyone have an opinion on this matter?

Not immediately, but the fine folks on the linux-aio list might be able to
help out.  If you have some simple testcase code which you can share then
that would help things along.

Thread overview: 25+ messages
2008-06-28 12:11 Higher than expected disk write(2) latency Martin Lucina
2008-06-28 13:11 ` Roger Heflin
2008-06-30 18:10   ` Martin Sustrik
2008-06-30 19:02     ` Roger Heflin
2008-06-30 22:20       ` Martin Sustrik
2008-07-01  0:11         ` Bernd Eckenfels
2008-07-02 16:48       ` Martin Sustrik
2008-07-02 18:15         ` Jeff Moyer
2008-07-02 18:20           ` Martin Sustrik
2008-07-04  3:16             ` David Dillow
2008-07-02 21:33         ` Roger Heflin
2008-06-28 14:47 ` David Newall
2008-06-29 11:34   ` Martin Sustrik
2008-07-10  5:27 ` Andrew Morton
2008-07-10  8:12   ` Martin Sustrik
2008-07-10  8:14     ` Andrew Morton [this message]
2008-07-10 13:29       ` Chris Mason
2008-07-10 13:41         ` Martin Lucina
2008-07-10 14:01           ` Arjan van de Ven
2008-07-10 14:18             ` Chris Mason
2008-07-10  8:31     ` Alan Cox
2008-07-10 13:17       ` Martin Sustrik
2008-07-10 13:18         ` Andrew Morton
2008-07-11 15:17       ` Martin Sustrik
     [not found] <fa.OZMA74BZPX46rhnjz1am4hB786M@ifi.uio.no>
2008-06-30  6:41 ` Robert Hancock