From: Martin Sustrik <sustrik@fastmq.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Martin Lucina <mato@kotelna.sk>, linux-kernel@vger.kernel.org
Subject: Re: Higher than expected disk write(2) latency
Date: Thu, 10 Jul 2008 10:12:12 +0200
Message-ID: <4875C45C.2010901@fastmq.com>
In-Reply-To: <20080709222701.8eab4924.akpm@linux-foundation.org>

[-- Attachment #1: Type: text/plain, Size: 3702 bytes --]

Hi Andrew,

>> we're getting some rather high figures for write(2) latency when testing
>> synchronous writing to disk.  The test I'm running writes 2000 blocks of
>> contiguous data to a raw device, using O_DIRECT and various block sizes
>> down to a minimum of 512 bytes.  
>>
>> The disk is a Seagate ST380817AS SATA connected to an Intel ICH7
>> using ata_piix.  Write caching has been explicitly disabled on the
>> drive, and there is no other activity that should affect the test
>> results (all system filesystems are on a separate drive).  The system is
>> running Debian etch, with a 2.6.24 kernel.
>>
>> Observed results:
>>
>> size=1024, N=2000, took=4.450788 s, thput=3 mb/s seekc=1
>> write: avg=8.388851 max=24.998846 min=8.335624 ms
>> 8 ms: 1992 cases
>> 9 ms: 2 cases
>> 10 ms: 1 cases
>> 14 ms: 1 cases
>> 16 ms: 3 cases
>> 24 ms: 1 cases
> 
> stoopid question 1: are you writing to a regular file, or to /dev/sda?  If
> the former then metadata fetches will introduce glitches.

Not a file, just a raw device.

> stoopid question 2: does the same effect happen with reads?

Dunno. Reads are not critical for us. However, I would expect the same 
behaviour (see below).
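For reference, the measurement quoted at the top (contiguous writes of a fixed block size, each timed individually and counted per millisecond) could be sketched roughly as follows. This is a hypothetical reconstruction, not the program that produced the figures above; `timed_writes`, `MAX_MS` and the 4096-byte buffer alignment are assumptions. With O_DIRECT, the buffer address, block size and file offset generally have to be aligned to the device's logical block size.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define MAX_MS 64 /* hypothetical histogram size: buckets 0..63 ms */

/* Milliseconds elapsed between two CLOCK_MONOTONIC timestamps. */
static double elapsed_ms(const struct timespec *a, const struct timespec *b)
{
    return (double)(b->tv_sec - a->tv_sec) * 1e3 +
           (double)(b->tv_nsec - a->tv_nsec) / 1e6;
}

/*
 * Write n contiguous blocks of block_size bytes to fd, starting at offset 0.
 * Each pwrite() is timed and counted in hist[] by whole milliseconds (the
 * last bucket also collects anything slower). Returns n, or -1 on error.
 */
static int timed_writes(int fd, size_t block_size, int n, int hist[MAX_MS])
{
    void *buf;
    /* 4096-byte alignment satisfies O_DIRECT on common 512 B / 4 KiB devices. */
    if (posix_memalign(&buf, 4096, block_size) != 0)
        return -1;
    memset(buf, 0xAB, block_size);
    memset(hist, 0, MAX_MS * sizeof hist[0]);

    for (int i = 0; i < n; i++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        ssize_t w = pwrite(fd, buf, block_size, (off_t)i * (off_t)block_size);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        if (w != (ssize_t)block_size) {
            free(buf);
            return -1;
        }
        int ms = (int)elapsed_ms(&t0, &t1);
        hist[ms < MAX_MS ? ms : MAX_MS - 1]++;
    }
    free(buf);
    return n;
}
```

Against a raw device it would be called on a descriptor from something like `open("/dev/sdX", O_WRONLY | O_DIRECT | O_SYNC)`, where `/dev/sdX` is a placeholder; note that this overwrites the start of that device.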

We've got a satisfying explanation of the behaviour from Roger Heflin:

"You write sector n and n+1, it takes some amount of time for that first 
set of sectors to come under the head, when it does you write it and 
immediately return.  Immediately after that you attempt to write sectors 
n+2 and n+3, which passed under the head just a moment ago, so you have 
to wait an *ENTIRE* revolution for those sectors to come under the head 
again to be written, another ~8.3 ms, and you continue to repeat this 
with each block being written.  If the sector were randomly placed in 
the rotation (i.e. a 50% chance of the disk being off by half a rotation 
or less), you would have a 4.15 ms average seek time for your test; but 
in the case of sequential sync writes this leaves the sector about as 
far as possible from the head (it just passed under the head)."
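Roger's figures follow from simple arithmetic. A minimal sketch, assuming the ST380817AS spins at 7200 rpm (the spindle speed is an assumption; it is not stated in this thread): one revolution then takes 60000 / 7200 ≈ 8.33 ms, in line with the 8.34 ms minimum write latency quoted above, while a sector at a random angular position waits on average half of that. The function names are hypothetical.

```c
/* Time for one full platter revolution, in milliseconds. */
static double revolution_ms(double rpm)
{
    return 60000.0 / rpm;
}

/*
 * Average rotational wait for a sector at a random angular position:
 * half a revolution. Back-to-back sequential sync writes instead wait
 * close to a full revolution, because the target sector has just passed.
 */
static double mean_random_latency_ms(double rpm)
{
    return revolution_ms(rpm) / 2.0;
}
```

At the assumed 7200 rpm, `revolution_ms(7200.0)` gives about 8.33 ms and `mean_random_latency_ms(7200.0)` about 4.17 ms.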

Now, the obvious solution was to use AIO, so that write requests can be 
enqueued even before the head reaches the end of the current sector; 
that way there would be no need for extra disk revolutions.
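The idea can be sketched with the kernel AIO syscalls that libaio wraps, invoked directly through syscall(2) so that no extra library is needed. This is a minimal illustration, not the code used for the measurements in this message; `aio_sequential_write`, `QDEPTH` and the `sys_io_*` wrapper names are hypothetical. For the writes to be genuinely asynchronous on Linux, `fd` should be opened with O_DIRECT; for buffered files, `io_submit()` generally completes the write before returning.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/aio_abi.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

#define QDEPTH 8 /* hypothetical number of writes kept in flight */

/* glibc has no wrappers for the kernel AIO syscalls; libaio provides them. */
static long sys_io_setup(unsigned nr, aio_context_t *ctx)
{ return syscall(SYS_io_setup, nr, ctx); }
static long sys_io_destroy(aio_context_t ctx)
{ return syscall(SYS_io_destroy, ctx); }
static long sys_io_submit(aio_context_t ctx, long nr, struct iocb **iocbs)
{ return syscall(SYS_io_submit, ctx, nr, iocbs); }
static long sys_io_getevents(aio_context_t ctx, long min_nr, long nr,
                             struct io_event *ev, struct timespec *timeout)
{ return syscall(SYS_io_getevents, ctx, min_nr, nr, ev, timeout); }

/*
 * Write n contiguous blocks of block_size bytes (all copies of buf) to fd,
 * keeping up to QDEPTH requests queued so that the next write is already
 * waiting when its sector reaches the head. Returns 0 on success, -1 on error.
 */
static int aio_sequential_write(int fd, const void *buf, size_t block_size, int n)
{
    aio_context_t ctx = 0;
    struct iocb cbs[QDEPTH];
    struct io_event events[QDEPTH];
    int free_slots[QDEPTH], nfree = QDEPTH;
    int submitted = 0, completed = 0, rc = 0;

    for (int i = 0; i < QDEPTH; i++)
        free_slots[i] = i;
    if (sys_io_setup(QDEPTH, &ctx) < 0)
        return -1;

    while (completed < n) {
        /* Top up the queue with the next contiguous blocks. */
        while (submitted < n && nfree > 0) {
            int slot = free_slots[--nfree];
            struct iocb *cb = &cbs[slot];
            memset(cb, 0, sizeof *cb);
            cb->aio_data = (uint64_t)slot; /* echoed back in io_event.data */
            cb->aio_fildes = (uint32_t)fd;
            cb->aio_lio_opcode = IOCB_CMD_PWRITE;
            cb->aio_buf = (uint64_t)(uintptr_t)buf;
            cb->aio_nbytes = block_size;
            cb->aio_offset = (int64_t)submitted * (int64_t)block_size;
            if (sys_io_submit(ctx, 1, &cb) != 1) {
                rc = -1;
                goto out;
            }
            submitted++;
        }
        /* Block until at least one write completes, then recycle its slot. */
        long got = sys_io_getevents(ctx, 1, QDEPTH, events, NULL);
        if (got < 0) {
            rc = -1;
            goto out;
        }
        for (long i = 0; i < got; i++) {
            if (events[i].res != (int64_t)block_size)
                rc = -1;
            free_slots[nfree++] = (int)events[i].data;
        }
        completed += (int)got;
    }
out:
    sys_io_destroy(ctx);
    return rc;
}
```

Keeping several contiguous writes queued is what should, in principle, let the drive write each sector as it arrives under the head instead of waiting for another revolution.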

We've actually measured this scenario with kernel AIO (libaio1), and this 
is what we've got (see attached graph).

The x axis represents individual write operations; the y axis represents 
time. Crosses mark enqueue times (when the write requests were issued); 
circles mark notification times (when the application was notified that 
the write request had been processed).

What we see is that AIO performs rather badly while we are still 
enqueueing more writes (it misses the right position on the disk and has 
to wait for extra revolutions). However, once we stop enqueueing new 
write requests, those already in the queue are processed swiftly.

My guess (I am not a kernel hacker) is that synchronised operations 
(locking) on the AIO queue slow down retrieval from the queue, and thus 
we miss the right place on the disk almost every time. Once the 
application stops enqueueing new write requests, there is no contention 
on the queue and we are able to keep up with the disk's rotation.

If this is the case, the solution would be straightforward: when 
dequeueing from the AIO queue, dequeue *all* the requests in the queue 
and place them into a separate, non-synchronised queue. Getting an 
element from a non-synchronised queue is a matter of a few nanoseconds, 
so we should be able to process it before the head misses the right 
point on the disk. Once the non-synchronised queue is empty, we take 
*all* the requests from the AIO queue again, and so on.
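The proposed "take everything, then process without locking" scheme is a common pattern; here is a minimal userspace sketch of it with pthreads. It is only an analogue: it does not model the kernel's AIO or block-layer request queues, and `struct req`, `struct shared_queue` and the `sq_*` functions are hypothetical names. The point is that the consumer takes the lock once per batch rather than once per request.

```c
#include <pthread.h>
#include <stddef.h>

/* One queued write request (hypothetical payload). */
struct req {
    struct req *next;
    long offset;
};

/* The shared, synchronised queue that producers append to. */
struct shared_queue {
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
    struct req *head, *tail;
};

static void sq_init(struct shared_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->nonempty, NULL);
    q->head = q->tail = NULL;
}

/* Producer side: append one request under the lock and wake the consumer. */
static void sq_push(struct shared_queue *q, struct req *r)
{
    r->next = NULL;
    pthread_mutex_lock(&q->lock);
    if (q->tail)
        q->tail->next = r;
    else
        q->head = r;
    q->tail = r;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
}

/*
 * Consumer side: wait until the queue is non-empty, then detach *all*
 * queued requests in a single locked step. The returned list is private
 * to the caller and can be walked without any further locking.
 */
static struct req *sq_drain_all(struct shared_queue *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->head == NULL)
        pthread_cond_wait(&q->nonempty, &q->lock);
    struct req *batch = q->head;
    q->head = q->tail = NULL;
    pthread_mutex_unlock(&q->lock);
    return batch;
}
```

The consumer then loops: call `sq_drain_all()`, walk the returned list via `next`, issue each request, and call `sq_drain_all()` again once the list is exhausted.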

Does anyone have an opinion on this?

Thanks.
Martin

[-- Attachment #2: aio.png --]
[-- Type: image/png, Size: 3628 bytes --]

