From: Jens Axboe <jens.axboe@oracle.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>, tytso@mit.edu
Subject: Re: [PATCH 0/8][RFC] IO latency/throughput fixes
Date: Mon, 6 Apr 2009 18:57:53 +0200
Message-ID: <20090406165753.GG5178@kernel.dk>
In-Reply-To: <alpine.LFD.2.00.0904060835530.3863@localhost.localdomain>

On Mon, Apr 06 2009, Linus Torvalds wrote:
>
>
> On Mon, 6 Apr 2009, Jens Axboe wrote:
> >
> > Ran the fsync-tester [1]. The drive is a 3-4 year old SATA drive, fs is
> > ext3/writeback. IO scheduler is CFQ.
> >
> > fsync time: 0.2785s
> > fsync time: 0.2640s
> >
> > And with Linus' torture dd running in the background:
> >
> > fsync time: 0.0109s
> > fsync time: 0.5236s
> > fsync time: 1.2108s
>
> Ok, it's definitely better for me too. CFQ used to be the problem case
> (with the previous patches), now I've been trying with CFQ for a while,
> and it seems ok.
>
> Not wonderful, by any means, but I haven't seen a 5+ second delay yet.
> I've come close (I have a few 2+s hiccups in my trace), but it's
> clearly more responsive, even if I'd wish it to be better still.

OK, that's good. I'll run some testing with this as well, and perhaps we
can do even better still.

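For anyone following along at home, the gist of such a probe is simple
enough; here is a rough sketch in Python (not the actual fsync-tester C
code, just an illustration of the idea): dirty a page, time fsync(),
repeat.

```python
import os
import tempfile
import time

def probe_fsync(rounds=5, size=4096):
    """Dirty `size` bytes of a scratch file, then time fsync().

    Returns the per-round fsync latencies in seconds. A rough
    stand-in for fsync-tester, not the real tool.
    """
    fd, path = tempfile.mkstemp(prefix="fsync-probe-")
    times = []
    try:
        buf = b"\0" * size
        for _ in range(rounds):
            os.pwrite(fd, buf, 0)            # dirty one page
            t0 = time.monotonic()
            os.fsync(fd)                     # force it to stable storage
            times.append(time.monotonic() - t0)
    finally:
        os.close(fd)
        os.unlink(path)
    return times

if __name__ == "__main__":
    for t in probe_fsync():
        print(f"fsync time: {t:.4f}s")
```

Run it bare, then again with a big streaming dd in the background, and
compare the latencies.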
> One thing that I find intriguing is how the fsync time seems so
> _consistent_ across a wild variety of drives. It's interesting how you see
> delays that are roughly the same order of magnitude, even though you are
> using an old SATA drive, and I'm using the Intel SSD. And when you turn
> off TCQ, your numbers go down even more.
>
> That just makes me suspect that there is something else than pure IO going
> on. There shouldn't be any idling by the IO scheduler in my setup
> ("rotational" is zero for me), and quite frankly, I should not see
> latencies in the seconds even _with_ TCQ, since it should be limited to
> just 32 tags. Of course, maybe some of those requests just grow humongous.
>
> So maybe one reason the "sync()" workload is so horrible is that we get
> insanely big single requests. I see
>
> [root@nehalem queue]# cat max_sectors_kb
> 512
>
> so we should be limited to half a meg per request, but I guess 32 of those
> will take some time even on the Intel SSD. In fact, I guess the SSD is not
> really any faster than your 2-3 year old SATA disk when it comes to pure
> linear throughput

It's probably close, for your MLC version. This drive does around
60MB/sec sequential writes, which is in the ballpark of the ~70MB/sec
yours should be doing.

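Quick back-of-the-envelope on why 32 tags of maximal requests alone
shouldn't explain multi-second stalls, using the numbers from this
thread (purely illustrative):

```python
# Worst-case drain time for a full TCQ queue, using the numbers
# discussed in this thread (illustrative only).
tags = 32               # TCQ queue depth
max_request_kb = 512    # max_sectors_kb: largest single request
throughput_mb_s = 60    # rough sequential write throughput

queued_mb = tags * max_request_kb / 1024
drain_s = queued_mb / throughput_mb_s
print(f"{queued_mb:.0f}MB queued -> ~{drain_s:.2f}s to drain")
# -> 16MB queued -> ~0.27s to drain
```

So a completely full queue of maximal requests accounts for roughly a
quarter of a second, not seconds, which fits the suspicion that
something other than pure IO is going on.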
> Hmm. Doing a "echo 64 > max_sectors_kb" does seem to make my experience
> nicer. At no really noticeable downside in throughput that I can see: the
> "dd+sync" still tends to fluctuate 30-40s. But maybe I'm fooling myself.
> But my 'strace' seems to agree: I'm having a hard time triggering anything
> even close to a second latency now.
>
> I wonder if we could limit the tag usage by request _size_, ie not let big
> requests fill up all the tags (by all means allow writes to fill them up
> if they are small - it's with many small requests that you get the biggest
> advantage, after all, and with many _big_ requests that the downside is
> the biggest too).

I think we are doing OK with the async vs sync tag allocation; there
should still be plenty of room for sync IO. The problem is likely
earlier, before we even assign a tag. The IO schedulers move IO to the
dispatch list, and the driver grabs it from there, assigns a tag,
starts it, and so on. Perhaps we end up moving too much to the dispatch
list. That would increase latencies, even if a sync queue preempts the
current async queue and dispatches a request (which goes onto the
dispatch list, ordered, and thus may have to wait for others to be
serviced first). Just speculating; I'll test and probe and see what
comes up.

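As an aside, the max_sectors_kb experiment above is easy to script if
anyone wants to try it on their own hardware. A sketch (sysfs paths as
on current kernels; "sda" is just an example device name, and lowering
the cap needs root):

```python
from pathlib import Path

SYSFS = Path("/sys/block")

def get_max_sectors_kb(dev, sysfs=SYSFS):
    """Current per-request size cap in KB (e.g. 512)."""
    return int((sysfs / dev / "queue" / "max_sectors_kb").read_text())

def set_max_sectors_kb(dev, kb, sysfs=SYSFS):
    """Lower the cap, e.g. set_max_sectors_kb("sda", 64).

    Clamped to the hardware limit (max_hw_sectors_kb); needs root
    on a real system.
    """
    q = sysfs / dev / "queue"
    hw = int((q / "max_hw_sectors_kb").read_text())
    (q / "max_sectors_kb").write_text(str(min(kb, hw)))
```

Equivalent to the "echo 64 > max_sectors_kb" above, with the hardware
limit respected.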
--
Jens Axboe