All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jens Axboe <jens.axboe@oracle.com>
To: Theodore Tso <tytso@mit.edu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Linux Kernel Developers List <linux-kernel@vger.kernel.org>,
	Ext4 Developers List <linux-ext4@vger.kernel.org>
Subject: Re: [GIT PULL] Ext3 latency fixes
Date: Mon, 6 Apr 2009 08:15:59 +0200	[thread overview]
Message-ID: <20090406061558.GL5178@kernel.dk> (raw)
In-Reply-To: <20090404232222.GA7480@mit.edu>

On Sat, Apr 04 2009, Theodore Tso wrote:
> On Sat, Apr 04, 2009 at 08:01:08PM +0200, Jens Axboe wrote:
> > 
> > Unless you make all journal writes sync, ext3 fsync will always suck big
> > time. But I get your point.
> > 
> 
> We have already made all other journal writes sync when triggered by
> fsync() --- except for the commit record which is being written by
> sync_dirty_buffer().  I've verified this via blktrace.
> 
> However, you're right.  We do have an performance regression using the
> regression test provided in commit 78f707bf.  Taking the average of at
> least 5 runs, I found:
> 
> stock 2.6.29				 1687 ms
> 2.6.29 w/ ext3-latency-fixes		 7337 ms
> 2.6.29 w/ full-latency-fixes		13722 ms
> 
> "ext3-latency-fixes" are the patches which Linus has already pulled
> into the 2.6.29 tree.  "full-latency-fixes" are ext3-latency-fixes
> plus the sync_dirty_buffes() patch.

Ugh yes, not everyone will appreciate the better latency and for them an
8x slowdown is veeery painful.

> Looking at blktrace of stock 2.6.29 running the sqlite performance
> measurement, we see this:
> 
> 254,2    1       13     0.005120554  7183  Q   W 199720 + 8 [sqlite-fsync-te]
> 254,2    1       15     0.005666573  6947  Q   W 10277200 + 8 [kjournald]
> 254,2    1       16     0.005691576  6947  Q   W 10277208 + 8 [kjournald]
> 254,2    1       18     0.006145684  6947  Q   W 10277216 + 8 [kjournald]
> 254,2    1       21     0.006685348  7183  Q   W 199720 + 8 [sqlite-fsync-te]
> 254,2    1       24     0.007162644  6947  Q   W 10277224 + 8 [kjournald]
> 254,2    1       25     0.007187857  6947  Q   W 10277232 + 8 [kjournald]
> 254,2    0       27     0.007616473  6947  Q   W 10277240 + 8 [kjournald]
> 
> Looking at a blktrace of 2.6.29 plus the full-latency-fixes, we see this:
> 
> 254,2    0       13     0.013744556  7205  Q  WS 199208 + 8 [sqlite-fsync-te]
> 254,2    0       16     0.019270748  6965  Q  WS 10301896 + 8 [kjournald]
> 254,2    0       17     0.019400024  6965  Q  WS 10301904 + 8 [kjournald]
> 254,2    1       23     0.019892824  6965  Q  WS 10301912 + 8 [kjournald]
> 254,2    0       20     0.020450995  7205  Q  WS 199208 + 8 [sqlite-fsync-te]
> 254,2    1       26     0.025954909  6965  Q  WS 10301920 + 8 [kjournald]
> 254,2    1       27     0.026084814  6965  Q  WS 10301928 + 8 [kjournald]
> 254,2    0       24     0.026612884  6965  Q  WS 10301936 + 8 [kjournald]
> 
> Looking at the deltas between the two iterations of the sqlite
> analogue, we see that stock 2.6.29 is 1.56ms, where as with the full
> latency fixes, it's 6.70ms. 

It would be interesting with the full set of information, like unplug
and merge. The above definitely shows longer between queue-to-queue, I
wonder what that is due to.

> However, the full latency fixes all the writes are synchronous, so it
> must be the case that the delays are caused by the fact that queue is
> getting implicitly unplugged after the synchronous write, and the
> problem is no longer the mixing of WRITE and WRITE_SYNC requests as
> posted in the commit log for 78f707bf.  If we remove the automatic
> unplug for WRITE_SYNC requests, and add an explicit unplug where it is
> needed, that should fix the performance regression for this particular
> sqlite test case.  (Which isn't a throughput issue, since the test is
> basically fopen/write/fsync/fclose over and over again, with no
> background load.)

We definitely should add a WRITE_SYNC variant that doesn't unplug, for
the cases where you know you are going to submit more than one IO before
waiting. Whether that should just be WRITE_SYNC or we should add a
WRITE_SYNC_UNPLUG, I don't have a really strong preference either way.

But if the delays are due to premature unplugging, it should be visible
in the trace. Care to send the "full" thing?

> The scenario which Linus and I had been focused on is one where there
> was a heavy background load writing asynchronously.  We do want to
> make sure that a series of fsync() calls in a tight loop is also
> reasonable as well, so Jens, I do think you are absolutely right that
> this is something that we need to pay attention to.

Most definitely!

> I had missed the commit 78f707bf when you originally submitted it, so
> I didn't do this test before submitting the patch.  And I guess you
> had missed my patch proposal to LKML, and I didn't think to explicitly
> CC you on my patches.  Apologies for the communication faults, but
> hopefully we can fix this performance issue for both cases and get
> these problems behind us.

We still have a long cycle ahead of us, so I'm sure we can get it nailed
down before release.

-- 
Jens Axboe


  parent reply	other threads:[~2009-04-06  6:16 UTC|newest]

Thread overview: 72+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-04-03  7:01 [GIT PULL] Ext3 latency fixes Theodore Ts'o
2009-04-03  7:01 ` [PATCH 1/4] block_write_full_page: Use synchronous writes for WBC_SYNC_ALL writebacks Theodore Ts'o
2009-04-03  7:01   ` [PATCH 2/4] ext3: Use WRITE_SYNC for commits which are caused by fsync() Theodore Ts'o
2009-04-03  7:01     ` [PATCH 3/4] ext3: Add replace-on-truncate hueristics for data=writeback mode Theodore Ts'o
2009-04-03  7:01       ` [PATCH 4/4] ext3: Add replace-on-rename " Theodore Ts'o
2009-04-03 15:01 ` EXT4 in embedded systems Nick Hennenfent (nhennefe)
2009-04-03 16:06   ` Eric Sandeen
2009-04-03 17:15     ` Nick Hennenfent (nhennefe)
2009-04-03 18:24 ` [GIT PULL] Ext3 latency fixes Linus Torvalds
2009-04-03 18:47   ` Jens Axboe
2009-04-03 19:13     ` Theodore Tso
2009-04-03 21:01     ` Chris Mason
2009-04-03 19:02   ` Linus Torvalds
2009-04-03 20:41     ` Linus Torvalds
2009-04-04 13:57       ` Theodore Tso
2009-04-04 15:16         ` Jens Axboe
2009-04-04 15:57           ` Linus Torvalds
2009-04-04 16:06             ` Linus Torvalds
2009-04-04 17:36               ` Jens Axboe
2009-04-04 17:34             ` Jens Axboe
2009-04-04 17:44               ` Linus Torvalds
2009-04-04 18:00                 ` Trenton D. Adams
2009-04-04 18:01                 ` Jens Axboe
2009-04-04 18:10                   ` Linus Torvalds
2009-04-04 23:22                   ` Theodore Tso
2009-04-04 23:33                     ` Arjan van de Ven
2009-04-05  0:10                       ` Theodore Tso
2009-04-05 15:05                         ` Arjan van de Ven
2009-04-05 17:01                         ` Linus Torvalds
2009-04-05 17:15                           ` Mark Lord
2009-04-05 20:57                             ` Jeff Garzik
2009-04-05 23:48                               ` Arjan van de Ven
2009-04-06  2:32                                 ` Mark Lord
2009-04-06  5:47                                 ` Jeff Garzik
2009-04-07 18:18                                   ` Linus Torvalds
2009-04-07 18:22                                     ` Linus Torvalds
2009-04-07 19:40                                     ` [PATCH libata: add SSD detection hueristic; move SSD setup to ata_dev_configure (was Re: [GIT PULL] Ext3 latency fixes) Jeff Garzik
2009-04-09 18:21                                       ` Tejun Heo
2009-04-18  3:02                                         ` George Spelvin
2009-04-06  8:13                             ` [GIT PULL] Ext3 latency fixes Jens Axboe
2009-04-05 18:56                           ` Arjan van de Ven
2009-04-05 19:34                             ` Linus Torvalds
2009-04-05 20:06                               ` Arjan van de Ven
2009-04-06  6:25                               ` Jens Axboe
2009-04-06  6:05                           ` Theodore Tso
2009-04-06  6:23                           ` Jens Axboe
2009-04-06  8:16                       ` Jens Axboe
2009-04-06 14:48                         ` Linus Torvalds
2009-04-06 15:09                           ` Jens Axboe
2009-04-06  6:15                     ` Jens Axboe [this message]
2009-04-04 20:18               ` Ingo Molnar
2009-04-06 21:50                 ` Lennart Sorensen
2009-04-07 13:31                   ` Mark Lord
2009-04-07 14:48                     ` Lennart Sorensen
2009-04-07 19:21                       ` Mark Lord
2009-04-07 19:57                         ` Lennart Sorensen
2009-04-04 20:56               ` Arjan van de Ven
2009-04-06  7:06                 ` Jens Axboe
2009-04-07 15:39             ` Indan Zupancic
2009-04-04 19:18           ` Theodore Tso
2009-04-06  8:12             ` Jens Axboe
2009-04-04 22:13         ` Linus Torvalds
2009-04-04 22:19           ` Linus Torvalds
2009-04-05  0:20           ` Theodore Tso
2009-04-03 19:54   ` Theodore Tso
  -- strict thread matches above, loose matches on Subject: below --
2009-04-08 23:40 Theodore Ts'o
2009-04-09 15:49 ` Linus Torvalds
2009-04-09 16:23   ` Chris Mason
2009-04-09 17:49     ` Jan Kara
2009-04-09 18:10       ` Chris Mason
2009-04-09 19:04         ` Jan Kara
2009-04-09 17:36   ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090406061558.GL5178@kernel.dk \
    --to=jens.axboe@oracle.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.