Re: [PATCH] ext4: reduce scheduling latency with delayed allocation

From: Michal Schmidt <mschmidt@redhat.com>
To: tytso@mit.edu
Cc: linux-kernel@vger.kernel.org, linux-ext4@vger.kernel.org
Subject: Re: [PATCH] ext4: reduce scheduling latency with delayed allocation
Date: Wed, 10 Mar 2010 14:09:34 +0100
Message-ID: <20100310140934.60c06148@leela>
In-Reply-To: <20100302030619.GB6077@thunk.org>

On Mon, 1 Mar 2010 22:06:19 -0500 tytso@mit.edu wrote:
> On Mon, Mar 01, 2010 at 01:34:35PM +0100, Michal Schmidt wrote:
> > mpage_da_submit_io() may process tens of thousands of pages at a
> > time. Unless full preemption is enabled, it causes scheduling
> > latencies on the order of tens of milliseconds.
> > 
> > It can be reproduced simply by writing a big file on ext4
> > repeatedly with dd if=/dev/zero of=/tmp/dummy bs=10M count=50
> > 
> > The patch fixes it by allowing the loop to reschedule.
> 
> Have you tested for any performance regressions as a result of this
> patch, using some file system benchmarks?

I used the 'fio' benchmark to test sequential write speed. Here are the
results:

           Test               kernel   aggregate bandwidth
 ------------------------------------------------------
      hdd-multi     2.6.33.nopreempt   32.7 ±  3.5 MB/s
      hdd-multi        2.6.33.reduce   33.8 ±  3.7 MB/s
      hdd-multi       2.6.33.preempt   33.4 ±  3.1 MB/s

     hdd-single     2.6.33.nopreempt   35.9 ±  2.1 MB/s
     hdd-single        2.6.33.reduce   36.6 ±  2.3 MB/s
     hdd-single       2.6.33.preempt   35.9 ±  2.0 MB/s

  ramdisk-multi     2.6.33.nopreempt  189.7 ±  9.2 MB/s
  ramdisk-multi        2.6.33.reduce  191.4 ±  9.5 MB/s
  ramdisk-multi       2.6.33.preempt  163.5 ±  9.4 MB/s

 ramdisk-single     2.6.33.nopreempt  152.3 ± 10.9 MB/s
 ramdisk-single        2.6.33.reduce  171.3 ± 17.0 MB/s
 ramdisk-single       2.6.33.preempt  144.2 ± 15.2 MB/s

The tests were run on a laptop with dual AMD Turion 2 GHz, 2 GB RAM.
A newly created filesystem was used for every fio run.
In the 'hdd' tests the filesystem was on a 24 GB LV on a hard disk. These
tests were repeated 12 times.
 - In the '-single' variant a single process wrote a 5 GB file.
 - In the '-multi' variant 5 processes wrote a 1 GB file each.
In the 'ramdisk' tests the filesystem was on a 1.5 GB ramdisk. These
tests were repeated >40 times.
 - In the '-single' variant a single process wrote a 1400 MB file.
 - In the '-multi' variant 5 processes wrote a 280 MB file each.
The kernels were:
 '2.6.33.nopreempt' - vanilla 2.6.33 with CONFIG_PREEMPT_NONE
 '2.6.33.reduce'    - the same + the patch to add the cond_resched()
 '2.6.33.preempt'   - 2.6.33 with CONFIG_PREEMPT (for curiosity)
The data for 'aggregate bandwidth' were taken from fio's 'aggrb' result.
The margin of error as reported in the table is 2 * standard deviation.
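
For illustration, here is a minimal Python sketch of how a "mean ± 2 *
standard deviation" figure can be computed from repeated runs. The sample
values and the summarize() helper are hypothetical, and the sketch assumes
the sample standard deviation; the numbers below are not measurements.

```python
import statistics

def summarize(samples):
    """Return (mean, margin), where margin is 2 * sample standard deviation."""
    mean = statistics.mean(samples)
    margin = 2 * statistics.stdev(samples)
    return mean, margin

# Hypothetical aggrb readings in MB/s from repeated fio runs (not real data).
runs = [33.0, 35.5, 31.2, 34.8, 32.9, 35.4]
mean, margin = summarize(runs)
print(f"{mean:.1f} ± {margin:.1f} MB/s")
```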

Conclusion: Adding the cond_resched() did not cause any measurable
decrease in sequential write performance. (The results show an
increase, but it's within the margin of error.)
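
As a rough check of the "within the margin of error" statement, the
following Python sketch tests whether the mean ± margin intervals of the
'nopreempt' and 'reduce' kernels overlap for each test. The values are
copied from the table; the intervals_overlap() helper is a hypothetical
name, and overlapping intervals are only an informal indication, not a
formal significance test.

```python
# (mean, margin) in MB/s, copied from the results table.
results = {
    "hdd-multi": {"nopreempt": (32.7, 3.5), "reduce": (33.8, 3.7)},
    "hdd-single": {"nopreempt": (35.9, 2.1), "reduce": (36.6, 2.3)},
    "ramdisk-multi": {"nopreempt": (189.7, 9.2), "reduce": (191.4, 9.5)},
    "ramdisk-single": {"nopreempt": (152.3, 10.9), "reduce": (171.3, 17.0)},
}

def intervals_overlap(a, b):
    """True if [mean - margin, mean + margin] of a and b intersect."""
    (mean_a, err_a), (mean_b, err_b) = a, b
    return abs(mean_a - mean_b) <= err_a + err_b

for test, kernels in results.items():
    overlap = intervals_overlap(kernels["nopreempt"], kernels["reduce"])
    print(f"{test}: intervals overlap = {overlap}")
```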

> I don't think this is the best way to fix this problem, though.  The
> real right answer is to change how the code is structured.  All of the
> callsites that call mpage_da_submit_io() are immediately preceded by
> mpage_da_map_blocks().  These two functions should be combined and
> instead of calling ext4_writepage() for each page,
> mpage_da_map_and_write_blocks() should make a single call to
> submit_bio() for each extent.  That should be far more CPU efficient,
> solving both your scheduling latency issue as well as helping out for
> benchmarks that strive to stress both the disk and CPU simultaneously
> (such as for example the TPC benchmarks).
> 
> This will also make our blktrace results much more compact, and Chris
> Mason will be very happy about that!

You're almost certainly right, but I'm not likely to make such a change
in the near future.

Michal