Re: [PATCH] ext4: reduce scheduling latency with delayed allocation

From: Michal Schmidt <mschmidt@redhat.com>
To: tytso@mit.edu
Cc: linux-kernel@vger.kernel.org, linux-ext4@vger.kernel.org
Subject: Re: [PATCH] ext4: reduce scheduling latency with delayed allocation
Date: Wed, 10 Mar 2010 14:09:34 +0100
Message-ID: <20100310140934.60c06148@leela>
In-Reply-To: <20100302030619.GB6077@thunk.org>

On Mon, 1 Mar 2010 22:06:19 -0500 tytso@mit.edu wrote:
> On Mon, Mar 01, 2010 at 01:34:35PM +0100, Michal Schmidt wrote:
> > mpage_da_submit_io() may process tens of thousands of pages at a
> > time. Unless full preemption is enabled, this causes scheduling
> > latencies on the order of tens of milliseconds.
> > 
> > It can be reproduced simply by writing a big file on ext4
> > repeatedly with dd if=/dev/zero of=/tmp/dummy bs=10M count=50
> > 
> > The patch fixes it by allowing rescheduling inside the loop.
> 
> Have you tested for any performance regressions as a result of this
> patch, using some file system benchmarks?

I used the 'fio' benchmark to test sequential write speed. Here are the
results:

           Test               kernel   aggregate bandwidth
 ------------------------------------------------------
      hdd-multi     2.6.33.nopreempt   32.7 ±  3.5 MB/s
      hdd-multi        2.6.33.reduce   33.8 ±  3.7 MB/s
      hdd-multi       2.6.33.preempt   33.4 ±  3.1 MB/s

     hdd-single     2.6.33.nopreempt   35.9 ±  2.1 MB/s
     hdd-single        2.6.33.reduce   36.6 ±  2.3 MB/s
     hdd-single       2.6.33.preempt   35.9 ±  2.0 MB/s

  ramdisk-multi     2.6.33.nopreempt  189.7 ±  9.2 MB/s
  ramdisk-multi        2.6.33.reduce  191.4 ±  9.5 MB/s
  ramdisk-multi       2.6.33.preempt  163.5 ±  9.4 MB/s

 ramdisk-single     2.6.33.nopreempt  152.3 ± 10.9 MB/s
 ramdisk-single        2.6.33.reduce  171.3 ± 17.0 MB/s
 ramdisk-single       2.6.33.preempt  144.2 ± 15.2 MB/s

The tests were run on a laptop with dual AMD Turion 2 GHz, 2 GB RAM.
A newly created filesystem was used for every fio run.
In the 'hdd' tests the filesystem was on a 24 GB LV on a hard disk. These
tests were repeated 12 times.
 - In the '-single' variant a single process wrote a 5 GB file.
 - In the '-multi' variant 5 processes wrote a 1 GB file each.
In the 'ramdisk' tests the filesystem was on a 1.5 GB ramdisk. These
tests were repeated more than 40 times.
 - In the '-single' variant a single process wrote a 1400 MB file.
 - In the '-multi' variant 5 processes wrote a 280 MB file each.
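
As a rough illustration of the four workloads described above, the
following sketch builds fio command lines from the process counts and
file sizes given in this message. The job name, the /mnt/test mount
point and the helper function are hypothetical; the block size, I/O
engine and any other options used in the actual runs are not stated
here, so fio defaults are assumed.

```python
def fio_seq_write_args(directory, jobs, size_per_job):
    """Build a fio argv for `jobs` processes, each writing one file sequentially."""
    return [
        "fio",
        "--name=seqwrite",            # job name (hypothetical)
        "--rw=write",                 # sequential writes
        f"--size={size_per_job}",     # amount written by each process
        f"--numjobs={jobs}",          # number of identical processes
        f"--directory={directory}",   # where the test files are created
    ]


# Process counts and sizes are from the description above;
# /mnt/test is a placeholder for the freshly created filesystem.
variants = {
    "hdd-single": fio_seq_write_args("/mnt/test", 1, "5g"),
    "hdd-multi": fio_seq_write_args("/mnt/test", 5, "1g"),
    "ramdisk-single": fio_seq_write_args("/mnt/test", 1, "1400m"),
    "ramdisk-multi": fio_seq_write_args("/mnt/test", 5, "280m"),
}

for name, argv in variants.items():
    print(name, " ".join(argv))
```

The script only prints the command lines; it does not run fio.
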
The kernels were:
 '2.6.33.nopreempt' - vanilla 2.6.33 with CONFIG_PREEMPT_NONE
 '2.6.33.reduce'    - the same + the patch to add the cond_resched()
 '2.6.33.preempt'   - 2.6.33 with CONFIG_PREEMPT (out of curiosity)
The data for 'aggregate bandwidth' were taken from fio's 'aggrb' result.
The margin of error reported in the table is twice the standard deviation.
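
A minimal sketch of how a "mean ± 2 * standard deviation" figure can be
derived from repeated runs. The per-run bandwidths below are made-up
placeholders, not the measured data, and the use of the sample standard
deviation (rather than the population one) is an assumption; the
message does not say which was used.

```python
import statistics


def mean_and_margin(samples):
    """Return (mean, 2 * sample standard deviation) for a list of measurements."""
    mean = statistics.mean(samples)
    margin = 2 * statistics.stdev(samples)
    return mean, margin


# Hypothetical aggregate bandwidths in MB/s from 12 repeated runs.
runs = [33.1, 31.0, 34.5, 32.2, 30.9, 35.0, 33.8, 31.7, 32.9, 34.1, 30.5, 33.3]

mean, margin = mean_and_margin(runs)
print(f"{mean:.1f} ± {margin:.1f} MB/s")
```
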

Conclusion: Adding the cond_resched() did not cause any measurable
decrease in sequential write performance. (The results show a
performance increase, but it is within the margin of error.)
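
The "within the margin of error" reading can be checked mechanically
against the table. The sketch below uses the nopreempt and reduce rows
as reported and tests whether the two ± intervals overlap. Interval
overlap is only a rough criterion chosen for illustration, not a formal
significance test.

```python
# (mean, margin) in MB/s, copied from the table: nopreempt vs. reduce.
results = {
    "hdd-multi": ((32.7, 3.5), (33.8, 3.7)),
    "hdd-single": ((35.9, 2.1), (36.6, 2.3)),
    "ramdisk-multi": ((189.7, 9.2), (191.4, 9.5)),
    "ramdisk-single": ((152.3, 10.9), (171.3, 17.0)),
}


def intervals_overlap(a, b):
    """True if the intervals mean_a ± err_a and mean_b ± err_b intersect."""
    (mean_a, err_a), (mean_b, err_b) = a, b
    return abs(mean_a - mean_b) <= err_a + err_b


for test, (nopreempt, reduce_) in results.items():
    print(f"{test:>15}: overlap = {intervals_overlap(nopreempt, reduce_)}")
```

All four tests report an overlap, including ramdisk-single, where the
difference of 19.0 MB/s is smaller than the combined margins of
27.9 MB/s.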

> I don't think this is the best way to fix this problem, though.  The
> real right answer is to change how the code is structured.  All of the
> callsites that call mpage_da_submit_io() are immediately preceded by
> mpage_da_map_blocks().  These two functions should be combined and
> instead of calling ext4_writepage() for each page,
> mpage_da_map_and_write_blocks() should make a single call to
> submit_bio() for each extent.  That should be far more CPU efficient,
> solving both your scheduling latency issue as well as helping out for
> benchmarks that strive to stress both the disk and CPU simultaneously
> (such as for example the TPC benchmarks).
> 
> This will also make our blktrace results much more compact, and Chris
> Mason will be very happy about that!

You're almost certainly right, but I'm not likely to make such a change
in the near future.
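
For illustration only, the idea quoted above (one submission per extent
instead of one per page) can be modelled in user space as grouping
contiguous block numbers into runs. This is a toy model, not ext4 code:
the function name and the block numbers are hypothetical, and nothing
here calls submit_bio().

```python
def group_into_extents(blocks):
    """Group sorted block numbers into (start, length) runs of contiguous blocks."""
    extents = []
    for block in blocks:
        if extents and block == extents[-1][0] + extents[-1][1]:
            # The block directly follows the current run, so extend it.
            start, length = extents[-1]
            extents[-1] = (start, length + 1)
        else:
            # A gap in the block numbers starts a new run.
            extents.append((block, 1))
    return extents


# Hypothetical on-disk block numbers for seven dirty pages.
blocks = [100, 101, 102, 103, 200, 201, 500]
extents = group_into_extents(blocks)

print("per-page submissions:  ", len(blocks))
print("per-extent submissions:", len(extents))
print(extents)
```

In this model the seven pages collapse into three submissions, which is
the kind of reduction the combined function is meant to achieve.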

Michal