From: Michal Schmidt <mschmidt@redhat.com>
To: tytso@mit.edu
Cc: linux-kernel@vger.kernel.org, linux-ext4@vger.kernel.org
Subject: Re: [PATCH] ext4: reduce scheduling latency with delayed allocation
Date: Wed, 10 Mar 2010 14:09:34 +0100
Message-ID: <20100310140934.60c06148@leela>
In-Reply-To: <20100302030619.GB6077@thunk.org>
On Mon, 1 Mar 2010 22:06:19 -0500 tytso@mit.edu wrote:
> On Mon, Mar 01, 2010 at 01:34:35PM +0100, Michal Schmidt wrote:
> > mpage_da_submit_io() may process tens of thousands of pages at a
> > time. Unless full preemption is enabled, it causes scheduling
> > latencies on the order of tens of milliseconds.
> >
> > It can be reproduced simply by writing a big file on ext4
> > repeatedly with dd if=/dev/zero of=/tmp/dummy bs=10M count=50
> >
> > The patch fixes it by allowing a reschedule inside the loop.
>
> Have you tested for any performance regressions as a result of this
> patch, using some file system benchmarks?
I used the 'fio' benchmark to test sequential write speed. Here are the
results:
Test            Kernel              Aggregate bandwidth
-------------------------------------------------------
hdd-multi       2.6.33.nopreempt     32.7 ±  3.5 MB/s
hdd-multi       2.6.33.reduce        33.8 ±  3.7 MB/s
hdd-multi       2.6.33.preempt       33.4 ±  3.1 MB/s
hdd-single      2.6.33.nopreempt     35.9 ±  2.1 MB/s
hdd-single      2.6.33.reduce        36.6 ±  2.3 MB/s
hdd-single      2.6.33.preempt       35.9 ±  2.0 MB/s
ramdisk-multi   2.6.33.nopreempt    189.7 ±  9.2 MB/s
ramdisk-multi   2.6.33.reduce       191.4 ±  9.5 MB/s
ramdisk-multi   2.6.33.preempt      163.5 ±  9.4 MB/s
ramdisk-single  2.6.33.nopreempt    152.3 ± 10.9 MB/s
ramdisk-single  2.6.33.reduce       171.3 ± 17.0 MB/s
ramdisk-single  2.6.33.preempt      144.2 ± 15.2 MB/s
The tests were run on a laptop with a dual-core 2 GHz AMD Turion and
2 GB of RAM. A newly created filesystem was used for every fio run.
In the 'hdd' tests the filesystem was on a 24 GB LV on a hard disk. These
tests were repeated 12 times.
- In the '-single' variant a single process wrote a 5 GB file.
- In the '-multi' variant 5 processes wrote a 1 GB file each.
In the 'ramdisk' tests the filesystem was on a 1.5 GB ramdisk. These
tests were repeated more than 40 times.
- In the '-single' variant a single process wrote a 1400 MB file.
- In the '-multi' variant 5 processes wrote a 280 MB file each.
The kernels were:
'2.6.33.nopreempt' - vanilla 2.6.33 with CONFIG_PREEMPT_NONE
'2.6.33.reduce' - the same plus the patch adding the cond_resched()
'2.6.33.preempt' - 2.6.33 with CONFIG_PREEMPT (out of curiosity)
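For illustration only, the kind of change carried by the '2.6.33.reduce' kernel can be mimicked in userspace: a long per-page loop with a voluntary preemption point inside it. This is a sketch, not the patch. sched_yield() stands in for cond_resched() (which, unlike sched_yield(), only gives up the CPU when the scheduler has asked for it), and submit_one_page(), submit_pages() and the resched_interval parameter are hypothetical names.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-page work done in the real loop. */
static void submit_one_page(size_t index)
{
    (void)index;
}

/*
 * Process npages pages and offer to give up the CPU every
 * resched_interval pages. Returns how many preemption points were taken.
 * In the kernel, cond_resched() would sit where sched_yield() is; it only
 * reschedules when the task has been flagged, so calling it per page is cheap.
 */
size_t submit_pages(size_t npages, size_t resched_interval)
{
    size_t yields = 0;

    assert(resched_interval > 0);
    for (size_t i = 0; i < npages; i++) {
        submit_one_page(i);
        if ((i + 1) % resched_interval == 0) {
            sched_yield(); /* userspace analogue of cond_resched() */
            yields++;
        }
    }
    return yields;
}
```

Without such a point, a non-preemptible kernel cannot switch away from the task until the whole loop over tens of thousands of pages finishes, which is the latency the patch targets.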
The data for 'aggregate bandwidth' were taken from fio's 'aggrb' result.
The margin of error reported in the table is two standard deviations.
Conclusion: Adding the cond_resched() did not result in any measurable
performance decrease of sequential writes. (The results show a
performance increase, but it's within the margin of error.)
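The "within the margin of error" reading can be checked mechanically by treating each table row as the interval mean ± margin and asking whether two intervals overlap. This is a rough heuristic, not a formal significance test; struct result and intervals_overlap() are hypothetical names, and the test values come from the table above.

```c
#include <assert.h>
#include <stdbool.h>

/* One table row: mean aggregate bandwidth and its +/- margin, in MB/s. */
struct result {
    double mean;
    double margin;
};

/* True if [a.mean - a.margin, a.mean + a.margin] and the corresponding
 * interval for b share at least one point. */
bool intervals_overlap(struct result a, struct result b)
{
    assert(a.margin >= 0.0 && b.margin >= 0.0);

    double a_lo = a.mean - a.margin, a_hi = a.mean + a.margin;
    double b_lo = b.mean - b.margin, b_hi = b.mean + b.margin;

    return a_lo <= b_hi && b_lo <= a_hi;
}
```

By this heuristic every '.reduce' row overlaps its '.nopreempt' row, whereas the ramdisk-multi CONFIG_PREEMPT row (163.5 ± 9.4) does not overlap the corresponding nopreempt row (189.7 ± 9.2).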
> I don't think this is the best way to fix this problem, though. The
> real right answer is to change how the code is structured. All of the
> callsites that call mpage_da_submit_io() are immediately preceded by
> mpage_da_map_blocks(). These two functions should be combined and,
> instead of calling ext4_writepage() for each page,
> mpage_da_map_and_write_blocks() should make a single call to
> submit_bio() for each extent. That should be far more CPU efficient,
> solving your scheduling latency issue as well as helping
> benchmarks that strive to stress both the disk and CPU simultaneously
> (such as the TPC benchmarks).
>
> This will also make our blktrace results much more compact, and Chris
> Mason will be very happy about that!
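The restructuring suggested above amounts to coalescing the mapped pages into runs that are contiguous both in the file and on disk, then issuing one I/O per run instead of one per page. The sketch below shows only the coalescing step under that assumption; struct mapped_page, struct extent and coalesce_extents() are hypothetical names, and it is not ext4's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* A page after block mapping: logical index in the file, physical block. */
struct mapped_page {
    unsigned long lblk;
    unsigned long long pblk;
};

/* A run of pages contiguous both logically and physically. */
struct extent {
    unsigned long lblk;
    unsigned long long pblk;
    size_t len;
};

/* Group n pages (sorted by lblk) into extents. out must have room for
 * n entries. Returns the number of extents written. */
size_t coalesce_extents(const struct mapped_page *pages, size_t n,
                        struct extent *out)
{
    size_t count = 0;

    assert(n == 0 || (pages != NULL && out != NULL));
    for (size_t i = 0; i < n; i++) {
        struct extent *cur = count ? &out[count - 1] : NULL;

        if (cur && pages[i].lblk == cur->lblk + cur->len &&
            pages[i].pblk == cur->pblk + cur->len) {
            cur->len++; /* page extends the current run */
        } else {
            /* Start a new run. In a real implementation each finished
             * run would become one bio handed to submit_bio(). */
            out[count].lblk = pages[i].lblk;
            out[count].pblk = pages[i].pblk;
            out[count].len = 1;
            count++;
        }
    }
    return count;
}
```

For six pages mapped to blocks 100-102, 200-201 and 300, this yields three extents, so three I/O submissions instead of six.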
You're almost certainly right, but I'm not likely to make such a change
in the near future.
Michal