From: Kay Diederichs <Kay.Diederichs@uni-konstanz.de>
To: Eric Sandeen <sandeen@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>,
    linux <linux-kernel@vger.kernel.org>,
    Ext4 Developers List <linux-ext4@vger.kernel.org>,
    Karsten Schaefer <karsten.schaefer@uni-konstanz.de>,
    Ted Ts'o <tytso@mit.edu>
Subject: Re: ext4 performance regression 2.6.27-stable versus 2.6.32 and later
Date: Tue, 03 Aug 2010 15:31:16 +0200
Message-ID: <4C581A24.6090709@uni-konstanz.de>
In-Reply-To: <4C56EE67.4070905@redhat.com>
Eric Sandeen wrote:
> On 08/02/2010 09:52 AM, Kay Diederichs wrote:
>> Dave,
>>
>> as you suggested, we reverted "ext4: Avoid group preallocation for
>> closed files" and this indeed fixes a big part of the problem: after
>> booting the NFS server we get
>>
>> NFS-Server: turn5 2.6.32.16p i686
>> NFS-Client: turn10 2.6.18-194.8.1.el5 x86_64
>>
>> exported directory on the nfs-server:
>> /dev/md5 /mnt/md5 ext4
>> rw,seclabel,noatime,barrier=1,stripe=512,data=writeback 0 0
>>
>> 48 seconds for preparations
>> 28 seconds to rsync 100 frames with 597M from nfs directory
>> 57 seconds to rsync 100 frames with 595M to nfs directory
>> 70 seconds to untar 24353 kernel files with 323M to nfs directory
>> 57 seconds to rsync 24353 kernel files with 323M from nfs directory
>> 133 seconds to run xds_par in nfs directory
>> 425 seconds to run the script
>
> Interesting, I had found this commit to be a problem for small files
> which are constantly created & deleted; the commit had the effect of
> packing the newly created files in the first free space that could be
> found, rather than walking down the disk leaving potentially fragmented
> freespace behind (see seekwatcher graph attached). Reverting the patch
> sped things up for this test, but left the filesystem freespace in bad
> shape.
>
> But you seem to see one of the largest effects in here:
>
> 261 seconds to rsync 100 frames with 595M to nfs directory
> vs
> 57 seconds to rsync 100 frames with 595M to nfs directory
>
> with the patch reverted making things go faster. So you are doing 100
> 6MB writes to the server, correct? Is the filesystem mkfs'd fresh
> before each test, or is it aged? If not mkfs'd, is it at least
> completely empty prior to the test, or does data remain on it? I'm just
> wondering if fragmented freespace is contributing to this behavior as
> well. If there is fragmented freespace, then with the patch I think the
> allocator is more likely to hunt around for small discontiguous chunks
> of free space, rather than going further out in the disk looking for a
> large area to allocate from.
>
> It might be interesting to use seekwatcher on the server to visualize
> the allocation/IO patterns for the test running just this far?
>
> -Eric
>
>
Eric,
seekwatcher does not seem to understand the blktrace output of old
kernels, so I rolled my own primitive plotting, e.g.:
# turn the binary blktrace data into one text file
blkparse -i md5.xds_par.2.6.32.16p_run1 > blkparse.out
# split the events by process (flush/pdflush, nfsd, sync) and by
# direction (W = writes, R = reads)
grep flush blkparse.out | grep W > flush_W
grep flush blkparse.out | grep R > flush_R
grep nfsd blkparse.out | grep R > nfsd_R
grep nfsd blkparse.out | grep W > nfsd_W
grep sync blkparse.out | grep R > sync_R
grep sync blkparse.out | grep W > sync_W
# in blkparse output, column 4 is the timestamp (seconds) and
# column 8 the starting block of each request
gnuplot<<EOF
set term png
set out '2.6.32.16p_run1.png'
set key outside
set title "2.6.32.16p_run1"
plot 'nfsd_W' us 4:8, 'flush_W' us 4:8, 'sync_W' us 4:8, \
     'nfsd_R' us 4:8, 'flush_R' us 4:8
EOF
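
(For completeness: the traces themselves come from blktrace. I am
sketching the capture step here, assuming the exported RAID device and
the naming used above:

# record block-layer events on the exported device while the test runs;
# this produces md5.xds_par.2.6.32.16p_run1.blktrace.<cpu> files,
# which is what the blkparse -i step above picks up
blktrace -d /dev/md5 -o md5.xds_par.2.6.32.16p_run1

Stop it with ^C when the run is done, or bound it with -w <seconds>.)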
I attach the resulting plots for 2.6.27.48_run1 (directly after
booting) and 2.6.27.48_run2 (after run1, followed by a sync and
dropping the caches via /proc/sys/vm/drop_caches), together with the
corresponding plots for patched and stock 2.6.32.16. They show seconds
on the x axis (horizontal) and block numbers on the y axis (vertical):
512-byte blocks, I suppose, since the ext4 filesystem has 976761344
4096-byte blocks, which works out to about 7.8e+09 512-byte blocks.
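
Spelled out, since the unit conversion is easy to get wrong (plain
shell arithmetic, nothing filesystem-specific):

# 4096-byte filesystem blocks -> 512-byte blocks: factor 4096/512 = 8
echo $((976761344 * 8))                        # 7814090752, ~7.8e+09
# sanity check: total filesystem size in TiB
awk 'BEGIN { print 976761344 * 4096 / 2^40 }'  # ~3.64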
You'll have to do the real interpretation of the plots yourself, but
even someone who does not know exactly what the pdflush (in 2.6.27.48)
or flush (in 2.6.32+) kernel threads are supposed to do can tell that
the kernels behave _very_ differently.
In particular, stock 2.6.32.16 has the flush thread visiting all of the
filesystem every time (only run1 is shown, but run2 looks the same), in
steps of 263168 blocks. I have no idea why it does this.
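
For what it's worth, some back-of-the-envelope arithmetic (my
interpretation only, assuming the y axis really is 512-byte blocks):
the stride is almost exactly one default ext4 block group.

# the observed stride, in MiB
awk 'BEGIN { print 263168 * 512 / 2^20 }'  # 128.5
# one default ext4 block group (32768 blocks of 4096 bytes),
# expressed in 512-byte units
echo $((32768 * 4096 / 512))               # 262144, i.e. 128 MiB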
Roughly the first 1/3 of the filesystem is also visited by 2.6.27.48
and by 2.6.32.16 with the patch Dave Chinner suggested, but only in the
first run after booting. Subsequent runs are fast and do not employ the
flush thread much.
Hope this helps to pin down the regression.
thanks,
Kay
[-- Attachment #1.2: 2.6.27.48_run1.png --]
[-- Type: image/png, Size: 5146 bytes --]
[-- Attachment #1.3: 2.6.27.48_run2.png --]
[-- Type: image/png, Size: 4484 bytes --]
[-- Attachment #1.4: 2.6.32.16p_run1.png --]
[-- Type: image/png, Size: 4935 bytes --]
[-- Attachment #1.5: 2.6.32.16p_run2.png --]
[-- Type: image/png, Size: 4443 bytes --]
[-- Attachment #1.6: 2.6.32.16.png --]
[-- Type: image/png, Size: 5359 bytes --]