linux-ext4.vger.kernel.org archive mirror
From: Kay Diederichs <Kay.Diederichs@uni-konstanz.de>
To: Greg Freemyer <greg.freemyer@gmail.com>
Cc: linux <linux-kernel@vger.kernel.org>,
	Ext4 Developers List <linux-ext4@vger.kernel.org>,
	Karsten Schaefer <karsten.schaefer@uni-konstanz.de>
Subject: Re: ext4 performance regression 2.6.27-stable versus 2.6.32 and later
Date: Mon, 02 Aug 2010 12:47:28 +0200
Message-ID: <4C56A240.1040506@uni-konstanz.de>
In-Reply-To: <AANLkTimh3eKc-M4xphq0djnSa=4W4hUf7KRf=icdF9Rk@mail.gmail.com>


Greg Freemyer wrote:
> On Wed, Jul 28, 2010 at 3:51 PM, Kay Diederichs
> <Kay.Diederichs@uni-konstanz.de> wrote:
>> Dear all,
>>
>> we reproducibly find significantly worse ext4 performance when our
>> fileservers run 2.6.32 or later kernels, when compared to the
>> 2.6.27-stable series.
>>
>> The hardware is RAID5 of 5 1TB WD10EACS disks (giving almost 4TB) in an
>> external eSATA enclosure (STARDOM ST6600); disks are not partitioned but
>> rather the complete disks are used:
>> md5 : active raid5 sde[0] sdg[5] sdd[3] sdc[2] sdf[1]
>>    3907045376 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
>>
>> The enclosure is connected using a Silicon Image (supported by
>> sata_sil24) PCIe-X1 adapter to one of our fileservers (either the backup
>> fileserver, 32bit desktop hardware with Intel(R) Pentium(R) D CPU
>> 3.40GHz, or a production-fileserver 64bit Precision WorkStation 670 w/ 2
>> Xeon 3.2GHz).
>>
>> The ext4 filesystem was created using
>> mke2fs -j -T largefile -E stride=128,stripe_width=512 -O extent,uninit_bg
>> It is mounted with noatime,data=writeback
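
For reference, the stride and stripe_width values above follow from the
512k chunk size and the 4 data disks of the 5-disk RAID5; a quick sanity
check (assuming the default 4 KiB ext4 block size):

```shell
# stride = chunk size / block size; stripe_width = stride * data disks
# Assumptions: 512 KiB chunk, 4 KiB blocks, 5-disk RAID5 => 4 data disks
chunk_kb=512
block_kb=4
data_disks=4
stride=$((chunk_kb / block_kb))
stripe_width=$((stride * data_disks))
echo "stride=$stride stripe_width=$stripe_width"   # stride=128 stripe_width=512
```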
>>
>> As operating system we usually use RHEL5.5, but to exclude problems
>> with self-compiled kernels, we also booted USB sticks with the latest
>> Fedora 12 and FC13.
>>
>> Our benchmarks consist of copying 100 6MB files from and to the RAID5,
>> over NFS (NFSv3, GB ethernet, TCP, async export), and tar-ing and
>> rsync-ing kernel trees back and forth. Before and after each individual
>> benchmark part, we "sync" and "echo 3 > /proc/sys/vm/drop_caches" on
>> both the client and the server.
>>
>> The problem:
>> with 2.6.27.48 we typically get:
>>  44 seconds for preparations
>>  23 seconds to rsync 100 frames with 597M from nfs directory
>>  33 seconds to rsync 100 frames with 595M to nfs directory
>>  50 seconds to untar 24353 kernel files with 323M to nfs directory
>>  56 seconds to rsync 24353 kernel files with 323M from nfs directory
>>  67 seconds to run xds_par in nfs directory (reads and writes 600M)
>> 301 seconds to run the script
>>
>> with 2.6.32.16 we find:
>>  49 seconds for preparations
>>  23 seconds to rsync 100 frames with 597M from nfs directory
>> 261 seconds to rsync 100 frames with 595M to nfs directory
>>  74 seconds to untar 24353 kernel files with 323M to nfs directory
>>  67 seconds to rsync 24353 kernel files with 323M from nfs directory
>> 290 seconds to run xds_par in nfs directory (reads and writes 600M)
>> 797 seconds to run the script
>>
>> This is quite reproducible (times varying by about 1-2%). All times
>> include reading and writing on the client side (stock CentOS5.5 Nehalem
>> machines with fast single SATA disks). The 2.6.32.16 times are the same
>> with FC12 and FC13 (booted from USB stick).
>>
>> The 2.6.27-versus-2.6.32+ regression cannot be due to barriers because
>> md RAID5 does not support barriers ("JBD: barrier-based sync failed on
>> md5 - disabling barriers").
>>
>> What we tried: noop and deadline schedulers instead of cfq;
>> modifications of /sys/block/sd[c-g]/queue/max_sectors_kb; switching
>> on/off NCQ; blockdev --setra 8192 /dev/md5; increasing
>> /sys/block/md5/md/stripe_cache_size
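
For anyone wanting to repeat these tuning attempts, they amount to
something like the following (device names as in our setup; the values
shown are illustrative, not necessarily the exact ones we tested):

```shell
# I/O scheduler and maximum request size for each RAID member disk
for d in sdc sdd sde sdf sdg; do
    echo deadline > /sys/block/$d/queue/scheduler
    echo 512 > /sys/block/$d/queue/max_sectors_kb
done

# readahead on the md device, and a larger RAID5 stripe cache
blockdev --setra 8192 /dev/md5
echo 4096 > /sys/block/md5/md/stripe_cache_size
```

These writes only take effect on the running system and need root; none
of them survived a reboot, so each benchmark run set them explicitly.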
>>
>> When looking at the I/O statistics while the benchmark is running, we
>> see very choppy patterns for 2.6.32, but quite smooth stats for
>> 2.6.27-stable.
>>
>> It is not an NFS problem; we see the same effect when transferring the
>> data using an rsync daemon. We believe, but are not sure, that the
>> problem does not exist with ext3 - it's not so quick to re-format a 4 TB
>> volume.
>>
>> Any ideas? We cannot believe that a general ext4 regression would have
>> gone unnoticed, so is it due to the interaction of ext4 with md RAID5?
>>
>> thanks,
>>
>> Kay
> 
> Kay,
> 
> I didn't read your whole e-mail, but 2.6.27 has known issues with
> barriers not working in many raid configs.  Thus it is more likely to
> experience data loss in the event of a power failure.
> 
> With newer kernels, if you prefer performance over robustness, you can
> mount with the "nobarrier" option.
> 
> So now you have a choice, whereas with 2.6.27 and RAID5 you
> effectively had nobarrier as your only option.
> 
> Greg

Greg,

2.6.33 and later support write barriers on md RAID5, whereas
2.6.27-stable does not. I looked through the 2.6.32.* changelogs at
http://kernel.org/pub/linux/kernel/v2.6/ but could not find anything
indicating that md RAID5 write barrier support was backported to
2.6.32-stable.

Anyway, we do not get the message "JBD: barrier-based sync failed on md5
- disabling barriers" with 2.6.32.16, which might indicate that write
barriers are indeed active when no barrier-related mount options are given.
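
The quickest check we know of is to look for that JBD message in the
kernel log after mounting; its absence suggests barriers are in effect
(a diagnostic sketch, not a definitive test):

```shell
# If md rejected barriers at mount time, jbd logs the message quoted
# above; grep the kernel ring buffer for it.
if dmesg | grep -q 'barrier-based sync failed'; then
    echo "write barriers were disabled by JBD"
else
    echo "no JBD barrier failure logged - barriers presumably active"
fi
```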

Performance-wise, we tried mounting with barrier versus nobarrier (or
barrier=1 versus barrier=0) and re-ran the 2.6.32+ benchmarks. It turned
out that the difference with and without barriers is smaller than the
variation between runs (which is much higher with 2.6.32+ than with
2.6.27-stable), so the influence of barriers seems to be minor.
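
For completeness, the A/B comparison was done by toggling the mount
option between runs, along these lines (the mount point /mnt/raid is an
assumption for illustration):

```shell
# Run the benchmark once with barriers enabled, once disabled.
mount -o remount,barrier=1 /dev/md5 /mnt/raid
# ... run benchmark script, record times ...
mount -o remount,barrier=0 /dev/md5 /mnt/raid
# ... run benchmark script again and compare ...
```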

best,

Kay



Thread overview: 15+ messages
2010-07-28 19:51 ext4 performance regression 2.6.27-stable versus 2.6.32 and later Kay Diederichs
2010-07-28 21:00 ` Greg Freemyer
2010-08-02 10:47   ` Kay Diederichs [this message]
2010-08-02 16:04     ` Henrique de Moraes Holschuh
2010-08-02 16:10       ` Henrique de Moraes Holschuh
2010-07-29 23:28 ` Dave Chinner
2010-08-02 14:52   ` Kay Diederichs
2010-08-02 16:12     ` Eric Sandeen
2010-08-02 21:08       ` Kay Diederichs
2010-08-03 13:31       ` Kay Diederichs
2010-07-30  2:20 ` Ted Ts'o
2010-07-30 21:01   ` Kay Diederichs
2010-08-01 23:02     ` Ted Ts'o
2010-08-02 15:28   ` Kay Diederichs
     [not found]   ` <4C56E47B.8080600@uni-konstanz.de>
     [not found]     ` <20100802202123.GC25653@thunk.org>
2010-08-04  8:18       ` Kay Diederichs
