From: Tomasz Chmielewski <mangoo@wpkg.org>
To: Theodore Tso <tytso@mit.edu>, Andi Kleen <andi@firstfloor.org>,
LKML <linux-fsdevel@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: very poor ext3 write performance on big filesystems?
Date: Wed, 27 Feb 2008 12:20:19 +0100
Message-ID: <47C54773.4040402@wpkg.org>
In-Reply-To: <20080218141640.GC12568@mit.edu>
Theodore Tso wrote:
> On Mon, Feb 18, 2008 at 03:03:44PM +0100, Andi Kleen wrote:
>> Tomasz Chmielewski <mangoo@wpkg.org> writes:
>>> Is it normal to expect the write speed go down to only few dozens of
>>> kilobytes/s? Is it because of that many seeks? Can it be somehow
>>> optimized?
>> I have similar problems on my linux source partition which also
>> has a lot of hard linked files (although probably not quite
>> as many as you do). It seems like hard linking prevents
>> some of the heuristics ext* uses to generate non fragmented
>> disk layouts and the resulting seeking makes things slow.
A follow-up to this thread.

Small optimizations like playing with /proc/sys/vm/* didn't help much,
and increasing the "commit=" ext3 mount option helped only a tiny bit.

What *did* help a lot was... disabling the internal bitmap of the RAID-5
array. "rm -rf" no longer "pauses" for several seconds.

If md and dm supported barriers, it would be even better, I guess (I
could enable the write cache with some degree of confidence).
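
The message does not say which command was used to drop the bitmap. A minimal sketch, assuming the array is managed with mdadm and using its `--grow --bitmap=none` / `--bitmap=internal` options; the device name `/dev/md0` and the helper names are placeholders, not taken from this thread. The helper only prints the command unless `dry_run` is switched off.

```python
import subprocess


def bitmap_command(array, mode):
    """Build the mdadm argv that sets the write-intent bitmap mode."""
    if mode not in ("none", "internal"):
        raise ValueError(f"unsupported bitmap mode: {mode!r}")
    return ["mdadm", "--grow", f"--bitmap={mode}", array]


def set_bitmap(array, mode, dry_run=True):
    """Print the command; run it (root required) only when dry_run is False."""
    argv = bitmap_command(array, mode)
    print(" ".join(argv))
    if not dry_run:
        subprocess.run(argv, check=True)
    return argv


# /dev/md0 is a placeholder; the array name is not given in this message.
set_bitmap("/dev/md0", "none")      # drop the internal bitmap
set_bitmap("/dev/md0", "internal")  # add it back later
```

Keep in mind the trade-off: without a write-intent bitmap, an unclean shutdown leads to a full resync of the array instead of a partial one.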
This is "iostat sda -d 10" output without the internal bitmap.
The system mostly tries to read (Blk_read/s), and once in a while it
does a big commit (Blk_wrtn/s):
Device:     tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sda      164,67     2088,62        0,00     20928         0
sda      180,12     1999,60        0,00     20016         0
sda      172,63     2587,01        0,00     25896         0
sda      156,53     2054,64        0,00     20608         0
sda      170,20     3013,60        0,00     30136         0
sda      119,46     1377,25     5264,67     13800     52752
sda      154,05     1897,10        0,00     18952         0
sda      197,70     2177,02        0,00     21792         0
sda      166,47     1805,19        0,00     18088         0
sda      150,95     1552,05        0,00     15536         0
sda      158,44     1792,61        0,00     17944         0
sda      132,47     1399,40     3781,82     14008     37856
With the bitmap enabled, it sometimes behaves similarly, but mostly I
can see reads competing with writes, and both then show very low numbers:
Device:     tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sda      112,57      946,11     5837,13      9480     58488
sda      157,24     1858,94        0,00     18608         0
sda      116,90     1173,60       44,00     11736       440
sda       24,05       85,43      172,46       856      1728
sda       25,60       90,40      165,60       904      1656
sda       25,05      276,25      180,44      2768      1808
sda       22,70       65,60      229,60       656      2296
sda       21,66      202,79      786,43      2032      7880
sda       20,90       83,20     1800,00       832     18000
sda       51,75      237,36      479,52      2376      4800
sda       35,43      129,34      245,91      1296      2464
sda       34,50       88,00      270,40       880      2704
Now, let's disable the bitmap in the RAID-5 array:
Device:     tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sda      110,59      536,26      973,43      5368      9744
sda      119,68      533,07     1574,43      5336     15760
sda      123,78      368,43     2335,26      3688     23376
sda      122,48      315,68     1990,01      3160     19920
sda      117,08      580,22     1009,39      5808     10104
sda      119,50      324,00     1080,80      3240     10808
sda      118,36      353,69     1926,55      3544     19304
And let's enable it again - after a while, performance degrades again,
and I can see "rm -rf" stopping for longer periods:
Device:     tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sda      162,70     2213,60        0,00     22136         0
sda      165,73     1639,16        0,00     16408         0
sda      119,76     1192,81     3722,16     11952     37296
sda      178,70     1855,20        0,00     18552         0
sda      162,64     1528,07        0,80     15296         8
sda      182,87     2082,07        0,00     20904         0
sda      168,93     1692,71        0,00     16944         0
sda      177,45     1572,06        0,00     15752         0
sda      123,10     1436,00     4941,60     14360     49416
sda      201,30     1984,03        0,00     19880         0
sda      165,50     1555,20       22,40     15552       224
sda       25,35      273,05      189,22      2736      1896
sda       22,58       63,94      165,43       640      1656
sda       69,40      435,20      262,40      4352      2624
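
To compare the runs above without eyeballing every sample, the iostat output can be averaged per run. The following is a rough sketch, not the method used for this message: the function names are made up for illustration, and it assumes the decimal-comma format shown above. The embedded samples are copied from the slow "bitmap enabled" phase and from the run after the bitmap was disabled.

```python
from statistics import mean


def parse_iostat(text, device="sda"):
    """Return (tps, blk_read_per_s, blk_wrtn_per_s) tuples for one device."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] != device:
            continue  # skips the "Device:" header lines and blank lines
        # The output uses a decimal comma (locale setting), so convert it.
        tps, rd, wr = (float(f.replace(",", ".")) for f in fields[1:4])
        samples.append((tps, rd, wr))
    return samples


def summarize(samples):
    """Mean tps, read rate and write rate over all samples."""
    tps, rd, wr = zip(*samples)
    return mean(tps), mean(rd), mean(wr)


# Three of the slow samples taken with the bitmap enabled.
with_bitmap = """\
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 24,05 85,43 172,46 856 1728
sda 25,60 90,40 165,60 904 1656
sda 22,70 65,60 229,60 656 2296
"""
# Three samples taken after the bitmap was disabled.
without_bitmap = """\
sda 110,59 536,26 973,43 5368 9744
sda 119,68 533,07 1574,43 5336 15760
sda 123,78 368,43 2335,26 3688 23376
"""

for label, text in (("bitmap", with_bitmap), ("no bitmap", without_bitmap)):
    tps, rd, wr = summarize(parse_iostat(text))
    print(f"{label:>10}: tps={tps:.1f} read={rd:.1f} write={wr:.1f}")
```

Feeding the complete runs into `parse_iostat` instead of these excerpts gives per-run averages that are easier to compare than the raw listings.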
There is a related thread (although not very kernel-related) on the
BackupPC mailing list:
http://thread.gmane.org/gmane.comp.sysutils.backup.backuppc.general/14009
It is the BackupPC software that creates this many hardlinks (but hey,
I can keep ~14 TB of data on a 1.2 TB filesystem which is not even 65%
full).
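
As a rough check of what those figures imply, the hardlink pooling ratio can be estimated from the numbers quoted above. This is only back-of-the-envelope arithmetic: "~14 TB" is approximate and "not even 65%" is an upper bound on usage, so the result is a lower bound on the ratio.

```python
logical_tb = 14.0      # data visible through hardlinks, "~14 TB"
filesystem_tb = 1.2    # size of the filesystem
max_fill = 0.65        # "not even 65% full", so an upper bound

# At most this much disk space is actually occupied.
used_upper_tb = filesystem_tb * max_fill

# Each stored byte is therefore referenced at least this many times on average.
ratio_lower = logical_tb / used_upper_tb

print(f"used at most {used_upper_tb:.2f} TB, pooling ratio at least {ratio_lower:.1f}x")
```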
--
Tomasz Chmielewski
http://wpkg.org