Re: ext3 throughput woes on certain (possibly heavily fragmented) files

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Simon Kirby <sim@netnation.com>
To: "Stephen C. Tweedie" <sct@redhat.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: ext3 throughput woes on certain (possibly heavily fragmented) files
Date: Mon, 16 Sep 2002 15:39:11 -0700	[thread overview]
Message-ID: <20020916223911.GA1658@netnation.com> (raw)
In-Reply-To: <20020906182457.F3029@redhat.com>

On Fri, Sep 06, 2002 at 06:24:57PM +0100, Stephen C. Tweedie wrote:

> Ext2 has a preallocation mechanism so that if you have multiple
> writes, they get dealt with to some extent as a single allocation.
> However, that doesn't work over close(): the preallocated blocks are
> discarded wheneven we close the file.
> 
> The problem with mail files, though, is that they tend to grow quite
> slowly, so the writes span very many transactions and we don't have
> that opportunity for coalescing the writes.  Actively defragmenting on
> writes is an alternative in that case.

We recently switched a large mail spool from ext2 to ext3 with default
journalling, and we are now having huge problems with disk I/O load.

We have fsync and friends disabled for performance reasons.  With ext2,
the machine would happily hum along with an average load of 0.2 and a
usual 400 kB - 800 kB write every 5 seconds, with about 10 kB/sec read in
every second.

Now with ext3, the machine has a load average of about 15 and writing
happens almost all of the time.  "vmstat 1" output:

   procs                    memory   swap       io     system       cpu
 r  b  w  swpd  free   buff  cache  si so  bi   bo   in    cs  us sy id
 0 42  2 79368 47196 100456 1080348  0  0   0 3036 2514  2077  18 21 60
 0 76  2 79368 44264 100456 1080348  0  0   0 1776 1266   823   4  3 92
 0 111 3 79368 41248 100456 1080348  0  0   0 1952 1176   722   4  5 91
 0 132 2 79368 39432 100460 1080348  0  0   0 1368 1007   612   1  3 96
 0 67  3 79368 34412 100460 1080628  0  0   0 2884 1968  1246  18 13 69
 0 41  2 79368 36572 100468 1080828  0  0  24 4020 2661  1530  16 21 64
 0 32  3 79368 31736 100500 1081456  0  0   0 3688 2696  2061  26 22 52
 0 39  3 79368 24588 100528 1082164  0  0   4 3800 2636  2643  30 21 50
 0 32  4 79368 21500 100536 1082832  0  0  24 3216 2404  2419  32 15 54
 5 28  2 79368 18160 100536 1083360  0  0   0 3416 2372  2164  24 19 57
 0 25  4 79368 19748 100552 1082896  0  0   4 4120 2544  2421  17 21 62
 4 16  4 79368 18216 100560 1083284  0  0   0 3532 2115  2361  20 17 63
 0 37  2 79368 17240 100568 1083456  0  0  16 2376 1817  1691   8 12 80
 1 67  3 79368 15112 100568 1083456  4  0   4 1644 1051   723   6  4 90
 1 88  3 79368 12028 100572 1083464  0  0   8 1884 1102   684   6  3 91
 0 108 3 79368 10132 100572 1083468  0  0   0 1716  924   503   3  3 94
15  0  2 79368 14460 100548 1081996  0  0  12 3852 2609  2000  17 25 59
 0 39  3 79368 13252 100576 1082220  0  0  52 4288 2740  2095  19 19 62

This box is primarily running a POP3 server (written in-house to cache
mbox offsets, so that it can handle a huge volume of mail), and also
exports the mail spool via NFS to other servers which run exim (-fsync). 
nfsd is exported async.  Everything is mounted noatime, nodiratime.  No
applications should be calling sync/fsync/fdatasync or using O_SYNC. 
It's a mail server, so everything is fragmented.

We're using dotlocking.  Would this cause metadata journalling?  We had
to hash the mail spool a long time ago do to system time eating all CPU
(the ext2 linear directory scan to find a slot available in the spool
directory to add the dotlock file).  I estimate about 200 - 300 dotlock
files are created per second, but these should all be asynchronous. 
Would switching to fctnl() locking (if this works over NFS) solve the
problem?

A "ps -eo pid,stat,args,wchan | grep simpopd | grep ' D '" shows POP3
processes stuck in either "down" or in "do_get_write_access", which
appears to be a journal function.

We notice there are some ext3 updates included as a patch to vanilla
2.4.18 in the newest Red Hat kernel, including changes to the
do_get_write_access function.  Have improvements in this area been made?

Thanks!

Simon-

[  Stormix Technologies Inc.  ][  NetNation Communications Inc. ]
[       sim@stormix.com       ][       sim@netnation.com        ]
[ Opinions expressed are not necessarily those of my employers. ]

next prev parent reply	other threads:[~2002-09-16 22:34 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2002-09-03  9:24 ext3 throughput woes on certain (possibly heavily fragmented) files Aaron Lehmann
2002-09-06 16:06 ` Stephen C. Tweedie
2002-09-06 17:14   ` Nikita Danilov
2002-09-06 17:22     ` Hans Reiser
2002-09-06 21:02       ` Aaron Lehmann
2002-09-06 22:05         ` Hans Reiser
2002-09-06 17:24     ` Stephen C. Tweedie
2002-09-16 22:39       ` Simon Kirby [this message]
2002-09-17 16:53         ` Andreas Dilger
2002-09-17 21:55         ` jw schultz
  -- strict thread matches above, loose matches on Subject: below --
2002-09-16 18:00 Peter Niemayer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20020916223911.GA1658@netnation.com \
    --to=sim@netnation.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sct@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox