linux-ext4.vger.kernel.org archive mirror
From: Suparna Bhattacharya <suparna@in.ibm.com>
To: Andrew Morton <akpm@osdl.org>
Cc: Willy Tarreau <w@1wt.eu>, Linus Torvalds <torvalds@osdl.org>,
	"H. Peter Anvin" <hpa@zytor.com>,
	git@vger.kernel.org, nigel@nigel.suspend2.net,
	"J.H." <warthog9@kernel.org>,
	Randy Dunlap <randy.dunlap@oracle.com>,
	Pavel Machek <pavel@ucw.cz>,
	kernel list <linux-kernel@vger.kernel.org>,
	webmaster@kernel.org,
	"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>
Subject: Re: How git affects kernel.org performance
Date: Mon, 8 Jan 2007 08:35:55 +0530
Message-ID: <20070108030555.GA7289@in.ibm.com>
In-Reply-To: <20070107011542.3496bc76.akpm@osdl.org>

On Sun, Jan 07, 2007 at 01:15:42AM -0800, Andrew Morton wrote:
> On Sun, 7 Jan 2007 09:55:26 +0100
> Willy Tarreau <w@1wt.eu> wrote:
> 
> > On Sat, Jan 06, 2007 at 09:39:42PM -0800, Linus Torvalds wrote:
> > >
> > >
> > > On Sat, 6 Jan 2007, H. Peter Anvin wrote:
> > > >
> > > > During extremely high load, it appears that what slows kernel.org down more
> > > > than anything else is the time that each individual getdents() call takes.
> > > > > When I've looked at this I've observed times from 200 ms to almost 2 seconds!
> > > > Since an unpacked *OR* unpruned git tree adds 256 directories to a cleanly
> > > > packed tree, you can do the math yourself.
> > >
> > > "getdents()" is totally serialized by the inode semaphore. It's one of the
> > > most expensive system calls in Linux, partly because of that, and partly
> > > because it has to call all the way down into the filesystem in a way that
> > > almost no other common system call has to (99% of all filesystem calls can
> > > be handled basically at the VFS layer with generic caches - but not
> > > getdents()).
> > >
> > > So if there are concurrent readdirs on the same directory, they get
> > > serialized. If there is any file creation/deletion activity in the
> > > directory, it serializes getdents().
> > >
> > > To make matters worse, I don't think it has any read-ahead at all when you
> > > > use hashed directory entries. So if you have a cold-cache case, you'll read
> > > every single block totally individually, and serialized. One block at a
> > > time (I think the non-hashed case is likely also suspect, but that's a
> > > separate issue)
> > >
> > > In other words, I'm not at all surprised it hits on filldir time.
> > > Especially on ext3.
> >
> > At work, we had the same problem on a file server with ext3. We use rsync
> > to make backups to a local IDE disk, and we noticed that getdents() took
> > about the same time as Peter reports (0.2 to 2 seconds), especially in
> > maildir directories. We tried many things to fix it with no result,
> > including enabling dirindexes. Finally, we made a full backup, and switched
> > over to XFS and the problem totally disappeared. So it seems that the
> > filesystem matters a lot here when there are lots of entries in a
> > directory, and that ext3 is not suitable for usages with thousands
> > of entries in directories with millions of files on disk. I'm not
> > certain it would be that easy to try other filesystems on kernel.org
> > though :-/
> >
> 
> Yeah, slowly-growing directories will get splattered all over the disk.
> 
> Possible short-term fixes would be to just allocate up to (say) eight
> blocks when we grow a directory by one block.  Or teach the
> directory-growth code to use ext3 reservations.
> 
> Longer-term, people are talking about things like on-disk reservations.
> But I expect directories are being forgotten about in all of that.
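
To make the short-term suggestion above concrete, here is a toy model (not
ext3 code) of why growing a directory one block at a time scatters it, and
how grabbing a chunk of blocks per growth step reduces that. The allocator,
the function names, and the assumption that other files take one block
between directory growths are all hypothetical and chosen only for
illustration.

```python
def grow_directory(num_dir_blocks, chunk, other_allocs_between_grows=1):
    """Return the disk block numbers of a directory, in logical order.

    Toy model: a single linear free-space cursor is shared by the directory
    and by all other files. When the directory needs a block and has no
    preallocated block left, it takes `chunk` contiguous blocks at the
    cursor. After each directory block is consumed, other files allocate
    `other_allocs_between_grows` blocks (an assumption of this sketch).
    """
    cursor = 0          # next free block on the imaginary disk
    reserved = []       # blocks preallocated to the directory, not yet used
    blocks = []         # blocks actually used by the directory
    for _ in range(num_dir_blocks):
        if not reserved:
            reserved = list(range(cursor, cursor + chunk))
            cursor += chunk
        blocks.append(reserved.pop(0))
        cursor += other_allocs_between_grows  # unrelated files take space
    return blocks


def count_extents(blocks):
    """Count runs of physically contiguous blocks."""
    if not blocks:
        return 0
    extents = 1
    for prev, cur in zip(blocks, blocks[1:]):
        if cur != prev + 1:
            extents += 1
    return extents


if __name__ == "__main__":
    for chunk in (1, 8):
        layout = grow_directory(64, chunk)
        print(f"chunk={chunk}: {count_extents(layout)} extents")
```

In this model a 64-block directory ends up in 64 separate extents when grown
one block at a time, and in 8 extents when grown eight blocks at a time. A
real allocator is far more complicated, so the numbers only show the
direction of the effect.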

By on-disk reservations, do you mean persistent file preallocation (that
is, explicit preallocation of blocks to a given file)? If so, you are
right: we haven't really given any thought to the possibility of directories
needing that feature.
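
For illustration only, this is roughly what explicit preallocation for a
regular file looks like from userspace, using the POSIX posix_fallocate()
interface through Python's os module. The helper name and the file name are
made up for this sketch, and whether the blocks are reserved natively or
emulated by the C library writing zeros depends on the filesystem and the
kernel. Nothing here applies to directories, which is the gap discussed
above.

```python
import os
import tempfile


def preallocate(path, size_bytes):
    """Create `path` and ask for `size_bytes` of space to be allocated to it.

    Returns the os.stat_result of the file after preallocation.
    """
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        # Ensure disk space is allocated for the range [0, size_bytes).
        os.posix_fallocate(fd, 0, size_bytes)
        return os.fstat(fd)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        st = preallocate(os.path.join(tmp, "prealloc.dat"), 1 << 20)
        # st_blocks is counted in 512-byte units on Linux.
        print("size:", st.st_size, "allocated bytes:", st.st_blocks * 512)
```

After the call the file size covers the requested range even though no data
has been written by the application.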

Regards
Suparna

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India
