linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Theodore Tso <tytso@mit.edu>
To: Mingming Cao <cmm@us.ibm.com>
Cc: Eric Sandeen <sandeen@redhat.com>,
	Andreas Dilger <adilger@sun.com>,
	linux-ext4@vger.kernel.org
Subject: Re: How to fix up mballoc
Date: Thu, 23 Jul 2009 20:43:24 -0400	[thread overview]
Message-ID: <20090724004324.GB14052@mit.edu> (raw)
In-Reply-To: <4A68A33E.4050103@us.ibm.com>

On Thu, Jul 23, 2009 at 10:51:58AM -0700, Mingming Cao wrote:
> I am trying to understand what user cases prefer normalize allocation  
> request size? If they are uncommon cases, perhaps
> we should disable the normalize the allocation size disabled by default,  
> unless the apps opens the files with O_APPEND?

The case where we would want to round the allocation size up would be
if we are writing a large file (say, like a large mp3 or mpeg4 file),
which takes a while for the audio/video encoder to write out the
blocks.   In that case, doing file-based preallocation is a good thing.

Normally, if we are doing block allocations for files greater than 16
blocks (i.e, 64k), we use file-based preallocation.  Otherwise we use
block group allocations.  The problem with using block group
allocations is that way it works is that first time we try to allocate
out of a block group, we try to find a free extent which is 512 blocks
long.  If we can't find a free extent which is 512 blocks long, we'll
try another block group.  Hence, for small files, once a block group
gets fragmented to the point where there isn't a free chunk which is
512 blocks long, we'll try to find another block group --- even if
that means finding another block group far, FAR away from the block
group where the directory is contained.

Worse yet, if we unmount and remount the filesystem, we forget the
fact that we were using a particular (now-partially filled)
preallocation group, so the next time we try to allocate space for a
small file, we will find *another* free 512 block chunk to allocate
small files.  Given that there is 32,768 blocks in block group, after
64 interations of "mount, write one 4k file in a directory, unmount",
that block group will have 64 files, each separated by 511 blocks, and
that block group will no longer have any free 512 chunks for block
allocations.  (And given that the block preallocation is per-CPU, it
becomes even worse on an SMP system.)

Put this baldly, it may be that we need to do a fundamental rethink on
how we do per-cpu, per-blockgroup preallocations for small files.
Maybe instead of trying to find a 512 extent which is completely full,
we should instead be looking for a 512 extent which has at least
mb_stream_req free blocks (i.e. by default 16 free blocks).

	      	   	  	   	      	   - Ted

  reply	other threads:[~2009-07-24  0:43 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-07-21  0:17 [PATCH] e2freefrag utility Andreas Dilger
2009-07-22  7:43 ` Theodore Tso
2009-07-23  4:59   ` Eric Sandeen
2009-07-23 13:45     ` How to fix up mballoc Theodore Tso
2009-07-23 17:43       ` Eric Sandeen
2009-07-24  0:23         ` Theodore Tso
2009-07-24  2:18           ` Eric Sandeen
2009-07-24  2:25             ` Eric Sandeen
2009-07-24  2:30           ` Andreas Dilger
2009-07-23 17:51       ` Mingming Cao
2009-07-24  0:43         ` Theodore Tso [this message]
2009-07-23 17:07     ` [PATCH] e2freefrag utility Andreas Dilger
2009-07-23 17:18       ` Eric Sandeen
2009-07-24 22:32       ` Theodore Tso
2009-07-24 23:14         ` Andreas Dilger
2009-07-25  0:18           ` Theodore Tso
2009-07-27 18:36             ` Andreas Dilger
2009-08-10  3:31               ` [PATCH 0/6] Patches to improve/fix e2freefrag Theodore Ts'o
2009-08-10  3:31               ` [PATCH 1/6] e2freefrag: Clarify e2freefrag's messages Theodore Ts'o
2009-08-10  3:31               ` [PATCH 2/6] e2freefrag: Do not print chunk-related information by default Theodore Ts'o
2009-08-10  3:31               ` [PATCH 3/6] e2freefrag: Fix to work correctly for file systems with 1kb block sizes Theodore Ts'o
2009-08-10  3:31               ` [PATCH 4/6] e2freefrag: Take into account the last free extent in the file system Theodore Ts'o
2009-08-10  3:31               ` [PATCH 5/6] Add V=1 support when linking e2freefrag in misc/Makefile.in Theodore Ts'o
2009-08-10  3:31               ` [PATCH 6/6] libext2fs: Treat uninitialized parts of bitmaps as unallocated Theodore Ts'o

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090724004324.GB14052@mit.edu \
    --to=tytso@mit.edu \
    --cc=adilger@sun.com \
    --cc=cmm@us.ibm.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=sandeen@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).