linux-ext4.vger.kernel.org archive mirror
From: Theodore Ts'o <tytso@mit.edu>
To: Benjamin LaHaise <bcrl@kvack.org>
Cc: Andreas Dilger <adilger@dilger.ca>,
	"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>
Subject: Re: ext4: first write to large ext3 filesystem takes 96 seconds
Date: Thu, 31 Jul 2014 09:03:32 -0400
Message-ID: <20140731130332.GB1566@thunk.org>
In-Reply-To: <20140730144928.GA10295@kvack.org>

On Wed, Jul 30, 2014 at 10:49:28AM -0400, Benjamin LaHaise wrote:
> This seems like a pretty serious regression relative to ext3.  Why can't 
> ext4's mballoc pick better block groups to attempt allocating from based 
> on the free block counts in the block group summaries?

Allocation algorithms are *always* tradeoffs.  So I don't think
regression is necessarily the best way to think about things.
Unfortunately, your use case really doesn't work well with how we have
set things up with ext4 now.  Sure, if your specific use case is
one where you are mostly allocating 8MB files, then we can add a
special case where if you are allocating 32768 blocks, we should
search for block groups that have 32768 blocks free.  And if that's
what you are asking for, we can certainly do that.
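
For that narrow case the summary counts really are enough. Assuming 4 KiB blocks, a block group spans 8 * 4096 = 32768 blocks (one bitmap block's worth of bits), so a group reporting 32768 free blocks is completely empty and the whole request fits contiguously. The Python sketch below models only that check; `BlockGroup`, `find_group_with_free_count` and the wrap-around scan are hypothetical illustrations, not mballoc code.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Assumption: 4 KiB blocks, so one bitmap block covers 8 * 4096 = 32768 blocks.
BLOCKS_PER_GROUP = 32768


@dataclass
class BlockGroup:
    index: int
    free_blocks: int  # free-block count from the group descriptor summary


def find_group_with_free_count(groups: Sequence[BlockGroup], request: int,
                               start: int = 0) -> Optional[int]:
    """Scan the groups starting at `start`, wrapping around once, and return
    the index of the first group whose summary free count is at least
    `request`. Returns None if no group qualifies."""
    n = len(groups)
    for i in range(n):
        group = groups[(start + i) % n]
        if group.free_blocks >= request:
            return group.index
    return None


groups = [BlockGroup(0, 1200), BlockGroup(1, BLOCKS_PER_GROUP), BlockGroup(2, 500)]
print(find_group_with_free_count(groups, BLOCKS_PER_GROUP, start=2))  # 1
```

The test is only trustworthy when the request equals a full group; for any smaller request the same check can select a group whose free space is badly fragmented, which is the limitation the next paragraph of the message describes.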

The problem is that free block counts don't work well in general.  If
I see that the free block count is 2048 blocks, that doesn't tell me
whether the free blocks form a single contiguous chunk of 2048 blocks
or are 2048 scattered single blocks.  (We do actually pay attention to
free blocks, by the way, but it's in a nuanced way.)
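
A small Python sketch makes the point concrete. It represents a block group as a list of bits, using 0 for free and 1 for in use (the same convention as ext4's on-disk block bitmap). The helper names are hypothetical, and the sketch does not model how mballoc itself tracks free extents; it only shows that two groups with equal free counts can differ completely in contiguity.

```python
def free_count(bitmap):
    """Number of free blocks. As in an ext4 block bitmap, 0 = free, 1 = in use."""
    return sum(1 for bit in bitmap if bit == 0)


def largest_free_extent(bitmap):
    """Length of the longest run of consecutive free (0) blocks."""
    best = run = 0
    for bit in bitmap:
        run = run + 1 if bit == 0 else 0
        best = max(best, run)
    return best


# Two hypothetical 4096-block groups with the same free count of 2048.
contiguous = [0] * 2048 + [1] * 2048  # one free chunk of 2048 blocks
scattered = [0, 1] * 2048             # 2048 isolated single free blocks

for name, bm in (("contiguous", contiguous), ("scattered", scattered)):
    print(name, free_count(bm), largest_free_extent(bm))
```

Both groups report 2048 free blocks, but the largest free extent is 2048 in the first and 1 in the second, so only the first could hold a 2048-block file in one piece.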

If the only goal you have is fast block allocation after failover,
you can always use the VFAT block allocation --- i.e., use the first
free block in the file system.  Unfortunately, it will result in a
very badly fragmented file system, as Microsoft and its users
discovered.
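
The fragmentation cost of a first-free-block policy shows up even in a toy model. The Python sketch below is a simplified model of the policy described above, not code from any FAT implementation; `first_fit_allocate` and `count_extents` are hypothetical helpers. Two files are appended to concurrently, one block at a time, on an empty 16-block filesystem.

```python
def first_fit_allocate(bitmap, nblocks):
    """Allocate nblocks blocks one at a time, always taking the lowest-numbered
    free block (a simplified "first free block" policy). Marks each block in
    use (1) and returns their numbers. Raises ValueError when none is free."""
    allocated = []
    for _ in range(nblocks):
        blk = bitmap.index(0)  # lowest free block; ValueError if none is free
        bitmap[blk] = 1
        allocated.append(blk)
    return allocated


def count_extents(blocks):
    """Number of contiguous runs in a sorted list of block numbers."""
    if not blocks:
        return 0
    return 1 + sum(1 for a, b in zip(blocks, blocks[1:]) if b != a + 1)


# Two writers appending alternately to an empty 16-block filesystem.
bitmap = [0] * 16
file_a, file_b = [], []
for _ in range(8):
    file_a += first_fit_allocate(bitmap, 1)
    file_b += first_fit_allocate(bitmap, 1)

print(file_a, count_extents(file_a))  # [0, 2, 4, 6, 8, 10, 12, 14] 8
```

Each file ends up as eight single-block extents, because the allocator leaves neither file any room to grow: allocation is fast, but every later sequential read of either file has to seek between blocks.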

I'm sure there are things we could do that would make things better for
your workload (if you want to tell us in great detail exactly what the
file/block allocation patterns are for your workload), and perhaps
even better in general, but the challenge is making sure we don't
regress for other workloads --- and this includes long-term
fragmentation resistance.  This is a hard problem.  Kvetching about
how it's so horrible just for you isn't really helpful for solving it.

(BTW, one of the problems is that ext4_mb_normalize_request caps large
allocations so that we use the same goal length for multiple passes as
we search for good block groups.  We might want to use the original
goal length --- so long as it is less than 32768 blocks --- for the
first scan, or at least for goal lengths which are powers of two.  So
if your application is regularly allocating files which are exactly
8MB, there are probably some optimizations that we could apply.  But
if they aren't exactly 8MB, life gets a bit trickier.)
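
The change floated in the parenthetical can be sketched as a two-pass group scan. In this Python model, `normalized_len` merely stands in for the capped length that ext4_mb_normalize_request produces (its real rounding rules are not modeled), each group is summarized by a hypothetical "largest free extent" value, and `first_scan_goal` and `pick_group` are illustrative names rather than kernel functions. The first pass uses the original request when it is under 32768 blocks (optionally only for powers of two); the second falls back to the normalized length.

```python
# Assumption: 4 KiB blocks, so a block group spans 32768 blocks.
BLOCKS_PER_GROUP = 32768


def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0


def first_scan_goal(original_len, normalized_len, powers_of_two_only=False):
    """Hypothetical policy: use the original request length for the first
    scan if it is under one group's worth of blocks (and, optionally, a power
    of two); otherwise use the normalized, capped length."""
    if original_len < BLOCKS_PER_GROUP and (
            not powers_of_two_only or is_power_of_two(original_len)):
        return original_len
    return normalized_len


def pick_group(largest_free, original_len, normalized_len,
               powers_of_two_only=False):
    """Return (group index, goal length) for the first group whose largest
    free extent fits the goal. Pass 1 uses first_scan_goal(); pass 2 falls
    back to the normalized length. Returns None if neither pass succeeds."""
    first = first_scan_goal(original_len, normalized_len, powers_of_two_only)
    for goal in dict.fromkeys((first, normalized_len)):  # skip a repeated goal
        for index, extent in enumerate(largest_free):
            if extent >= goal:
                return index, goal
    return None


# Group 0 can hold only 2048 contiguous blocks; group 1 can hold 4096.
print(pick_group([2048, 4096], 4096, 2048))  # (1, 4096)
```

Scanning with only the normalized length of 2048 would settle for group 0 and split a 4096-block file; trying the original length first finds group 1, where the file fits in one extent.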

Regards,

						- Ted

Thread overview: 10+ messages
2014-07-07 21:13 ext4: first write to large ext3 filesystem takes 96 seconds Benjamin LaHaise
2014-07-08  0:16 ` Theodore Ts'o
2014-07-08  1:35   ` Benjamin LaHaise
2014-07-08  3:54     ` Theodore Ts'o
2014-07-08 14:53       ` Benjamin LaHaise
2014-07-08  5:11   ` Andreas Dilger
2014-07-30 14:49     ` Benjamin LaHaise
2014-07-31 13:03       ` Theodore Ts'o [this message]
2014-07-31 14:04         ` Benjamin LaHaise
2014-07-31 15:27           ` Theodore Ts'o