linux-ext4.vger.kernel.org archive mirror
From: Theodore Ts'o <tytso@mit.edu>
To: "Lukáš Czerner" <lczerner@redhat.com>
Cc: linux-ext4@vger.kernel.org, gharm@google.com
Subject: Re: [PATCH] ext4: Do not normalize request from fallocate
Date: Mon, 25 Mar 2013 08:53:09 -0400
Message-ID: <20130325125309.GE26792@thunk.org>
In-Reply-To: <alpine.LFD.2.00.1303251051460.23176@localhost>

On Mon, Mar 25, 2013 at 11:09:35AM +0100, Lukáš Czerner wrote:
> 
> Sorry for being dense, but I am trying to understand why this is so
> bad and what is the "expected" column there.
> 
> The physical offset of each extent below starts on the start of the
> block group and it seems to me that it's perfectly aligned for every
> power of two up to the block group size.

Yes, but the logical offset isn't aligned.  Consider the simplest
workload, where we write the 1GB file sequentially.
Let's assume that the RAID stripe size is 8M.  So ideally, we would
want each write to be a multiple of 8M, starting at logical block 0.

But look what happens here:

> > File size of 1 is 1073741824 (262144 blocks of 4096 bytes)
> >  ext:     logical_offset:        physical_offset: length:   expected: flags:
> >    0:        0..   32766:     458752..    491518:  32767:             unwritten
> >    1:    32767..   65533:     491520..    524286:  32767:     491519: unwritten
> >    2:    65534..   98300:     589824..    622590:  32767:     524287: unwritten

If we do 8M writes, then we would want to write in chunks of 2048
blocks.  So consider what happens when we write the 2048 block chunk
starting with logical block 30720.  The fact that there is a
discontinuity between logical blocks 32766 and 32767 means that we
will have to do a read-modify-write cycle for that particular RAID
stripe.
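
To make the arithmetic concrete, here is a rough sketch (not anything
from the patch or from ext4 itself) that walks the three extents quoted
above in 2048-block chunks and reports which chunks do not map to a
single physically contiguous run.  The helper names are made up for
illustration, and it only checks contiguity; it does not model the
actual RAID layout.

```python
BLOCK_SIZE = 4096                      # bytes per block, from the filefrag output
STRIPE_BYTES = 8 * 1024 * 1024         # assumed RAID stripe size (8M)
STRIPE_BLOCKS = STRIPE_BYTES // BLOCK_SIZE

# (logical_start, physical_start, length) for the three extents quoted above
EXTENTS = [
    (0, 458752, 32767),
    (32767, 491520, 32767),
    (65534, 589824, 32767),
]


def physical_block(logical, extents):
    """Map a logical block to its physical block, or None if unmapped."""
    for lstart, pstart, length in extents:
        if lstart <= logical < lstart + length:
            return pstart + (logical - lstart)
    return None


def is_contiguous(start, count, extents):
    """True if logical blocks start..start+count-1 are physically consecutive."""
    first = physical_block(start, extents)
    if first is None:
        return False
    return all(physical_block(start + i, extents) == first + i
               for i in range(count))


def split_chunks(extents, stripe_blocks):
    """Logical start of every full stripe-sized chunk that is not contiguous."""
    end = max(lstart + length for lstart, _, length in extents)
    return [start
            for start in range(0, end - stripe_blocks + 1, stripe_blocks)
            if not is_contiguous(start, stripe_blocks, extents)]


if __name__ == "__main__":
    print("stripe size in blocks:", STRIPE_BLOCKS)
    print("chunks split across extents:", split_chunks(EXTENTS, STRIPE_BLOCKS))
```

The chunk starting at logical block 30720 shows up because its last
block, 32767, lives in the next extent, one physical block past where
it would need to be.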

Does that make more sense?

Another reason for keeping the file as physically contiguous as
possible is that we now do extent caching using the extent status
tree.  So if we can allocate the file using 2 physically contiguous
extents instead of 9 or 10 physically contiguous extents, it means
the extent status tree uses less memory, too.  For a 1GB file, that
might not make that much difference, but if we are caching 2048 of these
1G files (on a 2TB disk, for example), keeping the files as physically
contiguous as possible means we can cache the logical to physical
block mapping of all of these files much more easily.
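
As a back-of-the-envelope illustration of the scaling, the sketch below
just counts cached extent entries for the two cases.  It assumes one
cache entry per extent and says nothing about the size of an entry in
the extent status tree; the function name is made up.

```python
FILES = 2048                 # number of 1G files, as in the example above
FILE_BYTES = 1 << 30         # 1G per file


def cached_entries(files, extents_per_file):
    """Total entries to cache, assuming one entry per extent."""
    return files * extents_per_file


if __name__ == "__main__":
    print("total data (TiB):", FILES * FILE_BYTES / (1 << 40))
    print("entries at 2 extents/file: ", cached_entries(FILES, 2))
    print("entries at 10 extents/file:", cached_entries(FILES, 10))
```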

Regards,

						- Ted