linux-ext4.vger.kernel.org archive mirror
From: "Lukáš Czerner" <lczerner@redhat.com>
To: "Theodore Ts'o" <tytso@mit.edu>
Cc: "Lukáš Czerner" <lczerner@redhat.com>,
	linux-ext4@vger.kernel.org, gharm@google.com
Subject: Re: [PATCH] ext4: Do not normalize request from fallocate
Date: Mon, 25 Mar 2013 14:26:54 +0100 (CET)
Message-ID: <alpine.LFD.2.00.1303251420480.23176@localhost>
In-Reply-To: <20130325125309.GE26792@thunk.org>


On Mon, 25 Mar 2013, Theodore Ts'o wrote:

> Date: Mon, 25 Mar 2013 08:53:09 -0400
> From: Theodore Ts'o <tytso@mit.edu>
> To: Lukáš Czerner <lczerner@redhat.com>
> Cc: linux-ext4@vger.kernel.org, gharm@google.com
> Subject: Re: [PATCH] ext4: Do not normalize request from fallocate
> 
> On Mon, Mar 25, 2013 at 11:09:35AM +0100, Lukáš Czerner wrote:
> > 
> > Sorry for being dense, but I am trying to understand why this is so
> > bad and what is the "expected" column there.
> > 
> > The physical offset of each extent below starts at the start of the
> > block group and it seems to me that it's perfectly aligned for every
> > power of two up to the block group size.
> 
> Yes, but the logical offset isn't aligned.  Consider the simplest
> workload, which is where we are writing the 1GB file sequentially.
> Let's assume that the raid stripe size is 8M.  So ideally, we would
> want each write to be a multiple of 8M, starting at logical block 0.
> 
> But look what happens here:
> 
> > > File size of 1 is 1073741824 (262144 blocks of 4096 bytes)
> > >  ext:     logical_offset:        physical_offset: length:   expected: flags:
> > >    0:        0..   32766:     458752..    491518:  32767:             unwritten
> > >    1:    32767..   65533:     491520..    524286:  32767:     491519: unwritten
> > >    2:    65534..   98300:     589824..    622590:  32767:     524287: unwritten
> 
> If we do 8M writes, then we would want to write in chunks of 2048
> blocks.  So consider what happens when we write the 2048 block chunk
> starting with logical block 30720.  The fact that there is a
> discontinuity between logical blocks 32766 and 32767 means that we
> will have to do a read-modify-write cycle for that particular RAID
> stripe.
> 
> Does that make more sense?

Oh, now I get it :) Thanks a lot for the explanation. I kept thinking
about the physical layout and forgot that the logical offset is
actually misaligned.
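
To make the misalignment concrete, here is a minimal sketch (not anything
from the ext4 code) that takes the three extents from the filefrag output
quoted above and lists the 2048-block chunks that a physical discontinuity
cuts through. The function name split_chunks and the tuple layout are
hypothetical, and the sketch assumes the extents are logically contiguous
and sorted, as they are in that output.

```python
# (logical_start, physical_start, length) in 4 KiB blocks, copied from the
# filefrag output quoted above.
EXTENTS = [
    (0,     458752, 32767),
    (32767, 491520, 32767),
    (65534, 589824, 32767),
]

STRIPE_BLOCKS = 2048  # 8 MiB stripe / 4 KiB block size


def split_chunks(extents, chunk_blocks):
    """Return the logical start of each chunk that spans a physical gap.

    Assumes extents are sorted and logically contiguous (illustrative only).
    """
    split = []
    for (l0, p0, n0), (l1, p1, _n1) in zip(extents, extents[1:]):
        physically_contiguous = (p0 + n0 == p1)
        # A gap only hurts if it falls inside a chunk, not on a chunk boundary.
        if not physically_contiguous and l1 % chunk_blocks != 0:
            split.append(l1 // chunk_blocks * chunk_blocks)
    return split


print(split_chunks(EXTENTS, STRIPE_BLOCKS))  # [30720, 63488]
```

The first entry is the chunk starting at logical block 30720 that Ted
describes. Had the extents been 32768 blocks long, every extent boundary
would land on a multiple of 2048 and the list would be empty.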

> 
> Another reason for keeping the file as physically contiguous as
> possible is that we can now do extent caching using the extent status
> tree.  So if we can allocate the file using 2 physically contiguous
> extents instead of 9 or 10 physically contiguous extents, it means
> the extent status tree uses less memory, too.  For a 1GB file, that
> might not make that much difference, but if we are caching 2048 of these
> 1G files (on a 2TB disk, for example), keeping the files as physically
> contiguous as possible means we can cache the logical to physical
> block mapping of all of these files much more easily.

Yes, that makes sense too.
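
For a rough sense of scale, the sketch below only counts cached entries
for the 2048-file example. It assumes one cached entry per physically
contiguous extent, which is a simplification of mine; the thread does not
give a per-entry memory size, so none is used here.

```python
FILES = 2048  # 1 GiB files, as in the 2 TB disk example above


def cached_extents(files, extents_per_file):
    """Total entries to cache, assuming one entry per extent (assumption)."""
    return files * extents_per_file


for per_file in (2, 9, 10):
    print(per_file, cached_extents(FILES, per_file))
# 2 4096
# 9 18432
# 10 20480
```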

> 
> Regards,
> 
> 						- Ted
> 

Thanks!
-Lukas
