From: "Lukáš Czerner" <lczerner@redhat.com>
To: "Theodore Ts'o" <tytso@mit.edu>
Cc: "Lukáš Czerner" <lczerner@redhat.com>,
	linux-ext4@vger.kernel.org, gharm@google.com
Subject: Re: [PATCH] ext4: Do not normalize request from fallocate
Date: Mon, 25 Mar 2013 14:26:54 +0100 (CET)
Message-ID: <alpine.LFD.2.00.1303251420480.23176@localhost>
In-Reply-To: <20130325125309.GE26792@thunk.org>

On Mon, 25 Mar 2013, Theodore Ts'o wrote:

> Date: Mon, 25 Mar 2013 08:53:09 -0400
> From: Theodore Ts'o <tytso@mit.edu>
> To: Lukáš Czerner <lczerner@redhat.com>
> Cc: linux-ext4@vger.kernel.org, gharm@google.com
> Subject: Re: [PATCH] ext4: Do not normalize request from fallocate
> 
> On Mon, Mar 25, 2013 at 11:09:35AM +0100, Lukáš Czerner wrote:
> > 
> > Sorry for being dense, but I am trying to understand why this is so
> > bad and what is the "expected" column there.
> > 
> > The physical offset of each extent below starts at the start of the
> > block group and it seems to me that it's perfectly aligned for every
> > power of two up to the block group size.
> 
> Yes, but the logical offset isn't aligned.  Consider the simplest
> workload, which is where we are writing the 1GB file sequentially.
> Let's assume that the raid stripe size is 8M.  So ideally, we would
> want each write to be a multiple of 8M, starting at logical block 0.
> 
> But look what happens here:
> 
> > > File size of 1 is 1073741824 (262144 blocks of 4096 bytes)
> > >  ext:     logical_offset:        physical_offset: length:   expected: flags:
> > >    0:        0..   32766:     458752..    491518:  32767:             unwritten
> > >    1:    32767..   65533:     491520..    524286:  32767:     491519: unwritten
> > >    2:    65534..   98300:     589824..    622590:  32767:     524287: unwritten
> 
> If we do 8M writes, then we would want to write in chunks of 2048
> blocks.  So consider what happens when we write the 2048 block chunk
> starting with logical block 30720.  The fact that there is a
> discontinuity between logical blocks 32766 and 32767 means that we
> will have to do a read-modify-write cycle for that particular RAID
> stripe.
> 
> Does that make more sense?

Oh, now I get it :) Thanks a lot for the explanation. I kept thinking
about the physical layout and forgot that the logical offset is
actually misaligned.
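
To make the arithmetic above concrete, here is a minimal Python sketch
of the check. It assumes 4096-byte blocks and the 8M stripe from Ted's
example (so 2048 blocks per stripe), and it takes the three extents
from the filefrag output quoted above. The function name split_chunks
is made up for illustration; it is not anything from ext4 or
e2fsprogs.

```python
BLOCK_SIZE = 4096
STRIPE_BYTES = 8 * 1024 * 1024
STRIPE_BLOCKS = STRIPE_BYTES // BLOCK_SIZE  # 2048 blocks per 8M stripe

# (logical_start, physical_start, length) from the quoted filefrag output
EXTENTS = [
    (0, 458752, 32767),
    (32767, 491520, 32767),
    (65534, 589824, 32767),
]


def split_chunks(extents, stripe_blocks):
    """Return the logical start of every stripe-sized chunk that spans a
    discontinuity between two consecutive extents (hypothetical helper)."""
    split = []
    for (l0, p0, n0), (l1, p1, _n1) in zip(extents, extents[1:]):
        # No discontinuity if the next extent follows on both axes.
        if l0 + n0 == l1 and p0 + n0 == p1:
            continue
        # A boundary on a stripe multiple does not split any chunk.
        if l1 % stripe_blocks != 0:
            split.append(l1 - l1 % stripe_blocks)
    return split


print(split_chunks(EXTENTS, STRIPE_BLOCKS))  # -> [30720, 63488]
```

The first result is the chunk starting at logical block 30720 that Ted
mentions; the second is the same problem repeating at the next extent
boundary.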

> 
> Another reason for keeping the file as physically contiguous as
> possible is that we now do extent caching using the extent status
> tree.  So if we can allocate the file using 2 physically contiguous
> extents instead of 9 or 10 physically contiguous extents, it means
> the extent status tree uses less memory, too.  For a 1GB file, that
> might not make that much difference, but if we are caching 2048 of
> these 1G files (on a 2TB disk, for example), keeping the files as
> physically contiguous as possible means we can cache the logical to
> physical block mapping of all of these files much more easily.

Yes, that makes sense too.
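
A rough count of what that means, as a Python sketch. It assumes one
cached entry per physically contiguous extent, which is a
simplification, and it deliberately leaves out the per-entry size in
bytes. The helper name cached_entries is made up for illustration.

```python
NUM_FILES = 2048     # number of cached files from the example above
FILE_SIZE_GIB = 1    # each file is 1G


def cached_entries(num_files, extents_per_file):
    """Entries to cache, assuming one entry per contiguous extent."""
    return num_files * extents_per_file


total_tib = NUM_FILES * FILE_SIZE_GIB // 1024
best = cached_entries(NUM_FILES, 2)     # 2 extents per file
worse = cached_entries(NUM_FILES, 10)   # 10 extents per file

print(total_tib, best, worse, worse // best)  # -> 2 4096 20480 5
```

So across the whole 2TB disk the fragmented layout needs five times as
many entries under that assumption.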

> 
> Regards,
> 
> 						- Ted
> 

Thanks!
-Lukas
