From: Eric Sandeen <sandeen@redhat.com>
To: Bron Gondwana <brong@fastmail.fm>
Cc: linux-ext4@vger.kernel.org, Rob Mueller <robm@fastmail.fm>
Subject: Re: fallocate creating fragmented files
Date: Wed, 30 Jan 2013 09:56:51 -0600
Message-ID: <510942C3.1070503@redhat.com>
In-Reply-To: <1359527713.648.140661184334613.06CF38D4@webmail.messagingengine.com>

On 1/30/13 12:35 AM, Bron Gondwana wrote:
> On Wed, Jan 30, 2013, at 05:05 PM, Eric Sandeen wrote:
>> On 1/29/13 11:46 PM, Bron Gondwana wrote:
>>> Hi All,
>>>
>>> I'm trying to understand why my ext4 filesystem is creating highly fragmented files even though it's only just over 50% full.
>>
>> It's at least possible that freespace is very fragmented; you could try the "e2freefrag" command to see.
> 
> [brong@imap14 ~]$ e2freefrag /dev/md0
> Device: /dev/md0
> Blocksize: 1024 bytes
> Total blocks: 62522624
> Free blocks: 26483551 (42.4%)
> 
> Min. free extent: 1 KB 
> Max. free extent: 757 KB
> Avg. free extent: 14 KB
> Num. free extent: 1940838
> 
> HISTOGRAM OF FREE EXTENT SIZES:
> Extent Size Range :  Free extents   Free Blocks  Percent
>     1K...    2K-  :        538480        538480    2.03%
>     2K...    4K-  :        362189        870860    3.29%
>     4K...    8K-  :        321158       1681591    6.35%
>     8K...   16K-  :        268848       2934959   11.08%
>    16K...   32K-  :        210746       4697440   17.74%
>    32K...   64K-  :        151755       6738418   25.44%
>    64K...  128K-  :         63761       5512870   20.82%
>   128K...  256K-  :         20563       3552580   13.41%
>   256K...  512K-  :          3308       1047995    3.96%
>   512K... 1024K-  :            30         17615    0.07%

OK, TBH I'm not certain why the allocator is doing what it's doing.
There are quite a lot of larger-than-3-block free spaces. OTOH, it might be
trying for some kind of locality.
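
As a rough check on that, here is a minimal Python sketch that tallies the
imap14 histogram quoted above. The function name and the regex are my own
(hypothetical, not part of e2fsprogs), and it assumes the e2freefrag layout
shown above with the "> " quoting stripped. With 1 KB blocks, every bucket
from 4K up holds free extents of at least 4 blocks:

```python
import re

# Histogram rows copied from the e2freefrag /dev/md0 output quoted above.
# Blocksize is 1024 bytes, so 1K is one block.
HISTOGRAM = """\
    1K...    2K-  :        538480        538480    2.03%
    2K...    4K-  :        362189        870860    3.29%
    4K...    8K-  :        321158       1681591    6.35%
    8K...   16K-  :        268848       2934959   11.08%
   16K...   32K-  :        210746       4697440   17.74%
   32K...   64K-  :        151755       6738418   25.44%
   64K...  128K-  :         63761       5512870   20.82%
  128K...  256K-  :         20563       3552580   13.41%
  256K...  512K-  :          3308       1047995    3.96%
  512K... 1024K-  :            30         17615    0.07%
"""

# Groups: bucket low bound (KB), bucket high bound (KB), free extents,
# free blocks, percent.
ROW = re.compile(r"^\s*(\d+)K\.\.\.\s*(\d+)K-\s*:\s*(\d+)\s+(\d+)\s+([\d.]+)%")


def count_extents_at_least(text, min_kb):
    """Sum (extents, blocks) over buckets whose lower bound is >= min_kb."""
    extents = blocks = 0
    for line in text.splitlines():
        m = ROW.match(line)
        if m and int(m.group(1)) >= min_kb:
            extents += int(m.group(3))
            blocks += int(m.group(4))
    return extents, blocks


if __name__ == "__main__":
    extents, blocks = count_extents_at_least(HISTOGRAM, 4)
    print(f"free extents of >= 4 blocks: {extents} covering {blocks} blocks")
```

By that count, 1040169 of the 1940838 free extents are 4 blocks or longer,
so the short extents in the file are not explained by a lack of larger holes.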

I think it'd take some digging into the allocator behavior; there may 
be tracepoints that'd help.
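
For anyone who wants to look, a minimal sketch for seeing which ext4
tracepoints a given kernel exposes. It assumes debugfs is mounted at
/sys/kernel/debug (the tracing directory may live elsewhere on your system)
and that you have permission to read it. The helper name and the "alloc"
filter are hypothetical choices of mine; it only lists event names and does
not enable anything:

```python
import os

# Assumed location of the ext4 trace events; adjust if tracing is mounted elsewhere.
DEFAULT_EVENTS_DIR = "/sys/kernel/debug/tracing/events/ext4"


def list_trace_events(events_dir=DEFAULT_EVENTS_DIR, needle="alloc"):
    """Return sorted names of event directories whose name contains `needle`."""
    try:
        names = os.listdir(events_dir)
    except OSError:
        return []  # tracing not available, or not permitted
    return sorted(
        n for n in names
        if needle in n and os.path.isdir(os.path.join(events_dir, n))
    )


if __name__ == "__main__":
    for name in list_trace_events():
        print(name)
```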

-Eric

>>> Now looking at the verbose output, we can see that there are many extents of just 3 or 4 blocks:
>>>
>>> [brong@imap14 conf]$ filefrag -v testfile | awk '{print $5}' | sort -n | uniq -c | head
>>>       2 
>>>       1 is
>>>       1 length
>>>       1 unwritten
>>>       6 3
>>>      10 4
>>>       6 5
>>>       5 6
>>>       3 7
>>>       1 8
>>
>> But longer extents too, right:
>>
>> $ filefrag -v testfile | awk '{print $5}' | sort -n | uniq -c | tail
>>       1 162
>>       1 164
>>       1 179
>>       1 188
>>       1 215
>>       1 231
>>       1 233
>>       1 255
>>       1 322
>>       1 357
>>
>>> Yet looking at the next file,
>>>
>>> [brong@imap14 conf]$ filefrag -v testfile2 | awk '{print $5}' | sort -n | uniq -c | tail
>>>       1 173
>>>       1 175
>>>       1 178
>>>       1 184
>>>       1 187
>>>       1 189
>>>       1 194
>>>       1 289
>>>       1 321
>>>       1 330
>>>
>>
>> and presumably shorter extents at the beginning?
> 
> Well, that's sorted.  Yes, there were shorter extents too.
> 
>> So it sounds like both files are a mix of long & short extents.
> 
> Definitely. 
> 
>>> There are multiple extents of hundreds of blocks in length.  Why weren't they used in allocating the first file?
>>
>> I'm not sure, offhand.  But just to be clear, while contiguous allocations are usually a nice side-effect of fallocate, nothing at all guarantees it.  It only guarantees that you'll have that space available for future writes.
> 
> Sure.  I was hoping it would help though!
> 
>> Still, it'd be interesting to figure out why the allocator is behaving this way.
>> It'd be interesting to see the freefrag info, the allocator might really be in scavenger mode.
> 
> What do you think from the output above.  Is that reasonable?  I'll check a more recently set-up machine.
> 
> [brong@imap30 ~]$ e2freefrag /dev/sdf1
> Device: /dev/sdf1
> Blocksize: 1024 bytes
> 
> Total blocks: 97124320
> Free blocks: 68429391 (70.5%)
> 
> Min. free extent: 1 KB 
> Max. free extent: 1009 KB
> Avg. free extent: 25 KB
> Num. free extent: 2781696
> 
> HISTOGRAM OF FREE EXTENT SIZES:
> Extent Size Range :  Free extents   Free Blocks  Percent
>     1K...    2K-  :        705257        705257    1.03%
>     2K...    4K-  :        553577       1348712    1.97%
>     4K...    8K-  :        349406       1789755    2.62%
>     8K...   16K-  :        289102       3185026    4.65%
>    16K...   32K-  :        279061       6307452    9.22%
>    32K...   64K-  :        271631      12321046   18.01%
>    64K...  128K-  :        205191      18340308   26.80%
>   128K...  256K-  :        110082      19121199   27.94%
>   256K...  512K-  :         16962       5584384    8.16%
>   512K... 1024K-  :          1427        882388    1.29%
> 
> This one is 100Gb SSDs from some other vendor (can't remember which) on hardware RAID1.  It's never been more than about 30% full.  It looks like a similar histogram of extent sizes.  Again it's a 1kb block size (piles of small files on these filesystems)
> 
> [brong@imap30 ~]$ dumpe2fs -h /dev/sdf1
> dumpe2fs 1.42.4 (12-Jun-2012)
> Filesystem volume name:   ssd30
> Last mounted on:          /mnt/ssd30
> Filesystem UUID:          c2623b6a-b3f4-4a5a-99e3-495f29112ba6
> Filesystem magic number:  0xEF53
> Filesystem revision #:    1 (dynamic)
> Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super huge_file uninit_bg dir_nlink extra_isize
> Filesystem flags:         signed_directory_hash 
> Default mount options:    (none)
> Filesystem state:         clean
> Errors behavior:          Continue
> Filesystem OS type:       Linux
> Inode count:              12140544
> Block count:              97124320
> Reserved block count:     4856216
> Free blocks:              68429391
> Free inodes:              7157347
> First block:              1
> Block size:               1024
> Fragment size:            1024
> Reserved GDT blocks:      256
> Blocks per group:         8192
> Fragments per group:      8192
> Inodes per group:         1024
> Inode blocks per group:   256
> Flex block group size:    16
> Filesystem created:       Tue Aug  2 07:39:40 2011
> Last mount time:          Thu Jan 24 23:15:41 2013
> Last write time:          Thu Jan 24 23:15:41 2013
> Mount count:              10
> Maximum mount count:      39
> Last checked:             Tue Aug  2 07:39:40 2011
> Check interval:           15552000 (6 months)
> Next check after:         Sun Jan 29 06:39:40 2012
> Lifetime writes:          13 TB
> Reserved blocks uid:      0 (user root)
> Reserved blocks gid:      0 (group root)
> First inode:              11
> Inode size:	          256
> Required extra isize:     28
> Desired extra isize:      28
> Journal inode:            8
> Default directory hash:   half_md4
> Directory Hash Seed:      0ecbfe75-57e3-4d4e-b4a8-bf0114dc0997
> Journal backup:           inode blocks
> Journal features:         journal_incompat_revoke
> Journal size:             32M
> Journal length:           32768
> Journal sequence:         0x32367a0d
> Journal start:            1537
> 
> Regards,
> 
> Bron.
> 

