From: Eric Sandeen <sandeen@redhat.com>
To: Bron Gondwana <brong@fastmail.fm>
Cc: linux-ext4@vger.kernel.org, Rob Mueller <robm@fastmail.fm>
Subject: Re: fallocate creating fragmented files
Date: Wed, 30 Jan 2013 09:56:51 -0600
Message-ID: <510942C3.1070503@redhat.com>
In-Reply-To: <1359527713.648.140661184334613.06CF38D4@webmail.messagingengine.com>
On 1/30/13 12:35 AM, Bron Gondwana wrote:
> On Wed, Jan 30, 2013, at 05:05 PM, Eric Sandeen wrote:
>> On 1/29/13 11:46 PM, Bron Gondwana wrote:
>>> Hi All,
>>>
>>> I'm trying to understand why my ext4 filesystem is creating highly fragmented files even though it's only just over 50% full.
>>
>> It's at least possible that freespace is very fragmented; you could try the "e2freefrag" command to see.
>
> [brong@imap14 ~]$ e2freefrag /dev/md0
> Device: /dev/md0
> Blocksize: 1024 bytes
> Total blocks: 62522624
> Free blocks: 26483551 (42.4%)
>
> Min. free extent: 1 KB
> Max. free extent: 757 KB
> Avg. free extent: 14 KB
> Num. free extent: 1940838
>
> HISTOGRAM OF FREE EXTENT SIZES:
> Extent Size Range : Free extents Free Blocks Percent
> 1K... 2K- : 538480 538480 2.03%
> 2K... 4K- : 362189 870860 3.29%
> 4K... 8K- : 321158 1681591 6.35%
> 8K... 16K- : 268848 2934959 11.08%
> 16K... 32K- : 210746 4697440 17.74%
> 32K... 64K- : 151755 6738418 25.44%
> 64K... 128K- : 63761 5512870 20.82%
> 128K... 256K- : 20563 3552580 13.41%
> 256K... 512K- : 3308 1047995 3.96%
> 512K... 1024K- : 30 17615 0.07%
Ok, TBH I'm not certain why the allocator is doing just what it's doing.
There are quite a lot of larger-than-3-block free spaces. OTOH, it might be
trying for some kind of locality.
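
As a rough check on the point above, the first e2freefrag histogram can be tallied to see how much of the free space sits in extents of 4 KB or more (at least 4 blocks at the 1 KB block size). This is a minimal Python sketch: the regular expression and the `tally` helper are illustrative assumptions, not part of e2fsprogs, and the rows are copied from the /dev/md0 output quoted above.

```python
import re

# Histogram rows copied from the e2freefrag output for /dev/md0 quoted above.
HISTOGRAM = """\
1K... 2K- : 538480 538480 2.03%
2K... 4K- : 362189 870860 3.29%
4K... 8K- : 321158 1681591 6.35%
8K... 16K- : 268848 2934959 11.08%
16K... 32K- : 210746 4697440 17.74%
32K... 64K- : 151755 6738418 25.44%
64K... 128K- : 63761 5512870 20.82%
128K... 256K- : 20563 3552580 13.41%
256K... 512K- : 3308 1047995 3.96%
512K... 1024K- : 30 17615 0.07%
"""

# Assumed row layout: "<low>K... <high>K- : <extents> <blocks> <pct>%"
ROW = re.compile(r"(\d+)K\.\.\.\s*(\d+)K-\s*:\s*(\d+)\s+(\d+)\s+[\d.]+%")


def tally(text, min_kb):
    """Sum extents and blocks for rows whose lower bound is >= min_kb.

    Returns (big_extents, big_blocks, all_extents, all_blocks).
    """
    big_extents = big_blocks = all_extents = all_blocks = 0
    for line in text.splitlines():
        m = ROW.search(line)
        if not m:
            continue  # skip anything that is not a histogram row
        low_kb, _high_kb, extents, blocks = map(int, m.groups())
        all_extents += extents
        all_blocks += blocks
        if low_kb >= min_kb:
            big_extents += extents
            big_blocks += blocks
    return big_extents, big_blocks, all_extents, all_blocks


big_extents, big_blocks, all_extents, all_blocks = tally(HISTOGRAM, min_kb=4)
print(f"{big_extents} of {all_extents} free extents are 4 KB or larger")
print(f"they hold {big_blocks / all_blocks:.1%} of the free blocks in the histogram")
```

With a 1 KB block size, the `min_kb=4` threshold corresponds to "larger than 3 blocks". The tally only describes free space; it says nothing about why the allocator picked the extents it did.
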
I think it'd take some digging into the allocator behavior; there may
be tracepoints that'd help.
-Eric
>>> Now looking at the verbose output, we can see that there are many extents of just 3 or 4 blocks:
>>>
>>> [brong@imap14 conf]$ filefrag -v testfile | awk '{print $5}' | sort -n | uniq -c | head
>>> 2
>>> 1 is
>>> 1 length
>>> 1 unwritten
>>> 6 3
>>> 10 4
>>> 6 5
>>> 5 6
>>> 3 7
>>> 1 8
>>
>> But longer extents too, right:
>>
>> $ filefrag -v testfile | awk '{print $5}' | sort -n | uniq -c | tail
>> 1 162
>> 1 164
>> 1 179
>> 1 188
>> 1 215
>> 1 231
>> 1 233
>> 1 255
>> 1 322
>> 1 357
>>
>>> Yet looking at the next file,
>>>
>>> [brong@imap14 conf]$ filefrag -v testfile2 | awk '{print $5}' | sort -n | uniq -c | tail
>>> 1 173
>>> 1 175
>>> 1 178
>>> 1 184
>>> 1 187
>>> 1 189
>>> 1 194
>>> 1 289
>>> 1 321
>>> 1 330
>>>
>>
>> and presumably shorter extents at the beginning?
>
> Well, that's sorted. Yes, there were shorter extents too.
>
>> So it sounds like both files are a mix of long & short extents.
>
> Definitely.
>
>>> There are multiple extents of hundreds of blocks in length. Why weren't they used in allocating the first file?
>>
>> I'm not sure, offhand. But just to be clear, while contiguous allocations are usually a nice side-effect of fallocate, nothing at all guarantees it. It only guarantees that you'll have that space available for future writes.
>
> Sure. I was hoping it would help though!
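
To illustrate the guarantee described above (space is reserved, but layout is not promised), here is a minimal Python sketch. It uses `os.posix_fallocate`, which wraps posix_fallocate(3) rather than calling fallocate(2) directly. The `preallocate` helper, the file name, and the 1 MiB size are assumptions chosen for illustration, not what was run in this thread.

```python
import os
import tempfile


def preallocate(path, length):
    """Reserve `length` bytes for `path` and return its stat result."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Ensures storage for the byte range [0, length) is allocated.
        # Nothing here asks for, or reports on, physical contiguity.
        os.posix_fallocate(fd, 0, length)
        return os.fstat(fd)
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    st = preallocate(os.path.join(tmp, "testfile"), 1 << 20)
    print("size:", st.st_size, "bytes; 512-byte blocks allocated:", st.st_blocks)
```

Running `filefrag -v` on the resulting file, as done earlier in the thread, is what shows how many extents the allocator actually used.
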
>
>> Still, it'd be interesting to figure out why the allocator is behaving this way.
>> It'd be interesting to see the freefrag info, the allocator might really be in scavenger mode.
>
> What do you think of the output above? Is that reasonable? I'll check a more recently set-up machine.
>
> [brong@imap30 ~]$ e2freefrag /dev/sdf1
> Device: /dev/sdf1
> Blocksize: 1024 bytes
>
> Total blocks: 97124320
> Free blocks: 68429391 (70.5%)
>
> Min. free extent: 1 KB
> Max. free extent: 1009 KB
> Avg. free extent: 25 KB
> Num. free extent: 2781696
>
> HISTOGRAM OF FREE EXTENT SIZES:
> Extent Size Range : Free extents Free Blocks Percent
> 1K... 2K- : 705257 705257 1.03%
> 2K... 4K- : 553577 1348712 1.97%
> 4K... 8K- : 349406 1789755 2.62%
> 8K... 16K- : 289102 3185026 4.65%
> 16K... 32K- : 279061 6307452 9.22%
> 32K... 64K- : 271631 12321046 18.01%
> 64K... 128K- : 205191 18340308 26.80%
> 128K... 256K- : 110082 19121199 27.94%
> 256K... 512K- : 16962 5584384 8.16%
> 512K... 1024K- : 1427 882388 1.29%
>
> This one is 100GB SSDs from some other vendor (can't remember which) on hardware RAID1. It's never been more than about 30% full. It looks like a similar histogram of extent sizes. Again it's a 1 KB block size (piles of small files on these filesystems).
>
> [brong@imap30 ~]$ dumpe2fs -h /dev/sdf1
> dumpe2fs 1.42.4 (12-Jun-2012)
> Filesystem volume name: ssd30
> Last mounted on: /mnt/ssd30
> Filesystem UUID: c2623b6a-b3f4-4a5a-99e3-495f29112ba6
> Filesystem magic number: 0xEF53
> Filesystem revision #: 1 (dynamic)
> Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super huge_file uninit_bg dir_nlink extra_isize
> Filesystem flags: signed_directory_hash
> Default mount options: (none)
> Filesystem state: clean
> Errors behavior: Continue
> Filesystem OS type: Linux
> Inode count: 12140544
> Block count: 97124320
> Reserved block count: 4856216
> Free blocks: 68429391
> Free inodes: 7157347
> First block: 1
> Block size: 1024
> Fragment size: 1024
> Reserved GDT blocks: 256
> Blocks per group: 8192
> Fragments per group: 8192
> Inodes per group: 1024
> Inode blocks per group: 256
> Flex block group size: 16
> Filesystem created: Tue Aug 2 07:39:40 2011
> Last mount time: Thu Jan 24 23:15:41 2013
> Last write time: Thu Jan 24 23:15:41 2013
> Mount count: 10
> Maximum mount count: 39
> Last checked: Tue Aug 2 07:39:40 2011
> Check interval: 15552000 (6 months)
> Next check after: Sun Jan 29 06:39:40 2012
> Lifetime writes: 13 TB
> Reserved blocks uid: 0 (user root)
> Reserved blocks gid: 0 (group root)
> First inode: 11
> Inode size: 256
> Required extra isize: 28
> Desired extra isize: 28
> Journal inode: 8
> Default directory hash: half_md4
> Directory Hash Seed: 0ecbfe75-57e3-4d4e-b4a8-bf0114dc0997
> Journal backup: inode blocks
> Journal features: journal_incompat_revoke
> Journal size: 32M
> Journal length: 32768
> Journal sequence: 0x32367a0d
> Journal start: 1537
>
> Regards,
>
> Bron.
>