* Insane file system overhead on large volume
@ 2012-01-27 7:50 Manny
2012-01-27 10:44 ` Christoph Hellwig
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Manny @ 2012-01-27 7:50 UTC (permalink / raw)
To: xfs
Hi there,
I'm not sure if this is intended behavior, but I was a bit stumped
when I formatted a 30TB volume (12x3TB minus 2x3TB for parity in RAID
6) with XFS and noticed that there were only 22 TB left. I just called
mkfs.xfs with default parameters - except for swidth and sunit, which
match the RAID setup.
Is it normal that I lost 8TB just for the file system? That's almost
30% of the volume. Should I set the block size higher? Or should I
increase the number of allocation groups? Would that make a
difference? What's the preferred method for handling such large
volumes?
Thanks a lot,
Manny
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: Insane file system overhead on large volume
2012-01-27 7:50 Insane file system overhead on large volume Manny
@ 2012-01-27 10:44 ` Christoph Hellwig
2012-01-27 19:15 ` Manny
2012-01-27 18:21 ` Eric Sandeen
2012-01-27 19:08 ` Stan Hoeppner
2 siblings, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2012-01-27 10:44 UTC (permalink / raw)
To: Manny; +Cc: xfs
On Fri, Jan 27, 2012 at 08:50:38AM +0100, Manny wrote:
> Hi there,
>
> I'm not sure if this is intended behavior, but I was a bit stumped
> when I formatted a 30TB volume (12x3TB minus 2x3TB for parity in RAID
> 6) with XFS and noticed that there were only 22 TB left. I just called
> mkfs.xfs with default parameters - except for swith and sunit which
> match the RAID setup.
>
> Is it normal that I lost 8TB just for the file system? That's almost
> 30% of the volume. Should I set the block size higher? Or should I
> increase the number of allocation groups? Would that make a
> difference? Whats the preferred method for handling such large
> volumes?
Where did you get the sizes for the raw volume and the filesystem usage
from?
* Re: Insane file system overhead on large volume
2012-01-27 7:50 Insane file system overhead on large volume Manny
2012-01-27 10:44 ` Christoph Hellwig
@ 2012-01-27 18:21 ` Eric Sandeen
2012-01-28 14:55 ` Martin Steigerwald
2012-01-27 19:08 ` Stan Hoeppner
2 siblings, 1 reply; 11+ messages in thread
From: Eric Sandeen @ 2012-01-27 18:21 UTC (permalink / raw)
To: Manny; +Cc: xfs
On 1/27/12 1:50 AM, Manny wrote:
> Hi there,
>
> I'm not sure if this is intended behavior, but I was a bit stumped
> when I formatted a 30TB volume (12x3TB minus 2x3TB for parity in RAID
> 6) with XFS and noticed that there were only 22 TB left. I just called
> mkfs.xfs with default parameters - except for swith and sunit which
> match the RAID setup.
>
> Is it normal that I lost 8TB just for the file system? That's almost
> 30% of the volume. Should I set the block size higher? Or should I
> increase the number of allocation groups? Would that make a
> difference? Whats the preferred method for handling such large
> volumes?
If it was 12x3TB I imagine you're confusing TB with TiB, so
perhaps your 30T is really only 27TiB to start with.
Anyway, fs metadata should not eat much space:
# mkfs.xfs -dfile,name=fsfile,size=30t
# ls -lh fsfile
-rw-r--r-- 1 root root 30T Jan 27 12:18 fsfile
# mount -o loop fsfile mnt/
# df -h mnt
Filesystem Size Used Avail Use% Mounted on
/tmp/fsfile 30T 5.0M 30T 1% /tmp/mnt
So Christoph's question was a good one; where are you getting
your sizes?
-Eric
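The TB/TiB arithmetic behind Eric's point can be sketched as follows (decimal terabytes as sold by disk vendors vs. binary tebibytes as reported by df):

```python
# Disk vendors use decimal TB (10**12 bytes); df and friends report
# binary TiB (2**40 bytes), so "30 TB" of raw disk is only ~27.3 TiB.
TB = 10**12
TiB = 2**40

raw_bytes = 12 * 3 * TB           # twelve 3 TB drives
print(raw_bytes / TiB)            # ~32.7 TiB raw

usable_bytes = 10 * 3 * TB        # RAID 6: two drives' worth of parity
print(usable_bytes / TiB)         # ~27.3 TiB usable -- Eric's "27TiB"
```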
> Thanks a lot,
> Manny
>
* Re: Insane file system overhead on large volume
2012-01-27 7:50 Insane file system overhead on large volume Manny
2012-01-27 10:44 ` Christoph Hellwig
2012-01-27 18:21 ` Eric Sandeen
@ 2012-01-27 19:08 ` Stan Hoeppner
2 siblings, 0 replies; 11+ messages in thread
From: Stan Hoeppner @ 2012-01-27 19:08 UTC (permalink / raw)
To: xfs
On 1/27/2012 1:50 AM, Manny wrote:
> Hi there,
>
> I'm not sure if this is intended behavior, but I was a bit stumped
> when I formatted a 30TB volume (12x3TB minus 2x3TB for parity in RAID
> 6) with XFS and noticed that there were only 22 TB left. I just called
> mkfs.xfs with default parameters - except for swith and sunit which
> match the RAID setup.
>
> Is it normal that I lost 8TB just for the file system? That's almost
> 30% of the volume. Should I set the block size higher? Or should I
> increase the number of allocation groups? Would that make a
> difference? Whats the preferred method for handling such large
> volumes?
Maybe you simply assigned 2 spares and forgot, so you actually only have
10 RAID6 disks with 8 disks worth of stripe, equaling 24 TB, or 21.8
TiB. 21.8 TiB matches up pretty closely with your 22 TB, so this
scenario seems pretty plausible, dare I say likely.
If this is the case you'll want to reformat the 10 disk RAID6 with the
proper sunit/swidth values.
--
Stan
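Stan's back-of-the-envelope numbers can be checked the same way (a sketch assuming his hypothetical 10-disk RAID 6 of 3 TB drives):

```python
TB, TiB = 10**12, 2**40

data_disks = 10 - 2               # a 10-disk RAID 6 keeps 8 disks of data
usable = data_disks * 3 * TB      # 24 TB usable
print(usable / TiB)               # ~21.8 TiB -- close to the reported "22 TB"
```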
* Re: Insane file system overhead on large volume
2012-01-27 10:44 ` Christoph Hellwig
@ 2012-01-27 19:15 ` Manny
0 siblings, 0 replies; 11+ messages in thread
From: Manny @ 2012-01-27 19:15 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
> Where did you get the sizes for the raw volume and the filesystem usage
> from?
Oh my god, you are so right. The raw volume was actually just 24TB. My
RAID controller decided to leave 6TB on the VDisk for a snap pool.
Thanks for the hint, and sorry to bother you.
* Re: Insane file system overhead on large volume
2012-01-27 18:21 ` Eric Sandeen
@ 2012-01-28 14:55 ` Martin Steigerwald
2012-01-28 15:35 ` Eric Sandeen
0 siblings, 1 reply; 11+ messages in thread
From: Martin Steigerwald @ 2012-01-28 14:55 UTC (permalink / raw)
To: xfs; +Cc: Manny, Eric Sandeen
On Friday, 27 January 2012, Eric Sandeen wrote:
> On 1/27/12 1:50 AM, Manny wrote:
> > Hi there,
> >
> > I'm not sure if this is intended behavior, but I was a bit stumped
> > when I formatted a 30TB volume (12x3TB minus 2x3TB for parity in RAID
> > 6) with XFS and noticed that there were only 22 TB left. I just
> > called mkfs.xfs with default parameters - except for swith and sunit
> > which match the RAID setup.
> >
> > Is it normal that I lost 8TB just for the file system? That's almost
> > 30% of the volume. Should I set the block size higher? Or should I
> > increase the number of allocation groups? Would that make a
> > difference? Whats the preferred method for handling such large
> > volumes?
>
> If it was 12x3TB I imagine you're confusing TB with TiB, so
> perhaps your 30T is really only 27TiB to start with.
>
> Anyway, fs metadata should not eat much space:
>
> # mkfs.xfs -dfile,name=fsfile,size=30t
> # ls -lh fsfile
> -rw-r--r-- 1 root root 30T Jan 27 12:18 fsfile
> # mount -o loop fsfile mnt/
> # df -h mnt
> Filesystem Size Used Avail Use% Mounted on
> /tmp/fsfile 30T 5.0M 30T 1% /tmp/mnt
>
> So Christoph's question was a good one; where are you getting
> your sizes?
An academic question:
Why is it that I get
merkaba:/tmp> mkfs.xfs -dfile,name=fsfile,size=30t
meta-data=fsfile                 isize=256    agcount=30, agsize=268435455 blks
         =                       sectsz=512   attr=2, projid32bit=0
data     =                       bsize=4096   blocks=8053063650, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
merkaba:/tmp> mount -o loop fsfile /mnt/zeit
merkaba:/tmp> df -hT /mnt/zeit
Dateisystem Typ Größe Benutzt Verf. Verw% Eingehängt auf
/dev/loop0 xfs 30T 33M 30T 1% /mnt/zeit
merkaba:/tmp> LANG=C df -hT /mnt/zeit
Filesystem Type Size Used Avail Use% Mounted on
/dev/loop0 xfs 30T 33M 30T 1% /mnt/zeit
33MiB used on first mount instead of 5?
merkaba:/tmp> cat /proc/version
Linux version 3.2.0-1-amd64 (Debian 3.2.1-2) ([…]) (gcc version 4.6.2
(Debian 4.6.2-12) ) #1 SMP Tue Jan 24 05:01:45 UTC 2012
merkaba:/tmp> mkfs.xfs -V
mkfs.xfs Version 3.1.7
Maybe it's due to my using a tmpfs for /tmp:
merkaba:/tmp> LANG=C df -hT .
Filesystem Type Size Used Avail Use% Mounted on
tmpfs tmpfs 2.0G 2.0G 6.6M 100% /tmp
Hmmm, but creating the file on Ext4 does not work:
merkaba:/home> LANG=C df -hT .
Filesystem Type Size Used Avail Use% Mounted on
/dev/mapper/merkaba-home ext4 224G 202G 20G 92% /home
merkaba:/home> LANG=C mkfs.xfs -dfile,name=fsfile,size=30t
meta-data=fsfile                 isize=256    agcount=30, agsize=268435455 blks
         =                       sectsz=512   attr=2, projid32bit=0
data     =                       bsize=4096   blocks=8053063650, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
mkfs.xfs: Growing the data section failed
fallocate instead of sparse file?
And on BTRFS as well as XFS it appears to try to create a 30T file for
real, i.e. by writing data - I stopped it before it could do too much
harm.
Where did you create that huge-ish XFS file?
Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
* Re: Insane file system overhead on large volume
2012-01-28 14:55 ` Martin Steigerwald
@ 2012-01-28 15:35 ` Eric Sandeen
2012-01-28 16:05 ` Christoph Hellwig
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Eric Sandeen @ 2012-01-28 15:35 UTC (permalink / raw)
To: Martin Steigerwald; +Cc: Manny, xfs
On 1/28/12 8:55 AM, Martin Steigerwald wrote:
> On Friday, 27 January 2012, Eric Sandeen wrote:
>> On 1/27/12 1:50 AM, Manny wrote:
>>> Hi there,
>>>
>>> I'm not sure if this is intended behavior, but I was a bit stumped
>>> when I formatted a 30TB volume (12x3TB minus 2x3TB for parity in RAID
>>> 6) with XFS and noticed that there were only 22 TB left. I just
>>> called mkfs.xfs with default parameters - except for swith and sunit
>>> which match the RAID setup.
>>>
>>> Is it normal that I lost 8TB just for the file system? That's almost
>>> 30% of the volume. Should I set the block size higher? Or should I
>>> increase the number of allocation groups? Would that make a
>>> difference? Whats the preferred method for handling such large
>>> volumes?
>>
>> If it was 12x3TB I imagine you're confusing TB with TiB, so
>> perhaps your 30T is really only 27TiB to start with.
>>
>> Anyway, fs metadata should not eat much space:
>>
>> # mkfs.xfs -dfile,name=fsfile,size=30t
>> # ls -lh fsfile
>> -rw-r--r-- 1 root root 30T Jan 27 12:18 fsfile
>> # mount -o loop fsfile mnt/
>> # df -h mnt
>> Filesystem Size Used Avail Use% Mounted on
>> /tmp/fsfile 30T 5.0M 30T 1% /tmp/mnt
>>
>> So Christoph's question was a good one; where are you getting
>> your sizes?
To solve your original problem, can you answer the above question?
Adding your actual RAID config output (/proc/mdstat maybe) would help
too.
> An academic question:
>
> Why is it that I get
>
> merkaba:/tmp> mkfs.xfs -dfile,name=fsfile,size=30t
> meta-data=fsfile                 isize=256    agcount=30, agsize=268435455 blks
>          =                       sectsz=512   attr=2, projid32bit=0
> data     =                       bsize=4096   blocks=8053063650, imaxpct=5
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal log           bsize=4096   blocks=521728, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
>
> merkaba:/tmp> mount -o loop fsfile /mnt/zeit
> merkaba:/tmp> df -hT /mnt/zeit
> Dateisystem Typ Größe Benutzt Verf. Verw% Eingehängt auf
> /dev/loop0 xfs 30T 33M 30T 1% /mnt/zeit
> merkaba:/tmp> LANG=C df -hT /mnt/zeit
> Filesystem Type Size Used Avail Use% Mounted on
> /dev/loop0 xfs 30T 33M 30T 1% /mnt/zeit
>
>
> 33MiB used on first mount instead of 5?
Not sure offhand; perhaps differences in mkfs defaults between
xfsprogs versions.
...
> Hmmm, but creating the file on Ext4 does not work:
ext4 is not designed to handle very large files, so anything
above 16T will fail.
> fallocate instead of sparse file?
no, you just ran into file offset limits on ext4.
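The 16T ceiling follows from ext4 (as of this thread) addressing file blocks with 32-bit numbers; at the default 4 KiB block size, a quick check:

```python
# ext4 addresses a file's blocks with 32-bit logical block numbers, so
# with 4 KiB blocks the maximum file offset is 2**32 * 4096 bytes =
# 16 TiB -- well short of the 30 T sparse file mkfs tried to create.
max_offset = 2**32 * 4096
print(max_offset // 2**40)        # 16 (TiB)
```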
> And on BTRFS as well as XFS it appears to try to create a 30T file for
> real, i.e. by writing data - I stopped it before it could do too much
> harm.
Why do you say that it appears to create a 30T file for real? It
should not...
> Where did you create that hugish XFS file?
On XFS. Of course. :)
> Ciao,
* Re: Insane file system overhead on large volume
2012-01-28 15:35 ` Eric Sandeen
@ 2012-01-28 16:05 ` Christoph Hellwig
2012-01-28 16:07 ` Eric Sandeen
2012-01-28 16:23 ` Martin Steigerwald
2 siblings, 0 replies; 11+ messages in thread
From: Christoph Hellwig @ 2012-01-28 16:05 UTC (permalink / raw)
To: Eric Sandeen; +Cc: Manny, xfs
Everyone calm down, Manny already replied and mentioned the problem.
* Re: Insane file system overhead on large volume
2012-01-28 15:35 ` Eric Sandeen
2012-01-28 16:05 ` Christoph Hellwig
@ 2012-01-28 16:07 ` Eric Sandeen
2012-01-28 16:23 ` Martin Steigerwald
2 siblings, 0 replies; 11+ messages in thread
From: Eric Sandeen @ 2012-01-28 16:07 UTC (permalink / raw)
To: Martin Steigerwald; +Cc: Manny, xfs
On 1/28/12 9:35 AM, Eric Sandeen wrote:
> On 1/28/12 8:55 AM, Martin Steigerwald wrote:
>> On Friday, 27 January 2012, Eric Sandeen wrote:
...
>>> So Christoph's question was a good one; where are you getting
>>> your sizes?
>
> To solve your original problem, can you answer the above question?
> Adding your actual raid config output (/proc/mdstat maybe) would help
> too.
Sorry, never mind. I missed the earlier reply about solving the problem and
confused the responders. Argh.
-Eric
* Re: Insane file system overhead on large volume
2012-01-28 15:35 ` Eric Sandeen
2012-01-28 16:05 ` Christoph Hellwig
2012-01-28 16:07 ` Eric Sandeen
@ 2012-01-28 16:23 ` Martin Steigerwald
2012-01-29 22:18 ` Dave Chinner
2 siblings, 1 reply; 11+ messages in thread
From: Martin Steigerwald @ 2012-01-28 16:23 UTC (permalink / raw)
To: Eric Sandeen; +Cc: Manny, xfs
On Saturday, 28 January 2012, Eric Sandeen wrote:
> On 1/28/12 8:55 AM, Martin Steigerwald wrote:
> > On Friday, 27 January 2012, Eric Sandeen wrote:
> >> On 1/27/12 1:50 AM, Manny wrote:
> >>> Hi there,
> >>>
> >>> I'm not sure if this is intended behavior, but I was a bit stumped
> >>> when I formatted a 30TB volume (12x3TB minus 2x3TB for parity in
> >>> RAID 6) with XFS and noticed that there were only 22 TB left. I
> >>> just called mkfs.xfs with default parameters - except for swith
> >>> and sunit which match the RAID setup.
> >>>
> >>> Is it normal that I lost 8TB just for the file system? That's
> >>> almost 30% of the volume. Should I set the block size higher? Or
> >>> should I increase the number of allocation groups? Would that make
> >>> a difference? Whats the preferred method for handling such large
> >>> volumes?
> >>
> >> If it was 12x3TB I imagine you're confusing TB with TiB, so
> >> perhaps your 30T is really only 27TiB to start with.
> >>
> >> Anyway, fs metadata should not eat much space:
> >>
> >> # mkfs.xfs -dfile,name=fsfile,size=30t
> >> # ls -lh fsfile
> >> -rw-r--r-- 1 root root 30T Jan 27 12:18 fsfile
> >> # mount -o loop fsfile mnt/
> >> # df -h mnt
> >> Filesystem Size Used Avail Use% Mounted on
> >> /tmp/fsfile 30T 5.0M 30T 1% /tmp/mnt
> >>
> >> So Christoph's question was a good one; where are you getting
> >> your sizes?
>
> To solve your original problem, can you answer the above question?
> Adding your actual raid config output (/proc/mdstat maybe) would help
> too.
Eric, I wrote
> > An academic question:
to make clear that it was just something I was curious about.
I was not the reporter of the problem anyway; I have no problem, and
the reporter's problem is solved (see his answer), so all is good ;)
With your hint and some thinking and testing I was able to
resolve most of my other questions. Thanks.
For the gory details:
> > Why is it that I get
[…]
> > merkaba:/tmp> LANG=C df -hT /mnt/zeit
> > Filesystem Type Size Used Avail Use% Mounted on
> > /dev/loop0 xfs 30T 33M 30T 1% /mnt/zeit
> >
> >
> > 33MiB used on first mount instead of 5?
>
> Not sure offhand, differences in xfsprogs version mkfs defaults
> perhaps.
Okay, that's fine with me. I was just curious. It doesn't matter much.
> > Hmmm, but creating the file on Ext4 does not work:
> ext4 is not designed to handle very large files, so anything
> above 16T will fail.
>
> > fallocate instead of sparse file?
>
> no, you just ran into file offset limits on ext4.
Oh, yes. Completely forgot about these Ext4 limits. Sorry.
> > And on BTRFS as well as XFS it appears to try to create a 30T file
> > for real, i.e. by writing data - I stopped it before it could do too
> > much harm.
>
> Why do you say that it appears to create a 30T file for real? It
> should not...
I jumped to a conclusion too quickly. It did cause an I/O storm on the
Intel SSD 320:
martin@merkaba:~> vmstat -S M 1 (not applied to bi/bo)
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 1630 4365 87 1087 0 0 101 53 7 81 5 2 93 0
1 0 1630 4365 87 1087 0 0 0 0 428 769 1 0 99 0
2 0 1630 4365 87 1087 0 0 0 0 426 740 1 1 99 0
0 0 1630 4358 87 1088 0 0 0 0 1165 2297 4 7 89 0
0 0 1630 4357 87 1088 0 0 0 40 1736 3434 8 6 86 0
0 0 1630 4357 87 1088 0 0 0 0 614 1121 3 1 96 0
0 0 1630 4357 87 1088 0 0 0 32 359 636 0 0 100 0
1 1 1630 3852 87 1585 0 0 13 81540 529 1045 1 7 91 1
0 3 1630 3398 87 2027 0 0 0 227940 1357 2764 0 9 54 37
4 3 1630 3225 87 2188 0 0 0 212004 2346 4796 5 6 41 49
1 3 1630 2992 87 2415 0 0 0 215608 1825 3821 1 6 42 50
0 2 1630 2820 87 2582 0 0 0 200492 1476 3089 3 6 49 41
1 1 1630 2569 87 2832 0 0 0 198156 1250 2508 0 6 59 34
0 2 1630 2386 87 3009 0 0 0 229896 1301 2611 1 6 56 37
0 2 1630 2266 87 3126 0 0 0 302876 1067 2093 0 5 62 33
1 3 1630 2266 87 3126 0 0 0 176092 723 1321 0 3 71 26
0 3 1630 2266 87 3126 0 0 0 163840 706 1351 0 1 74 25
0 1 1630 2266 87 3126 0 0 0 80104 3137 6228 1 4 69 26
0 0 1630 2267 87 3126 0 0 0 3 3505 7035 6 3 86 5
0 0 1630 2266 87 3126 0 0 0 0 631 1203 4 1 95 0
0 0 1630 2259 87 3127 0 0 0 0 715 1398 4 2 94 0
2 0 1630 2259 87 3127 0 0 0 0 1501 3087 10 3 86 0
0 0 1630 2259 87 3127 0 0 0 27 945 1883 5 2 93 0
0 0 1630 2259 87 3127 0 0 0 0 399 713 1 0 99 0
^C
But then it stopped. So mkfs.xfs was just writing metadata, it seems,
and I obviously didn't see this in the tmpfs.
But reviewing it, creating a 30TB XFS filesystem should involve writing
some metadata at different places in the file.
I get:
merkaba:/mnt/zeit> LANG=C xfs_bmap fsfile
fsfile:
0: [0..255]: 96..351
1: [256..2147483639]: hole
2: [2147483640..2147483671]: 3400032..3400063
3: [2147483672..4294967279]: hole
4: [4294967280..4294967311]: 3400064..3400095
5: [4294967312..6442450919]: hole
6: [6442450920..6442450951]: 3400096..3400127
7: [6442450952..8589934559]: hole
8: [8589934560..8589934591]: 3400128..3400159
9: [8589934592..10737418199]: hole
10: [10737418200..10737418231]: 3400160..3400191
11: [10737418232..12884901839]: hole
12: [12884901840..12884901871]: 3400192..3400223
13: [12884901872..15032385479]: hole
14: [15032385480..15032385511]: 3400224..3400255
15: [15032385512..17179869119]: hole
16: [17179869120..17179869151]: 3400256..3400287
17: [17179869152..19327352759]: hole
18: [19327352760..19327352791]: 3400296..3400327
19: [19327352792..21474836399]: hole
20: [21474836400..21474836431]: 3400328..3400359
21: [21474836432..23622320039]: hole
22: [23622320040..23622320071]: 3400360..3400391
23: [23622320072..25769803679]: hole
24: [25769803680..25769803711]: 3400392..3400423
25: [25769803712..27917287319]: hole
26: [27917287320..27917287351]: 3400424..3400455
27: [27917287352..30064770959]: hole
28: [30064770960..30064770991]: 3400456..3400487
29: [30064770992..32212254599]: hole
30: [32212254600..32212254631]: 3400488..3400519
31: [32212254632..32215654311]: 352..3400031
32: [32215654312..32216428455]: 3400520..4174663
33: [32216428456..34359738239]: hole
34: [34359738240..34359738271]: 4174664..4174695
35: [34359738272..36507221879]: hole
36: [36507221880..36507221911]: 4174696..4174727
37: [36507221912..38654705519]: hole
38: [38654705520..38654705551]: 4174728..4174759
39: [38654705552..40802189159]: hole
40: [40802189160..40802189191]: 4174760..4174791
41: [40802189192..42949672799]: hole
42: [42949672800..42949672831]: 4174792..4174823
43: [42949672832..45097156439]: hole
44: [45097156440..45097156471]: 4174824..4174855
45: [45097156472..47244640079]: hole
46: [47244640080..47244640111]: 4174856..4174887
47: [47244640112..49392123719]: hole
48: [49392123720..49392123751]: 4174888..4174919
49: [49392123752..51539607359]: hole
50: [51539607360..51539607391]: 4174920..4174951
51: [51539607392..53687090999]: hole
52: [53687091000..53687091031]: 4174952..4174983
53: [53687091032..55834574639]: hole
54: [55834574640..55834574671]: 4174984..4175015
55: [55834574672..57982058279]: hole
56: [57982058280..57982058311]: 4175016..4175047
57: [57982058312..60129541919]: hole
58: [60129541920..60129541951]: 4175048..4175079
59: [60129541952..62277025559]: hole
60: [62277025560..62277025591]: 4175080..4175111
61: [62277025592..64424509191]: hole
62: [64424509192..64424509199]: 4175112..4175119
Okay, it needed to write 2 GB:
merkaba:/mnt/zeit> du -h fsfile
2,0G fsfile
merkaba:/mnt/zeit> du --apparent-size -h fsfile
30T fsfile
merkaba:/mnt/zeit>
I didn't expect mkfs.xfs to write 2 GB, but thinking it through
for a 30 TB filesystem, I find this reasonable.
Still it has 33 MiB for metadata:
merkaba:/mnt/zeit> mkdir bigfilefs
merkaba:/mnt/zeit> mount -o loop fsfile bigfilefs
merkaba:/mnt/zeit> LANG=C df -hT bigfilefs
Filesystem Type Size Used Avail Use% Mounted on
/dev/loop0 xfs 30T 33M 30T 1% /mnt/zeit/bigfilefs
Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
* Re: Insane file system overhead on large volume
2012-01-28 16:23 ` Martin Steigerwald
@ 2012-01-29 22:18 ` Dave Chinner
0 siblings, 0 replies; 11+ messages in thread
From: Dave Chinner @ 2012-01-29 22:18 UTC (permalink / raw)
To: Martin Steigerwald; +Cc: Manny, Eric Sandeen, xfs
On Sat, Jan 28, 2012 at 05:23:42PM +0100, Martin Steigerwald wrote:
> On Saturday, 28 January 2012, Eric Sandeen wrote:
> > On 1/28/12 8:55 AM, Martin Steigerwald wrote:
> For the gory details:
>
> > > Why is it that I get
> […]
> > > merkaba:/tmp> LANG=C df -hT /mnt/zeit
> > > Filesystem Type Size Used Avail Use% Mounted on
> > > /dev/loop0 xfs 30T 33M 30T 1% /mnt/zeit
> > >
> > >
> > > 33MiB used on first mount instead of 5?
> >
> > Not sure offhand, differences in xfsprogs version mkfs defaults
> > perhaps.
>
> Okay, that's fine with me. I was just curious. It doesn't matter much.
More likely the kernel. Older kernels only use 1024 blocks for
the reserve block pool, while more recent ones use 8192 blocks.
$ gl -n 1 8babd8a
commit 8babd8a2e75cccff3167a61176c2a3e977e13799
Author: Dave Chinner <david@fromorbit.com>
Date: Thu Mar 4 01:46:25 2010 +0000
xfs: Increase the default size of the reserved blocks pool
The current default size of the reserved blocks pool is easy to deplete
with certain workloads, in particular workloads that do lots of concurrent
delayed allocation extent conversions. If enough transactions are running
in parallel and the entire pool is consumed then subsequent calls to
xfs_trans_reserve() will fail with ENOSPC. Also add a rate limited
warning so we know if this starts happening again.
This is an updated version of an old patch from Lachlan McIlroy.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
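Plugging in the default 4 KiB block size, the two reserve-pool sizes line up (roughly; df's "Used" also counts a little other metadata) with the figures seen in this thread:

```python
# With 4 KiB filesystem blocks, the newer default reserve pool of 8192
# blocks is 32 MiB (close to Martin's 33M "Used"), while the old
# 1024-block pool is 4 MiB (close to Eric's 5.0M).
block = 4096
print(8192 * block // 2**20)      # 32 (MiB)
print(1024 * block // 2**20)      # 4  (MiB)
```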
> But when I review it, creating a 30TB XFS filesystem should involve writing
> some metadata at different places of the file.
>
> I get:
>
> merkaba:/mnt/zeit> LANG=C xfs_bmap fsfile
> fsfile:
> 0: [0..255]: 96..351
> 1: [256..2147483639]: hole
> 2: [2147483640..2147483671]: 3400032..3400063
> 3: [2147483672..4294967279]: hole
> 4: [4294967280..4294967311]: 3400064..3400095
> 5: [4294967312..6442450919]: hole
> 6: [6442450920..6442450951]: 3400096..3400127
> 7: [6442450952..8589934559]: hole
.....
Yeah, that's all the AG headers.
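Dave's point can be checked against the mkfs output quoted earlier: xfs_bmap reports ranges in 512-byte sectors, and with agcount=30 / agsize=268435455 (4 KiB blocks), the small extents fall exactly at allocation-group boundaries:

```python
# Convert the allocation group size from 4 KiB filesystem blocks to the
# 512-byte sectors that xfs_bmap reports.
agsize_blocks = 268435455
sectors_per_block = 4096 // 512
ag_sectors = agsize_blocks * sectors_per_block
print(ag_sectors)   # 2147483640 -- the start of extent 2 in the bmap output
```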
> Okay, it needed to write 2 GB:
>
> merkaba:/mnt/zeit> du -h fsfile
> 2,0G fsfile
> merkaba:/mnt/zeit> du --apparent-size -h fsfile
> 30T fsfile
> merkaba:/mnt/zeit>
>
> I didn't expect mkfs.xfs to write 2 GB, but when thinking through it
> for a 30 TB filesystem I find this reasonable.
It zeroed the log, which will be just under 2GB in size for a
filesystem that large. Zeroing the log accounts for >99% of the IO
that mkfs does for most normal cases.
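This matches the log geometry from the mkfs output quoted earlier (blocks=521728, bsize=4096):

```python
# The internal log is 521728 blocks of 4 KiB; zeroing it writes just
# under 2 GiB, which matches du's "2,0G" for the image file.
log_bytes = 521728 * 4096
print(round(log_bytes / 2**30, 2))   # ~1.99 (GiB)
```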
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com