* Insane file system overhead on large volume
@ 2012-01-27 7:50 Manny
2012-01-27 10:44 ` Christoph Hellwig
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Manny @ 2012-01-27 7:50 UTC (permalink / raw)
To: xfs
Hi there,
I'm not sure if this is intended behavior, but I was a bit stumped
when I formatted a 30TB volume (12x3TB minus 2x3TB for parity in RAID
6) with XFS and noticed that there were only 22 TB left. I just called
mkfs.xfs with default parameters - except for swidth and sunit, which
match the RAID setup.
Is it normal that I lost 8TB just for the file system? That's almost
30% of the volume. Should I set the block size higher? Or should I
increase the number of allocation groups? Would that make a
difference? What's the preferred method for handling such large
volumes?
Thanks a lot,
Manny
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: Insane file system overhead on large volume
  2012-01-27  7:50 Insane file system overhead on large volume Manny
@ 2012-01-27 10:44 ` Christoph Hellwig
  2012-01-27 19:15   ` Manny
  2012-01-27 18:21 ` Eric Sandeen
  2012-01-27 19:08 ` Stan Hoeppner
  2 siblings, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2012-01-27 10:44 UTC (permalink / raw)
  To: Manny; +Cc: xfs

On Fri, Jan 27, 2012 at 08:50:38AM +0100, Manny wrote:
> Hi there,
>
> I'm not sure if this is intended behavior, but I was a bit stumped
> when I formatted a 30TB volume (12x3TB minus 2x3TB for parity in RAID
> 6) with XFS and noticed that there were only 22 TB left. I just called
> mkfs.xfs with default parameters - except for swidth and sunit, which
> match the RAID setup.
>
> Is it normal that I lost 8TB just for the file system? That's almost
> 30% of the volume. Should I set the block size higher? Or should I
> increase the number of allocation groups? Would that make a
> difference? What's the preferred method for handling such large
> volumes?

Where did you get the sizes for the raw volume and the filesystem usage
from?
* Re: Insane file system overhead on large volume
  2012-01-27 10:44 ` Christoph Hellwig
@ 2012-01-27 19:15   ` Manny
  0 siblings, 0 replies; 11+ messages in thread
From: Manny @ 2012-01-27 19:15 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

> Where did you get the sizes for the raw volume and the filesystem
> usage from?

Oh my god, you are so right. The raw volume was actually just 24TB. My
RAID controller decided to leave 6TB on the VDisk for a snap pool.
Thanks for the hint, and sorry to bother you.
* Re: Insane file system overhead on large volume
  2012-01-27  7:50 Insane file system overhead on large volume Manny
  2012-01-27 10:44 ` Christoph Hellwig
@ 2012-01-27 18:21 ` Eric Sandeen
  2012-01-28 14:55   ` Martin Steigerwald
  2012-01-27 19:08 ` Stan Hoeppner
  2 siblings, 1 reply; 11+ messages in thread
From: Eric Sandeen @ 2012-01-27 18:21 UTC (permalink / raw)
  To: Manny; +Cc: xfs

On 1/27/12 1:50 AM, Manny wrote:
> Hi there,
>
> I'm not sure if this is intended behavior, but I was a bit stumped
> when I formatted a 30TB volume (12x3TB minus 2x3TB for parity in RAID
> 6) with XFS and noticed that there were only 22 TB left. I just called
> mkfs.xfs with default parameters - except for swidth and sunit, which
> match the RAID setup.
>
> Is it normal that I lost 8TB just for the file system? That's almost
> 30% of the volume. Should I set the block size higher? Or should I
> increase the number of allocation groups? Would that make a
> difference? What's the preferred method for handling such large
> volumes?

If it was 12x3TB I imagine you're confusing TB with TiB, so
perhaps your 30T is really only 27TiB to start with.

Anyway, fs metadata should not eat much space:

# mkfs.xfs -dfile,name=fsfile,size=30t
# ls -lh fsfile
-rw-r--r-- 1 root root 30T Jan 27 12:18 fsfile
# mount -o loop fsfile mnt/
# df -h mnt
Filesystem      Size  Used Avail Use% Mounted on
/tmp/fsfile      30T  5.0M   30T   1% /tmp/mnt

So Christoph's question was a good one; where are you getting
your sizes?

-Eric

> Thanks a lot,
> Manny
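Eric's TB-vs-TiB point is easy to verify with a few lines of arithmetic; the following Python sketch is illustrative only and not part of the thread:

```python
# Drive vendors advertise decimal terabytes (10**12 bytes), while df,
# ls -h and friends report binary tebibytes (2**40 bytes). The gap
# between the two grows with volume size.
TB = 10**12
TiB = 2**40

# 12 x 3TB disks in RAID6 leave 10 data disks' worth of capacity.
usable_bytes = 10 * 3 * TB
tib = usable_bytes / TiB
print(round(tib, 2))  # ~27.28 -- the "30TB" volume is ~27.3TiB before mkfs runs
```

So roughly 2.7 "TB" of the apparent shortfall is pure unit conversion, before any filesystem metadata is accounted for.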
* Re: Insane file system overhead on large volume
  2012-01-27 18:21 ` Eric Sandeen
@ 2012-01-28 14:55   ` Martin Steigerwald
  2012-01-28 15:35     ` Eric Sandeen
  0 siblings, 1 reply; 11+ messages in thread
From: Martin Steigerwald @ 2012-01-28 14:55 UTC (permalink / raw)
  To: xfs; +Cc: Manny, Eric Sandeen

On Friday, 27 January 2012, Eric Sandeen wrote:
> On 1/27/12 1:50 AM, Manny wrote:
> > Hi there,
> >
> > I'm not sure if this is intended behavior, but I was a bit stumped
> > when I formatted a 30TB volume (12x3TB minus 2x3TB for parity in RAID
> > 6) with XFS and noticed that there were only 22 TB left. I just
> > called mkfs.xfs with default parameters - except for swidth and sunit,
> > which match the RAID setup.
> >
> > Is it normal that I lost 8TB just for the file system? That's almost
> > 30% of the volume. Should I set the block size higher? Or should I
> > increase the number of allocation groups? Would that make a
> > difference? What's the preferred method for handling such large
> > volumes?
>
> If it was 12x3TB I imagine you're confusing TB with TiB, so
> perhaps your 30T is really only 27TiB to start with.
>
> Anyway, fs metadata should not eat much space:
>
> # mkfs.xfs -dfile,name=fsfile,size=30t
> # ls -lh fsfile
> -rw-r--r-- 1 root root 30T Jan 27 12:18 fsfile
> # mount -o loop fsfile mnt/
> # df -h mnt
> Filesystem      Size  Used Avail Use% Mounted on
> /tmp/fsfile      30T  5.0M   30T   1% /tmp/mnt
>
> So Christoph's question was a good one; where are you getting
> your sizes?

An academic question:

Why is it that I get

merkaba:/tmp> mkfs.xfs -dfile,name=fsfile,size=30t
meta-data=fsfile               isize=256    agcount=30, agsize=268435455 blks
         =                     sectsz=512   attr=2, projid32bit=0
data     =                     bsize=4096   blocks=8053063650, imaxpct=5
         =                     sunit=0      swidth=0 blks
naming   =version 2            bsize=4096   ascii-ci=0
log      =internal log         bsize=4096   blocks=521728, version=2
         =                     sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                 extsz=4096   blocks=0, rtextents=0

merkaba:/tmp> mount -o loop fsfile /mnt/zeit
merkaba:/tmp> LANG=C df -hT /mnt/zeit
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/loop0     xfs    30T   33M   30T   1% /mnt/zeit

33MiB used on first mount instead of 5?

merkaba:/tmp> cat /proc/version
Linux version 3.2.0-1-amd64 (Debian 3.2.1-2) ([…]) (gcc version 4.6.2
(Debian 4.6.2-12) ) #1 SMP Tue Jan 24 05:01:45 UTC 2012
merkaba:/tmp> mkfs.xfs -V
mkfs.xfs Version 3.1.7

Maybe it's due to me using a tmpfs for /tmp:

merkaba:/tmp> LANG=C df -hT .
Filesystem     Type   Size  Used Avail Use% Mounted on
tmpfs          tmpfs  2.0G  2.0G  6.6M 100% /tmp

Hmmm, but creating the file on ext4 does not work:

merkaba:/home> LANG=C df -hT .
Filesystem               Type  Size  Used Avail Use% Mounted on
/dev/mapper/merkaba-home ext4  224G  202G   20G  92% /home
merkaba:/home> LANG=C mkfs.xfs -dfile,name=fsfile,size=30t
meta-data=fsfile               isize=256    agcount=30, agsize=268435455 blks
         =                     sectsz=512   attr=2, projid32bit=0
data     =                     bsize=4096   blocks=8053063650, imaxpct=5
         =                     sunit=0      swidth=0 blks
naming   =version 2            bsize=4096   ascii-ci=0
log      =internal log         bsize=4096   blocks=521728, version=2
         =                     sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                 extsz=4096   blocks=0, rtextents=0
mkfs.xfs: Growing the data section failed

fallocate instead of a sparse file?

And on btrfs as well as XFS it appears to try to create a 30T file for
real, i.e. by writing data - I stopped it before it could do too much
harm.

Where did you create that hugish XFS file?

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
* Re: Insane file system overhead on large volume
  2012-01-28 14:55 ` Martin Steigerwald
@ 2012-01-28 15:35   ` Eric Sandeen
  2012-01-28 16:05     ` Christoph Hellwig
                       ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Eric Sandeen @ 2012-01-28 15:35 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: Manny, xfs

On 1/28/12 8:55 AM, Martin Steigerwald wrote:
> On Friday, 27 January 2012, Eric Sandeen wrote:
>> On 1/27/12 1:50 AM, Manny wrote:
>>> Hi there,
>>>
>>> I'm not sure if this is intended behavior, but I was a bit stumped
>>> when I formatted a 30TB volume (12x3TB minus 2x3TB for parity in RAID
>>> 6) with XFS and noticed that there were only 22 TB left. I just
>>> called mkfs.xfs with default parameters - except for swidth and sunit,
>>> which match the RAID setup.
>>>
>>> Is it normal that I lost 8TB just for the file system? That's almost
>>> 30% of the volume. Should I set the block size higher? Or should I
>>> increase the number of allocation groups? Would that make a
>>> difference? What's the preferred method for handling such large
>>> volumes?
>>
>> If it was 12x3TB I imagine you're confusing TB with TiB, so
>> perhaps your 30T is really only 27TiB to start with.
>>
>> Anyway, fs metadata should not eat much space:
>>
>> # mkfs.xfs -dfile,name=fsfile,size=30t
>> # ls -lh fsfile
>> -rw-r--r-- 1 root root 30T Jan 27 12:18 fsfile
>> # mount -o loop fsfile mnt/
>> # df -h mnt
>> Filesystem      Size  Used Avail Use% Mounted on
>> /tmp/fsfile      30T  5.0M   30T   1% /tmp/mnt
>>
>> So Christoph's question was a good one; where are you getting
>> your sizes?

To solve your original problem, can you answer the above question?
Adding your actual raid config output (/proc/mdstat maybe) would help
too.

> An academic question:
>
> Why is it that I get
...
> 33MiB used on first mount instead of 5?

Not sure offhand, differences in xfsprogs version mkfs defaults
perhaps.

...

> Hmmm, but creating the file on ext4 does not work:

ext4 is not designed to handle very large files, so anything
above 16T will fail.

> fallocate instead of a sparse file?

No, you just ran into file offset limits on ext4.

> And on btrfs as well as XFS it appears to try to create a 30T file for
> real, i.e. by writing data - I stopped it before it could do too much
> harm.

Why do you say that it appears to create a 30T file for real? It
should not...

> Where did you create that hugish XFS file?

On XFS. Of course. :)
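The sparse-file behavior Eric alludes to (and that `mkfs.xfs -d file` relies on) can be demonstrated in a few lines; this Python sketch is illustrative and assumes a filesystem with sparse-file support, as tmpfs, XFS, and ext4 all have:

```python
import os
import tempfile

# Create a file with a 1 GiB apparent size without writing any data --
# the same trick mkfs.xfs uses for a -dfile,size=30t target. Only the
# blocks actually written to consume disk space.
fd, path = tempfile.mkstemp()
os.ftruncate(fd, 1 << 30)     # extend to 1 GiB; no data blocks written
st = os.stat(path)
os.close(fd)
os.remove(path)

print(st.st_size)             # 1073741824 -- what ls -l / du --apparent-size see
print(st.st_blocks * 512)     # actual allocation -- what plain du sees, near zero
```

The ext4 failure is unrelated to this trick: ext4 simply cannot represent a file offset anywhere near 30T, so "Growing the data section failed" before any data was written.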
* Re: Insane file system overhead on large volume
  2012-01-28 15:35 ` Eric Sandeen
@ 2012-01-28 16:05   ` Christoph Hellwig
  2012-01-28 16:07   ` Eric Sandeen
  2012-01-28 16:23   ` Martin Steigerwald
  2 siblings, 0 replies; 11+ messages in thread
From: Christoph Hellwig @ 2012-01-28 16:05 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Manny, xfs

Everyone calm down, Manny already replied and mentioned the problem.
* Re: Insane file system overhead on large volume
  2012-01-28 15:35 ` Eric Sandeen
  2012-01-28 16:05   ` Christoph Hellwig
@ 2012-01-28 16:07   ` Eric Sandeen
  2012-01-28 16:23   ` Martin Steigerwald
  2 siblings, 0 replies; 11+ messages in thread
From: Eric Sandeen @ 2012-01-28 16:07 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: Manny, xfs

On 1/28/12 9:35 AM, Eric Sandeen wrote:
> On 1/28/12 8:55 AM, Martin Steigerwald wrote:
>> On Friday, 27 January 2012, Eric Sandeen wrote:
...
>>> So Christoph's question was a good one; where are you getting
>>> your sizes?
>
> To solve your original problem, can you answer the above question?
> Adding your actual raid config output (/proc/mdstat maybe) would help
> too.

Sorry, nevermind. I missed the earlier reply about solving the problem
and confused the responders. Argh.

-Eric
* Re: Insane file system overhead on large volume
  2012-01-28 15:35 ` Eric Sandeen
  2012-01-28 16:05   ` Christoph Hellwig
  2012-01-28 16:07   ` Eric Sandeen
@ 2012-01-28 16:23   ` Martin Steigerwald
  2012-01-29 22:18     ` Dave Chinner
  2 siblings, 1 reply; 11+ messages in thread
From: Martin Steigerwald @ 2012-01-28 16:23 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Manny, xfs

On Saturday, 28 January 2012, Eric Sandeen wrote:
> On 1/28/12 8:55 AM, Martin Steigerwald wrote:
> > On Friday, 27 January 2012, Eric Sandeen wrote:
> >> On 1/27/12 1:50 AM, Manny wrote:
> >>> Hi there,
> >>>
> >>> I'm not sure if this is intended behavior, but I was a bit stumped
> >>> when I formatted a 30TB volume (12x3TB minus 2x3TB for parity in
> >>> RAID 6) with XFS and noticed that there were only 22 TB left. I
> >>> just called mkfs.xfs with default parameters - except for swidth
> >>> and sunit, which match the RAID setup.
> >>>
> >>> Is it normal that I lost 8TB just for the file system? That's
> >>> almost 30% of the volume. Should I set the block size higher? Or
> >>> should I increase the number of allocation groups? Would that make
> >>> a difference? What's the preferred method for handling such large
> >>> volumes?
> >>
> >> If it was 12x3TB I imagine you're confusing TB with TiB, so
> >> perhaps your 30T is really only 27TiB to start with.
> >>
> >> Anyway, fs metadata should not eat much space:
> >>
> >> # mkfs.xfs -dfile,name=fsfile,size=30t
> >> # ls -lh fsfile
> >> -rw-r--r-- 1 root root 30T Jan 27 12:18 fsfile
> >> # mount -o loop fsfile mnt/
> >> # df -h mnt
> >> Filesystem      Size  Used Avail Use% Mounted on
> >> /tmp/fsfile      30T  5.0M   30T   1% /tmp/mnt
> >>
> >> So Christoph's question was a good one; where are you getting
> >> your sizes?
>
> To solve your original problem, can you answer the above question?
> Adding your actual raid config output (/proc/mdstat maybe) would help
> too.

Eric, I wrote

> > An academic question:

to make clear that it was just something I was curious about. I was not
the reporter of the problem anyway. I have no problem, the reporter has
no problem - see his answer - so all is good ;)

With your hint and some thinking / testing through it I was able to
resolve most of my other questions. Thanks.

For the gory details:

> > Why is it that I get
[…]
> > merkaba:/tmp> LANG=C df -hT /mnt/zeit
> > Filesystem     Type  Size  Used Avail Use% Mounted on
> > /dev/loop0     xfs    30T   33M   30T   1% /mnt/zeit
> >
> > 33MiB used on first mount instead of 5?
>
> Not sure offhand, differences in xfsprogs version mkfs defaults
> perhaps.

Okay, that's fine with me. I was just curious. It doesn't matter much.

> > Hmmm, but creating the file on ext4 does not work:
>
> ext4 is not designed to handle very large files, so anything
> above 16T will fail.
>
> > fallocate instead of a sparse file?
>
> No, you just ran into file offset limits on ext4.

Oh, yes. Completely forgot about these ext4 limits. Sorry.

> > And on btrfs as well as XFS it appears to try to create a 30T file
> > for real, i.e. by writing data - I stopped it before it could do too
> > much harm.
>
> Why do you say that it appears to create a 30T file for real? It
> should not...

I jumped to a conclusion too quickly. It did do an I/O storm onto the
Intel SSD 320:

martin@merkaba:~> vmstat -S M 1    (-S M not applied to bi/bo)
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi     bo   in   cs us sy id wa
 0  0   1630   4365     87   1087    0    0   101     53    7   81  5  2 93  0
 1  0   1630   4365     87   1087    0    0     0      0  428  769  1  0 99  0
 2  0   1630   4365     87   1087    0    0     0      0  426  740  1  1 99  0
 0  0   1630   4358     87   1088    0    0     0      0 1165 2297  4  7 89  0
 0  0   1630   4357     87   1088    0    0     0     40 1736 3434  8  6 86  0
 0  0   1630   4357     87   1088    0    0     0      0  614 1121  3  1 96  0
 0  0   1630   4357     87   1088    0    0     0     32  359  636  0  0 100 0
 1  1   1630   3852     87   1585    0    0    13  81540  529 1045  1  7 91  1
 0  3   1630   3398     87   2027    0    0     0 227940 1357 2764  0  9 54 37
 4  3   1630   3225     87   2188    0    0     0 212004 2346 4796  5  6 41 49
 1  3   1630   2992     87   2415    0    0     0 215608 1825 3821  1  6 42 50
 0  2   1630   2820     87   2582    0    0     0 200492 1476 3089  3  6 49 41
 1  1   1630   2569     87   2832    0    0     0 198156 1250 2508  0  6 59 34
 0  2   1630   2386     87   3009    0    0     0 229896 1301 2611  1  6 56 37
 0  2   1630   2266     87   3126    0    0     0 302876 1067 2093  0  5 62 33
 1  3   1630   2266     87   3126    0    0     0 176092  723 1321  0  3 71 26
 0  3   1630   2266     87   3126    0    0     0 163840  706 1351  0  1 74 25
 0  1   1630   2266     87   3126    0    0     0  80104 3137 6228  1  4 69 26
 0  0   1630   2267     87   3126    0    0     0      3 3505 7035  6  3 86  5
 0  0   1630   2266     87   3126    0    0     0      0  631 1203  4  1 95  0
 0  0   1630   2259     87   3127    0    0     0      0  715 1398  4  2 94  0
 2  0   1630   2259     87   3127    0    0     0      0 1501 3087 10  3 86  0
 0  0   1630   2259     87   3127    0    0     0     27  945 1883  5  2 93  0
 0  0   1630   2259     87   3127    0    0     0      0  399  713  1  0 99  0
^C

But then it stopped. Thus it seems mkfs.xfs was just writing metadata,
and I didn't see this in the tmpfs obviously. But when I review it,
creating a 30TB XFS filesystem should involve writing some metadata at
different places of the file.

I get:

merkaba:/mnt/zeit> LANG=C xfs_bmap fsfile
fsfile:
 0: [0..255]: 96..351
 1: [256..2147483639]: hole
 2: [2147483640..2147483671]: 3400032..3400063
 3: [2147483672..4294967279]: hole
 4: [4294967280..4294967311]: 3400064..3400095
 5: [4294967312..6442450919]: hole
 6: [6442450920..6442450951]: 3400096..3400127
 7: [6442450952..8589934559]: hole
 8: [8589934560..8589934591]: 3400128..3400159
 9: [8589934592..10737418199]: hole
 10: [10737418200..10737418231]: 3400160..3400191
 11: [10737418232..12884901839]: hole
 12: [12884901840..12884901871]: 3400192..3400223
 13: [12884901872..15032385479]: hole
 14: [15032385480..15032385511]: 3400224..3400255
 15: [15032385512..17179869119]: hole
 16: [17179869120..17179869151]: 3400256..3400287
 17: [17179869152..19327352759]: hole
 18: [19327352760..19327352791]: 3400296..3400327
 19: [19327352792..21474836399]: hole
 20: [21474836400..21474836431]: 3400328..3400359
 21: [21474836432..23622320039]: hole
 22: [23622320040..23622320071]: 3400360..3400391
 23: [23622320072..25769803679]: hole
 24: [25769803680..25769803711]: 3400392..3400423
 25: [25769803712..27917287319]: hole
 26: [27917287320..27917287351]: 3400424..3400455
 27: [27917287352..30064770959]: hole
 28: [30064770960..30064770991]: 3400456..3400487
 29: [30064770992..32212254599]: hole
 30: [32212254600..32212254631]: 3400488..3400519
 31: [32212254632..32215654311]: 352..3400031
 32: [32215654312..32216428455]: 3400520..4174663
 33: [32216428456..34359738239]: hole
 34: [34359738240..34359738271]: 4174664..4174695
 35: [34359738272..36507221879]: hole
 36: [36507221880..36507221911]: 4174696..4174727
 37: [36507221912..38654705519]: hole
 38: [38654705520..38654705551]: 4174728..4174759
 39: [38654705552..40802189159]: hole
 40: [40802189160..40802189191]: 4174760..4174791
 41: [40802189192..42949672799]: hole
 42: [42949672800..42949672831]: 4174792..4174823
 43: [42949672832..45097156439]: hole
 44: [45097156440..45097156471]: 4174824..4174855
 45: [45097156472..47244640079]: hole
 46: [47244640080..47244640111]: 4174856..4174887
 47: [47244640112..49392123719]: hole
 48: [49392123720..49392123751]: 4174888..4174919
 49: [49392123752..51539607359]: hole
 50: [51539607360..51539607391]: 4174920..4174951
 51: [51539607392..53687090999]: hole
 52: [53687091000..53687091031]: 4174952..4174983
 53: [53687091032..55834574639]: hole
 54: [55834574640..55834574671]: 4174984..4175015
 55: [55834574672..57982058279]: hole
 56: [57982058280..57982058311]: 4175016..4175047
 57: [57982058312..60129541919]: hole
 58: [60129541920..60129541951]: 4175048..4175079
 59: [60129541952..62277025559]: hole
 60: [62277025560..62277025591]: 4175080..4175111
 61: [62277025592..64424509191]: hole
 62: [64424509192..64424509199]: 4175112..4175119

Okay, it needed to write 2 GB:

merkaba:/mnt/zeit> du -h fsfile
2.0G    fsfile
merkaba:/mnt/zeit> du --apparent-size -h fsfile
30T     fsfile

I didn't expect mkfs.xfs to write 2 GB, but when thinking through it
for a 30 TB filesystem I find this reasonable.

Still it has 33 MiB for metadata:

merkaba:/mnt/zeit> mkdir bigfilefs
merkaba:/mnt/zeit> mount -o loop fsfile bigfilefs
merkaba:/mnt/zeit> LANG=C df -hT bigfilefs
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/loop0     xfs    30T   33M   30T   1% /mnt/zeit/bigfilefs

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
* Re: Insane file system overhead on large volume
  2012-01-28 16:23 ` Martin Steigerwald
@ 2012-01-29 22:18   ` Dave Chinner
  0 siblings, 0 replies; 11+ messages in thread
From: Dave Chinner @ 2012-01-29 22:18 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: Manny, Eric Sandeen, xfs

On Sat, Jan 28, 2012 at 05:23:42PM +0100, Martin Steigerwald wrote:
> On Saturday, 28 January 2012, Eric Sandeen wrote:
> > On 1/28/12 8:55 AM, Martin Steigerwald wrote:
> For the gory details:
>
> > > Why is it that I get
> […]
> > > merkaba:/tmp> LANG=C df -hT /mnt/zeit
> > > Filesystem     Type  Size  Used Avail Use% Mounted on
> > > /dev/loop0     xfs    30T   33M   30T   1% /mnt/zeit
> > >
> > > 33MiB used on first mount instead of 5?
> >
> > Not sure offhand, differences in xfsprogs version mkfs defaults
> > perhaps.
>
> Okay, that's fine with me. I was just curious. It doesn't matter much.

More likely the kernel. Older kernels only use 1024 blocks for the
reserve block pool, while more recent ones use 8192 blocks.

$ gl -n 1 8babd8a
commit 8babd8a2e75cccff3167a61176c2a3e977e13799
Author: Dave Chinner <david@fromorbit.com>
Date:   Thu Mar 4 01:46:25 2010 +0000

    xfs: Increase the default size of the reserved blocks pool

    The current default size of the reserved blocks pool is easy to
    deplete with certain workloads, in particular workloads that do
    lots of concurrent delayed allocation extent conversions. If
    enough transactions are running in parallel and the entire pool
    is consumed then subsequent calls to xfs_trans_reserve() will
    fail with ENOSPC. Also add a rate limited warning so we know if
    this starts happening again.

    This is an updated version of an old patch from Lachlan McIlroy.

    Signed-off-by: Dave Chinner <david@fromorbit.com>
    Signed-off-by: Alex Elder <aelder@sgi.com>

> But when I review it, creating a 30TB XFS filesystem should involve
> writing some metadata at different places of the file.
>
> I get:
>
> merkaba:/mnt/zeit> LANG=C xfs_bmap fsfile
> fsfile:
>  0: [0..255]: 96..351
>  1: [256..2147483639]: hole
>  2: [2147483640..2147483671]: 3400032..3400063
>  3: [2147483672..4294967279]: hole
>  4: [4294967280..4294967311]: 3400064..3400095
>  5: [4294967312..6442450919]: hole
>  6: [6442450920..6442450951]: 3400096..3400127
>  7: [6442450952..8589934559]: hole
.....

Yeah, that's all the AG headers.

> Okay, it needed to write 2 GB:
>
> merkaba:/mnt/zeit> du -h fsfile
> 2.0G    fsfile
> merkaba:/mnt/zeit> du --apparent-size -h fsfile
> 30T     fsfile
>
> I didn't expect mkfs.xfs to write 2 GB, but when thinking through it
> for a 30 TB filesystem I find this reasonable.

It zeroed the log, which will be just under 2GB in size for a
filesystem that large. Zeroing the log accounts for >99% of the IO
that mkfs does for most normal cases.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
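The figures quoted in the thread line up with Dave's explanation. A sketch of the arithmetic, using only numbers from the mkfs.xfs output above (the reserve-pool comparison is an inference from Dave's commit, not stated elsewhere in the thread):

```python
# Figures taken from the mkfs.xfs output quoted earlier in the thread.
agsize_blocks = 268435455        # agsize, in 4 KiB filesystem blocks
log_blocks    = 521728           # internal log size, in 4 KiB blocks
block_size    = 4096
sector_size   = 512

# xfs_bmap reports 512-byte sectors, so AG headers should recur every
# agsize * 8 sectors -- matching the extent at offset 2147483640 etc.
ag_stride_sectors = agsize_blocks * (block_size // sector_size)
print(ag_stride_sectors)                 # 2147483640

# The internal log is just under 2 GiB: the bulk of the ~2 GB mkfs wrote.
print(log_blocks * block_size / 2**30)   # ~1.99

# Dave's commit raised the reserve pool from 1024 to 8192 blocks:
# 4 MiB vs 32 MiB, roughly the gap between Eric's 5.0M and Martin's 33M.
print(1024 * block_size // 2**20, 8192 * block_size // 2**20)  # 4 32
```

The regular ~1 TiB stride between the small allocated extents in the xfs_bmap listing is exactly one allocation group, which is why agcount=30 for a 30T filesystem.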
* Re: Insane file system overhead on large volume
  2012-01-27  7:50 Insane file system overhead on large volume Manny
  2012-01-27 10:44 ` Christoph Hellwig
  2012-01-27 18:21 ` Eric Sandeen
@ 2012-01-27 19:08 ` Stan Hoeppner
  2 siblings, 0 replies; 11+ messages in thread
From: Stan Hoeppner @ 2012-01-27 19:08 UTC (permalink / raw)
  To: xfs

On 1/27/2012 1:50 AM, Manny wrote:
> Hi there,
>
> I'm not sure if this is intended behavior, but I was a bit stumped
> when I formatted a 30TB volume (12x3TB minus 2x3TB for parity in RAID
> 6) with XFS and noticed that there were only 22 TB left. I just called
> mkfs.xfs with default parameters - except for swidth and sunit, which
> match the RAID setup.
>
> Is it normal that I lost 8TB just for the file system? That's almost
> 30% of the volume. Should I set the block size higher? Or should I
> increase the number of allocation groups? Would that make a
> difference? What's the preferred method for handling such large
> volumes?

Maybe you simply assigned 2 spares and forgot, so you actually only
have 10 RAID6 disks with 8 disks' worth of stripe, equaling 24 TB, or
21.8 TiB. 21.8 TiB matches up pretty closely with your 22 TB, so this
scenario seems pretty plausible, dare I say likely.

If this is the case you'll want to reformat the 10 disk RAID6 with the
proper sunit/swidth values.

-- 
Stan
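Stan's scenario is easy to sanity-check numerically. In this sketch the disk counts are his hypothesis, not confirmed controller output:

```python
TB, TiB = 10**12, 2**40
disk_bytes = 3 * TB

# Hypothetical: 2 of the 12 disks are hot spares, leaving a 10-disk
# RAID6 with 8 data disks' worth of stripe.
data_disks = 8
usable = data_disks * disk_bytes
print(usable / TB)            # 24.0 TB decimal
print(round(usable / TiB, 1)) # 21.8 TiB -- close to the "22 TB" Manny saw
```

As the thread later revealed, the real cause was a 6TB snap pool reserved by the RAID controller (24TB usable), but the two scenarios produce nearly indistinguishable df output, which is why the guess was plausible.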
end of thread, other threads:[~2012-01-29 22:18 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-01-27  7:50 Insane file system overhead on large volume Manny
2012-01-27 10:44 ` Christoph Hellwig
2012-01-27 19:15   ` Manny
2012-01-27 18:21 ` Eric Sandeen
2012-01-28 14:55   ` Martin Steigerwald
2012-01-28 15:35     ` Eric Sandeen
2012-01-28 16:05       ` Christoph Hellwig
2012-01-28 16:07       ` Eric Sandeen
2012-01-28 16:23       ` Martin Steigerwald
2012-01-29 22:18         ` Dave Chinner
2012-01-27 19:08 ` Stan Hoeppner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox