* Re: block sizes > 4K ?? possible w/large page support?
2012-06-11 3:21 block sizes > 4K ?? possible w/large page support? Linda A. Walsh
@ 2012-06-11 13:13 ` Stan Hoeppner
2012-06-11 13:29 ` Carlos Maiolino
` (2 subsequent siblings)
3 siblings, 0 replies; 11+ messages in thread
From: Stan Hoeppner @ 2012-06-11 13:13 UTC (permalink / raw)
To: Linda A. Walsh; +Cc: xfs-oss
On 6/10/2012 10:21 PM, Linda A. Walsh wrote:
> Is this something being thought about??
Probably not much these days, but I'm sure it's been debated much over
the years amongst many filesystem and kernel developers across all
operating system teams, including Linux, *nix, VMS, MVS, and Windows.
...
> All but 2 could benefit from a 16K block size, and 3 of them could benefit
> from a 128K block size. Wouldn't that benefit in freeing up some space
> both on disk and in memory? Just a thought.
If you could increase the page size and thus the XFS block size to some
arbitrarily high number such as 64KB, it would do nothing for the on-disk
layout but increase wasted disk sectors. It would increase transfer
performance on some workloads, but it would also cause a myriad of
problems, not least of which is the need to recode, debug, and
regression test the entire x86[64] kernel to properly use 64KB
pages, which I assume is no small task.
Everything is a tradeoff, Linda. At this point, 4KB appears to be the
best tradeoff. And again, even if it were increased, it wouldn't do
anything to benefit the case you mention, but would actually hurt
it, because you'd end up with more wasted sectors at the end of each file.
Array controllers and disks have no awareness of the FS block size.
They simply swallow or sling 512B sectors from/to the block layer. It's
the block layer that can benefit from being fed larger FS blocks, as it
can schedule transfers more efficiently, for instance by giving the
elevator the chance to order sector accesses and minimize head seeks.
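A toy illustration of the elevator point, as a minimal sketch only: it compares total head travel when requests are serviced in arrival order versus sorted by starting sector. The request queue and the `head_travel` helper are made up for illustration; real I/O schedulers also merge requests, enforce deadlines, and so on.

```python
def head_travel(start, sectors):
    """Total distance the head moves when visiting sectors in the given order."""
    travel, pos = 0, start
    for sector in sectors:
        travel += abs(sector - pos)
        pos = sector
    return travel


# Hypothetical request queue (starting sector numbers), in arrival order.
requests = [9000, 100, 8500, 300, 7000, 200]

fifo = head_travel(0, requests)                  # serviced as they arrived
sorted_order = head_travel(0, sorted(requests))  # one sweep across the disk

print(f"arrival order: {fifo} sectors of travel")
print(f"sorted order:  {sorted_order} sectors of travel")
```

With this made-up queue, the sorted sweep covers far less distance than the arrival order, which is the whole point of letting the block layer reorder.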
--
Stan
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: block sizes > 4K ?? possible w/large page support?
2012-06-11 3:21 block sizes > 4K ?? possible w/large page support? Linda A. Walsh
2012-06-11 13:13 ` Stan Hoeppner
@ 2012-06-11 13:29 ` Carlos Maiolino
2012-06-11 14:54 ` Stan Hoeppner
2012-06-12 0:08 ` Dave Chinner
2012-06-12 2:32 ` Eric Sandeen
3 siblings, 1 reply; 11+ messages in thread
From: Carlos Maiolino @ 2012-06-11 13:29 UTC (permalink / raw)
To: xfs
Hi,
> Is this something being thought about??
>
> More than one of my hard disks:
>
> /boot: 130 files in 103112 4K blocks: 793.6 blks/file
> /tmp: 1401 files in 746715 4K blocks: 533.4 blks/file
> /var/cache: 1438 files in 87858 4K blocks: 61.5 blks/file
> /backups: 713 files in 2523985177 4K blocks: 3539951.6 blks/file
> /var: 9038 files in 746715 4K blocks: 83.1 blks/file
> /var/cache/squid: 570 files in 90031 4K blocks: 158.4 blks/file
> /Media: 51893 files in 1691400956 4K blocks: 32594.5 blks/file
> /: 37312 files in 506778 4K blocks: 14.0 blks/file
> /usr/share: 320805 files in 195425485 4K blocks: 609.6 blks/file
> /backups/Media: 50544 files in 1642550112 4K blocks: 32497.9 blks/file
> /usr: 116650 files in 1389380 4K blocks: 12.4 blks/file
> /Share: 1617995 files in 305269701 4K blocks: 189.1 blks/file
> /home: 5822174 files in 195412389 4K blocks: 34.0 blks/file
>
> All but 2 could benefit from a 16K block size, and 3 of them could benefit
> from a 128K block size. Wouldn't that benefit in freeing up some space
> both on disk and in memory? Just a thought.
The maximum block size of an XFS filesystem is 64KiB. But in Linux it's limited
to the PAGE_SIZE value. So, on x86 architectures, the maximum block size is
4KiB.
Although it could benefit from a 16KiB page size, you'll need to be running an
operating system which supports this page size value.
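As a rough sketch of that constraint (assuming, as described in this thread, that the usable block size is capped by both the 64KiB on-disk limit and the kernel's page size), something like the following reports what a given machine could mount. The helper name `max_usable_block_size` is made up for illustration.

```python
import os

XFS_MAX_BLOCK_SIZE = 64 * 1024  # on-disk format limit mentioned above


def max_usable_block_size(page_size):
    """Largest block size usable, assuming block size <= PAGE_SIZE."""
    return min(XFS_MAX_BLOCK_SIZE, page_size)


# os.sysconf is available on Unix-like systems; 4096 is typical on x86.
page_size = os.sysconf("SC_PAGE_SIZE")
print(f"PAGE_SIZE = {page_size}, "
      f"max usable block size = {max_usable_block_size(page_size)}")
```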
--
--Carlos
* Re: block sizes > 4K ?? possible w/large page support?
2012-06-11 13:29 ` Carlos Maiolino
@ 2012-06-11 14:54 ` Stan Hoeppner
2012-06-11 16:21 ` Carlos Maiolino
2012-06-11 23:56 ` Dave Chinner
0 siblings, 2 replies; 11+ messages in thread
From: Stan Hoeppner @ 2012-06-11 14:54 UTC (permalink / raw)
To: xfs
On 6/11/2012 8:29 AM, Carlos Maiolino wrote:
> The maximum block size of an XFS filesystem is 64KiB. But in Linux it's limited
> to the PAGE_SIZE value.
Correct.
> So, on x86 architectures, the maximum block size is
> 4KiB.
Not entirely correct. Since ~1996, 16 years ago, PPro and higher 32bit
CPUs with PSE/PSE36 support pages of 4MB, or 2MB with PAE enabled.
x86-64 CPUs in long mode also support a 2MB page size. But the problem
of internal fragmentation may outweigh the TLB and other benefits of
these very large pages. I'm not an MM dev so I can't elaborate further.
There may be other issues.
> Although it could benefit from a 16KiB page size, you'll need to be running an
> operating system which supports this page size value.
And AFAIK the kernel MM team doesn't have x86 2MB pages on their radar.
Or do they?
--
Stan
* Re: block sizes > 4K ?? possible w/large page support?
2012-06-11 14:54 ` Stan Hoeppner
@ 2012-06-11 16:21 ` Carlos Maiolino
2012-06-11 21:25 ` Stefan Ring
2012-06-11 23:56 ` Dave Chinner
1 sibling, 1 reply; 11+ messages in thread
From: Carlos Maiolino @ 2012-06-11 16:21 UTC (permalink / raw)
To: xfs
> > The maximum block size of an XFS filesystem is 64KiB. But in Linux it's limited
> > to the PAGE_SIZE value.
>
> Correct.
>
> > So, on x86 architectures, the maximum block size is
> > 4KiB.
>
> Not entirely correct. Since ~1996, 16 years ago, PPro and higher 32bit
> CPUs with PSE/PSE36 support pages of 4MB, or 2MB with PAE enabled.
>
I know we can use hugepages with these sizes, but I didn't know they could be used
in the general case. I looked through the MM code and didn't find anything that would
make PAGE_SIZE greater than 4096 (at least on x86), but then, I'm not an MM
developer either.
> x86-64 CPUs in long mode also support a 2MB page size. But the problem
> of internal fragmentation may outweigh the TLB and other benefits of
> these very large pages. I'm not an MM dev so I can't elaborate further.
> There may be other issues.
>
> > Although it could benefit from a 16KiB page size, you'll need to be running an
> > operating system which supports this page size value.
>
> And AFAIK the kernel MM team doesn't have x86 2MB pages on their radar.
> Or do they?
No clue, I'm not an MM developer either =/ Maybe I can be someday :D
--
--Carlos
* Re: block sizes > 4K ?? possible w/large page support?
2012-06-11 16:21 ` Carlos Maiolino
@ 2012-06-11 21:25 ` Stefan Ring
0 siblings, 0 replies; 11+ messages in thread
From: Stefan Ring @ 2012-06-11 21:25 UTC (permalink / raw)
To: xfs
> I know we can use hugepages with these sizes, but I didn't know they could be used
> in the general case. I looked through the MM code and didn't find anything that would
> make PAGE_SIZE greater than 4096 (at least on x86), but then, I'm not an MM
> developer either.
Me neither, but this much I think I know. With rather recent kernels,
huge pages are used automatically for "normal" memory. There's a
background thread that tries to locate or create adjacent blocks of
memory and create huge pages from these.
But the normal page size as a unit of allocation is still 4KB, and
this will never change (with the current hardware). The page size of
4KB is hard-wired into the architecture, and I'm not aware of any
extension that allows changing that.
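For anyone who wants to check whether the automatic huge page behaviour is active on a given box, here is a minimal sketch. It assumes a mainline kernel that exposes `/sys/kernel/mm/transparent_hugepage/enabled` (some distribution kernels have used a different path), and the `active_thp_mode` helper is just an illustrative name. The file lists the possible modes with the active one in square brackets.

```python
from pathlib import Path

# Assumed sysfs location on mainline kernels; may differ on vendor kernels.
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(text):
    """Return the bracketed choice from a line like 'always [madvise] never'."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


if THP_ENABLED.exists():
    print("transparent huge pages:", active_thp_mode(THP_ENABLED.read_text()))
else:
    print("transparent huge pages not exposed on this system")
```

Note that this only concerns how anonymous memory is backed; PAGE_SIZE itself, and hence the block size limit discussed in this thread, stays at 4KB.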
* Re: block sizes > 4K ?? possible w/large page support?
2012-06-11 14:54 ` Stan Hoeppner
2012-06-11 16:21 ` Carlos Maiolino
@ 2012-06-11 23:56 ` Dave Chinner
1 sibling, 0 replies; 11+ messages in thread
From: Dave Chinner @ 2012-06-11 23:56 UTC (permalink / raw)
To: Stan Hoeppner; +Cc: xfs
On Mon, Jun 11, 2012 at 09:54:57AM -0500, Stan Hoeppner wrote:
> On 6/11/2012 8:29 AM, Carlos Maiolino wrote:
>
> > The maximum block size of an XFS filesystem is 64KiB. But in Linux it's limited
> > to the PAGE_SIZE value.
>
> Correct.
>
> > So, on x86 architectures, the maximum block size is
> > 4KiB.
>
> Not entirely correct. Since ~1996, 16 years ago, PPro and higher 32bit
> CPUs with PSE/PSE36 support pages of 4MB, or 2MB with PAE enabled.
>
> x86-64 CPUs in long mode also support a 2MB page size. But the problem
> of internal fragmentation may outweigh the TLB and other benefits of
> these very large pages. I'm not an MM dev so I can't elaborate further.
> There may be other issues.
>
> > Although it could benefit from a 16KiB page size, you'll need to be running an
> > operating system which supports this page size value.
>
> And AFAIK the kernel MM team doesn't have x86 2MB pages on their radar.
> Or do they?
It's been supported for a few years now in one form or another.
Google for "transparent huge pages".....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: block sizes > 4K ?? possible w/large page support?
2012-06-11 3:21 block sizes > 4K ?? possible w/large page support? Linda A. Walsh
2012-06-11 13:13 ` Stan Hoeppner
2012-06-11 13:29 ` Carlos Maiolino
@ 2012-06-12 0:08 ` Dave Chinner
2012-06-12 2:32 ` Eric Sandeen
3 siblings, 0 replies; 11+ messages in thread
From: Dave Chinner @ 2012-06-12 0:08 UTC (permalink / raw)
To: Linda A. Walsh; +Cc: xfs-oss
On Sun, Jun 10, 2012 at 08:21:35PM -0700, Linda A. Walsh wrote:
> Is this something being thought about??
>
> More than one of my hard disks:
>
> /boot: 130 files in 103112 4K blocks: 793.6 blks/file
> /tmp: 1401 files in 746715 4K blocks: 533.4 blks/file
> /var/cache: 1438 files in 87858 4K blocks: 61.5 blks/file
> /backups: 713 files in 2523985177 4K blocks: 3539951.6 blks/file
> /var: 9038 files in 746715 4K blocks: 83.1 blks/file
> /var/cache/squid: 570 files in 90031 4K blocks: 158.4 blks/file
> /Media: 51893 files in 1691400956 4K blocks: 32594.5 blks/file
> /: 37312 files in 506778 4K blocks: 14.0 blks/file
> /usr/share: 320805 files in 195425485 4K blocks: 609.6 blks/file
> /backups/Media: 50544 files in 1642550112 4K blocks: 32497.9 blks/file
> /usr: 116650 files in 1389380 4K blocks: 12.4 blks/file
> /Share: 1617995 files in 305269701 4K blocks: 189.1 blks/file
> /home: 5822174 files in 195412389 4K blocks: 34.0 blks/file
>
> All but 2 could benefit from a 16K block size, and 3 of them could benefit
> from a 128K block size.
Block size has nothing to do with how efficiently space is indexed
on disk. Remember - XFS uses extents to track used and free space,
and all of the above average blocks per file fit within a couple of
extents on a 4k block size filesystem.
e.g. with 4k block sizes the maximum extent size is 8GB, so an
inode with inline extents (up to 8, I think) can be up to 64GB. A
single extent-format block can reference roughly 4096/16 = 256
extents, so file sizes of up to 2TB can be referenced with a single
IO to read the extent list. If we go to 2 IOs, then it's roughly
65000 extents that can be referenced, so once again for most people
this is more than sufficient.
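The arithmetic above can be reproduced with a short sketch. The constants are assumptions taken from the figures in this message: a 21-bit extent length field (which is what gives 8GB with 4k blocks), 16 bytes per extent record, and 8 inline extents. Block headers are ignored, so these are upper bounds.

```python
BLOCK_SIZE = 4096            # bytes per filesystem block
MAX_EXTENT_BLOCKS = 2 ** 21  # assumed: 21-bit length field per extent record
EXTENT_RECORD_BYTES = 16     # from the "4096/16" above
INLINE_EXTENTS = 8           # "up to 8, I think"

GiB = 2 ** 30

max_extent = BLOCK_SIZE * MAX_EXTENT_BLOCKS            # bytes in one extent
inline_reach = INLINE_EXTENTS * max_extent             # inode with inline extents
extents_per_block = BLOCK_SIZE // EXTENT_RECORD_BYTES  # records in one block
one_io_reach = extents_per_block * max_extent          # one extent block read
two_io_extents = extents_per_block ** 2                # two levels of blocks

print(max_extent // GiB)     # 8    (GiB per extent)
print(inline_reach // GiB)   # 64   (GiB with inline extents)
print(extents_per_block)     # 256
print(one_io_reach // GiB)   # 2048 (GiB, i.e. 2 TiB)
print(two_io_extents)        # 65536
```

Changing BLOCK_SIZE shows how both the maximum extent size and the number of extent records per block scale with the block size.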
Freespace and other metadata btrees will become shallower with
larger block sizes. However, the ones that really matter are the
directory trees, and they can already be made with block sizes up to
64k in size even on a 4k block size filesystem....
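A hedged sketch of how that combination might be requested at mkfs time: the helper below only builds an argument list using mkfs.xfs's `-b size=` and `-n size=` options and does not run anything. The helper name and the device path `/dev/sdX1` are placeholders, and the validation reflects my understanding of the limits (power of two, at least the filesystem block size, at most 64k); check the mkfs.xfs man page for your version.

```python
def mkfs_xfs_args(device, block_size=4096, dir_block_size=65536):
    """Build (but do not run) an mkfs.xfs command line."""
    if dir_block_size & (dir_block_size - 1):
        raise ValueError("directory block size must be a power of two")
    if not block_size <= dir_block_size <= 65536:
        raise ValueError(
            "directory block size must be between the block size and 64k")
    return ["mkfs.xfs",
            "-b", f"size={block_size}",
            "-n", f"size={dir_block_size}",
            device]


# /dev/sdX1 is a placeholder, not a real device.
print(" ".join(mkfs_xfs_args("/dev/sdX1")))
```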
> Wouldn't that benefit in freeing up some space
> both on disk and in memory? Just a thought.
No, it won't make much difference at all unless you regularly create
sparse files with hundreds of thousands of extents or multi-terabyte
sized files...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: block sizes > 4K ?? possible w/large page support?
2012-06-11 3:21 block sizes > 4K ?? possible w/large page support? Linda A. Walsh
` (2 preceding siblings ...)
2012-06-12 0:08 ` Dave Chinner
@ 2012-06-12 2:32 ` Eric Sandeen
2012-06-12 17:37 ` Linda A. Walsh
3 siblings, 1 reply; 11+ messages in thread
From: Eric Sandeen @ 2012-06-12 2:32 UTC (permalink / raw)
To: Linda A. Walsh; +Cc: xfs-oss
On 6/10/12 10:21 PM, Linda A. Walsh wrote:
> Is this something being thought about??
>
> More than one of my hard disks:
>
> /boot: 130 files in 103112 4K blocks: 793.6 blks/file
> /tmp: 1401 files in 746715 4K blocks: 533.4 blks/file
> /var/cache: 1438 files in 87858 4K blocks: 61.5 blks/file
> /backups: 713 files in 2523985177 4K blocks: 3539951.6 blks/file
> /var: 9038 files in 746715 4K blocks: 83.1 blks/file
> /var/cache/squid: 570 files in 90031 4K blocks: 158.4 blks/file
> /Media: 51893 files in 1691400956 4K blocks: 32594.5 blks/file
> /: 37312 files in 506778 4K blocks: 14.0 blks/file
> /usr/share: 320805 files in 195425485 4K blocks: 609.6 blks/file
> /backups/Media: 50544 files in 1642550112 4K blocks: 32497.9 blks/file
> /usr: 116650 files in 1389380 4K blocks: 12.4 blks/file
> /Share: 1617995 files in 305269701 4K blocks: 189.1 blks/file
> /home: 5822174 files in 195412389 4K blocks: 34.0 blks/file
>
> All but 2 could benefit from a 16K block size, and 3 of them could benefit
> from a 128K block size. Wouldn't that benefit in freeing up some space
> both on disk and in memory? Just a thought.
Since on average each file in an evenly-distributed filesystem wastes half
a block, in theory each fs would waste 4x more space w/ 16k blocks than
4k blocks, right?
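A back-of-the-envelope sketch of that estimate, assuming the half-a-block-per-file average holds. The file counts are taken from the quoted listing; the `expected_slack` helper is just an illustrative name, and real numbers depend on the actual file size distribution.

```python
def expected_slack(files, block_size):
    """Expected tail waste in bytes, assuming half a block wasted per file."""
    return files * block_size // 2


# File counts from the listing quoted above.
file_counts = {"/home": 5822174, "/Share": 1617995, "/backups": 713}

for fs, files in file_counts.items():
    for block_size in (1024, 4096, 16384):
        mib = expected_slack(files, block_size) / 2 ** 20
        print(f"{fs:10s} {block_size:6d}-byte blocks: ~{mib:10.1f} MiB slack")
```

The waste scales linearly with block size, so it matters for filesystems with millions of small files and is negligible for a few hundred large ones.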
-Eric
* Re: block sizes > 4K ?? possible w/large page support?
2012-06-12 2:32 ` Eric Sandeen
@ 2012-06-12 17:37 ` Linda A. Walsh
2012-06-12 18:55 ` Eric Sandeen
0 siblings, 1 reply; 11+ messages in thread
From: Linda A. Walsh @ 2012-06-12 17:37 UTC (permalink / raw)
To: xfs-oss
Eric Sandeen wrote:
> On 6/10/12 10:21 PM, Linda A. Walsh wrote:
>> Is this something being thought about??
>>
>> More than one of my hard disks:
>>
>> /boot: 130 files in 103112 4K blocks: 793.6 blks/file
>> /tmp: 1401 files in 746715 4K blocks: 533.4 blks/file
>> /var/cache: 1438 files in 87858 4K blocks: 61.5 blks/file
>> /backups: 713 files in 2523985177 4K blocks: 3539951.6 blks/file
>> /var: 9038 files in 746715 4K blocks: 83.1 blks/file
>> /var/cache/squid: 570 files in 90031 4K blocks: 158.4 blks/file
>> /Media: 51893 files in 1691400956 4K blocks: 32594.5 blks/file
>> /: 37312 files in 506778 4K blocks: 14.0 blks/file
>> /usr/share: 320805 files in 195425485 4K blocks: 609.6 blks/file
>> /backups/Media: 50544 files in 1642550112 4K blocks: 32497.9 blks/file
>> /usr: 116650 files in 1389380 4K blocks: 12.4 blks/file
>> /Share: 1617995 files in 305269701 4K blocks: 189.1 blks/file
>> /home: 5822174 files in 195412389 4K blocks: 34.0 blks/file
>>
>> All but 2 could benefit from a 16K block size, and 3 of them could benefit
>> from a 128K block size. Wouldn't that benefit in freeing up some space
>> both on disk and in memory? Just a thought.
>
> Since on average each file in an evenly-distributed filesystem wastes half
> a block, in theory each fs would waste 4x more space w/ 16k blocks than
> 4k blocks, right?
---
Well the real candidates for a larger block size would be backups,
and maybe Media... the rest wouldn't benefit.
So, it sounds like I might just as well benefit by going to a 1K
block size, if there's no cost in smaller block sizes? Or would that be
entirely dependent on the files/dir?
Those blks/file are 4k-blks/file if there was any doubt...
* Re: block sizes > 4K ?? possible w/large page support?
2012-06-12 17:37 ` Linda A. Walsh
@ 2012-06-12 18:55 ` Eric Sandeen
0 siblings, 0 replies; 11+ messages in thread
From: Eric Sandeen @ 2012-06-12 18:55 UTC (permalink / raw)
To: Linda A. Walsh; +Cc: xfs-oss
On 6/12/12 12:37 PM, Linda A. Walsh wrote:
>
>
> Eric Sandeen wrote:
>> On 6/10/12 10:21 PM, Linda A. Walsh wrote:
>>> Is this something being thought about??
>>>
>>> More than one of my hard disks:
>>>
>>> /boot: 130 files in 103112 4K blocks: 793.6 blks/file
>>> /tmp: 1401 files in 746715 4K blocks: 533.4 blks/file
>>> /var/cache: 1438 files in 87858 4K blocks: 61.5 blks/file
>>> /backups: 713 files in 2523985177 4K blocks: 3539951.6 blks/file
>>> /var: 9038 files in 746715 4K blocks: 83.1 blks/file
>>> /var/cache/squid: 570 files in 90031 4K blocks: 158.4 blks/file
>>> /Media: 51893 files in 1691400956 4K blocks: 32594.5 blks/file
>>> /: 37312 files in 506778 4K blocks: 14.0 blks/file
>>> /usr/share: 320805 files in 195425485 4K blocks: 609.6 blks/file
>>> /backups/Media: 50544 files in 1642550112 4K blocks: 32497.9 blks/file
>>> /usr: 116650 files in 1389380 4K blocks: 12.4 blks/file
>>> /Share: 1617995 files in 305269701 4K blocks: 189.1 blks/file
>>> /home: 5822174 files in 195412389 4K blocks: 34.0 blks/file
>>>
>>> All but 2 could benefit from a 16K block size, and 3 of them could benefit
>>> from a 128K block size. Wouldn't that benefit in freeing up some space
>>> both on disk and in memory? Just a thought.
>>
>> Since on average each file in an evenly-distributed filesystem wastes half
>> a block, in theory each fs would waste 4x more space w/ 16k blocks than
>> 4k blocks, right?
> ---
> Well the real candidates for a larger block size would be backups,
> and maybe Media... the rest wouldn't benefit.
>
> So, it sounds like I might just as well benefit by going to a 1K
> block size, if there's no cost in smaller block sizes? Or would that be
> entirely dependent on the files/dir?
Well, there are some metadata overhead costs there, so it's a tradeoff.
Like we always say, use the defaults unless you can definitively show
that other options work better for your needs after testing. :)
-Eric