* 13GB dcache+inode cache hash tables
From: Daniel J Blueman @ 2013-06-25  8:56 UTC
  To: Linux Kernel; +Cc: Ingo Molnar, Steffen Persvold

As memory capacity increases, we see the dentry and inode cache hash
tables grow to wild sizes [1], e.g. 13GB is consumed on a 4.5TB system.

Perhaps a better approach would add a linear component to an exponent to
give tuned scaling, given that spatial locality is an advantage in hash
tables and that resources should be used carefully.

The same approach would suit other hash tables (mount-cache, TCP
established, TCP bind, UDP, UDP-Lite, Dquot-cache) with different
coefficients, so perhaps we could generalise.

If so, what are reasonable reference points and assumptions?
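
To make the question concrete, here is a minimal sketch of one possible
reading of "a linear component to an exponent": the log2 of the entry count
grows linearly with log2 of the memory size, with a slope below 1. The
function name, the slope of 0.5 and the rounding rule are all assumptions
made for illustration, not something proposed in this thread; the reference
point (131072 dentry entries at 1GB) is taken from [1].

```python
import math


def scaled_entries(mem_bytes, ref_mem_bytes=1 << 30, ref_entries=131072, slope=0.5):
    """Hypothetical sizing rule (an assumption, not kernel code).

    log2(entries) = log2(ref_entries) + slope * (log2(mem) - log2(ref_mem))
    slope == 1.0 gives linear growth with memory; slope < 1.0 grows more slowly.
    The result is rounded to the nearest power of two.
    """
    exponent = math.log2(ref_entries) + slope * (
        math.log2(mem_bytes) - math.log2(ref_mem_bytes)
    )
    return 1 << round(exponent)


TIB = 1 << 40
for label, mem in (("1TB", TIB), ("4.5TB", 9 * TIB // 2)):
    linear = scaled_entries(mem, slope=1.0)
    damped = scaled_entries(mem, slope=0.5)
    print(f"{label}: slope 1.0 -> {linear} entries, slope 0.5 -> {damped} entries")
```

With slope 1.0 the sketch matches the 8GB and 1TB dentry figures in [1];
the reference point and slope are exactly the coefficients that would need
agreeing on per table.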

Thanks,
   Daniel

--- [1]

1GB:
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)

8GB:
Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)

1TB:
Dentry cache hash table entries: 134217728 (order: 18, 1073741824 bytes)
Inode-cache hash table entries: 67108864 (order: 17, 536870912 bytes)

4.5TB:
Dentry cache hash table entries: 1073741824 (order: 21, 8589934592 bytes)
Inode-cache hash table entries: 536870912 (order: 20, 4294967296 bytes)
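
The entry counts above are consistent with one dentry bucket per 8KiB of
memory and one inode bucket per 16KiB, rounded up to a power of two. The
sketch below reproduces them under that assumption; the shift values 13 and
14 are inferred from the figures, the helper names are made up for
illustration, and nominal memory sizes stand in for the kernel's actual page
count. The byte totals assume 8-byte buckets, which matches the 8GB, 1TB and
4.5TB samples (the 1GB sample works out to 4 bytes per bucket).

```python
def roundup_pow_of_two(n):
    """Smallest power of two >= n, for n >= 1."""
    return 1 << (n - 1).bit_length()


def hash_entries(mem_bytes, scale):
    """One bucket per 2**scale bytes of memory, rounded up to a power of two."""
    return roundup_pow_of_two(mem_bytes >> scale)


GIB = 1 << 30
TIB = 1 << 40
BUCKET_BYTES = 8  # assumption: one pointer-sized bucket head on 64-bit

samples = {"1GB": GIB, "8GB": 8 * GIB, "1TB": TIB, "4.5TB": 9 * TIB // 2}
for label, mem in samples.items():
    dentry = hash_entries(mem, 13)  # scale inferred from [1]
    inode = hash_entries(mem, 14)   # scale inferred from [1]
    total = (dentry + inode) * BUCKET_BYTES
    print(f"{label}: dentry={dentry} inode={inode} total={total / GIB:.3f} GiB")
```

The 4.5TB case shows the effect of the power-of-two rounding: the table is
sized as if the machine had 8TB, giving 12GiB (about 13GB) in total.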
-- 
Daniel J Blueman
Principal Software Engineer, Numascale Asia


* Re: 13GB dcache+inode cache hash tables
From: Eric Dumazet @ 2013-06-25  9:48 UTC
  To: Daniel J Blueman; +Cc: Linux Kernel, Ingo Molnar, Steffen Persvold

On Tue, 2013-06-25 at 16:56 +0800, Daniel J Blueman wrote:
> As memory capacity increases, we see the dentry and inode cache hash
> tables grow to wild sizes [1], e.g. 13GB is consumed on a 4.5TB system.
>
> Perhaps a better approach would add a linear component to an exponent to
> give tuned scaling, given that spatial locality is an advantage in hash
> tables and that resources should be used carefully.
>
> The same approach would suit other hash tables (mount-cache, TCP
> established, TCP bind, UDP, UDP-Lite, Dquot-cache) with different
> coefficients, so perhaps we could generalise.
> 

TCP hash table is limited to 512K slots, unless overridden.
TCP bind limited to 64K slots.
UDP limited to 64K slots.

> If so, what are reasonable reference points and assumptions?
> 

I do not know what you have in mind, please show us a patch ;)

I would love it if all these hash tables could use hugepages.

vmalloc() is nice for NUMA spreading, but being able to use hugepages
for very large hashes could lower TLB pressure...

# grep alloc_large_system_hash /proc/vmallocinfo 
0xffffc90000002000-0xffffc90004003000 67112960 alloc_large_system_hash+0x153/0x21c pages=16384 vmalloc vpages N0=8192 N1=8192
0xffffc90004003000-0xffffc90004024000  135168 alloc_large_system_hash+0x153/0x21c pages=32 vmalloc N0=16 N1=16
0xffffc90004024000-0xffffc90006025000 33558528 alloc_large_system_hash+0x153/0x21c pages=8192 vmalloc vpages N0=4096 N1=4096
0xffffc90006025000-0xffffc90006036000   69632 alloc_large_system_hash+0x153/0x21c pages=16 vmalloc N0=8 N1=8
0xffffc90006052000-0xffffc90006057000   20480 alloc_large_system_hash+0x153/0x21c pages=4 vmalloc N0=2 N1=2
0xffffc90016081000-0xffffc90016882000 8392704 alloc_large_system_hash+0x153/0x21c pages=2048 vmalloc vpages N0=1024 N1=1024
0xffffc90016882000-0xffffc90016983000 1052672 alloc_large_system_hash+0x153/0x21c pages=256 vmalloc N0=128 N1=128
0xffffc90016983000-0xffffc90016a84000 1052672 alloc_large_system_hash+0x153/0x21c pages=256 vmalloc N0=128 N1=128
0xffffc90016a84000-0xffffc90016b85000 1052672 alloc_large_system_hash+0x153/0x21c pages=256 vmalloc N0=128 N1=128
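
As a rough illustration of the TLB argument, the sketch below counts how
many page mappings the largest table in the listing above needs with 4KiB
pages versus hugepages. The 2MiB hugepage size is an assumption (the usual
x86-64 value), and the helper name is made up for illustration.

```python
PAGE_4K = 4 << 10   # base page size
PAGE_2M = 2 << 20   # assumed hugepage size (x86-64)


def pages_needed(table_bytes, page_size):
    """Number of pages of the given size needed to cover the table."""
    return -(-table_bytes // page_size)  # ceiling division


# Largest mapping in the listing above: pages=16384 of 4KiB each.
table_bytes = 16384 * PAGE_4K

print(pages_needed(table_bytes, PAGE_4K))  # 16384
print(pages_needed(table_bytes, PAGE_2M))  # 32
```

The listed size of that mapping, 67112960 bytes, is 4096 bytes more than
16384 pages; that is consistent with one extra guard page being counted in
the virtual range.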

# dmesg | grep hash
[    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[    0.003976] Dentry cache hash table entries: 8388608 (order: 14, 67108864 bytes)
[    0.016692] Inode-cache hash table entries: 4194304 (order: 13, 33554432 bytes)
[    0.022074] Mount-cache hash table entries: 256
[    1.089249] TCP established hash table entries: 524288 (order: 11, 8388608 bytes)
[    1.090651] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
[    1.090946] UDP hash table entries: 32768 (order: 8, 1048576 bytes)
[    1.091187] UDP-Lite hash table entries: 32768 (order: 8, 1048576 bytes)
[    1.119761] Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
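
For comparing machines, the boot-time lines above can be totalled with a
small script. This is a minimal sketch that assumes the line format shown in
this thread (including the Dquot-cache variant without a colon after
"order", and the Mount-cache line that reports no byte count); the function
name is made up for illustration.

```python
import re

# Matches e.g. "] Dentry cache hash table entries: 8388608 (order: 14, 67108864 bytes)"
HASH_RE = re.compile(
    r"\] (?P<name>.+?) hash table entries: (?P<entries>\d+)"
    r"(?: \(order:? \d+, (?P<bytes>\d+) bytes\))?"
)


def parse_hash_tables(dmesg_text):
    """Return (name, entries, bytes-or-None) for each hash table line."""
    tables = []
    for line in dmesg_text.splitlines():
        m = HASH_RE.search(line)
        if m:
            size = int(m.group("bytes")) if m.group("bytes") else None
            tables.append((m.group("name"), int(m.group("entries")), size))
    return tables


SAMPLE = """\
[    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[    0.003976] Dentry cache hash table entries: 8388608 (order: 14, 67108864 bytes)
[    0.016692] Inode-cache hash table entries: 4194304 (order: 13, 33554432 bytes)
[    0.022074] Mount-cache hash table entries: 256
[    1.089249] TCP established hash table entries: 524288 (order: 11, 8388608 bytes)
[    1.090651] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
[    1.090946] UDP hash table entries: 32768 (order: 8, 1048576 bytes)
[    1.091187] UDP-Lite hash table entries: 32768 (order: 8, 1048576 bytes)
[    1.119761] Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
"""

tables = parse_hash_tables(SAMPLE)
total = sum(size for _, _, size in tables if size is not None)
print(f"{len(tables)} tables, {total} bytes ({total / 2**20:.1f} MiB)")
```

On this machine the reported tables add up to 112234496 bytes, about
107MiB, of which the dentry and inode tables account for 96MiB.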




* Re: 13GB dcache+inode cache hash tables
From: Daniel J Blueman @ 2013-06-27  9:08 UTC
  To: Eric Dumazet; +Cc: Linux Kernel, Ingo Molnar, Steffen Persvold

On 25/06/2013 17:48, Eric Dumazet wrote:
> On Tue, 2013-06-25 at 16:56 +0800, Daniel J Blueman wrote:
>> As memory capacity increases, we see the dentry and inode cache hash
>> tables grow to wild sizes [1], e.g. 13GB is consumed on a 4.5TB system.
>>
>> Perhaps a better approach would add a linear component to an exponent to
>> give tuned scaling, given that spatial locality is an advantage in hash
>> tables and that resources should be used carefully.
>>
>> The same approach would suit other hash tables (mount-cache, TCP
>> established, TCP bind, UDP, UDP-Lite, Dquot-cache) with different
>> coefficients, so perhaps we could generalise.
>>
>
> TCP hash table is limited to 512K slots, unless overridden.
> TCP bind limited to 64K slots.
> UDP limited to 64K slots.
>
>> If so, what are reasonable reference points and assumptions?
>
> I do not know what you have in mind, please show us a patch ;)
[...]

Alright, I'll see what I can get together in the next week or so when I 
can fit it in.

Dan
-- 
Daniel J Blueman
Principal Software Engineer, Numascale Asia
