linux-fsdevel.vger.kernel.org archive mirror
From: Al Viro <viro@ZenIV.linux.org.uk>
To: Andi Kleen <andi@firstfloor.org>
Cc: linux-fsdevel@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-kernel@vger.kernel.org, Stephen Tweedie <sct@redhat.com>,
	Jeremy Eder <jeder@redhat.com>
Subject: Re: [PATCH 5/5][RFC][CFT] resizable namespace.c hashes
Date: Fri, 7 Mar 2014 18:38:34 +0000
Message-ID: <20140307183834.GE18016@ZenIV.linux.org.uk>
In-Reply-To: <87bnxi6i8w.fsf@tassilo.jf.intel.com>

On Fri, Mar 07, 2014 at 09:17:19AM -0800, Andi Kleen wrote:
> Al Viro <viro@ZenIV.linux.org.uk> writes:
> 
> > * switch allocation to alloc_large_system_hash()
> > * make sizes overridable by boot parameters (mhash_entries=, mphash_entries=)
> > * switch mountpoint_hashtable from list_head to hlist_head
> 
> So how much memory does this use on a standard system (<4GB memory)?

Two hash chains per megabyte.  IOW, 2GB => 4096 of them.  Could drive
it lower, probably, but I'm a bit nervous about _really_ low-end
boxen.  Right now it matches your variant on a 128MB box.

> How much memory does it use on a large system (0.5TB)?

Probably over the top - alloc_large_system_hash() has that problem
in general.

0.5TB would amount to 2^20 hash chains, which is probably too much,
but then the dentry hash on the same box will be 2^26 hash chains and
the inode hash 2^25 chains.

> How good is your hash function. Would jhash be more appropriate
> and allow smaller hash tables?

How the hell does the hash function affect the average chain length?
The average is just the number of mounts divided by the number of
chains; the hash only decides how far individual chains stray from
it.  And yes, they are pretty evenly distributed.  And while we are at
it, what the hell does jhash have to do with the whole thing?  No
strings involved - the hash key is a pair of pointers to objects
allocated by kmem_cache_alloc(), so their middle bits are already
fairly random.

> Perhaps just want a tree here.

_What_ tree?  Andi, WTF are you talking about?  That hash is a mapping
from (parent vfsmount, mountpoint dentry) to child vfsmount.  Sure,
we do have the mount tree - all children of a given vfsmount are on a
cyclic list anchored in it.  And iterating through those is painfully
slow on realistic setups.  Have a bunch of nfs4 referrals on one fs,
and you are welcome to a hundred vfsmounts on the child list of a
single mount.  Create a bunch of bind mounts in assorted places in
/usr, and *any* mountpoint in /usr pays the price whenever we cross it
during pathname resolution.  Same with /, since there tends to be a
bunch of mounts on directories in the root filesystem.  FWIW, on the
box I'm typing at right now: /proc, /sys, /dev, /run, /tmp, /home,
/usr, /var - 8 of them.  On the box I'm sshed into: /proc, /sys, /dev,
/run, /usr, /media, /var, /tmp, /boot, /archive - 10.  And traversing
that list would be the price of *any* pathname resolution crossing
from / into, e.g., /home or /usr.

To get equally bad behaviour with the current setup (even with the
ridiculously small hash table size set by your patch back in 2002),
you'd need ~2000 mounts.  And it very easily degrades even more -
consider e.g. a box that has a shitload of stuff automounted under
/home.  That can easily go well past 10 simultaneous mounts...

We could keep a per-mountpoint list anchored in the dentry, and pay
with an extra pointer in every struct dentry out there, which would
cost a lot more.  Or we could anchor the lists in struct mountpoint
and do a hash lookup *and* a list search on each mountpoint crossing:
one to find struct mountpoint by struct dentry, another to look for
the vfsmount with the right parent.
