From: Eric Dumazet <eric.dumazet@gmail.com>
To: Nick Piggin <npiggin@suse.de>
Cc: netdev@vger.kernel.org
Subject: Re: rt hash table / rt hash locks question
Date: Wed, 16 Jun 2010 14:27:38 +0200
Message-ID: <1276691258.2632.55.camel@edumazet-laptop>
In-Reply-To: <20100616104633.GW6138@laptop>
On Wednesday 16 June 2010 at 20:46 +1000, Nick Piggin wrote:
> I'm just converting this scalable dentry/inode hash table to a more
> compact form. I was previously using a dumb spinlock per bucket,
> but this doubles the size of the tables so isn't production quality.
>
Yes, we had this in the past (one rwlock or spinlock per hash chain),
and it was not very good with LOCKDEP on.
> What I've done at the moment is to use a bit_spinlock in bit 0 of each
> list pointer of the table. Bit spinlocks are now pretty nice because
> we can do __bit_spin_unlock() which gives non-atomic store with release
> ordering, so it should be almost as fast as spinlock.
>
> But I look at rt hash and it seems you use a small hash on the side
> for spinlocks. So I wonder, pros for each:
>
> - bitlocks have effectively zero storage
Yes, but a mask is needed to get the head pointer. Special care must
also be taken when inserting or deleting a node in the chain, so that
this bit stays set.
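To make the masking point concrete, here is a minimal userspace sketch using C11 atomics as a stand-in for the kernel's bit_spin_lock()/__bit_spin_unlock(). All names (struct bucket, bucket_lock(), bucket_insert_locked(), ...) are hypothetical and not taken from any kernel source. It only illustrates that readers must mask bit 0 off the head word, and that an insert done under the lock must store the new head with the bit still set.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types: a chain node and a bucket whose head word
 * carries the lock in bit 0 and the first-node pointer in the rest. */
struct node {
    struct node *next;
    int key;
};

struct bucket {
    _Atomic uintptr_t head;
};

#define BUCKET_LOCK_BIT ((uintptr_t)1)

static void bucket_lock(struct bucket *b)
{
    /* Spin until we are the caller that flipped bit 0 from 0 to 1. */
    while (atomic_fetch_or_explicit(&b->head, BUCKET_LOCK_BIT,
                                    memory_order_acquire) & BUCKET_LOCK_BIT)
        ;
}

static void bucket_unlock(struct bucket *b)
{
    /* Only the holder changes the word while it is locked, so a plain
     * load followed by a release store is enough (no atomic RMW). */
    uintptr_t v = atomic_load_explicit(&b->head, memory_order_relaxed);
    atomic_store_explicit(&b->head, v & ~BUCKET_LOCK_BIT,
                          memory_order_release);
}

static int bucket_is_locked(struct bucket *b)
{
    return (atomic_load_explicit(&b->head, memory_order_relaxed)
            & BUCKET_LOCK_BIT) != 0;
}

/* The mask is needed on every read of the head pointer. */
static struct node *bucket_first(struct bucket *b)
{
    uintptr_t v = atomic_load_explicit(&b->head, memory_order_acquire);
    return (struct node *)(v & ~BUCKET_LOCK_BIT);
}

/* Caller holds the bit lock: the new head must keep bit 0 set. */
static void bucket_insert_locked(struct bucket *b, struct node *n)
{
    assert(((uintptr_t)n & BUCKET_LOCK_BIT) == 0); /* node must be aligned */
    n->next = bucket_first(b);
    atomic_store_explicit(&b->head, (uintptr_t)n | BUCKET_LOCK_BIT,
                          memory_order_release);
}
```

Forgetting the `| BUCKET_LOCK_BIT` in the insert path would silently drop the lock while the caller still believes it holds it, which is the kind of special care meant above.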
> - bitlocks hit the same cacheline that the hash walk hits.
yes
> - in RCU list, locked hash walks usually followed by hash modification,
> bitlock should have brought in the line for exclusive.
But we usually perform a read-only lookup, _then_ take the lock and
perform a new lookup before the insert. So at the time we take the
bitlock, the cache line is in shared state. With spinlocks, we always
acquire the line in exclusive mode, but on a separate cache line...
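The lookup / lock / lookup-again / insert sequence can be sketched as follows. This is a userspace approximation with hypothetical names (struct chain, chain_find_or_insert()), a trivial test-and-set lock, and no deletion, so it needs no RCU; it is not the route cache code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical chain entry and chain head with its own lock word. */
struct entry {
    struct entry *next;
    int key;
};

struct chain {
    _Atomic(struct entry *) head;
    atomic_bool locked;
};

static void chain_init(struct chain *c)
{
    atomic_init(&c->head, (struct entry *)NULL);
    atomic_init(&c->locked, false);
}

/* Read-only walk; takes no lock. */
static struct entry *chain_find(struct chain *c, int key)
{
    struct entry *e = atomic_load_explicit(&c->head, memory_order_acquire);

    while (e && e->key != key)
        e = e->next;
    return e;
}

/* Returns the entry that ends up in the chain for fresh->key:
 * either a pre-existing one or `fresh` itself. */
static struct entry *chain_find_or_insert(struct chain *c, struct entry *fresh)
{
    struct entry *e;

    assert(fresh != NULL);

    e = chain_find(c, fresh->key);          /* 1. lockless lookup */
    if (e)
        return e;

    while (atomic_exchange_explicit(&c->locked, true,
                                    memory_order_acquire))
        ;                                   /* 2. take the lock */

    e = chain_find(c, fresh->key);          /* 3. look up again under the lock */
    if (!e) {                               /* 4. still absent: insert at head */
        fresh->next = atomic_load_explicit(&c->head, memory_order_relaxed);
        atomic_store_explicit(&c->head, fresh, memory_order_release);
        e = fresh;
    }

    atomic_store_explicit(&c->locked, false, memory_order_release);
    return e;
}
```

Step 1 only reads the chain, so with a bitlock in the head word the line is typically held shared when step 2 has to write it.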
> - bitlock number of locks scales with hash size
Yes, but concurrency is more a function of the number of online CPUs,
given that we use jhash.
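For comparison, a small side table of locks indexed by a masked bucket number might look like the sketch below. NR_LOCKS, lock_index() and the other names are hypothetical, and the sizes are made up; the point is only that the lock count is chosen independently of the hash table size, and that several buckets alias to the same lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical sizes: the lock table would be sized from the CPU
 * count, not from the number of hash buckets. */
#define NR_BUCKETS 1024u
#define NR_LOCKS   16u

_Static_assert((NR_LOCKS & (NR_LOCKS - 1)) == 0,
               "NR_LOCKS must be a power of two for the mask to work");

/* Static storage, so every lock starts out false (unlocked). */
static atomic_bool side_locks[NR_LOCKS];

/* Many buckets share one lock: bucket b and b + NR_LOCKS alias. */
static unsigned int lock_index(unsigned int bucket)
{
    return bucket & (NR_LOCKS - 1);
}

static bool bucket_trylock(unsigned int bucket)
{
    return !atomic_exchange_explicit(&side_locks[lock_index(bucket)],
                                     true, memory_order_acquire);
}

static void bucket_lock(unsigned int bucket)
{
    while (!bucket_trylock(bucket))
        ;
}

static void bucket_unlock(unsigned int bucket)
{
    atomic_store_explicit(&side_locks[lock_index(bucket)], false,
                          memory_order_release);
}
```

With a well-mixed hash, two CPUs collide on a lock only when their buckets land on the same index, so the number of locks needed follows the number of CPUs rather than the number of buckets.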
> - spinlocks may be slightly better at the cacheline level (bitops
> sometimes require explicit load which may not acquire exclusive
> line on some archs). On x86 ll/sc architectures, this shouldn't
> be a problem.
Yes, and spinlocks can also add fairness (if the ticket spinlock
variant is used), but for the route cache I really doubt it makes a
difference.
> - spinlocks better debugging (could be overcome with a LOCKDEP
> option to revert to spinlocks, but a bit ugly).
Definitely a good thing.
> - in practice, contention due to aliasing in buckets to lock mapping
> is probably fairly minor.
Agreed
>
> Net code is obviously tested and tuned well, but instinctively I would
> have thought bitlocks are the better way to go. Any comments on this?
Well, to be honest, this code is rather old, and at the time I wrote it,
bitlocks were probably not available.
You can add:
- One downside of the hashed spinlocks is that X86_INTERNODE_CACHE_SHIFT
is 12 on X86_VSMP: all locks are probably in the same internode block :(
- Another downside is that all locks currently sit on a single NUMA node,
since we kmalloc() them in one contiguous chunk.
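A quick way to see the internode-block problem is to count how many (1 << shift)-byte blocks a contiguous lock array touches. The helper below is a hypothetical illustration; the 1024 locks of 4 bytes each used in the comment are assumed figures, not values read from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of (1 << shift)-byte blocks touched by a contiguous array of
 * `nlocks` locks of `lock_size` bytes each, starting at address `base`.
 * Example with assumed figures: 1024 locks of 4 bytes, block-aligned,
 * fit in one 4096-byte block when shift is 12. */
static size_t blocks_spanned(uintptr_t base, size_t nlocks,
                             size_t lock_size, unsigned int shift)
{
    uintptr_t first, last;

    assert(nlocks > 0 && lock_size > 0);

    first = base >> shift;
    last = (base + nlocks * lock_size - 1) >> shift;
    return (size_t)(last - first + 1);
}
```

Under those assumed figures the whole array sits in a single block at shift 12, whereas at a 64-byte line size (shift 6) the same array would be spread over 64 lines.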
So I guess it would be worth a try :)