From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eric Dumazet
Subject: Re: rt hash table / rt hash locks question
Date: Wed, 16 Jun 2010 14:27:38 +0200
Message-ID: <1276691258.2632.55.camel@edumazet-laptop>
References: <20100616104633.GW6138@laptop>
In-Reply-To: <20100616104633.GW6138@laptop>
To: Nick Piggin
Cc: netdev@vger.kernel.org

On Wednesday 16 June 2010 at 20:46 +1000, Nick Piggin wrote:
> I'm just converting this scalable dentry/inode hash table to a more
> compact form. I was previously using a dumb spinlock per bucket,
> but this doubles the size of the tables so isn't production quality.

Yes, we had this in the past (one rwlock or spinlock per hash chain),
and it was not very good with LOCKDEP on.

> What I've done at the moment is to use a bit_spinlock in bit 0 of each
> list pointer of the table. Bit spinlocks are now pretty nice because
> we can do __bit_spin_unlock(), which gives a non-atomic store with
> release ordering, so it should be almost as fast as a spinlock.
>
> But I look at the rt hash and it seems you use a small hash on the side
> for spinlocks. So I wonder, pros for each:
>
> - bitlocks have effectively zero storage

Yes, but a mask is needed to get the head pointer. Special care must
also be taken when inserting or deleting a node in a chain, to keep
this bit set.

> - bitlocks hit the same cacheline that the hash walk hits.

Yes.

> - in RCU list, locked hash walks usually followed by hash modification,
>   bitlock should have brought in the line for exclusive.
But we usually perform a read-only lookup, _then_ take the lock to
perform a new lookup before the insert. So by the time we take the
bitlock, the cache line is in shared state. With spinlocks, we always
use exclusive mode, but on a separate cache line...

> - bitlock number of locks scales with hash size

Yes, but concurrency is more a function of online cpus, given we use
jhash.

> - spinlocks may be slightly better at the cacheline level (bitops
>   sometimes require explicit load which may not acquire exclusive
>   line on some archs). On x86 ll/sc architectures, this shouldn't
>   be a problem.

Yes, and you can add fairness (if the ticket spinlock variant is used),
but on the route cache I really doubt it can make a difference.

> - spinlocks better debugging (could be overcome with a LOCKDEP
>   option to revert to spinlocks, but a bit ugly).

Definitely a good thing.

> - in practice, contention due to aliasing in buckets to lock mapping
>   is probably fairly minor.

Agreed.

> Net code is obviously tested and tuned well, but instinctively I would
> have thought bitlocks are the better way to go. Any comments on this?

Well, to be honest, this code is rather old, and at the time I wrote it,
bitlocks were probably not available.

You can add:

- One downside of the hashed spinlocks is X86_INTERNODE_CACHE_SHIFT
  being 12 on X86_VSMP: all the locks probably land in the same
  internode block :(

- Another downside is that all the locks currently sit on a single NUMA
  node, since we kmalloc() them in one contiguous chunk.

So I guess it would be worth a try :)