From mboxrd@z Thu Jan 1 00:00:00 1970 From: NeilBrown Subject: Re: [PATCH 5/6] rhashtable: support guaranteed successful insertion. Date: Thu, 29 Mar 2018 08:26:21 +1100 Message-ID: <87370j51tu.fsf@notabene.neil.brown.name> References: <152210688405.11435.13010923693146415942.stgit@noble> <152210718434.11435.6551477417902631683.stgit@noble> <20180327155610.GD14001@gondor.apana.org.au> <87y3id42zo.fsf@notabene.neil.brown.name> <20180328060432.GA16291@gondor.apana.org.au> <87d0zo4r5j.fsf@notabene.neil.brown.name> <20180328072727.GA17306@gondor.apana.org.au> Mime-Version: 1.0 Content-Type: multipart/signed; boundary="=-=-="; micalg=pgp-sha256; protocol="application/pgp-signature" Cc: Thomas Graf , netdev@vger.kernel.org, linux-kernel@vger.kernel.org To: Herbert Xu Return-path: In-Reply-To: <20180328072727.GA17306@gondor.apana.org.au> Sender: linux-kernel-owner@vger.kernel.org List-Id: netdev.vger.kernel.org --=-=-= Content-Type: text/plain Content-Transfer-Encoding: quoted-printable On Wed, Mar 28 2018, Herbert Xu wrote: > On Wed, Mar 28, 2018 at 06:04:40PM +1100, NeilBrown wrote: >> >> I disagree. My patch 6 only makes it common instead of exceedingly >> rare. If any table in the list other than the first has a chain with 16 >> elements, then trying to insert an element with a hash which matches >> that chain will fail with -EBUSY. This is theoretically possible >> already, though astronomically unlikely. So that case will never be >> tested for. > > No that's not true. If the table is correctly sized then the > probability of having a chain with 16 elements is extremely low. I say "astronomically unlikely", you say "probability .. is extremely low". I think we are in agreement here. The point remains that if an error *can* be returned then I have to write code to handle it and test that code. I'd rather not. > > Even if it does happen we won't fail because we will perform > an immediate rehash. We only fail if it happens right away > after the rehash (that is, at least another 16 elements have > been inserted and you're trying to insert a 17th element, all > while the new hash table has not been completely populated), > which means that somebody has figured out our hash secret and > failing in that case makes sense. > >> It is hard to know if it is necessary. And making the new table larger >> will make the error less likely, but still won't make it impossible. So >> callers will have to handle it - just like they currently have to handle >> -ENOMEM even though it is highly unlikely (and not strictly necessary). > > Callers should not handle an ENOMEM error by retrying. Nor should > they retry an EBUSY return value. I never suggested retrying, but I would have to handle it somehow. I'd rather not. > >> Are these errors ever actually useful? I thought I had convinced myself >> before that they were (to throttle attacks on the hash function), but >> they happen even less often than I thought. > > The EBUSY error indicates that the hash table has essentially > degenereated into a linked list because somebody has worked out > our hash secret. While I have no doubt that there are hashtables where someone could try to attack the hash, I am quite sure there are others where is such an attack is meaningless - any code which could generate the required range of keys, could do far worse things more easily. > >> Maybe. Reading a percpu counter isn't cheap. Reading it whenever a hash >> chain reaches 16 is reasonable, but I think we would want to read it a >> lot more often than that. So probably store the last-sampled time (with >> no locking) and only sample the counter if last-sampled is more than >> jiffies - 10*HZ (???) > > We could also take the spinlock table approach and have a counter > per bucket spinlock. This should be sufficient as you'll contend > on the bucket spinlock table anyway. Yes, storing a sharded count in the spinlock table does seem like an appropriate granularity. However that leads me to ask: why do we have the spinlock table? Why not bit spinlocks in the hashchain head like include/linux/list_bl uses? > > This also allows us to estimate the total table size and not have > to always do a last-ditch growth when it's too late. I don't understand how it can ever be "too late", though I appreciate that in some cases "sooner" is better than "later" If we give up on the single atomic_t counter, then we must accept that the number of elements could exceed any given value. The only promise we can provide is that it wont exceed N% of the table size for more than T seconds. Thanks, NeilBrown > > Cheers, > --=20 > Email: Herbert Xu > Home Page: http://gondor.apana.org.au/~herbert/ > PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt --=-=-= Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEG8Yp69OQ2HB7X0l6Oeye3VZigbkFAlq8CH0ACgkQOeye3VZi gbnUkg/5AYRx+nSwNNDMCSUSMCfJSZXeT8FRL/nXAJj1w9WjZgnSccvg6dpTvvTr Pl5wZD+yOnkakDAzoKd71xoYEK9AWm5vOGrc/7Te+4kmYvmXuHDiKusWS+tEP8G3 TDHjhqVOMaDP6Naei5LFHj3v9nI0oxDYXh4Yc4OBfTIPJ1PLObyMkHBOH3L5QNrD Fy2+4yg6LPrETufUBCDyLJZM0Pa/brMCIy+g1DOpGpbQl0v6q/iEXO9na/flCvMj zLiJmch7V/6t1y+N/rr9HR6O9HJVN5DK8in16moFelc/y32kgpiYJTVQ3XFbVd17 0AVxe2wsgsliwQgyF9QprlyMjFybt+Vey/fhLYWEVNAMx83ri1cndtZyh9f7ix96 atFGuGdo5PTiabodyOSsKfMnH+mx2Zl/hO/3a7lsIGgeNiCQgQTAzxrgR7FLFMaJ xtGQzJkUMYQNoh5oCDHoaRcOOxIvKEtBaftV8LGuQLqHIBB1FPL5JJ/irHla+7AB JVnugb/BDZylEva5y+kMyFvr2UFjDfSB6hOD02tUoVWMvijquo+K7kRzkfQdd7HO rPTlwgaTMQ4uEW7X9JKeAc9HS6JFGPLKolKkFEYaW2cYvmcF4jHrfQPz/M0J4izc f9Gkywm13MTyji9N1sYUqotG/eVpitP2qcHJ49ALQELIglWor30= =lLnV -----END PGP SIGNATURE----- --=-=-=--