netdev.vger.kernel.org archive mirror
From: NeilBrown <neilb@suse.com>
To: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Thomas Graf <tgraf@suug.ch>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <eric.dumazet@gmail.com>
Subject: Re: [PATCH 5/6] rhashtable: support guaranteed successful insertion.
Date: Fri, 06 Apr 2018 13:11:56 +1000
Message-ID: <87y3i1vxj7.fsf@notabene.neil.brown.name>
In-Reply-To: <20180329052231.GA21028@gondor.apana.org.au>


On Thu, Mar 29 2018, Herbert Xu wrote:

> On Thu, Mar 29, 2018 at 08:26:21AM +1100, NeilBrown wrote:
>>
>> I say "astronomically unlikely", you say "probability .. is extremely
>> low".  I think we are in agreement here.
>> 
>> The point remains that if an error *can* be returned then I have to
>> write code to handle it and test that code.  I'd rather not.
>
> You have be able to handle errors anyway because of memory allocation
> failures.  Ultimately if you keep inserting you will eventually
> fail with ENOMEM.  So I don't see the issue with an additional
> error value.

You don't need to handle memory allocation failures at the point where
you insert into the table - adding to a linked list requires no new
memory.

If you are running out of memory, you will fail to allocate new objects,
so you won't be able to add them to the rhashtable.  Obviously you have
to handle that failure, and that is likely to save rhashtable from
getting many mem-alloc failures.

But once you have allocated a new object, rhashtable should just accept
it.  It doesn't help anyone to have the insertion fail.
The insertion can currently fail even when allocating a new object would
succeed, because insertion may require a GFP_ATOMIC allocation to succeed,
while allocating a new object might only need a GFP_KERNEL allocation to
succeed.
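
To make the caller-side pattern concrete, something like the following
sketch (the object type and helper name are made up for illustration;
only the rhashtable calls are the standard API):

#include <linux/rhashtable.h>
#include <linux/slab.h>

/*
 * Hypothetical caller: the object itself is allocated with GFP_KERNEL,
 * yet the insert can still fail (e.g. -ENOMEM or -EBUSY) because the
 * table's own internal allocations may need GFP_ATOMIC to succeed.
 */
struct my_obj {
        struct rhash_head node;
        u32 key;
};

static int add_object(struct rhashtable *ht, u32 key,
                      const struct rhashtable_params *params)
{
        struct my_obj *obj = kzalloc(sizeof(*obj), GFP_KERNEL);
        int err;

        if (!obj)
                return -ENOMEM;         /* object allocation failed - must handle */

        obj->key = key;
        err = rhashtable_insert_fast(ht, &obj->node, *params);
        if (err) {                      /* failure from the table itself */
                kfree(obj);
                return err;
        }
        return 0;
}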

>
>> > Even if it does happen we won't fail because we will perform
>> > an immediate rehash.  We only fail if it happens right away
>> > after the rehash (that is, at least another 16 elements have
>> > been inserted and you're trying to insert a 17th element, all
>> > while the new hash table has not been completely populated),
>> > which means that somebody has figured out our hash secret and
>> > failing in that case makes sense.
>
> BTW, you didn't acknowledge this bit which I think is crucial to
> how likely such an error is.

The likelihood of the error isn't really the issue - it either can
happen or it cannot.  If it can, it requires code to handle it.

>
>> I never suggested retrying, but I would have to handle it somehow.  I'd
>> rather not.
>
> ...
>
>> While I have no doubt that there are hashtables where someone could try
>> to attack the hash, I am quite sure there are others where such an
>> attack is meaningless - any code which could generate the required range of
>> keys could do far worse things more easily.
>
> Our network hashtable has to be secure against adversaries.  I
> understand that this may not be important to your use-case.  However,
> given the fact that the failure would only occur if an adversary
> is present and actively attacking your hash table, I don't think
> it has that much of a negative effect on your use-case either.

I certainly accept that there are circumstances where it is a real
possibility that an adversary might succeed in attacking the hash
function, and changing the seed for each table is an excellent idea.
It isn't clear to me that failing insertions is needed - it is done
rather late and so doesn't throttle much, and could give the attacker
feedback that they had succeeded (?).

Not all hash tables are susceptible to attack.  It would be nice if
developers working in less exposed areas could use rhashtable without ever
having to check for errors.

Error codes are messages from the implementation to the caller.
They should be chosen to help the caller make useful choices, not to
allow the implementation to avoid awkward situations.

>
> Of course, if you can reproduce the EBUSY error without your
> disable_count patch - or if you can still reproduce it after you have
> fixed the issue I pointed out in your disable_count patch - then that
> would suggest a real bug and we would need to fix it, for everyone.

I suspect that I cannot.  However, experience tells me that customers do
things that I cannot do and would never expect - they can often trigger
errors that I cannot.  It is best if the error cannot be returned.

>
>> Yes, storing a sharded count in the spinlock table does seem like an
>> appropriate granularity.  However that leads me to ask: why do we have
>> the spinlock table?  Why not bit spinlocks in the hashchain head like
>> include/linux/list_bl uses?
>
> The spinlock table predates rhashtable.  Perhaps Thomas/Eric/Dave
> can elucidate this.

Maybe I'll submit an RFC patch to change it - if I can make it work.
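Roughly what I have in mind, sketched with the list_bl primitives - the
table structure and function names below are invented for illustration
and are not the rhashtable internals:

#include <linux/list_bl.h>

/*
 * Rough sketch only: per-bucket locking using the low bit of the
 * chain-head pointer, in the style of include/linux/list_bl.h,
 * instead of a separate spinlock table.
 */
struct bl_bucket_table {
        unsigned int size;                      /* number of buckets, power of 2 */
        struct hlist_bl_head buckets[];
};

static void bl_insert(struct bl_bucket_table *tbl, unsigned int hash,
                      struct hlist_bl_node *node)
{
        struct hlist_bl_head *head = &tbl->buckets[hash & (tbl->size - 1)];

        hlist_bl_lock(head);            /* bit_spin_lock(0) on the head pointer */
        hlist_bl_add_head(node, head);
        hlist_bl_unlock(head);
}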

>
>> I don't understand how it can ever be "too late", though I appreciate
>> that in some cases "sooner" is better than "later".
>> If we give up on the single atomic_t counter, then we must accept that
>> the number of elements could exceed any given value.  The only promise
>> we can provide is that it won't exceed N% of the table size for more than
>> T seconds.
>
> Sure.  However, assuming we use an estimate that is hash-based,
> that *should* be fairly accurate assuming that your hash function
> is working in the first place.  It's completely different vs.
> estimating based on a per-cpu count which could be wildly inaccurate.

Ahhh... I see what you are thinking now.  You aren't suggesting a
sharded count that is summed periodically.  Rather you are suggesting that
we divide the hash space into N regions, each with its own independent
counter, and resize the table if any one region reaches 70% occupancy -
on the assumption that the other regions are likely to be close.  Is
that right?
It would probably be dangerous to allow automatic shrinking (when just
one region drops below 30%) in that case, but it might be OK for growing.
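
A rough sketch of that scheme as I understand it - the names and the
choice of 16 regions are purely illustrative:

#include <linux/atomic.h>

/*
 * Sketch: the hash space is split into NR_REGIONS regions, each with
 * its own counter, and we grow when any one region passes 70% of its
 * share of the buckets.
 */
#define NR_REGIONS      16

struct region_counted_table {
        unsigned int size;                      /* total number of buckets */
        atomic_t nelems[NR_REGIONS];            /* per-region element counts */
};

static bool needs_grow_after_insert(struct region_counted_table *tbl,
                                    unsigned int hash)
{
        unsigned int region = hash & (NR_REGIONS - 1);
        unsigned int count = atomic_inc_return(&tbl->nelems[region]);

        /* Grow if this region alone exceeds 70% of its share of buckets,
         * assuming the other regions are similarly loaded. */
        return count > tbl->size / NR_REGIONS * 7 / 10;
}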

I'm not sure I like the idea, but it is worth thinking about.

Thanks,
NeilBrown


>
> Cheers,
> -- 
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt



Thread overview: 33+ messages
2018-03-26 23:33 [PATCH 0/6] rhashtable: assorted fixes and enhancements NeilBrown
2018-03-26 23:33 ` [PATCH 5/6] rhashtable: support guaranteed successful insertion NeilBrown
2018-03-27 15:56   ` Herbert Xu
2018-03-27 21:34     ` NeilBrown
2018-03-28  6:04       ` Herbert Xu
2018-03-28  7:04         ` NeilBrown
2018-03-28  7:27           ` Herbert Xu
2018-03-28 21:26             ` NeilBrown
2018-03-29  5:22               ` Herbert Xu
2018-04-06  3:11                 ` NeilBrown [this message]
2018-04-06  4:13                   ` Herbert Xu
2018-03-26 23:33 ` [PATCH 6/6] rhashtable: allow element counting to be disabled NeilBrown
2018-03-26 23:33 ` [PATCH 4/6] rhashtable: allow a walk of the hash table without missing objects NeilBrown
2018-03-27 15:49   ` David Miller
2018-03-27 15:54     ` Herbert Xu
2018-03-27 21:50     ` NeilBrown
2018-03-27 15:51   ` Herbert Xu
2018-03-27 21:54     ` NeilBrown
2018-03-28  6:07       ` Herbert Xu
2018-03-28  7:17         ` NeilBrown
2018-03-28  7:30           ` Herbert Xu
2018-03-28 21:34             ` NeilBrown
2018-03-29  1:13               ` NeilBrown
2018-03-26 23:33 ` [PATCH 2/6] rhashtable: remove outdated comments about grow_decision etc NeilBrown
2018-03-26 23:33 ` [PATCH 3/6] rhashtable: reset intr when rhashtable_walk_start sees new table NeilBrown
2018-03-27 15:47   ` Herbert Xu
2018-03-26 23:33 ` [PATCH 1/6] rhashtable: improve documentation for rhashtable_walk_peek() NeilBrown
2018-03-27 10:55   ` Sergei Shtylyov
2018-03-27 15:30   ` Herbert Xu
2018-03-27 15:47   ` David Miller
2018-03-27 21:45   ` [PATCH 1/6 v2] " NeilBrown
2018-03-27 22:46   ` [PATCH 1/6] " Andreas Grünbacher
2018-03-28  0:49     ` NeilBrown
