From: Steven Rostedt <rostedt@goodmis.org>
To: Ross Zwisler <zwisler@google.com>
Cc: linux-trace-devel@vger.kernel.org,
Stevie Alvarez <stevie.6strings@gmail.com>
Subject: Re: [PATCH v2 08/17] libtraceeval histograms: Move hash functions into their own file
Date: Tue, 15 Aug 2023 16:23:33 -0400 [thread overview]
Message-ID: <20230815162333.621b6d8b@gandalf.local.home> (raw)
In-Reply-To: <20230815193156.GD780024@google.com>
On Tue, 15 Aug 2023 13:31:56 -0600
Ross Zwisler <zwisler@google.com> wrote:
>
> > diff --git a/src/hash.c b/src/hash.c
> > new file mode 100644
> > index 000000000000..e4f2a983d39c
> > --- /dev/null
> > +++ b/src/hash.c
> > @@ -0,0 +1,119 @@
> > +/* SPDX-License-Identifier: MIT */
> > +/*
> > + * libtraceeval hashtable interface implementation.
> > + *
> > + * Copyright (C) 2023 Google Inc, Steven Rostedt <rostedt@goodmis.org>
> > + */
> > +
> > +#include <traceeval-hist.h>
> > +
> > +#include "eval-local.h"
> > +
> > +__hidden struct hash_table *hash_alloc(void)
> > +{
> > + struct hash_table *hash;
> > +
> > + hash = calloc(1, sizeof(*hash));
> > + if (!hash)
> > + return NULL;
> > +
> > + hash->bits = HASH_BITS;
> > + hash->hash = calloc(HASH_SIZE(hash->bits), sizeof(*hash->hash));
> > + if (!hash->hash) {
> > + free(hash);
> > + hash = NULL;
> > + }
> > +
> > + return hash;
> > +}
> > +
> > +__hidden void hash_free(struct hash_table *hash)
> > +{
> > + free(hash->hash);
> > + free(hash);
> > +}
> > +
> > +__hidden void hash_add(struct hash_table *hash, struct hash_item *item, unsigned key)
> > +{
> > + key &= HASH_MASK(hash->bits);
>
> key should already be masked to HASH_MASK(hash->bits) via make_hash(). If
> those bits are set, we have a bug somewhere.
>
> I think it's better to check to see if those bits are set and bail out loudly
> with an error.
I debated a bit about where to do the mask. I didn't want to put a
dependency that the bits were already masked, so I just did it twice. It's
a very fast operation.
I also don't want to make the dependency that key can only come from
mask_hash(). As it's critical that key is masked here (or we have a array
overflow), I just kept it in both places.
I can comment that here.
>
> > +
> > + item->next = hash->hash[key];
> > + hash->hash[key] = item;
> > + item->key = key;
> > +
> > + hash->nr_items++;
> > +}
> > +
> > +__hidden int hash_remove(struct hash_table *hash, struct hash_item *item)
> > +{
> > + struct hash_item **parent;
> > +
> > + for (parent = &hash->hash[item->key]; *parent; parent = &(*parent)->next) {
> > + if (*parent == item) {
> > + *parent = item->next;
> > + hash->nr_items--;
> > + return 1;
> > + }
> > + }
> > + return 0;
> > +}
> > +
> > +__hidden struct hash_iter *hash_iter_start(struct hash_table *hash)
> > +{
> > + struct hash_iter *iter = &hash->iter;
> > + size_t i;
> > +
> > + for (i = 0; i < HASH_SIZE(hash->bits); i++) {
> > + if (!hash->hash[i])
> > + continue;
> > + iter->next_item = hash->hash[i];
>
> I think we need to break here after we've found a populated bucket and set
> iter->next_item. Right now this works only if we have a single populated
> bucket, because we'll set iter->next_item once and then keep iterating until
> i == HASH_SIZE(hash->bits).
>
> This will mean that iter->current_bucket will always == HASH_SIZE(hash->bits),
> but we have a bug in hash_iter_next() that meant we weren't returning early
> with NULL when we hit this condition.
Nice catch, but I found this bug during my tests but fixed it in patch 11. :-p
I'll move that change here, as that's the proper place it belongs.
>
> > + }
> > + iter->current_bucket = i;
> > + return iter;
> > +}
> > +
> > +__hidden struct hash_item *hash_iter_next(struct hash_iter *iter)
> > +{
> > + struct hash_table *hash = container_of(iter, struct
> > hash_table, iter);
> > + struct hash_item *item;
> > +
> > + if (iter->current_bucket > HASH_SIZE(hash->bits))
>
> if (iter->current_bucket >= HASH_SIZE(hash->bits))
>
> Right now we're missing the exit case where
> iter->current_bucket == HASH_SIZE(hash->bits), which means we've run out
> of buckets with entries.
You are correct that it should be fixed, but it just happens that the rest
of the logic will still end up returning NULL when this happens. But the
above was suppose to be a "shortcut" that never gets hit (or why I fixed it
in patch 11)!
>
> > + return NULL;
> > +
> > + item = iter->next_item;
In patch 11, I added:
if (!item)
return NULL;
That should be here too (along with the above off by one fix).
> > +
> > + iter->next_item = item->next;
> > + if (!iter->next_item) {
> > + size_t i;
> > +
> > + for (i = iter->current_bucket + 1; i <
> > HASH_SIZE(hash->bits); i++) {
> > + if (!hash->hash[i])
> > + continue;
> > + iter->next_item = hash->hash[i];
>
> As in hash_iter_start(), we need to break when we find the next non-empty
> bucket and set iter->next_item. This will let us set iter->current_bucket
> correctly as well.
And this too was fixed in patch 11.
>
> > + }
> > + iter->current_bucket = i;
> > + }
> > + return item;
> > +}
> > +
> > +__hidden struct hash_iter *hash_iter_bucket(struct hash_table *hash,
> > unsigned key) +{
> > + struct hash_iter *iter = &hash->iter;
> > +
> > + key &= HASH_MASK(hash->bits);
>
> As with hash_add(), 'key' should already be masked and I think it'd be
> better to error out loudly if upper bits in 'key' are unexpectedly set.
Again, I don't want to add a dependency of where key was created. It may be
used for other purposes (and the upper bits may hold info later). I think
it's just more robust to keep the extra masks. Again, they are very fast
operations.
Thanks Ross,
-- Steve
>
> > +
> > + iter->current_bucket = key;
> > + iter->next_item = hash->hash[key];
> > +
> > + return iter;
> > +}
> > +
next prev parent reply other threads:[~2023-08-15 20:24 UTC|newest]
Thread overview: 39+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-11 5:39 [PATCH v2 00/17] libtraceeval histogram: Updates Steven Rostedt
2023-08-11 5:39 ` [PATCH v2 01/17] libtraceeval histograms: Fix traceeval_results_release() error message Steven Rostedt
2023-08-11 5:39 ` [PATCH v2 02/17] libtraceeval: Add sample task-eval program Steven Rostedt
2023-08-11 5:39 ` [PATCH v2 03/17] libtraceeval hist: Add pointer and const string types Steven Rostedt
2023-08-11 5:39 ` [PATCH v2 04/17] libtraceeval histogram: Have cmp and release functions be generic Steven Rostedt
2023-08-15 16:50 ` Ross Zwisler
2023-08-15 18:52 ` Steven Rostedt
2023-08-11 5:39 ` [PATCH v2 05/17] libtraceeval histograms: Add traceeval struct to compare function Steven Rostedt
2023-08-15 16:55 ` Ross Zwisler
2023-08-15 18:53 ` Steven Rostedt
2023-08-11 5:39 ` [PATCH v2 06/17] libtraceeval histogram: Remove comparing of traceeval and types Steven Rostedt
2023-08-11 5:39 ` [PATCH v2 07/17] libtraceeval: Convert hist array into a hash table Steven Rostedt
2023-08-15 18:44 ` Ross Zwisler
2023-08-15 19:05 ` Steven Rostedt
2023-08-11 5:39 ` [PATCH v2 08/17] libtraceeval histograms: Move hash functions into their own file Steven Rostedt
2023-08-15 19:31 ` Ross Zwisler
2023-08-15 20:23 ` Steven Rostedt [this message]
2023-08-15 22:56 ` Steven Rostedt
2023-08-11 5:39 ` [PATCH v2 09/17] libtraceeval histogram: Label and check keys and values Steven Rostedt
2023-08-15 19:48 ` Ross Zwisler
2023-08-15 20:24 ` Steven Rostedt
2023-08-11 5:39 ` [PATCH v2 10/17] libtraceeval histogram: Add updating of stats Steven Rostedt
2023-08-15 20:25 ` Ross Zwisler
2023-08-15 20:55 ` Steven Rostedt
2023-08-11 5:39 ` [PATCH v2 11/17] libtraceeval histogram: Add iterator APIs Steven Rostedt
2023-08-16 21:34 ` Ross Zwisler
2023-08-16 21:49 ` Steven Rostedt
2023-08-11 5:39 ` [PATCH v2 12/17] libtraceeval histogram: Add data copy callback Steven Rostedt
2023-08-11 5:39 ` [PATCH v2 13/17] libtraceeval histogram: Do the release on updates Steven Rostedt
2023-08-11 5:39 ` [PATCH v2 14/17] libtraceeval histogram: Use stack for old copy in update Steven Rostedt
2023-08-16 22:37 ` Ross Zwisler
2023-08-16 23:12 ` Steven Rostedt
2023-08-17 1:03 ` Steven Rostedt
2023-08-17 1:13 ` Steven Rostedt
2023-08-11 5:39 ` [PATCH v2 15/17] libtraceeval histogram: Add traceeval_iterator_sort_custom() Steven Rostedt
2023-08-16 22:57 ` Ross Zwisler
2023-08-16 23:22 ` Steven Rostedt
2023-08-11 5:39 ` [PATCH v2 16/17] libtraceeval histogram: Have traceeval_query() just give the pointer to results Steven Rostedt
2023-08-11 5:39 ` [PATCH v2 17/17] libtraceeval samples: Update task-eval to use the histogram logic Steven Rostedt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230815162333.621b6d8b@gandalf.local.home \
--to=rostedt@goodmis.org \
--cc=linux-trace-devel@vger.kernel.org \
--cc=stevie.6strings@gmail.com \
--cc=zwisler@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).