From: Junio C Hamano <gitster@pobox.com>
To: Nicolas Pitre <nico@fluxnic.net>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 01/23] pack v4: initial pack dictionary structure and code
Date: Tue, 27 Aug 2013 08:08:08 -0700 [thread overview]
Message-ID: <xmqqbo4jf9yf.fsf@gitster.dls.corp.google.com> (raw)
In-Reply-To: <1377577567-27655-2-git-send-email-nico@fluxnic.net> (Nicolas Pitre's message of "Tue, 27 Aug 2013 00:25:45 -0400")
Nicolas Pitre <nico@fluxnic.net> writes:
> Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
> ---
Was there a reason not to reuse the hash-table Linus did in
hash.[ch]?
It may not make much of a difference for something so small and
isolated from the rest of the system, but if hash.[ch] can be easily
fixed with a small tweak to suit the use by this subsystem better,
it might be worth reusing the existing code with improvement, which
may help other potential users.
> packv4-create.c | 137 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 137 insertions(+)
> create mode 100644 packv4-create.c
>
> diff --git a/packv4-create.c b/packv4-create.c
> new file mode 100644
> index 0000000..2de6d41
> --- /dev/null
> +++ b/packv4-create.c
> @@ -0,0 +1,137 @@
> +/*
> + * packv4-create.c: management of dictionary tables used in pack v4
> + *
> + * (C) Nicolas Pitre <nico@fluxnic.net>
> + *
> + * This code is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include "cache.h"
> +
> +struct data_entry {
> + unsigned offset;
> + unsigned hits;
> +};
> +
> +struct dict_table {
> + char *data;
> + unsigned ptr;
> + unsigned size;
> + struct data_entry *entry;
> + unsigned nb_entries;
> + unsigned max_entries;
> + unsigned *hash;
> + unsigned hash_size;
> +};
> +
> +struct dict_table *create_dict_table(void)
> +{
> + return xcalloc(sizeof(struct dict_table), 1);
> +}
> +
> +void destroy_dict_table(struct dict_table *t)
> +{
> + free(t->data);
> + free(t->entry);
> + free(t->hash);
> + free(t);
> +}
> +
> +static int locate_entry(struct dict_table *t, const char *str)
> +{
> + int i = 0;
> + const unsigned char *s = (const unsigned char *) str;
> +
> + while (*s)
> + i = i * 111 + *s++;
> + i = (unsigned)i % t->hash_size;
> +
> + while (t->hash[i]) {
> + unsigned n = t->hash[i] - 1;
> + if (!strcmp(str, t->data + t->entry[n].offset))
> + return n;
> + if (++i >= t->hash_size)
> + i = 0;
> + }
> + return -1 - i;
> +}
> +
> +static void rehash_entries(struct dict_table *t)
> +{
> + unsigned n;
> +
> + t->hash_size *= 2;
> + if (t->hash_size < 1024)
> + t->hash_size = 1024;
> + t->hash = xrealloc(t->hash, t->hash_size * sizeof(*t->hash));
> + memset(t->hash, 0, t->hash_size * sizeof(*t->hash));
> +
> + for (n = 0; n < t->nb_entries; n++) {
> + int i = locate_entry(t, t->data + t->entry[n].offset);
> + if (i < 0)
> + t->hash[-1 - i] = n + 1;
> + }
> +}
> +
> +int dict_add_entry(struct dict_table *t, const char *str)
> +{
> + int i, len = strlen(str) + 1;
> +
> + if (t->ptr + len >= t->size) {
> + t->size = (t->size + len + 1024) * 3 / 2;
> + t->data = xrealloc(t->data, t->size);
> + }
> + memcpy(t->data + t->ptr, str, len);
> +
> + i = (t->nb_entries) ? locate_entry(t, t->data + t->ptr) : -1;
> + if (i >= 0) {
> + t->entry[i].hits++;
> + return i;
> + }
> +
> + if (t->nb_entries >= t->max_entries) {
> + t->max_entries = (t->max_entries + 1024) * 3 / 2;
> + t->entry = xrealloc(t->entry, t->max_entries * sizeof(*t->entry));
> + }
> + t->entry[t->nb_entries].offset = t->ptr;
> + t->entry[t->nb_entries].hits = 1;
> + t->ptr += len + 1;
> + t->nb_entries++;
> +
> + if (t->hash_size * 3 <= t->nb_entries * 4)
> + rehash_entries(t);
> + else
> + t->hash[-1 - i] = t->nb_entries;
> +
> + return t->nb_entries - 1;
> +}
> +
> +static int cmp_dict_entries(const void *a_, const void *b_)
> +{
> + const struct data_entry *a = a_;
> + const struct data_entry *b = b_;
> + int diff = b->hits - a->hits;
> + if (!diff)
> + diff = a->offset - b->offset;
> + return diff;
> +}
> +
> +static void sort_dict_entries_by_hits(struct dict_table *t)
> +{
> + qsort(t->entry, t->nb_entries, sizeof(*t->entry), cmp_dict_entries);
> + t->hash_size = (t->nb_entries * 4 / 3) / 2;
> + rehash_entries(t);
> +}
> +
> +void dict_dump(struct dict_table *t)
> +{
> + int i;
> +
> + sort_dict_entries_by_hits(t);
> + for (i = 0; i < t->nb_entries; i++)
> + printf("%d\t%s\n",
> + t->entry[i].hits,
> + t->data + t->entry[i].offset);
> +}
next prev parent reply other threads:[~2013-08-27 15:08 UTC|newest]
Thread overview: 83+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-08-27 4:25 [PATCH 00/23] Preliminary pack v4 support Nicolas Pitre
2013-08-27 4:25 ` [PATCH 01/23] pack v4: initial pack dictionary structure and code Nicolas Pitre
2013-08-27 15:08 ` Junio C Hamano [this message]
2013-08-27 16:13 ` Nicolas Pitre
2013-08-27 4:25 ` [PATCH 02/23] export packed_object_info() Nicolas Pitre
2013-08-27 4:25 ` [PATCH 03/23] pack v4: scan tree objects Nicolas Pitre
2013-08-27 4:25 ` [PATCH 04/23] pack v4: add tree entry mode support to dictionary entries Nicolas Pitre
2013-08-27 4:25 ` [PATCH 05/23] pack v4: add commit object parsing Nicolas Pitre
2013-08-27 15:26 ` Junio C Hamano
2013-08-27 16:47 ` Nicolas Pitre
2013-08-27 17:42 ` Junio C Hamano
2013-08-27 4:25 ` [PATCH 06/23] pack v4: split the object list and dictionary creation Nicolas Pitre
2013-08-27 4:25 ` [PATCH 07/23] pack v4: move to struct pack_idx_entry and get rid of our own struct idx_entry Nicolas Pitre
2013-08-27 4:25 ` [PATCH 08/23] pack v4: basic references encoding Nicolas Pitre
2013-08-27 15:29 ` Junio C Hamano
2013-08-27 15:53 ` Nicolas Pitre
2013-08-27 4:25 ` [PATCH 09/23] pack v4: commit object encoding Nicolas Pitre
2013-08-27 15:39 ` Junio C Hamano
2013-08-27 16:50 ` Nicolas Pitre
2013-08-27 19:59 ` Nicolas Pitre
2013-08-27 20:15 ` Junio C Hamano
2013-08-27 21:43 ` Nicolas Pitre
2013-09-02 20:48 ` Duy Nguyen
2013-09-03 6:30 ` Nicolas Pitre
2013-09-03 7:41 ` Duy Nguyen
2013-09-05 3:50 ` Nicolas Pitre
2013-08-27 4:25 ` [PATCH 10/23] pack v4: tree " Nicolas Pitre
2013-08-27 15:44 ` Junio C Hamano
2013-08-27 16:52 ` Nicolas Pitre
2013-08-27 4:25 ` [PATCH 11/23] pack v4: dictionary table output Nicolas Pitre
2013-08-27 4:25 ` [PATCH 12/23] pack v4: creation code Nicolas Pitre
2013-08-27 15:48 ` Junio C Hamano
2013-08-27 16:59 ` Nicolas Pitre
2013-08-27 4:25 ` [PATCH 13/23] pack v4: object headers Nicolas Pitre
2013-08-27 4:25 ` [PATCH 14/23] pack v4: object data copy Nicolas Pitre
2013-08-27 15:53 ` Junio C Hamano
2013-08-27 18:24 ` Nicolas Pitre
2013-08-27 4:25 ` [PATCH 15/23] pack v4: object writing Nicolas Pitre
2013-08-27 4:26 ` [PATCH 16/23] pack v4: tree object delta encoding Nicolas Pitre
2013-08-27 4:26 ` [PATCH 17/23] pack v4: load delta candidate for encoding tree objects Nicolas Pitre
2013-08-27 4:26 ` [PATCH 18/23] pack v4: honor pack.compression config option Nicolas Pitre
2013-08-27 4:26 ` [PATCH 19/23] pack v4: relax commit parsing a bit Nicolas Pitre
2013-08-27 4:26 ` [PATCH 20/23] pack index v3 Nicolas Pitre
2013-08-27 4:26 ` [PATCH 21/23] pack v4: normalize pack name to properly generate the pack index file name Nicolas Pitre
2013-08-27 4:26 ` [PATCH 22/23] pack v4: add progress display Nicolas Pitre
2013-08-27 4:26 ` [PATCH 23/23] initial pack index v3 support on the read side Nicolas Pitre
2013-08-31 11:45 ` Duy Nguyen
2013-09-03 6:09 ` Nicolas Pitre
2013-09-03 7:34 ` Duy Nguyen
2013-08-27 11:17 ` [PATCH] Document pack v4 format Nguyễn Thái Ngọc Duy
2013-08-27 18:25 ` Junio C Hamano
2013-08-27 18:53 ` Nicolas Pitre
2013-08-31 2:49 ` [PATCH v2] " Nguyễn Thái Ngọc Duy
2013-09-03 6:00 ` Nicolas Pitre
2013-09-03 6:46 ` Nicolas Pitre
2013-09-03 11:49 ` Duy Nguyen
2013-09-03 14:54 ` Duy Nguyen
2013-09-05 4:12 ` Nicolas Pitre
2013-09-05 4:19 ` Duy Nguyen
2013-09-05 4:40 ` Nicolas Pitre
2013-09-05 5:04 ` Duy Nguyen
2013-09-05 5:39 ` Nicolas Pitre
2013-09-05 16:52 ` Duy Nguyen
2013-09-05 17:14 ` Nicolas Pitre
2013-09-05 20:26 ` Junio C Hamano
2013-09-05 21:04 ` Nicolas Pitre
2013-09-06 4:18 ` Duy Nguyen
2013-09-06 13:19 ` Nicolas Pitre
2013-09-06 2:14 ` [PATCH v3] " Nguyễn Thái Ngọc Duy
2013-09-06 3:23 ` Nicolas Pitre
2013-09-06 9:48 ` Duy Nguyen
2013-09-06 13:25 ` Nicolas Pitre
2013-09-06 13:44 ` Duy Nguyen
2013-09-06 16:44 ` Nicolas Pitre
2013-09-07 4:57 ` Nicolas Pitre
2013-09-07 4:52 ` Nicolas Pitre
2013-09-07 8:05 ` Duy Nguyen
2013-08-27 15:03 ` [PATCH 00/23] Preliminary pack v4 support Junio C Hamano
2013-08-27 15:59 ` Nicolas Pitre
2013-08-27 16:44 ` Junio C Hamano
2013-08-28 2:30 ` Duy Nguyen
2013-08-28 2:58 ` Nicolas Pitre
2013-08-28 3:06 ` Duy Nguyen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqqbo4jf9yf.fsf@gitster.dls.corp.google.com \
--to=gitster@pobox.com \
--cc=git@vger.kernel.org \
--cc=nico@fluxnic.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.