git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Karsten Blees <karsten.blees@gmail.com>
To: Matthieu Moy <Matthieu.Moy@grenoble-inp.fr>
Cc: Eric Sunshine <sunshine@sunshineco.com>,
	Tanay Abhra <tanayabh@gmail.com>, Git List <git@vger.kernel.org>,
	Ramkumar Ramachandra <artagnon@gmail.com>
Subject: Re: [RFC/PATCH] pager.c: replace git_config with git_config_get_string
Date: Sat, 28 Jun 2014 07:20:41 +0200	[thread overview]
Message-ID: <53AE50A9.6010707@gmail.com> (raw)
In-Reply-To: <vpqmwcyw47q.fsf@anie.imag.fr>

Am 27.06.2014 21:19, schrieb Matthieu Moy:
> 
> The reason for which setting any config value invalidates the cache is
> that it is _much_ simpler to implement. Less complexity, less
> corner-cases, less bugs.
> 

I think I see what you mean. E.g. in cmd_clone we have:

  write_config(&option_config);
  git_config(git_default_config, NULL);
  ...
  git_config_set("remote.Foo.url", repo);
  ...
  remote = remote_get(option_origin); /* which does 'git_config(handle_config)' */

So if we load the config cache at 'git_config(git_default_config)', then
'remote_get' won't see "remote.Foo.url" unless we invalidate the cache first.


I still don't like that the invalidation is done in git_config_set, though, as
this is also used to write completely unrelated files. Wouldn't it be better to
have a 'git_config_refresh()' that could be used in place of (or before) current
'git_config(callback)' calls? The initial implementation could just invalidate
the config cache. If there's time and energy to spare, a more advanced version
could first check if any of the involved config files has changed.


The xstrdup() problem could be solved by interning strings (see the
attached patch for a trivial implementation). I.e. allocate each distinct
string only once (and keep it allocated).

So if there are 100 instances of "true" in the config file, the interned string
pool would contain only one instance (i.e. 5 bytes + hashmap_entry + malloc
overhead, vs. 100 * (5 bytes + malloc overhead) in case of xstrdup()). If the
config cache is cleared, the interned string in the pool would still remain
valid. If the config cache is reloaded, unmodified values would reuse the
existing strings from the pool.

Side note: interning getenv() results would fix the non-POSIX-compliant
getenv()-usage in git [1].

[1] https://groups.google.com/d/msg/msysgit/4cVWWJkN4to/DR8FGTQ09Q0J

---- 8< ----
Subject: [PATCH] hashmap: add string interning API

Interning short strings that are likely to have multiple copies may reduce
memory footprint and speed up string comparisons.

Add a strintern() API that uses a hashmap to manage the pool of interned
strings.

Signed-off-by: Karsten Blees <blees@dcon.de>
---
 Documentation/technical/api-hashmap.txt | 11 +++++++++++
 hashmap.c                               | 34 +++++++++++++++++++++++++++++++++
 hashmap.h                               |  4 ++++
 t/t0011-hashmap.sh                      | 13 +++++++++++++
 test-hashmap.c                          | 14 ++++++++++++++
 5 files changed, 76 insertions(+)

diff --git a/Documentation/technical/api-hashmap.txt b/Documentation/technical/api-hashmap.txt
index b977ae8..db7c955 100644
--- a/Documentation/technical/api-hashmap.txt
+++ b/Documentation/technical/api-hashmap.txt
@@ -162,6 +162,17 @@ more entries.
 `hashmap_iter_first` is a combination of both (i.e. initializes the iterator
 and returns the first entry, if any).
 
+`const char *strintern(const char *string)`::
+
+	Returns the unique, interned version of the specified string, similar
+	to the `String.intern` API in Java and .NET. Interned strings remain
+	valid for the entire lifetime of the process.
++
+Can be used as `[x]strdup()` replacement, except that the interned string must
+not be modified or freed.
++
+Uses a hashmap to store the pool of interned strings.
+
 Usage example
 -------------
 
diff --git a/hashmap.c b/hashmap.c
index d1b8056..03cd8f3 100644
--- a/hashmap.c
+++ b/hashmap.c
@@ -226,3 +226,37 @@ void *hashmap_iter_next(struct hashmap_iter *iter)
 		current = iter->map->table[iter->tablepos++];
 	}
 }
+
+struct string_pool_entry {
+	struct hashmap_entry ent;
+	char key[FLEX_ARRAY];
+};
+
+static int string_pool_cmp(const struct string_pool_entry *e1,
+			   const struct string_pool_entry *e2,
+			   const char *key)
+{
+	return strcmp(e1->key, key ? key : e2->key);
+}
+
+const char *strintern(const char *string)
+{
+	static struct hashmap map;
+	struct string_pool_entry key, *e;
+	/* initialize string pool hashmap */
+	if (!map.tablesize)
+		hashmap_init(&map, (hashmap_cmp_fn) string_pool_cmp, 0);
+
+	/* lookup interned string in pool */
+	hashmap_entry_init(&key, strhash(string));
+	e = hashmap_get(&map, &key, string);
+	if (!e) {
+		/* not found: create it */
+		int len = strlen(string);
+		e = xmalloc(sizeof(struct string_pool_entry) + len + 1);
+		hashmap_entry_init(e, key.ent.hash);
+		memcpy(e->key, string, len + 1);
+		hashmap_add(&map, e);
+	}
+	return e->key;
+}
diff --git a/hashmap.h b/hashmap.h
index a816ad4..b428677 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -68,4 +68,8 @@ static inline void *hashmap_iter_first(struct hashmap *map,
 	return hashmap_iter_next(iter);
 }
 
+/* string interning */
+
+extern const char *strintern(const char *string);
+
 #endif
diff --git a/t/t0011-hashmap.sh b/t/t0011-hashmap.sh
index 391e2b6..f97c805 100755
--- a/t/t0011-hashmap.sh
+++ b/t/t0011-hashmap.sh
@@ -237,4 +237,17 @@ test_expect_success 'grow / shrink' '
 
 '
 
+test_expect_success 'string interning' '
+
+test_hashmap "intern value1
+intern Value1
+intern value2
+intern value2
+" "value1
+Value1
+value2
+value2"
+
+'
+
 test_done
diff --git a/test-hashmap.c b/test-hashmap.c
index f5183fb..116cbb5 100644
--- a/test-hashmap.c
+++ b/test-hashmap.c
@@ -239,6 +239,20 @@ int main(int argc, char *argv[])
 			/* print table sizes */
 			printf("%u %u\n", map.tablesize, map.size);
 
+		} else if (!strcmp("intern", cmd) && l1) {
+
+			/* test that strintern works */
+			const char *i1 = strintern(p1);
+			const char *i2 = strintern(p1);
+			if (strcmp(i1, p1))
+				printf("strintern(%s) returns %s\n", p1, i1);
+			else if (i1 == p1)
+				printf("strintern(%s) returns input pointer\n", p1);
+			else if (i1 != i2)
+				printf("strintern(%s) != strintern(%s)", i1, i2);
+			else
+				printf("%s\n", i1);
+
 		} else if (!strcmp("perfhashmap", cmd) && l1 && l2) {
 
 			perf_hashmap(atoi(p1), atoi(p2));
-- 
2.0.0.9649.g1eafa1f.dirty

  reply	other threads:[~2014-06-28  5:20 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-06-23 10:41 [RFC/PATCH V2] alias.c: replace git_config with git_config_get_string Tanay Abhra
2014-06-23 10:41 ` [RFC/PATCH V2] branch.c: " Tanay Abhra
2014-06-25  4:45   ` Eric Sunshine
2014-06-26  8:09     ` Tanay Abhra
2014-06-29 11:06       ` Eric Sunshine
2014-06-23 10:41 ` [RFC/PATCH] imap-send.c: " Tanay Abhra
2014-06-25  7:09   ` Eric Sunshine
2014-06-26  8:14     ` Tanay Abhra
2014-06-26 16:50   ` Matthieu Moy
2014-06-26 23:57     ` Karsten Blees
2014-06-23 10:41 ` [RFC/PATCH] notes-util.c: " Tanay Abhra
2014-06-25  7:54   ` Eric Sunshine
2014-06-26  8:19     ` Tanay Abhra
2014-06-29 11:01       ` Eric Sunshine
2014-06-30 13:34         ` Karsten Blees
2014-06-30 14:32           ` Eric Sunshine
2014-06-30 14:54             ` Karsten Blees
2014-06-30 14:39           ` Tanay Abhra
2014-06-30 15:56             ` Karsten Blees
2014-06-30 16:21               ` Tanay Abhra
2014-06-30 17:52               ` Junio C Hamano
2014-07-01  8:36             ` Matthieu Moy
2014-06-23 10:41 ` [RFC/PATCH] notes.c: " Tanay Abhra
2014-06-25  8:06   ` Eric Sunshine
2014-06-26  8:20     ` Tanay Abhra
2014-06-23 10:41 ` [RFC/PATCH] pager.c: " Tanay Abhra
2014-06-25  3:59   ` Eric Sunshine
2014-06-26  8:24     ` Tanay Abhra
2014-06-26 18:46     ` Karsten Blees
2014-06-27 11:55       ` Matthieu Moy
2014-06-27 16:57         ` Karsten Blees
2014-06-27 19:19           ` Matthieu Moy
2014-06-28  5:20             ` Karsten Blees [this message]
2014-06-28  6:01               ` Matthieu Moy
2014-06-28 14:29                 ` Karsten Blees
2014-06-29 12:04                   ` Matthieu Moy
2014-06-23 22:38 ` [RFC/PATCH V2] alias.c: " Jonathan Nieder
2014-06-24  1:50   ` Tanay Abhra
2014-06-25  2:12 ` Eric Sunshine
2014-06-26  8:24   ` Tanay Abhra
2014-06-26 16:39 ` Matthieu Moy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53AE50A9.6010707@gmail.com \
    --to=karsten.blees@gmail.com \
    --cc=Matthieu.Moy@grenoble-inp.fr \
    --cc=artagnon@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=sunshine@sunshineco.com \
    --cc=tanayabh@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).