git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/3] git config cache & special querying api utilizing the cache
@ 2014-07-02  6:01 Tanay Abhra
  2014-07-02  6:01 ` [PATCH v4 1/2] add `config_set` API for caching config files Tanay Abhra
  2014-07-02  6:01 ` [PATCH 2/2] test-config: Add tests for the config_set API Tanay Abhra
  0 siblings, 2 replies; 17+ messages in thread
From: Tanay Abhra @ 2014-07-02  6:01 UTC (permalink / raw)
  To: git
  Cc: Tanay Abhra, Ramkumar Ramachandra, Matthieu Moy, Karsten Blees,
	Junio C Hamano, Eric Sunshine

Hi,

[PATCH v4]: Introduced `config_set` construct which points to a ordered set of
	config-files cached as hashmaps. Added relevant API functions. For more
	details see the documentation. Rewrote the git_config_get* family to use
	`config_set` internally. Added tests for both config_set API and git_config_get
	family. Added type specific API functions which parses the found value and
	converts it into a specific type.
	Most of the changes implemented are the result of discussion in [6].
	Thanks to Eric, Ramsay, Junio, Matthieu & Karsten for their suggestions
	and review.


[PATCH v3]: Added flag for NULL values that were causing segfaults in some cases.
	Added test-config for usage examples.
	Minor changes and corrections. Refer to discussion thread[5] for more details.
	Thanks to Matthieu, Jeff and Junio for their valuable suggestions.

[PATCH v2]:Changed the string_list to a struct instead of pointer to a struct.
	Added string-list initilization functions.
	Minor mistakes corrected acoording to review comments[4]. Thanks to
	Eric and Matthieu for their review.

[PATCH V1]:Most of the invaluable suggestions by Eric Sunshine, Torsten Bogershausen and
	Jeff King has been implemented[1]. Complete rewrite of config_cache*() family
	using git_config() as hook as suggested by Jeff. Thanks for the review.

[RFC V2]: Improved according to the suggestions by Eric Sunshine and Torsten Bogershausen.
	Added cache invalidation when config file is changed.[2]
	I am using git_config_set_multivar_in_file() as an update hook.

This is patch is for this year's GSoC. My project is
"Git Config API improvements". The link of my proposal is appended below [3].

The aim of this patch series is to generate a cache for querying values from
the config files in a non-callback manner as the current method reads and
parses the config files every time a value is queried for.

The cache is generated from hooking the update_cache function to the current
parsing and callback mechanism in config.c. It is implemented as an hashmap
using the hashmap-api with variables and its corresponding values list as
its members. The values in the list are sorted in order of increasing priority.
The cache is initialised the first time when any of the new query functions is
called. It is invalidated by using git_config_set_multivar_in_file() as an
update hook.

We add two new functions to the config-api, git_config_get_string() and
git_config_get_string_multi() for querying in a non callback manner from
the cache.

[1] http://marc.info/?t=140172066200006&r=1&w=2
[2] http://git.661346.n2.nabble.com/RFC-PATCH-0-2-Git-config-cache-amp-special-querying-api-utilizing-the-cache-td7611691.html
[3] https://drive.google.com/file/d/0B4suZ-aHqDcnSUZJRXVTTnZUN1E/edit?usp=sharing
[4] http://thread.gmane.org/gmane.comp.version-control.git/251073/focus=251369
[5] http://thread.gmane.org/gmane.comp.version-control.git/251704/
[6] http://thread.gmane.org/gmane.comp.version-control.git/252329/

Tanay Abhra (2):
  config-hash.c
  test-config

 .gitignore                             |   1 +
 Documentation/technical/api-config.txt | 144 ++++++++++++
 Makefile                               |   2 +
 cache.h                                |  41 ++++
 config-hash.c                          | 405 +++++++++++++++++++++++++++++++++
 config.c                               |   3 +
 t/t1308-config-hash.sh                 | 163 +++++++++++++
 test-config.c                          | 129 +++++++++++
 8 files changed, 888 insertions(+)
 create mode 100644 config-hash.c
 create mode 100755 t/t1308-config-hash.sh
 create mode 100644 test-config.c

-- 
1.9.0.GIT

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v4 1/2] add `config_set` API for caching config files
  2014-07-02  6:01 [PATCH v4 0/3] git config cache & special querying api utilizing the cache Tanay Abhra
@ 2014-07-02  6:01 ` Tanay Abhra
  2014-07-02  9:14   ` Matthieu Moy
  2014-07-02 17:00   ` Junio C Hamano
  2014-07-02  6:01 ` [PATCH 2/2] test-config: Add tests for the config_set API Tanay Abhra
  1 sibling, 2 replies; 17+ messages in thread
From: Tanay Abhra @ 2014-07-02  6:01 UTC (permalink / raw)
  To: git
  Cc: Tanay Abhra, Ramkumar Ramachandra, Matthieu Moy, Karsten Blees,
	Junio C Hamano, Eric Sunshine

Currently `git_config()` uses a callback mechanism and file rereads for
config values. Due to this approach, it is not uncommon for the config
files to be parsed several times during the run of a git program, with
different callbacks picking out different variables useful to themselves.

Add a `config_set` construct that points to an ordered set of config files
specified by the user, each of which represents what was read and cached
in core as hashmaps. Add two external functions `git_configset_get_value`
and `git_configset_get_value_multi` for querying from the config sets,
which follow `last one wins` semantic(i.e. if there are multiple matches
for the queried key in the files of the configset the value returned will
the last value in the last matching file of the configset) Add type
specific query functions like `git_configset_get_bool` and similar.

Add a default `config_set`, `the_config_set` to cache all key-value pairs
read from usual config files(repo specific .git/config, user wide
~/.gitconfig and the global /etc/gitconfig). `the_config_set` uses a
single hashmap populated using git_config(). The reason for doing so is
twofold, one is to honour include directives, another is to guarantee O(1)
lookups for usual config values, as values are queried for hundred of
times during a run of a git program.
Add two external functions `git_config_get_value` and 
`git_config_get_value_multi` for querying in a non-callback manner from
`the_config_set`. Also, add type specific query functions that are
implemented as a thin wrapper around the `config_set` API.

Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
---
 Documentation/technical/api-config.txt | 144 ++++++++++++
 Makefile                               |   1 +
 cache.h                                |  41 ++++
 config-hash.c                          | 405 +++++++++++++++++++++++++++++++++
 config.c                               |   3 +
 5 files changed, 594 insertions(+)
 create mode 100644 config-hash.c

diff --git a/Documentation/technical/api-config.txt b/Documentation/technical/api-config.txt
index 230b3a0..2c02fee 100644
--- a/Documentation/technical/api-config.txt
+++ b/Documentation/technical/api-config.txt
@@ -77,6 +77,75 @@ To read a specific file in git-config format, use
 `git_config_from_file`. This takes the same callback and data parameters
 as `git_config`.
 
+Querying For Specific Variables
+-------------------------------
+
+For programs wanting to query for specific variables in a non-callback
+manner, the config API provides two functions `git_config_get_value`
+and `git_config_get_value_multi`.They both read values from an internal
+cache generated previously from reading the config files.
+
+`git_config_get_value` takes two parameters,
+
+- a key string in canonical flat form for which the corresponding value
+  with the highest priority (i.e. value in the repo config will be
+  preferred over value in user wide config for the same variable) will
+  be retrieved.
+
+- a pointer to a string which will point to the retrieved value. The caller
+  should not free or modify the value returned as it is owned by the cache.
+
+`git_config_get_value` returns 0 on success, or -1 for no value found.
+
+`git_config_get_value_multi` allocates and returns a `string_list`
+containing all the values for the key passed as parameter, sorted in order
+of increasing priority (Note: caller has to call `string_list_clear` on
+the returned pointer and then free it).
+
+`git_config_get_value_multi` returns NULL for no value found.
+
+`git_config_clear` is provided for resetting and invalidating the cache.
+
+`git_config_iter` gives a way to iterate over the keys in cache. Existing
+`git_config` callback function signature is used for iterating.
+
+The config API also provides type specific API functions which do conversion
+as well as retrieval for the queried variable, including:
+
+`git_config_get_int`::
+Parse the retrieved value to an integer, including unit factors. Dies on
+error; otherwise, allocates and copies the integer into the dest parameter.
+Returns 0 on success, or 1 if no value was found.
+
+`git_config_get_ulong`::
+Identical to `git_config_get_int` but for unsigned longs.
+
+`git_config_bool`::
+Parse the retrieved value into a boolean value, respecting keywords like
+"true" and "false". Integer values are converted into true/false values
+(when they are non-zero or zero, respectively). Other values cause a die().
+If parsing is successful, allocates and copies the result into the dest
+parameter. Returns 0 on success, or 1 if no value is found.
+
+`git_config_get_bool_or_int`::
+Same as `git_config_get_bool`, except that integers are copied as-is, and
+an `is_bool` flag is unset.
+
+`git_config_get_maybe_bool`::
+Same as `git_config_get_bool`, except that it returns -1 on error rather
+than dying.
+
+`git_config_get_string`::
+Allocates and copies the retrieved value string into the `dest` parameter;
+if NULL string is given, prints an error message and returns -1. Returns 1,
+if no value is found for the queried variable .
+
+`git_config_get_pathname`::
+Similar to `git_config_get_string`, but expands `~` or `~user` into the
+user's home directory when found at the beginning of the path.
+
+See test-config.c for usage examples.
+
 Value Parsing Helpers
 ---------------------
 
@@ -134,6 +203,81 @@ int read_file_with_include(const char *file, config_fn_t fn, void *data)
 `git_config` respects includes automatically. The lower-level
 `git_config_from_file` does not.
 
+Custom Configsets
+-----------------
+
+A `config_set` points at an ordered set of config files, each of
+which represents what was read and cached in-core from a file.
+It can be used to construct an in-memory cache for config files that
+the caller specifies. For example,
+
+---------------------------------------
+struct config_set gm_config = CONFIG_SET_INIT;
+int b;
+/* we add config files to the config_set */
+git_configset_add_file(&gm_config, ".gitmodules");
+git_configset_add_file(&gm_config, ".gitmodules_alt");
+
+if (!git_configset_get_bool(gm_config, "submodule.frotz.ignore", &b)) {
+/* hack hack hack */
+}
+
+/* when we are done with the configset */
+git_configset_clear(&gm_config);
+----------------------------------------
+
+Configset API provides functions for the above mentioned work flow, including:
+
+`git_configset_init`::
+
+Returns an allocated and initialized struct `config_set` pointer.
+
+`git_configset_add_file`::
+
+Parses the file and constructs an in core cache which is appended at the end of
+the config_set, dies if there is an error in parsing the file.
+
+`git_configset_get_value`::
+
+Follows "last one wins" semantic, i.e., if there are multiple matches for the
+queried key in the files of the configset the value returned will the last value
+in the last matching file of the configset.
+
+It takes three parameters,
+
+- the `config_set` from which the value will be queried.
+
+- a key string in canonical flat form
+
+- a pointer to a string which will point to the retrieved value. The caller
+  should not free or modify the value returned as it is owned by the cache.
+
+`git_configset_get_value` returns 0 on success, or -1 for no value found.
+
+`git_configset_get_value_multi`::
+
+`git_configset_get_value_multi` allocates and returns a `string_list`
+containing all the values for the key passed as parameter, sorted in order
+of increasing priority (Note: caller has to call `string_list_clear` on
+the returned pointer and then free it).
+
+`git_configset_get_value_multi` returns NULL for no value found.
+
+'git_configset_iter'::
+
+It gives a way to iterate over the keys in config_set.
+Existing `git_config` callback function signature is used for iterating.
+
+`git_configset_clear`::
+It is used to clear the config_set after use. If the `config_set` was dynamically
+allocated using `git_configset_init`, the caller has to free the `config_set`
+after calling `git_configset_clear`.
+
+In addition to above the config_set API provides type specific functions in the
+vein of `git_config_get_int` but with an extra parameter, `config_set`. They
+all behave similarly to the `git_config_get*()` family described in
+"Querying For Specific Variables" above.
+
 Writing Config Files
 --------------------
 
diff --git a/Makefile b/Makefile
index 07ea105..d503f78 100644
--- a/Makefile
+++ b/Makefile
@@ -777,6 +777,7 @@ LIB_OBJS += commit.o
 LIB_OBJS += compat/obstack.o
 LIB_OBJS += compat/terminal.o
 LIB_OBJS += config.o
+LIB_OBJS += config-hash.o
 LIB_OBJS += connect.o
 LIB_OBJS += connected.o
 LIB_OBJS += convert.o
diff --git a/cache.h b/cache.h
index df65231..005e945 100644
--- a/cache.h
+++ b/cache.h
@@ -7,6 +7,7 @@
 #include "advice.h"
 #include "gettext.h"
 #include "convert.h"
+#include "string-list.h"
 
 #include SHA1_HEADER
 #ifndef git_SHA_CTX
@@ -1325,6 +1326,46 @@ extern int parse_config_key(const char *var,
 			    const char **subsection, int *subsection_len,
 			    const char **key);
 
+/* config-hash.c */
+#define CONFIG_SET_INIT { NULL, 0, 0 }
+
+struct config_set {
+	struct config_set_item *item;
+	unsigned int nr, alloc;
+};
+
+struct config_set_item {
+	struct hashmap config_hash;
+	char *filename;
+};
+
+extern struct config_set *git_configset_init(void);
+extern int git_configset_add_file(struct config_set *cs, const char *filename);
+extern int git_configset_get_value(struct config_set *cs, const char *key, const char **value);
+extern const struct string_list *git_configset_get_value_multi(struct config_set *cs, const char *key);
+extern void git_configset_clear(struct config_set *cs);
+extern void git_configset_iter(struct config_set *cs, config_fn_t fn, void *data);
+extern int git_configset_get_string(struct config_set *cs, const char *key, const char **dest);
+extern int git_configset_get_int(struct config_set *cs, const char *key, int *dest);
+extern int git_configset_get_ulong(struct config_set *cs, const char *key, unsigned long *dest);
+extern int git_configset_get_bool(struct config_set *cs, const char *key, int *dest);
+extern int git_configset_get_bool_or_int(struct config_set *cs, const char *key, int *is_bool, int *dest);
+extern int git_configset_get_maybe_bool(struct config_set *cs, const char *key, int *dest);
+extern int git_configset_get_pathname(struct config_set *cs, const char *key, const char **dest);
+
+extern int git_config_get_value(const char *key, const char **value);
+extern const struct string_list *git_config_get_value_multi(const char *key);
+extern void git_config_clear(void);
+extern void git_config_iter(config_fn_t fn, void *data);
+extern int git_config_get_string(const char *key, const char **dest);
+extern int git_config_get_int(const char *key, int *dest);
+extern int git_config_get_ulong(const char *key, unsigned long *dest);
+extern int git_config_get_bool(const char *key, int *dest);
+extern int git_config_get_bool_or_int(const char *key, int *is_bool, int *dest);
+extern int git_config_get_maybe_bool(const char *key, int *dest);
+extern int git_config_get_pathname(const char *key, const char **dest);
+
+
 extern int committer_ident_sufficiently_given(void);
 extern int author_ident_sufficiently_given(void);
 
diff --git a/config-hash.c b/config-hash.c
new file mode 100644
index 0000000..3e45d03
--- /dev/null
+++ b/config-hash.c
@@ -0,0 +1,405 @@
+#include "cache.h"
+#include "hashmap.h"
+#include "string-list.h"
+
+
+/*
+ * Default config_set that contains key-value pairs from the usual set of config
+ * config files (i.e repo specific .git/config, user wide ~/.gitconfig and the
+ * global /etc/gitconfig)
+ */
+static struct config_set the_config_set;
+static int the_config_set_initialised = 0;
+
+struct config_hash_entry {
+	struct hashmap_entry ent;
+	char *key;
+	struct string_list value_list;
+};
+
+static int config_hash_add_value(const char *, const char *,
+				  struct config_set_item *, struct config_set *);
+
+static int config_hash_entry_cmp(const struct config_hash_entry *e1,
+				 const struct config_hash_entry *e2, const void *unused)
+{
+	return strcmp(e1->key, e2->key);
+}
+
+static void config_hash_init(struct hashmap *config_hash)
+{
+	hashmap_init(config_hash, (hashmap_cmp_fn)config_hash_entry_cmp, 0);
+}
+
+struct config_set_cb {
+	struct config_set *cs;
+	struct config_set_item *ct;
+};
+
+static int config_hash_callback(const char *key, const char *value, void *cb)
+{
+	struct config_set_cb *c = cb;
+	config_hash_add_value(key, value, c->ct, c->cs);
+	return 0;
+}
+
+static struct hashmap *add_config_hash(const char *filename, struct config_set *cs)
+{
+	int ret = 0;
+	struct config_set_item *ct;
+	struct config_set_cb cb;
+	ALLOC_GROW(cs->item, cs->nr + 1, cs->alloc);
+	ct = &cs->item[cs->nr++];
+	ct->filename = xstrdup(filename);
+	config_hash_init(&ct->config_hash);
+	cb.cs = cs;
+	cb.ct = ct;
+	/*
+	 * When filename is "default", hashmap is constructed from the usual set of
+	 * config files (i.e repo specific .git/config, user wide ~/.gitconfig and the
+	 * global /etc/gitconfig), used in `the_config_set`
+	 */
+	if (!strcmp(filename, "default"))
+		git_config(config_hash_callback, &cb);
+	else
+		ret = git_config_from_file(config_hash_callback, filename, &cb);
+	if (!ret)
+		return &ct->config_hash;
+	else {
+		free(ct->filename);
+		hashmap_free(&ct->config_hash, 1);
+		cs->nr--;
+		return NULL;
+	}
+}
+
+static struct hashmap *get_config_hash(const char *filename, struct config_set *cs)
+{
+	int i;
+	for(i = 0; i < cs->nr; i++) {
+		if (!strcmp(cs->item[i].filename, filename))
+			return &cs->item[i].config_hash;
+	}
+	return add_config_hash(filename, cs);
+}
+
+static struct config_hash_entry *config_hash_find_entry(const char *key, const char* filename,
+							struct config_set *cs)
+{
+	struct hashmap *config_hash;
+	struct config_hash_entry k;
+	struct config_hash_entry *found_entry;
+	char *normalized_key;
+	int ret;
+	config_hash = get_config_hash(filename, cs);
+	ret = git_config_parse_key(key, &normalized_key, NULL);
+
+	if (ret)
+		return NULL;
+
+	hashmap_entry_init(&k, strhash(normalized_key));
+	k.key = normalized_key;
+	found_entry = hashmap_get(config_hash, &k, NULL);
+	free(normalized_key);
+	return found_entry;
+}
+
+static struct string_list *config_hash_get_value(const char *key, const char* filename, 
+						 struct config_set *cs)
+{
+	struct config_hash_entry *e = config_hash_find_entry(key, filename, cs);
+	return e ? &e->value_list : NULL;
+}
+
+static int config_hash_add_value(const char *key, const char *value,
+				 struct config_set_item *ct, struct config_set *cs)
+{
+	struct hashmap *config_hash = &ct->config_hash;
+	struct config_hash_entry *e;
+	e = config_hash_find_entry(key, ct->filename, cs);
+
+	if (!e) {
+		e = xmalloc(sizeof(*e));
+		hashmap_entry_init(e, strhash(key));
+		e->key = xstrdup(key);
+		memset(&e->value_list, 0, sizeof(e->value_list));
+		e->value_list.strdup_strings = 1;
+		hashmap_add(config_hash, e);
+	}
+	/*
+	 * Since the values are being fed by git_config*() callback mechanism, they
+	 * are already normalised. So simply add them without any further munging.
+	 */
+	string_list_append_nodup(&e->value_list, value ? xstrdup(value) : NULL);
+
+	return 0;
+}
+
+struct config_set *git_configset_init(void)
+{
+	struct config_set *cs = xcalloc(1, sizeof(*cs));
+	return cs;
+}
+
+int git_configset_add_file(struct config_set *cs, const char *filename)
+{
+	return get_config_hash(filename, cs) ? 0 : -1;
+}
+
+int git_configset_get_value(struct config_set *cs, const char *key, const char **value)
+{
+	int i;
+	struct string_list *values = NULL;
+	/*
+	 * Follows "last one wins" semantic, i.e., if there are multiple matches for the
+	 * queried key in the files of the configset the value returned will the last value
+	 * in the last matching file of the configset
+	 */
+	assert(cs->nr > 0);
+	for (i = cs->nr-1; i >= 0; i--) {
+		values = config_hash_get_value(key, cs->item[i].filename, cs);
+		if (values)
+			break;
+	}
+	if (!values)
+		return -1;
+	assert(values->nr > 0);
+	*value = values->items[values->nr - 1].string;
+	return 0;
+}
+
+const struct string_list *git_configset_get_value_multi(struct config_set *cs, const char *key)
+{
+	int i, j;
+	struct string_list *values = NULL;
+	struct string_list *retval = xcalloc(1, sizeof(*retval));
+	for (i = 0; i < cs->nr; i++) {
+		values = config_hash_get_value(key, cs->item[i].filename, cs);
+		if (values)
+			for(j = 0; j < values->nr; j++)
+				string_list_append_nodup(retval, values->items[j].string);
+	}
+	if (!retval->nr) {
+		free(retval);
+		return NULL;
+	}
+	return retval;
+}
+
+/*
+ * Uses the existing git_config() callback function signature to iterate over
+ * keys in the configset.
+ */
+void git_configset_iter(struct config_set *cs, config_fn_t fn, void *data)
+{
+	int i;
+	char *value;
+	struct string_list *strptr;
+	struct config_hash_entry *entry;
+	struct hashmap *config_hash;
+	struct hashmap_iter iter;
+	for (i = 0; i < cs->nr; i++) {
+		config_hash = &cs->item[i].config_hash;
+		hashmap_iter_init(config_hash, &iter);
+		while ((entry = hashmap_iter_next(&iter))) {
+			strptr = &entry->value_list;
+			for (i = 0; i < strptr->nr; i++) {
+				value = strptr->items[i].string;
+				fn(entry->key, value, data);
+			}
+		}
+	}
+}
+
+void git_configset_clear(struct config_set *cs)
+{
+	int i;
+	struct hashmap *config_hash;
+	struct config_hash_entry *entry;
+	struct hashmap_iter iter;
+	/* No files in config_set so return */
+	if (!cs->nr)
+		return;
+	for (i = 0; i < cs->nr; i++) {
+		config_hash = &cs->item[i].config_hash;
+		free(cs->item[i].filename);
+		hashmap_iter_init(config_hash, &iter);
+		while ((entry = hashmap_iter_next(&iter))) {
+			free(entry->key);
+			string_list_clear(&entry->value_list, 0);
+		}
+		hashmap_free(config_hash, 1);
+	}
+	free(cs->item);
+	cs->item = NULL;
+	cs->nr = cs->alloc = 0;
+}
+
+int git_configset_get_string(struct config_set *cs, const char *key, const char **dest)
+{
+	const char *value;
+	if (!git_configset_get_value(cs, key, &value))
+		return git_config_string(dest, key, value);
+	else
+		return 1;
+}
+
+int git_configset_get_int(struct config_set *cs, const char *key, int *dest)
+{
+	const char *value;
+	if (!git_configset_get_value(cs, key, &value)) {
+		*dest = git_config_int(key, value);
+		return 0;
+	} else
+		return 1;
+}
+
+int git_configset_get_ulong(struct config_set *cs, const char *key, unsigned long *dest)
+{
+	const char *value;
+	if (!git_configset_get_value(cs, key, &value)) {
+		*dest = git_config_ulong(key, value);
+		return 0;
+	} else
+		return 1;
+}
+
+int git_configset_get_bool(struct config_set *cs, const char *key, int *dest)
+{
+	const char *value;
+	if (!git_configset_get_value(cs, key, &value)) {
+		*dest = git_config_bool(key, value);
+		return 0;
+	} else
+		return 1;
+}
+
+int git_configset_get_bool_or_int(struct config_set *cs, const char *key,
+				int *is_bool, int *dest)
+{
+	const char *value;
+	if (!git_configset_get_value(cs, key, &value)) {
+		*dest = git_config_bool_or_int(key, value, is_bool);
+		return 0;
+	} else
+		return 1;
+}
+
+int git_configset_get_maybe_bool(struct config_set *cs, const char *key, int *dest)
+{
+	const char *value;
+	if (!git_configset_get_value(cs, key, &value)) {
+		*dest = git_config_maybe_bool(key, value);
+		if (*dest == -1)
+			return -1;
+		return 0;
+	} else
+		return 1;
+}
+
+int git_configset_get_pathname(struct config_set *cs, const char *key, const char **dest)
+{
+	const char *value;
+	if (!git_configset_get_value(cs, key, &value))
+		return git_config_pathname(dest, key, value);
+	else
+		return 1;
+}
+
+int git_config_get_value(const char *key, const char **value)
+{
+	if (!the_config_set_initialised) {
+		git_configset_add_file(&the_config_set, "default");
+		the_config_set_initialised = 1;
+	}
+	return git_configset_get_value(&the_config_set, key, value);
+}
+
+const struct string_list *git_config_get_value_multi(const char *key)
+{
+	if (!the_config_set_initialised) {
+		git_configset_add_file(&the_config_set, "default");
+		the_config_set_initialised = 1;
+	}
+	return git_configset_get_value_multi(&the_config_set, key);
+}
+
+void git_config_clear(void)
+{
+	if (!the_config_set_initialised)
+		return;
+	git_configset_clear(&the_config_set);
+	the_config_set_initialised = 0;
+}
+
+void git_config_iter(config_fn_t fn, void *data)
+{
+	if (!the_config_set_initialised) {
+		git_configset_add_file(&the_config_set, "default");
+		the_config_set_initialised = 1;
+	}
+	git_configset_iter(&the_config_set, fn, data);
+}
+
+int git_config_get_string(const char *key, const char **dest)
+{
+	if (!the_config_set_initialised) {
+		git_configset_add_file(&the_config_set, "default");
+		the_config_set_initialised = 1;
+	}
+	return git_configset_get_string(&the_config_set, key, dest);
+}
+
+int git_config_get_int(const char *key, int *dest)
+{
+	if (!the_config_set_initialised) {
+		git_configset_add_file(&the_config_set, "default");
+		the_config_set_initialised = 1;
+	}
+	return git_configset_get_int(&the_config_set, key, dest);
+}
+
+int git_config_get_ulong(const char *key, unsigned long *dest)
+{
+	if (!the_config_set_initialised) {
+		git_configset_add_file(&the_config_set, "default");
+		the_config_set_initialised = 1;
+	}
+	return git_configset_get_ulong(&the_config_set, key, dest);
+}
+
+int git_config_get_bool(const char *key, int *dest)
+{
+	if (!the_config_set_initialised) {
+		git_configset_add_file(&the_config_set, "default");
+		the_config_set_initialised = 1;
+	}
+	return git_configset_get_bool(&the_config_set, key, dest);
+}
+
+int git_config_get_bool_or_int(const char *key, int *is_bool, int *dest)
+{
+	if (!the_config_set_initialised) {
+		git_configset_add_file(&the_config_set, "default");
+		the_config_set_initialised = 1;
+	}
+	return git_configset_get_bool_or_int(&the_config_set, key, is_bool, dest);
+}
+
+int git_config_get_maybe_bool(const char *key, int *dest)
+{
+	if (!the_config_set_initialised) {
+		git_configset_add_file(&the_config_set, "default");
+		the_config_set_initialised = 1;
+	}
+	return git_configset_get_maybe_bool(&the_config_set, key, dest);
+}
+
+int git_config_get_pathname(const char *key, const char **dest)
+{
+	if (!the_config_set_initialised) {
+		git_configset_add_file(&the_config_set, "default");
+		the_config_set_initialised = 1;
+	}
+	return git_configset_get_pathname(&the_config_set, key, dest);
+}
diff --git a/config.c b/config.c
index a1aef1c..6cffec7 100644
--- a/config.c
+++ b/config.c
@@ -1708,6 +1708,9 @@ int git_config_set_multivar_in_file(const char *config_filename,
 	lock = NULL;
 	ret = 0;
 
+	/* Invalidate the config cache */
+	git_config_clear();
+
 out_free:
 	if (lock)
 		rollback_lock_file(lock);
-- 
1.9.0.GIT

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/2] test-config: Add tests for the config_set API
  2014-07-02  6:01 [PATCH v4 0/3] git config cache & special querying api utilizing the cache Tanay Abhra
  2014-07-02  6:01 ` [PATCH v4 1/2] add `config_set` API for caching config files Tanay Abhra
@ 2014-07-02  6:01 ` Tanay Abhra
  2014-07-02  9:29   ` Matthieu Moy
  1 sibling, 1 reply; 17+ messages in thread
From: Tanay Abhra @ 2014-07-02  6:01 UTC (permalink / raw)
  To: git
  Cc: Tanay Abhra, Ramkumar Ramachandra, Matthieu Moy, Karsten Blees,
	Junio C Hamano, Eric Sunshine

Expose the `config_set` C API as a set of simple commands in order to
facilitate testing. Add tests for the `config_set` API as well as for
`git_config_get_*()` family for the usual config files.

Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
---
 .gitignore             |   1 +
 Makefile               |   1 +
 t/t1308-config-hash.sh | 163 +++++++++++++++++++++++++++++++++++++++++++++++++
 test-config.c          | 129 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 294 insertions(+)
 create mode 100755 t/t1308-config-hash.sh
 create mode 100644 test-config.c

diff --git a/.gitignore b/.gitignore
index 42294e5..eeb66e2 100644
--- a/.gitignore
+++ b/.gitignore
@@ -176,6 +176,7 @@
 /gitweb/static/gitweb.js
 /gitweb/static/gitweb.min.*
 /test-chmtime
+/test-config
 /test-ctype
 /test-date
 /test-delta
diff --git a/Makefile b/Makefile
index d503f78..e875070 100644
--- a/Makefile
+++ b/Makefile
@@ -548,6 +548,7 @@ X =
 PROGRAMS += $(patsubst %.o,git-%$X,$(PROGRAM_OBJS))
 
 TEST_PROGRAMS_NEED_X += test-chmtime
+TEST_PROGRAMS_NEED_X += test-config
 TEST_PROGRAMS_NEED_X += test-ctype
 TEST_PROGRAMS_NEED_X += test-date
 TEST_PROGRAMS_NEED_X += test-delta
diff --git a/t/t1308-config-hash.sh b/t/t1308-config-hash.sh
new file mode 100755
index 0000000..6c059c5
--- /dev/null
+++ b/t/t1308-config-hash.sh
@@ -0,0 +1,163 @@
+#!/bin/sh
+
+test_description='Test git config-hash API in different settings'
+
+. ./test-lib.sh
+
+test_expect_success 'clear default config' '
+	rm -f .git/config
+'
+
+cat > .git/config << EOF
+[core]
+	penguin = very blue
+	Movie = BadPhysics
+	UPPERCASE = true
+	MixedCase = true
+	my =
+	foo
+	baz = sam
+[Cores]
+	WhatEver = Second
+[my "Foo bAr"]
+	hi = hello
+[core]
+	baz = bat
+	baz = hask
+[lamb]
+	chop = 65
+	head = none
+[goat]
+	legs = 4
+	head = true
+	skin = false
+	nose = 1
+	horns
+EOF
+
+test_expect_success 'get value for a simple key' '
+	echo "very blue" >expect &&
+	test-config get_value core.penguin >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'get value for a key with value as an empty string' '
+	echo "" >expect &&
+	test-config get_value core.my >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'get value for a key with value as NULL' '
+	echo "(NULL)" >expect &&
+	test-config get_value core.foo >actual &&
+	test_cmp expect actual
+'
+test_expect_success 'upper case key' '
+	echo "true" >expect &&
+	test-config get_value core.UPPERCASE >actual &&
+	test_cmp expect actual
+'
+test_expect_success 'mixed case key' '
+	echo "true" >expect &&
+	test-config get_value core.MixedCase >actual &&
+	test_cmp expect actual
+'
+test_expect_success 'key and value with mixed case' '
+	echo "BadPhysics" >expect &&
+	test-config get_value core.Movie >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'key with case sensitive subsection' '
+	echo "hello" >expect &&
+	test-config get_value "my.Foo bAr.hi">actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'find value with misspelled key' '
+	echo "Value not found for \"my.fOo Bar.hi\"" >expect &&
+	test_must_fail test-config get_value "my.fOo Bar.hi">actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'find value with the highest priority' '
+	echo hask >expect &&
+	test-config get_value "core.baz">actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'find integer value for a key' '
+	echo 65 >expect &&
+	test-config get_int lamb.chop >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'find integer if value is non parse-able' '
+	echo 65 >expect &&
+	test_must_fail test-config get_int lamb.head >actual &&
+	test_must_fail test_cmp expect actual
+'
+
+cat > expect << EOF
+1
+0
+1
+1
+1
+EOF
+
+test_expect_success 'find bool value for the entered key' '
+	test-config get_bool goat.head >>actual &&
+	test-config get_bool goat.skin >>actual &&
+	test-config get_bool goat.nose >>actual &&
+	test-config get_bool goat.horns >>actual &&
+	test-config get_bool goat.legs >>actual &&
+	test_cmp expect actual
+'
+
+cat > expect << EOF
+sam
+bat
+hask
+EOF
+
+test_expect_success 'find multiple values' '
+	test-config get_value_multi "core.baz">actual &&
+	test_cmp expect actual
+'
+
+cat > config2 << EOF
+[core]
+	baz = lama
+[my]
+	new = silk
+[core]
+	baz = ball
+EOF
+
+test_expect_success 'find value from a configset' '
+	echo silk > expect &&
+	test-config configset_get_value 2 config2 .git/config my.new >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'find value with highest priority from a configset' '
+	echo hask > expect &&
+	test-config configset_get_value 2 config2 .git/config core.baz >actual &&
+	test_cmp expect actual
+'
+
+cat > except << EOF
+sam
+bat
+hask
+lama
+ball
+EOF
+
+test_expect_success 'find value_list for a key from a configset' '
+	test-config configset_get_value 2 config2 .git/config core.baz >actual &&
+	test_cmp expect actual
+'
+
+test_done
diff --git a/test-config.c b/test-config.c
new file mode 100644
index 0000000..32c912a
--- /dev/null
+++ b/test-config.c
@@ -0,0 +1,129 @@
+#include "cache.h"
+#include "hashmap.h"
+#include "string-list.h"
+
+/*
+ * This program exposes the C API of the `config_set` mechanism
+ * as a set of simple commands in order to facilitate testing.
+ *
+ * Reads stdin and prints result of command to stdout:
+ *
+ * get_value -> prints the value with highest priority for the entered key
+ *
+ * get_value_multi -> prints all values for the entered key in increasing order
+ *		     of priority
+ *
+ * get_int -> print integer value for the entered key or die
+ *
+ * get_bool -> print bool value for the entered key or die
+ *
+ * configset_get_value -> first makes a config-set of files entered as arguments
+ *			 and then returns value with the highest priority for the
+ *			 entered key.
+ *
+ * configset_get_value_multi -> first makes a config-set of files entered as
+ * 				arguments and then returns value_list for the entered
+ *				key sorted in ascending order of priority.
+ *
+ * Examples:
+ *
+ * To print the value with highest priority for key "foo.bAr Baz.rock":
+ * 	test-config get_value "foo.bAr Baz.rock"
+ *
+ */
+
+
+int main(int argc, char **argv)
+{
+	int i, no_of_files;
+	int ret = 0;
+	const char *v;
+	int val;
+	const struct string_list *strptr;
+	struct config_set cs = CONFIG_SET_INIT;
+	if (argc == 3 && !strcmp(argv[1], "get_value")) {
+		if (!git_config_get_value(argv[2], &v)) {
+			if (!v)
+				printf("(NULL)\n");
+			else
+				printf("%s\n", v);
+			return 0;
+		} else {
+			printf("Value not found for \"%s\"\n", argv[2]);
+			return -1;
+		}
+	} else if (argc == 3 && !strcmp(argv[1], "get_value_multi")) {
+		strptr = git_config_get_value_multi(argv[2]);
+		if (strptr) {
+			for (i = 0; i < strptr->nr; i++) {
+				v = strptr->items[i].string;
+				if (!v)
+					printf("(NULL)\n");
+				else
+					printf("%s\n", v);
+			}
+			return 0;
+		} else {
+			printf("Value not found for \"%s\"\n", argv[2]);
+			return -1;
+		}
+	} else if (argc == 3 && !strcmp(argv[1], "get_int")) {
+		if (!git_config_get_int(argv[2], &val)) {
+			printf("%d\n", val);
+			return 0;
+		} else {
+			printf("Value not found for \"%s\"\n", argv[2]);
+			return -1;
+		}
+	} else if (argc == 3 && !strcmp(argv[1], "get_bool")) {
+		if (!git_config_get_bool(argv[2], &val)) {
+			printf("%d\n", val);
+			return 0;
+		} else {
+			printf("Value not found for \"%s\"\n", argv[2]);
+			return -1;
+		}
+	} else if (!strcmp(argv[1], "configset_get_value")) {
+		no_of_files = git_config_int("unused", argv[2]);
+		for (i = 1; i <= no_of_files; i++) {
+			ret = git_configset_add_file(&cs, argv[i + 2]);
+			if (ret)
+				return -1;
+		}
+		if (!git_configset_get_value(&cs, argv[3 + no_of_files], &v)) {
+			if (!v)
+				printf("(NULL)\n");
+			else
+				printf("%s\n", v);
+			return 0;
+		} else {
+			printf("Value not found for \"%s\"\n", argv[2]);
+			return -1;
+		}
+
+	} else if (!strcmp(argv[1], "configset_get_value_multi")) {
+		no_of_files = git_config_int("unused", argv[2]);
+		for (i = 1; i <= no_of_files; i++) {
+			ret = git_configset_add_file(&cs, argv[i + 2]);
+			if (ret)
+				return -1;
+		}
+		strptr = git_configset_get_value_multi(&cs, argv[3 + no_of_files]);
+		if (strptr) {
+			for (i = 0; i < strptr->nr; i++) {
+				v = strptr->items[i].string;
+				if (!v)
+					printf("(NULL)\n");
+				else
+					printf("%s\n", v);
+			}
+			return 0;
+		} else {
+			printf("Value not found for \"%s\"\n", argv[2]);
+			return -1;
+		}
+	}
+
+	fprintf(stderr, "%s: Please check the syntax and the function name\n", argv[0]);
+	return 1;
+}
-- 
1.9.0.GIT

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/2] add `config_set` API for caching config files
  2014-07-02  6:01 ` [PATCH v4 1/2] add `config_set` API for caching config files Tanay Abhra
@ 2014-07-02  9:14   ` Matthieu Moy
  2014-07-02 11:58     ` Tanay Abhra
                       ` (2 more replies)
  2014-07-02 17:00   ` Junio C Hamano
  1 sibling, 3 replies; 17+ messages in thread
From: Matthieu Moy @ 2014-07-02  9:14 UTC (permalink / raw)
  To: Tanay Abhra
  Cc: git, Ramkumar Ramachandra, Karsten Blees, Junio C Hamano,
	Eric Sunshine

Tanay Abhra <tanayabh@gmail.com> writes:

> Add a `config_set` construct that points to an ordered set of config files
> specified by the user, each of which represents what was read and cached
> in core as hashmaps. Add two external functions `git_configset_get_value`
> and `git_configset_get_value_multi` for querying from the config sets,
> which follow `last one wins` semantic(i.e. if there are multiple matches

Space before (

> for the queried key in the files of the configset the value returned will
> the last value in the last matching file of the configset) Add type

will _be_ ?

Does this remark also apply to git_configset_get_value_multi? My
understanding was that "last one wins" applied only to
git_configset_get_value.

> Add a default `config_set`, `the_config_set` to cache all key-value pairs

I don't like the name "the_config_set". It's not the only one. Perhaps
repo_config_set? (not totally satisfactory as it does not contain only
the repo, but the repo+user+system)

What do others think?

> read from usual config files(repo specific .git/config, user wide

(You always forget space before '('...)

> ~/.gitconfig and the global /etc/gitconfig). `the_config_set` uses a
> single hashmap populated using git_config(). The reason for doing so is
> twofold, one is to honour include directives, another is to guarantee O(1)
> lookups for usual config values, as values are queried for hundred of
> times during a run of a git program.

What is the reason to deal with `the_config_set` and other config sets
differently? You're giving arguments to store `the_config_set` as a
single hashmap, but what is the reason to store others as multiple
hashmaps?

And actually, I don't completely buy your arguments: having 3 or 4
hashmaps (.git/config, ~/.gitconfig, ~/.config/git/config and
/etc/gitconfig) would be a O(4) lookup, which is still O(1), and you
could deal with include directives by having ".git/config and included
files" in a hashmap, "~/.gitconfig and included files" in a second
hashmap, ...

My guess is that the real argument is "git_config already does what I
want and I'm too lazy to change it". And I do consider lazyness as a
quality for a computer-scientist ;-).

I would personally find it much simpler to have a single hashmap. We'd
lose the ability to invalidate the cache for only a single file, but I'm
not sure it's worth the code complexity.

> +Querying For Specific Variables
> +-------------------------------
> +
> +For programs wanting to query for specific variables in a non-callback
> +manner, the config API provides two functions `git_config_get_value`
> +and `git_config_get_value_multi`.They both read values from an internal

Space after .

> +`git_config_iter` gives a way to iterate over the keys in cache. Existing
> +`git_config` callback function signature is used for iterating.

"Existing" refers to the old state. It's OK in a commit message, but
won't make sense in the future, when your non-callback API and the old
callback API will live side by side.

> +The config API also provides type specific API functions which do conversion
> +as well as retrieval for the queried variable, including:
> +
> +`git_config_get_int`::
> +Parse the retrieved value to an integer, including unit factors. Dies on
> +error; otherwise, allocates and copies the integer into the dest parameter.
> +Returns 0 on success, or 1 if no value was found.

It seems strange to me to return 1 here, and -1 in git_config_get_value
to mean the same thing.

> +`git_config_bool`::

Isn't it git_config_get_bool?

> +/* config-hash.c */

You may point to the documentation in the comment too.

> +#define CONFIG_SET_INIT { NULL, 0, 0 }
> +
> +struct config_set {
> +	struct config_set_item *item;
> +	unsigned int nr, alloc;
> +};
> +
> +struct config_set_item {
> +	struct hashmap config_hash;
> +	char *filename;

Perhaps const char *?

> +static struct hashmap *add_config_hash(const char *filename, struct config_set *cs)
> +{
> +	int ret = 0;
> +	struct config_set_item *ct;
> +	struct config_set_cb cb;
> +	ALLOC_GROW(cs->item, cs->nr + 1, cs->alloc);
> +	ct = &cs->item[cs->nr++];
> +	ct->filename = xstrdup(filename);
> +	config_hash_init(&ct->config_hash);
> +	cb.cs = cs;
> +	cb.ct = ct;
> +	/*
> +	 * When filename is "default", hashmap is constructed from the usual set of
> +	 * config files (i.e repo specific .git/config, user wide ~/.gitconfig and the
> +	 * global /etc/gitconfig), used in `the_config_set`
> +	 */
> +	if (!strcmp(filename, "default"))

How does one read a file actually called "default" with this API?

More generally, why do you need a special-case here? I think it would be
much better to leave this part unaware of the default, and have a
separate function like create_repo_config_hash (or
create_the_config_hash) that would call git_config(...). There actually
isn't much shared code between the "default" case and the others in your
function.

> +static struct hashmap *get_config_hash(const char *filename, struct config_set *cs)
> +{
> +	int i;
> +	for(i = 0; i < cs->nr; i++) {
> +		if (!strcmp(cs->item[i].filename, filename))
> +			return &cs->item[i].config_hash;
> +	}
> +	return add_config_hash(filename, cs);
> +}
> +
> +static struct config_hash_entry *config_hash_find_entry(const char *key, const char* filename,
> +							struct config_set *cs)

I don't get the logic here.

Either the caller explicitly manages a config_set variable like

  config_set my_set = git_configset_init();
  git_configset_add_file(my_set, "my-file");
  git_configset_get_value(my_set, "some.variable.name");

Or there's an implicit singleton mapping files to hashmaps to allow the
user to say

  git_configset_get_value("my-file", "some.variable.name");

without asking the user to explicitly manipulate config_set variables.

It seems to me that your solution is a bad compromise between both
solutions: you do require the user to manipulate config_set variables,
but you also require a filename to look for the right entry in the list.

Did you miss the end of Junio's message:

  http://thread.gmane.org/gmane.comp.version-control.git/252567

?

If the user explicitly asks for a single entry in the list, then why
group them in a list? I guess the question is the same as above, and the
more I read the patch, the more I think a config_set should contain a
single hashmap.

> +static int config_hash_add_value(const char *key, const char *value,
> +				 struct config_set_item *ct, struct config_set *cs)
> +{
> +	struct hashmap *config_hash = &ct->config_hash;
> +	struct config_hash_entry *e;
> +	e = config_hash_find_entry(key, ct->filename, cs);
> +
> +	if (!e) {
> +		e = xmalloc(sizeof(*e));
> +		hashmap_entry_init(e, strhash(key));
> +		e->key = xstrdup(key);
> +		memset(&e->value_list, 0, sizeof(e->value_list));
> +		e->value_list.strdup_strings = 1;
> +		hashmap_add(config_hash, e);
> +	}
> +	/*
> +	 * Since the values are being fed by git_config*() callback mechanism, they
> +	 * are already normalised.

We usually speak en_US rather than en_UK => normalized.

> +int git_config_get_value(const char *key, const char **value)
> +{
> +	if (!the_config_set_initialised) {
> +		git_configset_add_file(&the_config_set, "default");
> +		the_config_set_initialised = 1;
> +	}
> +	return git_configset_get_value(&the_config_set, key, value);
> +}
> +
> +const struct string_list *git_config_get_value_multi(const char *key)
> +{
> +	if (!the_config_set_initialised) {
> +		git_configset_add_file(&the_config_set, "default");
> +		the_config_set_initialised = 1;
> +	}
> +	return git_configset_get_value_multi(&the_config_set, key);
> +}

These if statements could be refactored, and the_config_set_initialised
could be hidden to the caller. Just make a function get_the_config_set()
that initializes if needed and returns a pointer to the_config_set.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] test-config: Add tests for the config_set API
  2014-07-02  6:01 ` [PATCH 2/2] test-config: Add tests for the config_set API Tanay Abhra
@ 2014-07-02  9:29   ` Matthieu Moy
  2014-07-02 12:04     ` Tanay Abhra
  0 siblings, 1 reply; 17+ messages in thread
From: Matthieu Moy @ 2014-07-02  9:29 UTC (permalink / raw)
  To: Tanay Abhra
  Cc: git, Ramkumar Ramachandra, Karsten Blees, Junio C Hamano,
	Eric Sunshine

Tanay Abhra <tanayabh@gmail.com> writes:

> +test_expect_success 'clear default config' '
> +	rm -f .git/config
> +'
> +
> +cat > .git/config << EOF

t/README says:

 - Put all code inside test_expect_success and other assertions.

   Even code that isn't a test per se, but merely some setup code
   should be inside a test assertion.

Even these cat > would better be in a test_expect_success 'initialize
config'.

(Not applied everywhere in Git's code essentially because some tests
were written before the guideline was set and never updated).

> +[core]
> +	penguin = very blue
> +	Movie = BadPhysics
> +	UPPERCASE = true
> +	MixedCase = true
> +	my =
> +	foo
> +	baz = sam
> +[Cores]
> +	WhatEver = Second
> +[my "Foo bAr"]
> +	hi = hello

To really stress the "case sensitive middle part" case, you should also
have other sections like

[my "foo bar"]
	hi = lower-case
[my "FOO BAR"]
	hi = upper-case

and check that you get the right value for my.*.hi

Similarly, I'd add a [CORE] and a [CoRe] section to check that their
content is actually merged with [core].

> +test_expect_success 'get value for a key with value as an empty string' '
> +	echo "" >expect &&
> +	test-config get_value core.my >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'get value for a key with value as NULL' '
> +	echo "(NULL)" >expect &&
> +	test-config get_value core.foo >actual &&
> +	test_cmp expect actual
> +'
> +test_expect_success 'upper case key' '

Keep the style consistent, if you separate tests with a single blank
line, do it everywhere.

> +cat > expect << EOF

See above, should be in test_expect_success.

Also, >expect, not > expect.

There are other instances.

> +1
> +0
> +1
> +1
> +1
> +EOF
> +
> +test_expect_success 'find bool value for the entered key' '
> +	test-config get_bool goat.head >>actual &&

The first one should be a single >, or you should clear actual before
the test.

> +int main(int argc, char **argv)
> +{
> +	int i, no_of_files;
> +	int ret = 0;
> +	const char *v;
> +	int val;
> +	const struct string_list *strptr;
> +	struct config_set cs = CONFIG_SET_INIT;



> +	if (argc == 3 && !strcmp(argv[1], "get_value")) {
> +		if (!git_config_get_value(argv[2], &v)) {
> +			if (!v)
> +				printf("(NULL)\n");
> +			else
> +				printf("%s\n", v);
> +			return 0;
> +		} else {
> +			printf("Value not found for \"%s\"\n", argv[2]);
> +			return -1;
> +		}
> +	} else if (argc == 3 && !strcmp(argv[1], "get_value_multi")) {
> +		strptr = git_config_get_value_multi(argv[2]);
> +		if (strptr) {
> +			for (i = 0; i < strptr->nr; i++) {
> +				v = strptr->items[i].string;
> +				if (!v)
> +					printf("(NULL)\n");
> +				else
> +					printf("%s\n", v);
> +			}
> +			return 0;
> +		} else {
> +			printf("Value not found for \"%s\"\n", argv[2]);
> +			return -1;
> +		}
> +	} else if (argc == 3 && !strcmp(argv[1], "get_int")) {
> +		if (!git_config_get_int(argv[2], &val)) {
> +			printf("%d\n", val);
> +			return 0;
> +		} else {
> +			printf("Value not found for \"%s\"\n", argv[2]);
> +			return -1;
> +		}
> +	} else if (argc == 3 && !strcmp(argv[1], "get_bool")) {
> +		if (!git_config_get_bool(argv[2], &val)) {
> +			printf("%d\n", val);
> +			return 0;
> +		} else {
> +			printf("Value not found for \"%s\"\n", argv[2]);
> +			return -1;
> +		}
> +	} else if (!strcmp(argv[1], "configset_get_value")) {
> +		no_of_files = git_config_int("unused", argv[2]);

Why ask the user to give a number of files on the command-line. With a
syntax like

test-config configset_get_value <key> <files>...

you could just use argc to iterate over argv. Here, you trust the user
to provide the right value, and most likely segfault otherwise (and this
is not really documented). I know this is only test code, but why not do
it right anyway ;-).

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/2] add `config_set` API for caching config files
  2014-07-02  9:14   ` Matthieu Moy
@ 2014-07-02 11:58     ` Tanay Abhra
  2014-07-02 14:32       ` Matthieu Moy
  2014-07-02 17:09     ` Junio C Hamano
  2014-07-04  4:58     ` Tanay Abhra
  2 siblings, 1 reply; 17+ messages in thread
From: Tanay Abhra @ 2014-07-02 11:58 UTC (permalink / raw)
  To: Matthieu Moy
  Cc: git, Ramkumar Ramachandra, Karsten Blees, Junio C Hamano,
	Eric Sunshine



On 7/2/2014 2:44 PM, Matthieu Moy wrote:
> Tanay Abhra <tanayabh@gmail.com> writes:
> 
>> Add a `config_set` construct that points to an ordered set of config files
>> specified by the user, each of which represents what was read and cached
>> in core as hashmaps. Add two external functions `git_configset_get_value`
>> and `git_configset_get_value_multi` for querying from the config sets,
>> which follow `last one wins` semantic(i.e. if there are multiple matches
> 
> Space before (
> 

Noted. Sorry about that.

>> for the queried key in the files of the configset the value returned will
>> the last value in the last matching file of the configset) Add type

> Does this remark also apply to git_configset_get_value_multi? My
> understanding was that "last one wins" applied only to
> git_configset_get_value.
> 

Maybe a reworded sentence may work,
`git_configset_get_value_multi` returns a list of values sorted in order of
increasing priority (i.e. last match will be at the end of the list.)

>> Add a default `config_set`, `the_config_set` to cache all key-value pairs
> 
> I don't like the name "the_config_set". It's not the only one. Perhaps
> repo_config_set? (not totally satisfactory as it does not contain only
> the repo, but the repo+user+system)
> 
> What do others think?
> 
>> read from usual config files(repo specific .git/config, user wide
>> ~/.gitconfig and the global /etc/gitconfig). `the_config_set` uses a
>> single hashmap populated using git_config(). The reason for doing so is
>> twofold, one is to honour include directives, another is to guarantee O(1)
>> lookups for usual config values, as values are queried for hundred of
>> times during a run of a git program.
> 
> What is the reason to deal with `the_config_set` and other config sets
> differently? You're giving arguments to store `the_config_set` as a
> single hashmap, but what is the reason to store others as multiple
> hashmaps?
> 
> And actually, I don't completely buy your arguments: having 3 or 4
> hashmaps (.git/config, ~/.gitconfig, ~/.config/git/config and
> /etc/gitconfig) would be a O(4) lookup, which is still O(1), and you
> could deal with include directives by having ".git/config and included
> files" in a hashmap, "~/.gitconfig and included files" in a second
> hashmap, ...
> 
> My guess is that the real argument is "git_config already does what I
> want and I'm too lazy to change it". And I do consider lazyness as a
> quality for a computer-scientist ;-).
> 
> I would personally find it much simpler to have a single hashmap. We'd
> lose the ability to invalidate the cache for only a single file, but I'm
> not sure it's worth the code complexity.
> 

Point noted. I can take the multiple hashmap route for `the_config_set`.
Still, since it will be the most used config set in the code and for each
query it would have to look through n hashmaps hampering performance. I
can change it if you want but like you, I don't think it is worth the code
complexity.

>> +Querying For Specific Variables
>> +-------------------------------
>> +
>> +For programs wanting to query for specific variables in a non-callback
>> +manner, the config API provides two functions `git_config_get_value`
>> +and `git_config_get_value_multi`. They both read values from an internal
>> +`git_config_iter` gives a way to iterate over the keys in cache. Existing
>> +`git_config` callback function signature is used for iterating.
> 
> "Existing" refers to the old state. It's OK in a commit message, but
> won't make sense in the future, when your non-callback API and the old
> callback API will live side by side.
>

Noted.

>> +The config API also provides type specific API functions which do conversion
>> +as well as retrieval for the queried variable, including:
>> +
>> +`git_config_get_int`::
>> +Parse the retrieved value to an integer, including unit factors. Dies on
>> +error; otherwise, allocates and copies the integer into the dest parameter.
>> +Returns 0 on success, or 1 if no value was found.
> 
> It seems strange to me to return 1 here, and -1 in git_config_get_value
> to mean the same thing.
> 

Noted. Some of the type specific function return -1 on wrong parsing, I will
put return value 1 for no value found in all of the cases.

>> +`git_config_bool`::
>> +/* config-hash.c */

>> +#define CONFIG_SET_INIT { NULL, 0, 0 }
>> +
>> +struct config_set {
>> +	struct config_set_item *item;
>> +	unsigned int nr, alloc;
>> +};
>> +
>> +struct config_set_item {
>> +	struct hashmap config_hash;
>> +	char *filename;
>> +static struct hashmap *add_config_hash(const char *filename, struct config_set *cs)
>> +{
>> +	int ret = 0;
>> +	struct config_set_item *ct;
>> +	struct config_set_cb cb;
>> +	ALLOC_GROW(cs->item, cs->nr + 1, cs->alloc);
>> +	ct = &cs->item[cs->nr++];
>> +	ct->filename = xstrdup(filename);
>> +	config_hash_init(&ct->config_hash);
>> +	cb.cs = cs;
>> +	cb.ct = ct;
>> +	/*
>> +	 * When filename is "default", hashmap is constructed from the usual set of
>> +	 * config files (i.e repo specific .git/config, user wide ~/.gitconfig and the
>> +	 * global /etc/gitconfig), used in `the_config_set`
>> +	 */
>> +	if (!strcmp(filename, "default"))
> 
> How does one read a file actually called "default" with this API?
> 
> More generally, why do you need a special-case here? I think it would be
> much better to leave this part unaware of the default, and have a
> separate function like create_repo_config_hash (or
> create_the_config_hash) that would call git_config(...). There actually
> isn't much shared code between the "default" case and the others in your
> function.
>

Noted, a separate function would be much better.

>> +static struct hashmap *get_config_hash(const char *filename, struct config_set *cs)
>> +{
>> +	int i;
>> +	for(i = 0; i < cs->nr; i++) {
>> +		if (!strcmp(cs->item[i].filename, filename))
>> +			return &cs->item[i].config_hash;
>> +	}
>> +	return add_config_hash(filename, cs);
>> +}
>> +
>> +static struct config_hash_entry *config_hash_find_entry(const char *key, const char* filename,
>> +							struct config_set *cs)
> 
> I don't get the logic here.
> 
> Either the caller explicitly manages a config_set variable like
> 
>   config_set my_set = git_configset_init();
>   git_configset_add_file(my_set, "my-file");
>   git_configset_get_value(my_set, "some.variable.name");
> 
> Or there's an implicit singleton mapping files to hashmaps to allow the
> user to say
> 
>   git_configset_get_value("my-file", "some.variable.name");
> 
> without asking the user to explicitly manipulate config_set variables.
> 
> It seems to me that your solution is a bad compromise between both
> solutions: you do require the user to manipulate config_set variables,
> but you also require a filename to look for the right entry in the list.
> 
> Did you miss the end of Junio's message:
> 
>   http://thread.gmane.org/gmane.comp.version-control.git/252567
> 
> ?
>

This is an internal function which is used to iterate over every `config_set`
item which contains a hashmap and filename as an identifier.
The exposed API which I documented above doesn't require the user to pass the
filename when querying for a value.

The API works like this, suppose there are two files,
A.config	B.config
[foo]		[foo]
a = b		a = d
a = c		a = e

config_set *my_set = git_configset_init();
git_configset_add_file(my_set, "A.config");
git_configset_add_file(my_set, "B.config");
git_configset_get_value(my_set, "foo.a");

Here get_value calls config_hash_find_entry once for each configset_item
"A.config" and "B.config" with key as "foo.a". which in turn returns
config_hash_item which contains the key and the value list.

We get two string_lists containing values for the key foo.a,
{b,c} and {d,e} corresponding to A.config & B.config  respectively.

get_value then returns the value with the highest priority which is 'e'
after going through both of the lists and picking the last one.
These steps are done by internal functions not by the user facing
API ones.

> If the user explicitly asks for a single entry in the list, then why
> group them in a list? I guess the question is the same as above, and the
> more I read the patch, the more I think a config_set should contain a
> single hashmap.
>

Do you think so after reading the whole patch? It would be much easier
to implement config_set using a single hashmap . But Junio had suggested
that it would be much better for config_set to be a collection of hashmaps
and `the_config_set` to contain a single hashmap[1].

[1]http://thread.gmane.org/gmane.comp.version-control.git/252329/focus=252460

>> +static int config_hash_add_value(const char *key, const char *value,
>> +				 struct config_set_item *ct, struct config_set *cs)
>> +{
>> +	struct hashmap *config_hash = &ct->config_hash;
>> +	struct config_hash_entry *e;
>> +	e = config_hash_find_entry(key, ct->filename, cs);
>> +
>> +	if (!e) {
>> +		e = xmalloc(sizeof(*e));
>> +		hashmap_entry_init(e, strhash(key));
>> +		e->key = xstrdup(key);
>> +		memset(&e->value_list, 0, sizeof(e->value_list));
>> +		e->value_list.strdup_strings = 1;
>> +		hashmap_add(config_hash, e);
>> +	}
>> +	/*
>> +	 * Since the values are being fed by git_config*() callback mechanism, they
>> +	 * are already normalised.
>> +int git_config_get_value(const char *key, const char **value)
>> +{
>> +	if (!the_config_set_initialised) {
>> +		git_configset_add_file(&the_config_set, "default");
>> +		the_config_set_initialised = 1;
>> +	}
>> +	return git_configset_get_value(&the_config_set, key, value);
>> +}
>> +
>> +const struct string_list *git_config_get_value_multi(const char *key)
>> +{
>> +	if (!the_config_set_initialised) {
>> +		git_configset_add_file(&the_config_set, "default");
>> +		the_config_set_initialised = 1;
>> +	}
>> +	return git_configset_get_value_multi(&the_config_set, key);
>> +}
> 
> These if statements could be refactored, and the_config_set_initialised
> could be hidden to the caller. Just make a function get_the_config_set()
> that initializes if needed and returns a pointer to the_config_set.
>

Noted. Thanks for the quick review.

Tanay.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] test-config: Add tests for the config_set API
  2014-07-02  9:29   ` Matthieu Moy
@ 2014-07-02 12:04     ` Tanay Abhra
  0 siblings, 0 replies; 17+ messages in thread
From: Tanay Abhra @ 2014-07-02 12:04 UTC (permalink / raw)
  To: Matthieu Moy
  Cc: git, Ramkumar Ramachandra, Karsten Blees, Junio C Hamano,
	Eric Sunshine



On 7/2/2014 2:59 PM, Matthieu Moy wrote:
> Tanay Abhra <tanayabh@gmail.com> writes:
> 
>> +test_expect_success 'clear default config' '
>> +	rm -f .git/config
>> +'
>> +
>> +cat > .git/config << EOF
> 
> t/README says:
> 
>  - Put all code inside test_expect_success and other assertions.
> 
>    Even code that isn't a test per se, but merely some setup code
>    should be inside a test assertion.
> 
> Even these cat > would better be in a test_expect_success 'initialize
> config'.
> 
> (Not applied everywhere in Git's code essentially because some tests
> were written before the guideline was set and never updated).

Sorry about that. I followed t1300-repo-config.sh which has these mistakes
also.

>> +[core]
>> +	penguin = very blue
>> +	Movie = BadPhysics
>> +	UPPERCASE = true
>> +	MixedCase = true
>> +	my =
>> +	foo
>> +	baz = sam
>> +[Cores]
>> +	WhatEver = Second
>> +[my "Foo bAr"]
>> +	hi = hello
> 
> To really stress the "case sensitive middle part" case, you should also
> have other sections like
> 
> [my "foo bar"]
> 	hi = lower-case
> [my "FOO BAR"]
> 	hi = upper-case
> 
> and check that you get the right value for my.*.hi
> 
> Similarly, I'd add a [CORE] and a [CoRe] section to check that their
> content is actually merged with [core].
>

Noted.

>> +test_expect_success 'get value for a key with value as an empty string' '
>> +	echo "" >expect &&
>> +	test-config get_value core.my >actual &&
>> +	test_cmp expect actual
>> +'
>> +
>> +test_expect_success 'get value for a key with value as NULL' '
>> +	echo "(NULL)" >expect &&
>> +	test-config get_value core.foo >actual &&
>> +	test_cmp expect actual
>> +'
>> +test_expect_success 'upper case key' '
> 
> Keep the style consistent, if you separate tests with a single blank
> line, do it everywhere.
> 
>> +cat > expect << EOF
> 
> See above, should be in test_expect_success.
> 
> Also, >expect, not > expect.
> 
> There are other instances.
>

Noted. Again copied t1300-repo-config.sh style for cat.

>> +1
>> +0
>> +1
>> +1
>> +1
>> +EOF
>> +
>> +test_expect_success 'find bool value for the entered key' '
>> +	test-config get_bool goat.head >>actual &&
> 
> The first one should be a single >, or you should clear actual before
> the test.
> 

Noted.

>> +int main(int argc, char **argv)
>> +{
>> +	int i, no_of_files;
>> +	int ret = 0;
>> +	const char *v;
>> +	int val;
>> +	const struct string_list *strptr;
>> +	struct config_set cs = CONFIG_SET_INIT;
> 
> 
> 
>> +	if (argc == 3 && !strcmp(argv[1], "get_value")) {
>> +		if (!git_config_get_value(argv[2], &v)) {
>> +			if (!v)
>> +				printf("(NULL)\n");
>> +			else
>> +				printf("%s\n", v);
>> +			return 0;
>> +		} else {
>> +			printf("Value not found for \"%s\"\n", argv[2]);
>> +			return -1;
>> +		}
>> +	} else if (argc == 3 && !strcmp(argv[1], "get_value_multi")) {
>> +		strptr = git_config_get_value_multi(argv[2]);
>> +		if (strptr) {
>> +			for (i = 0; i < strptr->nr; i++) {
>> +				v = strptr->items[i].string;
>> +				if (!v)
>> +					printf("(NULL)\n");
>> +				else
>> +					printf("%s\n", v);
>> +			}
>> +			return 0;
>> +		} else {
>> +			printf("Value not found for \"%s\"\n", argv[2]);
>> +			return -1;
>> +		}
>> +	} else if (argc == 3 && !strcmp(argv[1], "get_int")) {
>> +		if (!git_config_get_int(argv[2], &val)) {
>> +			printf("%d\n", val);
>> +			return 0;
>> +		} else {
>> +			printf("Value not found for \"%s\"\n", argv[2]);
>> +			return -1;
>> +		}
>> +	} else if (argc == 3 && !strcmp(argv[1], "get_bool")) {
>> +		if (!git_config_get_bool(argv[2], &val)) {
>> +			printf("%d\n", val);
>> +			return 0;
>> +		} else {
>> +			printf("Value not found for \"%s\"\n", argv[2]);
>> +			return -1;
>> +		}
>> +	} else if (!strcmp(argv[1], "configset_get_value")) {
>> +		no_of_files = git_config_int("unused", argv[2]);
> 
> Why ask the user to give a number of files on the command-line. With a
> syntax like
> 
> test-config configset_get_value <key> <files>...
> 
> you could just use argc to iterate over argv. Here, you trust the user
> to provide the right value, and most likely segfault otherwise (and this
> is not really documented). I know this is only test code, but why not do
> it right anyway ;-).
> 

Yup, your way is much better, thanks for the review.
Tanay.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/2] add `config_set` API for caching config files
  2014-07-02 11:58     ` Tanay Abhra
@ 2014-07-02 14:32       ` Matthieu Moy
  0 siblings, 0 replies; 17+ messages in thread
From: Matthieu Moy @ 2014-07-02 14:32 UTC (permalink / raw)
  To: Tanay Abhra
  Cc: git, Ramkumar Ramachandra, Karsten Blees, Junio C Hamano,
	Eric Sunshine

Tanay Abhra <tanayabh@gmail.com> writes:
> On 7/2/2014 2:44 PM, Matthieu Moy wrote:
>> Tanay Abhra <tanayabh@gmail.com> writes:

> Maybe a reworded sentence may work,
> `git_configset_get_value_multi` returns a list of values sorted in order of
> increasing priority (i.e. last match will be at the end of the list.)

OK.

>>> read from usual config files(repo specific .git/config, user wide
>>> ~/.gitconfig and the global /etc/gitconfig). `the_config_set` uses a
>>> single hashmap populated using git_config(). The reason for doing so is
>>> twofold, one is to honour include directives, another is to guarantee O(1)
>>> lookups for usual config values, as values are queried for hundred of
>>> times during a run of a git program.
>> 
>> What is the reason to deal with `the_config_set` and other config sets
>> differently? You're giving arguments to store `the_config_set` as a
>> single hashmap, but what is the reason to store others as multiple
>> hashmaps?
>> 
>> And actually, I don't completely buy your arguments: having 3 or 4
>> hashmaps (.git/config, ~/.gitconfig, ~/.config/git/config and
>> /etc/gitconfig) would be a O(4) lookup, which is still O(1), and you
>> could deal with include directives by having ".git/config and included
>> files" in a hashmap, "~/.gitconfig and included files" in a second
>> hashmap, ...
>> 
>> My guess is that the real argument is "git_config already does what I
>> want and I'm too lazy to change it". And I do consider lazyness as a
>> quality for a computer-scientist ;-).
>> 
>> I would personally find it much simpler to have a single hashmap. We'd
>> lose the ability to invalidate the cache for only a single file, but I'm
>> not sure it's worth the code complexity.
>> 
>
> Point noted. I can take the multiple hashmap route for `the_config_set`.
> Still, since it will be the most used config set in the code and for each
> query it would have to look through n hashmaps hampering performance. I
> can change it if you want but like you, I don't think it is worth the code
> complexity.

That's why my suggestion is to use a single hashmap everywhere.

I don't have strong opinion either way, but whichever way you go, you
should justify the choice better in the commit message.

>>> +The config API also provides type specific API functions which do conversion
>>> +as well as retrieval for the queried variable, including:
>>> +
>>> +`git_config_get_int`::
>>> +Parse the retrieved value to an integer, including unit factors. Dies on
>>> +error; otherwise, allocates and copies the integer into the dest parameter.
>>> +Returns 0 on success, or 1 if no value was found.
>> 
>> It seems strange to me to return 1 here, and -1 in git_config_get_value
>> to mean the same thing.
>> 
>
> Noted. Some of the type specific function return -1 on wrong parsing, I will
> put return value 1 for no value found in all of the cases.

I'm not sure I fully get the existing convention. My understanding is
that when the extracted value is returned, -1 is used as a special value
to mean "no value" (e.g. git_config_maybe_bool can return 1, 0 or -1),
but when the extracted value is written to a by-address parameter, then
the return value is 1 or 0.

>>> +static struct hashmap *get_config_hash(const char *filename, struct config_set *cs)
>>> +{
>>> +	int i;
>>> +	for(i = 0; i < cs->nr; i++) {
>>> +		if (!strcmp(cs->item[i].filename, filename))
>>> +			return &cs->item[i].config_hash;
>>> +	}
>>> +	return add_config_hash(filename, cs);
>>> +}
>>> +
>>> +static struct config_hash_entry *config_hash_find_entry(const char *key, const char* filename,
>>> +							struct config_set *cs)
>> 
>> I don't get the logic here.
>> 
>> Either the caller explicitly manages a config_set variable like
>> 
>>   config_set my_set = git_configset_init();
>>   git_configset_add_file(my_set, "my-file");
>>   git_configset_get_value(my_set, "some.variable.name");
>> 
>> Or there's an implicit singleton mapping files to hashmaps to allow the
>> user to say
>> 
>>   git_configset_get_value("my-file", "some.variable.name");
>> 
>> without asking the user to explicitly manipulate config_set variables.
>> 
>> It seems to me that your solution is a bad compromise between both
>> solutions: you do require the user to manipulate config_set variables,
>> but you also require a filename to look for the right entry in the list.
>> 
>> Did you miss the end of Junio's message:
>> 
>>   http://thread.gmane.org/gmane.comp.version-control.git/252567
>> 
>> ?
>>
>
> This is an internal function which is used to iterate over every `config_set`
> item which contains a hashmap and filename as an identifier.
> The exposed API which I documented above doesn't require the user to pass the
> filename when querying for a value.
>
> The API works like this, suppose there are two files,
> A.config	B.config
> [foo]		[foo]
> a = b		a = d
> a = c		a = e
>
> config_set *my_set = git_configset_init();
> git_configset_add_file(my_set, "A.config");
> git_configset_add_file(my_set, "B.config");
> git_configset_get_value(my_set, "foo.a");
>
> Here get_value calls config_hash_find_entry once for each configset_item
> "A.config" and "B.config" with key as "foo.a". which in turn returns
> config_hash_item which contains the key and the value list.

OK, I understand what you did. But it still seem wrong to me. If you
inline functions, you are doing something like

for (f in files) { // in git_configset_get_value
   for (g in files) { // in get_config_hash
      if (f == g) {
         take the hashmap for g.
      }
   }
}

This is O(n^2) to iterate over an ordered list. OK, the list is small,
but still, why not iterate normally over the array?

Algorithmically speaking, you do not need the file names here, just
iterate over an array of hashmaps. Filenames do not harm in struct
config_set, but they are just metadata, not usefull to do the search.

> We get two string_lists containing values for the key foo.a,
> {b,c} and {d,e} corresponding to A.config & B.config  respectively.
>
> get_value then returns the value with the highest priority which is 'e'
> after going through both of the lists and picking the last one.
> These steps are done by internal functions not by the user facing
> API ones.

But then, I have to wonder again what is the benefit of having multiple
hashmaps in the config_set, since it has no user-visible benefit.

>> If the user explicitly asks for a single entry in the list, then why
>> group them in a list? I guess the question is the same as above, and the
>> more I read the patch, the more I think a config_set should contain a
>> single hashmap.
>>
>
> Do you think so after reading the whole patch? It would be much easier
> to implement config_set using a single hashmap . But Junio had suggested
> that it would be much better for config_set to be a collection of hashmaps
> and `the_config_set` to contain a single hashmap[1].
>
> [1]http://thread.gmane.org/gmane.comp.version-control.git/252329/focus=252460

I do not read the message like you. Junio gives pros&cons of "everything
in a single hashmap" Vs "one hashmap per file", and explicitely
concludes with

| I don't know at this point, and thinking these things through
| to arrive at a good design is part of the GSoC project after all, so
| I'd rather not to think it through to the end myself ;-).

I do not read "there should be multiple hashmaps, except in the case of
the_configset which should be handled differently".

To summarize:

Advantages of using a single hashmap per configset:

1) Code is simpler

2) Lookup is slightly more efficient

Advantages of using one hashmap per configset:

A) Allows sharing hashmaps between configsets

B) Allows selectively refreshing the config for a single file without
   touching the other files' hashmaps.

C) Would allow more operations like "tell me which file contains the
   config key foo.bar", that could be useful in some situation.

Right now, I do not think that your implementation uses any of A), B) or
C), so you're doing something uselessly complex.

In the future, my feeling is that A) is essentially irrelevant (we're
talking about a few hundreds of entry for large config files, not
something we want to fine-tune in memory), B) would raise tons of
corner-cases and would be hard to get right compared to the benefit. C)
could be interesting, but the most common case would be to use
the_configset, in which there's only one hashmap so it's not applicable
without a big code change.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/2] add `config_set` API for caching config files
  2014-07-02  6:01 ` [PATCH v4 1/2] add `config_set` API for caching config files Tanay Abhra
  2014-07-02  9:14   ` Matthieu Moy
@ 2014-07-02 17:00   ` Junio C Hamano
  2014-07-02 17:09     ` Matthieu Moy
  2014-07-03 17:05     ` Tanay Abhra
  1 sibling, 2 replies; 17+ messages in thread
From: Junio C Hamano @ 2014-07-02 17:00 UTC (permalink / raw)
  To: Tanay Abhra
  Cc: git, Ramkumar Ramachandra, Matthieu Moy, Karsten Blees,
	Eric Sunshine

Tanay Abhra <tanayabh@gmail.com> writes:

> diff --git a/Documentation/technical/api-config.txt b/Documentation/technical/api-config.txt
> index 230b3a0..2c02fee 100644
> --- a/Documentation/technical/api-config.txt
> +++ b/Documentation/technical/api-config.txt
> @@ -77,6 +77,75 @@ To read a specific file in git-config format, use
>  `git_config_from_file`. This takes the same callback and data parameters
>  as `git_config`.
>  
> +Querying For Specific Variables
> +-------------------------------
> +
> +For programs wanting to query for specific variables in a non-callback
> +manner, the config API provides two functions `git_config_get_value`
> +and `git_config_get_value_multi`.They both read values from an internal
> +cache generated previously from reading the config files.
> +
> +`git_config_get_value` takes two parameters,
> +
> +- a key string in canonical flat form for which the corresponding value

What is "canonical flat form"?

> +  with the highest priority (i.e. value in the repo config will be
> +  preferred over value in user wide config for the same variable) will
> +  be retrieved.
> +
> +- a pointer to a string which will point to the retrieved value. The caller
> +  should not free or modify the value returned as it is owned by the cache.
> +
> +`git_config_get_value` returns 0 on success, or -1 for no value found.
> +
> +`git_config_get_value_multi` allocates and returns a `string_list`
> +containing all the values for the key passed as parameter, sorted in order
> +of increasing priority (Note: caller has to call `string_list_clear` on
> +the returned pointer and then free it).
> +
> +`git_config_get_value_multi` returns NULL for no value found.
> +
> +`git_config_clear` is provided for resetting and invalidating the cache.
> +
> +`git_config_iter` gives a way to iterate over the keys in cache. Existing
> +`git_config` callback function signature is used for iterating.

Instead of doing prose above and then enumeration below,
consistently using the enumeration-style would make the descriptions
of API functions easier to find and read.  Also show what parameters
each function takes.  E.g.

	`git_config_get_value(const char *key, const char **value)`::
		Find the highest-priority value for the
		configuration variable `key`, store the pointer to
		it in `value` and return 0.  When the configuration
		variable `key` is not found, return -1 without
		touching `value`.
                
A few random thoughts.

 - Are "a value for the variable is found" and "no value for the
   variable is found" the only possible outcome?  Should somebody
   (not necessarily the calling code) be notified of an error---for
   example, if you opened a config file successfully but later found
   that the file could not be parsed due to a syntax error or an I/O
   error, is it possible the caller of this function to tell, or are
   these just some special cases of "variable not found"?

 - The same goes for the "multi" variant but it is a bit more grave,
   as a function that returns an "int" can later be updated to
   return different negative values to signal different kinds of
   errors, but a function that returns a pointer to string-list can
   only return one kind of NULL ;-)

 - For the purpose of "git_config_get_value()", what is the value
   returned for core.filemode when the configuration file says
   "[core] filemode\n", i.e. when git_config() callback would get a
   NULL for value to signal a boolean true?

 - Is there a reason why you need a separate and new "config_iter"?
   What does it do differently from the good-old git_config() and
   how?  It does not belong to "Querying for Specific Variables"
   anyway.

> +The config API also provides type specific API functions which do conversion
> +as well as retrieval for the queried variable, including:
> +
> +`git_config_get_int`::
> +Parse the retrieved value to an integer, including unit factors. Dies on
> +error; otherwise, allocates and copies the integer into the dest parameter.
> +Returns 0 on success, or 1 if no value was found.

"allocates and copies"????  Is a caller forced to do something like
this?

	int *val;

	if (!git_config_get_int("int.one", &val)) {
        	use(*val);
                free(val);
	}

I'd expect it to be more like:

	int val;
        if (!git_config_get_int("int.one", &val))
        	use(val);

Also, is there a reason why the "not found" signal is "1" (as
opposed to "-1" for the lower-level get_value() API)?

> +`git_config_get_ulong`::
> +Identical to `git_config_get_int` but for unsigned longs.

Obviously this is not identical but merely "Similar" to.

> +`git_config_bool`::

git_config_get_bool, perhaps?

> +Custom Configsets
> +-----------------
> +
> +A `config_set` points at an ordered set of config files, each of
> +which represents what was read and cached in-core from a file.

This over-specifies the implementation.  Judging from the list of
API functions below, an implementation is possible without having to
keep track of what source files were fed in what order (i.e. it can
just keep a single hash-table of read values and incrementally parse
the new contents given via configset-add-file into it, without even
recording the origins, and forget about the source files once it
finishes parsing).

As was pointed out during the previous review round, a stack of
<hash, filename> tuples cannot represent the configuration when
config-includes are involved.  Also we would eventually want to be
able to also read from non-file sources (e.g. from a blob, or a
caller-supplied in-core strings).

For the purpose of reporting errors at the point of value being
used, I have a suspicion that you would need to associate the
<file,line> pair with each individual value string you will keep
after configset_add_file() parses the file.  The implementation of
the call-chain may look like:


	git_configset_add_file(cs, filename):
        	file = open filename
                lineno = 0
		while read line from file:
	                git_configset_parse_line(cs, filename, lineno++, line)
		close file

	git_configset_parse_line(cs, filename, lineno, line):
        	... this needs to implement a state machine
                ... that keeps track of what has been read halfway
                ... e.g. we may have seen "[core] crlf =\\\n"
        	... earlier, which is not a complete input yet,
                ... and now we may be looking at "auto" that lets
                ... us finally parse one item.

	        if state in 'cs' and new 'line' gives us a complete input:
			determine key and value
                        record <value, filename, lineno> for 'key' to cs.hash
                update state in 'cs'

For processing "git -c section.variable=value", we probably would
call git_configset_parse_line() on the_configset with filename
set to "command line" or something.

And then when the caller tries to actually use the value, e.g.

         git_configset_get_int(cs, key):
		look up <value, filename, lineno> for 'key' from the cs.hash
		if value is successfully parsed as int:
                	return the parsed result
		else:
                	error("not an int: '%s' (%s:%s)", value, filename, lineno)

perhaps?

> +It can be used to construct an in-memory cache for config files that
> +the caller specifies. For example,

This is almost good to help a reader decide if she might want to use
it in her code, but we probably want to stress the fact that use of
a config_set is done for a namespace separate from the main
configuration system, e.g. ".gitmodules".

> +---------------------------------------
> +struct config_set gm_config = CONFIG_SET_INIT;
> +int b;
> +/* we add config files to the config_set */
> +git_configset_add_file(&gm_config, ".gitmodules");
> +git_configset_add_file(&gm_config, ".gitmodules_alt");
> +
> +if (!git_configset_get_bool(gm_config, "submodule.frotz.ignore", &b)) {
> +/* hack hack hack */
> +}
> +
> +/* when we are done with the configset */
> +git_configset_clear(&gm_config);
> +----------------------------------------
> +
> +Configset API provides functions for the above mentioned work flow, including:
> +
> +`git_configset_init`::
> +
> +Returns an allocated and initialized struct `config_set` pointer.

"allocated"???  Is a caller forced to do this, i.e. always have the
config-set on heap?

	struct config_set *config = git_configset_init();
        ... use it ...
        git_configset_clear(config);

I'd expect it be more like this:

	struct config_set config;

	git_configset_init(&config);
        ... use it...
        git_configset_clear(&config);

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/2] add `config_set` API for caching config files
  2014-07-02 17:00   ` Junio C Hamano
@ 2014-07-02 17:09     ` Matthieu Moy
  2014-07-03 17:05     ` Tanay Abhra
  1 sibling, 0 replies; 17+ messages in thread
From: Matthieu Moy @ 2014-07-02 17:09 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Tanay Abhra, git, Ramkumar Ramachandra, Karsten Blees,
	Eric Sunshine

Junio C Hamano <gitster@pobox.com> writes:

> Tanay Abhra <tanayabh@gmail.com> writes:
>
>> diff --git a/Documentation/technical/api-config.txt b/Documentation/technical/api-config.txt
>> index 230b3a0..2c02fee 100644
>> --- a/Documentation/technical/api-config.txt
>> +++ b/Documentation/technical/api-config.txt
>> @@ -77,6 +77,75 @@ To read a specific file in git-config format, use
>>  `git_config_from_file`. This takes the same callback and data parameters
>>  as `git_config`.
>>  
>> +Querying For Specific Variables
>> +-------------------------------
>> +
>> +For programs wanting to query for specific variables in a non-callback
>> +manner, the config API provides two functions `git_config_get_value`
>> +and `git_config_get_value_multi`.They both read values from an internal
>> +cache generated previously from reading the config files.
>> +
>> +`git_config_get_value` takes two parameters,
>> +
>> +- a key string in canonical flat form for which the corresponding value
>
> What is "canonical flat form"?

Actually, it's defined above in the same file (was here before the
patch):

  - the name of the parsed variable. This is in canonical "flat" form: the
    section, subsection, and variable segments will be separated by dots,
    and the section and variable segments will be all lowercase. E.g.,

But it might make sense to reword the doc, e.g. introduce a glossary
section at the beginning.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/2] add `config_set` API for caching config files
  2014-07-02  9:14   ` Matthieu Moy
  2014-07-02 11:58     ` Tanay Abhra
@ 2014-07-02 17:09     ` Junio C Hamano
  2014-07-04  4:58     ` Tanay Abhra
  2 siblings, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2014-07-02 17:09 UTC (permalink / raw)
  To: Matthieu Moy
  Cc: Tanay Abhra, git, Ramkumar Ramachandra, Karsten Blees,
	Eric Sunshine

Matthieu Moy <Matthieu.Moy@grenoble-inp.fr> writes:

> I don't like the name "the_config_set". It's not the only one. Perhaps
> repo_config_set? (not totally satisfactory as it does not contain only
> the repo, but the repo+user+system)
>
> What do others think?

I actually do like "the_configset", which goes nicely parallel to
"the_index".  Neither is the only one, but most of the time our code
operates on the primary one and "the_frotz" refers to it (and
convenience wrappers that does not specify which set defaults to
it).

> What is the reason to deal with `the_config_set` and other config sets
> differently? You're giving arguments to store `the_config_set` as a
> single hashmap, but what is the reason to store others as multiple
> hashmaps?

Confusion, probably.  "the_configset" should be just a singleton
instance used to store the values we use for the default config
system, but its shape should be exactly the same as the other ones
users can use to read .gitmodules and friends with.

> And actually, I don't completely buy your arguments: having 3 or 4
> hashmaps (.git/config, ~/.gitconfig, ~/.config/git/config and
> /etc/gitconfig) would be a O(4) lookup, which is still O(1), and you
> could deal with include directives by having ".git/config and included
> files" in a hashmap, "~/.gitconfig and included files" in a second
> hashmap, ...

OK.

> I would personally find it much simpler to have a single hashmap. We'd
> lose the ability to invalidate the cache for only a single file, but I'm
> not sure it's worth the code complexity.

OK, too.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/2] add `config_set` API for caching config files
  2014-07-02 17:00   ` Junio C Hamano
  2014-07-02 17:09     ` Matthieu Moy
@ 2014-07-03 17:05     ` Tanay Abhra
  1 sibling, 0 replies; 17+ messages in thread
From: Tanay Abhra @ 2014-07-03 17:05 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Ramkumar Ramachandra, Matthieu Moy, Karsten Blees,
	Eric Sunshine



On 7/2/2014 10:30 PM, Junio C Hamano wrote:
> Tanay Abhra <tanayabh@gmail.com> writes:
> 
>> diff --git a/Documentation/technical/api-config.txt b/Documentation/technical/api-config.txt
>> index 230b3a0..2c02fee 100644
>> --- a/Documentation/technical/api-config.txt
>> +++ b/Documentation/technical/api-config.txt
>> @@ -77,6 +77,75 @@ To read a specific file in git-config format, use
>> +`git_config_get_value` returns 0 on success, or -1 for no value found.
>> +
>> +`git_config_get_value_multi` allocates and returns a `string_list`
>> +containing all the values for the key passed as parameter, sorted in order
>> +of increasing priority (Note: caller has to call `string_list_clear` on
>> +the returned pointer and then free it).
>> +
>> +`git_config_get_value_multi` returns NULL for no value found.
>> +
>> +`git_config_clear` is provided for resetting and invalidating the cache.
>> +
>> +`git_config_iter` gives a way to iterate over the keys in cache. Existing
>> +`git_config` callback function signature is used for iterating.
> 
> Instead of doing prose above and then enumeration below,
> consistently using the enumeration-style would make the descriptions
> of API functions easier to find and read.  Also show what parameters
> each function takes.  E.g.
> 

Noted.

>                 
> A few random thoughts.
> 
>  - Are "a value for the variable is found" and "no value for the
>    variable is found" the only possible outcome?  Should somebody
>    (not necessarily the calling code) be notified of an error---for
>    example, if you opened a config file successfully but later found
>    that the file could not be parsed due to a syntax error or an I/O
>    error, is it possible the caller of this function to tell, or are
>    these just some special cases of "variable not found"?

The syntactical errors when parsing the file are shown when it fails to
construct the hashmap. Whenever a caller calls for a value for the first
time, the file is read and parsed and if it fails during parsing it dies
raising a error specifying the lineno and filename.

Variable not found means that the variable is not present in the file,
nothing more. Note: the hashmap code uses git_config() parsing stack
so whatever error it raises in normal git_config() invocation, it
would be raised here too.

>  - The same goes for the "multi" variant but it is a bit more grave,
>    as a function that returns an "int" can later be updated to
>    return different negative values to signal different kinds of
>    errors, but a function that returns a pointer to string-list can
>    only return one kind of NULL ;-)
>

Null pointer just means no variable found in the files. What other kind
of errors may arise?

>  - For the purpose of "git_config_get_value()", what is the value
>    returned for core.filemode when the configuration file says
>    "[core] filemode\n", i.e. when git_config() callback would get a
>    NULL for value to signal a boolean true?

NULL in value pointer with 0 return value denoting variable found.

>  - Is there a reason why you need a separate and new "config_iter"?
>    What does it do differently from the good-old git_config() and
>    how?  It does not belong to "Querying for Specific Variables"
>    anyway.
>

You mentioned previously in the list for a analogue to git_config()
which supports iterating over the keys.
The link is [1] I think, gmane is not working for me currently.

http://thread.gmane.org/gmane.comp.version-control.git/252567

>> +The config API also provides type specific API functions which do conversion
>> +as well as retrieval for the queried variable, including:
>> +
>> +`git_config_get_int`::
>> +Parse the retrieved value to an integer, including unit factors. Dies on
>> +error; otherwise, allocates and copies the integer into the dest parameter.
>> +Returns 0 on success, or 1 if no value was found.
> 
> "allocates and copies"????  Is a caller forced to do something like
> this?
> 
> 	int *val;
> 
> 	if (!git_config_get_int("int.one", &val)) {
>         	use(*val);
>                 free(val);
> 	}
> 
> I'd expect it to be more like:
> 
> 	int val;
>         if (!git_config_get_int("int.one", &val))
>         	use(val);
>

Yup, you are right, my fault.

> Also, is there a reason why the "not found" signal is "1" (as
> opposed to "-1" for the lower-level get_value() API)?
> 

Many of the type specific functions return -1 for wrongful parsing
like git_config_get_string and git_config_get_maybe_bool, that is
why I changed the return value for such functions.

>> +Custom Configsets
>> +-----------------
>> +
>> +A `config_set` points at an ordered set of config files, each of
>> +which represents what was read and cached in-core from a file.
> 
> This over-specifies the implementation.  Judging from the list of
> API functions below, an implementation is possible without having to
> keep track of what source files were fed in what order (i.e. it can
> just keep a single hash-table of read values and incrementally parse
> the new contents given via configset-add-file into it, without even
> recording the origins, and forget about the source files once it
> finishes parsing).
> 
> As was pointed out during the previous review round, a stack of
> <hash, filename> tuples cannot represent the configuration when
> config-includes are involved.  Also we would eventually want to be
> able to also read from non-file sources (e.g. from a blob, or a
> caller-supplied in-core strings).
> 
> For the purpose of reporting errors at the point of value being
> used, I have a suspicion that you would need to associate the
> <file,line> pair with each individual value string you will keep
> after configset_add_file() parses the file.  The implementation of
> the call-chain may look like:
> 
> 
> 	git_configset_add_file(cs, filename):
>         	file = open filename
>                 lineno = 0
> 		while read line from file:
> 	                git_configset_parse_line(cs, filename, lineno++, line)
> 		close file
> 
> 	git_configset_parse_line(cs, filename, lineno, line):
>         	... this needs to implement a state machine
>                 ... that keeps track of what has been read halfway
>                 ... e.g. we may have seen "[core] crlf =\\\n"
>         	... earlier, which is not a complete input yet,
>                 ... and now we may be looking at "auto" that lets
>                 ... us finally parse one item.
> 
> 	        if state in 'cs' and new 'line' gives us a complete input:
> 			determine key and value
>                         record <value, filename, lineno> for 'key' to cs.hash
>                 update state in 'cs'
> 
> For processing "git -c section.variable=value", we probably would
> call git_configset_parse_line() on the_configset with filename
> set to "command line" or something.
> 
> And then when the caller tries to actually use the value, e.g.
> 
>          git_configset_get_int(cs, key):
> 		look up <value, filename, lineno> for 'key' from the cs.hash
> 		if value is successfully parsed as int:
>                 	return the parsed result
> 		else:
>                 	error("not an int: '%s' (%s:%s)", value, filename, lineno)
> 

Okay, lets see what I can do with it.

> 
>> +It can be used to construct an in-memory cache for config files that
>> +the caller specifies. For example,
> 
> This is almost good to help a reader decide if she might want to use
> it in her code, but we probably want to stress the fact that use of
> a config_set is done for a namespace separate from the main
> configuration system, e.g. ".gitmodules".
> 
>> +---------------------------------------
>> +struct config_set gm_config = CONFIG_SET_INIT;
>> +int b;
>> +/* we add config files to the config_set */
>> +git_configset_add_file(&gm_config, ".gitmodules");
>> +git_configset_add_file(&gm_config, ".gitmodules_alt");
>> +
>> +if (!git_configset_get_bool(gm_config, "submodule.frotz.ignore", &b)) {
>> +/* hack hack hack */
>> +}
>> +
>> +/* when we are done with the configset */
>> +git_configset_clear(&gm_config);
>> +----------------------------------------
>> +
>> +Configset API provides functions for the above mentioned work flow, including:
>> +
>> +`git_configset_init`::
>> +
>> +Returns an allocated and initialized struct `config_set` pointer.
> 
> "allocated"???  Is a caller forced to do this, i.e. always have the
> config-set on heap?
> 
> 	struct config_set *config = git_configset_init();
>         ... use it ...
>         git_configset_clear(config);
> 
> I'd expect it be more like this:
> 
> 	struct config_set config;
> 
> 	git_configset_init(&config);
>         ... use it...
>         git_configset_clear(&config);
> 

Yup, I prefer the second one.
Thanks for the review,
Tanay Abhra.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/2] add `config_set` API for caching config files
  2014-07-02  9:14   ` Matthieu Moy
  2014-07-02 11:58     ` Tanay Abhra
  2014-07-02 17:09     ` Junio C Hamano
@ 2014-07-04  4:58     ` Tanay Abhra
  2014-07-04  9:17       ` Matthieu Moy
  2 siblings, 1 reply; 17+ messages in thread
From: Tanay Abhra @ 2014-07-04  4:58 UTC (permalink / raw)
  To: Matthieu Moy; +Cc: git, Ramkumar Ramachandra

Hi,

I have cooked up a single hashmap implementation. What are your
thoughts about it? I must say it is much cleaner than the
previous attempt and it passes all the tests.
I will send the revised patch tomorrow with the corrected
documentation, till then please say so if you prefer this one
or the multi hashmap one.

Cheers,
Tanay Abhra.

-- >8 --
diff --git a/Makefile b/Makefile
index 07ea105..d503f78 100644
--- a/Makefile
+++ b/Makefile
@@ -777,6 +777,7 @@ LIB_OBJS += commit.o
 LIB_OBJS += compat/obstack.o
 LIB_OBJS += compat/terminal.o
 LIB_OBJS += config.o
+LIB_OBJS += config-hash.o
 LIB_OBJS += connect.o
 LIB_OBJS += connected.o
 LIB_OBJS += convert.o
diff --git a/cache.h b/cache.h
index df65231..2a675f6 100644
--- a/cache.h
+++ b/cache.h
@@ -7,6 +7,7 @@
 #include "advice.h"
 #include "gettext.h"
 #include "convert.h"
+#include "string-list.h"

 #include SHA1_HEADER
 #ifndef git_SHA_CTX
@@ -1325,6 +1326,41 @@ extern int parse_config_key(const char *var,
 			    const char **subsection, int *subsection_len,
 			    const char **key);

+/* config-hash.c */
+
+struct config_set {
+	struct hashmap config_hash;
+	int hash_initialized;
+};
+
+extern struct config_set *git_configset_init(void);
+extern int git_configset_add_file(struct config_set *cs, const char *filename);
+extern int git_configset_get_value(struct config_set *cs, const char *key, const char **value);
+extern const struct string_list *git_configset_get_value_multi(struct config_set *cs, const char *key);
+extern void git_configset_clear(struct config_set *cs);
+extern void git_configset_iter(struct config_set *cs, config_fn_t fn, void *data);
+extern int git_configset_get_string(struct config_set *cs, const char *key, const char **dest);
+extern int git_configset_get_int(struct config_set *cs, const char *key, int *dest);
+extern int git_configset_get_ulong(struct config_set *cs, const char *key, unsigned long *dest);
+extern int git_configset_get_bool(struct config_set *cs, const char *key, int *dest);
+extern int git_configset_get_bool_or_int(struct config_set *cs, const char *key, int *is_bool, int *dest);
+extern int git_configset_get_maybe_bool(struct config_set *cs, const char *key, int *dest);
+extern int git_configset_get_pathname(struct config_set *cs, const char *key, const char **dest);
+
+extern int git_config_get_value(const char *key, const char **value);
+extern const struct string_list *git_config_get_value_multi(const char *key);
+extern void git_config_clear(void);
+extern void git_config_iter(config_fn_t fn, void *data);
+extern int git_config_get_string(const char *key, const char **dest);
+extern int git_config_get_int(const char *key, int *dest);
+extern int git_config_get_ulong(const char *key, unsigned long *dest);
+extern int git_config_get_bool(const char *key, int *dest);
+extern int git_config_get_bool_or_int(const char *key, int *is_bool, int *dest);
+extern int git_config_get_maybe_bool(const char *key, int *dest);
+extern int git_config_get_pathname(const char *key, const char **dest);
+
+
 extern int committer_ident_sufficiently_given(void);
 extern int author_ident_sufficiently_given(void);

diff --git a/config-hash.c b/config-hash.c
new file mode 100644
index 0000000..e12bf57
--- /dev/null
+++ b/config-hash.c
@@ -0,0 +1,325 @@
+#include "cache.h"
+#include "hashmap.h"
+#include "string-list.h"
+
+
+/*
+ * Default config_set that contains key-value pairs from the usual set of config
+ * config files (i.e repo specific .git/config, user wide ~/.gitconfig and the
+ * global /etc/gitconfig)
+ */
+static struct config_set the_config_set;
+
+struct config_hash_entry {
+	struct hashmap_entry ent;
+	char *key;
+	struct string_list value_list;
+};
+
+static int config_hash_add_value(const char *, const char *, struct hashmap *);
+
+static int config_hash_entry_cmp(const struct config_hash_entry *e1,
+				 const struct config_hash_entry *e2, const void *unused)
+{
+	return strcmp(e1->key, e2->key);
+}
+
+static void config_hash_init(struct hashmap *config_hash)
+{
+	hashmap_init(config_hash, (hashmap_cmp_fn)config_hash_entry_cmp, 0);
+}
+
+static int config_hash_callback(const char *key, const char *value, void *cb)
+{
+	struct config_set *cs = cb;
+	config_hash_add_value(key, value, &cs->config_hash);
+	return 0;
+}
+
+static int add_configset_hash(const char *filename, struct config_set *cs)
+{
+	int ret = 0;
+	if (!cs->hash_initialized)
+		config_hash_init(&cs->config_hash);
+	cs->hash_initialized = 1;
+	ret = git_config_from_file(config_hash_callback, filename, cs);
+	if (!ret)
+		return 0;
+	else {
+		hashmap_free(&cs->config_hash, 1);
+		cs->hash_initialized = 0;
+		return -1;
+	}
+}
+
+static struct config_hash_entry *config_hash_find_entry(const char *key,
+							struct hashmap *config_hash)
+{
+	struct config_hash_entry k;
+	struct config_hash_entry *found_entry;
+	char *normalized_key;
+	int ret;
+	ret = git_config_parse_key(key, &normalized_key, NULL);
+
+	if (ret)
+		return NULL;
+
+	hashmap_entry_init(&k, strhash(normalized_key));
+	k.key = normalized_key;
+	found_entry = hashmap_get(config_hash, &k, NULL);
+	free(normalized_key);
+	return found_entry;
+}
+
+static struct string_list *configset_get_list(const char *key, struct config_set *cs)
+{
+	struct config_hash_entry *e = config_hash_find_entry(key, &cs->config_hash);
+	return e ? &e->value_list : NULL;
+}
+
+static int config_hash_add_value(const char *key, const char *value, struct hashmap *config_hash)
+{
+	struct config_hash_entry *e;
+	e = config_hash_find_entry(key, config_hash);
+
+	if (!e) {
+		e = xmalloc(sizeof(*e));
+		hashmap_entry_init(e, strhash(key));
+		e->key = xstrdup(key);
+		memset(&e->value_list, 0, sizeof(e->value_list));
+		e->value_list.strdup_strings = 1;
+		hashmap_add(config_hash, e);
+	}
+	/*
+	 * Since the values are being fed by git_config*() callback mechanism, they
+	 * are already normalized. So simply add them without any further munging.
+	 */
+	string_list_append_nodup(&e->value_list, value ? xstrdup(value) : NULL);
+
+	return 0;
+}
+
+struct config_set *git_configset_init(void)
+{
+	struct config_set *cs = xcalloc(1, sizeof(*cs));
+	return cs;
+}
+
+int git_configset_add_file(struct config_set *cs, const char *filename)
+{
+	return add_configset_hash(filename, cs);
+}
+
+int git_configset_get_value(struct config_set *cs, const char *key, const char **value)
+{
+	struct string_list *values = NULL;
+	/*
+	 * Follows "last one wins" semantic, i.e., if there are multiple matches for the
+	 * queried key in the files of the configset the value returned will the last value
+	 * in the last matching file of the configset
+	 */
+	values = configset_get_list(key, cs);
+
+	if (!values)
+		return 1;
+	assert(values->nr > 0);
+	*value = values->items[values->nr - 1].string;
+	return 0;
+}
+
+const struct string_list *git_configset_get_value_multi(struct config_set *cs, const char *key)
+{
+	return configset_get_list(key, cs);
+}
+
+/*
+ * Uses the existing git_config() callback function signature to iterate over
+ * keys in the configset.
+ */
+void git_configset_iter(struct config_set *cs, config_fn_t fn, void *data)
+{
+	int i;
+	char *value;
+	struct string_list *strptr;
+	struct config_hash_entry *entry;
+	struct hashmap_iter iter;
+	hashmap_iter_init(&cs->config_hash, &iter);
+	while ((entry = hashmap_iter_next(&iter))) {
+		strptr = &entry->value_list;
+		for (i = 0; i < strptr->nr; i++) {
+			value = strptr->items[i].string;
+			fn(entry->key, value, data);
+		}
+	}
+}
+
+void git_configset_clear(struct config_set *cs)
+{
+	struct config_hash_entry *entry;
+	struct hashmap_iter iter;
+	if (!cs->hash_initialized)
+		return;
+
+	hashmap_iter_init( &cs->config_hash, &iter);
+	while ((entry = hashmap_iter_next(&iter))) {
+		free(entry->key);
+		string_list_clear(&entry->value_list, 0);
+	}
+	hashmap_free(&cs->config_hash, 1);
+}
+
+int git_configset_get_string(struct config_set *cs, const char *key, const char **dest)
+{
+	const char *value;
+	if (!git_configset_get_value(cs, key, &value))
+		return git_config_string(dest, key, value);
+	else
+		return 1;
+}
+
+int git_configset_get_int(struct config_set *cs, const char *key, int *dest)
+{
+	const char *value;
+	if (!git_configset_get_value(cs, key, &value)) {
+		*dest = git_config_int(key, value);
+		return 0;
+	} else
+		return 1;
+}
+
+int git_configset_get_ulong(struct config_set *cs, const char *key, unsigned long *dest)
+{
+	const char *value;
+	if (!git_configset_get_value(cs, key, &value)) {
+		*dest = git_config_ulong(key, value);
+		return 0;
+	} else
+		return 1;
+}
+
+int git_configset_get_bool(struct config_set *cs, const char *key, int *dest)
+{
+	const char *value;
+	if (!git_configset_get_value(cs, key, &value)) {
+		*dest = git_config_bool(key, value);
+		return 0;
+	} else
+		return 1;
+}
+
+int git_configset_get_bool_or_int(struct config_set *cs, const char *key,
+				int *is_bool, int *dest)
+{
+	const char *value;
+	if (!git_configset_get_value(cs, key, &value)) {
+		*dest = git_config_bool_or_int(key, value, is_bool);
+		return 0;
+	} else
+		return 1;
+}
+
+int git_configset_get_maybe_bool(struct config_set *cs, const char *key, int *dest)
+{
+	const char *value;
+	if (!git_configset_get_value(cs, key, &value)) {
+		*dest = git_config_maybe_bool(key, value);
+		if (*dest == -1)
+			return -1;
+		return 0;
+	} else
+		return 1;
+}
+
+int git_configset_get_pathname(struct config_set *cs, const char *key, const char **dest)
+{
+	const char *value;
+	if (!git_configset_get_value(cs, key, &value))
+		return git_config_pathname(dest, key, value);
+	else
+		return 1;
+}
+
+static int git_config_check_init(void)
+{
+	int ret = 0;
+	if (the_config_set.hash_initialized)
+		return 0;
+	config_hash_init(&the_config_set.config_hash);
+	the_config_set.hash_initialized = 1;
+	ret = git_config(config_hash_callback, &the_config_set);
+	if (ret >= 0)
+		return 0;
+	else {
+		hashmap_free(&the_config_set.config_hash, 1);
+		the_config_set.hash_initialized = 0;
+		return -1;
+	}
+}
+
+int git_config_get_value(const char *key, const char **value)
+{
+	git_config_check_init();
+	return git_configset_get_value(&the_config_set, key, value);
+}
+
+const struct string_list *git_config_get_value_multi(const char *key)
+{
+	git_config_check_init();
+	return git_configset_get_value_multi(&the_config_set, key);
+}
+
+void git_config_clear(void)
+{
+	if (!the_config_set.hash_initialized)
+		return;
+	git_configset_clear(&the_config_set);
+	the_config_set.hash_initialized = 0;
+}
+
+void git_config_iter(config_fn_t fn, void *data)
+{
+	git_config_check_init();
+	git_configset_iter(&the_config_set, fn, data);
+}
+
+int git_config_get_string(const char *key, const char **dest)
+{
+	git_config_check_init();
+	return git_configset_get_string(&the_config_set, key, dest);
+}
+
+int git_config_get_int(const char *key, int *dest)
+{
+	git_config_check_init();
+	return git_configset_get_int(&the_config_set, key, dest);
+}
+
+int git_config_get_ulong(const char *key, unsigned long *dest)
+{
+	git_config_check_init();
+	return git_configset_get_ulong(&the_config_set, key, dest);
+}
+
+int git_config_get_bool(const char *key, int *dest)
+{
+	git_config_check_init();
+	return git_configset_get_bool(&the_config_set, key, dest);
+}
+
+int git_config_get_bool_or_int(const char *key, int *is_bool, int *dest)
+{
+	git_config_check_init();
+	return git_configset_get_bool_or_int(&the_config_set, key, is_bool, dest);
+}
+
+int git_config_get_maybe_bool(const char *key, int *dest)
+{
+	git_config_check_init();
+	return git_configset_get_maybe_bool(&the_config_set, key, dest);
+}
+
+int git_config_get_pathname(const char *key, const char **dest)
+{
+	git_config_check_init();
+	return git_configset_get_pathname(&the_config_set, key, dest);
+}
diff --git a/config.c b/config.c
index a1aef1c..6cffec7 100644
--- a/config.c
+++ b/config.c
@@ -1708,6 +1708,9 @@ int git_config_set_multivar_in_file(const char *config_filename,
 	lock = NULL;
 	ret = 0;

+	/* Invalidate the config cache */
+	git_config_clear();
+
 out_free:
 	if (lock)
 		rollback_lock_file(lock);
-- 
1.9.0.GIT

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/2] add `config_set` API for caching config files
  2014-07-04  4:58     ` Tanay Abhra
@ 2014-07-04  9:17       ` Matthieu Moy
  2014-07-04  9:25         ` Tanay Abhra
  0 siblings, 1 reply; 17+ messages in thread
From: Matthieu Moy @ 2014-07-04  9:17 UTC (permalink / raw)
  To: Tanay Abhra; +Cc: git, Ramkumar Ramachandra

Tanay Abhra <tanayabh@gmail.com> writes:

> Hi,
>
> I have cooked up a single hashmap implementation. What are your
> thoughts about it?

I had a quick look, and it looks good to me. I'll make a more detailed
review when you send the next series.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/2] add `config_set` API for caching config files
  2014-07-04  9:17       ` Matthieu Moy
@ 2014-07-04  9:25         ` Tanay Abhra
  2014-07-04  9:43           ` Matthieu Moy
  2014-07-07 17:02           ` Junio C Hamano
  0 siblings, 2 replies; 17+ messages in thread
From: Tanay Abhra @ 2014-07-04  9:25 UTC (permalink / raw)
  To: Matthieu Moy; +Cc: git, Ramkumar Ramachandra



On 7/4/2014 2:47 PM, Matthieu Moy wrote:
> Tanay Abhra <tanayabh@gmail.com> writes:
> 
>> Hi,
>>
>> I have cooked up a single hashmap implementation. What are your
>> thoughts about it?
> 
> I had a quick look, and it looks good to me. I'll make a more detailed
> review when you send the next series.
>

One more doubt, does <filename,linenr> for every value has any use other than
raising semantic error in typespecific API functions.

For example, if we call git_config_get_int(foo.bar), we can show to the user
"value not a int at <filename, linenr>". Other than that I cannot think of
any other use of it. Currently `git_config_int` dies if value put for
parsing is not an int.

Junio and Karsten, both raised the point for saving <filename,linenr>, but I can't
find any use cases for it other than what I mentioned above.

Thanks.
Tanay.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/2] add `config_set` API for caching config files
  2014-07-04  9:25         ` Tanay Abhra
@ 2014-07-04  9:43           ` Matthieu Moy
  2014-07-07 17:02           ` Junio C Hamano
  1 sibling, 0 replies; 17+ messages in thread
From: Matthieu Moy @ 2014-07-04  9:43 UTC (permalink / raw)
  To: Tanay Abhra; +Cc: git, Ramkumar Ramachandra

Tanay Abhra <tanayabh@gmail.com> writes:

> One more doubt, does <filename,linenr> for every value has any use other than
> raising semantic error in typespecific API functions.

I don't see any other. My suggestion would be: ignore this for now, it's
not needed to get a new API that has at least as many features as the
existing one.

It's easy to add to the data-structure afterwards (although a bit less
easy to add to the code that fills-in the data structure).

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/2] add `config_set` API for caching config files
  2014-07-04  9:25         ` Tanay Abhra
  2014-07-04  9:43           ` Matthieu Moy
@ 2014-07-07 17:02           ` Junio C Hamano
  1 sibling, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2014-07-07 17:02 UTC (permalink / raw)
  To: Tanay Abhra; +Cc: Matthieu Moy, git, Ramkumar Ramachandra

Tanay Abhra <tanayabh@gmail.com> writes:

> On 7/4/2014 2:47 PM, Matthieu Moy wrote:
>> Tanay Abhra <tanayabh@gmail.com> writes:
>> 
>>> Hi,
>>>
>>> I have cooked up a single hashmap implementation. What are your
>>> thoughts about it?
>> 
>> I had a quick look, and it looks good to me. I'll make a more detailed
>> review when you send the next series.
>>
>
> One more doubt, does <filename,linenr> for every value has any use other than
> raising semantic error in typespecific API functions.
>
> For example, if we call git_config_get_int(foo.bar), we can show to the user
> "value not a int at <filename, linenr>". Other than that I cannot think of
> any other use of it. Currently `git_config_int` dies if value put for
> parsing is not an int.
>
> Junio and Karsten, both raised the point for saving <filename,linenr>, but I can't
> find any use cases for it other than what I mentioned above.

Yes, error reporting is what the pair needs to be kept for.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2014-07-07 17:02 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-07-02  6:01 [PATCH v4 0/3] git config cache & special querying api utilizing the cache Tanay Abhra
2014-07-02  6:01 ` [PATCH v4 1/2] add `config_set` API for caching config files Tanay Abhra
2014-07-02  9:14   ` Matthieu Moy
2014-07-02 11:58     ` Tanay Abhra
2014-07-02 14:32       ` Matthieu Moy
2014-07-02 17:09     ` Junio C Hamano
2014-07-04  4:58     ` Tanay Abhra
2014-07-04  9:17       ` Matthieu Moy
2014-07-04  9:25         ` Tanay Abhra
2014-07-04  9:43           ` Matthieu Moy
2014-07-07 17:02           ` Junio C Hamano
2014-07-02 17:00   ` Junio C Hamano
2014-07-02 17:09     ` Matthieu Moy
2014-07-03 17:05     ` Tanay Abhra
2014-07-02  6:01 ` [PATCH 2/2] test-config: Add tests for the config_set API Tanay Abhra
2014-07-02  9:29   ` Matthieu Moy
2014-07-02 12:04     ` Tanay Abhra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).