* [PATCH v4 0/4] submodule config lookup API
@ 2015-06-02 14:24 Heiko Voigt
2015-06-02 14:25 ` [PATCH v4 1/4] implement submodule config cache for lookup of submodule names Heiko Voigt
` (3 more replies)
0 siblings, 4 replies; 7+ messages in thread
From: Heiko Voigt @ 2015-06-02 14:24 UTC (permalink / raw)
To: Junio C Hamano
Cc: git, Jens Lehmann, Jonathan Nieder, Jeff King, W. Trevor King,
Eric Sunshine, Karsten Blees
Since there were no other comments than style fixes here is another
iteration. The last iteration can be found here:
http://article.gmane.org/gmane.comp.version-control.git/269611
I found there is a merge conflict with master so I rebased this series
to master.
The interdiff to last iteration contains nothing spectacular:
diff --git a/submodule-config.c b/submodule-config.c
index 177767d..199692b 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -27,7 +27,7 @@ struct submodule_entry {
enum lookup_type {
lookup_name,
- lookup_path,
+ lookup_path
};
static struct submodule_cache cache;
@@ -390,12 +390,12 @@ static const struct submodule *config_from(struct submodule_cache *cache,
return submodule;
switch (lookup_type) {
- case lookup_name:
- submodule = cache_lookup_name(cache, sha1, key);
- break;
- case lookup_path:
- submodule = cache_lookup_path(cache, sha1, key);
- break;
+ case lookup_name:
+ submodule = cache_lookup_name(cache, sha1, key);
+ break;
+ case lookup_path:
+ submodule = cache_lookup_path(cache, sha1, key);
+ break;
}
if (submodule)
return submodule;
@@ -419,12 +419,12 @@ static const struct submodule *config_from(struct submodule_cache *cache,
free(config);
switch (lookup_type) {
- case lookup_name:
- submodule = cache_lookup_name(cache, sha1, key);
- break;
- case lookup_path:
- submodule = cache_lookup_path(cache, sha1, key);
- break;
+ case lookup_name:
+ submodule = cache_lookup_name(cache, sha1, key);
+ break;
+ case lookup_path:
+ submodule = cache_lookup_path(cache, sha1, key);
+ break;
}
return submodule;
diff --git a/submodule-config.h b/submodule-config.h
index 58afc83..9061e4e 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -24,6 +24,6 @@ const struct submodule *submodule_from_name(const unsigned char *commit_sha1,
const char *name);
const struct submodule *submodule_from_path(const unsigned char *commit_sha1,
const char *path);
-void submodule_free();
+void submodule_free(void);
#endif /* SUBMODULE_CONFIG_H */
Heiko Voigt (4):
implement submodule config cache for lookup of submodule names
extract functions for submodule config set and lookup
use new config API for worktree configurations of submodules
do not die on error of parsing fetchrecursesubmodules option
.gitignore | 1 +
Documentation/technical/api-submodule-config.txt | 63 +++
Makefile | 2 +
builtin/checkout.c | 1 +
builtin/fetch.c | 1 +
diff.c | 1 +
submodule-config.c | 484 +++++++++++++++++++++++
submodule-config.h | 29 ++
submodule.c | 122 ++----
submodule.h | 4 +-
t/t7411-submodule-config.sh | 153 +++++++
test-submodule-config.c | 76 ++++
12 files changed, 839 insertions(+), 98 deletions(-)
create mode 100644 Documentation/technical/api-submodule-config.txt
create mode 100644 submodule-config.c
create mode 100644 submodule-config.h
create mode 100755 t/t7411-submodule-config.sh
create mode 100644 test-submodule-config.c
--
2.4.2.391.g2979c89
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v4 1/4] implement submodule config cache for lookup of submodule names
2015-06-02 14:24 [PATCH v4 0/4] submodule config lookup API Heiko Voigt
@ 2015-06-02 14:25 ` Heiko Voigt
2015-06-02 19:57 ` Junio C Hamano
2015-06-02 14:26 ` [PATCH v4 2/4] extract functions for submodule config set and lookup Heiko Voigt
` (2 subsequent siblings)
3 siblings, 1 reply; 7+ messages in thread
From: Heiko Voigt @ 2015-06-02 14:25 UTC (permalink / raw)
To: Junio C Hamano
Cc: git, Jens Lehmann, Jonathan Nieder, Jeff King, W. Trevor King,
Eric Sunshine, Karsten Blees
This submodule configuration cache allows us to lazily read .gitmodules
configurations by commit into a runtime cache which can then be used to
easily lookup values from it. Currently only the values for path or name
are stored but it can be extended for any value needed.
It is expected that .gitmodules files do not change often between
commits. Thats why we lookup the .gitmodules sha1 from a commit and then
either lookup an already parsed configuration or parse and cache an
unknown one for each sha1. The cache is lazily build on demand for each
requested commit.
This cache can be used for all purposes which need knowledge about
submodule configurations. Example use cases are:
* Recursive submodule checkout needs lookup a submodule name from its
path when a submodule first appears. This needs be done before this
configuration exists in the worktree.
* The implementation of submodule support for 'git archive' needs to
lookup the submodule name to generate the archive when given a
revision that is not checked out.
* 'git fetch' when given the --recurse-submodules=on-demand option (or
configuration) needs to lookup submodule names by path from the
database rather than reading from the worktree. For new submodule it
needs to lookup the name from its path to allow cloning new
submodules into the .git folder so they can be checked out without
any network interaction when the user does a checkout of that
revision.
Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net>
---
.gitignore | 1 +
Documentation/technical/api-submodule-config.txt | 46 +++
Makefile | 2 +
submodule-config.c | 445 +++++++++++++++++++++++
submodule-config.h | 27 ++
submodule.c | 1 +
submodule.h | 1 +
t/t7411-submodule-config.sh | 85 +++++
test-submodule-config.c | 66 ++++
9 files changed, 674 insertions(+)
create mode 100644 Documentation/technical/api-submodule-config.txt
create mode 100644 submodule-config.c
create mode 100644 submodule-config.h
create mode 100755 t/t7411-submodule-config.sh
create mode 100644 test-submodule-config.c
diff --git a/.gitignore b/.gitignore
index 422c538..337c121 100644
--- a/.gitignore
+++ b/.gitignore
@@ -204,6 +204,7 @@
/test-sha1-array
/test-sigchain
/test-string-list
+/test-submodule-config
/test-subprocess
/test-svn-fe
/test-urlmatch-normalization
diff --git a/Documentation/technical/api-submodule-config.txt b/Documentation/technical/api-submodule-config.txt
new file mode 100644
index 0000000..2ff4907
--- /dev/null
+++ b/Documentation/technical/api-submodule-config.txt
@@ -0,0 +1,46 @@
+submodule config cache API
+==========================
+
+The submodule config cache API allows to read submodule
+configurations/information from specified revisions. Internally
+information is lazily read into a cache that is used to avoid
+unnecessary parsing of the same .gitmodule files. Lookups can be done by
+submodule path or name.
+
+Usage
+-----
+
+The caller can look up information about submodules by using the
+`submodule_from_path()` or `submodule_from_name()` functions. They return
+a `struct submodule` which contains the values. The API automatically
+initializes and allocates the needed infrastructure on-demand.
+
+If the internal cache might grow too big or when the caller is done with
+the API, all internally cached values can be freed with submodule_free().
+
+Data Structures
+---------------
+
+`struct submodule`::
+
+ This structure is used to return the information about one
+ submodule for a certain revision. It is returned by the lookup
+ functions.
+
+Functions
+---------
+
+`void submodule_free()`::
+
+ Use these to free the internally cached values.
+
+`const struct submodule *submodule_from_path(const unsigned char *commit_sha1, const char *path)`::
+
+ Lookup values for one submodule by its commit_sha1 and path or
+ name.
+
+`const struct submodule *submodule_from_name(const unsigned char *commit_sha1, const char *name)`::
+
+ The same as above but lookup by name.
+
+For an example usage see test-submodule-config.c.
diff --git a/Makefile b/Makefile
index 54ec511..5d9a63f 100644
--- a/Makefile
+++ b/Makefile
@@ -594,6 +594,7 @@ TEST_PROGRAMS_NEED_X += test-sha1
TEST_PROGRAMS_NEED_X += test-sha1-array
TEST_PROGRAMS_NEED_X += test-sigchain
TEST_PROGRAMS_NEED_X += test-string-list
+TEST_PROGRAMS_NEED_X += test-submodule-config
TEST_PROGRAMS_NEED_X += test-subprocess
TEST_PROGRAMS_NEED_X += test-svn-fe
TEST_PROGRAMS_NEED_X += test-urlmatch-normalization
@@ -784,6 +785,7 @@ LIB_OBJS += strbuf.o
LIB_OBJS += streaming.o
LIB_OBJS += string-list.o
LIB_OBJS += submodule.o
+LIB_OBJS += submodule-config.o
LIB_OBJS += symlinks.o
LIB_OBJS += tag.o
LIB_OBJS += trace.o
diff --git a/submodule-config.c b/submodule-config.c
new file mode 100644
index 0000000..97f4a04
--- /dev/null
+++ b/submodule-config.c
@@ -0,0 +1,445 @@
+#include "cache.h"
+#include "submodule-config.h"
+#include "submodule.h"
+#include "strbuf.h"
+
+/*
+ * submodule cache lookup structure
+ * There is one shared set of 'struct submodule' entries which can be
+ * looked up by their sha1 blob id of the .gitmodule file and either
+ * using path or name as key.
+ * for_path stores submodule entries with path as key
+ * for_name stores submodule entries with name as key
+ */
+struct submodule_cache {
+ struct hashmap for_path;
+ struct hashmap for_name;
+};
+
+/*
+ * thin wrapper struct needed to insert 'struct submodule' entries to
+ * the hashmap
+ */
+struct submodule_entry {
+ struct hashmap_entry ent;
+ struct submodule *config;
+};
+
+enum lookup_type {
+ lookup_name,
+ lookup_path
+};
+
+static struct submodule_cache cache;
+static int is_cache_init;
+
+static int config_path_cmp(const struct submodule_entry *a,
+ const struct submodule_entry *b,
+ const void *unused)
+{
+ return strcmp(a->config->path, b->config->path) ||
+ hashcmp(a->config->gitmodules_sha1, b->config->gitmodules_sha1);
+}
+
+static int config_name_cmp(const struct submodule_entry *a,
+ const struct submodule_entry *b,
+ const void *unused)
+{
+ return strcmp(a->config->name, b->config->name) ||
+ hashcmp(a->config->gitmodules_sha1, b->config->gitmodules_sha1);
+}
+
+static void cache_init(struct submodule_cache *cache)
+{
+ hashmap_init(&cache->for_path, (hashmap_cmp_fn) config_path_cmp, 0);
+ hashmap_init(&cache->for_name, (hashmap_cmp_fn) config_name_cmp, 0);
+}
+
+static void free_one_config(struct submodule_entry *entry)
+{
+ free((void *) entry->config->path);
+ free((void *) entry->config->name);
+ free(entry->config);
+}
+
+static void cache_free(struct submodule_cache *cache)
+{
+ struct hashmap_iter iter;
+ struct submodule_entry *entry;
+
+ /*
+ * We iterate over the name hash here to be symmetric with the
+ * allocation of struct submodule entries. Each is allocated by
+ * their .gitmodule blob sha1 and submodule name.
+ */
+ hashmap_iter_init(&cache->for_name, &iter);
+ while ((entry = hashmap_iter_next(&iter)))
+ free_one_config(entry);
+
+ hashmap_free(&cache->for_path, 1);
+ hashmap_free(&cache->for_name, 1);
+}
+
+static unsigned int hash_sha1_string(const unsigned char *sha1,
+ const char *string)
+{
+ return memhash(sha1, 20) + strhash(string);
+}
+
+static void cache_put_path(struct submodule_cache *cache,
+ struct submodule *submodule)
+{
+ unsigned int hash = hash_sha1_string(submodule->gitmodules_sha1,
+ submodule->path);
+ struct submodule_entry *e = xmalloc(sizeof(*e));
+ hashmap_entry_init(e, hash);
+ e->config = submodule;
+ hashmap_put(&cache->for_path, e);
+}
+
+static void cache_remove_path(struct submodule_cache *cache,
+ struct submodule *submodule)
+{
+ unsigned int hash = hash_sha1_string(submodule->gitmodules_sha1,
+ submodule->path);
+ struct submodule_entry e;
+ struct submodule_entry *removed;
+ hashmap_entry_init(&e, hash);
+ e.config = submodule;
+ removed = hashmap_remove(&cache->for_path, &e, NULL);
+ free(removed);
+}
+
+static void cache_add(struct submodule_cache *cache,
+ struct submodule *submodule)
+{
+ unsigned int hash = hash_sha1_string(submodule->gitmodules_sha1,
+ submodule->name);
+ struct submodule_entry *e = xmalloc(sizeof(*e));
+ hashmap_entry_init(e, hash);
+ e->config = submodule;
+ hashmap_add(&cache->for_name, e);
+}
+
+static const struct submodule *cache_lookup_path(struct submodule_cache *cache,
+ const unsigned char *gitmodules_sha1, const char *path)
+{
+ struct submodule_entry *entry;
+ unsigned int hash = hash_sha1_string(gitmodules_sha1, path);
+ struct submodule_entry key;
+ struct submodule key_config;
+
+ hashcpy(key_config.gitmodules_sha1, gitmodules_sha1);
+ key_config.path = path;
+
+ hashmap_entry_init(&key, hash);
+ key.config = &key_config;
+
+ entry = hashmap_get(&cache->for_path, &key, NULL);
+ if (entry)
+ return entry->config;
+ return NULL;
+}
+
+static struct submodule *cache_lookup_name(struct submodule_cache *cache,
+ const unsigned char *gitmodules_sha1, const char *name)
+{
+ struct submodule_entry *entry;
+ unsigned int hash = hash_sha1_string(gitmodules_sha1, name);
+ struct submodule_entry key;
+ struct submodule key_config;
+
+ hashcpy(key_config.gitmodules_sha1, gitmodules_sha1);
+ key_config.name = name;
+
+ hashmap_entry_init(&key, hash);
+ key.config = &key_config;
+
+ entry = hashmap_get(&cache->for_name, &key, NULL);
+ if (entry)
+ return entry->config;
+ return NULL;
+}
+
+static int name_and_item_from_var(const char *var, struct strbuf *name,
+ struct strbuf *item)
+{
+ const char *subsection, *key;
+ int subsection_len, parse;
+ parse = parse_config_key(var, "submodule", &subsection,
+ &subsection_len, &key);
+ if (parse < 0 || !subsection)
+ return 0;
+
+ strbuf_add(name, subsection, subsection_len);
+ strbuf_addstr(item, key);
+
+ return 1;
+}
+
+static struct submodule *lookup_or_create_by_name(struct submodule_cache *cache,
+ const unsigned char *gitmodules_sha1, const char *name)
+{
+ struct submodule *submodule;
+ struct strbuf name_buf = STRBUF_INIT;
+
+ submodule = cache_lookup_name(cache, gitmodules_sha1, name);
+ if (submodule)
+ return submodule;
+
+ submodule = xmalloc(sizeof(*submodule));
+
+ strbuf_addstr(&name_buf, name);
+ submodule->name = strbuf_detach(&name_buf, NULL);
+
+ submodule->path = NULL;
+ submodule->url = NULL;
+ submodule->fetch_recurse = RECURSE_SUBMODULES_NONE;
+ submodule->ignore = NULL;
+
+ hashcpy(submodule->gitmodules_sha1, gitmodules_sha1);
+
+ cache_add(cache, submodule);
+
+ return submodule;
+}
+
+static void warn_multiple_config(const unsigned char *commit_sha1,
+ const char *name, const char *option)
+{
+ const char *commit_string = "WORKTREE";
+ if (commit_sha1)
+ commit_string = sha1_to_hex(commit_sha1);
+ warning("%s:.gitmodules, multiple configurations found for "
+ "'submodule.%s.%s'. Skipping second one!",
+ commit_string, name, option);
+}
+
+struct parse_config_parameter {
+ struct submodule_cache *cache;
+ const unsigned char *commit_sha1;
+ const unsigned char *gitmodules_sha1;
+ int overwrite;
+};
+
+static int parse_config(const char *var, const char *value, void *data)
+{
+ struct parse_config_parameter *me = data;
+ struct submodule *submodule;
+ struct strbuf name = STRBUF_INIT, item = STRBUF_INIT;
+ int ret = 0;
+
+ /* this also ensures that we only parse submodule entries */
+ if (!name_and_item_from_var(var, &name, &item))
+ return 0;
+
+ submodule = lookup_or_create_by_name(me->cache, me->gitmodules_sha1,
+ name.buf);
+
+ if (!strcmp(item.buf, "path")) {
+ struct strbuf path = STRBUF_INIT;
+ if (!value) {
+ ret = config_error_nonbool(var);
+ goto release_return;
+ }
+ if (!me->overwrite && submodule->path != NULL) {
+ warn_multiple_config(me->commit_sha1, submodule->name,
+ "path");
+ goto release_return;
+ }
+
+ if (submodule->path)
+ cache_remove_path(me->cache, submodule);
+ free((void *) submodule->path);
+ strbuf_addstr(&path, value);
+ submodule->path = strbuf_detach(&path, NULL);
+ cache_put_path(me->cache, submodule);
+ } else if (!strcmp(item.buf, "fetchrecursesubmodules")) {
+ if (!me->overwrite &&
+ submodule->fetch_recurse != RECURSE_SUBMODULES_NONE) {
+ warn_multiple_config(me->commit_sha1, submodule->name,
+ "fetchrecursesubmodules");
+ goto release_return;
+ }
+
+ submodule->fetch_recurse = parse_fetch_recurse_submodules_arg(var, value);
+ } else if (!strcmp(item.buf, "ignore")) {
+ struct strbuf ignore = STRBUF_INIT;
+ if (!me->overwrite && submodule->ignore != NULL) {
+ warn_multiple_config(me->commit_sha1, submodule->name,
+ "ignore");
+ goto release_return;
+ }
+ if (!value) {
+ ret = config_error_nonbool(var);
+ goto release_return;
+ }
+ if (strcmp(value, "untracked") && strcmp(value, "dirty") &&
+ strcmp(value, "all") && strcmp(value, "none")) {
+ warning("Invalid parameter '%s' for config option "
+ "'submodule.%s.ignore'", value, var);
+ goto release_return;
+ }
+
+ free((void *) submodule->ignore);
+ strbuf_addstr(&ignore, value);
+ submodule->ignore = strbuf_detach(&ignore, NULL);
+ } else if (!strcmp(item.buf, "url")) {
+ struct strbuf url = STRBUF_INIT;
+ if (!value) {
+ ret = config_error_nonbool(var);
+ goto release_return;
+ }
+ if (!me->overwrite && submodule->url != NULL) {
+ warn_multiple_config(me->commit_sha1, submodule->name,
+ "url");
+ goto release_return;
+ }
+
+ free((void *) submodule->url);
+ strbuf_addstr(&url, value);
+ submodule->url = strbuf_detach(&url, NULL);
+ }
+
+release_return:
+ strbuf_release(&name);
+ strbuf_release(&item);
+
+ return ret;
+}
+
+static int gitmodule_sha1_from_commit(const unsigned char *commit_sha1,
+ unsigned char *gitmodules_sha1)
+{
+ struct strbuf rev = STRBUF_INIT;
+ int ret = 0;
+
+ if (is_null_sha1(commit_sha1)) {
+ hashcpy(gitmodules_sha1, null_sha1);
+ return 1;
+ }
+
+ strbuf_addf(&rev, "%s:.gitmodules", sha1_to_hex(commit_sha1));
+ if (get_sha1(rev.buf, gitmodules_sha1) >= 0)
+ ret = 1;
+
+ strbuf_release(&rev);
+ return ret;
+}
+
+/* This does a lookup of a submodule configuration by name or by path
+ * (key) with on-demand reading of the appropriate .gitmodules from
+ * revisions.
+ */
+static const struct submodule *config_from(struct submodule_cache *cache,
+ const unsigned char *commit_sha1, const char *key,
+ enum lookup_type lookup_type)
+{
+ struct strbuf rev = STRBUF_INIT;
+ unsigned long config_size;
+ char *config;
+ unsigned char sha1[20];
+ enum object_type type;
+ const struct submodule *submodule = NULL;
+ struct parse_config_parameter parameter;
+
+ /*
+ * If any parameter except the cache is a NULL pointer just
+ * return the first submodule. Can be used to check whether
+ * there are any submodules parsed.
+ */
+ if (!commit_sha1 || !key) {
+ struct hashmap_iter iter;
+ struct submodule_entry *entry;
+
+ hashmap_iter_init(&cache->for_name, &iter);
+ entry = hashmap_iter_next(&iter);
+ if (!entry)
+ return NULL;
+ return entry->config;
+ }
+
+ if (!gitmodule_sha1_from_commit(commit_sha1, sha1))
+ return submodule;
+
+ switch (lookup_type) {
+ case lookup_name:
+ submodule = cache_lookup_name(cache, sha1, key);
+ break;
+ case lookup_path:
+ submodule = cache_lookup_path(cache, sha1, key);
+ break;
+ }
+ if (submodule)
+ return submodule;
+
+ config = read_sha1_file(sha1, &type, &config_size);
+ if (!config)
+ return submodule;
+
+ if (type != OBJ_BLOB) {
+ free(config);
+ return submodule;
+ }
+
+ /* fill the submodule config into the cache */
+ parameter.cache = cache;
+ parameter.commit_sha1 = commit_sha1;
+ parameter.gitmodules_sha1 = sha1;
+ parameter.overwrite = 0;
+ git_config_from_buf(parse_config, rev.buf, config, config_size,
+ ¶meter);
+ free(config);
+
+ switch (lookup_type) {
+ case lookup_name:
+ submodule = cache_lookup_name(cache, sha1, key);
+ break;
+ case lookup_path:
+ submodule = cache_lookup_path(cache, sha1, key);
+ break;
+ }
+
+ return submodule;
+}
+
+static const struct submodule *config_from_path(struct submodule_cache *cache,
+ const unsigned char *commit_sha1, const char *path)
+{
+ return config_from(cache, commit_sha1, path, lookup_path);
+}
+
+static const struct submodule *config_from_name(struct submodule_cache *cache,
+ const unsigned char *commit_sha1, const char *name)
+{
+ return config_from(cache, commit_sha1, name, lookup_name);
+}
+
+static void ensure_cache_init(void)
+{
+ if (is_cache_init)
+ return;
+
+ cache_init(&cache);
+ is_cache_init = 1;
+}
+
+const struct submodule *submodule_from_name(const unsigned char *commit_sha1,
+ const char *name)
+{
+ ensure_cache_init();
+ return config_from_name(&cache, commit_sha1, name);
+}
+
+const struct submodule *submodule_from_path(const unsigned char *commit_sha1,
+ const char *path)
+{
+ ensure_cache_init();
+ return config_from_path(&cache, commit_sha1, path);
+}
+
+void submodule_free(void)
+{
+ cache_free(&cache);
+ is_cache_init = 0;
+}
diff --git a/submodule-config.h b/submodule-config.h
new file mode 100644
index 0000000..cd68030
--- /dev/null
+++ b/submodule-config.h
@@ -0,0 +1,27 @@
+#ifndef SUBMODULE_CONFIG_CACHE_H
+#define SUBMODULE_CONFIG_CACHE_H
+
+#include "hashmap.h"
+#include "strbuf.h"
+
+/*
+ * Submodule entry containing the information about a certain submodule
+ * in a certain revision.
+ */
+struct submodule {
+ const char *path;
+ const char *name;
+ const char *url;
+ int fetch_recurse;
+ const char *ignore;
+ /* the sha1 blob id of the responsible .gitmodules file */
+ unsigned char gitmodules_sha1[20];
+};
+
+const struct submodule *submodule_from_name(const unsigned char *commit_sha1,
+ const char *name);
+const struct submodule *submodule_from_path(const unsigned char *commit_sha1,
+ const char *path);
+void submodule_free(void);
+
+#endif /* SUBMODULE_CONFIG_H */
diff --git a/submodule.c b/submodule.c
index b8747f5..7822dc5 100644
--- a/submodule.c
+++ b/submodule.c
@@ -355,6 +355,7 @@ int parse_fetch_recurse_submodules_arg(const char *opt, const char *arg)
default:
if (!strcmp(arg, "on-demand"))
return RECURSE_SUBMODULES_ON_DEMAND;
+ /* TODO: remove the die for history parsing here */
die("bad %s argument: %s", opt, arg);
}
}
diff --git a/submodule.h b/submodule.h
index 7beec48..920fef3 100644
--- a/submodule.h
+++ b/submodule.h
@@ -5,6 +5,7 @@ struct diff_options;
struct argv_array;
enum {
+ RECURSE_SUBMODULES_NONE = -2,
RECURSE_SUBMODULES_ON_DEMAND = -1,
RECURSE_SUBMODULES_OFF = 0,
RECURSE_SUBMODULES_DEFAULT = 1,
diff --git a/t/t7411-submodule-config.sh b/t/t7411-submodule-config.sh
new file mode 100755
index 0000000..2602bc5
--- /dev/null
+++ b/t/t7411-submodule-config.sh
@@ -0,0 +1,85 @@
+#!/bin/sh
+#
+# Copyright (c) 2014 Heiko Voigt
+#
+
+test_description='Test submodules config cache infrastructure
+
+This test verifies that parsing .gitmodules configuration directly
+from the database works.
+'
+
+TEST_NO_CREATE_REPO=1
+. ./test-lib.sh
+
+test_expect_success 'submodule config cache setup' '
+ mkdir submodule &&
+ (cd submodule &&
+ git init &&
+ echo a >a &&
+ git add . &&
+ git commit -ma
+ ) &&
+ mkdir super &&
+ (cd super &&
+ git init &&
+ git submodule add ../submodule &&
+ git submodule add ../submodule a &&
+ git commit -m "add as submodule and as a" &&
+ git mv a b &&
+ git commit -m "move a to b"
+ )
+'
+
+cat >super/expect <<EOF
+Submodule name: 'a' for path 'a'
+Submodule name: 'a' for path 'b'
+Submodule name: 'submodule' for path 'submodule'
+Submodule name: 'submodule' for path 'submodule'
+EOF
+
+test_expect_success 'test parsing and lookup of submodule config by path' '
+ (cd super &&
+ test-submodule-config \
+ HEAD^ a \
+ HEAD b \
+ HEAD^ submodule \
+ HEAD submodule \
+ >actual &&
+ test_cmp expect actual
+ )
+'
+
+test_expect_success 'test parsing and lookup of submodule config by name' '
+ (cd super &&
+ test-submodule-config --name \
+ HEAD^ a \
+ HEAD a \
+ HEAD^ submodule \
+ HEAD submodule \
+ >actual &&
+ test_cmp expect actual
+ )
+'
+
+cat >super/expect_error <<EOF
+Submodule name: 'a' for path 'b'
+Submodule name: 'submodule' for path 'submodule'
+EOF
+
+test_expect_success 'error in one submodule config lets continue' '
+ (cd super &&
+ cp .gitmodules .gitmodules.bak &&
+ echo " value = \"" >>.gitmodules &&
+ git add .gitmodules &&
+ mv .gitmodules.bak .gitmodules &&
+ git commit -m "add error" &&
+ test-submodule-config \
+ HEAD b \
+ HEAD submodule \
+ >actual &&
+ test_cmp expect_error actual
+ )
+'
+
+test_done
diff --git a/test-submodule-config.c b/test-submodule-config.c
new file mode 100644
index 0000000..f3c3918
--- /dev/null
+++ b/test-submodule-config.c
@@ -0,0 +1,66 @@
+#include "cache.h"
+#include "submodule-config.h"
+
+static void die_usage(int argc, char **argv, const char *msg)
+{
+ fprintf(stderr, "%s\n", msg);
+ fprintf(stderr, "Usage: %s [<commit> <submodulepath>] ...\n", argv[0]);
+ exit(1);
+}
+
+int main(int argc, char **argv)
+{
+ char **arg = argv;
+ int my_argc = argc;
+ int output_url = 0;
+ int lookup_name = 0;
+
+ arg++;
+ my_argc--;
+ while (starts_with(arg[0], "--")) {
+ if (!strcmp(arg[0], "--url"))
+ output_url = 1;
+ if (!strcmp(arg[0], "--name"))
+ lookup_name = 1;
+ arg++;
+ my_argc--;
+ }
+
+ if (my_argc % 2 != 0)
+ die_usage(argc, argv, "Wrong number of arguments.");
+
+ while (*arg) {
+ unsigned char commit_sha1[20];
+ const struct submodule *submodule;
+ const char *commit;
+ const char *path_or_name;
+
+ commit = arg[0];
+ path_or_name = arg[1];
+
+ if (commit[0] == '\0')
+ hashcpy(commit_sha1, null_sha1);
+ else if (get_sha1(commit, commit_sha1) < 0)
+ die_usage(argc, argv, "Commit not found.");
+
+ if (lookup_name) {
+ submodule = submodule_from_name(commit_sha1, path_or_name);
+ } else
+ submodule = submodule_from_path(commit_sha1, path_or_name);
+ if (!submodule)
+ die_usage(argc, argv, "Submodule not found.");
+
+ if (output_url)
+ printf("Submodule url: '%s' for path '%s'\n",
+ submodule->url, submodule->path);
+ else
+ printf("Submodule name: '%s' for path '%s'\n",
+ submodule->name, submodule->path);
+
+ arg += 2;
+ }
+
+ submodule_free();
+
+ return 0;
+}
--
2.4.2.391.g2979c89
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v4 2/4] extract functions for submodule config set and lookup
2015-06-02 14:24 [PATCH v4 0/4] submodule config lookup API Heiko Voigt
2015-06-02 14:25 ` [PATCH v4 1/4] implement submodule config cache for lookup of submodule names Heiko Voigt
@ 2015-06-02 14:26 ` Heiko Voigt
2015-06-02 14:27 ` [PATCH v4 3/4] use new config API for worktree configurations of submodules Heiko Voigt
2015-06-02 14:28 ` [PATCH v4 4/4] do not die on error of parsing fetchrecursesubmodules option Heiko Voigt
3 siblings, 0 replies; 7+ messages in thread
From: Heiko Voigt @ 2015-06-02 14:26 UTC (permalink / raw)
To: Junio C Hamano
Cc: git, Jens Lehmann, Jonathan Nieder, Jeff King, W. Trevor King,
Eric Sunshine, Karsten Blees
This is one step towards using the new configuration API. We just
extract these functions to make replacing the actual code easier.
Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net>
---
submodule.c | 142 +++++++++++++++++++++++++++++++++++++++++-------------------
1 file changed, 97 insertions(+), 45 deletions(-)
diff --git a/submodule.c b/submodule.c
index 7822dc5..c3b5f44 100644
--- a/submodule.c
+++ b/submodule.c
@@ -41,6 +41,76 @@ static int gitmodules_is_unmerged;
*/
static int gitmodules_is_modified;
+static const char *get_name_for_path(const char *path)
+{
+ struct string_list_item *path_option;
+ if (path == NULL) {
+ if (config_name_for_path.nr > 0)
+ return config_name_for_path.items[0].util;
+ else
+ return NULL;
+ }
+ path_option = unsorted_string_list_lookup(&config_name_for_path, path);
+ if (!path_option)
+ return NULL;
+ return path_option->util;
+}
+
+static void set_name_for_path(const char *path, const char *name, int namelen)
+{
+ struct string_list_item *config;
+ config = unsorted_string_list_lookup(&config_name_for_path, path);
+ if (config)
+ free(config->util);
+ else
+ config = string_list_append(&config_name_for_path, xstrdup(path));
+ config->util = xmemdupz(name, namelen);
+}
+
+static const char *get_ignore_for_name(const char *name)
+{
+ struct string_list_item *ignore_option;
+ ignore_option = unsorted_string_list_lookup(&config_ignore_for_name, name);
+ if (!ignore_option)
+ return NULL;
+
+ return ignore_option->util;
+}
+
+static void set_ignore_for_name(const char *name, int namelen, const char *ignore)
+{
+ struct string_list_item *config;
+ char *name_cstr = xmemdupz(name, namelen);
+ config = unsorted_string_list_lookup(&config_ignore_for_name, name_cstr);
+ if (config) {
+ free(config->util);
+ free(name_cstr);
+ } else
+ config = string_list_append(&config_ignore_for_name, name_cstr);
+ config->util = xstrdup(ignore);
+}
+
+static int get_fetch_recurse_for_name(const char *name)
+{
+ struct string_list_item *fetch_recurse;
+ fetch_recurse = unsorted_string_list_lookup(&config_fetch_recurse_submodules_for_name, name);
+ if (!fetch_recurse)
+ return RECURSE_SUBMODULES_NONE;
+
+ return (intptr_t) fetch_recurse->util;
+}
+
+static void set_fetch_recurse_for_name(const char *name, int namelen, int fetch_recurse)
+{
+ struct string_list_item *config;
+ char *name_cstr = xmemdupz(name, namelen);
+ config = unsorted_string_list_lookup(&config_fetch_recurse_submodules_for_name, name_cstr);
+ if (!config)
+ config = string_list_append(&config_fetch_recurse_submodules_for_name, name_cstr);
+ else
+ free(name_cstr);
+ config->util = (void *)(intptr_t) fetch_recurse;
+}
int is_staging_gitmodules_ok(void)
{
@@ -55,7 +125,7 @@ int is_staging_gitmodules_ok(void)
int update_path_in_gitmodules(const char *oldpath, const char *newpath)
{
struct strbuf entry = STRBUF_INIT;
- struct string_list_item *path_option;
+ const char *path;
if (!file_exists(".gitmodules")) /* Do nothing without .gitmodules */
return -1;
@@ -63,13 +133,13 @@ int update_path_in_gitmodules(const char *oldpath, const char *newpath)
if (gitmodules_is_unmerged)
die(_("Cannot change unmerged .gitmodules, resolve merge conflicts first"));
- path_option = unsorted_string_list_lookup(&config_name_for_path, oldpath);
- if (!path_option) {
+ path = get_name_for_path(oldpath);
+ if (!path) {
warning(_("Could not find section in .gitmodules where path=%s"), oldpath);
return -1;
}
strbuf_addstr(&entry, "submodule.");
- strbuf_addstr(&entry, path_option->util);
+ strbuf_addstr(&entry, path);
strbuf_addstr(&entry, ".path");
if (git_config_set_in_file(".gitmodules", entry.buf, newpath) < 0) {
/* Maybe the user already did that, don't error out here */
@@ -89,7 +159,7 @@ int update_path_in_gitmodules(const char *oldpath, const char *newpath)
int remove_path_from_gitmodules(const char *path)
{
struct strbuf sect = STRBUF_INIT;
- struct string_list_item *path_option;
+ const char *path_option;
if (!file_exists(".gitmodules")) /* Do nothing without .gitmodules */
return -1;
@@ -97,13 +167,13 @@ int remove_path_from_gitmodules(const char *path)
if (gitmodules_is_unmerged)
die(_("Cannot change unmerged .gitmodules, resolve merge conflicts first"));
- path_option = unsorted_string_list_lookup(&config_name_for_path, path);
+ path_option = get_name_for_path(path);
if (!path_option) {
warning(_("Could not find section in .gitmodules where path=%s"), path);
return -1;
}
strbuf_addstr(§, "submodule.");
- strbuf_addstr(§, path_option->util);
+ strbuf_addstr(§, path_option);
if (git_config_rename_section_in_file(".gitmodules", sect.buf, NULL) < 0) {
/* Maybe the user already did that, don't error out here */
warning(_("Could not remove .gitmodules entry for %s"), path);
@@ -165,12 +235,11 @@ done:
void set_diffopt_flags_from_submodule_config(struct diff_options *diffopt,
const char *path)
{
- struct string_list_item *path_option, *ignore_option;
- path_option = unsorted_string_list_lookup(&config_name_for_path, path);
- if (path_option) {
- ignore_option = unsorted_string_list_lookup(&config_ignore_for_name, path_option->util);
- if (ignore_option)
- handle_ignore_submodules_arg(diffopt, ignore_option->util);
+ const char *name = get_name_for_path(path);
+ if (name) {
+ const char *ignore = get_ignore_for_name(name);
+ if (ignore)
+ handle_ignore_submodules_arg(diffopt, ignore);
else if (gitmodules_is_unmerged)
DIFF_OPT_SET(diffopt, IGNORE_SUBMODULES);
}
@@ -221,7 +290,6 @@ void gitmodules_config(void)
int parse_submodule_config_option(const char *var, const char *value)
{
- struct string_list_item *config;
const char *name, *key;
int namelen;
@@ -232,22 +300,14 @@ int parse_submodule_config_option(const char *var, const char *value)
if (!value)
return config_error_nonbool(var);
- config = unsorted_string_list_lookup(&config_name_for_path, value);
- if (config)
- free(config->util);
- else
- config = string_list_append(&config_name_for_path, xstrdup(value));
- config->util = xmemdupz(name, namelen);
+ set_name_for_path(value, name, namelen);
+
} else if (!strcmp(key, "fetchrecursesubmodules")) {
- char *name_cstr = xmemdupz(name, namelen);
- config = unsorted_string_list_lookup(&config_fetch_recurse_submodules_for_name, name_cstr);
- if (!config)
- config = string_list_append(&config_fetch_recurse_submodules_for_name, name_cstr);
- else
- free(name_cstr);
- config->util = (void *)(intptr_t)parse_fetch_recurse_submodules_arg(var, value);
+ int fetch_recurse = parse_fetch_recurse_submodules_arg(var, value);
+
+ set_fetch_recurse_for_name(name, namelen, fetch_recurse);
+
} else if (!strcmp(key, "ignore")) {
- char *name_cstr;
if (!value)
return config_error_nonbool(var);
@@ -258,14 +318,7 @@ int parse_submodule_config_option(const char *var, const char *value)
return 0;
}
- name_cstr = xmemdupz(name, namelen);
- config = unsorted_string_list_lookup(&config_ignore_for_name, name_cstr);
- if (config) {
- free(config->util);
- free(name_cstr);
- } else
- config = string_list_append(&config_ignore_for_name, name_cstr);
- config->util = xstrdup(value);
+ set_ignore_for_name(name, namelen, value);
return 0;
}
return 0;
@@ -646,7 +699,7 @@ static void calculate_changed_submodule_paths(void)
struct argv_array argv = ARGV_ARRAY_INIT;
/* No need to check if there are no submodules configured */
- if (!config_name_for_path.nr)
+ if (!get_name_for_path(NULL))
return;
init_revisions(&rev, NULL);
@@ -693,7 +746,7 @@ int fetch_populated_submodules(const struct argv_array *options,
int i, result = 0;
struct child_process cp = CHILD_PROCESS_INIT;
struct argv_array argv = ARGV_ARRAY_INIT;
- struct string_list_item *name_for_path;
+ const char *name_for_path;
const char *work_tree = get_git_work_tree();
if (!work_tree)
goto out;
@@ -724,18 +777,17 @@ int fetch_populated_submodules(const struct argv_array *options,
continue;
name = ce->name;
- name_for_path = unsorted_string_list_lookup(&config_name_for_path, ce->name);
+ name_for_path = get_name_for_path(ce->name);
if (name_for_path)
- name = name_for_path->util;
+ name = name_for_path;
default_argv = "yes";
if (command_line_option == RECURSE_SUBMODULES_DEFAULT) {
- struct string_list_item *fetch_recurse_submodules_option;
- fetch_recurse_submodules_option = unsorted_string_list_lookup(&config_fetch_recurse_submodules_for_name, name);
- if (fetch_recurse_submodules_option) {
- if ((intptr_t)fetch_recurse_submodules_option->util == RECURSE_SUBMODULES_OFF)
+ int fetch_recurse_option = get_fetch_recurse_for_name(name);
+ if (fetch_recurse_option != RECURSE_SUBMODULES_NONE) {
+ if (fetch_recurse_option == RECURSE_SUBMODULES_OFF)
continue;
- if ((intptr_t)fetch_recurse_submodules_option->util == RECURSE_SUBMODULES_ON_DEMAND) {
+ if (fetch_recurse_option == RECURSE_SUBMODULES_ON_DEMAND) {
if (!unsorted_string_list_lookup(&changed_submodule_paths, ce->name))
continue;
default_argv = "on-demand";
--
2.4.2.391.g2979c89
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v4 3/4] use new config API for worktree configurations of submodules
2015-06-02 14:24 [PATCH v4 0/4] submodule config lookup API Heiko Voigt
2015-06-02 14:25 ` [PATCH v4 1/4] implement submodule config cache for lookup of submodule names Heiko Voigt
2015-06-02 14:26 ` [PATCH v4 2/4] extract functions for submodule config set and lookup Heiko Voigt
@ 2015-06-02 14:27 ` Heiko Voigt
2015-06-02 14:28 ` [PATCH v4 4/4] do not die on error of parsing fetchrecursesubmodules option Heiko Voigt
3 siblings, 0 replies; 7+ messages in thread
From: Heiko Voigt @ 2015-06-02 14:27 UTC (permalink / raw)
To: Junio C Hamano
Cc: git, Jens Lehmann, Jonathan Nieder, Jeff King, W. Trevor King,
Eric Sunshine, Karsten Blees
We remove the extracted functions and directly parse into and read out
of the cache. This allows us to have one unified way of accessing
submodule configuration values specific to single submodules. Regardless
whether we need to access a configuration from history or from the
worktree.
Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net>
---
Documentation/technical/api-submodule-config.txt | 19 ++-
builtin/checkout.c | 1 +
diff.c | 1 +
submodule-config.c | 12 ++
submodule-config.h | 1 +
submodule.c | 160 ++++-------------------
submodule.h | 1 -
t/t7411-submodule-config.sh | 37 +++++-
test-submodule-config.c | 10 ++
9 files changed, 104 insertions(+), 138 deletions(-)
diff --git a/Documentation/technical/api-submodule-config.txt b/Documentation/technical/api-submodule-config.txt
index 2ff4907..2ea520a 100644
--- a/Documentation/technical/api-submodule-config.txt
+++ b/Documentation/technical/api-submodule-config.txt
@@ -10,10 +10,18 @@ submodule path or name.
Usage
-----
+To initialize the cache with configurations from the worktree the caller
+typically first calls `gitmodules_config()` to read values from the
+worktree .gitmodules and then to overlay the local git config values
+`parse_submodule_config_option()` from the config parsing
+infrastructure.
+
The caller can look up information about submodules by using the
`submodule_from_path()` or `submodule_from_name()` functions. They return
a `struct submodule` which contains the values. The API automatically
-initializes and allocates the needed infrastructure on-demand.
+initializes and allocates the needed infrastructure on-demand. If the
+caller does only want to lookup values from revisions the initialization
+can be skipped.
If the internal cache might grow too big or when the caller is done with
the API, all internally cached values can be freed with submodule_free().
@@ -34,6 +42,11 @@ Functions
Use these to free the internally cached values.
+`int parse_submodule_config_option(const char *var, const char *value)`::
+
+ Can be passed to the config parsing infrastructure to parse
+ local (worktree) submodule configurations.
+
`const struct submodule *submodule_from_path(const unsigned char *commit_sha1, const char *path)`::
Lookup values for one submodule by its commit_sha1 and path or
@@ -43,4 +56,8 @@ Functions
The same as above but lookup by name.
+If given the null_sha1 as commit_sha1 the local configuration of a
+submodule will be returned (e.g. consolidated values from local git
+configuration and the .gitmodules file in the worktree).
+
For an example usage see test-submodule-config.c.
diff --git a/builtin/checkout.c b/builtin/checkout.c
index 2f92328..f1f168d 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -18,6 +18,7 @@
#include "xdiff-interface.h"
#include "ll-merge.h"
#include "resolve-undo.h"
+#include "submodule-config.h"
#include "submodule.h"
#include "argv-array.h"
#include "sigchain.h"
diff --git a/diff.c b/diff.c
index 7500c55..d0be279 100644
--- a/diff.c
+++ b/diff.c
@@ -13,6 +13,7 @@
#include "utf8.h"
#include "userdiff.h"
#include "sigchain.h"
+#include "submodule-config.h"
#include "submodule.h"
#include "ll-merge.h"
#include "string-list.h"
diff --git a/submodule-config.c b/submodule-config.c
index 97f4a04..fc7bf40 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -424,6 +424,18 @@ static void ensure_cache_init(void)
is_cache_init = 1;
}
+int parse_submodule_config_option(const char *var, const char *value)
+{
+ struct parse_config_parameter parameter;
+ parameter.cache = &cache;
+ parameter.commit_sha1 = NULL;
+ parameter.gitmodules_sha1 = null_sha1;
+ parameter.overwrite = 1;
+
+ ensure_cache_init();
+ return parse_config(var, value, ¶meter);
+}
+
const struct submodule *submodule_from_name(const unsigned char *commit_sha1,
const char *name)
{
diff --git a/submodule-config.h b/submodule-config.h
index cd68030..5fe44ce 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -18,6 +18,7 @@ struct submodule {
unsigned char gitmodules_sha1[20];
};
+int parse_submodule_config_option(const char *var, const char *value);
const struct submodule *submodule_from_name(const unsigned char *commit_sha1,
const char *name);
const struct submodule *submodule_from_path(const unsigned char *commit_sha1,
diff --git a/submodule.c b/submodule.c
index c3b5f44..97355eb 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1,4 +1,5 @@
#include "cache.h"
+#include "submodule-config.h"
#include "submodule.h"
#include "dir.h"
#include "diff.h"
@@ -12,9 +13,6 @@
#include "argv-array.h"
#include "blob.h"
-static struct string_list config_name_for_path;
-static struct string_list config_fetch_recurse_submodules_for_name;
-static struct string_list config_ignore_for_name;
static int config_fetch_recurse_submodules = RECURSE_SUBMODULES_ON_DEMAND;
static struct string_list changed_submodule_paths;
static int initialized_fetch_ref_tips;
@@ -41,77 +39,6 @@ static int gitmodules_is_unmerged;
*/
static int gitmodules_is_modified;
-static const char *get_name_for_path(const char *path)
-{
- struct string_list_item *path_option;
- if (path == NULL) {
- if (config_name_for_path.nr > 0)
- return config_name_for_path.items[0].util;
- else
- return NULL;
- }
- path_option = unsorted_string_list_lookup(&config_name_for_path, path);
- if (!path_option)
- return NULL;
- return path_option->util;
-}
-
-static void set_name_for_path(const char *path, const char *name, int namelen)
-{
- struct string_list_item *config;
- config = unsorted_string_list_lookup(&config_name_for_path, path);
- if (config)
- free(config->util);
- else
- config = string_list_append(&config_name_for_path, xstrdup(path));
- config->util = xmemdupz(name, namelen);
-}
-
-static const char *get_ignore_for_name(const char *name)
-{
- struct string_list_item *ignore_option;
- ignore_option = unsorted_string_list_lookup(&config_ignore_for_name, name);
- if (!ignore_option)
- return NULL;
-
- return ignore_option->util;
-}
-
-static void set_ignore_for_name(const char *name, int namelen, const char *ignore)
-{
- struct string_list_item *config;
- char *name_cstr = xmemdupz(name, namelen);
- config = unsorted_string_list_lookup(&config_ignore_for_name, name_cstr);
- if (config) {
- free(config->util);
- free(name_cstr);
- } else
- config = string_list_append(&config_ignore_for_name, name_cstr);
- config->util = xstrdup(ignore);
-}
-
-static int get_fetch_recurse_for_name(const char *name)
-{
- struct string_list_item *fetch_recurse;
- fetch_recurse = unsorted_string_list_lookup(&config_fetch_recurse_submodules_for_name, name);
- if (!fetch_recurse)
- return RECURSE_SUBMODULES_NONE;
-
- return (intptr_t) fetch_recurse->util;
-}
-
-static void set_fetch_recurse_for_name(const char *name, int namelen, int fetch_recurse)
-{
- struct string_list_item *config;
- char *name_cstr = xmemdupz(name, namelen);
- config = unsorted_string_list_lookup(&config_fetch_recurse_submodules_for_name, name_cstr);
- if (!config)
- config = string_list_append(&config_fetch_recurse_submodules_for_name, name_cstr);
- else
- free(name_cstr);
- config->util = (void *)(intptr_t) fetch_recurse;
-}
-
int is_staging_gitmodules_ok(void)
{
return !gitmodules_is_modified;
@@ -125,7 +52,7 @@ int is_staging_gitmodules_ok(void)
int update_path_in_gitmodules(const char *oldpath, const char *newpath)
{
struct strbuf entry = STRBUF_INIT;
- const char *path;
+ const struct submodule *submodule;
if (!file_exists(".gitmodules")) /* Do nothing without .gitmodules */
return -1;
@@ -133,13 +60,13 @@ int update_path_in_gitmodules(const char *oldpath, const char *newpath)
if (gitmodules_is_unmerged)
die(_("Cannot change unmerged .gitmodules, resolve merge conflicts first"));
- path = get_name_for_path(oldpath);
- if (!path) {
+ submodule = submodule_from_path(null_sha1, oldpath);
+ if (!submodule || !submodule->name) {
warning(_("Could not find section in .gitmodules where path=%s"), oldpath);
return -1;
}
strbuf_addstr(&entry, "submodule.");
- strbuf_addstr(&entry, path);
+ strbuf_addstr(&entry, submodule->name);
strbuf_addstr(&entry, ".path");
if (git_config_set_in_file(".gitmodules", entry.buf, newpath) < 0) {
/* Maybe the user already did that, don't error out here */
@@ -159,7 +86,7 @@ int update_path_in_gitmodules(const char *oldpath, const char *newpath)
int remove_path_from_gitmodules(const char *path)
{
struct strbuf sect = STRBUF_INIT;
- const char *path_option;
+ const struct submodule *submodule;
if (!file_exists(".gitmodules")) /* Do nothing without .gitmodules */
return -1;
@@ -167,13 +94,13 @@ int remove_path_from_gitmodules(const char *path)
if (gitmodules_is_unmerged)
die(_("Cannot change unmerged .gitmodules, resolve merge conflicts first"));
- path_option = get_name_for_path(path);
- if (!path_option) {
+ submodule = submodule_from_path(null_sha1, path);
+ if (!submodule || !submodule->name) {
warning(_("Could not find section in .gitmodules where path=%s"), path);
return -1;
}
strbuf_addstr(§, "submodule.");
- strbuf_addstr(§, path_option);
+ strbuf_addstr(§, submodule->name);
if (git_config_rename_section_in_file(".gitmodules", sect.buf, NULL) < 0) {
/* Maybe the user already did that, don't error out here */
warning(_("Could not remove .gitmodules entry for %s"), path);
@@ -235,11 +162,10 @@ done:
void set_diffopt_flags_from_submodule_config(struct diff_options *diffopt,
const char *path)
{
- const char *name = get_name_for_path(path);
- if (name) {
- const char *ignore = get_ignore_for_name(name);
- if (ignore)
- handle_ignore_submodules_arg(diffopt, ignore);
+ const struct submodule *submodule = submodule_from_path(null_sha1, path);
+ if (submodule) {
+ if (submodule->ignore)
+ handle_ignore_submodules_arg(diffopt, submodule->ignore);
else if (gitmodules_is_unmerged)
DIFF_OPT_SET(diffopt, IGNORE_SUBMODULES);
}
@@ -288,42 +214,6 @@ void gitmodules_config(void)
}
}
-int parse_submodule_config_option(const char *var, const char *value)
-{
- const char *name, *key;
- int namelen;
-
- if (parse_config_key(var, "submodule", &name, &namelen, &key) < 0 || !name)
- return 0;
-
- if (!strcmp(key, "path")) {
- if (!value)
- return config_error_nonbool(var);
-
- set_name_for_path(value, name, namelen);
-
- } else if (!strcmp(key, "fetchrecursesubmodules")) {
- int fetch_recurse = parse_fetch_recurse_submodules_arg(var, value);
-
- set_fetch_recurse_for_name(name, namelen, fetch_recurse);
-
- } else if (!strcmp(key, "ignore")) {
-
- if (!value)
- return config_error_nonbool(var);
-
- if (strcmp(value, "untracked") && strcmp(value, "dirty") &&
- strcmp(value, "all") && strcmp(value, "none")) {
- warning("Invalid parameter \"%s\" for config option \"submodule.%s.ignore\"", value, var);
- return 0;
- }
-
- set_ignore_for_name(name, namelen, value);
- return 0;
- }
- return 0;
-}
-
void handle_ignore_submodules_arg(struct diff_options *diffopt,
const char *arg)
{
@@ -699,7 +589,7 @@ static void calculate_changed_submodule_paths(void)
struct argv_array argv = ARGV_ARRAY_INIT;
/* No need to check if there are no submodules configured */
- if (!get_name_for_path(NULL))
+ if (!submodule_from_path(NULL, NULL))
return;
init_revisions(&rev, NULL);
@@ -746,7 +636,6 @@ int fetch_populated_submodules(const struct argv_array *options,
int i, result = 0;
struct child_process cp = CHILD_PROCESS_INIT;
struct argv_array argv = ARGV_ARRAY_INIT;
- const char *name_for_path;
const char *work_tree = get_git_work_tree();
if (!work_tree)
goto out;
@@ -771,23 +660,26 @@ int fetch_populated_submodules(const struct argv_array *options,
struct strbuf submodule_git_dir = STRBUF_INIT;
struct strbuf submodule_prefix = STRBUF_INIT;
const struct cache_entry *ce = active_cache[i];
- const char *git_dir, *name, *default_argv;
+ const char *git_dir, *default_argv;
+ const struct submodule *submodule;
if (!S_ISGITLINK(ce->ce_mode))
continue;
- name = ce->name;
- name_for_path = get_name_for_path(ce->name);
- if (name_for_path)
- name = name_for_path;
+ submodule = submodule_from_path(null_sha1, ce->name);
+ if (!submodule)
+ submodule = submodule_from_name(null_sha1, ce->name);
default_argv = "yes";
if (command_line_option == RECURSE_SUBMODULES_DEFAULT) {
- int fetch_recurse_option = get_fetch_recurse_for_name(name);
- if (fetch_recurse_option != RECURSE_SUBMODULES_NONE) {
- if (fetch_recurse_option == RECURSE_SUBMODULES_OFF)
+ if (submodule &&
+ submodule->fetch_recurse !=
+ RECURSE_SUBMODULES_NONE) {
+ if (submodule->fetch_recurse ==
+ RECURSE_SUBMODULES_OFF)
continue;
- if (fetch_recurse_option == RECURSE_SUBMODULES_ON_DEMAND) {
+ if (submodule->fetch_recurse ==
+ RECURSE_SUBMODULES_ON_DEMAND) {
if (!unsorted_string_list_lookup(&changed_submodule_paths, ce->name))
continue;
default_argv = "on-demand";
diff --git a/submodule.h b/submodule.h
index 920fef3..547219d 100644
--- a/submodule.h
+++ b/submodule.h
@@ -20,7 +20,6 @@ void set_diffopt_flags_from_submodule_config(struct diff_options *diffopt,
const char *path);
int submodule_config(const char *var, const char *value, void *cb);
void gitmodules_config(void);
-int parse_submodule_config_option(const char *var, const char *value);
void handle_ignore_submodules_arg(struct diff_options *diffopt, const char *);
int parse_fetch_recurse_submodules_arg(const char *opt, const char *arg);
void show_submodule_summary(FILE *f, const char *path,
diff --git a/t/t7411-submodule-config.sh b/t/t7411-submodule-config.sh
index 2602bc5..7229978 100755
--- a/t/t7411-submodule-config.sh
+++ b/t/t7411-submodule-config.sh
@@ -5,8 +5,8 @@
test_description='Test submodules config cache infrastructure
-This test verifies that parsing .gitmodules configuration directly
-from the database works.
+This test verifies that parsing .gitmodules configurations directly
+from the database and from the worktree works.
'
TEST_NO_CREATE_REPO=1
@@ -82,4 +82,37 @@ test_expect_success 'error in one submodule config lets continue' '
)
'
+cat >super/expect_url <<EOF
+Submodule url: 'git@somewhere.else.net:a.git' for path 'b'
+Submodule url: 'git@somewhere.else.net:submodule.git' for path 'submodule'
+EOF
+
+cat >super/expect_local_path <<EOF
+Submodule name: 'a' for path 'c'
+Submodule name: 'submodule' for path 'submodule'
+EOF
+
+test_expect_success 'reading of local configuration' '
+ (cd super &&
+ old_a=$(git config submodule.a.url) &&
+ old_submodule=$(git config submodule.submodule.url) &&
+ git config submodule.a.url git@somewhere.else.net:a.git &&
+ git config submodule.submodule.url git@somewhere.else.net:submodule.git &&
+ test-submodule-config --url \
+ "" b \
+ "" submodule \
+ >actual &&
+ test_cmp expect_url actual &&
+ git config submodule.a.path c &&
+ test-submodule-config \
+ "" c \
+ "" submodule \
+ >actual &&
+ test_cmp expect_local_path actual &&
+ git config submodule.a.url $old_a &&
+ git config submodule.submodule.url $old_submodule &&
+ git config --unset submodule.a.path c
+ )
+'
+
test_done
diff --git a/test-submodule-config.c b/test-submodule-config.c
index f3c3918..dab8c27 100644
--- a/test-submodule-config.c
+++ b/test-submodule-config.c
@@ -1,5 +1,6 @@
#include "cache.h"
#include "submodule-config.h"
+#include "submodule.h"
static void die_usage(int argc, char **argv, const char *msg)
{
@@ -8,6 +9,11 @@ static void die_usage(int argc, char **argv, const char *msg)
exit(1);
}
+static int git_test_config(const char *var, const char *value, void *cb)
+{
+ return parse_submodule_config_option(var, value);
+}
+
int main(int argc, char **argv)
{
char **arg = argv;
@@ -29,6 +35,10 @@ int main(int argc, char **argv)
if (my_argc % 2 != 0)
die_usage(argc, argv, "Wrong number of arguments.");
+ setup_git_directory();
+ gitmodules_config();
+ git_config(git_test_config, NULL);
+
while (*arg) {
unsigned char commit_sha1[20];
const struct submodule *submodule;
--
2.4.2.391.g2979c89
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v4 4/4] do not die on error of parsing fetchrecursesubmodules option
2015-06-02 14:24 [PATCH v4 0/4] submodule config lookup API Heiko Voigt
` (2 preceding siblings ...)
2015-06-02 14:27 ` [PATCH v4 3/4] use new config API for worktree configurations of submodules Heiko Voigt
@ 2015-06-02 14:28 ` Heiko Voigt
3 siblings, 0 replies; 7+ messages in thread
From: Heiko Voigt @ 2015-06-02 14:28 UTC (permalink / raw)
To: Junio C Hamano
Cc: git, Jens Lehmann, Jonathan Nieder, Jeff King, W. Trevor King,
Eric Sunshine, Karsten Blees
We should not die when reading the submodule config cache since the user
might not be able to get out of that situation when the configuration is
part of the history.
We should handle this condition later when the value is about to be
used.
Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net>
---
builtin/fetch.c | 1 +
submodule-config.c | 29 ++++++++++++++++++++++++++++-
submodule-config.h | 1 +
submodule.c | 15 ---------------
submodule.h | 2 +-
t/t7411-submodule-config.sh | 35 +++++++++++++++++++++++++++++++++++
6 files changed, 66 insertions(+), 17 deletions(-)
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 7910419..faae548 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -11,6 +11,7 @@
#include "run-command.h"
#include "parse-options.h"
#include "sigchain.h"
+#include "submodule-config.h"
#include "submodule.h"
#include "connected.h"
#include "argv-array.h"
diff --git a/submodule-config.c b/submodule-config.c
index fc7bf40..199692b 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -204,6 +204,30 @@ static struct submodule *lookup_or_create_by_name(struct submodule_cache *cache,
return submodule;
}
+static int parse_fetch_recurse(const char *opt, const char *arg,
+ int die_on_error)
+{
+ switch (git_config_maybe_bool(opt, arg)) {
+ case 1:
+ return RECURSE_SUBMODULES_ON;
+ case 0:
+ return RECURSE_SUBMODULES_OFF;
+ default:
+ if (!strcmp(arg, "on-demand"))
+ return RECURSE_SUBMODULES_ON_DEMAND;
+
+ if (die_on_error)
+ die("bad %s argument: %s", opt, arg);
+ else
+ return RECURSE_SUBMODULES_ERROR;
+ }
+}
+
+int parse_fetch_recurse_submodules_arg(const char *opt, const char *arg)
+{
+ return parse_fetch_recurse(opt, arg, 1);
+}
+
static void warn_multiple_config(const unsigned char *commit_sha1,
const char *name, const char *option)
{
@@ -255,6 +279,8 @@ static int parse_config(const char *var, const char *value, void *data)
submodule->path = strbuf_detach(&path, NULL);
cache_put_path(me->cache, submodule);
} else if (!strcmp(item.buf, "fetchrecursesubmodules")) {
+ /* when parsing worktree configurations we can die early */
+ int die_on_error = is_null_sha1(me->gitmodules_sha1);
if (!me->overwrite &&
submodule->fetch_recurse != RECURSE_SUBMODULES_NONE) {
warn_multiple_config(me->commit_sha1, submodule->name,
@@ -262,7 +288,8 @@ static int parse_config(const char *var, const char *value, void *data)
goto release_return;
}
- submodule->fetch_recurse = parse_fetch_recurse_submodules_arg(var, value);
+ submodule->fetch_recurse = parse_fetch_recurse(var, value,
+ die_on_error);
} else if (!strcmp(item.buf, "ignore")) {
struct strbuf ignore = STRBUF_INIT;
if (!me->overwrite && submodule->ignore != NULL) {
diff --git a/submodule-config.h b/submodule-config.h
index 5fe44ce..9061e4e 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -18,6 +18,7 @@ struct submodule {
unsigned char gitmodules_sha1[20];
};
+int parse_fetch_recurse_submodules_arg(const char *opt, const char *arg);
int parse_submodule_config_option(const char *var, const char *value);
const struct submodule *submodule_from_name(const unsigned char *commit_sha1,
const char *name);
diff --git a/submodule.c b/submodule.c
index 97355eb..4822559 100644
--- a/submodule.c
+++ b/submodule.c
@@ -288,21 +288,6 @@ static void print_submodule_summary(struct rev_info *rev, FILE *f,
strbuf_release(&sb);
}
-int parse_fetch_recurse_submodules_arg(const char *opt, const char *arg)
-{
- switch (git_config_maybe_bool(opt, arg)) {
- case 1:
- return RECURSE_SUBMODULES_ON;
- case 0:
- return RECURSE_SUBMODULES_OFF;
- default:
- if (!strcmp(arg, "on-demand"))
- return RECURSE_SUBMODULES_ON_DEMAND;
- /* TODO: remove the die for history parsing here */
- die("bad %s argument: %s", opt, arg);
- }
-}
-
void show_submodule_summary(FILE *f, const char *path,
const char *line_prefix,
unsigned char one[20], unsigned char two[20],
diff --git a/submodule.h b/submodule.h
index 547219d..5507c3d 100644
--- a/submodule.h
+++ b/submodule.h
@@ -5,6 +5,7 @@ struct diff_options;
struct argv_array;
enum {
+ RECURSE_SUBMODULES_ERROR = -3,
RECURSE_SUBMODULES_NONE = -2,
RECURSE_SUBMODULES_ON_DEMAND = -1,
RECURSE_SUBMODULES_OFF = 0,
@@ -21,7 +22,6 @@ void set_diffopt_flags_from_submodule_config(struct diff_options *diffopt,
int submodule_config(const char *var, const char *value, void *cb);
void gitmodules_config(void);
void handle_ignore_submodules_arg(struct diff_options *diffopt, const char *);
-int parse_fetch_recurse_submodules_arg(const char *opt, const char *arg);
void show_submodule_summary(FILE *f, const char *path,
const char *line_prefix,
unsigned char one[20], unsigned char two[20],
diff --git a/t/t7411-submodule-config.sh b/t/t7411-submodule-config.sh
index 7229978..fc97c33 100755
--- a/t/t7411-submodule-config.sh
+++ b/t/t7411-submodule-config.sh
@@ -115,4 +115,39 @@ test_expect_success 'reading of local configuration' '
)
'
+cat >super/expect_fetchrecurse_die.err <<EOF
+fatal: bad submodule.submodule.fetchrecursesubmodules argument: blabla
+EOF
+
+test_expect_success 'local error in fetchrecursesubmodule dies early' '
+ (cd super &&
+ git config submodule.submodule.fetchrecursesubmodules blabla &&
+ test_must_fail test-submodule-config \
+ "" b \
+ "" submodule \
+ >actual.out 2>actual.err &&
+ touch expect_fetchrecurse_die.out &&
+ test_cmp expect_fetchrecurse_die.out actual.out &&
+ test_cmp expect_fetchrecurse_die.err actual.err &&
+ git config --unset submodule.submodule.fetchrecursesubmodules
+ )
+'
+
+test_expect_success 'error in history in fetchrecursesubmodule lets continue' '
+ (cd super &&
+ git config -f .gitmodules \
+ submodule.submodule.fetchrecursesubmodules blabla &&
+ git add .gitmodules &&
+ git config --unset -f .gitmodules \
+ submodule.submodule.fetchrecursesubmodules &&
+ git commit -m "add error in fetchrecursesubmodules" &&
+ test-submodule-config \
+ HEAD b \
+ HEAD submodule \
+ >actual &&
+ test_cmp expect_error actual &&
+ git reset --hard HEAD^
+ )
+'
+
test_done
--
2.4.2.391.g2979c89
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v4 1/4] implement submodule config cache for lookup of submodule names
2015-06-02 14:25 ` [PATCH v4 1/4] implement submodule config cache for lookup of submodule names Heiko Voigt
@ 2015-06-02 19:57 ` Junio C Hamano
2015-06-03 19:31 ` Heiko Voigt
0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2015-06-02 19:57 UTC (permalink / raw)
To: Heiko Voigt
Cc: git, Jens Lehmann, Jonathan Nieder, Jeff King, W. Trevor King,
Eric Sunshine, Karsten Blees
Heiko Voigt <hvoigt@hvoigt.net> writes:
> This submodule configuration cache allows us to lazily read .gitmodules
> configurations by commit into a runtime cache which can then be used to
> easily lookup values from it. Currently only the values for path or name
> are stored but it can be extended for any value needed.
>
> It is expected that .gitmodules files do not change often between
> commits. Thats why we lookup the .gitmodules sha1 from a commit and then
> either lookup an already parsed configuration or parse and cache an
> unknown one for each sha1. The cache is lazily build on demand for each
> requested commit.
>
> This cache can be used for all purposes which need knowledge about
> submodule configurations. Example use cases are:
>
> * Recursive submodule checkout needs lookup a submodule name from its
> path when a submodule first appears. This needs be done before this
> configuration exists in the worktree.
>
> * The implementation of submodule support for 'git archive' needs to
> lookup the submodule name to generate the archive when given a
> revision that is not checked out.
>
> * 'git fetch' when given the --recurse-submodules=on-demand option (or
> configuration) needs to lookup submodule names by path from the
> database rather than reading from the worktree. For new submodule it
> needs to lookup the name from its path to allow cloning new
> submodules into the .git folder so they can be checked out without
> any network interaction when the user does a checkout of that
> revision.
What is unclear to me after reading the above twice is what this
thing is meant to achieve. Is it efficiency by doing lazy lookups
and caching to avoid asking the same thing more than once from
either the filesystem or read_sha1_file()? Is it expected that
reading through this "cache" will be the _only_ way callers would
interact with the .gitmodules data, or is it an opt-in feature that
some callers that do not see the benefit (why they may want to
ignore is totally unclear, because what the "cache" system wants to
achieve is) can safely ignore and bypass?
Because the above talks about looking up ".gitmodules from a
commit", I am guessing that the "commit" used as one of the lookup
keys throughout the system is a commit in the superproject, not from
submodules, but you may want to state that more explicitly.
What, if anything, should be done for .gitmodules that are not yet
committed? Are there cases that the callers that usually interact
with .gitmodules via this "cache" system need to use .gitmodules
immediately after adding a new submodule but before committing that
change to the superproject? Do they code something like this?
if (cached)
read .gitmodules from the index and fabricate
struct submodule;
else if (worktree)
read .gitmodules from the working tree and fabricate
struct submodule;
else
call submodule_from_name("HEAD", ...) and receive
struct submodule;
use the struct submodule to learn from the module;
Yes, I am wondering if submodule_from_name() should be extended to
allow the former two cases, so that the caller can make a single
call above and then use resulting "struct submodule" throughout its
code after doing so. And I also am hoping that the answer to my
questions above to be "This is not just an opt-in 'cache' API, but
we want to make it the unified API for C code to learn about what is
in .gitmodule".
> diff --git a/Documentation/technical/api-submodule-config.txt b/Documentation/technical/api-submodule-config.txt
> new file mode 100644
> index 0000000..2ff4907
> --- /dev/null
> +++ b/Documentation/technical/api-submodule-config.txt
> @@ -0,0 +1,46 @@
> +submodule config cache API
> +==========================
> +
> +The submodule config cache API allows to read submodule
> +configurations/information from specified revisions. Internally
> +information is lazily read into a cache that is used to avoid
> +unnecessary parsing of the same .gitmodule files. Lookups can be done by
> +submodule path or name.
> +
> +Usage
> +-----
> +
> +The caller can look up information about submodules by using the
> +`submodule_from_path()` or `submodule_from_name()` functions. They return
> +a `struct submodule` which contains the values. The API automatically
> +initializes and allocates the needed infrastructure on-demand.
> +
> +If the internal cache might grow too big or when the caller is done with
> +the API, all internally cached values can be freed with submodule_free().
> +
> +Data Structures
> +---------------
> +
> +`struct submodule`::
> +
> + This structure is used to return the information about one
> + submodule for a certain revision. It is returned by the lookup
> + functions.
Hopefully this will not stay an opaque structure as we read later
patches ;-).
> +Functions
> +---------
> +
> +`void submodule_free()`::
> +
> + Use these to free the internally cached values.
"These" meaning "this single function", or are there variants of it?
> diff --git a/submodule-config.c b/submodule-config.c
> new file mode 100644
> index 0000000..97f4a04
> --- /dev/null
> +++ b/submodule-config.c
> @@ -0,0 +1,445 @@
> +#include "cache.h"
> +#include "submodule-config.h"
> +#include "submodule.h"
> +#include "strbuf.h"
> +
> +/*
> + * submodule cache lookup structure
> + * There is one shared set of 'struct submodule' entries which can be
> + * looked up by their sha1 blob id of the .gitmodule file and either
> + * using path or name as key.
> + * for_path stores submodule entries with path as key
> + * for_name stores submodule entries with name as key
> + */
> +struct submodule_cache {
> + struct hashmap for_path;
> + struct hashmap for_name;
> +};
> +
> +/*
> + * thin wrapper struct needed to insert 'struct submodule' entries to
> + * the hashmap
> + */
> +struct submodule_entry {
> + struct hashmap_entry ent;
> + struct submodule *config;
> +};
The above, and the singleton-ness of the "cache", implies that we
can have only one "struct submodule" for a given path (or a name).
Does that mean the subsystem implicitly is tied to a single commit
at the superproject level?
What happens when I call submodule_from_path() for a single
submodule at one commit in the superproject, and then ask about that
same submodule for another commit in the superproject, which may
have a different version of .gitmodules, by calling the same
function again?
Side note: I think I know the answer to these questions,
after reading the hash function. for_path does not store
submodule entries with path as key. It uses the commit and
the path as a combined key, so both HEAD:.gitmodules and
HEAD^:.gitmodules can be cached and looked up separatedly if
their contents are different. The comment and field names
of "struct submodule_cache" may want to be improved.
When do we evict the cache? I am wondering what would happen when
you do "git log --recursive" at the superproject level, which may
grow the cache in an unbounded way without some eviction policy.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v4 1/4] implement submodule config cache for lookup of submodule names
2015-06-02 19:57 ` Junio C Hamano
@ 2015-06-03 19:31 ` Heiko Voigt
0 siblings, 0 replies; 7+ messages in thread
From: Heiko Voigt @ 2015-06-03 19:31 UTC (permalink / raw)
To: Junio C Hamano
Cc: git, Jens Lehmann, Jonathan Nieder, Jeff King, W. Trevor King,
Eric Sunshine, Karsten Blees
On Tue, Jun 02, 2015 at 12:57:08PM -0700, Junio C Hamano wrote:
> Heiko Voigt <hvoigt@hvoigt.net> writes:
>
> > This submodule configuration cache allows us to lazily read .gitmodules
> > configurations by commit into a runtime cache which can then be used to
> > easily lookup values from it. Currently only the values for path or name
> > are stored but it can be extended for any value needed.
> >
> > It is expected that .gitmodules files do not change often between
> > commits. Thats why we lookup the .gitmodules sha1 from a commit and then
> > either lookup an already parsed configuration or parse and cache an
> > unknown one for each sha1. The cache is lazily build on demand for each
> > requested commit.
> >
> > This cache can be used for all purposes which need knowledge about
> > submodule configurations. Example use cases are:
> >
> > * Recursive submodule checkout needs lookup a submodule name from its
> > path when a submodule first appears. This needs be done before this
> > configuration exists in the worktree.
> >
> > * The implementation of submodule support for 'git archive' needs to
> > lookup the submodule name to generate the archive when given a
> > revision that is not checked out.
> >
> > * 'git fetch' when given the --recurse-submodules=on-demand option (or
> > configuration) needs to lookup submodule names by path from the
> > database rather than reading from the worktree. For new submodule it
> > needs to lookup the name from its path to allow cloning new
> > submodules into the .git folder so they can be checked out without
> > any network interaction when the user does a checkout of that
> > revision.
>
> What is unclear to me after reading the above twice is what this
> thing is meant to achieve. Is it efficiency by doing lazy lookups
> and caching to avoid asking the same thing more than once from
> either the filesystem or read_sha1_file()? Is it expected that
> reading through this "cache" will be the _only_ way callers would
> interact with the .gitmodules data, or is it an opt-in feature that
> some callers that do not see the benefit (why they may want to
> ignore is totally unclear, because what the "cache" system wants to
> achieve is) can safely ignore and bypass?
Maybe I should add a paragraph like this on top:
In a superproject some commands need to interact with submodules. They
need to query values from the .gitmodules file either from the
worktree of from certain revisions. At the moment this is quite hard
since a caller would need to read the .gitmodules file from the
history and then parse the values. We want to provide an API for this
so we have one place to get values from .gitmodules from any revision
(including the worktree).
So yes it will be the only way callers would read .gitmodules data.
Since you abstractly wrote "interact" I realize that I have not thought
about writing values.
> Because the above talks about looking up ".gitmodules from a
> commit", I am guessing that the "commit" used as one of the lookup
> keys throughout the system is a commit in the superproject, not from
> submodules, but you may want to state that more explicitly.
Yes. Well it is a commit in the current project which should be the
superproject to other submodules. Assuming we have the additional
paragraph above. Does that make it more clear?
> What, if anything, should be done for .gitmodules that are not yet
> committed? Are there cases that the callers that usually interact
> with .gitmodules via this "cache" system need to use .gitmodules
> immediately after adding a new submodule but before committing that
> change to the superproject? Do they code something like this?
>
> if (cached)
> read .gitmodules from the index and fabricate
> struct submodule;
> else if (worktree)
> read .gitmodules from the working tree and fabricate
> struct submodule;
> else
> call submodule_from_name("HEAD", ...) and receive
> struct submodule;
>
> use the struct submodule to learn from the module;
>
> Yes, I am wondering if submodule_from_name() should be extended to
> allow the former two cases, so that the caller can make a single
> call above and then use resulting "struct submodule" throughout its
> code after doing so.
Reading from the worktree is supported by passing in null_sha1 (as
documented). That said it may also contain values from the local
configuration (overlaid in the typical priority).
The index case is the only one that is missing. I am not sure whether we
will need that. The use cases I have described above do not require this
(AFAICS). But you are right maybe it makes sense to keep this open so we
can at least allow it. Since we already use the NULL pointer and the
null_sha1 as special values we could either add another function pair
and a separate internal cache for them or define some special sha1 value
that is very unlikely to collide (e.g. an empty commit sha1 at epoch
with some special author).
Either way I think there is nothing blocking us from extending it but I
would not do it now without any users.
The current use cases are all about interrogating an existing history
and the worktree. I currently see no need to interrogate .gitmodules
directly after adding a submodule in the same process. We also should be
able to add it later in case needed.
> And I also am hoping that the answer to my
> questions above to be "This is not just an opt-in 'cache' API, but
> we want to make it the unified API for C code to learn about what is
> in .gitmodule".
Yes as stated above I am planning to make this the unified API for C
code. If you look at the next two commits you see that I am already
replacing the existing lookups from the worktree to use this cache.
With this series all places that read .gitmodules values use this cache.
Anything else would be a bug in my code :)
> > diff --git a/Documentation/technical/api-submodule-config.txt b/Documentation/technical/api-submodule-config.txt
> > new file mode 100644
> > index 0000000..2ff4907
> > --- /dev/null
> > +++ b/Documentation/technical/api-submodule-config.txt
> > @@ -0,0 +1,46 @@
> > +submodule config cache API
> > +==========================
> > +
> > +The submodule config cache API allows to read submodule
> > +configurations/information from specified revisions. Internally
> > +information is lazily read into a cache that is used to avoid
> > +unnecessary parsing of the same .gitmodule files. Lookups can be done by
> > +submodule path or name.
> > +
> > +Usage
> > +-----
> > +
> > +The caller can look up information about submodules by using the
> > +`submodule_from_path()` or `submodule_from_name()` functions. They return
> > +a `struct submodule` which contains the values. The API automatically
> > +initializes and allocates the needed infrastructure on-demand.
> > +
> > +If the internal cache might grow too big or when the caller is done with
> > +the API, all internally cached values can be freed with submodule_free().
> > +
> > +Data Structures
> > +---------------
> > +
> > +`struct submodule`::
> > +
> > + This structure is used to return the information about one
> > + submodule for a certain revision. It is returned by the lookup
> > + functions.
>
> Hopefully this will not stay an opaque structure as we read later
> patches ;-).
Ok will add some documentation for the members here :)
> > +Functions
> > +---------
> > +
> > +`void submodule_free()`::
> > +
> > + Use these to free the internally cached values.
>
> "These" meaning "this single function", or are there variants of it?
I think I had multiple free routines at some point. This seems like a
leftover. Will change to "this".
> > diff --git a/submodule-config.c b/submodule-config.c
> > new file mode 100644
> > index 0000000..97f4a04
> > --- /dev/null
> > +++ b/submodule-config.c
> > @@ -0,0 +1,445 @@
> > +#include "cache.h"
> > +#include "submodule-config.h"
> > +#include "submodule.h"
> > +#include "strbuf.h"
> > +
> > +/*
> > + * submodule cache lookup structure
> > + * There is one shared set of 'struct submodule' entries which can be
> > + * looked up by their sha1 blob id of the .gitmodule file and either
> > + * using path or name as key.
> > + * for_path stores submodule entries with path as key
> > + * for_name stores submodule entries with name as key
> > + */
> > +struct submodule_cache {
> > + struct hashmap for_path;
> > + struct hashmap for_name;
> > +};
> > +
> > +/*
> > + * thin wrapper struct needed to insert 'struct submodule' entries to
> > + * the hashmap
> > + */
> > +struct submodule_entry {
> > + struct hashmap_entry ent;
> > + struct submodule *config;
> > +};
>
> The above, and the singleton-ness of the "cache", implies that we
> can have only one "struct submodule" for a given path (or a name).
> Does that mean the subsystem implicitly is tied to a single commit
> at the superproject level?
No the keys in the hashmap are a combination of the sha1 blob id of the
.gitmodules file and its name or path respectively. I should make that
more clear here.
> What happens when I call submodule_from_path() for a single
> submodule at one commit in the superproject, and then ask about that
> same submodule for another commit in the superproject, which may
> have a different version of .gitmodules, by calling the same
> function again?
>
> Side note: I think I know the answer to these questions,
> after reading the hash function. for_path does not store
> submodule entries with path as key. It uses the commit and
> the path as a combined key, so both HEAD:.gitmodules and
> HEAD^:.gitmodules can be cached and looked up separatedly if
> their contents are different. The comment and field names
> of "struct submodule_cache" may want to be improved.
That is correct.
> When do we evict the cache? I am wondering what would happen when
> you do "git log --recursive" at the superproject level, which may
> grow the cache in an unbounded way without some eviction policy.
I left this as an exercise for the caller to figure out ;-) I think we
have to leave it to the caller how much of the cache he wants to use.
The only thing I can currently think of, that might help, is that we
could provide the caller with the amount of .gitmodules files read so he
has some simple measure to determine when he wants to call
submodule_free().
All use-cases I currently have in mind are expected to use only a few
.gitmodules. But only in the typical situation. There might be extreme
cases were this is not true anymore. I am not sure how theoretical this
is and thus not sure whether we should take countermeasures now.
What would you expect "git log --recursive" to do?
Cheers Heiko
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2015-06-03 19:31 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-06-02 14:24 [PATCH v4 0/4] submodule config lookup API Heiko Voigt
2015-06-02 14:25 ` [PATCH v4 1/4] implement submodule config cache for lookup of submodule names Heiko Voigt
2015-06-02 19:57 ` Junio C Hamano
2015-06-03 19:31 ` Heiko Voigt
2015-06-02 14:26 ` [PATCH v4 2/4] extract functions for submodule config set and lookup Heiko Voigt
2015-06-02 14:27 ` [PATCH v4 3/4] use new config API for worktree configurations of submodules Heiko Voigt
2015-06-02 14:28 ` [PATCH v4 4/4] do not die on error of parsing fetchrecursesubmodules option Heiko Voigt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).