* [RFC/PATCH 0/3] Enable in-process submodule traversal @ 2009-01-11 23:45 Lars Hjemli 2009-01-11 23:45 ` [RFC/PATCH 1/3] tree.c: add support for traversal of submodules Lars Hjemli 2009-01-12 2:07 ` [RFC/PATCH 0/3] Enable in-process submodule traversal Junio C Hamano 0 siblings, 2 replies; 9+ messages in thread From: Lars Hjemli @ 2009-01-11 23:45 UTC (permalink / raw) To: git; +Cc: Junio C Hamano This patch series implements basic support for traversing the tree objects in submodules when the linked commit object is reachable. Normally such linked commit objects will not be reachable in the containing repository, but adding local copies of submodule repositories as alternate object databases for the containing repo solves this issue. The first patch in the series does all the 'hard work' required for the traversal to work, while the next two patches adds a '--submodules' flag to git-archive and git-ls-tree as proof of concept. Lars Hjemli (3): tree.c: add support for traversal of submodules archive.c: enable traversal of submodules builtin-ls-tree: enable traversal of submodules archive.c | 2 ++ builtin-ls-tree.c | 23 ++++++++--------------- tree.c | 20 +++++++++++++++++--- tree.h | 1 + 4 files changed, 28 insertions(+), 18 deletions(-) ^ permalink raw reply [flat|nested] 9+ messages in thread
* [RFC/PATCH 1/3] tree.c: add support for traversal of submodules 2009-01-11 23:45 [RFC/PATCH 0/3] Enable in-process submodule traversal Lars Hjemli @ 2009-01-11 23:45 ` Lars Hjemli 2009-01-11 23:45 ` [RFC/PATCH 2/3] archive.c: enable " Lars Hjemli 2009-01-12 3:15 ` [RFC/PATCH 1/3] tree.c: add support for " Junio C Hamano 2009-01-12 2:07 ` [RFC/PATCH 0/3] Enable in-process submodule traversal Junio C Hamano 1 sibling, 2 replies; 9+ messages in thread From: Lars Hjemli @ 2009-01-11 23:45 UTC (permalink / raw) To: git; +Cc: Junio C Hamano If the commit referenced by a gitlink is available in the (possibly alternate) object database, read_tree_recursive() is now able to descend into the tree of the linked commit if the flag 'traverse_gitlinks' is turned on. Signed-off-by: Lars Hjemli <hjemli@gmail.com> --- tree.c | 20 +++++++++++++++++--- tree.h | 1 + 2 files changed, 18 insertions(+), 3 deletions(-) diff --git a/tree.c b/tree.c index 03e782a..1468e10 100644 --- a/tree.c +++ b/tree.c @@ -7,6 +7,7 @@ #include "tree-walk.h" const char *tree_type = "tree"; +int traverse_gitlinks = 0; static int read_one_entry_opt(const unsigned char *sha1, const char *base, int baselen, const char *pathname, unsigned mode, int stage, int opt) { @@ -114,16 +115,29 @@ int read_tree_recursive(struct tree *tree, default: return -1; } - if (S_ISDIR(entry.mode)) { + if (S_ISDIR(entry.mode) || (traverse_gitlinks && S_ISGITLINK(entry.mode))) { int retval; char *newbase; unsigned int pathlen = tree_entry_len(entry.path, entry.sha1); - + struct commit *commit; + struct tree *node; + + if (S_ISDIR(entry.mode)) { + node = lookup_tree(entry.sha1); + } else { + commit = lookup_commit_reference_gently(entry.sha1, 1); + if (!commit) + continue; + if (parse_commit(commit)) + die("parse_commit(%s) failed", + sha1_to_hex(entry.sha1)); + node = commit->tree; + } newbase = xmalloc(baselen + 1 + pathlen); memcpy(newbase, base, baselen); memcpy(newbase + baselen, entry.path, pathlen); newbase[baselen + pathlen] = '/'; - retval = read_tree_recursive(lookup_tree(entry.sha1), + retval = read_tree_recursive(node, newbase, baselen + pathlen + 1, stage, match, fn, context); diff --git a/tree.h b/tree.h index 2ff01a4..b6b938f 100644 --- a/tree.h +++ b/tree.h @@ -4,6 +4,7 @@ #include "object.h" extern const char *tree_type; +extern int traverse_gitlinks; struct tree { struct object object; -- 1.6.1.81.g1df1.dirty ^ permalink raw reply related [flat|nested] 9+ messages in thread
* [RFC/PATCH 2/3] archive.c: enable traversal of submodules 2009-01-11 23:45 ` [RFC/PATCH 1/3] tree.c: add support for traversal of submodules Lars Hjemli @ 2009-01-11 23:45 ` Lars Hjemli 2009-01-11 23:45 ` [RFC/PATCH 3/3] builtin-ls-tree: " Lars Hjemli 2009-01-12 3:15 ` [RFC/PATCH 1/3] tree.c: add support for " Junio C Hamano 1 sibling, 1 reply; 9+ messages in thread From: Lars Hjemli @ 2009-01-11 23:45 UTC (permalink / raw) To: git; +Cc: Junio C Hamano The option --submodules can be used to activate traversal of sub- modules when creating archives. Signed-off-by: Lars Hjemli <hjemli@gmail.com> --- archive.c | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/archive.c b/archive.c index 9ac455d..973dde4 100644 --- a/archive.c +++ b/archive.c @@ -262,6 +262,8 @@ static int parse_archive_args(int argc, const char **argv, OPT_STRING(0, "format", &format, "fmt", "archive format"), OPT_STRING(0, "prefix", &base, "prefix", "prepend prefix to each pathname in the archive"), + OPT_BOOLEAN(0, "submodules", &traverse_gitlinks, + "include reacheable submodules"), OPT__VERBOSE(&verbose), OPT__COMPR('0', &compression_level, "store only", 0), OPT__COMPR('1', &compression_level, "compress faster", 1), -- 1.6.1.81.g1df1.dirty ^ permalink raw reply related [flat|nested] 9+ messages in thread
* [RFC/PATCH 3/3] builtin-ls-tree: enable traversal of submodules 2009-01-11 23:45 ` [RFC/PATCH 2/3] archive.c: enable " Lars Hjemli @ 2009-01-11 23:45 ` Lars Hjemli 0 siblings, 0 replies; 9+ messages in thread From: Lars Hjemli @ 2009-01-11 23:45 UTC (permalink / raw) To: git; +Cc: Junio C Hamano The option '--submodules', which implies '-r', activates the traversal of all submodules for which the linked commit is reachable. Signed-off-by: Lars Hjemli <hjemli@gmail.com> --- builtin-ls-tree.c | 23 ++++++++--------------- 1 files changed, 8 insertions(+), 15 deletions(-) diff --git a/builtin-ls-tree.c b/builtin-ls-tree.c index cb61717..8a1db54 100644 --- a/builtin-ls-tree.c +++ b/builtin-ls-tree.c @@ -23,7 +23,7 @@ static int chomp_prefix; static const char *ls_tree_prefix; static const char ls_tree_usage[] = - "git ls-tree [-d] [-r] [-t] [-l] [-z] [--name-only] [--name-status] [--full-name] [--abbrev[=<n>]] <tree-ish> [path...]"; + "git ls-tree [-d] [-r] [-t] [-l] [-z] [--name-only] [--name-status] [--full-name] [--abbrev[=<n>]] [--submodules] <tree-ish> [path...]"; static int show_recursive(const char *base, int baselen, const char *pathname) { @@ -63,20 +63,8 @@ static int show_tree(const unsigned char *sha1, const char *base, int baselen, unsigned long size; if (S_ISGITLINK(mode)) { - /* - * Maybe we want to have some recursive version here? - * - * Something similar to this incomplete example: - * - if (show_subprojects(base, baselen, pathname)) { - struct child_process ls_tree; - - ls_tree.dir = base; - ls_tree.argv = ls-tree; - start_command(&ls_tree); - } - * - */ + if (show_recursive(base, baselen, pathname)) + retval = READ_TREE_RECURSIVE; type = commit_type; } else if (S_ISDIR(mode)) { if (show_recursive(base, baselen, pathname)) { @@ -168,6 +156,11 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix) abbrev = DEFAULT_ABBREV; break; } + if (!strcmp(argv[1]+2, "submodules")) { + ls_options |= LS_RECURSIVE; + traverse_gitlinks = 1; + break; + } /* otherwise fallthru */ default: usage(ls_tree_usage); -- 1.6.1.81.g1df1.dirty ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [RFC/PATCH 1/3] tree.c: add support for traversal of submodules 2009-01-11 23:45 ` [RFC/PATCH 1/3] tree.c: add support for traversal of submodules Lars Hjemli 2009-01-11 23:45 ` [RFC/PATCH 2/3] archive.c: enable " Lars Hjemli @ 2009-01-12 3:15 ` Junio C Hamano 2009-01-12 9:04 ` Lars Hjemli 1 sibling, 1 reply; 9+ messages in thread From: Junio C Hamano @ 2009-01-12 3:15 UTC (permalink / raw) To: Lars Hjemli; +Cc: git Lars Hjemli <hjemli@gmail.com> writes: > If the commit referenced by a gitlink is available in the (possibly > alternate) object database, read_tree_recursive() is now able to descend > into the tree of the linked commit if the flag 'traverse_gitlinks' is > turned on. > > Signed-off-by: Lars Hjemli <hjemli@gmail.com> > --- > tree.c | 20 +++++++++++++++++--- > tree.h | 1 + > 2 files changed, 18 insertions(+), 3 deletions(-) > > diff --git a/tree.c b/tree.c > index 03e782a..1468e10 100644 > --- a/tree.c > +++ b/tree.c > @@ -7,6 +7,7 @@ > #include "tree-walk.h" > > const char *tree_type = "tree"; > +int traverse_gitlinks = 0; I think we tend to put these global settings that will affect everybody in environment.c. You do not have to initialize variable to zero; BSS will take care of it. When the user explicitly asks you to traverse into submodules and the necessary commit is not available in a submodule, the code goes on without complaining. I am not saying it is bad, but I wonder if we would want to distinguish these three cases: (1) the submodule is initialized and the necessary commit is there. (2) the submodule is initialized, but the necessary commit is missing. (3) the submodule is not even initialized (aka "the user is not interested in it"); there is only an empty directory. I think it is perfectly fine not to say anything for (3) but I am unsure about the second case. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC/PATCH 1/3] tree.c: add support for traversal of submodules 2009-01-12 3:15 ` [RFC/PATCH 1/3] tree.c: add support for " Junio C Hamano @ 2009-01-12 9:04 ` Lars Hjemli 2009-01-12 10:07 ` Junio C Hamano 0 siblings, 1 reply; 9+ messages in thread From: Lars Hjemli @ 2009-01-12 9:04 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Mon, Jan 12, 2009 at 04:15, Junio C Hamano <gitster@pobox.com> wrote: > Lars Hjemli <hjemli@gmail.com> writes: > >> If the commit referenced by a gitlink is available in the (possibly >> alternate) object database, read_tree_recursive() is now able to descend >> into the tree of the linked commit if the flag 'traverse_gitlinks' is >> turned on. >> >> Signed-off-by: Lars Hjemli <hjemli@gmail.com> >> --- >> tree.c | 20 +++++++++++++++++--- >> tree.h | 1 + >> 2 files changed, 18 insertions(+), 3 deletions(-) >> >> diff --git a/tree.c b/tree.c >> index 03e782a..1468e10 100644 >> --- a/tree.c >> +++ b/tree.c >> @@ -7,6 +7,7 @@ >> #include "tree-walk.h" >> >> const char *tree_type = "tree"; >> +int traverse_gitlinks = 0; > > I think we tend to put these global settings that will affect everybody in > environment.c. You do not have to initialize variable to zero; BSS will > take care of it. Ok, I'll add a proper interface in environment.c for this setting. > When the user explicitly asks you to traverse into submodules and the > necessary commit is not available in a submodule, the code goes on without > complaining. I am not saying it is bad, but I wonder if we would want to > distinguish these three cases: > > (1) the submodule is initialized and the necessary commit is there. > > (2) the submodule is initialized, but the necessary commit is missing. > > (3) the submodule is not even initialized (aka "the user is not > interested in it"); there is only an empty directory. > > I think it is perfectly fine not to say anything for (3) but I am unsure > about the second case. Do we want to impose the porcelainish rules of git-submodule (.gitmodules, .git/config) in read_tree_recursive()? If so, I guess a new submodule.h might provide something like this (disclaimer: coded in gmail): struct submodule { int interesting:1; char *name; char *url; char **objectdirs; char **paths; } typedef int (*submodule_cb)(struct submodule *submodule, void *data); int load_submodule_config(); struct submodule *get_submodule_from_path(char *path); int add_submodule_objectdb(struct submodule *item); int for_each_submodule(submodule_cb cb, void *data); Then, in read_tree_recursive(), we could check submodule->interesting to decide if we should follow the gitlink. Also, for bare repositories, we'd need to support something like 'submodule.<name>.objectdir' in the config file (and .gitmodules must obviously be read from the objectdb). If we don't want to impose these rules, the current patch should be minimally sufficient (the user has to edit .git/objects/info/alternates by hand and missing submodule commits are treated as if the submodule is not interesting). -- larsh ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC/PATCH 1/3] tree.c: add support for traversal of submodules 2009-01-12 9:04 ` Lars Hjemli @ 2009-01-12 10:07 ` Junio C Hamano 2009-01-12 10:59 ` Lars Hjemli 0 siblings, 1 reply; 9+ messages in thread From: Junio C Hamano @ 2009-01-12 10:07 UTC (permalink / raw) To: Lars Hjemli; +Cc: git Lars Hjemli <hjemli@gmail.com> writes: > On Mon, Jan 12, 2009 at 04:15, Junio C Hamano <gitster@pobox.com> wrote: > ... >> When the user explicitly asks you to traverse into submodules and the >> necessary commit is not available in a submodule, the code goes on without >> complaining. I am not saying it is bad, but I wonder if we would want to >> distinguish these three cases: >> >> (1) the submodule is initialized and the necessary commit is there. >> >> (2) the submodule is initialized, but the necessary commit is missing. >> >> (3) the submodule is not even initialized (aka "the user is not >> interested in it"); there is only an empty directory. >> >> I think it is perfectly fine not to say anything for (3) but I am unsure >> about the second case. > > Do we want to impose the porcelainish rules of git-submodule > (.gitmodules, .git/config) in read_tree_recursive()? > > If so, I guess a new submodule.h might provide something like this > (disclaimer: coded in gmail): I do not see why you would need anything more than we already have to tell (3) from (1) and (2). And I do not see why you need to have the Porcelain policy in the picture for telling these three cases apart, either. For example, there is this code in read-cache.c::ce_compare_gitlink(): static int ce_compare_gitlink(struct cache_entry *ce) { unsigned char sha1[20]; /* * We don't actually require that the .git directory * under GITLINK directory be a valid git directory. It * might even be missing (in case nobody populated that * sub-project). * * If so, we consider it always to match. */ if (resolve_gitlink_ref(ce->name, "HEAD", sha1) < 0) return 0; return hashcmp(sha1, ce->sha1); } It asks resolve_gitlink_ref() to see if the directory (where the submodule checkout _might_ be present if the user is interested in it) has .git/HEAD that resolves. If so, the user has a checkout and is interested in it. Otherwise, there is no checkout, in other words, we have case (3) above. Whether you force the user to link the submodule object store to the primary one as alternates, or do that for the user temporarily inside the process [*1*], you would then be able to tell (1) and (2) apart by asking has_sha1_file() if you can see the commit. One thing that is unclear is to me is for whom the commit is missing (or present). I think the outline I gave above follows the design of your patch to assume that the commit may (or may not) be available to the superproject and traverse into the commit when that is the case. It does not mean the commit is available to the submodule itself (the commit may have found in the primary project itself, not via the alternates), but such an arrangement makes it somewhat useless. What's the typical direction of using alternates in a setting with superproject with a submodule? Do people have alternates in the submodule repository that borrows from the superproject repository? Or the other way around? What's the rationale for having such alternates for normal use case? I am suspecting that there is no reason (other than this "recursive tree traversal") to have an alternates file in either direction, but I also strongly suspect that I am missing some unwritten assumption you are making. [Footnote] *1* If you want to recurse seemlessly, it might make sense to add (during the course of this "recursive tree traversal") the object store of the submodule repository to the process'es list of alternate object databases yourself, instead of forcing the user to do so permanently by mucking with the alternates list of the primary project repository. But that is an independent issue. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC/PATCH 1/3] tree.c: add support for traversal of submodules 2009-01-12 10:07 ` Junio C Hamano @ 2009-01-12 10:59 ` Lars Hjemli 0 siblings, 0 replies; 9+ messages in thread From: Lars Hjemli @ 2009-01-12 10:59 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Mon, Jan 12, 2009 at 11:07, Junio C Hamano <gitster@pobox.com> wrote: > Lars Hjemli <hjemli@gmail.com> writes: > >> On Mon, Jan 12, 2009 at 04:15, Junio C Hamano <gitster@pobox.com> wrote: >> ... >>> When the user explicitly asks you to traverse into submodules and the >>> necessary commit is not available in a submodule, the code goes on without >>> complaining. I am not saying it is bad, but I wonder if we would want to >>> distinguish these three cases: >>> >>> (1) the submodule is initialized and the necessary commit is there. >>> >>> (2) the submodule is initialized, but the necessary commit is missing. >>> >>> (3) the submodule is not even initialized (aka "the user is not >>> interested in it"); there is only an empty directory. >>> >>> I think it is perfectly fine not to say anything for (3) but I am unsure >>> about the second case. >> >> Do we want to impose the porcelainish rules of git-submodule >> (.gitmodules, .git/config) in read_tree_recursive()? >> >> If so, I guess a new submodule.h might provide something like this >> (disclaimer: coded in gmail): > > I do not see why you would need anything more than we already have to tell > (3) from (1) and (2). And I do not see why you need to have the Porcelain > policy in the picture for telling these three cases apart, either. > > For example, there is this code in read-cache.c::ce_compare_gitlink(): > > static int ce_compare_gitlink(struct cache_entry *ce) > { > unsigned char sha1[20]; > > /* > * We don't actually require that the .git directory > * under GITLINK directory be a valid git directory. It > * might even be missing (in case nobody populated that > * sub-project). > * > * If so, we consider it always to match. > */ > if (resolve_gitlink_ref(ce->name, "HEAD", sha1) < 0) > return 0; > return hashcmp(sha1, ce->sha1); > } > > It asks resolve_gitlink_ref() to see if the directory (where the submodule > checkout _might_ be present if the user is interested in it) has .git/HEAD > that resolves. If so, the user has a checkout and is interested in it. > Otherwise, there is no checkout, in other words, we have case (3) above. Ah, yes, this makes sense. Thanks. > Whether you force the user to link the submodule object store to the > primary one as alternates, or do that for the user temporarily inside the > process [*1*], If resolve_gitlink_ref() returns 0, I think we should automatically insert the objectdir of the submodule as a tempory alternate. > you would then be able to tell (1) and (2) apart by asking > has_sha1_file() if you can see the commit. Yes (I've also got a use-case for this with bare repositories [*1*], but in that setting I guess it's ok to force the user to link the alternates manually). > One thing that is unclear is to me is for whom the commit is missing (or > present). I think the outline I gave above follows the design of your > patch to assume that the commit may (or may not) be available to the > superproject and traverse into the commit when that is the case. It does > not mean the commit is available to the submodule itself (the commit may > have found in the primary project itself, not via the alternates), but > such an arrangement makes it somewhat useless. I think we can ignore this issue; if someone has added the superproject as an alternate for the submodule and then done a checkout of a superproject commit in the submodule followed by committing this gitlink in the superproject, we can only hope the user really knew what he was doing... > What's the typical direction of using alternates in a setting with > superproject with a submodule? Do people have alternates in the submodule > repository that borrows from the superproject repository? Or the other > way around? What's the rationale for having such alternates for normal > use case? I am suspecting that there is no reason (other than this > "recursive tree traversal") to have an alternates file in either > direction, Likewise. > but I also strongly suspect that I am missing some unwritten > assumption you are making. I'm only assuming that we want to support traversal into the submodules from core git. *1* This will be (ab)used by cgit to support downloading of 'complete' tarballs, as outlined in http://thread.gmane.org/gmane.comp.version-control.git/102827. -- larsh ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC/PATCH 0/3] Enable in-process submodule traversal 2009-01-11 23:45 [RFC/PATCH 0/3] Enable in-process submodule traversal Lars Hjemli 2009-01-11 23:45 ` [RFC/PATCH 1/3] tree.c: add support for traversal of submodules Lars Hjemli @ 2009-01-12 2:07 ` Junio C Hamano 1 sibling, 0 replies; 9+ messages in thread From: Junio C Hamano @ 2009-01-12 2:07 UTC (permalink / raw) To: Lars Hjemli; +Cc: git Sounds interesting except 1/3 didn't seem to reach the list... ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2009-01-12 11:01 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-01-11 23:45 [RFC/PATCH 0/3] Enable in-process submodule traversal Lars Hjemli 2009-01-11 23:45 ` [RFC/PATCH 1/3] tree.c: add support for traversal of submodules Lars Hjemli 2009-01-11 23:45 ` [RFC/PATCH 2/3] archive.c: enable " Lars Hjemli 2009-01-11 23:45 ` [RFC/PATCH 3/3] builtin-ls-tree: " Lars Hjemli 2009-01-12 3:15 ` [RFC/PATCH 1/3] tree.c: add support for " Junio C Hamano 2009-01-12 9:04 ` Lars Hjemli 2009-01-12 10:07 ` Junio C Hamano 2009-01-12 10:59 ` Lars Hjemli 2009-01-12 2:07 ` [RFC/PATCH 0/3] Enable in-process submodule traversal Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).