* [RFC/PATCH] clone: add `--shallow-submodules` flag
@ 2016-03-11 23:41 Stefan Beller
From: Stefan Beller @ 2016-03-11 23:41 UTC (permalink / raw)
To: git; +Cc: larsxschneider, jrnieder, gitster, Stefan Beller
When creating a shallow clone of a repository with submodules, the depth
argument does not influence the submodules, i.e. the submodules are done
as non-shallow clones. It is unclear what the best default is for the
depth of submodules of a shallow clone, so we need to have the possibility
to do all kinds of combinations:
* shallow super project with shallow submodules
e.g. build bots starting always from scratch. They want to transmit
the least amount of network data as well as using the least amount
of space on their hard drive.
* shallow super project with unshallow submodules
e.g. The superproject is just there to track a collection of repositories
and it is not important to have the relationship between the repositories
intact. However, the history of the individual submodules matters.
* unshallow super project with shallow submodules
e.g. The superproject is the actual project and the submodule is a
library which is rarely touched.
The new switch to select submodules to be shallow or unshallow supports
all of these three cases.
It is easy to transition from the first to the second case by just
unshallowing the submodules (`git submodule foreach git fetch
--unshallow`), but it is not possible to transition from the second to the
first case (as we would have already transmitted the non-shallow history over
the network). That is why we want to make the first case the default in
case of a shallow super project. This leads to the inconvenience in the
second case with the shallow super project and unshallow submodules,
as you need to pass `--no-shallow-submodules`.
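To make the three combinations concrete, here is a small dry-run sketch that only prints the clone command for each case. It assumes the semantics proposed in this patch (`--depth` implies `--shallow-submodules` unless `--no-shallow-submodules` is given); `clone_command` and the URL in the usage line are made up for illustration and are not part of git.

```shell
# clone_command SUPER SUBS URL
#   SUPER, SUBS: "shallow" or "full"
# Hypothetical helper: prints (does not run) the clone command for one
# combination, assuming --depth implies --shallow-submodules as proposed.
clone_command() (
    super=$1
    subs=$2
    url=$3
    args="git clone --recurse-submodules"
    case "$super/$subs" in
        shallow/shallow) args="$args --depth=1" ;;
        shallow/full)    args="$args --depth=1 --no-shallow-submodules" ;;
        full/shallow)    args="$args --shallow-submodules" ;;
        full/full)       ;;
        *)
            echo "usage: clone_command shallow|full shallow|full <url>" >&2
            exit 2
            ;;
    esac
    printf '%s %s\n' "$args" "$url"
)
```

For example, `clone_command shallow full https://example.com/super.git` prints the command for the second case, the one that needs the extra `--no-shallow-submodules`.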
Signed-off-by: Stefan Beller <sbeller@google.com>
---
A few notes:
* This applies on top of sb/submodule-parallel-update
* I am aware of the current release cycle, and I ought to not add shiny
new features. But scanning the list revealed no bugs I could jump at to fix.
* Lars made some unit tests for a very similar case a few weeks ago, but they
were not applied as there was no intention to fix it. So I am hoping we can
reuse some of these tests for this patch.
* Currently I have the opinion that thinking about (un)shallow projects should
be rather binary. Either you want the full history or you want as little as
possible. (Who wants to have a --depth 42? Some people have argued that you
can use the depth to avoid large binaries which were part of the history in
the past. But I'd counter that argument by pointing out the --depth argument
is a workaround for such a use case. What you really want in that situation
is a setting to clone history since a certain point in time/DAG, or a setting
to clone up to a certain depth such that the filesize of the packed objects
is smaller than some threshold.)
If we were to allow specifying the depth for submodules, we'd need to discuss
how to specify them for individual submodules i.e. clone submodule A with
depth 4 and submodule B with depth 10. But that problem is more easily solved
by first doing a shallow clone with depth 1 of all submodules and then deepening
them individually.
So binary shallowness (depth = 1 or infinity) it is.
Thanks,
Stefan
Documentation/git-clone.txt | 13 ++++++++++---
builtin/clone.c | 6 ++++++
2 files changed, 16 insertions(+), 3 deletions(-)
diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index 6db7b6d..20a4577 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -14,8 +14,8 @@ SYNOPSIS
[-o <name>] [-b <name>] [-u <upload-pack>] [--reference <repository>]
[--dissociate] [--separate-git-dir <git dir>]
[--depth <depth>] [--[no-]single-branch]
- [--recursive | --recurse-submodules] [--jobs <n>] [--] <repository>
- [<directory>]
+ [--recursive | --recurse-submodules] [--[no-]shallow-submodules]
+ [--jobs <n>] [--] <repository> [<directory>]
DESCRIPTION
-----------
@@ -190,7 +190,11 @@ objects from the source repository into a pack in the cloned repository.
--depth <depth>::
Create a 'shallow' clone with a history truncated to the
- specified number of revisions.
+ specified number of revisions. Implies `--single-branch` unless
+ `--no-single-branch` is given to fetch the histories near the
+ tips of all branches. This implies `--shallow-submodules`. If
+ you want to have a shallow clone, but full submodules, also pass
+ `--no-shallow-submodules`.
--[no-]single-branch::
Clone only the history leading to the tip of a single branch,
@@ -214,6 +218,9 @@ objects from the source repository into a pack in the cloned repository.
repository does not have a worktree/checkout (i.e. if any of
`--no-checkout`/`-n`, `--bare`, or `--mirror` is given)
+--shallow-submodules::
+ All submodules which are cloned, will be shallow.
+
--separate-git-dir=<git dir>::
Instead of placing the cloned repository where it is supposed
to be, place the cloned repository at the specified directory,
diff --git a/builtin/clone.c b/builtin/clone.c
index b004fb4..cfa01fe 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -40,6 +40,7 @@ static const char * const builtin_clone_usage[] = {
static int option_no_checkout, option_bare, option_mirror, option_single_branch = -1;
static int option_local = -1, option_no_hardlinks, option_shared, option_recursive;
+static int option_shallow_submodules = -1;
static char *option_template, *option_depth;
static char *option_origin = NULL;
static char *option_branch = NULL;
@@ -91,6 +92,8 @@ static struct option builtin_clone_options[] = {
N_("create a shallow clone of that depth")),
OPT_BOOL(0, "single-branch", &option_single_branch,
N_("clone only one branch, HEAD or --branch")),
+ OPT_BOOL(0, "shallow-submodules", &option_shallow_submodules,
+ N_("any cloned submodules will be shallow")),
OPT_STRING(0, "separate-git-dir", &real_git_dir, N_("gitdir"),
N_("separate git dir from working tree")),
OPT_STRING_LIST('c', "config", &option_config, N_("key=value"),
@@ -727,6 +730,9 @@ static int checkout(void)
struct argv_array args = ARGV_ARRAY_INIT;
argv_array_pushl(&args, "submodule", "update", "--init", "--recursive", NULL);
+ if (option_shallow_submodules && option_depth)
+ argv_array_pushf(&args, "--depth=1");
+
if (max_jobs != -1)
argv_array_pushf(&args, "--jobs=%d", max_jobs);
--
2.7.0.rc0.42.g42a5408.dirty
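As a reading aid, the condition in the last hunk can be tabulated with a small shell sketch. `submodule_depth_arg` is a hypothetical helper that mirrors `option_shallow_submodules && option_depth`, with the flag encoded as -1 (not given), 0 (`--no-shallow-submodules`) or 1 (`--shallow-submodules`), and an empty depth standing in for `option_depth == NULL`.

```shell
# submodule_depth_arg DEPTH FLAG
#   DEPTH: value passed to --depth, or "" if --depth was not given
#   FLAG:  -1 (not given), 0 (--no-shallow-submodules), 1 (--shallow-submodules)
# Prints the extra argument handed to "git submodule update", if any.
submodule_depth_arg() (
    depth=$1
    flag=$2
    # As in C, any non-zero flag value (including the -1 default) is true,
    # and the depth must also have been given.
    if [ "$flag" -ne 0 ] && [ -n "$depth" ]; then
        echo "--depth=1"
    fi
)
```

With these inputs, `submodule_depth_arg 5 -1` and `submodule_depth_arg 5 1` print `--depth=1`, while `submodule_depth_arg 5 0` and `submodule_depth_arg "" 1` print nothing. If the sketch mirrors the C code faithfully, `--shallow-submodules` without `--depth` would not make the submodules shallow as posted, so the "unshallow super project with shallow submodules" case may need the condition to test the flag alone.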
* Re: [RFC/PATCH] clone: add `--shallow-submodules` flag
@ 2016-03-12 0:41 ` Junio C Hamano
From: Junio C Hamano @ 2016-03-12 0:41 UTC (permalink / raw)
To: Stefan Beller; +Cc: git, larsxschneider, jrnieder
Stefan Beller <sbeller@google.com> writes:
> When creating a shallow clone of a repository with submodules, the depth
> argument does not influence the submodules, i.e. the submodules are done
> as non-shallow clones. It is unclear what the best default is for the
> depth of submodules of a shallow clone, so we need to have the possibility
> to do all kinds of combinations:
>
> * shallow super project with shallow submodules
> e.g. build bots starting always from scratch. They want to transmit
> the least amount of network data as well as using the least amount
> of space on their hard drive.
> * shallow super project with unshallow submodules
> e.g. The superproject is just there to track a collection of repositories
> and it is not important to have the relationship between the repositories
> intact. However, the history of the individual submodules matters.
> * unshallow super project with shallow submodules
> e.g. The superproject is the actual project and the submodule is a
> library which is rarely touched.
>
> The new switch to select submodules to be shallow or unshallow supports
> all of these three cases.
I think something like this is necessary to prime the well, but the
more important (and interesting) bit is how this shallowness is
going to be maintained and carried forward across the future updates
to the top-level supermodule. A submodule that was cloned at depth=1
initially along with its supermodule when the latter was initially
cloned does not have to be indefinitely kept at depth=1, and there
would be a lot of creative ways to make it useful, but the creative
and useful logic would need a piece of information to tell the
future "submodule update" why the submodule repository is shallow to
take into account, I would imagine.
It is somewhat curious that there is no hint left in the submodule
repositories (e.g. their configfile) that they are originally
created with an explicit user request "I said that I want these
submodules to be cloned with depth=1", from that point of view.
* Re: [RFC/PATCH] clone: add `--shallow-submodules` flag
@ 2016-03-12 0:56 ` Stefan Beller
From: Stefan Beller @ 2016-03-12 0:56 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git@vger.kernel.org, Lars Schneider, Jonathan Nieder
On Fri, Mar 11, 2016 at 4:41 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> When creating a shallow clone of a repository with submodules, the depth
>> argument does not influence the submodules, i.e. the submodules are done
>> as non-shallow clones. It is unclear what the best default is for the
>> depth of submodules of a shallow clone, so we need to have the possibility
>> to do all kinds of combinations:
>>
>> * shallow super project with shallow submodules
>> e.g. build bots starting always from scratch. They want to transmit
>> the least amount of network data as well as using the least amount
>> of space on their hard drive.
>> * shallow super project with unshallow submodules
>> e.g. The superproject is just there to track a collection of repositories
>> and it is not important to have the relationship between the repositories
>> intact. However, the history of the individual submodules matters.
>> * unshallow super project with shallow submodules
>> e.g. The superproject is the actual project and the submodule is a
>> library which is rarely touched.
>>
>> The new switch to select submodules to be shallow or unshallow supports
>> all of these three cases.
>
> I think something like this is necessary to prime the well, but the
> more important (and interesting) bit is how this shallowness is
> going to be maintained and carried forward across the future updates
> to the top-level supermodule. A submodule that was cloned at depth=1
> initially along with its supermodule when the latter was initially
> cloned does not have to be indefinitely kept at depth=1, and there
> would be a lot of creative ways to make it useful, but the creative
> and useful logic would need a piece of information to tell the
> future "submodule update" why the submodule repository is shallow to
> take into account, I would imagine.
>
> It is somewhat curious that there is no hint left in the submodule
> repositories (e.g. their configfile) that they are originally
> created with an explicit user request "I said that I want these
> submodules to be cloned with depth=1", from that point of view.
Why is it interesting for submodules but not for standard repositories?
If I clone a repository without submodules, it is also not recorded
that I cloned with an explicit depth=1. If you fetch, you may end up with
a deeper history as git fetch doesn't do a "reshallow" to the configured
depth.
As the depth can easily change, I view depth as a measure which is
only valid at one moment in time. After the operation has succeeded, we would
rather talk about the cut-off points which were introduced by the
shallow operation. These are kept as-is by default, which is sane.
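The cut-off points mentioned here are what git actually records: a shallow repository lists its boundary commits in a file named `shallow` inside the git directory, and no depth number is stored next to it. A minimal sketch for looking at them, where `shallow_boundaries` is a hypothetical helper and the caller supplies the git directory:

```shell
# shallow_boundaries GIT_DIR
# Hypothetical helper: prints the recorded cut-off commits, one per line,
# or nothing if GIT_DIR has no "shallow" file (i.e. is not shallow).
shallow_boundaries() (
    git_dir=$1
    if [ -f "$git_dir/shallow" ]; then
        cat "$git_dir/shallow"
    fi
)
```

For a superproject this would be called as `shallow_boundaries .git`; for a submodule the git directory typically lives under `.git/modules/<name>` of the superproject, so that path would be passed instead.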
* Re: [RFC/PATCH] clone: add `--shallow-submodules` flag
@ 2016-03-12 19:29 ` Junio C Hamano
From: Junio C Hamano @ 2016-03-12 19:29 UTC (permalink / raw)
To: Stefan Beller; +Cc: git@vger.kernel.org, Lars Schneider, Jonathan Nieder
Stefan Beller <sbeller@google.com> writes:
> Why is it interesting for submodules but not for standard repositories?
>
> If I clone a repository without submodules, it is also not recorded
> that I cloned with an explicit depth=1. If you fetch, you may end up with
> a deeper history as git fetch doesn't do a "reshallow" to the configured
> depth.
Very simple.
If you do not have submodule, you would always interact with the
other side directly with "git fetch" or "git pull" and have total
control over when you choose to pass or not to pass extra options to
choose to 1. incrementally extend, 2. deepen, or 3. unshallow. The
user will always explicitly tell you, and knowing how you got there
would not help you, as there is no need to guess for you.
The user can do the same explicit "cd dir && git fetch" update in
each submodule directory and give appropriate options to choose
among the three, but I have an impression that your recent work is
going in the direction of making commands that are run in the
superproject recurse into submodules that automatically fetches and
updates the history down there, discouraging users from working on
individual submodules. You lose the flexibility to explicitly
choose among the three for individual submodules, and you may want
to have some smart in your "run from the superproject and recurse"
tools.
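One reading of the three choices above, expressed as the `git fetch` invocations a user would type in a single repository. This is a dry-run sketch: `fetch_command` is a hypothetical helper that only prints the command, and the mapping of "incrementally extend" to a plain `git fetch` is an interpretation rather than something spelled out in the message.

```shell
# fetch_command MODE [DEPTH]
#   extend    - plain fetch: new commits are added on top, the shallow
#               boundary stays where it is
#   deepen    - re-fetch with a larger --depth, moving the boundary back
#   unshallow - fetch the complete history and drop the boundary
# Hypothetical helper: prints (does not run) the command.
fetch_command() (
    mode=$1
    depth=${2-}
    case "$mode" in
        extend)
            echo "git fetch"
            ;;
        deepen)
            [ -n "$depth" ] || { echo "deepen needs a depth" >&2; exit 2; }
            echo "git fetch --depth=$depth"
            ;;
        unshallow)
            echo "git fetch --unshallow"
            ;;
        *)
            echo "unknown mode: $mode" >&2
            exit 2
            ;;
    esac
)
```

To apply one choice to every submodule from the superproject, the printed command could be handed to `git submodule foreach`, e.g. `git submodule foreach "$(fetch_command unshallow)"`, which should amount to the `git submodule foreach git fetch --unshallow` from the patch description. Choosing a different mode per submodule is exactly the part that such a blanket invocation does not cover.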
A submodule that was initially cloned with depth=1, perhaps because
the user didn't know if the module was interesting to her in the
context of working on the superproject before she had her clone of
the superproject hence she only wanted to see what's there, and a
submodule that was not even fetched initially when the superproject
was cloned and later was "submodule init"ed and fetched with
depth=1, would have the same shallow boundary, but the intent of the
user would clearly be different in the larger picture. I imagined
that your "run in top-level and recurse to fetch in submodules"
tools would benefit if it has more information to intuit what the
end user meant.
* Re: [RFC/PATCH] clone: add `--shallow-submodules` flag
@ 2016-03-14 18:17 ` Stefan Beller
From: Stefan Beller @ 2016-03-14 18:17 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git@vger.kernel.org, Lars Schneider, Jonathan Nieder
On Sat, Mar 12, 2016 at 11:29 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> Why is it interesting for submodules but not for standard repositories?
>>
>> If I clone a repository without submodules, it is also not recorded
>> that I cloned with an explicit depth=1. If you fetch, you may end up with
>> a deeper history as git fetch doesn't do a "reshallow" to the configured
>> depth.
>
> Very simple.
>
> If you do not have submodule, you would always interact with the
> other side directly with "git fetch" or "git pull" and have total
> control over when you choose to pass or not to pass extra options to
> choose to 1. incrementally extend, 2. deepen, or 3. unshallow. The
> user will always explicitly tell you, and knowing how you got there
> would not help you, as there is no need to guess for you.
But the 1. being the default when no options are given, would fit into
the story of submodule treatment here, too?
Say you have run
$ git clone --recurse-submodules --shallow-submodules --depth 42
and later
$ git fetch --recurse-submodules
you'd fetch as usual without moving the anchor point. You could have
options like
$ git fetch --recurse-submodules[=label/pattern]
--unshallow-submodules[=label/pattern]
So first a switch for submodule behavior during fetch and that switch could
be narrowed down to specific submodules only.
>
> The user can do the same explicit "cd dir && git fetch" update in
> each submodule directory and give appropriate options to choose
> among the three, but I have an impression that your recent work is
> going in the direction of making commands that are run in the
> superproject recurse into submodules that automatically fetches and
> updates the history down there, discouraging users from working on
> individual submodules.
Glad you see the high level direction where the submodules are heading,
I was just fixing the most obvious problems (as indicated by their existence
in the "repo" tool).
For the workflow I would think you'd only operate in the supermodule for
synchronizing, e.g.
$ (cd super_project && git pull --recurse-submodules)
$ (cd super_project/submoduleA && $EDITOR && git add <...> && git commit <...>)
$ (cd super_project/submoduleB && $EDITOR && git add <...> && git commit <...>)
# now it becomes a Gerrit specific workflow; no need to commit submodule changes
# to the super project, (but it may not do harm?)
$ (cd super_project && git push --recurse-submodules --submodules-only origin HEAD:refs/for/master)
The last command pushes submodules only as Gerrit will perform the
superproject update on your behalf. This is needed as submodules are
treated as binary files, i.e. merging diverged submodules is hard. So to
avoid diverging submodules, Gerrit can do that for you.
Another workflow could be to improve the merge algorithm for submodules,
such that you can specify how they should be integrated (merged when no
conflict in the submodules occurs; or rebase the commits in the submodules,
altering the commit in the superproject.)
> You lose the flexibility to explicitly
> choose among the three for individual submodules, and you may want
> to have some smart in your "run from the superproject and recurse"
> tools.
But that smart can also come from commandline options (or configuration in
the super project). So you would suggest to configure the superproject
to a certain behavior when you clone with submodules in a certain way?
>
> A submodule that was initially cloned with depth=1, perhaps because
> the user didn't know if the module was interesting to her in the
> context of working on the superproject before she had her clone of
> the superproject hence she only wanted to see what's there, and a
> submodule that was not even fetched initially when the superproject
> was cloned and later was "submodule init"ed and fetched with
> depth=1, would have the same shallow boundary, but the intent of the
> user would clearly be different in the larger picture. I imagined
> that your "run in top-level and recurse to fetch in submodules"
> tools would benefit if it has more information to intuit what the
> end user meant.
So the first one should be shallowed after fetch, but the second would
fetch or even deepen automatically?
I am not sure if we add too much "intuitive magic" here as it would be
hard to explain why that happens?
* Re: [RFC/PATCH] clone: add `--shallow-submodules` flag
@ 2016-03-14 18:37 ` Junio C Hamano
From: Junio C Hamano @ 2016-03-14 18:37 UTC (permalink / raw)
To: Stefan Beller; +Cc: git@vger.kernel.org, Lars Schneider, Jonathan Nieder
Stefan Beller <sbeller@google.com> writes:
> you'd fetch as usual without moving the anchor point. You could have
> options like
>
> $ git fetch --recurse-submodules[=label/pattern]
> --unshallow-submodules[=label/pattern]
> ...
> So the first one should be shallowed after fetch, but the second would
> fetch or even deepen automatically?
>
> I am not sure if we add too much "intuitive magic" here as it would be
> hard to explain why that happens?
All of these are things that need to be thought about when you are
making more things recurse into submodules, as you would want to
make sure people can do different things to each of the submodule
they have.
And if my comment to an RFC/PATCH made you think about them, it
served its purpose. I didn't mean to say "You must implement the
smart from the beginning"--I just meant to say that users will
expect more from the recursive behaviour and you must be prepared
for it.