git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] clone: --dissociate option to mark that reference is only temporary
@ 2014-10-14 19:57 Junio C Hamano
  2014-10-15 14:34 ` Marc Branchaud
  2014-10-15 19:44 ` Johannes Sixt
  0 siblings, 2 replies; 12+ messages in thread
From: Junio C Hamano @ 2014-10-14 19:57 UTC (permalink / raw)
  To: git

While use of the --reference option to borrow objects from an
existing local repository of the same project is an effective way to
reduce traffic when cloning a project over the network, it makes the
resulting "borrowing" repository dependent on the "borrowed"
repository.  After running

	git clone --reference=P $URL Q

the resulting repository Q will be broken if the borrowed repository
P disappears.

The way to allow the borrowed repository to be removed is to repack
the borrowing repository (i.e. run "git repack -a -d" in Q); while
power users may know it very well, it is not easily discoverable.

Teach a new "--dissociate" option to "git clone" to run this
repacking for the user.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * This comes from
   http://thread.gmane.org/gmane.comp.version-control.git/243918/focus=245397
   which is one of the low-hanging entries in the leftover-bits list
   http://git-blame.blogspot.com/p/leftover-bits.html

   Yes, I must have been really bored to do this ;-)

 Documentation/git-clone.txt | 11 +++++++++--
 builtin/clone.c             | 25 +++++++++++++++++++++++++
 t/t5700-clone-reference.sh  | 17 +++++++++++++++++
 3 files changed, 51 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index 0363d00..f1f2a3f 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -12,7 +12,7 @@ SYNOPSIS
 'git clone' [--template=<template_directory>]
 	  [-l] [-s] [--no-hardlinks] [-q] [-n] [--bare] [--mirror]
 	  [-o <name>] [-b <name>] [-u <upload-pack>] [--reference <repository>]
-	  [--separate-git-dir <git dir>]
+	  [--dissociate] [--separate-git-dir <git dir>]
 	  [--depth <depth>] [--[no-]single-branch]
 	  [--recursive | --recurse-submodules] [--] <repository>
 	  [<directory>]
@@ -98,7 +98,14 @@ objects from the source repository into a pack in the cloned repository.
 	require fewer objects to be copied from the repository
 	being cloned, reducing network and local storage costs.
 +
-*NOTE*: see the NOTE for the `--shared` option.
+*NOTE*: see the NOTE for the `--shared` option, and also the
+`--dissociate` option.
+
+--dissociate::
+	Borrow the objects from reference repositories specified
+	with the `--reference` options only to reduce network
+	transfer and stop borrowing from them after a clone is made
+	by making necessary local copies of borrowed objects.
 
 --quiet::
 -q::
diff --git a/builtin/clone.c b/builtin/clone.c
index bbd169c..780fbd5 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -48,6 +48,7 @@ static int option_verbosity;
 static int option_progress = -1;
 static struct string_list option_config;
 static struct string_list option_reference;
+static int option_dissociate;
 
 static int opt_parse_reference(const struct option *opt, const char *arg, int unset)
 {
@@ -93,6 +94,8 @@ static struct option builtin_clone_options[] = {
 		    N_("create a shallow clone of that depth")),
 	OPT_BOOL(0, "single-branch", &option_single_branch,
 		    N_("clone only one branch, HEAD or --branch")),
+	OPT_BOOL(0, "dissociate", &option_dissociate,
+		 N_("use --reference only while cloning")),
 	OPT_STRING(0, "separate-git-dir", &real_git_dir, N_("gitdir"),
 		   N_("separate git dir from working tree")),
 	OPT_STRING_LIST('c', "config", &option_config, N_("key=value"),
@@ -736,6 +739,21 @@ static void write_refspec_config(const char* src_ref_prefix,
 	strbuf_release(&value);
 }
 
+static void dissociate_from_references(void)
+{
+	struct child_process cmd;
+
+	memset(&cmd, 0, sizeof(cmd));
+	argv_array_pushl(&cmd.args, "repack", "-a", "-d", NULL);
+	cmd.git_cmd = 1;
+	cmd.out = -1;
+	cmd.no_stdin = 1;
+	if (run_command(&cmd))
+		die(_("cannot repack to clean up"));
+	if (unlink(git_path("objects/info/alternates")) && errno != ENOENT)
+		die_errno(_("cannot unlink temporary alternates file"));
+}
+
 int cmd_clone(int argc, const char **argv, const char *prefix)
 {
 	int is_bundle = 0, is_local;
@@ -883,6 +901,10 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 
 	if (option_reference.nr)
 		setup_reference();
+	else if (option_dissociate) {
+		warning(_("--dissociate given, but there is no --reference"));
+		option_dissociate = 0;
+	}
 
 	fetch_pattern = value.buf;
 	refspec = parse_fetch_refspec(1, &fetch_pattern);
@@ -996,6 +1018,9 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	transport_unlock_pack(transport);
 	transport_disconnect(transport);
 
+	if (option_dissociate)
+		dissociate_from_references();
+
 	junk_mode = JUNK_LEAVE_REPO;
 	err = checkout();
 
diff --git a/t/t5700-clone-reference.sh b/t/t5700-clone-reference.sh
index 6537911..3e783fc 100755
--- a/t/t5700-clone-reference.sh
+++ b/t/t5700-clone-reference.sh
@@ -198,4 +198,21 @@ test_expect_success 'clone using repo pointed at by gitfile as reference' '
 	test_cmp expected "$base_dir/O/.git/objects/info/alternates"
 '
 
+test_expect_success 'clone and dissociate from reference' '
+	git init P &&
+	(
+		cd P &&	test_commit one
+	) &&
+	git clone P Q &&
+	(
+		cd Q && test_commit two
+	) &&
+	git clone --no-local --reference=P Q R &&
+	git clone --no-local --reference=P --dissociate Q S &&
+	# removing the reference P would corrupt R but not S
+	rm -fr P &&
+	test_must_fail git -C R fsck &&
+	git -C S fsck
+'
+
 test_done
-- 
2.1.2-488-g6ab273f

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] clone: --dissociate option to mark that reference is only temporary
  2014-10-14 19:57 [PATCH] clone: --dissociate option to mark that reference is only temporary Junio C Hamano
@ 2014-10-15 14:34 ` Marc Branchaud
  2014-10-15 17:29   ` Junio C Hamano
  2014-10-15 19:44 ` Johannes Sixt
  1 sibling, 1 reply; 12+ messages in thread
From: Marc Branchaud @ 2014-10-15 14:34 UTC (permalink / raw)
  To: Junio C Hamano, git

On 14-10-14 03:57 PM, Junio C Hamano wrote:
> While use of the --reference option to borrow objects from an
> existing local repository of the same project is an effective way to
> reduce traffic when cloning a project over the network, it makes the
> resulting "borrowing" repository dependent on the "borrowed"
> repository.  After running
> 
> 	git clone --reference=P $URL Q
> 
> the resulting repository Q will be broken if the borrowed repository
> P disappears.
> 
> The way to allow the borrowed repository to be removed is to repack
> the borrowing repository (i.e. run "git repack -a -d" in Q); while
> power users may know it very well, it is not easily discoverable.
> 
> Teach a new "--dissociate" option to "git clone" to run this
> repacking for the user.

After reading the above I thought the option would be better named
"--derference".  It seemed to me to be something one would like to run after
the first --reference clone.

Looking more closely I see that's not the case.  In fact, the option only
makes sense if --reference is also used.

I think things would be more understandable if the option was "--dissociate
<repository>" and was an explicit alternative to --reference:
	[[--reference | --dissociate] <repository>]

I'm still not liking the name "--dissociate" though.  The original suggestion
of "--borrow" is better.  Perhaps "--library" or "--local-cache"?  I dunno...

So now I'm wondering if the implementation would be more efficient as an
extension of the --local operation.  That is, instead of a post-clone repack,
do a --local clone first followed by a simple "git fetch" from the source repo.

		M.


> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
> 
>  * This comes from
>    http://thread.gmane.org/gmane.comp.version-control.git/243918/focus=245397
>    which is one of the low-hanging entries in the leftover-bits list
>    http://git-blame.blogspot.com/p/leftover-bits.html
> 
>    Yes, I must have been really bored to do this ;-)
> 
>  Documentation/git-clone.txt | 11 +++++++++--
>  builtin/clone.c             | 25 +++++++++++++++++++++++++
>  t/t5700-clone-reference.sh  | 17 +++++++++++++++++
>  3 files changed, 51 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
> index 0363d00..f1f2a3f 100644
> --- a/Documentation/git-clone.txt
> +++ b/Documentation/git-clone.txt
> @@ -12,7 +12,7 @@ SYNOPSIS
>  'git clone' [--template=<template_directory>]
>  	  [-l] [-s] [--no-hardlinks] [-q] [-n] [--bare] [--mirror]
>  	  [-o <name>] [-b <name>] [-u <upload-pack>] [--reference <repository>]
> -	  [--separate-git-dir <git dir>]
> +	  [--dissociate] [--separate-git-dir <git dir>]
>  	  [--depth <depth>] [--[no-]single-branch]
>  	  [--recursive | --recurse-submodules] [--] <repository>
>  	  [<directory>]
> @@ -98,7 +98,14 @@ objects from the source repository into a pack in the cloned repository.
>  	require fewer objects to be copied from the repository
>  	being cloned, reducing network and local storage costs.
>  +
> -*NOTE*: see the NOTE for the `--shared` option.
> +*NOTE*: see the NOTE for the `--shared` option, and also the
> +`--dissociate` option.
> +
> +--dissociate::
> +	Borrow the objects from reference repositories specified
> +	with the `--reference` options only to reduce network
> +	transfer and stop borrowing from them after a clone is made
> +	by making necessary local copies of borrowed objects.
>  
>  --quiet::
>  -q::
> diff --git a/builtin/clone.c b/builtin/clone.c
> index bbd169c..780fbd5 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -48,6 +48,7 @@ static int option_verbosity;
>  static int option_progress = -1;
>  static struct string_list option_config;
>  static struct string_list option_reference;
> +static int option_dissociate;
>  
>  static int opt_parse_reference(const struct option *opt, const char *arg, int unset)
>  {
> @@ -93,6 +94,8 @@ static struct option builtin_clone_options[] = {
>  		    N_("create a shallow clone of that depth")),
>  	OPT_BOOL(0, "single-branch", &option_single_branch,
>  		    N_("clone only one branch, HEAD or --branch")),
> +	OPT_BOOL(0, "dissociate", &option_dissociate,
> +		 N_("use --reference only while cloning")),
>  	OPT_STRING(0, "separate-git-dir", &real_git_dir, N_("gitdir"),
>  		   N_("separate git dir from working tree")),
>  	OPT_STRING_LIST('c', "config", &option_config, N_("key=value"),
> @@ -736,6 +739,21 @@ static void write_refspec_config(const char* src_ref_prefix,
>  	strbuf_release(&value);
>  }
>  
> +static void dissociate_from_references(void)
> +{
> +	struct child_process cmd;
> +
> +	memset(&cmd, 0, sizeof(cmd));
> +	argv_array_pushl(&cmd.args, "repack", "-a", "-d", NULL);
> +	cmd.git_cmd = 1;
> +	cmd.out = -1;
> +	cmd.no_stdin = 1;
> +	if (run_command(&cmd))
> +		die(_("cannot repack to clean up"));
> +	if (unlink(git_path("objects/info/alternates")) && errno != ENOENT)
> +		die_errno(_("cannot unlink temporary alternates file"));
> +}
> +
>  int cmd_clone(int argc, const char **argv, const char *prefix)
>  {
>  	int is_bundle = 0, is_local;
> @@ -883,6 +901,10 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>  
>  	if (option_reference.nr)
>  		setup_reference();
> +	else if (option_dissociate) {
> +		warning(_("--dissociate given, but there is no --reference"));
> +		option_dissociate = 0;
> +	}
>  
>  	fetch_pattern = value.buf;
>  	refspec = parse_fetch_refspec(1, &fetch_pattern);
> @@ -996,6 +1018,9 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>  	transport_unlock_pack(transport);
>  	transport_disconnect(transport);
>  
> +	if (option_dissociate)
> +		dissociate_from_references();
> +
>  	junk_mode = JUNK_LEAVE_REPO;
>  	err = checkout();
>  
> diff --git a/t/t5700-clone-reference.sh b/t/t5700-clone-reference.sh
> index 6537911..3e783fc 100755
> --- a/t/t5700-clone-reference.sh
> +++ b/t/t5700-clone-reference.sh
> @@ -198,4 +198,21 @@ test_expect_success 'clone using repo pointed at by gitfile as reference' '
>  	test_cmp expected "$base_dir/O/.git/objects/info/alternates"
>  '
>  
> +test_expect_success 'clone and dissociate from reference' '
> +	git init P &&
> +	(
> +		cd P &&	test_commit one
> +	) &&
> +	git clone P Q &&
> +	(
> +		cd Q && test_commit two
> +	) &&
> +	git clone --no-local --reference=P Q R &&
> +	git clone --no-local --reference=P --dissociate Q S &&
> +	# removing the reference P would corrupt R but not S
> +	rm -fr P &&
> +	test_must_fail git -C R fsck &&
> +	git -C S fsck
> +'
> +
>  test_done
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] clone: --dissociate option to mark that reference is only temporary
  2014-10-15 14:34 ` Marc Branchaud
@ 2014-10-15 17:29   ` Junio C Hamano
  2014-10-15 20:51     ` Marc Branchaud
  2014-10-16 19:27     ` Junio C Hamano
  0 siblings, 2 replies; 12+ messages in thread
From: Junio C Hamano @ 2014-10-15 17:29 UTC (permalink / raw)
  To: Marc Branchaud; +Cc: git

Marc Branchaud <marcnarc@xiplink.com> writes:

> I think things would be more understandable if the option was "--dissociate
> <repository>" and was an explicit alternative to --reference:
> 	[[--reference | --dissociate] <repository>]
>
> I'm still not liking the name "--dissociate" though.  The original suggestion
> of "--borrow" is better.  Perhaps "--library" or "--local-cache"?  I dunno...

I was not thinking when I originally started the topic with
"--borrow", until I realized that it would not make much sense,
primarily because we allow multiple references.

What should this command line do, and how would you implement such a
behaviour?

    $ git clone \
        --reference=/local/pool/linux.git \
        --borrow=../my/neighbour/linux-hack.git \
        git://git.kernel.org/...../linux.git

With "do the usual --reference thing, but then dissociate the result
from referents" option, there is no ambiguity and that is why I did
not go with the "--borrow" option suggested in the original thread.

> So now I'm wondering if the implementation would be more efficient as an
> extension of the --local operation.  That is, instead of a post-clone repack,
> do a --local clone first followed by a simple "git fetch" from the source repo.

The network overhead may be comparable to the "--reference"
optimization, but if your "clone --local" ends up copying (instead
of hard-linking), the initial cost to copy locally would be a pure
extra price over "clone --reference and then --dissociate".  If the
local clone uses hard-linking, it would be cheaper, but it still
costs more than dropping an entry into .git/objects/info/alternates,
I would imagine.  You will pay with your scheme the same cost to run
"repack -a -d", which is paid by "--dissociate" at the end of clone,
eventually at the first "gc", so there is no efficiency advantage,
either.

The above is my knee-jerk assessment without any measuring, though.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] clone: --dissociate option to mark that reference is only temporary
  2014-10-14 19:57 [PATCH] clone: --dissociate option to mark that reference is only temporary Junio C Hamano
  2014-10-15 14:34 ` Marc Branchaud
@ 2014-10-15 19:44 ` Johannes Sixt
  2014-10-15 21:33   ` Junio C Hamano
  1 sibling, 1 reply; 12+ messages in thread
From: Johannes Sixt @ 2014-10-15 19:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Am 14.10.2014 um 21:57 schrieb Junio C Hamano:
> +static void dissociate_from_references(void)
> +{
> +	struct child_process cmd;
> +
> +	memset(&cmd, 0, sizeof(cmd));

We have CHILD_PROCESS_INIT these days.

> +	argv_array_pushl(&cmd.args, "repack", "-a", "-d", NULL);
> +	cmd.git_cmd = 1;
> +	cmd.out = -1;

This requests a pipe, but you don't use it nor need it.

> +	cmd.no_stdin = 1;
> +	if (run_command(&cmd))
> +		die(_("cannot repack to clean up"));
> +	if (unlink(git_path("objects/info/alternates")) && errno != ENOENT)
> +		die_errno(_("cannot unlink temporary alternates file"));
> +}

Unless you have a secret plan, you can do it even shorter with our
helpers:

diff --git a/builtin/clone.c b/builtin/clone.c
index 8649090..81efb07 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -743,14 +743,9 @@ static void write_refspec_config(const char *src_ref_prefix,
 
 static void dissociate_from_references(void)
 {
-	struct child_process cmd;
-
-	memset(&cmd, 0, sizeof(cmd));
-	argv_array_pushl(&cmd.args, "repack", "-a", "-d", NULL);
-	cmd.git_cmd = 1;
-	cmd.out = -1;
-	cmd.no_stdin = 1;
-	if (run_command(&cmd))
+	static const char* argv[] = { "repack", "-a", "-d", NULL };
+
+	if (run_command_v_opt(argv, RUN_GIT_CMD|RUN_COMMAND_NO_STDIN))
 		die(_("cannot repack to clean up"));
 	if (unlink(git_path("objects/info/alternates")) && errno != ENOENT)
 		die_errno(_("cannot unlink temporary alternates file"));

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] clone: --dissociate option to mark that reference is only temporary
  2014-10-15 17:29   ` Junio C Hamano
@ 2014-10-15 20:51     ` Marc Branchaud
  2014-10-15 21:33       ` Junio C Hamano
  2014-10-16 19:27     ` Junio C Hamano
  1 sibling, 1 reply; 12+ messages in thread
From: Marc Branchaud @ 2014-10-15 20:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On 14-10-15 01:29 PM, Junio C Hamano wrote:
> Marc Branchaud <marcnarc@xiplink.com> writes:
> 
>> I think things would be more understandable if the option was "--dissociate
>> <repository>" and was an explicit alternative to --reference:
>> 	[[--reference | --dissociate] <repository>]
>>
>> I'm still not liking the name "--dissociate" though.  The original suggestion
>> of "--borrow" is better.  Perhaps "--library" or "--local-cache"?  I dunno...
> 
> I was not thinking when I originally started the topic with
> "--borrow", until I realized that it would not make much sense,
> primarily because we allow multiple references.

I hadn't realized that was possible.  There's no indication in the man page
that multiple --references are allowed (or forbidden, for that matter).

> What should this command line do, and how would you implement such a
> behaviour?
> 
>     $ git clone \
>         --reference=/local/pool/linux.git \
>         --borrow=../my/neighbour/linux-hack.git \
>         git://git.kernel.org/...../linux.git
> 
> With "do the usual --reference thing, but then dissociate the result
> from referents" option, there is no ambiguity and that is why I did
> not go with the "--borrow" option suggested in the original thread.

I had not considered this case.  My limited imagination has a hard time
coming up with a scenario where more than one --reference (or
--borrow/--dissociate) would make sense.  In this example, the --borrow seems
useless.  How would clone decide that it even needed objects from the
neighbour repo?  None of the refs on gko need any of the neighbour's unique
objects.  (I get the feeling I don't understand how clone works...)

>> So now I'm wondering if the implementation would be more efficient as an
>> extension of the --local operation.  That is, instead of a post-clone repack,
>> do a --local clone first followed by a simple "git fetch" from the source repo.
> 
> The network overhead may be comparable to the "--reference"
> optimization, but if your "clone --local" ends up copying (instead
> of hard-linking), the initial cost to copy locally would be a pure
> extra price over "clone --reference and then --dissociate".  If the
> local clone uses hard-linking, it would be cheaper, but it still
> costs more than dropping an entry into .git/objects/info/alternates,
> I would imagine.  You will pay with your scheme the same cost to run
> "repack -a -d", which is paid by "--dissociate" at the end of clone,
> eventually at the first "gc", so there is no efficiency advantage,
> either.
> 
> The above is my knee-jerk assessment without any measuring, though.

That makes sense to me, at least.  I agree with your assessment.

		M.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] clone: --dissociate option to mark that reference is only temporary
  2014-10-15 20:51     ` Marc Branchaud
@ 2014-10-15 21:33       ` Junio C Hamano
  2014-10-15 21:44         ` Marc Branchaud
  0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2014-10-15 21:33 UTC (permalink / raw)
  To: Marc Branchaud; +Cc: git

Marc Branchaud <marcnarc@xiplink.com> writes:

> On 14-10-15 01:29 PM, Junio C Hamano wrote:
>
>>     $ git clone \
>>         --reference=/local/pool/linux.git \
>>         --borrow=../my/neighbour/linux-hack.git \
>>         git://git.kernel.org/...../linux.git
>> 
>> With "do the usual --reference thing, but then dissociate the result
>> from referents" option, there is no ambiguity and that is why I did
>> not go with the "--borrow" option suggested in the original thread.
>
> I had not considered this case.  My limited imagination has a hard time
> coming up with a scenario where more than one --reference (or
> In this example, the --borrow seems
> useless.  How would clone decide that it even needed objects from the
> neighbour repo?  None of the refs on gko need any of the neighbour's unique
> objects.

A probable scenario might go like this.

    The company-wide pool is designed for everybody's use and will
    stay, even if it lags behind because it fetches every other day,
    so it is safe to keep referring to via alternates.  My neighbour
    is following the linux-next repository and has changes that are
    meant to land "in the future" to the mainline, but it can
    disappear without notice so I cannot afford to depend on its
    presense forever.

Under that particular scenario, what should happen is fairly clear;
we want to dissociate from neibour's immediately after clone is
done, while being still dependent on the shared pool.  But there is
the question of "how would you implement such a behaviour" (even if
you know that is the single only behaviour you would want to see).

Also I am not confident enough that it is the only plausible way any
user may want to mix reference and borrow together.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] clone: --dissociate option to mark that reference is only temporary
  2014-10-15 19:44 ` Johannes Sixt
@ 2014-10-15 21:33   ` Junio C Hamano
  0 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2014-10-15 21:33 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git

Johannes Sixt <j6t@kdbg.org> writes:

> Unless you have a secret plan, you can do it even shorter with our
> helpers:

Thanks.  No there isn't a secret plan.  It was just me being rusty.

>
> diff --git a/builtin/clone.c b/builtin/clone.c
> index 8649090..81efb07 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -743,14 +743,9 @@ static void write_refspec_config(const char *src_ref_prefix,
>  
>  static void dissociate_from_references(void)
>  {
> -	struct child_process cmd;
> -
> -	memset(&cmd, 0, sizeof(cmd));
> -	argv_array_pushl(&cmd.args, "repack", "-a", "-d", NULL);
> -	cmd.git_cmd = 1;
> -	cmd.out = -1;
> -	cmd.no_stdin = 1;
> -	if (run_command(&cmd))
> +	static const char* argv[] = { "repack", "-a", "-d", NULL };
> +
> +	if (run_command_v_opt(argv, RUN_GIT_CMD|RUN_COMMAND_NO_STDIN))
>  		die(_("cannot repack to clean up"));
>  	if (unlink(git_path("objects/info/alternates")) && errno != ENOENT)
>  		die_errno(_("cannot unlink temporary alternates file"));

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] clone: --dissociate option to mark that reference is only temporary
  2014-10-15 21:33       ` Junio C Hamano
@ 2014-10-15 21:44         ` Marc Branchaud
  2014-10-15 21:50           ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Marc Branchaud @ 2014-10-15 21:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On 14-10-15 05:33 PM, Junio C Hamano wrote:
> Marc Branchaud <marcnarc@xiplink.com> writes:
> 
>> On 14-10-15 01:29 PM, Junio C Hamano wrote:
>>
>>>     $ git clone \
>>>         --reference=/local/pool/linux.git \
>>>         --borrow=../my/neighbour/linux-hack.git \
>>>         git://git.kernel.org/...../linux.git
>>>
>>> With "do the usual --reference thing, but then dissociate the result
>>> from referents" option, there is no ambiguity and that is why I did
>>> not go with the "--borrow" option suggested in the original thread.
>>
>> I had not considered this case.  My limited imagination has a hard time
>> coming up with a scenario where more than one --reference (or
>> In this example, the --borrow seems
>> useless.  How would clone decide that it even needed objects from the
>> neighbour repo?  None of the refs on gko need any of the neighbour's unique
>> objects.
> 
> A probable scenario might go like this.
> 
>     The company-wide pool is designed for everybody's use and will
>     stay, even if it lags behind because it fetches every other day,
>     so it is safe to keep referring to via alternates.  My neighbour
>     is following the linux-next repository and has changes that are
>     meant to land "in the future" to the mainline, but it can
>     disappear without notice so I cannot afford to depend on its
>     presense forever.
> 
> Under that particular scenario, what should happen is fairly clear;
> we want to dissociate from neibour's immediately after clone is
> done, while being still dependent on the shared pool.

Yes, but we're cloning gko, not the neighbour.  Doesn't that mean that the
clone operation won't know about any of the neighbour's refs?  In order to
get any of the neighbour's refs (and its unique objects) you have to either
clone the neighbour directly or (post-clone) fetch from it, no?

		M.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] clone: --dissociate option to mark that reference is only temporary
  2014-10-15 21:44         ` Marc Branchaud
@ 2014-10-15 21:50           ` Junio C Hamano
  2014-10-16 15:26             ` Marc Branchaud
  0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2014-10-15 21:50 UTC (permalink / raw)
  To: Marc Branchaud; +Cc: git

Marc Branchaud <marcnarc@xiplink.com> writes:

> Yes, but we're cloning gko, not the neighbour.  Doesn't that mean that the
> clone operation won't know about any of the neighbour's refs?

No.  --reference (and a natural implementation of --borrow, I would imagine)
peeks the refs of the repository we borrow from and that is how
clone can say "I already have objects reachable from these refs, so
please send me the remainder" to the repository it is cloning from.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] clone: --dissociate option to mark that reference is only temporary
  2014-10-15 21:50           ` Junio C Hamano
@ 2014-10-16 15:26             ` Marc Branchaud
  2014-10-17 12:47               ` Jakub Narębski
  0 siblings, 1 reply; 12+ messages in thread
From: Marc Branchaud @ 2014-10-16 15:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On 14-10-15 05:50 PM, Junio C Hamano wrote:
> Marc Branchaud <marcnarc@xiplink.com> writes:
> 
>> Yes, but we're cloning gko, not the neighbour.  Doesn't that mean that the
>> clone operation won't know about any of the neighbour's refs?
> 
> No.  --reference (and a natural implementation of --borrow, I would imagine)
> peeks the refs of the repository we borrow from and that is how
> clone can say "I already have objects reachable from these refs, so
> please send me the remainder" to the repository it is cloning from.

By "know about" I meant "want to use".  Sorry for being a bit dense about
this; let me try again.

(BTW, it occurs to me that your patch -- if I read it right -- doesn't
fulfill your scenario since it disassociates the clone from all repos,
regardless of whether they are specified with --reference or --borrow.  In
the following I assume a --borrow that only disassociates from the specified
repo and leaves the --reference repo(s) alone.)

Since we're cloning gko's refs, all of the neighbour's unique refs and
objects can be ignored.  Even though paths to the neighbour and the local
pool will be in the clone's alternates file, any refs the clone operation
creates won't need to use any objects from the neighbour.  The clone
operation gives us no way to refer to the neighbour's unique objects.

I just don't see what difference the --borrow option makes.  Consider the two
cases:

With just --reference=/local/pool/linux.git:
	1. Set up the alternates file with that path.
	2. Copy gko's refs into refs/remotes/origin/.
	3. Set up refs/heads/master to refer to gko's HEAD.
	4. Checkout refs/heads/master (uses objects from local pool).

With both that --reference and --borrow=../my/neighbour/linux-hack.git:
	1. Set up the alternates file with both paths.
	2. Copy gko's refs into refs/remotes/origin/.
	3. Set up refs/heads/master to refer to gko's HEAD.
	4. Checkout refs/heads/master (uses objects from local pool).
	5. Disassociate ourselves from the neighbour repo.

In both cases the first four actions have no need of the neighbour repo.  The
second case's fifth action surgically removes the neighbour as an alternate
object store, and we're left with the same clone we got in the first case.
What was the point?

It seems that in order to make something like --borrow useful, "git clone"
would somehow need to know which of the neighbour's refs you want to *also*
clone, then copy any unique objects from the neighbour before disassociating
from it.

		M.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] clone: --dissociate option to mark that reference is only temporary
  2014-10-15 17:29   ` Junio C Hamano
  2014-10-15 20:51     ` Marc Branchaud
@ 2014-10-16 19:27     ` Junio C Hamano
  1 sibling, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2014-10-16 19:27 UTC (permalink / raw)
  To: Marc Branchaud; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> Marc Branchaud <marcnarc@xiplink.com> writes:
>
>> I think things would be more understandable if the option was "--dissociate
>> <repository>" and was an explicit alternative to --reference:
>> 	[[--reference | --dissociate] <repository>]
>>
>> I'm still not liking the name "--dissociate" though.  The original suggestion
>> of "--borrow" is better.  Perhaps "--library" or "--local-cache"?  I dunno...
>
> I was not thinking when I originally started the topic with
> "--borrow", until I realized that it would not make much sense,
> primarily because we allow multiple references.
>
> What should this command line do, and how would you implement such a
> behaviour?
>
>     $ git clone \
>         --reference=/local/pool/linux.git \
>         --borrow=../my/neighbour/linux-hack.git \
>         git://git.kernel.org/...../linux.git
>
> With "do the usual --reference thing, but then dissociate the result
> from referents" option, there is no ambiguity and that is why I did
> not go with the "--borrow" option suggested in the original thread.

Another approach we could take is to add --borrow and then forbid
mixing --reference and --borrow on the same command line, until
somebody comes up with an implementation to allow us dissociate from
borrowed repositories while still depending on referenced ones, at
which time we can lift that limitation.

But if that future extension is not going to happen, there is not
much difference.  The end result will be either

 - the one that is very clear to the users that they cannot
   selectively dissociate because there is no such option documented
   (i.e. --reference, --dissociate and no --borrow); and

 - the other one that gives a slight hope to the users that the
   combination might work (i.e. with --reference, --borrow and no
   --dissociate) but refuses to do so when it actually is run.

Between the two, the former might actually be easier to the users to
understand, as it keeps their expectation in line with the reality.

So I dunno.

I certainly am *not* going to do the selective dissociation change
myself.  Do we have a volunteer (hint: this probably does not fall
into "low-hanging fruit" category)?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] clone: --dissociate option to mark that reference is only temporary
  2014-10-16 15:26             ` Marc Branchaud
@ 2014-10-17 12:47               ` Jakub Narębski
  0 siblings, 0 replies; 12+ messages in thread
From: Jakub Narębski @ 2014-10-17 12:47 UTC (permalink / raw)
  To: Marc Branchaud, Junio C Hamano; +Cc: git

On 2014-10-16 17:26, Marc Branchaud wrote:

> I just don't see what difference the --borrow option makes.  Consider the two
> cases:
>
> With just --reference=/local/pool/linux.git:
> 	1. Set up the alternates file with that path.

         x. Fetch object from origin not present in pool

> 	2. Copy gko's refs into refs/remotes/origin/.
> 	3. Set up refs/heads/master to refer to gko's HEAD.
> 	4. Checkout refs/heads/master (uses objects from local pool).
>
> With both that --reference and --borrow=../my/neighbour/linux-hack.git:
> 	1. Set up the alternates file with both paths.

         x. Fetch objects from origin not present in either pool
            or neighbour repo ("have" pool and neighbour)

> 	2. Copy gko's refs into refs/remotes/origin/.
> 	3. Set up refs/heads/master to refer to gko's HEAD.
> 	4. Checkout refs/heads/master (uses objects from local pool).
> 	5. Disassociate ourselves from the neighbour repo.
            which means roughly:

            5.1. Remove neighbour repo from alternates
            5.2. Fetch required objects from neighbour repo
                 ("want" neighbour, have ???)

It is possible that because of technical limitations --reference and 
--borrow / dissociate / --temporary-reference / --object-cache are to be
mutually exclusive.

> In both cases the first four actions have no need of the neighbour repo.  The
> second case's fifth action surgically removes the neighbour as an alternate
> object store, and we're left with the same clone we got in the first case.
> What was the point?

You are missing fetching object from your list of actions.

> It seems that in order to make something like --borrow useful, "git clone"
> would somehow need to know which of the neighbour's refs you want to *also*
> clone, then copy any unique objects from the neighbour before disassociating
> from it.
>
> 		M.

-- 
Jakub Narębski

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2014-10-17 12:47 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-10-14 19:57 [PATCH] clone: --dissociate option to mark that reference is only temporary Junio C Hamano
2014-10-15 14:34 ` Marc Branchaud
2014-10-15 17:29   ` Junio C Hamano
2014-10-15 20:51     ` Marc Branchaud
2014-10-15 21:33       ` Junio C Hamano
2014-10-15 21:44         ` Marc Branchaud
2014-10-15 21:50           ` Junio C Hamano
2014-10-16 15:26             ` Marc Branchaud
2014-10-17 12:47               ` Jakub Narębski
2014-10-16 19:27     ` Junio C Hamano
2014-10-15 19:44 ` Johannes Sixt
2014-10-15 21:33   ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).