git.vger.kernel.org archive mirror
* git clone downloads objects that are in GIT_OBJECT_DIRECTORY
@ 2006-03-06  1:08 Benjamin LaHaise
From: Benjamin LaHaise @ 2006-03-06  1:08 UTC
  To: git

Hi folks,

Doing a fresh git clone git://some.git.url/ foo seems to download the 
entire remote repository even if all the objects are already stored in 
GIT_OBJECT_DIRECTORY=/home/bcrl/.git/object .  Is this a known bug?  
At 100MB for a kernel, this takes a *long* time.

		-ben (who needed to free up disk space)
-- 
"Time is of no importance, Mr. President, only life is important."
Don't Email: <dont@kvack.org>.

* Re: git clone downloads objects that are in GIT_OBJECT_DIRECTORY
@ 2006-03-06  1:42 ` Shawn Pearce
From: Shawn Pearce @ 2006-03-06  1:42 UTC
  To: git

Benjamin LaHaise <bcrl@kvack.org> wrote:
> Hi folks,
> 
> Doing a fresh git clone git://some.git.url/ foo seems to download the 
> entire remote repository even if all the objects are already stored in 
> GIT_OBJECT_DIRECTORY=/home/bcrl/.git/object .  Is this a known bug?  
> At 100MB for a kernel, this takes a *long* time.

I believe it is a known missing feature.  :-) git-clone doesn't
prep HEAD to have some sort of starting point so the pull it uses
to download everything literally downloads everything as nothing
is in common.

One could work around it by running git-init-db to create the new
clone locally, using git-update-ref to point HEAD at some commit you
have in common with the remote, creating an origin file, and then
running git-pull.  This would only download the objects between the
commit you put into HEAD and the current master of the remote...  But
that is a fair amount of manual work.
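
A rough sketch of that workaround as a shell function follows. It
assumes GIT_OBJECT_DIRECTORY is exported and already holds the chosen
commit, that the commit is an ancestor of the remote's master, and
that the remote is described with the old `.git/remotes/origin`
format (`URL:` and `Pull:` lines).  The function name and argument
order are hypothetical.  The `git checkout -f` step is not in the
description above; it is an assumed addition so that the pull's merge
starts from a populated index instead of an empty one.

```shell
# Hypothetical helper: clone_from_common <url> <dir> <commit>
# Assumes GIT_OBJECT_DIRECTORY is exported and already contains <commit>,
# and that <commit> is an ancestor of the remote's master branch.
# The body runs in a subshell so the "cd" does not leak to the caller.
clone_from_common() (
	url=$1 dir=$2 commit=$3
	: "${GIT_OBJECT_DIRECTORY:?must point at the shared object store}"
	mkdir -p "$dir" && cd "$dir" &&
	git init-db &&                    # new repo; objects go to $GIT_OBJECT_DIRECTORY
	git update-ref HEAD "$commit" &&  # common starting point for the pull
	git checkout -f &&                # (assumed step) fill index and work tree from HEAD
	mkdir -p .git/remotes &&
	printf 'URL: %s\nPull: master:origin\n' "$url" >.git/remotes/origin &&
	git pull                          # should fetch only objects not reachable from $commit
)
```

Called as, say, `clone_from_common git://some.git.url/ foo <commit>`,
where `<commit>` is a commit already present in the shared store.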

I think Cogito's clone is capable of restarting a failed clone; I
wonder if that logic would benefit you here?

Is using a common GIT_OBJECT_DIRECTORY across many clones actually
pretty common?  Maybe it's time git-clone got some more smarts with
regard to what it yanks from the origin.

-- 
Shawn.

* Re: git clone downloads objects that are in GIT_OBJECT_DIRECTORY
@ 2006-03-06  2:34   ` Junio C Hamano
From: Junio C Hamano @ 2006-03-06  2:34 UTC
  To: Shawn Pearce; +Cc: git

Shawn Pearce <spearce@spearce.org> writes:

> Benjamin LaHaise <bcrl@kvack.org> wrote:
>> Hi folks,
>> 
>> Doing a fresh git clone git://some.git.url/ foo seems to download the 
>> entire remote repository even if all the objects are already stored in 
>> GIT_OBJECT_DIRECTORY=/home/bcrl/.git/object .  Is this a known bug?  
>> At 100MB for a kernel, this takes a *long* time.
>
> I believe it is a known missing feature.  :-) git-clone doesn't
> prep HEAD to have some sort of starting point so the pull it uses
> to download everything literally downloads everything as nothing
> is in common.

You would first 'clone -l -s' from your local repository and
then clone into that from whatever remote, perhaps.

* Re: git clone downloads objects that are in GIT_OBJECT_DIRECTORY
@ 2006-03-06  2:57     ` Shawn Pearce
From: Shawn Pearce @ 2006-03-06  2:57 UTC
  To: Junio C Hamano; +Cc: git

Junio C Hamano <junkio@cox.net> wrote:
> Shawn Pearce <spearce@spearce.org> writes:
> 
> > Benjamin LaHaise <bcrl@kvack.org> wrote:
> >> Hi folks,
> >> 
> >> Doing a fresh git clone git://some.git.url/ foo seems to download the 
> >> entire remote repository even if all the objects are already stored in 
> >> GIT_OBJECT_DIRECTORY=/home/bcrl/.git/object .  Is this a known bug?  
> >> At 100MB for a kernel, this takes a *long* time.
> >
> > I believe it is a known missing feature.  :-) git-clone doesn't
> > prep HEAD to have some sort of starting point so the pull it uses
> > to download everything literally downloads everything as nothing
> > is in common.
> 
> You would first 'clone -l -s' from your local repository and
> then clone into that from whatever remote, perhaps.

Yeah, but that's about as much fun as creating a bare repository
by hand.  (Which I had been doing until this thread prompted me
to read git-clone.sh and learn of the existence of --bare.)

It might be nicer if the user could place a list of locally (here
locally being possibly remote but closer network-wise) available
repositories which should be considered as sources for faster
cloning.  When cloning a remote repository git-clone would try to
examine each of the designated repositories to see if any of them
have commits in common with the remote; if so clone off that and
then pull from the remote, but designating the remote as `origin'.

This could be ugly if you have a large number of locally available
candidates or if the candidates are many (e.g. thousands of) commits
behind the remote being cloned.  But it would save the user from
pulling down 100+MB of objects they already have while making it
very convenient to establish a new repository and working directory
based on someone else's publicly available repository.


Or we could just tell the user to create their own clone script,
e.g. kernel-clone:

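	#!/bin/sh
	# usage: kernel-clone <remote-url> <directory>
	# Clone from the local ~/kernel-base, sharing its object store (-s)
	# and skipping the checkout (-n), then point origin at the remote
	# and pull only the objects ~/kernel-base does not already have.
	[ $# -eq 2 ] || { echo "usage: kernel-clone <remote-url> <directory>" >&2; exit 1; }
	git-clone -l -n -s ~/kernel-base "$2" &&
	cd "$2" &&
	echo "URL: $1" >.git/remotes/origin &&
	echo "Pull: master:origin" >>.git/remotes/origin &&
	git-pull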


But it would be better if it was more integrated, and somehow
slightly more automatic...

-- 
Shawn.

* Re: git clone downloads objects that are in GIT_OBJECT_DIRECTORY
@ 2006-03-06  3:31         ` sean
From: sean @ 2006-03-06  3:31 UTC
  To: Shawn Pearce; +Cc: junkio, git

On Sun, 5 Mar 2006 21:57:02 -0500
Shawn Pearce <spearce@spearce.org> wrote:

> It might be nicer if the user could place a list of locally (here
> locally being possibly remote but closer network-wise) available
> repositories which should be considered as sources for faster
> cloning.  When cloning a remote repository git-clone would try to
> examine each of the designated repositories to see if any of them
> have commits in common with the remote; if so clone off that and
> then pull from the remote, but designating the remote as `origin'.

It is already easy to start from a similar repo (e.g. a local clone)
if you wish to conserve bandwidth.

However, it might be nice to have a command that allows you to 
change origin information for a repo without needing to know git
internals; maybe something like:

$ git set-origin <URL>

Or maybe better:

$ git set-remote --pull master:origin origin <URL>
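
Both commands above are proposals.  As a rough illustration only, the
second form could be approximated by a shell function that rewrites
the `.git/remotes/<name>` file: one `URL:` line plus an optional
`Pull:` refspec line, the same format the kernel-clone script in this
thread writes.  The function name `git_set_remote` and its argument
handling are hypothetical, and it assumes it is run from the top of
the working tree.

```shell
# Hypothetical: git_set_remote [--pull <refspec>] <name> <url>
# Rewrites .git/remotes/<name>; run it from the top of the working tree.
git_set_remote() {
	pull=
	if [ "$1" = --pull ] && [ $# -ge 2 ]; then
		pull=$2
		shift 2
	fi
	if [ $# -ne 2 ]; then
		echo "usage: git_set_remote [--pull <refspec>] <name> <url>" >&2
		return 1
	fi
	mkdir -p .git/remotes || return 1
	# One "URL:" line, plus an optional "Pull:" refspec line.
	{
		echo "URL: $2"
		if [ -n "$pull" ]; then
			echo "Pull: $pull"
		fi
	} >".git/remotes/$1"
}
```

With this helper, the first proposal would correspond to
`git_set_remote --pull master:origin origin <URL>`.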

Sean

* Re: git clone downloads objects that are in GIT_OBJECT_DIRECTORY
@ 2006-03-06  9:20           ` Johannes Schindelin
From: Johannes Schindelin @ 2006-03-06  9:20 UTC
  To: sean; +Cc: Shawn Pearce, junkio, git

Hi,

On Sun, 5 Mar 2006, sean wrote:

> However, it might be nice to have a command that allows you to 
> change origin information for a repo without needing to know git
> internals; maybe something like:
> 
> $ git set-origin <URL>
> 
> Or maybe better:
> 
> $ git set-remote --pull master:origin origin <URL>

FWIW, I once sent patches to make this easier by placing this information 
into the config file, but for reasons I did not understand, they were 
rejected. Sigh!

Ciao,
Dscho
