Tracking a repository for content instead of history


* Tracking a repository for content instead of history
@ 2006-12-12 12:35 Andy Parkins
  2006-12-12 13:04 ` Jakub Narebski
  0 siblings, 1 reply; 10+ messages in thread
From: Andy Parkins @ 2006-12-12 12:35 UTC (permalink / raw)
  To: git

Hello,

For interest's sake I'd like to track the kernel.org linux repository.  
However, I'm not that bothered about tracking the history - it's more that I 
like to have the latest kernel release lying around.

Is there a way that I could just pull individual commits from a git 
repository?  In particular - could I make a repository (obviously not a 
clone, because it wouldn't have all the history) that contained only the 
tagged commits from an upstream repository?

Is it even sensible to want that?  It strikes me that it's possible that there 
isn't that much space/bandwidth saving to be made.  Should I just clone the 
repository and shut up?  :-)

Andy
-- 
Dr Andy Parkins, M Eng (hons), MIEE


* Re: Tracking a repository for content instead of history
  2006-12-12 12:35 Tracking a repository for content instead of history Andy Parkins
@ 2006-12-12 13:04 ` Jakub Narebski
  2006-12-12 13:26   ` Andy Parkins
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Narebski @ 2006-12-12 13:04 UTC (permalink / raw)
  To: git

Andy Parkins wrote:

> For interest's sake I'd like to track the kernel.org linux repository.  
> However, I'm not that bothered about tracking the history - it's more that I 
> like to have the latest kernel release lying around.
> 
> Is there a way that I could just pull individual commits from a git 
> repository?  In particular - could I make a repository (obviously not a 
> clone, because it wouldn't have all the history) that contained only the 
> tagged commits from an upstream repository?

As a beta feature (in 'next') you can do a 'shallow clone', i.e. clone/fetch
history only N commits deep.
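
A minimal sketch of what a shallow clone looks like, assuming a git new
enough to have --depth, `git -C` and `rev-list --count`. The upstream here
is a throwaway local repository standing in for the real remote; the
paths, identity and file names are made up for illustration.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)

# Throwaway "upstream" with three commits (stand-in for a real remote).
git init -q "$work/upstream"
git -C "$work/upstream" config user.email you@example.com
git -C "$work/upstream" config user.name "Example"
for i in 1 2 3; do
    echo "release $i" > "$work/upstream/VERSION"
    git -C "$work/upstream" add VERSION
    git -C "$work/upstream" commit -q -m "release $i"
done

# A file:// URL makes git use the normal transport, so --depth is
# honoured even though the source is on the local disk.
git clone -q --depth 1 "file://$work/upstream" "$work/shallow"

# Only the newest commit is present in the shallow clone.
shallow_count=$(git -C "$work/shallow" rev-list --count HEAD)
echo "commits in shallow clone: $shallow_count"
```

The working tree is complete; only the history behind the newest commit
is missing.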
 
> Is it even sensible to want that?  It strikes me that it's possible that there 
> isn't that much space/bandwidth saving to be made.  Should I just clone the 
> repository and shut up?  :-)

I've had a similar idea: search for the "sparse clone" keyword. But there is no code.

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git



* Re: Tracking a repository for content instead of history
  2006-12-12 13:04 ` Jakub Narebski
@ 2006-12-12 13:26   ` Andy Parkins
  2006-12-12 14:28     ` Johannes Schindelin
  0 siblings, 1 reply; 10+ messages in thread
From: Andy Parkins @ 2006-12-12 13:26 UTC (permalink / raw)
  To: git

On Tuesday 2006 December 12 13:04, Jakub Narebski wrote:

> > Is it even sensible to want that?  It strikes me that it's possible that
> > there isn't that much space/bandwidth saving to be made.  Should I just
> > clone the repository and shut up?  :-)
>
> I've had a similar idea: search for the "sparse clone" keyword. But there is no code.

While the functionality might not be built into git in terms of clone, would 
there be a way to pull a particular commit from another repository? 

The way I would do it given nothing else is to simply extract snapshots into a 
working directory; and create a repository from scratch.  I was just 
wondering if a method existed that could reduce the size of the download.

I think the best way is going to be to use the patches published at kernel.org 
and apply them one at a time with git-apply.

Andy
-- 
Dr Andy Parkins, M Eng (hons), MIEE


* Re: Tracking a repository for content instead of history
  2006-12-12 13:26   ` Andy Parkins
@ 2006-12-12 14:28     ` Johannes Schindelin
  2006-12-12 15:38       ` Andy Parkins
  0 siblings, 1 reply; 10+ messages in thread
From: Johannes Schindelin @ 2006-12-12 14:28 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git

Hi,

On Tue, 12 Dec 2006, Andy Parkins wrote:

> The way I would do it given nothing else is to simply extract snapshots 
> into a working directory; and create a repository from scratch.  I was 
> just wondering if a method existed that could reduce the size of the 
> download.

You are not by any chance talking about the --remote option to 
git-archive?
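
For reference, a sketch of how the --remote option can be used to fetch
just one tagged snapshot, with no history at all. It assumes the other
side allows git-upload-archive; the "remote" below is a local throwaway
repository, and the tag name and the linux-v1.0.0/ prefix are made up
for illustration.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)

# Throwaway repository with one tagged release (stand-in for a real remote).
git init -q "$work/upstream"
git -C "$work/upstream" config user.email you@example.com
git -C "$work/upstream" config user.name "Example"
echo "v1.0.0" > "$work/upstream/VERSION"
git -C "$work/upstream" add VERSION
git -C "$work/upstream" commit -q -m "release v1.0.0"
git -C "$work/upstream" tag v1.0.0

# Ask the other side for a tar stream of the tagged tree and unpack it.
mkdir "$work/snapshot"
git archive --remote="$work/upstream" --prefix=linux-v1.0.0/ v1.0.0 |
    tar -x -C "$work/snapshot"

snapshot_version=$(cat "$work/snapshot/linux-v1.0.0/VERSION")
echo "unpacked snapshot of: $snapshot_version"
```

What arrives is a plain directory tree, not a repository.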

If you want to reduce the number of objects to be downloaded, by telling 
the other side what you have, you literally end up with something like 
shallow clone: the other side _has_ to support it.

Ciao,
Dscho


* Re: Tracking a repository for content instead of history
  2006-12-12 14:28     ` Johannes Schindelin
@ 2006-12-12 15:38       ` Andy Parkins
  2006-12-12 16:24         ` Johannes Schindelin
  2006-12-12 21:46         ` Nguyen Thai Ngoc Duy
  0 siblings, 2 replies; 10+ messages in thread
From: Andy Parkins @ 2006-12-12 15:38 UTC (permalink / raw)
  To: git

On Tuesday 2006 December 12 14:28, Johannes Schindelin wrote:

> You are not by any chance talking about the --remote option to
> git-archive?

I wasn't, but that's certainly a helpful switch; it's a huge help.

> If you want to reduce the number of objects to be downloaded, by telling
> the other side what you have, you literally end up with something like
> shallow clone: the other side _has_ to support it.

I suppose so; but I was thinking more an automated way of getting the data 
that is supplied for the kernel anyway.  So:

base-v1.0.0.tar.gz
patch-v1.0.1.gz
patch-v1.0.2.gz
etc

Each patch is obviously smaller than "base".  Git could easily make the 
patches, and each of those patches could be fed by hand into a repository 
with git-apply.  It doesn't seem like something that would require support on 
the other side, because it isn't so much a shallow clone (which /would/ 
preserve history, making it available if wanted); it is pulling just, say, 
tagged commits out of an existing repository.

Given a list of tags it is almost:

git-archive <get me base>
ssh remote git-diff v1.0.0..v1.0.1 | git-apply; git commit
ssh remote git-diff v1.0.1..v1.0.2 | git-apply; git commit

If that makes sense?  Obviously though it would be possible to use git rather 
than ssh to do this.
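
Spelled out, the idea above might look like the following sketch. It
runs everything against a local throwaway "upstream" instead of over
ssh, so the `git -C "$upstream" ...` calls stand in for
`ssh remote git-diff ...`; the tag names, file names and commit
messages are made up for illustration.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
upstream="$work/upstream"
tracker="$work/tracker"

# Stand-in upstream with three tagged releases.
git init -q "$upstream"
git -C "$upstream" config user.email you@example.com
git -C "$upstream" config user.name "Example"
for v in v1.0.0 v1.0.1 v1.0.2; do
    echo "$v" > "$upstream/VERSION"
    echo "notes for $v" >> "$upstream/CHANGES"
    git -C "$upstream" add VERSION CHANGES
    git -C "$upstream" commit -q -m "work leading up to $v"
    git -C "$upstream" tag "$v"
done

# Base: unpack a snapshot of the first tag into a fresh repository.
git init -q "$tracker"
git -C "$tracker" config user.email you@example.com
git -C "$tracker" config user.name "Example"
git -C "$upstream" archive v1.0.0 | tar -x -C "$tracker"
git -C "$tracker" add -A
git -C "$tracker" commit -q -m "import v1.0.0"

# Then one diff, and one local commit, per later tag.
prev=v1.0.0
for next in v1.0.1 v1.0.2; do
    git -C "$upstream" diff --binary "$prev" "$next" |
        git -C "$tracker" apply --index
    git -C "$tracker" commit -q -m "import $next"
    prev=$next
done

tracker_tree=$(git -C "$tracker" rev-parse 'HEAD^{tree}')
upstream_tree=$(git -C "$upstream" rev-parse 'v1.0.2^{tree}')
echo "tracker tree:  $tracker_tree"
echo "upstream tree: $upstream_tree"
```

The content ends up identical to the upstream tag, but the commits in
the tracking repository are new ones with no relation to upstream
history.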

However... please don't waste any more time thinking about this; it's not a 
problem I have that needs a solution - it was more a "because I'm curious" 
sort of question.

Andy
-- 
Dr Andy Parkins, M Eng (hons), MIEE


* Re: Tracking a repository for content instead of history
  2006-12-12 15:38       ` Andy Parkins
@ 2006-12-12 16:24         ` Johannes Schindelin
  2006-12-12 16:35           ` Johannes Schindelin
  2006-12-12 21:46         ` Nguyen Thai Ngoc Duy
  1 sibling, 1 reply; 10+ messages in thread
From: Johannes Schindelin @ 2006-12-12 16:24 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git

Hi,

On Tue, 12 Dec 2006, Andy Parkins wrote:

> On Tuesday 2006 December 12 14:28, Johannes Schindelin wrote:
> 
> > You are not by any chance talking about the --remote option to
> > git-archive?
> 
> I wasn't; but that's certainly a helpful switch.  It's certainly a huge 
> help.
> 
> > If you want to reduce the number of objects to be downloaded, by telling
> > the other side what you have, you literally end up with something like
> > shallow clone: the other side _has_ to support it.
> 
> I suppose so; but I was thinking more an automated way of getting the data 
> that is supplied for the kernel anyway.  So:
> 
> base-v1.0.0.tar.gz
> patch-v1.0.1.gz
> patch-v1.0.2.gz
> etc
> 
> Each patch is obviously smaller than "base".  Git could easily make the 
> patches, and each of those patches could be fed by hand into a repository 
> with git-apply.

If it weren't for the recent discussion of kernel.org being overloaded 
with gitweb processes, I'd just write down a hint like 
http://repo.or.cz/w/git/jnareb-git.git?a=commitdiff_plain;h=next;hp=master

But since kernel.org is overloaded, I will not do that.

Ciao,


* Re: Tracking a repository for content instead of history
  2006-12-12 16:24         ` Johannes Schindelin
@ 2006-12-12 16:35           ` Johannes Schindelin
  0 siblings, 0 replies; 10+ messages in thread
From: Johannes Schindelin @ 2006-12-12 16:35 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git

Hi,

On Tue, 12 Dec 2006, Johannes Schindelin wrote:

> If it weren't for the recent discussion of kernel.org being overloaded 
> with gitweb processes, I'd just write down a hint like 
> [URL edited out]
> 
> But since kernel.org is overloaded, I will not do that.

Side note: it would probably not help you. The diff is uncompressed, and 
thus likely _substantially larger_ than getting the snapshot via gitweb, 
which _is_ compressed.

Ciao,
Dscho


* Re: Tracking a repository for content instead of history
  2006-12-12 15:38       ` Andy Parkins
  2006-12-12 16:24         ` Johannes Schindelin
@ 2006-12-12 21:46         ` Nguyen Thai Ngoc Duy
  2006-12-12 21:48           ` Nguyen Thai Ngoc Duy
  1 sibling, 1 reply; 10+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2006-12-12 21:46 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git

On 12/12/06, Andy Parkins <andyparkins@gmail.com> wrote:
> I suppose so; but I was thinking more an automated way of getting the data
> that is supplied for the kernel anyway.  So:
>
> base-v1.0.0.tar.gz
> patch-v1.0.1.gz
> patch-v1.0.2.gz
> etc
>
> Each patch is obviously smaller than "base".  Git could easily make the
> patches, and each of those patches could be fed by hand into a repository
> with git-apply.  It doesn't seem like something that would require support on
> the other side, because it isn't so much a shallow clone (which /would/
> preserve history, making it available if wanted); it is pulling just, say,
> tagged commits out of an existing repository.
>
> Given a list of tags it is almost:
>
> git-archive <get me base>
> ssh remote git-diff v1.0.0..v1.0.1 | git-apply; git commit
> ssh remote git-diff v1.0.1..v1.0.2 | git-apply; git commit
>
> If that makes sense?  Obviously though it would be possible to use git rather
> than ssh to do this.

Hm.. I'm no git:// expert. But would it be possible to do the following?
1. git-archive <base>
2. reconstruct commit, blobs and trees from the archive
3. tell git server that you have one commit, you need another commit
(maybe heads only, I'm not sure here)
4. get the pack from git server, create new commit and a diff
-- 


* Re: Tracking a repository for content instead of history
  2006-12-12 21:46         ` Nguyen Thai Ngoc Duy
@ 2006-12-12 21:48           ` Nguyen Thai Ngoc Duy
  2006-12-12 22:25             ` Johannes Schindelin
  0 siblings, 1 reply; 10+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2006-12-12 21:48 UTC (permalink / raw)
  To: git

On 12/13/06, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
> Hm.. I'm no git:// expert. But would it be possible to do the following?
> 1. git-archive <base>
> 2. reconstruct commit, blobs and trees from the archive
> 3. tell git server that you have one commit, you need another commit
> (maybe heads only, I'm not sure here)
> 4. get the pack from git server, create new commit and a diff

Ok. Stupid idea. The pack may be based on objects that I don't have.

-- 


* Re: Tracking a repository for content instead of history
  2006-12-12 21:48           ` Nguyen Thai Ngoc Duy
@ 2006-12-12 22:25             ` Johannes Schindelin
  0 siblings, 0 replies; 10+ messages in thread
From: Johannes Schindelin @ 2006-12-12 22:25 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: git

Hi,

On Wed, 13 Dec 2006, Nguyen Thai Ngoc Duy wrote:

> On 12/13/06, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
> > Hm.. I'm no git:// expert. But would it be possible to do the following?
> > 1. git-archive <base>
> > 2. reconstruct commit, blobs and trees from the archive
> > 3. tell git server that you have one commit, you need another commit
> > (maybe heads only, I'm not sure here)
> > 4. get the pack from git server, create new commit and a diff
> 
> Ok. Stupid idea. The pack may be based on objects that I don't have.

The only not-so-brilliant part is reconstructing the commit from the 
archive. That is not possible: some of the author and committer metadata 
cannot be reconstructed, and worse, neither can the parents' hashes. 
Since all of these are hashed to get the commit hash, you lose.
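
To illustrate why the commit cannot be rebuilt from content alone, here
is a small sketch using git-commit-tree. The repository, dates and
message are made up; the point is only that the same tree gives a
different commit hash as soon as the metadata or the parent differs.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git config user.email you@example.com
git config user.name "Example"

# One tree object, built from a single file.
echo "hello" > file
git add file
tree=$(git write-tree)

# Helper: create a commit for $tree with a fixed date ($1) and
# optional extra commit-tree arguments (e.g. -p <parent>).
mkcommit() {
    d=$1; shift
    GIT_AUTHOR_DATE="$d" GIT_COMMITTER_DATE="$d" \
        git commit-tree -m "snapshot" "$@" "$tree"
}

c1=$(mkcommit "2006-12-12 12:00:00 +0000")
c1_again=$(mkcommit "2006-12-12 12:00:00 +0000")      # identical inputs
c2=$(mkcommit "2006-12-13 12:00:00 +0000")            # different date
c3=$(mkcommit "2006-12-12 12:00:00 +0000" -p "$c1")   # different parent

echo "same inputs:      $c1 $c1_again"
echo "different date:   $c2"
echo "different parent: $c3"
```

All four commits point at the same tree; only the first two share a
hash.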

However, it could work like this:

- reconstruct the tree from the archive
- ask for a diff of a certain commit with respect to your tree
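
The first step can be sketched as follows, assuming the archive is a
faithful copy of the tree (no export-ignore or export-subst attributes,
no line-ending conversion). The upstream repository, tag and file names
are made up for illustration.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)

# Stand-in upstream with one tagged release.
git init -q "$work/upstream"
git -C "$work/upstream" config user.email you@example.com
git -C "$work/upstream" config user.name "Example"
mkdir "$work/upstream/src"
echo 'int main(void) { return 0; }' > "$work/upstream/src/main.c"
echo "v1.0.0" > "$work/upstream/VERSION"
git -C "$work/upstream" add -A
git -C "$work/upstream" commit -q -m "release v1.0.0"
git -C "$work/upstream" tag v1.0.0
want=$(git -C "$work/upstream" rev-parse 'v1.0.0^{tree}')

# Rebuild the tree object from nothing but the archive contents.
git init -q "$work/rebuilt"
git -C "$work/upstream" archive v1.0.0 | tar -x -C "$work/rebuilt"
git -C "$work/rebuilt" add -A
got=$(git -C "$work/rebuilt" write-tree)

echo "upstream tree: $want"
echo "rebuilt tree:  $got"
```

The tree hash depends only on file names, modes and contents, which is
why it survives the round trip through tar while the commit hash does
not.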

It might even be easy to convince git-upload-pack to construct a thin pack 
containing deltas _only_ against objects which are reachable from your 
tree.

Note: this is feasible, but not necessarily sensible:

- it puts more strain on the server, which otherwise could probably reuse 
a lot of deltas, and
- it contradicts the idea of _distributed_ development (for example, you 
could not tell which HEAD commit is newer when you fetched from two 
repos).

Probably, you could add a third argument: merges are not necessarily 
_possible_ with that setup. Note that this argument applies to shallow 
clones, too!

Ciao,
Dscho
