* Tracking a repository for content instead of history
@ 2006-12-12 12:35 Andy Parkins
2006-12-12 13:04 ` Jakub Narebski
0 siblings, 1 reply; 10+ messages in thread
From: Andy Parkins @ 2006-12-12 12:35 UTC
To: git
Hello,
For interest's sake I'd like to track the kernel.org Linux repository.
However, I'm not that bothered about tracking the history - it's more that I
like to have the latest kernel release lying around.
Is there a way that I could just pull individual commits from a git
repository? In particular - could I make a repository (obviously not a
clone, because it wouldn't have all the history) that contained only the
tagged commits from an upstream repository?
Is it even sensible to want that? It strikes me that it's possible that there
isn't that much space/bandwidth saving to be made. Should I just clone the
repository and shut up? :-)
Andy
--
Dr Andy Parkins, M Eng (hons), MIEE
* Re: Tracking a repository for content instead of history
2006-12-12 12:35 Tracking a repository for content instead of history Andy Parkins
@ 2006-12-12 13:04 ` Jakub Narebski
2006-12-12 13:26 ` Andy Parkins
0 siblings, 1 reply; 10+ messages in thread
From: Jakub Narebski @ 2006-12-12 13:04 UTC
To: git
Andy Parkins wrote:
> For interest's sake I'd like to track the kernel.org Linux repository.
> However, I'm not that bothered about tracking the history - it's more that I
> like to have the latest kernel release lying around.
>
> Is there a way that I could just pull individual commits from a git
> repository? In particular - could I make a repository (obviously not a
> clone, because it wouldn't have all the history) that contained only the
> tagged commits from an upstream repository?
There is now a 'shallow clone' in beta (in 'next'), i.e. you can clone/fetch
history only N commits deep.
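To make "only N commits deep" concrete, here is a toy model of which commits a depth-limited fetch would bring over: start at the requested tip and follow parent links for at most N generations. It is a sketch, not git's implementation, and the commit graph is hypothetical (in released git the feature surfaced as `git clone --depth <N>`).

```python
from collections import deque

# Hypothetical toy history: each commit id maps to its parent ids.
PARENTS = {
    "v1.0.2": ["c5"],
    "c5": ["v1.0.1"],
    "v1.0.1": ["c3"],
    "c3": ["v1.0.0"],
    "v1.0.0": [],
}


def shallow_commits(tip, depth, parents):
    """Return the commits within `depth` generations of `tip` (tip is generation 1)."""
    seen = {tip}
    queue = deque([(tip, 1)])
    while queue:
        commit, gen = queue.popleft()
        if gen == depth:
            continue  # commits at the depth limit form the shallow boundary
        for parent in parents[commit]:
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, gen + 1))
    return seen


print(sorted(shallow_commits("v1.0.2", 2, PARENTS)))  # ['c5', 'v1.0.2']
```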
> Is it even sensible to want that? It strikes me that it's possible that there
> isn't that much space/bandwidth saving to be made. Should I just clone the
> repository and shut up? :-)
I've had a similar idea: search for the "sparse clone" keyword. But there's no code.
--
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
* Re: Tracking a repository for content instead of history
2006-12-12 13:04 ` Jakub Narebski
@ 2006-12-12 13:26 ` Andy Parkins
2006-12-12 14:28 ` Johannes Schindelin
0 siblings, 1 reply; 10+ messages in thread
From: Andy Parkins @ 2006-12-12 13:26 UTC
To: git
On Tuesday 2006 December 12 13:04, Jakub Narebski wrote:
> > Is it even sensible to want that? It strikes me that it's possible that
> > there isn't that much space/bandwidth saving to be made. Should I just
> > clone the repository and shut up? :-)
>
> I've had a similar idea: search for the "sparse clone" keyword. But there's no code.
While the functionality might not be built into git in terms of clone, would
there be a way to pull a particular commit from another repository?
The way I would do it, given nothing else, is to simply extract snapshots into a
working directory and create a repository from scratch. I was just
wondering if a method existed that could reduce the size of the download.
I think the best way is going to be to use the patches published at kernel.org
and apply them one at a time with git-apply.
Andy
--
Dr Andy Parkins, M Eng (hons), MIEE
* Re: Tracking a repository for content instead of history
2006-12-12 13:26 ` Andy Parkins
@ 2006-12-12 14:28 ` Johannes Schindelin
2006-12-12 15:38 ` Andy Parkins
0 siblings, 1 reply; 10+ messages in thread
From: Johannes Schindelin @ 2006-12-12 14:28 UTC
To: Andy Parkins; +Cc: git
Hi,
On Tue, 12 Dec 2006, Andy Parkins wrote:
> The way I would do it, given nothing else, is to simply extract snapshots
> into a working directory and create a repository from scratch. I was
> just wondering if a method existed that could reduce the size of the
> download.
You are not by any chance talking about the --remote option to
git-archive?
If you want to reduce the number of objects to be downloaded, by telling
the other side what you have, you literally end up with something like
shallow clone: the other side _has_ to support it.
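One way to picture why the server must cooperate: what it sends is roughly everything reachable from the commits you want, minus everything reachable from the commits you say you have. Only the server can compute the second set, because only it holds the full graph. The sketch below is a simplification under stated assumptions: commits only (no trees or blobs), hypothetical ids, and "haves" the server does not know are simply ignored.

```python
# Hypothetical commit graph known only to the server: commit -> parents.
SERVER_GRAPH = {
    "v1.0.2": ["v1.0.1"],
    "v1.0.1": ["v1.0.0"],
    "v1.0.0": [],
}


def reachable(starts, graph):
    """All commits reachable from `starts` by following parent links."""
    stack, seen = list(starts), set()
    while stack:
        commit = stack.pop()
        if commit in seen or commit not in graph:
            continue  # unknown ids (e.g. a bogus "have") are skipped
        seen.add(commit)
        stack.extend(graph[commit])
    return seen


def objects_to_send(wants, haves, graph):
    """What the server must send: reachable(wants) minus reachable(haves)."""
    return reachable(wants, graph) - reachable(haves, graph)


print(sorted(objects_to_send(["v1.0.2"], ["v1.0.0"], SERVER_GRAPH)))
# ['v1.0.1', 'v1.0.2']
```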
Ciao,
Dscho
* Re: Tracking a repository for content instead of history
2006-12-12 14:28 ` Johannes Schindelin
@ 2006-12-12 15:38 ` Andy Parkins
2006-12-12 16:24 ` Johannes Schindelin
2006-12-12 21:46 ` Nguyen Thai Ngoc Duy
0 siblings, 2 replies; 10+ messages in thread
From: Andy Parkins @ 2006-12-12 15:38 UTC
To: git
On Tuesday 2006 December 12 14:28, Johannes Schindelin wrote:
> You are not by any chance talking about the --remote option to
> git-archive?
I wasn't, but that's a helpful switch; it's certainly a huge help.
> If you want to reduce the number of objects to be downloaded, by telling
> the other side what you have, you literally end up with something like
> shallow clone: the other side _has_ to support it.
I suppose so, but I was thinking more of an automated way of getting the data
that is supplied for the kernel anyway. So:
base-v1.0.0.tar.gz
patch-v1.0.1.gz
patch-v1.0.2.gz
etc
Each patch is obviously smaller than "base". Git could easily make the
patches, and each of those patches could be fed by hand into a repository
with git-apply. It doesn't seem like something that would require support on
the other side, because it isn't so much a shallow clone (which /would/
preserve history, making it available if wanted); it is pulling just, say,
tagged commits out of an existing repository.
Given a list of tags it is almost:
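git-archive --format=tar --remote=remote:<path> v1.0.0 | tar xf -
git init-db && git add . && git commit -m v1.0.0
ssh remote "cd <path> && git-diff v1.0.0..v1.0.1" | git-apply --index && git commit -m v1.0.1
ssh remote "cd <path> && git-diff v1.0.1..v1.0.2" | git-apply --index && git commit -m v1.0.2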
Does that make sense? Obviously, though, it would be possible to use git rather
than ssh to do this.
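As a rough check of the claim that each patch is smaller than the base, this sketch builds unified diffs between successive tagged snapshots and compares their sizes with the base. Python's difflib stands in for git-diff here, and the snapshot contents are made up; real kernel patches will of course differ in size.

```python
import difflib

# Hypothetical snapshots of one file at three tags.
base = [f"line {i}\n" for i in range(1000)]
snapshots = {"v1.0.0": base}
snapshots["v1.0.1"] = base[:500] + ["fixed line 500\n"] + base[501:]
snapshots["v1.0.2"] = snapshots["v1.0.1"] + ["new line\n"]

tags = ["v1.0.0", "v1.0.1", "v1.0.2"]


def patch(old_tag, new_tag):
    """Unified diff from one tagged snapshot to the next (difflib stands in for git-diff)."""
    return "".join(difflib.unified_diff(
        snapshots[old_tag], snapshots[new_tag],
        fromfile=f"a/{old_tag}", tofile=f"b/{new_tag}"))


base_size = len("".join(snapshots["v1.0.0"]))
for old, new in zip(tags, tags[1:]):
    print(f"{old}..{new}: {len(patch(old, new))} bytes vs base {base_size} bytes")
```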
However... please don't waste any more time thinking about this; it's not a
problem I have that needs a solution - it was more a "because I'm curious"
sort of question.
Andy
--
Dr Andy Parkins, M Eng (hons), MIEE
* Re: Tracking a repository for content instead of history
2006-12-12 15:38 ` Andy Parkins
@ 2006-12-12 16:24 ` Johannes Schindelin
2006-12-12 16:35 ` Johannes Schindelin
2006-12-12 21:46 ` Nguyen Thai Ngoc Duy
1 sibling, 1 reply; 10+ messages in thread
From: Johannes Schindelin @ 2006-12-12 16:24 UTC
To: Andy Parkins; +Cc: git
Hi,
On Tue, 12 Dec 2006, Andy Parkins wrote:
> On Tuesday 2006 December 12 14:28, Johannes Schindelin wrote:
>
> > You are not by any chance talking about the --remote option to
> > git-archive?
>
> I wasn't, but that's a helpful switch; it's certainly a huge
> help.
>
> > If you want to reduce the number of objects to be downloaded, by telling
> > the other side what you have, you literally end up with something like
> > shallow clone: the other side _has_ to support it.
>
> I suppose so, but I was thinking more of an automated way of getting the data
> that is supplied for the kernel anyway. So:
>
> base-v1.0.0.tar.gz
> patch-v1.0.1.gz
> patch-v1.0.2.gz
> etc
>
> Each patch is obviously smaller than "base". Git could easily make the
> patches, and each of those patches could be fed by hand into a repository
> with git-apply.
If it weren't for the recent discussion of kernel.org being overloaded
with gitweb processes, I'd just write down a hint like
http://repo.or.cz/w/git/jnareb-git.git?a=commitdiff_plain;h=next;hp=master
But since kernel.org is overloaded, I will not do that.
Ciao,
* Re: Tracking a repository for content instead of history
2006-12-12 16:24 ` Johannes Schindelin
@ 2006-12-12 16:35 ` Johannes Schindelin
0 siblings, 0 replies; 10+ messages in thread
From: Johannes Schindelin @ 2006-12-12 16:35 UTC
To: Andy Parkins; +Cc: git
Hi,
On Tue, 12 Dec 2006, Johannes Schindelin wrote:
> If it weren't for the recent discussion of kernel.org being overloaded
> with gitweb processes, I'd just write down a hint like
> [URL edited out]
>
> But since kernel.org is overloaded, I will not do that.
Side note: it would probably not help you. The diff is uncompressed, and
thus likely _substantially larger_ than getting the snapshot via gitweb,
which _is_ compressed.
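To get a feel for the size argument, this sketch gzips a synthetic, diff-like text with Python's gzip module and compares the sizes. The input is hypothetical and very repetitive, so the ratio is only illustrative; it is not a measurement of kernel diffs or gitweb snapshots.

```python
import gzip

# Hypothetical diff-like text; real kernel diffs will compress differently.
hunks = []
for i in range(200):
    n = i + 1
    hunks.append(
        f"--- a/drivers/foo{n}.c\n+++ b/drivers/foo{n}.c\n"
        f"@@ -{n},3 +{n},3 @@\n int x = {n};\n-old_call();\n+new_call();\n"
    )
raw = "".join(hunks).encode()
compressed = gzip.compress(raw)

print(f"uncompressed: {len(raw)} bytes, gzip: {len(compressed)} bytes")
```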
Ciao,
Dscho
* Re: Tracking a repository for content instead of history
2006-12-12 15:38 ` Andy Parkins
2006-12-12 16:24 ` Johannes Schindelin
@ 2006-12-12 21:46 ` Nguyen Thai Ngoc Duy
2006-12-12 21:48 ` Nguyen Thai Ngoc Duy
1 sibling, 1 reply; 10+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2006-12-12 21:46 UTC
To: Andy Parkins; +Cc: git
On 12/12/06, Andy Parkins <andyparkins@gmail.com> wrote:
> I suppose so, but I was thinking more of an automated way of getting the data
> that is supplied for the kernel anyway. So:
>
> base-v1.0.0.tar.gz
> patch-v1.0.1.gz
> patch-v1.0.2.gz
> etc
>
> Each patch is obviously smaller than "base". Git could easily make the
> patches, and each of those patches could be fed by hand into a repository
> with git-apply. It doesn't seem like something that would require support on
> the other side, because it isn't so much a shallow clone (which /would/
> preserve history, making it available if wanted); it is pulling just, say,
> tagged commits out of an existing repository.
>
> Given a list of tags it is almost:
>
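> git-archive --format=tar --remote=remote:<path> v1.0.0 | tar xf -
> git init-db && git add . && git commit -m v1.0.0
> ssh remote "cd <path> && git-diff v1.0.0..v1.0.1" | git-apply --index && git commit -m v1.0.1
> ssh remote "cd <path> && git-diff v1.0.1..v1.0.2" | git-apply --index && git commit -m v1.0.2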
>
> Does that make sense? Obviously, though, it would be possible to use git rather
> than ssh to do this.
Hm... I'm no git:// expert, but would it be possible to do the following?
1. git-archive <base>
2. reconstruct the commit, blobs and trees from the archive
3. tell the git server that you have one commit and need another
(maybe heads only, I'm not sure here)
4. get the pack from the git server, then create the new commit and a diff
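Step 2 is only partly possible (Dscho's reply below covers the commit part), but the blobs at least can be recovered, because a git blob id depends only on the file content: it is the SHA-1 of the header `blob <size>\0` followed by the bytes. A minimal sketch, assuming a plain tar archive; here one is built in memory as a stand-in for git-archive output, and the file names are hypothetical.

```python
import hashlib
import io
import tarfile


def blob_id(data: bytes) -> str:
    """Git blob id: SHA-1 of b"blob <size>\\0" followed by the raw content."""
    header = f"blob {len(data)}\0".encode()
    return hashlib.sha1(header + data).hexdigest()


def make_archive(files):
    """Build an in-memory tar (a stand-in for git-archive output)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


def blob_ids_from_archive(archive):
    """Map each regular file in the tar to its git blob id."""
    ids = {}
    with tarfile.open(fileobj=archive, mode="r") as tar:
        for member in tar.getmembers():
            if member.isfile():
                ids[member.name] = blob_id(tar.extractfile(member).read())
    return ids


archive = make_archive({"README": b"hello\n", "empty": b""})
print(blob_ids_from_archive(archive))
```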
--
* Re: Tracking a repository for content instead of history
2006-12-12 21:46 ` Nguyen Thai Ngoc Duy
@ 2006-12-12 21:48 ` Nguyen Thai Ngoc Duy
2006-12-12 22:25 ` Johannes Schindelin
0 siblings, 1 reply; 10+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2006-12-12 21:48 UTC
To: git
On 12/13/06, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
> Hm... I'm no git:// expert, but would it be possible to do the following?
> 1. git-archive <base>
> 2. reconstruct the commit, blobs and trees from the archive
> 3. tell the git server that you have one commit and need another
> (maybe heads only, I'm not sure here)
> 4. get the pack from the git server, then create the new commit and a diff
OK, stupid idea. The pack may contain deltas based on objects that I don't have.
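Why that sinks the idea: objects in a pack can be stored as deltas of the form "copy these bytes from base object X, then insert these literal bytes". The toy format below is a simplification, not git's binary delta encoding, but it shows that such a delta cannot be expanded when the base object is missing.

```python
# Toy delta ops (not git's real encoding):
#   ("copy", offset, length) copies bytes from the base object,
#   ("insert", data) appends literal bytes.


def apply_delta(base, ops):
    """Rebuild a target object from its base and a list of toy delta ops."""
    if base is None:
        raise LookupError("delta base object is missing")
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


base = b"int main(void) { return 0; }\n"
delta = [("copy", 0, 24), ("insert", b"1; }\n")]
print(apply_delta(base, delta))  # b'int main(void) { return 1; }\n'
```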
--
* Re: Tracking a repository for content instead of history
2006-12-12 21:48 ` Nguyen Thai Ngoc Duy
@ 2006-12-12 22:25 ` Johannes Schindelin
0 siblings, 0 replies; 10+ messages in thread
From: Johannes Schindelin @ 2006-12-12 22:25 UTC
To: Nguyen Thai Ngoc Duy; +Cc: git
Hi,
On Wed, 13 Dec 2006, Nguyen Thai Ngoc Duy wrote:
> On 12/13/06, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
> > Hm... I'm no git:// expert, but would it be possible to do the following?
> > 1. git-archive <base>
> > 2. reconstruct the commit, blobs and trees from the archive
> > 3. tell the git server that you have one commit and need another
> > (maybe heads only, I'm not sure here)
> > 4. get the pack from the git server, then create the new commit and a diff
>
> OK, stupid idea. The pack may contain deltas based on objects that I don't have.
The only not-so-brilliant part is reconstructing the commit from the
archive. That is not possible: not only are some of the author and
committer metadata not reconstructable but, worse, neither are the parents'
hashes. And since all of these are hashed to get the commit hash, you
lose.
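For illustration, a sketch of how a commit id is formed: SHA-1 over the header `commit <size>\0` plus a text body listing the tree, the parent(s), author, committer and message. The identity, timestamp and parent id below are hypothetical, and the empty tree id is used only for brevity. Two commits with the same tree but different parents get different ids, which is why a commit rebuilt from an archive cannot match the upstream one.

```python
import hashlib


def object_id(kind: str, body: bytes) -> str:
    """Git object id: SHA-1 of b"<kind> <size>\\0" followed by the body."""
    return hashlib.sha1(f"{kind} {len(body)}\0".encode() + body).hexdigest()


def commit_id(tree, parents, author, committer, message):
    """Hash a commit object built from its fields (layout as shown by `git cat-file -p`)."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {committer}", "", message]
    return object_id("commit", "\n".join(lines).encode())


TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # git's empty tree, for brevity
WHO = "A U Thor <author@example.com> 1165926900 +0000"  # hypothetical identity/time

upstream = commit_id(TREE, ["1" * 40], WHO, WHO, "v1.0.1\n")  # hypothetical parent
rebuilt = commit_id(TREE, [], WHO, WHO, "v1.0.1\n")  # parent unknown from the archive
print(upstream == rebuilt)  # False: same tree, different commit id
```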
However, it could work like this:
- reconstruct the tree (rather than the commit)
- ask for a diff between a certain commit and your tree
It might even be easy to convince git-upload-pack to construct a thin pack
containing deltas _only_ against objects which are reachable from your
tree.
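By contrast, the tree can be rebuilt from the archive alone, since a tree object records only each entry's mode, name and blob id. A minimal sketch for a flat directory of regular files, assuming no subdirectories, executable bits or symlinks (those would need extra modes and git's special sort rule for directory names):

```python
import hashlib


def object_id(kind: str, body: bytes) -> str:
    """Git object id: SHA-1 of b"<kind> <size>\\0" followed by the body."""
    return hashlib.sha1(f"{kind} {len(body)}\0".encode() + body).hexdigest()


def flat_tree_id(files: dict) -> str:
    """Tree id for a flat directory of regular (mode 100644) files.

    Each entry is b"100644 <name>\\0" followed by the 20-byte binary blob id;
    entries are sorted by name, which suffices when there are no subdirectories.
    """
    body = b""
    for name in sorted(files):
        blob = bytes.fromhex(object_id("blob", files[name]))
        body += b"100644 " + name.encode() + b"\0" + blob
    return object_id("tree", body)


print(flat_tree_id({}))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (git's empty tree)
```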
Note: this is feasible, but not necessarily sensible:
- it puts more strain on the server, which otherwise could probably reuse
a lot of deltas, and
- it contradicts the idea of _distributed_ development (for example, you
could not tell which HEAD commit is newer when you fetched from two
repos).
Probably, you could add a third argument: merges are not necessarily
_possible_ with that setup. Note that this argument applies to shallow
clones, too!
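A toy illustration of the last two points, on a hypothetical graph: deciding whether one HEAD is newer than another means checking whether one is an ancestor of the other, and finding a merge base needs the same parent walk. When history has been cut off, the walk runs into commits whose parents were never downloaded, and the honest answer becomes "unknown" (None below).

```python
from typing import Optional

# Hypothetical local history after fetching content-only snapshots:
# None marks a commit whose parents were never downloaded.
LOCAL = {
    "headA": ["x1"],
    "x1": None,  # history cut off here
    "headB": ["y1"],
    "y1": None,  # and here
}


def is_ancestor(older: str, newer: str, graph) -> Optional[bool]:
    """True/False if decidable from the local graph, None if history is missing."""
    stack, seen, truncated = [newer], set(), False
    while stack:
        commit = stack.pop()
        if commit == older:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        parents = graph.get(commit)
        if parents is None:
            truncated = True  # cannot see past this commit
            continue
        stack.extend(parents)
    return None if truncated else False


print(is_ancestor("headA", "headB", LOCAL))  # None: cannot tell which HEAD is newer
```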
Ciao,
Dscho
end of thread (last message: 2006-12-12 22:25 UTC)
Thread overview: 10+ messages
2006-12-12 12:35 Tracking a repository for content instead of history Andy Parkins
2006-12-12 13:04 ` Jakub Narebski
2006-12-12 13:26 ` Andy Parkins
2006-12-12 14:28 ` Johannes Schindelin
2006-12-12 15:38 ` Andy Parkins
2006-12-12 16:24 ` Johannes Schindelin
2006-12-12 16:35 ` Johannes Schindelin
2006-12-12 21:46 ` Nguyen Thai Ngoc Duy
2006-12-12 21:48 ` Nguyen Thai Ngoc Duy
2006-12-12 22:25 ` Johannes Schindelin