"git-send-pack"

* "git-send-pack"
@ 2005-06-30 17:54 Linus Torvalds
  2005-06-30 18:24 ` "git-send-pack" A Large Angry SCM
                   ` (4 more replies)
  0 siblings, 5 replies; 86+ messages in thread
From: Linus Torvalds @ 2005-06-30 17:54 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Daniel Barkalow, Junio C Hamano, ftpadmin

Ok,
 I'm happy to say that the first cut of my new packed-object-sending thing 
seems to work. I have successfully sent updates both locally and over ssh, 
and it seems to work fine, although it has some limitations.

The syntax is very simple indeed:

	git-send-pack destination

will go to the destination (which can be either a local directory or a
remote ssh one, with the remote destination format currently being _only_
the "machine:path" format), and it will go through all the refs in the 
remote destination, compare them with the local ones, and create a pack 
that updates from one to the other.

If the pack/unpack sequence is successful, it then updates the refs at the 
other end, and is done.
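
The ref comparison described above can be sketched roughly as follows. This is a minimal illustration, not the actual git-send-pack code: the function name, the dictionary layout and the SHA1 values are all made up, and the sketch skips refs that are missing on the remote side, in line with the "refuses to create new refs" limitation listed further down.

```python
# Hypothetical sketch: decide which remote refs need updating.
# Ref names and SHA1 values are invented for illustration only.

def refs_to_update(local_refs, remote_refs):
    """Return {ref: (old_sha1, new_sha1)} for refs present on both sides
    whose SHA1 differs. Refs that exist only locally are skipped, so no
    new refs are ever created on the remote side."""
    updates = {}
    for name, remote_sha1 in remote_refs.items():
        local_sha1 = local_refs.get(name)
        if local_sha1 is not None and local_sha1 != remote_sha1:
            updates[name] = (remote_sha1, local_sha1)
    return updates


local = {"refs/heads/master": "b" * 40, "refs/heads/new": "c" * 40}
remote = {"refs/heads/master": "a" * 40}
print(refs_to_update(local, remote))
```

The (old, new) pairs are what a pack would then have to cover: everything reachable from the new SHA1 that is not reachable from the old one.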

My quick tests were very successful, in the sense that it even performed
really well. But I only tested some small updates.

Anyway, what are the limitations? Here's a few obvious ones:

 - the code actually contains support for limiting the refs to be updated
   on the remote end, but I don't actually pass the arguments to the 
   remote git-receive-pack binary yet, so this is currently not 
   functional. Call me lazy.

 - the thing currently refuses to create new refs. Again, this is mainly 
   just me being lazy: it should be easy to add support for creating a new 
   branch, it just requires some care to make sure that we take the old 
   branches into account when generating the pack-file so that we don't 
   send too many objects over. 

 - I really hate how "ssh" apparently cannot be told to have alternate 
   paths. For example, on master.kernel.org, I don't control the setup, so 
   I can't install my own git binaries anywhere except in my ~/bin
   directory, but I also cannot get ssh to accept that that is a valid 
   path. This one really bums me out, and I think it's an ssh deficiency. 

   You apparently have to compile in the paths at compile-time into sshd, 
   and PermitUserEnvironment is disabled by default (not that it even 
   seems to work for the PATH environment, but that may have been my 
   testing that didn't re-start sshd).

   That just sucks.

 - It doesn't update the working directory at the other end. This is fine 
   for what it's intended for (pushing to a central "raw" git archive), 
   so this could be considered a feature, but it's worth pointing out. 
   Only a "pull" will update your working directory, and this pack sending 
   really is meant to be used in a kind of "push to central archive" way.

 - this is also (at least once we've tested it a lot more and added the
   code to allow it to create new refs on the remote side) meant to be a
   good way to mirror things out, since clearly rsync isn't scaling. 

   However, I don't know what the rules for acceptable mirroring 
   approaches are, and it's entirely possible (nay, probable) that an ssh
   connection from the "master" ain't it. It would be good to know what 
   (if any) would be acceptable solutions..

Anyway, please do give it a test. I think I'll use this to sync up to
kernel.org, except I _really_ would want to solve that ssh issue some 
other way than hardcoding the /home/torvalds/bin/ path in my local 
copies.. If somebody knows a good solution, pls holler.

		Linus

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: "git-send-pack"
  2005-06-30 17:54 "git-send-pack" Linus Torvalds
@ 2005-06-30 18:24 ` A Large Angry SCM
  2005-06-30 18:27   ` "git-send-pack" A Large Angry SCM
  2005-06-30 19:04   ` "git-send-pack" Linus Torvalds
  2005-06-30 18:45 ` "git-send-pack" Jan Harkes
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 86+ messages in thread
From: A Large Angry SCM @ 2005-06-30 18:24 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

Have you tried something like the following?

ssh torvalds@master.kernel.org \
	'/bin/sh -c "export PATH=/tmp/foo:$PATH ; env"'

Linus Torvalds wrote:
> 
...
 >
> Anyway, please do give it a test. I think I'll use this to sync up to
> kernel.org, except I _really_ would want to solve that ssh issue some 
> other way than hardcoding the /home/torvalds/bin/ path in my local 
> copies.. If somebody knows a good solution, pls holler.

* Re: "git-send-pack"
  2005-06-30 18:24 ` "git-send-pack" A Large Angry SCM
@ 2005-06-30 18:27   ` A Large Angry SCM
  2005-06-30 19:04   ` "git-send-pack" Linus Torvalds
  1 sibling, 0 replies; 86+ messages in thread
From: A Large Angry SCM @ 2005-06-30 18:27 UTC (permalink / raw)
  To: gitzilla; +Cc: Linus Torvalds, Git Mailing List

Damn! That should have been:

ssh torvalds@master.kernel.org \
	'/bin/sh -c "export PATH=~/tmp/foo:$PATH ; env"'

A Large Angry SCM wrote:
> Have you tried something like the following?
> 
> ssh torvalds@master.kernel.org \
>     '/bin/sh -c "export PATH=/tmp/foo:$PATH ; env"'
> 
> Linus Torvalds wrote:
>>
> ...
>  >
>> Anyway, please do give it a test. I think I'll use this to sync up to
>> kernel.org, except I _really_ would want to solve that ssh issue some 
>> other way than hardcoding the /home/torvalds/bin/ path in my local 
>> copies.. If somebody knows a good solution, pls holler.
> 

* Re: "git-send-pack"
  2005-06-30 18:24 ` "git-send-pack" A Large Angry SCM
  2005-06-30 18:27   ` "git-send-pack" A Large Angry SCM
@ 2005-06-30 19:04   ` Linus Torvalds
  1 sibling, 0 replies; 86+ messages in thread
From: Linus Torvalds @ 2005-06-30 19:04 UTC (permalink / raw)
  To: A Large Angry SCM; +Cc: Git Mailing List

On Thu, 30 Jun 2005, A Large Angry SCM wrote:
>
> Have you tried something like the following?
> 
> ssh torvalds@master.kernel.org \
> 	'/bin/sh -c "export PATH=/tmp/foo:$PATH ; env"'

The point is that the user does not call "ssh" itself, but git-send-pack 
does it automatically.

And that means that git-send-pack will always do the same thing, for any
host it is given. If one host needs a special PATH, that's an effing pain.

However, Kees Cook points out that it's driver error: I set up my PATH in
.bash_profile, and if I just do it in .bashrc instead it all works.

Danke,

		Linus

* Re: "git-send-pack"
  2005-06-30 17:54 "git-send-pack" Linus Torvalds
  2005-06-30 18:24 ` "git-send-pack" A Large Angry SCM
@ 2005-06-30 18:45 ` Jan Harkes
  2005-06-30 19:01 ` "git-send-pack" Mike Taht
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 86+ messages in thread
From: Jan Harkes @ 2005-06-30 18:45 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

On Thu, Jun 30, 2005 at 10:54:48AM -0700, Linus Torvalds wrote:
> Anyway, please do give it a test. I think I'll use this to sync up to
> kernel.org, except I _really_ would want to solve that ssh issue some 
> other way than hardcoding the /home/torvalds/bin/ path in my local 
> copies.. If somebody knows a good solution, pls holler.

I've got a couple of 'export FOO=bar' lines in ~/.bashrc on the
"remote-side" and it looks like they are set correctly when
I do something like "ssh remote.host env".

Jan

* Re: "git-send-pack"
  2005-06-30 17:54 "git-send-pack" Linus Torvalds
  2005-06-30 18:24 ` "git-send-pack" A Large Angry SCM
  2005-06-30 18:45 ` "git-send-pack" Jan Harkes
@ 2005-06-30 19:01 ` Mike Taht
  2005-06-30 19:42   ` "git-send-pack" Linus Torvalds
  2005-06-30 19:44 ` "git-send-pack" Linus Torvalds
  2005-06-30 19:49 ` "git-send-pack" Daniel Barkalow
  4 siblings, 1 reply; 86+ messages in thread
From: Mike Taht @ 2005-06-30 19:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Git Mailing List, Daniel Barkalow, Junio C Hamano, ftpadmin


>    However, I don't know what the rules for acceptable mirroring 
>    approaches are, and it's entirely possible (nay, probable) that an ssh
>    connection from the "master" ain't it. It would be good to know what 
>    (if any) would be acceptable solutions..

Flute, perhaps

http://www.atm.tut.fi/mad/

or fcast

http://www.inrialpes.fr/planete/people/roca/mcl/mcl.html


* Re: "git-send-pack"
  2005-06-30 19:01 ` "git-send-pack" Mike Taht
@ 2005-06-30 19:42   ` Linus Torvalds
  2005-07-01  9:50     ` "git-send-pack" Matthias Urlichs
  0 siblings, 1 reply; 86+ messages in thread
From: Linus Torvalds @ 2005-06-30 19:42 UTC (permalink / raw)
  To: Mike Taht; +Cc: Git Mailing List, Daniel Barkalow, Junio C Hamano, ftpadmin

On Thu, 30 Jun 2005, Mike Taht wrote:
> 
> >    However, I don't know what the rules for acceptable mirroring 
> >    approaches are, and it's entirely possible (nay, probable) that an ssh
> >    connection from the "master" ain't it. It would be good to know what 
> >    (if any) would be acceptable solutions..
> 
> Flute, perhaps
> 
> http://www.atm.tut.fi/mad/

Well, I was hoping for something that has git knowledge, since there are 
issues like updating objects in the right order. 

So "git-send-pack" is nice in many ways: it allows you to update any 
number of branches (in particular, it allows you to update just a _subset_ 
of the branches, which is nice if you have a shared central repository, 
and some people have write permissions to some branches but not to 
others), but it also allows for efficient unpacking on the receiver side 
in a way no "general-purpose" mirror program can really match.

However, that requires the receiver to run a git-aware unpacker (in this
case git-receive-pack). I'm hoping that would be acceptable, I'm just
wondering what kind of safety concerns I'd need to make sure of in order
to make people comfortable running a special receiver program.

So the current approach is very flexible: if the pusher has ssh access, he
can do it. Safe, secure, and no new security issues. And since the only
programs the receiver has to be able to run are two git programs
(git-receive-pack will run git-unpack-objects), maybe it would be ok to
even have "git-receive-pack" as the shell for the receiver side, so that
you don't actually give the mirrorer any shell access at all. But it's
still "push-based" in the sense that it's kernel.org that is doing the
pushing, and that may simply not be acceptable.

			Linus

* Re: "git-send-pack"
  2005-06-30 19:42   ` "git-send-pack" Linus Torvalds
@ 2005-07-01  9:50     ` Matthias Urlichs
  0 siblings, 0 replies; 86+ messages in thread
From: Matthias Urlichs @ 2005-07-01  9:50 UTC (permalink / raw)
  To: git

Hi, Linus Torvalds wrote:

> maybe it would be ok to
> even have "git-receive-pack" as the shell for the receiver side, so that
> you don't actually give the mirrorer any shell access at all.

You can probably just set the remote command (in ~/.ssh/authorized_keys)
to git-receive-pack. That also works around any $PATH issues.

Once this is stable, master.kernel.org should be updated with the
latest git.

-- 
Matthias Urlichs   |   {M:U} IT Design @ m-u-it.de   |  smurf@smurf.noris.de
Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de
 - -
People are never so ready to believe you as when you say things in dispraise
of yourself; and you are never so much annoyed as when they take you at your
word.
					-- Somerset Maugham

* Re: "git-send-pack"
  2005-06-30 17:54 "git-send-pack" Linus Torvalds
                   ` (2 preceding siblings ...)
  2005-06-30 19:01 ` "git-send-pack" Mike Taht
@ 2005-06-30 19:44 ` Linus Torvalds
  2005-06-30 20:38   ` "git-send-pack" Junio C Hamano
  2005-06-30 19:49 ` "git-send-pack" Daniel Barkalow
  4 siblings, 1 reply; 86+ messages in thread
From: Linus Torvalds @ 2005-06-30 19:44 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Daniel Barkalow, Junio C Hamano, ftpadmin



On Thu, 30 Jun 2005, Linus Torvalds wrote:
> 
> Anyway, please do give it a test. I think I'll use this to sync up to
> kernel.org

In fact, the most recent push was done with a

	git-send-pack master.kernel.org:/pub/scm/linux/kernel/git/torvalds/git.git

so if the new commit ("Do ref matching on the sender side rather than on 
receiver") shows up after the mirrors have caught up, then this thing is 
officially in production use..

		Linus

* Re: "git-send-pack"
  2005-06-30 19:44 ` "git-send-pack" Linus Torvalds
@ 2005-06-30 20:38   ` Junio C Hamano
  2005-06-30 21:05     ` "git-send-pack" Daniel Barkalow
                       ` (2 more replies)
  0 siblings, 3 replies; 86+ messages in thread
From: Junio C Hamano @ 2005-06-30 20:38 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Daniel Barkalow, Junio C Hamano, git

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> In fact, the most recent push was done with a

LT> 	git-send-pack master.kernel.org:/pub/scm/linux/kernel/git/torvalds/git.git

Congrats for a job well done.

Now is there anything for us poor mortals who would want to have
a "pull" support?  Logging in via ssh and running send-pack on the
other end is workable but not so pretty ;-).

* Re: "git-send-pack"
  2005-06-30 20:38   ` "git-send-pack" Junio C Hamano
@ 2005-06-30 21:05     ` Daniel Barkalow
  2005-06-30 21:29       ` "git-send-pack" Linus Torvalds
  2005-06-30 21:08     ` "git-send-pack" Linus Torvalds
  2005-06-30 21:10     ` "git-send-pack" Dan Holmsand
  2 siblings, 1 reply; 86+ messages in thread
From: Daniel Barkalow @ 2005-06-30 21:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, git

On Thu, 30 Jun 2005, Junio C Hamano wrote:

> >>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:
> 
> LT> In fact, the most recent push was done with a
> 
> LT> 	git-send-pack master.kernel.org:/pub/scm/linux/kernel/git/torvalds/git.git
> 
> Congrats for a job well done.
> 
> Now is there anything for us poor mortals who would want to have
> a "pull" support?  Logging in via ssh and running send-pack on the
> other end is workable but not so pretty ;-).

I suspect that I'll be able to merge send-pack/receive-pack with
ssh-push/ssh-pull this evening, and then it'll have the feature of not
caring too much which side your command line is on.

	-Daniel
*This .sig left intentionally blank*

* Re: "git-send-pack"
  2005-06-30 21:05     ` "git-send-pack" Daniel Barkalow
@ 2005-06-30 21:29       ` Linus Torvalds
  2005-06-30 21:55         ` "git-send-pack" H. Peter Anvin
  2005-06-30 22:25         ` "git-send-pack" Daniel Barkalow
  0 siblings, 2 replies; 86+ messages in thread
From: Linus Torvalds @ 2005-06-30 21:29 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Junio C Hamano, git

On Thu, 30 Jun 2005, Daniel Barkalow wrote:
> 
> I suspect that I'll be able to merge send-pack/receive-pack with
> ssh-push/ssh-pull this evening, and then it'll have the feature of not
> caring too much which side your command line is on.

The simple thing to do is to just get one commit at a time, see if you 
have it already, parse it if not, and go on to the parents.

That would fit the current git-pull thing, and may be good enough, but it 
has the downside that it can need a _lot_ of back-and-forth fetching of 
commit objects from the other side until you find the one you want. That's 
going to be _very_ slow over a high-latency connection.

So what I'd suggest is:

 - puller starts by just asking "what's your SHA1 for the ref I want"

   The puller wants to know this, because a common case may be that it 
   already has it, in which case it doesn't need to do anything. But more 
   importantly, the puller will need to know this anyway if it gets an 
   object-pack, so that the puller can update its FETCH_HEAD.

 - if puller doesn't have it, then the _puller_ does:

	"git-rev-list my-current-refs"

   to generate an in-date-order list of commits it has, and it starts 
   feeding the result in chunks of 100 entries or something to the other
   end.

 - now, the server sees this stream of SHA1's that the client has, and 
   it can very cheaply just test "do I have this SHA1". Now, if the client 
   hasn't made any changes at all, then the first one will be a hit, and 
   we already have sufficient knowledge to tell what the difference 
   between the client and the server is.

   But more importantly, even if the client _has_ made changes, the client 
   likely has more available CPU than the server has, _and_ the client 
   likely has a shorter list of changes than the server has, so it's
   really the client that should do this. We should burden the server as 
   lightly as possible for this to scale.

 - At some point the server sees the first SHA1 it recognizes, and at that 
   point the server will have to start working. It will just send back an 
   "ok, got it" message (telling the client to not bother continuing to 
   send it any more commit ID's), and then does

	git-rev-list --objects ref-client-wants ^first-common-sha1 |
		git-pack-objects --stdout

 - the client just unpacks the objects, and if successful, it puts the new 
   top ref it got into FETCH_HEAD. It's now done.
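
The negotiation in the list above can be modelled in a few lines. This is only a sketch under assumptions: commit IDs are plain strings, the server's object store is a Python set, and `find_common` is a made-up name. In the real design the client would stream its chunks without waiting for a reply, whereas this model simply counts how many chunks were sent before the first hit.

```python
def chunks(items, size):
    """Yield successive slices of at most `size` entries."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def find_common(client_commits, server_has, chunk_size=100):
    """client_commits: the client's commit IDs, newest first (the order
    git-rev-list prints them in). server_has: set of commit IDs the
    server knows. Returns (first common commit or None, chunks sent)."""
    sent = 0
    for chunk in chunks(client_commits, chunk_size):
        sent += 1
        for sha1 in chunk:
            if sha1 in server_has:   # cheap "do I have this SHA1" test
                return sha1, sent    # server answers "ok, got it"
    return None, sent


# Made-up history: 100 shared commits, then 150 commits only the client has.
shared = ["s%d" % i for i in range(100)]
local_only = ["l%d" % i for i in range(150)]
client = local_only[::-1] + shared[::-1]      # newest first
server = set(shared) | {"r0", "r1"}           # server also has newer work

print(find_common(client, server))
```

With 150 unpushed local commits the first hit lands in the second chunk of 100, after which the server would build the pack for everything between that commit and the ref the client asked for.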

And I do _not_ think that it makes a lot of sense to try to be symmetric.  
For one thing, while a "git-send-pack" should update all the refs
in-place, a "git-pull-pack" should _not_ update the ref, it should just
set FETCH_HEAD instead and the puller can decide what he wants to do with
that ref (possibly merge it, but possibly just make it be a new local
branch "remote-branch").

So I think sending and receiving are fundamentally non-symmetric.

		Linus

* Re: "git-send-pack"
  2005-06-30 21:29       ` "git-send-pack" Linus Torvalds
@ 2005-06-30 21:55         ` H. Peter Anvin
  2005-06-30 22:26           ` "git-send-pack" Linus Torvalds
  2005-06-30 22:25         ` "git-send-pack" Daniel Barkalow
  1 sibling, 1 reply; 86+ messages in thread
From: H. Peter Anvin @ 2005-06-30 21:55 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Daniel Barkalow, Junio C Hamano, git

It seems to me that git always defines a DAG of objects, such that if 
you have a list of terminals (defined as objects not referenced by other 
objects), you can, given access to the same objects, figure out all 
intervening objects.

The tricky bit becomes finding the DAG both sides have in common with as 
little traffic as possible.

For producing minimum network traffic, I think something like this would 
work:

a) The sender sends a list of its terminals to the receiver.

b) The receiver sends a list of nodes it needs, plus a list of all its 
own meta-terminals, obtained by pruning its own DAG according to the 
terminals list of the sender.

c) This may have to be performed iteratively?  I need to sit down and 
work out the exact algorithm for all cases, including branch trees and 
multi-rooted DAGs.

d) Once the sender knows the subset of its own DAG available to the 
receiver, it can transmit either all objects that it has and the receiver 
does not, or all objects on the path to one or more specific objects (e.g. HEAD.)
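
Step d) amounts to a set difference over reachability. The sketch below assumes a toy in-memory DAG (a dictionary from object name to the objects it references) and invented object names; it says nothing about how steps a) to c) would be carried out over the wire.

```python
def reachable(dag, tips):
    """Return every object reachable from `tips`, where `dag` maps an
    object name to the list of objects it references."""
    seen = set()
    stack = list(tips)
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue
        seen.add(obj)
        stack.extend(dag.get(obj, []))
    return seen


# Toy history: three commits (c1..c3), each referencing one tree (t1..t3).
dag = {
    "c3": ["c2", "t3"],
    "c2": ["c1", "t2"],
    "c1": ["t1"],
    "t1": [], "t2": [], "t3": [],
}
sender_tips = ["c3"]
receiver_tips = ["c1"]

to_send = reachable(dag, sender_tips) - reachable(dag, receiver_tips)
print(sorted(to_send))
```

Here the receiver already holds c1 and t1, so only the two newer commits and their trees need to travel.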

	-hpa

* Re: "git-send-pack"
  2005-06-30 21:55         ` "git-send-pack" H. Peter Anvin
@ 2005-06-30 22:26           ` Linus Torvalds
  2005-06-30 23:40             ` "git-send-pack" H. Peter Anvin
  0 siblings, 1 reply; 86+ messages in thread
From: Linus Torvalds @ 2005-06-30 22:26 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Daniel Barkalow, Junio C Hamano, git

On Thu, 30 Jun 2005, H. Peter Anvin wrote:
> 
> For producing minimum network traffic, I think something like this would 
> work:

In the "minimum traffic", the thing to look at is number of packets, and 
penalize further for anything that requires a synchronous reply.

That's why I'd suggest just letting the client stream out the list of
objects it has - it may appear wasteful to stream out even a thousand
SHA1's, but hey, that's just 20kB worth of data, and especially if there
is no synchronous stuff, that's just 15 ethernet packets.
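
The arithmetic behind those figures, as a quick check. The 1460-byte payload is an assumption (a 1500-byte Ethernet MTU minus 40 bytes of IPv4 and TCP headers, no options), so the packet count is only a rough estimate in the same ballpark as the number quoted above.

```python
import math

SHA1_BYTES = 20       # a SHA-1 digest in binary form
TCP_PAYLOAD = 1460    # assumed: 1500-byte MTU minus IPv4 + TCP headers


def stream_cost(n_sha1):
    """Return (total bytes, full-sized packets) for n_sha1 binary SHA1s."""
    total = n_sha1 * SHA1_BYTES
    return total, math.ceil(total / TCP_PAYLOAD)


print(stream_cost(1000))  # (20000, 14)
```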

For the server side, looking up a thousand SHA's is pretty easy (it's
_really_ cheap if the server ends up using a few big packed objects: you
don't even have to look at the pack data itself, it can look at just the
index and say "yup, I've got it")

So I'd go for simple brute force over anything that needs to discuss
things and have a back-and-forth between server/client. And making the
client do the heavy lifting is the right thing to do (the server will have
to create the pack, which can be expensive, but you can tune the delta 
window for how much CPU the server has)

		Linus

* Re: "git-send-pack"
  2005-06-30 22:26           ` "git-send-pack" Linus Torvalds
@ 2005-06-30 23:40             ` H. Peter Anvin
  2005-07-01  0:02               ` "git-send-pack" Linus Torvalds
  0 siblings, 1 reply; 86+ messages in thread
From: H. Peter Anvin @ 2005-06-30 23:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Daniel Barkalow, Junio C Hamano, git

Linus Torvalds wrote:
> 
> On Thu, 30 Jun 2005, H. Peter Anvin wrote:
> 
>>For producing minimum network traffic, I think something like this would 
>>work:
> 
> In the "minimum traffic", the thing to look at is number of packets, and 
> penalize further for anything that requires a synchronous reply.
> 
> That's why I'd suggest just letting the client stream out the list of
> objects it has - it may appear wasteful to stream out even a thousand
> SHA1's, but hey, that's just 20kB worth of data, and especially if there
> is no synchronous stuff, that's just 15 ethernet packets.
> 

In your linux-2.6 tree, there are currently 54,204 objects, and that is 
after less than one full 2.6.x kernel release cycle.  That's a megabyte 
of SHA1s.

In /pub/scm on kernel.org, there are currently 1,815,573 objects or hard 
links to objects, which would take a 36.3 MB list to produce.

Although this is better than what rsync does, which is to encode this 
list into ASCII with pathnames and all, so that it ends up being closer 
to 200 MB, it isn't fundamentally different.

	-hpa

* Re: "git-send-pack"
  2005-06-30 23:40             ` "git-send-pack" H. Peter Anvin
@ 2005-07-01  0:02               ` Linus Torvalds
  2005-07-01  1:24                 ` "git-send-pack" H. Peter Anvin
  2005-07-01 23:44                 ` "git-send-pack" Mike Taht
  0 siblings, 2 replies; 86+ messages in thread
From: Linus Torvalds @ 2005-07-01  0:02 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Daniel Barkalow, Junio C Hamano, git

On Thu, 30 Jun 2005, H. Peter Anvin wrote:
> 
> In your linux-2.6 tree, there are currently 54,204 objects, and that is 
> after less than one full 2.6.x kernel release cycle.  That's a megabyte 
> of SHA1s.

But that's _all_ objects. There are "only" 4040 commit objects (which are
always the starting point for a search). 

So streaming out the commit objects a few hundred at a time is actually 
a very simple strategy. 

Also, note that the server is usually _more_ ahead than the client is, and 
the server is the one that potentially has lots of commits that the 
client doesn't have. Not the other way around. So if the client makes a 
list of its top commits, it almost certainly won't have to make a very 
long list until the server can tell it "ok, stop, I've seen it".

Yeah, maybe we want to limit the "burst" to 70 sha1's, since that will fit 
in a regular-sized ethernet packet, but whatever - you'd burst out your 
commits "latest first", so you'd never even get to the current 4040 unless 
you've literally done the kind of work we've done in the git tree for the 
last 3 months _and_you've_not_pulled_from_that_server_in_the_whole_time_.
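
To put the numbers in this exchange side by side: the sketch below compares a list of every object with a list of commits only, and checks the 70-per-burst figure. The 1500-byte MTU is an assumption and protocol headers are not modelled.

```python
import math

SHA1_BYTES = 20        # binary SHA-1 digest
BURST = 70             # commit IDs per burst, as suggested above
ETHERNET_MTU = 1500    # assumed standard Ethernet MTU, headers not modelled

all_objects = 54204    # every object in the tree (hpa's count)
commits = 4040         # commit objects only (the count given above)

all_objects_bytes = all_objects * SHA1_BYTES    # 1,084,080: about a megabyte
commits_bytes = commits * SHA1_BYTES            # 80,800: about 81 kB
burst_bytes = BURST * SHA1_BYTES                # 1,400: under the MTU
worst_case_bursts = math.ceil(commits / BURST)  # bursts to list every commit

print(all_objects_bytes, commits_bytes, burst_bytes, worst_case_bursts)
```

Even the worst case, where the client has to list every one of its commits, comes to 58 bursts, and the common case stops after the first one or two.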

		Linus

* Re: "git-send-pack"
  2005-07-01  0:02               ` "git-send-pack" Linus Torvalds
@ 2005-07-01  1:24                 ` H. Peter Anvin
  2005-07-01 23:44                 ` "git-send-pack" Mike Taht
  1 sibling, 0 replies; 86+ messages in thread
From: H. Peter Anvin @ 2005-07-01  1:24 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Daniel Barkalow, Junio C Hamano, git

Linus Torvalds wrote:
> 
> On Thu, 30 Jun 2005, H. Peter Anvin wrote:
> 
>>In your linux-2.6 tree, there are currently 54,204 objects, and that is 
>>after less than one full 2.6.x kernel release cycle.  That's a megabyte 
>>of SHA1s.
> 
> 
> But that's _all_ objects. There are "only" 4040 commit objects (which are
> always the starting point for a search). 
> 

Well, there are objects that reference commit objects (e.g. tag 
objects), not the other way around, but your point is well taken.

> So streaming out the commit objects a few hundred at a time is actually 
> a very simple strategy. 
> 
> Also, note that the server is usually _more_ ahead than the client is, and 
> the server is the one that potentially has lots of commits that the 
> client doesn't have. Not the other way around. So if the client makes a 
> list of its top commits, it almost certainly won't have to make a very 
> long list until the server can tell it "ok, stop, I've seen it".

Well, what I proposed was pretty much that except to have the client 
(receiver) start first.

I prefer calling it sender and receiver, because in the case of upload 
and download you have different sides being the "server".

> Yeah, maybe we want to limit the "burst" to 70 sha1's, since that will fit 
> in a regular-sized ethernet packet, but whatever - you'd burst out your 
> commits "latest first", so you'd never even get to the current 4040 unless 
> you've literally done the kind of work we've done in the git tree for the 
> last 3 months _and_you've_not_pulled_from_that_server_in_the_whole_time_.

Well, in the common case (sender has a superset of receiver), what I 
proposed would converge on the first iteration.  I'm not even convinced 
that the algorithm *ever* needs to iterate.

	-hpa

* Re: "git-send-pack"
  2005-07-01  0:02               ` "git-send-pack" Linus Torvalds
  2005-07-01  1:24                 ` "git-send-pack" H. Peter Anvin
@ 2005-07-01 23:44                 ` Mike Taht
  2005-07-02  0:07                   ` "git-send-pack" H. Peter Anvin
  2005-07-02  1:56                   ` "git-send-pack" Linus Torvalds
  1 sibling, 2 replies; 86+ messages in thread
From: Mike Taht @ 2005-07-01 23:44 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: H. Peter Anvin, Daniel Barkalow, Junio C Hamano, git

Linus Torvalds wrote:

> Also, note that the server is usually _more_ ahead than the client is, and 
> the server is the one that potentially has lots of commits that the 
> client doesn't have. Not the other way around. So if the client makes a 
> list of its top commits, it almost certainly won't have to make a very 
> long list until the server can tell it "ok, stop, I've seen it".
> 
> Yeah, maybe we want to limit the "burst" to 70 sha1's, since that will fit 
> in a regular-sized ethernet packet, but whatever - you'd burst out your 
> commits "latest first", so you'd never even get to the current 4040 unless 
> you've literally done the kind of work we've done in the git tree for the 
> last 3 months _and_you've_not_pulled_from_that_server_in_the_whole_time_.

You are getting closer and closer to where something like BitTorrent or 
a multicast protocol makes sense. The problem isn't just the number of 
outstanding commit objects but the number of machines and developers 
that want to grab those commits at the same time.


Mike Taht
PostCards From The Bleeding Edge
http://the-edge.blogspot.com "Tempel 1 worth 2.2 million trillion bux"

* Re: "git-send-pack"
  2005-07-01 23:44                 ` "git-send-pack" Mike Taht
@ 2005-07-02  0:07                   ` H. Peter Anvin
  2005-07-02  1:56                   ` "git-send-pack" Linus Torvalds
  1 sibling, 0 replies; 86+ messages in thread
From: H. Peter Anvin @ 2005-07-02  0:07 UTC (permalink / raw)
  To: Mike Taht; +Cc: Linus Torvalds, Daniel Barkalow, Junio C Hamano, git

Mike Taht wrote:
> 
> You are getting closer and closer to where something like BitTorrent or 
> a multicast protocol makes sense. The problem isn't just the number of 
> outstanding commit objects but the number of machines and developers 
> that want to grab those commits at the same time.
> 

Not really.

	-hpa

* Re: "git-send-pack"
  2005-07-01 23:44                 ` "git-send-pack" Mike Taht
  2005-07-02  0:07                   ` "git-send-pack" H. Peter Anvin
@ 2005-07-02  1:56                   ` Linus Torvalds
  2005-07-02  4:08                     ` "git-send-pack" H. Peter Anvin
  1 sibling, 1 reply; 86+ messages in thread
From: Linus Torvalds @ 2005-07-02  1:56 UTC (permalink / raw)
  To: Mike Taht; +Cc: H. Peter Anvin, Daniel Barkalow, Junio C Hamano, git

On Fri, 1 Jul 2005, Mike Taht wrote:
> 
> You are getting closer and closer to where something like BitTorrent or 
> a multicast protocol makes sense. The problem isn't just the number of 
> outstanding commit objects but the number of machines and developers 
> that want to grab those commits at the same time.

I don't think so. First off, I don't think the decision is 
kernel-specific, in the sense that I at least use git for sparse and git 
itself too, so the solution should make sense for small projects as well.

Also, even for the kernel, the total dataset right now (after three months
or whatever) is a 60MB pack. It's not like we're sending DVD's or even
CD's worth of data around - we're sending the equivalent of 20MB per
_month_. That's really not a lot of data. You could easily keep up with a 
slow modem.
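
A rough check of the modem claim. The 56 kbit/s line rate is an assumption (the mail only says "slow modem"), and protocol overhead and compression are ignored.

```python
PACK_MB = 60                  # total pack size quoted above
MONTHS = 3                    # period it covers
MODEM_BITS_PER_SEC = 56_000   # assumed modem speed, not from the mail

per_month_mb = PACK_MB / MONTHS
seconds = per_month_mb * 1e6 * 8 / MODEM_BITS_PER_SEC
minutes = round(seconds / 60)

print(per_month_mb, minutes)  # 20.0 48
```

Under those assumptions a month of kernel history is well under an hour of modem time.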

Also, the number of people involved isn't _that_ big. We're talking a few
thousand people who actively would update their trees for a big project,
and many smaller projects have anything from a couple to maybe a hundred. 
A few mirrors, and you don't have any problem.

So I think that the problem is actually not that big, and we just need to
find an acceptable format. Quite frankly, it might be perfectly acceptable
for kernel.org to run a simple packing script once a week which packs
everything into one single file, and even if that means that the mirrors
will have to re-get everything once a week, that actually sounds 
acceptable.

It's obviously a _stupid_ way to handle the rsync problem, so there's 
bound to be some cleaner solution, but the point is that we can probably 
make mirroring acceptable even with a really really stupid approach. I'd 
be a bit ashamed of just how ugly it is, but it would likely _work_ fine.
You'd create 52 pack-files in a year, but each pack-file is likely just
ten megabytes each. 

Oh, each pack-file should also be associated with the list of "refs" that
were used to generate that pack-file, so make that 104 files per project
year (but the list of "refs" would usually be something small, like

	refs/heads/master       4a89a04f1ee21a7c1f4413f1ad7dcfac50ff9b63
	refs/tags/v2.6.11       5dc01c595e6c6ec9ccda4f6f69c131c0dd945f8c
	refs/tags/v2.6.11-tree  5dc01c595e6c6ec9ccda4f6f69c131c0dd945f8c
	refs/tags/v2.6.12       26791a8bcf0e6d33f43aef7682bdb555236d56de
	refs/tags/v2.6.12-rc2   9e734775f7c22d2f89943ad6c745571f1930105f
	refs/tags/v2.6.12-rc3   0397236d43e48e821cce5bbe6a80a1a56bb7cc3a
	refs/tags/v2.6.12-rc4   ebb5573ea8beaf000d4833735f3e53acb9af844c
	refs/tags/v2.6.12-rc5   06f6d9e2f140466eeb41e494e14167f90210f89d
	refs/tags/v2.6.12-rc6   701d7ecec3e0c6b4ab9bb824fd2b34be4da63b7e
	refs/tags/v2.6.13-rc1   733ad933f62e82ebc92fed988c7f0795e64dea62

which was trivially generated from my current tree with

	for i in refs/*/*; do echo -ne $i"\t"; cat $i; done

so now you can use the refs associated with the previous pack-file as the 
list of refs you're _not_ interested in, and the current list of refs as 
the list you _are_ interested in, and generate the new pack-file.

Generating the pack-file would literally be something like

	obj=$(git-rev-parse $(cut -f2 new-list) --not $(cut -f2 old-list))
	git-rev-list --objects $obj | git-pack-objects --stdout > new-pack

so a few one-liners like this, run from a cron-job once a week, should
just do it.
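
A minimal sketch of such a weekly job, not the actual kernel.org script.
The function names (list_refs, rev_args, weekly_pack) and the list file
names are made up for illustration. It assumes the "git rev-list" and
"git pack-objects" spellings, with --objects and --stdout as in the
rev-list | pack-objects pipeline quoted elsewhere in this thread:

```shell
#!/bin/sh
# list_refs GIT_DIR
# Print "refname<TAB>sha1" for every file matching GIT_DIR/refs/*/*.
list_refs() {
    ( cd "$1" && for r in refs/*/*; do
        [ -f "$r" ] || continue
        printf '%s\t%s\n' "$r" "$(cat "$r")"
    done )
}

# rev_args NEW_LIST OLD_LIST
# Print the revision arguments: the new tips, then "--not", then the old tips.
rev_args() {
    cut -f2 "$1"
    printf '%s\n' '--not'
    cut -f2 "$2"
}

# weekly_pack NEW_LIST OLD_LIST OUT
# Needs git and must run inside the repository. Writes a pack holding the
# objects reachable from the new tips but not from the old ones.
weekly_pack() {
    git rev-list --objects $(rev_args "$1" "$2") |
        git pack-objects --stdout > "$3"
}
```

The cron job would then keep this week's list_refs output next to the
pack, and use it as OLD_LIST the following week.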

		Linus


* Re: "git-send-pack"
  2005-07-02  1:56                   ` "git-send-pack" Linus Torvalds
@ 2005-07-02  4:08                     ` H. Peter Anvin
  2005-07-02  4:22                       ` "git-send-pack" Linus Torvalds
  0 siblings, 1 reply; 86+ messages in thread
From: H. Peter Anvin @ 2005-07-02  4:08 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Mike Taht, Daniel Barkalow, Junio C Hamano, git

Linus Torvalds wrote:
> 
> Also, the number of people involved isn't _that_ big. We're talking a few
> thousand people who actively would update their trees for a big project,
> and many smaller projects have anything from a couple to maybe a hundred. 
> A few mirrors, and you don't have any problem.
> 
> So I think that the problem is actually not that big, and we just need to
> find an acceptable format. Quite frankly, it might be perfectly acceptable
> for kernel.org to run a simple packing script once a week which packs
> everything into one single file, and even if that means that the mirrors
> will have to re-get everything once a week, that actually sounds 
> acceptable.
> 
> It's obviously a _stupid_ way to handle the rsync problem, so there's 
> bound to be some cleaner solution, but the point is that we can probably 
> make mirroring acceptable even with a really really stupid approach. I'd 
> be a bit ashamed of just how ugly it is, but it would likely _work_ fine.
> You'd create 52 pack-files in a year, but each pack-file is likely just
> ten megabytes each. 
> 

Any reason not to simply append objects to an existing packfile?  It 
really seems like an easy solution, and should have relatively good I/O 
patterns to boot simply because it naturally creates a topological sort 
of the objects.

	-hpa


* Re: "git-send-pack"
  2005-07-02  4:08                     ` "git-send-pack" H. Peter Anvin
@ 2005-07-02  4:22                       ` Linus Torvalds
  2005-07-02  4:29                         ` "git-send-pack" H. Peter Anvin
  0 siblings, 1 reply; 86+ messages in thread
From: Linus Torvalds @ 2005-07-02  4:22 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Mike Taht, Daniel Barkalow, Junio C Hamano, git

On Fri, 1 Jul 2005, H. Peter Anvin wrote:
> 
> Any reason not to simply append objects to an existing packfile?

What happens when somebody screws up in the middle?

The one thing I care about more than anything else is consistency. We are 
careful about writing objects in the right order, and we can re-create the 
state from the originator etc. But if we start appending stuff and 
something goes wrong in the middle, I'm just not going to touch it. A 
"truncate and hope for the best" algorithm? 

Besides, the result is not a valid git archive any more. 

		Linus


* Re: "git-send-pack"
  2005-07-02  4:22                       ` "git-send-pack" Linus Torvalds
@ 2005-07-02  4:29                         ` H. Peter Anvin
  2005-07-02 17:16                           ` "git-send-pack" Linus Torvalds
  0 siblings, 1 reply; 86+ messages in thread
From: H. Peter Anvin @ 2005-07-02  4:29 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Mike Taht, Daniel Barkalow, Junio C Hamano, git

Linus Torvalds wrote:
> 
> On Fri, 1 Jul 2005, H. Peter Anvin wrote:
> 
>>Any reason not to simply append objects to an existing packfile?
> 
> 
> What happens when somebody screws up in the middle?
> 
> The one thing I care about more than anything else is consistency. We are 
> careful about writing objects in the right order, and we can re-create the 
> state from the originator etc. But if we start appending stuff and 
> something goes wrong in the middle, I'm just not going to touch it. A 
> "truncate and hope for the best" algorithm? 
> 
> Besides, the result is not a valid git archive any more. 
> 

It's a log.  It's a standard technique to append entries to a log.  The 
requirements for this to always be consistent are that a) it's possible 
to know when the entry/entries at the end are inconsistent and b) it's 
always possible to roll back the log to a consistent state.

This is normally done with commit records (write data - fdatasync - 
write commit record - fdatasync), but in the case of git, the commit 
record isn't required because each git record is self-validating.  This 
is an incredibly powerful property.

If the log is written in topological sort order, then even a truncated 
log file is a valid (subset) git object store.
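
A rough sketch of the self-validating log idea, under stated assumptions:
the record format ("<sha1 of payload> <payload>", one record per line) and
the function names are hypothetical, it is not the git pack format, and it
relies on the sha1sum utility being installed.

```shell
#!/bin/sh
# append_record LOG PAYLOAD
# Append one record that carries the SHA1 of its own payload.
append_record() {
    sum=$(printf '%s' "$2" | sha1sum | cut -d' ' -f1)
    printf '%s %s\n' "$sum" "$2" >> "$1"
}

# valid_prefix LOG
# Print the leading records whose checksum matches and stop at the first
# record that is damaged or cut short. A final line with no newline is
# treated as incomplete and dropped.
valid_prefix() {
    while IFS= read -r line; do
        sum=${line%% *}
        payload=${line#* }
        want=$(printf '%s' "$payload" | sha1sum | cut -d' ' -f1)
        [ "$sum" = "$want" ] || break
        printf '%s\n' "$line"
    done < "$1"
}
```

Rolling back is then just replacing the log with the output of
valid_prefix; no separate commit record is needed.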

	-hpa


* Re: "git-send-pack"
  2005-07-02  4:29                         ` "git-send-pack" H. Peter Anvin
@ 2005-07-02 17:16                           ` Linus Torvalds
  2005-07-02 17:37                             ` "git-send-pack" H. Peter Anvin
  2005-07-02 17:44                             ` "git-send-pack" Tony Luck
  0 siblings, 2 replies; 86+ messages in thread
From: Linus Torvalds @ 2005-07-02 17:16 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Mike Taht, Daniel Barkalow, Junio C Hamano, git

On Fri, 1 Jul 2005, H. Peter Anvin wrote:
> 
> It's a log.

..but that's not what we're looking for. I'm not looking for kernel.org to
be my distributed backup tape.

For it to be useful, it must do more than just log all activity and mirror
it out via rsync. It must also be usable for people pulling on it. Which
means that it has to be a valid git archive or at least easily
incrementally unpackable, so that people can actually use the end result.

A log of packs that are just incremented is certainly unpackable: you
teach git-unpack-objects to just unpack several packs after each other.  
But since it's not seekable, you'd have to unpack a 100MB compressed
archive just to get the last tip of it that you don't have unpacked yet.

Also, it means that it's impossible to efficiently do a git-specific 
thing. I want people to be able to do what we used to be able to do with 
BK: just do a

	git pull master.kernel.org:xxxx

and get something useful. And that means _not_ having to pull a 100MB blob 
to get the last objects at the end.

And don't tell me "rsync can efficiently get just the end". That's true 
for _mirrors_, but it's not true for users that don't have every single 
archive on kernel.org. I don't have (and I don't want to have) a copy of 
every single person's log that ever might want to push to me.

So no, a log simply isn't useful. It _has_ to be a valid git archive to be 
useful. Thousands of objects satisfy that. Or a "few packs + few objects". 
Not a log.

		Linus


* Re: "git-send-pack"
  2005-07-02 17:16                           ` "git-send-pack" Linus Torvalds
@ 2005-07-02 17:37                             ` H. Peter Anvin
  2005-07-02 17:44                             ` "git-send-pack" Tony Luck
  1 sibling, 0 replies; 86+ messages in thread
From: H. Peter Anvin @ 2005-07-02 17:37 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Mike Taht, Daniel Barkalow, Junio C Hamano, git

Linus Torvalds wrote:
> 
> ..but that's not what we're looking for. I'm not looking for kernel.org to
> be my distributed backup tape.
> 
> For it to be useful, it must do more than just log all activity and mirror
> it out via rsync. It must also be usable for people pulling on it. Which
> means that it has to be a valid git archive or at least easily
> incrementally unpackable, so that people can actually use the end result.
> 
> A log of packs that are just incremented is certainly unpackable: you
> teach git-unpack-objects to just unpack several packs after each other.  
> But since it's not seekable, you'd have to unpack a 100MB compressed
> archive just to get the last tip of it that you don't have unpacked yet.
> 

Agreed, you also need an index file.  The index file can be recreated 
from the log file in case of corruption, but is what you'd use to seek 
directly to an object.

	-hpa


* Re: "git-send-pack"
  2005-07-02 17:16                           ` "git-send-pack" Linus Torvalds
  2005-07-02 17:37                             ` "git-send-pack" H. Peter Anvin
@ 2005-07-02 17:44                             ` Tony Luck
  2005-07-02 17:48                               ` "git-send-pack" H. Peter Anvin
  1 sibling, 1 reply; 86+ messages in thread
From: Tony Luck @ 2005-07-02 17:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: H. Peter Anvin, Mike Taht, Daniel Barkalow, Junio C Hamano, git

Here's another approach.

Teach the variants of git-pull to look for a file that names an alternate
repository that should be used to get any object that is referenced in the
repository, but doesn't exist in it.

At least part of the problem for kernel.org is that there are around 50
repositories that are tracking the 2.6 kernel.  All of them have 50,000
objects that are duplicates of each other ... and a few hundred 'unique'
objects that belong to just one repo, or are minimally shared.

If there was a way to specify an alternate repo, then a large GIT server like
kernel.org could set up a "git-history"[1] repo which each of the hosted repos
could point to.  Then a cron job could look for duplicates, and move them
off to the history area.
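
A sketch of the duplicate-finding step such a cron job might use. It
assumes loose objects are stored as objects/xx/yyyy... files in both the
hosted repo and the shared "git-history" area; find_duplicates and both
directory arguments are hypothetical names.

```shell
#!/bin/sh
# find_duplicates REPO_OBJECTS SHARED_OBJECTS
# List loose objects (as "xx/rest-of-hash") that exist in REPO_OBJECTS and
# also in SHARED_OBJECTS. Only two-character fan-out directories are
# scanned, so other subdirectories such as info/ or pack/ are ignored.
find_duplicates() {
    ( cd "$1" && find . -type f -path './??/*' ) | while IFS= read -r obj; do
        if [ -f "$2/$obj" ]; then
            printf '%s\n' "${obj#./}"
        fi
    done
}
```

The job could delete each listed file from the hosted repo once the
object-lookup code knows to fall back to the shared area.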

-Tony

[1] Different projects, like git and sparse, might never have any common
files with the Linux kernel ... but they can all share the same history.


* Re: "git-send-pack"
  2005-07-02 17:44                             ` "git-send-pack" Tony Luck
@ 2005-07-02 17:48                               ` H. Peter Anvin
  2005-07-02 18:12                                 ` "git-send-pack" A Large Angry SCM
  0 siblings, 1 reply; 86+ messages in thread
From: H. Peter Anvin @ 2005-07-02 17:48 UTC (permalink / raw)
  To: Tony Luck; +Cc: Linus Torvalds, Mike Taht, Daniel Barkalow, Junio C Hamano, git

Tony Luck wrote:
> 
> At least part of the problem for kernel.org is that there are around 50
> repositories that are tracking the 2.6 kernel.  All of them have 50,000
> objects that are duplicates of each other ... and a few hundred 'unique'
> objects that belong to just one repo, or are minimally shared.
> 
> If there was a way to specify an alternate repo, then a large GIT server like
> kernel.org could set up a "git-history"[1] repo which each of the hosted repos
> could point to.  Then a cron job could look for duplicates, and move them
> off to the history area.
> 

This is why I've been talking about a global object repository -- 
including the problems associated with them.  git as it currently stands 
permits a single global object store, *except* for the issue of duplicate 
tags.

	-hpa


* Re: "git-send-pack"
  2005-07-02 17:48                               ` "git-send-pack" H. Peter Anvin
@ 2005-07-02 18:12                                 ` A Large Angry SCM
  0 siblings, 0 replies; 86+ messages in thread
From: A Large Angry SCM @ 2005-07-02 18:12 UTC (permalink / raw)
  To: git

H. Peter Anvin wrote:
> Tony Luck wrote:
>>
...
> 
> This is why I've been talking about a global object repository -- 
> including the problems associated with them.  git as it currently stands 
> permits a single global object store, *except* for the issue of duplicate 
> tags.

So why not store just the git objects in the global repository and keep
all the things that reference an object (HEAD, branches/*, refs/*/*,
etc.) in a per project and/or contributor area like it is currently?


* Re: "git-send-pack"
  2005-06-30 21:29       ` "git-send-pack" Linus Torvalds
  2005-06-30 21:55         ` "git-send-pack" H. Peter Anvin
@ 2005-06-30 22:25         ` Daniel Barkalow
  2005-06-30 23:56           ` "git-send-pack" Linus Torvalds
  1 sibling, 1 reply; 86+ messages in thread
From: Daniel Barkalow @ 2005-06-30 22:25 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git

On Thu, 30 Jun 2005, Linus Torvalds wrote:

> On Thu, 30 Jun 2005, Daniel Barkalow wrote:
> > 
> > I suspect that I'll be able to merge send-pack/receive-pack with
> > ssh-push/ssh-pull this evening, and then it'll have the feature of not
> > caring too much which side your command line is on.
> 
> The simple thing to do is to just get one commit at a time, see if you 
> have it already, parse it if not, and go on to the parents.
> 
> That would fit the current git-pull thing, and may be good enough, but it 
> has the downside that it can need a _lot_ of back-and-forth fetching of 
> commit objects from the other side until you find the one you want. That's 
> going to be _very_ slow over a high-latency connection.
> 
> So what I'd suggest is:
> 
> 1- puller starts by just asking "what's your SHA1 for the ref I want"
> 
>    The puller wants to know this, because a common case may be that it 
>    already has it, in which case it doesn't need to do anything. But more 
>    importantly, the puller will need to know this anyway if it gets an 
>    object-pack, so that the puller can update its FETCH_HEAD.

Already have this, for the non-pack case.

>  - At some point the server sees the first SHA1 it recognizes, and at that 
>    point the server will have to start working. It will just send back an 
>    "ok, got it" message (telling the client to not bother continuing to 
>    send it any more commit ID's), and then does
> 
> 	git-rev-list --objects ref-client-wants ^first-common-sha1 |
> 		git-pack-objects --stdout

Right.

>  - the client just unpacks the objects, and if successful, it puts the new 
>    top ref it got into FETCH_HEAD. It's now done.

Or wherever it's been told to, yes.

> And I do _not_ think that it makes a lot of sense to try to be symmetric.  
> For one thing, while a "git-send-pack" should update all the refs
> in-place, a "git-pull-pack" should _not_ update the ref, it should just
> set FETCH_HEAD instead and the puller can decide what he wants to do with
> that ref (possibly merge it, but possibly just make it be a new local
> branch "remote-branch").

My expectation is that the puller will have a ref "remote-branch", and
will therefore: (1) want to update it, and (2) know the last commit pulled
from it. In this situation, we can skip figuring out the start (the two
points I didn't quote), because we saved it from before.

At least, this is how I've always done it; I've got a "linus" branch that
follows the public repo, and I commit changes to a different branch. I
suppose one could skip hanging onto this info, but it seems like an
obviously useful thing to keep, if for no other reason than that I want to
diff against it. This is essentially promoting FETCH_HEAD to a refs/heads/
thing, and having separate ones when you pull from separate sources.

I suppose things are different if you do a lot of one-shot pulls, rather
than tracking branches that you pull from; I'll need to think about this
case (assuming that's actually what you do).

	-Daniel
*This .sig left intentionally blank*


* Re: "git-send-pack"
  2005-06-30 22:25         ` "git-send-pack" Daniel Barkalow
@ 2005-06-30 23:56           ` Linus Torvalds
  2005-07-01  5:01             ` "git-send-pack" Daniel Barkalow
  0 siblings, 1 reply; 86+ messages in thread
From: Linus Torvalds @ 2005-06-30 23:56 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Junio C Hamano, git



On Thu, 30 Jun 2005, Daniel Barkalow wrote:
> 
> My expectation is that the puller will have a ref "remote-branch", and
> will therefore: (1) want to update it, and (2) know the last commit pulled
> from it. In this situation, we can skip figuring out the start (the two
> points I didn't quote), because we saved it from before.

This is _never_ how I do things, so I think that's a bad expectation. I 
have other people's trees "just show up", since they are actually based on 
mine..

		Linus


* Re: "git-send-pack"
  2005-06-30 23:56           ` "git-send-pack" Linus Torvalds
@ 2005-07-01  5:01             ` Daniel Barkalow
  0 siblings, 0 replies; 86+ messages in thread
From: Daniel Barkalow @ 2005-07-01  5:01 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git

On Thu, 30 Jun 2005, Linus Torvalds wrote:

> On Thu, 30 Jun 2005, Daniel Barkalow wrote:
> > 
> > My expectation is that the puller will have a ref "remote-branch", and
> > will therefore: (1) want to update it, and (2) know the last commit pulled
> > from it. In this situation, we can skip figuring out the start (the two
> > points I didn't quote), because we saved it from before.
> 
> This is _never_ how I do things, so I think that's a bad expectation. I 
> have other people's trees "just show up", since they are actually based on 
> mine..

Okay, so my next task will be to support this case.

What I'm doing now is:

 - if the source is using an old version, fall back on individual objects

 - send one (or more) ids to exclude

 - find out if the server recognized any of the ids

 - if not, fall back on transferring individual objects (or we could try
   another batch)

 - request a pack for the given hash, excluding whatever we've said to
   exclude

I've implemented this for the case of updating a head, and got it to
transfer a pack of 11 objects. It took 31s (including connecting) to
transfer the entire history of git (3973 objects) over a DSL-DSL link with
a 39ms ping time. I sent the same thing with the old method previously,
and it took ages (wasn't timing it, though).

It should be possible to notice that we're not updating a ref, send all
the refs you have instead, see if the source recognized any, try again
with the next 70 commits, check, and repeat. Does this match what you were
suggesting?
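
The batch-and-check loop could be sketched roughly like this. It is a
local simulation only: the "server" is just a file of known ids, and
negotiate and its arguments are hypothetical names, not part of any
existing tool.

```shell
#!/bin/sh
# negotiate CANDIDATES KNOWN BATCH
#   CANDIDATES: the puller's commit ids, newest first, one per line
#   KNOWN:      ids the source has, one per line
#   BATCH:      how many ids to offer per round trip
# Prints the first offered id that the source recognizes and returns 0,
# or returns 1 if every batch is exhausted without a hit.
negotiate() {
    total=$(($(wc -l < "$1")))
    start=1
    while [ "$start" -le "$total" ]; do
        end=$((start + $3 - 1))
        # One round trip: offer lines start..end, keep the first recognized.
        hit=$(sed -n "${start},${end}p" "$1" | grep -xF -f "$2" | head -n 1) || true
        if [ -n "$hit" ]; then
            printf '%s\n' "$hit"
            return 0
        fi
        start=$((end + 1))
    done
    return 1
}
```

The id it prints is what would then be excluded when requesting the pack;
a return of 1 corresponds to falling back on individual objects.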

I can send you the messy version tomorrow if you want to hack on it or
test it, and I'll have a clean patch series over the weekend.

	-Daniel
*This .sig left intentionally blank*


* Re: "git-send-pack"
  2005-06-30 20:38   ` "git-send-pack" Junio C Hamano
  2005-06-30 21:05     ` "git-send-pack" Daniel Barkalow
@ 2005-06-30 21:08     ` Linus Torvalds
  2005-06-30 21:10     ` "git-send-pack" Dan Holmsand
  2 siblings, 0 replies; 86+ messages in thread
From: Linus Torvalds @ 2005-06-30 21:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Daniel Barkalow, git

On Thu, 30 Jun 2005, Junio C Hamano wrote:
> 
> Now is there anything for us poor mortals who would want to have
> a "pull" support?  Logging in via ssh and run send-pack on the
> other end is workable but not so pretty ;-).

I'm thinking about it. You can't actually do send-pack from the other end,
since send-pack needs to know what the base is, and the base you have may
not even exist in the remote.

So a "git-pull-pack" will follow the objects on the other side until it 
hits one we have, and _then_ it can send a nice pack. It's not hard per 
se, and some of the problems are actually simpler than git-send-pack, but 
it needs more communication (and in order to be efficient you want to not 
ping-pong a "do-you-have-it" query every time around).

I also want to make sure that the biggest burden is on the pull side, not 
the push side. I have a plan, though.

		Linus


* Re: "git-send-pack"
  2005-06-30 20:38   ` "git-send-pack" Junio C Hamano
  2005-06-30 21:05     ` "git-send-pack" Daniel Barkalow
  2005-06-30 21:08     ` "git-send-pack" Linus Torvalds
@ 2005-06-30 21:10     ` Dan Holmsand
  2 siblings, 0 replies; 86+ messages in thread
From: Dan Holmsand @ 2005-06-30 21:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Daniel Barkalow, git

Junio C Hamano wrote:
>>>>>>"LT" == Linus Torvalds <torvalds@osdl.org> writes:
> 
> 
> LT> In fact, the most recent push was done with a
> 
> LT> 	git-send-pack master.kernel.org:/pub/scm/linux/kernel/git/torvalds/git.git
> 
> Congrats for a job well done.

Agree totally. And the whole pack thing is really cool. Git is sooo much
faster when running from pack-files only on my poor laptop.

> Now is there anything for us poor mortals who would want to have
> a "pull" support?  Logging in via ssh and run send-pack on the
> other end is workable but not so pretty ;-).

Agreed again :-)

Even cooler would be pack-pulls via http. That would be a bit hard on 
the servers with the current git-pack-objects, but it ought to be 
possible to create something similar that doesn't re-delta anything, but 
instead just spits out what's in an existing pack-file, and (perhaps) 
deltifies objects from the file system.

If people then re-pack their repositories occasionally, this should be 
plenty fast, the number of files for rsync to deal with could be kept 
down, as could download times for mortal users.

/dan


* Re: "git-send-pack"
  2005-06-30 17:54 "git-send-pack" Linus Torvalds
                   ` (3 preceding siblings ...)
  2005-06-30 19:44 ` "git-send-pack" Linus Torvalds
@ 2005-06-30 19:49 ` Daniel Barkalow
  2005-06-30 20:12   ` "git-send-pack" Linus Torvalds
  4 siblings, 1 reply; 86+ messages in thread
From: Daniel Barkalow @ 2005-06-30 19:49 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List, Junio C Hamano, ftpadmin

On Thu, 30 Jun 2005, Linus Torvalds wrote:

> Anyway, what are the limitations? Here's a few obvious ones:
> 
>  - I really hate how "ssh" apparently cannot be told to have alternate 
>    paths. For example, on master.kernel.org, I don't control the setup, so 
>    I can't install my own git binaries anywhere except in my ~/bin
>    directory, but I also cannot get ssh to accept that that is a valid 
>    path. This one really bums me out, and I think it's an ssh deficiency. 
> 
>    You apparently have to compile in the paths at compile-time into sshd, 
>    and PermitUserEnvironment is disabled by default (not that it even 
>    seems to work for the PATH environment, but that may have been my 
>    testing that didn't re-start sshd).
> 
>    That just sucks.

The easiest thing might be to have a centrally-installed wrapper script
that could run programs installed in your home directory. E.g., if
"git" had a "source ~/.git-env" at the beginning, and your ~/.git-env
fixed your PATH, then "git receive-pack ARGS" should work, for a generic
centrally installed git and special stuff in your home directory.
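
A minimal sketch of that wrapper, written as a shell function so it can be
tried out locally. The name git_wrapper is made up; ~/.git-env is the file
suggested above and is assumed to contain plain shell such as a PATH
assignment.

```shell
#!/bin/sh
# git_wrapper SUBCOMMAND [ARGS...]
# Source the user's ~/.git-env if it is readable, then run git-SUBCOMMAND
# from whatever PATH results. The body runs in a subshell so the caller's
# environment is left untouched.
git_wrapper() (
    if [ -r "$HOME/.git-env" ]; then
        . "$HOME/.git-env"
    fi
    cmd=$1
    shift
    exec "git-$cmd" "$@"
)
```

With a ~/.git-env containing PATH=$HOME/bin:$PATH, "git_wrapper
receive-pack ARGS" would pick up a git-receive-pack installed in ~/bin.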

>  - It doesn't update the working directory at the other end. This is fine 
>    for what it's intended for (pushing to a central "raw" git archives), 
>    so this could be considered a feature, but it's worth pointing out. 
>    Only a "pull" will update your working directory, and this pack sending 
>    really is meant to be used in a kind of "push to central archive" way.

I thought only "resolve" (as part of "fetch") updated your working
directory, so this is completely consistent.

>  - this is also (at least once we've tested it a lot more and added the
>    code to allow it to create new refs on the remote side) meant to be a
>    good way to mirror things out, since clearly rsync isn't scaling. 
> 
>    However, I don't know what the rules for acceptable mirroring 
>    approaches are, and it's entirely possible (nay, probable) that an ssh
>    connection from the "master" ain't it. It would be good to know what 
>    (if any) would be acceptable solutions..

The right solution probably involves getting each pack file you push to
the mirrors as well as to the master. They'll probably update no less
frequently than you push, and they should go through a series of states
which matches the master, so it's not necessary to have anything smart on
master sending them, and they only have to unpack the files they get (and
update the refs afterward). That should make the cross-system trust
requirements relatively minimal; the mirror can fetch things from master,
and neither side has to allow the other to specify a command line.

	-Daniel
*This .sig left intentionally blank*


* Re: "git-send-pack"
  2005-06-30 19:49 ` "git-send-pack" Daniel Barkalow
@ 2005-06-30 20:12   ` Linus Torvalds
  2005-06-30 20:23     ` "git-send-pack" H. Peter Anvin
  2005-06-30 20:49     ` "git-send-pack" Daniel Barkalow
  0 siblings, 2 replies; 86+ messages in thread
From: Linus Torvalds @ 2005-06-30 20:12 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Git Mailing List, Junio C Hamano, ftpadmin

On Thu, 30 Jun 2005, Daniel Barkalow wrote:
> 
> The right solution probably involves getting each pack file you push to
> the mirrors as well as to the master. They'll probably update no less
> frequently than you push, and they should go through a series of states
> which matches the master, so it's not necessary to have anything smart on
> master sending them, and they only have to unpack the files they get (and
> update the refs afterward).

Hmm, yes. That would work, together with just fetching the heads.

It won't _really_ solve the problem, since the pushed pack objects will
grow at a proportional rate to the current objects - it's just a constant
factor (admittedly a potentially fairly _big_ constant factor)  
improvement both in size and in number of files.

So the mirroring ends up getting slowly slower and slower as the number of 
pack files goes up. In contrast, a git-aware thing can be basically 
constant-time, and mirroring expense ends up being relative to the size of 
the change rather than the size of the repository.

But mirroring just pack-files might solve the problem for the foreseeable 
future, so..

"git-receive-pack" would need to take a flag telling it not to unpack the
incoming pack but just to check it (ie call "git-unpack-objects" with the
"-n" flag - it will check that everything looks ok, including the embedded
protecting SHA1 hash), write it out to the filesystem (as it comes in), and
then rename it to the right place.
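
The write, check, rename sequence could look roughly like this. It is only
a sketch: store_pack is a hypothetical helper, and the check command is
passed in by the caller rather than hard-coded.

```shell
#!/bin/sh
# store_pack DEST CHECK [ARGS...]
# Save stdin to a temp file next to DEST, run CHECK with that file on its
# stdin, and rename the file to DEST only if CHECK succeeds. On failure
# the temp file is removed and DEST is never created.
store_pack() {
    dest=$1
    shift
    tmp=$(mktemp "$dest.tmp.XXXXXX") || return 1
    if cat > "$tmp" && "$@" < "$tmp"; then
        mv "$tmp" "$dest"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Inside a repository the check could be "git unpack-objects -n", for
example: store_pack incoming.pack git unpack-objects -n < some.pack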

			Linus


* Re: "git-send-pack"
  2005-06-30 20:12   ` "git-send-pack" Linus Torvalds
@ 2005-06-30 20:23     ` H. Peter Anvin
  2005-06-30 20:52       ` "git-send-pack" Linus Torvalds
  2005-06-30 20:49     ` "git-send-pack" Daniel Barkalow
  1 sibling, 1 reply; 86+ messages in thread
From: H. Peter Anvin @ 2005-06-30 20:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Daniel Barkalow, Git Mailing List, Junio C Hamano, ftpadmin

Linus Torvalds wrote:
> 
> It won't _really_ solve the problem, since the pushed pack objects will
> grow at a proportional rate to the current objects - it's just a constant
> factor (admittedly a potentially fairly _big_ constant factor)  
> improvement both in size and in number of files.
> 

If I've understood this correctly, it's not a constant factor 
improvement in the number of files (in the size, yes); it's changing it 
from O(t*c) to O(t) where t is number of trees and c is number of 
changesets.  That's key.

The problem we're having (on kernel.org) right now is that there isn't a 
hierarchical time stamp in Unix, so we have to compare on a file-by-file 
level.  rsync is quite good at discovering an invariant beginning of a 
file, but when it comes to a mass of files it has to compare the stamps 
on each and every one, each time.  It will only descend into a single 
file, however, if that file has had its timestamp changed.

For the purposes of rsync, storing the objects in a single append-only 
file would be a very efficient method, since the rsync algorithm will 
quickly discover an invariant head and only transmit the tail.  It's not 
ideal, and having something git-aware would be better, but I think it 
really would be nice to have something which also plays well with rsync. 
There is a *lot* of infrastructure in rsync which is actually hard to 
replicate with another tool (including the server architecture); in many 
ways it would be easier to convince the rsync developers to create a 
plugin architecture and re-use all that code rather than developing an 
equivalent tool from scratch.

	-hpa


* Re: "git-send-pack"
  2005-06-30 20:23     ` "git-send-pack" H. Peter Anvin
@ 2005-06-30 20:52       ` Linus Torvalds
  2005-06-30 21:23         ` "git-send-pack" H. Peter Anvin
  0 siblings, 1 reply; 86+ messages in thread
From: Linus Torvalds @ 2005-06-30 20:52 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Daniel Barkalow, Git Mailing List, Junio C Hamano, ftpadmin

On Thu, 30 Jun 2005, H. Peter Anvin wrote:
> 
> If I've understood this correctly, it's not a constant factor 
> improvement in the number of files (in the size, yes); it's changing it 
> from O(t*c) to O(t) where t is number of trees and c is number of 
> changesets.  That's key.

No, it _is_ a constant factor even in number of files, if you just keep 
the pack objects around without re-packing them.

Basically, you'd get one new pack-file every time I push. That's better
than getting <n> "raw object" files (where <n> can be anything from just a
couple to several thousand, depending on whether I had pulled things), but
it's still just a constant factor on both number of files and size of
files.

Now, you could re-pack the objects every once in a while: it would force a
whole new "epoch", of course and then the mirrorers would have to fetch
the whole repacked file, but that might be fine. Especially if you stop
re-packing after you've hit a certain size (say, a couple of megs), and
then start on the next pack.

> For the purposes of rsync, storing the objects in a single append-only 
> file would be a very efficient method, since the rsync algorithm will 
> quickly discover an invariant head and only transmit the tail.

Actually, it won't be "quick" - it will have to read the whole file and do 
its hash window thing.

You _could_ append the pack-files into one single "superpack" file (since
you can figure out where the pack boundaries are), but it would be
extremely big after a while, and rsync would spend all its time doing over
the hash window. You'd definitely be better off with re-packing.

		Linus


* Re: "git-send-pack"
  2005-06-30 20:52       ` "git-send-pack" Linus Torvalds
@ 2005-06-30 21:23         ` H. Peter Anvin
  2005-06-30 21:26           ` "git-send-pack" H. Peter Anvin
  2005-06-30 21:42           ` "git-send-pack" Linus Torvalds
  0 siblings, 2 replies; 86+ messages in thread
From: H. Peter Anvin @ 2005-06-30 21:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Daniel Barkalow, Git Mailing List, Junio C Hamano, ftpadmin

Linus Torvalds wrote:
> 
>>For the purposes of rsync, storing the objects in a single append-only 
>>file would be a very efficient method, since the rsync algorithm will 
>>quickly discover an invariant head and only transmit the tail.
> 
> Actually, it won't be "quick" - it will have to read the whole file and do 
> its hash window thing.
> 

It does that, but it only has to do that when the actual file has 
changed.  That's acceptable, at least for the repository sizes we're 
likely to deal with within the medium term.

	-hpa

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: "git-send-pack"
  2005-06-30 21:23         ` "git-send-pack" H. Peter Anvin
@ 2005-06-30 21:26           ` H. Peter Anvin
  2005-06-30 21:42           ` "git-send-pack" Linus Torvalds
  1 sibling, 0 replies; 86+ messages in thread
From: H. Peter Anvin @ 2005-06-30 21:26 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Daniel Barkalow, Git Mailing List, Junio C Hamano,
	ftpadmin

H. Peter Anvin wrote:
> Linus Torvalds wrote:
> 
>>
>>> For the purposes of rsync, storing the objects in a single 
>>> append-only file would be a very efficient method, since the rsync 
>>> algorithm will quickly discover an invariant head and only transmit 
>>> the tail.
>>
>>
>> Actually, it won't be "quick" - it will have to read the whole file 
>> and do its hash window thing.
>>
> 
> It does that, but it only has to do that when the actual file has 
> changed.  That's acceptable, at least for the repository sizes we're 
> likely to deal with within the medium term.
> 

I guess I should clarify a bit here.  I'm concerned with two aspects: 
the "keeping mirrors in sync" problem, where asking people to use a tool 
other than rsync is a really tough sell, and the developer usage 
scenario, in which case something git-aware is obviously the better thing.

	-hpa

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: "git-send-pack"
  2005-06-30 21:23         ` "git-send-pack" H. Peter Anvin
  2005-06-30 21:26           ` "git-send-pack" H. Peter Anvin
@ 2005-06-30 21:42           ` Linus Torvalds
  2005-06-30 22:00             ` "git-send-pack" H. Peter Anvin
  1 sibling, 1 reply; 86+ messages in thread
From: Linus Torvalds @ 2005-06-30 21:42 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Daniel Barkalow, Git Mailing List, Junio C Hamano, ftpadmin

On Thu, 30 Jun 2005, H. Peter Anvin wrote:
> 
> It does that, but it only has to do that when the actual file has 
> changed.  That's acceptable, at least for the repository sizes we're 
> likely to deal with within the medium term.

Well, realize that "incremental packs" deltify a lot worse than a "big
pack", since pack-files don't do deltas to objects outside the pack-file.

So we'd get _some_ compression, but not as much as possible. The current
kernel compresses down to a single 63 MB pack-file (that's with the 2.6.11
tree too, not just the HEAD history), but without deltas it weighs in at
about 177 MB.

So a "sum of incremental packs" should be somewhere in between those two
values, even today. For a single kernel archive.

So repository sizes aren't exactly trivial. I don't know how expensive
that rsync hash thing is, but one thing you lose is the ability to
hardlink objects, so if you have a few kernel repositories at some point
it doesn't fit in the cache any more, and then the rsync will have to read
that much pack object stuff from disk in addition to doing the hash. Ugh.

		Linus

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: "git-send-pack"
  2005-06-30 21:42           ` "git-send-pack" Linus Torvalds
@ 2005-06-30 22:00             ` H. Peter Anvin
  2005-07-01 10:31               ` "git-send-pack" Matthias Urlichs
  2005-07-01 13:56               ` Tags Eric W. Biederman
  0 siblings, 2 replies; 86+ messages in thread
From: H. Peter Anvin @ 2005-06-30 22:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Daniel Barkalow, Git Mailing List, Junio C Hamano, ftpadmin

Linus Torvalds wrote:
> 
> On Thu, 30 Jun 2005, H. Peter Anvin wrote:
> 
>>It does that, but it only has to do that when the actual file has 
>>changed.  That's acceptable, at least for the repository sizes we're 
>>likely to deal with within the medium term.
> 
> 
> Well, realize that "incremental packs" deltify a lot worse than a "big
> pack", since pack-files don't do deltas to objects outside the pack-file.
> 
> So we'd get _some_ compression, but not as much as possible. The current
> kernel compresses down to a single 63 MB pack-file (that's with the 2.6.11
> tree too, not just the HEAD history), but without deltas it weighs in at
> about 177 MB.
> 
> So a "sum of incremental packs" should be somewhere in between those two
> values, even today. For a single kernel archive.
> 
> So repository sizes aren't exactly trivial. I don't know how expensive
> that rsync hash thing is, but one thing you lose is the ability to
> hardlink objects, so if you have a few kernel repositories at some point
> it doesn't fit in the cache any more, and then the rsync will have to read
> that much pack object stuff from disk in addition to doing the hash. Ugh.
> 

The bulk of the cost in doing the hashing comes from having to read the 
file.

Well, if you grow a single pack file with appending, then you can have 
delta references to earlier objects within the same pack file.

At least at this point, we'd handle a few very large files a lot better 
than an enormous swarm of smaller ones.

In the end, it might be that the right thing to do for git on kernel.org 
is to have a single, unified object store which isn't accessible by 
anything other than git-specific protocols.  There would have to be some 
way of dealing with, for example, conflicting tags that apply to 
different repositories, though.

	-hpa

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: "git-send-pack"
  2005-06-30 22:00             ` "git-send-pack" H. Peter Anvin
@ 2005-07-01 10:31               ` Matthias Urlichs
  2005-07-01 14:43                 ` "git-send-pack" Jan Harkes
  2005-07-01 13:56               ` Tags Eric W. Biederman
  1 sibling, 1 reply; 86+ messages in thread
From: Matthias Urlichs @ 2005-07-01 10:31 UTC (permalink / raw)
  To: git

Hi, H. Peter Anvin wrote:

> In the end, it might be that the right thing to do for git on kernel.org 
> is to have a single, unified object store which isn't accessible by 
> anything other than git-specific protocols.

Makes sense.

>  There would have to be some 
> way of dealing with, for example, conflicting tags that apply to 
> different repositories, though.
>
It seems that user-specific subdirectories in refs/heads (and, presumably,
../tags) mostly work already.

-- 
Matthias Urlichs   |   {M:U} IT Design @ m-u-it.de   |  smurf@smurf.noris.de
Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de
 - -
Don't lock the barn after it is stolen.

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: "git-send-pack"
  2005-07-01 10:31               ` "git-send-pack" Matthias Urlichs
@ 2005-07-01 14:43                 ` Jan Harkes
  0 siblings, 0 replies; 86+ messages in thread
From: Jan Harkes @ 2005-07-01 14:43 UTC (permalink / raw)
  To: Matthias Urlichs; +Cc: git

On Fri, Jul 01, 2005 at 12:31:53PM +0200, Matthias Urlichs wrote:
> > In the end, it might be that the right thing to do for git on kernel.org 
> > is to have a single, unified object store which isn't accessible by 
> > anything other than git-specific protocols.
> 
> Makes sense.
> 
> >  There would have to be some 
> > way of dealing with, for example, conflicting tags that apply to 
> > different repositories, though.
>
> It seems that user-specific subdirectories in refs/heads (and, presumably,
> ../tags) mostly work already.

They work pretty well, the core git commands have no problem with them
and I just sent off some patches for gitweb and gitk.

All git/objects directories can be merged into a common repository. The
refs/heads and refs/tags can be copied to user-specific subdirectories.

Then a pull like,
    git pull http://www.kernel.org/.../torvalds/linux-2.6.git

Would become,
    git pull http://www.kernel.org/.../linux-2.6.git torvalds/linux-2.6/master

It would make rsync more expensive for people who are interested in only
a branch or two, but there is only one repository which should be easier
on the mirrors. The http, ssh, and some future 'pack' transfer methods
won't see a difference since they only pull the specific commits they
need to catch up with a branch.
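
As a sketch of the ref renaming this implies (Python, with a hypothetical helper name; the owner path "torvalds/linux-2.6" is taken from the pull example above, everything else is an assumption):

```python
def unified_ref(owner_path, ref):
    """Map a ref from a per-user repository into the shared repository.

    "refs/heads/master" from the repository at "torvalds/linux-2.6"
    becomes "refs/heads/torvalds/linux-2.6/master".
    """
    prefix, kind, name = ref.split("/", 2)
    if prefix != "refs" or kind not in ("heads", "tags"):
        raise ValueError("not a head or tag ref: " + ref)
    return f"refs/{kind}/{owner_path}/{name}"


def branch_to_pull(owner_path, branch):
    """Branch name to give 'git pull' against the shared repository."""
    full = unified_ref(owner_path, "refs/heads/" + branch)
    return full[len("refs/heads/"):]
```

So branch_to_pull("torvalds/linux-2.6", "master") gives the "torvalds/linux-2.6/master" argument used in the second pull command.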

Jan

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Tags
  2005-06-30 22:00             ` "git-send-pack" H. Peter Anvin
  2005-07-01 10:31               ` "git-send-pack" Matthias Urlichs
@ 2005-07-01 13:56               ` Eric W. Biederman
  2005-07-01 16:37                 ` Tags H. Peter Anvin
  2005-07-01 18:09                 ` Tags Petr Baudis
  1 sibling, 2 replies; 86+ messages in thread
From: Eric W. Biederman @ 2005-07-01 13:56 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Daniel Barkalow, Git Mailing List, Junio C Hamano,
	ftpadmin

"H. Peter Anvin" <hpa@zytor.com> writes:

> In the end, it might be that the right thing to do for git on kernel.org is to
> have a single, unified object store which isn't accessible by anything other
> than git-specific protocols.  There would have to be some way of dealing with,
> for example, conflicting tags that apply to different repositories, though.

As far as I can tell public distributed tags are not that hard and if
you are going to be synching them it is probably worth working on.

The basic idea is that instead of having one global tag of
'linux-2.6.13-rc1' you have a global tag of
'torvalds@osdl.org/linux-2.6.13-rc1'.

The important part is that the tag namespace is made hierarchical
with at least 2 levels.  Where the top level is a globally
unique tag owner id and the bottom level is the actual tag.  This
prevents collisions when merging trees because two people's
tags are never in the same namespace, at least when
people are not actively hostile :)

Still being a complete git dummy I think the trivial mapping is
to put tags in:
.git/refs/tags/user@domain/tag
and then have a symlink at:
.git/TAGS 
that points to your default directory of tags.
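
A small Python sketch of that mapping, under the assumption that the .git/TAGS symlink can be modelled as a "default owner" string. The function names are invented for the example and are not anything git provides.

```python
from pathlib import PurePosixPath


def tag_path(owner, tag, git_dir=".git"):
    """Location of a tag in the two-level namespace: refs/tags/<owner>/<tag>."""
    return PurePosixPath(git_dir) / "refs" / "tags" / owner / tag


def resolve(name, default_owner):
    """Split a tag name into (owner, tag).

    A fully qualified "user@domain/tag" is used as given; a bare tag
    name falls into the default owner's namespace, which is the role
    the .git/TAGS symlink would play.
    """
    owner, sep, tag = name.partition("/")
    if sep and "@" in owner:
        return owner, tag
    return default_owner, name
```

For example, tag_path("torvalds@osdl.org", "linux-2.6.13-rc1") yields .git/refs/tags/torvalds@osdl.org/linux-2.6.13-rc1.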

Eric

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Tags
  2005-07-01 13:56               ` Tags Eric W. Biederman
@ 2005-07-01 16:37                 ` H. Peter Anvin
  2005-07-01 22:38                   ` Tags Eric W. Biederman
  2005-07-01 18:09                 ` Tags Petr Baudis
  1 sibling, 1 reply; 86+ messages in thread
From: H. Peter Anvin @ 2005-07-01 16:37 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linus Torvalds, Daniel Barkalow, Git Mailing List, Junio C Hamano,
	ftpadmin

Eric W. Biederman wrote:
> "H. Peter Anvin" <hpa@zytor.com> writes:
> 
> 
>>In the end, it might be that the right thing to do for git on kernel.org is to
>>have a single, unified object store which isn't accessible by anything other
>>than git-specific protocols.  There would have to be some way of dealing with,
>>for example, conflicting tags that apply to different repositories, though.
> 
> 
> As far as I can tell public distributed tags are not that hard and if
> you are going to be synching them it is probably worth working on.
> 
> The basic idea is that instead of having one global tag of
> 'linux-2.6.13-rc1' you have a global tag of
> 'torvalds@osdl.org/linux-2.6.13-rc1'.
> 
> The important part is that the tag namespace is made hierarchical
> with at least 2 levels.  Where the top level is a globally
> unique tag owner id and the bottom level is the actual tag.  This
> prevents collisions when merging trees because two people's
> tags are never in the same namespace, at least when
> people are not actively hostile :)
> 
> Still being a complete git dummy I think the trivial mapping is
> to put tags in:
> .git/refs/tags/user@domain/tag
> and then have a symlink at:
> .git/TAGS 
> that points to your default directory of tags.
> 

Unless you have an authentication mechanism and *enforce* it (you can do 
that with GPG signatures if *and only if* your disambiguation includes 
your GPG signature fingerprint) you still have a problem with someone 
introducing fake tags as a DoS attack.

	-hpa

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Tags
  2005-07-01 16:37                 ` Tags H. Peter Anvin
@ 2005-07-01 22:38                   ` Eric W. Biederman
  2005-07-01 22:44                     ` Tags H. Peter Anvin
  0 siblings, 1 reply; 86+ messages in thread
From: Eric W. Biederman @ 2005-07-01 22:38 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Daniel Barkalow, Git Mailing List, Junio C Hamano,
	ftpadmin

"H. Peter Anvin" <hpa@zytor.com> writes:

> Eric W. Biederman wrote:
>> "H. Peter Anvin" <hpa@zytor.com> writes:
>>
> Unless you have an authentication mechanism and *enforce* it (you can do that
> with GPG signatures if *and only if* your disambiguation includes your GPG
> signature fingerprint) you still have a problem with someone introducing fake
> tags as a DoS attack.

There is a question of how bad is this.   For releases you certainly
need some kind of signature that people can verify and we
already have that but I think we can keep spoofing tags
down to the same level as spoofing patches.

Basically all this takes is to make your global namespace
the committer email address and you have the rule that
you can only tag your own commits.  Then when you merge
tags you never automatically add tags to your own tag namespace.

I think that is enough to make global tags usable in practice.

And for those who are typing challenged if all you ever
look at are your own tags then you should never need to
specify a fully qualified tag name as git should be able
to find the committer email address through other means.
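
The two rules can be sketched in a few lines of Python. The data layout (a dict keyed by (owner, tag) pairs) and the function names are assumptions for illustration; nothing here authenticates the email address, it only encodes the bookkeeping.

```python
def may_create_tag(tagger_email, commit_committer_email):
    """Rule 1: you can only tag your own commits."""
    return tagger_email == commit_committer_email


def merge_tags(mine, theirs, my_email):
    """Rule 2: merging never adds tags to my own namespace.

    Both arguments map (owner_email, tag_name) -> commit id.
    Tags I already have are kept as they are.
    """
    merged = dict(mine)
    for (owner, tag), commit in theirs.items():
        if owner == my_email:
            continue  # only I get to create tags under my address
        merged.setdefault((owner, tag), commit)
    return merged
```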

Eric

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Tags
  2005-07-01 22:38                   ` Tags Eric W. Biederman
@ 2005-07-01 22:44                     ` H. Peter Anvin
  2005-07-01 23:07                       ` Tags Eric W. Biederman
  2005-07-02 16:00                       ` Tags Matthias Urlichs
  0 siblings, 2 replies; 86+ messages in thread
From: H. Peter Anvin @ 2005-07-01 22:44 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linus Torvalds, Daniel Barkalow, Git Mailing List, Junio C Hamano,
	ftpadmin

Eric W. Biederman wrote:
> 
> There is a question of how bad is this.   For releases you certainly
> need some kind of signature that people can verify and we
> already have that but I think we can keep spoofing tags
> down to the same level as spoofing patches.
> 
> Basically all this takes is to make your global namespace
> the committer email address and you have the rule that
> you can only tag your own commits.  Then when you merge
> tags you never automatically add tags to your own tag namespace.
> 

Doesn't work.  You can trivially generate a key with someone else's 
address.  It would require a full PKI.

	-hpa

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Tags
  2005-07-01 22:44                     ` Tags H. Peter Anvin
@ 2005-07-01 23:07                       ` Eric W. Biederman
  2005-07-01 23:22                         ` Tags Daniel Barkalow
  2005-07-02  0:06                         ` Tags H. Peter Anvin
  2005-07-02 16:00                       ` Tags Matthias Urlichs
  1 sibling, 2 replies; 86+ messages in thread
From: Eric W. Biederman @ 2005-07-01 23:07 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Daniel Barkalow, Git Mailing List, Junio C Hamano,
	ftpadmin

"H. Peter Anvin" <hpa@zytor.com> writes:

> Eric W. Biederman wrote:
>> There is a question of how bad is this.   For releases you certainly
>> need some kind of signature that people can verify and we
>> already have that but I think we can keep spoofing tags
>> down to the same level as spoofing patches.
>> Basically all this takes is to make your global namespace
>> the committer email address and you have the rule that
>> you can only tag your own commits.  Then when you merge
>> tags you never automatically add tags to your own tag namespace.
>>
>
> Doesn't work.  You can trivially generate a key with someone else's address.  It
> would require a full PKI.

I'm not saying it's provably correct.  I'm simply saying it is as
correct as the rest of the git repository.

If I really care what developer xyz tagged I will pull from them,
or a mirror I trust.  And since developer xyz doesn't pull his
own global tags from other repositories that should be sufficient.

Plus if you pull from a spoofed tag somewhere further along
when you merge your code the merge will fail because what
you thought was a common ancestor isn't.  And you will
also likely get an error when you have the same tag
coming from 2 different sources with different values.

So all I am really arguing is that using the committer
email address is simply sufficient to prevent non-malicious
conflicts between developers, and it makes it enough
that to get a malicious conflict isn't completely trivial.
So I think it is good enough.

But for releases and things lots of people must trust yes you want
a full PKI infrastructure but I don't see a reason any of that
should be inherently tied to tags.

Eric

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Tags
  2005-07-01 23:07                       ` Tags Eric W. Biederman
@ 2005-07-01 23:22                         ` Daniel Barkalow
  2005-07-02  0:06                         ` Tags H. Peter Anvin
  1 sibling, 0 replies; 86+ messages in thread
From: Daniel Barkalow @ 2005-07-01 23:22 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: H. Peter Anvin, Linus Torvalds, Git Mailing List, Junio C Hamano,
	ftpadmin

On Fri, 1 Jul 2005, Eric W. Biederman wrote:

> Plus if you pull from a spoofed tag somewhere further along
> when you merge your code the merge will fail because what
> you thought was a common ancestor isn't.  And you will
> also likely get an error when you have the same tag
> coming from 2 different sources with different values.

Actually, I think it would be beneficial to support multiple tags with the
same name in any case: if people are going to use local private tags like
"broken", either we need to support having refs/tags/broken being a list
of hashes, or any particular user can only have one broken version.

I don't see any major problems with having refs/ files contain potentially
multiple hashes (limited by what makes sense to be multiple; i.e., heads/*
should have only one value), and this lets the users check the content of
the tag objects to figure out what they care about, and either specify
things in more detail or discard things they don't like (or, when
appropriate, use all values). The main issue I see is that rsync wouldn't
merge them usefully.
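
A sketch of what a multi-valued ref file could look like to a git-aware tool, in Python. The one-hash-per-line format and the helper names are assumptions; the thread does not specify a format. It also shows the rsync problem: a file copy replaces one list with the other, while a useful merge is the union.

```python
def parse_ref(text):
    """Read a ref file holding one hash per line; blank lines are ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def merge_refs(a, b):
    """Union of two hash lists, keeping first-seen order and dropping duplicates."""
    merged = list(a)
    for h in b:
        if h not in merged:
            merged.append(h)
    return merged


def single_value(hashes):
    """For refs such as heads/* that should have exactly one value."""
    if len(hashes) != 1:
        raise ValueError("expected exactly one hash, got %d" % len(hashes))
    return hashes[0]
```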

(And it would be useful to have a structure to support keeping a simple
piece of information about a set of objects.)

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Tags
  2005-07-01 23:07                       ` Tags Eric W. Biederman
  2005-07-01 23:22                         ` Tags Daniel Barkalow
@ 2005-07-02  0:06                         ` H. Peter Anvin
  2005-07-02  7:00                           ` Tags Eric W. Biederman
  2005-07-02 20:38                           ` Tags Jan Harkes
  1 sibling, 2 replies; 86+ messages in thread
From: H. Peter Anvin @ 2005-07-02  0:06 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linus Torvalds, Daniel Barkalow, Git Mailing List, Junio C Hamano,
	ftpadmin

Eric W. Biederman wrote:
> 
> If I really care what developer xyz tagged I will pull from them,
> or a mirror I trust.  And since developer xyz doesn't pull his
> own global tags from other repositories that should be sufficient.
> 

You're missing something totally and utterly fundamental here: I'm 
talking about creating an infrastructure (think sourceforge) where there 
is only one git repository for the whole system, period, full stop, end 
of story.

	-hpa

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Tags
  2005-07-02  0:06                         ` Tags H. Peter Anvin
@ 2005-07-02  7:00                           ` Eric W. Biederman
  2005-07-02 17:47                             ` Tags H. Peter Anvin
  2005-07-02 20:38                           ` Tags Jan Harkes
  1 sibling, 1 reply; 86+ messages in thread
From: Eric W. Biederman @ 2005-07-02  7:00 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Daniel Barkalow, Git Mailing List, Junio C Hamano,
	ftpadmin

"H. Peter Anvin" <hpa@zytor.com> writes:

> Eric W. Biederman wrote:
>> If I really care what developer xyz tagged I will pull from them,
>> or a mirror I trust.  And since developer xyz doesn't pull his
>> own global tags from other repositories that should be sufficient.
>>
>
> You're missing something totally and utterly fundamental here: I'm talking about
> creating an infrastructure (think sourceforge) where there is only one git
> repository for the whole system, period, full stop, end of story.

Could be.  I'm certainly not up to speed on git yet.

However all you have to do for your single system git repository is
to filter tags at creation time.  So for a person to upload something
you need a git aware tool and you need authentication so you are certain
it is the right person creating the tag.  

Since it is a shared repository you probably want rules like you can
only create tags that belong to yourself or are owned by people 
who do not have accounts on the system.
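
As a sketch, that upload rule is a one-line check once the uploader has been authenticated by some other means (Python; the function name and the set of account names are hypothetical):

```python
def may_upload_tag(uploader, tag_owner, accounts):
    """Allow a tag if it is in the uploader's own namespace, or if its
    owner has no account on this system (a mirrored, external tag)."""
    return tag_owner == uploader or tag_owner not in accounts


def filter_tags(uploader, tags, accounts):
    """Keep only the (owner, name) tags this uploader may create."""
    return [(owner, name) for owner, name in tags
            if may_upload_tag(uploader, owner, accounts)]
```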

Likewise in a system like sourceforge it is desirable to check all
of the committer information in commits as well, so you have a reasonable
audit trail, and it makes sense to check little things like the file under
a sha1 key actually matches the sha1 key.

Downstream mirrors can happily rsync just fine.  So long as they
verify the upstream source.

Tags that you mirror are of course suspect but they will always be.
The primary tags created by people with accounts should be reliable
though.

So in essence I see nothing with my proposal that is any worse than
any other part of git.

That being said, it sounds like there is a slightly more git 
knowledgeable/native version suggested having to do with multiple
heads.

Eric

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Tags
  2005-07-02  7:00                           ` Tags Eric W. Biederman
@ 2005-07-02 17:47                             ` H. Peter Anvin
  2005-07-02 17:54                               ` Tags Eric W. Biederman
  0 siblings, 1 reply; 86+ messages in thread
From: H. Peter Anvin @ 2005-07-02 17:47 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linus Torvalds, Daniel Barkalow, Git Mailing List, Junio C Hamano,
	ftpadmin

Eric W. Biederman wrote:
> 
> However all you have to do for your single system git repository is
> to filter tags at creation time.  So for a person to upload something
> you need a git aware tool and you need authentication so you are certain
> it is the right person creating the tag.  
> 

That's complicated; it pretty much works out to having to have a PKI and 
a system of registered IDs, or some such.  That's painful.

	-hpa

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Tags
  2005-07-02 17:47                             ` Tags H. Peter Anvin
@ 2005-07-02 17:54                               ` Eric W. Biederman
  2005-07-02 17:58                                 ` Tags H. Peter Anvin
  0 siblings, 1 reply; 86+ messages in thread
From: Eric W. Biederman @ 2005-07-02 17:54 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Daniel Barkalow, Git Mailing List, Junio C Hamano,
	ftpadmin

"H. Peter Anvin" <hpa@zytor.com> writes:

> Eric W. Biederman wrote:
>> However all you have to do for your single system git repository is
>> to filter tags at creation time.  So for a person to upload something
>> you need a git aware tool and you need authentication so you are certain
>> it is the right person creating the tag.
>
> That's complicated; it pretty much works out to having to have a PKI and a
> system of registered IDs, or some such.  That's painful.

?? Isn't that what ssh is?

To some extent a lot depends on how active you expect people to
try and forge things.  If there is an expectation of honesty
you are fine.  

If you want to build one mondo repository with thousands of developers
having write access you need to be more careful.  But as far as I know
none of that is specific to tags.

Eric

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Tags
  2005-07-02 17:54                               ` Tags Eric W. Biederman
@ 2005-07-02 17:58                                 ` H. Peter Anvin
  2005-07-02 18:31                                   ` Tags Eric W. Biederman
  2005-07-02 18:45                                   ` Tags Linus Torvalds
  0 siblings, 2 replies; 86+ messages in thread
From: H. Peter Anvin @ 2005-07-02 17:58 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linus Torvalds, Daniel Barkalow, Git Mailing List, Junio C Hamano,
	ftpadmin

Eric W. Biederman wrote:
> 
> ?? Isn't that what ssh is?
> 
> To some extent a lot depends on how active you expect people to
> try and forge things.  If there is an expectation of honesty
> you are fine.  
> 

I can't afford to have that.

> If you want to build one mondo repository with thousands of developers
> having write access you need to be more careful.  But as far as I know
> none of that is specific to tags.

Well, you're wrong.  Tags is the only part of git which cannot be 
protected by git's own self-validation system.

	-hpa

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Tags
  2005-07-02 17:58                                 ` Tags H. Peter Anvin
@ 2005-07-02 18:31                                   ` Eric W. Biederman
  2005-07-02 19:55                                     ` Tags Matthias Urlichs
  2005-07-02 21:16                                     ` Tags H. Peter Anvin
  2005-07-02 18:45                                   ` Tags Linus Torvalds
  1 sibling, 2 replies; 86+ messages in thread
From: Eric W. Biederman @ 2005-07-02 18:31 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Daniel Barkalow, Git Mailing List, Junio C Hamano,
	ftpadmin

"H. Peter Anvin" <hpa@zytor.com> writes:

> Eric W. Biederman wrote:
>> ?? Isn't that what ssh is?
>> To some extent a lot depends on how active you expect people to
>> try and forge things.  If there is an expectation of honesty
>> you are fine.
>
> I can't afford to have that.

So now your requirements are more stringent than sourceforge's?
Sourceforge limited things by reducing the scope of commits per
project.  But once you had commit access to a project you could do
just about anything.

>> If you want to build one mondo repository with thousands of developers
>> having write access you need to be more careful.  But as far as I know
>> none of that is specific to tags.
>
> Well, you're wrong.  Tags is the only part of git which cannot be protected by
> git's own self-validation system.

Which is why I suggested having tags in sync with the committer
information, that way you are as valid as the commit record
in git.  Although I suspect the multiple head solution is
probably better, and simply limiting the people who can commit
to an individual head will achieve what is necessary.  One user
per head?

One thing arch has shown is that you can successfully move
authentication/permission checking to the underlying environment
if you structure things carefully.

I guess the problem is really we want to structure things so that
a user who has downloaded the code can verify that the
release/tag they have is what they are looking for.  You can detect
a spoofed file in objects by simply verifying the sha1 of the file.
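
A minimal Python sketch of that check, assuming the object naming git uses today: the SHA-1 of a "<type> <size>" header, a NUL byte, and the uncompressed content. On-disk object files are zlib-compressed, so a real checker would decompress first; the function names are made up for the example.

```python
import hashlib


def object_name(obj_type, content):
    """Name git would give an object of this type with this content."""
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def is_spoofed(claimed_name, obj_type, content):
    """True if the content does not hash to the name it is stored under."""
    return object_name(obj_type, content) != claimed_name
```

A ref file has no such built-in check, because its name says nothing about its content.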

For a file that you can't internally verify that way the traditional
way to handle that is to create a file with a gpg signature.  So
is there anything wrong with adding .git/refs/tags/tag-name.sign
that is a traditional signature file?   That will at least give
you an end to end consistency check.  (Hmm.  Why didn't I suggest
this before?)

If you don't want to mirror and propagate data you need to do
consistency checks earlier in the process, and I have probably had
some poor suggestions on how to implement those.  But if everything
is setup so we can verify things once we have the code downloaded,
where you perform the checks is simply a matter of optimization.

Eric

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Tags
  2005-07-02 18:31                                   ` Tags Eric W. Biederman
@ 2005-07-02 19:55                                     ` Matthias Urlichs
  2005-07-02 21:16                                     ` Tags H. Peter Anvin
  1 sibling, 0 replies; 86+ messages in thread
From: Matthias Urlichs @ 2005-07-02 19:55 UTC (permalink / raw)
  To: git

Hi, Eric W. Biederman wrote:

> So
> is there anything wrong with adding .git/refs/tags/tag-name.sign
> that is a traditional signature file?

The signature is already appended to the tag file itself (or can be).
See "git-tag-script".

-- 
Matthias Urlichs   |   {M:U} IT Design @ m-u-it.de   |  smurf@smurf.noris.de
Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de
 - -
Democracy is that form of government where everybody gets what the majority
deserves.
					-- James Dale Davidson

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Tags
  2005-07-02 18:31                                   ` Tags Eric W. Biederman
  2005-07-02 19:55                                     ` Tags Matthias Urlichs
@ 2005-07-02 21:16                                     ` H. Peter Anvin
  2005-07-02 21:39                                       ` Tags Linus Torvalds
  1 sibling, 1 reply; 86+ messages in thread
From: H. Peter Anvin @ 2005-07-02 21:16 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linus Torvalds, Daniel Barkalow, Git Mailing List, Junio C Hamano,
	ftpadmin

Eric W. Biederman wrote:
> "H. Peter Anvin" <hpa@zytor.com> writes:
> 
> 
>>Eric W. Biederman wrote:
>>
>>>?? Isn't that what ssh is?
>>>To some extent a lot depends on how active you expect people to
>>>try and forge things.  If there is an expectation of honesty
>>>you are fine.
>>
>>I can't afford to have that.
> 
> So now your requirements are more stringent than sourceforge's?
> Sourceforge limited things by reducing the scope of commits per
> project.  But once you had commit access to a project you could do
> just about anything.
> 

They're not using a single global object storage.

	-hpa

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: Tags
  2005-07-02 21:16                                     ` Tags H. Peter Anvin
@ 2005-07-02 21:39                                       ` Linus Torvalds
  2005-07-02 21:42                                         ` Tags H. Peter Anvin
  0 siblings, 1 reply; 86+ messages in thread
From: Linus Torvalds @ 2005-07-02 21:39 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Eric W. Biederman, Daniel Barkalow, Git Mailing List,
	Junio C Hamano, ftpadmin

On Sat, 2 Jul 2005, H. Peter Anvin wrote:
> 
> They're not using a single global object storage.

Note that the fact that you use a common object store does not mean that 
everything should be common.

I still contend that tags and branches and things like that should be 
personal. A "gitforge" thing should _not_ try to unify tags. Instead, give 
people their own private area for keeping their own private references 
(you can limit it to just a few kilobytes per person, so you might as well 
just consider it to be part of their "user information" thing along with 
whatever other preferences they have).

Then, they can all share the objects, and there's never any confusion
about tags - everybody has their own tags, and you add a few simple
operations like "copy user xxx's tag to my tag-space, and start a new 
branch from that".

There're really no downsides. The only thing you need to have is some nice
tag-browser (and some simple permission model where developers can say
"others can read my tag" or "this tag is visible only to me" - the object 
store may be shared, but if nobody can see your pointers into the object 
store, you effectively have a totally private branch - which might be 
what some people want).

There's really never any reason to make tags global. Even in the case of
the kernel, people don't want to see a tag like "v2.6.12". They want to
see what _I_ tagged v2.6.12, so implicit in that whole thing is very much
that they want to see _my_ tags. Again, it's a _browsing_ issue, not a
"tags should be global" issue. They should be visible and easily 
fetchable.

		Linus

* Re: Tags
  2005-07-02 21:39                                       ` Tags Linus Torvalds
@ 2005-07-02 21:42                                         ` H. Peter Anvin
  2005-07-02 22:02                                           ` Tags A Large Angry SCM
                                                             ` (2 more replies)
  0 siblings, 3 replies; 86+ messages in thread
From: H. Peter Anvin @ 2005-07-02 21:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric W. Biederman, Daniel Barkalow, Git Mailing List,
	Junio C Hamano, ftpadmin

Linus Torvalds wrote:
> 
> Note that the fact that you use a common object store does not mean that 
> everything should be common.
> 
> I still contend that tags and branches and things like that should be 
> personal. A "gitforge" thing should _not_ try to unify tags. Instead, give 
> people their own private area for keeping their own private references 
> (you can limit it to just a few kilobytes per person, so you might as well 
> just consider it to be part of their "user information" thing along with 
> whatever other preferences they have).
> 
> Then, they can all share the objects, and there's never any confusion
> about tags - everybody has their own tags, and you add a few simple
> operations like "copy user xxx's tag to my tag-space, and start a new 
> branch from that".
> 
> There're really no downsides. The only thing you need to have is some nice
> tag-browser (and some simple permission model where developers can say
> "others can read my tag" or "this tag is visible only to me" - the object 
> store may be shared, but if nobody can see your pointers into the object 
> store, you effectively have a totally private branch - which might be 
> what some people want).
> 
> There's really never any reason to make tags global. Even in the case of
> the kernel, people don't want to see a tag like "v2.6.12". They want to
> see what _I_ tagged v2.6.12, so implicit in that whole thing is very much
> that they want to see _my_ tags. Again, it's a _browsing_ issue, not a
> "tags should be global" issue. They should be visible and easily 
> fetchable.
> 

OK, so let me retell what I think I hear you say:

- Store all the tags in the object store; they may conflict.
- Let each source user have a set of refs, and provide a method for the 
end user to select which refs to get.

In other words, the only way (other than knowing what GPG keys to trust) 
to distinguish between your "v2.6.12" and J. Random Hacker's "v2.6.12" 
is that the former is referenced by *your* refs as opposed to JRH's 
refs.  This also means the refs cannot be uniquely rebuilt from the 
object storage.

	-hpa

* Re: Tags
  2005-07-02 21:42                                         ` Tags H. Peter Anvin
@ 2005-07-02 22:02                                           ` A Large Angry SCM
  2005-07-02 22:20                                             ` Tags Linus Torvalds
  2005-07-02 22:14                                           ` Tags Petr Baudis
  2005-07-02 22:17                                           ` Tags Linus Torvalds
  2 siblings, 1 reply; 86+ messages in thread
From: A Large Angry SCM @ 2005-07-02 22:02 UTC (permalink / raw)
  To: Git Mailing List
  Cc: H. Peter Anvin, Linus Torvalds, Eric W. Biederman,
	Daniel Barkalow, Junio C Hamano, ftpadmin



H. Peter Anvin wrote:
...
> 
> OK, so let me retell what I think I hear you say:
> 
> - Store all the tags in the object store; they may conflict.
> - Let each source user have a set of refs, and provide a method for the 
> end user to select which refs to get.
> 
> In other words, the only way (other than knowing what GPG keys to trust) 
> to distinguish between your "v2.6.12" and J. Random Hacker's "v2.6.12" 
> is that the former is referenced by *your* refs as opposed to JRH's 
> refs.  This also means the refs cannot be uniquely rebuilt from the 
> object storage.

Why have tag objects at all?

* Re: Tags
  2005-07-02 22:02                                           ` Tags A Large Angry SCM
@ 2005-07-02 22:20                                             ` Linus Torvalds
  2005-07-02 23:49                                               ` Tags A Large Angry SCM
  0 siblings, 1 reply; 86+ messages in thread
From: Linus Torvalds @ 2005-07-02 22:20 UTC (permalink / raw)
  To: A Large Angry SCM
  Cc: Git Mailing List, H. Peter Anvin, Eric W. Biederman,
	Daniel Barkalow, Junio C Hamano, ftpadmin

On Sat, 2 Jul 2005, A Large Angry SCM wrote:
> 
> Why have tag objects at all?

Trust.

None of git itself normally has any "trust". The SHA1 means that the 
_integrity_ of the archive is ensured, but for some things (notably 
releases), you want to have something else. That's the "tag object".

And I really should probably have called them something else. _I_
personally tend to want to have a 1:1 relationship between my "tag
references" (ie the 20-byte SHA1 pointer) and my "tag objects", but that's
because my releases are things that I envision people may actually want to
verify are mine.

In many cases, you'd never use a "tag object", and the "tag reference" 
would just point directly to a commit, with no extra indirect object.
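
The two cases (a ref that goes through a tag object, and a ref that points
straight at a commit) can be sketched in a few lines of Python. This is only
an illustration: the in-memory "objects" and "refs" tables, the shortened
object names and the peel() helper are all made up for the example and are
not part of git.

```python
# A toy in-memory object store: name -> (type, content).
# The names are shortened placeholders, not real SHA1s.
objects = {
    "c0ffee": ("commit", "placeholder commit body\n"),
    "7a6000": ("tag", "object c0ffee\ntype commit\ntag v2.6.12\n\nsigned release\n"),
}

# Refs are just names for object names.
refs = {
    "refs/tags/v2.6.12": "7a6000",  # ref -> tag object -> commit
    "refs/tags/tested": "c0ffee",   # ref -> commit directly, no tag object
}


def peel(name: str) -> str:
    """Follow tag objects until something that is not a tag is reached."""
    obj_type, content = objects[name]
    while obj_type == "tag":
        first_line = content.split("\n", 1)[0]  # "object <name>"
        name = first_line.split(" ", 1)[1]
        obj_type, content = objects[name]
    return name


for ref, target in refs.items():
    print(ref, "->", target, "->", peel(target))
```

Both refs end up at the same commit; the only difference is whether there is
an extra (signable) object in between.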

		Linus

* Re: Tags
  2005-07-02 22:20                                             ` Tags Linus Torvalds
@ 2005-07-02 23:49                                               ` A Large Angry SCM
  2005-07-03  0:17                                                 ` Tags Linus Torvalds
  0 siblings, 1 reply; 86+ messages in thread
From: A Large Angry SCM @ 2005-07-02 23:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Git Mailing List, H. Peter Anvin, Eric W. Biederman,
	Daniel Barkalow, Junio C Hamano, ftpadmin

Linus Torvalds wrote:
> 
> On Sat, 2 Jul 2005, A Large Angry SCM wrote:
>>Why have tag objects at all?
> 
> Trust.
> 
> None of git itself normally has any "trust". The SHA1 means that the 
> _integrity_ of the archive is ensured, but for some things (notably 
> releases), you want to have something else. That's the "tag object".
> 

But can't the commit object do this just as well by signing the commit text?

> And I really should probably have called them something else. _I_
> personally tend to want to have a 1:1 relationship between my "tag
> references" (ie the 20-byte SHA1 pointer) and my "tag objects", but that's
> because my releases are things that I envision people may actually want to
> verify are mine.
> 

Your tendency is to use tag objects as a permanent, public label of some 
state. Signing the commit text or the email stating that commit 
${COMMIT_SHA} would work just as well for verification purposes. Or even 
a blob object containing the signed text "${COMMIT_SHA} is vX.X.X.X". 
Either way, you'd still need some kind of external reference to find the 
object.

> In many cases, you'd never use a "tag object", and the "tag reference" 
> would just point directly to a commit, with no extra indirect object.

Tag refs, like head refs and branches, are all just (temporary) 
notational shorthand to make using the tools easier.

The problem with the Borg repository is not the objects but the object 
refs. Isn't that just a namespace problem?

* Re: Tags
  2005-07-02 23:49                                               ` Tags A Large Angry SCM
@ 2005-07-03  0:17                                                 ` Linus Torvalds
  0 siblings, 0 replies; 86+ messages in thread
From: Linus Torvalds @ 2005-07-03  0:17 UTC (permalink / raw)
  To: A Large Angry SCM
  Cc: Git Mailing List, H. Peter Anvin, Eric W. Biederman,
	Daniel Barkalow, Junio C Hamano, ftpadmin

On Sat, 2 Jul 2005, A Large Angry SCM wrote:
>
> Linus Torvalds wrote:
> > 
> > None of git itself normally has any "trust". The SHA1 means that the 
> > _integrity_ of the archive is ensured, but for some things (notably 
> > releases), you want to have something else. That's the "tag object".
> > 
> 
> But can't the commit object do this just as well by signing the commit text?

Yes and no.

Technically yes, absolutely, you could add a signature to the commit text.

However, that's just wrong for several reasons:

First off, the signing is not necessarily done by the person committing
something. Think of any paperwork: the person that signs the paperwork is 
not necessarily the same person that _wrote_ the paperwork. A signature is 
a "witness".

For an example of this, look at the signatures that we've had for a long 
time on kernel.org: check out the files like "patch-2.6.8.1.sign". That's 
a signature, but it's not a signature by _me_. It's kernel.org signing the 
thing so that downstream people can verify things.

And it would be not only wrong, but literally _impossible_ for me to do it 
in the commit. I don't have (or want to have) the kernel.org private key. 
That's not what the signature is about. kernel.org is signing that "this 
is what I got, and what I passed on". It's not signing that "this is what 
I wrote".

In a lot of systems, you tag something good after it has passed a
regression test. Ie the _tag_ may happen days or even weeks after the
commit has been done.

So any system that signs commits directly is doing something _wrong_. 

Secondly, you can say that you trust other things. In git, you can tag 
individual blobs, and you can tag individual trees. For an example of 
where it makes sense to tag (sign) individual file versions, we've 
actually had things like ISDN drivers (or firmware) that passed some telco 
verification suite, and in certain countries it used to be that you 
weren't legally supposed to use hardware that hadn't passed that suite. In 
cases like that, you could sign the particular version of the driver, and 
say "this one is good".

(Yeah, those laws are happily going away, but I think the ISDN people in 
Germany actually ended up doing exactly that, except they obviously didn't 
use git signatures. I think they had a list of file+md5sum).

Finally, it's a tools issue. It's wrong to mix up the notion of committing 
and signing in the same thing, because that just complicates a tool that 
has to be able to do both. Now you can have a nice graphical commit tool, 
and it doesn't need to know about public keys etc to be useful - you can 
use another tool to do the signing.

Small is beautiful, but "independent" is even more so.

> Your tendency is to use tag objects as a permanent, public label of some 
> state. Signing the commit text or the email stating that commit 
> ${COMMIT_SHA} would work just as well for verification purposes.

Well, according to that logic, you'd never need signatures at all - you 
can always keep them totally outside the system.

But if they are totally outside the system, then you have to have some
other mechanism to track them, and you can never trust a git archive on
its own. My goal with the tag objects was that you can just get my git
archive, and the archive is _inherently_ trustworthy, because if you care,
you can verify it without any external input at all (except you need to
know my public key, of course, but that's not a tools issue any more,
that's about how signatures work).

So by having tag objects, I can just have refs to them, and anything that 
can fetch a ref (which implies _any_ kind of "pull" functionality) can get 
it. No special cases. No crap.

Do one thing, and do it well. Git does objects with relationships. That's 
really what git is all about, and the "tag object" fits very well into 
that mentality.

		Linus

* Re: Tags
  2005-07-02 21:42                                         ` Tags H. Peter Anvin
  2005-07-02 22:02                                           ` Tags A Large Angry SCM
@ 2005-07-02 22:14                                           ` Petr Baudis
  2005-07-02 22:17                                           ` Tags Linus Torvalds
  2 siblings, 0 replies; 86+ messages in thread
From: Petr Baudis @ 2005-07-02 22:14 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Eric W. Biederman, Daniel Barkalow,
	Git Mailing List, Junio C Hamano, ftpadmin

Dear diary, on Sat, Jul 02, 2005 at 11:42:51PM CEST, I got a letter
where "H. Peter Anvin" <hpa@zytor.com> told me that...
> Linus Torvalds wrote:
> >
> >Note that the fact that you use a common object store does not mean that 
> >everything should be common.

\o/ Finally I have some hope that we don't end up with something
braindead w.r.t. the tags... ;-)

..snip..
> OK, so let me retell what I think I hear you say:
> 
> - Store all the tags in the object store; they may conflict.

They may have the same "human-readable name", but they will have a
different hash.

> - Let each source user have a set of refs, and provide a method for the 
> end user to select which refs to get.
> 
> In other words, the only way (other than knowing what GPG keys to trust) 
> to distinguish between your "v2.6.12" and J. Random Hacker's "v2.6.12" 
> is that the former is referenced by *your* refs as opposed to JRH's 
> refs.

After all, this is the best way to distinguish it, isn't it? Just "tag
name" without a name of the branch the tag concerns makes no sense -
that's the point I'm trying to get across. JRH's v2.6.12 wouldn't make
much sense to you if you use Linus' v2.6.12, since the object JRH's
v2.6.12 references simply may not be in the branch you use. Yes, JRH
could tag it somewhere in the common past, but that's kind of strange
and is likely some of JRH's private stuff. If Linus merged JRH, he will
take his v2.6.12 if it makes sense in his branch - so the decision
is then up to the one who merges, which makes some sense too.

FYI, I'll teach Cogito about the refs/tags/<branch>/<tag> later today
(and totally offtopic, it already has some trivial cg-push now).
It will still fall back to refs/tags/<tag>.

> This also means the refs cannot be uniquely rebuilt from the 
> object storage.

Why should they be, after all?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
<Espy> be careful, some twit might quote you out of context..

* Re: Tags
  2005-07-02 21:42                                         ` Tags H. Peter Anvin
  2005-07-02 22:02                                           ` Tags A Large Angry SCM
  2005-07-02 22:14                                           ` Tags Petr Baudis
@ 2005-07-02 22:17                                           ` Linus Torvalds
  2005-07-03  0:04                                             ` Tags Dan Holmsand
  2005-07-05 13:04                                             ` Tags Eric W. Biederman
  2 siblings, 2 replies; 86+ messages in thread
From: Linus Torvalds @ 2005-07-02 22:17 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Eric W. Biederman, Daniel Barkalow, Git Mailing List,
	Junio C Hamano, ftpadmin

On Sat, 2 Jul 2005, H. Peter Anvin wrote:
> 
> OK, so let me retell what I think I hear you say:
> 
> - Store all the tags in the object store; they may conflict.

No. They cannot conflict.

A git "tag object" cannot conflict in any way. It is just a generic 
"pointer object", and like all other objects, it is defined by its 
contents, and there are no "conflicts". If two people have exactly the 
same pointer, they'll just have the same object - that's not a conflict, 
that's just a fact of life with content-addressable filesystems.

The git "tag object" contains a suggested symbolic name, but that actually 
has no meaning except as being informational. So for example:

	[torvalds@g5 linux]$ git-cat-file tag v2.6.12
	object 9ee1c939d1cb936b1f98e8d81aeffab57bae46ab
	type commit
	tag v2.6.12

	This is the final 2.6.12 release
	-----BEGIN PGP SIGNATURE-----
	Version: GnuPG v1.2.4 (GNU/Linux)

	iD8DBQBCsykyF3YsRnbiHLsRAvPNAJ482tCZwuxp/bJRz7Q98MHlN83TpACdHr37
	o6X/3T+vm8K3bf3driRr34c=
	=sBHn
	-----END PGP SIGNATURE-----

here the "symbolic name" is "v2.6.12", but that's purely informational, 
and nothing at all cares if a million people have made their own tags that 
have that same tag-name. The git _object_ is:

	[torvalds@g5 linux]$ git-rev-parse v2.6.12
	26791a8bcf0e6d33f43aef7682bdb555236d56de

and that object name is going to be unique (modulo hash collisions).
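
To make the content-addressing point concrete, here is a minimal Python
sketch. It assumes the usual git naming rule (SHA1 over a "<type> <size>\0"
header followed by the object content). The helper name git_object_name and
the two sample tag bodies are invented for the example; the real v2.6.12 tag
is not reproduced byte-for-byte, so the printed names will not match the one
shown above.

```python
import hashlib


def git_object_name(obj_type: str, body: bytes) -> str:
    """Return the hex SHA1 that names an object with this type and content."""
    header = f"{obj_type} {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


# Two hypothetical tag objects that both suggest the symbolic name "v2.6.12".
# The "object" lines are placeholders, not real commits.
linus_tag = (
    b"object 1111111111111111111111111111111111111111\n"
    b"type commit\n"
    b"tag v2.6.12\n"
    b"\n"
    b"This is the final 2.6.12 release\n"
)
jrh_tag = (
    b"object 2222222222222222222222222222222222222222\n"
    b"type commit\n"
    b"tag v2.6.12\n"
    b"\n"
    b"JRH's private idea of 2.6.12\n"
)

# Same tag-name, different content, therefore different object names.
print(git_object_name("tag", linus_tag))
print(git_object_name("tag", jrh_tag))

# Identical content always gives the identical name: not a conflict.
same_again = git_object_name("tag", bytes(linus_tag))
```

The suggested name inside the object plays no part in identity beyond being
some of the bytes that get hashed.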

> - Let each source user have a set of refs, and provide a method for the 
> end user to select which refs to get.

Right. Let users have any damn refs they want. They may be refs to tag
objects, but they may just be direct refs to the commit. The tag object 
really has no meaning to git, except it allows signing. That's really the 
_only_ thing a tag object does: it introduces trust. There's no other 
reason to ever use one, really.

And a "tag ref" thing is really nothing more (and nothing less) than a
branch. It's a 41-byte file, although if you actually were to have a
"gitforge" daemon, it could also be just the raw 20-byte SHA1 in a
database. Let people have their own refs, and have some good way to create
them and delete them, and copy them from others (and refer to other
people's refs - one common usage might be "I want to merge with that other
user's ref 'xyzzy'").

Note that the .git/refs/tags/xxx files are _literally_ treated exactly the 
same as the same files under "heads". Or under "mydir". Git really doesn't 
care, it's purely syntactic sugar. To git, a ref is a ref is a ref. It 
just refers to an object, and it's nothing more than a way to specify some 
random SHA1 at any time.
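
As a rough illustration of "a ref is a ref is a ref", the following Python
sketch models refs as small files under a directory, which is the loose-file
layout described here (other storage schemes are not modelled). The helpers
read_ref and write_ref are hypothetical names for this example, and the
"refs/mydir/xyzzy" path is only there to show that the directory name
carries no meaning.

```python
import os
import tempfile


def write_ref(git_dir: str, ref_path: str, sha1: str) -> None:
    """Write a ref as 40 hex digits plus a newline (41 bytes in total)."""
    full = os.path.join(git_dir, ref_path)
    os.makedirs(os.path.dirname(full), exist_ok=True)
    with open(full, "w", encoding="ascii", newline="\n") as f:
        f.write(sha1 + "\n")


def read_ref(git_dir: str, ref_path: str) -> str:
    """Read a ref file and return the 40-digit hex object name in it."""
    with open(os.path.join(git_dir, ref_path), "r", encoding="ascii") as f:
        return f.read().strip()


# Object name of the v2.6.12 tag as printed by git-rev-parse in this message.
sha1 = "26791a8bcf0e6d33f43aef7682bdb555236d56de"
names = ("refs/tags/v2.6.12", "refs/heads/master", "refs/mydir/xyzzy")

with tempfile.TemporaryDirectory() as git_dir:
    for ref in names:
        write_ref(git_dir, ref, sha1)
    # The same function resolves all three; nothing looks at "tags" or "heads".
    resolved = [read_ref(git_dir, ref) for ref in names]
    ref_size = os.path.getsize(os.path.join(git_dir, names[0]))

print(resolved)
print(ref_size)
```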

> In other words, the only way (other than knowing what GPG keys to trust) 
> to distinguish between your "v2.6.12" and J. Random Hacker's "v2.6.12" 
> is that the former is referenced by *your* refs as opposed to JRH's 
> refs.  This also means the refs cannot be uniquely rebuilt from the 
> object storage.

Right. All the refs are personal and "fleeting" - some refs are actively
changed all the time (branch refs - aka "heads" - get updated when you
update the branch). Tags are really the same way in all technical ways,
and the only real difference between a "branch ref" and a "tag ref" is
your _expectation_ of them - one you expect to be mostly stable, the other
you expect to be updated with development. _Technically_ there's no
difference between the two, though.

(And you might also change tag contents occasionally. One reason might be
a bug and you decide to re-tag something else. But a more common reason
might be because you want to have tags like "latest" that don't actually
update with development, but they update with some other event, like a
release event or some automated test cycle completion or something like
that. So tags aren't _immutable_ even from an expectation standpoint, 
it's just that they tend to change _less_).

Now, from tag _objects_ (as opposed to tag refs) you _can_ build them if
somebody created a tag object, and you have the signature so that you can
re-associate the tag-name with the person. But you should consider that a
pretty heavy and unusual case. The normal case is that you just want to
back up people's refs. They're like a part of a personal ".gitrc": you
could equally well think of them as "these are my shorthands, because I
don't want to talk about 40-digit hex numbers all the time". It's nothing
more than a personal address book, really.

		Linus

* Re: Tags
  2005-07-02 22:17                                           ` Tags Linus Torvalds
@ 2005-07-03  0:04                                             ` Dan Holmsand
  2005-07-03 22:34                                               ` Tags Kevin Smith
  2005-07-05 13:04                                             ` Tags Eric W. Biederman
  1 sibling, 1 reply; 86+ messages in thread
From: Dan Holmsand @ 2005-07-03  0:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: H. Peter Anvin, Eric W. Biederman, Daniel Barkalow,
	Git Mailing List, Junio C Hamano, ftpadmin

Linus Torvalds wrote:
> And a "tag ref" thing is really nothing more (and nothing less) than a
> branch. 

I'm guessing that this is the root of the confusion here. To you, and to 
git, a tag is just another branch. And a tag object is pretty much a 
specialized commit object, that can't have children and has only one parent.

But people seem to *expect* tags to be connected somehow to a specific 
repository. Or, rather, to a specific branch.

That's why people want e.g. cogito to get "all the tags" from 
torvalds/linux-2.6.git when they cg-pull.

From git's point of view, that doesn't really make any sense; it's like 
saying that you should pull all the branches from a specific branch. But 
from a practical point of view, it *does* make sense if you hold the 
view that tags are connected to a branch, and that you should be able to 
diff against v2.6.12 as soon as you've pulled the latest head.

So why not add tags to the branch itself?

It should be pretty straightforward: just make git look for tag refs in, 
say, a .gittags tree in the current HEAD. The whole thing would work pretty 
much as if you've symlinked .git/refs/tags to .gittags in the current 
working tree, except that tag refs would have to be read directly from 
the repository.

That way, tag refs could be handled pretty much just like any other 
git-managed file: they can be added, deleted, changed, merged, 
committed, etc. We could track their history, and see who tagged what 
and when.

And tags could easily be signed and contain arbitrary text, just like 
the present day tag objects, as long as they start with a sha1 ref.

This way, a git branch could have public, shared tags, with a minimum of 
hassle. No special-casing needed for storage or transfer.

And there would be no room for conflicting tag names (but you could 
easily use the same name in different branches, just as any file can 
differ in content between two branches).

It might be useful, though, to add some syntax for "tag in a specific 
branch", say <branch-name>@<tag-name>.

The present tagging mechanism should be kept. It is useful for private 
tagging, and may be useful for signalling that "this is a branch that is 
unlikely to change".

So, am I missing something obvious here?

/dan

* Re: Tags
  2005-07-03  0:04                                             ` Tags Dan Holmsand
@ 2005-07-03 22:34                                               ` Kevin Smith
  0 siblings, 0 replies; 86+ messages in thread
From: Kevin Smith @ 2005-07-03 22:34 UTC (permalink / raw)
  To: Dan Holmsand; +Cc: Git Mailing List

Dan Holmsand wrote:
> So why not add tags to the branch itself?
> 
> It should be pretty straightforward: just make git look for tag refs in, 
> say, a .gittags tree in the current HEAD. The whole thing would work pretty 
> much as if you've symlinked .git/refs/tags to .gittags in the current 
> working tree, except that tag refs would have to be read directly from 
> the repository.
> 
> That way, tag refs could be handled pretty much just like any other 
> git-managed file: they can be added, deleted, changed, merged, 
> committed, etc. We could track their history, and see who tagged what 
> and when.

Sounds like the way mercurial handles tags. It really seemed weird to me 
at first, but the more I think about it, the more it makes sense. Even 
more so after reading this thread :-)

   http://www.serpentine.com/mercurial/index.cgi?Tag

Kevin

* Re: Tags
  2005-07-02 22:17                                           ` Tags Linus Torvalds
  2005-07-03  0:04                                             ` Tags Dan Holmsand
@ 2005-07-05 13:04                                             ` Eric W. Biederman
  2005-07-05 16:21                                               ` Tags Daniel Barkalow
  1 sibling, 1 reply; 86+ messages in thread
From: Eric W. Biederman @ 2005-07-05 13:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: H. Peter Anvin, Daniel Barkalow, Git Mailing List, Junio C Hamano,
	ftpadmin

Linus Torvalds <torvalds@osdl.org> writes:

> (And you might also change tag contents occasionally. One reason might be
> a bug and you decide to re-tag something else. But a more common reason
> might be because you want to have tags like "latest" that don't actually
> update with development, but they update with some other event, like a
> release event or some automated test cycle completion or something like
> that. So tags aren't _immutable_ even from an expectation standpoint, 
> it's just that they tend to change _less_).

Could you include the person who generated the tag and the time the
tag was generated in the tag object?

For a tag like "latest" it would help quite a bit if you could actually
find out which was the latest version of it :)

Eric

* Re: Tags
  2005-07-05 13:04                                             ` Tags Eric W. Biederman
@ 2005-07-05 16:21                                               ` Daniel Barkalow
  2005-07-05 17:51                                                 ` Tags Eric W. Biederman
  0 siblings, 1 reply; 86+ messages in thread
From: Daniel Barkalow @ 2005-07-05 16:21 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linus Torvalds, H. Peter Anvin, Git Mailing List, Junio C Hamano,
	ftpadmin

On Tue, 5 Jul 2005, Eric W. Biederman wrote:

> Could you include the person who generated the tag and the time the
> tag was generated in the tag object?
> 
> For a tag like "latest" it would help quite a bit if you could actually
> find out which was the latest version of it :)

Actually, what you really want here is to put in refs/tags/latest the hash
of the tag whose "tag" field is v2.6.13-rc1 (or whatever it is). Having a
tag with the "tag" field of "latest" would be a bit silly, because the
object will probably stay in circulation long after it's no longer
true. And the object itself would tell you that it was the latest version
when it was created (but isn't every version?). That's why you want the
_tag_ to say something useful about the version (maybe "v2.6.12", maybe
just "tested"), and the _ref_ to tell you it's the latest.
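
A small Python sketch of that split, with made-up object names and a
hypothetical describe() helper: the "tag" field lives in the immutable tag
object, while "latest" exists only as a ref that gets moved.

```python
# Tag objects never change once created; keys are placeholder object names.
tag_objects = {
    "aaaa": {"object": "commit-1", "tag": "v2.6.12"},
    "bbbb": {"object": "commit-2", "tag": "v2.6.13-rc1"},
}

# Refs are the mutable part.
refs = {
    "refs/tags/v2.6.12": "aaaa",
    "refs/tags/latest": "aaaa",
}


def describe(ref: str) -> str:
    """Return the 'tag' field of the tag object a ref points at."""
    return tag_objects[refs[ref]]["tag"]


before = describe("refs/tags/latest")

# A new release: add a ref for the new tag and move "latest" to it.
refs["refs/tags/v2.6.13-rc1"] = "bbbb"
refs["refs/tags/latest"] = "bbbb"

after = describe("refs/tags/latest")
print(before, "->", after)
```

The old tag object still says "v2.6.12" and stays true forever; only the ref
named "latest" had to change.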

The fact that lots of tags get refs named with their contents is just due
to tags only getting used for a small portion of their possible uses. This
only happens when the feature you'd look something up under is a feature
which is persistent.

	-Daniel
*This .sig left intentionally blank*

* Re: Tags
  2005-07-05 16:21                                               ` Tags Daniel Barkalow
@ 2005-07-05 17:51                                                 ` Eric W. Biederman
  2005-07-05 18:33                                                   ` Tags Linus Torvalds
  0 siblings, 1 reply; 86+ messages in thread
From: Eric W. Biederman @ 2005-07-05 17:51 UTC (permalink / raw)
  To: Daniel Barkalow
  Cc: Linus Torvalds, H. Peter Anvin, Git Mailing List, Junio C Hamano,
	ftpadmin

Daniel Barkalow <barkalow@iabervon.org> writes:

> On Tue, 5 Jul 2005, Eric W. Biederman wrote:
>
>> Could you include the person who generated the tag and the time the
>> tag was generated in the tag object?
>> 
>> For a tag like "latest" it would help quite a bit if you could actually
>> find out which was the latest version of it :)
>
> The fact that lots of tags get refs named with their contents is just due
> to tags only getting used for a small portion of their possible uses. This
> only happens when the feature you'd look something up under is a feature
> which is persistent.

True, but if you can, you will get multiple tags with the
same suggested name.  So you need some way to find the one you
care about.

Either a date or its position in the tree is all you have
to go on.

I picked on latest as that is an extreme example that had already
been mentioned.

Eric

* Re: Tags
  2005-07-05 17:51                                                 ` Tags Eric W. Biederman
@ 2005-07-05 18:33                                                   ` Linus Torvalds
  2005-07-05 19:22                                                     ` Tags Junio C Hamano
  2005-07-07  3:31                                                     ` Tags Eric W. Biederman
  0 siblings, 2 replies; 86+ messages in thread
From: Linus Torvalds @ 2005-07-05 18:33 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Daniel Barkalow, H. Peter Anvin, Git Mailing List, Junio C Hamano,
	ftpadmin



On Tue, 5 Jul 2005, Eric W. Biederman wrote:
> 
> True, but if you can, you will get multiple tags with the
> same suggested name.  So you need some way to find the one you
> care about.

I do agree that it would make sense to have a "tagger" field with the same 
semantics as the "committer" in a commit (including all the same fields: 
real name, email, and date).
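
What such a field could look like, as a hedged Python sketch: it assumes the
"tagger" line simply mirrors the layout of a "committer" line (name, email in
angle brackets, seconds since the epoch, timezone offset). The Ident type,
the helper names and the sample identity are invented for illustration.

```python
import re
from typing import NamedTuple, Tuple


class Ident(NamedTuple):
    name: str
    email: str
    timestamp: int  # seconds since the Unix epoch
    tz: str         # offset such as "-0700"


IDENT_RE = re.compile(
    r"^(?P<name>.*) <(?P<email>[^>]*)> (?P<ts>\d+) (?P<tz>[+-]\d{4})$"
)


def format_ident(field: str, ident: Ident) -> str:
    """Build a header line such as 'tagger Name <email> 1120588380 -0700'."""
    return f"{field} {ident.name} <{ident.email}> {ident.timestamp} {ident.tz}"


def parse_ident(line: str) -> Tuple[str, Ident]:
    """Split a header line back into its field name and identity."""
    field, rest = line.split(" ", 1)
    m = IDENT_RE.match(rest)
    if m is None:
        raise ValueError(f"not an identity line: {line!r}")
    return field, Ident(m["name"], m["email"], int(m["ts"]), m["tz"])


who = Ident("A U Thor", "author@example.com", 1120588380, "-0700")
line = format_ident("tagger", who)
print(line)
print(parse_ident(line))
```

With a date in the object, two tags carrying the same suggested name can at
least be ordered by when they were made.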

		Linus

* Re: Tags
  2005-07-05 18:33                                                   ` Tags Linus Torvalds
@ 2005-07-05 19:22                                                     ` Junio C Hamano
  2005-07-06 18:04                                                       ` Tags Matthias Urlichs
  2005-07-07  3:31                                                     ` Tags Eric W. Biederman
  1 sibling, 1 reply; 86+ messages in thread
From: Junio C Hamano @ 2005-07-05 19:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric W. Biederman, Daniel Barkalow, H. Peter Anvin,
	Git Mailing List, ftpadmin

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> On Tue, 5 Jul 2005, Eric W. Biederman wrote:
>> 
>> True, but if you can, you will get multiple tags with the
>> same suggested name.  So you need some way to find the one you
>> care about.

LT> I do agree that it would make sense to have a "tagger" field with the same 
LT> semantics as the "committer" in a commit (including all the same fields: 
LT> real name, email, and date).

While we are talking about changing tag object format/fields,
I've wondered if we would want to be able to associate more than
one object with a single tag (i.e. have more than one "object"
line just like commits can have more than one "parent" line).
I admit that it would not be a "tag" anymore, rather, it would
be a "bag".

I wanted to have something like this in the past for some reason
I do not exactly remember anymore, but basically it was to
record "here is the list of related objects."

I could fake it with a multi-parent commit with a commit message
if all I want to include are commits with a single blob, but
that is (1) abusing the commit to record something that is not
even a merge, and (2) the tree associated with that commit would
not mean anything.

* Re: Tags
  2005-07-05 19:22                                                     ` Tags Junio C Hamano
@ 2005-07-06 18:04                                                       ` Matthias Urlichs
  0 siblings, 0 replies; 86+ messages in thread
From: Matthias Urlichs @ 2005-07-06 18:04 UTC (permalink / raw)
  To: git

Hi, Junio C Hamano wrote:

> I wanted to have something like this in the past for some reason
> I do not exactly remember anymore, but basically it was to
> record "here is the list of related objects."

One use I'd have for that is regression testing -- collect all IDs in one
bag and then say "gitk bad ^good".

OTOH, I dunno whether the core tools really need to understand that.

-- 
Matthias Urlichs   |   {M:U} IT Design @ m-u-it.de   |  smurf@smurf.noris.de
Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de
 - -
If at first you don't succeed, you must be a programmer.

* Re: Tags
  2005-07-05 18:33                                                   ` Tags Linus Torvalds
  2005-07-05 19:22                                                     ` Tags Junio C Hamano
@ 2005-07-07  3:31                                                     ` Eric W. Biederman
  1 sibling, 0 replies; 86+ messages in thread
From: Eric W. Biederman @ 2005-07-07  3:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Daniel Barkalow, H. Peter Anvin, Git Mailing List, Junio C Hamano,
	ftpadmin

Linus Torvalds <torvalds@osdl.org> writes:

> On Tue, 5 Jul 2005, Eric W. Biederman wrote:
>> 
>> True but if you can you will get multiple tags with the
>> same suggested name.  So you need some way to find the one you
>> care about.
>
> I do agree that it would make sense to have a "tagger" field with the same 
> semantics as the "committer" in a commit (including all the same fields: 
> real name, email, and date).

Ok here is a patch that implements it.

I don't know how robust my code for getting the default tagger
email address, and especially the tagger name, is, but basically it
works.

In addition I added a message when git-tag-script is waiting
for you to type the tag message so people aren't confused.

And of course I modified git-mktag to check that the tagger
field is present.
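
The check itself is small. A rough shell equivalent, assuming the header
lines come in the order git-tag-script writes them (object, type, tag,
tagger); has_tagger_line is a hypothetical helper, not part of git:

```shell
# has_tagger_line FILE
# Succeed only if the fourth line of FILE starts with "tagger " and has
# at least one character after it (i.e. the field is present and non-empty).
has_tagger_line() {
    line=$(sed -n 4p "$1")
    case $line in
        "tagger "?*) return 0 ;;
        *)           return 1 ;;
    esac
}
```

Something like "has_tagger_line .tmp-tag || exit 1" before calling git-mktag
would catch a missing field early, although git-mktag does its own check.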

Now git-pull-script just needs to be tweaked to optionally
add tags in the update into .git/refs/tags :)   Using git-fsck-cache
to find tags is doable but it slows down as your archive grows.

Eric


diff --git a/date.c b/date.c
diff --git a/git-tag-script b/git-tag-script
--- a/git-tag-script
+++ b/git-tag-script
@@ -1,12 +1,30 @@
 #!/bin/sh
 # Copyright (c) 2005 Linus Torvalds
 
+usage() {
+	echo 'git tag <tag name> [<sha1>]'
+	exit 1
+}
+
 : ${GIT_DIR=.git}
+if [ ! -d "$GIT_DIR" ]; then
+	echo Not a git directory 1>&2
+	exit 1
+fi
+
+if [ $# -gt 2 -o $# -lt 1 ]; then
+	usage
+fi
 
 object=${2:-$(cat "$GIT_DIR"/HEAD)}
 type=$(git-cat-file -t $object) || exit 1
-( echo -e "object $object\ntype $type\ntag $1\n"; cat ) > .tmp-tag
+tagger_name=${GIT_COMMITTER_NAME:-$(sed -n -e "s/^$(whoami):[^:]*:[^:]*:[^:]*:\([^:,]*\).*:.*$/\1/p" <  /etc/passwd)}
+tagger_email=${GIT_COMMITTER_EMAIL:-"$(whoami)@$(hostname --fqdn)"}
+tagger_date=$(date -d "${GIT_COMMITTER_DATE:-$(date -R)}" +"%s %z") || exit 1
+echo "Enter tag message now. ^D when finished"
+( echo -e "object $object\ntype $type\ntag $1\ntagger $tagger_name <$tagger_email> $tagger_date\n"; cat) > .tmp-tag
 rm -f .tmp-tag.asc
 gpg -bsa .tmp-tag && cat .tmp-tag.asc >> .tmp-tag
-git-mktag < .tmp-tag
-#rm .tmp-tag .tmp-tag.sig
+exit 1
+./git-mktag < .tmp-tag
+rm -f .tmp-tag .tmp-tag.sig
diff --git a/mktag.c b/mktag.c
--- a/mktag.c
+++ b/mktag.c
@@ -42,7 +42,7 @@ static int verify_tag(char *buffer, unsi
 	int typelen;
 	char type[20];
 	unsigned char sha1[20];
-	const char *object, *type_line, *tag_line;
+	const char *object, *type_line, *tag_line, *tagger_line;
 
 	if (size < 64 || size > MAXSIZE-1)
 		return -1;
@@ -91,6 +91,11 @@ static int verify_tag(char *buffer, unsi
 			continue;
 		return -1;
 	}
+	/* Verify the tagger line */
+	tagger_line = tag_line;
+
+	if (memcmp(tagger_line, "tagger ", 7) || (tagger_line[7] == '\n'))
+		return -1;
 
 	/* The actual stuff afterwards we don't care about.. */
 	return 0;
@@ -119,7 +124,7 @@ int main(int argc, char **argv)
 		size += ret;
 	}
 
-	// Verify it for some basic sanity: it needs to start with "object <sha1>\ntype "
+	// Verify it for some basic sanity: it needs to start with "object <sha1>\ntype\ntagger "
 	if (verify_tag(buffer, size) < 0)
 		die("invalid tag signature file");
 

* Re: Tags
  2005-07-02 17:58                                 ` Tags H. Peter Anvin
  2005-07-02 18:31                                   ` Tags Eric W. Biederman
@ 2005-07-02 18:45                                   ` Linus Torvalds
  1 sibling, 0 replies; 86+ messages in thread
From: Linus Torvalds @ 2005-07-02 18:45 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Eric W. Biederman, Daniel Barkalow, Git Mailing List,
	Junio C Hamano, ftpadmin

On Sat, 2 Jul 2005, H. Peter Anvin wrote:
> 
> Well, you're wrong.  Tags is the only part of git which cannot be 
> protected by git's own self-validation system.

Well, you _can_ use the tag objects. That's what I do. The namespace isn't
the tag name you use ("v2.6.12"), it's the name of the tag itself (in this
case "26791a8bcf0e6d33f43aef7682bdb555236d56de"), and then it does
actually distribute fine. The symbolic name is encoded within the tag, but 
isn't guaranteed to be unique in any way.

So no, it doesn't protect the tag _name_ per se. Anybody can create a tag
called "v2.6.12", and I don't think there's any way to handle clashes
sanely. But you can find the tag objects in a pack, and you could index 
them separately. Then you'd need to let the users decide which ones they 
trust or want to use.
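
Indexing them separately could be as simple as a list of tag object name and
symbolic name pairs. A sketch, assuming such a list has already been
extracted from the pack somehow (one "<sha1> <name>" pair per line; the
function name and the input format are invented here):

```shell
# clashing_tag_names
# Read "<tag-object-sha1> <symbolic-name>" lines on stdin and print every
# symbolic name that is claimed by more than one tag object.
clashing_tag_names() {
    awk '{ seen[$2]++ } END { for (n in seen) if (seen[n] > 1) print n }' | sort
}

# Two different tag objects both call themselves v2.6.12 (made-up names).
printf '%s\n' 'aaaa v2.6.12' 'bbbb v2.6.12' 'cccc v2.6.11' | clashing_tag_names
```

The names it prints are the ones where the user has to decide which tag
object to trust.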

		Linus

* Re: Tags
  2005-07-02  0:06                         ` Tags H. Peter Anvin
  2005-07-02  7:00                           ` Tags Eric W. Biederman
@ 2005-07-02 20:38                           ` Jan Harkes
  2005-07-02 22:32                             ` Tags Jan Harkes
  1 sibling, 1 reply; 86+ messages in thread
From: Jan Harkes @ 2005-07-02 20:38 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Eric W. Biederman, Linus Torvalds, Daniel Barkalow,
	Git Mailing List, Junio C Hamano, ftpadmin

On Fri, Jul 01, 2005 at 05:06:15PM -0700, H. Peter Anvin wrote:
> Eric W. Biederman wrote:
> >
> >If I really care what developer xyz tagged I will pull from them,
> >or a mirror I trust.  And since developer xyz doesn't pull his
> >own global tags from other repositories that should be sufficient.
> >
> 
> You're missing something totally and utterly fundamental here: I'm 
> talking about creating an infrastructure (think sourceforge) where there 
> is only one git repository for the whole system, period, full stop, end 
> of story.

I'm not entirely sure what you are envisioning, but it is definitely
doable in a secure way.

- Assume that each developer will have one or more private trees with one or
  more branches on kernel.org; let's say all these private repositories
  are stored under /scm/git/<user>/

- Now you create a single 'global repository' which is going to be the
  publicly visible one that will be mirrored out,

- Then you run the following script (untested)
  #!/bin/sh
  GIT_DIR=$global_repo
  for user in `(cd /scm/git ; ls)`; do
    for tree in `find /scm/git/$user -name *.git` ; do
	for ref in `find $tree/refs -type f`  ; do
	    type=`echo $ref | sed 'sX^.*/refs/\([^/]*\)/.*$X\1X'`
	    name=`echo $ref | sed 'sX^.*/refs/[^/]*/\(.*\)$X\1X'`
	    git fetch /scm/git/$tree $branch 
	    mkdir -p $GIT_DIR/refs/$type/$user/$name
	    cat $GIT_DIR/FETCH_HEAD > $GIT_DIR/refs/$type/$user/$name
	done
    done
  done

- You can repack the global repository whenever you want.
- Finally, once a user knows that all his changes are available from the
  global repository, he can remove any objects from his tree and use
  GIT_ALTERNATE_OBJECT_DIRECTORIES=$global_repo/objects
  (maybe there should be a flag for git prune to remove local objects
  that are already available in the alternate object directories)

Jan

* Re: Tags
  2005-07-02 20:38                           ` Tags Jan Harkes
@ 2005-07-02 22:32                             ` Jan Harkes
  0 siblings, 0 replies; 86+ messages in thread
From: Jan Harkes @ 2005-07-02 22:32 UTC (permalink / raw)
  To: Git Mailing List
  Cc: H. Peter Anvin, Eric W. Biederman, Linus Torvalds,
	Daniel Barkalow, Junio C Hamano, ftpadmin

On Sat, Jul 02, 2005 at 04:38:06PM -0400, Jan Harkes wrote:
> - Then you run the following script (untested)

Ok, I tested it and it was pretty broken, I assumed that git-fetch-script
accepted the same arguments as git-pull-script.

Here is one that actually seems to work.

Jan


#!/bin/sh
#
# combine per-user private trees into a single repository.
# assumes that user repositories are stored as "$repos/<user>/<tree>.git"
#
global=global.git
repos=/path/to/user/repositories

export GIT_DIR="$global"

# create global repository if it doesn't exist
git-init-db

for tree in $(cd "$repos" && find . -name '*.git' -prune | sed 'sX./XX')
do
    root="$repos/$tree"
    for ref in $(cd "$root" && find refs -type f)  ; do
	echo Synchronizing $tree
	git fetch "$root" "$ref"

	type=$(echo "$ref" | sed -ne 'sX^refs/\([^/]*\)/.*$X\1Xp')
	name=$(echo "$ref" | sed -ne 'sX^refs/[^/]*/\(.*\)$X\1Xp')
	dest="$GIT_DIR/refs/$type/$tree/$name"
	mkdir -p "$(dirname "$dest")"
	cat "$GIT_DIR/FETCH_HEAD" > "$dest"
    done
done
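
The two sed expressions in the script split a ref path into its type and its
name, and the tree is then wedged in between. A small sketch of that mapping
in isolation; ref_dest is a made-up helper and the tree and ref are invented
examples:

```shell
# ref_dest TREE REF
# Print the path, relative to the global GIT_DIR, under which the script
# would store REF fetched from TREE.
ref_dest() {
    type=$(echo "$2" | sed -ne 'sX^refs/\([^/]*\)/.*$X\1Xp')
    name=$(echo "$2" | sed -ne 'sX^refs/[^/]*/\(.*\)$X\1Xp')
    echo "refs/$type/$1/$name"
}

ref_dest jdoe/linux.git refs/tags/v2.6.12
# prints refs/tags/jdoe/linux.git/v2.6.12
```

Because the tree path becomes part of the ref name, two users can both have
a "master" head or a "v2.6.12" tag without colliding in the global
repository.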

* Re: Tags
  2005-07-01 22:44                     ` Tags H. Peter Anvin
  2005-07-01 23:07                       ` Tags Eric W. Biederman
@ 2005-07-02 16:00                       ` Matthias Urlichs
  1 sibling, 0 replies; 86+ messages in thread
From: Matthias Urlichs @ 2005-07-02 16:00 UTC (permalink / raw)
  To: git

Hi, H. Peter Anvin wrote:

> Doesn't work.  You can trivially generate a key with someone else's 
> address.  It would require a full PKI.

So you use the GPG key's fingerprint as the directory name, and add
a few strategically named symlinks for convenience. *Shrug*

Besides, what's wrong with requiring full PKI? Everybody who has
a kernel.org account should be in the strongly connected set...

-- 
Matthias Urlichs   |   {M:U} IT Design @ m-u-it.de   |  smurf@smurf.noris.de
Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de
 - -
What I want is all of the power and none of the responsibility.

* Re: Tags
  2005-07-01 13:56               ` Tags Eric W. Biederman
  2005-07-01 16:37                 ` Tags H. Peter Anvin
@ 2005-07-01 18:09                 ` Petr Baudis
  2005-07-01 18:37                   ` Tags H. Peter Anvin
  1 sibling, 1 reply; 86+ messages in thread
From: Petr Baudis @ 2005-07-01 18:09 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: H. Peter Anvin, Linus Torvalds, Daniel Barkalow, Git Mailing List,
	Junio C Hamano, ftpadmin

Dear diary, on Fri, Jul 01, 2005 at 03:56:06PM CEST, I got a letter
where "Eric W. Biederman" <ebiederm@xmission.com> told me that...
> "H. Peter Anvin" <hpa@zytor.com> writes:
> 
> > In the end, it might be that the right thing to do for git on kernel.org is to
> > have a single, unified object store which isn't accessible by anything other
> > than git-specific protocols.  There would have to be some way of dealing with,
> > for example, conflicting tags that apply to different repositories, though.
> 
> As far as I can tell public distributed tags are not that hard and if
> you are going to be synching them it is probably worth working on.
> 
> The basic idea is that instead of having one global tag of
> 'linux-2.6.13-rc1' you have a global tag of
> 'torvalds@osdl.org/linux-2.6.13-rc1'.
> 
> The important part is that the tag namespace is made hierarchical
> with at least 2 levels.  Where the top level is a globally
> unique tag owner id and the bottom level is the actual tag.  This
> prevents collisions when merging trees because two people's
> tags are never in the same namespace, at least when
> people are not actively hostile :)

I don't know, I don't consider this very appealing myself. I'd rather
prefer the private tags to be per-repository rather than per-user, since
those ugly "merged-here", "broken" etc. tags aren't very useful on
larger scope than of a repository. OTOH, what tags would be per-user,
not per-repository and not global?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
<Espy> be careful, some twit might quote you out of context..

* Re: Tags
  2005-07-01 18:09                 ` Tags Petr Baudis
@ 2005-07-01 18:37                   ` H. Peter Anvin
  2005-07-01 21:20                     ` Tags Matthias Urlichs
  2005-07-01 21:42                     ` Tags Petr Baudis
  0 siblings, 2 replies; 86+ messages in thread
From: H. Peter Anvin @ 2005-07-01 18:37 UTC (permalink / raw)
  To: Petr Baudis
  Cc: Eric W. Biederman, Linus Torvalds, Daniel Barkalow,
	Git Mailing List, Junio C Hamano, ftpadmin

Petr Baudis wrote:
> Dear diary, on Fri, Jul 01, 2005 at 03:56:06PM CEST, I got a letter
> where "Eric W. Biederman" <ebiederm@xmission.com> told me that...
> 
>>"H. Peter Anvin" <hpa@zytor.com> writes:
>>
>>
>>>In the end, it might be that the right thing to do for git on kernel.org is to
>>>have a single, unified object store which isn't accessible by anything other
>>>than git-specific protocols.  There would have to be some way of dealing with,
>>>for example, conflicting tags that apply to different repositories, though.
>>
>>As far as I can tell public distributed tags are not that hard and if
>>you are going to be synching them it is probably worth working on.
>>
>>The basic idea is that instead of having one global tag of
>>'linux-2.6.13-rc1' you have a global tag of
>>'torvalds@osdl.org/linux-2.6.13-rc1'.
>>
>>The important part is that the tag namespace is made hierarchical
>>with at least 2 levels.  Where the top level is a globally
>>unique tag owner id and the bottom level is the actual tag.  This
>>prevents collisions when merging trees because two people's
>>tags are never in the same namespace, at least when
>>people are not actively hostile :)
> 
> 
> I don't know, I don't consider this very appealing myself. I'd rather
> prefer the private tags to be per-repository rather than per-user, since
> those ugly "merged-here", "broken" etc. tags aren't very useful on
> larger scope than of a repository. OTOH, what tags would be per-user,
> not per-repository and not global?
> 

He's talking about global tags, just using a "globally unique" 
namespace.  Which of course only works right if you genuinely can't 
create tags outside your assigned namespace.

	-hpa

* Re: Tags
  2005-07-01 18:37                   ` Tags H. Peter Anvin
@ 2005-07-01 21:20                     ` Matthias Urlichs
  2005-07-01 21:42                     ` Tags Petr Baudis
  1 sibling, 0 replies; 86+ messages in thread
From: Matthias Urlichs @ 2005-07-01 21:20 UTC (permalink / raw)
  To: git

Hi, H. Peter Anvin wrote:

> Which of course only works right if you genuinely can't 
> create tags outside your assigned namespace.

I'd rather say that you can't *push* the tags to the central server if
their namespace is wrong, but nothing would prevent you from *creating*
arbitrary tags in your own repository.
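
A sketch of the kind of check the receiving side could run before accepting
a tag, assuming tag names of the form "<owner>/<tag>" as proposed earlier in
the thread. The function name is hypothetical and how the server learns the
pusher's identity is left open:

```shell
# tag_push_allowed OWNER TAGNAME
# Succeed only if TAGNAME lives directly under OWNER's namespace,
# i.e. it looks like "OWNER/<something non-empty>".
tag_push_allowed() {
    case $2 in
        "$1"/?*) return 0 ;;
        *)       return 1 ;;
    esac
}

tag_push_allowed torvalds@osdl.org torvalds@osdl.org/linux-2.6.13-rc1 &&
    echo accepted
tag_push_allowed torvalds@osdl.org someone@example.com/linux-2.6.13-rc1 ||
    echo refused
```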

-- 
Matthias Urlichs   |   {M:U} IT Design @ m-u-it.de   |  smurf@smurf.noris.de
Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de
 - -
Habit is habit, and not to be flung out of the window by any man, but coaxed
down-stairs a step at a time.
		-- Mark Twain

* Re: Tags
  2005-07-01 18:37                   ` Tags H. Peter Anvin
  2005-07-01 21:20                     ` Tags Matthias Urlichs
@ 2005-07-01 21:42                     ` Petr Baudis
  2005-07-01 21:52                       ` Tags H. Peter Anvin
  1 sibling, 1 reply; 86+ messages in thread
From: Petr Baudis @ 2005-07-01 21:42 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Eric W. Biederman, Linus Torvalds, Daniel Barkalow,
	Git Mailing List, Junio C Hamano, ftpadmin

Dear diary, on Fri, Jul 01, 2005 at 08:37:55PM CEST, I got a letter
where "H. Peter Anvin" <hpa@zytor.com> told me that...
> Petr Baudis wrote:
> >Dear diary, on Fri, Jul 01, 2005 at 03:56:06PM CEST, I got a letter
> >where "Eric W. Biederman" <ebiederm@xmission.com> told me that...
> >
> >>"H. Peter Anvin" <hpa@zytor.com> writes:
> >>
> >>
> >>>In the end, it might be that the right thing to do for git on kernel.org 
> >>>is to
> >>>have a single, unified object store which isn't accessible by anything 
> >>>other
> >>>than git-specific protocols.  There would have to be some way of dealing 
> >>>with,
> >>>for example, conflicting tags that apply to different repositories, 
> >>>though.
> >>
> >>As far as I can tell public distributed tags are not that hard and if
> >>you are going to be synching them it is probably worth working on.
> >>
> >>The basic idea is that instead of having one global tag of
> >>'linux-2.6.13-rc1' you have a global tag of
> >>'torvalds@osdl.org/linux-2.6.13-rc1'.
> >>
> >>The important part is that the tag namespace is made hierarchical
> >>with at least 2 levels.  Where the top level is a globally
> >>unique tag owner id and the bottom level is the actual tag.  This
> >>prevents collisions when merging trees because two people's
> >>tags are never in the same namespace, at least when
> >>people are not actively hostile :)
> >
> >
> >I don't know, I don't consider this very appealing myself. I'd rather
> >prefer the private tags to be per-repository rather than per-user, since
> >those ugly "merged-here", "broken" etc. tags aren't very useful on
> >larger scope than of a repository. OTOH, what tags would be per-user,
> >not per-repository and not global?
> >
> 
> He's talking about global tags, just using a "globally unique" 
> namespace.  Which of course only works right if you genuinely can't 
> create tags outside your assigned namespace.

I doubt that's really useful either. Rather artificial mechanisms for
protection of the namespace would have to be deployed, and again, what
would it be good for anyway? If you are tagging linux-2.m.n, you are
probably whoever you should be - David, Alan, Marcelo, Linus, or whoever
else, while if you are tagging linux-2.m.n-cki, you are likely Con
Kolivas. I don't believe there is any (or much) potential for "natural"
conflicts and if you are malicious, you will just fake the namespace;
but frequently what's interesting about the tags is not the author at
all - I would consider it confusing to have to suddenly dive to another
namespace when Linus hands maintenance of linux-2.m to someone else.

The only significant value I can therefore see in the namespaces is
prevention of user mistakes, but I think the successful strategy here
would be just "upstream will notice", and make sure the upstream will be
notified properly (perhaps even interactively) about any new tags it
gets.

Ok, I admit that it boils down to me being lazy and that "it'd be more
typing!"... ;-)

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
<Espy> be careful, some twit might quote you out of context..

* Re: Tags
  2005-07-01 21:42                     ` Tags Petr Baudis
@ 2005-07-01 21:52                       ` H. Peter Anvin
  2005-07-01 22:27                         ` Tags Daniel Barkalow
  2005-07-01 22:59                         ` Tags Petr Baudis
  0 siblings, 2 replies; 86+ messages in thread
From: H. Peter Anvin @ 2005-07-01 21:52 UTC (permalink / raw)
  To: Petr Baudis
  Cc: Eric W. Biederman, Linus Torvalds, Daniel Barkalow,
	Git Mailing List, Junio C Hamano, ftpadmin

Petr Baudis wrote:
> 
> I doubt that's really useful either. Rather artificial mechanisms for
> protection of the namespace would have to be deployed, and again, what
> would it be good for anyway? If you are tagging linux-2.m.n, you are
> probably whoever you should be - David, Alan, Marcelo, Linus, or whoever
> else, while if you are tagging linux-2.m.n-cki, you are likely Con
> Kolivas. I don't believe there is any (or much) potential for "natural"
> conflicts and if you are malicious, you will just fake the namespace;
> but frequently what's interesting about the tags is not the author at
> all - I would consider it confusing to have to suddenly dive to another
> namespace when Linus hands maintenance of linux-2.m to someone else.
> 
> The only significant value I can therefore see in the namespaces is
> prevention of user mistakes, but I think the successful strategy here
> would be just "upstream will notice", and make sure the upstream will be
> notified properly (perhaps even interactively) about any new tags it
> gets.
> 
> Ok, I admit that it boils down to me being lazy and that "it'd be more
> typing!"... ;-)
> 

You're missing the whole point of the discussion.  Right now the only 
thing that makes a global object store impossible is the potential for a 
tag conflict, either intentional or accidental.

	-hpa

* Re: Tags
  2005-07-01 21:52                       ` Tags H. Peter Anvin
@ 2005-07-01 22:27                         ` Daniel Barkalow
  2005-07-01 22:59                         ` Tags Petr Baudis
  1 sibling, 0 replies; 86+ messages in thread
From: Daniel Barkalow @ 2005-07-01 22:27 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Petr Baudis, Eric W. Biederman, Linus Torvalds, Git Mailing List,
	Junio C Hamano, ftpadmin

On Fri, 1 Jul 2005, H. Peter Anvin wrote:

> You're missing the whole point of the discussion.  Right now the only 
> thing that makes a global object store impossible is the potential for a 
> tag conflict, either intentional or accidental.

Is there some issue remaining with having a global *object* store,
symlinked from multiple repositories, each with its own tags and
such? (I'd think that, in the refs, there would be more contention over
the heads than the tags, in any case; refs/heads/master is kind of
popular)

	-Daniel
*This .sig left intentionally blank*

* Re: Tags
  2005-07-01 21:52                       ` Tags H. Peter Anvin
  2005-07-01 22:27                         ` Tags Daniel Barkalow
@ 2005-07-01 22:59                         ` Petr Baudis
  1 sibling, 0 replies; 86+ messages in thread
From: Petr Baudis @ 2005-07-01 22:59 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Eric W. Biederman, Linus Torvalds, Daniel Barkalow,
	Git Mailing List, Junio C Hamano, ftpadmin

Dear diary, on Fri, Jul 01, 2005 at 11:52:51PM CEST, I got a letter
where "H. Peter Anvin" <hpa@zytor.com> told me that...
> You're missing the whole point of the discussion.  Right now the only 
> thing that makes a global object store impossible is the potential for a 
> tag conflict, either intentional or accidental.

Ok, I was arguing about something a bit different here, sorry.

The point of refs/tags/ should be to just indicate tags which we have in
the current head (remember that this structure comes from the times
before Dave, when the repository:"master branch" mapping was 1:1), since
those are usually the only objects you have in _your_ repository.  What's
the point of having tag linux-1.0.4-ac128 when you don't have the
linux-1.0.4-ac branch whatsoever?  The distinction of "public" vs
"private" tags here is really only that the "public" tags should be
propagated to your head when you merge the remote head.  This way, each
head will have its own set of tags, and it will be only tags which
actually reference objects relevant to the head.

Now that we can have many branches in a repository, each with its own
set of tags, we should probably extend the tags hierarchy to
refs/tags/<head>/<tagname>. And see, you can actually have that in the
global object store, as long as the head names are unique. But heads
don't propagate in any way so that's a purely administrative issue on
the global store side.

BTW, I don't think many (most?) heads being named "master" is a big issue.
That's how the head is called locally, and no one says that's how the
head should be known at the other side too. It's fine to have a head
called "master" in your repository and when pushing to the global object
store call it "pasky/linux-l33t" over there. (If you are using Cogito,
you can add that branch using a URL
proto://global/obj/store#pasky/linux-l33t.)

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
<Espy> be careful, some twit might quote you out of context..

* Re: "git-send-pack"
  2005-06-30 20:12   ` "git-send-pack" Linus Torvalds
  2005-06-30 20:23     ` "git-send-pack" H. Peter Anvin
@ 2005-06-30 20:49     ` Daniel Barkalow
  1 sibling, 0 replies; 86+ messages in thread
From: Daniel Barkalow @ 2005-06-30 20:49 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List, Junio C Hamano, ftpadmin

On Thu, 30 Jun 2005, Linus Torvalds wrote:

> On Thu, 30 Jun 2005, Daniel Barkalow wrote:
> > 
> > The right solution probably involves getting each pack file you push to
> > the mirrors as well as to the master. They'll probably update no less
> > frequently than you push, and they should go through a series of states
> > which matches the master, so it's not necessary to have anything smart on
> > master sending them, and they only have to unpack the files they get (and
> > update the refs afterward).
> 
> Hmm, yes. That would work, together with just fetching the heads.
> 
> It won't _really_ solve the problem, since the pushed pack objects will
> grow at a proportional rate to the current objects - it's just a constant
> factor (admittedly a potentially fairly _big_ constant factor)  
> improvement both in size and in number of files.
>
> So the mirroring ends up getting slowly slower and slower as the number of 
> pack files goes up. In contrast, a git-aware thing can be basically 
> constant-time, and mirroring expense ends up being relative to the size of 
> the change rather than the size of the repository.
> 
> But mirroring just pack-files might solve the problem for the foreseeable 
> future, so..

Whenever it gets slow, you could replace all the old packs with a single
new pack containing all the old objects; and master could repack whenever
it has a lot of pack files. That's pretty close to O(n) in change size.

Alternatively, having a reverse-ordered list of pack files would mean that
mirrors could just go through that list until they found one they already
had, and stop there, which would really be O(n).
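
A sketch of that second idea, assuming the master publishes a newest-first
list of pack file names and the mirror keeps its packs in a single
directory. The function name and the list format are invented for
illustration:

```shell
# packs_to_fetch PACKDIR
# Read a newest-first list of pack names on stdin and print the ones the
# mirror is missing, stopping at the first pack already present in PACKDIR.
packs_to_fetch() {
    while IFS= read -r pack; do
        [ -e "$1/$pack" ] && break
        printf '%s\n' "$pack"
    done
}
```

The work done is proportional to the number of new packs, not to the total
number of packs on the master, since the loop never looks past the first
pack the mirror already has.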

	-Daniel
*This .sig left intentionally blank*

