* "git-send-pack"
@ 2005-06-30 17:54 Linus Torvalds
From: Linus Torvalds @ 2005-06-30 17:54 UTC
To: Git Mailing List; +Cc: Daniel Barkalow, Junio C Hamano, ftpadmin
Ok,
I'm happy to say that the first cut of my new packed-object-sending thing
seems to work. I have successfully sent updates both locally and over ssh,
and it seems to work fine, although it has some limitations.
The syntax is very simple indeed:
    git-send-pack destination
will go to the destination (which can be either a local directory or a
remote ssh one, with the remote destination format currently being _only_
the "machine:path" format), and it will go through all the refs in the
remote destination, compare them with the local ones, and create a pack
that updates from one to the other.
If the pack/unpack sequence is successful, it then updates the refs at the
other end, and is done.
My quick tests were very successful, in the sense that it even performed
really well. But I only tested some small updates.
Anyway, what are the limitations? Here's a few obvious ones:
- the code actually contains support for limiting the refs to be updated
on the remote end, but I don't actually pass the arguments to the
remote git-receive-pack binary yet, so this is currently not
functional. Call me lazy.
- the thing currently refuses to create new refs. Again, this is mainly
just me being lazy: it should be easy to add support for creating a new
branch, it just requires some care to make sure that we take the old
branches into account when generating the pack-file so that we don't
send too many objects over.
- I really hate how "ssh" apparently cannot be told to have alternate
paths. For example, on master.kernel.org, I don't control the setup, so
I can't install my own git binaries anywhere except in my ~/bin
directory, but I also cannot get ssh to accept that that is a valid
path. This one really bums me out, and I think it's an ssh deficiency.
You apparently have to compile in the paths at compile-time into sshd,
and PermitUserEnvironment is disabled by default (not that it even
seems to work for the PATH environment, but that may have been my
testing that didn't re-start sshd).
That just sucks.
- It doesn't update the working directory at the other end. This is fine
for what it's intended for (pushing to a central "raw" git archive),
so this could be considered a feature, but it's worth pointing out.
Only a "pull" will update your working directory, and this pack sending
really is meant to be used in a kind of "push to central archive" way.
- this is also (at least once we've tested it a lot more and added the
code to allow it to create new refs on the remote side) meant to be a
good way to mirror things out, since clearly rsync isn't scaling.
However, I don't know what the rules for acceptable mirroring
approaches are, and it's entirely possible (nay, probable) that an ssh
connection from the "master" ain't it. It would be good to know what
(if any) would be acceptable solutions..
Anyway, please do give it a test. I think I'll use this to sync up to
kernel.org, except I _really_ would want to solve that ssh issue some
other way than hardcoding the /home/torvalds/bin/ path in my local
copies.. If somebody knows a good solution, pls holler.
Linus

* Re: "git-send-pack"
From: A Large Angry SCM @ 2005-06-30 18:24 UTC
To: Linus Torvalds; +Cc: Git Mailing List

Have you tried something like the following?

    ssh torvalds@master.kernel.org \
        '/bin/sh -c "export PATH=/tmp/foo:$PATH ; env"'

Linus Torvalds wrote:
> ...
>
> Anyway, please do give it a test. I think I'll use this to sync up to
> kernel.org, except I _really_ would want to solve that ssh issue some
> other way than hardcoding the /home/torvalds/bin/ path in my local
> copies.. If somebody knows a good solution, pls holler.

* Re: "git-send-pack"
From: A Large Angry SCM @ 2005-06-30 18:27 UTC
To: gitzilla; +Cc: Linus Torvalds, Git Mailing List

Damn! That should have been:

    ssh torvalds@master.kernel.org \
        '/bin/sh -c "export PATH=~/tmp/foo:$PATH ; env"'

A Large Angry SCM wrote:
> Have you tried something like the following?
>
>     ssh torvalds@master.kernel.org \
>         '/bin/sh -c "export PATH=/tmp/foo:$PATH ; env"'
>
> Linus Torvalds wrote:
>> ...
>>
>> Anyway, please do give it a test. I think I'll use this to sync up to
>> kernel.org, except I _really_ would want to solve that ssh issue some
>> other way than hardcoding the /home/torvalds/bin/ path in my local
>> copies.. If somebody knows a good solution, pls holler.

* Re: "git-send-pack"
From: Linus Torvalds @ 2005-06-30 19:04 UTC
To: A Large Angry SCM; +Cc: Git Mailing List

On Thu, 30 Jun 2005, A Large Angry SCM wrote:
>
> Have you tried something like the following?
>
>     ssh torvalds@master.kernel.org \
>         '/bin/sh -c "export PATH=/tmp/foo:$PATH ; env"'

The point is that the user does not call "ssh" itself, but git-send-pack
does it automatically. And that means that git-send-pack will always do
the same thing, for any host it is given. If one host needs a special
PATH, that's an effing pain.

However, Kees Cook points out that it's driver error: I set up my PATH in
.bash_profile, and if I just do it in .bashrc instead it all works.

Danke,

        Linus

* Re: "git-send-pack"
From: Jan Harkes @ 2005-06-30 18:45 UTC
To: Linus Torvalds; +Cc: Git Mailing List

On Thu, Jun 30, 2005 at 10:54:48AM -0700, Linus Torvalds wrote:
> Anyway, please do give it a test. I think I'll use this to sync up to
> kernel.org, except I _really_ would want to solve that ssh issue some
> other way than hardcoding the /home/torvalds/bin/ path in my local
> copies.. If somebody knows a good solution, pls holler.

I've got a couple of 'export FOO=bar' lines in ~/.bashrc on the
"remote-side" and it looks like they are set correctly when I do
something like "ssh remote.host env".

Jan

* Re: "git-send-pack"
From: Mike Taht @ 2005-06-30 19:01 UTC
To: Linus Torvalds; +Cc: Git Mailing List, Daniel Barkalow, Junio C Hamano, ftpadmin

> However, I don't know what the rules for acceptable mirroring
> approaches are, and it's entirely possible (nay, probable) that an ssh
> connection from the "master" ain't it. It would be good to know what
> (of any) would be acceptable solutions..

Flute, perhaps

    http://www.atm.tut.fi/mad/

or fcast

    http://www.inrialpes.fr/planete/people/roca/mcl/mcl.html

* Re: "git-send-pack"
From: Linus Torvalds @ 2005-06-30 19:42 UTC
To: Mike Taht; +Cc: Git Mailing List, Daniel Barkalow, Junio C Hamano, ftpadmin

On Thu, 30 Jun 2005, Mike Taht wrote:
>
> > However, I don't know what the rules for acceptable mirroring
> > approaches are, and it's entirely possible (nay, probable) that an ssh
> > connection from the "master" ain't it. It would be good to know what
> > (of any) would be acceptable solutions..
>
> Flute, perhaps
>
> http://www.atm.tut.fi/mad/

Well, I was hoping for something that has git knowledge, since there are
issues like updating objects in the right order.

So "git-send-pack" is nice in many ways: it allows you to update any
number of branches (in particular, it allows you to update just a _subset_
of the branches, which is nice if you have a shared central repository,
and some people have write permissions to some branches but not to
others), but it also allows for efficient unpacking on the receiver side
in a way no "general-purpose" mirror program can really match.

However, that requires the receiver to run a git-aware unpacker (in this
case git-receive-pack). I'm hoping that would be acceptable, I'm just
wondering what kind of safety concerns I'd need to make sure of in order
to make people comfortable running a special receiver program.

So the current approach is very flexible: if the pusher has ssh access, he
can do it. Safe, secure, and no new security issues. And since the only
programs the receiver has to be able to run is two git programs
(git-receive-pack will run git-unpack-objects), maybe it would be ok to
even have "git-receive-pack" as the shell for the receiver side, so that
you don't actually give the mirrorer any shell access at all.

But it's still "push-based" in the sense that it's kernel.org that is
doing the pushing, and that may simply not be acceptable.

        Linus

* Re: "git-send-pack"
From: Matthias Urlichs @ 2005-07-01 9:50 UTC
To: git

Hi, Linus Torvalds wrote:

> maybe it would be ok to
> even have "git-receive-pack" as the shell for the receiver side, so that
> you don't actually give the mirrorer any shell access at all.

You can probably just set the remote command (in ~/.ssh/authorized_keys)
to git-receive-pack. That also works around any $PATH issues.

Once this is stable, master.kernel.org should be updated with the latest
git.

--
Matthias Urlichs | {M:U} IT Design @ m-u-it.de | smurf@smurf.noris.de
Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de
 - -
People are never so ready to believe you as when you say things in
dispraise of yourself; and you are never so much annoyed as when they
take you at your word.
        -- Somerset Maugham

* Re: "git-send-pack"
From: Linus Torvalds @ 2005-06-30 19:44 UTC
To: Git Mailing List; +Cc: Daniel Barkalow, Junio C Hamano, ftpadmin

On Thu, 30 Jun 2005, Linus Torvalds wrote:
>
> Anyway, please do give it a test. I think I'll use this to sync up to
> kernel.org

In fact, the most recent push was done with a

    git-send-pack master.kernel.org:/pub/scm/linux/kernel/git/torvalds/git.git

so if the new commit ("Do ref matching on the sender side rather than on
receiver") shows up after the mirrors have caught up, then this thing is
officially in production use..

        Linus

* Re: "git-send-pack"
From: Junio C Hamano @ 2005-06-30 20:38 UTC
To: Linus Torvalds; +Cc: Daniel Barkalow, Junio C Hamano, git

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> In fact, the most recent push was done with a

LT>     git-send-pack master.kernel.org:/pub/scm/linux/kernel/git/torvalds/git.git

Congrats for a job well done.

Now is there anything for us poor mortals who would want to have
a "pull" support? Logging in via ssh and running send-pack on the
other end is workable but not so pretty ;-).

* Re: "git-send-pack"
From: Daniel Barkalow @ 2005-06-30 21:05 UTC
To: Junio C Hamano; +Cc: Linus Torvalds, git

On Thu, 30 Jun 2005, Junio C Hamano wrote:

> >>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:
>
> LT> In fact, the most recent push was done with a
>
> LT>     git-send-pack master.kernel.org:/pub/scm/linux/kernel/git/torvalds/git.git
>
> Congrats for a job well done.
>
> Now is there anything for us poor mortals who would want to have
> a "pull" support? Logging in via ssh and running send-pack on the
> other end is workable but not so pretty ;-).

I suspect that I'll be able to merge send-pack/receive-pack with
ssh-push/ssh-pull this evening, and then it'll have the feature of not
caring too much which side your command line is on.

        -Daniel
*This .sig left intentionally blank*

* Re: "git-send-pack"
From: Linus Torvalds @ 2005-06-30 21:29 UTC
To: Daniel Barkalow; +Cc: Junio C Hamano, git

On Thu, 30 Jun 2005, Daniel Barkalow wrote:
>
> I suspect that I'll be able to merge send-pack/receive-pack with
> ssh-push/ssh-pull this evening, and then it'll have the feature of not
> caring too much which side your command line is on.

The simple thing to do is to just get one commit at a time, see if you
have it already, parse it if not, and go on to the parents. That would fit
the current git-pull thing, and may be good enough, but it has the
downside that it can need a _lot_ of back-and-forth fetching of commit
objects from the other side until you find the one you want. That's going
to be _very_ slow over a high-latency connection.

So what I'd suggest is:

 - puller starts by just asking "what's your SHA1 for the ref I want"

   The puller wants to know this, because a common case may be that it
   already has it, in which case it doesn't need to do anything. But more
   importantly, the puller will need to know this anyway if it gets an
   object-pack, so that the puller can update its FETCH_HEAD.

 - if puller doesn't have it, then the _puller_ does: "git-rev-list
   my-current-refs" to generate an in-date-order list of commits it has,
   and it starts feeding the result in chunks of 100 entries or something
   to the other end.

 - now, the server sees this stream of SHA1's that the client has, and
   it can very cheaply just test "do I have this SHA1". Now, if the client
   hasn't made any changes at all, then the first one will be a hit, and
   we already have sufficient knowledge to tell what the difference
   between the client and the server is.

   But more importantly, even if the client _has_ made changes, the client
   likely has more available CPU than the server has, _and_ the client
   likely has a shorter list of changes than the server has, so it's
   really the client that should do this. We should burden the server as
   lightly as possible for this to scale.

 - At some point the server sees the first SHA1 it recognizes, and at that
   point the server will have to start working. It will just send back an
   "ok, got it" message (telling the client to not bother continuing to
   send it any more commit ID's), and then does

        git-rev-list --objects ref-client-wants ^first-common-sha1 |
            git-pack-objects --stdout

 - the client just unpacks the objects, and if successful, it puts the new
   top ref it got into FETCH_HEAD. It's now done.

And I do _not_ think that it makes a lot of sense to try to be symmetric.
For one thing, while a "git-send-pack" should update all the refs
in-place, a "git-pull-pack" should _not_ update the ref, it should just
set FETCH_HEAD instead and the puller can decide what he wants to do with
that ref (possibly merge it, but possibly just make it be a new local
branch "remote-branch").

So I think sending and receiving are fundamentally non-symmetric.

        Linus
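
[Editorial sketch, not part of the original message.] The negotiation
proposed above can be modelled in a few lines of Python. This is only a toy
simulation under stated assumptions: commits are short strings rather than
SHA1s, the "server" is an in-memory set, and the names `negotiate`,
`commits_to_send` and `chunk_size` are hypothetical, not anything from git.

```python
from itertools import islice


def chunks(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def negotiate(client_commits, server_has, chunk_size=100):
    """Stream the client's commits (newest first) in chunks.

    Returns (first commit the server recognizes or None, ids sent so far).
    The server only answers a cheap "do I have this id" question.
    """
    sent = 0
    for batch in chunks(client_commits, chunk_size):
        sent += len(batch)
        for commit_id in batch:
            if commit_id in server_has:
                return commit_id, sent  # server says "ok, got it"
    return None, sent


def commits_to_send(parents, wanted, common):
    """Commits reachable from `wanted` but not from `common`.

    Mirrors the idea of "rev-list wanted ^common" on a toy parent map.
    """
    def reachable(start):
        seen, stack = set(), [start]
        while stack:
            commit = stack.pop()
            if commit in seen:
                continue
            seen.add(commit)
            stack.extend(parents.get(commit, ()))
        return seen

    excluded = reachable(common) if common is not None else set()
    return reachable(wanted) - excluded
```

With a shared history a-b-c, a local client commit x on top of c, and
server commits d and e on top of c, the client streams "x, c, ..." and the
server stops it at c; only d and e then need to go into the pack.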

* Re: "git-send-pack"
From: H. Peter Anvin @ 2005-06-30 21:55 UTC
To: Linus Torvalds; +Cc: Daniel Barkalow, Junio C Hamano, git

It seems to me that git always defines a DAG of objects, such that if you
have a list of terminals (defined as objects not referenced by other
objects), you can, given access to the same objects, figure out all
intervening objects. The tricky bit becomes finding the DAG both sides
have in common with as little traffic as possible.

For producing minimum network traffic, I think something like this would
work:

a) The sender sends a list of its terminals to the receiver.

b) The receiver sends a list of nodes it needs, plus a list of all its
   own meta-terminals, obtained by pruning its own DAG according to the
   terminals list of the sender.

c) This may have to be performed iteratively? I need to sit down and
   work out the exact algorithm for all cases, including branch trees and
   multi-rooted DAGs.

d) Once the sender knows the subset of its own DAG available to the
   receiver, it can transmit either all objects that it has and the
   receiver does not, or all objects on the path to one or more specific
   objects (e.g. HEAD.)

        -hpa

* Re: "git-send-pack"
From: Linus Torvalds @ 2005-06-30 22:26 UTC
To: H. Peter Anvin; +Cc: Daniel Barkalow, Junio C Hamano, git

On Thu, 30 Jun 2005, H. Peter Anvin wrote:
>
> For producing minimum network traffic, I think something like this would
> work:

In the "minimum traffic", the thing to look at is number of packets, and
penalize further for anything that requires a synchronous reply.

That's why I'd suggest just letting the client stream out the list of
objects it has - it may appear wasteful to stream out even a thousand
SHA1's, but hey, that's just 20kB worth of data, and especially if there
is no synchronous stuff, that's just 15 ethernet packets.

For the server side, looking up a thousand SHA's is pretty easy (it's
_really_ cheap if the server ends up using a few big packed objects: you
don't even have to look at the pack data itself, it can look at just the
index and say "yup, I've got it")

So I'd go for simple brute force over anything that needs to discuss
things and have a back-and-forth between server/client. And making the
client do the heavy lifting is the right thing to do (the server will have
to create the pack, which can be expensive, but you can tune the delta
window for how much CPU the server has)

        Linus

* Re: "git-send-pack"
From: H. Peter Anvin @ 2005-06-30 23:40 UTC
To: Linus Torvalds; +Cc: Daniel Barkalow, Junio C Hamano, git

Linus Torvalds wrote:
>
> On Thu, 30 Jun 2005, H. Peter Anvin wrote:
>
>>For producing minimum network traffic, I think something like this would
>>work:
>
> In the "minimum traffic", the thing to look at is number of packets, and
> penalize further for anything that requires a synchronous reply.
>
> That's why I'd suggest just letting the client stream out the list of
> objects it has - it may appear wasteful to stream out even a thousand
> SHA1's, but hey, that's just 20kB worth of data, and especially if there
> is no synchronous stuff, that's just 15 ethernet packets.
>

In your linux-2.6 tree, there are currently 54,204 objects, and that is
after less than one full 2.6.x kernel release cycle. That's a megabyte
of SHA1s.

In /pub/scm on kernel.org, there are currently 1,815,573 objects or hard
links to objects, which would take a 36.3 MB list to enumerate. That is
better than what rsync does (it encodes this list in ASCII, with
pathnames and all, and ends up closer to 200 MB), but it isn't
fundamentally different.

        -hpa

* Re: "git-send-pack"
From: Linus Torvalds @ 2005-07-01 0:02 UTC
To: H. Peter Anvin; +Cc: Daniel Barkalow, Junio C Hamano, git

On Thu, 30 Jun 2005, H. Peter Anvin wrote:
>
> In your linux-2.6 tree, there are currently 54,204 objects, and that is
> after less than one full 2.6.x kernel release cycle. That's a megabyte
> of SHA1s.

But that's _all_ objects. There are "only" 4040 commit objects (which are
always the starting point for a search).

So streaming out the commit objects a few hundred at a time is actually
a very simple strategy.

Also, note that the server is usually _more_ ahead than the client is, and
the server is the one that potentially has lots of commits that the
client doesn't have. Not the other way around. So if the client makes a
list of its top commits, it almost certainly won't have to make a very
long list until the server can tell it "ok, stop, I've seen it".

Yeah, maybe we want to limit the "burst" to 70 sha1's, since that will fit
in a regular-sized ethernet packet, but whatever - you'd burst out your
commits "latest first", so you'd never even get to the current 4040 unless
you've literally done the kind of work we've done in the git tree for the
last 3 months _and_you've_not_pulled_from_that_server_in_the_whole_time_.

        Linus
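
[Editorial sketch, not part of the original message.] The sizes quoted in
the last few messages can be checked with a little arithmetic. The sketch
below assumes SHA1s are sent as raw 20-byte values and that one packet
carries 70 of them (1400 bytes), as the "burst" figure above suggests; the
constant and function names are made up for illustration.

```python
SHA1_BYTES = 20                         # raw binary SHA1, as the thread's figures assume
SHA1S_PER_PACKET = 70                   # the "burst" size suggested in the thread
PAYLOAD_BYTES = SHA1_BYTES * SHA1S_PER_PACKET  # 1400 bytes per packet (assumption)


def list_bytes(count):
    """Size in bytes of a list of `count` raw SHA1s."""
    return count * SHA1_BYTES


def packets(count):
    """Packets needed for `count` SHA1s at 70 per packet (ceiling division)."""
    return -(-list_bytes(count) // PAYLOAD_BYTES)


if __name__ == "__main__":
    for label, count in [
        ("a thousand SHA1s", 1000),
        ("commits in linux-2.6", 4040),
        ("all objects in linux-2.6", 54204),
        ("all objects under /pub/scm", 1815573),
    ]:
        print(f"{label}: {list_bytes(count)} bytes, {packets(count)} packets")
```

Under these assumptions a thousand SHA1s come to 20,000 bytes in 15
packets, the 54,204 objects to about 1.08 MB, the 1,815,573 objects to
about 36.3 MB, and the 4040 commits to roughly 81 kB.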

* Re: "git-send-pack"
From: H. Peter Anvin @ 2005-07-01 1:24 UTC
To: Linus Torvalds; +Cc: Daniel Barkalow, Junio C Hamano, git

Linus Torvalds wrote:
>
> On Thu, 30 Jun 2005, H. Peter Anvin wrote:
>
>>In your linux-2.6 tree, there are currently 54,204 objects, and that is
>>after less than one full 2.6.x kernel release cycle. That's a megabyte
>>of SHA1s.
>
> But that's _all_ objects. There are "only" 4040 commit objects (which are
> always the starting point for a search).
>

Well, there are objects that reference commit objects (e.g. tag objects),
not the other way around, but your point is well taken.

> So streaming out the commit objects a few hundred at a time is actually
> a very simple strategy.
>
> Also, note that the server is usually _more_ ahead than the client is, and
> the server is the one that potentially has lots of commits that the
> client doesn't have. Not the other way around. So if the client makes a
> list of its top commits, it almost certainly won't have to make a very
> long list until the server can tell it "ok, stop, I've seen it".

Well, what I proposed was pretty much that except to have the client
(receiver) start first. I prefer calling it sender and receiver, because
in the case of upload and download you have different sides being the
"server".

> Yeah, maybe we want to limit the "burst" to 70 sha1's, since that will fit
> in a regular-sized ethernet packet, but whatever - you'd burst out your
> commits "latest first", so you'd never even get to the current 4040 unless
> you've literally done the kind of work we've done in the git tree for the
> last 3 months _and_you've_not_pulled_from_that_server_in_the_whole_time_.

Well, in the common case (sender has a superset of receiver), what I
proposed would converge on the first iteration.

I'm not even convinced that the algorithm *ever* needs to iterate.

        -hpa

* Re: "git-send-pack"
From: Mike Taht @ 2005-07-01 23:44 UTC
To: Linus Torvalds; +Cc: H. Peter Anvin, Daniel Barkalow, Junio C Hamano, git

Linus Torvalds wrote:

> Also, note that the server is usually _more_ ahead than the client is, and
> the server is the one that potentially has lots of commits that the
> client doesn't have. Not the other way around. So if the client makes a
> list of its top commits, it almost certainly won't have to make a very
> long list until the server can tell it "ok, stop, I've seen it".
>
> Yeah, maybe we want to limit the "burst" to 70 sha1's, since that will fit
> in a regular-sized ethernet packet, but whatever - you'd burst out your
> commits "latest first", so you'd never even get to the current 4040 unless
> you've literally done the kind of work we've done in the git tree for the
> last 3 months _and_you've_not_pulled_from_that_server_in_the_whole_time_.

You are getting closer and closer to where something like bitTorrent or
a multicast protocol makes sense. The problem isn't just the number of
outstanding commit objects but the number of machines and developers
that want to grab those commits at the same time.

Mike Taht
PostCards From The Bleeding Edge
http://the-edge.blogspot.com
"Tempel 1 worth 2.2 million trillion bux"

* Re: "git-send-pack"
From: H. Peter Anvin @ 2005-07-02 0:07 UTC
To: Mike Taht; +Cc: Linus Torvalds, Daniel Barkalow, Junio C Hamano, git

Mike Taht wrote:
>
> You are getting closer and closer to where something like bitTorrent or
> a multicast protocol makes sense. The problem isn't just the number of
> outstanding commit objects but the number of machines and developers
> that want to grab those commits at the same time.
>

Not really.

        -hpa

* Re: "git-send-pack"
From: Linus Torvalds @ 2005-07-02 1:56 UTC
To: Mike Taht; +Cc: H. Peter Anvin, Daniel Barkalow, Junio C Hamano, git

On Fri, 1 Jul 2005, Mike Taht wrote:
>
> You are getting closer and closer to where something like bitTorrent or
> a multicast protocol makes sense. The problem isn't just the number of
> outstanding commit objects but the number of machines and developers
> that want to grab those commits at the same time.

I don't think so.

First off, I don't think the decision is kernel-specific, in the sense
that I at least use git for sparse and git itself too, so the solution
should make sense for small projects as well.

Also, even for the kernel, the total dataset right now (after three months
or whatever) is a 60MB pack. It's not like we're sending DVD's or even
CD's worth of data around - we're sending the equivalent of 20MB per
_month_. That's really not a lot of data. You could easily keep up with a
slow modem.

Also, the number of people involved isn't _that_ big. We're talking a few
thousand people who actively would update their trees for a big project,
and many smaller projects have anything from a couple to maybe a hundred.
A few mirrors, and you don't have any problem.

So I think that the problem is actually not that big, and we just need to
find an acceptable format. Quite frankly, it might be perfectly acceptable
for kernel.org to run a simple packing script once a week which packs
everything into one single file, and even if that means that the mirrors
will have to re-get everything once a week, that actually sounds
acceptable.

It's obviously a _stupid_ way to handle the rsync problem, so there's
bound to be some cleaner solution, but the point is that we can probably
make mirroring acceptable even with a really really stupid approach. I'd
be a bit ashamed of just how ugly it is, but it would likely _work_ fine.
You'd create 52 pack-files in a year, but each pack-file is likely just
ten megabytes.

Oh, each pack-file should also be associated with the list of "refs" that
were used to generate that pack-file, so make that 104 files per project
year (but the list of "refs" would usually be something small, like

    refs/heads/master       4a89a04f1ee21a7c1f4413f1ad7dcfac50ff9b63
    refs/tags/v2.6.11       5dc01c595e6c6ec9ccda4f6f69c131c0dd945f8c
    refs/tags/v2.6.11-tree  5dc01c595e6c6ec9ccda4f6f69c131c0dd945f8c
    refs/tags/v2.6.12       26791a8bcf0e6d33f43aef7682bdb555236d56de
    refs/tags/v2.6.12-rc2   9e734775f7c22d2f89943ad6c745571f1930105f
    refs/tags/v2.6.12-rc3   0397236d43e48e821cce5bbe6a80a1a56bb7cc3a
    refs/tags/v2.6.12-rc4   ebb5573ea8beaf000d4833735f3e53acb9af844c
    refs/tags/v2.6.12-rc5   06f6d9e2f140466eeb41e494e14167f90210f89d
    refs/tags/v2.6.12-rc6   701d7ecec3e0c6b4ab9bb824fd2b34be4da63b7e
    refs/tags/v2.6.13-rc1   733ad933f62e82ebc92fed988c7f0795e64dea62

which was trivially generated from my current tree with

    for i in refs/*/*; do echo -ne $i"\t"; cat $i; done

so now you can use the refs associated with the previous pack-file as the
list of refs you're _not_ interested in, and the current list of refs as
the list you _are_ interested in, and generate the new pack-file.

Generating the pack-file would literally be something like

    obj=$(git-rev-parse $(cut -f2 new-list) --not $(cut -f2 old-list))
    git-rev-list $obj | git-pack-objects --stdin > new-pack

so a few one-liners like this, run from a cron-job once a week, should
just do it.

        Linus
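
[Editorial sketch, not part of the original message.] The "new refs minus
old refs" step above can be illustrated in Python. This only models how
the two ref lists turn into a revision specification (positive SHA1s for
the new list, "^"-prefixed SHA1s for the old list, which is what a "--not"
style negation amounts to); it runs no git commands, and the function
names are hypothetical.

```python
def parse_ref_list(text):
    """Parse "refname<TAB>sha1" lines into an ordered {refname: sha1} dict."""
    refs = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, sha1 = line.split("\t")
        refs[name] = sha1.strip()
    return refs


def revision_spec(old_refs, new_refs):
    """Build the revision arguments for an incremental pack.

    SHA1s from the new list are wanted; SHA1s from the old list are
    excluded by prefixing them with "^". Duplicates are dropped while
    keeping the original order.
    """
    wanted = list(dict.fromkeys(new_refs.values()))
    excluded = list(dict.fromkeys(old_refs.values()))
    return wanted + ["^" + sha1 for sha1 in excluded]
```

For an old list holding only v2.6.12 and a new list that adds v2.6.13-rc1,
the result names both tag SHA1s as wanted and excludes everything
reachable from the v2.6.12 one, so only the objects added since the
previous pack are selected.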

* Re: "git-send-pack"
From: H. Peter Anvin @ 2005-07-02 4:08 UTC
To: Linus Torvalds; +Cc: Mike Taht, Daniel Barkalow, Junio C Hamano, git

Linus Torvalds wrote:
>
> Also, the number of people involved isn't _that_ big. We're talking a few
> thousand people who actively would update their trees for a big project,
> and many smaller projects have anything from a couple to maybe a hundred.
> A few mirrors, and you don't have any problem.
>
> So I think that the problem is actually not that big, and we just need to
> find an acceptable format. Quite frankly, it might be perfectly acceptable
> for kernel.org to run a simple packing script once a week which packs
> everything into one single file, and even if that means that the mirrors
> will have to re-get everything once a week, that actually sounds
> acceptable.
>
> It's obviously a _stupid_ way to handle the rsync problem, so there's
> bound to be some cleaner solution, but the point is that we can probably
> make mirroring acceptable even with a really really stupid approach. I'd
> be a bit ashamed of just how ugly it is, but it would likely _work_ fine.
> You'd create 52 pack-files in a year, but each pack-file is likely just
> ten megabytes each.
>

Any reason not to simply append objects to an existing packfile? It
really seems like an easy solution, and should have relatively good I/O
patterns to boot simply because it naturally creates a topological sort
of the objects.

        -hpa
* Re: "git-send-pack"
2005-07-02 4:08 ` "git-send-pack" H. Peter Anvin
@ 2005-07-02 4:22 ` Linus Torvalds
2005-07-02 4:29 ` "git-send-pack" H. Peter Anvin
0 siblings, 1 reply; 86+ messages in thread
From: Linus Torvalds @ 2005-07-02 4:22 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Mike Taht, Daniel Barkalow, Junio C Hamano, git

On Fri, 1 Jul 2005, H. Peter Anvin wrote:
>
> Any reason not to simply append objects to an existing packfile?

What happens when somebody screws up in the middle?

The one thing I care about more than anything else is consistency. We are
careful about writing objects in the right order, and we can re-create the
state from the originator etc. But if we start appending stuff and
something goes wrong in the middle, I'm just not going to touch it. A
"truncate and hope for the best" algorithm?

Besides, the result is not a valid git archive any more.

Linus

^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: "git-send-pack"
2005-07-02 4:22 ` "git-send-pack" Linus Torvalds
@ 2005-07-02 4:29 ` H. Peter Anvin
2005-07-02 17:16 ` "git-send-pack" Linus Torvalds
0 siblings, 1 reply; 86+ messages in thread
From: H. Peter Anvin @ 2005-07-02 4:29 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Mike Taht, Daniel Barkalow, Junio C Hamano, git

Linus Torvalds wrote:
>
> On Fri, 1 Jul 2005, H. Peter Anvin wrote:
>
>>Any reason not to simply append objects to an existing packfile?
>
>
> What happens when somebody screws up in the middle?
>
> The one thing I care about more than anything else is consistency. We are
> careful about writing objects in the right order, and we can re-create the
> state from the originator etc. But if we start appending stuff and
> something goes wrong in the middle, I'm just not going to touch it. A
> "truncate and hope for the best" algorithm?
>
> Besides, the result is not a valid git archive any more.
>

It's a log. It's a standard technique to append entries to a log. The
requirements for this to always be consistent are that a) it's possible
to know when the entry/entries at the end are inconsistent and b) it's
always possible to roll back the log to a consistent state.

This is normally done with commit records (write data - fdatasync - write
commit record - fdatasync), but in the case of git, the commit record
isn't required because each git record is self-validating. This is an
incredibly powerful property.

If the log is written in topological sort order, then even a truncated
log file is a valid (subset) git object store.

-hpa

^ permalink raw reply [flat|nested] 86+ messages in thread
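[Editorial sketch, not part of the original message.] The rollback argument
can be shown with a toy log. The format here is an assumption made purely
for illustration and is not git's pack or object format: one record per
line, written as "<sha1-of-payload> <payload>". The helper names are
invented, and `sha1sum` from GNU coreutils is assumed to be installed. A
torn write at the end fails validation, so the log is cut back to the last
record that still checks out, with no separate commit record needed.

```shell
# record_id TEXT: SHA1 of TEXT, standing in for a self-validating object name.
record_id() {
    printf '%s' "$1" | sha1sum | cut -d' ' -f1
}

# append_record LOG TEXT: append one "<id> <payload>" line to LOG.
append_record() {
    printf '%s %s\n' "$(record_id "$2")" "$2" >> "$1"
}

# valid_prefix LOG: print how many leading records validate. A final line
# without a trailing newline (a torn write) is never counted.
valid_prefix() {
    n=0
    while IFS= read -r line; do
        id=${line%% *}
        payload=${line#* }
        [ "$id" = "$(record_id "$payload")" ] || break
        n=$((n + 1))
    done < "$1"
    echo "$n"
}

# rollback LOG: keep only the longest valid prefix of LOG.
rollback() {
    keep=$(valid_prefix "$1")
    awk -v n="$keep" 'NR <= n' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}
```

This only covers requirements a) and b) from the message. Whether the
surviving prefix is also a usable object store depends on the write order,
which is the topological-sort point made above.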
* Re: "git-send-pack"
2005-07-02 4:29 ` "git-send-pack" H. Peter Anvin
@ 2005-07-02 17:16 ` Linus Torvalds
2005-07-02 17:37 ` "git-send-pack" H. Peter Anvin
2005-07-02 17:44 ` "git-send-pack" Tony Luck
0 siblings, 2 replies; 86+ messages in thread
From: Linus Torvalds @ 2005-07-02 17:16 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Mike Taht, Daniel Barkalow, Junio C Hamano, git

On Fri, 1 Jul 2005, H. Peter Anvin wrote:
>
> It's a log.

..but that's not what we're looking for. I'm not looking for kernel.org to
be my distributed backup tape.

For it to be useful, it must do more than just log all activity and mirror
it out via rsync. It must also be usable for people pulling on it. Which
means that it has to be a valid git archive or at least easily
incrementally unpackable, so that people can actually use the end result.

A log of packs that are just incremented is certainly unpackable: you
teach git-unpack-objects to just unpack several packs after each other.
But since it's not seekable, you'd have to unpack a 100MB compressed
archive just to get the last tip of it that you don't have unpacked yet.

Also, it means that it's impossible to efficiently do a git-specific
thing. I want people to be able to do what we used to be able to do with
BK: just do a

    git pull master.kernel.org:xxxx

and get something useful. And that means _not_ having to pull a 100MB blob
to get the last objects at the end.

And don't tell me "rsync can efficiently get just the end". That's true
for _mirrors_, but it's not true for users that don't have every single
archive on kernel.org. I don't have (and I don't want to have) a copy of
every single person's log that ever might want to push to me.

So no, a log simply isn't useful. It _has_ to be a valid git archive to be
useful. Thousands of objects satisfy that. Or a "few packs + few objects".
Not a log.

Linus

^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: "git-send-pack"
2005-07-02 17:16 ` "git-send-pack" Linus Torvalds
@ 2005-07-02 17:37 ` H. Peter Anvin
2005-07-02 17:44 ` "git-send-pack" Tony Luck
1 sibling, 0 replies; 86+ messages in thread
From: H. Peter Anvin @ 2005-07-02 17:37 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Mike Taht, Daniel Barkalow, Junio C Hamano, git

Linus Torvalds wrote:
>
> ..but that's not what we're looking for. I'm not looking for kernel.org to
> be my distributed backup tape.
>
> For it to be useful, it must do more than just log all activity and mirror
> it out via rsync. It must also be usable for people pulling on it. Which
> means that it has to be a valid git archive or at least easily
> incrementally unpackable, so that people can actually use the end result.
>
> A log of packs that are just incremented is certainly unpackable: you
> teach git-unpack-objects to just unpack several packs after each other.
> But since it's not seekable, you'd have to unpack a 100MB compressed
> archive just to get the last tip of it that you don't have unpacked yet.
>

Agreed, you also need an index file. The index file can be recreated from
the log file in case of corruption, but is what you'd use to seek directly
to an object.

-hpa

^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: "git-send-pack"
2005-07-02 17:16 ` "git-send-pack" Linus Torvalds
2005-07-02 17:37 ` "git-send-pack" H. Peter Anvin
@ 2005-07-02 17:44 ` Tony Luck
2005-07-02 17:48 ` "git-send-pack" H. Peter Anvin
1 sibling, 1 reply; 86+ messages in thread
From: Tony Luck @ 2005-07-02 17:44 UTC (permalink / raw)
To: Linus Torvalds
Cc: H. Peter Anvin, Mike Taht, Daniel Barkalow, Junio C Hamano, git

Here's another approach. Teach the variants of git-pull to look for
a file that names an alternate repository that should be used to get
any object that is referenced in the repository, but doesn't exist in it.

At least part of the problem for kernel.org is that there are around 50
repositories that are tracking the 2.6 kernel. All of them have 50,000
objects that are duplicates of each other ... and a few hundred 'unique'
objects that belong to just one repo, or are minimally shared.

If there was a way to specify an alternate repo, then a large GIT server
like kernel.org could set up a "git-history"[1] repo which each of the
hosted repos could point to. Then a cron job could look for duplicates,
and move them off to the history area.

-Tony

[1] Different projects, like git and sparse, might never have any common
files with the Linux kernel ... but they can all share the same history.

^ permalink raw reply [flat|nested] 86+ messages in thread
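[Editorial sketch, not part of the original message.] The cron job described
above might look roughly like this. It assumes loose objects stored as
`xx/rest-of-sha1` files directly under each object directory, and it relies
on the proposed alternate-repository lookup existing so that a repository
can still find an object once it has moved. The function name
`dedupe_objects` and the directory arguments are invented for the example.
Because object names are content hashes, two files with the same relative
path are treated as the same object without comparing their contents.

```shell
# dedupe_objects HISTORY_OBJECTS REPO_OBJECTS...
# Move every object that occurs in more than one REPO_OBJECTS directory into
# HISTORY_OBJECTS, leaving objects unique to one repository where they are.
dedupe_objects() {
    history=$1; shift
    # List "xx/name" paths from every repository; uniq -d keeps the ones
    # that were listed more than once.
    for repo in "$@"; do
        (cd "$repo" && ls -d ??/* 2>/dev/null)
    done | sort | uniq -d |
    while IFS= read -r rel; do
        for repo in "$@"; do
            [ -f "$repo/$rel" ] || continue
            if [ -f "$history/$rel" ]; then
                rm -f "$repo/$rel"              # history already holds it
            else
                mkdir -p "$history/${rel%/*}"
                mv "$repo/$rel" "$history/$rel" # first copy becomes the shared one
            fi
        done
    done
}
```

A call such as `dedupe_objects /pub/history/objects repoA/objects repoB/objects`
(paths hypothetical) would leave only the per-repository 'unique' objects in
each hosted repo.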
* Re: "git-send-pack"
2005-07-02 17:44 ` "git-send-pack" Tony Luck
@ 2005-07-02 17:48 ` H. Peter Anvin
2005-07-02 18:12 ` "git-send-pack" A Large Angry SCM
0 siblings, 1 reply; 86+ messages in thread
From: H. Peter Anvin @ 2005-07-02 17:48 UTC (permalink / raw)
To: Tony Luck; +Cc: Linus Torvalds, Mike Taht, Daniel Barkalow, Junio C Hamano, git

Tony Luck wrote:
>
> At least part of the problem for kernel.org is that there are around 50
> repositories that are tracking the 2.6 kernel. All of them have 50,000
> objects that are duplicates of each other ... and a few hundred 'unique'
> objects that belong to just one repo, or are minimally shared.
>
> If there was a way to specify an alternate repo, then a large GIT server
> like kernel.org could set up a "git-history"[1] repo which each of the
> hosted repos could point to. Then a cron job could look for duplicates,
> and move them off to the history area.
>

This is why I've been talking about a global object repository --
including the problems associated with them. git as it currently stands
permits a single global object store, *except* for the issue of duplicate
tags.

-hpa

^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: "git-send-pack"
2005-07-02 17:48 ` "git-send-pack" H. Peter Anvin
@ 2005-07-02 18:12 ` A Large Angry SCM
0 siblings, 0 replies; 86+ messages in thread
From: A Large Angry SCM @ 2005-07-02 18:12 UTC (permalink / raw)
To: git

H. Peter Anvin wrote:
> Tony Luck wrote:
>> ...
>
> This is why I've been talking about a global object repository --
> including the problems associated with them. git as it currently stands
> permit a single global object store, *except* for the issue of duplicate
> tags.

So why not store just the git objects in the global repository and keep
all the things that reference an object (HEAD, branches/*, refs/*/*,
etc.) in a per project and/or contributor area like it is currently?

^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: "git-send-pack"
2005-06-30 21:29 ` "git-send-pack" Linus Torvalds
2005-06-30 21:55 ` "git-send-pack" H. Peter Anvin
@ 2005-06-30 22:25 ` Daniel Barkalow
2005-06-30 23:56 ` "git-send-pack" Linus Torvalds
1 sibling, 1 reply; 86+ messages in thread
From: Daniel Barkalow @ 2005-06-30 22:25 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Junio C Hamano, git

On Thu, 30 Jun 2005, Linus Torvalds wrote:

> On Thu, 30 Jun 2005, Daniel Barkalow wrote:
> >
> > I suspect that I'll be able to merge send-pack/receive-pack with
> > ssh-push/ssh-pull this evening, and then it'll have the feature of not
> > caring too much which side your command line is on.
>
> The simple thing to do is to just get one commit at a time, see if you
> have it already, parse it if not, and go on to the parents.
>
> That would fit the current git-pull thing, and may be good enough, but it
> has the downside that it can need a _lot_ of back-and-forth fetching of
> commit objects from the other side until you find the one you want. That's
> going to be _very_ slow over a high-latency connection.
>
> So what I'd suggest is:
>
>  - puller starts by just asking "what's your SHA1 for the ref I want"
>
>    The puller wants to know this, because a common case may be that it
>    already has it, in which case it doesn't need to do anything. But more
>    importantly, the puller will need to know this anyway if it gets an
>    object-pack, so that the puller can update its FETCH_HEAD.

Already have this, for the non-pack case.

>  - At some point the server sees the first SHA1 it recognizes, and at that
>    point the server will have to start working. It will just send back an
>    "ok, got it" message (telling the client to not bother continuing to
>    send it any more commit ID's), and then does
>
>        git-rev-list --objects ref-client-wants ^first-common-sha1 |
>            git-pack-objects --stdout

Right.

>  - the client just unpacks the objects, and if successful, it puts the new
>    top ref it got into FETCH_HEAD. It's now done.

Or wherever it's been told to, yes.

> And I do _not_ think that it makes a lot of sense to try to be symmetric.
> For one thing, while a "git-send-pack" should update all the refs
> in-place, a "git-pull-pack" should _not_ update the ref, it should just
> set FETCH_HEAD instead and the puller can decide what he wants to do with
> that ref (possibly merge it, but possibly just make it be a new local
> branch "remote-branch").

My expectation is that the puller will have a ref "remote-branch", and
will therefore: (1) want to update it, and (2) know the last commit pulled
from it. In this situation, we can skip figuring out the start (the two
points I didn't quote), because we saved it from before.

At least, this is how I've always done it; I've got a "linus" branch that
follows the public repo, and I commit changes to a different branch. I
suppose one could skip hanging onto this info, but it seems like an
obviously useful thing to keep, if for no other reason than that I want to
diff against it. This is essentially promoting FETCH_HEAD to a refs/heads/
thing, and having separate ones when you pull from separate sources.

I suppose things are different if you do a lot of one-shot pulls, rather
than tracking branches that you pull from; I'll need to think about this
case (assuming that's actually what you do).

-Daniel
*This .sig left intentionally blank*

^ permalink raw reply [flat|nested] 86+ messages in thread
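[Editorial sketch, not part of the original message.] The "server sees the
first SHA1 it recognizes" step quoted above comes down to a scan over the
IDs the client sends, newest first. This is only a model of that one step:
`first_common` is an invented name, and the server's knowledge is reduced to
a plain file with one known object ID per line, which is an assumption made
so the example runs without a repository.

```shell
# first_common KNOWN_FILE: read candidate commit IDs on stdin (newest first)
# and print the first one that is also listed in KNOWN_FILE. Returns
# non-zero if none of the candidates is known.
first_common() {
    while IFS= read -r id; do
        if grep -qxF -- "$id" "$1"; then
            printf '%s\n' "$id"   # this is where "ok, got it" would be sent
            return 0
        fi
    done
    return 1
}
```

The ID it prints plays the role of `first-common-sha1` in the quoted
git-rev-list command; a non-zero return corresponds to the case where the
two sides share no history among the IDs sent so far.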
* Re: "git-send-pack"
2005-06-30 22:25 ` "git-send-pack" Daniel Barkalow
@ 2005-06-30 23:56 ` Linus Torvalds
2005-07-01 5:01 ` "git-send-pack" Daniel Barkalow
0 siblings, 1 reply; 86+ messages in thread
From: Linus Torvalds @ 2005-06-30 23:56 UTC (permalink / raw)
To: Daniel Barkalow; +Cc: Junio C Hamano, git

On Thu, 30 Jun 2005, Daniel Barkalow wrote:
>
> My expectation is that the puller will have a ref "remote-branch", and
> will therefore: (1) want to update it, and (2) know the last commit pulled
> from it. In this situation, we can skip figuring out the start (the two
> points I didn't quote), because we saved it from before.

This is _never_ how I do things, so I think that's a bad expectation. I
have other people's trees "just show up", since they are actually based on
mine..

Linus

^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: "git-send-pack"
2005-06-30 23:56 ` "git-send-pack" Linus Torvalds
@ 2005-07-01 5:01 ` Daniel Barkalow
0 siblings, 0 replies; 86+ messages in thread
From: Daniel Barkalow @ 2005-07-01 5:01 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Junio C Hamano, git

On Thu, 30 Jun 2005, Linus Torvalds wrote:

> On Thu, 30 Jun 2005, Daniel Barkalow wrote:
> >
> > My expectation is that the puller will have a ref "remote-branch", and
> > will therefore: (1) want to update it, and (2) know the last commit pulled
> > from it. In this situation, we can skip figuring out the start (the two
> > points I didn't quote), because we saved it from before.
>
> This is _never_ how I do things, so I think that's a bad expectation. I
> have other peoples trees "just show up", since they are actually based on
> mine..

Okay, so my next task will be to support this case. What I'm doing now is:

 - if the source is using an old version, fall back on individual objects
 - send one (or more) ids to exclude
 - find out if the server recognized any of the ids
 - if not, fall back on transferring individual objects (or we could try
   another batch)
 - request a pack for the given hash, excluding whatever we've said to
   exclude

I've implemented this for the case of updating a head, and got it to
transfer a pack of 11 objects. It took 31s (including connecting) to
transfer the entire history of git (3973 objects) over a DSL-DSL link with
a 39ms ping time. I sent the same thing with the old method previously,
and it took ages (wasn't timing it, though).

It should be possible to notice that we're not updating a ref, send all
the refs you have instead, see if the source recognized any, try again
with the next 70 commits, check, and repeat. Does this match what you were
suggesting?

I can send you the messy version tomorrow if you want to hack on it or
test it, and I'll have a clean patch series over the weekend.

-Daniel
*This .sig left intentionally blank*

^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: "git-send-pack"
2005-06-30 20:38 ` "git-send-pack" Junio C Hamano
2005-06-30 21:05 ` "git-send-pack" Daniel Barkalow
@ 2005-06-30 21:08 ` Linus Torvalds
2005-06-30 21:10 ` "git-send-pack" Dan Holmsand
2 siblings, 0 replies; 86+ messages in thread
From: Linus Torvalds @ 2005-06-30 21:08 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Daniel Barkalow, git

On Thu, 30 Jun 2005, Junio C Hamano wrote:
>
> Now is there anything for us poor mortals who would want to have
> a "pull" support? Logging in via ssh and run send-pack on the
> other end is workable but not so pretty ;-).

I'm thinking about it. You can't actually do send-pack from the other end,
since send-pack needs to know what the base is, and the base you have may
not even exist in the remote.

So a "git-pull-pack" will follow the objects on the other side until it
hits one we have, and _then_ it can send a nice pack.

It's not hard per se, and some of the problems are actually simpler than
git-send-pack, but it needs more communication (and in order to be
efficient you want to not ping-pong a "do-you-have-it" query every time
around). I also want to make sure that the biggest burden is on the pull
side, not the push side.

I have a plan, though.

Linus

^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: "git-send-pack"
2005-06-30 20:38 ` "git-send-pack" Junio C Hamano
2005-06-30 21:05 ` "git-send-pack" Daniel Barkalow
2005-06-30 21:08 ` "git-send-pack" Linus Torvalds
@ 2005-06-30 21:10 ` Dan Holmsand
2 siblings, 0 replies; 86+ messages in thread
From: Dan Holmsand @ 2005-06-30 21:10 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Linus Torvalds, Daniel Barkalow, git

Junio C Hamano wrote:
>>>>>>"LT" == Linus Torvalds <torvalds@osdl.org> writes:
>
>
> LT> In fact, the most recent push was gone with a
>
> LT> git-send-pack master.kernel.org:/pub/scm/linux/kernel/git/torvalds/git.git
>
> Congrats for a job well done.

Agree totally. And the whole pack thing is really cool. Git is sooo much
faster when running from pack-files only on my poor laptop.

> Now is there anything for us poor mortals who would want to have
> a "pull" support? Logging in via ssh and run send-pack on the
> other end is workable but not so pretty ;-).

Agreed again :-)

Even cooler would be pack-pulls via http. That would be a bit hard on the
servers with the current git-pack-objects, but it ought to be possible to
create something similar that doesn't re-delta anything, but instead just
spits out what's in an existing pack-file, and (perhaps) deltifies objects
from the file system.

If people then re-pack their repositories occasionally, this should be
plenty fast, the number of files for rsync to deal with could be kept
down, as could download times for mortal users.

/dan

^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: "git-send-pack"
2005-06-30 17:54 "git-send-pack" Linus Torvalds
` (3 preceding siblings ...)
2005-06-30 19:44 ` "git-send-pack" Linus Torvalds
@ 2005-06-30 19:49 ` Daniel Barkalow
2005-06-30 20:12 ` "git-send-pack" Linus Torvalds
4 siblings, 1 reply; 86+ messages in thread
From: Daniel Barkalow @ 2005-06-30 19:49 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Git Mailing List, Junio C Hamano, ftpadmin

On Thu, 30 Jun 2005, Linus Torvalds wrote:

> Anyway, what are the limitations? Here's a few obvious ones:
>
>  - I really hate how "ssh" apparently cannot be told to have alternate
>    paths. For example, on master.kernel.org, I don't control the setup, so
>    I can't install my own git binaries anywhere except in my ~/bin
>    directory, but I also cannot get ssh to accept that that is a valid
>    path. This one really bums me out, and I think it's an ssh deficiency.
>
>    You apparently have to compile in the paths at compile-time into sshd,
>    and PermitUserEnvironment is disabled by default (not that it even
>    seems to work for the PATH environment, but that may have been my
>    testing that didn't re-start sshd).
>
>    That just sucks.

The easiest thing might be to have a centrally-installed wrapper script
that could run programs installed in your home directory. E.g., if "git"
had a "source ~/.git-env" at the beginning, and your ~/.git-env fixed your
PATH, then "git receive-pack ARGS" should work, for a generic centrally
installed git and special stuff in your home directory.

>  - It doesn't update the working directory at the other end. This is fine
>    for what it's intended for (pushing to a central "raw" git archives),
>    so this could be considered a feature, but it's worth pointing out.
>    Only a "pull" will update your working directory, and this pack sending
>    really is meant to be used in a kind of "push to central archive" way.

I thought only "resolve" (as part of "fetch") updated your working
directory, so this is completely consistent.

>  - this is also (at least once we've tested it a lot more and added the
>    code to allow it to create new refs on the remote side) meant to be a
>    good way to mirror things out, since clearly rsync isn't scaling.
>
>    However, I don't know what the rules for acceptable mirroring
>    approaches are, and it's entirely possible (nay, probable) that an ssh
>    connection from the "master" ain't it. It would be good to know what
>    (if any) would be acceptable solutions..

The right solution probably involves getting each pack file you push to
the mirrors as well as to the master. They'll probably update no less
frequently than you push, and they should go through a series of states
which matches the master, so it's not necessary to have anything smart on
master sending them, and they only have to unpack the files they get (and
update the refs afterward).

That should make the cross-system trust requirements relatively minimal;
the mirror can fetch things from master, and neither side has to allow the
other to specify a command line.

-Daniel
*This .sig left intentionally blank*

^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: "git-send-pack"
2005-06-30 19:49 ` "git-send-pack" Daniel Barkalow
@ 2005-06-30 20:12 ` Linus Torvalds
2005-06-30 20:23 ` "git-send-pack" H. Peter Anvin
2005-06-30 20:49 ` "git-send-pack" Daniel Barkalow
0 siblings, 2 replies; 86+ messages in thread
From: Linus Torvalds @ 2005-06-30 20:12 UTC (permalink / raw)
To: Daniel Barkalow; +Cc: Git Mailing List, Junio C Hamano, ftpadmin

On Thu, 30 Jun 2005, Daniel Barkalow wrote:
>
> The right solution probably involves getting each pack file you push to
> the mirrors as well as to the master. They'll probably update no less
> frequently than you push, and they should go through a series of states
> which matches the master, so it's not necessary to have anything smart on
> master sending them, and they only have to unpack the files they get (and
> update the refs afterward).

Hmm, yes. That would work, together with just fetching the heads.

It won't _really_ solve the problem, since the pushed pack objects will
grow at a proportional rate to the current objects - it's just a constant
factor (admittedly a potentially fairly _big_ constant factor)
improvement both in size and in number of files.

So the mirroring ends up getting slowly slower and slower as the number of
pack files goes up. In contrast, a git-aware thing can be basically
constant-time, and mirroring expense ends up being relative to the size of
the change rather than the size of the repository.

But mirroring just pack-files might solve the problem for the foreseeable
future, so..

"git-receive-pack" would need to take a flag to tell it to just check the
object instead of unpacking it (ie call "git-unpack-objects" with the "-n"
flag - it will check that everything looks ok, including the embedded
protecting SHA1 hash), and write it out to the filesystem (as it comes in)
and then rename it to the right place.

Linus

^ permalink raw reply [flat|nested] 86+ messages in thread
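[Editorial sketch, not part of the original message.] The "write it out as
it comes in, check it, then rename it to the right place" sequence can be
sketched generically. `receive_verified` is an invented helper, and the
CHECK command is a placeholder supplied by the caller: on a real server it
would be whatever wraps the dry-run unpack described above, which this
sketch does not attempt to reproduce.

```shell
# receive_verified DEST CHECK [ARGS...]
# Store stdin under a temporary name next to DEST, run CHECK with the
# temporary path appended as its last argument, and rename the file to DEST
# only if CHECK succeeds. CHECK is required.
receive_verified() {
    dest=$1; shift
    tmp=$(mktemp "$dest.tmp.XXXXXX") || return 1
    if cat > "$tmp" && "$@" "$tmp"; then
        # The temporary file is in DEST's directory, so this is a plain
        # rename and readers never see a half-written DEST.
        mv "$tmp" "$dest"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

For example, `receive_verified incoming.pack test -s < some-pack` accepts
any non-empty input; `test -s` is used here only because it is a check that
exists everywhere.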
* Re: "git-send-pack"
2005-06-30 20:12 ` "git-send-pack" Linus Torvalds
@ 2005-06-30 20:23 ` H. Peter Anvin
2005-06-30 20:52 ` "git-send-pack" Linus Torvalds
2005-06-30 20:49 ` "git-send-pack" Daniel Barkalow
1 sibling, 1 reply; 86+ messages in thread
From: H. Peter Anvin @ 2005-06-30 20:23 UTC (permalink / raw)
To: Linus Torvalds
Cc: Daniel Barkalow, Git Mailing List, Junio C Hamano, ftpadmin

Linus Torvalds wrote:
>
> It won't _really_ solve the problem, since the pushed pack objects will
> grow at a proportional rate to the current objects - it's just a constant
> factor (admittedly a potentially fairly _big_ constant factor)
> improvement both in size and in number of files.
>

If I've understood this correctly, it's not a constant factor
improvement in the number of files (in the size, yes); it's changing it
from O(t*c) to O(t) where t is number of trees and c is number of
changesets. That's key.

The problem we're having (on kernel.org) right now is that there isn't a
hierarchical time stamp in Unix, so we have to compare on a file-by-file
level. rsync is quite good at discovering an invariant beginning of a
file, but when it comes to a mass of files it has to compare the stamps
on each and every one, each time. It will only descend into a single
file, however, if that file has had its timestamp changed.

For the purposes of rsync, storing the objects in a single append-only
file would be a very efficient method, since the rsync algorithm will
quickly discover an invariant head and only transmit the tail. It's not
ideal, and having something git-aware would be better, but I think it
really would be nice to have something which also plays well with rsync.

There is a *lot* of infrastructure in rsync which is actually hard to
replicate with another tool (including the server architecture); in many
ways it would be easier to convince the rsync developers to create a
plugin architecture and re-use all that code rather than developing an
equivalent tool from scratch.

-hpa

^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: "git-send-pack"
2005-06-30 20:23 ` "git-send-pack" H. Peter Anvin
@ 2005-06-30 20:52 ` Linus Torvalds
2005-06-30 21:23 ` "git-send-pack" H. Peter Anvin
0 siblings, 1 reply; 86+ messages in thread
From: Linus Torvalds @ 2005-06-30 20:52 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Daniel Barkalow, Git Mailing List, Junio C Hamano, ftpadmin

On Thu, 30 Jun 2005, H. Peter Anvin wrote:
>
> If I've understood this correctly, it's not a constant factor
> improvement in the number of files (in the size, yes); it's changing it
> from O(t*c) to O(t) where t is number of trees and c is number of
> changesets. That's key.

No, it _is_ a constant factor even in number of files, if you just keep
the pack objects around without re-packing them.

Basically, you'd get one new pack-file every time I push. That's better
than getting <n> "raw object" files (where <n> can be anything from just a
couple to several thousand, depending on whether I had pulled things), but
it's still just a constant factor on both number of files and size of
files.

Now, you could re-pack the objects every once in a while: it would force a
whole new "epoch", of course and then the mirrorers would have to fetch
the whole repacked file, but that might be fine. Especially if you stop
re-packing after you've hit a certain size (say, a couple of megs), and
then start on the next pack.

> For the purposes of rsync, storing the objects in a single append-only
> file would be a very efficient method, since the rsync algorithm will
> quickly discover an invariant head and only transmit the tail.

Actually, it won't be "quick" - it will have to read the whole file and do
its hash window thing.

You _could_ append the pack-files into one single "superpack" file (since
you can figure out where the pack boundaries are), but it would be
extremely big after a while, and rsync would spend all its time doing over
the hash window. You'd definitely be better off with re-packing.

Linus

^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: "git-send-pack"
2005-06-30 20:52 ` "git-send-pack" Linus Torvalds
@ 2005-06-30 21:23 ` H. Peter Anvin
2005-06-30 21:26 ` "git-send-pack" H. Peter Anvin
2005-06-30 21:42 ` "git-send-pack" Linus Torvalds
0 siblings, 2 replies; 86+ messages in thread
From: H. Peter Anvin @ 2005-06-30 21:23 UTC (permalink / raw)
To: Linus Torvalds
Cc: Daniel Barkalow, Git Mailing List, Junio C Hamano, ftpadmin

Linus Torvalds wrote:
>
>>For the purposes of rsync, storing the objects in a single append-only
>>file would be a very efficient method, since the rsync algorithm will
>>quickly discover an invariant head and only transmit the tail.
>
> Actually, it won't be "quick" - it will have to read the whole file and do
> its hash window thing.
>

It does that, but it only has to do that when the actual file has
changed. That's acceptable, at least for the repository sizes we're
likely to deal with within the medium term.

-hpa

^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: "git-send-pack"
2005-06-30 21:23 ` "git-send-pack" H. Peter Anvin
@ 2005-06-30 21:26 ` H. Peter Anvin
2005-06-30 21:42 ` "git-send-pack" Linus Torvalds
1 sibling, 0 replies; 86+ messages in thread
From: H. Peter Anvin @ 2005-06-30 21:26 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Linus Torvalds, Daniel Barkalow, Git Mailing List, Junio C Hamano, ftpadmin

H. Peter Anvin wrote:
> Linus Torvalds wrote:
>
>>
>>> For the purposes of rsync, storing the objects in a single
>>> append-only file would be a very efficient method, since the rsync
>>> algorithm will quickly discover an invariant head and only transmit
>>> the tail.
>>
>>
>> Actually, it won't be "quick" - it will have to read the whole file
>> and do its hash window thing.
>>
>
> It does that, but it only has to do that when the actual file has
> changed. That's acceptable, at least for the repository sizes we're
> likely to deal with within the medium term.
>

I guess I should clarify a bit here. I'm concerned with two aspects:
the "keeping mirrors in sync" problem, where asking people to use a tool
other than rsync is a really tough sell, and the developer usage
scenario, in which case something git-aware is obviously the better
thing.

-hpa

^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: "git-send-pack"
2005-06-30 21:23 ` "git-send-pack" H. Peter Anvin
2005-06-30 21:26 ` "git-send-pack" H. Peter Anvin
@ 2005-06-30 21:42 ` Linus Torvalds
2005-06-30 22:00 ` "git-send-pack" H. Peter Anvin
1 sibling, 1 reply; 86+ messages in thread
From: Linus Torvalds @ 2005-06-30 21:42 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Daniel Barkalow, Git Mailing List, Junio C Hamano, ftpadmin

On Thu, 30 Jun 2005, H. Peter Anvin wrote:
>
> It does that, but it only has to do that when the actual file has
> changed. That's acceptable, at least for the repository sizes we're
> likely to deal with within the medium term.

Well, realize that "incremental packs" deltify a lot worse than a "big
pack", since pack-files don't do deltas to objects outside the pack-file.

So we'd get _some_ compression, but not as much as possible. The current
kernel compresses down to a single 63 MB pack-file (that's with the 2.6.11
tree too, not just the HEAD history), but without deltas it weighs in at
about 177 MB.

So a "sum of incremental packs" should be somewhere in between those two
values, even today. For a single kernel archive.

So repository sizes aren't exactly trivial. I don't know how expensive
that rsync hash thing is, but one thing you lose is the ability to
hardlink objects, so if you have a few kernel repositories at some point
it doesn't fit in the cache any more, and then the rsync will have to read
that much pack object stuff from disk in addition to doing the hash. Ugh.

Linus

^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: "git-send-pack"
2005-06-30 21:42 ` "git-send-pack" Linus Torvalds
@ 2005-06-30 22:00 ` H. Peter Anvin
2005-07-01 10:31 ` "git-send-pack" Matthias Urlichs
2005-07-01 13:56 ` Tags Eric W. Biederman
0 siblings, 2 replies; 86+ messages in thread
From: H. Peter Anvin @ 2005-06-30 22:00 UTC (permalink / raw)
To: Linus Torvalds
Cc: Daniel Barkalow, Git Mailing List, Junio C Hamano, ftpadmin

Linus Torvalds wrote:
>
> On Thu, 30 Jun 2005, H. Peter Anvin wrote:
>
>>It does that, but it only has to do that when the actual file has
>>changed. That's acceptable, at least for the repository sizes we're
>>likely to deal with within the medium term.
>
>
> Well, realize that "incremental packs" deltify a lot worse than a "big
> pack", since pack-files don't do deltas to objects outside the pack-file.
>
> So we'd get _some_ compression, but not as much as possible. The current
> kernel compresses down to a single 63 MB pack-file (that's with the 2.6.11
> tree too, not just the HEAD history), but without deltas it weighs in at
> about 177 MB.
>
> So a "sum of incremental packs" should be somewhere in between those two
> values, even today. For a single kernel archive.
>
> So repository sizes aren't exactly trivial. I don't know how expensive
> that rsync hash thing is, but one thing you lose is the ability to
> hardlink objects, so if you have a few kernel repositories at some point
> it doesn't fit in the cache any more, and then the rsync will have to read
> that much pack object stuff from disk in addition to doing the hash. Ugh.
>

The bulk of the cost in doing the hashing comes from having to read the
file.

Well, if you grow a single pack file with appending, then you can have
delta references to earlier objects within the same pack file. At least
at this point, we'd handle a few very large files a lot better than an
enormous swarm of smaller ones.

In the end, it might be that the right thing to do for git on kernel.org
is to have a single, unified object store which isn't accessible by
anything other than git-specific protocols. There would have to be some
way of dealing with, for example, conflicting tags that apply to
different repositories, though.

-hpa

^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: "git-send-pack"
From: Matthias Urlichs @ 2005-07-01 10:31 UTC
To: git

Hi,

H. Peter Anvin wrote:

> In the end, it might be that the right thing to do for git on kernel.org
> is to have a single, unified object store which isn't accessible by
> anything other than git-specific protocols.

Makes sense.

> There would have to be some
> way of dealing with, for example, conflicting tags that apply to
> different repositories, though.
>
It seems that user-specific subdirectories in refs/heads (and, presumably,
../tags) mostly work already.

--
Matthias Urlichs | {M:U} IT Design @ m-u-it.de | smurf@smurf.noris.de
Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de
 - -
Don't lock the barn after it is stolen.
* Re: "git-send-pack"
From: Jan Harkes @ 2005-07-01 14:43 UTC
To: Matthias Urlichs; +Cc: git

On Fri, Jul 01, 2005 at 12:31:53PM +0200, Matthias Urlichs wrote:
> > In the end, it might be that the right thing to do for git on kernel.org
> > is to have a single, unified object store which isn't accessible by
> > anything other than git-specific protocols.
>
> Makes sense.
>
> > There would have to be some
> > way of dealing with, for example, conflicting tags that apply to
> > different repositories, though.
>
> It seems that user-specific subdirectories in refs/heads (and, presumably,
> ../tags) mostly work already.

They work pretty well; the core git commands have no problem with them,
and I just sent off some patches for gitweb and gitk.

All git/objects directories can be merged into a common repository. The
refs/heads and refs/tags can be copied to user-specific subdirectories.

Then a pull like,

    git pull http://www.kernel.org/.../torvalds/linux-2.6.git

Would become,

    git pull http://www.kernel.org/.../linux-2.6.git torvalds/linux-2.6/master

It would make rsync more expensive for people who are interested in only
a branch or two, but there is only one repository, which should be easier
on the mirrors. The http, ssh, and some future 'pack' transfer methods
won't see a difference since they only pull the specific commits they
need to catch up with a branch.

Jan
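The ref layout Jan describes can be sketched as a simple renaming rule.
This is only an illustration under stated assumptions: the function name
`shared_ref_name` and the exact "user/project/branch" ordering are taken
from the example pull command above, not from any git tool.

```python
def shared_ref_name(user: str, project: str, ref: str) -> str:
    """Map a per-repository ref to its name in one shared repository.

    Hypothetical helper: 'refs/heads/master' from torvalds' linux-2.6
    repository becomes 'refs/heads/torvalds/linux-2.6/master'.
    """
    for prefix in ("refs/heads/", "refs/tags/"):
        if ref.startswith(prefix):
            short = ref[len(prefix):]
            # Keep the ref kind, then insert the owner and project
            # in front of the original short name.
            return f"{prefix}{user}/{project}/{short}"
    raise ValueError(f"not a head or tag ref: {ref!r}")


if __name__ == "__main__":
    print(shared_ref_name("torvalds", "linux-2.6", "refs/heads/master"))
```

Because only the ref names change, the objects themselves can be pooled
without any renaming at all.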
* Tags
From: Eric W. Biederman @ 2005-07-01 13:56 UTC
To: H. Peter Anvin
Cc: Linus Torvalds, Daniel Barkalow, Git Mailing List, Junio C Hamano, ftpadmin

"H. Peter Anvin" <hpa@zytor.com> writes:

> In the end, it might be that the right thing to do for git on kernel.org is to
> have a single, unified object store which isn't accessible by anything other
> than git-specific protocols. There would have to be some way of dealing with,
> for example, conflicting tags that apply to different repositories, though.

As far as I can tell, public distributed tags are not that hard, and if
you are going to be syncing them it is probably worth working on.

The basic idea is that instead of having one global tag of
'linux-2.6.13-rc1' you have a global tag of
'torvalds@osdl.org/linux-2.6.13-rc1'.

The important part is that the tag namespace is made hierarchical
with at least 2 levels, where the top level is a globally
unique tag owner id and the bottom level is the actual tag. This
prevents collisions when merging trees because two people's
tags are never in the same namespace, at least when
people are not actively hostile :)

Still being a complete git dummy, I think the trivial mapping is
to put tags in:
.git/refs/tags/user@domain/tag
and then have a symlink at:
.git/TAGS
that points to your default directory of tags.

Eric
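A minimal sketch of the two-level tag namespace proposed above. The
helper names `tag_ref_path` and `resolve_tag` are hypothetical, and the
`default_owner` argument stands in for whatever the `.git/TAGS` symlink
would point at; none of this is an existing git interface.

```python
import posixpath


def tag_ref_path(owner: str, tag: str) -> str:
    """Build the path .git/refs/tags/<owner>/<tag> from the proposal."""
    return posixpath.join(".git/refs/tags", owner, tag)


def resolve_tag(name: str, default_owner: str) -> str:
    """Resolve a short or fully qualified tag name to a ref path.

    A name containing '/' is treated as 'owner/tag'; a bare name is
    looked up in the default owner's namespace.
    """
    if "/" in name:
        owner, tag = name.split("/", 1)
    else:
        owner, tag = default_owner, name
    return tag_ref_path(owner, tag)


if __name__ == "__main__":
    print(resolve_tag("linux-2.6.13-rc1", "torvalds@osdl.org"))
    print(resolve_tag("someone@example.org/linux-2.6.13-rc1", "torvalds@osdl.org"))
```

Two owners can then both have a `linux-2.6.13-rc1` tag without the ref
paths colliding.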
* Re: Tags
From: H. Peter Anvin @ 2005-07-01 16:37 UTC
To: Eric W. Biederman
Cc: Linus Torvalds, Daniel Barkalow, Git Mailing List, Junio C Hamano, ftpadmin

Eric W. Biederman wrote:
> "H. Peter Anvin" <hpa@zytor.com> writes:
>
>>In the end, it might be that the right thing to do for git on kernel.org is to
>>have a single, unified object store which isn't accessible by anything other
>>than git-specific protocols. There would have to be some way of dealing with,
>>for example, conflicting tags that apply to different repositories, though.
>
> As far as I can tell, public distributed tags are not that hard, and if
> you are going to be syncing them it is probably worth working on.
>
> The basic idea is that instead of having one global tag of
> 'linux-2.6.13-rc1' you have a global tag of
> 'torvalds@osdl.org/linux-2.6.13-rc1'.
>
> The important part is that the tag namespace is made hierarchical
> with at least 2 levels, where the top level is a globally
> unique tag owner id and the bottom level is the actual tag. This
> prevents collisions when merging trees because two people's
> tags are never in the same namespace, at least when
> people are not actively hostile :)
>
> Still being a complete git dummy, I think the trivial mapping is
> to put tags in:
> .git/refs/tags/user@domain/tag
> and then have a symlink at:
> .git/TAGS
> that points to your default directory of tags.
>

Unless you have an authentication mechanism and *enforce* it (you can do
that with GPG signatures if *and only if* your disambiguation includes
your GPG signature fingerprint) you still have a problem with someone
introducing fake tags as a DoS attack.

	-hpa
* Re: Tags
From: Eric W. Biederman @ 2005-07-01 22:38 UTC
To: H. Peter Anvin
Cc: Linus Torvalds, Daniel Barkalow, Git Mailing List, Junio C Hamano, ftpadmin

"H. Peter Anvin" <hpa@zytor.com> writes:

> Eric W. Biederman wrote:
>> "H. Peter Anvin" <hpa@zytor.com> writes:
>>
> Unless you have an authentication mechanism and *enforce* it (you can do that
> with GPG signatures if *and only if* your disambiguation includes your GPG
> signature fingerprint) you still have a problem with someone introducing fake
> tags as a DoS attack.

There is a question of how bad this is. For releases you certainly
need some kind of signature that people can verify, and we
already have that, but I think we can keep spoofing tags
down to the same level as spoofing patches.

Basically all this takes is to make your global namespace
the committer email address and to have the rule that
you can only tag your own commits. Then when you merge
tags you never automatically add tags to your own tag namespace.

I think that is enough to make global tags usable in practice.

And for those who are typing challenged: if all you ever look at are
your own tags, then you should never need to specify a fully
qualified tag name, as git should be able to find the committer
email address through other means.

Eric
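The rule "you can only tag your own commits" could be checked roughly as
follows. This is a sketch, not an existing git check: `committer_email`
and `may_tag` are hypothetical names, and the commit text is assumed to
be the plain-text body of a commit object with a `committer` header line
before the first blank line.

```python
def committer_email(commit_text: str) -> str:
    """Extract the committer's email address from a commit object body."""
    header, _, _message = commit_text.partition("\n\n")
    for line in header.split("\n"):
        if line.startswith("committer "):
            start = line.index("<") + 1
            end = line.index(">", start)
            return line[start:end]
    raise ValueError("no committer line found")


def may_tag(tag_owner: str, commit_text: str) -> bool:
    """Allow a tag in tag_owner's namespace only on tag_owner's own commits."""
    return committer_email(commit_text) == tag_owner


if __name__ == "__main__":
    # Placeholder object names; a real commit would carry real SHA1s.
    sample = (
        "tree " + "0" * 40 + "\n"
        "author A U Thor <author@example.org> 1120000000 -0700\n"
        "committer C O Mitter <committer@example.org> 1120000000 -0700\n"
        "\n"
        "Example commit\n"
    )
    print(may_tag("committer@example.org", sample))
```

As the follow-ups point out, this only guards against accidental
collisions: nothing stops a hostile user from writing any address into
the committer line.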
* Re: Tags
From: H. Peter Anvin @ 2005-07-01 22:44 UTC
To: Eric W. Biederman
Cc: Linus Torvalds, Daniel Barkalow, Git Mailing List, Junio C Hamano, ftpadmin

Eric W. Biederman wrote:
>
> There is a question of how bad is this. For releases you certainly
> need some kind of signature that people can verify and we
> already have that but I think we can keep spoofing tags
> down to the same level as spoofing patches.
>
> Basically all this takes is to make your global namespace
> the committer email address and you have the rule that
> you can only tag your own commits. Then when you merge
> tags you never automatically add tags to your own tag namespace.
>

Doesn't work. You can trivially generate a key with someone else's
address. It would require a full PKI.

	-hpa
* Re: Tags
From: Eric W. Biederman @ 2005-07-01 23:07 UTC
To: H. Peter Anvin
Cc: Linus Torvalds, Daniel Barkalow, Git Mailing List, Junio C Hamano, ftpadmin

"H. Peter Anvin" <hpa@zytor.com> writes:

> Eric W. Biederman wrote:
>> There is a question of how bad this is. For releases you certainly
>> need some kind of signature that people can verify, and we
>> already have that, but I think we can keep spoofing tags
>> down to the same level as spoofing patches.
>> Basically all this takes is to make your global namespace
>> the committer email address and to have the rule that
>> you can only tag your own commits. Then when you merge
>> tags you never automatically add tags to your own tag namespace.
>>
>
> Doesn't work. You can trivially generate a key with someone else's address. It
> would require a full PKI.

I'm not saying it's provably correct. I'm simply saying it is as
correct as the rest of the git repository.

If I really care what developer xyz tagged I will pull from them,
or a mirror I trust. And since developer xyz doesn't pull his
own global tags from other repositories that should be sufficient.

Plus if you pull from a spoofed tag, somewhere further along
when you merge your code the merge will fail because what
you thought was a common ancestor isn't. And you will
also likely get an error when you have the same tag
coming from 2 different sources with different values.

So all I am really arguing is that using the committer email address
is sufficient to prevent non-malicious conflicts between developers,
and that it makes a malicious conflict not completely trivial to
produce. So I think it is good enough.

But for releases and things lots of people must trust, yes, you want a
full PKI infrastructure, but I don't see a reason any of that should be
inherently tied to tags.

Eric
* Re: Tags
From: Daniel Barkalow @ 2005-07-01 23:22 UTC
To: Eric W. Biederman
Cc: H. Peter Anvin, Linus Torvalds, Git Mailing List, Junio C Hamano, ftpadmin

On Fri, 1 Jul 2005, Eric W. Biederman wrote:

> Plus if you pull from a spoofed tag somewhere further along
> when you merge your code the merge will fail because what
> you thought was a common ancestor isn't. And you will
> also likely get an error when you have the same tag
> coming from 2 different sources with different values.

Actually, I think it would be beneficial to support multiple tags with the
same name in any case: if people are going to use local private tags like
"broken", either we need to support having refs/tags/broken being a list
of hashes, or any particular user can only have one broken version.

I don't see any major problems with having refs/ files contain potentially
multiple hashes (limited by what makes sense to be multiple; i.e., heads/*
should have only one value), and this lets the users check the content of
the tag objects to figure out what they care about, and either specify
things in more detail or discard things they don't like (or, when
appropriate, use all values). The main issue I see is that rsync wouldn't
merge them usefully.

(And it would be useful to have a structure to support keeping a simple
piece of information about a set of objects.)

	-Daniel
*This .sig left intentionally blank*
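One way to picture the multi-valued ref files suggested above is a
parser that accepts one hash per line and enforces the "heads have only
one value" limit. This is a sketch of the idea only; the one-hash-per-line
layout and the name `parse_ref_values` are assumptions, since the message
does not specify a file format.

```python
import re

_SHA1_HEX = re.compile(r"[0-9a-f]{40}")


def parse_ref_values(text: str, allow_multiple: bool) -> list:
    """Parse a ref file holding one 40-hex-digit object name per line.

    allow_multiple=False models refs such as heads/*, which should
    only ever have a single value.
    """
    values = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if not _SHA1_HEX.fullmatch(line):
            raise ValueError(f"not an object name: {line!r}")
        if line not in values:  # ignore exact duplicates
            values.append(line)
    if not values:
        raise ValueError("ref file is empty")
    if not allow_multiple and len(values) != 1:
        raise ValueError("this kind of ref may hold only one value")
    return values


if __name__ == "__main__":
    a, b = "a" * 40, "b" * 40
    print(parse_ref_values(f"{a}\n{b}\n", allow_multiple=True))
```

A tag such as "broken" would be read with `allow_multiple=True`, while a
head would be read with `allow_multiple=False`.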
* Re: Tags
From: H. Peter Anvin @ 2005-07-02 0:06 UTC
To: Eric W. Biederman
Cc: Linus Torvalds, Daniel Barkalow, Git Mailing List, Junio C Hamano, ftpadmin

Eric W. Biederman wrote:
>
> If I really care what developer xyz tagged I will pull from them,
> or a mirror I trust. And since developer xyz doesn't pull his
> own global tags from other repositories that should be sufficient.
>

You're missing something totally and utterly fundamental here: I'm
talking about creating an infrastructure (think sourceforge) where there
is only one git repository for the whole system, period, full stop, end
of story.

	-hpa
* Re: Tags
From: Eric W. Biederman @ 2005-07-02 7:00 UTC
To: H. Peter Anvin
Cc: Linus Torvalds, Daniel Barkalow, Git Mailing List, Junio C Hamano, ftpadmin

"H. Peter Anvin" <hpa@zytor.com> writes:

> Eric W. Biederman wrote:
>> If I really care what developer xyz tagged I will pull from them,
>> or a mirror I trust. And since developer xyz doesn't pull his
>> own global tags from other repositories that should be sufficient.
>>
>
> You're missing something totally and utterly fundamental here: I'm talking about
> creating an infrastructure (think sourceforge) where there is only one git
> repository for the whole system, period, full stop, end of story.

Could be; I'm certainly not up to speed on git yet.

However, all you have to do for your single-system git repository is
to filter tags at creation time. So for a person to upload something
you need a git-aware tool and you need authentication, so you are certain
it is the right person creating the tag.

Since it is a shared repository you probably want rules like: you can
only create tags that belong to yourself or are owned by people who do
not have accounts on the system.

Likewise, in a system like sourceforge it is desirable to check all of
the committer information in commits as well, so you have a reasonable
audit trail, and it makes sense to check little things like whether the
file under a sha1 key actually matches the sha1 key.

Downstream mirrors can happily rsync just fine, so long as they verify
the upstream source. Tags that you mirror are of course suspect, but
they always will be. The primary tags created by people with accounts
should be reliable though.

So in essence I see nothing in my proposal that is any worse than any
other part of git.

That being said, it sounds like a slightly more git-knowledgeable/native
version has been suggested, having to do with multiple heads.

Eric
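The check that "the file under a sha1 key actually matches the sha1 key"
can be sketched as below. It assumes git's object naming as it is
commonly documented (SHA1 over "<type> <size>\0" followed by the
content, with loose objects stored zlib-compressed); whether the git
snapshot under discussion behaves identically is an assumption, and the
function names are illustrative.

```python
import hashlib
import zlib


def object_name(obj_type: str, content: bytes) -> str:
    """Compute the object name for the given type and raw content."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def verify_loose_object(expected_name: str, compressed: bytes) -> bool:
    """Check that a zlib-compressed loose object hashes to its own name."""
    raw = zlib.decompress(compressed)  # header + content
    return hashlib.sha1(raw).hexdigest() == expected_name


if __name__ == "__main__":
    stored = zlib.compress(b"blob 0\0")  # an empty blob as stored on disk
    name = object_name("blob", b"")
    print(name, verify_loose_object(name, stored))
```

This is why objects need no external authentication: a corrupted or
substituted object simply fails to match the name it is filed under.
Refs have no such property, which is the crux of the rest of the thread.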
* Re: Tags
From: H. Peter Anvin @ 2005-07-02 17:47 UTC
To: Eric W. Biederman
Cc: Linus Torvalds, Daniel Barkalow, Git Mailing List, Junio C Hamano, ftpadmin

Eric W. Biederman wrote:
>
> However all you have to do for your single system git repository is
> to filter tags at creation time. So for a person to upload something
> you need a git aware tool and you need authentication so you are certain
> it is the right person creating the tag.
>

That's complicated; it pretty much works out to having to have a PKI and
a system of registered IDs, or some such. That's painful.

	-hpa
* Re: Tags
From: Eric W. Biederman @ 2005-07-02 17:54 UTC
To: H. Peter Anvin
Cc: Linus Torvalds, Daniel Barkalow, Git Mailing List, Junio C Hamano, ftpadmin

"H. Peter Anvin" <hpa@zytor.com> writes:

> Eric W. Biederman wrote:
>> However all you have to do for your single system git repository is
>> to filter tags at creation time. So for a person to upload something
>> you need a git aware tool and you need authentication so you are certain
>> it is the right person creating the tag.
>
> That's complicated; it pretty much works out to having to have a PKI and a
> system of registered IDs, or some such. That's painful.

?? Isn't that what ssh is?

To some extent a lot depends on how active you expect people to
try and forge things. If there is an expectation of honesty
you are fine.

If you want to build one mondo repository with thousands of developers
having write access you need to be more careful. But as far as I know
none of that is specific to tags.

Eric
* Re: Tags
From: H. Peter Anvin @ 2005-07-02 17:58 UTC
To: Eric W. Biederman
Cc: Linus Torvalds, Daniel Barkalow, Git Mailing List, Junio C Hamano, ftpadmin

Eric W. Biederman wrote:
>
> ?? Isn't that what ssh is?
>
> To some extent a lot depends on how active you expect people to
> try and forge things. If there is an expectation of honesty
> you are fine.
>

I can't afford to have that.

> If you want to build one mondo repository with thousands of developers
> having write access you need to be more careful. But as far as I know
> none of that is specific to tags.

Well, you're wrong. Tags are the only part of git which cannot be
protected by git's own self-validation system.

	-hpa
* Re: Tags
From: Eric W. Biederman @ 2005-07-02 18:31 UTC
To: H. Peter Anvin
Cc: Linus Torvalds, Daniel Barkalow, Git Mailing List, Junio C Hamano, ftpadmin

"H. Peter Anvin" <hpa@zytor.com> writes:

> Eric W. Biederman wrote:
>> ?? Isn't that what ssh is?
>> To some extent a lot depends on how active you expect people to
>> try and forge things. If there is an expectation of honesty
>> you are fine.
>
> I can't afford to have that.

So now your requirements are more stringent than sourceforge's?
Sourceforge limited things by reducing the scope of commits per
project. But once you had commit access to a project you could do
just about anything.

>> If you want to build one mondo repository with thousands of developers
>> having write access you need to be more careful. But as far as I know
>> none of that is specific to tags.
>
> Well, you're wrong. Tags are the only part of git which cannot be protected by
> git's own self-validation system.

Which is why I suggested having tags in sync with the committer
information; that way you are as valid as the commit record in git.

Although I suspect the multiple head solution is probably better,
and simply limiting the people who can commit to an individual
head will achieve what is necessary. One user per head?

One thing arch has shown is that you can successfully move
authentication/permission checking to the underlying environment
if you structure things carefully.

I guess the problem is really that we want to structure things so that
a user who has downloaded the code can verify that the release/tag they
have is what they are looking for.

You can detect a spoofed file in objects by simply verifying the sha1
of the file.

For a file that you can't internally verify that way, the traditional
way to handle that is to create a file with a gpg signature. So
is there anything wrong with adding .git/refs/tags/tag-name.sign
that is a traditional signature file?

That will at least give you an end to end consistency check.
(Hmm. Why didn't I suggest this before?)

If you don't want to mirror and propagate data you need to do
consistency checks earlier in the process, and I have probably
had some poor suggestions on how to implement those. But if
everything is set up so we can verify things once we have the
code downloaded, where you perform the checks is simply a matter
of optimization.

Eric
* Re: Tags
From: Matthias Urlichs @ 2005-07-02 19:55 UTC
To: git

Hi, Eric W. Biederman wrote:

> So
> is there anything wrong with adding .git/refs/tags/tag-name.sign
> that is a traditional signature file?

The signature is already appended to the tag file itself (or can be).
See "git-tag-script".

--
Matthias Urlichs | {M:U} IT Design @ m-u-it.de | smurf@smurf.noris.de
Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de
 - -
Democracy is that form of government where everybody gets what the
majority deserves.
		-- James Dale Davidson
* Re: Tags
From: H. Peter Anvin @ 2005-07-02 21:16 UTC
To: Eric W. Biederman
Cc: Linus Torvalds, Daniel Barkalow, Git Mailing List, Junio C Hamano, ftpadmin

Eric W. Biederman wrote:
> "H. Peter Anvin" <hpa@zytor.com> writes:
>
>>Eric W. Biederman wrote:
>>
>>>?? Isn't that what ssh is?
>>>To some extent a lot depends on how active you expect people to
>>>try and forge things. If there is an expectation of honesty
>>>you are fine.
>>
>>I can't afford to have that.
>
> So now your requirements are more stringent than sourceforge's?
> Sourceforge limited things by reducing the scope of commits per
> project. But once you had commit access to a project you could do
> just about anything.
>

They're not using a single global object storage.

	-hpa
* Re: Tags
From: Linus Torvalds @ 2005-07-02 21:39 UTC
To: H. Peter Anvin
Cc: Eric W. Biederman, Daniel Barkalow, Git Mailing List, Junio C Hamano, ftpadmin

On Sat, 2 Jul 2005, H. Peter Anvin wrote:
>
> They're not using a single global object storage.

Note that the fact that you use a common object store does not mean that
everything should be common.

I still contend that tags and branches and things like that should be
personal. A "gitforge" thing should _not_ try to unify tags. Instead, give
people their own private area for keeping their own private references
(you can limit it to just a few kilobytes per person, so you might as well
just consider it to be part of their "user information" thing along with
whatever other preferences they have).

Then, they can all share the objects, and there's never any confusion
about tags - everybody has their own tags, and you add a few simple
operations like "copy user xxx's tag to my tag-space, and start a new
branch from that".

There're really no downsides. The only thing you need to have is some nice
tag-browser (and some simple permission model where developers can say
"others can read my tag" or "this tag is visible only to me" - the object
store may be shared, but if nobody can see your pointers into the object
store, you effectively have a totally private branch - which might be
what some people want).

There's really never any reason to make tags global. Even in the case of
the kernel, people don't want to see a tag like "v2.6.12". They want to
see what _I_ tagged v2.6.12, so implicit in that whole thing is very much
that they want to see _my_ tags. Again, it's a _browsing_ issue, not a
"tags should be global" issue. They should be visible and easily
fetchable.

		Linus
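A small in-memory model of the per-user ref area described above: each
user has a private tag space, may mark individual tags as readable by
others, and can copy another user's visible tag into their own space.
Everything here (the `RefStore` class, its method names, the visibility
flag) is a hypothetical illustration of the idea, not a description of
any existing service.

```python
class RefStore:
    """Per-user tag spaces over one shared object store (objects not modeled)."""

    def __init__(self):
        self._tags = {}    # user -> {tag name -> object id}
        self._public = {}  # user -> set of tag names others may read

    def set_tag(self, user, name, object_id, public=False):
        """Create or update a tag in the user's own space."""
        self._tags.setdefault(user, {})[name] = object_id
        visible = self._public.setdefault(user, set())
        if public:
            visible.add(name)
        else:
            visible.discard(name)

    def read_tag(self, reader, owner, name):
        """Return the object id behind owner's tag, if reader may see it."""
        tags = self._tags.get(owner, {})
        if name not in tags:
            raise KeyError(f"{owner} has no tag {name!r}")
        if reader != owner and name not in self._public.get(owner, set()):
            raise PermissionError(f"{owner}'s tag {name!r} is private")
        return tags[name]

    def copy_tag(self, reader, owner, name):
        """Copy owner's visible tag into reader's own (private) tag space."""
        object_id = self.read_tag(reader, owner, name)
        self.set_tag(reader, name, object_id)
        return object_id


if __name__ == "__main__":
    store = RefStore()
    store.set_tag("torvalds", "v2.6.12", "a" * 40, public=True)
    store.copy_tag("jrh", "torvalds", "v2.6.12")
    print(store.read_tag("jrh", "jrh", "v2.6.12"))
```

Two users can hold different values for the same tag name without
conflict, because a lookup always names whose tag is meant.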
* Re: Tags
From: H. Peter Anvin @ 2005-07-02 21:42 UTC
To: Linus Torvalds
Cc: Eric W. Biederman, Daniel Barkalow, Git Mailing List, Junio C Hamano, ftpadmin

Linus Torvalds wrote:
>
> Note that the fact that you use a common object store does not mean that
> everything should be common.
>
> I still contend that tags and branches and things like that should be
> personal. A "gitforge" thing should _not_ try to unify tags. Instead, give
> people their own private area for keeping their own private references
> (you can limit it to just a few kilobytes per person, so you might as well
> just consider it to be part of their "user information" thing along with
> whatever other preferences they have).
>
> Then, they can all share the objects, and there's never any confusion
> about tags - everybody has their own tags, and you add a few simple
> operations like "copy user xxx's tag to my tag-space, and start a new
> branch from that".
>
> There're really no downsides. The only thing you need to have is some nice
> tag-browser (and some simple permission model where developers can say
> "others can read my tag" or "this tag is visible only to me" - the object
> store may be shared, but if nobody can see your pointers into the object
> store, you effectively have a totally private branch - which might be
> what some people want).
>
> There's really never any reason to make tags global. Even in the case of
> the kernel, people don't want to see a tag like "v2.6.12". They want to
> see what _I_ tagged v2.6.12, so implicit in that whole thing is very much
> that they want to see _my_ tags. Again, it's a _browsing_ issue, not a
> "tags should be global" issue. They should be visible and easily
> fetchable.
>

OK, so let me retell what I think I hear you say:

- Store all the tags in the object store; they may conflict.
- Let each source user have a set of refs, and provide a method for the
  end user to select which refs to get.

In other words, the only way (other than knowing what GPG keys to trust)
to distinguish between your "v2.6.12" and J. Random Hacker's "v2.6.12"
is that the former is referenced by *your* refs as opposed to JRH's
refs. This also means the refs cannot be uniquely rebuilt from the
object storage.

	-hpa
* Re: Tags
From: A Large Angry SCM @ 2005-07-02 22:02 UTC
To: Git Mailing List
Cc: H. Peter Anvin, Linus Torvalds, Eric W. Biederman, Daniel Barkalow, Junio C Hamano, ftpadmin

H. Peter Anvin wrote:
...
>
> OK, so let me retell what I think I hear you say:
>
> - Store all the tags in the object store; they may conflict.
> - Let each source user have a set of refs, and provide a method for the
>   end user to select which refs to get.
>
> In other words, the only way (other than knowing what GPG keys to trust)
> to distinguish between your "v2.6.12" and J. Random Hacker's "v2.6.12"
> is that the former is referenced by *your* refs as opposed to JRH's
> refs. This also means the refs cannot be uniquely rebuilt from the
> object storage.

Why have tag objects at all?
* Re: Tags
From: Linus Torvalds @ 2005-07-02 22:20 UTC
To: A Large Angry SCM
Cc: Git Mailing List, H. Peter Anvin, Eric W. Biederman, Daniel Barkalow, Junio C Hamano, ftpadmin

On Sat, 2 Jul 2005, A Large Angry SCM wrote:
>
> Why have tag objects at all?

Trust.

None of git itself normally has any "trust". The SHA1 means that the
_integrity_ of the archive is ensured, but for some things (notably
releases), you want to have something else. That's the "tag object".

And I really should probably have called them something else. _I_
personally tend to want to have a 1:1 relationship between my "tag
references" (ie the 20-byte SHA1 pointer) and my "tag objects", but that's
because my releases are things that I envision people may actually want to
verify are mine.

In many cases, you'd never use a "tag object", and the "tag reference"
would just point directly to a commit, with no extra indirect object.

		Linus
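The distinction between a "tag reference" and a "tag object" can be
illustrated with a toy object table. The sketch assumes that a tag
object's text starts with an "object <sha1>" line naming what it tags,
which matches git's tag format as commonly documented; the `peel`
function and the dictionary-based store are illustrative only.

```python
def peel(object_id: str, objects: dict) -> tuple:
    """Follow tag objects until reaching a non-tag object.

    `objects` maps an object id to a (type, text) pair. A tag reference
    may hold the id of a tag object or point directly at a commit.
    """
    while True:
        obj_type, text = objects[object_id]
        if obj_type != "tag":
            return object_id, obj_type
        first_line = text.split("\n", 1)[0]
        if not first_line.startswith("object "):
            raise ValueError("malformed tag object")
        object_id = first_line[len("object "):]


if __name__ == "__main__":
    commit_id, tag_id = "a" * 40, "b" * 40
    objects = {
        commit_id: ("commit", "tree ...\n\nRelease commit\n"),
        # A (possibly signed) tag object pointing at the commit.
        tag_id: ("tag", f"object {commit_id}\ntype commit\ntag v2.6.12\n\nsigned text\n"),
    }
    print(peel(tag_id, objects))     # ref -> tag object -> commit
    print(peel(commit_id, objects))  # ref -> commit directly
```

Either way the reference ends up at the same commit; the tag object only
adds a place to hang a signature that others can verify.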
* Re: Tags
From: A Large Angry SCM @ 2005-07-02 23:49 UTC
To: Linus Torvalds
Cc: Git Mailing List, H. Peter Anvin, Eric W. Biederman, Daniel Barkalow, Junio C Hamano, ftpadmin

Linus Torvalds wrote:
>
> On Sat, 2 Jul 2005, A Large Angry SCM wrote:
>>Why have tag objects at all?
>
> Trust.
>
> None of git itself normally has any "trust". The SHA1 means that the
> _integrity_ of the archive is ensured, but for some things (notably
> releases), you want to have something else. That's the "tag object".
>

But can't the commit object do this just as well by signing the commit
text?

> And I really should probably have called them something else. _I_
> personally tend to want to have a 1:1 relationship between my "tag
> references" (ie the 20-byte SHA1 pointer) and my "tag objects", but that's
> because my releases are things that I envision people may actually want to
> verify are mine.
>

Your tendency is to use tag objects as a permanent, public label of some
state. Signing the commit text or the email stating that commit
${COMMIT_SHA} would work just as well for verification purposes. Or even
a blob object containing the signed text "${COMMIT_SHA} is vX.X.X.X".
Either way, you'd still need some kind of external reference to find the
object.

> In many cases, you'd never use a "tag object", and the "tag reference"
> would just point directly to a commit, with no extra indirect object.

Tag refs, like head refs and branches, are all just (temporary)
notational shorthand to make using the tools easier.

The problem with the Borg repository is not the objects but the object
refs. Isn't that just a namespace problem?
* Re: Tags 2005-07-02 23:49 ` Tags A Large Angry SCM @ 2005-07-03 0:17 ` Linus Torvalds 0 siblings, 0 replies; 86+ messages in thread From: Linus Torvalds @ 2005-07-03 0:17 UTC (permalink / raw) To: A Large Angry SCM Cc: Git Mailing List, H. Peter Anvin, Eric W. Biederman, Daniel Barkalow, Junio C Hamano, ftpadmin On Sat, 2 Jul 2005, A Large Angry SCM wrote: > > Linus Torvalds wrote: > > > > None of git itself normally has any "trust". The SHA1 means that the > > _integrity_ of the archive is ensured, but for some things (notably > > releases), you want to have something else. That's the "tag object". > > > > But can't the commit object do this just as well by signing the commit text? Yes and no. Technically yes, absolutely, you could add a signature to the commit text. However, that's just wrong for several reasons: First off, the signing is not necessarily done by the person committing something. Think of any paperwork: the person that signs the paperwork is not necessarily the same person that _wrote_ the paperwork. A signature is a "witness". For an example of this, look at the signatures that we've had for a long time on kernel.org: check out the files like "patch-2.6.8.1.sign". That's a signature, but it's not a signature by _me_. It's kernel.org signing the thing so that downstream people can verify things. And it would be not only wrong, but literally _impossible_ for me to do it in the commit. I don't have (or want to have) the kernel.org private key. That's not what the signature is about. kernel.org is signing that "this is what I got, and what I passed on". It's not signing that "this is what I wrote". In a lot of systems, you tag something good after it has passed a regression test. Ie the _tag_ may happen days or even weeks after the commit has been done. So any system that signs commits directly is doing something _wrong_. Secondly, you can say that you trust other things. In git, you can tag individual blobs, and you can tag individual trees. 
For an example of where it makes sense to tag (sign) individual file versions, we've actually had things like ISDN drivers (or firmware) that passed some telco verification suite, and in certain countries it used to be that you weren't legally supposed to use hardware that hadn't passed that suite. In cases like that, you could sign the particular version of the driver, and say "this one is good". (Yeah, those laws are happily going away, but I think the ISDN people in Germany actually ended up doing exactly that, except they obviously didn't use git signatures. I think they had a list of file+md5sum). Finally, it's a tools issue. It's wrong to mix up the notion of committing and signing in the same thing, because that just complicates a tool that has to be able to do both. Now you can have a nice graphical commit tool, and it doesn't need to know about public keys etc to be useful - you can use another tool to do the signing. Small is beautiful, but "independent" is even more so. > Your tendency is to use tag objects as a permanent, public label of some > state. Signing the commit text or the email stating that commit > ${COMMIT_SHA} would work just as well for verification purposes. Well, according to that logic, you'd never need signatures at all - you can always keep them totally outside the system. But if they are totally outside the system, then you have to have some other mechanism to track them, and you can never trust a git archive on its own. My goal with the tag objects was that you can just get my git archive, and the archive is _inherently_ trustworthy, because if you care, you can verify it without any external input at all (except you need to know my public key, of course, but that's not a tools issue any more, that's about how signatures work). So by having tag objects, I can just have refs to them, and anything that can fetch a ref (which implies _any_ kind of "pull" functionality) can get it. No special cases. No crap. 
Do one thing, and do it well. Git does objects with relationships. That's really what git is all about, and the "tag object" fits very well into that mentality. Linus ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: Tags 2005-07-02 21:42 ` Tags H. Peter Anvin 2005-07-02 22:02 ` Tags A Large Angry SCM @ 2005-07-02 22:14 ` Petr Baudis 2005-07-02 22:17 ` Tags Linus Torvalds 2 siblings, 0 replies; 86+ messages in thread From: Petr Baudis @ 2005-07-02 22:14 UTC (permalink / raw) To: H. Peter Anvin Cc: Linus Torvalds, Eric W. Biederman, Daniel Barkalow, Git Mailing List, Junio C Hamano, ftpadmin Dear diary, on Sat, Jul 02, 2005 at 11:42:51PM CEST, I got a letter where "H. Peter Anvin" <hpa@zytor.com> told me that... > Linus Torvalds wrote: > > > >Note that the fact that you use a common object store does not mean that > >everything should be common. \o/ Finally I have some hope that we don't end up with something braindead w.r.t. the tags... ;-) ..snip.. > OK, so let me retell what I think I hear you say: > > - Store all the tags in the object store; they may conflict. They may have the same "human-readable name", but they will have a different hash. > - Let each source user have a set of refs, and provide a method for the > end user to select which refs to get. > > In other words, the only way (other than knowing what GPG keys to trust) > to distinguish between your "v2.6.12" and J. Random Hacker's "v2.6.12" > is that the former is referenced by *your* refs as opposed to JRH's > refs. After all, this is the best way to distinguish it, isn't it? Just "tag name" without a name of the branch the tag concerns makes no sense - that's the point I'm trying to get along. JRH's v2.6.12 wouldn't make much sense to you if you use Linus' v2.6.12, since the object JRH's v2.6.12 references simply may not be in the branch you use. Yes, JRH could tag it somewhere in the common past, but that's kind of strange and is likely some private JRH's stuff. If Linus merged JRH, he will take his v2.6.12 if it makes sense in his branch - so the decision is then up to the one who merges, which makes some sense too. 
FYI, I'll teach Cogito about the refs/tags/<branch>/<tag> later today (and totally offtopic, it already has some trivial cg-push now). It will still fall back to refs/tags/<tag>. > This also means the refs cannot be uniquely rebuilt from the > object storage. Why should they be, after all. -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ <Espy> be careful, some twit might quote you out of context.. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: Tags
2005-07-02 21:42 ` Tags H. Peter Anvin
2005-07-02 22:02 ` Tags A Large Angry SCM
2005-07-02 22:14 ` Tags Petr Baudis
@ 2005-07-02 22:17 ` Linus Torvalds
2005-07-03 0:04 ` Tags Dan Holmsand
2005-07-05 13:04 ` Tags Eric W. Biederman
2 siblings, 2 replies; 86+ messages in thread
From: Linus Torvalds @ 2005-07-02 22:17 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Eric W. Biederman, Daniel Barkalow, Git Mailing List, Junio C Hamano, ftpadmin

On Sat, 2 Jul 2005, H. Peter Anvin wrote:
>
> OK, so let me retell what I think I hear you say:
>
> - Store all the tags in the object store; they may conflict.

No. They cannot conflict.

A git "tag object" cannot conflict in any way. It is just a generic
"pointer object", and like all other objects, it is defined by its
contents, and there are no "conflicts". If two people have exactly the
same pointer, they'll just have the same object - that's not a conflict,
that's just a fact of life with content-addressable filesystems.

The git "tag object" contains a suggested symbolic name, but that
actually has no meaning except as being informational. So for example:

	[torvalds@g5 linux]$ git-cat-file tag v2.6.12
	object 9ee1c939d1cb936b1f98e8d81aeffab57bae46ab
	type commit
	tag v2.6.12

	This is the final 2.6.12 release
	-----BEGIN PGP SIGNATURE-----
	Version: GnuPG v1.2.4 (GNU/Linux)

	iD8DBQBCsykyF3YsRnbiHLsRAvPNAJ482tCZwuxp/bJRz7Q98MHlN83TpACdHr37
	o6X/3T+vm8K3bf3driRr34c=
	=sBHn
	-----END PGP SIGNATURE-----

here the "symbolic name" is "v2.6.12", but that's purely informational,
and nothing at all cares if a million people have made their own tags
that have that same tag-name. The git _object_ is:

	[torvalds@g5 linux]$ git-rev-parse v2.6.12
	26791a8bcf0e6d33f43aef7682bdb555236d56de

and that object name is going to be unique (modulo hash collisions).

> - Let each source user have a set of refs, and provide a method for the
> end user to select which refs to get.

Right. Let users have any damn refs they want. They may be refs to tag
objects, but they may just be direct refs to the commit. The tag object
really has no meaning to git, except it allows signing. That's really
the _only_ thing a tag object does: it introduces trust. There's no
other reason to ever use one, really.

And a "tag ref" thing is really nothing more (and nothing less) than a
branch. It's a 41-byte filename, although if you actually were to have a
"gitforge" daemon, it could also be just the raw 20-byte SHA1 in a
database.

Let people have their own refs, and have some good way to create them
and delete them, and copy them from others (and refer to other people's
refs - one common usage might be "I want to merge with that other user's
ref 'xyzzy'").

Note that the .git/refs/tags/xxx files are _literally_ treated exactly
the same as the same files under "heads". Or under "mydir". Git really
doesn't care, it's purely syntactic sugar. To git, a ref is a ref is a
ref. It just refers to an object, and it's nothing more than a way to
specify some random SHA1 at any time.

> In other words, the only way (other than knowing what GPG keys to trust)
> to distinguish between your "v2.6.12" and J. Random Hacker's "v2.6.12"
> is that the former is referenced by *your* refs as opposed to JRH's
> refs. This also means the refs cannot be uniquely rebuilt from the
> object storage.

Right. All the refs are personal and "fleeting" - some refs are actively
changed all the time (branch refs - aka "heads" - get updated when you
update the branch).

Tags are really the same way in all technical ways, and the only real
difference between a "branch ref" and a "tag ref" is your _expectation_
of them - one you expect to be mostly stable, the other you expect to be
updated with development. _Technically_ there's no difference between
the two, though.

(And you might also change tag contents occasionally. One reason might
be a bug and you decide to re-tag something else. But a more common
reason might be because you want to have tags like "latest" that don't
actually update with development, but they update with some other event,
like a release event or some automated test cycle completion or
something like that. So tags aren't _immutable_ even from an expectation
standpoint, it's just that they tend to change _less_).

Now, from tag _objects_ (as opposed to tag refs) you _can_ build them if
somebody created a tag object, and you have the signature so that you
can re-associate the tag-name with the person. But you should consider
that a pretty heavy and unusual case. The normal case is that you just
want to back up people's refs. They're like a part of a personal
".gitrc": you could equally well think of them as "these are my
shorthands, because I don't want to talk about 40-digit hex numbers all
the time". It's nothing more than a personal address book, really.

Linus
^ permalink raw reply [flat|nested] 86+ messages in thread
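To make the "defined by its contents" point concrete, here is a minimal sketch of how an object name can be derived from an object's bytes, assuming the usual scheme of a SHA-1 over a "<type> <size>" header, a NUL byte, and the content. The object_id helper is hypothetical (it is not a git command), it assumes sha1sum from GNU coreutils is available, and the second target object name is made up for illustration.

```shell
#!/bin/sh
# Object name = SHA-1 over "<type> <size>\0<content>".
# object_id is a hypothetical helper, not a git command; it assumes
# sha1sum (GNU coreutils) is installed.
object_id() {
    kind=$1
    file=$2
    size=$(wc -c < "$file" | tr -d ' ')
    { printf '%s %s\000' "$kind" "$size"; cat "$file"; } | sha1sum | cut -d' ' -f1
}

a=$(mktemp)
b=$(mktemp)
# Two unsigned tag bodies with the same symbolic name but different targets.
# The second target object name is made up for illustration.
printf 'object %s\ntype commit\ntag v2.6.12\n\nmine\n' \
    9ee1c939d1cb936b1f98e8d81aeffab57bae46ab > "$a"
printf 'object %s\ntype commit\ntag v2.6.12\n\nsomeone else\n' \
    0123456789abcdef0123456789abcdef01234567 > "$b"

# Same tag name, different contents: two different object names.
object_id tag "$a"
object_id tag "$b"
```

Both bodies carry "tag v2.6.12", yet they hash to different names, so the two objects can sit in one store without clashing. Which of them a given "v2.6.12" ref points at is a separate question, decided by whoever owns the ref.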
* Re: Tags 2005-07-02 22:17 ` Tags Linus Torvalds @ 2005-07-03 0:04 ` Dan Holmsand 2005-07-03 22:34 ` Tags Kevin Smith 2005-07-05 13:04 ` Tags Eric W. Biederman 1 sibling, 1 reply; 86+ messages in thread From: Dan Holmsand @ 2005-07-03 0:04 UTC (permalink / raw) To: Linus Torvalds Cc: H. Peter Anvin, Eric W. Biederman, Daniel Barkalow, Git Mailing List, Junio C Hamano, ftpadmin Linus Torvalds wrote: > And a "tag ref" thing is really nothing more (and nothing less) than a > branch. I'm guessing that this is the root of the confusion here. To you, and to git, a tag is just another branch. And a tag object is pretty much a specialized commit object, that can't have children and only one parent. But people seem to *expect* tags to be connected somehow to a specific repository. Or, rather, to a specific branch. That's why people want e.g. cogito to get "all the tags" from torvalds/linux-2.6.git when they cg-pull. From git's point of view, that doesn't really make any sense; it's like saying that you should pull all the branches from a specific branch. But from a practical point of view, it *does* make sense if you hold the view that tags are connected to a branch, and that you should be able to diff against v2.6.12 as soon as you've pulled the latest head. So why not add tags to the branch itself? It should be pretty straightforward: just make git look for tag refs in, say, a .gittags tree in the current HEAD. The whole thing would be pretty much as if you've symlinked .git/refs/tags to .gittags in the current working tree, except that tag refs would have to be read directly from the repository. That way, tag refs could be handled pretty much just like any other git-managed file: they can be added, deleted, changed, merged, committed, etc. We could track their history, and see who tagged what and when. And tags could easily be signed and contain arbitrary text, just like the present day tag objects, as long as they start with a sha1 ref. 
This way, a git branch could have public, shared tags, with a minimum of hassle. No special-casing needed for storage or transfer. And there would be no room for conflicting tag names (but you could easily use the same name in different branches, just as any file can differ in content between two branches). It might be useful, though, to add some syntax for "tag in a specific branch", say <branch-name>@<tag-name>. The present tagging mechanism should be kept. It is useful for private tagging, and may be useful for signalling that "this is a branch that is unlikely to change". So, am I missing something obvious here? /dan ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: Tags 2005-07-03 0:04 ` Tags Dan Holmsand @ 2005-07-03 22:34 ` Kevin Smith 0 siblings, 0 replies; 86+ messages in thread From: Kevin Smith @ 2005-07-03 22:34 UTC (permalink / raw) To: Dan Holmsand; +Cc: Git Mailing List Dan Holmsand wrote: > So why not add tags to the branch itself? > > It should be pretty straightforward: just make git look for tag refs in, > say, a .gittags tree in the current HEAD. The whole thing would be pretty > much as if you've symlinked .git/refs/tags to .gittags in the current > working tree, except that tag refs would have to be read directly from > the repository. > > That way, tag refs could be handled pretty much just like any other > git-managed file: they can be added, deleted, changed, merged, > committed, etc. We could track their history, and see who tagged what > and when. Sounds like the way mercurial handles tags. It really seemed weird to me at first, but the more I think about it, the more it makes sense. Even more so after reading this thread :-) http://www.serpentine.com/mercurial/index.cgi?Tag Kevin ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: Tags 2005-07-02 22:17 ` Tags Linus Torvalds 2005-07-03 0:04 ` Tags Dan Holmsand @ 2005-07-05 13:04 ` Eric W. Biederman 2005-07-05 16:21 ` Tags Daniel Barkalow 1 sibling, 1 reply; 86+ messages in thread From: Eric W. Biederman @ 2005-07-05 13:04 UTC (permalink / raw) To: Linus Torvalds Cc: H. Peter Anvin, Daniel Barkalow, Git Mailing List, Junio C Hamano, ftpadmin Linus Torvalds <torvalds@osdl.org> writes: > (And you might also change tag contents occasionally. One reason might be > a bug and you decide to re-tag something else. But a more common reason > might be because you want to have tags like "latest" that don't actually > update with development, but they update with some other event, like a > release event or some automated test cycle completion or something like > that. So tags aren't _immutable_ even from an expectation standpoint, > it's just that they tend to change _less_). Could you include the person who generated the tag and the time the tag was generated in the tag object? For a tag like "latest" it would help quite a bit if you could actually find out which was the latest version of it :) Eric ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: Tags 2005-07-05 13:04 ` Tags Eric W. Biederman @ 2005-07-05 16:21 ` Daniel Barkalow 2005-07-05 17:51 ` Tags Eric W. Biederman 0 siblings, 1 reply; 86+ messages in thread From: Daniel Barkalow @ 2005-07-05 16:21 UTC (permalink / raw) To: Eric W. Biederman Cc: Linus Torvalds, H. Peter Anvin, Git Mailing List, Junio C Hamano, ftpadmin On Tue, 5 Jul 2005, Eric W. Biederman wrote: > Could you include the person who generated the tag and the time the > tag was generated in the tag object? > > For a tag like "latest" it would help quite a bit if you could actually > find out which was the latest version of it :) Actually, what you really want here is to put in refs/tags/latest the hash of the tag whose "tag" field is v2.6.13-rc1 (or whatever it is). Having a tag with the "tag" field of "latest" would be a bit silly, because the object will probably stay in circulation long after it's no longer true. And the object itself would tell you that it was the latest version when it was created (but isn't every version?). That's why you want the _tag_ to say something useful about the version (maybe "v2.6.12", maybe just "tested"), and the _ref_ to tell you it's the latest. The fact that lots of tags get refs named with their contents is just due to tags only getting used for a small portion of their possible uses. This only happens when the feature you'd look something up under is a feature which is persistent. -Daniel *This .sig left intentionally blank* ^ permalink raw reply [flat|nested] 86+ messages in thread
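The split Daniel describes (the tag object says something permanent, the ref says "latest") can be sketched with plain files. This is only an illustration under assumptions: a throwaway directory stands in for .git/refs/tags, set_ref is a hypothetical helper, and the second object name is made up; only the v2.6.12 tag object name comes from the thread.

```shell
#!/bin/sh
# Refs as small files holding an object name. The directory is a throwaway
# stand-in for .git/refs/tags; set_ref is a hypothetical helper.
refs=$(mktemp -d)

set_ref() {
    # $1 = ref name, $2 = 40-digit hex object name
    printf '%s\n' "$2" > "$refs/$1"
}

# The v2.6.12 tag object name quoted by Linus earlier in the thread.
v2612=26791a8bcf0e6d33f43aef7682bdb555236d56de
# Made-up object name standing in for a later release tag.
later=0123456789abcdef0123456789abcdef01234567

set_ref v2.6.12 "$v2612"    # stable ref, named after the tag it points at
set_ref latest "$v2612"     # moving ref, currently the same object
set_ref latest "$later"     # a new release: only the ref is rewritten

cat "$refs/v2.6.12"
cat "$refs/latest"
```

Nothing about the tag objects changes when "latest" moves; the name "latest" lives only in the ref, which is why it can stay true over time.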
* Re: Tags 2005-07-05 16:21 ` Tags Daniel Barkalow @ 2005-07-05 17:51 ` Eric W. Biederman 2005-07-05 18:33 ` Tags Linus Torvalds 0 siblings, 1 reply; 86+ messages in thread From: Eric W. Biederman @ 2005-07-05 17:51 UTC (permalink / raw) To: Daniel Barkalow Cc: Linus Torvalds, H. Peter Anvin, Git Mailing List, Junio C Hamano, ftpadmin Daniel Barkalow <barkalow@iabervon.org> writes: > On Tue, 5 Jul 2005, Eric W. Biederman wrote: > >> Could you include the person who generated the tag and the time the >> tag was generated in the tag object? >> >> For a tag like "latest" it would help quite a bit if you could actually >> find out which was the latest version of it :) > > The fact that lots of tags get refs named with their contents is just due > to tags only getting used for a small portion of their possible uses. This > only happens when the feature you'd look something up under is a feature > which is persistent. True but if you can you will get multiple tags with the same suggested name. So you need some way to find the one you care about. Either a date or its position in the tree is all you have to go on. I picked on latest as that is an extreme example that had already been mentioned. Eric ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: Tags 2005-07-05 17:51 ` Tags Eric W. Biederman @ 2005-07-05 18:33 ` Linus Torvalds 2005-07-05 19:22 ` Tags Junio C Hamano 2005-07-07 3:31 ` Tags Eric W. Biederman 0 siblings, 2 replies; 86+ messages in thread From: Linus Torvalds @ 2005-07-05 18:33 UTC (permalink / raw) To: Eric W. Biederman Cc: Daniel Barkalow, H. Peter Anvin, Git Mailing List, Junio C Hamano, ftpadmin On Tue, 5 Jul 2005, Eric W. Biederman wrote: > > True but if you can you will get multiple tags with the > same suggested name. So you need some way to find the one you > care about. I do agree that it would make sense to have a "tagger" field with the same semantics as the "committer" in a commit (including all the same fields: real name, email, and date). Linus ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: Tags 2005-07-05 18:33 ` Tags Linus Torvalds @ 2005-07-05 19:22 ` Junio C Hamano 2005-07-06 18:04 ` Tags Matthias Urlichs 2005-07-07 3:31 ` Tags Eric W. Biederman 1 sibling, 1 reply; 86+ messages in thread From: Junio C Hamano @ 2005-07-05 19:22 UTC (permalink / raw) To: Linus Torvalds Cc: Eric W. Biederman, Daniel Barkalow, H. Peter Anvin, Git Mailing List, ftpadmin >>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes: LT> On Tue, 5 Jul 2005, Eric W. Biederman wrote: >> >> True but if you can you will get multiple tags with the >> same suggested name. So you need some way to find the one you >> care about. LT> I do agree that it would make sense to have a "tagger" field with the same LT> semantics as the "committer" in a commit (including all the same fields: LT> real name, email, and date). While we are talking about changing tag object format/fields, I've wondered if we would want to be able to associate more than one object with a single tag (i.e. have more than one "object" line just like commits can have more than one "parent" line). I admit that it would not be a "tag" anymore, rather, it would be a "bag". I wanted to have something like this in the past for some reason I do not exactly remember anymore, but basically it was to record "here is the list of related objects." I could fake it with a multi-parent commit with a commit message if all I want to include are commits with a single blob, but that is (1) abusing the commit to record something that is not even a merge, and (2) the tree associated with that commit would not mean anything. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: Tags 2005-07-05 19:22 ` Tags Junio C Hamano @ 2005-07-06 18:04 ` Matthias Urlichs 0 siblings, 0 replies; 86+ messages in thread From: Matthias Urlichs @ 2005-07-06 18:04 UTC (permalink / raw) To: git Hi, Junio C Hamano wrote: > I wanted to have something like this in the past for some reason > I do not exactly remember anymore, but basically it was to > record "here is the list of related objects." One use I'd have for that is regression testing -- collect all IDs in one bag and then say "gitk bad ^good". OTOH, I dunno whether the core tools really need to understand that. -- Matthias Urlichs | {M:U} IT Design @ m-u-it.de | smurf@smurf.noris.de Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de - - If at first you don't succeed, you must be a programmer. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: Tags
2005-07-05 18:33 ` Tags Linus Torvalds
2005-07-05 19:22 ` Tags Junio C Hamano
@ 2005-07-07 3:31 ` Eric W. Biederman
1 sibling, 0 replies; 86+ messages in thread
From: Eric W. Biederman @ 2005-07-07 3:31 UTC (permalink / raw)
To: Linus Torvalds
Cc: Daniel Barkalow, H. Peter Anvin, Git Mailing List, Junio C Hamano, ftpadmin

Linus Torvalds <torvalds@osdl.org> writes:

> On Tue, 5 Jul 2005, Eric W. Biederman wrote:
>>
>> True but if you can you will get multiple tags with the
>> same suggested name. So you need some way to find the one you
>> care about.
>
> I do agree that it would make sense to have a "tagger" field with the same
> semantics as the "committer" in a commit (including all the same fields:
> real name, email, and date).

Ok here is a patch that implements it. I don't know how robust my code
to get the defaults of tagger email address and especially tagger name
are but basically it works.

In addition I added a message when git-tag-script is waiting for you to
type the tag message so people aren't confused. And of course I modified
git-mktag to check that the tagger field is present.

Now git-pull-script just needs to be tweaked to optionally add tags in
the update into .git/refs/tags :) Using git-fsck-cache to find tags is
doable but it slows down as your archive grows.

Eric

diff --git a/date.c b/date.c
diff --git a/git-tag-script b/git-tag-script
--- a/git-tag-script
+++ b/git-tag-script
@@ -1,12 +1,30 @@
 #!/bin/sh
 # Copyright (c) 2005 Linus Torvalds
 
+usage() {
+	echo 'git tag <tag name> [<sha1>]'
+	exit 1
+}
+
 : ${GIT_DIR=.git}
+if [ ! -d "$GIT_DIR" ]; then
+	echo Not a git directory 1>&2
+	exit 1
+fi
+
+if [ $# -gt 2 -o $# -lt 1 ]; then
+	usage
+fi
 
 object=${2:-$(cat "$GIT_DIR"/HEAD)}
 type=$(git-cat-file -t $object) || exit 1
-( echo -e "object $object\ntype $type\ntag $1\n"; cat ) > .tmp-tag
+tagger_name=${GIT_COMMITTER_NAME:-$(sed -n -e "s/^$(whoami):[^:]*:[^:]*:[^:]*:\([^:,]*\).*:.*$/\1/p" < /etc/passwd)}
+tagger_email=${GIT_COMMITTER_EMAIL:-"$(whoami)@$(hostname --fqdn)"}
+tagger_date=$(date -d "${GIT_COMMITTER_DATE:-$(date -R)}" +"%s %z") || exit 1
+echo "Enter tag message now. ^D when finished"
+( echo -e "object $object\ntype $type\ntag $1\ntagger $tagger_name <$tagger_email> $tagger_date\n"; cat) > .tmp-tag
 rm -f .tmp-tag.asc
 gpg -bsa .tmp-tag && cat .tmp-tag.asc >> .tmp-tag
-git-mktag < .tmp-tag
-#rm .tmp-tag .tmp-tag.sig
+exit 1
+./git-mktag < .tmp-tag
+rm -f .tmp-tag .tmp-tag.sig
diff --git a/mktag.c b/mktag.c
--- a/mktag.c
+++ b/mktag.c
@@ -42,7 +42,7 @@ static int verify_tag(char *buffer, unsi
 	int typelen;
 	char type[20];
 	unsigned char sha1[20];
-	const char *object, *type_line, *tag_line;
+	const char *object, *type_line, *tag_line, *tagger_line;
 
 	if (size < 64 || size > MAXSIZE-1)
 		return -1;
@@ -91,6 +91,11 @@ static int verify_tag(char *buffer, unsi
 			continue;
 		return -1;
 	}
+	/* Verify the tagger line */
+	tagger_line = tag_line;
+
+	if (memcmp(tagger_line, "tagger ", 7) || (tagger_line[7] == '\n'))
+		return -1;
 
 	/* The actual stuff afterwards we don't care about.. */
 	return 0;
@@ -119,7 +124,7 @@ int main(int argc, char **argv)
 		size += ret;
 	}
 
-	// Verify it for some basic sanity: it needs to start with "object <sha1>\ntype "
+	// Verify it for some basic sanity: it needs to start with "object <sha1>\ntype\ntagger "
 	if (verify_tag(buffer, size) < 0)
 		die("invalid tag signature file");
 
^ permalink raw reply [flat|nested] 86+ messages in thread
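For readers who want to see the header layout the patch produces, the following is a rough sketch of a header check written in shell rather than C. It is not a port of verify_tag(): check_tag_header is a hypothetical helper, the regular expressions are my own approximation (stricter than mktag.c about the tagger line, which the patch only requires to start with "tagger " and be non-empty), and the tagger name and address are made up.

```shell
#!/bin/sh
# Loose check of a tag header carrying the proposed "tagger" line.
# check_tag_header is a hypothetical helper, only an approximation of
# what verify_tag() in mktag.c accepts.
check_tag_header() {
    f=$1
    # Line 1: "object " plus a 40-digit hex object name.
    sed -n 1p "$f" | grep -Eq '^object [0-9a-f]{40}$' || return 1
    # Line 2: the type of the tagged object.
    sed -n 2p "$f" | grep -Eq '^type [a-z]+$' || return 1
    # Line 3: the suggested symbolic name, no whitespace.
    sed -n 3p "$f" | grep -Eq '^tag [^[:space:]]+$' || return 1
    # Line 4: "tagger <name> <email> <seconds> <zone>" as the script emits it.
    sed -n 4p "$f" | grep -Eq '^tagger .+ <[^>]*> [0-9]+ [+-][0-9]{4}$' || return 1
}

sample=$(mktemp)
# Made-up tagger identity and timestamp, for illustration only.
printf 'object %s\ntype commit\ntag v2.6.12\ntagger A U Thor <author@example.com> 1120707060 -0600\n\nmessage\n' \
    9ee1c939d1cb936b1f98e8d81aeffab57bae46ab > "$sample"

if check_tag_header "$sample"; then
    echo "header accepted"
else
    echo "header rejected"
fi
```

A tag body written in the old format, without the fourth line, fails the last check, which matches the intent of the mktag.c change.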
* Re: Tags 2005-07-02 17:58 ` Tags H. Peter Anvin 2005-07-02 18:31 ` Tags Eric W. Biederman @ 2005-07-02 18:45 ` Linus Torvalds 1 sibling, 0 replies; 86+ messages in thread From: Linus Torvalds @ 2005-07-02 18:45 UTC (permalink / raw) To: H. Peter Anvin Cc: Eric W. Biederman, Daniel Barkalow, Git Mailing List, Junio C Hamano, ftpadmin On Sat, 2 Jul 2005, H. Peter Anvin wrote: > > Well, you're wrong. Tags is the only part of git which cannot be > protected by git's own self-validation system. Well, you _can_ use the tag objects. That's what I do. The namespace isn't the tag name you use ("v2.6.12"), it's the name of the tag itself (in this case "26791a8bcf0e6d33f43aef7682bdb555236d56de"), and then it does actually distribute fine. The symbolic name is encoded within the tag, but isn't guaranteed to be unique in any way. So no, it doesn't protect the tag _name_ per se. Anybody can create a tag called "v2.6.12", and I don't think there's any way to handle clashes sanely. But you can find the tag objects in a pack, and you could index them separately. Then you'd need to let the users decide which ones they trust or want to use. Linus ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: Tags
2005-07-02 0:06 ` Tags H. Peter Anvin
2005-07-02 7:00 ` Tags Eric W. Biederman
@ 2005-07-02 20:38 ` Jan Harkes
2005-07-02 22:32 ` Tags Jan Harkes
1 sibling, 1 reply; 86+ messages in thread
From: Jan Harkes @ 2005-07-02 20:38 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Eric W. Biederman, Linus Torvalds, Daniel Barkalow, Git Mailing List, Junio C Hamano, ftpadmin

On Fri, Jul 01, 2005 at 05:06:15PM -0700, H. Peter Anvin wrote:
> Eric W. Biederman wrote:
> >
> >If I really care what developer xyz tagged I will pull from them,
> >or a mirror I trust. And since developer xyz doesn't pull his
> >own global tags from other repositories that should be sufficient.
> >
>
> You're missing something totally and utterly fundamental here: I'm
> talking about creating an infrastructure (think sourceforge) where there
> is only one git repository for the whole system, period, full stop, end
> of story.

I'm not entirely sure what you are envisioning, but it is definitely
doable in a secure way.

- Assume that each developer will have one or more private trees with
  one or more branches on kernel.org, let's say all these private
  repositories are stored under /scm/git/<user>/

- Now you create a single 'global repository' which is going to be the
  publicly visible one that will be mirrored out,

- Then you run the following script (untested)

    #!/bin/sh
    GIT_DIR=$global_repo
    for user in `(cd /scm/git ; ls)`; do
      for tree in `find /scm/git/$user -name *.git` ; do
        for ref in `find $tree/refs -type f` ; do
          type=`echo $ref | sed 'sX^.*/refs/\([^/]*\)/.*$X\1X'`
          name=`echo $ref | sed 'sX^.*/refs/[^/]*/\(.*\)$X\1X'`
          git fetch /scm/git/$tree $branch
          mkdir -p $GIT_DIR/refs/$type/$user/$name
          cat $GIT_DIR/FETCH_HEAD > $GIT_DIR/refs/$type/$user/$name
        done
      done
    done

- You can repack the global repository whenever you want.

- Finally, once a user knows that all his changes are available from the
  global repository, he can remove any objects from his tree and use
  GIT_ALTERNATE_OBJECT_DIRECTORIES=$global_repo/objects

  (maybe there should be a flag for git prune to remove local objects
  that are already available in the alternate object directories)

Jan
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: Tags
2005-07-02 20:38 ` Tags Jan Harkes
@ 2005-07-02 22:32 ` Jan Harkes
0 siblings, 0 replies; 86+ messages in thread
From: Jan Harkes @ 2005-07-02 22:32 UTC (permalink / raw)
To: Git Mailing List
Cc: H. Peter Anvin, Eric W. Biederman, Linus Torvalds, Daniel Barkalow, Junio C Hamano, ftpadmin

On Sat, Jul 02, 2005 at 04:38:06PM -0400, Jan Harkes wrote:
> - Then you run the following script (untested)

Ok, I tested it and it was pretty broken, I assumed that
git-fetch-script accepted the same arguments as git-pull-script. Here is
one that actually seems to work.

Jan

#!/bin/sh
#
# combine per-user private trees into a single repository.
# assumes that user repositories are stored as "$repos/<user>/<tree>.git"
#
global=global.git
repos=/path/to/user/repositories

export GIT_DIR="$global"

# create global repository if it doesn't exist
git-init-db

for tree in $(cd "$repos" && find . -name '*.git' -prune | sed 'sX./XX')
do
    root="$repos/$tree"
    for ref in $(cd "$root" && find refs -type f) ; do
        echo Synchronizing $tree
        git fetch "$root" "$ref"
        type=$(echo "$ref" | sed -ne 'sX^refs/\([^/]*\)/.*$X\1Xp')
        name=$(echo "$ref" | sed -ne 'sX^refs/[^/]*/\(.*\)$X\1Xp')
        dest="$GIT_DIR/refs/$type/$tree/$name"
        mkdir -p $(dirname "$dest")
        cat "$GIT_DIR/FETCH_HEAD" > "$dest"
    done
done
^ permalink raw reply [flat|nested] 86+ messages in thread
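The ref renaming done by the script above can be tried on its own, without any repository. The sketch below reuses the two sed expressions from the script; map_ref is a hypothetical wrapper added for illustration, and "alice/linux.git" is a made-up tree name.

```shell
#!/bin/sh
# Split a ref path the way the script above does and build the per-tree
# destination ref. map_ref is a hypothetical helper; the tree name used
# below is made up.
map_ref() {
    tree=$1
    ref=$2
    # First path component after "refs/" (e.g. "tags" or "heads").
    type=$(echo "$ref" | sed -ne 'sX^refs/\([^/]*\)/.*$X\1Xp')
    # Everything after that component (may itself contain slashes).
    name=$(echo "$ref" | sed -ne 'sX^refs/[^/]*/\(.*\)$X\1Xp')
    echo "refs/$type/$tree/$name"
}

map_ref alice/linux.git refs/tags/v2.6.12
map_ref alice/linux.git refs/heads/master
```

Because the tree path sits between the ref type and the ref name, two users who both publish a "v2.6.12" tag ref end up with distinct refs in the combined repository.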
* Re: Tags 2005-07-01 22:44 ` Tags H. Peter Anvin 2005-07-01 23:07 ` Tags Eric W. Biederman @ 2005-07-02 16:00 ` Matthias Urlichs 1 sibling, 0 replies; 86+ messages in thread From: Matthias Urlichs @ 2005-07-02 16:00 UTC (permalink / raw) To: git Hi, H. Peter Anvin wrote: > Doesn't work. You can trivially generate a key with someone else's > address. It would require a full PKI. So you use the GPG key's fingerprint as the directory name, and add a few strategically named symlinks for convenience. *Shrug* Besides, what's wrong with requiring full PKI? Everybody who has a kernel.org account should be in the strongly connected set... -- Matthias Urlichs | {M:U} IT Design @ m-u-it.de | smurf@smurf.noris.de Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de - - What I want is all of the power and none of the responsibility. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: Tags
2005-07-01 13:56 ` Tags Eric W. Biederman
2005-07-01 16:37 ` Tags H. Peter Anvin
@ 2005-07-01 18:09 ` Petr Baudis
2005-07-01 18:37 ` Tags H. Peter Anvin
1 sibling, 1 reply; 86+ messages in thread
From: Petr Baudis @ 2005-07-01 18:09 UTC
To: Eric W. Biederman
Cc: H. Peter Anvin, Linus Torvalds, Daniel Barkalow, Git Mailing List, Junio C Hamano, ftpadmin

Dear diary, on Fri, Jul 01, 2005 at 03:56:06PM CEST, I got a letter
where "Eric W. Biederman" <ebiederm@xmission.com> told me that...
> "H. Peter Anvin" <hpa@zytor.com> writes:
>
> > In the end, it might be that the right thing to do for git on kernel.org is to
> > have a single, unified object store which isn't accessible by anything other
> > than git-specific protocols. There would have to be some way of dealing with,
> > for example, conflicting tags that apply to different repositories, though.
>
> As far as I can tell public distributed tags are not that hard and if
> you are going to be synching them it is probably worth working on.
>
> The basic idea is that instead of having one global tag of
> 'linux-2.6.13-rc1' you have a global tag of
> 'torvalds@osdl.org/linux-2.6.13-rc1'.
>
> The important part is that the tag namespace is made hierarchical
> with at least 2 levels. Where the top level is a globally
> unique tag owner id and the bottom level is the actual tag. This
> prevents collisions when merging trees because two people's
> tags are never in the same namespace, at least when
> people are not actively hostile :)

I don't know, I don't consider this very appealing myself. I'd rather
prefer the private tags to be per-repository rather than per-user, since
those ugly "merged-here", "broken" etc. tags aren't very useful on
a larger scope than that of a repository. OTOH, what tags would be
per-user, not per-repository and not global?

--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
<Espy> be careful, some twit might quote you out of context..
* Re: Tags
2005-07-01 18:09 ` Tags Petr Baudis
@ 2005-07-01 18:37 ` H. Peter Anvin
2005-07-01 21:20 ` Tags Matthias Urlichs
2005-07-01 21:42 ` Tags Petr Baudis
0 siblings, 2 replies; 86+ messages in thread
From: H. Peter Anvin @ 2005-07-01 18:37 UTC
To: Petr Baudis
Cc: Eric W. Biederman, Linus Torvalds, Daniel Barkalow, Git Mailing List, Junio C Hamano, ftpadmin

Petr Baudis wrote:
> Dear diary, on Fri, Jul 01, 2005 at 03:56:06PM CEST, I got a letter
> where "Eric W. Biederman" <ebiederm@xmission.com> told me that...
>
>>"H. Peter Anvin" <hpa@zytor.com> writes:
>>
>>>In the end, it might be that the right thing to do for git on kernel.org is to
>>>have a single, unified object store which isn't accessible by anything other
>>>than git-specific protocols. There would have to be some way of dealing with,
>>>for example, conflicting tags that apply to different repositories, though.
>>
>>As far as I can tell public distributed tags are not that hard and if
>>you are going to be synching them it is probably worth working on.
>>
>>The basic idea is that instead of having one global tag of
>>'linux-2.6.13-rc1' you have a global tag of
>>'torvalds@osdl.org/linux-2.6.13-rc1'.
>>
>>The important part is that the tag namespace is made hierarchical
>>with at least 2 levels. Where the top level is a globally
>>unique tag owner id and the bottom level is the actual tag. This
>>prevents collisions when merging trees because two people's
>>tags are never in the same namespace, at least when
>>people are not actively hostile :)
>
> I don't know, I don't consider this very appealing myself. I'd rather
> prefer the private tags to be per-repository rather than per-user, since
> those ugly "merged-here", "broken" etc. tags aren't very useful on
> a larger scope than that of a repository. OTOH, what tags would be
> per-user, not per-repository and not global?
>

He's talking about global tags, just using a "globally unique" namespace.
Which of course only works right if you genuinely can't create tags
outside your assigned namespace.

-hpa
* Re: Tags
2005-07-01 18:37 ` Tags H. Peter Anvin
@ 2005-07-01 21:20 ` Matthias Urlichs
2005-07-01 21:42 ` Tags Petr Baudis
1 sibling, 0 replies; 86+ messages in thread
From: Matthias Urlichs @ 2005-07-01 21:20 UTC
To: git

Hi, H. Peter Anvin wrote:

> Which of course only works right if you genuinely can't
> create tags outside your assigned namespace.

I'd rather say that you can't *push* the tags to the central server if
their namespace is wrong, but nothing would prevent you from *creating*
arbitrary tags in your own repository.

--
Matthias Urlichs | {M:U} IT Design @ m-u-it.de | smurf@smurf.noris.de
Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de
- -
Habit is habit, and not to be flung out of the window by any man, but coaxed
down-stairs a step at a time.
-- Mark Twain
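
[Editorial sketch] The push-versus-create distinction in this message can be illustrated with a small server-side check. This is only a sketch under assumptions, not anything git-receive-pack is described as doing in this thread: the function name `may_push_tag`, the `refs/tags/<owner>/<tag>` layout, and the idea that the server already knows the pusher's authenticated owner id are all hypothetical.

```python
def may_push_tag(authenticated_owner, ref):
    """Return True if `ref` lies inside the pusher's own tag namespace.

    Hypothetical layout: refs/tags/<owner-id>/<tag-name>, where <owner-id>
    is something like 'torvalds@osdl.org'. Local tag creation is never
    restricted; this check would only run on the receiving side of a push.
    """
    prefix = "refs/tags/"
    if not ref.startswith(prefix):
        return False
    owner, sep, tag_name = ref[len(prefix):].partition("/")
    # Require both an owner level and a non-empty tag level.
    if not sep or not tag_name:
        return False
    return owner == authenticated_owner
```

Under these assumptions a push of 'refs/tags/torvalds@osdl.org/linux-2.6.13-rc1' is accepted only from the owner 'torvalds@osdl.org', and a flat 'refs/tags/linux-2.6.13-rc1' is rejected for everyone. How the owner id is authenticated is the open PKI question raised elsewhere in the thread and is not addressed here.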
* Re: Tags
2005-07-01 18:37 ` Tags H. Peter Anvin
2005-07-01 21:20 ` Tags Matthias Urlichs
@ 2005-07-01 21:42 ` Petr Baudis
2005-07-01 21:52 ` Tags H. Peter Anvin
1 sibling, 1 reply; 86+ messages in thread
From: Petr Baudis @ 2005-07-01 21:42 UTC
To: H. Peter Anvin
Cc: Eric W. Biederman, Linus Torvalds, Daniel Barkalow, Git Mailing List, Junio C Hamano, ftpadmin

Dear diary, on Fri, Jul 01, 2005 at 08:37:55PM CEST, I got a letter
where "H. Peter Anvin" <hpa@zytor.com> told me that...
> Petr Baudis wrote:
> >Dear diary, on Fri, Jul 01, 2005 at 03:56:06PM CEST, I got a letter
> >where "Eric W. Biederman" <ebiederm@xmission.com> told me that...
> >
> >>"H. Peter Anvin" <hpa@zytor.com> writes:
> >>
> >>>In the end, it might be that the right thing to do for git on
> >>>kernel.org is to have a single, unified object store which isn't
> >>>accessible by anything other than git-specific protocols. There
> >>>would have to be some way of dealing with, for example, conflicting
> >>>tags that apply to different repositories, though.
> >>
> >>As far as I can tell public distributed tags are not that hard and if
> >>you are going to be synching them it is probably worth working on.
> >>
> >>The basic idea is that instead of having one global tag of
> >>'linux-2.6.13-rc1' you have a global tag of
> >>'torvalds@osdl.org/linux-2.6.13-rc1'.
> >>
> >>The important part is that the tag namespace is made hierarchical
> >>with at least 2 levels. Where the top level is a globally
> >>unique tag owner id and the bottom level is the actual tag. This
> >>prevents collisions when merging trees because two people's
> >>tags are never in the same namespace, at least when
> >>people are not actively hostile :)
> >
> >I don't know, I don't consider this very appealing myself. I'd rather
> >prefer the private tags to be per-repository rather than per-user, since
> >those ugly "merged-here", "broken" etc. tags aren't very useful on
> >a larger scope than that of a repository. OTOH, what tags would be
> >per-user, not per-repository and not global?
> >
>
> He's talking about global tags, just using a "globally unique"
> namespace. Which of course only works right if you genuinely can't
> create tags outside your assigned namespace.

I doubt that's really useful either. Rather artificial mechanisms for
protection of the namespace would have to be deployed, and again, what
would it be good for anyway? If you are tagging linux-2.m.n, you are
probably whoever you should be - David, Alan, Marcelo, Linus, or whoever
else, while if you are tagging linux-2.m.n-cki, you are likely Con
Kolivas. I don't believe there is any (or much) potential for "natural"
conflicts and if you are malicious, you will just fake the namespace;
but frequently what's interesting about the tags is not the author at
all - I would consider it confusing to have to suddenly dive to another
namespace when Linus hands maintenance of linux-2.m to someone else.

The only significant value I can therefore see in the namespaces is
prevention of user mistakes, but I think the successful strategy here
would be just "upstream will notice", and make sure the upstream will be
notified properly (perhaps even interactively) about any new tags it
gets.

Ok, I admit that it boils down to me being lazy and that "it'd be more
typing!"... ;-)

--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
<Espy> be careful, some twit might quote you out of context..
* Re: Tags
2005-07-01 21:42 ` Tags Petr Baudis
@ 2005-07-01 21:52 ` H. Peter Anvin
2005-07-01 22:27 ` Tags Daniel Barkalow
2005-07-01 22:59 ` Tags Petr Baudis
0 siblings, 2 replies; 86+ messages in thread
From: H. Peter Anvin @ 2005-07-01 21:52 UTC
To: Petr Baudis
Cc: Eric W. Biederman, Linus Torvalds, Daniel Barkalow, Git Mailing List, Junio C Hamano, ftpadmin

Petr Baudis wrote:
>
> I doubt that's really useful either. Rather artificial mechanisms for
> protection of the namespace would have to be deployed, and again, what
> would it be good for anyway? If you are tagging linux-2.m.n, you are
> probably whoever you should be - David, Alan, Marcelo, Linus, or whoever
> else, while if you are tagging linux-2.m.n-cki, you are likely Con
> Kolivas. I don't believe there is any (or much) potential for "natural"
> conflicts and if you are malicious, you will just fake the namespace;
> but frequently what's interesting about the tags is not the author at
> all - I would consider it confusing to have to suddenly dive to another
> namespace when Linus hands maintenance of linux-2.m to someone else.
>
> The only significant value I can therefore see in the namespaces is
> prevention of user mistakes, but I think the successful strategy here
> would be just "upstream will notice", and make sure the upstream will be
> notified properly (perhaps even interactively) about any new tags it
> gets.
>
> Ok, I admit that it boils down to me being lazy and that "it'd be more
> typing!"... ;-)
>

You're missing the whole point of the discussion. Right now the only
thing that makes a global object store impossible is the potential for a
tag conflict, either intentional or accidental.

-hpa
* Re: Tags
2005-07-01 21:52 ` Tags H. Peter Anvin
@ 2005-07-01 22:27 ` Daniel Barkalow
2005-07-01 22:59 ` Tags Petr Baudis
1 sibling, 0 replies; 86+ messages in thread
From: Daniel Barkalow @ 2005-07-01 22:27 UTC
To: H. Peter Anvin
Cc: Petr Baudis, Eric W. Biederman, Linus Torvalds, Git Mailing List, Junio C Hamano, ftpadmin

On Fri, 1 Jul 2005, H. Peter Anvin wrote:

> You're missing the whole point of the discussion. Right now the only
> thing that makes a global object store impossible is the potential for a
> tag conflict, either intentional or accidental.

Is there some issue remaining with having a global *object* store,
symlinked from multiple repositories, each with its own tags and such?

(I'd think that, in the refs, there would be more contention over the
heads than the tags, in any case; refs/heads/master is kind of popular)

-Daniel
*This .sig left intentionally blank*
* Re: Tags
2005-07-01 21:52 ` Tags H. Peter Anvin
2005-07-01 22:27 ` Tags Daniel Barkalow
@ 2005-07-01 22:59 ` Petr Baudis
1 sibling, 0 replies; 86+ messages in thread
From: Petr Baudis @ 2005-07-01 22:59 UTC
To: H. Peter Anvin
Cc: Eric W. Biederman, Linus Torvalds, Daniel Barkalow, Git Mailing List, Junio C Hamano, ftpadmin

Dear diary, on Fri, Jul 01, 2005 at 11:52:51PM CEST, I got a letter
where "H. Peter Anvin" <hpa@zytor.com> told me that...
> You're missing the whole point of the discussion. Right now the only
> thing that makes a global object store impossible is the potential for a
> tag conflict, either intentional or accidental.

Ok, I was arguing about something a bit different here, sorry.

The point of refs/tags/ should be to just indicate tags which we have in
the current head (remember that this structure comes from the times
before Dave, when the repository:"master branch" mapping was 1:1), since
those are usually the only objects you have in _your_ repository. What's
the point of having tag linux-1.0.4-ac128 when you don't have the
linux-1.0.4-ac branch whatsoever?

The distinction of "public" vs "private" tags here is really only that
the "public" tags should be propagated to your head when you merge the
remote head. This way, each head will have its own set of tags, and it
will be only tags which actually reference objects relevant to the head.

Now that we can have many branches in a repository, each with its own
set of tags, we should probably extend the tags hierarchy to
refs/tags/<head>/<tagname>. And see, you can actually have that in the
global object store, as long as the head names are unique. But heads
don't propagate in any way so that's a purely administrative issue on
the global store side.

BTW, I don't think many (most?) heads named "master" are a big issue.
That's how the head is called locally, and no one says that's how the
head should be known at the other side too. It's fine to have a head
called "master" in your repository and when pushing to the global object
store call it "pasky/linux-l33t" over there. (If you are using Cogito,
you can add that branch using a URL
proto://global/obj/store#pasky/linux-l33t.)

--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
<Espy> be careful, some twit might quote you out of context..
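
[Editorial sketch] The per-head tag layout and the local-to-global head renaming proposed in this message can be written down as two small helpers. This is a minimal illustration, not code from git or Cogito: the function names `tag_ref` and `remote_head_name` and the `push_map` dictionary are hypothetical, and only the refs/tags/<head>/<tagname> shape and the "master" to "pasky/linux-l33t" example come from the message itself.

```python
def tag_ref(head, tag_name):
    """Build the per-head tag path refs/tags/<head>/<tagname>."""
    return "refs/tags/{}/{}".format(head, tag_name)


def remote_head_name(local_head, push_map):
    """Translate a local head name to its name in the global store.

    `push_map` is a hypothetical per-repository mapping; heads without
    an entry keep their local name.
    """
    return push_map.get(local_head, local_head)


# Locally the head is "master"; in the global store it gets a unique name.
push_map = {"master": "pasky/linux-l33t"}
remote = remote_head_name("master", push_map)
print(remote)                               # pasky/linux-l33t
print(tag_ref(remote, "linux-2.6.13-rc1"))  # refs/tags/pasky/linux-l33t/linux-2.6.13-rc1
```

Because the tag path is derived from the globally unique head name, two repositories that both call their head "master" locally would not collide in the shared store, provided the administrator keeps the global head names unique.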
* Re: "git-send-pack"
2005-06-30 20:12 ` "git-send-pack" Linus Torvalds
2005-06-30 20:23 ` "git-send-pack" H. Peter Anvin
@ 2005-06-30 20:49 ` Daniel Barkalow
1 sibling, 0 replies; 86+ messages in thread
From: Daniel Barkalow @ 2005-06-30 20:49 UTC
To: Linus Torvalds
Cc: Git Mailing List, Junio C Hamano, ftpadmin

On Thu, 30 Jun 2005, Linus Torvalds wrote:

> On Thu, 30 Jun 2005, Daniel Barkalow wrote:
> >
> > The right solution probably involves getting each pack file you push to
> > the mirrors as well as to the master. They'll probably update no less
> > frequently than you push, and they should go through a series of states
> > which matches the master, so it's not necessary to have anything smart on
> > master sending them, and they only have to unpack the files they get (and
> > update the refs afterward).
>
> Hmm, yes. That would work, together with just fetching the heads.
>
> It won't _really_ solve the problem, since the pushed pack objects will
> grow at a proportional rate to the current objects - it's just a constant
> factor (admittedly a potentially fairly _big_ constant factor)
> improvement both in size and in number of files.
>
> So the mirroring ends up getting slowly slower and slower as the number of
> pack files goes up. In contrast, a git-aware thing can be basically
> constant-time, and mirroring expense ends up being relative to the size of
> the change rather than the size of the repository.
>
> But mirroring just pack-files might solve the problem for the foreseeable
> future, so..

Whenever it gets slow, you could replace all the old packs with a single
new pack containing all the old objects; and master could repack whenever
it has a lot of pack files. That's pretty close to O(n) in change size.

Alternatively, having a reverse-ordered list of pack files would mean that
mirrors could just go through that list until they found one they already
had, and stop there, which would really be O(n).

-Daniel
*This .sig left intentionally blank*
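
[Editorial sketch] The reverse-ordered pack list idea in this message can be illustrated as follows. This is a sketch under assumptions rather than an existing git mechanism: the function name `packs_to_fetch`, the plain list of pack names published newest first, and the set of names the mirror already holds are all hypothetical.

```python
def packs_to_fetch(remote_newest_first, local_packs):
    """Return the pack names the mirror is missing, oldest first.

    Walks the master's newest-first list and stops at the first pack the
    mirror already has, so the cost is proportional to the number of new
    packs rather than to the total number of packs on the master.
    Assumes the mirror has every pack older than the first match.
    """
    missing = []
    for name in remote_newest_first:
        if name in local_packs:
            break  # everything older is assumed to be mirrored already
        missing.append(name)
    missing.reverse()  # fetch in the order the master created them
    return missing


remote = ["pack-d", "pack-c", "pack-b", "pack-a"]  # newest first
have = {"pack-a", "pack-b"}
print(packs_to_fetch(remote, have))  # ['pack-c', 'pack-d']
```

The early `break` is what makes the walk proportional to the change. It also means the sketch relies on the mirror never having gaps in its history of packs, which a periodic repack on the master would have to take into account.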