* Regarding "git log" on "git series" metadata
From: Junio C Hamano @ 2016-11-04 17:57 UTC
To: Josh Triplett; +Cc: git

After your talk at LPC2016, I was thinking about your proposal to
give an option to hide certain parents from "git log" traversal.

While I do not think we would terribly mind a new feature in the
core to support third-party additions like "git series" better, I
think this particular one is a big mistake that we shouldn't make.

For those listening from the sidelines, here is a version of my
understanding of "git series":

 * "git series" wants to represent a patch series evolution.  It is
   a history of history, and each element of this evolution is
   represented by:

   - a commit object, that is used to describe what this reroll of
     the topic is about, and its parent links point at previous
     rerolls (it could be a merge of two independent incarnations
     of a series).

   - the tree contained in the commit object records the base
     commit where the topic forks from the main history, and the
     tip commit where the topic ends.  These are pointers into the
     main history DAG.

   - the tree may have other metadata, an example of which is the
     cover letter contents to be used when the topic becomes ready
     for re-submission.  There may be more metadata you would want
     to add in the future versions of "git series".

   Needless to say, the commits that represent the history of a
   series record a tree that is completely differently shaped.  The
   only relation between the series history and main history is
   that the former has pointers into the latter.

 * You chose to represent the base and tip commit object as
   gitlinks in the tree of a series commit, simply because it was a
   way that was already implemented to record a commit object name
   in a tree.

 * However, because gitlink is designed to be used for "external"
   things (the prominent example is submodule), recording these as
   gitlinks guarantees that they will get GCed as the series
   progresses: once the main history is rewound and rewritten, the
   base and tip recorded in the older part of the series history
   become unreachable from the main history.  Because you want to
   make sure that base and tip objects will stay in the repository
   even after the topic branch in the main history gets rewound,
   this is not what you want.

 * In order to work around that reachability issue, the hack you
   invented is to add the tip commit as a "parent" of a commit that
   represents one step in the series.  This may guarantee the
   reachability---as long as a commit in a series history is
   reachable from a ref, the tip and base commits will be reachable
   from there even if they are rebased away from the main history.

But of course, there are downsides.

 * Due to this hack, feeding "gitk" (or "git log") a commit in the
   series history will give you nonsense results.  You are not
   interested in traversing or viewing the commits in the main
   history.

 * Because of the above, you propose another hack to tell the
   revision traversal machinery to optionally omit a parent commit
   that appears as a gitlink in the tree.

I think this is backwards.  The root cause of the issue you have
with "gitk" is that you added something that is *NOT* a parent to
your commit.  We shouldn't have to add a mechanism to filter
something that shouldn't have been added there in the first place.

I am wondering if an alternative approach would work better.

Imagine we invent a new tree entry type, "gitref", that is similar
to "gitlink" in that it can record a commit object name in a tree,
but unlike "gitlink" it does imply reachability.  And you do not add
phony parents to your commit object.  A tree that has "gitref"s in
it is about annotating the commits in the same repository (e.g. the
tree references two commits, "base" and "tip", to point into a slice
of the main history).  And it is perfectly sensible for such a
pointer to imply reachability---after all it serves different
purposes from "gitlink".

Another alternative that I am negative about (but is probably a
better hack than how you abused the "parent" link) might be to add a
new commit object header field that behaves similarly to "parent"
only in that it implies reachability.  But recording the extra
parent in the commit object was not something you wanted to do in
the first place (i.e. your series processing is done solely on the
contents of the tree, and you do not read this extra parent).  If
you need to add an in-tree reference to another commit in your
future versions of "git series", with either this variant or your
original implementation, you would end up needing to add more
"parent" (or pseudo parent) links only to preserve reachability.  At
that point, I think it makes more sense to have entries in the tree
directly ensure reachability, if you want these entries to always
point at an in-tree object.

I am afraid that I probably am two steps ahead of myself, because I
am reasonably sure that it is quite possible that I have overlooked
something trivially obvious that makes the "gitref" approach
unworkable.
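The difference between a "gitlink" entry and the proposed "gitref" entry can be illustrated with a toy object store. The following is only a sketch under stated assumptions: the dictionary layout, the object names, and the `reachable()` helper are invented for illustration, and `MODE_GITREF` is a made-up value (no such mode exists in git). Only the gitlink mode 160000 is git's real one.

```python
# Tree-entry modes. MODE_GITLINK is git's real mode for gitlink
# (submodule) entries; MODE_GITREF is a hypothetical placeholder
# for the proposed entry type and is NOT a real git mode.
MODE_BLOB = 0o100644
MODE_GITLINK = 0o160000
MODE_GITREF = 0o140000  # assumption, for illustration only


def reachable(store, start):
    """Return the set of object names reachable from `start`."""
    seen = set()
    todo = [start]
    while todo:
        name = todo.pop()
        if name in seen or name not in store:
            continue
        seen.add(name)
        obj = store[name]
        if obj["type"] == "commit":
            # A commit reaches its tree and its parents.
            todo.append(obj["tree"])
            todo.extend(obj["parents"])
        elif obj["type"] == "tree":
            for mode, _path, target in obj["entries"]:
                if mode == MODE_GITLINK:
                    continue  # "external" commit: no reachability implied
                # Blobs, subtrees and (hypothetical) gitrefs are followed;
                # a gitref target is a commit, so the walk recurses into it.
                todo.append(target)
    return seen


def make_store(link_mode):
    """Toy repository: a two-commit main history plus one series commit."""
    return {
        "base": {"type": "commit", "tree": "t-main", "parents": []},
        "tip": {"type": "commit", "tree": "t-main", "parents": ["base"]},
        "t-main": {"type": "tree",
                   "entries": [(MODE_BLOB, "README", "b-readme")]},
        "b-readme": {"type": "blob"},
        "cover": {"type": "blob"},
        "t-series": {"type": "tree", "entries": [
            (link_mode, "base", "base"),
            (link_mode, "series", "tip"),
            (MODE_BLOB, "cover", "cover"),
        ]},
        # No phony parents: the series commit only has its tree.
        "series-v1": {"type": "commit", "tree": "t-series", "parents": []},
    }


with_gitlink = reachable(make_store(MODE_GITLINK), "series-v1")
with_gitref = reachable(make_store(MODE_GITREF), "series-v1")
```

In this model, `with_gitlink` contains only the series commit, its tree and the cover letter blob, so "base" and "tip" could be pruned once the main history is rewound. `with_gitref` additionally contains "base", "tip" and everything they reach, without any extra parent on the series commit.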
* Re: Regarding "git log" on "git series" metadata 2016-11-04 17:57 Regarding "git log" on "git series" metadata Junio C Hamano @ 2016-11-04 19:19 ` Jacob Keller 2016-11-04 19:49 ` Jeff King 2016-11-04 20:47 ` Christian Couder 2016-11-04 21:06 ` Josh Triplett 2 siblings, 1 reply; 35+ messages in thread From: Jacob Keller @ 2016-11-04 19:19 UTC (permalink / raw) To: Junio C Hamano; +Cc: Josh Triplett, Git mailing list On Fri, Nov 4, 2016 at 10:57 AM, Junio C Hamano <gitster@pobox.com> wrote: > I think this is backwards. The root cause of the issue you have > with "gitk" is because you added something that is *NOT* a parent to > your commit. We shouldn't have to add a mechanism to filter > something that shouldn't have been added there in the first place. > > I am wondering if an alternative approach would work better. > > Imagine we invent a new tree entry type, "gitref", that is similar > to "gitlink" in that it can record a commit object name in a tree, > but unlike "gitlink" it does imply reachability. And you do not add > phony parents to your commit object. A tree that has "gitref"s in > it is about annotating the commits in the same repository (e.g. the > tree references two commits, "base" and "tip", to point into a slice > of the main history). And it is perfectly sensible for such a > pointer to imply reachability---after all it serves different > purposes from "gitlink". > I agree with your assessment here. The main difficulty in implementing gitrefs is to ensure that they actually do get picked up by reachability checks to prevent dropping commits. I'm not sure how easy this is, but I would much rather we go this route rather than continuing along with the hack. This seems like the ideal solution, since it solves the entire problem and doesn't need more hacks bolted on. It would of course mean some work for people who previously used git series as you would want to re-write the commits to drop the parent links and become gitrefs instead of gitlinks. 
However, this can (probably?) be solved by some sort of use of the filter-branch code. I don't think you've hit upon any trivially obvious unworkable things. It is probably somewhat complex to make the reachability checks detect in-tree gitrefs but I don't think it would be impossible. Thanks, Jake ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Regarding "git log" on "git series" metadata
From: Jeff King @ 2016-11-04 19:49 UTC
To: Jacob Keller; +Cc: Junio C Hamano, Josh Triplett, Git mailing list

On Fri, Nov 04, 2016 at 12:19:55PM -0700, Jacob Keller wrote:

> I agree with your assessment here. The main difficulty in implementing
> gitrefs is to ensure that they actually do get picked up by
> reachability checks to prevent dropping commits. I'm not sure how easy
> this is, but I would much rather we go this route rather than
> continuing along with the hack. This seems like the ideal solution,
> since it solves the entire problem and doesn't need more hacks bolted
> on.

I think the main complication is that the reachability rules are used
during object transfer. So you'd probably want to introduce some
protocol extension to say "I understand gitrefs", so that when one side
says "I have sha1 X and its reachable objects", we know whether they are
including gitrefs there. And likewise receivers with
transfer.fsckObjects may complain about the new gitref tree mode
(fortunately a new object type shouldn't be needed).

You might also want fallback rules for storing gitrefs on "old" servers
(e.g., backfilling gitrefs you need if the server didn't send them in
the initial fetch). But I guess storing any gitrefs on such a server is
inherently dangerous, because the server might prune them at any time.

So perhaps a related question is: how can gitrefs be designed such that
existing servers reject them (rather than accepting the push and then
later throwing away half the data)? It would be easy to notice in the
client during a push that we are sending gitrefs to a server which does
not claim that capability. But it seems more robust if it is the server
who decides "I will not accept these bogus objects".

I haven't thought all that hard about this. Those are just my initial
thoughts on what sounds hard. Tweaking the reachability code doesn't
seem all that bad; we already know all of the spots that care about
S_ISGITLINK(). It may even be that some of those spots work out of the
box (because gitlinks are usually about telling the graph-walking code
that we _don't_ care about reachability; we do by default for trees and
blobs).

I'd be surprised if all such sites work out of the box, though. Even if
they see "ah, sha1 X is referenced by tree Y and isn't a gitlink, and
therefore should be reachable", they need to also note that "X" is a
commit and recursively walk its objects.

-Peff
* Re: Regarding "git log" on "git series" metadata 2016-11-04 19:49 ` Jeff King @ 2016-11-04 21:55 ` Josh Triplett 2016-11-04 23:37 ` Jacob Keller 2016-11-04 23:34 ` Jacob Keller 2016-11-05 4:41 ` Junio C Hamano 2 siblings, 1 reply; 35+ messages in thread From: Josh Triplett @ 2016-11-04 21:55 UTC (permalink / raw) To: Jeff King; +Cc: Jacob Keller, Junio C Hamano, Git mailing list On Fri, Nov 04, 2016 at 03:49:07PM -0400, Jeff King wrote: > On Fri, Nov 04, 2016 at 12:19:55PM -0700, Jacob Keller wrote: > > > I agree with your assessment here. The main difficulty in implementing > > gitrefs is to ensure that they actually do get picked up by > > reachability checks to prevent dropping commits. I'm not sure how easy > > this is, but I would much rather we go this route rather than > > continuing along with the hack. This seems like the ideal solution, > > since it solves the entire problem and doesn't need more hacks bolted > > on. > > I think the main complication is that the reachability rules are used > during object transfer. So you'd probably want to introduce some > protocol extension to say "I understand gitrefs", so that when one side > says "I have sha1 X and its reachable objects", we know whether they are > including gitrefs there. And likewise receivers with > transfer.fsckObjects may complain about the new gitref tree mode > (fortunately a new object type shouldn't be needed). > > You might also want fallback rules for storing gitrefs on "old" servers > (e.g., backfilling gitrefs you need if the server didn't them in the > initial fetch). But I guess storing any gitrefs on such a server is > inherently dangerous, because the server might prune them at any time. > > So perhaps a related question is: how can gitrefs be designed such that > existing servers reject them (rather than accepting the push and then > later throwing away half the data). 
It would be easy to notice in the > client during a push that we are sending gitrefs to a server which does > not claim that capability. But it seems more robust if it is the server > who decides "I will not accept these bogus objects". This seems like the critical problem, here. The parent hack I used in git-series might be a hack, but it transparently works with old servers and clients. So, for instance, I can push a git-series ref to github, with no changes required on github's part. If git added gitrefs, and I started using them in git-series, then that'd eliminate parent hack and allow many standard git tools to work naturally on git-series commits and history, but it'd also mean that people couldn't push git-series commits to any server until that server updates git. That said, I'd *love* to have gitrefs available, for a wide variety of applications, and I can see an argument for introducing them and waiting a few years for them to become universally available, similar to the process gitlinks went through. But I'd also love to have a backward-compatible solution. - Josh Triplett ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Regarding "git log" on "git series" metadata 2016-11-04 21:55 ` Josh Triplett @ 2016-11-04 23:37 ` Jacob Keller 2016-11-04 23:46 ` Josh Triplett 0 siblings, 1 reply; 35+ messages in thread From: Jacob Keller @ 2016-11-04 23:37 UTC (permalink / raw) To: Josh Triplett; +Cc: Jeff King, Junio C Hamano, Git mailing list On Fri, Nov 4, 2016 at 2:55 PM, Josh Triplett <josh@joshtriplett.org> wrote: > That said, I'd *love* to have gitrefs available, for a wide variety of > applications, and I can see an argument for introducing them and waiting > a few years for them to become universally available, similar to the > process gitlinks went through. > > But I'd also love to have a backward-compatible solution. > > - Josh Triplett I think that you won't really find a backwards compatible solution other than something like automatically generating refs for each point of history. I know that gerrit does something like this by storing each version in "refs/changes/id/version" or something along those lines. I think this might actually be cleaner than your parent links hack, and could be used as a fallback for when gitrefs don't work, though you'd have to code exactly how to tell what to push to a repository when pushing a series? Thanks, Jake ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Regarding "git log" on "git series" metadata
From: Josh Triplett @ 2016-11-04 23:46 UTC
To: Jacob Keller; +Cc: Jeff King, Junio C Hamano, Git mailing list

On Fri, Nov 04, 2016 at 04:37:34PM -0700, Jacob Keller wrote:
> On Fri, Nov 4, 2016 at 2:55 PM, Josh Triplett <josh@joshtriplett.org> wrote:
> > That said, I'd *love* to have gitrefs available, for a wide variety of
> > applications, and I can see an argument for introducing them and waiting
> > a few years for them to become universally available, similar to the
> > process gitlinks went through.
> >
> > But I'd also love to have a backward-compatible solution.
> >
> > - Josh Triplett
>
> I think that you won't really find a backwards compatible solution
> other than something like automatically generating refs for each point
> of history. I know that gerrit does something like this by storing
> each version in "refs/changes/id/version" or something along those
> lines. I think this might actually be cleaner than your parent links
> hack, and could be used as a fallback for when gitrefs don't work,
> though you'd have to code exactly how to tell what to push to a
> repository when pushing a series?

I'm not sure what the advantage of that would be, and it would mean that
if you ever push one branch without pushing the other(s), you'd get
severe time-delayed breakage due to pruning. (And if you pushed the
series without the other ref(s), its history would look right but then
you couldn't access the underlying versions of the patch series.)

One of my design goals was to *not* need a special "git series push" or
"git series pull"; you should just be able to use git push and git pull,
and you can set up normal refspecs.

That said, I could fairly easily generate the existing format with
artificial parent refs for backward compatibility, and provide a way to
use the new gitref-based storage format if you know that all your
servers and clients can handle it. I'm also open to other suggestions
for how to make such a transition while still working with every git
server and git client that exists today.
* Re: Regarding "git log" on "git series" metadata 2016-11-04 19:49 ` Jeff King 2016-11-04 21:55 ` Josh Triplett @ 2016-11-04 23:34 ` Jacob Keller 2016-11-05 1:48 ` Jeff King 2016-11-05 4:41 ` Junio C Hamano 2 siblings, 1 reply; 35+ messages in thread From: Jacob Keller @ 2016-11-04 23:34 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, Josh Triplett, Git mailing list On Fri, Nov 4, 2016 at 12:49 PM, Jeff King <peff@peff.net> wrote: > I think the main complication is that the reachability rules are used > during object transfer. So you'd probably want to introduce some > protocol extension to say "I understand gitrefs", so that when one side > says "I have sha1 X and its reachable objects", we know whether they are > including gitrefs there. And likewise receivers with > transfer.fsckObjects may complain about the new gitref tree mode > (fortunately a new object type shouldn't be needed). > > You might also want fallback rules for storing gitrefs on "old" servers > (e.g., backfilling gitrefs you need if the server didn't them in the > initial fetch). But I guess storing any gitrefs on such a server is > inherently dangerous, because the server might prune them at any time. > Is it possible currently for a protocol extension to result in "oh the server doesn't support this so I'm going to stop pushing"? This would be a rather hard transition, but it would at least ensure that pushing to a server which doesn't support gitrefs would fail rather than silently accept objects and then discard them later? I think this is the only real transition unless we can make a change that old servers object to already. > So perhaps a related question is: how can gitrefs be designed such that > existing servers reject them (rather than accepting the push and then > later throwing away half the data). It would be easy to notice in the > client during a push that we are sending gitrefs to a server which does > not claim that capability. 
But it seems more robust if it is the server > who decides "I will not accept these bogus objects". > > I haven't thought all that hard about this. That's just my initial > thoughts on what sound hard. Tweaking the reachability code doesn't seem > all that bad; we already know all of the spots that care about > S_ISGITLINK(). It may even be that some of those spots work out of the > box (because gitlinks are usually about telling the graph-walking code > that we _don't_ care about reachability; we do by default for trees and > blobs). Right. I'm assuming tree objects don't get checked for invalid mode already? If they do, we could just change the mode to something unsupported currently. But... that seems like it might not be the case because it requires checking every tree object coming in? I'm not familiar with what sort of checking already exists... Thoughts? > > I'd be surprised if all such sites work out of the box, though. Even if > they see "ah, sha1 X is referenced by tree Y and isn't a gitlink, and > therefore should be reachable", they need to also note that "X" is a > commit and recursively walk its objects. > They won't all work out of the box, but it shouldn't be much work to do this part. > -Peff ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Regarding "git log" on "git series" metadata 2016-11-04 23:34 ` Jacob Keller @ 2016-11-05 1:48 ` Jeff King 2016-11-05 3:55 ` Josh Triplett 0 siblings, 1 reply; 35+ messages in thread From: Jeff King @ 2016-11-05 1:48 UTC (permalink / raw) To: Jacob Keller; +Cc: Junio C Hamano, Josh Triplett, Git mailing list On Fri, Nov 04, 2016 at 04:34:34PM -0700, Jacob Keller wrote: > > You might also want fallback rules for storing gitrefs on "old" servers > > (e.g., backfilling gitrefs you need if the server didn't them in the > > initial fetch). But I guess storing any gitrefs on such a server is > > inherently dangerous, because the server might prune them at any time. > > Is it possible currently for a protocol extension to result in "oh the > server doesn't support this so I'm going to stop pushing"? Yes, it would be easy for the client to abort if the server fails to advertise a particular extension. What I would worry about more is that "somehow" an older client gets hold of history with a gitref, and then pushes it. It would be nice if even an old server said "nope, I don't understand this and I won't take it" rather than propagating the data to a server that will throw it away. > Right. I'm assuming tree objects don't get checked for invalid mode > already? If they do, we could just change the mode to something > unsupported currently. But... that seems like it might not be the case > because it requires checking every tree object coming in? > > I'm not familiar with what sort of checking already exists... Thoughts? If the server sets receive.fsckObjects, then fsck_tree() runs and will reject any non-standard mode. That option is not the default, though some big hosters set it (GitHub does, but I am actually not that worried about GitHub; if gitrefs support materialized I would probably ship it there fairly promptly). -Peff ^ permalink raw reply [flat|nested] 35+ messages in thread
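The point about receive.fsckObjects can be pictured with a small mode check. This is a minimal sketch, not a transcription of git's fsck_tree(): the set below lists the tree-entry modes git normally writes, the real checker has more nuance, and `HYPOTHETICAL_GITREF`, `bad_entries()` and `receive_tree()` are names invented here for illustration.

```python
# Tree-entry modes git normally writes: subtree, regular file,
# executable file, symlink, gitlink. (The real fsck code is more
# nuanced; this set is a simplification.)
KNOWN_MODES = {0o040000, 0o100644, 0o100755, 0o120000, 0o160000}

# Placeholder for the proposed gitref mode; NOT a real git mode.
HYPOTHETICAL_GITREF = 0o140000


def bad_entries(entries, known=KNOWN_MODES):
    """Return the (mode, path) pairs whose mode is not recognised."""
    return [(mode, path) for mode, path in entries if mode not in known]


def receive_tree(entries, fsck_objects):
    """Mimic a receiver: with checking enabled, refuse unknown modes."""
    if fsck_objects and bad_entries(entries):
        return "rejected"
    return "accepted"


series_tree = [
    (HYPOTHETICAL_GITREF, "base"),
    (HYPOTHETICAL_GITREF, "series"),
    (0o100644, "cover"),
]

strict_result = receive_tree(series_tree, fsck_objects=True)
lax_result = receive_tree(series_tree, fsck_objects=False)
```

Under these assumptions a checking receiver refuses the tree up front, while a non-checking one accepts it and may later prune the commits the entries point at, which is the silent-loss case discussed above.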
* Re: Regarding "git log" on "git series" metadata 2016-11-05 1:48 ` Jeff King @ 2016-11-05 3:55 ` Josh Triplett 2016-11-05 4:41 ` Jeff King 0 siblings, 1 reply; 35+ messages in thread From: Josh Triplett @ 2016-11-05 3:55 UTC (permalink / raw) To: Jeff King, Jacob Keller; +Cc: Junio C Hamano, Git mailing list On November 4, 2016 7:48:17 PM MDT, Jeff King <peff@peff.net> wrote: >On Fri, Nov 04, 2016 at 04:34:34PM -0700, Jacob Keller wrote: > >> > You might also want fallback rules for storing gitrefs on "old" >servers >> > (e.g., backfilling gitrefs you need if the server didn't them in >the >> > initial fetch). But I guess storing any gitrefs on such a server is >> > inherently dangerous, because the server might prune them at any >time. >> >> Is it possible currently for a protocol extension to result in "oh >the >> server doesn't support this so I'm going to stop pushing"? > >Yes, it would be easy for the client to abort if the server fails to >advertise a particular extension. And the reverse (old client, new server) should work as well? - Josh Triplett ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Regarding "git log" on "git series" metadata 2016-11-05 3:55 ` Josh Triplett @ 2016-11-05 4:41 ` Jeff King 0 siblings, 0 replies; 35+ messages in thread From: Jeff King @ 2016-11-05 4:41 UTC (permalink / raw) To: Josh Triplett; +Cc: Jacob Keller, Junio C Hamano, Git mailing list On Fri, Nov 04, 2016 at 09:55:23PM -0600, Josh Triplett wrote: > >> Is it possible currently for a protocol extension to result in "oh > >the > >> server doesn't support this so I'm going to stop pushing"? > > > >Yes, it would be easy for the client to abort if the server fails to > >advertise a particular extension. > > And the reverse (old client, new server) should work as well? Yes. The server says "I know about gitrefs" and if the client does not respond with "I also know about gitrefs", then the server can act appropriately (e.g., for a fetch, bail if the fetched content includes gitrefs). -Peff ^ permalink raw reply [flat|nested] 35+ messages in thread
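The two directions of the capability check described above could look roughly like this. It is a sketch only: the capability name "gitref" and both helper functions are hypothetical, and the other capability strings merely stand in for whatever a peer already advertises.

```python
# Hypothetical capability name; nothing like it exists in git today.
GITREF_CAP = "gitref"


def client_may_push(server_caps, pack_has_gitrefs):
    """Client side: abort the push if gitrefs would go to an old server."""
    return not pack_has_gitrefs or GITREF_CAP in server_caps


def server_may_send(client_caps, content_has_gitrefs):
    """Server side: bail on a fetch if an old client asks for gitref content."""
    return not content_has_gitrefs or GITREF_CAP in client_caps


# Example peers: the "old" one advertises only pre-existing capabilities.
old_peer = {"report-status", "ofs-delta"}
new_peer = old_peer | {GITREF_CAP}

push_to_old = client_may_push(old_peer, pack_has_gitrefs=True)
push_to_new = client_may_push(new_peer, pack_has_gitrefs=True)
fetch_by_old = server_may_send(old_peer, content_has_gitrefs=True)
plain_fetch_by_old = server_may_send(old_peer, content_has_gitrefs=False)
```

Note that this only covers peers that know to look for the capability; an old client pushing gitref-bearing history to an old server is the case that still needs the receiver to reject the objects itself.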
* Re: Regarding "git log" on "git series" metadata
From: Junio C Hamano @ 2016-11-05 4:41 UTC
To: Jeff King; +Cc: Jacob Keller, Josh Triplett, Git mailing list

Jeff King <peff@peff.net> writes:

> On Fri, Nov 04, 2016 at 12:19:55PM -0700, Jacob Keller wrote:
>
>> I agree with your assessment here. The main difficulty in implementing
>> gitrefs is to ensure that they actually do get picked up by
>> reachability checks to prevent dropping commits. I'm not sure how easy
>> this is, but I would much rather we go this route rather than
>> continuing along with the hack. This seems like the ideal solution,
>> since it solves the entire problem and doesn't need more hacks bolted
>> on.
>
> I think the main complication is that the reachability rules are used
> during object transfer. So you'd probably want to introduce some
> protocol extension to say "I understand gitrefs", so that when one side
> says "I have sha1 X and its reachable objects", we know whether they are
> including gitrefs there. And likewise receivers with
> transfer.fsckObjects may complain about the new gitref tree mode
> (fortunately a new object type shouldn't be needed).

Quite honestly I do not think backward compatibility here matters.
When gitlinks were introduced, a repository that was created with a
gitlink-capable version of Git would have failed "git fsck" from a
version that is not gitlink aware, and I think this new "link with
reachability" is the same deal.  No existing implementation
understands a tree entry whose mode bits are 140000 or whatever new
bit pattern we would assign to this thing.  You have to wait until
both ends understand the new thing, and that is perfectly OK.

Besides, I think the point of having this discussion is that Josh
did good prototyping work on "git series" to discover what he can
do in that area of "keeping track of history of history" and what
operations are useful, without wasting time on mucking with the
object model and traversal machinery that are available to him.  Now
his prototyping is at the point where he knows at least one
enhancement to the core that would help him to redo the prototype in
the right way.  And I do not mind helping him from the core side.
* Re: Regarding "git log" on "git series" metadata 2016-11-05 4:41 ` Junio C Hamano @ 2016-11-05 4:44 ` Jeff King 2016-11-05 5:00 ` Junio C Hamano 1 sibling, 0 replies; 35+ messages in thread From: Jeff King @ 2016-11-05 4:44 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jacob Keller, Josh Triplett, Git mailing list On Fri, Nov 04, 2016 at 09:41:06PM -0700, Junio C Hamano wrote: > > I think the main complication is that the reachability rules are used > > during object transfer. So you'd probably want to introduce some > > protocol extension to say "I understand gitrefs", so that when one side > > says "I have sha1 X and its reachable objects", we know whether they are > > including gitrefs there. And likewise receivers with > > transfer.fsckObjects may complain about the new gitref tree mode > > (fortunately a new object type shouldn't be needed). > > Quite honestly I do not think backward compatibility here matters. > When gitlinks were introduced, a repository that was created with > gitlink capable version of Git would have failed "git fsck" that is > not gitlink aware, and I think this new "link with reachability" is > the same deal. No existing implemention understands a tree entry > whose mode bits are 140000 or whatever new bit pattern we would > assign to this thing. You have to wait until both ends understand > the new thing, and that is perfectly OK. I'm OK with saying "if you use the gitref feature, you cannot push or pull those objects with remotes that do not understand it". But unlike gitlink, if we fail to notice the situation, we run into a case where we might silently lose objects, which is bad. So I think we need to be a bit more careful. I don't think the problems are insurmountable. I just think that's where the real complexity is, not in the changes to teach a single git about gitrefs. I'm happy to stand back and let you or Josh figure out all the corner cases. :) -Peff ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Regarding "git log" on "git series" metadata
From: Junio C Hamano @ 2016-11-05 5:00 UTC
To: Jeff King; +Cc: Jacob Keller, Josh Triplett, Git mailing list

Junio C Hamano <gitster@pobox.com> writes:

>> I think the main complication is that the reachability rules are used
>> during object transfer.

One should not type after spending 20+ waking hours on planes and in
airports.  I missed it when I wrote my first response, but yes, the
reachability that originates from inside a tree object indeed is a
problem, as we do not dig into trees while doing the want/have/ack
exchange.
* Re: Regarding "git log" on "git series" metadata 2016-11-04 17:57 Regarding "git log" on "git series" metadata Junio C Hamano 2016-11-04 19:19 ` Jacob Keller @ 2016-11-04 20:47 ` Christian Couder 2016-11-04 21:19 ` Josh Triplett 2016-11-05 4:42 ` Junio C Hamano 2016-11-04 21:06 ` Josh Triplett 2 siblings, 2 replies; 35+ messages in thread From: Christian Couder @ 2016-11-04 20:47 UTC (permalink / raw) To: Junio C Hamano; +Cc: Josh Triplett, git, Shawn O. Pierce, Jeff King On Fri, Nov 4, 2016 at 6:57 PM, Junio C Hamano <gitster@pobox.com> wrote: > > Imagine we invent a new tree entry type, "gitref", that is similar > to "gitlink" in that it can record a commit object name in a tree, > but unlike "gitlink" it does imply reachability. And you do not add > phony parents to your commit object. A tree that has "gitref"s in > it is about annotating the commits in the same repository (e.g. the > tree references two commits, "base" and "tip", to point into a slice > of the main history). And it is perfectly sensible for such a > pointer to imply reachability---after all it serves different > purposes from "gitlink". The more I think about this (and also about how to limit ref advertisements as recently discussed in https://public-inbox.org/git/20161024132932.i42rqn2vlpocqmkq@sigill.intra.peff.net/), the more I think about Shawn's RefTree: https://public-inbox.org/git/CAJo=hJvnAPNAdDcAAwAvU9C4RVeQdoS3Ev9WTguHx4fD0V_nOg@mail.gmail.com/ Couldn't a RefTree be used to store refs that point to the base commit, the tip commit and the blob that contains the cover letter, and maybe also a ref pointing to the RefTree of the previous version of the series? ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Regarding "git log" on "git series" metadata 2016-11-04 20:47 ` Christian Couder @ 2016-11-04 21:19 ` Josh Triplett 2016-11-04 23:04 ` Christian Couder 2016-11-05 21:56 ` Christian Couder 2016-11-05 4:42 ` Junio C Hamano 1 sibling, 2 replies; 35+ messages in thread From: Josh Triplett @ 2016-11-04 21:19 UTC (permalink / raw) To: Christian Couder; +Cc: Junio C Hamano, git, Shawn O. Pierce, Jeff King On Fri, Nov 04, 2016 at 09:47:41PM +0100, Christian Couder wrote: > On Fri, Nov 4, 2016 at 6:57 PM, Junio C Hamano <gitster@pobox.com> wrote: > > > > Imagine we invent a new tree entry type, "gitref", that is similar > > to "gitlink" in that it can record a commit object name in a tree, > > but unlike "gitlink" it does imply reachability. And you do not add > > phony parents to your commit object. A tree that has "gitref"s in > > it is about annotating the commits in the same repository (e.g. the > > tree references two commits, "base" and "tip", to point into a slice > > of the main history). And it is perfectly sensible for such a > > pointer to imply reachability---after all it serves different > > purposes from "gitlink". > > The more I think about this (and also about how to limit ref > advertisements as recently discussed in > https://public-inbox.org/git/20161024132932.i42rqn2vlpocqmkq@sigill.intra.peff.net/), > the more I think about Shawn's RefTree: > > https://public-inbox.org/git/CAJo=hJvnAPNAdDcAAwAvU9C4RVeQdoS3Ev9WTguHx4fD0V_nOg@mail.gmail.com/ > > Couldn't a RefTree be used to store refs that point to the base > commit, the tip commit and the blob that contains the cover letter, > and maybe also a ref pointing to the RefTree of the previous version > of the series? That's really interesting! The Software Heritage project is working on something similar, because they want to store all the refs as part of their data model as well. I'll point them to the reftree work. If upstream git supported RefTree, I could potentially use that for git-series. 
However, I do want a commit message and history for the series itself, and using refs in the reftree to refer to the parents seems like abusing reftree to recreate commits, in a reversal of the hack of using commit parents as a reftree. :) What if, rather than storing a hash reference to a reftree as a single reference and replacing it with no history, a reftree could be referenced from a commit and have history? (That would also allow tagging a version of the reftree.) ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Regarding "git log" on "git series" metadata 2016-11-04 21:19 ` Josh Triplett @ 2016-11-04 23:04 ` Christian Couder 2016-11-13 17:50 ` Stefano Zacchiroli 2016-11-05 21:56 ` Christian Couder 1 sibling, 1 reply; 35+ messages in thread From: Christian Couder @ 2016-11-04 23:04 UTC (permalink / raw) To: Josh Triplett Cc: Junio C Hamano, git, Shawn O. Pierce, Jeff King, Stefano Zacchiroli On Fri, Nov 4, 2016 at 10:19 PM, Josh Triplett <josh@joshtriplett.org> wrote: > On Fri, Nov 04, 2016 at 09:47:41PM +0100, Christian Couder wrote: >> On Fri, Nov 4, 2016 at 6:57 PM, Junio C Hamano <gitster@pobox.com> wrote: >> > >> > Imagine we invent a new tree entry type, "gitref", that is similar >> > to "gitlink" in that it can record a commit object name in a tree, >> > but unlike "gitlink" it does imply reachability. And you do not add >> > phony parents to your commit object. A tree that has "gitref"s in >> > it is about annotating the commits in the same repository (e.g. the >> > tree references two commits, "base" and "tip", to point into a slice >> > of the main history). And it is perfectly sensible for such a >> > pointer to imply reachability---after all it serves different >> > purposes from "gitlink". >> >> The more I think about this (and also about how to limit ref >> advertisements as recently discussed in >> https://public-inbox.org/git/20161024132932.i42rqn2vlpocqmkq@sigill.intra.peff.net/), >> the more I think about Shawn's RefTree: >> >> https://public-inbox.org/git/CAJo=hJvnAPNAdDcAAwAvU9C4RVeQdoS3Ev9WTguHx4fD0V_nOg@mail.gmail.com/ >> >> Couldn't a RefTree be used to store refs that point to the base >> commit, the tip commit and the blob that contains the cover letter, >> and maybe also a ref pointing to the RefTree of the previous version >> of the series? > > That's really interesting! The Software Heritage project is working on > something similar, because they want to store all the refs as part of > their data model as well. 
I'll point them to the reftree work. Yeah, I know them :-) and I think I have already told Stefano Zacchiroli about this, but I am not sure anymore. Anyway I am CC'ing him. > If upstream git supported RefTree, I could potentially use that for > git-series. However, I do want a commit message and history for the > series itself, and using refs in the reftree to refer to the parents > seems like abusing reftree to recreate commits, in a reversal of the > hack of using commit parents as a reftree. :) Yeah, maybe :-) But the properties of the existing Git objects we already use wouldn't change at all. > What if, rather than storing a hash reference to a reftree as a single > reference and replacing it with no history, In what I suggest the history is kept because the new reftree has a ref that points to the old one it is replacing. Yeah, this reftree history may be seen as "redundant" with the commit history, but in my opinion this can be seen as a "feature" that will prevent us from "mucking" too much with the commit object. > a reftree could be > referenced from a commit and have history? (That would also allow > tagging a version of the reftree.) I think that tags are already allowed to point to any kind of Git object, so tagging a reftree should be allowed anyway if we add a reftree object. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Regarding "git log" on "git series" metadata 2016-11-04 23:04 ` Christian Couder @ 2016-11-13 17:50 ` Stefano Zacchiroli 0 siblings, 0 replies; 35+ messages in thread From: Stefano Zacchiroli @ 2016-11-13 17:50 UTC (permalink / raw) To: Christian Couder Cc: Josh Triplett, Junio C Hamano, git, Shawn O. Pierce, Jeff King Hi everyone, On Sat, Nov 05, 2016 at 12:04:08AM +0100, Christian Couder wrote: > On Fri, Nov 4, 2016 at 10:19 PM, Josh Triplett <josh@joshtriplett.org> wrote: > > On Fri, Nov 04, 2016 at 09:47:41PM +0100, Christian Couder wrote: > >> > >> Couldn't a RefTree be used to store refs that point to the base > >> commit, the tip commit and the blob that contains the cover letter, > >> and maybe also a ref pointing to the RefTree of the previous version > >> of the series? > > > > That's really interesting! The Software Heritage project is working on > > something similar, because they want to store all the refs as part of > > their data model as well. I'll point them to the reftree work. > > Yeah, I know them :-) and I think I have already told Stefano > Zacchiroli about this, but I am not sure anymore. > Anyway I am CC'ing him. Thanks Christian (and Josh, on swh-devel) for pointing me to this. As a bit of background, the conceptual data model we have adopted for Software Heritage [1] is indeed that of a global Merkle DAG, very much inspired by Git, but where we deduplicate past the boundaries of individual VCS repositories. This way we can store only once the same software artifacts (blobs, trees, commits, etc.) even when they can be found at different software origins [2] (be it due to GitHub-like forks, projects moving around, or simply rogue copies of the same code scattered around the Internet). 
[1]: https://www.softwareheritage.org/ [2]: "software origin" is Software Heritage terminology, which just stands for places on the Internet where we can find source code In our original design the topmost entries in our Merkle hierarchy used to be commits and tags, similar to what Git does. But then we realized that doing so inhibited us from sharing entire repository states across multiple software origins or multiple visits of the same software origin. So we decided to add "repository snapshot objects" as our topmost entries, which are essentially git-like objects that map refs to the ID of the corresponding (typed-)objects. Rationale and a more lengthy description of this is available on our wiki [3]. It is not implemented yet, but we're pretty sold on the design at this point. [3]: https://wiki.softwareheritage.org/index.php?title=Repository_snapshot_objects Now, even if my only awareness of what's going on in Git upstream is limited to sporadic chats with Josh and Christian :-), it seems to me that various ideas in the Git ecosystem go in the same direction of our snapshot objects (git-series, RefTree). Which is understandable, given a number of use cases might be served by this. I don't think we have much to contribute to discussion or implementation here, and for our needs it doesn't really matter which one gets implemented. That's because we need an implementation of the concept which is *external* to Git anyhow. But even if it happens to exist within actual VCS, it's not a big deal for us, as we do have ways to distinguish "synthetic" objects in the DAG that we create for our own needs from "real" objects coming from actual software origins. (Another example of this concept we already have is when we inject distribution source packages or tarballs in our archive. In that case we create synthetic commits that point to the tree extracted from the tarball/package, preserving the ability to distinguish them from real commits coming from VCS out there.)
If you think we can help in any other way, other than sharing our experiences and design considerations that is, please let me know! (I'm not subscribed to the Git upstream mailing list, but feel free to Cc:-me in conversations related to this topic.) Cheers. -- Stefano Zacchiroli . zack@upsilon.cc . upsilon.cc/zack . . o . . . o . o Computer Science Professor . CTO Software Heritage . . . . . o . . . o o Former Debian Project Leader . OSI Board Director . . . o o o . . . o . « the first rule of tautology club is the first rule of tautology club » ^ permalink raw reply [flat|nested] 35+ messages in thread
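The "repository snapshot object" idea described in the message above (an object that maps refs to the IDs of typed objects and is itself content-addressed) can be illustrated with a short sketch. This is only an illustration under stated assumptions: the serialization, the "snapshot" header, and the function name snapshot_id are hypothetical and are not Software Heritage's or Git's actual format.

```python
import hashlib


def snapshot_id(refs):
    """Return a content hash for a mapping of ref name -> (type, object id).

    Hypothetical layout: one "<type> <id>\t<refname>" line per ref, sorted by
    ref name, prefixed with a git-style "<kind> <length>\\0" header.
    """
    lines = []
    for name in sorted(refs):  # sorting makes the ID independent of dict order
        obj_type, obj_id = refs[name]
        lines.append(f"{obj_type} {obj_id}\t{name}\n")
    payload = "".join(lines).encode("utf-8")
    header = f"snapshot {len(payload)}\0".encode("utf-8")
    return hashlib.sha1(header + payload).hexdigest()
```

Because the ID depends only on the set of (ref, type, target) triples, two visits that find identical refs would produce the same snapshot ID and could be stored once, which is the deduplication property the message is after.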
* Re: Regarding "git log" on "git series" metadata 2016-11-04 21:19 ` Josh Triplett 2016-11-04 23:04 ` Christian Couder @ 2016-11-05 21:56 ` Christian Couder 1 sibling, 0 replies; 35+ messages in thread From: Christian Couder @ 2016-11-05 21:56 UTC (permalink / raw) To: Josh Triplett; +Cc: Junio C Hamano, git, Shawn O. Pierce, Jeff King On Fri, Nov 4, 2016 at 10:19 PM, Josh Triplett <josh@joshtriplett.org> wrote: > On Fri, Nov 04, 2016 at 09:47:41PM +0100, Christian Couder wrote: >> On Fri, Nov 4, 2016 at 6:57 PM, Junio C Hamano <gitster@pobox.com> wrote: >> > >> > Imagine we invent a new tree entry type, "gitref", that is similar >> > to "gitlink" in that it can record a commit object name in a tree, >> > but unlike "gitlink" it does imply reachability. And you do not add >> > phony parents to your commit object. A tree that has "gitref"s in >> > it is about annotating the commits in the same repository (e.g. the >> > tree references two commits, "base" and "tip", to point into a slice >> > of the main history). And it is perfectly sensible for such a >> > pointer to imply reachability---after all it serves different >> > purposes from "gitlink". >> >> The more I think about this (and also about how to limit ref >> advertisements as recently discussed in >> https://public-inbox.org/git/20161024132932.i42rqn2vlpocqmkq@sigill.intra.peff.net/), >> the more I think about Shawn's RefTree: >> >> https://public-inbox.org/git/CAJo=hJvnAPNAdDcAAwAvU9C4RVeQdoS3Ev9WTguHx4fD0V_nOg@mail.gmail.com/ Just to make things clear, after reading the above link that I posted :-) ... >> Couldn't a RefTree be used to store refs that point to the base >> commit, the tip commit and the blob that contains the cover letter, >> and maybe also a ref pointing to the RefTree of the previous version >> of the series? > > That's really interesting! The Software Heritage project is working on > something similar, because they want to store all the refs as part of > their data model as well. 
I'll point them to the reftree work. > > If upstream git supported RefTree, I could potentially use that for > git-series. However, I do want a commit message and history for the > series itself, and using refs in the reftree to refer to the parents > seems like abusing reftree to recreate commits, in a reversal of the > hack of using commit parents as a reftree. :) > > What if, rather than storing a hash reference to a reftree as a single > reference and replacing it with no history, a reftree could be > referenced from a commit and have history? (That would also allow > tagging a version of the reftree.) ... I think that indeed that's what Shawn's reftree proposal is about, so I agree that it makes sense. We just need to find a good way to specify object reachability. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Regarding "git log" on "git series" metadata 2016-11-04 20:47 ` Christian Couder 2016-11-04 21:19 ` Josh Triplett @ 2016-11-05 4:42 ` Junio C Hamano 2016-11-05 12:17 ` Christian Couder 1 sibling, 1 reply; 35+ messages in thread From: Junio C Hamano @ 2016-11-05 4:42 UTC (permalink / raw) To: Christian Couder; +Cc: Josh Triplett, git, Shawn O. Pierce, Jeff King Christian Couder <christian.couder@gmail.com> writes: > Couldn't a RefTree be used to store refs that point to the base > commit, I think it is the other way around. With the new "gitref" thing that is a pointer to an in-repository commit, RefTree can be naturally implemented. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Regarding "git log" on "git series" metadata 2016-11-05 4:42 ` Junio C Hamano @ 2016-11-05 12:17 ` Christian Couder 2016-11-05 12:45 ` Christian Couder 0 siblings, 1 reply; 35+ messages in thread From: Christian Couder @ 2016-11-05 12:17 UTC (permalink / raw) To: Junio C Hamano; +Cc: Josh Triplett, git, Shawn O. Pierce, Jeff King On Sat, Nov 5, 2016 at 5:42 AM, Junio C Hamano <gitster@pobox.com> wrote: > Christian Couder <christian.couder@gmail.com> writes: > >> Couldn't a RefTree be used to store refs that point to the base >> commit, > > I think it is the other way around. With the new "gitref" thing > that is a pointer to an in-repository commit, RefTree can be > naturally implemented. Yeah, I should have read Shawn's RefTree email thread again before posting and especially before replying to Josh. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Regarding "git log" on "git series" metadata 2016-11-05 12:17 ` Christian Couder @ 2016-11-05 12:45 ` Christian Couder 2016-11-05 15:18 ` Josh Triplett 0 siblings, 1 reply; 35+ messages in thread From: Christian Couder @ 2016-11-05 12:45 UTC (permalink / raw) To: Junio C Hamano Cc: Josh Triplett, git, Shawn O. Pierce, Jeff King, Nguyen Thai Ngoc Duy On Sat, Nov 5, 2016 at 1:17 PM, Christian Couder <christian.couder@gmail.com> wrote: > On Sat, Nov 5, 2016 at 5:42 AM, Junio C Hamano <gitster@pobox.com> wrote: >> Christian Couder <christian.couder@gmail.com> writes: >> >>> Couldn't a RefTree be used to store refs that point to the base >>> commit, >> >> I think it is the other way around. With the new "gitref" thing >> that is a pointer to an in-repository commit, RefTree can be >> naturally implemented. > > Yeah, I should have read Shawn's RefTree email thread again before > posting and especially before replying to Josh. By the way, reading the following email by Peff where gitlink reachability was already discussed: https://public-inbox.org/git/20151217221045.GA8150@sigill.intra.peff.net/ and where Peff wrote: > Of course, the lack of reachability has advantages, too. You can > drop commits pointed to by old reflogs without rewriting the ref > history. Unfortunately you cannot expunge the reflogs at all. That's > good if you like audit trails. Bad if you are worried that your reflogs > will grow large. :) I think that we may not need "gitref" at all. We perhaps could just have more ways to configure and tweak how a repo manages commit reachability related to gitlinks. With shallow clones we already need ways to configure and tweak commit reachability anyway. And with what Peff says above it looks like we will need ways to configure and tweak commit reachability with gitlink/gitref anyway. So the point of gitref compared to gitlink would be that they just have a different reachability by default. 
But couldn't that be replaced by a default rule saying that when a gitlink is reached "this way or that way" then the commit reachability should be enforced, and otherwise it should not be? ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Regarding "git log" on "git series" metadata 2016-11-05 12:45 ` Christian Couder @ 2016-11-05 15:18 ` Josh Triplett 2016-11-05 20:21 ` Christian Couder 0 siblings, 1 reply; 35+ messages in thread From: Josh Triplett @ 2016-11-05 15:18 UTC (permalink / raw) To: Christian Couder Cc: Junio C Hamano, git, Shawn O. Pierce, Jeff King, Nguyen Thai Ngoc Duy On Sat, Nov 05, 2016 at 01:45:27PM +0100, Christian Couder wrote: > And with what Peff says above it looks like we will need ways > configure and tweak commit reachability with gitlink/gitref anyway. So > the point of gitref compared to gitlink would be that they just have a > different reachability by default. But couldn't that be replaced by a > default rule saying that when a gitlink is reached "this way or that > way" then the commit reachability should be enforced, and otherwise it > should not be? Any version of git unaware of that rule, though, would consider objects only reachable by gitlink as unreachable and delete them, causing data loss. Likewise for a server not aware of that rule. And a server unaware of that rule would not supply those objects to a client pulling such a branch. So I don't think "gitlink defined as reachable" quite works, unless we make some other format-incompatible change that forces clients and servers touching that gitlink to know about that reachability rule. (In the absence of a hack such as making the same commit a parent.) ^ permalink raw reply [flat|nested] 35+ messages in thread
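The data-loss concern in the message above can be made concrete with a toy mark-and-prune pass. This is a minimal sketch over an in-memory object table, not git's actual gc or pack code: the object IDs ("S", "T", "X", ...), the dictionary layout, and the follow_gitlinks switch are all made up for illustration. Only the gitlink mode value 160000 is real.

```python
GITLINK = 0o160000  # tree entry mode git uses for gitlinks (submodule entries)
BLOB = 0o100644


def mark(objects, roots, follow_gitlinks=False):
    """Return the set of object ids reachable from roots."""
    seen = set()
    stack = list(roots)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        obj = objects[oid]
        if obj["type"] == "commit":
            stack.append(obj["tree"])
            stack.extend(obj["parents"])
        elif obj["type"] == "tree":
            for mode, _name, target in obj["entries"]:
                # An implementation unaware of the new rule never follows gitlinks.
                if mode == GITLINK and not follow_gitlinks:
                    continue
                stack.append(target)
    return seen


def prune(objects, roots, follow_gitlinks=False):
    """Drop every object the mark phase did not reach."""
    keep = mark(objects, roots, follow_gitlinks)
    return {oid: obj for oid, obj in objects.items() if oid in keep}


# A series commit S whose tree records base (B) and tip (X) as gitlinks.
# X is no longer reachable from any branch, only from the series tree.
EXAMPLE = {
    "S": {"type": "commit", "tree": "T", "parents": []},
    "T": {"type": "tree", "entries": [(GITLINK, "base", "B"),
                                      (GITLINK, "tip", "X"),
                                      (BLOB, "cover", "C")]},
    "C": {"type": "blob"},
    "B": {"type": "commit", "tree": "BT", "parents": []},
    "BT": {"type": "tree", "entries": []},
    "X": {"type": "commit", "tree": "XT", "parents": ["B"]},
    "XT": {"type": "tree", "entries": []},
}
```

With roots ["S"], an implementation that knows the hypothetical "gitlink implies reachability" rule keeps all seven objects, while one that does not keeps only S, T and C and silently prunes the base and tip commits.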
* Re: Regarding "git log" on "git series" metadata 2016-11-05 15:18 ` Josh Triplett @ 2016-11-05 20:21 ` Christian Couder 2016-11-05 20:25 ` Josh Triplett 0 siblings, 1 reply; 35+ messages in thread From: Christian Couder @ 2016-11-05 20:21 UTC (permalink / raw) To: Josh Triplett Cc: Junio C Hamano, git, Shawn O. Pierce, Jeff King, Nguyen Thai Ngoc Duy, Mike Hommey On Sat, Nov 5, 2016 at 4:18 PM, Josh Triplett <josh@joshtriplett.org> wrote: > On Sat, Nov 05, 2016 at 01:45:27PM +0100, Christian Couder wrote: >> And with what Peff says above it looks like we will need ways >> configure and tweak commit reachability with gitlink/gitref anyway. So >> the point of gitref compared to gitlink would be that they just have a >> different reachability by default. But couldn't that be replaced by a >> default rule saying that when a gitlink is reached "this way or that >> way" then the commit reachability should be enforced, and otherwise it >> should not be? > > Any version of git unaware of that rule, though, would consider objects > only reachable by gitlink as unreachable and delete them, causing data > loss. Likewise for a server not aware of that rule. And a server > unaware of that rule would not supply those objects to a client pulling > such a branch. Yeah, so you would really need an up-to-date server and client to store the git-series data. But anyway if we create a gitref object, you would also need up-to-date servers and clients to make it work. > So I don't think "gitlink defined as reachable" quite works, unless we > make some other format-incompatible change that forces clients and > servers touching that gitlink to know about that reachability rule. (In > the absence of a hack such as making the same commit a parent.) There are other tools that would like to tweak reachability rules for objects. 
For example Mike Hommey's git-cinnabar: https://public-inbox.org/git/20150331100756.GA13377@glandium.org/ and my current work on external object databases: https://public-inbox.org/git/20160628181933.24620-1-chriscool@tuxfamily.org/ would be interested in a way to make some blobs not reachable. So if we had default rules and a generic way to specify that some objects are, or are not, reachable, that could be used by many tools, and the design of these tools would be simplified. Maybe this could be specified in the attributes files, or in a special file like for shallow clone. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Regarding "git log" on "git series" metadata 2016-11-05 20:21 ` Christian Couder @ 2016-11-05 20:25 ` Josh Triplett 2016-11-06 4:50 ` Jacob Keller 0 siblings, 1 reply; 35+ messages in thread From: Josh Triplett @ 2016-11-05 20:25 UTC (permalink / raw) To: Christian Couder Cc: Junio C Hamano, git, Shawn O. Pierce, Jeff King, Nguyen Thai Ngoc Duy, Mike Hommey On Sat, Nov 05, 2016 at 09:21:58PM +0100, Christian Couder wrote: > On Sat, Nov 5, 2016 at 4:18 PM, Josh Triplett <josh@joshtriplett.org> wrote: > > On Sat, Nov 05, 2016 at 01:45:27PM +0100, Christian Couder wrote: > >> And with what Peff says above it looks like we will need ways > >> configure and tweak commit reachability with gitlink/gitref anyway. So > >> the point of gitref compared to gitlink would be that they just have a > >> different reachability by default. But couldn't that be replaced by a > >> default rule saying that when a gitlink is reached "this way or that > >> way" then the commit reachability should be enforced, and otherwise it > >> should not be? > > > > Any version of git unaware of that rule, though, would consider objects > > only reachable by gitlink as unreachable and delete them, causing data > > loss. Likewise for a server not aware of that rule. And a server > > unaware of that rule would not supply those objects to a client pulling > > such a branch. > > Yeah, so you would really need an up-to-date server and client to > store the git-series data. > But anyway if we create a gitref object, you would also need > up-to-date servers and clients to make it work. Agreed, but gitrefs have the advantage of failing safe, rather than failing with dataloss. - Josh Triplett ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Regarding "git log" on "git series" metadata 2016-11-05 20:25 ` Josh Triplett @ 2016-11-06 4:50 ` Jacob Keller 2016-11-06 16:34 ` Josh Triplett 0 siblings, 1 reply; 35+ messages in thread From: Jacob Keller @ 2016-11-06 4:50 UTC (permalink / raw) To: Josh Triplett Cc: Christian Couder, Junio C Hamano, git, Shawn O. Pierce, Jeff King, Nguyen Thai Ngoc Duy, Mike Hommey On Sat, Nov 5, 2016 at 1:25 PM, Josh Triplett <josh@joshtriplett.org> wrote: > On Sat, Nov 05, 2016 at 09:21:58PM +0100, Christian Couder wrote: >> On Sat, Nov 5, 2016 at 4:18 PM, Josh Triplett <josh@joshtriplett.org> wrote: >> > On Sat, Nov 05, 2016 at 01:45:27PM +0100, Christian Couder wrote: >> >> And with what Peff says above it looks like we will need ways >> >> configure and tweak commit reachability with gitlink/gitref anyway. So >> >> the point of gitref compared to gitlink would be that they just have a >> >> different reachability by default. But couldn't that be replaced by a >> >> default rule saying that when a gitlink is reached "this way or that >> >> way" then the commit reachability should be enforced, and otherwise it >> >> should not be? >> > >> > Any version of git unaware of that rule, though, would consider objects >> > only reachable by gitlink as unreachable and delete them, causing data >> > loss. Likewise for a server not aware of that rule. And a server >> > unaware of that rule would not supply those objects to a client pulling >> > such a branch. >> >> Yeah, so you would really need an up-to-date server and client to >> store the git-series data. >> But anyway if we create a gitref object, you would also need >> up-to-date servers and clients to make it work. > > Agreed, but gitrefs have the advantage of failing safe, rather than > failing with dataloss. > > - Josh Triplett Isn't the "failing safe" only true if the client disconnects when a server doesn't advertise "i understand gitrefs"? 
So couldn't we, as part of the rules for reachability advertise a capability that does a similar thing and fails safe as well? Thanks. Jake ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Regarding "git log" on "git series" metadata 2016-11-06 4:50 ` Jacob Keller @ 2016-11-06 16:34 ` Josh Triplett 2016-11-06 17:14 ` Junio C Hamano 0 siblings, 1 reply; 35+ messages in thread From: Josh Triplett @ 2016-11-06 16:34 UTC (permalink / raw) To: Jacob Keller Cc: Christian Couder, Junio C Hamano, git, Shawn O. Pierce, Jeff King, Nguyen Thai Ngoc Duy, Mike Hommey On Sat, Nov 05, 2016 at 09:50:07PM -0700, Jacob Keller wrote: > On Sat, Nov 5, 2016 at 1:25 PM, Josh Triplett <josh@joshtriplett.org> wrote: > > On Sat, Nov 05, 2016 at 09:21:58PM +0100, Christian Couder wrote: > >> On Sat, Nov 5, 2016 at 4:18 PM, Josh Triplett <josh@joshtriplett.org> wrote: > >> > On Sat, Nov 05, 2016 at 01:45:27PM +0100, Christian Couder wrote: > >> >> And with what Peff says above it looks like we will need ways > >> >> configure and tweak commit reachability with gitlink/gitref anyway. So > >> >> the point of gitref compared to gitlink would be that they just have a > >> >> different reachability by default. But couldn't that be replaced by a > >> >> default rule saying that when a gitlink is reached "this way or that > >> >> way" then the commit reachability should be enforced, and otherwise it > >> >> should not be? > >> > > >> > Any version of git unaware of that rule, though, would consider objects > >> > only reachable by gitlink as unreachable and delete them, causing data > >> > loss. Likewise for a server not aware of that rule. And a server > >> > unaware of that rule would not supply those objects to a client pulling > >> > such a branch. > >> > >> Yeah, so you would really need an up-to-date server and client to > >> store the git-series data. > >> But anyway if we create a gitref object, you would also need > >> up-to-date servers and clients to make it work. > > > > Agreed, but gitrefs have the advantage of failing safe, rather than > > failing with dataloss. 
> > > > - Josh Triplett > > Isn't the "failing safe" only true if the client disconnects when a > server doesn't advertise "i understand gitrefs"? So couldn't we, as > part of the rules for reachability advertise a capability that does a > similar thing and fails safe as well? We could, but if we (or one of the many third-party git implementations) miss a case, gitlinks+reachability may appear to work in many cases with dataloss afterward, while gitrefs will fail early and not appear functional. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Regarding "git log" on "git series" metadata 2016-11-06 16:34 ` Josh Triplett @ 2016-11-06 17:14 ` Junio C Hamano 2016-11-06 17:33 ` Josh Triplett 0 siblings, 1 reply; 35+ messages in thread From: Junio C Hamano @ 2016-11-06 17:14 UTC (permalink / raw) To: Josh Triplett Cc: Jacob Keller, Christian Couder, git, Shawn O. Pierce, Jeff King, Nguyen Thai Ngoc Duy, Mike Hommey Josh Triplett <josh@joshtriplett.org> writes: > We could, but if we (or one of the many third-party git implementations) > miss a case, gitlinks+reachability may appear to work in many cases with > dataloss afterward, while gitrefs will fail early and not appear > functional. I wonder what happens if we do not introduce the "gitref" but instead change the behaviour of "gitlink" to imply an optional reachability. That is, when enumerating what is reachable in your repository, if you see a gitlink and if you notice that you locally have the target of that gitlink, you follow, but if you know you lack it, you do not error out. This may be making things too complex to feasibly implement by simplifying them ;-) and I see some immediate fallout that needs to be thought through (i.e. downsides) and a few upsides, too. I am feeling feverish and not thinking straight, so I won't try to weigh pros-and-cons. This would definitely need a protocol extension when transferring objects across repositories. ^ permalink raw reply [flat|nested] 35+ messages in thread
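The "optional reachability" rule floated in the message above (follow a gitlink when its target exists locally, do not error out when it is absent) might look roughly like the following. This is a sketch under assumptions, not a proposal for git's implementation: the object table layout, the MissingObject exception and the function name mark_optional are hypothetical, and the sketch assumes only the gitlink edge itself is optional, so anything missing below a present target is still an error.

```python
GITLINK = 0o160000  # tree entry mode git uses for gitlinks


class MissingObject(Exception):
    """Raised when a mandatory object is absent from the local store."""


def mark_optional(objects, roots):
    """Reachability walk where gitlink targets are followed only if present."""
    seen = set()
    stack = [(oid, False) for oid in roots]  # (object id, reached via gitlink?)
    while stack:
        oid, optional = stack.pop()
        if oid in seen:
            continue
        if oid not in objects:
            if optional:
                continue  # lacking a gitlink target is tolerated
            raise MissingObject(oid)
        seen.add(oid)
        obj = objects[oid]
        if obj["type"] == "commit":
            stack.append((obj["tree"], False))
            stack.extend((parent, False) for parent in obj["parents"])
        elif obj["type"] == "tree":
            for mode, _name, target in obj["entries"]:
                stack.append((target, mode == GITLINK))
    return seen
```

The same walk therefore serves both a submodule-style repository (target absent, skipped) and a series-style repository (target present, kept alive), which is the appeal of the idea; the cost is that a walker cannot tell a deliberately absent target from a lost one.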
* Re: Regarding "git log" on "git series" metadata 2016-11-06 17:14 ` Junio C Hamano @ 2016-11-06 17:33 ` Josh Triplett 2016-11-06 20:17 ` Jacob Keller 2016-11-09 22:57 ` Junio C Hamano 0 siblings, 2 replies; 35+ messages in thread From: Josh Triplett @ 2016-11-06 17:33 UTC (permalink / raw) To: Junio C Hamano Cc: Jacob Keller, Christian Couder, git, Shawn O. Pierce, Jeff King, Nguyen Thai Ngoc Duy, Mike Hommey On Sun, Nov 06, 2016 at 09:14:56AM -0800, Junio C Hamano wrote: > Josh Triplett <josh@joshtriplett.org> writes: > > We could, but if we (or one of the many third-party git implementations) > > miss a case, gitlinks+reachability may appear to work in many cases with > > dataloss afterward, while gitrefs will fail early and not appear > > functional. > > I wonder what happens if we do not introduce the "gitref" but > instead change the behaviour of "gitlink" to imply an optional > reachability. That is, when enumerating what is reachable in your > repository, if you see a gitlink and if you notice that you locally > have the target of that gitlink, you follow, but if you know you > lack it, you do not error out. This may be making things too > complex to feasibily implement by simplify them ;-) and I see a few > immediate fallout that needs to be thought through (i.e. downsides) > and a few upsides, too. I am feeling feverish and not thinking > straight, so I won't try to weigh pros-and-cons. > > This would definitely need protocol extension when transferring > objects across repositories. It'd also need a repository format extension locally. Otherwise, if you ever touched that repository with an older git (or a tool built on an older libgit2 or JGit or other library), you could lose data. It does seem conceptually appealing, though. In an ideal world, the original version of gitlink would have had opt-out reachability (and .gitmodules with an external repository reference could count as opting out). 
But I can't think of any case where it's OK for a git implementation to not know about this reachability extension and still operate on the gitlink. And given that, it might as well use a new object type that the old version definitely won't think it understands. - Josh Triplett ^ permalink raw reply [flat|nested] 35+ messages in thread
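The "fail early" argument made above for a new entry type can be sketched as follows. This is a toy validator, not how any particular git implementation behaves: whether a real implementation rejects an unknown tree entry mode depends on that implementation and on which checks it runs. The GITREF value below is a placeholder invented for this example, since the thread does not define one; the other modes are the ones git uses today.

```python
# Tree entry modes git uses today: tree, regular file, executable, symlink, gitlink.
OLD_MODES = {0o040000, 0o100644, 0o100755, 0o120000, 0o160000}
GITREF = 0o170000  # hypothetical placeholder for the proposed "gitref" mode


class UnknownEntryMode(Exception):
    """Raised when a tree entry uses a mode the implementation does not know."""


def check_tree(entries, known_modes):
    """Reject a tree outright if any entry has a mode outside known_modes."""
    for mode, name, _target in entries:
        if mode not in known_modes:
            raise UnknownEntryMode(f"{name}: mode {mode:06o}")
    return True


series_with_gitref = [(GITREF, "tip", "X"), (0o100644, "cover", "C")]
series_with_gitlink = [(0o160000, "tip", "X"), (0o100644, "cover", "C")]
```

An old implementation, modelled here as check_tree(..., OLD_MODES), refuses the gitref tree immediately, whereas it accepts the gitlink tree without knowing that the gitlink was meant to keep its target alive. That silent acceptance is the case where data loss can show up later.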
* Re: Regarding "git log" on "git series" metadata 2016-11-06 17:33 ` Josh Triplett @ 2016-11-06 20:17 ` Jacob Keller 2016-11-07 1:18 ` Josh Triplett 2016-11-09 22:57 ` Junio C Hamano 1 sibling, 1 reply; 35+ messages in thread From: Jacob Keller @ 2016-11-06 20:17 UTC (permalink / raw) To: Josh Triplett Cc: Junio C Hamano, Christian Couder, git, Shawn O. Pierce, Jeff King, Nguyen Thai Ngoc Duy, Mike Hommey On Sun, Nov 6, 2016 at 9:33 AM, Josh Triplett <josh@joshtriplett.org> wrote: > On Sun, Nov 06, 2016 at 09:14:56AM -0800, Junio C Hamano wrote: >> Josh Triplett <josh@joshtriplett.org> writes: >> > We could, but if we (or one of the many third-party git implementations) >> > miss a case, gitlinks+reachability may appear to work in many cases with >> > dataloss afterward, while gitrefs will fail early and not appear >> > functional. >> >> I wonder what happens if we do not introduce the "gitref" but >> instead change the behaviour of "gitlink" to imply an optional >> reachability. That is, when enumerating what is reachable in your >> repository, if you see a gitlink and if you notice that you locally >> have the target of that gitlink, you follow, but if you know you >> lack it, you do not error out. This may be making things too >> complex to feasibily implement by simplify them ;-) and I see a few >> immediate fallout that needs to be thought through (i.e. downsides) >> and a few upsides, too. I am feeling feverish and not thinking >> straight, so I won't try to weigh pros-and-cons. >> >> This would definitely need protocol extension when transferring >> objects across repositories. > > It'd also need a repository format extension locally. Otherwise, if you > ever touched that repository with an older git (or a tool built on an > older libgit2 or JGit or other library), you could lose data. > > It does seem conceptually appealing, though. 
In an ideal world, the > original version of gitlink would have had opt-out reachability (and > .gitmodules with an external repository reference could count as opting > out). > > But I can't think of any case where it's OK for a git implementation to > not know about this reachability extension and still operate on the > gitlink. And given that, it might as well use a new object type that > the old version definitely won't think it understands. > > - Josh Triplett That's still only true if the receiving end runs fsck, isn't it? I suppose that's a large number of receivers, and at least there are ways post-push to determine that objects don't make sense to that version of git. I think using a new mode is the safest way, and it allows easily implementing RefTrees as well as other projects. Additionally, if we *wanted* additional "opt-in / opt-out" support we could add this by default to gitrefs, and they could (possibly) replace gitlinks in the future? Thanks, Jake ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Regarding "git log" on "git series" metadata
  2016-11-06 20:17 ` Jacob Keller
@ 2016-11-07 1:18 ` Josh Triplett
  2016-11-07 5:35 ` Jacob Keller
  2016-11-07 9:42 ` Duy Nguyen
  0 siblings, 2 replies; 35+ messages in thread
From: Josh Triplett @ 2016-11-07 1:18 UTC
  To: Jacob Keller
  Cc: Junio C Hamano, Christian Couder, git, Shawn O. Pierce, Jeff King, Nguyen Thai Ngoc Duy, Mike Hommey

On Sun, Nov 06, 2016 at 12:17:10PM -0800, Jacob Keller wrote:
> On Sun, Nov 6, 2016 at 9:33 AM, Josh Triplett <josh@joshtriplett.org> wrote:
> > On Sun, Nov 06, 2016 at 09:14:56AM -0800, Junio C Hamano wrote:
> >> Josh Triplett <josh@joshtriplett.org> writes:
> >> > We could, but if we (or one of the many third-party git implementations)
> >> > miss a case, gitlinks+reachability may appear to work in many cases with
> >> > dataloss afterward, while gitrefs will fail early and not appear
> >> > functional.
> >>
> >> I wonder what happens if we do not introduce the "gitref" but
> >> instead change the behaviour of "gitlink" to imply an optional
> >> reachability. That is, when enumerating what is reachable in your
> >> repository, if you see a gitlink and if you notice that you locally
> >> have the target of that gitlink, you follow, but if you know you
> >> lack it, you do not error out. This may be making things too
> >> complex to feasibly implement by simplifying them ;-) and I see a few
> >> immediate fallout that needs to be thought through (i.e. downsides)
> >> and a few upsides, too. I am feeling feverish and not thinking
> >> straight, so I won't try to weigh pros-and-cons.
> >>
> >> This would definitely need protocol extension when transferring
> >> objects across repositories.
> >
> > It'd also need a repository format extension locally. Otherwise, if you
> > ever touched that repository with an older git (or a tool built on an
> > older libgit2 or JGit or other library), you could lose data.
> >
> > It does seem conceptually appealing, though. In an ideal world, the
> > original version of gitlink would have had opt-out reachability (and
> > .gitmodules with an external repository reference could count as opting
> > out).
> >
> > But I can't think of any case where it's OK for a git implementation to
> > not know about this reachability extension and still operate on the
> > gitlink. And given that, it might as well use a new object type that
> > the old version definitely won't think it understands.
> >
> > - Josh Triplett
>
> That's still only true if the receiving end runs fsck, isn't it? I
> suppose that's a large number of receivers, and at least there are
> ways post-push to determine that objects don't make sense to that
> version of git.
>
> I think using a new mode is the safest way, and it allows easily
> implementing RefTrees as well as other projects. Additionally, if we
> *wanted* additional "opt-in / opt-out" support we could add this by
> default to gitrefs, and they could (possibly) replace gitlinks in the
> future?

Once we have gitrefs, you have both alternatives: reachable (gitref) or
not reachable (gitlink).

However, if you want some way to mark reachable objects as not
reachable, such as for a sparse checkout, external large-object storage,
or similar, then you can use a single unified mechanism for that whether
working with gitrefs, trees, or blobs.

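Junio's "optional reachability" idea quoted above can be pictured with a small sketch. This is not git code: the object store is a plain dictionary, and the names `reachable`, `hard`, and `optional` are invented here for illustration. A `hard` edge stands in for a tree or blob entry (a missing target is an error), while an `optional` edge stands in for the modified gitlink (followed when the target exists locally, silently skipped when it does not).

```python
def reachable(store, roots):
    """Return the set of object ids reachable from roots.

    store maps an object id to a list of (kind, target_id) edges,
    where kind is "hard" or "optional" (both names are hypothetical).
    """
    seen = set()
    stack = list(roots)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        if oid not in store:
            # Only hard edges and roots can get here: that is corruption.
            raise KeyError(f"missing object {oid}")
        seen.add(oid)
        for kind, target in store[oid]:
            if kind == "optional" and target not in store:
                continue  # lacking an optional target is not an error
            stack.append(target)
    return seen


# Toy repository: commit c1 -> tree t1 -> two optional links.
store = {
    "c1": [("hard", "t1")],
    "t1": [("optional", "x1"), ("optional", "x2")],
    "x1": [],  # present locally, so it is followed
    # "x2" is absent locally, so it is skipped without an error
}
print(sorted(reachable(store, ["c1"])))
```

An implementation that does not know about the optional edges would never follow them at all, so an object like `x1` would look unreachable to it. That is the silent data-loss case the repository format extension is meant to prevent.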
* Re: Regarding "git log" on "git series" metadata
  2016-11-07 1:18 ` Josh Triplett
@ 2016-11-07 5:35 ` Jacob Keller
  2016-11-07 9:42 ` Duy Nguyen
  1 sibling, 0 replies; 35+ messages in thread
From: Jacob Keller @ 2016-11-07 5:35 UTC
  To: Josh Triplett
  Cc: Junio C Hamano, Christian Couder, git, Shawn O. Pierce, Jeff King, Nguyen Thai Ngoc Duy, Mike Hommey

On Sun, Nov 6, 2016 at 5:18 PM, Josh Triplett <josh@joshtriplett.org> wrote:
> Once we have gitrefs, you have both alternatives: reachable (gitref) or
> not reachable (gitlink).
>
> However, if you want some way to mark reachable objects as not
> reachable, such as for a sparse checkout, external large-object storage,
> or similar, then you can use a single unified mechanism for that whether
> working with gitrefs, trees, or blobs.

Fair enough.

Thanks,
Jake

* Re: Regarding "git log" on "git series" metadata
  2016-11-07 1:18 ` Josh Triplett
  2016-11-07 5:35 ` Jacob Keller
@ 2016-11-07 9:42 ` Duy Nguyen
  2016-11-07 16:11 ` Josh Triplett
  1 sibling, 1 reply; 35+ messages in thread
From: Duy Nguyen @ 2016-11-07 9:42 UTC
  To: Josh Triplett
  Cc: Jacob Keller, Junio C Hamano, Christian Couder, git, Shawn O. Pierce, Jeff King, Mike Hommey

On Mon, Nov 7, 2016 at 8:18 AM, Josh Triplett <josh@joshtriplett.org> wrote:
> Once we have gitrefs, you have both alternatives: reachable (gitref) or
> not reachable (gitlink).
>
> However, if you want some way to mark reachable objects as not
> reachable, such as for a sparse checkout, external large-object storage,
> or similar, then you can use a single unified mechanism for that whether
> working with gitrefs, trees, or blobs.

How? Whether an object is reachable or not is baked in the definition (of
either gitlink or gitref). I don't think you can have a "maybe
reachable" type then rely on an external source to determine
reachability.
--
Duy

* Re: Regarding "git log" on "git series" metadata
  2016-11-07 9:42 ` Duy Nguyen
@ 2016-11-07 16:11 ` Josh Triplett
  0 siblings, 0 replies; 35+ messages in thread
From: Josh Triplett @ 2016-11-07 16:11 UTC
  To: Duy Nguyen
  Cc: Jacob Keller, Junio C Hamano, Christian Couder, git, Shawn O. Pierce, Jeff King, Mike Hommey

On Mon, Nov 07, 2016 at 04:42:04PM +0700, Duy Nguyen wrote:
> On Mon, Nov 7, 2016 at 8:18 AM, Josh Triplett <josh@joshtriplett.org> wrote:
> > Once we have gitrefs, you have both alternatives: reachable (gitref) or
> > not reachable (gitlink).
> >
> > However, if you want some way to mark reachable objects as not
> > reachable, such as for a sparse checkout, external large-object storage,
> > or similar, then you can use a single unified mechanism for that whether
> > working with gitrefs, trees, or blobs.
>
> How? Whether an object is reachable or not is baked in the definition (of
> either gitlink or gitref). I don't think you can have a "maybe
> reachable" type then rely on an external source to determine
> reachability.

You'd have various "reachable by default" entries in trees, including
trees, blobs, and gitrefs, and then have an external mechanism (likely
via .git/config) to say "ignore objects with these hashes/paths". For
instance, you might say "ignore all objects only reachable from the path
'assets/video/*' within a commit's tree". With the right set of client
and server extensions, you could then avoid downloading those objects.

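The "only reachable from the path" wording in Josh's example matters, and a short sketch may make it concrete. Everything below is an assumption for illustration: git has no such setting in this thread, the function name `objects_to_skip` is made up, and a commit's tree is modelled as a flat list of (path, object id) pairs. An object is skipped only when every path that leads to it matches an ignore pattern.

```python
from fnmatch import fnmatchcase


def objects_to_skip(entries, ignore_patterns):
    """Return ids of objects reachable only through ignored paths.

    entries: iterable of (path, object_id) pairs from a flattened tree.
    ignore_patterns: glob patterns, e.g. ["assets/video/*"] (hypothetical
    stand-in for whatever .git/config mechanism would be chosen).
    """
    ignored, kept = set(), set()
    for path, oid in entries:
        if any(fnmatchcase(path, pat) for pat in ignore_patterns):
            ignored.add(oid)
        else:
            kept.add(oid)
    # An object that is also reachable from a non-ignored path must stay.
    return ignored - kept


entries = [
    ("README", "aaa1"),
    ("assets/video/intro.mp4", "bbb2"),
    ("assets/video/logo.png", "ccc3"),
    ("docs/logo.png", "ccc3"),  # same blob, reachable from a kept path
]
print(sorted(objects_to_skip(entries, ["assets/video/*"])))  # ['bbb2']
```

Here `ccc3` is kept because `docs/logo.png` also points at it, so only `bbb2` could be left undownloaded.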
* Re: Regarding "git log" on "git series" metadata
  2016-11-06 17:33 ` Josh Triplett
  2016-11-06 20:17 ` Jacob Keller
@ 2016-11-09 22:57 ` Junio C Hamano
  1 sibling, 0 replies; 35+ messages in thread
From: Junio C Hamano @ 2016-11-09 22:57 UTC
  To: Josh Triplett
  Cc: Jacob Keller, Christian Couder, git, Shawn O. Pierce, Jeff King, Nguyen Thai Ngoc Duy, Mike Hommey

Josh Triplett <josh@joshtriplett.org> writes:

>> This would definitely need protocol extension when transferring
>> objects across repositories.
>
> It'd also need a repository format extension locally. Otherwise, if you
> ever touched that repository with an older git (or a tool built on an
> older libgit2 or JGit or other library), you could lose data.

True. Thanks for sanity-checking me.

* Re: Regarding "git log" on "git series" metadata
  2016-11-04 17:57 Regarding "git log" on "git series" metadata Junio C Hamano
  2016-11-04 19:19 ` Jacob Keller
  2016-11-04 20:47 ` Christian Couder
@ 2016-11-04 21:06 ` Josh Triplett
  2 siblings, 0 replies; 35+ messages in thread
From: Josh Triplett @ 2016-11-04 21:06 UTC
  To: Junio C Hamano; +Cc: git

On Fri, Nov 04, 2016 at 10:57:09AM -0700, Junio C Hamano wrote:
> After your talk at LPC2016, I was thinking about your proposal to
> give an option to hide certain parents from "git log" traversal.
>
> While I do not think we would terribly mind a new feature in the
> core to support third-party additions like "git series" better, I
> think this particular one is a big mistake that we shouldn't take.
[...]
> I think this is backwards. The root cause of the issue you have
> with "gitk" is because you added something that is *NOT* a parent to
> your commit. We shouldn't have to add a mechanism to filter
> something that shouldn't have been added there in the first place.
>
> I am wondering if an alternative approach would work better.
>
> Imagine we invent a new tree entry type, "gitref", that is similar
> to "gitlink" in that it can record a commit object name in a tree,
> but unlike "gitlink" it does imply reachability. And you do not add
> phony parents to your commit object. A tree that has "gitref"s in
> it is about annotating the commits in the same repository (e.g. the
> tree references two commits, "base" and "tip", to point into a slice
> of the main history). And it is perfectly sensible for such a
> pointer to imply reachability---after all it serves different
> purposes from "gitlink".

I absolutely agree with this, and I'd love to have gitref or similar in
core git. Given the availability of that mechanism, I'd love to use it
in git-series. (And in git submodule, as well, for other projects.)

The one critical issue there, though: that would break backward
compatibility with old versions of git. No old version of git could
push, pull, gc, repack, or otherwise touch a repository that used this
feature. The advantages of the approach (viewing and manipulating the
series with pure git) seem sufficiently high to make that worth
considering, but it is a significant downside.

> Another alternative that I am negative about (but is probably a
> better hack than how you abused the "parent" link) might be to add a
> new commit object header field that behaves similarly to "parent"
> only in that it implies reachability. But recording the extra
> parent in commit object was not something you wanted to do in the
> first place (i.e. your series processing is done solely on the
> contents of the tree, and you do not read this extra parent). If you
> need to add an in-tree reference to another commit in your future
> versions of "git series", with either this variant or your original
> implementation, you would end up needing to add more "parent" (or
> pseudo parent) only to preserve reachability. At that point, I
> think it makes more sense to have entries in the tree to directly
> ensure reachability, if you want these entries to always point at an
> in-tree object.

This would similarly break compatibility with old git, as old git
wouldn't follow those reachability-only links from commits, so it could
throw away the data.

One approach compatible with old git would be to continue adding the
relevant commits as artificial parents, but have a separate commit
metadata field that says which parents to ignore; old git would then do
the right thing, as long as it doesn't rewrite the commit entirely.
That does have the same disadvantages of having to duplicate the
information in both the tree and the parent list, though; it's the same
class of hack, just with improved usability. I'd much rather use
gitrefs.

> I am afraid that I probably am two steps ahead of myself, because I
> am reasonably sure that it is quite possible that I have overlooked
> something trivially obvious that makes the "gitref" approach
> unworkable.

gitref seems like a good idea to me, as long as we can sort out the
compatibility story.

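The backward-compatible variant Josh mentions (artificial parents plus a commit field listing which parents to ignore) can be sketched as two traversals over the same data. This is a toy model only: commits are dictionaries, and the `ignore` field and the `walk` function are hypothetical names, not anything git defines. A log-style walk honours the ignore list, while a reachability walk (what gc or an old git would do) follows every parent.

```python
def walk(commits, start, honor_ignore):
    """Depth-first walk over parent links, returning commit ids in visit order.

    commits maps an id to {"parents": [...], "ignore": set_of_parent_ids}.
    When honor_ignore is True, parents listed in "ignore" are not followed.
    """
    seen = set()
    order = []
    stack = [start]
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue
        seen.add(cid)
        order.append(cid)
        commit = commits[cid]
        for parent in commit["parents"]:
            if honor_ignore and parent in commit["ignore"]:
                continue  # artificial parent: kept alive, but not shown
            stack.append(parent)
    return order


# Series commits s1 <- s2; tip1/tip2 are artificial parents into main history.
commits = {
    "s2": {"parents": ["s1", "tip2"], "ignore": {"tip2"}},
    "s1": {"parents": ["tip1"], "ignore": {"tip1"}},
    "tip2": {"parents": ["base"], "ignore": set()},
    "tip1": {"parents": ["base"], "ignore": set()},
    "base": {"parents": [], "ignore": set()},
}
print(walk(commits, "s2", honor_ignore=True))           # ['s2', 's1']
print(sorted(walk(commits, "s2", honor_ignore=False)))  # all five commits
```

The second call shows why an old git would still "do the right thing" for object retention: it ignores the extra field and follows every parent, so nothing is pruned, at the cost of showing main-history commits in its log output.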
end of thread, other threads:[~2016-11-13 17:58 UTC]

Thread overview: 35+ messages
2016-11-04 17:57 Regarding "git log" on "git series" metadata Junio C Hamano
2016-11-04 19:19 ` Jacob Keller
2016-11-04 19:49 ` Jeff King
2016-11-04 21:55 ` Josh Triplett
2016-11-04 23:37 ` Jacob Keller
2016-11-04 23:46 ` Josh Triplett
2016-11-04 23:34 ` Jacob Keller
2016-11-05 1:48 ` Jeff King
2016-11-05 3:55 ` Josh Triplett
2016-11-05 4:41 ` Jeff King
2016-11-05 4:41 ` Junio C Hamano
2016-11-05 4:44 ` Jeff King
2016-11-05 5:00 ` Junio C Hamano
2016-11-04 20:47 ` Christian Couder
2016-11-04 21:19 ` Josh Triplett
2016-11-04 23:04 ` Christian Couder
2016-11-13 17:50 ` Stefano Zacchiroli
2016-11-05 21:56 ` Christian Couder
2016-11-05 4:42 ` Junio C Hamano
2016-11-05 12:17 ` Christian Couder
2016-11-05 12:45 ` Christian Couder
2016-11-05 15:18 ` Josh Triplett
2016-11-05 20:21 ` Christian Couder
2016-11-05 20:25 ` Josh Triplett
2016-11-06 4:50 ` Jacob Keller
2016-11-06 16:34 ` Josh Triplett
2016-11-06 17:14 ` Junio C Hamano
2016-11-06 17:33 ` Josh Triplett
2016-11-06 20:17 ` Jacob Keller
2016-11-07 1:18 ` Josh Triplett
2016-11-07 5:35 ` Jacob Keller
2016-11-07 9:42 ` Duy Nguyen
2016-11-07 16:11 ` Josh Triplett
2016-11-09 22:57 ` Junio C Hamano
2016-11-04 21:06 ` Josh Triplett