* A generalization of git notes from blobs to trees - git metadata?
@ 2010-02-06 13:32 Jon Seymour
  2010-02-07 1:36 ` Johan Herland
  0 siblings, 1 reply; 19+ messages in thread
From: Jon Seymour @ 2010-02-06 13:32 UTC (permalink / raw)
To: Git Mailing List

git notes is a nice innovation - well done to all those involved.

Has consideration ever been given to generalizing the concept to allow
note (or more correctly - metadata) trees with arbitrary sha1s?

For example, suppose you had reason to cache the distribution that
resulted from the build of a particular commit, then it'd be nice to
be able to do this using a notes-like mechanism.

git metadata import foo-1.1.0 dist ~/foo/dist

would create a git tree from the contents of ~/foo/dist and then bind
it to a meta item called dist associated with the sha1 corresponding to
foo-1.1.0

To retrieve the contents of the previous build, you'd do something like

git metadata export foo-1.1.0 dist /tmp/foo-1.1.0

This would find the metadata tree associated with foo-1.1.0, extract
the dist subtree from that tree and write it to disk at /tmp/foo-1.1.0

I've used build outputs as an example here, but really it needn't be
limited to that. I can see this facility would be useful for any kind
of annotation or derived result that is more complex than a single
text blob. Metadata trees, in combination with a name spacing
technique, could be used to attach arbitrary metadata created by an
arbitrary set of tools to arbitrary SHA1 objects.

jon.

^ permalink raw reply [flat|nested] 19+ messages in thread

* Re: A generalization of git notes from blobs to trees - git metadata?
  2010-02-06 13:32 A generalization of git notes from blobs to trees - git metadata? Jon Seymour
@ 2010-02-07 1:36 ` Johan Herland
  2010-02-07 2:21   ` Junio C Hamano
  2010-02-07 3:27   ` Jon Seymour
  0 siblings, 2 replies; 19+ messages in thread
From: Johan Herland @ 2010-02-07 1:36 UTC
To: Jon Seymour; +Cc: git

On Saturday 06 February 2010, Jon Seymour wrote:
> git notes is a nice innovation - well done to all those involved.

Thanks.

> Has consideration ever been given to generalizing the concept to allow
> note (or more correctly - metadata) trees with arbitrary sha1s?

Not sure what you mean here. The note infrastructure allows _any_ SHA1 (not
necessarily the SHA1 of an existing Git object) to be bound to a note object.
Furthermore, although we currently assume that all note objects are blobs,
someone (who?) has already suggested (as mentioned in the notes TODO list)
that a note object could also be a _tree_ object that can be unpacked/read
to reveal further "sub-notes".

Hence, in addition to having multiple notes refs (e.g.
refs/notes/commits:deadbeef, refs/notes/bugs:deadbeef, etc.) to categorize
notes, you could also classify notes _after_ having traversed the notes tree
(e.g. refs/notes/bugs:deadbeef/fixes, refs/notes/bugs:deadbeef/causes).

Note that support for this has not yet been written, and AFAIK it is also
uncertain how such a change would affect the different use cases for notes
(e.g. how to display them in 'git log').

> For example, suppose you had reason to cache the distribution that
> resulted from the build of a particular commit, then it'd be nice to
> be able to do this using a notes-like mechanism.
>
> git metadata import foo-1.1.0 dist ~/foo/dist
>
> would create a git tree from the contents of ~/foo/dist and then bind
> it to a meta item called dist associated with the sha1 corresponding to
> foo-1.1.0

You can do this already today by simply using 'git tag':

	# Prepare an index with the contents of ~/foo/dist
	git tag foo-1.1.0-dist $(git write-tree)

I don't see why you'd need to add a new metadata command.

> To retrieve the contents of the previous build, you'd do something like
>
> git metadata export foo-1.1.0 dist /tmp/foo-1.1.0
>
> This would find the metadata tree associated with foo-1.1.0, extract
> the dist subtree from that tree and write it to disk at /tmp/foo-1.1.0

Or, if you use a tag instead:

	# The tag names a tree rather than a commit, so check out paths from it
	git --work-tree=/tmp/foo-1.1.0 checkout foo-1.1.0-dist -- .

> I've used build outputs as an example here, but really it needn't be
> limited to that. I can see this facility would be useful for any kind
> of annotation or derived result that is more complex than a single
> text blob. Metadata trees, in combination with a name spacing
> technique, could be used to attach arbitrary metadata created by an
> arbitrary set of tools to arbitrary SHA1 objects.

I still don't see why this provides anything that isn't already supported by
either using 'git tag', or by implementing support for notes-as-trees in the
notes feature.

...Johan

--
Johan Herland, <johan@herland.net>
www.herland.net

* Re: A generalization of git notes from blobs to trees - git metadata?
  2010-02-07 1:36 ` Johan Herland
@ 2010-02-07 2:21   ` Junio C Hamano
  2010-02-07 5:02     ` Jeff King
  2010-02-07 3:27   ` Jon Seymour
  1 sibling, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2010-02-07 2:21 UTC
To: Johan Herland; +Cc: Jon Seymour, git

Johan Herland <johan@herland.net> writes:

> Furthermore, although we currently assume that all note objects are blobs,
> someone (who?) has already suggested (as mentioned in the notes TODO list)
> that a note object could also be a _tree_ object that can be unpacked/read
> to reveal further "sub-notes".

I would advise you not to go there. How would you even _merge_ such a
thing with other notes attached to the same object? What determines the
path in that tree object?

Clueless ones can freely make misguided suggestions without thinking
things through and make things unnecessarily complex without real gain.
You do not have to listen to every one of them.

* Re: A generalization of git notes from blobs to trees - git metadata?
  2010-02-07 2:21 ` Junio C Hamano
@ 2010-02-07 5:02   ` Jeff King
  2010-02-07 5:36     ` Jon Seymour
                      ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Jeff King @ 2010-02-07 5:02 UTC
To: Junio C Hamano; +Cc: Johan Herland, Jon Seymour, git

On Sat, Feb 06, 2010 at 06:21:37PM -0800, Junio C Hamano wrote:

> Johan Herland <johan@herland.net> writes:
>
> > Furthermore, although we currently assume that all note objects are blobs,
> > someone (who?) has already suggested (as mentioned in the notes TODO list)
> > that a note object could also be a _tree_ object that can be unpacked/read
> > to reveal further "sub-notes".
>
> I would advise you not to go there. How would you even _merge_ such a
> thing with other notes attached to the same object? What determines the
> path in that tree object?
>
> Clueless ones can freely make misguided suggestions without thinking
> things through and make things unnecessarily complex without real gain.
> You do not have to listen to every one of them.

I think I may have been the one to suggest trees or notes at one point.
But let me clarify that this is not exactly what the OP is proposing in
this thread.

My suggestion was that some use cases may have many key/value pairs of
notes for a single sha1. We basically have two options:

  1. store each in a separate notes ref, with each sha1 mapping to
     a blob. The note "name" is the name of the ref.

  2. store notes in a single notes ref, with each sha1 mapping to a
     tree with named sub-notes. The note "name" is the combination of
     ref-name and tree entry name.

The advantage of (1) is that notes are not bound tightly to each other.
I can distribute the notes tree for one "name" independent of the
others.

The advantage of (2) is that it is faster and smaller. In (1), each note
has a separate index, and we must traverse each note index separately.

In practice, I would expect to use (1) for logically separate datasets.
For example, automatic bug-tracking notes would go in a different ref
from human annotations. But I would expect to use (2) if I had, say, 5
different pieces of bug tracking information and I wanted an easy way to
refer to them individually.

And a specialized merge for that is straightforward. In the simplest
case, you simply say "notes of this ref are tree-type, or they are
blob-type" and then you have no merge problems. But if you want to get
fancy, you can say that a conflict between "sha1/blob" and
"sha1/tree/key" should automatically "promote" the first one into
"sha1/tree/default" or some other canonical name.

Note that all of this is my pie-in-the-sky "here is what I was thinking
of when I looked at notes a long time ago". I don't care strongly if it
gets implemented or not at this point; I just wanted to add some context
to what Johan had in his notes todo list (or maybe I am wrong, and what
is in his todo list was based on something totally different said by
somebody else, and I have just confused the issue more. :) ).

With respect to the idea of storing an arbitrary tree, I agree it is
probably too complex with respect to merging. In addition, it makes
things like "git log --format=%N" confusing. I think you would do better
to simply store a tree sha1 inside the note blob, and callers who were
interested in the tree contents could then dereference it and examine as
they saw fit. The only caveat is that you need some way of telling git
that the referenced trees are reachable and not to be pruned.

-Peff

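To make the two layouts and the blob-to-tree "promotion" from the message above concrete, here is a minimal Python sketch. It models notes as plain dictionaries rather than git objects; the ref names, the abbreviated sha1 and the `promote`/`merge_note` helpers are illustrative assumptions, not part of git or of any proposed patch.

```python
from typing import Dict, Union

Blob = str
Tree = Dict[str, Blob]      # sub-note name -> note text
Note = Union[Blob, Tree]

# Layout (1): one notes ref per note name, each mapping sha1 -> blob.
# Ref names and the abbreviated sha1 are made up for illustration.
layout_1 = {
    "refs/notes/bugs-fixes": {"deadbeef": "fixes issue 12"},
    "refs/notes/bugs-causes": {"deadbeef": "caused by cafebabe"},
}

# Layout (2): a single notes ref, each sha1 mapping to a tree of sub-notes.
layout_2 = {
    "refs/notes/bugs": {
        "deadbeef": {"fixes": "fixes issue 12", "causes": "caused by cafebabe"},
    },
}


def promote(note: Note, default_key: str = "default") -> Tree:
    """Return a tree-type note; a bare blob becomes {default_key: blob}."""
    if isinstance(note, dict):
        return dict(note)
    return {default_key: note}


def merge_note(ours: Note, theirs: Note) -> Note:
    """Merge two notes attached to the same sha1.

    A blob meeting a tree is promoted to tree/default, as suggested in the
    message. Conflicting values under the same key are left to the caller.
    """
    if isinstance(ours, str) and isinstance(theirs, str):
        if ours == theirs:
            return ours
        raise ValueError("blob/blob conflict needs a separate policy")
    merged = promote(ours)
    for key, blob in promote(theirs).items():
        if key in merged and merged[key] != blob:
            raise ValueError(f"conflict on sub-note {key!r}")
        merged[key] = blob
    return merged
```

For example, `merge_note("plain note", {"fixes": "bug 12"})` yields a tree with the keys `default` and `fixes`. The sketch deliberately refuses to guess when two sides disagree on the same key, since the thread leaves that policy open.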
* Re: A generalization of git notes from blobs to trees - git metadata?
  2010-02-07 5:02 ` Jeff King
@ 2010-02-07 5:36   ` Jon Seymour
  2010-02-07 9:15     ` Jakub Narebski
  2010-02-07 19:33     ` Jeff King
  2010-02-07 18:48   ` Junio C Hamano
  2010-02-07 22:46   ` Johan Herland
  2 siblings, 2 replies; 19+ messages in thread
From: Jon Seymour @ 2010-02-07 5:36 UTC
To: Jeff King; +Cc: Junio C Hamano, Johan Herland, git

On Sun, Feb 7, 2010 at 4:02 PM, Jeff King <peff@peff.net> wrote:
> On Sat, Feb 06, 2010 at 06:21:37PM -0800, Junio C Hamano wrote:
>
>> Johan Herland <johan@herland.net> writes:
>>
>> > Furthermore, although we currently assume that all note objects are blobs,
>> > someone (who?) has already suggested (as mentioned in the notes TODO list)
>> > that a note object could also be a _tree_ object that can be unpacked/read
>> > to reveal further "sub-notes".
>>
>> I would advise you not to go there. How would you even _merge_ such a
>> thing with other notes attached to the same object? What determines the
>> path in that tree object?
>>
>> Clueless ones can freely make misguided suggestions without thinking
>> things through and make things unnecessarily complex without real gain.
>> You do not have to listen to every one of them.
>
> I think I may have been the one to suggest trees or notes at one point.
> But let me clarify that this is not exactly what the OP is proposing in
> this thread.
>
> My suggestion was that some use cases may have many key/value pairs of
> notes for a single sha1. We basically have two options:
>
> 1. store each in a separate notes ref, with each sha1 mapping to
> a blob. The note "name" is the name of the ref.
>
> 2. store notes in a single notes ref, with each sha1 mapping to a
> tree with named sub-notes. The note "name" is the combination of
> ref-name and tree entry name.
>

So, of course, options (1) and (2) need not be exclusive.

Use Option (1) for different metadata sets and option (2) to partition
individual datasets.

>
> With respect to the idea of storing an arbitrary tree, I agree it is
> probably too complex with respect to merging. In addition, it makes
> things like "git log --format=%N" confusing. I think you would do better
> to simply store a tree sha1 inside the note blob, and callers who were
> interested in the tree contents could then dereference it and examine as
> they saw fit. The only caveat is that you need some way of telling git
> that the referenced trees are reachable and not to be pruned.
>

As I see it, the existing use of notes is a special instance of a more
general metadata capability in which the metadata is constrained to be
a single blob. If notes continued to be constrained in this way, there
is no reason to change anything with respect to its current userspace
behaviour. That said, most of the plumbing which enabled notes could
be generalized to enable the arbitrary tree case [ which admittedly, I
have yet to sell successfully !]

In one sense, the merge issue doesn't exist. When
the maintainer publishes a tag no-one expects to have to deal with
downstream conflicting definitions of the tag. Likewise, if the
maintainer were to publish the /man and /html metadata trees (per my
previous example) for a release tag, anyone who received
/refs/metadata/doc would expect to receive the metadata trees as
published by the maintainer. Anyone who didn't wouldn't have to pull
/refs/metadata/doc.

I can see there are use cases where multiple parties might want to
contribute metadata and I do not currently have a good solution to
that problem, but that is not to say there isn't one - surely it is
just a question of applying a little intellect creatively?

jon.

* Re: A generalization of git notes from blobs to trees - git metadata?
  2010-02-07 5:36 ` Jon Seymour
@ 2010-02-07 9:15   ` Jakub Narebski
  2010-02-07 9:41     ` Jon Seymour
  2010-02-07 19:33   ` Jeff King
  1 sibling, 1 reply; 19+ messages in thread
From: Jakub Narebski @ 2010-02-07 9:15 UTC
To: Jon Seymour; +Cc: Jeff King, Junio C Hamano, Johan Herland, git

Jon Seymour <jon.seymour@gmail.com> writes:

[cut]

> As I see it, the existing use of notes is a special instance of a more
> general metadata capability in which the metadata is constrained to be
> a single blob. If notes continued to be constrained in this way, there
> is no reason to change anything with respect to its current userspace
> behaviour. That said, most of the plumbing which enabled notes could
> be generalized to enable the arbitrary tree case [ which admittedly, I
> have yet to sell successfully !]
>
> In one sense, the merge issue doesn't exist. When
> the maintainer publishes a tag no-one expects to have to deal with
> downstream conflicting definitions of the tag. Likewise, if the
> maintainer were to publish the /man and /html metadata trees (per my
> previous example) for a release tag, anyone who received
> /refs/metadata/doc would expect to receive the metadata trees as
> published by the maintainer. Anyone who didn't wouldn't have to pull
> /refs/metadata/doc.
>
> I can see there are use cases where multiple parties might want to
> contribute metadata and I do not currently have a good solution to
> that problem, but that is not to say there isn't one - surely it is
> just a question of applying a little intellect creatively?

Are you trying to repeat the failure of Apple's / MacOS / HFS+ filesystem
data/resource forks, and Microsoft's Alternate Data Streams in git? :-)

--
Jakub Narebski
Poland
ShadeHawk on #git

* Re: A generalization of git notes from blobs to trees - git metadata?
  2010-02-07 9:15 ` Jakub Narebski
@ 2010-02-07 9:41   ` Jon Seymour
  2010-02-07 10:15     ` Jon Seymour
  0 siblings, 1 reply; 19+ messages in thread
From: Jon Seymour @ 2010-02-07 9:41 UTC
To: Jakub Narebski; +Cc: Jeff King, Junio C Hamano, Johan Herland, git

On Sun, Feb 7, 2010 at 8:15 PM, Jakub Narebski <jnareb@gmail.com> wrote:
> Jon Seymour <jon.seymour@gmail.com> writes:
>
> [cut]
>
>> As I see it, the existing use of notes is a special instance of a more
>> general metadata capability in which the metadata is constrained to be
>> a single blob. If notes continued to be constrained in this way, there
>> is no reason to change anything with respect to its current userspace
>> behaviour. That said, most of the plumbing which enabled notes could
>> be generalized to enable the arbitrary tree case [ which admittedly, I
>> have yet to sell successfully !]
>>
>> In one sense, the merge issue doesn't exist. When
>> the maintainer publishes a tag no-one expects to have to deal with
>> downstream conflicting definitions of the tag. Likewise, if the
>> maintainer were to publish the /man and /html metadata trees (per my
>> previous example) for a release tag, anyone who received
>> /refs/metadata/doc would expect to receive the metadata trees as
>> published by the maintainer. Anyone who didn't wouldn't have to pull
>> /refs/metadata/doc.
>>
>> I can see there are use cases where multiple parties might want to
>> contribute metadata and I do not currently have a good solution to
>> that problem, but that is not to say there isn't one - surely it is
>> just a question of applying a little intellect creatively?
>
> Are you trying to repeat the failure of Apple's / MacOS / HFS+ filesystem
> data/resource forks, and Microsoft's Alternate Data Streams in git? :-)
>

No I am not. I don't see why a metadata proposal is any more exposed
to subversive payloads than say, use of git merge -s ours [ a
subversive payload could be made reachable from a commit that
otherwise merges in favour of the legitimate source - who would know?
]

Really, I can't see why the rationale that justifies using a single blob
to extend a commit message can't also be used to justify
associating a metadata tree of arbitrary complexity with an arbitrary
sha1 object. What makes maintaining a mapping to a single blob
acceptable but maintaining a mapping to a tree unacceptable? Is there
really any fundamental difference?

jon.

* Re: A generalization of git notes from blobs to trees - git metadata?
  2010-02-07 9:41 ` Jon Seymour
@ 2010-02-07 10:15   ` Jon Seymour
  0 siblings, 0 replies; 19+ messages in thread
From: Jon Seymour @ 2010-02-07 10:15 UTC
To: Jakub Narebski; +Cc: Jeff King, Junio C Hamano, Johan Herland, git

To explain a little further why the metadata concept is different from
the resource fork or alternate data stream concepts:

These two concepts were based on the idea of associating metadata with
the name of the resource and preserving the metadata along with the
resource as the resource evolved.

This is not the intent with the metadata concept. Rather, the idea is
to annotate the content (whether it be a commit, tree or blob) with
other content (a tree in the general metadata case, or a blob in the
git notes case).

The use cases I have in mind relate to caching "expensive (or
impractical) to re-derive" results from an input. So, for example,
storing /man and /html trees for a given commit in a metadata commit
called "refs/metadata/doc" would be one case. Storing the foreign SCM
revision id that a git repo was pushed into would be another [ storing
it in the commit message isn't an option because the commit has already
happened before the push ]. [ And granted: git notes can already be
used for this scenario ].

jon.

* Re: A generalization of git notes from blobs to trees - git metadata?
  2010-02-07 5:36 ` Jon Seymour
  2010-02-07 9:15   ` Jakub Narebski
@ 2010-02-07 19:33   ` Jeff King
  2010-02-07 20:25     ` Junio C Hamano
  1 sibling, 1 reply; 19+ messages in thread
From: Jeff King @ 2010-02-07 19:33 UTC
To: Jon Seymour; +Cc: Junio C Hamano, Johan Herland, git

On Sun, Feb 07, 2010 at 04:36:59PM +1100, Jon Seymour wrote:

> As I see it, the existing use of notes is a special instance of a more
> general metadata capability in which the metadata is constrained to be
> a single blob. If notes continued to be constrained in this way, there
> is no reason to change anything with respect to its current userspace
> behaviour. That said, most of the plumbing which enabled notes could
> be generalized to enable the arbitrary tree case [ which admittedly, I
> have yet to sell successfully !]

I do agree that storing trees is a natural generalization of the current
notes implementation. Callers have to be made aware that they may see
trees, of course, but you could probably "demote" trees into their
representative sha1s for callers who were interested only in a blob
form.

But what I am concerned with is that generalizing may violate some
assumptions made about how notes work. Notes trees can re-balance
themselves to some degree, I thought (though I am pretty out of the loop
on current notes developments). So during merges we need to normalize
tree representations (though we probably already need to do that for the
blob case).

We would also need to do some magic with rename detection during merges.
You would probably want rename detection _within_ a tree stored as a
note for a particular commit, but not between notes stored for different
commits.

Or perhaps you would not even want to do a tree-merge between notes at
all, and would rather see a conflict if two people noted two different
trees. This would make sense to me if you were doing something like
noting a build setup. If I note that commit X builds with a tree
pointing to version Y of the build tools, and you note that it builds
with version Z of the build tools, what should happen when we merge our
notes? I can imagine wanting a conflict, and resolving it to Y or Z
(perhaps whichever is more desirable). I can also see resolving it to Y
_and_ Z (iow, treating it like a list). But doing a merge on the two
trees of build tools (which are presumably somewhat immutable) is
probably not helpful.

Which to me argues in favor of adding the extra level of indirection.
The note should store the tree sha1, and those who want to treat it as a
tree can do so. Rename and merge issues just go away, as they operate on
the tree sha1 and not on the tree itself. And of course the
representation is just an implementation detail; you could still make a
"git metadata" wrapper to transparently store trees from the user's
perspective.

The only complication is that git doesn't know to follow those sha1s for
reachability analysis. In some cases that won't matter (like Junio's
html/man example), but I suspect in some it will. Perhaps there is some
way to flag the note entry as "this stores a sha1 that should be
followed by fsck, but not otherwise dereferenced". I dunno.

That is all just thinking out loud. It would help if we had some really
detailed concrete examples of notes being used in practice.

-Peff

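The indirection described above, a note blob that merely names a tree, leaves the reachability question open. The following Python sketch shows one way a tool might collect the extra roots such notes imply. It assumes a purely hypothetical convention in which a note blob contains a line of the form `tree <40-hex sha1>`; neither that convention nor the `referenced_trees` helper exists in git.

```python
import re
from typing import Dict, Set

# Hypothetical convention: a note blob may contain a line "tree <40-hex sha1>".
TREE_LINE = re.compile(r"^tree ([0-9a-f]{40})$", re.MULTILINE)


def referenced_trees(notes: Dict[str, str]) -> Set[str]:
    """Collect the tree sha1s named inside note blobs.

    `notes` maps an annotated object's sha1 to the text of its note.
    The returned set is what a reachability walk would need to treat
    as additional roots so that the trees are not pruned.
    """
    roots: Set[str] = set()
    for blob in notes.values():
        roots.update(TREE_LINE.findall(blob))
    return roots


notes = {
    "a" * 40: "built with toolchain Y\ntree " + "b" * 40 + "\n",
    "c" * 40: "plain annotation, no tree reference",
}
extra_roots = referenced_trees(notes)
```

Because merges then operate on the note text, two people naming different trees for the same commit produce an ordinary blob conflict, which matches the "resolve to Y or Z" outcome discussed in the message.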
* Re: A generalization of git notes from blobs to trees - git metadata?
  2010-02-07 19:33 ` Jeff King
@ 2010-02-07 20:25   ` Junio C Hamano
  2010-02-08 2:03     ` Steven E. Harris
  2010-02-10 5:09     ` Jeff King
  0 siblings, 2 replies; 19+ messages in thread
From: Junio C Hamano @ 2010-02-07 20:25 UTC
To: Jeff King; +Cc: Jon Seymour, Johan Herland, git

Jeff King <peff@peff.net> writes:

> Or perhaps you would not even want to do a tree-merge between notes at
> all, and would rather see a conflict if two people noted two different
> trees.

I've been thinking about the merge issues, and am starting to suspect that
we might want a merge strategy quite drastically different even for blob
cases. That is one of the reasons why I don't want to see us muddy the
issues by introducing an even more complex "tree" case.

Anybody working in the same project can start a 'notes' tree with his or
her own root. That is the normal use case for annotating commits for your
own use.

For merges inside the history of primary contents that people try to
collaborate to advance, three-way merge pivoting on a common ancestor is a
natural way to reach a satisfactory result.

In notes namespace, on the other hand, the norm is to simply overlay the
notes trees, adjusting for the fan-out. You annotated that commit I was
not interested in, while I annotated this commit you weren't interested
in. We have our notes in the end result, and both of us are happy. If we
happen to have annotated the same commit without knowing what the other
was doing, then there is no sane consolidation---in the most typical case,
we would want to keep both, perhaps concatenating them together. Textual
merge becomes the exception, triggered only when two "notes" histories
happen to have forked from the same root somehow.

And for that most typical use case, I suspect even the current "notes on
any and all commits for a single purpose are thrown into one _bag_ that
is a notes tree, and the growth of that bag is made into a history" model
captures sets of notes that are too wide.

Suppose Alice, Bob and I are involved in a project, and we annotate
commits for some shared purpose (say, tracking regressions). Alice and
Bob may independently annotate overlapping set of commits (and hopefully
they have shared root for their notes history as they are collaborating),
and they may even be working together on the same issue, but I may not be
involved in the area. What happens when I pull from Alice and Bob and get
conflicts in notes they produced, especially when the only reason I was
interested was because they have new things to say about commits that I am
interested in?

You can end up with conflicts in areas you are not familiar with but Alice
and Bob are in charge of even in the primary content space, but there is a
fundamental difference of this type of conflict in the notes space, I
think. The set of contents in the primary content space are supposed to
make a consistent whole, and there is a topic branch workflow to partition
the work to allow me to easily kick the merge back to them (i.e. I can
tell Alice and Bob to resolve the conflicts between themselves and trust
that what they do between them does not touch outside of their area)
without getting blocked.

I don't see a clear workflow to resolve this in the notes space,
especially with the set of operations the current "git notes" offers (and
obvious and straightforward enhancements of what it does). At least not
yet.

It's like "keeping track of /etc" (or "your home directory"). It is a
misguided thing to do because you are throwing in records of the states of
totally unrelated things into a single history (e.g. "Why does it matter I
added new user frotz to /etc/passwd before I futzed with my sendmail
configuration? ---It shouldn't matter; there shouldn't be ancestry
relationships between these two changes"). I somehow feel that keeping
track of the "growth of the bag of annotations to any and all commits" in
a single history may be making the same mistake.

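The "overlay, adjusting for the fan-out, and concatenate on collision" merge described above can be sketched in a few lines of Python. This is only an illustration on dictionaries keyed by note path; the `normalize` and `overlay_notes` names and the newline separator are assumptions made for the example, not git's notes merge code.

```python
from typing import Dict


def normalize(path: str) -> str:
    """Undo fan-out: 'de/adbeef' and 'deadbeef' name the same object."""
    return path.replace("/", "")


def overlay_notes(ours: Dict[str, str], theirs: Dict[str, str],
                  separator: str = "\n") -> Dict[str, str]:
    """Overlay two notes trees without consulting any common ancestor.

    Notes on different objects are simply unioned. When both sides
    annotated the same object with different text, both are kept by
    concatenating them (ours first).
    """
    merged = {normalize(path): text for path, text in ours.items()}
    for path, text in theirs.items():
        key = normalize(path)
        if key not in merged:
            merged[key] = text
        elif merged[key] != text:
            merged[key] = merged[key].rstrip("\n") + separator + text
    return merged


mine = {"de/adbeef": "regressed in 1.6.6"}
yours = {"deadbeef": "bisected to cafebabe", "cafebabe": "the culprit"}
combined = overlay_notes(mine, yours)
```

Here `combined` keeps both annotations on `deadbeef` and carries over the note on `cafebabe` untouched. Note that no history is consulted at all, which is the sense in which this differs from a three-way merge.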
* Re: A generalization of git notes from blobs to trees - git metadata?
  2010-02-07 20:25 ` Junio C Hamano
@ 2010-02-08 2:03   ` Steven E. Harris
  2010-02-10 5:09   ` Jeff King
  1 sibling, 0 replies; 19+ messages in thread
From: Steven E. Harris @ 2010-02-08 2:03 UTC
To: git

Junio C Hamano <gitster@pobox.com> writes:

> It's like "keeping track of /etc" (or "your home directory"). It is a
> misguided thing to do because you are throwing in records of the
> states of totally unrelated things into a single history.

I've recently tried doing this again with Git, so this comment piqued my
interest. (That is, tracking changes to my various configuration files.)
I agree that browsing the history in toto is jarring, though the history
of a particular file may be telling.

Is there an alternative you'd recommend?

--
Steven E. Harris

* Re: A generalization of git notes from blobs to trees - git metadata?
  2010-02-07 20:25 ` Junio C Hamano
  2010-02-08 2:03   ` Steven E. Harris
@ 2010-02-10 5:09   ` Jeff King
  2010-02-10 5:23     ` Junio C Hamano
  1 sibling, 1 reply; 19+ messages in thread
From: Jeff King @ 2010-02-10 5:09 UTC
To: Junio C Hamano; +Cc: Jon Seymour, Johan Herland, git

On Sun, Feb 07, 2010 at 12:25:13PM -0800, Junio C Hamano wrote:

> Suppose Alice, Bob and I are involved in a project, and we annotate
> commits for some shared purpose (say, tracking regressions). Alice and
> Bob may independently annotate overlapping set of commits (and hopefully
> they have shared root for their notes history as they are collaborating),
> and they may even be working together on the same issue, but I may not be
> involved in the area. What happens when I pull from Alice and Bob and get
> conflicts in notes they produced, especially when the only reason I was
> interested was because they have new things to say about commits that I am
> interested in?

Hmm. OK, I see the point of Jakub's message a bit more now. You want to
create a new view, inconsistent with that of either Alice or Bob (that
is, you have taken snippets of each's state, but you cannot in good
faith represent this as a history merge, because your state should not
supersede either of theirs).

The standard way to do such a thing in git is to create a new, alternate
history through cherry-picking or rebasing. So I suspect we could do
something like:

  1. git notes pull alice

     We fast-forward (or do the trivial merge) with Alice's work.

  2. git notes pull --ignore-conflicts bob

     We try to merge Bob's work and see that there are conflicts. So we
     iterate through refs/notes..bob/notes, cherry-picking each one that
     applies cleanly and ignoring the rest.

And then you're at a state inconsistent with Bob, and a superset of what
Alice has. And that's what your history represents, too: you've branched
but done some of the same things as Bob. At that point you can examine
your inconsistent state, and then when you're done, you can either:

  3a. Reset back to your pre-ignore-conflicts state.

  3b. Leave it. When you pull from Bob later, your shared changes will
      be ignored[1], and you will get the conflicts that you ignored
      earlier.

It is perhaps a hacky band-aid to handle notes this way, but it is the
"most git" way of doing it. That is, it uses our standard tools and
practices. And when all you have is a hammer... :)

And I really expect the "I am collaborating with these people, but I
want an inconsistent view of their history" to be the exception. Most
people would _want_ to resolve the conflicts (especially if there is a
--cat-conflicts option to do it automatically) in a collaboration
scenario.

-Peff

[1] Actually because history has diverged, you have the usual cherry
pick problems with merging later. If some note is at state A, then I
cherry-pick Bob's change to B, then Bob changes it to C and I try to
merge with him, from the 3-way merge's perspective we have a conflict,
because nothing in the history says that Bob's change to C meant to
supersede my cherry-picked version of his history.

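The "cherry-pick what applies cleanly, ignore the rest" step sketched in the message above can be modelled in Python as follows. The `git notes pull --ignore-conflicts` command is hypothetical in the thread itself, and so is everything here: the `Change` record, the `pick_clean` helper and the rule that a change applies cleanly when our current note equals the state the change started from.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Change(NamedTuple):
    """One note change from the other side's history (illustrative model)."""
    sha1: str
    old: Optional[str]   # note text before the change, None if absent
    new: Optional[str]   # note text after the change, None if removed


def pick_clean(ours: Dict[str, str],
               changes: List[Change]) -> Tuple[Dict[str, str], List[Change]]:
    """Apply each change whose starting state matches ours; skip the rest."""
    result = dict(ours)
    skipped: List[Change] = []
    for change in changes:
        current = result.get(change.sha1)
        if current == change.old:
            # Applies cleanly: our note is exactly what the change expects.
            if change.new is None:
                result.pop(change.sha1, None)
            else:
                result[change.sha1] = change.new
        elif current == change.new:
            continue  # we already have the same result, nothing to do
        else:
            skipped.append(change)  # conflict, ignored for now
    return result, skipped


ours = {"aaa": "alice: looks fine", "bbb": "shared base text"}
bobs_changes = [
    Change("bbb", "shared base text", "bob: reworded"),
    Change("aaa", None, "bob: looks broken"),
    Change("ccc", None, "bob: new note"),
]
state, ignored = pick_clean(ours, bobs_changes)
```

After this run `state` is a superset of Alice's notes plus Bob's non-conflicting changes, and `ignored` holds the one change that collided. Those are the conflicts that would resurface on a later real merge, as footnote [1] in the message points out.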
* Re: A generalization of git notes from blobs to trees - git metadata?
From: Junio C Hamano @ 2010-02-10 5:23 UTC
To: Jeff King; +Cc: Junio C Hamano, Jon Seymour, Johan Herland, git

Jeff King <peff@peff.net> writes:

> On Sun, Feb 07, 2010 at 12:25:13PM -0800, Junio C Hamano wrote:
>
>> Suppose Alice, Bob and I are involved in a project, and we annotate
>> commits for some shared purpose (say, tracking regressions). Alice and
>> Bob may independently annotate overlapping set of commits (and hopefully
>> they have shared root for their notes history as they are collaborating),
>> and they may even be working together on the same issue, but I may not be
>> involved in the area. What happens when I pull from Alice and Bob and get
>> conflicts in notes they produced, especially the only reason I was
>> interested was because they have new things to say about commits that I am
>> interested in?
>
> Hmm. OK, I see the point of Jakub's message a bit more now. You want to
> create a new view, inconsistent with that of either Alice or Bob (that
> is, you have taken snippets of each's state, but you cannot in good
> faith represent this as a history merge, because your state should not
> supersede either of theirs).

In the message you are quoting, I am not interested in creating a narrowed
view. If I cannot resolve conflicts between Alice and Bob in a merge in
the contents space, I would ask either of them (because they are more
familiar with the area) to do the merge. I however was unsure if asking
the same for merges in the notes space is a reasonable thing to do.
* Re: A generalization of git notes from blobs to trees - git metadata?
From: Jeff King @ 2010-02-10 5:29 UTC
To: Junio C Hamano; +Cc: Jon Seymour, Johan Herland, git

On Tue, Feb 09, 2010 at 09:23:12PM -0800, Junio C Hamano wrote:

> > Hmm. OK, I see the point of Jakub's message a bit more now. You want to
> > create a new view, inconsistent with that of either Alice or Bob (that
> > is, you have taken snippets of each's state, but you cannot in good
> > faith represent this as a history merge, because your state should not
> > supersede either of theirs).
>
> In the message you are quoting, I am not interested in creating a narrowed
> view. If I cannot resolve conflicts between Alice and Bob in a merge in
> the contents space, I would ask either of them (because they are more
> familiar with the area) to do the merge. I however was unsure if asking
> the same for merges in the notes space is a reasonable thing to do.

No, I don't see a problem with asking them to do it. If you are all
collaborating as a group, it is something they will need to do
eventually anyway. If they are not, and you are an intermediary, you are
eventually going to share Alice's history with Bob and vice versa. So
you pull from Alice, then say to Bob: "I have some history but I'm not
sure of the correct merge. Pull from me and merge please".

The only real problem is if you _never_ want to share the history
between the two of them. In that case, I think you should keep two
parallel branches of history (refs/notes/alice and refs/notes/bob), and
then squash the trees at run-time (either concatenating them, or
favoring one over the other in the case of conflicts).

-Peff
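The run-time "squash" of two parallel notes histories could look roughly like the sketch below. It models each notes ref as a plain mapping from object sha1 to note text. The function name combined_view and the strategy names "concat" and "prefer-alice" are assumptions made for illustration, not existing git options.

```python
def combined_view(alice, bob, strategy="concat"):
    """Combine two {sha1: note text} maps at read time.

    Neither input is modified and no merged history is recorded; the
    result is only a view. Conflicting notes are either concatenated
    ("concat") or resolved in Alice's favor ("prefer-alice").
    """
    if strategy not in ("concat", "prefer-alice"):
        raise ValueError(f"unknown strategy: {strategy}")
    view = {}
    for sha1 in sorted(set(alice) | set(bob)):
        a, b = alice.get(sha1), bob.get(sha1)
        if a is None:
            view[sha1] = b              # only Bob annotated this object
        elif b is None or a == b:
            view[sha1] = a              # only Alice did, or both agree
        elif strategy == "concat":
            view[sha1] = a + "\n" + b   # show both notes
        else:
            view[sha1] = a              # favor Alice on conflict
    return view


alice = {"c1": "regression starts here", "c2": "looks fine to me"}
bob = {"c2": "breaks on big-endian", "c3": "fixed by c4"}
for sha1, text in combined_view(alice, bob).items():
    print(sha1, repr(text))
```

Since the two refs are never merged, Alice and Bob each keep their own history untouched, and the viewer chooses the combination policy per invocation.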
* Re: A generalization of git notes from blobs to trees - git metadata?
From: Junio C Hamano @ 2010-02-07 18:48 UTC
To: Jeff King; +Cc: Johan Herland, Jon Seymour, git

Jeff King <peff@peff.net> writes:

> ... I think you would do better
> to simply store a tree sha1 inside the note blob, and callers who were
> interested in the tree contents could then dereference it and examine as
> they saw fit. The only caveat is that you need some way of telling git
> that the referenced trees are reachable and not to be pruned.

Thanks for a good summary. To paraphrase the idea, for the "pre-built
binaries" use case, I could update the dodoc.sh script (in 'todo'---that
is what autobuilds the html and man documentation and updates the
corresponding branches at k.org when I push things out to the master
branch) to add a note to the commit from 'master' the docs are generated
from, and the note would say which commits on html and man branches
correspond to that commit. That way, the referenced "trees" are of course
protected because they are reachable from html/man refs.

Right?
* Re: A generalization of git notes from blobs to trees - git metadata?
From: Jeff King @ 2010-02-07 19:18 UTC
To: Junio C Hamano; +Cc: Johan Herland, Jon Seymour, git

On Sun, Feb 07, 2010 at 10:48:58AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
>
> > ... I think you would do better
> > to simply store a tree sha1 inside the note blob, and callers who were
> > interested in the tree contents could then dereference it and examine as
> > they saw fit. The only caveat is that you need some way of telling git
> > that the referenced trees are reachable and not to be pruned.
>
> Thanks for a good summary. To paraphrase the idea, for the "pre-built
> binaries" use case, I could update the dodoc.sh script (in 'todo'---that
> is what autobuilds the html and man documentation and updates the
> corresponding branches at k.org when I push things out to the master
> branch) to add a note to the commit from 'master' the docs are generated
> from, and the note would say which commits on html and man branches
> correspond to that commit. That way, the referenced "trees" are of course
> protected because they are reachable from html/man refs.
>
> Right?

Yeah, I think that would work fine. I guess there are cases, though,
where somebody might not be keeping a linear history of noted trees in a
separate ref (the way you keep html/man refs). In which case they would
have to deal with the reachability problem separately. I can't think of
an example off the top of my head, though.

-Peff
* Re: A generalization of git notes from blobs to trees - git metadata?
From: Johan Herland @ 2010-02-07 22:46 UTC
To: Jeff King; +Cc: Junio C Hamano, Jon Seymour, git

On Sunday 07 February 2010, Jeff King wrote:
> I think I may have been the one to suggest trees or notes at one point.
> But let me clarify that this is not exactly what the OP is proposing in
> this thread.
>
> My suggestion was that some use cases may have many key/value pairs of
> notes for a single sha1. We basically have two options:
>
>   1. store each in a separate notes ref, with each sha1 mapping to
>      a blob. The note "name" is the name of the ref.
>
>   2. store notes in a single notes ref, with each sha1 mapping to a
>      tree with named sub-notes. The note "name" is the combination of
>      ref-name and tree entry name.
>
> The advantage of (1) is that notes are not bound tightly to each other.
> I can distribute the notes tree for one "name" independent of the
> others. The advantage of (2) is that it is faster and smaller. In (1),
> each note has a separate index, and we must traverse each note index
> separately.
>
> In practice, I would expect to use (1) for logically separate datasets.
> For example, automatic bug-tracking notes would go in a different ref
> from human annotations. But I would expect to use (2) if I had, say, 5
> different pieces of bug tracking information and I wanted an easy way to
> refer to them individually.
>
> And a specialized merge for that is straightforward. In the simplest
> case, you simply say "notes of this ref are tree-type, or they are
> blob-type" and then you have no merge problems. But if you want to get
> fancy, you can say that a conflict between "sha1/blob" and
> "sha1/tree/key" should automatically "promote" the first one into
> "sha1/tree/default" or some other canonical name.
>
> Note that all of this is my pie-in-the-sky "here is what I was thinking
> of when I looked at notes a long time ago". I don't care strongly if it
> gets implemented or not at this point; I just wanted to add some context
> to what Johan had in his notes todo list (or maybe I am wrong, and what
> is in his todo list was based on something totally different said by
> somebody else, and I have just confused the issue more. :) ).

No, my TODO item was indeed based on your suggestion (although poorly
represented by me, both in the TODO list, and in my original answer to
Jon). However, note that I don't feel this specific itch myself, so I'm
unlikely to scratch it.

> With respect to the idea of storing an arbitrary tree, I agree it is
> probably too complex with respect to merging. In addition, it makes
> things like "git log --format=%N" confusing. I think you would do better
> to simply store a tree sha1 inside the note blob, and callers who were
> interested in the tree contents could then dereference it and examine as
> they saw fit. The only caveat is that you need some way of telling git
> that the referenced trees are reachable and not to be pruned.

Agreed. Arbitrary trees as notes objects is probably not a good idea.

...Johan

--
Johan Herland, <johan@herland.net>
www.herland.net
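The "promote" idea quoted above can be sketched as follows. Here a blob-type note is modeled as a str and a tree-type note as a dict of named sub-notes. The helper names as_tree and merge_mixed are hypothetical, "default" is the canonical entry name suggested in the quoted text, and the merge is a simplified two-way merge with no common ancestor, so it is an illustration rather than a proposal for how git should do it.

```python
def as_tree(note):
    """Promote a blob-type note (str) to a tree holding one 'default' entry.

    Tree-type notes (dict of sub-note name -> text) are returned as a copy.
    """
    if isinstance(note, dict):
        return dict(note)
    return {"default": note}


def merge_mixed(ours, theirs):
    """Merge two notes for the same sha1; either may be a blob or a tree.

    Returns (merged_tree, conflicting_keys). The result is always
    tree-type. On a per-key conflict our value is kept and the key is
    reported so that a caller can resolve it.
    """
    merged = as_tree(ours)
    conflicts = []
    for key, value in as_tree(theirs).items():
        if key in merged and merged[key] != value:
            conflicts.append(key)
        else:
            merged[key] = value
    return merged, conflicts


# One side wrote a plain blob note, the other a tree with a named sub-note.
print(merge_mixed("seen since v1.6", {"fixes": "deadbeef"}))
```

Promoting the blob first turns a type conflict (blob versus tree) into an ordinary per-key comparison, which only conflicts when both sides define the same sub-note differently.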
* Re: A generalization of git notes from blobs to trees - git metadata?
From: Jon Seymour @ 2010-02-07 3:27 UTC
To: Johan Herland; +Cc: git, Junio C Hamano

> I still don't see why this provides anything that isn't already supported by
> either using 'git tag', or by implementing support for notes-as-trees in the
> notes feature.
>

The intent of the metadata facility is to associate derivatives of
sha1 with the sha1 itself. If I have calculated a derivative of sha1
in the past, then let me reference that derivative using a metadata
path which I can look up knowing only the sha1 of the input and
nothing more. Yes, I could create tags of the form
${sha1}/metadata-path for all my derived results but really, this
seems an abuse of the tag facility.

Here's another motivating example:

Suppose git-svn wrote the SVN id it was synched with into structured
metadata associated with a commit, instead of into the commit message,
the equivalent of:

    echo ${svn-id} | git metadata write-blob ${sha1} svn-id

Which means: for the specified sha1, read a blob from stdin and create
a metadata item with a metadata path called svn-id

To get it out again, you would write:

    git metadata read-blob ${sha1} svn-id

Which says, for the given object ${sha1}, read the blob from the
metadata tree at path svn-id and write its contents to stdout.

This would avoid cluttering the commit message with the svn-id, avoid
cluttering the tag space with the info and allow any commit to be
tagged in this way.

Admittedly similar function could be achieved a little more clumsily
now with appropriate use of GIT_NOTES_REF or with note subtrees, but I
share Junio's reservations about trying to generalize notes from
blobs to trees, given the way notes are currently used by the rest of
the infrastructure.

jon.
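The proposed write-blob and read-blob operations amount to a lookup keyed by (object sha1, metadata path). The toy in-memory model below makes that data model concrete. The class MetadataStore and its methods are hypothetical, since no "git metadata" command exists, and the sha1 and svn id values are made up.

```python
class MetadataStore:
    """Toy model of the proposal: object sha1 -> {metadata path -> bytes}."""

    def __init__(self):
        self._items = {}

    def write_blob(self, sha1, path, data):
        """Bind blob contents to a metadata path of the given object."""
        self._items.setdefault(sha1, {})[path] = data

    def read_blob(self, sha1, path):
        """Return the blob at a metadata path; raises KeyError if absent."""
        return self._items[sha1][path]


store = MetadataStore()
# Rough equivalent of: echo ${svn-id} | git metadata write-blob ${sha1} svn-id
store.write_blob("deadbeef", "svn-id", b"r1234\n")
# Rough equivalent of: git metadata read-blob ${sha1} svn-id
print(store.read_blob("deadbeef", "svn-id"))
```

The point of the model is that a lookup needs only the input sha1 and a path, with no tag or ref name to remember.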
* Re: A generalization of git notes from blobs to trees - git metadata?
From: Jon Seymour @ 2010-02-07 4:32 UTC
To: Johan Herland; +Cc: git, Junio C Hamano

Another use case could be to store the contents of the man and html
trees of git, which are currently published as separate branches.

With the metadata concept, the man and html trees for each release
could be stored as metadata paths (/man, /html) of the associated
commit for each release, providing a trivial way to address and access
these trees.

jon.

On Sun, Feb 7, 2010 at 2:27 PM, Jon Seymour <jon.seymour@gmail.com> wrote:
>> I still don't see why this provides anything that isn't already supported by
>> either using 'git tag', or by implementing support for notes-as-trees in the
>> notes feature.
>>
>
> The intent of the metadata facility is to associate derivatives of
> sha1 with the sha1 itself. If I have calculated a derivative of sha1
> in the past, then let me reference that derivative using a metadata
> path which I can look up knowing only the sha1 of the input and
> nothing more. Yes, I could create tags of the form
> ${sha1}/metadata-path for all my derived results but really, this
> seems an abuse of the tag facility.
>
> Here's another motivating example:
>
> Suppose git-svn wrote the SVN id it was synched with into structured
> metadata associated with a commit, instead of into the commit message,
> the equivalent of:
>
> echo ${svn-id} | git metadata write-blob ${sha1} svn-id
>
> Which means: for the specified sha1, read a blob from stdin and create
> a metadata item with a metadata path called svn-id
>
> To get it out again, you would write:
>
> git metadata read-blob ${sha1} svn-id
>
> Which says, for the given object ${sha1}, read the blob from the
> metadata tree at path svn-id and write its contents to stdout.
>
> This would avoid cluttering the commit message with the svn-id, avoid
> cluttering the tag space with the info and allow any commit to be
> tagged in this way.
>
> Admittedly similar function could be achieved a little more clumsily
> now with appropriate use of GIT_NOTES_REF or with note subtrees, but I
> share Junio's reservations about trying to generalize notes from
> blobs to trees, given the way notes are currently used by the rest of
> the infrastructure.
>
> jon.
>
end of thread, other threads:[~2010-02-10 5:30 UTC | newest]

Thread overview: 19+ messages
2010-02-06 13:32 A generalization of git notes from blobs to trees - git metadata? Jon Seymour
2010-02-07  1:36 ` Johan Herland
2010-02-07  2:21 ` Junio C Hamano
2010-02-07  5:02 ` Jeff King
2010-02-07  5:36 ` Jon Seymour
2010-02-07  9:15 ` Jakub Narebski
2010-02-07  9:41 ` Jon Seymour
2010-02-07 10:15 ` Jon Seymour
2010-02-07 19:33 ` Jeff King
2010-02-07 20:25 ` Junio C Hamano
2010-02-08  2:03 ` Steven E. Harris
2010-02-10  5:09 ` Jeff King
2010-02-10  5:23 ` Junio C Hamano
2010-02-10  5:29 ` Jeff King
2010-02-07 18:48 ` Junio C Hamano
2010-02-07 19:18 ` Jeff King
2010-02-07 22:46 ` Johan Herland
2010-02-07  3:27 ` Jon Seymour
2010-02-07  4:32 ` Jon Seymour