* Storing additional information in commit headers
@ 2011-08-01 18:20 martin f krafft
  2011-08-01 18:27 ` Sverre Rabbelier
                   ` (3 more replies)
  0 siblings, 4 replies; 23+ messages in thread
From: martin f krafft @ 2011-08-01 18:20 UTC (permalink / raw)
  To: git discussion list; +Cc: Petr Baudis, Clemens Buchacher

Dear list,

I've read, with great interest, the recent discussion on generation
numbers[0], mostly because Clemens Buchacher pointed me to it as
a warning not to mess with commit objects.

0. http://comments.gmane.org/gmane.comp.version-control.git/177146

My intent was to add an extra commit header to select commits as
a way to store extra information needed to automate the management
of interdependent branches and patch generation à la TopGit.

Having read the generation numbers debate, I am not sure that adding
additional commit headers is a bad idea per se. From what
I understand, the main pushback to Linus' idea was that people did
not feel it right to store redundant, calculable information
permanently in commit objects, where it cannot be altered anymore,
despite the non-zero chance of there being an error. Instead, the
use of a cache was advocated.

I do not want to take a side in this debate with this mail of mine.

Instead, I am investigating ways in which I can store additional
information for a branch, and ideally in a way to make it
transparent and automatic for all users of a project's repo.

Hence, if I were to store additional information in the commit
object headers, this information would by design be correct,
immutable, and non-redundant. I am going to reply to my own mail
with some implementation details to feed the curious, with the hope
to keep this debate focused.

Are there any strong reasons against my use of commit headers for
specific, well-defined purposes in contained use-cases? E.g. are
there tools known to only copy "known" headers, which could
potentially break my assumptions?

Thanks,

--
martin | http://madduck.net/ | http://two.sentenc.es/

"when a gentoo admin tells me that the KISS principle is good for
 'busy sysadmins', and that it's not an evolutionary step backwards,
 i wonder whether their tape is already running backwards."

spamtraps: madduck.bogus@madduck.net
* Re: Storing additional information in commit headers
From: Sverre Rabbelier @ 2011-08-01 18:27 UTC
  To: martin f krafft; +Cc: git discussion list, Petr Baudis, Clemens Buchacher

Heya,

On Mon, Aug 1, 2011 at 20:20, martin f krafft <madduck@madduck.net> wrote:
> My intent was to add an extra commit header to select commits as
> a way to store extra information needed to automate the management
> of interdependent branches and patch generation à la TopGit.

Have you had a look at git notes?

--
Cheers,

Sverre Rabbelier
* Re: Storing additional information in commit headers
From: martin f krafft @ 2011-08-01 18:34 UTC
  To: Sverre Rabbelier, git discussion list, Petr Baudis, Clemens Buchacher

also sprach Sverre Rabbelier <srabbelier@gmail.com> [2011.08.01.2027 +0200]:
> On Mon, Aug 1, 2011 at 20:20, martin f krafft <madduck@madduck.net> wrote:
> > My intent was to add an extra commit header to select commits as
> > a way to store extra information needed to automate the management
> > of interdependent branches and patch generation à la TopGit.
>
> Have you had a look at git notes?

Hello, and thanks for taking the time to reply to me!

Yes, I have considered git-notes. The issue I have with git-notes is
that it requires every contributor to set up refspecs for fetch and
push, or else the notes will not be exchanged/shared. I realise this
is a minor concern to most of you, or maybe even a feature (part of
the beauty of Git is, after all, that it works without requiring
everyone to have the same local setup), but in our use-case (distro
packaging), it's a relatively large burden to new contributors and
passers-by.

Also, git-notes are mutable (at least from the UI perspective) and
I strive to encode information immutably.

Therefore I am looking for a means to encode this (necessary)
information as part of the main DAG (i.e. not polluting the
worktree).

I hope this makes sense.

--
martin | http://madduck.net/ | http://two.sentenc.es/

"first get your facts; then you can distort them at your leisure."
                                                       -- mark twain

spamtraps: madduck.bogus@madduck.net
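The setup burden described in the message above can be made concrete.
Notes live under refs/notes/, which a default clone neither fetches nor
pushes. What follows is a minimal sketch of the commands each
contributor would have to run, not a recommendation. The remote name
"origin" and the helper name notes_sync_commands are illustrative
assumptions and do not come from this thread.

```python
def notes_sync_commands(remote="origin"):
    """Return the git commands (as argv lists) that make notes travel.

    A default clone fetches branches (refs/heads/*) and follows tags,
    so refs/notes/* stays local unless it is asked for explicitly.
    The remote name is an assumption made for this sketch.
    """
    refspec = "refs/notes/*:refs/notes/*"
    return [
        # Fetch all notes refs on every later "git fetch <remote>".
        ["git", "config", "--add", f"remote.{remote}.fetch", "+" + refspec],
        # Push the notes refs explicitly.
        ["git", "push", remote, refspec],
    ]


if __name__ == "__main__":
    for argv in notes_sync_commands():
        print(" ".join(argv))
```

Every contributor has to repeat the first command in every clone, which
is the "relatively large burden" the message refers to.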
* Re: Storing additional information in commit headers
From: Clemens Buchacher @ 2011-08-01 20:01 UTC
  To: martin f krafft; +Cc: Sverre Rabbelier, git discussion list, Petr Baudis

On Mon, Aug 01, 2011 at 08:34:11PM +0200, martin f krafft wrote:
>
> Yes, I have considered git-notes. The issue I have with git-notes is
> that it requires every contributor to set up refspecs for fetch and
> push, or else the notes will not be exchanged/shared.

Notes are tracked using a 'branch' too. It's just a branch in the
refs/notes namespace, the notes ref. You could simply tag your
notes ref or point a ref from the refs/heads namespace to it each
time you create new notes.

> Also, git-notes are mutable (at least from the UI perspective) and
> I strive to encode information immutably.

Notes are also used by textconv, for example, to cache immutable
data. It's not likely a user will end up editing it by accident
unless you use the default notes ref.

Clemens
* Re: Storing additional information in commit headers
From: martin f krafft @ 2011-08-01 20:55 UTC
  To: Clemens Buchacher, Sverre Rabbelier, git discussion list, Petr Baudis

also sprach Clemens Buchacher <drizzd@aon.at> [2011.08.01.2201 +0200]:
> Notes are tracked using a 'branch' too. It's just a branch in the
> refs/notes namespace, the notes ref. You could simply tag your
> notes ref or point a ref from the refs/heads namespace to it each
> time you create new notes.

Hi Clemens, thanks for responding!

You suggest integrating refs/notes/foo into refs/heads by means of
a pointer… at which point we are polluting the branch history space
again (think gitk), no?

I appreciate the simplicity of this idea of yours, which I had not
thought of. Indeed, maintaining a head at the top of
refs/notes/topgit-metadata (or whatever) has charm. I do not mean to
discard it at all right now, and will think about this more!

git-notes was designed to be used for such cases, and I was pleased
to note its configurability. Maybe it is the ticket.

Still: why not commit headers?

--
martin | http://madduck.net/ | http://two.sentenc.es/

... with a plastic cup filled with a liquid that was almost,
but not quite, entirely unlike tea.
           -- douglas adams, "the hitchhiker's guide to the galaxy"

spamtraps: madduck.bogus@madduck.net
* Re: Storing additional information in commit headers
From: martin f krafft @ 2011-08-01 18:28 UTC
  To: git discussion list; +Cc: Petr Baudis, Clemens Buchacher

also sprach martin f krafft <madduck@madduck.net> [2011.08.01.2020 +0200]:
> Hence, if I were to store additional information in the commit
> object headers, this information would by design be correct,
> immutable, and non-redundant. I am going to reply to my own mail
> with some implementation details to feed the curious, with the hope
> to keep this debate focused.

For lack of a better idea (cf. [0]), I am currently toying with the
following approach:

Possibly in addition to the orphan parent pointer to a commit object
suggested in [0], and in order to provide a clear means to identify
said orphan parent pointer (holding additional information), I am
considering storing this orphan parent commit's ref in the main
commit, using a header like x-topgit-top-base [1].

0. http://permalink.gmane.org/gmane.comp.version-control.git/178349
1. The use of the x- prefix is obviously intentional to suggest that
   this is a free-form, non-standard extension.

Whenever the extra data need changing, a new x-topgit-top-base ref
is added to HEAD.

Now, given a commitish, I simply have to walk back in time until
I find a commit object with such a header, and I have the most
recent metadata at my fingertips.

Instead of a ref to the orphan parent commit (which visibly
pollutes the history), I could also just store the information right
there.

This is arguably hackish, but unless I find a better way, it's the
best I've come up with thus far. And of course, this could go into
the commit message body text, but it being an implementation detail,
that's really not the right place for it.

Thanks for your consideration,

--
martin | http://madduck.net/ | http://two.sentenc.es/

"there are two major products that come out of berkeley: lsd and unix."
one caused me an addiction
                                                           -- fyodor

spamtraps: madduck.bogus@madduck.net
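The "walk back in time" lookup proposed in the message above can be
sketched in a few lines of Python. This is an illustration under
stated assumptions, not TopGit's or Git's implementation: commits are
modelled as raw text in a dictionary, the object ids ("c1", "b1", ...)
are toy placeholders rather than real SHA-1s, and the walk follows
first parents only. The header name x-topgit-top-base is the one
suggested in the message; the function names are made up for this
sketch.

```python
def parse_commit(raw):
    """Split the raw text of a commit object into (headers, message).

    Headers run up to the first blank line; a line that starts with
    a space continues the value of the previous header.
    """
    head, _, message = raw.partition("\n\n")
    headers = []
    for line in head.split("\n"):
        if not line:
            continue
        if line.startswith(" ") and headers:
            key, value = headers[-1]
            headers[-1] = (key, value + "\n" + line[1:])
        else:
            key, _, value = line.partition(" ")
            headers.append((key, value))
    return headers, message


def find_latest(commits, start, header="x-topgit-top-base"):
    """Follow first parents from `start` until a commit carries `header`.

    `commits` maps a commit id to its raw text (an in-memory stand-in
    for the object database). Returns (commit_id, value), or None if
    no commit on the first-parent chain has the header.
    """
    current = start
    while current is not None:
        headers, _ = parse_commit(commits[current])
        for key, value in headers:
            if key == header:
                return current, value
        parents = [value for key, value in headers if key == "parent"]
        current = parents[0] if parents else None
    return None


if __name__ == "__main__":
    history = {
        "c1": "tree t1\nauthor A\ncommitter A\nx-topgit-top-base b1\n\nfirst\n",
        "c2": "tree t2\nparent c1\nauthor A\ncommitter A\n\nsecond\n",
        "c3": "tree t3\nparent c2\nauthor A\ncommitter A\n\nthird\n",
    }
    print(find_latest(history, "c3"))
```

Because the parser keeps every header it sees, an unknown header
survives reading. Whether it survives tools that rewrite commits is a
separate matter, and is the question the original mail asks.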
* Re: Storing additional information in commit headers
From: Martin Langhoff @ 2011-08-01 19:33 UTC
  To: martin f krafft; +Cc: git discussion list, Petr Baudis, Clemens Buchacher

On Mon, Aug 1, 2011 at 2:28 PM, martin f krafft <madduck@madduck.net> wrote:
> For lack of a better idea (cf. [0]), I am currently toying with the
> following approach:

Hi Martin!

What data are you trying to include? Some time ago, I had similar
ideas to yours for a while... and it ended up being that all I needed
was to put the additional data /in a file/ and commit that file.

Speculation: you mention distro packaging, so I assume you're
improving the Debian packaging integration, with git tracking
debian/rules, perhaps with a wrapper.

If you are using a wrapper program, it is trivial to update this
"metadata" file, or to ensure it's valid/sane, in the preparations to
commit, perhaps ensuring that a pre-commit-hook script is in place and
executable.

hth,

m
--
 martin.langhoff@gmail.com
 martin@laptop.org -- Software Architect - OLPC
 - ask interesting questions
 - don't get distracted with shiny stuff - working code first
 - http://wiki.laptop.org/go/User:Martinlanghoff
* Re: Storing additional information in commit headers
From: martin f krafft @ 2011-08-01 20:51 UTC
  To: Martin Langhoff, git discussion list, Petr Baudis, Clemens Buchacher

also sprach Martin Langhoff <martin.langhoff@gmail.com> [2011.08.01.2133 +0200]:
> What data are you trying to include? Some time ago, I had similar
> ideas to yours for a while... and it ended up being that all I needed
> was to put the additional data /in a file/ and commit that file.

Hi, thanks for taking the time to reply to me!

I am trying to store the top-base of a TopGit branch, which is the
merge of all a branch's dependencies. TopGit uses refs for that, but
a ref can only ever point at one such merge, and so it's
hard-to-impossible to reconstruct a branch dependency in the past.

TopGit does use files in the worktree too. I would love to get rid
of this as well, since a file like .topmsg (which differs between
all branches, even related ones) requires always remembering to use
the 'ours' merge driver, which requires setup, which makes it harder
to use.

> If you are using a wrapper program,

I am trying to stay as close as possible to plain Git. All of this
could easily be done by a wrapper, but a wrapper always makes too
many assumptions to become a viable standard for Debian packaging.

> it's valid/sane, in the preparations to commit, perhaps ensuring
> that a pre-commit-hook script is in place and executable.

Again, that requires setup, which increases the barrier of entry to
passers-by and new contributors.

--
martin | http://madduck.net/ | http://two.sentenc.es/

"verbing weirds language."
                                                          -- calvin

spamtraps: madduck.bogus@madduck.net
* Re: Storing additional information in commit headers
From: Jeff King @ 2011-08-01 20:13 UTC
  To: martin f krafft; +Cc: git discussion list, Petr Baudis, Clemens Buchacher

On Mon, Aug 01, 2011 at 08:20:15PM +0200, martin f krafft wrote:

> Instead, I am investigating ways in which I can store additional
> information for a branch, and ideally in a way to make it
> transparent and automatic for all users of a project's repo.
>
> Hence, if I were to store additional information in the commit
> object headers, this information would by design be correct,
> immutable, and non-redundant. I am going to reply to my own mail
> with some implementation details to feed the curious, with the hope
> to keep this debate focused.
>
> Are there any strong reasons against my use of commit headers for
> specific, well-defined purposes in contained use-cases? E.g. are
> there tools known to only copy "known" headers, which could
> potentially break my assumptions?

This topic has come up several times in the past few years. I think some
of the relevant questions to consider about your new data are:

  1. Does git actually care about your data? E.g., would it want to use
     it for reachability analysis in git-fsck?

  2. Is it an immutable property of a commit, or can it be changed after
     the fact?

If (2) is no, then git-notes is probably the best choice.

Otherwise, if (1) is yes, then a commit header makes sense. But then, it
should also be something that git is taught about, and your commit
header should not be some topgit-specific thing, but a header showing
the generalized form.

Otherwise, the usual recommendation is to use a pseudo-header within the
body of the commit message (i.e., "Topgit-Base: ..." at the end of the
commit message). The upside is that it's easy to create, manipulate, and
examine using existing git tools. The downside is that it is something
that the user is more likely to see in "git log" or when editing a
rebased commit message.

Just about every discussion on this topic ends with the pseudo-header
recommendation. The only exceptions AFAIK are "encoding" (which git
itself needs to care about), and "generation" (which, as you noted,
raises other questions).

-Peff
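For illustration, here is roughly what reading such a pseudo-header
back out of a commit message could look like. This is a minimal sketch
under assumptions, not Git's own trailer parser: it treats the last
paragraph of the message as a pseudo-header block only if every line
in it has the form "Key: value", and the function name and the sample
values are hypothetical.

```python
import re

# "Key: value", where the key is made of letters, digits and dashes.
_PSEUDO_HEADER = re.compile(r"^([A-Za-z][A-Za-z0-9-]*): (.*)$")


def split_pseudo_headers(message):
    """Return (body, pseudo_headers) for a commit message.

    The last paragraph counts as a pseudo-header block only if the
    message has at least two paragraphs and every line of the last
    one looks like "Key: value". Repeated keys keep the last value.
    """
    text = message.rstrip("\n")
    paragraphs = text.split("\n\n")
    matches = [_PSEUDO_HEADER.match(line) for line in paragraphs[-1].split("\n")]
    if len(paragraphs) < 2 or not all(matches):
        return text, {}
    body = "\n\n".join(paragraphs[:-1])
    return body, {m.group(1): m.group(2) for m in matches}


if __name__ == "__main__":
    msg = (
        "Fix frobnication\n"
        "\n"
        "Longer explanation of the change.\n"
        "\n"
        "Topgit-Base: 1234abcd\n"
    )
    print(split_pseudo_headers(msg))
```

The sketch also shows the downside named in the message: the data sits
in plain text that any user can edit, so a stray character in the last
paragraph silently turns the metadata back into ordinary message body.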
* Re: Storing additional information in commit headers
From: martin f krafft @ 2011-08-01 21:11 UTC
  To: Jeff King, git discussion list, Petr Baudis, Clemens Buchacher

also sprach Jeff King <peff@peff.net> [2011.08.01.2213 +0200]:
> This topic has come up several times in the past few years.

I am sorry that I am bothering the list again. I tried hard to find
whatever I could, but after 2–3 hours of web searching, I came here…

Thank you for taking the time to answer!

> I think some
> of the relevant questions to consider about your new data are:
>
>   1. Does git actually care about your data? E.g., would it want to use
>      it for reachability analysis in git-fsck?
>
>   2. Is it an immutable property of a commit, or can it be changed after
>      the fact?

Excellent points, and I have answers to both:

  1. Ideally, I would like to point to another blob containing
     information. Right now, in order to prevent gc from pruning
     it, that would have to be a commit pointed to with a parent
     pointer, which is just not right (it's not a parent) and causes
     the commit to show up in the history (which it should not, as
     it's an implementation detail).

     I'll return to this point further down…

  2. It is immutable. Ideally, I would like to store extra
     information for a ref in refs/heads/*, but there seems to be no
     way of doing this. Hence, I need to store it in commits and
     backtrack for it. Or so I think, at least…

> Otherwise, if (1) is yes, then a commit header makes sense. But
> then, it should also be something that git is taught about, and
> your commit header should not be some topgit-specific thing, but
> a header showing the generalized form.

I agree entirely and would be all too excited to see this happening.
I already had ideas too:

In addition to the standard tree and parent pointers, there could
be *-ref and x-*-ref headers, which take a single ref argument,
presumably to a blob containing more data.

While I cannot conceive a *-ref example, I think it's obvious that
x-*-ref should be introduced at the same time to keep the *-ref
namespace clear for future, "official" Git use.

In terms of gc and fsck and the like, all *-ref and x-*-ref headers
would contribute to reachability tests and hence prevent pruning of
those blobs.

> Otherwise, the usual recommendation is to use a pseudo-header
> within the body of the commit message (i.e., "Topgit-Base: ..." at
> the end of the commit message). The upside is that it's easy to
> create, manipulate, and examine using existing git tools. The
> downside is that it is something that the user is more likely to
> see in "git log" or when editing a rebased commit message.

… to see *and to accidentally mess up*. And while that may even be
unlikely, it does expose information that really ought to be hidden.

> Just about every discussion on this topic ends with the
> pseudo-header recommendation. The only exceptions AFAIK are
> "encoding" (which git itself needs to care about), and
> "generation" (which, as you noted, raises other questions).

I can see how it's arguable too why one would want to give git
commit objects the ability to reference arbitrary blobs containing
additional information. I suppose the answer to this question is
related to the answer to the question of whether Git is
a contained/complete tool as-is, or also serves as
a "framework"/"toolkit" for advanced/creative use.

The availability of the porcelain commands seems to suggest that
extensible/flexible additional features should be welcome! ;)

--
martin;              (greetings from the heart of the sun.)
  \____ echo mailto: !#^."<*>"|tr "<*> mailto:" net@madduck

http://www.transnationalrepublic.org/

spamtraps: madduck.bogus@madduck.net
* Re: Storing additional information in commit headers
From: Jeff King @ 2011-08-02 3:50 UTC
  To: martin f krafft; +Cc: git discussion list, Petr Baudis, Clemens Buchacher

On Mon, Aug 01, 2011 at 11:11:04PM +0200, martin f krafft wrote:

> >   1. Does git actually care about your data? E.g., would it want to use
> >      it for reachability analysis in git-fsck?
> >
> >   2. Is it an immutable property of a commit, or can it be changed after
> >      the fact?
>
> Excellent points, and I have answers to both:
>
>   1. Ideally, I would like to point to another blob containing
>      information. Right now, in order to prevent gc from pruning
>      it, that would have to be a commit pointed to with a parent
>      pointer, which is just not right (it's not a parent) and causes
>      the commit to show up in the history (which it should not, as
>      it's an implementation detail).

In that case, notes sound like a nice solution, as that is exactly what
they do. Yes, they are mutable, but that might not be that big a deal.

>   2. It is immutable. Ideally, I would like to store extra
>      information for a ref in refs/heads/*, but there seems to be no
>      way of doing this. Hence, I need to store it in commits and
>      backtrack for it. Or so I think, at least…

Wait, so you want metadata on a _ref_, not on a commit? That is a very
different thing, I think. We usually accomplish that with data in
.git/config. Or if you need to push data between repos, or if it's too
big to easily fit in the config, then put it in a blob and keep a
parallel ref structure (e.g., refs/topgit/bases/refs/heads/master).

Or maybe I'm just misunderstanding.

> > Otherwise, if (1) is yes, then a commit header makes sense. But
> > then, it should also be something that git is taught about, and
> > your commit header should not be some topgit-specific thing, but
> > a header showing the generalized form.
>
> I agree entirely and would be all too excited to see this happening.
> I already had ideas too:
>
> In addition to the standard tree and parent pointers, there could
> be *-ref and x-*-ref headers, which take a single ref argument,
> presumably to a blob containing more data.

I'm not sure how well-defined that is, though. What does the ref mean?
What does it point to, and what is the meaning with respect to the
original commit? Or are you suggesting that "*" would be "topgit-base"
here, and that git core would understand only that any header matching
the pattern "x-*-ref" should be followed with respect to
reachability/pruning. Only the owner of the "*" part (topgit in this
case) would be able to make sense of the meaning of the ref.

If that is the case, that does make sense to me. It's basically an
immutable version of a note.

However, implementing such a thing would mean you have an awkward
transition period where some versions of git think the referenced
object is relevant, and others do not. That's something we can
overcome, but it's going to require code in git, and possibly a dormant
introduction period.

I suspect you would give git people more warm fuzzies about
implementing this by showing a system that is built on git-notes and
saying "this works really well, except that the external note storage
is not a good reason because { it's mutable, it's not efficient,
whatever other reason you find}". And then we know that the system is
proven to work, and that migrating the note-like structure into the
object is sensible.

But I get the impression you're one step back from that now. So it makes
sense to me to at least prototype it via git-notes, which will give you
the same semantic storage (a mapping of commits to some blobs, with
reachability handled automatically).

> > Otherwise, the usual recommendation is to use a pseudo-header
> > within the body of the commit message (i.e., "Topgit-Base: ..." at
> > the end of the commit message). The upside is that it's easy to
> > create, manipulate, and examine using existing git tools. The
> > downside is that it is something that the user is more likely to
> > see in "git log" or when editing a rebased commit message.
>
> … to see *and to accidentally mess up*. And while that may even be
> unlikely, it does expose information that really ought to be hidden.

I'm not quite sure what the information is, so I can't really judge. Do
you have a concrete example?

I got the impression earlier you were wanting to store a human-readable
text string. That makes a pseudo-header a reasonable choice. But if you
are going to reference some blob (which it seems from what you wrote
above), and you are interested in proper reachability analysis, then no,
it probably isn't a good idea.

> I can see how it's arguable too why one would want to give git
> commit objects the ability to reference arbitrary blobs containing
> additional information. I suppose the answer to this question is
> related to the answer to the question of whether Git is
> a contained/complete tool as-is, or also serves as
> a "framework"/"toolkit" for advanced/creative use.
>
> The availability of the porcelain commands seems to suggest that
> extensible/flexible additional features should be welcome! ;)

I think extensibility is welcome. It's just that most discussions so
far have ended up realizing that a new header would just be cruft. Maybe
yours is different. I'm still not 100% sure I understand what you want
to accomplish, but the idea of an x-*-ref header is a reasonable thing
for git to have.

-Peff
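The reachability rule discussed in the message above, and the
"awkward transition period" it mentions, can be modelled with a toy
object store. The sketch below is an assumption-laden illustration,
not how git-fsck or git-gc are implemented: commits are raw text in a
dictionary, ids are placeholders, and the header name
x-topgit-base-ref is only an example of the x-*-ref pattern.

```python
from fnmatch import fnmatchcase


def header_links(raw, follow_extension_refs):
    """Yield the object ids that a raw commit's headers point at."""
    head = raw.partition("\n\n")[0]
    for line in head.split("\n"):
        key, _, value = line.partition(" ")
        if key in ("tree", "parent"):
            yield value
        elif follow_extension_refs and fnmatchcase(key, "x-*-ref"):
            # The proposed rule: follow the pointer without knowing
            # what the "*" part means to its owner.
            yield value


def reachable(commits, tips, follow_extension_refs=True):
    """Return the set of object ids reachable from `tips`.

    `commits` maps commit ids to raw commit text; ids that are not in
    the map (trees, blobs) are treated as leaves.
    """
    seen, todo = set(), list(tips)
    while todo:
        oid = todo.pop()
        if oid in seen:
            continue
        seen.add(oid)
        if oid in commits:
            todo.extend(header_links(commits[oid], follow_extension_refs))
    return seen


if __name__ == "__main__":
    store = {
        "c1": "tree t1\nauthor A\n\ninit\n",
        "c2": "tree t2\nparent c1\nx-topgit-base-ref b1\nauthor A\n\nupdate\n",
    }
    print(sorted(reachable(store, ["c2"])))
    print(sorted(reachable(store, ["c2"], follow_extension_refs=False)))
```

With follow_extension_refs=False the blob "b1" drops out of the
reachable set. That models a git version that does not know about the
new header and would therefore consider the blob prunable.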
* Re: Storing additional information in commit headers 2011-08-02 3:50 ` Jeff King @ 2011-08-02 8:28 ` martin f krafft 2011-08-02 15:03 ` working prototype of orphan parent commits as datastores (was: Storing additional information in commit headers) martin f krafft 2011-08-02 18:51 ` Storing additional information in commit headers Jeff King 0 siblings, 2 replies; 23+ messages in thread From: martin f krafft @ 2011-08-02 8:28 UTC (permalink / raw) To: Jeff King, git discussion list, Petr Baudis, Clemens Buchacher [-- Attachment #1: Type: text/plain, Size: 9090 bytes --] also sprach Jeff King <peff@peff.net> [2011.08.02.0550 +0200]: > > 2. It is immutable. Ideally, I would like to store extra > > information for a ref in ref/heads/*, but there seems to be no > > way of doing this. Hence, I need to store it in commits and > > backtrack for it. Or so I think, at least… > > Wait, so you want metadata on a _ref_, not on a commit? That is a very > different thing, I think. We usually accomplish that with data in > .git/config. Or if you need to push data between repos, or if it's too > big to easily fit in the config, then put it in a blob and keep a > parallel ref structure (e.g., refs/topgit/bases/refs/heads/master). > > Or maybe I'm just misunderstanding. You nailed it perfectly well. Thank you for taking the time again to reply to me. TopGit does what you suggest (a parallel ref structure), but there are three problems with this, which I am trying to address: 1. you need to ensure that these refs are pushed and fetched, which requires set up and possible migration issues when things change, and can cause big problems for contributors who just so happened to forget. 2. the additional refs confuse people a lot — and I can attest to that because I have also at times found myself overwhelmed by them when staring at gitk. 3. 
once a ref updates, we need to keep a pointer to the previous
   location, since one of the goals is the ability to be able to
   return to a point in history (e.g. for security updates to
   a stable package, or backports). Additional refs enhance the
   aforementioned two problems.

Therefore I thought it would be sensible to store these data in
commits. When the data change, there will always be a new commit to
store these data, and we do *not* want to update the data in
previous commits. Finding the data then becomes backtracking the
branch history until a commit is found containing them.

> > In addition to the standard tree and parent pointers, there could
> > be *-ref and x-*-ref headers, which take a single ref argument,
> > presumably to a blob containing more data.
>
> I'm not sure how well-defined that is, though. What does the ref
> mean? What does it point to, and what is the meaning with respect
> to the original commit? Or are you suggesting that "*" would be
> "topgit-base" here, and that git core would understand only that
> any header matching the pattern "x-*-ref" should be followed with
> respect to reachability/pruning. Only the owner of the "*" part
> (topgit in this case) would be able to make sense of the meaning
> of the ref.

Exactly the latter. Sorry for my unannounced use of wildcards in
this context.

> If that is the case, that does make sense to me. It's basically an
> immutable version of a note.
>
> However, implementing such a thing would mean you have an awkward
> transition period where some versions of git think the referenced
> object is relevant, and others do not. That's something we can
> overcome, but it's going to require code in git, and possibly
> a dormant introduction period.

Indeed. This could be addressed by letting a tool like TopGit require
a minimum version of Git. For a while, this will burden developers,
but ensure that it works. Over time, this will cease to be
a problem.

> I suspect you would give git people more warm fuzzies about
> implementing this by showing a system that is built on git-notes
> and saying "this works really well, except that the external note
> storage is not a good reason because { it's mutable, it's not
> efficient, whatever other reason you find}". And then we know that
> the system is proven to work, and that migrating the note-like
> structure into the object is sensible.
>
> But I get the impression you're one step back from that now. So it makes
> sense to me to at least prototype it via git-notes, which will give you
> the same semantic storage (a mapping of commits to some blobs, with
> reachability handled automatically).

I appreciate how you are developing your reasoning, and the advice
you give. Indeed, I am already prototyping using git-notes, and
I designed the datastore to be extensible, so that I can use other
ways to find the data. Using pseudo-headers is another (temporary)
way to prove the concept works, but I am afraid that it will become
standard too quickly (because it's so easy), essentially preventing
progress into x-*-ref domain, or forcing us to carry compatibility
with us forever.

What do you think about using the idea of orphan parent commits
(OPC) for now? These are conceptually closest to the x-*-ref
pointers, do not require extra setup, pollute history only a little
bit (IMHO), and slot in with Git and fsck/gc alright.

Here's the idea again, graphically:

  o--o--o--●
       /
      #

while at HEAD, I would backtrack history until I found HEAD^, which
has a parent with a well-defined commit message and holding the data
I am looking for.

Later, when x-*-ref is mainline, instead of parent pointers, it can
be used in place.

When there is a merge and the TopGit data need updating, a new
OPC is slotted into place, on the merge commit. In
the following graph, the user then decided also at a later point to
update e.g. the TopGit patch description (.topmsg), which is also
stored in this OPC:

       o--o-o
      /      \ maint            master
  o--o--o--o--+--o--O--o--o--o--●
       /     /     /
      #     #     #

To keep things simple, every OPC copies the unchanged data from the
previous one as well (compression will reduce the overhead).

Later, I can use the maint branch just in the same way I could use
master when it was at that age.

> > > Otherwise, the usual recommendation is to use a pseudo-header
> > > within the body of the commit message (i.e., "Topgit-Base: ..." at
> > > the end of the commit message). The upside is that it's easy to
> > > create, manipulate, and examine using existing git tools. The
> > > downside is that it is something that the user is more likely to
> > > see in "git log" or when editing a rebased commit message.
> >
> > … to see *and to accidentally mess up*. And while that may even be
> > unlikely, it does expose information that really ought to be hidden.
>
> I'm not quite sure what the information is, so I can't really judge. Do
> you have a concrete example?
>
> I got the impression earlier you were wanting to store a human-readable
> text string. That makes a pseudo-header a reasonable choice. But if you
> are going to reference some blob (which it seems from what you wrote
> above), and you are interested in proper reachability analysis, then no,
> it probably isn't a good idea.

I am not yet sure what information needs storing. Right now, I am
keeping five fields:

  Depend-Refs
    A list of the most recent branch points from dependency
    branches, so that a tool can tell when the dependent branch
    needs an update (commits following those refs that are not
    reachable by the branch head).

  Base-Ref
    The ref to the most recent merge of all dependencies, used to
    create diffs.

  Patch-Branch
    boolean to suggest whether this branch is designed to develop
    a single patch for submission or use in a quilt series.

  Patch-Message
    Patch description (think git-send-email).

  Integration-Branch
    boolean to suggest whether instead this branch is a branch
    designed to collect features.

At the moment, I do not know which of those are necessary, and which
I am missing. The flexibility of being able to store as much as
I want, in whatever format I want, without having to fear
overloading the commit message or burdening the user, is what makes
me want to use refs to blobs.

> I think extensibility is welcome. It's just that most discussions
> so far have ended up realizing that a new header would just be
> cruft. Maybe yours is different. I'm still not 100% sure
> I understand what you want to accomplish, but the idea of an
> x-*-ref header is a reasonable thing for git to have.

I think there are two questions:

  1. would x-*-ref be a suitable idea for Git core?

     I think the answer is yes, as (I think) it's well-defined and
     I cannot see any problems with it, really.

  2. can we prevent abuse?

     No, never. But just like you cannot abuse X-* headers in the
     RFC822 format due to their design, x-*-ref abuse would only
     affect those who chose it.

Thank you,

-- 
martin | http://madduck.net/ | http://two.sentenc.es/

"the question of whether computers can think is like the question of
 whether submarines can swim."
    -- edsger w. dijkstra

spamtraps: madduck.bogus@madduck.net
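
The orphan-parent-commit idea described in the message above can be sketched with git plumbing alone. This is not the tg-datastore prototype; the helper names `opc_add` and `opc_find`, the marker subject, and the blob path `data` are assumptions made for illustration, and only the first-parent chain is searched.

```shell
#!/bin/sh
# Sketch only: orphan parent commits (OPCs) as a per-branch datastore.
# opc_add, opc_find, OPC_SUBJECT and the path "data" are hypothetical.
set -e

OPC_SUBJECT='TopGit data node'

# opc_add: read data on stdin, store it in a parentless commit, and make
# that commit the second parent of a new commit on the current branch.
opc_add() {
    blob=$(git hash-object -w --stdin)
    # tree for the OPC: a single entry called "data"
    data_tree=$(printf '100644 blob %s\tdata\n' "$blob" | git mktree)
    # no -p option, so the data commit has no parents (it is an orphan)
    opc=$(echo "$OPC_SUBJECT" | git commit-tree "$data_tree")
    head=$(git rev-parse --verify HEAD)
    # the new commit keeps HEAD's tree; its parents are HEAD and the OPC
    new=$(echo 'attach datastore' |
          git commit-tree "$head^{tree}" -p "$head" -p "$opc")
    git update-ref HEAD "$new" "$head"
}

# opc_find: walk first-parent history from HEAD and print the data held by
# the nearest OPC; returns non-zero if there is none.
opc_find() {
    for c in $(git rev-list --first-parent HEAD); do
        p2=$(git rev-parse -q --verify "$c^2") || continue
        if [ "$(git log -1 --format=%s "$p2")" = "$OPC_SUBJECT" ]; then
            git cat-file blob "$p2:data"
            return 0
        fi
    done
    return 1
}
```

Inside a repository, `echo 'message: hello' | opc_add` followed by `opc_find` on any later commit of the branch prints the stored text. Because the OPC is an ordinary parent, fsck and gc keep it reachable, at the cost of every data-bearing commit looking like a merge.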
* working prototype of orphan parent commits as datastores (was: Storing additional information in commit headers)
From: martin f krafft @ 2011-08-02 15:03 UTC
To: Jeff King, git discussion list, Petr Baudis, Clemens Buchacher

also sprach martin f krafft <madduck@madduck.net> [2011.08.02.1028 +0200]:
> What do you think about using the idea of orphan parent commits
> (OPC) for now? These are conceptually closest to the x-*-ref
> pointers, do not require extra setup, pollute history only a little
> bit (IMHO), and slot in with Git and fsck/gc alright.
>
> Here's the idea again, graphically:
>
>   o--o--o--●
>        /
>       #
>
> while at HEAD, I would backtrack history until I found HEAD^, which
> has a parent with a well-defined commit message and holding the data
> I am looking for.
>
> Later, when x-*-ref is mainline, instead of parent pointers, it can
> be used in place.
>
> When there is a merge and the TopGit data need updating, a new
> OPC is slotted into place, on the merge commit. In
> the following graph, the user then decided also at a later point to
> update e.g. the TopGit patch description (.topmsg), which is also
> stored in this OPC:
>
>        o--o-o
>       /      \ maint            master
>   o--o--o--o--+--o--O--o--o--o--●
>        /     /     /
>       #     #     #
>
> To keep things simple, every OPC copies the unchanged data from the
> previous one as well (compression will reduce the overhead).

I have published a working prototype of this kind of datastore, in
case people are interested:

  http://git.madduck.net/v/code/topgit-ng.git

Here's a bit of synopsis:

  % ./tg-datastore list
  I: returns non-zero if no datastore found at given commit.
  I: prints contents of datastore otherwise.
  message: this is a proof-of-concept

  % ./tg-datastore find commitref
  I: prints the value of the parameter, or empty if parameter is not found.
  I: returns non-zero if no datastore was found.
  dc58ec49df849ec1aef6929cd40c759a6018e056

  % git commit --allow-empty -mone
  [master 78918bb] one
  % git commit --allow-empty -mtwo
  [master 7eca0cd] two

  % ./tg-datastore find message
  I: prints the value of the parameter, or empty if parameter is not found.
  I: returns non-zero if no datastore was found.
  this is a proof-of-concept

  % ./tg-datastore find commitref
  I: prints the value of the parameter, or empty if parameter is not found.
  I: returns non-zero if no datastore was found.
  dc58ec49df849ec1aef6929cd40c759a6018e056

  % ./tg-datastore add message='this is a new message'
  I: returns non-zero if there is already a datastore on HEAD.
  I: adding the following data to the datastore of HEAD:
  I:   message: this is a new message

  % ./tg-datastore find commitref
  I: prints the value of the parameter, or empty if parameter is not found.
  I: returns non-zero if no datastore was found.
  8e6179050a1aca5485f3e1702780f1b555d8643b

  % ./tg-datastore find message
  I: prints the value of the parameter, or empty if parameter is not found.
  I: returns non-zero if no datastore was found.
  this is a new message

  tig output now:
    2011-08-02 16:52 martin f. krafft  M─┐ [master] two
    2011-08-02 16:54 TopGit            │ I TopGit data node
    2011-08-02 16:52 martin f. krafft  I one
    2011-08-02 16:50 martin f. krafft  M─┐ [origin/master] import first prototype
    2011-08-02 16:50 TopGit            │ I TopGit data node
    2011-08-02 16:48 martin f. krafft  I Initial (empty) root commit

  % ./tg-datastore remove
  I: always returns zero, even if there was nothing to remove.

  % ./tg-datastore find message
  I: prints the value of the parameter, or empty if parameter is not found.
  I: returns non-zero if no datastore was found.
  this is a proof-of-concept

Note three things:

  1. I am actually using an x-* header in the TopGit data node commit
     object to help identify it as a commit. This could be done
     differently (e.g. parse the commit message for some magic), but
     I chose to do this on purpose to see how well it fares.

  2. If Git grew x-*-ref headers (refs to objects in general),
     I could use that instead and drop the parent pointer, which
     would make the DAG cleaner.

  3. Right now, you cannot add parent orphan commits to orphans
     themselves, but it would be trivial to enable. I just couldn't
     be bothered.

Enjoy, and comments of course welcome.

-- 
martin | http://madduck.net/ | http://two.sentenc.es/

windoze nt crashed.
i am the blue screen of death.
no one hears your screams.

spamtraps: madduck.bogus@madduck.net
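
Point 1 above mentions marking the data node with an extra commit header. How the prototype writes that header is not shown in the thread, so the following is only a sketch of one way to do it with plumbing; the helpers `add_header` and `get_header` and the header name `x-demo-marker` are made up for illustration.

```shell
#!/bin/sh
# Sketch only: create a copy of a commit that carries one extra header line.
# add_header, get_header and the header name used below are hypothetical.
set -e

# add_header <commit> <header-name> <value>
# Prints the id of a new commit object that is identical to <commit> except
# for one additional header placed at the end of the header section.
add_header() {
    git cat-file commit "$1" |
    awk -v h="$2 $3" '
        !done && $0 == "" { print h; done = 1 }  # first blank line ends the headers
        { print }
    ' |
    git hash-object -t commit -w --stdin
}

# get_header <commit> <header-name>: print the header's value, if present.
get_header() {
    git cat-file commit "$1" |
    sed -n -e '/^$/q' -e "s/^$2 //p"
}
```

For example, `new=$(add_header HEAD x-demo-marker datastore)` writes the object and `get_header "$new" x-demo-marker` reads the value back. The new object is a different commit with a different id; nothing points at it until a ref or a child commit does.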
* Re: working prototype of orphan parent commits as datastores (was: Storing additional information in commit headers)
From: Jeff King @ 2011-08-02 18:57 UTC
To: martin f krafft; +Cc: git discussion list, Petr Baudis, Clemens Buchacher

On Tue, Aug 02, 2011 at 05:03:21PM +0200, martin f krafft wrote:

> tig output now:
>   2011-08-02 16:52 martin f. krafft  M─┐ [master] two
>   2011-08-02 16:54 TopGit            │ I TopGit data node
>   2011-08-02 16:52 martin f. krafft  I one
>   2011-08-02 16:50 martin f. krafft  M─┐ [origin/master] import first prototype
>   2011-08-02 16:50 TopGit            │ I TopGit data node
>   2011-08-02 16:48 martin f. krafft  I Initial (empty) root commit

Look at "git show origin/master" here. It ends up as a combined diff.
Which is kind of ugly.

-Peff
* Re: working prototype of orphan parent commits as datastores (was: Storing additional information in commit headers)
From: martin f krafft @ 2011-08-02 19:09 UTC
To: Jeff King, git discussion list, Petr Baudis, Clemens Buchacher

also sprach Jeff King <peff@peff.net> [2011.08.02.2057 +0200]:
> Look at "git show origin/master" here. It ends up as a combined diff.
> Which is kind of ugly.

Yes, absolutely. However, this would no longer be the case if
x-*-ref could be used. Right now, I am just using orphan parent
commits to avoid garbage collection.

-- 
martin | http://madduck.net/ | http://two.sentenc.es/

"he gave me his card
 he said, 'call me if they die'
 i shook his hand and said goodbye
 ran out to the street
 when a bowling ball came down the road
 and knocked me off my feet"
    -- bob dylan

spamtraps: madduck.bogus@madduck.net
* Re: working prototype of orphan parent commits as datastores (was: Storing additional information in commit headers)
From: martin f krafft @ 2011-08-02 19:26 UTC
To: Jeff King, git discussion list, Petr Baudis, Clemens Buchacher

also sprach martin f krafft <madduck@madduck.net> [2011.08.02.2109 +0200]:
> Yes, absolutely. However, this would no longer be the case if
> x-*-ref could be used. Right now, I am just using orphan parent
> commits to avoid garbage collection.

refs/heads/master is a file, containing its payload in the first
line by format definition, right?

I mean: the storage is right there, isn't it?

Of course this opens a whole new can of worms: merging per-ref data.

-- 
martin | http://madduck.net/ | http://two.sentenc.es/

"'oh, that was easy,' says Man, and for an encore goes on to prove
 that black is white and gets himself killed on the next zebra
 crossing."
    -- douglas adams, "the hitchhiker's guide to the galaxy"

spamtraps: madduck.bogus@madduck.net
* Re: Storing additional information in commit headers
From: Jeff King @ 2011-08-02 18:51 UTC
To: martin f krafft; +Cc: git discussion list, Petr Baudis, Clemens Buchacher

On Tue, Aug 02, 2011 at 10:28:10AM +0200, martin f krafft wrote:

> TopGit does what you suggest (a parallel ref structure), but there
> are three problems with this, which I am trying to address:
>
>   1. you need to ensure that these refs are pushed and fetched,
>      which requires set up and possible migration issues when things
>      change, and can cause big problems for contributors who just so
>      happened to forget.

I agree that is an annoyance, but it is one we can deal with. In the
near term, I wonder if a "tg clone" would be appropriate to add the
extra fetch refspecs when cloning (or even a "tg init" inside an
existing git repo -- I don't actually use topgit, so I'm not sure what
the usual initialization process, if any, is).

In the longer term, it might be nice if git was better at sharing
third-party refs. The problem is that we don't know what the refs mean,
so we don't know which ones are appropriate for sharing. Maybe we could
do something like "refs/shared/topgit/*", and git by default would push
and pull items under refs/shared?

There have also been proposals to have a more mirror-like structure to
what we fetch from remotes. E.g., to put remote refs/tags into
refs/remotes/origin/refs/tags, and similar for notes. It may be that it
is sensible for us to just fetch everything from a remote into
refs/remotes, including unknown hierarchies like topgit.

>   2. the additional refs confuse people a lot, and I can attest to
>      that because I have also at times found myself overwhelmed by
>      them when staring at gitk.

Using "gitk --all", I assume? I agree it is annoying, though "gitk
--branches" probably better specifies what you want (unless you stick
the parallel ref structure under refs/heads above, which is also a
solution to the "should it be fetched" plan).

>   3. once a ref updates, we need to keep a pointer to the previous
>      location, since one of the goals is the ability to be able to
>      return to a point in history (e.g. for security updates to
>      a stable package, or backports). Additional refs enhance the
>      aforementioned two problems.

Reflogs provide a linear history of the ref updates, but I suspect you
want to be able to push and pull these histories. Which reflogs will not
do.

If you want to version the state of refs, then using raw refs isn't the
right answer. You want a separate commit history with trees that map ref
names to commits or other objects. Which is _almost_ what notes are;
they map commit sha1s, but you want to map ref names.

> Therefore I thought it would be sensible to store these data in
> commits. When the data change, there will always be a new commit to
> store these data, and we do *not* want to update the data in
> previous commits. Finding the data then becomes backtracking the
> branch history until a commit is found containing them.

That seems to me like you are sticking information in a commit that is
not actually about the commit, but about the ref that happens to point
to the commit. What if I have two refs that point to the same commit,
but with two different topgit bases? What about years later, when that
information isn't interesting anymore? You're still carrying the cruft
inside your commit objects.

> > However, implementing such a thing would mean you have an awkward
> > transition period where some versions of git think the referenced
> > object is relevant, and others do not. That's something we can
> > overcome, but it's going to require code in git, and possibly
> > a dormant introduction period.
>
> Indeed. This could be addressed by letting a tool like TopGit require
> a minimum version of Git. For a while, this will burden developers,
> but ensure that it works. Over time, this will cease to be
> a problem.

Keep in mind that your requirement is not just a local thing. Object
reachability is something that both sides of a transfer need to agree
on. So imagine you use TopGit with a new version of git, and you push to
a site like GitHub. The remote side will take your objects, but it will
not send them back to anyone who fetches from your repository (since it
has no idea they're relevant). And it will probably prune them after a
week or two.

> What do you think about using the idea of orphan parent commits
> (OPC) for now? These are conceptually closest to the x-*-ref
> pointers, do not require extra setup, pollute history only a little
> bit (IMHO), and slot in with Git and fsck/gc alright.

It doesn't seem like a good idea to me. Parent pointers have a
well-defined meaning, and other parts of git (and other tools, even) are
going to assume that's what your parent pointers mean. They are used in
merge base calculations, for example. I _think_ you are mostly safe
here, because your OPC wouldn't have any real history to it, so finding
a merge base down that path would be fruitless. But consider something
like "diff", which shows a merge commit differently than a regular
commit. Your commits will unexpectedly appear as merges to git, and we
will show a combined diff versus the OPC, which is going to be ugly.

> I am not yet sure what information needs storing. Right now, I am
> keeping five fields:
> [...]

Thanks, that helped with getting a sense of what you're doing.

> I think there are two questions:
>
>   1. would x-*-ref be a suitable idea for Git core?
>
>      I think the answer is yes, as (I think) it's well-defined and
>      I cannot see any problems with it, really.

I think it's a nice idea for extensibility. And if it had been there
from day one, there would be no problems. But now we have to deal with
the transition period, and the fact that two different versions of git
will have different ideas about the set of objects that are reachable
from a given commit.

>   2. can we prevent abuse?
>
>      No, never. But just like you cannot abuse X-* headers in the
>      RFC822 format due to their design, x-*-ref abuse would only
>      affect those who chose it.

I don't worry about abuse. You can already stick random cruft in a
commit header, and you can already connect objects to a commit via tree
entries. This idea is just giving git some rules for dealing with it.

I'm still not 100% convinced you want per-commit storage, though, and
not per-ref storage.

-Peff
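
The "tg init" idea mentioned above, adding extra fetch refspecs to an existing clone, might look roughly like the following. This is a sketch, not what TopGit does; the helper `share_refs` and the namespace `refs/topgit` are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: teach an existing clone to fetch a parallel ref hierarchy.
# share_refs and the namespace refs/topgit are hypothetical.
set -e

# share_refs <remote> <namespace>, e.g. share_refs origin refs/topgit
share_refs() {
    remote=$1
    ns=$2
    spec="+$ns/*:$ns/*"
    # add the refspec only if it is not configured yet
    if ! git config --get-all "remote.$remote.fetch" | grep -qxF "$spec"; then
        git config --add "remote.$remote.fetch" "$spec"
    fi
}
```

After `share_refs origin refs/topgit`, a plain `git fetch origin` also brings over everything under refs/topgit/. Pushing is deliberately left out: those refs would still have to be named explicitly on `git push`, which is the "contributors forget" problem raised in the quoted message.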
* Re: Storing additional information in commit headers
From: martin f krafft @ 2011-08-02 19:06 UTC
To: Jeff King, git discussion list, Petr Baudis, Clemens Buchacher

also sprach Jeff King <peff@peff.net> [2011.08.02.2051 +0200]:
> I agree that is an annoyance, but it is one we can deal with. In the
> near term, I wonder if a "tg clone" would be appropriate to add the
> extra fetch refspecs when cloning (or even a "tg init" inside an
> existing git repo -- I don't actually use topgit, so I'm not sure what
> the usual initialization process, if any, is).

Hey Jeff, thanks for your response.

TopGit does come with these commands to do the setup for you, but
that does not ensure that a new contributor without any idea about
TopGit won't forget to run them.

The argument against tg-clone is mainly that I really do not want to
encapsulate/abstract functionality, but rather stay as close as
possible to pure Git, and never to mandate anyone to use anything
else.

> In the longer term, it might be nice if git was better at sharing
> third-party refs. The problem is that we don't know what the refs
> mean, so we don't know which ones are appropriate for sharing.
> Maybe we could do something like "refs/shared/topgit/*", and git
> by default would push and pull items under refs/shared?

This could be an interesting and viable approach.

> > Therefore I thought it would be sensible to store these data in
> > commits. When the data change, there will always be a new commit to
> > store these data, and we do *not* want to update the data in
> > previous commits. Finding the data then becomes backtracking the
> > branch history until a commit is found containing them.
>
> That seems to me like you are sticking information in a commit that is
> not actually about the commit, but about the ref that happens to point
> to the commit. What if I have two refs that point to the same commit,
> but with two different topgit bases?

I don't think this can happen, but the point is valid.

> What about years later, when that information isn't interesting
> anymore? You're still carrying the cruft inside your commit
> objects.
[…]
> I'm still not 100% convinced you want per-commit storage, though,
> and not per-ref storage.

Yes, I do want per-ref storage. Your arguments against my orphan
parent pointer approach (which could later be an x-*-ref approach)
are valid.

It just seems to me that per-ref storage is a lot further away than
per-commit storage, and I'd really like to move forward with TopGit…

Thank you,

-- 
martin | http://madduck.net/ | http://two.sentenc.es/

"one should never trust a woman who tells her real age. if she tells
 that, she will tell anything."
    -- oscar wilde

spamtraps: madduck.bogus@madduck.net
* per-ref data storage (was: Storing additional information in commit headers)
From: martin f krafft @ 2011-08-02 19:27 UTC
To: Jeff King, git discussion list, Petr Baudis, Clemens Buchacher

[sorry, my previous message was a total reply FAIL]

also sprach martin f krafft <madduck@madduck.net> [2011.08.02.2106 +0200]:
> It just seems to me that per-ref storage is a lot further away than
> per-commit storage, and I'd really like to move forward with TopGit…

refs/heads/master is a file, containing its payload in the first
line by format definition, right?

I mean: the storage is right there, isn't it?

Of course this opens a whole new can of worms: merging per-ref data.

-- 
martin | http://madduck.net/ | http://two.sentenc.es/

"nothing can cure the soul but the senses, just as nothing can cure
 the senses but the soul."
    -- oscar wilde

spamtraps: madduck.bogus@madduck.net
* Re: per-ref data storage
From: martin f krafft @ 2011-08-02 21:12 UTC
To: Jeff King, git discussion list, Petr Baudis, Clemens Buchacher

also sprach martin f krafft <madduck@madduck.net> [2011.08.02.2127 +0200]:
> refs/heads/master is a file, containing its payload in the first
> line by format definition, right?
>
> I mean: the storage is right there, isn't it?
>
> Of course this opens a whole new can of worms: merging per-ref data.

origin/master can contain a different set of per-ref data than
master, and the consolidation would need to happen during the normal
merge.

But unless there's always a new commit associated with a change of
those data, git-push will happily overwrite those data on the
remote.

… unless the remote refuses to accept a ref update if the data have
changed. Conceivably that could lead into a control path similar to
what happens on a non-fast-forward push, unless
receive.denyNonFastForwards is off. What then?

-- 
martin | http://madduck.net/ | http://two.sentenc.es/

seminars, n.:
  from "semi" and "arse", hence, any half-assed discussion.

spamtraps: madduck.bogus@madduck.net
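
The "refuse the update if the data have changed" behaviour described above already exists locally in `git update-ref`, which accepts the expected old value as a third argument. A minimal sketch, assuming the metadata lives under its own ref; `safe_update` is a hypothetical helper:

```shell
#!/bin/sh
# Sketch only: optimistic locking for a ref that carries metadata.
# safe_update is a hypothetical helper around "git update-ref".
set -e

# safe_update <ref> <new> <expected-old>
# Moves <ref> to <new> only if it still points at <expected-old>.
safe_update() {
    if git update-ref "$1" "$2" "$3" 2>/dev/null; then
        echo updated
    else
        echo "rejected: $1 changed in the meantime"
        return 1
    fi
}
```

A caller that is rejected has to re-read the ref, merge its change into the newer data and try again, which is the same loop as after a refused non-fast-forward push. Later git releases added `git push --force-with-lease`, which applies a comparable check to pushes.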
* Re: per-ref data storage (was: Storing additional information in commit headers)
From: Jeff King @ 2011-08-04 3:41 UTC
To: martin f krafft; +Cc: git discussion list, Petr Baudis, Clemens Buchacher

On Tue, Aug 02, 2011 at 09:27:28PM +0200, martin f krafft wrote:

> [sorry, my previous message was a total reply FAIL]
>
> also sprach martin f krafft <madduck@madduck.net> [2011.08.02.2106 +0200]:
> > It just seems to me that per-ref storage is a lot further away than
> > per-commit storage, and I'd really like to move forward with TopGit…
>
> refs/heads/master is a file, containing its payload in the first
> line by format definition, right?
>
> I mean: the storage is right there, isn't it?

Yes, and I think git will even ignore other stuff in the file. But I
don't think you can count on git not obliterating the other stuff when
it updates the ref. Nor would it be passed over a clone or fetch.

> Of course this opens a whole new can of worms: merging per-ref data.

Yes. That's the tricky part. And that's something you'll have to deal
with no matter how you store it, I expect.

-Peff
* Re: Storing additional information in commit headers
From: Jeff King @ 2011-08-04 3:39 UTC
To: martin f krafft; +Cc: git discussion list, Petr Baudis, Clemens Buchacher

On Tue, Aug 02, 2011 at 09:06:45PM +0200, martin f krafft wrote:

> It just seems to me that per-ref storage is a lot further away than
> per-commit storage, and I'd really like to move forward with TopGit…

I don't think it's that hard. For example:

  # our mapping for all refs, and the history of that mapping, will be
  # stored under this ref
  MAP=refs/topgit/metadata

  # read metadata on stdin and store it as the entry for ref name $1
  refmap_set() {
    (
      # start with a pristine index based on the current map
      GIT_INDEX_FILE="$(git rev-parse --git-dir)/tg-meta-index"
      export GIT_INDEX_FILE
      # drop any stale index left over from an earlier run
      rm -f "$GIT_INDEX_FILE"
      if git rev-parse -q --verify $MAP >/dev/null; then
        git read-tree $MAP
      fi

      # and then put our new ref and metadata in
      blob=`git hash-object --stdin -w`
      git update-index --add --cacheinfo 100644 $blob $1
      tree=`git write-tree`
      parent=$(git rev-parse -q --verify $MAP)
      commit=`echo 'updated map' | git commit-tree $tree ${parent:+-p $parent}`
      # only move the map if it still points at the commit we built on
      git update-ref $MAP $commit $parent
    )
  }

  # print the metadata stored for ref name $1
  refmap_get() {
    git cat-file blob $MAP:$1
  }

  # and some examples of use
  echo some metadata | refmap_set refs/heads/foo
  refmap_get refs/heads/foo | sed 's/meta/changed &/' | refmap_set refs/heads/foo

It's a little more clunky than notes, of course, but it's not too bad to
put into a script.

The tricky part is how to handle fetching and merging the metadata ref
from other people. But that's not really different from notes. In
either case, you're probably going to want to make a custom merge
program for combining the meta-information.

-Peff
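
Because the map in the script above is an ordinary commit history, the earlier requirement of returning to a previous state can be met by walking that history. A minimal sketch, assuming the same layout (map commits under refs/topgit/metadata whose trees use ref names as paths); `refmap_log` and `refmap_get_at` are hypothetical helpers, not part of the script in the message:

```shell
#!/bin/sh
# Sketch only: look up what a ref's metadata was at an earlier point.
# refmap_log and refmap_get_at are hypothetical helpers.
set -e

MAP=refs/topgit/metadata

# refmap_log <refname>: list the map commits that changed the entry for
# <refname>, newest first
refmap_log() {
    git rev-list "$MAP" -- "$1"
}

# refmap_get_at <map-commit> <refname>: print the metadata as of that commit
refmap_get_at() {
    git cat-file blob "$1:$2"
}
```

`refmap_log refs/heads/foo` yields one commit id per change, and any of them can be passed to `refmap_get_at`. This only covers reading; combining two diverged maps still needs the custom merge mentioned above.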
* Re: Storing additional information in commit headers
From: Michael Haggerty @ 2011-08-02 13:53 UTC
To: martin f krafft; +Cc: git discussion list, Petr Baudis, Clemens Buchacher

On 08/01/2011 08:20 PM, martin f krafft wrote:
> Are there any strong reasons against my use of commit headers for
> specific, well-defined purposes in contained use-cases? E.g. are
> there tools known to only copy "known" headers, which could
> potentially break my assumptions?

Before you store important information in a git-internal data structure,
please consider:

* Some of your developers might prefer using another DVCS (e.g.,
  Mercurial via hg-git) and they will not be able to see the information
  at all

* Some day the main project might want to (god forbid!) switch to a
  successor to git, and your extra information might be difficult to
  migrate.

* Somebody might want to work with your project from a tarball rather
  than having to install and use git.

Therefore, I recommend a strong bias towards storing information in as
transparent, non-system-specific a way as possible. Metadata and
scripts stored within the file tree part of the repository are typically
a lot easier to work with and more transparent than git-specific hacks.

That being said, I haven't understood your application well enough to
know whether these biases might be trumped by convenience in your
particular situation.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/
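
The in-tree alternative recommended above can be sketched as a tracked file of "Field: value" lines that is read back at any revision. The file name `.tg-meta` and the helper `meta_get` are assumptions made for illustration; the field names are the ones listed earlier in the thread.

```shell
#!/bin/sh
# Sketch only: branch metadata kept as a tracked file in the tree.
# META_FILE and meta_get are hypothetical.
set -e

META_FILE=.tg-meta

# meta_get <rev> <field>: print the value of "<field>: value" as recorded
# in the metadata file at revision <rev>; prints nothing if absent.
meta_get() {
    git show "$1:$META_FILE" 2>/dev/null | sed -n "s/^$2: //p"
}
```

`meta_get HEAD Patch-Message` reads the current value and `meta_get HEAD^ Patch-Message` the previous one, so history comes for free and the data survives a tarball export or a move to another DVCS. The trade-off is that the file shows up in diffs and merges like any other tracked file.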