Storing additional information in commit headers

git.vger.kernel.org archive mirror

* Storing additional information in commit headers
@ 2011-08-01 18:20 martin f krafft
  2011-08-01 18:27 ` Sverre Rabbelier
                   ` (3 more replies)
  0 siblings, 4 replies; 23+ messages in thread
From: martin f krafft @ 2011-08-01 18:20 UTC (permalink / raw)
  To: git discussion list; +Cc: Petr Baudis, Clemens Buchacher

[-- Attachment #1: Type: text/plain, Size: 1997 bytes --]

Dear list,

I've read — with great interest — the recent discussion on
generation numbers[0], mostly because Clemens Buchacher pointed me
to it as a warning not to mess with commit objects.

0. http://comments.gmane.org/gmane.comp.version-control.git/177146

My intent was to add an extra commit header to select commits as
a way to store extra information needed to automate the management
of interdependent branches and patch generation à la TopGit.

Having read the generation numbers debate, I am not sure that adding
additional commit headers is a bad idea per se. From what
I understand, the main pushback to Linus' idea was that people did
not feel it right to store redundant, calculable information
permanently in commit objects, where they cannot be altered anymore,
despite the non-zero chance of there being an error. Instead, the
use of a cache was advocated. I do not want to take a side in this
debate with this mail of mine.

Instead, I am investigating ways in which I can store additional
information for a branch, and ideally in a way to make it
transparent and automatic for all users of a project's repo.

Hence, if I were to store additional information in the commit
object headers, this information would by design be correct,
immutable, and non-redundant. I am going to reply to my own mail
with some implementation details to feed the curious, in the hope of
keeping this debate focused.

Are there any strong reasons against my use of commit headers for
specific, well-defined purposes in contained use-cases? E.g. are
there tools known to only copy "known" headers, which could
potentially break my assumptions?
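For readers unfamiliar with the object format: a commit is a block of "name value" header lines, a blank line, and the message. Its object id is the SHA-1 of "commit <size>\0" followed by that text. The minimal Python sketch below builds such a commit with an extra x-topgit-top-base header. It then shows the risk raised in the question above: a hypothetical rewriting tool that copies only a whitelist of known headers silently drops the extra one, which also changes the commit id. The whitelist, the tool, and the placeholder ids are illustrative assumptions, and multi-line (continuation) headers are not handled.

```python
import hashlib

def object_id(kind: str, body: bytes) -> str:
    """Git object id: SHA-1 of '<kind> <size>\\0' followed by the raw body."""
    return hashlib.sha1(f"{kind} {len(body)}\0".encode("ascii") + body).hexdigest()

def build_commit(headers, message: str) -> bytes:
    """Serialize (name, value) header pairs, a blank line, then the message."""
    head = "\n".join(f"{name} {value}" for name, value in headers)
    return (head + "\n\n" + message).encode("utf-8")

def parse_headers(raw: bytes):
    """Split the header block (everything before the first blank line)."""
    head, _, _ = raw.decode("utf-8").partition("\n\n")
    return [tuple(line.split(" ", 1)) for line in head.split("\n")]

# Hypothetical tool that only copies headers it knows about.
KNOWN_HEADERS = {"tree", "parent", "author", "committer", "encoding"}

def naive_rewrite(raw: bytes) -> bytes:
    """Rebuild a commit keeping only whitelisted headers (drops the rest)."""
    _, _, message = raw.decode("utf-8").partition("\n\n")
    kept = [h for h in parse_headers(raw) if h[0] in KNOWN_HEADERS]
    return build_commit(kept, message)

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
IDENT = "A U Thor <author@example.com> 1312222800 +0200"
original = build_commit(
    [("tree", EMPTY_TREE), ("author", IDENT), ("committer", IDENT),
     ("x-topgit-top-base", "0" * 40)],  # placeholder id, not a real object
    "Add feature\n",
)
rewritten = naive_rewrite(original)
print("x-topgit-top-base" in dict(parse_headers(rewritten)))  # False
print(object_id("commit", original) != object_id("commit", rewritten))  # True
```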

Thanks,

-- 
martin | http://madduck.net/ | http://two.sentenc.es/

"when a gentoo admin tells me that the KISS principle is good for
 'busy sysadmins', and that it's not an evolutionary step backwards,
 i wonder whether their tape is already running backwards."

spamtraps: madduck.bogus@madduck.net

[-- Attachment #2: Digital signature (see http://martin-krafft.net/gpg/sig-policy/999bbcc4/current) --]
[-- Type: application/pgp-signature, Size: 1124 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Storing additional information in commit headers
  2011-08-01 18:20 Storing additional information in commit headers martin f krafft
@ 2011-08-01 18:27 ` Sverre Rabbelier
  2011-08-01 18:34   ` martin f krafft
  2011-08-01 18:28 ` martin f krafft
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 23+ messages in thread
From: Sverre Rabbelier @ 2011-08-01 18:27 UTC (permalink / raw)
  To: martin f krafft; +Cc: git discussion list, Petr Baudis, Clemens Buchacher

Heya,

On Mon, Aug 1, 2011 at 20:20, martin f krafft <madduck@madduck.net> wrote:
> My intent was to add an extra commit header to select commits as
> a way to store extra information needed to automate the management
> of interdependent branches and patch generation à la TopGit.

Have you had a look at git notes?

-- 
Cheers,

Sverre Rabbelier

* Re: Storing additional information in commit headers
  2011-08-01 18:27 ` Sverre Rabbelier
@ 2011-08-01 18:34   ` martin f krafft
  2011-08-01 20:01     ` Clemens Buchacher
  0 siblings, 1 reply; 23+ messages in thread
From: martin f krafft @ 2011-08-01 18:34 UTC (permalink / raw)
  To: Sverre Rabbelier, git discussion list, Petr Baudis,
	Clemens Buchacher

[-- Attachment #1: Type: text/plain, Size: 1488 bytes --]

also sprach Sverre Rabbelier <srabbelier@gmail.com> [2011.08.01.2027 +0200]:
> On Mon, Aug 1, 2011 at 20:20, martin f krafft <madduck@madduck.net> wrote:
> > My intent was to add an extra commit header to select commits as
> > a way to store extra information needed to automate the management
> > of interdependent branches and patch generation à la TopGit.
> 
> Have you had a look at git notes?

Hello, and thanks for taking the time to reply to me!

Yes, I have considered git-notes. The issue I have with git-notes is
that it requires every contributor to set up refspecs for fetch and
push, or else the notes will not be exchanged/shared.

I realise this is a minor concern to most of you, or maybe even
a feature (part of the beauty of Git is, after all, that it works
without requiring everyone to have the same local setup), but in our
use-case (distro packaging), it's a relatively large burden to new
contributors and passers-by.

Also, git-notes are mutable (at least from the UI perspective) and
I strive to encode information immutably.

Therefore I am looking for a means to encode this (necessary)
information as part of the main DAG (i.e. not polluting the
worktree).

I hope this makes sense.

-- 
martin | http://madduck.net/ | http://two.sentenc.es/

"first get your facts; then you can distort them at your leisure."
                                                       -- mark twain

* Re: Storing additional information in commit headers
  2011-08-01 18:34   ` martin f krafft
@ 2011-08-01 20:01     ` Clemens Buchacher
  2011-08-01 20:55       ` martin f krafft
  0 siblings, 1 reply; 23+ messages in thread
From: Clemens Buchacher @ 2011-08-01 20:01 UTC (permalink / raw)
  To: martin f krafft; +Cc: Sverre Rabbelier, git discussion list, Petr Baudis

On Mon, Aug 01, 2011 at 08:34:11PM +0200, martin f krafft wrote:
>
> Yes, I have considered git-notes. The issue I have with git-notes is
> that it requires every contributor to set up refspecs for fetch and
> push, or else the notes will not be exchanged/shared.

Notes are tracked using a 'branch' too. It's just a branch in the
refs/notes namespace, the notes ref. You could simply tag your
notes ref or point a ref from the refs/heads namespace to it each
time you create new notes.
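To make the setup cost concrete: a fresh clone's fetch refspec is +refs/heads/*:refs/remotes/origin/*, which matches nothing under refs/notes/. Notes therefore only travel once someone adds a refspec such as +refs/notes/*:refs/notes/*. Pointing a branch under refs/heads/ at the notes commit, as suggested here, sidesteps that. The Python sketch below is a simplified model of matching the source side of a fetch refspec (at most one `*`). The ref names refs/notes/topgit and refs/heads/topgit-notes are hypothetical.

```python
DEFAULT_FETCH = ["+refs/heads/*:refs/remotes/origin/*"]  # what git clone sets up

def fetched_by(refspecs, ref):
    """Return True if remote `ref` matches the source side of some refspec."""
    for spec in refspecs:
        src = spec.lstrip("+").split(":", 1)[0]
        if "*" in src:
            prefix, suffix = src.split("*", 1)
            if (ref.startswith(prefix) and ref.endswith(suffix)
                    and len(ref) > len(prefix) + len(suffix)):
                return True
        elif src == ref:
            return True
    return False

print(fetched_by(DEFAULT_FETCH, "refs/notes/topgit"))        # False: notes stay behind
print(fetched_by(DEFAULT_FETCH + ["+refs/notes/*:refs/notes/*"],
                 "refs/notes/topgit"))                        # True once configured
print(fetched_by(DEFAULT_FETCH, "refs/heads/topgit-notes"))  # True: it is a branch
```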

> Also, git-notes are mutable (at least from the UI perspectiv) and
> I strive to encode information immutably.

Notes are also used by textconv, for example, to cache immutable
data. It's not likely a user will end up editing it by accident
unless you use the default notes ref.

Clemens

* Re: Storing additional information in commit headers
  2011-08-01 20:01     ` Clemens Buchacher
@ 2011-08-01 20:55       ` martin f krafft
  0 siblings, 0 replies; 23+ messages in thread
From: martin f krafft @ 2011-08-01 20:55 UTC (permalink / raw)
  To: Clemens Buchacher, Sverre Rabbelier, git discussion list,
	Petr Baudis

[-- Attachment #1: Type: text/plain, Size: 1196 bytes --]

also sprach Clemens Buchacher <drizzd@aon.at> [2011.08.01.2201 +0200]:
> Notes are tracked using a 'branch' too. It's just a branch in the
> refs/notes namespace, the notes ref. You could simply tag your
> notes ref or point a ref from the refs/heads namespace to it each
> time you create new notes.

Hi Clemens, thanks for responding!

You suggest integrating refs/notes/foo into refs/heads by means of
a pointer… at which point we are polluting the branch history space
again (think gitk), no?

I appreciate the simplicity of this idea of yours, which I had not
thought of. Indeed, maintaining a head at the top of
refs/notes/topgit-metadata (or whatever) has charm. I do not mean to
discard it at all right now, and will think about this more!

git-notes was designed for such cases, and I was pleased to note
how configurable it is. Maybe it is the ticket.

Still: why not commit headers?

-- 
martin | http://madduck.net/ | http://two.sentenc.es/

... with a plastic cup filled with a liquid that was almost,
but not quite, entirely unlike tea.
            -- douglas adams, "the hitchhiker's guide to the galaxy"

* Re: Storing additional information in commit headers
  2011-08-01 18:20 Storing additional information in commit headers martin f krafft
  2011-08-01 18:27 ` Sverre Rabbelier
@ 2011-08-01 18:28 ` martin f krafft
  2011-08-01 19:33   ` Martin Langhoff
  2011-08-01 20:13 ` Jeff King
  2011-08-02 13:53 ` Michael Haggerty
  3 siblings, 1 reply; 23+ messages in thread
From: martin f krafft @ 2011-08-01 18:28 UTC (permalink / raw)
  To: git discussion list; +Cc: Petr Baudis, Clemens Buchacher

[-- Attachment #1: Type: text/plain, Size: 1950 bytes --]

also sprach martin f krafft <madduck@madduck.net> [2011.08.01.2020 +0200]:
> Hence, if I were to store additional information in the commit
> object headers, this information would by design be correct,
> immutable, and non-redundant. I am going to reply to my own mail
> with some implementation details to feed the curious, with the hope
> to keep this debate focused.

For lack of a better idea (cf. [0]), I am currently toying with the
following approach:

Possibly in addition to the orphan parent pointer to a commit object
suggested in [0], and in order to provide a clear means to identify
said orphan parent pointer (holding additional information), I am
considering storing this orphan parent commit's ref in the main
commit, using a header like x-topgit-top-base [1].

0. http://permalink.gmane.org/gmane.comp.version-control.git/178349
1. The use of the x- prefix is obviously intentional to suggest that
   this is a free-form, non-standard extension.

Whenever the extra data need changing, a new commit carrying an
x-topgit-top-base header is added on top of HEAD.

Now, given a commitish, I simply have to walk back in time until
I find a commit object with such a header, and I have the most
recent metadata at my fingertips.
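The backtracking described above could be sketched in Python as follows. The sketch assumes commits have already been parsed into a dictionary, a stand-in for reading objects from a real repository. It follows only first parents, so metadata recorded on a merged-in side branch is not picked up; whether that is the right rule for TopGit is an open design question.

```python
def find_metadata(commits, start, header="x-topgit-top-base"):
    """Walk first-parent history from `start` to the nearest commit with `header`.

    `commits` maps commit id -> {"parents": [...], "headers": {...}}.
    Returns (commit_id, value), or None if the root is reached without a match.
    """
    current = start
    while current is not None:
        obj = commits[current]
        if header in obj["headers"]:
            return current, obj["headers"][header]
        parents = obj["parents"]
        current = parents[0] if parents else None  # first parent only
    return None

history = {
    "c1": {"parents": [], "headers": {"x-topgit-top-base": "base-1"}},
    "c2": {"parents": ["c1"], "headers": {}},
    "c3": {"parents": ["c2"], "headers": {"x-topgit-top-base": "base-2"}},
    "c4": {"parents": ["c3"], "headers": {}},
}
print(find_metadata(history, "c4"))  # ('c3', 'base-2')
print(find_metadata(history, "c2"))  # ('c1', 'base-1'): an older tip sees older data
```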

Instead of a ref to the orphan parent commit (which visibly
pollutes the history), I could also just store the information right
there.

This is arguably hackish, but unless I find a better way, it's the
best I've come up with thus far.

And of course, this could go into the commit message body,
but since it is an implementation detail, that's really not the
right place for it.

Thanks for your consideration,

-- 
martin | http://madduck.net/ | http://two.sentenc.es/

"there are two major products that come out of berkeley: lsd and unix."
 one caused me an addiction
                                                             -- fyodor

* Re: Storing additional information in commit headers
  2011-08-01 18:28 ` martin f krafft
@ 2011-08-01 19:33   ` Martin Langhoff
  2011-08-01 20:51     ` martin f krafft
  0 siblings, 1 reply; 23+ messages in thread
From: Martin Langhoff @ 2011-08-01 19:33 UTC (permalink / raw)
  To: martin f krafft; +Cc: git discussion list, Petr Baudis, Clemens Buchacher

On Mon, Aug 1, 2011 at 2:28 PM, martin f krafft <madduck@madduck.net> wrote:
> For lack of a better idea (cf. [0]), I am currently toying with the
> following approach:

Hi Martin!

What data are you trying to include? Some time ago, I had similar
ideas to yours for a while... and it ended up being that all I needed
was to put the additional data /in a file/ and commit that file.

Speculation: you mention distro packaging, so I assume you're
improving the Debian packaging integration, with git tracking
debian/rules, perhaps with a wrapper. If you are using a wrapper
program, it is trivial to update this "metadata" file, or to ensure
it's valid/sane, in the preparations to commit, perhaps ensuring that
a pre-commit-hook script is in place and executable.

hth,

m
-- 
 martin.langhoff@gmail.com
 martin@laptop.org -- Software Architect - OLPC
 - ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 - http://wiki.laptop.org/go/User:Martinlanghoff

* Re: Storing additional information in commit headers
  2011-08-01 19:33   ` Martin Langhoff
@ 2011-08-01 20:51     ` martin f krafft
  0 siblings, 0 replies; 23+ messages in thread
From: martin f krafft @ 2011-08-01 20:51 UTC (permalink / raw)
  To: Martin Langhoff, git discussion list, Petr Baudis,
	Clemens Buchacher

[-- Attachment #1: Type: text/plain, Size: 1583 bytes --]

also sprach Martin Langhoff <martin.langhoff@gmail.com> [2011.08.01.2133 +0200]:
> What data are you trying to include? Some time ago, I had similar
> ideas to yours for a while... and it ended up being that all I needed
> was to put the additional data /in a file/ and commit that file.

Hi, thanks for taking the time to reply to me!

I am trying to store the top-base of a TopGit branch, which is the
merge of all a branch's dependencies.

TopGit uses refs for that, but a ref can only ever point at one such
merge, and so it's hard-to-impossible to reconstruct a branch
dependency in the past.

TopGit also uses files in the worktree. I would love to get rid of
these as well, since a file like .topmsg (which differs between all
branches, even related ones) requires always remembering to use the
'ours' merge driver, which requires setup, which in turn makes TopGit
harder to use.

> If you are using a wrapper program,

I am trying to stay as close as possible to plain Git. All of this
could easily be done by a wrapper, but a wrapper always makes too
many assumptions to become a viable standard for Debian packaging.

> it's valid/sane, in the preparations to commit, perhaps ensuring
> that a pre-commit-hook script is in place and executable.

Again, that requires setup, which raises the barrier to entry for
passers-by and new contributors.

-- 
martin | http://madduck.net/ | http://two.sentenc.es/

"verbing weirds language."
                                                           -- calvin

* Re: Storing additional information in commit headers
  2011-08-01 18:20 Storing additional information in commit headers martin f krafft
  2011-08-01 18:27 ` Sverre Rabbelier
  2011-08-01 18:28 ` martin f krafft
@ 2011-08-01 20:13 ` Jeff King
  2011-08-01 21:11   ` martin f krafft
  2011-08-02 13:53 ` Michael Haggerty
  3 siblings, 1 reply; 23+ messages in thread
From: Jeff King @ 2011-08-01 20:13 UTC (permalink / raw)
  To: martin f krafft; +Cc: git discussion list, Petr Baudis, Clemens Buchacher

On Mon, Aug 01, 2011 at 08:20:15PM +0200, martin f krafft wrote:

> Instead, I am investigating ways in which I can store additional
> information for a branch, and ideally in a way to make it
> transparent and automatic for all users of a project's repo.
> 
> Hence, if I were to store additional information in the commit
> object headers, this information would by design be correct,
> immutable, and non-redundant. I am going to reply to my own mail
> with some implementation details to feed the curious, with the hope
> to keep this debate focused.
> 
> Are there any strong reasons against my use of commit headers for
> specific, well-defined purposes in contained use-cases? E.g. are
> there tools known to only copy "known" headers, which could
> potentially break my assumptions?

This topic has come up several times in the past few years. I think some
of the relevant questions to consider about your new data are:

  1. Does git actually care about your data? E.g., would it want to use
     it for reachability analysis in git-fsck?

  2. Is it an immutable property of a commit, or can it be changed after
     the fact?

If (2) is no, then git-notes is probably the best choice.

Otherwise, if (1) is yes, then a commit header makes sense. But then, it
should also be something that git is taught about, and your commit
header should not be some topgit-specific thing, but a header showing
the generalized form.
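Peff's two questions amount to a small decision procedure. The Python rendering below only restates the reasoning in this message; the function name and return strings are hypothetical.

```python
def choose_storage(git_cares: bool, immutable: bool) -> str:
    """Pick where per-commit metadata should live, following the two questions."""
    if not immutable:
        return "git-notes"  # data that may change after the fact stays outside the commit
    if git_cares:
        return "commit header"  # generalized, and taught to git itself
    return "pseudo-header in the commit message"  # the usual recommendation

print(choose_storage(git_cares=False, immutable=True))  # pseudo-header in the commit message
```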

Otherwise, the usual recommendation is to use a pseudo-header within the
body of the commit message (i.e., "Topgit-Base: ..." at the end of the
commit message). The upside is that it's easy to create, manipulate, and
examine using existing git tools. The downside is that it is something
that the user is more likely to see in "git log" or when editing a
rebased commit message.
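A pseudo-header of this kind is easy to read back with ordinary string handling. The Python sketch below treats the last paragraph of the message as pseudo-headers only if every line in it looks like "Key: value". That rule is a simplifying assumption, not git's own parsing logic, and Topgit-Base is just the example key used above.

```python
import re

PSEUDO_HEADER = re.compile(r"^([A-Za-z0-9-]+): (.+)$")

def pseudo_headers(message: str) -> dict:
    """Return Key: value pairs from the final paragraph of a commit message."""
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    if len(paragraphs) < 2:  # the subject line alone never counts
        return {}
    matches = [PSEUDO_HEADER.match(line) for line in paragraphs[-1].splitlines()]
    if not all(matches):  # ordinary prose, not a pseudo-header block
        return {}
    return {m.group(1): m.group(2) for m in matches}

msg = ("Add frobnicator\n\nLonger explanation of the change.\n\n"
       "Topgit-Base: 1234abcd\nSigned-off-by: A U Thor <author@example.com>\n")
print(pseudo_headers(msg)["Topgit-Base"])  # 1234abcd
```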

Just about every discussion on this topic ends with the pseudo-header
recommendation. The only exceptions AFAIK are "encoding" (which git
itself needs to care about), and "generation" (which, as you noted,
raises other questions).

-Peff

* Re: Storing additional information in commit headers
  2011-08-01 20:13 ` Jeff King
@ 2011-08-01 21:11   ` martin f krafft
  2011-08-02  3:50     ` Jeff King
  0 siblings, 1 reply; 23+ messages in thread
From: martin f krafft @ 2011-08-01 21:11 UTC (permalink / raw)
  To: Jeff King, git discussion list, Petr Baudis, Clemens Buchacher

[-- Attachment #1: Type: text/plain, Size: 3716 bytes --]

also sprach Jeff King <peff@peff.net> [2011.08.01.2213 +0200]:
> This topic has come up several times in the past few years.

I am sorry that I am bothering the list again. I tried hard to find
whatever I could, but after 2–3 hours of web searching, I came here…

Thank you for taking the time to answer!

> I think some
> of the relevant questions to consider about your new data are:
> 
>   1. Does git actually care about your data? E.g., would it want to use
>      it for reachability analysis in git-fsck?
> 
>   2. Is it an immutable property of a commit, or can it be changed after
>      the fact?

Excellent points, and I have answers to both:

  1. Ideally, I would like to point to another blob containing
     information. Right now, in order to prevent gc from pruning
     it, that would have to be a commit pointed to with a parent
     pointer, which is just not right (it's not a parent) and causes
     the commit to show up in the history (which it should not, as
     it's an implementation detail).

     I'll return to this point further down…

  2. It is immutable. Ideally, I would like to store extra
     information for a ref in refs/heads/*, but there seems to be no
     way of doing this. Hence, I need to store it in commits and
     backtrack for it. Or so I think, at least…

> Otherwise, if (1) is yes, then a commit header makes sense. But
> then, it should also be something that git is taught about, and
> your commit header should not be some topgit-specific thing, but
> a header showing the generalized form.

I agree entirely and would be all too excited to see this happening.
I already had ideas too:

  In addition to the standard tree and parent pointers, there could
  be *-ref and x-*-ref headers, which take a single ref argument,
  presumably to a blob containing more data.

  While I cannot conceive of a *-ref example, I think it's obvious that
  x-*-ref should be introduced at the same time to keep the *-ref
  namespace clear for future, "official" Git use.

  In terms of gc and fsck and the like, all *-ref and x-*-ref
  headers would contribute to reachability tests and hence prevent
  pruning of those blobs.
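The proposed rule, under which gc and fsck would follow any *-ref or x-*-ref header, can be sketched as a reachability walk. This is hypothetical: git does not treat such headers specially. The x-topgit-base-ref header name is made up, and the object model (a dictionary of already-parsed objects) is simplified.

```python
from fnmatch import fnmatchcase

def is_ref_header(name: str) -> bool:
    """Proposed rule: '*-ref' headers (which includes 'x-*-ref') name one object."""
    return fnmatchcase(name, "*-ref")

def reachable(objects: dict, roots) -> set:
    """Collect every object id reachable from `roots` (a simplified gc/fsck walk)."""
    seen, stack = set(), list(roots)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        obj = objects[oid]
        if obj["type"] == "commit":
            stack.append(obj["tree"])
            stack.extend(obj["parents"])
            stack.extend(v for k, v in obj["headers"].items() if is_ref_header(k))
        elif obj["type"] == "tree":
            stack.extend(obj["entries"])
    return seen

objects = {
    "c1": {"type": "commit", "tree": "t1", "parents": [],
           "headers": {"x-topgit-base-ref": "b1"}},
    "t1": {"type": "tree", "entries": []},
    "b1": {"type": "blob"},  # kept alive only by the header
    "b2": {"type": "blob"},  # unreferenced: a prune candidate
}
print(sorted(set(objects) - reachable(objects, ["c1"])))  # ['b2']
```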

> Otherwise, the usual recommendation is to use a pseudo-header
> within the body of the commit message (i.e., "Topgit-Base: ..." at
> the end of the commit message). The upside is that it's easy to
> create, manipulate, and examine using existing git tools. The
> downside is that it is something that the user is more likely to
> see in "git log" or when editing a rebased commit message.

… to see *and to accidentally mess up*. And while that may even be
unlikely, it does expose information that really ought to be hidden.

> Just about every discussion on this topic ends with the
> pseudo-header recommendation. The only exceptions AFAIK are
> "encoding" (which git itself needs to care about), and
> "generation" (which, as you noted, raises other questions).

I can also see why it is debatable whether git commit objects
should be able to reference arbitrary blobs containing additional
information. I suppose the answer depends on whether Git is
a self-contained, complete tool as-is, or also serves as
a "framework"/"toolkit" for advanced/creative use.

The availability of the porcelain commands seems to suggest that
extensible/flexible additional features should be welcome! ;)

-- 
martin;              (greetings from the heart of the sun.)
  \____ echo mailto: !#^."<*>"|tr "<*> mailto:" net@madduck

http://www.transnationalrepublic.org/

* Re: Storing additional information in commit headers
  2011-08-01 21:11   ` martin f krafft
@ 2011-08-02  3:50     ` Jeff King
  2011-08-02  8:28       ` martin f krafft
  0 siblings, 1 reply; 23+ messages in thread
From: Jeff King @ 2011-08-02  3:50 UTC (permalink / raw)
  To: martin f krafft; +Cc: git discussion list, Petr Baudis, Clemens Buchacher

On Mon, Aug 01, 2011 at 11:11:04PM +0200, martin f krafft wrote:

> >   1. Does git actually care about your data? E.g., would it want to use
> >      it for reachability analysis in git-fsck?
> > 
> >   2. Is it an immutable property of a commit, or can it be changed after
> >      the fact?
> 
> Excellent points, and I have answers to both:
> 
>   1. Ideally, I would like to point to another blob containing
>      information. Right now, in order to prevent gc from pruning
>      it, that would have to be a commit pointed to with a parent
>      pointer, which is just not right (it's not a parent) and causes
>      the commit to show up in the history (which it should not, as
>      it's an implementation detail).

In that case, notes sound like a nice solution, as that is exactly what
they do. Yes, they are mutable, but that might not be that big a deal.

>   2. It is immutable. Ideally, I would like to store extra
>      information for a ref in ref/heads/*, but there seems to be no
>      way of doing this. Hence, I need to store it in commits and
>      backtrack for it. Or so I think, at least…

Wait, so you want metadata on a _ref_, not on a commit? That is a very
different thing, I think. We usually accomplish that with data in
.git/config. Or if you need to push data between repos, or if it's too
big to easily fit in the config, then put it in a blob and keep a
parallel ref structure (e.g., refs/topgit/bases/refs/heads/master).

Or maybe I'm just misunderstanding.

> > Otherwise, if (1) is yes, then a commit header makes sense. But
> > then, it should also be something that git is taught about, and
> > your commit header should not be some topgit-specific thing, but
> > a header showing the generalized form.
> 
> I agree entirely and would be all too excited to see this happening.
> I already had ideas too:
> 
>   In addition to the standard tree and parent pointers, there could
>   be *-ref and x-*-ref headers, which take a single ref argument,
>   presumably to a blob containing more data.

I'm not sure how well-defined that is, though. What does the ref mean?
What does it point to, and what is the meaning with respect to the
original commit? Or are you suggesting that "*" would be "topgit-base"
here, and that git core would understand only that any header matching
the pattern "x-*-ref" should be followed with respect to
reachability/pruning. Only the owner of the "*" part (topgit in this
case) would be able to make sense of the meaning of the ref.

If that is the case, that does make sense to me. It's basically an
immutable version of a note.

However, implementing such a thing would mean you have an awkward
transition period where some versions of git think the referenced object
is relevant, and others do not. That's something we can overcome, but
it's going to require code in git, and possibly a dormant introduction
period.

I suspect you would give git people more warm fuzzies about implementing
this by showing a system that is built on git-notes and saying "this
works really well, except that the external note storage is not a good
fit because { it's mutable, it's not efficient, or whatever other reason
you find }". And then we know that the system is proven to work, and that
migrating the note-like structure into the object is sensible.

But I get the impression you're one step back from that now. So it makes
sense to me to at least prototype it via git-notes, which will give you
the same semantic storage (a mapping of commits to some blobs, with
reachability handled automatically).
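The "mapping of commits to some blobs" that notes provide is stored as an ordinary tree. The notes ref points at a commit whose tree has one entry per annotated object, named by that object's 40-hex id. Once there are many notes, the id is split into two-character fan-out directories. The Python sketch below models that tree as nested dictionaries, an assumption made for illustration, and looks a note up under either layout.

```python
def lookup_note(tree: dict, commit_id: str):
    """Find the note blob for `commit_id` in a (modelled) notes tree.

    Keys are path components and leaves are blob ids. Notes may sit flat under
    the full id or under fan-out directories named by leading hex pairs.
    """
    node, rest = tree, commit_id
    while isinstance(node, dict):
        if rest in node:
            return node[rest]
        prefix = rest[:2]
        if prefix not in node or len(rest) <= 2:
            return None
        node, rest = node[prefix], rest[2:]  # descend one fan-out level
    return None

cid = "1234567890abcdef1234567890abcdef12345678"
flat = {cid: "note-blob"}
fanned = {"12": {cid[2:]: "note-blob"}}
print(lookup_note(flat, cid), lookup_note(fanned, cid))  # note-blob note-blob
```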

> > Otherwise, the usual recommendation is to use a pseudo-header
> > within the body of the commit message (i.e., "Topgit-Base: ..." at
> > the end of the commit message). The upside is that it's easy to
> > create, manipulate, and examine using existing git tools. The
> > downside is that it is something that the user is more likely to
> > see in "git log" or when editing a rebased commit message.
> 
> … to see *and to accidentally mess up*. And while that may even be
> unlikely, it does expose information that really ought to be hidden.

I'm not quite sure what the information is, so I can't really judge. Do
you have a concrete example?

I got the impression earlier you were wanting to store a human-readable
text string.  That makes a pseudo-header a reasonable choice. But if you
are going to reference some blob (which it seems from what you wrote
above), and you are interested in proper reachability analysis, then no,
it probably isn't a good idea.

> I can see how it's arguable too why one would want to give git
> commit objects the ability to reference arbitrary blobs containing
> additional information. I suppose the answer to this question is
> related to the answer to the question of whether Git is
> a contained/complete tool as-is, or also serves as
> a "framework"/"toolkit" for advanced/creative use.
> 
> The availability of the porcelain commands seems to suggest that
> extensible/flexible additional features should be welcome! ;)

I think extensibility is welcome. It's just that most discussions so far
have ended up realizing that a new header would just be cruft. Maybe
yours is different. I'm still not 100% sure I understand what you want
to accomplish, but the idea of an x-*-ref header is a reasonable thing
for git to have.

-Peff

* Re: Storing additional information in commit headers
  2011-08-02  3:50     ` Jeff King
@ 2011-08-02  8:28       ` martin f krafft
  2011-08-02 15:03         ` working prototype of orphan parent commits as datastores (was: Storing additional information in commit headers) martin f krafft
  2011-08-02 18:51         ` Storing additional information in commit headers Jeff King
  0 siblings, 2 replies; 23+ messages in thread
From: martin f krafft @ 2011-08-02  8:28 UTC (permalink / raw)
  To: Jeff King, git discussion list, Petr Baudis, Clemens Buchacher

[-- Attachment #1: Type: text/plain, Size: 9090 bytes --]

also sprach Jeff King <peff@peff.net> [2011.08.02.0550 +0200]:
> >   2. It is immutable. Ideally, I would like to store extra
> >      information for a ref in ref/heads/*, but there seems to be no
> >      way of doing this. Hence, I need to store it in commits and
> >      backtrack for it. Or so I think, at least…
> 
> Wait, so you want metadata on a _ref_, not on a commit? That is a very
> different thing, I think. We usually accomplish that with data in
> .git/config. Or if you need to push data between repos, or if it's too
> big to easily fit in the config, then put it in a blob and keep a
> parallel ref structure (e.g., refs/topgit/bases/refs/heads/master).
> 
> Or maybe I'm just misunderstanding.

You nailed it perfectly well. Thank you for taking the time again to
reply to me.

TopGit does what you suggest (a parallel ref structure), but there
are three problems with this, which I am trying to address:

  1. you need to ensure that these refs are pushed and fetched,
     which requires setup, invites migration issues when things
     change, and can cause big problems for contributors who just
     happen to forget.

  2. the additional refs confuse people a lot — and I can attest to
     that because I have also at times found myself overwhelmed by
     them when staring at gitk.

  3. once a ref updates, we need to keep a pointer to the previous
     location, since one of the goals is the ability to return to
     a point in history (e.g. for security updates to a stable
     package, or backports). Additional refs aggravate the two
     problems above.

Therefore I thought it would be sensible to store these data in
commits. When the data change, there will always be a new commit to
store these data, and we do *not* want to update the data in
previous commits. Finding the data then becomes backtracking the
branch history until a commit is found containing them.

> >   In addition to the standard tree and parent pointers, there could
> >   be *-ref and x-*-ref headers, which take a single ref argument,
> >   presumably to a blob containing more data.
>
> I'm not sure how well-defined that is, though. What does the ref
> mean? What does it point to, and what is the meaning with respect
> to the original commit? Or are you suggesting that "*" would be
> "topgit-base" here, and that git core would understand only that
> any header matching the pattern "x-*-ref" should be followed with
> respect to reachability/pruning. Only the owner of the "*" part
> (topgit in this case) would be able to make sense of the meaning
> of the ref.

Exactly the latter. Sorry for my unannounced use of wildcards in
this context.

> If that is the case, that does make sense to me. It's basically an
> immutable version of a note.
>
> However, implementing such a thing would mean you have an awkward
> transition period where some versions of git think the referenced
> object is relevant, and others do not. That's something we can
> overcome, but it's going to require code in git, and possibly
> a dormant introduction period.

Indeed. This could be addressed by letting a tool like TopGit require
a minimum version of Git. For a while, this will burden developers,
but ensure that it works. Over time, this will cease to be
a problem.

> I suspect you would give git people more warm fuzzies about
> implementing this by showing a system that is built on git-notes
> and saying "this works really well, except that the external note
> storage is not a good reason because { it's mutable, it's not
> efficient, whatever other reason you find}". And then we know that
> the system is proven to work, and that migrating the note-like
> structure into the object is sensible.
>
> But I get the impression you're one step back from that now. So it makes
> sense to me to at least prototype it via git-notes, which will give you
> the same semantic storage (a mapping of commits to some blobs, with
> reachability handled automatically).

I appreciate how you are developing your reasoning, and the advice
you give.

Indeed, I am already prototyping using git-notes, and I designed the
datastore to be extensible, so that I can use other ways to find the
data.
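
A rough sketch of such a git-notes prototype follows. It assumes a
dedicated notes ref (refs/notes/topgit) and two hypothetical helper
names, tg_meta_set and tg_meta_get. Only the git notes subcommands
themselves are real; everything else is illustration:

```shell
# Hypothetical helpers: store TopGit metadata as a note on a commit,
# kept under a dedicated notes ref (refs/notes/topgit is an assumption).

# usage: tg_meta_set [<commit>] < metadata
tg_meta_set() {
  # -f replaces an existing note; -F - reads the note text from stdin
  git notes --ref=topgit add -f -F - "${1:-HEAD}"
}

# usage: tg_meta_get [<commit>]
tg_meta_get() {
  git notes --ref=topgit show "${1:-HEAD}"
}
```

The notes history keeps older metadata reachable, but refs/notes/topgit
still has to be pushed and fetched explicitly (for example with a
refs/notes/* refspec), which is the same setup burden the parallel-ref
approach has.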

Using pseudo-headers is another (temporary) way to prove that the
concept works, but I am afraid that it would become standard too
quickly (because it's so easy), essentially blocking progress toward
the x-*-ref approach, or forcing us to carry backward compatibility
forever.

What do you think about using the idea of orphan parent commits
(OPC) for now? These are conceptually closest to the x-*-ref
pointers, do not require extra setup, pollute history only a little
bit (IMHO), and slot in with Git and fsck/gc alright.

Here's the idea again, graphically:

  o--o--o--●
       /
      #

Starting at HEAD, I would backtrack through history until I found
HEAD^, which has a parent with a well-defined commit message that
holds the data I am looking for.

Later, once x-*-ref is in mainline Git, it can be used in place of
the parent pointers.
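
The lookup step ("backtrack until a commit has a parent that is a data
node") might be sketched as below. The marker subject "TopGit data
node" is taken from the tig output of the prototype; the function name
opc_find and the choice to walk only the first-parent chain are
assumptions of this sketch, not details of the actual prototype:

```shell
# Hypothetical sketch: find the nearest orphan parent commit (OPC) that
# carries TopGit data, starting from a given commit (default HEAD).
OPC_MARKER='TopGit data node'   # subject line of every data node (assumed)

# usage: opc_find [<commit>]; prints the OPC id, fails if none is found
opc_find() (
  found=$(
    git rev-list --first-parent "${1:-HEAD}" |
    while read -r commit; do
      # every parent except the first is a candidate data node
      for p in $(git rev-parse "$commit^@" | tail -n +2); do
        if [ "$(git log -1 --format=%s "$p")" = "$OPC_MARKER" ]; then
          echo "$p"
          exit 0   # stop walking; this only leaves the pipeline subshell
        fi
      done
    done
  )
  [ -n "$found" ] && echo "$found"
)
```

Once the OPC is found, the data is just file content in its tree; if a
data node stored a file named "data" (again an assumption), it could be
read with git cat-file blob "<opc>:data".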

When a merge requires the TopGit data to be updated, a new OPC is
slotted into place on the merge commit. In the following graph, the
user also decided at a later point to update, e.g., the TopGit patch
description (.topmsg), which is likewise stored in an OPC:

       o--o-o
      /      \      maint       master
  o--o--o--o--+--o--O--o--o--o--●
       /     /           /
      #     #           #

To keep things simple, every OPC copies the unchanged data from the
previous one as well (compression will reduce the overhead).

Later, I can use the maint branch in exactly the same way as I could
use master when it was at that point in history.

> > > Otherwise, the usual recommendation is to use a pseudo-header
> > > within the body of the commit message (i.e., "Topgit-Base: ..." at
> > > the end of the commit message). The upside is that it's easy to
> > > create, manipulate, and examine using existing git tools. The
> > > downside is that it is something that the user is more likely to
> > > see in "git log" or when editing a rebased commit message.
> > 
> > … to see *and to accidentally mess up*. And while that may even be
> > unlikely, it does expose information that really ought to be hidden.
> 
> I'm not quite sure what the information is, so I can't really judge. Do
> you have a concrete example?
> 
> I got the impression earlier you were wanting to store a human-readable
> text string.  That makes a pseudo-header a reasonable choice. But if you
> are going to reference some blob (which it seems from what you wrote
> above), and you are interested in proper reachability analysis, then no,
> it probably isn't a good idea.
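
For comparison, the pseudo-header route needs nothing beyond stock
git. A minimal sketch, assuming the trailer key "Topgit-Base" from the
quoted message and a hypothetical helper name, tg_base_of:

```shell
# Hypothetical helper: read a "Topgit-Base:" pseudo-header from the end of
# a commit message (default: HEAD). Prints nothing if it is absent.
tg_base_of() {
  git log -1 --format=%B "${1:-HEAD}" | sed -n 's/^Topgit-Base: //p'
}

# Recording it is just part of writing the message, for example:
#   git commit -m 'Add frobnicator' -m 'Topgit-Base: <commit id>'
```

This also shows the downside raised above: the line is ordinary message
text, so it appears in "git log" and can be edited away during a rebase.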

I am not yet sure what information needs storing. Right now, I am
keeping five fields:

  Depend-Refs         A list of the most recent branch points from
                      dependency branches, so that a tool can tell
                      when the dependent branch needs an update
                      (commits following those refs that are not
                      reachable from the branch head).
  Base-Ref            The ref to the most recent merge of all
                      dependencies, used to create diffs.
  Patch-Branch        Boolean indicating whether this branch is
                      designed to develop a single patch for
                      submission or use in a quilt series.
  Patch-Message       Patch description (think git-send-email).
  Integration-Branch  Boolean indicating whether this branch is
                      instead designed to collect features.

At the moment, I do not know which of those are necessary, and which
ones I am missing.

The flexibility of being able to store as much as I want, in
whatever format I want, without having to fear overloading the
commit message or burdening the user, is what makes me want to use
refs to blobs.
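
One possible layout for such a blob, sketched under the assumption that
it borrows the shape of a commit object: "Key: value" lines for the
fields above, a blank line, then the free-form Patch-Message as the
body. The helper names are hypothetical:

```shell
# Hypothetical blob layout: "Key: value" header lines, a blank line,
# then the free-form patch description as the body.

# usage: tg_meta_field <blob> <key>; prints the value from the header part
tg_meta_field() {
  git cat-file blob "$1" | sed -n -e '/^$/q' -e "s/^$2: //p"
}

# usage: tg_meta_message <blob>; prints everything after the first blank line
tg_meta_message() {
  git cat-file blob "$1" | sed '1,/^$/d'
}
```

Stopping at the first blank line keeps a body line that happens to
start with, say, "Base-Ref: " from being mistaken for a field.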

> I think extensibility is welcome. It's just that most discussions
> so far have ended up realizing that a new header would just be
> cruft. Maybe yours is different. I'm still not 100% sure
> I understand what you want to accomplish, but the idea of an
> x-*-ref header is a reasonable thing for git to have.

I think there are two questions:

  1. would x-*-ref be a suitable idea for Git core?

     I think the answer is yes, as (I think) it's well-defined and
     I cannot see any problems with it, really.

  2. can we prevent abuse?

     No, never. But just as X-* headers in the RFC822 format cannot
     really be abused, by design, x-*-ref abuse would only affect
     those who choose to use it.

Thank you,

-- 
martin | http://madduck.net/ | http://two.sentenc.es/
 
"the question of whether computers can think
 is like the question of whether submarines can swim."
                                                 -- edsger w. dijkstra
 
spamtraps: madduck.bogus@madduck.net

[-- Attachment #2: Digital signature (see http://martin-krafft.net/gpg/sig-policy/999bbcc4/current) --]
[-- Type: application/pgp-signature, Size: 1124 bytes --]

* working prototype of orphan parent commits as datastores (was: Storing additional information in commit headers)
  2011-08-02  8:28       ` martin f krafft
@ 2011-08-02 15:03         ` martin f krafft
  2011-08-02 18:57           ` Jeff King
  2011-08-02 18:51         ` Storing additional information in commit headers Jeff King
  1 sibling, 1 reply; 23+ messages in thread
From: martin f krafft @ 2011-08-02 15:03 UTC
  To: Jeff King, git discussion list, Petr Baudis, Clemens Buchacher

[-- Attachment #1: Type: text/plain, Size: 4446 bytes --]

also sprach martin f krafft <madduck@madduck.net> [2011.08.02.1028 +0200]:
> What do you think about using the idea of orphan parent commits
> (OPC) for now? These are conceptually closest to the x-*-ref
> pointers, do not require extra setup, pollute history only a little
> bit (IMHO), and slot in with Git and fsck/gc alright.
> 
> Here's the idea again, graphically:
> 
>   o--o--o--●
>        /
>       #
> 
> Starting at HEAD, I would backtrack through history until I found
> HEAD^, which has a parent with a well-defined commit message that
> holds the data I am looking for.
> 
> Later, once x-*-ref is in mainline Git, it can be used in place of
> the parent pointers.
> 
> When a merge requires the TopGit data to be updated, a new OPC is
> slotted into place on the merge commit. In the following graph, the
> user also decided at a later point to update, e.g., the TopGit patch
> description (.topmsg), which is likewise stored in an OPC:
> 
>        o--o-o
>       /      \      maint       master
>   o--o--o--o--+--o--O--o--o--o--●
>        /     /           /
>       #     #           #
> 
> To keep things simple, every OPC copies the unchanged data from the
> previous one as well (compression will reduce the overhead).

I have published a working prototype of this kind of datastore, in
case people are interested:

  http://git.madduck.net/v/code/topgit-ng.git

Here is a brief synopsis:

% ./tg-datastore list
I: returns non-zero if no datastore found at given commit.
I: prints contents of datastore otherwise.
message: this is a proof-of-concept

% ./tg-datastore find commitref
I: prints the value of the parameter, or empty if parameter is not found.
I: returns non-zero if no datastore was found.
dc58ec49df849ec1aef6929cd40c759a6018e056

% git commit --allow-empty -mone
[master 78918bb] one

% git commit --allow-empty -mtwo
[master 7eca0cd] two

% ./tg-datastore find message
I: prints the value of the parameter, or empty if parameter is not found.
I: returns non-zero if no datastore was found.
this is a proof-of-concept

% ./tg-datastore find commitref
I: prints the value of the parameter, or empty if parameter is not found.
I: returns non-zero if no datastore was found.
dc58ec49df849ec1aef6929cd40c759a6018e056

% ./tg-datastore add message='this is a new message'
I: returns non-zero if there is already a datastore on HEAD.
I: adding the following data to the datastore of HEAD:
I:   message: this is a new message

% ./tg-datastore find commitref
I: prints the value of the parameter, or empty if parameter is not found.
I: returns non-zero if no datastore was found.
8e6179050a1aca5485f3e1702780f1b555d8643b

% ./tg-datastore find message
I: prints the value of the parameter, or empty if parameter is not found.
I: returns non-zero if no datastore was found.
this is a new message

  tig output now:
    2011-08-02 16:52 martin f. krafft   M─┐ [master] two
    2011-08-02 16:54 TopGit             │ I TopGit data node
    2011-08-02 16:52 martin f. krafft   I one
    2011-08-02 16:50 martin f. krafft   M─┐ [origin/master] import first prototype
    2011-08-02 16:50 TopGit             │ I TopGit data node
    2011-08-02 16:48 martin f. krafft   I Initial (empty) root commit

% ./tg-datastore remove
I: always returns zero, even if there was nothing to remove.

% ./tg-datastore find message
I: prints the value of the parameter, or empty if parameter is not found.
I: returns non-zero if no datastore was found.
this is a proof-of-concept

Note three things:

  1. I am actually using an x-* header in the TopGit data node commit
     object to help identify it as a data node. This could be done
     differently (e.g. by parsing the commit message for a magic
     string), but I chose this approach on purpose to see how well it
     fares.

  2. If Git grew x-*-ref headers (refs to objects in general),
     I could use that instead and drop the parent pointer, which
     would make the DAG cleaner.

  3. Right now, you cannot add orphan parent commits to the orphans
     themselves, but it would be trivial to enable. I just couldn't
     be bothered.

Enjoy, and comments of course welcome.

-- 
martin | http://madduck.net/ | http://two.sentenc.es/

windoze nt crashed.
i am the blue screen of death.
no one hears your screams.

spamtraps: madduck.bogus@madduck.net

[-- Attachment #2: Digital signature (see http://martin-krafft.net/gpg/sig-policy/999bbcc4/current) --]
[-- Type: application/pgp-signature, Size: 1124 bytes --]

* Re: working prototype of orphan parent commits as datastores (was: Storing additional information in commit headers)
  2011-08-02 15:03         ` working prototype of orphan parent commits as datastores (was: Storing additional information in commit headers) martin f krafft
@ 2011-08-02 18:57           ` Jeff King
  2011-08-02 19:09             ` martin f krafft
  0 siblings, 1 reply; 23+ messages in thread
From: Jeff King @ 2011-08-02 18:57 UTC
  To: martin f krafft; +Cc: git discussion list, Petr Baudis, Clemens Buchacher

On Tue, Aug 02, 2011 at 05:03:21PM +0200, martin f krafft wrote:

>   tig output now:
>     2011-08-02 16:52 martin f. krafft   M─┐ [master] two
>     2011-08-02 16:54 TopGit             │ I TopGit data node
>     2011-08-02 16:52 martin f. krafft   I one
>     2011-08-02 16:50 martin f. krafft   M─┐ [origin/master] import first prototype
>     2011-08-02 16:50 TopGit             │ I TopGit data node
>     2011-08-02 16:48 martin f. krafft   I Initial (empty) root commit

Look at "git show origin/master" here. It ends up as a combined diff.
Which is kind of ugly.

-Peff

* Re: working prototype of orphan parent commits as datastores (was: Storing additional information in commit headers)
  2011-08-02 18:57           ` Jeff King
@ 2011-08-02 19:09             ` martin f krafft
  2011-08-02 19:26               ` martin f krafft
  0 siblings, 1 reply; 23+ messages in thread
From: martin f krafft @ 2011-08-02 19:09 UTC
  To: Jeff King, git discussion list, Petr Baudis, Clemens Buchacher

[-- Attachment #1: Type: text/plain, Size: 693 bytes --]

also sprach Jeff King <peff@peff.net> [2011.08.02.2057 +0200]:
> Look at "git show origin/master" here. It ends up as a combined diff.
> Which is kind of ugly.

Yes, absolutely. However, this would no longer be the case if
x-*-ref could be used. Right now, I am just using orphan parent
commits to avoid garbage collection.

-- 
martin | http://madduck.net/ | http://two.sentenc.es/
 
"he gave me his card
 he said, 'call me if they die'
 i shook his hand and said goodbye
 ran out to the street
 when a bowling ball came down the road
 and knocked me off my feet"
                                                        -- bob dylan
 
spamtraps: madduck.bogus@madduck.net

[-- Attachment #2: Digital signature (see http://martin-krafft.net/gpg/sig-policy/999bbcc4/current) --]
[-- Type: application/pgp-signature, Size: 1124 bytes --]

* Re: working prototype of orphan parent commits as datastores (was: Storing additional information in commit headers)
  2011-08-02 19:09             ` martin f krafft
@ 2011-08-02 19:26               ` martin f krafft
  0 siblings, 0 replies; 23+ messages in thread
From: martin f krafft @ 2011-08-02 19:26 UTC
  To: Jeff King, git discussion list, Petr Baudis, Clemens Buchacher

[-- Attachment #1: Type: text/plain, Size: 793 bytes --]

also sprach martin f krafft <madduck@madduck.net> [2011.08.02.2109 +0200]:
> Yes, absolutely. However, this would no longer be the case if
> x-*-ref could be used. Right now, I am just using orphan parent
> commits to avoid garbage collection.

refs/heads/master is a file, containing its payload in the first
line by format definition, right?

I mean: the storage is right there, isn't it?

Of course this opens a whole new can of worms: merging per-ref data.

-- 
martin | http://madduck.net/ | http://two.sentenc.es/
 
"'oh, that was easy,' says Man, and for an encore goes on to prove
 that black is white and gets himself killed on the next zebra
 crossing."
            -- douglas adams, "the hitchhiker's guide to the galaxy"
 
spamtraps: madduck.bogus@madduck.net

[-- Attachment #2: Digital signature (see http://martin-krafft.net/gpg/sig-policy/999bbcc4/current) --]
[-- Type: application/pgp-signature, Size: 1124 bytes --]

* Re: Storing additional information in commit headers
  2011-08-02  8:28       ` martin f krafft
  2011-08-02 15:03         ` working prototype of orphan parent commits as datastores (was: Storing additional information in commit headers) martin f krafft
@ 2011-08-02 18:51         ` Jeff King
  2011-08-02 19:06           ` martin f krafft
  1 sibling, 1 reply; 23+ messages in thread
From: Jeff King @ 2011-08-02 18:51 UTC
  To: martin f krafft; +Cc: git discussion list, Petr Baudis, Clemens Buchacher

On Tue, Aug 02, 2011 at 10:28:10AM +0200, martin f krafft wrote:

> TopGit does what you suggest (a parallel ref structure), but there
> are three problems with this, which I am trying to address:
> 
>   1. you need to ensure that these refs are pushed and fetched,
>      which requires setup and possible migration issues when things
>      change, and can cause big problems for contributors who just so
>      happened to forget.

I agree that is an annoyance, but it is one we can deal with. In the
near term, I wonder if a "tg clone" would be appropriate to add the
extra fetch refspecs when cloning (or even a "tg init" inside an
existing git repo -- I don't actually use topgit, so I'm not sure what
the usual initialization process, if any, is).
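
What such a setup step might do is sketched below. It assumes the
metadata lives under refs/topgit/* (a hypothetical namespace) and that
fetched copies should land under refs/remotes/<remote>/topgit/*; the
function name is made up, while git config and the refspec syntax are
used as documented:

```shell
# Hypothetical "tg init"-style helper: make "git fetch <remote>" also fetch
# the refs/topgit/* hierarchy. Safe to run more than once.
# usage: tg_setup_fetch [<remote>]
tg_setup_fetch() {
  remote=${1:-origin}
  spec="+refs/topgit/*:refs/remotes/$remote/topgit/*"
  # only add the refspec if it is not configured already
  if ! git config --get-all "remote.$remote.fetch" | grep -qxF "$spec"; then
    git config --add "remote.$remote.fetch" "$spec"
  fi
}
```

Pushing is deliberately left out: once any remote.<name>.push refspec
is configured, a bare "git push" sends only what those refspecs name,
so the sketch assumes an explicit
git push origin 'refs/topgit/*:refs/topgit/*' instead.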

In the longer term, it might be nice if git was better at sharing
third-party refs. The problem is that we don't know what the refs mean,
so we don't know which ones are appropriate for sharing. Maybe we could
do something like "refs/shared/topgit/*", and git by default would push
and pull items under refs/shared?

There have also been proposals to have a more mirror-like structure to
what we fetch from remotes. E.g., to put remote refs/tags into
refs/remotes/origin/refs/tags, and similar for notes. It may be that it
is sensible for us to just fetch everything from a remote into
refs/remotes, including unknown hierarchies like topgit.

>   2. the additional refs confuse people a lot — and I can attest to
>      that because I have also at times found myself overwhelmed by
>      them when staring at gitk.

Using "gitk --all", I assume? I agree it is annoying, though "gitk
--branches" probably better specifies what you want (unless you put
the parallel ref structure under refs/heads, which would also solve
the "should it be fetched" problem).

>   3. once a ref updates, we need to keep a pointer to the previous
>      location, since one of the goals is the ability to be able to
>      return to a point in history (e.g. for security updates to
>      a stable package, or backports). Additional refs exacerbate the
>      aforementioned two problems.

Reflogs provide a linear history of the ref updates, but I suspect you
want to be able to push and pull these histories. Which reflogs will not
do.

If you want to version the state of refs, then using raw refs isn't the
right answer. You want a separate commit history with trees that map ref
names to commits or other objects. Which is _almost_ what notes are;
they map commit sha1s to objects, but you want to map ref names.

> Therefore I thought it would be sensible to store these data in
> commits. When the data change, there will always be a new commit to
> store these data, and we do *not* want to update the data in
> previous commits. Finding the data then becomes backtracking the
> branch history until a commit is found containing them.

That seems to me like you are sticking information in a commit that is
not actually about the commit, but about the ref that happens to point
to the commit. What if I have two refs that point to the same commit,
but with two different topgit bases? What about years later, when that
information isn't interesting anymore? You're still carrying the cruft
inside your commit objects.

> > However, implementing such a thing would mean you have an awkward
> > transition period where some versions of git think the referenced
> > object is relevant, and others do not. That's something we can
> > overcome, but it's going to require code in git, and possibly
> > a dormant introduction period.
> 
> Indeed. This could be addressed by letting a tool like TopGit require
> a minimum version of Git. For a while, this will burden developers,
> but ensure that it works. Over time, this will cease to be
> a problem.

Keep in mind that your requirement is not just a local thing. Object
reachability is something that both sides of a transfer need to agree
on. So imagine you use TopGit with a new version of git, and you push to
a site like GitHub. The remote side will take your objects, but it will
not send them back to anyone who fetches from your repository (since it
has no idea they're relevant). And it will probably prune them after a
week or two.
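
The reachability point can be seen directly with a hand-built commit
object. The sketch assumes a header named x-topgit-data-ref
(hypothetical, in the spirit of the x-*-ref proposal) and writes the
object with git hash-object. Current git keeps such a header intact,
but does not treat the object it names as reachable:

```shell
# Hypothetical demo helper: write a commit object by hand that carries an
# extra "x-topgit-data-ref" header pointing at another object.
# usage: make_commit_with_xref <tree> <object>; prints the new commit id
make_commit_with_xref() {
  # fixed timestamp so the resulting object id is reproducible
  ident='TopGit <topgit@example.com> 1312300000 +0000'
  printf 'tree %s\nauthor %s\ncommitter %s\nx-topgit-data-ref %s\n\nTopGit data node\n' \
    "$1" "$ident" "$ident" "$2" |
    git hash-object -t commit -w --stdin
}
```

Running git rev-list --objects on the resulting commit lists the commit
and its tree but not the referenced blob, so a fetch from such a
repository would not transfer the blob, and git gc would eventually
prune it.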

> What do you think about using the idea of orphan parent commits
> (OPC) for now? These are conceptually closest to the x-*-ref
> pointers, do not require extra setup, pollute history only a little
> bit (IMHO), and slot in with Git and fsck/gc alright.

It doesn't seem like a good idea to me. Parent pointers have a
well-defined meaning, and other parts of git (and other tools, even) are
going to assume that's what your parent pointers mean. They are used in
merge base calculations, for example. I _think_ you are mostly safe
here, because your OPC wouldn't have any real history to it, so finding
a merge base down that path would be fruitless.

But consider something like "diff", which shows a merge commit
differently than a regular commit. Your commits will unexpectedly appear
as merges to git, and we will show a combined diff versus the OPC, which
is going to be ugly.

> I am not yet sure what information needs storing. Right now, I am
> keeping five fields:
> [...]

Thanks, that helped with getting a sense of what you're doing.

> I think there are two questions:
> 
>   1. would x-*-ref be a suitable idea for Git core?
> 
>      I think the answer is yes, as (I think) it's well-defined and
>      I cannot see any problems with it, really.

I think it's a nice idea for extensibility. And if it had been there
from day one, there would be no problems. But now we have to deal with
the transition period, and the fact that two different versions of git
will have different ideas about the set of objects that are reachable
from a given commit.

>   2. can we prevent abuse?
> 
>      No, never. But just like you cannot abuse X-* headers in the
>      RFC822 format due to their design, x-*-ref abuse would only
>      affect those who chose it.

I don't worry about abuse. You can already stick random cruft in a
commit header, and you can already connect objects to a commit via tree
entries. This idea is just giving git some rules for dealing with it.

I'm still not 100% convinced you want per-commit storage, though, and
not per-ref storage.

-Peff

* Re: Storing additional information in commit headers
  2011-08-02 18:51         ` Storing additional information in commit headers Jeff King
@ 2011-08-02 19:06           ` martin f krafft
  2011-08-02 19:27             ` per-ref data storage (was: Storing additional information in commit headers) martin f krafft
  2011-08-04  3:39             ` Storing additional information in commit headers Jeff King
  0 siblings, 2 replies; 23+ messages in thread
From: martin f krafft @ 2011-08-02 19:06 UTC
  To: Jeff King, git discussion list, Petr Baudis, Clemens Buchacher

[-- Attachment #1: Type: text/plain, Size: 2701 bytes --]

also sprach Jeff King <peff@peff.net> [2011.08.02.2051 +0200]:
> I agree that is an annoyance, but it is one we can deal with. In the
> near term, I wonder if a "tg clone" would be appropriate to add the
> extra fetch refspecs when cloning (or even a "tg init" inside an
> existing git repo -- I don't actually use topgit, so I'm not sure what
> the usual initialization process, if any, is).

Hey Jeff, thanks for your response.

TopGit does come with these commands to do the setup for you, but
that does not ensure that a new contributor without any idea about
TopGit won't forget to run them.

The argument against tg-clone is mainly that I really do not want to
encapsulate/abstract functionality, but rather stay as close as
possible to pure Git, and never require anyone to use anything
else.

> In the longer term, it might be nice if git was better at sharing
> third-party refs. The problem is that we don't know what the refs
> mean, so we don't know which ones are appropriate for sharing.
> Maybe we could do something like "refs/shared/topgit/*", and git
> by default would push and pull items under refs/shared?

This could be an interesting and viable approach.

> > Therefore I thought it would be sensible to store these data in
> > commits. When the data change, there will always be a new commit to
> > store these data, and we do *not* want to update the data in
> > previous commits. Finding the data then becomes backtracking the
> > branch history until a commit is found containing them.
>
> That seems to me like you are sticking information in a commit that is
> not actually about the commit, but about the ref that happens to point
> to the commit. What if I have two refs that point to the same commit,
> but with two different topgit bases?

I don't think this can happen, but the point is valid.

> What about years later, when that information isn't interesting
> anymore? You're still carrying the cruft inside your commit
> objects.
[…]
> I'm still not 100% convinced you want per-commit storage, though,
> and not per-ref storage.

Yes, I do want per-ref storage. Your arguments against my orphan
parent pointer approach (which could later be a x-*-ref approach)
are valid.

It just seems to me that per-ref storage is a lot further away than
per-commit storage, and I'd really like to move forward with TopGit…

Thank you,

-- 
martin | http://madduck.net/ | http://two.sentenc.es/
 
"one should never trust a woman who tells her real age.
 if she tells that, she will tell anything."
                                                        -- oscar wilde
 
spamtraps: madduck.bogus@madduck.net

[-- Attachment #2: Digital signature (see http://martin-krafft.net/gpg/sig-policy/999bbcc4/current) --]
[-- Type: application/pgp-signature, Size: 1124 bytes --]

* per-ref data storage (was: Storing additional information in commit headers)
  2011-08-02 19:06           ` martin f krafft
@ 2011-08-02 19:27             ` martin f krafft
  2011-08-02 21:12               ` per-ref data storage martin f krafft
  2011-08-04  3:41               ` per-ref data storage (was: Storing additional information in commit headers) Jeff King
  2011-08-04  3:39             ` Storing additional information in commit headers Jeff King
  1 sibling, 2 replies; 23+ messages in thread
From: martin f krafft @ 2011-08-02 19:27 UTC
  To: Jeff King, git discussion list, Petr Baudis, Clemens Buchacher

[-- Attachment #1: Type: text/plain, Size: 775 bytes --]

[sorry, my previous message was a total reply FAIL]

also sprach martin f krafft <madduck@madduck.net> [2011.08.02.2106 +0200]:
> It just seems to me that per-ref storage is a lot further away than
> per-commit storage, and I'd really like to move forward with TopGit…

refs/heads/master is a file, containing its payload in the first
line by format definition, right?

I mean: the storage is right there, isn't it?

Of course this opens a whole new can of worms: merging per-ref data.

-- 
martin | http://madduck.net/ | http://two.sentenc.es/
 
"nothing can cure the soul but the senses,
 just as nothing can cure the senses but the soul."
                                                        -- oscar wilde
 
spamtraps: madduck.bogus@madduck.net

[-- Attachment #2: Digital signature (see http://martin-krafft.net/gpg/sig-policy/999bbcc4/current) --]
[-- Type: application/pgp-signature, Size: 1124 bytes --]

* Re: per-ref data storage
  2011-08-02 19:27             ` per-ref data storage (was: Storing additional information in commit headers) martin f krafft
@ 2011-08-02 21:12               ` martin f krafft
  2011-08-04  3:41               ` per-ref data storage (was: Storing additional information in commit headers) Jeff King
  1 sibling, 0 replies; 23+ messages in thread
From: martin f krafft @ 2011-08-02 21:12 UTC
  To: Jeff King, git discussion list, Petr Baudis, Clemens Buchacher

[-- Attachment #1: Type: text/plain, Size: 1020 bytes --]

also sprach martin f krafft <madduck@madduck.net> [2011.08.02.2127 +0200]:
> refs/heads/master is a file, containing its payload in the first
> line by format definition, right?
> 
> I mean: the storage is right there, isn't it?
> 
> Of course this opens a whole new can of worms: merging per-ref data.

origin/master can contain a different set of per-ref data than
master, and the consolidation would need to happen during the normal
merge.

But unless there's always a new commit associated with a change of
those data, git-push will happily overwrite those data on the
remote.

… unless the remote refuses to accept a ref update if the data have
changed. Conceivably, that could lead to a control path similar to
what happens on a non-fast-forward push (unless
receive.denyNonFastForwards is set).

What then?

-- 
martin | http://madduck.net/ | http://two.sentenc.es/

seminars, n.:
  from "semi" and "arse", hence, any half-assed discussion.

spamtraps: madduck.bogus@madduck.net

[-- Attachment #2: Digital signature (see http://martin-krafft.net/gpg/sig-policy/999bbcc4/current) --]
[-- Type: application/pgp-signature, Size: 1124 bytes --]

* Re: per-ref data storage (was: Storing additional information in commit headers)
  2011-08-02 19:27             ` per-ref data storage (was: Storing additional information in commit headers) martin f krafft
  2011-08-02 21:12               ` per-ref data storage martin f krafft
@ 2011-08-04  3:41               ` Jeff King
  1 sibling, 0 replies; 23+ messages in thread
From: Jeff King @ 2011-08-04  3:41 UTC
  To: martin f krafft; +Cc: git discussion list, Petr Baudis, Clemens Buchacher

On Tue, Aug 02, 2011 at 09:27:28PM +0200, martin f krafft wrote:

> [sorry, my previous message was a total reply FAIL]
> 
> also sprach martin f krafft <madduck@madduck.net> [2011.08.02.2106 +0200]:
> > It just seems to me that per-ref storage is a lot further away than
> > per-commit storage, and I'd really like to move forward with TopGit…
> 
> refs/heads/master is a file, containing its payload in the first
> line by format definition, right?
> 
> I mean: the storage is right there, isn't it?

Yes, and I think git will even ignore other stuff in the file. But I
don't think you can count on git not obliterating the other stuff when
it updates the ref. Nor would it be passed over a clone or fetch.

> Of course this opens a whole new can of worms: merging per-ref data.

Yes. That's the tricky part. And that's something you'll have to deal
with no matter how you store it, I expect.

-Peff

* Re: Storing additional information in commit headers
  2011-08-02 19:06           ` martin f krafft
  2011-08-02 19:27             ` per-ref data storage (was: Storing additional information in commit headers) martin f krafft
@ 2011-08-04  3:39             ` Jeff King
  1 sibling, 0 replies; 23+ messages in thread
From: Jeff King @ 2011-08-04  3:39 UTC
  To: martin f krafft; +Cc: git discussion list, Petr Baudis, Clemens Buchacher

On Tue, Aug 02, 2011 at 09:06:45PM +0200, martin f krafft wrote:

> It just seems to me that per-ref storage is a lot further away than
> per-commit storage, and I'd really like to move forward with TopGit…

I don't think it's that hard. For example:

  # our mapping for all refs, and the history of that mapping, will be
  # stored under this ref
  MAP=refs/topgit/metadata

  # usage: refmap_set <refname>, with the new metadata on stdin
  refmap_set() {
    (
      # start with a pristine private index based on the current map,
      # so the user's own index is never touched
      GIT_INDEX_FILE="$(git rev-parse --git-dir)/tg-meta-index"
      export GIT_INDEX_FILE
      rm -f "$GIT_INDEX_FILE"
      parent=$(git rev-parse -q --verify "$MAP")
      if test -n "$parent"; then
        git read-tree "$parent"
      fi

      # and then put our new ref and metadata in
      blob=$(git hash-object --stdin -w)
      git update-index --add --cacheinfo 100644 "$blob" "$1"
      tree=$(git write-tree)
      commit=$(echo 'updated map' | git commit-tree "$tree" ${parent:+-p "$parent"})
      # only move the map if nobody else updated it in the meantime
      git update-ref "$MAP" "$commit" "$parent"
    )
  }

  # usage: refmap_get <refname>
  refmap_get() {
    git cat-file blob "$MAP:$1"
  }

  # and some examples of use
  echo some metadata | refmap_set refs/heads/foo
  refmap_get refs/heads/foo |
    sed 's/meta/changed &/' |
    refmap_set refs/heads/foo

It's a little more clunky than notes, of course, but it's not too bad to
put into a script. The tricky part is how to handle fetching and merging
the metadata ref from other people. But that's not really different from
notes. In either case, you're probably going to want to make a custom
merge program for combining the meta-information.
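
As a rough illustration of such a merge helper, the sketch below merges
the metadata for a single ref from two divergent versions of a
refmap-style history (such as refs/topgit/metadata and a fetched copy
of it), using git merge-file for a line-based three-way merge. The
function name and the purely textual merge are assumptions; a real
tool would probably want field-aware rules, e.g. for Depend-Refs:

```shell
# Hypothetical helper: three-way merge of the metadata stored for one ref
# in two versions of a refmap-style history.
# usage: refmap_merge_entry <our-map-commit> <their-map-commit> <refname>
# Prints the merged text; exits non-zero if there were conflicts.
refmap_merge_entry() (
  ours=$1 theirs=$2 ref=$3
  base=$(git merge-base "$ours" "$theirs") || exit 1
  tmp=$(mktemp -d) || exit 1
  # an entry that is missing on one side is treated as empty
  git cat-file blob "$base:$ref" >"$tmp/base" 2>/dev/null || : >"$tmp/base"
  git cat-file blob "$ours:$ref" >"$tmp/ours" 2>/dev/null || : >"$tmp/ours"
  git cat-file blob "$theirs:$ref" >"$tmp/theirs" 2>/dev/null || : >"$tmp/theirs"
  # merge-file's exit status is the number of conflicts (0 on a clean merge)
  git merge-file -p "$tmp/ours" "$tmp/base" "$tmp/theirs"
  status=$?
  rm -rf "$tmp"
  exit "$status"
)
```

The merged text could then be fed back through refmap_set, with the
resulting map commit recording both map histories as parents.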

-Peff

* Re: Storing additional information in commit headers
  2011-08-01 18:20 Storing additional information in commit headers martin f krafft
                   ` (2 preceding siblings ...)
  2011-08-01 20:13 ` Jeff King
@ 2011-08-02 13:53 ` Michael Haggerty
  3 siblings, 0 replies; 23+ messages in thread
From: Michael Haggerty @ 2011-08-02 13:53 UTC
  To: martin f krafft; +Cc: git discussion list, Petr Baudis, Clemens Buchacher

On 08/01/2011 08:20 PM, martin f krafft wrote:
> Are there any strong reasons against my use of commit headers for
> specific, well-defined purposes in contained use-cases? E.g. are
> there tools known to only copy "known" headers, which could
> potentially break my assumptions?

Before you store important information in a git-internal data structure,
please consider:

* Some of your developers might prefer to use another DVCS (e.g.,
Mercurial via hg-git), and they will not be able to see the information
at all.

* Some day the main project might want to (god forbid!) switch to a
successor to git, and your extra information might be difficult to migrate.

* Somebody might want to work with your project from a tarball rather
than having to install and use git.

Therefore, I recommend a strong bias towards storing information in as
transparent, non-system-specific a way as possible.  Metadata and
scripts stored within the file tree part of the repository are typically
a lot easier to work with and more transparent than git-specific hacks.
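As a minimal sketch of that approach, assuming the metadata in question is a list of branch dependencies, it can live in an ordinary tracked file; TopGit already does this with its `.topdeps` file. The helper names `deps_add` and `deps_of` are hypothetical.

```shell
#!/bin/sh
# Sketch: keep branch metadata in an ordinary tracked file (.topdeps,
# following TopGit's convention). Helper names are hypothetical.

# Record a dependency of the current branch and commit it like any file.
# Must be run from the top of the working tree.
deps_add() {
  printf '%s\n' "$1" >> .topdeps
  git add .topdeps
  git commit -q -m "Add dependency on $1"
}

# Read the dependencies of any branch without checking it out:
# "<branch>:.topdeps" names the .topdeps blob in that branch's tip.
deps_of() {
  git show "$1:.topdeps" 2>/dev/null
}
```

Because `.topdeps` is an ordinary file, it appears in `git archive` output and in any checkout or conversion of the tree, with no git-specific data structure involved.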

That being said, I haven't understood your application well enough to
know whether these biases might be trumped by convenience in your
particular situation.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

Thread overview: 23+ messages
2011-08-01 18:20 Storing additional information in commit headers martin f krafft
2011-08-01 18:27 ` Sverre Rabbelier
2011-08-01 18:34   ` martin f krafft
2011-08-01 20:01     ` Clemens Buchacher
2011-08-01 20:55       ` martin f krafft
2011-08-01 18:28 ` martin f krafft
2011-08-01 19:33   ` Martin Langhoff
2011-08-01 20:51     ` martin f krafft
2011-08-01 20:13 ` Jeff King
2011-08-01 21:11   ` martin f krafft
2011-08-02  3:50     ` Jeff King
2011-08-02  8:28       ` martin f krafft
2011-08-02 15:03         ` working prototype of orphan parent commits as datastores (was: Storing additional information in commit headers) martin f krafft
2011-08-02 18:57           ` Jeff King
2011-08-02 19:09             ` martin f krafft
2011-08-02 19:26               ` martin f krafft
2011-08-02 18:51         ` Storing additional information in commit headers Jeff King
2011-08-02 19:06           ` martin f krafft
2011-08-02 19:27             ` per-ref data storage (was: Storing additional information in commit headers) martin f krafft
2011-08-02 21:12               ` per-ref data storage martin f krafft
2011-08-04  3:41               ` per-ref data storage (was: Storing additional information in commit headers) Jeff King
2011-08-04  3:39             ` Storing additional information in commit headers Jeff King
2011-08-02 13:53 ` Michael Haggerty
