A generalization of git notes from blobs to trees

git.vger.kernel.org archive mirror

* A generalization of git notes from blobs to trees - git metadata?
@ 2010-02-06 13:32 Jon Seymour
  2010-02-07  1:36 ` Johan Herland
  0 siblings, 1 reply; 19+ messages in thread
From: Jon Seymour @ 2010-02-06 13:32 UTC
  To: Git Mailing List

git notes is a nice innovation - well done to all those involved.

Has consideration ever been given to generalizing the concept to allow
note (or, more correctly, metadata) trees with arbitrary sha1s?

For example, suppose you had reason to cache the distribution that
resulted from the build of a particular commit, then it'd be nice to
be able to do this using a notes like mechanism.

    git metadata import foo-1.1.0 dist ~/foo/dist

would create a git tree from the contents of ~/foo/dist and then bind
it to a metadata item called dist, associated with the sha1 corresponding
to foo-1.1.0.

To retrieve the contents of the previous build, you'd do something like

   git metadata export foo-1.1.0 dist /tmp/foo-1.1.0

This would find the metadata tree associated with foo-1.1.0, extract
the dist subtree from that tree and write it to disk at /tmp/foo-1.1.0

I've used build outputs as an example here, but really it needn't be
limited to that. I can see this facility would be useful for any kind
of annotation or derived result that is more complex than a single
text blob. Metadata trees, in combination with a namespacing
technique, could be used to attach arbitrary metadata created by an
arbitrary set of tools to arbitrary SHA1 objects.
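
For illustration, the data structure this proposal describes can be built by hand with existing plumbing. The sketch below assumes a throwaway repository and a stand-in for ~/foo/dist; there is no real `git metadata` command, and the step that records the metadata tree against foo-1.1.0 (the part this thread debates) is deliberately left out, so nothing here keeps the tree reachable.

```shell
#!/bin/sh
# Sketch only: there is no real "git metadata" command. This builds the
# proposed structure by hand: a metadata tree whose "dist" entry is a tree.
set -e
cd "$(mktemp -d)"
git init -q repo
cd repo

# Stand-in for ~/foo/dist (assumed build output).
dist_dir=$(mktemp -d)
mkdir -p "$dist_dir/bin"
echo "binary" > "$dist_dir/bin/foo"

# "import": hash the directory into a tree through a throwaway index,
# leaving the repository's own index untouched.
tmp_index=$(mktemp -u)
GIT_INDEX_FILE=$tmp_index git --work-tree="$dist_dir" add -A
dist_tree=$(GIT_INDEX_FILE=$tmp_index git write-tree)
rm -f "$tmp_index"

# Wrap it in a metadata tree with a single entry named "dist".
meta_tree=$(printf '040000 tree %s\tdist\n' "$dist_tree" | git mktree)

# "export": write the dist subtree of the metadata tree to disk.
out_dir=$(mktemp -d)
git archive "$meta_tree:dist" | tar -x -C "$out_dir"
```

The `<tree>:dist` syntax is how a caller would address one named item inside the metadata tree; further items (man, html, ...) would simply be more entries fed to `git mktree`.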

jon.

* Re: A generalization of git notes from blobs to trees - git metadata?
  2010-02-06 13:32 A generalization of git notes from blobs to trees - git metadata? Jon Seymour
@ 2010-02-07  1:36 ` Johan Herland
  2010-02-07  2:21   ` Junio C Hamano
  2010-02-07  3:27   ` Jon Seymour
  0 siblings, 2 replies; 19+ messages in thread
From: Johan Herland @ 2010-02-07  1:36 UTC
  To: Jon Seymour; +Cc: git

On Saturday 06 February 2010, Jon Seymour wrote:
> git notes is a nice innovation - well done to all those involved.

Thanks.

> Has consideration ever been given to generalizing the concept to allow
> note (or, more correctly, metadata) trees with arbitrary sha1s?

Not sure what you mean here. The note infrastructure allows _any_ SHA1 (not 
necessarily the SHA1 of an existing Git object) to be bound to a note 
object.

Furthermore, although we currently assume that all note objects are blobs, 
someone (who?) has already suggested (as mentioned in the notes TODO list) 
that a note object could also be a _tree_ object that can be unpacked/read 
to reveal further "sub-notes". Hence, in addition to having multiple notes 
refs (e.g. refs/notes/commits:deadbeef, refs/notes/bugs:deadbeef, etc.) to 
categorize notes, you could also classify notes _after_ having traversed the 
notes tree (e.g. refs/notes/bugs:deadbeef/fixes, 
refs/notes/bugs:deadbeef/causes). Note that support for this has not yet 
been written, and AFAIK it is also uncertain how such a change would affect 
the different use cases for notes (e.g. how to display them in 'git log').

> For example, suppose you had reason to cache the distribution that
> resulted from the build of a particular commit, then it'd be nice to
> be able to do this using a notes like mechanism.
> 
>     git metadata import foo-1.1.0 dist ~/foo/dist
> 
> would create a git tree from the contents of ~/foo/dist and then bind
> it to a metadata item called dist, associated with the sha1 corresponding
> to foo-1.1.0.

You can do this already today by simply using 'git tag':
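	# Build a tree from ~/foo/dist in a throwaway index, then tag it
	GIT_INDEX_FILE=/tmp/dist.index git --work-tree="$HOME/foo/dist" add -A
	git tag foo-1.1.0-dist $(GIT_INDEX_FILE=/tmp/dist.index git write-tree)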

I don't see why you'd need to add a new metadata command.

> To retrieve the contents of the previous build, you'd do something like
> 
>    git metadata export foo-1.1.0 dist /tmp/foo-1.1.0
> 
> This would find the metadata tree associated with foo-1.1.0, extract
> the dist subtree from that tree and write it to disk at /tmp/foo-1.1.0

Or, if you use a tag instead:
	mkdir -p /tmp/foo-1.1.0
	git archive foo-1.1.0-dist | tar -x -C /tmp/foo-1.1.0

> I've used build outputs as an example here, but really it needn't be
> limited to that. I can see this facility would be useful for any kind
> of annotation or derived result that is more complex than a single
> text blob. Metadata trees, in combination with a namespacing
> technique, could be used to attach arbitrary metadata created by an
> arbitrary set of tools to arbitrary SHA1 objects.

I still don't see why this provides anything that isn't already supported by 
either using 'git tag', or by implementing support for notes-as-trees in the 
notes feature.

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

* Re: A generalization of git notes from blobs to trees - git metadata?
  2010-02-07  1:36 ` Johan Herland
@ 2010-02-07  2:21   ` Junio C Hamano
  2010-02-07  5:02     ` Jeff King
  2010-02-07  3:27   ` Jon Seymour
  1 sibling, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2010-02-07  2:21 UTC
  To: Johan Herland; +Cc: Jon Seymour, git

Johan Herland <johan@herland.net> writes:

> Furthermore, although we currently assume that all note objects are blobs, 
> someone (who?) has already suggested (as mentioned in the notes TODO list) 
> that a note object could also be a _tree_ object that can be unpacked/read 
> to reveal further "sub-notes".

I would advise you not to go there.  How would you even _merge_ such a
thing with other notes attached to the same object?  What determines the
path in that tree object?

Clueless ones can freely make misguided suggestions without thinking
things through and make things unnecessarily complex without real gain.
You do not have to listen to every one of them.

* Re: A generalization of git notes from blobs to trees - git metadata?
  2010-02-07  1:36 ` Johan Herland
  2010-02-07  2:21   ` Junio C Hamano
@ 2010-02-07  3:27   ` Jon Seymour
  2010-02-07  4:32     ` Jon Seymour
  1 sibling, 1 reply; 19+ messages in thread
From: Jon Seymour @ 2010-02-07  3:27 UTC
  To: Johan Herland; +Cc: git, Junio C Hamano

> I still don't see why this provides anything that isn't already supported by
> either using 'git tag', or by implementing support for notes-as-trees in the
> notes feature.
>

The intent of the metadata facility is to associate derivatives of a
sha1 with the sha1 itself. If I have calculated a derivative of a sha1
in the past, then let me reference that derivative using a metadata
path which I can look up knowing only the sha1 of the input and
nothing more. Yes, I could create tags of the form
${sha1}/metadata-path for all my derived results, but really, this
seems an abuse of the tag facility.

Here's another motivating example:

Suppose git-svn wrote the SVN id it was synced with into structured
metadata associated with a commit, instead of into the commit message,
the equivalent of:

    echo ${svn_id} | git metadata write-blob ${sha1} svn-id

Which means: for the specified sha1, read a blob from stdin and create
a metadata item with a metadata path called svn-id

To get it out again, you would write:

    git metadata read-blob ${sha1} svn-id

Which says, for the given object ${sha1}, read the blob from the
metadata tree at path svn-id and write its contents to stdout.

This would avoid cluttering the commit message with the svn-id, avoid
cluttering the tag space with the info and allow any commit to be
tagged in this way.
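
As is acknowledged later in this thread, plain git notes can already cover this particular case. A minimal sketch, assuming a dedicated notes ref named svn-id and a made-up revision string (this is not what git-svn actually does):

```shell
#!/bin/sh
# Sketch: keep an SVN id next to a commit using a dedicated notes ref
# (refs/notes/svn-id is an assumed name, not a git-svn convention).
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email user@example.com
git commit --allow-empty -q -m "imported from svn"

sha1=$(git rev-parse HEAD)
svn_id="r1234"   # hypothetical SVN revision identifier

# Write: attach the id without touching the commit message.
echo "$svn_id" | git notes --ref=svn-id add -F - "$sha1"

# Read: look the id up knowing only the commit's sha1.
stored=$(git notes --ref=svn-id show "$sha1")
echo "$stored"
```

The commit object, and therefore its sha1, is unchanged; the mapping lives entirely in refs/notes/svn-id.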

Admittedly, similar functionality could be achieved a little more clumsily
now with appropriate use of GIT_NOTES_REF or with note subtrees, but I
share Junio's reservations about trying to generalize notes from
blobs to trees, given the way notes are currently used by the rest of
the infrastructure.

jon.

* Re: A generalization of git notes from blobs to trees - git metadata?
  2010-02-07  3:27   ` Jon Seymour
@ 2010-02-07  4:32     ` Jon Seymour
  0 siblings, 0 replies; 19+ messages in thread
From: Jon Seymour @ 2010-02-07  4:32 UTC
  To: Johan Herland; +Cc: git, Junio C Hamano

Another use case could be to store the contents of the man and html
trees of git, which are currently published as separate branches.

With the metadata concept, the man and html trees for each release
could be stored as metadata paths (/man, /html) of the associated
commit for each release, providing a trivial way to address and access
these trees.

jon.

* Re: A generalization of git notes from blobs to trees - git metadata?
  2010-02-07  2:21   ` Junio C Hamano
@ 2010-02-07  5:02     ` Jeff King
  2010-02-07  5:36       ` Jon Seymour
                         ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Jeff King @ 2010-02-07  5:02 UTC
  To: Junio C Hamano; +Cc: Johan Herland, Jon Seymour, git

On Sat, Feb 06, 2010 at 06:21:37PM -0800, Junio C Hamano wrote:

> Johan Herland <johan@herland.net> writes:
> 
> > Furthermore, although we currently assume that all note objects are blobs, 
> > someone (who?) has already suggested (as mentioned in the notes TODO list) 
> > that a note object could also be a _tree_ object that can be unpacked/read 
> > to reveal further "sub-notes".
> 
> I would advise you not to go there.  How would you even _merge_ such a
> thing with other notes attached to the same object?  What determines the
> path in that tree object?
> 
> Clueless ones can freely make misguided suggestions without thinking
> things through and make things unnecessarily complex without real gain.
> You do not have to listen to every one of them.

I think I may have been the one to suggest trees as notes at one point.
But let me clarify that this is not exactly what the OP is proposing in
this thread.

My suggestion was that some use cases may have many key/value pairs of
notes for a single sha1. We basically have two options:

  1. store each in a separate notes ref, with each sha1 mapping to
     a blob. The note "name" is the name of the ref.

  2. store notes in a single notes ref, with each sha1 mapping to a
     tree with named sub-notes. The note "name" is the combination of
     ref-name and tree entry name.
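
Option (1) can be expressed with the existing git notes command. In the sketch below the key names bug-id and severity are assumptions for illustration: each key is its own notes ref, and the key/value pairs for one commit are recovered by walking refs/notes/. Option (2) would need tree-valued notes, which, per this thread, had not been written.

```shell
#!/bin/sh
# Sketch of option (1): one notes ref per key; the ref name is the key.
# The keys "bug-id" and "severity" are made-up examples.
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email user@example.com
git commit --allow-empty -q -m "fix crash"
sha1=$(git rev-parse HEAD)

git notes --ref=bug-id   add -m "1234" "$sha1"
git notes --ref=severity add -m "high" "$sha1"

# Collect every key=value pair noted for $sha1 by walking refs/notes/.
pairs=$(
    for ref in $(git for-each-ref --format='%(refname)' refs/notes/); do
        if value=$(git notes --ref="$ref" show "$sha1" 2>/dev/null); then
            printf '%s=%s\n' "${ref#refs/notes/}" "$value"
        fi
    done
)
echo "$pairs"
```

The lookup cost grows with the number of refs, since every key means a separate notes tree to consult; that is the overhead option (2) would avoid.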

The advantage of (1) is that notes are not bound tightly to each other.
I can distribute the notes tree for one "name" independently of the
others.  The advantage of (2) is that it is faster and smaller. In (1),
each note has a separate index, and we must traverse each note index
separately.

In practice, I would expect to use (1) for logically separate datasets.
For example, automatic bug-tracking notes would go in a different ref
from human annotations. But I would expect to use (2) if I had, say, 5
different pieces of bug tracking information and I wanted an easy way to
refer to them individually.

And a specialized merge for that is straightforward. In the simplest
case, you simply say "notes of this ref are tree-type, or they are
blob-type" and then you have no merge problems. But if you want to get
fancy, you can say that a conflict between "sha1/blob" and
"sha1/tree/key" should automatically "promote" the first one into
"sha1/tree/default" or some other canonical name.
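
A minimal sketch of that promotion rule on plain git objects, assuming a hypothetical helper name and ignoring notes refs entirely: given the blob one side noted and the tree the other side noted for the same object, it builds a merged tree with the blob under the key "default", and reports a conflict if that key is already taken.

```shell
#!/bin/sh
# Sketch of the "promote blob to tree/default" rule for one noted object.
# promote_blob_into_tree is a hypothetical helper, not a git command.
set -e
cd "$(mktemp -d)"
git init -q

# $1 = blob noted by one side, $2 = tree of sub-notes noted by the other.
promote_blob_into_tree () {
    if [ -n "$(git ls-tree "$2" default)" ]; then
        echo "conflict: tree $2 already has a 'default' entry" >&2
        return 1
    fi
    # ls-tree output is already in the input format mktree expects,
    # and mktree sorts the entries itself.
    { git ls-tree "$2"; printf '100644 blob %s\tdefault\n' "$1"; } | git mktree
}

# One side noted the object with a plain blob...
blob=$(echo "plain note" | git hash-object -w --stdin)
# ...the other side noted it with a tree holding a "fixes" key.
fixes=$(echo "fixes #42" | git hash-object -w --stdin)
tree=$(printf '100644 blob %s\tfixes\n' "$fixes" | git mktree)

merged=$(promote_blob_into_tree "$blob" "$tree")
git ls-tree "$merged"
```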

Note that all of this is my pie-in-the-sky "here is what I was thinking
of when I looked at notes a long time ago". I don't care strongly if it
gets implemented or not at this point; I just wanted to add some context
to what Johan had in his notes todo list (or maybe I am wrong, and what
is in his todo list was based on something totally different said by
somebody else, and I have just confused the issue more. :) ).

With respect to the idea of storing an arbitrary tree, I agree it is
probably too complex with respect to merging. In addition, it makes
things like "git log --format=%N" confusing. I think you would do better
to simply store a tree sha1 inside the note blob, and callers who were
interested in the tree contents could then dereference it and examine as
they saw fit.  The only caveat is that you need some way of telling git
that the referenced trees are reachable and not to be pruned.
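
A minimal sketch of that indirection, assuming a throwaway repository and a notes ref named dist: the note is an ordinary blob whose text is a tree's sha1, and callers dereference it themselves. The last command shows the caveat: nothing in the notes ref makes the tree reachable, so git fsck reports it as unreachable and a later gc could prune it.

```shell
#!/bin/sh
# Sketch: a note whose content is the sha1 of a tree (refs/notes/dist is
# an assumed ref name).
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email user@example.com
git commit --allow-empty -q -m "release"

# Build a tree from a stand-in build directory via a throwaway index.
dist_dir=$(mktemp -d)
echo "binary" > "$dist_dir/foo"
tmp_index=$(mktemp -u)
GIT_INDEX_FILE=$tmp_index git --work-tree="$dist_dir" add -A
tree=$(GIT_INDEX_FILE=$tmp_index git write-tree)
rm -f "$tmp_index"

# Store only the tree's sha1 as the note text.
git notes --ref=dist add -m "$tree" HEAD

# Interested callers dereference the note and treat it as a tree.
noted=$(git notes --ref=dist show HEAD)
git ls-tree --name-only "$noted"

# Caveat: the tree is not reachable from any ref, so fsck lists it.
git fsck --unreachable --no-reflogs
```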

-Peff

* Re: A generalization of git notes from blobs to trees - git metadata?
  2010-02-07  5:02     ` Jeff King
@ 2010-02-07  5:36       ` Jon Seymour
  2010-02-07  9:15         ` Jakub Narebski
  2010-02-07 19:33         ` Jeff King
  2010-02-07 18:48       ` Junio C Hamano
  2010-02-07 22:46       ` Johan Herland
  2 siblings, 2 replies; 19+ messages in thread
From: Jon Seymour @ 2010-02-07  5:36 UTC
  To: Jeff King; +Cc: Junio C Hamano, Johan Herland, git

On Sun, Feb 7, 2010 at 4:02 PM, Jeff King <peff@peff.net> wrote:
> On Sat, Feb 06, 2010 at 06:21:37PM -0800, Junio C Hamano wrote:
>
>> Johan Herland <johan@herland.net> writes:
>>
>> > Furthermore, although we currently assume that all note objects are blobs,
>> > someone (who?) has already suggested (as mentioned in the notes TODO list)
>> > that a note object could also be a _tree_ object that can be unpacked/read
>> > to reveal further "sub-notes".
>>
>> I would advise you not to go there.  How would you even _merge_ such a
>> thing with other notes attached to the same object?  What determines the
>> path in that tree object?
>>
>> Clueless ones can freely make misguided suggestions without thinking
>> things through and make things unnecessarily complex without real gain.
>> You do not have to listen to every one of them.
>
> I think I may have been the one to suggest trees as notes at one point.
> But let me clarify that this is not exactly what the OP is proposing in
> this thread.
>
> My suggestion was that some use cases may have many key/value pairs of
> notes for a single sha1. We basically have two options:
>
>  1. store each in a separate notes ref, with each sha1 mapping to
>     a blob. The note "name" is the name of the ref.
>
>  2. store notes in a single notes ref, with each sha1 mapping to a
>     tree with named sub-notes. The note "name" is the combination of
>     ref-name and tree entry name.
>

So, of course, options (1) and (2) need not be exclusive. Use option
(1) for different metadata sets and option (2) to partition individual
datasets.

>
> With respect to the idea of storing an arbitrary tree, I agree it is
> probably too complex with respect to merging. In addition, it makes
> things like "git log --format=%N" confusing. I think you would do better
> to simply store a tree sha1 inside the note blob, and callers who were
> interested in the tree contents could then dereference it and examine as
> they saw fit.  The only caveat is that you need some way of telling git
> that the referenced trees are reachable and not to be pruned.
>

As I see it, the existing use of notes is a special instance of a more
general metadata capability in which the metadata is constrained to be
a single blob. If notes continued to be constrained in this way, there
is no reason to change anything with respect to its current userspace
behaviour. That said, most of the plumbing which enabled notes could
be generalized to enable the arbitrary tree case [ which admittedly, I
have yet to sell successfully !]

In one sense, the merge issue doesn't exist. When
the maintainer publishes a tag, no-one expects to have to deal with
downstream conflicting definitions of the tag. Likewise, if the
maintainer were to publish the /man and /html metadata trees (per my
previous example) for a release tag, anyone who received
refs/metadata/doc would expect to receive the metadata trees as
published by the maintainer. Anyone who didn't want them wouldn't have
to pull refs/metadata/doc.

I can see there are use cases where multiple parties might want to
contribute metadata and I do not currently have a good solution to
that problem, but that is not to say there isn't one - surely it is
just a question of applying a little intellect creatively?

jon.

* Re: A generalization of git notes from blobs to trees - git metadata?
  2010-02-07  5:36       ` Jon Seymour
@ 2010-02-07  9:15         ` Jakub Narebski
  2010-02-07  9:41           ` Jon Seymour
  2010-02-07 19:33         ` Jeff King
  1 sibling, 1 reply; 19+ messages in thread
From: Jakub Narebski @ 2010-02-07  9:15 UTC
  To: Jon Seymour; +Cc: Jeff King, Junio C Hamano, Johan Herland, git

Jon Seymour <jon.seymour@gmail.com> writes:

[cut]

> As I see it, the existing use of notes is a special instance of a more
> general metadata capability in which the metadata is constrained to be
> a single blob. If notes continued to be constrained in this way, there
> is no reason to change anything with respect to its current userspace
> behaviour. That said, most of the plumbing which enabled notes could
> be generalized to enable the arbitrary tree case [ which admittedly, I
> have yet to sell successfully !]
> 
> In one sense, the merge issue doesn't exist. When
> the maintainer publishes a tag, no-one expects to have to deal with
> downstream conflicting definitions of the tag. Likewise, if the
> maintainer were to publish the /man and /html metadata trees (per my
> previous example) for a release tag, anyone who received
> refs/metadata/doc would expect to receive the metadata trees as
> published by the maintainer. Anyone who didn't want them wouldn't have
> to pull refs/metadata/doc.
> 
> I can see there are use cases where multiple parties might want to
> contribute metadata and I do not currently have a good solution to
> that problem, but that is not to say there isn't one - surely it is
> just a question of applying a little intellect creatively?

Are you trying to repeat the failure of Apple's / MacOS / HFS+ filesystem
data/resource forks, and Microsoft's Alternate Data Streams in git? :-)

-- 
Jakub Narebski
Poland
ShadeHawk on #git

* Re: A generalization of git notes from blobs to trees - git metadata?
  2010-02-07  9:15         ` Jakub Narebski
@ 2010-02-07  9:41           ` Jon Seymour
  2010-02-07 10:15             ` Jon Seymour
  0 siblings, 1 reply; 19+ messages in thread
From: Jon Seymour @ 2010-02-07  9:41 UTC
  To: Jakub Narebski; +Cc: Jeff King, Junio C Hamano, Johan Herland, git

On Sun, Feb 7, 2010 at 8:15 PM, Jakub Narebski <jnareb@gmail.com> wrote:
> Jon Seymour <jon.seymour@gmail.com> writes:
>
> [cut]
>
>> As I see it, the existing use of notes is a special instance of a more
>> general metadata capability in which the metadata is constrained to be
>> a single blob. If notes continued to be constrained in this way, there
>> is no reason to change anything with respect to its current userspace
>> behaviour. That said, most of the plumbing which enabled notes could
>> be generalized to enable the arbitrary tree case [ which admittedly, I
>> have yet to sell successfully !]
>>
>> In one sense, the merge issue doesn't exist. When
>> the maintainer publishes a tag, no-one expects to have to deal with
>> downstream conflicting definitions of the tag. Likewise, if the
>> maintainer were to publish the /man and /html metadata trees (per my
>> previous example) for a release tag, anyone who received
>> refs/metadata/doc would expect to receive the metadata trees as
>> published by the maintainer. Anyone who didn't want them wouldn't have
>> to pull refs/metadata/doc.
>>
>> I can see there are use cases where multiple parties might want to
>> contribute metadata and I do not currently have a good solution to
>> that problem, but that is not to say there isn't one - surely it is
>> just a question of applying a little intellect creatively?
>
> Are you trying to repeat the failure of Apple's / MacOS / HFS+ filesystem
> data/resource forks, and Microsoft's Alternate Data Streams in git? :-)
>

No, I am not. I don't see why a metadata proposal is any more exposed
to subversive payloads than say, use of git merge -s ours [ a
subversive payload could be made reachable from a commit that
otherwise merges in favour of the legitimate source - who would know?
]

Really, I can't see why the rationale that justifies using a single blob
to extend a commit message can't also justify associating a metadata
tree of arbitrary complexity with an arbitrary sha1 object. What makes
maintaining a mapping to a single blob acceptable but maintaining a
mapping to a tree unacceptable? Is there really any fundamental
difference?

jon.

* Re: A generalization of git notes from blobs to trees - git metadata?
  2010-02-07  9:41           ` Jon Seymour
@ 2010-02-07 10:15             ` Jon Seymour
  0 siblings, 0 replies; 19+ messages in thread
From: Jon Seymour @ 2010-02-07 10:15 UTC
  To: Jakub Narebski; +Cc: Jeff King, Junio C Hamano, Johan Herland, git

To explain a little further why the metadata concept is different to
the resource fork or alternate data stream concept.

These two concepts were based on the idea of associating metadata with
the name of the resource and preserving the metadata along with the
resource as the resource evolved.

This is not the intent with the metadata concept. Rather, the idea is
to annotate the content (whether it be a commit, tree or blob) with
other content (a tree in the general metadata case, or a blob in the
git notes case).

The use cases I have in mind relate to caching results that are
expensive (or impractical) to re-derive from an input. So, for example,
storing /man and /html trees for a given commit under a metadata ref
called "refs/metadata/doc" would be one case. Storing the foreign SCM
revision id that a git repo was pushed into would be another [ storing
it in the commit message isn't an option because the commit has already
happened before the push ]. [ And granted: git notes can already be
used for this scenario ].

jon.

* Re: A generalization of git notes from blobs to trees - git metadata?
  2010-02-07  5:02     ` Jeff King
  2010-02-07  5:36       ` Jon Seymour
@ 2010-02-07 18:48       ` Junio C Hamano
  2010-02-07 19:18         ` Jeff King
  2010-02-07 22:46       ` Johan Herland
  2 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2010-02-07 18:48 UTC
  To: Jeff King; +Cc: Johan Herland, Jon Seymour, git

Jeff King <peff@peff.net> writes:

> ... I think you would do better
> to simply store a tree sha1 inside the note blob, and callers who were
> interested in the tree contents could then dereference it and examine as
> they saw fit.  The only caveat is that you need some way of telling git
> that the referenced trees are reachable and not to be pruned.

Thanks for a good summary.  To paraphrase the idea, for the "pre-built
binaries" use case, I could update the dodoc.sh script (in 'todo'---that
is what autobuilds the html and man documentation and updates the
corresponding branches at k.org when I push things out to the master
branch) to add a note to the commit from 'master' the docs are generated
from, and the note would say which commits on html and man branches
correspond to that commit.  That way, the referenced "trees" are of course
protected because they are reachable from html/man refs.

Right?
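
A minimal sketch of that arrangement, assuming a throwaway repository, stand-in doc directories, a notes ref named doc, and a hypothetical helper that commits a directory onto the html or man branch; it is not the actual dodoc.sh. Because each commit named in the note lives on the html or man branch, reachability takes care of itself.

```shell
#!/bin/sh
# Sketch: note on a source commit naming the html/man commits built from it.
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email user@example.com
git commit --allow-empty -q -m "source commit"
src=$(git rev-parse HEAD)

# Hypothetical helper: commit the contents of directory $1 onto branch $2
# without touching the current index or work tree.
commit_dir_to_branch () {
    idx=$(mktemp -u)
    GIT_INDEX_FILE=$idx git --work-tree="$1" add -A
    t=$(GIT_INDEX_FILE=$idx git write-tree)
    rm -f "$idx"
    if parent=$(git rev-parse -q --verify "refs/heads/$2"); then
        c=$(git commit-tree "$t" -p "$parent" -m "Autogenerated docs for $src")
    else
        c=$(git commit-tree "$t" -m "Autogenerated docs for $src")
    fi
    git update-ref "refs/heads/$2" "$c"
}

html_dir=$(mktemp -d); echo "<html></html>" > "$html_dir/git.html"
man_dir=$(mktemp -d);  echo ".TH GIT 1" > "$man_dir/git.1"
commit_dir_to_branch "$html_dir" html
commit_dir_to_branch "$man_dir" man

# The note on $src records which html/man commits correspond to it.
printf 'html %s\nman %s\n' "$(git rev-parse refs/heads/html)" \
    "$(git rev-parse refs/heads/man)" | git notes --ref=doc add -F - "$src"

# Later: find the man pages built from $src, knowing only $src.
man_commit=$(git notes --ref=doc show "$src" | awk '$1 == "man" { print $2 }')
git ls-tree --name-only "$man_commit"
```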

* Re: A generalization of git notes from blobs to trees - git metadata?
  2010-02-07 18:48       ` Junio C Hamano
@ 2010-02-07 19:18         ` Jeff King
  0 siblings, 0 replies; 19+ messages in thread
From: Jeff King @ 2010-02-07 19:18 UTC
  To: Junio C Hamano; +Cc: Johan Herland, Jon Seymour, git

On Sun, Feb 07, 2010 at 10:48:58AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > ... I think you would do better
> > to simply store a tree sha1 inside the note blob, and callers who were
> > interested in the tree contents could then dereference it and examine as
> > they saw fit.  The only caveat is that you need some way of telling git
> > that the referenced trees are reachable and not to be pruned.
> 
> Thanks for a good summary.  To paraphrase the idea, for the "pre-built
> binaries" use case, I could update the dodoc.sh script (in 'todo'---that
> is what autobuilds the html and man documentation and updates the
> corresponding branches at k.org when I push things out to the master
> branch) to add a note to the commit from 'master' the docs are generated
> from, and the note would say which commits on html and man branches
> correspond to that commit.  That way, the referenced "trees" are of course
> protected because they are reachable from html/man refs.
> 
> Right?

Yeah, I think that would work fine. I guess there are cases, though,
where somebody might not be keeping a linear history of noted trees in a
separate ref (the way you keep html/man refs). In which case they would
have to deal with the reachability problem separately. I can't think of
an example off the top of my head, though.

-Peff

* Re: A generalization of git notes from blobs to trees - git metadata?
  2010-02-07  5:36       ` Jon Seymour
  2010-02-07  9:15         ` Jakub Narebski
@ 2010-02-07 19:33         ` Jeff King
  2010-02-07 20:25           ` Junio C Hamano
  1 sibling, 1 reply; 19+ messages in thread
From: Jeff King @ 2010-02-07 19:33 UTC
  To: Jon Seymour; +Cc: Junio C Hamano, Johan Herland, git

On Sun, Feb 07, 2010 at 04:36:59PM +1100, Jon Seymour wrote:

> As I see it, the existing use of notes is a special instance of a more
> general metadata capability in which the metadata is constrained to be
> a single blob. If notes continued to be constrained in this way, there
> is no reason to change anything with respect to its current userspace
> behaviour. That said, most of the plumbing which enabled notes could
> be generalized to enable the arbitrary tree case [ which admittedly, I
> have yet to sell successfully !]

I do agree that storing trees is a natural generalization of the current
notes implementation. Callers have to be made aware that they may see
trees, of course, but you could probably "demote" trees into their
representative sha1s for callers who were interested only in a blob
form.

But what I am concerned with is that generalizing may violate some
assumptions made about how notes work. Notes trees can re-balance
themselves to some degree, I thought (though I am pretty out of the loop
on current notes developments). So during merges we need to normalize
tree representations (though we probably already need to do that for the
blob case). We would also need to do some magic with rename detection
during merges.  You would probably want rename detection _within_ a tree
stored as a note for a particular commit, but not between notes stored
for different commits.

Or perhaps you would not even want to do a tree-merge between notes at
all, and would rather see a conflict if two people noted two different
trees. This would make sense to me if you were doing something like
noting a build setup. If I note that commit X builds with a tree
pointing to version Y of the build tools, and you note that it builds
with version Z of the build tools, what should happen when we merge our
notes? I can imagine wanting a conflict, and resolving it to Y or Z
(perhaps whichever is more desirable). I can also see resolving it to Y
_and_ Z (iow, treating it like a list). But doing a merge on the two
trees of build tools (which are presumably somewhat immutable) is
probably not helpful.

Which to me argues in favor of adding the extra level of indirection.
The note should store the tree sha1, and those who want to treat it as a
tree can do so. Rename and merge issues just go away, as they operate on
the tree sha1 and not on the tree itself. And of course the
representation is just an implementation detail; you could still make a
"git metadata" wrapper to transparently store trees from the user's
perspective.

The only complication is that git doesn't know to follow those sha1s for
reachability analysis. In some cases that won't matter (like Junio's
html/man example), but I suspect in some it will. Perhaps there is some
way to flag the note entry as "this stores a sha1 that should be
followed by fsck, but not otherwise dereferenced".
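A minimal sketch of the indirection described above, in Python. It models a notes ref as an in-memory dict from annotated object name to note text rather than touching a real repository. The `tree:` prefix, which flags "this blob holds a sha1 to follow for reachability", is a hypothetical convention invented for illustration. It is not an existing git feature.

```python
import re
from typing import Dict, Optional, Set

# Hypothetical marker meaning "this note holds a sha1 to keep reachable,
# but do not otherwise dereference it". Not an existing git convention.
TREE_PREFIX = "tree:"
SHA1_RE = re.compile(r"[0-9a-f]{40}")


def make_tree_note(tree_sha1: str) -> str:
    """Return the text of a note blob that points at a tree."""
    if not SHA1_RE.fullmatch(tree_sha1):
        raise ValueError(f"not a full sha1: {tree_sha1!r}")
    return TREE_PREFIX + tree_sha1 + "\n"


def referenced_tree(note_text: str) -> Optional[str]:
    """Return the tree sha1 a note points at, or None for an ordinary text note."""
    line = note_text.strip()
    if line.startswith(TREE_PREFIX) and SHA1_RE.fullmatch(line[len(TREE_PREFIX):]):
        return line[len(TREE_PREFIX):]
    return None


def extra_reachable(notes: Dict[str, str]) -> Set[str]:
    """Tree sha1s that a reachability walk (e.g. before pruning) should also keep."""
    found = (referenced_tree(text) for text in notes.values())
    return {sha1 for sha1 in found if sha1 is not None}


notes = {
    "1" * 40: make_tree_note("a" * 40),              # points at a tree of build tools
    "2" * 40: "Builds with the system compiler.\n",  # ordinary text note
}
print(extra_reachable(notes))
```

A merge of two such notes only ever compares one-line blobs, so rename detection inside the stored tree never comes into play.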

I dunno. That is all just thinking out loud. It would help if we had
some really detailed concrete examples of notes being used in practice.

-Peff

* Re: A generalization of git notes from blobs to trees - git metadata?
  2010-02-07 19:33         ` Jeff King
@ 2010-02-07 20:25           ` Junio C Hamano
  2010-02-08  2:03             ` Steven E. Harris
  2010-02-10  5:09             ` Jeff King
  0 siblings, 2 replies; 19+ messages in thread
From: Junio C Hamano @ 2010-02-07 20:25 UTC
  To: Jeff King; +Cc: Jon Seymour, Johan Herland, git

Jeff King <peff@peff.net> writes:

> Or perhaps you would not even want to do a tree-merge between notes at
> all, and would rather see a conflict if two people noted two different
> trees.

I've been thinking about the merge issues, and am starting to suspect that
we might want a quite drastically different merge strategy even for the
blob case.  That is one of the reasons why I don't want to see us muddy the
issues by introducing the even more complex "tree" case.

Anybody working in the same project can start a 'notes' tree with his or
her own root.  That is the normal use case for annotating commits for your
own use.  For merges in the history of the primary contents, which people
collaborate to advance, a three-way merge pivoting on a common ancestor is
a natural way to reach a satisfactory result.  In the notes namespace, on
the other hand, the norm is to simply overlay the notes trees, adjusting
for the fan-out.  You annotated that commit I was not interested in, while
I annotated this commit you weren't interested in.  We have our notes in
the end result, and both of us are happy.  If we happen to have annotated
the same commit without knowing what the other was doing, then there is no
sane consolidation---in the most typical case, we would want to keep both,
perhaps concatenating them together.  Textual merge becomes the exception,
needed only when two "notes" histories happen to have forked from the same
root somehow.
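The overlay described above can be sketched as follows. Notes trees are again modelled as Python dicts, here from path to note text. Paths may be fanned out ('ab/cdef...') or flat, and both forms are normalized to the full 40-hex object name before overlaying. Two choices here are assumptions, not something the message specifies: joining two notes on the same object with a blank line, and not attempting a three-way textual merge at all.

```python
from typing import Dict

Tree = Dict[str, str]  # path inside a notes tree -> note text


def normalize(tree: Tree) -> Tree:
    """Undo fan-out: map paths like 'ab/cdef...' back to the 40-hex object name."""
    out: Tree = {}
    for path, text in tree.items():
        name = path.replace("/", "")
        if len(name) != 40:
            raise ValueError(f"unexpected notes path: {path!r}")
        out[name] = text
    return out


def overlay(ours: Tree, theirs: Tree) -> Tree:
    """Union of two notes trees; differing notes on the same object are concatenated."""
    merged = normalize(ours)
    for name, text in normalize(theirs).items():
        if name not in merged or merged[name] == text:
            merged[name] = text
        else:
            merged[name] = merged[name].rstrip("\n") + "\n\n" + text
    return merged


mine = {"ab/" + "c" * 38: "regressed here\n"}             # fanned-out path
yours = {"ab" + "c" * 38: "fixed by the next commit\n",  # flat path, same object
         "12" + "3" * 38: "slow on NFS\n"}
for name, text in sorted(overlay(mine, yours).items()):
    print(name[:7], repr(text))
```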

And for that most typical use case, I suspect even the current "notes on
any and all commits for a single purpose are thrown into one _bag_ that
is a notes tree, and the growth of that bag is made into a history" model
captures a set of notes that is too wide.

Suppose Alice, Bob and I are involved in a project, and we annotate
commits for some shared purpose (say, tracking regressions).  Alice and
Bob may independently annotate an overlapping set of commits (and hopefully
they have a shared root for their notes history as they are collaborating),
and they may even be working together on the same issue, but I may not be
involved in the area.  What happens when I pull from Alice and Bob and get
conflicts in notes they produced, especially when the only reason I was
interested was that they have new things to say about commits that I am
interested in?

Even in the primary content space you can end up with conflicts in areas
that you are not familiar with and that Alice and Bob are in charge of.
But I think there is a fundamental difference with this type of conflict
in the notes space.  The set of contents in the primary content space is
supposed to make a consistent whole, and there is a topic branch workflow
to partition the work so that I can easily kick the merge back to them
(i.e. I can tell Alice and Bob to resolve the conflicts between themselves
and trust that what they do between them does not touch anything outside
their area) without getting blocked.  I don't see a clear workflow to
resolve this in the notes space, especially with the set of operations the
current "git notes" offers (and obvious and straightforward enhancements of
it).  At least not yet.

It's like "keeping track of /etc" (or "your home directory").  It is a
misguided thing to do because you are throwing in records of the states of
totally unrelated things into a single history (e.g. "Why does it matter that I
added new user frotz to /etc/passwd before I futzed with my sendmail
configuration?  ---It shouldn't matter; there shouldn't be ancestry
relationships between these two changes").  I somehow feel that keeping
track of the "growth of the bag of annotations to any and all commits" in
a single history may be making the same mistake.

* Re: A generalization of git notes from blobs to trees - git metadata?
  2010-02-07  5:02     ` Jeff King
  2010-02-07  5:36       ` Jon Seymour
  2010-02-07 18:48       ` Junio C Hamano
@ 2010-02-07 22:46       ` Johan Herland
  2 siblings, 0 replies; 19+ messages in thread
From: Johan Herland @ 2010-02-07 22:46 UTC
  To: Jeff King; +Cc: Junio C Hamano, Jon Seymour, git

On Sunday 07 February 2010, Jeff King wrote:
> I think I may have been the one to suggest trees for notes at one point.
> But let me clarify that this is not exactly what the OP is proposing in
> this thread.
> 
> My suggestion was that some use cases may have many key/value pairs of
> notes for a single sha1. We basically have two options:
> 
>   1. store each in a separate notes ref, with each sha1 mapping to
>      a blob. The note "name" is the name of the ref.
> 
>   2. store notes in a single notes ref, with each sha1 mapping to a
>      tree with named sub-notes. The note "name" is the combination of
>      ref-name and tree entry name.
> 
> The advantage of (1) is that notes are not bound tightly to each other.
> I can distribute the notes tree for one "name" independent of the
> others.  The advantage of (2) is that it is faster and smaller. In (1),
> each note has a separate index, and we must traverse each note index
> separately.
> 
> In practice, I would expect to use (1) for logically separate datasets.
> For example, automatic bug-tracking notes would go in a different ref
> from human annotations. But I would expect to use (2) if I had, say, 5
> different pieces of bug tracking information and I wanted an easy way to
> refer to them individually.
> 
> And a specialized merge for that is straightforward. In the simplest
> case, you simply say "notes of this ref are tree-type, or they are
> blob-type" and then you have no merge problems. But if you want to get
> fancy, you can say that a conflict between "sha1/blob" and
> "sha1/tree/key" should automatically "promote" the first one into
> "sha1/tree/default" or some other canonical name.
> 
> Note that all of this is my pie-in-the-sky "here is what I was thinking
> of when I looked at notes a long time ago". I don't care strongly if it
> gets implemented or not at this point; I just wanted to add some context
> to what Johan had in his notes todo list (or maybe I am wrong, and what
> is in his todo list was based on something totally different said by
> somebody else, and I have just confused the issue more. :) ).

No, my TODO item was indeed based on your suggestion (although poorly 
represented by me, both in the TODO list, and in my original answer to Jon). 
However, note that I don't feel this specific itch myself, so I'm unlikely 
to scratch it.
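A sketch of the "promotion" merge from the quoted suggestion, for the notes on a single annotated object. A note is modelled as either a string (a blob note) or a dict of named sub-notes (a tree note). `default` is the canonical name the suggestion mentions. `NoteConflict` is a hypothetical exception standing in for whatever conflict reporting a real implementation would use. Two differing blob notes still conflict, as in ordinary blob merging.

```python
from typing import Dict, Union

Note = Union[str, Dict[str, str]]  # a blob note, or a tree of named sub-notes
DEFAULT_KEY = "default"            # canonical name for a promoted blob note


class NoteConflict(Exception):
    """Hypothetical stand-in for real conflict reporting."""


def promote(note: Note) -> Dict[str, str]:
    """Turn a blob note into a one-entry tree; copy a tree note unchanged."""
    return {DEFAULT_KEY: note} if isinstance(note, str) else dict(note)


def merge_note(ours: Note, theirs: Note) -> Note:
    """Merge two notes on the same object, promoting a blob if it meets a tree."""
    if isinstance(ours, str) and isinstance(theirs, str):
        if ours != theirs:
            raise NoteConflict("two different blob notes")
        return ours
    merged = promote(ours)
    for key, text in promote(theirs).items():
        if key in merged and merged[key] != text:
            raise NoteConflict(f"sub-note {key!r} differs")
        merged[key] = text
    return merged


print(merge_note("Fails on 32-bit.\n", {"bisect": "good\n"}))
```

Declaring a whole ref as blob-type or tree-type, the "simplest case" above, amounts to never reaching the `promote` path.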

> With respect to the idea of storing an arbitrary tree, I agree it is
> probably too complex with respect to merging. In addition, it makes
> things like "git log --format=%N" confusing. I think you would do better
> to simply store a tree sha1 inside the note blob, and callers who were
> interested in the tree contents could then dereference it and examine as
> they saw fit.  The only caveat is that you need some way of telling git
> that the referenced trees are reachable and not to be pruned.

Agreed. Arbitrary trees as notes objects is probably not a good idea.


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

* Re: A generalization of git notes from blobs to trees - git metadata?
  2010-02-07 20:25           ` Junio C Hamano
@ 2010-02-08  2:03             ` Steven E. Harris
  2010-02-10  5:09             ` Jeff King
  1 sibling, 0 replies; 19+ messages in thread
From: Steven E. Harris @ 2010-02-08  2:03 UTC
  To: git

Junio C Hamano <gitster@pobox.com> writes:

> It's like "keeping track of /etc" (or "your home directory").  It is a
> misguided thing to do because you are throwing in records of the
> states of totally unrelated things into a single history.

I've recently tried doing this again with Git, so this comment piqued my
interest. (That is, tracking changes to my various configuration files.)
I agree that browsing the history in toto is jarring, though the history
of a particular file may be telling.

Is there an alternative you'd recommend?

-- 
Steven E. Harris

* Re: A generalization of git notes from blobs to trees - git metadata?
  2010-02-07 20:25           ` Junio C Hamano
  2010-02-08  2:03             ` Steven E. Harris
@ 2010-02-10  5:09             ` Jeff King
  2010-02-10  5:23               ` Junio C Hamano
  1 sibling, 1 reply; 19+ messages in thread
From: Jeff King @ 2010-02-10  5:09 UTC
  To: Junio C Hamano; +Cc: Jon Seymour, Johan Herland, git

On Sun, Feb 07, 2010 at 12:25:13PM -0800, Junio C Hamano wrote:

> Suppose Alice, Bob and I are involved in a project, and we annotate
> commits for some shared purpose (say, tracking regressions).  Alice and
> Bob may independently annotate an overlapping set of commits (and hopefully
> they have a shared root for their notes history as they are collaborating),
> and they may even be working together on the same issue, but I may not be
> involved in the area.  What happens when I pull from Alice and Bob and get
> conflicts in notes they produced, especially when the only reason I was
> interested was that they have new things to say about commits that I am
> interested in?

Hmm. OK, I see the point of Jakub's message a bit more now. You want to
create a new view, inconsistent with that of either Alice or Bob (that
is, you have taken snippets of each one's state, but you cannot in good
faith represent this as a history merge, because your state should not
supersede either of theirs).

The standard way to do such a thing in git is to create a new, alternate
history through cherry-picking or rebasing. So I suspect we could do
something like:

  1. git notes pull alice

     We fast-forward (or do the trivial merge) with Alice's work.

  2. git notes pull --ignore-conflicts bob

     We try to merge Bob's work and see that there are conflicts. So we
     iterate through refs/notes..bob/notes, cherry-picking each one that
     applies cleanly and ignoring the rest.

And then you're at a state inconsistent with Bob, and a superset of what
Alice has. And that's what your history represents, too: you've branched
but done some of the same things as Bob. At that point you can examine
your inconsistent state, and then when you're done, you can either:

  3a. Reset back to your pre-ignore-conflicts state.

  3b. Leave it. When you pull from Bob later, your shared changes will
      be ignored[1], and you will get the conflicts that you ignored
      earlier.

It is perhaps a hacky band-aid to handle notes this way, but it is the
"most git" way of doing it. That is, it uses our standard tools and
practices.  And when all you have is a hammer... :)  And I really expect
the "I am collaborating with these people, but I want an inconsistent
view of their history" to be the exception. Most people would _want_ to
resolve the conflicts (especially if there is a --cat-conflicts
option to do it automatically) in a collaboration scenario.
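The --ignore-conflicts step above can be sketched in Python. Because `git notes pull` and `--ignore-conflicts` are proposals in this message, the sketch does not call git. Instead, it models each of Bob's notes commits as a change set that maps an object name to a (before, after) pair of note texts, where None means "no note". A change is picked only if every note it touches is still in its "before" state. All names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Notes = Dict[str, str]  # annotated object name -> note text
# One notes commit: object name -> (note before, note after); None means "no note".
Change = Dict[str, Tuple[Optional[str], Optional[str]]]


def applies_cleanly(notes: Notes, change: Change) -> bool:
    """True if every note the change touches is still in its 'before' state."""
    return all(notes.get(obj) == before for obj, (before, _) in change.items())


def apply_change(notes: Notes, change: Change) -> None:
    """Apply a change in place; an 'after' of None deletes the note."""
    for obj, (_, after) in change.items():
        if after is None:
            notes.pop(obj, None)
        else:
            notes[obj] = after


def pick_clean(ours: Notes, their_changes: List[Change]) -> Tuple[Notes, List[Change]]:
    """Cherry-pick their changes oldest first, skipping any that would conflict."""
    result = dict(ours)
    skipped: List[Change] = []
    for change in their_changes:
        if applies_cleanly(result, change):
            apply_change(result, change)
        else:
            skipped.append(change)
    return result, skipped


mine = {"c1": "alice: fails on arm\n"}
bob = [
    {"c2": (None, "bob: slow\n")},
    {"c1": (None, "bob: fine here\n")},  # conflicts with Alice's note on c1
    {"c2": ("bob: slow\n", "bob: slow, bisected\n")},
]
picked, skipped = pick_clean(mine, bob)
print(picked)
print(len(skipped), "change(s) skipped")
```

A skipped change is never applied. As a result, any later change of Bob's that builds on it also fails the "before" check and is skipped too, which keeps the resulting state internally consistent even though it diverges from Bob's.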

-Peff

[1] Actually, because history has diverged, you have the usual cherry-pick
problems when merging later. If some note is at state A, then I
cherry-pick Bob's change to B, then Bob changes it to C and I try to
merge with him, from the 3-way merge's perspective we have a conflict,
because nothing in the history says that Bob's change to C meant to
supersede my cherry-picked version of his history.
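The footnote's point can be checked with a toy three-way merge that treats a single note's text as one indivisible value. This is a simplification, since a real merge would be line-based. With A as the merge base, my cherry-picked B and Bob's C both differ from it, so the merge cannot pick a side. Had I merged Bob's B rather than cherry-picked it, the base would be B and C would win cleanly.

```python
from typing import Optional


class NoteConflict(Exception):
    """Raised when both sides changed the same note differently."""


def merge3(base: Optional[str], ours: Optional[str], theirs: Optional[str]) -> Optional[str]:
    """Three-way merge of one note, treating its whole text as a single value."""
    if ours == theirs:
        return ours
    if ours == base:
        return theirs
    if theirs == base:
        return ours
    raise NoteConflict(f"base={base!r} ours={ours!r} theirs={theirs!r}")


# Cherry-picked history: the common ancestor still has A, so B vs C conflicts.
try:
    merge3("A", "B", "C")
except NoteConflict as exc:
    print("conflict:", exc)

# Merged history: the common ancestor already has Bob's B, so his C wins.
print(merge3("B", "B", "C"))  # prints: C
```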

* Re: A generalization of git notes from blobs to trees - git metadata?
  2010-02-10  5:09             ` Jeff King
@ 2010-02-10  5:23               ` Junio C Hamano
  2010-02-10  5:29                 ` Jeff King
  0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2010-02-10  5:23 UTC
  To: Jeff King; +Cc: Junio C Hamano, Jon Seymour, Johan Herland, git

Jeff King <peff@peff.net> writes:

> On Sun, Feb 07, 2010 at 12:25:13PM -0800, Junio C Hamano wrote:
>
>> Suppose Alice, Bob and I are involved in a project, and we annotate
>> commits for some shared purpose (say, tracking regressions).  Alice and
>> Bob may independently annotate an overlapping set of commits (and hopefully
>> they have a shared root for their notes history as they are collaborating),
>> and they may even be working together on the same issue, but I may not be
>> involved in the area.  What happens when I pull from Alice and Bob and get
>> conflicts in notes they produced, especially when the only reason I was
>> interested was that they have new things to say about commits that I am
>> interested in?
>
> Hmm. OK, I see the point of Jakub's message a bit more now. You want to
> create a new view, inconsistent with that of either Alice or Bob (that
> is, you have taken snippets of each one's state, but you cannot in good
> faith represent this as a history merge, because your state should not
> supersede either of theirs).

In the message you are quoting, I am not interested in creating a narrowed
view.  If I cannot resolve conflicts between Alice and Bob in a merge in
the contents space, I would ask either of them (because they are more
familiar with the area) to do the merge.  I however was unsure if asking
the same for merges in the notes space is a reasonable thing to do.

* Re: A generalization of git notes from blobs to trees - git metadata?
  2010-02-10  5:23               ` Junio C Hamano
@ 2010-02-10  5:29                 ` Jeff King
  0 siblings, 0 replies; 19+ messages in thread
From: Jeff King @ 2010-02-10  5:29 UTC
  To: Junio C Hamano; +Cc: Jon Seymour, Johan Herland, git

On Tue, Feb 09, 2010 at 09:23:12PM -0800, Junio C Hamano wrote:

> > Hmm. OK, I see the point of Jakub's message a bit more now. You want to
> > create a new view, inconsistent with that of either Alice or Bob (that
> > is, you have taken snippets of each one's state, but you cannot in good
> > faith represent this as a history merge, because your state should not
> > supersede either of theirs).
> 
> In the message you are quoting, I am not interested in creating a narrowed
> view.  If I cannot resolve conflicts between Alice and Bob in a merge in
> the contents space, I would ask either of them (because they are more
> familiar with the area) to do the merge.  I however was unsure if asking
> the same for merges in the notes space is a reasonable thing to do.

No, I don't see a problem with asking them to do it. If you are all
collaborating as a group, it is something they will need to do
eventually anyway. If they are not, and you are an intermediary, you are
eventually going to share Alice's history with Bob and vice versa. So
you pull from Alice, then say to Bob: "I have some history but I'm not
sure of the correct merge. Pull from me and merge please". The only real
problem is if you _never_ want to share the history between the two of
them. In that case, I think you should keep two parallel branches of
history (refs/notes/alice and refs/notes/bob), and then squash the trees
at run-time (either concatenating them, or favoring one over the other
in the case of conflicts).
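The run-time squash suggested above can be sketched as a lookup over several independent notes maps, modelled as Python dicts standing in for refs/notes/alice and refs/notes/bob. The function name `combined_note` is hypothetical. Dropping exact duplicates and joining distinct notes with newlines are illustrative assumptions.

```python
from typing import Dict, List, Optional

Notes = Dict[str, str]  # annotated object name -> note text


def combined_note(obj: str, refs: List[Notes], concatenate: bool = True) -> Optional[str]:
    """Look up obj in several independent notes maps at display time.

    With concatenate=True every distinct note is shown, in the order of
    `refs`; with concatenate=False the first map that has a note wins.
    """
    found = [notes[obj] for notes in refs if obj in notes]
    if not found:
        return None
    if not concatenate:
        return found[0]
    distinct = list(dict.fromkeys(found))  # drop exact duplicates, keep order
    return "\n".join(text.rstrip("\n") for text in distinct) + "\n"


alice = {"c1": "alice: flaky test\n"}
bob = {"c1": "bob: fixed by c3\n", "c2": "bob: ok\n"}
print(combined_note("c1", [alice, bob]))
print(combined_note("c1", [bob, alice], concatenate=False))
```

Since neither history is ever merged into the other, no merge conflict can arise. The cost is that the choice between concatenating and favoring one ref has to be made again every time the notes are displayed.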

-Peff

Thread overview: 19+ messages
2010-02-06 13:32 A generalization of git notes from blobs to trees - git metadata? Jon Seymour
2010-02-07  1:36 ` Johan Herland
2010-02-07  2:21   ` Junio C Hamano
2010-02-07  5:02     ` Jeff King
2010-02-07  5:36       ` Jon Seymour
2010-02-07  9:15         ` Jakub Narebski
2010-02-07  9:41           ` Jon Seymour
2010-02-07 10:15             ` Jon Seymour
2010-02-07 19:33         ` Jeff King
2010-02-07 20:25           ` Junio C Hamano
2010-02-08  2:03             ` Steven E. Harris
2010-02-10  5:09             ` Jeff King
2010-02-10  5:23               ` Junio C Hamano
2010-02-10  5:29                 ` Jeff King
2010-02-07 18:48       ` Junio C Hamano
2010-02-07 19:18         ` Jeff King
2010-02-07 22:46       ` Johan Herland
2010-02-07  3:27   ` Jon Seymour
2010-02-07  4:32     ` Jon Seymour