[RFD] Notes are independent: proposal for new notes implementation

* [RFD] Notes are independent: proposal for new notes implementation
@ 2010-02-09 20:05 Jakub Narebski
  2010-02-09 20:26 ` Avery Pennarun
  2010-02-10  4:51 ` Jeff King
  0 siblings, 2 replies; 5+ messages in thread
From: Jakub Narebski @ 2010-02-09 20:05 UTC (permalink / raw)
  To: git; +Cc: Johan Herland, Junio C Hamano, Jon Seymour

Junio noticed, in one of the threads about the notes implementation in
git[*1*], that the current notes implementation has (conceptual) problems:

[1] "Re: A generalization of git notes from blobs to trees - git metadata?"
    Message-ID: <7v8wb4aj4m.fsf@alter.siamese.dyndns.org>
    http://permalink.gmane.org/gmane.comp.version-control.git/139252

    (it's one of the threads that IIRC started with Giuseppe Bilotta
    implementing hand-rolled support for notes in gitweb)

Junio C Hamano <gitster@pobox.com> writes:
JH>
JH> It's [current notes implementation] like "keeping track of /etc" (or
JH> "your home directory").  It is a misguided thing to do because you
JH> are throwing in records of the states of totally unrelated things
JH> into a single history (e.g. "Why does it matter I added new user 
JH> frotz to /etc/passwd before I futzed with my sendmail configuration?
JH> ---It shouldn't matter; there shouldn't be ancestry relationships
JH> between these two changes").  I somehow feel that keeping track of
JH> the "growth of the bag of annotations to any and all commits" in a
JH> single history may be making the same mistake.

The proposed solution was to use a custom merge strategy for notes.  But
what if the answer were to change the implementation, decoupling the
histories of notes from each other and keeping the history of each note
separate?

Let's simplify the situation and talk, for now, about a single notes
namespace (refs/notes/commits), no fanout scheme, and plain blob notes.

In the CURRENT notes implementation the notes ref (e.g. refs/notes/commits)
points to a commit object: the tip of the history of all notes.  This commit
stores information about the last change to any note; its commit message is
"Annotate <SHA-1>".  Its tree contains the mapping between notes and
annotated objects: notes are stored as leaves in the tree, and their
pathnames are (representations of) the objects they annotate.

This means for example that if in repository A somebody annotated
commits foo and bar creating notes in this order, and in repository B
somebody annotated bar and foo (creating notes in reverse order), then
merging those changes would require generating merge commit even if
those notes are identical.

 tree:                         <-- Annotate bar <-- refs/notes/commits
 <foo note SHA-1> <foo SHA-1>           |
 <bar note SHA-1> <bar SHA-1>           | (parent)
                                        |
                                        v
 tree:                         <-- Annotate foo
 <foo note SHA-1> <foo SHA-1>      (no parent)
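
The ordering problem above can be illustrated with a small model.  This
is only a sketch, not git's actual object format: obj_id(), tree_id(),
commit_id() and annotate_in_order() are hypothetical helpers, and real
commits also carry author and timestamp data that are left out here.

```python
import hashlib

def obj_id(kind, payload):
    """Content-address an object, loosely imitating git (simplified model)."""
    return hashlib.sha1(f"{kind}\0{payload}".encode()).hexdigest()

def tree_id(notes):
    # A notes tree maps annotated-object name -> note blob id, sorted by name.
    lines = [f"{obj_id('blob', text)} {name}" for name, text in sorted(notes.items())]
    return obj_id("tree", "\n".join(lines))

def commit_id(tree, parent, message):
    # Hypothetical commit encoding: tree, single parent (or None), message.
    return obj_id("commit", f"tree {tree}\nparent {parent}\n\n{message}")

def annotate_in_order(order, texts):
    """Return (tip commit id, tip tree id) after annotating in the given order."""
    notes, tip = {}, None
    for name in order:
        notes[name] = texts[name]
        tip = commit_id(tree_id(notes), tip, f"Annotate {name}")
    return tip, tree_id(notes)

texts = {"foo": "note on foo", "bar": "note on bar"}
tip_a, tree_a = annotate_in_order(["foo", "bar"], texts)  # repository A
tip_b, tree_b = annotate_in_order(["bar", "foo"], texts)  # repository B

print("same notes tree:", tree_a == tree_b)
print("same history tip:", tip_a == tip_b)
```

In this model both repositories end up with the same tree, but the tip
commits differ, so combining them needs a merge commit.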

The PROPOSED solution (with admittedly larger overhead) is to have the
notes history stored in a submodule-like fashion.  The notes ref would
point to a tree object.  In this tree each leaf would point to a *commit*
representing the tip of the history of a given note (as with submodules).
Each commit would contain a tree, which would map the note to the
annotated object (this is an extra level of indirection, needed because a
commit cannot point to a blob directly... unless multiple notes for the
same commit in a tree structure, or tree annotations, were implemented).

This way the history of each note lives in a kind of separate branch, and
the notes ref points to a tree object representing the branch hierarchy.

A merge conflict would appear only if the notes for the same object had
different contents and/or different histories.

                  tree:                         <-- refs/notes/commits
    /------------ <foo hist SHA-1> <foo SHA-1> 
    |         /-- <bar hist SHA-1> <bar SHA-1> 
    |         |
    |         v
    |     Annotate bar --> tree:
    |     (no parent)      <bar note SHA-1> <bar SHA-1>
    v
  Annotate foo ----------> tree
  (no parent)              <foo note SHA-1> <foo SHA-1>
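
A rough model of the proposed layout, under stated assumptions: the
helper names (note_history_tip, top_tree, merge_top_trees) are made up
for illustration, commits omit author and timestamp data, and any two
differing tips are reported as a conflict without checking whether one
history is an ancestor of the other.

```python
import hashlib

def obj_id(kind, payload):
    """Content-address an object, loosely imitating git (simplified model)."""
    return hashlib.sha1(f"{kind}\0{payload}".encode()).hexdigest()

def note_history_tip(name, versions):
    """Tip commit id of one note's private history (one commit per version)."""
    tip = None
    for text in versions:
        tree = obj_id("tree", f"{obj_id('blob', text)} {name}")
        tip = obj_id("commit", f"tree {tree}\nparent {tip}\n\nAnnotate {name}")
    return tip

def top_tree(histories):
    """Top-level tree: annotated object -> tip of that note's own history."""
    return {name: note_history_tip(name, v) for name, v in histories.items()}

def merge_top_trees(ours, theirs):
    """Union two top-level trees; list objects whose note histories differ."""
    merged, conflicts = dict(ours), []
    for name, tip in theirs.items():
        if name in merged and merged[name] != tip:
            conflicts.append(name)
        else:
            merged[name] = tip
    return merged, sorted(conflicts)

# Repositories A and B annotate foo and bar in opposite order.
repo_a = top_tree({"foo": ["note on foo"], "bar": ["note on bar"]})
repo_b = top_tree({"bar": ["note on bar"], "foo": ["note on foo"]})
# Repository C wrote a different note for foo.
repo_c = top_tree({"foo": ["different note on foo"]})

merged, conflicts = merge_top_trees(repo_a, repo_c)
print("A and B identical:", repo_a == repo_b)
print("conflicts between A and C:", conflicts)
```

Because each note has its own history, the order in which foo and bar
were annotated leaves no trace in this model; only foo, whose contents
differ between A and C, is reported.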

One thing that would need to be addressed is conversion from the older
notes implementation, but this should be doable.  The problem would be
supporting both implementations in one repository; it might not be
possible.  Also, this would break compatibility: older git versions
that support notes wouldn't be able, I guess, to access the new (proposed)
format.

There are probably numerous issues with the proposed implementation,
besides breaking backward compatibility...

P.S. This shows why git tools (such as gitweb) should not access notes
directly, but use git-notes, the %N / %N(<ref>) format specifier, and the
proposed <object>^@{} / <object>^@{<ref>} or <object>^{notes} /
<object>^{notes:<ref>} syntax.

-- 
Jakub Narebski
Poland

* Re: [RFD] Notes are independent: proposal for new notes  implementation
  2010-02-09 20:05 [RFD] Notes are independent: proposal for new notes implementation Jakub Narebski
@ 2010-02-09 20:26 ` Avery Pennarun
  2010-02-09 20:55   ` Jakub Narebski
  2010-02-10  4:51 ` Jeff King
  1 sibling, 1 reply; 5+ messages in thread
From: Avery Pennarun @ 2010-02-09 20:26 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, Johan Herland, Junio C Hamano, Jon Seymour

2010/2/9 Jakub Narebski <jnareb@gmail.com>:
> But
> what if the answer was to change implementation, decoupling history of
> notes from each other, and keeping history of each note separate.

Congratulations, you've re-invented CVS! :)

Seriously though, I'm not sure what problems this solved.  Notes that
are related to each other can (and perhaps should) be in the same
notes commit history; notes that are not related to each other can
exist in separate histories with their own ref.

> This means for example that if in repository A somebody annotated
> commits foo and bar creating notes in this order, and in repository B
> somebody annotated bar and foo (creating notes in reverse order), then
> merging those changes would require generating merge commit even if
> those notes are identical.

That's a feature; now you have the true history of your notes, which
is good for all the same reasons it's good in git.

Of course this whole line of reasoning could lead to questions like
"can I rebase my notes history?" and "what about rebase -i" and "can I
maintain a notes patch queue" and so on.

Have fun,

Avery

* Re: [RFD] Notes are independent: proposal for new notes  implementation
  2010-02-09 20:26 ` Avery Pennarun
@ 2010-02-09 20:55   ` Jakub Narebski
  2010-02-09 21:37     ` Avery Pennarun
  0 siblings, 1 reply; 5+ messages in thread
From: Jakub Narebski @ 2010-02-09 20:55 UTC (permalink / raw)
  To: Avery Pennarun; +Cc: git, Johan Herland, Junio C Hamano, Jon Seymour

On Tue, 9 Feb 2010, Avery Pennarun wrote:
> 2010/2/9 Jakub Narebski <jnareb@gmail.com>:

> > But
> > what if the answer was to change implementation, decoupling history of
> > notes from each other, and keeping history of each note separate.
> 
> Congratulations, you've re-invented CVS! :)
> 
> Seriously though, I'm not sure what problems this solved.  Notes that
> are related to each other can (and perhaps should) be in the same
> notes commit history; notes that are not related to each other can
> exist in separate histories with their own ref.

The problem is (as I see it) that notes are _not_ (in almost all cases)
related to each other, just like files in $HOME or in /etc are not
related to each other.  Separate notes refs for separate histories
are not IMHO a good solution: ref namespaces are about the *kind* (flavor)
of notes: commit annotations, bisect, git-svn, apply-email, bugs / tickets,
etc., and each flavor (kind) of notes contains many independent notes.

This is as opposed to workspace history, where history (in almost all
cases) makes sense only for all files together, as the history of the
project as a whole.

And of course we would have atomic commits, merge tracking, support for
renames etc., something like Zit[1]

[1]: http://git.wiki.kernel.org/index.php/InterfacesFrontendsAndTools#Zit

> > This means for example that if in repository A somebody annotated
> > commits foo and bar creating notes in this order, and in repository B
> > somebody annotated bar and foo (creating notes in reverse order), then
> > merging those changes would require generating merge commit even if
> > those notes are identical.
> 
> That's a feature; now you have the true history of your notes, which
> is good for all the same reasons it's good in git.

No, you are introducing an artificial ordering into something that is a
bag, an unordered collection.

> Of course this whole line of reasoning could lead to questions like
> "can I rebase my notes history?" and "what about rebase -i" and "can I
> maintain a notes patch queue" and so on.

-- 
Jakub Narebski
Poland

* Re: [RFD] Notes are independent: proposal for new notes  implementation
  2010-02-09 20:55   ` Jakub Narebski
@ 2010-02-09 21:37     ` Avery Pennarun
  0 siblings, 0 replies; 5+ messages in thread
From: Avery Pennarun @ 2010-02-09 21:37 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, Johan Herland, Junio C Hamano, Jon Seymour

On Tue, Feb 9, 2010 at 3:55 PM, Jakub Narebski <jnareb@gmail.com> wrote:
> On Tue, 9 Feb 2010, Avery Pennarun wrote:
>> 2010/2/9 Jakub Narebski <jnareb@gmail.com>:
>> > But
>> > what if the answer was to change implementation, decoupling history of
>> > notes from each other, and keeping history of each note separate.
>>
>> Congratulations, you've re-invented CVS! :)
>>
>> Seriously though, I'm not sure what problems this solved.  Notes that
>> are related to each other can (and perhaps should) be in the same
>> notes commit history; notes that are not related to each other can
>> exist in separate histories with their own ref.
>
> The problem is (as I see it) that notes are _not_ (in almost all cases)
> related to each other, just like files in $HOME or in /etc are not
> related to each other.

As a side note, I didn't find this example compelling at all.  I
*absolutely* want to manage all my files in /etc as a single repo.
"The configuration of my computer" is an ongoing project where the
configuration of my smtp daemon depends on /etc/hosts and /etc/passwd
and /etc/group.  If I set up another server, I want to be able to fork
my basic configuration and apply some patches.  If I set up some
clever aliases in /etc/profile, I want to send that patch "upstream"
to the /etc project on my other servers.

Similarly with $HOME; the evolution of my home directory over time is
a thing I can talk about as a sensible whole, and of course I want
rename tracking and deltas and so on.

Combining /etc and $HOME into a single repo would be harder to
justify.  But that sounds to me like the "kind (flavor)" distinction
you're talking about; system config files and personal files are two
different kinds of files.

>> > This means for example that if in repository A somebody annotated
>> > commits foo and bar creating notes in this order, and in repository B
>> > somebody annotated bar and foo (creating notes in reverse order), then
>> > merging those changes would require generating merge commit even if
>> > those notes are identical.
>>
>> That's a feature; now you have the true history of your notes, which
>> is good for all the same reasons it's good in git.
>
> No, you are introducing artificial ordering in something that is a bag,
> unordered collection.

I would put it another way: you're recording the true ordering of
something where the ordering *may* not be important.  It is easy to
ignore that ordering.

However, it's very hard to unignore ordering that you didn't record in
the first place.  That's why CVS's model of recording
one-history-per-file is so nasty.  Yet it seemed so clever when they
invented it.

What is a real use case where the "artificial ordering" causes a problem?

Here's a use case where having a single history would be a clear
benefit: say you're running an autobuilder such as gitbuilder[1].
Something goes wrong with your autobuild environment, like the disk
fills up, and all the build results since yesterday at noon are
invalid.  If the build results are stored as a single history of git
notes[2], I can just rewind the history to yesterday at noon and
discard the entire sequence of bad results across all commits.  If
they each had their own history, it would be more complex to
implement.
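
As a rough sketch of that difference (a toy model with made-up data,
not gitbuilder or git-notes code; rewind_single and rewind_per_note are
hypothetical helpers and timestamps are plain integers):

```python
from bisect import bisect_right

def rewind_single(history, cutoff):
    """history: list of (timestamp, commit, result) in commit order.

    One shared history needs one cut: keep everything up to the cutoff.
    """
    timestamps = [ts for ts, _, _ in history]
    return history[:bisect_right(timestamps, cutoff)]

def rewind_per_note(histories, cutoff):
    """histories: {commit: [(timestamp, result), ...]}, one list per note.

    Every note's history has to be visited and cut separately.
    """
    rewound = {}
    for name, entries in histories.items():
        kept = [e for e in entries if e[0] <= cutoff]
        if kept:  # a note first written after the cutoff disappears entirely
            rewound[name] = kept
    return rewound

single = [(1, "c1", "ok"), (2, "c2", "ok"), (5, "c1", "FAIL"), (6, "c3", "FAIL")]
per_note = {
    "c1": [(1, "ok"), (5, "FAIL")],
    "c2": [(2, "ok")],
    "c3": [(6, "FAIL")],
}

good_single = rewind_single(single, cutoff=3)
good_per_note = rewind_per_note(per_note, cutoff=3)
print(good_single)
print(good_per_note)
```

In the single-history case the rewind amounts to moving one ref; in the
per-note case each history needs its own cut and the top-level tree has
to be rebuilt afterwards.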

Have fun,

Avery

[1] http://github.com/apenwarr/gitbuilder

[2] gitbuilder doesn't currently use git-notes, it just uses plain files.

* Re: [RFD] Notes are independent: proposal for new notes implementation
  2010-02-09 20:05 [RFD] Notes are independent: proposal for new notes implementation Jakub Narebski
  2010-02-09 20:26 ` Avery Pennarun
@ 2010-02-10  4:51 ` Jeff King
  1 sibling, 0 replies; 5+ messages in thread
From: Jeff King @ 2010-02-10  4:51 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, Johan Herland, Junio C Hamano, Jon Seymour

On Tue, Feb 09, 2010 at 09:05:23PM +0100, Jakub Narebski wrote:

> The proposed solution was to use custom merge strategy for notes.  But
> what if the answer was to change implementation, decoupling history of
> notes from each other, and keeping history of each note separate.

If I am understanding you correctly, instead of keeping a commit history
of trees of many notes (one per sha1), we will have a tree of commit
histories, one history per sha1. What problem is this solving?

If I modify commit X and you modify commit Y, we avoid making a merge
commit. But so what? The merge would be trivial, since we did not modify
the same entries. The user never cares that there is a merge commit.

And if we both _did_ edit commit X, both cases result in a merge.
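
To make the two cases concrete, here is a minimal sketch of a three-way
merge over notes trees modeled as plain dicts (annotated object -> note
text).  merge_notes is a hypothetical helper for illustration, not
git's merge machinery.

```python
def merge_notes(base, ours, theirs):
    """Three-way merge of notes trees; returns (merged tree, conflicts)."""
    merged, conflicts = {}, []
    for name in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(name), ours.get(name), theirs.get(name)
        if o == t:        # both sides agree (or neither touched it)
            result = o
        elif o == b:      # only they changed this entry
            result = t
        elif t == b:      # only we changed this entry
            result = o
        else:             # both changed it, differently
            conflicts.append(name)
            continue
        if result is not None:  # None means the note was removed
            merged[name] = result
    return merged, conflicts

base = {}
# I annotate X, you annotate Y: different entries, trivial merge.
trivial, trivial_conflicts = merge_notes(base, {"X": "mine"}, {"Y": "yours"})
# We both annotate X with different text: a real conflict.
clash, clash_conflicts = merge_notes(base, {"X": "mine"}, {"X": "yours"})

print(trivial, trivial_conflicts)
print(clash, clash_conflicts)
```

Only the second case needs any attention from the user, whichever of
the two storage layouts is used.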

If we both modified X and Y, then you will presumably do the merge for X
and Y iteratively before you can create a new notes tree. Or you could
merge them separately. But why? Why would I want to pull some subset of
your notes (and keep in mind this is a subset of the commits you have
noted, not a semantically different notes namespace), and how would I
even specify which notes were of interest and which were not?

So what is the concrete use case where this helps?

-Peff
