What happened with git-note?

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* What happened with git-note?
@ 2007-07-10 23:17 Alberto Bertogli
  2007-07-11  1:17 ` Johannes Schindelin
  0 siblings, 1 reply; 2+ messages in thread
From: Alberto Bertogli @ 2007-07-10 23:17 UTC (permalink / raw)
  To: git

Hi!

There was a very long thread about a proposed feature, git-note
(http://thread.gmane.org/gmane.comp.version-control.git/46770/focus=48540),
around May this year.

I was wondering what happened to it, because some issues were brought up
but then everything seem to have stalled.

Is someone working on it? Is there a clear list of the pending issues
somewhere?

Thanks,
		Alberto

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: What happened with git-note?
  2007-07-10 23:17 What happened with git-note? Alberto Bertogli
@ 2007-07-11  1:17 ` Johannes Schindelin
  0 siblings, 0 replies; 2+ messages in thread
From: Johannes Schindelin @ 2007-07-11  1:17 UTC (permalink / raw)
  To: Alberto Bertogli; +Cc: git, gitster, Johan Herland

Hi,

[Johan, I Cc'ed you, since you were the person pushing for git-notes 
originally. Junio, I Cc'ed you, because you correctly pointed out the 
flaws in my approach.]

On Tue, 10 Jul 2007, Alberto Bertogli wrote:

> There was a very long thread about a proposed feature, git-note 
> (http://thread.gmane.org/gmane.comp.version-control.git/46770/focus=48540), 
> around May this year.
> 
> I was wondering what happened to it, because some issues were brought up
> but then everything seem to have stalled.
> 
> Is someone working on it? Is there a clear list of the pending issues
> somewhere?

Okay, I seem to be at fault for the stalling.  So I better speak up about 
what my proof-of-concept did:

It creates a new ref, "refs/notes/commits", which had a tree object in it. 
The tree entries were actually fan-out directories a la .git/objects/??/. 
So if you added a not to commit abcdef01234..., it created a "file" named 
refs/notes/commits:ab/cdef01234...

The advantages of this approach, as compared to an external list, or even 
SQLite database were, amongst others:

- you can use most of the existing framework (storage, pruning, etc.),

- there is an obvious way how to fetch the notes from someone else,

- there is an obvious way how to deal with conflicts when merging notes
  from someone else,

- it is conceptually trivial.

However, the biggest disadvantage was pointed out by Junio:

The tree objects under refs/notes/commits:??/ can get pretty large, in 
theory.  Suppose you have a rate of a couple of hundred commits a day, and 
you want to use the notes framework, amongst others, to be able to add 
Acked-by's all over the place.  Pretty soon we are in the hundreds of 
entries in the tree objects.

I thought that this is not a problem, since the names all have the same 
length.  Alas, Git is general purpose, so it does not, and cannot, assume 
a certain filename format.  Git might be considered a stupid content 
tracker, but it is not quite that stupid.

Since the tree objects just contain lists of tree entries, and each tree 
entry is variable length, you have to look through half of the entries on 
average, before you find the one you look for.

Possible solutions I can come up with:

There was a lot of talk, and quite a few good ideas, in the packv4 side 
branch, which was undertaken by Nico and Shawn.  The basic idea, IIUC, is 
to store objects differently, by having nice things like a filename table 
containing the most common names, sorted, which are then not only 
addressed much more efficiently (by a 16-bit index), but reverse indexing 
is also much more efficient, since you do not have to expand and compare 
the full filenames, but instead compare 16-bin numbers.

This format would help with the approach I presented above.

Another idea which just crossed my mind, is to add indexes which are 
generated locally, but not transmitted when fetching, much like the pack 
indices, which are generated from the pack while fetching it.

There are a couple of possible technical ways to do that efficiently, such 
as mmap()ed rbtrees, or hashtables.  Of course, Git's own history suggests 
to prefer hashtables over other data structures, when your keys are 
already pretty good hashes, which they are.  However, I have not thought 
it through, what with the issue of growing the data structure, or 
thrashing the I/O caches.

Of course, one does not have to go with the approach of storing the things 
as blobs.  There might well be a genius coming along, reading this mail, 
and proposing the uber-solution.  But dammit, I am pretty sure that we 
have a real good framework, and that a proper solution is just a stone's 
throw away.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2007-07-11  1:24 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-07-10 23:17 What happened with git-note? Alberto Bertogli
2007-07-11  1:17 ` Johannes Schindelin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).