git.vger.kernel.org archive mirror
From: Johan Herland <johan@herland.net>
To: Junio C Hamano <junkio@cox.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>, git@vger.kernel.org
Subject: Re: [PATCH 00/15] git-note: A mechanisim for providing free-form after-the-fact annotations on commits
Date: Tue, 29 May 2007 09:06:29 +0200
Message-ID: <200705290906.29328.johan@herland.net>
In-Reply-To: <7vwsysbrtg.fsf@assigned-by-dhcp.cox.net>

On Monday 28 May 2007, Junio C Hamano wrote:
> Johan Herland <johan@herland.net> writes:
> > On Monday 28 May 2007, Linus Torvalds wrote:
> >> On Mon, 28 May 2007, Johan Herland wrote:
> >> > I still don't see what makes note objects inherently more expensive than
> >> > commit objects. Except for the refs, of course, but we're getting rid
> >> > of those (at least replacing them with a more efficient reverse mapping).
> >> 
> >> It's exactly the refs that I worry about.
> >> 
> >> Anything that needs to read in all notes at startup is going to be _slow_.
> >> 
> >> In contrast, commits we read when (and only when) we need them.
> >
> > Ok. But the reverse mapping will help with this, won't it?
> > We'll look up the interesting commits and find their associated
> > note objects directly.
> 
> The issue Linus brought up worries me, too.
> 
> The "efficient reverse mapping" is still handwaving at this
> stage.  What it needs to do is an equivalent to your
> implementation with "refs/notes/<a dir per commit>/<note>".  The
> "efficient" one might do a flat file that says "notee note" per
> line sorted by notee, or it might use BDB or sqlite, but the
> amount of the data and complexity of the look-up is really the
> same.  A handful notes per each commit in the history (I think
> Linus's "Acked-by after the fact" example a very sensible thing
> to want from this subsystem).
> 
> I am not saying that it is impossible to make the set-up cost
> for the "efficient lookup" almost zero, and to make it lazy and
> on-demand.  The concern above just adds one design constraints
> to that "efficient reverse mapping" code yet to come.

Ok, here's what I'm thinking so far on that reverse mapping:

1. Keep a file, ".git/reverse_tagmap_sorted" with one entry of the form
"pointee pointer" per line. The file is sorted on "pointee", so we can
easily do the radix-256-fan-out-followed-by-binary-search trick that
Linus mentioned in another thread. This should hopefully make lookup
fairly cheap. BTW, if there is a similar "pointee pointer"-type format
already being used in git, I'd be happy to use that instead. I looked
at the "peeled" format being used by packed-refs, but using that
directly doesn't sound like a good idea, since the refname causes the
entries to be of variable length, and the refnames are not interesting
to me at all.

2. Keep another file, ".git/reverse_tagmap_unsorted" in front of (1).
This file has exactly the same format, minus the sorting. It exists just
to make insertion cheap. Once this file reaches a certain size (i.e.
when trawling it on lookup becomes slightly painful), we shuffle the
entries into the sorted file (this would happen automatically on
insertion of an entry, and should _not_ have to be triggered by 'git-gc'
etc.).
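
The lookup described in (1) could be sketched roughly as follows. This
is only an illustration in Python, not proposed git code: it assumes
40-character hex SHA-1 names, so every "pointee pointer" line is exactly
82 bytes, and it keeps the file content in a bytes object. The names
ENTRY_LEN, build_sorted, build_fanout and lookup are hypothetical.

```python
ENTRY_LEN = 82  # assumed layout: 40 hex + space + 40 hex + newline


def build_sorted(pairs):
    """Return sorted file content (bytes) for (pointee, pointer) hex pairs."""
    lines = sorted(f"{pointee} {pointer}\n" for pointee, pointer in pairs)
    return "".join(lines).encode("ascii")


def build_fanout(data):
    """fanout[b] = number of entries whose pointee's first byte is <= b."""
    counts = [0] * 256
    for off in range(0, len(data), ENTRY_LEN):
        counts[int(data[off:off + 2], 16)] += 1
    fanout, total = [], 0
    for count in counts:
        total += count
        fanout.append(total)
    return fanout


def lookup(data, fanout, pointee):
    """Return every pointer recorded for pointee."""
    first = int(pointee[:2], 16)
    # The fan-out table narrows the search to entries sharing the first byte.
    lo = fanout[first - 1] if first else 0
    end = hi = fanout[first]
    key = pointee.encode("ascii")
    # Binary search for the leftmost entry whose pointee is >= key.
    while lo < hi:
        mid = (lo + hi) // 2
        if data[mid * ENTRY_LEN:mid * ENTRY_LEN + 40] < key:
            lo = mid + 1
        else:
            hi = mid
    # Collect the run of entries that match the key exactly.
    found = []
    while lo < end and data[lo * ENTRY_LEN:lo * ENTRY_LEN + 40] == key:
        start = lo * ENTRY_LEN + 41
        found.append(data[start:start + 40].decode("ascii"))
        lo += 1
    return found
```

Because the entries are fixed-width, the n-th entry starts at byte
n * ENTRY_LEN, so a real implementation could presumably seek or mmap
into the file instead of loading it whole.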


Of course, if we think insertion directly into (1) will never be too
expensive, we can drop (2) altogether.
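
To make the interplay between (1) and (2) concrete, here is a minimal
sketch in Python. Plain lists stand in for the two files, and the class
name TwoTierMap and the flush_threshold knob are made up for the
example; the actual threshold would have to be found by measurement.

```python
import bisect


class TwoTierMap:
    """A sorted tier with a small unsorted tier in front of it."""

    def __init__(self, flush_threshold=1000):
        self.sorted_entries = []    # stands in for reverse_tagmap_sorted
        self.unsorted_entries = []  # stands in for reverse_tagmap_unsorted
        self.flush_threshold = flush_threshold  # hypothetical tuning knob

    def insert(self, pointee, pointer):
        # Insertion is a cheap append to the unsorted tier.
        self.unsorted_entries.append((pointee, pointer))
        # Flushing is triggered by insertion itself, not by a separate gc.
        if len(self.unsorted_entries) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Shuffle the unsorted entries into the sorted tier.
        self.sorted_entries = sorted(
            self.sorted_entries + self.unsorted_entries)
        self.unsorted_entries = []

    def lookup(self, pointee):
        # Binary search in the sorted tier ...
        i = bisect.bisect_left(self.sorted_entries, (pointee, ""))
        found = []
        while (i < len(self.sorted_entries)
               and self.sorted_entries[i][0] == pointee):
            found.append(self.sorted_entries[i][1])
            i += 1
        # ... followed by a linear scan of the (small) unsorted tier.
        found.extend(p for q, p in self.unsorted_entries if q == pointee)
        return found
```

The cost of a lookup grows with the size of the unsorted tier, which is
why the threshold bounds it; dropping (2) corresponds to a threshold of
one.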

I don't know enough about packing to have a good idea of how to pack
these reverse tagmaps, but Shawn's thoughts about keeping associated
tags/notes and objects close together make a lot of sense. I'm just
not sure yet where these reverse tagmaps fit into the whole picture.

Currently, AFAICS, the packed-refs file is never propagated into the
packs; it stays separate for the lifetime of the repo. But we seem to
be designing these reverse tagmaps to manage a handful of notes per
commit, i.e. to hold a couple of orders of magnitude more entries than
the packed-refs file.

Maybe each pack should keep the reverse tagmap for all the object->note
relationships internal to that pack? Everything else (unpacked notes,
and object->note relationships spanning packs) would be kept in (1).
Of course, when repacking, we'd try to keep objects and their notes
together as much as possible, to maximize the in-pack reverse tagmap,
and minimize the number of entries left behind in (1).
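
As a rough illustration of that split, the following Python sketch
partitions "pointee pointer" entries into per-pack maps and a
remainder that would stay in (1). It assumes we already have some
mapping from object name to pack; the function name partition_by_pack
and the pack_of dictionary are hypothetical and say nothing about how
packs would really store this.

```python
from collections import defaultdict


def partition_by_pack(entries, pack_of):
    """Split (pointee, pointer) entries into per-pack maps and a remainder.

    pack_of maps an object name to the name of the pack holding it;
    objects missing from pack_of are treated as unpacked.
    """
    per_pack = defaultdict(list)
    remainder = []
    for pointee, pointer in entries:
        pointee_pack = pack_of.get(pointee)
        pointer_pack = pack_of.get(pointer)
        if pointee_pack is not None and pointee_pack == pointer_pack:
            # Both ends live in the same pack: keep the entry inside it.
            per_pack[pointee_pack].append((pointee, pointer))
        else:
            # Unpacked, or spanning two packs: left behind in (1).
            remainder.append((pointee, pointer))
    for pack_entries in per_pack.values():
        pack_entries.sort()
    remainder.sort()
    return dict(per_pack), remainder
```

A repack that places notes next to the objects they annotate would
shrink the remainder, which is the effect described above.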


Have fun!

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

