From: Johan Herland <johan@herland.net>
To: Junio C Hamano <junkio@cox.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>, git@vger.kernel.org
Subject: Re: [PATCH 00/15] git-note: A mechanisim for providing free-form after-the-fact annotations on commits
Date: Tue, 29 May 2007 09:06:29 +0200
Message-ID: <200705290906.29328.johan@herland.net>
In-Reply-To: <7vwsysbrtg.fsf@assigned-by-dhcp.cox.net>
On Monday 28 May 2007, Junio C Hamano wrote:
> Johan Herland <johan@herland.net> writes:
> > On Monday 28 May 2007, Linus Torvalds wrote:
> >> On Mon, 28 May 2007, Johan Herland wrote:
> >> > I still don't see what makes note objects inherently more expensive than
> >> > commit objects. Except for the refs, of course, but we're getting rid
> >> > of those (at least replacing them with a more efficient reverse mapping).
> >>
> >> It's exactly the refs that I worry about.
> >>
> >> Anything that needs to read in all notes at startup is going to be _slow_.
> >>
> >> In contrast, commits we read when (and only when) we need them.
> >
> > Ok. But the reverse mapping will help with this, won't it?
> > We'll look up the interesting commits and find their associated
> > note objects directly.
>
> The issue Linus brought up worries me, too.
>
> The "efficient reverse mapping" is still handwaving at this
> stage. What it needs to do is an equivalent to your
> implementation with "refs/notes/<a dir per commit>/<note>". The
> "efficient" one might do a flat file that says "notee note" per
> line sorted by notee, or it might use BDB or sqlite, but the
> amount of the data and complexity of the look-up is really the
> same. Expect a handful of notes per commit in the history (I think
> Linus's "Acked-by after the fact" example is a very sensible thing
> to want from this subsystem).
>
> I am not saying that it is impossible to make the set-up cost
> for the "efficient lookup" almost zero, and to make it lazy and
> on-demand. The concern above just adds one design constraint
> to that "efficient reverse mapping" code yet to come.
Ok, here's what I'm thinking so far on that reverse mapping:

1. Keep a file, ".git/reverse_tagmap_sorted" with one entry of the form
   "pointee pointer" per line. The file is sorted on "pointee", so we can
   easily do the radix-256-fan-out-followed-by-binary-search trick that
   Linus mentioned in another thread. This should hopefully make lookup
   fairly cheap. BTW, if there is a similar "pointee pointer"-type format
   already being used in git, I'd be happy to use that instead. I looked
   at the "peeled" format being used by packed-refs, but using that
   directly doesn't sound like a good idea, since the refname causes the
   entries to be of variable length, and the refnames are not interesting
   to me at all.

2. Keep another file, ".git/reverse_tagmap_unsorted" in front of (1).
   This file has exactly the same format, minus the sorting. It exists just
   to make insertion cheap. Once this file reaches a certain size (i.e.
   when trawling it on lookup becomes slightly painful), we shuffle the
   entries into the sorted file (this would happen automatically on
   insertion of an entry, and should _not_ have to be triggered by 'git-gc'
   etc.).

Of course, if we think insertion directly into (1) will never be too
expensive, we can drop (2) altogether.
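
To make (1) and (2) a bit more concrete, here is a rough in-memory
sketch in Python. It is only an illustration under stated assumptions,
not a proposed implementation: the class name ReverseTagmap, the
flush_threshold parameter and the use of Python lists in place of the
two files are all made up for this example, and object names are
assumed to be lowercase hex strings.

```python
import bisect


class ReverseTagmap:
    """Toy model: a sorted list stands in for file (1), an unsorted
    list stands in for file (2). All names here are hypothetical."""

    def __init__(self, flush_threshold=1000):
        self.sorted_entries = []    # (pointee, pointer) pairs, sorted
        self.unsorted_entries = []  # (pointee, pointer) pairs, insertion order
        self.fanout = [0] * 257     # fanout[b]..fanout[b + 1] bounds first byte b
        self.flush_threshold = flush_threshold

    def _rebuild_fanout(self):
        # fanout[b] is the index of the first sorted entry whose first
        # byte (first two hex digits) is >= b; fanout[256] is the total.
        firsts = [int(pointee[:2], 16) for pointee, _ in self.sorted_entries]
        self.fanout = [bisect.bisect_left(firsts, b) for b in range(257)]

    def insert(self, pointee, pointer):
        # Cheap append; merge into the sorted part once the unsorted
        # part grows past the threshold (no external trigger needed).
        self.unsorted_entries.append((pointee, pointer))
        if len(self.unsorted_entries) >= self.flush_threshold:
            self.flush()

    def flush(self):
        self.sorted_entries = sorted(self.sorted_entries + self.unsorted_entries)
        self.unsorted_entries = []
        self._rebuild_fanout()

    def lookup(self, pointee):
        # Radix-256 fan-out narrows the range, binary search finishes it.
        b = int(pointee[:2], 16)
        lo, hi = self.fanout[b], self.fanout[b + 1]
        i = bisect.bisect_left(self.sorted_entries, (pointee, ""), lo, hi)
        found = []
        while i < hi and self.sorted_entries[i][0] == pointee:
            found.append(self.sorted_entries[i][1])
            i += 1
        # The unsorted part has to be scanned linearly.
        found += [ptr for pte, ptr in self.unsorted_entries if pte == pointee]
        return found
```

The linear scan at the end of lookup() is the part that becomes
"slightly painful" as the unsorted part grows, which is why insert()
folds it into the sorted part once the threshold is reached.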

I don't know enough about packing to have a good idea of how to pack
these reverse tagmaps, but Shawn's thoughts about keeping associated
tags/notes and objects close together make a lot of sense. I'm just
not sure yet where these reverse tagmaps fit into the whole picture.

Currently, AFAICS, the packed-refs file is never propagated into the
packs, but stays separate for the lifetime of the repo. Then again, it
seems we're designing these reverse tagmaps for managing a handful of
notes per commit, i.e. to hold a couple of orders of magnitude more
entries than the packed-refs file.

Maybe each pack should keep the reverse tagmap for all the object->note
relationships internal to that pack? Everything else (unpacked notes,
and object->note relationships spanning packs) would be kept in (1).
Of course, when repacking, we'd try to keep objects and their notes
together as much as possible, to maximize the in-pack reverse tagmap,
and minimize the number of entries left behind in (1).
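
The split between in-pack entries and entries left behind in (1) could
be sketched roughly as below. Again, this is only an illustration: the
function partition_entries and the pack_of dictionary (object name ->
pack identifier, with unpacked objects simply absent) are hypothetical
and do not correspond to anything that exists in git.

```python
from collections import defaultdict


def partition_entries(entries, pack_of):
    """Split (pointee, pointer) pairs into per-pack maps and leftovers.

    An entry goes into a pack's own map only if both the object and
    its note live in that same pack; everything else is left over.
    """
    per_pack = defaultdict(list)
    leftover = []
    for pointee, pointer in entries:
        pointee_pack = pack_of.get(pointee)  # None means "not packed"
        pointer_pack = pack_of.get(pointer)
        if pointee_pack is not None and pointee_pack == pointer_pack:
            per_pack[pointee_pack].append((pointee, pointer))
        else:
            leftover.append((pointee, pointer))
    return dict(per_pack), leftover
```

The better a repack manages to keep objects and their notes in the
same pack, the shorter the leftover list gets.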
Have fun!
...Johan
--
Johan Herland, <johan@herland.net>
www.herland.net