From: Johan Herland <johan@herland.net>
To: Junio C Hamano <junkio@cox.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>, git@vger.kernel.org
Subject: Re: [PATCH 00/15] git-note: A mechanisim for providing free-form after-the-fact annotations on commits
Date: Tue, 29 May 2007 09:06:29 +0200 [thread overview]
Message-ID: <200705290906.29328.johan@herland.net> (raw)
In-Reply-To: <7vwsysbrtg.fsf@assigned-by-dhcp.cox.net>
On Monday 28 May 2007, Junio C Hamano wrote:
> Johan Herland <johan@herland.net> writes:
> > On Monday 28 May 2007, Linus Torvalds wrote:
> >> On Mon, 28 May 2007, Johan Herland wrote:
> >> > I still don't see what makes note objects inherently more expensive than
> >> > commit objects. Except for the refs, of course, but we're getting rid
> >> > of those (at least replacing them with a more efficient reverse mapping).
> >>
> >> It's exactly the refs that I worry about.
> >>
> >> Anything that needs to read in all notes at startup is going to be _slow_.
> >>
> >> In contrast, commits we read when (and only when) we need them.
> >
> > Ok. But the reverse mapping will help with this, won't it?
> > We'll look up the interesting commits and find their associated
> > note objects directly.
>
> The issue Linus brought up worries me, too.
>
> The "efficient reverse mapping" is still handwaving at this
> stage. What it needs to do is an equivalent to your
> implementation with "refs/notes/<a dir per commit>/<note>". The
> "efficient" one might do a flat file that says "notee note" per
> line sorted by notee, or it might use BDB or sqlite, but the
> amount of the data and complexity of the look-up is really the
> same. A handful notes per each commit in the history (I think
> Linus's "Acked-by after the fact" example a very sensible thing
> to want from this subsystem).
>
> I am not saying that it is impossible to make the set-up cost
> for the "efficient lookup" almost zero, and to make it lazy and
> on-demand. The concern above just adds one design constraints
> to that "efficient reverse mapping" code yet to come.
Ok, here's what I'm thinking so far on that reverse mapping:
1. Keep a file, ".git/reverse_tagmap_sorted" with one entry of the form
"pointee pointer" per line. The file is sorted on "pointee", so we can
easily do the radix-256-fan-out-followed-by-binary-search trick that
Linus mentioned in another thread. This should hopefully make lookup
fairly cheap. BTW, if there is a similar "pointee pointer"-type format
already being used in git, I'd be happy to use that instead. I looked
at the "peeled" format being used by packed-refs, but using that
directly doesn't sound like a good idea, since the refname causes the
entries to be of variable length, and the refnames are not interesting
to me at all.
2. Keep another file, ".git/reverse_tagmap_unsorted" in front of (1).
This file has exactly the same format, minus the sorting. It exists just
to make insertion cheap. Once this file reaches a certain size (i.e.
when trawling it on lookup becomes slightly painful), we shuffle the
entries into the sorted file (this would happen automatically on
insertion of an entry, and should _not_ have to be triggered by 'git-gc'
etc.).
Of course, if we think insertion directly into (1) will never be too
expensive, we can drop (2) altogether.
I don't know enough about packing to have a good idea on how to pack
these reverse tagmaps, but Shawn's thoughts about keeping associated
tags/notes and objects close together makes a lot of sense. I'm just
not sure yet where these reverse tagmaps fit into the whole picture.
Currently, AFAICS, the packed-refs file is never propagated into the
packs, but stays separate for the lifetime of the repo, but then it
seems we're designing these reverse tagmaps for managing a handful of
notes per commit, i.e. to hold a couple of orders of magnitude more
entries than the packed-refs file.
Maybe each pack should keep the reverse tagmap for all the object->note
relationships internal to that pack? Everything else (unpacked notes,
and object->note relationships spanning packs) would be kept in (1).
Of course, when repacking, we'd try to keep objects and their notes
together as much as possible, to maximize the in-pack reverse tagmap,
and minimize the number of entries left behind in (1).
Have fun!
...Johan
--
Johan Herland, <johan@herland.net>
www.herland.net
next prev parent reply other threads:[~2007-05-29 7:06 UTC|newest]
Thread overview: 53+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-05-09 19:20 [RFC] Second parent for reverts Daniel Barkalow
2007-05-09 20:07 ` Johannes Schindelin
2007-05-09 20:22 ` Shawn O. Pearce
2007-05-09 22:26 ` Johan Herland
2007-05-09 21:54 ` Junio C Hamano
2007-05-09 22:16 ` Linus Torvalds
2007-05-10 16:35 ` Linus Torvalds
2007-05-10 18:06 ` Johan Herland
2007-05-10 18:22 ` Linus Torvalds
2007-05-27 14:08 ` [PATCH 00/15] git-note: A mechanisim for providing free-form after-the-fact annotations on commits Johan Herland
2007-05-27 14:09 ` [PATCH 01/15] git-note: Add git-note command for adding/listing/deleting git notes Johan Herland
2007-05-27 14:10 ` [PATCH 02/15] git-note: (Documentation) Add git-note manual page Johan Herland
2007-05-27 14:11 ` [PATCH 03/15] git-note: (Administrivia) Add git-note to Makefile, .gitignore, etc Johan Herland
2007-05-27 14:11 ` [PATCH 04/15] git-note: (Plumbing) Add plumbing-level support for git notes Johan Herland
2007-05-27 14:12 ` [PATCH 05/15] git-note: (Plumbing) Add support for git notes to git-rev-parse and git-show-ref Johan Herland
2007-05-27 14:13 ` [PATCH 06/15] git-note: (Documentation) Explain the new '--notes' option " Johan Herland
2007-05-27 14:13 ` [PATCH 07/15] git-note: (Almost plumbing) Add support for git notes to git-pack-refs and git-fsck Johan Herland
2007-05-27 14:14 ` [PATCH 08/15] git-note: (Decorations) Add note decorations to "git-{log,show,whatchanged} --decorate" Johan Herland
2007-05-27 14:14 ` [PATCH 09/15] git-note: (Documentation) Explain new behaviour of --decorate in git-{log,show,whatchanged} Johan Herland
2007-05-27 14:15 ` [PATCH 10/15] git-note: (Transfer) Teach git-clone how to clone notes Johan Herland
2007-05-27 14:15 ` [PATCH 11/15] git-note: (Transfer) Teach git-fetch to auto-follow notes Johan Herland
2007-05-27 14:15 ` [PATCH 12/15] git-note: (Transfer) Teach git-push to push notes when --all or --notes is given Johan Herland
2007-05-27 14:16 ` [PATCH 13/15] git-note: (Documentation) Explain the new --notes option to git-push Johan Herland
2007-05-27 14:16 ` [PATCH 14/15] git-note: (Tests) Add tests for git-note and associated functionality Johan Herland
2007-05-27 14:17 ` [PATCH 15/15] git-note: Add display of notes to gitk Johan Herland
2007-05-27 20:09 ` [PATCH 00/15] git-note: A mechanisim for providing free-form after-the-fact annotations on commits Junio C Hamano
2007-05-28 0:29 ` Johan Herland
2007-05-28 0:59 ` Jakub Narebski
2007-05-28 4:37 ` Linus Torvalds
2007-05-28 10:54 ` Johan Herland
2007-05-28 16:28 ` Linus Torvalds
2007-05-28 16:40 ` Johan Herland
2007-05-28 16:58 ` Linus Torvalds
2007-05-28 17:48 ` Johan Herland
2007-05-28 20:45 ` Junio C Hamano
2007-05-28 21:35 ` Shawn O. Pearce
2007-05-28 23:37 ` Johannes Schindelin
2007-05-29 3:12 ` Linus Torvalds
2007-05-29 3:22 ` Shawn O. Pearce
2007-05-29 7:04 ` Jakub Narebski
2007-05-29 11:04 ` Andy Parkins
2007-05-29 11:12 ` Johannes Schindelin
2007-05-29 7:06 ` Johan Herland [this message]
2007-05-29 8:22 ` Jeff King
2007-05-29 9:23 ` Johan Herland
2007-05-28 20:45 ` Junio C Hamano
2007-05-28 21:19 ` Shawn O. Pearce
2007-05-28 23:46 ` [PATCH] Add fsck_verify_ref_to_tag_object() to verify that refname matches name stored in tag object Johan Herland
2007-05-28 17:29 ` [PATCH 00/15] git-note: A mechanisim for providing free-form after-the-fact annotations on commits Michael S. Tsirkin
2007-05-28 17:42 ` Michael S. Tsirkin
2007-05-28 17:58 ` Johan Herland
2007-05-10 22:33 ` [RFC] Second parent for reverts Martin Langhoff
2007-05-10 1:43 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=200705290906.29328.johan@herland.net \
--to=johan@herland.net \
--cc=git@vger.kernel.org \
--cc=junkio@cox.net \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.