From: Johan Herland <johan@herland.net>
To: "Shawn O. Pearce" <spearce@spearce.org>
Cc: git@vger.kernel.org, gitster@pobox.com
Subject: Re: [RFC/PATCHv10 01/11] fast-import: Proper notes tree manipulation
Date: Tue, 08 Dec 2009 02:44:30 +0100 [thread overview]
Message-ID: <200912080244.30390.johan@herland.net> (raw)
In-Reply-To: <20091207164130.GD17173@spearce.org>
On Monday 07 December 2009, Shawn O. Pearce wrote:
> Johan Herland <johan@herland.net> wrote:
> > +static uintmax_t do_change_note_fanout(
> > + struct tree_entry *orig_root, struct tree_entry *root,
> > + char *hex_sha1, unsigned int hex_sha1_len,
> > + char *fullpath, unsigned int fullpath_len,
> > + unsigned char fanout)
>
> I think this function winds up processing all notes twice. Yuck.
>
> tree_content_set() adds a new tree entry to the end of the current
> tree. So when converting "1a9029b006484e8b9aca06ff261beb2324bb9916"
> into "1a" (to go from fanout 0 to fanout 1) we'll place 1a at the
> end of orig_root, and this function will walk 1a/ recursively,
> examining 1a9029b006484e8b9aca06ff261beb2324bb9916 all over again.
Yep, you're right. Still, we only do the tree_content_remove()/set() once
per note, so although performance is probably not abysmal, we are still
clearly suboptimal.
Also, keep in mind that change_note_fanout() is only called when the number
of notes crosses a power of 256. Thus for typical notes trees (which are
assumed to mostly accumulate notes over their lifetime),
change_note_fanout() will be called zero, one or two times (depending on the
final number of notes).
> If we're here, isn't it likely that *all* notes are in the wrong
> path in the tree, and we need to move them all to a new location?
> If that's true then should we instead just build an entirely new
> tree and swap the root when we are done?
Hmm. Not always. In your earlier scenario where we add 2,000,000 notes in a
single commit, the current code would need to rewrite 255 of them from
fanout 0 to fanout 2, and 65,535 of them from fanout 1 to fanout 2. But the
vast majority (1,934,465) would not require rewriting (having been added at
the correct fanout initially). However, if we build a new tree (by which I
assume you mean tree_content_remove() from the old tree and
tree_content_set() to the new tree for every single note (and non-note)), we
end up processing all 2,000,000 entries.
> As we empty out a tree the object will be recycled into a pool of
> trees which can be reused at a later point. It might actually make
> sense to build the new tree under a different root. We won't scan
> entries we've moved, and the memory difference should be fairly
> small as tree_content_remove() will make a subtree available for
> reuse as soon as its empty. So we're only dealing with a handful
> of additional tree objects as we do the conversion.
I'm not sure I get the details here. How can we avoid doing the
_remove()/_set() from/to the old/new tree for every tree_entry? In other
words, how do we avoid removing and re-setting the 2,000,000 notes in the
above example?
Thanks for the review!
...Johan
--
Johan Herland, <johan@herland.net>
www.herland.net
next prev parent reply other threads:[~2009-12-08 1:44 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-12-07 11:27 [RFC/PATCHv10 00/11] git notes Johan Herland
2009-12-07 11:27 ` [RFC/PATCHv10 01/11] fast-import: Proper notes tree manipulation Johan Herland
2009-12-07 16:41 ` Shawn O. Pearce
2009-12-08 1:44 ` Johan Herland [this message]
2009-12-08 2:01 ` Shawn O. Pearce
2009-12-08 2:45 ` Johan Herland
2009-12-10 9:39 ` Johan Herland
2009-12-10 14:03 ` Shawn O. Pearce
2009-12-10 14:40 ` Johan Herland
2009-12-11 3:00 ` Junio C Hamano
2009-12-07 16:43 ` Shawn O. Pearce
2009-12-08 1:55 ` Johan Herland
2009-12-08 1:59 ` Shawn O. Pearce
2009-12-07 20:42 ` Junio C Hamano
2009-12-08 2:34 ` Johan Herland
2009-12-07 11:27 ` [RFC/PATCHv10 02/11] Rename t9301 to t9350, to make room for more fast-import tests Johan Herland
2009-12-07 11:27 ` [RFC/PATCHv10 03/11] Add more testcases to test fast-import of notes Johan Herland
2009-12-07 11:27 ` [RFC/PATCHv10 04/11] Minor style fixes to notes.c Johan Herland
2009-12-07 11:27 ` [RFC/PATCHv10 05/11] Notes API: get_commit_notes() -> format_note() + remove the commit restriction Johan Herland
2009-12-07 11:27 ` [RFC/PATCHv10 06/11] Notes API: init_notes(): Initialize the notes tree from the given notes ref Johan Herland
2009-12-07 11:27 ` [RFC/PATCHv10 07/11] Notes API: add_note(): Add note objects to the internal notes tree structure Johan Herland
2009-12-07 11:27 ` [RFC/PATCHv10 08/11] Notes API: get_note(): Return the note annotating the given object Johan Herland
2009-12-07 20:52 ` Junio C Hamano
2009-12-08 3:18 ` Johan Herland
2009-12-07 11:27 ` [RFC/PATCHv10 09/11] Notes API: for_each_note(): Traverse the entire notes tree with a callback Johan Herland
2009-12-07 11:27 ` [RFC/PATCHv10 10/11] Notes API: Allow multiple concurrent notes trees with new struct notes_tree Johan Herland
2009-12-07 11:27 ` [RFC/PATCHv10 11/11] Refactor notes concatenation into a flexible interface for combining notes Johan Herland
2009-12-08 9:25 ` [RFC/PATCHv10 00/11] git notes Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=200912080244.30390.johan@herland.net \
--to=johan@herland.net \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=spearce@spearce.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.