All of lore.kernel.org
 help / color / mirror / Atom feed
From: Johan Herland <johan@herland.net>
To: "Shawn O. Pearce" <spearce@spearce.org>
Cc: git@vger.kernel.org, gitster@pobox.com
Subject: Re: [RFC/PATCHv10 01/11] fast-import: Proper notes tree manipulation
Date: Tue, 08 Dec 2009 02:44:30 +0100	[thread overview]
Message-ID: <200912080244.30390.johan@herland.net> (raw)
In-Reply-To: <20091207164130.GD17173@spearce.org>

On Monday 07 December 2009, Shawn O. Pearce wrote:
> Johan Herland <johan@herland.net> wrote:
> > +static uintmax_t do_change_note_fanout(
> > +		struct tree_entry *orig_root, struct tree_entry *root,
> > +		char *hex_sha1, unsigned int hex_sha1_len,
> > +		char *fullpath, unsigned int fullpath_len,
> > +		unsigned char fanout)
> 
> I think this function winds up processing all notes twice.  Yuck.
> 
> tree_content_set() adds a new tree entry to the end of the current
> tree.  So when converting "1a9029b006484e8b9aca06ff261beb2324bb9916"
> into "1a" (to go from fanout 0 to fanout 1) we'll place 1a at the
> end of orig_root, and this function will walk 1a/ recursively,
> examining 1a9029b006484e8b9aca06ff261beb2324bb9916 all over again.

Yep, you're right. Still, we only do the tree_content_remove()/set() once 
per note, so although performance is probably not abysmal, we are still 
clearly suboptimal.

Also, keep in mind that change_note_fanout() is only called when the number 
of notes crosses a power of 256. Thus for typical notes trees (which are 
assumed to mostly accumulate notes over their lifetime), 
change_note_fanout() will be called zero, one or two times (depending on the 
final number of notes).

> If we're here, isn't it likely that *all* notes are in the wrong
> path in the tree, and we need to move them all to a new location?
> If that's true then should we instead just build an entirely new
> tree and swap the root when we are done?

Hmm. Not always. In your earlier scenario where we add 2,000,000 notes in a 
single commit, the current code would need to rewrite 255 of them from 
fanout 0 to fanout 2, and 65,535 of them from fanout 1 to fanout 2. But the 
vast majority (1,934,465) would not require rewriting (having been added at 
the correct fanout initially). However, if we build a new tree (by which I 
assume you mean tree_content_remove() from the old tree and 
tree_content_set() to the new tree for every single note (and non-note)), we 
end up processing all 2,000,000 entries.

> As we empty out a tree the object will be recycled into a pool of
> trees which can be reused at a later point.  It might actually make
> sense to build the new tree under a different root.  We won't scan
> entries we've moved, and the memory difference should be fairly
> small as tree_content_remove() will make a subtree available for
> reuse as soon as its empty.  So we're only dealing with a handful
> of additional tree objects as we do the conversion.

I'm not sure I get the details here. How can we avoid doing the 
_remove()/_set() from/to the old/new tree for every tree_entry? In other 
words, how do we avoid removing and re-setting the 2,000,000 notes in the 
above example?


Thanks for the review!

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

  reply	other threads:[~2009-12-08  1:44 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-12-07 11:27 [RFC/PATCHv10 00/11] git notes Johan Herland
2009-12-07 11:27 ` [RFC/PATCHv10 01/11] fast-import: Proper notes tree manipulation Johan Herland
2009-12-07 16:41   ` Shawn O. Pearce
2009-12-08  1:44     ` Johan Herland [this message]
2009-12-08  2:01       ` Shawn O. Pearce
2009-12-08  2:45         ` Johan Herland
2009-12-10  9:39           ` Johan Herland
2009-12-10 14:03             ` Shawn O. Pearce
2009-12-10 14:40               ` Johan Herland
2009-12-11  3:00                 ` Junio C Hamano
2009-12-07 16:43   ` Shawn O. Pearce
2009-12-08  1:55     ` Johan Herland
2009-12-08  1:59       ` Shawn O. Pearce
2009-12-07 20:42   ` Junio C Hamano
2009-12-08  2:34     ` Johan Herland
2009-12-07 11:27 ` [RFC/PATCHv10 02/11] Rename t9301 to t9350, to make room for more fast-import tests Johan Herland
2009-12-07 11:27 ` [RFC/PATCHv10 03/11] Add more testcases to test fast-import of notes Johan Herland
2009-12-07 11:27 ` [RFC/PATCHv10 04/11] Minor style fixes to notes.c Johan Herland
2009-12-07 11:27 ` [RFC/PATCHv10 05/11] Notes API: get_commit_notes() -> format_note() + remove the commit restriction Johan Herland
2009-12-07 11:27 ` [RFC/PATCHv10 06/11] Notes API: init_notes(): Initialize the notes tree from the given notes ref Johan Herland
2009-12-07 11:27 ` [RFC/PATCHv10 07/11] Notes API: add_note(): Add note objects to the internal notes tree structure Johan Herland
2009-12-07 11:27 ` [RFC/PATCHv10 08/11] Notes API: get_note(): Return the note annotating the given object Johan Herland
2009-12-07 20:52   ` Junio C Hamano
2009-12-08  3:18     ` Johan Herland
2009-12-07 11:27 ` [RFC/PATCHv10 09/11] Notes API: for_each_note(): Traverse the entire notes tree with a callback Johan Herland
2009-12-07 11:27 ` [RFC/PATCHv10 10/11] Notes API: Allow multiple concurrent notes trees with new struct notes_tree Johan Herland
2009-12-07 11:27 ` [RFC/PATCHv10 11/11] Refactor notes concatenation into a flexible interface for combining notes Johan Herland
2009-12-08  9:25 ` [RFC/PATCHv10 00/11] git notes Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200912080244.30390.johan@herland.net \
    --to=johan@herland.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=spearce@spearce.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.