From: Johan Herland <johan@herland.net>
To: "Shawn O. Pearce" <spearce@spearce.org>
Cc: git@vger.kernel.org, gitster@pobox.com
Subject: Re: [RFC/PATCHv9 01/11] fast-import: Proper notes tree manipulation
Date: Thu, 03 Dec 2009 04:45:55 +0100 [thread overview]
Message-ID: <200912030445.55732.johan@herland.net> (raw)
In-Reply-To: <20091202203953.GE31648@spearce.org>
(Oops. I forgot to answer a couple of your questions...)
On Wednesday 02 December 2009, Shawn O. Pearce wrote:
> Johan Herland <johan@herland.net> wrote:
> > +static unsigned int do_change_note_fanout(
> > + struct tree_entry *orig_root, struct tree_entry *root,
> > + char *hex_sha1, unsigned int hex_sha1_len,
> > + char *fullpath, unsigned int fullpath_len,
> > + unsigned char fanout)
> > +{
> > + struct tree_content *t = root->tree;
> > + struct tree_entry *e, leaf;
> > + unsigned int i, tmp_hex_sha1_len, tmp_fullpath_len, num_notes = 0;
> > + unsigned char sha1[20];
> > + char realpath[60];
> > + int is_note;
> > +
> > + for (i = 0; i < t->entry_count; i++) {
> > + e = t->entries[i];
> > + is_note = (e->versions[1].mode & NOTE_MODE) == NOTE_MODE;
> > + tmp_hex_sha1_len = hex_sha1_len + e->name->str_len;
> > + tmp_fullpath_len = fullpath_len;
> > +
> > + if (tmp_hex_sha1_len <= 40 && e->name->str_len >= 2) {
> > + memcpy(hex_sha1 + hex_sha1_len, e->name->str_dat,
> > + e->name->str_len);
> > + if (tmp_fullpath_len)
> > + fullpath[tmp_fullpath_len++] = '/';
> > + memcpy(fullpath + tmp_fullpath_len, e->name->str_dat,
> > + e->name->str_len);
> > + tmp_fullpath_len += e->name->str_len;
> > + assert(tmp_fullpath_len < 60);
> > + fullpath[tmp_fullpath_len] = '\0';
> > + } else {
> > + assert(!is_note);
> > + continue;
> > + }
>
> Are we rejecting a mixed content-tree here? I thought a notes
> tree was allowed to hold anything, e.g. isn't it ok to put a
> ".gitattributes" file into a notes tree.
>
> I think we'd do better to have at the top of our loop:
>
> if (!is_note && !S_ISDIR(e->versions[1].mode))
> continue;
>
> so that we ignore non-notes and non-subdirectories which might
> contain notes.
AFAICS, the current code does not reject ".gitattributes", but instead
simply ignores it presence (i.e. does not change its fanout). However, it
certainly does some unnecessary work (setting up hex_sha1 and fullpath)
which is ignored in the bottom part of the loop (where it fails both the "if
(is_note)" and the following "else if").
However, while adding a test to verify the handling of non-notes, I came
across an unrelated crasher in the notes.c code. Will send a fix for this
ASAP.
In any case, I think your proposed if-condition is better, and I will
rewrite the code accordingly.
> > @@ -2080,8 +2195,10 @@ static void note_change_n(struct branch *b)
> > typename(type), command_buf.buf);
> > }
> >
> > - tree_content_set(&b->branch_tree, sha1_to_hex(commit_sha1), sha1,
> > - S_IFREG | 0644, NULL);
> > + construct_path_with_fanout(sha1_to_hex(commit_sha1), fanout, path);
> > + b->num_notes += adjust_num_notes(&b->branch_tree, path, sha1);
> > + mode = (is_null_sha1(sha1) ? S_IFREG : NOTE_MODE) | 0644;
> > + tree_content_set(&b->branch_tree, path, sha1, mode, NULL);
>
> I wonder if it wouldn't be better to compute the fan out here on
> each put. That way if an importer drives 2,000,000 notes at once
> to us in a single commit, we don't wind up with a flat 0 fan-out
> tree and trying to perform a linear insert on each one, but instead
> will start to increase the fan out as the number of entries goes up.
>
> Basically, tree_content_set/remove are O(N) operations on N paths
> in the tree, because their structures aren't necessarily sorted.
> IIRC at one point in time I tried to do this with a binary search but
> gave up and just did it unsorted. At least using the fan out here
> would help partition the search space dramatically on large inserts.
This is a really good idea, but it's a bit more complicated than that:
An 'N' command may not only add new notes, but also rewrite existing notes,
and when rewriting an existing note we must take care to _replace_ the old
note, and not merely adding a new note at a different fanout level. The
above code implicitly guarantees this, by calling note_change_n() with the
'old' fanout level (which will cause the new note to overwrite the old note
in-place).
However, as long as we remember to check for (and remove, if found) the note
at the old fanout level, we might still be able to place the new note at the
_current_ fanout level [1], and thus avoid the worst case you describe
above. Hmm. I need to think some more about this...
Have fun! :)
...Johan
[1] This is actually not _completely_ right: If you have several 'N'
commands in the same commit that act on the same note, but happen to do so
in at least two different fanout levels, you will end up with two notes for
the same object (at least until do_change_note_fanout() arbitrarily
overwrites one of them with the other). However, this might be such a far-
fetched scenario, that we don't have to worry about it in practice.
--
Johan Herland, <johan@herland.net>
www.herland.net
next prev parent reply other threads:[~2009-12-03 3:46 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-12-02 2:09 [RFC/PATCHv9 00/11] git notes Johan Herland
2009-12-02 2:09 ` [RFC/PATCHv9 01/11] fast-import: Proper notes tree manipulation Johan Herland
2009-12-02 20:39 ` Shawn O. Pearce
2009-12-02 22:41 ` Johan Herland
2009-12-03 3:45 ` Johan Herland [this message]
2009-12-02 2:09 ` [RFC/PATCHv9 02/11] Rename t9301 to t9350, to make room for more fast-import tests Johan Herland
2009-12-02 2:09 ` [RFC/PATCHv9 03/11] Add more testcases to test fast-import of notes Johan Herland
2009-12-02 2:09 ` [RFC/PATCHv9 04/11] Minor style fixes to notes.c Johan Herland
2009-12-02 2:09 ` [RFC/PATCHv9 05/11] Notes API: get_commit_notes() -> format_note() + remove the commit restriction Johan Herland
2009-12-02 2:09 ` [RFC/PATCHv9 06/11] Notes API: init_notes(): Initialize the notes tree from the given notes ref Johan Herland
2009-12-02 2:09 ` [RFC/PATCHv9 07/11] Notes API: add_note(): Add note objects to the internal notes tree structure Johan Herland
2009-12-02 2:09 ` [RFC/PATCHv9 08/11] Notes API: get_note(): Return the note annotating the given object Johan Herland
2009-12-02 2:09 ` [RFC/PATCHv9 09/11] Notes API: for_each_note(): Traverse the entire notes tree with a callback Johan Herland
2009-12-02 2:09 ` [RFC/PATCHv9 10/11] Notes API: Allow multiple concurrent notes trees with new struct notes_tree Johan Herland
2009-12-02 2:09 ` [RFC/PATCHv9 11/11] Refactor notes concatenation into a flexible interface for combining notes Johan Herland
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=200912030445.55732.johan@herland.net \
--to=johan@herland.net \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=spearce@spearce.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).