* fast-import's notemodify doesn't work the same as git notes @ 2014-12-23 0:06 Mike Hommey 2015-01-05 3:37 ` Johan Herland 0 siblings, 1 reply; 6+ messages in thread From: Mike Hommey @ 2014-12-23 0:06 UTC (permalink / raw) To: git; +Cc: Johan Herland Hi, There are two major differences between adding notes with fast-import and git notes, one of which is a serious problem: - fast-import doesn't want to add notes for non commits, while git notes does. - fast-import and git notes have different, conflicting fanouts: - take e.g. the git repo (there needs to be a lot of commits to start to see the problem) - run the following to create notes for every commit: (echo 'blob'; echo 'mark :1'; echo 'data 0'; echo 'commit refs/notes/foo'; echo 'committer <foo> 0 +0000'; echo 'data 0'; git rev-list --all | awk '{print "N :1", $1}'; echo) | git fast-import - pick a random commit. I'll pick bbcefffcea9789e4a1a2023a1c778e2c07db77a7 as it is current master as of writing. Take the first two characters of that sha1, and look at the ls-tree: git ls-tree refs/notes/foo bb/ You'll see a number of blobs. - Now, remove the note for that commit with git notes: git notes --ref foo remove bbcefffcea9789e4a1a2023a1c778e2c07db77a7 - ls-tree again, you'll now see a number of trees instead of blobs, because git notes will have done a fanout. -> git notes does fanouts for much less items than fast-import does. - Re-add a note for that commit with fast-import: git fast-import <<EOF blob mark :1 data 0 commit refs/notes/foo committer <foo> 0 +0000 data 0 from refs/notes/foo^0 N :1 bbcefffcea9789e4a1a2023a1c778e2c07db77a7 EOF - ls-tree again, and you'll see a number of trees and *one* blob, for bb/cefffcea9789e4a1a2023a1c778e2c07db77a7 - See the thread starting with 20141126004242.GA13915@glandium.org, this type of notes branch make things very slow. - Now, if you take an even bigger repository (as long as there are more than 65536 commits, that's good ; I guess the linux kernel qualifies, I've been checking with a mozilla-central clone), and create exactly 65535 notes with git fast-import, you'll end up with a 1-level tree (2/38). Add one more note, and the entire tree turns into a 2-level tree (2/2/36). git notes would only add a level to the tree containing the added note. git notes's behavior scales better, because think about what happens on the next fanout with fast-import... adding one note would need to create millions of trees. Cheers, Mike ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: fast-import's notemodify doesn't work the same as git notes 2014-12-23 0:06 fast-import's notemodify doesn't work the same as git notes Mike Hommey @ 2015-01-05 3:37 ` Johan Herland 2015-01-05 3:39 ` [RFC/PATCHv0 1/4] fast-import.c:do_change_note_fanout(): Remove unneeded local var 't' Johan Herland 0 siblings, 1 reply; 6+ messages in thread From: Johan Herland @ 2015-01-05 3:37 UTC (permalink / raw) To: Mike Hommey; +Cc: Git mailing list (sorry for the late answer, I've been away from email over the holidays) On Tue, Dec 23, 2014 at 1:06 AM, Mike Hommey <mh@glandium.org> wrote: > Hi, > > There are two major differences between adding notes with fast-import > and git notes, one of which is a serious problem: > > - fast-import doesn't want to add notes for non commits, while git notes > does. True. I consider this a bug in fast-import. A first draft of a patch series to fix that will be posted shortly. > - fast-import and git notes have different, conflicting fanouts: Yes. The relevant code in fast-import and git-notes is largely separate and independent of each other. The main reason for this, IIRC, is that fast-import uses its own data structures and algorithms when building trees, and that makes it hard to reuse the git-notes code (fast-import.c does not even #include notes.h because of this). fast-import is fundamentally built to solve a somewhat different use case than the rest of the git tools, and I believe this is why it "goes it alone" and implements its own (duplicate - to a certain degree) data structures and algorithms: It is heavily geared towards creating a _lot_ of trees in as little time and space as possible. The fast-import documentation states that the packs it generates are sub-optimal, and that you should always repack after fast-import to get more optimal/usable packs. Although not documented, a similar analogy applies to notes trees: fast-import will import a lot of notes quickly and efficiently, but the resulting fanout might not be the same (or as optimal) as what git-notes would produce. IMHO, it is not optimal (at least not currently) to use a workflow that interleaves notes tree manipulation from fast-import and git-notes, like you do below. You risk a lot of fanout churn, since the fanout calculation logic is different between the two. That said, there are clearly also bugs in the current fast-import behavior that we should fix... :) > - take e.g. the git repo (there needs to be a lot of commits to start > to see the problem) > - run the following to create notes for every commit: > (echo 'blob'; > echo 'mark :1'; > echo 'data 0'; > echo 'commit refs/notes/foo'; > echo 'committer <foo> 0 +0000'; > echo 'data 0'; > git rev-list --all | awk '{print "N :1", $1}'; > echo) | git fast-import > - pick a random commit. I'll pick > bbcefffcea9789e4a1a2023a1c778e2c07db77a7 as it is current master as > of writing. Take the first two characters of that sha1, and look at > the ls-tree: > git ls-tree refs/notes/foo bb/ > You'll see a number of blobs. Ok, so fast-import yields a 2/38 fanout for ~50 000 notes (#commits in git.git). fast-import keeps track of the total number of notes in the tree (although there are bugs, see the patch series), and divides that number repeatedly by 256 to find the fanout, so this result is expected, AFAICS > - Now, remove the note for that commit with git notes: > git notes --ref foo remove bbcefffcea9789e4a1a2023a1c778e2c07db77a7 > - ls-tree again, you'll now see a number of trees instead of blobs, > because git notes will have done a fanout. -> git notes does fanouts > for much less items than fast-import does. Maybe. git-notes has a different heuristic that does not keep track of the total number of notes in the tree. This is because git-notes only loads the required parts of the notes tree, so when you remove that single note, only the relevant parts of the notes tree (the bb/* tree) is loaded and manipulated. Then, git-notes looks at the density of note entries in that sub-tree (see determine_fanout() in notes.c for the details) to figure out if the fanout needs to be adjusted (but only in that sub-tree). In this case, it looks like the heuristic triggers a nested fanout within the bb/* tree. However, the other 255 "top-level" trees are not loaded, and hence not "re-balanced" by git-notes. > - Re-add a note for that commit with fast-import: > git fast-import <<EOF > blob > mark :1 > data 0 > commit refs/notes/foo > committer <foo> 0 +0000 > data 0 > from refs/notes/foo^0 > N :1 bbcefffcea9789e4a1a2023a1c778e2c07db77a7 > > EOF > - ls-tree again, and you'll see a number of trees and *one* blob, for > bb/cefffcea9789e4a1a2023a1c778e2c07db77a7 Hmm... Here I'd expect fast-import to rewrite the entire notes tree, not just a single entry. Not sure if this is a symptom of the bug discussed in our previous thread, or if this is a deeper problem. > - See the thread starting with 20141126004242.GA13915@glandium.org, > this type of notes branch make things very slow. I have a suggested fix for this in my upcoming patch series. Please test this with your scenario to see if it makes a difference. > - Now, if you take an even bigger repository (as long as there are more > than 65536 commits, that's good ; I guess the linux kernel > qualifies, I've been checking with a mozilla-central clone), and > create exactly 65535 notes with git fast-import, you'll end up with > a 1-level tree (2/38). Add one more note, and the entire tree turns > into a 2-level tree (2/2/36). git notes would only add a level to > the tree containing the added note. git notes's behavior scales > better, because think about what happens on the next fanout with > fast-import... adding one note would need to create millions of trees. True, this is a good illustration of the difference between the notes code in fast-import and git-notes. It might be possible to change the fast-import code to work more like the git-notes code, but it's been quite a while since I looked closely at this, and I'm not sure it is as easy as it sounds. Have fun! :) ...Johan -- Johan Herland, <johan@herland.net> www.herland.net ^ permalink raw reply [flat|nested] 6+ messages in thread
* [RFC/PATCHv0 1/4] fast-import.c:do_change_note_fanout(): Remove unneeded local var 't' 2015-01-05 3:37 ` Johan Herland @ 2015-01-05 3:39 ` Johan Herland 2015-01-05 3:39 ` [RFC/PATCHv0 2/4] fast-import.c:do_change_note_fanout(): Also apply load_tree() to initial root Johan Herland ` (2 more replies) 0 siblings, 3 replies; 6+ messages in thread From: Johan Herland @ 2015-01-05 3:39 UTC (permalink / raw) To: Mike Hommey; +Cc: Git mailing list, Johan Herland The struct tree_content *t variable was used as a shorthand for the given root->tree, but it was only used in the first two lines of the for loop, so the shorthand itself does not add much value. Furthermore, since the for loop body might end up reallocating root->tree we had to reinitialize t at the end of the for loop body, which is ugly and brittle. Simply use root->tree directly instead, and remote the t shorthand. Signed-off-by: Johan Herland <johan@herland.net> --- fast-import.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/fast-import.c b/fast-import.c index d0bd285..04dfd50 100644 --- a/fast-import.c +++ b/fast-import.c @@ -2153,15 +2153,14 @@ static uintmax_t do_change_note_fanout( char *fullpath, unsigned int fullpath_len, unsigned char fanout) { - struct tree_content *t = root->tree; struct tree_entry *e, leaf; unsigned int i, tmp_hex_sha1_len, tmp_fullpath_len; uintmax_t num_notes = 0; unsigned char sha1[20]; char realpath[60]; - for (i = 0; t && i < t->entry_count; i++) { - e = t->entries[i]; + for (i = 0; root->tree && i < root->tree->entry_count; i++) { + e = root->tree->entries[i]; tmp_hex_sha1_len = hex_sha1_len + e->name->str_len; tmp_fullpath_len = fullpath_len; @@ -2217,9 +2216,6 @@ static uintmax_t do_change_note_fanout( hex_sha1, tmp_hex_sha1_len, fullpath, tmp_fullpath_len, fanout); } - - /* The above may have reallocated the current tree_content */ - t = root->tree; } return num_notes; } -- 2.1.1.392.g062cc5d ^ permalink raw reply related [flat|nested] 6+ messages in thread
* [RFC/PATCHv0 2/4] fast-import.c:do_change_note_fanout(): Also apply load_tree() to initial root 2015-01-05 3:39 ` [RFC/PATCHv0 1/4] fast-import.c:do_change_note_fanout(): Remove unneeded local var 't' Johan Herland @ 2015-01-05 3:39 ` Johan Herland 2015-01-05 3:39 ` [RFC/PATCHv0 3/4] fast-import: Support adding notes to non-commits Johan Herland 2015-01-05 3:39 ` [RFC/PATCHv0 4/4] fast-import.c:note_change_n(): Rename commit_* to target_* Johan Herland 2 siblings, 0 replies; 6+ messages in thread From: Johan Herland @ 2015-01-05 3:39 UTC (permalink / raw) To: Mike Hommey; +Cc: Git mailing list, Johan Herland TODO: tests! do_change_note_fanout() recursively traverses a "struct tree_entry" data structure. Before recursively traversing into a subtree, we make sure to call load_tree() on that subtree. However, for the initial/root struct tree_entry object, we assumed that load_tree() had already been invoked by our caller. This is true, except in the following case: If we load a previously written notes tree using 'filemodify' instead of 'from' [1], then we do not pass the notes ref to load_branch() (which takes care of calling load_tree() on the root tree_entry). When load_tree(root) has not been called before the first 'notemodify' command is parsed, the for loop in do_change_note_fanout() does nothing, and we end up not counting the existing notes, which ends up producing a notes tree with mixed/insufficient fanout. Fix this by always making sure to call load_tree() at the start of do_change_note_fanout(). Since load_tree() is now called before the for loop, we no longer need to call load_tree() on a subtree before recursing into it, the recursive call will take care of its own load_tree(). [1]: Usually, when adding notes to an existing notes tree, one would use a 'from' command like this: from refs/notes/foo^0 However, if one does not want to retain notes history (i.e. one wants the current refs/notes/foo commit to be build without any parents), then one can replace the 'from' command with the following 'filemodify' "hack" to pre-load the existing notes tree without using it as a parent: M 040000 refs/notes/foo^{tree} \n Discovered-by: Mike Hommey <mh@glandium.org> Signed-off-by: Johan Herland <johan@herland.net> --- fast-import.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/fast-import.c b/fast-import.c index 04dfd50..aa7b64e 100644 --- a/fast-import.c +++ b/fast-import.c @@ -2159,6 +2159,9 @@ static uintmax_t do_change_note_fanout( unsigned char sha1[20]; char realpath[60]; + if (!root->tree) + load_tree(root); + for (i = 0; root->tree && i < root->tree->entry_count; i++) { e = root->tree->entries[i]; tmp_hex_sha1_len = hex_sha1_len + e->name->str_len; @@ -2210,8 +2213,6 @@ static uintmax_t do_change_note_fanout( leaf.tree); } else if (S_ISDIR(e->versions[1].mode)) { /* This is a subdir that may contain note entries */ - if (!e->tree) - load_tree(e); num_notes += do_change_note_fanout(orig_root, e, hex_sha1, tmp_hex_sha1_len, fullpath, tmp_fullpath_len, fanout); -- 2.1.1.392.g062cc5d ^ permalink raw reply related [flat|nested] 6+ messages in thread
* [RFC/PATCHv0 3/4] fast-import: Support adding notes to non-commits 2015-01-05 3:39 ` [RFC/PATCHv0 1/4] fast-import.c:do_change_note_fanout(): Remove unneeded local var 't' Johan Herland 2015-01-05 3:39 ` [RFC/PATCHv0 2/4] fast-import.c:do_change_note_fanout(): Also apply load_tree() to initial root Johan Herland @ 2015-01-05 3:39 ` Johan Herland 2015-01-05 3:39 ` [RFC/PATCHv0 4/4] fast-import.c:note_change_n(): Rename commit_* to target_* Johan Herland 2 siblings, 0 replies; 6+ messages in thread From: Johan Herland @ 2015-01-05 3:39 UTC (permalink / raw) To: Mike Hommey; +Cc: Git mailing list, Johan Herland "git notes" allows adding notes to non-commit objects, but fast-import restricts the 'notemodify' command to only add notes for commit objects. Remove that silly restriction from fast-import. Suggested-by: Mike Hommey <mh@glandium.org> Signed-off-by: Johan Herland <johan@herland.net> --- fast-import.c | 12 ++++------ t/t9301-fast-import-notes.sh | 56 ++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 60 insertions(+), 8 deletions(-) diff --git a/fast-import.c b/fast-import.c index aa7b64e..b6df00b 100644 --- a/fast-import.c +++ b/fast-import.c @@ -2498,16 +2498,12 @@ static void note_change_n(const char *p, struct branch *b, unsigned char *old_fa } else if (*p == ':') { uintmax_t commit_mark = parse_mark_ref_eol(p); struct object_entry *commit_oe = find_mark(commit_mark); - if (commit_oe->type != OBJ_COMMIT) - die("Mark :%" PRIuMAX " not a commit", commit_mark); hashcpy(commit_sha1, commit_oe->idx.sha1); } else if (!get_sha1(p, commit_sha1)) { - unsigned long size; - char *buf = read_object_with_reference(commit_sha1, - commit_type, &size, commit_sha1); - if (!buf || size < 46) - die("Not a valid commit: %s", p); - free(buf); + if (!find_object(commit_sha1)) { + if (sha1_object_info(commit_sha1, NULL) < 0) + die("Not a valid object: %s", p); + } } else die("Invalid ref name or SHA1 expression: %s", p); diff --git a/t/t9301-fast-import-notes.sh b/t/t9301-fast-import-notes.sh index 83acf68..b8ff673 100755 --- a/t/t9301-fast-import-notes.sh +++ b/t/t9301-fast-import-notes.sh @@ -679,4 +679,60 @@ test_expect_success "add notes to $num_commits commits in each of $num_notes_ref ' +tree4=$(git rev-parse refs/heads/master^{tree}) +blob4=$(git rev-parse refs/heads/master:bar) + +test_tick +cat >input <<INPUT_END +commit refs/notes/noncommits +committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE +data <<COMMIT +adding notes to non-commits +COMMIT + +N inline $tree4 +data <<EOF +note for tree object +EOF + +N inline $blob4 +data <<EOF +note for blob object +EOF + +INPUT_END + +test_expect_success 'add notes to non-commits' " + + git fast-import <input && + echo 'note for tree object' >expect && + git notes --ref noncommits show $tree4 >actual && + test_cmp expect actual && + echo 'note for blob object' >expect && + git notes --ref noncommits show $blob4 >actual && + test_cmp expect actual + +" + +test_tick +cat >input <<INPUT_END +commit refs/notes/nonobjects +committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE +data <<COMMIT +adding notes to non-objects MUST FAIL +COMMIT + +N inline deadbeefdeadbeefdeadbeefdeadbeefdeadbeef +data <<EOF +note for missing object +EOF + +INPUT_END + +test_expect_success 'cannot add notes to non-objects' " + + test_must_fail git fast-import <input + +" + test_done -- 2.1.1.392.g062cc5d ^ permalink raw reply related [flat|nested] 6+ messages in thread
* [RFC/PATCHv0 4/4] fast-import.c:note_change_n(): Rename commit_* to target_* 2015-01-05 3:39 ` [RFC/PATCHv0 1/4] fast-import.c:do_change_note_fanout(): Remove unneeded local var 't' Johan Herland 2015-01-05 3:39 ` [RFC/PATCHv0 2/4] fast-import.c:do_change_note_fanout(): Also apply load_tree() to initial root Johan Herland 2015-01-05 3:39 ` [RFC/PATCHv0 3/4] fast-import: Support adding notes to non-commits Johan Herland @ 2015-01-05 3:39 ` Johan Herland 2 siblings, 0 replies; 6+ messages in thread From: Johan Herland @ 2015-01-05 3:39 UTC (permalink / raw) To: Mike Hommey; +Cc: Git mailing list, Johan Herland Since we now allow adding notes to any kind of object (not just commits), rename the related variables from "commit_*" to "target_*". TODO: Also replace "commit-ish" with something more suitable in the documentation. However, the same needs to be done for the 'tag' command, and we also the BNF at the top of fast-import.c needs corresponding updates. Signed-off-by: Johan Herland <johan@herland.net> --- fast-import.c | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/fast-import.c b/fast-import.c index b6df00b..4ef7a95 100644 --- a/fast-import.c +++ b/fast-import.c @@ -2451,7 +2451,7 @@ static void note_change_n(const char *p, struct branch *b, unsigned char *old_fa static struct strbuf uq = STRBUF_INIT; struct object_entry *oe; struct branch *s; - unsigned char sha1[20], commit_sha1[20]; + unsigned char sha1[20], target_sha1[20]; char path[60]; uint16_t inline_data = 0; unsigned char new_fanout; @@ -2489,19 +2489,19 @@ static void note_change_n(const char *p, struct branch *b, unsigned char *old_fa die("Missing space after SHA1: %s", command_buf.buf); } - /* <commit-ish> */ + /* <object> */ s = lookup_branch(p); if (s) { if (is_null_sha1(s->sha1)) die("Can't add a note on empty branch."); - hashcpy(commit_sha1, s->sha1); + hashcpy(target_sha1, s->sha1); } else if (*p == ':') { - uintmax_t commit_mark = parse_mark_ref_eol(p); - struct object_entry *commit_oe = find_mark(commit_mark); - hashcpy(commit_sha1, commit_oe->idx.sha1); - } else if (!get_sha1(p, commit_sha1)) { - if (!find_object(commit_sha1)) { - if (sha1_object_info(commit_sha1, NULL) < 0) + uintmax_t target_mark = parse_mark_ref_eol(p); + struct object_entry *target_oe = find_mark(target_mark); + hashcpy(target_sha1, target_oe->idx.sha1); + } else if (!get_sha1(p, target_sha1)) { + if (!find_object(target_sha1)) { + if (sha1_object_info(target_sha1, NULL) < 0) die("Not a valid object: %s", p); } } else @@ -2527,7 +2527,7 @@ static void note_change_n(const char *p, struct branch *b, unsigned char *old_fa typename(type), command_buf.buf); } - construct_path_with_fanout(sha1_to_hex(commit_sha1), *old_fanout, path); + construct_path_with_fanout(sha1_to_hex(target_sha1), *old_fanout, path); if (tree_content_remove(&b->branch_tree, path, NULL, 0)) b->num_notes--; @@ -2536,7 +2536,7 @@ static void note_change_n(const char *p, struct branch *b, unsigned char *old_fa b->num_notes++; new_fanout = convert_num_notes_to_fanout(b->num_notes); - construct_path_with_fanout(sha1_to_hex(commit_sha1), new_fanout, path); + construct_path_with_fanout(sha1_to_hex(target_sha1), new_fanout, path); tree_content_set(&b->branch_tree, path, sha1, S_IFREG | 0644, NULL); } -- 2.1.1.392.g062cc5d ^ permalink raw reply related [flat|nested] 6+ messages in thread
end of thread, other threads:[~2015-01-05 4:01 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-12-23 0:06 fast-import's notemodify doesn't work the same as git notes Mike Hommey 2015-01-05 3:37 ` Johan Herland 2015-01-05 3:39 ` [RFC/PATCHv0 1/4] fast-import.c:do_change_note_fanout(): Remove unneeded local var 't' Johan Herland 2015-01-05 3:39 ` [RFC/PATCHv0 2/4] fast-import.c:do_change_note_fanout(): Also apply load_tree() to initial root Johan Herland 2015-01-05 3:39 ` [RFC/PATCHv0 3/4] fast-import: Support adding notes to non-commits Johan Herland 2015-01-05 3:39 ` [RFC/PATCHv0 4/4] fast-import.c:note_change_n(): Rename commit_* to target_* Johan Herland
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).