* [RFC PATCH v4 0/4] Detection of directory renames @ 2010-10-03 20:42 Yann Dirson 2010-10-03 20:42 ` [PATCH v4 1/4] Introduce wholesame directory move detection in diffcore Yann Dirson ` (2 more replies) 0 siblings, 3 replies; 25+ messages in thread From: Yann Dirson @ 2010-10-03 20:42 UTC (permalink / raw) To: git; +Cc: Yann Dirson At last a reroll of that old series. The most prominent change is a split of the old --factorize-renames flag into --detect-dir-renames and --hide-dir-rename-details - hopefully following Junio's comments dated nearly 2 years ago. While doing this, a couple of testcases got added (as well as a list of more testcases to be written), and some minor optimizations were done, but no heavy surgery. I hope we can turn the simple detection feature to something acceptable for inclusion - that is, once unified diff output shows the detected renames as annotations, and once the known major holes are plugged (both from FIXME and testcases). For the "hiding" one OTOH, a more precise description of what we want must probably be done before progressing, but I suggest we keep that for later. Also, maybe the --detect-dir-renames would get a good alias as "-M -M". Does all of that look reasonable ? Yann Dirson (4): Introduce wholesame directory move detection in diffcore. Add testcases for the --detect-dir-renames diffcore flag. Allow hiding renames of individual files involved in a directory rename. Add testcases for the --hide-dir-rename-details diffcore flag. diff-lib.c | 6 +- diff.c | 12 + diff.h | 6 + diffcore-rename.c | 316 +++++++++++++++++++++++++++- diffcore.h | 1 + t/t4046-diff-rename-factorize.sh | 433 ++++++++++++++++++++++++++++++++++++++ tree-diff.c | 4 +- 7 files changed, 766 insertions(+), 12 deletions(-) create mode 100755 t/t4046-diff-rename-factorize.sh -- 1.7.2.3 ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH v4 1/4] Introduce wholesame directory move detection in diffcore. 2010-10-03 20:42 [RFC PATCH v4 0/4] Detection of directory renames Yann Dirson @ 2010-10-03 20:42 ` Yann Dirson 2010-10-03 20:42 ` [PATCH v4 2/4] Add testcases for the --detect-dir-renames diffcore flag Yann Dirson ` (3 more replies) 2010-10-04 6:20 ` [RFC PATCH v4 0/4] Detection of directory renames Jonathan Nieder 2010-10-05 1:42 ` Jonathan Nieder 2 siblings, 4 replies; 25+ messages in thread From: Yann Dirson @ 2010-10-03 20:42 UTC (permalink / raw) To: git; +Cc: Yann Dirson This feature tries to group together files moving from and to identical directories - the most common case being directory renames. It is activated by the new --detect-dir-renames diffcore flag. This is only the first step, adding the basic functionnality and adding support to raw diff output (and it breaks unified-diff output which does not know how to handle that stuff yet). Even the output format may not be kept as is. For now both the result of "mv a b" and "mv a/* b/" are displayed as "Rnnn a/ b/", which is probably not what we want. "Rnnn a/* b/" could be a good choice for the latter if we want them to be distinguished, and even if we want them to look the same. Other future developements to be made on top of this include: * extension of unified-diff format to express this * application of such new diffs * teach git-svn (and others ?) to make use of that flag * merge correctly in case of addition into a moved dir * detect "directory splits" so merge can flag a conflict on file adds * use inexact dir renames to bump score of below-threshold renames from/to same locations * add yours here --- diff-lib.c | 6 +- diff.c | 5 + diff.h | 3 + diffcore-rename.c | 253 +++++++++++++++++++++++++++++++++++++++++++++++++++-- tree-diff.c | 4 +- 5 files changed, 260 insertions(+), 11 deletions(-) diff --git a/diff-lib.c b/diff-lib.c index 392ce2b..f95a672 100644 --- a/diff-lib.c +++ b/diff-lib.c @@ -208,7 +208,8 @@ int run_diff_files(struct rev_info *revs, unsigned int option) ce_option, &dirty_submodule); if (!changed && !dirty_submodule) { ce_mark_uptodate(ce); - if (!DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER)) + if (!DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER) && + !DIFF_OPT_TST(&revs->diffopt, DETECT_DIR_RENAMES)) continue; } oldmode = ce->ce_mode; @@ -338,7 +339,8 @@ static int show_modified(struct rev_info *revs, oldmode = old->ce_mode; if (mode == oldmode && !hashcmp(sha1, old->sha1) && !dirty_submodule && - !DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER)) + !DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER) && + !DIFF_OPT_TST(&revs->diffopt, DETECT_DIR_RENAMES)) return 0; diff_change(&revs->diffopt, oldmode, mode, diff --git a/diff.c b/diff.c index 71efa8e..47ef7ea 100644 --- a/diff.c +++ b/diff.c @@ -3188,6 +3188,11 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac) DIFF_OPT_SET(options, REVERSE_DIFF); else if (!strcmp(arg, "--find-copies-harder")) DIFF_OPT_SET(options, FIND_COPIES_HARDER); + else if (!strcmp(arg, "--detect-dir-renames")) { + DIFF_OPT_SET(options, DETECT_DIR_RENAMES); + if (!options->detect_rename) + options->detect_rename = DIFF_DETECT_RENAME; + } else if (!strcmp(arg, "--follow")) DIFF_OPT_SET(options, FOLLOW_RENAMES); else if (!strcmp(arg, "--color")) diff --git a/diff.h b/diff.h index 1fd44f5..40c548d 100644 --- a/diff.h +++ b/diff.h @@ -78,6 +78,7 @@ typedef struct strbuf *(*diff_prefix_fn_t)(struct diff_options *opt, void *data) #define DIFF_OPT_IGNORE_UNTRACKED_IN_SUBMODULES (1 << 25) #define DIFF_OPT_IGNORE_DIRTY_SUBMODULES (1 << 26) #define DIFF_OPT_OVERRIDE_SUBMODULE_CONFIG (1 << 27) +#define DIFF_OPT_DETECT_DIR_RENAMES (1 << 28) #define DIFF_OPT_TST(opts, flag) ((opts)->flags & DIFF_OPT_##flag) #define DIFF_OPT_SET(opts, flag) ((opts)->flags |= DIFF_OPT_##flag) @@ -265,6 +266,8 @@ extern void diffcore_fix_diff_index(struct diff_options *); " -C detect copies.\n" \ " --find-copies-harder\n" \ " try unchanged files as candidate for copy detection.\n" \ +" --detect-dir-renames\n" \ +" detect wholesale directory renames.\n" \ " -l<n> limit rename attempts up to <n> paths.\n" \ " -O<file> reorder diffs according to the <file>.\n" \ " -S<string> find filepair whose only one side contains the string.\n" \ diff --git a/diffcore-rename.c b/diffcore-rename.c index df41be5..06a8b6c 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -11,6 +11,7 @@ static struct diff_rename_dst { struct diff_filespec *two; struct diff_filepair *pair; + int i_am_not_single:1; // does not look for a match, only here to be looked at } *rename_dst; static int rename_dst_nr, rename_dst_alloc; @@ -49,9 +50,36 @@ static struct diff_rename_dst *locate_rename_dst(struct diff_filespec *two, rename_dst[first].two = alloc_filespec(two->path); fill_filespec(rename_dst[first].two, two->sha1, two->mode); rename_dst[first].pair = NULL; + rename_dst[first].i_am_not_single = 0; return &(rename_dst[first]); } +static struct diff_rename_dst *locate_rename_dst_dir(struct diff_filespec *dir) +{ + /* code mostly duplicated from locate_rename_dst - not sure we + * could merge them efficiently,though + */ + int first, last; + int prefixlength = strlen(dir->path); + + first = 0; + last = rename_dst_nr; + while (last > first) { + int next = (last + first) >> 1; + struct diff_rename_dst *dst = &(rename_dst[next]); + int cmp = strncmp(dir->path, dst->two->path, prefixlength); + if (!cmp) + return dst; + if (cmp < 0) { + last = next; + continue; + } + first = next+1; + } + /* not found */ + return NULL; +} + /* Table of rename/copy src files */ static struct diff_rename_src { struct diff_filespec *one; @@ -386,8 +414,11 @@ static int find_exact_renames(void) for (i = 0; i < rename_src_nr; i++) insert_file_table(&file_table, -1, i, rename_src[i].one); - for (i = 0; i < rename_dst_nr; i++) + for (i = 0; i < rename_dst_nr; i++) { + if (rename_dst[i].i_am_not_single) + continue; insert_file_table(&file_table, 1, i, rename_dst[i].two); + } /* Find the renames */ i = for_each_hash(&file_table, find_same_files); @@ -414,6 +445,180 @@ static void record_if_better(struct diff_score m[], struct diff_score *o) m[worst] = *o; } +struct diff_dir_rename { + struct diff_filespec *one; + struct diff_filespec *two; + int discarded; + struct diff_dir_rename* next; +}; + +// FIXME: prevent possible overflow +/* + * Copy dirname of src into dst, suitable to append a filename without + * an additional "/". + * Only handles relative paths since there is no absolute path in a git repo. + * Writes "" when there is no "/" in src. + * May overwrite more chars than really needed, if src ends with a "/". + */ +static const char* copy_dirname(char* dst, const char* src) +{ + char* lastslash = strrchr(src, '/'); + if (!lastslash) { + *dst = '\0'; + return dst; + } + strncpy(dst, src, lastslash - src + 1); + dst[lastslash - src + 1] = '\0'; + + // if src ends with a "/" strip the last component + if (lastslash[1] == '\0') { + lastslash = strrchr(dst, '/'); + if (!lastslash) + return strcpy(dst, "."); + lastslash[1] = '\0'; + } + + return dst; +} + +/* + * Take all file renames recorded so far and check if they could cause + * a bulk-rename to be detected. + * + * + * FIXME: we could optimize the 100%-rename case by preventing + * recursion to unfold what we know we would refold here. + * FIXME: do we want to replace linked list with sorted array ? + * FIXME: this prototype only properly handles renaming of dirs without + * a subdir. + * FIXME: leaks like hell. + * FIXME: ideas to evaluate a similarity score, anyone ? + * 10% * tree similarity + 90% * moved files similarity ? + * FIXME: ideas to consider under-threshold moves as part of bulk move ? + */ +static struct diff_dir_rename* factorization_candidates = NULL; +static void diffcore_factorize_renames(void) +{ + char one_parent_path[PATH_MAX], two_parent_path[PATH_MAX]; + int i; + + for (i = 0; i < rename_dst_nr; i++) { + struct diff_dir_rename* seen; + + // FIXME: what are those ? + if (!rename_dst[i].pair) + continue; + // dummy renames used by copy detection + if (!strcmp(rename_dst[i].pair->one->path, rename_dst[i].pair->two->path)) + continue; + + copy_dirname(one_parent_path, rename_dst[i].pair->one->path); + copy_dirname(two_parent_path, rename_dst[i].pair->two->path); + + // simple rename with no directory change + if (!strcmp(one_parent_path, two_parent_path)) + continue; + + struct diff_filespec* one_parent = alloc_filespec(one_parent_path); + fill_filespec(one_parent, null_sha1 /*FIXME*/, S_IFDIR); + + // After this commit, are there any files still under one_parent_path ? + struct diff_rename_dst* one_leftover = locate_rename_dst_dir(one_parent); + if (one_leftover) { // FIXME: should only be run if !seen + /* this might be a dir split, or files added + * after the bulk rename, or just an isolated + * rename */ + /* FIXME: should we cache this information to avoid + * recomputing that result when many files trigger it ? + * See eg. git.git:761e742d692. + */ + int two_idx, j, onep_len, maybe_dir_rename; + + // try to see if it is a file added after the bulk rename + two_idx = one_leftover - rename_dst; + onep_len = strlen(one_parent_path); + maybe_dir_rename = 1; + + // check no leftover file was already here before + for (j = two_idx; j < rename_dst_nr; j++) { + if (strncmp(rename_dst[j].two->path, + one_parent_path, onep_len)) + break; // exhausted directory in this direction + fprintf(stderr, "[DBG] leftover file %s in %s\n", + rename_dst[j].two->path, one_parent_path); + if (rename_dst[j].i_am_not_single || // those were already here + (rename_dst[j].pair && + !strncmp(rename_dst[j].pair->one->path, + one_parent_path, onep_len) && // renamed from here + strncmp(rename_dst[j].two->path, + one_parent_path, onep_len))) { // not to a subdir + maybe_dir_rename = 0; + fprintf(stderr, "[DBG] ... tells not a bulk rename\n"); + break; + } + fprintf(stderr, "[DBG] ... not believed to prevent bulk rename\n"); + } + if (!maybe_dir_rename) continue; + // try the other direction (dup code) + for (j = two_idx-1; j >= 0; j--) { + if (strncmp(rename_dst[j].two->path, + one_parent_path, onep_len)) + break; // exhausted directory in this direction + fprintf(stderr, "[DBG] leftover file %s in %s\n", + rename_dst[j].two->path, one_parent_path); + if (rename_dst[j].i_am_not_single || // those were already here + (rename_dst[j].pair && + !strncmp(rename_dst[j].pair->one->path, + one_parent_path, onep_len) && // renamed from here + strncmp(rename_dst[j].two->path, + one_parent_path, onep_len))) { // not to a subdir + maybe_dir_rename = 0; + fprintf (stderr, "[DBG] ... tells not a bulk rename\n"); + break; + } + fprintf(stderr, "[DBG] ... not believed to prevent bulk rename\n"); + } + if (!maybe_dir_rename) continue; + + // here we are in the case where a directory + // content was completely moved, but files + // were added to it afterwards. Proceed as + // for a simple bulk move. + } + + // already considered ? + for (seen=factorization_candidates; seen; seen = seen->next) + if (!strcmp(seen->one->path, one_parent_path)) break; + if (!seen) { + // record potential dir rename + seen = xmalloc(sizeof(*seen)); + seen->one = one_parent; + seen->two = alloc_filespec(two_parent_path); + fill_filespec(seen->two, null_sha1 /*FIXME*/, S_IFDIR); + seen->discarded = 0; + seen->next = factorization_candidates; + factorization_candidates = seen; + fprintf(stderr, "[DBG] %s -> %s suggests possible rename from %s to %s\n", + rename_dst[i].pair->one->path, + rename_dst[i].pair->two->path, + one_parent_path, two_parent_path); + continue; + } + if (seen->discarded) + continue; + // check that seen entry matches this rename + if (strcmp(two_parent_path, seen->two->path)) { + fprintf(stderr, "[DBG] discarding dir %s from renames (split into %s and %s)\n", + one_parent_path, two_parent_path, seen->two->path); + seen->discarded = 1; + } + + /* all checks ok, we keep that entry */ + } + + return; +} + void diffcore_rename(struct diff_options *options) { int detect_rename = options->detect_rename; @@ -451,13 +656,22 @@ void diffcore_rename(struct diff_options *options) p->one->rename_used++; register_rename_src(p->one, p->score); } - else if (detect_rename == DIFF_DETECT_COPY) { - /* - * Increment the "rename_used" score by - * one, to indicate ourselves as a user. - */ - p->one->rename_used++; - register_rename_src(p->one, p->score); + else { + if (detect_rename == DIFF_DETECT_COPY) { + /* + * Increment the "rename_used" score by + * one, to indicate ourselves as a user. + */ + p->one->rename_used++; + register_rename_src(p->one, p->score); + } + if (DIFF_OPT_TST(options, DETECT_DIR_RENAMES)) { + /* similarly, rename factorization needs to + * see all files from second tree, but we don't + * want them to be matched against single sources. + */ + locate_rename_dst(p->two, 1)->i_am_not_single = 1; + } } } if (rename_dst_nr == 0 || rename_src_nr == 0) @@ -509,6 +723,8 @@ void diffcore_rename(struct diff_options *options) if (rename_dst[i].pair) continue; /* dealt with exact match already. */ + if (rename_dst[i].i_am_not_single) + continue; /* not looking for a match. */ m = &mx[dst_cnt * NUM_CANDIDATE_PER_DST]; for (j = 0; j < NUM_CANDIDATE_PER_DST; j++) @@ -569,7 +785,28 @@ void diffcore_rename(struct diff_options *options) /* At this point, we have found some renames and copies and they * are recorded in rename_dst. The original list is still in *q. */ + + /* Now possibly factorize those renames and copies. */ + if (DIFF_OPT_TST(options, DETECT_DIR_RENAMES)) + diffcore_factorize_renames(); + DIFF_QUEUE_CLEAR(&outq); + + // Now turn non-discarded factorization_candidates into real renames + struct diff_dir_rename* candidate; + for (candidate=factorization_candidates; candidate; candidate = candidate->next) { + struct diff_filepair* pair; + if (candidate->discarded) continue; + // visualize toplevel dir if needed - FIXME: wrong place for this ? + if (!*candidate->one->path) + candidate->one->path = "./"; + if (!*candidate->two->path) + candidate->two->path = "./"; + pair = diff_queue(&outq, candidate->one, candidate->two); + pair->score = MAX_SCORE; + pair->renamed_pair = 1; + } + for (i = 0; i < q->nr; i++) { struct diff_filepair *p = q->queue[i]; struct diff_filepair *pair_to_free = NULL; diff --git a/tree-diff.c b/tree-diff.c index cd659c6..ca0b84f 100644 --- a/tree-diff.c +++ b/tree-diff.c @@ -49,7 +49,9 @@ static int compare_tree_entry(struct tree_desc *t1, struct tree_desc *t2, const show_entry(opt, "+", t2, base, baselen); return 1; } - if (!DIFF_OPT_TST(opt, FIND_COPIES_HARDER) && !hashcmp(sha1, sha2) && mode1 == mode2) + if (!DIFF_OPT_TST(opt, FIND_COPIES_HARDER) && + !DIFF_OPT_TST(opt, DETECT_DIR_RENAMES) && + !hashcmp(sha1, sha2) && mode1 == mode2) return 0; /* -- 1.7.2.3 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v4 2/4] Add testcases for the --detect-dir-renames diffcore flag. 2010-10-03 20:42 ` [PATCH v4 1/4] Introduce wholesame directory move detection in diffcore Yann Dirson @ 2010-10-03 20:42 ` Yann Dirson 2010-10-03 20:42 ` [PATCH v4 3/4] Allow hiding renames of individual files involved in a directory rename Yann Dirson ` (2 more replies) 2010-10-04 2:59 ` [PATCH v4 1/4] Introduce wholesame directory move detection in diffcore Ævar Arnfjörð Bjarmason ` (2 subsequent siblings) 3 siblings, 3 replies; 25+ messages in thread From: Yann Dirson @ 2010-10-03 20:42 UTC (permalink / raw) To: git; +Cc: Yann Dirson From: Yann Dirson <ydirson@free.fr> This notably includes a couple of tests for cases known not to be working correctly yet. --- t/t4046-diff-rename-factorize.sh | 326 ++++++++++++++++++++++++++++++++++++++ 1 files changed, 326 insertions(+), 0 deletions(-) create mode 100755 t/t4046-diff-rename-factorize.sh diff --git a/t/t4046-diff-rename-factorize.sh b/t/t4046-diff-rename-factorize.sh new file mode 100755 index 0000000..51b0b0b --- /dev/null +++ b/t/t4046-diff-rename-factorize.sh @@ -0,0 +1,326 @@ +#!/bin/sh +# +# Copyright (c) 2008,2010 Yann Dirson +# Copyright (c) 2005 Junio C Hamano +# + +# TODO - missing tests: +# * two dirs or more moving all their files to a single dir +# * simultaneous bulkmove and rename +# * add a new file under a dir that was moved in the same commit + +test_description='Test rename factorization in diff engine. + +' +. ./test-lib.sh +. "$TEST_DIRECTORY"/diff-lib.sh + +test_expect_success \ + 'commit the index.' \ + 'git update-ref HEAD $(echo "original empty commit" | git commit-tree $(git write-tree))' + +mkdir a +echo >a/path0 'Line 1 +Line 2 +Line 3 +Line 4 +Line 5 +Line 6 +Line 7 +Line 8 +Line 9 +Line 10 +line 11 +Line 12 +Line 13 +Line 14 +Line 15 +' +sed <a/path0 >a/path1 s/Line/Record/ +sed <a/path0 >a/path2 s/Line/Stuff/ +sed <a/path0 >a/path3 s/Line/Blurb/ + +test_expect_success \ + 'update-index --add file inside a directory.' \ + 'git update-index --add a/path*' + +test_expect_success \ + 'commit the index.' \ + 'git update-ref HEAD $(echo "original set of files" | git commit-tree $(git write-tree))' + +mv a b +test_expect_success \ + 'renamed the directory.' \ + 'git update-index --add --remove a/path0 a/path1 a/path2 a/path3 b/path*' + +test_expect_success \ + 'git diff-index --detect-dir-renames after directory move.' \ + 'git diff-index --detect-dir-renames HEAD >current' +grep -v "^\[DBG\] " <current >current.filtered +cat >expected <<\EOF +:040000 040000 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000 R100 a/ b/ +:100644 100644 fdbec444a77953b1bcc899d9fabfa202e5e68f08 fdbec444a77953b1bcc899d9fabfa202e5e68f08 R100 a/path0 b/path0 +:100644 100644 2f1f8d70c0fdad990819dfe37a31deb010805161 2f1f8d70c0fdad990819dfe37a31deb010805161 R100 a/path1 b/path1 +:100644 100644 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 R100 a/path2 b/path2 +:100644 100644 c6971ab9f08a6cd9c89a0f87d94ae347aad6144a c6971ab9f08a6cd9c89a0f87d94ae347aad6144a R100 a/path3 b/path3 +EOF + +test_expect_success \ + 'validate the output for directory move.' \ + 'compare_diff_patch expected current.filtered' + +# now test non-100% renames + +echo 'Line 16' >> b/path0 +mv b/path2 b/2path +rm b/path3 +echo anything > b/path100 +test_expect_success \ + 'edited dir contents.' \ + 'git update-index --add --remove b/* b/path2 b/path3' + +test_expect_success \ + 'git diff-index --detect-dir-renames after directory move and content changes.' \ + 'git diff-index --detect-dir-renames HEAD >current' +grep -v "^\[DBG\] " <current >current.filtered +cat >expected <<\EOF +:040000 040000 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000 R100 a/ b/ +:100644 000000 c6971ab9f08a6cd9c89a0f87d94ae347aad6144a 0000000000000000000000000000000000000000 D a/path3 +:100644 100644 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 R100 a/path2 b/2path +:100644 100644 fdbec444a77953b1bcc899d9fabfa202e5e68f08 4db595d12886f90e36765fc1732c17bccb836663 R093 a/path0 b/path0 +:100644 100644 2f1f8d70c0fdad990819dfe37a31deb010805161 2f1f8d70c0fdad990819dfe37a31deb010805161 R100 a/path1 b/path1 +:000000 100644 0000000000000000000000000000000000000000 1ba4650885513e62386fd3e23aeb45beeb67d3bb A b/path100 +EOF + +test_expect_success \ + 'validate the output for directory move and content changes.' \ + 'compare_diff_patch expected current.filtered' + +git reset -q --hard + +# now test bulk moves that are not directory moves (get consensus before going further ?) + +mkdir c +for i in 0 1 2; do cp a/path$i c/apath$i; done +test_expect_success \ + 'add files into a new directory.' \ + 'git update-index --add c/apath*' + +test_expect_success \ + 'commit all this.' \ + 'git commit -m "first set of changes"' + +mv c/* a/ +test_expect_success \ + 'move all of the new dir contents into a preexisting dir.' \ + 'git update-index --add --remove a/* c/apath0 c/apath1 c/apath2' + +test_expect_success \ + 'git diff-index --detect-dir-renames without full-dir rename.' \ + 'git diff-index --detect-dir-renames HEAD >current' +grep -v "^\[DBG\] " <current >current.filtered +cat >expected <<\EOF +:040000 040000 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000 R100 c/* a/ +:100644 100644 fdbec444a77953b1bcc899d9fabfa202e5e68f08 fdbec444a77953b1bcc899d9fabfa202e5e68f08 R100 c/apath0 a/apath0 +:100644 100644 2f1f8d70c0fdad990819dfe37a31deb010805161 2f1f8d70c0fdad990819dfe37a31deb010805161 R100 c/apath1 a/apath1 +:100644 100644 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 R100 c/apath2 a/apath2 +EOF + +test_expect_failure \ + 'validate the output for bulk rename without full-dir rename.' \ + 'compare_diff_patch expected current.filtered' + +git reset -q --hard + +# now test moves to toplevel + +mv c/* . +test_expect_success \ + 'move all of the new dir contents into toplevel.' \ + 'git update-index --add --remove apath* c/apath0 c/apath1 c/apath2' + +test_expect_success \ + 'git diff-index --detect-dir-renames files bulk-moved to toplevel.' \ + 'git diff-index --detect-dir-renames HEAD >current' +grep -v "^\[DBG\] " <current >current.filtered +cat >expected <<\EOF +:040000 040000 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000 R100 c/* ./ +:100644 100644 fdbec444a77953b1bcc899d9fabfa202e5e68f08 fdbec444a77953b1bcc899d9fabfa202e5e68f08 R100 c/apath0 apath0 +:100644 100644 2f1f8d70c0fdad990819dfe37a31deb010805161 2f1f8d70c0fdad990819dfe37a31deb010805161 R100 c/apath1 apath1 +:100644 100644 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 R100 c/apath2 apath2 +EOF + +test_expect_failure \ + 'validate the output for files bulk-moved to toplevel.' \ + 'compare_diff_patch expected current.filtered' + +git reset -q --hard + +# now test renaming with subdirs (does not take subdirs into account) + +mv c a/ +test_expect_success \ + 'move the new dir as subdir of another.' \ + 'git update-index --add --remove a/c/* c/apath0 c/apath1 c/apath2' + +test_expect_success \ + 'commit all this.' \ + 'git commit -m "move as subdir"' + +mv a b +echo foo >> b/c/apath0 +test_expect_success \ + 'rename the directory with some changes.' \ + 'git update-index --add --remove a/path0 a/path1 a/path2 a/path3 a/c/apath0 a/c/apath1 a/c/apath2 b/path* b/c/apath*' + +test_expect_success \ + 'git diff-index --detect-dir-renames on a move including a subdir.' \ + 'git diff-index --detect-dir-renames HEAD >current' +grep -v "^\[DBG\] " <current >current.filtered +cat >expected <<\EOF +:040000 040000 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000 R100 a/ b/ +:100644 100644 fdbec444a77953b1bcc899d9fabfa202e5e68f08 00084e5ea68b5ae339b7c4b429e4a70fe25d069b R096 a/c/apath0 b/c/apath0 +:100644 100644 2f1f8d70c0fdad990819dfe37a31deb010805161 2f1f8d70c0fdad990819dfe37a31deb010805161 R100 a/c/apath1 b/c/apath1 +:100644 100644 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 R100 a/c/apath2 b/c/apath2 +:100644 100644 fdbec444a77953b1bcc899d9fabfa202e5e68f08 fdbec444a77953b1bcc899d9fabfa202e5e68f08 R100 a/path0 b/path0 +:100644 100644 2f1f8d70c0fdad990819dfe37a31deb010805161 2f1f8d70c0fdad990819dfe37a31deb010805161 R100 a/path1 b/path1 +:100644 100644 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 R100 a/path2 b/path2 +:100644 100644 c6971ab9f08a6cd9c89a0f87d94ae347aad6144a c6971ab9f08a6cd9c89a0f87d94ae347aad6144a R100 a/path3 b/path3 +EOF + +test_expect_failure \ + 'validate the output for a move including a subdir.' \ + 'compare_diff_patch expected current.filtered' + +git reset -q --hard + +# now check that moving all files but not subdirs is not mistaken for dir move + +mkdir b +mv a/path* b/ +test_expect_success \ + 'rename files in the directory but not subdir.' \ + 'git update-index --add --remove a/path0 a/path1 a/path2 a/path3 b/path*' + +test_expect_success \ + 'git diff-index --detect-dir-renames on a move without a subdir.' \ + 'git diff-index --detect-dir-renames HEAD >current' +grep -v "^\[DBG\] " <current >current.filtered +cat >expected <<\EOF +:100644 100644 fdbec444a77953b1bcc899d9fabfa202e5e68f08 fdbec444a77953b1bcc899d9fabfa202e5e68f08 R100 a/path0 b/path0 +:100644 100644 2f1f8d70c0fdad990819dfe37a31deb010805161 2f1f8d70c0fdad990819dfe37a31deb010805161 R100 a/path1 b/path1 +:100644 100644 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 R100 a/path2 b/path2 +:100644 100644 c6971ab9f08a6cd9c89a0f87d94ae347aad6144a c6971ab9f08a6cd9c89a0f87d94ae347aad6144a R100 a/path3 b/path3 +EOF + +test_expect_success \ + 'validate the output for a move without a subdir.' \ + 'compare_diff_patch expected current.filtered' + +git reset -q --hard + +# now check that moving subdirs into one dir and files into another is not mistaken for dir move +# (well, clearly it is ...) + +mv a/c b +mv a d +test_expect_success \ + 'rename subdir and files into different places.' \ + 'git update-index --add --remove a/path0 a/path1 a/path2 a/path3 a/c/apath0 a/c/apath1 a/c/apath2 d/path* b/apath*' + +test_expect_success \ + 'git diff-index --detect-dir-renames on a split of subdir and files into different places.' \ + 'git diff-index --detect-dir-renames HEAD >current' +grep -v "^\[DBG\] " <current >current.filtered +cat >expected <<\EOF +:040000 040000 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000 R100 a/c/ b/ +:100644 100644 fdbec444a77953b1bcc899d9fabfa202e5e68f08 fdbec444a77953b1bcc899d9fabfa202e5e68f08 R100 a/c/apath0 b/apath0 +:100644 100644 2f1f8d70c0fdad990819dfe37a31deb010805161 2f1f8d70c0fdad990819dfe37a31deb010805161 R100 a/c/apath1 b/apath1 +:100644 100644 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 R100 a/c/apath2 b/apath2 +:100644 100644 fdbec444a77953b1bcc899d9fabfa202e5e68f08 fdbec444a77953b1bcc899d9fabfa202e5e68f08 R100 a/path0 d/path0 +:100644 100644 2f1f8d70c0fdad990819dfe37a31deb010805161 2f1f8d70c0fdad990819dfe37a31deb010805161 R100 a/path1 d/path1 +:100644 100644 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 R100 a/path2 d/path2 +:100644 100644 c6971ab9f08a6cd9c89a0f87d94ae347aad6144a c6971ab9f08a6cd9c89a0f87d94ae347aad6144a R100 a/path3 d/path3 +EOF + +test_expect_failure \ + 'validate the output for a split of subdir and files into different places.' \ + 'compare_diff_patch expected current.filtered' + +# now test moving a dir with no files but only subdirs +# (only factorizes lowest-level directories - not a big deal, just not perfect) + +git reset -q --hard +mkdir a/b +mv a/path* a/b/ +test_expect_success \ + 'setup the directory with only subdirs, no direct child files.' \ + 'git update-index --add --remove a/path0 a/path1 a/path2 a/path3 a/b/path*' +test_expect_success \ + 'commit all this.' \ + 'git commit -m "move all toplevel files down one level"' + +test_expect_success \ + 'move this dir.' \ + 'git mv a z' + +test_expect_success \ + 'git diff-index --detect-dir-renames with only subdirs' \ + 'git diff-index --detect-dir-renames HEAD >current' +grep -v "^\[DBG\] " <current >current.filtered +cat >expected <<\EOF +:040000 040000 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000 R100 a/ z/ +:040000 040000 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000 R100 a/b/ z/b/ +:040000 040000 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000 R100 a/c/ z/c/ +:100644 100644 fdbec444a77953b1bcc899d9fabfa202e5e68f08 fdbec444a77953b1bcc899d9fabfa202e5e68f08 R100 a/b/path0 z/b/path0 +:100644 100644 2f1f8d70c0fdad990819dfe37a31deb010805161 2f1f8d70c0fdad990819dfe37a31deb010805161 R100 a/b/path1 z/b/path1 +:100644 100644 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 R100 a/b/path2 z/b/path2 +:100644 100644 c6971ab9f08a6cd9c89a0f87d94ae347aad6144a c6971ab9f08a6cd9c89a0f87d94ae347aad6144a R100 a/b/path3 z/b/path3 +:100644 100644 fdbec444a77953b1bcc899d9fabfa202e5e68f08 fdbec444a77953b1bcc899d9fabfa202e5e68f08 R100 a/c/apath0 z/c/apath0 +:100644 100644 2f1f8d70c0fdad990819dfe37a31deb010805161 2f1f8d70c0fdad990819dfe37a31deb010805161 R100 a/c/apath1 z/c/apath1 +:100644 100644 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 R100 a/c/apath2 z/c/apath2 +EOF + +test_expect_failure \ + 'validate the output for a move with only subdirs.' \ + 'compare_diff_patch expected current.filtered' + +# now test moving all files from toplevel into subdir (does not hides file moves) (needs consensus on syntax) +# Note: this is a special case of move of a dir into one of its own subdirs, which in +# turn is a variant of new files/dirs being added into a dir after all its contents +# are moved away + +git reset -q --hard HEAD~3 + +mv a/* . +test_expect_success \ + 'rename the directory with some changes.' \ + 'git update-index --add --remove a/path0 a/path1 a/path2 a/path3 path*' + +test_expect_success \ + 'commit all this.' \ + 'git commit -m "move all files to toplevel"' + +mkdir z +mv path* z/ +test_expect_success \ + 'rename the directory with some changes.' \ + 'git update-index --add --remove path0 path1 path2 path3 z/path*' + +test_expect_success \ + 'git diff-index --detect-dir-renames everything from toplevel.' \ + 'git diff-index --detect-dir-renames HEAD >current' +grep -v "^\[DBG\] " <current >current.filtered +cat >expected <<\EOF +:040000 040000 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000 R100 ./* z/ +:100644 100644 fdbec444a77953b1bcc899d9fabfa202e5e68f08 fdbec444a77953b1bcc899d9fabfa202e5e68f08 R100 path0 z/path0 +:100644 100644 2f1f8d70c0fdad990819dfe37a31deb010805161 2f1f8d70c0fdad990819dfe37a31deb010805161 R100 path1 z/path1 +:100644 100644 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 R100 path2 z/path2 +:100644 100644 c6971ab9f08a6cd9c89a0f87d94ae347aad6144a c6971ab9f08a6cd9c89a0f87d94ae347aad6144a R100 path3 z/path3 +EOF + +test_expect_failure \ + 'validate the output for a move of everything from toplevel.' \ + 'compare_diff_patch expected current.filtered' + +test_done -- 1.7.2.3 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v4 3/4] Allow hiding renames of individual files involved in a directory rename. 2010-10-03 20:42 ` [PATCH v4 2/4] Add testcases for the --detect-dir-renames diffcore flag Yann Dirson @ 2010-10-03 20:42 ` Yann Dirson 2010-10-03 20:42 ` [PATCH v4 4/4] Add testcases for the --hide-dir-rename-details diffcore flag Yann Dirson 2010-10-03 23:06 ` [PATCH v4 3/4] Allow hiding renames of individual files involved in a directory rename Sverre Rabbelier 2010-10-04 3:03 ` [PATCH v4 2/4] Add testcases for the --detect-dir-renames diffcore flag Ævar Arnfjörð Bjarmason 2010-10-04 20:32 ` Jonathan Nieder 2 siblings, 2 replies; 25+ messages in thread From: Yann Dirson @ 2010-10-03 20:42 UTC (permalink / raw) To: git; +Cc: Yann Dirson From: Yann Dirson <ydirson@free.fr> Once --detect-dir-renames has identified groups of bulk-moved files, and then the --hide-dir-rename-details flag hides those of the individual renames which carry no other information (further name change, or content changes). This makes it much easier to a human reader to spot content changes in a commit that also moves a whole subtree. --- diff.c | 7 +++++ diff.h | 3 ++ diffcore-rename.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++-- diffcore.h | 1 + 4 files changed, 75 insertions(+), 3 deletions(-) diff --git a/diff.c b/diff.c index 47ef7ea..2731c33 100644 --- a/diff.c +++ b/diff.c @@ -3193,6 +3193,13 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac) if (!options->detect_rename) options->detect_rename = DIFF_DETECT_RENAME; } + else if (!strcmp(arg, "--hide-dir-rename-details")) { + DIFF_OPT_SET(options, HIDE_DIR_RENAME_DETAILS); + if (!DIFF_OPT_TST(options, DETECT_DIR_RENAMES)) + DIFF_OPT_SET(options, DETECT_DIR_RENAMES); + if (!options->detect_rename) + options->detect_rename = DIFF_DETECT_RENAME; + } else if (!strcmp(arg, "--follow")) DIFF_OPT_SET(options, FOLLOW_RENAMES); else if (!strcmp(arg, "--color")) diff --git a/diff.h b/diff.h index 40c548d..ae1e2e7 100644 --- a/diff.h +++ b/diff.h @@ -79,6 +79,7 @@ typedef struct strbuf *(*diff_prefix_fn_t)(struct diff_options *opt, void *data) #define DIFF_OPT_IGNORE_DIRTY_SUBMODULES (1 << 26) #define DIFF_OPT_OVERRIDE_SUBMODULE_CONFIG (1 << 27) #define DIFF_OPT_DETECT_DIR_RENAMES (1 << 28) +#define DIFF_OPT_HIDE_DIR_RENAME_DETAILS (1 << 29) #define DIFF_OPT_TST(opts, flag) ((opts)->flags & DIFF_OPT_##flag) #define DIFF_OPT_SET(opts, flag) ((opts)->flags |= DIFF_OPT_##flag) @@ -268,6 +269,8 @@ extern void diffcore_fix_diff_index(struct diff_options *); " try unchanged files as candidate for copy detection.\n" \ " --detect-dir-renames\n" \ " detect wholesale directory renames.\n" \ +" --hide-dir-rename-details\n" \ +" hide renames of individual files in a directory rename.\n" \ " -l<n> limit rename attempts up to <n> paths.\n" \ " -O<file> reorder diffs according to the <file>.\n" \ " -S<string> find filepair whose only one side contains the string.\n" \ diff --git a/diffcore-rename.c b/diffcore-rename.c index 06a8b6c..89e0b53 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -452,6 +452,34 @@ struct diff_dir_rename { struct diff_dir_rename* next; }; +/* + * Marks as such file_rename if it is made uninteresting by dir_rename. + * Returns -1 if the file_rename is outside of the range in which given + * renames concerned by dir_rename are to be found (ie. end of loop), + * 0 otherwise. + */ +static int maybe_mark_uninteresting(struct diff_rename_dst* file_rename, + struct diff_dir_rename* dir_rename, + int one_len, int two_len) +{ + if (!file_rename->pair) /* file add */ + return 0; + if (strncmp(file_rename->two->path, + dir_rename->two->path, two_len)) + return -1; + if (strncmp(file_rename->pair->one->path, + dir_rename->one->path, one_len) || + !basename_same(file_rename->pair->one, file_rename->two) || + file_rename->pair->score != MAX_SCORE) + return 0; + + file_rename->pair->uninteresting_rename = 1; + fprintf (stderr, "[DBG] %s* -> %s* makes %s -> %s uninteresting\n", + dir_rename->one->path, dir_rename->two->path, + file_rename->pair->one->path, file_rename->two->path); + return 0; +} + // FIXME: prevent possible overflow /* * Copy dirname of src into dst, suitable to append a filename without @@ -497,7 +525,7 @@ static const char* copy_dirname(char* dst, const char* src) * FIXME: ideas to consider under-threshold moves as part of bulk move ? */ static struct diff_dir_rename* factorization_candidates = NULL; -static void diffcore_factorize_renames(void) +static void diffcore_factorize_renames(int opt_hide_renames) { char one_parent_path[PATH_MAX], two_parent_path[PATH_MAX]; int i; @@ -616,6 +644,38 @@ static void diffcore_factorize_renames(void) /* all checks ok, we keep that entry */ } + if (opt_hide_renames) { + // flag as "uninteresting" those candidates hidden by dir move + struct diff_dir_rename* candidate; + for (candidate=factorization_candidates; + candidate; candidate = candidate->next) { + int two_idx, i, one_len, two_len; + if (candidate->discarded) + continue; + + // bisect to any entry within candidate dst dir + struct diff_rename_dst* two_sample = + locate_rename_dst_dir(candidate->two); + if (!two_sample) { + die ("PANIC: %s candidate of rename not in target tree (from %s)\n", + candidate->two->path, candidate->one->path); + } + two_idx = two_sample - rename_dst; + + // now remove extraneous 100% files inside. + one_len = strlen(candidate->one->path); + two_len = strlen(candidate->two->path); + for (i = two_idx; i < rename_dst_nr; i++) + if (maybe_mark_uninteresting (rename_dst+i, candidate, + one_len, two_len) < 0) + break; + for (i = two_idx-1; i >= 0; i--) + if (maybe_mark_uninteresting (rename_dst+i, candidate, + one_len, two_len) < 0) + break; + } + } + return; } @@ -788,7 +848,7 @@ void diffcore_rename(struct diff_options *options) /* Now possibly factorize those renames and copies. */ if (DIFF_OPT_TST(options, DETECT_DIR_RENAMES)) - diffcore_factorize_renames(); + diffcore_factorize_renames(DIFF_OPT_TST(options, HIDE_DIR_RENAME_DETAILS)); DIFF_QUEUE_CLEAR(&outq); @@ -821,7 +881,8 @@ void diffcore_rename(struct diff_options *options) struct diff_rename_dst *dst = locate_rename_dst(p->two, 0); if (dst && dst->pair) { - diff_q(&outq, dst->pair); + if (!dst->pair->uninteresting_rename) + diff_q(&outq, dst->pair); pair_to_free = p; } else diff --git a/diffcore.h b/diffcore.h index b8f1fde..beee596 100644 --- a/diffcore.h +++ b/diffcore.h @@ -69,6 +69,7 @@ struct diff_filepair { unsigned broken_pair : 1; unsigned renamed_pair : 1; unsigned is_unmerged : 1; + unsigned uninteresting_rename : 1; }; #define DIFF_PAIR_UNMERGED(p) ((p)->is_unmerged) -- 1.7.2.3 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v4 4/4] Add testcases for the --hide-dir-rename-details diffcore flag. 2010-10-03 20:42 ` [PATCH v4 3/4] Allow hiding renames of individual files involved in a directory rename Yann Dirson @ 2010-10-03 20:42 ` Yann Dirson 2010-10-03 23:04 ` Sverre Rabbelier 2010-10-03 23:06 ` [PATCH v4 3/4] Allow hiding renames of individual files involved in a directory rename Sverre Rabbelier 1 sibling, 1 reply; 25+ messages in thread From: Yann Dirson @ 2010-10-03 20:42 UTC (permalink / raw) To: git; +Cc: Yann Dirson From: Yann Dirson <ydirson@free.fr> --- t/t4046-diff-rename-factorize.sh | 107 ++++++++++++++++++++++++++++++++++++++ 1 files changed, 107 insertions(+), 0 deletions(-) diff --git a/t/t4046-diff-rename-factorize.sh b/t/t4046-diff-rename-factorize.sh index 51b0b0b..d982658 100755 --- a/t/t4046-diff-rename-factorize.sh +++ b/t/t4046-diff-rename-factorize.sh @@ -69,6 +69,18 @@ test_expect_success \ 'validate the output for directory move.' \ 'compare_diff_patch expected current.filtered' +test_expect_success \ + 'git diff-index --hide-dir-rename-details after directory move.' \ + 'git diff-index --hide-dir-rename-details HEAD >current' +grep -v "^\[DBG\] " <current >current.filtered +cat >expected <<\EOF +:040000 040000 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000 R100 a/ b/ +EOF + +test_expect_success \ + 'validate the output for directory move.' \ + 'compare_diff_patch expected current.filtered' + # now test non-100% renames echo 'Line 16' >> b/path0 @@ -96,6 +108,22 @@ test_expect_success \ 'validate the output for directory move and content changes.' \ 'compare_diff_patch expected current.filtered' +test_expect_success \ + 'git diff-index --hide-dir-rename-details after directory move and content changes.' \ + 'git diff-index --hide-dir-rename-details HEAD >current' +grep -v "^\[DBG\] " <current >current.filtered +cat >expected <<\EOF +:040000 040000 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000 R100 a/ b/ +:100644 000000 c6971ab9f08a6cd9c89a0f87d94ae347aad6144a 0000000000000000000000000000000000000000 D a/path3 +:100644 100644 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 R100 a/path2 b/2path +:100644 100644 fdbec444a77953b1bcc899d9fabfa202e5e68f08 4db595d12886f90e36765fc1732c17bccb836663 R093 a/path0 b/path0 +:000000 100644 0000000000000000000000000000000000000000 1ba4650885513e62386fd3e23aeb45beeb67d3bb A b/path100 +EOF + +test_expect_success \ + 'validate the output for directory move and content changes.' \ + 'compare_diff_patch expected current.filtered' + git reset -q --hard # now test bulk moves that are not directory moves (get consensus before going further ?) @@ -130,6 +158,18 @@ test_expect_failure \ 'validate the output for bulk rename without full-dir rename.' \ 'compare_diff_patch expected current.filtered' +test_expect_success \ + 'git diff-index --hide-dir-rename-details without full-dir rename.' \ + 'git diff-index --hide-dir-rename-details HEAD >current' +grep -v "^\[DBG\] " <current >current.filtered +cat >expected <<\EOF +:040000 040000 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000 R100 c/* a/ +EOF + +test_expect_failure \ + 'validate the output for bulk rename without full-dir rename.' \ + 'compare_diff_patch expected current.filtered' + git reset -q --hard # now test moves to toplevel @@ -154,6 +194,18 @@ test_expect_failure \ 'validate the output for files bulk-moved to toplevel.' \ 'compare_diff_patch expected current.filtered' +test_expect_success \ + 'git diff-index --hide-dir-rename-details files bulk-moved to toplevel.' \ + 'git diff-index --hide-dir-rename-details HEAD >current' +grep -v "^\[DBG\] " <current >current.filtered +cat >expected <<\EOF +:040000 040000 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000 R100 c/* ./ +EOF + +test_expect_failure \ + 'validate the output for files bulk-moved to toplevel.' \ + 'compare_diff_patch expected current.filtered' + git reset -q --hard # now test renaming with subdirs (does not take subdirs into account) @@ -192,6 +244,19 @@ test_expect_failure \ 'validate the output for a move including a subdir.' \ 'compare_diff_patch expected current.filtered' +test_expect_success \ + 'git diff-index --hide-dir-rename-details on a move including a subdir.' \ + 'git diff-index --hide-dir-rename-details HEAD >current' +grep -v "^\[DBG\] " <current >current.filtered +cat >expected <<\EOF +:040000 040000 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000 R100 a/ b/ +:100644 100644 fdbec444a77953b1bcc899d9fabfa202e5e68f08 00084e5ea68b5ae339b7c4b429e4a70fe25d069b R096 a/c/apath0 b/c/apath0 +EOF + +test_expect_failure \ + 'validate the output for a move including a subdir.' \ + 'compare_diff_patch expected current.filtered' + git reset -q --hard # now check that moving all files but not subdirs is not mistaken for dir move @@ -217,6 +282,8 @@ test_expect_success \ 'validate the output for a move without a subdir.' \ 'compare_diff_patch expected current.filtered' +# no need to test for --hide-dir-rename-details here + git reset -q --hard # now check that moving subdirs into one dir and files into another is not mistaken for dir move @@ -247,6 +314,22 @@ test_expect_failure \ 'validate the output for a split of subdir and files into different places.' \ 'compare_diff_patch expected current.filtered' +test_expect_success \ + 'git diff-index --hide-dir-rename-details on a split of subdir and files into different places.' \ + 'git diff-index --hide-dir-rename-details HEAD >current' +grep -v "^\[DBG\] " <current >current.filtered +cat >expected <<\EOF +:040000 040000 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000 R100 a/c/ b/ +:100644 100644 fdbec444a77953b1bcc899d9fabfa202e5e68f08 fdbec444a77953b1bcc899d9fabfa202e5e68f08 R100 a/path0 d/path0 +:100644 100644 2f1f8d70c0fdad990819dfe37a31deb010805161 2f1f8d70c0fdad990819dfe37a31deb010805161 R100 a/path1 d/path1 +:100644 100644 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 R100 a/path2 d/path2 +:100644 100644 c6971ab9f08a6cd9c89a0f87d94ae347aad6144a c6971ab9f08a6cd9c89a0f87d94ae347aad6144a R100 a/path3 d/path3 +EOF + +test_expect_failure \ + 'validate the output for a split of subdir and files into different places.' \ + 'compare_diff_patch expected current.filtered' + # now test moving a dir with no files but only subdirs # (only factorizes lowest-level directories - not a big deal, just not perfect) @@ -285,6 +368,18 @@ test_expect_failure \ 'validate the output for a move with only subdirs.' \ 'compare_diff_patch expected current.filtered' +test_expect_success \ + 'git diff-index --hide-dir-rename-details with only subdirs' \ + 'git diff-index --hide-dir-rename-details HEAD >current' +grep -v "^\[DBG\] " <current >current.filtered +cat >expected <<\EOF +:040000 040000 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000 R100 a/ z/ +EOF + +test_expect_failure \ + 'validate the output for a move with only subdirs.' \ + 'compare_diff_patch expected current.filtered' + # now test moving all files from toplevel into subdir (does not hides file moves) (needs consensus on syntax) # Note: this is a special case of move of a dir into one of its own subdirs, which in # turn is a variant of new files/dirs being added into a dir after all its contents @@ -323,4 +418,16 @@ test_expect_failure \ 'validate the output for a move of everything from toplevel.' \ 'compare_diff_patch expected current.filtered' +test_expect_success \ + 'git diff-index --hide-dir-rename-details everything from toplevel.' \ + 'git diff-index --hide-dir-rename-details HEAD >current' +grep -v "^\[DBG\] " <current >current.filtered +cat >expected <<\EOF +:040000 040000 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000 R100 ./* z/ +EOF + +test_expect_failure \ + 'validate the output for a move of everything from toplevel.' \ + 'compare_diff_patch expected current.filtered' + test_done -- 1.7.2.3 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH v4 4/4] Add testcases for the --hide-dir-rename-details diffcore flag. 2010-10-03 20:42 ` [PATCH v4 4/4] Add testcases for the --hide-dir-rename-details diffcore flag Yann Dirson @ 2010-10-03 23:04 ` Sverre Rabbelier 0 siblings, 0 replies; 25+ messages in thread From: Sverre Rabbelier @ 2010-10-03 23:04 UTC (permalink / raw) To: Yann Dirson; +Cc: git, Yann Dirson Heya, On Sun, Oct 3, 2010 at 22:42, Yann Dirson <ydirson@altern.org> wrote: > +:040000 040000 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000 R100 a/ b/ > +:100644 000000 c6971ab9f08a6cd9c89a0f87d94ae347aad6144a 0000000000000000000000000000000000000000 D a/path3 > +:100644 100644 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 R100 a/path2 b/2path > +:100644 100644 fdbec444a77953b1bcc899d9fabfa202e5e68f08 4db595d12886f90e36765fc1732c17bccb836663 R093 a/path0 b/path0 > +:000000 100644 0000000000000000000000000000000000000000 1ba4650885513e62386fd3e23aeb45beeb67d3bb A b/path100 There was a recent discussion that came to the conclusion that we don't want to be comparing against hashes outside t0000. So either don't use diff-index, or use 'git rev-parse' to get those objects first, and then compare against them symbolically (e.g., ":100644 000000 $(path3) $(nullref)", or such). -- Cheers, Sverre Rabbelier ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v4 3/4] Allow hiding renames of individual files involved in a directory rename. 2010-10-03 20:42 ` [PATCH v4 3/4] Allow hiding renames of individual files involved in a directory rename Yann Dirson 2010-10-03 20:42 ` [PATCH v4 4/4] Add testcases for the --hide-dir-rename-details diffcore flag Yann Dirson @ 2010-10-03 23:06 ` Sverre Rabbelier 2010-10-03 23:28 ` Junio C Hamano 2010-10-04 18:21 ` Yann Dirson 1 sibling, 2 replies; 25+ messages in thread From: Sverre Rabbelier @ 2010-10-03 23:06 UTC (permalink / raw) To: Yann Dirson; +Cc: git, Yann Dirson Heya, On Sun, Oct 3, 2010 at 22:42, Yann Dirson <ydirson@altern.org> wrote: > This makes it much easier to a human reader to spot content changes > in a commit that also moves a whole subtree. I'd like to use this by default (but only for regular 'git diff', 'git log', etc., not for 'git format-patch') if/when it gets merged, can has config option? -- Cheers, Sverre Rabbelier ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v4 3/4] Allow hiding renames of individual files involved in a directory rename. 2010-10-03 23:06 ` [PATCH v4 3/4] Allow hiding renames of individual files involved in a directory rename Sverre Rabbelier @ 2010-10-03 23:28 ` Junio C Hamano 2010-10-04 6:43 ` Sverre Rabbelier 2010-10-04 18:21 ` Yann Dirson 1 sibling, 1 reply; 25+ messages in thread From: Junio C Hamano @ 2010-10-03 23:28 UTC (permalink / raw) To: Sverre Rabbelier; +Cc: Yann Dirson, git, Yann Dirson Sverre Rabbelier <srabbelier@gmail.com> writes: > I'd like to use this by default (but only for regular 'git diff', 'git > log', etc., not for 'git format-patch') if/when it gets merged, can > has config option? Having a configuration variable might be a good idea, as long as there is a provision to avoid breaking scripted porcelains when the option is set. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v4 3/4] Allow hiding renames of individual files involved in a directory rename. 2010-10-03 23:28 ` Junio C Hamano @ 2010-10-04 6:43 ` Sverre Rabbelier 0 siblings, 0 replies; 25+ messages in thread From: Sverre Rabbelier @ 2010-10-04 6:43 UTC (permalink / raw) To: Junio C Hamano; +Cc: Yann Dirson, git, Yann Dirson Heya, On Mon, Oct 4, 2010 at 01:28, Junio C Hamano <gitster@pobox.com> wrote: > Having a configuration variable might be a good idea, as long as there is > a provision to avoid breaking scripted porcelains when the option is set. I agree... maybe we need a generic way to have a config that only affects porcelain? -- Cheers, Sverre Rabbelier ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v4 3/4] Allow hiding renames of individual files involved in a directory rename. 2010-10-03 23:06 ` [PATCH v4 3/4] Allow hiding renames of individual files involved in a directory rename Sverre Rabbelier 2010-10-03 23:28 ` Junio C Hamano @ 2010-10-04 18:21 ` Yann Dirson 1 sibling, 0 replies; 25+ messages in thread From: Yann Dirson @ 2010-10-04 18:21 UTC (permalink / raw) To: Sverre Rabbelier; +Cc: git, Yann Dirson On Mon, Oct 04, 2010 at 01:06:13AM +0200, Sverre Rabbelier wrote: > Heya, > > On Sun, Oct 3, 2010 at 22:42, Yann Dirson <ydirson@altern.org> wrote: > > This makes it much easier to a human reader to spot content changes > > in a commit that also moves a whole subtree. > > I'd like to use this by default (but only for regular 'git diff', 'git > log', etc., not for 'git format-patch') if/when it gets merged, can > has config option? Sure - I was leaving that sort of beels and whistles for after the core feature gets cold enough. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v4 2/4] Add testcases for the --detect-dir-renames diffcore flag. 2010-10-03 20:42 ` [PATCH v4 2/4] Add testcases for the --detect-dir-renames diffcore flag Yann Dirson 2010-10-03 20:42 ` [PATCH v4 3/4] Allow hiding renames of individual files involved in a directory rename Yann Dirson @ 2010-10-04 3:03 ` Ævar Arnfjörð Bjarmason 2010-10-04 18:32 ` Yann Dirson 2010-10-04 20:32 ` Jonathan Nieder 2 siblings, 1 reply; 25+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2010-10-04 3:03 UTC (permalink / raw) To: Yann Dirson; +Cc: git, Yann Dirson On Sun, Oct 3, 2010 at 20:42, Yann Dirson <ydirson@altern.org> wrote: > From: Yann Dirson <ydirson@free.fr> > > This notably includes a couple of tests for cases known not to be > working correctly yet. > --- > t/t4046-diff-rename-factorize.sh | 326 ++++++++++++++++++++++++++++++++++++++ > 1 files changed, 326 insertions(+), 0 deletions(-) > create mode 100755 t/t4046-diff-rename-factorize.sh > > diff --git a/t/t4046-diff-rename-factorize.sh b/t/t4046-diff-rename-factorize.sh > new file mode 100755 > index 0000000..51b0b0b > --- /dev/null > +++ b/t/t4046-diff-rename-factorize.sh > @@ -0,0 +1,326 @@ > +#!/bin/sh > +# > +# Copyright (c) 2008,2010 Yann Dirson > +# Copyright (c) 2005 Junio C Hamano > +# > + > +# TODO - missing tests: > +# * two dirs or more moving all their files to a single dir > +# * simultaneous bulkmove and rename > +# * add a new file under a dir that was moved in the same commit > + > +test_description='Test rename factorization in diff engine. > + > +' > +. ./test-lib.sh > +. "$TEST_DIRECTORY"/diff-lib.sh > + > +test_expect_success \ > + 'commit the index.' \ > + 'git update-ref HEAD $(echo "original empty commit" | git commit-tree $(git write-tree))' Since this is a new test it should use the nominal style: test_expect_success 'commit the index.' ' git update-ref HEAD $(echo "original empty commit" | git commit-tree $(git write-tree)) ' > +mkdir a > +echo >a/path0 'Line 1 > +Line 2 > +Line 3 > +Line 4 > +Line 5 > +Line 6 > +Line 7 > +Line 8 > +Line 9 > +Line 10 > +line 11 > +Line 12 > +Line 13 > +Line 14 > +Line 15 > +' > +sed <a/path0 >a/path1 s/Line/Record/ > +sed <a/path0 >a/path2 s/Line/Stuff/ > +sed <a/path0 >a/path3 s/Line/Blurb/ Should be in a test, see "Put all code inside test_expect_success and other assertions." in t/README. > +test_expect_success \ > + 'update-index --add file inside a directory.' \ > + 'git update-index --add a/path*' > + > +test_expect_success \ > + 'commit the index.' \ > + 'git update-ref HEAD $(echo "original set of files" | git commit-tree $(git write-tree))' > + > +mv a b likewise here, and throughout the test of the test. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v4 2/4] Add testcases for the --detect-dir-renames diffcore flag. 2010-10-04 3:03 ` [PATCH v4 2/4] Add testcases for the --detect-dir-renames diffcore flag Ævar Arnfjörð Bjarmason @ 2010-10-04 18:32 ` Yann Dirson 0 siblings, 0 replies; 25+ messages in thread From: Yann Dirson @ 2010-10-04 18:32 UTC (permalink / raw) To: ?var Arnfj?r? Bjarmason; +Cc: git, Yann Dirson On Mon, Oct 04, 2010 at 03:03:35AM +0000, ?var Arnfj?r? Bjarmason wrote: > > +mkdir a [...] > > +sed <a/path0 >a/path3 s/Line/Blurb/ > > Should be in a test, see "Put all code inside test_expect_success and > other assertions." in t/README. While I can see the value of having everything run from the test framework, I was thinking it would be even better to have a special "test_setup" clause, which could be as a first step just an alias to test_expect_success, but could afterwards gain features such as "not counted as success, and counted specially when failed", so we get better stats of the number of failures we get, and so that editor outlining/folding modes could be developped in a useful fashion. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v4 2/4] Add testcases for the --detect-dir-renames diffcore flag. 2010-10-03 20:42 ` [PATCH v4 2/4] Add testcases for the --detect-dir-renames diffcore flag Yann Dirson 2010-10-03 20:42 ` [PATCH v4 3/4] Allow hiding renames of individual files involved in a directory rename Yann Dirson 2010-10-04 3:03 ` [PATCH v4 2/4] Add testcases for the --detect-dir-renames diffcore flag Ævar Arnfjörð Bjarmason @ 2010-10-04 20:32 ` Jonathan Nieder 2010-10-04 21:37 ` Yann Dirson 2 siblings, 1 reply; 25+ messages in thread From: Jonathan Nieder @ 2010-10-04 20:32 UTC (permalink / raw) To: Yann Dirson; +Cc: git, Yann Dirson Yann Dirson wrote: > From: Yann Dirson <ydirson@free.fr> > Subject: Add testcases for the --detect-dir-renames diffcore flag Probably better for the test to be squashed with the patch that adds that option. > --- /dev/null > +++ b/t/t4046-diff-rename-factorize.sh > @@ -0,0 +1,326 @@ [...] > +test_expect_success \ > + 'commit the index.' \ > + 'git update-ref HEAD $(echo "original empty commit" | git commit-tree $(git write-tree))' Nit: Now test assertions tend to be written as test_expect_success '...' ' command && command && ... ' The tabs to indent and opening ' at the end of the first line means less fuss in lining things up. Making each test assertion include its setup means tests don't pass if something gets messed up in the setup, and using multiline test assertions with && means there is no confusion about what the tests were written to test. As you mentioned, this has the unfortunate downside of messing with syntax highlighting. Maybe the common text editors need a new mode for git-style test scripts? Or maybe it would make sense to implement test_begin_expecting_success '...' ... test_end --- it would certainly make quoting easier. It might make sense to compute the tree, commit, etc one at a time instead of this long one-liner. > + > +mkdir a > +echo >a/path0 'Line 1 > +Line 2 > +Line 3 > +Line 4 > +Line 5 > +Line 6 > +Line 7 > +Line 8 > +Line 9 > +Line 10 > +line 11 > +Line 12 > +Line 13 > +Line 14 > +Line 15 > +' To avoid quoting troubles and since these are innocuous commands, it could be nice to put this before the first test assertion. Or even better: mkdir a && printf "Line %s\n" 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 >a/path0 && ... [...] > +test_expect_success \ > + 'git diff-index --detect-dir-renames after directory move.' \ > + 'git diff-index --detect-dir-renames HEAD >current' > +grep -v "^\[DBG\] " <current >current.filtered > +cat >expected <<\EOF > +:040000 040000 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000 R100 a/ b/ > +:100644 100644 fdbec444a77953b1bcc899d9fabfa202e5e68f08 fdbec444a77953b1bcc899d9fabfa202e5e68f08 R100 a/path0 b/path0 > +:100644 100644 2f1f8d70c0fdad990819dfe37a31deb010805161 2f1f8d70c0fdad990819dfe37a31deb010805161 R100 a/path1 b/path1 > +:100644 100644 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 R100 a/path2 b/path2 > +:100644 100644 c6971ab9f08a6cd9c89a0f87d94ae347aad6144a c6971ab9f08a6cd9c89a0f87d94ae347aad6144a R100 a/path3 b/path3 > +EOF Nit: although compare_diff_patch ensures the result is not dependent on the hash function, these hard-coded hashes are still hard for a human to read. Could they be computed instead? (Sorry, no thoughts about the actual content yet...) --- diff --git a/t/t4046-diff-rename-factorize.sh b/t/t4046-diff-rename-factorize.sh index 51b0b0b..1e7badb 100755 --- a/t/t4046-diff-rename-factorize.sh +++ b/t/t4046-diff-rename-factorize.sh @@ -15,312 +15,246 @@ test_description='Test rename factorization in diff engine. . ./test-lib.sh . "$TEST_DIRECTORY"/diff-lib.sh -test_expect_success \ - 'commit the index.' \ - 'git update-ref HEAD $(echo "original empty commit" | git commit-tree $(git write-tree))' - -mkdir a -echo >a/path0 'Line 1 -Line 2 -Line 3 -Line 4 -Line 5 -Line 6 -Line 7 -Line 8 -Line 9 -Line 10 -line 11 -Line 12 -Line 13 -Line 14 -Line 15 +test_expect_success 'setup' ' + zeroes=0000000000000000000000000000000000000000 && + + tree=$(git write-tree) && + commit=$(echo "original empty commit" | git commit-tree $tree) && + git update-ref HEAD $commit && + + mkdir a && + printf "Line %s\n" 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 >a/path0 && + sed <a/path0 >a/path1 s/Line/Record/ && + sed <a/path0 >a/path2 s/Line/Stuff/ && + sed <a/path0 >a/path3 s/Line/Blurb/ && + + git update-index --add a/path* && + tree=$(git write-tree) && + commit=$(echo "original set of files" | git commit-tree $tree) && + git update-ref HEAD $commit && + + path0_id=$(git rev-parse HEAD:a/path0) && + path1_id=$(git rev-parse HEAD:a/path1) && + path2_id=$(git rev-parse HEAD:a/path2) && + path3_id=$(git rev-parse HEAD:a/path3) && + + : rename the directory && + mv a b && + git update-index --add --remove a/path0 a/path1 a/path2 a/path3 b/path* ' -sed <a/path0 >a/path1 s/Line/Record/ -sed <a/path0 >a/path2 s/Line/Stuff/ -sed <a/path0 >a/path3 s/Line/Blurb/ - -test_expect_success \ - 'update-index --add file inside a directory.' \ - 'git update-index --add a/path*' - -test_expect_success \ - 'commit the index.' \ - 'git update-ref HEAD $(echo "original set of files" | git commit-tree $(git write-tree))' - -mv a b -test_expect_success \ - 'renamed the directory.' \ - 'git update-index --add --remove a/path0 a/path1 a/path2 a/path3 b/path*' - -test_expect_success \ - 'git diff-index --detect-dir-renames after directory move.' \ - 'git diff-index --detect-dir-renames HEAD >current' -grep -v "^\[DBG\] " <current >current.filtered -cat >expected <<\EOF -:040000 040000 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000 R100 a/ b/ -:100644 100644 fdbec444a77953b1bcc899d9fabfa202e5e68f08 fdbec444a77953b1bcc899d9fabfa202e5e68f08 R100 a/path0 b/path0 -:100644 100644 2f1f8d70c0fdad990819dfe37a31deb010805161 2f1f8d70c0fdad990819dfe37a31deb010805161 R100 a/path1 b/path1 -:100644 100644 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 R100 a/path2 b/path2 -:100644 100644 c6971ab9f08a6cd9c89a0f87d94ae347aad6144a c6971ab9f08a6cd9c89a0f87d94ae347aad6144a R100 a/path3 b/path3 -EOF - -test_expect_success \ - 'validate the output for directory move.' \ - 'compare_diff_patch expected current.filtered' - -# now test non-100% renames - -echo 'Line 16' >> b/path0 -mv b/path2 b/2path -rm b/path3 -echo anything > b/path100 -test_expect_success \ - 'edited dir contents.' \ - 'git update-index --add --remove b/* b/path2 b/path3' - -test_expect_success \ - 'git diff-index --detect-dir-renames after directory move and content changes.' \ - 'git diff-index --detect-dir-renames HEAD >current' -grep -v "^\[DBG\] " <current >current.filtered -cat >expected <<\EOF -:040000 040000 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000 R100 a/ b/ -:100644 000000 c6971ab9f08a6cd9c89a0f87d94ae347aad6144a 0000000000000000000000000000000000000000 D a/path3 -:100644 100644 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 R100 a/path2 b/2path -:100644 100644 fdbec444a77953b1bcc899d9fabfa202e5e68f08 4db595d12886f90e36765fc1732c17bccb836663 R093 a/path0 b/path0 -:100644 100644 2f1f8d70c0fdad990819dfe37a31deb010805161 2f1f8d70c0fdad990819dfe37a31deb010805161 R100 a/path1 b/path1 -:000000 100644 0000000000000000000000000000000000000000 1ba4650885513e62386fd3e23aeb45beeb67d3bb A b/path100 -EOF - -test_expect_success \ - 'validate the output for directory move and content changes.' \ - 'compare_diff_patch expected current.filtered' - -git reset -q --hard - -# now test bulk moves that are not directory moves (get consensus before going further ?) - -mkdir c -for i in 0 1 2; do cp a/path$i c/apath$i; done -test_expect_success \ - 'add files into a new directory.' \ - 'git update-index --add c/apath*' - -test_expect_success \ - 'commit all this.' \ - 'git commit -m "first set of changes"' - -mv c/* a/ -test_expect_success \ - 'move all of the new dir contents into a preexisting dir.' \ - 'git update-index --add --remove a/* c/apath0 c/apath1 c/apath2' - -test_expect_success \ - 'git diff-index --detect-dir-renames without full-dir rename.' \ - 'git diff-index --detect-dir-renames HEAD >current' -grep -v "^\[DBG\] " <current >current.filtered -cat >expected <<\EOF -:040000 040000 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000 R100 c/* a/ -:100644 100644 fdbec444a77953b1bcc899d9fabfa202e5e68f08 fdbec444a77953b1bcc899d9fabfa202e5e68f08 R100 c/apath0 a/apath0 -:100644 100644 2f1f8d70c0fdad990819dfe37a31deb010805161 2f1f8d70c0fdad990819dfe37a31deb010805161 R100 c/apath1 a/apath1 -:100644 100644 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 R100 c/apath2 a/apath2 -EOF - -test_expect_failure \ - 'validate the output for bulk rename without full-dir rename.' \ - 'compare_diff_patch expected current.filtered' - -git reset -q --hard - -# now test moves to toplevel - -mv c/* . -test_expect_success \ - 'move all of the new dir contents into toplevel.' \ - 'git update-index --add --remove apath* c/apath0 c/apath1 c/apath2' - -test_expect_success \ - 'git diff-index --detect-dir-renames files bulk-moved to toplevel.' \ - 'git diff-index --detect-dir-renames HEAD >current' -grep -v "^\[DBG\] " <current >current.filtered -cat >expected <<\EOF -:040000 040000 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000 R100 c/* ./ -:100644 100644 fdbec444a77953b1bcc899d9fabfa202e5e68f08 fdbec444a77953b1bcc899d9fabfa202e5e68f08 R100 c/apath0 apath0 -:100644 100644 2f1f8d70c0fdad990819dfe37a31deb010805161 2f1f8d70c0fdad990819dfe37a31deb010805161 R100 c/apath1 apath1 -:100644 100644 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 R100 c/apath2 apath2 -EOF - -test_expect_failure \ - 'validate the output for files bulk-moved to toplevel.' \ - 'compare_diff_patch expected current.filtered' -git reset -q --hard - -# now test renaming with subdirs (does not take subdirs into account) - -mv c a/ -test_expect_success \ - 'move the new dir as subdir of another.' \ - 'git update-index --add --remove a/c/* c/apath0 c/apath1 c/apath2' - -test_expect_success \ - 'commit all this.' \ - 'git commit -m "move as subdir"' - -mv a b -echo foo >> b/c/apath0 -test_expect_success \ - 'rename the directory with some changes.' \ - 'git update-index --add --remove a/path0 a/path1 a/path2 a/path3 a/c/apath0 a/c/apath1 a/c/apath2 b/path* b/c/apath*' - -test_expect_success \ - 'git diff-index --detect-dir-renames on a move including a subdir.' \ - 'git diff-index --detect-dir-renames HEAD >current' -grep -v "^\[DBG\] " <current >current.filtered -cat >expected <<\EOF -:040000 040000 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000 R100 a/ b/ -:100644 100644 fdbec444a77953b1bcc899d9fabfa202e5e68f08 00084e5ea68b5ae339b7c4b429e4a70fe25d069b R096 a/c/apath0 b/c/apath0 -:100644 100644 2f1f8d70c0fdad990819dfe37a31deb010805161 2f1f8d70c0fdad990819dfe37a31deb010805161 R100 a/c/apath1 b/c/apath1 -:100644 100644 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 R100 a/c/apath2 b/c/apath2 -:100644 100644 fdbec444a77953b1bcc899d9fabfa202e5e68f08 fdbec444a77953b1bcc899d9fabfa202e5e68f08 R100 a/path0 b/path0 -:100644 100644 2f1f8d70c0fdad990819dfe37a31deb010805161 2f1f8d70c0fdad990819dfe37a31deb010805161 R100 a/path1 b/path1 -:100644 100644 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 R100 a/path2 b/path2 -:100644 100644 c6971ab9f08a6cd9c89a0f87d94ae347aad6144a c6971ab9f08a6cd9c89a0f87d94ae347aad6144a R100 a/path3 b/path3 -EOF - -test_expect_failure \ - 'validate the output for a move including a subdir.' \ - 'compare_diff_patch expected current.filtered' +test_expect_success 'diff-index --detect-dir-renames after directory move.' ' + cat >expected <<-EOF && + :040000 040000 $zeroes $zeroes R100 a/ b/ + :100644 100644 $path0_id $path0_id R100 a/path0 b/path0 + :100644 100644 $path1_id $path1_id R100 a/path1 b/path1 + :100644 100644 $path2_id $path2_id R100 a/path2 b/path2 + :100644 100644 $path3_id $path3_id R100 a/path3 b/path3 + EOF + git diff-index --detect-dir-renames HEAD >current && + grep -v "^\[DBG\] " <current >current.filtered && + compare_diff_patch expected current.filtered +' -git reset -q --hard +test_expect_success 'setup non-100% rename' ' + echo "Line 16" >>b/path0 && + mv b/path2 b/2path && + rm b/path3 && + echo anything >b/path100 && + git update-index --add --remove b/* b/path2 b/path3 +' -# now check that moving all files but not subdirs is not mistaken for dir move +test_expect_success 'diff-index --detect-dir-renames after content changes.' ' + path0_id2=$(git rev-parse :b/path0) && + path100_id=$(git rev-parse :b/path100) && + cat >expected <<-EOF && + :040000 040000 $zeroes $zeroes R100 a/ b/ + :100644 000000 $path3_id $zeroes D a/path3 + :100644 100644 $path2_id $path2_id R100 a/path2 b/2path + :100644 100644 $path0_id $path0_id2 R093 a/path0 b/path0 + :100644 100644 $path1_id $path1_id R100 a/path1 b/path1 + :000000 100644 $zeroes $path100_id A b/path100 + EOF + git diff-index --detect-dir-renames HEAD >current && + grep -v "^\[DBG\] " <current >current.filtered && + compare_diff_patch expected current.filtered +' -mkdir b -mv a/path* b/ -test_expect_success \ - 'rename files in the directory but not subdir.' \ - 'git update-index --add --remove a/path0 a/path1 a/path2 a/path3 b/path*' +test_expect_success 'setup bulk move that is not directory move' ' + git reset -q --hard && + + mkdir c && + ( + for i in 0 1 2 + do + cp a/path$i c/apath$i || + exit + done + ) && + git update-index --add c/apath* && + git commit -m "first set of changes" && + + mv c/* a/ && + git update-index --add --remove a/* c/apath0 c/apath1 c/apath2 +' -test_expect_success \ - 'git diff-index --detect-dir-renames on a move without a subdir.' \ - 'git diff-index --detect-dir-renames HEAD >current' -grep -v "^\[DBG\] " <current >current.filtered -cat >expected <<\EOF -:100644 100644 fdbec444a77953b1bcc899d9fabfa202e5e68f08 fdbec444a77953b1bcc899d9fabfa202e5e68f08 R100 a/path0 b/path0 -:100644 100644 2f1f8d70c0fdad990819dfe37a31deb010805161 2f1f8d70c0fdad990819dfe37a31deb010805161 R100 a/path1 b/path1 -:100644 100644 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 R100 a/path2 b/path2 -:100644 100644 c6971ab9f08a6cd9c89a0f87d94ae347aad6144a c6971ab9f08a6cd9c89a0f87d94ae347aad6144a R100 a/path3 b/path3 -EOF +test_expect_failure 'diff-index --detect-dir-renames without full-dir rename.' ' + cat >expected <<-EOF && + :040000 040000 $zeroes $zeroes R100 c/* a/ + :100644 100644 $path0_id $path0_id R100 c/apath0 a/apath0 + :100644 100644 $path1_id $path1_id R100 c/apath1 a/apath1 + :100644 100644 $path2_id $path2_id R100 c/apath2 a/apath2 + EOF + git diff-index --detect-dir-renames HEAD >current + grep -v "^\[DBG\] " <current >current.filtered && + compare_diff_patch expected current.filtered +' -test_expect_success \ - 'validate the output for a move without a subdir.' \ - 'compare_diff_patch expected current.filtered' +test_expect_success 'setup bulk move to toplevel' ' + git reset -q --hard && + mv c/* . && + git update-index --add --remove apath* c/apath0 c/apath1 c/apath2 +' -git reset -q --hard +test_expect_failure 'diff-index --detect-dir-renames bulk move to toplevel.' ' + cat >expected <<-EOF && + :040000 040000 $zeroes $zeroes R100 c/* ./ + :100644 100644 $path0_id $path0_id R100 c/apath0 apath0 + :100644 100644 $path1_id $path1_id R100 c/apath1 apath1 + :100644 100644 $path2_id $path2_id R100 c/apath2 apath2 + EOF + git diff-index --detect-dir-renames HEAD >current && + grep -v "^\[DBG\] " <current >current.filtered && + compare_diff_patch expected current.filtered +' -# now check that moving subdirs into one dir and files into another is not mistaken for dir move -# (well, clearly it is ...) +test_expect_success 'setup move including a subdir, with some content changes' ' + git reset -q --hard && + mv c a/ && + git update-index --add --remove a/c/* c/apath0 c/apath1 c/apath2 && + git commit -m "move as subdir" && -mv a/c b -mv a d -test_expect_success \ - 'rename subdir and files into different places.' \ - 'git update-index --add --remove a/path0 a/path1 a/path2 a/path3 a/c/apath0 a/c/apath1 a/c/apath2 d/path* b/apath*' + mv a b && + echo foo >>b/c/apath0 && + git update-index --add --remove a/path0 a/path1 a/path2 a/path3 \ + a/c/apath0 a/c/apath1 a/c/apath2 b/path* b/c/apath* +' -test_expect_success \ - 'git diff-index --detect-dir-renames on a split of subdir and files into different places.' \ - 'git diff-index --detect-dir-renames HEAD >current' -grep -v "^\[DBG\] " <current >current.filtered -cat >expected <<\EOF -:040000 040000 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000 R100 a/c/ b/ -:100644 100644 fdbec444a77953b1bcc899d9fabfa202e5e68f08 fdbec444a77953b1bcc899d9fabfa202e5e68f08 R100 a/c/apath0 b/apath0 -:100644 100644 2f1f8d70c0fdad990819dfe37a31deb010805161 2f1f8d70c0fdad990819dfe37a31deb010805161 R100 a/c/apath1 b/apath1 -:100644 100644 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 R100 a/c/apath2 b/apath2 -:100644 100644 fdbec444a77953b1bcc899d9fabfa202e5e68f08 fdbec444a77953b1bcc899d9fabfa202e5e68f08 R100 a/path0 d/path0 -:100644 100644 2f1f8d70c0fdad990819dfe37a31deb010805161 2f1f8d70c0fdad990819dfe37a31deb010805161 R100 a/path1 d/path1 -:100644 100644 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 R100 a/path2 d/path2 -:100644 100644 c6971ab9f08a6cd9c89a0f87d94ae347aad6144a c6971ab9f08a6cd9c89a0f87d94ae347aad6144a R100 a/path3 d/path3 -EOF +test_expect_failure 'diff-index --detect-dir-renames on a move including a subdir.' ' + path0_id2=$(git rev-parse :b/c/apath0) && + cat >expected <<-EOF && + :040000 040000 $zeroes $zeroes R100 a/ b/ + :100644 100644 $path0_id $path0_id2 R096 a/c/apath0 b/c/apath0 + :100644 100644 $path1_id $path1_id R100 a/c/apath1 b/c/apath1 + :100644 100644 $path2_id $path2_id R100 a/c/apath2 b/c/apath2 + :100644 100644 $path0_id $path0_id R100 a/path0 b/path0 + :100644 100644 $path1_id $path1_id R100 a/path1 b/path1 + :100644 100644 $path2_id $path2_id R100 a/path2 b/path2 + :100644 100644 $path3_id $path3_id R100 a/path3 b/path3 + EOF + git diff-index --detect-dir-renames HEAD >current && + grep -v "^\[DBG\] " <current >current.filtered && + compare_diff_patch expected current.filtered +' -test_expect_failure \ - 'validate the output for a split of subdir and files into different places.' \ - 'compare_diff_patch expected current.filtered' +test_expect_success 'setup move without a subdir' ' + git reset -q --hard && + mkdir b && + mv a/path* b/ && + : rename files in the directory but not subdir. && + git update-index --add --remove a/path0 a/path1 a/path2 a/path3 b/path* +' -# now test moving a dir with no files but only subdirs -# (only factorizes lowest-level directories - not a big deal, just not perfect) +test_expect_success 'moving files but not subdirs is not mistaken for dir move' ' + cat >expected <<-EOF && + :100644 100644 $path0_id $path0_id R100 a/path0 b/path0 + :100644 100644 $path1_id $path1_id R100 a/path1 b/path1 + :100644 100644 $path2_id $path2_id R100 a/path2 b/path2 + :100644 100644 $path3_id $path3_id R100 a/path3 b/path3 + EOF + git diff-index --detect-dir-renames HEAD >current && + grep -v "^\[DBG\] " <current >current.filtered && + compare_diff_patch expected current.filtered +' -git reset -q --hard -mkdir a/b -mv a/path* a/b/ -test_expect_success \ - 'setup the directory with only subdirs, no direct child files.' \ - 'git update-index --add --remove a/path0 a/path1 a/path2 a/path3 a/b/path*' -test_expect_success \ - 'commit all this.' \ - 'git commit -m "move all toplevel files down one level"' +test_expect_success 'setup move of files and subdirs to different places' ' + git reset -q --hard && + mv a/c b && + mv a d && + git update-index --add --remove a/path0 a/path1 a/path2 a/path3 \ + a/c/apath0 a/c/apath1 a/c/apath2 d/path* b/apath* +' -test_expect_success \ - 'move this dir.' \ - 'git mv a z' +test_expect_failure 'moving subdirs into one dir and files into another is not mistaken for dir move' ' + cat >expected <<-EOF && + :040000 040000 $zeroes $zeroes R100 a/c/ b/ + :100644 100644 $path0_id $path0_id R100 a/c/apath0 b/apath0 + :100644 100644 $path1_id $path1_id R100 a/c/apath1 b/apath1 + :100644 100644 $path2_id $path2_id R100 a/c/apath2 b/apath2 + :100644 100644 $path0_id $path0_id R100 a/path0 d/path0 + :100644 100644 $path1_id $path1_id R100 a/path1 d/path1 + :100644 100644 $path2_id $path2_id R100 a/path2 d/path2 + :100644 100644 $path3_id $path3_id R100 a/path3 d/path3 + EOF + git diff-index --detect-dir-renames HEAD >current && + grep -v "^\[DBG\] " <current >current.filtered && + compare_diff_patch expected current.filtered +' -test_expect_success \ - 'git diff-index --detect-dir-renames with only subdirs' \ - 'git diff-index --detect-dir-renames HEAD >current' -grep -v "^\[DBG\] " <current >current.filtered -cat >expected <<\EOF -:040000 040000 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000 R100 a/ z/ -:040000 040000 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000 R100 a/b/ z/b/ -:040000 040000 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000 R100 a/c/ z/c/ -:100644 100644 fdbec444a77953b1bcc899d9fabfa202e5e68f08 fdbec444a77953b1bcc899d9fabfa202e5e68f08 R100 a/b/path0 z/b/path0 -:100644 100644 2f1f8d70c0fdad990819dfe37a31deb010805161 2f1f8d70c0fdad990819dfe37a31deb010805161 R100 a/b/path1 z/b/path1 -:100644 100644 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 R100 a/b/path2 z/b/path2 -:100644 100644 c6971ab9f08a6cd9c89a0f87d94ae347aad6144a c6971ab9f08a6cd9c89a0f87d94ae347aad6144a R100 a/b/path3 z/b/path3 -:100644 100644 fdbec444a77953b1bcc899d9fabfa202e5e68f08 fdbec444a77953b1bcc899d9fabfa202e5e68f08 R100 a/c/apath0 z/c/apath0 -:100644 100644 2f1f8d70c0fdad990819dfe37a31deb010805161 2f1f8d70c0fdad990819dfe37a31deb010805161 R100 a/c/apath1 z/c/apath1 -:100644 100644 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 R100 a/c/apath2 z/c/apath2 -EOF +test_expect_success 'setup move of dir with only subdirs' ' + git reset -q --hard && + mkdir a/b && + mv a/path* a/b/ && + git update-index --add --remove a/path0 a/path1 a/path2 a/path3 a/b/path* && + git commit -m "move all toplevel files down one level" && -test_expect_failure \ - 'validate the output for a move with only subdirs.' \ - 'compare_diff_patch expected current.filtered' + git mv a z +' +test_expect_failure 'moving a dir with no files' ' + cat >expected <<-EOF && + :040000 040000 $zeroes $zeroes R100 a/ z/ + :040000 040000 $zeroes $zeroes R100 a/b/ z/b/ + :040000 040000 $zeroes $zeroes R100 a/c/ z/c/ + :100644 100644 $path0_id $path0_id R100 a/b/path0 z/b/path0 + :100644 100644 $path1_id $path1_id R100 a/b/path1 z/b/path1 + :100644 100644 $path2_id $path2_id R100 a/b/path2 z/b/path2 + :100644 100644 $path3_id $path3_id R100 a/b/path3 z/b/path3 + :100644 100644 $path0_id $path0_id R100 a/c/apath0 z/c/apath0 + :100644 100644 $path1_id $path1_id R100 a/c/apath1 z/c/apath1 + :100644 100644 $path2_id $path2_id R100 a/c/apath2 z/c/apath2 + EOF + git diff-index --detect-dir-renames HEAD >current && + grep -v "^\[DBG\] " <current >current.filtered && + compare_diff_patch expected current.filtered +' # now test moving all files from toplevel into subdir (does not hides file moves) (needs consensus on syntax) # Note: this is a special case of move of a dir into one of its own subdirs, which in # turn is a variant of new files/dirs being added into a dir after all its contents # are moved away -git reset -q --hard HEAD~3 +test_expect_success 'setup move from toplevel to subdir' ' + git reset -q --hard HEAD~3 && + mv a/* . && + git update-index --add --remove a/path0 a/path1 a/path2 a/path3 path* && + git commit -m "move all files to toplevel" && -mv a/* . -test_expect_success \ - 'rename the directory with some changes.' \ - 'git update-index --add --remove a/path0 a/path1 a/path2 a/path3 path*' + mkdir z && + mv path* z/ && + git update-index --add --remove path0 path1 path2 path3 z/path* +' -test_expect_success \ - 'commit all this.' \ - 'git commit -m "move all files to toplevel"' - -mkdir z -mv path* z/ -test_expect_success \ - 'rename the directory with some changes.' \ - 'git update-index --add --remove path0 path1 path2 path3 z/path*' - -test_expect_success \ - 'git diff-index --detect-dir-renames everything from toplevel.' \ - 'git diff-index --detect-dir-renames HEAD >current' -grep -v "^\[DBG\] " <current >current.filtered -cat >expected <<\EOF -:040000 040000 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000 R100 ./* z/ -:100644 100644 fdbec444a77953b1bcc899d9fabfa202e5e68f08 fdbec444a77953b1bcc899d9fabfa202e5e68f08 R100 path0 z/path0 -:100644 100644 2f1f8d70c0fdad990819dfe37a31deb010805161 2f1f8d70c0fdad990819dfe37a31deb010805161 R100 path1 z/path1 -:100644 100644 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 R100 path2 z/path2 -:100644 100644 c6971ab9f08a6cd9c89a0f87d94ae347aad6144a c6971ab9f08a6cd9c89a0f87d94ae347aad6144a R100 path3 z/path3 -EOF - -test_expect_failure \ - 'validate the output for a move of everything from toplevel.' \ - 'compare_diff_patch expected current.filtered' +test_expect_failure '--detect-dir-renames everything from toplevel.' ' + cat >expected <<-EOF && + :040000 040000 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000 R100 ./* z/ + :100644 100644 $path0_id $path0_id R100 path0 z/path0 + :100644 100644 $path1_id $path1_id R100 path1 z/path1 + :100644 100644 $path2_id $path2_id R100 path2 z/path2 + :100644 100644 $path3_id $path3_id R100 path3 z/path3 + EOF + git diff-index --detect-dir-renames HEAD >current && + grep -v "^\[DBG\] " <current >current.filtered && + compare_diff_patch expected current.filtered +' test_done -- ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH v4 2/4] Add testcases for the --detect-dir-renames diffcore flag. 2010-10-04 20:32 ` Jonathan Nieder @ 2010-10-04 21:37 ` Yann Dirson 2010-10-04 22:09 ` Jonathan Nieder 2010-10-05 9:21 ` Andreas Ericsson 0 siblings, 2 replies; 25+ messages in thread From: Yann Dirson @ 2010-10-04 21:37 UTC (permalink / raw) To: Jonathan Nieder; +Cc: git, Yann Dirson On Mon, Oct 04, 2010 at 03:32:41PM -0500, Jonathan Nieder wrote: > Yann Dirson wrote: > > > From: Yann Dirson <ydirson@free.fr> > > Subject: Add testcases for the --detect-dir-renames diffcore flag > > Probably better for the test to be squashed with the patch that adds > that option. I tend to agree, but we see quite a lot of patches split this way it seems. Junio, any preference here ? > > --- /dev/null > > +++ b/t/t4046-diff-rename-factorize.sh > > @@ -0,0 +1,326 @@ > [...] > > +test_expect_success \ > > + 'commit the index.' \ > > + 'git update-ref HEAD $(echo "original empty commit" | git commit-tree $(git write-tree))' > > Nit: > > Now test assertions tend to be written as > > test_expect_success '...' ' > command && > command && > ... > ' > > The tabs to indent and opening ' at the end of the first line means > less fuss in lining things up. Making each test assertion include its > setup means tests don't pass if something gets messed up in the setup, > and using multiline test assertions with && means there is no > confusion about what the tests were written to test. > > As you mentioned, this has the unfortunate downside of messing with > syntax highlighting. Maybe the common text editors need a new mode > for git-style test scripts? There used to be a mmm-mode thingie for emacs which made such things easier (multiple major modes with custom delimiters) - but that did not handle quoting issues automagically either :) One solution seen in some test scripts already is to define functions outside of the test_expect* clauses. > Or maybe it would make sense to implement > > test_begin_expecting_success '...' > ... > test_end > > --- it would certainly make quoting easier. Not sure it would be possible without a macro processor. And not sure a macro processor would be welcome by everyone here :) > It might make sense to compute the tree, commit, etc one at a time > instead of this long one-liner. If moved into a function which would make it readable, yes. > To avoid quoting troubles and since these are innocuous commands, it > could be nice to put this before the first test assertion. Or even > better: > > mkdir a && > printf "Line %s\n" 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 >a/path0 && > ... Well, when seeing for the first time such a construct, i tend to wonder how portable it is amont printf implementations. But hey, I'm always thrilled to discover parts of the standard tools I had missed until now :) > [...] > > +test_expect_success \ > > + 'git diff-index --detect-dir-renames after directory move.' \ > > + 'git diff-index --detect-dir-renames HEAD >current' > > +grep -v "^\[DBG\] " <current >current.filtered > > +cat >expected <<\EOF > > +:040000 040000 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000 R100 a/ b/ > > +:100644 100644 fdbec444a77953b1bcc899d9fabfa202e5e68f08 fdbec444a77953b1bcc899d9fabfa202e5e68f08 R100 a/path0 b/path0 > > +:100644 100644 2f1f8d70c0fdad990819dfe37a31deb010805161 2f1f8d70c0fdad990819dfe37a31deb010805161 R100 a/path1 b/path1 > > +:100644 100644 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 dbde7141d737c8aa0003672c1bc21ded48c6c3b9 R100 a/path2 b/path2 > > +:100644 100644 c6971ab9f08a6cd9c89a0f87d94ae347aad6144a c6971ab9f08a6cd9c89a0f87d94ae347aad6144a R100 a/path3 b/path3 > > +EOF > > Nit: although compare_diff_patch ensures the result is not dependent > on the hash function, these hard-coded hashes are still hard for a > human to read. Could they be computed instead? Well, that would just make the test harder to read imho. Using regexps would help for readability, but that would require another compare_diff variant. BTW, I should be using compare_diff_raw not compare_diff_patch here. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v4 2/4] Add testcases for the --detect-dir-renames diffcore flag. 2010-10-04 21:37 ` Yann Dirson @ 2010-10-04 22:09 ` Jonathan Nieder 2010-10-05 9:21 ` Andreas Ericsson 1 sibling, 0 replies; 25+ messages in thread From: Jonathan Nieder @ 2010-10-04 22:09 UTC (permalink / raw) To: Yann Dirson; +Cc: git Yann Dirson wrote: > On Mon, Oct 04, 2010 at 03:32:41PM -0500, Jonathan Nieder wrote: >> It might make sense to compute the tree, commit, etc one at a time >> instead of this long one-liner. > > If moved into a function which would make it readable, yes. Something like commit_index () { test_tick && tree=$(git write-tree) && commit=$( printf "%s\n" "$*" | git commit-tree $tree ) && git update-ref HEAD $commit } ? Maybe the following (similar to what you use in later tests) would be even better, for more verbose output when running with -v. commit_index () { test_tick && git commit -m "$*" } >> printf "Line %s\n" 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 >a/path0 && >> ... > > Well, when seeing for the first time such a construct, i tend to > wonder how portable it is amont printf implementations. Yes, it's portable. >> Nit: although compare_diff_patch ensures the result is not dependent >> on the hash function, these hard-coded hashes are still hard for a >> human to read. Could they be computed instead? > > Well, that would just make the test harder to read imho. Using > regexps would help for readability If you had said "writability" I would agree with you here. And that's an important concern, too. What I was suggesting looks like this: path0_id2=$(git rev-parse :b/path0) && path100_id=$(git rev-parse :b/path100) && cat >expected <<-EOF && :040000 040000 $zeroes $zeroes R100 a/ b/ :100644 000000 $path3_id $zeroes D a/path3 :100644 100644 $path2_id $path2_id R100 a/path2 b/2path :100644 100644 $path0_id $path0_id2 R093 a/path0 b/path0 :100644 100644 $path1_id $path1_id R100 a/path1 b/path1 :000000 100644 $zeroes $path100_id A b/path100 EOF ... Maybe it would be better to do cat >expected <<-\EOF && :040000 040000 X X R100 a/ b/ :100644 000000 X X D a/path3 :100644 100644 X X R100 a/path2 b/2path :100644 100644 X X R093 a/path0 b/path0 :100644 100644 X X R100 a/path1 b/path1 :000000 100644 X X A b/path100 EOF since the hashes are not being checked, anyway. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v4 2/4] Add testcases for the --detect-dir-renames diffcore flag. 2010-10-04 21:37 ` Yann Dirson 2010-10-04 22:09 ` Jonathan Nieder @ 2010-10-05 9:21 ` Andreas Ericsson 1 sibling, 0 replies; 25+ messages in thread From: Andreas Ericsson @ 2010-10-05 9:21 UTC (permalink / raw) To: Yann Dirson; +Cc: Jonathan Nieder, git On 10/04/2010 11:37 PM, Yann Dirson wrote: > On Mon, Oct 04, 2010 at 03:32:41PM -0500, Jonathan Nieder wrote: >> Yann Dirson wrote: >> >>> From: Yann Dirson<ydirson@free.fr> >>> Subject: Add testcases for the --detect-dir-renames diffcore flag >> >> Probably better for the test to be squashed with the patch that adds >> that option. > > I tend to agree, but we see quite a lot of patches split this way it > seems. Junio, any preference here ? > Personally, I prefer adding the test first with "test_expect_failure" and then changing it to test_expect_success in the patch that has the real code changes. That way it's really easy to accept a testcase and still keep altering how the test itself is fixed, which is usually a far more complex and nitpickley process than showing that the bug is there in the first place. I know Junio has voiced the same preference for patchseries before and especially when the solution isn't obvious. Note that I obviously can't speak for him in any particular case. Patches adding features should ofcourse have the code first and the tests after, since the tests in that case are only about preventing future breakage rather than showing that there is one now. -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 Considering the successes of the wars on alcohol, poverty, drugs and terror, I think we should give some serious thought to declaring war on peace. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v4 1/4] Introduce wholesame directory move detection in diffcore. 2010-10-03 20:42 ` [PATCH v4 1/4] Introduce wholesame directory move detection in diffcore Yann Dirson 2010-10-03 20:42 ` [PATCH v4 2/4] Add testcases for the --detect-dir-renames diffcore flag Yann Dirson @ 2010-10-04 2:59 ` Ævar Arnfjörð Bjarmason 2010-10-04 18:19 ` Yann Dirson 2010-10-04 7:28 ` Jonathan Nieder 2010-10-05 1:06 ` Jonathan Nieder 3 siblings, 1 reply; 25+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2010-10-04 2:59 UTC (permalink / raw) To: Yann Dirson; +Cc: git On Sun, Oct 3, 2010 at 20:42, Yann Dirson <ydirson@altern.org> wrote: > This feature tries to group together files moving from and to > identical directories - the most common case being directory renames. You should change the C++/C99 comments to C89 comments since we target the latter. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v4 1/4] Introduce wholesame directory move detection in diffcore. 2010-10-04 2:59 ` [PATCH v4 1/4] Introduce wholesame directory move detection in diffcore Ævar Arnfjörð Bjarmason @ 2010-10-04 18:19 ` Yann Dirson 0 siblings, 0 replies; 25+ messages in thread From: Yann Dirson @ 2010-10-04 18:19 UTC (permalink / raw) To: ?var Arnfj?r? Bjarmason; +Cc: git On Mon, Oct 04, 2010 at 02:59:20AM +0000, ?var Arnfj?r? Bjarmason wrote: > On Sun, Oct 3, 2010 at 20:42, Yann Dirson <ydirson@altern.org> wrote: > > This feature tries to group together files moving from and to > > identical directories - the most common case being directory renames. > > You should change the C++/C99 comments to C89 comments since we target > the latter. Right. Most of them are here as reminders to either fix something or be sure fixing it was considered before submission for inclusion - hopefully using unacceptable syntax helps here :) ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v4 1/4] Introduce wholesame directory move detection in diffcore. 2010-10-03 20:42 ` [PATCH v4 1/4] Introduce wholesame directory move detection in diffcore Yann Dirson 2010-10-03 20:42 ` [PATCH v4 2/4] Add testcases for the --detect-dir-renames diffcore flag Yann Dirson 2010-10-04 2:59 ` [PATCH v4 1/4] Introduce wholesame directory move detection in diffcore Ævar Arnfjörð Bjarmason @ 2010-10-04 7:28 ` Jonathan Nieder 2010-10-04 21:13 ` Yann Dirson 2010-10-05 1:06 ` Jonathan Nieder 3 siblings, 1 reply; 25+ messages in thread From: Jonathan Nieder @ 2010-10-04 7:28 UTC (permalink / raw) To: Yann Dirson; +Cc: git, Nguyễn Thái Ngọc Duy, Baz (+cc: Duy and Baz, who commented before) Yann Dirson wrote: > This feature tries to group together files moving from and to > identical directories - the most common case being directory renames. > > It is activated by the new --detect-dir-renames diffcore > flag. > > This is only the first step, adding the basic functionnality and > adding support to raw diff output (and it breaks unified-diff output > which does not know how to handle that stuff yet). [...] > For now both the result > of "mv a b" and "mv a/* b/" are displayed as "Rnnn a/ b/", which is > probably not what we want. "Rnnn a/* b/" could be a good choice for > the latter if we want them to be distinguished, and even if we want > them to look the same. Example? > Other future developements to be made on top of this include: > * extension of unified-diff format to express this [...] > --- Oh, so this is for diff --raw only. For the confused: the discussion from v3 perhaps explains why. But: 1. Just like --word-diff, this could be used as a user-facing feature for diffs that cannot be applied 2. Even if no one agrees on the proper diff format, it would be good plumbing to have. v3: http://thread.gmane.org/gmane.comp.version-control.git/99780/focus=99782 v2: http://thread.gmane.org/gmane.comp.version-control.git/99529/focus=99528 v1: http://thread.gmane.org/gmane.comp.version-control.git/94612/focus=96807 I kind of like the cover letter to v1: The idea is to add a new pass to diffcore-rename, to group file renames looking like a full directory move, and then to hide those file renames which do not carry any additionnal information. Here is a sample run: $ ./git-diff-tree --abbrev=6 ee491 --factorize-renames -r [DBG] possible rename from arm/ to bar/ [DBG] possible rename from ppc/ to moved/ [DBG] discarding dir rename of arm/, mixing moved/ and bar/ [DBG] ppc/* -> moved/* makes ppc/sha1ppc.S -> moved/sha1ppc.S uninteresting [DBG] ppc/* -> moved/* makes ppc/sha1.c -> moved/sha1.c uninteresting ee491a42190ec6e716f46a55fa0a7f4e307f1629 :040000 040000 000000... 000000... R100 ppc/ moved/ :100644 100644 9e3ae0... 9e3ae0... R100 arm/sha1.c bar/sha1.c :100644 100644 395264... 395264... R100 arm/sha1.h bar/sha1.h :100644 100644 c3c51a... c065ee... R099 ppc/sha1.h moved/sha1.h :100644 100644 8c1cb9... 8c1cb9... R100 arm/sha1_arm.S moved/sha1_arm.S Presumably this patch takes care of the first step (grouping potential full-directory moves) and not the second (hiding the redundant file renames). Naïve review: > --- a/diff-lib.c > +++ b/diff-lib.c > @@ -208,7 +208,8 @@ int run_diff_files(struct rev_info *revs, unsigned int option) > ce_option, &dirty_submodule); > if (!changed && !dirty_submodule) { > ce_mark_uptodate(ce); > - if (!DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER)) > + if (!DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER) && > + !DIFF_OPT_TST(&revs->diffopt, DETECT_DIR_RENAMES)) > continue; Hm, why? > @@ -338,7 +339,8 @@ static int show_modified(struct rev_info *revs, [...] Likewise. [...] > --- a/diffcore-rename.c > +++ b/diffcore-rename.c > @@ -11,6 +11,7 @@ > static struct diff_rename_dst { > struct diff_filespec *two; > struct diff_filepair *pair; > + int i_am_not_single:1; // does not look for a match, only here to be looked at > } *rename_dst; What does single mean? > @@ -49,9 +50,36 @@ static struct diff_rename_dst *locate_rename_dst(struct diff_filespec *two, [...] > > +static struct diff_rename_dst *locate_rename_dst_dir(struct diff_filespec *dir) > +{ > + /* code mostly duplicated from locate_rename_dst - not sure we > + * could merge them efficiently,though > + */ > + int first, last; > + int prefixlength = strlen(dir->path); > + > + first = 0; > + last = rename_dst_nr; > + while (last > first) { > + int next = (last + first) >> 1; > + struct diff_rename_dst *dst = &(rename_dst[next]); > + int cmp = strncmp(dir->path, dst->two->path, prefixlength); > + if (!cmp) > + return dst; > + if (cmp < 0) { > + last = next; > + continue; > + } > + first = next+1; > + } > + /* not found */ > + return NULL; > +} Binary search --- this is just a way to index into the sorted list of rename_dsts, right? At first I thought it was searching for a good rename destination. A comment (or overview in the log message) could help clarify. > @@ -386,8 +414,11 @@ static int find_exact_renames(void) > for (i = 0; i < rename_src_nr; i++) > insert_file_table(&file_table, -1, i, rename_src[i].one); > > - for (i = 0; i < rename_dst_nr; i++) > + for (i = 0; i < rename_dst_nr; i++) { > + if (rename_dst[i].i_am_not_single) > + continue; > insert_file_table(&file_table, 1, i, rename_dst[i].two); > + } What is this code path for? (Sorry for the tedious questions. My thinking is, if I cannot answer them without doing some legwork, how could the person about to break your code who is only glancing for a moment before plunging forward on an exciting new feature?) > @@ -414,6 +445,180 @@ static void record_if_better(struct diff_score m[], struct diff_score *o) > m[worst] = *o; > } > > +struct diff_dir_rename { > + struct diff_filespec *one; > + struct diff_filespec *two; > + int discarded; > + struct diff_dir_rename* next; > +}; Okay. > +// FIXME: prevent possible overflow > +/* > + * Copy dirname of src into dst, suitable to append a filename without > + * an additional "/". > + * Only handles relative paths since there is no absolute path in a git repo. > + * Writes "" when there is no "/" in src. > + * May overwrite more chars than really needed, if src ends with a "/". > + */ > +static const char* copy_dirname(char* dst, const char* src) > +{ > + char* lastslash = strrchr(src, '/'); > + if (!lastslash) { > + *dst = '\0'; > + return dst; > + } > + strncpy(dst, src, lastslash - src + 1); memcpy? > + dst[lastslash - src + 1] = '\0'; > + > + // if src ends with a "/" strip the last component > + if (lastslash[1] == '\0') { > + lastslash = strrchr(dst, '/'); > + if (!lastslash) > + return strcpy(dst, "."); > + lastslash[1] = '\0'; > + } It might be easier to read like this: /* Write name of the parent directory of src to dest. */ static char *remove_last_component(char *dst, const char *src) { size_t len = strlen(src); const char *slash; if (len > 0 && src[len - 1] == '/') /* Trailing slash. Ignore it. */ len--; slash = memrchr(src, '/', len); if (!slash) { *dst = '\0'; return dst; } *mempcpy(dst, src, slash - src) = '\0'; return dst; } Requires the glibc-specific function memrchr(), but that is easy to implement. Compare strchrnul() [in git-compat-util.h]. [...] > +static struct diff_dir_rename* factorization_candidates = NULL; > +static void diffcore_factorize_renames(void) Maybe this could be refactored into multiple functions? > @@ -451,13 +656,22 @@ void diffcore_rename(struct diff_options *options) > p->one->rename_used++; > register_rename_src(p->one, p->score); > } > + else { [...] > + if (DIFF_OPT_TST(options, DETECT_DIR_RENAMES)) { > + /* similarly, rename factorization needs to > + * see all files from second tree, but we don't > + * want them to be matched against single sources. > + */ > + locate_rename_dst(p->two, 1)->i_am_not_single = 1; Huh? > @@ -509,6 +723,8 @@ void diffcore_rename(struct diff_options *options) > > if (rename_dst[i].pair) > continue; /* dealt with exact match already. */ > + if (rename_dst[i].i_am_not_single) > + continue; /* not looking for a match. */ Oh, not single means "not seeking a new relationship". > @@ -569,7 +785,28 @@ void diffcore_rename(struct diff_options *options) > /* At this point, we have found some renames and copies and they > * are recorded in rename_dst. The original list is still in *q. > */ > + > + /* Now possibly factorize those renames and copies. */ > + if (DIFF_OPT_TST(options, DETECT_DIR_RENAMES)) > + diffcore_factorize_renames(); Huh? Factorize? > DIFF_QUEUE_CLEAR(&outq); > + > + // Now turn non-discarded factorization_candidates into real renames > + struct diff_dir_rename* candidate; > + for (candidate=factorization_candidates; candidate; candidate = candidate->next) { > + struct diff_filepair* pair; > + if (candidate->discarded) continue; > + // visualize toplevel dir if needed - FIXME: wrong place for this ? > + if (!*candidate->one->path) > + candidate->one->path = "./"; > + if (!*candidate->two->path) > + candidate->two->path = "./"; > + pair = diff_queue(&outq, candidate->one, candidate->two); > + pair->score = MAX_SCORE; > + pair->renamed_pair = 1; > + } Looks like I'll need to read diffcore_factorize_renames() after all. :( Sorry, not much to say yet. Hopefully some of that can be useful nonetheless. Jonathan ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v4 1/4] Introduce wholesame directory move detection in diffcore. 2010-10-04 7:28 ` Jonathan Nieder @ 2010-10-04 21:13 ` Yann Dirson 0 siblings, 0 replies; 25+ messages in thread From: Yann Dirson @ 2010-10-04 21:13 UTC (permalink / raw) To: Jonathan Nieder; +Cc: git, Nguy???n Th?i Ng???c Duy, Baz On Mon, Oct 04, 2010 at 02:28:50AM -0500, Jonathan Nieder wrote: > > of "mv a b" and "mv a/* b/" are displayed as "Rnnn a/ b/", which is > > probably not what we want. "Rnnn a/* b/" could be a good choice for > > the latter if we want them to be distinguished, and even if we want > > them to look the same. > > Example? It's just that every time we detect a bulk rename (of all files of a directory) it is currently shown as a directory rename, even if they all get moved into an existing dir. Technically, a "directory rename" (which I'd label "mv a b") is a special case of "bulk rename" (for which "mv a/* b/" is more correct), where the target directory is empty. Currently the distinction between the two is not done, but everything is incorrectly advertized as a rename. I shall fix that. > > Other future developements to be made on top of this include: > > * extension of unified-diff format to express this > [...] > > --- > > Oh, so this is for diff --raw only. Right, but that TODO list has not be revised in this iteration. --detect-dir-renames does not need much to gain unified-diff, it is mostly --hide-dir-rename-details which needs thougth and work - but as it is now targetted at humans only, we can start with whatever seems adequate and rewrite/refine as see fit. > For the confused: the discussion from v3 perhaps explains why. Thanks for this work of putting everything into context - I should have done that myself. > $ ./git-diff-tree --abbrev=6 ee491 --factorize-renames -r > [DBG] possible rename from arm/ to bar/ > [DBG] possible rename from ppc/ to moved/ > [DBG] discarding dir rename of arm/, mixing moved/ and bar/ > [DBG] ppc/* -> moved/* makes ppc/sha1ppc.S -> moved/sha1ppc.S uninteresting > [DBG] ppc/* -> moved/* makes ppc/sha1.c -> moved/sha1.c uninteresting > ee491a42190ec6e716f46a55fa0a7f4e307f1629 > :040000 040000 000000... 000000... R100 ppc/ moved/ > :100644 100644 9e3ae0... 9e3ae0... R100 arm/sha1.c bar/sha1.c > :100644 100644 395264... 395264... R100 arm/sha1.h bar/sha1.h > :100644 100644 c3c51a... c065ee... R099 ppc/sha1.h moved/sha1.h > :100644 100644 8c1cb9... 8c1cb9... R100 arm/sha1_arm.S moved/sha1_arm.S > > Presumably this patch takes care of the first step (grouping potential > full-directory moves) and not the second (hiding the redundant file renames). No, it did hide the two moves labeled as "uninteresting" in the debug traces above. The ones left are one which is part of the dif rename but was also modified, and the other ones (from arm/) are part of a directory split (see 3rd debug line). > Naïve review: > > > --- a/diff-lib.c > > +++ b/diff-lib.c > > @@ -208,7 +208,8 @@ int run_diff_files(struct rev_info *revs, unsigned int option) > > ce_option, &dirty_submodule); > > if (!changed && !dirty_submodule) { > > ce_mark_uptodate(ce); > > - if (!DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER)) > > + if (!DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER) && > > + !DIFF_OPT_TST(&revs->diffopt, DETECT_DIR_RENAMES)) > > continue; > > Hm, why? > > @@ -338,7 +339,8 @@ static int show_modified(struct rev_info *revs, > [...] > > Likewise. The code below that (calls to diff_change) was previously only used for FIND_COPIES_HARDER, but we also need to go there for DETECT_DIR_RENAMES. Removing those eg. make test "validate the output for a move without a subdir." fail. I admit I have not re-analyzed this part recently, those tests were already parts of the first iterations. > [...] > > --- a/diffcore-rename.c > > +++ b/diffcore-rename.c > > @@ -11,6 +11,7 @@ > > static struct diff_rename_dst { > > struct diff_filespec *two; > > struct diff_filepair *pair; > > + int i_am_not_single:1; // does not look for a match, only here to be looked at > > } *rename_dst; > > What does single mean? "looking for a match" :) > > @@ -451,13 +656,22 @@ void diffcore_rename(struct diff_options *options) > > p->one->rename_used++; > > register_rename_src(p->one, p->score); > > } > > + else { > [...] > > + if (DIFF_OPT_TST(options, DETECT_DIR_RENAMES)) { > > + /* similarly, rename factorization needs to > > + * see all files from second tree, but we don't > > + * want them to be matched against single sources. > > + */ > > + locate_rename_dst(p->two, 1)->i_am_not_single = 1; > > Huh? > > @@ -386,8 +414,11 @@ static int find_exact_renames(void) > > for (i = 0; i < rename_src_nr; i++) > > insert_file_table(&file_table, -1, i, rename_src[i].one); > > > > - for (i = 0; i < rename_dst_nr; i++) > > + for (i = 0; i < rename_dst_nr; i++) { > > + if (rename_dst[i].i_am_not_single) > > + continue; > > insert_file_table(&file_table, 1, i, rename_dst[i].two); > > + } > > What is this code path for? (Sorry for the tedious questions. My > thinking is, if I cannot answer them without doing some legwork, how > could the person about to break your code who is only glancing for > a moment before plunging forward on an exciting new feature?) Those are definitely related to the above. This mimics the FIND_COPIES_HARDER behaviour (which is adding untouched files from *src* tree to the rename_dst list so the -C algorithm works): here we need to see all files from *dst* tree so we can tell if there was any file left in a dir after a move. This has a cost: we must ensure in other places that codepaths not intended for those extra files are ok. Hm, thinking quickly about it, maybe we could use a separate list to store them, and only run through them when explicitely needed, that would cost less and avoid the polutions you pointed at. I shall have a look at that. > > @@ -49,9 +50,36 @@ static struct diff_rename_dst *locate_rename_dst(struct diff_filespec *two, > [...] > > > > +static struct diff_rename_dst *locate_rename_dst_dir(struct diff_filespec *dir) > > +{ > > + /* code mostly duplicated from locate_rename_dst - not sure we > > + * could merge them efficiently,though > > + */ > > + int first, last; > > + int prefixlength = strlen(dir->path); > > + > > + first = 0; > > + last = rename_dst_nr; > > + while (last > first) { > > + int next = (last + first) >> 1; > > + struct diff_rename_dst *dst = &(rename_dst[next]); > > + int cmp = strncmp(dir->path, dst->two->path, prefixlength); > > + if (!cmp) > > + return dst; > > + if (cmp < 0) { > > + last = next; > > + continue; > > + } > > + first = next+1; > > + } > > + /* not found */ > > + return NULL; > > +} > > Binary search --- this is just a way to index into the sorted list > of rename_dsts, right? right. > At first I thought it was searching for a good rename destination. > A comment (or overview in the log message) could help clarify. Ok. More comments are surely good in this hairy piece of code :) > > +// FIXME: prevent possible overflow > > +/* > > + * Copy dirname of src into dst, suitable to append a filename without > > + * an additional "/". > > + * Only handles relative paths since there is no absolute path in a git repo. > > + * Writes "" when there is no "/" in src. > > + * May overwrite more chars than really needed, if src ends with a "/". > > + */ > > +static const char* copy_dirname(char* dst, const char* src) > > +{ > > + char* lastslash = strrchr(src, '/'); > > + if (!lastslash) { > > + *dst = '\0'; > > + return dst; > > + } > > + strncpy(dst, src, lastslash - src + 1); > > memcpy? right > > + dst[lastslash - src + 1] = '\0'; > > + > > + // if src ends with a "/" strip the last component > > + if (lastslash[1] == '\0') { > > + lastslash = strrchr(dst, '/'); > > + if (!lastslash) > > + return strcpy(dst, "."); > > + lastslash[1] = '\0'; > > + } > > It might be easier to read like this: > > /* Write name of the parent directory of src to dest. */ > static char *remove_last_component(char *dst, const char *src) > { > size_t len = strlen(src); > const char *slash; > > if (len > 0 && src[len - 1] == '/') > /* Trailing slash. Ignore it. */ > len--; > > slash = memrchr(src, '/', len); > if (!slash) { > *dst = '\0'; > return dst; > } > > *mempcpy(dst, src, slash - src) = '\0'; > return dst; > } > > Requires the glibc-specific function memrchr(), but that is easy to > implement. Compare strchrnul() [in git-compat-util.h]. Right, that's much more elegant - and should also be more efficient in the "ends with /" case. > [...] > > +static struct diff_dir_rename* factorization_candidates = NULL; > > +static void diffcore_factorize_renames(void) > > Maybe this could be refactored into multiple functions? > > > @@ -509,6 +723,8 @@ void diffcore_rename(struct diff_options *options) > > > > if (rename_dst[i].pair) > > continue; /* dealt with exact match already. */ > > + if (rename_dst[i].i_am_not_single) > > + continue; /* not looking for a match. */ > > Oh, not single means "not seeking a new relationship". > > > @@ -569,7 +785,28 @@ void diffcore_rename(struct diff_options *options) > > /* At this point, we have found some renames and copies and they > > * are recorded in rename_dst. The original list is still in *q. > > */ > > + > > + /* Now possibly factorize those renames and copies. */ > > + if (DIFF_OPT_TST(options, DETECT_DIR_RENAMES)) > > + diffcore_factorize_renames(); > > Huh? Factorize? "--factorize-renames" was how the single-option was called in the previous series. It has still not turned completely false, but sure using "bulk rename" would be easier to read. > Looks like I'll need to read diffcore_factorize_renames() after all. :( Hopefully that part is better commented :) > Sorry, not much to say yet. Hopefully some of that can be useful > nonetheless. Thanks for this review, that work is always useful. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v4 1/4] Introduce wholesame directory move detection in diffcore. 2010-10-03 20:42 ` [PATCH v4 1/4] Introduce wholesame directory move detection in diffcore Yann Dirson ` (2 preceding siblings ...) 2010-10-04 7:28 ` Jonathan Nieder @ 2010-10-05 1:06 ` Jonathan Nieder 2010-10-06 23:13 ` Yann Dirson 3 siblings, 1 reply; 25+ messages in thread From: Jonathan Nieder @ 2010-10-05 1:06 UTC (permalink / raw) To: Yann Dirson; +Cc: git, Nguyễn Thái Ngọc Duy, Brian Ewins Yann Dirson wrote: > This feature tries to group together files moving from and to > identical directories - the most common case being directory renames. Intro for the uninitiated (e.g., me): 1. Entry point for the existing rename detection. The entry point to diffcore's rename translator is diffcore_rename(), which is used, naturally enough, by diffcore_std() if and only if the detect_rename option is enabled. 2. Semantics of the existing rename detection. When diffcore_std() is called, the caller has already queued up a nice list of files to diff (see Documentation/technical/api-diff.txt). The rename detection pass replaces the queue with a new list of filepairs that reflects the detected renames, guided by the detect_rename, rename_score, and rename_limit options. 3. Implementation of the existing rename detection. I. find rename candidates (all files, if detecting copies; added and deleted files, if not) II. find exact renames and copies III. for each target file not accounted for yet, find the best 4 candidate source files IV. record rename winners, in order from most to least similar pair V. record copy winners, in order from most to least similar pair VI. queue up results 4. Data maintained by the existing rename detection. rename_dst, a list of rename candidates from step I, sorted by name For each rename dest candidate: a filespec and a filepair. filepair::one source filespec filepair::two destination filespec filepair::score similarity score rename_src, a list of rename candidates from step I, sorted by name For each rename source candidate: a filespec and a score filespec::rename_used number of filespecs it's paired against (to distinguish renames from copies) > --- a/diff-lib.c > +++ b/diff-lib.c > @@ -208,7 +208,8 @@ int run_diff_files(struct rev_info *revs, unsigned int option) > ce_option, &dirty_submodule); > if (!changed && !dirty_submodule) { > ce_mark_uptodate(ce); > - if (!DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER)) > + if (!DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER) && > + !DIFF_OPT_TST(&revs->diffopt, DETECT_DIR_RENAMES)) > continue; > } > oldmode = ce->ce_mode; Sane. diff-files and friends need to pass unchanged files to the diffcore lib when directory renames are being considered, just like they would with -C -C. Otherwise, we cannot distinguish between a/1 stays a/2 stays a/3 stays a/4 stays ... a/1000 stays a/1001 -> b/1001 a/1002 -> b/1002 which is clearly not a directory rename, and a/1001 -> b/1001 a/1002 -> b/1002 which sure looks like one. Reference: v0.99~515 (Diff overhaul, adding the other half of copy detection, 2005-05-21). > @@ -338,7 +339,8 @@ static int show_modified(struct rev_info *revs, > > oldmode = old->ce_mode; > if (mode == oldmode && !hashcmp(sha1, old->sha1) && !dirty_submodule && > - !DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER)) > + !DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER) && > + !DIFF_OPT_TST(&revs->diffopt, DETECT_DIR_RENAMES)) > return 0; The diff-index part. Would we also need something like v1.6.4-rc0~45^2 (Avoid "diff-index --cached" optimization under --find-copies-harder, 2009-05-22) in order to grok unchanged subtrees, or is that taken care of some other way? [...] > --- a/diffcore-rename.c > +++ b/diffcore-rename.c > @@ -11,6 +11,7 @@ > static struct diff_rename_dst { > struct diff_filespec *two; > struct diff_filepair *pair; > + int i_am_not_single:1; // does not look for a match, only here to be looked at > } *rename_dst; If we're looking for directory renames but not finding-copies-harder, then unmodified files, while interesting and deserving of dst entries, should not be considered candidates for tracing individual files. Yes? If so, maybe the less colorful unsigned rename_target_candidate:1; would make that clearer (and more importantly, the new kind of dst entry should be mentioned in the commit log message). > @@ -49,9 +50,36 @@ static struct diff_rename_dst *locate_rename_dst(struct diff_filespec *two, > rename_dst[first].two = alloc_filespec(two->path); > fill_filespec(rename_dst[first].two, two->sha1, two->mode); > rename_dst[first].pair = NULL; > + rename_dst[first].i_am_not_single = 0; > return &(rename_dst[first]); > } All new files are rename target candidates. > > +static struct diff_rename_dst *locate_rename_dst_dir(struct diff_filespec *dir) > +{ > + /* code mostly duplicated from locate_rename_dst - not sure we > + * could merge them efficiently,though > + */ > + int first, last; > + int prefixlength = strlen(dir->path); > + > + first = 0; > + last = rename_dst_nr; > + while (last > first) { > + int next = (last + first) >> 1; > + struct diff_rename_dst *dst = &(rename_dst[next]); > + int cmp = strncmp(dir->path, dst->two->path, prefixlength); prefixcmp? > + if (!cmp) > + return dst; > + if (cmp < 0) { > + last = next; > + continue; > + } > + first = next+1; > + } > + /* not found */ > + return NULL; > +} Hmm, so this is like locate_rename_dst(..., 0), except it matches prefixes. Result is a rename_dst entry within that directory (why not spend the time to find the first?). > + > /* Table of rename/copy src files */ > static struct diff_rename_src { > struct diff_filespec *one; > @@ -386,8 +414,11 @@ static int find_exact_renames(void) > for (i = 0; i < rename_src_nr; i++) > insert_file_table(&file_table, -1, i, rename_src[i].one); > > - for (i = 0; i < rename_dst_nr; i++) > + for (i = 0; i < rename_dst_nr; i++) { > + if (rename_dst[i].i_am_not_single) > + continue; > insert_file_table(&file_table, 1, i, rename_dst[i].two); > + } Prevents the new feature from turning on --find-copies-harder in step II. Good. > > /* Find the renames */ > i = for_each_hash(&file_table, find_same_files); > @@ -414,6 +445,180 @@ static void record_if_better(struct diff_score m[], struct diff_score *o) > m[worst] = *o; > } > > +struct diff_dir_rename { > + struct diff_filespec *one; > + struct diff_filespec *two; > + int discarded; > + struct diff_dir_rename* next; > +}; A linked list of renamed directories. What functions maintain it? [...] > +static const char* copy_dirname(char* dst, const char* src) Discussed already. [...] > +static struct diff_dir_rename* factorization_candidates = NULL; > +static void diffcore_factorize_renames(void) > +{ > + char one_parent_path[PATH_MAX], two_parent_path[PATH_MAX]; > + int i; > + > + for (i = 0; i < rename_dst_nr; i++) { For each rename target candidate: ... Would it make sense to give the body of this loop its own function? > + struct diff_dir_rename* seen; > + > + // FIXME: what are those ? > + if (!rename_dst[i].pair) > + continue; Genuine new files are not interesting to us (since they do not provide evidence about directory renames, postive or negative). > + // dummy renames used by copy detection > + if (!strcmp(rename_dst[i].pair->one->path, rename_dst[i].pair->two->path)) > + continue; Whether a file was copied is probably interesting to us, but the corresponding filepair isn't. > + > + copy_dirname(one_parent_path, rename_dst[i].pair->one->path); > + copy_dirname(two_parent_path, rename_dst[i].pair->two->path); > + > + // simple rename with no directory change > + if (!strcmp(one_parent_path, two_parent_path)) > + continue; File renames within a flat directory are not interesting to us. > + > + struct diff_filespec* one_parent = alloc_filespec(one_parent_path); > + fill_filespec(one_parent, null_sha1 /*FIXME*/, S_IFDIR); Initialize a filespec, for use in filepairs in case the containing directories are a source/target pair. (?) > + > + // After this commit, are there any files still under one_parent_path ? > + struct diff_rename_dst* one_leftover = locate_rename_dst_dir(one_parent); > + if (one_leftover) { // FIXME: should only be run if !seen [...] > + } A candidate for a separate function, no? The idea, if I understand it: too many unchanged files can disqualify a rename source. > + > + // already considered ? > + for (seen=factorization_candidates; seen; seen = seen->next) > + if (!strcmp(seen->one->path, one_parent_path)) break; Lookup. A candidate for a separate function, I think. > + if (!seen) { > + // record potential dir rename > + seen = xmalloc(sizeof(*seen)); > + seen->one = one_parent; > + seen->two = alloc_filespec(two_parent_path); > + fill_filespec(seen->two, null_sha1 /*FIXME*/, S_IFDIR); > + seen->discarded = 0; > + seen->next = factorization_candidates; > + factorization_candidates = seen; > + fprintf(stderr, "[DBG] %s -> %s suggests possible rename from %s to %s\n", > + rename_dst[i].pair->one->path, > + rename_dst[i].pair->two->path, > + one_parent_path, two_parent_path); Insertion. Likewise. > + continue; > + } > + if (seen->discarded) > + continue; One disqualification disqualifies a directory completely. > + // check that seen entry matches this rename > + if (strcmp(two_parent_path, seen->two->path)) { > + fprintf(stderr, "[DBG] discarding dir %s from renames (split into %s and %s)\n", > + one_parent_path, two_parent_path, seen->two->path); > + seen->discarded = 1; > + } If a directory split in two, it gets disqualified. Ok. > + > + /* all checks ok, we keep that entry */ > + } > + > + return; > +} > + > void diffcore_rename(struct diff_options *options) > { > int detect_rename = options->detect_rename; > @@ -451,13 +656,22 @@ void diffcore_rename(struct diff_options *options) [...] > if (detect_rename == DIFF_DETECT_COPY) { > /* > * Increment the "rename_used" score by > * one, to indicate ourselves as a user. > */ > p->one->rename_used++; > register_rename_src(p->one, p->score); > } > + if (DIFF_OPT_TST(options, DETECT_DIR_RENAMES)) { > + /* similarly, rename factorization needs to > + * see all files from second tree, but we don't > + * want them to be matched against single sources. > + */ > + locate_rename_dst(p->two, 1)->i_am_not_single = 1; ... except when --find-copies-harder is being used, right? So I suspect there needs to be a DIFF_OPT_TST(options, FIND_COPIES_HARDER) somewhere around here. > @@ -509,6 +723,8 @@ void diffcore_rename(struct diff_options *options) > > if (rename_dst[i].pair) > continue; /* dealt with exact match already. */ > + if (rename_dst[i].i_am_not_single) > + continue; /* not looking for a match. */ > > m = &mx[dst_cnt * NUM_CANDIDATE_PER_DST]; > for (j = 0; j < NUM_CANDIDATE_PER_DST; j++) Prevents the new feature from turning on --find-copies-harder in step III. Good. > @@ -569,7 +785,28 @@ void diffcore_rename(struct diff_options *options) > /* At this point, we have found some renames and copies and they > * are recorded in rename_dst. The original list is still in *q. > */ > + > + /* Now possibly factorize those renames and copies. */ > + if (DIFF_OPT_TST(options, DETECT_DIR_RENAMES)) > + diffcore_factorize_renames(); > + > DIFF_QUEUE_CLEAR(&outq); > + > + // Now turn non-discarded factorization_candidates into real renames > + struct diff_dir_rename* candidate; > + for (candidate=factorization_candidates; candidate; candidate = candidate->next) { > + struct diff_filepair* pair; > + if (candidate->discarded) continue; > + // visualize toplevel dir if needed - FIXME: wrong place for this ? > + if (!*candidate->one->path) > + candidate->one->path = "./"; > + if (!*candidate->two->path) > + candidate->two->path = "./"; > + pair = diff_queue(&outq, candidate->one, candidate->two); > + pair->score = MAX_SCORE; > + pair->renamed_pair = 1; Outputting the discovered directory renames. > + } > + > for (i = 0; i < q->nr; i++) { > struct diff_filepair *p = q->queue[i]; > struct diff_filepair *pair_to_free = NULL; > diff --git a/tree-diff.c b/tree-diff.c > index cd659c6..ca0b84f 100644 > --- a/tree-diff.c > +++ b/tree-diff.c > @@ -49,7 +49,9 @@ static int compare_tree_entry(struct tree_desc *t1, struct tree_desc *t2, const > show_entry(opt, "+", t2, base, baselen); > return 1; > } > - if (!DIFF_OPT_TST(opt, FIND_COPIES_HARDER) && !hashcmp(sha1, sha2) && mode1 == mode2) > + if (!DIFF_OPT_TST(opt, FIND_COPIES_HARDER) && > + !DIFF_OPT_TST(opt, DETECT_DIR_RENAMES) && > + !hashcmp(sha1, sha2) && mode1 == mode2) > return 0; Right, makes sense. Conclusions: - the basic idea looks sane - the main function would benefit a lot from being split up a bit - would be nice to have an overview of the design (especially, a quick description of the heuristics used) for the commit message Looks promising, and a lot simpler than I feared. :) Thanks. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v4 1/4] Introduce wholesame directory move detection in diffcore. 2010-10-05 1:06 ` Jonathan Nieder @ 2010-10-06 23:13 ` Yann Dirson 0 siblings, 0 replies; 25+ messages in thread From: Yann Dirson @ 2010-10-06 23:13 UTC (permalink / raw) To: Jonathan Nieder; +Cc: git On Mon, Oct 04, 2010 at 08:06:20PM -0500, Jonathan Nieder wrote: > > @@ -338,7 +339,8 @@ static int show_modified(struct rev_info *revs, > > > > oldmode = old->ce_mode; > > if (mode == oldmode && !hashcmp(sha1, old->sha1) && !dirty_submodule && > > - !DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER)) > > + !DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER) && > > + !DIFF_OPT_TST(&revs->diffopt, DETECT_DIR_RENAMES)) > > return 0; > > The diff-index part. > > Would we also need something like v1.6.4-rc0~45^2 (Avoid "diff-index > --cached" optimization under --find-copies-harder, 2009-05-22) in > order to grok unchanged subtrees, or is that taken care of some other > way? Note to self: look at that > [...] > > --- a/diffcore-rename.c > > +++ b/diffcore-rename.c > > @@ -11,6 +11,7 @@ > > static struct diff_rename_dst { > > struct diff_filespec *two; > > struct diff_filepair *pair; > > + int i_am_not_single:1; // does not look for a match, only here to be looked at > > } *rename_dst; > > If we're looking for directory renames but not finding-copies-harder, > then unmodified files, while interesting and deserving of dst entries, > should not be considered candidates for tracing individual files. > > Yes? If so, maybe the less colorful > > unsigned rename_target_candidate:1; > > would make that clearer Right - but maybe we can use a separate list for those non-candidates as I suggested elsewhere, to make things even more clear. > (and more importantly, the new kind of dst entry should be mentioned > in the commit log message). Hm, the commit message is a bit long already, isn't that too much detail ? > > +static struct diff_rename_dst *locate_rename_dst_dir(struct diff_filespec *dir) > > +{ > > + /* code mostly duplicated from locate_rename_dst - not sure we > > + * could merge them efficiently,though > > + */ > > + int first, last; > > + int prefixlength = strlen(dir->path); > > + > > + first = 0; > > + last = rename_dst_nr; > > + while (last > first) { > > + int next = (last + first) >> 1; > > + struct diff_rename_dst *dst = &(rename_dst[next]); > > + int cmp = strncmp(dir->path, dst->two->path, prefixlength); > > prefixcmp? Since prefixlength is constant accross the loop, strncmp spares comparing every char with \0 - we only do that once. I guess it's worth keeping. > > + if (!cmp) > > + return dst; > > + if (cmp < 0) { > > + last = next; > > + continue; > > + } > > + first = next+1; > > + } > > + /* not found */ > > + return NULL; > > +} > > Hmm, so this is like locate_rename_dst(..., 0), except it matches > prefixes. Result is a rename_dst entry within that directory (why > not spend the time to find the first?). We just don't need the first. In most cases, finding one entry just invalidates the possiblity of a bulk move. Then when we want all of them, we have to scan the list in both directions - and yes, a clean way to avoid code duplication would be nice for those. > > > > /* Find the renames */ > > i = for_each_hash(&file_table, find_same_files); > > @@ -414,6 +445,180 @@ static void record_if_better(struct diff_score m[], struct diff_score *o) > > m[worst] = *o; > > } > > > > +struct diff_dir_rename { > > + struct diff_filespec *one; > > + struct diff_filespec *two; > > + int discarded; > > + struct diff_dir_rename* next; > > +}; > > A linked list of renamed directories. What functions maintain it? That is just the type decl. For this patch it is only used from diffcore_factorize_renames() and diffcore_rename(), but the --hide-dir-rename-details part needs it for maybe_mark_uninteresting, which explains its position in the source file. > [...] > > +static struct diff_dir_rename* factorization_candidates = NULL; > > +static void diffcore_factorize_renames(void) > > +{ > > + char one_parent_path[PATH_MAX], two_parent_path[PATH_MAX]; > > + int i; > > + > > + for (i = 0; i < rename_dst_nr; i++) { > > For each rename target candidate: ... > > Would it make sense to give the body of this loop its own function? Yes, that makes things more readable. > > + struct diff_dir_rename* seen; > > + > > + // FIXME: what are those ? > > + if (!rename_dst[i].pair) > > + continue; > > Genuine new files are not interesting to us (since they do not > provide evidence about directory renames, postive or negative). Well, maybe except for Junio's idea of using bulk-rename detection to bump rename scores (we may want to look at that afterwards, but I fear it will require heavy surgery). > > + // dummy renames used by copy detection > > + if (!strcmp(rename_dst[i].pair->one->path, rename_dst[i].pair->two->path)) > > + continue; > > Whether a file was copied is probably interesting to us, but the > corresponding filepair isn't. Good point: bulk-rename can already be made to direct move+copy detection to select the correct dst for the move. > > + > > + struct diff_filespec* one_parent = alloc_filespec(one_parent_path); > > + fill_filespec(one_parent, null_sha1 /*FIXME*/, S_IFDIR); > > Initialize a filespec, for use in filepairs in case the containing > directories are a source/target pair. (?) We need a filespec for locate_rename_dst_dir, but it is silly, we can just pass it a path string, and initialize the filespec when really needed. Good catch. > > + > > + // After this commit, are there any files still under one_parent_path ? > > + struct diff_rename_dst* one_leftover = locate_rename_dst_dir(one_parent); > > + if (one_leftover) { // FIXME: should only be run if !seen > [...] > > + } > > A candidate for a separate function, no? Right. > The idea, if I understand it: too many unchanged files can > disqualify a rename source. Not exactly: any file left in a dir disqualifies this dir for bulk-rename. > > + > > + // already considered ? > > + for (seen=factorization_candidates; seen; seen = seen->next) > > + if (!strcmp(seen->one->path, one_parent_path)) break; > > Lookup. A candidate for a separate function, I think. > > > + if (!seen) { > > + // record potential dir rename > > + seen = xmalloc(sizeof(*seen)); > > + seen->one = one_parent; > > + seen->two = alloc_filespec(two_parent_path); > > + fill_filespec(seen->two, null_sha1 /*FIXME*/, S_IFDIR); > > + seen->discarded = 0; > > + seen->next = factorization_candidates; > > + factorization_candidates = seen; > > + fprintf(stderr, "[DBG] %s -> %s suggests possible rename from %s to %s\n", > > + rename_dst[i].pair->one->path, > > + rename_dst[i].pair->two->path, > > + one_parent_path, two_parent_path); > > Insertion. Likewise. Hm, it's only used once, and once the other splits are done the func fits on a screen (at least on my display ;) OTOH, if we decide for a sorted list, to allow for binary search to be more efficient, then yes. > > + > > + /* all checks ok, we keep that entry */ > > + } > > + > > + return; > > +} > > + > > void diffcore_rename(struct diff_options *options) > > { > > int detect_rename = options->detect_rename; > > @@ -451,13 +656,22 @@ void diffcore_rename(struct diff_options *options) > [...] > > if (detect_rename == DIFF_DETECT_COPY) { > > /* > > * Increment the "rename_used" score by > > * one, to indicate ourselves as a user. > > */ > > p->one->rename_used++; > > register_rename_src(p->one, p->score); > > } > > + if (DIFF_OPT_TST(options, DETECT_DIR_RENAMES)) { > > + /* similarly, rename factorization needs to > > + * see all files from second tree, but we don't > > + * want them to be matched against single sources. > > + */ > > + locate_rename_dst(p->two, 1)->i_am_not_single = 1; > > ... except when --find-copies-harder is being used, right? I have not investigated interactions with copy detection yet. > > @@ -569,7 +785,28 @@ void diffcore_rename(struct diff_options *options) > > /* At this point, we have found some renames and copies and they > > * are recorded in rename_dst. The original list is still in *q. > > */ > > + > > + /* Now possibly factorize those renames and copies. */ > > + if (DIFF_OPT_TST(options, DETECT_DIR_RENAMES)) > > + diffcore_factorize_renames(); > > + > > DIFF_QUEUE_CLEAR(&outq); > > + > > + // Now turn non-discarded factorization_candidates into real renames > > + struct diff_dir_rename* candidate; > > + for (candidate=factorization_candidates; candidate; candidate = candidate->next) { > > + struct diff_filepair* pair; > > + if (candidate->discarded) continue; > > + // visualize toplevel dir if needed - FIXME: wrong place for this ? > > + if (!*candidate->one->path) > > + candidate->one->path = "./"; > > + if (!*candidate->two->path) > > + candidate->two->path = "./"; > > + pair = diff_queue(&outq, candidate->one, candidate->two); > > + pair->score = MAX_SCORE; > > + pair->renamed_pair = 1; > > Outputting the discovered directory renames. Not really outputting, just queueing for ouput here. > Conclusions: > > - the basic idea looks sane > - the main function would benefit a lot from being split up a bit > - would be nice to have an overview of the design (especially, a > quick description of the heuristics used) for the commit message Thanks for this detailed review. Pushing to http://repo.or.cz/w/git/ydirson.git a set of patches addressing those remarks and others from this thread. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC PATCH v4 0/4] Detection of directory renames 2010-10-03 20:42 [RFC PATCH v4 0/4] Detection of directory renames Yann Dirson 2010-10-03 20:42 ` [PATCH v4 1/4] Introduce wholesame directory move detection in diffcore Yann Dirson @ 2010-10-04 6:20 ` Jonathan Nieder 2010-10-05 1:42 ` Jonathan Nieder 2 siblings, 0 replies; 25+ messages in thread From: Jonathan Nieder @ 2010-10-04 6:20 UTC (permalink / raw) To: Yann Dirson; +Cc: git Yann Dirson wrote: > At last a reroll of that old series. v3: http://thread.gmane.org/gmane.comp.version-control.git/99780 v2: http://thread.gmane.org/gmane.comp.version-control.git/99529 v1: http://thread.gmane.org/gmane.comp.version-control.git/94612 In particular, from the cover letter to v1: | I often found myself lost when looking at a diff where a couple of | large dirs where renamed, and a handful of files were modified to take | the rename into account - not a rare situation, I'd say. In such a | case, the diffs themselves are mostly hidden among numerous rename | entries. | | To that, I felt that git ought to know better, and could easily | present a directory rename as such. Yes! > I hope we can turn the simple detection feature to something > acceptable for inclusion - that is, once unified diff output shows the > detected renames as annotations, and once the known major holes are > plugged (both from FIXME and testcases). Okay, I look forward to learning the output format. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC PATCH v4 0/4] Detection of directory renames 2010-10-03 20:42 [RFC PATCH v4 0/4] Detection of directory renames Yann Dirson 2010-10-03 20:42 ` [PATCH v4 1/4] Introduce wholesame directory move detection in diffcore Yann Dirson 2010-10-04 6:20 ` [RFC PATCH v4 0/4] Detection of directory renames Jonathan Nieder @ 2010-10-05 1:42 ` Jonathan Nieder 2010-10-06 23:17 ` Yann Dirson 2 siblings, 1 reply; 25+ messages in thread From: Jonathan Nieder @ 2010-10-05 1:42 UTC (permalink / raw) To: Yann Dirson; +Cc: git, Nguyễn Thái Ngọc Duy, Brian Ewins Yann Dirson wrote: > The most prominent change is a > split of the old --factorize-renames flag into --detect-dir-renames > and --hide-dir-rename-details - hopefully following Junio's comments > dated nearly 2 years ago. I suspect that makes the patches more readable, too. > I hope we can turn the simple detection feature to something > acceptable for inclusion - that is, once unified diff output shows the > detected renames as annotations, and once the known major holes are > plugged (both from FIXME and testcases). >From my point of view, I think it would be best to start with the smallest usable piece, which is the raw format. It probably makes the most sense to error out when -u and --detect-dir-renames are used together. Then unified diff could be reenabled in a separate patch series on top of this one. Another nice feature might be to let the directory-move detection feed back into file-move detection to make it more accurate. Have you thought about this? Would it be feasible, and if so, would it be useful? ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC PATCH v4 0/4] Detection of directory renames 2010-10-05 1:42 ` Jonathan Nieder @ 2010-10-06 23:17 ` Yann Dirson 0 siblings, 0 replies; 25+ messages in thread From: Yann Dirson @ 2010-10-06 23:17 UTC (permalink / raw) To: Jonathan Nieder; +Cc: git On Mon, Oct 04, 2010 at 08:42:08PM -0500, Jonathan Nieder wrote: > >From my point of view, I think it would be best to start with > the smallest usable piece, which is the raw format. It probably > makes the most sense to error out when -u and --detect-dir-renames are > used together. Then unified diff could be reenabled in a separate > patch series on top of this one. Well, there's probably not much to do - in the same order of work than adding the check to error out :) > Another nice feature might be to let the directory-move > detection feed back into file-move detection to make it more > accurate. Have you thought about this? Would it be feasible, > and if so, would it be useful? That was suggested by Junio at that time. It would surely be useful in some cases, but it's not clear to me how frequently it would. And I'd suspect that quite some design changes would be necessary: that would be a 2-way influence, in that a nearly-complete bulkmove could bump the score of the latest rename that would make it a full bulkmove. Probably not trivial to do right. ^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2010-10-06 23:07 UTC | newest] Thread overview: 25+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-10-03 20:42 [RFC PATCH v4 0/4] Detection of directory renames Yann Dirson 2010-10-03 20:42 ` [PATCH v4 1/4] Introduce wholesame directory move detection in diffcore Yann Dirson 2010-10-03 20:42 ` [PATCH v4 2/4] Add testcases for the --detect-dir-renames diffcore flag Yann Dirson 2010-10-03 20:42 ` [PATCH v4 3/4] Allow hiding renames of individual files involved in a directory rename Yann Dirson 2010-10-03 20:42 ` [PATCH v4 4/4] Add testcases for the --hide-dir-rename-details diffcore flag Yann Dirson 2010-10-03 23:04 ` Sverre Rabbelier 2010-10-03 23:06 ` [PATCH v4 3/4] Allow hiding renames of individual files involved in a directory rename Sverre Rabbelier 2010-10-03 23:28 ` Junio C Hamano 2010-10-04 6:43 ` Sverre Rabbelier 2010-10-04 18:21 ` Yann Dirson 2010-10-04 3:03 ` [PATCH v4 2/4] Add testcases for the --detect-dir-renames diffcore flag Ævar Arnfjörð Bjarmason 2010-10-04 18:32 ` Yann Dirson 2010-10-04 20:32 ` Jonathan Nieder 2010-10-04 21:37 ` Yann Dirson 2010-10-04 22:09 ` Jonathan Nieder 2010-10-05 9:21 ` Andreas Ericsson 2010-10-04 2:59 ` [PATCH v4 1/4] Introduce wholesame directory move detection in diffcore Ævar Arnfjörð Bjarmason 2010-10-04 18:19 ` Yann Dirson 2010-10-04 7:28 ` Jonathan Nieder 2010-10-04 21:13 ` Yann Dirson 2010-10-05 1:06 ` Jonathan Nieder 2010-10-06 23:13 ` Yann Dirson 2010-10-04 6:20 ` [RFC PATCH v4 0/4] Detection of directory renames Jonathan Nieder 2010-10-05 1:42 ` Jonathan Nieder 2010-10-06 23:17 ` Yann Dirson
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).