From: Jonathan Nieder <jrnieder@gmail.com>
To: Yann Dirson <ydirson@altern.org>
Cc: git@vger.kernel.org, "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
Baz <brian.ewins@gmail.com>
Subject: Re: [PATCH v4 1/4] Introduce wholesame directory move detection in diffcore.
Date: Mon, 4 Oct 2010 02:28:50 -0500 [thread overview]
Message-ID: <20101004072850.GJ24884@burratino> (raw)
In-Reply-To: <1286138529-6780-2-git-send-email-ydirson@altern.org>
(+cc: Duy and Baz, who commented before)
Yann Dirson wrote:
> This feature tries to group together files moving from and to
> identical directories - the most common case being directory renames.
>
> It is activated by the new --detect-dir-renames diffcore
> flag.
>
> This is only the first step, adding the basic functionnality and
> adding support to raw diff output (and it breaks unified-diff output
> which does not know how to handle that stuff yet).
[...]
> For now both the result
> of "mv a b" and "mv a/* b/" are displayed as "Rnnn a/ b/", which is
> probably not what we want. "Rnnn a/* b/" could be a good choice for
> the latter if we want them to be distinguished, and even if we want
> them to look the same.
Example?
> Other future developements to be made on top of this include:
> * extension of unified-diff format to express this
[...]
> ---
Oh, so this is for diff --raw only.
For the confused: the discussion from v3 perhaps explains why. But:
1. Just like --word-diff, this could be used as a user-facing feature
for diffs that cannot be applied
2. Even if no one agrees on the proper diff format, it would be good
plumbing to have.
v3: http://thread.gmane.org/gmane.comp.version-control.git/99780/focus=99782
v2: http://thread.gmane.org/gmane.comp.version-control.git/99529/focus=99528
v1: http://thread.gmane.org/gmane.comp.version-control.git/94612/focus=96807
I kind of like the cover letter to v1:
The idea is to add a new pass to diffcore-rename, to group file renames
looking like a full directory move, and then to hide those file
renames which do not carry any additionnal information.
Here is a sample run:
$ ./git-diff-tree --abbrev=6 ee491 --factorize-renames -r
[DBG] possible rename from arm/ to bar/
[DBG] possible rename from ppc/ to moved/
[DBG] discarding dir rename of arm/, mixing moved/ and bar/
[DBG] ppc/* -> moved/* makes ppc/sha1ppc.S -> moved/sha1ppc.S uninteresting
[DBG] ppc/* -> moved/* makes ppc/sha1.c -> moved/sha1.c uninteresting
ee491a42190ec6e716f46a55fa0a7f4e307f1629
:040000 040000 000000... 000000... R100 ppc/ moved/
:100644 100644 9e3ae0... 9e3ae0... R100 arm/sha1.c bar/sha1.c
:100644 100644 395264... 395264... R100 arm/sha1.h bar/sha1.h
:100644 100644 c3c51a... c065ee... R099 ppc/sha1.h moved/sha1.h
:100644 100644 8c1cb9... 8c1cb9... R100 arm/sha1_arm.S moved/sha1_arm.S
Presumably this patch takes care of the first step (grouping potential
full-directory moves) and not the second (hiding the redundant file renames).
Naïve review:
> --- a/diff-lib.c
> +++ b/diff-lib.c
> @@ -208,7 +208,8 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
> ce_option, &dirty_submodule);
> if (!changed && !dirty_submodule) {
> ce_mark_uptodate(ce);
> - if (!DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER))
> + if (!DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER) &&
> + !DIFF_OPT_TST(&revs->diffopt, DETECT_DIR_RENAMES))
> continue;
Hm, why?
> @@ -338,7 +339,8 @@ static int show_modified(struct rev_info *revs,
[...]
Likewise.
[...]
> --- a/diffcore-rename.c
> +++ b/diffcore-rename.c
> @@ -11,6 +11,7 @@
> static struct diff_rename_dst {
> struct diff_filespec *two;
> struct diff_filepair *pair;
> + int i_am_not_single:1; // does not look for a match, only here to be looked at
> } *rename_dst;
What does single mean?
> @@ -49,9 +50,36 @@ static struct diff_rename_dst *locate_rename_dst(struct diff_filespec *two,
[...]
>
> +static struct diff_rename_dst *locate_rename_dst_dir(struct diff_filespec *dir)
> +{
> + /* code mostly duplicated from locate_rename_dst - not sure we
> + * could merge them efficiently,though
> + */
> + int first, last;
> + int prefixlength = strlen(dir->path);
> +
> + first = 0;
> + last = rename_dst_nr;
> + while (last > first) {
> + int next = (last + first) >> 1;
> + struct diff_rename_dst *dst = &(rename_dst[next]);
> + int cmp = strncmp(dir->path, dst->two->path, prefixlength);
> + if (!cmp)
> + return dst;
> + if (cmp < 0) {
> + last = next;
> + continue;
> + }
> + first = next+1;
> + }
> + /* not found */
> + return NULL;
> +}
Binary search --- this is just a way to index into the sorted list
of rename_dsts, right?
At first I thought it was searching for a good rename destination.
A comment (or overview in the log message) could help clarify.
> @@ -386,8 +414,11 @@ static int find_exact_renames(void)
> for (i = 0; i < rename_src_nr; i++)
> insert_file_table(&file_table, -1, i, rename_src[i].one);
>
> - for (i = 0; i < rename_dst_nr; i++)
> + for (i = 0; i < rename_dst_nr; i++) {
> + if (rename_dst[i].i_am_not_single)
> + continue;
> insert_file_table(&file_table, 1, i, rename_dst[i].two);
> + }
What is this code path for? (Sorry for the tedious questions. My
thinking is, if I cannot answer them without doing some legwork, how
could the person about to break your code who is only glancing for
a moment before plunging forward on an exciting new feature?)
> @@ -414,6 +445,180 @@ static void record_if_better(struct diff_score m[], struct diff_score *o)
> m[worst] = *o;
> }
>
> +struct diff_dir_rename {
> + struct diff_filespec *one;
> + struct diff_filespec *two;
> + int discarded;
> + struct diff_dir_rename* next;
> +};
Okay.
> +// FIXME: prevent possible overflow
> +/*
> + * Copy dirname of src into dst, suitable to append a filename without
> + * an additional "/".
> + * Only handles relative paths since there is no absolute path in a git repo.
> + * Writes "" when there is no "/" in src.
> + * May overwrite more chars than really needed, if src ends with a "/".
> + */
> +static const char* copy_dirname(char* dst, const char* src)
> +{
> + char* lastslash = strrchr(src, '/');
> + if (!lastslash) {
> + *dst = '\0';
> + return dst;
> + }
> + strncpy(dst, src, lastslash - src + 1);
memcpy?
> + dst[lastslash - src + 1] = '\0';
> +
> + // if src ends with a "/" strip the last component
> + if (lastslash[1] == '\0') {
> + lastslash = strrchr(dst, '/');
> + if (!lastslash)
> + return strcpy(dst, ".");
> + lastslash[1] = '\0';
> + }
It might be easier to read like this:
/* Write name of the parent directory of src to dest. */
static char *remove_last_component(char *dst, const char *src)
{
size_t len = strlen(src);
const char *slash;
if (len > 0 && src[len - 1] == '/')
/* Trailing slash. Ignore it. */
len--;
slash = memrchr(src, '/', len);
if (!slash) {
*dst = '\0';
return dst;
}
*mempcpy(dst, src, slash - src) = '\0';
return dst;
}
Requires the glibc-specific function memrchr(), but that is easy to
implement. Compare strchrnul() [in git-compat-util.h].
[...]
> +static struct diff_dir_rename* factorization_candidates = NULL;
> +static void diffcore_factorize_renames(void)
Maybe this could be refactored into multiple functions?
> @@ -451,13 +656,22 @@ void diffcore_rename(struct diff_options *options)
> p->one->rename_used++;
> register_rename_src(p->one, p->score);
> }
> + else {
[...]
> + if (DIFF_OPT_TST(options, DETECT_DIR_RENAMES)) {
> + /* similarly, rename factorization needs to
> + * see all files from second tree, but we don't
> + * want them to be matched against single sources.
> + */
> + locate_rename_dst(p->two, 1)->i_am_not_single = 1;
Huh?
> @@ -509,6 +723,8 @@ void diffcore_rename(struct diff_options *options)
>
> if (rename_dst[i].pair)
> continue; /* dealt with exact match already. */
> + if (rename_dst[i].i_am_not_single)
> + continue; /* not looking for a match. */
Oh, not single means "not seeking a new relationship".
> @@ -569,7 +785,28 @@ void diffcore_rename(struct diff_options *options)
> /* At this point, we have found some renames and copies and they
> * are recorded in rename_dst. The original list is still in *q.
> */
> +
> + /* Now possibly factorize those renames and copies. */
> + if (DIFF_OPT_TST(options, DETECT_DIR_RENAMES))
> + diffcore_factorize_renames();
Huh? Factorize?
> DIFF_QUEUE_CLEAR(&outq);
> +
> + // Now turn non-discarded factorization_candidates into real renames
> + struct diff_dir_rename* candidate;
> + for (candidate=factorization_candidates; candidate; candidate = candidate->next) {
> + struct diff_filepair* pair;
> + if (candidate->discarded) continue;
> + // visualize toplevel dir if needed - FIXME: wrong place for this ?
> + if (!*candidate->one->path)
> + candidate->one->path = "./";
> + if (!*candidate->two->path)
> + candidate->two->path = "./";
> + pair = diff_queue(&outq, candidate->one, candidate->two);
> + pair->score = MAX_SCORE;
> + pair->renamed_pair = 1;
> + }
Looks like I'll need to read diffcore_factorize_renames() after all. :(
Sorry, not much to say yet. Hopefully some of that can be useful
nonetheless.
Jonathan
next prev parent reply other threads:[~2010-10-04 7:32 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-10-03 20:42 [RFC PATCH v4 0/4] Detection of directory renames Yann Dirson
2010-10-03 20:42 ` [PATCH v4 1/4] Introduce wholesame directory move detection in diffcore Yann Dirson
2010-10-03 20:42 ` [PATCH v4 2/4] Add testcases for the --detect-dir-renames diffcore flag Yann Dirson
2010-10-03 20:42 ` [PATCH v4 3/4] Allow hiding renames of individual files involved in a directory rename Yann Dirson
2010-10-03 20:42 ` [PATCH v4 4/4] Add testcases for the --hide-dir-rename-details diffcore flag Yann Dirson
2010-10-03 23:04 ` Sverre Rabbelier
2010-10-03 23:06 ` [PATCH v4 3/4] Allow hiding renames of individual files involved in a directory rename Sverre Rabbelier
2010-10-03 23:28 ` Junio C Hamano
2010-10-04 6:43 ` Sverre Rabbelier
2010-10-04 18:21 ` Yann Dirson
2010-10-04 3:03 ` [PATCH v4 2/4] Add testcases for the --detect-dir-renames diffcore flag Ævar Arnfjörð Bjarmason
2010-10-04 18:32 ` Yann Dirson
2010-10-04 20:32 ` Jonathan Nieder
2010-10-04 21:37 ` Yann Dirson
2010-10-04 22:09 ` Jonathan Nieder
2010-10-05 9:21 ` Andreas Ericsson
2010-10-04 2:59 ` [PATCH v4 1/4] Introduce wholesame directory move detection in diffcore Ævar Arnfjörð Bjarmason
2010-10-04 18:19 ` Yann Dirson
2010-10-04 7:28 ` Jonathan Nieder [this message]
2010-10-04 21:13 ` Yann Dirson
2010-10-05 1:06 ` Jonathan Nieder
2010-10-06 23:13 ` Yann Dirson
2010-10-04 6:20 ` [RFC PATCH v4 0/4] Detection of directory renames Jonathan Nieder
2010-10-05 1:42 ` Jonathan Nieder
2010-10-06 23:17 ` Yann Dirson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20101004072850.GJ24884@burratino \
--to=jrnieder@gmail.com \
--cc=brian.ewins@gmail.com \
--cc=git@vger.kernel.org \
--cc=pclouds@gmail.com \
--cc=ydirson@altern.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox