* [PATCH 3/3] Diff overhaul, adding the other half of copy detection.
From: Junio C Hamano @ 2005-05-21 9:42 UTC
To: Linus Torvalds; +Cc: git
In-Reply-To: <7vzmuokjhg.fsf@assigned-by-dhcp.cox.net>
This patch extends diff-cache and diff-files to report the
unmodified files to diff-core as well when -C (copy detection)
is in effect, so that the unmodified files can also be used as
the source candidates. The existing test t4003 has been
extended to cover this case.
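The gating that both hunks below introduce can be read as a single
predicate. The following is only a minimal sketch, not code from the
patch: the enum names and should_report() are hypothetical, while the
values 1 (for -M) and 2 (for -C) match how the callers set
detect_rename.

```c
#include <assert.h>

/* Hypothetical names; the patch itself uses the bare integers 0, 1, 2. */
enum { DETECT_NONE = 0, DETECT_RENAME = 1, DETECT_COPY = 2 };

/*
 * Return 1 when an entry should be handed to diff-core.
 * Changed entries are always reported; unchanged entries are
 * reported only when copy detection is on, so that they can
 * serve as copy source candidates.
 */
static int should_report(int changed, int detect_rename)
{
	assert(detect_rename >= DETECT_NONE && detect_rename <= DETECT_COPY);
	return changed || detect_rename >= DETECT_COPY;
}
```

This is the logical negation of the "skip" conditions in the hunks
(!changed && detect_rename < 2), written in the positive form.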
Signed-off-by: Junio C Hamano <junkio@cox.net>
---
diff-cache.c | 3 ++-
diff-files.c | 2 +-
t/t4003-diff-rename-1.sh | 38 +++++++++++++++++++++++++++++++++++++-
3 files changed, 40 insertions(+), 3 deletions(-)
diff --git a/diff-cache.c b/diff-cache.c
--- a/diff-cache.c
+++ b/diff-cache.c
@@ -71,7 +71,8 @@ static int show_modified(struct cache_en
}
oldmode = old->ce_mode;
- if (mode == oldmode && !memcmp(sha1, old->sha1, 20))
+ if (mode == oldmode && !memcmp(sha1, old->sha1, 20) &&
+ detect_rename < 2)
return 0;
mode = ntohl(mode);
diff --git a/diff-files.c b/diff-files.c
--- a/diff-files.c
+++ b/diff-files.c
@@ -126,7 +126,7 @@ int main(int argc, char **argv)
continue;
}
changed = ce_match_stat(ce, &st);
- if (!changed)
+ if (!changed && detect_rename < 2)
continue;
oldmode = ntohl(ce->ce_mode);
diff --git a/t/t4003-diff-rename-1.sh b/t/t4003-diff-rename-1.sh
--- a/t/t4003-diff-rename-1.sh
+++ b/t/t4003-diff-rename-1.sh
@@ -22,6 +22,10 @@ test_expect_success \
rm -f COPYING &&
git-update-cache --add --remove COPYING COPYING.?'
+# tree has COPYING. work tree has COPYING.1 and COPYING.2,
+# both are slightly edited. So we say you copy-and-edit one,
+# and rename-and-edit the other.
+
GIT_DIFF_OPTS=-u0 git-diff-cache -M $tree |
sed -e 's/\([0-9][0-9]*\)/#/g' >current &&
cat >expected <<\EOF
@@ -58,7 +62,11 @@ test_expect_success \
test_expect_success \
'prepare work tree again' \
'mv COPYING.2 COPYING &&
- git-update-cache --add --remove COPYING COPYING.1'
+ git-update-cache --add --remove COPYING COPYING.1 COPYING.2'
+
+# tree has COPYING. work tree has COPYING and COPYING.1,
+# both are slightly edited. So we say you edited one,
+# and copy-and-edit the other.
GIT_DIFF_OPTS=-u0 git-diff-cache -C $tree |
sed -e 's/\([0-9][0-9]*\)/#/g' >current
@@ -90,4 +98,32 @@ test_expect_success \
'validate output from rename/copy detection' \
'diff -u current expected'
+test_expect_success \
+ 'prepare work tree once again' \
+ 'cat ../../COPYING >COPYING &&
+ git-update-cache --add --remove COPYING COPYING.1'
+
+# tree has COPYING. work tree has the same COPYING and COPYING.1,
+# but COPYING is not edited. We say you copy-and-edit COPYING.1;
+# this is only possible because -C mode now reports the unmodified
+# file to the diff-core.
+
+GIT_DIFF_OPTS=-u0 git-diff-cache -C $tree |
+sed -e 's/\([0-9][0-9]*\)/#/g' >current
+cat >expected <<\EOF
+diff --git a/COPYING b/COPYING.#
+similarity index #%
+copy from COPYING
+copy to COPYING.#
+--- a/COPYING
++++ b/COPYING.#
+@@ -# +# @@
+- HOWEVER, in order to allow a migration to GPLv# if that seems like
++ However, in order to allow a migration to GPLv# if that seems like
+EOF
+
+test_expect_success \
+ 'validate output from rename/copy detection' \
+ 'diff -u current expected'
+
test_done
------------------------------------------------
* [PATCH 2/3] Introducing software archaeologist's tool "pickaxe".
From: Junio C Hamano @ 2005-05-21 9:40 UTC
To: Linus Torvalds; +Cc: git
In-Reply-To: <7vzmuokjhg.fsf@assigned-by-dhcp.cox.net>
This steals the "pickaxe" feature from JIT and makes it available
to the bare Plumbing layer. From the command line, the user
gives a string he is interested in.
Using the diff-core infrastructure previously introduced, it
filters the differences to limit the output only to the diffs
between <src> and <dst> where the string appears only in one but
not in the other. For example:
$ ./git-rev-list HEAD | ./git-diff-tree -Sdiff-tree-helper --stdin -M
would show the diffs that touch the string "diff-tree-helper".
In a real software-archaeologist application, you would typically
look for a few to several lines of code and see where that code
came from.
The "pickaxe" module runs after "rename/copy detection" module,
so it even crosses the file rename boundary, as the above
example demonstrates.
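The filtering rule described above (keep a pair only when the string
appears in exactly one side) can be sketched as follows. This is an
illustration under assumptions, not the code in diffcore-pickaxe.c:
struct blob, blob_contains() and pickaxe_keeps() are hypothetical
stand-ins, and a missing side (created or deleted file) is modelled as
a NULL data pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one side of a diff; not struct diff_filespec. */
struct blob {
	const char *data;	/* NULL when this side does not exist */
	size_t size;		/* number of valid bytes in data */
};

/* Return 1 if needle occurs somewhere in the first size bytes of data. */
static int blob_contains(const struct blob *b, const char *needle)
{
	size_t len, offset;

	assert(needle != NULL);
	len = strlen(needle);
	if (!b->data || len > b->size)
		return 0;
	for (offset = 0; offset + len <= b->size; offset++)
		if (!memcmp(needle, b->data + offset, len))
			return 1;
	return 0;
}

/* Keep the pair only when needle appears in exactly one of the two sides. */
static int pickaxe_keeps(const struct blob *one, const struct blob *two,
			 const char *needle)
{
	return blob_contains(one, needle) != blob_contains(two, needle);
}
```

Because a missing side never contains the string, the single
inequality covers creation, deletion and modification alike, and a
pair with both sides missing is dropped.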
Signed-off-by: Junio C Hamano <junkio@cox.net>
---
Documentation/git-diff-cache.txt | 6 +++-
Documentation/git-diff-files.txt | 6 +++-
Documentation/git-diff-helper.txt | 6 +++-
Documentation/git-diff-tree.txt | 5 ++-
Makefile | 3 +-
diff-cache.c | 11 +++++--
diff-files.c | 9 ++++--
diff-helper.c | 10 ++++--
diff-tree.c | 15 ++++++----
diff.c | 23 +++++++++------
diff.h | 1
diffcore-pickaxe.c | 56 ++++++++++++++++++++++++++++++++++++++
diffcore-rename.c | 29 +++++++------------
diffcore.h | 11 ++++---
14 files changed, 140 insertions(+), 51 deletions(-)
new file (100644): diffcore-pickaxe.c
diff --git a/Documentation/git-diff-cache.txt b/Documentation/git-diff-cache.txt
--- a/Documentation/git-diff-cache.txt
+++ b/Documentation/git-diff-cache.txt
@@ -9,7 +9,7 @@ git-diff-cache - Compares content and mo
SYNOPSIS
--------
-'git-diff-cache' [-p] [-r] [-z] [-m] [-M] [-R] [-C] [--cached] <tree-ish>
+'git-diff-cache' [-p] [-r] [-z] [-m] [-M] [-R] [-C] [-S<string>] [--cached] <tree-ish>
DESCRIPTION
-----------
@@ -39,6 +39,10 @@ OPTIONS
-C::
Detect copies as well as renames; implies -p.
+-S<string>::
+ Look for differences that contain the change in <string>.
+
+
-R::
Output diff in reverse.
diff --git a/Documentation/git-diff-files.txt b/Documentation/git-diff-files.txt
--- a/Documentation/git-diff-files.txt
+++ b/Documentation/git-diff-files.txt
@@ -9,7 +9,7 @@ git-diff-files - Compares files in the w
SYNOPSIS
--------
-'git-diff-files' [-p] [-q] [-r] [-z] [-M] [-C] [-R] [<pattern>...]
+'git-diff-files' [-p] [-q] [-r] [-z] [-M] [-C] [-R] [-S<string>] [<pattern>...]
DESCRIPTION
-----------
@@ -35,6 +35,10 @@ OPTIONS
-C::
Detect copies as well as renames; implies -p.
+-S<string>::
+ Look for differences that contain the change in <string>.
+
+
-r::
This flag does not mean anything. It is there only to match
git-diff-tree. Unlike git-diff-tree, git-diff-files always looks
diff --git a/Documentation/git-diff-helper.txt b/Documentation/git-diff-helper.txt
--- a/Documentation/git-diff-helper.txt
+++ b/Documentation/git-diff-helper.txt
@@ -9,7 +9,7 @@ git-diff-helper - Generates patch format
SYNOPSIS
--------
-'git-diff-helper' [-z] [-R] [-M] [-C]
+'git-diff-helper' [-z] [-R] [-M] [-C] [-S<string>]
DESCRIPTION
-----------
@@ -37,6 +37,10 @@ OPTIONS
-C::
Detect copies as well as renames.
+-S<string>::
+ Look for differences that contain the change in <string>.
+
+
See Also
--------
The section on generating patches in link:git-diff-cache.html[git-diff-cache]
diff --git a/Documentation/git-diff-tree.txt b/Documentation/git-diff-tree.txt
--- a/Documentation/git-diff-tree.txt
+++ b/Documentation/git-diff-tree.txt
@@ -9,7 +9,7 @@ git-diff-tree - Compares the content and
SYNOPSIS
--------
-'git-diff-tree' [-p] [-r] [-z] [--stdin] [-M] [-R] [-C] [-m] [-s] [-v] <tree-ish> <tree-ish> [<pattern>]\*
+'git-diff-tree' [-p] [-r] [-z] [--stdin] [-M] [-R] [-C] [-S<string>] [-m] [-s] [-v] <tree-ish> <tree-ish> [<pattern>]\*
DESCRIPTION
-----------
@@ -43,6 +43,9 @@ OPTIONS
-R::
Output diff in reverse.
+-S<string>::
+ Look for differences that contain the change in <string>.
+
-r::
recurse
diff --git a/Makefile b/Makefile
--- a/Makefile
+++ b/Makefile
@@ -45,7 +45,7 @@ LIB_H += strbuf.h
LIB_OBJS += strbuf.o
LIB_H += diff.h
-LIB_OBJS += diff.o diffcore-rename.o
+LIB_OBJS += diff.o diffcore-rename.o diffcore-pickaxe.o
LIB_OBJS += gitenv.o
@@ -125,6 +125,7 @@ strbuf.o: $(LIB_H)
gitenv.o: $(LIB_H)
diff.o: $(LIB_H)
diffcore-rename.o : $(LIB_H)
+diffcore-pickaxe.o : $(LIB_H)
test: all
make -C t/ all
diff --git a/diff-cache.c b/diff-cache.c
--- a/diff-cache.c
+++ b/diff-cache.c
@@ -8,6 +8,7 @@ static int line_termination = '\n';
static int detect_rename = 0;
static int reverse_diff = 0;
static int diff_score_opt = 0;
+static char *pickaxe = 0;
/* A file entry went away or appeared */
static void show_file(const char *prefix, struct cache_entry *ce, unsigned char *sha1, unsigned int mode)
@@ -153,7 +154,7 @@ static void mark_merge_entries(void)
}
static char *diff_cache_usage =
-"git-diff-cache [-p] [-r] [-z] [-m] [-M] [-C] [-R] [--cached] <tree-ish>";
+"git-diff-cache [-p] [-r] [-z] [-m] [-M] [-C] [-R] [-S<string>] [--cached] <tree-ish>";
int main(int argc, char **argv)
{
@@ -194,6 +195,10 @@ int main(int argc, char **argv)
reverse_diff = 1;
continue;
}
+ if (!strncmp(arg, "-S", 2)) {
+ pickaxe = arg + 2;
+ continue;
+ }
if (!strcmp(arg, "-m")) {
match_nonexisting = 1;
continue;
@@ -208,8 +213,8 @@ int main(int argc, char **argv)
if (argc != 2 || get_sha1(argv[1], tree_sha1))
usage(diff_cache_usage);
- diff_setup(detect_rename, diff_score_opt, reverse_diff,
- (generate_patch ? -1 : line_termination),
+ diff_setup(detect_rename, diff_score_opt, pickaxe,
+ reverse_diff, (generate_patch ? -1 : line_termination),
NULL, 0);
mark_merge_entries();
diff --git a/diff-files.c b/diff-files.c
--- a/diff-files.c
+++ b/diff-files.c
@@ -7,13 +7,14 @@
#include "diff.h"
static const char *diff_files_usage =
-"git-diff-files [-p] [-q] [-r] [-z] [-M] [-C] [-R] [paths...]";
+"git-diff-files [-p] [-q] [-r] [-z] [-M] [-C] [-R] [-S<string>] [paths...]";
static int generate_patch = 0;
static int line_termination = '\n';
static int detect_rename = 0;
static int reverse_diff = 0;
static int diff_score_opt = 0;
+static char *pickaxe = 0;
static int silent = 0;
static int matches_pathspec(struct cache_entry *ce, char **spec, int cnt)
@@ -67,6 +68,8 @@ int main(int argc, char **argv)
line_termination = 0;
else if (!strcmp(argv[1], "-R"))
reverse_diff = 1;
+ else if (!strncmp(argv[1], "-S", 2))
+ pickaxe = argv[1] + 2;
else if (!strncmp(argv[1], "-M", 2)) {
diff_score_opt = diff_scoreopt_parse(argv[1]);
detect_rename = generate_patch = 1;
@@ -89,8 +92,8 @@ int main(int argc, char **argv)
exit(1);
}
- diff_setup(detect_rename, diff_score_opt, reverse_diff,
- (generate_patch ? -1 : line_termination),
+ diff_setup(detect_rename, diff_score_opt, pickaxe,
+ reverse_diff, (generate_patch ? -1 : line_termination),
NULL, 0);
for (i = 0; i < entries; i++) {
diff --git a/diff-helper.c b/diff-helper.c
--- a/diff-helper.c
+++ b/diff-helper.c
@@ -9,6 +9,7 @@
static int detect_rename = 0;
static int diff_score_opt = 0;
static int generate_patch = 1;
+static char *pickaxe = 0;
static int parse_oneside_change(const char *cp, int *mode,
unsigned char *sha1, char *path)
@@ -93,7 +94,7 @@ static int parse_diff_raw_output(const c
}
static const char *diff_helper_usage =
- "git-diff-helper [-z] [-R] [-M] [-C] paths...";
+ "git-diff-helper [-z] [-R] [-M] [-C] [-S<string>] paths...";
int main(int ac, const char **av) {
struct strbuf sb;
@@ -117,14 +118,17 @@ int main(int ac, const char **av) {
detect_rename = 2;
diff_score_opt = diff_scoreopt_parse(av[1]);
}
+ else if (av[1][1] == 'S') {
+ pickaxe = av[1] + 2;
+ }
else
usage(diff_helper_usage);
ac--; av++;
}
/* the remaining parameters are paths patterns */
- diff_setup(detect_rename, diff_score_opt, reverse,
- (generate_patch ? -1 : line_termination),
+ diff_setup(detect_rename, diff_score_opt, pickaxe,
+ reverse, (generate_patch ? -1 : line_termination),
av+1, ac-1);
while (1) {
diff --git a/diff-tree.c b/diff-tree.c
--- a/diff-tree.c
+++ b/diff-tree.c
@@ -13,6 +13,7 @@ static int generate_patch = 0;
static int detect_rename = 0;
static int reverse_diff = 0;
static int diff_score_opt = 0;
+static char *pickaxe = 0;
static const char *header = NULL;
static const char *header_prefix = "";
@@ -271,8 +272,8 @@ static int diff_tree_sha1_top(const unsi
{
int ret;
- diff_setup(detect_rename, diff_score_opt, reverse_diff,
- (generate_patch ? -1 : line_termination),
+ diff_setup(detect_rename, diff_score_opt, pickaxe,
+ reverse_diff, (generate_patch ? -1 : line_termination),
NULL, 0);
ret = diff_tree_sha1(old, new, base);
diff_flush();
@@ -285,8 +286,8 @@ static int diff_root_tree(const unsigned
void *tree;
unsigned long size;
- diff_setup(detect_rename, diff_score_opt, reverse_diff,
- (generate_patch ? -1 : line_termination),
+ diff_setup(detect_rename, diff_score_opt, pickaxe,
+ reverse_diff, (generate_patch ? -1 : line_termination),
NULL, 0);
tree = read_object_with_reference(new, "tree", &size, NULL);
if (!tree)
@@ -430,7 +431,7 @@ static int diff_tree_stdin(char *line)
}
static char *diff_tree_usage =
-"git-diff-tree [-p] [-r] [-z] [--stdin] [-M] [-C] [-R] [-m] [-s] [-v] <tree-ish> <tree-ish>";
+"git-diff-tree [-p] [-r] [-z] [--stdin] [-M] [-C] [-R] [-S<string>] [-m] [-s] [-v] <tree-ish> <tree-ish>";
int main(int argc, char **argv)
{
@@ -473,6 +474,10 @@ int main(int argc, char **argv)
recursive = generate_patch = 1;
continue;
}
+ if (!strncmp(arg, "-S", 2)) {
+ pickaxe = arg + 2;
+ continue;
+ }
if (!strncmp(arg, "-M", 2)) {
detect_rename = recursive = generate_patch = 1;
diff_score_opt = diff_scoreopt_parse(arg);
diff --git a/diff.c b/diff.c
--- a/diff.c
+++ b/diff.c
@@ -17,6 +17,7 @@ static int reverse_diff;
static int diff_raw_output = -1;
static const char **pathspec;
static int speccnt;
+static const char *pickaxe;
static int minimum_score;
static const char *external_diff(void)
@@ -511,8 +512,9 @@ int diff_scoreopt_parse(const char *opt)
return MAX_SCORE * num / scale;
}
-void diff_setup(int detect_rename_, int minimum_score_, int reverse_diff_,
- int diff_raw_output_,
+void diff_setup(int detect_rename_, int minimum_score_,
+ char *pickaxe_,
+ int reverse_diff_, int diff_raw_output_,
const char **pathspec_, int speccnt_)
{
detect_rename = detect_rename_;
@@ -521,15 +523,16 @@ void diff_setup(int detect_rename_, int
diff_raw_output = diff_raw_output_;
speccnt = speccnt_;
minimum_score = minimum_score_ ? : DEFAULT_MINIMUM_SCORE;
+ pickaxe = pickaxe_;
}
static struct diff_queue_struct queued_diff;
-struct diff_file_pair *diff_queue(struct diff_queue_struct *queue,
+struct diff_filepair *diff_queue(struct diff_queue_struct *queue,
struct diff_filespec *one,
struct diff_filespec *two)
{
- struct diff_file_pair *dp = xmalloc(sizeof(*dp));
+ struct diff_filepair *dp = xmalloc(sizeof(*dp));
dp->one = one;
dp->two = two;
dp->xfrm_msg = 0;
@@ -549,7 +552,7 @@ static const char *git_object_type(unsig
return S_ISDIR(mode) ? "tree" : "blob";
}
-static void diff_flush_raw(struct diff_file_pair *p)
+static void diff_flush_raw(struct diff_filepair *p)
{
struct diff_filespec *it;
int addremove;
@@ -583,7 +586,7 @@ static void diff_flush_raw(struct diff_f
sha1_to_hex(it->sha1), it->path, diff_raw_output);
}
-static void diff_flush_patch(struct diff_file_pair *p)
+static void diff_flush_patch(struct diff_filepair *p)
{
const char *name, *other;
@@ -600,7 +603,7 @@ static int identical(struct diff_filespe
{
/* This function is written stricter than necessary to support
* the currently implemented transformers, but the idea is to
- * let transformers to produce diff_file_pairs any way they want,
+ * let transformers to produce diff_filepairs any way they want,
* and filter and clean them up here before producing the output.
*/
@@ -623,7 +626,7 @@ static int identical(struct diff_filespe
return 0;
}
-static void diff_flush_one(struct diff_file_pair *p)
+static void diff_flush_one(struct diff_filepair *p)
{
if (identical(p->one, p->two))
return;
@@ -640,11 +643,13 @@ void diff_flush(void)
if (detect_rename)
diff_detect_rename(q, detect_rename, minimum_score);
+ if (pickaxe)
+ diff_pickaxe(q, pickaxe);
for (i = 0; i < q->nr; i++)
diff_flush_one(q->queue[i]);
for (i = 0; i < q->nr; i++) {
- struct diff_file_pair *p = q->queue[i];
+ struct diff_filepair *p = q->queue[i];
diff_free_filespec_data(p->one);
diff_free_filespec_data(p->two);
free(p->xfrm_msg);
diff --git a/diff.h b/diff.h
--- a/diff.h
+++ b/diff.h
@@ -20,6 +20,7 @@ extern void diff_unmerge(const char *pat
extern int diff_scoreopt_parse(const char *opt);
extern void diff_setup(int detect_rename, int minimum_score,
+ char *pickaxe,
int reverse, int raw_output,
const char **spec, int cnt);
diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
new file mode 100644
--- /dev/null
+++ b/diffcore-pickaxe.c
@@ -0,0 +1,56 @@
+/*
+ * Copyright (C) 2005 Junio C Hamano
+ */
+#include "cache.h"
+#include "diff.h"
+#include "diffcore.h"
+#include "delta.h"
+
+static int contains(struct diff_filespec *one,
+ const char *needle, unsigned long len)
+{
+ unsigned long offset, sz;
+ const char *data;
+ if (diff_populate_filespec(one))
+ return 0;
+ sz = one->size;
+ data = one->data;
+ for (offset = 0; offset + len <= sz; offset++)
+ if (!strncmp(needle, data + offset, len))
+ return 1;
+ return 0;
+}
+
+void diff_pickaxe(struct diff_queue_struct *q, const char *needle)
+{
+ unsigned long len = strlen(needle);
+ int i;
+ struct diff_queue_struct outq;
+ outq.queue = NULL;
+ outq.nr = outq.alloc = 0;
+
+ for (i = 0; i < q->nr; i++) {
+ struct diff_filepair *p = q->queue[i];
+ if (!p->one->file_valid) {
+ if (!p->two->file_valid)
+ continue; /* ignore nonsense */
+ /* created */
+ if (contains(p->two, needle, len))
+ diff_queue(&outq, p->one, p->two);
+ }
+ else if (!p->two->file_valid) {
+ if (contains(p->one, needle, len))
+ diff_queue(&outq, p->one, p->two);
+ }
+ else if (contains(p->one, needle, len) !=
+ contains(p->two, needle, len))
+ diff_queue(&outq, p->one, p->two);
+ }
+ for (i = 0; i < q->nr; i++) {
+ struct diff_filepair *p = q->queue[i];
+ free(p);
+ }
+ free(q->queue);
+ *q = outq;
+ return;
+}
diff --git a/diffcore-rename.c b/diffcore-rename.c
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -129,7 +129,7 @@ static void record_rename_pair(struct di
* To achieve this sort order, we give xform_work the number
* above.
*/
- struct diff_file_pair *dp = diff_queue(outq, src, dst);
+ struct diff_filepair *dp = diff_queue(outq, src, dst);
dp->xfrm_work = (rank * 2 + 1) | (score<<RENAME_SCORE_SHIFT);
dst->xfrm_flags |= RENAME_DST_MATCHED;
}
@@ -148,7 +148,7 @@ static void debug_filespec(struct diff_f
s->size, s->xfrm_flags);
}
-static void debug_filepair(const struct diff_file_pair *p, int i)
+static void debug_filepair(const struct diff_filepair *p, int i)
{
debug_filespec(p->one, i, "one");
debug_filespec(p->two, i, "two");
@@ -165,7 +165,7 @@ static void debug_queue(const char *msg,
fprintf(stderr, "%s\n", msg);
fprintf(stderr, "q->nr = %d\n", q->nr);
for (i = 0; i < q->nr; i++) {
- struct diff_file_pair *p = q->queue[i];
+ struct diff_filepair *p = q->queue[i];
debug_filepair(p, i);
}
}
@@ -180,8 +180,8 @@ static void debug_queue(const char *msg,
*/
static int rank_compare(const void *a_, const void *b_)
{
- const struct diff_file_pair *a = *(const struct diff_file_pair **)a_;
- const struct diff_file_pair *b = *(const struct diff_file_pair **)b_;
+ const struct diff_filepair *a = *(const struct diff_filepair **)a_;
+ const struct diff_filepair *b = *(const struct diff_filepair **)b_;
int a_rank = a->xfrm_work & ((1<<RENAME_SCORE_SHIFT) - 1);
int b_rank = b->xfrm_work & ((1<<RENAME_SCORE_SHIFT) - 1);
@@ -207,7 +207,7 @@ static int needs_to_stay(struct diff_que
* as the source of rename/copy), we need to copy, not rename.
*/
while (i < q->nr) {
- struct diff_file_pair *p = q->queue[i++];
+ struct diff_filepair *p = q->queue[i++];
if (!p->two->file_valid)
continue; /* removed is fine */
if (strcmp(p->one->path, it->path))
@@ -243,15 +243,8 @@ void diff_detect_rename(struct diff_queu
srcs[0] = &deleted;
srcs[1] = &stay;
- /* NEEDSWORK:
- * (1) make sure we properly ignore but pass trees.
- *
- * (2) make sure we do right thing on the same path deleted
- * and created in the same patch.
- */
-
for (i = 0; i < q->nr; i++) {
- struct diff_file_pair *p = q->queue[i];
+ struct diff_filepair *p = q->queue[i];
if (!p->one->file_valid)
if (!p->two->file_valid)
continue; /* ignore nonsense */
@@ -340,11 +333,11 @@ void diff_detect_rename(struct diff_queu
* See comments at the top of record_rename_pair for numbers used
* to assign xfrm_work.
*
- * Note that we have not annotated the diff_file_pair with any comment
+ * Note that we have not annotated the diff_filepair with any comment
* so there is nothing other than p to free.
*/
for (i = 0; i < q->nr; i++) {
- struct diff_file_pair *dp, *p = q->queue[i];
+ struct diff_filepair *dp, *p = q->queue[i];
if (!p->one->file_valid) {
if (p->two->file_valid) {
/* creation */
@@ -378,7 +371,7 @@ void diff_detect_rename(struct diff_queu
/* Copy it out to q, removing duplicates. */
for (i = 0; i < outq.nr; i++) {
- struct diff_file_pair *p = outq.queue[i];
+ struct diff_filepair *p = outq.queue[i];
if (!p->one->file_valid) {
/* created */
if (p->two->xfrm_flags & RENAME_DST_MATCHED)
@@ -395,7 +388,7 @@ void diff_detect_rename(struct diff_queu
}
else if (strcmp(p->one->path, p->two->path)) {
/* rename or copy */
- struct diff_file_pair *dp =
+ struct diff_filepair *dp =
diff_queue(q, p->one, p->two);
int msglen = (strlen(p->one->path) +
strlen(p->two->path) + 100);
diff --git a/diffcore.h b/diffcore.h
--- a/diffcore.h
+++ b/diffcore.h
@@ -38,7 +38,7 @@ extern void fill_filespec(struct diff_fi
extern int diff_populate_filespec(struct diff_filespec *);
extern void diff_free_filespec_data(struct diff_filespec *);
-struct diff_file_pair {
+struct diff_filepair {
struct diff_filespec *one;
struct diff_filespec *two;
char *xfrm_msg;
@@ -47,14 +47,15 @@ struct diff_file_pair {
};
struct diff_queue_struct {
- struct diff_file_pair **queue;
+ struct diff_filepair **queue;
int alloc;
int nr;
};
-extern struct diff_file_pair *diff_queue(struct diff_queue_struct *,
- struct diff_filespec *,
- struct diff_filespec *);
+extern struct diff_filepair *diff_queue(struct diff_queue_struct *,
+ struct diff_filespec *,
+ struct diff_filespec *);
extern void diff_detect_rename(struct diff_queue_struct *, int, int);
+extern void diff_pickaxe(struct diff_queue_struct *, const char *);
#endif
------------------------------------------------
* [PATCH 1/3] Diff overhaul, adding half of copy detection.
From: Junio C Hamano @ 2005-05-21 9:39 UTC
To: Linus Torvalds; +Cc: git
In-Reply-To: <7vzmuokjhg.fsf@assigned-by-dhcp.cox.net>
This introduces the diff-core, the layer between the diff-tree
family and the external diff interface engine. The calls to the
interface diff-tree family uses (diff_change and diff_addremove)
have not changed and will not change. The purpose of the
diff-core layer is to provide an infrastructure to transform the
set of differences sent from the applications, before sending
them to the external diff interface.
The recently introduced rename detection code has been rewritten
to use the diff-core facility. When applications send in
separate creates and deletes, matching ones are transformed into
a single rename-and-edit diff, and sent out to the external diff
interface as such.
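To illustrate the idea of pairing separate creates and deletes, here
is a deliberately simplified sketch. It is not the algorithm in
diffcore-rename.c: pair_renames() and the precomputed score table are
assumptions made for illustration, and the greedy first-come matching
is a swapped-in simplification of the ranking the real code performs.
Only MAX_SCORE, DEFAULT_MINIMUM_SCORE and the "zero means default"
convention are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SCORE 10000
#define DEFAULT_MINIMUM_SCORE 5000

/*
 * Hypothetical helper: score[d * nsrc + s] holds the similarity
 * (0..MAX_SCORE) between created path d and deleted path s.
 * For each created path in order, pick the best-scoring deleted
 * path that is not already taken and meets the minimum score.
 * match[d] receives the chosen source index, or -1 when path d
 * stays a plain creation.
 */
static void pair_renames(const int *score, size_t nsrc, size_t ndst,
			 int minimum_score, int *match)
{
	size_t d, s, k;

	assert(score != NULL && match != NULL);
	if (!minimum_score)
		minimum_score = DEFAULT_MINIMUM_SCORE;
	for (d = 0; d < ndst; d++) {
		int best = -1;
		for (s = 0; s < nsrc; s++) {
			int taken = 0;
			for (k = 0; k < d; k++)
				if (match[k] == (int)s)
					taken = 1;
			if (taken || score[d * nsrc + s] < minimum_score)
				continue;
			if (best < 0 ||
			    score[d * nsrc + s] > score[d * nsrc + best])
				best = (int)s;
		}
		match[d] = best;
	}
}
```

A matched pair would then be sent out as one rename-and-edit diff,
while unmatched creations and deletions pass through unchanged.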
This patch also enhances the rename detection code further to be
able to detect copies. Currently this happens only as long as
copy sources appear as part of the modified files, but there
already is enough provision for callers to report unmodified
files to diff-core, so that they can be also used as copy source
candidates. Extending the callers this way will be done in a
separate patch.
Please see and marvel at how well this works by trying out the
newly added t/t4003-diff-rename-1.sh test script.
Signed-off-by: Junio C Hamano <junkio@cox.net>
---
Documentation/git-diff-cache.txt | 5
Documentation/git-diff-files.txt | 5
Documentation/git-diff-helper.txt | 4
Documentation/git-diff-tree.txt | 6
Makefile | 5
diff-cache.c | 8
diff-files.c | 7
diff-helper.c | 25 +
diff-tree.c | 8
diff.c | 703 +++++++++++++++-----------------------
diffcore-rename.c | 443 +++++++++++++++++++++++
diffcore.h | 60 +++
git-apply-patch-script | 2
t/t0000-basic.sh | 2
t/t4001-diff-rename.sh | 4
t/t4003-diff-rename-1.sh | 93 +++++
16 files changed, 946 insertions(+), 434 deletions(-)
new file (100644): diffcore-rename.c
new file (100644): diffcore.h
new file (100755): t/t4003-diff-rename-1.sh
diff --git a/Documentation/git-diff-cache.txt b/Documentation/git-diff-cache.txt
--- a/Documentation/git-diff-cache.txt
+++ b/Documentation/git-diff-cache.txt
@@ -9,7 +9,7 @@ git-diff-cache - Compares content and mo
SYNOPSIS
--------
-'git-diff-cache' [-p] [-r] [-z] [-m] [-M] [-R] [--cached] <tree-ish>
+'git-diff-cache' [-p] [-r] [-z] [-m] [-M] [-R] [-C] [--cached] <tree-ish>
DESCRIPTION
-----------
@@ -36,6 +36,9 @@ OPTIONS
-M::
Detect renames; implies -p.
+-C::
+ Detect copies as well as renames; implies -p.
+
-R::
Output diff in reverse.
diff --git a/Documentation/git-diff-files.txt b/Documentation/git-diff-files.txt
--- a/Documentation/git-diff-files.txt
+++ b/Documentation/git-diff-files.txt
@@ -9,7 +9,7 @@ git-diff-files - Compares files in the w
SYNOPSIS
--------
-'git-diff-files' [-p] [-q] [-r] [-z] [-M] [-R] [<pattern>...]
+'git-diff-files' [-p] [-q] [-r] [-z] [-M] [-C] [-R] [<pattern>...]
DESCRIPTION
-----------
@@ -32,6 +32,9 @@ OPTIONS
-M::
Detect renames; implies -p.
+-C::
+ Detect copies as well as renames; implies -p.
+
-r::
This flag does not mean anything. It is there only to match
git-diff-tree. Unlike git-diff-tree, git-diff-files always looks
diff --git a/Documentation/git-diff-helper.txt b/Documentation/git-diff-helper.txt
--- a/Documentation/git-diff-helper.txt
+++ b/Documentation/git-diff-helper.txt
@@ -9,7 +9,7 @@ git-diff-helper - Generates patch format
SYNOPSIS
--------
-'git-diff-helper' [-z] [-R] [-M]
+'git-diff-helper' [-z] [-R] [-M] [-C]
DESCRIPTION
-----------
@@ -34,6 +34,8 @@ OPTIONS
-M::
Detect renames.
+-C::
+ Detect copies as well as renames.
See Also
--------
diff --git a/Documentation/git-diff-tree.txt b/Documentation/git-diff-tree.txt
--- a/Documentation/git-diff-tree.txt
+++ b/Documentation/git-diff-tree.txt
@@ -9,7 +9,7 @@ git-diff-tree - Compares the content and
SYNOPSIS
--------
-'git-diff-tree' [-p] [-r] [-z] [--stdin] [-M] [-R] [-m] [-s] [-v] <tree-ish> <tree-ish> [<pattern>]\*
+'git-diff-tree' [-p] [-r] [-z] [--stdin] [-M] [-R] [-C] [-m] [-s] [-v] <tree-ish> <tree-ish> [<pattern>]\*
DESCRIPTION
-----------
@@ -36,6 +36,10 @@ OPTIONS
-M::
Detect renames; implies -p, in turn implying also '-r'.
+-C::
+ Detect copies as well as renames; implies -p, in turn
+ implying also '-r'.
+
-R::
Output diff in reverse.
diff --git a/Makefile b/Makefile
--- a/Makefile
+++ b/Makefile
@@ -45,7 +45,7 @@ LIB_H += strbuf.h
LIB_OBJS += strbuf.o
LIB_H += diff.h
-LIB_OBJS += diff.o
+LIB_OBJS += diff.o diffcore-rename.o
LIB_OBJS += gitenv.o
@@ -121,9 +121,10 @@ object.o: $(LIB_H)
read-cache.o: $(LIB_H)
sha1_file.o: $(LIB_H)
usage.o: $(LIB_H)
-diff.o: $(LIB_H)
strbuf.o: $(LIB_H)
gitenv.o: $(LIB_H)
+diff.o: $(LIB_H)
+diffcore-rename.o : $(LIB_H)
test: all
make -C t/ all
diff --git a/diff-cache.c b/diff-cache.c
--- a/diff-cache.c
+++ b/diff-cache.c
@@ -153,7 +153,7 @@ static void mark_merge_entries(void)
}
static char *diff_cache_usage =
-"git-diff-cache [-p] [-r] [-z] [-m] [-M] [-R] [--cached] <tree-ish>";
+"git-diff-cache [-p] [-r] [-z] [-m] [-M] [-C] [-R] [--cached] <tree-ish>";
int main(int argc, char **argv)
{
@@ -180,6 +180,12 @@ int main(int argc, char **argv)
diff_score_opt = diff_scoreopt_parse(arg);
continue;
}
+ if (!strncmp(arg, "-C", 2)) {
+ generate_patch = 1;
+ detect_rename = 2;
+ diff_score_opt = diff_scoreopt_parse(arg);
+ continue;
+ }
if (!strcmp(arg, "-z")) {
line_termination = '\0';
continue;
diff --git a/diff-files.c b/diff-files.c
--- a/diff-files.c
+++ b/diff-files.c
@@ -7,7 +7,7 @@
#include "diff.h"
static const char *diff_files_usage =
-"git-diff-files [-p] [-q] [-r] [-z] [-M] [-R] [paths...]";
+"git-diff-files [-p] [-q] [-r] [-z] [-M] [-C] [-R] [paths...]";
static int generate_patch = 0;
static int line_termination = '\n';
@@ -71,6 +71,11 @@ int main(int argc, char **argv)
diff_score_opt = diff_scoreopt_parse(argv[1]);
detect_rename = generate_patch = 1;
}
+ else if (!strncmp(argv[1], "-C", 2)) {
+ diff_score_opt = diff_scoreopt_parse(argv[1]);
+ detect_rename = 2;
+ generate_patch = 1;
+ }
else
usage(diff_files_usage);
argv++; argc--;
diff --git a/diff-helper.c b/diff-helper.c
--- a/diff-helper.c
+++ b/diff-helper.c
@@ -8,6 +8,7 @@
static int detect_rename = 0;
static int diff_score_opt = 0;
+static int generate_patch = 1;
static int parse_oneside_change(const char *cp, int *mode,
unsigned char *sha1, char *path)
@@ -20,7 +21,8 @@ static int parse_oneside_change(const ch
cp++;
}
*mode = m;
- if (strncmp(cp, "\tblob\t", 6) && strncmp(cp, " blob ", 6))
+ if (strncmp(cp, "\tblob\t", 6) && strncmp(cp, " blob ", 6) &&
+ strncmp(cp, "\ttree\t", 6) && strncmp(cp, " tree ", 6))
return -1;
cp += 6;
if (get_sha1_hex(cp, sha1))
@@ -44,11 +46,13 @@ static int parse_diff_raw_output(const c
diff_unmerge(cp + 1);
break;
case '+':
- parse_oneside_change(cp, &new_mode, new_sha1, path);
+ if (parse_oneside_change(cp, &new_mode, new_sha1, path))
+ return -1;
diff_addremove('+', new_mode, new_sha1, path, NULL);
break;
case '-':
- parse_oneside_change(cp, &old_mode, old_sha1, path);
+ if (parse_oneside_change(cp, &old_mode, old_sha1, path))
+ return -1;
diff_addremove('-', old_mode, old_sha1, path, NULL);
break;
case '*':
@@ -64,7 +68,8 @@ static int parse_diff_raw_output(const c
new_mode = (new_mode << 3) | (ch - '0');
cp++;
}
- if (strncmp(cp, "\tblob\t", 6) && strncmp(cp, " blob ", 6))
+ if (strncmp(cp, "\tblob\t", 6) && strncmp(cp, " blob ", 6) &&
+ strncmp(cp, "\ttree\t", 6) && strncmp(cp, " tree ", 6))
return -1;
cp += 6;
if (get_sha1_hex(cp, old_sha1))
@@ -88,7 +93,7 @@ static int parse_diff_raw_output(const c
}
static const char *diff_helper_usage =
- "git-diff-helper [-z] [-R] [-M] paths...";
+ "git-diff-helper [-z] [-R] [-M] [-C] paths...";
int main(int ac, const char **av) {
struct strbuf sb;
@@ -102,17 +107,25 @@ int main(int ac, const char **av) {
reverse = 1;
else if (av[1][1] == 'z')
line_termination = 0;
+ else if (av[1][1] == 'p') /* hidden from the help */
+ generate_patch = 0;
else if (av[1][1] == 'M') {
detect_rename = 1;
diff_score_opt = diff_scoreopt_parse(av[1]);
}
+ else if (av[1][1] == 'C') {
+ detect_rename = 2;
+ diff_score_opt = diff_scoreopt_parse(av[1]);
+ }
else
usage(diff_helper_usage);
ac--; av++;
}
/* the remaining parameters are paths patterns */
- diff_setup(detect_rename, diff_score_opt, reverse, -1, av+1, ac-1);
+ diff_setup(detect_rename, diff_score_opt, reverse,
+ (generate_patch ? -1 : line_termination),
+ av+1, ac-1);
while (1) {
int status;
diff --git a/diff-tree.c b/diff-tree.c
--- a/diff-tree.c
+++ b/diff-tree.c
@@ -430,7 +430,7 @@ static int diff_tree_stdin(char *line)
}
static char *diff_tree_usage =
-"git-diff-tree [-p] [-r] [-z] [--stdin] [-M] [-R] [-m] [-s] [-v] <tree-ish> <tree-ish>";
+"git-diff-tree [-p] [-r] [-z] [--stdin] [-M] [-C] [-R] [-m] [-s] [-v] <tree-ish> <tree-ish>";
int main(int argc, char **argv)
{
@@ -478,6 +478,12 @@ int main(int argc, char **argv)
diff_score_opt = diff_scoreopt_parse(arg);
continue;
}
+ if (!strncmp(arg, "-C", 2)) {
+ detect_rename = 2;
+ recursive = generate_patch = 1;
+ diff_score_opt = diff_scoreopt_parse(arg);
+ continue;
+ }
if (!strcmp(arg, "-z")) {
line_termination = '\0';
continue;
diff --git a/diff.c b/diff.c
--- a/diff.c
+++ b/diff.c
@@ -7,12 +7,17 @@
#include <limits.h>
#include "cache.h"
#include "diff.h"
-#include "delta.h"
+#include "diffcore.h"
static const char *diff_opts = "-pu";
static unsigned char null_sha1[20] = { 0, };
-#define MAX_SCORE 10000
-#define DEFAULT_MINIMUM_SCORE 5000
+
+static int detect_rename;
+static int reverse_diff;
+static int diff_raw_output = -1;
+static const char **pathspec;
+static int speccnt;
+static int minimum_score;
static const char *external_diff(void)
{
@@ -77,26 +82,16 @@ static char *sq_expand(const char *src)
}
static struct diff_tempfile {
- const char *name;
+ const char *name; /* filename external diff should read from */
char hex[41];
char mode[10];
char tmp_path[50];
} diff_temp[2];
-struct diff_spec {
- unsigned char blob_sha1[20];
- unsigned short mode; /* file mode */
- unsigned sha1_valid : 1; /* if true, use blob_sha1 and trust mode;
- * if false, use the name and read from
- * the filesystem.
- */
- unsigned file_valid : 1; /* if false the file does not exist */
-};
-
static void builtin_diff(const char *name_a,
const char *name_b,
struct diff_tempfile *temp,
- int rename_score)
+ const char *xfrm_msg)
{
int i, next_at, cmd_size;
const char *diff_cmd = "diff -L'%s%s' -L'%s%s'";
@@ -151,14 +146,9 @@ static void builtin_diff(const char *nam
printf("old mode %s\n", temp[0].mode);
printf("new mode %s\n", temp[1].mode);
}
- if (strcmp(name_a, name_b)) {
- if (0 < rename_score)
- printf("rename similarity index %d%%\n",
- (int)(0.5+
- rename_score*100.0/MAX_SCORE));
- printf("rename old %s\n", name_a);
- printf("rename new %s\n", name_b);
- }
+ if (xfrm_msg && xfrm_msg[0])
+ fputs(xfrm_msg, stdout);
+
if (strncmp(temp[0].mode, temp[1].mode, 3))
/* we do not run diff between different kind
* of objects.
@@ -169,6 +159,28 @@ static void builtin_diff(const char *nam
execlp("/bin/sh","sh", "-c", cmd, NULL);
}
+struct diff_filespec *alloc_filespec(const char *path)
+{
+ int namelen = strlen(path);
+ struct diff_filespec *spec = xmalloc(sizeof(*spec) + namelen + 1);
+ spec->path = (char *)(spec + 1);
+ strcpy(spec->path, path);
+ spec->should_free = spec->should_munmap = spec->file_valid = 0;
+ spec->xfrm_flags = 0;
+ spec->size = 0;
+ spec->data = 0;
+ return spec;
+}
+
+void fill_filespec(struct diff_filespec *spec, const unsigned char *sha1,
+ unsigned short mode)
+{
+ spec->mode = mode;
+ memcpy(spec->sha1, sha1, 20);
+ spec->sha1_valid = !!memcmp(sha1, null_sha1, 20);
+ spec->file_valid = 1;
+}
+
/*
* Given a name and sha1 pair, if the dircache tells us the file in
* the work tree has that object contents, return true, so that
@@ -201,13 +213,86 @@ static int work_tree_matches(const char
return 0;
ce = active_cache[pos];
if ((lstat(name, &st) < 0) ||
- !S_ISREG(st.st_mode) ||
+ !S_ISREG(st.st_mode) || /* careful! */
ce_match_stat(ce, &st) ||
memcmp(sha1, ce->sha1, 20))
return 0;
+ /* we return 1 only when we can stat, it is a regular file,
+ * stat information matches, and sha1 recorded in the cache
+ * matches. I.e. we know the file in the work tree really is
+ * the same as the <name, sha1> pair.
+ */
return 1;
}
+/*
+ * While doing rename detection and pickaxe operation, we may need to
+ * grab the data for the blob (or file) for our own in-core comparison.
+ * diff_filespec has data and size fields for this purpose.
+ */
+int diff_populate_filespec(struct diff_filespec *s)
+{
+ int err = 0;
+ if (!s->file_valid)
+ die("internal error: asking to populate invalid file.");
+ if (S_ISDIR(s->mode))
+ return -1;
+
+ if (s->data)
+ return err;
+ if (!s->sha1_valid ||
+ work_tree_matches(s->path, s->sha1)) {
+ struct stat st;
+ int fd;
+ if (lstat(s->path, &st) < 0) {
+ if (errno == ENOENT) {
+ err_empty:
+ err = -1;
+ empty:
+ s->data = "";
+ s->size = 0;
+ return err;
+ }
+ }
+ s->size = st.st_size;
+ if (!s->size)
+ goto empty;
+ if (S_ISLNK(st.st_mode)) {
+ int ret;
+ s->data = xmalloc(s->size);
+ s->should_free = 1;
+ ret = readlink(s->path, s->data, s->size);
+ if (ret < 0) {
+ free(s->data);
+ goto err_empty;
+ }
+ return 0;
+ }
+ fd = open(s->path, O_RDONLY);
+ if (fd < 0)
+ goto err_empty;
+ s->data = mmap(NULL, s->size, PROT_READ, MAP_PRIVATE, fd, 0);
+ s->should_munmap = 1;
+ close(fd);
+ }
+ else {
+ char type[20];
+ s->data = read_sha1_file(s->sha1, type, &s->size);
+ s->should_free = 1;
+ }
+ return 0;
+}
+
+void diff_free_filespec_data(struct diff_filespec *s)
+{
+ if (s->should_free)
+ free(s->data);
+ else if (s->should_munmap)
+ munmap(s->data, s->size);
+ s->should_free = s->should_munmap = 0;
+ s->data = 0;
+}
+
static void prep_temp_blob(struct diff_tempfile *temp,
void *blob,
unsigned long size,
@@ -231,7 +316,7 @@ static void prep_temp_blob(struct diff_t
static void prepare_temp_file(const char *name,
struct diff_tempfile *temp,
- struct diff_spec *one)
+ struct diff_filespec *one)
{
if (!one->file_valid) {
not_a_valid_file:
@@ -245,13 +330,12 @@ static void prepare_temp_file(const char
}
if (!one->sha1_valid ||
- work_tree_matches(name, one->blob_sha1)) {
+ work_tree_matches(name, one->sha1)) {
struct stat st;
- temp->name = name;
- if (lstat(temp->name, &st) < 0) {
+ if (lstat(name, &st) < 0) {
if (errno == ENOENT)
goto not_a_valid_file;
- die("stat(%s): %s", temp->name, strerror(errno));
+ die("stat(%s): %s", name, strerror(errno));
}
if (S_ISLNK(st.st_mode)) {
int ret;
@@ -263,31 +347,27 @@ static void prepare_temp_file(const char
die("readlink(%s)", name);
prep_temp_blob(temp, buf, st.st_size,
(one->sha1_valid ?
- one->blob_sha1 : null_sha1),
+ one->sha1 : null_sha1),
(one->sha1_valid ?
one->mode : S_IFLNK));
}
else {
+ /* we can borrow from the file in the work tree */
+ temp->name = name;
if (!one->sha1_valid)
strcpy(temp->hex, sha1_to_hex(null_sha1));
else
- strcpy(temp->hex, sha1_to_hex(one->blob_sha1));
+ strcpy(temp->hex, sha1_to_hex(one->sha1));
sprintf(temp->mode, "%06o",
S_IFREG |ce_permissions(st.st_mode));
}
return;
}
else {
- void *blob;
- char type[20];
- unsigned long size;
-
- blob = read_sha1_file(one->blob_sha1, type, &size);
- if (!blob || strcmp(type, "blob"))
- die("unable to read blob object for %s (%s)",
- name, sha1_to_hex(one->blob_sha1));
- prep_temp_blob(temp, blob, size, one->blob_sha1, one->mode);
- free(blob);
+ if (diff_populate_filespec(one))
+ die("cannot read data blob for %s", one->path);
+ prep_temp_blob(temp, one->data, one->size,
+ one->sha1, one->mode);
}
}
@@ -307,13 +387,6 @@ static void remove_tempfile_on_signal(in
remove_tempfile();
}
-static int detect_rename;
-static int reverse_diff;
-static int diff_raw_output = -1;
-static const char **pathspec;
-static int speccnt;
-static int minimum_score;
-
static int matches_pathspec(const char *name)
{
int i;
@@ -341,9 +414,9 @@ static int matches_pathspec(const char *
*/
static void run_external_diff(const char *name,
const char *other,
- struct diff_spec *one,
- struct diff_spec *two,
- int rename_score)
+ struct diff_filespec *one,
+ struct diff_filespec *two,
+ const char *xfrm_msg)
{
struct diff_tempfile *temp = diff_temp;
pid_t pid;
@@ -373,7 +446,7 @@ static void run_external_diff(const char
const char *pgm = external_diff();
if (pgm) {
if (one && two) {
- const char *exec_arg[9];
+ const char *exec_arg[10];
const char **arg = &exec_arg[0];
*arg++ = pgm;
*arg++ = name;
@@ -383,9 +456,11 @@ static void run_external_diff(const char
*arg++ = temp[1].name;
*arg++ = temp[1].hex;
*arg++ = temp[1].mode;
- if (other)
+ if (other) {
*arg++ = other;
- *arg = NULL;
+ *arg++ = xfrm_msg;
+ }
+ *arg = 0;
execvp(pgm, (char *const*) exec_arg);
}
else
@@ -395,7 +470,7 @@ static void run_external_diff(const char
* otherwise we use the built-in one.
*/
if (one && two)
- builtin_diff(name, other ? : name, temp, rename_score);
+ builtin_diff(name, other ? : name, temp, xfrm_msg);
else
printf("* Unmerged path %s\n", name);
exit(0);
@@ -418,335 +493,166 @@ static void run_external_diff(const char
remove_tempfile();
}
-/*
- * We do not detect circular renames. Just hold created and deleted
- * entries and later attempt to match them up. If they do not match,
- * then spit them out as deletes or creates as original.
- */
-
-static struct diff_spec_hold {
- struct diff_spec_hold *next;
- struct diff_spec it;
- unsigned long size;
- int flags;
-#define MATCHED 1
-#define SHOULD_FREE 2
-#define SHOULD_MUNMAP 4
- void *data;
- char path[1];
-} *createdfile, *deletedfile;
-
-static void hold_diff(const char *name,
- struct diff_spec *one,
- struct diff_spec *two)
-{
- struct diff_spec_hold **list, *elem;
-
- if (one->file_valid && two->file_valid)
- die("internal error");
-
- if (!detect_rename) {
- run_external_diff(name, NULL, one, two, -1);
- return;
- }
- elem = xmalloc(sizeof(*elem) + strlen(name));
- strcpy(elem->path, name);
- elem->size = 0;
- elem->data = NULL;
- elem->flags = 0;
- if (one->file_valid) {
- list = &deletedfile;
- elem->it = *one;
- }
- else {
- list = &createdfile;
- elem->it = *two;
- }
- elem->next = *list;
- *list = elem;
-}
-
-static int populate_data(struct diff_spec_hold *s)
+int diff_scoreopt_parse(const char *opt)
{
- char type[20];
+ int diglen, num, scale, i;
+ if (opt[0] != '-' || (opt[1] != 'M' && opt[1] != 'C'))
+ return -1; /* that is neither a -M nor a -C option */
+ diglen = strspn(opt+2, "0123456789");
+ if (diglen == 0 || strlen(opt+2) != diglen)
+ return 0; /* use default */
+ sscanf(opt+2, "%d", &num);
+ for (i = 0, scale = 1; i < diglen; i++)
+ scale *= 10;
- if (s->data)
- return 0;
- if (s->it.sha1_valid) {
- s->data = read_sha1_file(s->it.blob_sha1, type, &s->size);
- s->flags |= SHOULD_FREE;
- }
- else {
- struct stat st;
- int fd;
- fd = open(s->path, O_RDONLY);
- if (fd < 0)
- return -1;
- if (fstat(fd, &st)) {
- close(fd);
- return -1;
- }
- s->size = st.st_size;
- s->data = mmap(NULL, s->size, PROT_READ, MAP_PRIVATE, fd, 0);
- close(fd);
- if (!s->size)
- s->data = "";
- else
- s->flags |= SHOULD_MUNMAP;
- }
- return 0;
+ /* user says num divided by scale and we say internally that
+ * is MAX_SCORE * num / scale.
+ */
+ return MAX_SCORE * num / scale;
}
-static void free_data(struct diff_spec_hold *s)
+void diff_setup(int detect_rename_, int minimum_score_, int reverse_diff_,
+ int diff_raw_output_,
+ const char **pathspec_, int speccnt_)
{
- if (s->flags & SHOULD_FREE)
- free(s->data);
- else if (s->flags & SHOULD_MUNMAP)
- munmap(s->data, s->size);
- s->flags &= ~(SHOULD_FREE|SHOULD_MUNMAP);
- s->data = NULL;
+ detect_rename = detect_rename_;
+ reverse_diff = reverse_diff_;
+ pathspec = pathspec_;
+ diff_raw_output = diff_raw_output_;
+ speccnt = speccnt_;
+ minimum_score = minimum_score_ ? : DEFAULT_MINIMUM_SCORE;
}
-static void flush_remaining_diff(struct diff_spec_hold *elem,
- int on_created_list)
-{
- static struct diff_spec null_file_spec;
+static struct diff_queue_struct queued_diff;
- null_file_spec.file_valid = 0;
- for ( ; elem ; elem = elem->next) {
- free_data(elem);
- if (elem->flags & MATCHED)
- continue;
- if (on_created_list)
- run_external_diff(elem->path, NULL,
- &null_file_spec, &elem->it, -1);
- else
- run_external_diff(elem->path, NULL,
- &elem->it, &null_file_spec, -1);
+struct diff_file_pair *diff_queue(struct diff_queue_struct *queue,
+ struct diff_filespec *one,
+ struct diff_filespec *two)
+{
+ struct diff_file_pair *dp = xmalloc(sizeof(*dp));
+ dp->one = one;
+ dp->two = two;
+ dp->xfrm_msg = 0;
+ dp->orig_order = queue->nr;
+ dp->xfrm_work = 0;
+ if (queue->alloc <= queue->nr) {
+ queue->alloc = alloc_nr(queue->alloc);
+ queue->queue = xrealloc(queue->queue,
+ sizeof(dp) * queue->alloc);
}
+ queue->queue[queue->nr++] = dp;
+ return dp;
}
-static int is_exact_match(struct diff_spec_hold *src,
- struct diff_spec_hold *dst)
+static const char *git_object_type(unsigned mode)
{
- if (src->it.sha1_valid && dst->it.sha1_valid &&
- !memcmp(src->it.blob_sha1, dst->it.blob_sha1, 20))
- return 1;
- if (populate_data(src) || populate_data(dst))
- /* this is an error but will be caught downstream */
- return 0;
- if (src->size == dst->size &&
- !memcmp(src->data, dst->data, src->size))
- return 1;
- return 0;
+ return S_ISDIR(mode) ? "tree" : "blob";
}
-static int estimate_similarity(struct diff_spec_hold *src, struct diff_spec_hold *dst)
+static void diff_flush_raw(struct diff_file_pair *p)
{
- /* src points at a deleted file and dst points at a created
- * file. They may be quite similar, in which case we want to
- * say src is renamed to dst.
- *
- * Compare them and return how similar they are, representing
- * the score as an integer between 0 and 10000, except
- * where they match exactly it is considered better than anything
- * else.
- */
- void *delta;
- unsigned long delta_size;
- int score;
-
- delta_size = ((src->size < dst->size) ?
- (dst->size - src->size) : (src->size - dst->size));
-
- /* We would not consider rename followed by more than
- * minimum_score/MAX_SCORE edits; that is, delta_size must be smaller
- * than (src->size + dst->size)/2 * minimum_score/MAX_SCORE,
- * which means...
- */
-
- if ((src->size+dst->size)*minimum_score < delta_size*MAX_SCORE*2)
- return 0;
+ struct diff_filespec *it;
+ int addremove;
- delta = diff_delta(src->data, src->size,
- dst->data, dst->size,
- &delta_size);
- free(delta);
+ /* raw output does not have a way to express rename nor copy */
+ if (strcmp(p->one->path, p->two->path))
+ return;
- /* This "delta" is really xdiff with adler32 and all the
- * overheads but it is a quick and dirty approximation.
- *
- * Now we will give some score to it. 100% edit gets
- * 0 points and 0% edit gets MAX_SCORE points. That is, every
- * 1/MAX_SCORE edit gets 1 point penalty. The amount of penalty is:
- *
- * (delta_size * 2 / (src->size + dst->size)) * MAX_SCORE
- *
- */
- score = MAX_SCORE-(MAX_SCORE*2*delta_size/(src->size+dst->size));
- if (score < 0) return 0;
- if (MAX_SCORE < score) return MAX_SCORE;
- return score;
-}
+ if (p->one->file_valid && p->two->file_valid) {
+ char hex[41];
+ strcpy(hex, sha1_to_hex(p->one->sha1));
+ printf("*%06o->%06o %s %s->%s %s%c",
+ p->one->mode, p->two->mode,
+ git_object_type(p->one->mode),
+ hex, sha1_to_hex(p->two->sha1),
+ p->one->path, diff_raw_output);
+ return;
+ }
-struct diff_score {
- struct diff_spec_hold *src;
- struct diff_spec_hold *dst;
- int score;
-};
+ if (p->one->file_valid) {
+ it = p->one;
+ addremove = '-';
+ } else {
+ it = p->two;
+ addremove = '+';
+ }
-static int score_compare(const void *a_, const void *b_)
-{
- const struct diff_score *a = a_, *b = b_;
- return b->score - a->score;
+ printf("%c%06o %s %s %s%c",
+ addremove,
+ it->mode, git_object_type(it->mode),
+ sha1_to_hex(it->sha1), it->path, diff_raw_output);
}
-static void flush_rename_pair(struct diff_spec_hold *src,
- struct diff_spec_hold *dst,
- int rename_score)
+static void diff_flush_patch(struct diff_file_pair *p)
{
- src->flags |= MATCHED;
- dst->flags |= MATCHED;
- free_data(src);
- free_data(dst);
- run_external_diff(src->path, dst->path,
- &src->it, &dst->it, rename_score);
-}
+ const char *name, *other;
-static void free_held_diff(struct diff_spec_hold *list)
-{
- struct diff_spec_hold *h;
- for (h = list; list; list = h) {
- h = list->next;
- free_data(list);
- free(list);
- }
+ name = p->one->path;
+ other = (strcmp(name, p->two->path) ? p->two->path : NULL);
+ if ((p->one->file_valid && S_ISDIR(p->one->mode)) ||
+ (p->two->file_valid && S_ISDIR(p->two->mode)))
+ return; /* no tree diffs in patch format */
+
+ run_external_diff(name, other, p->one, p->two, p->xfrm_msg);
}
-void diff_flush(void)
+static int identical(struct diff_filespec *one, struct diff_filespec *two)
{
- int num_create, num_delete, c, d;
- struct diff_spec_hold *elem, *src, *dst;
- struct diff_score *mx;
-
- /* We really want to cull the candidates list early
- * with cheap tests in order to avoid doing deltas.
- *
- * With the current callers, we should not have already
- * matched entries at this point, but it is nonetheless
- * checked for sanity.
+ /* This function is written stricter than necessary to support
+ * the currently implemented transformers, but the idea is to
+ * let transformers produce diff_file_pairs any way they want,
+ * and filter and clean them up here before producing the output.
*/
- for (dst = createdfile; dst; dst = dst->next) {
- if (dst->flags & MATCHED)
- continue;
- for (src = deletedfile; src; src = src->next) {
- if (src->flags & MATCHED)
- continue;
- if (! is_exact_match(src, dst))
- continue;
- flush_rename_pair(src, dst, MAX_SCORE);
- break;
- }
- }
-
- /* Count surviving candidates */
- for (num_create = 0, elem = createdfile; elem; elem = elem->next)
- if (!(elem->flags & MATCHED))
- num_create++;
-
- for (num_delete = 0, elem = deletedfile; elem; elem = elem->next)
- if (!(elem->flags & MATCHED))
- num_delete++;
-
- if (num_create == 0 || num_delete == 0)
- goto exit_path;
-
- mx = xmalloc(sizeof(*mx) * num_create * num_delete);
- for (c = 0, dst = createdfile; dst; dst = dst->next) {
- int base = c * num_delete;
- if (dst->flags & MATCHED)
- continue;
- for (d = 0, src = deletedfile; src; src = src->next) {
- struct diff_score *m = &mx[base+d];
- if (src->flags & MATCHED)
- continue;
- m->src = src;
- m->dst = dst;
- m->score = estimate_similarity(src, dst);
- d++;
- }
- c++;
- }
- qsort(mx, num_create*num_delete, sizeof(*mx), score_compare);
-
-#if 0
- for (c = 0; c < num_create * num_delete; c++) {
- src = mx[c].src;
- dst = mx[c].dst;
- if ((src->flags & MATCHED) || (dst->flags & MATCHED))
- continue;
- fprintf(stderr,
- "**score ** %d %s %s\n",
- mx[c].score, src->path, dst->path);
- }
-#endif
-
- for (c = 0; c < num_create * num_delete; c++) {
- src = mx[c].src;
- dst = mx[c].dst;
- if ((src->flags & MATCHED) || (dst->flags & MATCHED))
- continue;
- if (mx[c].score < minimum_score)
- break;
- flush_rename_pair(src, dst, mx[c].score);
- }
- free(mx);
-
- exit_path:
- flush_remaining_diff(createdfile, 1);
- flush_remaining_diff(deletedfile, 0);
- free_held_diff(createdfile);
- free_held_diff(deletedfile);
- createdfile = deletedfile = NULL;
-}
-int diff_scoreopt_parse(const char *opt)
-{
- int diglen, num, scale, i;
- if (opt[0] != '-' || opt[1] != 'M')
- return -1; /* that is not -M option */
- diglen = strspn(opt+2, "0123456789");
- if (diglen == 0 || strlen(opt+2) != diglen)
- return 0; /* use default */
- sscanf(opt+2, "%d", &num);
- for (i = 0, scale = 1; i < diglen; i++)
- scale *= 10;
+ if (!one->file_valid && !two->file_valid)
+ return 1; /* not interesting */
- /* user says num divided by scale and we say internally that
- * is MAX_SCORE * num / scale.
+ /* deletion, addition, mode change and renames are all interesting. */
+ if ((one->file_valid != two->file_valid) || (one->mode != two->mode) ||
+ strcmp(one->path, two->path))
+ return 0;
+
+ /* both are valid and point at the same path. that is, we are
+ * dealing with a change.
*/
- return MAX_SCORE * num / scale;
+ if (one->sha1_valid && two->sha1_valid &&
+ !memcmp(one->sha1, two->sha1, sizeof(one->sha1)))
+ return 1; /* no change */
+ if (!one->sha1_valid && !two->sha1_valid)
+ return 1; /* both look at the same file on the filesystem. */
+ return 0;
}
-void diff_setup(int detect_rename_, int minimum_score_, int reverse_diff_,
- int diff_raw_output_,
- const char **pathspec_, int speccnt_)
+static void diff_flush_one(struct diff_file_pair *p)
{
- free_held_diff(createdfile);
- free_held_diff(deletedfile);
- createdfile = deletedfile = NULL;
-
- detect_rename = detect_rename_;
- reverse_diff = reverse_diff_;
- pathspec = pathspec_;
- diff_raw_output = diff_raw_output_;
- speccnt = speccnt_;
- minimum_score = minimum_score_ ? : DEFAULT_MINIMUM_SCORE;
+ if (identical(p->one, p->two))
+ return;
+ if (0 <= diff_raw_output)
+ diff_flush_raw(p);
+ else
+ diff_flush_patch(p);
}
-static const char *git_object_type(unsigned mode)
+void diff_flush(void)
{
- return S_ISDIR(mode) ? "tree" : "blob";
+ struct diff_queue_struct *q = &queued_diff;
+ int i;
+
+ if (detect_rename)
+ diff_detect_rename(q, detect_rename, minimum_score);
+ for (i = 0; i < q->nr; i++)
+ diff_flush_one(q->queue[i]);
+
+ for (i = 0; i < q->nr; i++) {
+ struct diff_file_pair *p = q->queue[i];
+ diff_free_filespec_data(p->one);
+ diff_free_filespec_data(p->two);
+ free(p->xfrm_msg);
+ free(p);
+ }
+ free(q->queue);
+ q->queue = NULL;
+ q->nr = q->alloc = 0;
}
void diff_addremove(int addremove, unsigned mode,
@@ -754,41 +660,35 @@ void diff_addremove(int addremove, unsig
const char *base, const char *path)
{
char concatpath[PATH_MAX];
- struct diff_spec spec[2], *one, *two;
+ struct diff_filespec *one, *two;
+ /* This may look odd, but it is a preparation for
+ * feeding "there are unchanged files which should
+ * not produce diffs, but when you are doing copy
+ * detection you would need them, so here they are"
+ * entries to the diff-core. They will be prefixed
+ * with something like '=' or '*' (I haven't decided
+ * which but should not make any difference).
+ * Feeding the same new and old to diff_change() should
+ * also have the same effect. diff_flush() should
+ * filter the identical ones out at the final output
+ * stage.
+ */
if (reverse_diff)
- addremove = (addremove == '+' ? '-' : '+');
-
- if (0 <= diff_raw_output) {
- if (!path)
- path = "";
- printf("%c%06o %s %s %s%s%c",
- addremove,
- mode,
- git_object_type(mode), sha1_to_hex(sha1),
- base, path, diff_raw_output);
- return;
- }
- if (S_ISDIR(mode))
- return;
+ addremove = (addremove == '+' ? '-' :
+ addremove == '-' ? '+' : addremove);
- memcpy(spec[0].blob_sha1, sha1, 20);
- spec[0].mode = mode;
- spec[0].sha1_valid = !!memcmp(sha1, null_sha1, 20);
- spec[0].file_valid = 1;
- spec[1].file_valid = 0;
-
- if (addremove == '+') {
- one = spec + 1; two = spec;
- } else {
- one = spec; two = one + 1;
- }
+ if (!path) path = "";
+ sprintf(concatpath, "%s%s", base, path);
+ one = alloc_filespec(concatpath);
+ two = alloc_filespec(concatpath);
+
+ if (addremove != '+')
+ fill_filespec(one, sha1, mode);
+ if (addremove != '-')
+ fill_filespec(two, sha1, mode);
- if (path) {
- strcpy(concatpath, base);
- strcat(concatpath, path);
- }
- hold_diff(path ? concatpath : base, one, two);
+ diff_queue(&queued_diff, one, two);
}
void diff_change(unsigned old_mode, unsigned new_mode,
@@ -796,7 +696,7 @@ void diff_change(unsigned old_mode, unsi
const unsigned char *new_sha1,
const char *base, const char *path) {
char concatpath[PATH_MAX];
- struct diff_spec spec[2];
+ struct diff_filespec *one, *two;
if (reverse_diff) {
unsigned tmp;
@@ -804,41 +704,14 @@ void diff_change(unsigned old_mode, unsi
tmp = old_mode; old_mode = new_mode; new_mode = tmp;
tmp_c = old_sha1; old_sha1 = new_sha1; new_sha1 = tmp_c;
}
+ if (!path) path = "";
+ sprintf(concatpath, "%s%s", base, path);
+ one = alloc_filespec(concatpath);
+ two = alloc_filespec(concatpath);
+ fill_filespec(one, old_sha1, old_mode);
+ fill_filespec(two, new_sha1, new_mode);
- if (0 <= diff_raw_output) {
- char old_hex[41];
- strcpy(old_hex, sha1_to_hex(old_sha1));
-
- if (!path)
- path = "";
- printf("*%06o->%06o %s %s->%s %s%s%c",
- old_mode, new_mode,
- git_object_type(new_mode),
- old_hex, sha1_to_hex(new_sha1),
- base, path, diff_raw_output);
- return;
- }
- if (S_ISDIR(new_mode))
- return;
-
- if (path) {
- strcpy(concatpath, base);
- strcat(concatpath, path);
- }
-
- memcpy(spec[0].blob_sha1, old_sha1, 20);
- spec[0].mode = old_mode;
- memcpy(spec[1].blob_sha1, new_sha1, 20);
- spec[1].mode = new_mode;
- spec[0].sha1_valid = !!memcmp(old_sha1, null_sha1, 20);
- spec[1].sha1_valid = !!memcmp(new_sha1, null_sha1, 20);
- spec[1].file_valid = spec[0].file_valid = 1;
-
- /* We do not look at changed files as candidate for
- * rename detection ever.
- */
- run_external_diff(path ? concatpath : base, NULL,
- &spec[0], &spec[1], -1);
+ diff_queue(&queued_diff, one, two);
}
void diff_unmerge(const char *path)
@@ -847,5 +720,5 @@ void diff_unmerge(const char *path)
printf("U %s%c", path, diff_raw_output);
return;
}
- run_external_diff(path, NULL, NULL, NULL, -1);
+ run_external_diff(path, NULL, NULL, NULL, NULL);
}
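The scoring arithmetic that this diff removes from diff.c (and that reappears in diffcore-rename.c) can be tried in isolation. The sketch below takes the delta size as a parameter instead of calling diff_delta(), so it only illustrates the arithmetic. The function name sketch_similarity_score, the early return for two empty files, and the unsigned clamping are assumptions added for the example. MAX_SCORE is assumed to stay at 10000.

```c
/* Assumption: same value as the "#define MAX_SCORE 10000" removed from diff.c. */
#define MAX_SCORE 10000

/*
 * Hypothetical helper: delta_size stands in for the size that
 * diff_delta() would report for the src -> dst delta.
 */
static int sketch_similarity_score(unsigned long src_size,
				   unsigned long dst_size,
				   unsigned long delta_size,
				   int minimum_score)
{
	unsigned long total = src_size + dst_size;
	unsigned long size_diff = (src_size < dst_size) ?
		(dst_size - src_size) : (src_size - dst_size);
	unsigned long penalty;

	/* Assumption: two empty files count as identical; this also
	 * avoids dividing by zero below. */
	if (total == 0)
		return MAX_SCORE;

	/* Cheap pre-filter on the size difference alone, using the
	 * same comparison as the patch, before any delta is computed. */
	if (total * (unsigned long)minimum_score < size_diff * MAX_SCORE * 2)
		return 0;

	/* Every 1/MAX_SCORE of edit costs one point. */
	penalty = MAX_SCORE * 2UL * delta_size / total;
	if (penalty >= MAX_SCORE)
		return 0;
	return (int)(MAX_SCORE - penalty);
}
```

With two 1000-byte files and a 100-byte delta the penalty is 10000 * 2 * 100 / 2000 = 1000, giving a score of 9000. A 100-byte file against a 1000-byte one is rejected by the pre-filter at the default threshold of 5000.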
diff --git a/diffcore-rename.c b/diffcore-rename.c
new file mode 100644
--- /dev/null
+++ b/diffcore-rename.c
@@ -0,0 +1,443 @@
+/*
+ * Copyright (C) 2005 Junio C Hamano
+ */
+#include "cache.h"
+#include "diff.h"
+#include "diffcore.h"
+#include "delta.h"
+
+struct diff_rename_pool {
+ struct diff_filespec **s;
+ int nr, alloc;
+};
+
+static void diff_rename_pool_clear(struct diff_rename_pool *pool)
+{
+ pool->s = NULL; pool->nr = pool->alloc = 0;
+}
+
+static void diff_rename_pool_add(struct diff_rename_pool *pool,
+ struct diff_filespec *s)
+{
+ if (S_ISDIR(s->mode))
+ return; /* rename/copy patch for tree does not make sense. */
+
+ if (pool->alloc <= pool->nr) {
+ pool->alloc = alloc_nr(pool->alloc);
+ pool->s = xrealloc(pool->s,
+ sizeof(*(pool->s)) * pool->alloc);
+ }
+ pool->s[pool->nr] = s;
+ pool->nr++;
+}
+
+static int is_exact_match(struct diff_filespec *src, struct diff_filespec *dst)
+{
+ if (src->sha1_valid && dst->sha1_valid &&
+ !memcmp(src->sha1, dst->sha1, 20))
+ return 1;
+ if (diff_populate_filespec(src) || diff_populate_filespec(dst))
+ /* this is an error but will be caught downstream */
+ return 0;
+ if (src->size == dst->size &&
+ !memcmp(src->data, dst->data, src->size))
+ return 1;
+ return 0;
+}
+
+struct diff_score {
+ struct diff_filespec *src;
+ struct diff_filespec *dst;
+ int score;
+ int rank;
+};
+
+static int estimate_similarity(struct diff_filespec *src,
+ struct diff_filespec *dst,
+ int minimum_score)
+{
+ /* src points at a file that existed in the original tree (or
+ * optionally a file in the destination tree) and dst points
+ * at a newly created file. They may be quite similar, in which
+ * case we want to say src is renamed to dst or src is copied into
+ * dst, and then some edit has been applied to dst.
+ *
+ * Compare them and return how similar they are, representing
+ * the score as an integer between 0 and 10000, except
+ * where they match exactly it is considered better than anything
+ * else.
+ */
+ void *delta;
+ unsigned long delta_size;
+ int score;
+
+ delta_size = ((src->size < dst->size) ?
+ (dst->size - src->size) : (src->size - dst->size));
+
+ /* We would not consider rename followed by more than
+ * minimum_score/MAX_SCORE edits; that is, delta_size must be smaller
+ * than (src->size + dst->size)/2 * minimum_score/MAX_SCORE,
+ * which means...
+ */
+
+ if ((src->size+dst->size)*minimum_score < delta_size*MAX_SCORE*2)
+ return 0;
+
+ delta = diff_delta(src->data, src->size,
+ dst->data, dst->size,
+ &delta_size);
+ free(delta);
+
+ /* This "delta" is really xdiff with adler32 and all the
+ * overheads but it is a quick and dirty approximation.
+ *
+ * Now we will give some score to it. 100% edit gets
+ * 0 points and 0% edit gets MAX_SCORE points. That is, every
+ * 1/MAX_SCORE edit gets 1 point penalty. The amount of penalty is:
+ *
+ * (delta_size * 2 / (src->size + dst->size)) * MAX_SCORE
+ *
+ */
+ score = MAX_SCORE-(MAX_SCORE*2*delta_size/(src->size+dst->size));
+ if (score < 0) return 0;
+ if (MAX_SCORE < score) return MAX_SCORE;
+ return score;
+}
+
+static void record_rename_pair(struct diff_queue_struct *outq,
+ struct diff_filespec *src,
+ struct diff_filespec *dst,
+ int rank,
+ int score)
+{
+ /* The rank is used to sort the final output, because there
+ * are certain dependencies.
+ *
+ * - rank #0 depends on deleted ones.
+ * - rank #1 depends on kept files before they are modified.
+ * - rank #2 depends on kept files after they are modified;
+ * currently not used.
+ *
+ * Therefore, the final output order should be:
+ *
+ * 1. rank #0 rename/copy diffs.
+ * 2. deletions in the original.
+ * 3. rank #1 rename/copy diffs.
+ * 4. additions and modifications in the original.
+ * 5. rank #2 rename/copy diffs; currently not used.
+ *
+ * To achieve this sort order, we give xfrm_work the number
+ * above.
+ */
+ struct diff_file_pair *dp = diff_queue(outq, src, dst);
+ dp->xfrm_work = (rank * 2 + 1) | (score<<RENAME_SCORE_SHIFT);
+ dst->xfrm_flags |= RENAME_DST_MATCHED;
+}
+
+#if 0
+static void debug_filespec(struct diff_filespec *s, int x, const char *one)
+{
+ fprintf(stderr, "queue[%d] %s (%s) %s %06o %s\n",
+ x, one,
+ s->path,
+ s->file_valid ? "valid" : "invalid",
+ s->mode,
+ s->sha1_valid ? sha1_to_hex(s->sha1) : "");
+ fprintf(stderr, "queue[%d] %s size %lu flags %d\n",
+ x, one,
+ s->size, s->xfrm_flags);
+}
+
+static void debug_filepair(const struct diff_file_pair *p, int i)
+{
+ debug_filespec(p->one, i, "one");
+ debug_filespec(p->two, i, "two");
+ fprintf(stderr, "pair flags %d, orig order %d, score %d\n",
+ (p->xfrm_work & ((1<<RENAME_SCORE_SHIFT) - 1)),
+ p->orig_order,
+ (p->xfrm_work >> RENAME_SCORE_SHIFT));
+}
+
+static void debug_queue(const char *msg, struct diff_queue_struct *q)
+{
+ int i;
+ if (msg)
+ fprintf(stderr, "%s\n", msg);
+ fprintf(stderr, "q->nr = %d\n", q->nr);
+ for (i = 0; i < q->nr; i++) {
+ struct diff_file_pair *p = q->queue[i];
+ debug_filepair(p, i);
+ }
+}
+#else
+#define debug_queue(a,b) do { ; /*nothing*/ } while(0)
+#endif
+
+/*
+ * We sort the outstanding diff entries according to the rank (see
+ * comment at the beginning of record_rename_pair) and tiebreak with
+ * the order in the original input.
+ */
+static int rank_compare(const void *a_, const void *b_)
+{
+ const struct diff_file_pair *a = *(const struct diff_file_pair **)a_;
+ const struct diff_file_pair *b = *(const struct diff_file_pair **)b_;
+ int a_rank = a->xfrm_work & ((1<<RENAME_SCORE_SHIFT) - 1);
+ int b_rank = b->xfrm_work & ((1<<RENAME_SCORE_SHIFT) - 1);
+
+ if (a_rank != b_rank)
+ return a_rank - b_rank;
+ return a->orig_order - b->orig_order;
+}
+
+/*
+ * We sort the rename similarity matrix with the score, in descending
+ * order (more similar first).
+ */
+static int score_compare(const void *a_, const void *b_)
+{
+ const struct diff_score *a = a_, *b = b_;
+ return b->score - a->score;
+}
+
+static int needs_to_stay(struct diff_queue_struct *q, int i,
+ struct diff_filespec *it)
+{
+ /* If it will be used in a later entry (either stay or used
+ * as the source of rename/copy), we need to copy, not rename.
+ */
+ while (i < q->nr) {
+ struct diff_file_pair *p = q->queue[i++];
+ if (!p->two->file_valid)
+ continue; /* removed is fine */
+ if (strcmp(p->one->path, it->path))
+ continue; /* not relevant */
+
+ /* p has its src set to *it and it is not a delete;
+ * it will be used for in-place change or rename/copy,
+ * so we cannot rename it out.
+ */
+ return 1;
+ }
+ return 0;
+}
+
+void diff_detect_rename(struct diff_queue_struct *q,
+ int detect_rename,
+ int minimum_score)
+{
+ struct diff_queue_struct outq;
+ struct diff_rename_pool created, deleted, stay;
+ struct diff_rename_pool *(srcs[2]);
+ struct diff_score *mx;
+ int h, i, j;
+ int num_create, num_src, dst_cnt, src_cnt;
+
+ outq.queue = NULL;
+ outq.nr = outq.alloc = 0;
+
+ diff_rename_pool_clear(&created);
+ diff_rename_pool_clear(&deleted);
+ diff_rename_pool_clear(&stay);
+
+ srcs[0] = &deleted;
+ srcs[1] = &stay;
+
+ /* NEEDSWORK:
+ * (1) make sure we properly ignore but pass trees.
+ *
+ * (2) make sure we do the right thing on the same path deleted
+ * and created in the same patch.
+ */
+
+ for (i = 0; i < q->nr; i++) {
+ struct diff_file_pair *p = q->queue[i];
+ if (!p->one->file_valid)
+ if (!p->two->file_valid)
+ continue; /* ignore nonsense */
+ else
+ diff_rename_pool_add(&created, p->two);
+ else if (!p->two->file_valid)
+ diff_rename_pool_add(&deleted, p->one);
+ else if (1 < detect_rename) /* find copy, too */
+ diff_rename_pool_add(&stay, p->one);
+ }
+ if (created.nr == 0)
+ goto cleanup; /* nothing to do */
+
+ /* We really want to cull the candidates list early
+ * with cheap tests in order to avoid doing deltas.
+ *
+ * With the current callers, we should not have already
+ * matched entries at this point, but it is nonetheless
+ * checked for sanity.
+ */
+ for (i = 0; i < created.nr; i++) {
+ if (created.s[i]->xfrm_flags & RENAME_DST_MATCHED)
+ continue; /* we have matched exactly already */
+ for (h = 0; h < sizeof(srcs)/sizeof(srcs[0]); h++) {
+ struct diff_rename_pool *p = srcs[h];
+ for (j = 0; j < p->nr; j++) {
+ if (!is_exact_match(p->s[j], created.s[i]))
+ continue;
+ record_rename_pair(&outq,
+ p->s[j], created.s[i], h,
+ MAX_SCORE);
+ break; /* we are done with this entry */
+ }
+ }
+ }
+ debug_queue("done detecting exact", &outq);
+
+ /* Have we run out the created file pool? If so we can avoid
+ * doing the delta matrix altogether.
+ */
+ if (outq.nr == created.nr)
+ goto flush_rest;
+
+ num_create = (created.nr - outq.nr);
+ num_src = deleted.nr + stay.nr;
+ mx = xmalloc(sizeof(*mx) * num_create * num_src);
+ for (dst_cnt = i = 0; i < created.nr; i++) {
+ int base = dst_cnt * num_src;
+ if (created.s[i]->xfrm_flags & RENAME_DST_MATCHED)
+ continue; /* dealt with exact match already. */
+ for (src_cnt = h = 0; h < sizeof(srcs)/sizeof(srcs[0]); h++) {
+ struct diff_rename_pool *p = srcs[h];
+ for (j = 0; j < p->nr; j++, src_cnt++) {
+ struct diff_score *m = &mx[base + src_cnt];
+ m->src = p->s[j];
+ m->dst = created.s[i];
+ m->score = estimate_similarity(m->src, m->dst,
+ minimum_score);
+ m->rank = h;
+ }
+ }
+ dst_cnt++;
+ }
+ /* cost matrix sorted by most to least similar pair */
+ qsort(mx, num_create * num_src, sizeof(*mx), score_compare);
+ for (i = 0; i < num_create * num_src; i++) {
+ if (mx[i].dst->xfrm_flags & RENAME_DST_MATCHED)
+ continue; /* already done, either exact or fuzzy. */
+ if (mx[i].score < minimum_score)
+ continue;
+ record_rename_pair(&outq,
+ mx[i].src, mx[i].dst, mx[i].rank,
+ mx[i].score);
+ }
+ free(mx);
+ debug_queue("done detecting fuzzy", &outq);
+
+ flush_rest:
+ /* At this point, we have found some renames and copies and they
+ * are kept in outq. The original list is still in *q.
+ *
+ * Scan the original list and move them into the outq; we will sort
+ * outq and swap it into the queue supplied to pass that to
+ * downstream, so we assign the sort keys in this loop.
+ *
+ * See comments at the top of record_rename_pair for numbers used
+ * to assign xfrm_work.
+ *
+ * Note that we have not annotated the diff_file_pair with any comment
+ * so there is nothing other than p to free.
+ */
+ for (i = 0; i < q->nr; i++) {
+ struct diff_file_pair *dp, *p = q->queue[i];
+ if (!p->one->file_valid) {
+ if (p->two->file_valid) {
+ /* creation */
+ dp = diff_queue(&outq, p->one, p->two);
+ dp->xfrm_work = 4;
+ }
+ /* otherwise it is a nonsense; just ignore it */
+ }
+ else if (!p->two->file_valid) {
+ /* deletion */
+ dp = diff_queue(&outq, p->one, p->two);
+ dp->xfrm_work = 2;
+ }
+ else {
+ /* modification, or stay as is */
+ dp = diff_queue(&outq, p->one, p->two);
+ dp->xfrm_work = 4;
+ }
+ free(p);
+ }
+ debug_queue("done copying original", &outq);
+
+ /* Sort outq */
+ qsort(outq.queue, outq.nr, sizeof(outq.queue[0]), rank_compare);
+
+ debug_queue("done sorting", &outq);
+
+ free(q->queue);
+ q->nr = q->alloc = 0;
+ q->queue = NULL;
+
+ /* Copy it out to q, removing duplicates. */
+ for (i = 0; i < outq.nr; i++) {
+ struct diff_file_pair *p = outq.queue[i];
+ if (!p->one->file_valid) {
+ /* created */
+ if (p->two->xfrm_flags & RENAME_DST_MATCHED)
+ ; /* rename/copy created it already */
+ else
+ diff_queue(q, p->one, p->two);
+ }
+ else if (!p->two->file_valid) {
+ /* deleted */
+ if (p->one->xfrm_flags & RENAME_SRC_GONE)
+ ; /* rename/copy deleted it already */
+ else
+ diff_queue(q, p->one, p->two);
+ }
+ else if (strcmp(p->one->path, p->two->path)) {
+ /* rename or copy */
+ struct diff_file_pair *dp =
+ diff_queue(q, p->one, p->two);
+ int msglen = (strlen(p->one->path) +
+ strlen(p->two->path) + 100);
+ int score = (p->xfrm_work >> RENAME_SCORE_SHIFT);
+ dp->xfrm_msg = xmalloc(msglen);
+
+ /* if we have a later entry that is a rename/copy
+ * that depends on p->one, then we copy here.
+ * otherwise we rename it.
+ */
+ if (needs_to_stay(&outq, i+1, p->one)) {
+ /* copy it */
+ sprintf(dp->xfrm_msg,
+ "similarity index %d%%\n"
+ "copy from %s\n"
+ "copy to %s\n",
+ (int)(0.5 + score * 100.0 / MAX_SCORE),
+ p->one->path, p->two->path);
+ }
+ else {
+ /* rename it, and mark it as gone. */
+ p->one->xfrm_flags |= RENAME_SRC_GONE;
+ sprintf(dp->xfrm_msg,
+ "similarity index %d%%\n"
+ "rename old %s\n"
+ "rename new %s\n",
+ (int)(0.5 + score * 100.0 / MAX_SCORE),
+ p->one->path, p->two->path);
+ }
+ }
+ else
+ /* otherwise it is a modified (or stayed) entry */
+ diff_queue(q, p->one, p->two);
+ free(p);
+ }
+
+ free(outq.queue);
+ debug_queue("done collapsing", q);
+
+ cleanup:
+ free(created.s);
+ free(deleted.s);
+ free(stay.s);
+ return;
+}
diff --git a/diffcore.h b/diffcore.h
new file mode 100644
--- /dev/null
+++ b/diffcore.h
@@ -0,0 +1,60 @@
+/*
+ * Copyright (C) 2005 Junio C Hamano
+ */
+#ifndef _DIFFCORE_H_
+#define _DIFFCORE_H_
+
+/* This header file is internal between diff.c and its diff transformers
+ * (e.g. diffcore-rename, diffcore-pickaxe). Never include this header
+ * in anything else.
+ */
+#define MAX_SCORE 10000
+#define DEFAULT_MINIMUM_SCORE 5000
+
+#define RENAME_DST_MATCHED 01
+#define RENAME_SRC_GONE 02
+#define RENAME_SCORE_SHIFT 8
+
+struct diff_filespec {
+ unsigned char sha1[20];
+ char *path;
+ void *data;
+ unsigned long size;
+ int xfrm_flags; /* for use by the xfrm */
+ unsigned short mode; /* file mode */
+ unsigned sha1_valid : 1; /* if true, use sha1 and trust mode;
+ * if false, use the name and read from
+ * the filesystem.
+ */
+ unsigned file_valid : 1; /* if false the file does not exist */
+ unsigned should_free : 1; /* data should be free()'ed */
+ unsigned should_munmap : 1; /* data should be munmap()'ed */
+};
+
+extern struct diff_filespec *alloc_filespec(const char *);
+extern void fill_filespec(struct diff_filespec *, const unsigned char *,
+ unsigned short);
+
+extern int diff_populate_filespec(struct diff_filespec *);
+extern void diff_free_filespec_data(struct diff_filespec *);
+
+struct diff_file_pair {
+ struct diff_filespec *one;
+ struct diff_filespec *two;
+ char *xfrm_msg;
+ int orig_order; /* the original order of insertion into the queue */
+ int xfrm_work; /* for use by transformers, not by diffcore */
+};
+
+struct diff_queue_struct {
+ struct diff_file_pair **queue;
+ int alloc;
+ int nr;
+};
+
+extern struct diff_file_pair *diff_queue(struct diff_queue_struct *,
+ struct diff_filespec *,
+ struct diff_filespec *);
+extern void diff_detect_rename(struct diff_queue_struct *, int, int);
+
+#endif
diff --git a/git-apply-patch-script b/git-apply-patch-script
--- a/git-apply-patch-script
+++ b/git-apply-patch-script
@@ -11,7 +11,7 @@ case "$#" in
1)
echo >&2 "cannot handle unmerged diff on path $1."
exit 1 ;;
-8)
+8 | 9)
echo >&2 "cannot handle rename diff between $1 and $8 yet."
exit 1 ;;
esac
diff --git a/t/t0000-basic.sh b/t/t0000-basic.sh
--- a/t/t0000-basic.sh
+++ b/t/t0000-basic.sh
@@ -32,7 +32,7 @@ test_expect_success \
find .git/objects -type d -print >full-of-directories
test_expect_success \
'.git/objects should have 256 subdirectories.' \
- 'test "$(wc -l full-of-directories | sed -e "s/ .*//")" = 257'
+ 'test $(cat full-of-directories | wc -l) = 257'
################################################################
# Basics of the basics
diff --git a/t/t4001-diff-rename.sh b/t/t4001-diff-rename.sh
--- a/t/t4001-diff-rename.sh
+++ b/t/t4001-diff-rename.sh
@@ -31,7 +31,7 @@ test_expect_success \
test_expect_success \
'write that tree.' \
- 'tree=$(git-write-tree)'
+ 'tree=$(git-write-tree) && echo $tree'
sed -e 's/line/Line/' <path0 >path1
rm -f path0
@@ -61,6 +61,6 @@ EOF
test_expect_success \
'validate the output.' \
- 'diff -I "rename similarity.*" >/dev/null current expected'
+ 'diff -I "similarity.*" >/dev/null current expected'
test_done
diff --git a/t/t4003-diff-rename-1.sh b/t/t4003-diff-rename-1.sh
new file mode 100755
--- /dev/null
+++ b/t/t4003-diff-rename-1.sh
@@ -0,0 +1,93 @@
+#!/bin/sh
+#
+# Copyright (c) 2005 Junio C Hamano
+#
+
+test_description='More rename detection
+
+'
+. ./test-lib.sh
+
+test_expect_success \
+ 'prepare reference tree' \
+ 'cat ../../COPYING >COPYING &&
+ git-update-cache --add COPYING &&
+ tree=$(git-write-tree) &&
+ echo $tree'
+
+test_expect_success \
+ 'prepare work tree' \
+ 'sed -e 's/HOWEVER/However/' <COPYING >COPYING.1 &&
+ sed -e 's/GPL/G.P.L/g' <COPYING >COPYING.2 &&
+ rm -f COPYING &&
+ git-update-cache --add --remove COPYING COPYING.?'
+
+GIT_DIFF_OPTS=-u0 git-diff-cache -M $tree |
+sed -e 's/\([0-9][0-9]*\)/#/g' >current &&
+cat >expected <<\EOF
+diff --git a/COPYING b/COPYING.#
+similarity index #%
+copy from COPYING
+copy to COPYING.#
+--- a/COPYING
++++ b/COPYING.#
+@@ -# +# @@
+- HOWEVER, in order to allow a migration to GPLv# if that seems like
++ However, in order to allow a migration to GPLv# if that seems like
+diff --git a/COPYING b/COPYING.#
+similarity index #%
+rename old COPYING
+rename new COPYING.#
+--- a/COPYING
++++ b/COPYING.#
+@@ -# +# @@
+- Note that the only valid version of the GPL as far as this project
++ Note that the only valid version of the G.P.L as far as this project
+@@ -# +# @@
+- HOWEVER, in order to allow a migration to GPLv# if that seems like
++ HOWEVER, in order to allow a migration to G.P.Lv# if that seems like
+@@ -# +# @@
+- This file is licensed under the GPL v#, or a later version
++ This file is licensed under the G.P.L v#, or a later version
+EOF
+
+test_expect_success \
+ 'validate output from rename/copy detection' \
+ 'diff -u current expected'
+
+test_expect_success \
+ 'prepare work tree again' \
+ 'mv COPYING.2 COPYING &&
+ git-update-cache --add --remove COPYING COPYING.1'
+
+GIT_DIFF_OPTS=-u0 git-diff-cache -C $tree |
+sed -e 's/\([0-9][0-9]*\)/#/g' >current
+cat >expected <<\EOF
+diff --git a/COPYING b/COPYING.#
+similarity index #%
+copy from COPYING
+copy to COPYING.#
+--- a/COPYING
++++ b/COPYING.#
+@@ -# +# @@
+- HOWEVER, in order to allow a migration to GPLv# if that seems like
++ However, in order to allow a migration to GPLv# if that seems like
+diff --git a/COPYING b/COPYING
+--- a/COPYING
++++ b/COPYING
+@@ -# +# @@
+- Note that the only valid version of the GPL as far as this project
++ Note that the only valid version of the G.P.L as far as this project
+@@ -# +# @@
+- HOWEVER, in order to allow a migration to GPLv# if that seems like
++ HOWEVER, in order to allow a migration to G.P.Lv# if that seems like
+@@ -# +# @@
+- This file is licensed under the GPL v#, or a later version
++ This file is licensed under the G.P.L v#, or a later version
+EOF
+
+test_expect_success \
+ 'validate output from rename/copy detection' \
+ 'diff -u current expected'
+
+test_done
------------------------------------------------
^ permalink raw reply
* Re: [PATCH] Detect renames in diff family.
From: Junio C Hamano @ 2005-05-21 9:37 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.58.0505190901340.2322@ppc970.osdl.org>
>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:
LT> [ rambling mode on: ]
LT> One thing that struck me is that there is nothing wrong with having the
LT> same old file marked twice for a rename, or considering new files to be
LT> copies of old files. So if we ever allow that, then "rename" may be the
LT> wrong name for this, since the logic certainly allows the old file to
LT> still exist (or be removed and show up multiple times in a new guise).
People say be careful what you wish for, and for a reason. You may
get it ;-). I am sending the following:
[PATCH 1/3] Diff overhaul, adding half of copy detection.
[PATCH 2/3] Introducing software archaeologist's tool "pickaxe".
[PATCH 3/3] Diff overhaul, adding the other half of copy detection.
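To make the copy-versus-rename decision concrete, here is a minimal sketch, not the code from the patches. It assumes the matched pairs are already in output order; struct name_pair, source_still_needed() and classify_pair() are hypothetical names used only for illustration. A pair is reported as a copy when a later pair still reads from the same source path (including the source staying in place under its own name), and as a rename otherwise.

```c
#include <string.h>

/* Hypothetical, simplified stand-in for a matched (source, destination) pair. */
struct name_pair {
	const char *src;	/* path in the old tree */
	const char *dst;	/* path in the new tree */
};

/* Return 1 if any pair at index >= start still reads from src. */
static int source_still_needed(const struct name_pair *q, int nr,
			       int start, const char *src)
{
	int i;
	for (i = start; i < nr; i++)
		if (!strcmp(q[i].src, src))
			return 1;
	return 0;
}

/* 'C' for copy, 'R' for rename, 'M' when the path itself is unchanged. */
static char classify_pair(const struct name_pair *q, int nr, int i)
{
	if (!strcmp(q[i].src, q[i].dst))
		return 'M';
	return source_still_needed(q, nr, i + 1, q[i].src) ? 'C' : 'R';
}
```

With the pairs (COPYING, COPYING.1) and (COPYING, COPYING.2) in that order, the first comes out as a copy and the second as a rename, which is the shape the t4003 test expects for -M.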
^ permalink raw reply
* [PATCH] Adding limits.h to cache.h in order to compile under Solaris
From: Thomas Glanzmann @ 2005-05-21 9:13 UTC (permalink / raw)
To: GIT
diff-tree 1ede81fa79aa5bd656f2b2aae3541719d306698d (from 559967c6d4fa3bab269d4a22d2db23f70e0156b7)
Author: Thomas Glanzmann <sithglan@stud.uni-erlangen.de>
Date: Sat May 21 10:50:22 2005 +0200
Adding limits.h to cache.h in order to compile under Solaris
diff --git a/cache.h b/cache.h
--- a/cache.h
+++ b/cache.h
@@ -2,6 +2,7 @@
#define CACHE_H
#include <unistd.h>
+#include <limits.h>
#include <stdio.h>
#include <sys/stat.h>
#include <fcntl.h>
^ permalink raw reply
* cvs->git (was Re: gitweb wishlist)
From: Matthias Urlichs @ 2005-05-21 7:35 UTC (permalink / raw)
To: git
In-Reply-To: <Pine.LNX.4.58.0505201702170.2206@ppc970.osdl.org>
Hi, Linus Torvalds wrote:
> Bah. What crud.
I have an old CVS->BK merge script lying around which does all of this
reasonably correctly. If somebody wants to git-ify it, be my guest
(I won't have time in the foreseeable future :-( ); it's in the
rsync://server.smurf.noris.de/sourcemgr.git/#main repository as
bin/b.cvs{,.pl}.
--
Matthias Urlichs | {M:U} IT Design @ m-u-it.de | smurf@smurf.noris.de
^ permalink raw reply
* Re: gitweb wishlist
From: Matthias Urlichs @ 2005-05-21 7:29 UTC (permalink / raw)
To: git
In-Reply-To: <1116611932.12975.22.camel@dhcp-188>
Hi, Kay Sievers wrote:
> Something like that: :)
Cool.
More feature requests: ;-)
- Alternate white and almost-white backgrounds in the lists (all of them ;-)
so that wide-screened people like me don't lose context when their eyes
travel the long road from left to right edge of the screen. ;-)
- Merges currently don't have diff links. It'd be nice to have one for
each parent.
- File diffs have the "diff" link on the *parent*, not on the child.
That's counter-intuitive -- if I want to see what the Foo patch changes,
I should be able to click on the "diff" link on _that_ line, not the one
below it. Example:
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=history;h=9636273dae265b9354b861b373cd43cd76a6d0fe;f=MAINTAINERS
--
Matthias Urlichs | {M:U} IT Design @ m-u-it.de | smurf@smurf.noris.de
^ permalink raw reply
* Re: gitk-1.0 released
From: Jon Seymour @ 2005-05-21 6:40 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Kari Hameenaho, git
In-Reply-To: <Pine.LNX.4.58.0505201150220.2206@ppc970.osdl.org>
> - mark everything reachable from OLD_HEAD as being uninteresting (aka
> "seen"), and everything that reaches OLD_HEAD as being interesting
> and print it out.
Won't this step end up traversing back to the root anyway?
jon.
--
homepage: http://www.zeta.org.au/~jon/
blog: http://orwelliantremors.blogspot.com/
^ permalink raw reply
* Re: [RFC] git-fsck-cache argument processing
From: Junio C Hamano @ 2005-05-21 5:59 UTC (permalink / raw)
To: Jeff Garzik; +Cc: git, Sean
In-Reply-To: <20050521051536.GA9387@havoc.gtf.org>
>>>>> "JG" == Jeff Garzik <jgarzik@pobox.com> writes:
JG> On Fri, May 20, 2005 at 10:08:19PM -0700, Junio C Hamano wrote:
>> The patch looks good. Before you proceed to convert the rest,
>> could I ask you to first let us see the list of new set of
>> options and semantics changes, if any ("checkout-cache -f -a" vs
>> "checkout-cache -a -f" immediately comes to mind)?
JG> FWIW most users (including me!) would expect that order of
JG> options is -not- significant.
That set of users includes me. I was hoping that this round
would not just change things to use argp, but at the same time
attempt to "fix" the problems around argument parsing.
JG> Typical implementation is agnostic on the ordering of
JG> options, but with a few lines of code in parse_opt() that
JG> need not always be the case.
I think you are responding to my "semantic changes" question,
but I did not mean to say that exactly emulating the current
behaviour is the requirement, nor did I mean to ask if doing so is
impossible using argp(). I just wanted to see "the set of
planned fixes" while we are at it.
^ permalink raw reply
* Re: [RFC] git-fsck-cache argument processing
From: Jeff Garzik @ 2005-05-21 5:15 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Sean, git
In-Reply-To: <7voeb5np30.fsf@assigned-by-dhcp.cox.net>
On Fri, May 20, 2005 at 10:08:19PM -0700, Junio C Hamano wrote:
> The patch looks good. Before you proceed to convert the rest,
> could I ask you to first let us see the list of new set of
> options and semantics changes, if any ("checkout-cache -f -a" vs
> "checkout-cache -a -f" immediately comes to mind)?
Typical implementation is agnostic on the ordering of options,
but with a few lines of code in parse_opt() that need not always be the
case.
FWIW most users (including me!) would expect that order of options is
-not- significant.
Jeff
^ permalink raw reply
* Re: [RFC] git-fsck-cache argument processing
From: Jeff Garzik @ 2005-05-21 5:09 UTC (permalink / raw)
To: Sean; +Cc: git
In-Reply-To: <4966.10.10.10.24.1116650175.squirrel@linux1>
Patch looks good to me...
^ permalink raw reply
* Re: [RFC] git-fsck-cache argument processing
From: Junio C Hamano @ 2005-05-21 5:08 UTC (permalink / raw)
To: Sean; +Cc: git
In-Reply-To: <4870.10.10.10.24.1116646732.squirrel@linux1>
The patch looks good. Before you proceed to convert the rest,
could I ask you to first let us see the list of new set of
options and semantics changes, if any ("checkout-cache -f -a" vs
"checkout-cache -a -f" immediately comes to mind)?
Presumably you would be doing the Documentation updates as well,
so starting from the documentation updates before writing the
actual code may be a good way for us to understand and ack on
what is going to happen.
^ permalink raw reply
* Re: [PATCH 3/3] delta creation
From: Junio C Hamano @ 2005-05-21 5:02 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Linus Torvalds, git
In-Reply-To: <Pine.LNX.4.62.0505201659190.4397@localhost.localdomain>
Just a stupid wording question, but is "deltafy" the word we
want to use? The other command name "mkdelta" feels easier
to swallow at least for me ...
^ permalink raw reply
* Re: [RFC] git-fsck-cache argument processing
From: Sean @ 2005-05-21 4:36 UTC (permalink / raw)
To: Jeff Garzik; +Cc: git
In-Reply-To: <428EB444.7010200@pobox.com>
[-- Attachment #1: Type: text/plain, Size: 839 bytes --]
On Sat, May 21, 2005 12:08 am, Jeff Garzik said:
> Pretty good. You'll probably want some additional changes:
>
> 1) eliminate
> + case ARGP_KEY_ARG: state->next = state->argc; break;
>
> This will cause option processing to stop at the first unknown argument.
>
> 2) Pass-by-reference a variable to argp_parse(), which will store the
> index of the argument where processing stopped. This is the first
> hash/file/etc. non-option argument.
>
Thanks Jeff, that's pretty cool. Here's an updated patch.
With this updated patch, options following or even intermingled with the
SHA1 list also are picked up, for example:
$ git-fsck-cache 804c64ea864d0a8ee13f3de0b74158a3e9c3166d -crudt
fsck-cache.c | 66 +++++++++++++++++++++++++++++++++++------------------------
1 files changed, 40 insertions(+), 26 deletions(-)
Sean
[-- Attachment #2: fsck-cache-argp-v2.patch --]
[-- Type: application/octet-stream, Size: 2565 bytes --]
fsck-cache.c: needs update
Index: fsck-cache.c
===================================================================
--- 58741c69570705801db4b785681790d636475695/fsck-cache.c (mode:100644)
+++ uncommitted/fsck-cache.c (mode:100644)
@@ -1,5 +1,7 @@
#include <sys/types.h>
#include <dirent.h>
+#include <argp.h>
+const char *argp_program_version = "git 1.0";
#include "cache.h"
#include "commit.h"
@@ -407,36 +409,48 @@
find_file_objects(git_dir, "refs");
}
+#define O_UNREACH 'u'
+#define O_TAGS 't'
+#define O_ROOT 'r'
+#define O_DELTA 'd'
+#define O_CACHE 'c'
+
+static const char doc[] = "Perform repository consistency check";
+
+static struct argp_option options[] = {
+ {"unreachable", O_UNREACH, 0, 0, "Show missing objects or deltas"},
+ {"tags", O_TAGS, 0, 0, "Show revision tags"},
+ {"root", O_ROOT, 0, 0, "Show root objects, ie. those without parents"},
+ {"delta-depth", O_DELTA, 0, 0, "Show the maximum length of delta chains"},
+ {"cache", O_CACHE, 0, 0, "Mark all objects referenced by cache as reachable"},
+ { }
+};
+
+static error_t parse_opt (int key, char *arg, struct argp_state *state)
+{
+ switch (key) {
+ case O_UNREACH: show_unreachable = 1; break;
+ case O_TAGS: show_tags = 1; break;
+ case O_ROOT: show_root = 1; break;
+ case O_DELTA: show_max_delta_depth = 1; break;
+ case O_CACHE: keep_cache_objects = 1; break;
+ default: return ARGP_ERR_UNKNOWN;
+ }
+ return 0;
+}
+
+static const struct argp argp = { options, parse_opt, "[HEAD-SHA1...]", doc };
+
int main(int argc, char **argv)
{
int i, heads;
char *sha1_dir;
+ int idx;
- for (i = 1; i < argc; i++) {
- const char *arg = argv[i];
-
- if (!strcmp(arg, "--unreachable")) {
- show_unreachable = 1;
- continue;
- }
- if (!strcmp(arg, "--tags")) {
- show_tags = 1;
- continue;
- }
- if (!strcmp(arg, "--root")) {
- show_root = 1;
- continue;
- }
- if (!strcmp(arg, "--delta-depth")) {
- show_max_delta_depth = 1;
- continue;
- }
- if (!strcmp(arg, "--cache")) {
- keep_cache_objects = 1;
- continue;
- }
- if (*arg == '-')
- usage("git-fsck-cache [--tags] [[--unreachable] [--cache] <head-sha1>*]");
+ error_t rc = argp_parse(&argp, argc, argv, 0, &idx, NULL);
+ if (rc) {
+ fprintf(stderr, "argument failed: %s\n", strerror(rc));
+ return 1;
}
sha1_dir = get_object_directory();
@@ -450,7 +464,7 @@
expand_deltas();
heads = 0;
- for (i = 1; i < argc; i++) {
+ for (i = idx; i < argc; i++) {
const char *arg = argv[i];
if (*arg == '-')
^ permalink raw reply
* Re: [RFC] git-fsck-cache argument processing
From: Jeff Garzik @ 2005-05-21 4:08 UTC (permalink / raw)
To: Sean; +Cc: git
In-Reply-To: <4870.10.10.10.24.1116646732.squirrel@linux1>
Sean wrote:
> Here is a first crack at using argp as suggested by Jeff Garzik to
> implement argument processing as requested by Junio and Linus. Each of
> the long arguments has been given a single-character equivalent as well.
>
> This patch only converts fsck-cache to use argp in case anyone has
> objections to the basic format or style. The patch includes a version
> number inside of fsck-cache.c; this should really be in a separate include
> file so you can run any command with --version and get the same answer.
Pretty good. You'll probably want some additional changes:
1) eliminate
+ case ARGP_KEY_ARG: state->next = state->argc; break;
This will cause option processing to stop at the first unknown argument.
2) Pass-by-reference a variable to argp_parse(), which will store the
index of the argument where processing stopped. This is the first
hash/file/etc. non-option argument.
(example code from posixutils)
int parse_cmdline(struct cmdline_walker *cw)
{
error_t rc_argp;
int idx = 0;
rc_argp = argp_parse(cw->argp, cw->argc, cw->argv, 0, &idx, NULL);
if (rc_argp) {
fprintf(stderr, "argp_parse: %s\n", strerror(rc_argp));
return -rc_argp;
}
return idx;
}
'idx' in this case is the first non-option argument, which can be passed
directly to argv[]. From there, you perform standard iteration over the
arguments provided on the command line, starting at argv[idx].
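A minimal sketch of that iteration, assuming idx has already been filled in by argp_parse() as in the example above; visit_args() and show_arg() are hypothetical names, not part of argp:

```c
#include <stdio.h>

/* Hypothetical helper: call fn on each non-option argument, starting at
 * the index argp_parse() stored in idx. Returns the number visited. */
static int visit_args(int argc, char **argv, int idx,
		      void (*fn)(const char *arg))
{
	int i, n = 0;
	for (i = idx; i < argc; i++, n++)
		fn(argv[i]);
	return n;
}

/* Example callback: just print the argument. */
static void show_arg(const char *arg)
{
	printf("%s\n", arg);
}
```

In main() this would be called as visit_args(argc, argv, idx, show_arg) once argp_parse() has returned successfully.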
If you have a fixed number of arguments following the options, then your
parse_opt function can easily parse those args as well:
static error_t parse_opt (int key, char *arg, struct argp_state *state)
{
switch (key) {
case '1':
outmask |= OPT_FILE1;
break;
case '2':
outmask |= OPT_FILE2;
break;
case '3':
outmask |= OPT_DUP;
break;
case ARGP_KEY_ARG:
switch(state->arg_num) {
case 0: file1 = arg; break; /* 1st non-opt arg */
case 1: file2 = arg; break; /* 2nd non-opt arg */
default: argp_usage (state); break; /* too many args */
}
break;
case ARGP_KEY_END:
if (state->arg_num < 2) /* not enough args */
argp_usage (state);
break;
default:
return ARGP_ERR_UNKNOWN;
}
return 0;
}
^ permalink raw reply
* Re: [PATCH] show changed tree objects with recursive git-diff-tree
From: Junio C Hamano @ 2005-05-21 3:47 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Nicolas Pitre, git
In-Reply-To: <Pine.LNX.4.58.0505202019330.2206@ppc970.osdl.org>
Diff helper _should_ not get confused, but maybe it currently
does. If that is the case, I would consider that a bug (my
bad).
... goes back to the Linus tip for a while and comes back ...
Yup. It says a change line containing tree is not something it
recognizes. Sorry, there is a bug there (and another bug that
partially hides that bug).
I'm doing a major rewrite of the diff-core right now but even
after that, diff helper _should_ just ignore trees.
In the meantime, this patch should fix it. And this should be
the right fix even after the "major rewrite" I am doing now.
Signed-off-by: Junio C Hamano <junkio@cox.net>
---
# - linus: [PATCH] delta creation
# + (working tree)
diff --git a/diff-helper.c b/diff-helper.c
--- a/diff-helper.c
+++ b/diff-helper.c
@@ -20,7 +20,8 @@ static int parse_oneside_change(const ch
cp++;
}
*mode = m;
- if (strncmp(cp, "\tblob\t", 6) && strncmp(cp, " blob ", 6))
+ if (strncmp(cp, "\ttree\t", 6) && strncmp(cp, " tree ", 6) &&
+ strncmp(cp, "\tblob\t", 6) && strncmp(cp, " blob ", 6))
return -1;
cp += 6;
if (get_sha1_hex(cp, sha1))
@@ -44,11 +45,13 @@ static int parse_diff_raw_output(const c
diff_unmerge(cp + 1);
break;
case '+':
- parse_oneside_change(cp, &new_mode, new_sha1, path);
+ if (parse_oneside_change(cp, &new_mode, new_sha1, path))
+ return -1;
diff_addremove('+', new_mode, new_sha1, path, NULL);
break;
case '-':
- parse_oneside_change(cp, &old_mode, old_sha1, path);
+ if (parse_oneside_change(cp, &old_mode, old_sha1, path))
+ return -1;
diff_addremove('-', old_mode, old_sha1, path, NULL);
break;
case '*':
@@ -64,7 +67,8 @@ static int parse_diff_raw_output(const c
new_mode = (new_mode << 3) | (ch - '0');
cp++;
}
- if (strncmp(cp, "\tblob\t", 6) && strncmp(cp, " blob ", 6))
+ if (strncmp(cp, "\tblob\t", 6) && strncmp(cp, " blob ", 6) &&
+ (strncmp(cp, "\ttree\t", 6) && strncmp(cp, " tree ", 6)))
return -1;
cp += 6;
if (get_sha1_hex(cp, old_sha1))
^ permalink raw reply
* [RFC] git-fsck-cache argument processing
From: Sean @ 2005-05-21 3:38 UTC (permalink / raw)
To: git
[-- Attachment #1: Type: text/plain, Size: 1299 bytes --]
Here is a first crack at using argp as suggested by Jeff Garzik to
implement argument processing as requested by Junio and Linus. Each of
the long arguments has been given a single-character equivalent as well.
This patch only converts fsck-cache to use argp in case anyone has
objections to the basic format or style. The patch includes a version
number inside of fsck-cache.c; this should really be in a separate include
file so you can run any command with --version and get the same answer.
With this change you have:
$ git-fsck-cache -?
Usage: git-fsck-cache [OPTION...] [HEAD-SHA1...]
git-fsck-cache - repository consistency check
  -c, --cache                Mark all objects referenced by cache as reachable
  -d, --delta-depth          Show the maximum length of delta chains
  -r, --root                 Show root objects, ie. those without parents
  -t, --tags                 Show revision tags
  -u, --unreachable          Show missing objects or deltas
  -?, --help                 Give this help list
      --usage                Give a short usage message
  -V, --version              Print program version
And the following should work as expected:
$ git-fsck-cache -crudt
fsck-cache.c | 64 +++++++++++++++++++++++++++++++++++------------------------
1 files changed, 39 insertions(+), 25 deletions(-)
Sean
[-- Attachment #2: fsck-cache-argp-v1.patch --]
[-- Type: application/octet-stream, Size: 2375 bytes --]
fsck-cache.c: needs update
Index: fsck-cache.c
===================================================================
--- 58741c69570705801db4b785681790d636475695/fsck-cache.c (mode:100644)
+++ uncommitted/fsck-cache.c (mode:100644)
@@ -1,5 +1,7 @@
#include <sys/types.h>
#include <dirent.h>
+#include <argp.h>
+const char *argp_program_version = "git 1.0";
#include "cache.h"
#include "commit.h"
@@ -407,36 +409,48 @@
find_file_objects(git_dir, "refs");
}
+#define O_UNREACH 'u'
+#define O_TAGS 't'
+#define O_ROOT 'r'
+#define O_DELTA 'd'
+#define O_CACHE 'c'
+
+static const char doc[] = "Perform repository consistency check";
+
+static struct argp_option options[] = {
+ {"unreachable", O_UNREACH, 0, 0, "Show missing objects or deltas"},
+ {"tags", O_TAGS, 0, 0, "Show revision tags"},
+ {"root", O_ROOT, 0, 0, "Show root objects, ie. those without parents"},
+ {"delta-depth", O_DELTA, 0, 0, "Show the maximum length of delta chains"},
+ {"cache", O_CACHE, 0, 0, "Mark all objects referenced by cache as reachable"},
+ { }
+};
+
+static error_t parse_opt (int key, char *arg, struct argp_state *state)
+{
+ switch (key) {
+ case O_UNREACH: show_unreachable = 1; break;
+ case O_TAGS: show_tags = 1; break;
+ case O_ROOT: show_root = 1; break;
+ case O_DELTA: show_max_delta_depth = 1; break;
+ case O_CACHE: keep_cache_objects = 1; break;
+ case ARGP_KEY_ARG: state->next = state->argc; break;
+ default: return ARGP_ERR_UNKNOWN;
+ }
+ return 0;
+}
+
+static const struct argp argp = { options, parse_opt, "[HEAD-SHA1...]", doc };
+
int main(int argc, char **argv)
{
int i, heads;
char *sha1_dir;
- for (i = 1; i < argc; i++) {
- const char *arg = argv[i];
-
- if (!strcmp(arg, "--unreachable")) {
- show_unreachable = 1;
- continue;
- }
- if (!strcmp(arg, "--tags")) {
- show_tags = 1;
- continue;
- }
- if (!strcmp(arg, "--root")) {
- show_root = 1;
- continue;
- }
- if (!strcmp(arg, "--delta-depth")) {
- show_max_delta_depth = 1;
- continue;
- }
- if (!strcmp(arg, "--cache")) {
- keep_cache_objects = 1;
- continue;
- }
- if (*arg == '-')
- usage("git-fsck-cache [--tags] [[--unreachable] [--cache] <head-sha1>*]");
+ error_t rc = argp_parse(&argp, argc, argv, 0, NULL, NULL);
+ if (rc) {
+ fprintf(stderr, "argument failed: %s\n", strerror(rc));
+ return 1;
}
sha1_dir = get_object_directory();
^ permalink raw reply
* Re: [PATCH] show changed tree objects with recursive git-diff-tree
From: Linus Torvalds @ 2005-05-21 3:34 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Nicolas Pitre, git
In-Reply-To: <7vsm0hpbub.fsf@assigned-by-dhcp.cox.net>
On Fri, 20 May 2005, Junio C Hamano wrote:
>
> Although I do not have immediate objections to what it tries to
> do, I have to think about the intent of the patch and its
> ramifications.
I really think it should be a totally separate flag to enable showing the
sub-trees if the tree-blobification wants this.
In fact, I can pretty much _guarantee_ that the patch as posted is the
wrong thing to do: it will do horribly wrong things for things like
git-whatchanged arch/i386/kernel/head.S
(but I haven't tried it - try it yourself. The correct output for the
kernel archive is just a single commit, and a single blob change in that
commit).
My bet is that the patch will end up showing every single changeset that
touches anything under "arch/", since such _trees_ will be marked as
interesting. Which is absolutely the wrong thing to do.
Nico, try it, maybe you'll prove me wrong.
Linus
^ permalink raw reply
* Re: [PATCH] show changed tree objects with recursive git-diff-tree
From: Linus Torvalds @ 2005-05-21 3:20 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: git
In-Reply-To: <Pine.LNX.4.62.0505202131520.4397@localhost.localdomain>
On Fri, 20 May 2005, Nicolas Pitre wrote:
>
> When -p is not used, git-diff-tree currently shows changed tree objects
> but only when not recursive. This patch makes the recursive output
> show tree objects as well.
That sounds wrong. That would seem to make diff-helper confused, no?
Linus
^ permalink raw reply
* Re: [PATCH] show changed tree objects with recursive git-diff-tree
From: Junio C Hamano @ 2005-05-21 2:11 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Linus Torvalds, git
In-Reply-To: <Pine.LNX.4.62.0505202131520.4397@localhost.localdomain>
Although I do not have immediate objections to what it tries to
do, I have to think about the intent of the patch and its
ramifications.
However, I think the patch operates at the wrong level if you
are basing on the tip of Linus tree. With the new diff core,
you do not filter like this:
- char *newbase = malloc_base(base, path1, pathlen1);
+ char *newbase;
+ if (!silent && !generate_patch)
+ diff_change(mode1, mode2, sha1, sha2, base, path1);
+ newbase = malloc_base(base, path1, pathlen1);
I'd just say "if (!silent)" there. The updated diff_change
_should_ do the right thing regardless of generate_patch value
you have there, because it is already told about it with
diff_setup(), and it knows not to say silly things when we are
generating patch.
^ permalink raw reply
* Re: checkout-cache -f: a better way?
From: Jeff Garzik @ 2005-05-21 1:45 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0505201841550.2206@ppc970.osdl.org>
Linus Torvalds wrote:
> That's what, 20 times faster?
:)
> More, actually, I suspect, since the "-m" version is not only faster, but
> it doesn't do much IO, so you'll not have tons of dirty pages/inodes etc
> afterwards.
Yep. A -lot- of writeback would occur, a few seconds after my original
script completed.
Jeff
^ permalink raw reply
* Re: [PATCH] Fix use of wc in t0000-basic
From: Junio C Hamano @ 2005-05-21 1:43 UTC (permalink / raw)
To: Daniel Barkalow; +Cc: Sean, Linus Torvalds, git
In-Reply-To: <Pine.LNX.4.21.0505202113500.30848-100000@iabervon.org>
>>>>> "DB" == Daniel Barkalow <barkalow@iabervon.org> writes:
DB> Junio was stripping the filename (not whitespace) from wc, not knowing
DB> that it could be suppressed by using stdin.
Actually the reason I did so initially was because I recalled
seeing a wc that said "-" instead of omitting the filename. I
do not have access to those obscure Unixen, so I cannot check these
things easily anymore, though.
DB> This didn't work with versions
DB> of wc that put whitespace at the beginning. I think the
DB> sed-only solution is far more obscure and no cleaner than
DB> cat and wc.
I tend to agree with this, but the sed-only solution is probably
one of the most portable.
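For readers following along, a sed-only line count of the kind discussed here could look roughly like the sketch below. The helper name count_lines_sed and the sample file are made up for illustration; this is not necessarily what t0000-basic ended up using.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): count lines using sed only.
# -n suppresses normal output; '$=' prints the line number of the last line.
count_lines_sed () {
	sed -n '$=' "$1"
}

# Small demonstration on a throwaway file.
sample=$(mktemp) || exit 1
printf 'one\ntwo\nthree\n' >"$sample"
count_lines_sed "$sample"
rm -f "$sample"
```

One caveat of this approach: for an empty file sed has no last line, so it prints nothing at all rather than 0.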
^ permalink raw reply
* Re: checkout-cache -f: a better way?
From: Linus Torvalds @ 2005-05-21 1:44 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Junio C Hamano, Git Mailing List
In-Reply-To: <428E7994.1090402@pobox.com>
On Fri, 20 May 2005, Jeff Garzik wrote:
>
> Yep, thanks. Script does seem faster now.
Yeah. They "seem faster".
> real 0m7.069s
to
> real 0m0.389s
That's what, 20 times faster?
More, actually, I suspect, since the "-m" version is not only faster, but
it doesn't do much IO, so you'll not have tons of dirty pages/inodes etc
afterwards.
Linus
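The ratio of the two "real" timings quoted above can be checked directly. This is only a sketch; the helper name speedup is made up for illustration and awk is assumed to be available.

```shell
#!/bin/sh
# Hypothetical helper: ratio of two wall-clock times, to one decimal place.
speedup () {
	awk -v before="$1" -v after="$2" 'BEGIN { printf "%.1f\n", before / after }'
}

# Timings taken from the quoted "real" lines: 7.069s before, 0.389s after.
speedup 7.069 0.389
```

On wall-clock time alone that comes to roughly 18 times, so "20 times" is a fair round figure before counting the reduced IO.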
^ permalink raw reply
* [PATCH] show changed tree objects with recursive git-diff-tree
From: Nicolas Pitre @ 2005-05-21 1:40 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git
When -p is not used, git-diff-tree currently shows changed tree objects
but only when not recursive. This patch makes the recursive output
show tree objects as well.
This has the immediate benefit of making git-deltafy-script handle
deltafication of tree objects.
Signed-off-by: Nicolas Pitre <nico@cam.org>
diff --git a/diff-tree.c b/diff-tree.c
--- a/diff-tree.c
+++ b/diff-tree.c
@@ -131,7 +131,10 @@ static int compare_tree_entry(void *tree
if (recursive && S_ISDIR(mode1)) {
int retval;
- char *newbase = malloc_base(base, path1, pathlen1);
+ char *newbase;
+ if (!silent && !generate_patch)
+ diff_change(mode1, mode2, sha1, sha2, base, path1);
+ newbase = malloc_base(base, path1, pathlen1);
retval = diff_tree_sha1(sha1, sha2, newbase);
free(newbase);
return retval;
diff --git a/git-deltafy-script b/git-deltafy-script
--- a/git-deltafy-script
+++ b/git-deltafy-script
@@ -9,8 +9,6 @@
# NOTE: the "best earlier version" is not implemented in mkdelta yet
# and therefore only the next eariler version is used at this time.
#
-# TODO: deltafy tree objects as well.
-#
# The -d argument allows to provide a limit on the delta chain depth.
# If 0 is passed then everything is undeltafied.
^ permalink raw reply
* Re: [PATCH] Fix use of wc in t0000-basic
From: Daniel Barkalow @ 2005-05-21 1:16 UTC (permalink / raw)
To: Sean; +Cc: Linus Torvalds, git
In-Reply-To: <4616.10.10.10.24.1116637985.squirrel@linux1>
On Fri, 20 May 2005, Sean wrote:
> Yes, I was. But presumably someone was stripping the whitespace from wc
> for a reason? Either way the sed-only solution seems a little cleaner.
Junio was stripping the filename (not whitespace) from wc, not knowing
that it could be suppressed by using stdin. This didn't work with versions
of wc that put whitespace at the beginning. I think the sed-only solution
is far more obscure and no cleaner than cat and wc.
-Daniel
*This .sig left intentionally blank*
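The stdin variant described in this message might be sketched as follows. The helper name count_lines_wc is hypothetical, and stripping blanks with tr is just one assumed way to cope with wc implementations that pad the number with leading whitespace.

```shell
#!/bin/sh
# Hypothetical helper: let wc read stdin so that it prints no filename,
# then delete any blanks or tabs that some implementations put before the number.
count_lines_wc () {
	wc -l <"$1" | tr -d ' \t'
}

# Small demonstration on a throwaway file.
sample=$(mktemp) || exit 1
printf 'one\ntwo\n' >"$sample"
count_lines_wc "$sample"
rm -f "$sample"
```

The result can then be compared as a plain string in a test, regardless of how the local wc formats its output.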
^ permalink raw reply