* [PATCH 0/4] Add a testsuite to stgit (take 2)
From: Yann Dirson @ 2006-04-13 21:38 UTC
To: Catalin Marinas; +Cc: git
This is an update of the previous patch series, including minor improvements
to the way the test engine is adapted to stgit, as well as a new testsuite
demonstrating robustness issues on series creation, and proposed fixes for all
those bugs. The patches are hopefully in a more standard format, too.
--
Yann Dirson <ydirson@altern.org> |
Debian-related: <dirson@debian.org> | Support Debian GNU/Linux:
| Freedom, Power, Stability, Gratis
http://ydirson.free.fr/ | Check <http://www.debian.org/>
* [PATCH 1/4] Add a testsuite framework copied from git-core
From: Yann Dirson @ 2006-04-13 21:44 UTC
To: Catalin Marinas; +Cc: git
In-Reply-To: <20060413213819.8806.53300.stgit@gandelf.nowhere.earth>
From: Yann Dirson <ydirson@altern.org>
See git's t/README for details on how to use this framework.
There is no integration yet in the toplevel Makefile; I'll let the
Python masters take care of this. Use "make -C t" to run the
tests for now.
A patch-naming policy should be defined for stgit, since the
git one does not apply.
---
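For readers who have not used the git-core framework, the core idea behind
test_expect_success can be sketched in a few lines. This is a simplified
illustration and not the code in test-lib.sh: the function name
expect_success and the way output is discarded are assumptions made for the
example.

```shell
#!/bin/sh
# Simplified, hypothetical stand-in for test_expect_success:
# evaluate a command string and report ok/FAIL from its exit status.
test_count=0
test_failure=0

expect_success () {
	# $1 is the test title, $2 is the command string to evaluate.
	test_count=$((test_count + 1))
	if eval "$2" >/dev/null 2>&1
	then
		echo "*   ok $test_count: $1"
	else
		test_failure=$((test_failure + 1))
		echo "* FAIL $test_count: $1"
	fi
}

expect_success 'true succeeds' 'true'
expect_success 'false is reported as a failure' 'false'
echo "* failed $test_failure among $test_count test(s)"
```

The real library additionally handles --verbose, --immediate and --debug, and
traps unexpected exits; see the diff below for those details.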
TODO | 2 -
t/Makefile | 25 ++++++
t/t0000-dummy.sh | 19 +++++
t/test-lib.sh | 208 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 253 insertions(+), 1 deletions(-)
diff --git a/TODO b/TODO
index e5affe0..d97ffd1 100644
--- a/TODO
+++ b/TODO
@@ -6,7 +6,7 @@ The TODO list until 1.0:
- debian package support
- man page
- code execution allowed from templates
-- regression tests
+- more regression tests
- release 1.0
diff --git a/t/Makefile b/t/Makefile
new file mode 100644
index 0000000..d5d7b6f
--- /dev/null
+++ b/t/Makefile
@@ -0,0 +1,25 @@
+# Run tests
+#
+# Copyright (c) 2005 Junio C Hamano
+#
+
+#GIT_TEST_OPTS=--verbose --debug
+SHELL_PATH ?= $(SHELL)
+TAR ?= $(TAR)
+
+# Shell quote;
+SHELL_PATH_SQ = $(subst ','\'',$(SHELL_PATH))
+
+T = $(wildcard t[0-9][0-9][0-9][0-9]-*.sh)
+
+all: $(T) clean
+
+$(T):
+ @echo "*** $@ ***"; '$(SHELL_PATH_SQ)' $@ $(GIT_TEST_OPTS)
+
+clean:
+ rm -fr trash
+
+.PHONY: $(T) clean
+.NOPARALLEL:
+
diff --git a/t/t0000-dummy.sh b/t/t0000-dummy.sh
new file mode 100755
index 0000000..8dc25d3
--- /dev/null
+++ b/t/t0000-dummy.sh
@@ -0,0 +1,19 @@
+#!/bin/sh
+#
+# Copyright (c) 2006 Yann Dirson
+#
+
+test_description='Dummy test.
+
+Only to test the testing environment.
+'
+
+. ./test-lib.sh
+
+test_stg_init
+
+test_expect_success \
+ 'check stgit can be run' \
+ 'stg version'
+
+test_done
diff --git a/t/test-lib.sh b/t/test-lib.sh
new file mode 100755
index 0000000..2580bcc
--- /dev/null
+++ b/t/test-lib.sh
@@ -0,0 +1,208 @@
+#!/bin/sh
+#
+# Copyright (c) 2005 Junio C Hamano
+# Copyright (c) 2006 Yann Dirson
+#
+
+# For repeatability, reset the environment to known value.
+LANG=C
+LC_ALL=C
+PAGER=cat
+TZ=UTC
+export LANG LC_ALL PAGER TZ
+unset AUTHOR_DATE
+unset AUTHOR_EMAIL
+unset AUTHOR_NAME
+unset COMMIT_AUTHOR_EMAIL
+unset COMMIT_AUTHOR_NAME
+unset GIT_ALTERNATE_OBJECT_DIRECTORIES
+unset GIT_AUTHOR_DATE
+GIT_AUTHOR_EMAIL=author@example.com
+GIT_AUTHOR_NAME='A U Thor'
+unset GIT_COMMITTER_DATE
+GIT_COMMITTER_EMAIL=committer@example.com
+GIT_COMMITTER_NAME='C O Mitter'
+unset GIT_DIFF_OPTS
+unset GIT_DIR
+unset GIT_EXTERNAL_DIFF
+unset GIT_INDEX_FILE
+unset GIT_OBJECT_DIRECTORY
+unset SHA1_FILE_DIRECTORIES
+unset SHA1_FILE_DIRECTORY
+export GIT_AUTHOR_EMAIL GIT_AUTHOR_NAME
+export GIT_COMMITTER_EMAIL GIT_COMMITTER_NAME
+
+# Each test should start with something like this, after copyright notices:
+#
+# test_description='Description of this test...
+# This test checks if command xyzzy does the right thing...
+# '
+# . ./test-lib.sh
+
+error () {
+ echo "* error: $*"
+ trap - exit
+ exit 1
+}
+
+say () {
+ echo "* $*"
+}
+
+test "${test_description}" != "" ||
+error "Test script did not set test_description."
+
+while test "$#" -ne 0
+do
+ case "$1" in
+ -d|--d|--de|--deb|--debu|--debug)
+ debug=t; shift ;;
+ -i|--i|--im|--imm|--imme|--immed|--immedi|--immedia|--immediat|--immediate)
+ immediate=t; shift ;;
+ -h|--h|--he|--hel|--help)
+ echo "$test_description"
+ exit 0 ;;
+ -v|--v|--ve|--ver|--verb|--verbo|--verbos|--verbose)
+ verbose=t; shift ;;
+ *)
+ break ;;
+ esac
+done
+
+exec 5>&1
+if test "$verbose" = "t"
+then
+ exec 4>&2 3>&1
+else
+ exec 4>/dev/null 3>/dev/null
+fi
+
+test_failure=0
+test_count=0
+
+trap 'echo >&5 "FATAL: Unexpected exit with code $?"; exit 1' exit
+
+
+# You are not expected to call test_ok_ and test_failure_ directly, use
+# the test_expect_* functions instead.
+
+test_ok_ () {
+ test_count=$(expr "$test_count" + 1)
+ say " ok $test_count: $@"
+}
+
+test_failure_ () {
+ test_count=$(expr "$test_count" + 1)
+ test_failure=$(expr "$test_failure" + 1);
+ say "FAIL $test_count: $1"
+ shift
+ echo "$@" | sed -e 's/^/ /'
+ test "$immediate" = "" || { trap - exit; exit 1; }
+}
+
+
+test_debug () {
+ test "$debug" = "" || eval "$1"
+}
+
+test_run_ () {
+ eval >&3 2>&4 "$1"
+ eval_ret="$?"
+ return 0
+}
+
+test_expect_failure () {
+ test "$#" = 2 ||
+ error "bug in the test script: not 2 parameters to test-expect-failure"
+ say >&3 "expecting failure: $2"
+ test_run_ "$2"
+ if [ "$?" = 0 -a "$eval_ret" != 0 ]
+ then
+ test_ok_ "$1"
+ else
+ test_failure_ "$@"
+ fi
+}
+
+test_expect_success () {
+ test "$#" = 2 ||
+ error "bug in the test script: not 2 parameters to test-expect-success"
+ say >&3 "expecting success: $2"
+ test_run_ "$2"
+ if [ "$?" = 0 -a "$eval_ret" = 0 ]
+ then
+ test_ok_ "$1"
+ else
+ test_failure_ "$@"
+ fi
+}
+
+test_expect_code () {
+ test "$#" = 3 ||
+ error "bug in the test script: not 3 parameters to test-expect-code"
+ say >&3 "expecting exit code $1: $3"
+ test_run_ "$3"
+ if [ "$?" = 0 -a "$eval_ret" = "$1" ]
+ then
+ test_ok_ "$2"
+ else
+ test_failure_ "$@"
+ fi
+}
+
+# Most tests can use the created repository, but some may need to create more.
+# Usage: test_create_repo <directory>
+test_create_repo () {
+ test "$#" = 1 ||
+ error "bug in the test script: not 1 parameter to test-create-repo"
+ owd=`pwd`
+ repo="$1"
+ mkdir "$repo"
+ cd "$repo" || error "Cannot setup test environment"
+ git-init-db 2>/dev/null ||
+ error "cannot run git-init-db -- have you installed git-core?"
+ mv .git/hooks .git/hooks-disabled
+ cd "$owd"
+}
+
+test_stg_init () {
+ echo "empty start" |
+ git-commit-tree `git-write-tree` >.git/refs/heads/master 2>/dev/null ||
+ error "cannot run git-commit -- is your git-core functioning?"
+ stg init ||
+ error "cannot run stg init -- have you built things yet?"
+}
+
+test_done () {
+ trap - exit
+ case "$test_failure" in
+ 0)
+ # We could:
+ # cd .. && rm -fr trash
+ # but that means we forbid any tests that use their own
+ # subdirectory from calling test_done without coming back
+ # to where they started from.
+ # The Makefile provided will clean this test area so
+ # we will leave things as they are.
+
+ say "passed all $test_count test(s)"
+ exit 0 ;;
+
+ *)
+ say "failed $test_failure among $test_count test(s)"
+ exit 1 ;;
+
+ esac
+}
+
+# Test the binaries we have just built. The tests are kept in
+# t/ subdirectory and are run in trash subdirectory.
+PATH=$(pwd)/..:$PATH
+export PATH
+
+
+# Test repository
+test=trash
+rm -fr "$test"
+test_create_repo $test
+cd "$test"
* [PATCH 2/4] Add list of bugs to TODO
From: Yann Dirson @ 2006-04-13 21:44 UTC
To: Catalin Marinas; +Cc: git
In-Reply-To: <20060413213819.8806.53300.stgit@gandelf.nowhere.earth>
From: Yann Dirson <ydirson@altern.org>
Since there is no formal place to register bugs other than the git ml,
I have added a couple of them, some already mentioned on the ml, which are
still around.
---
TODO | 9 +++++++++
1 files changed, 9 insertions(+), 0 deletions(-)
diff --git a/TODO b/TODO
index d97ffd1..7dd099c 100644
--- a/TODO
+++ b/TODO
@@ -17,3 +17,12 @@ The future, when time allows or if someo
synchronising with other patches (diff format or in other
repositories)
- write bash-completion script for the StGIT commands
+- support for branches with / in names
+ (ml: "Handle branch names with slashes")
+- "pull" argument should default to a sane value, "origin" is wrong in
+ many cases
+
+Bugs:
+
+- the following commands break in subdirs:
+ - refresh (ml: "Running StGIT in subdirectories")
* [PATCH 4/4] Add a couple of safety checks to series creation
From: Yann Dirson @ 2006-04-13 21:44 UTC
To: Catalin Marinas; +Cc: git
In-Reply-To: <20060413213819.8806.53300.stgit@gandelf.nowhere.earth>
From: Yann Dirson <ydirson@altern.org>
Check first whether the operation can complete, instead of
bombing out halfway.
---
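The "check first, then act" idea can be sketched in shell as follows. This is
only an illustration under assumptions: the function create_series and its
directory layout are hypothetical, loosely modeled on the paths the test
script in this patch touches, and the actual change is the Python in the diff.

```shell
#!/bin/sh
# Hypothetical sketch: refuse to start if any target already exists,
# so that a failure never leaves a half-created series behind.
create_series () {
	cs_root=$1
	cs_name=$2
	# Phase 1: check every path we are about to create.
	for cs_path in \
		"$cs_root/patches/$cs_name" \
		"$cs_root/refs/patches/$cs_name" \
		"$cs_root/refs/bases/$cs_name"
	do
		if test -e "$cs_path"
		then
			echo "$cs_path already exists" >&2
			return 1
		fi
	done
	# Phase 2: only now touch the filesystem.
	mkdir -p "$cs_root/patches/$cs_name" \
		"$cs_root/refs/patches/$cs_name" \
		"$cs_root/refs/bases" &&
	: >"$cs_root/refs/bases/$cs_name"
}
```

With a stray refs/bases entry in place, a call such as
create_series .git foo returns non-zero before creating anything.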
stgit/commands/branch.py | 5 +++
stgit/stack.py | 7 ++++-
t/t1000-branch-create.sh | 66 ++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 76 insertions(+), 2 deletions(-)
diff --git a/stgit/commands/branch.py b/stgit/commands/branch.py
index c4b5945..c95e529 100644
--- a/stgit/commands/branch.py
+++ b/stgit/commands/branch.py
@@ -122,12 +122,15 @@ def func(parser, options, args):
check_conflicts()
check_head_top_equal()
+ if git.branch_exists(args[0]):
+ raise CmdException, 'Branch "%s" already exists' % args[0]
+
tree_id = None
if len(args) == 2:
tree_id = git_id(args[1])
- git.create_branch(args[0], tree_id)
stack.Series(args[0]).init()
+ git.create_branch(args[0], tree_id)
print 'Branch "%s" created.' % args[0]
return
diff --git a/stgit/stack.py b/stgit/stack.py
index 92407e7..236e67f 100644
--- a/stgit/stack.py
+++ b/stgit/stack.py
@@ -431,8 +431,13 @@ class Series:
"""
bases_dir = os.path.join(self.__base_dir, 'refs', 'bases')
- if self.is_initialised():
+ if os.path.exists(self.__patch_dir):
raise StackException, self.__patch_dir + ' already exists'
+ if os.path.exists(self.__refs_dir):
+ raise StackException, self.__refs_dir + ' already exists'
+ if os.path.exists(self.__base_file):
+ raise StackException, self.__base_file + ' already exists'
+
os.makedirs(self.__patch_dir)
if not os.path.isdir(bases_dir):
diff --git a/t/t1000-branch-create.sh b/t/t1000-branch-create.sh
new file mode 100755
index 0000000..bee0b1c
--- /dev/null
+++ b/t/t1000-branch-create.sh
@@ -0,0 +1,66 @@
+#!/bin/sh
+#
+# Copyright (c) 2006 Yann Dirson
+#
+
+test_description='Branch operations.
+
+Exercises the "stg branch" commands.
+'
+
+. ./test-lib.sh
+
+test_stg_init
+
+test_expect_failure \
+ 'Try to create an stgit branch with a spurious refs/patches/ entry' \
+ 'find .git -name foo | xargs rm -rf &&
+ touch .git/refs/patches/foo &&
+ stg branch -c foo
+'
+
+test_expect_success \
+ 'Check no part of the branch was created' \
+ 'test "`find .git -name foo | tee /dev/stderr`" = ".git/refs/patches/foo"
+'
+
+
+test_expect_failure \
+ 'Try to create an stgit branch with a spurious patches/ entry' \
+ 'find .git -name foo | xargs rm -rf &&
+ touch .git/patches/foo &&
+ stg branch -c foo
+'
+
+test_expect_success \
+ 'Check no part of the branch was created' \
+ 'test "`find .git -name foo | tee /dev/stderr`" = ".git/patches/foo"
+'
+
+
+test_expect_failure \
+ 'Try to create an stgit branch with a spurious refs/bases/ entry' \
+ 'find .git -name foo | xargs rm -rf &&
+ touch .git/refs/bases/foo &&
+ stg branch -c foo
+'
+
+test_expect_success \
+ 'Check no part of the branch was created' \
+ 'test "`find .git -name foo | tee /dev/stderr`" = ".git/refs/bases/foo"
+'
+
+
+# test_expect_failure \
+# 'Try to create an stgit branch with a spurious refs/heads/ entry' \
+# 'find .git -name foo | xargs rm -rf &&
+# touch .git/refs/heads/foo &&
+# stg branch -c foo
+# '
+
+# test_expect_success \
+# 'Check no part of the branch was created' \
+# 'test "`find .git -name foo | tee /dev/stderr`" = ".git/refs/heads/foo"
+# '
+
+test_done
* [PATCH 3/4] Correctly handle refs/patches on series rename
From: Yann Dirson @ 2006-04-13 21:44 UTC
To: Catalin Marinas; +Cc: git
In-Reply-To: <20060413213819.8806.53300.stgit@gandelf.nowhere.earth>
From: Yann Dirson <ydirson@altern.org>
When renaming a series, the refs/patches dir was not moved, and
by chance a new one was created by the repository-upgrade code, but
that left the old one behind as cruft.
Also added a regression test to assert that nothing by the old name
is left behind.
---
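The "nothing by the old name is left behind" assertion can be sketched as a
small helper. The name nothing_named is hypothetical and not part of this
patch; the sketch only shows one way to phrase the check so that test(1)
stays well-formed when find prints more than one path.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if nothing under directory $1 is named $2.
# Quoting the command substitution passes all of find's output to test(1)
# as a single argument, however many paths were found.
nothing_named () {
	test -z "$(find "$1" -name "$2")"
}
```

After a rename from foo to bar, something like nothing_named .git foo would
then be expected to succeed.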
stgit/stack.py | 2 ++
t/t1001-branch-rename.sh | 33 +++++++++++++++++++++++++++++++++
2 files changed, 35 insertions(+), 0 deletions(-)
diff --git a/stgit/stack.py b/stgit/stack.py
index f4d7490..92407e7 100644
--- a/stgit/stack.py
+++ b/stgit/stack.py
@@ -497,6 +497,8 @@ class Series:
os.rename(self.__series_dir, to_stack.__series_dir)
if os.path.exists(self.__base_file):
os.rename(self.__base_file, to_stack.__base_file)
+ if os.path.exists(self.__refs_dir):
+ os.rename(self.__refs_dir, to_stack.__refs_dir)
self.__init__(to_name)
diff --git a/t/t1001-branch-rename.sh b/t/t1001-branch-rename.sh
new file mode 100755
index 0000000..65a5280
--- /dev/null
+++ b/t/t1001-branch-rename.sh
@@ -0,0 +1,33 @@
+#!/bin/sh
+#
+# Copyright (c) 2006 Yann Dirson
+#
+
+test_description='Branch operations.
+
+Exercises the "stg branch" commands.
+'
+
+. ./test-lib.sh
+
+test_stg_init
+
+test_expect_success \
+ 'Create an stgit branch from scratch' \
+ 'stg branch -c foo &&
+ stg new p1 -m "p1"
+'
+
+test_expect_failure \
+ 'Rename the current stgit branch' \
+ 'stg branch -r foo bar
+'
+
+test_expect_success \
+ 'Rename an stgit branch' \
+ 'stg branch -c buz &&
+ stg branch -r foo bar &&
+ test -z `find .git -name foo | tee /dev/stderr`
+'
+
+test_done
* Re: [RFH] shifting xdiff hunks?
From: Davide Libenzi @ 2006-04-13 21:55 UTC
To: Junio C Hamano; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0604122348010.7104@alien.or.mcafeemobile.com>
[-- Attachment #1: Type: TEXT/PLAIN, Size: 419 bytes --]
On Wed, 12 Apr 2006, Davide Libenzi wrote:
> Yes, this is what GNU diff does. It's a post-process of the edit script. Not
> a problem at all. Till this weekend (included) I'm pretty booked, but I'll do
> that in the following days.
Dang, that was a short weekend. I found a lunch-time hour for this. Would
you try to see if this libxdiff-based diff merges on your tree?
See also how it looks for you.
- Davide
[-- Attachment #2: Type: TEXT/plain, Size: 4294 bytes --]
--- a/xdiffi.c
+++ b/xdiffi.c
@@ -45,6 +45,8 @@
long *kvdf, long *kvdb, int need_min, xdpsplit_t *spl,
xdalgoenv_t *xenv);
static xdchange_t *xdl_add_change(xdchange_t *xscr, long i1, long i2, long chg1, long chg2);
+static int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo);
+
@@ -394,6 +396,110 @@
}
+static int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo) {
+ long ix, ixo, ixs, ixref, grpsiz, nrec = xdf->nrec;
+ char *rchg = xdf->rchg, *rchgo = xdfo->rchg;
+ xrecord_t **recs = xdf->recs;
+
+ /*
+ * This is the same as what GNU diff does. Move back and forward
+ * change groups for a consistent and pretty diff output. This also
+ * helps in finding joinable change groups and reducing the diff size.
+ */
+ for (ix = ixo = 0;;) {
+ /*
+ * Find the first changed line in the to-be-compacted file.
+ * We need to keep track of both indexes, so if we find a
+ * changed lines group on the other file, while scanning the
+ * to-be-compacted file, we need to skip it properly. Note
+ * that loops that are testing for changed lines on rchg* do
+ * not need index bounding since the array is prepared with
+ * a zero at position -1 and N.
+ */
+ for (; ix < nrec && !rchg[ix]; ix++)
+ while (rchgo[ixo++]);
+ if (ix == nrec)
+ break;
+
+ /*
+ * Record the start of a changed-group in the to-be-compacted file
+ * and find the end of it, on both to-be-compacted and other file
+ * indexes (ix and ixo).
+ */
+ ixs = ix;
+ for (ix++; rchg[ix]; ix++);
+ for (; rchgo[ixo]; ixo++);
+
+ do {
+ grpsiz = ix - ixs;
+
+ /*
+ * If the line before the current change group, is equal to
+ * the last line of the current change group, shift backward
+ * the group.
+ */
+ while (ixs > 0 && recs[ixs - 1]->ha == recs[ix - 1]->ha &&
+ XDL_RECMATCH(recs[ixs - 1], recs[ix - 1])) {
+ rchg[--ixs] = 1;
+ rchg[--ix] = 0;
+
+ /*
+ * This change might have joined two change groups,
+ * so we try to take this scenario in account by moving
+ * the start index accordingly (and so the other-file
+ * end-of-group index).
+ */
+ for (; rchg[ixs - 1]; ixs--);
+ while (rchgo[--ixo]);
+ }
+
+ /*
+ * Record the end-of-group position in case we are matched
+ * with a group of changes in the other file (that is, the
+ * change record before the end-of-group index in the other
+ * file is set).
+ */
+ ixref = rchgo[ixo - 1] ? ix: nrec;
+
+ /*
+ * If the first line of the current change group, is equal to
+ * the line next of the current change group, shift forward
+ * the group.
+ */
+ while (ix < nrec && recs[ixs]->ha == recs[ix]->ha &&
+ XDL_RECMATCH(recs[ixs], recs[ix])) {
+ rchg[ixs++] = 0;
+ rchg[ix++] = 1;
+
+ /*
+ * This change might have joined two change groups,
+ * so we try to take this scenario in account by moving
+ * the start index accordingly (and so the other-file
+ * end-of-group index). Keep tracking the reference
+ * index in case we are shifting together with a
+ * corresponding group of changes in the other file.
+ */
+ for (; rchg[ix]; ix++);
+ while (rchgo[++ixo])
+ ixref = ix;
+ }
+ } while (grpsiz != ix - ixs);
+
+ /*
+ * Try to move back the possibly merged group of changes, to match
+ * the recorded position in the other file.
+ */
+ while (ixref < ix) {
+ rchg[--ixs] = 1;
+ rchg[--ix] = 0;
+ while (rchgo[--ixo]);
+ }
+ }
+
+ return 0;
+}
+
+
int xdl_build_script(xdfenv_t *xe, xdchange_t **xscr) {
xdchange_t *cscr = NULL, *xch;
char *rchg1 = xe->xdf1.rchg, *rchg2 = xe->xdf2.rchg;
@@ -439,13 +545,13 @@
return -1;
}
-
- if (xdl_build_script(&xe, &xscr) < 0) {
+ if (xdl_change_compact(&xe.xdf1, &xe.xdf2) < 0 ||
+ xdl_change_compact(&xe.xdf2, &xe.xdf1) < 0 ||
+ xdl_build_script(&xe, &xscr) < 0) {
xdl_free_env(&xe);
return -1;
}
-
if (xscr) {
if (xdl_emit_diff(&xe, xscr, ecb, xecfg) < 0) {
@@ -453,10 +559,8 @@
xdl_free_env(&xe);
return -1;
}
-
xdl_free_script(xscr);
}
-
xdl_free_env(&xe);
return 0;
* [PATCH] Shell utilities: Guard against expr's magic tokens.
From: Mark Wooding @ 2006-04-13 22:01 UTC
To: git
From: Mark Wooding <mdw@distorted.org.uk>
Some words, e.g., `match', are special to expr(1), and cause strange
parsing effects. Track down all uses of expr and mangle the arguments
so that this isn't a problem.
Signed-off-by: Mark Wooding <mdw@distorted.org.uk>
---
Amusing one, this. I hacked on one of my projects, messing with a
simple glob matching function. Being uncreative, I called my topic
branch `match'. When I was ready, I switched back to my master branch
and said
$ git pull . match
Already up-to-date.
Oh. I checked. Nope, not up-to-date. I tried
$ git merge fast HEAD match
and that did the right thing. But I was puzzled. I fired up the
git-bisect machinery and tried to find a good version to no avail. And
then, comparing `sh -x' traces of git-fetch, I noticed what had gone
wrong.
There's a line in git-parse-remote.sh, in canon_refs_list_for_fetch,
which says
expr "$ref" : '.*:' >/dev/null || ref="${ref}:"
In my case, $ref is `match', so this expands to
expr match : '.*:' >...
Unfortunately, GNU expr has a magic keyword `match'. So what this does
is compare `:' to the regexp `.*:', which /succeeds/, even though POSIX
expr without the `match' keyword would do the right thing and fail. So
$ref never has a `:' appended, which makes the later parsing fail, and
all sorts of strange things happen.
This patch puts magical extra characters in expr regexp calls
throughout the shell bits of GIT, to robustify them against this kind of
crapness.
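As a standalone illustration of the guard (not code from the patch itself),
the colon check can be wrapped in a helper. The name has_colon is
hypothetical; the "z" prefix on both the string and the pattern is the
technique the patch applies.

```shell
#!/bin/sh
# Hypothetical helper: succeed when $1 contains a colon.
# Prefixing both the string and the regexp with "z" means expr(1) never
# sees a bare word such as "match" that it could take for a keyword.
has_colon () {
	expr "z$1" : 'z.*:' >/dev/null
}

for ref in match master:origin
do
	if has_colon "$ref"
	then
		echo "$ref: already has a colon"
	else
		echo "$ref: no colon, one would be appended"
	fi
done
```

Because the first operand is now "zmatch" rather than "match", the behaviour
no longer depends on which keywords a particular expr implementation knows.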
There's a small chance I got something wrong while making this fix. I
was fairly careful, though, and ran the test suite without any
problems. I also checked Cogito, though that has no truck with expr.
---
git-cherry.sh | 2 +-
git-clone.sh | 6 +++---
git-commit.sh | 4 ++--
git-fetch.sh | 18 +++++++++---------
git-format-patch.sh | 4 ++--
git-merge-one-file.sh | 2 +-
git-parse-remote.sh | 20 ++++++++++----------
git-rebase.sh | 2 +-
git-tag.sh | 2 +-
9 files changed, 30 insertions(+), 30 deletions(-)
diff --git a/git-cherry.sh b/git-cherry.sh
index 1a62320..f0e8831 100755
--- a/git-cherry.sh
+++ b/git-cherry.sh
@@ -20,7 +20,7 @@ case "$1" in -v) verbose=t; shift ;; esa
case "$#,$1" in
1,*..*)
- upstream=$(expr "$1" : '\(.*\)\.\.') ours=$(expr "$1" : '.*\.\.\(.*\)$')
+ upstream=$(expr "z$1" : 'z\(.*\)\.\.') ours=$(expr "z$1" : '.*\.\.\(.*\)$')
set x "$upstream" "$ours"
shift ;;
esac
diff --git a/git-clone.sh b/git-clone.sh
index c013e48..0805168 100755
--- a/git-clone.sh
+++ b/git-clone.sh
@@ -38,12 +38,12 @@ Perhaps git-update-server-info needs to
}
while read sha1 refname
do
- name=`expr "$refname" : 'refs/\(.*\)'` &&
+ name=`expr "z$refname" : 'zrefs/\(.*\)'` &&
case "$name" in
*^*) continue;;
esac
if test -n "$use_separate_remote" &&
- branch_name=`expr "$name" : 'heads/\(.*\)'`
+ branch_name=`expr "z$name" : 'zheads/\(.*\)'`
then
tname="remotes/$origin/$branch_name"
else
@@ -346,7 +346,7 @@ then
# new style repository with a symref HEAD).
# Ideally we should skip the guesswork but for now
# opt for minimum change.
- head_sha1=`expr "$head_sha1" : 'ref: refs/heads/\(.*\)'`
+ head_sha1=`expr "z$head_sha1" : 'zref: refs/heads/\(.*\)'`
head_sha1=`cat "$GIT_DIR/$remote_top/$head_sha1"`
;;
esac
diff --git a/git-commit.sh b/git-commit.sh
index bd3dc71..01c73bd 100755
--- a/git-commit.sh
+++ b/git-commit.sh
@@ -549,8 +549,8 @@ fi >>"$GIT_DIR"/COMMIT_EDITMSG
# Author
if test '' != "$force_author"
then
- GIT_AUTHOR_NAME=`expr "$force_author" : '\(.*[^ ]\) *<.*'` &&
- GIT_AUTHOR_EMAIL=`expr "$force_author" : '.*\(<.*\)'` &&
+ GIT_AUTHOR_NAME=`expr "z$force_author" : 'z\(.*[^ ]\) *<.*'` &&
+ GIT_AUTHOR_EMAIL=`expr "z$force_author" : '.*\(<.*\)'` &&
test '' != "$GIT_AUTHOR_NAME" &&
test '' != "$GIT_AUTHOR_EMAIL" ||
die "malformatted --author parameter"
diff --git a/git-fetch.sh b/git-fetch.sh
index 954901d..711650f 100755
--- a/git-fetch.sh
+++ b/git-fetch.sh
@@ -112,7 +112,7 @@ append_fetch_head () {
*)
note_="$remote_name of " ;;
esac
- remote_1_=$(expr "$remote_" : '\(.*\)\.git/*$') &&
+ remote_1_=$(expr "z$remote_" : 'z\(.*\)\.git/*$') &&
remote_="$remote_1_"
note_="$note_$remote_"
@@ -245,22 +245,22 @@ fetch_main () {
# These are relative path from $GIT_DIR, typically starting at refs/
# but may be HEAD
- if expr "$ref" : '\.' >/dev/null
+ if expr "z$ref" : 'z\.' >/dev/null
then
not_for_merge=t
- ref=$(expr "$ref" : '\.\(.*\)')
+ ref=$(expr "z$ref" : 'z\.\(.*\)')
else
not_for_merge=
fi
- if expr "$ref" : '\+' >/dev/null
+ if expr "z$ref" : 'z\+' >/dev/null
then
single_force=t
- ref=$(expr "$ref" : '\+\(.*\)')
+ ref=$(expr "z$ref" : 'z\+\(.*\)')
else
single_force=
fi
- remote_name=$(expr "$ref" : '\([^:]*\):')
- local_name=$(expr "$ref" : '[^:]*:\(.*\)')
+ remote_name=$(expr "z$ref" : 'z\([^:]*\):')
+ local_name=$(expr "z$ref" : 'z[^:]*:\(.*\)')
rref="$rref$LF$remote_name"
@@ -276,7 +276,7 @@ fetch_main () {
print "$u";
' "$remote_name")
head=$(curl -nsfL $curl_extra_args "$remote/$remote_name_quoted") &&
- expr "$head" : "$_x40\$" >/dev/null ||
+ expr "z$head" : "z$_x40\$" >/dev/null ||
die "Failed to fetch $remote_name from $remote"
echo >&2 Fetching "$remote_name from $remote" using http
git-http-fetch -v -a "$head" "$remote/" || exit
@@ -362,7 +362,7 @@ fetch_main () {
break ;;
esac
done
- local_name=$(expr "$found" : '[^:]*:\(.*\)')
+ local_name=$(expr "z$found" : 'z[^:]*:\(.*\)')
append_fetch_head "$sha1" "$remote" \
"$remote_name" "$remote_nick" "$local_name" "$not_for_merge"
done
diff --git a/git-format-patch.sh b/git-format-patch.sh
index 2ebf7e8..c7133bc 100755
--- a/git-format-patch.sh
+++ b/git-format-patch.sh
@@ -126,8 +126,8 @@ for revpair
do
case "$revpair" in
?*..?*)
- rev1=`expr "$revpair" : '\(.*\)\.\.'`
- rev2=`expr "$revpair" : '.*\.\.\(.*\)'`
+ rev1=`expr "z$revpair" : 'z\(.*\)\.\.'`
+ rev2=`expr "z$revpair" : 'z.*\.\.\(.*\)'`
;;
*)
rev1="$revpair^"
diff --git a/git-merge-one-file.sh b/git-merge-one-file.sh
index 5349a1c..5619409 100755
--- a/git-merge-one-file.sh
+++ b/git-merge-one-file.sh
@@ -26,7 +26,7 @@ #
fi
if test -f "$4"; then
rm -f -- "$4" &&
- rmdir -p "$(expr "$4" : '\(.*\)/')" 2>/dev/null || :
+ rmdir -p "$(expr "z$4" : 'z\(.*\)/')" 2>/dev/null || :
fi &&
exec git-update-index --remove -- "$4"
;;
diff --git a/git-parse-remote.sh b/git-parse-remote.sh
index 63f2281..65c66d5 100755
--- a/git-parse-remote.sh
+++ b/git-parse-remote.sh
@@ -8,8 +8,8 @@ get_data_source () {
case "$1" in
*/*)
# Not so fast. This could be the partial URL shorthand...
- token=$(expr "$1" : '\([^/]*\)/')
- remainder=$(expr "$1" : '[^/]*/\(.*\)')
+ token=$(expr "z$1" : 'z\([^/]*\)/')
+ remainder=$(expr "z$1" : 'z[^/]*/\(.*\)')
if test -f "$GIT_DIR/branches/$token"
then
echo branches-partial
@@ -43,8 +43,8 @@ get_remote_url () {
branches)
sed -e 's/#.*//' "$GIT_DIR/branches/$1" ;;
branches-partial)
- token=$(expr "$1" : '\([^/]*\)/')
- remainder=$(expr "$1" : '[^/]*/\(.*\)')
+ token=$(expr "z$1" : 'z\([^/]*\)/')
+ remainder=$(expr "z$1" : 'z[^/]*/\(.*\)')
url=$(sed -e 's/#.*//' "$GIT_DIR/branches/$token")
echo "$url/$remainder"
;;
@@ -77,13 +77,13 @@ canon_refs_list_for_fetch () {
force=
case "$ref" in
+*)
- ref=$(expr "$ref" : '\+\(.*\)')
+ ref=$(expr "z$ref" : 'z\+\(.*\)')
force=+
;;
esac
- expr "$ref" : '.*:' >/dev/null || ref="${ref}:"
- remote=$(expr "$ref" : '\([^:]*\):')
- local=$(expr "$ref" : '[^:]*:\(.*\)')
+ expr "z$ref" : 'z.*:' >/dev/null || ref="${ref}:"
+ remote=$(expr "z$ref" : 'z\([^:]*\):')
+ local=$(expr "z$ref" : 'z[^:]*:\(.*\)')
case "$remote" in
'') remote=HEAD ;;
refs/heads/* | refs/tags/* | refs/remotes/*) ;;
@@ -97,7 +97,7 @@ canon_refs_list_for_fetch () {
*) local="refs/heads/$local" ;;
esac
- if local_ref_name=$(expr "$local" : 'refs/\(.*\)')
+ if local_ref_name=$(expr "z$local" : 'zrefs/\(.*\)')
then
git-check-ref-format "$local_ref_name" ||
die "* refusing to create funny ref '$local_ref_name' locally"
@@ -171,7 +171,7 @@ get_remote_refs_for_fetch () {
resolve_alternates () {
# original URL (xxx.git)
- top_=`expr "$1" : '\([^:]*:/*[^/]*\)/'`
+ top_=`expr "z$1" : 'z\([^:]*:/*[^/]*\)/'`
while read path
do
case "$path" in
diff --git a/git-rebase.sh b/git-rebase.sh
index 5956f06..86dfe9c 100755
--- a/git-rebase.sh
+++ b/git-rebase.sh
@@ -94,7 +94,7 @@ case "$#" in
;;
*)
branch_name=`git symbolic-ref HEAD` || die "No current branch"
- branch_name=`expr "$branch_name" : 'refs/heads/\(.*\)'`
+ branch_name=`expr "z$branch_name" : 'zrefs/heads/\(.*\)'`
;;
esac
branch=$(git-rev-parse --verify "${branch_name}^0") || exit
diff --git a/git-tag.sh b/git-tag.sh
index 76e51ed..dc6aa95 100755
--- a/git-tag.sh
+++ b/git-tag.sh
@@ -75,7 +75,7 @@ git-check-ref-format "tags/$name" ||
object=$(git-rev-parse --verify --default HEAD "$@") || exit 1
type=$(git-cat-file -t $object) || exit 1
tagger=$(git-var GIT_COMMITTER_IDENT) || exit 1
-: ${username:=$(expr "$tagger" : '\(.*>\)')}
+: ${username:=$(expr "z$tagger" : 'z\(.*>\)')}
trap 'rm -f "$GIT_DIR"/TAG_TMP* "$GIT_DIR"/TAG_FINALMSG "$GIT_DIR"/TAG_EDITMSG' 0
-- [mdw]
* [PATCH] diff-options: add --stat (take 2)
From: Johannes Schindelin @ 2006-04-13 22:15 UTC
To: git, junkio
Now, you can say "git diff --stat" (to get an idea how many changes are
uncommitted), or "git log --stat".
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
---
Thanks to Junio's comments, this looks much better now; I am
reasonably happy about it (it even lost some pounds: 31 lines).
I still did not find a way to share code with git-apply's diffstat
code, though. But then, it is only one function (show_stats).
Documentation/diff-options.txt | 3 +
diff.c | 220 +++++++++++++++++++++++++++++++++++++++-
diff.h | 2
git-diff.sh | 6 +
4 files changed, 225 insertions(+), 6 deletions(-)
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 338014c..447e522 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -7,6 +7,9 @@
--patch-with-raw::
Generate patch but keep also the default raw diff output.
+--stat::
+ Generate a diffstat instead of a patch.
+
-z::
\0 line termination on output
diff --git a/diff.c b/diff.c
index a14e664..ad8478b 100644
--- a/diff.c
+++ b/diff.c
@@ -8,7 +8,7 @@ #include "cache.h"
#include "quote.h"
#include "diff.h"
#include "diffcore.h"
-#include "xdiff/xdiff.h"
+#include "xdiff-interface.h"
static int use_size_cache;
@@ -195,6 +195,137 @@ static int fn_out(void *priv, mmbuffer_t
return 0;
}
+struct diffstat_t {
+ struct xdiff_emit_state xm;
+
+ int nr;
+ int alloc;
+ struct diffstat_file {
+ char *name;
+ unsigned int added, deleted;
+ } **files;
+};
+
+static struct diffstat_file *diffstat_add(struct diffstat_t *diffstat,
+ const char *name)
+{
+ struct diffstat_file *x;
+ x = xcalloc(sizeof (*x), 1);
+ if (diffstat->nr == diffstat->alloc) {
+ diffstat->alloc = alloc_nr(diffstat->alloc);
+ diffstat->files = xrealloc(diffstat->files,
+ diffstat->alloc * sizeof(x));
+ }
+ diffstat->files[diffstat->nr++] = x;
+ x->name = strdup(name);
+ return x;
+}
+
+static void diffstat_consume(void *priv, char *line, unsigned long len)
+{
+ struct diffstat_t *diffstat = priv;
+ struct diffstat_file *x = diffstat->files[diffstat->nr - 1];
+
+ if (line[0] == '+')
+ x->added++;
+ else if (line[0] == '-')
+ x->deleted++;
+}
+
+static const char pluses[] = "++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++";
+static const char minuses[]= "----------------------------------------------------------------------";
+
+static void show_stats(struct diffstat_t* data)
+{
+ char *prefix = "";
+ int i, len, add, del, total, adds = 0, dels = 0;
+ int max, max_change = 0, max_len = 0;
+ int total_files = data->nr;
+
+ if (data->nr == 0)
+ return;
+
+ printf("---\n");
+
+ for (i = 0; i < data->nr; i++) {
+ struct diffstat_file *file = data->files[i];
+
+ if (max_change < file->added + file->deleted)
+ max_change = file->added + file->deleted;
+ len = strlen(file->name);
+ if (max_len < len)
+ max_len = len;
+ }
+
+ for (i = 0; i < data->nr; i++) {
+ char *name = data->files[i]->name;
+ int added = data->files[i]->added;
+ int deleted = data->files[i]->deleted;
+
+ if (0 < (len = quote_c_style(name, NULL, NULL, 0))) {
+ char *qname = xmalloc(len + 1);
+ quote_c_style(name, qname, NULL, 0);
+ free(name);
+ name = qname;
+ }
+
+ /*
+ * "scale" the filename
+ */
+ len = strlen(name);
+ max = max_len;
+ if (max > 50)
+ max = 50;
+ if (len > max) {
+ char *slash;
+ prefix = "...";
+ max -= 3;
+ name += len - max;
+ slash = strchr(name, '/');
+ if (slash)
+ name = slash;
+ }
+ len = max;
+
+ /*
+ * scale the add/delete
+ */
+ max = max_change;
+ if (max + len > 70)
+ max = 70 - len;
+
+ if (added < 0) {
+ /* binary file */
+ printf(" %s%-*s | Bin\n", prefix, len, name);
+ continue;
+ } else if (added + deleted == 0) {
+ total_files--;
+ continue;
+ }
+
+ add = added;
+ del = deleted;
+ total = add + del;
+ adds += add;
+ dels += del;
+
+ if (max_change > 0) {
+ total = (total * max + max_change / 2) / max_change;
+ add = (add * max + max_change / 2) / max_change;
+ del = total - add;
+ }
+ /* TODO: binary */
+ printf(" %s%-*s |%5d %.*s%.*s\n", prefix,
+ len, name, added + deleted,
+ add, pluses, del, minuses);
+ free(name);
+ free(data->files[i]);
+ }
+ free(data->files);
+ printf(" %d files changed, %d insertions(+), %d deletions(-)\n",
+ total_files, adds, dels);
+}
+
#define FIRST_FEW_BYTES 8000
static int mmfile_is_binary(mmfile_t *mf)
{
@@ -285,7 +416,36 @@ static void builtin_diff(const char *nam
free(b_two);
return;
}
+
+static void builtin_diffstat(const char *name_a, const char *name_b,
+ struct diff_filespec *one, struct diff_filespec *two,
+ struct diffstat_t *diffstat)
+{
+ mmfile_t mf1, mf2;
+ struct diffstat_file *data;
+ data = diffstat_add(diffstat, name_a ? name_a : name_b);
+
+ if (fill_mmfile(&mf1, one) < 0 || fill_mmfile(&mf2, two) < 0)
+ die("unable to read files to diff");
+
+ if (mmfile_is_binary(&mf1) || mmfile_is_binary(&mf2))
+ data->added = -1;
+ else {
+ /* Crazy xdl interfaces.. */
+ xpparam_t xpp;
+ xdemitconf_t xecfg;
+ xdemitcb_t ecb;
+
+ xpp.flags = XDF_NEED_MINIMAL;
+ xecfg.ctxlen = 3;
+ xecfg.flags = XDL_EMIT_FUNCNAMES;
+ ecb.outf = xdiff_outf;
+ ecb.priv = diffstat;
+ xdl_diff(&mf1, &mf2, &xpp, &xecfg, &ecb);
+ }
+}
+
struct diff_filespec *alloc_filespec(const char *path)
{
int namelen = strlen(path);
@@ -818,7 +978,28 @@ static void run_diff(struct diff_filepai
free(name_munged);
free(other_munged);
}
+
+static void run_diffstat(struct diff_filepair *p, struct diff_options *o,
+ struct diffstat_t *diffstat)
+{
+ const char *name;
+ const char *other;
+ if (DIFF_PAIR_UNMERGED(p)) {
+ /* unmerged */
+ builtin_diffstat(p->one->path, NULL, NULL, NULL, diffstat);
+ return;
+ }
+
+ name = p->one->path;
+ other = (strcmp(name, p->two->path) ? p->two->path : NULL);
+
+ diff_fill_sha1_info(p->one);
+ diff_fill_sha1_info(p->two);
+
+ builtin_diffstat(name, other, p->one, p->two, diffstat);
+}
+
void diff_setup(struct diff_options *options)
{
memset(options, 0, sizeof(*options));
@@ -866,6 +1047,8 @@ int diff_opt_parse(struct diff_options *
options->output_format = DIFF_FORMAT_PATCH;
options->with_raw = 1;
}
+ else if (!strcmp(arg, "--stat"))
+ options->output_format = DIFF_FORMAT_DIFFSTAT;
else if (!strcmp(arg, "-z"))
options->line_termination = 0;
else if (!strncmp(arg, "-l", 2))
@@ -1163,6 +1346,19 @@ static void diff_flush_patch(struct diff
return; /* no tree diffs in patch format */
run_diff(p, o);
+}
+
+static void diff_flush_stat(struct diff_filepair *p, struct diff_options *o,
+ struct diffstat_t *diffstat)
+{
+ if (diff_unmodified_pair(p))
+ return;
+
+ if ((DIFF_FILE_VALID(p->one) && S_ISDIR(p->one->mode)) ||
+ (DIFF_FILE_VALID(p->two) && S_ISDIR(p->two->mode)))
+ return; /* no tree diffs in patch format */
+
+ run_diffstat(p, o, diffstat);
}
int diff_queue_is_empty(void)
@@ -1276,7 +1472,8 @@ static void diff_resolve_rename_copy(voi
static void flush_one_pair(struct diff_filepair *p,
int diff_output_format,
- struct diff_options *options)
+ struct diff_options *options,
+ struct diffstat_t *diffstat)
{
int inter_name_termination = '\t';
int line_termination = options->line_termination;
@@ -1291,6 +1488,9 @@ static void flush_one_pair(struct diff_f
break;
default:
switch (diff_output_format) {
+ case DIFF_FORMAT_DIFFSTAT:
+ diff_flush_stat(p, options, diffstat);
+ break;
case DIFF_FORMAT_PATCH:
diff_flush_patch(p, options);
break;
@@ -1316,19 +1516,31 @@ void diff_flush(struct diff_options *opt
struct diff_queue_struct *q = &diff_queued_diff;
int i;
int diff_output_format = options->output_format;
+ struct diffstat_t *diffstat = NULL;
+ if (diff_output_format == DIFF_FORMAT_DIFFSTAT) {
+ diffstat = xcalloc(sizeof (struct diffstat_t), 1);
+ diffstat->xm.consume = diffstat_consume;
+ }
+
if (options->with_raw) {
for (i = 0; i < q->nr; i++) {
struct diff_filepair *p = q->queue[i];
- flush_one_pair(p, DIFF_FORMAT_RAW, options);
+ flush_one_pair(p, DIFF_FORMAT_RAW, options, NULL);
}
putchar(options->line_termination);
}
for (i = 0; i < q->nr; i++) {
struct diff_filepair *p = q->queue[i];
- flush_one_pair(p, diff_output_format, options);
+ flush_one_pair(p, diff_output_format, options, diffstat);
diff_free_filepair(p);
}
+
+ if (diffstat) {
+ show_stats(diffstat);
+ free(diffstat);
+ }
+
free(q->queue);
q->queue = NULL;
q->nr = q->alloc = 0;
diff --git a/diff.h b/diff.h
index 236095f..2f8aff2 100644
--- a/diff.h
+++ b/diff.h
@@ -119,6 +119,7 @@ #define COMMON_DIFF_OPTIONS_HELP \
" -u synonym for -p.\n" \
" --patch-with-raw\n" \
" output both a patch and the diff-raw format.\n" \
+" --stat show diffstat instead of patch.\n" \
" --name-only show only names of changed files.\n" \
" --name-status show names and status of changed files.\n" \
" --full-index show full object name on index lines.\n" \
@@ -142,6 +143,7 @@ #define DIFF_FORMAT_PATCH 2
#define DIFF_FORMAT_NO_OUTPUT 3
#define DIFF_FORMAT_NAME 4
#define DIFF_FORMAT_NAME_STATUS 5
+#define DIFF_FORMAT_DIFFSTAT 6
extern void diff_flush(struct diff_options*);
diff --git a/git-diff.sh b/git-diff.sh
index dc0dd31..0fe6770 100755
--- a/git-diff.sh
+++ b/git-diff.sh
@@ -30,9 +30,11 @@ case " $flags " in
cc_or_p=--cc ;;
esac
-# If we do not have --name-status, --name-only, -r, or -c default to --cc.
+# If we do not have --name-status, --name-only, -r, -c or --stat,
+# default to --cc.
case " $flags " in
-*" '--name-status' "* | *" '--name-only' "* | *" '-r' "* | *" '-c' "* )
+*" '--name-status' "* | *" '--name-only' "* | *" '-r' "* | *" '-c' "* | \
+*" '--stat' "*)
;;
*)
flags="$flags'$cc_or_p' " ;;
^ permalink raw reply related
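The bar scaling near the end of show_stats() in the patch above can be sketched in shell arithmetic. This is only an illustration of the rounding; the function name scale_bar and its argument order are made up here and do not appear in diff.c.

```shell
# Hypothetical helper mirroring the rounding done in show_stats().
# Arguments: added deleted max max_change
#   max        - columns available for the +/- bar
#   max_change - largest added+deleted over all files
scale_bar () {
    added=$1 deleted=$2 max=$3 max_change=$4
    total=$(( added + deleted ))
    add=$added
    if [ "$max_change" -gt 0 ]; then
        # Adding max_change / 2 before dividing rounds to nearest.
        total=$(( (total * max + max_change / 2) / max_change ))
        add=$(( (added * max + max_change / 2) / max_change ))
    fi
    # Deriving del from the rounded total keeps the bar width exact.
    del=$(( total - add ))
    echo "$add $del"
}

scale_bar 30 10 20 100
```

With 30 additions and 10 deletions, a 20-column budget and a largest change of 100, this prints "6 2". Computing the number of minuses as total minus pluses, rather than rounding it separately, means the two parts never add up to more than the rounded total.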
* Re: [PATCH] diff-options: add --stat (take 2)
From: Johannes Schindelin @ 2006-04-13 23:09 UTC (permalink / raw)
To: git, junkio
In-Reply-To: <Pine.LNX.4.63.0604140012560.10924@wbgn013.biozentrum.uni-wuerzburg.de>
... and a fix for an invalid free():
---
diff.c | 10 +++++-----
1 files changed, 5 insertions(+), 5 deletions(-)
14d8e3c7cda1e2aaff62375fe34db2458d302173
diff --git a/diff.c b/diff.c
index ad8478b..2968153 100644
--- a/diff.c
+++ b/diff.c
@@ -266,7 +266,7 @@ static void show_stats(struct diffstat_t
char *qname = xmalloc(len + 1);
quote_c_style(name, qname, NULL, 0);
free(name);
- name = qname;
+ data->files[i]->name = name = qname;
}
/*
@@ -297,10 +297,10 @@ static void show_stats(struct diffstat_t
if (added < 0) {
/* binary file */
printf(" %s%-*s | Bin\n", prefix, len, name);
- continue;
+ goto free_diffstat_file;
} else if (added + deleted == 0) {
total_files--;
- continue;
+ goto free_diffstat_file;
}
add = added;
@@ -314,11 +314,11 @@ static void show_stats(struct diffstat_t
add = (add * max + max_change / 2) / max_change;
del = total - add;
}
- /* TODO: binary */
printf(" %s%-*s |%5d %.*s%.*s\n", prefix,
len, name, added + deleted,
add, pluses, del, minuses);
- free(name);
+ free_diffstat_file:
+ free(data->files[i]->name);
free(data->files[i]);
}
free(data->files);
--
1.3.0.rc3.g9813
^ permalink raw reply related
* Re: [RFH] shifting xdiff hunks?
From: Junio C Hamano @ 2006-04-13 23:31 UTC (permalink / raw)
To: Davide Libenzi; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0604131452250.10564@alien.or.mcafeemobile.com>
Davide Libenzi <davidel@xmailserver.org> writes:
> On Wed, 12 Apr 2006, Davide Libenzi wrote:
>
>> Yes, this is what GNU diff does. It's a post-process of the edit
>> script. Not a problem at all. Till this weekend (included) I'm
>> pretty booked, but I'll do that in the following days.
>
> Dang, that was a short weekend. I found a lunch-time hour for
> this. Would you try to see if this libxdiff-based diff merges on your
> tree?
> See also how it looks for you.
Very impressed, and pleased with the result. I've only taken a
cursory look, but with a very limited number of tests, it looks
much better. Thanks.
For the sake of full disclosure, the reason I wanted consistency
was not for the diff output I quoted earlier, but to help make
the combined patch output cleaner. It does reduce false matches
from the infamous 12-way Octopus by Len Brown:
git diff-tree --cc 9fdb62af92c741addbea15545f214a6e89460865
^ permalink raw reply
* Re: [PATCH] Shell utilities: Guard against expr' magic tokens.
From: Junio C Hamano @ 2006-04-13 23:39 UTC (permalink / raw)
To: Mark Wooding; +Cc: git
In-Reply-To: <slrne3tihk.1dq.mdw@metalzone.distorted.org.uk>
Mark Wooding <mdw@distorted.org.uk> writes:
> From: Mark Wooding <mdw@distorted.org.uk>
>
> Some words, e.g., `match', are special to expr(1), and cause strange
> parsing effects. Track down all uses of expr and mangle the arguments
> so that this isn't a problem.
Gaaaaaaaaaah.
http://www.opengroup.org/onlinepubs/009695399/utilities/expr.html
says use of length, substr, index, match as string arguments
produces unspecified results, so obviously the program was
wrong.
Thanks.
^ permalink raw reply
* [PATCH] Fix-up previous expr changes.
From: Junio C Hamano @ 2006-04-14 2:12 UTC (permalink / raw)
To: git; +Cc: Mark Wooding
In-Reply-To: <slrne3tihk.1dq.mdw@metalzone.distorted.org.uk>
The regexp on the right hand side of expr : operator somehow was
broken.
expr 'z+pu:refs/tags/ko-pu' : 'z\+\(.*\)'
does not strip '+'; write 'z+\(.*\)' instead.
We probably should switch to shell based substring post 1.3.0;
that's not bashism but just POSIX anyway.
Signed-off-by: Junio C Hamano <junkio@cox.net>
---
* Funny thing is that before the z prefixing, the code was
already broken (we said expr "$ref" : '\+\(.*\)'), but
somehow it worked. It could be a bug in expr.
# already buggy but did not trigger somehow.
: siamese; expr '+pu:ko-pu' : '\+\(.*\)'
pu:ko-pu
# z prefix exposed the breakage.
: siamese; expr 'z+pu:ko-pu' : 'z\+\(.*\)'
+pu:ko-pu
# the fix-up this patch is about.
: siamese; expr 'z+pu:ko-pu' : 'z+\(.*\)'
pu:ko-pu
# this is the way it should have been written from the start.
: siamese; expr '+pu:ko-pu' : '+\(.*\)'
pu:ko-pu
# maybe I am using broken expr...
: siamese; type expr
expr is hashed (/usr/bin/expr)
: siamese; /usr/bin/expr --version |head -n2
expr (GNU coreutils) 5.94
Copyright (C) 2006 Free Software Foundation, Inc.
git-fetch.sh | 4 ++--
git-parse-remote.sh | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
dfdcb558ecf93c0e09b8dab89cff4839e8c95e36
diff --git a/git-fetch.sh b/git-fetch.sh
index 711650f..83143f8 100755
--- a/git-fetch.sh
+++ b/git-fetch.sh
@@ -252,10 +252,10 @@ fetch_main () {
else
not_for_merge=
fi
- if expr "z$ref" : 'z\+' >/dev/null
+ if expr "z$ref" : 'z+' >/dev/null
then
single_force=t
- ref=$(expr "z$ref" : 'z\+\(.*\)')
+ ref=$(expr "z$ref" : 'z+\(.*\)')
else
single_force=
fi
diff --git a/git-parse-remote.sh b/git-parse-remote.sh
index 65c66d5..c9b899e 100755
--- a/git-parse-remote.sh
+++ b/git-parse-remote.sh
@@ -77,7 +77,7 @@ canon_refs_list_for_fetch () {
force=
case "$ref" in
+*)
- ref=$(expr "z$ref" : 'z\+\(.*\)')
+ ref=$(expr "z$ref" : 'z+\(.*\)')
force=+
;;
esac
--
1.3.0.rc3.gce03
^ permalink raw reply related
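A minimal sketch of the fix-up above, assuming a POSIX expr. In a basic regular expression '+' is an ordinary character, while the meaning of '\+' was left undefined by POSIX at the time; GNU expr treats it as "one or more of the preceding item", which is presumably why 'z\+' swallowed the z and left the '+' in the result. The function names are illustrative only and are not taken from git-fetch.sh; the second one uses the shell substring expansion the commit message alludes to.

```shell
# Illustrative helpers, not part of git-fetch.sh or git-parse-remote.sh.

# Strip one leading '+' with expr; '+' is literal in a basic regex.
strip_force_expr () {
    expr "z$1" : 'z+\(.*\)'
}

# Same result with POSIX parameter expansion, no external process.
strip_force_param () {
    printf '%s\n' "${1#+}"
}

strip_force_expr '+pu:refs/tags/ko-pu'
strip_force_param '+pu:refs/tags/ko-pu'
```

Both calls print pu:refs/tags/ko-pu. The two differ when there is no leading '+': the expr version prints an empty line and returns non-zero, while the parameter expansion passes the ref through unchanged, so callers would still need the "does it start with +" test first.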
* Solaris test t5500 race condition
From: Peter Eriksen @ 2006-04-14 3:17 UTC (permalink / raw)
To: git
Hello,
I've found a race in t5500-fetch-pack.sh. The problem is the way the
number of unpacked objects is counted:
pack_count=$(grep Unpacking log.txt|tr -dc "0-9")
It just concatenates all the digits on the line with "Unpacking" in it.
This is the output I get on Solaris:
Generating pack...
Done counting 3 objects.
Deltifying 3 objects.
33% (1/3) done^M 66% (2/3) done^M 100% (3/3) done
Total 3Unpacking , written 33 objects <------------
(delta 0), reused 0 (delta 0)
11fa2f0cb58ed7f02dbd5ac75ed82a53fae62a7b refs/heads/A
The marked line is written as a joyful duet between these
two functions:
unpack-objects.c: fprintf(stderr, "Unpacking %d objects\n", nr_objects);
pack-objects.c: fprintf(stderr, "Total %d, written %d (delta %d), reused %d (delta %d)\n",
I can't think of a good solution right now.
Regards,
Peter
^ permalink raw reply
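The failure mode reported above can be reproduced without git by feeding the two kinds of line through the same pipeline the test uses. The function name unpack_count is made up for this sketch; the grep and tr invocation is the one quoted from t5500.

```shell
# The extraction used by the test, wrapped in a function for reuse.
unpack_count () {
    grep Unpacking | tr -dc "0-9"
}

# What unpack-objects prints when nothing interleaves with it.
clean=$(printf 'Unpacking 3 objects\n' | unpack_count)

# The interleaved line from the Solaris log above.
garbled=$(printf 'Total 3Unpacking , written 33 objects\n' | unpack_count)

echo "clean:   $clean"
echo "garbled: $garbled"
```

The clean line yields 3, the garbled one yields 333, because tr keeps every digit on the matching line regardless of which process wrote it. The test then compares 333 against the expected count and fails even though the transfer itself was fine.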
* [PATCH] diff --stat: no need to ask funcnames nor context.
From: Junio C Hamano @ 2006-04-14 4:37 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git
In-Reply-To: <Pine.LNX.4.63.0604140012560.10924@wbgn013.biozentrum.uni-wuerzburg.de>
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> Now, you can say "git diff --stat" (to get an idea how many changes are
> uncommitted), or "git log --stat".
>
> Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Nice.
-- >8 --
Signed-off-by: Junio C Hamano <junkio@cox.net>
---
diff.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
84981f9ad963f050abf4fe33ac07d36b4ea90c6d
diff --git a/diff.c b/diff.c
index c120239..f1b672d 100644
--- a/diff.c
+++ b/diff.c
@@ -438,8 +438,8 @@ static void builtin_diffstat(const char
xdemitcb_t ecb;
xpp.flags = XDF_NEED_MINIMAL;
- xecfg.ctxlen = 3;
- xecfg.flags = XDL_EMIT_FUNCNAMES;
+ xecfg.ctxlen = 0;
+ xecfg.flags = 0;
ecb.outf = xdiff_outf;
ecb.priv = diffstat;
xdl_diff(&mf1, &mf2, &xpp, &xecfg, &ecb);
--
1.3.0.rc3.g9306
^ permalink raw reply related
* Re: Solaris test t5500 race condition
From: Jason Riedy @ 2006-04-14 5:03 UTC (permalink / raw)
To: Peter Eriksen; +Cc: git
In-Reply-To: <20060414031759.GA9524@bohr.gbar.dtu.dk>
And "Peter Eriksen" writes:
- I've found a race in t5500-fetch-pack.sh.
Crap. I ran into this on AIX a while ago; I was hoping no
other systems would see it. There are no guarantees that
the two processes' outputs will be mutually line buffered.
Luckily, it's just a cosmetic problem, but it does cause
that test case to fail.
I know how to fix it (imho), but have no time to implement
it. There needs to be a separate communication stage after
negotiating the objects and before dumping the pack. During
that stage, upload-pack would just send progress notices to
the caller. Only the caller would communicate to the terminal.
Some other ideas are in
http://marc.theaimsgroup.com/?l=git&m=114357528512063&w=2
Jason
^ permalink raw reply
* Re: Solaris test t5500 race condition
From: Junio C Hamano @ 2006-04-14 5:34 UTC (permalink / raw)
To: Peter Eriksen; +Cc: git
In-Reply-To: <20060414031759.GA9524@bohr.gbar.dtu.dk>
"Peter Eriksen" <s022018@student.dtu.dk> writes:
> Generating pack...
> Done counting 3 objects.
> Deltifying 3 objects.
> 33% (1/3) done^M 66% (2/3) done^M 100% (3/3) done
> Total 3Unpacking , written 33 objects <------------
> (delta 0), reused 0 (delta 0)
> 11fa2f0cb58ed7f02dbd5ac75ed82a53fae62a7b refs/heads/A
Hmph. Not good. Before the writer managed to flush the report
the reader has already decoded the header and reports the number
of objects it is going to unpack.
Unfortunately the Solaris box I have access to is perhaps
sufficiently slow that this is not an issue X-<.
I think a test based on the eye-candy is fragile anyway. We would
probably want to _count_ before and after to see if the command
did what we expected.
There is a subtle difficulty doing so, however. The test is
trying to see if fetch-pack vs upload-pack negotiations result
in minimal transfer, but if it is not, unpack side would just
happily say "I received this one, oh, I already have it".
We could do "fetch-pack -k" to keep the result packed, and count the
number of objects in the resulting pack.
How about doing something like this instead?
-- >8 --
[PATCH] t5500: test fix
Relying on the eye-candy progress bar was fragile to begin with.
Run fetch-pack with -k option, and count the objects that are in
the pack that were transferred from the other end.
Signed-off-by: Junio C Hamano <junkio@cox.net>
---
t/t5500-fetch-pack.sh | 33 ++++++++++++++-------------------
1 files changed, 14 insertions(+), 19 deletions(-)
7f732c632ff7a1adc2309257becdc0c1fe76b514
diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index e15e14f..92f12d9 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -12,11 +12,6 @@ # Test fetch-pack/upload-pack pair.
# Some convenience functions
-function show_count () {
- commit_count=$(($commit_count+1))
- printf " %d\r" $commit_count
-}
-
function add () {
local name=$1
local text="$@"
@@ -55,13 +50,6 @@ function test_expect_object_count () {
"test $count = $output"
}
-function test_repack () {
- local rep=$1
-
- test_expect_success "repack && prune-packed in $rep" \
- '(git-repack && git-prune-packed)2>>log.txt'
-}
-
function pull_to_client () {
local number=$1
local heads=$2
@@ -70,13 +58,23 @@ function pull_to_client () {
cd client
test_expect_success "$number pull" \
- "git-fetch-pack -v .. $heads > log.txt 2>&1"
+ "git-fetch-pack -k -v .. $heads"
case "$heads" in *A*) echo $ATIP > .git/refs/heads/A;; esac
case "$heads" in *B*) echo $BTIP > .git/refs/heads/B;; esac
git-symbolic-ref HEAD refs/heads/${heads:0:1}
+
test_expect_success "fsck" 'git-fsck-objects --full > fsck.txt 2>&1'
- test_expect_object_count "after $number pull" $count
- pack_count=$(grep Unpacking log.txt|tr -dc "0-9")
+
+ test_expect_success 'check downloaded results' \
+ 'mv .git/objects/pack/pack-* . &&
+ p=`ls -1 pack-*.pack` &&
+ git-unpack-objects <$p &&
+ git-fsck-objects --full'
+
+ test_expect_success "new object count after $number pull" \
+ 'idx=`echo pack-*.idx` &&
+ pack_count=`git-show-index <$idx | wc -l` &&
+ test $pack_count = $count'
test -z "$pack_count" && pack_count=0
if [ -z "$no_strict_count_check" ]; then
test_expect_success "minimal count" "test $count = $pack_count"
@@ -84,6 +82,7 @@ function pull_to_client () {
test $count != $pack_count && \
echo "WARNING: $pack_count objects transmitted, only $count of which were needed"
fi
+ rm -f pack-*
cd ..
}
@@ -117,8 +116,6 @@ git-symbolic-ref HEAD refs/heads/B
pull_to_client 1st "B A" $((11*3))
-(cd client; test_repack client)
-
add A11 $A10
prev=1; cur=2; while [ $cur -le 65 ]; do
@@ -129,8 +126,6 @@ done
pull_to_client 2nd "B" $((64*3))
-(cd client; test_repack client)
-
pull_to_client 3rd "A" $((1*3)) # old fails
test_done
--
1.3.0.rc3.g9306
^ permalink raw reply related
* What's in git.git
From: Junio C Hamano @ 2006-04-14 7:49 UTC (permalink / raw)
To: git
Getting closer with a bunch of fixes, perhaps a real 1.3.0 early
next week.
I'd appreciate people beating what's in the "master" branch to
shake down the last minute brown paper bag problems.
BTW, I shifted my git day from usual Wednesday to Thursday this
week. I may do the same the next week.
* The 'master' branch has these since the last announcement.
- More Solaris 9 portability (Dennis Stosberg)
- kill index() and replace it with strchr() (Dennis Stosberg)
- git-apply -C to apply patch with fuzz (Eric W. Biederman)
- git-log [diff options]
- Retire git-log.sh
- Combine-diff fix
- Retire t5501 test
- Fix "echo -n foo | git commit -F -"
- diff --patch-with-raw (Pasky and me)
- Documentation updates (Pasky and me)
- Fix running t3600 test as root.
- "expr match : foobar" fix (Mark Wooding and me)
- commit message formatting fix for incomplete line (Linus)
- git-log memory footprint fix (Linus)
* The 'next' branch, in addition, has these.
- xdiff: post-process hunks to make them consistent (Davide Libenzi)
- diff --stat (Johannes Schindelin and me)
- t5500 test fix
^ permalink raw reply
* Recent unresolved issues
From: Junio C Hamano @ 2006-04-14 9:31 UTC (permalink / raw)
To: git
Here is a list of topics in the recent git traffic that I feel were
inadequately addressed. I've commented on some of them to give
people a feel for what my priorities are. Somebody might want
to rehash the ones low on my priority list to conclusion with a
concrete proposal if they cared about them enough. The list is
*not* ordered in any way.
Also please add whatever I missed (or dismissed). I am hoping
this will be a good basis for 1.4 to-do list.
* Message-ID: <Pine.LNX.4.64.0604121828370.14565@g5.osdl.org>
Common option parsing (Linus Torvalds)
* Message-ID: <Pine.LNX.4.64.0604050855080.2550@localhost.localdomain>
Binary diff output? (Nicolas Pitre)
I do not think this is needed for our primary audience (the
kernel project), but I am sure it would be helpful for some
other projects if we allowed them to exchange patches that
describe binary file changes via e-mail, so I am not
dismissing this. Needs to wait for "option parsing".
* Message-ID: <Pine.LNX.4.64.0604111725590.14565@g5.osdl.org>
Colored diff? (Linus Torvalds)
I am not opposed to it, but I'd like to do that internally if
we go this route. Needs to wait for "option parsing". Also
Message-ID: <3536.10.10.10.24.1114117965.squirrel@linux1> is
slightly related to this.
* Message-ID: <7vek02ynif.fsf@assigned-by-dhcp.cox.net>
diff --with-raw, --with-stat? (me)
I think "git diff" can be internalized next, after "option
parsing" unification. When that is done, --with-stat would
help internalize format-patch's process_one(), and it would be
trivial to do "git log --pretty=format-patch master..next".
* #irc 2006-04-10
Shallow clones (Carl Worth).
The experiment last round did not work out very well, but as
existing repositories get bigger, and more projects are being
migrated from foreign SCM systems, this would become a
must-have from would-be-nice-to-have.
I am beginning to think using "graft" to cauterize history
for this, while it technically would work, would not be so
helpful to users, so the design needs to be worked out again.
* Message-ID: <E1FMH3o-0001B5-Dw@jdl.com>
git status does not distinguish contents changes and mode
changes; it just says "modified" (Jon Loeliger).
Unconditionally changing the status letter would break
Porcelains so we would need an extra option to do this.
An outline patch has already been prepared -- this perhaps has
to wait until we sort out the "option parsing" one.
* Message-ID: <tnxmzf9sh7k.fsf@arm.com>
git could use diff3 instead of merge which is a wrapper around
diff3. (Catalin Marinas)
If having "diff3" is a lot more common than having "merge", I
do not have a problem with this; "merge" being a wrapper to
"diff3", people who have been happy with the current code
would certainly have "diff3" installed so changing to "diff3"
would not break them.
* Message-ID: <81b0412b0603020649u99a2035i3b8adde8ddce9410@mail.gmail.com>
Windows problems summary (Alex Riesen)
A good list to keep in mind.
* Message-ID: <Pine.LNX.4.64.0604030730040.3781@g5.osdl.org>
Huge packfiles (Linus Torvalds)
Because I do not think asking users to break up packs to
manageable and mmap()able size is too much to ask, I would not
be advocating for updating the pack idx to 64-bit offset and
mmap()ing parts of a packfile, at least too strongly.
However, we currently lack tool support or recipe for users
with such a repository to easily break up packs.
* Message-ID: <1143856098.3555.48.camel@dv>
Per branch property, esp. where to merge from (Pavel Roskin)
This involves user-level "world model" design, which is more
Porcelainish than Plumbing, and as people know I do not do
Porcelain well; interested parties need to come up with what
they want and how they want to use it.
^ permalink raw reply
* Re: Solaris test t5500 race condition
From: Peter Eriksen @ 2006-04-14 11:53 UTC (permalink / raw)
To: git
In-Reply-To: <7vhd4wvhyq.fsf@assigned-by-dhcp.cox.net>
On Thu, Apr 13, 2006 at 10:34:05PM -0700, Junio C Hamano wrote:
> "Peter Eriksen" <s022018@student.dtu.dk> writes:
>
> > Generating pack...
> > Done counting 3 objects.
> > Deltifying 3 objects.
> > 33% (1/3) done^M 66% (2/3) done^M 100% (3/3) done
> > Total 3Unpacking , written 33 objects <------------
> > (delta 0), reused 0 (delta 0)
> > 11fa2f0cb58ed7f02dbd5ac75ed82a53fae62a7b refs/heads/A
>
> Hmph. Not good. Before the writer managed to flush the report
> the reader has already decoded the header and reports the number
> of objects it is going to unpack.
...
> -- >8 --
> [PATCH] t5500: test fix
With the patch it doesn't complain anymore. There are many other
problems with the tests on Solaris though.
Peter
^ permalink raw reply
* Re: Recent unresolved issues
From: Petr Baudis @ 2006-04-14 16:02 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
In-Reply-To: <7v64lcqz9j.fsf@assigned-by-dhcp.cox.net>
Dear diary, on Fri, Apr 14, 2006 at 11:31:36AM CEST, I got a letter
where Junio C Hamano <junkio@cox.net> said that...
> Here is a list of topics in the recent git traffic that I feel were
> inadequately addressed. I've commented on some of them to give
> people a feel for what my priorities are. Somebody might want
> to rehash the ones low on my priority list to conclusion with a
> concrete proposal if they cared about them enough. The list is
> *not* ordered in any way.
Nice summary!
> * Message-ID: <tnxmzf9sh7k.fsf@arm.com>
> git could use diff3 instead of merge which is a wrapper around
> diff3. (Catalin Marinas)
>
> If having "diff3" is a lot more common than having "merge", I
> do not have a problem with this; "merge" being a wrapper to
> "diff3", people who have been happy with the current code
> would certainly have "diff3" installed so changing to "diff3"
> would not break them.
I've decided to bite the bullet and made Cogito use diff3 instead of
merge as of now. Let's see if anybody complains...
> * Message-ID: <1143856098.3555.48.camel@dv>
> Per branch property, esp. where to merge from (Pavel Roskin)
>
> This involves user-level "world model" design, which is more
> Porcelainish than Plumbing, and as people know I do not do
> Porcelain well; interested parties need to come up with what
> they want and how they want to use it.
--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Right now I am having amnesia and deja-vu at the same time. I think
I have forgotten this before.
^ permalink raw reply
* Re: Default remote branch for local branch
From: Petr Baudis @ 2006-04-14 16:16 UTC (permalink / raw)
To: Josef Weidendorfer; +Cc: Pavel Roskin, Junio C Hamano, git
In-Reply-To: <200604021817.30222.Josef.Weidendorfer@gmx.de>
Dear diary, on Sun, Apr 02, 2006 at 06:17:29PM CEST, I got a letter
where Josef Weidendorfer <Josef.Weidendorfer@gmx.de> said that...
> > I would write the config like this:
> >
> > [branch-upstream]
> > master = linus
> > ata-irq-pio = irq-pio
> > ata-pata = pata-drivers
>
> That is not working, as said above. But with above syntax extension,
> with s/=/for/ it would be fine.
I'm sorry but I'm slow and I don't see it - why wouldn't this work?
(Except that the key name is case insensitive, which isn't too big a
deal IMHO.)
I for one think that the 'for'-syntax is insane - it's unreadable (your
primary query is by far most likely to be "what's the upstream when on
branch X", not "what branches is this upstream for"), would convolute
the configuration file syntax unnecessarily and would possibly also
complicate the git-repo-config interface. Pavel's syntax is much nicer.
--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Right now I am having amnesia and deja-vu at the same time. I think
I have forgotten this before.
^ permalink raw reply
* git-stripspace breakage
From: Linus Torvalds @ 2006-04-14 16:40 UTC (permalink / raw)
To: Junio C Hamano, Git Mailing List
Junio,
the current git-stripspace leaves extra newlines at the end, causing ugly
commit logs in "git log". I assume/suspect that it's the recent
"incomplete line" handling (that I acked, bad me), but I didn't actually
test.
Trivially tested thus:
[torvalds@g5 git]$ git-stripspace <<EOF
> a
>
> EOF
a
[torvalds@g5 git]$
note the extra unnecessary newline..
Linus
^ permalink raw reply
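For reference, the behaviour being asked for in the report above (no blank lines left at the end) can be sketched with awk. This is not how git-stripspace is implemented and covers only the trailing-blank-line part; the function name is made up here.

```shell
# Hypothetical filter: drop blank lines at the end of the input,
# keep blank lines that are followed by more text.
strip_trailing_blank_lines () {
    awk '
        /^[ \t]*$/ { pending++; next }   # hold back blank lines
        {
            # A non-blank line arrived, so the held lines were not trailing.
            while (pending > 0) { print ""; pending-- }
            print
        }
    '
}

printf 'a\n\n' | strip_trailing_blank_lines
```

Fed the same input as the transcript above, this prints the single line "a" with nothing after it. Blank lines between paragraphs survive because they are flushed as soon as the next non-blank line shows up; leading blank lines are also kept, which a real commit-message cleaner would probably not want.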
* Re: Solaris test t5500 race condition
From: Jason Riedy @ 2006-04-14 16:41 UTC (permalink / raw)
To: Peter Eriksen; +Cc: git
In-Reply-To: <20060414115317.GA5191@bohr.gbar.dtu.dk>
And "Peter Eriksen" writes:
- > -- >8 --
- > [PATCH] t5500: test fix
-
- With the patch it doesn't complain anymore. There are many other
- problems with the tests on Solaris though.
I just ran next branch's tests on 5.8 with no problems. Could
you be a bit more specific?
Jason
^ permalink raw reply
* Re: Fix up diffcore-rename scoring
From: Geert Bosch @ 2006-04-14 17:46 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
In-Reply-To: <7vmzer4vmm.fsf@assigned-by-dhcp.cox.net>
On Apr 11, 2006, at 18:04, Junio C Hamano wrote:
>> Here's a possible way to do that first cut. Basically,
>> compute a short (256-bit) fingerprint for each file, such
>> that the Hamming distance between two fingerprints is a measure
>> for their similarity. I'll include a draft write up below.
>
> Thanks for starting this.
>
> There are a few things I need to talk about the way "similarity"
> is _used_ in the current algorithms.
>
> Rename/copy detection outputs "similarity" but I suspect what
> the algorithm wants is slightly different from what humans think
> of "similarity". It is somewhere between "similarity" and
> "commonness". When you are grading a 130-page report a student
> submitted, you would want to notice that the last 30 pages are
> almost verbatim copy from somebody else's report. The student
> in question added 100-page original contents so maybe this is
> not too bad, but if the report were a 30-page one, and the
> entire 30 pages were borrowed from somebody else's 130-page
> report, you would _really_ want to notice.
There just isn't enough information in a 256-bit fingerprint
to be able to determine if two strings have a long common
substring. Also, when the input gets longer, like a few MB,
or when the input has little information content (compresses
very well), statistical bias will reduce reliability.
Still, I used the similarity test on large tar archives, such
as complete GCC releases, and it does give reasonable
similarity estimates. Non-related inputs rarely have scores
above 5.
potomac%../gsimm -rd026c470aab28a1086403768a428358f218bba049d47e7d49f8589c2c0baca0c *.tar
55746560 gcc-2.95.1.tar 123 3.1
55797760 gcc-2.95.2.tar 112 11.8
55787520 gcc-2.95.3.tar 112 11.8
87490560 gcc-3.0.1.tar 112 11.8
88156160 gcc-3.0.2.tar 78 38.6
86630400 gcc-3.0.tar 80 37.0
132495360 gcc-3.1.tar 0 100.0
I'm mostly interested in the data storage aspects of git,
looking bottom-up at the blobs stored and deriving information
from that. My similarity estimator allows one to look at thousands
of large checked in files and quickly identify similar files.
For example, in the above case, you'd find it makes sense
to store gcc-3.1.tar as a difference from gcc-3.0.tar.
Doing an actual diff between these two archives takes a few
seconds, while the fingerprints can be compared in microseconds.
> While reorganizing a program, a nontrivial amount of text is
> often removed from an existing file and moved to a newly created
> file. Right now, the way similarity score is calculated has a
> heuristical cap to reject two files whose sizes are very
> different, but to detect and show this kind of file split, the
> sizes of files should matter less.
The way to do this is to split a file at content-determined
breakpoints: check the last n bits of a cyclic checksum over
a sliding window, and break if they match a magic number.
This would split the file in blocks with expected size of 2^n.
Then you'd store a fingerprint per chunk.
> [...]
> Another place we use "similarity" is to break a file that got
> modified too much. This is done for two independent purposes.
This could be done directly using the given algorithm.
> [...] Usually rename/copy
> detection tries to find rename/copy into files that _disappear_
> from the result, but with the above sequence, B never
> disappears. By looking at how dissimilar the preimage and
> postimage of B are, we tell the rename/copy detector that B,
> although it does not disappear, might have been renamed/copied
> from somewhere else.
This could also be cheaply determined by my similarity estimator.
Almost always, you'd have a high similarity score. When there is
a low score, you could verify with a more precise and expensive
algorithm to have a consistent decision on what is considered
a break.
There is a -v option that gives more verbose output, including
estimated and actual average distances from the origin for the
random walks. For random input they'll be very close, but for
input with a lot of repetition the actual average will be far
larger. The ratio can be used as a measure of reliability of
the fingerprint: ratios closer to 1 are better.
> Also we can make commonness matter even more in the similarity
> used to "break" a file than rename detector, because if we are
> going to break it, we will not have to worry about the issue of
> showing an annoying diff that removes 100 lines after copying a
> 130-line file. This implies that the break algorithm needs to
> use two different kinds of similarity, one for breaking and then
> another for deciding how to show the broken pieces as a diff.
>
> Sorry if this write-up does not make much sense. It ended up
> being a lot more incoherent than I hoped it to be.
Regular diff algorithms will always give the most precise result.
What my similarity estimator does is give a probability that
two files have a lot of common substrings. Say you have a
git archive with 10,000 blobs of about 1 MB each, and you want
to determine how to pack it. You clearly can't run a diff
program on every pair, but you can use the estimates.
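A back-of-the-envelope calculation makes the scale concrete. It assumes that
"using diff programs" means diffing every unordered pair of blobs and that
each diff reads both of its inputs in full; the variable names are
illustrative only.

```python
import math

blobs = 10_000
blob_size_mb = 1  # "about 1 MB" per blob

# Every unordered pair of blobs would need its own diff.
pairs = math.comb(blobs, 2)

# Each diff reads both blobs; fingerprinting reads each blob only once.
diff_input_mb = pairs * 2 * blob_size_mb
fingerprint_input_mb = blobs * blob_size_mb

print(pairs)                 # 49995000
print(diff_input_mb)         # 99990000, i.e. roughly 100 TB
print(fingerprint_input_mb)  # 10000, i.e. roughly 10 GB
```

The fingerprints still have to be compared pairwise, but each comparison
touches only two small fingerprints rather than two 1 MB blobs.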
> Anyway, sometime this week I'll find time to play with your code
> myself.
Thanks, I'm looking forward to your comments.
-Geert
* Re: Default remote branch for local branch
From: Josef Weidendorfer @ 2006-04-14 18:26 UTC (permalink / raw)
To: Petr Baudis; +Cc: git
In-Reply-To: <20060414161627.GA27689@pasky.or.cz>
On Friday 14 April 2006 18:16, you wrote:
> Dear diary, on Sun, Apr 02, 2006 at 06:17:29PM CEST, I got a letter
> where Josef Weidendorfer <Josef.Weidendorfer@gmx.de> said that...
> > > I would write the config like this:
> > >
> > > [branch-upstream]
> > > master = linus
> > > ata-irq-pio = irq-pio
> > > ata-pata = pata-drivers
> >
> > That is not working, as said above. But with above syntax extension,
> > with s/=/for/ it would be fine.
>
> I'm sorry but I'm slow and I don't see it - why wouldn't this work?
> (Except that the key name is case insensitive, which isn't too big a
> deal IMHO.)
Hmm...
* IMHO "keys are case insensitive" is enough to disqualify keys as branch
names: currently, branch names are case sensitive, and with the above syntax you
effectively change this rule (you cannot distinguish upstreams for "master"
vs. "MASTER").
* a dot currently seems to be allowed in branch names. For config keys, the
dot separates subkeys.
* I thought it is a convention for config keys to be alphanumeric only,
e.g. "/" isn't allowed either (but it must be allowed in branch names).
Unfortunately, I found nothing about allowed chars for config keys in the
documentation.
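The three objections can be illustrated with a small sketch. It models config
keys the way the points above describe them, which is an assumption and not
taken from the documentation: keys are compared case-insensitively, "."
separates subkeys, and only alphanumerics plus "-" are accepted (the "-" is
needed for the ata-irq-pio example). Both helper names are hypothetical.

```python
def parse_key(key: str) -> tuple[str, ...]:
    """Assumed key model: case-insensitive, with "." separating subkeys."""
    return tuple(part.lower() for part in key.split("."))

def branch_key_problems(branch: str) -> list[str]:
    """List why a branch name would be unsafe as a config key (assumed rules)."""
    problems = []
    if branch != branch.lower():
        problems.append("case is lost")
    if "." in branch:
        problems.append("dot is read as a subkey separator")
    if any(not (c.isalnum() or c in "-.") for c in branch):
        problems.append("contains characters outside alphanumerics and '-'")
    return problems

# "master" and "MASTER" are distinct branches but collapse to one key.
print(parse_key("branch-upstream.master") == parse_key("branch-upstream.MASTER"))
print(branch_key_problems("ata-irq-pio"))  # no problems under these rules
print(branch_key_problems("topic/foo"))    # the "/" is rejected
```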
> I for one think that the 'for'-syntax is insane - it's unreadable (your
> primary query is by far most likely to be "what's the upstream when on
> branch X", not "what branches is this upstream for"), would convolute
> the configuration file syntax unnecessarily and would possibly also
> complicate the git-repo-config interface.
As far as I remember, the "... for ..." syntax was suggested by Linus for the
proxy.command config a long time ago. The original proposal there was to
use a URL as part of the key.
That said,
> Pavel's syntax is much nicer.
... I agree with you here.
My suggestion would be to allow an optional syntax in the config file which
git-repo-config maps to the normalized "... for ..." scheme.
I.e. it should not be mandatory to specify "for ..." after the value of a key.
So instead of
branch.upstream = linus for master
you should be able to say
[branch]
upstream for master = linus
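A sketch of how the optional form could be mapped to the normalized one. This
only illustrates the proposal: it is not how git-repo-config parses anything,
the function name normalize is hypothetical, and it ignores quoting, comments
and values that contain "=".

```python
import re

# Optional form:   upstream for master = linus
_FOR_KEY = re.compile(
    r"^(?P<name>\S+)\s+for\s+(?P<target>.+?)\s*=\s*(?P<value>.+?)\s*$")
# Normalized form: upstream = linus for master
_PLAIN = re.compile(r"^(?P<name>[^=\s]+)\s*=\s*(?P<value>.+?)\s*$")

def normalize(section: str, line: str) -> tuple[str, str]:
    """Return (key, value) in the normalized "... for ..." scheme."""
    line = line.strip()
    m = _FOR_KEY.match(line)
    if m:
        # Move the "for <target>" part from the key side to the value side.
        return f"{section}.{m['name']}", f"{m['value']} for {m['target']}"
    m = _PLAIN.match(line)
    if m:
        return f"{section}.{m['name']}", m["value"]
    raise ValueError(f"cannot parse config line: {line!r}")

print(normalize("branch", "upstream for master = linus"))
print(normalize("branch", "upstream = linus for master"))
```

Both calls yield the key "branch.upstream" with the value "linus for master",
so tools reading the normalized form would not need to know which spelling
was used in the file.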
Josef