git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] contrib: added git-diffall
@ 2012-02-22 22:12 Tim Henigan
  2012-02-22 23:48 ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Tim Henigan @ 2012-02-22 22:12 UTC (permalink / raw)
  To: git, gitster; +Cc: davvid, stefano.lattarini, tim.henigan

This script adds directory diff support to git.  It launches a single
instance of the user-configured external diff tool and performs a
directory diff between the specified revisions. The before/after files
are copied to a tmp directory to do this.  Either 'diff.tool' or
'merge.tool' must be set before running the script.

The existing 'git difftool' command already allows the user to view diffs
using an external tool.  However if multiple files have changed, a
separate instance of the tool is launched for each one.  This can be
tedious when many files are involved.

Signed-off-by: Tim Henigan <tim.henigan@gmail.com>
---

This script has been hosted on GitHub [1] since April 2010. Enough people
have found it useful that I hope it will be considered for inclusion in
the standard git install, either in contrib or as a new core command.

Changes since v1:
    - Changed to #!/bin/sh
    - Eliminated use of 'which' statements
    - Fixed trap function to actually run on abnormal exit
    - Simplified path concatenation logic ($IFS)
    - Corrected indentation errors
    - Improved readability of while loop
    - Cleaned up quoting of variables

This matches commit 5d4b90de3 on GitHub [1].

[1]: https://github.com/thenigan/git-diffall


 contrib/diffall/README      |   18 +++
 contrib/diffall/git-diffall |  255 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 273 insertions(+)
 create mode 100644 contrib/diffall/README
 create mode 100755 contrib/diffall/git-diffall

diff --git a/contrib/diffall/README b/contrib/diffall/README
new file mode 100644
index 0000000..12881d2
--- /dev/null
+++ b/contrib/diffall/README
@@ -0,0 +1,18 @@
+The git-diffall script provides a directory based diff mechanism
+for git.  The script relies on the diff.tool configuration option
+to determine what diff viewer is used.
+
+This script is compatible with all the forms used to specify a
+range of revisions to diff:
+
+  1. git diffall: shows diff between working tree and staged changes
+  2. git diffall --cached [<commit>]: shows diff between staged changes and HEAD (or other named commit)
+  3. git diffall <commit>: shows diff between working tree and named commit
+  4. git diffall <commit> <commit>: show diff between two named commits
+  5. git diffall <commit>..<commit>: same as above
+  6. git diffall <commit>...<commit>: show the changes on the branch containing and up to the second , starting at a common ancestor of both <commit>
+
+Note: all forms take an optional path limiter [--] [<path>*]
+
+This script is based on an example provided by Thomas Rast on the Git list [1]:
+[1] http://thread.gmane.org/gmane.comp.version-control.git/124807
diff --git a/contrib/diffall/git-diffall b/contrib/diffall/git-diffall
new file mode 100755
index 0000000..d942612
--- /dev/null
+++ b/contrib/diffall/git-diffall
@@ -0,0 +1,255 @@
+#!/bin/sh
+# Copyright 2010 - 2012, Tim Henigan <tim.henigan@gmail.com>
+#
+# Perform a directory diff between commits in the repository using
+# the external diff or merge tool specified in the user's config.
+
+USAGE='[--cached] [--copy-back] [-x|--extcmd=<command>] <commit>{0,2} -- <path>*
+
+    --cached     Compare to the index rather than the working tree.
+
+    --copy-back  Copy files back to the working tree when the diff
+                 tool exits (in case they were modified by the
+                 user).  This option is only valid if the diff
+                 compared with the working tree.
+
+    -x=<command>
+    --extcmd=<command>  Specify a custom command for viewing diffs.
+                 git-diffall ignores the configured defaults and
+                 runs $command $LOCAL $REMOTE when this option is
+                 specified. Additionally, $BASE is set in the
+                 environment.
+'
+
+SUBDIRECTORY_OK=1
+. "$(git --exec-path)/git-sh-setup"
+
+TOOL_MODE=diff
+. "$(git --exec-path)/git-mergetool--lib"
+
+merge_tool="$(get_merge_tool)"
+if test -z "$merge_tool"
+then
+	echo "Error: Either the 'diff.tool' or 'merge.tool' option must be set."
+	usage
+fi
+
+start_dir=$(pwd)
+
+# needed to access tar utility
+cdup=$(git rev-parse --show-cdup) &&
+cd "$cdup" || {
+	echo >&2 "Cannot chdir to $cdup, the toplevel of the working tree"
+	exit 1
+}
+
+# mktemp is not available on all platforms (missing from msysgit)
+# Use a hard-coded tmp dir if it is not available
+tmp="$(mktemp -d -t tmp.XXXXXX 2>/dev/null)" || {
+	tmp=/tmp/git-diffall-tmp
+}
+
+trap 'rm -rf "$tmp" 2>/dev/null' EXIT
+mkdir -p "$tmp"
+
+left=
+right=
+paths=
+path_sep=
+compare_staged=
+common_ancestor=
+left_dir=
+right_dir=
+diff_tool=
+copy_back=
+
+while test $# != 0
+do
+	case "$1" in
+	-h|--h|--he|--hel|--help)
+		usage
+		;;
+	--cached)
+		compare_staged=1
+		;;
+	--copy-back)
+		copy_back=1
+		;;
+	-x|--e|--ex|--ext|--extc|--extcm|--extcmd)
+		diff_tool=$2
+		shift
+		;;
+	--)
+		path_sep=1
+		;;
+	-*)
+		echo Invalid option: "$1"
+		usage
+		;;
+	*)
+		# could be commit, commit range or path limiter
+		case "$1" in
+		*...*)
+			left=${1%...*}
+			right=${1#*...}
+			common_ancestor=1
+			;;
+		*..*)
+			left=${1%..*}
+			right=${1#*..}
+			;;
+		*)
+			if test -n "$path_sep"
+			then
+				paths="$paths$1 "
+			elif test -z "$left"
+			then
+				left=$1
+			elif test -z "$right"
+			then
+				right=$1
+			else
+				paths="$paths$1 "
+			fi
+			;;
+		esac
+		;;
+	esac
+	shift
+done
+
+# Determine the set of files which changed
+if test -n "$left" && test -n "$right"
+then
+	left_dir="cmt-$(git rev-parse --short $left)"
+	right_dir="cmt-$(git rev-parse --short $right)"
+
+	if test -n "$compare_staged"
+	then
+		usage
+	elif test -n "$common_ancestor"
+	then
+		git diff --name-only "$left"..."$right" -- $paths > "$tmp/filelist"
+	else
+		git diff --name-only "$left" "$right" -- $paths > "$tmp/filelist"
+	fi
+elif test -n "$left"
+then
+	left_dir="cmt-$(git rev-parse --short $left)"
+
+	if test -n "$compare_staged"
+	then
+		right_dir="staged"
+		git diff --name-only --cached "$left" -- $paths > "$tmp/filelist"
+	else
+		right_dir="working_tree"
+		git diff --name-only "$left" -- $paths > "$tmp/filelist"
+	fi
+else
+	left_dir="HEAD"
+
+	if test -n "$compare_staged"
+	then
+		right_dir="staged"
+		git diff --name-only --cached -- $paths > "$tmp/filelist"
+	else
+		right_dir="working_tree"
+		git diff --name-only -- $paths > "$tmp/filelist"
+	fi
+fi
+
+# Exit immediately if there are no diffs
+if test ! -s "$tmp/filelist"
+then
+	exit 0
+fi
+
+if test -n "$copy_back" && test "$right_dir" != "working_tree"
+then
+	echo "--copy-back is only valid when diff includes the working tree."
+	exit 1
+fi
+
+# Create the named tmp directories that will hold the files to be compared
+mkdir -p "$tmp/$left_dir" "$tmp/$right_dir"
+
+# Populate the tmp/right_dir directory with the files to be compared
+if test -n "$right"
+then
+	while read name
+	do
+		ls_list=$(git ls-tree $right $name)
+		if test -n "$ls_list"
+		then
+			mkdir -p "$tmp/$right_dir/$(dirname "$name")"
+			git show "$right":"$name" > "$tmp/$right_dir/$name" || true
+		fi
+	done < "$tmp/filelist"
+elif test -n "$compare_staged"
+then
+	while read name
+	do
+		ls_list=$(git ls-files -- $name)
+		if test -n "$ls_list"
+		then
+			mkdir -p "$tmp/$right_dir/$(dirname "$name")"
+			git show :"$name" > "$tmp/$right_dir/$name"
+		fi
+	done < "$tmp/filelist"
+else
+	# Mac users have gnutar rather than tar
+	(tar --ignore-failed-read -c -T "$tmp/filelist" | (cd "$tmp/$right_dir" && tar -x)) || {
+		gnutar --ignore-failed-read -c -T "$tmp/filelist" | (cd "$tmp/$right_dir" && gnutar -x)
+	}
+fi
+
+# Populate the tmp/left_dir directory with the files to be compared
+while read name
+do
+	if test -n "$left"
+	then
+		ls_list=$(git ls-tree $left $name)
+		if test -n "$ls_list"
+		then
+			mkdir -p "$tmp/$left_dir/$(dirname "$name")"
+			git show "$left":"$name" > "$tmp/$left_dir/$name" || true
+		fi
+	else
+		if test -n "$compare_staged"
+		then
+			ls_list=$(git ls-tree HEAD $name)
+			if test -n "$ls_list"
+			then
+				mkdir -p "$tmp/$left_dir/$(dirname "$name")"
+				git show HEAD:"$name" > "$tmp/$left_dir/$name"
+			fi
+		else
+			mkdir -p "$tmp/$left_dir/$(dirname "$name")"
+			git show :"$name" > "$tmp/$left_dir/$name"
+		fi
+	fi
+done < "$tmp/filelist"
+
+cd "$tmp"
+LOCAL="$left_dir"
+REMOTE="$right_dir"
+
+if test -n "$diff_tool"
+then
+	export BASE
+	eval $diff_tool '"$LOCAL"' '"$REMOTE"'
+else
+	run_merge_tool "$merge_tool" false
+fi
+
+# Copy files back to the working dir, if requested
+if test -n "$copy_back" && test "$right_dir" = "working_tree"
+then
+	cd "$start_dir"
+	git_top_dir=$(git rev-parse --show-toplevel)
+	find "$tmp/$right_dir" -type f |
+	while read file
+	do
+		cp "$file" "$git_top_dir/${file#$tmp/$right_dir/}"
+	done
+fi
-- 
1.7.9.GIT

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] contrib: added git-diffall
  2012-02-22 22:12 [PATCH v2] contrib: added git-diffall Tim Henigan
@ 2012-02-22 23:48 ` Junio C Hamano
  2012-02-23  9:56   ` Stefano Lattarini
  2012-02-23 16:07   ` Tim Henigan
  0 siblings, 2 replies; 8+ messages in thread
From: Junio C Hamano @ 2012-02-22 23:48 UTC (permalink / raw)
  To: Tim Henigan; +Cc: git, davvid, stefano.lattarini

Tim Henigan <tim.henigan@gmail.com> writes:

> This script adds directory diff support to git.  It launches a single
> instance of the user-configured external diff tool and performs a
> directory diff between the specified revisions. The before/after files
> are copied to a tmp directory to do this.  Either 'diff.tool' or
> 'merge.tool' must be set before running the script.
>
> The existing 'git difftool' command already allows the user to view diffs
> using an external tool.  However if multiple files have changed, a
> separate instance of the tool is launched for each one.  This can be
> tedious when many files are involved.

We encourage our log messages to describe the problem first and then
present solution to the problem, so I would update the above perhaps like
this:

	The 'git difftool' command lets the user to use an external tool
	to view diffs, but it runs the tool for one file at the time. This
	makes it tedious to review a change that spans multiple files.

        The "git-diffall" script instead prepares temporary directories
        with preimage and postimage files, and launches a single instance
        of an external diff tool to view the differences in them.
        diff.tool or merge.tool configuration variable is used to specify
        what external tool is used.

I am wondering if reusing "diff.tool" or "merge.tool" is a good idea,
though.

I guess that it is OK to assume that any external tool that can compare
two directories MUST be able to compare two individual files, and if that
is true, it is perfectly fine to reuse the configuration.  But if an
external tool "frobdiff" that can compare two directories cannot compare
two individual files, it will make it impossible for the user to run "git
difftool" if diff.tool is set to "frobdiff" to use with "diffall".

Another thing that comes to my mind is if a user has an external tool that
can use "diffall", is there ever a situation where the user chooses to use
"difftool" instead, to go files one by one.  I cannot offhand imagine any.

Perhaps a longer term integration plan may be to lift the logic from this
script and make it part of "difftool", and then add a boolean variable
"difftool.<tool>.canCompareDirectory", without adding "git diffall" as a
subcommand.  The user can still run "git difftool", and when the external
tool can compare two directories, the code to populate temporary directory
(and to set the trap to clean after itself) taken from this tool will run
inside "git difftool" frontend and then the external tool to compare the
two directories is spawned.

I also think that in a yet longer term, if this mode of "instantiate two
directories to be compared, and let the external tool do the comparison"
proves useful, almost all the "interesting" work done in this script
should be made unnecessary by adding an updated "external diff interface"
on the core side, so that nobody has to hurt his brain to implement an
error-prone command line parsing logic.

In other words, this statement cannot stay true:

> +This script is compatible with all the forms used to specify a
> +range of revisions to diff:

without updating the script every time underlying "git diff" gains new way
of comparing things.  If we move the "prepare two directories, and point
an external tool at them" logic to the core, we do not have to worry about
it at all.

Besides, I do not think the script covers all the forms; "git diff -R"
support is totally missing.

But that is all two steps in the future.

> +# mktemp is not available on all platforms (missing from msysgit)
> +# Use a hard-coded tmp dir if it is not available
> +tmp="$(mktemp -d -t tmp.XXXXXX 2>/dev/null)" || {
> +	tmp=/tmp/git-diffall-tmp
> +}

It would not withstand malicious attacks, but doing

	tmp=/tmp/git-diffall-tmp.$$

would at least protect you from accidental name crashes better in the
fallback codepath.

> +
> +trap 'rm -rf "$tmp" 2>/dev/null' EXIT

Do you need to suppress errors, especially when you are running "rm -rf"
with the 'f' option?

> +mkdir -p "$tmp"
> +
> +left=
> +right=
> +paths=
> +path_sep=
> +compare_staged=
> +common_ancestor=
> +left_dir=
> +right_dir=
> +diff_tool=
> +copy_back=

You can write multiple assignment on a line to save vertical space if you
want to, and the initialization sequence like this is a good place to do
so.

> +while test $# != 0
> +do
> +	case "$1" in
> +	-h|--h|--he|--hel|--help)
> +		usage
> +		;;
> +	--cached)
> +		compare_staged=1
> +		;;
> +	--copy-back)
> +		copy_back=1
> +		;;
> +	-x|--e|--ex|--ext|--extc|--extcm|--extcmd)
> +		diff_tool=$2
> +		shift
> +		;;

What if your command line ends with -x without $2?

Don't you want to match "-t/--tool" that "difftool" already uses?

> +	--)
> +		path_sep=1
> +		;;
> +	-*)
> +		echo Invalid option: "$1"
> +		usage
> +		;;
> +	*)
> +		# could be commit, commit range or path limiter
> +		case "$1" in
> +		*...*)
> +			left=${1%...*}
> +			right=${1#*...}
> +			common_ancestor=1
> +			;;

Strictly speaking, that is not just a common_ancestor but is a merge_base,
which is a common ancestor none of whose children is a common ancestor.

> +		*..*)
> +			left=${1%..*}
> +			right=${1#*..}
> +			;;
> +		*)
> +			if test -n "$path_sep"
> +			then
> +				paths="$paths$1 "
> +			elif test -z "$left"
> +			then
> +				left=$1
> +			elif test -z "$right"
> +			then
> +				right=$1
> +			else
> +				paths="$paths$1 "
> +			fi
> +			;;
> +		esac

Hrm, so "diffall HEAD~2 Documentation/" is not the way to compare the
contents of the Documentation/ directory between the named commit and
the working tree, like "diff HEAD~2 Documentation/" does.

That is not a show-stopper (a double-dash is an easy workaround), but
it is worth pointing out.

> +		;;
> +	esac
> +	shift
> +done
> +
> +# Determine the set of files which changed
> +if test -n "$left" && test -n "$right"
> +then
> +	left_dir="cmt-$(git rev-parse --short $left)"
> +	right_dir="cmt-$(git rev-parse --short $right)"
> +
> +	if test -n "$compare_staged"
> +	then
> +		usage
> +	elif test -n "$common_ancestor"
> +	then
> +		git diff --name-only "$left"..."$right" -- $paths > "$tmp/filelist"
> +	else
> +		git diff --name-only "$left" "$right" -- $paths > "$tmp/filelist"
> +	fi

And this will not work with pathspec that have $IFS characters.  If we
really wanted to we could do that by properly quoting "$1" when you build
$paths and then use eval when you run "git diff" here (look for 'sq' and
'eval' in existing scripts, e.g. "git-am.sh", if you are interested).

Also you may want to write filelist using -z format to protect yourself
from paths that contain LF, but that would require the loop "while read
name" to be rewritten.

> +# Exit immediately if there are no diffs
> +if test ! -s "$tmp/filelist"
> +then
> +	exit 0
> +fi

Ok, you have trap set already so $tmp will disappear with this exit ;-)

> +if test -n "$copy_back" && test "$right_dir" != "working_tree"
> +then
> +	echo "--copy-back is only valid when diff includes the working tree."
> +	exit 1
> +fi

I actually wondered why $right_dir needs to be populated with a copy in
the first place (if you do not copy but give the working tree itself to
the external tool, you do not even have to copy back).

I know the answer to the question, namely, "because the external tool
thinks files that are not in $left_dir are added files", but if you can
find a way to tell the external tool to ignore new files (similar to how
"diff -r" without -N works), running the tool with temporary left_dir and
the true workdir as right_dir would be a lot cleaner solution to the
problem.

> +# Create the named tmp directories that will hold the files to be compared
> +mkdir -p "$tmp/$left_dir" "$tmp/$right_dir"
> +
> +# Populate the tmp/right_dir directory with the files to be compared
> +if test -n "$right"
> +then
> +	while read name
> +	do
> +		ls_list=$(git ls-tree $right $name)
> +		if test -n "$ls_list"
> +		then
> +			mkdir -p "$tmp/$right_dir/$(dirname "$name")"
> +			git show "$right":"$name" > "$tmp/$right_dir/$name" || true
> +		fi
> +	done < "$tmp/filelist"

"while read -r name" might make this slightly more robust; even though
this loses leading and trailing whitespaces in filenames, we probably
can get away without worrying about them.

> +else
> +	# Mac users have gnutar rather than tar
> +	(tar --ignore-failed-read -c -T "$tmp/filelist" | (cd "$tmp/$right_dir" && tar -x)) || {
> +		gnutar --ignore-failed-read -c -T "$tmp/filelist" | (cd "$tmp/$right_dir" && gnutar -x)
> +	}

What is this "--ignore-failed-read" about?  Not reporting unreadable as an
error smells really bad.

If you require GNUism in your tar usage, this should be made configurable
so that people can use alternative names (some systems come with "tar"
that is POSIX and "gtar" that is GNU).

> +cd "$tmp"
> +LOCAL="$left_dir"
> +REMOTE="$right_dir"
> +
> +if test -n "$diff_tool"
> +then
> +	export BASE
> +	eval $diff_tool '"$LOCAL"' '"$REMOTE"'
> +else
> +	run_merge_tool "$merge_tool" false
> +fi
> +
> +# Copy files back to the working dir, if requested
> +if test -n "$copy_back" && test "$right_dir" = "working_tree"
> +then
> +	cd "$start_dir"
> +	git_top_dir=$(git rev-parse --show-toplevel)
> +	find "$tmp/$right_dir" -type f |
> +	while read file
> +	do
> +		cp "$file" "$git_top_dir/${file#$tmp/$right_dir/}"
> +	done
> +fi

This will copy new files created in $right_dir.  Is that intended?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] contrib: added git-diffall
  2012-02-22 23:48 ` Junio C Hamano
@ 2012-02-23  9:56   ` Stefano Lattarini
  2012-02-23 17:37     ` Junio C Hamano
  2012-02-23 16:07   ` Tim Henigan
  1 sibling, 1 reply; 8+ messages in thread
From: Stefano Lattarini @ 2012-02-23  9:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Tim Henigan, git, davvid

Hello everybody.  Hope you don't mind 2 cents from an outsider ...

On 02/23/2012 12:48 AM, Junio C Hamano wrote:
>
> Tim Henigan <tim.henigan@gmail.com> writes:
> 
>> +# mktemp is not available on all platforms (missing from msysgit)
>> +# Use a hard-coded tmp dir if it is not available
>> +tmp="$(mktemp -d -t tmp.XXXXXX 2>/dev/null)" || {
>> +	tmp=/tmp/git-diffall-tmp
>> +}
> 
> It would not withstand malicious attacks, but doing
> 
> 	tmp=/tmp/git-diffall-tmp.$$
> 
> would at least protect you from accidental name crashes better in the
> fallback codepath.
>
Maybe this would be enough to withstand malicious attacks (even if not
denial-of-service attacks):

  # mktemp is not available on all platforms (missing from msysgit)
   tmp=$(mktemp -d -t tmp.XXXXXX 2>/dev/null) || {
       tmp=/tmp/git-diffall-tmp.$$
       mkdir "$tmp" || fatal "couldn't create temporary directory"
  }

> 
>> +mkdir -p "$tmp"
>
At which point this should be removed, of course.

Regards,
  Stefano

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] contrib: added git-diffall
  2012-02-22 23:48 ` Junio C Hamano
  2012-02-23  9:56   ` Stefano Lattarini
@ 2012-02-23 16:07   ` Tim Henigan
  2012-02-23 19:02     ` Junio C Hamano
  2012-04-10 23:06     ` Matt McClure
  1 sibling, 2 replies; 8+ messages in thread
From: Tim Henigan @ 2012-02-23 16:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, davvid, stefano.lattarini

On Wed, Feb 22, 2012 at 6:48 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Tim Henigan <tim.henigan@gmail.com> writes:
>
> We encourage our log messages to describe the problem first and then
> present solution to the problem, so I would update the above perhaps like
> this:
>
>        The 'git difftool' command lets the user to use an external tool
>        to view diffs, but it runs the tool for one file at the time. This
>        makes it tedious to review a change that spans multiple files.
>
>        The "git-diffall" script instead prepares temporary directories
>        with preimage and postimage files, and launches a single instance
>        of an external diff tool to view the differences in them.
>        diff.tool or merge.tool configuration variable is used to specify
>        what external tool is used.

Understood.  I will update in v3.


> I am wondering if reusing "diff.tool" or "merge.tool" is a good idea,
> though.
>
> I guess that it is OK to assume that any external tool that can compare
> two directories MUST be able to compare two individual files, and if that
> is true, it is perfectly fine to reuse the configuration.  But if an
> external tool "frobdiff" that can compare two directories cannot compare
> two individual files, it will make it impossible for the user to run "git
> difftool" if diff.tool is set to "frobdiff" to use with "diffall".
>
> Another thing that comes to my mind is if a user has an external tool that
> can use "diffall", is there ever a situation where the user chooses to use
> "difftool" instead, to go files one by one.  I cannot offhand imagine any.

It was my assumption that any tool that supports directory diff also
supports individual file diff.  It seems like a strict subset of
directory diff case.


> Perhaps a longer term integration plan may be to lift the logic from this
>
...snip...
>
> But that is all two steps in the future.

I hope that this feature finds it way into the existing core commands.
 This script is intended to be a conversation starter that is also
immediately useful as a separate command.  Would it be better to begin
the long-term discussion now and skip adding this to contrib?


>> +# mktemp is not available on all platforms (missing from msysgit)
>> +# Use a hard-coded tmp dir if it is not available
>> +tmp="$(mktemp -d -t tmp.XXXXXX 2>/dev/null)" || {
>> +     tmp=/tmp/git-diffall-tmp
>> +}
>
> It would not withstand malicious attacks, but doing
>
>        tmp=/tmp/git-diffall-tmp.$$
>
> would at least protect you from accidental name crashes better in the
> fallback codepath.

This makes sense.  I will add a unique portion to the name of the tmp dir in v3.


>> +trap 'rm -rf "$tmp" 2>/dev/null' EXIT
>
> Do you need to suppress errors, especially when you are running "rm -rf"
> with the 'f' option?

On msysgit, I found that "rm -rf $tmp" consistently fails due with a
permission error.  I don't understand why the script that created the
tmp dir is not allowed to delete it.  I am still looking into it, but
so far it appears to be an idiosyncrasy of msysgit.


>> +left=
>> +right=
>> +paths=
>> +path_sep=
>> +compare_staged=
>> +common_ancestor=
>> +left_dir=
>> +right_dir=
>> +diff_tool=
>> +copy_back=
>
> You can write multiple assignment on a line to save vertical space if you
> want to, and the initialization sequence like this is a good place to do
> so.

My personal preference is to keep them on separate lines.  However if
the compressed style is preferred, I will change it.


>> +     -x|--e|--ex|--ext|--extc|--extcm|--extcmd)
>> +             diff_tool=$2
>> +             shift
>> +             ;;
>
> What if your command line ends with -x without $2?

Currently it results in a shift error with no useful message to the
user.  I will add something for this in v3.


> Don't you want to match "-t/--tool" that "difftool" already uses?

Are you suggesting that I a) change "-x/--extcmd" to "-t/--tool" or
that b) I add support for the "difftool -t/--tool" option?

If "a", I was reusing the "difftool --extcmd" option which has the
same behavior.  If "b", I will look into it.


>> +             # could be commit, commit range or path limiter
>> +             case "$1" in
>> +             *...*)
>> +                     left=${1%...*}
>> +                     right=${1#*...}
>> +                     common_ancestor=1
>> +                     ;;
>
> Strictly speaking, that is not just a common_ancestor but is a merge_base,
> which is a common ancestor none of whose children is a common ancestor.

Understood.  I will change the name to "merge_base" in v3.


>> +             *..*)
>> +                     left=${1%..*}
>> +                     right=${1#*..}
>> +                     ;;
>> +             *)
>> +                     if test -n "$path_sep"
>> +                     then
>> +                             paths="$paths$1 "
>> +                     elif test -z "$left"
>> +                     then
>> +                             left=$1
>> +                     elif test -z "$right"
>> +                     then
>> +                             right=$1
>> +                     else
>> +                             paths="$paths$1 "
>> +                     fi
>> +                     ;;
>> +             esac
>
> Hrm, so "diffall HEAD~2 Documentation/" is not the way to compare the
> contents of the Documentation/ directory between the named commit and
> the working tree, like "diff HEAD~2 Documentation/" does.
>
> That is not a show-stopper (a double-dash is an easy workaround), but
> it is worth pointing out.

So I would need something to determine if a string represents a
commit/tag/branch or a path?.  I presume it would need to handle the
corner case where a branch/tag and path have the same name.  Is there
anything like this in the mergetool lib scripts today?


>> +     then
>> +             git diff --name-only "$left"..."$right" -- $paths > "$tmp/filelist"
>> +     else
>> +             git diff --name-only "$left" "$right" -- $paths > "$tmp/filelist"
>> +     fi
>
> And this will not work with pathspec that have $IFS characters.  If we
> really wanted to we could do that by properly quoting "$1" when you build
> $paths and then use eval when you run "git diff" here (look for 'sq' and
> 'eval' in existing scripts, e.g. "git-am.sh", if you are interested).
>
> Also you may want to write filelist using -z format to protect yourself
> from paths that contain LF, but that would require the loop "while read
> name" to be rewritten.

I just discovered that the script fails to handle files that have
spaces in their name, so some further work is needed.


>> +# Exit immediately if there are no diffs
>> +if test ! -s "$tmp/filelist"
>> +then
>> +     exit 0
>> +fi
>
> Ok, you have trap set already so $tmp will disappear with this exit ;-)

I'll try not to be such a slow learner in the future...but no guarantees.


>> +if test -n "$copy_back" && test "$right_dir" != "working_tree"
>> +then
>> +     echo "--copy-back is only valid when diff includes the working tree."
>> +     exit 1
>> +fi
>
> I actually wondered why $right_dir needs to be populated with a copy in
> the first place (if you do not copy but give the working tree itself to
> the external tool, you do not even have to copy back).
>
> I know the answer to the question, namely, "because the external tool
> thinks files that are not in $left_dir are added files", but if you can
> find a way to tell the external tool to ignore new files (similar to how
> "diff -r" without -N works), running the tool with temporary left_dir and
> the true workdir as right_dir would be a lot cleaner solution to the
> problem.

I'll note this as "for future consideration".  I spent some time
trying this is the original implementation, but could not find a
workable solution in the time I had available.


>> +     while read name
>> +     do
>> +             ls_list=$(git ls-tree $right $name)
>> +             if test -n "$ls_list"
>> +             then
>> +                     mkdir -p "$tmp/$right_dir/$(dirname "$name")"
>> +                     git show "$right":"$name" > "$tmp/$right_dir/$name" || true
>> +             fi
>> +     done < "$tmp/filelist"
>
> "while read -r name" might make this slightly more robust; even though
> this loses leading and trailing whitespaces in filenames, we probably
> can get away without worrying about them.
>
>> +else
>> +     # Mac users have gnutar rather than tar
>> +     (tar --ignore-failed-read -c -T "$tmp/filelist" | (cd "$tmp/$right_dir" && tar -x)) || {
>> +             gnutar --ignore-failed-read -c -T "$tmp/filelist" | (cd "$tmp/$right_dir" && gnutar -x)
>> +     }
>
> What is this "--ignore-failed-read" about?  Not reporting unreadable as an
> error smells really bad.

If a file was added or deleted between the two commits being compared,
tar would fail because a file was missing from "$tmp/filelist".  The
"--ignore-failed-read" prevents tar from halting the script in this
case.


> If you require GNUism in your tar usage, this should be made configurable
> so that people can use alternative names (some systems come with "tar"
> that is POSIX and "gtar" that is GNU).

Is there an example showing how this could be configurable?  The
problem is that the "--ignore-failed-read" was not supported in all
flavors of tar.


>> +# Copy files back to the working dir, if requested
>> +if test -n "$copy_back" && test "$right_dir" = "working_tree"
>> +then
>> +     cd "$start_dir"
>> +     git_top_dir=$(git rev-parse --show-toplevel)
>> +     find "$tmp/$right_dir" -type f |
>> +     while read file
>> +     do
>> +             cp "$file" "$git_top_dir/${file#$tmp/$right_dir/}"
>> +     done
>> +fi
>
> This will copy new files created in $right_dir.  Is that intended?

hmmm...that was not intended.  If would be odd for the user to create
new files in this tmp directory, but if the diff tool automatically
generates any files then this could result in unwanted files.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] contrib: added git-diffall
  2012-02-23  9:56   ` Stefano Lattarini
@ 2012-02-23 17:37     ` Junio C Hamano
  0 siblings, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2012-02-23 17:37 UTC (permalink / raw)
  To: Stefano Lattarini; +Cc: Tim Henigan, git, davvid

Stefano Lattarini <stefano.lattarini@gmail.com> writes:

>>> +# mktemp is not available on all platforms (missing from msysgit)
>>> +# Use a hard-coded tmp dir if it is not available
>>> +tmp="$(mktemp -d -t tmp.XXXXXX 2>/dev/null)" || {
>>> +	tmp=/tmp/git-diffall-tmp
>>> +}
>>  ...
>   # mktemp is not available on all platforms (missing from msysgit)
>    tmp=$(mktemp -d -t tmp.XXXXXX 2>/dev/null) || {
>        tmp=/tmp/git-diffall-tmp.$$
>        mkdir "$tmp" || fatal "couldn't create temporary directory"
>   }
>
>>> +mkdir -p "$tmp"
>>
> At which point this should be removed, of course.

Good eyes; thanks.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] contrib: added git-diffall
  2012-02-23 16:07   ` Tim Henigan
@ 2012-02-23 19:02     ` Junio C Hamano
  2012-04-10 23:06     ` Matt McClure
  1 sibling, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2012-02-23 19:02 UTC (permalink / raw)
  To: Tim Henigan; +Cc: git, davvid, stefano.lattarini

Tim Henigan <tim.henigan@gmail.com> writes:

> On Wed, Feb 22, 2012 at 6:48 PM, Junio C Hamano <gitster@pobox.com> wrote:
>
>> Another thing that comes to my mind is if a user has an external tool that
>> can use "diffall", is there ever a situation where the user chooses to use
>> "difftool" instead, to go files one by one. I cannot offhand imagine any.
>
> It was my assumption that any tool that supports directory diff also
> supports individual file diff.  It seems like a strict subset of
> directory diff case.
> ...
>> Perhaps a longer term integration plan may be to lift the logic from this
>>
> ...snip...
>>
>> But that is all two steps in the future.
>
> I hope that this feature finds it way into the existing core commands.
> This script is intended to be a conversation starter that is also
> immediately useful as a separate command.  Would it be better to begin
> the long-term discussion now and skip adding this to contrib?

I would envision we have this in contrib/ first, without even fixing the
whitespace-in-pathspec and whitespace-or-lf-in-paths issues I pointed out
in my review, and let people play with it.

My crystal ball tells an optimist in me that we will see people (you and
others) try to fix issues they hit in their real life use cases, and the
script will be improved while it is still in contrib/.  And then somebody
who has worked on difftool will step up and roll it into difftool proper,
along the lines I hinted in the message you are responding to, at which
point the script will be removed from contrib/.  That is the first step in
the future.

The second step in the future may or may not come.  It will involve adding
an updated external diff interface on the core side and would prepare the
two temporary directories before the core side calls the difftool among
other things, and when that happens, we can lose most of the code in this
script that the first step in the future may have ported into difftool.

>> You can write multiple assignment on a line to save vertical space if you
>> want to, and the initialization sequence like this is a good place to do
>> so.
> My personal preference is to keep them on separate lines.  However if
> the compressed style is preferred, I will change it.

"I wouldn't bother" was what I meant by "if you want to".

>>> +     -x|--e|--ex|--ext|--extc|--extcm|--extcmd)
> ...
>> Don't you want to match "-t/--tool" that "difftool" already uses?
>
> Are you suggesting that I a) change "-x/--extcmd" to "-t/--tool" or
> that b) I add support for the "difftool -t/--tool" option?

No, I just misread the set of options "difftool" takes, without realizing
that --extcmd and --tool are two different things, and it is correct to
call your option "--extcmd" if it specifies what corresponds to what
"difftool --extcmd" specifies.  Sorry for the confusion.

>> Hrm, so "diffall HEAD~2 Documentation/" is not the way to compare the
>> contents of the Documentation/ directory between the named commit and
>> the working tree, like "diff HEAD~2 Documentation/" does.
>>
>> That is not a show-stopper (a double-dash is an easy workaround), but
>> it is worth pointing out.
>
> So I would need something to determine if a string represents a
> commit/tag/branch or a path?.

"It does not have to be fixed in the first version" (aka "for future
consideration") was what I meant by "not a show-stopper".

>> And this will not work with pathspec that have $IFS characters.  If we
>> really wanted to we could do that by properly quoting "$1" when you build
>> $paths and then use eval when you run "git diff" here (look for 'sq' and
>> 'eval' in existing scripts, e.g. "git-am.sh", if you are interested).
>>
>> Also you may want to write filelist using -z format to protect yourself
>> from paths that contain LF, but that would require the loop "while read
>> name" to be rewritten.
>
> I just discovered that the script fails to handle files that have
> spaces in their name, so some further work is needed.

Again, "It does not have to be fixed in the first version" (aka "for
future consideration") was what I meant by "If we really wanted to".

>> What is this "--ignore-failed-read" about? Not reporting unreadable as an
>> error smells really bad.
>
> If a file was added or deleted between the two commits being compared,
> tar would fail because a file was missing from "$tmp/filelist".  The
> "--ignore-failed-read" prevents tar from halting the script in this
> case.

But it will also ignore errors coming from other causes, no?  Wouldn't we
rather see an error if tar fails to read from a path that *has to* exist
in the working tree because "diff" said it does?

Again, it is just "for future consideration".

>> If you require GNUism in your tar usage, this should be made configurable
>> so that people can use alternative names (some systems come with "tar"
>> that is POSIX and "gtar" that is GNU).
>
> Is there an example showing how this could be configurable?  The
> problem is that the "--ignore-failed-read" was not supported in all
> flavors of tar.

Grep for "$TAR" and also @@DIFF@@ in the Makefile, and add substitution
for @@TAR@@ in cmd_munge_script, perhaps?

By the way, I actually have an even more radical suggestion that may let
you get rid of most of the lines in your script.

If you tweak the usage so that "diffall" specific options *MUST* come
first, e.g.

	USAGE=[--copy-back] [-x <cmd>] <arguments for diff>

then you can parse your argument partially, i.e.

	copy_back= extcmd=
	while case "$#,$1" in 0,*) break ;; *,-*) ;; *) break ;; esac
        do
		case "$1" in
                --copy-back)
                	copy_back=true
			;;
                -x | --extcmd)
			test $# != 1 || usage
			extcmd=$2
                        shift
			;;
		*)
                	break
                        ;;
		esac
                shift
	done

then feed the remainder all to "diff", e.g.

	diff --raw --no-abbrev "$@" |

And then you can prepare two temporary index files and stuff the output in
them, by having something like this on the downstream of the pipe:

	while read -r lmode rmode lsha1 rsha1 status path
        do
        	if test "$lmode" != $null_mode
		then
                	GIT_INDEX_FILE=$tmp.left_index \
                        git update-index --add --cacheinfo $lmode $lsha1 $path
		fi
        	if test "$rmode" != $null_mode
		then
                	GIT_INDEX_FILE=$tmp.right_index \
                        git update-index --add --cacheinfo $rmode $rsha1 $path
		fi
	done

Side Note:
	In the production version, you would probably give the "-z" option
	to "diff", and write this loop in Perl so that you can cope better
	with funny characters in the path.  Instead of running two
	instances of "git update-index" for every path, the loop would
	group the entries for left and right side, and drive one instance
	of "git update-index --index-info" each to populate the two index
	files.

	Also the above needs to be adjusted to deal with the side that
	represents the working tree files; they are reported with $null_sha1
	so in such a case instead of putting it in the temporary index,
        you would copy the working tree file to the temporary location.

After you prepare these two temporary index files, you can then use them
to populate your left_dir and right_dir like this:

	GIT_DIR=$(git rev-parse --git-dir) \
	GIT_WORK_TREE=$left_dir \
        GIT_INDEX_FILE=$tmp.left_index \
        git checkout-index -a

With this, you do not have to worry about anything about the funny
combinations of where the two "directories" comes from when preparing
the temporary directories to be compared.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] contrib: added git-diffall
  2012-02-23 16:07   ` Tim Henigan
  2012-02-23 19:02     ` Junio C Hamano
@ 2012-04-10 23:06     ` Matt McClure
  2012-04-11  8:38       ` David Aguilar
  1 sibling, 1 reply; 8+ messages in thread
From: Matt McClure @ 2012-04-10 23:06 UTC (permalink / raw)
  To: git

Tim Henigan <tim.henigan <at> gmail.com> writes:

> >> +     do
> >> +             cp "$file" "$git_top_dir/${file#$tmp/$right_dir/}"
> >> +     done
> >> +fi
> >
> > This will copy new files created in $right_dir.  Is that intended?
> 
> hmmm...that was not intended.  If would be odd for the user to create
> new files in this tmp directory, but if the diff tool automatically
> generates any files then this could result in unwanted files.

I think more generally, I would prefer if either side of the comparison is the
working copy that the temp directory on that side be populated with symlinks.

A particularly bad failure mode of the copy-back approach is:

git diffall --copy-back
# while my diffall tool is running, I edit the file somewhere else.
# quit my diffall tool
# --> my edits in the other tool are overwritten by diffall

Editing the files in place via symlinks would resolve that.

Matt

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] contrib: added git-diffall
  2012-04-10 23:06     ` Matt McClure
@ 2012-04-11  8:38       ` David Aguilar
  0 siblings, 0 replies; 8+ messages in thread
From: David Aguilar @ 2012-04-11  8:38 UTC (permalink / raw)
  To: Matt McClure; +Cc: git

On Tue, Apr 10, 2012 at 4:06 PM, Matt McClure <matthewlmcclure@gmail.com> wrote:
> Tim Henigan <tim.henigan <at> gmail.com> writes:
>
>> >> +     do
>> >> +             cp "$file" "$git_top_dir/${file#$tmp/$right_dir/}"
>> >> +     done
>> >> +fi
>> >
>> > This will copy new files created in $right_dir.  Is that intended?
>>
>> hmmm...that was not intended.  If would be odd for the user to create
>> new files in this tmp directory, but if the diff tool automatically
>> generates any files then this could result in unwanted files.
>
> I think more generally, I would prefer if either side of the comparison is the
> working copy that the temp directory on that side be populated with symlinks.
>
> A particularly bad failure mode of the copy-back approach is:
>
> git diffall --copy-back
> # while my diffall tool is running, I edit the file somewhere else.
> # quit my diffall tool
> # --> my edits in the other tool are overwritten by diffall
>
> Editing the files in place via symlinks would resolve that.

I had a similar idea but didn't mention it because Windows came to
mind.  I always want to say, "darn it, this code would be so much
easier if we could just ignore Windows", but that's not very helpful.

I'd be happy with a runtime platform check where the copy back is only
done on Windows.  Everyone else can enjoy symlinks.

Reading between the lines that could be interpreted as, "well, that
copy back code is no good and *we* don't want to use it, but it's okay
for Windows users", which is slightly dangerous because we'd always be
running the symlink code path and wouldn't hit problems with the other
path.

So I'm torn.  I think symlinks are a great idea, but Windows drives us
towards the less-than-ideal solution.  I want the best solution
possible.  Do we just accept that the copy-back code is simply the
cost of supporting Windows and keep both code paths around?  I would
not be opposed to that if the result is a more robust user experience.
-- 
David

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2012-04-11  8:38 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-02-22 22:12 [PATCH v2] contrib: added git-diffall Tim Henigan
2012-02-22 23:48 ` Junio C Hamano
2012-02-23  9:56   ` Stefano Lattarini
2012-02-23 17:37     ` Junio C Hamano
2012-02-23 16:07   ` Tim Henigan
2012-02-23 19:02     ` Junio C Hamano
2012-04-10 23:06     ` Matt McClure
2012-04-11  8:38       ` David Aguilar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).