git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Bug report : bad filter-branch (OSX only)
@ 2015-04-26  9:25 Olivier ROLAND
  2015-04-28  5:55 ` Jeff King
  2015-04-29 14:42 ` Roberto Tyley
  0 siblings, 2 replies; 12+ messages in thread
From: Olivier ROLAND @ 2015-04-26  9:25 UTC (permalink / raw)
  To: git

Hello,

Seem to be a bug.

OSX 10.10.3 git 2.3.6 HFS+ case-sensitive

How to reproduce :
Step 1 : git clone https://github.com/begeric/FastParsers.git
Step 2 : cd FastParsers/
Step 3 : git filter-branch --env-filter 'if [ 0 = 1 ]; then echo 0; fi' -- --all

Result on OSX :
Rewrite 65df7c5ac1ed956252b07b8c911ad7eba0a15c2b (206/206)
Ref 'refs/heads/experiment' was rewritten
Ref 'refs/remotes/origin/experiment' was rewritten
WARNING: Ref 'refs/remotes/origin/experiment' is unchanged
Ref 'refs/remotes/origin/master' was rewritten

Result on Debian :
Rewrite 65df7c5ac1ed956252b07b8c911ad7eba0a15c2b (206/206)
WARNING: Ref 'refs/heads/experiment' is unchanged
WARNING: Ref 'refs/remotes/origin/experiment' is unchanged
WARNING: Ref 'refs/remotes/origin/experiment' is unchanged
WARNING: Ref 'refs/remotes/origin/master' is unchanged

Do you have any thoughts on this ?

Thanks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Bug report : bad filter-branch (OSX only)
  2015-04-26  9:25 Bug report : bad filter-branch (OSX only) Olivier ROLAND
@ 2015-04-28  5:55 ` Jeff King
  2015-04-28 11:02   ` Olivier ROLAND
  2015-04-29 14:42 ` Roberto Tyley
  1 sibling, 1 reply; 12+ messages in thread
From: Jeff King @ 2015-04-28  5:55 UTC (permalink / raw)
  To: Olivier ROLAND; +Cc: git

On Sun, Apr 26, 2015 at 11:25:52AM +0200, Olivier ROLAND wrote:

> OSX 10.10.3 git 2.3.6 HFS+ case-sensitive
> 
> How to reproduce :
> Step 1 : git clone https://github.com/begeric/FastParsers.git
> Step 2 : cd FastParsers/
> Step 3 : git filter-branch --env-filter 'if [ 0 = 1 ]; then echo 0; fi' -- --all
> 
> Result on OSX :
> Rewrite 65df7c5ac1ed956252b07b8c911ad7eba0a15c2b (206/206)
> Ref 'refs/heads/experiment' was rewritten
> Ref 'refs/remotes/origin/experiment' was rewritten
> WARNING: Ref 'refs/remotes/origin/experiment' is unchanged
> Ref 'refs/remotes/origin/master' was rewritten
> 
> Result on Debian :
> Rewrite 65df7c5ac1ed956252b07b8c911ad7eba0a15c2b (206/206)
> WARNING: Ref 'refs/heads/experiment' is unchanged
> WARNING: Ref 'refs/remotes/origin/experiment' is unchanged
> WARNING: Ref 'refs/remotes/origin/experiment' is unchanged
> WARNING: Ref 'refs/remotes/origin/master' is unchanged
> 
> Do you have any thoughts on this ?

Weird. Did you build both versions of git from source (that is, there's
no question that the OS X one is a hacked-up Apple git or something)?

Presumably it's some incompatibility in the shells used. What does:

  head -1 "$(git --exec-path)/git-filter-branch"

say about the shell in use on each system? Does running that shell with
"--version" report anything useful?

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Bug report : bad filter-branch (OSX only)
  2015-04-28  5:55 ` Jeff King
@ 2015-04-28 11:02   ` Olivier ROLAND
  2015-04-29  4:39     ` Jeff King
  0 siblings, 1 reply; 12+ messages in thread
From: Olivier ROLAND @ 2015-04-28 11:02 UTC (permalink / raw)
  Cc: git

2015-04-28 7:55 GMT+02:00 Jeff King <peff@peff.net>:
> On Sun, Apr 26, 2015 at 11:25:52AM +0200, Olivier ROLAND wrote:
>
>> OSX 10.10.3 git 2.3.6 HFS+ case-sensitive
>>
>> How to reproduce :
>> Step 1 : git clone https://github.com/begeric/FastParsers.git
>> Step 2 : cd FastParsers/
>> Step 3 : git filter-branch --env-filter 'if [ 0 = 1 ]; then echo 0; fi' -- --all
>>
>> Result on OSX :
>> Rewrite 65df7c5ac1ed956252b07b8c911ad7eba0a15c2b (206/206)
>> Ref 'refs/heads/experiment' was rewritten
>> Ref 'refs/remotes/origin/experiment' was rewritten
>> WARNING: Ref 'refs/remotes/origin/experiment' is unchanged
>> Ref 'refs/remotes/origin/master' was rewritten
>>
>> Result on Debian :
>> Rewrite 65df7c5ac1ed956252b07b8c911ad7eba0a15c2b (206/206)
>> WARNING: Ref 'refs/heads/experiment' is unchanged
>> WARNING: Ref 'refs/remotes/origin/experiment' is unchanged
>> WARNING: Ref 'refs/remotes/origin/experiment' is unchanged
>> WARNING: Ref 'refs/remotes/origin/master' is unchanged
>>
>> Do you have any thoughts on this ?
>
> Weird. Did you build both versions of git from source (that is, there's
> no question that the OS X one is a hacked-up Apple git or something)?
>
> Presumably it's some incompatibility in the shells used. What does:
>
>   head -1 "$(git --exec-path)/git-filter-branch"
>
> say about the shell in use on each system? Does running that shell with
> "--version" report anything useful?
>
> -Peff

Hi,

Both versions are builded from source.
head -1 "$(git --exec-path)/git-filter-branch"
#!/bin/sh

sh --version
GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin14)
Copyright (C) 2007 Free Software Foundation, Inc.

/bin/bash --version
GNU bash, version 4.1.5(1)-release (x86_64-pc-linux-gnu)

The bug seem really git related.

Thanks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Bug report : bad filter-branch (OSX only)
  2015-04-28 11:02   ` Olivier ROLAND
@ 2015-04-29  4:39     ` Jeff King
  2015-04-29  4:56       ` Jeff King
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2015-04-29  4:39 UTC (permalink / raw)
  To: Olivier ROLAND; +Cc: git

On Tue, Apr 28, 2015 at 01:02:17PM +0200, Olivier ROLAND wrote:

> Both versions are builded from source.
> head -1 "$(git --exec-path)/git-filter-branch"
> #!/bin/sh
> 
> sh --version
> GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin14)
> Copyright (C) 2007 Free Software Foundation, Inc.
> 
> /bin/bash --version
> GNU bash, version 4.1.5(1)-release (x86_64-pc-linux-gnu)
> 
> The bug seem really git related.

Yes, but I guessed it might be part of the filter-branch shell script
that behaves differently under two different shells (i.e., that we used
some unportable construct). However, I built bash 3.2.57 on my Linux box
and could not replicate the problem.

The other "usual" thing that causes bugs to show up on OS X but not
Linux is case-folding. But you said you are using a case-sensitive
filesystem, so it's probably not that.

So I can't figure out how to replicate the problem here.

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Bug report : bad filter-branch (OSX only)
  2015-04-29  4:39     ` Jeff King
@ 2015-04-29  4:56       ` Jeff King
  2015-04-29  5:39         ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2015-04-29  4:56 UTC (permalink / raw)
  To: Olivier ROLAND; +Cc: git

On Wed, Apr 29, 2015 at 12:39:47AM -0400, Jeff King wrote:

> So I can't figure out how to replicate the problem here.

Actually, that's not quite true. I could get hold of an OS X system to
replicate, which I just did.

The problem is that commit 3b754f212 does not have a newline at the end
of its commit message, and the OS X version of sed doesn't preserve
that.

Here's a much smaller reproduction recipe:

  git init
  echo content >file
  git add file
  tree=$(git write-tree)
  commit=$(printf 'no newline' | git commit-tree $tree)
  git update-ref HEAD $commit
  git filter-branch

On my Linux system, this results in an unchanged history, but on OS X,
the commit is rewritten to have a newline at the end of the commit
message.

The culprit is this line from git-filter-branch:

        sed -e '1,/^$/d' <../commit | \
                eval "$filter_msg" > ../message ||
                        die "msg filter failed: $filter_msg"

The "sed" command silently appends an extra newline to the final line of
the message.  You can see the sed behavior more directly with:

  printf foo | sed -ne 1p

which adds a newline on OS X, but not when using GNU sed on Linux. It
looks like OS X has just BSD sed, so the same behavior probably happens
on FreeBSD and elsewhere.

I'm not sure of a solution short of replacing the use of sed here with
something else. perl would be a simple choice, but filter-branch does
not otherwise depend on it. We could use a shell "read" loop, but those
are quite slow (and filter-branch is slow enough as it is!).

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Bug report : bad filter-branch (OSX only)
  2015-04-29  4:56       ` Jeff King
@ 2015-04-29  5:39         ` Junio C Hamano
  2015-04-29 15:48           ` Jeff King
  0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2015-04-29  5:39 UTC (permalink / raw)
  To: Jeff King; +Cc: Olivier ROLAND, git

Jeff King <peff@peff.net> writes:

> I'm not sure of a solution short of replacing the use of sed here with
> something else. perl would be a simple choice, but filter-branch does
> not otherwise depend on it. We could use a shell "read" loop, but those
> are quite slow (and filter-branch is slow enough as it is!).

You need to only skip the header part, right?
I would imagine that

(
	while read x && test -n "$x"
        do
        	:;
	done
	cat
) <../commit | eval "$filter_msg"

would not spin too much in shell loop, perhaps?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Bug report : bad filter-branch (OSX only)
  2015-04-26  9:25 Bug report : bad filter-branch (OSX only) Olivier ROLAND
  2015-04-28  5:55 ` Jeff King
@ 2015-04-29 14:42 ` Roberto Tyley
  1 sibling, 0 replies; 12+ messages in thread
From: Roberto Tyley @ 2015-04-29 14:42 UTC (permalink / raw)
  To: Olivier ROLAND; +Cc: git@vger.kernel.org

As an aside, if you're a Scala dev
(https://github.com/begeric/FastParsers is a scala library), you might
find it fun to play with https://rtyley.github.io/bfg-repo-cleaner/ -
you could probably write some scala (eg a custom BFG
CommitNodeCleaner) that would do whatever it is you want filter-branch
to do.

Roberto

On 26 April 2015 at 10:25, Olivier ROLAND <cyrus-dev@edla.org> wrote:
> Hello,
>
> Seem to be a bug.
>
> OSX 10.10.3 git 2.3.6 HFS+ case-sensitive
>
> How to reproduce :
> Step 1 : git clone https://github.com/begeric/FastParsers.git
> Step 2 : cd FastParsers/
> Step 3 : git filter-branch --env-filter 'if [ 0 = 1 ]; then echo 0; fi' -- --all
>
> Result on OSX :
> Rewrite 65df7c5ac1ed956252b07b8c911ad7eba0a15c2b (206/206)
> Ref 'refs/heads/experiment' was rewritten
> Ref 'refs/remotes/origin/experiment' was rewritten
> WARNING: Ref 'refs/remotes/origin/experiment' is unchanged
> Ref 'refs/remotes/origin/master' was rewritten
>
> Result on Debian :
> Rewrite 65df7c5ac1ed956252b07b8c911ad7eba0a15c2b (206/206)
> WARNING: Ref 'refs/heads/experiment' is unchanged
> WARNING: Ref 'refs/remotes/origin/experiment' is unchanged
> WARNING: Ref 'refs/remotes/origin/experiment' is unchanged
> WARNING: Ref 'refs/remotes/origin/master' is unchanged
>
> Do you have any thoughts on this ?
>
> Thanks.
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Bug report : bad filter-branch (OSX only)
  2015-04-29  5:39         ` Junio C Hamano
@ 2015-04-29 15:48           ` Jeff King
  2015-04-29 16:30             ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2015-04-29 15:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Olivier ROLAND, git

On Tue, Apr 28, 2015 at 10:39:44PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > I'm not sure of a solution short of replacing the use of sed here with
> > something else. perl would be a simple choice, but filter-branch does
> > not otherwise depend on it. We could use a shell "read" loop, but those
> > are quite slow (and filter-branch is slow enough as it is!).
> 
> You need to only skip the header part, right?
> I would imagine that
> 
> (
> 	while read x && test -n "$x"
>         do
>         	:;
> 	done
> 	cat
> ) <../commit | eval "$filter_msg"
> 
> would not spin too much in shell loop, perhaps?

Yeah, that is not too bad. Probably we want "read -r", just in case of
weirdness in the header lines (and that's in POSIX, and we use it
in other scripts, so it should be portable enough). And we can save a
subshell if we don't mind the potential variable-name conflict.

Here's what I came up with.

-- >8 --
Subject: filter-branch: avoid passing commit message through sed

On some systems (like OS X), if sed encounters input without
a trailing newline, it will silently add it. As a result,
"git filter-branch" on such systems may silently rewrite
commit messages that omit a trailing newline. Even though
this is not something we generate ourselves with "git
commit", it's better for filter-branch to preserve the
original data as closely as possible.

We're using sed here only to strip the header fields from
the commit object. We can accomplish the same thing with a
shell loop. Since shell "read" calls are slow (usually one
syscall per byte), we use "cat" once we've skipped past the
header. Depending on the size of your commit messages, this
is probably faster (you pay the cost to fork, but then read
the data in saner-sized chunks). This idea is shamelessly
stolen from Junio.

Signed-off-by: Jeff King <peff@peff.net>
---
I confirmed the test fixes things on the OS X box I have access to.

The "probably faster" above is of course hand-waving. On my system
starting "cat" takes only about 40 syscalls, so that would naively imply
it's a win in for all but the shortest messages. But of course "fork()"
is a much more expensive syscall than "read()".

 git-filter-branch.sh     | 10 +++++++++-
 t/t7003-filter-branch.sh | 10 ++++++++++
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index e6e99f5..5b3f63d 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -346,7 +346,15 @@ while read commit parents; do
 				die "parent filter failed: $filter_parent"
 	fi
 
-	sed -e '1,/^$/d' <../commit | \
+	{
+		while read -r header_line && test -n "$header_line"
+		do
+			# skip header lines...
+			:;
+		done
+		# and output the actual commit message
+		cat
+	} <../commit |
 		eval "$filter_msg" > ../message ||
 			die "msg filter failed: $filter_msg"
 	workdir=$workdir @SHELL_PATH@ -c "$filter_commit" "git commit-tree" \
diff --git a/t/t7003-filter-branch.sh b/t/t7003-filter-branch.sh
index 66643e4..855afda 100755
--- a/t/t7003-filter-branch.sh
+++ b/t/t7003-filter-branch.sh
@@ -394,4 +394,14 @@ test_expect_success 'replace submodule revision' '
 	test $orig_head != `git show-ref --hash --head HEAD`
 '
 
+test_expect_success 'filter commit message without trailing newline' '
+	git reset --hard original &&
+	commit=$(printf "no newline" | git commit-tree HEAD^{tree}) &&
+	git update-ref refs/heads/no-newline $commit &&
+	git filter-branch -f refs/heads/no-newline &&
+	echo $commit >expect &&
+	git rev-parse refs/heads/no-newline >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.4.0.rc3.477.gc25258d

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: Bug report : bad filter-branch (OSX only)
  2015-04-29 15:48           ` Jeff King
@ 2015-04-29 16:30             ` Junio C Hamano
  2015-04-29 16:43               ` Jeff King
  2015-04-29 18:17               ` Johannes Sixt
  0 siblings, 2 replies; 12+ messages in thread
From: Junio C Hamano @ 2015-04-29 16:30 UTC (permalink / raw)
  To: Jeff King; +Cc: Olivier ROLAND, git

Jeff King <peff@peff.net> writes:

> On Tue, Apr 28, 2015 at 10:39:44PM -0700, Junio C Hamano wrote:
>
>> Jeff King <peff@peff.net> writes:
>> 
>> > I'm not sure of a solution short of replacing the use of sed here with
>> > something else. perl would be a simple choice, but filter-branch does
>> > not otherwise depend on it. We could use a shell "read" loop, but those
>> > are quite slow (and filter-branch is slow enough as it is!).
>> 
>> You need to only skip the header part, right?
>> I would imagine that
>> 
>> (
>> 	while read x && test -n "$x"
>>         do
>>         	:;
>> 	done
>> 	cat
>> ) <../commit | eval "$filter_msg"
>> 
>> would not spin too much in shell loop, perhaps?
>
> Yeah, that is not too bad. Probably we want "read -r", just in case of
> weirdness in the header lines (and that's in POSIX, and we use it
> in other scripts, so it should be portable enough). And we can save a
> subshell if we don't mind the potential variable-name conflict.

As all we care about is "have we hit an empty line", I do not think "-r"
really matters, but it would not hurt.

As to s/()/{}/, please tell me what I am doing wrong.  I am getting
the same process IDs from all of the $$s and the only difference
seems to be variable clobbering.

-- >8 --
#!/bin/sh

cat >/var/tmp/tester <<EOF || exit
a
b

c
d
EOF


x=foo
echo "My id is $$"
(
	echo "inside paren $$"
	while read x && test -n "$x"
	do
		:;
	done
	cat
) </var/tmp/tester
echo "x=<$x>"

x=foo
{
	echo "inside brace $$"
	while read x && test -n "$x"
	do
		:;
	done
	cat
} </var/tmp/tester
echo "x=<$x>"

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Bug report : bad filter-branch (OSX only)
  2015-04-29 16:30             ` Junio C Hamano
@ 2015-04-29 16:43               ` Jeff King
  2015-04-29 17:13                 ` Junio C Hamano
  2015-04-29 18:17               ` Johannes Sixt
  1 sibling, 1 reply; 12+ messages in thread
From: Jeff King @ 2015-04-29 16:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Olivier ROLAND, git

On Wed, Apr 29, 2015 at 09:30:00AM -0700, Junio C Hamano wrote:

> >> (
> >> 	while read x && test -n "$x"
> >>         do
> >>         	:;
> >> 	done
> >> 	cat
> >> ) <../commit | eval "$filter_msg"
> >> 
> >> would not spin too much in shell loop, perhaps?
> >
> > Yeah, that is not too bad. Probably we want "read -r", just in case of
> > weirdness in the header lines (and that's in POSIX, and we use it
> > in other scripts, so it should be portable enough). And we can save a
> > subshell if we don't mind the potential variable-name conflict.
> 
> As all we care about is "have we hit an empty line", I do not think "-r"
> really matters, but it would not hurt.

I think something like:

  author ...
  committer ...
  encoding foo\

  this is the real commit message

would behave incorrectly without "-r". I would be shocked if that ever
happens in real life, but I think it doesn't hurt to be careful.

> As to s/()/{}/, please tell me what I am doing wrong.  I am getting
> the same process IDs from all of the $$s and the only difference
> seems to be variable clobbering.

$$ is always the pid of the main shell process, even in a subshell. If
your shell is bash, it provides $BASHPID which can tell the difference
(if you put $BASHPID in your test script, it does show that we fork for
the subshell).

On Linux, you can also test with "strace -fce clone". Interestingly,
dash produces one fewer fork than bash on your test script, but I didn't
track down the exact difference. But I can imagine a shell that is smart
enough to realize a fork is not required in this instance.

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Bug report : bad filter-branch (OSX only)
  2015-04-29 16:43               ` Jeff King
@ 2015-04-29 17:13                 ` Junio C Hamano
  0 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2015-04-29 17:13 UTC (permalink / raw)
  To: Jeff King; +Cc: Olivier ROLAND, git

Jeff King <peff@peff.net> writes:

> ... would behave incorrectly without "-r". I would be shocked if that ever
> happens in real life, but I think it doesn't hurt to be careful.

Ahh, OK.  I overlooked the continued-line possibility.

> $$ is always the pid of the main shell process,...

Thanks for straightening me out---I was too lazy to run strace ;-)

> ... But I can imagine a shell that is smart
> enough to realize a fork is not required in this instance.

Yup.  I think that is a natural thing to optimize for implementors
of shells.

Thanks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Bug report : bad filter-branch (OSX only)
  2015-04-29 16:30             ` Junio C Hamano
  2015-04-29 16:43               ` Jeff King
@ 2015-04-29 18:17               ` Johannes Sixt
  1 sibling, 0 replies; 12+ messages in thread
From: Johannes Sixt @ 2015-04-29 18:17 UTC (permalink / raw)
  To: Junio C Hamano, Jeff King; +Cc: Olivier ROLAND, git

Am 29.04.2015 um 18:30 schrieb Junio C Hamano:
> As to s/()/{}/, please tell me what I am doing wrong.  I am getting
> the same process IDs from all of the $$s and the only difference
> seems to be variable clobbering.

The clobbered variable should be irrelevant for our use-case because it 
occurs only in the upstream of a pipeline, which is required to run in a 
sub-shell.

-- Hannes

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2015-04-29 18:17 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-04-26  9:25 Bug report : bad filter-branch (OSX only) Olivier ROLAND
2015-04-28  5:55 ` Jeff King
2015-04-28 11:02   ` Olivier ROLAND
2015-04-29  4:39     ` Jeff King
2015-04-29  4:56       ` Jeff King
2015-04-29  5:39         ` Junio C Hamano
2015-04-29 15:48           ` Jeff King
2015-04-29 16:30             ` Junio C Hamano
2015-04-29 16:43               ` Jeff King
2015-04-29 17:13                 ` Junio C Hamano
2015-04-29 18:17               ` Johannes Sixt
2015-04-29 14:42 ` Roberto Tyley

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).