* [PATCH] rebase -p: avoid grep on potentailly non-ASCII data
@ 2016-03-08 7:59 Anders Kaseorg
2016-03-08 12:25 ` Torsten Bögershausen
0 siblings, 1 reply; 12+ messages in thread
From: Anders Kaseorg @ 2016-03-08 7:59 UTC (permalink / raw)
To: gitster; +Cc: git
The included test case, which uses rebase -p with non-ASCII commit
messages, was failing as follows:
Warning: the command isn't recognized in the following line:
- Binary file (standard input) matches
You can fix this with 'git rebase --edit-todo'.
Or you can abort the rebase with 'git rebase --abort'.
Possibly related to recent GNU grep changes, as with commit
316336379cf7937c2ecf122c7197cfe5da6b2061. Avoid the issue by using sed
instead.
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
---
git-rebase--interactive.sh | 2 +-
t/t3409-rebase-preserve-merges.sh | 21 +++++++++++++++++++++
2 files changed, 22 insertions(+), 1 deletion(-)
diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
index c0cfe88..0efc65c 100644
--- a/git-rebase--interactive.sh
+++ b/git-rebase--interactive.sh
@@ -1241,7 +1241,7 @@ then
# be rebasing on top of it
git rev-list --parents -1 $rev | cut -d' ' -s -f2 > "$dropped"/$rev
sha1=$(git rev-list -1 $rev)
- sane_grep -v "^[a-z][a-z]* $sha1" <"$todo" > "${todo}2" ; mv "${todo}2" "$todo"
+ sed "/^[a-z][a-z]* $sha1/d" <"$todo" > "${todo}2" ; mv "${todo}2" "$todo"
rm "$rewritten"/$rev
fi
done
diff --git a/t/t3409-rebase-preserve-merges.sh b/t/t3409-rebase-preserve-merges.sh
index 8c251c5..1f01b29 100755
--- a/t/t3409-rebase-preserve-merges.sh
+++ b/t/t3409-rebase-preserve-merges.sh
@@ -119,4 +119,25 @@ test_expect_success 'rebase -p ignores merge.log config' '
)
'
+test_expect_success 'rebase -p works with non-ASCII commit message' '
+ (
+ mkdir non-ascii &&
+ cd non-ascii &&
+ git init &&
+ echo a > a &&
+ git add a &&
+ git commit -m a &&
+ echo b > b &&
+ git add b &&
+ git commit -m b &&
+ git branch foo &&
+ git reset --hard HEAD^ &&
+ git cherry-pick -x foo &&
+ echo c > c &&
+ git add c &&
+ git commit -m "$(printf "I \\342\\231\\245 Unicode")" &&
+ git rebase -p foo
+ )
+'
+
test_done
--
2.8.0.rc0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH] rebase -p: avoid grep on potentailly non-ASCII data
2016-03-08 7:59 [PATCH] rebase -p: avoid grep on potentailly non-ASCII data Anders Kaseorg
@ 2016-03-08 12:25 ` Torsten Bögershausen
2016-03-08 13:45 ` Michael J Gruber
0 siblings, 1 reply; 12+ messages in thread
From: Torsten Bögershausen @ 2016-03-08 12:25 UTC (permalink / raw)
To: Anders Kaseorg, gitster; +Cc: git
On 03/08/2016 08:59 AM, Anders Kaseorg wrote:
> The included test case, which uses rebase -p with non-ASCII commit
> messages, was failing as follows:
>
> Warning: the command isn't recognized in the following line:
> - Binary file (standard input) matches
>
> You can fix this with 'git rebase --edit-todo'.
> Or you can abort the rebase with 'git rebase --abort'.
>
> Possibly related to recent GNU grep changes, as with commit
> 316336379cf7937c2ecf122c7197cfe5da6b2061. Avoid the issue by using sed
> instead.
>
> Signed-off-by: Anders Kaseorg <andersk@mit.edu>
> ---
> git-rebase--interactive.sh | 2 +-
> t/t3409-rebase-preserve-merges.sh | 21 +++++++++++++++++++++
> 2 files changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
> index c0cfe88..0efc65c 100644
> --- a/git-rebase--interactive.sh
> +++ b/git-rebase--interactive.sh
> @@ -1241,7 +1241,7 @@ then
> # be rebasing on top of it
> git rev-list --parents -1 $rev | cut -d' ' -s -f2 > "$dropped"/$rev
> sha1=$(git rev-list -1 $rev)
> - sane_grep -v "^[a-z][a-z]* $sha1" <"$todo" > "${todo}2" ; mv "${todo}2" "$todo"
> + sed "/^[a-z][a-z]* $sha1/d" <"$todo" > "${todo}2" ; mv "${todo}2" "$todo"
> rm "$rewritten"/$rev
> fi
> done
> diff --git a/t/t3409-rebase-preserve-merges.sh b/t/t3409-rebase-preserve-merges.sh
> index 8c251c5..1f01b29 100755
> --- a/t/t3409-rebase-preserve-merges.sh
> +++ b/t/t3409-rebase-preserve-merges.sh
> @@ -119,4 +119,25 @@ test_expect_success 'rebase -p ignores merge.log config' '
> )
> '
>
> +test_expect_success 'rebase -p works with non-ASCII commit message' '
> + (
> + mkdir non-ascii &&
#The cd should be done in a subshell:
(
> + cd non-ascii &&
> + git init &&
> + echo a > a &&
> + git add a &&
> + git commit -m a &&
> + echo b > b &&
#Style: No space after ">" (and even above and below)
echo b >b
> + git add b &&
> + git commit -m b &&
> + git branch foo &&
> + git reset --hard HEAD^ &&
> + git cherry-pick -x foo &&
> + echo c > c &&
> + git add c &&
> + git commit -m "$(printf "I \\342\\231\\245 Unicode")" &&
> + git rebase -p foo
> + )
> +
#end of subshell
)
> '
> +
> test_done
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] rebase -p: avoid grep on potentailly non-ASCII data
2016-03-08 12:25 ` Torsten Bögershausen
@ 2016-03-08 13:45 ` Michael J Gruber
2016-03-08 14:35 ` Jeff King
0 siblings, 1 reply; 12+ messages in thread
From: Michael J Gruber @ 2016-03-08 13:45 UTC (permalink / raw)
To: Torsten Bögershausen, Anders Kaseorg, gitster; +Cc: git
Torsten Bögershausen venit, vidit, dixit 08.03.2016 13:25:
> On 03/08/2016 08:59 AM, Anders Kaseorg wrote:
>> The included test case, which uses rebase -p with non-ASCII commit
>> messages, was failing as follows:
>>
>> Warning: the command isn't recognized in the following line:
>> - Binary file (standard input) matches
>>
>> You can fix this with 'git rebase --edit-todo'.
>> Or you can abort the rebase with 'git rebase --abort'.
>>
>> Possibly related to recent GNU grep changes, as with commit
>> 316336379cf7937c2ecf122c7197cfe5da6b2061. Avoid the issue by using sed
>> instead.
>>
>> Signed-off-by: Anders Kaseorg <andersk@mit.edu>
>> ---
>> git-rebase--interactive.sh | 2 +-
>> t/t3409-rebase-preserve-merges.sh | 21 +++++++++++++++++++++
>> 2 files changed, 22 insertions(+), 1 deletion(-)
>>
>> diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
>> index c0cfe88..0efc65c 100644
>> --- a/git-rebase--interactive.sh
>> +++ b/git-rebase--interactive.sh
>> @@ -1241,7 +1241,7 @@ then
>> # be rebasing on top of it
>> git rev-list --parents -1 $rev | cut -d' ' -s -f2 > "$dropped"/$rev
>> sha1=$(git rev-list -1 $rev)
>> - sane_grep -v "^[a-z][a-z]* $sha1" <"$todo" > "${todo}2" ; mv "${todo}2" "$todo"
>> + sed "/^[a-z][a-z]* $sha1/d" <"$todo" > "${todo}2" ; mv "${todo}2" "$todo"
>> rm "$rewritten"/$rev
>> fi
>> done
>> diff --git a/t/t3409-rebase-preserve-merges.sh b/t/t3409-rebase-preserve-merges.sh
>> index 8c251c5..1f01b29 100755
>> --- a/t/t3409-rebase-preserve-merges.sh
>> +++ b/t/t3409-rebase-preserve-merges.sh
>> @@ -119,4 +119,25 @@ test_expect_success 'rebase -p ignores merge.log config' '
>> )
>> '
>>
>> +test_expect_success 'rebase -p works with non-ASCII commit message' '
>> + (
>> + mkdir non-ascii &&
> #The cd should be done in a subshell:
> (
>> + cd non-ascii &&
>> + git init &&
>> + echo a > a &&
>> + git add a &&
>> + git commit -m a &&
>> + echo b > b &&
> #Style: No space after ">" (and even above and below)
And also on the sane_grep/sed line.
>
> echo b >b
>
>
>> + git add b &&
>> + git commit -m b &&
>> + git branch foo &&
>> + git reset --hard HEAD^ &&
>> + git cherry-pick -x foo &&
>> + echo c > c &&
>> + git add c &&
>> + git commit -m "$(printf "I \\342\\231\\245 Unicode")" &&
>> + git rebase -p foo
>> + )
>> +
> #end of subshell
> )
The whole test is in a subshell already, although that is easy to miss
(missing indentation).
>> '
>> +
>> test_done
>
It may be worth noting whether other occurrences of "sane_grep" are safe
from binary input.
After all, one my question the degree of sanity of our sane_grep, or
whether we need asane_grep and bisane_grep in our shell library - "make
our grep sane again".
Michael
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] rebase -p: avoid grep on potentailly non-ASCII data
2016-03-08 13:45 ` Michael J Gruber
@ 2016-03-08 14:35 ` Jeff King
2016-03-08 23:20 ` Junio C Hamano
0 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2016-03-08 14:35 UTC (permalink / raw)
To: Michael J Gruber; +Cc: Torsten Bögershausen, Anders Kaseorg, gitster, git
On Tue, Mar 08, 2016 at 02:45:20PM +0100, Michael J Gruber wrote:
> It may be worth noting whether other occurrences of "sane_grep" are safe
> from binary input.
>
> After all, one my question the degree of sanity of our sane_grep, or
> whether we need asane_grep and bisane_grep in our shell library - "make
> our grep sane again".
I actually wonder if we should have a build-time knob to put "grep -a"
into sane_grep(). We do not ever plan to feed it binary data, so that
will do what, provided the system grep handles "-a". And on those that
do not know about "-a", one imagines that they do not suffer from this
problem in the first place (which is really limited to recent versions
of GNU grep).
-Peff
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] rebase -p: avoid grep on potentailly non-ASCII data
2016-03-08 14:35 ` Jeff King
@ 2016-03-08 23:20 ` Junio C Hamano
2016-03-08 23:36 ` Junio C Hamano
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Junio C Hamano @ 2016-03-08 23:20 UTC (permalink / raw)
To: Jeff King
Cc: Michael J Gruber, Torsten Bögershausen, Anders Kaseorg, git
Jeff King <peff@peff.net> writes:
> I actually wonder if we should have a build-time knob to put "grep -a"
> into sane_grep(). We do not ever plan to feed it binary data, so that
> will do what, provided the system grep handles "-a". And on those that
> do not know about "-a", one imagines that they do not suffer from this
> problem in the first place (which is really limited to recent versions
> of GNU grep).
Something along this line, you mean? I'll leave it as a
low-hanging-fruit to add autoconf support ;-)
Makefile | 4 ++++
git-sh-setup.sh | 4 ++--
2 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/Makefile b/Makefile
index 24bef8d..1187498 100644
--- a/Makefile
+++ b/Makefile
@@ -264,6 +264,9 @@ all::
#
# Define NO_TCLTK if you do not want Tcl/Tk GUI.
#
+# Define SANE_TEXT_GREP to "-a" if you use recent versions of GNU grep
+# and egrep that are picker when their input contains non-ASCII data.
+#
# The TCL_PATH variable governs the location of the Tcl interpreter
# used to optimize git-gui for your system. Only used if NO_TCLTK
# is not set. Defaults to the bare 'tclsh'.
@@ -1752,6 +1755,7 @@ sed -e '1s|#!.*/sh|#!$(SHELL_PATH_SQ)|' \
-e $(BROKEN_PATH_FIX) \
-e 's|@@GITWEBDIR@@|$(gitwebdir_SQ)|g' \
-e 's|@@PERL@@|$(PERL_PATH_SQ)|g' \
+ -e 's|@@SANE_TEXT_GREP@@|$(SANE_TEXT_GREP)|g' \
$@.sh >$@+
endef
diff --git a/git-sh-setup.sh b/git-sh-setup.sh
index 4691fbc..c48139a 100644
--- a/git-sh-setup.sh
+++ b/git-sh-setup.sh
@@ -168,11 +168,11 @@ git_pager() {
}
sane_grep () {
- GREP_OPTIONS= LC_ALL=C grep "$@"
+ GREP_OPTIONS= LC_ALL=C grep @@SANE_TEXT_GREP@@ "$@"
}
sane_egrep () {
- GREP_OPTIONS= LC_ALL=C egrep "$@"
+ GREP_OPTIONS= LC_ALL=C egrep @@SANE_TEXT_GREP@@ "$@"
}
is_bare_repository () {
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH] rebase -p: avoid grep on potentailly non-ASCII data
2016-03-08 23:20 ` Junio C Hamano
@ 2016-03-08 23:36 ` Junio C Hamano
2016-03-09 0:11 ` Jeff King
2016-03-09 0:10 ` Jeff King
2016-03-09 2:31 ` Anders Kaseorg
2 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2016-03-08 23:36 UTC (permalink / raw)
To: git; +Cc: Michael J Gruber, Torsten Bögershausen, Anders Kaseorg,
Jeff King
Subject: rebase-i: clarify "is this commit relevant" test
While I was checking all the call sites of sane_grep and sane_egrep,
I noticed this one is somewhat strangely written. The lines in the
file sane_grep works on all begin with 40-hex object name, so there
is no real risk of confusing "test $(...) = ''" by finding something
that begins with a dash, but using the status from sane_grep makes
it a lot clearer what is going on.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
* By the way, if we are going to take that @@SANE_TEXT_GREP@@
patch, we'd need to add $(SANE_TEXT_GREP) to SCRIPT_DEFINES
in Makefile to force git-sh-setup to be regenerated when its
setting changes.
git-rebase--interactive.sh | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
index c0cfe88..4cde685 100644
--- a/git-rebase--interactive.sh
+++ b/git-rebase--interactive.sh
@@ -1233,7 +1233,8 @@ then
git rev-list $revisions |
while read rev
do
- if test -f "$rewritten"/$rev && test "$(sane_grep "$rev" "$state_dir"/not-cherry-picks)" = ""
+ if test -f "$rewritten"/$rev &&
+ ! sane_grep "$rev" "$state_dir"/not-cherry-picks >/dev/null
then
# Use -f2 because if rev-list is telling us this commit is
# not worthwhile, we don't want to track its multiple heads,
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH] rebase -p: avoid grep on potentailly non-ASCII data
2016-03-08 23:36 ` Junio C Hamano
@ 2016-03-09 0:11 ` Jeff King
0 siblings, 0 replies; 12+ messages in thread
From: Jeff King @ 2016-03-09 0:11 UTC (permalink / raw)
To: Junio C Hamano
Cc: git, Michael J Gruber, Torsten Bögershausen, Anders Kaseorg
On Tue, Mar 08, 2016 at 03:36:26PM -0800, Junio C Hamano wrote:
> diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
> index c0cfe88..4cde685 100644
> --- a/git-rebase--interactive.sh
> +++ b/git-rebase--interactive.sh
> @@ -1233,7 +1233,8 @@ then
> git rev-list $revisions |
> while read rev
> do
> - if test -f "$rewritten"/$rev && test "$(sane_grep "$rev" "$state_dir"/not-cherry-picks)" = ""
> + if test -f "$rewritten"/$rev &&
> + ! sane_grep "$rev" "$state_dir"/not-cherry-picks >/dev/null
Looks better. I think "-q" might be nicer still (and more efficient,
though it almost certainly doesn't matter in practice).
-Peff
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] rebase -p: avoid grep on potentailly non-ASCII data
2016-03-08 23:20 ` Junio C Hamano
2016-03-08 23:36 ` Junio C Hamano
@ 2016-03-09 0:10 ` Jeff King
2016-03-09 2:31 ` Anders Kaseorg
2 siblings, 0 replies; 12+ messages in thread
From: Jeff King @ 2016-03-09 0:10 UTC (permalink / raw)
To: Junio C Hamano
Cc: Michael J Gruber, Torsten Bögershausen, Anders Kaseorg, git
On Tue, Mar 08, 2016 at 03:20:26PM -0800, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> > I actually wonder if we should have a build-time knob to put "grep -a"
> > into sane_grep(). We do not ever plan to feed it binary data, so that
> > will do what, provided the system grep handles "-a". And on those that
> > do not know about "-a", one imagines that they do not suffer from this
> > problem in the first place (which is really limited to recent versions
> > of GNU grep).
>
> Something along this line, you mean? I'll leave it as a
> low-hanging-fruit to add autoconf support ;-)
Yeah, though I think I would probably squash in:
diff --git a/config.mak.uname b/config.mak.uname
index 723f632..15557c3 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -37,6 +37,7 @@ ifeq ($(uname_S),Linux)
HAVE_CLOCK_GETTIME = YesPlease
HAVE_CLOCK_MONOTONIC = YesPlease
HAVE_GETDELIM = YesPlease
+ SANE_TEXT_GREP = -a
endif
ifeq ($(uname_S),GNU/kFreeBSD)
HAVE_ALLOCA_H = YesPlease
It's not necessary on all Linux platforms yet, but it doesn't hurt, so
we can err on the side of including it (and I think we can assume all
Linux systems have GNU grep or equivalent).
-Peff
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH] rebase -p: avoid grep on potentailly non-ASCII data
2016-03-08 23:20 ` Junio C Hamano
2016-03-08 23:36 ` Junio C Hamano
2016-03-09 0:10 ` Jeff King
@ 2016-03-09 2:31 ` Anders Kaseorg
2016-03-09 20:26 ` Junio C Hamano
2 siblings, 1 reply; 12+ messages in thread
From: Anders Kaseorg @ 2016-03-09 2:31 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
The autoconf support you committed as 67f1790a has a small bug (the else
cause should omit -a):
+if grep -a ascii configure.ac >/dev/null; then
+ AC_MSG_RESULT([Using 'grep -a' for sane_grep])
+ SANE_TEXT_GREP=-a
+else
+ SANE_TEXT_GREP=-a
+fi
+GIT_CONF_SUBST([SANE_TEXT_GREP])
Anders
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] rebase -p: avoid grep on potentailly non-ASCII data
2016-03-09 2:31 ` Anders Kaseorg
@ 2016-03-09 20:26 ` Junio C Hamano
2016-03-10 7:42 ` Torsten Bögershausen
0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2016-03-09 20:26 UTC (permalink / raw)
To: Anders Kaseorg; +Cc: git
Anders Kaseorg <andersk@mit.edu> writes:
> The autoconf support you committed as 67f1790a has a small bug (the else
> cause should omit -a):
Thanks. To summarize the discussion, here is what I ended up with.
-- >8 --
Subject: [PATCH] sane_grep: pass "-a" if grep accepts it
Newer versions of GNU grep is reported to be pickier when we feed a
non-ASCII input and break some Porcelain scripts. As we know we do
not feed random binary file to our own sane_grep wrapper, allow us
to always pass "-a" by setting SANE_TEXT_GREP=-a Makefile variable
to work it around.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
Makefile | 6 +++++-
config.mak.uname | 1 +
configure.ac | 7 +++++++
git-sh-setup.sh | 4 ++--
4 files changed, 15 insertions(+), 3 deletions(-)
diff --git a/Makefile b/Makefile
index 37e2d9e..62d759d 100644
--- a/Makefile
+++ b/Makefile
@@ -266,6 +266,9 @@ all::
#
# Define NO_TCLTK if you do not want Tcl/Tk GUI.
#
+# Define SANE_TEXT_GREP to "-a" if you use recent versions of GNU grep
+# and egrep that are picker when their input contains non-ASCII data.
+#
# The TCL_PATH variable governs the location of the Tcl interpreter
# used to optimize git-gui for your system. Only used if NO_TCLTK
# is not set. Defaults to the bare 'tclsh'.
@@ -1728,7 +1731,7 @@ common-cmds.h: $(wildcard Documentation/git-*.txt)
SCRIPT_DEFINES = $(SHELL_PATH_SQ):$(DIFF_SQ):$(GIT_VERSION):\
$(localedir_SQ):$(NO_CURL):$(USE_GETTEXT_SCHEME):$(SANE_TOOL_PATH_SQ):\
- $(gitwebdir_SQ):$(PERL_PATH_SQ)
+ $(gitwebdir_SQ):$(PERL_PATH_SQ):$(SANE_TEXT_GREP)
define cmd_munge_script
$(RM) $@ $@+ && \
sed -e '1s|#!.*/sh|#!$(SHELL_PATH_SQ)|' \
@@ -1740,6 +1743,7 @@ sed -e '1s|#!.*/sh|#!$(SHELL_PATH_SQ)|' \
-e $(BROKEN_PATH_FIX) \
-e 's|@@GITWEBDIR@@|$(gitwebdir_SQ)|g' \
-e 's|@@PERL@@|$(PERL_PATH_SQ)|g' \
+ -e 's|@@SANE_TEXT_GREP@@|$(SANE_TEXT_GREP)|g' \
$@.sh >$@+
endef
diff --git a/config.mak.uname b/config.mak.uname
index 943c439..a706474 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -37,6 +37,7 @@ ifeq ($(uname_S),Linux)
HAVE_CLOCK_GETTIME = YesPlease
HAVE_CLOCK_MONOTONIC = YesPlease
HAVE_GETDELIM = YesPlease
+ SANE_TEXT_GREP=-a
endif
ifeq ($(uname_S),GNU/kFreeBSD)
HAVE_ALLOCA_H = YesPlease
diff --git a/configure.ac b/configure.ac
index 1f55009..6fd7b8e 100644
--- a/configure.ac
+++ b/configure.ac
@@ -471,6 +471,13 @@ if test -n "$ASCIIDOC"; then
esac
fi
+if grep -a ascii configure.ac >/dev/null; then
+ AC_MSG_RESULT([Using 'grep -a' for sane_grep])
+ SANE_TEXT_GREP=-a
+else
+ SANE_TEXT_GREP=
+fi
+GIT_CONF_SUBST([SANE_TEXT_GREP])
## Checks for libraries.
AC_MSG_NOTICE([CHECKS for libraries])
diff --git a/git-sh-setup.sh b/git-sh-setup.sh
index 4691fbc..c48139a 100644
--- a/git-sh-setup.sh
+++ b/git-sh-setup.sh
@@ -168,11 +168,11 @@ git_pager() {
}
sane_grep () {
- GREP_OPTIONS= LC_ALL=C grep "$@"
+ GREP_OPTIONS= LC_ALL=C grep @@SANE_TEXT_GREP@@ "$@"
}
sane_egrep () {
- GREP_OPTIONS= LC_ALL=C egrep "$@"
+ GREP_OPTIONS= LC_ALL=C egrep @@SANE_TEXT_GREP@@ "$@"
}
is_bare_repository () {
--
2.8.0-rc1-141-gbaa22e3
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH] rebase -p: avoid grep on potentailly non-ASCII data
2016-03-09 20:26 ` Junio C Hamano
@ 2016-03-10 7:42 ` Torsten Bögershausen
2016-03-10 17:22 ` Junio C Hamano
0 siblings, 1 reply; 12+ messages in thread
From: Torsten Bögershausen @ 2016-03-10 7:42 UTC (permalink / raw)
To: Junio C Hamano, Anders Kaseorg; +Cc: git
On 09.03.16 21:26, Junio C Hamano wrote:
> Anders Kaseorg <andersk@mit.edu> writes:
[]
> sane_grep () {
> - GREP_OPTIONS= LC_ALL=C grep "$@"
> + GREP_OPTIONS= LC_ALL=C grep @@SANE_TEXT_GREP@@ "$@"
> }
>
> sane_egrep () {
> - GREP_OPTIONS= LC_ALL=C egrep "$@"
> + GREP_OPTIONS= LC_ALL=C egrep @@SANE_TEXT_GREP@@ "$@"
> }
>
I always wondered why we do LC_ALL=C.
Isn't that begging for trouble, when we feed UTF-8, ISO-8895-1
or other stuff into a program and say LC_ALL=C at the same time ?
On my Debian Linux system I have
LANG=en_US.UTF-8
and
$ locale -a
C
C.UTF-8
en_US.utf8
POSIX
--------------
Mac OS has LANG unset, and reports
locale -a
en_US
en_US.ISO8859-1
en_US.ISO8859-15
en_US.US-ASCII
en_US.UTF-8
#(and a lot more )
C
POSIX
-----
My Centos has
LANG=en_US.UTF-8
and reports e.g.
en_US
en_US.iso88591
en_US.iso885915
en_US.utf8
(And many more)
In t0204 we have
LANGUAGE=is LC_ALL="$is_IS_locale" git init repo >actual &&
which is based on
# is_IS.UTF-8 on Solaris and FreeBSD, is_IS.utf8 on Debian
is_IS_locale=$(locale -a 2>/dev/null |
in
lib-gettext.sh
Is there something we can steal here ?
http://pubs.opengroup.org/onlinepubs/007908799/xbd/envvar.html
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] rebase -p: avoid grep on potentailly non-ASCII data
2016-03-10 7:42 ` Torsten Bögershausen
@ 2016-03-10 17:22 ` Junio C Hamano
0 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2016-03-10 17:22 UTC (permalink / raw)
To: Torsten Bögershausen; +Cc: Anders Kaseorg, git
Torsten Bögershausen <tboegi@web.de> writes:
> On 09.03.16 21:26, Junio C Hamano wrote:
>> Anders Kaseorg <andersk@mit.edu> writes:
> []
>> sane_grep () {
>> - GREP_OPTIONS= LC_ALL=C grep "$@"
>> + GREP_OPTIONS= LC_ALL=C grep @@SANE_TEXT_GREP@@ "$@"
>> }
>>
>> sane_egrep () {
>> - GREP_OPTIONS= LC_ALL=C egrep "$@"
>> + GREP_OPTIONS= LC_ALL=C egrep @@SANE_TEXT_GREP@@ "$@"
>> }
>>
>
> I always wondered why we do LC_ALL=C.
It is because, at least back when we started scripted Porcelains of
Git, that is the only thing we could count on available everywhere,
and more importantly, that is how you disable all the i18n gotchas
and ask tools to use the traditional "a file stores a stream of
bytes" semantics.
> Isn't that begging for trouble, when we feed UTF-8, ISO-8895-1
> or other stuff into a program and say LC_ALL=C at the same time ?
Yes and no. It is true that in "C" ("POSIX"), behaviour on anything
outside the portable and control character sets is undefined, so in
theory, it is bad that we relied on that undefined behaviour to be
to treat the input as just a stream of bytes. But at the same time,
it is no worse than letting the user's locale take effect or use
hardcoded en_US.UTF-8 and passing unknown end user data that could
be in latin1 and other stuff. And sensible implementors of "C"
locale seemed to choose the "stream of bytes" back when fa9d3485
(git-am: force egrep to use correct characters set, 2009-09-25) was
written, which is where LC_ALL=C comes from. I wasn't around when
that patch was accepted, so I cannot quite tell if it was done to
fix a reported bug (i.e. it would reintroduce the bug if we lost
LC_ALL=C) or it was done "just because".
Back when e1622bfc (Protect scripted Porcelains from GREP_OPTIONS
insanity, 2009-11-23) introduced sane_grep, the only caller of it
that wanted to use LC_ALL=C was this callsite in "git am", and we no
longer have that callsite since 783d7e86 (builtin-am: remove
redirection to git-am.sh, 2015-08-04), so I think whatever fa9d3485
wanted to do will not be broken if we removed LC_ALL=C from
sane_grep. It however is possible that any callsites of sane_grep
added after e1622bfc implicitly depended on having LC_ALL=C in the
implementation.
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2016-03-10 17:22 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-03-08 7:59 [PATCH] rebase -p: avoid grep on potentailly non-ASCII data Anders Kaseorg
2016-03-08 12:25 ` Torsten Bögershausen
2016-03-08 13:45 ` Michael J Gruber
2016-03-08 14:35 ` Jeff King
2016-03-08 23:20 ` Junio C Hamano
2016-03-08 23:36 ` Junio C Hamano
2016-03-09 0:11 ` Jeff King
2016-03-09 0:10 ` Jeff King
2016-03-09 2:31 ` Anders Kaseorg
2016-03-09 20:26 ` Junio C Hamano
2016-03-10 7:42 ` Torsten Bögershausen
2016-03-10 17:22 ` Junio C Hamano
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).