From: Jonathan Nieder <jrnieder@gmail.com>
To: Richard MICHAEL <rmichael@leadformance.com>
Cc: git@vger.kernel.org
Subject: Re: git-filter-branch : LANG / LC_ALL = C breaks UTF-8 author names
Date: Tue, 31 Aug 2010 20:08:55 -0500
Message-ID: <20100901010855.GD22968@burratino>
In-Reply-To: <4C6E86AA.2020903@leadformance.com>
Hi Richard,
Richard MICHAEL wrote:
>>Richard MICHAEL wrote:
>>> I am filtering our repo with git-filter-branch, but as the sed
>>> script runs with LANG=C LC_ALL=C (7 bit US ASCII), it dies on
>>> commits authored by our team members with accented names.
[...]
> What about special casing the bad sed (or whitelisting good sed)?
> Surely a hack, but those of us with GNU or BSD would be happy.
> Which was the troublesome sed?
Sorry for the slow response. The problematic sed is GNU sed from
MacPorts (I think). Even with LC_ALL=C, .* no longer matches
arbitrary sequences of bytes with such sed: you can check yours with
$ echo 'étale' | LC_ALL=C sed 's/.*//'
Unfortunately I have not been able to reproduce it on Linux. Debian
sed 4.2.1-7 and GNU sed v4.2.1-21-gc6d32f0 both produce the expected
result:
$ echo 'étale' | LC_ALL=C sed 's/.*//'
$
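For anyone who wants to script that check, here is a minimal sketch.
The function name sed_matches_all_bytes is made up for illustration,
and the input is written with printf octal escapes so the test bytes
(UTF-8 'é' is 0xC3 0xA9) do not depend on the encoding of the script
file itself. It only assumes a POSIX sh and whatever sed is first in
$PATH.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when '.*' under LC_ALL=C consumes
# every byte of a line that contains non-ASCII bytes.
sed_matches_all_bytes () {
	# 'étale' encoded as UTF-8, spelled out byte by byte.
	out=$(printf '\303\251tale\n' | LC_ALL=C sed 's/.*//') || return 2
	# A sed that treats '.' as "any byte" leaves nothing behind.
	test -z "$out"
}

if sed_matches_all_bytes
then
	echo "ok: .* matched every byte"
else
	echo "affected: .* left bytes behind (or sed failed)"
fi
```

A return status of 2 means sed itself exited with an error, which is
worth telling apart from a sed that ran but left bytes unmatched.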
> Unfortunately, it
> doesn't "die" well either; the 'export' shell var fails but it keeps
> processing commits.
Hmm, that sounds like a bug indeed. Here is what the start of a fix
might look like, but I stopped early because there's quite a lot of
sed usage in git that expects to be able to process arbitrary data
with short, newline-terminated lines (regardless of encoding).
diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index 962a93b..34a5fa3 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -68,8 +68,8 @@ eval "$functions"
# "author" or "committer
set_ident () {
- lid="$(echo "$1" | tr "[A-Z]" "[a-z]")"
- uid="$(echo "$1" | tr "[a-z]" "[A-Z]")"
+ lid="$(echo "$1" | tr "[A-Z]" "[a-z]")" &&
+ uid="$(echo "$1" | tr "[a-z]" "[A-Z]")" &&
pick_id_script='
/^'$lid' /{
s/'\''/'\''\\'\'\''/g
@@ -90,9 +90,9 @@ set_ident () {
q
}
- '
+ ' &&
- LANG=C LC_ALL=C sed -ne "$pick_id_script"
+ LANG=C LC_ALL=C sed -ne "$pick_id_script" &&
# Ensure non-empty id name.
echo "case \"\$GIT_${uid}_NAME\" in \"\") GIT_${uid}_NAME=\"\${GIT_${uid}_EMAIL%%@*}\" && export GIT_${uid}_NAME;; esac"
}
@@ -322,9 +322,11 @@ while read commit parents; do
git cat-file commit "$commit" >../commit ||
die "Cannot read commit $commit"
- eval "$(set_ident AUTHOR <../commit)" ||
+ set_author=$(set_ident AUTHOR <../commit) &&
+ eval "$set_author" ||
die "setting author failed for commit $commit"
- eval "$(set_ident COMMITTER <../commit)" ||
+ set_committer=$(set_ident COMMITTER <../commit) &&
+ eval "$set_committer" ||
die "setting committer failed for commit $commit"
eval "$filter_env" < /dev/null ||
die "env filter failed: $filter_env"
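To illustrate why the last hunk splits the command substitution out
of the eval, here is a small standalone sketch. The function
generator is a made-up stand-in for set_ident: it prints part of a
script and then fails. The exit status of eval "$(...)" is that of
the evaluated text, so the generator's failure is lost, while a plain
assignment takes the status of its command substitution.

```shell
#!/bin/sh
# Hypothetical stand-in for set_ident: emits some script text,
# then reports failure.
generator () {
	echo "x=1"
	return 1
}

# Old pattern: eval runs "x=1" and returns its status (0), so the
# failure of generator goes unnoticed.
if eval "$(generator)"
then
	old=missed
else
	old=caught
fi

# New pattern: the assignment fails when generator fails, and the
# && keeps eval from running on partial output.
if script=$(generator) && eval "$script"
then
	new=missed
else
	new=caught
fi

echo "old=$old new=$new"
```

The && chain added inside set_ident serves the same purpose one level
down: without it the function's status is that of its final echo,
whatever happened to sed before it.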