From: Jonathan Nieder <jrnieder@gmail.com>
To: Richard MICHAEL <rmichael@leadformance.com>
Cc: git@vger.kernel.org
Subject: Re: git-filter-branch : LANG / LC_ALL = C breaks UTF-8 author names
Date: Tue, 31 Aug 2010 20:08:55 -0500
Message-ID: <20100901010855.GD22968@burratino>
In-Reply-To: <4C6E86AA.2020903@leadformance.com>

Hi Richard,

Richard MICHAEL wrote:
>>Richard MICHAEL wrote:

>>> I am filtering our repo with git-filter-branch, but as the sed
>>> script runs with LANG=C LC_ALL=C (7 bit US ASCII), it dies on
>>> commits authored by our team members with accented names.
[...]
> What about special casing the bad sed (or whitelisting good sed)?
> Surely a hack, but would those of us with GNU or BSD would be happy.
> Which was the troublesome sed?

Sorry for the slow response.  The problematic sed is GNU sed from
MacPorts (I think).  Even with LC_ALL=C, .* no longer matches
arbitrary sequences of bytes with such sed: you can check yours with

 $ echo 'étale' | LC_ALL=C sed 's/.*//'

Unfortunately I have not been able to reproduce it on Linux.  Debian
sed 4.2.1-7 and GNU sed v4.2.1-21-gc6d32f0 both produce the expected
result:

 $ echo 'étale' | LC_ALL=C sed 's/.*//'
 $
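
The same check can be wrapped in a small function so it is usable from
a script.  This is only a sketch: check_sed is a hypothetical name, and
printf with octal escapes is used to emit the UTF-8 bytes of 'étale' so
that the result does not depend on the encoding of the script file.

```shell
# Hypothetical helper (not part of git): reports whether the local sed
# lets '.*' consume non-ASCII bytes when run under LC_ALL=C.
check_sed () {
	# \303\251 is the UTF-8 encoding of 'é'.
	out=$(printf '\303\251tale\n' | LC_ALL=C sed 's/.*//') || {
		# sed itself exited with an error.
		echo broken
		return 1
	}
	if test -z "$out"
	then
		# '.*' matched the whole line, so nothing is left.
		echo ok
	else
		# Some bytes survived the substitution.
		echo broken
		return 1
	fi
}

check_sed || echo "this sed does not treat input as plain bytes" >&2
```

A sed that prints "ok" here behaves like the two Linux builds above.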

> Unfortunately, it
> doesn't "die" well either; the 'export' shell var fails but it keeps
> processing commits.

Hmm, that sounds like a bug indeed.  Here is what the start of a fix
might look like, but I stopped early because there's quite a lot of
sed usage in git that expects to be able to process arbitrary data
with short, newline-terminated lines (regardless of encoding).

diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index 962a93b..34a5fa3 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -68,8 +68,8 @@ eval "$functions"
 # "author" or "committer
 
 set_ident () {
-	lid="$(echo "$1" | tr "[A-Z]" "[a-z]")"
-	uid="$(echo "$1" | tr "[a-z]" "[A-Z]")"
+	lid="$(echo "$1" | tr "[A-Z]" "[a-z]")" &&
+	uid="$(echo "$1" | tr "[a-z]" "[A-Z]")" &&
 	pick_id_script='
 		/^'$lid' /{
 			s/'\''/'\''\\'\'\''/g
@@ -90,9 +90,9 @@ set_ident () {
 
 			q
 		}
-	'
+	' &&
 
-	LANG=C LC_ALL=C sed -ne "$pick_id_script"
+	LANG=C LC_ALL=C sed -ne "$pick_id_script" &&
 	# Ensure non-empty id name.
 	echo "case \"\$GIT_${uid}_NAME\" in \"\") GIT_${uid}_NAME=\"\${GIT_${uid}_EMAIL%%@*}\" && export GIT_${uid}_NAME;; esac"
 }
@@ -322,9 +322,11 @@ while read commit parents; do
 	git cat-file commit "$commit" >../commit ||
 		die "Cannot read commit $commit"
 
-	eval "$(set_ident AUTHOR <../commit)" ||
+	set_author=$(set_ident AUTHOR <../commit) &&
+	eval "$set_author" ||
 		die "setting author failed for commit $commit"
-	eval "$(set_ident COMMITTER <../commit)" ||
+	set_committer=$(set_ident COMMITTER <../commit) &&
+	eval "$set_committer" ||
 		die "setting committer failed for commit $commit"
 	eval "$filter_env" < /dev/null ||
 		die "env filter failed: $filter_env"
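
To illustrate why the last hunk splits the command substitution from
the eval, here is a minimal standalone sketch.  emit_and_fail is a
hypothetical stand-in for a set_ident whose sed has failed; it is not
code from git.

```shell
# Hypothetical stand-in for set_ident: prints shell code, then fails.
emit_and_fail () {
	echo 'demo_var=1'
	return 1
}

# Old pattern: eval only sees the substituted text, so the exit status
# of the command substitution is discarded.
if eval "$(emit_and_fail)"
then
	old_pattern="failure hidden"
else
	old_pattern="failure detected"
fi

# New pattern: a plain assignment returns the exit status of its
# command substitution, so the && chain stops before eval runs.
if script=$(emit_and_fail) && eval "$script"
then
	new_pattern="failure hidden"
else
	new_pattern="failure detected"
fi

echo "old: $old_pattern"
echo "new: $new_pattern"
```

With the old pattern the "|| die" can only fire if the eval'd text
itself fails; with the new one it also fires when the function that
generated the text returned non-zero.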
