* Re: commit log encoding [Was: [PATCH 1/2] tree-wide: fix typos "offest" -> "offset"]
[not found] ` <alpine.LSU.2.00.0911111318260.15039@wotan.suse.de>
@ 2009-11-11 14:13 ` Uwe Kleine-König
2009-11-24 15:12 ` [PATCH] shortlog: respect commit encoding Uwe Kleine-König
0 siblings, 1 reply; 5+ messages in thread
From: Uwe Kleine-König @ 2009-11-11 14:13 UTC (permalink / raw)
To: Jiri Kosina; +Cc: git
On Wed, Nov 11, 2009 at 01:19:32PM +0100, Jiri Kosina wrote:
> On Wed, 11 Nov 2009, Uwe Kleine-König wrote:
>
> > > Thanks for noticing, I had a bug in my git charset config for quite some
> > > time. Fixed it now.
> > Now my name is latin1 encoded. It should be utf-8, doesn't it?
>
> It's not latin1, it's iso-8859-2
For my name it makes no difference.
> which is what I use on my terminals.
> And git can handle that fine too (it stores the encoding together with the
> commit).
Ah, OK, that's news to me, but you're right. I have here:
~/gsrc/linux-2.6$ git cat-file commit 30ff0743f88a70f52a4de5ea5bcb1fd29bcfab2d
tree 6121d35bb2606878be636e897fa77cd51804d724
parent 916b7c73db593510d5c38706be2f2888981747ee
author Uwe Kleine-K�nig <u.kleine-koenig@pengutronix.de> 1256757064 +0100
committer Jiri Kosina <jkosina@suse.cz> 1257780224 +0100
encoding ISO-8859-2
tree-wide: fix typos "couter" -> "counter"
This patch was generated by
git grep -E -i -l 'couter' | xargs -r perl -p -i -e 's/couter/counter/'
Signed-off-by: Uwe Kleine-K�nig <u.kleine-koenig@pengutronix.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
So the remaining question is: Does the encoding specified in the
encoding "header" also apply to the other headers?
If yes[1] then there's a bug in git-shortlog
~/gsrc/linux-2.6$ git shortlog linus/master..trivial/for-next | grep Uwe
Uwe Kleine-K�nig (5):
(with linus/master = 799dd75b1a8380a967c929a4551895788c374b31,
trivial/for-next = 4030ec040a0e21fe9953da70eaa59ee7b4f2297b).
Best regards
Uwe
[1] git log linus/master..trivial/for-next looks OK, so I suspect it
does apply.
--
Pengutronix e.K. | Uwe Kleine-König |
Industrial Linux Solutions | http://www.pengutronix.de/ |
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH] shortlog: respect commit encoding
2009-11-11 14:13 ` commit log encoding [Was: [PATCH 1/2] tree-wide: fix typos "offest" -> "offset"] Uwe Kleine-König
@ 2009-11-24 15:12 ` Uwe Kleine-König
2009-11-24 16:08 ` more problems with commit encoding [Was: [PATCH] shortlog: respect commit encoding] Uwe Kleine-König
2009-11-25 1:12 ` [PATCH] shortlog: respect commit encoding Junio C Hamano
0 siblings, 2 replies; 5+ messages in thread
From: Uwe Kleine-König @ 2009-11-24 15:12 UTC (permalink / raw)
To: git; +Cc: Jiri Kosina
Before this change the author was taken from the raw commit without
reencoding.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Jiri Kosina <jkosina@suse.cz>
---
builtin-shortlog.c | 25 +++++++++++++++----------
t/t4201-shortlog.sh | 24 ++++++++++++++++++++++++
2 files changed, 39 insertions(+), 10 deletions(-)
diff --git a/builtin-shortlog.c b/builtin-shortlog.c
index 8aa63c7..050bda8 100644
--- a/builtin-shortlog.c
+++ b/builtin-shortlog.c
@@ -139,14 +139,19 @@ static void read_from_stdin(struct shortlog *log)
void shortlog_add_commit(struct shortlog *log, struct commit *commit)
{
const char *author = NULL, *buffer;
+ struct strbuf buf = STRBUF_INIT;
+ struct strbuf ufbuf = STRBUF_INIT;
+ struct pretty_print_context ctx = {0};
- buffer = commit->buffer;
+ pretty_print_commit(CMIT_FMT_RAW, commit, &buf, &ctx);
+
+ buffer = buf.buf;
while (*buffer && *buffer != '\n') {
const char *eol = strchr(buffer, '\n');
- if (eol == NULL)
+ if (eol == NULL) {
eol = buffer + strlen(buffer);
- else
+ } else
eol++;
if (!prefixcmp(buffer, "author "))
@@ -157,20 +162,20 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
die("Missing author: %s",
sha1_to_hex(commit->object.sha1));
if (log->user_format) {
- struct strbuf buf = STRBUF_INIT;
struct pretty_print_context ctx = {0};
ctx.abbrev = DEFAULT_ABBREV;
ctx.subject = "";
ctx.after_subject = "";
ctx.date_mode = DATE_NORMAL;
- pretty_print_commit(CMIT_FMT_USERFORMAT, commit, &buf, &ctx);
- insert_one_record(log, author, buf.buf);
- strbuf_release(&buf);
- return;
- }
- if (*buffer)
+ pretty_print_commit(CMIT_FMT_USERFORMAT, commit, &ufbuf, &ctx);
+ buffer = ufbuf.buf;
+
+ } else if (*buffer)
buffer++;
+
insert_one_record(log, author, !*buffer ? "<none>" : buffer);
+ strbuf_release(&ufbuf);
+ strbuf_release(&buf);
}
static void get_from_rev(struct rev_info *rev, struct shortlog *log)
diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
index 405b971..118204b 100755
--- a/t/t4201-shortlog.sh
+++ b/t/t4201-shortlog.sh
@@ -51,5 +51,29 @@ git log HEAD > log
GIT_DIR=non-existing git shortlog -w < log > out
test_expect_success 'shortlog from non-git directory' 'test_cmp expect out'
+iconvfromutf8toiso885915() {
+ printf "%s" "$@" | iconv -f UTF-8 -t ISO-8859-15
+}
+
+git reset --hard "$commit"
+git config --unset i18n.commitencoding
+echo 2 > a1
+git commit --quiet -m "set a1 to 2 and some non-ASCII chars: Äßø" --author="Jöhännës \"Dschö\" Schindëlin <Johannes.Schindelin@gmx.de>" a1
+
+git config i18n.commitencoding "ISO-8859-15"
+echo 3 > a1
+git commit --quiet -m "$(iconvfromutf8toiso885915 "set a1 to 3 and some non-ASCII chars: áæï")" --author="$(iconvfromutf8toiso885915 "Jöhännës \"Dschö\" Schindëlin <Johannes.Schindelin@gmx.de>")" a1
+git config --unset i18n.commitencoding
+
+git shortlog HEAD~2.. > out
+
+cat > expect << EOF
+Jöhännës "Dschö" Schindëlin (2):
+ set a1 to 2 and some non-ASCII chars: Äßø
+ set a1 to 3 and some non-ASCII chars: áæï
+
+EOF
+
+test_expect_success 'shortlog encoding' 'test_cmp expect out'
test_done
--
1.6.5.3
^ permalink raw reply related [flat|nested] 5+ messages in thread
* more problems with commit encoding [Was: [PATCH] shortlog: respect commit encoding]
2009-11-24 15:12 ` [PATCH] shortlog: respect commit encoding Uwe Kleine-König
@ 2009-11-24 16:08 ` Uwe Kleine-König
2009-11-25 1:12 ` [PATCH] shortlog: respect commit encoding Junio C Hamano
1 sibling, 0 replies; 5+ messages in thread
From: Uwe Kleine-König @ 2009-11-24 16:08 UTC (permalink / raw)
To: git; +Cc: Jiri Kosina
Hello,
On Tue, Nov 24, 2009 at 04:12:35PM +0100, Uwe Kleine-König wrote:
> Before this change the author was taken from the raw commit without
> reencoding.
while at it, userformats have the same problem:
linux-2.6$ for rev in b71a8eb bc9be01; do git show --format=%an $rev | head -n1; git show $rev | grep Auth; done
Uwe Kleine-K�nig
Author: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Uwe Kleine-König
Author: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
That is, git show correctly reencodes its output for b71a8eb, but only
without --format=...
(The patches above are in linux-next.)
And now take this (assuming locale and commitencoding are utf-8):
git init
git config user.name 'Jöhännës Dschö'
echo spam > ham
git add ham
git commit -m 'initial commit'
git branch branch
echo more spam > ham
git add ham
git commit -m "Commitlog matching '\nencoding:'
encoding: latin1
"
git checkout branch
echo much more spam > ham
git add ham
git commit -C master
git show | grep Author:
Best regards
Uwe
--
Pengutronix e.K. | Uwe Kleine-König |
Industrial Linux Solutions | http://www.pengutronix.de/ |
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] shortlog: respect commit encoding
2009-11-24 15:12 ` [PATCH] shortlog: respect commit encoding Uwe Kleine-König
2009-11-24 16:08 ` more problems with commit encoding [Was: [PATCH] shortlog: respect commit encoding] Uwe Kleine-König
@ 2009-11-25 1:12 ` Junio C Hamano
2009-11-25 14:00 ` Uwe Kleine-König
1 sibling, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2009-11-25 1:12 UTC (permalink / raw)
To: Uwe Kleine-König; +Cc: git, Jiri Kosina
Uwe Kleine-König <u.kleine-koenig@pengutronix.de> writes:
> Before this change the author was taken from the raw commit without
> reencoding.
I see people often begin with "before this change" and stop the log
message after making a statement of a fact. I mildly dislike this style,
especially when the resulting message does not state that it is bad (and
if necessary why it is bad) nor state in what way the code after the
change is good.
Don't take the author name information without re-encoding
from the raw commit object buffer.
is easier to read, at least for me.
> while (*buffer && *buffer != '\n') {
> const char *eol = strchr(buffer, '\n');
>
> - if (eol == NULL)
> + if (eol == NULL) {
> eol = buffer + strlen(buffer);
> - else
> + } else
> eol++;
> if (!prefixcmp(buffer, "author "))
What is this hunk for?
> @@ -157,20 +162,20 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
> die("Missing author: %s",
> sha1_to_hex(commit->object.sha1));
> if (log->user_format) {
> - struct strbuf buf = STRBUF_INIT;
> struct pretty_print_context ctx = {0};
> ctx.abbrev = DEFAULT_ABBREV;
> ctx.subject = "";
> ctx.after_subject = "";
> ctx.date_mode = DATE_NORMAL;
> + pretty_print_commit(CMIT_FMT_USERFORMAT, commit, &ufbuf, &ctx);
> + buffer = ufbuf.buf;
> +
> + } else if (*buffer)
> buffer++;
> +
You probably wanted to add an extra pair of {} around this "else
if" clause instead, not the earlier one.
Otherwise the change looks good from my cursory look.
> diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
> index 405b971..118204b 100755
> --- a/t/t4201-shortlog.sh
> +++ b/t/t4201-shortlog.sh
> @@ -51,5 +51,29 @@ git log HEAD > log
> GIT_DIR=non-existing git shortlog -w < log > out
>
> test_expect_success 'shortlog from non-git directory' 'test_cmp expect out'
> +iconvfromutf8toiso885915() {
> + printf "%s" "$@" | iconv -f UTF-8 -t ISO-8859-15
> +}
A bad use of "$@" that expands to $# individual words; you meant
to say "$*".
Could we please have the following inside its own test, so that
any failure while preparing the test data is caught as an error?
> +git reset --hard "$commit"
> +git config --unset i18n.commitencoding
> +echo 2 > a1
> +git commit --quiet -m "set a1 to 2 and some non-ASCII chars: Äßø" --author="Jöhännës \"Dschö\" Schindëlin <Johannes.Schindelin@gmx.de>" a1
> +
> +git config i18n.commitencoding "ISO-8859-15"
> +echo 3 > a1
> +git commit --quiet -m "$(iconvfromutf8toiso885915 "set a1 to 3 and some non-ASCII chars: áæï")" --author="$(iconvfromutf8toiso885915 "Jöhännës \"Dschö\" Schindëlin <Johannes.Schindelin@gmx.de>")" a1
> +git config --unset i18n.commitencoding
> +
> +git shortlog HEAD~2.. > out
> +
> +cat > expect << EOF
> +Jöhännës "Dschö" Schindëlin (2):
> + set a1 to 2 and some non-ASCII chars: Äßø
> + set a1 to 3 and some non-ASCII chars: áæï
> +
> +EOF
> +
> +test_expect_success 'shortlog encoding' 'test_cmp expect out'
t3900-i18n-commit already uses 8859-1 so if it is not too much to
ask, it would be much nicer to have these test work between UTF-8
and 8859-1, not -15.
That way, I do not have to worry about breaking tests for people
who were able to run existing iconv tests because they do not have
working 8859-15.
Thanks
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] shortlog: respect commit encoding
2009-11-25 1:12 ` [PATCH] shortlog: respect commit encoding Junio C Hamano
@ 2009-11-25 14:00 ` Uwe Kleine-König
0 siblings, 0 replies; 5+ messages in thread
From: Uwe Kleine-König @ 2009-11-25 14:00 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Jiri Kosina
Hello Junio,
On Tue, Nov 24, 2009 at 05:12:14PM -0800, Junio C Hamano wrote:
> Uwe Kleine-König <u.kleine-koenig@pengutronix.de> writes:
>
> > Before this change the author was taken from the raw commit without
> > reencoding.
>
> I see people often begin with "before this change" and stop the log
> message after making a statement of a fact. I mildly dislike this style,
> especially when the resulting message does not state that it is bad (and
> if necessary why it is bad) nor state in what way the code after the
> change is good.
>
> Don't take the author name information without re-encoding
> from the raw commit object buffer.
>
> is easier to read, at least for me.
Yes, that's better. Thanks.
> > while (*buffer && *buffer != '\n') {
> > const char *eol = strchr(buffer, '\n');
> >
> > - if (eol == NULL)
> > + if (eol == NULL) {
> > eol = buffer + strlen(buffer);
> > - else
> > + } else
> > eol++;
> > if (!prefixcmp(buffer, "author "))
>
> What is this hunk for?
This is just a left-over from debugging. Removed.
> > @@ -157,20 +162,20 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
> > die("Missing author: %s",
> > sha1_to_hex(commit->object.sha1));
> > if (log->user_format) {
> > - struct strbuf buf = STRBUF_INIT;
> > struct pretty_print_context ctx = {0};
> > ctx.abbrev = DEFAULT_ABBREV;
> > ctx.subject = "";
> > ctx.after_subject = "";
> > ctx.date_mode = DATE_NORMAL;
> > + pretty_print_commit(CMIT_FMT_USERFORMAT, commit, &ufbuf, &ctx);
> > + buffer = ufbuf.buf;
> > +
> > + } else if (*buffer)
> > buffer++;
> > +
>
> You probably wanted to add an extra pair of {} around this "else
> if" clause instead, not the earlier one.
I removed the new line (the last changed line you quoted) instead.
Good?
> > diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
> > index 405b971..118204b 100755
> > --- a/t/t4201-shortlog.sh
> > +++ b/t/t4201-shortlog.sh
> > @@ -51,5 +51,29 @@ git log HEAD > log
> > GIT_DIR=non-existing git shortlog -w < log > out
> >
> > test_expect_success 'shortlog from non-git directory' 'test_cmp expect out'
> > +iconvfromutf8toiso885915() {
> > + printf "%s" "$@" | iconv -f UTF-8 -t ISO-8859-15
> > +}
>
> A bad use of "$@" that expands to $# individual words; you meant
> to say "$*".
OK.
> Could we please have the following inside its own test, so that
> any failure while preparing the test data is caught as an error?
I put it in the test itself. Isn't it ugly to have a test saying
something like
* ok 3: prepare shortlog encoding test
? Or is it better to see where a failure occurs?
> > +git reset --hard "$commit"
> > +git config --unset i18n.commitencoding
> > +echo 2 > a1
> > +git commit --quiet -m "set a1 to 2 and some non-ASCII chars: Äßø" --author="Jöhännës \"Dschö\" Schindëlin <Johannes.Schindelin@gmx.de>" a1
> > +
> > +git config i18n.commitencoding "ISO-8859-15"
> > +echo 3 > a1
> > +git commit --quiet -m "$(iconvfromutf8toiso885915 "set a1 to 3 and some non-ASCII chars: áæï")" --author="$(iconvfromutf8toiso885915 "Jöhännës \"Dschö\" Schindëlin <Johannes.Schindelin@gmx.de>")" a1
> > +git config --unset i18n.commitencoding
> > +
> > +git shortlog HEAD~2.. > out
> > +
> > +cat > expect << EOF
> > +Jöhännës "Dschö" Schindëlin (2):
> > + set a1 to 2 and some non-ASCII chars: Äßø
> > + set a1 to 3 and some non-ASCII chars: áæï
> > +
> > +EOF
> > +
> > +test_expect_success 'shortlog encoding' 'test_cmp expect out'
>
> t3900-i18n-commit already uses 8859-1 so if it is not too much to
> ask, it would be much nicer to have these test work between UTF-8
> and 8859-1, not -15.
>
> That way, I do not have to worry about breaking tests for people
> who were able to run existing iconv tests because they do not have
> working 8859-15.
OK.
Below is the updated patch.
Best regards
Uwe
------------------>8----------------------
From: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Subject: [PATCH] shortlog: respect commit encoding
Don't take the author name information without re-encoding from the raw
commit object buffer.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Jiri Kosina <jkosina@suse.cz>
---
builtin-shortlog.c | 20 ++++++++++++--------
t/t4201-shortlog.sh | 23 +++++++++++++++++++++++
2 files changed, 35 insertions(+), 8 deletions(-)
diff --git a/builtin-shortlog.c b/builtin-shortlog.c
index 8aa63c7..263adc1 100644
--- a/builtin-shortlog.c
+++ b/builtin-shortlog.c
@@ -139,8 +139,13 @@ static void read_from_stdin(struct shortlog *log)
void shortlog_add_commit(struct shortlog *log, struct commit *commit)
{
const char *author = NULL, *buffer;
+ struct strbuf buf = STRBUF_INIT;
+ struct strbuf ufbuf = STRBUF_INIT;
+ struct pretty_print_context ctx = {0};
- buffer = commit->buffer;
+ pretty_print_commit(CMIT_FMT_RAW, commit, &buf, &ctx);
+
+ buffer = buf.buf;
while (*buffer && *buffer != '\n') {
const char *eol = strchr(buffer, '\n');
@@ -157,20 +162,19 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit)
die("Missing author: %s",
sha1_to_hex(commit->object.sha1));
if (log->user_format) {
- struct strbuf buf = STRBUF_INIT;
struct pretty_print_context ctx = {0};
ctx.abbrev = DEFAULT_ABBREV;
ctx.subject = "";
ctx.after_subject = "";
ctx.date_mode = DATE_NORMAL;
- pretty_print_commit(CMIT_FMT_USERFORMAT, commit, &buf, &ctx);
- insert_one_record(log, author, buf.buf);
- strbuf_release(&buf);
- return;
- }
- if (*buffer)
+ pretty_print_commit(CMIT_FMT_USERFORMAT, commit, &ufbuf, &ctx);
+ buffer = ufbuf.buf;
+
+ } else if (*buffer)
buffer++;
insert_one_record(log, author, !*buffer ? "<none>" : buffer);
+ strbuf_release(&ufbuf);
+ strbuf_release(&buf);
}
static void get_from_rev(struct rev_info *rev, struct shortlog *log)
diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
index 405b971..03b6950 100755
--- a/t/t4201-shortlog.sh
+++ b/t/t4201-shortlog.sh
@@ -52,4 +52,27 @@ GIT_DIR=non-existing git shortlog -w < log > out
test_expect_success 'shortlog from non-git directory' 'test_cmp expect out'
+iconvfromutf8toiso88591() {
+ printf "%s" "$*" | iconv -f UTF-8 -t ISO-8859-1
+}
+
+cat > expect << EOF
+Jöhännës "Dschö" Schindëlin (2):
+ set a1 to 2 and some non-ASCII chars: Äßø
+ set a1 to 3 and some non-ASCII chars: áæï
+
+EOF
+
+test_expect_success 'shortlog encoding' '
+git reset --hard "$commit" &&
+git config --unset i18n.commitencoding &&
+echo 2 > a1 &&
+git commit --quiet -m "set a1 to 2 and some non-ASCII chars: Äßø" --author="Jöhännës \"Dschö\" Schindëlin <Johannes.Schindelin@gmx.de>" a1 &&
+git config i18n.commitencoding "ISO-8859-1" &&
+echo 3 > a1 &&
+git commit --quiet -m "$(iconvfromutf8toiso88591 "set a1 to 3 and some non-ASCII chars: áæï")" --author="$(iconvfromutf8toiso88591 "Jöhännës \"Dschö\" Schindëlin <Johannes.Schindelin@gmx.de>")" a1 &&
+git config --unset i18n.commitencoding &&
+git shortlog HEAD~2.. > out &&
+test_cmp expect out'
+
test_done
--
1.6.5.3
--
Pengutronix e.K. | Uwe Kleine-König |
Industrial Linux Solutions | http://www.pengutronix.de/ |
^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2009-11-25 14:01 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <1256757064-13669-1-git-send-email-u.kleine-koenig@pengutronix.de>
[not found] ` <alpine.LSU.2.00.0911021600530.4203@wotan.suse.de>
[not found] ` <20091103081447.GA20204@pengutronix.de>
[not found] ` <alpine.LSU.2.00.0911031144300.9988@wotan.suse.de>
[not found] ` <20091111114206.GA19652@pengutronix.de>
[not found] ` <alpine.LSU.2.00.0911111318260.15039@wotan.suse.de>
2009-11-11 14:13 ` commit log encoding [Was: [PATCH 1/2] tree-wide: fix typos "offest" -> "offset"] Uwe Kleine-König
2009-11-24 15:12 ` [PATCH] shortlog: respect commit encoding Uwe Kleine-König
2009-11-24 16:08 ` more problems with commit encoding [Was: [PATCH] shortlog: respect commit encoding] Uwe Kleine-König
2009-11-25 1:12 ` [PATCH] shortlog: respect commit encoding Junio C Hamano
2009-11-25 14:00 ` Uwe Kleine-König
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).