* Re: commit log encoding [Was: [PATCH 1/2] tree-wide: fix typos "offest" -> "offset"] [not found] ` <alpine.LSU.2.00.0911111318260.15039@wotan.suse.de> @ 2009-11-11 14:13 ` Uwe Kleine-König 2009-11-24 15:12 ` [PATCH] shortlog: respect commit encoding Uwe Kleine-König 0 siblings, 1 reply; 5+ messages in thread From: Uwe Kleine-König @ 2009-11-11 14:13 UTC (permalink / raw) To: Jiri Kosina; +Cc: git On Wed, Nov 11, 2009 at 01:19:32PM +0100, Jiri Kosina wrote: > On Wed, 11 Nov 2009, Uwe Kleine-König wrote: > > > > Thanks for noticing, I had a bug in my git charset config for quite some > > > time. Fixed it now. > > Now my name is latin1 encoded. It should be utf-8, doesn't it? > > It's not latin1, it's iso-8859-2 For my name it makes no difference. > which is what I use on my terminals. > And git can handle that fine too (it stores the encoding together with the > commit). Ah, OK, that's news to me, but you're right. I have here: ~/gsrc/linux-2.6$ git cat-file commit 30ff0743f88a70f52a4de5ea5bcb1fd29bcfab2d tree 6121d35bb2606878be636e897fa77cd51804d724 parent 916b7c73db593510d5c38706be2f2888981747ee author Uwe Kleine-K�nig <u.kleine-koenig@pengutronix.de> 1256757064 +0100 committer Jiri Kosina <jkosina@suse.cz> 1257780224 +0100 encoding ISO-8859-2 tree-wide: fix typos "couter" -> "counter" This patch was generated by git grep -E -i -l 'couter' | xargs -r perl -p -i -e 's/couter/counter/' Signed-off-by: Uwe Kleine-K�nig <u.kleine-koenig@pengutronix.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz> So the remaining question is: Does the encoding specified in the encoding "header" also apply to the other headers? If yes[1] then there's a bug in git-shortlog ~/gsrc/linux-2.6$ git shortlog linus/master..trivial/for-next | grep Uwe Uwe Kleine-K�nig (5): (with linus/master = 799dd75b1a8380a967c929a4551895788c374b31, trivial/for-next = 4030ec040a0e21fe9953da70eaa59ee7b4f2297b). Best regards Uwe [1] git log linus/master..trivial/for-next looks OK, so I suspect it does apply. -- Pengutronix e.K. | Uwe Kleine-König | Industrial Linux Solutions | http://www.pengutronix.de/ | ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH] shortlog: respect commit encoding 2009-11-11 14:13 ` commit log encoding [Was: [PATCH 1/2] tree-wide: fix typos "offest" -> "offset"] Uwe Kleine-König @ 2009-11-24 15:12 ` Uwe Kleine-König 2009-11-24 16:08 ` more problems with commit encoding [Was: [PATCH] shortlog: respect commit encoding] Uwe Kleine-König 2009-11-25 1:12 ` [PATCH] shortlog: respect commit encoding Junio C Hamano 0 siblings, 2 replies; 5+ messages in thread From: Uwe Kleine-König @ 2009-11-24 15:12 UTC (permalink / raw) To: git; +Cc: Jiri Kosina Before this change the author was taken from the raw commit without reencoding. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Cc: Jiri Kosina <jkosina@suse.cz> --- builtin-shortlog.c | 25 +++++++++++++++---------- t/t4201-shortlog.sh | 24 ++++++++++++++++++++++++ 2 files changed, 39 insertions(+), 10 deletions(-) diff --git a/builtin-shortlog.c b/builtin-shortlog.c index 8aa63c7..050bda8 100644 --- a/builtin-shortlog.c +++ b/builtin-shortlog.c @@ -139,14 +139,19 @@ static void read_from_stdin(struct shortlog *log) void shortlog_add_commit(struct shortlog *log, struct commit *commit) { const char *author = NULL, *buffer; + struct strbuf buf = STRBUF_INIT; + struct strbuf ufbuf = STRBUF_INIT; + struct pretty_print_context ctx = {0}; - buffer = commit->buffer; + pretty_print_commit(CMIT_FMT_RAW, commit, &buf, &ctx); + + buffer = buf.buf; while (*buffer && *buffer != '\n') { const char *eol = strchr(buffer, '\n'); - if (eol == NULL) + if (eol == NULL) { eol = buffer + strlen(buffer); - else + } else eol++; if (!prefixcmp(buffer, "author ")) @@ -157,20 +162,20 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit) die("Missing author: %s", sha1_to_hex(commit->object.sha1)); if (log->user_format) { - struct strbuf buf = STRBUF_INIT; struct pretty_print_context ctx = {0}; ctx.abbrev = DEFAULT_ABBREV; ctx.subject = ""; ctx.after_subject = ""; ctx.date_mode = DATE_NORMAL; - pretty_print_commit(CMIT_FMT_USERFORMAT, commit, &buf, &ctx); - insert_one_record(log, author, buf.buf); - strbuf_release(&buf); - return; - } - if (*buffer) + pretty_print_commit(CMIT_FMT_USERFORMAT, commit, &ufbuf, &ctx); + buffer = ufbuf.buf; + + } else if (*buffer) buffer++; + insert_one_record(log, author, !*buffer ? "<none>" : buffer); + strbuf_release(&ufbuf); + strbuf_release(&buf); } static void get_from_rev(struct rev_info *rev, struct shortlog *log) diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh index 405b971..118204b 100755 --- a/t/t4201-shortlog.sh +++ b/t/t4201-shortlog.sh @@ -51,5 +51,29 @@ git log HEAD > log GIT_DIR=non-existing git shortlog -w < log > out test_expect_success 'shortlog from non-git directory' 'test_cmp expect out' +iconvfromutf8toiso885915() { + printf "%s" "$@" | iconv -f UTF-8 -t ISO-8859-15 +} + +git reset --hard "$commit" +git config --unset i18n.commitencoding +echo 2 > a1 +git commit --quiet -m "set a1 to 2 and some non-ASCII chars: Äßø" --author="Jöhännës \"Dschö\" Schindëlin <Johannes.Schindelin@gmx.de>" a1 + +git config i18n.commitencoding "ISO-8859-15" +echo 3 > a1 +git commit --quiet -m "$(iconvfromutf8toiso885915 "set a1 to 3 and some non-ASCII chars: áæï")" --author="$(iconvfromutf8toiso885915 "Jöhännës \"Dschö\" Schindëlin <Johannes.Schindelin@gmx.de>")" a1 +git config --unset i18n.commitencoding + +git shortlog HEAD~2.. > out + +cat > expect << EOF +Jöhännës "Dschö" Schindëlin (2): + set a1 to 2 and some non-ASCII chars: Äßø + set a1 to 3 and some non-ASCII chars: áæï + +EOF + +test_expect_success 'shortlog encoding' 'test_cmp expect out' test_done -- 1.6.5.3 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* more problems with commit encoding [Was: [PATCH] shortlog: respect commit encoding] 2009-11-24 15:12 ` [PATCH] shortlog: respect commit encoding Uwe Kleine-König @ 2009-11-24 16:08 ` Uwe Kleine-König 2009-11-25 1:12 ` [PATCH] shortlog: respect commit encoding Junio C Hamano 1 sibling, 0 replies; 5+ messages in thread From: Uwe Kleine-König @ 2009-11-24 16:08 UTC (permalink / raw) To: git; +Cc: Jiri Kosina Hello, On Tue, Nov 24, 2009 at 04:12:35PM +0100, Uwe Kleine-König wrote: > Before this change the author was taken from the raw commit without > reencoding. while at it, userformats have the same problem: linux-2.6$ for rev in b71a8eb bc9be01; do git show --format=%an $rev | head -n1; git show $rev | grep Auth; done Uwe Kleine-K�nig Author: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Uwe Kleine-König Author: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> That is, git show correctly reencodes its output for b71a8eb, but only without --format=... (The patches above are in linux-next.) And now take this (assuming locale and commitencoding are utf-8): git init git config user.name 'Jöhännës Dschö' echo spam > ham git add ham git commit -m 'initial commit' git branch branch echo more spam > ham git add ham git commit -m "Commitlog matching '\nencoding:' encoding: latin1 " git checkout branch echo much more spam > ham git add ham git commit -C master git show | grep Author: Best regards Uwe -- Pengutronix e.K. | Uwe Kleine-König | Industrial Linux Solutions | http://www.pengutronix.de/ | ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] shortlog: respect commit encoding 2009-11-24 15:12 ` [PATCH] shortlog: respect commit encoding Uwe Kleine-König 2009-11-24 16:08 ` more problems with commit encoding [Was: [PATCH] shortlog: respect commit encoding] Uwe Kleine-König @ 2009-11-25 1:12 ` Junio C Hamano 2009-11-25 14:00 ` Uwe Kleine-König 1 sibling, 1 reply; 5+ messages in thread From: Junio C Hamano @ 2009-11-25 1:12 UTC (permalink / raw) To: Uwe Kleine-König; +Cc: git, Jiri Kosina Uwe Kleine-König <u.kleine-koenig@pengutronix.de> writes: > Before this change the author was taken from the raw commit without > reencoding. I see people often begin with "before this change" and stop the log message after making a statement of a fact. I mildly dislike this style, especially when the resulting message does not state that it is bad (and if necessary why it is bad) nor state in what way the code after the change is good. Don't take the author name information without re-encoding from the raw commit object buffer. is easier to read, at least for me. > while (*buffer && *buffer != '\n') { > const char *eol = strchr(buffer, '\n'); > > - if (eol == NULL) > + if (eol == NULL) { > eol = buffer + strlen(buffer); > - else > + } else > eol++; > if (!prefixcmp(buffer, "author ")) What is this hunk for? > @@ -157,20 +162,20 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit) > die("Missing author: %s", > sha1_to_hex(commit->object.sha1)); > if (log->user_format) { > - struct strbuf buf = STRBUF_INIT; > struct pretty_print_context ctx = {0}; > ctx.abbrev = DEFAULT_ABBREV; > ctx.subject = ""; > ctx.after_subject = ""; > ctx.date_mode = DATE_NORMAL; > + pretty_print_commit(CMIT_FMT_USERFORMAT, commit, &ufbuf, &ctx); > + buffer = ufbuf.buf; > + > + } else if (*buffer) > buffer++; > + You probably wanted to add an extra pair of {} around this "else if" clause instead, not the earlier one. Otherwise the change looks good from my cursory look. > diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh > index 405b971..118204b 100755 > --- a/t/t4201-shortlog.sh > +++ b/t/t4201-shortlog.sh > @@ -51,5 +51,29 @@ git log HEAD > log > GIT_DIR=non-existing git shortlog -w < log > out > > test_expect_success 'shortlog from non-git directory' 'test_cmp expect out' > +iconvfromutf8toiso885915() { > + printf "%s" "$@" | iconv -f UTF-8 -t ISO-8859-15 > +} A bad use of "$@" that expands to $# individual words; you meant to say "$*". Could we please have the following inside its own test, so that any failure while preparing the test data is caught as an error? > +git reset --hard "$commit" > +git config --unset i18n.commitencoding > +echo 2 > a1 > +git commit --quiet -m "set a1 to 2 and some non-ASCII chars: Äßø" --author="Jöhännës \"Dschö\" Schindëlin <Johannes.Schindelin@gmx.de>" a1 > + > +git config i18n.commitencoding "ISO-8859-15" > +echo 3 > a1 > +git commit --quiet -m "$(iconvfromutf8toiso885915 "set a1 to 3 and some non-ASCII chars: áæï")" --author="$(iconvfromutf8toiso885915 "Jöhännës \"Dschö\" Schindëlin <Johannes.Schindelin@gmx.de>")" a1 > +git config --unset i18n.commitencoding > + > +git shortlog HEAD~2.. > out > + > +cat > expect << EOF > +Jöhännës "Dschö" Schindëlin (2): > + set a1 to 2 and some non-ASCII chars: Äßø > + set a1 to 3 and some non-ASCII chars: áæï > + > +EOF > + > +test_expect_success 'shortlog encoding' 'test_cmp expect out' t3900-i18n-commit already uses 8859-1 so if it is not too much to ask, it would be much nicer to have these test work between UTF-8 and 8859-1, not -15. That way, I do not have to worry about breaking tests for people who were able to run existing iconv tests because they do not have working 8859-15. Thanks ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] shortlog: respect commit encoding 2009-11-25 1:12 ` [PATCH] shortlog: respect commit encoding Junio C Hamano @ 2009-11-25 14:00 ` Uwe Kleine-König 0 siblings, 0 replies; 5+ messages in thread From: Uwe Kleine-König @ 2009-11-25 14:00 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Jiri Kosina Hello Junio, On Tue, Nov 24, 2009 at 05:12:14PM -0800, Junio C Hamano wrote: > Uwe Kleine-König <u.kleine-koenig@pengutronix.de> writes: > > > Before this change the author was taken from the raw commit without > > reencoding. > > I see people often begin with "before this change" and stop the log > message after making a statement of a fact. I mildly dislike this style, > especially when the resulting message does not state that it is bad (and > if necessary why it is bad) nor state in what way the code after the > change is good. > > Don't take the author name information without re-encoding > from the raw commit object buffer. > > is easier to read, at least for me. Yes, that's better. Thanks. > > while (*buffer && *buffer != '\n') { > > const char *eol = strchr(buffer, '\n'); > > > > - if (eol == NULL) > > + if (eol == NULL) { > > eol = buffer + strlen(buffer); > > - else > > + } else > > eol++; > > if (!prefixcmp(buffer, "author ")) > > What is this hunk for? This is just a left-over from debugging. Removed. > > @@ -157,20 +162,20 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit) > > die("Missing author: %s", > > sha1_to_hex(commit->object.sha1)); > > if (log->user_format) { > > - struct strbuf buf = STRBUF_INIT; > > struct pretty_print_context ctx = {0}; > > ctx.abbrev = DEFAULT_ABBREV; > > ctx.subject = ""; > > ctx.after_subject = ""; > > ctx.date_mode = DATE_NORMAL; > > + pretty_print_commit(CMIT_FMT_USERFORMAT, commit, &ufbuf, &ctx); > > + buffer = ufbuf.buf; > > + > > + } else if (*buffer) > > buffer++; > > + > > You probably wanted to add an extra pair of {} around this "else > if" clause instead, not the earlier one. I removed the new line (the last changed line you quoted) instead. Good? > > diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh > > index 405b971..118204b 100755 > > --- a/t/t4201-shortlog.sh > > +++ b/t/t4201-shortlog.sh > > @@ -51,5 +51,29 @@ git log HEAD > log > > GIT_DIR=non-existing git shortlog -w < log > out > > > > test_expect_success 'shortlog from non-git directory' 'test_cmp expect out' > > +iconvfromutf8toiso885915() { > > + printf "%s" "$@" | iconv -f UTF-8 -t ISO-8859-15 > > +} > > A bad use of "$@" that expands to $# individual words; you meant > to say "$*". OK. > Could we please have the following inside its own test, so that > any failure while preparing the test data is caught as an error? I put it in the test itself. Isn't it ugly to have a test saying something like * ok 3: prepare shortlog encoding test ? Or is it better to see where a failure occurs? > > +git reset --hard "$commit" > > +git config --unset i18n.commitencoding > > +echo 2 > a1 > > +git commit --quiet -m "set a1 to 2 and some non-ASCII chars: Äßø" --author="Jöhännës \"Dschö\" Schindëlin <Johannes.Schindelin@gmx.de>" a1 > > + > > +git config i18n.commitencoding "ISO-8859-15" > > +echo 3 > a1 > > +git commit --quiet -m "$(iconvfromutf8toiso885915 "set a1 to 3 and some non-ASCII chars: áæï")" --author="$(iconvfromutf8toiso885915 "Jöhännës \"Dschö\" Schindëlin <Johannes.Schindelin@gmx.de>")" a1 > > +git config --unset i18n.commitencoding > > + > > +git shortlog HEAD~2.. > out > > + > > +cat > expect << EOF > > +Jöhännës "Dschö" Schindëlin (2): > > + set a1 to 2 and some non-ASCII chars: Äßø > > + set a1 to 3 and some non-ASCII chars: áæï > > + > > +EOF > > + > > +test_expect_success 'shortlog encoding' 'test_cmp expect out' > > t3900-i18n-commit already uses 8859-1 so if it is not too much to > ask, it would be much nicer to have these test work between UTF-8 > and 8859-1, not -15. > > That way, I do not have to worry about breaking tests for people > who were able to run existing iconv tests because they do not have > working 8859-15. OK. Below is the updated patch. Best regards Uwe ------------------>8---------------------- From: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Subject: [PATCH] shortlog: respect commit encoding Don't take the author name information without re-encoding from the raw commit object buffer. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Cc: Jiri Kosina <jkosina@suse.cz> --- builtin-shortlog.c | 20 ++++++++++++-------- t/t4201-shortlog.sh | 23 +++++++++++++++++++++++ 2 files changed, 35 insertions(+), 8 deletions(-) diff --git a/builtin-shortlog.c b/builtin-shortlog.c index 8aa63c7..263adc1 100644 --- a/builtin-shortlog.c +++ b/builtin-shortlog.c @@ -139,8 +139,13 @@ static void read_from_stdin(struct shortlog *log) void shortlog_add_commit(struct shortlog *log, struct commit *commit) { const char *author = NULL, *buffer; + struct strbuf buf = STRBUF_INIT; + struct strbuf ufbuf = STRBUF_INIT; + struct pretty_print_context ctx = {0}; - buffer = commit->buffer; + pretty_print_commit(CMIT_FMT_RAW, commit, &buf, &ctx); + + buffer = buf.buf; while (*buffer && *buffer != '\n') { const char *eol = strchr(buffer, '\n'); @@ -157,20 +162,19 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit) die("Missing author: %s", sha1_to_hex(commit->object.sha1)); if (log->user_format) { - struct strbuf buf = STRBUF_INIT; struct pretty_print_context ctx = {0}; ctx.abbrev = DEFAULT_ABBREV; ctx.subject = ""; ctx.after_subject = ""; ctx.date_mode = DATE_NORMAL; - pretty_print_commit(CMIT_FMT_USERFORMAT, commit, &buf, &ctx); - insert_one_record(log, author, buf.buf); - strbuf_release(&buf); - return; - } - if (*buffer) + pretty_print_commit(CMIT_FMT_USERFORMAT, commit, &ufbuf, &ctx); + buffer = ufbuf.buf; + + } else if (*buffer) buffer++; insert_one_record(log, author, !*buffer ? "<none>" : buffer); + strbuf_release(&ufbuf); + strbuf_release(&buf); } static void get_from_rev(struct rev_info *rev, struct shortlog *log) diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh index 405b971..03b6950 100755 --- a/t/t4201-shortlog.sh +++ b/t/t4201-shortlog.sh @@ -52,4 +52,27 @@ GIT_DIR=non-existing git shortlog -w < log > out test_expect_success 'shortlog from non-git directory' 'test_cmp expect out' +iconvfromutf8toiso88591() { + printf "%s" "$*" | iconv -f UTF-8 -t ISO-8859-1 +} + +cat > expect << EOF +Jöhännës "Dschö" Schindëlin (2): + set a1 to 2 and some non-ASCII chars: Äßø + set a1 to 3 and some non-ASCII chars: áæï + +EOF + +test_expect_success 'shortlog encoding' ' +git reset --hard "$commit" && +git config --unset i18n.commitencoding && +echo 2 > a1 && +git commit --quiet -m "set a1 to 2 and some non-ASCII chars: Äßø" --author="Jöhännës \"Dschö\" Schindëlin <Johannes.Schindelin@gmx.de>" a1 && +git config i18n.commitencoding "ISO-8859-1" && +echo 3 > a1 && +git commit --quiet -m "$(iconvfromutf8toiso88591 "set a1 to 3 and some non-ASCII chars: áæï")" --author="$(iconvfromutf8toiso88591 "Jöhännës \"Dschö\" Schindëlin <Johannes.Schindelin@gmx.de>")" a1 && +git config --unset i18n.commitencoding && +git shortlog HEAD~2.. > out && +test_cmp expect out' + test_done -- 1.6.5.3 -- Pengutronix e.K. | Uwe Kleine-König | Industrial Linux Solutions | http://www.pengutronix.de/ | ^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2009-11-25 14:01 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <1256757064-13669-1-git-send-email-u.kleine-koenig@pengutronix.de> [not found] ` <alpine.LSU.2.00.0911021600530.4203@wotan.suse.de> [not found] ` <20091103081447.GA20204@pengutronix.de> [not found] ` <alpine.LSU.2.00.0911031144300.9988@wotan.suse.de> [not found] ` <20091111114206.GA19652@pengutronix.de> [not found] ` <alpine.LSU.2.00.0911111318260.15039@wotan.suse.de> 2009-11-11 14:13 ` commit log encoding [Was: [PATCH 1/2] tree-wide: fix typos "offest" -> "offset"] Uwe Kleine-König 2009-11-24 15:12 ` [PATCH] shortlog: respect commit encoding Uwe Kleine-König 2009-11-24 16:08 ` more problems with commit encoding [Was: [PATCH] shortlog: respect commit encoding] Uwe Kleine-König 2009-11-25 1:12 ` [PATCH] shortlog: respect commit encoding Junio C Hamano 2009-11-25 14:00 ` Uwe Kleine-König
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).