* [PATCH] Build in gettext poison feature unconditionally @ 2012-08-21 4:39 Nguyễn Thái Ngọc Duy 2012-08-21 5:24 ` Jonathan Nieder 0 siblings, 1 reply; 10+ messages in thread From: Nguyễn Thái Ngọc Duy @ 2012-08-21 4:39 UTC (permalink / raw) To: git Cc: Junio C Hamano, Ævar Arnfjörð, Jonathan Nieder, Nguyễn Thái Ngọc Duy Making gettext poison second citizen makes it easy to forget to run the test suite with it. This is step one to running test suite with gettext poison by default. The runtime cost should be small. "gcc -O2" inlines _() and use_gettext_poison(). But even if it does not, performance should not be impacted as _() calls are usually not on critical path. If some of them are, we better fix there as gettext() may or may not be cheap anyway. A new target is added to run test with poisoned gettext: test-poison Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> --- I don't know the story behind this compile-time switch. The series [1] that introduces it does not say much. This at least makes it easier for me to run poison tests instead of building another binary, if I remember it. Next step could be make "make test" run both normal and poison modes, but I'm not sure how to do it nicely yet. [1] http://thread.gmane.org/gmane.comp.version-control.git/167307/focus=167438 Makefile | 12 +++--------- gettext.c | 10 ---------- gettext.h | 12 +++++++----- po/README | 13 ++++--------- 4 files changed, 14 insertions(+), 33 deletions(-) diff --git a/Makefile b/Makefile index 6b0c961..8255fb9 100644 --- a/Makefile +++ b/Makefile @@ -258,11 +258,6 @@ all:: # Define HAVE_DEV_TTY if your system can open /dev/tty to interact with the # user. # -# Define GETTEXT_POISON if you are debugging the choice of strings marked -# for translation. In a GETTEXT_POISON build, you can turn all strings marked -# for translation into gibberish by setting the GIT_GETTEXT_POISON variable -# (to any value) in your environment. -# # Define JSMIN to point to JavaScript minifier that functions as # a filter to have gitweb.js minified. # @@ -1594,9 +1589,6 @@ endif ifdef NO_SYMLINK_HEAD BASIC_CFLAGS += -DNO_SYMLINK_HEAD endif -ifdef GETTEXT_POISON - BASIC_CFLAGS += -DGETTEXT_POISON -endif ifdef NO_GETTEXT BASIC_CFLAGS += -DNO_GETTEXT USE_GETTEXT_SCHEME ?= fallthrough @@ -2497,7 +2489,6 @@ ifdef GIT_TEST_CMP_USE_COPIED_CONTEXT @echo GIT_TEST_CMP_USE_COPIED_CONTEXT=YesPlease >>$@ endif @echo NO_GETTEXT=\''$(subst ','\'',$(subst ','\'',$(NO_GETTEXT)))'\' >>$@ - @echo GETTEXT_POISON=\''$(subst ','\'',$(subst ','\'',$(GETTEXT_POISON)))'\' >>$@ ifdef GIT_PERF_REPEAT_COUNT @echo GIT_PERF_REPEAT_COUNT=\''$(subst ','\'',$(subst ','\'',$(GIT_PERF_REPEAT_COUNT)))'\' >>$@ endif @@ -2545,6 +2536,9 @@ export NO_SVN_TESTS test: all $(MAKE) -C t/ all +poison-test: all + $(MAKE) POISON_GETTEXT=YesPlease -C t/ all + perf: all $(MAKE) -C t/perf/ all diff --git a/gettext.c b/gettext.c index f75bca7..6aa822c 100644 --- a/gettext.c +++ b/gettext.c @@ -16,16 +16,6 @@ # endif #endif -#ifdef GETTEXT_POISON -int use_gettext_poison(void) -{ - static int poison_requested = -1; - if (poison_requested == -1) - poison_requested = getenv("GIT_GETTEXT_POISON") ? 1 : 0; - return poison_requested; -} -#endif - #ifndef NO_GETTEXT static void init_gettext_charset(const char *domain) { diff --git a/gettext.h b/gettext.h index 57ba8bb..a77554f 100644 --- a/gettext.h +++ b/gettext.h @@ -36,11 +36,13 @@ static inline void git_setup_gettext(void) } #endif -#ifdef GETTEXT_POISON -extern int use_gettext_poison(void); -#else -#define use_gettext_poison() 0 -#endif +static inline int use_gettext_poison(void) +{ + static int poison_requested = -1; + if (poison_requested == -1) + poison_requested = getenv("GIT_GETTEXT_POISON") ? 1 : 0; + return poison_requested; +} static inline FORMAT_PRESERVING(1) const char *_(const char *msgid) { diff --git a/po/README b/po/README index c1520e8..e25752b 100644 --- a/po/README +++ b/po/README @@ -270,16 +270,11 @@ something in the test suite might still depend on the US English version of the strings, e.g. to grep some error message or other output. -To smoke out issues like these Git can be compiled with gettext poison -support, at the top-level: +To smoke out issues like these you should run the test suite with +gettext poison support, which emits gibberish on every call to +gettext: - make GETTEXT_POISON=YesPlease - -That'll give you a git which emits gibberish on every call to -gettext. It's obviously not meant to be installed, but you should run -the test suite with it: - - cd t && prove -j 9 ./t[0-9]*.sh + cd t && GETTEXT_POISON=YesPlease prove -j 9 ./t[0-9]*.sh If tests break with it you should inspect them manually and see if what you're translating is sane, i.e. that you're not translating -- 1.7.12.rc2 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] Build in gettext poison feature unconditionally 2012-08-21 4:39 [PATCH] Build in gettext poison feature unconditionally Nguyễn Thái Ngọc Duy @ 2012-08-21 5:24 ` Jonathan Nieder 2012-08-21 16:37 ` Junio C Hamano 0 siblings, 1 reply; 10+ messages in thread From: Jonathan Nieder @ 2012-08-21 5:24 UTC (permalink / raw) To: Nguy� n Thái Ngọc Duy Cc: git, Junio C Hamano, Ævar Arnfjörð Bjarmason Nguyễn Thái Ngọc Duy wrote: > The runtime cost should be small. "gcc -O2" inlines _() and > use_gettext_poison(). But even if it does not, performance should not > be impacted as _() calls are usually not on critical path. If some of > them are, we better fix there as gettext() may or may not be cheap > anyway. That seems reasonable to me. The increase in code size of a commonly inlined function and the extra "if" in a common if not performance-critical codepath is annoying (and I'd prefer to keep use_gettext_poison() un-inlined), but in any event the cost would go away once the translation-based implementation of poison lands. [...] > I don't know the story behind this compile-time switch. The series [1] > that introduces it does not say much. I think it was just paranoia about performance regressions. > This at least makes it easier for me to run poison tests instead of > building another binary, if I remember it. Next step could be make > "make test" run both normal and poison modes, but I'm not sure how to > do it nicely yet. Yes, that would be nice (or perhaps a mode to run most tests in the current locale and rerun test assertions that use a test_i18n* helper or C_LOCALE_OUTPUT prerequisite in the C locale). Hope that helps, Jonathan ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] Build in gettext poison feature unconditionally 2012-08-21 5:24 ` Jonathan Nieder @ 2012-08-21 16:37 ` Junio C Hamano 2012-08-22 5:27 ` [PATCH] Support generate poison .mo files for testing Nguyễn Thái Ngọc Duy 0 siblings, 1 reply; 10+ messages in thread From: Junio C Hamano @ 2012-08-21 16:37 UTC (permalink / raw) To: Jonathan Nieder; +Cc: pclouds, git, Ævar Arnfjörð Bjarmason Jonathan Nieder <jrnieder@gmail.com> writes: > Nguyễn Thái Ngọc Duy wrote: > >> The runtime cost should be small. "gcc -O2" inlines _() and >> use_gettext_poison(). But even if it does not, performance should not >> be impacted as _() calls are usually not on critical path. If some of >> them are, we better fix there as gettext() may or may not be cheap >> anyway. > > That seems reasonable to me. The increase in code size of a commonly > inlined function and the extra "if" in a common if not > performance-critical codepath is annoying (and I'd prefer to keep > use_gettext_poison() un-inlined), but in any event the cost would go > away once the translation-based implementation of poison lands. I would say it is not worse than just "annoying"; if the cost will go away, I'd rather see this conversion postponed and is done as part of (and preferrably at the end of) the "poison with a poison-locale" series. > Yes, that would be nice (or perhaps a mode to run most tests in > the current locale and rerun test assertions that use a test_i18n* > helper or C_LOCALE_OUTPUT prerequisite in the C locale). Yeah, I think that is the right direction to go; I suspect that "poison with a poison-locale" would have to be ready to allow us to go there, though. ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH] Support generate poison .mo files for testing 2012-08-21 16:37 ` Junio C Hamano @ 2012-08-22 5:27 ` Nguyễn Thái Ngọc Duy 2012-08-22 11:13 ` Junio C Hamano 2012-08-22 16:43 ` Junio C Hamano 0 siblings, 2 replies; 10+ messages in thread From: Nguyễn Thái Ngọc Duy @ 2012-08-22 5:27 UTC (permalink / raw) To: git Cc: Junio C Hamano, Ævar Arnfjörð, Jonathan Nieder, Nguyễn Thái Ngọc Duy test-poisongen does a similar job to gettext poison feature except that it does it at build time. Gibberish .mo files are generated for all supported langauges and put in po/build/poison-locale. Target "poison-locale" is for this. User can run the test with these .mo files by setting POISON_LOCALE while running the test suite. User must also set LANG/LC_* correctly (and the system is supposed to support that locale). Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> --- On Tue, Aug 21, 2012 at 11:37 PM, Junio C Hamano <gitster@pobox.com> wrote: > I would say it is not worse than just "annoying"; if the cost will > go away, I'd rather see this conversion postponed and is done as > part of (and preferrably at the end of) the "poison with a > poison-locale" series. OK let me redo step one. test-poisongen requires libgettextpo. I'm not sure if this library if gnu specific. We may need another flag for it instead of NO_GETTEXT. We don't need a fake language code with this approach. Makefile | 19 ++++++++ t/test-lib.sh | 10 +++- test-poisongen.c | 139 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ wrap-for-bin.sh | 6 ++- 4 files changed, 171 insertions(+), 3 deletions(-) mode change 100644 => 100755 t/test-lib.sh create mode 100644 test-poisongen.c mode change 100644 => 100755 wrap-for-bin.sh diff --git a/Makefile b/Makefile index 6b0c961..6ea2665 100644 --- a/Makefile +++ b/Makefile @@ -496,6 +496,9 @@ TEST_PROGRAMS_NEED_X += test-mergesort TEST_PROGRAMS_NEED_X += test-mktemp TEST_PROGRAMS_NEED_X += test-parse-options TEST_PROGRAMS_NEED_X += test-path-utils +ifndef NO_GETTEXT +TEST_PROGRAMS_NEED_X += test-poisongen +endif TEST_PROGRAMS_NEED_X += test-revision-walking TEST_PROGRAMS_NEED_X += test-run-command TEST_PROGRAMS_NEED_X += test-scrap-cache-tree @@ -2428,6 +2431,19 @@ endif po/build/locale/%/LC_MESSAGES/git.mo: po/%.po $(QUIET_MSGFMT)mkdir -p $(dir $@) && $(MSGFMT) -o $@ $< +ifndef NO_GETTEXT +POISON_MOFILES := $(patsubst po/%.po,po/build/poison-locale/%/LC_MESSAGES/git.mo,$(POFILES)) + +po/build/poison-locale/%.po: po/%.po test-poisongen$X po/git.pot + $(QUIET_MSGFMT)mkdir -p $(dir $@) && \ + ./test-poisongen po/git.pot $@ + +po/build/poison-locale/%/LC_MESSAGES/git.mo: po/build/poison-locale/%.po + $(QUIET_MSGFMT)mkdir -p $(dir $@) && $(MSGFMT) -o $@ $< + +poison-locale: $(POISON_MOFILES) +endif + FIND_SOURCE_FILES = ( git ls-files '*.[hcS]' 2>/dev/null || \ $(FIND) . \( -name .git -type d -prune \) \ -o \( -name '*.[hcS]' -type f -print \) ) @@ -2564,6 +2580,9 @@ test-svn-fe$X: vcs-svn/lib.a .PRECIOUS: $(TEST_OBJS) +test-poisongen$X: test-poisongen.o GIT-LDFLAGS $(GITLIBS) + $(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $(filter %.o,$^) $(filter %.a,$^) $(LIBS) -lgettextpo + test-%$X: test-%.o GIT-LDFLAGS $(GITLIBS) $(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $(filter %.o,$^) $(filter %.a,$^) $(LIBS) diff --git a/t/test-lib.sh b/t/test-lib.sh old mode 100644 new mode 100755 index bb4f886..d4060e8 --- a/t/test-lib.sh +++ b/t/test-lib.sh @@ -55,8 +55,10 @@ GIT_BUILD_DIR="$TEST_DIRECTORY"/.. export PERL_PATH SHELL_PATH # For repeatability, reset the environment to known value. -LANG=C -LC_ALL=C +if [ -z "$POISON_LOCALE" ]; then + LANG=C + LC_ALL=C +fi PAGER=cat TZ=UTC TERM=dumb @@ -92,6 +94,10 @@ export GIT_MERGE_VERBOSITY GIT_MERGE_AUTOEDIT export GIT_AUTHOR_EMAIL GIT_AUTHOR_NAME export GIT_COMMITTER_EMAIL GIT_COMMITTER_NAME export EDITOR +if test -n "$POISON_LOCALE"; then + GIT_POISON_LOCALE=yes + export GIT_POISON_LOCALE +fi # Protect ourselves from common misconfiguration to export # CDPATH into the environment diff --git a/test-poisongen.c b/test-poisongen.c new file mode 100644 index 0000000..5905aa9 --- /dev/null +++ b/test-poisongen.c @@ -0,0 +1,139 @@ +#include <gettext-po.h> +#include "cache.h" +#include "strbuf.h" + +static void xerror(int severity, + po_message_t message, + const char *filename, size_t lineno, size_t column, + int multiline_p, const char *message_text) +{ + die("%s:%d:%d %s", filename, lineno, column, message_text); +} + +static void xerror2(int severity, + po_message_t message1, + const char *filename1, size_t lineno1, size_t column1, + int multiline_p1, const char *message_text1, + po_message_t message2, + const char *filename2, size_t lineno2, size_t column2, + int multiline_p2, const char *message_text2) +{ + die("%s:%d:%d %s (%s:%d:%d %s)", + filename1, lineno1, column1, message_text1, + filename2, lineno2, column2, message_text2); +} + +static void translate(const char *msg, struct strbuf *buf) +{ + const char *end = msg + strlen(msg); + const char *text = "* GETTEXT POISON *"; + int text_len = strlen(text); + int t = 0; + + strbuf_reset(buf); + /* preserve \n and printf format specifiers because msgfmt + barfs otherwise. */ + while (msg < end) { + /* printf specifiers and shell variables, it's a quite + relax check */ + if ((*msg == '%' || *msg == '$') && msg+1 < end) { + strbuf_addch(buf, *msg++); + do + strbuf_addch(buf, *msg); + while (msg < end && !isspace(*msg++)); + } else if (*msg == '\n') { + /* we only need to preserve trailing newlines, doing + more does not really harm */ + strbuf_addch(buf, '\n'); + msg++; + } else { + strbuf_addch(buf, text[t]); + t = (t + 1) % text_len; + msg++; + } + } +} + +int main(int argc, char **argv) +{ + const char *project_header = "Project-Id-Version: "; + const char *charset_header = "Content-Type: text/plain; charset="; + const char *plural_header = "Plural-Forms: nplurals="; + struct po_xerror_handler handler = { xerror, xerror2 }; + po_file_t src; + const char *file = argv[1]; + po_message_iterator_t iter; + po_message_t msg; + struct strbuf buf = STRBUF_INIT; + + if (argc != 3) + die("usage: test-poisongen <pot file> <poison po file>"); + + src = po_file_read(file, &handler); + if (src == NULL) + die("could not open %s\n", file); + + iter = po_message_iterator(src, "messages"); + while ((msg = po_next_message(iter)) != NULL) { + /* msgid "" is the header, special handling */ + if (po_message_msgid(msg)[0] == '\0') { + const char *p; + + /* no fuzzy, msgfmt does not like it */ + po_message_set_fuzzy(msg, 0); + + strbuf_reset(&buf); + strbuf_addstr(&buf, po_message_msgstr(msg)); + + if (!prefixcmp(buf.buf, project_header) && + !prefixcmp(buf.buf + strlen(project_header), + "PACKAGE VERSION\n")) { + strbuf_splice(&buf, strlen(project_header), + strlen("PACKAGE VERSION"), + "git poison", + strlen("git poison")); + } + + /* Content-Type: text/plain; charset=UTF-8 */ + if ((p = strstr(buf.buf, charset_header)) != NULL && + !prefixcmp(p + strlen(charset_header), "CHARSET\n")) { + p += strlen(charset_header); + strbuf_splice(&buf, p - buf.buf, strlen("CHARSET"), + "UTF-8", strlen("UTF-8")); + } + + /* Plural-Forms: nplurals=2; plural=1 */ + if ((p = strstr(buf.buf, plural_header)) != NULL && + !prefixcmp(p + strlen(plural_header), "INTEGER; plural=EXPRESSION")) { + int offset; + p += strlen(plural_header); + offset = p - buf.buf; + strbuf_splice(&buf, offset, strlen("INTEGER"), "2", 1); + offset += 1; + assert(!prefixcmp(buf.buf + offset, "; plural=EXPRESSION")); + offset += strlen("; plural="); + strbuf_splice(&buf, offset, strlen("EXPRESSION"), "1", 1); + } + + po_message_set_msgstr(msg, buf.buf); + continue; + } + if (po_message_msgid_plural(msg)) { + int index = 0; + + translate(po_message_msgid(msg), &buf); + po_message_set_msgstr_plural(msg, index++, buf.buf); + + while (po_message_msgstr_plural(msg, index)) { + translate(po_message_msgid_plural(msg), &buf); + po_message_set_msgstr_plural(msg, index++, buf.buf); + } + } else { + translate(po_message_msgid(msg), &buf); + po_message_set_msgstr(msg, buf.buf); + } + } + + po_file_write(src, argv[2], &handler); + return 0; +} diff --git a/wrap-for-bin.sh b/wrap-for-bin.sh old mode 100644 new mode 100755 index 53a8dd0..99bc816 --- a/wrap-for-bin.sh +++ b/wrap-for-bin.sh @@ -15,7 +15,11 @@ else export GIT_TEMPLATE_DIR fi GITPERLLIB='@@BUILD_DIR@@/perl/blib/lib' -GIT_TEXTDOMAINDIR='@@BUILD_DIR@@/po/build/locale' +if test -n "$GIT_POISON_LOCALE"; then + GIT_TEXTDOMAINDIR='@@BUILD_DIR@@/po/build/poison-locale' +else + GIT_TEXTDOMAINDIR='@@BUILD_DIR@@/po/build/locale' +fi PATH='@@BUILD_DIR@@/bin-wrappers:'"$PATH" export GIT_EXEC_PATH GITPERLLIB PATH GIT_TEXTDOMAINDIR -- 1.7.12.rc2 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] Support generate poison .mo files for testing 2012-08-22 5:27 ` [PATCH] Support generate poison .mo files for testing Nguyễn Thái Ngọc Duy @ 2012-08-22 11:13 ` Junio C Hamano 2012-08-22 12:37 ` Nguyen Thai Ngoc Duy 2012-08-22 16:43 ` Junio C Hamano 1 sibling, 1 reply; 10+ messages in thread From: Junio C Hamano @ 2012-08-22 11:13 UTC (permalink / raw) To: Nguyễn Thái Ngọc Duy Cc: git, Ævar Arnfjörð, Jonathan Nieder Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes: > test-poisongen does a similar job to gettext poison feature except > that it does it at build time. Gibberish .mo files are generated for > all supported langauges and put in po/build/poison-locale. Target > "poison-locale" is for this. What is the significance of this locale being "Gibberish"? Currently, for any string, we give "### gettext poison ###" or something but the only thing we care about in the poison mode is that it is different from the message id, no? I was wondering if these phony translations can be something simple like "Add QQ at the beginning of the message id string" and still can catch mistakenly marked messages that come from the plumbing layer, or something. As you have already written a printf skipper that looks fairly conservative, perhaps I shouldn't be worried too much about it, but we seem to be spending considerable effort on the "poison", and it makes me wonder (even though no better alternative comes to mind) if we could do better. The reason we do "poison" (be it the current one or locale based one) in the first place is so that we want to make sure messages from the plumbing are not marked for i18n, and we do so by running our test under the "poison" mode that produces output different from the in-code text that are marked for i18n, which somehow feels quite a roundabout way of doing so. > User can run the test with these .mo files by setting POISON_LOCALE > while running the test suite. User must also set LANG/LC_* correctly > (and the system is supposed to support that locale). > OK let me redo step one. test-poisongen requires libgettextpo. I'm > not sure if this library if gnu specific. We may need another flag > for it instead of NO_GETTEXT. We don't need a fake language code with > this approach. OK. > Makefile | 19 ++++++++ > t/test-lib.sh | 10 +++- > test-poisongen.c | 139 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ > wrap-for-bin.sh | 6 ++- > 4 files changed, 171 insertions(+), 3 deletions(-) > mode change 100644 => 100755 t/test-lib.sh > create mode 100644 test-poisongen.c > mode change 100644 => 100755 wrap-for-bin.sh Thanks. I suspect two mode changes weren't intentional? > +static void translate(const char *msg, struct strbuf *buf) > +{ > + const char *end = msg + strlen(msg); > + const char *text = "* GETTEXT POISON *"; > + int text_len = strlen(text); > + int t = 0; > + > + strbuf_reset(buf); > + /* preserve \n and printf format specifiers because msgfmt > + barfs otherwise. */ > + while (msg < end) { > + /* printf specifiers and shell variables, it's a quite > + relax check */ > + if ((*msg == '%' || *msg == '$') && msg+1 < end) { > + strbuf_addch(buf, *msg++); > + do > + strbuf_addch(buf, *msg); > + while (msg < end && !isspace(*msg++)); > + } else if (*msg == '\n') { > + /* we only need to preserve trailing newlines, doing > + more does not really harm */ > + strbuf_addch(buf, '\n'); > + msg++; > + } else { > + strbuf_addch(buf, text[t]); > + t = (t + 1) % text_len; > + msg++; > + } > + } > +} ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] Support generate poison .mo files for testing 2012-08-22 11:13 ` Junio C Hamano @ 2012-08-22 12:37 ` Nguyen Thai Ngoc Duy 2012-08-22 16:22 ` Junio C Hamano 0 siblings, 1 reply; 10+ messages in thread From: Nguyen Thai Ngoc Duy @ 2012-08-22 12:37 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Ævar Arnfjörð, Jonathan Nieder On Wed, Aug 22, 2012 at 6:13 PM, Junio C Hamano <gitster@pobox.com> wrote: > Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes: > >> test-poisongen does a similar job to gettext poison feature except >> that it does it at build time. Gibberish .mo files are generated for >> all supported langauges and put in po/build/poison-locale. Target >> "poison-locale" is for this. > > What is the significance of this locale being "Gibberish"? > Currently, for any string, we give "### gettext poison ###" or > something but the only thing we care about in the poison mode is > that it is different from the message id, no? I was wondering if > these phony translations can be something simple like "Add QQ at the > beginning of the message id string" and still can catch mistakenly > marked messages that come from the plumbing layer, or something. I'm gradually getting there, partly thanks to your question about grepping "tracked" in another thread. This patch does not really generate random strings. It repeats the pattern "* gettext poison *" for evey character that can be replaced. But a better way could be replacing "tracked" with "t r a c k e d". We know the rule so we can recreate the that string from "tracked" in test_i18n*. Or reverse the upper/lower case, whichever is easier for the recreation by test_i18n* >> mode change 100644 => 100755 t/test-lib.sh >> create mode 100644 test-poisongen.c >> mode change 100644 => 100755 wrap-for-bin.sh > > Thanks. I suspect two mode changes weren't intentional? It's an emacs hook gone bad. -- Duy ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] Support generate poison .mo files for testing 2012-08-22 12:37 ` Nguyen Thai Ngoc Duy @ 2012-08-22 16:22 ` Junio C Hamano 2012-08-23 10:53 ` Nguyen Thai Ngoc Duy 0 siblings, 1 reply; 10+ messages in thread From: Junio C Hamano @ 2012-08-22 16:22 UTC (permalink / raw) To: Nguyen Thai Ngoc Duy; +Cc: git, Ævar Arnfjörð, Jonathan Nieder Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes: > But a better way could be > replacing "tracked" with "t r a c k e d". We know the rule so we can > recreate the that string from "tracked" in test_i18n*. Or reverse the > upper/lower case, whichever is easier for the recreation by test_i18n* That does not make much sense to me, so either one of us must be slightly confused. I thought the only purpose of testing with the "poison" was to find messages that must not be localized but were localized by mistake. For that, we have to make sure that anything that uses test_i18n* is reading from Porcelain, in other words, we must use the byte-for-byte comparison without using test_i18n* when verifying the plumbing output. And the primary requirement for this arrangement to work is that the expected output in C locale and the actual ouptut in the synthetic poison locale are reliably different. They do not have to be reversible (I was actually going to suggest rot13 of the original instead of cycling through the "* gettext poison *" in your patch --- prefixing with QQ would not work, as it is likely that the test with "grep" may not be anchored at the left end). Teaching test_i18n* to fuzzily match the expected output in C locale against the actual output in synthetic poison locale may or may not be doable, but spending any cycle working on that sounds like a total waste of time (even though it might be fun). It does not test that we are translating Porcelain messages correctly. Am I missing something? Puzzled... ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] Support generate poison .mo files for testing 2012-08-22 16:22 ` Junio C Hamano @ 2012-08-23 10:53 ` Nguyen Thai Ngoc Duy 0 siblings, 0 replies; 10+ messages in thread From: Nguyen Thai Ngoc Duy @ 2012-08-23 10:53 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Ævar Arnfjörð, Jonathan Nieder On Wed, Aug 22, 2012 at 11:22 PM, Junio C Hamano <gitster@pobox.com> wrote: > Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes: > >> But a better way could be >> replacing "tracked" with "t r a c k e d". We know the rule so we can >> recreate the that string from "tracked" in test_i18n*. Or reverse the >> upper/lower case, whichever is easier for the recreation by test_i18n* > > That does not make much sense to me, so either one of us must be > slightly confused. I thought the only purpose of testing with the > "poison" was to find messages that must not be localized but were > localized by mistake. For that, we have to make sure that anything > that uses test_i18n* is reading from Porcelain, in other words, we > must use the byte-for-byte comparison without using test_i18n* when > verifying the plumbing output. And the primary requirement for this > arrangement to work is that the expected output in C locale and the > actual ouptut in the synthetic poison locale are reliably different. > They do not have to be reversible (I was actually going to suggest > rot13 of the original instead of cycling through the "* gettext > poison *" in your patch --- prefixing with QQ would not work, as it > is likely that the test with "grep" may not be anchored at the left > end). Yes, for verifying plumbing output that should be enough. > Teaching test_i18n* to fuzzily match the expected output in C locale > against the actual output in synthetic poison locale may or may not > be doable, but spending any cycle working on that sounds like a > total waste of time (even though it might be fun). It does not test > that we are translating Porcelain messages correctly. Right. I was aiming at testing translated messages but obviously went off track trying to test a fake locale instead. Thanks for stopping me. -- Duy ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] Support generate poison .mo files for testing 2012-08-22 5:27 ` [PATCH] Support generate poison .mo files for testing Nguyễn Thái Ngọc Duy 2012-08-22 11:13 ` Junio C Hamano @ 2012-08-22 16:43 ` Junio C Hamano 2012-08-23 11:00 ` Nguyen Thai Ngoc Duy 1 sibling, 1 reply; 10+ messages in thread From: Junio C Hamano @ 2012-08-22 16:43 UTC (permalink / raw) To: Nguyễn Thái Ngọc Duy Cc: git, Ævar Arnfjörð, Jonathan Nieder Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes: > +static void translate(const char *msg, struct strbuf *buf) > +{ > + const char *end = msg + strlen(msg); > + const char *text = "* GETTEXT POISON *"; > + int text_len = strlen(text); > + int t = 0; > + > + strbuf_reset(buf); > + /* preserve \n and printf format specifiers because msgfmt > + barfs otherwise. */ > + while (msg < end) { > + /* printf specifiers and shell variables, it's a quite > + relax check */ > + if ((*msg == '%' || *msg == '$') && msg+1 < end) { > + strbuf_addch(buf, *msg++); > + do > + strbuf_addch(buf, *msg); > + while (msg < end && !isspace(*msg++)); Aside from the Style: do { ... } while (); why are you special casing a run of non-blank letters that begin with a dollar sign (swapping two ints is done with "%2$d %1$d", a percent still at the beginning, so there must be something else I am missing)? Also why do you stop at isspace()? Isn't a " " (space) a flag that means "If the first character of a signed conversion is not a sign or if a signed conversion results in no characters, a <space> shall be prefixed to the result." As the flags, min-width, precision, and length do not share the same character as the conversion that has to come at the end, I think you only want to do something like /* * conversion specifier characters, taken from: * http://pubs.opengroup.org/onlinepubs/9699919799/functions/printf.html */ static const char printf_conversion[] = "diouxXfFeEgGaAcspnCS%"; ... while (msg < end) { if (*msg == '%') { strbuf_addch(buf, *msg++); while (msg < end) { int ch = *msg++; strbuf_addch(buf, ch); if (strchr(printf_conversion, ch)) break; } /* copied the printf part literally */ continue; } ... keep \n ... ... muck with string ... } perhaps? ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] Support generate poison .mo files for testing 2012-08-22 16:43 ` Junio C Hamano @ 2012-08-23 11:00 ` Nguyen Thai Ngoc Duy 0 siblings, 0 replies; 10+ messages in thread From: Nguyen Thai Ngoc Duy @ 2012-08-23 11:00 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Ævar Arnfjörð, Jonathan Nieder On Wed, Aug 22, 2012 at 11:43 PM, Junio C Hamano <gitster@pobox.com> wrote: > why are you special casing a run of non-blank letters that begin > with a dollar sign (swapping two ints is done with "%2$d %1$d", a > percent still at the beginning, so there must be something else I am > missing)? '$' is for shell variables. I should separate it from C messages. All these because now that we run the (fake) translation through gettext toolchain, it'll catch us if we don't preserve dynamic parts correctly. > Also why do you stop at isspace()? Isn't a " " (space) a flag that > means "If the first character of a signed conversion is not a sign > or if a signed conversion results in no characters, a <space> shall > be prefixed to the result." A hurry attempt to get past msgfmt. I should refine these code, either by... > As the flags, min-width, precision, and length do not share the same > character as the conversion that has to come at the end, I think you > only want to do something like > > /* > * conversion specifier characters, taken from: > * http://pubs.opengroup.org/onlinepubs/9699919799/functions/printf.html > */ > static const char printf_conversion[] = "diouxXfFeEgGaAcspnCS%"; > > ... > > while (msg < end) { > if (*msg == '%') { > strbuf_addch(buf, *msg++); > while (msg < end) { > int ch = *msg++; > strbuf_addch(buf, ch); > if (strchr(printf_conversion, ch)) > break; > } > /* copied the printf part literally */ > continue; > } > ... keep \n ... > ... muck with string ... > } > > perhaps? following this, or copying the matching logic from msgfmt source. -- Duy ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2012-08-23 11:01 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-08-21 4:39 [PATCH] Build in gettext poison feature unconditionally Nguyễn Thái Ngọc Duy 2012-08-21 5:24 ` Jonathan Nieder 2012-08-21 16:37 ` Junio C Hamano 2012-08-22 5:27 ` [PATCH] Support generate poison .mo files for testing Nguyễn Thái Ngọc Duy 2012-08-22 11:13 ` Junio C Hamano 2012-08-22 12:37 ` Nguyen Thai Ngoc Duy 2012-08-22 16:22 ` Junio C Hamano 2012-08-23 10:53 ` Nguyen Thai Ngoc Duy 2012-08-22 16:43 ` Junio C Hamano 2012-08-23 11:00 ` Nguyen Thai Ngoc Duy
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).