git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Build in gettext poison feature unconditionally
@ 2012-08-21  4:39 Nguyễn Thái Ngọc Duy
  2012-08-21  5:24 ` Jonathan Nieder
  0 siblings, 1 reply; 10+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-08-21  4:39 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Ævar Arnfjörð, Jonathan Nieder,
	Nguyễn Thái Ngọc Duy

Making gettext poison second citizen makes it easy to forget to run
the test suite with it. This is step one to running test suite with
gettext poison by default.

The runtime cost should be small. "gcc -O2" inlines _() and
use_gettext_poison(). But even if it does not, performance should not
be impacted as _() calls are usually not on critical path. If some of
them are, we better fix there as gettext() may or may not be cheap
anyway.

A new target is added to run test with poisoned gettext: test-poison

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 I don't know the story behind this compile-time switch. The series [1]
 that introduces it does not say much.

 This at least makes it easier for me to run poison tests instead of
 building another binary, if I remember it. Next step could be make
 "make test" run both normal and poison modes, but I'm not sure how to
 do it nicely yet.

 [1] http://thread.gmane.org/gmane.comp.version-control.git/167307/focus=167438

 Makefile  | 12 +++---------
 gettext.c | 10 ----------
 gettext.h | 12 +++++++-----
 po/README | 13 ++++---------
 4 files changed, 14 insertions(+), 33 deletions(-)

diff --git a/Makefile b/Makefile
index 6b0c961..8255fb9 100644
--- a/Makefile
+++ b/Makefile
@@ -258,11 +258,6 @@ all::
 # Define HAVE_DEV_TTY if your system can open /dev/tty to interact with the
 # user.
 #
-# Define GETTEXT_POISON if you are debugging the choice of strings marked
-# for translation.  In a GETTEXT_POISON build, you can turn all strings marked
-# for translation into gibberish by setting the GIT_GETTEXT_POISON variable
-# (to any value) in your environment.
-#
 # Define JSMIN to point to JavaScript minifier that functions as
 # a filter to have gitweb.js minified.
 #
@@ -1594,9 +1589,6 @@ endif
 ifdef NO_SYMLINK_HEAD
 	BASIC_CFLAGS += -DNO_SYMLINK_HEAD
 endif
-ifdef GETTEXT_POISON
-	BASIC_CFLAGS += -DGETTEXT_POISON
-endif
 ifdef NO_GETTEXT
 	BASIC_CFLAGS += -DNO_GETTEXT
 	USE_GETTEXT_SCHEME ?= fallthrough
@@ -2497,7 +2489,6 @@ ifdef GIT_TEST_CMP_USE_COPIED_CONTEXT
 	@echo GIT_TEST_CMP_USE_COPIED_CONTEXT=YesPlease >>$@
 endif
 	@echo NO_GETTEXT=\''$(subst ','\'',$(subst ','\'',$(NO_GETTEXT)))'\' >>$@
-	@echo GETTEXT_POISON=\''$(subst ','\'',$(subst ','\'',$(GETTEXT_POISON)))'\' >>$@
 ifdef GIT_PERF_REPEAT_COUNT
 	@echo GIT_PERF_REPEAT_COUNT=\''$(subst ','\'',$(subst ','\'',$(GIT_PERF_REPEAT_COUNT)))'\' >>$@
 endif
@@ -2545,6 +2536,9 @@ export NO_SVN_TESTS
 test: all
 	$(MAKE) -C t/ all
 
+poison-test: all
+	$(MAKE) POISON_GETTEXT=YesPlease -C t/ all
+
 perf: all
 	$(MAKE) -C t/perf/ all
 
diff --git a/gettext.c b/gettext.c
index f75bca7..6aa822c 100644
--- a/gettext.c
+++ b/gettext.c
@@ -16,16 +16,6 @@
 #	endif
 #endif
 
-#ifdef GETTEXT_POISON
-int use_gettext_poison(void)
-{
-	static int poison_requested = -1;
-	if (poison_requested == -1)
-		poison_requested = getenv("GIT_GETTEXT_POISON") ? 1 : 0;
-	return poison_requested;
-}
-#endif
-
 #ifndef NO_GETTEXT
 static void init_gettext_charset(const char *domain)
 {
diff --git a/gettext.h b/gettext.h
index 57ba8bb..a77554f 100644
--- a/gettext.h
+++ b/gettext.h
@@ -36,11 +36,13 @@ static inline void git_setup_gettext(void)
 }
 #endif
 
-#ifdef GETTEXT_POISON
-extern int use_gettext_poison(void);
-#else
-#define use_gettext_poison() 0
-#endif
+static inline int use_gettext_poison(void)
+{
+	static int poison_requested = -1;
+	if (poison_requested == -1)
+		poison_requested = getenv("GIT_GETTEXT_POISON") ? 1 : 0;
+	return poison_requested;
+}
 
 static inline FORMAT_PRESERVING(1) const char *_(const char *msgid)
 {
diff --git a/po/README b/po/README
index c1520e8..e25752b 100644
--- a/po/README
+++ b/po/README
@@ -270,16 +270,11 @@ something in the test suite might still depend on the US English
 version of the strings, e.g. to grep some error message or other
 output.
 
-To smoke out issues like these Git can be compiled with gettext poison
-support, at the top-level:
+To smoke out issues like these you should run the test suite with
+gettext poison support, which emits gibberish on every call to
+gettext:
 
-    make GETTEXT_POISON=YesPlease
-
-That'll give you a git which emits gibberish on every call to
-gettext. It's obviously not meant to be installed, but you should run
-the test suite with it:
-
-    cd t && prove -j 9 ./t[0-9]*.sh
+    cd t && GETTEXT_POISON=YesPlease prove -j 9 ./t[0-9]*.sh
 
 If tests break with it you should inspect them manually and see if
 what you're translating is sane, i.e. that you're not translating
-- 
1.7.12.rc2

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] Build in gettext poison feature unconditionally
  2012-08-21  4:39 [PATCH] Build in gettext poison feature unconditionally Nguyễn Thái Ngọc Duy
@ 2012-08-21  5:24 ` Jonathan Nieder
  2012-08-21 16:37   ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Jonathan Nieder @ 2012-08-21  5:24 UTC (permalink / raw)
  To: Nguy� n Thái Ngọc Duy
  Cc: git, Junio C Hamano, Ævar Arnfjörð Bjarmason

Nguyễn Thái Ngọc Duy wrote:

> The runtime cost should be small. "gcc -O2" inlines _() and
> use_gettext_poison(). But even if it does not, performance should not
> be impacted as _() calls are usually not on critical path. If some of
> them are, we better fix there as gettext() may or may not be cheap
> anyway.

That seems reasonable to me.  The increase in code size of a commonly
inlined function and the extra "if" in a common if not
performance-critical codepath is annoying (and I'd prefer to keep
use_gettext_poison() un-inlined), but in any event the cost would go
away once the translation-based implementation of poison lands.

[...]
>  I don't know the story behind this compile-time switch. The series [1]
>  that introduces it does not say much.

I think it was just paranoia about performance regressions.

>  This at least makes it easier for me to run poison tests instead of
>  building another binary, if I remember it. Next step could be make
>  "make test" run both normal and poison modes, but I'm not sure how to
>  do it nicely yet.

Yes, that would be nice (or perhaps a mode to run most tests in
the current locale and rerun test assertions that use a test_i18n*
helper or C_LOCALE_OUTPUT prerequisite in the C locale).

Hope that helps,
Jonathan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Build in gettext poison feature unconditionally
  2012-08-21  5:24 ` Jonathan Nieder
@ 2012-08-21 16:37   ` Junio C Hamano
  2012-08-22  5:27     ` [PATCH] Support generate poison .mo files for testing Nguyễn Thái Ngọc Duy
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2012-08-21 16:37 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: pclouds, git, Ævar Arnfjörð Bjarmason

Jonathan Nieder <jrnieder@gmail.com> writes:

> Nguyễn Thái Ngọc Duy wrote:
>
>> The runtime cost should be small. "gcc -O2" inlines _() and
>> use_gettext_poison(). But even if it does not, performance should not
>> be impacted as _() calls are usually not on critical path. If some of
>> them are, we better fix there as gettext() may or may not be cheap
>> anyway.
>
> That seems reasonable to me.  The increase in code size of a commonly
> inlined function and the extra "if" in a common if not
> performance-critical codepath is annoying (and I'd prefer to keep
> use_gettext_poison() un-inlined), but in any event the cost would go
> away once the translation-based implementation of poison lands.

I would say it is not worse than just "annoying"; if the cost will
go away, I'd rather see this conversion postponed and is done as
part of (and preferrably at the end of) the "poison with a
poison-locale" series.

> Yes, that would be nice (or perhaps a mode to run most tests in
> the current locale and rerun test assertions that use a test_i18n*
> helper or C_LOCALE_OUTPUT prerequisite in the C locale).

Yeah, I think that is the right direction to go; I suspect that
"poison with a poison-locale" would have to be ready to allow us to
go there, though.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH] Support generate poison .mo files for testing
  2012-08-21 16:37   ` Junio C Hamano
@ 2012-08-22  5:27     ` Nguyễn Thái Ngọc Duy
  2012-08-22 11:13       ` Junio C Hamano
  2012-08-22 16:43       ` Junio C Hamano
  0 siblings, 2 replies; 10+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2012-08-22  5:27 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Ævar Arnfjörð, Jonathan Nieder,
	Nguyễn Thái Ngọc Duy

test-poisongen does a similar job to gettext poison feature except
that it does it at build time. Gibberish .mo files are generated for
all supported langauges and put in po/build/poison-locale. Target
"poison-locale" is for this.

User can run the test with these .mo files by setting POISON_LOCALE
while running the test suite. User must also set LANG/LC_* correctly
(and the system is supposed to support that locale).

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 On Tue, Aug 21, 2012 at 11:37 PM, Junio C Hamano <gitster@pobox.com> wrote:
 > I would say it is not worse than just "annoying"; if the cost will
 > go away, I'd rather see this conversion postponed and is done as
 > part of (and preferrably at the end of) the "poison with a
 > poison-locale" series.

 OK let me redo step one. test-poisongen requires libgettextpo. I'm
 not sure if this library if gnu specific. We may need another flag
 for it instead of NO_GETTEXT. We don't need a fake language code with
 this approach.

 Makefile         |  19 ++++++++
 t/test-lib.sh    |  10 +++-
 test-poisongen.c | 139 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 wrap-for-bin.sh  |   6 ++-
 4 files changed, 171 insertions(+), 3 deletions(-)
 mode change 100644 => 100755 t/test-lib.sh
 create mode 100644 test-poisongen.c
 mode change 100644 => 100755 wrap-for-bin.sh

diff --git a/Makefile b/Makefile
index 6b0c961..6ea2665 100644
--- a/Makefile
+++ b/Makefile
@@ -496,6 +496,9 @@ TEST_PROGRAMS_NEED_X += test-mergesort
 TEST_PROGRAMS_NEED_X += test-mktemp
 TEST_PROGRAMS_NEED_X += test-parse-options
 TEST_PROGRAMS_NEED_X += test-path-utils
+ifndef NO_GETTEXT
+TEST_PROGRAMS_NEED_X += test-poisongen
+endif
 TEST_PROGRAMS_NEED_X += test-revision-walking
 TEST_PROGRAMS_NEED_X += test-run-command
 TEST_PROGRAMS_NEED_X += test-scrap-cache-tree
@@ -2428,6 +2431,19 @@ endif
 po/build/locale/%/LC_MESSAGES/git.mo: po/%.po
 	$(QUIET_MSGFMT)mkdir -p $(dir $@) && $(MSGFMT) -o $@ $<
 
+ifndef NO_GETTEXT
+POISON_MOFILES := $(patsubst po/%.po,po/build/poison-locale/%/LC_MESSAGES/git.mo,$(POFILES))
+
+po/build/poison-locale/%.po: po/%.po test-poisongen$X po/git.pot
+	$(QUIET_MSGFMT)mkdir -p $(dir $@) && \
+	./test-poisongen po/git.pot $@
+
+po/build/poison-locale/%/LC_MESSAGES/git.mo: po/build/poison-locale/%.po
+	$(QUIET_MSGFMT)mkdir -p $(dir $@) && $(MSGFMT) -o $@ $<
+
+poison-locale: $(POISON_MOFILES)
+endif
+
 FIND_SOURCE_FILES = ( git ls-files '*.[hcS]' 2>/dev/null || \
 			$(FIND) . \( -name .git -type d -prune \) \
 				-o \( -name '*.[hcS]' -type f -print \) )
@@ -2564,6 +2580,9 @@ test-svn-fe$X: vcs-svn/lib.a
 
 .PRECIOUS: $(TEST_OBJS)
 
+test-poisongen$X: test-poisongen.o GIT-LDFLAGS $(GITLIBS)
+	$(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $(filter %.o,$^) $(filter %.a,$^) $(LIBS) -lgettextpo
+
 test-%$X: test-%.o GIT-LDFLAGS $(GITLIBS)
 	$(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $(filter %.o,$^) $(filter %.a,$^) $(LIBS)
 
diff --git a/t/test-lib.sh b/t/test-lib.sh
old mode 100644
new mode 100755
index bb4f886..d4060e8
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -55,8 +55,10 @@ GIT_BUILD_DIR="$TEST_DIRECTORY"/..
 export PERL_PATH SHELL_PATH
 
 # For repeatability, reset the environment to known value.
-LANG=C
-LC_ALL=C
+if [ -z "$POISON_LOCALE" ]; then
+	LANG=C
+	LC_ALL=C
+fi
 PAGER=cat
 TZ=UTC
 TERM=dumb
@@ -92,6 +94,10 @@ export GIT_MERGE_VERBOSITY GIT_MERGE_AUTOEDIT
 export GIT_AUTHOR_EMAIL GIT_AUTHOR_NAME
 export GIT_COMMITTER_EMAIL GIT_COMMITTER_NAME
 export EDITOR
+if test -n "$POISON_LOCALE"; then
+	GIT_POISON_LOCALE=yes
+	export GIT_POISON_LOCALE
+fi
 
 # Protect ourselves from common misconfiguration to export
 # CDPATH into the environment
diff --git a/test-poisongen.c b/test-poisongen.c
new file mode 100644
index 0000000..5905aa9
--- /dev/null
+++ b/test-poisongen.c
@@ -0,0 +1,139 @@
+#include <gettext-po.h>
+#include "cache.h"
+#include "strbuf.h"
+
+static void xerror(int severity,
+		   po_message_t message,
+		   const char *filename, size_t lineno, size_t column,
+		   int multiline_p, const char *message_text)
+{
+	die("%s:%d:%d %s", filename, lineno, column, message_text);
+}
+
+static void xerror2(int severity,
+		    po_message_t message1,
+		    const char *filename1, size_t lineno1, size_t column1,
+		    int multiline_p1, const char *message_text1,
+		    po_message_t message2,
+		    const char *filename2, size_t lineno2, size_t column2,
+		    int multiline_p2, const char *message_text2)
+{
+	die("%s:%d:%d %s (%s:%d:%d %s)",
+	    filename1, lineno1, column1, message_text1,
+	    filename2, lineno2, column2, message_text2);
+}
+
+static void translate(const char *msg, struct strbuf *buf)
+{
+	const char *end = msg + strlen(msg);
+	const char *text = "* GETTEXT POISON *";
+	int text_len = strlen(text);
+	int t = 0;
+
+	strbuf_reset(buf);
+	/* preserve \n and printf format specifiers because msgfmt
+	   barfs otherwise. */
+	while (msg < end) {
+		/* printf specifiers and shell variables, it's a quite
+		   relax check */
+		if ((*msg == '%' || *msg == '$') && msg+1 < end) {
+			strbuf_addch(buf, *msg++);
+			do
+			       strbuf_addch(buf, *msg);
+			while (msg < end && !isspace(*msg++));
+		} else if (*msg == '\n') {
+			/* we only need to preserve trailing newlines, doing
+			   more does not really harm */
+			strbuf_addch(buf, '\n');
+			msg++;
+		} else {
+			strbuf_addch(buf, text[t]);
+			t = (t + 1) % text_len;
+			msg++;
+		}
+	}
+}
+
+int main(int argc, char **argv)
+{
+	const char *project_header = "Project-Id-Version: ";
+	const char *charset_header = "Content-Type: text/plain; charset=";
+	const char *plural_header = "Plural-Forms: nplurals=";
+	struct po_xerror_handler handler = { xerror, xerror2 };
+	po_file_t src;
+	const char *file = argv[1];
+	po_message_iterator_t iter;
+	po_message_t msg;
+	struct strbuf buf = STRBUF_INIT;
+
+	if (argc != 3)
+		die("usage: test-poisongen <pot file> <poison po file>");
+
+	src = po_file_read(file, &handler);
+	if (src == NULL)
+		die("could not open %s\n", file);
+
+	iter = po_message_iterator(src, "messages");
+	while ((msg = po_next_message(iter)) != NULL) {
+		/* msgid "" is the header, special handling */
+		if (po_message_msgid(msg)[0] == '\0') {
+			const char *p;
+
+			/* no fuzzy, msgfmt does not like it */
+			po_message_set_fuzzy(msg, 0);
+
+			strbuf_reset(&buf);
+			strbuf_addstr(&buf, po_message_msgstr(msg));
+
+			if (!prefixcmp(buf.buf, project_header) &&
+			    !prefixcmp(buf.buf + strlen(project_header),
+				       "PACKAGE VERSION\n")) {
+				strbuf_splice(&buf, strlen(project_header),
+					      strlen("PACKAGE VERSION"),
+					      "git poison",
+					      strlen("git poison"));
+			}
+
+			/* Content-Type: text/plain; charset=UTF-8 */
+			if ((p = strstr(buf.buf, charset_header)) != NULL &&
+			    !prefixcmp(p + strlen(charset_header), "CHARSET\n")) {
+				p += strlen(charset_header);
+				strbuf_splice(&buf, p - buf.buf, strlen("CHARSET"),
+					      "UTF-8", strlen("UTF-8"));
+			}
+
+			/* Plural-Forms: nplurals=2; plural=1 */
+			if ((p = strstr(buf.buf, plural_header)) != NULL &&
+			    !prefixcmp(p + strlen(plural_header), "INTEGER; plural=EXPRESSION")) {
+				int offset;
+				p += strlen(plural_header);
+				offset = p - buf.buf;
+				strbuf_splice(&buf, offset, strlen("INTEGER"), "2", 1);
+				offset += 1;
+				assert(!prefixcmp(buf.buf + offset, "; plural=EXPRESSION"));
+				offset += strlen("; plural=");
+				strbuf_splice(&buf, offset, strlen("EXPRESSION"), "1", 1);
+			}
+
+			po_message_set_msgstr(msg, buf.buf);
+			continue;
+		}
+		if (po_message_msgid_plural(msg)) {
+			int index = 0;
+
+			translate(po_message_msgid(msg), &buf);
+			po_message_set_msgstr_plural(msg, index++, buf.buf);
+
+			while (po_message_msgstr_plural(msg, index)) {
+				translate(po_message_msgid_plural(msg), &buf);
+				po_message_set_msgstr_plural(msg, index++, buf.buf);
+			}
+		} else {
+			translate(po_message_msgid(msg), &buf);
+			po_message_set_msgstr(msg, buf.buf);
+		}
+	}
+
+	po_file_write(src, argv[2], &handler);
+	return 0;
+}
diff --git a/wrap-for-bin.sh b/wrap-for-bin.sh
old mode 100644
new mode 100755
index 53a8dd0..99bc816
--- a/wrap-for-bin.sh
+++ b/wrap-for-bin.sh
@@ -15,7 +15,11 @@ else
 	export GIT_TEMPLATE_DIR
 fi
 GITPERLLIB='@@BUILD_DIR@@/perl/blib/lib'
-GIT_TEXTDOMAINDIR='@@BUILD_DIR@@/po/build/locale'
+if test -n "$GIT_POISON_LOCALE"; then
+	GIT_TEXTDOMAINDIR='@@BUILD_DIR@@/po/build/poison-locale'
+else
+	GIT_TEXTDOMAINDIR='@@BUILD_DIR@@/po/build/locale'
+fi
 PATH='@@BUILD_DIR@@/bin-wrappers:'"$PATH"
 export GIT_EXEC_PATH GITPERLLIB PATH GIT_TEXTDOMAINDIR
 
-- 
1.7.12.rc2

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] Support generate poison .mo files for testing
  2012-08-22  5:27     ` [PATCH] Support generate poison .mo files for testing Nguyễn Thái Ngọc Duy
@ 2012-08-22 11:13       ` Junio C Hamano
  2012-08-22 12:37         ` Nguyen Thai Ngoc Duy
  2012-08-22 16:43       ` Junio C Hamano
  1 sibling, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2012-08-22 11:13 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: git, Ævar Arnfjörð, Jonathan Nieder

Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:

> test-poisongen does a similar job to gettext poison feature except
> that it does it at build time. Gibberish .mo files are generated for
> all supported langauges and put in po/build/poison-locale. Target
> "poison-locale" is for this.

What is the significance of this locale being "Gibberish"?
Currently, for any string, we give "### gettext poison ###" or
something but the only thing we care about in the poison mode is
that it is different from the message id, no?  I was wondering if
these phony translations can be something simple like "Add QQ at the
beginning of the message id string" and still can catch mistakenly
marked messages that come from the plumbing layer, or something.

As you have already written a printf skipper that looks fairly
conservative, perhaps I shouldn't be worried too much about it, but
we seem to be spending considerable effort on the "poison", and it
makes me wonder (even though no better alternative comes to mind) if
we could do better.  The reason we do "poison" (be it the current
one or locale based one) in the first place is so that we want to
make sure messages from the plumbing are not marked for i18n, and we
do so by running our test under the "poison" mode that produces
output different from the in-code text that are marked for i18n,
which somehow feels quite a roundabout way of doing so.

> User can run the test with these .mo files by setting POISON_LOCALE
> while running the test suite. User must also set LANG/LC_* correctly
> (and the system is supposed to support that locale).

>  OK let me redo step one. test-poisongen requires libgettextpo. I'm
>  not sure if this library if gnu specific. We may need another flag
>  for it instead of NO_GETTEXT. We don't need a fake language code with
>  this approach.
		
OK.

>  Makefile         |  19 ++++++++
>  t/test-lib.sh    |  10 +++-
>  test-poisongen.c | 139 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  wrap-for-bin.sh  |   6 ++-
>  4 files changed, 171 insertions(+), 3 deletions(-)
>  mode change 100644 => 100755 t/test-lib.sh
>  create mode 100644 test-poisongen.c
>  mode change 100644 => 100755 wrap-for-bin.sh

Thanks.  I suspect two mode changes weren't intentional?

> +static void translate(const char *msg, struct strbuf *buf)
> +{
> +	const char *end = msg + strlen(msg);
> +	const char *text = "* GETTEXT POISON *";
> +	int text_len = strlen(text);
> +	int t = 0;
> +
> +	strbuf_reset(buf);
> +	/* preserve \n and printf format specifiers because msgfmt
> +	   barfs otherwise. */
> +	while (msg < end) {
> +		/* printf specifiers and shell variables, it's a quite
> +		   relax check */
> +		if ((*msg == '%' || *msg == '$') && msg+1 < end) {
> +			strbuf_addch(buf, *msg++);
> +			do
> +			       strbuf_addch(buf, *msg);
> +			while (msg < end && !isspace(*msg++));
> +		} else if (*msg == '\n') {
> +			/* we only need to preserve trailing newlines, doing
> +			   more does not really harm */
> +			strbuf_addch(buf, '\n');
> +			msg++;
> +		} else {
> +			strbuf_addch(buf, text[t]);
> +			t = (t + 1) % text_len;
> +			msg++;
> +		}
> +	}
> +}

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Support generate poison .mo files for testing
  2012-08-22 11:13       ` Junio C Hamano
@ 2012-08-22 12:37         ` Nguyen Thai Ngoc Duy
  2012-08-22 16:22           ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-08-22 12:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Ævar Arnfjörð, Jonathan Nieder

On Wed, Aug 22, 2012 at 6:13 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
>
>> test-poisongen does a similar job to gettext poison feature except
>> that it does it at build time. Gibberish .mo files are generated for
>> all supported langauges and put in po/build/poison-locale. Target
>> "poison-locale" is for this.
>
> What is the significance of this locale being "Gibberish"?
> Currently, for any string, we give "### gettext poison ###" or
> something but the only thing we care about in the poison mode is
> that it is different from the message id, no?  I was wondering if
> these phony translations can be something simple like "Add QQ at the
> beginning of the message id string" and still can catch mistakenly
> marked messages that come from the plumbing layer, or something.

I'm gradually getting there, partly thanks to your question about
grepping "tracked" in another thread. This patch does not really
generate random strings. It repeats the pattern "* gettext poison *"
for evey character that can be replaced. But a better way could be
replacing "tracked" with "t r a c k e d". We know the rule so we can
recreate the that string from "tracked" in test_i18n*. Or reverse the
upper/lower case, whichever is easier for the recreation by test_i18n*

>>  mode change 100644 => 100755 t/test-lib.sh
>>  create mode 100644 test-poisongen.c
>>  mode change 100644 => 100755 wrap-for-bin.sh
>
> Thanks.  I suspect two mode changes weren't intentional?

It's an emacs hook gone bad.
-- 
Duy

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Support generate poison .mo files for testing
  2012-08-22 12:37         ` Nguyen Thai Ngoc Duy
@ 2012-08-22 16:22           ` Junio C Hamano
  2012-08-23 10:53             ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2012-08-22 16:22 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: git, Ævar Arnfjörð, Jonathan Nieder

Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:

> But a better way could be
> replacing "tracked" with "t r a c k e d". We know the rule so we can
> recreate the that string from "tracked" in test_i18n*. Or reverse the
> upper/lower case, whichever is easier for the recreation by test_i18n*

That does not make much sense to me, so either one of us must be
slightly confused.  I thought the only purpose of testing with the
"poison" was to find messages that must not be localized but were
localized by mistake.  For that, we have to make sure that anything
that uses test_i18n* is reading from Porcelain, in other words, we
must use the byte-for-byte comparison without using test_i18n* when
verifying the plumbing output.  And the primary requirement for this
arrangement to work is that the expected output in C locale and the
actual ouptut in the synthetic poison locale are reliably different.
They do not have to be reversible (I was actually going to suggest
rot13 of the original instead of cycling through the "* gettext
poison *" in your patch --- prefixing with QQ would not work, as it
is likely that the test with "grep" may not be anchored at the left
end).

Teaching test_i18n* to fuzzily match the expected output in C locale
against the actual output in synthetic poison locale may or may not
be doable, but spending any cycle working on that sounds like a
total waste of time (even though it might be fun).  It does not test
that we are translating Porcelain messages correctly.

Am I missing something?  Puzzled...

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Support generate poison .mo files for testing
  2012-08-22  5:27     ` [PATCH] Support generate poison .mo files for testing Nguyễn Thái Ngọc Duy
  2012-08-22 11:13       ` Junio C Hamano
@ 2012-08-22 16:43       ` Junio C Hamano
  2012-08-23 11:00         ` Nguyen Thai Ngoc Duy
  1 sibling, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2012-08-22 16:43 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: git, Ævar Arnfjörð, Jonathan Nieder

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> +static void translate(const char *msg, struct strbuf *buf)
> +{
> +	const char *end = msg + strlen(msg);
> +	const char *text = "* GETTEXT POISON *";
> +	int text_len = strlen(text);
> +	int t = 0;
> +
> +	strbuf_reset(buf);
> +	/* preserve \n and printf format specifiers because msgfmt
> +	   barfs otherwise. */
> +	while (msg < end) {
> +		/* printf specifiers and shell variables, it's a quite
> +		   relax check */
> +		if ((*msg == '%' || *msg == '$') && msg+1 < end) {
> +			strbuf_addch(buf, *msg++);
> +			do
> +			       strbuf_addch(buf, *msg);
> +			while (msg < end && !isspace(*msg++));

Aside from the Style:

        do {
                ...
        } while ();

why are you special casing a run of non-blank letters that begin
with a dollar sign (swapping two ints is done with "%2$d %1$d", a
percent still at the beginning, so there must be something else I am
missing)?

Also why do you stop at isspace()?  Isn't a " " (space) a flag that
means "If the first character of a signed conversion is not a sign
or if a signed conversion results in no characters, a <space> shall
be prefixed to the result."

As the flags, min-width, precision, and length do not share the same
character as the conversion that has to come at the end, I think you
only want to do something like

	/*
         * conversion specifier characters, taken from:
         * http://pubs.opengroup.org/onlinepubs/9699919799/functions/printf.html
         */
        static const char printf_conversion[] = "diouxXfFeEgGaAcspnCS%";

        ...

        while (msg < end) {
                if (*msg == '%') {
                        strbuf_addch(buf, *msg++);
                        while (msg < end) {
                                int ch = *msg++;
                                strbuf_addch(buf, ch);
                                if (strchr(printf_conversion, ch))
                                        break;
                        }
                        /* copied the printf part literally */
                        continue;
		}
		... keep \n ...
                ... muck with string ...
	}

perhaps?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Support generate poison .mo files for testing
  2012-08-22 16:22           ` Junio C Hamano
@ 2012-08-23 10:53             ` Nguyen Thai Ngoc Duy
  0 siblings, 0 replies; 10+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-08-23 10:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Ævar Arnfjörð, Jonathan Nieder

On Wed, Aug 22, 2012 at 11:22 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:
>
>> But a better way could be
>> replacing "tracked" with "t r a c k e d". We know the rule so we can
>> recreate the that string from "tracked" in test_i18n*. Or reverse the
>> upper/lower case, whichever is easier for the recreation by test_i18n*
>
> That does not make much sense to me, so either one of us must be
> slightly confused.  I thought the only purpose of testing with the
> "poison" was to find messages that must not be localized but were
> localized by mistake.  For that, we have to make sure that anything
> that uses test_i18n* is reading from Porcelain, in other words, we
> must use the byte-for-byte comparison without using test_i18n* when
> verifying the plumbing output.  And the primary requirement for this
> arrangement to work is that the expected output in C locale and the
> actual ouptut in the synthetic poison locale are reliably different.
> They do not have to be reversible (I was actually going to suggest
> rot13 of the original instead of cycling through the "* gettext
> poison *" in your patch --- prefixing with QQ would not work, as it
> is likely that the test with "grep" may not be anchored at the left
> end).

Yes, for verifying plumbing output that should be enough.

> Teaching test_i18n* to fuzzily match the expected output in C locale
> against the actual output in synthetic poison locale may or may not
> be doable, but spending any cycle working on that sounds like a
> total waste of time (even though it might be fun).  It does not test
> that we are translating Porcelain messages correctly.

Right. I was aiming at testing translated messages but obviously went
off track trying to test a fake locale instead. Thanks for stopping
me.
-- 
Duy

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Support generate poison .mo files for testing
  2012-08-22 16:43       ` Junio C Hamano
@ 2012-08-23 11:00         ` Nguyen Thai Ngoc Duy
  0 siblings, 0 replies; 10+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-08-23 11:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Ævar Arnfjörð, Jonathan Nieder

On Wed, Aug 22, 2012 at 11:43 PM, Junio C Hamano <gitster@pobox.com> wrote:
> why are you special casing a run of non-blank letters that begin
> with a dollar sign (swapping two ints is done with "%2$d %1$d", a
> percent still at the beginning, so there must be something else I am
> missing)?

'$' is for shell variables. I should separate it from C messages. All
these because now that we run the (fake) translation through gettext
toolchain, it'll catch us if we don't preserve dynamic parts
correctly.

> Also why do you stop at isspace()?  Isn't a " " (space) a flag that
> means "If the first character of a signed conversion is not a sign
> or if a signed conversion results in no characters, a <space> shall
> be prefixed to the result."

A hurry attempt to get past msgfmt. I should refine these code, either by...

> As the flags, min-width, precision, and length do not share the same
> character as the conversion that has to come at the end, I think you
> only want to do something like
>
>         /*
>          * conversion specifier characters, taken from:
>          * http://pubs.opengroup.org/onlinepubs/9699919799/functions/printf.html
>          */
>         static const char printf_conversion[] = "diouxXfFeEgGaAcspnCS%";
>
>         ...
>
>         while (msg < end) {
>                 if (*msg == '%') {
>                         strbuf_addch(buf, *msg++);
>                         while (msg < end) {
>                                 int ch = *msg++;
>                                 strbuf_addch(buf, ch);
>                                 if (strchr(printf_conversion, ch))
>                                         break;
>                         }
>                         /* copied the printf part literally */
>                         continue;
>                 }
>                 ... keep \n ...
>                 ... muck with string ...
>         }
>
> perhaps?

following this, or copying the matching logic from msgfmt source.
-- 
Duy

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2012-08-23 11:01 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-08-21  4:39 [PATCH] Build in gettext poison feature unconditionally Nguyễn Thái Ngọc Duy
2012-08-21  5:24 ` Jonathan Nieder
2012-08-21 16:37   ` Junio C Hamano
2012-08-22  5:27     ` [PATCH] Support generate poison .mo files for testing Nguyễn Thái Ngọc Duy
2012-08-22 11:13       ` Junio C Hamano
2012-08-22 12:37         ` Nguyen Thai Ngoc Duy
2012-08-22 16:22           ` Junio C Hamano
2012-08-23 10:53             ` Nguyen Thai Ngoc Duy
2012-08-22 16:43       ` Junio C Hamano
2012-08-23 11:00         ` Nguyen Thai Ngoc Duy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).