git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] gettext.h: add parentheses around N_ expansion if supported
@ 2015-01-08  8:46 Kyle J. McKay
  2015-01-08 19:10 ` Junio C Hamano
  0 siblings, 1 reply; 4+ messages in thread
From: Kyle J. McKay @ 2015-01-08  8:46 UTC (permalink / raw)
  To: Git mailing list; +Cc: Junio C Hamano, Ramsay Jones

The N_ macro is used to mark strings for translation without
actually translating them.  At runtime the string is expected
to be passed to the gettext API for translation.

If two N_ macro invocations appear next to each other with only
whitespace (or nothing at all) between them, the two separate
strings will be marked for translation, but the preprocessor
will then combine the strings into one and at runtime the
string passed to gettext will not match the strings that were
translated.

Avoid this by adding parentheses around the expansion of the
N_ macro so that instead of ending up with two adjacent strings
that are then combined by the preprocessor, two adjacent strings
surrounded by parentheses result instead which causes a compile
error so the mistake can be quickly found and corrected.

However, since these string literals are typically assigned to
static variables and not all compilers support parenthesized
string literal assignments, only add the parentheses when the
compiler is known to support the syntax.

For now only __GNUC__ is tested which covers both gcc and clang
which should result in early detection of any adjacent N_ macros.

Although the necessary #ifdef makes the header less elegant,
the benefit of avoiding propagation of a translation-marking
error to all the translation teams thus creating extra work
for them when the error is eventually detected and fixed would
seem to outweigh the minor inelegance the #ifdef introduces.

Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
---
 gettext.h | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/gettext.h b/gettext.h
index 7671d09d..80ec29b5 100644
--- a/gettext.h
+++ b/gettext.h
@@ -62,7 +62,19 @@ const char *Q_(const char *msgid, const char *plu, unsigned long n)
 	return ngettext(msgid, plu, n);
 }
 
-/* Mark msgid for translation but do not translate it. */
+/* Mark msgid for translation but do not translate it.
+ *
+ * In order to prevent accidents where two adjacent N_ macros
+ * are mistakenly used, this macro is defined with parentheses
+ * when the compiler is known to support parenthesized string
+ * literal assignments.  This guarantees a compiler error in
+ * such a case rather than a silent conjoining of the strings
+ * by the preprocessor which results in translation failures.
+ */
+#ifdef __GNUC__
+#define N_(msgid) (msgid)
+#else
 #define N_(msgid) msgid
+#endif
 
 #endif
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] gettext.h: add parentheses around N_ expansion if supported
  2015-01-08  8:46 [PATCH v2] gettext.h: add parentheses around N_ expansion if supported Kyle J. McKay
@ 2015-01-08 19:10 ` Junio C Hamano
  2015-01-09  4:55   ` Kyle J. McKay
  0 siblings, 1 reply; 4+ messages in thread
From: Junio C Hamano @ 2015-01-08 19:10 UTC (permalink / raw)
  To: Kyle J. McKay; +Cc: Git mailing list, Ramsay Jones

"Kyle J. McKay" <mackyle@gmail.com> writes:

> For now only __GNUC__ is tested which covers both gcc and clang
> which should result in early detection of any adjacent N_ macros.

I didn't check the list of -W options, but if there were a way to
tell gcc to stick to the C standard in a more strict way than its
default, wouldn't this patch start causing trouble?

> Although the necessary #ifdef makes the header less elegant,
> the benefit of avoiding propagation of a translation-marking
> error to all the translation teams thus creating extra work
> for them when the error is eventually detected and fixed would
> seem to outweigh the minor inelegance the #ifdef introduces.
>
> Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
> ---
>  gettext.h | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/gettext.h b/gettext.h
> index 7671d09d..80ec29b5 100644
> --- a/gettext.h
> +++ b/gettext.h
> @@ -62,7 +62,19 @@ const char *Q_(const char *msgid, const char *plu, unsigned long n)
>  	return ngettext(msgid, plu, n);
>  }
>  
> -/* Mark msgid for translation but do not translate it. */
> +/* Mark msgid for translation but do not translate it.
> + *
> + * In order to prevent accidents where two adjacent N_ macros
> + * are mistakenly used, this macro is defined with parentheses
> + * when the compiler is known to support parenthesized string
> + * literal assignments.  This guarantees a compiler error in
> + * such a case rather than a silent conjoining of the strings
> + * by the preprocessor which results in translation failures.
> + */
> +#ifdef __GNUC__
> +#define N_(msgid) (msgid)
> +#else
>  #define N_(msgid) msgid
> +#endif
>  
>  #endif

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] gettext.h: add parentheses around N_ expansion if supported
  2015-01-08 19:10 ` Junio C Hamano
@ 2015-01-09  4:55   ` Kyle J. McKay
  2015-01-09 22:22     ` Junio C Hamano
  0 siblings, 1 reply; 4+ messages in thread
From: Kyle J. McKay @ 2015-01-09  4:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git mailing list, Ramsay Jones

On Jan 8, 2015, at 11:10, Junio C Hamano wrote:

> "Kyle J. McKay" <mackyle@gmail.com> writes:
>
>> For now only __GNUC__ is tested which covers both gcc and clang
>> which should result in early detection of any adjacent N_ macros.
>
> I didn't check the list of -W options, but if there were a way to
> tell gcc to stick to the C standard in a more strict way than its
> default, wouldn't this patch start causing trouble?

With this test program:

-----BEGIN TEST.C-----
#include <stdio.h>

#define msg1(x) x
#define msg2(x) (x)

static const char *const a1[] = {
	msg1("hi"),
	msg2("bye") /* line 8 */
};

static const char s1[] = msg1("hi");

static const char s2[] = msg2("bye"); /* line 13 */

int main()
{
	puts(a1[0]);
	puts(a1[1]);
	puts(s1);
	puts(s2);
	return 0;
}
-----END TEST.C-----

gcc, (but not clang) emits a warning (it still compiles just fine)  
when -pedantic is used:

   test.c:13: warning: array initialized from parenthesized string  
constant

However, none of -ansi, -Wall or -Wextra trigger that warning.

Neither does using -ansi -Wall -Wextra together cause a warning to be  
emitted (by either gcc or clang).

Note that line 8 never causes a problem (nor should it), only line 13  
is in question.

After a quick read-through of the -Wxxx options there does not appear  
to be a separate -Wxxx option to get that particular warning.

And compiling Git with -pedantic spits out a LOT of warnings (over  
7200) even before making the "(msgid)" change so I don't think there's  
an issue as apparently -pedantic is not normally used to compile Git.

Note that Git will not compile with gcc using -ansi (unless you add - 
Dinline=__inline__) and the change does not cause any new warnings to  
be emitted with -ansi (after adding the needed -Dinline=__inline__)  
since -pedantic is required for the "parenthesized string constant"  
warning.

I'm not super attached to this change, it's just that it seems to me  
that translation support for Git is a scarce resource.  I'm guessing  
that when considering the 7 complete translations (bg, ca, de, fr, sv,  
vi and zh_CN) the average number of translators per language is in the  
low single digits.  So I hate to see unnecessary translation churn,  
not when it can be so easily prevented.

-Kyle

>> Although the necessary #ifdef makes the header less elegant,
>> the benefit of avoiding propagation of a translation-marking
>> error to all the translation teams thus creating extra work
>> for them when the error is eventually detected and fixed would
>> seem to outweigh the minor inelegance the #ifdef introduces.
>>
>> Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
>> ---
>> gettext.h | 14 +++++++++++++-
>> 1 file changed, 13 insertions(+), 1 deletion(-)
>>
>> diff --git a/gettext.h b/gettext.h
>> index 7671d09d..80ec29b5 100644
>> --- a/gettext.h
>> +++ b/gettext.h
>> @@ -62,7 +62,19 @@ const char *Q_(const char *msgid, const char  
>> *plu, unsigned long n)
>> 	return ngettext(msgid, plu, n);
>> }
>>
>> -/* Mark msgid for translation but do not translate it. */
>> +/* Mark msgid for translation but do not translate it.
>> + *
>> + * In order to prevent accidents where two adjacent N_ macros
>> + * are mistakenly used, this macro is defined with parentheses
>> + * when the compiler is known to support parenthesized string
>> + * literal assignments.  This guarantees a compiler error in
>> + * such a case rather than a silent conjoining of the strings
>> + * by the preprocessor which results in translation failures.
>> + */
>> +#ifdef __GNUC__
>> +#define N_(msgid) (msgid)
>> +#else
>> #define N_(msgid) msgid
>> +#endif
>>
>> #endif

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] gettext.h: add parentheses around N_ expansion if supported
  2015-01-09  4:55   ` Kyle J. McKay
@ 2015-01-09 22:22     ` Junio C Hamano
  0 siblings, 0 replies; 4+ messages in thread
From: Junio C Hamano @ 2015-01-09 22:22 UTC (permalink / raw)
  To: Kyle J. McKay; +Cc: Git mailing list, Ramsay Jones

"Kyle J. McKay" <mackyle@gmail.com> writes:

> I'm not super attached to this change, it's just that it seems to me
> that translation support for Git is a scarce resource.  I'm guessing
> that when considering the 7 complete translations (bg, ca, de, fr, sv,
> vi and zh_CN) the average number of translators per language is in the
> low single digits.  So I hate to see unnecessary translation churn,
> not when it can be so easily prevented.

Yes, I share the same feeling and I agree that it is a worthy goal.

I just did not want an unconditional "#ifdef __GNUC__" that nobody
can override without editing the source, when __GNUC__ is a rough
approximation whether a specific language extension exists and is
enabled (we do not know which -W<option> or options like --pedantic
that will be added in the future would interfere with us).

What I had in mind instead was something along this line (but with a
better make variable name).  In an unconfigured build, it decides
between ("msg") and "msg" using the same __GNUC__ heuristic, but
that can be overridden from the $(MAKE) command line in a pinch.

I do not do autoconf, but it would be trivial to set CAN_USE_* by
try-compiling a small program.

 Makefile          | 17 ++++++++++++++++-
 gettext.h         | 24 ++++++++++++++++++++++++
 git-compat-util.h |  7 +++++++
 3 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 06e5d24..48f4ce2 100644
--- a/Makefile
+++ b/Makefile
@@ -343,6 +343,12 @@ all::
 # return NULL when it receives a bogus time_t.
 #
 # Define HAVE_CLOCK_GETTIME if your platform has clock_gettime in librt.
+#
+# Define CAN_USE_CONSTANT_STRING_IN_PARENTHESES to Yes if your compiler
+# happily compiles the following initialization:
+#     static const char s[] = ("FOO");
+# and define it to No if you need to remove the parentheses () around the
+# constant.
 
 GIT-VERSION-FILE: FORCE
 	@$(SHELL_PATH) ./GIT-VERSION-GEN
@@ -950,6 +956,16 @@ ifneq (,$(SOCKLEN_T))
 	BASIC_CFLAGS += -Dsocklen_t=$(SOCKLEN_T)
 endif
 
+ifeq (Yes,$(CAN_USE_CONSTANT_STRING_IN_PARENTHESES))
+	BASIC_CFLAGS += -DUSE_PARENS_AROUND_N=1
+else
+ifneq (,$(CAN_USE_CONSTANT_STRING_IN_PARENTHESES))
+	BASIC_CFLAGS += -DUSE_PARENS_AROUND_N=0
+else
+	BASIC_CFLAGS += -DUSE_PARENS_AROUND_N=-1
+endif
+endif
+
 ifeq ($(uname_S),Darwin)
 	ifndef NO_FINK
 		ifeq ($(shell test -d /sw/lib && echo y),y)
@@ -2486,4 +2502,3 @@ cover_db: coverage-report
 
 cover_db_html: cover_db
 	cover -report html -outputdir cover_db_html cover_db
-
diff --git a/gettext.h b/gettext.h
index 7671d09..9b54ded 100644
--- a/gettext.h
+++ b/gettext.h
@@ -63,6 +63,30 @@ const char *Q_(const char *msgid, const char *plu, unsigned long n)
 }
 
 /* Mark msgid for translation but do not translate it. */
+#if !USE_PARENS_AROUND_N
 #define N_(msgid) msgid
+#else
+/*
+ * Strictly speaking, this will lead to invalid C when
+ * used this way:
+ *     static const char s[] = N_("FOO");
+ * which will expand to
+ *     static const char s[] = ("FOO");
+ * and in valid C, the initializer on the right hand side must
+ * be without the parentheses.  But many compilers do accept it
+ * as a language extension and it will allow us to catch mistakes
+ * like:
+ *     static const char **msgs = {
+ *                N_("one")
+ *                N_("two"),
+ *                N_("three"),
+ *                NULL
+ *     }
+ * (notice the missing comma on one of the lines) by forcing
+ * an compilation error, because parenthesised ("one") ("two")
+ * will not get silently turned into ("onetwo").
+ */
+#define N_(msgid) (msgid)
+#endif
 
 #endif
diff --git a/git-compat-util.h b/git-compat-util.h
index dcecd85..8640163 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -867,4 +867,11 @@ struct tm *git_gmtime_r(const time_t *, struct tm *);
 #define gmtime_r git_gmtime_r
 #endif
 
+#if USE_PARENS_AROUND_N == -1
+#  ifdef __GNUC__
+#    undef USE_PARENS_AROUND_N
+#    define USE_PARENS_AROUND_N 1
+#  endif
+#endif
+
 #endif

^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2015-01-09 22:22 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-01-08  8:46 [PATCH v2] gettext.h: add parentheses around N_ expansion if supported Kyle J. McKay
2015-01-08 19:10 ` Junio C Hamano
2015-01-09  4:55   ` Kyle J. McKay
2015-01-09 22:22     ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).