DASH Shell discussions
* Re: [PATCH] parser: Fix multi-byte output in here-doc with quoted delimiter
       [not found]   ` <afwpyiK9mh23c-JV@gondor.apana.org.au>
@ 2026-05-07  7:37     ` Herbert Xu
  2026-05-07  7:45     ` Herbert Xu
  1 sibling, 0 replies; 2+ messages in thread
From: Herbert Xu @ 2026-05-07  7:37 UTC
  To: Patrick Steinhardt; +Cc: git, Eric Sunshine, DASH Mailing List

On Thu, May 07, 2026 at 01:57:30PM +0800, Herbert Xu wrote:
> On Thu, Apr 02, 2026 at 08:51:18AM +0200, Patrick Steinhardt wrote:
> > When executing our test suite with Dash v0.5.13.2 one can observe
> > several test failures that all have the same symptoms: we have a quoted
> > heredoc that contains multibyte characters, but the final data does not
> > match what we actually wanted to write. One such example is in t0300,
> > where we see diffs like the following:
> > 
> >   --- expect-stdout	2026-04-01 07:25:45.249919440 +0000
> >   +++ stdout	2026-04-01 07:25:45.254919509 +0000
> >   @@ -1,5 +1,5 @@
> >    protocol=https
> >    host=example.com
> >   -path=perú.git
> >   +path=perú.git
> >    username=foo
> >    password=bar
> 
> Thanks for the report.
> 
> This patch should fix the problem.  Please let me know if there are
> any more outstanding issues.

Oops, I forgot to cc the mailing list.  Sorry for the resend.
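
For anyone who wants to check the behaviour locally, here is a minimal
reproducer sketch.  It is not the actual t0300 test case: the function
name and the temporary file names are made up for illustration.  It
writes a here-document with a quoted delimiter to a file and compares
the result against the same bytes produced by printf, where "ú" is
spelled out as its UTF-8 encoding (octal 303 272).

```shell
#!/bin/sh
# Hypothetical reproducer; not the actual t0300 test case.
check_quoted_heredoc () {
	tmp=$(mktemp -d) || return 2

	# Quoted delimiter: the body is taken literally, so the
	# multi-byte "ú" should reach the file unchanged.
	cat >"$tmp/actual" <<'EOF'
path=perú.git
EOF

	# Expected bytes built with printf octal escapes, so that no
	# here-document is involved on this side of the comparison.
	printf 'path=per\303\272.git\n' >"$tmp/expect"

	cmp -s "$tmp/expect" "$tmp/actual"
	ret=$?
	rm -rf "$tmp"
	return $ret
}

if check_quoted_heredoc
then
	echo "ok"
else
	echo "FAIL"
fi
```

Running the script with the shell under test should report whether the
here-document bytes came through unmodified.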

---8<---
For a here-document with a quoted delimiter, multi-byte characters
should be written out as is with no escaping.  Fix this by checking
for syntax == SQSYNTAX (the only time readtoken1 gets called with
SQSYNTAX is for such a here-document) before calling getmbc in
readtoken1.

Reported-by: Patrick Steinhardt <ps@pks.im>
Fixes: b12f136cc704 ("builtin: Process multi-byte characters in read(1)")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/src/parser.c b/src/parser.c
index bea4148..412e876 100644
--- a/src/parser.c
+++ b/src/parser.c
@@ -998,9 +998,13 @@ static char *dollarsq_escape(char *out)
 STATIC int
 readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 {
-	struct synstack synbase = { .syntax = syntax };
+	struct synstack synbase = {
+		.dblquote = syntax == DQSYNTAX,
+		.syntax = syntax,
+	};
 	int chkeofmark = checkkwd & CHKEOFMARK;
 	struct synstack *synstack = &synbase;
+	bool sqheredoc = syntax == SQSYNTAX;
 	struct nodelist *bqlist = NULL;
 	int dollarsq = 0;
 	int c = firstc;
@@ -1009,9 +1013,6 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 	size_t len;
 	char *out;
 
-	if (syntax == DQSYNTAX)
-		synstack->dblquote = 1;
-
 	STARTSTACKSTR(out);
 	loop: {	/* for each line, until end of word */
 #if ATTY
@@ -1035,7 +1036,8 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 				      out);
 			fieldsplitting = synstack->syntax == BASESYNTAX &&
 					 !synstack->varnest ? 4 : 0;
-			ml = getmbc(c, out, fieldsplitting);
+			ml = getmbc(c, out, fieldsplitting |
+					    (sqheredoc ? 2 : 0));
 			if (ml == 1) {
 				if (out == stackblock())
 					return TBLANK;
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* [PATCH] parser: Fix multi-byte output in here-doc with quoted delimiter
       [not found]   ` <afwpyiK9mh23c-JV@gondor.apana.org.au>
  2026-05-07  7:37     ` [PATCH] parser: Fix multi-byte output in here-doc with quoted delimiter Herbert Xu
@ 2026-05-07  7:45     ` Herbert Xu
  1 sibling, 0 replies; 2+ messages in thread
From: Herbert Xu @ 2026-05-07  7:45 UTC
  To: DASH Mailing List

Resending again to dash mailing list with a fixed Subject line.


Thread overview: 2+ messages
     [not found] <20260402-pks-tests-with-dash-v2-0-cd7ab11dabc0@pks.im>
     [not found] ` <20260402-pks-tests-with-dash-v2-1-cd7ab11dabc0@pks.im>
     [not found]   ` <afwpyiK9mh23c-JV@gondor.apana.org.au>
2026-05-07  7:37     ` [PATCH] parser: Fix multi-byte output in here-doc with quoted delimiter Herbert Xu
2026-05-07  7:45     ` Herbert Xu
