* Re: [PATCH] parser: Fix multi-byte output in here-doc with quoted delimiter
From: Herbert Xu @ 2026-05-07 7:37 UTC
To: Patrick Steinhardt; +Cc: git, Eric Sunshine, DASH Mailing List
On Thu, May 07, 2026 at 01:57:30PM +0800, Herbert Xu wrote:
> On Thu, Apr 02, 2026 at 08:51:18AM +0200, Patrick Steinhardt wrote:
> > When executing our test suite with Dash v0.5.13.2 one can observe
> > several test failures that all have the same symptoms: we have a quoted
> > heredoc that contains multibyte characters, but the final data does not
> > match what we actually wanted to write. One such example is in t0300,
> > where we see diffs like the following:
> >
> > --- expect-stdout 2026-04-01 07:25:45.249919440 +0000
> > +++ stdout 2026-04-01 07:25:45.254919509 +0000
> > @@ -1,5 +1,5 @@
> > protocol=https
> > host=example.com
> > -path=perú.git
> > +path=perú.git
> > username=foo
> > password=bar
>
> Thanks for the report.
>
> This patch should fix the problem. Please let me know if there are
> any more outstanding issues.
Oops, I forgot to cc the mailing list. Sorry for the resend.
---8<---
For a here-document with a quoted delimiter, multi-byte characters
should be written out as is with no escaping. Fix this by checking
for syntax == SQSYNTAX (the only time readtoken1 gets called with
SQSYNTAX is for such a here-document) before calling getmbc in
readtoken1.
Reported-by: Patrick Steinhardt <ps@pks.im>
Fixes: b12f136cc704 ("builtin: Process multi-byte characters in read(1)")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
diff --git a/src/parser.c b/src/parser.c
index bea4148..412e876 100644
--- a/src/parser.c
+++ b/src/parser.c
@@ -998,9 +998,13 @@ static char *dollarsq_escape(char *out)
 STATIC int
 readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 {
-	struct synstack synbase = { .syntax = syntax };
+	struct synstack synbase = {
+		.dblquote = syntax == DQSYNTAX,
+		.syntax = syntax,
+	};
 	int chkeofmark = checkkwd & CHKEOFMARK;
 	struct synstack *synstack = &synbase;
+	bool sqheredoc = syntax == SQSYNTAX;
 	struct nodelist *bqlist = NULL;
 	int dollarsq = 0;
 	int c = firstc;
@@ -1009,9 +1013,6 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 	size_t len;
 	char *out;
 
-	if (syntax == DQSYNTAX)
-		synstack->dblquote = 1;
-
 	STARTSTACKSTR(out);
 	loop: {	/* for each line, until end of word */
 #if ATTY
@@ -1035,7 +1036,8 @@ readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 						      out);
 				fieldsplitting = synstack->syntax == BASESYNTAX &&
 						 !synstack->varnest ? 4 : 0;
-				ml = getmbc(c, out, fieldsplitting);
+				ml = getmbc(c, out, fieldsplitting |
+						    (sqheredoc ? 2 : 0));
 				if (ml == 1) {
 					if (out == stackblock())
 						return TBLANK;
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
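
The behaviour described in the commit message can be checked with a small script. This is a minimal sketch, assuming a POSIX sh with printf and od available and a script file saved as UTF-8. The helper names heredoc_bytes and expected_bytes are made up for illustration and appear in neither dash nor the Git test suite.

```shell
#!/bin/sh
# Sketch of a reproducer: a here-document with a quoted delimiter should
# emit multi-byte characters byte for byte, with nothing added.

# Bytes produced by the shell under test for a quoted here-document.
heredoc_bytes() {
	cat <<'EOF' | od -An -tx1
path=perú.git
EOF
}

# Reference bytes built from octal escapes, so no here-document is involved.
# In UTF-8, "ú" is the two bytes 0xC3 0xBA (octal 303 272).
expected_bytes() {
	printf 'path=per\303\272.git\n' | od -An -tx1
}

if [ "$(heredoc_bytes)" = "$(expected_bytes)" ]; then
	echo "ok: here-document bytes are unchanged"
else
	echo "FAIL: here-document bytes differ"
	echo "expected:$(expected_bytes)"
	echo "actual:  $(heredoc_bytes)"
fi
```

Running the script with the shell under test (for example as `dash ./check.sh`, where the file name is hypothetical) should print the ok line on a shell that handles quoted here-documents correctly. A build with the problem reported above would be expected to take the FAIL branch and show the extra bytes.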
* [PATCH] parser: Fix multi-byte output in here-doc with quoted delimiter
From: Herbert Xu @ 2026-05-07 7:45 UTC
To: DASH Mailing List
Resending again to dash mailing list with a fixed Subject line.