* dash breaks u0441
From: Alexey Gladkov @ 2011-02-16 22:25 UTC
To: dash
Greetings!
dash corrupts CYRILLIC SMALL LETTER ES (U+0441) in UTF-8 encoding:
$ /usr/bin/printf '[\u0441]\n'
[с]
$ /usr/bin/printf '[\u0441]\n' |dash -c 'read c; echo "$c"'
[Ñ]
But the neighboring characters are displayed correctly:
$ /usr/bin/printf '[\u0440]\n' |dash -c 'read c; echo "$c"'
[р]
$ /usr/bin/printf '[\u0442]\n' |dash -c 'read c; echo "$c"'
[т]
$ /usr/bin/printf '[\u0451]\n' |dash -c 'read c; echo "$c"'
[ё]
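A note on why only U+0441 is affected: its UTF-8 encoding is the byte pair 0xD1 0x81, and 0x81 is the byte value that the later messages in this thread identify as CTLESC. The neighbors U+0440, U+0442 and U+0451 encode to 0xD1 0x80, 0xD1 0x82 and 0xD1 0x91, none of which contains that byte. A minimal C sketch to check this follows; the names CTLESC_BYTE, utf8_encode and contains_ctlesc are made up for illustration and are not part of dash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 0x81 is the CTLESC byte value, as stated in this thread. */
#define CTLESC_BYTE 0x81

/* Encode a code point below U+10000 as UTF-8 and return the byte count.
 * Hypothetical helper; surrogates are not rejected in this sketch. */
size_t utf8_encode(uint32_t cp, unsigned char out[3])
{
	assert(cp < 0x10000);
	if (cp < 0x80) {
		out[0] = (unsigned char)cp;
		return 1;
	}
	if (cp < 0x800) {
		out[0] = (unsigned char)(0xC0 | (cp >> 6));
		out[1] = (unsigned char)(0x80 | (cp & 0x3F));
		return 2;
	}
	out[0] = (unsigned char)(0xE0 | (cp >> 12));
	out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
	out[2] = (unsigned char)(0x80 | (cp & 0x3F));
	return 3;
}

/* Return 1 if the UTF-8 encoding of cp contains the byte 0x81. */
int contains_ctlesc(uint32_t cp)
{
	unsigned char buf[3];
	size_t n = utf8_encode(cp, buf);

	for (size_t i = 0; i < n; i++)
		if (buf[i] == CTLESC_BYTE)
			return 1;
	return 0;
}
```

With these helpers, contains_ctlesc(0x0441) returns 1, while the calls for 0x0440, 0x0442 and 0x0451 return 0.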
0d7d66039b614b642c775432fd64aa8c11f9a64d was good.
55c46b7286f5d9f2d8291158203e2b61d2494420 is bad.
49a94e2bab1e4f601a9fbdf9615d9e4e0150e412 is bad too.
--
Rgrds, legion
* [PATCH] [BUILTIN] Fix corruption of reads with byte 0x81
From: Alexey Gladkov @ 2011-02-24 11:43 UTC
To: dash
Starting with commit 55c46b dash removes CTLESC bytes ('\x81')
from read sequence. This leads to breakage of some UTF8
characters. Like in commit f8231a, this change fixes corruption
by removing the faulty code.
Testcase:
$ /usr/bin/printf '[\u0441]\n'
[с]
$ /usr/bin/printf '[\u0441]\n' |dash -c 'read c; printf "%s\n" "$c"'
[Ñ]
Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com>
---
src/miscbltin.c | 2 --
1 files changed, 0 insertions(+), 2 deletions(-)
diff --git a/src/miscbltin.c b/src/miscbltin.c
index 653c92f..4e88e8d 100644
--- a/src/miscbltin.c
+++ b/src/miscbltin.c
@@ -112,14 +112,12 @@ readcmd_handle_line(char *line, char **ap, size_t len)
 			 * will not modify the length of the string */
 			offset = sl->text - s;
 			remainder = backup + offset;
-			rmescapes(remainder);
 			setvar(*ap, remainder, 0);
 			return;
 		}
 		/* set variable to field */
-		rmescapes(sl->text);
 		setvar(*ap, sl->text, 0);
 		sl = sl->next;
 	} while (*++ap);
--
1.7.3.5
* Re: [PATCH] [BUILTIN] Fix corruption of reads with byte 0x81
From: Herbert Xu @ 2011-03-10 13:01 UTC
To: Alexey Gladkov; +Cc: dash
On Thu, Feb 24, 2011 at 11:43:44AM +0000, Alexey Gladkov wrote:
> Starting with commit 55c46b dash removes CTLESC bytes ('\x81')
> from read sequence. This leads to breakage of some UTF8
> characters. Like in commit f8231a, this change fixes corruption
> by removing the faulty code.
Thanks for the diagnosis and patch!
Unfortunately we can't just delete the rmescapes call since we do
use CTLESC to represent backslash characters in the input stream
which prevents field splitting.
So the correct fix is to add extra CTLESCs wherever CTLESC appears
in the input. The following patch should fix the problem.
commit 54413164e587dd2dc5d7bce0bd3fab61d7ba758c
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date: Thu Mar 10 20:59:46 2011 +0800
[BUILTIN] Fix CTLESC clobbering by read(1)
The changeset 55c46b7286f5d9f2d8291158203e2b61d2494420
[BUILTIN] Honor tab as IFS whitespace when splitting fields in readcmd
uses CTLESC to prevent field splitting in read(1). However,
it did not escape CTLESC itself in the input stream. This patch
adds the necessary CTLESC characters so that CTLESC isn't corrupted.
Reported-by: Alexey Gladkov <gladkov.alexey@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
diff --git a/ChangeLog b/ChangeLog
index 173f057..6d02fa9 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,7 @@
+2011-03-10 Herbert Xu <herbert@gondor.apana.org.au>
+
+ * Fix CTLESC clobbering by read(1).
+
2011-03-10 Brian Koropoff <bkoropoff@gmail.com>
* Port to AIX.
diff --git a/src/miscbltin.c b/src/miscbltin.c
index 653c92f..800cbbb 100644
--- a/src/miscbltin.c
+++ b/src/miscbltin.c
@@ -178,7 +178,7 @@ readcmd(int argc, char **argv)
 		}
 		if (c == '\0')
 			continue;
-		if (backslash) {
+		if (backslash || c == CTLESC) {
 			if (c == '\n')
 				goto resetbs;
 			STPUTC(CTLESC, p);
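
The escaping idea in this patch can be sketched outside of dash as follows. This is an illustration under stated assumptions, not dash's code: ESC stands in for CTLESC (0x81), and escape_bytes and strip_escapes are hypothetical helpers. The latter only approximates what rmescapes does when it meets a CTLESC, namely dropping it and keeping the next byte verbatim.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: stand-in for dash's CTLESC byte (0x81 per this thread). */
#define ESC 0x81

/* Copy n bytes from in to out, prefixing every literal ESC byte with ESC.
 * out must have room for 2 * n bytes. Returns the escaped length. */
size_t escape_bytes(const unsigned char *in, size_t n, unsigned char *out)
{
	size_t o = 0;

	assert(in != NULL && out != NULL);
	for (size_t i = 0; i < n; i++) {
		if (in[i] == ESC)
			out[o++] = ESC;
		out[o++] = in[i];
	}
	return o;
}

/* In-place approximation of escape removal: each ESC is dropped and the
 * byte after it is kept as-is. A lone trailing ESC is kept. */
size_t strip_escapes(unsigned char *buf, size_t n)
{
	size_t o = 0;

	assert(buf != NULL);
	for (size_t i = 0; i < n; i++) {
		if (buf[i] == ESC && i + 1 < n)
			i++;	/* skip the escape, keep what follows */
		buf[o++] = buf[i];
	}
	return o;
}
```

Stripping the unescaped bytes '[' 0xD1 0x81 ']' loses the closing bracket's predecessor and yields '[' 0xD1 ']', which matches the corruption in the test case. Escaping first and stripping afterwards returns the original four bytes.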
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [PATCH] [BUILTIN] Fix corruption of reads with byte 0x81
From: Jilles Tjoelker @ 2011-03-11 0:04 UTC
To: Herbert Xu; +Cc: Alexey Gladkov, dash
On Thu, Mar 10, 2011 at 09:01:45PM +0800, Herbert Xu wrote:
> On Thu, Feb 24, 2011 at 11:43:44AM +0000, Alexey Gladkov wrote:
> > Starting with commit 55c46b dash removes CTLESC bytes ('\x81')
> > from read sequence. This leads to breakage of some UTF8
> > characters. Like in commit f8231a, this change fixes corruption
> > by removing the faulty code.
> Thanks for the diagnosis and patch!
> Unfortunately we can't just delete the rmescapes call since we do
> use CTLESC to represent backslash characters in the input stream
> which prevents field splitting.
> So the correct fix is to add extra CTLESCs wherever CTLESC appears
> in the input. The following patch should fix the problem.
That is not how ifsbreakup() works. As I have written in FreeBSD sh
expand.c:
/*
* Break the argument string into pieces based upon IFS and add the
* strings to the argument list. The regions of the string to be
* searched for IFS characters have been stored by recordregion.
* CTLESC characters are preserved but have little effect in this pass
* other than escaping CTL* characters. In particular, they do not escape
* IFS characters: that should be done with the ifsregion mechanism.
* CTLQUOTEMARK characters are used to preserve empty quoted strings.
* This pass treats them as a regular character, making the string non-empty.
* Later, they are removed along with the other CTL* characters.
*/
The ifsbreakup() function works the same way in dash. (One reason is
that this allows using the CTL* bytes in IFS, although it may not be
that useful because of the prevalence of UTF-8.)
So while this patch fixes corruption with byte 0x81, backslashes
continue to have no effect at all. Instead, all non-backslashed
characters should be marked with recordregion(), leaving CTLESC
prefixing for CTLESC only.
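
To illustrate the region mechanism described above, here is a rough C sketch of splitting that honors IFS bytes only inside recorded regions. It is not the FreeBSD sh or dash implementation: struct region, splittable and split_regions are hypothetical names, the regions are half-open byte ranges, and the sketch simply collapses runs of separators, ignoring the distinction between IFS whitespace and other IFS characters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical half-open byte range [begin, end) in which IFS applies. */
struct region {
	size_t begin;
	size_t end;
};

/* Return 1 if pos lies inside any of the nr recorded regions. */
int splittable(size_t pos, const struct region *r, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		if (pos >= r[i].begin && pos < r[i].end)
			return 1;
	return 0;
}

/* Split s in place: an IFS byte ends a field only when it sits inside a
 * region. Field start pointers go to fields[]; at most max are stored.
 * Returns the number of fields stored. */
size_t split_regions(char *s, const char *ifs, const struct region *r,
		     size_t nr, char **fields, size_t max)
{
	size_t len = strlen(s);
	size_t count = 0;
	int in_field = 0;

	assert(ifs != NULL && fields != NULL);
	for (size_t i = 0; i < len; i++) {
		if (strchr(ifs, s[i]) && splittable(i, r, nr)) {
			s[i] = '\0';	/* separator inside a region */
			in_field = 0;
		} else if (!in_field) {
			if (count < max)
				fields[count++] = &s[i];
			in_field = 1;
		}
	}
	return count;
}
```

For the text "a b c" with regions [0,1) and [2,5), the first space lies outside every region and is kept, so the result is the two fields "a b" and "c". With a single region [0,5) the same text splits into three fields.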
Apart from that, there is corruption with byte 0x88, CTLQUOTEMARK. I
think that can be fixed in the same way by prefixing with CTLESC.
By the way, in the data pointed to by NARG nodes, dash does use CTLESC
for backslashed characters that should not be IFS splitting points,
which is only relevant for WORD in ${VAR+WORD} and ${VAR-WORD}. A
downside of this is that quoted and unquoted CTL* bytes cannot be
distinguished; therefore I have solved this differently in FreeBSD.
--
Jilles Tjoelker
* Re: [PATCH] [BUILTIN] Fix corruption of reads with byte 0x81
From: Herbert Xu @ 2011-03-11 3:40 UTC
To: Jilles Tjoelker; +Cc: Alexey Gladkov, dash
On Fri, Mar 11, 2011 at 01:04:25AM +0100, Jilles Tjoelker wrote:
>
> That is not how ifsbreakup() works. As I have written in FreeBSD sh
> expand.c:
Thanks for catching this. The following patch should fix it.
> Apart from that, there is corruption with byte 0x88, CTLQUOTEMARK. I
> think that can be fixed in the same way by prefixing with CTLESC.
And this too.
> By the way, in the data pointed to by NARG nodes, dash does use CTLESC
> for backslashed characters that should not be IFS splitting points,
> which is only relevant for WORD in ${VAR+WORD} and ${VAR-WORD}. A
> downside of this is that quoted and unquoted CTL* bytes cannot be
> distinguished; therefore I have solved this differently in FreeBSD.
Good point. Please do let us know how you think we should fix this.
commit 6e1c8399e82c015f4e9d7d67e98d70541a3ef2d0
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date: Fri Mar 11 11:07:42 2011 +0800
[BUILTIN] Fix backslash handling in read(1)
The new read(1) implementation incorrectly assumes that ifsbreakup
ignores characters escaped by CTLESC. As such it fails to handle
backslashes except for escaping newlines.
This patch makes it use recordregion for every part that isn't
escaped by a backslash.
Reported-by: Jilles Tjoelker <jilles@stack.nl>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
diff --git a/ChangeLog b/ChangeLog
index 8a832bb..e96bdc4 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,7 @@
+2011-03-11 Herbert Xu <herbert@gondor.apana.org.au>
+
+ * Fix backslash handling in read(1).
+
2011-03-10 Jonathan Nieder <jrnieder@gmail.com>
* Dotcmd should exit with zero when doing nothing.
diff --git a/src/expand.c b/src/expand.c
index 6bee5c5..7a9b157 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -1597,7 +1597,6 @@ char *
 _rmescapes(char *str, int flag)
 {
 	char *p, *q, *r;
-	static const char qchars[] = { CTLESC, CTLQUOTEMARK, 0 };
 	unsigned inquotes;
 	int notescaped;
 	int globbing;
diff --git a/src/miscbltin.c b/src/miscbltin.c
index 800cbbb..f507381 100644
--- a/src/miscbltin.c
+++ b/src/miscbltin.c
@@ -71,21 +71,22 @@
  * @param len length of line including trailing '\0'
  */
 static void
-readcmd_handle_line(char *line, char **ap, size_t len)
+readcmd_handle_line(char *s, char **ap)
 {
 	struct arglist arglist;
 	struct strlist *sl;
-	char *s, *backup;
+	char *backup;
+	char *line;
 	/* ifsbreakup will fiddle with stack region... */
-	s = grabstackstr(line + len);
+	line = stackblock();
+	s = grabstackstr(s);
 	/* need a copy, so that delimiters aren't lost
 	 * in case there are more fields than variables */
 	backup = sstrdup(line);
 	arglist.lastp = &arglist.list;
-	recordregion(0, len - 1, 0);
 	ifsbreakup(s, &arglist);
 	*arglist.lastp = NULL;
@@ -137,11 +138,12 @@ int
 readcmd(int argc, char **argv)
 {
 	char **ap;
-	int backslash;
 	char c;
 	int rflag;
 	char *prompt;
 	char *p;
+	int startloc;
+	int newloc;
 	int status;
 	int i;
@@ -161,9 +163,12 @@ readcmd(int argc, char **argv)
 	}
 	if (*(ap = argptr) == NULL)
 		sh_error("arg count");
+
 	status = 0;
-	backslash = 0;
 	STARTSTACKSTR(p);
+
+	goto start;
+
 	for (;;) {
 		switch (read(0, &c, 1)) {
 		case 1:
@@ -178,26 +183,35 @@ readcmd(int argc, char **argv)
 		}
 		if (c == '\0')
 			continue;
-		if (backslash || c == CTLESC) {
+		if (newloc >= startloc) {
 			if (c == '\n')
 				goto resetbs;
-			STPUTC(CTLESC, p);
 			goto put;
 		}
 		if (!rflag && c == '\\') {
-			backslash++;
+			newloc = p - (char *)stackblock();
 			continue;
 		}
 		if (c == '\n')
 			break;
 put:
-		STPUTC(c, p);
+		CHECKSTRSPACE(2, p);
+		if (strchr(qchars, c))
+			USTPUTC(CTLESC, p);
+		USTPUTC(c, p);
+
+		if (newloc >= startloc) {
 resetbs:
-		backslash = 0;
+			recordregion(startloc, newloc, 0);
+start:
+			startloc = p - (char *)stackblock();
+			newloc = startloc - 1;
+		}
 	}
 out:
+	recordregion(startloc, p - (char *)stackblock(), 0);
 	STACKSTRNUL(p);
-	readcmd_handle_line(stackblock(), ap, p + 1 - (char *)stackblock());
+	readcmd_handle_line(p + 1, ap);
 	return status;
 }
diff --git a/src/mystring.c b/src/mystring.c
index ce48c82..bbb6b77 100644
--- a/src/mystring.c
+++ b/src/mystring.c
@@ -62,6 +62,7 @@ const char spcstr[] = " ";
const char snlfmt[] = "%s\n";
const char dolatstr[] = { CTLQUOTEMARK, CTLVAR, VSNORMAL, '@', '=',
CTLQUOTEMARK, '\0' };
+const char qchars[] = { CTLESC, CTLQUOTEMARK, 0 };
const char illnum[] = "Illegal number: %s";
const char homestr[] = "HOME";
diff --git a/src/mystring.h b/src/mystring.h
index 2e0540a..3522523 100644
--- a/src/mystring.h
+++ b/src/mystring.h
@@ -41,6 +41,7 @@ extern const char snlfmt[];
extern const char spcstr[];
extern const char dolatstr[];
#define DOLATSTRLEN 6
+extern const char qchars[];
extern const char illnum[];
extern const char homestr[];
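
The region bookkeeping that this patch adds to readcmd can be illustrated in isolation. The sketch below is a simplification under stated assumptions, not the dash code: struct span and unescape_line are hypothetical, the input is a complete NUL-terminated line rather than bytes read one at a time, and backslash-newline continuation and CTLESC prefixing are left out. It drops each backslash, keeps the following byte literally, and records half-open spans of the output that were not backslash-escaped, which are the parts a splitter may break on IFS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical half-open range [begin, end) of unescaped output bytes. */
struct span {
	size_t begin;
	size_t end;
};

/* Copy line to out without its escaping backslashes and record which
 * output ranges were not escaped. out must hold strlen(line) + 1 bytes.
 * Stores the output length in *outlen and returns the number of spans
 * recorded (at most max_spans). */
size_t unescape_line(const char *line, char *out, size_t *outlen,
		     struct span *spans, size_t max_spans)
{
	size_t o = 0;		/* next output position */
	size_t start = 0;	/* start of the current unescaped span */
	size_t n = 0;		/* spans recorded so far */

	assert(line != NULL && out != NULL && outlen != NULL && spans != NULL);
	for (size_t i = 0; line[i] != '\0'; i++) {
		if (line[i] == '\\' && line[i + 1] != '\0') {
			/* close the span of ordinary bytes seen so far */
			if (o > start && n < max_spans)
				spans[n++] = (struct span){ start, o };
			out[o++] = line[++i];	/* escaped byte, outside any span */
			start = o;
			continue;
		}
		out[o++] = line[i];
	}
	if (o > start && n < max_spans)
		spans[n++] = (struct span){ start, o };
	out[o] = '\0';
	*outlen = o;
	return n;
}
```

For the input line `a\ b c` this produces the text "a b c" with spans [0,1) and [2,5), so the escaped space at offset 1 is not a candidate for field splitting.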
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt