DASH Shell discussions
* [PATCH] fix UTF-8 issues in read() builtin
@ 2010-09-07 21:26 Alexey Zinovyev
  2010-09-07 22:57 ` Jilles Tjoelker
  0 siblings, 1 reply; 2+ messages in thread
From: Alexey Zinovyev @ 2010-09-07 21:26 UTC
  To: dash

[-- Attachment #1: Type: text/plain, Size: 199 bytes --]

Hello, I think there is a bug in the read builtin.

$ cat test
echo 'ρ'|while read i; do echo $i; done
$ dash test

$ bash test
ρ

The same happens with some Japanese characters.
It looks like dash strips the 0x81 byte.

[-- Attachment #2: dash-read-fix.patch --]
[-- Type: text/plain, Size: 378 bytes --]

diff --git a/src/miscbltin.c b/src/miscbltin.c
index 5ab1648..f8c5655 100644
--- a/src/miscbltin.c
+++ b/src/miscbltin.c
@@ -101,7 +101,6 @@ readcmd_handle_line(char *line, char **ap, size_t len)
 			 * will not modify the length of the string */
 			offset = sl->text - s;
 			remainder = backup + offset;
-			rmescapes(remainder);
 			setvar(*ap, remainder, 0);
 
 			return;


* Re: [PATCH] fix UTF-8 issues in read() builtin
  2010-09-07 21:26 [PATCH] fix UTF-8 issues in read() builtin Alexey Zinovyev
@ 2010-09-07 22:57 ` Jilles Tjoelker
  0 siblings, 0 replies; 2+ messages in thread
From: Jilles Tjoelker @ 2010-09-07 22:57 UTC
  To: Alexey Zinovyev; +Cc: dash

On Wed, Sep 08, 2010 at 01:26:15AM +0400, Alexey Zinovyev wrote:
> Hello, I think there is a bug in read() builtin.

> $ cat test
> echo 'ρ'|while read i; do echo $i; done
> $ dash test

> $ bash test
> ρ

> The same happens with some Japanese characters.
> It looks like dash strips the 0x81 byte.

0x81 == CTLESC, the escape character in dash's internal representation.

> diff --git a/src/miscbltin.c b/src/miscbltin.c
> index 5ab1648..f8c5655 100644
> --- a/src/miscbltin.c
> +++ b/src/miscbltin.c
> @@ -101,7 +101,6 @@ readcmd_handle_line(char *line, char **ap, size_t len)
>  			 * will not modify the length of the string */
>  			offset = sl->text - s;
>  			remainder = backup + offset;
> -			rmescapes(remainder);
>  			setvar(*ap, remainder, 0);
>  
>  			return;

This patch is not correct, as it will leave 0x81 bytes in the variable for
backslash escapes. That is probably a bit worse than ignoring the
backslashes entirely, which is what the code does now. The read code
attempts to "escape" the next character by placing a CTLESC before it, but
CTLESC does not and should not escape IFS characters for ifsbreakup(); the
recordregion() mechanism should be used for that.

(For the intermediate representation generated by parser.c, CTLESC does
escape IFS characters. This is not ideal as it prevents IFS splitting
with CTL* bytes in word in ${var+-word}.)

The patch I posted separately fixes the handling of 0x81 and various
other issues with read (by using separate code instead of trying to use
expand.c). Backslash escaping works too although I have just found some
bugs with corner cases.

-- 
Jilles Tjoelker
