From: Jilles Tjoelker
Subject: Re: [PATCH] fix UTF-8 issues in read() builtin
Date: Wed, 8 Sep 2010 00:57:33 +0200
Message-ID: <20100907225733.GB18839@stack.nl>
References: <20100907212615.GA28796@3arch>
In-Reply-To: <20100907212615.GA28796@3arch>
List-Id: dash@vger.kernel.org
To: Alexey Zinovyev
Cc: dash@vger.kernel.org

On Wed, Sep 08, 2010 at 01:26:15AM +0400, Alexey Zinovyev wrote:
> Hello, I think there is a bug in read() builtin.

> $ cat test
> echo 'ρ'|while read i; do echo $i; done
> $ dash test
> $ bash test
> ρ

> Same with some japanese symbols.
> Looks like dash strips 0x81 byte.

0x81 == CTLESC, the escape character in dash's internal representation.

> diff --git a/src/miscbltin.c b/src/miscbltin.c
> index 5ab1648..f8c5655 100644
> --- a/src/miscbltin.c
> +++ b/src/miscbltin.c
> @@ -101,7 +101,6 @@ readcmd_handle_line(char *line, char **ap, size_t len)
>  		 * will not modify the length of the string */
>  		offset = sl->text - s;
>  		remainder = backup + offset;
> -		rmescapes(remainder);
>  		setvar(*ap, remainder, 0);
>  
>  		return;

This patch is not correct as it will leave 0x81 bytes for backslash
escapes.

That is probably a bit worse than ignoring the backslashes entirely,
which is what it does now. It attempts to "escape" the next character by
placing a CTLESC, but CTLESC does not and should not escape IFS
characters for ifsbreakup(); the recordregion() mechanism should be used
for that. (For the intermediate representation generated by parser.c,
CTLESC does escape IFS characters. This is not ideal as it prevents IFS
splitting with CTL* bytes in word in ${var+-word}.)

The patch I posted separately fixes the handling of 0x81 and various
other issues with read (by using separate code instead of trying to use
expand.c). Backslash escaping works too although I have just found some
bugs with corner cases.

-- 
Jilles Tjoelker