From mboxrd@z Thu Jan 1 00:00:00 1970 From: David Gibson Subject: Re: [PATCH] dtc: Return error when reading non-ascii chars Date: Tue, 23 Mar 2021 14:32:37 +1100 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="wPZncfvVR4uFqXhq" Return-path: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=gibson.dropbear.id.au; s=201602; t=1616470364; bh=OrRvwrvUv4hCv+eqkmHVs0RHgBJhrd2b7MdtrOk06d8=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=hLVgxW0f8JnAKHaRFUzp7x2KzP2zQIt3Sr+Xm/PHxsJ6AYQR3Pe8zjZ1o8OyoME++ LzzlRWjUL2v9ALi+nr6F1m4R1qbv9FcZ5JrhhcKFUHayV8QXdm2oy9HN2dpryc6S3d KOHarSrcUPgoal8rzXI6YkrQ+Njw+1O0QG6TsFqE= Content-Disposition: inline In-Reply-To: List-ID: To: montytyper-uAjRD0nVeow@public.gmane.org Cc: loeliger-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, devicetree-compiler-u79uwXL29TY76Z2rM5mHXA@public.gmane.org --wPZncfvVR4uFqXhq Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Sat, Mar 13, 2021 at 08:40:54PM -0800, montytyper-uAjRD0nVeow@public.gmane.org wrote: > From: Jordan Montgomery >=20 > The underlying lexer code generated by flex checks > if(yychar <=3D YYEOF) to detect EOF. This returns true > for any char >=3D 0x80, resulting in premature EOF and > causing dtc to generate an invalid FDT. >=20 > This patch adds a rule to detect these illegal > characters and throw a fatal error instead, as well as > a test case which passes if dtc errors fatally and > fails otherwise. >=20 > Parsing of valid devicetrees should be unaffected by > this change. >=20 > Signed-off-by: Jordan Montgomery This doesn't seem right to me. AIUI, flex although not Unicode-aware, should be 8-bit clean. If I attempt to build your example file here without your lexer change, I get just what I'd expect: $ ../dtc -I dts -O dtb -o /dev/null bad-unicode.dts=20 Error: bad-unicode.dts:4.30-31 syntax error FATAL ERROR: Unable to parse input tree A fatal parse error, because it found a bogus character there, but no different than if it had been an ASCII bogus character. > --- > dtc-lexer.l | 6 ++++++ > tests/bad-unicode.dts | 6 ++++++ > tests/dtc-fatal.sh | 11 +++++++++-- > tests/run_tests.sh | 1 + > 4 files changed, 22 insertions(+), 2 deletions(-) > mode change 100644 =3D> 100755 dtc-lexer.l > create mode 100644 tests/bad-unicode.dts >=20 > diff --git a/dtc-lexer.l b/dtc-lexer.l > old mode 100644 > new mode 100755 > index b3b7270..668875b > --- a/dtc-lexer.l > +++ b/dtc-lexer.l > @@ -17,6 +17,7 @@ CHAR_LITERAL '([^']|\\')*' > WS [[:space:]] > COMMENT "/*"([^*]|\*+[^*/])*\*+"/" > LINECOMMENT "//".*\n > +NONASCII [\x80-\xff] > =20 > %{ > #include "dtc.h" > @@ -245,6 +246,11 @@ static void PRINTF(1, 2) lexical_error(const char *f= mt, ...); > <*>"&&" { return DT_AND; }; > <*>"||" { return DT_OR; }; > =20 > +<*>{NONASCII} { > + DPRINT("Illegal character x%X\n", (unsigned)yytext[0]); > + YY_FATAL_ERROR("Unable to parse non-ASCII character"); > + }; > + > <*>. { > DPRINT("Char: %c (\\x%02x)\n", yytext[0], > (unsigned)yytext[0]); > diff --git a/tests/bad-unicode.dts b/tests/bad-unicode.dts > new file mode 100644 > index 0000000..19b3d2f > --- /dev/null > +++ b/tests/bad-unicode.dts > @@ -0,0 +1,6 @@ > +/dts-v1/; > + > +/ { > + prop-str =3D "hello world";=E2=8C=98 > + prop-str2 =3D "hello world"; > +}; > diff --git a/tests/dtc-fatal.sh b/tests/dtc-fatal.sh > index 08a4d29..56cdc1c 100755 > --- a/tests/dtc-fatal.sh > +++ b/tests/dtc-fatal.sh > @@ -3,13 +3,20 @@ > SRCDIR=3D`dirname "$0"` > . "$SRCDIR/testutils.sh" > =20 > +if [ "$1" =3D "-r" ]; then > + expected=3D"$2" > + shift 2 > +else > + expected=3D"1" > +fi > + > verbose_run $VALGRIND "$DTC" -o/dev/null "$@" > ret=3D"$?" > =20 > if [ "$ret" -gt 127 ]; then > FAIL "dtc killed by signal (ret=3D$ret)" > -elif [ "$ret" !=3D "1" ]; then > - FAIL "dtc returned incorrect status $ret instead of 1" > +elif [ "$ret" !=3D "$expected" ]; then > + FAIL "dtc returned incorrect status $ret instead of $expected" > fi > =20 > PASS > diff --git a/tests/run_tests.sh b/tests/run_tests.sh > index 4b8dada..c9bc095 100755 > --- a/tests/run_tests.sh > +++ b/tests/run_tests.sh > @@ -764,6 +764,7 @@ dtc_tests () { > run_sh_test "$SRCDIR/dtc-fatal.sh" -I dts -O dtb nosuchfile.dts > run_sh_test "$SRCDIR/dtc-fatal.sh" -I dtb -O dtb nosuchfile.dtb > run_sh_test "$SRCDIR/dtc-fatal.sh" -I fs -O dtb nosuchfile > + run_sh_test "$SRCDIR/dtc-fatal.sh" -r 2 -I dts -O dtb bad-unicode.dts > =20 > # Dependencies > run_dtc_test -I dts -O dtb -o dependencies.test.dtb -d dependencies.= test.d "$SRCDIR/dependencies.dts" --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --wPZncfvVR4uFqXhq Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAmBZYVUACgkQbDjKyiDZ s5In5w//d3U9/elb8a0z+4nobcZ1MyIvvQqLSAPiR/4O0vn+53+nh9tfB3HZ5Gcn 8tUz/IvNQm3mz8n6U1MaAwpov69nCLS14KGXvAPlVuPsgGyjLX9D+c3r2hS1Sc8c 2hffPmMyTOfzf0qWtQweO6bz4Hxu1skB70Kjok5Oe7iNvWgc5HEdTEsfpgxW3zZ6 sMLfc9L/zBniGW/3KswHNWfVJurSAyQzxJ/crPX5AIjf/sRU1vDqBkx3Xlpk0VLf l9ztkOW7XSh7MPQ6KHwbL9ywtaRT5xJzvgKZlFwGQ6DBOQYsGR/He2w9QxnqYV7v uzCCwowerF8xarHfQCNB/CMllE+Ul6OTwb+yUqDyPI3pAaj+4HAOZlMa9NEbtazB ZUv+uenoUHbbfq6JXhHKxlyGAh7LEe7yfUiV27nAAYTBonK0aUak8iihVLjqsBVK 93Ru0kwsEjOEJZGQ8PHQWiLIp67ft2szTWnHCuctSkLgoe3Qob/Dhz8r97w5+qUy vjnNphroPDdy5i982GezSDFZdFCCrz6YANbCInvy7Qb62h+gcmJYgks5moTxTGvO IpfOP3HWrIMQbKWSF46ckDmp8vM8OC0c8KXXT+Ng3Hnq/FxOkd36TZHsPvEsucLC LRAYp5SEqM5qylJhlkJ4NNs5yhW/KgPLvaBBYFovqpfXX4XIGNw= =9FzA -----END PGP SIGNATURE----- --wPZncfvVR4uFqXhq--