* dtc: Clean up lexing of include files
@ 2008-06-26  7:08 David Gibson
From: David Gibson @ 2008-06-26  7:08 UTC
  To: Jon Loeliger; +Cc: linuxppc-dev

Currently we scan the /include/ directive as two tokens, the
"/include/" keyword itself, then the string giving the file name to
include.  We use a special scanner state to keep the two linked
together, and use the scanner state stack to keep track of the
original state while we're parsing the two /include/ tokens.

This does mean that we need to enable the 'stack' option in flex,
which results in a not-easily-suppressed warning from the flex
boilerplate code.  This is mildly irritating.

However, this two-token scanning of the /include/ directive also has
some extremely strange edge cases, because there are a variety of
tokens recognized in all scanner states, including INCLUDE.  For
example the following strange dts file:

	/include/ /dts-v1/;
	/ {
		 /* ... */
	};

will be processed successfully, with the /include/ being effectively
ignored: the '/dts-v1/' and ';' are recognized even in INCLUDE state,
then the ';' transitions us to PROPNODENAME state, throwing away
INCLUDE, and the previous state is never popped off the stack.  Or
for another example this construct:
	foo /include/ = "somefile.dts"
will be parsed as though it were:
	foo = /include/ "somefile.dts"
Again, the '=' is scanned without leaving INCLUDE state, then the next
string triggers the include logic.

And finally, we use a different regexp for the string with the
included filename than the normal string regexp, which is also
potentially weird.

This patch, therefore, cleans up the lexical handling of the /include/
directive.  Instead of using the INCLUDE state, we scan the whole
include directive, both keyword and filename, as a single token.  This
does mean a bit more complexity in extracting the filename out of
yytext, but I think it's worth it to avoid the strangeness described
above.  It also means it's no longer possible to put a comment between
the /include/ and the filename, but I'm really not very worried about
breaking files using such a strange construct.

Index: dtc/dtc-lexer.l
===================================================================
--- dtc.orig/dtc-lexer.l	2008-06-26 17:07:40.000000000 +1000
+++ dtc/dtc-lexer.l	2008-06-26 17:07:46.000000000 +1000
@@ -18,7 +18,7 @@
  *                                                                   USA
  */
 
-%option noyywrap nounput yylineno stack
+%option noyywrap nounput yylineno
 
 %x INCLUDE
 %x BYTESTRING
@@ -28,6 +28,10 @@
 PROPNODECHAR	[a-zA-Z0-9,._+*#?@-]
 PATHCHAR	({PROPNODECHAR}|[/])
 LABEL		[a-zA-Z_][a-zA-Z0-9_]*
+STRING		\"([^\\"]|\\.)*\"
+WS		[[:space:]]
+COMMENT		"/*"([^*]|\*+[^*/])*\*+"/"
+LINECOMMENT	"//".*\n
 
 %{
 #include "dtc.h"
@@ -58,22 +62,19 @@
 %}
 
 %%
-<*>"/include/"		yy_push_state(INCLUDE);
-
-<INCLUDE>\"[^"\n]*\"	{
-			yytext[strlen(yytext) - 1] = 0;
-			push_input_file(yytext + 1);
-			yy_pop_state();
+<*>"/include/"{WS}*{STRING} {
+			char *name = strchr(yytext, '\"') + 1;
+			yytext[yyleng-1] = '\0';
+			push_input_file(name);
 		}
 
-
 <*><<EOF>>		{
 			if (!pop_input_file()) {
 				yyterminate();
 			}
 		}
 
-<*>\"([^\\"]|\\.)*\"	{
+<*>{STRING}	{
 			yylloc.file = srcpos_file;
 			yylloc.first_line = yylineno;
 			DPRINT("String: %s\n", yytext);
@@ -197,16 +198,9 @@
 			return DT_INCBIN;
 		}
 
-<*>[[:space:]]+	/* eat whitespace */
-
-<*>"/*"([^*]|\*+[^*/])*\*+"/"	{
-			yylloc.file = srcpos_file;
-			yylloc.first_line = yylineno;
-			DPRINT("Comment: %s\n", yytext);
-			/* eat comments */
-		}
-
-<*>"//".*\n	/* eat line comments */
+<*>{WS}+	/* eat whitespace */
+<*>{COMMENT}+	/* eat C-style comments */
+<*>{LINECOMMENT}+ /* eat C++-style comments */
 
 <*>.		{
 			yylloc.file = srcpos_file;
Index: dtc/convert-dtsv0-lexer.l
===================================================================
--- dtc.orig/convert-dtsv0-lexer.l	2008-06-26 17:07:40.000000000 +1000
+++ dtc/convert-dtsv0-lexer.l	2008-06-26 17:07:46.000000000 +1000
@@ -17,7 +17,7 @@
  *                                                                   USA
  */
 
-%option noyywrap nounput stack
+%option noyywrap nounput
 
 %x INCLUDE
 %x BYTESTRING
@@ -26,6 +26,11 @@
 PROPNODECHAR	[a-zA-Z0-9,._+*#?@-]
 PATHCHAR	({PROPNODECHAR}|[/])
 LABEL		[a-zA-Z_][a-zA-Z0-9_]*
+STRING		\"([^\\"]|\\.)*\"
+WS		[[:space:]]
+COMMENT		"/*"([^*]|\*+[^*/])*\*+"/"
+LINECOMMENT	"//".*\n
+GAP		({WS}|{COMMENT}|{LINECOMMENT})*
 
 %{
 #include <string.h>
@@ -91,16 +96,7 @@
 %}
 
 %%
-<*>"/include/"	{
-			ECHO;
-			yy_push_state(INCLUDE);
-		}
-
-<INCLUDE>\"[^"\n]*\"	{
-			ECHO;
-			yy_pop_state();
-		}
-
+<*>"/include/"{GAP}{STRING}	ECHO;
 
 <*>\"([^\\"]|\\.)*\"	ECHO;
 
@@ -193,11 +189,7 @@
 			BEGIN(INITIAL);
 		}
 
-<*>[[:space:]]+		ECHO;
-
-<*>"/*"([^*]|\*+[^*/])*\*+"/" ECHO;
-
-<*>"//".*\n		ECHO;
+<*>{GAP}	ECHO;
 
 <*>-		{	/* Hack to convert old style memreserves */
 			saw_hyphen = 1;

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: dtc: Clean up lexing of include files
@ 2008-07-14 18:55 ` Jon Loeliger
From: Jon Loeliger @ 2008-07-14 18:55 UTC
  To: David Gibson; +Cc: linuxppc-dev

> Currently we scan the /include/ directive as two tokens, the
> "/include/" keyword itself, then the string giving the file name to
> include.  We use a special scanner state to keep the two linked
> together, and use the scanner state stack to keep track of the
> original state while we're parsing the two /include/ tokens.
> 
> This does mean that we need to enable the 'stack' option in flex,
> which results in a not-easily-suppressed warning from the flex
> boilerplate code.  This is mildly irritating.
> 
> However, this two-token scanning of the /include/ directive also has
> some extremely strange edge cases, because there are a variety of
> tokens recognized in all scanner states, including INCLUDE.  For
> example the following strange dts file:
> 
> 	/include/ /dts-v1/;
> 	/ {
> 		 /* ... */
> 	};
> 
> will be processed successfully, with the /include/ being effectively
> ignored: the '/dts-v1/' and ';' are recognized even in INCLUDE state,
> then the ';' transitions us to PROPNODENAME state, throwing away
> INCLUDE, and the previous state is never popped off the stack.  Or
> for another example this construct:
> 	foo /include/ = "somefile.dts"
> will be parsed as though it were:
> 	foo = /include/ "somefile.dts"
> Again, the '=' is scanned without leaving INCLUDE state, then the next
> string triggers the include logic.
> 
> And finally, we use a different regexp for the string with the
> included filename than the normal string regexp, which is also
> potentially weird.
> 
> This patch, therefore, cleans up the lexical handling of the /include/
> directive.  Instead of using the INCLUDE state, we scan the whole
> include directive, both keyword and filename, as a single token.  This
> does mean a bit more complexity in extracting the filename out of
> yytext, but I think it's worth it to avoid the strangeness described
> above.  It also means it's no longer possible to put a comment between
> the /include/ and the filename, but I'm really not very worried about
> breaking files using such a strange construct.

Applied.

jdl
