Some slightly random musings on device tree expression syntax

* Some slightly random musings on device tree expression syntax
@ 2012-03-08  0:40 Stephen Warren
       [not found] ` <4F580005.403-3lzwWm7+Weoh9ZMKESR00Q@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Stephen Warren @ 2012-03-08  0:40 UTC
  To: david-xT8FGy+AXnRB3Ne2BGzF6laj5H9X9Tb+, jdl-CYoMK+44s/E
  Cc: devicetree-discuss

I was thinking some more about how to expand the device tree syntax to
allow expressions. I wondered if we should use a concept/syntax more
inspired by template processors. Playing with jinja2 and gpp led me
towards (...) being an inline expression syntax that can calculate
integers or strings and get replaced by the string representation of the
expression, and ! at the start of a line introducing a statement
context. So, below are my somewhat wandering thoughts on the matter.
However, the idea still raises a lot of questions that'd need to be
resolved.

I note a few things:

* Using the (...) syntax to indicate which parts of the file should be
evaluated and then substituted solves the issue that David had with Jon's
proposal re: how do you know when a node name is literal text vs.
concatenated to some expression.

* Separating the device tree syntax and pre-processor/... phase allows
them to be decoupled and the pre-processor potentially optional, or even
replaced if things don't work out, or different people could use their
own thing.

* As an aside, I wonder if we couldn't transparently allow <1 2 3> or
<1, 2, 3> for cell list syntax, thus not requiring the brackets in
previously proposed <(1 + 0) (1 + 1) (4 - 1)> syntax, but rather <1 + 0,
1 + 1, 4 - 1>?

Concept
========================================

The .dts syntax that dtc reads is unchanged.

A pre-processing phase occurs on .dts files that handles all aspects of
expressions; all definitions, macro processing, expression processing, etc.
are evaluated and fully expanded to strings during the pre-processing
phase. The result of the pre-processing phase should be a source file or
stream that can be handled by the existing dtc.

Whether this pre-processing phase is implemented as:
* A separate executable, manually invoked by the user.
* A separate executable, automatically invoked by dtc itself.
* Something built into dtc itself.
... is not addressed by this proposal.

One potential issue here: if the pre-processing and regular compilation
phases are completely separate, do we need to make sure that the syntax
for int literals, byte-sequence literals, etc. stays the same between the
two phases to reduce confusion, or not?

Pre-processing
========================================

Contexts:

  Pass-through:

    By default, the pass-through context is active.

    Data is passed from input to output without modification, except
    that data is searched for markers that begin other contexts.

  Expression:

    Introduced by: (
    Terminated by: a matching )

    The text within this context is interpreted as an expression. That
    expression is evaluated, the result formatted as a string, and that
    string written to the output stream in place of the ( ) markers and
    the expression between them.

    Expression context can begin anywhere within the source stream; no
    note is taken of the tokens that the device tree language itself
    defines.

  Statement:

    Introduced by: !
      Notes: Or some other suitable character; # conflicts with property
      names unless we require it to be in the first column, and also
      sounds too much like regular cpp, so people might get
      confused. @ might work. This is probably bike-shedding at this
      point...
    Terminated by: End of line

Example:

Note: // comments are used below as comments in this document, not
necessarily comments in the actual proposed syntax.

// Simple constant definitions
// Syntax of RHS matches existing .dts syntax

!defint usbbase 0x6000000
!defint usbsize 0x100
!defint usbstride 0x1000
!defstr usb "usb"
!defbytes somebytes [de ad be ef]

// or perhaps implicitly set variable type based on type of the RHS?
!define usbbase 0x6000000
!define usb "usb"

or !assign or !let ...

// RHS may also use expression syntax
// and references to previously defined variables

!defint usb3base usbbase + (2 * usbstride)
!defstr catenated usb + "2"

// Simple use of some variables:

(usbbase) (usbsize) (catenated)

// which yields:
// 0x6000000 0x100 usb2

// A more complex example:

(usb)3@(usb3base) {
    reg = <(usb3base) (usbsize)>;
    name = "(usb)3";
};

// which yields:
// usb3@0x6002000 {
//     reg = <0x6002000 0x100>;
//     name = "usb3";
// };
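
The arithmetic behind that expansion can be checked with a few lines
of Python. This is only a worked example using the values defined
earlier; the Python variable names mirror the !defint/!defstr names
and none of this is proposed syntax.

```python
# Values taken from the !defint / !defstr definitions above.
usbbase = 0x6000000
usbsize = 0x100
usbstride = 0x1000
usb = "usb"

# usb3base = usbbase + (2 * usbstride) = 0x6000000 + 0x2000
usb3base = usbbase + (2 * usbstride)

# Build the expanded node text, formatting ints as 0x%x.
node = '%s3@0x%x {\n    reg = <0x%x 0x%x>;\n    name = "%s3";\n};' % (
    usb, usb3base, usb3base, usbsize, usb)
print(node)
```

The sum is 0x6002000, so the node name expands to usb3@0x6002000.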

// Question: Do ints always format as 0x%x since that's the most common,
// or do we need explicit control over the base etc.?
//
// Question: How do we know when to format strings with "" around them,
// e.g. for use as property values, and when not to, e.g. for use in
// arbitrary contexts? For example above, it'd be nice if when defining
// the name property, we could write 'name = usb3name;' and have it
// expand to 'name = "usb3";' given a str variable with value "usb3",
// yet we don't want the quotes when using variable usb in the node
// name in the example earlier.
//
// Question: What if we actually wanted the property value "(usb3)". How
// do we stop the expansion; how to escape?
//
// I suppose the solution for the latter 2 questions is that the
// expansion has to actually be sensitive to context in the underlying
// language, and include "" in property value context, but not
// elsewhere. But, what if you write:

!defstr nasty "usb@0x6000000 { name =";
(nasty) (foo);

// Additional statements could include if, for, while, ...:

!ifdef somevar
foo bar
!else
baz qux
!endif

// I think we don't need e.g.:

foo !ifdef somevar! bar !else! baz !endif! qux

// ... since I think that we can line-break in the middle of any
// property or node definition, so we could just do this instead:

foo
!ifdef somevar
bar
!else
baz
!endif
qux

// If we need to actually concatenate the strings into one, we can do
// that as an expression somehow, assign the result to a variable, and
// expand just that.

!defstr xxx "foo"
!ifdef somevar
!defstr xxx xxx + "bar"
!else
!defstr xxx xxx + "baz"
!endif
!defstr xxx xxx + "qux"
(xxx)

// Perhaps we can delimit large blocks of statements in a way that
// doesn't need a lot of !s:

!!
xxx = "foo"
if somevar:
    xxx += "bar"
else:
    xxx += "baz"
xxx += "qux"
!!
(xxx)

// Then, we can start allowing complex things like macro or function
// definitions within the !! block; a full regular language, and
// perhaps we could even borrow an existing one here.

// About functions: Perhaps cpp-style macros:

!define func(a, b, c) a + b + c

// where the RHS is an expression that can use variables in the
// parameter list
//
// Or, is the RHS/body raw text, so something more like:

!macro func(a, b, c)
   foo {
      prop = (a + b + c);
   }
!end

// Perhaps we need both; one with text RHS accepting escapes into
// expressions, one with an expression on the RHS.

// I wondered if !define's RHS should always be an expression, or
// instead always be raw text with the same (...) escape to expressions
// as in regular text:

// (assuming a, b, c are extant variables)

// all variables are strings?
!define foo (a) + (b) + (c)

or:

// yields an integer variable
!defint foo a + b + c

* Re: Some slightly random musings on device tree expression syntax
       [not found] ` <4F580005.403-3lzwWm7+Weoh9ZMKESR00Q@public.gmane.org>
@ 2012-03-12 13:53   ` Jon Loeliger
       [not found]     ` <E1S75gQ-0005WK-0D-CYoMK+44s/E@public.gmane.org>
  2012-03-13  4:46   ` David Gibson
  1 sibling, 1 reply; 6+ messages in thread
From: Jon Loeliger @ 2012-03-12 13:53 UTC
  To: Stephen Warren; +Cc: devicetree-discuss

> I was thinking some more about how to expand the device tree syntax to
> allow expressions.

Excellent!

> I wondered if we should use a concept/syntax more
> inspired by template processors. Playing with jinja2 and gpp led me
> towards (...) being an inline expression syntax that can calculate
> integers or strings and get replaced by the string representation of the
> expression, and ! at the start of a line introducing a statement
> context. So, below are my somewhat wandering thoughts on the matter.
> However, the idea still raises a lot of questions that'd need to be
> resolved.
> 
> I note a few things:
> 
> * Using the (...) syntax to indicate which parts of the file should be
> evaluated and then substituted solves the issue that David had with Jon's
> proposal re: how do you know when a node name is literal text vs.
> concatenated to some expression.

So the M4 solution then.

> * As an aside, I wonder if we couldn't transparently allow <1 2 3> or
> <1, 2, 3> for cell list syntax, thus not requiring the brackets in
> previously proposed <(1 + 0) (1 + 1) (4 - 1)> syntax, but rather <1 + 0,
> 1 + 1, 4 - 1>?

That's the sort of direction I advocated earlier.

jdl

* Re: Some slightly random musings on device tree expression syntax
       [not found]     ` <E1S75gQ-0005WK-0D-CYoMK+44s/E@public.gmane.org>
@ 2012-03-12 23:57       ` David Gibson
  0 siblings, 0 replies; 6+ messages in thread
From: David Gibson @ 2012-03-12 23:57 UTC
  To: Jon Loeliger; +Cc: devicetree-discuss

On Mon, Mar 12, 2012 at 08:53:05AM -0500, Jon Loeliger wrote:
> > I was thinking some more about how to expand the device tree syntax to
> > allow expressions.
> 
> Excellent!
> 
> > I wondered if we should use a concept/syntax more
> > inspired by template processors. Playing with jinja2 and gpp led me
> > towards (...) being an inline expression syntax that can calculate
> > integers or strings and get replaced by the string representation of the
> > expression, and ! at the start of a line introducing a statement
> > context. So, below are my somewhat wandering thoughts on the matter.
> > However, the idea still raises a lot of questions that'd need to be
> > resolved.
> > 
> > I note a few things:
> > 
> > * Using the (...) syntax to indicate which parts of the file should be
> > evaluated and then substituted solves the issue that David had with Jon's
> > proposal re: how do you know when a node name is literal text vs.
> > concatenated to some expression.
> 
> So the M4 solution then.

Erm.. use of (...) to disambiguate expressions seems an independent
matter from whether we use m4 or a macro preprocessor versus
in-dtc-proper expression evaluation.

> > * As an aside, I wonder if we couldn't transparently allow <1 2 3> or
> > <1, 2, 3> for cell list syntax, thus not requiring the brackets in
> > previously proposed <(1 + 0) (1 + 1) (4 - 1)> syntax, but rather <1 + 0,
> > 1 + 1, 4 - 1>?
> 
> That's the sort of direction I advocated earlier.

Hrm.  I don't think this is a good idea.  Having two different cell
list formats seems to me to encourage confusion for minimal benefit.
I think (...) will generally delimit expressions more readably anyway.
Especially since it would match using that syntax to distinguish
expressions in other places, like node or property names.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: Some slightly random musings on device tree expression syntax
       [not found] ` <4F580005.403-3lzwWm7+Weoh9ZMKESR00Q@public.gmane.org>
  2012-03-12 13:53   ` Jon Loeliger
@ 2012-03-13  4:46   ` David Gibson
       [not found]     ` <20120313044631.GJ24916-MK4v0fQdeXQXU02nzanrWNbf9cGiqdzd@public.gmane.org>
  1 sibling, 1 reply; 6+ messages in thread
From: David Gibson @ 2012-03-13  4:46 UTC
  To: Stephen Warren; +Cc: devicetree-discuss

On Wed, Mar 07, 2012 at 05:40:37PM -0700, Stephen Warren wrote:
> I was thinking some more about how to expand the device tree syntax to
> allow expressions. I wondered if we should use a concept/syntax more
> inspired by template processors. Playing with jinja2 and gpp led me
> towards (...) being an inline expression syntax that can calculate
> integers or strings and get replaced by the string representation of the
> expression, and ! at the start of a line introducing a statement
> context. So, below are my somewhat wandering thoughts on the matter.
> However, the idea still raises a lot of questions that'd need to be
> resolved.
> 
> I note a few things:
> 
> * Using the (...) syntax to indicate which parts of the file should be
> evaluated and then substituted solves the issue that David had with Jon's
> proposal re: how do you know when a node name is literal text vs.
> concatenated to some expression.

Yeah, I've been thinking for quite some time that using (...) to
disambiguate expressions in the necessary places was the way to go.  It
works for cell lists, for node and property names and syntactically
required parens have precedent in C if statements.

I was only thinking of requiring (...) in places where it's
otherwise ambiguous.  This works fairly naturally in the grammar,
since C-like expression grammars usually bottom out at something like:
      primitive_expr := literal | identifier | '(' expr ')' ;
So instead of replacing literal with expr in the celllist grammar, for
example, we replace it with primitive_expr.
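
A minimal sketch of that grammar shape in Python, to show how a cell
list built from primitive_expr accepts <1 2 3> and <(1 + 0) (1 + 1)>
but rejects a bare <1 + 0>. Everything beyond the primitive_expr rule
itself (the tokenizer, the Parser class, and the small expr/term rules
with +, - and *) is an assumption made for illustration and is not
taken from dtc.

```python
import re

# Hex literal, decimal literal, identifier, or a single punctuation token.
TOKEN = re.compile(r"\s*(0[xX][0-9a-fA-F]+|\d+|[A-Za-z_]\w*|[<>()+\-*])")

def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError("bad input at %r" % text[pos:])
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

class Parser:
    def __init__(self, tokens, variables):
        self.tokens, self.pos, self.variables = tokens, 0, variables

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError("expected %r, got %r" % (expected, tok))
        self.pos += 1
        return tok

    def celllist(self):
        # celllist := '<' primitive_expr* '>'
        self.take("<")
        cells = []
        while self.peek() != ">":
            cells.append(self.primitive_expr())
        self.take(">")
        return cells

    def primitive_expr(self):
        # primitive_expr := literal | identifier | '(' expr ')'
        tok = self.take()
        if tok == "(":
            value = self.expr()
            self.take(")")
            return value
        if tok[0].isdigit():
            return int(tok, 16) if tok[:2].lower() == "0x" else int(tok, 10)
        if tok[0].isalpha() or tok[0] == "_":
            return self.variables[tok]
        raise SyntaxError("unexpected %r in cell list" % tok)

    def expr(self):
        # expr := term (('+' | '-') term)*
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        # term := primitive_expr ('*' primitive_expr)*
        value = self.primitive_expr()
        while self.peek() == "*":
            self.take()
            value *= self.primitive_expr()
        return value
```

For example, Parser(tokenize("<(1 + 0) (1 + 1) (4 - 1)>"), {}).celllist()
returns [1, 2, 3], while the unparenthesized "<1 + 0>" raises
SyntaxError because "+" is not a primitive_expr.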

> * Separating the device tree syntax and pre-processor/... phase allows
> them to be decoupled and the pre-processor potentially optional, or even
> replaced if things don't work out, or different people could use their
> own thing.

Ok, so, I've been leaning towards a preprocessor for constant/macro
support for some time (on the basis of the ratio of flexibility to
conceptual complexity).  However, I was envisaging that stage
outputting (constant) expressions that were still actually evaluated
by dtc.  Still, if you can make a good case for expression evaluation
in the pre-processor...

> * As an aside, I wonder if we couldn't transparently allow <1 2 3> or
> <1, 2, 3> for cell list syntax, thus not requiring the brackets in
> previously proposed <(1 + 0) (1 + 1) (4 - 1)> syntax, but rather <1 + 0,
> 1 + 1, 4 - 1>?

As I said in another reply, I don't like this idea.  It creates
potentially confusing variations of the syntax for no benefit that I
can see.

> Concept
> ========================================
> 
> The .dts syntax that dtc reads is unchanged.
> 
> A pre-processing phase occurs on .dts files that handles all aspects of
> expressions; all definitions, macro processing, expression processing, etc.
> are evaluated and fully expanded to strings during the pre-processing
> phase. The result of the pre-processing phase should be a source file or
> stream that can be handled by the existing dtc.
> 
> Whether this pre-processing phase is implemented as:
> * A separate executable, manually invoked by the user.
> * A separate executable, automatically invoked by dtc itself.
> * Something built into dtc itself.
> ... is not addressed by this proposal.
> 
> One potential issue here: if the pre-processing and regular compilation
> phases are completely separate, do we need to pay attention that the
> int, literal, byte-sequence literal syntax stays the same between the
> two phases to reduce confusion, or not?

I'm not sure quite what you're getting at here.

> 
> Pre-processing
> ========================================
> 
> Contexts:
> 
>   Pass-through:
> 
>     By default, the pass-through context is active.
> 
>     Data is passed from input to output without modification, except
>     that data is searched for markers that begin other contexts.
> 
>   Expression:
> 
>     Introduced by: (
>     Terminated by: a matching )
> 
>     The text within this context is interpreted as an expression. That
>     expression is evaluated, the result formatted as a string, and that
>     string written to the output stream in place of the ( ) markers and
>     the expression between them.
> 
>     Expression context can begin anywhere within the source stream; no
>     note is taken of the tokens that the device tree language itself
>     defines.

Hrm.  I'm pretty dubious about doing the expression evaluation (as
opposed to macro/constant expansion) within the preprocessor, then
resubstituting as a string.

It would work ok for integer expressions, but for bytestring
expressions, it seems likely we'd have to duplicate the
lexical/grammar constructs for [...], <...> and basic literals between
preproc and dtc, which seems a bit horrible.

In addition this approach means that an expression can never express a
value which a literal couldn't.  No problem in most cases, but one
thing I had in mind is that an expression syntax could be used to
specify a node or property name with illegal characters in it (mostly
relevant for ensuring that doing -I dtb -O dts then -I dts -O dtb will
always end up exactly where you started, even when the original dtb is
corrupted or otherwise contains things it shouldn't).

>   Statement:
> 
>     Introduced by: !
>       Notes: Or some other suitable character; # conflicts with property
>       names unless we require it to be in the first column, and also
>       sounds too much like regular cpp, so people might get
>       confused. @ might work. This is probably bike-shedding at this
>       point...
>     Terminated by: End of line

These three states aren't quite sufficient.  At the very least you
need a string state, so that expressions are not expanded within " ".
And we probably shouldn't be expanding them within comments, either.
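
A minimal Python sketch of the extra states, assuming "..." strings
with backslash escapes plus // and /* */ comments. The function name
expandable_parens is hypothetical; it only reports which "(" characters
a scanner with these states would treat as starting an expression.

```python
def expandable_parens(text):
    """Return indexes of '(' that lie outside strings and comments."""
    hits, i, n = [], 0, len(text)
    while i < n:
        if text.startswith("//", i):
            # Line comment state: skip to the end of the line.
            end = text.find("\n", i)
            i = n if end == -1 else end
        elif text.startswith("/*", i):
            # Block comment state: skip past the closing */.
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
        elif text[i] == '"':
            # String state: skip to the closing quote, honouring \ escapes.
            i += 1
            while i < n and text[i] != '"':
                i += 2 if text[i] == "\\" else 1
            i += 1
        else:
            # Pass-through state: only here does "(" start an expression.
            if text[i] == "(":
                hits.append(i)
            i += 1
    return hits
```

With this, the "(" in name = "(usb)3"; is left alone, while the one in
reg = <(usb3base) 1>; is reported.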

> Example:
> 
> Note: // comments are used below as comments in this document, not
> necessarily comments in the actual proposed syntax.
> 
> // Simple constant definitions
> // Syntax of RHS matches existing .dts syntax
> 
> !defint usbbase 0x6000000
> !defint usbsize 0x100
> !defint usbstride 0x1000
> !defstr usb "usb"
> !defbytes somebytes [de ad be ef]
> 
> // or perhaps implicitly set variable type based on type of the RHS?
> !define usbbase 0x6000000
> !define usb "usb"

Hrm.  If using defines is based on textual substitution, then type
should be irrelevant.  If they're not based on textual substitution,
then the "preprocessor" is doing something rather more involved than
something with that name normally would.

> or !assign or !let ...
> 
> // RHS may also use expression syntax
> // and references to previously defined variables
> 
> !defint usb3base usbbase + (2 * usbstride)
> !defstr catenated usb + "2"
> 
> // Simple use of some variables:
> 
> (usbbase) (usbsize) (catenated)
> 
> // which yields:
> // 0x6000000 0x100 usb2
> 
> // A more complex example:
> 
> (usb)3@(usb3base) {
>     reg = <(usb3base) (usbsize)>;
>     name = "(usb)3";
> };

Oh. You *intended* for expression substitution within strings.  Nack,
nack nackity nack.  That violates least surprise seven ways to
sunday. If the user wants something like this they can do:
	name = (usb + "3");

> // which yields:
> // usb3@0x6002000 {
> //     reg = <0x6002000 0x100>;
> //     name = "usb3";
> // };
> 
> // Question: Do ints always format as 0x%x since that's the most common,
> // or do we need explicit control over the base etc.?

The user certainly shouldn't have to care what base two apparently
internal parts of dtc use to talk to each other.

> // Question: How do we know when to format strings with "" around them,
> // e.g. for use as property values, and when not to, e.g. for use in
> // arbitrary contexts? For example above, it'd be nice if when defining
> // the name property, we could write 'name = usb3name;' and have it
> // expand to 'name = "usb3";' given a str variable with value "usb3",
> // yet we don't want the quotes when using variable usb in the node
> // name in the example earlier.

Yeah.  This is another reason I don't think splitting the expression
evaluation from the surrounding grammatical context is a good idea.

> // Question: What if we actually wanted the property value "(usb3)". How
> // do we stop the expansion; how to escape?
> //
> // I suppose the solution for the latter 2 questions is that the
> // expansion has to actually be sensitive to context in the underlying
> // language, and include "" in property value context, but not
> // elsewhere. But, what if you write:

Well, yeah, which would mean duplicating large amounts of the grammar
between the expression evaluator and the rest of dtc.

> !defstr nasty "usb@0x6000000 { name =";
> (nasty) (foo);
> 
> // Additional statements could include if, for, while, ...:
> 
> !ifdef somevar
> foo bar
> !else
> baz qux
> !endif
> 
> // I think we don't need e.g.:
> 
> foo !ifdef somevar! bar !else! baz !endif! qux
> 
> // ... since I think that we can line-break in the middle of any
> // property or node definition, so we could just do this instead:
> 
> foo
> !ifdef somevar
> bar
> !else
> baz
> !endif
> qux
> 
> // If we need to actually concatenate the strings into one, we can do
> // that as an expression somehow, assign the result to a variable, and
> // expand just that.
> 
> !defstr xxx "foo"
> !ifdef somevar
> !defstr xxx xxx + "bar"
> !else
> !defstr xxx xxx + "baz"
> !endif
> !defstr xxx xxx + "qux"
> (xxx)
> 
> // Perhaps we can delimit large blocks of statements in a way that
> // doesn't need a lot of !s:
> 
> !!
> xxx = "foo"
> if somevar:
>     xxx += "bar"
> else:
>     xxx += "baz"
> xxx += "qux"
> !!
> (xxx)
> 
> // Then, we can start allowing complex things like macro or function
> // definitions within the !! block; a full regular language, and
> // perhaps we could even borrow an existing one here.
> 
> // About functions: Perhaps cpp-style macros:
> 
> !define func(a, b, c) a + b + c
> 
> // where the RHS is an expression that can use variables in the
> // parameter list
> //
> // Or, is the RHS/body raw text, so something more like:
> 
> !macro func(a, b, c)
>    foo {
>       prop = (a + b + c);
>    }
> !end
> 
> // Perhaps we need both; one with text RHS accepting escapes into
> // expressions, one with an expression on the RHS.
> 
> // I wondered if !define's RHS should always be an expression, or
> // instead always be raw text with the same (...) escape to expressions
> // as in regular text:
> 
> // (assuming a, b, c are extant variables)
> 
> // all variables are strings?
> !define foo (a) + (b) + (c)
> 
> or:
> 
> // yields an integer variable
> !defint foo a + b + c

Ugh.  Well, I think you've pretty much proved the case that attempting
to put all the expression evaluation into the preprocessor is a really
bad idea.  It requires the preproc to be at least somewhat type aware
which (a) is likely to lead to grammar duplication and (b) is
absolutely not what someone familiar with cpp will expect.

Note that evaluating *constant* expressions in dtc works very
naturally into the existing structure and grammar.  I certainly have
no objection to that, and I don't know of anyone that did.  It's
storing and evaluating functions or macros in dtc proper that I'm
dubious about because it requires storing partial parse trees or some
other intermediate representation in a way we have never needed to
before.  That means a whole bunch of extra code and data structures.

Now, implementing a preprocessor with (initially) similar features to
cpp, but using ! instead of #, might have legs.  In fact even using
#-in-column-0 might be ok, but we'd want our own cpp implementation
because there's no portable way of ensuring that a system cpp will
only recognize # in column 0 and not elsewhere.
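
The column-0 rule is simple to state in code. A minimal Python sketch,
with a hypothetical function name, that separates directive lines from
ordinary source purely by whether the marker sits in column 0:

```python
def split_directives(source, marker="#"):
    """Split lines into (directives, passthrough) by a column-0 marker."""
    directives, passthrough = [], []
    for line in source.splitlines():
        # Only a marker in the very first column counts; an indented
        # "#address-cells" is an ordinary property line.
        if line.startswith(marker):
            directives.append(line)
        else:
            passthrough.append(line)
    return directives, passthrough
```

Note that under this rule a property such as #address-cells written
without indentation would still be taken as a directive, so the rule
relies on properties never starting in column 0.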

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: Some slightly random musings on device tree expression syntax
       [not found]     ` <20120313044631.GJ24916-MK4v0fQdeXQXU02nzanrWNbf9cGiqdzd@public.gmane.org>
@ 2012-03-13 19:56       ` Stephen Warren
       [not found]         ` <4F5FA653.90802-3lzwWm7+Weoh9ZMKESR00Q@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Stephen Warren @ 2012-03-13 19:56 UTC
  To: David Gibson; +Cc: devicetree-discuss

On 03/12/2012 10:46 PM, David Gibson wrote:
> On Wed, Mar 07, 2012 at 05:40:37PM -0700, Stephen Warren wrote:
>> I was thinking some more about how to expand the device tree syntax to
>> allow expressions. I wondered if we should use a concept/syntax more
>> inspired by template processors.
...
>> Whether this pre-processing phase is implemented as:
>> * A separate executable, manually invoked by the user.
>> * A separate executable, automatically invoked by dtc itself.
>> * Something built into dtc itself.
>> ... is not addressed by this proposal.
>>
>> One potential issue here: if the pre-processing and regular compilation
>> phases are completely separate, do we need to pay attention that the
>> int, literal, byte-sequence literal syntax stays the same between the
>> two phases to reduce confusion, or not?
> 
> I'm not sure quite what you're getting at here.

Well, it's the point you make right below. Namely that if expression
evaluation happens during pre-processing (either only there, or both
there and during the separate final "compilation" phase), that the
pre-processor must be able to parse and manipulate literals of all
types, so the expressions it calculates can use values of those types.

...
> Hrm.  I'm pretty dubious about doing the expression evaluation (as
> opposed to macro/constant expansion) within the preprocessor, then
> resubstituting as a string.
> 
> It would work ok for integer expressions, but for bytestring
> expressions, it seems likely we'd have to duplicate the
> lexical/grammar constructs for [...], <...> and basic literals between
> preproc and dtc, which seems a bit horrible.

Don't we have to allow the pre-processor to parse and manipulate
constants of all types (both scalars and perhaps even complete nodes)?
If we don't, then how would you do something like:

var = [00 11 aa 55]
for byte in var:
    do_something_with(byte)

or:

var = "Some long string"
for word in var.split():
    do_something_with(word)
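
To make the cost concrete: in plain Python, the first loop needs the
[...] literal to be parsed into actual bytes before it can be iterated.
A minimal sketch, assuming a hypothetical helper parse_bytestring and
ignoring comments or other constructs that may appear inside [...]:

```python
def parse_bytestring(literal):
    """Parse a .dts-style byte-sequence literal such as "[00 11 aa 55]"."""
    literal = literal.strip()
    if not (literal.startswith("[") and literal.endswith("]")):
        raise ValueError("not a byte-sequence literal: %r" % literal)
    # bytes.fromhex accepts hex digit pairs separated by optional spaces.
    return bytes.fromhex(literal[1:-1])

var = parse_bytestring("[00 11 aa 55]")
for byte in var:
    print("0x%02x" % byte)

var = "Some long string"
for word in var.split():
    print(word)
```

This is exactly the kind of literal parsing that would have to exist in
the pre-processor as well as in dtc.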

> In addition this approach means that an expression can never express a
> value which a literal couldn't.  No problem in most cases, but one
> thing I had in mind is that an expression syntax could be used to
> specify a node or property name with illegal characters in it (mostly
> relevant for ensuring that doing -I dtb -O dts then -I dts -O dtb will
> always end up exactly where you started, even when the original dtb is
> corrupted or otherwise contains things it shouldn't).

Well, one might imagine:

s = "Some text" + chr(128)

That's an expression that expresses something that I think can't
currently be a literal string.

...
>> !defint usbbase 0x6000000
>> !defstr usb "usb"
>> !defbytes somebytes [de ad be ef]
>>
>> // or perhaps implicitly set variable type based on type of the RHS?
>> !define usbbase 0x6000000
>> !define usb "usb"
> 
> Hrm.  If using defines is based on textual substitution, then type
> should be irrelevant.  If they're not based on textual substitution,
> then the "preprocessor" is doing something rather more involved than
> something with that name normally would.

True. I was more leaning to describing this as a template processor than
a pre-processor. Related, my thoughts started out simpler, but became
more complex and raised a lot of open questions when thinking through
some of the details, so became a lot less clear!

>> // A more complex example:
>>
>> (usb)3@(usb3base) {
>>     reg = <(usb3base) (usbsize)>;
>>     name = "(usb)3";
>> };
> 
> Oh. You *intended* for expression substitution within strings.  Nack,
> nack nackity nack.  That violates least surprise seven ways to
> sunday. If the user wants something like this they can do:
> 	name = (usb + "3");

That works for the name property, but what about the node's name:

    (usb)3@(usb3base) {

Even if we required that the whole thing be calculated elsewhere and
placed into a variable, how do we know whether:

    foo {

is meant to expand variable foo or be literal "foo"? That seemed to be
one of your main objections to Jon's implementation. I proposed solving
that by explicitly marking the source to indicate where expansion was
desired:

  (foo) {

or not:

  foo {

So, () act as "start and end of expression".

Given that, why not allow complete expressions with () rather than just
a single variable or macro call?

This is pretty much the core point of why I was referring to a
templating engine rather than a pre-processor. Of course, templating
engines often use e.g. <%= %> instead of ( ) or a wide variety of other
syntaxes.

...
> Ugh.  Well, I think you've pretty much proved the case that attempting
> to put all the expression evaluation into the preprocessor is a really
> bad idea.  It requires the preproc to be at least somewhat type aware
> which (a) is likely to lead to grammar duplication and (b) is
> absolutely not what someone familiar with cpp will expect.

Well, I don't necessarily agree that people would by default expect
the syntax etc. to match cpp specifically; there are many other
pre-processors, macro-processors, template languages etc. out there.

* Re: Some slightly random musings on device tree expression syntax
       [not found]         ` <4F5FA653.90802-3lzwWm7+Weoh9ZMKESR00Q@public.gmane.org>
@ 2012-03-14 14:42           ` David Gibson
  0 siblings, 0 replies; 6+ messages in thread
From: David Gibson @ 2012-03-14 14:42 UTC
  To: Stephen Warren; +Cc: devicetree-discuss

On Tue, Mar 13, 2012 at 01:56:03PM -0600, Stephen Warren wrote:
> On 03/12/2012 10:46 PM, David Gibson wrote:
> > On Wed, Mar 07, 2012 at 05:40:37PM -0700, Stephen Warren wrote:
> >> I was thinking some more about how to expand the device tree syntax to
> >> allow expressions. I wondered if we should use a concept/syntax more
> >> inspired by template processors.
> ...
> >> Whether this pre-processing phase is implemented as:
> >> * A separate executable, manually invoked by the user.
> >> * A separate executable, automatically invoked by dtc itself.
> >> * Something built into dtc itself.
> >> ... is not addressed by this proposal.
> >>
> >> One potential issue here: if the pre-processing and regular compilation
> >> phases are completely separate, do we need to pay attention that the
> >> int, literal, byte-sequence literal syntax stays the same between the
> >> two phases to reduce confusion, or not?
> > 
> > I'm not sure quite what you're getting at here.
> 
> Well, it's the point you make right below. Namely that if expression
> evaluation happens during pre-processing (either only there, or both
> there and during the separate final "compilation" phase), that the
> pre-processor must be able to parse and manipulate literals of all
> types, so the expressions it calculates can use values of those
> types.

Um.. if you insist on doing the sort of very fancy stuff in the
pre-processor that you're talking about.  A lot of that becomes
unnecessary with sufficient expression support in dtc.  Especially
remembering that if you have really fancy needs, you can always
generate dts output from a real programming language.

Or rather, put it this way.  My preferred option is still a (simple!)
pre-processor with reasonably rich constant expression support in dtc
proper.  But I prefer Jon's full-language-in-dtc approach to this
full-language-in-preprocessor with very simple dtc hybrid approach -
it's really the worst of both worlds.

> ...
> > Hrm.  I'm pretty dubious about doing the expression evaluation (as
> > opposed to macro/constant expansion) within the preprocessor, then
> > resubstituting as a string.
> > 
> > It would work ok for integer expressions, but for bytestring
> > expressions, it seems likely we'd have to duplicate the
> > lexical/grammar constructs for [...], <...> and basic literals between
> > preproc and dtc, which seems a bit horrible.
> 
> Don't we have to allow the pre-processor to parse and manipulate
> constants of all types (both scalars and perhaps even complete nodes)?
> If we don't, then how would you do something like:
> 
> var = [00 11 aa 55]
> for byte in var:
>     do_something_with(byte)
> 
> or:
> 
> var = "Some long string"
> for word in var.split():
>     do_something_with(word)

Um, yeah, if you want Python, generate your dts from Python, we're not
going to recreate Python within dtc, let alone within a dtc
preprocessor.  A pre-processor should do, at most, textual macro
substitution (#define), with maybe a (still textual / call-by-name)
foreach construct (though even that may not be necessary if we have
iteration functions).  Anything that involves type awareness is a full
language, not a pre-processor, which means we should either (1)
generate the dts from an existing language or (2) write the language
into dtc proper so its syntax is properly merged with dtc.
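
For illustration only, option (1) could look roughly like the sketch
below: an ordinary Python script does the iteration and prints plain dts
text for dtc to consume.  The helper names (cells, emit_node) and the
node and property names are hypothetical, not from any existing tool,
and the output is a fragment that would still need to sit inside a
complete .dts file.

```python
# Hypothetical generator script: all names here are made up for illustration.

def cells(values):
    """Render integers as a dts cell list, e.g. <0x0 0x11>."""
    return "<" + " ".join(f"0x{v:x}" for v in values) + ">"

def emit_node(name, props, indent="\t"):
    """Render one node with simple properties as dts source text."""
    lines = [f"{name} {{"]
    for key, value in props.items():
        lines.append(f"{indent}{key} = {value};")
    lines.append("};")
    return "\n".join(lines)

# The two loops from the quoted example, done in the host language.
var = bytes.fromhex("00 11 aa 55")
words = "Some long string".split()

props = {f"byte{i}": cells([b]) for i, b in enumerate(var)}
props["words"] = ", ".join(f'"{w}"' for w in words)

dts = emit_node("example", props)
print(dts)
```

The type-aware work (splitting, looping) stays in Python, and dtc only
ever sees literals it already understands.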

> > In addition this approach means that an expression can never express a
> > value which a literal couldn't.  No problem in most cases, but one
> > thing I had in mind is that an expression syntax could be used to
> > specify a node or property name with illegal characters in it (mostly
> > relevant for ensuring that doing -I dtb -O dts then -I dts -O dtb will
> > always end up exactly where you started, even when the original dtb is
> > corrupted or otherwise contains things it shouldn't).
> 
> Well, one might imagine:
> 
> s = "Some text" + chr(128)
> 
> That's an expression that expresses something that I think can't
> currently be a literal string.

So the expression preprocessor can generate such a thing, but in your
scheme it has no way to output it back to dtc except as a literal.
Oops.

Well, except that the problem actually only arises for node and property
names; for quoted strings in property values, that can be expressed as
a literal: "Some text\x80".
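
As a rough sketch of that property-value case, a generator could quote
arbitrary bytes as a string literal using C-style escapes.  The function
name dts_string_literal is hypothetical, and the sketch assumes the
consumer accepts backslash escapes of the "\x80" form shown above.

```python
def dts_string_literal(data: bytes) -> str:
    """Quote raw bytes as a string literal, escaping what needs it."""
    out = []
    for b in data:
        ch = chr(b)
        if ch in '"\\':
            # Quote and backslash must be escaped to stay inside the literal.
            out.append("\\" + ch)
        elif 0x20 <= b < 0x7f:
            # Printable ASCII is emitted as-is.
            out.append(ch)
        else:
            # Everything else becomes a two-digit hex escape.
            out.append(f"\\x{b:02x}")
    return '"' + "".join(out) + '"'

literal = dts_string_literal(b"Some text" + bytes([128]))
print(literal)  # "Some text\x80"
```

Nothing comparable exists for bare node and property names, which is
where the literal-only restriction actually bites.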

> ...
> >> !defint usbbase 0x6000000
> >> !defstr usb "usb"
> >> !defbytes somebytes [de ad be ef]
> >>
> >> // or perhaps implicitly set variable type based on type of the RHS?
> >> !define usbbase 0x6000000
> >> !define usb "usb"
> > 
> > Hrm.  If using defines is based on textual substitution, then type
> > should be irrelevant.  If they're not based on textual substitution,
> > then the "preprocessor" is doing something rather more involved than
> > something with that name normally would.
> 
> True. I was more leaning to describing this as a template processor than
> a pre-processor. Related, my thoughts started out simpler, but became
> more complex and raised a lot of open questions when thinking through
> some of the details, so became a lot less clear!

Trickier than it seems, isn't it?  There's a reason this has been
discussed on and off for several years now.

> >> // A more complex example:
> >>
> >> (usb)3@(usb3base) {
> >>     reg = <(usb3base) (usbsize)>;
> >>     name = "(usb)3";
> >> };
> > 
> > Oh. You *intended* for expression substitution within strings.  Nack,
> > nack nackity nack.  That violates least surprise seven ways to
> > Sunday. If the user wants something like this they can do:
> > 	name = (usb + "3");
> 
> That works for the name property, but what about the node's name:
> 
>     (usb)3@(usb3base) {

> Even if we required that the whole thing be calculated elsewhere and
> placed into a variable, how do we know whether:
> 
>     foo {
> 
> is meant to expand variable foo or be literal "foo"? That seemed to be
> one of your main objections to Jon's implementation. I proposed solving
> that by explicitly marking the source to indicate where expansion was
> desired:
> 
>   (foo) {
> 
> or not:
> 
>   foo {
> 
> So, () act as "start and end of expression".

Yes.  So, my thinking was that for node and property names, when the
name is given as an expression, it's a normal string expression, with
quoted literals and the rest, rather than using bare strings; bare
strings are seen as just a shortcut for the simple case.  This is,
again, incompatible with your idea of a separate expression
pre-processor, because it requires awareness of the context.  So:
	foo {
and
	("foo") {
would be equivalent.  And for the constructed example above you'd use:
	(usb + "3@" + usb3base) {
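
A toy model of that rule, as a sketch only: a bare token is taken
literally, while a parenthesised token is evaluated as a string
expression.  Python's own expression parser stands in for whatever
grammar dtc would really use, the helper names are hypothetical, the
value of usb3base is invented, and it is held as a string here because
the example above does not say how an integer would be converted.

```python
import ast

def eval_name_expr(expr: str, env: dict) -> str:
    """Evaluate a string expression: quoted literals, variables and '+'."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            return node.value
        if isinstance(node, ast.Name):
            return env[node.id]
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
            return walk(node.left) + walk(node.right)
        raise ValueError("unsupported construct in name expression")
    return walk(ast.parse(expr.strip(), mode="eval").body)

def node_name(token: str, env: dict) -> str:
    """Bare tokens are literal names; (...) marks a string expression."""
    if token.startswith("(") and token.endswith(")"):
        return eval_name_expr(token[1:-1], env)
    return token

# Invented values, purely for illustration.
env = {"usb": "usb", "usb3base": "6003000", "foo": "not-foo"}

print(node_name("foo", env))                      # literal: foo
print(node_name('("foo")', env))                  # same name, as expression
print(node_name("(foo)", env))                    # variable lookup instead
print(node_name('(usb + "3@" + usb3base)', env))  # constructed name
```

The point of the sketch is that the name position has to know it is a
name position, which a context-free text substitution pass cannot.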


> Given that, why not allow complete expressions with () rather than just
> a single variable or macro call?

Never suggested we shouldn't.  But we absolutely shouldn't be using
bare strings in expressions the way we do in non-expression node and
property names.

> This is pretty much the core point of why I was referring to a
> templating engine rather than a pre-processor. Of course, templating
> engines often use e.g. <%= %> instead of ( ) or a wide variety of other
> syntaxes.
> 
> ...
> > Ugh.  Well, I think you've pretty much proved the case that attempting
> > to put all the expression evaluation into the preprocessor is a really
> > bad idea.  It requires the preproc to be at least somewhat type aware
> > which (a) is likely to lead to grammar duplication and (b) is
> > absolutely not what someone familiar with cpp will expect.
> 
> Well, I don't necessarily agree that people would be by default
> expecting the syntax/... must match cpp specifically; there are many
> many other pre-processors, macro-processors, template languages etc. out
> there.

Not perfectly, no.  But the target audience of dtc is largely C
programmers, the existing core syntax is C-like, and the
least-surprise principle should be applied in that context.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


end of thread, other threads:[~2012-03-14 14:42 UTC | newest]

Thread overview: 6+ messages
2012-03-08  0:40 Some slightly random musings on device tree expression syntax Stephen Warren
     [not found] ` <4F580005.403-3lzwWm7+Weoh9ZMKESR00Q@public.gmane.org>
2012-03-12 13:53   ` Jon Loeliger
     [not found]     ` <E1S75gQ-0005WK-0D-CYoMK+44s/E@public.gmane.org>
2012-03-12 23:57       ` David Gibson
2012-03-13  4:46   ` David Gibson
     [not found]     ` <20120313044631.GJ24916-MK4v0fQdeXQXU02nzanrWNbf9cGiqdzd@public.gmane.org>
2012-03-13 19:56       ` Stephen Warren
     [not found]         ` <4F5FA653.90802-3lzwWm7+Weoh9ZMKESR00Q@public.gmane.org>
2012-03-14 14:42           ` David Gibson
