Linux Trace Kernel
 help / color / mirror / Atom feed
From: Wander Lairson Costa <wander@redhat.com>
To: Nam Cao <namcao@linutronix.de>
Cc: Gabriele Monaco <gmonaco@redhat.com>,
	 Steven Rostedt <rostedt@goodmis.org>,
	linux-trace-kernel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 02/13] verification/rvgen: Introduce a parse tree for automata using Lark
Date: Fri, 15 May 2026 15:37:25 -0300	[thread overview]
Message-ID: <agdnvQPLLiLBIxw7@wcosta-defaultstring.rmtbr.csb> (raw)
In-Reply-To: <050ac3d7aeb1ece12a4deb91fc173de24ad147de.1777962130.git.namcao@linutronix.de>

On Tue, May 05, 2026 at 08:59:23AM +0200, Nam Cao wrote:
> The DOT parsing scripts directly parse the raw text and they are quite
> fragile. If the input dot files' formats are slightly changed (for
> instance, by breaking long some lines which is allowed by the DOT language
> defined by graphviz), the scripts would fail.
> 
> To make the scripts robust, the parser should be implemented based on the
> dot language specification, not based on how the existing dot files look.
> 
> As a first step, use Lark to implement a Parser based on the graphviz dot
> language specification. The resulting parse tree is not used yet, but the
> existing scripts will be converted one by one to use this new parse tree in
> the follow-up commits.
> 
> Signed-off-by: Nam Cao <namcao@linutronix.de>
> ---
>  tools/verification/rvgen/rvgen/automata.py | 182 +++++++++++++++++++++
>  1 file changed, 182 insertions(+)
> 
> diff --git a/tools/verification/rvgen/rvgen/automata.py b/tools/verification/rvgen/rvgen/automata.py
> index b9f8149f7118..4e3d719a0952 100644
> --- a/tools/verification/rvgen/rvgen/automata.py
> +++ b/tools/verification/rvgen/rvgen/automata.py
> @@ -13,6 +13,187 @@ import re
>  from typing import Iterator
>  from itertools import islice
>  
> +import lark
> +
> +class ParseTree:
> +    # based on https://graphviz.org/doc/info/lang.html
> +    # with the irrelevant stuffs (port and compass) removed
> +    grammar = r'''
> +    start: "strict"? ("graph" | "digraph") ID? "{" stmt_list "}"
> +
> +    stmt_list: (stmt ";"? stmt_list)?
> +
> +    stmt: node_stmt
> +        | edge_stmt
> +        | attr_stmt
> +        | ID "=" ID
> +        | subgraph
> +
> +    attr_stmt: attr_type attr_list
> +
> +    attr_type: "graph" -> graph
> +            | "node"  -> node
> +            | "edge"  -> edge
> +
> +    attr_list: "[" a_list? "]" attr_list?
> +
> +    a_list: ID "=" ID (";" | ",")? a_list?
> +
> +    edge_stmt: (node_id | subgraph) edgerhs attr_list?
> +
> +    edgerhs: edgeop (node_id | subgraph) edgerhs?
> +
> +    edgeop: "->" | "--"
> +
> +    node_stmt: node_id attr_list?
> +
> +    node_id: ID
> +
> +    subgraph: ("subgraph" ID?)? "{" stmt_list "}"
> +
> +    ID: /[_a-zA-Z][_a-zA-Z0-9]+/

This regex rejects symbol character symbol. Is that intentional?

> +      | /-?(\.[0-9]+|[0-9]+(\.[0-9]*))/
> +      | /".*?"/
> +
> +    %import common.WS
> +    %ignore WS
> +    '''
> +
> +    @staticmethod
> +    def parse_edge(tree: lark.Tree) -> tuple[str, str]:
> +        # only support a simple node-to-node edge
> +        nodes = []
> +        for node in tree.iter_subtrees_topdown():
> +            if node.data == "node_id":
> +                nodes.append(node.children[0].strip('"'))
> +
> +        if len(nodes) != 2:
> +            raise AutomataError("Only state-to-state transition is supported")
> +
> +        return tuple(nodes)
> +
> +    class ParseNodes(lark.visitors.Visitor):
> +        def __init__(self, *args, **kwargs):
> +            self.nodes = set()
> +            super().__init__(*args, **kwargs)
> +
> +        def node_stmt(self, tree):
> +            node_id = tree.children[0]
> +            node = node_id.children[0].strip('"')
> +            self.nodes.add(node)
> +
> +    class ParseEdges(lark.visitors.Visitor):
> +        def __init__(self, *args, **kwargs):
> +            self.edges = set()
> +            super().__init__(*args, **kwargs)
> +
> +        def edge_stmt(self, tree):
> +            edge = ParseTree.parse_edge(tree)
> +            self.edges.add(edge)
> +
> +    class ParseAttributes(lark.visitors.Interpreter):
> +        def __init__(self, *args, **kwargs):
> +            '''
> +            Stacks of default attributes. [0] is the default
> +            attributes for the outermost scope, while [-1] is the
> +            default attributes for the current scope.
> +            '''
> +            self.default_node_attrs = [{}]
> +            self.default_edge_attrs = [{}]
> +
> +            self.node_attrs = {}
> +            self.edge_attrs = {}
> +
> +            super().__init__(*args, **kwargs)
> +
> +        @staticmethod
> +        def __get_attrs(stmt: lark.Tree) -> dict[str, str]:
> +            attrs = {}
> +
> +            for node in stmt.iter_subtrees():
> +                if node.data == "a_list":
> +                    attrs[node.children[0]] = node.children[1].strip('"')
> +
> +            return attrs
> +
> +
> +        def subgraph(self, tree):
> +            # We are entering a new scope, inherit the default
> +            # attributes of the outer scope
> +            self.default_node_attrs.append(self.default_node_attrs[-1].copy())
> +            self.default_edge_attrs.append(self.default_edge_attrs[-1].copy())
> +
> +            children = self.visit_children(tree)
> +
> +            # Exiting the scope
> +            del self.default_node_attrs[-1]
> +            del self.default_edge_attrs[-1]
> +
> +            return children
> +
> +        def node_stmt(self, tree):
> +            node_id = tree.children[0]
> +            node = node_id.children[0].strip('"')
> +
> +            attrs = self.default_node_attrs[-1].copy()
> +            attrs |= self.__get_attrs(tree)
> +
> +            if attrs:
> +                if node in self.node_attrs:
> +                    self.node_attrs[node] = attrs | self.node_attrs[node]
> +                else:
> +                    self.node_attrs[node] = attrs
> +
> +            return self.visit_children(tree)
> +
> +        def edge_stmt(self, tree):
> +            edge = ParseTree.parse_edge(tree)
> +
> +            attrs = self.default_edge_attrs[-1].copy()
> +            attrs |= self.__get_attrs(tree)
> +
> +            if attrs:
> +                if edge in self.edge_attrs:
> +                    self.edge_attrs[edge] = attrs | self.edge_attrs[edge]
> +                else:
> +                    self.edge_attrs[edge] = attrs
> +
> +            return self.visit_children(tree)
> +
> +        def attr_stmt(self, tree):
> +            attr_type = tree.children[0].data
> +            attrs = self.__get_attrs(tree)
> +
> +            if attr_type == "node":
> +                self.default_node_attrs[-1] |= attrs
> +            elif attr_type == "edge":
> +                self.default_edge_attrs[-1] |= attrs
> +            else:
> +                # graph attributes are irrelevant
> +                pass
> +
> +            self.visit_children(tree)
> +
> +    def __init__(self, dot_file):
> +        parser = lark.Lark(self.grammar, parser='lalr')
> +        node_parser = self.ParseNodes()
> +        edge_parser = self.ParseEdges()
> +        attributes_parser = self.ParseAttributes()
> +
> +        try:
> +            with open(dot_file, "r") as dot_file:
> +                tree = parser.parse(dot_file.read())
> +                attributes_parser.visit(tree)
> +                node_parser.visit(tree)
> +                edge_parser.visit(tree)
> +        except OSError as exc:
> +            raise AutomataError(exc.strerror) from exc
> +
> +        self.nodes = node_parser.nodes
> +        self.edges = edge_parser.edges
> +        self.node_attrs = attributes_parser.node_attrs
> +        self.edge_attrs = attributes_parser.edge_attrs
> +
>  class _ConstraintKey:
>      """Base class for constraint keys."""
>  
> @@ -66,6 +247,7 @@ class Automata:
>          self.__dot_path = file_path
>          self.name = model_name or self.__get_model_name()
>          self.__dot_lines = self.__open_dot()
> +        self.__parse_tree = ParseTree(file_path)
>          self.states, self.initial_state, self.final_states = self.__get_state_variables()
>          self.env_types = {}
>          self.env_stored = set()
> -- 
> 2.47.3
> 


  reply	other threads:[~2026-05-15 18:37 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-05  6:59 [PATCH 00/13] rv: Convert rvgen to Lark Nam Cao
2026-05-05  6:59 ` [PATCH 01/13] verification/rvgen: Switch LTL parser " Nam Cao
2026-05-06  7:37   ` Gabriele Monaco
2026-05-10 18:18     ` Nam Cao
2026-05-15 15:55   ` Wander Lairson Costa
2026-05-05  6:59 ` [PATCH 02/13] verification/rvgen: Introduce a parse tree for automata using Lark Nam Cao
2026-05-15 18:37   ` Wander Lairson Costa [this message]
2026-05-05  6:59 ` [PATCH 03/13] verification/rvgen: Implement state and transition parser based on Lark Nam Cao
2026-05-06 14:48   ` Gabriele Monaco
2026-05-10 18:21     ` Nam Cao
2026-05-15 19:07   ` Wander Lairson Costa
2026-05-05  6:59 ` [PATCH 04/13] verification/rvgen: Convert __fill_verify_invariants_func() to Lark Nam Cao
2026-05-05  6:59 ` [PATCH 05/13] verification/rvgen: Convert __fill_setup_invariants_func() " Nam Cao
2026-05-05  6:59 ` [PATCH 06/13] verification/rvgen: Convert __fill_verify_guards_func() " Nam Cao
2026-05-06 14:51   ` Gabriele Monaco
2026-05-15 19:35   ` Wander Lairson Costa
2026-05-05  6:59 ` [PATCH 07/13] rv: Simply hybrid automata monitors's clock variables Nam Cao
2026-05-06  9:15   ` Gabriele Monaco
2026-05-11 11:55     ` Nam Cao
2026-05-12  9:31       ` Gabriele Monaco
2026-05-05  6:59 ` [PATCH 08/13] verification/rvgen: Simplify the generation for " Nam Cao
2026-05-05  6:59 ` [PATCH 09/13] verification/rvgen: Delete __parse_constraint() Nam Cao
2026-05-05  6:59 ` [PATCH 10/13] verification/rvgen: Switch __get_event_variables() to Lark Nam Cao
2026-05-05  6:59 ` [PATCH 11/13] verification/rvgen: Switch __create_matrix() " Nam Cao
2026-05-05  6:59 ` [PATCH 12/13] verification/rvgen: Remove the old state variables Nam Cao
2026-05-05  6:59 ` [PATCH 13/13] verification/rvgen: Remove dead code Nam Cao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=agdnvQPLLiLBIxw7@wcosta-defaultstring.rmtbr.csb \
    --to=wander@redhat.com \
    --cc=gmonaco@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=namcao@linutronix.de \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox