From: Wander Lairson Costa <wander@redhat.com>
To: Nam Cao <namcao@linutronix.de>
Cc: Gabriele Monaco <gmonaco@redhat.com>,
Steven Rostedt <rostedt@goodmis.org>,
linux-trace-kernel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 02/13] verification/rvgen: Introduce a parse tree for automata using Lark
Date: Fri, 15 May 2026 15:37:25 -0300 [thread overview]
Message-ID: <agdnvQPLLiLBIxw7@wcosta-defaultstring.rmtbr.csb> (raw)
In-Reply-To: <050ac3d7aeb1ece12a4deb91fc173de24ad147de.1777962130.git.namcao@linutronix.de>
On Tue, May 05, 2026 at 08:59:23AM +0200, Nam Cao wrote:
> The DOT parsing scripts directly parse the raw text and they are quite
> fragile. If the input dot files' formats are slightly changed (for
> instance, by breaking long some lines which is allowed by the DOT language
> defined by graphviz), the scripts would fail.
>
> To make the scripts robust, the parser should be implemented based on the
> dot language specification, not based on how the existing dot files look.
>
> As a first step, use Lark to implement a Parser based on the graphviz dot
> language specification. The resulting parse tree is not used yet, but the
> existing scripts will be converted one by one to use this new parse tree in
> the follow-up commits.
>
> Signed-off-by: Nam Cao <namcao@linutronix.de>
> ---
> tools/verification/rvgen/rvgen/automata.py | 182 +++++++++++++++++++++
> 1 file changed, 182 insertions(+)
>
> diff --git a/tools/verification/rvgen/rvgen/automata.py b/tools/verification/rvgen/rvgen/automata.py
> index b9f8149f7118..4e3d719a0952 100644
> --- a/tools/verification/rvgen/rvgen/automata.py
> +++ b/tools/verification/rvgen/rvgen/automata.py
> @@ -13,6 +13,187 @@ import re
> from typing import Iterator
> from itertools import islice
>
> +import lark
> +
> +class ParseTree:
> + # based on https://graphviz.org/doc/info/lang.html
> + # with the irrelevant stuffs (port and compass) removed
> + grammar = r'''
> + start: "strict"? ("graph" | "digraph") ID? "{" stmt_list "}"
> +
> + stmt_list: (stmt ";"? stmt_list)?
> +
> + stmt: node_stmt
> + | edge_stmt
> + | attr_stmt
> + | ID "=" ID
> + | subgraph
> +
> + attr_stmt: attr_type attr_list
> +
> + attr_type: "graph" -> graph
> + | "node" -> node
> + | "edge" -> edge
> +
> + attr_list: "[" a_list? "]" attr_list?
> +
> + a_list: ID "=" ID (";" | ",")? a_list?
> +
> + edge_stmt: (node_id | subgraph) edgerhs attr_list?
> +
> + edgerhs: edgeop (node_id | subgraph) edgerhs?
> +
> + edgeop: "->" | "--"
> +
> + node_stmt: node_id attr_list?
> +
> + node_id: ID
> +
> + subgraph: ("subgraph" ID?)? "{" stmt_list "}"
> +
> + ID: /[_a-zA-Z][_a-zA-Z0-9]+/
This regex rejects symbol character symbol. Is that intentional?
> + | /-?(\.[0-9]+|[0-9]+(\.[0-9]*))/
> + | /".*?"/
> +
> + %import common.WS
> + %ignore WS
> + '''
> +
> + @staticmethod
> + def parse_edge(tree: lark.Tree) -> tuple[str, str]:
> + # only support a simple node-to-node edge
> + nodes = []
> + for node in tree.iter_subtrees_topdown():
> + if node.data == "node_id":
> + nodes.append(node.children[0].strip('"'))
> +
> + if len(nodes) != 2:
> + raise AutomataError("Only state-to-state transition is supported")
> +
> + return tuple(nodes)
> +
> + class ParseNodes(lark.visitors.Visitor):
> + def __init__(self, *args, **kwargs):
> + self.nodes = set()
> + super().__init__(*args, **kwargs)
> +
> + def node_stmt(self, tree):
> + node_id = tree.children[0]
> + node = node_id.children[0].strip('"')
> + self.nodes.add(node)
> +
> + class ParseEdges(lark.visitors.Visitor):
> + def __init__(self, *args, **kwargs):
> + self.edges = set()
> + super().__init__(*args, **kwargs)
> +
> + def edge_stmt(self, tree):
> + edge = ParseTree.parse_edge(tree)
> + self.edges.add(edge)
> +
> + class ParseAttributes(lark.visitors.Interpreter):
> + def __init__(self, *args, **kwargs):
> + '''
> + Stacks of default attributes. [0] is the default
> + attributes for the outermost scope, while [-1] is the
> + default attributes for the current scope.
> + '''
> + self.default_node_attrs = [{}]
> + self.default_edge_attrs = [{}]
> +
> + self.node_attrs = {}
> + self.edge_attrs = {}
> +
> + super().__init__(*args, **kwargs)
> +
> + @staticmethod
> + def __get_attrs(stmt: lark.Tree) -> dict[str, str]:
> + attrs = {}
> +
> + for node in stmt.iter_subtrees():
> + if node.data == "a_list":
> + attrs[node.children[0]] = node.children[1].strip('"')
> +
> + return attrs
> +
> +
> + def subgraph(self, tree):
> + # We are entering a new scope, inherit the default
> + # attributes of the outer scope
> + self.default_node_attrs.append(self.default_node_attrs[-1].copy())
> + self.default_edge_attrs.append(self.default_edge_attrs[-1].copy())
> +
> + children = self.visit_children(tree)
> +
> + # Exiting the scope
> + del self.default_node_attrs[-1]
> + del self.default_edge_attrs[-1]
> +
> + return children
> +
> + def node_stmt(self, tree):
> + node_id = tree.children[0]
> + node = node_id.children[0].strip('"')
> +
> + attrs = self.default_node_attrs[-1].copy()
> + attrs |= self.__get_attrs(tree)
> +
> + if attrs:
> + if node in self.node_attrs:
> + self.node_attrs[node] = attrs | self.node_attrs[node]
> + else:
> + self.node_attrs[node] = attrs
> +
> + return self.visit_children(tree)
> +
> + def edge_stmt(self, tree):
> + edge = ParseTree.parse_edge(tree)
> +
> + attrs = self.default_edge_attrs[-1].copy()
> + attrs |= self.__get_attrs(tree)
> +
> + if attrs:
> + if edge in self.edge_attrs:
> + self.edge_attrs[edge] = attrs | self.edge_attrs[edge]
> + else:
> + self.edge_attrs[edge] = attrs
> +
> + return self.visit_children(tree)
> +
> + def attr_stmt(self, tree):
> + attr_type = tree.children[0].data
> + attrs = self.__get_attrs(tree)
> +
> + if attr_type == "node":
> + self.default_node_attrs[-1] |= attrs
> + elif attr_type == "edge":
> + self.default_edge_attrs[-1] |= attrs
> + else:
> + # graph attributes are irrelevant
> + pass
> +
> + self.visit_children(tree)
> +
> + def __init__(self, dot_file):
> + parser = lark.Lark(self.grammar, parser='lalr')
> + node_parser = self.ParseNodes()
> + edge_parser = self.ParseEdges()
> + attributes_parser = self.ParseAttributes()
> +
> + try:
> + with open(dot_file, "r") as dot_file:
> + tree = parser.parse(dot_file.read())
> + attributes_parser.visit(tree)
> + node_parser.visit(tree)
> + edge_parser.visit(tree)
> + except OSError as exc:
> + raise AutomataError(exc.strerror) from exc
> +
> + self.nodes = node_parser.nodes
> + self.edges = edge_parser.edges
> + self.node_attrs = attributes_parser.node_attrs
> + self.edge_attrs = attributes_parser.edge_attrs
> +
> class _ConstraintKey:
> """Base class for constraint keys."""
>
> @@ -66,6 +247,7 @@ class Automata:
> self.__dot_path = file_path
> self.name = model_name or self.__get_model_name()
> self.__dot_lines = self.__open_dot()
> + self.__parse_tree = ParseTree(file_path)
> self.states, self.initial_state, self.final_states = self.__get_state_variables()
> self.env_types = {}
> self.env_stored = set()
> --
> 2.47.3
>
next prev parent reply other threads:[~2026-05-15 18:37 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-05 6:59 [PATCH 00/13] rv: Convert rvgen to Lark Nam Cao
2026-05-05 6:59 ` [PATCH 01/13] verification/rvgen: Switch LTL parser " Nam Cao
2026-05-06 7:37 ` Gabriele Monaco
2026-05-10 18:18 ` Nam Cao
2026-05-15 15:55 ` Wander Lairson Costa
2026-05-05 6:59 ` [PATCH 02/13] verification/rvgen: Introduce a parse tree for automata using Lark Nam Cao
2026-05-15 18:37 ` Wander Lairson Costa [this message]
2026-05-05 6:59 ` [PATCH 03/13] verification/rvgen: Implement state and transition parser based on Lark Nam Cao
2026-05-06 14:48 ` Gabriele Monaco
2026-05-10 18:21 ` Nam Cao
2026-05-15 19:07 ` Wander Lairson Costa
2026-05-05 6:59 ` [PATCH 04/13] verification/rvgen: Convert __fill_verify_invariants_func() to Lark Nam Cao
2026-05-05 6:59 ` [PATCH 05/13] verification/rvgen: Convert __fill_setup_invariants_func() " Nam Cao
2026-05-05 6:59 ` [PATCH 06/13] verification/rvgen: Convert __fill_verify_guards_func() " Nam Cao
2026-05-06 14:51 ` Gabriele Monaco
2026-05-15 19:35 ` Wander Lairson Costa
2026-05-05 6:59 ` [PATCH 07/13] rv: Simply hybrid automata monitors's clock variables Nam Cao
2026-05-06 9:15 ` Gabriele Monaco
2026-05-11 11:55 ` Nam Cao
2026-05-12 9:31 ` Gabriele Monaco
2026-05-05 6:59 ` [PATCH 08/13] verification/rvgen: Simplify the generation for " Nam Cao
2026-05-05 6:59 ` [PATCH 09/13] verification/rvgen: Delete __parse_constraint() Nam Cao
2026-05-05 6:59 ` [PATCH 10/13] verification/rvgen: Switch __get_event_variables() to Lark Nam Cao
2026-05-05 6:59 ` [PATCH 11/13] verification/rvgen: Switch __create_matrix() " Nam Cao
2026-05-05 6:59 ` [PATCH 12/13] verification/rvgen: Remove the old state variables Nam Cao
2026-05-05 6:59 ` [PATCH 13/13] verification/rvgen: Remove dead code Nam Cao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=agdnvQPLLiLBIxw7@wcosta-defaultstring.rmtbr.csb \
--to=wander@redhat.com \
--cc=gmonaco@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=namcao@linutronix.de \
--cc=rostedt@goodmis.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.