From: Marc Zyngier <maz@kernel.org>
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev
Cc: Mark Rutland <mark.rutland@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>, Mark Brown <broonie@kernel.org>
Subject: [PATCH v2] arm64: Add basic JSON register parser
Date: Thu,  2 Jan 2025 14:43:39 +0000
Message-ID: <20250102144339.1564778-1-maz@kernel.org>

We currently populate the sysreg file by hand from the ARM ARM,
resulting in a bunch of errors being introduced on a regular basis.
While there is an XML dump of the architecture produced on a quarterly
basis, the license that comes attached to it excludes any sort of
open-source usage.

However, ARM has recently made available a JSON dump[1] that contains
a reduced set of information under a BSD license. This has enough
data to extract what is relevant to the sysreg file.

This is achieved using a JQ script that I cobbled together over
the holiday, and while it has a number of limitations, it already
works well enough to extract useful data.

As an example, here's what the script returns for TCR_EL1:

$ jq -r --arg REG TCR_EL1 -f arch/arm64/tools/dumpreg.jq ~/Work/XML/2024-12/AARCHMRS_BSD_A_profile/Registers.json
TCR_EL1	[3,0,2,0,2]	MRS
TCR_EL1	[3,0,2,0,2]	MSRregister
TCR_EL12	[3,5,2,0,2]	MRS
TCR_EL12	[3,5,2,0,2]	MSRregister
TCRALIAS_EL1	[3,0,2,7,6]	MRS
TCRALIAS_EL1	[3,0,2,7,6]	MSRregister
Res0	63:62
Field	61	MTX1	# Field cond: (IsFeatureImplemented(FEAT_MTE_NO_ADDRESS_TAGS) || IsFeatureImplemented(FEAT_MTE_CANONICAL_TAGS))
Field	60	MTX0	# Field cond: (IsFeatureImplemented(FEAT_MTE_NO_ADDRESS_TAGS) || IsFeatureImplemented(FEAT_MTE_CANONICAL_TAGS))
Field	59	DS	# Field cond: (IsFeatureImplemented(FEAT_LPA2) && (!IsFeatureImplemented(FEAT_D128) || (AArch64 TCR2_EL1.D128 == '0')))
Field	59	DS	# Field cond: true
Field	58	TCMA1	# Field cond: IsFeatureImplemented(FEAT_MTE2)
Field	57	TCMA0	# Field cond: IsFeatureImplemented(FEAT_MTE2)
Field	56	E0PD1	# Field cond: IsFeatureImplemented(FEAT_E0PD)
Field	55	E0PD0	# Field cond: IsFeatureImplemented(FEAT_E0PD)
Field	54	NFD1	# Field cond: (IsFeatureImplemented(FEAT_SVE) || IsFeatureImplemented(FEAT_TME))
Field	53	NFD0	# Field cond: (IsFeatureImplemented(FEAT_SVE) || IsFeatureImplemented(FEAT_TME))
Field	52	TBID1	# Field cond: IsFeatureImplemented(FEAT_PAuth)
Field	51	TBID0	# Field cond: IsFeatureImplemented(FEAT_PAuth)
Field	50	HWU162	# Field cond: IsFeatureImplemented(FEAT_HPDS2)
Field	49	HWU161	# Field cond: IsFeatureImplemented(FEAT_HPDS2)
Field	48	HWU160	# Field cond: IsFeatureImplemented(FEAT_HPDS2)
Field	47	HWU159	# Field cond: IsFeatureImplemented(FEAT_HPDS2)
Field	46	HWU062	# Field cond: IsFeatureImplemented(FEAT_HPDS2)
Field	45	HWU061	# Field cond: IsFeatureImplemented(FEAT_HPDS2)
Field	44	HWU060	# Field cond: IsFeatureImplemented(FEAT_HPDS2)
Field	43	HWU059	# Field cond: IsFeatureImplemented(FEAT_HPDS2)
Field	42	HPD1	# Field cond: IsFeatureImplemented(FEAT_HPDS)
Field	41	HPD0	# Field cond: IsFeatureImplemented(FEAT_HPDS)
Field	40	HD	# Field cond: IsFeatureImplemented(FEAT_HAFDBS)
Field	39	HA	# Field cond: IsFeatureImplemented(FEAT_HAFDBS)
Field	38	TBI1
Field	37	TBI0
Field	36	AS
Res0	35
Field	34:32	IPS
Field	31:30	TG1
Field	29:28	SH1
Field	27:26	ORGN1
Field	25:24	IRGN1
Field	23	EPD1
Field	22	A1
Field	21:16	T1SZ
Field	15:14	TG0
Field	13:12	SH0
Field	11:10	ORGN0
Field	9:8	IRGN0
Field	7	EPD0
Res0	6
Field	5:0	T0SZ
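
For anyone wanting to post-process this output, here is a minimal Python
sketch of how the two line shapes shown above (accessor lines and
field lines) could be parsed. It is not part of the patch: the names
parse_dump_line and BITS are hypothetical, the sketch assumes the
columns are tab-separated, and any other line shape (enum values,
EndEnum, comments) is simply ignored.

```python
import json
import re

# Bit positions are either a single bit ("61") or an msb:lsb pair ("63:62").
BITS = re.compile(r"(\d+)(?::(\d+))?")


def parse_dump_line(line):
    """Classify one tab-separated line of dumpreg.jq output (sketch only)."""
    parts = line.rstrip("\n").split("\t")
    # Accessor lines: NAME <tab> [op0,op1,CRn,CRm,op2] <tab> accessor
    if len(parts) == 3 and parts[1].startswith("["):
        return {
            "kind": "accessor",
            "name": parts[0],
            "encoding": json.loads(parts[1]),
            "accessor": parts[2],
        }
    # Field lines: TYPE <tab> bits [<tab> name [<tab> # condition]]
    match = BITS.fullmatch(parts[1]) if len(parts) >= 2 else None
    if match is None:
        return None  # not a line shape this sketch understands
    msb = int(match.group(1))
    lsb = int(match.group(2)) if match.group(2) is not None else msb
    return {
        "kind": parts[0].strip(),  # nested fields are indented with spaces
        "msb": msb,
        "lsb": lsb,
        "name": parts[2] if len(parts) > 2 else "",
        "cond": parts[3] if len(parts) > 3 else None,
    }
```

Feeding each line of the jq output through parse_dump_line() and
dropping the None results gives a list of dictionaries that can be
sorted by msb or filtered by condition before being turned into
sysreg entries.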

I completely expect this to be quickly rewritten by people who know
what they are doing (I don't) and improved as we understand more
of the data model.

[1] https://developer.arm.com/-/cdn-downloads/permalink/Exploration-Tools-OS-Machine-Readable-Data/AARCHMRS_BSD/AARCHMRS_BSD_A_profile-2024-12.tar.gz

Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
---

Notes:
    - From v1:
    
      - Fix the accessor encoding order
      - Handling of nested fields, arrays, vectors
      - Plenty of additional JSON handling
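
To make the nested-field handling concrete, here is a minimal Python
rendering of the range arithmetic done by the script's range and
mergerangesets helpers. It is a sketch, not part of the patch: the
names format_range and merge_rangesets and the sample bit positions
are illustrative assumptions. The jq version emits one list per child
range, whereas this sketch flattens them into a single list.

```python
def format_range(start, width):
    """Render a bit range as 'msb:lsb', or as a single bit when width is 1."""
    if width == 1:
        return str(start)
    return f"{start + width - 1}:{start}"


def merge_rangesets(child, base):
    """Rebase each child range onto each range of the parent field.

    Child ranges are relative to the parent, so the absolute start is
    the sum of both starts, and the width is clamped to the smaller
    of the two.
    """
    return [
        {"start": b["start"] + c["start"], "width": min(c["width"], b["width"])}
        for c in child
        for b in base
    ]


# Hypothetical example: a 1-bit child at offset 0 of a parent at bit 59.
merged = merge_rangesets([{"start": 0, "width": 1}], [{"start": 59, "width": 1}])
labels = [format_range(r["start"], r["width"]) for r in merged]
```

With a single child range and a single parent range, which is the
likely case noted in the script's own comment, the result is one
absolute range ready to be displayed.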

 arch/arm64/tools/dumpreg.jq | 258 ++++++++++++++++++++++++++++++++++++
 1 file changed, 258 insertions(+)
 create mode 100644 arch/arm64/tools/dumpreg.jq

diff --git a/arch/arm64/tools/dumpreg.jq b/arch/arm64/tools/dumpreg.jq
new file mode 100644
index 0000000000000..efb198066820f
--- /dev/null
+++ b/arch/arm64/tools/dumpreg.jq
@@ -0,0 +1,258 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# dumpreg.jq: JSON arm64 system register data extractor
+#
+# Author: Marc Zyngier <maz@kernel.org>
+#
+# Usage: jq -r --arg REG "XZY_ELx" -f ./dumpreg.jq Registers.json
+
+# Dump semi-pertinent information (encodings, fields, conditions,
+# field position and width) about register XZY_ELx as contained in
+# ARM's AARCHMRS_BSD_A_profile JSON tarball.
+
+# Not setting REG will dump the whole register file in one go. While
+# this is entertaining, it isn't very useful.
+
+# This can/should be used to populate the arch/arm64/tools/sysreg
+# file, instead of copying things by hand.
+
+# The tool currently has a bunch of limitations that users need to be
+# aware of, but none that should have a major impact on the usability:
+
+# - All accessors are shown, irrespective of the conditions in which
+#   the accessors are actually available
+
+# - All Fields.ConstantField are displayed as UnsignedEnum,
+#   irrespective of the signedness of the field (as the JSON doesn't
+#   carry this information).
+
+# - Value ranges are displayed using '[...]'.
+
+# - Fields are processed and displayed in the order of the JSON
+#   source, which may not be the order in the register.
+
+# - Conditional fields may appear multiple times.
+
+# - ... and probably more...
+
+def walknode:
+	def walkjoin(s):
+		map(walknode) | join(s);
+
+        if   (._type == "AST.Identifier" or ._type == "AST.Integer" or
+	      ._type == "Values.Value" or ._type == "AST.Bool" or
+	      ._type == "Types.String") then
+	     	.value
+	elif (._type == "Types.Field") then
+		"\(.value.name).\(.value.field)"
+	elif (._type == "AST.UnaryOp") then
+		"\(.op)(\(.expr | walknode))"
+	elif (._type == "AST.Function") then
+		"\(.name)(\(.arguments | walkjoin(", ")))"
+	elif (._type == "AST.DotAtom") then
+		.values | walkjoin(".")
+	elif (._type == "AST.BinaryOp") then
+		"(\(.left | walknode) \(.op) \(.right | walknode))"
+	elif (._type == "Types.RegisterType") then
+		.name
+	elif (._type == "AST.Type") then
+		"\(.name | walknode)"
+	elif (._type == "AST.Slice") then
+		"\(.left.value):\(.right.value)"
+	elif (._type == "AST.Set") then
+		.values | map(walknode)
+	elif (._type == "AST.Assignment")  then
+		"\(.var | walknode) = \(.val | walknode)"
+	elif (._type == "AST.TypeAnnotation")  then
+		"\(.var | walknode):\(.type | walknode)"
+	elif (._type == "AST.SquareOp") then
+		"\(.var | walknode)[\(.arguments | walkjoin(", "))]"
+	elif (._type == "AST.Return") then
+		"return"
+	elif (._type == "AST.Concat") then
+		"[\(.values | walkjoin(", "))]"
+	elif (._type == "AST.Tuple") then
+		"(\(.values | walkjoin(", ")))"
+	else	# debug catch-all
+		.
+	end;
+
+def range:
+	. as { _type: $type, start: $start, width: $width } |
+	if ($width == 1) then
+		"\($start)"
+	else
+		"\($start + $width - 1):\($start)"
+	end;
+
+def fld:
+	(if (.condstr.text) then "\t\(.condstr.text)"
+	 else "" end) as $cond |
+	"\(.type)\t\(.range | range)\t\(.name)\($cond)";
+
+def condition(source):
+	"# \(source) cond: \(.condition | walknode)";
+
+def unquote:
+	"'" as $q | (ltrimstr($q) | rtrimstr($q));
+
+def binvalue:
+	.value | unquote as $v | "\t0b\($v)\tVAL_\($v)";
+
+def dumpconstants:
+	if   (._type == "Values.Value") then
+		binvalue
+	elif (._type == "Values.ValueRange") then
+		(.start | binvalue), "\t[...]", (.end | binvalue)
+	elif (._type == "Values.ConditionalValue") then
+		"\(.values.values[] | dumpconstants)\t\(condition("Value"))"
+	else	# Debug catch all
+		.
+	end;
+
+def dumpenum:
+	# Things like SMIDR_EL1.Affinity do not describe
+	# the value range, hence the []? hack below.
+	(.value.constraints.values[]? | dumpconstants);
+
+def genarrayelt(n; bpf):
+	"<\(.index_variable)>" as $v |
+	(.rangeset | reverse) as $rs |
+	($rs | length) as $nrs |
+	{
+		_type: (if (bpf > 1) then "Fields.ConstantField"
+			else "Fields.Field" end),
+		name: (.name | sub($v; "\(n)")),
+		rangeset: [
+			{
+				_type: "Range",
+				start: (if ($nrs > 1) then $rs[n].start
+				        else $rs[0].start + n * bpf end),
+				width: bpf
+			}
+		],
+		value: { constraints: .values },
+		condstr: (if (.condstr) then
+			    { text: (.condstr.text | sub($v; "\(n)")) }
+			  else
+			    null
+			  end)
+	};
+
+def genarray:
+	# Oh the fun we're having: convert each element of the array
+	# into its own architectural field, warts and all. Additional
+	# fun is provided to compute the number of bits per fields,
+	# as the elements can be spread over multiple rangesets.
+	. as $field |
+	.indexes[0].width as $nr |
+	((reduce .rangeset[].width as $sz (0; . + $sz)) / $nr) as $bpf |
+	[ range(0; $nr) ] | reverse | map(. as $n | $field | genarrayelt($n; $bpf));
+
+# For each range of a field, unpack it as start and width, and
+# apply it to each range of the parent field (used as a base).
+# Although this can result in a combinatorial explosion, the
+# likely case is that one of the two sets is of size one.
+def mergerangesets(base):
+	.[] |
+	.start as $s |
+	.width as $w |
+	base | map({
+			_type: "Range",
+			start: (.start + $s),
+			width: ([ $w, .width ] | min)
+		   });
+
+def depthstr(depth):
+	[ range(0; depth) ] | map(32, 32) | implode;
+
+def walkfields(depth):
+	depthstr(depth) as $dep |
+	if   (._type == "Fields.Reserved" and .value == "RES0") then
+		{ type: "Res0", name: "", range: .rangeset[] } |
+		"\($dep)\(fld)"
+	elif (._type == "Fields.Reserved" and .value == "RES1") then
+		{ type: "Res1", name: "", range: .rangeset[] } |
+		"\($dep)\(fld)"
+	elif (._type == "Fields.ConditionalField") then
+		# Propagate the condition text over all conditional
+		# fields by injecting a new ".condstr.text" field.
+		# Also, the ranges must be combined as they nest.
+		.rangeset as $r |
+		.fields | map(condition("Field") as $c |
+			      .field.condstr |= { text: $c }) |
+			  map(.field.rangeset |= mergerangesets($r)) |
+		.[] | .field | walkfields(depth)
+	elif (._type == "Fields.Dynamic") then
+		({ type: "Field", name: .name, range: .rangeset[], condstr: .condstr } | fld),
+	     	(.rangeset as $r | .instances[] |
+		 ((.display // .name // "Instance") as $src |
+		  "\(depthstr(depth + 1))\(condition($src))",
+		  # Remap the rangesets to display the absolute range
+		  (.values | map(.rangeset |= mergerangesets($r)) |
+		   .[] | walkfields(depth + 1))))
+	elif (._type == "Fields.ConstantField") then
+		({ type: "UnsignedEnum", name: .name, range: .rangeset[], condstr: .condstr } |
+		 "\($dep)\(fld)"),
+		dumpenum,
+		"EndEnum"
+	elif (._type == "Fields.Field") then
+		{ type: "Field", name: .name, range: .rangeset[], condstr: .condstr } |
+		"\($dep)\(fld)"
+	elif (._type == "Fields.Reserved") then
+		{ type: "Field", name: .value, range: .rangeset[], condstr: .condstr } |
+		"\($dep)\(fld)"
+	elif (._type == "Fields.ImplementationDefined") then
+		{ type: "Field", name: (.name // "IMPDEF"), range: .rangeset[], condstr: .condstr } |
+		"\($dep)\(fld)"
+	elif (._type == "Fields.Array" or ._type == "Fields.Vector") then
+	     	genarray | .[] | walkfields(depth)
+	else	# Debug catch all
+		.
+	end;
+
+def tautology:
+	(.condition.value == true);
+
+def walkreg:
+	(.fieldsets | length) as $l |
+	.fieldsets[] |
+	  (if ($l > 1 or (tautology | not)) then condition("Fieldset") else empty end),
+	  (.values[] | walkfields(0));
+
+def bin_to_i:
+	def bintoi:
+		(length - 1) as $e |
+		((.[0] - 48) * ($e | exp2)) + (if ($e > 0) then .[1:] | bintoi
+				     	       else 0 end);
+	explode | bintoi;
+
+def computeencoding:
+	if (.) then
+		if   (._type == "Values.Value") then .value | unquote | bin_to_i
+		elif (._type == "Values.Group") then .value
+		elif (._type == "Values.EquationValue") then "\(.value)[\(.slice[] | range)]"
+		else . # Debug catch all
+		end
+	else
+		"#Imm"
+	end;
+
+def encodings:
+	.encodings | [ .op0, .op1, .CRn, .CRm, .op2 ] | map(computeencoding);
+
+def accessorencoding:
+	(.name | ltrimstr("A64.")) as $name |
+	.encoding[] | "\(.asmvalue)\t\(encodings)\t\($name)";
+
+def accessors:
+	.accessors[] |
+	accessorencoding;
+
+def regcondition:
+	if (tautology | not) then condition("Reg") else empty end;
+
+.[] | select (._type == "Register" or ._type == "RegisterArray") |
+      select (.state == "AArch64" and
+	      ($ARGS.named.REG == null or .name == $ARGS.named.REG)) |
+      "# \(.name)",accessors,regcondition,walkreg
-- 
2.39.2

