* [PATCH v2] arm64: Add basic JSON register parser
From: Marc Zyngier @ 2025-01-02 14:43 UTC
  To: linux-arm-kernel, kvmarm
  Cc: Mark Rutland, Catalin Marinas, Will Deacon, Mark Brown

We currently populate the sysreg file by hand from the ARM ARM,
resulting in a bunch of errors being introduced on a regular basis.
While there is an XML dump of the architecture produced on a quarterly
basis, the license that comes attached to it excludes any sort of
open-source usage.

However, ARM has recently made available a JSON dump[1] that contains
a reduced set of information under a BSD license. This has enough
data to extract what is relevant to the sysreg file.

This is achieved using a JQ script that I cobbled together over
the holiday, and while it has a number of limitations, it already
works well enough to extract useful data.

As an example, here's what the script returns for TCR_EL1:

$ jq -r --arg REG TCR_EL1 -f arch/arm64/tools/dumpreg.jq ~/Work/XML/2024-12/AARCHMRS_BSD_A_profile/Registers.json
TCR_EL1	[3,0,2,0,2]	MRS
TCR_EL1	[3,0,2,0,2]	MSRregister
TCR_EL12	[3,5,2,0,2]	MRS
TCR_EL12	[3,5,2,0,2]	MSRregister
TCRALIAS_EL1	[3,0,2,7,6]	MRS
TCRALIAS_EL1	[3,0,2,7,6]	MSRregister
Res0	63:62
Field	61	MTX1	# Field cond: (IsFeatureImplemented(FEAT_MTE_NO_ADDRESS_TAGS) || IsFeatureImplemented(FEAT_MTE_CANONICAL_TAGS))
Field	60	MTX0	# Field cond: (IsFeatureImplemented(FEAT_MTE_NO_ADDRESS_TAGS) || IsFeatureImplemented(FEAT_MTE_CANONICAL_TAGS))
Field	59	DS	# Field cond: (IsFeatureImplemented(FEAT_LPA2) && (!IsFeatureImplemented(FEAT_D128) || (AArch64 TCR2_EL1.D128 == '0')))
Field	59	DS	# Field cond: true
Field	58	TCMA1	# Field cond: IsFeatureImplemented(FEAT_MTE2)
Field	57	TCMA0	# Field cond: IsFeatureImplemented(FEAT_MTE2)
Field	56	E0PD1	# Field cond: IsFeatureImplemented(FEAT_E0PD)
Field	55	E0PD0	# Field cond: IsFeatureImplemented(FEAT_E0PD)
Field	54	NFD1	# Field cond: (IsFeatureImplemented(FEAT_SVE) || IsFeatureImplemented(FEAT_TME))
Field	53	NFD0	# Field cond: (IsFeatureImplemented(FEAT_SVE) || IsFeatureImplemented(FEAT_TME))
Field	52	TBID1	# Field cond: IsFeatureImplemented(FEAT_PAuth)
Field	51	TBID0	# Field cond: IsFeatureImplemented(FEAT_PAuth)
Field	50	HWU162	# Field cond: IsFeatureImplemented(FEAT_HPDS2)
Field	49	HWU161	# Field cond: IsFeatureImplemented(FEAT_HPDS2)
Field	48	HWU160	# Field cond: IsFeatureImplemented(FEAT_HPDS2)
Field	47	HWU159	# Field cond: IsFeatureImplemented(FEAT_HPDS2)
Field	46	HWU062	# Field cond: IsFeatureImplemented(FEAT_HPDS2)
Field	45	HWU061	# Field cond: IsFeatureImplemented(FEAT_HPDS2)
Field	44	HWU060	# Field cond: IsFeatureImplemented(FEAT_HPDS2)
Field	43	HWU059	# Field cond: IsFeatureImplemented(FEAT_HPDS2)
Field	42	HPD1	# Field cond: IsFeatureImplemented(FEAT_HPDS)
Field	41	HPD0	# Field cond: IsFeatureImplemented(FEAT_HPDS)
Field	40	HD	# Field cond: IsFeatureImplemented(FEAT_HAFDBS)
Field	39	HA	# Field cond: IsFeatureImplemented(FEAT_HAFDBS)
Field	38	TBI1
Field	37	TBI0
Field	36	AS
Res0	35
Field	34:32	IPS
Field	31:30	TG1
Field	29:28	SH1
Field	27:26	ORGN1
Field	25:24	IRGN1
Field	23	EPD1
Field	22	A1
Field	21:16	T1SZ
Field	15:14	TG0
Field	13:12	SH0
Field	11:10	ORGN0
Field	9:8	IRGN0
Field	7	EPD0
Res0	6
Field	5:0	T0SZ
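The bit positions in this output come from the script's `range` filter, which turns each JSON `{start, width}` pair into the sysreg file's `msb:lsb` notation (or a single bit number for 1-bit fields). A rough Python equivalent for readers less fluent in jq; the dict shape mirrors the JSON `Range` objects the script consumes, and the function name is illustrative only:

```python
def format_range(rng: dict) -> str:
    """Render a {"start": lsb, "width": n} range as sysreg-style text.

    Single-bit fields print as one bit number; wider fields print as
    "msb:lsb", with msb = start + width - 1 (as in dumpreg.jq's range).
    """
    start, width = rng["start"], rng["width"]
    if width == 1:
        return f"{start}"
    return f"{start + width - 1}:{start}"


# TCR_EL1.IPS occupies bits [34:32]; TCR_EL1.TBI1 is bit 38.
print(format_range({"start": 32, "width": 3}))  # 34:32
print(format_range({"start": 38, "width": 1}))  # 38
```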

I completely expect this to be quickly rewritten by people who know
what they are doing (I don't) and improved as we understand more
of the data model.
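The `# Field cond:` annotations in the example output are produced by `walknode`, which recursively renders the JSON expression tree back into text. A minimal Python sketch of the same idea, covering only the node types needed for the conditions shown above; the sample node is hand-written, using only the keys (`_type`, `op`, `left`, `right`, `name`, `arguments`, `value`, `expr`) that the jq script itself reads, and is not copied from Registers.json:

```python
def render(node: dict) -> str:
    """Render a condition AST node as text, roughly like walknode."""
    kind = node.get("_type")
    if kind in ("AST.Identifier", "AST.Integer", "AST.Bool",
                "Values.Value", "Types.String"):
        value = node["value"]
        if isinstance(value, bool):
            # jq interpolates JSON booleans as lowercase true/false
            return "true" if value else "false"
        return str(value)
    if kind == "AST.Function":
        args = ", ".join(render(a) for a in node["arguments"])
        return f"{node['name']}({args})"
    if kind == "AST.BinaryOp":
        return f"({render(node['left'])} {node['op']} {render(node['right'])})"
    if kind == "AST.UnaryOp":
        return f"{node['op']}({render(node['expr'])})"
    if kind == "Types.Field":
        return f"{node['value']['name']}.{node['value']['field']}"
    # The jq version passes unknown nodes through for debugging instead
    raise ValueError(f"unhandled node type: {kind}")


def feat(name: str) -> dict:
    # Hypothetical helper building an IsFeatureImplemented(...) call node
    return {"_type": "AST.Function", "name": "IsFeatureImplemented",
            "arguments": [{"_type": "AST.Identifier", "value": name}]}


cond = {"_type": "AST.BinaryOp", "op": "||",
        "left": feat("FEAT_MTE_NO_ADDRESS_TAGS"),
        "right": feat("FEAT_MTE_CANONICAL_TAGS")}
print(render(cond))
```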

[1] https://developer.arm.com/-/cdn-downloads/permalink/Exploration-Tools-OS-Machine-Readable-Data/AARCHMRS_BSD/AARCHMRS_BSD_A_profile-2024-12.tar.gz

Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
---

Notes:
    - From v1:
    
      - Fix the accessor encoding order
      - Handling of nested fields, arrays and vectors
      - Plenty of additional JSON handling
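For reference, the accessor encodings are printed in [op0, op1, CRn, CRm, op2] order, with each operand held in the JSON as a quoted binary string that the script unquotes and converts with `bin_to_i`. A small Python sketch of that plain-value case only (the script also handles `Values.Group`, `Values.EquationValue` and missing operands); the helper names and the input dict are illustrative assumptions, not the JSON schema:

```python
def bin_to_int(quoted: str) -> int:
    """Convert a quoted binary string such as "'0010'" to an int."""
    return int(quoted.strip("'"), 2)


def encoding(enc: dict) -> list[int]:
    """Return the operands in the order used by the sysreg file."""
    order = ("op0", "op1", "CRn", "CRm", "op2")
    return [bin_to_int(enc[key]) for key in order]


# TCR_EL1 is encoded as op0=3, op1=0, CRn=2, CRm=0, op2=2.
tcr_el1 = {"op0": "'11'", "op1": "'000'", "CRn": "'0010'",
           "CRm": "'0000'", "op2": "'010'"}
print(encoding(tcr_el1))  # [3, 0, 2, 0, 2]
```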

 arch/arm64/tools/dumpreg.jq | 258 ++++++++++++++++++++++++++++++++++++
 1 file changed, 258 insertions(+)
 create mode 100644 arch/arm64/tools/dumpreg.jq

diff --git a/arch/arm64/tools/dumpreg.jq b/arch/arm64/tools/dumpreg.jq
new file mode 100644
index 0000000000000..efb198066820f
--- /dev/null
+++ b/arch/arm64/tools/dumpreg.jq
@@ -0,0 +1,258 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# dumpreg.jq: JSON arm64 system register data extractor
+#
+# Author: Marc Zyngier <maz@kernel.org>
+#
+# Usage: jq -r --arg REG "XZY_ELx" -f ./dumpreg.jq Registers.json
+
+# Dump a set of semi-pertinent information (encodings, fields,
+# conditions, field position and width) about register XZY_ELx as
+# contained in ARM's AARCHMRS_BSD_A_profile JSON tarball.
+
+# Not setting REG will dump the whole register file in one go. While
+# this is entertaining, it isn't very useful.
+
+# This can/should be used to populate the arch/arm64/tools/sysreg
+# file, instead of copying things by hand.
+
+# The tool currently has a bunch of limitations that users need to be
+# aware of, but none that should have a major impact on the usability:
+
+# - All accessors are shown, irrespective of the conditions in which
+#   the accessors are actually available
+
+# - All Fields.ConstantField are displayed as UnsignedEnum,
+#   irrespective of the signedness of the field (as the JSON doesn't
+#   carry this information).
+
+# - Value ranges are displayed using '[...]'.
+
+# - Fields are processed and displayed in the order of the JSON
+#   source, which may not be the order in the register.
+
+# - Conditional fields may appear multiple times.
+
+# - ... and probably more...
+
+def walknode:
+	def walkjoin(s):
+		map(walknode) | join(s);
+
+	if   (._type == "AST.Identifier" or ._type == "AST.Integer" or
+	      ._type == "Values.Value" or ._type == "AST.Bool" or
+	      ._type == "Types.String") then
+		.value
+	elif (._type == "Types.Field") then
+		"\(.value.name).\(.value.field)"
+	elif (._type == "AST.UnaryOp") then
+		"\(.op)(\(.expr | walknode))"
+	elif (._type == "AST.Function") then
+		"\(.name)(\(.arguments | walkjoin(", ")))"
+	elif (._type == "AST.DotAtom") then
+		.values | walkjoin(".")
+	elif (._type == "AST.BinaryOp") then
+		"(\(.left | walknode) \(.op) \(.right | walknode))"
+	elif (._type == "Types.RegisterType") then
+		.name
+	elif (._type == "AST.Type") then
+		"\(.name | walknode)"
+	elif (._type == "AST.Slice") then
+		"\(.left.value):\(.right.value)"
+	elif (._type == "AST.Set") then
+		.values | map(walknode)
+	elif (._type == "AST.Assignment")  then
+		"\(.var | walknode) = \(.val | walknode)"
+	elif (._type == "AST.TypeAnnotation")  then
+		"\(.var | walknode):\(.type | walknode)"
+	elif (._type == "AST.SquareOp") then
+		"\(.var | walknode)[\(.arguments | walkjoin(", "))]"
+	elif (._type == "AST.Return") then
+		"return"
+	elif (._type == "AST.Concat") then
+		"[\(.values | walkjoin(", "))]"
+	elif (._type == "AST.Tuple") then
+		"(\(.values | walkjoin(", ")))"
+	else	# debug catch-all
+		.
+	end;
+
+def range:
+	. as { _type: $type, start: $start, width: $width } |
+	if ($width == 1) then
+		"\($start)"
+	else
+		"\($start + $width - 1):\($start)"
+	end;
+
+def fld:
+	(if (.condstr.text) then "\t\(.condstr.text)"
+	 else "" end) as $cond |
+	"\(.type)\t\(.range | range)\t\(.name)\($cond)";
+
+def condition(source):
+	"# \(source) cond: \(.condition | walknode)";
+
+def unquote:
+	"'" as $q | (ltrimstr($q) | rtrimstr($q));
+
+def binvalue:
+	.value | unquote as $v | "\t0b\($v)\tVAL_\($v)";
+
+def dumpconstants:
+	if   (._type == "Values.Value") then
+		binvalue
+	elif (._type == "Values.ValueRange") then
+		(.start | binvalue), "\t[...]", (.end | binvalue)
+	elif (._type == "Values.ConditionalValue") then
+		"\(.values.values[] | dumpconstants)\t\(condition("Value"))"
+	else	# Debug catch all
+		.
+	end;
+
+def dumpenum:
+	# Things like SMIDR_EL1.Affinity do not describe
+	# the value range, hence the []? hack below.
+	(.value.constraints.values[]? | dumpconstants);
+
+def genarrayelt(n; bpf):
+	"<\(.index_variable)>" as $v |
+	(.rangeset | reverse) as $rs |
+	($rs | length) as $nrs |
+	{
+		_type: (if (bpf > 1) then "Fields.ConstantField"
+			else "Fields.Field" end),
+		name: (.name | sub($v; "\(n)")),
+		rangeset: [
+			{
+				_type: "Range",
+				start: (if ($nrs > 1) then $rs[n].start
+				        else $rs[0].start + n * bpf end),
+				width: bpf
+			}
+		],
+		value: { constraints: .values },
+		condstr: (if (.condstr) then
+			    { text: (.condstr.text | sub($v; "\(n)")) }
+			  else
+			    null
+			  end)
+	};
+
+def genarray:
+	# Oh the fun we're having: convert each element of the array
+	# into its own architectural field, warts and all. Additional
+# fun is provided to compute the number of bits per field,
+	# as the elements can be spread over multiple rangesets.
+	. as $field |
+	.indexes[0].width as $nr |
+	((reduce .rangeset[].width as $sz (0; . + $sz)) / $nr) as $bpf |
+	[ range(0; $nr) ] | reverse | map(. as $n | $field | genarrayelt($n; $bpf));
+
+# For each range of a field, unpack it as start and width, and
+# apply it to each range of the parent field (used as a base).
+# Although this can result in a combinatorial explosion, the
+# likely case is that one of the two sets is of size one.
+def mergerangesets(base):
+	.[] |
+	.start as $s |
+	.width as $w |
+	base | map({
+			_type: "Range",
+			start: (.start + $s),
+			width: ([ $w, .width ] | min)
+		   });
+
+def depthstr(depth):
+	[ range(depth) ] | map(32, 32) | implode;
+
+def walkfields(depth):
+	depthstr(depth) as $dep |
+	if   (._type == "Fields.Reserved" and .value == "RES0") then
+		{ type: "Res0", name: "", range: .rangeset[] } |
+		"\($dep)\(fld)"
+	elif (._type == "Fields.Reserved" and .value == "RES1") then
+		{ type: "Res1", name: "", range: .rangeset[] } |
+		"\($dep)\(fld)"
+	elif (._type == "Fields.ConditionalField") then
+		# Propagate the condition text over all conditional
+		# fields by injecting a new ".condstr.text" field.
+		# Also, the ranges must be combined as they nest.
+		.rangeset as $r |
+		.fields | map(condition("Field") as $c |
+			      .field.condstr |= { text: $c }) |
+			  map(.field.rangeset |= mergerangesets($r)) |
+		.[] | .field | walkfields(depth)
+	elif (._type == "Fields.Dynamic") then
+		({ type: "Field", name: .name, range: .rangeset[], condstr: .condstr } | fld),
+		(.rangeset as $r | .instances[] |
+		 ((.display // .name // "Instance") as $src |
+		  "\(depthstr(depth + 1))\(condition($src))",
+		  # Remap the rangesets to display the absolute range
+		  (.values | map(.rangeset |= mergerangesets($r)) |
+		   .[] | walkfields(depth + 1))))
+	elif (._type == "Fields.ConstantField") then
+		({ type: "UnsignedEnum", name: .name, range: .rangeset[], condstr: .condstr } |
+		 "\($dep)\(fld)"),
+		dumpenum,
+		"EndEnum"
+	elif (._type == "Fields.Field") then
+		{ type: "Field", name: .name, range: .rangeset[], condstr: .condstr } |
+		"\($dep)\(fld)"
+	elif (._type == "Fields.Reserved") then
+		{ type: "Field", name: .value, range: .rangeset[], condstr: .condstr } |
+		"\($dep)\(fld)"
+	elif (._type == "Fields.ImplementationDefined") then
+		{ type: "Field", name: (.name // "IMPDEF"), range: .rangeset[], condstr: .condstr } |
+		"\($dep)\(fld)"
+	elif (._type == "Fields.Array" or ._type == "Fields.Vector") then
+		genarray | .[] | walkfields(depth)
+	else	# Debug catch all
+		.
+	end;
+
+def tautology:
+	(.condition.value == true);
+
+def walkreg:
+	(.fieldsets | length) as $l |
+	.fieldsets[] |
+	  (if ($l > 1 or (tautology | not)) then condition("Fieldset") else empty end),
+	  (.values[] | walkfields(0));
+
+def bin_to_i:
+	def bintoi:
+		(length - 1) as $e |
+		((.[0] - 48) * ($e | exp2)) + (if ($e > 0) then .[1:] | bintoi
+				     	       else 0 end);
+	explode | bintoi;
+
+def computeencoding:
+	if (.) then
+		if   (._type == "Values.Value") then .value | unquote | bin_to_i
+		elif (._type == "Values.Group") then .value
+		elif (._type == "Values.EquationValue") then "\(.value)[\(.slice[] | range)]"
+		else . # Debug catch all
+		end
+	else
+		"#Imm"
+	end;
+
+def encodings:
+	.encodings | [ .op0, .op1, .CRn, .CRm, .op2 ] | map(computeencoding);
+
+def accessorencoding:
+	(.name | ltrimstr("A64.")) as $name |
+	.encoding[] | "\(.asmvalue)\t\(encodings)\t\($name)";
+
+def accessors:
+	.accessors[] |
+	accessorencoding;
+
+def regcondition:
+	if (tautology | not) then condition("Reg") else empty end;
+
+.[] | select (._type == "Register" or ._type == "RegisterArray") |
+      select (.state == "AArch64" and
+	      ($ARGS.named.REG == null or .name == $ARGS.named.REG)) |
+      "# \(.name)",accessors,regcondition,walkreg
-- 
2.39.2
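The array/vector expansion in `genarray` is probably the least obvious part of the script: the total width of the field's rangesets is divided by the number of indexes to get the bits per element, and each element then gets its own start bit. The Python sketch below mirrors that arithmetic under the same assumptions the jq code makes (evenly sized elements; either one contiguous rangeset or one rangeset per element). The field name `ELT<m>` and its layout are made up for illustration:

```python
def expand_array(name: str, index_var: str, rangeset: list[dict],
                 count: int) -> list[tuple[str, int, int]]:
    """Split an array field into (name, start, width) per element.

    Mirrors genarray/genarrayelt: bits per element is the summed width
    of all rangesets divided by the number of elements. With a single
    rangeset, element n starts at base + n * bpf; with several, element
    n uses the n-th rangeset of the reversed list.
    """
    total = sum(r["width"] for r in rangeset)
    if total % count:
        raise ValueError("elements are not evenly sized")
    bpf = total // count
    rs = list(reversed(rangeset))
    placeholder = f"<{index_var}>"
    elements = []
    # Highest index first, matching the script's reversed range().
    for n in reversed(range(count)):
        if len(rs) > 1:
            start = rs[n]["start"]
        else:
            start = rs[0]["start"] + n * bpf
        elements.append((name.replace(placeholder, str(n), 1), start, bpf))
    return elements


# Hypothetical 16-bit field holding four 4-bit elements ELT0..ELT3.
for elt in expand_array("ELT<m>", "m", [{"start": 0, "width": 16}], 4):
    print(elt)
```

With a single rangeset this prints `('ELT3', 12, 4)` down to `('ELT0', 0, 4)`, which the script would then display as `15:12` ... `3:0`.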



* Re: [PATCH v2] arm64: Add basic JSON register parser
From: Mark Brown @ 2025-01-06 15:19 UTC
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, Mark Rutland, Catalin Marinas,
	Will Deacon


On Thu, Jan 02, 2025 at 02:43:39PM +0000, Marc Zyngier wrote:
> We currently populate the sysreg file by hand from the ARM ARM,
> resulting in a bunch of errors being introduced on a regular basis.
> While there is an XML dump of the architecture produced on a quarterly
> basis, the license that comes attached to it excludes any sort of
> open-source usage.

...

> I completely expect this to be quickly rewritten by people who know
> what they are doing (I don't) and improved as we understand more
> of the data model.

Thanks for jumping on this so quickly.  

Reviewed-by: Mark Brown <broonie@kernel.org>

to the extent I understand jq, and it seems to be doing sensible things
for the registers I've tried it with.  This is going to be useful for
getting started and it's helpful to see the feature dependencies.  Like
you suggest it's a good basis for further development, even if we also
get some other tools that won't stop this being useful.

What I was thinking with this stuff was to use something like Python and
parse the files into data structures in memory before outputting them,
then we can hopefully use that for things like diffing two architecture
versions while handling cases where we've intentionally diverged from
how the architecture describes things.  It'd be nice to have something
that'd go through and check for updates to registers we already have
mapped and provide an updated sysreg file.  It might however just be
easier to use something like this and then work with sysreg files using
text processing tools.
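As a rough illustration of that approach, the script's tab-separated output can already be loaded into a Python structure and compared across two architecture releases. This sketch only looks at `Field` lines (accessor, Res0/Res1, enum and comment lines are skipped), keys them by name, and reports fields that were added, removed or moved. The function names and the two sample dumps are hypothetical, and a conditional field that appears more than once (such as TCR_EL1.DS) simply keeps its last position:

```python
def parse_fields(dump: str) -> dict[str, str]:
    """Map field name -> bit range from dumpreg.jq output text.

    Expected line shape: Field<TAB>range<TAB>name, optionally followed
    by a tab and a "# ... cond:" comment. Leading indentation (used for
    nested fields) is stripped.
    """
    fields = {}
    for line in dump.splitlines():
        cols = line.strip().split("\t")
        if len(cols) >= 3 and cols[0] == "Field":
            fields[cols[2]] = cols[1]
    return fields


def diff_fields(old: str, new: str) -> dict[str, list[str]]:
    """Compare two dumps of the same register."""
    a, b = parse_fields(old), parse_fields(new)
    return {
        "added": sorted(b.keys() - a.keys()),
        "removed": sorted(a.keys() - b.keys()),
        "moved": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }


# Made-up dumps, only to exercise the comparison.
old = "Field\t38\tTBI1\nField\t37\tTBI0\nRes0\t35\t\n"
new = ("Field\t38\tTBI1\nField\t36\tTBI0\n"
       "Field\t61\tMTX1\t# Field cond: IsFeatureImplemented(FEAT_MTE2)\n")
print(diff_fields(old, new))
# {'added': ['MTX1'], 'removed': [], 'moved': ['TBI0']}
```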


* Re: [PATCH v2] arm64: Add basic JSON register parser
From: Marc Zyngier @ 2025-01-06 16:30 UTC
  To: Mark Brown
  Cc: linux-arm-kernel, kvmarm, Mark Rutland, Catalin Marinas,
	Will Deacon

On Mon, 06 Jan 2025 15:19:55 +0000,
Mark Brown <broonie@kernel.org> wrote:
> 
> On Thu, Jan 02, 2025 at 02:43:39PM +0000, Marc Zyngier wrote:
> > We currently populate the sysreg file by hand from the ARM ARM,
> > resulting in a bunch of errors being introduced on a regular basis.
> > While there is an XML dump of the architecture produced on a quarterly
> > basis, the license that comes attached to it excludes any sort of
> > open-source usage.
> 
> ...
> 
> > I completely expect this to be quickly rewritten by people who know
> > what they are doing (I don't) and improved as we understand more
> > of the data model.
> 
> Thanks for jumping on this so quickly.

Things you do when you're drunk...

> Reviewed-by: Mark Brown <broonie@kernel.org>
> 
> to the extent I understand jq, and it seems to be doing sensible things
> for the registers I've tried it with.  This is going to be useful for
> getting started and it's helpful to see the feature dependencies.  Like
> you suggest it's a good basis for further development, even if we also
> get some other tools that won't stop this being useful.
> 
> What I was thinking with this stuff was to use something like Python and
> parse the files into data structures in memory before outputting them,

The moment you use a general-purpose language, you end up reinventing
JSON in another way, and I didn't have the courage to do that. Also,
jq feels like an exotic mix of functional programming and Forth,
something toxic enough to be irresistible.

> then we can hopefully use that for things like diffing two architecture
> versions while handling cases where we've intentionally diverged from
> how the architecture describes things. It'd be nice to have something
> that'd go through and check for updates to registers we already have
> mapped and provide an updated sysreg file.  It might however just be
> easier to use something like this and then work with sysreg files using
> text processing tools.

My current plan is somewhat different: extract all the trapping
information for each register handled by KVM, and *automatically*
generate tests that check that KVM is doing the right thing. Or rather
convince someone to do that.

Then get rid of all such manually implemented tests from the kernel,
once and for all.

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [PATCH v2] arm64: Add basic JSON register parser
From: Mark Brown @ 2025-01-06 18:06 UTC
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, Mark Rutland, Catalin Marinas,
	Will Deacon


On Mon, Jan 06, 2025 at 04:30:18PM +0000, Marc Zyngier wrote:

> The moment you use a general-purpose language, you end up reinventing
> JSON in another way, and I didn't have the courage to do that. Also,
> jq feels like an exotic mix of functional programming and Forth,
> something toxic enough to be irresistible.

Yeah, it's super neat for the inline processing stuff.

> Mark Brown <broonie@kernel.org> wrote:

> > then we can hopefully use that for things like diffing two architecture
> > versions while handling cases where we've intentionally diverged from
> > how the architecture describes things. It'd be nice to have something
> > that'd go through and check for updates to registers we already have
> > mapped and provide an updated sysreg file.  It might however just be
> > easier to use something like this and then work with sysreg files using
> > text processing tools.

> My current plan is somewhat different: extract all the trapping
> information for each register handled by KVM, and *automatically*
> generate tests that check that KVM is doing the right thing. Or rather
> convince someone to do that.

> Then get rid of all such manually implemented tests from the kernel,
> once and for all.

That would be awesome - there's far too much stuff that needs typing in
there.  There's a bunch of these places where we either already use
register data or could use it if it required less typing/verification to
do so, the ideas I mentioned were just the first things that popped into
my mind.  Having a reliable machine readable data source for this
information is going to be *so* helpful.
