Devicetree

Devicetree
 help / color / mirror / Atom feed

* Re: [PATCH v5 02/10] dt-bindings: mailbox: Add mboxes property for CMDQ secure driver
From: Jason-JH Lin (林睿祥) @ 2024-04-04  4:31 UTC (permalink / raw)
  To: Shawn Sung (宋孝謙), conor@kernel.org
  Cc: linux-kernel@vger.kernel.org, linux-mediatek@lists.infradead.org,
	Houlong Wei (魏厚龙),
	devicetree@vger.kernel.org, CK Hu (胡俊光),
	conor+dt@kernel.org, robh@kernel.org,
	linux-arm-kernel@lists.infradead.org,
	krzysztof.kozlowski+dt@linaro.org, matthias.bgg@gmail.com,
	jassisinghbrar@gmail.com, angelogioacchino.delregno@collabora.com
In-Reply-To: <20240403-conflict-detest-717b4175a00c@spud>

Hi Conor,

Thanks for the reviews.

On Wed, 2024-04-03 at 16:46 +0100, Conor Dooley wrote:
> On Wed, Apr 03, 2024 at 06:25:54PM +0800, Shawn Sung wrote:
> > From: "Jason-JH.Lin" <jason-jh.lin@mediatek.com>
> > 
> > Add mboxes to define a GCE loopping thread as a secure irq handler.
> > This property is only required if CMDQ secure driver is supported.
> > 
> > Signed-off-by: Jason-JH.Lin <jason-jh.lin@mediatek.com>
> > Signed-off-by: Hsiao Chien Sung <shawn.sung@mediatek.com>
> > ---
> >  .../bindings/mailbox/mediatek,gce-mailbox.yaml         | 10
> > ++++++++++
> >  1 file changed, 10 insertions(+)
> > 
> > diff --git
> > a/Documentation/devicetree/bindings/mailbox/mediatek,gce-
> > mailbox.yaml
> > b/Documentation/devicetree/bindings/mailbox/mediatek,gce-
> > mailbox.yaml
> > index cef9d76013985..c0d80cc770899 100644
> > --- a/Documentation/devicetree/bindings/mailbox/mediatek,gce-
> > mailbox.yaml
> > +++ b/Documentation/devicetree/bindings/mailbox/mediatek,gce-
> > mailbox.yaml
> > @@ -49,6 +49,16 @@ properties:
> >      items:
> >        - const: gce
> >  
> > +  mediatek,gce-events:
> > +    description:
> > +      The event id which is mapping to the specific hardware event
> > signal
> > +      to gce. The event id is defined in the gce header
> > +      include/dt-bindings/gce/<chip>-gce.h of each chips.
> 
> Missing any info here about when this should be used, hint - you have
> it
> in the commit message.
> 
> > +    $ref: /schemas/types.yaml#/definitions/uint32-arrayi
> 
> Why is the ID used by the CMDQ service not fixed for each SoC?
> 
I forgot to sync with Shawn about this:
https://lore.kernel.org/all/20240124011459.12204-1-jason-
jh.lin@mediatek.com

I'll fix it at the next version.

Regards,
Jason-JH.Lin

> Cheers,
> Conor

^ permalink raw reply

* Re: [PATCH v6 1/2] dt-bindings: usb: Add the binding example for the Genesys Logic GL3523 hub
From: Anand Moon @ 2024-04-04  4:27 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: Rob Herring, Greg Kroah-Hartman, Krzysztof Kozlowski,
	Conor Dooley, Icenowy Zheng, Neil Armstrong, linux-amlogic,
	Conor Dooley, linux-usb, devicetree, linux-kernel
In-Reply-To: <CANAwSgS8ip+FvuvgusjNwnVL5Z68PRmEdwfQxhst_ZoVZFoFNw@mail.gmail.com>

Hi Krzysztof,

On Tue, 12 Dec 2023 at 18:47, Anand Moon <linux.amoon@gmail.com> wrote:
>
> Hi Krzysztof,
>
> On Tue, 12 Dec 2023 at 18:39, Krzysztof Kozlowski
> <krzysztof.kozlowski@linaro.org> wrote:
> >
> > On 12/12/2023 13:51, Anand Moon wrote:
> > > Hi Krzysztof,
> > >
> > > On Tue, 12 Dec 2023 at 17:22, Krzysztof Kozlowski
> > > <krzysztof.kozlowski@linaro.org> wrote:
> > >>
> > >> On 12/12/2023 12:37, Anand Moon wrote:
> > >>>
> > >>> Here is the list of warnings I observed with this patch
> > >>>
> > >>>   DTC_CHK Documentation/devicetree/bindings/usb/nvidia,tegra186-xusb.example.dtb
> > >>> /home/amoon/mainline/linux-amlogic-6.y-devel/Documentation/devicetree/bindings/usb/usb-device.example.dtb:
> > >>> hub@1: 'vdd-supply' is a required property
> > >>
> > >> You always require the property, but it is not valid for some devices.
> > >> Just require it only where it is applicable (in if:then: clause).
> > >>
> > > I had already done this check many times before.
> >
> > I don't ask you to check. I ask you to change the code.
> >
> I have tried this and it's not working for me.
>
> > > my v6 original patch was doing the same and it passed all the tests
> > > but since I updated the required field it not parsing correctly.
> >
> > Your original v6 patch was different. I don't understand what you are
> > trying to achieve. Or rather: how is it different, that my simple advice
> > above does not work for you  (as in the past you reply with some really
> > unrelated sentence).
> >
> Ok, It's my poor English grammar, thanks for your review comments.
>
> > Best regards,
> > Krzysztof
> >

Any reason this device tree binding got removed,I cannot find this file
Can not find the commit which removed this file.

[0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/usb?h=v6.9-rc2

Thanks
-Anand

^ permalink raw reply

* Re: [PATCH] dt-bindings: extcon: ptn5150: Document the 'port' node
From: Frank Li @ 2024-04-04  4:06 UTC (permalink / raw)
  To: Fabio Estevam
  Cc: cw00.choi, krzk, myungjoo.ham, robh, conor+dt, devicetree, marex,
	Fabio Estevam
In-Reply-To: <20240404022943.528293-1-festevam@gmail.com>

On Wed, Apr 03, 2024 at 11:29:43PM -0300, Fabio Estevam wrote:
> From: Fabio Estevam <festevam@denx.de>
> 
> Doument the port node to link the PTN5150 to a TypeC controller.
> 
> This fixes the following dt-schema warnings:
> 
> imx8mp-dhcom-pdk3.dtb: typec@3d: 'port' does not match any of the regexes: 'pinctrl-[0-9]+'
> 	from schema $id: http://devicetree.org/schemas/extcon/extcon-ptn5150.yaml#
> 
> Signed-off-by: Fabio Estevam <festevam@denx.de>

Thanks, I met the same issue.

Reviewed-by: Frank Li <Frank.Li@nxp.com>

> ---
>  .../devicetree/bindings/extcon/extcon-ptn5150.yaml    | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/extcon/extcon-ptn5150.yaml b/Documentation/devicetree/bindings/extcon/extcon-ptn5150.yaml
> index d5cfa32ea52d..3472c69056ac 100644
> --- a/Documentation/devicetree/bindings/extcon/extcon-ptn5150.yaml
> +++ b/Documentation/devicetree/bindings/extcon/extcon-ptn5150.yaml
> @@ -36,6 +36,11 @@ properties:
>      description:
>        GPIO pin (output) used to control VBUS. If skipped, no such control
>        takes place.
> +  port:
> +    $ref: /schemas/graph.yaml#/$defs/port-base
> +    description:
> +      A port node to link the PTN5150 to a TypeC controller.
> +    unevaluatedProperties: false
>  
>  required:
>    - compatible
> @@ -58,5 +63,11 @@ examples:
>              interrupt-parent = <&msmgpio>;
>              interrupts = <78 IRQ_TYPE_LEVEL_HIGH>;
>              vbus-gpios = <&msmgpio 148 GPIO_ACTIVE_HIGH>;
> +
> +            port {
> +              ptn5150_out_ep: endpoint {
> +                 remote-endpoint = <&dwc3_0_ep>;
> +              };
> +           };
>          };
>      };
> -- 
> 2.34.1
> 

^ permalink raw reply

* [PATCH 1/1] dt-bindings: media: imx-jpeg: add clocks,clock-names,slot to fix warning
From: Frank Li @ 2024-04-04  3:52 UTC (permalink / raw)
  To: Mirela Rabulea, Mauro Carvalho Chehab, Rob Herring,
	Krzysztof Kozlowski, Conor Dooley, Shawn Guo, Sascha Hauer,
	Pengutronix Kernel Team, Fabio Estevam,
	open list:NXP i.MX 8QXP/8QM JPEG V4L2 DRIVER,
	open list:NXP i.MX 8QXP/8QM JPEG V4L2 DRIVER,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	open list
  Cc: imx

Fix below DTB_CHECK warning.

make CHECK_DTBS=y freescale/imx8qxp-mek.dtb
  DTC_CHK arch/arm64/boot/dts/freescale/imx8qxp-mek.dtb
arch/arm64/boot/dts/freescale/imx8qxp-mek.dtb: jpegdec@58400000: 'assigned-clock-rates', 'assigned-clocks', 'clock-names', 'clocks', 'slot' do not match any of the regexes: 'pinctrl-[0-9]+'
	from schema $id: http://devicetree.org/schemas/media/nxp,imx8-jpeg.yaml#

Add 'clocks' and 'clock-names' property.
Add 'slot' to choose which physical jpeg slot.

Signed-off-by: Frank Li <Frank.Li@nxp.com>
---

Notes:
    Pass dtb_binding check
    
    make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- -j8  dt_binding_check DT_SCHEMA_FILES=nxp,imx8-jpeg.yaml
      LINT    Documentation/devicetree/bindings
      DTEX    Documentation/devicetree/bindings/media/nxp,imx8-jpeg.example.dts
      CHKDT   Documentation/devicetree/bindings/processed-schema.json
      SCHEMA  Documentation/devicetree/bindings/processed-schema.json
      DTC_CHK Documentation/devicetree/bindings/media/nxp,imx8-jpeg.example.dtb

 .../devicetree/bindings/media/nxp,imx8-jpeg.yaml  | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/Documentation/devicetree/bindings/media/nxp,imx8-jpeg.yaml b/Documentation/devicetree/bindings/media/nxp,imx8-jpeg.yaml
index 3d9d1db37040d..32820ec42de9d 100644
--- a/Documentation/devicetree/bindings/media/nxp,imx8-jpeg.yaml
+++ b/Documentation/devicetree/bindings/media/nxp,imx8-jpeg.yaml
@@ -31,6 +31,15 @@ properties:
   reg:
     maxItems: 1
 
+  clocks:
+    minItems: 2
+    maxItems: 2
+
+  clock-names:
+    items:
+      - const: per
+      - const: ipg
+
   interrupts:
     description: |
       There are 4 slots available in the IP, which the driver may use
@@ -46,6 +55,12 @@ properties:
     minItems: 2               # Wrapper and 1 slot
     maxItems: 5               # Wrapper and 4 slots
 
+  slot:
+    description: Certain slot number is used.
+    $ref: /schemas/types.yaml#/definitions/uint32
+    minimum: 0
+    maximum: 3
+
 required:
   - compatible
   - reg
-- 
2.34.1


^ permalink raw reply related

* Re: [PATCH v2 02/18] PCI: endpoint: Introduce pci_epc_map_align()
From: Damien Le Moal @ 2024-04-04  2:43 UTC (permalink / raw)
  To: Kishon Vijay Abraham I, Manivannan Sadhasivam, Lorenzo Pieralisi,
	Kishon Vijay Abraham I, Shawn Lin, Krzysztof Wilczyński,
	Bjorn Helgaas, Heiko Stuebner, linux-pci, Rob Herring,
	Krzysztof Kozlowski, Conor Dooley, devicetree
  Cc: linux-rockchip, linux-arm-kernel, Rick Wertenbroek,
	Wilfred Mallawa, Niklas Cassel
In-Reply-To: <dccb87db-d826-43fa-a499-cf36ea9b10d5@amd.com>

On 4/3/24 21:33, Kishon Vijay Abraham I wrote:
> Hi Damien,
> 
> On 3/30/2024 9:49 AM, Damien Le Moal wrote:
>> Some endpoint controllers have requirements on the alignment of the
>> controller physical memory address that must be used to map a RC PCI
>> address region. For instance, the rockchip endpoint controller uses
>> at most the lower 20 bits of a physical memory address region as the
>> lower bits of an RC PCI address. For mapping a PCI address region of
>> size bytes starting from pci_addr, the exact number of address bits
>> used is the number of address bits changing in the address range
>> [pci_addr..pci_addr + size - 1].
>>
>> For this example, this creates the following constraints:
>> 1) The offset into the controller physical memory allocated for a
>>     mapping depends on the mapping size *and* the starting PCI address
>>     for the mapping.
>> 2) A mapping size cannot exceed the controller windows size (1MB) minus
>>     the offset needed into the allocated physical memory, which can end
>>     up being a smaller size than the desired mapping size.
>>
>> Handling these constraints independently of the controller being used in
>> a PCI EP function driver is not possible with the current EPC API as
>> it only provides the ->align field in struct pci_epc_features.
>> Furthermore, this alignment is static and does not depend on a mapping
>> pci address and size.
>>
>> Solve this by introducing the function pci_epc_map_align() and the
>> endpoint controller operation ->map_align to allow endpoint function
>> drivers to obtain the size and the offset into a controller address
>> region that must be used to map an RC PCI address region. The size
>> of the physical address region provided by pci_epc_map_align() can then
>> be used as the size argument for the function pci_epc_mem_alloc_addr().
>> The offset into the allocated controller memory can be used to
>> correctly handle data transfers. Of note is that pci_epc_map_align() may
>> indicate upon return a mapping size that is smaller (but not 0) than the
>> requested PCI address region size. For such case, an endpoint function
>> driver must handle data transfers in fragments.
>>
>> The controller operation ->map_align is optional: controllers that do
>> not have any address alignment constraints for mapping a RC PCI address
>> region do not need to implement this operation. For such controllers,
>> pci_epc_map_align() always returns the mapping size as equal
>> to the requested size and an offset equal to 0.
>>
>> The structure pci_epc_map is introduced to represent a mapping start PCI
>> address, size and the size and offset into the controller memory needed
>> for mapping the PCI address region.
>>
>> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
>> ---
>>   drivers/pci/endpoint/pci-epc-core.c | 66 +++++++++++++++++++++++++++++
>>   include/linux/pci-epc.h             | 33 +++++++++++++++
>>   2 files changed, 99 insertions(+)
>>
>> diff --git a/drivers/pci/endpoint/pci-epc-core.c b/drivers/pci/endpoint/pci-epc-core.c
>> index 754afd115bbd..37758ca91d7f 100644
>> --- a/drivers/pci/endpoint/pci-epc-core.c
>> +++ b/drivers/pci/endpoint/pci-epc-core.c
>> @@ -433,6 +433,72 @@ void pci_epc_unmap_addr(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
>>   }
>>   EXPORT_SYMBOL_GPL(pci_epc_unmap_addr);
>>   
>> +/**
>> + * pci_epc_map_align() - Get the offset into and the size of a controller memory
>> + *			 address region needed to map a RC PCI address region
>> + * @epc: the EPC device on which address is allocated
>> + * @func_no: the physical endpoint function number in the EPC device
>> + * @vfunc_no: the virtual endpoint function number in the physical function
>> + * @pci_addr: PCI address to which the physical address should be mapped
>> + * @size: the size of the mapping starting from @pci_addr
>> + * @map: populate here the actual size and offset into the controller memory
>> + *       that must be allocated for the mapping
>> + *
>> + * Invoke the controller map_align operation to obtain the size and the offset
>> + * into a controller address region that must be allocated to map @size
>> + * bytes of the RC PCI address space starting from @pci_addr.
>> + *
>> + * The size of the mapping that can be handled by the controller is indicated
>> + * using the pci_size field of @map. This size may be smaller than the requested
>> + * @size. In such case, the function driver must handle the mapping using
>> + * several fragments. The offset into the controller memory for the effective
>> + * mapping of the @pci_addr..@pci_addr+@map->pci_size address range is indicated
>> + * using the map_ofst field of @map.
>> + */
>> +int pci_epc_map_align(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
>> +		      u64 pci_addr, size_t size, struct pci_epc_map *map)
>> +{
>> +	const struct pci_epc_features *features;
>> +	size_t mask;
>> +	int ret;
>> +
>> +	if (!pci_epc_function_is_valid(epc, func_no, vfunc_no))
>> +		return -EINVAL;
>> +
>> +	if (!size || !map)
>> +		return -EINVAL;
>> +
>> +	memset(map, 0, sizeof(*map));
>> +	map->pci_addr = pci_addr;
>> +	map->pci_size = size;
>> +
>> +	if (epc->ops->map_align) {
>> +		mutex_lock(&epc->lock);
>> +		ret = epc->ops->map_align(epc, func_no, vfunc_no, map);
>> +		mutex_unlock(&epc->lock);
>> +		return ret;
>> +	}
>> +
>> +	/*
>> +	 * Assume a fixed alignment constraint as specified by the controller
>> +	 * features.
>> +	 */
>> +	features = pci_epc_get_features(epc, func_no, vfunc_no);
>> +	if (!features || !features->align) {
>> +		map->map_pci_addr = pci_addr;
>> +		map->map_size = size;
>> +		map->map_ofst = 0;
>> +	}
> 
> The 'align' of pci_epc_features was initially added only to address the 
> inbound ATU constraints. This is also added as comment in [1]. The PCI 
> address restrictions (only fixed alignment constraint) were handled by 
> the host side driver and depends on the connected endpoint device 
> (atleast it was like that for pci_endpoint_test.c [2]).
> So pci-epf-test.c used the 'align' in pci_epc_features only as part of 
> pci_epf_alloc_space().
> 
> Though I have abused 'align' of pci_epc_features in pci-epf-ntb.c using 
> it out of pci_epf_alloc_space(), I think we should keep the 'align' of 
> pci_epc_features only within pci_epf_alloc_space() and controllers with 
> any PCI address restrictions to implement ->map_align(). This could as 
> well be done in a phased manner to let controllers implement 
> ->map_align() and then remove using  pci_epc_features in 
> pci_epc_map_align(). Let me know what you think?

Yep, good idea. I will remove the use of "align" as a default alignment
constraint. For controllers that have a fixed alignment constraint (not
necessarilly epc->features->align), it is trivial to provide a generic helper
function that implements the ->map_align method.


-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply

* [PATCH] dt-bindings: extcon: ptn5150: Document the 'port' node
From: Fabio Estevam @ 2024-04-04  2:29 UTC (permalink / raw)
  To: cw00.choi
  Cc: krzk, myungjoo.ham, robh, conor+dt, devicetree, marex,
	Fabio Estevam

From: Fabio Estevam <festevam@denx.de>

Doument the port node to link the PTN5150 to a TypeC controller.

This fixes the following dt-schema warnings:

imx8mp-dhcom-pdk3.dtb: typec@3d: 'port' does not match any of the regexes: 'pinctrl-[0-9]+'
	from schema $id: http://devicetree.org/schemas/extcon/extcon-ptn5150.yaml#

Signed-off-by: Fabio Estevam <festevam@denx.de>
---
 .../devicetree/bindings/extcon/extcon-ptn5150.yaml    | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/Documentation/devicetree/bindings/extcon/extcon-ptn5150.yaml b/Documentation/devicetree/bindings/extcon/extcon-ptn5150.yaml
index d5cfa32ea52d..3472c69056ac 100644
--- a/Documentation/devicetree/bindings/extcon/extcon-ptn5150.yaml
+++ b/Documentation/devicetree/bindings/extcon/extcon-ptn5150.yaml
@@ -36,6 +36,11 @@ properties:
     description:
       GPIO pin (output) used to control VBUS. If skipped, no such control
       takes place.
+  port:
+    $ref: /schemas/graph.yaml#/$defs/port-base
+    description:
+      A port node to link the PTN5150 to a TypeC controller.
+    unevaluatedProperties: false
 
 required:
   - compatible
@@ -58,5 +63,11 @@ examples:
             interrupt-parent = <&msmgpio>;
             interrupts = <78 IRQ_TYPE_LEVEL_HIGH>;
             vbus-gpios = <&msmgpio 148 GPIO_ACTIVE_HIGH>;
+
+            port {
+              ptn5150_out_ep: endpoint {
+                 remote-endpoint = <&dwc3_0_ep>;
+              };
+           };
         };
     };
-- 
2.34.1


^ permalink raw reply related

* Re: [PATCH v2 1/2] dt-bindings: mailbox: arm,mhuv3: Add bindings
From: kernel test robot @ 2024-04-04  2:01 UTC (permalink / raw)
  To: Cristian Marussi, linux-kernel, linux-arm-kernel, devicetree
  Cc: oe-kbuild-all, sudeep.holla, cristian.marussi, jassisinghbrar,
	robh+dt, krzysztof.kozlowski+dt, conor+dt
In-Reply-To: <20240403171346.3173843-2-cristian.marussi@arm.com>

Hi Cristian,

kernel test robot noticed the following build warnings:

[auto build test WARNING on soc/for-next]
[also build test WARNING on robh/for-next linus/master v6.9-rc2 next-20240403]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Cristian-Marussi/dt-bindings-mailbox-arm-mhuv3-Add-bindings/20240404-012010
base:   https://git.kernel.org/pub/scm/linux/kernel/git/soc/soc.git for-next
patch link:    https://lore.kernel.org/r/20240403171346.3173843-2-cristian.marussi%40arm.com
patch subject: [PATCH v2 1/2] dt-bindings: mailbox: arm,mhuv3: Add bindings
compiler: loongarch64-linux-gcc (GCC) 13.2.0
reproduce: (https://download.01.org/0day-ci/archive/20240404/202404040918.E8nkWuIn-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202404040918.E8nkWuIn-lkp@intel.com/

dtcheck warnings: (new ones prefixed by >>)
>> Documentation/devicetree/bindings/mailbox/arm,mhuv3.yaml:86:1: [error] syntax error: found character '\t' that cannot start any token (syntax)
--
>> Documentation/devicetree/bindings/mailbox/arm,mhuv3.yaml:86:1: found a tab character where an indentation space is expected
--
>> Documentation/devicetree/bindings/mailbox/arm,mhuv3.yaml: ignoring, error parsing file

vim +86 Documentation/devicetree/bindings/mailbox/arm,mhuv3.yaml

     8	
     9	maintainers:
    10	  - Sudeep Holla <sudeep.holla@arm.com>
    11	  - Cristian Marussi <cristian.marussi@arm.com>
    12	
    13	description: |
    14	  The Arm Message Handling Unit (MHU) Version 3 is a mailbox controller that
    15	  enables unidirectional communications with remote processors through various
    16	  possible transport protocols.
    17	  The controller can optionally support a varying number of extensions that, in
    18	  turn, enable different kinds of transport to be used for communication.
    19	  Number, type and characteristics of each supported extension can be discovered
    20	  dynamically at runtime.
    21	
    22	  Given the unidirectional nature of the controller, an MHUv3 mailbox controller
    23	  is composed of a MHU Sender (MHUS) containing a PostBox (PBX) block and a MHU
    24	  Receiver (MHUR) containing a MailBox (MBX) block, where
    25	
    26	   PBX is used to
    27	      - Configure the MHU
    28	      - Send Transfers to the Receiver
    29	      - Optionally receive acknowledgment of a Transfer from the Receiver
    30	
    31	   MBX is used to
    32	      - Configure the MHU
    33	      - Receive Transfers from the Sender
    34	      - Optionally acknowledge Transfers sent by the Sender
    35	
    36	  Both PBX and MBX need to be present and defined in the DT description if you
    37	  need to establish a bidirectional communication, since you will have to
    38	  acquire two distinct unidirectional channels, one for each block.
    39	
    40	  As a consequence both blocks needs to be represented separately and specified
    41	  as distinct DT nodes in order to properly describe their resources.
    42	
    43	  Note that, though, thanks to the runtime discoverability, there is no need to
    44	  identify the type of blocks with distinct compatibles.
    45	
    46	  Following are the MHUv3 possible extensions.
    47	
    48	  - Doorbell Extension (DBE): DBE defines a type of channel called a Doorbell
    49	    Channel (DBCH). DBCH enables a single bit Transfer to be sent from the
    50	    Sender to Receiver. The Transfer indicates that an event has occurred.
    51	    When DBE is implemented, the number of DBCHs that an implementation of the
    52	    MHU can support is between 1 and 128, numbered starting from 0 in ascending
    53	    order and discoverable at run-time.
    54	    Each DBCH contains 32 individual fields, referred to as flags, each of which
    55	    can be used independently. It is possible for the Sender to send multiple
    56	    Transfers at once using a single DBCH, so long as each Transfer uses
    57	    a different flag in the DBCH.
    58	    Optionally, data may be transmitted through an out-of-band shared memory
    59	    region, wherein the MHU Doorbell is used strictly as an interrupt generation
    60	    mechanism, but this is out of the scope of these bindings.
    61	
    62	  - FastChannel Extension (FCE): FCE defines a type of channel called a Fast
    63	    Channel (FCH). FCH is intended for lower overhead communication between
    64	    Sender and Receiver at the expense of determinism. An FCH allows the Sender
    65	    to update the channel value at any time, regardless of whether the previous
    66	    value has been seen by the Receiver. When the Receiver reads the channel's
    67	    content it gets the last value written to the channel.
    68	    FCH is considered lossy in nature, and means that the Sender has no way of
    69	    knowing if, or when, the Receiver will act on the Transfer.
    70	    FCHs are expected to behave as RAM which generates interrupts when writes
    71	    occur to the locations within the RAM.
    72	    When FCE is implemented, the number of FCHs that an implementation of the
    73	    MHU can support is between 1-1024, if the FastChannel word-size is 32-bits,
    74	    or between 1-512, when the FastChannel word-size is 64-bits.
    75	    FCHs are numbered from 0 in ascending order.
    76	    Note that the number of FCHs and the word-size are implementation defined,
    77	    not configurable but discoverable at run-time.
    78	    Optionally, data may be transmitted through an out-of-band shared memory
    79	    region, wherein the MHU FastChannel is used as an interrupt generation
    80	    mechanism which carries also a pointer to such out-of-band data, but this
    81	    is out of the scope of these bindings.
    82	
    83	  - FIFO Extension (FE): FE defines a Channel type called a FIFO Channel (FFCH).
    84	    FFCH allows a Sender to send
    85	       - Multiple Transfers to the Receiver without having to wait for the
  > 86		 previous Transfer to be acknowledged by the Receiver, as long as the
    87		 FIFO has room for the Transfer.
    88	       - Transfers which require the Receiver to provide acknowledgment.
    89	       - Transfers which have in-band payload.
    90	    In all cases, the data is guaranteed to be observed by the Receiver in the
    91	    same order which the Sender sent it.
    92	    When FE is implemented, the number of FFCHs that an implementation of the
    93	    MHU can support is between 1 and 64, numbered starting from 0 in ascending
    94	    order. The number of FFCHs, their depth (same for all implemented FFCHs) and
    95	    the access-granularity are implementation defined, not configurable but
    96	    discoverable at run-time.
    97	    Optionally, additional data may be transmitted through an out-of-band shared
    98	    memory region, wherein the MHU FIFO is used to transmit, in order, a small
    99	    part of the payload (like a header) and a reference to the shared memory
   100	    area holding the remaining, bigger, chunk of the payload, but this is out of
   101	    the scope of these bindings.
   102	
   103	properties:
   104	  compatible:
   105	    const: arm,mhuv3
   106	
   107	  reg:
   108	    maxItems: 1
   109	
   110	  interrupts:
   111	    minItems: 1
   112	    maxItems: 74
   113	
   114	  interrupt-names:
   115	    description: |
   116	      The MHUv3 controller generates a number of events some of which are used
   117	      to generate interrupts; as a consequence it can expose a varying number of
   118	      optional PBX/MBX interrupts, representing the events generated during the
   119	      operation of the various transport protocols associated with different
   120	      extensions. All interrupts of the MHU are level-sensitive.
   121	      Some of these optional interrupts are defined per-channel, where the
   122	      number of channels effectively available is implementation defined and
   123	      run-time discoverable.
   124	      In the following names are enumerated using patterns, with per-channel
   125	      interrupts implicitly capped at the maximum channels allowed by the
   126	      specification for each extension type.
   127	      For the sake of simplicity maxItems is anyway capped to a most plausible
   128	      number, assuming way less channels would be implemented than actually
   129	      possible.
   130	
   131	      The only mandatory interrupts on the MHU are:
   132	        - combined
   133	        - mbx-fch-xfer-<N> but only if mbx-fcgrp-xfer-<N> is not implemented.
   134	
   135	    minItems: 1
   136	    maxItems: 74
   137	    items:
   138	      oneOf:
   139	        - const: combined
   140	          description: PBX/MBX Combined interrupt
   141	        - const: combined-ffch
   142	          description: PBX/MBX FIFO Combined interrupt
   143	        - pattern: '^ffch-low-tide-[0-9]+$'
   144	          description: PBX/MBX FIFO Channel <N> Low Tide interrupt
   145	        - pattern: '^ffch-high-tide-[0-9]+$'
   146	          description: PBX/MBX FIFO Channel <N> High Tide interrupt
   147	        - pattern: '^ffch-flush-[0-9]+$'
   148	          description: PBX/MBX FIFO Channel <N> Flush interrupt
   149	        - pattern: '^mbx-dbch-xfer-[0-9]+$'
   150	          description: MBX Doorbell Channel <N> Transfer interrupt
   151	        - pattern: '^mbx-fch-xfer-[0-9]+$'
   152	          description: MBX FastChannel <N> Transfer interrupt
   153	        - pattern: '^mbx-fchgrp-xfer-[0-9]+$'
   154	          description: MBX FastChannel <N> Group Transfer interrupt
   155	        - pattern: '^mbx-ffch-xfer-[0-9]+$'
   156	          description: MBX FIFO Channel <N> Transfer interrupt
   157	        - pattern: '^pbx-dbch-xfer-ack-[0-9]+$'
   158	          description: PBX Doorbell Channel <N> Transfer Ack interrupt
   159	        - pattern: '^pbx-ffch-xfer-ack-[0-9]+$'
   160	          description: PBX FIFO Channel <N> Transfer Ack interrupt
   161	
   162	  '#mbox-cells':
   163	    description: |
   164	      The first argument in the consumers 'mboxes' property represents the
   165	      extension type, the second is for the channel number while the third
   166	      depends on extension type.
   167	
   168	      Extension type for DBE is 0 and the third parameter represents the
   169	      doorbell flag number to use.
   170	      Extension type for FCE is 1, third parameter unused.
   171	      Extension type for FE is 2, third parameter unused.
   172	
   173	      mboxes = <&mhu 0 0 5>; // DBE, Doorbell Channel Window 0, doorbell flag 5.
   174	      mboxes = <&mhu 0 1 7>; // DBE, Doorbell Channel Window 1, doorbell flag 7.
   175	      mboxes = <&mhu 1 0 0>; // FCE, FastChannel Window 0.
   176	      mboxes = <&mhu 1 3 0>; // FCE, FastChannel Window 3.
   177	      mboxes = <&mhu 2 1 0>; // FE, FIFO Channel Window 1.
   178	      mboxes = <&mhu 2 7 0>; // FE, FIFO Channel Window 7.
   179	    const: 3
   180	
   181	  clocks:
   182	    maxItems: 1
   183	
   184	required:
   185	  - compatible
   186	  - reg
   187	  - interrupts
   188	  - interrupt-names
   189	  - '#mbox-cells'
   190	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply

* Re: [PATCH v6 3/4] dt-bindings: watchdog: aspeed-wdt: Add aspeed,scu
From: Andrew Jeffery @ 2024-04-04  1:50 UTC (permalink / raw)
  To: PeterYin, Rob Herring
  Cc: patrick, Wim Van Sebroeck, Guenter Roeck, Krzysztof Kozlowski,
	Conor Dooley, Joel Stanley, linux-watchdog, devicetree,
	linux-arm-kernel, linux-aspeed, linux-kernel
In-Reply-To: <79b7e2ef-6f53-4642-ad3f-99b8ce780a7f@gmail.com>

On Wed, 2024-04-03 at 17:18 +0800, PeterYin wrote:
> Thanks, I can wait you update it and send a new version for wdt driver.

I've sent v2:


https://lore.kernel.org/linux-watchdog/20240403020439.418788-1-andrew@codeconstruct.com.au/

Rob's okay with it:

https://lore.kernel.org/linux-watchdog/20240403171321.GA3996007-robh@kernel.org/

Feel free to address his comment there if you integrate it into your
series, though make sure to add his tag, keep my authorship, and append
your own S-o-b if you do.

Andrew


^ permalink raw reply

* Re: [PATCH 1/4] ARM: dts: aspeed: greatlakes: correct Mellanox multi-host property
From: Andrew Jeffery @ 2024-04-04  1:41 UTC (permalink / raw)
  To: Krzysztof Kozlowski, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Joel Stanley, devicetree, linux-arm-kernel,
	linux-aspeed, linux-kernel
In-Reply-To: <171213860535.16780.4635499105199545058.b4-ty@linaro.org>

On Wed, 2024-04-03 at 12:04 +0200, Krzysztof Kozlowski wrote:
> On Sat, 09 Dec 2023 11:44:09 +0100, Krzysztof Kozlowski wrote:
> > "mlx,multi-host" is using incorrect vendor prefix and is not documented.
> > 
> > 
> 
> These wait for ~4 months and they were not picked up. Let me know if anyone
> else wants to take these.
> 
> Applied, thanks!
> 
> [1/4] ARM: dts: aspeed: greatlakes: correct Mellanox multi-host property
>       https://git.kernel.org/krzk/linux-dt/c/7da85354c4fa35b862294dbbb450baeb405b5a92
> [2/4] ARM: dts: aspeed: minerva-cmc: correct Mellanox multi-host property
>       https://git.kernel.org/krzk/linux-dt/c/e515719c17beb9625a90039f6c45fa36d58bdda2
> [3/4] ARM: dts: aspeed: yosemite4: correct Mellanox multi-host property
>       https://git.kernel.org/krzk/linux-dt/c/af3deaf9bcb4571feb89a4050c7ad75de9aa8e1e
> [4/4] ARM: dts: aspeed: yosemitev2: correct Mellanox multi-host property
>       https://git.kernel.org/krzk/linux-dt/c/cac1c1dda6130771e06ace030b1b0ed62096a912
> 
> Best regards,

Ah, my apologies. Joel's on leave and I'm accumulating patches in a
tree for him in the mean time. I've had some things going on
professionally (changed jobs) and personally, and these fell into a bit
of a hole.

I'm okay for these patches to be integrated through your tree, given
you've already applied them. Feel free to add acks if your branch
allows:

Acked-by: Andrew Jeffery <andrew@codeconstruct.com.au>

I'm working to stay on top of things a bit more now than I have in the
recent past, so hopefully I won't miss patches again in the future.

Andrew

^ permalink raw reply

* Re: [PATCH v2 1/3] ARM: dts: Modify GPIO table for Asrock X570D4U BMC
From: Andrew Jeffery @ 2024-04-04  1:17 UTC (permalink / raw)
  To: Renze Nicolai, linux-arm-kernel, devicetree, linux-kernel,
	linux-aspeed, arnd, olof, soc, robh+dt, krzysztof.kozlowski+dt,
	joel, andrew
In-Reply-To: <20240403133037.37782-2-renze@rnplus.nl>

Hi Renze,

In the future, the start of the subject should also include 'aspeed: ',
so:

    ARM: dts: aspeed: Modify GPIO table for Asrock X570D4U BMC

On Wed, 2024-04-03 at 15:28 +0200, Renze Nicolai wrote:
> Restructure GPIO table to fit maximum line length.
> 
> Fix mistakes found while working on OpenBMC
> userland configuration and based on probing
> the board.
> 
> Schematic for this board is not available.
> Because of this the choice was made to
> use a descriptive method for naming the
> GPIOs.
> 
>  - Push-pull outputs start with output-*
>  - Open-drain outputs start with control-*
>  - LED outputs start with led-*
>  - Inputs start with input-*
>  - Button inputs start with button-*
>  - Active low signals end with *-n

This seems to be a bit of a mix of following conventions in [1] and
not. It might be helpful to weigh in on that document with your ideas.

[1]: https://github.com/openbmc/docs/blob/master/designs/device-tree-gpio-naming.md

I'll put this series in a tree for Joel for now though, with the
subject fix mentioned above.

I've also re-wrapped the commit messages as it seems you stopped a bit
short of the allowable line length.

Andrew

^ permalink raw reply

* Re: [PATCH v13 4/5] clk: sophgo: Add SG2042 clock driver
From: Chen Wang @ 2024-04-04  1:04 UTC (permalink / raw)
  To: Chen Wang, aou, chao.wei, conor, krzysztof.kozlowski+dt,
	mturquette, palmer, paul.walmsley, richardcochran, robh+dt, sboyd,
	devicetree, linux-clk, linux-kernel, linux-riscv, haijiao.liu,
	xiaoguang.xing, guoren, jszhang, inochiama, samuel.holland
In-Reply-To: <816122e9f22ddd9927e81e627be7f4683ba5c9e8.1711692169.git.unicorn_wang@outlook.com>

Ping ~~~

Hi, Stephen,

Can you please take a review of this patch, I have improved the code as 
per your comments in v11.

If it looks good to you, I hope this patchset(driver and bindings part) 
can be picked into 6.10, and I will handle the left dts part.

Thanks,

Chen

On 2024/3/29 14:21, Chen Wang wrote:
> From: Chen Wang <unicorn_wang@outlook.com>
>
> Add a driver for the SOPHGO SG2042 clocks.
>
> Signed-off-by: Chen Wang <unicorn_wang@outlook.com>
> ---
>   drivers/clk/Kconfig                    |    1 +
>   drivers/clk/Makefile                   |    1 +
>   drivers/clk/sophgo/Kconfig             |    7 +
>   drivers/clk/sophgo/Makefile            |    2 +
>   drivers/clk/sophgo/clk-sophgo-sg2042.c | 1410 ++++++++++++++++++++++++
>   drivers/clk/sophgo/clk-sophgo-sg2042.h |  216 ++++
>   6 files changed, 1637 insertions(+)
>   create mode 100644 drivers/clk/sophgo/Kconfig
>   create mode 100644 drivers/clk/sophgo/Makefile
>   create mode 100644 drivers/clk/sophgo/clk-sophgo-sg2042.c
>   create mode 100644 drivers/clk/sophgo/clk-sophgo-sg2042.h
>
> diff --git a/drivers/clk/Kconfig b/drivers/clk/Kconfig
> index 50af5fc7f570..bc28502ec3c9 100644
> --- a/drivers/clk/Kconfig
> +++ b/drivers/clk/Kconfig
> @@ -489,6 +489,7 @@ source "drivers/clk/rockchip/Kconfig"
>   source "drivers/clk/samsung/Kconfig"
>   source "drivers/clk/sifive/Kconfig"
>   source "drivers/clk/socfpga/Kconfig"
> +source "drivers/clk/sophgo/Kconfig"
>   source "drivers/clk/sprd/Kconfig"
>   source "drivers/clk/starfive/Kconfig"
>   source "drivers/clk/sunxi/Kconfig"
> diff --git a/drivers/clk/Makefile b/drivers/clk/Makefile
> index 14fa8d4ecc1f..4abe16c8ccdf 100644
> --- a/drivers/clk/Makefile
> +++ b/drivers/clk/Makefile
> @@ -118,6 +118,7 @@ obj-$(CONFIG_ARCH_ROCKCHIP)		+= rockchip/
>   obj-$(CONFIG_COMMON_CLK_SAMSUNG)	+= samsung/
>   obj-$(CONFIG_CLK_SIFIVE)		+= sifive/
>   obj-y					+= socfpga/
> +obj-y					+= sophgo/
>   obj-$(CONFIG_PLAT_SPEAR)		+= spear/
>   obj-y					+= sprd/
>   obj-$(CONFIG_ARCH_STI)			+= st/
> diff --git a/drivers/clk/sophgo/Kconfig b/drivers/clk/sophgo/Kconfig
> new file mode 100644
> index 000000000000..2523818d64f9
> --- /dev/null
> +++ b/drivers/clk/sophgo/Kconfig
> @@ -0,0 +1,7 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +config CLK_SOPHGO_SG2042
> +	bool "Sophgo SG2042 clock support"
> +	depends on ARCH_SOPHGO || COMPILE_TEST
> +	help
> +	  Say yes here to support the clock controller on the Sophgo SG2042 SoC.
> diff --git a/drivers/clk/sophgo/Makefile b/drivers/clk/sophgo/Makefile
> new file mode 100644
> index 000000000000..13834cce260c
> --- /dev/null
> +++ b/drivers/clk/sophgo/Makefile
> @@ -0,0 +1,2 @@
> +# SPDX-License-Identifier: GPL-2.0
> +obj-$(CONFIG_CLK_SOPHGO_SG2042)	+= clk-sophgo-sg2042.o
> diff --git a/drivers/clk/sophgo/clk-sophgo-sg2042.c b/drivers/clk/sophgo/clk-sophgo-sg2042.c
> new file mode 100644
> index 000000000000..7b468e7299ae
> --- /dev/null
> +++ b/drivers/clk/sophgo/clk-sophgo-sg2042.c
> @@ -0,0 +1,1410 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Sophgo SG2042 Clock Generator Driver
> + *
> + * Copyright (C) 2024 Sophgo Technology Inc. All rights reserved.
> + */
> +
> +#include <linux/clk.h>
> +#include <linux/clk-provider.h>
> +#include <linux/iopoll.h>
> +#include <linux/platform_device.h>
> +
> +/*
> + * The clock of SG2042 is composed of three parts.
> + * The registers of these three parts of the clock are scattered in three
> + * different memory address spaces:
> + * - pll clocks
> + * - gate clocks for RP subsystem
> + * - div/mux, and gate clocks working for other subsystem than RP subsystem
> + */
> +#include <dt-bindings/clock/sophgo,sg2042-pll.h>
> +#include <dt-bindings/clock/sophgo,sg2042-rpgate.h>
> +#include <dt-bindings/clock/sophgo,sg2042-clkgen.h>
> +
> +#include "clk-sophgo-sg2042.h"
> +
> +#define KHZ 1000UL
> +#define MHZ (KHZ * KHZ)
> +
> +#define REFDIV_MIN 1
> +#define REFDIV_MAX 63
> +#define FBDIV_MIN 16
> +#define FBDIV_MAX 320
> +
> +#define PLL_FREF_SG2042 (25 * MHZ)
> +
> +#define PLL_FOUTPOSTDIV_MIN (16 * MHZ)
> +#define PLL_FOUTPOSTDIV_MAX (3200 * MHZ)
> +
> +#define PLL_FOUTVCO_MIN (800 * MHZ)
> +#define PLL_FOUTVCO_MAX (3200 * MHZ)
> +
> +struct sg2042_pll_ctrl {
> +	unsigned long freq;
> +	unsigned int fbdiv;
> +	unsigned int postdiv1;
> +	unsigned int postdiv2;
> +	unsigned int refdiv;
> +};
> +
> +#define PLLCTRL_FBDIV_SHIFT	16
> +#define PLLCTRL_FBDIV_MASK	(GENMASK(27, 16) >> PLLCTRL_FBDIV_SHIFT)
> +#define PLLCTRL_POSTDIV2_SHIFT	12
> +#define PLLCTRL_POSTDIV2_MASK	(GENMASK(14, 12) >> PLLCTRL_POSTDIV2_SHIFT)
> +#define PLLCTRL_POSTDIV1_SHIFT	8
> +#define PLLCTRL_POSTDIV1_MASK	(GENMASK(10, 8) >> PLLCTRL_POSTDIV1_SHIFT)
> +#define PLLCTRL_REFDIV_SHIFT	0
> +#define PLLCTRL_REFDIV_MASK	(GENMASK(5, 0) >> PLLCTRL_REFDIV_SHIFT)
> +
> +static inline u32 sg2042_pll_ctrl_encode(struct sg2042_pll_ctrl *ctrl)
> +{
> +	return ((ctrl->fbdiv & PLLCTRL_FBDIV_MASK) << PLLCTRL_FBDIV_SHIFT) |
> +	       ((ctrl->postdiv2 & PLLCTRL_POSTDIV2_MASK) << PLLCTRL_POSTDIV2_SHIFT) |
> +	       ((ctrl->postdiv1 & PLLCTRL_POSTDIV1_MASK) << PLLCTRL_POSTDIV1_SHIFT) |
> +	       ((ctrl->refdiv & PLLCTRL_REFDIV_MASK) << PLLCTRL_REFDIV_SHIFT);
> +}
> +
> +static inline void sg2042_pll_ctrl_decode(unsigned int reg_value,
> +					  struct sg2042_pll_ctrl *ctrl)
> +{
> +	ctrl->fbdiv = (reg_value >> PLLCTRL_FBDIV_SHIFT) & PLLCTRL_FBDIV_MASK;
> +	ctrl->refdiv = (reg_value >> PLLCTRL_REFDIV_SHIFT) & PLLCTRL_REFDIV_MASK;
> +	ctrl->postdiv1 = (reg_value >> PLLCTRL_POSTDIV1_SHIFT) & PLLCTRL_POSTDIV1_MASK;
> +	ctrl->postdiv2 = (reg_value >> PLLCTRL_POSTDIV2_SHIFT) & PLLCTRL_POSTDIV2_MASK;
> +}
> +
> +static inline int sg2042_pll_enable(struct sg2042_pll_clock *pll, bool en)
> +{
> +	unsigned int value = 0;
> +
> +	if (en) {
> +		/* wait pll lock */
> +		if (readl_poll_timeout_atomic(pll->base + pll->offset_status,
> +					      value,
> +					      ((value >> pll->shift_status_lock) & 0x1),
> +					      0,
> +					      100000))
> +			pr_warn("%s not locked\n", pll->hw.init->name);
> +
> +		/* wait pll updating */
> +		if (readl_poll_timeout_atomic(pll->base + pll->offset_status,
> +					      value,
> +					      !((value >> pll->shift_status_updating) & 0x1),
> +					      0,
> +					      100000))
> +			pr_warn("%s still updating\n", pll->hw.init->name);
> +
> +		/* enable pll */
> +		value = readl(pll->base + pll->offset_enable);
> +		writel(value | (1 << pll->shift_enable), pll->base + pll->offset_enable);
> +	} else {
> +		/* disable pll */
> +		value = readl(pll->base + pll->offset_enable);
> +		writel(value & (~(1 << pll->shift_enable)), pll->base + pll->offset_enable);
> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * @reg_value: current register value
> + * @parent_rate: parent frequency
> + *
> + * This function is used to calculate below "rate" in equation
> + * rate = (parent_rate/REFDIV) x FBDIV/POSTDIV1/POSTDIV2
> + *      = (parent_rate x FBDIV) / (REFDIV x POSTDIV1 x POSTDIV2)
> + */
> +static unsigned long sg2042_pll_recalc_rate(unsigned int reg_value,
> +					    unsigned long parent_rate)
> +{
> +	struct sg2042_pll_ctrl ctrl_table;
> +	u64 rate, numerator, denominator;
> +
> +	sg2042_pll_ctrl_decode(reg_value, &ctrl_table);
> +
> +	numerator = parent_rate * ctrl_table.fbdiv;
> +	denominator = ctrl_table.refdiv * ctrl_table.postdiv1 * ctrl_table.postdiv2;
> +	do_div(numerator, denominator);
> +	rate = numerator;
> +
> +	return rate;
> +}
> +
> +/*
> + * Based on input rate/prate/fbdiv/refdiv, look up the postdiv1_2 table
> + * to get the closest postdiiv combination.
> + * postdiv1_2 contains all the possible combination lists of POSTDIV1 and POSTDIV2
> + * for example:
> + * postdiv1_2[0] = {2, 4, 8}, where div1 = 2, div2 = 4 , div1 * div2 = 8
> + *
> + * See TRM:
> + * FOUTPOSTDIV = FREF * FBDIV / REFDIV / (POSTDIV1 * POSTDIV2)
> + * So we get following formula to get POSTDIV1 and POSTDIV2:
> + * POSTDIV = (prate/REFDIV) x FBDIV/rate
> + * above POSTDIV = POSTDIV1*POSTDIV2
> + *
> + * @rate: FOUTPOSTDIV
> + * @prate: parent rate, i.e. FREF
> + * @fbdiv: FBDIV
> + * @refdiv: REFDIV
> + * @postdiv1: POSTDIV1, output
> + * @postdiv2: POSTDIV2, output
> + */
> +static int sg2042_pll_get_postdiv_1_2(unsigned long rate,
> +				      unsigned long prate,
> +				      unsigned int fbdiv,
> +				      unsigned int refdiv,
> +				      unsigned int *postdiv1,
> +				      unsigned int *postdiv2)
> +{
> +	int index;
> +	u64 tmp0;
> +
> +	/* POSTDIV_RESULT_INDEX point to 3rd element in the array postdiv1_2 */
> +	#define	POSTDIV_RESULT_INDEX	2
> +
> +	static int postdiv1_2[][3] = {
> +		{2, 4,  8}, {3, 3,  9}, {2, 5, 10}, {2, 6, 12},
> +		{2, 7, 14}, {3, 5, 15}, {4, 4, 16}, {3, 6, 18},
> +		{4, 5, 20}, {3, 7, 21}, {4, 6, 24}, {5, 5, 25},
> +		{4, 7, 28}, {5, 6, 30}, {5, 7, 35}, {6, 6, 36},
> +		{6, 7, 42}, {7, 7, 49}
> +	};
> +
> +	/* prate/REFDIV and result save to tmp0 */
> +	tmp0 = prate;
> +	do_div(tmp0, refdiv);
> +
> +	/* ((prate/REFDIV) x FBDIV) and result save to tmp0 */
> +	tmp0 *= fbdiv;
> +
> +	/* ((prate/REFDIV) x FBDIV)/rate and result save to tmp0 */
> +	do_div(tmp0, rate);
> +
> +	/* tmp0 is POSTDIV1*POSTDIV2, now we calculate div1 and div2 value */
> +	if (tmp0 <= 7) {
> +		/* (div1 * div2) <= 7, no need to use array search */
> +		*postdiv1 = tmp0;
> +		*postdiv2 = 1;
> +		return 0;
> +	}
> +
> +	/* (div1 * div2) > 7, use array search */
> +	for (index = 0; index < ARRAY_SIZE(postdiv1_2); index++) {
> +		if (tmp0 > postdiv1_2[index][POSTDIV_RESULT_INDEX]) {
> +			continue;
> +		} else {
> +			/* found it */
> +			*postdiv1 = postdiv1_2[index][1];
> +			*postdiv2 = postdiv1_2[index][0];
> +			return 0;
> +		}
> +	}
> +	pr_warn("%s can not find in postdiv array!\n", __func__);
> +	return -EINVAL;
> +}
> +
> +/*
> + * Based on the given FOUTPISTDIV and the input FREF to calculate
> + * the REFDIV/FBDIV/PSTDIV1/POSTDIV2 combination for pllctrl register.
> + * @req_rate: expected output clock rate, i.e. FOUTPISTDIV
> + * @parent_rate: input parent clock rate, i.e. FREF
> + * @best: output to hold calculated combination of REFDIV/FBDIV/PSTDIV1/POSTDIV2
> + */
> +static int sg2042_get_pll_ctl_setting(struct sg2042_pll_ctrl *best,
> +				      unsigned long req_rate,
> +				      unsigned long parent_rate)
> +{
> +	int ret;
> +	unsigned int fbdiv, refdiv, postdiv1, postdiv2;
> +	unsigned long foutpostdiv;
> +	u64 tmp;
> +	u64 foutvco;
> +
> +	if (parent_rate != PLL_FREF_SG2042) {
> +		pr_err("INVALID FREF: %ld\n", parent_rate);
> +		return -EINVAL;
> +	}
> +
> +	if (req_rate < PLL_FOUTPOSTDIV_MIN || req_rate > PLL_FOUTPOSTDIV_MAX) {
> +		pr_alert("INVALID FOUTPOSTDIV: %ld\n", req_rate);
> +		return -EINVAL;
> +	}
> +
> +	memset(best, 0, sizeof(struct sg2042_pll_ctrl));
> +
> +	for (refdiv = REFDIV_MIN; refdiv < REFDIV_MAX + 1; refdiv++) {
> +		/* required by hardware: FREF/REFDIV must > 10 */
> +		tmp = parent_rate;
> +		do_div(tmp, refdiv);
> +		if (tmp <= 10)
> +			continue;
> +
> +		for (fbdiv = FBDIV_MIN; fbdiv < FBDIV_MAX + 1; fbdiv++) {
> +			/*
> +			 * FOUTVCO = FREF*FBDIV/REFDIV validation
> +			 * required by hardware, FOUTVCO must [800MHz, 3200MHz]
> +			 */
> +			foutvco = parent_rate * fbdiv;
> +			do_div(foutvco, refdiv);
> +			if (foutvco < PLL_FOUTVCO_MIN || foutvco > PLL_FOUTVCO_MAX)
> +				continue;
> +
> +			ret = sg2042_pll_get_postdiv_1_2(req_rate, parent_rate,
> +							 fbdiv, refdiv,
> +							 &postdiv1, &postdiv2);
> +			if (ret)
> +				continue;
> +
> +			/*
> +			 * FOUTPOSTDIV = FREF*FBDIV/REFDIV/(POSTDIV1*POSTDIV2)
> +			 *             = FOUTVCO/(POSTDIV1*POSTDIV2)
> +			 */
> +			tmp = foutvco;
> +			do_div(tmp, (postdiv1 * postdiv2));
> +			foutpostdiv = (unsigned long)tmp;
> +			/* Iterative to approach the expected value */
> +			if (abs_diff(foutpostdiv, req_rate) < abs_diff(best->freq, req_rate)) {
> +				best->freq = foutpostdiv;
> +				best->refdiv = refdiv;
> +				best->fbdiv = fbdiv;
> +				best->postdiv1 = postdiv1;
> +				best->postdiv2 = postdiv2;
> +				if (foutpostdiv == req_rate)
> +					return 0;
> +			}
> +			continue;
> +		}
> +	}
> +
> +	if (best->freq == 0)
> +		return -EINVAL;
> +	else
> +		return 0;
> +}
> +
> +/*
> + * @hw: ccf use to hook get sg2042_pll_clock
> + * @parent_rate: parent rate
> + *
> + * The is function will be called through clk_get_rate
> + * and return current rate after decoding reg value
> + */
> +static unsigned long sg2042_clk_pll_recalc_rate(struct clk_hw *hw,
> +						unsigned long parent_rate)
> +{
> +	unsigned int value;
> +	unsigned long rate;
> +	struct sg2042_pll_clock *pll = to_sg2042_pll_clk(hw);
> +
> +	value = readl(pll->base + pll->offset_ctrl);
> +	rate = sg2042_pll_recalc_rate(value, parent_rate);
> +
> +	pr_debug("--> %s: pll_recalc_rate: val = %ld\n",
> +		 clk_hw_get_name(hw), rate);
> +	return rate;
> +}
> +
> +static long sg2042_clk_pll_round_rate(struct clk_hw *hw,
> +				      unsigned long req_rate,
> +				      unsigned long *prate)
> +{
> +	unsigned int value;
> +	struct sg2042_pll_ctrl pctrl_table;
> +	long proper_rate;
> +	int ret;
> +
> +	ret = sg2042_get_pll_ctl_setting(&pctrl_table, req_rate, *prate);
> +	if (ret) {
> +		proper_rate = 0;
> +		goto out;
> +	}
> +
> +	value = sg2042_pll_ctrl_encode(&pctrl_table);
> +	proper_rate = (long)sg2042_pll_recalc_rate(value, *prate);
> +
> +out:
> +	pr_debug("--> %s: pll_round_rate: val = %ld\n",
> +		 clk_hw_get_name(hw), proper_rate);
> +	return proper_rate;
> +}
> +
> +static int sg2042_clk_pll_determine_rate(struct clk_hw *hw,
> +					 struct clk_rate_request *req)
> +{
> +	req->rate = sg2042_clk_pll_round_rate(hw, min(req->rate, req->max_rate),
> +					      &req->best_parent_rate);
> +	pr_debug("--> %s: pll_determine_rate: val = %ld\n",
> +		 clk_hw_get_name(hw), req->rate);
> +	return 0;
> +}
> +
> +static int sg2042_clk_pll_set_rate(struct clk_hw *hw,
> +				   unsigned long rate,
> +				   unsigned long parent_rate)
> +{
> +	unsigned long flags;
> +	unsigned int value;
> +	int ret = 0;
> +	struct sg2042_pll_ctrl pctrl_table;
> +	struct sg2042_pll_clock *pll = to_sg2042_pll_clk(hw);
> +
> +	spin_lock_irqsave(pll->lock, flags);
> +	if (sg2042_pll_enable(pll, 0)) {
> +		pr_warn("Can't disable pll(%s), status error\n", pll->hw.init->name);
> +		goto out;
> +	}
> +	ret = sg2042_get_pll_ctl_setting(&pctrl_table, rate, parent_rate);
> +	if (ret) {
> +		pr_warn("%s: Can't find a proper pll setting\n", pll->hw.init->name);
> +		goto out2;
> +	}
> +
> +	value = sg2042_pll_ctrl_encode(&pctrl_table);
> +
> +	/* write the value to top register */
> +	writel(value, pll->base + pll->offset_ctrl);
> +
> +out2:
> +	sg2042_pll_enable(pll, 1);
> +out:
> +	spin_unlock_irqrestore(pll->lock, flags);
> +
> +	pr_debug("--> %s: pll_set_rate: val = 0x%x\n",
> +		 clk_hw_get_name(hw), value);
> +	return ret;
> +}
> +
> +static const struct clk_ops sg2042_clk_pll_ops = {
> +	.recalc_rate = sg2042_clk_pll_recalc_rate,
> +	.round_rate = sg2042_clk_pll_round_rate,
> +	.determine_rate = sg2042_clk_pll_determine_rate,
> +	.set_rate = sg2042_clk_pll_set_rate,
> +};
> +
> +static const struct clk_ops sg2042_clk_pll_ro_ops = {
> +	.recalc_rate = sg2042_clk_pll_recalc_rate,
> +	.round_rate = sg2042_clk_pll_round_rate,
> +};
> +
> +static unsigned long sg2042_clk_divider_recalc_rate(struct clk_hw *hw,
> +						    unsigned long parent_rate)
> +{
> +	struct sg2042_divider_clock *divider = to_sg2042_clk_divider(hw);
> +	unsigned int val;
> +	unsigned long ret_rate;
> +
> +	if (!(readl(divider->reg) & BIT(3))) {
> +		val = (int)(divider->initval);
> +	} else {
> +		val = readl(divider->reg) >> divider->shift;
> +		val &= clk_div_mask(divider->width);
> +	}
> +
> +	ret_rate = divider_recalc_rate(hw, parent_rate, val, NULL,
> +				       divider->div_flags, divider->width);
> +
> +	pr_debug("--> %s: divider_recalc_rate: ret_rate = %ld\n",
> +		 clk_hw_get_name(hw), ret_rate);
> +	return ret_rate;
> +}
> +
> +static long sg2042_clk_divider_round_rate(struct clk_hw *hw,
> +					  unsigned long rate,
> +					  unsigned long *prate)
> +{
> +	int bestdiv;
> +	unsigned long ret_rate;
> +	struct sg2042_divider_clock *divider = to_sg2042_clk_divider(hw);
> +
> +	/* if read only, just return current value */
> +	if (divider->div_flags & CLK_DIVIDER_READ_ONLY) {
> +		if (!(readl(divider->reg) & BIT(3))) {
> +			bestdiv = (int)(divider->initval);
> +		} else {
> +			bestdiv = readl(divider->reg) >> divider->shift;
> +			bestdiv &= clk_div_mask(divider->width);
> +		}
> +		ret_rate = DIV_ROUND_UP_ULL((u64)*prate, bestdiv);
> +	} else {
> +		ret_rate = divider_round_rate(hw, rate, prate, NULL,
> +					      divider->width, divider->div_flags);
> +	}
> +
> +	pr_debug("--> %s: divider_round_rate: val = %ld\n",
> +		 clk_hw_get_name(hw), ret_rate);
> +	return ret_rate;
> +}
> +
> +static int sg2042_clk_divider_set_rate(struct clk_hw *hw,
> +				       unsigned long rate,
> +				       unsigned long parent_rate)
> +{
> +	unsigned int value;
> +	unsigned int val, val2;
> +	unsigned long flags = 0;
> +	struct sg2042_divider_clock *divider = to_sg2042_clk_divider(hw);
> +
> +	value = divider_get_val(rate, parent_rate, NULL,
> +				divider->width, divider->div_flags);
> +
> +	if (divider->lock)
> +		spin_lock_irqsave(divider->lock, flags);
> +	else
> +		__acquire(divider->lock);
> +
> +	/*
> +	 * The sequence of clock frequency modification is:
> +	 * Assert to reset divider.
> +	 * Modify the value of Clock Divide Factor (and High Wide if needed).
> +	 * De-assert to restore divided clock with new frequency.
> +	 */
> +	val = readl(divider->reg);
> +
> +	/* assert */
> +	val &= ~0x1;
> +	writel(val, divider->reg);
> +
> +	if (divider->div_flags & CLK_DIVIDER_HIWORD_MASK) {
> +		val = clk_div_mask(divider->width) << (divider->shift + 16);
> +	} else {
> +		val = readl(divider->reg);
> +		val &= ~(clk_div_mask(divider->width) << divider->shift);
> +	}
> +	val |= value << divider->shift;
> +	val |= 1 << 3;
> +	writel(val, divider->reg);
> +	val2 = val;
> +
> +	/* de-assert */
> +	val |= 1;
> +	writel(val, divider->reg);
> +
> +	if (divider->lock)
> +		spin_unlock_irqrestore(divider->lock, flags);
> +	else
> +		__release(divider->lock);
> +
> +	pr_debug("--> %s: divider_set_rate: register val = 0x%x\n",
> +		 clk_hw_get_name(hw), val2);
> +	return 0;
> +}
> +
> +static const struct clk_ops sg2042_clk_divider_ops = {
> +	.recalc_rate = sg2042_clk_divider_recalc_rate,
> +	.round_rate = sg2042_clk_divider_round_rate,
> +	.set_rate = sg2042_clk_divider_set_rate,
> +};
> +
> +static const struct clk_ops sg2042_clk_divider_ro_ops = {
> +	.recalc_rate = sg2042_clk_divider_recalc_rate,
> +	.round_rate = sg2042_clk_divider_round_rate,
> +};
> +
> +#define SG2042_PLL(_id, _name, _parent_name, _r_stat, _r_enable, _r_ctrl, _shift) \
> +	{								\
> +		.hw.init = CLK_HW_INIT(					\
> +				_name,					\
> +				_parent_name,				\
> +				&sg2042_clk_pll_ops,			\
> +				CLK_GET_RATE_NOCACHE | CLK_GET_ACCURACY_NOCACHE),\
> +		.id = _id,						\
> +		.offset_ctrl = _r_ctrl,					\
> +		.offset_status = _r_stat,				\
> +		.offset_enable = _r_enable,				\
> +		.shift_status_lock = 8 + (_shift),			\
> +		.shift_status_updating = _shift,			\
> +		.shift_enable = _shift,					\
> +	}
> +
> +#define SG2042_PLL_RO(_id, _name, _parent_name, _r_stat, _r_enable, _r_ctrl, _shift) \
> +	{								\
> +		.hw.init = CLK_HW_INIT(					\
> +				_name,					\
> +				_parent_name,				\
> +				&sg2042_clk_pll_ro_ops,			\
> +				CLK_GET_RATE_NOCACHE | CLK_GET_ACCURACY_NOCACHE),\
> +		.id = _id,						\
> +		.offset_ctrl = _r_ctrl,					\
> +		.offset_status = _r_stat,				\
> +		.offset_enable = _r_enable,				\
> +		.shift_status_lock = 8 + (_shift),			\
> +		.shift_status_updating = _shift,			\
> +		.shift_enable = _shift,					\
> +	}
> +
> +static struct sg2042_pll_clock sg2042_pll_clks[] = {
> +	SG2042_PLL(MPLL_CLK, "mpll_clock", "cgi_main",
> +		   R_PLL_STAT, R_PLL_CLKEN_CONTROL, R_MPLL_CONTROL, 0),
> +	SG2042_PLL_RO(FPLL_CLK, "fpll_clock", "cgi_main",
> +		      R_PLL_STAT, R_PLL_CLKEN_CONTROL, R_FPLL_CONTROL, 3),
> +	SG2042_PLL_RO(DPLL0_CLK, "dpll0_clock", "cgi_dpll0",
> +		      R_PLL_STAT, R_PLL_CLKEN_CONTROL, R_DPLL0_CONTROL, 4),
> +	SG2042_PLL_RO(DPLL1_CLK, "dpll1_clock", "cgi_dpll1",
> +		      R_PLL_STAT, R_PLL_CLKEN_CONTROL, R_DPLL1_CONTROL, 5),
> +};
> +
> +#define SG2042_DIV(_id, _name, _parent_name,				\
> +		  _r_ctrl, _shift, _width,				\
> +		  _div_flag, _initval) {				\
> +		.hw.init = CLK_HW_INIT(					\
> +				_name,					\
> +				_parent_name,				\
> +				&sg2042_clk_divider_ops,		\
> +				0),					\
> +		.id = _id,						\
> +		.offset_ctrl = _r_ctrl,					\
> +		.shift = _shift,					\
> +		.width = _width,					\
> +		.div_flags = _div_flag,					\
> +		.initval = _initval,					\
> +	}
> +
> +#define SG2042_DIV_RO(_id, _name, _parent_name,				\
> +		  _r_ctrl, _shift, _width,				\
> +		  _div_flag, _initval) {				\
> +		.hw.init = CLK_HW_INIT(					\
> +				_name,					\
> +				_parent_name,				\
> +				&sg2042_clk_divider_ro_ops,		\
> +				0),					\
> +		.id = _id,						\
> +		.offset_ctrl = _r_ctrl,					\
> +		.shift = _shift,					\
> +		.width = _width,					\
> +		.div_flags = (_div_flag) | CLK_DIVIDER_READ_ONLY,	\
> +		.initval = _initval,					\
> +	}
> +
> +/*
> + * DIV items in the array are sorted according to the clock-tree diagram,
> + * from top to bottom, from upstream to downstream. Read TRM for details.
> + */
> +#define DEF_DIVFLAG (CLK_DIVIDER_ONE_BASED | CLK_DIVIDER_ALLOW_ZERO)
> +static struct sg2042_divider_clock sg2042_div_clks[] = {
> +	SG2042_DIV_RO(DIV_CLK_DPLL0_DDR01_0,
> +		      "clk_div_ddr01_0", "clk_gate_ddr01_div0",
> +		      R_CLKDIVREG27, 16, 5, DEF_DIVFLAG, 1),
> +	SG2042_DIV_RO(DIV_CLK_FPLL_DDR01_1,
> +		      "clk_div_ddr01_1", "clk_gate_ddr01_div1",
> +		      R_CLKDIVREG28, 16, 5, DEF_DIVFLAG, 1),
> +
> +	SG2042_DIV_RO(DIV_CLK_DPLL1_DDR23_0,
> +		      "clk_div_ddr23_0", "clk_gate_ddr23_div0",
> +		      R_CLKDIVREG29, 16, 5, DEF_DIVFLAG, 1),
> +	SG2042_DIV_RO(DIV_CLK_FPLL_DDR23_1,
> +		      "clk_div_ddr23_1", "clk_gate_ddr23_div1",
> +		      R_CLKDIVREG30, 16, 5, DEF_DIVFLAG, 1),
> +
> +	SG2042_DIV(DIV_CLK_MPLL_RP_CPU_NORMAL_0,
> +		   "clk_div_rp_cpu_normal_0", "clk_gate_rp_cpu_normal_div0",
> +		   R_CLKDIVREG0, 16, 5, DEF_DIVFLAG, 1),
> +	SG2042_DIV(DIV_CLK_FPLL_RP_CPU_NORMAL_1,
> +		   "clk_div_rp_cpu_normal_1", "clk_gate_rp_cpu_normal_div1",
> +		   R_CLKDIVREG1, 16, 5, DEF_DIVFLAG, 1),
> +
> +	SG2042_DIV(DIV_CLK_MPLL_AXI_DDR_0,
> +		   "clk_div_axi_ddr_0", "clk_gate_axi_ddr_div0",
> +		   R_CLKDIVREG25, 16, 5, DEF_DIVFLAG, 2),
> +	SG2042_DIV(DIV_CLK_FPLL_AXI_DDR_1,
> +		   "clk_div_axi_ddr_1", "clk_gate_axi_ddr_div1",
> +		   R_CLKDIVREG26, 16, 5, DEF_DIVFLAG, 1),
> +
> +	SG2042_DIV(DIV_CLK_FPLL_TOP_RP_CMN_DIV2,
> +		   "clk_div_top_rp_cmn_div2", "clk_mux_rp_cpu_normal",
> +		   R_CLKDIVREG3, 16, 16, DEF_DIVFLAG, 2),
> +
> +	SG2042_DIV(DIV_CLK_FPLL_50M_A53, "clk_div_50m_a53", "fpll_clock",
> +		   R_CLKDIVREG2, 16, 8, DEF_DIVFLAG, 20),
> +	/* downstream of div_50m_a53 */
> +	SG2042_DIV(DIV_CLK_FPLL_DIV_TIMER1, "clk_div_timer1", "clk_div_50m_a53",
> +		   R_CLKDIVREG6, 16, 16, DEF_DIVFLAG, 1),
> +	SG2042_DIV(DIV_CLK_FPLL_DIV_TIMER2, "clk_div_timer2", "clk_div_50m_a53",
> +		   R_CLKDIVREG7, 16, 16, DEF_DIVFLAG, 1),
> +	SG2042_DIV(DIV_CLK_FPLL_DIV_TIMER3, "clk_div_timer3", "clk_div_50m_a53",
> +		   R_CLKDIVREG8, 16, 16, DEF_DIVFLAG, 1),
> +	SG2042_DIV(DIV_CLK_FPLL_DIV_TIMER4, "clk_div_timer4", "clk_div_50m_a53",
> +		   R_CLKDIVREG9, 16, 16, DEF_DIVFLAG, 1),
> +	SG2042_DIV(DIV_CLK_FPLL_DIV_TIMER5, "clk_div_timer5", "clk_div_50m_a53",
> +		   R_CLKDIVREG10, 16, 16, DEF_DIVFLAG, 1),
> +	SG2042_DIV(DIV_CLK_FPLL_DIV_TIMER6, "clk_div_timer6", "clk_div_50m_a53",
> +		   R_CLKDIVREG11, 16, 16, DEF_DIVFLAG, 1),
> +	SG2042_DIV(DIV_CLK_FPLL_DIV_TIMER7, "clk_div_timer7", "clk_div_50m_a53",
> +		   R_CLKDIVREG12, 16, 16, DEF_DIVFLAG, 1),
> +	SG2042_DIV(DIV_CLK_FPLL_DIV_TIMER8, "clk_div_timer8", "clk_div_50m_a53",
> +		   R_CLKDIVREG13, 16, 16, DEF_DIVFLAG, 1),
> +
> +	/*
> +	 * Set clk_div_uart_500m as RO, because the width of CLKDIVREG4 is too
> +	 * narrow for us to produce 115200. Use UART internal divider directly.
> +	 */
> +	SG2042_DIV_RO(DIV_CLK_FPLL_UART_500M, "clk_div_uart_500m", "fpll_clock",
> +		      R_CLKDIVREG4, 16, 7, DEF_DIVFLAG, 2),
> +	SG2042_DIV(DIV_CLK_FPLL_AHB_LPC, "clk_div_ahb_lpc", "fpll_clock",
> +		   R_CLKDIVREG5, 16, 16, DEF_DIVFLAG, 5),
> +	SG2042_DIV(DIV_CLK_FPLL_EFUSE, "clk_div_efuse", "fpll_clock",
> +		   R_CLKDIVREG14, 16, 7, DEF_DIVFLAG, 40),
> +	SG2042_DIV(DIV_CLK_FPLL_TX_ETH0, "clk_div_tx_eth0", "fpll_clock",
> +		   R_CLKDIVREG16, 16, 11, DEF_DIVFLAG, 8),
> +	SG2042_DIV(DIV_CLK_FPLL_PTP_REF_I_ETH0,
> +		   "clk_div_ptp_ref_i_eth0", "fpll_clock",
> +		   R_CLKDIVREG17, 16, 8, DEF_DIVFLAG, 20),
> +	SG2042_DIV(DIV_CLK_FPLL_REF_ETH0, "clk_div_ref_eth0", "fpll_clock",
> +		   R_CLKDIVREG18, 16, 8, DEF_DIVFLAG, 40),
> +	SG2042_DIV(DIV_CLK_FPLL_EMMC, "clk_div_emmc", "fpll_clock",
> +		   R_CLKDIVREG19, 16, 5, DEF_DIVFLAG, 10),
> +	SG2042_DIV(DIV_CLK_FPLL_SD, "clk_div_sd", "fpll_clock",
> +		   R_CLKDIVREG21, 16, 5, DEF_DIVFLAG, 10),
> +
> +	SG2042_DIV(DIV_CLK_FPLL_TOP_AXI0, "clk_div_top_axi0", "fpll_clock",
> +		   R_CLKDIVREG23, 16, 5, DEF_DIVFLAG, 10),
> +	/* downstream of div_top_axi0 */
> +	SG2042_DIV(DIV_CLK_FPLL_100K_EMMC, "clk_div_100k_emmc", "clk_div_top_axi0",
> +		   R_CLKDIVREG20, 16, 16, DEF_DIVFLAG, 1000),
> +	SG2042_DIV(DIV_CLK_FPLL_100K_SD, "clk_div_100k_sd", "clk_div_top_axi0",
> +		   R_CLKDIVREG22, 16, 16, DEF_DIVFLAG, 1000),
> +	SG2042_DIV(DIV_CLK_FPLL_GPIO_DB, "clk_div_gpio_db", "clk_div_top_axi0",
> +		   R_CLKDIVREG15, 16, 16, DEF_DIVFLAG, 1000),
> +
> +	SG2042_DIV(DIV_CLK_FPLL_TOP_AXI_HSPERI,
> +		   "clk_div_top_axi_hsperi", "fpll_clock",
> +		   R_CLKDIVREG24, 16, 5, DEF_DIVFLAG, 4),
> +};
> +
> +#define SG2042_GATE(_id, _name, _parent_name, _flags,	\
> +		    _r_enable, _bit_idx) {		\
> +		.hw.init = CLK_HW_INIT(			\
> +				_name,			\
> +				_parent_name,		\
> +				NULL,			\
> +				_flags),		\
> +		.id = _id,				\
> +		.offset_enable = _r_enable,		\
> +		.bit_idx = _bit_idx,			\
> +	}
> +
> +/*
> + * GATE items in the array are sorted according to the clock-tree diagram,
> + * from top to bottom, from upstream to downstream. Read TRM for details.
> + */
> +
> +/* Gate clocks which control registers are defined in CLOCK. */
> +static const struct sg2042_gate_clock sg2042_gate_clks[] = {
> +	SG2042_GATE(GATE_CLK_DDR01_DIV0, "clk_gate_ddr01_div0", "dpll0_clock",
> +		    CLK_SET_RATE_PARENT | CLK_IGNORE_UNUSED,
> +		    R_CLKDIVREG27, 4),
> +	SG2042_GATE(GATE_CLK_DDR01_DIV1, "clk_gate_ddr01_div1", "fpll_clock",
> +		    CLK_IS_CRITICAL,
> +		    R_CLKDIVREG28, 4),
> +
> +	SG2042_GATE(GATE_CLK_DDR23_DIV0, "clk_gate_ddr23_div0", "dpll1_clock",
> +		    CLK_SET_RATE_PARENT | CLK_IGNORE_UNUSED,
> +		    R_CLKDIVREG29, 4),
> +	SG2042_GATE(GATE_CLK_DDR23_DIV1, "clk_gate_ddr23_div1", "fpll_clock",
> +		    CLK_IS_CRITICAL,
> +		    R_CLKDIVREG30, 4),
> +
> +	SG2042_GATE(GATE_CLK_RP_CPU_NORMAL_DIV0, "clk_gate_rp_cpu_normal_div0", "mpll_clock",
> +		    CLK_SET_RATE_PARENT | CLK_IS_CRITICAL,
> +		    R_CLKDIVREG0, 4),
> +	SG2042_GATE(GATE_CLK_RP_CPU_NORMAL_DIV1,
> +		    "clk_gate_rp_cpu_normal_div1", "fpll_clock",
> +		    CLK_IS_CRITICAL,
> +		    R_CLKDIVREG1, 4),
> +
> +	SG2042_GATE(GATE_CLK_AXI_DDR_DIV0, "clk_gate_axi_ddr_div0", "mpll_clock",
> +		    CLK_SET_RATE_PARENT | CLK_IS_CRITICAL,
> +		    R_CLKDIVREG25, 4),
> +	SG2042_GATE(GATE_CLK_AXI_DDR_DIV1, "clk_gate_axi_ddr_div1", "fpll_clock",
> +		    CLK_IS_CRITICAL,
> +		    R_CLKDIVREG26, 4),
> +
> +	/* upon are gate clocks as input source for the muxes */
> +
> +	SG2042_GATE(GATE_CLK_DDR01, "clk_gate_ddr01", "clk_mux_ddr01",
> +		    CLK_SET_RATE_PARENT | CLK_IS_CRITICAL,
> +		    R_CLKENREG1, 14),
> +
> +	SG2042_GATE(GATE_CLK_DDR23, "clk_gate_ddr23", "clk_mux_ddr23",
> +		    CLK_SET_RATE_PARENT | CLK_IS_CRITICAL,
> +		    R_CLKENREG1, 15),
> +
> +	SG2042_GATE(GATE_CLK_RP_CPU_NORMAL,
> +		    "clk_gate_rp_cpu_normal", "clk_mux_rp_cpu_normal",
> +		    CLK_SET_RATE_PARENT | CLK_IS_CRITICAL,
> +		    R_CLKENREG0, 0),
> +
> +	SG2042_GATE(GATE_CLK_AXI_DDR, "clk_gate_axi_ddr", "clk_mux_axi_ddr",
> +		    CLK_SET_RATE_PARENT | CLK_IS_CRITICAL,
> +		    R_CLKENREG1, 13),
> +
> +	/* upon are gate clocks directly downstream of muxes */
> +
> +	/* downstream of clk_div_top_rp_cmn_div2 */
> +	SG2042_GATE(GATE_CLK_TOP_RP_CMN_DIV2,
> +		    "clk_gate_top_rp_cmn_div2", "clk_div_top_rp_cmn_div2",
> +		    CLK_SET_RATE_PARENT | CLK_IGNORE_UNUSED, R_CLKENREG0, 2),
> +	SG2042_GATE(GATE_CLK_HSDMA, "clk_gate_hsdma", "clk_gate_top_rp_cmn_div2",
> +		    CLK_SET_RATE_PARENT, R_CLKENREG1, 10),
> +
> +	/*
> +	 * downstream of clk_gate_rp_cpu_normal
> +	 *
> +	 * FIXME: there should be one 1/2 DIV between clk_gate_rp_cpu_normal
> +	 * and clk_gate_axi_pcie0/clk_gate_axi_pcie1.
> +	 * But the 1/2 DIV is fixed and no configurable register exported, so
> +	 * when reading from these two clocks, the rate value are still the
> +	 * same as that of clk_gate_rp_cpu_normal, it's not correct.
> +	 * This just affects the value read.
> +	 */
> +	SG2042_GATE(GATE_CLK_AXI_PCIE0,
> +		    "clk_gate_axi_pcie0", "clk_gate_rp_cpu_normal",
> +		    CLK_SET_RATE_PARENT | CLK_IGNORE_UNUSED, R_CLKENREG1, 8),
> +	SG2042_GATE(GATE_CLK_AXI_PCIE1,
> +		    "clk_gate_axi_pcie1", "clk_gate_rp_cpu_normal",
> +		    CLK_SET_RATE_PARENT | CLK_IGNORE_UNUSED, R_CLKENREG1, 9),
> +
> +	/* downstream of div_50m_a53 */
> +	SG2042_GATE(GATE_CLK_A53_50M, "clk_gate_a53_50m", "clk_div_50m_a53",
> +		    CLK_SET_RATE_PARENT | CLK_IGNORE_UNUSED, R_CLKENREG0, 1),
> +	SG2042_GATE(GATE_CLK_TIMER1, "clk_gate_timer1", "clk_div_timer1",
> +		    CLK_SET_RATE_PARENT, R_CLKENREG0, 12),
> +	SG2042_GATE(GATE_CLK_TIMER2, "clk_gate_timer2", "clk_div_timer2",
> +		    CLK_SET_RATE_PARENT, R_CLKENREG0, 13),
> +	SG2042_GATE(GATE_CLK_TIMER3, "clk_gate_timer3", "clk_div_timer3",
> +		    CLK_SET_RATE_PARENT, R_CLKENREG0, 14),
> +	SG2042_GATE(GATE_CLK_TIMER4, "clk_gate_timer4", "clk_div_timer4",
> +		    CLK_SET_RATE_PARENT, R_CLKENREG0, 15),
> +	SG2042_GATE(GATE_CLK_TIMER5, "clk_gate_timer5", "clk_div_timer5",
> +		    CLK_SET_RATE_PARENT, R_CLKENREG0, 16),
> +	SG2042_GATE(GATE_CLK_TIMER6, "clk_gate_timer6", "clk_div_timer6",
> +		    CLK_SET_RATE_PARENT, R_CLKENREG0, 17),
> +	SG2042_GATE(GATE_CLK_TIMER7, "clk_gate_timer7", "clk_div_timer7",
> +		    CLK_SET_RATE_PARENT, R_CLKENREG0, 18),
> +	SG2042_GATE(GATE_CLK_TIMER8, "clk_gate_timer8", "clk_div_timer8",
> +		    CLK_SET_RATE_PARENT, R_CLKENREG0, 19),
> +
> +	/* gate clocks downstream from div clocks one-to-one */
> +	SG2042_GATE(GATE_CLK_UART_500M, "clk_gate_uart_500m", "clk_div_uart_500m",
> +		    CLK_SET_RATE_PARENT | CLK_IGNORE_UNUSED, R_CLKENREG0, 4),
> +	SG2042_GATE(GATE_CLK_AHB_LPC, "clk_gate_ahb_lpc", "clk_div_ahb_lpc",
> +		    CLK_SET_RATE_PARENT, R_CLKENREG0, 7),
> +	SG2042_GATE(GATE_CLK_EFUSE, "clk_gate_efuse", "clk_div_efuse",
> +		    CLK_SET_RATE_PARENT, R_CLKENREG0, 20),
> +	SG2042_GATE(GATE_CLK_TX_ETH0, "clk_gate_tx_eth0", "clk_div_tx_eth0",
> +		    CLK_SET_RATE_PARENT, R_CLKENREG0, 30),
> +	SG2042_GATE(GATE_CLK_PTP_REF_I_ETH0,
> +		    "clk_gate_ptp_ref_i_eth0", "clk_div_ptp_ref_i_eth0",
> +		    CLK_SET_RATE_PARENT, R_CLKENREG1, 0),
> +	SG2042_GATE(GATE_CLK_REF_ETH0, "clk_gate_ref_eth0", "clk_div_ref_eth0",
> +		    CLK_SET_RATE_PARENT, R_CLKENREG1, 1),
> +	SG2042_GATE(GATE_CLK_EMMC_100M, "clk_gate_emmc", "clk_div_emmc",
> +		    CLK_SET_RATE_PARENT, R_CLKENREG1, 3),
> +	SG2042_GATE(GATE_CLK_SD_100M, "clk_gate_sd", "clk_div_sd",
> +		    CLK_SET_RATE_PARENT, R_CLKENREG1, 6),
> +
> +	/* downstream of clk_div_top_axi0 */
> +	SG2042_GATE(GATE_CLK_AHB_ROM, "clk_gate_ahb_rom", "clk_div_top_axi0",
> +		    0, R_CLKENREG0, 8),
> +	SG2042_GATE(GATE_CLK_AHB_SF, "clk_gate_ahb_sf", "clk_div_top_axi0",
> +		    0, R_CLKENREG0, 9),
> +	SG2042_GATE(GATE_CLK_AXI_SRAM, "clk_gate_axi_sram", "clk_div_top_axi0",
> +		    CLK_IGNORE_UNUSED, R_CLKENREG0, 10),
> +	SG2042_GATE(GATE_CLK_APB_TIMER, "clk_gate_apb_timer", "clk_div_top_axi0",
> +		    CLK_IGNORE_UNUSED, R_CLKENREG0, 11),
> +	SG2042_GATE(GATE_CLK_APB_EFUSE, "clk_gate_apb_efuse", "clk_div_top_axi0",
> +		    0, R_CLKENREG0, 21),
> +	SG2042_GATE(GATE_CLK_APB_GPIO, "clk_gate_apb_gpio", "clk_div_top_axi0",
> +		    0, R_CLKENREG0, 22),
> +	SG2042_GATE(GATE_CLK_APB_GPIO_INTR,
> +		    "clk_gate_apb_gpio_intr", "clk_div_top_axi0",
> +		    CLK_IS_CRITICAL, R_CLKENREG0, 23),
> +	SG2042_GATE(GATE_CLK_APB_I2C, "clk_gate_apb_i2c", "clk_div_top_axi0",
> +		    0, R_CLKENREG0, 26),
> +	SG2042_GATE(GATE_CLK_APB_WDT, "clk_gate_apb_wdt", "clk_div_top_axi0",
> +		    0, R_CLKENREG0, 27),
> +	SG2042_GATE(GATE_CLK_APB_PWM, "clk_gate_apb_pwm", "clk_div_top_axi0",
> +		    0, R_CLKENREG0, 28),
> +	SG2042_GATE(GATE_CLK_APB_RTC, "clk_gate_apb_rtc", "clk_div_top_axi0",
> +		    0, R_CLKENREG0, 29),
> +	SG2042_GATE(GATE_CLK_TOP_AXI0, "clk_gate_top_axi0", "clk_div_top_axi0",
> +		    CLK_SET_RATE_PARENT | CLK_IS_CRITICAL,
> +		    R_CLKENREG1, 11),
> +	/* downstream of DIV clocks which are sourced from clk_div_top_axi0 */
> +	SG2042_GATE(GATE_CLK_GPIO_DB, "clk_gate_gpio_db", "clk_div_gpio_db",
> +		    CLK_SET_RATE_PARENT, R_CLKENREG0, 24),
> +	SG2042_GATE(GATE_CLK_100K_EMMC, "clk_gate_100k_emmc", "clk_div_100k_emmc",
> +		    CLK_SET_RATE_PARENT, R_CLKENREG1, 4),
> +	SG2042_GATE(GATE_CLK_100K_SD, "clk_gate_100k_sd", "clk_div_100k_sd",
> +		    CLK_SET_RATE_PARENT, R_CLKENREG1, 7),
> +
> +	/* downstream of clk_div_top_axi_hsperi */
> +	SG2042_GATE(GATE_CLK_SYSDMA_AXI,
> +		    "clk_gate_sysdma_axi", "clk_div_top_axi_hsperi",
> +		    CLK_SET_RATE_PARENT, R_CLKENREG0, 3),
> +	SG2042_GATE(GATE_CLK_APB_UART,
> +		    "clk_gate_apb_uart", "clk_div_top_axi_hsperi",
> +		    CLK_SET_RATE_PARENT, R_CLKENREG0, 5),
> +	SG2042_GATE(GATE_CLK_AXI_DBG_I2C,
> +		    "clk_gate_axi_dbg_i2c", "clk_div_top_axi_hsperi",
> +		    CLK_SET_RATE_PARENT, R_CLKENREG0, 6),
> +	SG2042_GATE(GATE_CLK_APB_SPI,
> +		    "clk_gate_apb_spi", "clk_div_top_axi_hsperi",
> +		    CLK_SET_RATE_PARENT, R_CLKENREG0, 25),
> +	SG2042_GATE(GATE_CLK_AXI_ETH0,
> +		    "clk_gate_axi_eth0", "clk_div_top_axi_hsperi",
> +		    CLK_SET_RATE_PARENT, R_CLKENREG0, 31),
> +	SG2042_GATE(GATE_CLK_AXI_EMMC,
> +		    "clk_gate_axi_emmc", "clk_div_top_axi_hsperi",
> +		    CLK_SET_RATE_PARENT, R_CLKENREG1, 2),
> +	SG2042_GATE(GATE_CLK_AXI_SD,
> +		    "clk_gate_axi_sd", "clk_div_top_axi_hsperi",
> +		    CLK_SET_RATE_PARENT, R_CLKENREG1, 5),
> +	SG2042_GATE(GATE_CLK_TOP_AXI_HSPERI,
> +		    "clk_gate_top_axi_hsperi", "clk_div_top_axi_hsperi",
> +		    CLK_SET_RATE_PARENT | CLK_IS_CRITICAL,
> +		    R_CLKENREG1, 12),
> +};
> +
> +/*
> + * Gate clocks for RP subsystem (including the MP subsystem), which control
> + * registers are defined in SYS_CTRL.
> + */
> +static const struct sg2042_gate_clock sg2042_gate_rp[] = {
> +	/* downstream of clk_gate_rp_cpu_normal about rxu */
> +	SG2042_GATE(GATE_CLK_RXU0, "clk_gate_rxu0", "clk_gate_rp_cpu_normal",
> +		    0, R_RP_RXU_CLK_ENABLE, 0),
> +	SG2042_GATE(GATE_CLK_RXU1, "clk_gate_rxu1", "clk_gate_rp_cpu_normal",
> +		    0, R_RP_RXU_CLK_ENABLE, 1),
> +	SG2042_GATE(GATE_CLK_RXU2, "clk_gate_rxu2", "clk_gate_rp_cpu_normal",
> +		    0, R_RP_RXU_CLK_ENABLE, 2),
> +	SG2042_GATE(GATE_CLK_RXU3, "clk_gate_rxu3", "clk_gate_rp_cpu_normal",
> +		    0, R_RP_RXU_CLK_ENABLE, 3),
> +	SG2042_GATE(GATE_CLK_RXU4, "clk_gate_rxu4", "clk_gate_rp_cpu_normal",
> +		    0, R_RP_RXU_CLK_ENABLE, 4),
> +	SG2042_GATE(GATE_CLK_RXU5, "clk_gate_rxu5", "clk_gate_rp_cpu_normal",
> +		    0, R_RP_RXU_CLK_ENABLE, 5),
> +	SG2042_GATE(GATE_CLK_RXU6, "clk_gate_rxu6", "clk_gate_rp_cpu_normal",
> +		    0, R_RP_RXU_CLK_ENABLE, 6),
> +	SG2042_GATE(GATE_CLK_RXU7, "clk_gate_rxu7", "clk_gate_rp_cpu_normal",
> +		    0, R_RP_RXU_CLK_ENABLE, 7),
> +	SG2042_GATE(GATE_CLK_RXU8, "clk_gate_rxu8", "clk_gate_rp_cpu_normal",
> +		    0, R_RP_RXU_CLK_ENABLE, 8),
> +	SG2042_GATE(GATE_CLK_RXU9, "clk_gate_rxu9", "clk_gate_rp_cpu_normal",
> +		    0, R_RP_RXU_CLK_ENABLE, 9),
> +	SG2042_GATE(GATE_CLK_RXU10, "clk_gate_rxu10", "clk_gate_rp_cpu_normal",
> +		    0, R_RP_RXU_CLK_ENABLE, 10),
> +	SG2042_GATE(GATE_CLK_RXU11, "clk_gate_rxu11", "clk_gate_rp_cpu_normal",
> +		    0, R_RP_RXU_CLK_ENABLE, 11),
> +	SG2042_GATE(GATE_CLK_RXU12, "clk_gate_rxu12", "clk_gate_rp_cpu_normal",
> +		    0, R_RP_RXU_CLK_ENABLE, 12),
> +	SG2042_GATE(GATE_CLK_RXU13, "clk_gate_rxu13", "clk_gate_rp_cpu_normal",
> +		    0, R_RP_RXU_CLK_ENABLE, 13),
> +	SG2042_GATE(GATE_CLK_RXU14, "clk_gate_rxu14", "clk_gate_rp_cpu_normal",
> +		    0, R_RP_RXU_CLK_ENABLE, 14),
> +	SG2042_GATE(GATE_CLK_RXU15, "clk_gate_rxu15", "clk_gate_rp_cpu_normal",
> +		    0, R_RP_RXU_CLK_ENABLE, 15),
> +	SG2042_GATE(GATE_CLK_RXU16, "clk_gate_rxu16", "clk_gate_rp_cpu_normal",
> +		    0, R_RP_RXU_CLK_ENABLE, 16),
> +	SG2042_GATE(GATE_CLK_RXU17, "clk_gate_rxu17", "clk_gate_rp_cpu_normal",
> +		    0, R_RP_RXU_CLK_ENABLE, 17),
> +	SG2042_GATE(GATE_CLK_RXU18, "clk_gate_rxu18", "clk_gate_rp_cpu_normal",
> +		    0, R_RP_RXU_CLK_ENABLE, 18),
> +	SG2042_GATE(GATE_CLK_RXU19, "clk_gate_rxu19", "clk_gate_rp_cpu_normal",
> +		    0, R_RP_RXU_CLK_ENABLE, 19),
> +	SG2042_GATE(GATE_CLK_RXU20, "clk_gate_rxu20", "clk_gate_rp_cpu_normal",
> +		    0, R_RP_RXU_CLK_ENABLE, 20),
> +	SG2042_GATE(GATE_CLK_RXU21, "clk_gate_rxu21", "clk_gate_rp_cpu_normal",
> +		    0, R_RP_RXU_CLK_ENABLE, 21),
> +	SG2042_GATE(GATE_CLK_RXU22, "clk_gate_rxu22", "clk_gate_rp_cpu_normal",
> +		    0, R_RP_RXU_CLK_ENABLE, 22),
> +	SG2042_GATE(GATE_CLK_RXU23, "clk_gate_rxu23", "clk_gate_rp_cpu_normal",
> +		    0, R_RP_RXU_CLK_ENABLE, 23),
> +	SG2042_GATE(GATE_CLK_RXU24, "clk_gate_rxu24", "clk_gate_rp_cpu_normal",
> +		    0, R_RP_RXU_CLK_ENABLE, 24),
> +	SG2042_GATE(GATE_CLK_RXU25, "clk_gate_rxu25", "clk_gate_rp_cpu_normal",
> +		    0, R_RP_RXU_CLK_ENABLE, 25),
> +	SG2042_GATE(GATE_CLK_RXU26, "clk_gate_rxu26", "clk_gate_rp_cpu_normal",
> +		    0, R_RP_RXU_CLK_ENABLE, 26),
> +	SG2042_GATE(GATE_CLK_RXU27, "clk_gate_rxu27", "clk_gate_rp_cpu_normal",
> +		    0, R_RP_RXU_CLK_ENABLE, 27),
> +	SG2042_GATE(GATE_CLK_RXU28, "clk_gate_rxu28", "clk_gate_rp_cpu_normal",
> +		    0, R_RP_RXU_CLK_ENABLE, 28),
> +	SG2042_GATE(GATE_CLK_RXU29, "clk_gate_rxu29", "clk_gate_rp_cpu_normal",
> +		    0, R_RP_RXU_CLK_ENABLE, 29),
> +	SG2042_GATE(GATE_CLK_RXU30, "clk_gate_rxu30", "clk_gate_rp_cpu_normal",
> +		    0, R_RP_RXU_CLK_ENABLE, 30),
> +	SG2042_GATE(GATE_CLK_RXU31, "clk_gate_rxu31", "clk_gate_rp_cpu_normal",
> +		    0, R_RP_RXU_CLK_ENABLE, 31),
> +
> +	/* downstream of clk_gate_rp_cpu_normal about mp */
> +	SG2042_GATE(GATE_CLK_MP0, "clk_gate_mp0", "clk_gate_rp_cpu_normal",
> +		    CLK_IS_CRITICAL, R_MP0_CONTROL_REG, 0),
> +	SG2042_GATE(GATE_CLK_MP1, "clk_gate_mp1", "clk_gate_rp_cpu_normal",
> +		    CLK_IS_CRITICAL, R_MP1_CONTROL_REG, 0),
> +	SG2042_GATE(GATE_CLK_MP2, "clk_gate_mp2", "clk_gate_rp_cpu_normal",
> +		    CLK_IS_CRITICAL, R_MP2_CONTROL_REG, 0),
> +	SG2042_GATE(GATE_CLK_MP3, "clk_gate_mp3", "clk_gate_rp_cpu_normal",
> +		    CLK_IS_CRITICAL, R_MP3_CONTROL_REG, 0),
> +	SG2042_GATE(GATE_CLK_MP4, "clk_gate_mp4", "clk_gate_rp_cpu_normal",
> +		    CLK_IS_CRITICAL, R_MP4_CONTROL_REG, 0),
> +	SG2042_GATE(GATE_CLK_MP5, "clk_gate_mp5", "clk_gate_rp_cpu_normal",
> +		    CLK_IS_CRITICAL, R_MP5_CONTROL_REG, 0),
> +	SG2042_GATE(GATE_CLK_MP6, "clk_gate_mp6", "clk_gate_rp_cpu_normal",
> +		    CLK_IS_CRITICAL, R_MP6_CONTROL_REG, 0),
> +	SG2042_GATE(GATE_CLK_MP7, "clk_gate_mp7", "clk_gate_rp_cpu_normal",
> +		    CLK_IS_CRITICAL, R_MP7_CONTROL_REG, 0),
> +	SG2042_GATE(GATE_CLK_MP8, "clk_gate_mp8", "clk_gate_rp_cpu_normal",
> +		    CLK_IS_CRITICAL, R_MP8_CONTROL_REG, 0),
> +	SG2042_GATE(GATE_CLK_MP9, "clk_gate_mp9", "clk_gate_rp_cpu_normal",
> +		    CLK_IS_CRITICAL, R_MP9_CONTROL_REG, 0),
> +	SG2042_GATE(GATE_CLK_MP10, "clk_gate_mp10", "clk_gate_rp_cpu_normal",
> +		    CLK_IS_CRITICAL, R_MP10_CONTROL_REG, 0),
> +	SG2042_GATE(GATE_CLK_MP11, "clk_gate_mp11", "clk_gate_rp_cpu_normal",
> +		    CLK_IS_CRITICAL, R_MP11_CONTROL_REG, 0),
> +	SG2042_GATE(GATE_CLK_MP12, "clk_gate_mp12", "clk_gate_rp_cpu_normal",
> +		    CLK_IS_CRITICAL, R_MP12_CONTROL_REG, 0),
> +	SG2042_GATE(GATE_CLK_MP13, "clk_gate_mp13", "clk_gate_rp_cpu_normal",
> +		    CLK_IS_CRITICAL, R_MP13_CONTROL_REG, 0),
> +	SG2042_GATE(GATE_CLK_MP14, "clk_gate_mp14", "clk_gate_rp_cpu_normal",
> +		    CLK_IS_CRITICAL, R_MP14_CONTROL_REG, 0),
> +	SG2042_GATE(GATE_CLK_MP15, "clk_gate_mp15", "clk_gate_rp_cpu_normal",
> +		    CLK_IS_CRITICAL, R_MP15_CONTROL_REG, 0),
> +};
> +
> +#define SG2042_MUX(_id, _name, _parent_names, _flags, _r_select, _shift, _width) { \
> +		.hw.init = CLK_HW_INIT_PARENTS(			\
> +				_name,				\
> +				_parent_names,			\
> +				NULL,				\
> +				_flags),			\
> +		.id = _id,					\
> +		.offset_select = _r_select,			\
> +		.shift = _shift,				\
> +		.width = _width,				\
> +	}
> +
> +/*
> + * Note: regarding names for mux clock, "0/1" or "div0/div1" means the
> + * first/second parent input source, not the register value.
> + * For example:
> + * "clk_div_ddr01_0" is the name of Clock divider 0 control of DDR01, and
> + * "clk_gate_ddr01_div0" is the gate clock in front of the "clk_div_ddr01_0",
> + * they are both controlled by register CLKDIVREG27;
> + * "clk_div_ddr01_1" is the name of Clock divider 1 control of DDR01, and
> + * "clk_gate_ddr01_div1" is the gate clock in front of the "clk_div_ddr01_1",
> + * they are both controlled by register CLKDIVREG28;
> + * While for register value of mux selection, use Clock Select for DDR01’s clock
> + * as example, see CLKSELREG0, bit[2].
> + * 1: Select in_dpll0_clk as clock source, correspondng to the parent input
> + *    source from "clk_div_ddr01_0".
> + * 0: Select in_fpll_clk as clock source, corresponding to the parent input
> + *    source from "clk_div_ddr01_1".
> + * So we need a table to define the array of register values corresponding to
> + * the parent index and tell CCF about this when registering mux clock.
> + */
> +static const u32 sg2042_mux_table[] = {1, 0};
> +
> +static const char *const clk_mux_ddr01_p[] = {
> +			"clk_div_ddr01_0", "clk_div_ddr01_1"};
> +static const char *const clk_mux_ddr23_p[] = {
> +			"clk_div_ddr23_0", "clk_div_ddr23_1"};
> +static const char *const clk_mux_rp_cpu_normal_p[] = {
> +			"clk_div_rp_cpu_normal_0", "clk_div_rp_cpu_normal_1"};
> +static const char *const clk_mux_axi_ddr_p[] = {
> +			"clk_div_axi_ddr_0", "clk_div_axi_ddr_1"};
> +
> +static struct sg2042_mux_clock sg2042_mux_clks[] = {
> +	SG2042_MUX(MUX_CLK_DDR01, "clk_mux_ddr01", clk_mux_ddr01_p,
> +		   CLK_SET_RATE_PARENT | CLK_SET_RATE_NO_REPARENT | CLK_MUX_READ_ONLY,
> +		   R_CLKSELREG0, 2, 1),
> +	SG2042_MUX(MUX_CLK_DDR23, "clk_mux_ddr23", clk_mux_ddr23_p,
> +		   CLK_SET_RATE_PARENT | CLK_SET_RATE_NO_REPARENT | CLK_MUX_READ_ONLY,
> +		   R_CLKSELREG0, 3, 1),
> +	SG2042_MUX(MUX_CLK_RP_CPU_NORMAL, "clk_mux_rp_cpu_normal", clk_mux_rp_cpu_normal_p,
> +		   CLK_SET_RATE_PARENT | CLK_SET_RATE_NO_REPARENT,
> +		   R_CLKSELREG0, 0, 1),
> +	SG2042_MUX(MUX_CLK_AXI_DDR, "clk_mux_axi_ddr", clk_mux_axi_ddr_p,
> +		   CLK_SET_RATE_PARENT | CLK_SET_RATE_NO_REPARENT,
> +		   R_CLKSELREG0, 1, 1),
> +};
> +
> +static DEFINE_SPINLOCK(sg2042_clk_lock);
> +
> +static int sg2042_clk_register_plls(struct sg2042_clk_data *clk_data,
> +				    struct sg2042_pll_clock pll_clks[],
> +				    int num_pll_clks)
> +{
> +	struct clk_hw *hw;
> +	struct sg2042_pll_clock *pll;
> +	int i, ret = 0;
> +
> +	for (i = 0; i < num_pll_clks; i++) {
> +		pll = &pll_clks[i];
> +		/* assign these for ops usage during registration */
> +		pll->base = clk_data->iobase;
> +		pll->lock = &sg2042_clk_lock;
> +
> +		hw = &pll->hw;
> +		ret = clk_hw_register(NULL, hw);
> +		if (ret) {
> +			pr_err("failed to register clock %s\n", pll->hw.init->name);
> +			break;
> +		}
> +
> +		clk_data->onecell_data.hws[pll->id] = hw;
> +	}
> +
> +	/* leave unregister to outside if failed */
> +	return ret;
> +}
> +
> +static int sg2042_clk_register_divs(struct sg2042_clk_data *clk_data,
> +				    struct sg2042_divider_clock div_clks[],
> +				    int num_div_clks)
> +{
> +	struct clk_hw *hw;
> +	struct sg2042_divider_clock *div;
> +	int i, ret = 0;
> +
> +	for (i = 0; i < num_div_clks; i++) {
> +		div = &div_clks[i];
> +
> +		if (div->div_flags & CLK_DIVIDER_HIWORD_MASK) {
> +			if (div->width + div->shift > 16) {
> +				pr_warn("divider value exceeds LOWORD field\n");
> +				ret = -EINVAL;
> +				break;
> +			}
> +		}
> +
> +		div->reg = clk_data->iobase + div->offset_ctrl;
> +		div->lock = &sg2042_clk_lock;
> +
> +		hw = &div->hw;
> +		ret = clk_hw_register(NULL, hw);
> +		if (ret) {
> +			pr_err("failed to register clock %s\n", div->hw.init->name);
> +			break;
> +		}
> +
> +		clk_data->onecell_data.hws[div->id] = hw;
> +	}
> +
> +	/* leave unregister to outside if failed */
> +	return ret;
> +}
> +
> +static int sg2042_clk_register_gates(struct sg2042_clk_data *clk_data,
> +				     const struct sg2042_gate_clock gate_clks[],
> +				     int num_gate_clks)
> +{
> +	struct clk_hw *hw;
> +	const struct sg2042_gate_clock *gate;
> +	int i, ret = 0;
> +
> +	for (i = 0; i < num_gate_clks; i++) {
> +		gate = &gate_clks[i];
> +		hw = clk_hw_register_gate(NULL,
> +					  gate->hw.init->name,
> +					  gate->hw.init->parent_names[0],
> +					  gate->hw.init->flags,
> +					  clk_data->iobase + gate->offset_enable,
> +					  gate->bit_idx,
> +					  0,
> +					  &sg2042_clk_lock);
> +		if (IS_ERR(hw)) {
> +			pr_err("failed to register clock %s\n", gate->hw.init->name);
> +			ret = PTR_ERR(hw);
> +			break;
> +		}
> +
> +		clk_data->onecell_data.hws[gate->id] = hw;
> +	}
> +
> +	/* leave unregister to outside if failed */
> +	return ret;
> +}
> +
> +static int sg2042_mux_notifier_cb(struct notifier_block *nb,
> +				  unsigned long event,
> +				  void *data)
> +{
> +	int ret = 0;
> +	struct clk_notifier_data *ndata = data;
> +	struct clk_hw *hw = __clk_get_hw(ndata->clk);
> +	const struct clk_ops *ops = &clk_mux_ops;
> +	struct sg2042_mux_clock *mux = to_sg2042_mux_nb(nb);
> +
> +	/* To switch to fpll before changing rate and restore after that */
> +	if (event == PRE_RATE_CHANGE) {
> +		mux->original_index = ops->get_parent(hw);
> +
> +		/*
> +		 * "1" is the array index of the second parent input source of
> +		 * mux. For SG2042, it's fpll for all mux clocks.
> +		 * "0" is the array index of the frist parent input source of
> +		 * mux, For SG2042, it's mpll.
> +		 * FIXME, any good idea to avoid magic number?
> +		 */
> +		if (mux->original_index == 0)
> +			ret = ops->set_parent(hw, 1);
> +	} else if (event == POST_RATE_CHANGE) {
> +		ret = ops->set_parent(hw, mux->original_index);
> +	}
> +
> +	return notifier_from_errno(ret);
> +}
> +
> +static int sg2042_clk_register_muxs(struct sg2042_clk_data *clk_data,
> +				    struct sg2042_mux_clock mux_clks[],
> +				    int num_mux_clks)
> +{
> +	struct clk_hw *hw;
> +	struct sg2042_mux_clock *mux;
> +	int i, ret = 0;
> +
> +	for (i = 0; i < num_mux_clks; i++) {
> +		mux = &mux_clks[i];
> +
> +		hw = clk_hw_register_mux_table(NULL,
> +					       mux->hw.init->name,
> +					       mux->hw.init->parent_names,
> +					       mux->hw.init->num_parents,
> +					       mux->hw.init->flags,
> +					       clk_data->iobase + mux->offset_select,
> +					       mux->shift,
> +					       BIT(mux->width) - 1,
> +					       0,
> +					       sg2042_mux_table,
> +					       &sg2042_clk_lock);
> +		if (IS_ERR(hw)) {
> +			pr_err("failed to register clock %s\n", mux->hw.init->name);
> +			ret = PTR_ERR(hw);
> +			break;
> +		}
> +
> +		clk_data->onecell_data.hws[mux->id] = hw;
> +
> +		/*
> +		 * FIXME: Theoretically, we should set parent for the
> +		 * mux, but seems hardware has done this for us with
> +		 * default value, so we don't set parent again here.
> +		 */
> +
> +		if (!(mux->hw.init->flags & CLK_MUX_READ_ONLY)) {
> +			mux->clk_nb.notifier_call = sg2042_mux_notifier_cb;
> +			ret = clk_notifier_register(hw->clk, &mux->clk_nb);
> +			if (ret) {
> +				pr_err("failed to register clock notifier for %s\n",
> +				       mux->hw.init->name);
> +				goto error_cleanup;
> +			}
> +		}
> +	}
> +
> +	return 0;
> +
> +error_cleanup:
> +	/* unregister notifier and release the memory allocated */
> +	for (i = 0; i < num_mux_clks; i++) {
> +		mux = &mux_clks[i];
> +
> +		hw = clk_data->onecell_data.hws[mux->id];
> +
> +		if (hw)
> +			clk_notifier_unregister(hw->clk, &mux->clk_nb);
> +	}
> +
> +	/* leave clk unregister to outside if failed */
> +	return ret;
> +}
> +
> +static int sg2042_init_clkdata(struct platform_device *pdev,
> +			       int num_clks,
> +			       struct sg2042_clk_data **pp_clk_data)
> +{
> +	struct sg2042_clk_data *clk_data = NULL;
> +
> +	clk_data = devm_kzalloc(&pdev->dev,
> +				struct_size(clk_data, onecell_data.hws, num_clks),
> +				GFP_KERNEL);
> +	if (!clk_data)
> +		return -ENOMEM;
> +
> +	clk_data->iobase = devm_of_iomap(&pdev->dev, pdev->dev.of_node, 0, NULL);
> +	if (WARN_ON(IS_ERR(clk_data->iobase)))
> +		return PTR_ERR(clk_data->iobase);
> +
> +	clk_data->onecell_data.num = num_clks;
> +
> +	*pp_clk_data = clk_data;
> +
> +	return 0;
> +}
> +
> +static int sg2042_clkgen_probe(struct platform_device *pdev)
> +{
> +	struct sg2042_clk_data *clk_data = NULL;
> +	int i, ret = 0;
> +	int num_clks = 0;
> +
> +	num_clks = ARRAY_SIZE(sg2042_div_clks) +
> +		   ARRAY_SIZE(sg2042_gate_clks) +
> +		   ARRAY_SIZE(sg2042_mux_clks);
> +	if (num_clks == 0) {
> +		ret = -EINVAL;
> +		goto error_out;
> +	}
> +
> +	ret = sg2042_init_clkdata(pdev, num_clks, &clk_data);
> +	if (ret < 0)
> +		goto error_out;
> +
> +	ret = sg2042_clk_register_divs(clk_data, sg2042_div_clks,
> +				       ARRAY_SIZE(sg2042_div_clks));
> +	if (ret)
> +		goto cleanup;
> +
> +	ret = sg2042_clk_register_gates(clk_data, sg2042_gate_clks,
> +					ARRAY_SIZE(sg2042_gate_clks));
> +	if (ret)
> +		goto cleanup;
> +
> +	ret = sg2042_clk_register_muxs(clk_data, sg2042_mux_clks,
> +				       ARRAY_SIZE(sg2042_mux_clks));
> +	if (ret)
> +		goto cleanup;
> +
> +	return devm_of_clk_add_hw_provider(&pdev->dev,
> +					   of_clk_hw_onecell_get,
> +					   &clk_data->onecell_data);
> +
> +cleanup:
> +	for (i = 0; i < num_clks; i++) {
> +		if (clk_data->onecell_data.hws[i])
> +			clk_hw_unregister(clk_data->onecell_data.hws[i]);
> +	}
> +
> +error_out:
> +	pr_err("%s failed error number %d\n", __func__, ret);
> +	return ret;
> +}
> +
> +static int sg2042_rpgate_probe(struct platform_device *pdev)
> +{
> +	struct sg2042_clk_data *clk_data = NULL;
> +	int i, ret = 0;
> +	int num_clks = 0;
> +
> +	num_clks = ARRAY_SIZE(sg2042_gate_rp);
> +	if (num_clks == 0) {
> +		ret = -EINVAL;
> +		goto error_out;
> +	}
> +
> +	ret = sg2042_init_clkdata(pdev, num_clks, &clk_data);
> +	if (ret < 0)
> +		goto error_out;
> +
> +	ret = sg2042_clk_register_gates(clk_data, sg2042_gate_rp,
> +					ARRAY_SIZE(sg2042_gate_rp));
> +	if (ret)
> +		goto cleanup;
> +
> +	return devm_of_clk_add_hw_provider(&pdev->dev,
> +					   of_clk_hw_onecell_get,
> +					   &clk_data->onecell_data);
> +
> +cleanup:
> +	for (i = 0; i < num_clks; i++) {
> +		if (clk_data->onecell_data.hws[i])
> +			clk_hw_unregister(clk_data->onecell_data.hws[i]);
> +	}
> +
> +error_out:
> +	pr_err("%s failed error number %d\n", __func__, ret);
> +	return ret;
> +}
> +
> +static int sg2042_pll_probe(struct platform_device *pdev)
> +{
> +	struct sg2042_clk_data *clk_data = NULL;
> +	int i, ret = 0;
> +	int num_clks = 0;
> +
> +	num_clks = ARRAY_SIZE(sg2042_pll_clks);
> +	if (num_clks == 0) {
> +		ret = -EINVAL;
> +		goto error_out;
> +	}
> +
> +	ret = sg2042_init_clkdata(pdev, num_clks, &clk_data);
> +	if (ret < 0)
> +		goto error_out;
> +
> +	ret = sg2042_clk_register_plls(clk_data, sg2042_pll_clks,
> +				       ARRAY_SIZE(sg2042_pll_clks));
> +	if (ret)
> +		goto cleanup;
> +
> +	return devm_of_clk_add_hw_provider(&pdev->dev,
> +					   of_clk_hw_onecell_get,
> +					   &clk_data->onecell_data);
> +
> +cleanup:
> +	for (i = 0; i < num_clks; i++) {
> +		if (clk_data->onecell_data.hws[i])
> +			clk_hw_unregister(clk_data->onecell_data.hws[i]);
> +	}
> +
> +error_out:
> +	pr_err("%s failed error number %d\n", __func__, ret);
> +	return ret;
> +}
> +
> +static const struct of_device_id sg2042_clkgen_match[] = {
> +	{ .compatible = "sophgo,sg2042-clkgen" },
> +	{ /* sentinel */ }
> +};
> +
> +static struct platform_driver sg2042_clkgen_driver = {
> +	.probe = sg2042_clkgen_probe,
> +	.driver = {
> +		.name = "clk-sophgo-sg2042-clkgen",
> +		.of_match_table = sg2042_clkgen_match,
> +		.suppress_bind_attrs = true,
> +	},
> +};
> +builtin_platform_driver(sg2042_clkgen_driver);
> +
> +static const struct of_device_id sg2042_rpgate_match[] = {
> +	{ .compatible = "sophgo,sg2042-rpgate" },
> +	{ /* sentinel */ }
> +};
> +
> +static struct platform_driver sg2042_rpgate_driver = {
> +	.probe = sg2042_rpgate_probe,
> +	.driver = {
> +		.name = "clk-sophgo-sg2042-rpgate",
> +		.of_match_table = sg2042_rpgate_match,
> +		.suppress_bind_attrs = true,
> +	},
> +};
> +builtin_platform_driver(sg2042_rpgate_driver);
> +
> +static const struct of_device_id sg2042_pll_match[] = {
> +	{ .compatible = "sophgo,sg2042-pll" },
> +	{ /* sentinel */ }
> +};
> +
> +static struct platform_driver sg2042_pll_driver = {
> +	.probe = sg2042_pll_probe,
> +	.driver = {
> +		.name = "clk-sophgo-sg2042-pll",
> +		.of_match_table = sg2042_pll_match,
> +		.suppress_bind_attrs = true,
> +	},
> +};
> +builtin_platform_driver(sg2042_pll_driver);
> diff --git a/drivers/clk/sophgo/clk-sophgo-sg2042.h b/drivers/clk/sophgo/clk-sophgo-sg2042.h
> new file mode 100644
> index 000000000000..407fec6f3f02
> --- /dev/null
> +++ b/drivers/clk/sophgo/clk-sophgo-sg2042.h
> @@ -0,0 +1,216 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef __CLK_SOPHGO_SG2042_H
> +#define __CLK_SOPHGO_SG2042_H
> +
> +/* Registers defined in SYS_CTRL */
> +#define R_PLL_BEGIN		0xC0
> +#define R_PLL_STAT		(0xC0 - R_PLL_BEGIN)
> +#define R_PLL_CLKEN_CONTROL	(0xC4 - R_PLL_BEGIN)
> +#define R_MPLL_CONTROL		(0xE8 - R_PLL_BEGIN)
> +#define R_FPLL_CONTROL		(0xF4 - R_PLL_BEGIN)
> +#define R_DPLL0_CONTROL		(0xF8 - R_PLL_BEGIN)
> +#define R_DPLL1_CONTROL		(0xFC - R_PLL_BEGIN)
> +
> +#define R_SYSGATE_BEGIN		0x0368
> +#define R_RP_RXU_CLK_ENABLE	(0x0368 - R_SYSGATE_BEGIN)
> +#define R_MP0_STATUS_REG	(0x0380 - R_SYSGATE_BEGIN)
> +#define R_MP0_CONTROL_REG	(0x0384 - R_SYSGATE_BEGIN)
> +#define R_MP1_STATUS_REG	(0x0388 - R_SYSGATE_BEGIN)
> +#define R_MP1_CONTROL_REG	(0x038C - R_SYSGATE_BEGIN)
> +#define R_MP2_STATUS_REG	(0x0390 - R_SYSGATE_BEGIN)
> +#define R_MP2_CONTROL_REG	(0x0394 - R_SYSGATE_BEGIN)
> +#define R_MP3_STATUS_REG	(0x0398 - R_SYSGATE_BEGIN)
> +#define R_MP3_CONTROL_REG	(0x039C - R_SYSGATE_BEGIN)
> +#define R_MP4_STATUS_REG	(0x03A0 - R_SYSGATE_BEGIN)
> +#define R_MP4_CONTROL_REG	(0x03A4 - R_SYSGATE_BEGIN)
> +#define R_MP5_STATUS_REG	(0x03A8 - R_SYSGATE_BEGIN)
> +#define R_MP5_CONTROL_REG	(0x03AC - R_SYSGATE_BEGIN)
> +#define R_MP6_STATUS_REG	(0x03B0 - R_SYSGATE_BEGIN)
> +#define R_MP6_CONTROL_REG	(0x03B4 - R_SYSGATE_BEGIN)
> +#define R_MP7_STATUS_REG	(0x03B8 - R_SYSGATE_BEGIN)
> +#define R_MP7_CONTROL_REG	(0x03BC - R_SYSGATE_BEGIN)
> +#define R_MP8_STATUS_REG	(0x03C0 - R_SYSGATE_BEGIN)
> +#define R_MP8_CONTROL_REG	(0x03C4 - R_SYSGATE_BEGIN)
> +#define R_MP9_STATUS_REG	(0x03C8 - R_SYSGATE_BEGIN)
> +#define R_MP9_CONTROL_REG	(0x03CC - R_SYSGATE_BEGIN)
> +#define R_MP10_STATUS_REG	(0x03D0 - R_SYSGATE_BEGIN)
> +#define R_MP10_CONTROL_REG	(0x03D4 - R_SYSGATE_BEGIN)
> +#define R_MP11_STATUS_REG	(0x03D8 - R_SYSGATE_BEGIN)
> +#define R_MP11_CONTROL_REG	(0x03DC - R_SYSGATE_BEGIN)
> +#define R_MP12_STATUS_REG	(0x03E0 - R_SYSGATE_BEGIN)
> +#define R_MP12_CONTROL_REG	(0x03E4 - R_SYSGATE_BEGIN)
> +#define R_MP13_STATUS_REG	(0x03E8 - R_SYSGATE_BEGIN)
> +#define R_MP13_CONTROL_REG	(0x03EC - R_SYSGATE_BEGIN)
> +#define R_MP14_STATUS_REG	(0x03F0 - R_SYSGATE_BEGIN)
> +#define R_MP14_CONTROL_REG	(0x03F4 - R_SYSGATE_BEGIN)
> +#define R_MP15_STATUS_REG	(0x03F8 - R_SYSGATE_BEGIN)
> +#define R_MP15_CONTROL_REG	(0x03FC - R_SYSGATE_BEGIN)
> +
> +/* Registers defined in CLOCK */
> +#define R_CLKENREG0		0x00
> +#define R_CLKENREG1		0x04
> +#define R_CLKSELREG0		0x20
> +#define R_CLKDIVREG0		0x40
> +#define R_CLKDIVREG1		0x44
> +#define R_CLKDIVREG2		0x48
> +#define R_CLKDIVREG3		0x4C
> +#define R_CLKDIVREG4		0x50
> +#define R_CLKDIVREG5		0x54
> +#define R_CLKDIVREG6		0x58
> +#define R_CLKDIVREG7		0x5C
> +#define R_CLKDIVREG8		0x60
> +#define R_CLKDIVREG9		0x64
> +#define R_CLKDIVREG10		0x68
> +#define R_CLKDIVREG11		0x6C
> +#define R_CLKDIVREG12		0x70
> +#define R_CLKDIVREG13		0x74
> +#define R_CLKDIVREG14		0x78
> +#define R_CLKDIVREG15		0x7C
> +#define R_CLKDIVREG16		0x80
> +#define R_CLKDIVREG17		0x84
> +#define R_CLKDIVREG18		0x88
> +#define R_CLKDIVREG19		0x8C
> +#define R_CLKDIVREG20		0x90
> +#define R_CLKDIVREG21		0x94
> +#define R_CLKDIVREG22		0x98
> +#define R_CLKDIVREG23		0x9C
> +#define R_CLKDIVREG24		0xA0
> +#define R_CLKDIVREG25		0xA4
> +#define R_CLKDIVREG26		0xA8
> +#define R_CLKDIVREG27		0xAC
> +#define R_CLKDIVREG28		0xB0
> +#define R_CLKDIVREG29		0xB4
> +#define R_CLKDIVREG30		0xB8
> +
> +/*
> + * Common data of clock-controller
> + * Note: this structure will be used both by clkgen & sysclk.
> + * @iobase: base address of clock-controller
> + * @onecell_data: used for adding providers.
> + */
> +struct sg2042_clk_data {
> +	void __iomem *iobase;
> +	struct clk_hw_onecell_data onecell_data;
> +};
> +
> +/*
> + * PLL clock
> + * @hw:				clk_hw for initialization
> + * @id:				used to map clk_onecell_data
> + * @base:			used for readl/writel.
> + *				**NOTE**: PLL registers are all in SYS_CTRL!
> + * @lock:			spinlock to protect register access, modification
> + *				of frequency can only be served one at the time.
> + * @offset_status:		offset of pll status registers
> + * @offset_enable:		offset of pll enable registers
> + * @offset_ctrl:		offset of pll control registers
> + * @shift_status_lock:		shift of XXX_LOCK in pll status register
> + * @shift_status_updating:	shift of UPDATING_XXX in pll status register
> + * @shift_enable:		shift of XXX_CLK_EN in pll enable register
> + */
> +struct sg2042_pll_clock {
> +	struct clk_hw hw;
> +
> +	unsigned int id;
> +	void __iomem *base;
> +	/* protect register access */
> +	spinlock_t *lock;
> +
> +	u32 offset_status;
> +	u32 offset_enable;
> +	u32 offset_ctrl;
> +	u8 shift_status_lock;
> +	u8 shift_status_updating;
> +	u8 shift_enable;
> +};
> +
> +#define to_sg2042_pll_clk(_hw) container_of(_hw, struct sg2042_pll_clock, hw)
> +
> +/*
> + * Divider clock
> + * @hw:			clk_hw for initialization
> + * @id:			used to map clk_onecell_data
> + * @reg:		used for readl/writel.
> + *			**NOTE**: DIV registers are ALL in CLOCK!
> + * @lock:		spinlock to protect register access, modification of
> + *			frequency can only be served one at the time
> + * @offset_ctrl:	offset of divider control registers
> + * @shift:		shift of "Clock Divider Factor" in divider control register
> + * @width:		width of "Clock Divider Factor" in divider control register
> + * @div_flags:		private flags for this clock, not for framework-specific
> + * @initval:		In the divider control register, we can configure whether
> + *			to use the value of "Clock Divider Factor" or just use
> + *			the initial value pre-configured by IC. BIT[3] controls
> + *			this and by default (value is 0), means initial value
> + *			is used.
> + *			**NOTE** that we cannot read the initial value (default
> + *			value when poweron) and default value of "Clock Divider
> + *			Factor" is zero, which I think is a hardware design flaw
> + *			and should be sync-ed with the initial value. So in
> + *			software we have to add a configuration item (initval)
> + *			to manually configure this value and use it when BIT[3]
> + *			is zero.
> + */
> +struct sg2042_divider_clock {
> +	struct clk_hw hw;
> +
> +	unsigned int id;
> +
> +	void __iomem *reg;
> +	/* protect register access */
> +	spinlock_t *lock;
> +
> +	unsigned long offset_ctrl;
> +	u8 shift;
> +	u8 width;
> +	u8 div_flags;
> +	u32 initval;
> +};
> +
> +#define to_sg2042_clk_divider(_hw)	\
> +	container_of(_hw, struct sg2042_divider_clock, hw)
> +
> +/*
> + * Gate clock
> + * @hw:			clk_hw for initialization
> + * @id:			used to map clk_onecell_data
> + * @offset_enable:	offset of gate enable registers
> + * @bit_idx:		which bit in the register controls gating of this clock
> + */
> +struct sg2042_gate_clock {
> +	struct clk_hw hw;
> +
> +	unsigned int id;
> +
> +	unsigned long offset_enable;
> +	u8 bit_idx;
> +};
> +
> +/*
> + * Mux clock
> + * @hw:			clk_hw for initialization
> + * @id:			used to map clk_onecell_data
> + * @offset_select:	offset of mux selection registers
> + *			**NOTE**: MUX registers are ALL in CLOCK!
> + * @shift:		shift of "Clock Select" in mux selection register
> + * @width:		width of "Clock Select" in mux selection register
> + * @clk_nb:		used for notification
> + * @original_index:	set by notifier callback
> + */
> +struct sg2042_mux_clock {
> +	struct clk_hw hw;
> +
> +	unsigned int id;
> +
> +	unsigned long offset_select;
> +	u8 shift;
> +	u8 width;
> +
> +	struct notifier_block clk_nb;
> +	u8 original_index;
> +};
> +
> +#define to_sg2042_mux_nb(_nb) container_of(_nb, struct sg2042_mux_clock, clk_nb)
> +
> +#endif /* __CLK_SOPHGO_SG2042_H */

^ permalink raw reply

* Re: [PATCH v2] dt-bindings: watchdog: aspeed,ast2400-wdt: Convert to DT schema
From: Andrew Jeffery @ 2024-04-04  0:26 UTC (permalink / raw)
  To: Rob Herring
  Cc: wim, linux, krzysztof.kozlowski+dt, conor+dt, joel, zev,
	linux-watchdog, devicetree, linux-arm-kernel, linux-aspeed,
	linux-kernel
In-Reply-To: <20240403171321.GA3996007-robh@kernel.org>

On Wed, 2024-04-03 at 12:13 -0500, Rob Herring wrote:
> On Wed, Apr 03, 2024 at 12:34:39PM +1030, Andrew Jeffery wrote:
> > Squash warnings such as:
> > 
> > ```
> > arch/arm/boot/dts/aspeed/aspeed-bmc-facebook-galaxy100.dtb: /ahb/apb@1e600000/watchdog@1e785000: failed to match any schema with compatible: ['aspeed,ast2400-wdt']
> > ```
> > 
> > The schema binding additionally defines the clocks property over the
> > prose binding to align with use of the node in the DTS files.
> > 
> > Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au>
> > ---
> > v2: Address feedback from Rob and Zev
> > 
> >     - Rob: https://lore.kernel.org/linux-watchdog/20240402180718.GA358505-robh@kernel.org/
> >     - Zev: https://lore.kernel.org/linux-watchdog/65722a59-2e94-4616-81e1-835615b0e600@hatter.bewilderbeest.net/
> > 
> > v1: https://lore.kernel.org/linux-watchdog/20240402120118.282035-1-andrew@codeconstruct.com.au/
> > 
> >  .../bindings/watchdog/aspeed,ast2400-wdt.yaml | 142 ++++++++++++++++++
> >  .../bindings/watchdog/aspeed-wdt.txt          |  73 ---------
> >  2 files changed, 142 insertions(+), 73 deletions(-)
> >  create mode 100644 Documentation/devicetree/bindings/watchdog/aspeed,ast2400-wdt.yaml
> >  delete mode 100644 Documentation/devicetree/bindings/watchdog/aspeed-wdt.txt
> > 
> > diff --git a/Documentation/devicetree/bindings/watchdog/aspeed,ast2400-wdt.yaml b/Documentation/devicetree/bindings/watchdog/aspeed,ast2400-wdt.yaml
> > new file mode 100644
> > index 000000000000..be78a9865584
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/watchdog/aspeed,ast2400-wdt.yaml
> > @@ -0,0 +1,142 @@
> > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > +%YAML 1.2
> > +---
> > +$id: http://devicetree.org/schemas/watchdog/aspeed,ast2400-wdt.yaml#
> > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > +
> > +title: Aspeed watchdog timer controllers
> > +
> > +maintainers:
> > +  - Andrew Jeffery <andrew@codeconstruct.com.au>
> > +
> > +properties:
> > +  compatible:
> > +    enum:
> > +      - aspeed,ast2400-wdt
> > +      - aspeed,ast2500-wdt
> > +      - aspeed,ast2600-wdt
> > +
> > +  reg:
> > +    maxItems: 1
> > +
> > +  clocks:
> > +    maxItems: 1
> > +    description: >
> 
> You don't need '>' either. I guess it is equivalent here as there are no 
> double newlines. Drop these if you respin, otherwise:
> 
> Reviewed-by: Rob Herring <robh@kernel.org>

Thanks. I've made a note for the future to avoid `>` if it's not
necessary, but at the time I figured it wasn't incorrect to include it.

Andrew

^ permalink raw reply

* [PATCH v2] arm64: dts: debix-a: Disable i2c2 in base .dts
From: Laurent Pinchart @ 2024-04-04  0:20 UTC (permalink / raw)
  To: devicetree, imx, linux-arm-kernel
  Cc: Jacopo Mondi, Rob Herring, Conor Dooley, Krzysztof Kozlowski,
	Fabio Estevam, Sascha Hauer, Pengutronix Kernel Team,
	Jacopo Mondi, Shawn Guo

From: Jacopo Mondi <jacopo@jmondi.org>

The I2C2 bus is used for the CSI and DSI connectors only, no devices are
connected to it on neither the Debix Model A nor its IO board. Disable
the bus in the board's .dts and remove its clock frequency settings, as
the value depends solely on the devices conncted to the CSI and DSI
connectors. Display panel or camera sensor overlays will configure and
enable the bus when necessary.

Signed-off-by: Jacopo Mondi <jacopo@jmondi.org>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
---
Changes since v1:

- Don't drop the bus, just disable it
---
 arch/arm64/boot/dts/freescale/imx8mp-debix-model-a.dts | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/arm64/boot/dts/freescale/imx8mp-debix-model-a.dts b/arch/arm64/boot/dts/freescale/imx8mp-debix-model-a.dts
index 2c19766ebf09..9b8f97a84e61 100644
--- a/arch/arm64/boot/dts/freescale/imx8mp-debix-model-a.dts
+++ b/arch/arm64/boot/dts/freescale/imx8mp-debix-model-a.dts
@@ -197,10 +197,8 @@ ldo5: LDO5 {
 };
 
 &i2c2 {
-	clock-frequency = <100000>;
 	pinctrl-names = "default";
 	pinctrl-0 = <&pinctrl_i2c2>;
-	status = "okay";
 };
 
 &i2c3 {

base-commit: 4cece764965020c22cff7665b18a012006359095
-- 
Regards,

Laurent Pinchart


^ permalink raw reply related

* [PATCH v3 29/29] kselftest/riscv: kselftest for user mode cfi
From: Deepak Gupta @ 2024-04-03 23:35 UTC (permalink / raw)
  To: paul.walmsley, rick.p.edgecombe, broonie, Szabolcs.Nagy,
	kito.cheng, keescook, ajones, conor.dooley, cleger, atishp, alex,
	bjorn, alexghiti, samuel.holland, conor
  Cc: linux-doc, linux-riscv, linux-kernel, devicetree, linux-mm,
	linux-arch, linux-kselftest, corbet, palmer, aou, robh+dt,
	krzysztof.kozlowski+dt, oleg, akpm, arnd, ebiederm, Liam.Howlett,
	vbabka, lstoakes, shuah, brauner, debug, andy.chiu, jerry.shih,
	hankuan.chen, greentime.hu, evan, xiao.w.wang, charlie, apatel,
	mchitale, dbarboza, sameo, shikemeng, willy, vincent.chen, guoren,
	samitolvanen, songshuaishuai, gerg, heiko, bhe, jeeheng.sia, cyy,
	maskray, ancientmodern4, mathis.salmen, cuiyunhui, bgray, mpe,
	baruch, alx, david, catalin.marinas, revest, josh, shr, deller,
	omosnace, ojeda, jhubbard
In-Reply-To: <20240403234054.2020347-1-debug@rivosinc.com>

Adds kselftest for RISC-V control flow integrity implementation for user
mode. There is not a lot going on in kernel for enabling landing pad for
user mode. cfi selftest are intended to be compiled with zicfilp and
zicfiss enabled compiler. Thus kselftest simply checks if landing pad and
shadow stack for the binary and process are enabled or not. selftest then
register a signal handler for SIGSEGV. Any control flow violation are
reported as SIGSEGV with si_code = SEGV_CPERR. Test will fail on recieving
any SEGV_CPERR. Shadow stack part has more changes in kernel and thus there
are separate tests for that
	- Exercise `map_shadow_stack` syscall
	- `fork` test to make sure COW works for shadow stack pages
	- gup tests
	  As of today kernel uses FOLL_FORCE when access happens to memory via
	  /proc/<pid>/mem. Not breaking that for shadow stack
	- signal test. Make sure signal delivery results in token creation on
      shadow stack and consumes (and verifies) token on sigreturn
    - shadow stack protection test. attempts to write using regular store
	  instruction on shadow stack memory must result in access faults

Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
 tools/testing/selftests/riscv/Makefile        |   2 +-
 tools/testing/selftests/riscv/cfi/.gitignore  |   3 +
 tools/testing/selftests/riscv/cfi/Makefile    |  10 +
 .../testing/selftests/riscv/cfi/cfi_rv_test.h |  83 ++++
 .../selftests/riscv/cfi/riscv_cfi_test.c      |  82 ++++
 .../testing/selftests/riscv/cfi/shadowstack.c | 362 ++++++++++++++++++
 .../testing/selftests/riscv/cfi/shadowstack.h |  37 ++
 7 files changed, 578 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/riscv/cfi/.gitignore
 create mode 100644 tools/testing/selftests/riscv/cfi/Makefile
 create mode 100644 tools/testing/selftests/riscv/cfi/cfi_rv_test.h
 create mode 100644 tools/testing/selftests/riscv/cfi/riscv_cfi_test.c
 create mode 100644 tools/testing/selftests/riscv/cfi/shadowstack.c
 create mode 100644 tools/testing/selftests/riscv/cfi/shadowstack.h

diff --git a/tools/testing/selftests/riscv/Makefile b/tools/testing/selftests/riscv/Makefile
index 4a9ff515a3a0..867e5875b7ce 100644
--- a/tools/testing/selftests/riscv/Makefile
+++ b/tools/testing/selftests/riscv/Makefile
@@ -5,7 +5,7 @@
 ARCH ?= $(shell uname -m 2>/dev/null || echo not)
 
 ifneq (,$(filter $(ARCH),riscv))
-RISCV_SUBTARGETS ?= hwprobe vector mm
+RISCV_SUBTARGETS ?= hwprobe vector mm cfi
 else
 RISCV_SUBTARGETS :=
 endif
diff --git a/tools/testing/selftests/riscv/cfi/.gitignore b/tools/testing/selftests/riscv/cfi/.gitignore
new file mode 100644
index 000000000000..ce7623f9da28
--- /dev/null
+++ b/tools/testing/selftests/riscv/cfi/.gitignore
@@ -0,0 +1,3 @@
+cfitests
+riscv_cfi_test
+shadowstack
\ No newline at end of file
diff --git a/tools/testing/selftests/riscv/cfi/Makefile b/tools/testing/selftests/riscv/cfi/Makefile
new file mode 100644
index 000000000000..b65f7ff38a32
--- /dev/null
+++ b/tools/testing/selftests/riscv/cfi/Makefile
@@ -0,0 +1,10 @@
+CFLAGS += -I$(top_srcdir)/tools/include
+
+CFLAGS += -march=rv64gc_zicfilp_zicfiss
+
+TEST_GEN_PROGS := cfitests
+
+include ../../lib.mk
+
+$(OUTPUT)/cfitests: riscv_cfi_test.c shadowstack.c
+	$(CC) -o$@ $(CFLAGS) $(LDFLAGS) $^
diff --git a/tools/testing/selftests/riscv/cfi/cfi_rv_test.h b/tools/testing/selftests/riscv/cfi/cfi_rv_test.h
new file mode 100644
index 000000000000..fa1cf7183672
--- /dev/null
+++ b/tools/testing/selftests/riscv/cfi/cfi_rv_test.h
@@ -0,0 +1,83 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef SELFTEST_RISCV_CFI_H
+#define SELFTEST_RISCV_CFI_H
+#include <stddef.h>
+#include <sys/types.h>
+#include "shadowstack.h"
+
+#define RISCV_CFI_SELFTEST_COUNT RISCV_SHADOW_STACK_TESTS
+
+#define CHILD_EXIT_CODE_SSWRITE		10
+#define CHILD_EXIT_CODE_SIG_TEST	11
+
+#define my_syscall5(num, arg1, arg2, arg3, arg4, arg5)		\
+({															\
+	register long _num  __asm__ ("a7") = (num);				\
+	register long _arg1 __asm__ ("a0") = (long)(arg1);		\
+	register long _arg2 __asm__ ("a1") = (long)(arg2);		\
+	register long _arg3 __asm__ ("a2") = (long)(arg3);		\
+	register long _arg4 __asm__ ("a3") = (long)(arg4);		\
+	register long _arg5 __asm__ ("a4") = (long)(arg5);		\
+															\
+	__asm__ volatile (										\
+		"ecall\n"											\
+		: "+r"(_arg1)										\
+		: "r"(_arg2), "r"(_arg3), "r"(_arg4), "r"(_arg5),	\
+		  "r"(_num)											\
+		: "memory", "cc"									\
+	);														\
+	_arg1;													\
+})
+
+#define my_syscall3(num, arg1, arg2, arg3)					\
+({															\
+	register long _num  __asm__ ("a7") = (num);				\
+	register long _arg1 __asm__ ("a0") = (long)(arg1);		\
+	register long _arg2 __asm__ ("a1") = (long)(arg2);		\
+	register long _arg3 __asm__ ("a2") = (long)(arg3);		\
+															\
+	__asm__ volatile (										\
+		"ecall\n"											\
+		: "+r"(_arg1)										\
+		: "r"(_arg2), "r"(_arg3),							\
+		  "r"(_num)											\
+		: "memory", "cc"									\
+	);														\
+	_arg1;													\
+})
+
+#ifndef __NR_prctl
+#define __NR_prctl 167
+#endif
+
+#ifndef __NR_map_shadow_stack
+#define __NR_map_shadow_stack 453
+#endif
+
+#define CSR_SSP 0x011
+
+#ifdef __ASSEMBLY__
+#define __ASM_STR(x)    x
+#else
+#define __ASM_STR(x)    #x
+#endif
+
+#define csr_read(csr)									\
+({														\
+	register unsigned long __v;							\
+	__asm__ __volatile__ ("csrr %0, " __ASM_STR(csr)	\
+						  : "=r" (__v) :				\
+						  : "memory");					\
+	__v;												\
+})
+
+#define csr_write(csr, val)								\
+({														\
+	unsigned long __v = (unsigned long) (val);			\
+	__asm__ __volatile__ ("csrw " __ASM_STR(csr) ", %0"	\
+						  : : "rK" (__v)				\
+						  : "memory");					\
+})
+
+#endif
diff --git a/tools/testing/selftests/riscv/cfi/riscv_cfi_test.c b/tools/testing/selftests/riscv/cfi/riscv_cfi_test.c
new file mode 100644
index 000000000000..f22b3f0f24de
--- /dev/null
+++ b/tools/testing/selftests/riscv/cfi/riscv_cfi_test.c
@@ -0,0 +1,82 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include "../../kselftest.h"
+#include <signal.h>
+#include <asm/ucontext.h>
+#include <linux/prctl.h>
+#include "cfi_rv_test.h"
+
+/* do not optimize cfi related test functions */
+#pragma GCC push_options
+#pragma GCC optimize("O0")
+
+void sigsegv_handler(int signum, siginfo_t *si, void *uc)
+{
+	struct ucontext *ctx = (struct ucontext *) uc;
+
+	if (si->si_code == SEGV_CPERR) {
+		printf("Control flow violation happened somewhere\n");
+		printf("pc where violation happened %lx\n", ctx->uc_mcontext.gregs[0]);
+		exit(-1);
+	}
+
+	printf("In sigsegv handler\n");
+	/* all other cases are expected to be of shadow stack write case */
+	exit(CHILD_EXIT_CODE_SSWRITE);
+}
+
+bool register_signal_handler(void)
+{
+	struct sigaction sa = {};
+
+	sa.sa_sigaction = sigsegv_handler;
+	sa.sa_flags = SA_SIGINFO;
+	if (sigaction(SIGSEGV, &sa, NULL)) {
+		printf("registering signal handler for landing pad violation failed\n");
+		return false;
+	}
+
+	return true;
+}
+
+int main(int argc, char *argv[])
+{
+	int ret = 0;
+	unsigned long lpad_status = 0, ss_status = 0;
+
+	ksft_print_header();
+
+	ksft_set_plan(RISCV_CFI_SELFTEST_COUNT);
+
+	ksft_print_msg("starting risc-v tests\n");
+
+	/*
+	 * Landing pad test. Not a lot of kernel changes to support landing
+	 * pad for user mode except lighting up a bit in senvcfg via a prctl
+	 * Enable landing pad through out the execution of test binary
+	 */
+	ret = my_syscall5(__NR_prctl, PR_GET_INDIR_BR_LP_STATUS, &lpad_status, 0, 0, 0);
+	if (ret)
+		ksft_exit_skip("Get landing pad status failed with %d\n", ret);
+
+	if (!(lpad_status & PR_INDIR_BR_LP_ENABLE))
+		ksft_exit_skip("landing pad is not enabled, should be enabled via glibc\n");
+
+	ret = my_syscall5(__NR_prctl, PR_GET_SHADOW_STACK_STATUS, &ss_status, 0, 0, 0);
+	if (ret)
+		ksft_exit_skip("Get shadow stack failed with %d\n", ret);
+
+	if (!(ss_status & PR_SHADOW_STACK_ENABLE))
+		ksft_exit_skip("shadow stack is not enabled, should be enabled via glibc\n");
+
+	if (!register_signal_handler())
+		ksft_exit_skip("registering signal handler for SIGSEGV failed\n");
+
+	ksft_print_msg("landing pad and shadow stack are enabled for binary\n");
+	ksft_print_msg("starting risc-v shadow stack tests\n");
+	execute_shadow_stack_tests();
+
+	ksft_finished();
+}
+
+#pragma GCC pop_options
diff --git a/tools/testing/selftests/riscv/cfi/shadowstack.c b/tools/testing/selftests/riscv/cfi/shadowstack.c
new file mode 100644
index 000000000000..2f65eb970c44
--- /dev/null
+++ b/tools/testing/selftests/riscv/cfi/shadowstack.c
@@ -0,0 +1,362 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include "../../kselftest.h"
+#include <sys/wait.h>
+#include <signal.h>
+#include <fcntl.h>
+#include <asm-generic/unistd.h>
+#include <sys/mman.h>
+#include "shadowstack.h"
+#include "cfi_rv_test.h"
+
+/* do not optimize shadow stack related test functions */
+#pragma GCC push_options
+#pragma GCC optimize("O0")
+
+void zar(void)
+{
+	unsigned long ssp = 0;
+
+	ssp = csr_read(CSR_SSP);
+	printf("inside %s and shadow stack ptr is %lx\n", __func__, ssp);
+}
+
+void bar(void)
+{
+	printf("inside %s\n", __func__);
+	zar();
+}
+
+void foo(void)
+{
+	printf("inside %s\n", __func__);
+	bar();
+}
+
+void zar_child(void)
+{
+	unsigned long ssp = 0;
+
+	ssp = csr_read(CSR_SSP);
+	printf("inside %s and shadow stack ptr is %lx\n", __func__, ssp);
+}
+
+void bar_child(void)
+{
+	printf("inside %s\n", __func__);
+	zar_child();
+}
+
+void foo_child(void)
+{
+	printf("inside %s\n", __func__);
+	bar_child();
+}
+
+typedef void (call_func_ptr)(void);
+/*
+ * call couple of functions to test push pop.
+ */
+int shadow_stack_call_tests(call_func_ptr fn_ptr, bool parent)
+{
+	if (parent)
+		printf("call test for parent\n");
+	else
+		printf("call test for child\n");
+
+	(fn_ptr)();
+
+	return 0;
+}
+
+/* forks a thread, and ensure shadow stacks fork out */
+bool shadow_stack_fork_test(unsigned long test_num, void *ctx)
+{
+	int pid = 0, child_status = 0, parent_pid = 0, ret = 0;
+	unsigned long ss_status = 0;
+
+	printf("exercising shadow stack fork test\n");
+
+	ret = my_syscall5(__NR_prctl, PR_GET_SHADOW_STACK_STATUS, &ss_status, 0, 0, 0);
+	if (ret) {
+		printf("shadow stack get status prctl failed with errorcode %d\n", ret);
+		return false;
+	}
+
+	if (!(ss_status & PR_SHADOW_STACK_ENABLE))
+		ksft_exit_skip("shadow stack is not enabled, should be enabled via glibc\n");
+
+	parent_pid = getpid();
+	pid = fork();
+
+	if (pid) {
+		printf("Parent pid %d and child pid %d\n", parent_pid, pid);
+		shadow_stack_call_tests(&foo, true);
+	} else
+		shadow_stack_call_tests(&foo_child, false);
+
+	if (pid) {
+		printf("waiting on child to finish\n");
+		wait(&child_status);
+	} else {
+		/* exit child gracefully */
+		exit(0);
+	}
+
+	if (pid && WIFSIGNALED(child_status)) {
+		printf("child faulted");
+		return false;
+	}
+
+	return true;
+}
+
+/* exercise `map_shadow_stack`, pivot to it and call some functions to ensure it works */
+#define SHADOW_STACK_ALLOC_SIZE 4096
+bool shadow_stack_map_test(unsigned long test_num, void *ctx)
+{
+	unsigned long shdw_addr;
+	int ret = 0;
+
+	shdw_addr = my_syscall3(__NR_map_shadow_stack, NULL, SHADOW_STACK_ALLOC_SIZE, 0);
+
+	if (((long) shdw_addr) <= 0) {
+		printf("map_shadow_stack failed with error code %d\n", (int) shdw_addr);
+		return false;
+	}
+
+	ret = munmap((void *) shdw_addr, SHADOW_STACK_ALLOC_SIZE);
+
+	if (ret) {
+		printf("munmap failed with error code %d\n", ret);
+		return false;
+	}
+
+	return true;
+}
+
+/*
+ * shadow stack protection tests. map a shadow stack and
+ * validate all memory protections work on it
+ */
+bool shadow_stack_protection_test(unsigned long test_num, void *ctx)
+{
+	unsigned long shdw_addr;
+	unsigned long *write_addr = NULL;
+	int ret = 0, pid = 0, child_status = 0;
+
+	shdw_addr = my_syscall3(__NR_map_shadow_stack, NULL, SHADOW_STACK_ALLOC_SIZE, 0);
+
+	if (((long) shdw_addr) <= 0) {
+		printf("map_shadow_stack failed with error code %d\n", (int) shdw_addr);
+		return false;
+	}
+
+	write_addr = (unsigned long *) shdw_addr;
+	pid = fork();
+
+	/* no child was created, return false */
+	if (pid == -1)
+		return false;
+
+	/*
+	 * try to perform a store from child on shadow stack memory
+	 * it should result in SIGSEGV
+	 */
+	if (!pid) {
+		/* below write must lead to SIGSEGV */
+		*write_addr = 0xdeadbeef;
+	} else {
+		wait(&child_status);
+	}
+
+	/* test fail, if 0xdeadbeef present on shadow stack address */
+	if (*write_addr == 0xdeadbeef) {
+		printf("write suceeded\n");
+		return false;
+	}
+
+	/* if child reached here, then fail */
+	if (!pid) {
+		printf("child reached unreachable state\n");
+		return false;
+	}
+
+	/* if child exited via signal handler but not for write on ss */
+	if (WIFEXITED(child_status) &&
+		WEXITSTATUS(child_status) != CHILD_EXIT_CODE_SSWRITE) {
+		printf("child wasn't signaled for write on shadow stack\n");
+		return false;
+	}
+
+	ret = munmap(write_addr, SHADOW_STACK_ALLOC_SIZE);
+	if (ret) {
+		printf("munmap failed with error code %d\n", ret);
+		return false;
+	}
+
+	return true;
+}
+
+#define SS_MAGIC_WRITE_VAL 0xbeefdead
+
+int gup_tests(int mem_fd, unsigned long *shdw_addr)
+{
+	unsigned long val = 0;
+
+	lseek(mem_fd, (unsigned long)shdw_addr, SEEK_SET);
+	if (read(mem_fd, &val, sizeof(val)) < 0) {
+		printf("reading shadow stack mem via gup failed\n");
+		return 1;
+	}
+
+	val = SS_MAGIC_WRITE_VAL;
+	lseek(mem_fd, (unsigned long)shdw_addr, SEEK_SET);
+	if (write(mem_fd, &val, sizeof(val)) < 0) {
+		printf("writing shadow stack mem via gup failed\n");
+		return 1;
+	}
+
+	if (*shdw_addr != SS_MAGIC_WRITE_VAL) {
+		printf("GUP write to shadow stack memory didn't happen\n");
+		return 1;
+	}
+
+	return 0;
+}
+
+bool shadow_stack_gup_tests(unsigned long test_num, void *ctx)
+{
+	unsigned long shdw_addr = 0;
+	unsigned long *write_addr = NULL;
+	int fd = 0;
+	bool ret = false;
+
+	shdw_addr = my_syscall3(__NR_map_shadow_stack, NULL, SHADOW_STACK_ALLOC_SIZE, 0);
+
+	if (((long) shdw_addr) <= 0) {
+		printf("map_shadow_stack failed with error code %d\n", (int) shdw_addr);
+		return false;
+	}
+
+	write_addr = (unsigned long *) shdw_addr;
+
+	fd = open("/proc/self/mem", O_RDWR);
+	if (fd == -1)
+		return false;
+
+	if (gup_tests(fd, write_addr)) {
+		printf("gup tests failed\n");
+		goto out;
+	}
+
+	ret = true;
+out:
+	if (shdw_addr && munmap(write_addr, SHADOW_STACK_ALLOC_SIZE)) {
+		printf("munmap failed with error code %d\n", ret);
+		ret = false;
+	}
+
+	return ret;
+}
+
+volatile bool break_loop;
+
+void sigusr1_handler(int signo)
+{
+	printf("In sigusr1 handler\n");
+	break_loop = true;
+}
+
+bool sigusr1_signal_test(void)
+{
+	struct sigaction sa = {};
+
+	sa.sa_handler = sigusr1_handler;
+	sa.sa_flags = 0;
+	sigemptyset(&sa.sa_mask);
+	if (sigaction(SIGUSR1, &sa, NULL)) {
+		printf("registering signal handler for SIGUSR1 failed\n");
+		return false;
+	}
+
+	return true;
+}
+/*
+ * shadow stack signal test. shadow stack must be enabled.
+ * register a signal, fork another thread which is waiting
+ * on signal. Send a signal from parent to child, verify
+ * that signal was received by child. If not test fails
+ */
+bool shadow_stack_signal_test(unsigned long test_num, void *ctx)
+{
+	int pid = 0, child_status = 0, ret = 0;
+	unsigned long ss_status = 0;
+
+	ret = my_syscall5(__NR_prctl, PR_GET_SHADOW_STACK_STATUS, &ss_status, 0, 0, 0);
+	if (ret) {
+		printf("shadow stack get status prctl failed with errorcode %d\n", ret);
+		return false;
+	}
+
+	if (!(ss_status & PR_SHADOW_STACK_ENABLE))
+		ksft_exit_skip("shadow stack is not enabled, should be enabled via glibc\n");
+
+	/* this should be caught by signal handler and do an exit */
+	if (!sigusr1_signal_test()) {
+		printf("registering sigusr1 handler failed\n");
+		exit(-1);
+	}
+
+	pid = fork();
+
+	if (pid == -1) {
+		printf("signal test: fork failed\n");
+		goto out;
+	}
+
+	if (pid == 0) {
+		while (!break_loop)
+			sleep(1);
+
+		exit(11);
+		/* child shouldn't go beyond here */
+	}
+
+	/* send SIGUSR1 to child */
+	kill(pid, SIGUSR1);
+	wait(&child_status);
+
+out:
+
+	return (WIFEXITED(child_status) &&
+			WEXITSTATUS(child_status) == 11);
+}
+
+int execute_shadow_stack_tests(void)
+{
+	int ret = 0;
+	unsigned long test_count = 0;
+	unsigned long shstk_status = 0;
+
+	printf("Executing RISC-V shadow stack self tests\n");
+
+	ret = my_syscall5(__NR_prctl, PR_GET_SHADOW_STACK_STATUS, &shstk_status, 0, 0, 0);
+
+	if (ret != 0)
+		ksft_exit_skip("Get shadow stack status failed with %d\n", ret);
+
+	/*
+	 * If we are here that means get shadow stack status succeeded and
+	 * thus shadow stack support is baked in the kernel.
+	 */
+	while (test_count < ARRAY_SIZE(shstk_tests)) {
+		ksft_test_result((*shstk_tests[test_count].t_func)(test_count, NULL),
+						 shstk_tests[test_count].name);
+		test_count++;
+	}
+
+	return 0;
+}
+
+#pragma GCC pop_options
diff --git a/tools/testing/selftests/riscv/cfi/shadowstack.h b/tools/testing/selftests/riscv/cfi/shadowstack.h
new file mode 100644
index 000000000000..b43e74136a26
--- /dev/null
+++ b/tools/testing/selftests/riscv/cfi/shadowstack.h
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef SELFTEST_SHADOWSTACK_TEST_H
+#define SELFTEST_SHADOWSTACK_TEST_H
+#include <stddef.h>
+#include <linux/prctl.h>
+
+/*
+ * a cfi test returns true for success or false for fail
+ * takes a number for test number to index into array and void pointer.
+ */
+typedef bool (*shstk_test_func)(unsigned long test_num, void *);
+
+struct shadow_stack_tests {
+	char *name;
+	shstk_test_func t_func;
+};
+
+bool shadow_stack_fork_test(unsigned long test_num, void *ctx);
+bool shadow_stack_map_test(unsigned long test_num, void *ctx);
+bool shadow_stack_protection_test(unsigned long test_num, void *ctx);
+bool shadow_stack_gup_tests(unsigned long test_num, void *ctx);
+bool shadow_stack_signal_test(unsigned long test_num, void *ctx);
+
+static struct shadow_stack_tests shstk_tests[] = {
+	{ "shstk fork test\n", shadow_stack_fork_test },
+	{ "map shadow stack syscall\n", shadow_stack_map_test },
+	{ "shadow stack gup tests\n", shadow_stack_gup_tests },
+	{ "shadow stack signal tests\n", shadow_stack_signal_test},
+	{ "memory protections of shadow stack memory\n", shadow_stack_protection_test }
+};
+
+#define RISCV_SHADOW_STACK_TESTS ARRAY_SIZE(shstk_tests)
+
+int execute_shadow_stack_tests(void);
+
+#endif
-- 
2.43.2


^ permalink raw reply related

* [PATCH v3 28/29] riscv: Documentation for shadow stack on riscv
From: Deepak Gupta @ 2024-04-03 23:35 UTC (permalink / raw)
  To: paul.walmsley, rick.p.edgecombe, broonie, Szabolcs.Nagy,
	kito.cheng, keescook, ajones, conor.dooley, cleger, atishp, alex,
	bjorn, alexghiti, samuel.holland, conor
  Cc: linux-doc, linux-riscv, linux-kernel, devicetree, linux-mm,
	linux-arch, linux-kselftest, corbet, palmer, aou, robh+dt,
	krzysztof.kozlowski+dt, oleg, akpm, arnd, ebiederm, Liam.Howlett,
	vbabka, lstoakes, shuah, brauner, debug, andy.chiu, jerry.shih,
	hankuan.chen, greentime.hu, evan, xiao.w.wang, charlie, apatel,
	mchitale, dbarboza, sameo, shikemeng, willy, vincent.chen, guoren,
	samitolvanen, songshuaishuai, gerg, heiko, bhe, jeeheng.sia, cyy,
	maskray, ancientmodern4, mathis.salmen, cuiyunhui, bgray, mpe,
	baruch, alx, david, catalin.marinas, revest, josh, shr, deller,
	omosnace, ojeda, jhubbard
In-Reply-To: <20240403234054.2020347-1-debug@rivosinc.com>

Adding documentation on shadow stack for user mode on riscv and kernel
interfaces exposed so that user tasks can enable it.

Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
 Documentation/arch/riscv/zicfiss.rst | 169 +++++++++++++++++++++++++++
 1 file changed, 169 insertions(+)
 create mode 100644 Documentation/arch/riscv/zicfiss.rst

diff --git a/Documentation/arch/riscv/zicfiss.rst b/Documentation/arch/riscv/zicfiss.rst
new file mode 100644
index 000000000000..f133b6af9c15
--- /dev/null
+++ b/Documentation/arch/riscv/zicfiss.rst
@@ -0,0 +1,169 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+:Author: Deepak Gupta <debug@rivosinc.com>
+:Date:   12 January 2024
+
+=========================================================
+Shadow stack to protect function returns on RISC-V Linux
+=========================================================
+
+This document briefly describes the interface provided to userspace by Linux
+to enable shadow stack for user mode applications on RISV-V
+
+1. Feature Overview
+--------------------
+
+Memory corruption issues usually result in to crashes, however when in hands of
+an adversary and if used creatively can result into variety security issues.
+
+One of those security issues can be code re-use attacks on program where adversary
+can use corrupt return addresses present on stack and chain them together to perform
+return oriented programming (ROP) and thus compromising control flow integrity (CFI)
+of the program.
+
+Return addresses live on stack and thus in read-write memory and thus are
+susceptible to corruption and allows an adversary to reach any program counter
+(PC) in address space. On RISC-V `zicfiss` extension provides an alternate stack
+`shadow stack` on which return addresses can be safely placed in prolog of the
+function and retrieved in epilog. `zicfiss` extension makes following changes
+
+	- PTE encodings for shadow stack virtual memory
+	  An earlier reserved encoding in first stage translation i.e.
+	  PTE.R=0, PTE.W=1, PTE.X=0  becomes PTE encoding for shadow stack pages.
+
+	- `sspush x1/x5` instruction pushes (stores) `x1/x5` to shadow stack.
+
+	- `sspopchk x1/x5` instruction pops (loads) from shadow stack and compares
+	  with `x1/x5` and if un-equal, CPU raises `software check exception` with
+	  `*tval = 3`
+
+Compiler toolchain makes sure that function prologs have `sspush x1/x5` to save return
+address on shadow stack in addition to regular stack. Similarly function epilogs have
+`ld x5, offset(x2)`; `sspopchk x5` to ensure that popped value from regular stack
+matches with popped value from shadow stack.
+
+2. Shadow stack protections and linux memory manager
+-----------------------------------------------------
+
+As mentioned earlier, shadow stack get new page table encodings and thus have some
+special properties assigned to them and instructions that operate on them as below
+
+	- Regular stores to shadow stack memory raises access store faults.
+	  This way shadow stack memory is protected from stray inadvertant
+	  writes
+
+	- Regular loads to shadow stack memory are allowed.
+	  This allows stack trace utilities or backtrace functions to read
+	  true callstack (not tampered)
+
+	- Only shadow stack instructions can generate shadow stack load or
+	  shadow stack store.
+
+	- Shadow stack load / shadow stack store on read-only memory raises
+	  AMO/store page fault. Thus both `sspush x1/x5` and `sspopchk x1/x5`
+	  will raise AMO/store page fault. This simplies COW handling in kernel
+	  During fork, kernel can convert shadow stack pages into read-only
+	  memory (as it does for regular read-write memory) and as soon as
+	  subsequent `sspush` or `sspopchk` in userspace is encountered, then
+	  kernel can perform COW.
+
+	- Shadow stack load / shadow stack store on read-write, read-write-
+	  execute memory raises an access fault. This is a fatal condition
+	  because shadow stack should never be operating on read-write, read-
+	  write-execute memory.
+
+3. ELF and psABI
+-----------------
+
+Toolchain sets up `GNU_PROPERTY_RISCV_FEATURE_1_BCFI` for property
+`GNU_PROPERTY_RISCV_FEATURE_1_AND` in notes section of the object file.
+
+4. Linux enabling
+------------------
+
+User space programs can have multiple shared objects loaded in its address space
+and it's a difficult task to make sure all the dependencies have been compiled
+with support of shadow stack. Thus it's left to dynamic loader to enable
+shadow stack for the program.
+
+5. prctl() enabling
+--------------------
+
+`PR_SET_SHADOW_STACK_STATUS` / `PR_GET_SHADOW_STACK_STATUS` /
+`PR_LOCK_SHADOW_STACK_STATUS` are three prctls added to manage shadow stack
+enabling for tasks. prctls are arch agnostic and returns -EINVAL on other arches.
+
+`PR_SET_SHADOW_STACK_STATUS`: If arg1 `PR_SHADOW_STACK_ENABLE` and if CPU supports
+`zicfiss` then kernel will enable shadow stack for the task. Dynamic loader can
+issue this `prctl` once it has determined that all the objects loaded in address
+space have support for shadow stack. Additionally if there is a `dlopen` to an
+object which wasn't compiled with `zicfiss`, dynamic loader can issue this prctl
+with arg1 set to 0 (i.e. `PR_SHADOW_STACK_ENABLE` being clear)
+
+`PR_GET_SHADOW_STACK_STATUS`: Returns current status of indirect branch tracking.
+If enabled it'll return `PR_SHADOW_STACK_ENABLE`
+
+`PR_LOCK_SHADOW_STACK_STATUS`: Locks current status of shadow stack enabling on the
+task. User space may want to run with strict security posture and wouldn't want
+loading of objects without `zicfiss` support in it and thus would want to disallow
+disabling of shadow stack on current task. In that case user space can use this prctl
+to lock current settings.
+
+5. violations related to returns with shadow stack enabled
+-----------------------------------------------------------
+
+Pertaining to shadow stack, CPU raises software check exception in following
+condition
+
+	- On execution of `sspopchk x1/x5`, x1/x5 didn't match top of shadow stack.
+	  If mismatch happens then cpu does `*tval = 3` and raise software check
+	  exception
+
+Linux kernel will treat this as `SIGSEV`` with code = `SEGV_CPERR` and follow
+normal course of signal delivery.
+
+6. Shadow stack tokens
+-----------------------
+Regular stores on shadow stacks are not allowed and thus can't be tampered with via
+arbitrary stray writes due to bugs. Method of pivoting / switching to shadow stack
+is simply writing to csr `CSR_SSP` changes active shadow stack. This can be problematic
+because usually value to be written to `CSR_SSP` will be loaded somewhere in writeable
+memory and thus allows an adversary to corruption bug in software to pivot to an any
+address in shadow stack range. Shadow stack tokens can help mitigate this problem by
+making sure that:
+
+ - When software is switching away from a shadow stack, shadow stack pointer should be
+   saved on shadow stack itself and call it `shadow stack token`
+
+ - When software is switching to a shadow stack, it should read the `shadow stack token`
+   from shadow stack pointer and verify that `shadow stack token` itself is pointer to
+   shadow stack itself.
+
+ - Once the token verification is done, software can perform the write to `CSR_SSP` to
+   switch shadow stack.
+
+Here software can be user mode task runtime itself which is managing various contexts
+as part of single thread. Software can be kernel as well when kernel has to deliver a
+signal to user task and must save shadow stack pointer. Kernel can perform similar
+procedure by saving a token on user shadow stack itself. This way whenever sigreturn
+happens, kernel can read the token and verify the token and then switch to shadow stack.
+Using this mechanism, kernel helps user task so that any corruption issue in user task
+is not exploited by adversary by arbitrarily using `sigreturn`. Adversary will have to
+make sure that there is a `shadow stack token` in addition to invoking `sigreturn`
+
+7. Signal shadow stack
+-----------------------
+Following structure has been added to sigcontext for RISC-V. `rsvd` field has been kept
+in case we need some extra information in future for landing pads / indirect branch
+tracking. It has been kept today in order to allow backward compatibility in future.
+
+struct __sc_riscv_cfi_state {
+	unsigned long ss_ptr;
+	unsigned long rsvd;
+};
+
+As part of signal delivery, shadow stack token is saved on current shadow stack itself and
+updated pointer is saved away in `ss_ptr` field in `__sc_riscv_cfi_state` under `sigcontext`
+Existing shadow stack allocation is used for signal delivery. During `sigreturn`, kernel will
+obtain `ss_ptr` from `sigcontext` and verify the saved token on shadow stack itself and switch
+shadow stack.
-- 
2.43.2


^ permalink raw reply related

* [PATCH v3 27/29] riscv: Documentation for landing pad / indirect branch tracking
From: Deepak Gupta @ 2024-04-03 23:35 UTC (permalink / raw)
  To: paul.walmsley, rick.p.edgecombe, broonie, Szabolcs.Nagy,
	kito.cheng, keescook, ajones, conor.dooley, cleger, atishp, alex,
	bjorn, alexghiti, samuel.holland, conor
  Cc: linux-doc, linux-riscv, linux-kernel, devicetree, linux-mm,
	linux-arch, linux-kselftest, corbet, palmer, aou, robh+dt,
	krzysztof.kozlowski+dt, oleg, akpm, arnd, ebiederm, Liam.Howlett,
	vbabka, lstoakes, shuah, brauner, debug, andy.chiu, jerry.shih,
	hankuan.chen, greentime.hu, evan, xiao.w.wang, charlie, apatel,
	mchitale, dbarboza, sameo, shikemeng, willy, vincent.chen, guoren,
	samitolvanen, songshuaishuai, gerg, heiko, bhe, jeeheng.sia, cyy,
	maskray, ancientmodern4, mathis.salmen, cuiyunhui, bgray, mpe,
	baruch, alx, david, catalin.marinas, revest, josh, shr, deller,
	omosnace, ojeda, jhubbard
In-Reply-To: <20240403234054.2020347-1-debug@rivosinc.com>

Adding documentation on landing pad aka indirect branch tracking on riscv
and kernel interfaces exposed so that user tasks can enable it.

Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
 Documentation/arch/riscv/zicfilp.rst | 104 +++++++++++++++++++++++++++
 1 file changed, 104 insertions(+)
 create mode 100644 Documentation/arch/riscv/zicfilp.rst

diff --git a/Documentation/arch/riscv/zicfilp.rst b/Documentation/arch/riscv/zicfilp.rst
new file mode 100644
index 000000000000..3007c81f0465
--- /dev/null
+++ b/Documentation/arch/riscv/zicfilp.rst
@@ -0,0 +1,104 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+:Author: Deepak Gupta <debug@rivosinc.com>
+:Date:   12 January 2024
+
+====================================================
+Tracking indirect control transfers on RISC-V Linux
+====================================================
+
+This document briefly describes the interface provided to userspace by Linux
+to enable indirect branch tracking for user mode applications on RISV-V
+
+1. Feature Overview
+--------------------
+
+Memory corruption issues usually result in to crashes, however when in hands of
+an adversary and if used creatively can result into variety security issues.
+
+One of those security issues can be code re-use attacks on program where adversary
+can use corrupt function pointers and chain them together to perform jump oriented
+programming (JOP) or call oriented programming (COP) and thus compromising control
+flow integrity (CFI) of the program.
+
+Function pointers live in read-write memory and thus are susceptible to corruption
+and allows an adversary to reach any program counter (PC) in address space. On
+RISC-V zicfilp extension enforces a restriction on such indirect control transfers
+
+	- indirect control transfers must land on a landing pad instruction `lpad`.
+	  There are two exception to this rule
+		- rs1 = x1 or rs1 = x5, i.e. a return from a function and returns are
+		  protected using shadow stack (see zicfiss.rst)
+
+		- rs1 = x7. On RISC-V compiler usually does below to reach function
+		  which is beyond the offset possible J-type instruction.
+
+			"auipc x7, <imm>"
+			"jalr (x7)"
+
+		  Such form of indirect control transfer are still immutable and don't rely
+		  on memory and thus rs1=x7 is exempted from tracking and considered software
+		  guarded jumps.
+
+`lpad` instruction is pseudo of `auipc rd, <imm_20bit>` and is a HINT nop. `lpad`
+instruction must be aligned on 4 byte boundary and compares 20 bit immediate with x7.
+If `imm_20bit` == 0, CPU don't perform any comparision with x7. If `imm_20bit` != 0,
+then `imm_20bit` must match x7 else CPU will raise `software check exception`
+(cause=18)with `*tval = 2`.
+
+Compiler can generate a hash over function signatures and setup them (truncated
+to 20bit) in x7 at callsites and function proglogs can have `lpad` with same
+function hash. This further reduces number of program counters a call site can
+reach.
+
+2. ELF and psABI
+-----------------
+
+Toolchain sets up `GNU_PROPERTY_RISCV_FEATURE_1_FCFI` for property
+`GNU_PROPERTY_RISCV_FEATURE_1_AND` in notes section of the object file.
+
+3. Linux enabling
+------------------
+
+User space programs can have multiple shared objects loaded in its address space
+and it's a difficult task to make sure all the dependencies have been compiled
+with support of indirect branch. Thus it's left to dynamic loader to enable
+indirect branch tracking for the program.
+
+4. prctl() enabling
+--------------------
+
+`PR_SET_INDIR_BR_LP_STATUS` / `PR_GET_INDIR_BR_LP_STATUS` /
+`PR_LOCK_INDIR_BR_LP_STATUS` are three prctls added to manage indirect branch
+tracking. prctls are arch agnostic and returns -EINVAL on other arches.
+
+`PR_SET_INDIR_BR_LP_STATUS`: If arg1 `PR_INDIR_BR_LP_ENABLE` and if CPU supports
+`zicfilp` then kernel will enabled indirect branch tracking for the task.
+Dynamic loader can issue this `prctl` once it has determined that all the objects
+loaded in address space support indirect branch tracking. Additionally if there is
+a `dlopen` to an object which wasn't compiled with `zicfilp`, dynamic loader can
+issue this prctl with arg1 set to 0 (i.e. `PR_INDIR_BR_LP_ENABLE` being clear)
+
+`PR_GET_INDIR_BR_LP_STATUS`: Returns current status of indirect branch tracking.
+If enabled it'll return `PR_INDIR_BR_LP_ENABLE`
+
+`PR_LOCK_INDIR_BR_LP_STATUS`: Locks current status of indirect branch tracking on
+the task. User space may want to run with strict security posture and wouldn't want
+loading of objects without `zicfilp` support in it and thus would want to disallow
+disabling of indirect branch tracking. In that case user space can use this prctl
+to lock current settings.
+
+5. violations related to indirect branch tracking
+--------------------------------------------------
+
+Pertaining to indirect branch tracking, CPU raises software check exception in
+following conditions
+	- missing `lpad` after indirect call / jmp
+	- `lpad` not on 4 byte boundary
+	- `imm_20bit` embedded in `lpad` instruction doesn't match with `x7`
+
+In all 3 cases, `*tval = 2` is captured and software check exception is raised
+(cause=18)
+
+Linux kernel will treat this as `SIGSEV`` with code = `SEGV_CPERR` and follow
+normal course of signal delivery.
-- 
2.43.2


^ permalink raw reply related

* [PATCH v3 26/29] riscv: create a config for shadow stack and landing pad instr support
From: Deepak Gupta @ 2024-04-03 23:35 UTC (permalink / raw)
  To: paul.walmsley, rick.p.edgecombe, broonie, Szabolcs.Nagy,
	kito.cheng, keescook, ajones, conor.dooley, cleger, atishp, alex,
	bjorn, alexghiti, samuel.holland, conor
  Cc: linux-doc, linux-riscv, linux-kernel, devicetree, linux-mm,
	linux-arch, linux-kselftest, corbet, palmer, aou, robh+dt,
	krzysztof.kozlowski+dt, oleg, akpm, arnd, ebiederm, Liam.Howlett,
	vbabka, lstoakes, shuah, brauner, debug, andy.chiu, jerry.shih,
	hankuan.chen, greentime.hu, evan, xiao.w.wang, charlie, apatel,
	mchitale, dbarboza, sameo, shikemeng, willy, vincent.chen, guoren,
	samitolvanen, songshuaishuai, gerg, heiko, bhe, jeeheng.sia, cyy,
	maskray, ancientmodern4, mathis.salmen, cuiyunhui, bgray, mpe,
	baruch, alx, david, catalin.marinas, revest, josh, shr, deller,
	omosnace, ojeda, jhubbard
In-Reply-To: <20240403234054.2020347-1-debug@rivosinc.com>

This patch creates a config for shadow stack support and landing pad instr
support. Shadow stack support and landing instr support can be enabled by
selecting `CONFIG_RISCV_USER_CFI`. Selecting `CONFIG_RISCV_USER_CFI` wires
up path to enumerate CPU support and if cpu support exists, kernel will
support cpu assisted user mode cfi.

Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
 arch/riscv/Kconfig | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 7e0b2bcc388f..d6f1303ef660 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -203,6 +203,24 @@ config ARCH_HAS_BROKEN_DWARF5
 	# https://github.com/llvm/llvm-project/commit/7ffabb61a5569444b5ac9322e22e5471cc5e4a77
 	depends on LD_IS_LLD && LLD_VERSION < 180000
 
+config RISCV_USER_CFI
+	def_bool y
+	bool "riscv userspace control flow integrity"
+	depends on 64BIT && $(cc-option,-mabi=lp64 -march=rv64ima_zicfiss)
+	depends on RISCV_ALTERNATIVE
+	select ARCH_USES_HIGH_VMA_FLAGS
+	help
+	  Provides CPU assisted control flow integrity to userspace tasks.
+	  Control flow integrity is provided by implementing shadow stack for
+	  backward edge and indirect branch tracking for forward edge in program.
+	  Shadow stack protection is a hardware feature that detects function
+	  return address corruption. This helps mitigate ROP attacks.
+	  Indirect branch tracking enforces that all indirect branches must land
+	  on a landing pad instruction else CPU will fault. This mitigates against
+	  JOP / COP attacks. Applications must be enabled to use it, and old user-
+	  space does not get protection "for free".
+	  default y
+
 config ARCH_MMAP_RND_BITS_MIN
 	default 18 if 64BIT
 	default 8
-- 
2.43.2


^ permalink raw reply related

* [PATCH v3 25/29] riscv/hwprobe: zicfilp / zicfiss enumeration in hwprobe
From: Deepak Gupta @ 2024-04-03 23:35 UTC (permalink / raw)
  To: paul.walmsley, rick.p.edgecombe, broonie, Szabolcs.Nagy,
	kito.cheng, keescook, ajones, conor.dooley, cleger, atishp, alex,
	bjorn, alexghiti, samuel.holland, conor
  Cc: linux-doc, linux-riscv, linux-kernel, devicetree, linux-mm,
	linux-arch, linux-kselftest, corbet, palmer, aou, robh+dt,
	krzysztof.kozlowski+dt, oleg, akpm, arnd, ebiederm, Liam.Howlett,
	vbabka, lstoakes, shuah, brauner, debug, andy.chiu, jerry.shih,
	hankuan.chen, greentime.hu, evan, xiao.w.wang, charlie, apatel,
	mchitale, dbarboza, sameo, shikemeng, willy, vincent.chen, guoren,
	samitolvanen, songshuaishuai, gerg, heiko, bhe, jeeheng.sia, cyy,
	maskray, ancientmodern4, mathis.salmen, cuiyunhui, bgray, mpe,
	baruch, alx, david, catalin.marinas, revest, josh, shr, deller,
	omosnace, ojeda, jhubbard
In-Reply-To: <20240403234054.2020347-1-debug@rivosinc.com>

Adding enumeration of zicfilp and zicfiss extensions in hwprobe syscall.

Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
 arch/riscv/include/uapi/asm/hwprobe.h | 2 ++
 arch/riscv/kernel/sys_hwprobe.c       | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/arch/riscv/include/uapi/asm/hwprobe.h b/arch/riscv/include/uapi/asm/hwprobe.h
index 9f2a8e3ff204..4ffc6de1eed7 100644
--- a/arch/riscv/include/uapi/asm/hwprobe.h
+++ b/arch/riscv/include/uapi/asm/hwprobe.h
@@ -59,6 +59,8 @@ struct riscv_hwprobe {
 #define		RISCV_HWPROBE_EXT_ZTSO		(1ULL << 33)
 #define		RISCV_HWPROBE_EXT_ZACAS		(1ULL << 34)
 #define		RISCV_HWPROBE_EXT_ZICOND	(1ULL << 35)
+#define		RISCV_HWPROBE_EXT_ZICFILP	(1ULL << 36)
+#define		RISCV_HWPROBE_EXT_ZICFISS	(1ULL << 37)
 #define RISCV_HWPROBE_KEY_CPUPERF_0	5
 #define		RISCV_HWPROBE_MISALIGNED_UNKNOWN	(0 << 0)
 #define		RISCV_HWPROBE_MISALIGNED_EMULATED	(1 << 0)
diff --git a/arch/riscv/kernel/sys_hwprobe.c b/arch/riscv/kernel/sys_hwprobe.c
index a7c56b41efd2..ddc7a9612a90 100644
--- a/arch/riscv/kernel/sys_hwprobe.c
+++ b/arch/riscv/kernel/sys_hwprobe.c
@@ -111,6 +111,8 @@ static void hwprobe_isa_ext0(struct riscv_hwprobe *pair,
 		EXT_KEY(ZTSO);
 		EXT_KEY(ZACAS);
 		EXT_KEY(ZICOND);
+		EXT_KEY(ZICFILP);
+		EXT_KEY(ZICFISS);
 
 		if (has_vector()) {
 			EXT_KEY(ZVBB);
-- 
2.43.2


^ permalink raw reply related

* [PATCH v3 24/29] riscv/ptrace: riscv cfi status and state via ptrace and in core files
From: Deepak Gupta @ 2024-04-03 23:35 UTC (permalink / raw)
  To: paul.walmsley, rick.p.edgecombe, broonie, Szabolcs.Nagy,
	kito.cheng, keescook, ajones, conor.dooley, cleger, atishp, alex,
	bjorn, alexghiti, samuel.holland, conor
  Cc: linux-doc, linux-riscv, linux-kernel, devicetree, linux-mm,
	linux-arch, linux-kselftest, corbet, palmer, aou, robh+dt,
	krzysztof.kozlowski+dt, oleg, akpm, arnd, ebiederm, Liam.Howlett,
	vbabka, lstoakes, shuah, brauner, debug, andy.chiu, jerry.shih,
	hankuan.chen, greentime.hu, evan, xiao.w.wang, charlie, apatel,
	mchitale, dbarboza, sameo, shikemeng, willy, vincent.chen, guoren,
	samitolvanen, songshuaishuai, gerg, heiko, bhe, jeeheng.sia, cyy,
	maskray, ancientmodern4, mathis.salmen, cuiyunhui, bgray, mpe,
	baruch, alx, david, catalin.marinas, revest, josh, shr, deller,
	omosnace, ojeda, jhubbard
In-Reply-To: <20240403234054.2020347-1-debug@rivosinc.com>

Expose a new register type NT_RISCV_USER_CFI for risc-v cfi status and
state. Intentionally both landing pad and shadow stack status and state
are rolled into cfi state. Creating two different NT_RISCV_USER_XXX would
not be useful and wastage of a note type. Enabling or disabling of feature
is not allowed via ptrace set interface. However setting `elp` state or
setting shadow stack pointer are allowed via ptrace set interface. It is
expected `gdb` might have use to fixup `elp` state or `shadow stack`
pointer.

Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
 arch/riscv/include/uapi/asm/ptrace.h | 18 ++++++
 arch/riscv/kernel/ptrace.c           | 83 ++++++++++++++++++++++++++++
 include/uapi/linux/elf.h             |  1 +
 3 files changed, 102 insertions(+)

diff --git a/arch/riscv/include/uapi/asm/ptrace.h b/arch/riscv/include/uapi/asm/ptrace.h
index a38268b19c3d..512be06a8661 100644
--- a/arch/riscv/include/uapi/asm/ptrace.h
+++ b/arch/riscv/include/uapi/asm/ptrace.h
@@ -127,6 +127,24 @@ struct __riscv_v_regset_state {
  */
 #define RISCV_MAX_VLENB (8192)
 
+struct __cfi_status {
+	/* indirect branch tracking state */
+	__u64 lp_en : 1;
+	__u64 lp_lock : 1;
+	__u64 elp_state : 1;
+
+	/* shadow stack status */
+	__u64 shstk_en : 1;
+	__u64 shstk_lock : 1;
+
+	__u64 rsvd : sizeof(__u64) - 5;
+};
+
+struct user_cfi_state {
+	struct __cfi_status	cfi_status;
+	__u64 shstk_ptr;
+};
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _UAPI_ASM_RISCV_PTRACE_H */
diff --git a/arch/riscv/kernel/ptrace.c b/arch/riscv/kernel/ptrace.c
index e8515aa9d80b..33d4b32cc6a7 100644
--- a/arch/riscv/kernel/ptrace.c
+++ b/arch/riscv/kernel/ptrace.c
@@ -19,6 +19,7 @@
 #include <linux/regset.h>
 #include <linux/sched.h>
 #include <linux/sched/task_stack.h>
+#include <asm/usercfi.h>
 
 enum riscv_regset {
 	REGSET_X,
@@ -28,6 +29,9 @@ enum riscv_regset {
 #ifdef CONFIG_RISCV_ISA_V
 	REGSET_V,
 #endif
+#ifdef CONFIG_RISCV_USER_CFI
+	REGSET_CFI,
+#endif
 };
 
 static int riscv_gpr_get(struct task_struct *target,
@@ -152,6 +156,75 @@ static int riscv_vr_set(struct task_struct *target,
 }
 #endif
 
+#ifdef CONFIG_RISCV_USER_CFI
+static int riscv_cfi_get(struct task_struct *target,
+			const struct user_regset *regset,
+			struct membuf to)
+{
+	struct user_cfi_state user_cfi;
+	struct pt_regs *regs;
+
+	regs = task_pt_regs(target);
+
+	user_cfi.cfi_status.lp_en = is_indir_lp_enabled(target);
+	user_cfi.cfi_status.lp_lock = is_indir_lp_locked(target);
+	user_cfi.cfi_status.elp_state = (regs->status & SR_ELP);
+
+	user_cfi.cfi_status.shstk_en = is_shstk_enabled(target);
+	user_cfi.cfi_status.shstk_lock = is_shstk_locked(target);
+	user_cfi.shstk_ptr = get_active_shstk(target);
+
+	return membuf_write(&to, &user_cfi, sizeof(user_cfi));
+}
+
+/*
+ * Does it make sense to allowing enable / disable of cfi via ptrace?
+ * Not allowing enable / disable / locking control via ptrace for now.
+ * Setting shadow stack pointer is allowed. GDB might use it to unwind or
+ * some other fixup. Similarly gdb might want to suppress elp and may want
+ * to reset elp state.
+ */
+static int riscv_cfi_set(struct task_struct *target,
+			const struct user_regset *regset,
+			unsigned int pos, unsigned int count,
+			const void *kbuf, const void __user *ubuf)
+{
+	int ret;
+	struct user_cfi_state user_cfi;
+	struct pt_regs *regs;
+
+	regs = task_pt_regs(target);
+
+	ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &user_cfi, 0, -1);
+	if (ret)
+		return ret;
+
+	/*
+	 * Not allowing enabling or locking shadow stack or landing pad
+	 * There is no disabling of shadow stack or landing pad via ptrace
+	 * rsvd field should be set to zero so that if those fields are needed in future
+	 */
+	if (user_cfi.cfi_status.lp_en || user_cfi.cfi_status.lp_lock ||
+		user_cfi.cfi_status.shstk_en || user_cfi.cfi_status.shstk_lock ||
+		!user_cfi.cfi_status.rsvd)
+		return -EINVAL;
+
+	/* If lpad is enabled on target and ptrace requests to set / clear elp, do that */
+	if (is_indir_lp_enabled(target)) {
+		if (user_cfi.cfi_status.elp_state) /* set elp state */
+			regs->status |= SR_ELP;
+		else
+			regs->status &= ~SR_ELP; /* clear elp state */
+	}
+
+	/* If shadow stack enabled on target, set new shadow stack pointer */
+	if (is_shstk_enabled(target))
+		set_active_shstk(target, user_cfi.shstk_ptr);
+
+	return 0;
+}
+#endif
+
 static const struct user_regset riscv_user_regset[] = {
 	[REGSET_X] = {
 		.core_note_type = NT_PRSTATUS,
@@ -182,6 +255,16 @@ static const struct user_regset riscv_user_regset[] = {
 		.set = riscv_vr_set,
 	},
 #endif
+#ifdef CONFIG_RISCV_USER_CFI
+	[REGSET_CFI] = {
+		.core_note_type = NT_RISCV_USER_CFI,
+		.align = sizeof(__u64),
+		.n = sizeof(struct user_cfi_state) / sizeof(__u64),
+		.size = sizeof(__u64),
+		.regset_get = riscv_cfi_get,
+		.set = riscv_cfi_set,
+	}
+#endif
 };
 
 static const struct user_regset_view riscv_user_native_view = {
diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index 9417309b7230..f60b2de66b1c 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -447,6 +447,7 @@ typedef struct elf64_shdr {
 #define NT_MIPS_MSA	0x802		/* MIPS SIMD registers */
 #define NT_RISCV_CSR	0x900		/* RISC-V Control and Status Registers */
 #define NT_RISCV_VECTOR	0x901		/* RISC-V vector registers */
+#define NT_RISCV_USER_CFI	0x902		/* RISC-V shadow stack state */
 #define NT_LOONGARCH_CPUCFG	0xa00	/* LoongArch CPU config registers */
 #define NT_LOONGARCH_CSR	0xa01	/* LoongArch control and status registers */
 #define NT_LOONGARCH_LSX	0xa02	/* LoongArch Loongson SIMD Extension registers */
-- 
2.43.2


^ permalink raw reply related

* [PATCH v3 23/29] riscv signal: Save and restore of shadow stack for signal
From: Deepak Gupta @ 2024-04-03 23:35 UTC (permalink / raw)
  To: paul.walmsley, rick.p.edgecombe, broonie, Szabolcs.Nagy,
	kito.cheng, keescook, ajones, conor.dooley, cleger, atishp, alex,
	bjorn, alexghiti, samuel.holland, conor
  Cc: linux-doc, linux-riscv, linux-kernel, devicetree, linux-mm,
	linux-arch, linux-kselftest, corbet, palmer, aou, robh+dt,
	krzysztof.kozlowski+dt, oleg, akpm, arnd, ebiederm, Liam.Howlett,
	vbabka, lstoakes, shuah, brauner, debug, andy.chiu, jerry.shih,
	hankuan.chen, greentime.hu, evan, xiao.w.wang, charlie, apatel,
	mchitale, dbarboza, sameo, shikemeng, willy, vincent.chen, guoren,
	samitolvanen, songshuaishuai, gerg, heiko, bhe, jeeheng.sia, cyy,
	maskray, ancientmodern4, mathis.salmen, cuiyunhui, bgray, mpe,
	baruch, alx, david, catalin.marinas, revest, josh, shr, deller,
	omosnace, ojeda, jhubbard
In-Reply-To: <20240403234054.2020347-1-debug@rivosinc.com>

Save shadow stack pointer in sigcontext structure while delivering signal.
Restore shadow stack pointer from sigcontext on sigreturn.

As part of save operation, kernel uses `ssamoswap` to save snapshot of
current shadow stack on shadow stack itself (can be called as a save
token). During restore on sigreturn, kernel retrieves token from top of
shadow stack and validates it. This allows that user mode can't arbitrary
pivot to any shadow stack address without having a token and thus provide
strong security assurance between signaly delivery and sigreturn window.

Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
 arch/riscv/include/asm/usercfi.h | 19 +++++++++++
 arch/riscv/kernel/signal.c       | 45 +++++++++++++++++++++++++
 arch/riscv/kernel/usercfi.c      | 57 ++++++++++++++++++++++++++++++++
 3 files changed, 121 insertions(+)

diff --git a/arch/riscv/include/asm/usercfi.h b/arch/riscv/include/asm/usercfi.h
index 8accdc8ec164..507a27d5f53c 100644
--- a/arch/riscv/include/asm/usercfi.h
+++ b/arch/riscv/include/asm/usercfi.h
@@ -8,6 +8,7 @@
 #ifndef __ASSEMBLY__
 #include <linux/types.h>
 #include <linux/prctl.h>
+#include <linux/errno.h>
 
 struct task_struct;
 struct kernel_clone_args;
@@ -35,6 +36,9 @@ void set_shstk_status(struct task_struct *task, bool enable);
 bool is_indir_lp_enabled(struct task_struct *task);
 bool is_indir_lp_locked(struct task_struct *task);
 void set_indir_lp_status(struct task_struct *task, bool enable);
+unsigned long get_active_shstk(struct task_struct *task);
+int restore_user_shstk(struct task_struct *tsk, unsigned long shstk_ptr);
+int save_user_shstk(struct task_struct *tsk, unsigned long *saved_shstk_ptr);
 
 #define PR_SHADOW_STACK_SUPPORTED_STATUS_MASK (PR_SHADOW_STACK_ENABLE)
 
@@ -77,6 +81,16 @@ static inline void set_shstk_status(struct task_struct *task, bool enable)
 
 }
 
+static inline int restore_user_shstk(struct task_struct *tsk, unsigned long shstk_ptr)
+{
+	return -EINVAL;
+}
+
+static inline int save_user_shstk(struct task_struct *tsk, unsigned long *saved_shstk_ptr)
+{
+	return -EINVAL;
+}
+
 static inline bool is_indir_lp_enabled(struct task_struct *task)
 {
 	return false;
@@ -92,6 +106,11 @@ static inline void set_indir_lp_status(struct task_struct *task, bool enable)
 
 }
 
+static inline unsigned long get_active_shstk(struct task_struct *task)
+{
+	return 0;
+}
+
 #endif /* CONFIG_RISCV_USER_CFI */
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/riscv/kernel/signal.c b/arch/riscv/kernel/signal.c
index 501e66debf69..428a886ab6ef 100644
--- a/arch/riscv/kernel/signal.c
+++ b/arch/riscv/kernel/signal.c
@@ -22,6 +22,7 @@
 #include <asm/vector.h>
 #include <asm/csr.h>
 #include <asm/cacheflush.h>
+#include <asm/usercfi.h>
 
 unsigned long signal_minsigstksz __ro_after_init;
 
@@ -232,6 +233,7 @@ SYSCALL_DEFINE0(rt_sigreturn)
 	struct pt_regs *regs = current_pt_regs();
 	struct rt_sigframe __user *frame;
 	struct task_struct *task;
+	unsigned long ss_ptr = 0;
 	sigset_t set;
 	size_t frame_size = get_rt_frame_size(false);
 
@@ -254,6 +256,26 @@ SYSCALL_DEFINE0(rt_sigreturn)
 	if (restore_altstack(&frame->uc.uc_stack))
 		goto badframe;
 
+	/*
+	 * Restore shadow stack as a form of token stored on shadow stack itself as a safe
+	 * way to restore.
+	 * A token on shadow gives following properties
+	 *	- Safe save and restore for shadow stack switching. Any save of shadow stack
+	 *	  must have had saved a token on shadow stack. Similarly any restore of shadow
+	 *	  stack must check the token before restore. Since writing to shadow stack with
+	 *	  address of shadow stack itself is not easily allowed. A restore without a save
+	 *	  is quite difficult for an attacker to perform.
+	 *	- A natural break. A token in shadow stack provides a natural break in shadow stack
+	 *	  So a single linear range can be bucketed into different shadow stack segments.
+	 *	  sspopchk will detect the condition and fault to kernel as sw check exception.
+	 */
+	if (__copy_from_user(&ss_ptr, &frame->uc.uc_mcontext.sc_cfi_state.ss_ptr,
+						 sizeof(unsigned long)))
+		goto badframe;
+
+	if (is_shstk_enabled(current) && restore_user_shstk(current, ss_ptr))
+		goto badframe;
+
 	regs->cause = -1UL;
 
 	return regs->a0;
@@ -323,6 +345,7 @@ static int setup_rt_frame(struct ksignal *ksig, sigset_t *set,
 	struct rt_sigframe __user *frame;
 	long err = 0;
 	unsigned long __maybe_unused addr;
+	unsigned long ss_ptr = 0;
 	size_t frame_size = get_rt_frame_size(false);
 
 	frame = get_sigframe(ksig, regs, frame_size);
@@ -334,6 +357,23 @@ static int setup_rt_frame(struct ksignal *ksig, sigset_t *set,
 	/* Create the ucontext. */
 	err |= __put_user(0, &frame->uc.uc_flags);
 	err |= __put_user(NULL, &frame->uc.uc_link);
+	/*
+	 * Save a pointer to shadow stack itself on shadow stack as a form of token.
+	 * A token on shadow gives following properties
+	 *	- Safe save and restore for shadow stack switching. Any save of shadow stack
+	 *	  must have had saved a token on shadow stack. Similarly any restore of shadow
+	 *	  stack must check the token before restore. Since writing to shadow stack with
+	 *	  address of shadow stack itself is not easily allowed. A restore without a save
+	 *	  is quite difficult for an attacker to perform.
+	 *	- A natural break. A token in shadow stack provides a natural break in shadow stack
+	 *	  So a single linear range can be bucketed into different shadow stack segments. Any
+	 *	  sspopchk will detect the condition and fault to kernel as sw check exception.
+	 */
+	if (is_shstk_enabled(current)) {
+		err |= save_user_shstk(current, &ss_ptr);
+		err |= __put_user(ss_ptr, &frame->uc.uc_mcontext.sc_cfi_state.ss_ptr);
+	}
+
 	err |= __save_altstack(&frame->uc.uc_stack, regs->sp);
 	err |= setup_sigcontext(frame, regs);
 	err |= __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set));
@@ -344,6 +384,11 @@ static int setup_rt_frame(struct ksignal *ksig, sigset_t *set,
 #ifdef CONFIG_MMU
 	regs->ra = (unsigned long)VDSO_SYMBOL(
 		current->mm->context.vdso, rt_sigreturn);
+
+	/* if bcfi is enabled x1 (ra) and x5 (t0) must match. not sure if we need this? */
+	if (is_shstk_enabled(current))
+		regs->t0 = regs->ra;
+
 #else
 	/*
 	 * For the nommu case we don't have a VDSO.  Instead we push two
diff --git a/arch/riscv/kernel/usercfi.c b/arch/riscv/kernel/usercfi.c
index 13920b9d86f3..db5b32500050 100644
--- a/arch/riscv/kernel/usercfi.c
+++ b/arch/riscv/kernel/usercfi.c
@@ -52,6 +52,11 @@ void set_active_shstk(struct task_struct *task, unsigned long shstk_addr)
 	task->thread_info.user_cfi_state.user_shdw_stk = shstk_addr;
 }
 
+unsigned long get_active_shstk(struct task_struct *task)
+{
+	return task->thread_info.user_cfi_state.user_shdw_stk;
+}
+
 void set_shstk_status(struct task_struct *task, bool enable)
 {
 	task->thread_info.user_cfi_state.ubcfi_en = enable ? 1 : 0;
@@ -168,6 +173,58 @@ static int create_rstor_token(unsigned long ssp, unsigned long *token_addr)
 	return 0;
 }
 
+/*
+ * Save user shadow stack pointer on shadow stack itself and return pointer to saved location
+ * returns -EFAULT if operation was unsuccessful
+ */
+int save_user_shstk(struct task_struct *tsk, unsigned long *saved_shstk_ptr)
+{
+	unsigned long ss_ptr = 0;
+	unsigned long token_loc = 0;
+	int ret = 0;
+
+	if (saved_shstk_ptr == NULL)
+		return -EINVAL;
+
+	ss_ptr = get_active_shstk(tsk);
+	ret = create_rstor_token(ss_ptr, &token_loc);
+
+	if (!ret) {
+		*saved_shstk_ptr = token_loc;
+		set_active_shstk(tsk, token_loc);
+	}
+
+	return ret;
+}
+
+/*
+ * Restores user shadow stack pointer from token on shadow stack for task `tsk`
+ * returns -EFAULT if operation was unsuccessful
+ */
+int restore_user_shstk(struct task_struct *tsk, unsigned long shstk_ptr)
+{
+	unsigned long token = 0;
+
+	token = amo_user_shstk((unsigned long __user *)shstk_ptr, 0);
+
+	if (token == -1)
+		return -EFAULT;
+
+	/* invalid token, return EINVAL */
+	if ((token - shstk_ptr) != SHSTK_ENTRY_SIZE) {
+		pr_info_ratelimited(
+				"%s[%d]: bad restore token in %s: pc=%p sp=%p, token=%p, shstk_ptr=%p\n",
+				tsk->comm, task_pid_nr(tsk), __func__,
+				(void *)(task_pt_regs(tsk)->epc), (void *)(task_pt_regs(tsk)->sp),
+				(void *)token, (void *)shstk_ptr);
+		return -EINVAL;
+	}
+
+	/* all checks passed, set active shstk and return success */
+	set_active_shstk(tsk, token);
+	return 0;
+}
+
 static unsigned long allocate_shadow_stack(unsigned long addr, unsigned long size,
 				unsigned long token_offset,
 				bool set_tok)
-- 
2.43.2


^ permalink raw reply related

* [PATCH v3 22/29] riscv sigcontext: adding cfi state field in sigcontext
From: Deepak Gupta @ 2024-04-03 23:35 UTC (permalink / raw)
  To: paul.walmsley, rick.p.edgecombe, broonie, Szabolcs.Nagy,
	kito.cheng, keescook, ajones, conor.dooley, cleger, atishp, alex,
	bjorn, alexghiti, samuel.holland, conor
  Cc: linux-doc, linux-riscv, linux-kernel, devicetree, linux-mm,
	linux-arch, linux-kselftest, corbet, palmer, aou, robh+dt,
	krzysztof.kozlowski+dt, oleg, akpm, arnd, ebiederm, Liam.Howlett,
	vbabka, lstoakes, shuah, brauner, debug, andy.chiu, jerry.shih,
	hankuan.chen, greentime.hu, evan, xiao.w.wang, charlie, apatel,
	mchitale, dbarboza, sameo, shikemeng, willy, vincent.chen, guoren,
	samitolvanen, songshuaishuai, gerg, heiko, bhe, jeeheng.sia, cyy,
	maskray, ancientmodern4, mathis.salmen, cuiyunhui, bgray, mpe,
	baruch, alx, david, catalin.marinas, revest, josh, shr, deller,
	omosnace, ojeda, jhubbard
In-Reply-To: <20240403234054.2020347-1-debug@rivosinc.com>

Shadow stack needs to be saved and restored on signal delivery and signal
return.

sigcontext embedded in ucontext is extendible. Adding cfi state in there
which can be used to save cfi state before signal delivery and restore
cfi state on sigreturn

Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
 arch/riscv/include/uapi/asm/sigcontext.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/riscv/include/uapi/asm/sigcontext.h b/arch/riscv/include/uapi/asm/sigcontext.h
index cd4f175dc837..5ccdd94a0855 100644
--- a/arch/riscv/include/uapi/asm/sigcontext.h
+++ b/arch/riscv/include/uapi/asm/sigcontext.h
@@ -21,6 +21,10 @@ struct __sc_riscv_v_state {
 	struct __riscv_v_ext_state v_state;
 } __attribute__((aligned(16)));
 
+struct __sc_riscv_cfi_state {
+	unsigned long ss_ptr;	/* shadow stack pointer */
+	unsigned long rsvd;		/* keeping another word reserved in case we need it */
+};
 /*
  * Signal context structure
  *
@@ -29,6 +33,7 @@ struct __sc_riscv_v_state {
  */
 struct sigcontext {
 	struct user_regs_struct sc_regs;
+	struct __sc_riscv_cfi_state sc_cfi_state;
 	union {
 		union __riscv_fp_state sc_fpregs;
 		struct __riscv_extra_ext_header sc_extdesc;
-- 
2.43.2


^ permalink raw reply related

* [PATCH v3 21/29] riscv/traps: Introduce software check exception
From: Deepak Gupta @ 2024-04-03 23:35 UTC (permalink / raw)
  To: paul.walmsley, rick.p.edgecombe, broonie, Szabolcs.Nagy,
	kito.cheng, keescook, ajones, conor.dooley, cleger, atishp, alex,
	bjorn, alexghiti, samuel.holland, conor
  Cc: linux-doc, linux-riscv, linux-kernel, devicetree, linux-mm,
	linux-arch, linux-kselftest, corbet, palmer, aou, robh+dt,
	krzysztof.kozlowski+dt, oleg, akpm, arnd, ebiederm, Liam.Howlett,
	vbabka, lstoakes, shuah, brauner, debug, andy.chiu, jerry.shih,
	hankuan.chen, greentime.hu, evan, xiao.w.wang, charlie, apatel,
	mchitale, dbarboza, sameo, shikemeng, willy, vincent.chen, guoren,
	samitolvanen, songshuaishuai, gerg, heiko, bhe, jeeheng.sia, cyy,
	maskray, ancientmodern4, mathis.salmen, cuiyunhui, bgray, mpe,
	baruch, alx, david, catalin.marinas, revest, josh, shr, deller,
	omosnace, ojeda, jhubbard
In-Reply-To: <20240403234054.2020347-1-debug@rivosinc.com>

zicfiss / zicfilp introduces a new exception to priv isa `software check
exception` with cause code = 18. This patch implements software check
exception.

Additionally it implements a cfi violation handler which checks for code
in xtval. If xtval=2, it means that sw check exception happened because of
an indirect branch not landing on 4 byte aligned PC or not landing on
`lpad` instruction or label value embedded in `lpad` not matching label
value setup in `x7`. If xtval=3, it means that sw check exception happened
because of mismatch between link register (x1 or x5) and top of shadow
stack (on execution of `sspopchk`).

In case of cfi violation, SIGSEGV is raised with code=SEGV_CPERR.
SEGV_CPERR was introduced by x86 shadow stack patches.

Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
 arch/riscv/include/asm/asm-prototypes.h |  1 +
 arch/riscv/kernel/entry.S               |  3 ++
 arch/riscv/kernel/traps.c               | 38 +++++++++++++++++++++++++
 3 files changed, 42 insertions(+)

diff --git a/arch/riscv/include/asm/asm-prototypes.h b/arch/riscv/include/asm/asm-prototypes.h
index cd627ec289f1..5a27cefd7805 100644
--- a/arch/riscv/include/asm/asm-prototypes.h
+++ b/arch/riscv/include/asm/asm-prototypes.h
@@ -51,6 +51,7 @@ DECLARE_DO_ERROR_INFO(do_trap_ecall_u);
 DECLARE_DO_ERROR_INFO(do_trap_ecall_s);
 DECLARE_DO_ERROR_INFO(do_trap_ecall_m);
 DECLARE_DO_ERROR_INFO(do_trap_break);
+DECLARE_DO_ERROR_INFO(do_trap_software_check);
 
 asmlinkage void handle_bad_stack(struct pt_regs *regs);
 asmlinkage void do_page_fault(struct pt_regs *regs);
diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
index 7245a0ea25c1..f97af4ff5237 100644
--- a/arch/riscv/kernel/entry.S
+++ b/arch/riscv/kernel/entry.S
@@ -374,6 +374,9 @@ SYM_DATA_START_LOCAL(excp_vect_table)
 	RISCV_PTR do_page_fault   /* load page fault */
 	RISCV_PTR do_trap_unknown
 	RISCV_PTR do_page_fault   /* store page fault */
+	RISCV_PTR do_trap_unknown /* cause=16 */
+	RISCV_PTR do_trap_unknown /* cause=17 */
+	RISCV_PTR do_trap_software_check /* cause=18 is sw check exception */
 SYM_DATA_END_LABEL(excp_vect_table, SYM_L_LOCAL, excp_vect_table_end)
 
 #ifndef CONFIG_MMU
diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
index a1b9be3c4332..9fba263428a1 100644
--- a/arch/riscv/kernel/traps.c
+++ b/arch/riscv/kernel/traps.c
@@ -339,6 +339,44 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs)
 
 }
 
+#define CFI_TVAL_FCFI_CODE	2
+#define CFI_TVAL_BCFI_CODE	3
+/* handle cfi violations */
+bool handle_user_cfi_violation(struct pt_regs *regs)
+{
+	bool ret = false;
+	unsigned long tval = csr_read(CSR_TVAL);
+
+	if (((tval == CFI_TVAL_FCFI_CODE) && cpu_supports_indirect_br_lp_instr()) ||
+		((tval == CFI_TVAL_BCFI_CODE) && cpu_supports_shadow_stack())) {
+		do_trap_error(regs, SIGSEGV, SEGV_CPERR, regs->epc,
+					  "Oops - control flow violation");
+		ret = true;
+	}
+
+	return ret;
+}
+/*
+ * software check exception is defined with risc-v cfi spec. Software check
+ * exception is raised when:-
+ * a) An indirect branch doesn't land on 4 byte aligned PC or `lpad`
+ *    instruction or `label` value programmed in `lpad` instr doesn't
+ *    match with value setup in `x7`. reported code in `xtval` is 2.
+ * b) `sspopchk` instruction finds a mismatch between top of shadow stack (ssp)
+ *    and x1/x5. reported code in `xtval` is 3.
+ */
+asmlinkage __visible __trap_section void do_trap_software_check(struct pt_regs *regs)
+{
+	if (user_mode(regs)) {
+		/* not a cfi violation, then merge into flow of unknown trap handler */
+		if (!handle_user_cfi_violation(regs))
+			do_trap_unknown(regs);
+	} else {
+		/* sw check exception coming from kernel is a bug in kernel */
+		die(regs, "Kernel BUG");
+	}
+}
+
 #ifdef CONFIG_MMU
 asmlinkage __visible noinstr void do_page_fault(struct pt_regs *regs)
 {
-- 
2.43.2


^ permalink raw reply related

* [PATCH v3 20/29] riscv/kernel: update __show_regs to print shadow stack register
From: Deepak Gupta @ 2024-04-03 23:35 UTC (permalink / raw)
  To: paul.walmsley, rick.p.edgecombe, broonie, Szabolcs.Nagy,
	kito.cheng, keescook, ajones, conor.dooley, cleger, atishp, alex,
	bjorn, alexghiti, samuel.holland, conor
  Cc: linux-doc, linux-riscv, linux-kernel, devicetree, linux-mm,
	linux-arch, linux-kselftest, corbet, palmer, aou, robh+dt,
	krzysztof.kozlowski+dt, oleg, akpm, arnd, ebiederm, Liam.Howlett,
	vbabka, lstoakes, shuah, brauner, debug, andy.chiu, jerry.shih,
	hankuan.chen, greentime.hu, evan, xiao.w.wang, charlie, apatel,
	mchitale, dbarboza, sameo, shikemeng, willy, vincent.chen, guoren,
	samitolvanen, songshuaishuai, gerg, heiko, bhe, jeeheng.sia, cyy,
	maskray, ancientmodern4, mathis.salmen, cuiyunhui, bgray, mpe,
	baruch, alx, david, catalin.marinas, revest, josh, shr, deller,
	omosnace, ojeda, jhubbard
In-Reply-To: <20240403234054.2020347-1-debug@rivosinc.com>

Updating __show_regs to print captured shadow stack pointer as well.
On tasks where shadow stack is disabled, it'll simply print 0.

Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
 arch/riscv/kernel/process.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index ebed7589c51a..079fd6cd6446 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -89,8 +89,8 @@ void __show_regs(struct pt_regs *regs)
 		regs->s8, regs->s9, regs->s10);
 	pr_cont(" s11: " REG_FMT " t3 : " REG_FMT " t4 : " REG_FMT "\n",
 		regs->s11, regs->t3, regs->t4);
-	pr_cont(" t5 : " REG_FMT " t6 : " REG_FMT "\n",
-		regs->t5, regs->t6);
+	pr_cont(" t5 : " REG_FMT " t6 : " REG_FMT " ssp : " REG_FMT "\n",
+		regs->t5, regs->t6, get_active_shstk(current));
 
 	pr_cont("status: " REG_FMT " badaddr: " REG_FMT " cause: " REG_FMT "\n",
 		regs->status, regs->badaddr, regs->cause);
-- 
2.43.2


^ permalink raw reply related

* [PATCH v3 19/29] riscv: Implements arch agnostic indirect branch tracking prctls
From: Deepak Gupta @ 2024-04-03 23:35 UTC (permalink / raw)
  To: paul.walmsley, rick.p.edgecombe, broonie, Szabolcs.Nagy,
	kito.cheng, keescook, ajones, conor.dooley, cleger, atishp, alex,
	bjorn, alexghiti, samuel.holland, conor
  Cc: linux-doc, linux-riscv, linux-kernel, devicetree, linux-mm,
	linux-arch, linux-kselftest, corbet, palmer, aou, robh+dt,
	krzysztof.kozlowski+dt, oleg, akpm, arnd, ebiederm, Liam.Howlett,
	vbabka, lstoakes, shuah, brauner, debug, andy.chiu, jerry.shih,
	hankuan.chen, greentime.hu, evan, xiao.w.wang, charlie, apatel,
	mchitale, dbarboza, sameo, shikemeng, willy, vincent.chen, guoren,
	samitolvanen, songshuaishuai, gerg, heiko, bhe, jeeheng.sia, cyy,
	maskray, ancientmodern4, mathis.salmen, cuiyunhui, bgray, mpe,
	baruch, alx, david, catalin.marinas, revest, josh, shr, deller,
	omosnace, ojeda, jhubbard
In-Reply-To: <20240403234054.2020347-1-debug@rivosinc.com>

prctls implemented are:
PR_SET_INDIR_BR_LP_STATUS, PR_GET_INDIR_BR_LP_STATUS and
PR_LOCK_INDIR_BR_LP_STATUS.

Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
 arch/riscv/include/asm/usercfi.h | 22 ++++++++-
 arch/riscv/kernel/process.c      |  5 +++
 arch/riscv/kernel/usercfi.c      | 76 ++++++++++++++++++++++++++++++++
 3 files changed, 102 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/include/asm/usercfi.h b/arch/riscv/include/asm/usercfi.h
index a168ae0fa5d8..8accdc8ec164 100644
--- a/arch/riscv/include/asm/usercfi.h
+++ b/arch/riscv/include/asm/usercfi.h
@@ -16,7 +16,9 @@ struct kernel_clone_args;
 struct cfi_status {
 	unsigned long ubcfi_en : 1; /* Enable for backward cfi. */
 	unsigned long ubcfi_locked : 1;
-	unsigned long rsvd : ((sizeof(unsigned long)*8) - 2);
+	unsigned long ufcfi_en : 1; /* Enable for forward cfi. Note that ELP goes in sstatus */
+	unsigned long ufcfi_locked : 1;
+	unsigned long rsvd : ((sizeof(unsigned long)*8) - 4);
 	unsigned long user_shdw_stk; /* Current user shadow stack pointer */
 	unsigned long shdw_stk_base; /* Base address of shadow stack */
 	unsigned long shdw_stk_size; /* size of shadow stack */
@@ -30,6 +32,9 @@ void set_active_shstk(struct task_struct *task, unsigned long shstk_addr);
 bool is_shstk_enabled(struct task_struct *task);
 bool is_shstk_locked(struct task_struct *task);
 void set_shstk_status(struct task_struct *task, bool enable);
+bool is_indir_lp_enabled(struct task_struct *task);
+bool is_indir_lp_locked(struct task_struct *task);
+void set_indir_lp_status(struct task_struct *task, bool enable);
 
 #define PR_SHADOW_STACK_SUPPORTED_STATUS_MASK (PR_SHADOW_STACK_ENABLE)
 
@@ -72,6 +77,21 @@ static inline void set_shstk_status(struct task_struct *task, bool enable)
 
 }
 
+static inline bool is_indir_lp_enabled(struct task_struct *task)
+{
+	return false;
+}
+
+static inline bool is_indir_lp_locked(struct task_struct *task)
+{
+	return false;
+}
+
+static inline void set_indir_lp_status(struct task_struct *task, bool enable)
+{
+
+}
+
 #endif /* CONFIG_RISCV_USER_CFI */
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index 3fb8b23f629b..ebed7589c51a 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -152,6 +152,11 @@ void start_thread(struct pt_regs *regs, unsigned long pc,
 	set_shstk_status(current, false);
 	set_shstk_base(current, 0, 0);
 	set_active_shstk(current, 0);
+	/*
+	 * disable indirect branch tracking on exec.
+	 * libc will enable it later via prctl.
+	 */
+	set_indir_lp_status(current, false);
 
 #ifdef CONFIG_64BIT
 	regs->status &= ~SR_UXL;
diff --git a/arch/riscv/kernel/usercfi.c b/arch/riscv/kernel/usercfi.c
index cdedf1f78b3e..13920b9d86f3 100644
--- a/arch/riscv/kernel/usercfi.c
+++ b/arch/riscv/kernel/usercfi.c
@@ -69,6 +69,32 @@ void set_shstk_lock(struct task_struct *task)
 	task->thread_info.user_cfi_state.ubcfi_locked = 1;
 }
 
+bool is_indir_lp_enabled(struct task_struct *task)
+{
+	return task->thread_info.user_cfi_state.ufcfi_en ? true : false;
+}
+
+bool is_indir_lp_locked(struct task_struct *task)
+{
+	return task->thread_info.user_cfi_state.ufcfi_locked ? true : false;
+}
+
+void set_indir_lp_status(struct task_struct *task, bool enable)
+{
+	task->thread_info.user_cfi_state.ufcfi_en = enable ? 1 : 0;
+
+	if (enable)
+		task->thread_info.envcfg |= ENVCFG_LPE;
+	else
+		task->thread_info.envcfg &= ~ENVCFG_LPE;
+
+	csr_write(CSR_ENVCFG, task->thread_info.envcfg);
+}
+
+void set_indir_lp_lock(struct task_struct *task)
+{
+	task->thread_info.user_cfi_state.ufcfi_locked = 1;
+}
 /*
  * If size is 0, then to be compatible with regular stack we want it to be as big as
  * regular stack. Else PAGE_ALIGN it and return back
@@ -375,3 +401,53 @@ int arch_lock_shadow_stack_status(struct task_struct *task,
 
 	return 0;
 }
+
+int arch_get_indir_br_lp_status(struct task_struct *t, unsigned long __user *status)
+{
+	unsigned long fcfi_status = 0;
+
+	if (!cpu_supports_indirect_br_lp_instr())
+		return -EINVAL;
+
+	/* indirect branch tracking is enabled on the task or not */
+	fcfi_status |= (is_indir_lp_enabled(t) ? PR_INDIR_BR_LP_ENABLE : 0);
+
+	return copy_to_user(status, &fcfi_status, sizeof(fcfi_status)) ? -EFAULT : 0;
+}
+
+int arch_set_indir_br_lp_status(struct task_struct *t, unsigned long status)
+{
+	bool enable_indir_lp = false;
+
+	if (!cpu_supports_indirect_br_lp_instr())
+		return -EINVAL;
+
+	/* indirect branch tracking is locked and further can't be modified by user */
+	if (is_indir_lp_locked(t))
+		return -EINVAL;
+
+	/* Reject unknown flags */
+	if (status & ~PR_INDIR_BR_LP_ENABLE)
+		return -EINVAL;
+
+	enable_indir_lp = (status & PR_INDIR_BR_LP_ENABLE) ? true : false;
+	set_indir_lp_status(t, enable_indir_lp);
+
+	return 0;
+}
+
+int arch_lock_indir_br_lp_status(struct task_struct *task,
+				unsigned long arg)
+{
+	/*
+	 * If indirect branch tracking is not supported or not enabled on task,
+	 * nothing to lock here
+	 */
+	if (!cpu_supports_indirect_br_lp_instr() ||
+		!is_indir_lp_enabled(task))
+		return -EINVAL;
+
+	set_indir_lp_lock(task);
+
+	return 0;
+}
-- 
2.43.2


^ permalink raw reply related

* [PATCH v3 18/29] riscv: Implements arch agnostic shadow stack prctls
From: Deepak Gupta @ 2024-04-03 23:35 UTC (permalink / raw)
  To: paul.walmsley, rick.p.edgecombe, broonie, Szabolcs.Nagy,
	kito.cheng, keescook, ajones, conor.dooley, cleger, atishp, alex,
	bjorn, alexghiti, samuel.holland, conor
  Cc: linux-doc, linux-riscv, linux-kernel, devicetree, linux-mm,
	linux-arch, linux-kselftest, corbet, palmer, aou, robh+dt,
	krzysztof.kozlowski+dt, oleg, akpm, arnd, ebiederm, Liam.Howlett,
	vbabka, lstoakes, shuah, brauner, debug, andy.chiu, jerry.shih,
	hankuan.chen, greentime.hu, evan, xiao.w.wang, charlie, apatel,
	mchitale, dbarboza, sameo, shikemeng, willy, vincent.chen, guoren,
	samitolvanen, songshuaishuai, gerg, heiko, bhe, jeeheng.sia, cyy,
	maskray, ancientmodern4, mathis.salmen, cuiyunhui, bgray, mpe,
	baruch, alx, david, catalin.marinas, revest, josh, shr, deller,
	omosnace, ojeda, jhubbard
In-Reply-To: <20240403234054.2020347-1-debug@rivosinc.com>

Implement architecture agnostic prctls() interface for setting and getting
shadow stack status.

prctls implemented are PR_GET_SHADOW_STACK_STATUS,
PR_SET_SHADOW_STACK_STATUS and PR_LOCK_SHADOW_STACK_STATUS.

As part of PR_SET_SHADOW_STACK_STATUS/PR_GET_SHADOW_STACK_STATUS, only
PR_SHADOW_STACK_ENABLE is implemented because RISCV allows each mode to
write to their own shadow stack using `sspush` or `ssamoswap`.

PR_LOCK_SHADOW_STACK_STATUS locks current configuration of shadow stack
enabling.

Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
 arch/riscv/include/asm/usercfi.h |  18 +++++-
 arch/riscv/kernel/process.c      |   8 +++
 arch/riscv/kernel/usercfi.c      | 107 +++++++++++++++++++++++++++++++
 3 files changed, 132 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/include/asm/usercfi.h b/arch/riscv/include/asm/usercfi.h
index b47574a7a8c9..a168ae0fa5d8 100644
--- a/arch/riscv/include/asm/usercfi.h
+++ b/arch/riscv/include/asm/usercfi.h
@@ -7,6 +7,7 @@
 
 #ifndef __ASSEMBLY__
 #include <linux/types.h>
+#include <linux/prctl.h>
 
 struct task_struct;
 struct kernel_clone_args;
@@ -14,7 +15,8 @@ struct kernel_clone_args;
 #ifdef CONFIG_RISCV_USER_CFI
 struct cfi_status {
 	unsigned long ubcfi_en : 1; /* Enable for backward cfi. */
-	unsigned long rsvd : ((sizeof(unsigned long)*8) - 1);
+	unsigned long ubcfi_locked : 1;
+	unsigned long rsvd : ((sizeof(unsigned long)*8) - 2);
 	unsigned long user_shdw_stk; /* Current user shadow stack pointer */
 	unsigned long shdw_stk_base; /* Base address of shadow stack */
 	unsigned long shdw_stk_size; /* size of shadow stack */
@@ -26,6 +28,10 @@ void shstk_release(struct task_struct *tsk);
 void set_shstk_base(struct task_struct *task, unsigned long shstk_addr, unsigned long size);
 void set_active_shstk(struct task_struct *task, unsigned long shstk_addr);
 bool is_shstk_enabled(struct task_struct *task);
+bool is_shstk_locked(struct task_struct *task);
+void set_shstk_status(struct task_struct *task, bool enable);
+
+#define PR_SHADOW_STACK_SUPPORTED_STATUS_MASK (PR_SHADOW_STACK_ENABLE)
 
 #else
 
@@ -56,6 +62,16 @@ static inline bool is_shstk_enabled(struct task_struct *task)
 	return false;
 }
 
+static inline bool is_shstk_locked(struct task_struct *task)
+{
+	return false;
+}
+
+static inline void set_shstk_status(struct task_struct *task, bool enable)
+{
+
+}
+
 #endif /* CONFIG_RISCV_USER_CFI */
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index ef48a25b0eff..3fb8b23f629b 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -145,6 +145,14 @@ void start_thread(struct pt_regs *regs, unsigned long pc,
 	regs->epc = pc;
 	regs->sp = sp;
 
+	/*
+	 * clear shadow stack state on exec.
+	 * libc will set it later via prctl.
+	 */
+	set_shstk_status(current, false);
+	set_shstk_base(current, 0, 0);
+	set_active_shstk(current, 0);
+
 #ifdef CONFIG_64BIT
 	regs->status &= ~SR_UXL;
 
diff --git a/arch/riscv/kernel/usercfi.c b/arch/riscv/kernel/usercfi.c
index 11ef7ab925c9..cdedf1f78b3e 100644
--- a/arch/riscv/kernel/usercfi.c
+++ b/arch/riscv/kernel/usercfi.c
@@ -24,6 +24,16 @@ bool is_shstk_enabled(struct task_struct *task)
 	return task->thread_info.user_cfi_state.ubcfi_en ? true : false;
 }
 
+bool is_shstk_allocated(struct task_struct *task)
+{
+	return task->thread_info.user_cfi_state.shdw_stk_base ? true : false;
+}
+
+bool is_shstk_locked(struct task_struct *task)
+{
+	return task->thread_info.user_cfi_state.ubcfi_locked ? true : false;
+}
+
 void set_shstk_base(struct task_struct *task, unsigned long shstk_addr, unsigned long size)
 {
 	task->thread_info.user_cfi_state.shdw_stk_base = shstk_addr;
@@ -42,6 +52,23 @@ void set_active_shstk(struct task_struct *task, unsigned long shstk_addr)
 	task->thread_info.user_cfi_state.user_shdw_stk = shstk_addr;
 }
 
+void set_shstk_status(struct task_struct *task, bool enable)
+{
+	task->thread_info.user_cfi_state.ubcfi_en = enable ? 1 : 0;
+
+	if (enable)
+		task->thread_info.envcfg |= ENVCFG_SSE;
+	else
+		task->thread_info.envcfg &= ~ENVCFG_SSE;
+
+	csr_write(CSR_ENVCFG, task->thread_info.envcfg);
+}
+
+void set_shstk_lock(struct task_struct *task)
+{
+	task->thread_info.user_cfi_state.ubcfi_locked = 1;
+}
+
 /*
  * If size is 0, then to be compatible with regular stack we want it to be as big as
  * regular stack. Else PAGE_ALIGN it and return back
@@ -268,3 +295,83 @@ void shstk_release(struct task_struct *tsk)
 	vm_munmap(base, size);
 	set_shstk_base(tsk, 0, 0);
 }
+
+int arch_get_shadow_stack_status(struct task_struct *t, unsigned long __user *status)
+{
+	unsigned long bcfi_status = 0;
+
+	if (!cpu_supports_shadow_stack())
+		return -EINVAL;
+
+	/* this means shadow stack is enabled on the task */
+	bcfi_status |= (is_shstk_enabled(t) ? PR_SHADOW_STACK_ENABLE : 0);
+
+	return copy_to_user(status, &bcfi_status, sizeof(bcfi_status)) ? -EFAULT : 0;
+}
+
+int arch_set_shadow_stack_status(struct task_struct *t, unsigned long status)
+{
+	unsigned long size = 0, addr = 0;
+	bool enable_shstk = false;
+
+	if (!cpu_supports_shadow_stack())
+		return -EINVAL;
+
+	/* Reject unknown flags */
+	if (status & ~PR_SHADOW_STACK_SUPPORTED_STATUS_MASK)
+		return -EINVAL;
+
+	/* bcfi status is locked and further can't be modified by user */
+	if (is_shstk_locked(t))
+		return -EINVAL;
+
+	enable_shstk = status & PR_SHADOW_STACK_ENABLE;
+	/* Request is to enable shadow stack and shadow stack is not enabled already */
+	if (enable_shstk && !is_shstk_enabled(t)) {
+		/* shadow stack was allocated and enable request again
+		 * no need to support such usecase and return EINVAL.
+		 */
+		if (is_shstk_allocated(t))
+			return -EINVAL;
+
+		size = calc_shstk_size(0);
+		addr = allocate_shadow_stack(0, size, 0, false);
+		if (IS_ERR_VALUE(addr))
+			return -ENOMEM;
+		set_shstk_base(t, addr, size);
+		set_active_shstk(t, addr + size);
+	}
+
+	/*
+	 * If a request to disable shadow stack happens, let's go ahead and release it
+	 * Although, if CLONE_VFORKed child did this, then in that case we will end up
+	 * not releasing the shadow stack (because it might be needed in parent). Although
+	 * we will disable it for VFORKed child. And if VFORKed child tries to enable again
+	 * then in that case, it'll get entirely new shadow stack because following condition
+	 * are true
+	 *  - shadow stack was not enabled for vforked child
+	 *  - shadow stack base was anyways pointing to 0
+	 * This shouldn't be a big issue because we want parent to have availability of shadow
+	 * stack whenever VFORKed child releases resources via exit or exec but at the same
+	 * time we want VFORKed child to break away and establish new shadow stack if it desires
+	 *
+	 */
+	if (!enable_shstk)
+		shstk_release(t);
+
+	set_shstk_status(t, enable_shstk);
+	return 0;
+}
+
+int arch_lock_shadow_stack_status(struct task_struct *task,
+				unsigned long arg)
+{
+	/* If shtstk not supported or not enabled on task, nothing to lock here */
+	if (!cpu_supports_shadow_stack() ||
+		!is_shstk_enabled(task))
+		return -EINVAL;
+
+	set_shstk_lock(task);
+
+	return 0;
+}
-- 
2.43.2


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox